#38002

Committed 06 Feb 2025 06:14AM UTC coverage: 20.322% (-2.4%) from 22.722%

Build #38002

Build Type

push

local

Committed by

web-flow

Commit Message

bpart: Fully switch to partitioned semantics (#57253)

This is the final PR in the binding partitions series (modulo bugs and
tweaks), i.e. it closes #54654 and thus closes #40399, which was the
original design sketch.

This activates the full designed semantics for binding partitions, in
particular allowing safe replacement of const bindings, which in turn
allows struct redefinitions. This thus closes
timholy/Revise.jl#18 and also closes #38584.

The biggest semantic change here is probably that this gets rid of the
notion of "resolvedness" of a binding. Previously, a lot of the behavior
of our implementation depended on when bindings were "resolved", which
could happen at basically an arbitrary point (in the compiler, in REPL
completion, in a different thread), making a lot of the semantics around
bindings ill- or at least implementation-defined. There are several
related issues in the bugtracker, so this closes #14055, closes #44604,
closes #46354, and closes #30277.

It is also the last step to close #24569.
It also supports bindings for undef->defined transitions and thus closes
#53958 and closes #54733. However, this is not activated yet for
performance reasons and may need some further optimization.

Since resolvedness no longer exists, we need to replace it with some
hopefully more well-defined semantics. I will describe the semantics
below, but before I do I will make two notes:

1. There are a number of cases where these semantics will behave
slightly differently than the old semantics absent some other task going
around resolving random bindings.
2. The new behavior (except for the replacement stuff) was generally
permissible under the old semantics if the bindings happened to be
resolved at the right time.

With all that said, there are essentially three "strengths" of bindings:

1. Implicit Bindings: Anything implicitly obtained from `using Mod`, "no
binding", plus slightly more exotic corner cases around conflicts

2. Weakly declared bindin... (continued)

Run Details

11 of 111 new or added lines in 7 files covered. (9.91%)

1273 existing lines in 68 files now uncovered.

9908 of 48755 relevant lines covered (20.32%)

105126.48 hits per line

Source File

19.44

/base/char.jl

# This file is a part of Julia. License is MIT: https://julialang.org/license

"""
The `AbstractChar` type is the supertype of all character implementations
in Julia. A character represents a Unicode code point, and can be converted
to an integer via the [`codepoint`](@ref) function in order to obtain the
numerical value of the code point, or constructed from the same integer.
These numerical values determine how characters are compared with `<` and `==`,
for example.  New `T <: AbstractChar` types should define a `codepoint(::T)`
method and a `T(::UInt32)` constructor, at minimum.

A given `AbstractChar` subtype may be capable of representing only a subset
of Unicode, in which case conversion from an unsupported `UInt32` value
may throw an error. Conversely, the built-in [`Char`](@ref) type represents
a *superset* of Unicode (in order to losslessly encode invalid byte streams),
in which case conversion of a non-Unicode value *to* `UInt32` throws an error.
The [`isvalid`](@ref) function can be used to check which codepoints are
representable in a given `AbstractChar` type.

Internally, an `AbstractChar` type may use a variety of encodings.  Conversion
via `codepoint(char)` will not reveal this encoding because it always returns the
Unicode value of the character. `print(io, c)` of any `c::AbstractChar`
produces an encoding determined by `io` (UTF-8 for all built-in `IO`
types), via conversion to `Char` if necessary.

`write(io, c)`, in contrast, may emit an encoding depending on
`typeof(c)`, and `read(io, typeof(c))` should read the same encoding as `write`.
New `AbstractChar` types must provide their own implementations of
`write` and `read`.
"""
AbstractChar

"""
    Char(c::Union{Number,AbstractChar})

`Char` is a 32-bit [`AbstractChar`](@ref) type that is the default representation
of characters in Julia. `Char` is the type used for character literals like `'x'`
and it is also the element type of [`String`](@ref).

In order to losslessly represent arbitrary byte streams stored in a `String`,
a `Char` value may store information that cannot be converted to a Unicode
codepoint — converting such a `Char` to `UInt32` will throw an error.
The [`isvalid(c::Char)`](@ref) function can be used to query whether `c`
represents a valid Unicode character.
"""
Char

@constprop :aggressive (::Type{T})(x::Number) where {T<:AbstractChar} = T(UInt32(x))
@constprop :aggressive AbstractChar(x::Number) = Char(x)
@constprop :aggressive (::Type{T})(x::AbstractChar) where {T<:Union{Number,AbstractChar}} = T(codepoint(x))
@constprop :aggressive (::Type{T})(x::AbstractChar) where {T<:Union{Int32,Int64}} = codepoint(x) % T
(::Type{T})(x::T) where {T<:AbstractChar} = x

"""
    ncodeunits(c::Char) -> Int

Return the number of code units required to encode a character as UTF-8.
This is the number of bytes which will be printed if the character is written
to an output stream, or `ncodeunits(string(c))` but computed efficiently.

!!! compat "Julia 1.1"
    This method requires at least Julia 1.1. In Julia 1.0 consider
    using `ncodeunits(string(c))`.
"""
function ncodeunits(c::Char)
    u = reinterpret(UInt32, c)
    # We care about how many trailing bytes are all zero
    # subtract that from the total number of bytes
    n_nonzero_bytes = sizeof(UInt32) - div(trailing_zeros(u), 0x8)
    # Take care of '\0', which has an all-zero bitpattern
    n_nonzero_bytes + iszero(u)
end

"""
    codepoint(c::AbstractChar) -> Integer

Return the Unicode codepoint (an unsigned integer) corresponding
to the character `c` (or throw an exception if `c` does not represent
a valid character). For `Char`, this is a `UInt32` value, but
`AbstractChar` types that represent only a subset of Unicode may
return a different-sized integer (e.g. `UInt8`).
"""
function codepoint end

@constprop :aggressive codepoint(c::Char) = UInt32(c)

struct InvalidCharError{T<:AbstractChar} <: Exception
    char::T
end
struct CodePointError{T<:Integer} <: Exception
    code::T
end
@noinline throw_invalid_char(c::AbstractChar) = throw(InvalidCharError(c))
@noinline throw_code_point_err(u::Integer) = throw(CodePointError(u))

function ismalformed(c::Char)
    u = bitcast(UInt32, c)
    l1 = leading_ones(u) << 3
    t0 = trailing_zeros(u) & 56
    (l1 == 8) | (l1 + t0 > 32) |
    (((u & 0x00c0c0c0) ⊻ 0x00808080) >> t0 != 0)
end

@inline is_overlong_enc(u::UInt32) = (u >> 24 == 0xc0) | (u >> 24 == 0xc1) | (u >> 21 == 0x0704) | (u >> 20 == 0x0f08)

function isoverlong(c::Char)
    u = bitcast(UInt32, c)
    is_overlong_enc(u)
end

# fallback: other AbstractChar types, by default, are assumed
#           not to support malformed or overlong encodings.

"""
    ismalformed(c::AbstractChar) -> Bool

Return `true` if `c` represents malformed (non-Unicode) data according to the
encoding used by `c`. Defaults to `false` for non-`Char` types.

See also [`show_invalid`](@ref).
"""
ismalformed(c::AbstractChar) = false

"""
    isoverlong(c::AbstractChar) -> Bool

Return `true` if `c` represents an overlong UTF-8 sequence. Defaults
to `false` for non-`Char` types.

See also [`decode_overlong`](@ref) and [`show_invalid`](@ref).
"""
isoverlong(c::AbstractChar) = false

@constprop :aggressive function UInt32(c::Char)
    # TODO: use optimized inline LLVM
    u = bitcast(UInt32, c)
    u < 0x80000000 && return u >> 24
    l1 = leading_ones(u)
    t0 = trailing_zeros(u) & 56
    (l1 == 1) | (8l1 + t0 > 32) |
    ((((u & 0x00c0c0c0) ⊻ 0x00808080) >> t0 != 0) | is_overlong_enc(u)) &&
        throw_invalid_char(c)
    u &= 0xffffffff >> l1
    u >>= t0
    ((u & 0x0000007f) >> 0) | ((u & 0x00007f00) >> 2) |
    ((u & 0x007f0000) >> 4) | ((u & 0x7f000000) >> 6)
end

"""
    decode_overlong(c::AbstractChar) -> Integer

When [`isoverlong(c)`](@ref) is `true`, `decode_overlong(c)` returns
the Unicode codepoint value of `c`. `AbstractChar` implementations
that support overlong encodings should implement `Base.decode_overlong`.
"""
function decode_overlong end

@constprop :aggressive function decode_overlong(c::Char)
    u = bitcast(UInt32, c)
    l1 = leading_ones(u)
    t0 = trailing_zeros(u) & 56
    u &= 0xffffffff >> l1
    u >>= t0
    ((u & 0x0000007f) >> 0) | ((u & 0x00007f00) >> 2) |
    ((u & 0x007f0000) >> 4) | ((u & 0x7f000000) >> 6)
end

@constprop :aggressive function Char(u::UInt32)
    u < 0x80 && return bitcast(Char, u << 24)
    u < 0x00200000 || throw_code_point_err(u)
    c = ((u << 0) & 0x0000003f) | ((u << 2) & 0x00003f00) |
        ((u << 4) & 0x003f0000) | ((u << 6) & 0x3f000000)
    c = u < 0x00000800 ? (c << 16) | 0xc0800000 :
        u < 0x00010000 ? (c << 08) | 0xe0808000 :
                         (c << 00) | 0xf0808080
    bitcast(Char, c)
end

@constprop :aggressive @noinline UInt32_cold(c::Char) = UInt32(c)
@constprop :aggressive function (T::Union{Type{Int8},Type{UInt8}})(c::Char)
    i = bitcast(Int32, c)
    i ≥ 0 ? ((i >>> 24) % T) : T(UInt32_cold(c))
end

@constprop :aggressive @noinline Char_cold(b::UInt32) = Char(b)
@constprop :aggressive function Char(b::Union{Int8,UInt8})
    0 ≤ b ≤ 0x7f ? bitcast(Char, (b % UInt32) << 24) : Char_cold(UInt32(b))
end

convert(::Type{AbstractChar}, x::Number) = Char(x) # default to Char
convert(::Type{T}, x::Number) where {T<:AbstractChar} = T(x)::T
convert(::Type{T}, x::AbstractChar) where {T<:Number} = T(x)::T
convert(::Type{T}, c::AbstractChar) where {T<:AbstractChar} = T(c)::T
convert(::Type{T}, c::T) where {T<:AbstractChar} = c

rem(x::AbstractChar, ::Type{T}) where {T<:Number} = rem(codepoint(x), T)

typemax(::Type{Char}) = bitcast(Char, typemax(UInt32))
typemin(::Type{Char}) = bitcast(Char, typemin(UInt32))

size(c::AbstractChar) = ()
size(c::AbstractChar, d::Integer) = d < 1 ? throw(BoundsError()) : 1
ndims(c::AbstractChar) = 0
ndims(::Type{<:AbstractChar}) = 0
length(c::AbstractChar) = 1
IteratorSize(::Type{Char}) = HasShape{0}()
firstindex(c::AbstractChar) = 1
lastindex(c::AbstractChar) = 1
getindex(c::AbstractChar) = c
getindex(c::AbstractChar, i::Integer) = i == 1 ? c : throw(BoundsError())
getindex(c::AbstractChar, I::Integer...) = all(x -> x == 1, I) ? c : throw(BoundsError())
first(c::AbstractChar) = c
last(c::AbstractChar) = c
eltype(::Type{T}) where {T<:AbstractChar} = T

iterate(c::AbstractChar, done=false) = done ? nothing : (c, true)
isempty(c::AbstractChar) = false
in(x::AbstractChar, y::AbstractChar) = x == y

==(x::Char, y::Char) = bitcast(UInt32, x) == bitcast(UInt32, y)
isless(x::Char, y::Char) = bitcast(UInt32, x) < bitcast(UInt32, y)
hash(x::Char, h::UInt) =
    hash_uint64(((bitcast(UInt32, x) + UInt64(0xd4d64234)) << 32) ⊻ UInt64(h))

first_utf8_byte(c::Char) = (bitcast(UInt32, c) >> 24) % UInt8
first_utf8_byte(c::AbstractChar) = first_utf8_byte(Char(c)::Char)

# fallbacks:
isless(x::AbstractChar, y::AbstractChar) = isless(Char(x), Char(y))
==(x::AbstractChar, y::AbstractChar) = Char(x) == Char(y)
hash(x::AbstractChar, h::UInt) = hash(Char(x), h)
widen(::Type{T}) where {T<:AbstractChar} = T

@inline -(x::AbstractChar, y::AbstractChar) = Int(x) - Int(y)
@inline function -(x::T, y::Integer) where {T<:AbstractChar}
    if x isa Char
        u = Int32((bitcast(UInt32, x) >> 24) % Int8)
        if u >= 0 # inline the runtime fast path
            z = u - y
            return 0 <= z < 0x80 ? bitcast(Char, (z % UInt32) << 24) : Char(UInt32(z))
        end
    end
    return T(Int32(x) - Int32(y))
end
@inline function +(x::T, y::Integer) where {T<:AbstractChar}
    if x isa Char
        u = Int32((bitcast(UInt32, x) >> 24) % Int8)
        if u >= 0 # inline the runtime fast path
            z = u + y
            return 0 <= z < 0x80 ? bitcast(Char, (z % UInt32) << 24) : Char(UInt32(z))
        end
    end
    return T(Int32(x) + Int32(y))
end
@inline +(x::Integer, y::AbstractChar) = y + x

# `print` should output UTF-8 by default for all AbstractChar types.
# (Packages may implement other IO subtypes to specify different encodings.)
# In contrast, `write(io, c)` outputs a `c` in an encoding determined by typeof(c).
print(io::IO, c::Char) = (write(io, c); nothing)
print(io::IO, c::AbstractChar) = print(io, Char(c)) # fallback: convert to output UTF-8

const hex_chars = UInt8['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
                        'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
                        's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

function show_invalid(io::IO, c::Char)
    write(io, 0x27)
    u = bitcast(UInt32, c)
    while true
        a = hex_chars[((u >> 28) & 0xf) + 1]
        b = hex_chars[((u >> 24) & 0xf) + 1]
        write(io, 0x5c, UInt8('x'), a, b)
        (u <<= 8) == 0 && break
    end
    write(io, 0x27)
end

"""
    show_invalid(io::IO, c::AbstractChar)

Called by `show(io, c)` when [`isoverlong(c)`](@ref) or
[`ismalformed(c)`](@ref) returns `true`. Subtypes
of `AbstractChar` should define `Base.show_invalid` methods
if they support storing invalid character data.
"""
show_invalid

# show c to io, assuming UTF-8 encoded output
function show(io::IO, c::AbstractChar)
    if c <= '\\'
        b = c == '\0' ? 0x30 :
            c == '\a' ? 0x61 :
            c == '\b' ? 0x62 :
            c == '\t' ? 0x74 :
            c == '\n' ? 0x6e :
            c == '\v' ? 0x76 :
            c == '\f' ? 0x66 :
            c == '\r' ? 0x72 :
            c == '\e' ? 0x65 :
            c == '\'' ? 0x27 :
            c == '\\' ? 0x5c : 0xff
        if b != 0xff
            write(io, 0x27, 0x5c, b, 0x27)
            return
        end
    end
    if isoverlong(c) || ismalformed(c)
        show_invalid(io, c)
    elseif isprint(c)
        write(io, 0x27)
        print(io, c) # use print, not write, to use UTF-8 for any AbstractChar
        write(io, 0x27)
    else # unprintable, well-formed, non-overlong Unicode
        u = codepoint(c)
        write(io, 0x27, 0x5c, u <= 0x7f ? 0x78 : u <= 0xffff ? 0x75 : 0x55)
        d = max(2, 8 - (leading_zeros(u) >> 2))
        while 0 < d
            write(io, hex_chars[((u >> ((d -= 1) << 2)) & 0xf) + 1])
        end
        write(io, 0x27)
    end
    return
end

function show(io::IO, ::MIME"text/plain", c::T) where {T<:AbstractChar}
    show(io, c)
    get(io, :compact, false)::Bool && return
    if !ismalformed(c)
        print(io, ": ")
        if isoverlong(c)
            print(io, "[overlong] ")
            u = decode_overlong(c)
            c = T(u)
        else
            u = codepoint(c)
        end
        h = uppercase(string(u, base = 16, pad = 4))
        print(io, (isascii(c) ? "ASCII/" : ""), "Unicode U+", h)
    else
        print(io, ": Malformed UTF-8")
    end
    abr = Unicode.category_abbrev(c)
    str = Unicode.category_string(c)
    print(io, " (category ", abr, ": ", str, ")")
end
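
As a rough illustration of the representation the functions above operate on: the bit operations in `ncodeunits`, `UInt32(c::Char)` and `Char(u::UInt32)` are consistent with a `Char` holding its UTF-8 bytes left-aligned in a 32-bit word. The sketch below checks this for a few characters using only exported functions. The helper name `utf8_word` is hypothetical and not part of Base.

```julia
# Hypothetical helper (not part of Base): the raw 32-bit payload of a Char.
utf8_word(c::Char) = reinterpret(UInt32, c)

# 'A' is U+0041, one UTF-8 byte (0x41), stored in the top byte.
@assert utf8_word('A') == 0x41000000
@assert ncodeunits('A') == 1

# U+00E9 (e with acute accent) is two UTF-8 bytes (0xc3 0xa9), again left-aligned.
@assert utf8_word('\u00e9') == 0xc3a90000
@assert ncodeunits('\u00e9') == 2
@assert codepoint('\u00e9') == 0x000000e9

# Round trip through the UInt32 constructor.
@assert Char(0x000000e9) == '\u00e9'

# '\0' has an all-zero payload but still occupies one code unit.
@assert utf8_word('\0') == 0x00000000
@assert ncodeunits('\0') == 1

# The payload of a Char read from a String matches the String's bytes.
s = "\u20ac"  # euro sign, UTF-8 bytes 0xe2 0x82 0xac
@assert codeunits(s) == [0xe2, 0x82, 0xac]
@assert utf8_word(first(s)) == 0xe282ac00
@assert ncodeunits(first(s)) == ncodeunits(s) == 3
```

Storing the encoded bytes rather than the code point is what lets `==` and `isless` on `Char` compare the raw `UInt32` payloads directly, as the definitions above do.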

