JuliaLang / julia / #38002

06 Feb 2025 06:14AM UTC coverage: 20.322% (-2.4%) from 22.722%
push

local

web-flow
bpart: Fully switch to partitioned semantics (#57253)

This is the final PR in the binding partitions series (modulo bugs and
tweaks), i.e. it closes #54654 and thus closes #40399, which was the
original design sketch.

This activates the full designed semantics for binding partitions, in
particular allowing safe replacement of const bindings. Notably, it
allows struct redefinitions. This thus closes timholy/Revise.jl#18 and
also closes #38584.

The biggest semantic change here is probably that this gets rid of the
notion of "resolvedness" of a binding. Previously, a lot of the behavior
of our implementation depended on when bindings were "resolved", which
could happen at basically an arbitrary point (in the compiler, in REPL
completion, in a different thread), making a lot of the semantics around
bindings ill- or at least implementation-defined. There are several
related issues in the bugtracker, so this closes #14055, closes #44604,
closes #46354, and closes #30277.

It is also the last step to close #24569.
It also supports bindings for undef->defined transitions and thus closes
#53958 and closes #54733. However, this is not yet activated for
performance reasons and may need some further optimization.

Since resolvedness no longer exists, we need to replace it with some
hopefully more well-defined semantics. I will describe the semantics
below, but before I do I will make two notes:

1. There are a number of cases where these semantics will behave
slightly differently than the old semantics absent some other task going
around resolving random bindings.
2. The new behavior (except for the replacement stuff) was generally
permissible under the old semantics if the bindings happened to be
resolved at the right time.

With all that said, there are essentially three "strengths" of bindings:

1. Implicit Bindings: Anything implicitly obtained from `using Mod`, "no
binding", plus slightly more exotic corner cases around conflicts

2. Weakly declared bindin... (continued)

11 of 111 new or added lines in 7 files covered. (9.91%)

1273 existing lines in 68 files now uncovered.

9908 of 48755 relevant lines covered (20.32%)

105126.48 hits per line

Source File

19.44
/base/char.jl
# This file is a part of Julia. License is MIT: https://julialang.org/license

"""
The `AbstractChar` type is the supertype of all character implementations
in Julia. A character represents a Unicode code point, and can be converted
to an integer via the [`codepoint`](@ref) function in order to obtain the
numerical value of the code point, or constructed from the same integer.
These numerical values determine how characters are compared with `<` and `==`,
for example.  New `T <: AbstractChar` types should define a `codepoint(::T)`
method and a `T(::UInt32)` constructor, at minimum.

A given `AbstractChar` subtype may be capable of representing only a subset
of Unicode, in which case conversion from an unsupported `UInt32` value
may throw an error. Conversely, the built-in [`Char`](@ref) type represents
a *superset* of Unicode (in order to losslessly encode invalid byte streams),
in which case conversion of a non-Unicode value *to* `UInt32` throws an error.
The [`isvalid`](@ref) function can be used to check which codepoints are
representable in a given `AbstractChar` type.

Internally, an `AbstractChar` type may use a variety of encodings.  Conversion
via `codepoint(char)` will not reveal this encoding because it always returns the
Unicode value of the character. `print(io, c)` of any `c::AbstractChar`
produces an encoding determined by `io` (UTF-8 for all built-in `IO`
types), via conversion to `Char` if necessary.

`write(io, c)`, in contrast, may emit an encoding depending on
`typeof(c)`, and `read(io, typeof(c))` should read the same encoding as `write`.
New `AbstractChar` types must provide their own implementations of
`write` and `read`.
"""
AbstractChar

"""
    Char(c::Union{Number,AbstractChar})

`Char` is a 32-bit [`AbstractChar`](@ref) type that is the default representation
of characters in Julia. `Char` is the type used for character literals like `'x'`
and it is also the element type of [`String`](@ref).

In order to losslessly represent arbitrary byte streams stored in a `String`,
a `Char` value may store information that cannot be converted to a Unicode
codepoint — converting such a `Char` to `UInt32` will throw an error.
The [`isvalid(c::Char)`](@ref) function can be used to query whether `c`
represents a valid Unicode character.
"""
Char

@constprop :aggressive (::Type{T})(x::Number) where {T<:AbstractChar} = T(UInt32(x))  # ×
@constprop :aggressive AbstractChar(x::Number) = Char(x)  # ×
@constprop :aggressive (::Type{T})(x::AbstractChar) where {T<:Union{Number,AbstractChar}} = T(codepoint(x))  # 2✔
@constprop :aggressive (::Type{T})(x::AbstractChar) where {T<:Union{Int32,Int64}} = codepoint(x) % T  # 6✔
(::Type{T})(x::T) where {T<:AbstractChar} = x  # ×

"""
    ncodeunits(c::Char) -> Int

Return the number of code units required to encode a character as UTF-8.
This is the number of bytes which will be printed if the character is written
to an output stream, or `ncodeunits(string(c))` but computed efficiently.

!!! compat "Julia 1.1"
    This method requires at least Julia 1.1. In Julia 1.0 consider
    using `ncodeunits(string(c))`.
"""
function ncodeunits(c::Char)
    u = reinterpret(UInt32, c)  # 18,715✔
    # We care about how many trailing bytes are all zero
    # subtract that from the total number of bytes
    n_nonzero_bytes = sizeof(UInt32) - div(trailing_zeros(u), 0x8)  # 18,715✔
    # Take care of '\0', which has an all-zero bitpattern
    n_nonzero_bytes + iszero(u)  # 18,715✔
end

"""
    codepoint(c::AbstractChar) -> Integer

Return the Unicode codepoint (an unsigned integer) corresponding
to the character `c` (or throw an exception if `c` does not represent
a valid character). For `Char`, this is a `UInt32` value, but
`AbstractChar` types that represent only a subset of Unicode may
return a different-sized integer (e.g. `UInt8`).
"""
function codepoint end

@constprop :aggressive codepoint(c::Char) = UInt32(c)  # 16✔

struct InvalidCharError{T<:AbstractChar} <: Exception
    char::T  # ×
end
struct CodePointError{T<:Integer} <: Exception
    code::T  # ×
end
@noinline throw_invalid_char(c::AbstractChar) = throw(InvalidCharError(c))  # ×
@noinline throw_code_point_err(u::Integer) = throw(CodePointError(u))  # ×

function ismalformed(c::Char)
    u = bitcast(UInt32, c)  # 11,412✔
    l1 = leading_ones(u) << 3  # 11,412✔
    t0 = trailing_zeros(u) & 56  # 11,412✔
    (l1 == 8) | (l1 + t0 > 32) |  # 11,412✔
    (((u & 0x00c0c0c0) ⊻ 0x00808080) >> t0 != 0)
end

@inline is_overlong_enc(u::UInt32) = (u >> 24 == 0xc0) | (u >> 24 == 0xc1) | (u >> 21 == 0x0704) | (u >> 20 == 0x0f08)  # 7,991✔

function isoverlong(c::Char)
    u = bitcast(UInt32, c)  # 7,991✔
    is_overlong_enc(u)  # 7,991✔
end

# fallback: other AbstractChar types, by default, are assumed
#           not to support malformed or overlong encodings.

"""
    ismalformed(c::AbstractChar) -> Bool

Return `true` if `c` represents malformed (non-Unicode) data according to the
encoding used by `c`. Defaults to `false` for non-`Char` types.

See also [`show_invalid`](@ref).
"""
ismalformed(c::AbstractChar) = false  # ×

"""
    isoverlong(c::AbstractChar) -> Bool

Return `true` if `c` represents an overlong UTF-8 sequence. Defaults
to `false` for non-`Char` types.

See also [`decode_overlong`](@ref) and [`show_invalid`](@ref).
"""
isoverlong(c::AbstractChar) = false  # ×

@constprop :aggressive function UInt32(c::Char)
    # TODO: use optimized inline LLVM
    u = bitcast(UInt32, c)  # 3,542✔
    u < 0x80000000 && return u >> 24  # 3,543✔
    l1 = leading_ones(u)  # ×
    t0 = trailing_zeros(u) & 56  # ×
    (l1 == 1) | (8l1 + t0 > 32) |  # ×
    ((((u & 0x00c0c0c0) ⊻ 0x00808080) >> t0 != 0) | is_overlong_enc(u)) &&
        throw_invalid_char(c)
    u &= 0xffffffff >> l1  # ×
    u >>= t0  # ×
    ((u & 0x0000007f) >> 0) | ((u & 0x00007f00) >> 2) |  # ×
    ((u & 0x007f0000) >> 4) | ((u & 0x7f000000) >> 6)
end

"""
    decode_overlong(c::AbstractChar) -> Integer

When [`isoverlong(c)`](@ref) is `true`, `decode_overlong(c)` returns
the Unicode codepoint value of `c`. `AbstractChar` implementations
that support overlong encodings should implement `Base.decode_overlong`.
"""
function decode_overlong end

@constprop :aggressive function decode_overlong(c::Char)  # ×
    u = bitcast(UInt32, c)  # ×
    l1 = leading_ones(u)  # ×
    t0 = trailing_zeros(u) & 56  # ×
    u &= 0xffffffff >> l1  # ×
    u >>= t0  # ×
    ((u & 0x0000007f) >> 0) | ((u & 0x00007f00) >> 2) |  # ×
    ((u & 0x007f0000) >> 4) | ((u & 0x7f000000) >> 6)
end

@constprop :aggressive function Char(u::UInt32)
    u < 0x80 && return bitcast(Char, u << 24)  # ×
    u < 0x00200000 || throw_code_point_err(u)  # ×
    c = ((u << 0) & 0x0000003f) | ((u << 2) & 0x00003f00) |  # ×
        ((u << 4) & 0x003f0000) | ((u << 6) & 0x3f000000)
    c = u < 0x00000800 ? (c << 16) | 0xc0800000 :  # ×
        u < 0x00010000 ? (c << 08) | 0xe0808000 :
                         (c << 00) | 0xf0808080
    bitcast(Char, c)  # ×
end

@constprop :aggressive @noinline UInt32_cold(c::Char) = UInt32(c)  # ×
@constprop :aggressive function (T::Union{Type{Int8},Type{UInt8}})(c::Char)  # 2✔
    i = bitcast(Int32, c)  # 306✔
    i ≥ 0 ? ((i >>> 24) % T) : T(UInt32_cold(c))  # 306✔
end

@constprop :aggressive @noinline Char_cold(b::UInt32) = Char(b)  # ×
@constprop :aggressive function Char(b::Union{Int8,UInt8})
    0 ≤ b ≤ 0x7f ? bitcast(Char, (b % UInt32) << 24) : Char_cold(UInt32(b))  # 130,060✔
end

convert(::Type{AbstractChar}, x::Number) = Char(x) # default to Char  # ×
convert(::Type{T}, x::Number) where {T<:AbstractChar} = T(x)::T  # ×
convert(::Type{T}, x::AbstractChar) where {T<:Number} = T(x)::T  # 105✔
convert(::Type{T}, c::AbstractChar) where {T<:AbstractChar} = T(c)::T  # ×
convert(::Type{T}, c::T) where {T<:AbstractChar} = c  # ×

rem(x::AbstractChar, ::Type{T}) where {T<:Number} = rem(codepoint(x), T)  # 4✔

typemax(::Type{Char}) = bitcast(Char, typemax(UInt32))  # ×
typemin(::Type{Char}) = bitcast(Char, typemin(UInt32))  # ×

size(c::AbstractChar) = ()  # ×
size(c::AbstractChar, d::Integer) = d < 1 ? throw(BoundsError()) : 1  # ×
ndims(c::AbstractChar) = 0  # ×
ndims(::Type{<:AbstractChar}) = 0  # ×
length(c::AbstractChar) = 1  # ×
IteratorSize(::Type{Char}) = HasShape{0}()  # ×
firstindex(c::AbstractChar) = 1  # ×
lastindex(c::AbstractChar) = 1  # ×
getindex(c::AbstractChar) = c  # ×
getindex(c::AbstractChar, i::Integer) = i == 1 ? c : throw(BoundsError())  # ×
getindex(c::AbstractChar, I::Integer...) = all(x -> x == 1, I) ? c : throw(BoundsError())  # ×
first(c::AbstractChar) = c  # ×
last(c::AbstractChar) = c  # ×
eltype(::Type{T}) where {T<:AbstractChar} = T  # ×

iterate(c::AbstractChar, done=false) = done ? nothing : (c, true)  # ×
isempty(c::AbstractChar) = false  # ×
in(x::AbstractChar, y::AbstractChar) = x == y  # 13,449✔

==(x::Char, y::Char) = bitcast(UInt32, x) == bitcast(UInt32, y)  # 8,150,844✔
isless(x::Char, y::Char) = bitcast(UInt32, x) < bitcast(UInt32, y)  # 1,921,772✔
hash(x::Char, h::UInt) =  # 2✔
    hash_uint64(((bitcast(UInt32, x) + UInt64(0xd4d64234)) << 32) ⊻ UInt64(h))

first_utf8_byte(c::Char) = (bitcast(UInt32, c) >> 24) % UInt8  # 3,627✔
first_utf8_byte(c::AbstractChar) = first_utf8_byte(Char(c)::Char)  # ×

# fallbacks:
isless(x::AbstractChar, y::AbstractChar) = isless(Char(x), Char(y))  # ×
==(x::AbstractChar, y::AbstractChar) = Char(x) == Char(y)  # ×
hash(x::AbstractChar, h::UInt) = hash(Char(x), h)  # ×
widen(::Type{T}) where {T<:AbstractChar} = T  # ×

@inline -(x::AbstractChar, y::AbstractChar) = Int(x) - Int(y)  # 4✔
@inline function -(x::T, y::Integer) where {T<:AbstractChar}
    if x isa Char  # ×
        u = Int32((bitcast(UInt32, x) >> 24) % Int8)  # × UNCOV
        if u >= 0 # inline the runtime fast path  # × UNCOV
            z = u - y  # × UNCOV
            return 0 <= z < 0x80 ? bitcast(Char, (z % UInt32) << 24) : Char(UInt32(z))  # × UNCOV
        end
    end
    return T(Int32(x) - Int32(y))  # ×
end
@inline function +(x::T, y::Integer) where {T<:AbstractChar}
    if x isa Char  # ×
        u = Int32((bitcast(UInt32, x) >> 24) % Int8)  # ×
        if u >= 0 # inline the runtime fast path  # ×
            z = u + y  # ×
            return 0 <= z < 0x80 ? bitcast(Char, (z % UInt32) << 24) : Char(UInt32(z))  # ×
        end
    end
    return T(Int32(x) + Int32(y))  # ×
end
@inline +(x::Integer, y::AbstractChar) = y + x  # ×

# `print` should output UTF-8 by default for all AbstractChar types.
# (Packages may implement other IO subtypes to specify different encodings.)
# In contrast, `write(io, c)` outputs a `c` in an encoding determined by typeof(c).
print(io::IO, c::Char) = (write(io, c); nothing)  # 137,672✔
print(io::IO, c::AbstractChar) = print(io, Char(c)) # fallback: convert to output UTF-8  # ×

const hex_chars = UInt8['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
                        'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
                        's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

function show_invalid(io::IO, c::Char)  # ×
    write(io, 0x27)  # ×
    u = bitcast(UInt32, c)  # ×
    while true  # ×
        a = hex_chars[((u >> 28) & 0xf) + 1]  # ×
        b = hex_chars[((u >> 24) & 0xf) + 1]  # ×
        write(io, 0x5c, UInt8('x'), a, b)  # ×
        (u <<= 8) == 0 && break  # ×
    end  # ×
    write(io, 0x27)  # ×
end

"""
    show_invalid(io::IO, c::AbstractChar)

Called by `show(io, c)` when [`isoverlong(c)`](@ref) or
[`ismalformed(c)`](@ref) return `true`.   Subclasses
of `AbstractChar` should define `Base.show_invalid` methods
if they support storing invalid character data.
"""
show_invalid

# show c to io, assuming UTF-8 encoded output
function show(io::IO, c::AbstractChar)  # ×
    if c <= '\\'  # ×
        b = c == '\0' ? 0x30 :  # ×
            c == '\a' ? 0x61 :
            c == '\b' ? 0x62 :
            c == '\t' ? 0x74 :
            c == '\n' ? 0x6e :
            c == '\v' ? 0x76 :
            c == '\f' ? 0x66 :
            c == '\r' ? 0x72 :
            c == '\e' ? 0x65 :
            c == '\'' ? 0x27 :
            c == '\\' ? 0x5c : 0xff
        if b != 0xff  # ×
            write(io, 0x27, 0x5c, b, 0x27)  # ×
            return  # ×
        end
    end
    if isoverlong(c) || ismalformed(c)  # ×
        show_invalid(io, c)  # ×
    elseif isprint(c)  # ×
        write(io, 0x27)  # ×
        print(io, c) # use print, not write, to use UTF-8 for any AbstractChar  # ×
        write(io, 0x27)  # ×
    else # unprintable, well-formed, non-overlong Unicode
        u = codepoint(c)  # ×
        write(io, 0x27, 0x5c, u <= 0x7f ? 0x78 : u <= 0xffff ? 0x75 : 0x55)  # ×
        d = max(2, 8 - (leading_zeros(u) >> 2))  # ×
        while 0 < d  # ×
            write(io, hex_chars[((u >> ((d -= 1) << 2)) & 0xf) + 1])  # ×
        end  # ×
        write(io, 0x27)  # ×
    end
    return  # ×
end

function show(io::IO, ::MIME"text/plain", c::T) where {T<:AbstractChar}  # ×
    show(io, c)  # ×
    get(io, :compact, false)::Bool && return  # ×
    if !ismalformed(c)  # ×
        print(io, ": ")  # ×
        if isoverlong(c)  # ×
            print(io, "[overlong] ")  # ×
            u = decode_overlong(c)  # ×
            c = T(u)  # ×
        else
            u = codepoint(c)  # ×
        end
        h = uppercase(string(u, base = 16, pad = 4))  # ×
        print(io, (isascii(c) ? "ASCII/" : ""), "Unicode U+", h)  # ×
    else
        print(io, ": Malformed UTF-8")  # ×
    end
    abr = Unicode.category_abbrev(c)  # ×
    str = Unicode.category_string(c)  # ×
    print(io, " (category ", abr, ": ", str, ")")  # ×
end
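
The behaviour documented in the listing above can be checked from a Julia session. The following is a minimal sketch, not part of `base/char.jl`: it assumes a stock Julia 1.x session, and it calls the unexported names `Base.ismalformed`, `Base.isoverlong`, `Base.decode_overlong` and `Base.InvalidCharError` exactly as they are defined in the listing. The variable names `bad`, `over` and `threw` are illustrative only.

```julia
# Sketch only: exercises the functions shown in the listing above.

# ncodeunits counts the UTF-8 bytes of a Char; '\0' is special-cased to 1.
@assert ncodeunits('a') == 1
@assert ncodeunits('\u00e9') == 2    # two-byte UTF-8 sequence
@assert ncodeunits('\u20ac') == 3    # three-byte UTF-8 sequence
@assert ncodeunits('\U1f600') == 4   # four-byte UTF-8 sequence
@assert ncodeunits('\0') == 1

# codepoint(::Char) returns a UInt32, and Char(::UInt32) encodes it back.
@assert codepoint('a') === 0x00000061
@assert Char(0x000020ac) == '\u20ac'

# Char arithmetic: Char ± Integer gives a Char, Char - Char gives an Int.
@assert 'a' + 1 == 'b'
@assert 'z' - 'a' == 25
@assert UInt8('a') == 0x61

# A truncated three-byte sequence is stored losslessly but is malformed.
bad = "\xe2\x88"[1]
@assert Base.ismalformed(bad)
@assert !isvalid(bad)
@assert repr(bad) == "'\\xe2\\x88'"   # printed via show_invalid

# Converting a malformed Char to a code point throws InvalidCharError.
threw = try
    codepoint(bad)
    false
catch err
    err isa Base.InvalidCharError
end
@assert threw

# "\xc0\xa0" is an overlong encoding of U+0020: well-formed but not valid.
over = "\xc0\xa0"[1]
@assert Base.isoverlong(over)
@assert !Base.ismalformed(over)
@assert Base.decode_overlong(over) == 0x20

# show escapes control characters; the text/plain form adds the code point.
@assert repr('\n') == "'\\n'"
@assert repr("text/plain", 'a') ==
    "'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)"
```

The invalid characters are obtained by indexing a `String` rather than by writing character literals, since the docstring for `Char` describes them as arising from arbitrary byte streams stored in a `String`.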