JuliaLang / julia / 1558

06 Jun 2026 01:08AM UTC coverage: 77.813% (+0.1%) from 77.697%
push · buildkite · web-flow

scheduler: avoid O(nthreads) wake-storm on every `@spawn` (#61826)

Fixes #61820
Fixes #50425

[Image: Linux - Ryzen 9 5950X]
[Image: Linux - Ryzen Threadripper PRO 7995WX 96-Cores]
[Image: Windows - i7-8700]
[Image: macOS - M2 Pro 6 p cores]

Developed with Claude:

---

`schedule` for a non-sticky task previously broadcast a wake to every thread via `jl_wakeup_thread(-1)`, performing a per-thread lock/signal/unlock under `wakeup_thread`'s loop. Per-insert cost was linear in `jl_n_threads`, and on systems where the producer can be preempted (e.g. SMT + oversubscribed thread count on Windows/Linux) every iteration hit the kernel park/unpark path, producing the >100x slowdown reported in #61820.

Add `jl_wakeup_threadpool(tpid)`, which wakes at most one sleeping thread in the target pool, with a round-robin start hint to spread wake load. Workers re-check the queue before sleeping (the existing store-buffering dance), so bursty inserts naturally wake additional consumers across the per-insert calls without a broadcast.
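The commit message describes this policy but does not show the runtime's C code. What follows is only a hypothetical, single-threaded Julia model of "wake at most one sleeper, starting from a round-robin hint". `wake_one!` and the `sleeping` flag vector are invented for illustration. The real scheduler works with atomics, locks and OS park/unpark rather than a `Vector{Bool}`.

```julia
# Hypothetical model, not the runtime's implementation.
# sleeping[k] == true means worker k of the target pool is parked.
# Returns the index of the single worker that was woken, or `nothing`.
function wake_one!(sleeping::Vector{Bool}, hint::Integer)
    n = length(sleeping)
    for k in 0:n-1
        idx = mod1(hint + k, n)       # start at the hint, wrap around the pool
        if sleeping[idx]
            sleeping[idx] = false     # one "signal" instead of n of them
            return idx
        end
    end
    return nothing                    # nobody is parked: nothing to wake
end

sleeping = [true, false, true, true]
woken = [wake_one!(sleeping, h) for h in (2, 2, 1, 1)]
```

In this model, a burst of inserts wakes at most one additional worker per call. The old `jl_wakeup_thread(-1)` broadcast instead signalled every thread on every call.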

Restricting wakes to the task's own threadpool is also a correctness improvement, since `Partr.multiq_deletemin` only ever returns tasks from the caller's pool, so waking out-of-pool threads was pure overhead.

The round-robin start hint is sharded across 64 cache-padded stripes indexed by the producing thread's tid. A single global atomic counter became the dominant cost of `@spawn` at high producer counts on multi-die parts (e.g. Ryzen 5950X... (continued)
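Below is a minimal sketch of the striping idea. It assumes Julia 1.7+ for `@atomic` fields, and `Stripe`, `STRIPES` and `next_hint` are hypothetical names, not the runtime's. Each stripe is padded to 64 bytes so that producers mapped to different stripes are unlikely to share a cache line. Julia does not guarantee that these heap objects are cache-line aligned.

```julia
# Hypothetical sketch (Julia 1.7+): 64 independent counters instead of one
# global atomic, selected by the producer's thread id.
mutable struct Stripe
    @atomic counter::Int
    pad::NTuple{7,Int}   # 8-byte counter + 56 bytes of padding = 64 bytes
end
Stripe() = Stripe(0, ntuple(_ -> 0, 7))

const NSTRIPES = 64
const STRIPES = [Stripe() for _ in 1:NSTRIPES]

# Start hint in 1:nworkers for a producer running on thread `tid`.
function next_hint(tid::Integer, nworkers::Integer)
    s = STRIPES[mod1(tid, NSTRIPES)]
    old = (@atomic s.counter + 1).first   # atomic fetch-and-add; returns old => new
    return mod(old, nworkers) + 1
end

hints = [next_hint(1, 4), next_hint(1, 4), next_hint(2, 4), next_hint(65, 4)]
```

In this sketch, up to 64 producer threads each advance their own counter. Thread 65 wraps around onto the same stripe as thread 1, which is why the last hint continues thread 1's sequence.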

2 of 2 new or added lines in 1 file covered. (100.0%)

482 existing lines in 17 files now uncovered.

66004 of 84824 relevant lines covered (77.81%)

22990385.8 hits per line

Source File: /base/strings/string.jl (88.51%)

# This file is a part of Julia. License is MIT: https://julialang.org/license

"""
    StringView{T <: AbstractVector{UInt8}} <: AbstractString

An `AbstractString` representation of any `vector` of `UInt8` data,
interpreted as UTF-8 encoded Unicode.
Similar to `String`, the underlying data may be invalid UTF-8.

`StringView(v::AbstractVector{UInt8})::StringView` does not make a copy of
or modify the `v`. Use `codeunits` to get `v` from the `StringView`.
After construction, `v` may be mutated, which will be reflected in
the resulting `StringView`.

!!! compat "Julia 1.14"
    The `StringView` type requires at least Julia 1.14.

# Examples
```jldoctest
julia> arr = [0x61, 0xf0, 0x63, 0x64];

julia> s = StringView(arr)
"a\\xf0cd"

julia> codeunits(s) === arr
true

julia> arr[2] = Int('b'); s
"abcd"
```
"""
struct StringView{T <: AbstractVector{UInt8}} <: AbstractString
    data::T

    function StringView{T}(data::T) where {T <: AbstractVector{UInt8}}
        # For now, StringViews code assumes one-based indexing
        require_one_based_indexing(data)

        # Prevent someone constructing e.g. a `StringView{AbstractVector{UInt8}}`,
        # the existence of which will complicate the implementation and provide
        # no usability benefit.
        if !isconcretetype(T)
            throw(ArgumentError("StringView must be parameterized with a concrete type"))
        end

        new{T}(data)
    end
end
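Illustrative aside (not part of `string.jl`): `StringView` requires Julia 1.14, so this sketch uses a hypothetical stand-in type, `ByteView`, to show the same two inner-constructor guards on earlier Julia versions:

- `Base.require_one_based_indexing` rejects arrays with offset axes.
- `isconcretetype` rejects abstract parameters such as `AbstractVector{UInt8}`.

```julia
# Hypothetical stand-in (not Base's StringView) with the same constructor guards.
struct ByteView{T<:AbstractVector{UInt8}}
    data::T
    function ByteView{T}(data::T) where {T<:AbstractVector{UInt8}}
        Base.require_one_based_indexing(data)   # throws for offset axes
        isconcretetype(T) ||
            throw(ArgumentError("ByteView must be parameterized with a concrete type"))
        return new{T}(data)
    end
end
# Convenience constructor: infer the concrete T from the argument.
ByteView(data::T) where {T<:AbstractVector{UInt8}} = ByteView{T}(data)

v = UInt8[0x61, 0x62]
bv = ByteView(v)                      # fine: T == Vector{UInt8}
rejected = try
    ByteView{AbstractVector{UInt8}}(v)
    false
catch err
    err isa ArgumentError
end
```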


"""
    StringIndexError(str, i)

An error occurred when trying to access `str` at index `i` that is not valid.
"""
struct StringIndexError <: Exception
    string::AbstractString
    index::Int
end
@noinline string_index_err((@nospecialize s::AbstractString), i::Integer) =
    throw(StringIndexError(s, Int(i)))
function showerror(io::IO, exc::StringIndexError)
    s = exc.string
    print(io, "StringIndexError: ", "invalid index [$(exc.index)]")
    if firstindex(s) <= exc.index <= ncodeunits(s)
        iprev = thisind(s, exc.index)
        inext = nextind(s, iprev)
        escprev = escape_string(s[iprev:iprev])
        if inext <= ncodeunits(s)
            escnext = escape_string(s[inext:inext])
            print(io, ", valid nearby indices [$iprev]=>'$escprev', [$inext]=>'$escnext'")
        else
            print(io, ", valid nearby index [$iprev]=>'$escprev'")
        end
    end
end
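The nearby indices that `showerror` reports come straight from `thisind` and `nextind`. Here is a small sketch that uses only those public functions. `nearby_indices` is a hypothetical helper name.

```julia
# Sketch: recompute the "valid nearby indices" that showerror prints.
function nearby_indices(s::AbstractString, i::Integer)
    iprev = thisind(s, i)        # start of the character containing byte i
    inext = nextind(s, iprev)    # start of the following character
    return iprev, inext
end

s = "aéb"                        # 'é' occupies bytes 2 and 3
near = nearby_indices(s, 3)
err = try
    s[3]                         # byte 3 is inside 'é'
catch e
    e
end
```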

@inline between(b::T, lo::T, hi::T) where {T<:Integer} = (lo ≤ b) & (b ≤ hi)

"""
    String <: AbstractString

The default string type in Julia, used by e.g. string literals.

`String`s are immutable sequences of `Char`s. A `String` is stored internally as
a contiguous byte array, and while they are interpreted as being UTF-8 encoded,
they can be composed of any byte sequence. Use [`isvalid`](@ref) to validate
that the underlying byte sequence is valid as UTF-8.
"""
String

## constructors and conversions ##

# String constructor docstring from boot.jl, workaround for #16730
# and the unavailability of @doc in boot.jl context.
"""
    String(v::AbstractVector{UInt8})

Create a new `String` object using the data buffer from byte vector `v`.
If `v` is a `Vector{UInt8}` it will be truncated to zero length and future
modification of `v` cannot affect the contents of the resulting string.
To avoid truncation of `Vector{UInt8}` data, use `String(copy(v))`; for other
`AbstractVector` types, `String(v)` already makes a copy.

When possible, the memory of `v` will be used without copying when the `String`
object is created. This is guaranteed to be the case for byte vectors returned
by [`take!`](@ref) on a writable [`IOBuffer`](@ref) and by calls to
[`read(io, nb)`](@ref). This allows zero-copy conversion of I/O data to strings.
In other cases, `Vector{UInt8}` data may be copied, but `v` is truncated anyway
to guarantee consistent behavior.
"""
String(v::AbstractVector{UInt8}) = unsafe_takestring(copyto!(StringMemory(length(v)), v))

function String(v::Vector{UInt8})
    len = length(v)
    len == 0 && return ""
    ref = v.ref
    if ref.ptr_or_offset == ref.mem.ptr
        str = ccall(:jl_genericmemory_to_string, Ref{String}, (Any, Int), ref.mem, len)
    else
        str = ccall(:jl_pchar_to_string, Ref{String}, (Ptr{UInt8}, Int), ref, len)
    end
    # optimized empty!(v); sizehint!(v, 0) calls
    setfield!(v, :size, (0,))
    setfield!(v, :ref, memoryref(Memory{UInt8}()))
    return str
end
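Here is a short usage sketch of the truncation rule documented above. It uses only public Base functions.

```julia
v = UInt8[0x68, 0x69]            # bytes of "hi"
s = String(v)                    # Vector{UInt8}: buffer handed over, v emptied
after_v = length(v)

w = UInt8[0x68, 0x69]
t = String(copy(w))              # copy first if w must stay intact

io = IOBuffer()
write(io, "abc")
u = String(take!(io))            # the documented zero-copy path for take! results
```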

"""
    unsafe_takestring(m::Memory{UInt8})::String

Create a `String` from `m`, changing the interpretation of the contents of `m`.
This is done without copying, if possible. Thus, any access to `m` after
calling this function, either to read or to write, is undefined behavior.
"""
function unsafe_takestring(m::Memory{UInt8})
    isempty(m) ? "" : ccall(:jl_genericmemory_to_string, Ref{String}, (Any, Int), m, length(m))
end

"""
    takestring!(x) -> String

Create a string from the content of `x`, emptying `x`.

# Examples
```jldoctest
julia> v = [0x61, 0x62, 0x63];

julia> s = takestring!(v)
"abc"

julia> isempty(v)
true
```
"""
takestring!(v::Vector{UInt8}) = String(v)

"""
    unsafe_string(p::Ptr{UInt8}, [length::Integer])
    unsafe_string(p::Cstring)

Copy a string from the address of a C-style (NUL-terminated) string encoded as UTF-8.
(The pointer can be safely freed afterwards.) If `length` is specified
(the length of the data in bytes), the string does not have to be NUL-terminated.

This function is labeled "unsafe" because it will crash if `p` is not
a valid memory address to data of the requested length.
"""
function unsafe_string(p::Union{Ptr{UInt8},Ptr{Int8}}, len::Integer)
    p == C_NULL && throw(ArgumentError("cannot convert NULL to string"))
    ccall(:jl_pchar_to_string, Ref{String}, (Ptr{UInt8}, Int), p, len)
end
function unsafe_string(p::Union{Ptr{UInt8},Ptr{Int8}})
    p == C_NULL && throw(ArgumentError("cannot convert NULL to string"))
    ccall(:jl_cstr_to_string, Ref{String}, (Ptr{UInt8},), p)
end
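This usage sketch calls both `unsafe_string` methods on a Julia-owned buffer. `GC.@preserve` keeps the buffer alive while a raw pointer into it is in use.

```julia
buf = UInt8[0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x00]   # "hello" followed by a NUL byte

# GC.@preserve keeps `buf` rooted while we hold a raw pointer into it.
s_nul, s_len = GC.@preserve buf begin
    p = pointer(buf)
    (unsafe_string(p),       # reads up to (not including) the NUL byte
     unsafe_string(p, 3))    # explicit byte length: no NUL needed
end

# A NULL pointer is rejected with an ArgumentError.
null_rejected = try
    unsafe_string(Ptr{UInt8}(C_NULL))
    false
catch e
    e isa ArgumentError
end
```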

# This is `@assume_effects :total !:consistent @ccall jl_alloc_string(n::Csize_t)::Ref{String}`,
# but the macro is not available at this time in bootstrap, so we write it manually.
const _string_n_override = 0x04ee
@eval _string_n(n::Integer) = $(Expr(:foreigncall, QuoteNode(:jl_alloc_string), Ref{String},
    :(Core.svec(Csize_t)), 1, QuoteNode((:ccall, _string_n_override, false)), :(convert(Csize_t, n))))

"""
    String(s::AbstractString)

Create a new `String` from an existing `AbstractString`.
"""
String(s::AbstractString) = print_to_string(s)
@assume_effects :total String(s::Symbol) = unsafe_string(unsafe_convert(Ptr{UInt8}, s))

unsafe_wrap(::Type{Memory{UInt8}}, s::String) = ccall(:jl_string_to_genericmemory, Ref{Memory{UInt8}}, (Any,), s)
unsafe_wrap(::Type{Vector{UInt8}}, s::String) = wrap(Array, unsafe_wrap(Memory{UInt8}, s))

Vector{UInt8}(s::CodeUnits{UInt8,String}) = copyto!(Vector{UInt8}(undef, length(s)), s)
Vector{UInt8}(s::String) = Vector{UInt8}(codeunits(s))
Array{UInt8}(s::String)  = Vector{UInt8}(codeunits(s))

String(s::CodeUnits{UInt8,String}) = s.s

## low-level functions ##

pointer(s::String) = unsafe_convert(Ptr{UInt8}, s)
pointer(s::String, i::Integer) = pointer(s) + Int(i)::Int - 1

ncodeunits(s::String) = Core.sizeof(s)
codeunit(s::String) = UInt8

codeunit(s::String, i::Integer) = codeunit(s, Int(i)::Int)
@assume_effects :foldable @inline function codeunit(s::String, i::Int)
    @boundscheck checkbounds(s, i)
    b = GC.@preserve s unsafe_load(pointer(s, i))
    return b
end

## comparison ##

@assume_effects :total _memcmp(a::String, b::String) = @invoke _memcmp(a::Union{Ptr{UInt8},AbstractString},b::Union{Ptr{UInt8},AbstractString})

_memcmp(a::Union{Ptr{UInt8},AbstractString}, b::Union{Ptr{UInt8},AbstractString}) = _memcmp(a, b, min(sizeof(a), sizeof(b)))
function _memcmp(a::Union{Ptr{UInt8},AbstractString}, b::Union{Ptr{UInt8},AbstractString}, len::Int)
    GC.@preserve a b begin
        pa = unsafe_convert(Ptr{UInt8}, a)
        pb = unsafe_convert(Ptr{UInt8}, b)
        memcmp(pa, pb, len % Csize_t) % Int
    end
end

function cmp(a::String, b::String)
    al, bl = sizeof(a), sizeof(b)
    c = _memcmp(a, b)
    return c < 0 ? -1 : c > 0 ? +1 : cmp(al,bl)
end
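Aside: `cmp` compares the common byte prefix with `memcmp`, and only if that prefix is equal does it fall back to the byte lengths. The hypothetical `bytecmp` below models that rule in plain Julia, without `ccall`, so it can be checked against `cmp`.

```julia
# Hypothetical pure-Julia model of `cmp(::String, ::String)`:
# compare the common prefix bytewise (the memcmp step), then the lengths.
function bytecmp(a::AbstractString, b::AbstractString)
    ca, cb = codeunits(a), codeunits(b)
    for k in 1:min(length(ca), length(cb))
        if ca[k] != cb[k]
            return ca[k] < cb[k] ? -1 : 1
        end
    end
    return cmp(length(ca), length(cb))   # equal prefix: the shorter one sorts first
end
```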

==(a::String, b::String) = a===b

typemin(::Type{String}) = ""
typemin(::String) = typemin(String)

## thisind, nextind ##

@propagate_inbounds thisind(s::String, i::Int) = _thisind_str(s, i)

# nothrow: i == ncodeunits(s) always satisfies the bounds check inside _thisind_str
# (it short-circuits when i == 0, otherwise 1 ≤ i ≤ n).
@assume_effects :nothrow lastindex(s::String) = thisind(s, ncodeunits(s)::Int)

# s should be String, StringView, or SubString{String}
@inline function _thisind_str(s, i::Int)
    i == 0 && return 0
    n = ncodeunits(s)
    i == n + 1 && return i
    @boundscheck between(i, 1, n) || throw(BoundsError(s, i))
    @inbounds b = codeunit(s, i)
    (b & 0xc0 == 0x80) & (i-1 > 0) || return i
    (@noinline function _thisind_continued(s, i, n) # mark the rest of the function as a slow-path
        local b
        @inbounds b = codeunit(s, i-1)
        between(b, 0b11000000, 0b11110111) && return i-1
        (b & 0xc0 == 0x80) & (i-2 > 0) || return i
        @inbounds b = codeunit(s, i-2)
        between(b, 0b11100000, 0b11110111) && return i-2
        (b & 0xc0 == 0x80) & (i-3 > 0) || return i
        @inbounds b = codeunit(s, i-3)
        between(b, 0b11110000, 0b11110111) && return i-3
        return i
    end)(s, i, n)
end
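To make the slow path of `_thisind_str` easier to follow, here is a hypothetical byte-vector version (`my_thisind` is an invented name) that turns the three unrolled look-back steps into a loop. Starting from a continuation byte, it looks back at most three bytes for a lead byte whose sequence length covers position `i`. If it finds none, it returns `i` itself.

```julia
# Hypothetical re-implementation over raw bytes, for illustration only.
function my_thisind(cu::AbstractVector{UInt8}, i::Int)
    i == 0 && return 0
    n = length(cu)
    i == n + 1 && return i
    1 <= i <= n || throw(BoundsError(cu, i))
    iscont(b) = b & 0xc0 == 0x80
    iscont(cu[i]) || return i              # already at a character start
    for back in 1:3
        j = i - back
        j >= 1 || return i
        b = cu[j]
        iscont(b) && continue              # keep scanning backwards
        # lead byte: how many bytes does its sequence claim?
        len = b >= 0xf0 ? 4 : b >= 0xe0 ? 3 : b >= 0xc0 ? 2 : 1
        return (b <= 0xf7 && len > back) ? j : i
    end
    return i                               # no suitable lead byte found
end
```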

@propagate_inbounds nextind(s::String, i::Int) = _nextind_str(s, i)

# s should be String or SubString{String}
@inline function _nextind_str(s, i::Int)
    i == 0 && return 1
    n = ncodeunits(s)
    @boundscheck between(i, 1, n) || throw(BoundsError(s, i))
    @inbounds l = codeunit(s, i)
    between(l, 0x80, 0xf7) || return i+1
    (@noinline function _nextind_continued(s, i, n, l) # mark the rest of the function as a slow-path
        if l < 0xc0
            # handle invalid codeunit index by scanning back to the start of this index
            # (which may be the same as this index)
            i′ = @inbounds thisind(s, i)
            i′ >= i && return i+1
            i = i′
            @inbounds l = codeunit(s, i)
            (l < 0x80) | (0xf8 ≤ l) && return i+1
            @assert l >= 0xc0 "invalid codeunit"
        end
        # first continuation byte
        (i += 1) > n && return i
        @inbounds b = codeunit(s, i)
        b & 0xc0 ≠ 0x80 && return i
        ((i += 1) > n) | (l < 0xe0) && return i
        # second continuation byte
        @inbounds b = codeunit(s, i)
        b & 0xc0 ≠ 0x80 && return i
        ((i += 1) > n) | (l < 0xf0) && return i
        # third continuation byte
        @inbounds b = codeunit(s, i)
        return ifelse(b & 0xc0 ≠ 0x80, i, i+1)
    end)(s, i, n, l)
end

## checking UTF-8 & ASCII validity ##
#=
    The UTF-8 Validation is performed by a shift based DFA.
    ┌───────────────────────────────────────────────────────────────────┐
    │    UTF-8 DFA State Diagram    ┌──────────────2──────────────┐     │
    │                               ├────────3────────┐           │     │
    │                 ┌──────────┐  │     ┌─┐        ┌▼┐          │     │
    │      ASCII      │  UTF-8   │  ├─5──►│9├───1────► │          │     │
    │                 │          │  │     ├─┤        │ │         ┌▼┐    │
    │                 │  ┌─0─┐   │  ├─6──►│8├─1,7,9──►4├──1,7,9──► │    │
    │      ┌─0─┐      │  │   │   │  │     ├─┤        │ │         │ │    │
    │      │   │      │ ┌▼───┴┐  │  ├─11─►│7├──7,9───► │ ┌───────►3├─┐  │
    │     ┌▼───┴┐     │ │     │  ▼  │     └─┘        └─┘ │       │ │ │  │
    │     │  0  ├─────┘ │  1  ├─► ──┤                    │  ┌────► │ │  │
    │     └─────┘       │     │     │     ┌─┐            │  │    └─┘ │  │
    │                   └──▲──┘     ├─10─►│5├─────7──────┘  │        │  │
    │                      │        │     ├─┤               │        │  │
    │                      │        └─4──►│6├─────1,9───────┘        │  │
    │          INVALID     │              └─┘                        │  │
    │           ┌─*─┐      └──────────────────1,7,9──────────────────┘  │
    │          ┌▼───┴┐                                                  │
    │          │  2  ◄─── All undefined transitions result in state 2   │
    │          └─────┘                                                  │
    └───────────────────────────────────────────────────────────────────┘

        Validation States
            0 -> _UTF8_DFA_ASCII is the start state and will only stay in this state if the string is only ASCII characters
                        If the DFA ends in this state the string is ASCII only
            1 -> _UTF8_DFA_ACCEPT is the valid complete character state of the DFA once it has encountered a UTF-8 Unicode character
            2 -> _UTF8_DFA_INVALID is only reached by invalid bytes and once in this state it will not change
                    as seen by all 2s in that column of table below
            3 -> One valid continuation byte needed to return to state 1
        4,5,6 -> Two valid continuation bytes needed to return to state 1
        7,8,9 -> Three valid continuation bytes needed to return to state 1

                        Current State
                    0̲  1̲  2̲  3̲  4̲  5̲  6̲  7̲  8̲  9̲
                0 | 0  1  2  2  2  2  2  2  2  2
                1 | 2  2  2  1  3  2  3  2  4  4
                2 | 3  3  2  2  2  2  2  2  2  2
                3 | 4  4  2  2  2  2  2  2  2  2
                4 | 6  6  2  2  2  2  2  2  2  2
    Character   5 | 9  9  2  2  2  2  2  2  2  2     <- Next State
    Class       6 | 8  8  2  2  2  2  2  2  2  2
                7 | 2  2  2  1  3  3  2  4  4  2
                8 | 2  2  2  2  2  2  2  2  2  2
                9 | 2  2  2  1  3  2  3  4  4  2
               10 | 5  5  2  2  2  2  2  2  2  2
               11 | 7  7  2  2  2  2  2  2  2  2

           Shifts | 0  4 10 14 18 24  8 20 12 26

    The shifts that represent each state were derived using the SMT solver Z3, to ensure that, when encoded into
    the rows, each lookup yields the correct shift.

    Each character class row encodes 10 states with shifts as defined above. Shifting the bits of a row right by
    the current state and then masking the result with 0b11110 (0x1E) gives the shift for the new state.


=#

#State type used by UTF-8 DFA
const _UTF8DFAState = UInt32
# Fill the table with 256 UInt32 values representing the DFA transitions for all bytes
const _UTF8_DFA_TABLE = let # let block rather than function doesn't pollute base
    num_classes=12
    num_states=10
    bit_per_state = 6

    # These shifts were derived using an SMT solver
    state_shifts = [0, 4, 10, 14, 18, 24, 8, 20, 12, 26]

    character_classes = [   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                            9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
                            7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
                            7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
                            8, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                            10, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3,
                            11, 6, 6, 6, 5, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8 ]

    # These are the rows discussed in comments above
    state_arrays = [ 0  1  2  2  2  2  2  2  2  2;
                     2  2  2  1  3  2  3  2  4  4;
                     3  3  2  2  2  2  2  2  2  2;
                     4  4  2  2  2  2  2  2  2  2;
                     6  6  2  2  2  2  2  2  2  2;
                     9  9  2  2  2  2  2  2  2  2;
                     8  8  2  2  2  2  2  2  2  2;
                     2  2  2  1  3  3  2  4  4  2;
                     2  2  2  2  2  2  2  2  2  2;
                     2  2  2  1  3  2  3  4  4  2;
                     5  5  2  2  2  2  2  2  2  2;
                     7  7  2  2  2  2  2  2  2  2]

    #This converts the state_arrays into the shift encoded _UTF8DFAState
    class_row = zeros(_UTF8DFAState, num_classes)

    for i = 1:num_classes
        row = _UTF8DFAState(0)
        for j in 1:num_states
            #Calculate the shift required for the next state
            to_shift = UInt8((state_shifts[state_arrays[i,j]+1]) )
            #Shift the next state into the position of the current state
            row = row | (_UTF8DFAState(to_shift) << state_shifts[j])
        end
        class_row[i]=row
    end

    map(c->class_row[c+1],character_classes)
end


const _UTF8_DFA_ASCII = _UTF8DFAState(0) #This state represents the start and end of any valid string
const _UTF8_DFA_ACCEPT = _UTF8DFAState(4) #This state represents the start and end of any valid string
const _UTF8_DFA_INVALID = _UTF8DFAState(10) # If the state machine is ever in this state just stop

# The dfa step is broken out so that it may be used in other functions. The mask was calculated to work with state shifts above
@inline _utf_dfa_step(state::_UTF8DFAState, byte::UInt8) = @inbounds (_UTF8_DFA_TABLE[byte+1] >> state) & _UTF8DFAState(0x0000001E)

@inline function _isvalid_utf8_dfa(state::_UTF8DFAState, bytes::AbstractVector{UInt8}, first::Int = firstindex(bytes), last::Int = lastindex(bytes))
    for i = first:last
       @inbounds state = _utf_dfa_step(state, bytes[i])
    end
    return (state)
end
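This is a hypothetical standalone rebuild of the shift-based DFA. It reuses the shift values and next-state rows from the comment block and table above; the names `SHIFTS`, `NEXT`, `ROWS`, `byte_class` and `dfa_isvalid` are invented. The 256-entry `character_classes` array is replaced by an equivalent range check per byte. The validator treats ending in state 0 (ASCII) or state 1 (accept) as valid.

```julia
# Hypothetical rebuild of the shift-based UTF-8 DFA. State k is represented
# by its shift SHIFTS[k+1]; shifts and rows are copied from the listing.
const SHIFTS = (0, 4, 10, 14, 18, 24, 8, 20, 12, 26)

# NEXT[class+1][state+1] is the next state (same rows as `state_arrays`).
const NEXT = (
    (0, 1, 2, 2, 2, 2, 2, 2, 2, 2),
    (2, 2, 2, 1, 3, 2, 3, 2, 4, 4),
    (3, 3, 2, 2, 2, 2, 2, 2, 2, 2),
    (4, 4, 2, 2, 2, 2, 2, 2, 2, 2),
    (6, 6, 2, 2, 2, 2, 2, 2, 2, 2),
    (9, 9, 2, 2, 2, 2, 2, 2, 2, 2),
    (8, 8, 2, 2, 2, 2, 2, 2, 2, 2),
    (2, 2, 2, 1, 3, 3, 2, 4, 4, 2),
    (2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
    (2, 2, 2, 1, 3, 2, 3, 4, 4, 2),
    (5, 5, 2, 2, 2, 2, 2, 2, 2, 2),
    (7, 7, 2, 2, 2, 2, 2, 2, 2, 2),
)

# Same byte -> class mapping as the `character_classes` array.
function byte_class(b::UInt8)
    b <= 0x7f && return 0    # ASCII
    b <= 0x8f && return 1    # continuation 80-8F
    b <= 0x9f && return 9    # continuation 90-9F
    b <= 0xbf && return 7    # continuation A0-BF
    b <= 0xc1 && return 8    # C0, C1: never valid
    b <= 0xdf && return 2    # lead of a 2-byte sequence
    b == 0xe0 && return 10   # 3-byte lead, next byte must be A0-BF
    b == 0xed && return 4    # 3-byte lead, next byte must be 80-9F
    b <= 0xef && return 3    # other 3-byte leads
    b == 0xf0 && return 11   # 4-byte lead, next byte must be 90-BF
    b <= 0xf3 && return 6    # 4-byte leads F1-F3
    b == 0xf4 && return 5    # 4-byte lead, next byte must be 80-8F
    return 8                 # F5-FF: never valid
end

# Pack each class row: the next state's shift sits at the current state's shift.
const ROWS = ntuple(12) do c
    row = UInt32(0)
    for s in 1:10
        row |= UInt32(SHIFTS[NEXT[c][s] + 1]) << SHIFTS[s]
    end
    row
end

function dfa_isvalid(bytes)
    state = UInt32(SHIFTS[1])                        # start in state 0 (ASCII)
    for b in bytes
        state = (ROWS[byte_class(b) + 1] >> state) & 0x0000001e
    end
    return state == SHIFTS[1] || state == SHIFTS[2]  # ended in state 0 or 1
end
```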

@inline function  _find_nonascii_chunk(chunk_size,cu::AbstractVector{CU}, first,last) where {CU}
    n=first
    while n <= last - chunk_size
        _isascii(cu,n,n+chunk_size-1) || return n
        n += chunk_size
    end
    n= last-chunk_size+1
    _isascii(cu,n,last) || return n
    return nothing
end

##

# Classifications of string
    # 0: neither valid ASCII nor UTF-8
    # 1: valid ASCII
    # 2: valid UTF-8
 byte_string_classify(s::AbstractString) = byte_string_classify(codeunits(s))


function byte_string_classify(bytes::AbstractVector{UInt8})
    chunk_size = 1024
    chunk_threshold =  chunk_size + (chunk_size ÷ 2)
    n = length(bytes)
    if n > chunk_threshold
        start = _find_nonascii_chunk(chunk_size,bytes,1,n)
        isnothing(start) && return 1
    else
        _isascii(bytes,1,n) && return 1
        start = 1
    end
    return _byte_string_classify_nonascii(bytes,start,n)
end

function _byte_string_classify_nonascii(bytes::AbstractVector{UInt8}, first::Int, last::Int)
    chunk_size = 256

    start = first
    stop = min(last,first + chunk_size - 1)
    state = _UTF8_DFA_ACCEPT
    while start <= last
        # try to process ascii chunks
        while state == _UTF8_DFA_ACCEPT
            _isascii(bytes,start,stop) || break
            (start = start + chunk_size) <= last || break
            stop = min(last,stop + chunk_size)
        end
        # Process non ascii chunk
        state = _isvalid_utf8_dfa(state,bytes,start,stop)
        state == _UTF8_DFA_INVALID && return 0

        start = start + chunk_size
        stop = min(last,stop + chunk_size)
    end
    return ifelse(state == _UTF8_DFA_ACCEPT,2,0)
end
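A simplified sketch of the two-level strategy in `byte_string_classify`:

1. Find the first chunk that contains a non-ASCII byte.
2. Validate from that chunk onward.

Every byte before that chunk is ASCII, so the chunk start is always a character boundary. `find_nonascii_chunk` and `classify` are hypothetical names. Unlike `_find_nonascii_chunk`, the last chunk here is simply shorter rather than overlapping the previous one, and validation is delegated to `isvalid(String, ...)`.

```julia
# Simplified sketch: return the start of the first chunk holding a byte >= 0x80.
function find_nonascii_chunk(bytes::AbstractVector{UInt8}, chunk::Int)
    n = length(bytes)
    i = 1
    while i <= n
        j = min(i + chunk - 1, n)
        any(b -> b >= 0x80, view(bytes, i:j)) && return i
        i += chunk
    end
    return nothing
end

# Same return codes as above: 0 = invalid, 1 = ASCII, 2 = valid UTF-8.
function classify(bytes::Vector{UInt8})
    start = find_nonascii_chunk(bytes, 1024)
    start === nothing && return 1
    return isvalid(String, bytes[start:end]) ? 2 : 0
end
```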

isvalid(::Type{String}, bytes::AbstractVector{UInt8}) = (@inline byte_string_classify(bytes)) ≠ 0
isvalid(::Type{String}, s::AbstractString) =  (@inline byte_string_classify(s)) ≠ 0

@inline isvalid(s::AbstractString) = @inline isvalid(String, codeunits(s))

is_valid_continuation(c) = c & 0xc0 == 0x80

## required core functionality ##

@inline function iterate(s::Union{String, StringView}, i::Int=firstindex(s))
    (i % UInt) - 1 < ncodeunits(s) || return nothing
    b = @inbounds codeunit(s, i)
    u = UInt32(b) << 24
    between(b, 0x80, 0xf7) || return reinterpret(Char, u), i+1
    return @noinline iterate_continued(s, i, u)
end

# duck-type s so that external UTF-8 string packages like StringViews can hook in
function iterate_continued(s, i::Int, u::UInt32)
    @label begin
        u < 0xc0000000 && (i += 1; break)
        n = ncodeunits(s)
        # first continuation byte
        (i += 1) > n && break
        @inbounds b = codeunit(s, i)
        b & 0xc0 == 0x80 || break
        u |= UInt32(b) << 16
        # second continuation byte
        ((i += 1) > n) | (u < 0xe0000000) && break
        @inbounds b = codeunit(s, i)
        b & 0xc0 == 0x80 || break
        u |= UInt32(b) << 8
        # third continuation byte
        ((i += 1) > n) | (u < 0xf0000000) && break
        @inbounds b = codeunit(s, i)
        b & 0xc0 == 0x80 || break
        u |= UInt32(b); i += 1
    end
    return reinterpret(Char, u), i
end
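`iterate` and `getindex` both build a `Char` the same way. The lead byte goes in the top 8 bits of a `UInt32`, and continuation bytes are OR-ed in at shifts 16, 8 and 0. Below is a small sketch of that layout. `pack_char` is a hypothetical helper that assumes at most four input bytes.

```julia
# Sketch: a Char keeps its UTF-8 bytes left-aligned in a UInt32.
function pack_char(bytes::AbstractVector{UInt8})
    u = UInt32(0)
    for (k, b) in enumerate(bytes)       # k = 1..4 -> shifts 24, 16, 8, 0
        u |= UInt32(b) << (32 - 8k)
    end
    return reinterpret(Char, u)
end

euro = pack_char(UInt8[0xe2, 0x82, 0xac])
raw = reinterpret(UInt32, 'é')           # bytes C3 A9, left-aligned
step = iterate("é€", 1)                  # the next index skips both bytes of 'é'
```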

@propagate_inbounds function getindex(s::Union{String, StringView}, i::Int)
    b = codeunit(s, i)
    u = UInt32(b) << 24
    between(b, 0x80, 0xf7) || return reinterpret(Char, u)
    return getindex_continued(s, i, u)
end

# duck-type s so that external UTF-8 string packages like StringViews can hook in
function getindex_continued(s, i::Int, u::UInt32)
    @label begin
        if u < 0xc0000000
            # called from `getindex` which checks bounds
            @inbounds isvalid(s, i) && break
            string_index_err(s, i)
        end
        n = ncodeunits(s)

        (i += 1) > n && break
        @inbounds b = codeunit(s, i) # cont byte 1
        b & 0xc0 == 0x80 || break
        u |= UInt32(b) << 16

        ((i += 1) > n) | (u < 0xe0000000) && break
        @inbounds b = codeunit(s, i) # cont byte 2
        b & 0xc0 == 0x80 || break
        u |= UInt32(b) << 8

        ((i += 1) > n) | (u < 0xf0000000) && break
        @inbounds b = codeunit(s, i) # cont byte 3
        b & 0xc0 == 0x80 || break
        u |= UInt32(b)
    end
    return reinterpret(Char, u)
end

function getindex(s::Union{String, StringView}, r::AbstractUnitRange{<:Integer})
    span = (Int(first(r))::Int):(Int(last(r)))::Int
    return s[span]
end

@inline function getindex(s::String, r::UnitRange{Int})
    isempty(r) && return ""
    i, j = first(r), last(r)
    @boundscheck begin
        checkbounds(s, r)
        @inbounds isvalid(s, i) || string_index_err(s, i)
        @inbounds isvalid(s, j) || string_index_err(s, j)
    end
    # Safety: The boundscheck checked r is inbounds in s,
    # and since we also checked r is not empty, j must be inbounds in s
    j = @inbounds nextind(s, j) - 1
    n = (j - i + 1) % UInt
    ss = _string_n(n)
    GC.@preserve s ss unsafe_copyto!(pointer(ss), pointer(s, i), n)
    return ss
end
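In a byte-indexed range, both endpoints must be valid character starts. The slice then extends to the end of the character that begins at the last index, which is what the `nextind(s, j) - 1` step does. A short usage sketch:

```julia
s = "aé€"          # byte layout: 'a' = 1, 'é' = 2:3, '€' = 4:6
sub = s[2:4]       # 4 starts '€', so all of bytes 4:6 are included
bad = try
    s[2:3]         # 3 is inside 'é', not the start of a character
catch e
    e
end
```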

# nothrow because we know the start and end indices are valid
@assume_effects :nothrow function length(s::String)
    return length_continued(s, 1, ncodeunits(s), ncodeunits(s))
end

function length(s::StringView)
    return length_continued(s, 1, ncodeunits(s), ncodeunits(s))
end

# effects needed because @inbounds
@assume_effects :consistent :effect_free @inline function length(s::String, i::Int, j::Int)
    _length(s, i, j)
end

@inline function length(s::StringView, i::Int, j::Int)
    _length(s, i, j)
end

@inline function _length(s::Union{String, StringView}, i::Int, j::Int)
    @boundscheck begin
        0 < i ≤ ncodeunits(s)+1 || throw(BoundsError(s, i))
        0 ≤ j < ncodeunits(s)+1 || throw(BoundsError(s, j))
    end
    j < i && return 0
    @inbounds i, k = thisind(s, i), i
    c = j - i + (i == k)
    @inbounds length_continued(s, i, j, c)
end

@assume_effects :terminates_globally @propagate_inbounds function length_continued(s::String, i::Int, n::Int, c::Int)
    _length_continued(s, i, n, c)
end

@propagate_inbounds function length_continued(s::StringView, i::Int, n::Int, c::Int)
    _length_continued(s, i, n, c)
end


@propagate_inbounds function _length_continued(s::Union{String, StringView}, i::Int, n::Int, c::Int)
    i < n || return c
    b = codeunit(s, i)
    while true
        while true
            (i += 1) ≤ n || return c
            0xc0 ≤ b ≤ 0xf7 && break
            b = codeunit(s, i)
        end
        l = b
        b = codeunit(s, i) # cont byte 1
        c -= (x = b & 0xc0 == 0x80)
        x & (l ≥ 0xe0) || continue

        (i += 1) ≤ n || return c
        b = codeunit(s, i) # cont byte 2
        c -= (x = b & 0xc0 == 0x80)
        x & (l ≥ 0xf0) || continue

        (i += 1) ≤ n || return c
        b = codeunit(s, i) # cont byte 3
        c -= (b & 0xc0 == 0x80)
    end
end
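For valid UTF-8, the character count is the byte count minus the number of continuation bytes (`10xxxxxx`). That is what `_length_continued` subtracts as it scans. Below is a one-line sketch under that validity assumption; `count_chars` is a hypothetical name. On invalid data it can differ from `length`, which only discounts continuation bytes that follow a lead byte.

```julia
# Valid UTF-8 only: count every byte that is not a continuation byte.
count_chars(s::AbstractString) = count(b -> b & 0xc0 != 0x80, codeunits(s))

s = "aé€😀"                # 1 + 2 + 3 + 4 = 10 bytes, 4 characters
n_sketch = count_chars(s)
n_base = length(s)
```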

## overload methods for efficiency ##

isvalid(s::String, i::Int) = checkbounds(Bool, s, i) && thisind(s, i) == i

# `isascii(::AbstractVector)` reduces to `@inbounds codeunit(::String, ::Int)`, total.
isascii(s::String) = @assume_effects :nothrow :foldable isascii(codeunits(s))

# don't assume effects for general integers since we cannot know their implementation
@assume_effects :foldable repeat(c::Char, r::BitInteger) = @invoke repeat(c::Char, r::Integer)

"""
    repeat(c::AbstractChar, r::Integer)::String

Repeat a character `r` times. This can equivalently be accomplished by calling
[`c^r`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).

# Examples
```jldoctest
julia> repeat('A', 3)
"AAA"
```
"""
function repeat(c::AbstractChar, r::Integer)
    r < 0 && throw(ArgumentError("can't repeat a character $r times"))
    r = UInt(r)::UInt
    c = Char(c)::Char
    r == 0 && return ""
    u = bswap(reinterpret(UInt32, c))
    n = 4 - (leading_zeros(u | 0xff) >> 3)
    s = _string_n(n*r)
    p = pointer(s)
    GC.@preserve s if n == 1
        memset(p, u % UInt8, r)
    elseif n == 2
        p16 = reinterpret(Ptr{UInt16}, p)
        for i = 1:r
            unsafe_store!(p16, u % UInt16, i)
        end
    elseif n == 3
        b1 = (u >> 0) % UInt8
        b2 = (u >> 8) % UInt8
        b3 = (u >> 16) % UInt8
        for i = 0:r-1
            unsafe_store!(p, b1, 3i + 1)
            unsafe_store!(p, b2, 3i + 2)
            unsafe_store!(p, b3, 3i + 3)
        end
    elseif n == 4
        p32 = reinterpret(Ptr{UInt32}, p)
        for i = 1:r
            unsafe_store!(p32, u, i)
        end
    end
    return s
end
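The line `n = 4 - (leading_zeros(u | 0xff) >> 3)` in `repeat` computes how many bytes the character occupies. Here is that computation on its own, checked against `ncodeunits(::Char)`. `utf8_width` is a hypothetical name.

```julia
# Byte-swap the raw UInt32 so the lead byte becomes the lowest byte, then count
# the leading all-zero bytes. `| 0xff` makes '\0' count as one byte, not zero.
function utf8_width(c::Char)
    u = bswap(reinterpret(UInt32, c))
    return 4 - (leading_zeros(u | 0xff) >> 3)
end

widths = [utf8_width(c) for c in ('A', 'é', '€', '😀')]
```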
© 2026 Coveralls, Inc