JuliaLang / julia / #38011

17 Feb 2025 06:24AM UTC coverage: 20.248% (-5.6%) from 25.839%
push

local

web-flow
bpart: Track whether any binding replacement has happened in image modules (#57433)

This implements the optimization proposed in #57426 by keeping track of
whether any bindings were replaced in image modules (excluding `Main` as
facilitated by #57426). In addition, we augment serialization to keep
track of whether a method body contains any GlobalRefs that point to a
loaded (system or package) image. If both of these flags are true, we
can skip scanning the body of the method, since we know that we neither
need to add any additional backedges nor were any of the referenced
bindings invalidated. The performance impact on end-to-end load time is
small, but measurable. Overall, `@time using ModelingToolkit`
consistently improves by about 5% with this PR. However, I should note
that `using` time is still about 40% slower than on 1.11. This is not
necessarily an apples-to-apples comparison, as there were substantial
other changes on 1.12 (as well as current load-time tunings targeting
older versions), but I wanted to put the number in context.

2 of 15 new or added lines in 5 files covered. (13.33%)

2655 existing lines in 108 files now uncovered.

9867 of 48731 relevant lines covered (20.25%)

107722.08 hits per line

30.11
/base/strings/basic.jl
# This file is a part of Julia. License is MIT: https://julialang.org/license

import Core: Symbol

"""
The `AbstractString` type is the supertype of all string implementations in
Julia. Strings are encodings of sequences of [Unicode](https://unicode.org/)
code points as represented by the `AbstractChar` type. Julia makes a few assumptions
about strings:

* Strings are encoded in terms of fixed-size "code units"
  * Code units can be extracted with `codeunit(s, i)`
  * The first code unit has index `1`
  * The last code unit has index `ncodeunits(s)`
  * Any index `i` such that `1 ≤ i ≤ ncodeunits(s)` is in bounds
* String indexing is done in terms of these code units:
  * Characters are extracted by `s[i]` with a valid string index `i`
  * Each `AbstractChar` in a string is encoded by one or more code units
  * Only the index of the first code unit of an `AbstractChar` is a valid index
  * The encoding of an `AbstractChar` is independent of what precedes or follows it
  * String encodings are [self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code) – i.e. `isvalid(s, i)` is O(1)

Some string functions that extract code units, characters or substrings from
strings error if you pass them out-of-bounds or invalid string indices. This
includes `codeunit(s, i)` and `s[i]`. Functions that do string
index arithmetic take a more relaxed approach to indexing and give you the
closest valid string index when in-bounds, or when out-of-bounds, behave as if
there were an infinite number of characters padding each side of the string.
Usually these imaginary padding characters have code unit length `1` but string
types may choose different "imaginary" character sizes as makes sense for their
implementations (e.g. substrings may pass index arithmetic through to the
underlying string they provide a view into). Relaxed indexing functions include
those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
model allows index arithmetic to work with out-of-bounds indices as
intermediate values so long as one never uses them to retrieve a character,
which often helps avoid needing to code around edge cases.

See also [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref),
[`nextind`](@ref), [`prevind`](@ref).
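
# Examples
A brief illustration of the relaxed index arithmetic described above, assuming the
built-in UTF-8 `String` type, where `α` and `β` each occupy two code units:
```jldoctest
julia> s = "αβ";

julia> ncodeunits(s)
4

julia> thisind(s, 2)
1

julia> nextind(s, 0)
1

julia> nextind(s, 3)
5

julia> prevind(s, ncodeunits(s) + 1)
3
```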
"""
AbstractString

## required string functions ##

"""
    ncodeunits(s::AbstractString) -> Int

Return the number of code units in a string. Indices that are in bounds to
access this string must satisfy `1 ≤ i ≤ ncodeunits(s)`. Not all such indices
are valid – they may not be the start of a character, but they will return a
code unit value when calling `codeunit(s,i)`.

# Examples
```jldoctest
julia> ncodeunits("The Julia Language")
18

julia> ncodeunits("∫eˣ")
6

julia> ncodeunits('∫'), ncodeunits('e'), ncodeunits('ˣ')
(3, 1, 2)
```

See also [`codeunit`](@ref), [`checkbounds`](@ref), [`sizeof`](@ref),
[`length`](@ref), [`lastindex`](@ref).
"""
ncodeunits(s::AbstractString)

"""
    codeunit(s::AbstractString) -> Type{<:Union{UInt8, UInt16, UInt32}}

Return the code unit type of the given string object. For ASCII, Latin-1, or
UTF-8 encoded strings, this would be `UInt8`; for UCS-2 and UTF-16 it would be
`UInt16`; for UTF-32 it would be `UInt32`. The code unit type need not be
limited to these three types, but it's hard to think of widely used string
encodings that don't use one of these units. `codeunit(s)` is the same as
`typeof(codeunit(s,1))` when `s` is a non-empty string.

See also [`ncodeunits`](@ref).
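
# Examples
A minimal illustration, assuming the built-in UTF-8 `String` type:
```jldoctest
julia> codeunit("Hello")
UInt8

julia> typeof(codeunit("Hello", 1)) == codeunit("Hello")
true
```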
"""
codeunit(s::AbstractString)

const CodeunitType = Union{Type{UInt8},Type{UInt16},Type{UInt32}}

"""
    codeunit(s::AbstractString, i::Integer) -> Union{UInt8, UInt16, UInt32}

Return the code unit value in the string `s` at index `i`. Note that

    codeunit(s, i) :: codeunit(s)

I.e. the value returned by `codeunit(s, i)` is of the type returned by
`codeunit(s)`.

# Examples
```jldoctest
julia> a = codeunit("Hello", 2)
0x65

julia> typeof(a)
UInt8
```

See also [`ncodeunits`](@ref), [`checkbounds`](@ref).
"""
@propagate_inbounds codeunit(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(codeunit, (s, i))) : codeunit(s, Int(i))

"""
    isvalid(s::AbstractString, i::Integer) -> Bool

Predicate indicating whether the given index is the start of the encoding of a
character in `s` or not. If `isvalid(s, i)` is true then `s[i]` will return the
character whose encoding starts at that index; if it's false, then `s[i]` will
raise an invalid index error or a bounds error depending on if `i` is in bounds.
In order for `isvalid(s, i)` to be an O(1) function, the encoding of `s` must be
[self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code). This
is a basic assumption of Julia's generic string support.

See also [`getindex`](@ref), [`iterate`](@ref), [`thisind`](@ref),
[`nextind`](@ref), [`prevind`](@ref), [`length`](@ref).

# Examples
```jldoctest
julia> str = "αβγdef";

julia> isvalid(str, 1)
true

julia> str[1]
'α': Unicode U+03B1 (category Ll: Letter, lowercase)

julia> isvalid(str, 2)
false

julia> str[2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'β'
Stacktrace:
[...]
```
"""
@propagate_inbounds isvalid(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(isvalid, (s, i))) : isvalid(s, Int(i))

"""
    iterate(s::AbstractString, i::Integer) -> Union{Tuple{<:AbstractChar, Int}, Nothing}

Return a tuple of the character in `s` at index `i` with the index of the start
of the following character in `s`. This is the key method that allows strings to
be iterated, yielding a sequence of characters. The `iterate` function, as part
of the iteration protocol, may assume that `i` is the start of a character in `s`.

See also [`getindex`](@ref), [`checkbounds`](@ref).
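
# Examples
A short illustration, assuming the built-in UTF-8 `String` type, in which `α` and `β`
each occupy two code units:
```jldoctest
julia> iterate("αβ", 1)
('α', 3)

julia> iterate("αβ", 3)
('β', 5)

julia> iterate("αβ", 5) === nothing
true
```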
"""
@propagate_inbounds iterate(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(iterate, (s, i))) : iterate(s, Int(i))

## basic generic definitions ##

eltype(::Type{<:AbstractString}) = Char # some string types may use another AbstractChar

"""
    sizeof(str::AbstractString)

Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by
the size, in bytes, of one code unit in `str`.

# Examples
```jldoctest
julia> sizeof("")
0

julia> sizeof("∀")
3
```
"""
sizeof(s::AbstractString) = ncodeunits(s)::Int * sizeof(codeunit(s)::CodeunitType)
firstindex(s::AbstractString) = 1
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)::Int)
isempty(s::AbstractString) = iszero(ncodeunits(s)::Int)

@propagate_inbounds first(s::AbstractString) = s[firstindex(s)]

function getindex(s::AbstractString, i::Integer)
    @boundscheck checkbounds(s, i)
    @inbounds return isvalid(s, i) ? (iterate(s, i)::NTuple{2,Any})[1] : string_index_err(s, i)
end

getindex(s::AbstractString, i::Colon) = s
# TODO: handle other ranges with stride ±1 specially?
# TODO: add more @propagate_inbounds annotations?
getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
    sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v))
getindex(s::AbstractString, v::AbstractVector{Bool}) =
    throw(ArgumentError("logical indexing not supported for strings"))

function get(s::AbstractString, i::Integer, default)
    checkbounds(Bool, s, i) ? (@inbounds s[i]) : default
end

## bounds checking ##

checkbounds(::Type{Bool}, s::AbstractString, i::Integer) =
    1 ≤ i ≤ ncodeunits(s)::Int
checkbounds(::Type{Bool}, s::AbstractString, r::AbstractRange{<:Integer}) =
    isempty(r) || (1 ≤ minimum(r) && maximum(r) ≤ ncodeunits(s)::Int)
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Real}) =
    all(i -> checkbounds(Bool, s, i), I)
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Integer}) =
    all(i -> checkbounds(Bool, s, i), I)
checkbounds(s::AbstractString, I::Union{Integer,AbstractArray}) =
    checkbounds(Bool, s, I) ? nothing : throw(BoundsError(s, I))

## construction, conversion, promotion ##

string() = ""
string(s::AbstractString) = s

Vector{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
Array{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
Vector{T}(s::AbstractString) where {T<:AbstractChar} = collect(T, s)

Symbol(s::AbstractString) = Symbol(String(s))
Symbol(x...) = Symbol(string(x...))

convert(::Type{T}, s::T) where {T<:AbstractString} = s
convert(::Type{T}, s::AbstractString) where {T<:AbstractString} = T(s)::T

## summary ##

function summary(io::IO, s::AbstractString)
    prefix = isempty(s) ? "empty" : string(ncodeunits(s), "-codeunit")
    print(io, prefix, " ", typeof(s))
end

## string & character concatenation ##

"""
    *(s::Union{AbstractString, AbstractChar}, t::Union{AbstractString, AbstractChar}...) -> AbstractString

Concatenate strings and/or characters, producing a [`String`](@ref) or
[`AnnotatedString`](@ref) (as appropriate). This is equivalent to calling the
[`string`](@ref) or [`annotatedstring`](@ref) function on the arguments. Concatenation of built-in string
types always produces a value of type `String` but other string types may choose
to return a string of a different type as appropriate.

# Examples
```jldoctest
julia> "Hello " * "world"
"Hello world"

julia> 'j' * "ulia"
"julia"
```
"""
function (*)(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...)
    if _isannotated(s1) || any(_isannotated, ss)
        annotatedstring(s1, ss...)
    else
        string(s1, ss...)
    end
end

one(::Union{T,Type{T}}) where {T<:AbstractString} = convert(T, "")

# This could be written as a single statement with three ||-clauses, however then effect
# analysis thinks it may throw and runtime checks are added.
# Also see `substring.jl` for the `::SubString{T}` method.
_isannotated(S::Type) = S != Union{} && (S <: AnnotatedString || S <: AnnotatedChar)
_isannotated(s) = _isannotated(typeof(s))

## generic string comparison ##

"""
    cmp(a::AbstractString, b::AbstractString) -> Int

Compare two strings. Return `0` if both strings have the same length and the character
at each index is the same in both strings. Return `-1` if `a` is a prefix of `b`, or if
`a` comes before `b` in alphabetical order. Return `1` if `b` is a prefix of `a`, or if
`b` comes before `a` in alphabetical order (technically, lexicographical order by Unicode
code points).

# Examples
```jldoctest
julia> cmp("abc", "abc")
0

julia> cmp("ab", "abc")
-1

julia> cmp("abc", "ab")
1

julia> cmp("ab", "ac")
-1

julia> cmp("ac", "ab")
1

julia> cmp("α", "a")
1

julia> cmp("b", "β")
-1
```
"""
function cmp(a::AbstractString, b::AbstractString)
    a === b && return 0
    (iv1, iv2) = (iterate(a), iterate(b))
    while iv1 !== nothing && iv2 !== nothing
        (c, d) = (first(iv1)::AbstractChar, first(iv2)::AbstractChar)
        c ≠ d && return ifelse(c < d, -1, 1)
        (iv1, iv2) = (iterate(a, last(iv1)), iterate(b, last(iv2)))
    end
    return iv1 === nothing ? (iv2 === nothing ? 0 : -1) : 1
end

"""
    ==(a::AbstractString, b::AbstractString) -> Bool

Test whether two strings are equal character by character (technically, Unicode
code point by code point). Should either string be an [`AnnotatedString`](@ref) the
string properties must match too.

# Examples
```jldoctest
julia> "abc" == "abc"
true

julia> "abc" == "αβγ"
false
```
"""
==(a::AbstractString, b::AbstractString) = cmp(a, b) == 0

"""
    isless(a::AbstractString, b::AbstractString) -> Bool

Test whether string `a` comes before string `b` in alphabetical order
(technically, in lexicographical order by Unicode code points).

# Examples
```jldoctest
julia> isless("a", "b")
true

julia> isless("β", "α")
false

julia> isless("a", "a")
false
```
"""
isless(a::AbstractString, b::AbstractString) = cmp(a, b) < 0

# faster comparisons for symbols

@assume_effects :total function cmp(a::Symbol, b::Symbol)
    Int(sign(ccall(:strcmp, Int32, (Cstring, Cstring), a, b)))
end

isless(a::Symbol, b::Symbol) = cmp(a, b) < 0

# hashing

hash(s::AbstractString, h::UInt) = hash(String(s), h)

## character index arithmetic ##

"""
    length(s::AbstractString) -> Int
    length(s::AbstractString, i::Integer, j::Integer) -> Int

Return the number of characters in string `s` from indices `i` through `j`.

This is computed as the number of code unit indices from `i` to `j` which are
valid character indices. With only a single string argument, this computes
the number of characters in the entire string. With `i` and `j` arguments it
computes the number of indices between `i` and `j` inclusive that are valid
indices in the string `s`. In addition to in-bounds values, `i` may take the
out-of-bounds value `ncodeunits(s) + 1` and `j` may take the out-of-bounds
value `0`.

!!! note
    The time complexity of this operation is linear in general. That is, it
    will take the time proportional to the number of bytes or characters in
    the string because it counts the value on the fly. This is in contrast to
    the method for arrays, which is a constant-time operation.

See also [`isvalid`](@ref), [`ncodeunits`](@ref), [`lastindex`](@ref),
[`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref).

# Examples
```jldoctest
julia> length("jμΛIα")
5
```
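
With `i` and `j` given, only characters whose first code unit lies in `i:j` are
counted. A small illustration, assuming the built-in UTF-8 `String` type, where the
five characters of `"jμΛIα"` start at code unit indices 1, 2, 4, 6 and 7:
```jldoctest
julia> length("jμΛIα", 2, 4)
2

julia> length("jμΛIα", 1, ncodeunits("jμΛIα"))
5
```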
"""
length(s::AbstractString) = @inbounds return length(s, 1, ncodeunits(s)::Int)

function length(s::AbstractString, i::Int, j::Int)
    @boundscheck begin
        0 < i ≤ ncodeunits(s)::Int+1 || throw(BoundsError(s, i))
        0 ≤ j < ncodeunits(s)::Int+1 || throw(BoundsError(s, j))
    end
    n = 0
    for k = i:j
        @inbounds n += isvalid(s, k)
    end
    return n
end

@propagate_inbounds length(s::AbstractString, i::Integer, j::Integer) =
    length(s, Int(i), Int(j))

"""
    thisind(s::AbstractString, i::Integer) -> Int

If `i` is in bounds in `s` return the index of the start of the character whose
encoding code unit `i` is part of. In other words, if `i` is the start of a
character, return `i`; if `i` is not the start of a character, rewind until the
start of a character and return that index. If `i` is equal to 0 or `ncodeunits(s)+1`
return `i`. In all other cases throw `BoundsError`.

# Examples
```jldoctest
julia> thisind("α", 0)
0

julia> thisind("α", 1)
1

julia> thisind("α", 2)
1

julia> thisind("α", 3)
3

julia> thisind("α", 4)
ERROR: BoundsError: attempt to access 2-codeunit String at index [4]
[...]

julia> thisind("α", -1)
ERROR: BoundsError: attempt to access 2-codeunit String at index [-1]
[...]
```
"""
thisind(s::AbstractString, i::Integer) = thisind(s, Int(i))

function thisind(s::AbstractString, i::Int)
    z = ncodeunits(s)::Int + 1
    i == z && return i
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
    @inbounds while 1 < i && !(isvalid(s, i)::Bool)
        i -= 1
    end
    return i
end

"""
    prevind(str::AbstractString, i::Integer, n::Integer=1) -> Int

* Case `n == 1`

  If `i` is in bounds in `str` return the index of the start of the character whose
  encoding starts before index `i`. In other words, if `i` is the start of a
  character, return the start of the previous character; if `i` is not the start
  of a character, rewind until the start of a character and return that index.
  If `i` is equal to `1` return `0`.
  If `i` is equal to `ncodeunits(str)+1` return `lastindex(str)`.
  Otherwise throw `BoundsError`.

* Case `n > 1`

  Behaves like applying `n` times `prevind` for `n==1`. The only difference
  is that if `n` is so large that applying `prevind` would reach `0` then each remaining
  iteration decreases the returned value by `1`.
  This means that in this case `prevind` can return a negative value.

* Case `n == 0`

  Return `i` only if `i` is a valid index in `str` or is equal to `ncodeunits(str)+1`.
  Otherwise `StringIndexError` or `BoundsError` is thrown.

# Examples
```jldoctest
julia> prevind("α", 3)
1

julia> prevind("α", 1)
0

julia> prevind("α", 0)
ERROR: BoundsError: attempt to access 2-codeunit String at index [0]
[...]

julia> prevind("α", 2, 2)
0

julia> prevind("α", 2, 3)
-1
```
"""
prevind(s::AbstractString, i::Integer, n::Integer) = prevind(s, Int(i), Int(n))
prevind(s::AbstractString, i::Integer)             = prevind(s, Int(i))
prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)

function prevind(s::AbstractString, i::Int, n::Int)
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
    z = ncodeunits(s) + 1
    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
    while n > 0 && 1 < i
        @inbounds n -= isvalid(s, i -= 1)
    end
    return i - n
end

"""
    nextind(str::AbstractString, i::Integer, n::Integer=1) -> Int

* Case `n == 1`

  If `i` is in bounds in `str` return the index of the start of the character whose
  encoding starts after index `i`. In other words, if `i` is the start of a
  character, return the start of the next character; if `i` is not the start
  of a character, move forward until the start of a character and return that index.
  If `i` is equal to `0` return `1`.
  If `i` is in bounds but greater than or equal to `lastindex(str)` return `ncodeunits(str)+1`.
  Otherwise throw `BoundsError`.

* Case `n > 1`

  Behaves like applying `n` times `nextind` for `n==1`. The only difference
  is that if `n` is so large that applying `nextind` would reach `ncodeunits(str)+1` then
  each remaining iteration increases the returned value by `1`. This means that in this
  case `nextind` can return a value greater than `ncodeunits(str)+1`.

* Case `n == 0`

  Return `i` only if `i` is a valid index in `str` or is equal to `0`.
  Otherwise `StringIndexError` or `BoundsError` is thrown.

# Examples
```jldoctest
julia> nextind("α", 0)
1

julia> nextind("α", 1)
3

julia> nextind("α", 3)
ERROR: BoundsError: attempt to access 2-codeunit String at index [3]
[...]

julia> nextind("α", 0, 2)
3

julia> nextind("α", 1, 2)
4
```
"""
nextind(s::AbstractString, i::Integer, n::Integer) = nextind(s, Int(i), Int(n))
nextind(s::AbstractString, i::Integer)             = nextind(s, Int(i))
nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)

function nextind(s::AbstractString, i::Int, n::Int)
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
    z = ncodeunits(s)
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
    while n > 0 && i < z
        @inbounds n -= isvalid(s, i += 1)
    end
    return i + n
end

## string index iteration type ##

struct EachStringIndex{T<:AbstractString}
    s::T
end
keys(s::AbstractString) = EachStringIndex(s)

length(e::EachStringIndex) = length(e.s)
first(::EachStringIndex) = 1
last(e::EachStringIndex) = lastindex(e.s)
iterate(e::EachStringIndex, state=firstindex(e.s)) = state > ncodeunits(e.s) ? nothing : (state, nextind(e.s, state))
eltype(::Type{<:EachStringIndex}) = Int

"""
    isascii(c::Union{AbstractChar,AbstractString}) -> Bool

Test whether a character belongs to the ASCII character set, or whether this is true for
all elements of a string.

# Examples
```jldoctest
julia> isascii('a')
true

julia> isascii('α')
false

julia> isascii("abc")
true

julia> isascii("αβγ")
false
```
For example, `isascii` can be used as a predicate function for [`filter`](@ref) or [`replace`](@ref)
to remove or replace non-ASCII characters, respectively:
```jldoctest
julia> filter(isascii, "abcdeγfgh") # discard non-ASCII chars
"abcdefgh"

julia> replace("abcdeγfgh", !isascii=>' ') # replace non-ASCII chars with spaces
"abcde fgh"
```
"""
isascii(c::Char) = bswap(reinterpret(UInt32, c)) < 0x80
isascii(s::AbstractString) = all(isascii, s)
isascii(c::AbstractChar) = UInt32(c) < 0x80

@inline function _isascii(code_units::AbstractVector{CU}, first, last) where {CU}
    r = zero(CU)
    for n = first:last
        @inbounds r |= code_units[n]
    end
    return 0 ≤ r < 0x80
end

# The chunking algorithm makes the last two chunks overlap in order to keep the size fixed
@inline function  _isascii_chunks(chunk_size,cu::AbstractVector{CU}, first,last) where {CU}
    n=first
    while n <= last - chunk_size
        _isascii(cu,n,n+chunk_size-1) || return false
        n += chunk_size
    end
    return  _isascii(cu,last-chunk_size+1,last)
end
"""
    isascii(cu::AbstractVector{CU}) where {CU <: Integer} -> Bool

Test whether all values in the vector belong to the ASCII character set (0x00 to 0x7f).
This function is intended to be used by other string implementations that need a fast ASCII check.
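
# Examples
A small illustration, assuming UTF-8 code units obtained via `codeunits` or given
directly as a `Vector{UInt8}`:
```jldoctest
julia> isascii(codeunits("abc"))
true

julia> isascii(codeunits("αβγ"))
false

julia> isascii(UInt8[0x4a, 0x75, 0xce, 0xbb])
false
```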
"""
function isascii(cu::AbstractVector{CU}) where {CU <: Integer}
    chunk_size = 1024
    chunk_threshold =  chunk_size + (chunk_size ÷ 2)
    first = firstindex(cu);   last = lastindex(cu)
    l = last - first + 1
    l < chunk_threshold && return _isascii(cu,first,last)
    return _isascii_chunks(chunk_size,cu,first,last)
end

## string map, filter ##

function map(f, s::AbstractString)
    out = StringVector(max(4, sizeof(s)::Int÷sizeof(codeunit(s)::CodeunitType)))
    index = UInt(1)
    for c::AbstractChar in s
        c′ = f(c)
        isa(c′, AbstractChar) || throw(ArgumentError(
            "map(f, s::AbstractString) requires f to return AbstractChar; " *
            "try map(f, collect(s)) or a comprehension instead"))
        index + 3 > length(out) && resize!(out, unsigned(2 * length(out)))
        index += __unsafe_string!(out, convert(Char, c′), index)
    end
    resize!(out, index-1)
    sizehint!(out, index-1)
    return String(out)
end

function filter(f, s::AbstractString)
    out = IOBuffer(sizehint=sizeof(s))
    for c in s
        f(c) && write(out, c)
    end
    String(_unsafe_take!(out))
end

## string first and last ##

"""
    first(s::AbstractString, n::Integer)

Get a string consisting of the first `n` characters of `s`.

# Examples
```jldoctest
julia> first("∀ϵ≠0: ϵ²>0", 0)
""

julia> first("∀ϵ≠0: ϵ²>0", 1)
"∀"

julia> first("∀ϵ≠0: ϵ²>0", 3)
"∀ϵ≠"
```
"""
first(s::AbstractString, n::Integer) = @inbounds s[1:min(end, nextind(s, 0, n))]

"""
    last(s::AbstractString, n::Integer)

Get a string consisting of the last `n` characters of `s`.

# Examples
```jldoctest
julia> last("∀ϵ≠0: ϵ²>0", 0)
""

julia> last("∀ϵ≠0: ϵ²>0", 1)
"0"

julia> last("∀ϵ≠0: ϵ²>0", 3)
"²>0"
```
"""
last(s::AbstractString, n::Integer) = @inbounds s[max(1, prevind(s, ncodeunits(s)+1, n)):end]

"""
    reverseind(v, i)

Given an index `i` in [`reverse(v)`](@ref), return the corresponding index in
`v` so that `v[reverseind(v,i)] == reverse(v)[i]`. (This can be nontrivial in
cases where `v` contains non-ASCII characters.)

# Examples
```jldoctest
julia> s = "Julia🚀"
"Julia🚀"

julia> r = reverse(s)
"🚀ailuJ"

julia> for i in eachindex(s)
           print(r[reverseind(r, i)])
       end
Julia🚀
```
"""
reverseind(s::AbstractString, i::Integer) = thisind(s, ncodeunits(s)-i+1)

"""
    repeat(s::AbstractString, r::Integer)

Repeat a string `r` times. This can be written as `s^r`.

See also [`^`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).

# Examples
```jldoctest
julia> repeat("ha", 3)
"hahaha"
```
"""
repeat(s::AbstractString, r::Integer) = repeat(String(s), r)

"""
    ^(s::Union{AbstractString,AbstractChar}, n::Integer) -> AbstractString

Repeat a string or character `n` times. This can also be written as `repeat(s, n)`.

See also [`repeat`](@ref).

# Examples
```jldoctest
julia> "Test "^3
"Test Test Test "
```
"""
(^)(s::Union{AbstractString,AbstractChar}, r::Integer) = repeat(s, r)

# reverse-order iteration for strings and indices thereof
iterate(r::Iterators.Reverse{<:AbstractString}, i=lastindex(r.itr)) = i < firstindex(r.itr) ? nothing : (r.itr[i], prevind(r.itr, i))
iterate(r::Iterators.Reverse{<:EachStringIndex}, i=lastindex(r.itr.s)) = i < firstindex(r.itr.s) ? nothing : (i, prevind(r.itr.s, i))

## code unit access ##

"""
    CodeUnits(s::AbstractString)

Wrap a string (without copying) in an immutable vector-like object that accesses the code units
of the string's representation.
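
# Examples
A minimal illustration, assuming the built-in UTF-8 `String` type and calling the
unexported constructor as `Base.CodeUnits`:
```jldoctest
julia> cu = Base.CodeUnits("Juλia");

julia> length(cu)
6

julia> cu[3]
0xce

julia> cu isa AbstractVector{UInt8}
true
```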
"""
struct CodeUnits{T,S<:AbstractString} <: DenseVector{T}
    s::S
    CodeUnits(s::S) where {S<:AbstractString} = new{codeunit(s),S}(s)
end

length(s::CodeUnits) = ncodeunits(s.s)
sizeof(s::CodeUnits{T}) where {T} = ncodeunits(s.s) * sizeof(T)
size(s::CodeUnits) = (length(s),)
elsize(s::Type{<:CodeUnits{T}}) where {T} = sizeof(T)
@propagate_inbounds getindex(s::CodeUnits, i::Int) = codeunit(s.s, i)
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing


write(io::IO, s::CodeUnits) = write(io, s.s)

cconvert(::Type{Ptr{T}},    s::CodeUnits{T}) where {T} = cconvert(Ptr{T}, s.s)
cconvert(::Type{Ptr{Int8}}, s::CodeUnits{UInt8}) = cconvert(Ptr{Int8}, s.s)

"""
    codeunits(s::AbstractString)

Obtain a vector-like object containing the code units of a string.
Returns a `CodeUnits` wrapper by default, but `codeunits` may optionally be defined
for new string types if necessary.

# Examples
```jldoctest
julia> codeunits("Juλia")
6-element Base.CodeUnits{UInt8, String}:
 0x4a
 0x75
 0xce
 0xbb
 0x69
 0x61
```
"""
codeunits(s::AbstractString) = CodeUnits(s)

function _split_rest(s::AbstractString, n::Int)
    lastind = lastindex(s)
    i = try
        prevind(s, lastind, n)
    catch e
        e isa BoundsError || rethrow()
        _check_length_split_rest(length(s), n)
    end
    last_n = SubString(s, nextind(s, i), lastind)
    front = s[begin:i]
    return front, last_n
end