JuliaLang / julia / #38004

08 Feb 2025 04:15AM UTC coverage: 19.401% (-3.6%) from 23.008%
push

local

web-flow
Inference: propagate struct initialization info on `setfield!` (#57222)

When a variable has a field set with `setfield!(var, field, value)`,
inference now assumes that this specific field is defined and may for
example constant-propagate `isdefined(var, field)` as `true`.
`PartialStruct`, the lattice element used to encode this information,
still has a few limitations in terms of what it may represent (it cannot
represent mutable structs with non-contiguously defined fields yet);
further work on extending it would increase the impact of this change.

Consider the following function:
```julia
julia> function f()
           a = A(1)
           setfield!(a, :y, 2)
           invokelatest(identity, a)
           isdefined(a, :y) && return 1.0
           a
       end
f (generic function with 1 method)
```

Before this PR, on `master`:
```julia
julia> @code_typed f()
CodeInfo(
1 ─ %1 = %new(Main.A, 1)::A
│          builtin Main.setfield!(%1, :y, 2)::Int64
│        dynamic builtin (Core._call_latest)(identity, %1)::Any
│   %4 =   builtin Main.isdefined(%1, :y)::Bool
└──      goto #3 if not %4
2 ─      return 1.0
3 ─      return %1
) => Union{Float64, A}
```

And after this PR:
```julia
julia> @code_typed f()
CodeInfo(
1 ─ %1 = %new(Main.A, 1)::A
│          builtin Main.setfield!(%1, :y, 2)::Int64
│        dynamic builtin (Core._call_latest)(identity, %1)::Any
└──      return 1.0
) => Float64
```
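
The function `f` above relies on a type `A` whose definition is not shown in the message. A minimal sketch, assuming `A` is a mutable struct with untyped fields and an inner constructor that leaves `y` unassigned (the definition actually used for the PR may differ):

```julia
# Hypothetical definition: the commit message does not show `A`.
# The inner constructor assigns only `x`, so `y` starts out undefined.
mutable struct A
    x
    y
    A(x) = new(x)
end

a = A(1)
isdefined(a, :y)       # false: `y` has not been assigned yet
setfield!(a, :y, 2)    # defines the field at runtime
isdefined(a, :y)       # true: the fact that inference can now assume
```

The fields are left untyped on purpose: a field of a bits type such as `Int` is always reported as defined by `isdefined`, so it would not show the difference.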

---------

Co-authored-by: Cédric Belmant <cedric.belmant@juliahub.com>

9440 of 48658 relevant lines covered (19.4%)

95900.84 hits per line

Source File

29.21
/base/strings/basic.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
import Core: Symbol
4

5
"""
6
The `AbstractString` type is the supertype of all string implementations in
7
Julia. Strings are encodings of sequences of [Unicode](https://unicode.org/)
8
code points as represented by the `AbstractChar` type. Julia makes a few assumptions
9
about strings:
10

11
* Strings are encoded in terms of fixed-size "code units"
12
  * Code units can be extracted with `codeunit(s, i)`
13
  * The first code unit has index `1`
14
  * The last code unit has index `ncodeunits(s)`
15
  * Any index `i` such that `1 ≤ i ≤ ncodeunits(s)` is in bounds
16
* String indexing is done in terms of these code units:
17
  * Characters are extracted by `s[i]` with a valid string index `i`
18
  * Each `AbstractChar` in a string is encoded by one or more code units
19
  * Only the index of the first code unit of an `AbstractChar` is a valid index
20
  * The encoding of an `AbstractChar` is independent of what precedes or follows it
21
  * String encodings are [self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code) – i.e. `isvalid(s, i)` is O(1)
22

23
Some string functions that extract code units, characters or substrings from
24
strings error if you pass them out-of-bounds or invalid string indices. This
25
includes `codeunit(s, i)` and `s[i]`. Functions that do string
26
index arithmetic take a more relaxed approach to indexing and give you the
27
closest valid string index when in-bounds, or when out-of-bounds, behave as if
28
there were an infinite number of characters padding each side of the string.
29
Usually these imaginary padding characters have code unit length `1` but string
30
types may choose different "imaginary" character sizes as makes sense for their
31
implementations (e.g. substrings may pass index arithmetic through to the
32
underlying string they provide a view into). Relaxed indexing functions include
33
those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
34
model allows index arithmetic to work with out-of-bounds indices as
35
intermediate values so long as one never uses them to retrieve a character,
36
which often helps avoid needing to code around edge cases.
37

38
See also [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref),
39
[`nextind`](@ref), [`prevind`](@ref).
40
"""
41
AbstractString
42

43
## required string functions ##
44

45
"""
46
    ncodeunits(s::AbstractString) -> Int
47

48
Return the number of code units in a string. Indices that are in bounds to
49
access this string must satisfy `1 ≤ i ≤ ncodeunits(s)`. Not all such indices
50
are valid – they may not be the start of a character, but they will return a
51
code unit value when calling `codeunit(s,i)`.
52

53
# Examples
54
```jldoctest
55
julia> ncodeunits("The Julia Language")
56
18
57

58
julia> ncodeunits("∫eˣ")
59
6
60

61
julia> ncodeunits('∫'), ncodeunits('e'), ncodeunits('ˣ')
62
(3, 1, 2)
63
```
64

65
See also [`codeunit`](@ref), [`checkbounds`](@ref), [`sizeof`](@ref),
66
[`length`](@ref), [`lastindex`](@ref).
67
"""
68
ncodeunits(s::AbstractString)
69

70
"""
71
    codeunit(s::AbstractString) -> Type{<:Union{UInt8, UInt16, UInt32}}
72

73
Return the code unit type of the given string object. For ASCII, Latin-1, or
74
UTF-8 encoded strings, this would be `UInt8`; for UCS-2 and UTF-16 it would be
75
`UInt16`; for UTF-32 it would be `UInt32`. The code unit type need not be
76
limited to these three types, but it's hard to think of widely used string
77
encodings that don't use one of these units. `codeunit(s)` is the same as
78
`typeof(codeunit(s,1))` when `s` is a non-empty string.
79

80
See also [`ncodeunits`](@ref).
81
"""
82
codeunit(s::AbstractString)
83

84
const CodeunitType = Union{Type{UInt8},Type{UInt16},Type{UInt32}}
85

86
"""
87
    codeunit(s::AbstractString, i::Integer) -> Union{UInt8, UInt16, UInt32}
88

89
Return the code unit value in the string `s` at index `i`. Note that
90

91
    codeunit(s, i) :: codeunit(s)
92

93
I.e. the value returned by `codeunit(s, i)` is of the type returned by
94
`codeunit(s)`.
95

96
# Examples
97
```jldoctest
98
julia> a = codeunit("Hello", 2)
99
0x65
100

101
julia> typeof(a)
102
UInt8
103
```
104

105
See also [`ncodeunits`](@ref), [`checkbounds`](@ref).
106
"""
107
@propagate_inbounds codeunit(s::AbstractString, i::Integer) = i isa Int ?
×
108
    throw(MethodError(codeunit, (s, i))) : codeunit(s, Int(i))
109

110
"""
111
    isvalid(s::AbstractString, i::Integer) -> Bool
112

113
Predicate indicating whether the given index is the start of the encoding of a
114
character in `s` or not. If `isvalid(s, i)` is true then `s[i]` will return the
115
character whose encoding starts at that index; if it's false, then `s[i]` will
116
raise an invalid index error or a bounds error depending on whether `i` is in bounds.
117
In order for `isvalid(s, i)` to be an O(1) function, the encoding of `s` must be
118
[self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code). This
119
is a basic assumption of Julia's generic string support.
120

121
See also [`getindex`](@ref), [`iterate`](@ref), [`thisind`](@ref),
122
[`nextind`](@ref), [`prevind`](@ref), [`length`](@ref).
123

124
# Examples
125
```jldoctest
126
julia> str = "αβγdef";
127

128
julia> isvalid(str, 1)
129
true
130

131
julia> str[1]
132
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
133

134
julia> isvalid(str, 2)
135
false
136

137
julia> str[2]
138
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'β'
139
Stacktrace:
140
[...]
141
```
142
"""
143
@propagate_inbounds isvalid(s::AbstractString, i::Integer) = i isa Int ?
×
144
    throw(MethodError(isvalid, (s, i))) : isvalid(s, Int(i))
145

146
"""
147
    iterate(s::AbstractString, i::Integer) -> Union{Tuple{<:AbstractChar, Int}, Nothing}
148

149
Return a tuple of the character in `s` at index `i` with the index of the start
150
of the following character in `s`. This is the key method that allows strings to
151
be iterated, yielding a sequence of characters. The `iterate` function, as part
152
of the iteration protocol, may assume that `i` is the start of a character in `s`.
153

154
See also [`getindex`](@ref), [`checkbounds`](@ref).
155
"""
156
@propagate_inbounds iterate(s::AbstractString, i::Integer) = i isa Int ?
×
157
    throw(MethodError(iterate, (s, i))) : iterate(s, Int(i))
158

159
## basic generic definitions ##
160

161
eltype(::Type{<:AbstractString}) = Char # some string types may use another AbstractChar
×
162

163
"""
164
    sizeof(str::AbstractString)
165

166
Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by
167
the size, in bytes, of one code unit in `str`.
168

169
# Examples
170
```jldoctest
171
julia> sizeof("")
172
0
173

174
julia> sizeof("∀")
175
3
176
```
177
"""
178
sizeof(s::AbstractString) = ncodeunits(s)::Int * sizeof(codeunit(s)::CodeunitType)
991,385✔
179
firstindex(s::AbstractString) = 1
×
180
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)::Int)
22,924✔
181
isempty(s::AbstractString) = iszero(ncodeunits(s)::Int)
1,371,019✔
182

183
@propagate_inbounds first(s::AbstractString) = s[firstindex(s)]
30,604✔
184

185
function getindex(s::AbstractString, i::Integer)
×
186
    @boundscheck checkbounds(s, i)
×
187
    @inbounds return isvalid(s, i) ? (iterate(s, i)::NTuple{2,Any})[1] : string_index_err(s, i)
×
188
end
189

190
getindex(s::AbstractString, i::Colon) = s
×
191
# TODO: handle other ranges with stride ±1 specially?
192
# TODO: add more @propagate_inbounds annotations?
193
getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
×
194
    sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v))
×
195
getindex(s::AbstractString, v::AbstractVector{Bool}) =
×
196
    throw(ArgumentError("logical indexing not supported for strings"))
197

198
function get(s::AbstractString, i::Integer, default)
×
199
# TODO: use ternary once @inbounds is expression-like
200
    if checkbounds(Bool, s, i)
×
201
        @inbounds return s[i]
×
202
    else
203
        return default
×
204
    end
205
end
206

207
## bounds checking ##
208

209
checkbounds(::Type{Bool}, s::AbstractString, i::Integer) =
174,886✔
210
    1 ≤ i ≤ ncodeunits(s)::Int
211
checkbounds(::Type{Bool}, s::AbstractString, r::AbstractRange{<:Integer}) =
8,043✔
212
    isempty(r) || (1 ≤ minimum(r) && maximum(r) ≤ ncodeunits(s)::Int)
213
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Real}) =
×
214
    all(i -> checkbounds(Bool, s, i), I)
×
215
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Integer}) =
×
216
    all(i -> checkbounds(Bool, s, i), I)
×
217
checkbounds(s::AbstractString, I::Union{Integer,AbstractArray}) =
174,148✔
218
    checkbounds(Bool, s, I) ? nothing : throw(BoundsError(s, I))
219

220
## construction, conversion, promotion ##
221

222
string() = ""
×
223
string(s::AbstractString) = s
×
224

225
Vector{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
×
226
Array{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
×
227
Vector{T}(s::AbstractString) where {T<:AbstractChar} = collect(T, s)
×
228

229
Symbol(s::AbstractString) = Symbol(String(s))
×
230
Symbol(x...) = Symbol(string(x...))
18✔
231

232
convert(::Type{T}, s::T) where {T<:AbstractString} = s
×
233
convert(::Type{T}, s::AbstractString) where {T<:AbstractString} = T(s)::T
63,234✔
234

235
## summary ##
236

237
function summary(io::IO, s::AbstractString)
×
238
    prefix = isempty(s) ? "empty" : string(ncodeunits(s), "-codeunit")
×
239
    print(io, prefix, " ", typeof(s))
×
240
end
241

242
## string & character concatenation ##
243

244
"""
245
    *(s::Union{AbstractString, AbstractChar}, t::Union{AbstractString, AbstractChar}...) -> AbstractString
246

247
Concatenate strings and/or characters, producing a [`String`](@ref) or
248
[`AnnotatedString`](@ref) (as appropriate). This is equivalent to calling the
249
[`string`](@ref) or [`annotatedstring`](@ref) function on the arguments. Concatenation of built-in string
250
types always produces a value of type `String` but other string types may choose
251
to return a string of a different type as appropriate.
252

253
# Examples
254
```jldoctest
255
julia> "Hello " * "world"
256
"Hello world"
257

258
julia> 'j' * "ulia"
259
"julia"
260
```
261
"""
262
function (*)(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...)
×
263
    if _isannotated(s1) || any(_isannotated, ss)
×
264
        annotatedstring(s1, ss...)
×
265
    else
266
        string(s1, ss...)
20,121✔
267
    end
268
end
269

270
one(::Union{T,Type{T}}) where {T<:AbstractString} = convert(T, "")
×
271

272
# This could be written as a single statement with three ||-clauses, however then effect
273
# analysis thinks it may throw and runtime checks are added.
274
# Also see `substring.jl` for the `::SubString{T}` method.
275
_isannotated(S::Type) = S != Union{} && (S <: AnnotatedString || S <: AnnotatedChar)
×
276
_isannotated(s) = _isannotated(typeof(s))
×
277

278
## generic string comparison ##
279

280
"""
281
    cmp(a::AbstractString, b::AbstractString) -> Int
282

283
Compare two strings. Return `0` if both strings have the same length and the character
284
at each index is the same in both strings. Return `-1` if `a` is a prefix of `b`, or if
285
`a` comes before `b` in alphabetical order. Return `1` if `b` is a prefix of `a`, or if
286
`b` comes before `a` in alphabetical order (technically, lexicographical order by Unicode
287
code points).
288

289
# Examples
290
```jldoctest
291
julia> cmp("abc", "abc")
292
0
293

294
julia> cmp("ab", "abc")
295
-1
296

297
julia> cmp("abc", "ab")
298
1
299

300
julia> cmp("ab", "ac")
301
-1
302

303
julia> cmp("ac", "ab")
304
1
305

306
julia> cmp("α", "a")
307
1
308

309
julia> cmp("b", "β")
310
-1
311
```
312
"""
313
function cmp(a::AbstractString, b::AbstractString)
×
314
    a === b && return 0
×
315
    (iv1, iv2) = (iterate(a), iterate(b))
×
316
    while iv1 !== nothing && iv2 !== nothing
×
317
        (c, d) = (first(iv1)::AbstractChar, first(iv2)::AbstractChar)
×
318
        c ≠ d && return ifelse(c < d, -1, 1)
×
319
        (iv1, iv2) = (iterate(a, last(iv1)), iterate(b, last(iv2)))
×
320
    end
×
321
    return iv1 === nothing ? (iv2 === nothing ? 0 : -1) : 1
×
322
end
323

324
"""
325
    ==(a::AbstractString, b::AbstractString) -> Bool
326

327
Test whether two strings are equal character by character (technically, Unicode
328
code point by code point). Should either string be an [`AnnotatedString`](@ref) the
329
string properties must match too.
330

331
# Examples
332
```jldoctest
333
julia> "abc" == "abc"
334
true
335

336
julia> "abc" == "αβγ"
337
false
338
```
339
"""
340
==(a::AbstractString, b::AbstractString) = cmp(a, b) == 0
×
341

342
"""
343
    isless(a::AbstractString, b::AbstractString) -> Bool
344

345
Test whether string `a` comes before string `b` in alphabetical order
346
(technically, in lexicographical order by Unicode code points).
347

348
# Examples
349
```jldoctest
350
julia> isless("a", "b")
351
true
352

353
julia> isless("β", "α")
354
false
355

356
julia> isless("a", "a")
357
false
358
```
359
"""
360
isless(a::AbstractString, b::AbstractString) = cmp(a, b) < 0
231,025✔
361

362
# faster comparisons for symbols
363

364
@assume_effects :total function cmp(a::Symbol, b::Symbol)
365
    Int(sign(ccall(:strcmp, Int32, (Cstring, Cstring), a, b)))
×
366
end
367

368
isless(a::Symbol, b::Symbol) = cmp(a, b) < 0
×
369

370
# hashing
371

372
hash(s::AbstractString, h::UInt) = hash(String(s), h)
×
373

374
## character index arithmetic ##
375

376
"""
377
    length(s::AbstractString) -> Int
378
    length(s::AbstractString, i::Integer, j::Integer) -> Int
379

380
Return the number of characters in string `s` from indices `i` through `j`.
381

382
This is computed as the number of code unit indices from `i` to `j` which are
383
valid character indices. With only a single string argument, this computes
384
the number of characters in the entire string. With `i` and `j` arguments it
385
computes the number of indices between `i` and `j` inclusive that are valid
386
indices in the string `s`. In addition to in-bounds values, `i` may take the
387
out-of-bounds value `ncodeunits(s) + 1` and `j` may take the out-of-bounds
388
value `0`.
389

390
!!! note
391
    The time complexity of this operation is linear in general. That is, it
392
    takes time proportional to the number of bytes or characters in
393
    the string because it counts the value on the fly. This is in contrast to
394
    the method for arrays, which is a constant-time operation.
395

396
See also [`isvalid`](@ref), [`ncodeunits`](@ref), [`lastindex`](@ref),
397
[`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref).
398

399
# Examples
400
```jldoctest
401
julia> length("jμΛIα")
402
5
403
```
404
"""
405
length(s::AbstractString) = @inbounds return length(s, 1, ncodeunits(s)::Int)
×
406

407
function length(s::AbstractString, i::Int, j::Int)
×
408
    @boundscheck begin
×
409
        0 < i ≤ ncodeunits(s)::Int+1 || throw(BoundsError(s, i))
×
410
        0 ≤ j < ncodeunits(s)::Int+1 || throw(BoundsError(s, j))
×
411
    end
412
    n = 0
×
413
    for k = i:j
×
414
        @inbounds n += isvalid(s, k)
×
415
    end
×
416
    return n
×
417
end
418

419
@propagate_inbounds length(s::AbstractString, i::Integer, j::Integer) =
×
420
    length(s, Int(i), Int(j))
421

422
"""
423
    thisind(s::AbstractString, i::Integer) -> Int
424

425
If `i` is in bounds in `s` return the index of the start of the character whose
426
encoding code unit `i` is part of. In other words, if `i` is the start of a
427
character, return `i`; if `i` is not the start of a character, rewind until the
428
start of a character and return that index. If `i` is equal to 0 or `ncodeunits(s)+1`
429
return `i`. In all other cases throw `BoundsError`.
430

431
# Examples
432
```jldoctest
433
julia> thisind("α", 0)
434
0
435

436
julia> thisind("α", 1)
437
1
438

439
julia> thisind("α", 2)
440
1
441

442
julia> thisind("α", 3)
443
3
444

445
julia> thisind("α", 4)
446
ERROR: BoundsError: attempt to access 2-codeunit String at index [4]
447
[...]
448

449
julia> thisind("α", -1)
450
ERROR: BoundsError: attempt to access 2-codeunit String at index [-1]
451
[...]
452
```
453
"""
454
thisind(s::AbstractString, i::Integer) = thisind(s, Int(i))
×
455

456
function thisind(s::AbstractString, i::Int)
×
457
    z = ncodeunits(s)::Int + 1
×
458
    i == z && return i
×
459
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
×
460
    @inbounds while 1 < i && !(isvalid(s, i)::Bool)
×
461
        i -= 1
×
462
    end
×
463
    return i
×
464
end
465

466
"""
467
    prevind(str::AbstractString, i::Integer, n::Integer=1) -> Int
468

469
* Case `n == 1`
470

471
  If `i` is in bounds in `str` return the index of the start of the character whose
472
  encoding starts before index `i`. In other words, if `i` is the start of a
473
  character, return the start of the previous character; if `i` is not the start
474
  of a character, rewind until the start of a character and return that index.
475
  If `i` is equal to `1` return `0`.
476
  If `i` is equal to `ncodeunits(str)+1` return `lastindex(str)`.
477
  Otherwise throw `BoundsError`.
478

479
* Case `n > 1`
480

481
  Behaves like applying `n` times `prevind` for `n==1`. The only difference
482
  is that if `n` is so large that applying `prevind` would reach `0` then each remaining
483
  iteration decreases the returned value by `1`.
484
  This means that in this case `prevind` can return a negative value.
485

486
* Case `n == 0`
487

488
  Return `i` only if `i` is a valid index in `str` or is equal to `ncodeunits(str)+1`.
489
  Otherwise `StringIndexError` or `BoundsError` is thrown.
490

491
# Examples
492
```jldoctest
493
julia> prevind("α", 3)
494
1
495

496
julia> prevind("α", 1)
497
0
498

499
julia> prevind("α", 0)
500
ERROR: BoundsError: attempt to access 2-codeunit String at index [0]
501
[...]
502

503
julia> prevind("α", 2, 2)
504
0
505

506
julia> prevind("α", 2, 3)
507
-1
508
```
509
"""
510
prevind(s::AbstractString, i::Integer, n::Integer) = prevind(s, Int(i), Int(n))
×
511
prevind(s::AbstractString, i::Integer)             = prevind(s, Int(i))
20✔
512
prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)
198✔
513

514
function prevind(s::AbstractString, i::Int, n::Int)
189✔
515
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
189✔
516
    z = ncodeunits(s) + 1
189✔
517
    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
189✔
518
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
189✔
519
    while n > 0 && 1 < i
370✔
520
        @inbounds n -= isvalid(s, i -= 1)
362✔
521
    end
181✔
522
    return i - n
189✔
523
end
524

525
"""
526
    nextind(str::AbstractString, i::Integer, n::Integer=1) -> Int
527

528
* Case `n == 1`
529

530
  If `i` is in bounds in `str` return the index of the start of the character whose
531
  encoding starts after index `i`. In other words, if `i` is the start of a
532
  character, return the start of the next character; if `i` is not the start
533
  of a character, move forward until the start of a character and return that index.
534
  If `i` is equal to `0` return `1`.
535
  If `i` is in bounds but greater than or equal to `lastindex(str)` return `ncodeunits(str)+1`.
536
  Otherwise throw `BoundsError`.
537

538
* Case `n > 1`
539

540
  Behaves like applying `n` times `nextind` for `n==1`. The only difference
541
  is that if `n` is so large that applying `nextind` would reach `ncodeunits(str)+1` then
542
  each remaining iteration increases the returned value by `1`. This means that in this
543
  case `nextind` can return a value greater than `ncodeunits(str)+1`.
544

545
* Case `n == 0`
546

547
  Return `i` only if `i` is a valid index in `str` or is equal to `0`.
548
  Otherwise `StringIndexError` or `BoundsError` is thrown.
549

550
# Examples
551
```jldoctest
552
julia> nextind("α", 0)
553
1
554

555
julia> nextind("α", 1)
556
3
557

558
julia> nextind("α", 3)
559
ERROR: BoundsError: attempt to access 2-codeunit String at index [3]
560
[...]
561

562
julia> nextind("α", 0, 2)
563
3
564

565
julia> nextind("α", 1, 2)
566
4
567
```
568
"""
569
nextind(s::AbstractString, i::Integer, n::Integer) = nextind(s, Int(i), Int(n))
×
570
nextind(s::AbstractString, i::Integer)             = nextind(s, Int(i))
×
571
nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)
×
572

573
function nextind(s::AbstractString, i::Int, n::Int)
×
574
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
×
575
    z = ncodeunits(s)
×
576
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
×
577
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
×
578
    while n > 0 && i < z
×
579
        @inbounds n -= isvalid(s, i += 1)
×
580
    end
×
581
    return i + n
×
582
end
583

584
## string index iteration type ##
585

586
struct EachStringIndex{T<:AbstractString}
587
    s::T
×
588
end
589
keys(s::AbstractString) = EachStringIndex(s)
×
590

591
length(e::EachStringIndex) = length(e.s)
×
592
first(::EachStringIndex) = 1
×
593
last(e::EachStringIndex) = lastindex(e.s)
36✔
594
iterate(e::EachStringIndex, state=firstindex(e.s)) = state > ncodeunits(e.s) ? nothing : (state, nextind(e.s, state))
97✔
595
eltype(::Type{<:EachStringIndex}) = Int
×
596

597
"""
598
    isascii(c::Union{AbstractChar,AbstractString}) -> Bool
599

600
Test whether a character belongs to the ASCII character set, or whether this is true for
601
all elements of a string.
602

603
# Examples
604
```jldoctest
605
julia> isascii('a')
606
true
607

608
julia> isascii('α')
609
false
610

611
julia> isascii("abc")
612
true
613

614
julia> isascii("αβγ")
615
false
616
```
617
For example, `isascii` can be used as a predicate function for [`filter`](@ref) or [`replace`](@ref)
618
to remove or replace non-ASCII characters, respectively:
619
```jldoctest
620
julia> filter(isascii, "abcdeγfgh") # discard non-ASCII chars
621
"abcdefgh"
622

623
julia> replace("abcdeγfgh", !isascii=>' ') # replace non-ASCII chars with spaces
624
"abcde fgh"
625
```
626
"""
627
isascii(c::Char) = bswap(reinterpret(UInt32, c)) < 0x80
1,645✔
628
isascii(s::AbstractString) = all(isascii, s)
×
629
isascii(c::AbstractChar) = UInt32(c) < 0x80
×
630

631
@inline function _isascii(code_units::AbstractVector{CU}, first, last) where {CU}
632
    r = zero(CU)
×
633
    for n = first:last
4✔
634
        @inbounds r |= code_units[n]
41✔
635
    end
78✔
636
    return 0 ≤ r < 0x80
4✔
637
end
638

639
# The chunking algorithm makes the last two chunks overlap in order to keep the size fixed
640
@inline function  _isascii_chunks(chunk_size,cu::AbstractVector{CU}, first,last) where {CU}
641
    n=first
×
642
    while n <= last - chunk_size
×
643
        _isascii(cu,n,n+chunk_size-1) || return false
×
644
        n += chunk_size
×
645
    end
×
646
    return  _isascii(cu,last-chunk_size+1,last)
×
647
end
648
"""
649
    isascii(cu::AbstractVector{CU}) where {CU <: Integer} -> Bool
650

651
Test whether all values in the vector belong to the ASCII character set (0x00 to 0x7f).
652
This function is intended to be used by other string implementations that need a fast ASCII check.
653
"""
654
function isascii(cu::AbstractVector{CU}) where {CU <: Integer}
4✔
655
    chunk_size = 1024
×
656
    chunk_threshold =  chunk_size + (chunk_size ÷ 2)
×
657
    first = firstindex(cu);   last = lastindex(cu)
4✔
658
    l = last - first + 1
4✔
659
    l < chunk_threshold && return _isascii(cu,first,last)
4✔
660
    return _isascii_chunks(chunk_size,cu,first,last)
×
661
end
662

663
## string map, filter ##
664

665
function map(f, s::AbstractString)
2,001✔
666
    out = StringVector(max(4, sizeof(s)::Int÷sizeof(codeunit(s)::CodeunitType)))
2,001✔
667
    index = UInt(1)
2,001✔
668
    for c::AbstractChar in s
4,002✔
669
        c′ = f(c)
18,645✔
670
        isa(c′, AbstractChar) || throw(ArgumentError(
18,645✔
671
            "map(f, s::AbstractString) requires f to return AbstractChar; " *
672
            "try map(f, collect(s)) or a comprehension instead"))
673
        index + 3 > length(out) && resize!(out, unsigned(2 * length(out)))
18,645✔
674
        index += __unsafe_string!(out, convert(Char, c′), index)
18,645✔
675
    end
35,289✔
676
    resize!(out, index-1)
2,001✔
677
    sizehint!(out, index-1)
2,001✔
678
    return String(out)
2,001✔
679
end
680

681
function filter(f, s::AbstractString)
×
682
    out = IOBuffer(sizehint=sizeof(s))
×
683
    for c in s
×
684
        f(c) && write(out, c)
×
685
    end
×
686
    String(_unsafe_take!(out))
×
687
end
688

689
## string first and last ##
690

691
"""
692
    first(s::AbstractString, n::Integer)
693

694
Get a string consisting of the first `n` characters of `s`.
695

696
# Examples
697
```jldoctest
698
julia> first("∀ϵ≠0: ϵ²>0", 0)
699
""
700

701
julia> first("∀ϵ≠0: ϵ²>0", 1)
702
"∀"
703

704
julia> first("∀ϵ≠0: ϵ²>0", 3)
705
"∀ϵ≠"
706
```
707
"""
708
first(s::AbstractString, n::Integer) = @inbounds s[1:min(end, nextind(s, 0, n))]
×
709

710
"""
711
    last(s::AbstractString, n::Integer)
712

713
Get a string consisting of the last `n` characters of `s`.
714

715
# Examples
716
```jldoctest
717
julia> last("∀ϵ≠0: ϵ²>0", 0)
718
""
719

720
julia> last("∀ϵ≠0: ϵ²>0", 1)
721
"0"
722

723
julia> last("∀ϵ≠0: ϵ²>0", 3)
724
"²>0"
725
```
726
"""
727
last(s::AbstractString, n::Integer) = @inbounds s[max(1, prevind(s, ncodeunits(s)+1, n)):end]
×
728

729
"""
730
    reverseind(v, i)
731

732
Given an index `i` in [`reverse(v)`](@ref), return the corresponding index in
733
`v` so that `v[reverseind(v,i)] == reverse(v)[i]`. (This can be nontrivial in
734
cases where `v` contains non-ASCII characters.)
735

736
# Examples
737
```jldoctest
738
julia> s = "Julia🚀"
739
"Julia🚀"
740

741
julia> r = reverse(s)
742
"🚀ailuJ"
743

744
julia> for i in eachindex(s)
745
           print(r[reverseind(r, i)])
746
       end
747
Julia🚀
748
```
749
"""
750
reverseind(s::AbstractString, i::Integer) = thisind(s, ncodeunits(s)-i+1)
×
751

752
"""
753
    repeat(s::AbstractString, r::Integer)
754

755
Repeat a string `r` times. This can be written as `s^r`.
756

757
See also [`^`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).
758

759
# Examples
760
```jldoctest
761
julia> repeat("ha", 3)
762
"hahaha"
763
```
764
"""
765
repeat(s::AbstractString, r::Integer) = repeat(String(s), r)
×
766

767
"""
768
    ^(s::Union{AbstractString,AbstractChar}, n::Integer) -> AbstractString
769

770
Repeat a string or character `n` times. This can also be written as `repeat(s, n)`.
771

772
See also [`repeat`](@ref).
773

774
# Examples
775
```jldoctest
776
julia> "Test "^3
777
"Test Test Test "
778
```
779
"""
780
(^)(s::Union{AbstractString,AbstractChar}, r::Integer) = repeat(s, r)
469✔
781

782
# reverse-order iteration for strings and indices thereof
783
iterate(r::Iterators.Reverse{<:AbstractString}, i=lastindex(r.itr)) = i < firstindex(r.itr) ? nothing : (r.itr[i], prevind(r.itr, i))
×
784
iterate(r::Iterators.Reverse{<:EachStringIndex}, i=lastindex(r.itr.s)) = i < firstindex(r.itr.s) ? nothing : (i, prevind(r.itr.s, i))
×
785

786
## code unit access ##
787

788
"""
789
    CodeUnits(s::AbstractString)
790

791
Wrap a string (without copying) in an immutable vector-like object that accesses the code units
792
of the string's representation.
793
"""
794
struct CodeUnits{T,S<:AbstractString} <: DenseVector{T}
795
    s::S
796
    CodeUnits(s::S) where {S<:AbstractString} = new{codeunit(s),S}(s)
395,019✔
797
end
798

799
length(s::CodeUnits) = ncodeunits(s.s)
3,556,730✔
800
sizeof(s::CodeUnits{T}) where {T} = ncodeunits(s.s) * sizeof(T)
×
801
size(s::CodeUnits) = (length(s),)
4✔
802
elsize(s::Type{<:CodeUnits{T}}) where {T} = sizeof(T)
×
803
@propagate_inbounds getindex(s::CodeUnits, i::Int) = codeunit(s.s, i)
3,202,568✔
804
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
×
805
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing
3,707,396✔
806

807

808
write(io::IO, s::CodeUnits) = write(io, s.s)
×
809

810
cconvert(::Type{Ptr{T}},    s::CodeUnits{T}) where {T} = cconvert(Ptr{T}, s.s)
×
811
cconvert(::Type{Ptr{Int8}}, s::CodeUnits{UInt8}) = cconvert(Ptr{Int8}, s.s)
×
812

813
"""
814
    codeunits(s::AbstractString)
815

816
Obtain a vector-like object containing the code units of a string.
817
Returns a `CodeUnits` wrapper by default, but `codeunits` may optionally be defined
818
for new string types if necessary.
819

820
# Examples
821
```jldoctest
822
julia> codeunits("Juλia")
823
6-element Base.CodeUnits{UInt8, String}:
824
 0x4a
825
 0x75
826
 0xce
827
 0xbb
828
 0x69
829
 0x61
830
```
831
"""
832
codeunits(s::AbstractString) = CodeUnits(s)
395,019✔
833

834
function _split_rest(s::AbstractString, n::Int)
×
835
    lastind = lastindex(s)
×
836
    i = try
×
837
        prevind(s, lastind, n)
×
838
    catch e
839
        e isa BoundsError || rethrow()
×
840
        _check_length_split_rest(length(s), n)
×
841
    end
842
    last_n = SubString(s, nextind(s, i), lastind)
×
843
    front = s[begin:i]
×
844
    return front, last_n
×
845
end