
JuliaLang / julia / #37997

29 Jan 2025 02:08AM UTC coverage: 17.283% (-68.7%) from 85.981%
#37997

push

local

web-flow
bpart: Start enforcing min_world for global variable definitions (#57150)

This is the analog of #57102 for global variables. Unlike for constants,
there is no automatic global backdate mechanism. The reasoning for this
is that global variables can be declared at any time, unlike constants,
which can only be declared once their value is available. As a result,
code patterns using `Core.eval` to declare globals are rarer and likely
incorrect.

1 of 22 new or added lines in 3 files covered. (4.55%)

31430 existing lines in 188 files now uncovered.

7903 of 45728 relevant lines covered (17.28%)

98663.7 hits per line

Source File

17.88
/base/strings/basic.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
"""
4
The `AbstractString` type is the supertype of all string implementations in
5
Julia. Strings are encodings of sequences of [Unicode](https://unicode.org/)
6
code points as represented by the `AbstractChar` type. Julia makes a few assumptions
7
about strings:
8

9
* Strings are encoded in terms of fixed-size "code units"
10
  * Code units can be extracted with `codeunit(s, i)`
11
  * The first code unit has index `1`
12
  * The last code unit has index `ncodeunits(s)`
13
  * Any index `i` such that `1 ≤ i ≤ ncodeunits(s)` is in bounds
14
* String indexing is done in terms of these code units:
15
  * Characters are extracted by `s[i]` with a valid string index `i`
16
  * Each `AbstractChar` in a string is encoded by one or more code units
17
  * Only the index of the first code unit of an `AbstractChar` is a valid index
18
  * The encoding of an `AbstractChar` is independent of what precedes or follows it
19
  * String encodings are [self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code) – i.e. `isvalid(s, i)` is O(1)
20

21
Some string functions that extract code units, characters or substrings from
22
strings error if you pass them out-of-bounds or invalid string indices. This
23
includes `codeunit(s, i)` and `s[i]`. Functions that do string
24
index arithmetic take a more relaxed approach to indexing and give you the
25
closest valid string index when in-bounds, or when out-of-bounds, behave as if
26
there were an infinite number of characters padding each side of the string.
27
Usually these imaginary padding characters have code unit length `1` but string
28
types may choose different "imaginary" character sizes as makes sense for their
29
implementations (e.g. substrings may pass index arithmetic through to the
30
underlying string they provide a view into). Relaxed indexing functions include
31
those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
32
model allows index arithmetic to work with out-of-bounds indices as
33
intermediate values so long as one never uses them to retrieve a character,
34
which often helps avoid needing to code around edge cases.
35

36
See also [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref),
37
[`nextind`](@ref), [`prevind`](@ref).
38
"""
39
AbstractString
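
The difference between strict access and relaxed index arithmetic described in the docstring above can be sketched as follows. This is an illustrative example only; `char_indices` is a hypothetical helper, not part of Base, and the sample string assumes the UTF-8 encoding used by `String`.

```julia
# Hypothetical helper: collect the valid character indices of a string
# using only index arithmetic (`nextind`), never dereferencing a bad index.
function char_indices(s::AbstractString)
    idxs = Int[]
    i = firstindex(s)
    while i <= ncodeunits(s)
        push!(idxs, i)
        i = nextind(s, i)   # may step to ncodeunits(s) + 1, which is never used for access
    end
    return idxs
end

s = "αβγ"   # each character occupies two UTF-8 code units

# Strict access: index 2 is in bounds but is not the start of a character.
strict_failed = try
    s[2]
    false
catch err
    err isa StringIndexError
end

# Relaxed index arithmetic: round down to the start of the enclosing character.
relaxed = thisind(s, 2)
```

Here `char_indices(s)` yields the indices `1`, `3` and `5`, while `s[2]` is rejected and `thisind(s, 2)` rounds down to `1`.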
40

41
## required string functions ##
42

43
"""
44
    ncodeunits(s::AbstractString) -> Int
45

46
Return the number of code units in a string. Indices that are in bounds to
47
access this string must satisfy `1 ≤ i ≤ ncodeunits(s)`. Not all such indices
48
are valid – they may not be the start of a character, but they will return a
49
code unit value when calling `codeunit(s,i)`.
50

51
# Examples
52
```jldoctest
53
julia> ncodeunits("The Julia Language")
54
18
55

56
julia> ncodeunits("∫eˣ")
57
6
58

59
julia> ncodeunits('∫'), ncodeunits('e'), ncodeunits('ˣ')
60
(3, 1, 2)
61
```
62

63
See also [`codeunit`](@ref), [`checkbounds`](@ref), [`sizeof`](@ref),
64
[`length`](@ref), [`lastindex`](@ref).
65
"""
66
ncodeunits(s::AbstractString)
67

68
"""
69
    codeunit(s::AbstractString) -> Type{<:Union{UInt8, UInt16, UInt32}}
70

71
Return the code unit type of the given string object. For ASCII, Latin-1, or
72
UTF-8 encoded strings, this would be `UInt8`; for UCS-2 and UTF-16 it would be
73
`UInt16`; for UTF-32 it would be `UInt32`. The code unit type need not be
74
limited to these three types, but it's hard to think of widely used string
75
encodings that don't use one of these units. `codeunit(s)` is the same as
76
`typeof(codeunit(s,1))` when `s` is a non-empty string.
77

78
See also [`ncodeunits`](@ref).
79
"""
80
codeunit(s::AbstractString)
81

82
const CodeunitType = Union{Type{UInt8},Type{UInt16},Type{UInt32}}
83

84
"""
85
    codeunit(s::AbstractString, i::Integer) -> Union{UInt8, UInt16, UInt32}
86

87
Return the code unit value in the string `s` at index `i`. Note that
88

89
    codeunit(s, i) :: codeunit(s)
90

91
I.e. the value returned by `codeunit(s, i)` is of the type returned by
92
`codeunit(s)`.
93

94
# Examples
95
```jldoctest
96
julia> a = codeunit("Hello", 2)
97
0x65
98

99
julia> typeof(a)
100
UInt8
101
```
102

103
See also [`ncodeunits`](@ref), [`checkbounds`](@ref).
104
"""
UNCOV
105
@propagate_inbounds codeunit(s::AbstractString, i::Integer) = i isa Int ?
×
106
    throw(MethodError(codeunit, (s, i))) : codeunit(s, Int(i))
107

108
"""
109
    isvalid(s::AbstractString, i::Integer) -> Bool
110

111
Predicate indicating whether the given index is the start of the encoding of a
112
character in `s` or not. If `isvalid(s, i)` is true then `s[i]` will return the
113
character whose encoding starts at that index, if it's false, then `s[i]` will
114
raise an invalid index error or a bounds error depending on whether `i` is in bounds.
115
In order for `isvalid(s, i)` to be an O(1) function, the encoding of `s` must be
116
[self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code). This
117
is a basic assumption of Julia's generic string support.
118

119
See also [`getindex`](@ref), [`iterate`](@ref), [`thisind`](@ref),
120
[`nextind`](@ref), [`prevind`](@ref), [`length`](@ref).
121

122
# Examples
123
```jldoctest
124
julia> str = "αβγdef";
125

126
julia> isvalid(str, 1)
127
true
128

129
julia> str[1]
130
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
131

132
julia> isvalid(str, 2)
133
false
134

135
julia> str[2]
136
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'β'
137
Stacktrace:
138
[...]
139
```
140
"""
UNCOV
141
@propagate_inbounds isvalid(s::AbstractString, i::Integer) = i isa Int ?
×
142
    throw(MethodError(isvalid, (s, i))) : isvalid(s, Int(i))
143

144
"""
145
    iterate(s::AbstractString, i::Integer) -> Union{Tuple{<:AbstractChar, Int}, Nothing}
146

147
Return a tuple of the character in `s` at index `i` with the index of the start
148
of the following character in `s`. This is the key method that allows strings to
149
be iterated, yielding a sequence of characters. The `iterate` function, as part
150
of the iteration protocol may assume that `i` is the start of a character in `s`.
151

152
See also [`getindex`](@ref), [`checkbounds`](@ref).
153
"""
UNCOV
154
@propagate_inbounds iterate(s::AbstractString, i::Integer) = i isa Int ?
×
155
    throw(MethodError(iterate, (s, i))) : iterate(s, Int(i))
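
As a rough illustration of the required functions listed above, the sketch below defines a thin wrapper type and forwards each required method to an underlying `String`. `WrappedString` is a hypothetical name used only for this example; the generic fallbacks in this file (such as `length`, `getindex`, `lastindex` and `==`) should then work on top of those five definitions.

```julia
# Hypothetical string type that delegates to a wrapped String.
struct WrappedString <: AbstractString
    data::String
end

# The required string functions, defined for `Int` indices as the
# `Integer` fallbacks above expect.
Base.ncodeunits(s::WrappedString) = ncodeunits(s.data)
Base.codeunit(s::WrappedString) = codeunit(s.data)
Base.codeunit(s::WrappedString, i::Int) = codeunit(s.data, i)
Base.isvalid(s::WrappedString, i::Int) = isvalid(s.data, i)
Base.iterate(s::WrappedString, i::Int=firstindex(s)) = iterate(s.data, i)

w = WrappedString("αβγdef")
```

With only these methods in place, `length(w)`, `w[3]` and `w == "αβγdef"` are served by the generic definitions rather than by anything specific to `WrappedString`.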
156

157
## basic generic definitions ##
158

UNCOV
159
eltype(::Type{<:AbstractString}) = Char # some string types may use another AbstractChar
×
160

161
"""
162
    sizeof(str::AbstractString)
163

164
Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by
165
the size, in bytes, of one code unit in `str`.
166

167
# Examples
168
```jldoctest
169
julia> sizeof("")
170
0
171

172
julia> sizeof("∀")
173
3
174
```
175
"""
176
sizeof(s::AbstractString) = ncodeunits(s)::Int * sizeof(codeunit(s)::CodeunitType)
988,392✔
177
firstindex(s::AbstractString) = 1
×
178
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)::Int)
22,594✔
179
isempty(s::AbstractString) = iszero(ncodeunits(s)::Int)
1,365,444✔
180

181
@propagate_inbounds first(s::AbstractString) = s[firstindex(s)]
30,417✔
182

UNCOV
183
function getindex(s::AbstractString, i::Integer)
×
UNCOV
184
    @boundscheck checkbounds(s, i)
×
UNCOV
185
    @inbounds return isvalid(s, i) ? (iterate(s, i)::NTuple{2,Any})[1] : string_index_err(s, i)
×
186
end
187

UNCOV
188
getindex(s::AbstractString, i::Colon) = s
×
189
# TODO: handle other ranges with stride ±1 specially?
190
# TODO: add more @propagate_inbounds annotations?
UNCOV
191
getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
×
UNCOV
192
    sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v))
×
UNCOV
193
getindex(s::AbstractString, v::AbstractVector{Bool}) =
×
194
    throw(ArgumentError("logical indexing not supported for strings"))
195

UNCOV
196
function get(s::AbstractString, i::Integer, default)
×
197
# TODO: use ternary once @inbounds is expression-like
UNCOV
198
    if checkbounds(Bool, s, i)
×
UNCOV
199
        @inbounds return s[i]
×
200
    else
UNCOV
201
        return default
×
202
    end
203
end
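
A small usage sketch of the `get` and vector-indexing methods above, using a `String` as the sample input (the values are chosen for illustration only):

```julia
s = "αβγ"            # characters start at code unit indices 1, 3 and 5

a = get(s, 1, '?')   # index 1 is in bounds, so the character is returned
b = get(s, 7, '?')   # index 7 is out of bounds, so the default is returned
c = s[[1, 5]]        # indexing with a vector of valid indices builds a new string
```

Note that `get` only guards against out-of-bounds indices; it is not a substitute for checking `isvalid(s, i)`.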
204

205
## bounds checking ##
206

207
checkbounds(::Type{Bool}, s::AbstractString, i::Integer) =
173,947✔
208
    1 ≤ i ≤ ncodeunits(s)::Int
209
checkbounds(::Type{Bool}, s::AbstractString, r::AbstractRange{<:Integer}) =
7,691✔
210
    isempty(r) || (1 ≤ minimum(r) && maximum(r) ≤ ncodeunits(s)::Int)
UNCOV
211
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Real}) =
×
UNCOV
212
    all(i -> checkbounds(Bool, s, i), I)
×
UNCOV
213
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Integer}) =
×
UNCOV
214
    all(i -> checkbounds(Bool, s, i), I)
×
215
checkbounds(s::AbstractString, I::Union{Integer,AbstractArray}) =
173,501✔
216
    checkbounds(Bool, s, I) ? nothing : throw(BoundsError(s, I))
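
The bounds checks above are phrased in code units, so an index can be in bounds without being a valid character index. A minimal sketch of that distinction, with an arbitrary sample string:

```julia
s = "αβ"                                  # 4 code units; characters start at 1 and 3

in_bounds   = checkbounds(Bool, s, 4)     # 1 ≤ 4 ≤ ncodeunits(s)
valid_start = isvalid(s, 4)               # index 4 is the second code unit of 'β'
past_end    = checkbounds(Bool, s, 5)     # beyond the last code unit
range_ok    = checkbounds(Bool, s, 1:4)   # both endpoints are in bounds
empty_ok    = checkbounds(Bool, s, 5:4)   # empty ranges are always accepted
```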
217

218
## construction, conversion, promotion ##
219

UNCOV
220
string() = ""
×
UNCOV
221
string(s::AbstractString) = s
×
222

223
Vector{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
×
UNCOV
224
Array{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
×
UNCOV
225
Vector{T}(s::AbstractString) where {T<:AbstractChar} = collect(T, s)
×
226

UNCOV
227
Symbol(s::AbstractString) = Symbol(String(s))
×
228
Symbol(x...) = Symbol(string(x...))
18✔
229

UNCOV
230
convert(::Type{T}, s::T) where {T<:AbstractString} = s
×
231
convert(::Type{T}, s::AbstractString) where {T<:AbstractString} = T(s)::T
63,090✔
232

233
## summary ##
234

UNCOV
235
function summary(io::IO, s::AbstractString)
×
UNCOV
236
    prefix = isempty(s) ? "empty" : string(ncodeunits(s), "-codeunit")
×
UNCOV
237
    print(io, prefix, " ", typeof(s))
×
238
end
239

240
## string & character concatenation ##
241

242
"""
243
    *(s::Union{AbstractString, AbstractChar}, t::Union{AbstractString, AbstractChar}...) -> AbstractString
244

245
Concatenate strings and/or characters, producing a [`String`](@ref) or
246
[`AnnotatedString`](@ref) (as appropriate). This is equivalent to calling the
247
[`string`](@ref) or [`annotatedstring`](@ref) function on the arguments. Concatenation of built-in string
248
types always produces a value of type `String` but other string types may choose
249
to return a string of a different type as appropriate.
250

251
# Examples
252
```jldoctest
253
julia> "Hello " * "world"
254
"Hello world"
255

256
julia> 'j' * "ulia"
257
"julia"
258
```
259
"""
260
function (*)(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...)
UNCOV
261
    if _isannotated(s1) || any(_isannotated, ss)
×
UNCOV
262
        annotatedstring(s1, ss...)
×
263
    else
264
        string(s1, ss...)
20,130✔
265
    end
266
end
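
To illustrate the concatenation method above, here is a short sketch mixing strings and characters. It also uses `one(String)`, which this file defines as the empty string, i.e. the identity element for `*`.

```julia
greeting = "Hello" * ',' * ' ' * "world"   # strings and characters can be mixed
joined   = reduce(*, ["a", "b", "c"])       # folding with * concatenates in order
identity_element = one(String)              # the empty string
```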
267

UNCOV
268
one(::Union{T,Type{T}}) where {T<:AbstractString} = convert(T, "")
×
269

270
# This could be written as a single statement with three ||-clauses, however then effect
271
# analysis thinks it may throw and runtime checks are added.
272
# Also see `substring.jl` for the `::SubString{T}` method.
UNCOV
273
_isannotated(S::Type) = S != Union{} && (S <: AnnotatedString || S <: AnnotatedChar)
×
UNCOV
274
_isannotated(s) = _isannotated(typeof(s))
×
275

276
## generic string comparison ##
277

278
"""
279
    cmp(a::AbstractString, b::AbstractString) -> Int
280

281
Compare two strings. Return `0` if both strings have the same length and the character
282
at each index is the same in both strings. Return `-1` if `a` is a prefix of `b`, or if
283
`a` comes before `b` in alphabetical order. Return `1` if `b` is a prefix of `a`, or if
284
`b` comes before `a` in alphabetical order (technically, lexicographical order by Unicode
285
code points).
286

287
# Examples
288
```jldoctest
289
julia> cmp("abc", "abc")
290
0
291

292
julia> cmp("ab", "abc")
293
-1
294

295
julia> cmp("abc", "ab")
296
1
297

298
julia> cmp("ab", "ac")
299
-1
300

301
julia> cmp("ac", "ab")
302
1
303

304
julia> cmp("α", "a")
305
1
306

307
julia> cmp("b", "β")
308
-1
309
```
310
"""
UNCOV
311
function cmp(a::AbstractString, b::AbstractString)
×
UNCOV
312
    a === b && return 0
×
UNCOV
313
    (iv1, iv2) = (iterate(a), iterate(b))
×
UNCOV
314
    while iv1 !== nothing && iv2 !== nothing
×
UNCOV
315
        (c, d) = (first(iv1)::AbstractChar, first(iv2)::AbstractChar)
×
UNCOV
316
        c ≠ d && return ifelse(c < d, -1, 1)
×
UNCOV
317
        (iv1, iv2) = (iterate(a, last(iv1)), iterate(b, last(iv2)))
×
UNCOV
318
    end
×
UNCOV
319
    return iv1 === nothing ? (iv2 === nothing ? 0 : -1) : 1
×
320
end
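
Because `cmp` orders strings by Unicode code point rather than by any locale-aware collation, uppercase ASCII letters sort before lowercase ones. A brief sketch of the consequence for sorting (the word list is arbitrary):

```julia
order = cmp("Zebra", "apple")     # 'Z' (U+005A) precedes 'a' (U+0061)

words = ["apple", "Banana", "cherry"]
by_code_point = sort(words)                 # default ordering uses isless, hence cmp
case_insensitive = sort(words, by=lowercase) # compare lowercased keys instead
```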
321

322
"""
323
    ==(a::AbstractString, b::AbstractString) -> Bool
324

325
Test whether two strings are equal character by character (technically, Unicode
326
code point by code point). Should either string be an [`AnnotatedString`](@ref) the
327
string properties must match too.
328

329
# Examples
330
```jldoctest
331
julia> "abc" == "abc"
332
true
333

334
julia> "abc" == "αβγ"
335
false
336
```
337
"""
UNCOV
338
==(a::AbstractString, b::AbstractString) = cmp(a, b) == 0
×
339

340
"""
341
    isless(a::AbstractString, b::AbstractString) -> Bool
342

343
Test whether string `a` comes before string `b` in alphabetical order
344
(technically, in lexicographical order by Unicode code points).
345

346
# Examples
347
```jldoctest
348
julia> isless("a", "b")
349
true
350

351
julia> isless("β", "α")
352
false
353

354
julia> isless("a", "a")
355
false
356
```
357
"""
358
isless(a::AbstractString, b::AbstractString) = cmp(a, b) < 0
232,638✔
359

360
# faster comparisons for symbols
361

UNCOV
362
@assume_effects :total function cmp(a::Symbol, b::Symbol)
×
UNCOV
363
    Int(sign(ccall(:strcmp, Int32, (Cstring, Cstring), a, b)))
×
364
end
365

UNCOV
366
isless(a::Symbol, b::Symbol) = cmp(a, b) < 0
×
367

368
# hashing
369

UNCOV
370
hash(s::AbstractString, h::UInt) = hash(String(s), h)
×
371

372
## character index arithmetic ##
373

374
"""
375
    length(s::AbstractString) -> Int
376
    length(s::AbstractString, i::Integer, j::Integer) -> Int
377

378
Return the number of characters in string `s` from indices `i` through `j`.
379

380
This is computed as the number of code unit indices from `i` to `j` which are
381
valid character indices. With only a single string argument, this computes
382
the number of characters in the entire string. With `i` and `j` arguments it
383
computes the number of indices between `i` and `j` inclusive that are valid
384
indices in the string `s`. In addition to in-bounds values, `i` may take the
385
out-of-bounds value `ncodeunits(s) + 1` and `j` may take the out-of-bounds
386
value `0`.
387

388
!!! note
389
    The time complexity of this operation is linear in general. That is, it
390
    takes time proportional to the number of bytes or characters in
391
    the string because it counts the value on the fly. This is in contrast to
392
    the method for arrays, which is a constant-time operation.
393

394
See also [`isvalid`](@ref), [`ncodeunits`](@ref), [`lastindex`](@ref),
395
[`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref).
396

397
# Examples
398
```jldoctest
399
julia> length("jμΛIα")
400
5
401
```
402
"""
UNCOV
403
length(s::AbstractString) = @inbounds return length(s, 1, ncodeunits(s)::Int)
×
404

UNCOV
405
function length(s::AbstractString, i::Int, j::Int)
×
UNCOV
406
    @boundscheck begin
×
UNCOV
407
        0 < i ≤ ncodeunits(s)::Int+1 || throw(BoundsError(s, i))
×
UNCOV
408
        0 ≤ j < ncodeunits(s)::Int+1 || throw(BoundsError(s, j))
×
409
    end
UNCOV
410
    n = 0
×
UNCOV
411
    for k = i:j
×
UNCOV
412
        @inbounds n += isvalid(s, k)
×
UNCOV
413
    end
×
UNCOV
414
    return n
×
415
end
416

UNCOV
417
@propagate_inbounds length(s::AbstractString, i::Integer, j::Integer) =
×
418
    length(s, Int(i), Int(j))
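
The relationship between character counts and code unit counts can be sketched with the string from the docstring example. The three-argument form counts the valid character indices in a code unit range:

```julia
s = "jμΛIα"   # 'j' and 'I' take one code unit each; 'μ', 'Λ' and 'α' take two

n_chars = length(s)                            # number of characters
n_units = ncodeunits(s)                        # number of code units
in_first_three = length(s, 1, 3)               # valid indices among 1, 2, 3 are 1 and 2
empty_span = length(s, n_units + 1, n_units)   # permitted out-of-bounds endpoints
```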
419

420
"""
421
    thisind(s::AbstractString, i::Integer) -> Int
422

423
If `i` is in bounds in `s` return the index of the start of the character whose
424
encoding code unit `i` is part of. In other words, if `i` is the start of a
425
character, return `i`; if `i` is not the start of a character, rewind until the
426
start of a character and return that index. If `i` is equal to 0 or `ncodeunits(s)+1`
427
return `i`. In all other cases throw `BoundsError`.
428

429
# Examples
430
```jldoctest
431
julia> thisind("α", 0)
432
0
433

434
julia> thisind("α", 1)
435
1
436

437
julia> thisind("α", 2)
438
1
439

440
julia> thisind("α", 3)
441
3
442

443
julia> thisind("α", 4)
444
ERROR: BoundsError: attempt to access 2-codeunit String at index [4]
445
[...]
446

447
julia> thisind("α", -1)
448
ERROR: BoundsError: attempt to access 2-codeunit String at index [-1]
449
[...]
450
```
451
"""
UNCOV
452
thisind(s::AbstractString, i::Integer) = thisind(s, Int(i))
×
453

UNCOV
454
function thisind(s::AbstractString, i::Int)
×
UNCOV
455
    z = ncodeunits(s)::Int + 1
×
UNCOV
456
    i == z && return i
×
UNCOV
457
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
×
UNCOV
458
    @inbounds while 1 < i && !(isvalid(s, i)::Bool)
×
UNCOV
459
        i -= 1
×
UNCOV
460
    end
×
UNCOV
461
    return i
×
462
end
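
One way `thisind` might be used is to map an arbitrary code unit offset back to the character that contains it. `char_containing` below is a hypothetical helper written for this illustration, not a Base function:

```julia
# Hypothetical helper: the character whose encoding includes code unit `i`.
char_containing(s::AbstractString, i::Integer) = s[thisind(s, i)]

s = "jμΛIα"
c3 = char_containing(s, 3)   # code unit 3 is the second unit of 'μ'
c5 = char_containing(s, 5)   # code unit 5 is the second unit of 'Λ'
```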
463

464
"""
465
    prevind(str::AbstractString, i::Integer, n::Integer=1) -> Int
466

467
* Case `n == 1`
468

469
  If `i` is in bounds in `str` return the index of the start of the character whose
470
  encoding starts before index `i`. In other words, if `i` is the start of a
471
  character, return the start of the previous character; if `i` is not the start
472
  of a character, rewind until the start of a character and return that index.
473
  If `i` is equal to `1` return `0`.
474
  If `i` is equal to `ncodeunits(str)+1` return `lastindex(str)`.
475
  Otherwise throw `BoundsError`.
476

477
* Case `n > 1`
478

479
  Behaves like applying `n` times `prevind` for `n==1`. The only difference
480
  is that if `n` is so large that applying `prevind` would reach `0` then each remaining
481
  iteration decreases the returned value by `1`.
482
  This means that in this case `prevind` can return a negative value.
483

484
* Case `n == 0`
485

486
  Return `i` only if `i` is a valid index in `str` or is equal to `ncodeunits(str)+1`.
487
  Otherwise `StringIndexError` or `BoundsError` is thrown.
488

489
# Examples
490
```jldoctest
491
julia> prevind("α", 3)
492
1
493

494
julia> prevind("α", 1)
495
0
496

497
julia> prevind("α", 0)
498
ERROR: BoundsError: attempt to access 2-codeunit String at index [0]
499
[...]
500

501
julia> prevind("α", 2, 2)
502
0
503

504
julia> prevind("α", 2, 3)
505
-1
506
```
507
"""
UNCOV
508
prevind(s::AbstractString, i::Integer, n::Integer) = prevind(s, Int(i), Int(n))
×
UNCOV
509
prevind(s::AbstractString, i::Integer)             = prevind(s, Int(i))
×
510
prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)
13✔
511

UNCOV
512
function prevind(s::AbstractString, i::Int, n::Int)
×
UNCOV
513
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
×
UNCOV
514
    z = ncodeunits(s) + 1
×
UNCOV
515
    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
×
UNCOV
516
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
×
UNCOV
517
    while n > 0 && 1 < i
×
UNCOV
518
        @inbounds n -= isvalid(s, i -= 1)
×
UNCOV
519
    end
×
UNCOV
520
    return i - n
×
521
end
522

523
"""
524
    nextind(str::AbstractString, i::Integer, n::Integer=1) -> Int
525

526
* Case `n == 1`
527

528
  If `i` is in bounds in `str` return the index of the start of the character whose
529
  encoding starts after index `i`. In other words, if `i` is the start of a
530
  character, return the start of the next character; if `i` is not the start
531
  of a character, move forward until the start of a character and return that index.
532
  If `i` is equal to `0` return `1`.
533
  If `i` is in bounds but greater than or equal to `lastindex(str)` return `ncodeunits(str)+1`.
534
  Otherwise throw `BoundsError`.
535

536
* Case `n > 1`
537

538
  Behaves like applying `n` times `nextind` for `n==1`. The only difference
539
  is that if `n` is so large that applying `nextind` would reach `ncodeunits(str)+1` then
540
  each remaining iteration increases the returned value by `1`. This means that in this
541
  case `nextind` can return a value greater than `ncodeunits(str)+1`.
542

543
* Case `n == 0`
544

545
  Return `i` only if `i` is a valid index in `str` or is equal to `0`.
546
  Otherwise `StringIndexError` or `BoundsError` is thrown.
547

548
# Examples
549
```jldoctest
550
julia> nextind("α", 0)
551
1
552

553
julia> nextind("α", 1)
554
3
555

556
julia> nextind("α", 3)
557
ERROR: BoundsError: attempt to access 2-codeunit String at index [3]
558
[...]
559

560
julia> nextind("α", 0, 2)
561
3
562

563
julia> nextind("α", 1, 2)
564
4
565
```
566
"""
UNCOV
567
nextind(s::AbstractString, i::Integer, n::Integer) = nextind(s, Int(i), Int(n))
×
UNCOV
568
nextind(s::AbstractString, i::Integer)             = nextind(s, Int(i))
×
UNCOV
569
nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)
×
570

UNCOV
571
function nextind(s::AbstractString, i::Int, n::Int)
×
UNCOV
572
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
×
UNCOV
573
    z = ncodeunits(s)
×
UNCOV
574
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
×
UNCOV
575
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
×
UNCOV
576
    while n > 0 && i < z
×
UNCOV
577
        @inbounds n -= isvalid(s, i += 1)
×
UNCOV
578
    end
×
UNCOV
579
    return i + n
×
580
end
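
The multi-step forms of `nextind` and `prevind` can be used to locate the n-th character from either end without scanning by hand. The sketch below assumes a `String` input; `nth_index` is a hypothetical helper introduced only for this example.

```julia
# Hypothetical helper: code unit index of the n-th character (n ≥ 1),
# obtained by stepping n characters forward from the imaginary index 0.
nth_index(s::AbstractString, n::Integer) = nextind(s, 0, n)

s = "jμΛIα"                                             # 5 characters, 8 code units

third = s[nth_index(s, 3)]                              # third character
second_to_last = s[prevind(s, ncodeunits(s) + 1, 2)]    # two characters back from the end
overshoot = nextind(s, 0, 7)                            # steps past the end advance by 1 each
```

The `overshoot` value lies beyond `ncodeunits(s) + 1`, which is fine as an intermediate result as long as it is never used to retrieve a character.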
581

582
## string index iteration type ##
583

584
struct EachStringIndex{T<:AbstractString}
UNCOV
585
    s::T
×
586
end
UNCOV
587
keys(s::AbstractString) = EachStringIndex(s)
×
588

UNCOV
589
length(e::EachStringIndex) = length(e.s)
×
590
first(::EachStringIndex) = 1
×
591
last(e::EachStringIndex) = lastindex(e.s)
36✔
UNCOV
592
iterate(e::EachStringIndex, state=firstindex(e.s)) = state > ncodeunits(e.s) ? nothing : (state, nextind(e.s, state))
×
UNCOV
593
eltype(::Type{<:EachStringIndex}) = Int
×
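
As a small illustration of the index iterator above, `keys` (and therefore `eachindex`) on a string visits only the valid character indices:

```julia
s = "αβγ"

idx = collect(keys(s))                 # one index per character
same = collect(eachindex(s)) == idx    # eachindex yields the same indices
chars = [s[i] for i in keys(s)]        # every index produced is safe to dereference
```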
594

595
"""
596
    isascii(c::Union{AbstractChar,AbstractString}) -> Bool
597

598
Test whether a character belongs to the ASCII character set, or whether this is true for
599
all elements of a string.
600

601
# Examples
602
```jldoctest
603
julia> isascii('a')
604
true
605

606
julia> isascii('α')
607
false
608

609
julia> isascii("abc")
610
true
611

612
julia> isascii("αβγ")
613
false
614
```
615
For example, `isascii` can be used as a predicate function for [`filter`](@ref) or [`replace`](@ref)
616
to remove or replace non-ASCII characters, respectively:
617
```jldoctest
618
julia> filter(isascii, "abcdeγfgh") # discard non-ASCII chars
619
"abcdefgh"
620

621
julia> replace("abcdeγfgh", !isascii=>' ') # replace non-ASCII chars with spaces
622
"abcde fgh"
623
```
624
"""
625
isascii(c::Char) = bswap(reinterpret(UInt32, c)) < 0x80
4,053✔
UNCOV
626
isascii(s::AbstractString) = all(isascii, s)
×
UNCOV
627
isascii(c::AbstractChar) = UInt32(c) < 0x80
×
628

629
@inline function _isascii(code_units::AbstractVector{CU}, first, last) where {CU}
UNCOV
630
    r = zero(CU)
×
UNCOV
631
    for n = first:last
×
UNCOV
632
        @inbounds r |= code_units[n]
×
UNCOV
633
    end
×
UNCOV
634
    return 0 ≤ r < 0x80
×
635
end
636

637
# The chunking algorithm makes the last two chunks overlap in order to keep the size fixed
UNCOV
638
@inline function _isascii_chunks(chunk_size, cu::AbstractVector{CU}, first, last) where {CU}
×
639
    n=first
×
UNCOV
640
    while n <= last - chunk_size
×
UNCOV
641
        _isascii(cu,n,n+chunk_size-1) || return false
×
UNCOV
642
        n += chunk_size
×
UNCOV
643
    end
×
UNCOV
644
    return  _isascii(cu,last-chunk_size+1,last)
×
645
end
646
"""
647
    isascii(cu::AbstractVector{CU}) where {CU <: Integer} -> Bool
648

649
Test whether all values in the vector belong to the ASCII character set (0x00 to 0x7f).
650
This function is intended to be used by other string implementations that need a fast ASCII check.
651
"""
UNCOV
652
function isascii(cu::AbstractVector{CU}) where {CU <: Integer}
×
653
    chunk_size = 1024
×
654
    chunk_threshold =  chunk_size + (chunk_size ÷ 2)
×
UNCOV
655
    first = firstindex(cu);   last = lastindex(cu)
×
UNCOV
656
    l = last - first + 1
×
UNCOV
657
    l < chunk_threshold && return _isascii(cu,first,last)
×
UNCOV
658
    return _isascii_chunks(chunk_size,cu,first,last)
×
659
end
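
The overlapping-chunk idea used above can be sketched independently of Base. The functions below are hypothetical re-implementations written for illustration, with a caller-supplied chunk size and a simplified size threshold; they are not the Base internals themselves.

```julia
# Hypothetical helper: OR all bytes in lo:hi together and test the high bit once.
all_ascii(bytes, lo, hi) = reduce(|, view(bytes, lo:hi); init=0x00) < 0x80

# Hypothetical sketch of fixed-size chunking with an overlapping final chunk.
function chunked_all_ascii(bytes::AbstractVector{UInt8}, chunk_size::Int)
    lo, hi = firstindex(bytes), lastindex(bytes)
    # Inputs shorter than one chunk are checked in a single pass.
    hi - lo + 1 < chunk_size && return all_ascii(bytes, lo, hi)
    n = lo
    while n <= hi - chunk_size
        all_ascii(bytes, n, n + chunk_size - 1) || return false   # early exit per chunk
        n += chunk_size
    end
    # The final chunk is anchored at the end, so it may overlap the previous one.
    return all_ascii(bytes, hi - chunk_size + 1, hi)
end

ok  = chunked_all_ascii(codeunits("hello world!"), 5)
bad = chunked_all_ascii(codeunits("hello wörld"), 5)
```

Re-checking a few overlapping bytes is harmless for an all-or-nothing predicate, and it keeps every chunk the same length.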
660

661
## string map, filter ##
662

663
function map(f, s::AbstractString)
2,007✔
664
    out = StringVector(max(4, sizeof(s)::Int÷sizeof(codeunit(s)::CodeunitType)))
2,007✔
665
    index = UInt(1)
2,007✔
666
    for c::AbstractChar in s
4,014✔
667
        c′ = f(c)
18,663✔
668
        isa(c′, AbstractChar) || throw(ArgumentError(
18,663✔
669
            "map(f, s::AbstractString) requires f to return AbstractChar; " *
670
            "try map(f, collect(s)) or a comprehension instead"))
671
        index + 3 > length(out) && resize!(out, unsigned(2 * length(out)))
18,663✔
672
        index += __unsafe_string!(out, convert(Char, c′), index)
18,663✔
673
    end
35,319✔
674
    resize!(out, index-1)
2,007✔
675
    sizehint!(out, index-1)
2,007✔
676
    return String(out)
2,007✔
677
end
678

UNCOV
679
function filter(f, s::AbstractString)
×
UNCOV
680
    out = IOBuffer(sizehint=sizeof(s))
×
UNCOV
681
    for c in s
×
UNCOV
682
        f(c) && write(out, c)
×
UNCOV
683
    end
×
UNCOV
684
    String(_unsafe_take!(out))
×
685
end
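
A brief usage sketch of the string `map` and `filter` methods above. As the error message in `map` suggests, a function that does not return characters has to be applied to `collect(s)` instead:

```julia
upper   = map(uppercase, "julia")          # f returns characters, so the result is a String
letters = filter(isletter, "a1b2γ3")       # keeps only characters for which f is true

# Mapping to non-characters over a string is rejected ...
rejected = try
    map(c -> Int(c), "ab")
    false
catch err
    err isa ArgumentError
end

# ... but works on the collected characters, producing a Vector.
codes = map(c -> Int(c), collect("ab"))
```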
686

687
## string first and last ##
688

689
"""
690
    first(s::AbstractString, n::Integer)
691

692
Get a string consisting of the first `n` characters of `s`.
693

694
# Examples
695
```jldoctest
696
julia> first("∀ϵ≠0: ϵ²>0", 0)
697
""
698

699
julia> first("∀ϵ≠0: ϵ²>0", 1)
700
"∀"
701

702
julia> first("∀ϵ≠0: ϵ²>0", 3)
703
"∀ϵ≠"
704
```
705
"""
UNCOV
706
first(s::AbstractString, n::Integer) = @inbounds s[1:min(end, nextind(s, 0, n))]
×
707

708
"""
709
    last(s::AbstractString, n::Integer)
710

711
Get a string consisting of the last `n` characters of `s`.
712

713
# Examples
714
```jldoctest
715
julia> last("∀ϵ≠0: ϵ²>0", 0)
716
""
717

718
julia> last("∀ϵ≠0: ϵ²>0", 1)
719
"0"
720

721
julia> last("∀ϵ≠0: ϵ²>0", 3)
722
"²>0"
723
```
724
"""
UNCOV
725
last(s::AbstractString, n::Integer) = @inbounds s[max(1, prevind(s, ncodeunits(s)+1, n)):end]
×
726

727
"""
728
    reverseind(v, i)
729

730
Given an index `i` in [`reverse(v)`](@ref), return the corresponding index in
731
`v` so that `v[reverseind(v,i)] == reverse(v)[i]`. (This can be nontrivial in
732
cases where `v` contains non-ASCII characters.)
733

734
# Examples
735
```jldoctest
736
julia> s = "Julia🚀"
737
"Julia🚀"
738

739
julia> r = reverse(s)
740
"🚀ailuJ"
741

742
julia> for i in eachindex(s)
743
           print(r[reverseind(r, i)])
744
       end
745
Julia🚀
746
```
747
"""
UNCOV
748
reverseind(s::AbstractString, i::Integer) = thisind(s, ncodeunits(s)-i+1)
×
749

750
"""
751
    repeat(s::AbstractString, r::Integer)
752

753
Repeat a string `r` times. This can be written as `s^r`.
754

755
See also [`^`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).
756

757
# Examples
758
```jldoctest
759
julia> repeat("ha", 3)
760
"hahaha"
761
```
762
"""
UNCOV
763
repeat(s::AbstractString, r::Integer) = repeat(String(s), r)
×
764

765
"""
766
    ^(s::Union{AbstractString,AbstractChar}, n::Integer) -> AbstractString
767

768
Repeat a string or character `n` times. This can also be written as `repeat(s, n)`.
769

770
See also [`repeat`](@ref).
771

772
# Examples
773
```jldoctest
774
julia> "Test "^3
775
"Test Test Test "
776
```
777
"""
778
(^)(s::Union{AbstractString,AbstractChar}, r::Integer) = repeat(s, r)
523✔
779

780
# reverse-order iteration for strings and indices thereof
UNCOV
781
iterate(r::Iterators.Reverse{<:AbstractString}, i=lastindex(r.itr)) = i < firstindex(r.itr) ? nothing : (r.itr[i], prevind(r.itr, i))
×
UNCOV
782
iterate(r::Iterators.Reverse{<:EachStringIndex}, i=lastindex(r.itr.s)) = i < firstindex(r.itr.s) ? nothing : (i, prevind(r.itr.s, i))
×
783

784
## code unit access ##
785

786
"""
787
    CodeUnits(s::AbstractString)
788

789
Wrap a string (without copying) in an immutable vector-like object that accesses the code units
790
of the string's representation.
791
"""
792
struct CodeUnits{T,S<:AbstractString} <: DenseVector{T}
793
    s::S
794
    CodeUnits(s::S) where {S<:AbstractString} = new{codeunit(s),S}(s)
394,086✔
795
end
796

797
length(s::CodeUnits) = ncodeunits(s.s)
3,547,761✔
UNCOV
798
sizeof(s::CodeUnits{T}) where {T} = ncodeunits(s.s) * sizeof(T)
×
UNCOV
799
size(s::CodeUnits) = (length(s),)
×
UNCOV
800
elsize(s::Type{<:CodeUnits{T}}) where {T} = sizeof(T)
×
801
@propagate_inbounds getindex(s::CodeUnits, i::Int) = codeunit(s.s, i)
3,194,381✔
UNCOV
802
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
×
803
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing
3,698,054✔
804

805

UNCOV
806
write(io::IO, s::CodeUnits) = write(io, s.s)
×
807

UNCOV
808
cconvert(::Type{Ptr{T}},    s::CodeUnits{T}) where {T} = cconvert(Ptr{T}, s.s)
×
809
cconvert(::Type{Ptr{Int8}}, s::CodeUnits{UInt8}) = cconvert(Ptr{Int8}, s.s)
×
810

811
"""
812
    codeunits(s::AbstractString)
813

814
Obtain a vector-like object containing the code units of a string.
815
Returns a `CodeUnits` wrapper by default, but `codeunits` may optionally be defined
816
for new string types if necessary.
817

818
# Examples
819
```jldoctest
820
julia> codeunits("Juλia")
821
6-element Base.CodeUnits{UInt8, String}:
822
 0x4a
823
 0x75
824
 0xce
825
 0xbb
826
 0x69
827
 0x61
828
```
829
"""
830
codeunits(s::AbstractString) = CodeUnits(s)
394,086✔
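
Since the `CodeUnits` wrapper is a `DenseVector`, the usual read-only array operations apply to it. A minimal sketch using the string from the docstring example:

```julia
cu = codeunits("Juλia")

n = length(cu)             # number of code units, not characters
lambda_bytes = cu[3:4]     # the two UTF-8 code units that encode 'λ'
as_vector = collect(cu)    # copy into an ordinary Vector{UInt8}
```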
831

UNCOV
832
function _split_rest(s::AbstractString, n::Int)
×
UNCOV
833
    lastind = lastindex(s)
×
UNCOV
834
    i = try
×
UNCOV
835
        prevind(s, lastind, n)
×
836
    catch e
837
        e isa BoundsError || rethrow()
×
UNCOV
838
        _check_length_split_rest(length(s), n)
×
839
    end
UNCOV
840
    last_n = SubString(s, nextind(s, i), lastind)
×
UNCOV
841
    front = s[begin:i]
×
UNCOV
842
    return front, last_n
×
843
end