
JuliaLang / julia / #37940

22 Oct 2024 05:36AM UTC coverage: 85.868% (-1.8%) from 87.654%

push

local

web-flow
Remove NewPM pass exports. (#56269)

All ecosystem consumers have switched to the string-based API.

77546 of 90308 relevant lines covered (85.87%)

16057626.0 hits per line

Source File

93.82
/base/strings/basic.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
"""
4
The `AbstractString` type is the supertype of all string implementations in
5
Julia. Strings are encodings of sequences of [Unicode](https://unicode.org/)
6
code points as represented by the `AbstractChar` type. Julia makes a few assumptions
7
about strings:
8

9
* Strings are encoded in terms of fixed-size "code units"
10
  * Code units can be extracted with `codeunit(s, i)`
11
  * The first code unit has index `1`
12
  * The last code unit has index `ncodeunits(s)`
13
  * Any index `i` such that `1 ≤ i ≤ ncodeunits(s)` is in bounds
14
* String indexing is done in terms of these code units:
15
  * Characters are extracted by `s[i]` with a valid string index `i`
16
  * Each `AbstractChar` in a string is encoded by one or more code units
17
  * Only the index of the first code unit of an `AbstractChar` is a valid index
18
  * The encoding of an `AbstractChar` is independent of what precedes or follows it
19
  * String encodings are [self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code) – i.e. `isvalid(s, i)` is O(1)
20

21
Some string functions that extract code units, characters or substrings from
22
strings error if you pass them out-of-bounds or invalid string indices. This
23
includes `codeunit(s, i)` and `s[i]`. Functions that do string
24
index arithmetic take a more relaxed approach to indexing and give you the
25
closest valid string index when in-bounds, or when out-of-bounds, behave as if
26
there were an infinite number of characters padding each side of the string.
27
Usually these imaginary padding characters have code unit length `1` but string
28
types may choose different "imaginary" character sizes as makes sense for their
29
implementations (e.g. substrings may pass index arithmetic through to the
30
underlying string they provide a view into). Relaxed indexing functions include
31
those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
32
model allows index arithmetic to work with out-of-bounds indices as
33
intermediate values so long as one never uses them to retrieve a character,
34
which often helps avoid needing to code around edge cases.
35

36
See also [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref),
37
[`nextind`](@ref), [`prevind`](@ref).
38
"""
39
AbstractString
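As a minimal sketch of the indexing model described in the docstring above (the string literal and variable names are illustrative and not part of the source), the following shows that indices count code units, that only the first code unit of a character is a valid index, and that the index-arithmetic functions accept the out-of-bounds values `0` and `ncodeunits(s) + 1`:

```julia
s = "αβγ"                      # three characters, each two UTF-8 code units

n = ncodeunits(s)              # total number of code units: 6
valid = collect(eachindex(s))  # indices where a character starts: [1, 3, 5]

# Indexing at a character start works; index 2 is in bounds but not valid.
c = s[3]                       # 'β'
ok = isvalid(s, 2)             # false

# Index arithmetic is relaxed and never retrieves a character.
a = thisind(s, 2)              # 1, the start of the character containing code unit 2
b = nextind(s, 5)              # 7 == ncodeunits(s) + 1, one past the end
z = prevind(s, 1)              # 0, one before the start
```

Only `s[i]` and `codeunit(s, i)` are strict about their index here; the three arithmetic functions can pass through `0` and `7` as intermediate values.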
40

41
## required string functions ##
42

43
"""
44
    ncodeunits(s::AbstractString) -> Int
45

46
Return the number of code units in a string. Indices that are in bounds to
47
access this string must satisfy `1 ≤ i ≤ ncodeunits(s)`. Not all such indices
48
are valid – they may not be the start of a character, but they will return a
49
code unit value when calling `codeunit(s,i)`.
50

51
# Examples
52
```jldoctest
53
julia> ncodeunits("The Julia Language")
54
18
55

56
julia> ncodeunits("∫eˣ")
57
6
58

59
julia> ncodeunits('∫'), ncodeunits('e'), ncodeunits('ˣ')
60
(3, 1, 2)
61
```
62

63
See also [`codeunit`](@ref), [`checkbounds`](@ref), [`sizeof`](@ref),
64
[`length`](@ref), [`lastindex`](@ref).
65
"""
66
ncodeunits(s::AbstractString)
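The difference between code units, characters, and bytes is easy to mix up, so here is a small sketch that compares the related functions listed under "See also" on the string from the docstring example (variable names are illustrative):

```julia
s = "∫eˣ"

units  = ncodeunits(s)   # 6 code units (3 + 1 + 2 in UTF-8)
chars  = length(s)       # 3 characters
bytes  = sizeof(s)       # 6 bytes, because each code unit of a String is one byte
last_i = lastindex(s)    # 5, the index where the final character 'ˣ' starts
```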
67

68
"""
69
    codeunit(s::AbstractString) -> Type{<:Union{UInt8, UInt16, UInt32}}
70

71
Return the code unit type of the given string object. For ASCII, Latin-1, or
72
UTF-8 encoded strings, this would be `UInt8`; for UCS-2 and UTF-16 it would be
73
`UInt16`; for UTF-32 it would be `UInt32`. The code unit type need not be
74
limited to these three types, but it's hard to think of widely used string
75
encodings that don't use one of these units. `codeunit(s)` is the same as
76
`typeof(codeunit(s,1))` when `s` is a non-empty string.
77

78
See also [`ncodeunits`](@ref).
79
"""
80
codeunit(s::AbstractString)
81

82
const CodeunitType = Union{Type{UInt8},Type{UInt16},Type{UInt32}}
83

84
"""
85
    codeunit(s::AbstractString, i::Integer) -> Union{UInt8, UInt16, UInt32}
86

87
Return the code unit value in the string `s` at index `i`. Note that
88

89
    codeunit(s, i) :: codeunit(s)
90

91
I.e. the value returned by `codeunit(s, i)` is of the type returned by
92
`codeunit(s)`.
93

94
# Examples
95
```jldoctest
96
julia> a = codeunit("Hello", 2)
97
0x65
98

99
julia> typeof(a)
100
UInt8
101
```
102

103
See also [`ncodeunits`](@ref), [`checkbounds`](@ref).
104
"""
105
@propagate_inbounds codeunit(s::AbstractString, i::Integer) = i isa Int ?
3✔
106
    throw(MethodError(codeunit, (s, i))) : codeunit(s, Int(i))
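To illustrate the relationship `codeunit(s, i) :: codeunit(s)` stated above, the following sketch reads every code unit of a short string by index. The string and variable names are assumptions chosen for the example:

```julia
s = "αβ"   # each character takes two UTF-8 code units

# The code unit type is a property of the string.
T = codeunit(s)                                     # UInt8 for String

# Every in-bounds index yields a code unit, including continuation units.
units = [codeunit(s, i) for i in 1:ncodeunits(s)]   # [0xce, 0xb1, 0xce, 0xb2]

same_type    = all(u -> u isa T, units)             # true
matches_view = units == codeunits(s)                # true, same values as the CodeUnits wrapper
```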
107

108
"""
109
    isvalid(s::AbstractString, i::Integer) -> Bool
110

111
Predicate indicating whether the given index is the start of the encoding of a
112
character in `s` or not. If `isvalid(s, i)` is true then `s[i]` will return the
113
character whose encoding starts at that index; if it's false, then `s[i]` will
114
raise an invalid index error or a bounds error depending on whether `i` is in bounds.
115
In order for `isvalid(s, i)` to be an O(1) function, the encoding of `s` must be
116
[self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code). This
117
is a basic assumption of Julia's generic string support.
118

119
See also [`getindex`](@ref), [`iterate`](@ref), [`thisind`](@ref),
120
[`nextind`](@ref), [`prevind`](@ref), [`length`](@ref).
121

122
# Examples
123
```jldoctest
124
julia> str = "αβγdef";
125

126
julia> isvalid(str, 1)
127
true
128

129
julia> str[1]
130
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
131

132
julia> isvalid(str, 2)
133
false
134

135
julia> str[2]
136
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'β'
137
Stacktrace:
138
[...]
139
```
140
"""
141
@propagate_inbounds isvalid(s::AbstractString, i::Integer) = i isa Int ?
339,933✔
142
    throw(MethodError(isvalid, (s, i))) : isvalid(s, Int(i))
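One way to see what `isvalid(s, i)` reports is to scan every in-bounds index and keep the ones that start a character. This is only a sketch using the string from the docstring example:

```julia
str = "αβγdef"

# Keep the code unit indices at which a character begins.
starts = [i for i in 1:ncodeunits(str) if isvalid(str, i)]   # [1, 3, 5, 7, 8, 9]

# The same indices are produced by eachindex.
agrees = starts == collect(eachindex(str))                   # true
```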
143

144
"""
145
    iterate(s::AbstractString, i::Integer) -> Union{Tuple{<:AbstractChar, Int}, Nothing}
146

147
Return a tuple of the character in `s` at index `i` with the index of the start
148
of the following character in `s`. This is the key method that allows strings to
149
be iterated, yielding a sequence of characters. The `iterate` function, as part
150
of the iteration protocol, may assume that `i` is the start of a character in `s`.
151

152
See also [`getindex`](@ref), [`checkbounds`](@ref).
153
"""
154
@propagate_inbounds iterate(s::AbstractString, i::Integer) = i isa Int ?
339,916✔
155
    throw(MethodError(iterate, (s, i))) : iterate(s, Int(i))
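The sketch below spells out how the two-argument `iterate` drives a loop over a string. `collect_chars` is a hypothetical helper written for this example, not a function from the source:

```julia
s = "αβ"

first_step  = iterate(s, 1)   # ('α', 3)
second_step = iterate(s, 3)   # ('β', 5)
finished    = iterate(s, 5)   # nothing, because index 5 is past the last code unit

# Hypothetical helper: roughly what `for c in s` does under the iteration protocol.
function collect_chars(s::AbstractString)
    chars = Char[]
    next = iterate(s)
    while next !== nothing
        (c, i) = next          # current character and the index of the next one
        push!(chars, c)
        next = iterate(s, i)
    end
    return chars
end

chars = collect_chars(s)      # ['α', 'β']
```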
156

157
## basic generic definitions ##
158

159
eltype(::Type{<:AbstractString}) = Char # some string types may use another AbstractChar
×
160

161
"""
162
    sizeof(str::AbstractString)
163

164
Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by
165
the size, in bytes, of one code unit in `str`.
166

167
# Examples
168
```jldoctest
169
julia> sizeof("")
170
0
171

172
julia> sizeof("∀")
173
3
174
```
175
"""
176
sizeof(s::AbstractString) = ncodeunits(s)::Int * sizeof(codeunit(s)::CodeunitType)
21,481,084✔
177
firstindex(s::AbstractString) = 1
×
178
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)::Int)
29,150,613✔
179
isempty(s::AbstractString) = iszero(ncodeunits(s)::Int)
14,038,453✔
180

181
@propagate_inbounds first(s::AbstractString) = s[firstindex(s)]
8,794,644✔
182

183
function getindex(s::AbstractString, i::Integer)
170,185✔
184
    @boundscheck checkbounds(s, i)
174,341✔
185
    @inbounds return isvalid(s, i) ? (iterate(s, i)::NTuple{2,Any})[1] : string_index_err(s, i)
174,339✔
186
end
187

188
getindex(s::AbstractString, i::Colon) = s
1✔
189
# TODO: handle other ranges with stride ±1 specially?
190
# TODO: add more @propagate_inbounds annotations?
191
getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
4✔
192
    sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v))
4✔
193
getindex(s::AbstractString, v::AbstractVector{Bool}) =
2✔
194
    throw(ArgumentError("logical indexing not supported for strings"))
195

196
function get(s::AbstractString, i::Integer, default)
5✔
197
# TODO: use ternary once @inbounds is expression-like
198
    if checkbounds(Bool, s, i)
5✔
199
        @inbounds return s[i]
3✔
200
    else
201
        return default
2✔
202
    end
203
end
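A short usage sketch for `get` on strings, assuming nothing beyond the method defined above: the default is returned only when the index is out of bounds.

```julia
s = "abc"

hit      = get(s, 2, '?')       # 'b'
miss     = get(s, 10, '?')      # '?', since index 10 is out of bounds
zero_idx = get(s, 0, nothing)   # nothing, since index 0 is out of bounds
```

Note that the check is a bounds check only. An index that is in bounds but falls inside a multi-unit character is still passed to `s[i]`.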
204

205
## bounds checking ##
206

207
checkbounds(::Type{Bool}, s::AbstractString, i::Integer) =
754,277,621✔
208
    1 ≤ i ≤ ncodeunits(s)::Int
209
checkbounds(::Type{Bool}, s::AbstractString, r::AbstractRange{<:Integer}) =
33,947,151✔
210
    isempty(r) || (1 ≤ minimum(r) && maximum(r) ≤ ncodeunits(s)::Int)
211
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Real}) =
1✔
212
    all(i -> checkbounds(Bool, s, i), I)
1✔
213
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Integer}) =
7✔
214
    all(i -> checkbounds(Bool, s, i), I)
22✔
215
checkbounds(s::AbstractString, I::Union{Integer,AbstractArray}) =
620,593,560✔
216
    checkbounds(Bool, s, I) ? nothing : throw(BoundsError(s, I))
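The following sketch exercises the `Bool`-returning `checkbounds` methods above and highlights that "in bounds" is a weaker property than "valid". The string is an illustrative choice:

```julia
s = "αβ"   # 4 code units; characters start at indices 1 and 3

in_bounds     = checkbounds(Bool, s, 4)     # true: 1 ≤ 4 ≤ ncodeunits(s)
out_of_bounds = checkbounds(Bool, s, 5)     # false
range_ok      = checkbounds(Bool, s, 1:4)   # true
empty_ok      = checkbounds(Bool, s, 5:4)   # true: an empty range is always in bounds

# Index 2 lies inside the encoding of 'α': in bounds, but not a valid index.
in_bounds_but_invalid = checkbounds(Bool, s, 2) && !isvalid(s, 2)   # true
```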
217

218
## construction, conversion, promotion ##
219

220
string() = ""
22✔
221
string(s::AbstractString) = s
172✔
222

223
Vector{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
×
224
Array{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
1✔
225
Vector{T}(s::AbstractString) where {T<:AbstractChar} = collect(T, s)
523✔
226

227
Symbol(s::AbstractString) = Symbol(String(s))
1✔
228
Symbol(x...) = Symbol(string(x...))
145,019✔
229

230
convert(::Type{T}, s::T) where {T<:AbstractString} = s
21✔
231
convert(::Type{T}, s::AbstractString) where {T<:AbstractString} = T(s)::T
259,888✔
232

233
## summary ##
234

235
function summary(io::IO, s::AbstractString)
3✔
236
    prefix = isempty(s) ? "empty" : string(ncodeunits(s), "-codeunit")
5✔
237
    print(io, prefix, " ", typeof(s))
3✔
238
end
239

240
## string & character concatenation ##
241

242
"""
243
    *(s::Union{AbstractString, AbstractChar}, t::Union{AbstractString, AbstractChar}...) -> AbstractString
244

245
Concatenate strings and/or characters, producing a [`String`](@ref) or
246
[`AnnotatedString`](@ref) (as appropriate). This is equivalent to calling the
247
[`string`](@ref) or [`annotatedstring`](@ref) function on the arguments. Concatenation of built-in string
248
types always produces a value of type `String` but other string types may choose
249
to return a string of a different type as appropriate.
250

251
# Examples
252
```jldoctest
253
julia> "Hello " * "world"
254
"Hello world"
255

256
julia> 'j' * "ulia"
257
"julia"
258
```
259
"""
260
function (*)(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...)
2,361✔
261
    if _isannotated(s1) || any(_isannotated, ss)
675✔
262
        annotatedstring(s1, ss...)
1,739✔
263
    else
264
        string(s1, ss...)
5,907,281✔
265
    end
266
end
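As a brief illustration of the method above, strings and characters can be mixed freely in one `*` chain, and built-in string types produce a `String`. The values here are made up for the example:

```julia
greeting = "Hello" * ',' * ' ' * "world"   # "Hello, world"

# A SubString operand still yields a String.
sub    = SubString("concatenate", 1, 3)    # "con"
joined = sub * "cat"                       # "concat"
```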
267

268
one(::Union{T,Type{T}}) where {T<:AbstractString} = convert(T, "")
4✔
269

270
# This could be written as a single statement with three ||-clauses, however then effect
271
# analysis thinks it may throw and runtime checks are added.
272
# Also see `substring.jl` for the `::SubString{T}` method.
273
_isannotated(S::Type) = S != Union{} && (S <: AnnotatedString || S <: AnnotatedChar)
201✔
274
_isannotated(s) = _isannotated(typeof(s))
177✔
275

276
## generic string comparison ##
277

278
"""
279
    cmp(a::AbstractString, b::AbstractString) -> Int
280

281
Compare two strings. Return `0` if both strings have the same length and the character
282
at each index is the same in both strings. Return `-1` if `a` is a prefix of `b`, or if
283
`a` comes before `b` in alphabetical order. Return `1` if `b` is a prefix of `a`, or if
284
`b` comes before `a` in alphabetical order (technically, lexicographical order by Unicode
285
code points).
286

287
# Examples
288
```jldoctest
289
julia> cmp("abc", "abc")
290
0
291

292
julia> cmp("ab", "abc")
293
-1
294

295
julia> cmp("abc", "ab")
296
1
297

298
julia> cmp("ab", "ac")
299
-1
300

301
julia> cmp("ac", "ab")
302
1
303

304
julia> cmp("α", "a")
305
1
306

307
julia> cmp("b", "β")
308
-1
309
```
310
"""
311
function cmp(a::AbstractString, b::AbstractString)
373✔
312
    a === b && return 0
373✔
313
    (iv1, iv2) = (iterate(a), iterate(b))
551✔
314
    while iv1 !== nothing && iv2 !== nothing
3,632✔
315
        (c, d) = (first(iv1)::AbstractChar, first(iv2)::AbstractChar)
3,319✔
316
        c ≠ d && return ifelse(c < d, -1, 1)
3,319✔
317
        (iv1, iv2) = (iterate(a, last(iv1)), iterate(b, last(iv2)))
6,199✔
318
    end
3,276✔
319
    return iv1 === nothing ? (iv2 === nothing ? 0 : -1) : 1
313✔
320
end
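Because `cmp` orders strings by Unicode code point rather than by dictionary rules, uppercase ASCII letters sort before lowercase ones. A small sketch with illustrative data:

```julia
words = ["b", "a", "Z"]

by_code_point = sort(words)                 # ["Z", "a", "b"], since 'Z' (U+005A) < 'a' (U+0061)
upper_first   = cmp("Z", "a")               # -1
greek_after   = cmp("α", "a")               # 1, since 'α' (U+03B1) > 'a'

# A case-insensitive order needs an explicit key.
ignoring_case = sort(words, by=lowercase)   # ["a", "b", "Z"]
```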
321

322
"""
323
    ==(a::AbstractString, b::AbstractString) -> Bool
324

325
Test whether two strings are equal character by character (technically, Unicode
326
code point by code point). Should either string be an [`AnnotatedString`](@ref), the
327
string properties must match too.
328

329
# Examples
330
```jldoctest
331
julia> "abc" == "abc"
332
true
333

334
julia> "abc" == "αβγ"
335
false
336
```
337
"""
338
==(a::AbstractString, b::AbstractString) = cmp(a, b) == 0
314✔
339

340
"""
341
    isless(a::AbstractString, b::AbstractString) -> Bool
342

343
Test whether string `a` comes before string `b` in alphabetical order
344
(technically, in lexicographical order by Unicode code points).
345

346
# Examples
347
```jldoctest
348
julia> isless("a", "b")
349
true
350

351
julia> isless("β", "α")
352
false
353

354
julia> isless("a", "a")
355
false
356
```
357
"""
358
isless(a::AbstractString, b::AbstractString) = cmp(a, b) < 0
9,800,256✔
359

360
# faster comparisons for symbols
361

362
@assume_effects :total function cmp(a::Symbol, b::Symbol)
363
    Int(sign(ccall(:strcmp, Int32, (Cstring, Cstring), a, b)))
37,759,080✔
364
end
365

366
isless(a::Symbol, b::Symbol) = cmp(a, b) < 0
37,737,518✔
367

368
# hashing
369

370
hash(s::AbstractString, h::UInt) = hash(String(s), h)
1✔
371

372
## character index arithmetic ##
373

374
"""
375
    length(s::AbstractString) -> Int
376
    length(s::AbstractString, i::Integer, j::Integer) -> Int
377

378
Return the number of characters in string `s` from indices `i` through `j`.
379

380
This is computed as the number of code unit indices from `i` to `j` which are
381
valid character indices. With only a single string argument, this computes
382
the number of characters in the entire string. With `i` and `j` arguments it
383
computes the number of indices between `i` and `j` inclusive that are valid
384
indices in the string `s`. In addition to in-bounds values, `i` may take the
385
out-of-bounds value `ncodeunits(s) + 1` and `j` may take the out-of-bounds
386
value `0`.
387

388
!!! note
389
    The time complexity of this operation is linear in general. That is, it
390
    will take time proportional to the number of bytes or characters in
391
    the string because it counts them on the fly. This is in contrast to
392
    the method for arrays, which is a constant-time operation.
393

394
See also [`isvalid`](@ref), [`ncodeunits`](@ref), [`lastindex`](@ref),
395
[`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref).
396

397
# Examples
398
```jldoctest
399
julia> length("jμΛIα")
400
5
401
```
402
"""
403
length(s::AbstractString) = @inbounds return length(s, 1, ncodeunits(s)::Int)
3,366✔
404

405
function length(s::AbstractString, i::Int, j::Int)
59,573✔
406
    @boundscheck begin
59,575✔
407
        0 < i ≤ ncodeunits(s)::Int+1 || throw(BoundsError(s, i))
59,575✔
408
        0 ≤ j < ncodeunits(s)::Int+1 || throw(BoundsError(s, j))
59,577✔
409
    end
410
    n = 0
59,572✔
411
    for k = i:j
88,085✔
412
        @inbounds n += isvalid(s, k)
1,703,882✔
413
    end
3,375,540✔
414
    return n
59,572✔
415
end
416

417
@propagate_inbounds length(s::AbstractString, i::Integer, j::Integer) =
1✔
418
    length(s, Int(i), Int(j))
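The three-argument form counts the valid character indices in a code unit range. The sketch below uses an illustrative string and shows the special out-of-bounds value allowed for `i`:

```julia
s = "aαbβ"   # 'a' at 1, 'α' at 2:3, 'b' at 4, 'β' at 5:6

total = length(s)          # 4
head  = length(s, 1, 3)    # 2: characters start at 1 and 2; 3 is a continuation unit
tail  = length(s, 4, 6)    # 2: characters start at 4 and 5
none  = length(s, 7, 6)    # 0: i == ncodeunits(s) + 1 is accepted
```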
419

420
"""
421
    thisind(s::AbstractString, i::Integer) -> Int
422

423
If `i` is in bounds in `s` return the index of the start of the character whose
424
encoding code unit `i` is part of. In other words, if `i` is the start of a
425
character, return `i`; if `i` is not the start of a character, rewind until the
426
start of a character and return that index. If `i` is equal to 0 or `ncodeunits(s)+1`
427
return `i`. In all other cases throw `BoundsError`.
428

429
# Examples
430
```jldoctest
431
julia> thisind("α", 0)
432
0
433

434
julia> thisind("α", 1)
435
1
436

437
julia> thisind("α", 2)
438
1
439

440
julia> thisind("α", 3)
441
3
442

443
julia> thisind("α", 4)
444
ERROR: BoundsError: attempt to access 2-codeunit String at index [4]
445
[...]
446

447
julia> thisind("α", -1)
448
ERROR: BoundsError: attempt to access 2-codeunit String at index [-1]
449
[...]
450
```
451
"""
452
thisind(s::AbstractString, i::Integer) = thisind(s, Int(i))
4✔
453

454
function thisind(s::AbstractString, i::Int)
1,698✔
455
    z = ncodeunits(s)::Int + 1
1,798✔
456
    i == z && return i
1,798✔
457
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
1,801✔
458
    @inbounds while 1 < i && !(isvalid(s, i)::Bool)
1,943✔
459
        i -= 1
1,614✔
460
    end
1,614✔
461
    return i
1,777✔
462
end
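A typical use of `thisind` is to snap an arbitrary code unit offset back to the start of the character that contains it. A minimal sketch with an illustrative string:

```julia
s = "aαb"   # 'a' at 1, 'α' at 2:3, 'b' at 4

snapped   = thisind(s, 3)                   # 2: code unit 3 belongs to 'α', which starts at 2
unchanged = thisind(s, 4)                   # 4: already a character start
low_edge  = thisind(s, 0)                   # 0 is returned as is
high_edge = thisind(s, ncodeunits(s) + 1)   # 5 is returned as is

# After snapping, the index is safe to use for character access.
ch = s[thisind(s, 3)]                       # 'α'
```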
463

464
"""
465
    prevind(str::AbstractString, i::Integer, n::Integer=1) -> Int
466

467
* Case `n == 1`
468

469
  If `i` is in bounds in `str` return the index of the start of the character whose
470
  encoding starts before index `i`. In other words, if `i` is the start of a
471
  character, return the start of the previous character; if `i` is not the start
472
  of a character, rewind until the start of a character and return that index.
473
  If `i` is equal to `1` return `0`.
474
  If `i` is equal to `ncodeunits(str)+1` return `lastindex(str)`.
475
  Otherwise throw `BoundsError`.
476

477
* Case `n > 1`
478

479
  Behaves like applying `prevind` with `n == 1` a total of `n` times. The only difference
480
  is that if `n` is so large that applying `prevind` would reach `0` then each remaining
481
  iteration decreases the returned value by `1`.
482
  This means that in this case `prevind` can return a negative value.
483

484
* Case `n == 0`
485

486
  Return `i` only if `i` is a valid index in `str` or is equal to `ncodeunits(str)+1`.
487
  Otherwise `StringIndexError` or `BoundsError` is thrown.
488

489
# Examples
490
```jldoctest
491
julia> prevind("α", 3)
492
1
493

494
julia> prevind("α", 1)
495
0
496

497
julia> prevind("α", 0)
498
ERROR: BoundsError: attempt to access 2-codeunit String at index [0]
499
[...]
500

501
julia> prevind("α", 2, 2)
502
0
503

504
julia> prevind("α", 2, 3)
505
-1
506
```
507
"""
508
prevind(s::AbstractString, i::Integer, n::Integer) = prevind(s, Int(i), Int(n))
2✔
509
prevind(s::AbstractString, i::Integer)             = prevind(s, Int(i))
7,810,929✔
510
prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)
19,817,465✔
511

512
function prevind(s::AbstractString, i::Int, n::Int)
22,293,769✔
513
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
22,293,771✔
514
    z = ncodeunits(s) + 1
22,293,767✔
515
    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
22,293,800✔
516
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
22,293,733✔
517
    while n > 0 && 1 < i
69,100,588✔
518
        @inbounds n -= isvalid(s, i -= 1)
93,766,651✔
519
    end
46,883,897✔
520
    return i - n
22,216,691✔
521
end
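The cases in the `prevind` docstring can be traced by stepping backwards from one past the end of a string. This is a sketch with an illustrative string, not an excerpt from the test suite:

```julia
s = "∀ϵ≠"   # characters start at 1, 4 and 6; 8 code units in total

past_end = ncodeunits(s) + 1               # 9

last_start  = prevind(s, past_end)         # 6 == lastindex(s)
middle      = prevind(s, past_end, 2)      # 4
first_start = prevind(s, past_end, 3)      # 1
overshoot   = prevind(s, past_end, 5)      # -1: the steps land on 6, 4, 1, 0 and then -1
```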
522

523
"""
524
    nextind(str::AbstractString, i::Integer, n::Integer=1) -> Int
525

526
* Case `n == 1`
527

528
  If `i` is in bounds in `str` return the index of the start of the character whose
529
  encoding starts after index `i`. In other words, if `i` is the start of a
530
  character, return the start of the next character; if `i` is not the start
531
  of a character, move forward until the start of a character and return that index.
532
  If `i` is equal to `0` return `1`.
533
  If `i` is in bounds but greater than or equal to `lastindex(str)` return `ncodeunits(str)+1`.
534
  Otherwise throw `BoundsError`.
535

536
* Case `n > 1`
537

538
  Behaves like applying `nextind` with `n == 1` a total of `n` times. The only difference
539
  is that if `n` is so large that applying `nextind` would reach `ncodeunits(str)+1` then
540
  each remaining iteration increases the returned value by `1`. This means that in this
541
  case `nextind` can return a value greater than `ncodeunits(str)+1`.
542

543
* Case `n == 0`
544

545
  Return `i` only if `i` is a valid index in `str` or is equal to `0`.
546
  Otherwise `StringIndexError` or `BoundsError` is thrown.
547

548
# Examples
549
```jldoctest
550
julia> nextind("α", 0)
551
1
552

553
julia> nextind("α", 1)
554
3
555

556
julia> nextind("α", 3)
557
ERROR: BoundsError: attempt to access 2-codeunit String at index [3]
558
[...]
559

560
julia> nextind("α", 0, 2)
561
3
562

563
julia> nextind("α", 1, 2)
564
4
565
```
566
"""
567
nextind(s::AbstractString, i::Integer, n::Integer) = nextind(s, Int(i), Int(n))
2✔
568
nextind(s::AbstractString, i::Integer)             = nextind(s, Int(i))
2✔
569
nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)
269,798✔
570

571
function nextind(s::AbstractString, i::Int, n::Int)
2,797,103✔
572
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
2,797,107✔
573
    z = ncodeunits(s)
2,797,100✔
574
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
2,797,119✔
575
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
2,797,080✔
576
    while n > 0 && i < z
37,109,774✔
577
        @inbounds n -= isvalid(s, i += 1)
68,755,876✔
578
    end
34,380,498✔
579
    return i + n
2,729,276✔
580
end
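Starting from index `0`, repeated calls to `nextind` visit every character start without any special handling of the first character. The helper `char_starts` below is hypothetical and exists only to illustrate that pattern:

```julia
# Hypothetical helper: collect the start index of every character in `s`.
function char_starts(s::AbstractString)
    starts = Int[]
    i = nextind(s, 0)              # 1, even when `s` is empty
    while i <= ncodeunits(s)
        push!(starts, i)
        i = nextind(s, i)
    end
    return starts
end

s = "∀ϵ≠"                          # 8 code units
starts    = char_starts(s)         # [1, 4, 6]
third     = nextind(s, 0, 3)       # 6, the index of the third character
overshoot = nextind(s, 0, 5)       # 10: the steps land on 1, 4, 6, 9 and then 10
```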
581

582
## string index iteration type ##
583

584
struct EachStringIndex{T<:AbstractString}
585
    s::T
1,930✔
586
end
587
keys(s::AbstractString) = EachStringIndex(s)
1,444✔
588

589
length(e::EachStringIndex) = length(e.s)
1✔
590
first(::EachStringIndex) = 1
×
591
last(e::EachStringIndex) = lastindex(e.s)
995,109✔
592
iterate(e::EachStringIndex, state=firstindex(e.s)) = state > ncodeunits(e.s) ? nothing : (state, nextind(e.s, state))
17,327,774✔
593
eltype(::Type{<:EachStringIndex}) = Int
×
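`keys(s)` returns the `EachStringIndex` iterator defined above, which yields only valid character indices. A short sketch with illustrative names:

```julia
s = "αβγ"

idx      = collect(keys(s))               # [1, 3, 5]
lookup   = [(i, s[i]) for i in keys(s)]   # [(1, 'α'), (3, 'β'), (5, 'γ')]
last_idx = last(keys(s))                  # 5 == lastindex(s)
n_idx    = length(keys(s))                # 3, same as length(s)
```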
594

595
"""
596
    isascii(c::Union{AbstractChar,AbstractString}) -> Bool
597

598
Test whether a character belongs to the ASCII character set, or whether this is true for
599
all elements of a string.
600

601
# Examples
602
```jldoctest
603
julia> isascii('a')
604
true
605

606
julia> isascii('α')
607
false
608

609
julia> isascii("abc")
610
true
611

612
julia> isascii("αβγ")
613
false
614
```
615
For example, `isascii` can be used as a predicate function for [`filter`](@ref) or [`replace`](@ref)
616
to remove or replace non-ASCII characters, respectively:
617
```jldoctest
618
julia> filter(isascii, "abcdeγfgh") # discard non-ASCII chars
619
"abcdefgh"
620

621
julia> replace("abcdeγfgh", !isascii=>' ') # replace non-ASCII chars with spaces
622
"abcde fgh"
623
```
624
"""
625
isascii(c::Char) = bswap(reinterpret(UInt32, c)) < 0x80
10,256,419✔
626
isascii(s::AbstractString) = all(isascii, s)
1✔
627
isascii(c::AbstractChar) = UInt32(c) < 0x80
1✔
628

629
@inline function _isascii(code_units::AbstractVector{CU}, first, last) where {CU}
630
    r = zero(CU)
40,602✔
631
    for n = first:last
706,127✔
632
        @inbounds r |= code_units[n]
8,062,869✔
633
    end
15,419,615✔
634
    return 0 ≤ r < 0x80
706,127✔
635
end
636

637
# The chunking algorithm makes the last two chunks overlap in order to keep the size fixed
638
@inline function _isascii_chunks(chunk_size, cu::AbstractVector{CU}, first, last) where {CU}
639
    n=first
×
640
    while n <= last - chunk_size
786✔
641
        _isascii(cu,n,n+chunk_size-1) || return false
780✔
642
        n += chunk_size
732✔
643
    end
732✔
644
    return  _isascii(cu,last-chunk_size+1,last)
30✔
645
end
646
"""
647
    isascii(cu::AbstractVector{CU}) where {CU <: Integer} -> Bool
648

649
Test whether all values in the vector belong to the ASCII character set (0x00 to 0x7f).
650
This function is intended to be used by other string implementations that need a fast ASCII check.
651
"""
652
function isascii(cu::AbstractVector{CU}) where {CU <: Integer}
664,746✔
653
    chunk_size = 1024
×
654
    chunk_threshold =  chunk_size + (chunk_size ÷ 2)
×
655
    first = firstindex(cu);   last = lastindex(cu)
664,746✔
656
    l = last - first + 1
664,746✔
657
    l < chunk_threshold && return _isascii(cu,first,last)
664,746✔
658
    return _isascii_chunks(chunk_size,cu,first,last)
786✔
659
end
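The vector method can be called directly on code units. The sketch below assumes a Julia version whose `Base` includes the `isascii(::AbstractVector{<:Integer})` method shown above. The long inputs are sized to exceed the 1536-element threshold, so they should take the chunked path:

```julia
short_ascii = isascii(codeunits("plain text"))   # true
short_mixed = isascii(codeunits("abcγ"))         # false, 'γ' encodes to bytes ≥ 0x80
boundary    = isascii(UInt8[0x00, 0x7f])         # true, both ends of the ASCII range

# Inputs of 5000 elements are longer than chunk_size + chunk_size ÷ 2 == 1536.
long_ascii = isascii(fill(0x61, 5000))                  # true
long_mixed = isascii(vcat(fill(0x61, 4999), 0x80))      # false, caught in the final chunk
```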
660

661
## string map, filter ##
662

663
function map(f, s::AbstractString)
20,698✔
664
    out = StringVector(max(4, sizeof(s)::Int÷sizeof(codeunit(s)::CodeunitType)))
20,707✔
665
    index = UInt(1)
37✔
666
    for c::AbstractChar in s
41,331✔
667
        c′ = f(c)
201,151✔
668
        isa(c′, AbstractChar) || throw(ArgumentError(
214✔
669
            "map(f, s::AbstractString) requires f to return AbstractChar; " *
670
            "try map(f, collect(s)) or a comprehension instead"))
671
        index + 3 > length(out) && resize!(out, unsigned(2 * length(out)))
201,082✔
672
        index += __unsafe_string!(out, convert(Char, c′), index)
201,163✔
673
    end
381,411✔
674
    resize!(out, index-1)
20,697✔
675
    sizehint!(out, index-1)
20,697✔
676
    return String(out)
20,697✔
677
end
678

679
function filter(f, s::AbstractString)
2✔
680
    out = IOBuffer(sizehint=sizeof(s))
4✔
681
    for c in s
2✔
682
        f(c) && write(out, c)
57✔
683
    end
57✔
684
    String(_unsafe_take!(out))
2✔
685
end
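A usage sketch for the two methods above. It also shows the `ArgumentError` that `map` raises when the function does not return a character (inputs are illustrative):

```julia
upper   = map(uppercase, "julia")        # "JULIA"
shifted = map(c -> c + 1, "abc")         # "bcd"
letters = filter(isletter, "a1b2c3")     # "abc"

# `map` over a string requires `f` to return an AbstractChar.
threw = try
    map(c -> Int(c), "abc")
    false
catch err
    err isa ArgumentError
end
```

When the result is not meant to be a string, `map(f, collect(s))` or a comprehension avoids the restriction, as the error message suggests.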
686

687
## string first and last ##
688

689
"""
690
    first(s::AbstractString, n::Integer)
691

692
Get a string consisting of the first `n` characters of `s`.
693

694
# Examples
695
```jldoctest
696
julia> first("∀ϵ≠0: ϵ²>0", 0)
697
""
698

699
julia> first("∀ϵ≠0: ϵ²>0", 1)
700
"∀"
701

702
julia> first("∀ϵ≠0: ϵ²>0", 3)
703
"∀ϵ≠"
704
```
705
"""
706
first(s::AbstractString, n::Integer) = @inbounds s[1:min(end, nextind(s, 0, n))]
26✔
707

708
"""
709
    last(s::AbstractString, n::Integer)
710

711
Get a string consisting of the last `n` characters of `s`.
712

713
# Examples
714
```jldoctest
715
julia> last("∀ϵ≠0: ϵ²>0", 0)
716
""
717

718
julia> last("∀ϵ≠0: ϵ²>0", 1)
719
"0"
720

721
julia> last("∀ϵ≠0: ϵ²>0", 3)
722
"²>0"
723
```
724
"""
725
last(s::AbstractString, n::Integer) = @inbounds s[max(1, prevind(s, ncodeunits(s)+1, n)):end]
8✔
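Both `first(s, n)` and `last(s, n)` count characters rather than code units, and both clamp `n` to the length of the string. A minimal sketch:

```julia
s = "∀ϵ≠0"

head       = first(s, 2)    # "∀ϵ"
tail       = last(s, 2)     # "≠0"
everything = first(s, 10)   # "∀ϵ≠0", since n exceeds the number of characters
no_chars   = last(s, 0)     # ""
```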
726

727
"""
728
    reverseind(v, i)
729

730
Given an index `i` in [`reverse(v)`](@ref), return the corresponding index in
731
`v` so that `v[reverseind(v,i)] == reverse(v)[i]`. (This can be nontrivial in
732
cases where `v` contains non-ASCII characters.)
733

734
# Examples
735
```jldoctest
736
julia> s = "Julia🚀"
737
"Julia🚀"
738

739
julia> r = reverse(s)
740
"🚀ailuJ"
741

742
julia> for i in eachindex(s)
743
           print(r[reverseind(r, i)])
744
       end
745
Julia🚀
746
```
747
"""
748
reverseind(s::AbstractString, i::Integer) = thisind(s, ncodeunits(s)-i+1)
2,130✔
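The identity `v[reverseind(v, i)] == reverse(v)[i]` from the docstring can be checked directly. This sketch reuses the docstring's string and maps indices of the reversed string back into the original:

```julia
s = "Julia🚀"
r = reverse(s)                              # "🚀ailuJ"

# For every index `i` of `r`, reverseind(s, i) locates the same character in `s`.
consistent = all(s[reverseind(s, i)] == r[i] for i in eachindex(r))

rocket_in_s = reverseind(s, 1)              # 6: '🚀' is first in `r` and starts at 6 in `s`
```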
749

750
"""
751
    repeat(s::AbstractString, r::Integer)
752

753
Repeat a string `r` times. This can be written as `s^r`.
754

755
See also [`^`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).
756

757
# Examples
758
```jldoctest
759
julia> repeat("ha", 3)
760
"hahaha"
761
```
762
"""
763
repeat(s::AbstractString, r::Integer) = repeat(String(s), r)
5✔
764

765
"""
766
    ^(s::Union{AbstractString,AbstractChar}, n::Integer) -> AbstractString
767

768
Repeat a string or character `n` times. This can also be written as `repeat(s, n)`.
769

770
See also [`repeat`](@ref).
771

772
# Examples
773
```jldoctest
774
julia> "Test "^3
775
"Test Test Test "
776
```
777
"""
778
(^)(s::Union{AbstractString,AbstractChar}, r::Integer) = repeat(s, r)
1,056,844✔
779

780
# reverse-order iteration for strings and indices thereof
781
iterate(r::Iterators.Reverse{<:AbstractString}, i=lastindex(r.itr)) = i < firstindex(r.itr) ? nothing : (r.itr[i], prevind(r.itr, i))
4,166,565✔
782
iterate(r::Iterators.Reverse{<:EachStringIndex}, i=lastindex(r.itr.s)) = i < firstindex(r.itr.s) ? nothing : (i, prevind(r.itr.s, i))
×
783

784
## code unit access ##
785

786
"""
787
    CodeUnits(s::AbstractString)
788

789
Wrap a string (without copying) in an immutable vector-like object that accesses the code units
790
of the string's representation.
791
"""
792
struct CodeUnits{T,S<:AbstractString} <: DenseVector{T}
793
    s::S
794
    CodeUnits(s::S) where {S<:AbstractString} = new{codeunit(s),S}(s)
1,925,253✔
795
end
796

797
length(s::CodeUnits) = ncodeunits(s.s)
74,098,293✔
798
sizeof(s::CodeUnits{T}) where {T} = ncodeunits(s.s) * sizeof(T)
1✔
799
size(s::CodeUnits) = (length(s),)
897,496✔
800
elsize(s::Type{<:CodeUnits{T}}) where {T} = sizeof(T)
3✔
801
@propagate_inbounds getindex(s::CodeUnits, i::Int) = codeunit(s.s, i)
163,011,925✔
802
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
1✔
803
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing
73,370,326✔
804

805

806
write(io::IO, s::CodeUnits) = write(io, s.s)
23✔
807

808
cconvert(::Type{Ptr{T}},    s::CodeUnits{T}) where {T} = cconvert(Ptr{T}, s.s)
40✔
809
cconvert(::Type{Ptr{Int8}}, s::CodeUnits{UInt8}) = cconvert(Ptr{Int8}, s.s)
×
810

811
"""
812
    codeunits(s::AbstractString)
813

814
Obtain a vector-like object containing the code units of a string.
815
Returns a `CodeUnits` wrapper by default, but `codeunits` may optionally be defined
816
for new string types if necessary.
817

818
# Examples
819
```jldoctest
820
julia> codeunits("Juλia")
821
6-element Base.CodeUnits{UInt8, String}:
822
 0x4a
823
 0x75
824
 0xce
825
 0xbb
826
 0x69
827
 0x61
828
```
829
"""
830
codeunits(s::AbstractString) = CodeUnits(s)
1,925,250✔
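The wrapper returned by `codeunits` behaves like a read-only vector over the string's code units. The sketch below, with illustrative names, also copies the units into an ordinary `Vector` to round-trip them back into a string:

```julia
s  = "Juλia"
cu = codeunits(s)

len     = length(cu)                      # 6 code units
third   = cu[3]                           # 0xce, the first code unit of 'λ'
is_vec  = cu isa AbstractVector{UInt8}    # true

# Collecting gives an independent Vector{UInt8} with the same contents.
bytes      = collect(cu)
round_trip = String(copy(bytes)) == s     # true
```

`String(copy(bytes))` is used because constructing a `String` from a `Vector{UInt8}` takes ownership of the vector's data.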
831

832
function _split_rest(s::AbstractString, n::Int)
1✔
833
    lastind = lastindex(s)
2✔
834
    i = try
1✔
835
        prevind(s, lastind, n)
1✔
836
    catch e
837
        e isa BoundsError || rethrow()
×
838
        _check_length_split_rest(length(s), n)
1✔
839
    end
840
    last_n = SubString(s, nextind(s, i), lastind)
1✔
841
    front = s[begin:i]
1✔
842
    return front, last_n
1✔
843
end