
JuliaLang / julia / #37735

02 Apr 2024 02:31AM UTC coverage: 80.448% (-1.0%) from 81.405%
#37735

push

local

web-flow
documentation followup for "invert linetable representation (#52415)" (#53781)

- fix up added documents: eb05b4f2a
- ~~set up a specific type to capture the 3-set data of `codelocs`:
6afde4b74~~ (moved to a separate PR)

69784 of 86744 relevant lines covered (80.45%)

14187382.51 hits per line

Source File

89.89
/base/strings/basic.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
"""
4
The `AbstractString` type is the supertype of all string implementations in
5
Julia. Strings are encodings of sequences of [Unicode](https://unicode.org/)
6
code points as represented by the `AbstractChar` type. Julia makes a few assumptions
7
about strings:
8

9
* Strings are encoded in terms of fixed-size "code units"
10
  * Code units can be extracted with `codeunit(s, i)`
11
  * The first code unit has index `1`
12
  * The last code unit has index `ncodeunits(s)`
13
  * Any index `i` such that `1 ≤ i ≤ ncodeunits(s)` is in bounds
14
* String indexing is done in terms of these code units:
15
  * Characters are extracted by `s[i]` with a valid string index `i`
16
  * Each `AbstractChar` in a string is encoded by one or more code units
17
  * Only the index of the first code unit of an `AbstractChar` is a valid index
18
  * The encoding of an `AbstractChar` is independent of what precedes or follows it
19
  * String encodings are [self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code) – i.e. `isvalid(s, i)` is O(1)
20

21
Some string functions that extract code units, characters or substrings from
22
strings error if you pass them out-of-bounds or invalid string indices. This
23
includes `codeunit(s, i)` and `s[i]`. Functions that do string
24
index arithmetic take a more relaxed approach to indexing and give you the
25
closest valid string index when in-bounds, or when out-of-bounds, behave as if
26
there were an infinite number of characters padding each side of the string.
27
Usually these imaginary padding characters have code unit length `1` but string
28
types may choose different "imaginary" character sizes as makes sense for their
29
implementations (e.g. substrings may pass index arithmetic through to the
30
underlying string they provide a view into). Relaxed indexing functions include
31
those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
32
model allows index arithmetic to work with out-of-bounds indices as
33
intermediate values so long as one never uses them to retrieve a character,
34
which often helps avoid needing to code around edge cases.
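
As a rough illustration of the two indexing styles described above (a sketch,
assuming the built-in UTF-8 `String` type; the string `"αβγ"` is chosen only for
demonstration):

```julia
s = "αβγ"                     # three characters, two UTF-8 code units each
@assert ncodeunits(s) == 6
@assert collect(eachindex(s)) == [1, 3, 5]

# Strict accessors reject an in-bounds index that is not the start of a character.
err = try
    s[2]
catch e
    e
end
@assert err isa StringIndexError

# Relaxed index arithmetic accepts the same index and snaps to a valid one.
@assert thisind(s, 2) == 1
@assert nextind(s, 2) == 3
@assert prevind(s, 2) == 1

# Out-of-bounds results are fine as intermediate values.
@assert prevind(s, 1) == 0
@assert nextind(s, 5) == 7            # ncodeunits(s) + 1
@assert nextind(s, 5, 2) == 8         # one "imaginary" padding character further
```

The out-of-bounds results are only ever used for further arithmetic here; passing
them to `s[i]` would throw.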
35

36
See also [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref),
37
[`nextind`](@ref), [`prevind`](@ref).
38
"""
39
AbstractString
40

41
## required string functions ##
42

43
"""
44
    ncodeunits(s::AbstractString) -> Int
45

46
Return the number of code units in a string. Indices that are in bounds to
47
access this string must satisfy `1 ≤ i ≤ ncodeunits(s)`. Not all such indices
48
are valid – they may not be the start of a character, but they will return a
49
code unit value when calling `codeunit(s,i)`.
50

51
# Examples
52
```jldoctest
53
julia> ncodeunits("The Julia Language")
54
18
55

56
julia> ncodeunits("∫eˣ")
57
6
58

59
julia> ncodeunits('∫'), ncodeunits('e'), ncodeunits('ˣ')
60
(3, 1, 2)
61
```
62

63
See also [`codeunit`](@ref), [`checkbounds`](@ref), [`sizeof`](@ref),
64
[`length`](@ref), [`lastindex`](@ref).
65
"""
66
ncodeunits(s::AbstractString)
67

68
"""
69
    codeunit(s::AbstractString) -> Type{<:Union{UInt8, UInt16, UInt32}}
70

71
Return the code unit type of the given string object. For ASCII, Latin-1, or
72
UTF-8 encoded strings, this would be `UInt8`; for UCS-2 and UTF-16 it would be
73
`UInt16`; for UTF-32 it would be `UInt32`. The code unit type need not be
74
limited to these three types, but it's hard to think of widely used string
75
encodings that don't use one of these units. `codeunit(s)` is the same as
76
`typeof(codeunit(s,1))` when `s` is a non-empty string.
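
A minimal sketch of that relationship, assuming the built-in UTF-8 `String`
(for which the code unit type is `UInt8`):

```julia
s = "∀x"
@assert codeunit(s) === UInt8
@assert typeof(codeunit(s, 1)) === codeunit(s)
# The byte size is the code unit count times the size of one code unit.
@assert sizeof(s) == ncodeunits(s) * sizeof(codeunit(s))
@assert ncodeunits(s) == 4            # '∀' takes 3 code units, 'x' takes 1
```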
77

78
See also [`ncodeunits`](@ref).
79
"""
80
codeunit(s::AbstractString)
81

82
const CodeunitType = Union{Type{UInt8},Type{UInt16},Type{UInt32}}
83

84
"""
85
    codeunit(s::AbstractString, i::Integer) -> Union{UInt8, UInt16, UInt32}
86

87
Return the code unit value in the string `s` at index `i`. Note that
88

89
    codeunit(s, i) :: codeunit(s)
90

91
I.e. the value returned by `codeunit(s, i)` is of the type returned by
92
`codeunit(s)`.
93

94
# Examples
95
```jldoctest
96
julia> a = codeunit("Hello", 2)
97
0x65
98

99
julia> typeof(a)
100
UInt8
101
```
102

103
See also [`ncodeunits`](@ref), [`checkbounds`](@ref).
104
"""
105
@propagate_inbounds codeunit(s::AbstractString, i::Integer) = i isa Int ?
3✔
106
    throw(MethodError(codeunit, (s, i))) : codeunit(s, Int(i))
107

108
"""
109
    isvalid(s::AbstractString, i::Integer) -> Bool
110

111
Predicate indicating whether the given index is the start of the encoding of a
character in `s`. If `isvalid(s, i)` is true, then `s[i]` will return the
character whose encoding starts at that index; if it's false, then `s[i]` will
raise an invalid index error or a bounds error, depending on whether `i` is in bounds.
In order for `isvalid(s, i)` to be an O(1) function, the encoding of `s` must be
[self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code). This
is a basic assumption of Julia's generic string support.
118

119
See also [`getindex`](@ref), [`iterate`](@ref), [`thisind`](@ref),
120
[`nextind`](@ref), [`prevind`](@ref), [`length`](@ref).
121

122
# Examples
123
```jldoctest
124
julia> str = "αβγdef";
125

126
julia> isvalid(str, 1)
127
true
128

129
julia> str[1]
130
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
131

132
julia> isvalid(str, 2)
133
false
134

135
julia> str[2]
136
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'β'
137
Stacktrace:
138
[...]
139
```
140
"""
141
@propagate_inbounds isvalid(s::AbstractString, i::Integer) = i isa Int ?
407,385✔
142
    throw(MethodError(isvalid, (s, i))) : isvalid(s, Int(i))
143

144
"""
145
    iterate(s::AbstractString, i::Integer) -> Union{Tuple{<:AbstractChar, Int}, Nothing}
146

147
Return a tuple of the character in `s` at index `i` with the index of the start
of the following character in `s`. This is the key method that allows strings to
be iterated, yielding a sequence of characters. If `i` is out of bounds in `s`
then a bounds error is raised. The `iterate` function, as part of the iteration
protocol, may assume that `i` is the start of a character in `s`.
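
A hand-written loop over this protocol might look like the following sketch. The
helper name `chars_with_indices` is hypothetical, and the example assumes the
built-in `String` type, whose `iterate` returns `nothing` once `i` passes the
last code unit:

```julia
# Hypothetical helper: collect (index, character) pairs by calling `iterate` directly.
function chars_with_indices(s::String)
    out = Tuple{Int,Char}[]
    i = firstindex(s)
    next = iterate(s, i)
    while next !== nothing
        c, j = next               # character at `i` and start of the following character
        push!(out, (i, c))
        i = j
        next = iterate(s, i)
    end
    return out
end

@assert chars_with_indices("aβc") == [(1, 'a'), (2, 'β'), (4, 'c')]
```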
152

153
See also [`getindex`](@ref), [`checkbounds`](@ref).
154
"""
155
@propagate_inbounds iterate(s::AbstractString, i::Integer) = i isa Int ?
407,368✔
156
    throw(MethodError(iterate, (s, i))) : iterate(s, Int(i))
157

158
## basic generic definitions ##
159

160
eltype(::Type{<:AbstractString}) = Char # some string types may use another AbstractChar
×
161

162
"""
163
    sizeof(str::AbstractString)
164

165
Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by
166
the size, in bytes, of one code unit in `str`.
167

168
# Examples
169
```jldoctest
170
julia> sizeof("")
171
0
172

173
julia> sizeof("∀")
174
3
175
```
176
"""
177
sizeof(s::AbstractString) = ncodeunits(s)::Int * sizeof(codeunit(s)::CodeunitType)
19,011,138✔
178
firstindex(s::AbstractString) = 1
×
179
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)::Int)
27,193,249✔
180
isempty(s::AbstractString) = iszero(ncodeunits(s)::Int)
12,828,732✔
181

182
@propagate_inbounds first(s::AbstractString) = s[firstindex(s)]
8,157,858✔
183

184
function getindex(s::AbstractString, i::Integer)
203,911✔
185
    @boundscheck checkbounds(s, i)
208,213✔
186
    @inbounds return isvalid(s, i) ? (iterate(s, i)::NTuple{2,Any})[1] : string_index_err(s, i)
208,211✔
187
end
188

189
getindex(s::AbstractString, i::Colon) = s
1✔
190
# TODO: handle other ranges with stride ±1 specially?
191
# TODO: add more @propagate_inbounds annotations?
192
getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
4✔
193
    sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v))
4✔
194
getindex(s::AbstractString, v::AbstractVector{Bool}) =
2✔
195
    throw(ArgumentError("logical indexing not supported for strings"))
196

197
function get(s::AbstractString, i::Integer, default)
5✔
198
# TODO: use ternary once @inbounds is expression-like
199
    if checkbounds(Bool, s, i)
5✔
200
        @inbounds return s[i]
3✔
201
    else
202
        return default
2✔
203
    end
204
end
205

206
## bounds checking ##
207

208
checkbounds(::Type{Bool}, s::AbstractString, i::Integer) =
623,734,338✔
209
    1 ≤ i ≤ ncodeunits(s)::Int
210
checkbounds(::Type{Bool}, s::AbstractString, r::AbstractRange{<:Integer}) =
31,582,991✔
211
    isempty(r) || (1 ≤ minimum(r) && maximum(r) ≤ ncodeunits(s)::Int)
212
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Real}) =
1✔
213
    all(i -> checkbounds(Bool, s, i), I)
1✔
214
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Integer}) =
7✔
215
    all(i -> checkbounds(Bool, s, i), I)
22✔
216
checkbounds(s::AbstractString, I::Union{Integer,AbstractArray}) =
489,785,216✔
217
    checkbounds(Bool, s, I) ? nothing : throw(BoundsError(s, I))
218

219
## construction, conversion, promotion ##
220

221
string() = ""
22✔
222
string(s::AbstractString) = s
172✔
223

224
Vector{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
×
225
Array{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
1✔
226
Vector{T}(s::AbstractString) where {T<:AbstractChar} = collect(T, s)
523✔
227

228
Symbol(s::AbstractString) = Symbol(String(s))
1✔
229
Symbol(x...) = Symbol(string(x...))
91,545✔
230

231
convert(::Type{T}, s::T) where {T<:AbstractString} = s
21✔
232
convert(::Type{T}, s::AbstractString) where {T<:AbstractString} = T(s)::T
212,408✔
233

234
## summary ##
235

236
function summary(io::IO, s::AbstractString)
3✔
237
    prefix = isempty(s) ? "empty" : string(ncodeunits(s), "-codeunit")
5✔
238
    print(io, prefix, " ", typeof(s))
3✔
239
end
240

241
## string & character concatenation ##
242

243
"""
244
    *(s::Union{AbstractString, AbstractChar}, t::Union{AbstractString, AbstractChar}...) -> AbstractString
245

246
Concatenate strings and/or characters, producing a [`String`](@ref) or
247
[`AnnotatedString`](@ref) (as appropriate). This is equivalent to calling the
248
[`string`](@ref) or [`annotatedstring`](@ref) function on the arguments. Concatenation of built-in string
249
types always produces a value of type `String` but other string types may choose
250
to return a string of a different type as appropriate.
251

252
# Examples
253
```jldoctest
254
julia> "Hello " * "world"
255
"Hello world"
256

257
julia> 'j' * "ulia"
258
"julia"
259
```
260
"""
261
function (*)(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...)
3,422✔
262
    if _isannotated(s1) || any(_isannotated, ss)
1,108,614✔
263
        annotatedstring(s1, ss...)
559✔
264
    else
265
        string(s1, ss...)
5,788,618✔
266
    end
267
end
268

269
one(::Union{T,Type{T}}) where {T<:AbstractString} = convert(T, "")
4✔
270

271
# This could be written as a single statement with three ||-clauses, however then effect
272
# analysis thinks it may throw and runtime checks are added.
273
# Also see `substring.jl` for the `::SubString{T}` method.
274
_isannotated(S::Type) = S != Union{} && (S <: AnnotatedString || S <: AnnotatedChar)
203✔
275
_isannotated(s) = _isannotated(typeof(s))
176✔
276

277
## generic string comparison ##
278

279
"""
280
    cmp(a::AbstractString, b::AbstractString) -> Int
281

282
Compare two strings. Return `0` if both strings have the same length and the character
283
at each index is the same in both strings. Return `-1` if `a` is a prefix of `b`, or if
284
`a` comes before `b` in alphabetical order. Return `1` if `b` is a prefix of `a`, or if
285
`b` comes before `a` in alphabetical order (technically, lexicographical order by Unicode
286
code points).
287

288
# Examples
289
```jldoctest
290
julia> cmp("abc", "abc")
291
0
292

293
julia> cmp("ab", "abc")
294
-1
295

296
julia> cmp("abc", "ab")
297
1
298

299
julia> cmp("ab", "ac")
300
-1
301

302
julia> cmp("ac", "ab")
303
1
304

305
julia> cmp("α", "a")
306
1
307

308
julia> cmp("b", "β")
309
-1
310
```
311
"""
312
function cmp(a::AbstractString, b::AbstractString)
348✔
313
    a === b && return 0
348✔
314
    (iv1, iv2) = (iterate(a), iterate(b))
491✔
315
    while iv1 !== nothing && iv2 !== nothing
924✔
316
        (c, d) = (first(iv1)::AbstractChar, first(iv2)::AbstractChar)
641✔
317
        c ≠ d && return ifelse(c < d, -1, 1)
641✔
318
        (iv1, iv2) = (iterate(a, last(iv1)), iterate(b, last(iv2)))
873✔
319
    end
598✔
320
    return iv1 === nothing ? (iv2 === nothing ? 0 : -1) : 1
283✔
321
end
322

323
"""
324
    ==(a::AbstractString, b::AbstractString) -> Bool
325

326
Test whether two strings are equal character by character (technically, Unicode
code point by code point). Should either string be an [`AnnotatedString`](@ref), the
string properties must match too.
329

330
# Examples
331
```jldoctest
332
julia> "abc" == "abc"
333
true
334

335
julia> "abc" == "αβγ"
336
false
337
```
338
"""
339
==(a::AbstractString, b::AbstractString) = cmp(a, b) == 0
289✔
340

341
"""
342
    isless(a::AbstractString, b::AbstractString) -> Bool
343

344
Test whether string `a` comes before string `b` in alphabetical order
345
(technically, in lexicographical order by Unicode code points).
346

347
# Examples
348
```jldoctest
349
julia> isless("a", "b")
350
true
351

352
julia> isless("β", "α")
353
false
354

355
julia> isless("a", "a")
356
false
357
```
358
"""
359
isless(a::AbstractString, b::AbstractString) = cmp(a, b) < 0
6,017,790✔
360

361
# faster comparisons for symbols
362

363
@assume_effects :total function cmp(a::Symbol, b::Symbol)
364
    Int(sign(ccall(:strcmp, Int32, (Cstring, Cstring), a, b)))
35,695,096✔
365
end
366

367
isless(a::Symbol, b::Symbol) = cmp(a, b) < 0
35,677,870✔
368

369
# hashing
370

371
hash(s::AbstractString, h::UInt) = hash(String(s), h)
1✔
372

373
## character index arithmetic ##
374

375
"""
376
    length(s::AbstractString) -> Int
377
    length(s::AbstractString, i::Integer, j::Integer) -> Int
378

379
Return the number of characters in string `s` from indices `i` through `j`.
380

381
This is computed as the number of code unit indices from `i` to `j` which are
382
valid character indices. With only a single string argument, this computes
383
the number of characters in the entire string. With `i` and `j` arguments it
384
computes the number of indices between `i` and `j` inclusive that are valid
385
indices in the string `s`. In addition to in-bounds values, `i` may take the
386
out-of-bounds value `ncodeunits(s) + 1` and `j` may take the out-of-bounds
387
value `0`.
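
For instance, with the three-argument form (a sketch assuming the built-in
UTF-8 `String`):

```julia
s = "jμΛIα"                   # character starts at indices 1, 2, 4, 6 and 7
@assert ncodeunits(s) == 8
@assert length(s) == 5
# Only indices 2 and 4 in 2:4 start a character; index 3 is inside 'μ'.
@assert length(s, 2, 4) == 2
@assert length(s, 2, 4) == count(k -> isvalid(s, k), 2:4)
# The permitted out-of-bounds endpoints describe empty ranges.
@assert length(s, ncodeunits(s) + 1, ncodeunits(s)) == 0
@assert length(s, 1, 0) == 0
```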
388

389
!!! note
390
    The time complexity of this operation is linear in general. That is, it
    takes time proportional to the number of bytes or characters in the
    string, because the count is computed on the fly. This is in contrast to
    the method for arrays, which is a constant-time operation.
394

395
See also [`isvalid`](@ref), [`ncodeunits`](@ref), [`lastindex`](@ref),
396
[`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref).
397

398
# Examples
399
```jldoctest
400
julia> length("jμΛIα")
401
5
402
```
403
"""
404
length(s::AbstractString) = @inbounds return length(s, 1, ncodeunits(s)::Int)
3,396✔
405

406
function length(s::AbstractString, i::Int, j::Int)
59,635✔
407
    @boundscheck begin
59,635✔
408
        0 < i ≤ ncodeunits(s)::Int+1 || throw(BoundsError(s, i))
59,635✔
409
        0 ≤ j < ncodeunits(s)::Int+1 || throw(BoundsError(s, j))
59,638✔
410
    end
411
    n = 0
59,632✔
412
    for k = i:j
88,158✔
413
        @inbounds n += isvalid(s, k)
1,704,074✔
414
    end
3,375,876✔
415
    return n
59,632✔
416
end
417

418
@propagate_inbounds length(s::AbstractString, i::Integer, j::Integer) =
1✔
419
    length(s, Int(i), Int(j))
420

421
"""
422
    thisind(s::AbstractString, i::Integer) -> Int
423

424
If `i` is in bounds in `s` return the index of the start of the character whose
425
encoding code unit `i` is part of. In other words, if `i` is the start of a
426
character, return `i`; if `i` is not the start of a character, rewind until the
427
start of a character and return that index. If `i` is equal to 0 or `ncodeunits(s)+1`
428
return `i`. In all other cases throw `BoundsError`.
429

430
# Examples
431
```jldoctest
432
julia> thisind("α", 0)
433
0
434

435
julia> thisind("α", 1)
436
1
437

438
julia> thisind("α", 2)
439
1
440

441
julia> thisind("α", 3)
442
3
443

444
julia> thisind("α", 4)
445
ERROR: BoundsError: attempt to access 2-codeunit String at index [4]
446
[...]
447

448
julia> thisind("α", -1)
449
ERROR: BoundsError: attempt to access 2-codeunit String at index [-1]
450
[...]
451
```
452
"""
453
thisind(s::AbstractString, i::Integer) = thisind(s, Int(i))
52✔
454

455
function thisind(s::AbstractString, i::Int)
1,562✔
456
    z = ncodeunits(s)::Int + 1
1,670✔
457
    i == z && return i
1,670✔
458
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
1,673✔
459
    @inbounds while 1 < i && !(isvalid(s, i)::Bool)
1,689✔
460
        i -= 1
1,614✔
461
    end
1,614✔
462
    return i
1,649✔
463
end
464

465
"""
466
    prevind(str::AbstractString, i::Integer, n::Integer=1) -> Int
467

468
* Case `n == 1`
469

470
  If `i` is in bounds in `str`, return the index of the start of the character whose
471
  encoding starts before index `i`. In other words, if `i` is the start of a
472
  character, return the start of the previous character; if `i` is not the start
473
  of a character, rewind until the start of a character and return that index.
474
  If `i` is equal to `1` return `0`.
475
  If `i` is equal to `ncodeunits(str)+1` return `lastindex(str)`.
476
  Otherwise throw `BoundsError`.
477

478
* Case `n > 1`
479

480
  Behaves like applying `prevind` with `n == 1` a total of `n` times. The only difference
  is that if `n` is so large that applying `prevind` would reach `0`, then each remaining
  iteration decreases the returned value by `1`.
  This means that in this case `prevind` can return a negative value.
484

485
* Case `n == 0`
486

487
  Return `i` only if `i` is a valid index in `str` or is equal to `ncodeunits(str)+1`.
488
  Otherwise `StringIndexError` or `BoundsError` is thrown.
489

490
# Examples
491
```jldoctest
492
julia> prevind("α", 3)
493
1
494

495
julia> prevind("α", 1)
496
0
497

498
julia> prevind("α", 0)
499
ERROR: BoundsError: attempt to access 2-codeunit String at index [0]
500
[...]
501

502
julia> prevind("α", 2, 2)
503
0
504

505
julia> prevind("α", 2, 3)
506
-1
507
```
508
"""
509
prevind(s::AbstractString, i::Integer, n::Integer) = prevind(s, Int(i), Int(n))
2✔
510
prevind(s::AbstractString, i::Integer)             = prevind(s, Int(i))
7,640,713✔
511
prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)
20,198,849✔
512

513
function prevind(s::AbstractString, i::Int, n::Int)
22,672,621✔
514
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
22,672,621✔
515
    z = ncodeunits(s) + 1
22,672,617✔
516
    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
22,672,651✔
517
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
22,672,583✔
518
    while n > 0 && 1 < i
69,949,454✔
519
        @inbounds n -= isvalid(s, i -= 1)
94,706,659✔
520
    end
47,353,913✔
521
    return i - n
22,595,541✔
522
end
523

524
"""
525
    nextind(str::AbstractString, i::Integer, n::Integer=1) -> Int
526

527
* Case `n == 1`
528

529
  If `i` is in bounds in `str`, return the index of the start of the character whose
530
  encoding starts after index `i`. In other words, if `i` is the start of a
531
  character, return the start of the next character; if `i` is not the start
532
  of a character, move forward until the start of a character and return that index.
533
  If `i` is equal to `0` return `1`.
534
  If `i` is in bounds but greater than or equal to `lastindex(str)`, return `ncodeunits(str)+1`.
535
  Otherwise throw `BoundsError`.
536

537
* Case `n > 1`
538

539
  Behaves like applying `nextind` with `n == 1` a total of `n` times. The only difference
  is that if `n` is so large that applying `nextind` would reach `ncodeunits(str)+1`, then
  each remaining iteration increases the returned value by `1`. This means that in this
  case `nextind` can return a value greater than `ncodeunits(str)+1`.
543

544
* Case `n == 0`
545

546
  Return `i` only if `i` is a valid index in `str` or is equal to `0`.
547
  Otherwise `StringIndexError` or `BoundsError` is thrown.
548

549
# Examples
550
```jldoctest
551
julia> nextind("α", 0)
552
1
553

554
julia> nextind("α", 1)
555
3
556

557
julia> nextind("α", 3)
558
ERROR: BoundsError: attempt to access 2-codeunit String at index [3]
559
[...]
560

561
julia> nextind("α", 0, 2)
562
3
563

564
julia> nextind("α", 1, 2)
565
4
566
```
567
"""
568
nextind(s::AbstractString, i::Integer, n::Integer) = nextind(s, Int(i), Int(n))
2✔
569
nextind(s::AbstractString, i::Integer)             = nextind(s, Int(i))
2✔
570
nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)
3,376✔
571

572
function nextind(s::AbstractString, i::Int, n::Int)
2,532,225✔
573
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
2,532,227✔
574
    z = ncodeunits(s)
2,532,220✔
575
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
2,532,240✔
576
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
2,532,200✔
577
    while n > 0 && i < z
36,652,936✔
578
        @inbounds n -= isvalid(s, i += 1)
68,371,761✔
579
    end
34,188,550✔
580
    return i + n
2,464,386✔
581
end
582

583
## string index iteration type ##
584

585
struct EachStringIndex{T<:AbstractString}
586
    s::T
3,590✔
587
end
588
keys(s::AbstractString) = EachStringIndex(s)
3,290✔
589

590
length(e::EachStringIndex) = length(e.s)
912,260✔
591
first(::EachStringIndex) = 1
×
592
last(e::EachStringIndex) = lastindex(e.s)
2,353✔
593
iterate(e::EachStringIndex, state=firstindex(e.s)) = state > ncodeunits(e.s) ? nothing : (state, nextind(e.s, state))
17,056,599✔
594
eltype(::Type{<:EachStringIndex}) = Int
×
595

596
"""
597
    isascii(c::Union{AbstractChar,AbstractString}) -> Bool
598

599
Test whether a character belongs to the ASCII character set, or whether this is true for
600
all elements of a string.
601

602
# Examples
603
```jldoctest
604
julia> isascii('a')
605
true
606

607
julia> isascii('α')
608
false
609

610
julia> isascii("abc")
611
true
612

613
julia> isascii("αβγ")
614
false
615
```
616
For example, `isascii` can be used as a predicate function for [`filter`](@ref) or [`replace`](@ref)
617
to remove or replace non-ASCII characters, respectively:
618
```jldoctest
619
julia> filter(isascii, "abcdeγfgh") # discard non-ASCII chars
620
"abcdefgh"
621

622
julia> replace("abcdeγfgh", !isascii=>' ') # replace non-ASCII chars with spaces
623
"abcde fgh"
624
```
625
"""
626
isascii(c::Char) = bswap(reinterpret(UInt32, c)) < 0x80
1,946,536✔
627
isascii(s::AbstractString) = all(isascii, s)
1✔
628
isascii(c::AbstractChar) = UInt32(c) < 0x80
1✔
629

630
@inline function _isascii(code_units::AbstractVector{CU}, first, last) where {CU}
631
    r = zero(CU)
40,601✔
632
    for n = first:last
663,693✔
633
        @inbounds r |= code_units[n]
7,854,766✔
634
    end
15,045,843✔
635
    return 0 ≤ r < 0x80
663,693✔
636
end
637

638
# The chunking algorithm makes the last two chunks overlap in order to keep the chunk size fixed
639
@inline function _isascii_chunks(chunk_size, cu::AbstractVector{CU}, first, last) where {CU}
640
    n=first
×
641
    while n <= last - chunk_size
786✔
642
        _isascii(cu,n,n+chunk_size-1) || return false
780✔
643
        n += chunk_size
732✔
644
    end
732✔
645
    return  _isascii(cu,last-chunk_size+1,last)
30✔
646
end
647
"""
648
    isascii(cu::AbstractVector{CU}) where {CU <: Integer} -> Bool
649

650
Test whether all values in the vector belong to the ASCII character set (0x00 to 0x7f).
651
This function is intended to be used by other string implementations that need a fast ASCII check.
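
A short usage sketch, assuming a Julia version that provides this method; the
byte vectors are made up for illustration:

```julia
@assert isascii(codeunits("plain ASCII"))
@assert !isascii(codeunits("Juλia"))          # 'λ' is encoded as 0xce 0xbb
@assert isascii(UInt8[0x4a, 0x75, 0x6c])
@assert !isascii(UInt8[0x4a, 0x80])           # 0x80 is outside 0x00 to 0x7f
```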
652
"""
653
function isascii(cu::AbstractVector{CU}) where {CU <: Integer}
622,313✔
654
    chunk_size = 1024
×
655
    chunk_threshold =  chunk_size + (chunk_size ÷ 2)
×
656
    first = firstindex(cu);   last = lastindex(cu)
622,313✔
657
    l = last - first + 1
622,313✔
658
    l < chunk_threshold && return _isascii(cu,first,last)
622,313✔
659
    return _isascii_chunks(chunk_size,cu,first,last)
786✔
660
end
661

662
## string map, filter ##
663

664
function map(f, s::AbstractString)
20,212✔
665
    out = StringVector(max(4, sizeof(s)::Int÷sizeof(codeunit(s)::CodeunitType)))
20,221✔
666
    index = UInt(1)
2,015✔
667
    for c::AbstractChar in s
40,367✔
668
        c′ = f(c)
196,236✔
669
        isa(c′, AbstractChar) || throw(ArgumentError(
29,634✔
670
            "map(f, s::AbstractString) requires f to return AbstractChar; " *
671
            "try map(f, collect(s)) or a comprehension instead"))
672
        index + 3 > length(out) && resize!(out, unsigned(2 * length(out)))
196,235✔
673
        index += __unsafe_string!(out, convert(Char, c′), index)
196,328✔
674
    end
372,227✔
675
    resize!(out, index-1)
20,211✔
676
    sizehint!(out, index-1)
20,211✔
677
    return String(out)
20,211✔
678
end
679

680
function filter(f, s::AbstractString)
2✔
681
    out = IOBuffer(sizehint=sizeof(s))
4✔
682
    for c in s
2✔
683
        f(c) && write(out, c)
57✔
684
    end
57✔
685
    String(_unsafe_take!(out))
2✔
686
end
687

688
## string first and last ##
689

690
"""
691
    first(s::AbstractString, n::Integer)
692

693
Get a string consisting of the first `n` characters of `s`.
694

695
# Examples
696
```jldoctest
697
julia> first("∀ϵ≠0: ϵ²>0", 0)
698
""
699

700
julia> first("∀ϵ≠0: ϵ²>0", 1)
701
"∀"
702

703
julia> first("∀ϵ≠0: ϵ²>0", 3)
704
"∀ϵ≠"
705
```
706
"""
707
first(s::AbstractString, n::Integer) = @inbounds s[1:min(end, nextind(s, 0, n))]
26✔
708

709
"""
710
    last(s::AbstractString, n::Integer)
711

712
Get a string consisting of the last `n` characters of `s`.
713

714
# Examples
715
```jldoctest
716
julia> last("∀ϵ≠0: ϵ²>0", 0)
717
""
718

719
julia> last("∀ϵ≠0: ϵ²>0", 1)
720
"0"
721

722
julia> last("∀ϵ≠0: ϵ²>0", 3)
723
"²>0"
724
```
725
"""
726
last(s::AbstractString, n::Integer) = @inbounds s[max(1, prevind(s, ncodeunits(s)+1, n)):end]
8✔
727

728
"""
729
    reverseind(v, i)
730

731
Given an index `i` in [`reverse(v)`](@ref), return the corresponding index in
732
`v` so that `v[reverseind(v,i)] == reverse(v)[i]`. (This can be nontrivial in
733
cases where `v` contains non-ASCII characters.)
734

735
# Examples
736
```jldoctest
737
julia> s = "Julia🚀"
738
"Julia🚀"
739

740
julia> r = reverse(s)
741
"🚀ailuJ"
742

743
julia> for i in eachindex(s)
744
           print(r[reverseind(r, i)])
745
       end
746
Julia🚀
747
```
748
"""
749
reverseind(s::AbstractString, i::Integer) = thisind(s, ncodeunits(s)-i+1)
2,234✔
750

751
"""
752
    repeat(s::AbstractString, r::Integer)
753

754
Repeat a string `r` times. This can be written as `s^r`.
755

756
See also [`^`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).
757

758
# Examples
759
```jldoctest
760
julia> repeat("ha", 3)
761
"hahaha"
762
```
763
"""
764
repeat(s::AbstractString, r::Integer) = repeat(String(s), r)
5✔
765

766
"""
767
    ^(s::Union{AbstractString,AbstractChar}, n::Integer) -> AbstractString
768

769
Repeat a string or character `n` times. This can also be written as `repeat(s, n)`.
770

771
See also [`repeat`](@ref).
772

773
# Examples
774
```jldoctest
775
julia> "Test "^3
776
"Test Test Test "
777
```
778
"""
779
(^)(s::Union{AbstractString,AbstractChar}, r::Integer) = repeat(s, r)
917,603✔
780

781
# reverse-order iteration for strings and indices thereof
782
iterate(r::Iterators.Reverse{<:AbstractString}, i=lastindex(r.itr)) = i < firstindex(r.itr) ? nothing : (r.itr[i], prevind(r.itr, i))
5,111,935✔
783
iterate(r::Iterators.Reverse{<:EachStringIndex}, i=lastindex(r.itr.s)) = i < firstindex(r.itr.s) ? nothing : (i, prevind(r.itr.s, i))
5,126,043✔
784

785
## code unit access ##
786

787
"""
788
    CodeUnits(s::AbstractString)
789

790
Wrap a string (without copying) in an immutable vector-like object that accesses the code units
791
of the string's representation.
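
A brief sketch of how the wrapper behaves, assuming the built-in `String`; it is
obtained here through `codeunits`, which constructs a `CodeUnits` by default:

```julia
cu = codeunits("Juλia")
@assert cu isa Base.CodeUnits{UInt8,String}
@assert length(cu) == 6 && cu[3] == 0xce
@assert collect(cu) == [0x4a, 0x75, 0xce, 0xbb, 0x69, 0x61]

# The wrapper is read-only: assigning to an element throws.
threw = try
    cu[1] = 0x00
    false
catch
    true
end
@assert threw
```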
792
"""
793
struct CodeUnits{T,S<:AbstractString} <: DenseVector{T}
794
    s::S
795
    CodeUnits(s::S) where {S<:AbstractString} = new{codeunit(s),S}(s)
1,618,101✔
796
end
797

798
length(s::CodeUnits) = ncodeunits(s.s)
48,441,826✔
799
sizeof(s::CodeUnits{T}) where {T} = ncodeunits(s.s) * sizeof(T)
71✔
800
size(s::CodeUnits) = (length(s),)
5,880,326✔
801
elsize(s::Type{<:CodeUnits{T}}) where {T} = sizeof(T)
3✔
802
@propagate_inbounds getindex(s::CodeUnits, i::Int) = codeunit(s.s, i)
67,091,295✔
803
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
1✔
804
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing
42,467,939✔
805

806

807
write(io::IO, s::CodeUnits) = write(io, s.s)
1✔
808

809
cconvert(::Type{Ptr{T}},    s::CodeUnits{T}) where {T} = cconvert(Ptr{T}, s.s)
56✔
810
cconvert(::Type{Ptr{Int8}}, s::CodeUnits{UInt8}) = cconvert(Ptr{Int8}, s.s)
×
811

812
"""
813
    codeunits(s::AbstractString)
814

815
Obtain a vector-like object containing the code units of a string.
816
Returns a `CodeUnits` wrapper by default, but `codeunits` may optionally be defined
817
for new string types if necessary.
818

819
# Examples
820
```jldoctest
821
julia> codeunits("Juλia")
822
6-element Base.CodeUnits{UInt8, String}:
823
 0x4a
824
 0x75
825
 0xce
826
 0xbb
827
 0x69
828
 0x61
829
```
830
"""
831
codeunits(s::AbstractString) = CodeUnits(s)
1,618,100✔
832

833
function _split_rest(s::AbstractString, n::Int)
×
834
    lastind = lastindex(s)
×
835
    i = try
×
836
        prevind(s, lastind, n)
×
837
    catch e
838
        e isa BoundsError || rethrow()
×
839
        _check_length_split_rest(length(s), n)
×
840
    end
841
    last_n = SubString(s, nextind(s, i), lastind)
×
842
    front = s[begin:i]
×
843
    return front, last_n
×
844
end