JuliaLang / julia / #37650

12 Oct 2023 03:02PM UTC coverage: 85.263% (-2.3%) from 87.56%
#37650

push

local

web-flow
Revert "Reinstate load-time Pkg.precompile" (#51675)

1 of 1 new or added line in 1 file covered. (100.0%)

70225 of 82363 relevant lines covered (85.26%)

12218375.87 hits per line

96.0
/base/strings/basic.jl
# This file is a part of Julia. License is MIT: https://julialang.org/license

"""
The `AbstractString` type is the supertype of all string implementations in
Julia. Strings are encodings of sequences of [Unicode](https://unicode.org/)
code points as represented by the `AbstractChar` type. Julia makes a few assumptions
about strings:

* Strings are encoded in terms of fixed-size "code units"
  * Code units can be extracted with `codeunit(s, i)`
  * The first code unit has index `1`
  * The last code unit has index `ncodeunits(s)`
  * Any index `i` such that `1 ≤ i ≤ ncodeunits(s)` is in bounds
* String indexing is done in terms of these code units:
  * Characters are extracted by `s[i]` with a valid string index `i`
  * Each `AbstractChar` in a string is encoded by one or more code units
  * Only the index of the first code unit of an `AbstractChar` is a valid index
  * The encoding of an `AbstractChar` is independent of what precedes or follows it
  * String encodings are [self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code) – i.e. `isvalid(s, i)` is O(1)

Some string functions that extract code units, characters or substrings from
strings error if you pass them out-of-bounds or invalid string indices. This
includes `codeunit(s, i)` and `s[i]`. Functions that do string
index arithmetic take a more relaxed approach to indexing and give you the
closest valid string index when in-bounds, or when out-of-bounds, behave as if
there were an infinite number of characters padding each side of the string.
Usually these imaginary padding characters have code unit length `1` but string
types may choose different "imaginary" character sizes as makes sense for their
implementations (e.g. substrings may pass index arithmetic through to the
underlying string they provide a view into). Relaxed indexing functions include
those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
model allows index arithmetic to work with out-of-bounds indices as
intermediate values so long as one never uses them to retrieve a character,
which often helps avoid needing to code around edge cases.

See also [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref),
[`nextind`](@ref), [`prevind`](@ref).
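
As a minimal sketch of the index model described above (assuming the built-in
UTF-8 `String`; the helper name `valid_indices` is hypothetical and not part of
Base), the loop below walks the valid indices of a string with `nextind`. The
last step produces `ncodeunits(s) + 1`, an out-of-bounds value that only ends
the loop and is never passed to `s[i]`:

```julia
# Hypothetical helper (not part of Base): collect the valid indices of `s`.
function valid_indices(s::AbstractString)
    indices = Int[]
    i = firstindex(s)
    while i <= ncodeunits(s)
        push!(indices, i)   # `i` is always the start of a character here
        i = nextind(s, i)   # may become ncodeunits(s) + 1, which is never dereferenced
    end
    return indices
end

valid_indices("αβγ")        # [1, 3, 5]: each character occupies two UTF-8 code units
```

In practice `eachindex(s)` yields the same indices; the explicit loop is only
meant to show the out-of-bounds intermediate value.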
"""
AbstractString

## required string functions ##

"""
    ncodeunits(s::AbstractString) -> Int

Return the number of code units in a string. Indices that are in bounds to
access this string must satisfy `1 ≤ i ≤ ncodeunits(s)`. Not all such indices
are valid – they may not be the start of a character, but they will return a
code unit value when calling `codeunit(s,i)`.

# Examples
```jldoctest
julia> ncodeunits("The Julia Language")
18

julia> ncodeunits("∫eˣ")
6

julia> ncodeunits('∫'), ncodeunits('e'), ncodeunits('ˣ')
(3, 1, 2)
```

See also [`codeunit`](@ref), [`checkbounds`](@ref), [`sizeof`](@ref),
[`length`](@ref), [`lastindex`](@ref).
"""
ncodeunits(s::AbstractString)

"""
    codeunit(s::AbstractString) -> Type{<:Union{UInt8, UInt16, UInt32}}

Return the code unit type of the given string object. For ASCII, Latin-1, or
UTF-8 encoded strings, this would be `UInt8`; for UCS-2 and UTF-16 it would be
`UInt16`; for UTF-32 it would be `UInt32`. The code unit type need not be
limited to these three types, but it's hard to think of widely used string
encodings that don't use one of these units. `codeunit(s)` is the same as
`typeof(codeunit(s,1))` when `s` is a non-empty string.

See also [`ncodeunits`](@ref).
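
For illustration, a short sketch assuming the built-in `String` and
`SubString{String}` types, which both use UTF-8 and therefore `UInt8` code units:

```julia
s = "Juλia"
codeunit(s)                              # UInt8
codeunit(s) === typeof(codeunit(s, 1))   # true, since `s` is non-empty
codeunit(SubString(s, 1, 2))             # UInt8, same as the parent string
```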
"""
codeunit(s::AbstractString)

const CodeunitType = Union{Type{UInt8},Type{UInt16},Type{UInt32}}

"""
    codeunit(s::AbstractString, i::Integer) -> Union{UInt8, UInt16, UInt32}

Return the code unit value in the string `s` at index `i`. Note that

    codeunit(s, i) :: codeunit(s)

I.e. the value returned by `codeunit(s, i)` is of the type returned by
`codeunit(s)`.

# Examples
```jldoctest
julia> a = codeunit("Hello", 2)
0x65

julia> typeof(a)
UInt8
```

See also [`ncodeunits`](@ref), [`checkbounds`](@ref).
"""
@propagate_inbounds codeunit(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(codeunit, (s, i))) : codeunit(s, Int(i))

"""
    isvalid(s::AbstractString, i::Integer) -> Bool

Predicate indicating whether the given index is the start of the encoding of a
character in `s` or not. If `isvalid(s, i)` is true then `s[i]` will return the
character whose encoding starts at that index; if it's false, then `s[i]` will
raise an invalid index error or a bounds error depending on whether `i` is in bounds.
In order for `isvalid(s, i)` to be an O(1) function, the encoding of `s` must be
[self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code). This
is a basic assumption of Julia's generic string support.

See also [`getindex`](@ref), [`iterate`](@ref), [`thisind`](@ref),
[`nextind`](@ref), [`prevind`](@ref), [`length`](@ref).

# Examples
```jldoctest
julia> str = "αβγdef";

julia> isvalid(str, 1)
true

julia> str[1]
'α': Unicode U+03B1 (category Ll: Letter, lowercase)

julia> isvalid(str, 2)
false

julia> str[2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'β'
Stacktrace:
[...]
```
"""
@propagate_inbounds isvalid(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(isvalid, (s, i))) : isvalid(s, Int(i))

"""
    iterate(s::AbstractString, i::Integer) -> Union{Tuple{<:AbstractChar, Int}, Nothing}

Return a tuple of the character in `s` at index `i` with the index of the start
of the following character in `s`. This is the key method that allows strings to
be iterated, yielding a sequence of characters. If `i` is out of bounds in `s`
then a bounds error is raised. The `iterate` function, as part of the iteration
protocol, may assume that `i` is the start of a character in `s`.

See also [`getindex`](@ref), [`checkbounds`](@ref).
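
A sketch of how a loop over a string can be driven by this method, assuming the
built-in `String` type, for which `iterate` returns `nothing` once the index
passes the last code unit. The helper name `collect_chars` is hypothetical:

```julia
# Hypothetical helper (not part of Base): gather the characters of `s` by
# calling `iterate` by hand, as a `for` loop would do internally.
function collect_chars(s::AbstractString)
    chars = Char[]
    next = iterate(s, firstindex(s))
    while next !== nothing
        (c, i) = next          # `c` is the character, `i` the start of the next one
        push!(chars, c)
        next = iterate(s, i)
    end
    return chars
end

collect_chars("αβ")            # ['α', 'β']
iterate("αβ", 1)               # ('α', 3)
```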
"""
@propagate_inbounds iterate(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(iterate, (s, i))) : iterate(s, Int(i))

## basic generic definitions ##

eltype(::Type{<:AbstractString}) = Char # some string types may use another AbstractChar

"""
    sizeof(str::AbstractString)

Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by
the size, in bytes, of one code unit in `str`.

# Examples
```jldoctest
julia> sizeof("")
0

julia> sizeof("∀")
3
```
"""
sizeof(s::AbstractString) = ncodeunits(s)::Int * sizeof(codeunit(s)::CodeunitType)
firstindex(s::AbstractString) = 1
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)::Int)
isempty(s::AbstractString) = iszero(ncodeunits(s)::Int)

function getindex(s::AbstractString, i::Integer)
    @boundscheck checkbounds(s, i)
    @inbounds return isvalid(s, i) ? (iterate(s, i)::NTuple{2,Any})[1] : string_index_err(s, i)
end

getindex(s::AbstractString, i::Colon) = s
# TODO: handle other ranges with stride ±1 specially?
# TODO: add more @propagate_inbounds annotations?
getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
    sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v))
getindex(s::AbstractString, v::AbstractVector{Bool}) =
    throw(ArgumentError("logical indexing not supported for strings"))

function get(s::AbstractString, i::Integer, default)
# TODO: use ternary once @inbounds is expression-like
    if checkbounds(Bool, s, i)
        @inbounds return s[i]
    else
        return default
    end
end

## bounds checking ##

checkbounds(::Type{Bool}, s::AbstractString, i::Integer) =
    1 ≤ i ≤ ncodeunits(s)::Int
checkbounds(::Type{Bool}, s::AbstractString, r::AbstractRange{<:Integer}) =
    isempty(r) || (1 ≤ minimum(r) && maximum(r) ≤ ncodeunits(s)::Int)
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Real}) =
    all(i -> checkbounds(Bool, s, i), I)
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Integer}) =
    all(i -> checkbounds(Bool, s, i), I)
checkbounds(s::AbstractString, I::Union{Integer,AbstractArray}) =
    checkbounds(Bool, s, I) ? nothing : throw(BoundsError(s, I))

## construction, conversion, promotion ##

string() = ""
string(s::AbstractString) = s

Vector{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
Array{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
Vector{T}(s::AbstractString) where {T<:AbstractChar} = collect(T, s)

Symbol(s::AbstractString) = Symbol(String(s))
Symbol(x...) = Symbol(string(x...))

convert(::Type{T}, s::T) where {T<:AbstractString} = s
convert(::Type{T}, s::AbstractString) where {T<:AbstractString} = T(s)::T

## summary ##

function summary(io::IO, s::AbstractString)
    prefix = isempty(s) ? "empty" : string(ncodeunits(s), "-codeunit")
    print(io, prefix, " ", typeof(s))
end

## string & character concatenation ##

"""
    *(s::Union{AbstractString, AbstractChar}, t::Union{AbstractString, AbstractChar}...) -> AbstractString

Concatenate strings and/or characters, producing a [`String`](@ref). This is equivalent
to calling the [`string`](@ref) function on the arguments. Concatenation of built-in
string types always produces a value of type `String` but other string types may choose
to return a string of a different type as appropriate.

# Examples
```jldoctest
julia> "Hello " * "world"
"Hello world"

julia> 'j' * "ulia"
"julia"
```
"""
(*)(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...) = string(s1, ss...)

one(::Union{T,Type{T}}) where {T<:AbstractString} = convert(T, "")

## generic string comparison ##

"""
    cmp(a::AbstractString, b::AbstractString) -> Int

Compare two strings. Return `0` if both strings have the same length and the character
at each index is the same in both strings. Return `-1` if `a` is a prefix of `b`, or if
`a` comes before `b` in alphabetical order. Return `1` if `b` is a prefix of `a`, or if
`b` comes before `a` in alphabetical order (technically, lexicographical order by Unicode
code points).

# Examples
```jldoctest
julia> cmp("abc", "abc")
0

julia> cmp("ab", "abc")
-1

julia> cmp("abc", "ab")
1

julia> cmp("ab", "ac")
-1

julia> cmp("ac", "ab")
1

julia> cmp("α", "a")
1

julia> cmp("b", "β")
-1
```
"""
function cmp(a::AbstractString, b::AbstractString)
    a === b && return 0
    (iv1, iv2) = (iterate(a), iterate(b))
    while iv1 !== nothing && iv2 !== nothing
        (c, d) = (first(iv1)::AbstractChar, first(iv2)::AbstractChar)
        c ≠ d && return ifelse(c < d, -1, 1)
        (iv1, iv2) = (iterate(a, last(iv1)), iterate(b, last(iv2)))
    end
    return iv1 === nothing ? (iv2 === nothing ? 0 : -1) : 1
end

"""
    ==(a::AbstractString, b::AbstractString) -> Bool

Test whether two strings are equal character by character (technically, Unicode
code point by code point).

# Examples
```jldoctest
julia> "abc" == "abc"
true

julia> "abc" == "αβγ"
false
```
"""
==(a::AbstractString, b::AbstractString) = cmp(a, b) == 0

"""
    isless(a::AbstractString, b::AbstractString) -> Bool

Test whether string `a` comes before string `b` in alphabetical order
(technically, in lexicographical order by Unicode code points).

# Examples
```jldoctest
julia> isless("a", "b")
true

julia> isless("β", "α")
false

julia> isless("a", "a")
false
```
"""
isless(a::AbstractString, b::AbstractString) = cmp(a, b) < 0

# faster comparisons for symbols

@assume_effects :total function cmp(a::Symbol, b::Symbol)
    Int(sign(ccall(:strcmp, Int32, (Cstring, Cstring), a, b)))
end

isless(a::Symbol, b::Symbol) = cmp(a, b) < 0

# hashing

hash(s::AbstractString, h::UInt) = hash(String(s), h)

## character index arithmetic ##

"""
    length(s::AbstractString) -> Int
    length(s::AbstractString, i::Integer, j::Integer) -> Int

Return the number of characters in string `s` from indices `i` through `j`.

This is computed as the number of code unit indices from `i` to `j` which are
valid character indices. With only a single string argument, this computes
the number of characters in the entire string. With `i` and `j` arguments it
computes the number of indices between `i` and `j` inclusive that are valid
indices in the string `s`. In addition to in-bounds values, `i` may take the
out-of-bounds value `ncodeunits(s) + 1` and `j` may take the out-of-bounds
value `0`.

!!! note
    The time complexity of this operation is linear in general. That is, it
    takes time proportional to the number of bytes or characters in
    the string because it counts the characters on the fly. This is in contrast to
    the method for arrays, which is a constant-time operation.

See also [`isvalid`](@ref), [`ncodeunits`](@ref), [`lastindex`](@ref),
[`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref).

# Examples
```jldoctest
julia> length("jμΛIα")
5
```
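
The three-argument form can be sketched as follows, assuming the built-in UTF-8
`String`. In `"jμΛIα"` the valid character indices are 1, 2, 4, 6 and 7, and
`ncodeunits` is 8:

```julia
s = "jμΛIα"
length(s, 1, ncodeunits(s))   # 5, same as length(s)
length(s, 2, 5)               # 2, counting the characters that start at indices 2 and 4
length(s, 1, 0)               # 0, `j` may take the out-of-bounds value 0
length(s, ncodeunits(s) + 1, ncodeunits(s))   # 0, `i` may take the value ncodeunits(s) + 1
```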
"""
length(s::AbstractString) = @inbounds return length(s, 1, ncodeunits(s)::Int)

function length(s::AbstractString, i::Int, j::Int)
    @boundscheck begin
        0 < i ≤ ncodeunits(s)::Int+1 || throw(BoundsError(s, i))
        0 ≤ j < ncodeunits(s)::Int+1 || throw(BoundsError(s, j))
    end
    n = 0
    for k = i:j
        @inbounds n += isvalid(s, k)
    end
    return n
end

@propagate_inbounds length(s::AbstractString, i::Integer, j::Integer) =
    length(s, Int(i), Int(j))

"""
    thisind(s::AbstractString, i::Integer) -> Int

If `i` is in bounds in `s` return the index of the start of the character whose
encoding code unit `i` is part of. In other words, if `i` is the start of a
character, return `i`; if `i` is not the start of a character, rewind until the
start of a character and return that index. If `i` is equal to 0 or `ncodeunits(s)+1`
return `i`. In all other cases throw `BoundsError`.

# Examples
```jldoctest
julia> thisind("α", 0)
0

julia> thisind("α", 1)
1

julia> thisind("α", 2)
1

julia> thisind("α", 3)
3

julia> thisind("α", 4)
ERROR: BoundsError: attempt to access 2-codeunit String at index [4]
[...]

julia> thisind("α", -1)
ERROR: BoundsError: attempt to access 2-codeunit String at index [-1]
[...]
```
"""
thisind(s::AbstractString, i::Integer) = thisind(s, Int(i))

function thisind(s::AbstractString, i::Int)
    z = ncodeunits(s)::Int + 1
    i == z && return i
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
    @inbounds while 1 < i && !(isvalid(s, i)::Bool)
        i -= 1
    end
    return i
end

"""
    prevind(str::AbstractString, i::Integer, n::Integer=1) -> Int

* Case `n == 1`

  If `i` is in bounds in `str` return the index of the start of the character whose
  encoding starts before index `i`. In other words, if `i` is the start of a
  character, return the start of the previous character; if `i` is not the start
  of a character, rewind until the start of a character and return that index.
  If `i` is equal to `1` return `0`.
  If `i` is equal to `ncodeunits(str)+1` return `lastindex(str)`.
  Otherwise throw `BoundsError`.

* Case `n > 1`

  Behaves like applying `n` times `prevind` for `n==1`. The only difference
  is that if `n` is so large that applying `prevind` would reach `0` then each remaining
  iteration decreases the returned value by `1`.
  This means that in this case `prevind` can return a negative value.

* Case `n == 0`

  Return `i` only if `i` is a valid index in `str` or is equal to `ncodeunits(str)+1`.
  Otherwise `StringIndexError` or `BoundsError` is thrown.

# Examples
```jldoctest
julia> prevind("α", 3)
1

julia> prevind("α", 1)
0

julia> prevind("α", 0)
ERROR: BoundsError: attempt to access 2-codeunit String at index [0]
[...]

julia> prevind("α", 2, 2)
0

julia> prevind("α", 2, 3)
-1
```
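
As a further sketch (assuming the built-in `String`; the helper name
`chars_backward` is hypothetical and not part of Base), `prevind` can drive a
backward walk over a string. The loop ends when `prevind` returns `0`, an index
that is never used to retrieve a character:

```julia
# Hypothetical helper (not part of Base): characters of `str`, last to first.
function chars_backward(str::AbstractString)
    chars = Char[]
    i = lastindex(str)
    while i >= firstindex(str)
        push!(chars, str[i])
        i = prevind(str, i)    # returns 0 once `i` is 1
    end
    return chars
end

chars_backward("aβc")          # ['c', 'β', 'a']
```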
"""
prevind(s::AbstractString, i::Integer, n::Integer) = prevind(s, Int(i), Int(n))
prevind(s::AbstractString, i::Integer)             = prevind(s, Int(i))
prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)

function prevind(s::AbstractString, i::Int, n::Int)
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
    z = ncodeunits(s) + 1
    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
    while n > 0 && 1 < i
        @inbounds n -= isvalid(s, i -= 1)
    end
    return i - n
end

"""
    nextind(str::AbstractString, i::Integer, n::Integer=1) -> Int

* Case `n == 1`

  If `i` is in bounds in `str` return the index of the start of the character whose
  encoding starts after index `i`. In other words, if `i` is the start of a
  character, return the start of the next character; if `i` is not the start
  of a character, move forward until the start of a character and return that index.
  If `i` is equal to `0` return `1`.
  If `i` is in bounds but greater than or equal to `lastindex(str)` return `ncodeunits(str)+1`.
  Otherwise throw `BoundsError`.

* Case `n > 1`

  Behaves like applying `n` times `nextind` for `n==1`. The only difference
  is that if `n` is so large that applying `nextind` would reach `ncodeunits(str)+1` then
  each remaining iteration increases the returned value by `1`. This means that in this
  case `nextind` can return a value greater than `ncodeunits(str)+1`.

* Case `n == 0`

  Return `i` only if `i` is a valid index in `str` or is equal to `0`.
  Otherwise `StringIndexError` or `BoundsError` is thrown.

# Examples
```jldoctest
julia> nextind("α", 0)
1

julia> nextind("α", 1)
3

julia> nextind("α", 3)
ERROR: BoundsError: attempt to access 2-codeunit String at index [3]
[...]

julia> nextind("α", 0, 2)
3

julia> nextind("α", 1, 2)
4
```
"""
nextind(s::AbstractString, i::Integer, n::Integer) = nextind(s, Int(i), Int(n))
nextind(s::AbstractString, i::Integer)             = nextind(s, Int(i))
nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)

function nextind(s::AbstractString, i::Int, n::Int)
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
    z = ncodeunits(s)
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
    while n > 0 && i < z
        @inbounds n -= isvalid(s, i += 1)
    end
    return i + n
end

## string index iteration type ##

struct EachStringIndex{T<:AbstractString}
    s::T
end
keys(s::AbstractString) = EachStringIndex(s)

length(e::EachStringIndex) = length(e.s)
first(::EachStringIndex) = 1
last(e::EachStringIndex) = lastindex(e.s)
iterate(e::EachStringIndex, state=firstindex(e.s)) = state > ncodeunits(e.s) ? nothing : (state, nextind(e.s, state))
eltype(::Type{<:EachStringIndex}) = Int

"""
    isascii(c::Union{AbstractChar,AbstractString}) -> Bool

Test whether a character belongs to the ASCII character set, or whether this is true for
all elements of a string.

# Examples
```jldoctest
julia> isascii('a')
true

julia> isascii('α')
false

julia> isascii("abc")
true

julia> isascii("αβγ")
false
```
For example, `isascii` can be used as a predicate function for [`filter`](@ref) or [`replace`](@ref)
to remove or replace non-ASCII characters, respectively:
```jldoctest
julia> filter(isascii, "abcdeγfgh") # discard non-ASCII chars
"abcdefgh"

julia> replace("abcdeγfgh", !isascii=>' ') # replace non-ASCII chars with spaces
"abcde fgh"
```
"""
isascii(c::Char) = bswap(reinterpret(UInt32, c)) < 0x80
isascii(s::AbstractString) = all(isascii, s)
isascii(c::AbstractChar) = UInt32(c) < 0x80

@inline function _isascii(code_units::AbstractVector{CU}, first, last) where {CU}
    r = zero(CU)
    for n = first:last
        @inbounds r |= code_units[n]
    end
    return 0 ≤ r < 0x80
end

# The chunking algorithm makes the last two chunks overlap in order to keep the size fixed
@inline function  _isascii_chunks(chunk_size,cu::AbstractVector{CU}, first,last) where {CU}
    n=first
    while n <= last - chunk_size
        _isascii(cu,n,n+chunk_size-1) || return false
        n += chunk_size
    end
    return  _isascii(cu,last-chunk_size+1,last)
end
"""
    isascii(cu::AbstractVector{CU}) where {CU <: Integer} -> Bool

Test whether all values in the vector belong to the ASCII character set (0x00 to 0x7f).
This function is intended to be used by other string implementations that need a fast ASCII check.
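
A brief sketch of calling this method directly, assuming a Julia version that
provides it (it is defined in the source shown here) and using `codeunits` to
obtain a code unit vector from a string:

```julia
isascii(codeunits("abc"))          # true
isascii(codeunits("αbc"))          # false, 'α' is encoded with bytes ≥ 0x80
isascii(UInt8[0x00, 0x41, 0x7f])   # true
isascii(UInt8[0x80])               # false
```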
"""
function isascii(cu::AbstractVector{CU}) where {CU <: Integer}
    chunk_size = 1024
    chunk_threshold =  chunk_size + (chunk_size ÷ 2)
    first = firstindex(cu);   last = lastindex(cu)
    l = last - first + 1
    l < chunk_threshold && return _isascii(cu,first,last)
    return _isascii_chunks(chunk_size,cu,first,last)
end

## string map, filter ##

function map(f, s::AbstractString)
    out = StringVector(max(4, sizeof(s)::Int÷sizeof(codeunit(s)::CodeunitType)))
    index = UInt(1)
    for c::AbstractChar in s
        c′ = f(c)
        isa(c′, AbstractChar) || throw(ArgumentError(
            "map(f, s::AbstractString) requires f to return AbstractChar; " *
            "try map(f, collect(s)) or a comprehension instead"))
        index + 3 > length(out) && resize!(out, unsigned(2 * length(out)))
        index += __unsafe_string!(out, convert(Char, c′), index)
    end
    resize!(out, index-1)
    sizehint!(out, index-1)
    return String(out)
end

function filter(f, s::AbstractString)
    out = IOBuffer(sizehint=sizeof(s))
    for c in s
        f(c) && write(out, c)
    end
    String(_unsafe_take!(out))
end

## string first and last ##

"""
    first(s::AbstractString, n::Integer)

Get a string consisting of the first `n` characters of `s`.

# Examples
```jldoctest
julia> first("∀ϵ≠0: ϵ²>0", 0)
""

julia> first("∀ϵ≠0: ϵ²>0", 1)
"∀"

julia> first("∀ϵ≠0: ϵ²>0", 3)
"∀ϵ≠"
```
"""
first(s::AbstractString, n::Integer) = @inbounds s[1:min(end, nextind(s, 0, n))]

"""
    last(s::AbstractString, n::Integer)

Get a string consisting of the last `n` characters of `s`.

# Examples
```jldoctest
julia> last("∀ϵ≠0: ϵ²>0", 0)
""

julia> last("∀ϵ≠0: ϵ²>0", 1)
"0"

julia> last("∀ϵ≠0: ϵ²>0", 3)
"²>0"
```
"""
last(s::AbstractString, n::Integer) = @inbounds s[max(1, prevind(s, ncodeunits(s)+1, n)):end]

"""
    reverseind(v, i)

Given an index `i` in [`reverse(v)`](@ref), return the corresponding index in
`v` so that `v[reverseind(v,i)] == reverse(v)[i]`. (This can be nontrivial in
cases where `v` contains non-ASCII characters.)

# Examples
```jldoctest
julia> s = "Julia🚀"
"Julia🚀"

julia> r = reverse(s)
"🚀ailuJ"

julia> for i in eachindex(s)
           print(r[reverseind(r, i)])
       end
Julia🚀
```
"""
reverseind(s::AbstractString, i::Integer) = thisind(s, ncodeunits(s)-i+1)

"""
    repeat(s::AbstractString, r::Integer)

Repeat a string `r` times. This can be written as `s^r`.

See also [`^`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).

# Examples
```jldoctest
julia> repeat("ha", 3)
"hahaha"
```
"""
repeat(s::AbstractString, r::Integer) = repeat(String(s), r)

"""
    ^(s::Union{AbstractString,AbstractChar}, n::Integer) -> AbstractString

Repeat a string or character `n` times. This can also be written as `repeat(s, n)`.

See also [`repeat`](@ref).

# Examples
```jldoctest
julia> "Test "^3
"Test Test Test "
```
"""
(^)(s::Union{AbstractString,AbstractChar}, r::Integer) = repeat(s, r)

# reverse-order iteration for strings and indices thereof
iterate(r::Iterators.Reverse{<:AbstractString}, i=lastindex(r.itr)) = i < firstindex(r.itr) ? nothing : (r.itr[i], prevind(r.itr, i))
iterate(r::Iterators.Reverse{<:EachStringIndex}, i=lastindex(r.itr.s)) = i < firstindex(r.itr.s) ? nothing : (i, prevind(r.itr.s, i))

## code unit access ##

"""
    CodeUnits(s::AbstractString)

Wrap a string (without copying) in an immutable vector-like object that accesses the code units
of the string's representation.
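
A small sketch of how such a wrapper behaves, assuming the built-in `String`
and obtaining the wrapper through `codeunits`:

```julia
s = "Juλia"
cu = codeunits(s)                 # Base.CodeUnits{UInt8, String}, no copy of `s`
length(cu) == ncodeunits(s)       # true
cu[3]                             # 0xce, the first byte of 'λ'
String(collect(cu)) == s          # true, `collect` copies the code units into a Vector{UInt8}
```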
"""
struct CodeUnits{T,S<:AbstractString} <: DenseVector{T}
    s::S
    CodeUnits(s::S) where {S<:AbstractString} = new{codeunit(s),S}(s)
end

length(s::CodeUnits) = ncodeunits(s.s)
sizeof(s::CodeUnits{T}) where {T} = ncodeunits(s.s) * sizeof(T)
size(s::CodeUnits) = (length(s),)
elsize(s::Type{<:CodeUnits{T}}) where {T} = sizeof(T)
@propagate_inbounds getindex(s::CodeUnits, i::Int) = codeunit(s.s, i)
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing


write(io::IO, s::CodeUnits) = write(io, s.s)

unsafe_convert(::Type{Ptr{T}},    s::CodeUnits{T}) where {T} = unsafe_convert(Ptr{T}, s.s)
unsafe_convert(::Type{Ptr{Int8}}, s::CodeUnits{UInt8}) = unsafe_convert(Ptr{Int8}, s.s)

"""
    codeunits(s::AbstractString)

Obtain a vector-like object containing the code units of a string.
Returns a `CodeUnits` wrapper by default, but `codeunits` may optionally be defined
for new string types if necessary.

# Examples
```jldoctest
julia> codeunits("Juλia")
6-element Base.CodeUnits{UInt8, String}:
 0x4a
 0x75
 0xce
 0xbb
 0x69
 0x61
```
"""
codeunits(s::AbstractString) = CodeUnits(s)

function _split_rest(s::AbstractString, n::Int)
    lastind = lastindex(s)
    i = try
        prevind(s, lastind, n)
    catch e
        e isa BoundsError || rethrow()
        _check_length_split_rest(length(s), n)
    end
    last_n = SubString(s, nextind(s, i), lastind)
    front = s[begin:i]
    return front, last_n
end