
JuliaLang / julia / #37433

pending completion

push

local

web-flow
Merge pull request #48513 from JuliaLang/jn/extend-once

ensure extension triggers are only run by the package that satified them

60 of 60 new or added lines in 1 file covered. (100.0%)

72324 of 82360 relevant lines covered (87.81%)

31376331.4 hits per line

Source File

/base/strings/basic.jl (98.06% covered)
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
"""
4
The `AbstractString` type is the supertype of all string implementations in
5
Julia. Strings are encodings of sequences of [Unicode](https://unicode.org/)
6
code points as represented by the `AbstractChar` type. Julia makes a few assumptions
7
about strings:
8

9
* Strings are encoded in terms of fixed-size "code units"
10
  * Code units can be extracted with `codeunit(s, i)`
11
  * The first code unit has index `1`
12
  * The last code unit has index `ncodeunits(s)`
13
  * Any index `i` such that `1 ≤ i ≤ ncodeunits(s)` is in bounds
14
* String indexing is done in terms of these code units:
15
  * Characters are extracted by `s[i]` with a valid string index `i`
16
  * Each `AbstractChar` in a string is encoded by one or more code units
17
  * Only the index of the first code unit of an `AbstractChar` is a valid index
18
  * The encoding of an `AbstractChar` is independent of what precedes or follows it
19
  * String encodings are [self-synchronizing] – i.e. `isvalid(s, i)` is O(1)
20

21
[self-synchronizing]: https://en.wikipedia.org/wiki/Self-synchronizing_code
22

23
Some string functions that extract code units, characters or substrings from
24
strings error if you pass them out-of-bounds or invalid string indices. This
25
includes `codeunit(s, i)` and `s[i]`. Functions that do string
26
index arithmetic take a more relaxed approach to indexing and give you the
27
closest valid string index when in-bounds, or when out-of-bounds, behave as if
28
there were an infinite number of characters padding each side of the string.
29
Usually these imaginary padding characters have code unit length `1` but string
30
types may choose different "imaginary" character sizes as makes sense for their
31
implementations (e.g. substrings may pass index arithmetic through to the
32
underlying string they provide a view into). Relaxed indexing functions include
33
those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
34
model allows index arithmetic to work with out-of-bounds indices as
35
intermediate values so long as one never uses them to retrieve a character,
36
which often helps avoid needing to code around edge cases.
37

38
See also [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref),
39
[`nextind`](@ref), [`prevind`](@ref).
40
"""
41
AbstractString
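
As a small illustration of the indexing model described in the docstring above, the following sketch walks a UTF-8 `String` and contrasts strict access with the relaxed index arithmetic functions. The sample text is arbitrary and chosen only because it mixes one-unit and two-unit characters.

```julia
s = "αβc"   # in UTF-8, α and β take two code units each and c takes one

# Every index from 1 to ncodeunits(s) is in bounds, but only character starts are valid.
@assert ncodeunits(s) == 5
@assert [i for i in 1:ncodeunits(s) if isvalid(s, i)] == [1, 3, 5]

# Strict access: indexing at a continuation code unit throws.
err = try
    s[2]
catch e
    e
end
@assert err isa StringIndexError

# Relaxed index arithmetic: out-of-bounds results are fine as intermediate values.
@assert thisind(s, 2) == 1                  # rewinds to the start of α
@assert nextind(s, 5) == ncodeunits(s) + 1  # one past the end
@assert prevind(s, 1) == 0                  # one before the start
```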
42

43
## required string functions ##
44

45
"""
46
    ncodeunits(s::AbstractString) -> Int
47

48
Return the number of code units in a string. Indices that are in bounds to
49
access this string must satisfy `1 ≤ i ≤ ncodeunits(s)`. Not all such indices
50
are valid – they may not be the start of a character, but they will return a
51
code unit value when calling `codeunit(s,i)`.
52

53
# Examples
54
```jldoctest
55
julia> ncodeunits("The Julia Language")
56
18
57

58
julia> ncodeunits("∫eˣ")
59
6
60

61
julia> ncodeunits('∫'), ncodeunits('e'), ncodeunits('ˣ')
62
(3, 1, 2)
63
```
64

65
See also [`codeunit`](@ref), [`checkbounds`](@ref), [`sizeof`](@ref),
66
[`length`](@ref), [`lastindex`](@ref).
67
"""
68
ncodeunits(s::AbstractString)
69

70
"""
71
    codeunit(s::AbstractString) -> Type{<:Union{UInt8, UInt16, UInt32}}
72

73
Return the code unit type of the given string object. For ASCII, Latin-1, or
74
UTF-8 encoded strings, this would be `UInt8`; for UCS-2 and UTF-16 it would be
75
`UInt16`; for UTF-32 it would be `UInt32`. The code unit type need not be
76
limited to these three types, but it's hard to think of widely used string
77
encodings that don't use one of these units. `codeunit(s)` is the same as
78
`typeof(codeunit(s,1))` when `s` is a non-empty string.
79

80
See also [`ncodeunits`](@ref).
81
"""
82
codeunit(s::AbstractString)
83

84
const CodeunitType = Union{Type{UInt8},Type{UInt16},Type{UInt32}}
85

86
"""
87
    codeunit(s::AbstractString, i::Integer) -> Union{UInt8, UInt16, UInt32}
88

89
Return the code unit value in the string `s` at index `i`. Note that
90

91
    codeunit(s, i) :: codeunit(s)
92

93
I.e. the value returned by `codeunit(s, i)` is of the type returned by
94
`codeunit(s)`.
95

96
# Examples
97
```jldoctest
98
julia> a = codeunit("Hello", 2)
99
0x65
100

101
julia> typeof(a)
102
UInt8
103
```
104

105
See also [`ncodeunits`](@ref), [`checkbounds`](@ref).
106
"""
107
@propagate_inbounds codeunit(s::AbstractString, i::Integer) = i isa Int ?
3✔
108
    throw(MethodError(codeunit, (s, i))) : codeunit(s, Int(i))
109

110
"""
111
    isvalid(s::AbstractString, i::Integer) -> Bool
112

113
Predicate indicating whether the given index is the start of the encoding of a
114
character in `s` or not. If `isvalid(s, i)` is true then `s[i]` will return the
115
character whose encoding starts at that index; if it's false, then `s[i]` will
116
raise an invalid index error or a bounds error depending on whether `i` is in bounds.
117
In order for `isvalid(s, i)` to be an O(1) function, the encoding of `s` must be
118
[self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code). This
119
is a basic assumption of Julia's generic string support.
120

121
See also [`getindex`](@ref), [`iterate`](@ref), [`thisind`](@ref),
122
[`nextind`](@ref), [`prevind`](@ref), [`length`](@ref).
123

124
# Examples
125
```jldoctest
126
julia> str = "αβγdef";
127

128
julia> isvalid(str, 1)
129
true
130

131
julia> str[1]
132
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
133

134
julia> isvalid(str, 2)
135
false
136

137
julia> str[2]
138
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'β'
139
Stacktrace:
140
[...]
141
```
142
"""
143
@propagate_inbounds isvalid(s::AbstractString, i::Integer) = i isa Int ?
164,995✔
144
    throw(MethodError(isvalid, (s, i))) : isvalid(s, Int(i))
145

146
"""
147
    iterate(s::AbstractString, i::Integer) -> Union{Tuple{<:AbstractChar, Int}, Nothing}
148

149
Return a tuple of the character in `s` at index `i` with the index of the start
150
of the following character in `s`. This is the key method that allows strings to
151
be iterated, yielding a sequence of characters. If `i` is out of bounds in `s`
152
then a bounds error is raised. The `iterate` function, as part of the iteration
153
protocol, may assume that `i` is the start of a character in `s`.
154

155
See also [`getindex`](@ref), [`checkbounds`](@ref).
156
"""
157
@propagate_inbounds iterate(s::AbstractString, i::Integer) = i isa Int ?
329,958✔
158
    throw(MethodError(iterate, (s, i))) : iterate(s, Int(i))
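
To show how the required functions above fit together, here is a minimal sketch of a custom string type. `WrappedString` is a hypothetical name used only for illustration; it forwards every required method to an inner `String` and relies on the generic fallbacks in this file for everything else.

```julia
# Hypothetical wrapper type, not part of Base.
struct WrappedString <: AbstractString
    data::String
end

# The required interface: code unit count, code unit type and value,
# index validity, and iteration (with a default start index).
Base.ncodeunits(w::WrappedString) = ncodeunits(w.data)
Base.codeunit(w::WrappedString) = codeunit(w.data)
Base.codeunit(w::WrappedString, i::Int) = codeunit(w.data, i)
Base.isvalid(w::WrappedString, i::Int) = isvalid(w.data, i)
Base.iterate(w::WrappedString, i::Int=1) = iterate(w.data, i)

w = WrappedString("αβc")

# These all go through the generic AbstractString definitions.
@assert length(w) == 3
@assert w[3] == 'β'
@assert lastindex(w) == 5
@assert nextind(w, 1) == 3
@assert collect(w) == ['α', 'β', 'c']
@assert w == "αβc"
```

Defining the `Int` methods is enough because the generic `Integer` methods shown above convert the index and dispatch again.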
159

160
## basic generic definitions ##
161

162
eltype(::Type{<:AbstractString}) = Char # some string types may use another AbstractChar
100✔
163

164
"""
165
    sizeof(str::AbstractString)
166

167
Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by
168
the size, in bytes, of one code unit in `str`.
169

170
# Examples
171
```jldoctest
172
julia> sizeof("")
173
0
174

175
julia> sizeof("∀")
176
3
177
```
178
"""
179
sizeof(s::AbstractString) = ncodeunits(s)::Int * sizeof(codeunit(s)::CodeunitType)
11,504,861✔
180
firstindex(s::AbstractString) = 1
850✔
181
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)::Int)
7,234,137✔
182
isempty(s::AbstractString) = iszero(ncodeunits(s)::Int)
5,406,616✔
183

184
function getindex(s::AbstractString, i::Integer)
167,621✔
185
    @boundscheck checkbounds(s, i)
167,622✔
186
    @inbounds return isvalid(s, i) ? (iterate(s, i)::NTuple{2,Any})[1] : string_index_err(s, i)
167,621✔
187
end
188

189
getindex(s::AbstractString, i::Colon) = s
1✔
190
# TODO: handle other ranges with stride ±1 specially?
191
# TODO: add more @propagate_inbounds annotations?
192
getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
4✔
193
    sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v))
27✔
194
getindex(s::AbstractString, v::AbstractVector{Bool}) =
2✔
195
    throw(ArgumentError("logical indexing not supported for strings"))
196

197
function get(s::AbstractString, i::Integer, default)
5✔
198
# TODO: use ternary once @inbounds is expression-like
199
    if checkbounds(Bool, s, i)
6✔
200
        @inbounds return s[i]
3✔
201
    else
202
        return default
2✔
203
    end
204
end
205

206
## bounds checking ##
207

208
checkbounds(::Type{Bool}, s::AbstractString, i::Integer) =
91,770,254✔
209
    1 ≤ i ≤ ncodeunits(s)::Int
210
checkbounds(::Type{Bool}, s::AbstractString, r::AbstractRange{<:Integer}) =
12,906,620✔
211
    isempty(r) || (1 ≤ minimum(r) && maximum(r) ≤ ncodeunits(s)::Int)
212
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Real}) =
1✔
213
    all(i -> checkbounds(Bool, s, i), I)
1✔
214
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Integer}) =
9✔
215
    all(i -> checkbounds(Bool, s, i), I)
24✔
216
checkbounds(s::AbstractString, I::Union{Integer,AbstractArray}) =
16,072,514✔
217
    checkbounds(Bool, s, I) ? nothing : throw(BoundsError(s, I))
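
The difference between an in-bounds index and a valid index can be seen in a short sketch like the one below. The sample string is arbitrary.

```julia
s = "αβ"   # four code units in UTF-8; characters start at indices 1 and 3

@assert checkbounds(Bool, s, 2)     # in bounds...
@assert !isvalid(s, 2)              # ...but not the start of a character
@assert !checkbounds(Bool, s, 5)
@assert checkbounds(Bool, s, 2:4)
@assert checkbounds(Bool, s, 1:0)   # empty ranges are always in bounds

@assert get(s, 3, '?') == 'β'
@assert get(s, 5, '?') == '?'       # out of bounds, so the default is returned
```

Note that `get` only guards against out-of-bounds indices; an in-bounds index that is not the start of a character is still passed on to `s[i]`.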
218

219
## construction, conversion, promotion ##
220

221
string() = ""
87✔
222
string(s::AbstractString) = s
4✔
223

224
Vector{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
×
225
Array{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
1✔
226
Vector{T}(s::AbstractString) where {T<:AbstractChar} = collect(T, s)
235✔
227

228
Symbol(s::AbstractString) = Symbol(String(s))
1✔
229
Symbol(x...) = Symbol(string(x...))
14,121✔
230

231
convert(::Type{T}, s::T) where {T<:AbstractString} = s
100,589✔
232
convert(::Type{T}, s::AbstractString) where {T<:AbstractString} = T(s)::T
64,952✔
233

234
## summary ##
235

236
function summary(io::IO, s::AbstractString)
3✔
237
    prefix = isempty(s) ? "empty" : string(ncodeunits(s), "-codeunit")
5✔
238
    print(io, prefix, " ", typeof(s))
3✔
239
end
240

241
## string & character concatenation ##
242

243
"""
244
    *(s::Union{AbstractString, AbstractChar}, t::Union{AbstractString, AbstractChar}...) -> AbstractString
245

246
Concatenate strings and/or characters, producing a [`String`](@ref). This is equivalent
247
to calling the [`string`](@ref) function on the arguments. Concatenation of built-in
248
string types always produces a value of type `String` but other string types may choose
249
to return a string of a different type as appropriate.
250

251
# Examples
252
```jldoctest
253
julia> "Hello " * "world"
254
"Hello world"
255

256
julia> 'j' * "ulia"
257
"julia"
258
```
259
"""
260
(*)(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...) = string(s1, ss...)
4,603,178✔
261

262
one(::Union{T,Type{T}}) where {T<:AbstractString} = convert(T, "")
4✔
263

264
## generic string comparison ##
265

266
"""
267
    cmp(a::AbstractString, b::AbstractString) -> Int
268

269
Compare two strings. Return `0` if both strings have the same length and the character
270
at each index is the same in both strings. Return `-1` if `a` is a prefix of `b`, or if
271
`a` comes before `b` in alphabetical order. Return `1` if `b` is a prefix of `a`, or if
272
`b` comes before `a` in alphabetical order (technically, lexicographical order by Unicode
273
code points).
274

275
# Examples
276
```jldoctest
277
julia> cmp("abc", "abc")
278
0
279

280
julia> cmp("ab", "abc")
281
-1
282

283
julia> cmp("abc", "ab")
284
1
285

286
julia> cmp("ab", "ac")
287
-1
288

289
julia> cmp("ac", "ab")
290
1
291

292
julia> cmp("α", "a")
293
1
294

295
julia> cmp("b", "β")
296
-1
297
```
298
"""
299
function cmp(a::AbstractString, b::AbstractString)
342✔
300
    a === b && return 0
342✔
301
    (iv1, iv2) = (iterate(a), iterate(b))
489✔
302
    while iv1 !== nothing && iv2 !== nothing
919✔
303
        (c, d) = (first(iv1)::AbstractChar, first(iv2)::AbstractChar)
637✔
304
        c ≠ d && return ifelse(c < d, -1, 1)
637✔
305
        (iv1, iv2) = (iterate(a, last(iv1)), iterate(b, last(iv2)))
867✔
306
    end
594✔
307
    return iv1 === nothing ? (iv2 === nothing ? 0 : -1) : 1
282✔
308
end
309

310
"""
311
    ==(a::AbstractString, b::AbstractString) -> Bool
312

313
Test whether two strings are equal character by character (technically, Unicode
314
code point by code point).
315

316
# Examples
317
```jldoctest
318
julia> "abc" == "abc"
319
true
320

321
julia> "abc" == "αβγ"
322
false
323
```
324
"""
325
==(a::AbstractString, b::AbstractString) = cmp(a, b) == 0
283✔
326

327
"""
328
    isless(a::AbstractString, b::AbstractString) -> Bool
329

330
Test whether string `a` comes before string `b` in alphabetical order
331
(technically, in lexicographical order by Unicode code points).
332

333
# Examples
334
```jldoctest
335
julia> isless("a", "b")
336
true
337

338
julia> isless("β", "α")
339
false
340

341
julia> isless("a", "a")
342
false
343
```
344
"""
345
isless(a::AbstractString, b::AbstractString) = cmp(a, b) < 0
182,147✔
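
Because comparison is by Unicode code point rather than by any locale's alphabet, sorting can give results that differ from dictionary order. A brief sketch with arbitrary sample words:

```julia
words = ["b", "α", "a", "Z"]

# 'Z' (U+005A) sorts before 'a' (U+0061), and 'α' (U+03B1) sorts after both.
@assert sort(words) == ["Z", "a", "b", "α"]
@assert cmp("Z", "a") == -1
@assert isless("b", "α")

# One way to get a case-insensitive ordering is to sort by a normalized key.
@assert sort(words, by=lowercase) == ["a", "b", "Z", "α"]
```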
346

347
# faster comparisons for symbols
348

349
@assume_effects :total function cmp(a::Symbol, b::Symbol)
×
350
    Int(sign(ccall(:strcmp, Int32, (Cstring, Cstring), a, b)))
13,648,746✔
351
end
352

353
isless(a::Symbol, b::Symbol) = cmp(a, b) < 0
13,633,866✔
354

355
# hashing
356

357
hash(s::AbstractString, h::UInt) = hash(String(s), h)
1✔
358

359
## character index arithmetic ##
360

361
"""
362
    length(s::AbstractString) -> Int
363
    length(s::AbstractString, i::Integer, j::Integer) -> Int
364

365
Return the number of characters in string `s` from indices `i` through `j`.
366

367
This is computed as the number of code unit indices from `i` to `j` which are
368
valid character indices. With only a single string argument, this computes
369
the number of characters in the entire string. With `i` and `j` arguments it
370
computes the number of indices between `i` and `j` inclusive that are valid
371
indices in the string `s`. In addition to in-bounds values, `i` may take the
372
out-of-bounds value `ncodeunits(s) + 1` and `j` may take the out-of-bounds
373
value `0`.
374

375
!!! note
376
    The time complexity of this operation is linear in general. That is, it
377
    will take time proportional to the number of bytes or characters in
378
    the string because it counts the value on the fly. This is in contrast to
379
    the method for arrays, which is a constant-time operation.
380

381
See also [`isvalid`](@ref), [`ncodeunits`](@ref), [`lastindex`](@ref),
382
[`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref).
383

384
# Examples
385
```jldoctest
386
julia> length("jμΛIα")
387
5
388
```
389
"""
390
length(s::AbstractString) = @inbounds return length(s, 1, ncodeunits(s)::Int)
3,362✔
391

392
function length(s::AbstractString, i::Int, j::Int)
59,571✔
393
    @boundscheck begin
59,571✔
394
        0 < i ≤ ncodeunits(s)::Int+1 || throw(BoundsError(s, i))
59,571✔
395
        0 ≤ j < ncodeunits(s)::Int+1 || throw(BoundsError(s, j))
59,574✔
396
    end
397
    n = 0
59,568✔
398
    for k = i:j
90,622✔
399
        @inbounds n += isvalid(s, k)
1,703,198✔
400
    end
3,375,342✔
401
    return n
59,568✔
402
end
403

404
@propagate_inbounds length(s::AbstractString, i::Integer, j::Integer) =
1✔
405
    length(s, Int(i), Int(j))
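
The three-argument form counts character starts within a range of code unit indices. A small sketch using the string from the docstring example:

```julia
s = "jμΛIα"   # characters start at code unit indices 1, 2, 4, 6 and 7

@assert ncodeunits(s) == 8
@assert length(s) == 5
@assert length(s, 2, 5) == 2                      # only indices 2 and 4 start a character
@assert length(s, 1, 0) == 0                      # j may take the out-of-bounds value 0
@assert length(s, 9, 8) == 0                      # i may take the value ncodeunits(s) + 1
@assert length(s, 1, ncodeunits(s)) == length(s)
```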
406

407
"""
408
    thisind(s::AbstractString, i::Integer) -> Int
409

410
If `i` is in bounds in `s` return the index of the start of the character whose
411
encoding code unit `i` is part of. In other words, if `i` is the start of a
412
character, return `i`; if `i` is not the start of a character, rewind until the
413
start of a character and return that index. If `i` is equal to 0 or `ncodeunits(s)+1`
414
return `i`. In all other cases throw `BoundsError`.
415

416
# Examples
417
```jldoctest
418
julia> thisind("α", 0)
419
0
420

421
julia> thisind("α", 1)
422
1
423

424
julia> thisind("α", 2)
425
1
426

427
julia> thisind("α", 3)
428
3
429

430
julia> thisind("α", 4)
431
ERROR: BoundsError: attempt to access 2-codeunit String at index [4]
432
[...]
433

434
julia> thisind("α", -1)
435
ERROR: BoundsError: attempt to access 2-codeunit String at index [-1]
436
[...]
437
```
438
"""
439
thisind(s::AbstractString, i::Integer) = thisind(s, Int(i))
4✔
440

441
function thisind(s::AbstractString, i::Int)
1,210✔
442
    z = ncodeunits(s)::Int + 1
1,210✔
443
    i == z && return i
1,210✔
444
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
1,213✔
445
    @inbounds while 1 < i && !(isvalid(s, i)::Bool)
1,189✔
446
        i -= 1
1,154✔
447
    end
1,154✔
448
    return i
1,189✔
449
end
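
One use of `thisind` is snapping an arbitrary code unit offset back to a character boundary. The helper below is a hypothetical example, not part of Base: it returns the longest prefix that fits in a given number of code units without splitting a character.

```julia
# Hypothetical helper, not part of Base.
function truncate_units(s::AbstractString, maxunits::Integer)
    maxunits ≥ ncodeunits(s) && return s
    maxunits ≤ 0 && return s[1:0]
    # Code unit maxunits + 1 is in bounds here; find the character that owns it.
    i = thisind(s, maxunits + 1)
    # That character does not fit, so keep everything before it.
    return s[1:prevind(s, i)]
end

s = "αβc"   # α and β take two code units each, c takes one
@assert truncate_units(s, 3) == "α"
@assert truncate_units(s, 4) == "αβ"
@assert truncate_units(s, 1) == ""
@assert truncate_units(s, 10) == s
```

When the owning character starts at index `1`, `prevind` returns `0` and the empty range `1:0` yields an empty string, so no special case is needed there.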
450

451
"""
452
    prevind(str::AbstractString, i::Integer, n::Integer=1) -> Int
453

454
* Case `n == 1`
455

456
  If `i` is in bounds in `str` return the index of the start of the character whose
457
  encoding starts before index `i`. In other words, if `i` is the start of a
458
  character, return the start of the previous character; if `i` is not the start
459
  of a character, rewind until the start of a character and return that index.
460
  If `i` is equal to `1` return `0`.
461
  If `i` is equal to `ncodeunits(str)+1` return `lastindex(str)`.
462
  Otherwise throw `BoundsError`.
463

464
* Case `n > 1`
465

466
  Behaves like applying `n` times `prevind` for `n==1`. The only difference
467
  is that if `n` is so large that applying `prevind` would reach `0` then each remaining
468
  iteration decreases the returned value by `1`.
469
  This means that in this case `prevind` can return a negative value.
470

471
* Case `n == 0`
472

473
  Return `i` only if `i` is a valid index in `str` or is equal to `ncodeunits(str)+1`.
474
  Otherwise `StringIndexError` or `BoundsError` is thrown.
475

476
# Examples
477
```jldoctest
478
julia> prevind("α", 3)
479
1
480

481
julia> prevind("α", 1)
482
0
483

484
julia> prevind("α", 0)
485
ERROR: BoundsError: attempt to access 2-codeunit String at index [0]
486
[...]
487

488
julia> prevind("α", 2, 2)
489
0
490

491
julia> prevind("α", 2, 3)
492
-1
493
```
494
"""
495
prevind(s::AbstractString, i::Integer, n::Integer) = prevind(s, Int(i), Int(n))
2✔
496
prevind(s::AbstractString, i::Integer)             = prevind(s, Int(i))
2,236,720✔
497
prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)
8,776,547✔
498

499
function prevind(s::AbstractString, i::Int, n::Int)
11,000,780✔
500
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
11,000,780✔
501
    z = ncodeunits(s) + 1
11,000,776✔
502
    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
11,000,810✔
503
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
11,000,742✔
504
    while n > 0 && 1 < i
44,960,274✔
505
        @inbounds n -= isvalid(s, i -= 1)
34,025,464✔
506
    end
34,025,464✔
507
    return i - n
10,934,810✔
508
end
509

510
"""
511
    nextind(str::AbstractString, i::Integer, n::Integer=1) -> Int
512

513
* Case `n == 1`
514

515
  If `i` is in bounds in `str` return the index of the start of the character whose
516
  encoding starts after index `i`. In other words, if `i` is the start of a
517
  character, return the start of the next character; if `i` is not the start
518
  of a character, move forward until the start of a character and return that index.
519
  If `i` is equal to `0` return `1`.
520
  If `i` is in bounds but greater than or equal to `lastindex(str)` return `ncodeunits(str)+1`.
521
  Otherwise throw `BoundsError`.
522

523
* Case `n > 1`
524

525
  Behaves like applying `n` times `nextind` for `n==1`. The only difference
526
  is that if `n` is so large that applying `nextind` would reach `ncodeunits(str)+1` then
527
  each remaining iteration increases the returned value by `1`. This means that in this
528
  case `nextind` can return a value greater than `ncodeunits(str)+1`.
529

530
* Case `n == 0`
531

532
  Return `i` only if `i` is a valid index in `str` or is equal to `0`.
533
  Otherwise `StringIndexError` or `BoundsError` is thrown.
534

535
# Examples
536
```jldoctest
537
julia> nextind("α", 0)
538
1
539

540
julia> nextind("α", 1)
541
3
542

543
julia> nextind("α", 3)
544
ERROR: BoundsError: attempt to access 2-codeunit String at index [3]
545
[...]
546

547
julia> nextind("α", 0, 2)
548
3
549

550
julia> nextind("α", 1, 2)
551
4
552
```
553
"""
554
nextind(s::AbstractString, i::Integer, n::Integer) = nextind(s, Int(i), Int(n))
2✔
555
nextind(s::AbstractString, i::Integer)             = nextind(s, Int(i))
2✔
556
nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)
2,157✔
557

558
function nextind(s::AbstractString, i::Int, n::Int)
2,277,477✔
559
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
2,277,477✔
560
    z = ncodeunits(s)
2,277,470✔
561
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
2,277,490✔
562
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
2,277,450✔
563
    while n > 0 && i < z
34,341,986✔
564
        @inbounds n -= isvalid(s, i += 1)
32,121,120✔
565
    end
32,121,120✔
566
    return i + n
2,220,866✔
567
end
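
The `n` argument of `nextind` and `prevind` makes it possible to locate the n-th character from either end, and the two-argument forms are enough to walk a string by hand. The sketch below assumes a UTF-8 `String`; `char_starts` is a hypothetical helper written only for this example.

```julia
s = "αβγ"   # characters start at code unit indices 1, 3 and 5; ncodeunits(s) == 6

# Index of the n-th character, counting from either end.
@assert nextind(s, 0, 2) == 3
@assert s[nextind(s, 0, 2)] == 'β'
@assert prevind(s, ncodeunits(s) + 1, 1) == 5

# Past either end, each remaining step moves the result by one.
@assert nextind(s, 5, 2) == 8
@assert prevind(s, 1, 3) == -2

# Hypothetical helper: collect every character start by repeated nextind calls.
function char_starts(str::AbstractString)
    starts = Int[]
    i = nextind(str, 0)
    while i ≤ ncodeunits(str)
        push!(starts, i)
        i = nextind(str, i)
    end
    return starts
end

@assert char_starts(s) == [1, 3, 5]
```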
568

569
## string index iteration type ##
570

571
struct EachStringIndex{T<:AbstractString}
572
    s::T
5,314✔
573
end
574
keys(s::AbstractString) = EachStringIndex(s)
39,085✔
575

576
length(e::EachStringIndex) = length(e.s)
868✔
577
first(::EachStringIndex) = 1
405✔
578
last(e::EachStringIndex) = lastindex(e.s)
342✔
579
iterate(e::EachStringIndex, state=firstindex(e.s)) = state > ncodeunits(e.s) ? nothing : (state, nextind(e.s, state))
418,535✔
580
eltype(::Type{<:EachStringIndex}) = Int
3✔
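
In practice, `EachStringIndex` is what `keys` and `eachindex` return for a string, so it yields only the valid character indices. A short sketch with an arbitrary sample string:

```julia
s = "αβc"   # characters start at code unit indices 1, 3 and 5

@assert collect(eachindex(s)) == [1, 3, 5]
@assert collect(keys(s)) == collect(eachindex(s))
@assert [(i, s[i]) for i in eachindex(s)] == [(1, 'α'), (3, 'β'), (5, 'c')]

# Reverse iteration over the indices is supported as well.
@assert collect(Iterators.reverse(eachindex(s))) == [5, 3, 1]
```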
581

582
"""
583
    isascii(c::Union{AbstractChar,AbstractString}) -> Bool
584

585
Test whether a character belongs to the ASCII character set, or whether this is true for
586
all elements of a string.
587

588
# Examples
589
```jldoctest
590
julia> isascii('a')
591
true
592

593
julia> isascii('α')
594
false
595

596
julia> isascii("abc")
597
true
598

599
julia> isascii("αβγ")
600
false
601
```
602
For example, `isascii` can be used as a predicate function for [`filter`](@ref) or [`replace`](@ref)
603
to remove or replace non-ASCII characters, respectively:
604
```jldoctest
605
julia> filter(isascii, "abcdeγfgh") # discard non-ASCII chars
606
"abcdefgh"
607

608
julia> replace("abcdeγfgh", !isascii=>' ') # replace non-ASCII chars with spaces
609
"abcde fgh"
610
```
611
"""
612
isascii(c::Char) = bswap(reinterpret(UInt32, c)) < 0x80
3,975,991✔
613
isascii(s::AbstractString) = all(isascii, s)
1✔
614
isascii(c::AbstractChar) = UInt32(c) < 0x80
1✔
615

616
## string map, filter ##
617

618
function map(f, s::AbstractString)
18,714✔
619
    out = StringVector(max(4, sizeof(s)::Int÷sizeof(codeunit(s)::CodeunitType)))
18,723✔
620
    index = UInt(1)
17✔
621
    for c::AbstractChar in s
37,371✔
622
        c′ = f(c)
207,302✔
623
        isa(c′, AbstractChar) || throw(ArgumentError(
101✔
624
            "map(f, s::AbstractString) requires f to return AbstractChar; " *
625
            "try map(f, collect(s)) or a comprehension instead"))
626
        index + 3 > length(out) && resize!(out, unsigned(2 * length(out)))
207,301✔
627
        index += __unsafe_string!(out, convert(Char, c′), index)
207,375✔
628
    end
395,857✔
629
    resize!(out, index-1)
37,426✔
630
    sizehint!(out, index-1)
18,713✔
631
    return String(out)
18,713✔
632
end
633

634
function filter(f, s::AbstractString)
2✔
635
    out = IOBuffer(sizehint=sizeof(s))
4✔
636
    for c in s
2✔
637
        f(c) && write(out, c)
57✔
638
    end
57✔
639
    String(take!(out))
2✔
640
end
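
A brief usage sketch for the two functions above. The important restriction is that the function passed to `map` must return a character; for any other result type, the error message itself suggests mapping over `collect(s)` instead.

```julia
@assert map(c -> c + 1, "abc") == "bcd"
@assert filter(isletter, "a1b2") == "ab"

# Returning a String instead of a character is rejected.
err = try
    map(c -> string(c), "abc")
catch e
    e
end
@assert err isa ArgumentError

# Mapping over the collected characters lifts that restriction.
@assert map(c -> string(c), collect("abc")) == ["a", "b", "c"]
```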
641

642
## string first and last ##
643

644
"""
645
    first(s::AbstractString, n::Integer)
646

647
Get a string consisting of the first `n` characters of `s`.
648

649
# Examples
650
```jldoctest
651
julia> first("∀ϵ≠0: ϵ²>0", 0)
652
""
653

654
julia> first("∀ϵ≠0: ϵ²>0", 1)
655
"∀"
656

657
julia> first("∀ϵ≠0: ϵ²>0", 3)
658
"∀ϵ≠"
659
```
660
"""
661
first(s::AbstractString, n::Integer) = @inbounds s[1:min(end, nextind(s, 0, n))]
26✔
662

663
"""
664
    last(s::AbstractString, n::Integer)
665

666
Get a string consisting of the last `n` characters of `s`.
667

668
# Examples
669
```jldoctest
670
julia> last("∀ϵ≠0: ϵ²>0", 0)
671
""
672

673
julia> last("∀ϵ≠0: ϵ²>0", 1)
674
"0"
675

676
julia> last("∀ϵ≠0: ϵ²>0", 3)
677
"²>0"
678
```
679
"""
680
last(s::AbstractString, n::Integer) = @inbounds s[max(1, prevind(s, ncodeunits(s)+1, n)):end]
8✔
681

682
"""
683
    reverseind(v, i)
684

685
Given an index `i` in [`reverse(v)`](@ref), return the corresponding index in
686
`v` so that `v[reverseind(v,i)] == reverse(v)[i]`. (This can be nontrivial in
687
cases where `v` contains non-ASCII characters.)
688

689
# Examples
690
```jldoctest
691
julia> s = "Julia🚀"
692
"Julia🚀"
693

694
julia> r = reverse(s)
695
"🚀ailuJ"
696

697
julia> for i in eachindex(s)
698
           print(r[reverseind(r, i)])
699
       end
700
Julia🚀
701
```
702
"""
703
reverseind(s::AbstractString, i::Integer) = thisind(s, ncodeunits(s)-i+1)
767✔
704

705
"""
706
    repeat(s::AbstractString, r::Integer)
707

708
Repeat a string `r` times. This can be written as `s^r`.
709

710
See also [`^`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).
711

712
# Examples
713
```jldoctest
714
julia> repeat("ha", 3)
715
"hahaha"
716
```
717
"""
718
repeat(s::AbstractString, r::Integer) = repeat(String(s), r)
5✔
719

720
"""
721
    ^(s::Union{AbstractString,AbstractChar}, n::Integer) -> AbstractString
722

723
Repeat a string or character `n` times. This can also be written as `repeat(s, n)`.
724

725
See also [`repeat`](@ref).
726

727
# Examples
728
```jldoctest
729
julia> "Test "^3
730
"Test Test Test "
731
```
732
"""
733
(^)(s::Union{AbstractString,AbstractChar}, r::Integer) = repeat(s, r)
760,303✔
734

735
# reverse-order iteration for strings and indices thereof
736
iterate(r::Iterators.Reverse{<:AbstractString}, i=lastindex(r.itr)) = i < firstindex(r.itr) ? nothing : (r.itr[i], prevind(r.itr, i))
578,639✔
737
iterate(r::Iterators.Reverse{<:EachStringIndex}, i=lastindex(r.itr.s)) = i < firstindex(r.itr.s) ? nothing : (i, prevind(r.itr.s, i))
593,319✔
738

739
## code unit access ##
740

741
"""
742
    CodeUnits(s::AbstractString)
743

744
Wrap a string (without copying) in an immutable vector-like object that accesses the code units
745
of the string's representation.
746
"""
747
struct CodeUnits{T,S<:AbstractString} <: DenseVector{T}
748
    s::S
749
    CodeUnits(s::S) where {S<:AbstractString} = new{codeunit(s),S}(s)
55,696✔
750
end
751

752
length(s::CodeUnits) = ncodeunits(s.s)
868,179✔
753
sizeof(s::CodeUnits{T}) where {T} = ncodeunits(s.s) * sizeof(T)
61✔
754
size(s::CodeUnits) = (length(s),)
1,425✔
755
elsize(s::Type{<:CodeUnits{T}}) where {T} = sizeof(T)
3✔
756
@propagate_inbounds getindex(s::CodeUnits, i::Int) = codeunit(s.s, i)
4,226,117✔
757
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
1✔
758
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing
794,734✔
759

760

761
write(io::IO, s::CodeUnits) = write(io, s.s)
1✔
762

763
unsafe_convert(::Type{Ptr{T}},    s::CodeUnits{T}) where {T} = unsafe_convert(Ptr{T}, s.s)
48✔
764
unsafe_convert(::Type{Ptr{Int8}}, s::CodeUnits{UInt8}) = unsafe_convert(Ptr{Int8}, s.s)
1✔
765

766
"""
767
    codeunits(s::AbstractString)
768

769
Obtain a vector-like object containing the code units of a string.
770
Returns a `CodeUnits` wrapper by default, but `codeunits` may optionally be defined
771
for new string types if necessary.
772

773
# Examples
774
```jldoctest
775
julia> codeunits("Juλia")
776
6-element Base.CodeUnits{UInt8, String}:
777
 0x4a
778
 0x75
779
 0xce
780
 0xbb
781
 0x69
782
 0x61
783
```
784
"""
785
codeunits(s::AbstractString) = CodeUnits(s)
55,802✔
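
Since the wrapper behaves like a read-only vector, ordinary array functions apply to it. A small sketch reusing the string from the docstring example:

```julia
cu = codeunits("Juλia")

@assert cu isa AbstractVector{UInt8}
@assert length(cu) == 6
@assert cu[3] == 0xce && cu[4] == 0xbb     # the two code units encoding λ
@assert collect(cu) == UInt8[0x4a, 0x75, 0xce, 0xbb, 0x69, 0x61]
@assert count(b -> b ≥ 0x80, cu) == 2      # code units outside the ASCII range
```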
786

787
function _split_rest(s::AbstractString, n::Int)
1✔
788
    lastind = lastindex(s)
1✔
789
    i = try
1✔
790
        prevind(s, lastind, n)
1✔
791
    catch e
792
        e isa BoundsError || rethrow()
×
793
        _check_length_split_rest(length(s), n)
1✔
794
    end
795
    last_n = SubString(s, nextind(s, i), lastind)
1✔
796
    front = s[begin:i]
1✔
797
    return front, last_n
1✔
798
end