JuliaLang / julia / #37662

25 Oct 2023 07:08AM UTC coverage: 87.999% (+1.5%) from 86.476%
#37662

push

local

web-flow
Improve efficiency of minimum/maximum(::Diagonal) (#30236)

```
julia> DM = Diagonal(rand(100));

julia> @btime minimum($DM);    # before
  27.987 μs (0 allocations: 0 bytes)

julia> @btime minimum($DM);    # after
  246.091 ns (0 allocations: 0 bytes)
```

8 of 8 new or added lines in 1 file covered. (100.0%)

74065 of 84166 relevant lines covered (88.0%)

12219725.89 hits per line

96.11
/base/strings/basic.jl
# This file is a part of Julia. License is MIT: https://julialang.org/license

"""
The `AbstractString` type is the supertype of all string implementations in
Julia. Strings are encodings of sequences of [Unicode](https://unicode.org/)
code points as represented by the `AbstractChar` type. Julia makes a few assumptions
about strings:

* Strings are encoded in terms of fixed-size "code units"
  * Code units can be extracted with `codeunit(s, i)`
  * The first code unit has index `1`
  * The last code unit has index `ncodeunits(s)`
  * Any index `i` such that `1 ≤ i ≤ ncodeunits(s)` is in bounds
* String indexing is done in terms of these code units:
  * Characters are extracted by `s[i]` with a valid string index `i`
  * Each `AbstractChar` in a string is encoded by one or more code units
  * Only the index of the first code unit of an `AbstractChar` is a valid index
  * The encoding of an `AbstractChar` is independent of what precedes or follows it
  * String encodings are [self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code) – i.e. `isvalid(s, i)` is O(1)

Some string functions that extract code units, characters or substrings from
strings error if you pass them out-of-bounds or invalid string indices. This
includes `codeunit(s, i)` and `s[i]`. Functions that do string
index arithmetic take a more relaxed approach to indexing and give you the
closest valid string index when in-bounds, or when out-of-bounds, behave as if
there were an infinite number of characters padding each side of the string.
Usually these imaginary padding characters have code unit length `1` but string
types may choose different "imaginary" character sizes as makes sense for their
implementations (e.g. substrings may pass index arithmetic through to the
underlying string they provide a view into). Relaxed indexing functions include
those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
model allows index arithmetic to work with out-of-bounds indices as
intermediate values so long as one never uses them to retrieve a character,
which often helps avoid needing to code around edge cases.
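
The relaxed indexing model can be illustrated with a short sketch. The helper
`reverse_chars` below is hypothetical (it is not part of Base) and relies only on
the documented behavior of `ncodeunits`, `prevind` and `s[i]`. It starts from the
out-of-bounds index `ncodeunits(s) + 1` and never indexes with an invalid value:

```julia
# `reverse_chars` is a hypothetical helper for illustration, not a Base function.
function reverse_chars(s::AbstractString)
    chars = Char[]
    i = ncodeunits(s) + 1      # one past the end: out of bounds, but accepted by `prevind`
    while true
        i = prevind(s, i)      # start of the previous character
        i < 1 && break         # `prevind(s, 1)` returns 0, which ends the walk
        push!(chars, s[i])     # only valid, in-bounds indices are used to index
    end
    return chars
end

reverse_chars("αβγ")           # the characters 'γ', 'β', 'α'
```

No special case is needed for the empty string or for the last character, because
the out-of-bounds values `0` and `ncodeunits(s) + 1` only appear as intermediate indices.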

See also [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref),
[`nextind`](@ref), [`prevind`](@ref).
"""
AbstractString

## required string functions ##

"""
    ncodeunits(s::AbstractString) -> Int

Return the number of code units in a string. Indices that are in bounds to
access this string must satisfy `1 ≤ i ≤ ncodeunits(s)`. Not all such indices
are valid – they may not be the start of a character, but they will return a
code unit value when calling `codeunit(s,i)`.

# Examples
```jldoctest
julia> ncodeunits("The Julia Language")
18

julia> ncodeunits("∫eˣ")
6

julia> ncodeunits('∫'), ncodeunits('e'), ncodeunits('ˣ')
(3, 1, 2)
```

See also [`codeunit`](@ref), [`checkbounds`](@ref), [`sizeof`](@ref),
[`length`](@ref), [`lastindex`](@ref).
"""
ncodeunits(s::AbstractString)

"""
    codeunit(s::AbstractString) -> Type{<:Union{UInt8, UInt16, UInt32}}

Return the code unit type of the given string object. For ASCII, Latin-1, or
UTF-8 encoded strings, this would be `UInt8`; for UCS-2 and UTF-16 it would be
`UInt16`; for UTF-32 it would be `UInt32`. The code unit type need not be
limited to these three types, but it's hard to think of widely used string
encodings that don't use one of these units. `codeunit(s)` is the same as
`typeof(codeunit(s,1))` when `s` is a non-empty string.
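
As a brief illustration, here is a sketch that covers only the built-in UTF-8
`String` type; other string types may report a different code unit type:

```julia
s = "Juλia"

codeunit(s)                              # UInt8 for the built-in `String`
codeunit(s) === typeof(codeunit(s, 1))   # true, because `s` is non-empty
sizeof(codeunit(s))                      # 1 byte per code unit
```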

See also [`ncodeunits`](@ref).
"""
codeunit(s::AbstractString)

const CodeunitType = Union{Type{UInt8},Type{UInt16},Type{UInt32}}

"""
    codeunit(s::AbstractString, i::Integer) -> Union{UInt8, UInt16, UInt32}

Return the code unit value in the string `s` at index `i`. Note that

    codeunit(s, i) :: codeunit(s)

I.e. the value returned by `codeunit(s, i)` is of the type returned by
`codeunit(s)`.

# Examples
```jldoctest
julia> a = codeunit("Hello", 2)
0x65

julia> typeof(a)
UInt8
```

See also [`ncodeunits`](@ref), [`checkbounds`](@ref).
"""
@propagate_inbounds codeunit(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(codeunit, (s, i))) : codeunit(s, Int(i))

"""
    isvalid(s::AbstractString, i::Integer) -> Bool

Predicate indicating whether the given index is the start of the encoding of a
character in `s` or not. If `isvalid(s, i)` is true then `s[i]` will return the
character whose encoding starts at that index; if it's false, then `s[i]` will
raise an invalid index error or a bounds error depending on whether `i` is in bounds.
In order for `isvalid(s, i)` to be an O(1) function, the encoding of `s` must be
[self-synchronizing](https://en.wikipedia.org/wiki/Self-synchronizing_code). This
is a basic assumption of Julia's generic string support.

See also [`getindex`](@ref), [`iterate`](@ref), [`thisind`](@ref),
[`nextind`](@ref), [`prevind`](@ref), [`length`](@ref).

# Examples
```jldoctest
julia> str = "αβγdef";

julia> isvalid(str, 1)
true

julia> str[1]
'α': Unicode U+03B1 (category Ll: Letter, lowercase)

julia> isvalid(str, 2)
false

julia> str[2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'β'
Stacktrace:
[...]
```
"""
@propagate_inbounds isvalid(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(isvalid, (s, i))) : isvalid(s, Int(i))

"""
    iterate(s::AbstractString, i::Integer) -> Union{Tuple{<:AbstractChar, Int}, Nothing}

Return a tuple of the character in `s` at index `i` with the index of the start
of the following character in `s`. This is the key method that allows strings to
be iterated, yielding a sequence of characters. If `i` is out of bounds in `s`
then a bounds error is raised. The `iterate` function, as part of the iteration
protocol, may assume that `i` is the start of a character in `s`.
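
The following sketch shows how this method can drive a manual loop. The helper
`collect_with_indices` is a hypothetical name used only for illustration, and the
loop relies on `iterate` returning `nothing` once the index passes the last
character, as the built-in `String` type does:

```julia
# `collect_with_indices` is a hypothetical helper, not part of Base.
function collect_with_indices(s::AbstractString)
    out = Tuple{Int,Char}[]
    i = firstindex(s)
    next = iterate(s, i)
    while next !== nothing
        (c, j) = next          # `c` is the character at `i`, `j` the next start index
        push!(out, (i, c))
        i = j
        next = iterate(s, i)
    end
    return out
end

collect_with_indices("αβc")    # pairs (1, 'α'), (3, 'β'), (5, 'c')
```

This is essentially what a `for c in s` loop does, except that the start index of
each character is kept as well.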

See also [`getindex`](@ref), [`checkbounds`](@ref).
"""
@propagate_inbounds iterate(s::AbstractString, i::Integer) = i isa Int ?
    throw(MethodError(iterate, (s, i))) : iterate(s, Int(i))

## basic generic definitions ##

eltype(::Type{<:AbstractString}) = Char # some string types may use another AbstractChar

"""
    sizeof(str::AbstractString)

Size, in bytes, of the string `str`. Equal to the number of code units in `str` multiplied by
the size, in bytes, of one code unit in `str`.

# Examples
```jldoctest
julia> sizeof("")
0

julia> sizeof("∀")
3
```
"""
sizeof(s::AbstractString) = ncodeunits(s)::Int * sizeof(codeunit(s)::CodeunitType)
firstindex(s::AbstractString) = 1
lastindex(s::AbstractString) = thisind(s, ncodeunits(s)::Int)
isempty(s::AbstractString) = iszero(ncodeunits(s)::Int)

function getindex(s::AbstractString, i::Integer)
    @boundscheck checkbounds(s, i)
    @inbounds return isvalid(s, i) ? (iterate(s, i)::NTuple{2,Any})[1] : string_index_err(s, i)
end

getindex(s::AbstractString, i::Colon) = s
# TODO: handle other ranges with stride ±1 specially?
# TODO: add more @propagate_inbounds annotations?
getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
    sprint(io->(for i in v; write(io, s[i]) end), sizehint=length(v))
getindex(s::AbstractString, v::AbstractVector{Bool}) =
    throw(ArgumentError("logical indexing not supported for strings"))

function get(s::AbstractString, i::Integer, default)
    # TODO: use ternary once @inbounds is expression-like
    if checkbounds(Bool, s, i)
        @inbounds return s[i]
    else
        return default
    end
end

## bounds checking ##

checkbounds(::Type{Bool}, s::AbstractString, i::Integer) =
    1 ≤ i ≤ ncodeunits(s)::Int
checkbounds(::Type{Bool}, s::AbstractString, r::AbstractRange{<:Integer}) =
    isempty(r) || (1 ≤ minimum(r) && maximum(r) ≤ ncodeunits(s)::Int)
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Real}) =
    all(i -> checkbounds(Bool, s, i), I)
checkbounds(::Type{Bool}, s::AbstractString, I::AbstractArray{<:Integer}) =
    all(i -> checkbounds(Bool, s, i), I)
checkbounds(s::AbstractString, I::Union{Integer,AbstractArray}) =
    checkbounds(Bool, s, I) ? nothing : throw(BoundsError(s, I))

## construction, conversion, promotion ##

string() = ""
string(s::AbstractString) = s

Vector{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
Array{UInt8}(s::AbstractString) = unsafe_wrap(Vector{UInt8}, String(s))
Vector{T}(s::AbstractString) where {T<:AbstractChar} = collect(T, s)

Symbol(s::AbstractString) = Symbol(String(s))
Symbol(x...) = Symbol(string(x...))

convert(::Type{T}, s::T) where {T<:AbstractString} = s
convert(::Type{T}, s::AbstractString) where {T<:AbstractString} = T(s)::T

## summary ##

function summary(io::IO, s::AbstractString)
    prefix = isempty(s) ? "empty" : string(ncodeunits(s), "-codeunit")
    print(io, prefix, " ", typeof(s))
end

## string & character concatenation ##

"""
    *(s::Union{AbstractString, AbstractChar}, t::Union{AbstractString, AbstractChar}...) -> AbstractString

Concatenate strings and/or characters, producing a [`String`](@ref) or
[`AnnotatedString`](@ref) (as appropriate). This is equivalent to calling the
[`string`](@ref) or [`annotatedstring`](@ref) function on the arguments. Concatenation of built-in string
types always produces a value of type `String` but other string types may choose
to return a string of a different type as appropriate.

# Examples
```jldoctest
julia> "Hello " * "world"
"Hello world"

julia> 'j' * "ulia"
"julia"
```
"""
function (*)(s1::Union{AbstractChar, AbstractString}, ss::Union{AbstractChar, AbstractString}...)
    isannotated = s1 isa AnnotatedString || s1 isa AnnotatedChar ||
        any(s -> s isa AnnotatedString || s isa AnnotatedChar, ss)
    if isannotated
        annotatedstring(s1, ss...)
    else
        string(s1, ss...)
    end
end

one(::Union{T,Type{T}}) where {T<:AbstractString} = convert(T, "")

## generic string comparison ##

"""
    cmp(a::AbstractString, b::AbstractString) -> Int

Compare two strings. Return `0` if both strings have the same length and the character
at each index is the same in both strings. Return `-1` if `a` is a prefix of `b`, or if
`a` comes before `b` in alphabetical order. Return `1` if `b` is a prefix of `a`, or if
`b` comes before `a` in alphabetical order (technically, lexicographical order by Unicode
code points).

# Examples
```jldoctest
julia> cmp("abc", "abc")
0

julia> cmp("ab", "abc")
-1

julia> cmp("abc", "ab")
1

julia> cmp("ab", "ac")
-1

julia> cmp("ac", "ab")
1

julia> cmp("α", "a")
1

julia> cmp("b", "β")
-1
```
"""
function cmp(a::AbstractString, b::AbstractString)
    a === b && return 0
    (iv1, iv2) = (iterate(a), iterate(b))
    while iv1 !== nothing && iv2 !== nothing
        (c, d) = (first(iv1)::AbstractChar, first(iv2)::AbstractChar)
        c ≠ d && return ifelse(c < d, -1, 1)
        (iv1, iv2) = (iterate(a, last(iv1)), iterate(b, last(iv2)))
    end
    return iv1 === nothing ? (iv2 === nothing ? 0 : -1) : 1
end

"""
    ==(a::AbstractString, b::AbstractString) -> Bool

Test whether two strings are equal character by character (technically, Unicode
code point by code point). Should either string be an [`AnnotatedString`](@ref) the
string properties must match too.

# Examples
```jldoctest
julia> "abc" == "abc"
true

julia> "abc" == "αβγ"
false
```
"""
==(a::AbstractString, b::AbstractString) = cmp(a, b) == 0

"""
    isless(a::AbstractString, b::AbstractString) -> Bool

Test whether string `a` comes before string `b` in alphabetical order
(technically, in lexicographical order by Unicode code points).

# Examples
```jldoctest
julia> isless("a", "b")
true

julia> isless("β", "α")
false

julia> isless("a", "a")
false
```
"""
isless(a::AbstractString, b::AbstractString) = cmp(a, b) < 0

# faster comparisons for symbols

@assume_effects :total function cmp(a::Symbol, b::Symbol)
    Int(sign(ccall(:strcmp, Int32, (Cstring, Cstring), a, b)))
end

isless(a::Symbol, b::Symbol) = cmp(a, b) < 0

# hashing

hash(s::AbstractString, h::UInt) = hash(String(s), h)

## character index arithmetic ##

"""
    length(s::AbstractString) -> Int
    length(s::AbstractString, i::Integer, j::Integer) -> Int

Return the number of characters in string `s` from indices `i` through `j`.

This is computed as the number of code unit indices from `i` to `j` which are
valid character indices. With only a single string argument, this computes
the number of characters in the entire string. With `i` and `j` arguments it
computes the number of indices between `i` and `j` inclusive that are valid
indices in the string `s`. In addition to in-bounds values, `i` may take the
out-of-bounds value `ncodeunits(s) + 1` and `j` may take the out-of-bounds
value `0`.

!!! note
    The time complexity of this operation is linear in general. That is, it
    takes time proportional to the number of bytes or characters in
    the string because it counts them on the fly. This is in contrast to
    the method for arrays, which is a constant-time operation.

See also [`isvalid`](@ref), [`ncodeunits`](@ref), [`lastindex`](@ref),
[`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref).

# Examples
```jldoctest
julia> length("jμΛIα")
5
```
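
The three-argument form is not covered by the example above. The following is a
sketch using the same string, with the character start indices worked out by hand
for the built-in UTF-8 `String` type:

```julia
s = "jμΛIα"             # 'j' at 1, 'μ' at 2, 'Λ' at 4, 'I' at 6, 'α' at 7
n = ncodeunits(s)       # 8 code units

length(s, 1, n)         # 5, the same as length(s)
length(s, 2, 5)         # 2: only indices 2 and 4 start a character
length(s, n + 1, 0)     # 0: both out-of-bounds values are accepted
```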
"""
length(s::AbstractString) = @inbounds return length(s, 1, ncodeunits(s)::Int)

function length(s::AbstractString, i::Int, j::Int)
    @boundscheck begin
        0 < i ≤ ncodeunits(s)::Int+1 || throw(BoundsError(s, i))
        0 ≤ j < ncodeunits(s)::Int+1 || throw(BoundsError(s, j))
    end
    n = 0
    for k = i:j
        @inbounds n += isvalid(s, k)
    end
    return n
end

@propagate_inbounds length(s::AbstractString, i::Integer, j::Integer) =
    length(s, Int(i), Int(j))

"""
    thisind(s::AbstractString, i::Integer) -> Int

If `i` is in bounds in `s` return the index of the start of the character whose
encoding code unit `i` is part of. In other words, if `i` is the start of a
character, return `i`; if `i` is not the start of a character, rewind until the
start of a character and return that index. If `i` is equal to 0 or `ncodeunits(s)+1`
return `i`. In all other cases throw `BoundsError`.

# Examples
```jldoctest
julia> thisind("α", 0)
0

julia> thisind("α", 1)
1

julia> thisind("α", 2)
1

julia> thisind("α", 3)
3

julia> thisind("α", 4)
ERROR: BoundsError: attempt to access 2-codeunit String at index [4]
[...]

julia> thisind("α", -1)
ERROR: BoundsError: attempt to access 2-codeunit String at index [-1]
[...]
```
"""
thisind(s::AbstractString, i::Integer) = thisind(s, Int(i))

function thisind(s::AbstractString, i::Int)
    z = ncodeunits(s)::Int + 1
    i == z && return i
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
    @inbounds while 1 < i && !(isvalid(s, i)::Bool)
        i -= 1
    end
    return i
end

"""
    prevind(str::AbstractString, i::Integer, n::Integer=1) -> Int

* Case `n == 1`

  If `i` is in bounds in `str` return the index of the start of the character whose
  encoding starts before index `i`. In other words, if `i` is the start of a
  character, return the start of the previous character; if `i` is not the start
  of a character, rewind until the start of a character and return that index.
  If `i` is equal to `1` return `0`.
  If `i` is equal to `ncodeunits(str)+1` return `lastindex(str)`.
  Otherwise throw `BoundsError`.

* Case `n > 1`

  Behaves like applying `n` times `prevind` for `n==1`. The only difference
  is that if `n` is so large that applying `prevind` would reach `0` then each remaining
  iteration decreases the returned value by `1`.
  This means that in this case `prevind` can return a negative value.

* Case `n == 0`

  Return `i` only if `i` is a valid index in `str` or is equal to `ncodeunits(str)+1`.
  Otherwise `StringIndexError` or `BoundsError` is thrown.

# Examples
```jldoctest
julia> prevind("α", 3)
1

julia> prevind("α", 1)
0

julia> prevind("α", 0)
ERROR: BoundsError: attempt to access 2-codeunit String at index [0]
[...]

julia> prevind("α", 2, 2)
0

julia> prevind("α", 2, 3)
-1
```
"""
prevind(s::AbstractString, i::Integer, n::Integer) = prevind(s, Int(i), Int(n))
prevind(s::AbstractString, i::Integer)             = prevind(s, Int(i))
prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)

function prevind(s::AbstractString, i::Int, n::Int)
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
    z = ncodeunits(s) + 1
    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
    while n > 0 && 1 < i
        @inbounds n -= isvalid(s, i -= 1)
    end
    return i - n
end

"""
    nextind(str::AbstractString, i::Integer, n::Integer=1) -> Int

* Case `n == 1`

  If `i` is in bounds in `str` return the index of the start of the character whose
  encoding starts after index `i`. In other words, if `i` is the start of a
  character, return the start of the next character; if `i` is not the start
  of a character, move forward until the start of a character and return that index.
  If `i` is equal to `0` return `1`.
  If `i` is in bounds but greater than or equal to `lastindex(str)` return `ncodeunits(str)+1`.
  Otherwise throw `BoundsError`.

* Case `n > 1`

  Behaves like applying `n` times `nextind` for `n==1`. The only difference
  is that if `n` is so large that applying `nextind` would reach `ncodeunits(str)+1` then
  each remaining iteration increases the returned value by `1`. This means that in this
  case `nextind` can return a value greater than `ncodeunits(str)+1`.

* Case `n == 0`

  Return `i` only if `i` is a valid index in `str` or is equal to `0`.
  Otherwise `StringIndexError` or `BoundsError` is thrown.

# Examples
```jldoctest
julia> nextind("α", 0)
1

julia> nextind("α", 1)
3

julia> nextind("α", 3)
ERROR: BoundsError: attempt to access 2-codeunit String at index [3]
[...]

julia> nextind("α", 0, 2)
3

julia> nextind("α", 1, 2)
4
```
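
The `n == 0` case is not exercised by the examples above. A small sketch, assuming
the built-in `String` type, shows that it acts as an index validity check:

```julia
s = "α"                      # a single 2-byte character; the only valid index is 1

nextind(s, 0, 0)             # 0: explicitly allowed by the `n == 0` case
nextind(s, 1, 0)             # 1: a valid index is returned unchanged

# Index 2 is in bounds but is not the start of a character.
err = try
    nextind(s, 2, 0)
catch e
    e
end
err isa StringIndexError     # true
```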
"""
nextind(s::AbstractString, i::Integer, n::Integer) = nextind(s, Int(i), Int(n))
nextind(s::AbstractString, i::Integer)             = nextind(s, Int(i))
nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)

function nextind(s::AbstractString, i::Int, n::Int)
    n < 0 && throw(ArgumentError("n cannot be negative: $n"))
    z = ncodeunits(s)
    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
    n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
    while n > 0 && i < z
        @inbounds n -= isvalid(s, i += 1)
    end
    return i + n
end

## string index iteration type ##

struct EachStringIndex{T<:AbstractString}
    s::T
end
keys(s::AbstractString) = EachStringIndex(s)

length(e::EachStringIndex) = length(e.s)
first(::EachStringIndex) = 1
last(e::EachStringIndex) = lastindex(e.s)
iterate(e::EachStringIndex, state=firstindex(e.s)) = state > ncodeunits(e.s) ? nothing : (state, nextind(e.s, state))
eltype(::Type{<:EachStringIndex}) = Int

"""
    isascii(c::Union{AbstractChar,AbstractString}) -> Bool

Test whether a character belongs to the ASCII character set, or whether this is true for
all elements of a string.

# Examples
```jldoctest
julia> isascii('a')
true

julia> isascii('α')
false

julia> isascii("abc")
true

julia> isascii("αβγ")
false
```
For example, `isascii` can be used as a predicate function for [`filter`](@ref) or [`replace`](@ref)
to remove or replace non-ASCII characters, respectively:
```jldoctest
julia> filter(isascii, "abcdeγfgh") # discard non-ASCII chars
"abcdefgh"

julia> replace("abcdeγfgh", !isascii=>' ') # replace non-ASCII chars with spaces
"abcde fgh"
```
"""
isascii(c::Char) = bswap(reinterpret(UInt32, c)) < 0x80
isascii(s::AbstractString) = all(isascii, s)
isascii(c::AbstractChar) = UInt32(c) < 0x80

@inline function _isascii(code_units::AbstractVector{CU}, first, last) where {CU}
    r = zero(CU)
    for n = first:last
        @inbounds r |= code_units[n]
    end
    return 0 ≤ r < 0x80
end

# The chunking algorithm makes the last two chunks overlap in order to keep the size fixed
@inline function _isascii_chunks(chunk_size, cu::AbstractVector{CU}, first, last) where {CU}
    n = first
    while n <= last - chunk_size
        _isascii(cu, n, n+chunk_size-1) || return false
        n += chunk_size
    end
    return _isascii(cu, last-chunk_size+1, last)
end

"""
    isascii(cu::AbstractVector{CU}) where {CU <: Integer} -> Bool

Test whether all values in the vector belong to the ASCII character set (0x00 to 0x7f).
This function is intended to be used by other string implementations that need a fast ASCII check.
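
The idea behind such a check can be sketched independently of Base. The function
`or_reduce_isascii` below is a hypothetical name used only for illustration; it
assumes unsigned code units and omits the chunking that allows an early exit on
long inputs:

```julia
# `or_reduce_isascii` is a hypothetical, simplified sketch; it is not a Base function.
function or_reduce_isascii(units::AbstractVector{<:Unsigned})
    acc = zero(eltype(units))
    for u in units
        acc |= u               # any code unit ≥ 0x80 sets a high bit in `acc`
    end
    return acc < 0x80          # one comparison at the end instead of one per element
end

or_reduce_isascii(codeunits("hello"))   # true
or_reduce_isascii(codeunits("héllo"))   # false, 'é' is encoded with bytes ≥ 0x80
```

OR-ing all values and comparing once keeps the loop body free of branches, at the
cost of always scanning the whole range that is passed in.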
"""
function isascii(cu::AbstractVector{CU}) where {CU <: Integer}
    chunk_size = 1024
    chunk_threshold = chunk_size + (chunk_size ÷ 2)
    first = firstindex(cu);   last = lastindex(cu)
    l = last - first + 1
    l < chunk_threshold && return _isascii(cu, first, last)
    return _isascii_chunks(chunk_size, cu, first, last)
end

## string map, filter ##

function map(f, s::AbstractString)
    out = StringVector(max(4, sizeof(s)::Int÷sizeof(codeunit(s)::CodeunitType)))
    index = UInt(1)
    for c::AbstractChar in s
        c′ = f(c)
        isa(c′, AbstractChar) || throw(ArgumentError(
            "map(f, s::AbstractString) requires f to return AbstractChar; " *
            "try map(f, collect(s)) or a comprehension instead"))
        index + 3 > length(out) && resize!(out, unsigned(2 * length(out)))
        index += __unsafe_string!(out, convert(Char, c′), index)
    end
    resize!(out, index-1)
    sizehint!(out, index-1)
    return String(out)
end

function filter(f, s::AbstractString)
    out = IOBuffer(sizehint=sizeof(s))
    for c in s
        f(c) && write(out, c)
    end
    String(_unsafe_take!(out))
end

## string first and last ##

"""
    first(s::AbstractString, n::Integer)

Get a string consisting of the first `n` characters of `s`.

# Examples
```jldoctest
julia> first("∀ϵ≠0: ϵ²>0", 0)
""

julia> first("∀ϵ≠0: ϵ²>0", 1)
"∀"

julia> first("∀ϵ≠0: ϵ²>0", 3)
"∀ϵ≠"
```
"""
first(s::AbstractString, n::Integer) = @inbounds s[1:min(end, nextind(s, 0, n))]

"""
    last(s::AbstractString, n::Integer)

Get a string consisting of the last `n` characters of `s`.

# Examples
```jldoctest
julia> last("∀ϵ≠0: ϵ²>0", 0)
""

julia> last("∀ϵ≠0: ϵ²>0", 1)
"0"

julia> last("∀ϵ≠0: ϵ²>0", 3)
"²>0"
```
"""
last(s::AbstractString, n::Integer) = @inbounds s[max(1, prevind(s, ncodeunits(s)+1, n)):end]

"""
    reverseind(v, i)

Given an index `i` in [`reverse(v)`](@ref), return the corresponding index in
`v` so that `v[reverseind(v,i)] == reverse(v)[i]`. (This can be nontrivial in
cases where `v` contains non-ASCII characters.)

# Examples
```jldoctest
julia> s = "Julia🚀"
"Julia🚀"

julia> r = reverse(s)
"🚀ailuJ"

julia> for i in eachindex(s)
           print(r[reverseind(r, i)])
       end
Julia🚀
```
"""
reverseind(s::AbstractString, i::Integer) = thisind(s, ncodeunits(s)-i+1)

"""
    repeat(s::AbstractString, r::Integer)

Repeat a string `r` times. This can be written as `s^r`.

See also [`^`](@ref :^(::Union{AbstractString, AbstractChar}, ::Integer)).

# Examples
```jldoctest
julia> repeat("ha", 3)
"hahaha"
```
"""
repeat(s::AbstractString, r::Integer) = repeat(String(s), r)

"""
    ^(s::Union{AbstractString,AbstractChar}, n::Integer) -> AbstractString

Repeat a string or character `n` times. This can also be written as `repeat(s, n)`.

See also [`repeat`](@ref).

# Examples
```jldoctest
julia> "Test "^3
"Test Test Test "
```
"""
(^)(s::Union{AbstractString,AbstractChar}, r::Integer) = repeat(s, r)

# reverse-order iteration for strings and indices thereof
iterate(r::Iterators.Reverse{<:AbstractString}, i=lastindex(r.itr)) = i < firstindex(r.itr) ? nothing : (r.itr[i], prevind(r.itr, i))
iterate(r::Iterators.Reverse{<:EachStringIndex}, i=lastindex(r.itr.s)) = i < firstindex(r.itr.s) ? nothing : (i, prevind(r.itr.s, i))

## code unit access ##

"""
    CodeUnits(s::AbstractString)

Wrap a string (without copying) in an immutable vector-like object that accesses the code units
of the string's representation.
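
A short sketch of how the wrapper behaves, obtained here through the exported
[`codeunits`](@ref) function and shown for the built-in `String` type only:

```julia
s = "Juλia"
cu = codeunits(s)              # a `Base.CodeUnits{UInt8, String}` wrapping `s`

cu isa AbstractVector{UInt8}   # true
length(cu) == ncodeunits(s)    # true (6 code units)
cu[3] == codeunit(s, 3)        # true: indexing forwards to `codeunit`
bytes = collect(cu)            # an independent `Vector{UInt8}` copy of the code units
```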
"""
struct CodeUnits{T,S<:AbstractString} <: DenseVector{T}
    s::S
    CodeUnits(s::S) where {S<:AbstractString} = new{codeunit(s),S}(s)
end

length(s::CodeUnits) = ncodeunits(s.s)
sizeof(s::CodeUnits{T}) where {T} = ncodeunits(s.s) * sizeof(T)
size(s::CodeUnits) = (length(s),)
elsize(s::Type{<:CodeUnits{T}}) where {T} = sizeof(T)
@propagate_inbounds getindex(s::CodeUnits, i::Int) = codeunit(s.s, i)
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing


write(io::IO, s::CodeUnits) = write(io, s.s)

cconvert(::Type{Ptr{T}},    s::CodeUnits{T}) where {T} = cconvert(Ptr{T}, s.s)
cconvert(::Type{Ptr{Int8}}, s::CodeUnits{UInt8}) = cconvert(Ptr{Int8}, s.s)

"""
    codeunits(s::AbstractString)

Obtain a vector-like object containing the code units of a string.
Returns a `CodeUnits` wrapper by default, but `codeunits` may optionally be defined
for new string types if necessary.

# Examples
```jldoctest
julia> codeunits("Juλia")
6-element Base.CodeUnits{UInt8, String}:
 0x4a
 0x75
 0xce
 0xbb
 0x69
 0x61
```
"""
codeunits(s::AbstractString) = CodeUnits(s)

function _split_rest(s::AbstractString, n::Int)
    lastind = lastindex(s)
    i = try
        prevind(s, lastind, n)
    catch e
        e isa BoundsError || rethrow()
        _check_length_split_rest(length(s), n)
    end
    last_n = SubString(s, nextind(s, i), lastind)
    front = s[begin:i]
    return front, last_n
end