JuliaLang / julia / #38002

06 Feb 2025 06:14AM UTC coverage: 20.322% (-2.4%) from 22.722%
push

local

web-flow
bpart: Fully switch to partitioned semantics (#57253)

This is the final PR in the binding partitions series (modulo bugs and
tweaks), i.e. it closes #54654 and thus closes #40399, which was the
original design sketch.

This activates the full designed semantics for binding partitions, which
allows safe replacement of const bindings and, in particular, struct
redefinitions. It therefore closes timholy/Revise.jl#18 and also closes
#38584.

The biggest semantic change here is probably that this gets rid of the
notion of "resolvedness" of a binding. Previously, a lot of the behavior
of our implementation depended on when bindings were "resolved", which
could happen at basically an arbitrary point (in the compiler, in REPL
completion, in a different thread), making a lot of the semantics around
bindings ill-defined, or at least implementation-defined. There are several
related issues in the bugtracker, so this closes #14055, closes #44604,
closes #46354, and closes #30277.

It is also the last step to close #24569.
It also supports bindings for undef->defined transitions and thus closes
#53958 and closes #54733. However, this is not activated yet for
performance reasons and may need some further optimization.

Since resolvedness no longer exists, we need to replace it with some
hopefully more well-defined semantics. I will describe the semantics
below, but before I do I will make two notes:

1. There are a number of cases where these semantics will behave
slightly differently than the old semantics absent some other task going
around resolving random bindings.
2. The new behavior (except for the replacement stuff) was generally
permissible under the old semantics if the bindings happened to be
resolved at the right time.

With all that said, there are essentially three "strengths" of bindings:

1. Implicit Bindings: Anything implicitly obtained from `using Mod`, "no
binding", plus slightly more exotic corner cases around conflicts

2. Weakly declared bindin... (continued)

11 of 111 new or added lines in 7 files covered. (9.91%)

1273 existing lines in 68 files now uncovered.

9908 of 48755 relevant lines covered (20.32%)

105126.48 hits per line

Source File

16.72
/base/regex.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
## object-oriented Regex interface ##
4

5
include("pcre.jl")
6

7
const DEFAULT_COMPILER_OPTS = PCRE.UTF | PCRE.MATCH_INVALID_UTF | PCRE.ALT_BSUX | PCRE.UCP
8
const DEFAULT_MATCH_OPTS = PCRE.NO_UTF_CHECK
9

10
"""
11
    Regex(pattern[, flags]) <: AbstractPattern
12

13
A type representing a regular expression. `Regex` objects can be used to match strings
14
with [`match`](@ref).
15

16
`Regex` objects can be created using the [`@r_str`](@ref) string macro. The
17
`Regex(pattern[, flags])` constructor is usually used if the `pattern` string needs
18
to be interpolated. See the documentation of the string macro for details on flags.
19

20
!!! note
21
    To escape interpolated variables use `\\Q` and `\\E` (e.g. `Regex("\\\\Q\$x\\\\E")`)
22
"""
23
mutable struct Regex <: AbstractPattern
24
    pattern::String
25
    compile_options::UInt32
26
    match_options::UInt32
27
    regex::Ptr{Cvoid}
28

29
    function Regex(pattern::AbstractString, compile_options::Integer,
×
30
                   match_options::Integer)
31
        pattern = String(pattern)::String
×
32
        compile_options = UInt32(compile_options)
×
33
        match_options = UInt32(match_options)
×
34
        if (compile_options & ~PCRE.COMPILE_MASK) != 0
×
35
            throw(ArgumentError("invalid regex compile options: $compile_options"))
×
36
        end
37
        if (match_options & ~PCRE.EXECUTE_MASK) !=0
×
38
            throw(ArgumentError("invalid regex match options: $match_options"))
×
39
        end
40
        re = compile(new(pattern, compile_options, match_options, C_NULL))
×
41
        finalizer(re) do re
×
42
            re.regex == C_NULL || PCRE.free_re(re.regex)
×
43
        end
44
        re
×
45
    end
46
end
47

48
function Regex(pattern::AbstractString, flags::AbstractString)
×
49
    compile_options = DEFAULT_COMPILER_OPTS
×
50
    match_options = DEFAULT_MATCH_OPTS
×
51
    for f in flags
×
52
        if f == 'a'
×
53
            # instruct pcre2 to treat the strings as simple bytes (aka "ASCII"), not char encodings
54
            compile_options &= ~PCRE.UCP  # user can re-enable with (*UCP)
×
55
            compile_options &= ~PCRE.UTF # user can re-enable with (*UTF)
×
56
            compile_options &= ~PCRE.MATCH_INVALID_UTF # this would force on UTF
×
57
            match_options &= ~PCRE.NO_UTF_CHECK # if the user did force on UTF, we should check it for safety
×
58
        else
59
            compile_options |= f=='i' ? PCRE.CASELESS  :
×
60
                               f=='m' ? PCRE.MULTILINE :
61
                               f=='s' ? PCRE.DOTALL    :
62
                               f=='x' ? PCRE.EXTENDED  :
63
                               throw(ArgumentError("unknown regex flag: $f"))
64
        end
65
    end
×
66
    Regex(pattern, compile_options, match_options)
×
67
end
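
As a small usage sketch of the flag-string constructor above (the variable names are illustrative and not part of `base/regex.jl`), the following builds a pattern by interpolation, which the `r"..."` macro does not allow, and checks that an unknown flag character is rejected:

```julia
# Build a regex from an interpolated string; "i" requests case-insensitive matching.
word = "julia"
re = Regex("^$(word)\$", "i")

matches_upper = occursin(re, "JULIA")   # the caseless flag applies
same_as_macro = re == r"^julia$"i       # same pattern and options as the macro form

# Any flag character other than i, m, s, x, or a throws an ArgumentError.
rejected = try
    Regex("abc", "q")
    false
catch err
    err isa ArgumentError
end
```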
68
Regex(pattern::AbstractString) = Regex(pattern, DEFAULT_COMPILER_OPTS, DEFAULT_MATCH_OPTS)
×
69

70
function compile(regex::Regex)
320✔
71
    if regex.regex == C_NULL
320✔
72
        if !isdefinedglobal(PCRE, :PCRE_COMPILE_LOCK)
3✔
73
            regex.regex = PCRE.compile(regex.pattern, regex.compile_options)
×
74
            PCRE.jit_compile(regex.regex)
×
75
        else
76
            l = PCRE.PCRE_COMPILE_LOCK
3✔
77
            lock(l)
3✔
78
            try
3✔
79
                if regex.regex == C_NULL
3✔
80
                    regex.regex = PCRE.compile(regex.pattern, regex.compile_options)
3✔
81
                    PCRE.jit_compile(regex.regex)
3✔
82
                end
83
            finally
84
                unlock(l)
3✔
85
            end
86
        end
87
    end
88
    regex
×
89
end
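
The `compile` function above checks `regex.regex == C_NULL` once without the lock and again after taking it, so that only one task performs the compilation. A minimal, self-contained sketch of that check-lock-recheck pattern is shown below; `LazyResource`, `INIT_LOCK`, and `ensure_init!` are hypothetical names for illustration and do not exist in Base:

```julia
# Hypothetical lazily initialized resource standing in for the compiled regex pointer.
mutable struct LazyResource
    handle::Union{Nothing,Vector{UInt8}}
end

const INIT_LOCK = ReentrantLock()

function ensure_init!(r::LazyResource)
    if r.handle === nothing              # cheap check without the lock
        lock(INIT_LOCK)
        try
            if r.handle === nothing      # re-check: another task may have initialized it
                r.handle = zeros(UInt8, 4)
            end
        finally
            unlock(INIT_LOCK)            # always release, even if initialization throws
        end
    end
    return r
end

res = ensure_init!(LazyResource(nothing))
again = ensure_init!(res)                # second call leaves the handle untouched
```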
90

91
"""
92
    @r_str -> Regex
93

94
Construct a regex, such as `r"^[a-z]*\$"`, without interpolation and unescaping (except for
95
quotation mark `"` which still has to be escaped). The regex also accepts one or more flags,
96
listed after the ending quote, to change its behaviour:
97

98
- `i` enables case-insensitive matching
99
- `m` treats the `^` and `\$` tokens as matching the start and end of individual lines, as
100
  opposed to the whole string.
101
- `s` allows the `.` modifier to match newlines.
102
- `x` enables "free-spacing mode": whitespace between regex tokens is ignored except when escaped with `\\`,
103
   and `#` in the regex is treated as starting a comment (which is ignored to the line ending).
104
- `a` enables ASCII mode (disables `UTF` and `UCP` modes). By default `\\B`, `\\b`, `\\D`,
105
  `\\d`, `\\S`, `\\s`, `\\W`, `\\w`, etc. match based on Unicode character properties. With
106
  this option, these sequences only match ASCII characters. This includes `\\u` also, which
107
  will emit the specified character value directly as a single byte, and not attempt to
108
  encode it into UTF-8. Importantly, this option allows matching against invalid UTF-8
109
  strings, by treating both matcher and target as simple bytes (as if they were ISO/IEC
110
  8859-1 / Latin-1 bytes) instead of as character encodings. In this case, this option is
111
  often combined with `s`. This option can be further refined by starting the pattern with
112
  (*UCP) or (*UTF).
113

114
See [`Regex`](@ref) if interpolation is needed.
115

116
# Examples
117
```jldoctest
118
julia> match(r"a+.*b+.*?d\$"ism, "Goodbye,\\nOh, angry,\\nBad world\\n")
119
RegexMatch("angry,\\nBad world")
120
```
121
This regex has the first three flags enabled.
122
"""
123
macro r_str(pattern, flags...) Regex(pattern, flags...) end
12✔
124

UNCOV
125
function show(io::IO, re::Regex)
×
UNCOV
126
    imsx = PCRE.CASELESS|PCRE.MULTILINE|PCRE.DOTALL|PCRE.EXTENDED
×
UNCOV
127
    ac = PCRE.UTF|PCRE.MATCH_INVALID_UTF|PCRE.UCP
×
UNCOV
128
    am = PCRE.NO_UTF_CHECK
×
UNCOV
129
    opts = re.compile_options
×
UNCOV
130
    mopts = re.match_options
×
UNCOV
131
    default = ((opts & ~imsx) | ac) == DEFAULT_COMPILER_OPTS
×
UNCOV
132
    if default
×
UNCOV
133
       if (opts & ac) == ac
×
UNCOV
134
           default = mopts == DEFAULT_MATCH_OPTS
×
135
       elseif (opts & ac) == 0
×
136
           default = mopts == (DEFAULT_MATCH_OPTS & ~am)
×
137
       else
138
           default = false
×
139
       end
140
   end
UNCOV
141
    if default
×
UNCOV
142
        print(io, "r\"")
×
UNCOV
143
        escape_raw_string(io, re.pattern)
×
UNCOV
144
        print(io, "\"")
×
UNCOV
145
        if (opts & PCRE.CASELESS ) != 0; print(io, "i"); end
×
UNCOV
146
        if (opts & PCRE.MULTILINE) != 0; print(io, "m"); end
×
UNCOV
147
        if (opts & PCRE.DOTALL   ) != 0; print(io, "s"); end
×
UNCOV
148
        if (opts & PCRE.EXTENDED ) != 0; print(io, "x"); end
×
UNCOV
149
        if (opts & ac            ) == 0; print(io, "a"); end
×
150
    else
151
        print(io, "Regex(")
×
152
        show(io, re.pattern)
×
153
        print(io, ", ")
×
154
        show(io, opts)
×
155
        print(io, ", ")
×
156
        show(io, mopts)
×
157
        print(io, ")")
×
158
    end
159
end
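
A brief sketch of what the `show` method above produces when the options correspond to flag letters: the regex is printed in `r"..."` form with the flags emitted in the fixed order `i`, `m`, `s`, `x`, regardless of the order in which they were written. The variable names are illustrative only.

```julia
# Flags that map onto flag letters round-trip through the r"..." form.
shown_caseless = repr(r"abc"i)

# Flags written as "sm" are printed back in the canonical order "ms".
shown_reordered = repr(r"abc"sm)
```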
160

161
"""
162
`AbstractMatch` objects are used to represent information about matches found
163
in a string using an `AbstractPattern`.
164
"""
165
abstract type AbstractMatch end
166

167
"""
168
    RegexMatch <: AbstractMatch
169

170
A type representing a single match to a [`Regex`](@ref) found in a string.
171
Typically created from the [`match`](@ref) function.
172

173
The `match` field stores the substring of the entire matched string.
174
The `captures` field stores the substrings for each capture group, indexed by number.
175
To index by capture group name, the entire match object should be indexed instead,
176
as shown in the examples.
177
The location of the start of the match is stored in the `offset` field.
178
The `offsets` field stores the locations of the start of each capture group,
179
with 0 denoting a group that was not captured.
180

181
This type can be used as an iterator over the capture groups of the `Regex`,
182
yielding the substrings captured in each group.
183
Because of this, the captures of a match can be destructured.
184
If a group was not captured, `nothing` will be yielded instead of a substring.
185

186
Methods that accept a `RegexMatch` object are defined for [`iterate`](@ref),
187
[`length`](@ref), [`eltype`](@ref), [`keys`](@ref keys(::RegexMatch)), [`haskey`](@ref), and
188
[`getindex`](@ref), where keys are the names or numbers of a capture group.
189
See [`keys`](@ref keys(::RegexMatch)) for more information.
190

191
`Tuple(m)`, `NamedTuple(m)`, and `Dict(m)` can be used to construct more flexible collection types from `RegexMatch` objects.
192

193
!!! compat "Julia 1.11"
194
    Constructing NamedTuples and Dicts from RegexMatches requires Julia 1.11
195

196
# Examples
197
```jldoctest
198
julia> m = match(r"(?<hour>\\d+):(?<minute>\\d+)(am|pm)?", "11:30 in the morning")
199
RegexMatch("11:30", hour="11", minute="30", 3=nothing)
200

201
julia> m.match
202
"11:30"
203

204
julia> m.captures
205
3-element Vector{Union{Nothing, SubString{String}}}:
206
 "11"
207
 "30"
208
 nothing
209

210

211
julia> m["minute"]
212
"30"
213

214
julia> hr, min, ampm = m; # destructure capture groups by iteration
215

216
julia> hr
217
"11"
218

219
julia> Dict(m)
220
Dict{Any, Union{Nothing, SubString{String}}} with 3 entries:
221
  "hour"   => "11"
222
  3        => nothing
223
  "minute" => "30"
224
```
225
"""
226
struct RegexMatch{S<:AbstractString} <: AbstractMatch
227
    match::SubString{S}
17✔
228
    captures::Vector{Union{Nothing,SubString{S}}}
229
    offset::Int
230
    offsets::Vector{Int}
231
    regex::Regex
232
end
233

234
RegexMatch(match::SubString{S}, captures::Vector{Union{Nothing,SubString{S}}},
×
235
           offset::Union{Int, UInt}, offsets::Vector{Int}, regex::Regex) where {S<:AbstractString} =
17✔
236
    RegexMatch{S}(match, captures, offset, offsets, regex)
237

238
"""
239
    keys(m::RegexMatch) -> Vector
240

241
Return a vector of keys for all capture groups of the underlying regex.
242
A key is included even if the capture group fails to match.
243
That is, `idx` will be in the return value even if `m[idx] == nothing`.
244

245
Unnamed capture groups will have integer keys corresponding to their index.
246
Named capture groups will have string keys.
247

248
!!! compat "Julia 1.7"
249
    This method was added in Julia 1.7
250

251
# Examples
252
```jldoctest
253
julia> keys(match(r"(?<hour>\\d+):(?<minute>\\d+)(am|pm)?", "11:30"))
254
3-element Vector{Any}:
255
  "hour"
256
  "minute"
257
 3
258
```
259
"""
260
function keys(m::RegexMatch)
×
261
    idx_to_capture_name = PCRE.capture_names(m.regex.regex)
×
262
    return map(eachindex(m.captures)) do i
×
263
        # If the capture group is named, return its name, else return its index
264
        get(idx_to_capture_name, i, i)
×
265
    end
266
end
267

268
function show(io::IO, m::RegexMatch)
×
269
    print(io, "RegexMatch(")
×
270
    show(io, m.match)
×
271
    capture_keys = keys(m)
×
272
    if !isempty(capture_keys)
×
273
        print(io, ", ")
×
274
        for (i, capture_name) in enumerate(capture_keys)
×
275
            print(io, capture_name, "=")
×
276
            show(io, m.captures[i])
×
277
            if i < length(m)
×
278
                print(io, ", ")
×
279
            end
280
        end
×
281
    end
282
    print(io, ")")
×
283
end
284

285
# Capture group extraction
286
getindex(m::RegexMatch, idx::Integer) = m.captures[idx]
122✔
287
function getindex(m::RegexMatch, name::Union{AbstractString,Symbol})
288
    idx = PCRE.substring_number_from_name(m.regex.regex, name)
×
289
    idx <= 0 && error("no capture group named $name found in regex")
×
290
    m[idx]
×
291
end
292

293
haskey(m::RegexMatch, idx::Integer) = idx in eachindex(m.captures)
×
294
function haskey(m::RegexMatch, name::Union{AbstractString,Symbol})
×
295
    idx = PCRE.substring_number_from_name(m.regex.regex, name)
×
296
    return idx > 0
×
297
end
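
The following sketch exercises `keys`, `getindex`, `haskey`, and iteration on a `RegexMatch`, reusing the pattern from the docstring examples above. The variable names are illustrative; `keys(::RegexMatch)` assumes Julia 1.7 or later, as the compat note states.

```julia
m = match(r"(?<hour>\d+):(?<minute>\d+)(am|pm)?", "11:30 in the morning")

capture_keys = keys(m)           # named groups by name, unnamed groups by index
minute = m["minute"]             # look up a named group
third = m[3]                     # the optional (am|pm) group did not participate
has_hour = haskey(m, "hour")
has_second = haskey(m, "second") # no group with this name
has_fourth = haskey(m, 4)        # only three capture groups exist

# Iteration yields the captures in order, so a match can be destructured.
hr, mn, ampm = m
```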
298

299
iterate(m::RegexMatch, args...) = iterate(m.captures, args...)
×
300
length(m::RegexMatch) = length(m.captures)
×
301
eltype(m::RegexMatch) = eltype(m.captures)
×
302

303
NamedTuple(m::RegexMatch) = NamedTuple{Symbol.(Tuple(keys(m)))}(values(m))
×
304
Dict(m::RegexMatch) = Dict(pairs(m))
×
305

306
function occursin(r::Regex, s::AbstractString; offset::Integer=0)
254,969✔
307
    compile(r)
254,969✔
308
    return PCRE.exec_r(r.regex, String(s), offset, r.match_options)
254,969✔
309
end
310

311
function occursin(r::Regex, s::SubString{String}; offset::Integer=0)
646✔
312
    compile(r)
646✔
313
    return PCRE.exec_r(r.regex, s, offset, r.match_options)
646✔
314
end
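
Reading the two `occursin` methods above, the `offset` keyword is handed directly to PCRE, so it is assumed here to be a zero-based byte offset at which the search begins. A short sketch under that assumption, with illustrative variable names:

```julia
s = "ab"

found_from_start = occursin(r"a", s)             # search starts at byte offset 0
found_after_skip = occursin(r"a", s; offset=1)   # first byte skipped, no "a" remains
found_b = occursin(r"b", s; offset=1)            # "b" is still in range
```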
315

316
"""
317
    startswith(s::AbstractString, prefix::Regex)
318

319
Return `true` if `s` starts with the regex pattern, `prefix`.
320

321
!!! note
322
    `startswith` does not compile the anchoring into the regular
323
    expression, but instead passes the anchoring as
324
    `match_option` to PCRE. If compile time is amortized,
325
    `occursin(r"^...", s)` is faster than `startswith(s, r"...")`.
326

327
See also [`occursin`](@ref) and [`endswith`](@ref).
328

329
!!! compat "Julia 1.2"
330
    This method requires at least Julia 1.2.
331

332
# Examples
333
```jldoctest
334
julia> startswith("JuliaLang", r"Julia|Romeo")
335
true
336
```
337
"""
338
function startswith(s::AbstractString, r::Regex)
1✔
339
    compile(r)
145✔
340
    return PCRE.exec_r(r.regex, String(s), 0, r.match_options | PCRE.ANCHORED)
145✔
341
end
342

343
function startswith(s::SubString{String}, r::Regex)
×
344
    compile(r)
×
345
    return PCRE.exec_r(r.regex, s, 0, r.match_options | PCRE.ANCHORED)
×
346
end
347

348
"""
349
    endswith(s::AbstractString, suffix::Regex)
350

351
Return `true` if `s` ends with the regex pattern, `suffix`.
352

353
!!! note
354
    `endswith` does not compile the anchoring into the regular
355
    expression, but instead passes the anchoring as
356
    `match_option` to PCRE. If compile time is amortized,
357
    `occursin(r"...\$", s)` is faster than `endswith(s, r"...")`.
358

359
See also [`occursin`](@ref) and [`startswith`](@ref).
360

361
!!! compat "Julia 1.2"
362
    This method requires at least Julia 1.2.
363

364
# Examples
365
```jldoctest
366
julia> endswith("JuliaLang", r"Lang|Roberts")
367
true
368
```
369
"""
370
function endswith(s::AbstractString, r::Regex)
×
371
    compile(r)
×
372
    return PCRE.exec_r(r.regex, String(s), 0, r.match_options | PCRE.ENDANCHORED)
×
373
end
374

375
function endswith(s::SubString{String}, r::Regex)
×
376
    compile(r)
×
377
    return PCRE.exec_r(r.regex, s, 0, r.match_options | PCRE.ENDANCHORED)
×
378
end
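
To illustrate the anchoring behaviour described in the `startswith` and `endswith` docstrings above, this sketch contrasts the anchored checks with an unanchored `occursin` on the same input. The variable names are illustrative.

```julia
s = "JuliaLang"

starts = startswith(s, r"Julia|Romeo")
starts_mid = startswith(s, r"Lang")     # "Lang" occurs, but not at the start
ends = endswith(s, r"Lang|Roberts")
ends_mid = endswith(s, r"Julia")        # "Julia" occurs, but not at the end
contains_lang = occursin(r"Lang", s)    # unanchored search finds it anywhere
```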
379

380
function chopprefix(s::AbstractString, prefix::Regex)
×
381
    m = match(prefix, s, firstindex(s), PCRE.ANCHORED)
×
382
    m === nothing && return SubString(s)
×
383
    return SubString(s, ncodeunits(m.match) + 1)
×
384
end
385

386
function chopsuffix(s::AbstractString, suffix::Regex)
×
387
    m = match(suffix, s, firstindex(s), PCRE.ENDANCHORED)
×
388
    m === nothing && return SubString(s)
×
389
    isempty(m.match) && return SubString(s)
×
390
    return SubString(s, firstindex(s), prevind(s, m.offset))
×
391
end
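
A usage sketch for the regex methods of `chopprefix` and `chopsuffix` defined above (these functions are assumed to be available, which requires Julia 1.8 or later). When the anchored match fails, the input is returned unchanged as a `SubString`. The variable names are illustrative.

```julia
s = "Hamburger"

no_prefix = chopprefix(s, r"[Hh]am")     # anchored at the start of the string
no_suffix = chopsuffix(s, r"[aeiou]r")   # must end exactly at the end of the string
unchanged = chopprefix(s, r"burger")     # "burger" is not a prefix, so nothing is removed
```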
392

393

394
"""
395
    match(r::Regex, s::AbstractString[, idx::Integer[, addopts]])
396

397
Search for the first match of the regular expression `r` in `s` and return a [`RegexMatch`](@ref)
398
object containing the match, or nothing if the match failed.
399
The optional `idx` argument specifies an index at which to start the search.
400
The matching substring can be retrieved by accessing `m.match`, and the captured sequences by accessing `m.captures`.
401
The resulting [`RegexMatch`](@ref) object can be used to construct other collections: e.g. `Tuple(m)`, `NamedTuple(m)`.
402

403
!!! compat "Julia 1.11"
404
    Constructing NamedTuples and Dicts requires Julia 1.11
405

406
# Examples
407
```jldoctest
408
julia> rx = r"a(.)a"
409
r"a(.)a"
410

411
julia> m = match(rx, "cabac")
412
RegexMatch("aba", 1="b")
413

414
julia> m.captures
415
1-element Vector{Union{Nothing, SubString{String}}}:
416
 "b"
417

418
julia> m.match
419
"aba"
420

421
julia> match(rx, "cabac", 3) === nothing
422
true
423
```
424
"""
425
function match end
426

427
function match(re::Regex, str::Union{SubString{String}, String}, idx::Integer,
19✔
428
               add_opts::UInt32=UInt32(0))
429
    compile(re)
6,302✔
430
    opts = re.match_options | add_opts
19✔
431
    matched, data = PCRE.exec_r_data(re.regex, str, idx-1, opts)
19✔
432
    if !matched
19✔
433
        PCRE.free_match_data(data)
2✔
434
        return nothing
2✔
435
    end
436
    n = div(PCRE.ovec_length(data), 2) - 1
17✔
437
    p = PCRE.ovec_ptr(data)
17✔
438
    mat = SubString(str, unsafe_load(p, 1)+1, prevind(str, unsafe_load(p, 2)+1))
17✔
439
    cap = Union{Nothing,SubString{String}}[unsafe_load(p,2i+1) == PCRE.UNSET ? nothing :
51✔
440
                                        SubString(str, unsafe_load(p,2i+1)+1,
441
                                                  prevind(str, unsafe_load(p,2i+2)+1)) for i=1:n]
442
    off = Int[ unsafe_load(p,2i+1)+1 for i=1:n ]
51✔
443
    result = RegexMatch(mat, cap, unsafe_load(p,1)+1, off, re)
17✔
444
    PCRE.free_match_data(data)
17✔
445
    return result
17✔
446
end
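
The method above fills `offset` and `offsets` from PCRE's output vector, converting to one-based indices. The sketch below shows the resulting fields for a small input, including the `0` entry that the `RegexMatch` docstring describes for a group that was not captured, and the effect of the optional start index. The variable names are illustrative.

```julia
s = "at 11:30"
m = match(r"(?<hour>\d+):(?<minute>\d+)(am|pm)?", s)

whole = m.match            # the matched substring
start = m.offset           # one-based index where the match begins
group_starts = m.offsets   # one-based start of each group, 0 if not captured

later = match(r"\d+", s, 6)   # start searching at index 6, past "11"
no_match = match(r":", s, 7)  # no ":" at or after index 7
```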
447

448
function _annotatedmatch(m::RegexMatch{S}, str::AnnotatedString{S}) where {S<:AbstractString}
×
449
    RegexMatch{AnnotatedString{S}}(
×
450
        (@inbounds SubString{AnnotatedString{S}}(
×
451
            str, m.match.offset, m.match.ncodeunits, Val(:noshift))),
452
        Union{Nothing,SubString{AnnotatedString{S}}}[
453
            if !isnothing(cap)
454
                (@inbounds SubString{AnnotatedString{S}}(
×
455
                    str, cap.offset, cap.ncodeunits, Val(:noshift)))
456
            end for cap in m.captures],
457
        m.offset, m.offsets, m.regex)
458
end
459

460
function match(re::Regex, str::AnnotatedString)
×
461
    m = match(re, str.string)
×
462
    if !isnothing(m)
×
463
        _annotatedmatch(m, str)
×
464
    end
465
end
466

467
function match(re::Regex, str::AnnotatedString, idx::Integer, add_opts::UInt32=UInt32(0))
×
468
    m = match(re, str.string, idx, add_opts)
×
469
    if !isnothing(m)
×
470
        _annotatedmatch(m, str)
×
471
    end
472
end
473

474
match(r::Regex, s::AbstractString) = match(r, s, firstindex(s))
6,283✔
475
match(r::Regex, s::AbstractString, i::Integer) = throw(ArgumentError(
×
476
    "regex matching is only available for the String and AnnotatedString types; use String(s) to convert"
477
))
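
The fallback method above rejects string types other than `String`, `SubString{String}`, and `AnnotatedString`. The sketch below triggers it with `GenericString` from the `Test` standard library (chosen here only as a convenient non-`String` `AbstractString`) and then applies the conversion that the error message suggests:

```julia
using Test: GenericString   # a non-String AbstractString from the Test stdlib

s = GenericString("cabac")

# Matching directly on the wrapper type hits the ArgumentError fallback.
threw = try
    match(r"a(.)a", s)
    false
catch err
    err isa ArgumentError
end

# Converting with String(s) first makes the match work.
m = match(r"a(.)a", String(s))
```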
478

479
findnext(re::Regex, str::Union{String,SubString}, idx::Integer) = _findnext_re(re, str, idx, C_NULL)
280✔
480

481
# TODO: return only start index and update deprecation
482
# duck-type str so that external UTF-8 string packages like StringViews can hook in
483
function _findnext_re(re::Regex, str, idx::Integer, match_data::Ptr{Cvoid})
272✔
484
    if idx > nextind(str,lastindex(str))
544✔
485
        throw(BoundsError())
×
486
    end
487
    opts = re.match_options
272✔
488
    compile(re)
272✔
489
    alloc = match_data == C_NULL
272✔
490
    if alloc
272✔
491
        matched, data = PCRE.exec_r_data(re.regex, str, idx-1, opts)
272✔
492
    else
UNCOV
493
        matched = PCRE.exec(re.regex, str, idx-1, opts, match_data)
×
494
        data = match_data
×
495
    end
496
    if matched
272✔
497
        p = PCRE.ovec_ptr(data)
243✔
498
        ans = (Int(unsafe_load(p,1))+1):prevind(str,Int(unsafe_load(p,2))+1)
243✔
499
    else
500
        ans = nothing
×
501
    end
502
    alloc && PCRE.free_match_data(data)
272✔
503
    return ans
272✔
504
end
505
findnext(r::Regex, s::AbstractString, idx::Integer) = throw(ArgumentError(
×
506
    "regex search is only available for the String type; use String(s) to convert"
507
))
508
findfirst(r::Regex, s::AbstractString) = findnext(r,s,firstindex(s))
×
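
A short usage sketch of the regex search functions above: `findfirst` and `findnext` return the index range of the match, or `nothing` when there is none. The variable names are illustrative.

```julia
s = "cabac"

first_hit = findfirst(r"a.a", s)   # range covering "aba"
next_hit = findnext(r"a", s, 3)    # first "a" at or after index 3
miss = findfirst(r"z", s)          # no match
```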
509

510

511
"""
512
    findall(c::AbstractChar, s::AbstractString)
513

514
Return a vector `I` of the indices of `s` where `s[i] == c`. If there are no such
515
elements in `s`, return an empty array.
516

517
# Examples
518
```jldoctest
519
julia> findall('a', "batman")
520
2-element Vector{Int64}:
521
 2
522
 5
523
```
524

525
!!! compat "Julia 1.7"
526
     This method requires at least Julia 1.7.
527
"""
528
findall(c::AbstractChar, s::AbstractString) = findall(isequal(c),s)
×
529

530

531
"""
532
    count(
533
        pattern::Union{AbstractChar,AbstractString,AbstractPattern},
534
        string::AbstractString;
535
        overlap::Bool = false,
536
    )
537

538
Return the number of matches for `pattern` in `string`. This is equivalent to
539
calling `length(findall(pattern, string))` but more efficient.
540

541
If `overlap=true`, the matching sequences are allowed to overlap indices in the
542
original string, otherwise they must be from disjoint character ranges.
543

544
!!! compat "Julia 1.3"
545
     This method requires at least Julia 1.3.
546

547
!!! compat "Julia 1.7"
548
      Using a character as the pattern requires at least Julia 1.7.
549

550
# Examples
551
```jldoctest
552
julia> count('a', "JuliaLang")
553
2
554

555
julia> count(r"a(.)a", "cabacabac", overlap=true)
556
3
557

558
julia> count(r"a(.)a", "cabacabac")
559
2
560
```
561
"""
562
function count(t::Union{AbstractChar,AbstractString,AbstractPattern}, s::AbstractString; overlap::Bool=false)
×
563
    n = 0
564
    i, e = firstindex(s), lastindex(s)
565
    while true
566
        r = findnext(t, s, i)
567
        isnothing(r) && break
568
        n += 1
569
        j = overlap || isempty(r) ? first(r) : last(r)
570
        j > e && break
571
        @inbounds i = nextind(s, j)
572
    end
573
    return n
574
end
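
The `count` docstring above states the equivalence with `length(findall(pattern, string))`. The sketch below checks that equivalence for both settings of `overlap`, using the docstring's own input; the variable names are illustrative.

```julia
s = "cabacabac"
rx = r"a(.)a"

disjoint = count(rx, s)                                  # matches may not share characters
overlapping = count(rx, s; overlap=true)                 # matches may share characters
via_findall = length(findall(rx, s))
via_findall_overlap = length(findall(rx, s; overlap=true))
```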
575

576
"""
577
    SubstitutionString(substr) <: AbstractString
578

579
Stores the given string `substr` as a `SubstitutionString`, for use in regular expression
580
substitutions. Most commonly constructed using the [`@s_str`](@ref) macro.
581

582
# Examples
583
```jldoctest
584
julia> SubstitutionString("Hello \\\\g<name>, it's \\\\1")
585
s"Hello \\g<name>, it's \\1"
586

587
julia> subst = s"Hello \\g<name>, it's \\1"
588
s"Hello \\g<name>, it's \\1"
589

590
julia> typeof(subst)
591
SubstitutionString{String}
592
```
593
"""
594
struct SubstitutionString{T<:AbstractString} <: AbstractString
595
    string::T
1✔
596
end
597

598
ncodeunits(s::SubstitutionString) = ncodeunits(s.string)::Int
×
599
codeunit(s::SubstitutionString) = codeunit(s.string)::CodeunitType
×
600
codeunit(s::SubstitutionString, i::Integer) = codeunit(s.string, i)::Union{UInt8, UInt16, UInt32}
×
601
isvalid(s::SubstitutionString, i::Integer) = isvalid(s.string, i)::Bool
×
602
iterate(s::SubstitutionString, i::Integer...) = iterate(s.string, i...)::Union{Nothing,Tuple{AbstractChar,Int}}
×
603

604
function show(io::IO, s::SubstitutionString)
×
605
    print(io, "s\"")
×
606
    escape_raw_string(io, s.string)
×
607
    print(io, "\"")
×
608
end
609

610
"""
611
    @s_str -> SubstitutionString
612

613
Construct a substitution string, used for regular expression substitutions.  Within the
614
string, sequences of the form `\\N` refer to the Nth capture group in the regex, and
615
`\\g<groupname>` refers to a named capture group with name `groupname`.
616

617
# Examples
618
```jldoctest
619
julia> msg = "#Hello# from Julia";
620

621
julia> replace(msg, r"#(.+)# from (?<from>\\w+)" => s"FROM: \\g<from>; MESSAGE: \\1")
622
"FROM: Julia; MESSAGE: Hello"
623
```
624
"""
625
macro s_str(string) SubstitutionString(string) end
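
As a further sketch of the substitution syntax described in the `@s_str` docstring above, the example below reorders the parts of a date once with named groups (`\g<name>`) and once with numbered groups (`\N`). The input string and variable names are illustrative.

```julia
date = "2024-05-17"

# Named groups referenced with \g<name>.
swapped = replace(date, r"(?<y>\d+)-(?<m>\d+)-(?<d>\d+)" => s"\g<d>/\g<m>/\g<y>")

# The same rewrite using numbered groups.
numbered = replace(date, r"(\d+)-(\d+)-(\d+)" => s"\3/\2/\1")
```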
626

627
# replacement
628

629
struct RegexAndMatchData
630
    re::Regex
631
    match_data::Ptr{Cvoid}
UNCOV
632
    RegexAndMatchData(re::Regex) = (compile(re); new(re, PCRE.create_match_data(re.regex)))
×
633
end
634

UNCOV
635
findnext(pat::RegexAndMatchData, str, i) = _findnext_re(pat.re, str, i, pat.match_data)
×
636

UNCOV
637
_pat_replacer(r::Regex) = RegexAndMatchData(r)
×
638

UNCOV
639
_free_pat_replacer(r::RegexAndMatchData) = PCRE.free_match_data(r.match_data)
×
640

641
replace_err(repl) = error("Bad replacement string: $repl")
×
642

643
function _write_capture(io::IO, group::Int, str, r, re::RegexAndMatchData)
644
    len = PCRE.substring_length_bynumber(re.match_data, group)
×
645
    # in the case of an optional group that doesn't match, len == 0
646
    len == 0 && return
×
647
    ensureroom(io, len+1)
×
648
    PCRE.substring_copy_bynumber(re.match_data, group,
×
649
        pointer(io.data, io.ptr), len+1)
650
    io.ptr += len
×
651
    io.size = max(io.size, io.ptr - 1)
×
652
    nothing
×
653
end
654
function _write_capture(io::IO, group::Int, str, r, re)
×
655
    group == 0 || replace_err("pattern is not a Regex")
×
656
    return print(io, SubString(str, r))
×
657
end
658

659

660
const SUB_CHAR = '\\'
661
const GROUP_CHAR = 'g'
662
const KEEP_ESC = [SUB_CHAR, GROUP_CHAR, '0':'9'...]
663

664
function _replace(io, repl_s::SubstitutionString, str, r, re)
×
665
    LBRACKET = '<'
×
666
    RBRACKET = '>'
×
667
    repl = unescape_string(repl_s.string, KEEP_ESC)
×
668
    i = firstindex(repl)
×
669
    e = lastindex(repl)
×
670
    while i <= e
×
671
        if repl[i] == SUB_CHAR
×
672
            next_i = nextind(repl, i)
×
673
            next_i > e && replace_err(repl)
×
674
            if repl[next_i] == SUB_CHAR
×
675
                write(io, SUB_CHAR)
×
676
                i = nextind(repl, next_i)
×
677
            elseif isdigit(repl[next_i])
×
678
                group = parse(Int, repl[next_i])
×
679
                i = nextind(repl, next_i)
×
680
                while i <= e
×
681
                    if isdigit(repl[i])
×
682
                        group = 10group + parse(Int, repl[i])
×
683
                        i = nextind(repl, i)
×
684
                    else
685
                        break
×
686
                    end
687
                end
×
688
                _write_capture(io, group, str, r, re)
×
689
            elseif repl[next_i] == GROUP_CHAR
×
690
                i = nextind(repl, next_i)
×
691
                if i > e || repl[i] != LBRACKET
×
692
                    replace_err(repl)
×
693
                end
694
                i = nextind(repl, i)
×
695
                i > e && replace_err(repl)
×
696
                groupstart = i
×
697
                while repl[i] != RBRACKET
×
698
                    i = nextind(repl, i)
×
699
                    i > e && replace_err(repl)
×
700
                end
×
701
                groupname = SubString(repl, groupstart, prevind(repl, i))
×
702
                if all(isdigit, groupname)
×
703
                    group = parse(Int, groupname)
×
704
                elseif re isa RegexAndMatchData
×
705
                    group = PCRE.substring_number_from_name(re.re.regex, groupname)
×
706
                    group < 0 && replace_err("Group $groupname not found in regex $(re.re)")
×
707
                else
708
                    group = -1
×
709
                end
710
                _write_capture(io, group, str, r, re)
×
711
                i = nextind(repl, i)
×
712
            else
713
                replace_err(repl)
×
714
            end
715
        else
716
            write(io, repl[i])
×
717
            i = nextind(repl, i)
×
718
        end
719
    end
×
720
end
721

722
struct RegexMatchIterator{S <: AbstractString}
723
    regex::Regex
724
    string::S
725
    overlap::Bool
726

727
    RegexMatchIterator(regex::Regex, string::AbstractString, ovr::Bool=false) =
×
728
        new{String}(regex, String(string), ovr)
729
    RegexMatchIterator(regex::Regex, string::AnnotatedString, ovr::Bool=false) =
×
730
        new{AnnotatedString{String}}(regex, AnnotatedString(String(string.string), string.annotations), ovr)
731
end
732
compile(itr::RegexMatchIterator) = (compile(itr.regex); itr)
×
733
eltype(::Type{<:RegexMatchIterator}) = RegexMatch
×
734
IteratorSize(::Type{<:RegexMatchIterator}) = SizeUnknown()
×
735

736
function iterate(itr::RegexMatchIterator, (offset,prevempty)=(1,false))
737
    opts_nonempty = UInt32(PCRE.ANCHORED | PCRE.NOTEMPTY_ATSTART)
×
738
    while true
×
739
        mat = match(itr.regex, itr.string, offset,
×
740
                    prevempty ? opts_nonempty : UInt32(0))
741

742
        if mat === nothing
×
743
            if prevempty && offset <= sizeof(itr.string)
×
744
                offset = nextind(itr.string, offset)
×
745
                prevempty = false
×
746
                continue
×
747
            else
748
                break
×
749
            end
750
        else
751
            if itr.overlap
×
752
                if !isempty(mat.match)
×
753
                    offset = nextind(itr.string, mat.offset)
×
754
                else
755
                    offset = mat.offset
×
756
                end
757
            else
758
                offset = mat.offset + ncodeunits(mat.match)
×
759
            end
760
            return (mat, (offset, isempty(mat.match)))
×
761
        end
762
    end
×
763
    nothing
×
764
end
765

766
"""
767
    eachmatch(r::Regex, s::AbstractString; overlap::Bool=false)
768

769
Search for all matches of the regular expression `r` in `s` and return an iterator over the
770
matches. If `overlap` is `true`, the matching sequences are allowed to overlap indices in the
771
original string; otherwise they must come from disjoint character ranges.
772

773
# Examples
774
```jldoctest
775
julia> rx = r"a.a"
776
r"a.a"
777

778
julia> m = eachmatch(rx, "a1a2a3a")
779
Base.RegexMatchIterator{String}(r"a.a", "a1a2a3a", false)
780

781
julia> collect(m)
782
2-element Vector{RegexMatch}:
783
 RegexMatch("a1a")
784
 RegexMatch("a3a")
785

786
julia> collect(eachmatch(rx, "a1a2a3a", overlap = true))
787
3-element Vector{RegexMatch}:
788
 RegexMatch("a1a")
789
 RegexMatch("a2a")
790
 RegexMatch("a3a")
791
```
792
"""
793
eachmatch(re::Regex, str::AbstractString; overlap = false) =
×
794
    RegexMatchIterator(re, str, overlap)
795

## comparison ##

function ==(a::Regex, b::Regex)
    a.pattern == b.pattern && a.compile_options == b.compile_options && a.match_options == b.match_options
end

## hash ##
const hashre_seed = UInt === UInt64 ? 0x67e195eb8555e72d : 0xe32373e4
function hash(r::Regex, h::UInt)
    h += hashre_seed
    h = hash(r.pattern, h)
    h = hash(r.compile_options, h)
    h = hash(r.match_options, h)
end
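
Because `==` and `hash` above both depend on exactly the pattern, the compile options, and the match options, two independently constructed regexes with the same pattern and flags behave as the same key in hashed collections. A small sketch, assuming only the public `Regex` constructor and the `r"..."` string macro (the variable names are illustrative):

```julia
a = r"ab+c"i             # built by the string macro
b = Regex("ab+c", "i")   # built by the constructor with the same flag
c = r"ab+c"              # same pattern, different compile options

same_value   = a == b
same_hash    = hash(a) == hash(b)
differs_opts = a != c

# Equal regexes collapse to a single dictionary key.
d = Dict(a => "first")
d[b] = "second"
println(length(d))
```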

## String operations ##

"""
    *(s::Regex, t::Union{Regex,AbstractString,AbstractChar}) -> Regex
    *(s::Union{Regex,AbstractString,AbstractChar}, t::Regex) -> Regex

Concatenate regexes, strings and/or characters, producing a [`Regex`](@ref).
String and character arguments must be matched exactly in the resulting regex,
meaning that the contained characters are devoid of any special meaning
(they are quoted with "\\Q" and "\\E").

!!! compat "Julia 1.3"
    This method requires at least Julia 1.3.

# Examples
```jldoctest
julia> match(r"Hello|Good bye" * ' ' * "world", "Hello world")
RegexMatch("Hello world")

julia> r = r"a|b" * "c|d"
r"(?:a|b)\\Qc|d\\E"

julia> match(r, "ac") == nothing
true

julia> match(r, "ac|d")
RegexMatch("ac|d")
```
"""
function *(r1::Union{Regex,AbstractString,AbstractChar}, rs::Union{Regex,AbstractString,AbstractChar}...)
    mask = PCRE.CASELESS | PCRE.MULTILINE | PCRE.DOTALL | PCRE.EXTENDED # imsx
    match_opts   = nothing # all args must agree on this
    compile_opts = nothing # all args must agree on this
    shared = mask
    for r in (r1, rs...)
        r isa Regex || continue
        if match_opts === nothing
            match_opts = r.match_options
            compile_opts = r.compile_options & ~mask
        else
            r.match_options == match_opts &&
                r.compile_options & ~mask == compile_opts ||
                throw(ArgumentError("cannot multiply regexes: incompatible options"))
        end
        shared &= r.compile_options
    end
    unshared = mask & ~shared
    Regex(string(wrap_string(r1, unshared), wrap_string.(rs, Ref(unshared))...), compile_opts | shared, match_opts)
end
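
The `shared`/`unshared` bookkeeping above decides where each of the `imsx` flags ends up: a flag set on every regex operand stays a top-level option of the result, while a flag set on only some operands is pushed into that operand's own `(?flags:...)` group. A brief sketch of the effect, assuming only the public `*` and `occursin` methods (the variable names are illustrative):

```julia
# `i` is set on the first operand only, so it is scoped to that subpattern.
mixed = r"a"i * r"b"
println(mixed.pattern)

# `i` is set on both operands, so it remains an option of the whole regex.
both = r"a"i * r"b"i
println(both.pattern)

mixed_matches_Ab = occursin(mixed, "Ab")  # 'a' part is case-insensitive
mixed_matches_AB = occursin(mixed, "AB")  # 'b' part is still case-sensitive
both_matches_AB  = occursin(both, "AB")
```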

*(r::Regex) = r # avoids wrapping r in a useless subpattern

wrap_string(r::Regex, unshared::UInt32) = string("(?", regex_opts_str(r.compile_options & unshared), ':', r.pattern, ')')
# if s contains raw"\E", split '\' and 'E' within two distinct \Q...\E groups:
wrap_string(s::AbstractString, ::UInt32) = string("\\Q", replace(s, raw"\E" => raw"\\E\QE"), "\\E")
wrap_string(s::AbstractChar, ::UInt32) = string("\\Q", s, "\\E")

regex_opts_str(opts) = (isassigned(_regex_opts_str) ? _regex_opts_str[] : init_regex())[opts]

# UInt32 to String mapping for some compile options
const _regex_opts_str = Ref{ImmutableDict{UInt32,String}}()

@noinline init_regex() = _regex_opts_str[] = foldl(0:15, init=ImmutableDict{UInt32,String}()) do d, o
    opt = UInt32(0)
    str = ""
    if o & 1 != 0
        opt |= PCRE.CASELESS
        str *= 'i'
    end
    if o & 2 != 0
        opt |= PCRE.MULTILINE
        str *= 'm'
    end
    if o & 4 != 0
        opt |= PCRE.DOTALL
        str *= 's'
    end
    if o & 8 != 0
        opt |= PCRE.EXTENDED
        str *= 'x'
    end
    ImmutableDict(d, opt => str)
end


"""
    ^(s::Regex, n::Integer) -> Regex

Repeat a regex `n` times.

!!! compat "Julia 1.3"
    This method requires at least Julia 1.3.

# Examples
```jldoctest
julia> r"Test "^2
r"(?:Test ){2}"

julia> match(r"Test "^2, "Test Test ")
RegexMatch("Test Test ")
```
"""
^(r::Regex, i::Integer) = Regex(string("(?:", r.pattern, "){$i}"), r.compile_options, r.match_options)