• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

JuliaLang / julia / 1408

14 Jan 2026 04:57AM UTC coverage: 76.686% (-0.06%) from 76.749%
1408

push

buildkite

web-flow
Move thunk and export evaluation out of `jl_toplevel_eval_flex` (#60528)

Deconstructing `jl_toplevel_eval_flex` to help with JuliaLowering integration
and to further disentangle top level evaluation from `Expr` and from the code in
toplevel.c:

### `jl_eval_thunk`: C API for CodeInfo evaluation without Expr

Separate CodeInfo evaluation out of `jl_toplevel_eval_flex` into
`jl_eval_thunk` so that `CodeInfo` can become the basic unit of toplevel
evaluation rather than `Expr`.

The latest world age set and restore now moves entirely into `jl_eval_thunk` as
`jl_toplevel_eval_flex` no longer modifies the task's world age. The world age
set and restore inside `jl_eval_thunk` needs to moves outward so that it wraps
around the calls to `jl_resolve_definition_effects_in_ir` which interprets some
limited top level expressions for the argument types foreigncall, etc.
(Previously the world age was updated to the latest in functions which call
`jl_toplevel_eval_flex` so resolving definition effects happened to work.)

`jl_filename` and `jl_lineno` are also set from `jl_eval_thunk` using the same
debuginfo used within codegen so that deprecation binding errors and
`jl_fprint_critical_error` can contain the approximate line number.

### `jl_module_public` API for declaring symbols public/exported

Move public/export batching out of `jl_toplevel_eval_flex` and into a C runtime
function `jl_module_public` which can be called without using eval and without
constructing an Expr.

### Side quest: Fix GC rooting in `jl_declare_constant_val2`

Various seemingly-unrelated changes in `jl_toplevel_eval_flex`
mysteriously cause the static analyzer to see a rooting problem in
`jl_declare_constant_val2`: The `val` argument is currently declared
JL_MAYBE_UNROOTED, but we call JL_LOCK without rooting it.

I've done the simplest fix here which is to root `val` inside
`jl_declare_constant_val2`. An alternative would be to remove the
`JL_MAYBE_UNROOTED` annotation which could be an ... (continued)

62680 of 81736 relevant lines covered (76.69%)

23380003.16 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

75.88
/Compiler/src/optimize.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
#############
4
# constants #
5
#############
6

7
# The slot has uses that are not statically dominated by any assignment
8
# This is implied by `SLOT_USEDUNDEF`.
9
# If this is not set, all the uses are (statically) dominated by the defs.
10
# In particular, if a slot has `AssignedOnce && !StaticUndef`, it is an SSA.
11
const SLOT_STATICUNDEF  = 1 # slot might be used before it is defined (structurally)
12
const SLOT_ASSIGNEDONCE = 16 # slot is assigned to only once
13
const SLOT_USEDUNDEF    = 32 # slot has uses that might raise UndefVarError
14
# const SLOT_CALLED      = 64
15

16
# NOTE make sure to sync the flag definitions below with julia.h and `jl_code_info_set_ir` in method.c
17

18
const IR_FLAG_NULL        = zero(UInt32)
19
# This statement is marked as @inbounds by user.
20
# If replaced by inlining, any contained boundschecks may be removed.
21
const IR_FLAG_INBOUNDS    = one(UInt32) << 0
22
# This statement is marked as @inline by user
23
const IR_FLAG_INLINE      = one(UInt32) << 1
24
# This statement is marked as @noinline by user
25
const IR_FLAG_NOINLINE    = one(UInt32) << 2
26
# This statement is proven :consistent
27
const IR_FLAG_CONSISTENT  = one(UInt32) << 3
28
# This statement is proven :effect_free
29
const IR_FLAG_EFFECT_FREE = one(UInt32) << 4
30
# This statement is proven :nothrow
31
const IR_FLAG_NOTHROW     = one(UInt32) << 5
32
# This statement is proven :terminates_globally
33
const IR_FLAG_TERMINATES  = one(UInt32) << 6
34
#const IR_FLAG_TERMINATES_LOCALLY = one(UInt32) << 7
35
#const IR_FLAG_NOTASKSTATE = one(UInt32) << 8
36
#const IR_FLAG_INACCESSIBLEMEM = one(UInt32) << 9
37
const IR_FLAG_NOUB        = one(UInt32) << 10
38
#const IR_FLAG_NOUBINIB   = one(UInt32) << 11
39
#const IR_FLAG_CONSISTENTOVERLAY = one(UInt32) << 12
40
# This statement is :nortcall
41
const IR_FLAG_NORTCALL = one(UInt32) << 13
42
# An optimization pass has updated this statement in a way that may
43
# have exposed information that inference did not see. Re-running
44
# inference on this statement may be profitable.
45
const IR_FLAG_REFINED     = one(UInt32) << 16
46
# This statement has no users and may be deleted if flags get refined to IR_FLAGS_REMOVABLE
47
const IR_FLAG_UNUSED      = one(UInt32) << 17
48
# TODO: Both of these next two should eventually go away once
49
# This statement is :effect_free == EFFECT_FREE_IF_INACCESSIBLEMEMONLY
50
const IR_FLAG_EFIIMO      = one(UInt32) << 18
51
# This statement is :inaccessiblememonly == INACCESSIBLEMEM_OR_ARGMEMONLY
52
const IR_FLAG_INACCESSIBLEMEM_OR_ARGMEM = one(UInt32) << 19
53

54
const NUM_IR_FLAGS = 3 # sync with julia.h
55

56
const IR_FLAGS_EFFECTS =
57
    IR_FLAG_CONSISTENT | IR_FLAG_EFFECT_FREE | IR_FLAG_NOTHROW |
58
    IR_FLAG_TERMINATES | IR_FLAG_NOUB | IR_FLAG_NORTCALL
59

60
const IR_FLAGS_REMOVABLE = IR_FLAG_EFFECT_FREE | IR_FLAG_NOTHROW | IR_FLAG_TERMINATES
61

62
const IR_FLAGS_NEEDS_EA = IR_FLAG_EFIIMO | IR_FLAG_INACCESSIBLEMEM_OR_ARGMEM
63

64
has_flag(curr::UInt32, flag::UInt32) = (curr & flag) == flag
163,912,564✔
65

66
function iscallstmt(@nospecialize stmt)
67
    stmt isa Expr || return false
5,966,631✔
68
    head = stmt.head
4,329,375✔
69
    return head === :call || head === :invoke || head === :foreigncall
8,255,410✔
70
end
71

72
function flags_for_effects(effects::Effects)
73
    flags = zero(UInt32)
1,001,418✔
74
    if is_consistent(effects)
4,209,784✔
75
        flags |= IR_FLAG_CONSISTENT
907,695✔
76
    end
77
    if is_effect_free(effects)
4,209,784✔
78
        flags |= IR_FLAG_EFFECT_FREE
3,174,871✔
79
    elseif is_effect_free_if_inaccessiblememonly(effects)
1,034,913✔
80
        flags |= IR_FLAG_EFIIMO
15,015✔
81
    end
82
    if is_nothrow(effects)
4,209,784✔
83
        flags |= IR_FLAG_NOTHROW
3,863,201✔
84
    end
85
    if is_terminates(effects)
4,209,784✔
86
        flags |= IR_FLAG_TERMINATES
4,066,325✔
87
    end
88
    if is_inaccessiblemem_or_argmemonly(effects)
4,209,784✔
89
        flags |= IR_FLAG_INACCESSIBLEMEM_OR_ARGMEM
258,311✔
90
    end
91
    if is_noub(effects)
4,209,784✔
92
        flags |= IR_FLAG_NOUB
3,974,573✔
93
    end
94
    if is_nortcall(effects)
4,209,784✔
95
        flags |= IR_FLAG_NORTCALL
4,053,457✔
96
    end
97
    return flags
4,209,784✔
98
end
99

100
const TOP_TUPLE = GlobalRef(Core, :tuple)
101

102
inlining_cost(@nospecialize src) =
900,352✔
103
    src isa Union{MaybeCompressed,UInt8} ? ccall(:jl_ir_inlining_cost, InlineCostType, (Any,), src) : MAX_INLINE_COST
104
is_inlineable(@nospecialize src) = inlining_cost(src) != MAX_INLINE_COST
900,352✔
105
set_inlineable!(src::CodeInfo, val::Bool) =
113✔
106
    src.inlining_cost = (val ? MIN_INLINE_COST : MAX_INLINE_COST)
107

108
function inline_cost_clamp(x::Int)
109
    x > MAX_INLINE_COST && return MAX_INLINE_COST
107,799✔
110
    x < MIN_INLINE_COST && return MIN_INLINE_COST
107,799✔
111
    x = ccall(:jl_encode_inlining_cost, UInt8, (InlineCostType,), x)
37,651✔
112
    x = ccall(:jl_decode_inlining_cost, InlineCostType, (UInt8,), x)
37,651✔
113
    return x
37,651✔
114
end
115

116
const SRC_FLAG_DECLARED_INLINE = 0x1
117
const SRC_FLAG_DECLARED_NOINLINE = 0x2
118

119
is_declared_inline(@nospecialize src::MaybeCompressed) =
373,626✔
120
    ccall(:jl_ir_flag_inlining, UInt8, (Any,), src) == SRC_FLAG_DECLARED_INLINE
121

122
is_declared_noinline(@nospecialize src::MaybeCompressed) =
9✔
123
    ccall(:jl_ir_flag_inlining, UInt8, (Any,), src) == SRC_FLAG_DECLARED_NOINLINE
124

125
#####################
126
# OptimizationState #
127
#####################
128

129
# return whether this src should be inlined. If so, retrieve_ir_for_inlining must return an IRCode from it
130

131
function src_inlining_policy(interp::AbstractInterpreter, mi::MethodInstance,
132
    @nospecialize(src), @nospecialize(info::CallInfo), stmt_flag::UInt32)
133
    # If we have a generator, but we can't invoke it (because argument type information is lacking),
134
    # don't inline so we defer its invocation to runtime where we'll have precise type information.
135
    if isa(mi.def, Method) && hasgenerator(mi)
529,087✔
136
        may_invoke_generator(mi) || return false
314✔
137
    end
138
    return src_inlining_policy(interp, src, info, stmt_flag)
529,102✔
139
end
140

141
function src_inlining_policy(::AbstractInterpreter,
533,504✔
142
    @nospecialize(src), @nospecialize(info::CallInfo), stmt_flag::UInt32)
143
    isa(src, OptimizationState) && (src = src.src)
533,513✔
144
    if isa(src, MaybeCompressed)
533,513✔
145
        src_inlineable = is_stmt_inline(stmt_flag) || is_inlineable(src)
1,065,267✔
146
        return src_inlineable
532,668✔
147
    elseif isa(src, IRCode)
845✔
148
        return true
9✔
149
    end
150
    @assert !isa(src, CodeInstance) # handled by caller
836✔
151
    return false
836✔
152
end
153

154
struct InliningState{Interp<:AbstractInterpreter}
155
    edges::Vector{Any}
168,604✔
156
    interp::Interp
157
    opt_cache::IdDict{MethodInstance,CodeInstance}
158
end
159
function InliningState(sv::InferenceState, interp::AbstractInterpreter,
160
                       opt_cache::IdDict{MethodInstance,CodeInstance}=IdDict{MethodInstance,CodeInstance}())
161
    return InliningState(sv.edges, interp, opt_cache)
168,586✔
162
end
163
function InliningState(interp::AbstractInterpreter,
9✔
164
                       opt_cache::IdDict{MethodInstance,CodeInstance}=IdDict{MethodInstance,CodeInstance}())
165
    return InliningState(Any[], interp, opt_cache)
27✔
166
end
167

168
struct OptimizerCache{CodeCache}
169
    cache::CodeCache
170
    opt_cache::IdDict{MethodInstance,CodeInstance}
171
    function OptimizerCache(
172
        cache::CodeCache,
173
        opt_cache::IdDict{MethodInstance,CodeInstance}) where CodeCache
174
        return new{CodeCache}(cache, opt_cache)
17,887✔
175
    end
176
end
177
function get((; cache, opt_cache)::OptimizerCache, mi::MethodInstance, default)
178
    if haskey(opt_cache, mi)
19,843✔
179
        return opt_cache[mi] # this is incomplete right now, but will be finished (by finish_cycle) before caching anything
219✔
180
    end
181
    return get(cache, mi, default)
19,405✔
182
end
183

184
# get `code_cache(::AbstractInterpreter)` from `state::InliningState`
185
function code_cache(state::InliningState)
186
    cache = code_cache(state.interp)
19,624✔
187
    return OptimizerCache(cache, state.opt_cache)
19,624✔
188
end
189

190
mutable struct OptimizationResult
191
    ir::IRCode
169,392✔
192
    inline_flag::UInt8
193
    simplified::Bool # indicates whether the IR was processed with `cfg_simplify!`
194
end
195

196
function simplify_ir!(result::OptimizationResult)
197
    result.ir = cfg_simplify!(result.ir)
126,260✔
198
    result.simplified = true
126,260✔
199
end
200

201
mutable struct OptimizationState{Interp<:AbstractInterpreter}
202
    linfo::MethodInstance
168,595✔
203
    src::CodeInfo
204
    optresult::Union{Nothing, OptimizationResult}
205
    stmt_info::Vector{CallInfo}
206
    mod::Module
207
    sptypes::Vector{VarState}
208
    slottypes::Vector{Any}
209
    inlining::InliningState{Interp}
210
    cfg::CFG
211
    unreachable::BitSet
212
    bb_vartables::Vector{Union{Nothing,VarTable}}
213
    insert_coverage::Bool
214
end
215
function OptimizationState(sv::InferenceState, interp::AbstractInterpreter,
216
                           opt_cache::IdDict{MethodInstance,CodeInstance}=IdDict{MethodInstance,CodeInstance}())
217
    inlining = InliningState(sv, interp, opt_cache)
179,898✔
218
    return OptimizationState(sv.linfo, sv.src, nothing, sv.stmt_info, sv.mod,
168,586✔
219
                             sv.sptypes, sv.slottypes, inlining, sv.cfg,
220
                             sv.unreachable, sv.bb_vartables, sv.insert_coverage)
221
end
222
function OptimizationState(mi::MethodInstance, src::CodeInfo, interp::AbstractInterpreter,
9✔
223
                           opt_cache::IdDict{MethodInstance,CodeInstance}=IdDict{MethodInstance,CodeInstance}())
224
    # prepare src for running optimization passes if it isn't already
225
    nssavalues = src.ssavaluetypes
18✔
226
    if nssavalues isa Int
9✔
227
        src.ssavaluetypes = Any[ Any for _ = 1:nssavalues ]
72✔
228
    else
229
        nssavalues = length(src.ssavaluetypes::Vector{Any})
×
230
    end
231
    sptypes = sptypes_from_meth_instance(mi)
9✔
232
    nslots = length(src.slotflags)
9✔
233
    slottypes = src.slottypes
9✔
234
    if slottypes === nothing
9✔
235
        slottypes = Any[ Any for _ = 1:nslots ]
24✔
236
    end
237
    stmt_info = CallInfo[ NoCallInfo() for _ = 1:nssavalues ]
72✔
238
    # cache some useful state computations
239
    def = mi.def
9✔
240
    mod = isa(def, Method) ? def.module : def
12✔
241
    # Allow using the global MI cache, but don't track edges.
242
    # This method is mostly used for unit testing the optimizer
243
    inlining = InliningState(interp, opt_cache)
9✔
244
    cfg = compute_basic_blocks(src.code)
9✔
245
    unreachable = BitSet()
9✔
246
    bb_vartables = Union{VarTable,Nothing}[]
9✔
247
    for _ = 1:length(cfg.blocks)
9✔
248
        push!(bb_vartables, VarState[
129✔
249
            VarState(slottypes[slot], typemin(Int), src.slotflags[slot] & SLOT_USEDUNDEF != 0)
250
            for slot = 1:nslots
251
        ])
252
    end
51✔
253
    return OptimizationState(mi, src, nothing, stmt_info, mod, sptypes, slottypes, inlining, cfg, unreachable, bb_vartables, false)
9✔
254
end
255
function OptimizationState(mi::MethodInstance, interp::AbstractInterpreter)
6✔
256
    world = get_inference_world(interp)
6✔
257
    src = retrieve_code_info(mi, world)
6✔
258
    src === nothing && return nothing
6✔
259
    return OptimizationState(mi, src, interp)
6✔
260
end
261

262
function argextype end # imported by EscapeAnalysis
263
function try_compute_field end # imported by EscapeAnalysis
264

265
include("ssair/heap.jl")
266
include("ssair/slot2ssa.jl")
267
include("ssair/inlining.jl")
268
include("ssair/verify.jl")
269
include("ssair/legacy.jl")
270
include("ssair/EscapeAnalysis.jl")
271
include("ssair/passes.jl")
272
include("ssair/irinterp.jl")
273

274
function ir_to_codeinf!(opt::OptimizationState, frame::InferenceState, edges::SimpleVector)
11,062✔
275
    ir_to_codeinf!(opt, edges, compute_inlining_cost(frame.interp, frame.result, opt.optresult))
11,062✔
276
end
277

278
function ir_to_codeinf!(opt::OptimizationState, edges::SimpleVector, inlining_cost::InlineCostType)
279
    src = ir_to_codeinf!(opt, edges)
22,124✔
280
    src.inlining_cost = inlining_cost
11,062✔
281
    src
11,062✔
282
end
283

284
function ir_to_codeinf!(opt::OptimizationState, edges::SimpleVector)
285
    src = ir_to_codeinf!(opt)
25,031✔
286
    src.edges = edges
13,969✔
287
    src
4,126✔
288
end
289

290
function ir_to_codeinf!(opt::OptimizationState)
291
    (; linfo, src, optresult) = opt
39,673✔
292
    if optresult === nothing
39,673✔
293
        return src
2,907✔
294
    end
295
    src = ir_to_codeinf!(src, optresult.ir)
36,766✔
296
    opt.optresult = nothing
36,766✔
297
    opt.src = src
36,766✔
298
    maybe_validate_code(linfo, src, "optimized")
36,766✔
299
    return src
36,766✔
300
end
301

302
function ir_to_codeinf!(src::CodeInfo, ir::IRCode)
303
    replace_code_newstyle!(src, ir)
36,837✔
304
    widen_all_consts!(src)
36,837✔
305
    return src
12,897✔
306
end
307

308
# widen all Const elements in type annotations
309
function widen_all_consts!(src::CodeInfo)
12,897✔
310
    ssavaluetypes = src.ssavaluetypes::Vector{Any}
12,897✔
311
    for i = 1:length(ssavaluetypes)
25,794✔
312
        ssavaluetypes[i] = widenconst(ssavaluetypes[i])
1,394,077✔
313
    end
2,775,257✔
314

315
    for i = 1:length(src.code)
25,794✔
316
        x = src.code[i]
1,394,077✔
317
        if isa(x, PiNode)
1,394,077✔
318
            src.code[i] = PiNode(x.val, widenconst(x.typ))
9,352✔
319
        end
320
    end
2,775,257✔
321

322
    return src
12,897✔
323
end
324

325
#########
326
# logic #
327
#########
328

329
_topmod(sv::OptimizationState) = _topmod(sv.mod)
×
330

331
is_stmt_inline(stmt_flag::UInt32) = has_flag(stmt_flag, IR_FLAG_INLINE)
1,157,154✔
332
is_stmt_noinline(stmt_flag::UInt32) = has_flag(stmt_flag, IR_FLAG_NOINLINE)
584,518✔
333

334
function new_expr_effect_flags(𝕃ₒ::AbstractLattice, args::Vector{Any}, src::Union{IRCode,IncrementalCompact}, pattern_match=nothing)
6,849✔
335
    Targ = args[1]
13,572✔
336
    atyp = argextype(Targ, src)
6,849✔
337
    # `Expr(:new)` of unknown type could raise arbitrary TypeError.
338
    typ, isexact = instanceof_tfunc(atyp, true)
6,849✔
339
    if !isexact
6,849✔
340
        atyp = unwrap_unionall(widenconst(atyp))
150✔
341
        if isType(atyp) && isTypeDataType(atyp.parameters[1])
300✔
342
            typ = atyp.parameters[1]
150✔
343
        else
344
            return (false, false, false)
×
345
        end
346
        isabstracttype(typ) && return (false, false, false)
150✔
347
    else
348
        isconcretedispatch(typ) || return (false, false, false)
6,699✔
349
    end
350
    typ = typ::DataType
6,849✔
351
    fcount = datatype_fieldcount(typ)
6,849✔
352
    fcount === nothing && return (false, false, false)
6,849✔
353
    fcount >= length(args) - 1 || return (false, false, false)
6,849✔
354
    for fidx in 1:(length(args) - 1)
13,696✔
355
        farg = args[fidx + 1]
12,240✔
356
        eT = argextype(farg, src)
12,240✔
357
        fT = fieldtype(typ, fidx)
12,240✔
358
        if !isexact && has_free_typevars(fT)
12,240✔
359
            if pattern_match !== nothing && pattern_match(src, typ, fidx, Targ, farg)
144✔
360
                continue
×
361
            end
362
            return (false, false, false)
144✔
363
        end
364
        ⊑(𝕃ₒ, eT, fT) || return (false, false, false)
12,096✔
365
    end
17,489✔
366
    return (false, true, true)
6,705✔
367
end
368

369
# Returns a tuple of `(:consistent, :removable, :nothrow)` flags for a given statement.
370
function stmt_effect_flags(𝕃ₒ::AbstractLattice, @nospecialize(stmt), @nospecialize(rt), src::Union{IRCode,IncrementalCompact})
1,235,178✔
371
    # TODO: We're duplicating analysis from inference here.
372
    isa(stmt, PiNode) && return (true, true, true)
1,235,178✔
373
    isa(stmt, PhiNode) && return (true, true, true)
1,185,538✔
374
    isa(stmt, ReturnNode) && return (true, false, true)
1,175,445✔
375
    isa(stmt, EnterNode) && return (true, false, true)
1,040,445✔
376
    isa(stmt, GotoNode) && return (true, false, true)
1,037,301✔
377
    isa(stmt, GotoIfNot) && return (true, false, ⊑(𝕃ₒ, argextype(stmt.cond, src), Bool))
1,029,137✔
378
    if isa(stmt, GlobalRef)
1,021,975✔
379
        # Modeled more precisely in abstract_eval_globalref. In general, if a
380
        # GlobalRef was moved to statement position, it is probably not `const`,
381
        # so we can't say much about it anyway.
382
        return (false, false, false)
×
383
    elseif isa(stmt, Expr)
1,021,975✔
384
        (; head, args) = stmt
757,569✔
385
        if head === :static_parameter
757,569✔
386
            # if we aren't certain enough about the type, it might be an UndefVarError at runtime
387
            sptypes = isa(src, IRCode) ? src.sptypes : src.ir.sptypes
50✔
388
            nothrow = !sptypes[args[1]::Int].undef
50✔
389
            return (true, nothrow, nothrow)
50✔
390
        end
391
        if head === :call
757,519✔
392
            f = argextype(args[1], src)
325,912✔
393
            f = singleton_type(f)
325,912✔
394
            f === nothing && return (false, false, false)
325,912✔
395
            if f === Intrinsics.cglobal || f === Intrinsics.llvmcall
651,728✔
396
                # TODO: these are not yet linearized
397
                return (false, false, false)
×
398
            end
399
            isa(f, Builtin) || return (false, false, false)
565,658✔
400
            # Needs to be handled in inlining to look at the callee effects
401
            f === Core._apply_iterate && return (false, false, false)
86,070✔
402
            argtypes = Any[argextype(args[arg], src) for arg in 2:length(args)]
165,184✔
403
            effects = builtin_effects(𝕃ₒ, f, argtypes, rt)
86,070✔
404
            consistent = is_consistent(effects)
86,070✔
405
            effect_free = is_effect_free(effects)
86,070✔
406
            nothrow = is_nothrow(effects)
86,070✔
407
            terminates = is_terminates(effects)
86,070✔
408
            removable = effect_free & nothrow & terminates
86,070✔
409
            return (consistent, removable, nothrow)
86,070✔
410
        elseif head === :new
431,607✔
411
            return new_expr_effect_flags(𝕃ₒ, args, src)
6,723✔
412
        elseif head === :foreigncall
424,884✔
413
            effects = foreigncall_effects(stmt) do @nospecialize x
620✔
414
                argextype(x, src)
415
            end
416
            consistent = is_consistent(effects)
620✔
417
            effect_free = is_effect_free(effects)
620✔
418
            nothrow = is_nothrow(effects)
620✔
419
            terminates = is_terminates(effects)
620✔
420
            removable = effect_free & nothrow & terminates
620✔
421
            return (consistent, removable, nothrow)
620✔
422
        elseif head === :new_opaque_closure
424,264✔
423
            length(args) < 4 && return (false, false, false)
×
424
            typ = argextype(args[1], src)
×
425
            typ, isexact = instanceof_tfunc(typ, true)
×
426
            isexact || return (false, false, false)
×
427
            ⊑(𝕃ₒ, typ, Tuple) || return (false, false, false)
×
428
            rt_lb = argextype(args[2], src)
×
429
            rt_ub = argextype(args[3], src)
×
430
            source = argextype(args[5], src)
×
431
            if !(⊑(𝕃ₒ, rt_lb, Type) && ⊑(𝕃ₒ, rt_ub, Type) && ⊑(𝕃ₒ, source, Method))
×
432
                return (false, false, false)
×
433
            end
434
            return (false, true, true)
×
435
        elseif head === :inbounds
424,264✔
436
            return (true, true, true)
×
437
        elseif head === :boundscheck || head === :isdefined || head === :the_exception || head === :copyast
848,423✔
438
            return (false, true, true)
333✔
439
        else
440
            # e.g. :loopinfo
441
            return (false, false, false)
423,931✔
442
        end
443
    end
444
    isa(stmt, SlotNumber) && error("unexpected IR elements")
264,406✔
445
    return (true, true, true)
264,406✔
446
end
447

448
function recompute_effects_flags(𝕃ₒ::AbstractLattice, @nospecialize(stmt), @nospecialize(rt),
449
                                 src::Union{IRCode,IncrementalCompact})
450
    flag = IR_FLAG_NULL
1,235,178✔
451
    (consistent, removable, nothrow) = stmt_effect_flags(𝕃ₒ, stmt, rt, src)
1,425,797✔
452
    if consistent
1,425,797✔
453
        flag |= IR_FLAG_CONSISTENT
558,003✔
454
    end
455
    if removable
1,425,797✔
456
        flag |= IR_FLAGS_REMOVABLE
460,860✔
457
    elseif nothrow
964,937✔
458
        flag |= IR_FLAG_NOTHROW
192,458✔
459
    end
460
    if !iscallstmt(stmt)
1,911,577✔
461
        # There is a bit of a subtle point here, which is that some non-call
462
        # statements (e.g. PiNode) can be UB:, however, we consider it
463
        # illegal to introduce such statements that actually cause UB (for any
464
        # input). Ideally that'd be handled at insertion time (TODO), but for
465
        # the time being just do that here.
466
        flag |= IR_FLAG_NOUB
1,032,115✔
467
    end
468
    return flag
1,425,797✔
469
end
470

471
"""
472
    argextype(x, src::Union{IRCode,IncrementalCompact}) -> t
473
    argextype(x, src::CodeInfo, sptypes::Vector{VarState}) -> t
474

475
Return the type of value `x` in the context of inferred source `src`.
476
Note that `t` might be an extended lattice element.
477
Use `widenconst(t)` to get the native Julia type of `x`.
478
"""
479
argextype(@nospecialize(x), ir::IRCode, sptypes::Vector{VarState} = ir.sptypes) =
5,698,882✔
480
    argextype(x, ir, sptypes, ir.argtypes)
481
function argextype(@nospecialize(x), compact::IncrementalCompact, sptypes::Vector{VarState} = compact.ir.sptypes)
6,484✔
482
    isa(x, AnySSAValue) && return types(compact)[x]
6,798,579✔
483
    return argextype(x, compact, sptypes, compact.ir.argtypes)
3,092,914✔
484
end
485
function argextype(@nospecialize(x), src::CodeInfo, sptypes::Vector{VarState})
486
    return argextype(x, src, sptypes, src.slottypes::Union{Vector{Any},Nothing})
4,554✔
487
end
488
function argextype(
4,935,581✔
489
    @nospecialize(x), src::Union{IRCode,IncrementalCompact,CodeInfo},
490
    sptypes::Vector{VarState}, slottypes::Union{Vector{Any},Nothing})
491
    if isa(x, Expr)
4,983,430✔
492
        if x.head === :static_parameter
×
493
            idx = x.args[1]::Int
×
494
            (1 ≤ idx ≤ length(sptypes)) || throw(InvalidIRError())
×
495
            return sptypes[idx].typ
×
496
        elseif x.head === :boundscheck
×
497
            return Bool
×
498
        elseif x.head === :copyast
×
499
            length(x.args) == 0 && throw(InvalidIRError())
×
500
            return argextype(x.args[1], src, sptypes, slottypes)
×
501
        end
502
        Core.println("argextype called on Expr with head ", x.head,
×
503
                     " which is not valid for IR in argument-position.")
504
        @assert false
×
505
    elseif isa(x, SlotNumber)
4,983,430✔
506
        slottypes === nothing && return Any
×
507
        (1 ≤ x.id ≤ length(slottypes)) || throw(InvalidIRError())
×
508
        return slottypes[x.id]
×
509
    elseif isa(x, SSAValue)
4,983,430✔
510
        return abstract_eval_ssavalue(x, src)
1,006,843✔
511
    elseif isa(x, Argument)
4,725,028✔
512
        slottypes === nothing && return Any
312,949✔
513
        (1 ≤ x.n ≤ length(slottypes)) || throw(InvalidIRError())
313,132✔
514
        return slottypes[x.n]
313,132✔
515
    elseif isa(x, QuoteNode)
4,412,079✔
516
        return Const(x.value)
40,450✔
517
    elseif isa(x, GlobalRef)
4,371,629✔
518
        return abstract_eval_globalref_type(x, src)
3,758,013✔
519
    elseif isa(x, PhiNode) || isa(x, PhiCNode) || isa(x, UpsilonNode)
1,227,217✔
520
        return Any
×
521
    elseif isa(x, PiNode)
613,616✔
522
        return x.typ
×
523
    else
524
        return Const(x)
613,616✔
525
    end
526
end
527
function abstract_eval_ssavalue(s::SSAValue, src::CodeInfo)
×
528
    ssavaluetypes = src.ssavaluetypes
×
529
    if ssavaluetypes isa Int
×
530
        (1 ≤ s.id ≤ ssavaluetypes) || throw(InvalidIRError())
×
531
        return Any
×
532
    else
533
        return abstract_eval_ssavalue(s, ssavaluetypes::Vector{Any})
×
534
    end
535
end
536
abstract_eval_ssavalue(s::SSAValue, src::Union{IRCode,IncrementalCompact}) = types(src)[s]
1,006,897✔
537

538
"""
539
    finishopt!(interp::AbstractInterpreter, opt::OptimizationState, ir::IRCode)
540

541
Called at the end of optimization to store the resulting IR back into the OptimizationState.
542
"""
543
function finishopt!(::AbstractInterpreter, opt::OptimizationState, ir::IRCode)
1,175✔
544
    opt.optresult = OptimizationResult(ir, ccall(:jl_ir_flag_inlining, UInt8, (Any,), opt.src), false)
169,392✔
545
    return nothing
159,530✔
546
end
547

548
function visit_bb_phis!(callback, ir::IRCode, bb::Int)
549
    stmts = ir.cfg.blocks[bb].stmts
×
550
    for idx in stmts
×
551
        stmt = ir[SSAValue(idx)][:stmt]
×
552
        if !isa(stmt, PhiNode)
×
553
            if !is_valid_phiblock_stmt(stmt)
×
554
                return
×
555
            end
556
        else
557
            callback(idx)
×
558
        end
559
    end
×
560
end
561

562
function any_stmt_may_throw(ir::IRCode, bb::Int)
563
    for idx in ir.cfg.blocks[bb].stmts
25,918✔
564
        if !has_flag(ir[SSAValue(idx)], IR_FLAG_NOTHROW)
19,224✔
565
            return true
12,941✔
566
        end
567
    end
12,548✔
568
    return false
18✔
569
end
570

571
visit_conditional_successors(callback, ir::IRCode, bb::Int) = # used for test
6✔
572
    visit_conditional_successors(callback, LazyPostDomtree(ir), ir, bb)
573
function visit_conditional_successors(callback, lazypostdomtree::LazyPostDomtree, ir::IRCode, bb::Int)
12,947✔
574
    visited = BitSet((bb,))
12,947✔
575
    worklist = Int[bb]
12,947✔
576
    while !isempty(worklist)
12,969✔
577
        thisbb = popfirst!(worklist)
12,963✔
578
        for succ in ir.cfg.blocks[thisbb].succs
12,963✔
579
            succ in visited && continue
13,129✔
580
            push!(visited, succ)
13,129✔
581
            if postdominates(get!(lazypostdomtree), succ, bb)
19,274✔
582
                # this successor is not conditional, so no need to visit it further
583
                continue
154✔
584
            elseif callback(succ)
19,258✔
585
                return true
12,941✔
586
            else
587
                push!(worklist, succ)
34✔
588
            end
589
        end
188✔
590
    end
22✔
591
    return false
6✔
592
end
593

594
struct AugmentedDomtree
595
    cfg::CFG
×
596
    domtree::DomTree
597
end
598

599
mutable struct LazyAugmentedDomtree
600
    const ir::IRCode
601
    agdomtree::AugmentedDomtree
602
    LazyAugmentedDomtree(ir::IRCode) = new(ir)
158,092✔
603
end
604

605
function get!(lazyagdomtree::LazyAugmentedDomtree)
×
606
    isdefined(lazyagdomtree, :agdomtree) && return lazyagdomtree.agdomtree
×
607
    ir = lazyagdomtree.ir
×
608
    cfg = copy(ir.cfg)
×
609
    # Add a virtual basic block to represent the exit
610
    push!(cfg.blocks, BasicBlock(StmtRange(0:-1)))
×
611
    for bb = 1:(length(cfg.blocks)-1)
×
612
        terminator = ir[SSAValue(last(cfg.blocks[bb].stmts))][:stmt]
×
613
        if isa(terminator, ReturnNode) && isdefined(terminator, :val)
×
614
            cfg_insert_edge!(cfg, bb, length(cfg.blocks))
×
615
        end
616
    end
×
617
    domtree = construct_domtree(cfg)
×
618
    return lazyagdomtree.agdomtree = AugmentedDomtree(cfg, domtree)
×
619
end
620

621
mutable struct PostOptAnalysisState
622
    const result::InferenceResult
623
    const ir::IRCode
624
    const inconsistent::BitSetBoundedMinPrioritySet
625
    const tpdum::TwoPhaseDefUseMap
626
    const lazypostdomtree::LazyPostDomtree
627
    const lazyagdomtree::LazyAugmentedDomtree
628
    const ea_analysis_pending::Vector{Int}
629
    all_retpaths_consistent::Bool
630
    all_effect_free::Bool
631
    effect_free_if_argmem_only::Union{Nothing,Bool}
632
    all_nothrow::Bool
633
    all_noub::Bool
634
    any_conditional_ub::Bool
635
    nortcall::Bool
636
    function PostOptAnalysisState(result::InferenceResult, ir::IRCode)
637
        inconsistent = BitSetBoundedMinPrioritySet(length(ir.stmts))
158,092✔
638
        tpdum = TwoPhaseDefUseMap(length(ir.stmts))
4,959,719✔
639
        lazypostdomtree = LazyPostDomtree(ir)
158,092✔
640
        lazyagdomtree = LazyAugmentedDomtree(ir)
158,092✔
641
        return new(result, ir, inconsistent, tpdum, lazypostdomtree, lazyagdomtree, Int[],
158,092✔
642
                   true, true, nothing, true, true, false, true)
643
    end
644
end
645

646
give_up_refinements!(sv::PostOptAnalysisState) =
3,126✔
647
    sv.all_retpaths_consistent = sv.all_effect_free = sv.effect_free_if_argmem_only =
648
    sv.all_nothrow = sv.all_noub = sv.nortcall = false
649

650
function any_refinable(sv::PostOptAnalysisState)
651
    effects = sv.result.ipo_effects
160,636✔
652
    return ((!is_consistent(effects) & sv.all_retpaths_consistent) |
160,636✔
653
            (!is_effect_free(effects) & sv.all_effect_free) |
654
            (!is_nothrow(effects) & sv.all_nothrow) |
655
            (!is_noub(effects) & sv.all_noub) |
656
            (!is_nortcall(effects) & sv.nortcall))
657
end
658

659
struct GetNativeEscapeCache{CodeCache}
660
    code_cache::CodeCache
661
    GetNativeEscapeCache(code_cache::CodeCache) where CodeCache = new{CodeCache}(code_cache)
×
662
end
663
GetNativeEscapeCache(interp::AbstractInterpreter) = GetNativeEscapeCache(code_cache(interp))
×
664
function ((; code_cache)::GetNativeEscapeCache)(codeinst::Union{CodeInstance,MethodInstance})
×
665
    if codeinst isa MethodInstance
×
666
        codeinst = get(code_cache, codeinst, nothing)
×
667
        codeinst === nothing && return false
×
668
    end
669
    argescapes = traverse_analysis_results(codeinst) do @nospecialize result
×
670
        return result isa EscapeAnalysis.ArgEscapeCache ? result : nothing
×
671
    end
672
    if argescapes !== nothing
×
673
        return argescapes
×
674
    end
675
    effects = codeinst isa CodeInstance ? decode_effects(codeinst.ipo_purity_bits) : codeinst.ipo_effects
×
676
    if is_effect_free(effects) && is_inaccessiblememonly(effects)
×
677
        # We might not have run EA on simple frames without any escapes (e.g. when optimization
678
        # is skipped when result is constant-folded by abstract interpretation). If those
679
        # frames aren't inlined, the accuracy of EA for caller context takes a big hit.
680
        # This is a HACK to avoid that, but obviously, a more comprehensive fix would be ideal.
681
        return true
×
682
    end
683
    return false
×
684
end
685

686
function refine_effects!(interp::AbstractInterpreter, opt::OptimizationState, sv::PostOptAnalysisState)
158,092✔
687
    if !is_effect_free(sv.result.ipo_effects) && sv.all_effect_free && !isempty(sv.ea_analysis_pending)
158,092✔
688
        ir = sv.ir
×
689
        nargs = Int(opt.src.nargs)
×
690
        estate = EscapeAnalysis.analyze_escapes(ir, nargs, optimizer_lattice(interp), get_escape_cache(interp))
×
691
        argescapes = EscapeAnalysis.ArgEscapeCache(estate)
×
692
        stack_analysis_result!(sv.result, argescapes)
×
693
        validate_mutable_arg_escapes!(estate, sv)
×
694
    end
695

696
    any_refinable(sv) || return false
315,560✔
697
    effects = sv.result.ipo_effects
624✔
698
    sv.result.ipo_effects = Effects(effects;
624✔
699
        consistent = sv.all_retpaths_consistent ? ALWAYS_TRUE : effects.consistent,
700
        effect_free = sv.all_effect_free ? ALWAYS_TRUE :
701
                      sv.effect_free_if_argmem_only === true ? EFFECT_FREE_IF_INACCESSIBLEMEMONLY : effects.effect_free,
702
        nothrow = sv.all_nothrow ? true : effects.nothrow,
703
        noub = sv.all_noub ? (sv.any_conditional_ub ? NOUB_IF_NOINBOUNDS : ALWAYS_TRUE) : effects.noub,
704
        nortcall = sv.nortcall ? true : effects.nortcall)
705
    return true
624✔
706
end
707

708
function is_ipo_dataflow_analysis_profitable(effects::Effects)
709
    return !(is_consistent(effects) && is_effect_free(effects) &&
158,394✔
710
             is_nothrow(effects) && is_noub(effects))
711
end
712

713
function iscall_with_boundscheck(@nospecialize(stmt), sv::PostOptAnalysisState)
714
    isexpr(stmt, :call) || return false
8,042,572✔
715
    ft = argextype(stmt.args[1], sv.ir)
278,584✔
716
    f = singleton_type(ft)
278,584✔
717
    f === nothing && return false
278,584✔
718
    if f === getfield
278,584✔
719
        nargs = 4
77,754✔
720
    elseif f === memoryrefnew
200,830✔
721
        nargs= 3
20,650✔
722
    elseif f === memoryrefget || f === memoryref_isassigned
360,148✔
723
        nargs = 4
212✔
724
    elseif f === memoryrefset!
179,968✔
725
        nargs = 5
12,294✔
726
    else
727
        return false
167,674✔
728
    end
729
    length(stmt.args) < nargs && return false
110,910✔
730
    boundscheck = stmt.args[end]
26,931✔
731
    argextype(boundscheck, sv.ir) === Bool || return false
51,589✔
732
    isa(boundscheck, SSAValue) || return false
2,273✔
733
    return true
2,273✔
734
end
735

736
function check_all_args_noescape!(sv::PostOptAnalysisState, ir::IRCode, @nospecialize(stmt),
×
737
                                  estate::EscapeAnalysis.EscapeState)
738
    stmt isa Expr || return false
×
739
    if isexpr(stmt, :invoke)
×
740
        startidx = 2
×
741
    elseif isexpr(stmt, :new)
×
742
        startidx = 1
×
743
    else
744
        return false
×
745
    end
746
    has_no_escape(x::EscapeAnalysis.EscapeInfo) =
×
747
        EscapeAnalysis.has_no_escape(EscapeAnalysis.ignore_argescape(x))
748
    for i = startidx:length(stmt.args)
×
749
        arg = stmt.args[i]
×
750
        argt = argextype(arg, ir)
×
751
        if is_mutation_free_argtype(argt)
×
752
            continue
×
753
        end
754
        # See if we can find the allocation
755
        if isa(arg, Argument)
×
756
            if has_no_escape(estate[arg])
×
757
                # Even if we prove everything else effect_free, the best we can
758
                # say is :effect_free_if_argmem_only
759
                if sv.effect_free_if_argmem_only === nothing
×
760
                    sv.effect_free_if_argmem_only = true
×
761
                end
762
            else
763
                sv.effect_free_if_argmem_only = false
×
764
            end
765
            return false
×
766
        elseif isa(arg, SSAValue)
×
767
            has_no_escape(estate[arg]) || return false
×
768
            check_all_args_noescape!(sv, ir, ir[arg][:stmt], estate) || return false
×
769
        else
770
            return false
×
771
        end
772
    end
×
773
    return true
×
774
end
775

776
function validate_mutable_arg_escapes!(estate::EscapeAnalysis.EscapeState, sv::PostOptAnalysisState)
777
    ir = sv.ir
×
778
    for idx in sv.ea_analysis_pending
×
779
        # See if any mutable memory was allocated in this function and determined
780
        # not to escape.
781
        inst = ir[SSAValue(idx)]
×
782
        stmt = inst[:stmt]
×
783
        if !check_all_args_noescape!(sv, ir, stmt, estate)
×
784
            return sv.all_effect_free = false
×
785
        end
786
    end
×
787
    return true
×
788
end
789

790
function is_conditional_noub(inst::Instruction, sv::PostOptAnalysisState)
791
    stmt = inst[:stmt]
69,191✔
792
    iscall_with_boundscheck(stmt, sv) || return false
137,910✔
793
    barg = stmt.args[end]::SSAValue
472✔
794
    bstmt = sv.ir[barg][:stmt]
472✔
795
    isexpr(bstmt, :boundscheck) || return false
472✔
796
    # If IR_FLAG_INBOUNDS is already set, no more conditional ub
797
    (!isempty(bstmt.args) && bstmt.args[1] === false) && return false
472✔
798
    return true
324✔
799
end
800

801
function scan_non_dataflow_flags!(inst::Instruction, sv::PostOptAnalysisState)
4,093,931✔
802
    flag = inst[:flag]
4,093,931✔
803
    # If we can prove that the argmem does not escape the current function, we can
804
    # refine this to :effect_free.
805
    needs_ea_validation = has_flag(flag, IR_FLAGS_NEEDS_EA)
4,093,931✔
806
    stmt = inst[:stmt]
4,093,931✔
807
    if !needs_ea_validation
4,093,931✔
808
        if !isterminator(stmt) && stmt !== nothing
8,100,684✔
809
            # ignore control flow node – they are not removable on their own and thus not
810
            # have `IR_FLAG_EFFECT_FREE` but still do not taint `:effect_free`-ness of
811
            # the whole method invocation
812
            sv.all_effect_free &= has_flag(flag, IR_FLAG_EFFECT_FREE)
3,799,004✔
813
        end
814
    elseif sv.all_effect_free
13,683✔
815
        if (isexpr(stmt, :invoke) || isexpr(stmt, :new) ||
×
816
            # HACK for performance: limit the scope of EA to code with object field access only,
817
            # since its abilities to reason about e.g. arrays are currently very limited anyways.
818
            is_known_call(stmt, setfield!, sv.ir))
819
            push!(sv.ea_analysis_pending, inst.idx)
×
820
        else
821
            sv.all_effect_free = false
×
822
        end
823
    end
824
    sv.all_nothrow &= has_flag(flag, IR_FLAG_NOTHROW)
4,093,931✔
825
    if !has_flag(flag, IR_FLAG_NOUB)
4,093,931✔
826
        # Special case: `:boundscheck` into `getfield` or memory operations is `:noub_if_noinbounds`
827
        if is_conditional_noub(inst, sv)
69,515✔
828
            sv.any_conditional_ub = true
324✔
829
        else
830
            sv.all_noub = false
68,867✔
831
        end
832
    end
833
    if !has_flag(flag, IR_FLAG_NORTCALL)
4,093,931✔
834
        # if a function call that might invoke `Core.Compiler.return_type` has been deleted,
835
        # there's no need to taint with `:nortcall`, allowing concrete evaluation
836
        if iscallstmt(stmt)
7,162,461✔
837
            sv.nortcall = false
50,484✔
838
        end
839
    end
840
    nothing
4,093,931✔
841
end
842

843
function scan_inconsistency!(inst::Instruction, sv::PostOptAnalysisState)
4,091,387✔
844
    flag = inst[:flag]
4,091,387✔
845
    stmt_inconsistent = !has_flag(flag, IR_FLAG_CONSISTENT)
4,091,387✔
846
    stmt = inst[:stmt]
4,091,387✔
847
    # Special case: For `getfield` and memory operations, we allow inconsistency of the :boundscheck argument
848
    (; inconsistent, tpdum) = sv
4,091,387✔
849
    if iscall_with_boundscheck(stmt, sv)
4,346,564✔
850
        for i = 1:length(stmt.args)
3,602✔
851
            val = stmt.args[i]
7,294✔
852
            # SSAValue should be the only permitted argument type which can be inconsistent found here.
853
            # Others (e.g. GlobalRef) should have been moved to statement position. See stmt_effect_flags.
854
            if isa(val, SSAValue)
7,294✔
855
                if i < length(stmt.args)  # not the boundscheck argument (which is last)
3,896✔
856
                    stmt_inconsistent |= val.id in inconsistent
4,190✔
857
                end
858
                count!(tpdum, val)
3,896✔
859
            end
860
        end
12,787✔
861
    else
862
        for ur in userefs(stmt)
4,660,962✔
863
            val = ur[]
1,437,406✔
864
            if isa(val, SSAValue)
1,437,406✔
865
                stmt_inconsistent |= val.id in inconsistent
923,294✔
866
                count!(tpdum, val)
461,647✔
867
            end
868
        end
1,437,406✔
869
    end
870
    stmt_inconsistent && push!(inconsistent, inst.idx)
4,091,387✔
871
    return stmt_inconsistent
4,091,387✔
872
end
873

874
struct ScanStmt
875
    sv::PostOptAnalysisState
158,092✔
876
end
877

878
function ((; sv)::ScanStmt)(inst::Instruction, lstmt::Int, bb::Int)
4,094,513✔
879
    stmt = inst[:stmt]
4,094,513✔
880

881
    if isa(stmt, EnterNode)
4,094,513✔
882
        # try/catch not yet modeled
883
        give_up_refinements!(sv)
3,126✔
884
        return true # don't bail out early -- can cause tpdum counts to be off
3,126✔
885
    end
886

887
    scan_non_dataflow_flags!(inst, sv)
4,091,387✔
888

889
    stmt_inconsistent = scan_inconsistency!(inst, sv)
4,091,387✔
890

891
    if stmt_inconsistent
4,091,387✔
892
        if !has_flag(inst[:flag], IR_FLAG_NOTHROW)
3,812,916✔
893
            # Taint :consistent if this statement may raise since :consistent requires
894
            # consistent termination. TODO: Separate :consistent_return and :consistent_termination from :consistent.
895
            sv.all_retpaths_consistent = false
3,445,897✔
896
        end
897
        if inst.idx == lstmt
3,812,916✔
898
            if isa(stmt, ReturnNode) && isdefined(stmt, :val)
122,879✔
899
                sv.all_retpaths_consistent = false
46,259✔
900
            elseif isa(stmt, GotoIfNot)
76,620✔
901
                # Conditional Branch with inconsistent condition.
902
                # If we do not know this function terminates, taint consistency, now,
903
                # :consistent requires consistent termination. TODO: Just look at the
904
                # inconsistent region.
905
                if !sv.result.ipo_effects.terminates
31,580✔
906
                    sv.all_retpaths_consistent = false
18,639✔
907
                elseif visit_conditional_successors(sv.lazypostdomtree, sv.ir, bb) do succ::Int
12,941✔
908
                        return any_stmt_may_throw(sv.ir, succ)
19,242✔
909
                    end
910
                    # check if this `GotoIfNot` leads to conditional throws, which taints consistency
911
                    sv.all_retpaths_consistent = false
12,941✔
912
                else
913
                    (; cfg, domtree) = get!(sv.lazyagdomtree)
×
914
                    for succ in iterated_dominance_frontier(cfg, BlockLiveness(sv.ir.cfg.blocks[bb].succs, nothing), domtree)
×
915
                        if succ == length(cfg.blocks)
×
916
                            # Phi node in the virtual exit -> We have a conditional
917
                            # return. TODO: Check if all the retvals are egal.
918
                            sv.all_retpaths_consistent = false
×
919
                        else
920
                            visit_bb_phis!(sv.ir, succ) do phiidx::Int
×
921
                                push!(sv.inconsistent, phiidx)
×
922
                            end
923
                        end
924
                    end
×
925
                end
926
            end
927
        end
928
    end
929

930
    # Do not bail out early, as this can cause tpdum counts to be off.
931
    # # bail out early if there are no possibilities to refine the effects
932
    # if !any_refinable(sv)
933
    #     return nothing
934
    # end
935

936
    return true
4,091,387✔
937
end
938

939
function check_inconsistentcy!(sv::PostOptAnalysisState, scanner::BBScanner)
3,264✔
940
    (; ir, inconsistent, tpdum) = sv
3,264✔
941

942
    sv.all_retpaths_consistent || return
6,528✔
943
    scan!(ScanStmt(sv), scanner, false)
×
944
    sv.all_retpaths_consistent || return
×
945
    complete!(tpdum); push!(scanner.bb_ip, 1)
×
946
    populate_def_use_map!(tpdum, scanner)
×
947

948
    stmt_ip = BitSetBoundedMinPrioritySet(length(ir.stmts))
×
949
    for def in inconsistent
×
950
        append!(stmt_ip, tpdum[def])
×
951
   end
×
952
    lazydomtree = LazyDomtree(ir)
×
953
    while !isempty(stmt_ip)
×
954
        idx = popfirst!(stmt_ip)
×
955
        idx in inconsistent && continue # already processed
×
956
        inst = ir[SSAValue(idx)]
×
957
        stmt = inst[:stmt]
×
958
        if iscall_with_boundscheck(stmt, sv)
×
959
            # recompute inconsistent flags for call while skipping boundscheck (last) argument
960
            any_non_boundscheck_inconsistent = false
×
961
            for i = 1:(length(stmt.args)-1)
×
962
                val = stmt.args[i]
×
963
                if isa(val, SSAValue)
×
964
                    any_non_boundscheck_inconsistent |= val.id in inconsistent
×
965
                    any_non_boundscheck_inconsistent && break
×
966
                end
967
            end
×
968
            any_non_boundscheck_inconsistent || continue
×
969
        elseif isa(stmt, ReturnNode)
×
970
            sv.all_retpaths_consistent = false
×
971
            return
×
972
        elseif isa(stmt, GotoIfNot)
×
973
            bb = block_for_inst(ir, idx)
×
974
            cfg = ir.cfg
×
975
            blockliveness = BlockLiveness(cfg.blocks[bb].succs, nothing)
×
976
            for succ in iterated_dominance_frontier(cfg, blockliveness, get!(lazydomtree))
×
977
                visit_bb_phis!(ir, succ) do phiidx::Int
×
978
                    phiidx in inconsistent || push!(stmt_ip, phiidx)
×
979
                end
980
            end
×
981
        end
982
        push!(inconsistent, idx)
×
983
        append!(stmt_ip, tpdum[idx])
×
984
    end
×
985
end
986

987
function ipo_dataflow_analysis!(interp::AbstractInterpreter, opt::OptimizationState,
158,394✔
988
                                ir::IRCode, result::InferenceResult)
989
    if !is_ipo_dataflow_analysis_profitable(result.ipo_effects)
158,394✔
990
        return false
302✔
991
    end
992

993
    @assert isempty(ir.new_nodes) "IRCode should be compacted before post-opt analysis"
158,092✔
994

995
    sv = PostOptAnalysisState(result, ir)
4,959,719✔
996
    scanner = BBScanner(ir)
158,092✔
997

998
    completed_scan = scan!(ScanStmt(sv), scanner, true)
158,092✔
999

1000
    if !completed_scan
158,092✔
1001
        # finish scanning for all_retpaths_consistent computation
1002
        check_inconsistentcy!(sv, scanner)
3,588✔
1003
        if !sv.all_retpaths_consistent
3,588✔
1004
            # No longer any dataflow concerns, just scan the flags
1005
            scan!(scanner, false) do inst::Instruction, ::Int, ::Int
3,588✔
1006
                scan_non_dataflow_flags!(inst, sv)
2,544✔
1007
                # bail out early if there are no possibilities to refine the effects
1008
                if !any_refinable(sv)
2,544✔
1009
                    return nothing
2,544✔
1010
                end
1011
                return true
×
1012
            end
1013
        end
1014
    end
1015

1016
    return refine_effects!(interp, opt, sv)
158,092✔
1017
end
1018

1019
# run the optimization work
1020
function optimize(interp::AbstractInterpreter, opt::OptimizationState, caller::InferenceResult)
157,155✔
1021
    @zone "CC: OPTIMIZER" ir = run_passes_ipo_safe(opt.src, opt)
168,223✔
1022
    ipo_dataflow_analysis!(interp, opt, ir, caller)
168,217✔
1023
    finishopt!(interp, opt, ir)
168,217✔
1024
    return nothing
158,426✔
1025
end
1026

1027
const ALL_PASS_NAMES = String[]
1028
macro pass(name::String, expr)
1029
    optimize_until = esc(:optimize_until)
1030
    stage = esc(:__stage__)
1031
    macrocall = :(@zone $name $(esc(expr)))
1032
    macrocall.args[2] = __source__  # `@timeit` may want to use it
1033
    push!(ALL_PASS_NAMES, name)
1034
    quote
1035
        $macrocall
1,203,603✔
1036
        matchpass($optimize_until, ($stage += 1), $name) && $(esc(:(@goto __done__)))
1,109,765✔
1037
    end
1038
end
1039

1040
matchpass(optimize_until::Int, stage, _) = optimize_until == stage
117✔
1041
matchpass(optimize_until::String, _, name) = optimize_until == name
130✔
1042
matchpass(::Nothing, _, _) = false
×
1043

1044
function run_passes_ipo_safe(
158,552✔
1045
    ci::CodeInfo,
1046
    sv::OptimizationState,
1047
    optimize_until::Union{Nothing, Int, String} = nothing)  # run all passes by default
1048
    if optimize_until isa String && !contains_is(ALL_PASS_NAMES, optimize_until)
326,884✔
1049
        error("invalid `optimize_until` argument, no such optimization pass")
2✔
1050
    elseif optimize_until isa Int && (optimize_until < 1 || optimize_until > length(ALL_PASS_NAMES))
158,580✔
1051
        error("invalid `optimize_until` argument, no such optimization pass")
2✔
1052
    end
1053

1054
    __stage__ = 0  # used by @pass
158,548✔
1055
    # NOTE: The pass name MUST be unique for `optimize_until::String` to work
1056
    @pass "CC: CONVERT"   ir = convert_to_ircode(ci, sv)
1057
    @pass "CC: SLOT2REG"  ir = slot2reg(ir, ci, sv)
1058
    # TODO: Domsorting can produce an updated domtree - no need to recompute here
1059
    @pass "CC: COMPACT_1" ir = compact!(ir)
1060
    @pass "CC: INLINING"  ir = ssa_inlining_pass!(ir, sv.inlining, ci.propagate_inbounds)
1061
    # @zone "CC: VERIFY 2" verify_ir(ir)
1062
    @pass "CC: COMPACT_2" ir = compact!(ir)
1063
    @pass "CC: SROA"      ir = sroa_pass!(ir, sv.inlining)
1064
    @pass "CC: ADCE"      (ir, made_changes) = adce_pass!(ir, sv.inlining)
1065
    if made_changes
158,483✔
1066
        @pass "CC: COMPACT_3" ir = compact!(ir, true)
1067
    end
1068
    if is_asserts()
158,483✔
1069
        @zone "CC: VERIFY_3" begin
1070
            verify_ir(ir, true, false, optimizer_lattice(sv.inlining.interp), sv.linfo)
×
1071
            verify_linetable(ir.debuginfo, length(ir.stmts))
×
1072
        end
1073
    end
1074
    @label __done__  # used by @pass
1075
    return ir
158,545✔
1076
end
1077

1078
function strip_trailing_junk!(code::Vector{Any}, ssavaluetypes::Vector{Any}, ssaflags::Vector, debuginfo::DebugInfoStream, cfg::CFG, info::Vector{CallInfo})
129,894✔
1079
    # Remove `nothing`s at the end, we don't handle them well
1080
    # (we expect the last instruction to be a terminator)
1081
    codelocs = debuginfo.codelocs
129,894✔
1082
    for i = length(code):-1:1
259,788✔
1083
        if code[i] !== nothing
129,894✔
1084
            resize!(code, i)
129,894✔
1085
            resize!(ssavaluetypes, i)
129,894✔
1086
            resize!(codelocs, 3i)
129,894✔
1087
            resize!(info, i)
129,894✔
1088
            resize!(ssaflags, i)
129,894✔
1089
            break
129,894✔
1090
        end
1091
    end
×
1092
    # If the last instruction is not a terminator, add one. This can
1093
    # happen for implicit return on dead branches.
1094
    term = code[end]
129,894✔
1095
    if !isa(term, GotoIfNot) && !isa(term, GotoNode) && !isa(term, ReturnNode)
129,894✔
1096
        push!(code, ReturnNode())
9,087✔
1097
        push!(ssavaluetypes, Union{})
9,087✔
1098
        push!(codelocs, 0, 0, 0)
9,087✔
1099
        push!(info, NoCallInfo())
9,087✔
1100
        push!(ssaflags, IR_FLAG_NOTHROW)
9,087✔
1101

1102
        # Update CFG to include appended terminator
1103
        old_range = cfg.blocks[end].stmts
9,087✔
1104
        new_range = StmtRange(first(old_range), last(old_range) + 1)
9,087✔
1105
        cfg.blocks[end] = BasicBlock(cfg.blocks[end], new_range)
9,087✔
1106
        (length(cfg.index) == length(cfg.blocks)) && (cfg.index[end] += 1)
9,087✔
1107
    end
1108
    nothing
129,894✔
1109
end
1110

1111
function changed_lineinfo(di::DebugInfo, codeloc::Int, prevloc::Int)
1,343,567✔
1112
    while true
1,343,567✔
1113
        next = getdebugidx(di, codeloc)
1,344,942✔
1114
        line = next[1]
1,344,942✔
1115
        line < 0 && return false # invalid info
1,344,942✔
1116
        line == 0 && next[2] == 0 && return false # no new info
1,344,913✔
1117
        prevloc <= 0 && return true # no old info
1,190,777✔
1118
        prev = getdebugidx(di, prevloc)
1,060,897✔
1119
        next === prev && return false # exactly identical
1,060,897✔
1120
        prevline = prev[1]
75,419✔
1121
        prevline < 0 && return true # previous invalid info, now valid
75,419✔
1122
        edge = next[2]
75,419✔
1123
        edge === prev[2] || return true # change to this edge
75,515✔
1124
        linetable = di.linetable
75,323✔
1125
        # check for change to line number here
1126
        if linetable === nothing || line == 0
75,323✔
1127
            line == prevline || return true
149,271✔
1128
        else
1129
            changed_lineinfo(linetable::DebugInfo, Int(line), Int(prevline)) && return true
×
1130
        end
1131
        # check for change to edge here
1132
        edge == 0 && return false # no edge here
1,375✔
1133
        di = di.edges[Int(edge)]::DebugInfo
1,375✔
1134
        codeloc = Int(next[3])
1,375✔
1135
        prevloc = Int(prev[3])
1,375✔
1136
    end
1,375✔
1137
end
1138

1139
function convert_to_ircode(ci::CodeInfo, sv::OptimizationState)
158,412✔
1140
    # Update control-flow to reflect any unreachable branches.
1141
    ssavaluetypes = ci.ssavaluetypes::Vector{Any}
158,412✔
1142
    ci.code = code = copy_exprargs(ci.code)
1,637,676✔
1143
    di = DebugInfoStream(sv.linfo, ci.debuginfo, length(code))
158,412✔
1144
    codelocs = di.codelocs
158,412✔
1145
    ssaflags = ci.ssaflags
158,412✔
1146
    for i = 1:length(code)
288,306✔
1147
        expr = code[i]
1,637,676✔
1148
        if !(i in sv.unreachable)
1,637,676✔
1149
            if isa(expr, GotoIfNot)
1,444,219✔
1150
                # Replace this live GotoIfNot with:
1151
                # - no-op if :nothrow and the branch target is unreachable
1152
                # - cond if :nothrow and both targets are unreachable
1153
                # - typeassert if must-throw
1154
                block = block_for_inst(sv.cfg, i)
59,863✔
1155
                if ssavaluetypes[i] === Bottom
59,863✔
1156
                    destblock = block_for_inst(sv.cfg, expr.dest)
×
1157
                    cfg_delete_edge!(sv.cfg, block, block + 1)
×
1158
                    ((block + 1) != destblock) && cfg_delete_edge!(sv.cfg, block, destblock)
×
1159
                    expr = Expr(:call, Core.typeassert, expr.cond, Bool)
×
1160
                elseif i + 1 in sv.unreachable
59,863✔
1161
                    @assert has_flag(ssaflags[i], IR_FLAG_NOTHROW)
13,926✔
1162
                    cfg_delete_edge!(sv.cfg, block, block + 1)
13,926✔
1163
                    expr = GotoNode(expr.dest)
13,926✔
1164
                elseif expr.dest in sv.unreachable
45,937✔
1165
                    @assert has_flag(ssaflags[i], IR_FLAG_NOTHROW)
37,454✔
1166
                    cfg_delete_edge!(sv.cfg, block, block_for_inst(sv.cfg, expr.dest))
37,454✔
1167
                    expr = nothing
37,454✔
1168
                end
1169
                code[i] = expr
59,863✔
1170
            elseif isa(expr, EnterNode)
1,384,356✔
1171
                catchdest = expr.catch_dest
3,285✔
1172
                if catchdest in sv.unreachable
3,285✔
1173
                    cfg_delete_edge!(sv.cfg, block_for_inst(sv.cfg, i), block_for_inst(sv.cfg, catchdest))
15✔
1174
                    if isdefined(expr, :scope)
15✔
1175
                        # We've proven that nothing inside the enter region throws,
1176
                        # but we don't yet know whether something might read the scope,
1177
                        # so we need to retain this enter for the time being. However,
1178
                        # we use the special marker `0` to indicate that setting up
1179
                        # the try/catch frame is not required.
1180
                        code[i] = EnterNode(expr, 0)
12✔
1181
                    else
1182
                        code[i] = nothing
3✔
1183
                    end
1184
                end
1185
            elseif isa(expr, PhiNode)
1,381,071✔
1186
                new_edges = Int32[]
×
1187
                new_vals = Any[]
×
1188
                for j = 1:length(expr.edges)
×
1189
                    edge = expr.edges[j]
×
1190
                    (edge in sv.unreachable || (ssavaluetypes[edge] === Union{} && !isa(code[edge], PhiNode))) && continue
×
1191
                    push!(new_edges, edge)
×
1192
                    if isassigned(expr.values, j)
×
1193
                        push!(new_vals, expr.values[j])
×
1194
                    else
1195
                        resize!(new_vals, length(new_edges))
×
1196
                    end
1197
                end
×
1198
                code[i] = PhiNode(new_edges, new_vals)
×
1199
            end
1200
        end
1201
    end
3,116,940✔
1202

1203
    # Go through and add an unreachable node after every
1204
    # Union{} call. Then reindex labels.
1205
    stmtinfo = sv.stmt_info
158,412✔
1206
    meta = Expr[]
158,412✔
1207
    idx = 1
158,412✔
1208
    oldidx = 1
158,412✔
1209
    nstmts = length(code)
158,412✔
1210
    ssachangemap = labelchangemap = blockchangemap = nothing
158,412✔
1211
    prevloc = 0
158,412✔
1212
    while idx <= length(code)
1,788,899✔
1213
        if sv.insert_coverage && changed_lineinfo(ci.debuginfo, oldidx, prevloc)
1,630,487✔
1214
            # insert a side-effect instruction before the current instruction in the same basic block
1215
            insert!(code, idx, Expr(:code_coverage_effect))
258,341✔
1216
            splice!(codelocs, 3idx-2:3idx-3, (codelocs[3idx-2], codelocs[3idx-1], codelocs[3idx-0]))
258,341✔
1217
            insert!(ssavaluetypes, idx, Nothing)
258,341✔
1218
            insert!(stmtinfo, idx, NoCallInfo())
258,341✔
1219
            insert!(ssaflags, idx, IR_FLAG_NULL)
258,341✔
1220
            if ssachangemap === nothing
258,341✔
1221
                ssachangemap = fill(0, nstmts)
1,637,479✔
1222
            end
1223
            if labelchangemap === nothing
258,341✔
1224
                labelchangemap = fill(0, nstmts)
1,637,479✔
1225
            end
1226
            ssachangemap[oldidx] += 1
258,341✔
1227
            if oldidx < length(labelchangemap)
258,341✔
1228
                labelchangemap[oldidx + 1] += 1
258,069✔
1229
            end
1230
            if blockchangemap === nothing
258,341✔
1231
                blockchangemap = fill(0, length(sv.cfg.blocks))
322,744✔
1232
            end
1233
            blockchangemap[block_for_inst(sv.cfg, oldidx)] += 1
258,341✔
1234
            idx += 1
258,341✔
1235
            prevloc = oldidx
258,341✔
1236
        end
1237
        if ssavaluetypes[idx] === Union{} && !(oldidx in sv.unreachable) && !isa(code[idx], PhiNode)
1,630,487✔
1238
            # We should have converted any must-throw terminators to an equivalent w/o control-flow edges
1239
            @assert !isterminator(code[idx])
4,754✔
1240

1241
            block = block_for_inst(sv.cfg, oldidx)
4,754✔
1242
            block_end = last(sv.cfg.blocks[block].stmts) + (idx - oldidx)
4,754✔
1243

1244
            # Delete all successors to this basic block
1245
            for succ in sv.cfg.blocks[block].succs
4,754✔
1246
                preds = sv.cfg.blocks[succ].preds
1,243✔
1247
                deleteat!(preds, findfirst(x::Int->x==block, preds)::Int)
6,537✔
1248
            end
1,243✔
1249
            empty!(sv.cfg.blocks[block].succs)
4,754✔
1250

1251
            if !(idx < length(code) && isa(code[idx + 1], ReturnNode) && !isdefined((code[idx + 1]::ReturnNode), :val))
4,754✔
1252
                # Any statements from here to the end of the block have been wrapped in Core.Const(...)
1253
                # by type inference (effectively deleting them). Only task left is to replace the block
1254
                # terminator with an explicit `unreachable` marker.
1255

1256
                if block_end > idx
4,754✔
1257
                    if is_asserts()
4,153✔
1258
                        # Verify that type-inference did its job
1259
                        for i = (oldidx + 1):last(sv.cfg.blocks[block].stmts)
×
1260
                            @assert i in sv.unreachable
×
1261
                        end
×
1262
                    end
1263
                    code[block_end] = ReturnNode()
4,153✔
1264
                    codelocs[3block_end-2], codelocs[3block_end-1], codelocs[3block_end-0] = (codelocs[3idx-2], codelocs[3idx-1], codelocs[3idx-0])
4,153✔
1265
                    ssavaluetypes[block_end] = Union{}
4,153✔
1266
                    stmtinfo[block_end] = NoCallInfo()
4,153✔
1267
                    ssaflags[block_end] = IR_FLAG_NOTHROW
4,153✔
1268
                    idx += block_end - idx
4,153✔
1269
                else
1270
                    insert!(code, idx + 1, ReturnNode())
601✔
1271
                    splice!(codelocs, 3idx-2:3idx-3, (codelocs[3idx-2], codelocs[3idx-1], codelocs[3idx-0]))
601✔
1272
                    insert!(ssavaluetypes, idx + 1, Union{})
601✔
1273
                    insert!(stmtinfo, idx + 1, NoCallInfo())
601✔
1274
                    insert!(ssaflags, idx + 1, IR_FLAG_NOTHROW)
601✔
1275
                    if ssachangemap === nothing
601✔
1276
                        ssachangemap = fill(0, nstmts)
×
1277
                    end
1278
                    if labelchangemap === nothing
601✔
1279
                        labelchangemap = sv.insert_coverage ? fill(0, nstmts) : ssachangemap
×
1280
                    end
1281
                    if oldidx < length(ssachangemap)
601✔
1282
                        ssachangemap[oldidx + 1] += 1
601✔
1283
                        sv.insert_coverage && (labelchangemap[oldidx + 1] += 1)
601✔
1284
                    end
1285
                    if blockchangemap === nothing
601✔
1286
                        blockchangemap = fill(0, length(sv.cfg.blocks))
×
1287
                    end
1288
                    blockchangemap[block] += 1
601✔
1289
                    idx += 1
601✔
1290
                end
1291
                oldidx = last(sv.cfg.blocks[block].stmts)
4,754✔
1292
            end
1293
        end
1294
        idx += 1
1,630,487✔
1295
        oldidx += 1
1,630,487✔
1296
    end
1,630,487✔
1297
    empty!(sv.unreachable)
159,131✔
1298

1299
    if ssachangemap !== nothing && labelchangemap !== nothing
158,412✔
1300
        renumber_ir_elements!(code, ssachangemap, labelchangemap)
158,230✔
1301
    end
1302
    if blockchangemap !== nothing
158,412✔
1303
        renumber_cfg_stmts!(sv.cfg, blockchangemap)
158,230✔
1304
    end
1305

1306
    for i = 1:length(code)
288,306✔
1307
        code[i] = process_meta!(meta, code[i])
1,896,618✔
1308
    end
3,634,824✔
1309
    strip_trailing_junk!(code, ssavaluetypes, ssaflags, di, sv.cfg, stmtinfo)
158,412✔
1310
    types = Any[]
158,412✔
1311
    stmts = InstructionStream(code, types, stmtinfo, codelocs, ssaflags)
158,412✔
1312
    # NOTE this `argtypes` contains types of slots yet: it will be modified to contain the
1313
    # types of call arguments only once `slot2reg` converts this `IRCode` to the SSA form
1314
    # and eliminates slots (see below)
1315
    argtypes = sv.slottypes
158,412✔
1316
    return IRCode(stmts, sv.cfg, di, argtypes, meta, sv.sptypes, world_range(ci))
158,412✔
1317
end
1318

1319
function process_meta!(meta::Vector{Expr}, @nospecialize stmt)
1320
    if isexpr(stmt, :meta) && length(stmt.args) ≥ 1
1,896,618✔
1321
        push!(meta, stmt)
×
1322
        return nothing
×
1323
    end
1324
    return stmt
1,896,618✔
1325
end
1326

1327
function slot2reg(ir::IRCode, ci::CodeInfo, sv::OptimizationState)
129,894✔
1328
    # need `ci` for the slot metadata, IR for the code
1329
    @zone "CC: DOMTREE_1" domtree = construct_domtree(ir)
158,544✔
1330
    defuse_insts = scan_slot_def_use(Int(ci.nargs), ci, ir.stmts.stmt)
158,544✔
1331
    𝕃ₒ = optimizer_lattice(sv.inlining.interp)
158,412✔
1332
    @zone "CC: CONSTRUCT_SSA" ir = construct_ssa!(ci, ir, sv, domtree, defuse_insts, 𝕃ₒ) # consumes `ir`
158,544✔
1333
    # NOTE now we have converted `ir` to the SSA form and eliminated slots
1334
    # let's resize `argtypes` now and remove unnecessary types for the eliminated slots
1335
    resize!(ir.argtypes, ci.nargs)
158,544✔
1336
    return ir
158,544✔
1337
end
1338

1339
## Computing the cost of a function body
1340

1341
# saturating sum (inputs are non-negative), prevents overflow with typemax(Int) below
1342
plus_saturate(x::Int, y::Int) = max(x, y, x+y)
3,204,105✔
1343

1344
# known return type
1345
isknowntype(@nospecialize T) = (T === Union{}) || isa(T, Const) || isconcretetype(widenconst(T))
4,538✔
1346

1347
function statement_cost(ex::Expr, line::Int, src::Union{CodeInfo, IRCode}, sptypes::Vector{VarState},
2,964,818✔
1348
                        params::OptimizationParams)
1349
    #=const=# UNKNOWN_CALL_COST = 20
2,964,818✔
1350
    head = ex.head
2,964,818✔
1351
    if is_meta_expr_head(head)
5,923,579✔
1352
        return 0
6,057✔
1353
    elseif head === :call
2,958,761✔
1354
        farg = ex.args[1]
172,914✔
1355
        ftyp = argextype(farg, src, sptypes)
172,914✔
1356
        if ftyp === IntrinsicFunction && farg isa SSAValue
172,914✔
1357
            # if this comes from code that was already inlined into another function,
1358
            # Consts have been widened. try to recover in simple cases.
1359
            farg = isa(src, CodeInfo) ? src.code[farg.id] : src[farg][:stmt]
×
1360
            if isa(farg, GlobalRef) || isa(farg, QuoteNode) || isa(farg, IntrinsicFunction) || isexpr(farg, :static_parameter)
×
1361
                ftyp = argextype(farg, src, sptypes)
×
1362
            end
1363
        end
1364
        f = singleton_type(ftyp)
172,914✔
1365
        if isa(f, IntrinsicFunction)
172,914✔
1366
            iidx = Int(reinterpret(Int32, f::IntrinsicFunction)) + 1
67,195✔
1367
            if isassigned(T_IFUNC, iidx)
67,195✔
1368
                minarg, maxarg, = T_IFUNC[iidx]
67,195✔
1369
                nargs = length(ex.args)
67,195✔
1370
                if minarg + 1 <= nargs <= maxarg + 1
67,195✔
1371
                    # With mostly constant arguments, all Intrinsics tend to become very cheap
1372
                    # and are likely to combine with the operations around them,
1373
                    # so reduce their cost by half.
1374
                    cost = T_IFUNC_COST[iidx]
67,195✔
1375
                    if cost == 0 || nargs < 3 ||
107,887✔
1376
                       (f === Intrinsics.cglobal || f === Intrinsics.llvmcall) # these hold malformed IR, so argextype will crash on them
1377
                        return cost
28,220✔
1378
                    end
1379
                    aty2 = widenconditional(argextype(ex.args[2], src, sptypes))
38,975✔
1380
                    nconst = Int(aty2 isa Const)
38,975✔
1381
                    for i = 3:nargs
77,876✔
1382
                        aty = widenconditional(argextype(ex.args[i], src, sptypes))
38,975✔
1383
                        if widenconst(aty) != widenconst(aty2)
38,975✔
1384
                            nconst = 0
9,942✔
1385
                            break
9,942✔
1386
                        end
1387
                        nconst += aty isa Const
29,033✔
1388
                    end
29,033✔
1389
                    if nconst + 2 >= nargs
38,975✔
1390
                        cost = (cost - 1) ÷ 2
19,646✔
1391
                    end
1392
                    return cost
38,975✔
1393
                end
1394
            end
1395
            # unknown/unhandled intrinsic: hopefully the caller gets a slightly better answer after the inlining
1396
            return UNKNOWN_CALL_COST
×
1397
        end
1398
        if isa(f, Builtin) && f !== invoke
105,719✔
1399
            # The efficiency of operations like a[i] and s.b
1400
            # depend strongly on whether the result can be
1401
            # inferred, so check the type of ex
1402
            if f === Core.getfield || f === Core.tuple || f === Core.getglobal
161,908✔
1403
                # we might like to penalize non-inferrability, but
1404
                # tuple iteration/destructuring makes that impossible
1405
                # return plus_saturate(argcost, isknowntype(extyp) ? 1 : params.inline_nonleaf_penalty)
1406
                return 0
55,941✔
1407
            elseif (f === Core.memoryrefget || f === Core.memoryref_isassigned) && length(ex.args) >= 3
99,488✔
1408
                atyp = argextype(ex.args[2], src, sptypes)
58✔
1409
                return isknowntype(atyp) ? 1 : params.inline_nonleaf_penalty
58✔
1410
            elseif f === Core.memoryrefset! && length(ex.args) >= 3
49,715✔
1411
                atyp = argextype(ex.args[2], src, sptypes)
4,488✔
1412
                return isknowntype(atyp) ? 5 : params.inline_nonleaf_penalty
4,488✔
1413
            elseif f === typeassert && isconstType(widenconst(argextype(ex.args[3], src, sptypes)))
50,643✔
1414
                return 1
1,236✔
1415
            end
1416
            fidx = find_tfunc(f)
1,047,728✔
1417
            if fidx === nothing
43,991✔
1418
                # unknown/unhandled builtin
1419
                # Use the generic cost of a direct function call
1420
                return UNKNOWN_CALL_COST
4,236✔
1421
            end
1422
            return T_FFUNC_COST[fidx]
39,755✔
1423
        end
1424
        extyp = line == -1 ? Any : argextype(SSAValue(line), src, sptypes)
10✔
1425
        if extyp === Union{}
5✔
1426
            return 0
×
1427
        end
1428
        return params.inline_nonleaf_penalty
5✔
1429
    elseif head === :foreigncall
2,785,847✔
1430
        foreigncall = ex.args[1]
2,203✔
1431
        if isexpr(foreigncall, :tuple, 1)
2,203✔
1432
            foreigncall = foreigncall.args[1]
1,061✔
1433
            if foreigncall isa QuoteNode && foreigncall.value === :jl_string_ptr
1,061✔
1434
                return 1
11✔
1435
            end
1436
        end
1437
        return 20
2,192✔
1438
    elseif head === :invoke || head === :invoke_modify
5,519,425✔
1439
        # Calls whose "return type" is Union{} do not actually return:
1440
        # they are errors. Since these are not part of the typical
1441
        # run-time of the function, we omit them from
1442
        # consideration. This way, non-inlined error branches do not
1443
        # prevent inlining.
1444
        extyp = line == -1 ? Any : argextype(SSAValue(line), src, sptypes)
95,726✔
1445
        return extyp === Union{} ? 0 : UNKNOWN_CALL_COST
47,863✔
1446
    elseif head === :(=)
2,735,781✔
1447
        return statement_cost(ex.args[2], -1, src, sptypes, params)
×
1448
    elseif head === :copyast
2,735,781✔
1449
        return 100
×
1450
    end
1451
    return 0
2,735,781✔
1452
end
1453

1454
function statement_or_branch_cost(@nospecialize(stmt), line::Int, src::Union{CodeInfo, IRCode}, sptypes::Vector{VarState},
1455
                                  params::OptimizationParams)
1456
    thiscost = 0
3,204,105✔
1457
    dst(tgt) = isa(src, IRCode) ? first(src.cfg.blocks[tgt].stmts) : tgt
3,273,137✔
1458
    if stmt isa Expr
3,204,150✔
1459
        thiscost = statement_cost(stmt, line, src, sptypes, params)::Int
2,964,848✔
1460
    elseif stmt isa GotoNode
239,302✔
1461
        # loops are generally always expensive
1462
        # but assume that forward jumps are already counted for from
1463
        # summing the cost of the not-taken branch
1464
        thiscost = dst(stmt.label) < line ? 40 : 0
41,697✔
1465
    elseif stmt isa GotoIfNot
197,605✔
1466
        thiscost = dst(stmt.dest) < line ? 40 : 0
27,344✔
1467
    elseif stmt isa EnterNode
170,261✔
1468
        # try/catch is a couple function calls,
1469
        # but don't inline functions with try/catch
1470
        # since these aren't usually performance-sensitive functions,
1471
        # and llvm is more likely to miscompile them when these functions get large
1472
        thiscost = typemax(Int)
2,178✔
1473
    end
1474
    return thiscost
3,204,150✔
1475
end
1476

1477
function inline_cost_model(ir::IRCode, params::OptimizationParams, cost_threshold::Int)
112,012✔
1478
    bodycost = 0
112,012✔
1479
    for i = 1:length(ir.stmts)
223,969✔
1480
        stmt = ir[SSAValue(i)][:stmt]
3,204,105✔
1481
        thiscost = statement_or_branch_cost(stmt, i, ir, ir.sptypes, params)
3,443,392✔
1482
        bodycost = plus_saturate(bodycost, thiscost)
3,204,105✔
1483
        if bodycost > cost_threshold
3,204,105✔
1484
            return MAX_INLINE_COST
4,213✔
1485
        end
1486
    end
6,291,985✔
1487
    return inline_cost_clamp(bodycost)
107,799✔
1488
end
1489

1490
function statement_costs!(cost::Vector{Int}, body::Vector{Any}, src::Union{CodeInfo, IRCode}, sptypes::Vector{VarState}, params::OptimizationParams)
1491
    maxcost = 0
×
1492
    for line = 1:length(body)
3✔
1493
        stmt = body[line]
45✔
1494
        thiscost = statement_or_branch_cost(stmt, line, src, sptypes,
60✔
1495
                                            params)
1496
        cost[line] = thiscost
45✔
1497
        if thiscost > maxcost
45✔
1498
            maxcost = thiscost
×
1499
        end
1500
    end
87✔
1501
    return maxcost
3✔
1502
end
1503

1504
function renumber_ir_elements!(body::Vector{Any}, cfg::Union{CFG,Nothing}, ssachangemap::Vector{Int})
×
1505
    return renumber_ir_elements!(body, cfg, ssachangemap, ssachangemap)
×
1506
end
1507

1508
function cumsum_ssamap!(ssachangemap::Vector{Int})
1509
    any_change = false
389,640✔
1510
    rel_change = 0
389,640✔
1511
    for i = 1:length(ssachangemap)
779,280✔
1512
        val = ssachangemap[i]
2,968,779✔
1513
        any_change |= val ≠ 0
2,968,779✔
1514
        rel_change += val
2,968,779✔
1515
        if val == -1
2,968,779✔
1516
            # Keep a marker that this statement was deleted
1517
            ssachangemap[i] = typemin(Int)
×
1518
        else
1519
            ssachangemap[i] = rel_change
2,968,779✔
1520
        end
1521
    end
5,547,918✔
1522
    return any_change
389,640✔
1523
end
1524

1525
function renumber_ir_elements!(body::Vector{Any}, ssachangemap::Vector{Int}, labelchangemap::Vector{Int})
129,880✔
1526
    any_change = cumsum_ssamap!(labelchangemap)
1,349,974✔
1527
    if ssachangemap !== labelchangemap
129,880✔
1528
        any_change |= cumsum_ssamap!(ssachangemap)
1,349,974✔
1529
    end
1530
    any_change || return
129,880✔
1531
    for i = 1:length(body)
259,760✔
1532
        el = body[i]
1,554,175✔
1533
        if isa(el, GotoNode)
1,554,175✔
1534
            body[i] = GotoNode(el.label + labelchangemap[el.label])
37,759✔
1535
        elseif isa(el, GotoIfNot)
1,516,416✔
1536
            cond = el.cond
3,920✔
1537
            if isa(cond, SSAValue)
3,920✔
1538
                cond = SSAValue(cond.id + ssachangemap[cond.id])
3,860✔
1539
            end
1540
            was_deleted = labelchangemap[el.dest] == typemin(Int)
3,920✔
1541
            body[i] = was_deleted ? cond : GotoIfNot(cond, el.dest + labelchangemap[el.dest])
3,920✔
1542
        elseif isa(el, ReturnNode)
1,512,496✔
1543
            if isdefined(el, :val)
134,966✔
1544
                val = el.val
131,211✔
1545
                if isa(val, SSAValue)
131,211✔
1546
                    body[i] = ReturnNode(SSAValue(val.id + ssachangemap[val.id]))
130,990✔
1547
                end
1548
            end
1549
        elseif isa(el, SSAValue)
1,377,530✔
1550
            body[i] = SSAValue(el.id + ssachangemap[el.id])
×
1551
        elseif isa(el, PhiNode)
1,377,530✔
1552
            i = 1
×
1553
            edges = el.edges
×
1554
            values = el.values
×
1555
            while i <= length(edges)
×
1556
                was_deleted = ssachangemap[edges[i]] == typemin(Int)
×
1557
                if was_deleted
×
1558
                    deleteat!(edges, i)
×
1559
                    deleteat!(values, i)
×
1560
                else
1561
                    edges[i] += ssachangemap[edges[i]]
×
1562
                    val = values[i]
×
1563
                    if isa(val, SSAValue)
×
1564
                        values[i] = SSAValue(val.id + ssachangemap[val.id])
×
1565
                    end
1566
                    i += 1
×
1567
                end
1568
            end
×
1569
        elseif isa(el, EnterNode)
1,377,530✔
1570
            tgt = el.catch_dest
3,144✔
1571
            if tgt != 0
3,144✔
1572
                was_deleted = labelchangemap[tgt] == typemin(Int)
3,144✔
1573
                if was_deleted
3,144✔
1574
                    @assert !isdefined(el, :scope)
×
1575
                    body[i] = nothing
×
1576
                else
1577
                    if isdefined(el, :scope) && isa(el.scope, SSAValue)
3,144✔
1578
                        body[i] = EnterNode(tgt + labelchangemap[tgt], SSAValue(el.scope.id + ssachangemap[el.scope.id]))
2,916✔
1579
                    else
1580
                        body[i] = EnterNode(el, tgt + labelchangemap[tgt])
228✔
1581
                    end
1582
                end
1583
            end
1584
        elseif isa(el, Expr)
1,374,386✔
1585
            if el.head === :(=) && el.args[2] isa Expr
716,836✔
1586
                el = el.args[2]::Expr
40,060✔
1587
            end
1588
            if !is_meta_expr_head(el.head)
1,433,561✔
1589
                args = el.args
716,725✔
1590
                for i = 1:length(args)
1,229,298✔
1591
                    el = args[i]
1,377,624✔
1592
                    if isa(el, SSAValue)
1,377,624✔
1593
                        args[i] = SSAValue(el.id + ssachangemap[el.id])
688,994✔
1594
                    end
1595
                end
1,377,624✔
1596
            end
1597
        end
1598
    end
1,554,175✔
1599
end
1600

1601
function renumber_cfg_stmts!(cfg::CFG, blockchangemap::Vector{Int})
129,880✔
1602
    cumsum_ssamap!(blockchangemap) || return
129,880✔
1603
    for i = 1:length(cfg.blocks)
259,760✔
1604
        old_range = cfg.blocks[i].stmts
268,831✔
1605
        new_range = StmtRange(first(old_range) + ((i > 1) ? blockchangemap[i - 1] : 0),
268,831✔
1606
                              last(old_range) + blockchangemap[i])
1607
        cfg.blocks[i] = BasicBlock(cfg.blocks[i], new_range)
268,831✔
1608
        if i <= length(cfg.index)
268,831✔
1609
            cfg.index[i] = cfg.index[i] + blockchangemap[i]
268,831✔
1610
        end
1611
    end
268,831✔
1612
end
STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2026 Coveralls, Inc