
tarantool / luajit / 10940225480

19 Sep 2024 11:22AM UTC coverage: 92.845% (-0.06%) from 92.909%

Build 10940225480 (push, via github), committed by Buristan
FFI: Workaround for platform dlerror() returning NULL.

Contributed by mcclure.

(cherry picked from commit 478bcfe52)

The `ffi.load()` implementation assumes the string returned from
`dlerror()` is non-NULL and immediately dereferences it. This may lead
to a crash on POSIX platforms, where `dlerror()` is allowed to return
NULL [1].

This patch adds the corresponding check and a default "dlopen failed"
error message.

Sergey Kaplun:
* added the description and the test for the problem

[1]: https://pubs.opengroup.org/onlinepubs/009695399/functions/dlerror.html

Part of tarantool/tarantool#10199

Reviewed-by: Sergey Bronnikov <sergeyb@tarantool.org>
Reviewed-by: Maxim Kokryashkin <m.kokryashkin@tarantool.org>
Signed-off-by: Sergey Kaplun <skaplun@tarantool.org>
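
For context, the fix boils down to a NULL check on the dlerror() result. A minimal sketch of the pattern (editor's illustration of the POSIX semantics, not the actual patch; report_error() is a hypothetical error sink):

#include <dlfcn.h>

extern void report_error(const char *msg);  /* Hypothetical error sink. */

void load_lib(const char *name)
{
  void *handle = dlopen(name, RTLD_LAZY);
  if (!handle) {
    const char *err = dlerror();  /* POSIX allows this to be NULL. */
    if (!err) err = "dlopen failed";  /* Default message, as in the patch. */
    report_error(err);
  }
}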

5684 of 6027 branches covered (94.31%)

Branch coverage included in aggregate %.

2 of 2 new or added lines in 1 file covered. (100.0%)

26 existing lines in 5 files now uncovered.

21670 of 23435 relevant lines covered (92.47%)

2912061.21 hits per line

Source File: /src/lj_opt_fold.c (77.16% covered)
/*
** FOLD: Constant Folding, Algebraic Simplifications and Reassociation.
** ABCelim: Array Bounds Check Elimination.
** CSE: Common-Subexpression Elimination.
** Copyright (C) 2005-2017 Mike Pall. See Copyright Notice in luajit.h
*/

#define lj_opt_fold_c
#define LUA_CORE

#include <math.h>

#include "lj_obj.h"

#if LJ_HASJIT

#include "lj_buf.h"
#include "lj_str.h"
#include "lj_tab.h"
#include "lj_ir.h"
#include "lj_jit.h"
#include "lj_ircall.h"
#include "lj_iropt.h"
#include "lj_trace.h"
#if LJ_HASFFI
#include "lj_ctype.h"
#include "lj_carith.h"
#endif
#include "lj_vm.h"
#include "lj_strscan.h"
#include "lj_strfmt.h"

/* Here's a short description how the FOLD engine processes instructions:
**
** The FOLD engine receives a single instruction stored in fins (J->fold.ins).
** The instruction and its operands are used to select matching fold rules.
** These are applied iteratively until a fixed point is reached.
**
** The 8 bit opcode of the instruction itself plus the opcodes of the
** two instructions referenced by its operands form a 24 bit key
** 'ins left right' (unused operands -> 0, literals -> lowest 8 bits).
**
** This key is used for partial matching against the fold rules. The
** left/right operand fields of the key are successively masked with
** the 'any' wildcard, from most specific to least specific:
**
**   ins left right
**   ins any  right
**   ins left any
**   ins any  any
**
** The masked key is used to lookup a matching fold rule in a semi-perfect
** hash table. If a matching rule is found, the related fold function is run.
** Multiple rules can share the same fold function. A fold rule may return
** one of several special values:
**
** - NEXTFOLD means no folding was applied, because an additional test
**   inside the fold function failed. Matching continues against less
**   specific fold rules. Finally the instruction is passed on to CSE.
**
** - RETRYFOLD means the instruction was modified in-place. Folding is
**   retried as if this instruction had just been received.
**
** All other return values are terminal actions -- no further folding is
** applied:
**
** - INTFOLD(i) returns a reference to the integer constant i.
**
** - LEFTFOLD and RIGHTFOLD return the left/right operand reference
**   without emitting an instruction.
**
** - CSEFOLD and EMITFOLD pass the instruction directly to CSE or emit
**   it without passing through any further optimizations.
**
** - FAILFOLD, DROPFOLD and CONDFOLD only apply to instructions which have
**   no result (e.g. guarded assertions): FAILFOLD means the guard would
**   always fail, i.e. the current trace is pointless. DROPFOLD means
**   the guard is always true and has been eliminated. CONDFOLD is a
**   shortcut for FAILFOLD + cond (i.e. drop if true, otherwise fail).
**
** - Any other return value is interpreted as an IRRef or TRef. This
**   can be a reference to an existing or a newly created instruction.
**   Only the least-significant 16 bits (IRRef1) are used to form a TRef
**   which is finally returned to the caller.
**
** The FOLD engine receives instructions both from the trace recorder and
** substituted instructions from LOOP unrolling. This means all types
** of instructions may end up here, even though the recorder bypasses
** FOLD in some cases. Thus all loads, stores and allocations must have
** an any/any rule to avoid being passed on to CSE.
**
** Carefully read the following requirements before adding or modifying
** any fold rules:
**
** Requirement #1: All fold rules must preserve their destination type.
**
** Consistently use INTFOLD() (KINT result) or lj_ir_knum() (KNUM result).
** Never use lj_ir_knumint() which can have either a KINT or KNUM result.
**
** Requirement #2: Fold rules should not create *new* instructions which
** reference operands *across* PHIs.
**
** E.g. a RETRYFOLD with 'fins->op1 = fleft->op1' is invalid if the
** left operand is a PHI. Then fleft->op1 would point across the PHI
** frontier to an invariant instruction. Adding a PHI for this instruction
** would be counterproductive. The solution is to add a barrier which
** prevents folding across PHIs, i.e. 'PHIBARRIER(fleft)' in this case.
** The only exception is for recurrences with high latencies like
** repeated int->num->int conversions.
**
** One could relax this condition a bit if the referenced instruction is
** a PHI, too. But this often leads to worse code due to excessive
** register shuffling.
**
** Note: returning *existing* instructions (e.g. LEFTFOLD) is ok, though.
** Even returning fleft->op1 would be ok, because a new PHI will be added,
** if needed. But again, this leads to excessive register shuffling and
** should be avoided.
**
** Requirement #3: The set of all fold rules must be monotonic to guarantee
** termination.
**
** The goal is optimization, so one primarily wants to add strength-reducing
** rules. This means eliminating an instruction or replacing an instruction
** with one or more simpler instructions. Don't add fold rules which point
** into the other direction.
**
** Some rules (like commutativity) do not directly reduce the strength of
** an instruction, but enable other fold rules (e.g. by moving constants
** to the right operand). These rules must be made unidirectional to avoid
** cycles.
**
** Rule of thumb: the trace recorder expands the IR and FOLD shrinks it.
*/
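
/* Editor's illustration (not part of the original source): with
** fins = ADD(var, KINT), the rule lookup tries the masked keys in
** this order:
**   ADD <leftop> KINT   -- most specific
**   ADD any      KINT   -- matches e.g. LJFOLD(ADD any KINT)
**   ADD <leftop> any
**   ADD any      any    -- least specific
** so a generic rule only runs after the more specific rules either
** did not match or bailed out with NEXTFOLD.
*/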

/* Some local macros to save typing. Undef'd at the end. */
#define IR(ref)                (&J->cur.ir[(ref)])
#define fins                (&J->fold.ins)
#define fleft                (J->fold.left)
#define fright                (J->fold.right)
#define knumleft        (ir_knum(fleft)->n)
#define knumright        (ir_knum(fright)->n)

/* Pass IR on to next optimization in chain (FOLD). */
#define emitir(ot, a, b)        (lj_ir_set(J, (ot), (a), (b)), lj_opt_fold(J))

/* Fold function type. Fastcall on x86 significantly reduces their size. */
typedef IRRef (LJ_FASTCALL *FoldFunc)(jit_State *J);

/* Macros for the fold specs, so buildvm can recognize them. */
#define LJFOLD(x)
#define LJFOLDX(x)
#define LJFOLDF(name)        static TRef LJ_FASTCALL fold_##name(jit_State *J)
/* Note: They must be at the start of a line or buildvm ignores them! */

/* Barrier to prevent using operands across PHIs. */
#define PHIBARRIER(ir)        if (irt_isphi((ir)->t)) return NEXTFOLD

/* Barrier to prevent folding across a GC step.
** GC steps can only happen at the head of a trace and at LOOP.
** And the GC is only driven forward if there's at least one allocation.
*/
#define gcstep_barrier(J, ref) \
  ((ref) < J->chain[IR_LOOP] && \
   (J->chain[IR_SNEW] || J->chain[IR_XSNEW] || \
    J->chain[IR_TNEW] || J->chain[IR_TDUP] || \
    J->chain[IR_CNEW] || J->chain[IR_CNEWI] || \
    J->chain[IR_BUFSTR] || J->chain[IR_TOSTR] || J->chain[IR_CALLA]))

/* -- Constant folding for FP numbers ------------------------------------- */

LJFOLD(ADD KNUM KNUM)
LJFOLD(SUB KNUM KNUM)
LJFOLD(MUL KNUM KNUM)
LJFOLD(DIV KNUM KNUM)
LJFOLD(LDEXP KNUM KNUM)
LJFOLD(MIN KNUM KNUM)
LJFOLD(MAX KNUM KNUM)
LJFOLDF(kfold_numarith)
{
  lua_Number a = knumleft;
  lua_Number b = knumright;
  lua_Number y = lj_vm_foldarith(a, b, fins->o - IR_ADD);
  return lj_ir_knum(J, y);
}

LJFOLD(NEG KNUM FLOAD)
LJFOLD(ABS KNUM FLOAD)
LJFOLDF(kfold_numabsneg)
{
  lua_Number a = knumleft;
  lua_Number y = lj_vm_foldarith(a, a, fins->o - IR_ADD);
  return lj_ir_knum(J, y);
}

LJFOLD(LDEXP KNUM KINT)
LJFOLDF(kfold_ldexp)
{
#if LJ_TARGET_X86ORX64
  UNUSED(J);
  return NEXTFOLD;
#else
  return lj_ir_knum(J, ldexp(knumleft, fright->i));
#endif
}

LJFOLD(FPMATH KNUM any)
LJFOLDF(kfold_fpmath)
{
  lua_Number a = knumleft;
  lua_Number y = lj_vm_foldfpm(a, fins->op2);
  return lj_ir_knum(J, y);
}

LJFOLD(CALLN KNUM any)
LJFOLDF(kfold_fpcall1)
{
  const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
  if (CCI_TYPE(ci) == IRT_NUM) {
    double y = ((double (*)(double))ci->func)(knumleft);
    return lj_ir_knum(J, y);
  }
  return NEXTFOLD;
}

LJFOLD(CALLN CARG IRCALL_atan2)
LJFOLDF(kfold_fpcall2)
{
  if (irref_isk(fleft->op1) && irref_isk(fleft->op2)) {
    const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
    double a = ir_knum(IR(fleft->op1))->n;
    double b = ir_knum(IR(fleft->op2))->n;
    double y = ((double (*)(double, double))ci->func)(a, b);
    return lj_ir_knum(J, y);
  }
  return NEXTFOLD;
}

LJFOLD(POW KNUM KNUM)
LJFOLDF(kfold_numpow)
{
  return lj_ir_knum(J, lj_vm_foldarith(knumleft, knumright, IR_POW - IR_ADD));
}

/* Must not use kfold_kref for numbers (could be NaN). */
LJFOLD(EQ KNUM KNUM)
LJFOLD(NE KNUM KNUM)
LJFOLD(LT KNUM KNUM)
LJFOLD(GE KNUM KNUM)
LJFOLD(LE KNUM KNUM)
LJFOLD(GT KNUM KNUM)
LJFOLD(ULT KNUM KNUM)
LJFOLD(UGE KNUM KNUM)
LJFOLD(ULE KNUM KNUM)
LJFOLD(UGT KNUM KNUM)
LJFOLDF(kfold_numcomp)
{
  return CONDFOLD(lj_ir_numcmp(knumleft, knumright, (IROp)fins->o));
}

/* -- Constant folding for 32 bit integers -------------------------------- */

#if LUAJIT_USE_UBSAN
/* Cdata arithmetic depends on integer overflow. */
static int32_t kfold_intop(int32_t k1, int32_t k2, IROp op)
  __attribute__((no_sanitize("signed-integer-overflow")));
#endif
static int32_t kfold_intop(int32_t k1, int32_t k2, IROp op)
{
  switch (op) {
  case IR_ADD: k1 += k2; break;
  case IR_SUB: k1 -= k2; break;
  case IR_MUL: k1 *= k2; break;
  case IR_MOD: k1 = lj_vm_modi(k1, k2); break;
  case IR_NEG: k1 = (int32_t)(~(uint32_t)k1+1u); break;
  case IR_BAND: k1 &= k2; break;
  case IR_BOR: k1 |= k2; break;
  case IR_BXOR: k1 ^= k2; break;
  case IR_BSHL: k1 <<= (k2 & 31); break;
  case IR_BSHR: k1 = (int32_t)((uint32_t)k1 >> (k2 & 31)); break;
  case IR_BSAR: k1 >>= (k2 & 31); break;
  case IR_BROL: k1 = (int32_t)lj_rol((uint32_t)k1, (k2 & 31)); break;
  case IR_BROR: k1 = (int32_t)lj_ror((uint32_t)k1, (k2 & 31)); break;
  case IR_MIN: k1 = k1 < k2 ? k1 : k2; break;
  case IR_MAX: k1 = k1 > k2 ? k1 : k2; break;
  default: lj_assertX(0, "bad IR op %d", op); break;
  }
  return k1;
}

LJFOLD(ADD KINT KINT)
LJFOLD(SUB KINT KINT)
LJFOLD(MUL KINT KINT)
LJFOLD(MOD KINT KINT)
LJFOLD(NEG KINT KINT)
LJFOLD(BAND KINT KINT)
LJFOLD(BOR KINT KINT)
LJFOLD(BXOR KINT KINT)
LJFOLD(BSHL KINT KINT)
LJFOLD(BSHR KINT KINT)
LJFOLD(BSAR KINT KINT)
LJFOLD(BROL KINT KINT)
LJFOLD(BROR KINT KINT)
LJFOLD(MIN KINT KINT)
LJFOLD(MAX KINT KINT)
LJFOLDF(kfold_intarith)
{
  return INTFOLD(kfold_intop(fleft->i, fright->i, (IROp)fins->o));
}

LJFOLD(ADDOV KINT KINT)
LJFOLD(SUBOV KINT KINT)
LJFOLD(MULOV KINT KINT)
LJFOLDF(kfold_intovarith)
{
  lua_Number n = lj_vm_foldarith((lua_Number)fleft->i, (lua_Number)fright->i,
                                 fins->o - IR_ADDOV);
  int32_t k = lj_num2int(n);
  if (n != (lua_Number)k)
    return FAILFOLD;
  return INTFOLD(k);
}

LJFOLD(BNOT KINT)
LJFOLDF(kfold_bnot)
{
  return INTFOLD(~fleft->i);
}

LJFOLD(BSWAP KINT)
LJFOLDF(kfold_bswap)
{
  return INTFOLD((int32_t)lj_bswap((uint32_t)fleft->i));
}

LJFOLD(LT KINT KINT)
LJFOLD(GE KINT KINT)
LJFOLD(LE KINT KINT)
LJFOLD(GT KINT KINT)
LJFOLD(ULT KINT KINT)
LJFOLD(UGE KINT KINT)
LJFOLD(ULE KINT KINT)
LJFOLD(UGT KINT KINT)
LJFOLD(ABC KINT KINT)
LJFOLDF(kfold_intcomp)
{
  int32_t a = fleft->i, b = fright->i;
  switch ((IROp)fins->o) {
  case IR_LT: return CONDFOLD(a < b);
  case IR_GE: return CONDFOLD(a >= b);
  case IR_LE: return CONDFOLD(a <= b);
  case IR_GT: return CONDFOLD(a > b);
  case IR_ULT: return CONDFOLD((uint32_t)a < (uint32_t)b);
  case IR_UGE: return CONDFOLD((uint32_t)a >= (uint32_t)b);
  case IR_ULE: return CONDFOLD((uint32_t)a <= (uint32_t)b);
  case IR_ABC:
  case IR_UGT: return CONDFOLD((uint32_t)a > (uint32_t)b);
  default: lj_assertJ(0, "bad IR op %d", fins->o); return FAILFOLD;
  }
}

LJFOLD(UGE any KINT)
LJFOLDF(kfold_intcomp0)
{
  if (fright->i == 0)
    return DROPFOLD;
  return NEXTFOLD;
}

/* -- Constant folding for 64 bit integers -------------------------------- */

static uint64_t kfold_int64arith(jit_State *J, uint64_t k1, uint64_t k2,
                                 IROp op)
{
  UNUSED(J);
#if LJ_HASFFI
  switch (op) {
  case IR_ADD: k1 += k2; break;
  case IR_SUB: k1 -= k2; break;
  case IR_MUL: k1 *= k2; break;
  case IR_BAND: k1 &= k2; break;
  case IR_BOR: k1 |= k2; break;
  case IR_BXOR: k1 ^= k2; break;
  case IR_BSHL: k1 <<= (k2 & 63); break;
  case IR_BSHR: k1 = (int32_t)((uint32_t)k1 >> (k2 & 63)); break;
  case IR_BSAR: k1 >>= (k2 & 63); break;
  case IR_BROL: k1 = (int32_t)lj_rol((uint32_t)k1, (k2 & 63)); break;
  case IR_BROR: k1 = (int32_t)lj_ror((uint32_t)k1, (k2 & 63)); break;
  default: lj_assertJ(0, "bad IR op %d", op); break;
  }
#else
  UNUSED(k2); UNUSED(op);
  lj_assertJ(0, "FFI IR op without FFI");
#endif
  return k1;
}

LJFOLD(ADD KINT64 KINT64)
LJFOLD(SUB KINT64 KINT64)
LJFOLD(MUL KINT64 KINT64)
LJFOLD(BAND KINT64 KINT64)
LJFOLD(BOR KINT64 KINT64)
LJFOLD(BXOR KINT64 KINT64)
LJFOLDF(kfold_int64arith)
{
  return INT64FOLD(kfold_int64arith(J, ir_k64(fleft)->u64,
                                    ir_k64(fright)->u64, (IROp)fins->o));
}

LJFOLD(DIV KINT64 KINT64)
LJFOLD(MOD KINT64 KINT64)
LJFOLD(POW KINT64 KINT64)
LJFOLDF(kfold_int64arith2)
{
#if LJ_HASFFI
  uint64_t k1 = ir_k64(fleft)->u64, k2 = ir_k64(fright)->u64;
  if (irt_isi64(fins->t)) {
    k1 = fins->o == IR_DIV ? lj_carith_divi64((int64_t)k1, (int64_t)k2) :
         fins->o == IR_MOD ? lj_carith_modi64((int64_t)k1, (int64_t)k2) :
                             lj_carith_powi64((int64_t)k1, (int64_t)k2);
  } else {
    k1 = fins->o == IR_DIV ? lj_carith_divu64(k1, k2) :
         fins->o == IR_MOD ? lj_carith_modu64(k1, k2) :
                             lj_carith_powu64(k1, k2);
  }
  return INT64FOLD(k1);
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BSHL KINT64 KINT)
LJFOLD(BSHR KINT64 KINT)
LJFOLD(BSAR KINT64 KINT)
LJFOLD(BROL KINT64 KINT)
LJFOLD(BROR KINT64 KINT)
LJFOLDF(kfold_int64shift)
{
#if LJ_HASFFI
  uint64_t k = ir_k64(fleft)->u64;
  int32_t sh = (fright->i & 63);
  return INT64FOLD(lj_carith_shift64(k, sh, fins->o - IR_BSHL));
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BNOT KINT64)
LJFOLDF(kfold_bnot64)
{
#if LJ_HASFFI
  return INT64FOLD(~ir_k64(fleft)->u64);
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BSWAP KINT64)
LJFOLDF(kfold_bswap64)
{
#if LJ_HASFFI
  return INT64FOLD(lj_bswap64(ir_k64(fleft)->u64));
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(LT KINT64 KINT64)
LJFOLD(GE KINT64 KINT64)
LJFOLD(LE KINT64 KINT64)
LJFOLD(GT KINT64 KINT64)
LJFOLD(ULT KINT64 KINT64)
LJFOLD(UGE KINT64 KINT64)
LJFOLD(ULE KINT64 KINT64)
LJFOLD(UGT KINT64 KINT64)
LJFOLDF(kfold_int64comp)
{
#if LJ_HASFFI
  uint64_t a = ir_k64(fleft)->u64, b = ir_k64(fright)->u64;
  switch ((IROp)fins->o) {
  case IR_LT: return CONDFOLD((int64_t)a < (int64_t)b);
  case IR_GE: return CONDFOLD((int64_t)a >= (int64_t)b);
  case IR_LE: return CONDFOLD((int64_t)a <= (int64_t)b);
  case IR_GT: return CONDFOLD((int64_t)a > (int64_t)b);
  case IR_ULT: return CONDFOLD(a < b);
  case IR_UGE: return CONDFOLD(a >= b);
  case IR_ULE: return CONDFOLD(a <= b);
  case IR_UGT: return CONDFOLD(a > b);
  default: lj_assertJ(0, "bad IR op %d", fins->o); return FAILFOLD;
  }
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(UGE any KINT64)
LJFOLDF(kfold_int64comp0)
{
#if LJ_HASFFI
  if (ir_k64(fright)->u64 == 0)
    return DROPFOLD;
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

/* -- Constant folding for strings ---------------------------------------- */

LJFOLD(SNEW KKPTR KINT)
LJFOLDF(kfold_snew_kptr)
{
  GCstr *s = lj_str_new(J->L, (const char *)ir_kptr(fleft), (size_t)fright->i);
  return lj_ir_kstr(J, s);
}

LJFOLD(SNEW any KINT)
LJFOLDF(kfold_snew_empty)
{
  if (fright->i == 0)
    return lj_ir_kstr(J, &J2G(J)->strempty);
  return NEXTFOLD;
}

LJFOLD(STRREF KGC KINT)
LJFOLDF(kfold_strref)
{
  GCstr *str = ir_kstr(fleft);
  lj_assertJ((MSize)fright->i <= str->len, "bad string ref");
  return lj_ir_kkptr(J, (char *)strdata(str) + fright->i);
}

LJFOLD(STRREF SNEW any)
LJFOLDF(kfold_strref_snew)
{
  PHIBARRIER(fleft);
  if (irref_isk(fins->op2) && fright->i == 0) {
    return fleft->op1;  /* strref(snew(ptr, len), 0) ==> ptr */
  } else {
    /* Reassociate: strref(snew(strref(str, a), len), b) ==> strref(str, a+b) */
    IRIns *ir = IR(fleft->op1);
    if (ir->o == IR_STRREF) {
      IRRef1 str = ir->op1;  /* IRIns * is not valid across emitir. */
      PHIBARRIER(ir);
      fins->op2 = emitir(IRTI(IR_ADD), ir->op2, fins->op2); /* Clobbers fins! */
      fins->op1 = str;
      fins->ot = IRT(IR_STRREF, IRT_PGC);
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(CALLN CARG IRCALL_lj_str_cmp)
LJFOLDF(kfold_strcmp)
{
  if (irref_isk(fleft->op1) && irref_isk(fleft->op2)) {
    GCstr *a = ir_kstr(IR(fleft->op1));
    GCstr *b = ir_kstr(IR(fleft->op2));
    return INTFOLD(lj_str_cmp(a, b));
  }
  return NEXTFOLD;
}

/* -- Constant folding and forwarding for buffers ------------------------- */

/*
** Buffer ops perform stores, but their effect is limited to the buffer
** itself. Also, buffer ops are chained: a use of an op implies a use of
** all other ops up the chain. Conversely, if an op is unused, all ops
** them as stores.
573
**
574
** Alas, treating them as normal (IRM_N) ops doesn't work, because they
575
** cannot be CSEd in isolation. CSE for IRM_N is implicitly done in LOOP
576
** or if FOLD is disabled.
577
**
578
** The compromise is to declare them as loads, emit them like stores and
579
** CSE whole chains manually when the BUFSTR is to be emitted. Any chain
580
** fragments left over from CSE are eliminated by DCE.
581
*/
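
/* Editor's illustration (not part of the original source): a chain for
** the Lua expression "a" .. b .. "c" records roughly as
**   h  = BUFHDR (new buffer)
**   p1 = BUFPUT h  "a"
**   p2 = BUFPUT p1 b
**   p3 = BUFPUT p2 "c"
**   s  = BUFSTR p3
** A use of s implies uses of p3, p2, p1 and h; if s is unused, the
** whole chain is dead and gets eliminated by DCE.
*/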

/* BUFHDR is emitted like a store, see below. */

LJFOLD(BUFPUT BUFHDR BUFSTR)
LJFOLDF(bufput_append)
{
  /* New buffer, no other buffer op in between and same buffer? */
  if ((J->flags & JIT_F_OPT_FWD) &&
      !(fleft->op2 & IRBUFHDR_APPEND) &&
      fleft->prev == fright->op2 &&
      fleft->op1 == IR(fright->op2)->op1 &&
      !(irt_isphi(fright->t) && IR(fright->op2)->prev)) {
    IRRef ref = fins->op1;
    IR(ref)->op2 = (fleft->op2 | IRBUFHDR_APPEND);  /* Modify BUFHDR. */
    IR(ref)->op1 = fright->op1;
    return ref;
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(BUFPUT any any)
LJFOLDF(bufput_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && fright->o == IR_KGC) {
    GCstr *s2 = ir_kstr(fright);
    if (s2->len == 0) {  /* Empty string? */
      return LEFTFOLD;
    } else {
      if (fleft->o == IR_BUFPUT && irref_isk(fleft->op2) &&
          !irt_isphi(fleft->t)) {  /* Join two constant string puts in a row. */
        GCstr *s1 = ir_kstr(IR(fleft->op2));
        IRRef kref = lj_ir_kstr(J, lj_buf_cat2str(J->L, s1, s2));
        /* lj_ir_kstr() may realloc the IR and invalidates any IRIns *. */
        IR(fins->op1)->op2 = kref;  /* Modify previous BUFPUT. */
        return fins->op1;
      }
    }
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(BUFSTR any any)
LJFOLDF(bufstr_kfold_cse)
{
  lj_assertJ(fleft->o == IR_BUFHDR || fleft->o == IR_BUFPUT ||
             fleft->o == IR_CALLL,
             "bad buffer constructor IR op %d", fleft->o);
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    if (fleft->o == IR_BUFHDR) {  /* No put operations? */
      if (!(fleft->op2 & IRBUFHDR_APPEND))  /* Empty buffer? */
        return lj_ir_kstr(J, &J2G(J)->strempty);
      fins->op1 = fleft->op1;
      fins->op2 = fleft->prev;  /* Relies on checks in bufput_append. */
      return CSEFOLD;
    } else if (fleft->o == IR_BUFPUT) {
      IRIns *irb = IR(fleft->op1);
      if (irb->o == IR_BUFHDR && !(irb->op2 & IRBUFHDR_APPEND))
        return fleft->op2;  /* Shortcut for a single put operation. */
    }
  }
  /* Try to CSE the whole chain. */
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    IRRef ref = J->chain[IR_BUFSTR];
    while (ref) {
      IRIns *irs = IR(ref), *ira = fleft, *irb = IR(irs->op1);
      while (ira->o == irb->o && ira->op2 == irb->op2) {
        lj_assertJ(ira->o == IR_BUFHDR || ira->o == IR_BUFPUT ||
                   ira->o == IR_CALLL || ira->o == IR_CARG,
                   "bad buffer constructor IR op %d", ira->o);
        if (ira->o == IR_BUFHDR && !(ira->op2 & IRBUFHDR_APPEND))
          return ref;  /* CSE succeeded. */
        if (ira->o == IR_CALLL && ira->op2 == IRCALL_lj_buf_puttab)
          break;
        ira = IR(ira->op1);
        irb = IR(irb->op1);
      }
      ref = irs->prev;
    }
  }
  return EMITFOLD;  /* No CSE possible. */
}

LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_reverse)
LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_upper)
LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_lower)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putquoted)
LJFOLDF(bufput_kfold_op)
{
  if (irref_isk(fleft->op2)) {
    const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
    SBuf *sb = lj_buf_tmp_(J->L);
    sb = ((SBuf * (LJ_FASTCALL *)(SBuf *, GCstr *))ci->func)(sb,
                                                       ir_kstr(IR(fleft->op2)));
    fins->o = IR_BUFPUT;
    fins->op1 = fleft->op1;
    fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
    return RETRYFOLD;
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_rep)
LJFOLDF(bufput_kfold_rep)
{
  if (irref_isk(fleft->op2)) {
    IRIns *irc = IR(fleft->op1);
    if (irref_isk(irc->op2)) {
      SBuf *sb = lj_buf_tmp_(J->L);
      sb = lj_buf_putstr_rep(sb, ir_kstr(IR(irc->op2)), IR(fleft->op2)->i);
      fins->o = IR_BUFPUT;
      fins->op1 = irc->op1;
      fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
      return RETRYFOLD;
    }
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfxint)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum_int)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum_uint)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfstr)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfchar)
LJFOLDF(bufput_kfold_fmt)
{
  IRIns *irc = IR(fleft->op1);
  lj_assertJ(irref_isk(irc->op2), "SFormat must be const");
  if (irref_isk(fleft->op2)) {
    SFormat sf = (SFormat)IR(irc->op2)->i;
    IRIns *ira = IR(fleft->op2);
    SBuf *sb = lj_buf_tmp_(J->L);
    switch (fins->op2) {
    case IRCALL_lj_strfmt_putfxint:
      sb = lj_strfmt_putfxint(sb, sf, ir_k64(ira)->u64);
      break;
    case IRCALL_lj_strfmt_putfstr:
      sb = lj_strfmt_putfstr(sb, sf, ir_kstr(ira));
      break;
    case IRCALL_lj_strfmt_putfchar:
      sb = lj_strfmt_putfchar(sb, sf, ira->i);
      break;
    case IRCALL_lj_strfmt_putfnum_int:
    case IRCALL_lj_strfmt_putfnum_uint:
    case IRCALL_lj_strfmt_putfnum:
    default: {
      const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
      sb = ((SBuf * (*)(SBuf *, SFormat, lua_Number))ci->func)(sb, sf,
                                                         ir_knum(ira)->n);
      break;
      }
    }
    fins->o = IR_BUFPUT;
    fins->op1 = irc->op1;
    fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
    return RETRYFOLD;
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

/* -- Constant folding of pointer arithmetic ------------------------------ */

LJFOLD(ADD KGC KINT)
LJFOLD(ADD KGC KINT64)
LJFOLDF(kfold_add_kgc)
{
  GCobj *o = ir_kgc(fleft);
#if LJ_64
  ptrdiff_t ofs = (ptrdiff_t)ir_kint64(fright)->u64;
#else
  ptrdiff_t ofs = fright->i;
#endif
#if LJ_HASFFI
  if (irt_iscdata(fleft->t)) {
    CType *ct = ctype_raw(ctype_ctsG(J2G(J)), gco2cd(o)->ctypeid);
    if (ctype_isnum(ct->info) || ctype_isenum(ct->info) ||
        ctype_isptr(ct->info) || ctype_isfunc(ct->info) ||
        ctype_iscomplex(ct->info) || ctype_isvector(ct->info))
      return lj_ir_kkptr(J, (char *)o + ofs);
  }
#endif
  return lj_ir_kptr(J, (char *)o + ofs);
}

LJFOLD(ADD KPTR KINT)
LJFOLD(ADD KPTR KINT64)
LJFOLD(ADD KKPTR KINT)
LJFOLD(ADD KKPTR KINT64)
LJFOLDF(kfold_add_kptr)
{
  void *p = ir_kptr(fleft);
#if LJ_64
  ptrdiff_t ofs = (ptrdiff_t)ir_kint64(fright)->u64;
#else
  ptrdiff_t ofs = fright->i;
#endif
  return lj_ir_kptr_(J, fleft->o, (char *)p + ofs);
}

LJFOLD(ADD any KGC)
LJFOLD(ADD any KPTR)
LJFOLD(ADD any KKPTR)
LJFOLDF(kfold_add_kright)
{
  if (fleft->o == IR_KINT || fleft->o == IR_KINT64) {
    IRRef1 tmp = fins->op1; fins->op1 = fins->op2; fins->op2 = tmp;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

/* -- Constant folding of conversions ------------------------------------- */

LJFOLD(TOBIT KNUM KNUM)
LJFOLDF(kfold_tobit)
{
  return INTFOLD(lj_num2bit(knumleft));
}

LJFOLD(CONV KINT IRCONV_NUM_INT)
LJFOLDF(kfold_conv_kint_num)
{
  return lj_ir_knum(J, (lua_Number)fleft->i);
}

LJFOLD(CONV KINT IRCONV_NUM_U32)
LJFOLDF(kfold_conv_kintu32_num)
{
  return lj_ir_knum(J, (lua_Number)(uint32_t)fleft->i);
}

LJFOLD(CONV KINT IRCONV_INT_I8)
LJFOLD(CONV KINT IRCONV_INT_U8)
LJFOLD(CONV KINT IRCONV_INT_I16)
LJFOLD(CONV KINT IRCONV_INT_U16)
LJFOLDF(kfold_conv_kint_ext)
{
  int32_t k = fleft->i;
  if ((fins->op2 & IRCONV_SRCMASK) == IRT_I8) k = (int8_t)k;
  else if ((fins->op2 & IRCONV_SRCMASK) == IRT_U8) k = (uint8_t)k;
  else if ((fins->op2 & IRCONV_SRCMASK) == IRT_I16) k = (int16_t)k;
  else k = (uint16_t)k;
  return INTFOLD(k);
}

LJFOLD(CONV KINT IRCONV_I64_INT)
LJFOLD(CONV KINT IRCONV_U64_INT)
LJFOLD(CONV KINT IRCONV_I64_U32)
LJFOLD(CONV KINT IRCONV_U64_U32)
LJFOLDF(kfold_conv_kint_i64)
{
  if ((fins->op2 & IRCONV_SEXT))
    return INT64FOLD((uint64_t)(int64_t)fleft->i);
  else
    return INT64FOLD((uint64_t)(int64_t)(uint32_t)fleft->i);
}

LJFOLD(CONV KINT64 IRCONV_NUM_I64)
LJFOLDF(kfold_conv_kint64_num_i64)
{
  return lj_ir_knum(J, (lua_Number)(int64_t)ir_kint64(fleft)->u64);
}

LJFOLD(CONV KINT64 IRCONV_NUM_U64)
LJFOLDF(kfold_conv_kint64_num_u64)
{
  return lj_ir_knum(J, (lua_Number)ir_kint64(fleft)->u64);
}

LJFOLD(CONV KINT64 IRCONV_INT_I64)
LJFOLD(CONV KINT64 IRCONV_U32_I64)
LJFOLDF(kfold_conv_kint64_int_i64)
{
  return INTFOLD((int32_t)ir_kint64(fleft)->u64);
}

LJFOLD(CONV KNUM IRCONV_INT_NUM)
LJFOLDF(kfold_conv_knum_int_num)
{
  lua_Number n = knumleft;
  int32_t k = lj_num2int(n);
  if (irt_isguard(fins->t) && n != (lua_Number)k) {
    /* We're about to create a guard which always fails, like CONV +1.5.
    ** Some pathological loops cause this during LICM, e.g.:
    **   local x,k,t = 0,1.5,{1,[1.5]=2}
    **   for i=1,200 do x = x+ t[k]; k = k == 1 and 1.5 or 1 end
    **   assert(x == 300)
    */
    return FAILFOLD;
  }
  return INTFOLD(k);
}

LJFOLD(CONV KNUM IRCONV_U32_NUM)
LJFOLDF(kfold_conv_knum_u32_num)
{
#ifdef _MSC_VER
  {  /* Workaround for MSVC bug. */
    volatile uint32_t u = (uint32_t)knumleft;
    return INTFOLD((int32_t)u);
  }
#else
  return INTFOLD((int32_t)(uint32_t)knumleft);
#endif
}

LJFOLD(CONV KNUM IRCONV_I64_NUM)
LJFOLDF(kfold_conv_knum_i64_num)
{
  return INT64FOLD((uint64_t)(int64_t)knumleft);
}

LJFOLD(CONV KNUM IRCONV_U64_NUM)
LJFOLDF(kfold_conv_knum_u64_num)
{
  return INT64FOLD(lj_num2u64(knumleft));
}

LJFOLD(TOSTR KNUM any)
LJFOLDF(kfold_tostr_knum)
{
  return lj_ir_kstr(J, lj_strfmt_num(J->L, ir_knum(fleft)));
}

LJFOLD(TOSTR KINT any)
LJFOLDF(kfold_tostr_kint)
{
  return lj_ir_kstr(J, fins->op2 == IRTOSTR_INT ?
                       lj_strfmt_int(J->L, fleft->i) :
                       lj_strfmt_char(J->L, fleft->i));
}

LJFOLD(STRTO KGC)
LJFOLDF(kfold_strto)
{
  TValue n;
  if (lj_strscan_num(ir_kstr(fleft), &n))
    return lj_ir_knum(J, numV(&n));
  return FAILFOLD;
}

/* -- Constant folding of equality checks --------------------------------- */

/* Don't constant-fold away FLOAD checks against KNULL. */
LJFOLD(EQ FLOAD KNULL)
LJFOLD(NE FLOAD KNULL)
LJFOLDX(lj_opt_cse)

/* But fold all other KNULL compares, since only KNULL is equal to KNULL. */
LJFOLD(EQ any KNULL)
LJFOLD(NE any KNULL)
LJFOLD(EQ KNULL any)
LJFOLD(NE KNULL any)
LJFOLD(EQ KINT KINT)  /* Constants are unique, so same refs <==> same value. */
LJFOLD(NE KINT KINT)
LJFOLD(EQ KINT64 KINT64)
LJFOLD(NE KINT64 KINT64)
LJFOLD(EQ KGC KGC)
LJFOLD(NE KGC KGC)
LJFOLDF(kfold_kref)
{
  return CONDFOLD((fins->op1 == fins->op2) ^ (fins->o == IR_NE));
}

/* -- Algebraic shortcuts ------------------------------------------------- */

LJFOLD(FPMATH FPMATH IRFPM_FLOOR)
LJFOLD(FPMATH FPMATH IRFPM_CEIL)
LJFOLD(FPMATH FPMATH IRFPM_TRUNC)
LJFOLDF(shortcut_round)
{
  IRFPMathOp op = (IRFPMathOp)fleft->op2;
  if (op == IRFPM_FLOOR || op == IRFPM_CEIL || op == IRFPM_TRUNC)
    return LEFTFOLD;  /* round(round_left(x)) = round_left(x) */
  return NEXTFOLD;
}

LJFOLD(ABS ABS FLOAD)
LJFOLDF(shortcut_left)
{
  return LEFTFOLD;  /* f(g(x)) ==> g(x) */
}

LJFOLD(ABS NEG FLOAD)
LJFOLDF(shortcut_dropleft)
{
  PHIBARRIER(fleft);
  fins->op1 = fleft->op1;  /* abs(neg(x)) ==> abs(x) */
  return RETRYFOLD;
}

/* Note: no safe shortcuts with STRTO and TOSTR ("1e2" ==> +100 ==> "100"). */
LJFOLD(NEG NEG any)
LJFOLD(BNOT BNOT)
LJFOLD(BSWAP BSWAP)
LJFOLDF(shortcut_leftleft)
{
  PHIBARRIER(fleft);  /* See above. Fold would be ok, but not beneficial. */
  return fleft->op1;  /* f(g(x)) ==> x */
}

/* -- FP algebraic simplifications ---------------------------------------- */

/* FP arithmetic is tricky -- there's not much to simplify.
** Please note the following common pitfalls before sending "improvements":
**   x+0 ==> x  is INVALID for x=-0
**   0-x ==> -x is INVALID for x=+0
**   x*0 ==> 0  is INVALID for x=-0, x=+-Inf or x=NaN
*/
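
/* Editor's note: e.g. under IEEE 754, (-0.0) + 0.0 evaluates to +0.0,
** so x+0 ==> x would flip the sign of a zero result. Likewise
** 0.0 - (+0.0) is +0.0, while -(+0.0) is -0.0.
*/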
991

992
LJFOLD(ADD NEG any)
993
LJFOLDF(simplify_numadd_negx)
×
994
{
995
  PHIBARRIER(fleft);
×
996
  fins->o = IR_SUB;  /* (-a) + b ==> b - a */
×
997
  fins->op1 = fins->op2;
×
998
  fins->op2 = fleft->op1;
×
999
  return RETRYFOLD;
×
1000
}
1001

1002
LJFOLD(ADD any NEG)
1003
LJFOLDF(simplify_numadd_xneg)
×
1004
{
1005
  PHIBARRIER(fright);
×
1006
  fins->o = IR_SUB;  /* a + (-b) ==> a - b */
×
1007
  fins->op2 = fright->op1;
×
1008
  return RETRYFOLD;
×
1009
}
1010

1011
LJFOLD(SUB any KNUM)
1012
LJFOLDF(simplify_numsub_k)
1,985✔
1013
{
1014
  if (ir_knum(fright)->u64 == 0)  /* x - (+0) ==> x */
1,985✔
1015
    return LEFTFOLD;
1,003✔
1016
  return NEXTFOLD;
1017
}
1018

1019
LJFOLD(SUB NEG KNUM)
1020
LJFOLDF(simplify_numsub_negk)
×
1021
{
1022
  PHIBARRIER(fleft);
×
1023
  fins->op2 = fleft->op1;  /* (-x) - k ==> (-k) - x */
×
1024
  fins->op1 = (IRRef1)lj_ir_knum(J, -knumright);
×
1025
  return RETRYFOLD;
×
1026
}
1027

1028
LJFOLD(SUB any NEG)
1029
LJFOLDF(simplify_numsub_xneg)
×
1030
{
1031
  PHIBARRIER(fright);
×
1032
  fins->o = IR_ADD;  /* a - (-b) ==> a + b */
×
1033
  fins->op2 = fright->op1;
×
1034
  return RETRYFOLD;
×
1035
}
1036

1037
LJFOLD(MUL any KNUM)
1038
LJFOLD(DIV any KNUM)
1039
LJFOLDF(simplify_nummuldiv_k)
263✔
1040
{
1041
  lua_Number n = knumright;
263✔
1042
  if (n == 1.0) {  /* x o 1 ==> x */
263✔
1043
    return LEFTFOLD;
×
1044
  } else if (n == -1.0) {  /* x o -1 ==> -x */
263✔
1045
    IRRef op1 = fins->op1;
×
1046
    fins->op2 = (IRRef1)lj_ir_ksimd(J, LJ_KSIMD_NEG);  /* Modifies fins. */
×
1047
    fins->op1 = op1;
×
1048
    fins->o = IR_NEG;
×
1049
    return RETRYFOLD;
×
1050
  } else if (fins->o == IR_MUL && n == 2.0) {  /* x * 2 ==> x + x */
263✔
1051
    fins->o = IR_ADD;
2✔
1052
    fins->op2 = fins->op1;
2✔
1053
    return RETRYFOLD;
2✔
1054
  } else if (fins->o == IR_DIV) {  /* x / 2^k ==> x * 2^-k */
261✔
1055
    uint64_t u = ir_knum(fright)->u64;
132✔
1056
    uint32_t ex = ((uint32_t)(u >> 52) & 0x7ff);
132✔
1057
    if ((u & U64x(000fffff,ffffffff)) == 0 && ex - 1 < 0x7fd) {
132✔
1058
      u = (u & ((uint64_t)1 << 63)) | ((uint64_t)(0x7fe - ex) << 52);
9✔
1059
      fins->o = IR_MUL;  /* Multiply by exact reciprocal. */
9✔
1060
      fins->op2 = lj_ir_knum_u64(J, u);
9✔
1061
      return RETRYFOLD;
9✔
1062
    }
1063
  }
1064
  return NEXTFOLD;
1065
}
1066

1067
LJFOLD(MUL NEG KNUM)
1068
LJFOLD(DIV NEG KNUM)
1069
LJFOLDF(simplify_nummuldiv_negk)
×
1070
{
1071
  PHIBARRIER(fleft);
×
1072
  fins->op1 = fleft->op1;  /* (-a) o k ==> a o (-k) */
×
1073
  fins->op2 = (IRRef1)lj_ir_knum(J, -knumright);
×
1074
  return RETRYFOLD;
×
1075
}
1076

1077
LJFOLD(MUL NEG NEG)
1078
LJFOLD(DIV NEG NEG)
1079
LJFOLDF(simplify_nummuldiv_negneg)
×
1080
{
1081
  PHIBARRIER(fleft);
×
1082
  PHIBARRIER(fright);
×
1083
  fins->op1 = fleft->op1;  /* (-a) o (-b) ==> a o b */
×
1084
  fins->op2 = fright->op1;
×
1085
  return RETRYFOLD;
×
1086
}
1087

1088
LJFOLD(POW any KNUM)
1089
LJFOLDF(simplify_numpow_k)
4✔
1090
{
1091
  if (knumright == 0)  /* x ^ 0 ==> 1 */
4✔
1092
    return lj_ir_knum_one(J);  /* Result must be a number, not an int. */
×
1093
  else if (knumright == 1)  /* x ^ 1 ==> x */
4✔
1094
    return LEFTFOLD;
×
1095
  else if (knumright == 2)  /* x ^ 2 ==> x * x */
4✔
1096
    return emitir(IRTN(IR_MUL), fins->op1, fins->op1);
×
1097
  else
1098
    return NEXTFOLD;
1099
}
1100

1101
/* -- Simplify conversions ------------------------------------------------ */
1102

1103
LJFOLD(CONV CONV IRCONV_NUM_INT)  /* _NUM */
1104
LJFOLDF(shortcut_conv_num_int)
27✔
1105
{
1106
  PHIBARRIER(fleft);
27✔
1107
  /* Only safe with a guarded conversion to int. */
1108
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_NUM && irt_isguard(fleft->t))
27✔
1109
    return fleft->op1;  /* f(g(x)) ==> x */
8✔
1110
  return NEXTFOLD;
1111
}
1112

1113
LJFOLD(CONV CONV IRCONV_INT_NUM)  /* _INT */
1114
LJFOLD(CONV CONV IRCONV_U32_NUM)  /* _U32*/
1115
LJFOLDF(simplify_conv_int_num)
134✔
1116
{
1117
  /* Fold even across PHI to avoid expensive num->int conversions in loop. */
1118
  if ((fleft->op2 & IRCONV_SRCMASK) ==
134✔
1119
      ((fins->op2 & IRCONV_DSTMASK) >> IRCONV_DSH))
134✔
1120
    return fleft->op1;
20✔
1121
  return NEXTFOLD;
1122
}
1123

1124
LJFOLD(CONV CONV IRCONV_I64_NUM)  /* _INT or _U32 */
1125
LJFOLD(CONV CONV IRCONV_U64_NUM)  /* _INT or _U32 */
1126
LJFOLDF(simplify_conv_i64_num)
115✔
1127
{
1128
  PHIBARRIER(fleft);
115✔
1129
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT) {
114✔
1130
    /* Reduce to a sign-extension. */
1131
    fins->op1 = fleft->op1;
×
1132
    fins->op2 = ((IRT_I64<<5)|IRT_INT|IRCONV_SEXT);
×
1133
    return RETRYFOLD;
×
1134
  } else if ((fleft->op2 & IRCONV_SRCMASK) == IRT_U32) {
114✔
1135
#if LJ_TARGET_X64
1136
    return fleft->op1;
1✔
1137
#else
1138
    /* Reduce to a zero-extension. */
1139
    fins->op1 = fleft->op1;
1140
    fins->op2 = (IRT_I64<<5)|IRT_U32;
1141
    return RETRYFOLD;
1142
#endif
1143
  }
1144
  return NEXTFOLD;
1145
}
1146

1147
LJFOLD(CONV CONV IRCONV_INT_I64)  /* _INT or _U32 */
1148
LJFOLD(CONV CONV IRCONV_INT_U64)  /* _INT or _U32 */
1149
LJFOLD(CONV CONV IRCONV_U32_I64)  /* _INT or _U32 */
1150
LJFOLD(CONV CONV IRCONV_U32_U64)  /* _INT or _U32 */
1151
LJFOLDF(simplify_conv_int_i64)
2✔
1152
{
1153
  int src;
2✔
1154
  PHIBARRIER(fleft);
2✔
1155
  src = (fleft->op2 & IRCONV_SRCMASK);
2✔
1156
  if (src == IRT_INT || src == IRT_U32) {
2✔
1157
    if (src == ((fins->op2 & IRCONV_DSTMASK) >> IRCONV_DSH)) {
2✔
1158
      return fleft->op1;
×
1159
    } else {
1160
      fins->op2 = ((fins->op2 & IRCONV_DSTMASK) | src);
2✔
1161
      fins->op1 = fleft->op1;
2✔
1162
      return RETRYFOLD;
2✔
1163
    }
1164
  }
1165
  return NEXTFOLD;
1166
}
1167

1168
LJFOLD(CONV CONV IRCONV_FLOAT_NUM)  /* _FLOAT */
1169
LJFOLDF(simplify_conv_flt_num)
5✔
1170
{
1171
  PHIBARRIER(fleft);
5✔
1172
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_FLOAT)
5✔
1173
    return fleft->op1;
1✔
1174
  return NEXTFOLD;
1175
}
1176

1177
/* Shortcut TOBIT + IRT_NUM <- IRT_INT/IRT_U32 conversion. */
1178
LJFOLD(TOBIT CONV KNUM)
1179
LJFOLDF(simplify_tobit_conv)
41✔
1180
{
1181
  /* Fold even across PHI to avoid expensive num->int conversions in loop. */
1182
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT) {
41✔
1183
    lj_assertJ(irt_isnum(fleft->t), "expected TOBIT number arg");
7✔
1184
    return fleft->op1;
7✔
1185
  } else if ((fleft->op2 & IRCONV_SRCMASK) == IRT_U32) {
34✔
1186
    lj_assertJ(irt_isnum(fleft->t), "expected TOBIT number arg");
2✔
1187
    fins->o = IR_CONV;
2✔
1188
    fins->op1 = fleft->op1;
2✔
1189
    fins->op2 = (IRT_INT<<5)|IRT_U32;
2✔
1190
    return RETRYFOLD;
2✔
1191
  }
1192
  return NEXTFOLD;
1193
}
1194

1195
/* Shortcut floor/ceil/round + IRT_NUM <- IRT_INT/IRT_U32 conversion. */
1196
LJFOLD(FPMATH CONV IRFPM_FLOOR)
1197
LJFOLD(FPMATH CONV IRFPM_CEIL)
1198
LJFOLD(FPMATH CONV IRFPM_TRUNC)
1199
LJFOLDF(simplify_floor_conv)
×
1200
{
1201
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT ||
×
1202
      (fleft->op2 & IRCONV_SRCMASK) == IRT_U32)
1203
    return LEFTFOLD;
×
1204
  return NEXTFOLD;
1205
}
1206

1207
/* Strength reduction of widening. */
1208
LJFOLD(CONV any IRCONV_I64_INT)
1209
LJFOLD(CONV any IRCONV_U64_INT)
1210
LJFOLDF(simplify_conv_sext)
10,630✔
1211
{
1212
  IRRef ref = fins->op1;
10,630✔
1213
  int64_t ofs = 0;
10,630✔
1214
  if (!(fins->op2 & IRCONV_SEXT))
10,630✔
1215
    return NEXTFOLD;
1216
  PHIBARRIER(fleft);
10,603✔
1217
  if (fleft->o == IR_XLOAD && (irt_isu8(fleft->t) || irt_isu16(fleft->t)))
10,600✔
1218
    goto ok_reduce;
×
1219
  if (fleft->o == IR_ADD && irref_isk(fleft->op2)) {
10,600✔
1220
    ofs = (int64_t)IR(fleft->op2)->i;
×
1221
    ref = fleft->op1;
×
1222
  }
1223
  /* Use scalar evolution analysis results to strength-reduce sign-extension. */
1224
  if (ref == J->scev.idx) {
10,600✔
1225
    IRRef lo = J->scev.dir ? J->scev.start : J->scev.stop;
224✔
1226
    lj_assertJ(irt_isint(J->scev.t), "only int SCEV supported");
224✔
1227
    if (lo && IR(lo)->o == IR_KINT && IR(lo)->i + ofs >= 0) {
224✔
1228
    ok_reduce:
221✔
1229
#if LJ_TARGET_X64
1230
      /* Eliminate widening. All 32 bit ops do an implicit zero-extension. */
1231
      return LEFTFOLD;
221✔
1232
#else
1233
      /* Reduce to a (cheaper) zero-extension. */
1234
      fins->op2 &= ~IRCONV_SEXT;
1235
      return RETRYFOLD;
1236
#endif
1237
    }
1238
  }
1239
  return NEXTFOLD;
1240
}
1241

1242
/* Strength reduction of narrowing. */
1243
LJFOLD(CONV ADD IRCONV_INT_I64)
1244
LJFOLD(CONV SUB IRCONV_INT_I64)
1245
LJFOLD(CONV MUL IRCONV_INT_I64)
1246
LJFOLD(CONV ADD IRCONV_INT_U64)
1247
LJFOLD(CONV SUB IRCONV_INT_U64)
1248
LJFOLD(CONV MUL IRCONV_INT_U64)
1249
LJFOLD(CONV ADD IRCONV_U32_I64)
1250
LJFOLD(CONV SUB IRCONV_U32_I64)
1251
LJFOLD(CONV MUL IRCONV_U32_I64)
1252
LJFOLD(CONV ADD IRCONV_U32_U64)
1253
LJFOLD(CONV SUB IRCONV_U32_U64)
1254
LJFOLD(CONV MUL IRCONV_U32_U64)
1255
LJFOLDF(simplify_conv_narrow)
2✔
1256
{
1257
  IROp op = (IROp)fleft->o;
2✔
1258
  IRType t = irt_type(fins->t);
2✔
1259
  IRRef op1 = fleft->op1, op2 = fleft->op2, mode = fins->op2;
2✔
1260
  PHIBARRIER(fleft);
2✔
1261
  op1 = emitir(IRT(IR_CONV, t), op1, mode);
2✔
1262
  op2 = emitir(IRT(IR_CONV, t), op2, mode);
2✔
1263
  fins->ot = IRT(op, t);
2✔
1264
  fins->op1 = op1;
2✔
1265
  fins->op2 = op2;
2✔
1266
  return RETRYFOLD;
2✔
1267
}
1268

1269
/* Special CSE rule for CONV. */
1270
LJFOLD(CONV any any)
1271
LJFOLDF(cse_conv)
77,307✔
1272
{
1273
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
77,307✔
1274
    IRRef op1 = fins->op1, op2 = (fins->op2 & IRCONV_MODEMASK);
77,307✔
1275
    uint8_t guard = irt_isguard(fins->t);
77,307✔
1276
    IRRef ref = J->chain[IR_CONV];
77,307✔
1277
    while (ref > op1) {
105,657✔
1278
      IRIns *ir = IR(ref);
51,041✔
1279
      /* Commoning with stronger checks is ok. */
1280
      if (ir->op1 == op1 && (ir->op2 & IRCONV_MODEMASK) == op2 &&
51,041✔
1281
          irt_isguard(ir->t) >= guard)
22,691✔
1282
        return ref;
22,691✔
1283
      ref = ir->prev;
28,350✔
1284
    }
1285
  }
1286
  return EMITFOLD;  /* No fallthrough to regular CSE. */
54,616✔
1287
}
1288

1289
/* FP conversion narrowing. */
1290
LJFOLD(TOBIT ADD KNUM)
1291
LJFOLD(TOBIT SUB KNUM)
1292
LJFOLD(CONV ADD IRCONV_INT_NUM)
1293
LJFOLD(CONV SUB IRCONV_INT_NUM)
1294
LJFOLD(CONV ADD IRCONV_I64_NUM)
1295
LJFOLD(CONV SUB IRCONV_I64_NUM)
1296
LJFOLDF(narrow_convert)
10,608✔
1297
{
1298
  PHIBARRIER(fleft);
10,608✔
1299
  /* Narrowing ignores PHIs and repeating it inside the loop is not useful. */
1300
  if (J->chain[IR_LOOP])
10,513✔
1301
    return NEXTFOLD;
1302
  lj_assertJ(fins->o != IR_CONV || (fins->op2&IRCONV_CONVMASK) != IRCONV_TOBIT,
10,468✔
1303
             "unexpected CONV TOBIT");
1304
  return lj_opt_narrow_convert(J);
10,468✔
1305
}
1306

1307
/* -- Integer algebraic simplifications ----------------------------------- */
1308

1309
LJFOLD(ADD any KINT)
1310
LJFOLD(ADDOV any KINT)
1311
LJFOLD(SUBOV any KINT)
1312
LJFOLDF(simplify_intadd_k)
4,363✔
1313
{
1314
  if (fright->i == 0)  /* i o 0 ==> i */
4,363✔
1315
    return LEFTFOLD;
146✔
1316
  return NEXTFOLD;
1317
}
1318

1319
LJFOLD(MULOV any KINT)
1320
LJFOLDF(simplify_intmul_k)
4✔
1321
{
1322
  if (fright->i == 0)  /* i * 0 ==> 0 */
4✔
1323
    return RIGHTFOLD;
×
1324
  if (fright->i == 1)  /* i * 1 ==> i */
4✔
1325
    return LEFTFOLD;
×
1326
  if (fright->i == 2) {  /* i * 2 ==> i + i */
4✔
1327
    fins->o = IR_ADDOV;
×
1328
    fins->op2 = fins->op1;
×
1329
    return RETRYFOLD;
×
1330
  }
1331
  return NEXTFOLD;
1332
}
1333

1334
LJFOLD(SUB any KINT)
1335
LJFOLDF(simplify_intsub_k)
120✔
1336
{
1337
  if (fright->i == 0)  /* i - 0 ==> i */
120✔
1338
    return LEFTFOLD;
70✔
1339
  fins->o = IR_ADD;  /* i - k ==> i + (-k) */
50✔
1340
  fins->op2 = (IRRef1)lj_ir_kint(J, (int32_t)(~(uint32_t)fright->i+1u));  /* Overflow for -2^31 ok. */
50✔
1341
  return RETRYFOLD;
50✔
1342
}
1343

1344
LJFOLD(SUB KINT any)
1345
LJFOLD(SUB KINT64 any)
1346
LJFOLDF(simplify_intsub_kleft)
24✔
1347
{
1348
  if (fleft->o == IR_KINT ? (fleft->i == 0) : (ir_kint64(fleft)->u64 == 0)) {
24✔
1349
    fins->o = IR_NEG;  /* 0 - i ==> -i */
11✔
1350
    fins->op1 = fins->op2;
11✔
1351
    return RETRYFOLD;
11✔
1352
  }
1353
  return NEXTFOLD;
1354
}
1355

1356
LJFOLD(ADD any KINT64)
1357
LJFOLDF(simplify_intadd_k64)
31,382✔
1358
{
1359
  if (ir_kint64(fright)->u64 == 0)  /* i + 0 ==> i */
31,382✔
1360
    return LEFTFOLD;
1,559✔
1361
  return NEXTFOLD;
1362
}
1363

1364
LJFOLD(SUB any KINT64)
1365
LJFOLDF(simplify_intsub_k64)
17✔
1366
{
1367
  uint64_t k = ir_kint64(fright)->u64;
17✔
1368
  if (k == 0)  /* i - 0 ==> i */
17✔
1369
    return LEFTFOLD;
×
1370
  fins->o = IR_ADD;  /* i - k ==> i + (-k) */
17✔
1371
  fins->op2 = (IRRef1)lj_ir_kint64(J, ~k+1u);
17✔
1372
  return RETRYFOLD;
17✔
1373
}
1374

1375
static TRef simplify_intmul_k(jit_State *J, int32_t k)
25,998✔
1376
{
1377
  /* Note: many more simplifications are possible, e.g. 2^k1 +- 2^k2.
1378
  ** But this is mainly intended for simple address arithmetic.
1379
  ** Also it's easier for the backend to optimize the original multiplies.
1380
  */
1381
  if (k == 0) {  /* i * 0 ==> 0 */
25,998✔
1382
    return RIGHTFOLD;
×
1383
  } else if (k == 1) {  /* i * 1 ==> i */
25,998✔
1384
    return LEFTFOLD;
20,059✔
1385
  } else if ((k & (k-1)) == 0) {  /* i * 2^k ==> i << k */
5,939✔
1386
    fins->o = IR_BSHL;
5,937✔
1387
    fins->op2 = lj_ir_kint(J, lj_fls((uint32_t)k));
5,937✔
1388
    return RETRYFOLD;
5,937✔
1389
  }
1390
  return NEXTFOLD;
1391
}
1392

1393
LJFOLD(MUL any KINT)
1394
LJFOLDF(simplify_intmul_k32)
×
1395
{
1396
  if (fright->i >= 0)
×
1397
    return simplify_intmul_k(J, fright->i);
×
1398
  return NEXTFOLD;
1399
}
1400

1401
LJFOLD(MUL any KINT64)
1402
LJFOLDF(simplify_intmul_k64)
28,886✔
1403
{
1404
#if LJ_HASFFI
1405
  if (ir_kint64(fright)->u64 < 0x80000000u)
28,886✔
1406
    return simplify_intmul_k(J, (int32_t)ir_kint64(fright)->u64);
25,998✔
1407
  return NEXTFOLD;
1408
#else
1409
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
1410
#endif
1411
}
1412

1413
LJFOLD(MOD any KINT)
1414
LJFOLDF(simplify_intmod_k)
70✔
1415
{
1416
  int32_t k = fright->i;
70✔
1417
  lj_assertJ(k != 0, "integer mod 0");
70✔
1418
  if (k > 0 && (k & (k-1)) == 0) {  /* i % (2^k) ==> i & (2^k-1) */
70✔
1419
    fins->o = IR_BAND;
6✔
1420
    fins->op2 = lj_ir_kint(J, k-1);
6✔
1421
    return RETRYFOLD;
6✔
1422
  }
1423
  return NEXTFOLD;
1424
}
1425

1426
LJFOLD(MOD KINT any)
1427
LJFOLDF(simplify_intmod_kleft)
2✔
1428
{
1429
  if (fleft->i == 0)
2✔
1430
    return INTFOLD(0);
×
1431
  return NEXTFOLD;
1432
}
1433

1434
LJFOLD(SUB any any)
1435
LJFOLD(SUBOV any any)
1436
LJFOLDF(simplify_intsub)
86,939✔
1437
{
1438
  if (fins->op1 == fins->op2 && !irt_isnum(fins->t))  /* i - i ==> 0 */
86,939✔
1439
    return irt_is64(fins->t) ? INT64FOLD(0) : INTFOLD(0);
×
1440
  return NEXTFOLD;
1441
}
1442

1443
LJFOLD(SUB ADD any)
1444
LJFOLDF(simplify_intsubadd_leftcancel)
27✔
1445
{
1446
  if (!irt_isnum(fins->t)) {
27✔
1447
    PHIBARRIER(fleft);
21✔
1448
    if (fins->op2 == fleft->op1)  /* (i + j) - i ==> j */
20✔
1449
      return fleft->op2;
17✔
1450
    if (fins->op2 == fleft->op2)  /* (i + j) - j ==> i */
3✔
1451
      return fleft->op1;
×
1452
  }
1453
  return NEXTFOLD;
1454
}
1455

1456
LJFOLD(SUB SUB any)
1457
LJFOLDF(simplify_intsubsub_leftcancel)
321✔
1458
{
1459
  if (!irt_isnum(fins->t)) {
321✔
1460
    PHIBARRIER(fleft);
4✔
1461
    if (fins->op2 == fleft->op1) {  /* (i - j) - i ==> 0 - j */
2✔
1462
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
×
1463
      fins->op2 = fleft->op2;
×
1464
      return RETRYFOLD;
×
1465
    }
1466
  }
1467
  return NEXTFOLD;
1468
}
1469

1470
LJFOLD(SUB any SUB)
1471
LJFOLDF(simplify_intsubsub_rightcancel)
×
1472
{
1473
  if (!irt_isnum(fins->t)) {
×
1474
    PHIBARRIER(fright);
×
1475
    if (fins->op1 == fright->op1)  /* i - (i - j) ==> j */
×
1476
      return fright->op2;
×
1477
  }
1478
  return NEXTFOLD;
1479
}
1480

1481
LJFOLD(SUB any ADD)
1482
LJFOLDF(simplify_intsubadd_rightcancel)
57✔
1483
{
1484
  if (!irt_isnum(fins->t)) {
57✔
1485
    PHIBARRIER(fright);
54✔
1486
    if (fins->op1 == fright->op1) {  /* i - (i + j) ==> 0 - j */
52✔
1487
      fins->op2 = fright->op2;
11✔
1488
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
11✔
1489
      return RETRYFOLD;
11✔
1490
    }
1491
    if (fins->op1 == fright->op2) {  /* i - (j + i) ==> 0 - j */
41✔
1492
      fins->op2 = fright->op1;
5✔
1493
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
5✔
1494
      return RETRYFOLD;
5✔
1495
    }
1496
  }
1497
  return NEXTFOLD;
1498
}
1499

1500
LJFOLD(SUB ADD ADD)
1501
LJFOLDF(simplify_intsubaddadd_cancel)
15✔
1502
{
1503
  if (!irt_isnum(fins->t)) {
15✔
1504
    PHIBARRIER(fleft);
15✔
1505
    PHIBARRIER(fright);
14✔
1506
    if (fleft->op1 == fright->op1) {  /* (i + j1) - (i + j2) ==> j1 - j2 */
13✔
1507
      fins->op1 = fleft->op2;
12✔
1508
      fins->op2 = fright->op2;
12✔
1509
      return RETRYFOLD;
12✔
1510
    }
1511
    if (fleft->op1 == fright->op2) {  /* (i + j1) - (j2 + i) ==> j1 - j2 */
1✔
1512
      fins->op1 = fleft->op2;
×
1513
      fins->op2 = fright->op1;
×
1514
      return RETRYFOLD;
×
1515
    }
1516
    if (fleft->op2 == fright->op1) {  /* (j1 + i) - (i + j2) ==> j1 - j2 */
1✔
1517
      fins->op1 = fleft->op1;
×
1518
      fins->op2 = fright->op2;
×
1519
      return RETRYFOLD;
×
1520
    }
1521
    if (fleft->op2 == fright->op2) {  /* (j1 + i) - (j2 + i) ==> j1 - j2 */
1✔
1522
      fins->op1 = fleft->op1;
×
1523
      fins->op2 = fright->op1;
×
1524
      return RETRYFOLD;
×
1525
    }
1526
  }
1527
  return NEXTFOLD;
1528
}
1529

1530
LJFOLD(BAND any KINT)
1531
LJFOLD(BAND any KINT64)
1532
LJFOLDF(simplify_band_k)
36,914✔
1533
{
1534
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
36,914✔
1535
                                     (int64_t)ir_k64(fright)->u64;
6✔
1536
  if (k == 0)  /* i & 0 ==> 0 */
36,914✔
1537
    return RIGHTFOLD;
1✔
1538
  if (k == -1)  /* i & -1 ==> i */
36,913✔
1539
    return LEFTFOLD;
×
1540
  return NEXTFOLD;
1541
}
1542

1543
LJFOLD(BOR any KINT)
1544
LJFOLD(BOR any KINT64)
1545
LJFOLDF(simplify_bor_k)
56✔
1546
{
1547
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
56✔
1548
                                     (int64_t)ir_k64(fright)->u64;
6✔
1549
  if (k == 0)  /* i | 0 ==> i */
56✔
1550
    return LEFTFOLD;
×
1551
  if (k == -1)  /* i | -1 ==> -1 */
56✔
1552
    return RIGHTFOLD;
×
1553
  return NEXTFOLD;
1554
}
1555

1556
LJFOLD(BXOR any KINT)
1557
LJFOLD(BXOR any KINT64)
1558
LJFOLDF(simplify_bxor_k)
10✔
1559
{
1560
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
10✔
1561
                                     (int64_t)ir_k64(fright)->u64;
6✔
1562
  if (k == 0)  /* i xor 0 ==> i */
10✔
1563
    return LEFTFOLD;
4✔
1564
  if (k == -1) {  /* i xor -1 ==> ~i */
6✔
1565
    fins->o = IR_BNOT;
×
1566
    fins->op2 = 0;
×
1567
    return RETRYFOLD;
×
1568
  }
1569
  return NEXTFOLD;
1570
}
1571

1572
LJFOLD(BSHL any KINT)
1573
LJFOLD(BSHR any KINT)
1574
LJFOLD(BSAR any KINT)
1575
LJFOLD(BROL any KINT)
1576
LJFOLD(BROR any KINT)
1577
LJFOLDF(simplify_shift_ik)
6,138✔
1578
{
1579
  int32_t mask = irt_is64(fins->t) ? 63 : 31;
6,138✔
1580
  int32_t k = (fright->i & mask);
6,138✔
1581
  if (k == 0)  /* i o 0 ==> i */
6,138✔
1582
    return LEFTFOLD;
1✔
1583
  if (k == 1 && fins->o == IR_BSHL) {  /* i << 1 ==> i + i */
6,137✔
1584
    fins->o = IR_ADD;
18✔
1585
    fins->op2 = fins->op1;
18✔
1586
    return RETRYFOLD;
18✔
1587
  }
1588
  if (k != fright->i) {  /* i o k ==> i o (k & mask) */
6,119✔
1589
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
×
1590
    return RETRYFOLD;
×
1591
  }
1592
#ifndef LJ_TARGET_UNIFYROT
1593
  if (fins->o == IR_BROR) {  /* bror(i, k) ==> brol(i, (-k)&mask) */
6,119✔
1594
    fins->o = IR_BROL;
×
1595
    fins->op2 = (IRRef1)lj_ir_kint(J, (-k)&mask);
×
1596
    return RETRYFOLD;
×
1597
  }
1598
#endif
1599
  return NEXTFOLD;
1600
}

LJFOLD(BSHL any BAND)
LJFOLD(BSHR any BAND)
LJFOLD(BSAR any BAND)
LJFOLD(BROL any BAND)
LJFOLD(BROR any BAND)
LJFOLDF(simplify_shift_andk)
{
  IRIns *irk = IR(fright->op2);
  PHIBARRIER(fright);
  if ((fins->o < IR_BROL ? LJ_TARGET_MASKSHIFT : LJ_TARGET_MASKROT) &&
      irk->o == IR_KINT) {  /* i o (j & mask) ==> i o j */
    int32_t mask = irt_is64(fins->t) ? 63 : 31;
    int32_t k = irk->i & mask;
    if (k == mask) {
      fins->op2 = fright->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(BSHL KINT any)
LJFOLD(BSHR KINT any)
LJFOLD(BSHL KINT64 any)
LJFOLD(BSHR KINT64 any)
LJFOLDF(simplify_shift1_ki)
{
  int64_t k = fleft->o == IR_KINT ? (int64_t)fleft->i :
                                    (int64_t)ir_k64(fleft)->u64;
  if (k == 0)  /* 0 o i ==> 0 */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BSAR KINT any)
LJFOLD(BROL KINT any)
LJFOLD(BROR KINT any)
LJFOLD(BSAR KINT64 any)
LJFOLD(BROL KINT64 any)
LJFOLD(BROR KINT64 any)
LJFOLDF(simplify_shift2_ki)
{
  int64_t k = fleft->o == IR_KINT ? (int64_t)fleft->i :
                                    (int64_t)ir_k64(fleft)->u64;
  if (k == 0 || k == -1)  /* 0 o i ==> 0; -1 o i ==> -1 */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BSHL BAND KINT)
LJFOLD(BSHR BAND KINT)
LJFOLD(BROL BAND KINT)
LJFOLD(BROR BAND KINT)
LJFOLDF(simplify_shiftk_andk)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT) {  /* (i & k1) o k2 ==> (i o k2) & (k1 o k2) */
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    fins->op1 = fleft->op1;
    fins->op1 = (IRRef1)lj_opt_fold(J);
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    fins->ot = IRTI(IR_BAND);
    return RETRYFOLD;
  } else if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, fright->i,
                                  (IROp)fins->o);
    IROpT ot = fleft->ot;
    fins->op1 = fleft->op1;
    fins->op1 = (IRRef1)lj_opt_fold(J);
    fins->op2 = (IRRef1)lj_ir_kint64(J, k);
    fins->ot = ot;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}
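
/* Worked example for simplify_shiftk_andk: (i & 0xff) << 4 is rewritten
** as (i << 4) & 0xff0, i.e. the shift is distributed over the masked
** operand and the constant mask is shifted along with it, which can let
** the rebuilt inner shift fold or CSE with an existing instruction.
*/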

LJFOLD(BAND BSHL KINT)
LJFOLD(BAND BSHR KINT)
LJFOLDF(simplify_andk_shiftk)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT &&
      kfold_intop(-1, irk->i, (IROp)fleft->o) == fright->i)
    return LEFTFOLD;  /* (i o k1) & k2 ==> i o k1, if (-1 o k1) == k2 */
  return NEXTFOLD;
}
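
/* Worked example for simplify_andk_shiftk: (i << 8) & 0xffffff00 keeps
** every bit the shift can set, since (-1 << 8) == 0xffffff00, so the
** BAND is redundant and the BSHL result is used directly.
*/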

LJFOLD(BAND BOR KINT)
LJFOLD(BOR BAND KINT)
LJFOLDF(simplify_andor_k)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT) {
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    /* (i | k1) & k2 ==> i & k2, if (k1 & k2) == 0. */
    /* (i & k1) | k2 ==> i | k2, if (k1 | k2) == -1. */
    if (k == (fins->o == IR_BAND ? 0 : -1)) {
      fins->op1 = fleft->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(BAND BOR KINT64)
LJFOLD(BOR BAND KINT64)
LJFOLDF(simplify_andor_k64)
{
#if LJ_HASFFI
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, ir_k64(fright)->u64,
                                  (IROp)fins->o);
    /* (i | k1) & k2 ==> i & k2, if (k1 & k2) == 0. */
    /* (i & k1) | k2 ==> i | k2, if (k1 | k2) == -1. */
    if (k == (fins->o == IR_BAND ? (uint64_t)0 : ~(uint64_t)0)) {
      fins->op1 = fleft->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

/* -- Reassociation ------------------------------------------------------- */

LJFOLD(ADD ADD KINT)
LJFOLD(MUL MUL KINT)
LJFOLD(BAND BAND KINT)
LJFOLD(BOR BOR KINT)
LJFOLD(BXOR BXOR KINT)
LJFOLDF(reassoc_intarith_k)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT) {
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    if (k == irk->i)  /* (i o k1) o k2 ==> i o k1, if (k1 o k2) == k1. */
      return LEFTFOLD;
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    return RETRYFOLD;  /* (i o k1) o k2 ==> i o (k1 o k2) */
  }
  return NEXTFOLD;
}
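
/* Worked example for reassoc_intarith_k: (i + 1) + 2 becomes i + 3 and
** (i & 0xff) & 0x0f becomes i & 0x0f, while a no-op combination such as
** (i | 3) | 1 simply returns the existing (i | 3) via LEFTFOLD.
*/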

LJFOLD(ADD ADD KINT64)
LJFOLD(MUL MUL KINT64)
LJFOLD(BAND BAND KINT64)
LJFOLD(BOR BOR KINT64)
LJFOLD(BXOR BXOR KINT64)
LJFOLDF(reassoc_intarith_k64)
{
#if LJ_HASFFI
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, ir_k64(fright)->u64,
                                  (IROp)fins->o);
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint64(J, k);
    return RETRYFOLD;  /* (i o k1) o k2 ==> i o (k1 o k2) */
  }
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BAND BAND any)
LJFOLD(BOR BOR any)
LJFOLDF(reassoc_dup)
{
  if (fins->op2 == fleft->op1 || fins->op2 == fleft->op2)
    return LEFTFOLD;  /* (a o b) o a ==> a o b; (a o b) o b ==> a o b */
  return NEXTFOLD;
}

LJFOLD(MIN MIN any)
LJFOLD(MAX MAX any)
LJFOLDF(reassoc_dup_minmax)
{
  if (fins->op2 == fleft->op2)
    return LEFTFOLD;  /* (a o b) o b ==> a o b */
  return NEXTFOLD;
}

LJFOLD(BXOR BXOR any)
LJFOLDF(reassoc_bxor)
{
  PHIBARRIER(fleft);
  if (fins->op2 == fleft->op1)  /* (a xor b) xor a ==> b */
    return fleft->op2;
  if (fins->op2 == fleft->op2)  /* (a xor b) xor b ==> a */
    return fleft->op1;
  return NEXTFOLD;
}

LJFOLD(BSHL BSHL KINT)
LJFOLD(BSHR BSHR KINT)
LJFOLD(BSAR BSAR KINT)
LJFOLD(BROL BROL KINT)
LJFOLD(BROR BROR KINT)
LJFOLDF(reassoc_shift)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);  /* The (shift any KINT) rule covers k2 == 0 and more. */
  if (irk->o == IR_KINT) {  /* (i o k1) o k2 ==> i o (k1 + k2) */
    int32_t mask = irt_is64(fins->t) ? 63 : 31;
    int32_t k = (irk->i & mask) + (fright->i & mask);
    if (k > mask) {  /* Combined shift too wide? */
      if (fins->o == IR_BSHL || fins->o == IR_BSHR)
        return mask == 31 ? INTFOLD(0) : INT64FOLD(0);
      else if (fins->o == IR_BSAR)
        k = mask;
      else
        k &= mask;
    }
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    return RETRYFOLD;
  }
  return NEXTFOLD;
}
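
/* Worked example for reassoc_shift: (i >> 20) >> 20 would shift out all
** 32 bits, so the logical-shift case folds to the constant 0, a BSAR
** chain clamps the combined count to 31, and rotate counts simply wrap
** around modulo the width.
*/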

LJFOLD(MIN MIN KINT)
LJFOLD(MAX MAX KINT)
LJFOLDF(reassoc_minmax_k)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT) {
    int32_t a = irk->i;
    int32_t y = kfold_intop(a, fright->i, fins->o);
    if (a == y)  /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
      return LEFTFOLD;
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, y);
    return RETRYFOLD;  /* (x o k1) o k2 ==> x o (k1 o k2) */
  }
  return NEXTFOLD;
}

/* -- Array bounds check elimination -------------------------------------- */

/* Eliminate ABC across PHIs to handle t[i-1] forwarding case.
** ABC(asize, (i+k)+(-k)) ==> ABC(asize, i), but only if it already exists.
** Could be generalized to (i+k1)+k2 ==> i+(k1+k2), but needs better disambig.
*/
LJFOLD(ABC any ADD)
LJFOLDF(abc_fwd)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_ABC)) {
    if (irref_isk(fright->op2)) {
      IRIns *add2 = IR(fright->op1);
      if (add2->o == IR_ADD && irref_isk(add2->op2) &&
          IR(fright->op2)->i == -IR(add2->op2)->i) {
        IRRef ref = J->chain[IR_ABC];
        IRRef lim = add2->op1;
        if (fins->op1 > lim) lim = fins->op1;
        while (ref > lim) {
          IRIns *ir = IR(ref);
          if (ir->op1 == fins->op1 && ir->op2 == add2->op1)
            return DROPFOLD;
          ref = ir->prev;
        }
      }
    }
  }
  return NEXTFOLD;
}

/* Eliminate ABC for constants.
** ABC(asize, k1), ABC(asize, k2) ==> ABC(asize, max(k1, k2))
** Drop second ABC if k2 is lower. Otherwise patch first ABC with k2.
*/
LJFOLD(ABC any KINT)
LJFOLDF(abc_k)
{
  PHIBARRIER(fleft);
  if (LJ_LIKELY(J->flags & JIT_F_OPT_ABC)) {
    IRRef ref = J->chain[IR_ABC];
    IRRef asize = fins->op1;
    while (ref > asize) {
      IRIns *ir = IR(ref);
      if (ir->op1 == asize && irref_isk(ir->op2)) {
        uint32_t k = (uint32_t)IR(ir->op2)->i;
        if ((uint32_t)fright->i > k)
          ir->op2 = fins->op2;
        return DROPFOLD;
      }
      ref = ir->prev;
    }
    return EMITFOLD;  /* Already performed CSE. */
  }
  return NEXTFOLD;
}
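
/* Worked example for abc_k: after ABC(asize, 5), a later ABC(asize, 3)
** is simply dropped, while ABC(asize, 9) instead patches the earlier
** check's constant up to 9 and then drops itself, so a single bounds
** check with the maximum index covers all constant accesses.
*/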

/* Eliminate invariant ABC inside loop. */
LJFOLD(ABC any any)
LJFOLDF(abc_invar)
{
  /* Invariant ABC marked as PTR. Drop if op1 is invariant, too. */
  if (!irt_isint(fins->t) && fins->op1 < J->chain[IR_LOOP] &&
      !irt_isphi(IR(fins->op1)->t))
    return DROPFOLD;
  return NEXTFOLD;
}

/* -- Commutativity ------------------------------------------------------- */

/* The refs of commutative ops are canonicalized. Lower refs go to the right.
** Rationale behind this:
** - It (also) moves constants to the right.
** - It reduces the number of FOLD rules (e.g. (BOR any KINT) suffices).
** - It helps CSE to find more matches.
** - The assembler generates better code with constants at the right.
*/

LJFOLD(ADD any any)
LJFOLD(MUL any any)
LJFOLD(ADDOV any any)
LJFOLD(MULOV any any)
LJFOLDF(comm_swap)
{
  if (fins->op1 < fins->op2) {  /* Move lower ref to the right. */
    IRRef1 tmp = fins->op1;
    fins->op1 = fins->op2;
    fins->op2 = tmp;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(EQ any any)
LJFOLD(NE any any)
LJFOLDF(comm_equal)
{
  /* For non-numbers only: x == x ==> drop; x ~= x ==> fail */
  if (fins->op1 == fins->op2 &&
      (!irt_isnum(fins->t) ||
       (fleft->o == IR_CONV &&  /* Converted integers cannot be NaN. */
        (uint32_t)(fleft->op2 & IRCONV_SRCMASK) - (uint32_t)IRT_I8 <= (uint32_t)(IRT_U64 - IRT_U8))))
    return CONDFOLD(fins->o == IR_EQ);
  return fold_comm_swap(J);
}

LJFOLD(LT any any)
LJFOLD(GE any any)
LJFOLD(LE any any)
LJFOLD(GT any any)
LJFOLD(ULT any any)
LJFOLD(UGE any any)
LJFOLD(ULE any any)
LJFOLD(UGT any any)
LJFOLDF(comm_comp)
{
  /* For non-numbers only: x <=> x ==> drop; x <> x ==> fail */
  if (fins->op1 == fins->op2 && !irt_isnum(fins->t))
    return CONDFOLD((fins->o ^ (fins->o >> 1)) & 1);
  if (fins->op1 < fins->op2) {  /* Move lower ref to the right. */
    IRRef1 tmp = fins->op1;
    fins->op1 = fins->op2;
    fins->op2 = tmp;
    fins->o ^= 3; /* GT <-> LT, GE <-> LE, does not affect U */
    return RETRYFOLD;
  }
  return NEXTFOLD;
}
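
/* Worked example for comm_comp: the comparison opcodes are laid out in
** the order LT GE LE GT / ULT UGE ULE UGT, so XOR-ing the opcode with 3
** mirrors the comparison when the operands are swapped: a < b becomes
** b > a and a >= b becomes b <= a, without switching signedness.
*/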

LJFOLD(BAND any any)
LJFOLD(BOR any any)
LJFOLDF(comm_dup)
{
  if (fins->op1 == fins->op2)  /* x o x ==> x */
    return LEFTFOLD;
  return fold_comm_swap(J);
}

LJFOLD(MIN any any)
LJFOLD(MAX any any)
LJFOLDF(comm_dup_minmax)
{
  if (fins->op1 == fins->op2)  /* x o x ==> x */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BXOR any any)
LJFOLDF(comm_bxor)
{
  if (fins->op1 == fins->op2)  /* i xor i ==> 0 */
    return irt_is64(fins->t) ? INT64FOLD(0) : INTFOLD(0);
  return fold_comm_swap(J);
}

/* -- Simplification of compound expressions ------------------------------ */

static TRef kfold_xload(jit_State *J, IRIns *ir, const void *p)
{
  int32_t k;
  switch (irt_type(ir->t)) {
  case IRT_NUM: return lj_ir_knum_u64(J, *(uint64_t *)p);
  case IRT_I8: k = (int32_t)*(int8_t *)p; break;
  case IRT_U8: k = (int32_t)*(uint8_t *)p; break;
  case IRT_I16: k = (int32_t)(int16_t)lj_getu16(p); break;
  case IRT_U16: k = (int32_t)(uint16_t)lj_getu16(p); break;
  case IRT_INT: case IRT_U32: k = (int32_t)lj_getu32(p); break;
  case IRT_I64: case IRT_U64: return lj_ir_kint64(J, *(uint64_t *)p);
  default: return 0;
  }
  return lj_ir_kint(J, k);
}

/* Turn: string.sub(str, a, b) == kstr
** into: string.byte(str, a) == string.byte(kstr, 1) etc.
** Note: this creates unaligned XLOADs on x86/x64.
*/
LJFOLD(EQ SNEW KGC)
LJFOLD(NE SNEW KGC)
LJFOLDF(merge_eqne_snew_kgc)
{
  GCstr *kstr = ir_kstr(fright);
  int32_t len = (int32_t)kstr->len;
  lj_assertJ(irt_isstr(fins->t), "bad equality IR type");

#if LJ_TARGET_UNALIGNED
#define FOLD_SNEW_MAX_LEN  4  /* Handle string lengths 0, 1, 2, 3, 4. */
#define FOLD_SNEW_TYPE8    IRT_I8  /* Creates shorter immediates. */
#else
#define FOLD_SNEW_MAX_LEN  1  /* Handle string lengths 0 or 1. */
#define FOLD_SNEW_TYPE8    IRT_U8  /* Prefer unsigned loads. */
#endif

  PHIBARRIER(fleft);
  if (len <= FOLD_SNEW_MAX_LEN) {
    IROp op = (IROp)fins->o;
    IRRef strref = fleft->op1;
    if (IR(strref)->o != IR_STRREF)
      return NEXTFOLD;
    if (op == IR_EQ) {
      emitir(IRTGI(IR_EQ), fleft->op2, lj_ir_kint(J, len));
      /* Caveat: fins/fleft/fright is no longer valid after emitir. */
    } else {
      /* NE is not expanded since this would need an OR of two conds. */
      if (!irref_isk(fleft->op2))  /* Only handle the constant length case. */
        return NEXTFOLD;
      if (IR(fleft->op2)->i != len)
        return DROPFOLD;
    }
    if (len > 0) {
      /* A 4 byte load for length 3 is ok -- all strings have an extra NUL. */
      uint16_t ot = (uint16_t)(len == 1 ? IRT(IR_XLOAD, FOLD_SNEW_TYPE8) :
                               len == 2 ? IRT(IR_XLOAD, IRT_U16) :
                               IRTI(IR_XLOAD));
      TRef tmp = emitir(ot, strref,
                        IRXLOAD_READONLY | (len > 1 ? IRXLOAD_UNALIGNED : 0));
      TRef val = kfold_xload(J, IR(tref_ref(tmp)), strdata(kstr));
      if (len == 3)
        tmp = emitir(IRTI(IR_BAND), tmp,
                     lj_ir_kint(J, LJ_ENDIAN_SELECT(0x00ffffff, 0xffffff00)));
      fins->op1 = (IRRef1)tmp;
      fins->op2 = (IRRef1)val;
      fins->ot = (IROpT)IRTGI(op);
      return RETRYFOLD;
    } else {
      return DROPFOLD;
    }
  }
  return NEXTFOLD;
}
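
/* Worked example for merge_eqne_snew_kgc on an unaligned-load target:
** comparing a three character substring against the constant "abc"
** emits a guard on the length (3), a 4 byte XLOAD of the substring
** masked with 0x00ffffff (little-endian) to clear the extra byte, and
** an integer compare against the constant bytes folded by kfold_xload.
*/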

/* -- Loads --------------------------------------------------------------- */

/* Loads cannot be folded or passed on to CSE in general.
** Alias analysis is needed to check for forwarding opportunities.
**
** Caveat: *all* loads must be listed here or they end up at CSE!
*/

LJFOLD(ALOAD any)
LJFOLDX(lj_opt_fwd_aload)

/* From HREF fwd (see below). Must eliminate, not supported by fwd/backend. */
LJFOLD(HLOAD KKPTR)
LJFOLDF(kfold_hload_kkptr)
{
  UNUSED(J);
  lj_assertJ(ir_kptr(fleft) == niltvg(J2G(J)), "expected niltv");
  return TREF_NIL;
}

LJFOLD(HLOAD any)
LJFOLDX(lj_opt_fwd_hload)

LJFOLD(ULOAD any)
LJFOLDX(lj_opt_fwd_uload)

LJFOLD(CALLL any IRCALL_lj_tab_len)
LJFOLDX(lj_opt_fwd_tab_len)

/* Upvalue refs are really loads, but there are no corresponding stores.
** So CSE is ok for them, except for UREFO across a GC step (see below).
** If the referenced function is const, its upvalue addresses are const, too.
** This can be used to improve CSE by looking for the same address,
** even if the upvalues originate from a different function.
*/
LJFOLD(UREFO KGC any)
LJFOLD(UREFC KGC any)
LJFOLDF(cse_uref)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    IRRef ref = J->chain[fins->o];
    GCfunc *fn = ir_kfunc(fleft);
    GCupval *uv = gco2uv(gcref(fn->l.uvptr[(fins->op2 >> 8)]));
    while (ref > 0) {
      IRIns *ir = IR(ref);
      if (irref_isk(ir->op1)) {
        GCfunc *fn2 = ir_kfunc(IR(ir->op1));
        if (gco2uv(gcref(fn2->l.uvptr[(ir->op2 >> 8)])) == uv) {
          if (fins->o == IR_UREFO && gcstep_barrier(J, ref))
            break;
          return ref;
        }
      }
      ref = ir->prev;
    }
  }
  return EMITFOLD;
}

LJFOLD(HREFK any any)
LJFOLDX(lj_opt_fwd_hrefk)

LJFOLD(HREF TNEW any)
LJFOLDF(fwd_href_tnew)
{
  if (lj_opt_fwd_href_nokey(J))
    return lj_ir_kkptr(J, niltvg(J2G(J)));
  return NEXTFOLD;
}

LJFOLD(HREF TDUP KPRI)
LJFOLD(HREF TDUP KGC)
LJFOLD(HREF TDUP KNUM)
LJFOLDF(fwd_href_tdup)
{
  TValue keyv;
  lj_ir_kvalue(J->L, &keyv, fright);
  if (lj_tab_get(J->L, ir_ktab(IR(fleft->op1)), &keyv) == niltvg(J2G(J)) &&
      lj_opt_fwd_href_nokey(J))
    return lj_ir_kkptr(J, niltvg(J2G(J)));
  return NEXTFOLD;
}

/* We can safely FOLD/CSE array/hash refs and field loads, since there
** are no corresponding stores. But we need to check for any NEWREF with
** an aliased table, as it may invalidate all of the pointers and fields.
** Only HREF needs the NEWREF check -- AREF and HREFK already depend on
** FLOADs. And NEWREF itself is treated like a store (see below).
** LREF is constant (per trace) since coroutine switches are not inlined.
*/
LJFOLD(FLOAD TNEW IRFL_TAB_ASIZE)
LJFOLDF(fload_tab_tnew_asize)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD(fleft->op1);
  return NEXTFOLD;
}

LJFOLD(FLOAD TNEW IRFL_TAB_HMASK)
LJFOLDF(fload_tab_tnew_hmask)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((1 << fleft->op2)-1);
  return NEXTFOLD;
}

LJFOLD(FLOAD TDUP IRFL_TAB_ASIZE)
LJFOLDF(fload_tab_tdup_asize)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((int32_t)ir_ktab(IR(fleft->op1))->asize);
  return NEXTFOLD;
}

LJFOLD(FLOAD TDUP IRFL_TAB_HMASK)
LJFOLDF(fload_tab_tdup_hmask)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((int32_t)ir_ktab(IR(fleft->op1))->hmask);
  return NEXTFOLD;
}

LJFOLD(HREF any any)
LJFOLD(FLOAD any IRFL_TAB_ARRAY)
LJFOLD(FLOAD any IRFL_TAB_NODE)
LJFOLD(FLOAD any IRFL_TAB_ASIZE)
LJFOLD(FLOAD any IRFL_TAB_HMASK)
LJFOLDF(fload_tab_ah)
{
  TRef tr = lj_opt_cse(J);
  return lj_opt_fwd_tptr(J, tref_ref(tr)) ? tr : EMITFOLD;
}

/* Strings are immutable, so we can safely FOLD/CSE the related FLOAD. */
LJFOLD(FLOAD KGC IRFL_STR_LEN)
LJFOLDF(fload_str_len_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return INTFOLD((int32_t)ir_kstr(fleft)->len);
  return NEXTFOLD;
}

LJFOLD(FLOAD SNEW IRFL_STR_LEN)
LJFOLDF(fload_str_len_snew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    PHIBARRIER(fleft);
    return fleft->op2;
  }
  return NEXTFOLD;
}

LJFOLD(FLOAD TOSTR IRFL_STR_LEN)
LJFOLDF(fload_str_len_tostr)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && fleft->op2 == IRTOSTR_CHAR)
    return INTFOLD(1);
  return NEXTFOLD;
}

/* The C type ID of cdata objects is immutable. */
LJFOLD(FLOAD KGC IRFL_CDATA_CTYPEID)
LJFOLDF(fload_cdata_typeid_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return INTFOLD((int32_t)ir_kcdata(fleft)->ctypeid);
  return NEXTFOLD;
}

/* Get the contents of immutable cdata objects. */
LJFOLD(FLOAD KGC IRFL_CDATA_PTR)
LJFOLD(FLOAD KGC IRFL_CDATA_INT)
LJFOLD(FLOAD KGC IRFL_CDATA_INT64)
LJFOLDF(fload_cdata_int64_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    void *p = cdataptr(ir_kcdata(fleft));
    if (irt_is64(fins->t))
      return INT64FOLD(*(uint64_t *)p);
    else
      return INTFOLD(*(int32_t *)p);
  }
  return NEXTFOLD;
}

LJFOLD(FLOAD CNEW IRFL_CDATA_CTYPEID)
LJFOLD(FLOAD CNEWI IRFL_CDATA_CTYPEID)
LJFOLDF(fload_cdata_typeid_cnew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return fleft->op1;  /* No PHI barrier needed. CNEW/CNEWI op1 is const. */
  return NEXTFOLD;
}

/* Pointer, int and int64 cdata objects are immutable. */
LJFOLD(FLOAD CNEWI IRFL_CDATA_PTR)
LJFOLD(FLOAD CNEWI IRFL_CDATA_INT)
LJFOLD(FLOAD CNEWI IRFL_CDATA_INT64)
LJFOLDF(fload_cdata_ptr_int64_cnew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return fleft->op2;  /* Fold even across PHI to avoid allocations. */
  return NEXTFOLD;
}

LJFOLD(FLOAD any IRFL_STR_LEN)
LJFOLD(FLOAD any IRFL_FUNC_ENV)
LJFOLD(FLOAD any IRFL_THREAD_ENV)
LJFOLD(FLOAD any IRFL_CDATA_CTYPEID)
LJFOLD(FLOAD any IRFL_CDATA_PTR)
LJFOLD(FLOAD any IRFL_CDATA_INT)
LJFOLD(FLOAD any IRFL_CDATA_INT64)
LJFOLD(VLOAD any any)  /* Vararg loads have no corresponding stores. */
LJFOLDX(lj_opt_cse)

/* All other field loads need alias analysis. */
LJFOLD(FLOAD any any)
LJFOLDX(lj_opt_fwd_fload)

/* This is for LOOP only. Recording handles SLOADs internally. */
LJFOLD(SLOAD any any)
LJFOLDF(fwd_sload)
{
  if ((fins->op2 & IRSLOAD_FRAME)) {
    TRef tr = lj_opt_cse(J);
    return tref_ref(tr) < J->chain[IR_RETF] ? EMITFOLD : tr;
  } else {
    lj_assertJ(J->slot[fins->op1] != 0, "uninitialized slot accessed");
    return J->slot[fins->op1];
  }
}

/* Only fold for KKPTR. The pointer _and_ the contents must be const. */
LJFOLD(XLOAD KKPTR any)
LJFOLDF(xload_kptr)
{
  TRef tr = kfold_xload(J, fins, ir_kptr(fleft));
  return tr ? tr : NEXTFOLD;
}

LJFOLD(XLOAD any any)
LJFOLDX(lj_opt_fwd_xload)

/* -- Frame handling ------------------------------------------------------ */

/* Prevent CSE of a REF_BASE operand across IR_RETF. */
LJFOLD(SUB any BASE)
LJFOLD(SUB BASE any)
LJFOLD(EQ any BASE)
LJFOLDF(fold_base)
{
  return lj_opt_cselim(J, J->chain[IR_RETF]);
}

/* -- Write barriers ------------------------------------------------------ */

/* Write barriers are amenable to CSE, but not across any incremental
** GC steps.
**
** The same logic applies to open upvalue references, because a stack
** may be resized during a GC step (not the current stack, but maybe that
** of a coroutine).
*/
LJFOLD(TBAR any)
LJFOLD(OBAR any any)
LJFOLD(UREFO any any)
LJFOLDF(barrier_tab)
{
  TRef tr = lj_opt_cse(J);
  if (gcstep_barrier(J, tref_ref(tr)))  /* CSE across GC step? */
    return EMITFOLD;  /* Raw emit. Assumes fins is left intact by CSE. */
  return tr;
}

LJFOLD(TBAR TNEW)
LJFOLD(TBAR TDUP)
LJFOLDF(barrier_tnew_tdup)
{
  /* New tables are always white and never need a barrier. */
  if (fins->op1 < J->chain[IR_LOOP])  /* Except across a GC step. */
    return NEXTFOLD;
  return DROPFOLD;
}

/* -- Profiling ----------------------------------------------------------- */

LJFOLD(PROF any any)
LJFOLDF(prof)
{
  IRRef ref = J->chain[IR_PROF];
  if (ref+1 == J->cur.nins)  /* Drop neighbouring IR_PROF. */
    return ref;
  return EMITFOLD;
}

/* -- Stores and allocations ---------------------------------------------- */

/* Stores and allocations cannot be folded or passed on to CSE in general.
** But some stores can be eliminated with dead-store elimination (DSE).
**
** Caveat: *all* stores and allocs must be listed here or they end up at CSE!
*/

LJFOLD(ASTORE any any)
LJFOLD(HSTORE any any)
LJFOLDX(lj_opt_dse_ahstore)

LJFOLD(USTORE any any)
LJFOLDX(lj_opt_dse_ustore)

LJFOLD(FSTORE any any)
LJFOLDX(lj_opt_dse_fstore)

LJFOLD(XSTORE any any)
LJFOLDX(lj_opt_dse_xstore)

LJFOLD(NEWREF any any)  /* Treated like a store. */
LJFOLD(CALLA any any)
LJFOLD(CALLL any any)  /* Safeguard fallback. */
LJFOLD(CALLS any any)
LJFOLD(CALLXS any any)
LJFOLD(XBAR)
LJFOLD(RETF any any)  /* Modifies BASE. */
LJFOLD(TNEW any any)
LJFOLD(TDUP any)
LJFOLD(CNEW any any)
LJFOLD(XSNEW any any)
LJFOLD(BUFHDR any any)
LJFOLDX(lj_ir_emit)

/* ------------------------------------------------------------------------ */

/* Every entry in the generated hash table is a 32 bit pattern:
**
** xxxxxxxx iiiiiii lllllll rrrrrrrrrr
**
**   xxxxxxxx = 8 bit index into fold function table
**    iiiiiii = 7 bit folded instruction opcode
**    lllllll = 7 bit left instruction opcode
** rrrrrrrrrr = 8 bit right instruction opcode or 10 bits from literal field
*/
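
/* Worked key example, matching the construction in lj_opt_fold below:
** the folded opcode lands in bits 17-23, the left opcode in bits 10-16
** and the right opcode or literal in bits 0-9, i.e.
** key = (ins << 17) + (left << 10) + right.
*/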

#include "lj_folddef.h"

/* ------------------------------------------------------------------------ */

/* Fold IR instruction. */
TRef LJ_FASTCALL lj_opt_fold(jit_State *J)
{
  uint32_t key, any;
  IRRef ref;

  if (LJ_UNLIKELY((J->flags & JIT_F_OPT_MASK) != JIT_F_OPT_DEFAULT)) {
    lj_assertJ(((JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE|JIT_F_OPT_DSE) |
                JIT_F_OPT_DEFAULT) == JIT_F_OPT_DEFAULT,
               "bad JIT_F_OPT_DEFAULT");
    /* Folding disabled? Chain to CSE, but not for loads/stores/allocs. */
    if (!(J->flags & JIT_F_OPT_FOLD) && irm_kind(lj_ir_mode[fins->o]) == IRM_N)
      return lj_opt_cse(J);

    /* No FOLD, forwarding or CSE? Emit raw IR for loads, except for SLOAD. */
    if ((J->flags & (JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE)) !=
                    (JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE) &&
        irm_kind(lj_ir_mode[fins->o]) == IRM_L && fins->o != IR_SLOAD)
      return lj_ir_emit(J);

    /* No FOLD or DSE? Emit raw IR for stores. */
    if ((J->flags & (JIT_F_OPT_FOLD|JIT_F_OPT_DSE)) !=
                    (JIT_F_OPT_FOLD|JIT_F_OPT_DSE) &&
        irm_kind(lj_ir_mode[fins->o]) == IRM_S)
      return lj_ir_emit(J);
  }

  /* Fold engine start/retry point. */
retry:
  /* Construct key from opcode and operand opcodes (unless literal/none). */
  key = ((uint32_t)fins->o << 17);
  if (fins->op1 >= J->cur.nk) {
    key += (uint32_t)IR(fins->op1)->o << 10;
    *fleft = *IR(fins->op1);
    if (fins->op1 < REF_TRUE)
      fleft[1] = IR(fins->op1)[1];
  }
  if (fins->op2 >= J->cur.nk) {
    key += (uint32_t)IR(fins->op2)->o;
    *fright = *IR(fins->op2);
    if (fins->op2 < REF_TRUE)
      fright[1] = IR(fins->op2)[1];
  } else {
    key += (fins->op2 & 0x3ffu);  /* Literal mask. Must include IRCONV_*MASK. */
  }

  /* Check for a match in order from most specific to least specific. */
  any = 0;
  for (;;) {
    uint32_t k = key | (any & 0x1ffff);
    uint32_t h = fold_hashkey(k);
    uint32_t fh = fold_hash[h];  /* Lookup key in semi-perfect hash table. */
    if ((fh & 0xffffff) == k || (fh = fold_hash[h+1], (fh & 0xffffff) == k)) {
      ref = (IRRef)tref_ref(fold_func[fh >> 24](J));
      if (ref != NEXTFOLD)
        break;
    }
    if (any == 0xfffff)  /* Exhausted folding. Pass on to CSE. */
      return lj_opt_cse(J);
    any = (any | (any >> 10)) ^ 0xffc00;
  }
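
  /* Worked trace of the wildcard loop above: 'any' steps through
  ** 0 -> 0xffc00 -> 0x3ff -> 0xfffff, ORing all-ones into the left
  ** field (bits 10-16), then the right field (bits 0-9), then both,
  ** until any == 0xfffff exhausts the variants.
  */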

  /* Return value processing, ordered by frequency. */
  if (LJ_LIKELY(ref >= MAX_FOLD))
    return TREF(ref, irt_t(IR(ref)->t));
  if (ref == RETRYFOLD)
    goto retry;
  if (ref == KINTFOLD)
    return lj_ir_kint(J, fins->i);
  if (ref == FAILFOLD)
    lj_trace_err(J, LJ_TRERR_GFAIL);
  lj_assertJ(ref == DROPFOLD, "bad fold result");
  return REF_DROP;
}

/* -- Common-Subexpression Elimination ------------------------------------ */

/* CSE an IR instruction. This is very fast due to the skip-list chains. */
TRef LJ_FASTCALL lj_opt_cse(jit_State *J)
{
  /* Avoid narrow to wide store-to-load forwarding stall */
  IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
  IROp op = fins->o;
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    /* Limited search for same operands in per-opcode chain. */
    IRRef ref = J->chain[op];
    IRRef lim = fins->op1;
    if (fins->op2 > lim) lim = fins->op2;  /* Relies on lit < REF_BIAS. */
    while (ref > lim) {
      if (IR(ref)->op12 == op12)
        return TREF(ref, irt_t(IR(ref)->t));  /* Common subexpression found. */
      ref = IR(ref)->prev;
    }
  }
  /* Otherwise emit IR (inlined for speed). */
  {
    IRRef ref = lj_ir_nextins(J);
    IRIns *ir = IR(ref);
    ir->prev = J->chain[op];
    ir->op12 = op12;
    J->chain[op] = (IRRef1)ref;
    ir->o = fins->o;
    J->guardemit.irt |= fins->t.irt;
    return TREF(ref, irt_t((ir->t = fins->t)));
  }
}
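
/* Worked example for the chain search above: J->chain[IR_ADD] points at
** the most recent ADD and each ADD's prev field points at the previous
** one. The walk stops at max(op1, op2), since any matching instruction
** must have been emitted after both of its operands.
*/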
2531

2532
/* CSE with explicit search limit. */
2533
TRef LJ_FASTCALL lj_opt_cselim(jit_State *J, IRRef lim)
187,972✔
2534
{
2535
  IRRef ref = J->chain[fins->o];
187,972✔
2536
  IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
187,972✔
2537
  while (ref > lim) {
748,276✔
2538
    if (IR(ref)->op12 == op12)
656,815✔
2539
      return ref;
96,511✔
2540
    ref = IR(ref)->prev;
560,304✔
2541
  }
2542
  return lj_ir_emit(J);
91,461✔
2543
}
2544

2545
/* ------------------------------------------------------------------------ */
2546

2547
#undef IR
2548
#undef fins
2549
#undef fleft
2550
#undef fright
2551
#undef knumleft
2552
#undef knumright
2553
#undef emitir
2554

2555
#endif