tarantool / luajit / build 9525830691 (github, push)

15 Jun 2024 05:25AM UTC coverage: 92.594% (+0.06%) from 92.536%

Buristan
test: remove inline suppressions of _TARANTOOL

The patch defines `_TARANTOOL` as a global in the luacheck configuration
file and removes inline suppressions in test files.

5659 of 6018 branches covered (94.03%)

Branch coverage included in aggregate %.

21596 of 23417 relevant lines covered (92.22%)

2932914.2 hits per line

Source File

/src/lj_opt_fold.c: 77.24% covered

/*
** FOLD: Constant Folding, Algebraic Simplifications and Reassociation.
** ABCelim: Array Bounds Check Elimination.
** CSE: Common-Subexpression Elimination.
** Copyright (C) 2005-2017 Mike Pall. See Copyright Notice in luajit.h
*/

#define lj_opt_fold_c
#define LUA_CORE

#include <math.h>

#include "lj_obj.h"

#if LJ_HASJIT

#include "lj_buf.h"
#include "lj_str.h"
#include "lj_tab.h"
#include "lj_ir.h"
#include "lj_jit.h"
#include "lj_ircall.h"
#include "lj_iropt.h"
#include "lj_trace.h"
#if LJ_HASFFI
#include "lj_ctype.h"
#include "lj_carith.h"
#endif
#include "lj_vm.h"
#include "lj_strscan.h"
#include "lj_strfmt.h"

/* Here's a short description of how the FOLD engine processes instructions:
**
** The FOLD engine receives a single instruction stored in fins (J->fold.ins).
** The instruction and its operands are used to select matching fold rules.
** These are applied iteratively until a fixed point is reached.
**
** The 8 bit opcode of the instruction itself plus the opcodes of the
** two instructions referenced by its operands form a 24 bit key
** 'ins left right' (unused operands -> 0, literals -> lowest 8 bits).
**
** This key is used for partial matching against the fold rules. The
** left/right operand fields of the key are successively masked with
** the 'any' wildcard, from most specific to least specific:
**
**   ins left right
**   ins any  right
**   ins left any
**   ins any  any
**
** The masked key is used to lookup a matching fold rule in a semi-perfect
** hash table. If a matching rule is found, the related fold function is run.
** Multiple rules can share the same fold function. A fold rule may return
** one of several special values:
**
** - NEXTFOLD means no folding was applied, because an additional test
**   inside the fold function failed. Matching continues against less
**   specific fold rules. Finally the instruction is passed on to CSE.
**
** - RETRYFOLD means the instruction was modified in-place. Folding is
**   retried as if this instruction had just been received.
**
** All other return values are terminal actions -- no further folding is
** applied:
**
** - INTFOLD(i) returns a reference to the integer constant i.
**
** - LEFTFOLD and RIGHTFOLD return the left/right operand reference
**   without emitting an instruction.
**
** - CSEFOLD and EMITFOLD pass the instruction directly to CSE or emit
**   it without passing through any further optimizations.
**
** - FAILFOLD, DROPFOLD and CONDFOLD only apply to instructions which have
**   no result (e.g. guarded assertions): FAILFOLD means the guard would
**   always fail, i.e. the current trace is pointless. DROPFOLD means
**   the guard is always true and has been eliminated. CONDFOLD is a
**   shortcut for FAILFOLD + cond (i.e. drop if true, otherwise fail).
**
** - Any other return value is interpreted as an IRRef or TRef. This
**   can be a reference to an existing or a newly created instruction.
**   Only the least-significant 16 bits (IRRef1) are used to form a TRef
**   which is finally returned to the caller.
**
** The FOLD engine receives instructions both from the trace recorder and
** substituted instructions from LOOP unrolling. This means all types
** of instructions may end up here, even though the recorder bypasses
** FOLD in some cases. Thus all loads, stores and allocations must have
** an any/any rule to avoid being passed on to CSE.
**
** Carefully read the following requirements before adding or modifying
** any fold rules:
**
** Requirement #1: All fold rules must preserve their destination type.
**
** Consistently use INTFOLD() (KINT result) or lj_ir_knum() (KNUM result).
** Never use lj_ir_knumint() which can have either a KINT or KNUM result.
**
** Requirement #2: Fold rules should not create *new* instructions which
** reference operands *across* PHIs.
**
** E.g. a RETRYFOLD with 'fins->op1 = fleft->op1' is invalid if the
** left operand is a PHI. Then fleft->op1 would point across the PHI
** frontier to an invariant instruction. Adding a PHI for this instruction
** would be counterproductive. The solution is to add a barrier which
** prevents folding across PHIs, i.e. 'PHIBARRIER(fleft)' in this case.
** The only exception is for recurrences with high latencies like
** repeated int->num->int conversions.
**
** One could relax this condition a bit if the referenced instruction is
** a PHI, too. But this often leads to worse code due to excessive
** register shuffling.
**
** Note: returning *existing* instructions (e.g. LEFTFOLD) is ok, though.
** Even returning fleft->op1 would be ok, because a new PHI will be added,
** if needed. But again, this leads to excessive register shuffling and
** should be avoided.
**
** Requirement #3: The set of all fold rules must be monotonic to guarantee
** termination.
**
** The goal is optimization, so one primarily wants to add strength-reducing
** rules. This means eliminating an instruction or replacing an instruction
** with one or more simpler instructions. Don't add fold rules which point
** into the other direction.
**
** Some rules (like commutativity) do not directly reduce the strength of
** an instruction, but enable other fold rules (e.g. by moving constants
** to the right operand). These rules must be made unidirectional to avoid
** cycles.
**
** Rule of thumb: the trace recorder expands the IR and FOLD shrinks it.
*/
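
/* Illustrative walk-through of the matching order described above (an
** example, not an additional rule): an ADD whose operands are both KINT
** constants first probes the key 'ADD KINT KINT', which matches
** LJFOLD(ADD KINT KINT) below and folds to an integer constant via
** kfold_intarith. Had that fold function returned NEXTFOLD, matching would
** continue with 'ADD any KINT' (e.g. simplify_intadd_k: i + 0 ==> i),
** then 'ADD KINT any' and finally 'ADD any any'.
*/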

/* Some local macros to save typing. Undef'd at the end. */
#define IR(ref)		(&J->cur.ir[(ref)])
#define fins		(&J->fold.ins)
#define fleft		(J->fold.left)
#define fright		(J->fold.right)
#define knumleft	(ir_knum(fleft)->n)
#define knumright	(ir_knum(fright)->n)

/* Pass IR on to next optimization in chain (FOLD). */
#define emitir(ot, a, b)	(lj_ir_set(J, (ot), (a), (b)), lj_opt_fold(J))

/* Fold function type. Fastcall on x86 significantly reduces their size. */
typedef IRRef (LJ_FASTCALL *FoldFunc)(jit_State *J);

/* Macros for the fold specs, so buildvm can recognize them. */
#define LJFOLD(x)
#define LJFOLDX(x)
#define LJFOLDF(name)	static TRef LJ_FASTCALL fold_##name(jit_State *J)
/* Note: They must be at the start of a line or buildvm ignores them! */

/* Barrier to prevent using operands across PHIs. */
#define PHIBARRIER(ir)	if (irt_isphi((ir)->t)) return NEXTFOLD

/* Barrier to prevent folding across a GC step.
** GC steps can only happen at the head of a trace and at LOOP.
** And the GC is only driven forward if there's at least one allocation.
*/
#define gcstep_barrier(J, ref) \
  ((ref) < J->chain[IR_LOOP] && \
   (J->chain[IR_SNEW] || J->chain[IR_XSNEW] || \
    J->chain[IR_TNEW] || J->chain[IR_TDUP] || \
    J->chain[IR_CNEW] || J->chain[IR_CNEWI] || \
    J->chain[IR_BUFSTR] || J->chain[IR_TOSTR] || J->chain[IR_CALLA]))

/* -- Constant folding for FP numbers ------------------------------------- */

LJFOLD(ADD KNUM KNUM)
LJFOLD(SUB KNUM KNUM)
LJFOLD(MUL KNUM KNUM)
LJFOLD(DIV KNUM KNUM)
LJFOLD(LDEXP KNUM KNUM)
LJFOLD(MIN KNUM KNUM)
LJFOLD(MAX KNUM KNUM)
LJFOLDF(kfold_numarith)
{
  lua_Number a = knumleft;
  lua_Number b = knumright;
  lua_Number y = lj_vm_foldarith(a, b, fins->o - IR_ADD);
  return lj_ir_knum(J, y);
}

LJFOLD(NEG KNUM FLOAD)
LJFOLD(ABS KNUM FLOAD)
LJFOLDF(kfold_numabsneg)
{
  lua_Number a = knumleft;
  lua_Number y = lj_vm_foldarith(a, a, fins->o - IR_ADD);
  return lj_ir_knum(J, y);
}

LJFOLD(LDEXP KNUM KINT)
LJFOLDF(kfold_ldexp)
{
#if LJ_TARGET_X86ORX64
  UNUSED(J);
  return NEXTFOLD;
#else
  return lj_ir_knum(J, ldexp(knumleft, fright->i));
#endif
}

LJFOLD(FPMATH KNUM any)
LJFOLDF(kfold_fpmath)
{
  lua_Number a = knumleft;
  lua_Number y = lj_vm_foldfpm(a, fins->op2);
  return lj_ir_knum(J, y);
}

LJFOLD(CALLN KNUM any)
LJFOLDF(kfold_fpcall1)
{
  const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
  if (CCI_TYPE(ci) == IRT_NUM) {
    double y = ((double (*)(double))ci->func)(knumleft);
    return lj_ir_knum(J, y);
  }
  return NEXTFOLD;
}

LJFOLD(CALLN CARG IRCALL_atan2)
LJFOLDF(kfold_fpcall2)
{
  if (irref_isk(fleft->op1) && irref_isk(fleft->op2)) {
    const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
    double a = ir_knum(IR(fleft->op1))->n;
    double b = ir_knum(IR(fleft->op2))->n;
    double y = ((double (*)(double, double))ci->func)(a, b);
    return lj_ir_knum(J, y);
  }
  return NEXTFOLD;
}

LJFOLD(POW KNUM KNUM)
LJFOLDF(kfold_numpow)
{
  return lj_ir_knum(J, lj_vm_foldarith(knumleft, knumright, IR_POW - IR_ADD));
}

/* Must not use kfold_kref for numbers (could be NaN). */
LJFOLD(EQ KNUM KNUM)
LJFOLD(NE KNUM KNUM)
LJFOLD(LT KNUM KNUM)
LJFOLD(GE KNUM KNUM)
LJFOLD(LE KNUM KNUM)
LJFOLD(GT KNUM KNUM)
LJFOLD(ULT KNUM KNUM)
LJFOLD(UGE KNUM KNUM)
LJFOLD(ULE KNUM KNUM)
LJFOLD(UGT KNUM KNUM)
LJFOLDF(kfold_numcomp)
{
  return CONDFOLD(lj_ir_numcmp(knumleft, knumright, (IROp)fins->o));
}

/* -- Constant folding for 32 bit integers -------------------------------- */

static int32_t kfold_intop(int32_t k1, int32_t k2, IROp op)
{
  switch (op) {
  case IR_ADD: k1 += k2; break;
  case IR_SUB: k1 -= k2; break;
  case IR_MUL: k1 *= k2; break;
  case IR_MOD: k1 = lj_vm_modi(k1, k2); break;
  case IR_NEG: k1 = -k1; break;
  case IR_BAND: k1 &= k2; break;
  case IR_BOR: k1 |= k2; break;
  case IR_BXOR: k1 ^= k2; break;
  case IR_BSHL: k1 <<= (k2 & 31); break;
  case IR_BSHR: k1 = (int32_t)((uint32_t)k1 >> (k2 & 31)); break;
  case IR_BSAR: k1 >>= (k2 & 31); break;
  case IR_BROL: k1 = (int32_t)lj_rol((uint32_t)k1, (k2 & 31)); break;
  case IR_BROR: k1 = (int32_t)lj_ror((uint32_t)k1, (k2 & 31)); break;
  case IR_MIN: k1 = k1 < k2 ? k1 : k2; break;
  case IR_MAX: k1 = k1 > k2 ? k1 : k2; break;
  default: lj_assertX(0, "bad IR op %d", op); break;
  }
  return k1;
}

LJFOLD(ADD KINT KINT)
LJFOLD(SUB KINT KINT)
LJFOLD(MUL KINT KINT)
LJFOLD(MOD KINT KINT)
LJFOLD(NEG KINT KINT)
LJFOLD(BAND KINT KINT)
LJFOLD(BOR KINT KINT)
LJFOLD(BXOR KINT KINT)
LJFOLD(BSHL KINT KINT)
LJFOLD(BSHR KINT KINT)
LJFOLD(BSAR KINT KINT)
LJFOLD(BROL KINT KINT)
LJFOLD(BROR KINT KINT)
LJFOLD(MIN KINT KINT)
LJFOLD(MAX KINT KINT)
LJFOLDF(kfold_intarith)
{
  return INTFOLD(kfold_intop(fleft->i, fright->i, (IROp)fins->o));
}

LJFOLD(ADDOV KINT KINT)
LJFOLD(SUBOV KINT KINT)
LJFOLD(MULOV KINT KINT)
LJFOLDF(kfold_intovarith)
{
  lua_Number n = lj_vm_foldarith((lua_Number)fleft->i, (lua_Number)fright->i,
				 fins->o - IR_ADDOV);
  int32_t k = lj_num2int(n);
  if (n != (lua_Number)k)
    return FAILFOLD;
  return INTFOLD(k);
}

LJFOLD(BNOT KINT)
LJFOLDF(kfold_bnot)
{
  return INTFOLD(~fleft->i);
}

LJFOLD(BSWAP KINT)
LJFOLDF(kfold_bswap)
{
  return INTFOLD((int32_t)lj_bswap((uint32_t)fleft->i));
}

LJFOLD(LT KINT KINT)
LJFOLD(GE KINT KINT)
LJFOLD(LE KINT KINT)
LJFOLD(GT KINT KINT)
LJFOLD(ULT KINT KINT)
LJFOLD(UGE KINT KINT)
LJFOLD(ULE KINT KINT)
LJFOLD(UGT KINT KINT)
LJFOLD(ABC KINT KINT)
LJFOLDF(kfold_intcomp)
{
  int32_t a = fleft->i, b = fright->i;
  switch ((IROp)fins->o) {
  case IR_LT: return CONDFOLD(a < b);
  case IR_GE: return CONDFOLD(a >= b);
  case IR_LE: return CONDFOLD(a <= b);
  case IR_GT: return CONDFOLD(a > b);
  case IR_ULT: return CONDFOLD((uint32_t)a < (uint32_t)b);
  case IR_UGE: return CONDFOLD((uint32_t)a >= (uint32_t)b);
  case IR_ULE: return CONDFOLD((uint32_t)a <= (uint32_t)b);
  case IR_ABC:
  case IR_UGT: return CONDFOLD((uint32_t)a > (uint32_t)b);
  default: lj_assertJ(0, "bad IR op %d", fins->o); return FAILFOLD;
  }
}

LJFOLD(UGE any KINT)
LJFOLDF(kfold_intcomp0)
{
  if (fright->i == 0)
    return DROPFOLD;
  return NEXTFOLD;
}

/* -- Constant folding for 64 bit integers -------------------------------- */

static uint64_t kfold_int64arith(jit_State *J, uint64_t k1, uint64_t k2,
				 IROp op)
{
  UNUSED(J);
#if LJ_HASFFI
  switch (op) {
  case IR_ADD: k1 += k2; break;
  case IR_SUB: k1 -= k2; break;
  case IR_MUL: k1 *= k2; break;
  case IR_BAND: k1 &= k2; break;
  case IR_BOR: k1 |= k2; break;
  case IR_BXOR: k1 ^= k2; break;
  case IR_BSHL: k1 <<= (k2 & 63); break;
  case IR_BSHR: k1 = (int32_t)((uint32_t)k1 >> (k2 & 63)); break;
  case IR_BSAR: k1 >>= (k2 & 63); break;
  case IR_BROL: k1 = (int32_t)lj_rol((uint32_t)k1, (k2 & 63)); break;
  case IR_BROR: k1 = (int32_t)lj_ror((uint32_t)k1, (k2 & 63)); break;
  default: lj_assertJ(0, "bad IR op %d", op); break;
  }
#else
  UNUSED(k2); UNUSED(op);
  lj_assertJ(0, "FFI IR op without FFI");
#endif
  return k1;
}

LJFOLD(ADD KINT64 KINT64)
LJFOLD(SUB KINT64 KINT64)
LJFOLD(MUL KINT64 KINT64)
LJFOLD(BAND KINT64 KINT64)
LJFOLD(BOR KINT64 KINT64)
LJFOLD(BXOR KINT64 KINT64)
LJFOLDF(kfold_int64arith)
{
  return INT64FOLD(kfold_int64arith(J, ir_k64(fleft)->u64,
				    ir_k64(fright)->u64, (IROp)fins->o));
}

LJFOLD(DIV KINT64 KINT64)
LJFOLD(MOD KINT64 KINT64)
LJFOLD(POW KINT64 KINT64)
LJFOLDF(kfold_int64arith2)
{
#if LJ_HASFFI
  uint64_t k1 = ir_k64(fleft)->u64, k2 = ir_k64(fright)->u64;
  if (irt_isi64(fins->t)) {
    k1 = fins->o == IR_DIV ? lj_carith_divi64((int64_t)k1, (int64_t)k2) :
	 fins->o == IR_MOD ? lj_carith_modi64((int64_t)k1, (int64_t)k2) :
			     lj_carith_powi64((int64_t)k1, (int64_t)k2);
  } else {
    k1 = fins->o == IR_DIV ? lj_carith_divu64(k1, k2) :
	 fins->o == IR_MOD ? lj_carith_modu64(k1, k2) :
			     lj_carith_powu64(k1, k2);
  }
  return INT64FOLD(k1);
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BSHL KINT64 KINT)
LJFOLD(BSHR KINT64 KINT)
LJFOLD(BSAR KINT64 KINT)
LJFOLD(BROL KINT64 KINT)
LJFOLD(BROR KINT64 KINT)
LJFOLDF(kfold_int64shift)
{
#if LJ_HASFFI
  uint64_t k = ir_k64(fleft)->u64;
  int32_t sh = (fright->i & 63);
  return INT64FOLD(lj_carith_shift64(k, sh, fins->o - IR_BSHL));
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BNOT KINT64)
LJFOLDF(kfold_bnot64)
{
#if LJ_HASFFI
  return INT64FOLD(~ir_k64(fleft)->u64);
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BSWAP KINT64)
LJFOLDF(kfold_bswap64)
{
#if LJ_HASFFI
  return INT64FOLD(lj_bswap64(ir_k64(fleft)->u64));
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(LT KINT64 KINT64)
LJFOLD(GE KINT64 KINT64)
LJFOLD(LE KINT64 KINT64)
LJFOLD(GT KINT64 KINT64)
LJFOLD(ULT KINT64 KINT64)
LJFOLD(UGE KINT64 KINT64)
LJFOLD(ULE KINT64 KINT64)
LJFOLD(UGT KINT64 KINT64)
LJFOLDF(kfold_int64comp)
{
#if LJ_HASFFI
  uint64_t a = ir_k64(fleft)->u64, b = ir_k64(fright)->u64;
  switch ((IROp)fins->o) {
  case IR_LT: return CONDFOLD((int64_t)a < (int64_t)b);
  case IR_GE: return CONDFOLD((int64_t)a >= (int64_t)b);
  case IR_LE: return CONDFOLD((int64_t)a <= (int64_t)b);
  case IR_GT: return CONDFOLD((int64_t)a > (int64_t)b);
  case IR_ULT: return CONDFOLD(a < b);
  case IR_UGE: return CONDFOLD(a >= b);
  case IR_ULE: return CONDFOLD(a <= b);
  case IR_UGT: return CONDFOLD(a > b);
  default: lj_assertJ(0, "bad IR op %d", fins->o); return FAILFOLD;
  }
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(UGE any KINT64)
LJFOLDF(kfold_int64comp0)
{
#if LJ_HASFFI
  if (ir_k64(fright)->u64 == 0)
    return DROPFOLD;
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

/* -- Constant folding for strings ---------------------------------------- */

LJFOLD(SNEW KKPTR KINT)
LJFOLDF(kfold_snew_kptr)
{
  GCstr *s = lj_str_new(J->L, (const char *)ir_kptr(fleft), (size_t)fright->i);
  return lj_ir_kstr(J, s);
}

LJFOLD(SNEW any KINT)
LJFOLDF(kfold_snew_empty)
{
  if (fright->i == 0)
    return lj_ir_kstr(J, &J2G(J)->strempty);
  return NEXTFOLD;
}

LJFOLD(STRREF KGC KINT)
LJFOLDF(kfold_strref)
{
  GCstr *str = ir_kstr(fleft);
  lj_assertJ((MSize)fright->i <= str->len, "bad string ref");
  return lj_ir_kkptr(J, (char *)strdata(str) + fright->i);
}

LJFOLD(STRREF SNEW any)
LJFOLDF(kfold_strref_snew)
{
  PHIBARRIER(fleft);
  if (irref_isk(fins->op2) && fright->i == 0) {
    return fleft->op1;  /* strref(snew(ptr, len), 0) ==> ptr */
  } else {
    /* Reassociate: strref(snew(strref(str, a), len), b) ==> strref(str, a+b) */
    IRIns *ir = IR(fleft->op1);
    if (ir->o == IR_STRREF) {
      IRRef1 str = ir->op1;  /* IRIns * is not valid across emitir. */
      PHIBARRIER(ir);
      fins->op2 = emitir(IRTI(IR_ADD), ir->op2, fins->op2); /* Clobbers fins! */
      fins->op1 = str;
      fins->ot = IRT(IR_STRREF, IRT_PGC);
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(CALLN CARG IRCALL_lj_str_cmp)
LJFOLDF(kfold_strcmp)
{
  if (irref_isk(fleft->op1) && irref_isk(fleft->op2)) {
    GCstr *a = ir_kstr(IR(fleft->op1));
    GCstr *b = ir_kstr(IR(fleft->op2));
    return INTFOLD(lj_str_cmp(a, b));
  }
  return NEXTFOLD;
}

/* -- Constant folding and forwarding for buffers ------------------------- */

/*
** Buffer ops perform stores, but their effect is limited to the buffer
** itself. Also, buffer ops are chained: a use of an op implies a use of
** all other ops up the chain. Conversely, if an op is unused, all ops
** up the chain can go unused. This largely eliminates the need to treat
** them as stores.
**
** Alas, treating them as normal (IRM_N) ops doesn't work, because they
** cannot be CSEd in isolation. CSE for IRM_N is implicitly done in LOOP
** or if FOLD is disabled.
**
** The compromise is to declare them as loads, emit them like stores and
** CSE whole chains manually when the BUFSTR is to be emitted. Any chain
** fragments left over from CSE are eliminated by DCE.
*/
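
/* A sketch of a typical chain, assuming a simple two-piece concatenation
** (the operand layout is illustrative):
**
**   t1  BUFHDR base            ; fresh buffer (no IRBUFHDR_APPEND)
**   t2  BUFPUT t1 str1
**   t3  BUFPUT t2 str2
**   t4  BUFSTR t3 t1
**
** A use of t4 implies a use of t2 and t3; if t4 goes unused, the whole
** chain is dead and DCE removes it.
*/
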
/* BUFHDR is emitted like a store, see below. */

LJFOLD(BUFPUT BUFHDR BUFSTR)
LJFOLDF(bufput_append)
{
  /* New buffer, no other buffer op in between and same buffer? */
  if ((J->flags & JIT_F_OPT_FWD) &&
      !(fleft->op2 & IRBUFHDR_APPEND) &&
      fleft->prev == fright->op2 &&
      fleft->op1 == IR(fright->op2)->op1 &&
      !(irt_isphi(fright->t) && IR(fright->op2)->prev)) {
    IRRef ref = fins->op1;
    IR(ref)->op2 = (fleft->op2 | IRBUFHDR_APPEND);  /* Modify BUFHDR. */
    IR(ref)->op1 = fright->op1;
    return ref;
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(BUFPUT any any)
LJFOLDF(bufput_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && fright->o == IR_KGC) {
    GCstr *s2 = ir_kstr(fright);
    if (s2->len == 0) {  /* Empty string? */
      return LEFTFOLD;
    } else {
      if (fleft->o == IR_BUFPUT && irref_isk(fleft->op2) &&
	  !irt_isphi(fleft->t)) {  /* Join two constant string puts in a row. */
	GCstr *s1 = ir_kstr(IR(fleft->op2));
	IRRef kref = lj_ir_kstr(J, lj_buf_cat2str(J->L, s1, s2));
	/* lj_ir_kstr() may realloc the IR and invalidates any IRIns *. */
	IR(fins->op1)->op2 = kref;  /* Modify previous BUFPUT. */
	return fins->op1;
      }
    }
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(BUFSTR any any)
LJFOLDF(bufstr_kfold_cse)
{
  lj_assertJ(fleft->o == IR_BUFHDR || fleft->o == IR_BUFPUT ||
	     fleft->o == IR_CALLL,
	     "bad buffer constructor IR op %d", fleft->o);
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    if (fleft->o == IR_BUFHDR) {  /* No put operations? */
      if (!(fleft->op2 & IRBUFHDR_APPEND))  /* Empty buffer? */
	return lj_ir_kstr(J, &J2G(J)->strempty);
      fins->op1 = fleft->op1;
      fins->op2 = fleft->prev;  /* Relies on checks in bufput_append. */
      return CSEFOLD;
    } else if (fleft->o == IR_BUFPUT) {
      IRIns *irb = IR(fleft->op1);
      if (irb->o == IR_BUFHDR && !(irb->op2 & IRBUFHDR_APPEND))
	return fleft->op2;  /* Shortcut for a single put operation. */
    }
  }
  /* Try to CSE the whole chain. */
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    IRRef ref = J->chain[IR_BUFSTR];
    while (ref) {
      IRIns *irs = IR(ref), *ira = fleft, *irb = IR(irs->op1);
      while (ira->o == irb->o && ira->op2 == irb->op2) {
	lj_assertJ(ira->o == IR_BUFHDR || ira->o == IR_BUFPUT ||
		   ira->o == IR_CALLL || ira->o == IR_CARG,
		   "bad buffer constructor IR op %d", ira->o);
	if (ira->o == IR_BUFHDR && !(ira->op2 & IRBUFHDR_APPEND))
	  return ref;  /* CSE succeeded. */
	if (ira->o == IR_CALLL && ira->op2 == IRCALL_lj_buf_puttab)
	  break;
	ira = IR(ira->op1);
	irb = IR(irb->op1);
      }
      ref = irs->prev;
    }
  }
  return EMITFOLD;  /* No CSE possible. */
}

LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_reverse)
LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_upper)
LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_lower)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putquoted)
LJFOLDF(bufput_kfold_op)
{
  if (irref_isk(fleft->op2)) {
    const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
    SBuf *sb = lj_buf_tmp_(J->L);
    sb = ((SBuf * (LJ_FASTCALL *)(SBuf *, GCstr *))ci->func)(sb,
						       ir_kstr(IR(fleft->op2)));
    fins->o = IR_BUFPUT;
    fins->op1 = fleft->op1;
    fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
    return RETRYFOLD;
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_rep)
LJFOLDF(bufput_kfold_rep)
{
  if (irref_isk(fleft->op2)) {
    IRIns *irc = IR(fleft->op1);
    if (irref_isk(irc->op2)) {
      SBuf *sb = lj_buf_tmp_(J->L);
      sb = lj_buf_putstr_rep(sb, ir_kstr(IR(irc->op2)), IR(fleft->op2)->i);
      fins->o = IR_BUFPUT;
      fins->op1 = irc->op1;
      fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
      return RETRYFOLD;
    }
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfxint)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum_int)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum_uint)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfstr)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfchar)
LJFOLDF(bufput_kfold_fmt)
{
  IRIns *irc = IR(fleft->op1);
  lj_assertJ(irref_isk(irc->op2), "SFormat must be const");
  if (irref_isk(fleft->op2)) {
    SFormat sf = (SFormat)IR(irc->op2)->i;
    IRIns *ira = IR(fleft->op2);
    SBuf *sb = lj_buf_tmp_(J->L);
    switch (fins->op2) {
    case IRCALL_lj_strfmt_putfxint:
      sb = lj_strfmt_putfxint(sb, sf, ir_k64(ira)->u64);
      break;
    case IRCALL_lj_strfmt_putfstr:
      sb = lj_strfmt_putfstr(sb, sf, ir_kstr(ira));
      break;
    case IRCALL_lj_strfmt_putfchar:
      sb = lj_strfmt_putfchar(sb, sf, ira->i);
      break;
    case IRCALL_lj_strfmt_putfnum_int:
    case IRCALL_lj_strfmt_putfnum_uint:
    case IRCALL_lj_strfmt_putfnum:
    default: {
      const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
      sb = ((SBuf * (*)(SBuf *, SFormat, lua_Number))ci->func)(sb, sf,
							 ir_knum(ira)->n);
      break;
      }
    }
    fins->o = IR_BUFPUT;
    fins->op1 = irc->op1;
    fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
    return RETRYFOLD;
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

/* -- Constant folding of pointer arithmetic ------------------------------ */

LJFOLD(ADD KGC KINT)
LJFOLD(ADD KGC KINT64)
LJFOLDF(kfold_add_kgc)
{
  GCobj *o = ir_kgc(fleft);
#if LJ_64
  ptrdiff_t ofs = (ptrdiff_t)ir_kint64(fright)->u64;
#else
  ptrdiff_t ofs = fright->i;
#endif
#if LJ_HASFFI
  if (irt_iscdata(fleft->t)) {
    CType *ct = ctype_raw(ctype_ctsG(J2G(J)), gco2cd(o)->ctypeid);
    if (ctype_isnum(ct->info) || ctype_isenum(ct->info) ||
	ctype_isptr(ct->info) || ctype_isfunc(ct->info) ||
	ctype_iscomplex(ct->info) || ctype_isvector(ct->info))
      return lj_ir_kkptr(J, (char *)o + ofs);
  }
#endif
  return lj_ir_kptr(J, (char *)o + ofs);
}

LJFOLD(ADD KPTR KINT)
LJFOLD(ADD KPTR KINT64)
LJFOLD(ADD KKPTR KINT)
LJFOLD(ADD KKPTR KINT64)
LJFOLDF(kfold_add_kptr)
{
  void *p = ir_kptr(fleft);
#if LJ_64
  ptrdiff_t ofs = (ptrdiff_t)ir_kint64(fright)->u64;
#else
  ptrdiff_t ofs = fright->i;
#endif
  return lj_ir_kptr_(J, fleft->o, (char *)p + ofs);
}

LJFOLD(ADD any KGC)
LJFOLD(ADD any KPTR)
LJFOLD(ADD any KKPTR)
LJFOLDF(kfold_add_kright)
{
  if (fleft->o == IR_KINT || fleft->o == IR_KINT64) {
    IRRef1 tmp = fins->op1; fins->op1 = fins->op2; fins->op2 = tmp;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

789

790
LJFOLD(TOBIT KNUM KNUM)
791
LJFOLDF(kfold_tobit)
2✔
792
{
793
  return INTFOLD(lj_num2bit(knumleft));
2✔
794
}
795

796
LJFOLD(CONV KINT IRCONV_NUM_INT)
797
LJFOLDF(kfold_conv_kint_num)
343,102✔
798
{
799
  return lj_ir_knum(J, (lua_Number)fleft->i);
343,102✔
800
}
801

802
LJFOLD(CONV KINT IRCONV_NUM_U32)
803
LJFOLDF(kfold_conv_kintu32_num)
×
804
{
805
  return lj_ir_knum(J, (lua_Number)(uint32_t)fleft->i);
×
806
}
807

808
LJFOLD(CONV KINT IRCONV_INT_I8)
809
LJFOLD(CONV KINT IRCONV_INT_U8)
810
LJFOLD(CONV KINT IRCONV_INT_I16)
811
LJFOLD(CONV KINT IRCONV_INT_U16)
812
LJFOLDF(kfold_conv_kint_ext)
3✔
813
{
814
  int32_t k = fleft->i;
3✔
815
  if ((fins->op2 & IRCONV_SRCMASK) == IRT_I8) k = (int8_t)k;
3✔
816
  else if ((fins->op2 & IRCONV_SRCMASK) == IRT_U8) k = (uint8_t)k;
3✔
817
  else if ((fins->op2 & IRCONV_SRCMASK) == IRT_I16) k = (int16_t)k;
×
818
  else k = (uint16_t)k;
×
819
  return INTFOLD(k);
3✔
820
}
821

822
LJFOLD(CONV KINT IRCONV_I64_INT)
823
LJFOLD(CONV KINT IRCONV_U64_INT)
824
LJFOLD(CONV KINT IRCONV_I64_U32)
825
LJFOLD(CONV KINT IRCONV_U64_U32)
826
LJFOLDF(kfold_conv_kint_i64)
31,041✔
827
{
828
  if ((fins->op2 & IRCONV_SEXT))
31,041✔
829
    return INT64FOLD((uint64_t)(int64_t)fleft->i);
31,038✔
830
  else
831
    return INT64FOLD((uint64_t)(int64_t)(uint32_t)fleft->i);
3✔
832
}
833

834
LJFOLD(CONV KINT64 IRCONV_NUM_I64)
835
LJFOLDF(kfold_conv_kint64_num_i64)
×
836
{
837
  return lj_ir_knum(J, (lua_Number)(int64_t)ir_kint64(fleft)->u64);
×
838
}
839

840
LJFOLD(CONV KINT64 IRCONV_NUM_U64)
841
LJFOLDF(kfold_conv_kint64_num_u64)
×
842
{
843
  return lj_ir_knum(J, (lua_Number)ir_kint64(fleft)->u64);
×
844
}
845

846
LJFOLD(CONV KINT64 IRCONV_INT_I64)
847
LJFOLD(CONV KINT64 IRCONV_U32_I64)
848
LJFOLDF(kfold_conv_kint64_int_i64)
×
849
{
850
  return INTFOLD((int32_t)ir_kint64(fleft)->u64);
×
851
}
852

853
LJFOLD(CONV KNUM IRCONV_INT_NUM)
854
LJFOLDF(kfold_conv_knum_int_num)
6✔
855
{
856
  lua_Number n = knumleft;
6✔
857
  int32_t k = lj_num2int(n);
6✔
858
  if (irt_isguard(fins->t) && n != (lua_Number)k) {
6✔
859
    /* We're about to create a guard which always fails, like CONV +1.5.
860
    ** Some pathological loops cause this during LICM, e.g.:
861
    **   local x,k,t = 0,1.5,{1,[1.5]=2}
862
    **   for i=1,200 do x = x+ t[k]; k = k == 1 and 1.5 or 1 end
863
    **   assert(x == 300)
864
    */
865
    return FAILFOLD;
866
  }
867
  return INTFOLD(k);
6✔
868
}
869

870
LJFOLD(CONV KNUM IRCONV_U32_NUM)
871
LJFOLDF(kfold_conv_knum_u32_num)
2✔
872
{
873
#ifdef _MSC_VER
874
  {  /* Workaround for MSVC bug. */
875
    volatile uint32_t u = (uint32_t)knumleft;
876
    return INTFOLD((int32_t)u);
877
  }
878
#else
879
  return INTFOLD((int32_t)(uint32_t)knumleft);
2✔
880
#endif
881
}
882

883
LJFOLD(CONV KNUM IRCONV_I64_NUM)
884
LJFOLDF(kfold_conv_knum_i64_num)
8✔
885
{
886
  return INT64FOLD((uint64_t)(int64_t)knumleft);
8✔
887
}
888

889
LJFOLD(CONV KNUM IRCONV_U64_NUM)
890
LJFOLDF(kfold_conv_knum_u64_num)
9,268✔
891
{
892
  return INT64FOLD(lj_num2u64(knumleft));
9,268✔
893
}
894

895
LJFOLD(TOSTR KNUM any)
896
LJFOLDF(kfold_tostr_knum)
1✔
897
{
898
  return lj_ir_kstr(J, lj_strfmt_num(J->L, ir_knum(fleft)));
1✔
899
}
900

901
LJFOLD(TOSTR KINT any)
902
LJFOLDF(kfold_tostr_kint)
9✔
903
{
904
  return lj_ir_kstr(J, fins->op2 == IRTOSTR_INT ?
9✔
905
                       lj_strfmt_int(J->L, fleft->i) :
906
                       lj_strfmt_char(J->L, fleft->i));
907
}
908

909
LJFOLD(STRTO KGC)
910
LJFOLDF(kfold_strto)
35✔
911
{
912
  TValue n;
35✔
913
  if (lj_strscan_num(ir_kstr(fleft), &n))
35✔
914
    return lj_ir_knum(J, numV(&n));
35✔
915
  return FAILFOLD;
916
}
917

918
/* -- Constant folding of equality checks --------------------------------- */

/* Don't constant-fold away FLOAD checks against KNULL. */
LJFOLD(EQ FLOAD KNULL)
LJFOLD(NE FLOAD KNULL)
LJFOLDX(lj_opt_cse)

/* But fold all other KNULL compares, since only KNULL is equal to KNULL. */
LJFOLD(EQ any KNULL)
LJFOLD(NE any KNULL)
LJFOLD(EQ KNULL any)
LJFOLD(NE KNULL any)
LJFOLD(EQ KINT KINT)  /* Constants are unique, so same refs <==> same value. */
LJFOLD(NE KINT KINT)
LJFOLD(EQ KINT64 KINT64)
LJFOLD(NE KINT64 KINT64)
LJFOLD(EQ KGC KGC)
LJFOLD(NE KGC KGC)
LJFOLDF(kfold_kref)
{
  return CONDFOLD((fins->op1 == fins->op2) ^ (fins->o == IR_NE));
}

/* -- Algebraic shortcuts ------------------------------------------------- */

LJFOLD(FPMATH FPMATH IRFPM_FLOOR)
LJFOLD(FPMATH FPMATH IRFPM_CEIL)
LJFOLD(FPMATH FPMATH IRFPM_TRUNC)
LJFOLDF(shortcut_round)
{
  IRFPMathOp op = (IRFPMathOp)fleft->op2;
  if (op == IRFPM_FLOOR || op == IRFPM_CEIL || op == IRFPM_TRUNC)
    return LEFTFOLD;  /* round(round_left(x)) = round_left(x) */
  return NEXTFOLD;
}

LJFOLD(ABS ABS FLOAD)
LJFOLDF(shortcut_left)
{
  return LEFTFOLD;  /* f(g(x)) ==> g(x) */
}

LJFOLD(ABS NEG FLOAD)
LJFOLDF(shortcut_dropleft)
{
  PHIBARRIER(fleft);
  fins->op1 = fleft->op1;  /* abs(neg(x)) ==> abs(x) */
  return RETRYFOLD;
}

/* Note: no safe shortcuts with STRTO and TOSTR ("1e2" ==> +100 ==> "100"). */
LJFOLD(NEG NEG any)
LJFOLD(BNOT BNOT)
LJFOLD(BSWAP BSWAP)
LJFOLDF(shortcut_leftleft)
{
  PHIBARRIER(fleft);  /* See above. Fold would be ok, but not beneficial. */
  return fleft->op1;  /* f(g(x)) ==> x */
}

/* -- FP algebraic simplifications ---------------------------------------- */

/* FP arithmetic is tricky -- there's not much to simplify.
** Please note the following common pitfalls before sending "improvements":
**   x+0 ==> x  is INVALID for x=-0
**   0-x ==> -x is INVALID for x=+0
**   x*0 ==> 0  is INVALID for x=-0, x=+-Inf or x=NaN
*/
987
LJFOLD(ADD NEG any)
988
LJFOLDF(simplify_numadd_negx)
×
989
{
990
  PHIBARRIER(fleft);
×
991
  fins->o = IR_SUB;  /* (-a) + b ==> b - a */
×
992
  fins->op1 = fins->op2;
×
993
  fins->op2 = fleft->op1;
×
994
  return RETRYFOLD;
×
995
}
996

997
LJFOLD(ADD any NEG)
998
LJFOLDF(simplify_numadd_xneg)
×
999
{
1000
  PHIBARRIER(fright);
×
1001
  fins->o = IR_SUB;  /* a + (-b) ==> a - b */
×
1002
  fins->op2 = fright->op1;
×
1003
  return RETRYFOLD;
×
1004
}
1005

1006
LJFOLD(SUB any KNUM)
1007
LJFOLDF(simplify_numsub_k)
1,973✔
1008
{
1009
  if (ir_knum(fright)->u64 == 0)  /* x - (+0) ==> x */
1,973✔
1010
    return LEFTFOLD;
1,343✔
1011
  return NEXTFOLD;
1012
}
1013

1014
LJFOLD(SUB NEG KNUM)
1015
LJFOLDF(simplify_numsub_negk)
×
1016
{
1017
  PHIBARRIER(fleft);
×
1018
  fins->op2 = fleft->op1;  /* (-x) - k ==> (-k) - x */
×
1019
  fins->op1 = (IRRef1)lj_ir_knum(J, -knumright);
×
1020
  return RETRYFOLD;
×
1021
}
1022

1023
LJFOLD(SUB any NEG)
1024
LJFOLDF(simplify_numsub_xneg)
×
1025
{
1026
  PHIBARRIER(fright);
×
1027
  fins->o = IR_ADD;  /* a - (-b) ==> a + b */
×
1028
  fins->op2 = fright->op1;
×
1029
  return RETRYFOLD;
×
1030
}
1031

1032
LJFOLD(MUL any KNUM)
1033
LJFOLD(DIV any KNUM)
1034
LJFOLDF(simplify_nummuldiv_k)
259✔
1035
{
1036
  lua_Number n = knumright;
259✔
1037
  if (n == 1.0) {  /* x o 1 ==> x */
259✔
1038
    return LEFTFOLD;
×
1039
  } else if (n == -1.0) {  /* x o -1 ==> -x */
259✔
1040
    IRRef op1 = fins->op1;
×
1041
    fins->op2 = (IRRef1)lj_ir_ksimd(J, LJ_KSIMD_NEG);  /* Modifies fins. */
×
1042
    fins->op1 = op1;
×
1043
    fins->o = IR_NEG;
×
1044
    return RETRYFOLD;
×
1045
  } else if (fins->o == IR_MUL && n == 2.0) {  /* x * 2 ==> x + x */
259✔
1046
    fins->o = IR_ADD;
2✔
1047
    fins->op2 = fins->op1;
2✔
1048
    return RETRYFOLD;
2✔
1049
  } else if (fins->o == IR_DIV) {  /* x / 2^k ==> x * 2^-k */
257✔
1050
    uint64_t u = ir_knum(fright)->u64;
132✔
1051
    uint32_t ex = ((uint32_t)(u >> 52) & 0x7ff);
132✔
1052
    if ((u & U64x(000fffff,ffffffff)) == 0 && ex - 1 < 0x7fd) {
132✔
1053
      u = (u & ((uint64_t)1 << 63)) | ((uint64_t)(0x7fe - ex) << 52);
9✔
1054
      fins->o = IR_MUL;  /* Multiply by exact reciprocal. */
9✔
1055
      fins->op2 = lj_ir_knum_u64(J, u);
9✔
1056
      return RETRYFOLD;
9✔
1057
    }
1058
  }
1059
  return NEXTFOLD;
1060
}
1061
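
/* The reciprocal trick above operates on the IEEE 754 bit pattern: for a
** power of two the mantissa bits are all zero, so only the biased exponent
** ex changes. E.g. 2.0 has ex = 0x400 and its exact reciprocal 0.5 has
** ex = 0x3fe = 0x7fe - 0x400. The ex - 1 < 0x7fd check excludes zeros,
** denormals, Inf/NaN and the largest normal exponent, where the reciprocal
** would not be an exactly representable normal number.
*/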

LJFOLD(MUL NEG KNUM)
LJFOLD(DIV NEG KNUM)
LJFOLDF(simplify_nummuldiv_negk)
{
  PHIBARRIER(fleft);
  fins->op1 = fleft->op1;  /* (-a) o k ==> a o (-k) */
  fins->op2 = (IRRef1)lj_ir_knum(J, -knumright);
  return RETRYFOLD;
}

LJFOLD(MUL NEG NEG)
LJFOLD(DIV NEG NEG)
LJFOLDF(simplify_nummuldiv_negneg)
{
  PHIBARRIER(fleft);
  PHIBARRIER(fright);
  fins->op1 = fleft->op1;  /* (-a) o (-b) ==> a o b */
  fins->op2 = fright->op1;
  return RETRYFOLD;
}

LJFOLD(POW any KNUM)
LJFOLDF(simplify_numpow_k)
{
  if (knumright == 0)  /* x ^ 0 ==> 1 */
    return lj_ir_knum_one(J);  /* Result must be a number, not an int. */
  else if (knumright == 1)  /* x ^ 1 ==> x */
    return LEFTFOLD;
  else if (knumright == 2)  /* x ^ 2 ==> x * x */
    return emitir(IRTN(IR_MUL), fins->op1, fins->op1);
  else
    return NEXTFOLD;
}

/* -- Simplify conversions ------------------------------------------------ */

LJFOLD(CONV CONV IRCONV_NUM_INT)  /* _NUM */
LJFOLDF(shortcut_conv_num_int)
{
  PHIBARRIER(fleft);
  /* Only safe with a guarded conversion to int. */
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_NUM && irt_isguard(fleft->t))
    return fleft->op1;  /* f(g(x)) ==> x */
  return NEXTFOLD;
}

LJFOLD(CONV CONV IRCONV_INT_NUM)  /* _INT */
LJFOLD(CONV CONV IRCONV_U32_NUM)  /* _U32 */
LJFOLDF(simplify_conv_int_num)
{
  /* Fold even across PHI to avoid expensive num->int conversions in loop. */
  if ((fleft->op2 & IRCONV_SRCMASK) ==
      ((fins->op2 & IRCONV_DSTMASK) >> IRCONV_DSH))
    return fleft->op1;
  return NEXTFOLD;
}

LJFOLD(CONV CONV IRCONV_I64_NUM)  /* _INT or _U32 */
LJFOLD(CONV CONV IRCONV_U64_NUM)  /* _INT or _U32 */
LJFOLDF(simplify_conv_i64_num)
{
  PHIBARRIER(fleft);
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT) {
    /* Reduce to a sign-extension. */
    fins->op1 = fleft->op1;
    fins->op2 = ((IRT_I64<<5)|IRT_INT|IRCONV_SEXT);
    return RETRYFOLD;
  } else if ((fleft->op2 & IRCONV_SRCMASK) == IRT_U32) {
#if LJ_TARGET_X64
    return fleft->op1;
#else
    /* Reduce to a zero-extension. */
    fins->op1 = fleft->op1;
    fins->op2 = (IRT_I64<<5)|IRT_U32;
    return RETRYFOLD;
#endif
  }
  return NEXTFOLD;
}

LJFOLD(CONV CONV IRCONV_INT_I64)  /* _INT or _U32 */
LJFOLD(CONV CONV IRCONV_INT_U64)  /* _INT or _U32 */
LJFOLD(CONV CONV IRCONV_U32_I64)  /* _INT or _U32 */
LJFOLD(CONV CONV IRCONV_U32_U64)  /* _INT or _U32 */
LJFOLDF(simplify_conv_int_i64)
{
  int src;
  PHIBARRIER(fleft);
  src = (fleft->op2 & IRCONV_SRCMASK);
  if (src == IRT_INT || src == IRT_U32) {
    if (src == ((fins->op2 & IRCONV_DSTMASK) >> IRCONV_DSH)) {
      return fleft->op1;
    } else {
      fins->op2 = ((fins->op2 & IRCONV_DSTMASK) | src);
      fins->op1 = fleft->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(CONV CONV IRCONV_FLOAT_NUM)  /* _FLOAT */
LJFOLDF(simplify_conv_flt_num)
{
  PHIBARRIER(fleft);
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_FLOAT)
    return fleft->op1;
  return NEXTFOLD;
}

/* Shortcut TOBIT + IRT_NUM <- IRT_INT/IRT_U32 conversion. */
LJFOLD(TOBIT CONV KNUM)
LJFOLDF(simplify_tobit_conv)
{
  /* Fold even across PHI to avoid expensive num->int conversions in loop. */
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT) {
    lj_assertJ(irt_isnum(fleft->t), "expected TOBIT number arg");
    return fleft->op1;
  } else if ((fleft->op2 & IRCONV_SRCMASK) == IRT_U32) {
    lj_assertJ(irt_isnum(fleft->t), "expected TOBIT number arg");
    fins->o = IR_CONV;
    fins->op1 = fleft->op1;
    fins->op2 = (IRT_INT<<5)|IRT_U32;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

/* Shortcut floor/ceil/round + IRT_NUM <- IRT_INT/IRT_U32 conversion. */
LJFOLD(FPMATH CONV IRFPM_FLOOR)
LJFOLD(FPMATH CONV IRFPM_CEIL)
LJFOLD(FPMATH CONV IRFPM_TRUNC)
LJFOLDF(simplify_floor_conv)
{
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT ||
      (fleft->op2 & IRCONV_SRCMASK) == IRT_U32)
    return LEFTFOLD;
  return NEXTFOLD;
}

/* Strength reduction of widening. */
LJFOLD(CONV any IRCONV_I64_INT)
LJFOLD(CONV any IRCONV_U64_INT)
LJFOLDF(simplify_conv_sext)
{
  IRRef ref = fins->op1;
  int64_t ofs = 0;
  if (!(fins->op2 & IRCONV_SEXT))
    return NEXTFOLD;
  PHIBARRIER(fleft);
  if (fleft->o == IR_XLOAD && (irt_isu8(fleft->t) || irt_isu16(fleft->t)))
    goto ok_reduce;
  if (fleft->o == IR_ADD && irref_isk(fleft->op2)) {
    ofs = (int64_t)IR(fleft->op2)->i;
    ref = fleft->op1;
  }
  /* Use scalar evolution analysis results to strength-reduce sign-extension. */
  if (ref == J->scev.idx) {
    IRRef lo = J->scev.dir ? J->scev.start : J->scev.stop;
    lj_assertJ(irt_isint(J->scev.t), "only int SCEV supported");
    if (lo && IR(lo)->o == IR_KINT && IR(lo)->i + ofs >= 0) {
    ok_reduce:
#if LJ_TARGET_X64
      /* Eliminate widening. All 32 bit ops do an implicit zero-extension. */
      return LEFTFOLD;
#else
      /* Reduce to a (cheaper) zero-extension. */
      fins->op2 &= ~IRCONV_SEXT;
      return RETRYFOLD;
#endif
    }
  }
  return NEXTFOLD;
}
1237
/* Strength reduction of narrowing. */
1238
LJFOLD(CONV ADD IRCONV_INT_I64)
1239
LJFOLD(CONV SUB IRCONV_INT_I64)
1240
LJFOLD(CONV MUL IRCONV_INT_I64)
1241
LJFOLD(CONV ADD IRCONV_INT_U64)
1242
LJFOLD(CONV SUB IRCONV_INT_U64)
1243
LJFOLD(CONV MUL IRCONV_INT_U64)
1244
LJFOLD(CONV ADD IRCONV_U32_I64)
1245
LJFOLD(CONV SUB IRCONV_U32_I64)
1246
LJFOLD(CONV MUL IRCONV_U32_I64)
1247
LJFOLD(CONV ADD IRCONV_U32_U64)
1248
LJFOLD(CONV SUB IRCONV_U32_U64)
1249
LJFOLD(CONV MUL IRCONV_U32_U64)
1250
LJFOLDF(simplify_conv_narrow)
2✔
1251
{
1252
  IROp op = (IROp)fleft->o;
2✔
1253
  IRType t = irt_type(fins->t);
2✔
1254
  IRRef op1 = fleft->op1, op2 = fleft->op2, mode = fins->op2;
2✔
1255
  PHIBARRIER(fleft);
2✔
1256
  op1 = emitir(IRT(IR_CONV, t), op1, mode);
2✔
1257
  op2 = emitir(IRT(IR_CONV, t), op2, mode);
2✔
1258
  fins->ot = IRT(op, t);
2✔
1259
  fins->op1 = op1;
2✔
1260
  fins->op2 = op2;
2✔
1261
  return RETRYFOLD;
2✔
1262
}
1263

1264
/* Special CSE rule for CONV. */
1265
LJFOLD(CONV any any)
1266
LJFOLDF(cse_conv)
93,223✔
1267
{
1268
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
93,223✔
1269
    IRRef op1 = fins->op1, op2 = (fins->op2 & IRCONV_MODEMASK);
93,223✔
1270
    uint8_t guard = irt_isguard(fins->t);
93,223✔
1271
    IRRef ref = J->chain[IR_CONV];
93,223✔
1272
    while (ref > op1) {
126,025✔
1273
      IRIns *ir = IR(ref);
62,767✔
1274
      /* Commoning with stronger checks is ok. */
1275
      if (ir->op1 == op1 && (ir->op2 & IRCONV_MODEMASK) == op2 &&
62,767✔
1276
          irt_isguard(ir->t) >= guard)
29,965✔
1277
        return ref;
29,965✔
1278
      ref = ir->prev;
32,802✔
1279
    }
1280
  }
1281
  return EMITFOLD;  /* No fallthrough to regular CSE. */
63,258✔
1282
}
1283
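
/* "Stronger checks" means the guard bit: a guarded CONV (which is checked
** and exits the trace on failure) can stand in for an unguarded one with
** the same operand and mode, but not vice versa -- hence the
** irt_isguard(ir->t) >= guard comparison above.
*/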

/* FP conversion narrowing. */
LJFOLD(TOBIT ADD KNUM)
LJFOLD(TOBIT SUB KNUM)
LJFOLD(CONV ADD IRCONV_INT_NUM)
LJFOLD(CONV SUB IRCONV_INT_NUM)
LJFOLD(CONV ADD IRCONV_I64_NUM)
LJFOLD(CONV SUB IRCONV_I64_NUM)
LJFOLDF(narrow_convert)
{
  PHIBARRIER(fleft);
  /* Narrowing ignores PHIs and repeating it inside the loop is not useful. */
  if (J->chain[IR_LOOP])
    return NEXTFOLD;
  lj_assertJ(fins->o != IR_CONV || (fins->op2&IRCONV_CONVMASK) != IRCONV_TOBIT,
	     "unexpected CONV TOBIT");
  return lj_opt_narrow_convert(J);
}

/* -- Integer algebraic simplifications ----------------------------------- */

LJFOLD(ADD any KINT)
LJFOLD(ADDOV any KINT)
LJFOLD(SUBOV any KINT)
LJFOLDF(simplify_intadd_k)
{
  if (fright->i == 0)  /* i o 0 ==> i */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(MULOV any KINT)
LJFOLDF(simplify_intmul_k)
{
  if (fright->i == 0)  /* i * 0 ==> 0 */
    return RIGHTFOLD;
  if (fright->i == 1)  /* i * 1 ==> i */
    return LEFTFOLD;
  if (fright->i == 2) {  /* i * 2 ==> i + i */
    fins->o = IR_ADDOV;
    fins->op2 = fins->op1;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(SUB any KINT)
LJFOLDF(simplify_intsub_k)
{
  if (fright->i == 0)  /* i - 0 ==> i */
    return LEFTFOLD;
  fins->o = IR_ADD;  /* i - k ==> i + (-k) */
  fins->op2 = (IRRef1)lj_ir_kint(J, -fright->i);  /* Overflow for -2^31 ok. */
  return RETRYFOLD;
}

LJFOLD(SUB KINT any)
LJFOLD(SUB KINT64 any)
LJFOLDF(simplify_intsub_kleft)
{
  if (fleft->o == IR_KINT ? (fleft->i == 0) : (ir_kint64(fleft)->u64 == 0)) {
    fins->o = IR_NEG;  /* 0 - i ==> -i */
    fins->op1 = fins->op2;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(ADD any KINT64)
LJFOLDF(simplify_intadd_k64)
{
  if (ir_kint64(fright)->u64 == 0)  /* i + 0 ==> i */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(SUB any KINT64)
LJFOLDF(simplify_intsub_k64)
{
  uint64_t k = ir_kint64(fright)->u64;
  if (k == 0)  /* i - 0 ==> i */
    return LEFTFOLD;
  fins->o = IR_ADD;  /* i - k ==> i + (-k) */
  fins->op2 = (IRRef1)lj_ir_kint64(J, (uint64_t)-(int64_t)k);
  return RETRYFOLD;
}

static TRef simplify_intmul_k(jit_State *J, int32_t k)
{
  /* Note: many more simplifications are possible, e.g. 2^k1 +- 2^k2.
  ** But this is mainly intended for simple address arithmetic.
  ** Also it's easier for the backend to optimize the original multiplies.
  */
  if (k == 0) {  /* i * 0 ==> 0 */
    return RIGHTFOLD;
  } else if (k == 1) {  /* i * 1 ==> i */
    return LEFTFOLD;
  } else if ((k & (k-1)) == 0) {  /* i * 2^k ==> i << k */
    fins->o = IR_BSHL;
    fins->op2 = lj_ir_kint(J, lj_fls((uint32_t)k));
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(MUL any KINT)
LJFOLDF(simplify_intmul_k32)
{
  if (fright->i >= 0)
    return simplify_intmul_k(J, fright->i);
  return NEXTFOLD;
}

LJFOLD(MUL any KINT64)
LJFOLDF(simplify_intmul_k64)
{
#if LJ_HASFFI
  if (ir_kint64(fright)->u64 < 0x80000000u)
    return simplify_intmul_k(J, (int32_t)ir_kint64(fright)->u64);
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(MOD any KINT)
LJFOLDF(simplify_intmod_k)
{
  int32_t k = fright->i;
  lj_assertJ(k != 0, "integer mod 0");
  if (k > 0 && (k & (k-1)) == 0) {  /* i % (2^k) ==> i & (2^k-1) */
    fins->o = IR_BAND;
    fins->op2 = lj_ir_kint(J, k-1);
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(MOD KINT any)
LJFOLDF(simplify_intmod_kleft)
{
  if (fleft->i == 0)
    return INTFOLD(0);
  return NEXTFOLD;
}

LJFOLD(SUB any any)
LJFOLD(SUBOV any any)
LJFOLDF(simplify_intsub)
{
  if (fins->op1 == fins->op2 && !irt_isnum(fins->t))  /* i - i ==> 0 */
    return irt_is64(fins->t) ? INT64FOLD(0) : INTFOLD(0);
  return NEXTFOLD;
}

LJFOLD(SUB ADD any)
LJFOLDF(simplify_intsubadd_leftcancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fleft);
    if (fins->op2 == fleft->op1)  /* (i + j) - i ==> j */
      return fleft->op2;
    if (fins->op2 == fleft->op2)  /* (i + j) - j ==> i */
      return fleft->op1;
  }
  return NEXTFOLD;
}

LJFOLD(SUB SUB any)
LJFOLDF(simplify_intsubsub_leftcancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fleft);
    if (fins->op2 == fleft->op1) {  /* (i - j) - i ==> 0 - j */
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
      fins->op2 = fleft->op2;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(SUB any SUB)
LJFOLDF(simplify_intsubsub_rightcancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fright);
    if (fins->op1 == fright->op1)  /* i - (i - j) ==> j */
      return fright->op2;
  }
  return NEXTFOLD;
}

LJFOLD(SUB any ADD)
LJFOLDF(simplify_intsubadd_rightcancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fright);
    if (fins->op1 == fright->op1) {  /* i - (i + j) ==> 0 - j */
      fins->op2 = fright->op2;
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
      return RETRYFOLD;
    }
    if (fins->op1 == fright->op2) {  /* i - (j + i) ==> 0 - j */
      fins->op2 = fright->op1;
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(SUB ADD ADD)
LJFOLDF(simplify_intsubaddadd_cancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fleft);
    PHIBARRIER(fright);
    if (fleft->op1 == fright->op1) {  /* (i + j1) - (i + j2) ==> j1 - j2 */
      fins->op1 = fleft->op2;
      fins->op2 = fright->op2;
      return RETRYFOLD;
    }
    if (fleft->op1 == fright->op2) {  /* (i + j1) - (j2 + i) ==> j1 - j2 */
      fins->op1 = fleft->op2;
      fins->op2 = fright->op1;
      return RETRYFOLD;
    }
    if (fleft->op2 == fright->op1) {  /* (j1 + i) - (i + j2) ==> j1 - j2 */
      fins->op1 = fleft->op1;
      fins->op2 = fright->op2;
      return RETRYFOLD;
    }
    if (fleft->op2 == fright->op2) {  /* (j1 + i) - (j2 + i) ==> j1 - j2 */
      fins->op1 = fleft->op1;
      fins->op2 = fright->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(BAND any KINT)
1526
LJFOLD(BAND any KINT64)
1527
LJFOLDF(simplify_band_k)
40,079✔
1528
{
1529
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
40,079✔
1530
                                     (int64_t)ir_k64(fright)->u64;
9✔
1531
  if (k == 0)  /* i & 0 ==> 0 */
40,079✔
1532
    return RIGHTFOLD;
×
1533
  if (k == -1)  /* i & -1 ==> i */
40,079✔
1534
    return LEFTFOLD;
×
1535
  return NEXTFOLD;
1536
}
1537

1538
LJFOLD(BOR any KINT)
1539
LJFOLD(BOR any KINT64)
1540
LJFOLDF(simplify_bor_k)
58✔
1541
{
1542
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
58✔
1543
                                     (int64_t)ir_k64(fright)->u64;
6✔
1544
  if (k == 0)  /* i | 0 ==> i */
58✔
1545
    return LEFTFOLD;
×
1546
  if (k == -1)  /* i | -1 ==> -1 */
58✔
1547
    return RIGHTFOLD;
×
1548
  return NEXTFOLD;
1549
}
1550

1551
LJFOLD(BXOR any KINT)
1552
LJFOLD(BXOR any KINT64)
1553
LJFOLDF(simplify_bxor_k)
10✔
1554
{
1555
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
10✔
1556
                                     (int64_t)ir_k64(fright)->u64;
6✔
1557
  if (k == 0)  /* i xor 0 ==> i */
10✔
1558
    return LEFTFOLD;
4✔
1559
  if (k == -1) {  /* i xor -1 ==> ~i */
6✔
1560
    fins->o = IR_BNOT;
×
1561
    fins->op2 = 0;
×
1562
    return RETRYFOLD;
×
1563
  }
1564
  return NEXTFOLD;
1565
}
1566

1567
LJFOLD(BSHL any KINT)
1568
LJFOLD(BSHR any KINT)
1569
LJFOLD(BSAR any KINT)
1570
LJFOLD(BROL any KINT)
1571
LJFOLD(BROR any KINT)
1572
LJFOLDF(simplify_shift_ik)
5,590✔
1573
{
1574
  int32_t mask = irt_is64(fins->t) ? 63 : 31;
5,590✔
1575
  int32_t k = (fright->i & mask);
5,590✔
1576
  if (k == 0)  /* i o 0 ==> i */
5,590✔
1577
    return LEFTFOLD;
1✔
1578
  if (k == 1 && fins->o == IR_BSHL) {  /* i << 1 ==> i + i */
5,589✔
1579
    fins->o = IR_ADD;
18✔
1580
    fins->op2 = fins->op1;
18✔
1581
    return RETRYFOLD;
18✔
1582
  }
1583
  if (k != fright->i) {  /* i o k ==> i o (k & mask) */
5,571✔
1584
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
×
1585
    return RETRYFOLD;
×
1586
  }
1587
#ifndef LJ_TARGET_UNIFYROT
1588
  if (fins->o == IR_BROR) {  /* bror(i, k) ==> brol(i, (-k)&mask) */
5,571✔
1589
    fins->o = IR_BROL;
×
1590
    fins->op2 = (IRRef1)lj_ir_kint(J, (-k)&mask);
×
1591
    return RETRYFOLD;
×
1592
  }
1593
#endif
1594
  return NEXTFOLD;
1595
}

LJFOLD(BSHL any BAND)
LJFOLD(BSHR any BAND)
LJFOLD(BSAR any BAND)
LJFOLD(BROL any BAND)
LJFOLD(BROR any BAND)
LJFOLDF(simplify_shift_andk)
{
  IRIns *irk = IR(fright->op2);
  PHIBARRIER(fright);
  if ((fins->o < IR_BROL ? LJ_TARGET_MASKSHIFT : LJ_TARGET_MASKROT) &&
      irk->o == IR_KINT) {  /* i o (j & mask) ==> i o j */
    int32_t mask = irt_is64(fins->t) ? 63 : 31;
    int32_t k = irk->i & mask;
    if (k == mask) {
      fins->op2 = fright->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(BSHL KINT any)
LJFOLD(BSHR KINT any)
LJFOLD(BSHL KINT64 any)
LJFOLD(BSHR KINT64 any)
LJFOLDF(simplify_shift1_ki)
{
  int64_t k = fleft->o == IR_KINT ? (int64_t)fleft->i :
                                    (int64_t)ir_k64(fleft)->u64;
  if (k == 0)  /* 0 o i ==> 0 */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BSAR KINT any)
LJFOLD(BROL KINT any)
LJFOLD(BROR KINT any)
LJFOLD(BSAR KINT64 any)
LJFOLD(BROL KINT64 any)
LJFOLD(BROR KINT64 any)
LJFOLDF(simplify_shift2_ki)
{
  int64_t k = fleft->o == IR_KINT ? (int64_t)fleft->i :
                                    (int64_t)ir_k64(fleft)->u64;
  if (k == 0 || k == -1)  /* 0 o i ==> 0; -1 o i ==> -1 */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BSHL BAND KINT)
LJFOLD(BSHR BAND KINT)
LJFOLD(BROL BAND KINT)
LJFOLD(BROR BAND KINT)
LJFOLDF(simplify_shiftk_andk)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT) {  /* (i & k1) o k2 ==> (i o k2) & (k1 o k2) */
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    fins->op1 = fleft->op1;
    fins->op1 = (IRRef1)lj_opt_fold(J);
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    fins->ot = IRTI(IR_BAND);
    return RETRYFOLD;
  } else if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, fright->i,
                                  (IROp)fins->o);
    IROpT ot = fleft->ot;
    fins->op1 = fleft->op1;
    fins->op1 = (IRRef1)lj_opt_fold(J);
    fins->op2 = (IRRef1)lj_ir_kint64(J, k);
    fins->ot = ot;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}
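
/* Example: (i & 0xff) << 8 is distributed to (i << 8) & 0xff00, which
** lets the inner shift participate in further folds while the combined
** mask stays a constant.
*/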

LJFOLD(BAND BSHL KINT)
LJFOLD(BAND BSHR KINT)
LJFOLDF(simplify_andk_shiftk)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT &&
      kfold_intop(-1, irk->i, (IROp)fleft->o) == fright->i)
    return LEFTFOLD;  /* (i o k1) & k2 ==> i, if (-1 o k1) == k2 */
  return NEXTFOLD;
}

LJFOLD(BAND BOR KINT)
LJFOLD(BOR BAND KINT)
LJFOLDF(simplify_andor_k)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT) {
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    /* (i | k1) & k2 ==> i & k2, if (k1 & k2) == 0. */
    /* (i & k1) | k2 ==> i | k2, if (k1 | k2) == -1. */
    if (k == (fins->o == IR_BAND ? 0 : -1)) {
      fins->op1 = fleft->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(BAND BOR KINT64)
LJFOLD(BOR BAND KINT64)
LJFOLDF(simplify_andor_k64)
{
#if LJ_HASFFI
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, ir_k64(fright)->u64,
                                  (IROp)fins->o);
    /* (i | k1) & k2 ==> i & k2, if (k1 & k2) == 0. */
    /* (i & k1) | k2 ==> i | k2, if (k1 | k2) == -1. */
    if (k == (fins->o == IR_BAND ? (uint64_t)0 : ~(uint64_t)0)) {
      fins->op1 = fleft->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

/* -- Reassociation ------------------------------------------------------- */

LJFOLD(ADD ADD KINT)
LJFOLD(MUL MUL KINT)
LJFOLD(BAND BAND KINT)
LJFOLD(BOR BOR KINT)
LJFOLD(BXOR BXOR KINT)
LJFOLDF(reassoc_intarith_k)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT) {
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    if (k == irk->i)  /* (i o k1) o k2 ==> i o k1, if (k1 o k2) == k1. */
      return LEFTFOLD;
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    return RETRYFOLD;  /* (i o k1) o k2 ==> i o (k1 o k2) */
  }
  return NEXTFOLD;
}
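
/* Example: (i + 3) + 5 reassociates to i + 8, while (i & 0xff) & 0xfff
** is already saturated (0xff & 0xfff == 0xff) and folds to the already
** computed i & 0xff without emitting anything.
*/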

LJFOLD(ADD ADD KINT64)
LJFOLD(MUL MUL KINT64)
LJFOLD(BAND BAND KINT64)
LJFOLD(BOR BOR KINT64)
LJFOLD(BXOR BXOR KINT64)
LJFOLDF(reassoc_intarith_k64)
{
#if LJ_HASFFI
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, ir_k64(fright)->u64,
                                  (IROp)fins->o);
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint64(J, k);
    return RETRYFOLD;  /* (i o k1) o k2 ==> i o (k1 o k2) */
  }
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BAND BAND any)
LJFOLD(BOR BOR any)
LJFOLDF(reassoc_dup)
{
  if (fins->op2 == fleft->op1 || fins->op2 == fleft->op2)
    return LEFTFOLD;  /* (a o b) o a ==> a o b; (a o b) o b ==> a o b */
  return NEXTFOLD;
}

LJFOLD(MIN MIN any)
LJFOLD(MAX MAX any)
LJFOLDF(reassoc_dup_minmax)
{
  if (fins->op2 == fleft->op2)
    return LEFTFOLD;  /* (a o b) o b ==> a o b */
  return NEXTFOLD;
}

LJFOLD(BXOR BXOR any)
LJFOLDF(reassoc_bxor)
{
  PHIBARRIER(fleft);
  if (fins->op2 == fleft->op1)  /* (a xor b) xor a ==> b */
    return fleft->op2;
  if (fins->op2 == fleft->op2)  /* (a xor b) xor b ==> a */
    return fleft->op1;
  return NEXTFOLD;
}

LJFOLD(BSHL BSHL KINT)
LJFOLD(BSHR BSHR KINT)
LJFOLD(BSAR BSAR KINT)
LJFOLD(BROL BROL KINT)
LJFOLD(BROR BROR KINT)
LJFOLDF(reassoc_shift)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);  /* The (shift any KINT) rule covers k2 == 0 and more. */
  if (irk->o == IR_KINT) {  /* (i o k1) o k2 ==> i o (k1 + k2) */
    int32_t mask = irt_is64(fins->t) ? 63 : 31;
    int32_t k = (irk->i & mask) + (fright->i & mask);
    if (k > mask) {  /* Combined shift too wide? */
      if (fins->o == IR_BSHL || fins->o == IR_BSHR)
        return mask == 31 ? INTFOLD(0) : INT64FOLD(0);
      else if (fins->o == IR_BSAR)
        k = mask;
      else
        k &= mask;
    }
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    return RETRYFOLD;
  }
  return NEXTFOLD;
}
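
/* Example: (i << 10) << 25 on a 32 bit operand adds up to a 35 bit
** shift, so the BSHL result is the constant 0; a BSAR instead saturates
** at the mask and rotates simply wrap around.
*/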

LJFOLD(MIN MIN KINT)
LJFOLD(MAX MAX KINT)
LJFOLDF(reassoc_minmax_k)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT) {
    int32_t a = irk->i;
    int32_t y = kfold_intop(a, fright->i, fins->o);
    if (a == y)  /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
      return LEFTFOLD;
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, y);
    return RETRYFOLD;  /* (x o k1) o k2 ==> x o (k1 o k2) */
  }
  return NEXTFOLD;
}

/* -- Array bounds check elimination -------------------------------------- */

/* Eliminate ABC across PHIs to handle t[i-1] forwarding case.
** ABC(asize, (i+k)+(-k)) ==> ABC(asize, i), but only if it already exists.
** Could be generalized to (i+k1)+k2 ==> i+(k1+k2), but needs better disambig.
*/
LJFOLD(ABC any ADD)
LJFOLDF(abc_fwd)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_ABC)) {
    if (irref_isk(fright->op2)) {
      IRIns *add2 = IR(fright->op1);
      if (add2->o == IR_ADD && irref_isk(add2->op2) &&
          IR(fright->op2)->i == -IR(add2->op2)->i) {
        IRRef ref = J->chain[IR_ABC];
        IRRef lim = add2->op1;
        if (fins->op1 > lim) lim = fins->op1;
        while (ref > lim) {
          IRIns *ir = IR(ref);
          if (ir->op1 == fins->op1 && ir->op2 == add2->op1)
            return DROPFOLD;
          ref = ir->prev;
        }
      }
    }
  }
  return NEXTFOLD;
}

/* Eliminate ABC for constants.
** ABC(asize, k1), ABC(asize, k2) ==> ABC(asize, max(k1, k2))
** Drop second ABC if k2 is lower. Otherwise patch first ABC with k2.
*/
LJFOLD(ABC any KINT)
LJFOLDF(abc_k)
{
  PHIBARRIER(fleft);
  if (LJ_LIKELY(J->flags & JIT_F_OPT_ABC)) {
    IRRef ref = J->chain[IR_ABC];
    IRRef asize = fins->op1;
    while (ref > asize) {
      IRIns *ir = IR(ref);
      if (ir->op1 == asize && irref_isk(ir->op2)) {
        uint32_t k = (uint32_t)IR(ir->op2)->i;
        if ((uint32_t)fright->i > k)
          ir->op2 = fins->op2;
        return DROPFOLD;
      }
      ref = ir->prev;
    }
    return EMITFOLD;  /* Already performed CSE. */
  }
  return NEXTFOLD;
}
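
/* Example: after ABC(asize, +5) a later ABC(asize, +3) is simply
** dropped, while a later ABC(asize, +7) patches the existing check up
** to the larger constant index and is then dropped itself.
*/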

/* Eliminate invariant ABC inside loop. */
LJFOLD(ABC any any)
LJFOLDF(abc_invar)
{
  /* Invariant ABC marked as PTR. Drop if op1 is invariant, too. */
  if (!irt_isint(fins->t) && fins->op1 < J->chain[IR_LOOP] &&
      !irt_isphi(IR(fins->op1)->t))
    return DROPFOLD;
  return NEXTFOLD;
}

/* -- Commutativity ------------------------------------------------------- */

/* The refs of commutative ops are canonicalized. Lower refs go to the right.
** Rationale behind this:
** - It (also) moves constants to the right.
** - It reduces the number of FOLD rules (e.g. (BOR any KINT) suffices).
** - It helps CSE to find more matches.
** - The assembler generates better code with constants at the right.
*/

LJFOLD(ADD any any)
LJFOLD(MUL any any)
LJFOLD(ADDOV any any)
LJFOLD(MULOV any any)
LJFOLDF(comm_swap)
{
  if (fins->op1 < fins->op2) {  /* Move lower ref to the right. */
    IRRef1 tmp = fins->op1;
    fins->op1 = fins->op2;
    fins->op2 = tmp;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}
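
/* Example: constant refs are biased below instruction refs, so ADD +5 i
** always has op1 < op2 and is swapped into ADD i +5 before any
** (ADD any KINT) rule can match.
*/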

LJFOLD(EQ any any)
LJFOLD(NE any any)
LJFOLDF(comm_equal)
{
  /* For non-numbers only: x == x ==> drop; x ~= x ==> fail */
  if (fins->op1 == fins->op2 &&
      (!irt_isnum(fins->t) ||
       (fleft->o == IR_CONV &&  /* Converted integers cannot be NaN. */
        (uint32_t)(fleft->op2 & IRCONV_SRCMASK) - (uint32_t)IRT_I8 <= (uint32_t)(IRT_U64 - IRT_U8))))
    return CONDFOLD(fins->o == IR_EQ);
  return fold_comm_swap(J);
}

LJFOLD(LT any any)
LJFOLD(GE any any)
LJFOLD(LE any any)
LJFOLD(GT any any)
LJFOLD(ULT any any)
LJFOLD(UGE any any)
LJFOLD(ULE any any)
LJFOLD(UGT any any)
LJFOLDF(comm_comp)
{
  /* For non-numbers only: x <=> x ==> drop; x <> x ==> fail */
  if (fins->op1 == fins->op2 && !irt_isnum(fins->t))
    return CONDFOLD((fins->o ^ (fins->o >> 1)) & 1);
  if (fins->op1 < fins->op2) {  /* Move lower ref to the right. */
    IRRef1 tmp = fins->op1;
    fins->op1 = fins->op2;
    fins->op2 = tmp;
    fins->o ^= 3; /* GT <-> LT, GE <-> LE, does not affect U */
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(BAND any any)
LJFOLD(BOR any any)
LJFOLDF(comm_dup)
{
  if (fins->op1 == fins->op2)  /* x o x ==> x */
    return LEFTFOLD;
  return fold_comm_swap(J);
}

LJFOLD(MIN any any)
LJFOLD(MAX any any)
LJFOLDF(comm_dup_minmax)
{
  if (fins->op1 == fins->op2)  /* x o x ==> x */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BXOR any any)
LJFOLDF(comm_bxor)
{
  if (fins->op1 == fins->op2)  /* i xor i ==> 0 */
    return irt_is64(fins->t) ? INT64FOLD(0) : INTFOLD(0);
  return fold_comm_swap(J);
}

/* -- Simplification of compound expressions ------------------------------ */

static TRef kfold_xload(jit_State *J, IRIns *ir, const void *p)
{
  int32_t k;
  switch (irt_type(ir->t)) {
  case IRT_NUM: return lj_ir_knum_u64(J, *(uint64_t *)p);
  case IRT_I8: k = (int32_t)*(int8_t *)p; break;
  case IRT_U8: k = (int32_t)*(uint8_t *)p; break;
  case IRT_I16: k = (int32_t)(int16_t)lj_getu16(p); break;
  case IRT_U16: k = (int32_t)(uint16_t)lj_getu16(p); break;
  case IRT_INT: case IRT_U32: k = (int32_t)lj_getu32(p); break;
  case IRT_I64: case IRT_U64: return lj_ir_kint64(J, *(uint64_t *)p);
  default: return 0;
  }
  return lj_ir_kint(J, k);
}

/* Turn: string.sub(str, a, b) == kstr
** into: string.byte(str, a) == string.byte(kstr, 1) etc.
** Note: this creates unaligned XLOADs on x86/x64.
*/
LJFOLD(EQ SNEW KGC)
LJFOLD(NE SNEW KGC)
LJFOLDF(merge_eqne_snew_kgc)
{
  GCstr *kstr = ir_kstr(fright);
  int32_t len = (int32_t)kstr->len;
  lj_assertJ(irt_isstr(fins->t), "bad equality IR type");

#if LJ_TARGET_UNALIGNED
#define FOLD_SNEW_MAX_LEN  4  /* Handle string lengths 0, 1, 2, 3, 4. */
#define FOLD_SNEW_TYPE8    IRT_I8  /* Creates shorter immediates. */
#else
#define FOLD_SNEW_MAX_LEN  1  /* Handle string lengths 0 or 1. */
#define FOLD_SNEW_TYPE8    IRT_U8  /* Prefer unsigned loads. */
#endif

  PHIBARRIER(fleft);
  if (len <= FOLD_SNEW_MAX_LEN) {
    IROp op = (IROp)fins->o;
    IRRef strref = fleft->op1;
    if (IR(strref)->o != IR_STRREF)
      return NEXTFOLD;
    if (op == IR_EQ) {
      emitir(IRTGI(IR_EQ), fleft->op2, lj_ir_kint(J, len));
      /* Caveat: fins/fleft/fright is no longer valid after emitir. */
    } else {
      /* NE is not expanded since this would need an OR of two conds. */
      if (!irref_isk(fleft->op2))  /* Only handle the constant length case. */
        return NEXTFOLD;
      if (IR(fleft->op2)->i != len)
        return DROPFOLD;
    }
    if (len > 0) {
      /* A 4 byte load for length 3 is ok -- all strings have an extra NUL. */
      uint16_t ot = (uint16_t)(len == 1 ? IRT(IR_XLOAD, FOLD_SNEW_TYPE8) :
                               len == 2 ? IRT(IR_XLOAD, IRT_U16) :
                               IRTI(IR_XLOAD));
      TRef tmp = emitir(ot, strref,
                        IRXLOAD_READONLY | (len > 1 ? IRXLOAD_UNALIGNED : 0));
      TRef val = kfold_xload(J, IR(tref_ref(tmp)), strdata(kstr));
      if (len == 3)
        tmp = emitir(IRTI(IR_BAND), tmp,
                     lj_ir_kint(J, LJ_ENDIAN_SELECT(0x00ffffff, 0xffffff00)));
      fins->op1 = (IRRef1)tmp;
      fins->op2 = (IRRef1)val;
      fins->ot = (IROpT)IRTGI(op);
      return RETRYFOLD;
    } else {
      return DROPFOLD;
    }
  }
  return NEXTFOLD;
}
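
/* Example: string.sub(s, i, i+1) == "ab" first emits a guard on the
** SNEW length (EQ len +2), then compares a two byte XLOAD of the source
** against the same two bytes read from the constant string.
*/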

/* -- Loads --------------------------------------------------------------- */

/* Loads cannot be folded or passed on to CSE in general.
** Alias analysis is needed to check for forwarding opportunities.
**
** Caveat: *all* loads must be listed here or they end up at CSE!
*/

LJFOLD(ALOAD any)
LJFOLDX(lj_opt_fwd_aload)

/* From HREF fwd (see below). Must eliminate, not supported by fwd/backend. */
LJFOLD(HLOAD KKPTR)
LJFOLDF(kfold_hload_kkptr)
{
  UNUSED(J);
  lj_assertJ(ir_kptr(fleft) == niltvg(J2G(J)), "expected niltv");
  return TREF_NIL;
}

LJFOLD(HLOAD any)
LJFOLDX(lj_opt_fwd_hload)

LJFOLD(ULOAD any)
LJFOLDX(lj_opt_fwd_uload)

LJFOLD(CALLL any IRCALL_lj_tab_len)
LJFOLDX(lj_opt_fwd_tab_len)

/* Upvalue refs are really loads, but there are no corresponding stores.
** So CSE is ok for them, except for UREFO across a GC step (see below).
** If the referenced function is const, its upvalue addresses are const, too.
** This can be used to improve CSE by looking for the same address,
** even if the upvalues originate from a different function.
*/
LJFOLD(UREFO KGC any)
LJFOLD(UREFC KGC any)
LJFOLDF(cse_uref)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    IRRef ref = J->chain[fins->o];
    GCfunc *fn = ir_kfunc(fleft);
    GCupval *uv = gco2uv(gcref(fn->l.uvptr[(fins->op2 >> 8)]));
    while (ref > 0) {
      IRIns *ir = IR(ref);
      if (irref_isk(ir->op1)) {
        GCfunc *fn2 = ir_kfunc(IR(ir->op1));
        if (gco2uv(gcref(fn2->l.uvptr[(ir->op2 >> 8)])) == uv) {
          if (fins->o == IR_UREFO && gcstep_barrier(J, ref))
            break;
          return ref;
        }
      }
      ref = ir->prev;
    }
  }
  return EMITFOLD;
}
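
/* Example: two UREFO lookups of the same upvalue, even via two different
** constant functions, CSE to one ref, unless a GC step may lie between
** them, in which case the reference is emitted again.
*/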

LJFOLD(HREFK any any)
LJFOLDX(lj_opt_fwd_hrefk)

LJFOLD(HREF TNEW any)
LJFOLDF(fwd_href_tnew)
{
  if (lj_opt_fwd_href_nokey(J))
    return lj_ir_kkptr(J, niltvg(J2G(J)));
  return NEXTFOLD;
}

LJFOLD(HREF TDUP KPRI)
LJFOLD(HREF TDUP KGC)
LJFOLD(HREF TDUP KNUM)
LJFOLDF(fwd_href_tdup)
{
  TValue keyv;
  lj_ir_kvalue(J->L, &keyv, fright);
  if (lj_tab_get(J->L, ir_ktab(IR(fleft->op1)), &keyv) == niltvg(J2G(J)) &&
      lj_opt_fwd_href_nokey(J))
    return lj_ir_kkptr(J, niltvg(J2G(J)));
  return NEXTFOLD;
}

/* We can safely FOLD/CSE array/hash refs and field loads, since there
** are no corresponding stores. But we need to check for any NEWREF with
** an aliased table, as it may invalidate all of the pointers and fields.
** Only HREF needs the NEWREF check -- AREF and HREFK already depend on
** FLOADs. And NEWREF itself is treated like a store (see below).
** LREF is constant (per trace) since coroutine switches are not inlined.
*/
LJFOLD(FLOAD TNEW IRFL_TAB_ASIZE)
LJFOLDF(fload_tab_tnew_asize)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD(fleft->op1);
  return NEXTFOLD;
}
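
/* Example: TNEW carries the requested array size in op1, so a later
** FLOAD of IRFL_TAB_ASIZE on that table folds to this constant, as long
** as lj_opt_fwd_tptr finds no conflicting NEWREF on an aliased table.
*/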

LJFOLD(FLOAD TNEW IRFL_TAB_HMASK)
LJFOLDF(fload_tab_tnew_hmask)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((1 << fleft->op2)-1);
  return NEXTFOLD;
}

LJFOLD(FLOAD TDUP IRFL_TAB_ASIZE)
LJFOLDF(fload_tab_tdup_asize)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((int32_t)ir_ktab(IR(fleft->op1))->asize);
  return NEXTFOLD;
}

LJFOLD(FLOAD TDUP IRFL_TAB_HMASK)
LJFOLDF(fload_tab_tdup_hmask)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((int32_t)ir_ktab(IR(fleft->op1))->hmask);
  return NEXTFOLD;
}

LJFOLD(HREF any any)
LJFOLD(FLOAD any IRFL_TAB_ARRAY)
LJFOLD(FLOAD any IRFL_TAB_NODE)
LJFOLD(FLOAD any IRFL_TAB_ASIZE)
LJFOLD(FLOAD any IRFL_TAB_HMASK)
LJFOLDF(fload_tab_ah)
{
  TRef tr = lj_opt_cse(J);
  return lj_opt_fwd_tptr(J, tref_ref(tr)) ? tr : EMITFOLD;
}

/* Strings are immutable, so we can safely FOLD/CSE the related FLOAD. */
LJFOLD(FLOAD KGC IRFL_STR_LEN)
LJFOLDF(fload_str_len_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return INTFOLD((int32_t)ir_kstr(fleft)->len);
  return NEXTFOLD;
}

LJFOLD(FLOAD SNEW IRFL_STR_LEN)
LJFOLDF(fload_str_len_snew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    PHIBARRIER(fleft);
    return fleft->op2;
  }
  return NEXTFOLD;
}

LJFOLD(FLOAD TOSTR IRFL_STR_LEN)
LJFOLDF(fload_str_len_tostr)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && fleft->op2 == IRTOSTR_CHAR)
    return INTFOLD(1);
  return NEXTFOLD;
}

/* The C type ID of cdata objects is immutable. */
LJFOLD(FLOAD KGC IRFL_CDATA_CTYPEID)
LJFOLDF(fload_cdata_typeid_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return INTFOLD((int32_t)ir_kcdata(fleft)->ctypeid);
  return NEXTFOLD;
}

/* Get the contents of immutable cdata objects. */
LJFOLD(FLOAD KGC IRFL_CDATA_PTR)
LJFOLD(FLOAD KGC IRFL_CDATA_INT)
LJFOLD(FLOAD KGC IRFL_CDATA_INT64)
LJFOLDF(fload_cdata_int64_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    void *p = cdataptr(ir_kcdata(fleft));
    if (irt_is64(fins->t))
      return INT64FOLD(*(uint64_t *)p);
    else
      return INTFOLD(*(int32_t *)p);
  }
  return NEXTFOLD;
}

LJFOLD(FLOAD CNEW IRFL_CDATA_CTYPEID)
LJFOLD(FLOAD CNEWI IRFL_CDATA_CTYPEID)
LJFOLDF(fload_cdata_typeid_cnew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return fleft->op1;  /* No PHI barrier needed. CNEW/CNEWI op1 is const. */
  return NEXTFOLD;
}

/* Pointer, int and int64 cdata objects are immutable. */
LJFOLD(FLOAD CNEWI IRFL_CDATA_PTR)
LJFOLD(FLOAD CNEWI IRFL_CDATA_INT)
LJFOLD(FLOAD CNEWI IRFL_CDATA_INT64)
LJFOLDF(fload_cdata_ptr_int64_cnew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return fleft->op2;  /* Fold even across PHI to avoid allocations. */
  return NEXTFOLD;
}

LJFOLD(FLOAD any IRFL_STR_LEN)
LJFOLD(FLOAD any IRFL_FUNC_ENV)
LJFOLD(FLOAD any IRFL_THREAD_ENV)
LJFOLD(FLOAD any IRFL_CDATA_CTYPEID)
LJFOLD(FLOAD any IRFL_CDATA_PTR)
LJFOLD(FLOAD any IRFL_CDATA_INT)
LJFOLD(FLOAD any IRFL_CDATA_INT64)
LJFOLD(VLOAD any any)  /* Vararg loads have no corresponding stores. */
LJFOLDX(lj_opt_cse)

/* All other field loads need alias analysis. */
LJFOLD(FLOAD any any)
LJFOLDX(lj_opt_fwd_fload)

/* This is for LOOP only. Recording handles SLOADs internally. */
LJFOLD(SLOAD any any)
LJFOLDF(fwd_sload)
{
  if ((fins->op2 & IRSLOAD_FRAME)) {
    TRef tr = lj_opt_cse(J);
    return tref_ref(tr) < J->chain[IR_RETF] ? EMITFOLD : tr;
  } else {
    lj_assertJ(J->slot[fins->op1] != 0, "uninitialized slot accessed");
    return J->slot[fins->op1];
  }
}

/* Only fold for KKPTR. The pointer _and_ the contents must be const. */
LJFOLD(XLOAD KKPTR any)
LJFOLDF(xload_kptr)
{
  TRef tr = kfold_xload(J, fins, ir_kptr(fleft));
  return tr ? tr : NEXTFOLD;
}

LJFOLD(XLOAD any any)
LJFOLDX(lj_opt_fwd_xload)

/* -- Frame handling ------------------------------------------------------ */

/* Prevent CSE of a REF_BASE operand across IR_RETF. */
LJFOLD(SUB any BASE)
LJFOLD(SUB BASE any)
LJFOLD(EQ any BASE)
LJFOLDF(fold_base)
{
  return lj_opt_cselim(J, J->chain[IR_RETF]);
}

/* -- Write barriers ------------------------------------------------------ */

/* Write barriers are amenable to CSE, but not across any incremental
** GC steps.
**
** The same logic applies to open upvalue references, because a stack
** may be resized during a GC step (not the current stack, but maybe that
** of a coroutine).
*/
LJFOLD(TBAR any)
LJFOLD(OBAR any any)
LJFOLD(UREFO any any)
LJFOLDF(barrier_tab)
{
  TRef tr = lj_opt_cse(J);
  if (gcstep_barrier(J, tref_ref(tr)))  /* CSE across GC step? */
    return EMITFOLD;  /* Raw emit. Assumes fins is left intact by CSE. */
  return tr;
}

LJFOLD(TBAR TNEW)
LJFOLD(TBAR TDUP)
LJFOLDF(barrier_tnew_tdup)
{
  /* New tables are always white and never need a barrier. */
  if (fins->op1 < J->chain[IR_LOOP])  /* Except across a GC step. */
    return NEXTFOLD;
  return DROPFOLD;
}

/* -- Profiling ----------------------------------------------------------- */

LJFOLD(PROF any any)
LJFOLDF(prof)
{
  IRRef ref = J->chain[IR_PROF];
  if (ref+1 == J->cur.nins)  /* Drop neighbouring IR_PROF. */
    return ref;
  return EMITFOLD;
}

/* -- Stores and allocations ---------------------------------------------- */

/* Stores and allocations cannot be folded or passed on to CSE in general.
** But some stores can be eliminated with dead-store elimination (DSE).
**
** Caveat: *all* stores and allocs must be listed here or they end up at CSE!
*/

LJFOLD(ASTORE any any)
LJFOLD(HSTORE any any)
LJFOLDX(lj_opt_dse_ahstore)

LJFOLD(USTORE any any)
LJFOLDX(lj_opt_dse_ustore)

LJFOLD(FSTORE any any)
LJFOLDX(lj_opt_dse_fstore)

LJFOLD(XSTORE any any)
LJFOLDX(lj_opt_dse_xstore)

LJFOLD(NEWREF any any)  /* Treated like a store. */
LJFOLD(CALLA any any)
LJFOLD(CALLL any any)  /* Safeguard fallback. */
LJFOLD(CALLS any any)
LJFOLD(CALLXS any any)
LJFOLD(XBAR)
LJFOLD(RETF any any)  /* Modifies BASE. */
LJFOLD(TNEW any any)
LJFOLD(TDUP any)
LJFOLD(CNEW any any)
LJFOLD(XSNEW any any)
LJFOLD(BUFHDR any any)
LJFOLDX(lj_ir_emit)

/* ------------------------------------------------------------------------ */

/* Every entry in the generated hash table is a 32 bit pattern:
**
** xxxxxxxx iiiiiii lllllll rrrrrrrrrr
**
**   xxxxxxxx = 8 bit index into fold function table
**    iiiiiii = 7 bit folded instruction opcode
**    lllllll = 7 bit left instruction opcode
** rrrrrrrrrr = 8 bit right instruction opcode or 10 bits from literal field
*/
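
/* Example: for an ADD whose left operand is an ADD and whose right
** operand is a KINT, the 24 bit lookup key is
** (IR_ADD << 17) + (IR_ADD << 10) + IR_KINT, widened stepwise with the
** 'any' wildcard until one of the rules above matches.
*/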

#include "lj_folddef.h"

/* ------------------------------------------------------------------------ */

/* Fold IR instruction. */
TRef LJ_FASTCALL lj_opt_fold(jit_State *J)
{
  uint32_t key, any;
  IRRef ref;

  if (LJ_UNLIKELY((J->flags & JIT_F_OPT_MASK) != JIT_F_OPT_DEFAULT)) {
    lj_assertJ(((JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE|JIT_F_OPT_DSE) |
                JIT_F_OPT_DEFAULT) == JIT_F_OPT_DEFAULT,
               "bad JIT_F_OPT_DEFAULT");
    /* Folding disabled? Chain to CSE, but not for loads/stores/allocs. */
    if (!(J->flags & JIT_F_OPT_FOLD) && irm_kind(lj_ir_mode[fins->o]) == IRM_N)
      return lj_opt_cse(J);

    /* No FOLD, forwarding or CSE? Emit raw IR for loads, except for SLOAD. */
    if ((J->flags & (JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE)) !=
                    (JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE) &&
        irm_kind(lj_ir_mode[fins->o]) == IRM_L && fins->o != IR_SLOAD)
      return lj_ir_emit(J);

    /* No FOLD or DSE? Emit raw IR for stores. */
    if ((J->flags & (JIT_F_OPT_FOLD|JIT_F_OPT_DSE)) !=
                    (JIT_F_OPT_FOLD|JIT_F_OPT_DSE) &&
        irm_kind(lj_ir_mode[fins->o]) == IRM_S)
      return lj_ir_emit(J);
  }

  /* Fold engine start/retry point. */
retry:
  /* Construct key from opcode and operand opcodes (unless literal/none). */
  key = ((uint32_t)fins->o << 17);
  if (fins->op1 >= J->cur.nk) {
    key += (uint32_t)IR(fins->op1)->o << 10;
    *fleft = *IR(fins->op1);
    if (fins->op1 < REF_TRUE)
      fleft[1] = IR(fins->op1)[1];
  }
  if (fins->op2 >= J->cur.nk) {
    key += (uint32_t)IR(fins->op2)->o;
    *fright = *IR(fins->op2);
    if (fins->op2 < REF_TRUE)
      fright[1] = IR(fins->op2)[1];
  } else {
    key += (fins->op2 & 0x3ffu);  /* Literal mask. Must include IRCONV_*MASK. */
  }

  /* Check for a match in order from most specific to least specific. */
  any = 0;
  for (;;) {
    uint32_t k = key | (any & 0x1ffff);
    uint32_t h = fold_hashkey(k);
    uint32_t fh = fold_hash[h];  /* Lookup key in semi-perfect hash table. */
    if ((fh & 0xffffff) == k || (fh = fold_hash[h+1], (fh & 0xffffff) == k)) {
      ref = (IRRef)tref_ref(fold_func[fh >> 24](J));
      if (ref != NEXTFOLD)
        break;
    }
    if (any == 0xfffff)  /* Exhausted folding. Pass on to CSE. */
      return lj_opt_cse(J);
    any = (any | (any >> 10)) ^ 0xffc00;
  }

  /* Return value processing, ordered by frequency. */
  if (LJ_LIKELY(ref >= MAX_FOLD))
    return TREF(ref, irt_t(IR(ref)->t));
  if (ref == RETRYFOLD)
    goto retry;
  if (ref == KINTFOLD)
    return lj_ir_kint(J, fins->i);
  if (ref == FAILFOLD)
    lj_trace_err(J, LJ_TRERR_GFAIL);
  lj_assertJ(ref == DROPFOLD, "bad fold result");
  return REF_DROP;
}
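
/* Note: the 'any' sequence above is 0, 0xffc00, 0x3ff, 0xfffff (masked
** to the low 17 key bits), i.e. exact match, left wildcard, right
** wildcard, both wildcards -- the lookup order described at the top of
** this file.
*/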

/* -- Common-Subexpression Elimination ------------------------------------ */

/* CSE an IR instruction. This is very fast due to the skip-list chains. */
TRef LJ_FASTCALL lj_opt_cse(jit_State *J)
{
  /* Avoid narrow to wide store-to-load forwarding stall */
  IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
  IROp op = fins->o;
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    /* Limited search for same operands in per-opcode chain. */
    IRRef ref = J->chain[op];
    IRRef lim = fins->op1;
    if (fins->op2 > lim) lim = fins->op2;  /* Relies on lit < REF_BIAS. */
    while (ref > lim) {
      if (IR(ref)->op12 == op12)
        return TREF(ref, irt_t(IR(ref)->t));  /* Common subexpression found. */
      ref = IR(ref)->prev;
    }
  }
  /* Otherwise emit IR (inlined for speed). */
  {
    IRRef ref = lj_ir_nextins(J);
    IRIns *ir = IR(ref);
    ir->prev = J->chain[op];
    ir->op12 = op12;
    J->chain[op] = (IRRef1)ref;
    ir->o = fins->o;
    J->guardemit.irt |= fins->t.irt;
    return TREF(ref, irt_t((ir->t = fins->t)));
  }
}
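
/* Example: CSE for an ADD walks J->chain[IR_ADD] backwards through the
** per-opcode prev links and can stop at max(op1, op2), since a matching
** instruction must have been emitted after both of its operands.
*/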

/* CSE with explicit search limit. */
TRef LJ_FASTCALL lj_opt_cselim(jit_State *J, IRRef lim)
{
  IRRef ref = J->chain[fins->o];
  IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
  while (ref > lim) {
    if (IR(ref)->op12 == op12)
      return ref;
    ref = IR(ref)->prev;
  }
  return lj_ir_emit(J);
}

/* ------------------------------------------------------------------------ */

#undef IR
#undef fins
#undef fleft
#undef fright
#undef knumleft
#undef knumright
#undef emitir

#endif