
tarantool / luajit / build 6929314139 (github push)

20 Nov 2023 11:11AM UTC coverage: 88.552% (+0.08%) from 88.476%
Buristan
Fix ABC FOLD rule with constants.

Reported by XmiliaH.

(cherry-picked from commit c8bcf1e5f)

`fold_abc_k()` doesn't patch the first ABC check when the right constant
operand is negative. This leads to an out-of-bounds access from the array
part of the table on a trace. This patch casts the operands to uint32_t
for the comparison. If the right IR contains a negative integer, the
second IR will always be patched. Also, because the ABC check on the
trace is unsigned, this guard will always fail.

Also, this fold rule creates new instructions that reference operands
across PHIs. This opens the door to other optimizations (like DCE), so
some guards are eliminated, which leads to an out-of-bounds access from
the array part of the table on a trace. This patch adds the missing
`PHIBARRIER()` check.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

5369 of 5976 branches covered (89.84%)

Branch coverage included in aggregate %.

3 of 3 new or added lines in 1 file covered. (100.0%)

4 existing lines in 1 file now uncovered.

20560 of 23305 relevant lines covered (88.22%)

2748214.85 hits per line

Source File

/src/lj_opt_fold.c (75.79%)
/*
** FOLD: Constant Folding, Algebraic Simplifications and Reassociation.
** ABCelim: Array Bounds Check Elimination.
** CSE: Common-Subexpression Elimination.
** Copyright (C) 2005-2017 Mike Pall. See Copyright Notice in luajit.h
*/

#define lj_opt_fold_c
#define LUA_CORE

#include <math.h>

#include "lj_obj.h"

#if LJ_HASJIT

#include "lj_buf.h"
#include "lj_str.h"
#include "lj_tab.h"
#include "lj_ir.h"
#include "lj_jit.h"
#include "lj_ircall.h"
#include "lj_iropt.h"
#include "lj_trace.h"
#if LJ_HASFFI
#include "lj_ctype.h"
#include "lj_carith.h"
#endif
#include "lj_vm.h"
#include "lj_strscan.h"
#include "lj_strfmt.h"

/* Here's a short description of how the FOLD engine processes instructions:
**
** The FOLD engine receives a single instruction stored in fins (J->fold.ins).
** The instruction and its operands are used to select matching fold rules.
** These are applied iteratively until a fixed point is reached.
**
** The 8 bit opcode of the instruction itself plus the opcodes of the
** two instructions referenced by its operands form a 24 bit key
** 'ins left right' (unused operands -> 0, literals -> lowest 8 bits).
**
** This key is used for partial matching against the fold rules. The
** left/right operand fields of the key are successively masked with
** the 'any' wildcard, from most specific to least specific:
**
**   ins left right
**   ins any  right
**   ins left any
**   ins any  any
**
** The masked key is used to lookup a matching fold rule in a semi-perfect
** hash table. If a matching rule is found, the related fold function is run.
** Multiple rules can share the same fold function. A fold rule may return
** one of several special values:
**
** - NEXTFOLD means no folding was applied, because an additional test
**   inside the fold function failed. Matching continues against less
**   specific fold rules. Finally the instruction is passed on to CSE.
**
** - RETRYFOLD means the instruction was modified in-place. Folding is
**   retried as if this instruction had just been received.
**
** All other return values are terminal actions -- no further folding is
** applied:
**
** - INTFOLD(i) returns a reference to the integer constant i.
**
** - LEFTFOLD and RIGHTFOLD return the left/right operand reference
**   without emitting an instruction.
**
** - CSEFOLD and EMITFOLD pass the instruction directly to CSE or emit
**   it without passing through any further optimizations.
**
** - FAILFOLD, DROPFOLD and CONDFOLD only apply to instructions which have
**   no result (e.g. guarded assertions): FAILFOLD means the guard would
**   always fail, i.e. the current trace is pointless. DROPFOLD means
**   the guard is always true and has been eliminated. CONDFOLD is a
**   shortcut for FAILFOLD + cond (i.e. drop if true, otherwise fail).
**
** - Any other return value is interpreted as an IRRef or TRef. This
**   can be a reference to an existing or a newly created instruction.
**   Only the least-significant 16 bits (IRRef1) are used to form a TRef
**   which is finally returned to the caller.
**
** The FOLD engine receives instructions both from the trace recorder and
** substituted instructions from LOOP unrolling. This means all types
** of instructions may end up here, even though the recorder bypasses
** FOLD in some cases. Thus all loads, stores and allocations must have
** an any/any rule to avoid being passed on to CSE.
**
** Carefully read the following requirements before adding or modifying
** any fold rules:
**
** Requirement #1: All fold rules must preserve their destination type.
**
** Consistently use INTFOLD() (KINT result) or lj_ir_knum() (KNUM result).
** Never use lj_ir_knumint() which can have either a KINT or KNUM result.
**
** Requirement #2: Fold rules should not create *new* instructions which
** reference operands *across* PHIs.
**
** E.g. a RETRYFOLD with 'fins->op1 = fleft->op1' is invalid if the
** left operand is a PHI. Then fleft->op1 would point across the PHI
** frontier to an invariant instruction. Adding a PHI for this instruction
** would be counterproductive. The solution is to add a barrier which
** prevents folding across PHIs, i.e. 'PHIBARRIER(fleft)' in this case.
** The only exception is for recurrences with high latencies like
** repeated int->num->int conversions.
**
** One could relax this condition a bit if the referenced instruction is
** a PHI, too. But this often leads to worse code due to excessive
** register shuffling.
**
** Note: returning *existing* instructions (e.g. LEFTFOLD) is ok, though.
** Even returning fleft->op1 would be ok, because a new PHI will be added,
** if needed. But again, this leads to excessive register shuffling and
** should be avoided.
**
** Requirement #3: The set of all fold rules must be monotonic to guarantee
** termination.
**
** The goal is optimization, so one primarily wants to add strength-reducing
** rules. This means eliminating an instruction or replacing an instruction
** with one or more simpler instructions. Don't add fold rules which point
** into the other direction.
**
** Some rules (like commutativity) do not directly reduce the strength of
** an instruction, but enable other fold rules (e.g. by moving constants
** to the right operand). These rules must be made unidirectional to avoid
** cycles.
**
** Rule of thumb: the trace recorder expands the IR and FOLD shrinks it.
*/

/* Some local macros to save typing. Undef'd at the end. */
#define IR(ref)                (&J->cur.ir[(ref)])
#define fins                (&J->fold.ins)
#define fleft                (J->fold.left)
#define fright                (J->fold.right)
#define knumleft        (ir_knum(fleft)->n)
#define knumright        (ir_knum(fright)->n)

/* Pass IR on to next optimization in chain (FOLD). */
#define emitir(ot, a, b)        (lj_ir_set(J, (ot), (a), (b)), lj_opt_fold(J))

/* Fold function type. Fastcall on x86 significantly reduces their size. */
typedef IRRef (LJ_FASTCALL *FoldFunc)(jit_State *J);

/* Macros for the fold specs, so buildvm can recognize them. */
#define LJFOLD(x)
#define LJFOLDX(x)
#define LJFOLDF(name)        static TRef LJ_FASTCALL fold_##name(jit_State *J)
/* Note: They must be at the start of a line or buildvm ignores them! */

/* Barrier to prevent using operands across PHIs. */
#define PHIBARRIER(ir)        if (irt_isphi((ir)->t)) return NEXTFOLD

/* Barrier to prevent folding across a GC step.
** GC steps can only happen at the head of a trace and at LOOP.
** And the GC is only driven forward if there's at least one allocation.
*/
#define gcstep_barrier(J, ref) \
  ((ref) < J->chain[IR_LOOP] && \
   (J->chain[IR_SNEW] || J->chain[IR_XSNEW] || \
    J->chain[IR_TNEW] || J->chain[IR_TDUP] || \
    J->chain[IR_CNEW] || J->chain[IR_CNEWI] || \
    J->chain[IR_BUFSTR] || J->chain[IR_TOSTR] || J->chain[IR_CALLA]))

/* -- Constant folding for FP numbers ------------------------------------- */

LJFOLD(ADD KNUM KNUM)
LJFOLD(SUB KNUM KNUM)
LJFOLD(MUL KNUM KNUM)
LJFOLD(DIV KNUM KNUM)
LJFOLD(LDEXP KNUM KNUM)
LJFOLD(MIN KNUM KNUM)
LJFOLD(MAX KNUM KNUM)
LJFOLDF(kfold_numarith)
{
  lua_Number a = knumleft;
  lua_Number b = knumright;
  lua_Number y = lj_vm_foldarith(a, b, fins->o - IR_ADD);
  return lj_ir_knum(J, y);
}

LJFOLD(NEG KNUM FLOAD)
LJFOLD(ABS KNUM FLOAD)
LJFOLDF(kfold_numabsneg)
{
  lua_Number a = knumleft;
  lua_Number y = lj_vm_foldarith(a, a, fins->o - IR_ADD);
  return lj_ir_knum(J, y);
}

LJFOLD(LDEXP KNUM KINT)
LJFOLDF(kfold_ldexp)
{
#if LJ_TARGET_X86ORX64
  UNUSED(J);
  return NEXTFOLD;
#else
  return lj_ir_knum(J, ldexp(knumleft, fright->i));
#endif
}

LJFOLD(FPMATH KNUM any)
LJFOLDF(kfold_fpmath)
{
  lua_Number a = knumleft;
  lua_Number y = lj_vm_foldfpm(a, fins->op2);
  return lj_ir_knum(J, y);
}

LJFOLD(CALLN KNUM any)
LJFOLDF(kfold_fpcall1)
{
  const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
  if (CCI_TYPE(ci) == IRT_NUM) {
    double y = ((double (*)(double))ci->func)(knumleft);
    return lj_ir_knum(J, y);
  }
  return NEXTFOLD;
}

LJFOLD(CALLN CARG IRCALL_atan2)
LJFOLDF(kfold_fpcall2)
{
  if (irref_isk(fleft->op1) && irref_isk(fleft->op2)) {
    const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
    double a = ir_knum(IR(fleft->op1))->n;
    double b = ir_knum(IR(fleft->op2))->n;
    double y = ((double (*)(double, double))ci->func)(a, b);
    return lj_ir_knum(J, y);
  }
  return NEXTFOLD;
}

LJFOLD(POW KNUM KNUM)
LJFOLDF(kfold_numpow)
{
  return lj_ir_knum(J, lj_vm_foldarith(knumleft, knumright, IR_POW - IR_ADD));
}

/* Must not use kfold_kref for numbers (could be NaN). */
LJFOLD(EQ KNUM KNUM)
LJFOLD(NE KNUM KNUM)
LJFOLD(LT KNUM KNUM)
LJFOLD(GE KNUM KNUM)
LJFOLD(LE KNUM KNUM)
LJFOLD(GT KNUM KNUM)
LJFOLD(ULT KNUM KNUM)
LJFOLD(UGE KNUM KNUM)
LJFOLD(ULE KNUM KNUM)
LJFOLD(UGT KNUM KNUM)
LJFOLDF(kfold_numcomp)
{
  return CONDFOLD(lj_ir_numcmp(knumleft, knumright, (IROp)fins->o));
}

/* -- Constant folding for 32 bit integers -------------------------------- */

static int32_t kfold_intop(int32_t k1, int32_t k2, IROp op)
{
  switch (op) {
  case IR_ADD: k1 += k2; break;
  case IR_SUB: k1 -= k2; break;
  case IR_MUL: k1 *= k2; break;
  case IR_MOD: k1 = lj_vm_modi(k1, k2); break;
  case IR_NEG: k1 = -k1; break;
  case IR_BAND: k1 &= k2; break;
  case IR_BOR: k1 |= k2; break;
  case IR_BXOR: k1 ^= k2; break;
  case IR_BSHL: k1 <<= (k2 & 31); break;
  case IR_BSHR: k1 = (int32_t)((uint32_t)k1 >> (k2 & 31)); break;
  case IR_BSAR: k1 >>= (k2 & 31); break;
  case IR_BROL: k1 = (int32_t)lj_rol((uint32_t)k1, (k2 & 31)); break;
  case IR_BROR: k1 = (int32_t)lj_ror((uint32_t)k1, (k2 & 31)); break;
  case IR_MIN: k1 = k1 < k2 ? k1 : k2; break;
  case IR_MAX: k1 = k1 > k2 ? k1 : k2; break;
  default: lj_assertX(0, "bad IR op %d", op); break;
  }
  return k1;
}

LJFOLD(ADD KINT KINT)
LJFOLD(SUB KINT KINT)
LJFOLD(MUL KINT KINT)
LJFOLD(MOD KINT KINT)
LJFOLD(NEG KINT KINT)
LJFOLD(BAND KINT KINT)
LJFOLD(BOR KINT KINT)
LJFOLD(BXOR KINT KINT)
LJFOLD(BSHL KINT KINT)
LJFOLD(BSHR KINT KINT)
LJFOLD(BSAR KINT KINT)
LJFOLD(BROL KINT KINT)
LJFOLD(BROR KINT KINT)
LJFOLD(MIN KINT KINT)
LJFOLD(MAX KINT KINT)
LJFOLDF(kfold_intarith)
{
  return INTFOLD(kfold_intop(fleft->i, fright->i, (IROp)fins->o));
}

LJFOLD(ADDOV KINT KINT)
LJFOLD(SUBOV KINT KINT)
LJFOLD(MULOV KINT KINT)
LJFOLDF(kfold_intovarith)
{
  lua_Number n = lj_vm_foldarith((lua_Number)fleft->i, (lua_Number)fright->i,
                                 fins->o - IR_ADDOV);
  int32_t k = lj_num2int(n);
  if (n != (lua_Number)k)
    return FAILFOLD;
  return INTFOLD(k);
}

LJFOLD(BNOT KINT)
LJFOLDF(kfold_bnot)
{
  return INTFOLD(~fleft->i);
}

LJFOLD(BSWAP KINT)
LJFOLDF(kfold_bswap)
{
  return INTFOLD((int32_t)lj_bswap((uint32_t)fleft->i));
}

LJFOLD(LT KINT KINT)
LJFOLD(GE KINT KINT)
LJFOLD(LE KINT KINT)
LJFOLD(GT KINT KINT)
LJFOLD(ULT KINT KINT)
LJFOLD(UGE KINT KINT)
LJFOLD(ULE KINT KINT)
LJFOLD(UGT KINT KINT)
LJFOLD(ABC KINT KINT)
LJFOLDF(kfold_intcomp)
{
  int32_t a = fleft->i, b = fright->i;
  switch ((IROp)fins->o) {
  case IR_LT: return CONDFOLD(a < b);
  case IR_GE: return CONDFOLD(a >= b);
  case IR_LE: return CONDFOLD(a <= b);
  case IR_GT: return CONDFOLD(a > b);
  case IR_ULT: return CONDFOLD((uint32_t)a < (uint32_t)b);
  case IR_UGE: return CONDFOLD((uint32_t)a >= (uint32_t)b);
  case IR_ULE: return CONDFOLD((uint32_t)a <= (uint32_t)b);
  case IR_ABC:
  case IR_UGT: return CONDFOLD((uint32_t)a > (uint32_t)b);
  default: lj_assertJ(0, "bad IR op %d", fins->o); return FAILFOLD;
  }
}

LJFOLD(UGE any KINT)
LJFOLDF(kfold_intcomp0)
{
  if (fright->i == 0)
    return DROPFOLD;
  return NEXTFOLD;
}

/* -- Constant folding for 64 bit integers -------------------------------- */

static uint64_t kfold_int64arith(jit_State *J, uint64_t k1, uint64_t k2,
                                 IROp op)
{
  UNUSED(J);
#if LJ_HASFFI
  switch (op) {
  case IR_ADD: k1 += k2; break;
  case IR_SUB: k1 -= k2; break;
  case IR_MUL: k1 *= k2; break;
  case IR_BAND: k1 &= k2; break;
  case IR_BOR: k1 |= k2; break;
  case IR_BXOR: k1 ^= k2; break;
  case IR_BSHL: k1 <<= (k2 & 63); break;
  case IR_BSHR: k1 = (int32_t)((uint32_t)k1 >> (k2 & 63)); break;
  case IR_BSAR: k1 >>= (k2 & 63); break;
  case IR_BROL: k1 = (int32_t)lj_rol((uint32_t)k1, (k2 & 63)); break;
  case IR_BROR: k1 = (int32_t)lj_ror((uint32_t)k1, (k2 & 63)); break;
  default: lj_assertJ(0, "bad IR op %d", op); break;
  }
#else
  UNUSED(k2); UNUSED(op);
  lj_assertJ(0, "FFI IR op without FFI");
#endif
  return k1;
}

LJFOLD(ADD KINT64 KINT64)
LJFOLD(SUB KINT64 KINT64)
LJFOLD(MUL KINT64 KINT64)
LJFOLD(BAND KINT64 KINT64)
LJFOLD(BOR KINT64 KINT64)
LJFOLD(BXOR KINT64 KINT64)
LJFOLDF(kfold_int64arith)
{
  return INT64FOLD(kfold_int64arith(J, ir_k64(fleft)->u64,
                                    ir_k64(fright)->u64, (IROp)fins->o));
}

LJFOLD(DIV KINT64 KINT64)
LJFOLD(MOD KINT64 KINT64)
LJFOLD(POW KINT64 KINT64)
LJFOLDF(kfold_int64arith2)
{
#if LJ_HASFFI
  uint64_t k1 = ir_k64(fleft)->u64, k2 = ir_k64(fright)->u64;
  if (irt_isi64(fins->t)) {
    k1 = fins->o == IR_DIV ? lj_carith_divi64((int64_t)k1, (int64_t)k2) :
         fins->o == IR_MOD ? lj_carith_modi64((int64_t)k1, (int64_t)k2) :
                             lj_carith_powi64((int64_t)k1, (int64_t)k2);
  } else {
    k1 = fins->o == IR_DIV ? lj_carith_divu64(k1, k2) :
         fins->o == IR_MOD ? lj_carith_modu64(k1, k2) :
                             lj_carith_powu64(k1, k2);
  }
  return INT64FOLD(k1);
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BSHL KINT64 KINT)
LJFOLD(BSHR KINT64 KINT)
LJFOLD(BSAR KINT64 KINT)
LJFOLD(BROL KINT64 KINT)
LJFOLD(BROR KINT64 KINT)
LJFOLDF(kfold_int64shift)
{
#if LJ_HASFFI
  uint64_t k = ir_k64(fleft)->u64;
  int32_t sh = (fright->i & 63);
  return INT64FOLD(lj_carith_shift64(k, sh, fins->o - IR_BSHL));
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BNOT KINT64)
LJFOLDF(kfold_bnot64)
{
#if LJ_HASFFI
  return INT64FOLD(~ir_k64(fleft)->u64);
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BSWAP KINT64)
LJFOLDF(kfold_bswap64)
{
#if LJ_HASFFI
  return INT64FOLD(lj_bswap64(ir_k64(fleft)->u64));
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(LT KINT64 KINT64)
LJFOLD(GE KINT64 KINT64)
LJFOLD(LE KINT64 KINT64)
LJFOLD(GT KINT64 KINT64)
LJFOLD(ULT KINT64 KINT64)
LJFOLD(UGE KINT64 KINT64)
LJFOLD(ULE KINT64 KINT64)
LJFOLD(UGT KINT64 KINT64)
LJFOLDF(kfold_int64comp)
{
#if LJ_HASFFI
  uint64_t a = ir_k64(fleft)->u64, b = ir_k64(fright)->u64;
  switch ((IROp)fins->o) {
  case IR_LT: return CONDFOLD((int64_t)a < (int64_t)b);
  case IR_GE: return CONDFOLD((int64_t)a >= (int64_t)b);
  case IR_LE: return CONDFOLD((int64_t)a <= (int64_t)b);
  case IR_GT: return CONDFOLD((int64_t)a > (int64_t)b);
  case IR_ULT: return CONDFOLD(a < b);
  case IR_UGE: return CONDFOLD(a >= b);
  case IR_ULE: return CONDFOLD(a <= b);
  case IR_UGT: return CONDFOLD(a > b);
  default: lj_assertJ(0, "bad IR op %d", fins->o); return FAILFOLD;
  }
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(UGE any KINT64)
LJFOLDF(kfold_int64comp0)
{
#if LJ_HASFFI
  if (ir_k64(fright)->u64 == 0)
    return DROPFOLD;
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

/* -- Constant folding for strings ---------------------------------------- */

LJFOLD(SNEW KKPTR KINT)
LJFOLDF(kfold_snew_kptr)
{
  GCstr *s = lj_str_new(J->L, (const char *)ir_kptr(fleft), (size_t)fright->i);
  return lj_ir_kstr(J, s);
}

LJFOLD(SNEW any KINT)
LJFOLDF(kfold_snew_empty)
{
  if (fright->i == 0)
    return lj_ir_kstr(J, &J2G(J)->strempty);
  return NEXTFOLD;
}

LJFOLD(STRREF KGC KINT)
LJFOLDF(kfold_strref)
{
  GCstr *str = ir_kstr(fleft);
  lj_assertJ((MSize)fright->i <= str->len, "bad string ref");
  return lj_ir_kkptr(J, (char *)strdata(str) + fright->i);
}

LJFOLD(STRREF SNEW any)
LJFOLDF(kfold_strref_snew)
{
  PHIBARRIER(fleft);
  if (irref_isk(fins->op2) && fright->i == 0) {
    return fleft->op1;  /* strref(snew(ptr, len), 0) ==> ptr */
  } else {
    /* Reassociate: strref(snew(strref(str, a), len), b) ==> strref(str, a+b) */
    IRIns *ir = IR(fleft->op1);
    if (ir->o == IR_STRREF) {
      IRRef1 str = ir->op1;  /* IRIns * is not valid across emitir. */
      PHIBARRIER(ir);
      fins->op2 = emitir(IRTI(IR_ADD), ir->op2, fins->op2); /* Clobbers fins! */
      fins->op1 = str;
      fins->ot = IRT(IR_STRREF, IRT_PGC);
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(CALLN CARG IRCALL_lj_str_cmp)
LJFOLDF(kfold_strcmp)
{
  if (irref_isk(fleft->op1) && irref_isk(fleft->op2)) {
    GCstr *a = ir_kstr(IR(fleft->op1));
    GCstr *b = ir_kstr(IR(fleft->op2));
    return INTFOLD(lj_str_cmp(a, b));
  }
  return NEXTFOLD;
}

/* -- Constant folding and forwarding for buffers ------------------------- */

/*
** Buffer ops perform stores, but their effect is limited to the buffer
** itself. Also, buffer ops are chained: a use of an op implies a use of
** all other ops up the chain. Conversely, if an op is unused, all ops
** up the chain can go unused. This largely eliminates the need to treat
** them as stores.
**
** Alas, treating them as normal (IRM_N) ops doesn't work, because they
** cannot be CSEd in isolation. CSE for IRM_N is implicitly done in LOOP
** or if FOLD is disabled.
**
** The compromise is to declare them as loads, emit them like stores and
** CSE whole chains manually when the BUFSTR is to be emitted. Any chain
** fragments left over from CSE are eliminated by DCE.
*/

/* BUFHDR is emitted like a store, see below. */

LJFOLD(BUFPUT BUFHDR BUFSTR)
LJFOLDF(bufput_append)
{
  /* New buffer, no other buffer op inbetween and same buffer? */
  if ((J->flags & JIT_F_OPT_FWD) &&
      !(fleft->op2 & IRBUFHDR_APPEND) &&
      fleft->prev == fright->op2 &&
      fleft->op1 == IR(fright->op2)->op1) {
    IRRef ref = fins->op1;
    IR(ref)->op2 = (fleft->op2 | IRBUFHDR_APPEND);  /* Modify BUFHDR. */
    IR(ref)->op1 = fright->op1;
    return ref;
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(BUFPUT any any)
LJFOLDF(bufput_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && fright->o == IR_KGC) {
    GCstr *s2 = ir_kstr(fright);
    if (s2->len == 0) {  /* Empty string? */
      return LEFTFOLD;
    } else {
      if (fleft->o == IR_BUFPUT && irref_isk(fleft->op2) &&
          !irt_isphi(fleft->t)) {  /* Join two constant string puts in a row. */
        GCstr *s1 = ir_kstr(IR(fleft->op2));
        IRRef kref = lj_ir_kstr(J, lj_buf_cat2str(J->L, s1, s2));
        /* lj_ir_kstr() may realloc the IR and invalidates any IRIns *. */
        IR(fins->op1)->op2 = kref;  /* Modify previous BUFPUT. */
        return fins->op1;
      }
    }
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(BUFSTR any any)
LJFOLDF(bufstr_kfold_cse)
{
  lj_assertJ(fleft->o == IR_BUFHDR || fleft->o == IR_BUFPUT ||
             fleft->o == IR_CALLL,
             "bad buffer constructor IR op %d", fleft->o);
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    if (fleft->o == IR_BUFHDR) {  /* No put operations? */
      if (!(fleft->op2 & IRBUFHDR_APPEND))  /* Empty buffer? */
        return lj_ir_kstr(J, &J2G(J)->strempty);
      fins->op1 = fleft->op1;
      fins->op2 = fleft->prev;  /* Relies on checks in bufput_append. */
      return CSEFOLD;
    } else if (fleft->o == IR_BUFPUT) {
      IRIns *irb = IR(fleft->op1);
      if (irb->o == IR_BUFHDR && !(irb->op2 & IRBUFHDR_APPEND))
        return fleft->op2;  /* Shortcut for a single put operation. */
    }
  }
  /* Try to CSE the whole chain. */
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    IRRef ref = J->chain[IR_BUFSTR];
    while (ref) {
      IRIns *irs = IR(ref), *ira = fleft, *irb = IR(irs->op1);
      while (ira->o == irb->o && ira->op2 == irb->op2) {
        lj_assertJ(ira->o == IR_BUFHDR || ira->o == IR_BUFPUT ||
                   ira->o == IR_CALLL || ira->o == IR_CARG,
                   "bad buffer constructor IR op %d", ira->o);
        if (ira->o == IR_BUFHDR && !(ira->op2 & IRBUFHDR_APPEND))
          return ref;  /* CSE succeeded. */
        if (ira->o == IR_CALLL && ira->op2 == IRCALL_lj_buf_puttab)
          break;
        ira = IR(ira->op1);
        irb = IR(irb->op1);
      }
      ref = irs->prev;
    }
  }
  return EMITFOLD;  /* No CSE possible. */
}

LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_reverse)
LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_upper)
LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_lower)
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putquoted)
LJFOLDF(bufput_kfold_op)
{
  if (irref_isk(fleft->op2)) {
    const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
    SBuf *sb = lj_buf_tmp_(J->L);
    sb = ((SBuf * (LJ_FASTCALL *)(SBuf *, GCstr *))ci->func)(sb,
                                                       ir_kstr(IR(fleft->op2)));
    fins->o = IR_BUFPUT;
    fins->op1 = fleft->op1;
    fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
    return RETRYFOLD;
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

LJFOLD(CALLL CARG IRCALL_lj_buf_putstr_rep)
LJFOLDF(bufput_kfold_rep)
{
  if (irref_isk(fleft->op2)) {
    IRIns *irc = IR(fleft->op1);
    if (irref_isk(irc->op2)) {
      SBuf *sb = lj_buf_tmp_(J->L);
      sb = lj_buf_putstr_rep(sb, ir_kstr(IR(irc->op2)), IR(fleft->op2)->i);
      fins->o = IR_BUFPUT;
      fins->op1 = irc->op1;
      fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
      return RETRYFOLD;
    }
  }
  return EMITFOLD;  /* Always emit, CSE later. */
}

694
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfxint)
695
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum_int)
696
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum_uint)
697
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfnum)
698
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfstr)
699
LJFOLD(CALLL CARG IRCALL_lj_strfmt_putfchar)
700
LJFOLDF(bufput_kfold_fmt)
305✔
701
{
702
  IRIns *irc = IR(fleft->op1);
305✔
703
  lj_assertJ(irref_isk(irc->op2), "SFormat must be const");
305✔
704
  if (irref_isk(fleft->op2)) {
305✔
705
    SFormat sf = (SFormat)IR(irc->op2)->i;
5✔
706
    IRIns *ira = IR(fleft->op2);
5✔
707
    SBuf *sb = lj_buf_tmp_(J->L);
5✔
708
    switch (fins->op2) {
5✔
709
    case IRCALL_lj_strfmt_putfxint:
1✔
710
      sb = lj_strfmt_putfxint(sb, sf, ir_k64(ira)->u64);
1✔
711
      break;
1✔
712
    case IRCALL_lj_strfmt_putfstr:
×
713
      sb = lj_strfmt_putfstr(sb, sf, ir_kstr(ira));
×
714
      break;
×
715
    case IRCALL_lj_strfmt_putfchar:
×
716
      sb = lj_strfmt_putfchar(sb, sf, ira->i);
×
717
      break;
×
718
    case IRCALL_lj_strfmt_putfnum_int:
4✔
719
    case IRCALL_lj_strfmt_putfnum_uint:
720
    case IRCALL_lj_strfmt_putfnum:
721
    default: {
722
      const CCallInfo *ci = &lj_ir_callinfo[fins->op2];
4✔
723
      sb = ((SBuf * (*)(SBuf *, SFormat, lua_Number))ci->func)(sb, sf,
4✔
724
                                                         ir_knum(ira)->n);
725
      break;
4✔
726
      }
727
    }
728
    fins->o = IR_BUFPUT;
5✔
729
    fins->op1 = irc->op1;
5✔
730
    fins->op2 = lj_ir_kstr(J, lj_buf_tostr(sb));
5✔
731
    return RETRYFOLD;
5✔
732
  }
733
  return EMITFOLD;  /* Always emit, CSE later. */
300✔
734
}
735

736
/* -- Constant folding of pointer arithmetic ------------------------------ */
737

738
LJFOLD(ADD KGC KINT)
739
LJFOLD(ADD KGC KINT64)
740
LJFOLDF(kfold_add_kgc)
32✔
741
{
742
  GCobj *o = ir_kgc(fleft);
32✔
743
#if LJ_64
744
  ptrdiff_t ofs = (ptrdiff_t)ir_kint64(fright)->u64;
32✔
745
#else
746
  ptrdiff_t ofs = fright->i;
747
#endif
748
#if LJ_HASFFI
749
  if (irt_iscdata(fleft->t)) {
32✔
750
    CType *ct = ctype_raw(ctype_ctsG(J2G(J)), gco2cd(o)->ctypeid);
30✔
751
    if (ctype_isnum(ct->info) || ctype_isenum(ct->info) ||
30✔
752
        ctype_isptr(ct->info) || ctype_isfunc(ct->info) ||
30✔
753
        ctype_iscomplex(ct->info) || ctype_isvector(ct->info))
30✔
754
      return lj_ir_kkptr(J, (char *)o + ofs);
1✔
755
  }
756
#endif
757
  return lj_ir_kptr(J, (char *)o + ofs);
31✔
758
}
759

760
LJFOLD(ADD KPTR KINT)
761
LJFOLD(ADD KPTR KINT64)
762
LJFOLD(ADD KKPTR KINT)
763
LJFOLD(ADD KKPTR KINT64)
764
LJFOLDF(kfold_add_kptr)
24✔
765
{
766
  void *p = ir_kptr(fleft);
24✔
767
#if LJ_64
768
  ptrdiff_t ofs = (ptrdiff_t)ir_kint64(fright)->u64;
24✔
769
#else
770
  ptrdiff_t ofs = fright->i;
771
#endif
772
  return lj_ir_kptr_(J, fleft->o, (char *)p + ofs);
24✔
773
}
774

775
LJFOLD(ADD any KGC)
776
LJFOLD(ADD any KPTR)
777
LJFOLD(ADD any KKPTR)
778
LJFOLDF(kfold_add_kright)
22✔
779
{
780
  if (fleft->o == IR_KINT || fleft->o == IR_KINT64) {
22✔
781
    IRRef1 tmp = fins->op1; fins->op1 = fins->op2; fins->op2 = tmp;
22✔
782
    return RETRYFOLD;
22✔
783
  }
784
  return NEXTFOLD;
785
}
786

787
/* -- Constant folding of conversions ------------------------------------- */
788

789
LJFOLD(TOBIT KNUM KNUM)
790
LJFOLDF(kfold_tobit)
1✔
791
{
792
  return INTFOLD(lj_num2bit(knumleft));
1✔
793
}
794

795
LJFOLD(CONV KINT IRCONV_NUM_INT)
796
LJFOLDF(kfold_conv_kint_num)
3,262✔
797
{
798
  return lj_ir_knum(J, (lua_Number)fleft->i);
3,262✔
799
}
800

801
LJFOLD(CONV KINT IRCONV_NUM_U32)
802
LJFOLDF(kfold_conv_kintu32_num)
×
{
  return lj_ir_knum(J, (lua_Number)(uint32_t)fleft->i);
}

LJFOLD(CONV KINT IRCONV_INT_I8)
LJFOLD(CONV KINT IRCONV_INT_U8)
LJFOLD(CONV KINT IRCONV_INT_I16)
LJFOLD(CONV KINT IRCONV_INT_U16)
LJFOLDF(kfold_conv_kint_ext)
{
  int32_t k = fleft->i;
  if ((fins->op2 & IRCONV_SRCMASK) == IRT_I8) k = (int8_t)k;
  else if ((fins->op2 & IRCONV_SRCMASK) == IRT_U8) k = (uint8_t)k;
  else if ((fins->op2 & IRCONV_SRCMASK) == IRT_I16) k = (int16_t)k;
  else k = (uint16_t)k;
  return INTFOLD(k);
}

LJFOLD(CONV KINT IRCONV_I64_INT)
LJFOLD(CONV KINT IRCONV_U64_INT)
LJFOLD(CONV KINT IRCONV_I64_U32)
LJFOLD(CONV KINT IRCONV_U64_U32)
LJFOLDF(kfold_conv_kint_i64)
{
  if ((fins->op2 & IRCONV_SEXT))
    return INT64FOLD((uint64_t)(int64_t)fleft->i);
  else
    return INT64FOLD((uint64_t)(int64_t)(uint32_t)fleft->i);
}

LJFOLD(CONV KINT64 IRCONV_NUM_I64)
LJFOLDF(kfold_conv_kint64_num_i64)
{
  return lj_ir_knum(J, (lua_Number)(int64_t)ir_kint64(fleft)->u64);
}

LJFOLD(CONV KINT64 IRCONV_NUM_U64)
LJFOLDF(kfold_conv_kint64_num_u64)
{
  return lj_ir_knum(J, (lua_Number)ir_kint64(fleft)->u64);
}

LJFOLD(CONV KINT64 IRCONV_INT_I64)
LJFOLD(CONV KINT64 IRCONV_U32_I64)
LJFOLDF(kfold_conv_kint64_int_i64)
{
  return INTFOLD((int32_t)ir_kint64(fleft)->u64);
}

LJFOLD(CONV KNUM IRCONV_INT_NUM)
LJFOLDF(kfold_conv_knum_int_num)
{
  lua_Number n = knumleft;
  int32_t k = lj_num2int(n);
  if (irt_isguard(fins->t) && n != (lua_Number)k) {
    /* We're about to create a guard which always fails, like CONV +1.5.
    ** Some pathological loops cause this during LICM, e.g.:
    **   local x,k,t = 0,1.5,{1,[1.5]=2}
    **   for i=1,200 do x = x+ t[k]; k = k == 1 and 1.5 or 1 end
    **   assert(x == 300)
    */
    return FAILFOLD;
  }
  return INTFOLD(k);
}

LJFOLD(CONV KNUM IRCONV_U32_NUM)
LJFOLDF(kfold_conv_knum_u32_num)
{
#ifdef _MSC_VER
  {  /* Workaround for MSVC bug. */
    volatile uint32_t u = (uint32_t)knumleft;
    return INTFOLD((int32_t)u);
  }
#else
  return INTFOLD((int32_t)(uint32_t)knumleft);
#endif
}

LJFOLD(CONV KNUM IRCONV_I64_NUM)
LJFOLDF(kfold_conv_knum_i64_num)
{
  return INT64FOLD((uint64_t)(int64_t)knumleft);
}

LJFOLD(CONV KNUM IRCONV_U64_NUM)
LJFOLDF(kfold_conv_knum_u64_num)
{
  return INT64FOLD(lj_num2u64(knumleft));
}

LJFOLD(TOSTR KNUM any)
LJFOLDF(kfold_tostr_knum)
{
  return lj_ir_kstr(J, lj_strfmt_num(J->L, ir_knum(fleft)));
}

LJFOLD(TOSTR KINT any)
LJFOLDF(kfold_tostr_kint)
{
  return lj_ir_kstr(J, fins->op2 == IRTOSTR_INT ?
                       lj_strfmt_int(J->L, fleft->i) :
                       lj_strfmt_char(J->L, fleft->i));
}

LJFOLD(STRTO KGC)
LJFOLDF(kfold_strto)
{
  TValue n;
  if (lj_strscan_num(ir_kstr(fleft), &n))
    return lj_ir_knum(J, numV(&n));
  return FAILFOLD;
}

/* -- Constant folding of equality checks --------------------------------- */

/* Don't constant-fold away FLOAD checks against KNULL. */
LJFOLD(EQ FLOAD KNULL)
LJFOLD(NE FLOAD KNULL)
LJFOLDX(lj_opt_cse)

/* But fold all other KNULL compares, since only KNULL is equal to KNULL. */
LJFOLD(EQ any KNULL)
LJFOLD(NE any KNULL)
LJFOLD(EQ KNULL any)
LJFOLD(NE KNULL any)
LJFOLD(EQ KINT KINT)  /* Constants are unique, so same refs <==> same value. */
LJFOLD(NE KINT KINT)
LJFOLD(EQ KINT64 KINT64)
LJFOLD(NE KINT64 KINT64)
LJFOLD(EQ KGC KGC)
LJFOLD(NE KGC KGC)
LJFOLDF(kfold_kref)
{
  return CONDFOLD((fins->op1 == fins->op2) ^ (fins->o == IR_NE));
}

/* -- Algebraic shortcuts ------------------------------------------------- */

LJFOLD(FPMATH FPMATH IRFPM_FLOOR)
LJFOLD(FPMATH FPMATH IRFPM_CEIL)
LJFOLD(FPMATH FPMATH IRFPM_TRUNC)
LJFOLDF(shortcut_round)
{
  IRFPMathOp op = (IRFPMathOp)fleft->op2;
  if (op == IRFPM_FLOOR || op == IRFPM_CEIL || op == IRFPM_TRUNC)
    return LEFTFOLD;  /* round(round_left(x)) = round_left(x) */
  return NEXTFOLD;
}

LJFOLD(ABS ABS FLOAD)
LJFOLDF(shortcut_left)
{
  return LEFTFOLD;  /* f(g(x)) ==> g(x) */
}

LJFOLD(ABS NEG FLOAD)
LJFOLDF(shortcut_dropleft)
{
  PHIBARRIER(fleft);
  fins->op1 = fleft->op1;  /* abs(neg(x)) ==> abs(x) */
  return RETRYFOLD;
}

/* Note: no safe shortcuts with STRTO and TOSTR ("1e2" ==> +100 ==> "100"). */
LJFOLD(NEG NEG any)
LJFOLD(BNOT BNOT)
LJFOLD(BSWAP BSWAP)
LJFOLDF(shortcut_leftleft)
{
  PHIBARRIER(fleft);  /* See above. Fold would be ok, but not beneficial. */
  return fleft->op1;  /* f(g(x)) ==> x */
}

/* -- FP algebraic simplifications ---------------------------------------- */

/* FP arithmetic is tricky -- there's not much to simplify.
** Please note the following common pitfalls before sending "improvements":
**   x+0 ==> x  is INVALID for x=-0
**   0-x ==> -x is INVALID for x=+0
**   x*0 ==> 0  is INVALID for x=-0, x=+-Inf or x=NaN
*/

LJFOLD(ADD NEG any)
LJFOLDF(simplify_numadd_negx)
{
  PHIBARRIER(fleft);
  fins->o = IR_SUB;  /* (-a) + b ==> b - a */
  fins->op1 = fins->op2;
  fins->op2 = fleft->op1;
  return RETRYFOLD;
}

LJFOLD(ADD any NEG)
LJFOLDF(simplify_numadd_xneg)
{
  PHIBARRIER(fright);
  fins->o = IR_SUB;  /* a + (-b) ==> a - b */
  fins->op2 = fright->op1;
  return RETRYFOLD;
}

LJFOLD(SUB any KNUM)
LJFOLDF(simplify_numsub_k)
{
  if (ir_knum(fright)->u64 == 0)  /* x - (+0) ==> x */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(SUB NEG KNUM)
LJFOLDF(simplify_numsub_negk)
{
  PHIBARRIER(fleft);
  fins->op2 = fleft->op1;  /* (-x) - k ==> (-k) - x */
  fins->op1 = (IRRef1)lj_ir_knum(J, -knumright);
  return RETRYFOLD;
}

LJFOLD(SUB any NEG)
LJFOLDF(simplify_numsub_xneg)
{
  PHIBARRIER(fright);
  fins->o = IR_ADD;  /* a - (-b) ==> a + b */
  fins->op2 = fright->op1;
  return RETRYFOLD;
}

LJFOLD(MUL any KNUM)
LJFOLD(DIV any KNUM)
LJFOLDF(simplify_nummuldiv_k)
{
  lua_Number n = knumright;
  if (n == 1.0) {  /* x o 1 ==> x */
    return LEFTFOLD;
  } else if (n == -1.0) {  /* x o -1 ==> -x */
    IRRef op1 = fins->op1;
    fins->op2 = (IRRef1)lj_ir_ksimd(J, LJ_KSIMD_NEG);  /* Modifies fins. */
    fins->op1 = op1;
    fins->o = IR_NEG;
    return RETRYFOLD;
  } else if (fins->o == IR_MUL && n == 2.0) {  /* x * 2 ==> x + x */
    fins->o = IR_ADD;
    fins->op2 = fins->op1;
    return RETRYFOLD;
  } else if (fins->o == IR_DIV) {  /* x / 2^k ==> x * 2^-k */
    uint64_t u = ir_knum(fright)->u64;
    uint32_t ex = ((uint32_t)(u >> 52) & 0x7ff);
    if ((u & U64x(000fffff,ffffffff)) == 0 && ex - 1 < 0x7fd) {
      u = (u & ((uint64_t)1 << 63)) | ((uint64_t)(0x7fe - ex) << 52);
      fins->o = IR_MUL;  /* Multiply by exact reciprocal. */
      fins->op2 = lj_ir_knum_u64(J, u);
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(MUL NEG KNUM)
LJFOLD(DIV NEG KNUM)
LJFOLDF(simplify_nummuldiv_negk)
{
  PHIBARRIER(fleft);
  fins->op1 = fleft->op1;  /* (-a) o k ==> a o (-k) */
  fins->op2 = (IRRef1)lj_ir_knum(J, -knumright);
  return RETRYFOLD;
}

LJFOLD(MUL NEG NEG)
LJFOLD(DIV NEG NEG)
LJFOLDF(simplify_nummuldiv_negneg)
{
  PHIBARRIER(fleft);
  PHIBARRIER(fright);
  fins->op1 = fleft->op1;  /* (-a) o (-b) ==> a o b */
  fins->op2 = fright->op1;
  return RETRYFOLD;
}

LJFOLD(POW any KNUM)
LJFOLDF(simplify_numpow_k)
{
  if (knumright == 0)  /* x ^ 0 ==> 1 */
    return lj_ir_knum_one(J);  /* Result must be a number, not an int. */
  else if (knumright == 1)  /* x ^ 1 ==> x */
    return LEFTFOLD;
  else if (knumright == 2)  /* x ^ 2 ==> x * x */
    return emitir(IRTN(IR_MUL), fins->op1, fins->op1);
  else
    return NEXTFOLD;
}

/* -- Simplify conversions ------------------------------------------------ */

LJFOLD(CONV CONV IRCONV_NUM_INT)  /* _NUM */
LJFOLDF(shortcut_conv_num_int)
{
  PHIBARRIER(fleft);
  /* Only safe with a guarded conversion to int. */
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_NUM && irt_isguard(fleft->t))
    return fleft->op1;  /* f(g(x)) ==> x */
  return NEXTFOLD;
}

LJFOLD(CONV CONV IRCONV_INT_NUM)  /* _INT */
LJFOLD(CONV CONV IRCONV_U32_NUM)  /* _U32*/
LJFOLDF(simplify_conv_int_num)
{
  /* Fold even across PHI to avoid expensive num->int conversions in loop. */
  if ((fleft->op2 & IRCONV_SRCMASK) ==
      ((fins->op2 & IRCONV_DSTMASK) >> IRCONV_DSH))
    return fleft->op1;
  return NEXTFOLD;
}

LJFOLD(CONV CONV IRCONV_I64_NUM)  /* _INT or _U32 */
LJFOLD(CONV CONV IRCONV_U64_NUM)  /* _INT or _U32 */
LJFOLDF(simplify_conv_i64_num)
{
  PHIBARRIER(fleft);
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT) {
    /* Reduce to a sign-extension. */
    fins->op1 = fleft->op1;
    fins->op2 = ((IRT_I64<<5)|IRT_INT|IRCONV_SEXT);
    return RETRYFOLD;
  } else if ((fleft->op2 & IRCONV_SRCMASK) == IRT_U32) {
#if LJ_TARGET_X64
    return fleft->op1;
#else
    /* Reduce to a zero-extension. */
    fins->op1 = fleft->op1;
    fins->op2 = (IRT_I64<<5)|IRT_U32;
    return RETRYFOLD;
#endif
  }
  return NEXTFOLD;
}

LJFOLD(CONV CONV IRCONV_INT_I64)  /* _INT or _U32 */
LJFOLD(CONV CONV IRCONV_INT_U64)  /* _INT or _U32 */
LJFOLD(CONV CONV IRCONV_U32_I64)  /* _INT or _U32 */
LJFOLD(CONV CONV IRCONV_U32_U64)  /* _INT or _U32 */
LJFOLDF(simplify_conv_int_i64)
{
  int src;
  PHIBARRIER(fleft);
  src = (fleft->op2 & IRCONV_SRCMASK);
  if (src == IRT_INT || src == IRT_U32) {
    if (src == ((fins->op2 & IRCONV_DSTMASK) >> IRCONV_DSH)) {
      return fleft->op1;
    } else {
      fins->op2 = ((fins->op2 & IRCONV_DSTMASK) | src);
      fins->op1 = fleft->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(CONV CONV IRCONV_FLOAT_NUM)  /* _FLOAT */
LJFOLDF(simplify_conv_flt_num)
{
  PHIBARRIER(fleft);
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_FLOAT)
    return fleft->op1;
  return NEXTFOLD;
}

/* Shortcut TOBIT + IRT_NUM <- IRT_INT/IRT_U32 conversion. */
LJFOLD(TOBIT CONV KNUM)
LJFOLDF(simplify_tobit_conv)
{
  /* Fold even across PHI to avoid expensive num->int conversions in loop. */
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT) {
    lj_assertJ(irt_isnum(fleft->t), "expected TOBIT number arg");
    return fleft->op1;
  } else if ((fleft->op2 & IRCONV_SRCMASK) == IRT_U32) {
    lj_assertJ(irt_isnum(fleft->t), "expected TOBIT number arg");
    fins->o = IR_CONV;
    fins->op1 = fleft->op1;
    fins->op2 = (IRT_INT<<5)|IRT_U32;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

/* Shortcut floor/ceil/round + IRT_NUM <- IRT_INT/IRT_U32 conversion. */
LJFOLD(FPMATH CONV IRFPM_FLOOR)
LJFOLD(FPMATH CONV IRFPM_CEIL)
LJFOLD(FPMATH CONV IRFPM_TRUNC)
LJFOLDF(simplify_floor_conv)
{
  if ((fleft->op2 & IRCONV_SRCMASK) == IRT_INT ||
      (fleft->op2 & IRCONV_SRCMASK) == IRT_U32)
    return LEFTFOLD;
  return NEXTFOLD;
}

/* Strength reduction of widening. */
LJFOLD(CONV any IRCONV_I64_INT)
LJFOLD(CONV any IRCONV_U64_INT)
LJFOLDF(simplify_conv_sext)
{
  IRRef ref = fins->op1;
  int64_t ofs = 0;
  if (!(fins->op2 & IRCONV_SEXT))
    return NEXTFOLD;
  PHIBARRIER(fleft);
  if (fleft->o == IR_XLOAD && (irt_isu8(fleft->t) || irt_isu16(fleft->t)))
    goto ok_reduce;
  if (fleft->o == IR_ADD && irref_isk(fleft->op2)) {
    ofs = (int64_t)IR(fleft->op2)->i;
    ref = fleft->op1;
  }
  /* Use scalar evolution analysis results to strength-reduce sign-extension. */
  if (ref == J->scev.idx) {
    IRRef lo = J->scev.dir ? J->scev.start : J->scev.stop;
    lj_assertJ(irt_isint(J->scev.t), "only int SCEV supported");
    if (lo && IR(lo)->o == IR_KINT && IR(lo)->i + ofs >= 0) {
    ok_reduce:
#if LJ_TARGET_X64
      /* Eliminate widening. All 32 bit ops do an implicit zero-extension. */
      return LEFTFOLD;
#else
      /* Reduce to a (cheaper) zero-extension. */
      fins->op2 &= ~IRCONV_SEXT;
      return RETRYFOLD;
#endif
    }
  }
  return NEXTFOLD;
}

/* Strength reduction of narrowing. */
LJFOLD(CONV ADD IRCONV_INT_I64)
LJFOLD(CONV SUB IRCONV_INT_I64)
LJFOLD(CONV MUL IRCONV_INT_I64)
LJFOLD(CONV ADD IRCONV_INT_U64)
LJFOLD(CONV SUB IRCONV_INT_U64)
LJFOLD(CONV MUL IRCONV_INT_U64)
LJFOLD(CONV ADD IRCONV_U32_I64)
LJFOLD(CONV SUB IRCONV_U32_I64)
LJFOLD(CONV MUL IRCONV_U32_I64)
LJFOLD(CONV ADD IRCONV_U32_U64)
LJFOLD(CONV SUB IRCONV_U32_U64)
LJFOLD(CONV MUL IRCONV_U32_U64)
LJFOLDF(simplify_conv_narrow)
{
  IROp op = (IROp)fleft->o;
  IRType t = irt_type(fins->t);
  IRRef op1 = fleft->op1, op2 = fleft->op2, mode = fins->op2;
  PHIBARRIER(fleft);
  op1 = emitir(IRT(IR_CONV, t), op1, mode);
  op2 = emitir(IRT(IR_CONV, t), op2, mode);
  fins->ot = IRT(op, t);
  fins->op1 = op1;
  fins->op2 = op2;
  return RETRYFOLD;
}

/* Special CSE rule for CONV. */
LJFOLD(CONV any any)
LJFOLDF(cse_conv)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    IRRef op1 = fins->op1, op2 = (fins->op2 & IRCONV_MODEMASK);
    uint8_t guard = irt_isguard(fins->t);
    IRRef ref = J->chain[IR_CONV];
    while (ref > op1) {
      IRIns *ir = IR(ref);
      /* Commoning with stronger checks is ok. */
      if (ir->op1 == op1 && (ir->op2 & IRCONV_MODEMASK) == op2 &&
          irt_isguard(ir->t) >= guard)
        return ref;
      ref = ir->prev;
    }
  }
  return EMITFOLD;  /* No fallthrough to regular CSE. */
}

/* FP conversion narrowing. */
LJFOLD(TOBIT ADD KNUM)
LJFOLD(TOBIT SUB KNUM)
LJFOLD(CONV ADD IRCONV_INT_NUM)
LJFOLD(CONV SUB IRCONV_INT_NUM)
LJFOLD(CONV ADD IRCONV_I64_NUM)
LJFOLD(CONV SUB IRCONV_I64_NUM)
LJFOLDF(narrow_convert)
{
  PHIBARRIER(fleft);
  /* Narrowing ignores PHIs and repeating it inside the loop is not useful. */
  if (J->chain[IR_LOOP])
    return NEXTFOLD;
  lj_assertJ(fins->o != IR_CONV || (fins->op2&IRCONV_CONVMASK) != IRCONV_TOBIT,
             "unexpected CONV TOBIT");
  return lj_opt_narrow_convert(J);
}

/* -- Integer algebraic simplifications ----------------------------------- */

LJFOLD(ADD any KINT)
LJFOLD(ADDOV any KINT)
LJFOLD(SUBOV any KINT)
LJFOLDF(simplify_intadd_k)
{
  if (fright->i == 0)  /* i o 0 ==> i */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(MULOV any KINT)
LJFOLDF(simplify_intmul_k)
{
  if (fright->i == 0)  /* i * 0 ==> 0 */
    return RIGHTFOLD;
  if (fright->i == 1)  /* i * 1 ==> i */
    return LEFTFOLD;
  if (fright->i == 2) {  /* i * 2 ==> i + i */
    fins->o = IR_ADDOV;
    fins->op2 = fins->op1;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(SUB any KINT)
LJFOLDF(simplify_intsub_k)
{
  if (fright->i == 0)  /* i - 0 ==> i */
    return LEFTFOLD;
  fins->o = IR_ADD;  /* i - k ==> i + (-k) */
  fins->op2 = (IRRef1)lj_ir_kint(J, -fright->i);  /* Overflow for -2^31 ok. */
  return RETRYFOLD;
}

LJFOLD(SUB KINT any)
LJFOLD(SUB KINT64 any)
LJFOLDF(simplify_intsub_kleft)
{
  if (fleft->o == IR_KINT ? (fleft->i == 0) : (ir_kint64(fleft)->u64 == 0)) {
    fins->o = IR_NEG;  /* 0 - i ==> -i */
    fins->op1 = fins->op2;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(ADD any KINT64)
LJFOLDF(simplify_intadd_k64)
{
  if (ir_kint64(fright)->u64 == 0)  /* i + 0 ==> i */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(SUB any KINT64)
LJFOLDF(simplify_intsub_k64)
{
  uint64_t k = ir_kint64(fright)->u64;
  if (k == 0)  /* i - 0 ==> i */
    return LEFTFOLD;
  fins->o = IR_ADD;  /* i - k ==> i + (-k) */
  fins->op2 = (IRRef1)lj_ir_kint64(J, (uint64_t)-(int64_t)k);
  return RETRYFOLD;
}

static TRef simplify_intmul_k(jit_State *J, int32_t k)
{
  /* Note: many more simplifications are possible, e.g. 2^k1 +- 2^k2.
  ** But this is mainly intended for simple address arithmetic.
  ** Also it's easier for the backend to optimize the original multiplies.
  */
  if (k == 0) {  /* i * 0 ==> 0 */
    return RIGHTFOLD;
  } else if (k == 1) {  /* i * 1 ==> i */
    return LEFTFOLD;
  } else if ((k & (k-1)) == 0) {  /* i * 2^k ==> i << k */
    fins->o = IR_BSHL;
    fins->op2 = lj_ir_kint(J, lj_fls((uint32_t)k));
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(MUL any KINT)
LJFOLDF(simplify_intmul_k32)
{
  if (fright->i >= 0)
    return simplify_intmul_k(J, fright->i);
  return NEXTFOLD;
}

LJFOLD(MUL any KINT64)
LJFOLDF(simplify_intmul_k64)
{
#if LJ_HASFFI
  if (ir_kint64(fright)->u64 < 0x80000000u)
    return simplify_intmul_k(J, (int32_t)ir_kint64(fright)->u64);
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(MOD any KINT)
LJFOLDF(simplify_intmod_k)
{
  int32_t k = fright->i;
  lj_assertJ(k != 0, "integer mod 0");
  if (k > 0 && (k & (k-1)) == 0) {  /* i % (2^k) ==> i & (2^k-1) */
    fins->o = IR_BAND;
    fins->op2 = lj_ir_kint(J, k-1);
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(MOD KINT any)
LJFOLDF(simplify_intmod_kleft)
{
  if (fleft->i == 0)
    return INTFOLD(0);
  return NEXTFOLD;
}

LJFOLD(SUB any any)
LJFOLD(SUBOV any any)
LJFOLDF(simplify_intsub)
{
  if (fins->op1 == fins->op2 && !irt_isnum(fins->t))  /* i - i ==> 0 */
    return irt_is64(fins->t) ? INT64FOLD(0) : INTFOLD(0);
  return NEXTFOLD;
}

LJFOLD(SUB ADD any)
LJFOLDF(simplify_intsubadd_leftcancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fleft);
    if (fins->op2 == fleft->op1)  /* (i + j) - i ==> j */
      return fleft->op2;
    if (fins->op2 == fleft->op2)  /* (i + j) - j ==> i */
      return fleft->op1;
  }
  return NEXTFOLD;
}

LJFOLD(SUB SUB any)
LJFOLDF(simplify_intsubsub_leftcancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fleft);
    if (fins->op2 == fleft->op1) {  /* (i - j) - i ==> 0 - j */
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
      fins->op2 = fleft->op2;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(SUB any SUB)
LJFOLDF(simplify_intsubsub_rightcancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fright);
    if (fins->op1 == fright->op1)  /* i - (i - j) ==> j */
      return fright->op2;
  }
  return NEXTFOLD;
}

LJFOLD(SUB any ADD)
LJFOLDF(simplify_intsubadd_rightcancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fright);
    if (fins->op1 == fright->op1) {  /* i - (i + j) ==> 0 - j */
      fins->op2 = fright->op2;
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
      return RETRYFOLD;
    }
    if (fins->op1 == fright->op2) {  /* i - (j + i) ==> 0 - j */
      fins->op2 = fright->op1;
      fins->op1 = (IRRef1)lj_ir_kint(J, 0);
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(SUB ADD ADD)
LJFOLDF(simplify_intsubaddadd_cancel)
{
  if (!irt_isnum(fins->t)) {
    PHIBARRIER(fleft);
    PHIBARRIER(fright);
    if (fleft->op1 == fright->op1) {  /* (i + j1) - (i + j2) ==> j1 - j2 */
      fins->op1 = fleft->op2;
      fins->op2 = fright->op2;
      return RETRYFOLD;
    }
    if (fleft->op1 == fright->op2) {  /* (i + j1) - (j2 + i) ==> j1 - j2 */
      fins->op1 = fleft->op2;
      fins->op2 = fright->op1;
      return RETRYFOLD;
    }
    if (fleft->op2 == fright->op1) {  /* (j1 + i) - (i + j2) ==> j1 - j2 */
      fins->op1 = fleft->op1;
      fins->op2 = fright->op2;
      return RETRYFOLD;
    }
    if (fleft->op2 == fright->op2) {  /* (j1 + i) - (j2 + i) ==> j1 - j2 */
      fins->op1 = fleft->op1;
      fins->op2 = fright->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(BAND any KINT)
LJFOLD(BAND any KINT64)
LJFOLDF(simplify_band_k)
{
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
                                     (int64_t)ir_k64(fright)->u64;
  if (k == 0)  /* i & 0 ==> 0 */
    return RIGHTFOLD;
  if (k == -1)  /* i & -1 ==> i */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BOR any KINT)
LJFOLD(BOR any KINT64)
LJFOLDF(simplify_bor_k)
{
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
                                     (int64_t)ir_k64(fright)->u64;
  if (k == 0)  /* i | 0 ==> i */
    return LEFTFOLD;
  if (k == -1)  /* i | -1 ==> -1 */
    return RIGHTFOLD;
  return NEXTFOLD;
}

LJFOLD(BXOR any KINT)
LJFOLD(BXOR any KINT64)
LJFOLDF(simplify_bxor_k)
{
  int64_t k = fright->o == IR_KINT ? (int64_t)fright->i :
                                     (int64_t)ir_k64(fright)->u64;
  if (k == 0)  /* i xor 0 ==> i */
    return LEFTFOLD;
  if (k == -1) {  /* i xor -1 ==> ~i */
    fins->o = IR_BNOT;
    fins->op2 = 0;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(BSHL any KINT)
LJFOLD(BSHR any KINT)
LJFOLD(BSAR any KINT)
LJFOLD(BROL any KINT)
LJFOLD(BROR any KINT)
LJFOLDF(simplify_shift_ik)
{
  int32_t mask = irt_is64(fins->t) ? 63 : 31;
  int32_t k = (fright->i & mask);
  if (k == 0)  /* i o 0 ==> i */
    return LEFTFOLD;
  if (k == 1 && fins->o == IR_BSHL) {  /* i << 1 ==> i + i */
    fins->o = IR_ADD;
    fins->op2 = fins->op1;
    return RETRYFOLD;
  }
  if (k != fright->i) {  /* i o k ==> i o (k & mask) */
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    return RETRYFOLD;
  }
#ifndef LJ_TARGET_UNIFYROT
  if (fins->o == IR_BROR) {  /* bror(i, k) ==> brol(i, (-k)&mask) */
    fins->o = IR_BROL;
    fins->op2 = (IRRef1)lj_ir_kint(J, (-k)&mask);
    return RETRYFOLD;
  }
#endif
  return NEXTFOLD;
}

LJFOLD(BSHL any BAND)
LJFOLD(BSHR any BAND)
LJFOLD(BSAR any BAND)
LJFOLD(BROL any BAND)
LJFOLD(BROR any BAND)
LJFOLDF(simplify_shift_andk)
{
  IRIns *irk = IR(fright->op2);
  PHIBARRIER(fright);
  if ((fins->o < IR_BROL ? LJ_TARGET_MASKSHIFT : LJ_TARGET_MASKROT) &&
      irk->o == IR_KINT) {  /* i o (j & mask) ==> i o j */
    int32_t mask = irt_is64(fins->t) ? 63 : 31;
    int32_t k = irk->i & mask;
    if (k == mask) {
      fins->op2 = fright->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(BSHL KINT any)
LJFOLD(BSHR KINT any)
LJFOLD(BSHL KINT64 any)
LJFOLD(BSHR KINT64 any)
LJFOLDF(simplify_shift1_ki)
{
  int64_t k = fleft->o == IR_KINT ? (int64_t)fleft->i :
                                    (int64_t)ir_k64(fleft)->u64;
  if (k == 0)  /* 0 o i ==> 0 */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BSAR KINT any)
LJFOLD(BROL KINT any)
LJFOLD(BROR KINT any)
LJFOLD(BSAR KINT64 any)
LJFOLD(BROL KINT64 any)
LJFOLD(BROR KINT64 any)
LJFOLDF(simplify_shift2_ki)
{
  int64_t k = fleft->o == IR_KINT ? (int64_t)fleft->i :
                                    (int64_t)ir_k64(fleft)->u64;
  if (k == 0 || k == -1)  /* 0 o i ==> 0; -1 o i ==> -1 */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BSHL BAND KINT)
LJFOLD(BSHR BAND KINT)
LJFOLD(BROL BAND KINT)
LJFOLD(BROR BAND KINT)
LJFOLDF(simplify_shiftk_andk)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT) {  /* (i & k1) o k2 ==> (i o k2) & (k1 o k2) */
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    fins->op1 = fleft->op1;
    fins->op1 = (IRRef1)lj_opt_fold(J);
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    fins->ot = IRTI(IR_BAND);
    return RETRYFOLD;
  } else if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, fright->i,
                                  (IROp)fins->o);
    IROpT ot = fleft->ot;
    fins->op1 = fleft->op1;
    fins->op1 = (IRRef1)lj_opt_fold(J);
    fins->op2 = (IRRef1)lj_ir_kint64(J, k);
    fins->ot = ot;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(BAND BSHL KINT)
LJFOLD(BAND BSHR KINT)
LJFOLDF(simplify_andk_shiftk)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT &&
      kfold_intop(-1, irk->i, (IROp)fleft->o) == fright->i)
    return LEFTFOLD;  /* (i o k1) & k2 ==> i, if (-1 o k1) == k2 */
  return NEXTFOLD;
}

LJFOLD(BAND BOR KINT)
LJFOLD(BOR BAND KINT)
LJFOLDF(simplify_andor_k)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT) {
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    /* (i | k1) & k2 ==> i & k2, if (k1 & k2) == 0. */
    /* (i & k1) | k2 ==> i | k2, if (k1 | k2) == -1. */
    if (k == (fins->o == IR_BAND ? 0 : -1)) {
      fins->op1 = fleft->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
}

LJFOLD(BAND BOR KINT64)
LJFOLD(BOR BAND KINT64)
LJFOLDF(simplify_andor_k64)
{
#if LJ_HASFFI
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);
  if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, ir_k64(fright)->u64,
                                  (IROp)fins->o);
    /* (i | k1) & k2 ==> i & k2, if (k1 & k2) == 0. */
    /* (i & k1) | k2 ==> i | k2, if (k1 | k2) == -1. */
    if (k == (fins->o == IR_BAND ? (uint64_t)0 : ~(uint64_t)0)) {
      fins->op1 = fleft->op1;
      return RETRYFOLD;
    }
  }
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

/* -- Reassociation ------------------------------------------------------- */

LJFOLD(ADD ADD KINT)
LJFOLD(MUL MUL KINT)
LJFOLD(BAND BAND KINT)
LJFOLD(BOR BOR KINT)
LJFOLD(BXOR BXOR KINT)
LJFOLDF(reassoc_intarith_k)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT) {
    int32_t k = kfold_intop(irk->i, fright->i, (IROp)fins->o);
    if (k == irk->i)  /* (i o k1) o k2 ==> i o k1, if (k1 o k2) == k1. */
      return LEFTFOLD;
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    return RETRYFOLD;  /* (i o k1) o k2 ==> i o (k1 o k2) */
  }
  return NEXTFOLD;
}

LJFOLD(ADD ADD KINT64)
LJFOLD(MUL MUL KINT64)
LJFOLD(BAND BAND KINT64)
LJFOLD(BOR BOR KINT64)
LJFOLD(BXOR BXOR KINT64)
LJFOLDF(reassoc_intarith_k64)
{
#if LJ_HASFFI
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT64) {
    uint64_t k = kfold_int64arith(J, ir_k64(irk)->u64, ir_k64(fright)->u64,
                                  (IROp)fins->o);
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint64(J, k);
    return RETRYFOLD;  /* (i o k1) o k2 ==> i o (k1 o k2) */
  }
  return NEXTFOLD;
#else
  UNUSED(J); lj_assertJ(0, "FFI IR op without FFI"); return FAILFOLD;
#endif
}

LJFOLD(BAND BAND any)
LJFOLD(BOR BOR any)
LJFOLDF(reassoc_dup)
{
  if (fins->op2 == fleft->op1 || fins->op2 == fleft->op2)
    return LEFTFOLD;  /* (a o b) o a ==> a o b; (a o b) o b ==> a o b */
  return NEXTFOLD;
}

LJFOLD(MIN MIN any)
LJFOLD(MAX MAX any)
LJFOLDF(reassoc_dup_minmax)
{
  if (fins->op2 == fleft->op2)
    return LEFTFOLD;  /* (a o b) o b ==> a o b */
  return NEXTFOLD;
}

LJFOLD(BXOR BXOR any)
LJFOLDF(reassoc_bxor)
{
  PHIBARRIER(fleft);
  if (fins->op2 == fleft->op1)  /* (a xor b) xor a ==> b */
    return fleft->op2;
  if (fins->op2 == fleft->op2)  /* (a xor b) xor b ==> a */
    return fleft->op1;
  return NEXTFOLD;
}

LJFOLD(BSHL BSHL KINT)
LJFOLD(BSHR BSHR KINT)
LJFOLD(BSAR BSAR KINT)
LJFOLD(BROL BROL KINT)
LJFOLD(BROR BROR KINT)
LJFOLDF(reassoc_shift)
{
  IRIns *irk = IR(fleft->op2);
  PHIBARRIER(fleft);  /* The (shift any KINT) rule covers k2 == 0 and more. */
  if (irk->o == IR_KINT) {  /* (i o k1) o k2 ==> i o (k1 + k2) */
    int32_t mask = irt_is64(fins->t) ? 63 : 31;
    int32_t k = (irk->i & mask) + (fright->i & mask);
    if (k > mask) {  /* Combined shift too wide? */
      if (fins->o == IR_BSHL || fins->o == IR_BSHR)
        return mask == 31 ? INTFOLD(0) : INT64FOLD(0);
      else if (fins->o == IR_BSAR)
        k = mask;
      else
        k &= mask;
    }
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, k);
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(MIN MIN KINT)
LJFOLD(MAX MAX KINT)
LJFOLDF(reassoc_minmax_k)
{
  IRIns *irk = IR(fleft->op2);
  if (irk->o == IR_KINT) {
    int32_t a = irk->i;
    int32_t y = kfold_intop(a, fright->i, fins->o);
    if (a == y)  /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
      return LEFTFOLD;
    PHIBARRIER(fleft);
    fins->op1 = fleft->op1;
    fins->op2 = (IRRef1)lj_ir_kint(J, y);
    return RETRYFOLD;  /* (x o k1) o k2 ==> x o (k1 o k2) */
  }
  return NEXTFOLD;
}

/* -- Array bounds check elimination -------------------------------------- */
1845

1846
/* Eliminate ABC across PHIs to handle t[i-1] forwarding case.
1847
** ABC(asize, (i+k)+(-k)) ==> ABC(asize, i), but only if it already exists.
1848
** Could be generalized to (i+k1)+k2 ==> i+(k1+k2), but needs better disambig.
1849
*/
1850
LJFOLD(ABC any ADD)
1851
LJFOLDF(abc_fwd)
615✔
1852
{
1853
  if (LJ_LIKELY(J->flags & JIT_F_OPT_ABC)) {
615✔
1854
    if (irref_isk(fright->op2)) {
615✔
1855
      IRIns *add2 = IR(fright->op1);
614✔
1856
      if (add2->o == IR_ADD && irref_isk(add2->op2) &&
614✔
1857
          IR(fright->op2)->i == -IR(add2->op2)->i) {
27✔
1858
        IRRef ref = J->chain[IR_ABC];
5✔
1859
        IRRef lim = add2->op1;
5✔
1860
        if (fins->op1 > lim) lim = fins->op1;
5✔
1861
        while (ref > lim) {
14✔
1862
          IRIns *ir = IR(ref);
13✔
1863
          if (ir->op1 == fins->op1 && ir->op2 == add2->op1)
13✔
1864
            return DROPFOLD;
1865
          ref = ir->prev;
9✔
1866
        }
1867
      }
1868
    }
1869
  }
1870
  return NEXTFOLD;
1871
}

/* Eliminate ABC for constants.
** ABC(asize, k1), ABC(asize, k2) ==> ABC(asize, max(k1, k2))
** Drop second ABC if k2 is lower. Otherwise patch first ABC with k2.
*/
LJFOLD(ABC any KINT)
LJFOLDF(abc_k)
{
  PHIBARRIER(fleft);
  if (LJ_LIKELY(J->flags & JIT_F_OPT_ABC)) {
    IRRef ref = J->chain[IR_ABC];
    IRRef asize = fins->op1;
    while (ref > asize) {
      IRIns *ir = IR(ref);
      if (ir->op1 == asize && irref_isk(ir->op2)) {
        uint32_t k = (uint32_t)IR(ir->op2)->i;
        if ((uint32_t)fright->i > k)
          ir->op2 = fins->op2;
        return DROPFOLD;
      }
      ref = ir->prev;
    }
    return EMITFOLD;  /* Already performed CSE. */
  }
  return NEXTFOLD;
}

/* Eliminate invariant ABC inside loop. */
LJFOLD(ABC any any)
LJFOLDF(abc_invar)
{
  /* Invariant ABC marked as PTR. Drop if op1 is invariant, too. */
  if (!irt_isint(fins->t) && fins->op1 < J->chain[IR_LOOP] &&
      !irt_isphi(IR(fins->op1)->t))
    return DROPFOLD;
  return NEXTFOLD;
}

/* -- Commutativity ------------------------------------------------------- */

/* The refs of commutative ops are canonicalized. Lower refs go to the right.
** Rationale behind this:
** - It (also) moves constants to the right.
** - It reduces the number of FOLD rules (e.g. (BOR any KINT) suffices).
** - It helps CSE to find more matches.
** - The assembler generates better code with constants at the right.
*/

LJFOLD(ADD any any)
LJFOLD(MUL any any)
LJFOLD(ADDOV any any)
LJFOLD(MULOV any any)
LJFOLDF(comm_swap)
{
  if (fins->op1 < fins->op2) {  /* Move lower ref to the right. */
    IRRef1 tmp = fins->op1;
    fins->op1 = fins->op2;
    fins->op2 = tmp;
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(EQ any any)
LJFOLD(NE any any)
LJFOLDF(comm_equal)
{
  /* For non-numbers only: x == x ==> drop; x ~= x ==> fail */
  if (fins->op1 == fins->op2 && !irt_isnum(fins->t))
    return CONDFOLD(fins->o == IR_EQ);
  return fold_comm_swap(J);
}

LJFOLD(LT any any)
LJFOLD(GE any any)
LJFOLD(LE any any)
LJFOLD(GT any any)
LJFOLD(ULT any any)
LJFOLD(UGE any any)
LJFOLD(ULE any any)
LJFOLD(UGT any any)
LJFOLDF(comm_comp)
{
  /* For non-numbers only: x <=> x ==> drop; x <> x ==> fail */
  if (fins->op1 == fins->op2 && !irt_isnum(fins->t))
    return CONDFOLD((fins->o ^ (fins->o >> 1)) & 1);
  if (fins->op1 < fins->op2) {  /* Move lower ref to the right. */
    IRRef1 tmp = fins->op1;
    fins->op1 = fins->op2;
    fins->op2 = tmp;
    fins->o ^= 3;  /* GT <-> LT, GE <-> LE, does not affect U */
    return RETRYFOLD;
  }
  return NEXTFOLD;
}

LJFOLD(BAND any any)
LJFOLD(BOR any any)
LJFOLDF(comm_dup)
{
  if (fins->op1 == fins->op2)  /* x o x ==> x */
    return LEFTFOLD;
  return fold_comm_swap(J);
}

LJFOLD(MIN any any)
LJFOLD(MAX any any)
LJFOLDF(comm_dup_minmax)
{
  if (fins->op1 == fins->op2)  /* x o x ==> x */
    return LEFTFOLD;
  return NEXTFOLD;
}

LJFOLD(BXOR any any)
LJFOLDF(comm_bxor)
{
  if (fins->op1 == fins->op2)  /* i xor i ==> 0 */
    return irt_is64(fins->t) ? INT64FOLD(0) : INTFOLD(0);
  return fold_comm_swap(J);
}

/* -- Simplification of compound expressions ------------------------------ */

static TRef kfold_xload(jit_State *J, IRIns *ir, const void *p)
{
  int32_t k;
  switch (irt_type(ir->t)) {
  case IRT_NUM: return lj_ir_knum_u64(J, *(uint64_t *)p);
  case IRT_I8: k = (int32_t)*(int8_t *)p; break;
  case IRT_U8: k = (int32_t)*(uint8_t *)p; break;
  case IRT_I16: k = (int32_t)(int16_t)lj_getu16(p); break;
  case IRT_U16: k = (int32_t)(uint16_t)lj_getu16(p); break;
  case IRT_INT: case IRT_U32: k = (int32_t)lj_getu32(p); break;
  case IRT_I64: case IRT_U64: return lj_ir_kint64(J, *(uint64_t *)p);
  default: return 0;
  }
  return lj_ir_kint(J, k);
}

/* Turn: string.sub(str, a, b) == kstr
** into: string.byte(str, a) == string.byte(kstr, 1) etc.
** Note: this creates unaligned XLOADs on x86/x64.
*/
LJFOLD(EQ SNEW KGC)
LJFOLD(NE SNEW KGC)
LJFOLDF(merge_eqne_snew_kgc)
{
  GCstr *kstr = ir_kstr(fright);
  int32_t len = (int32_t)kstr->len;
  lj_assertJ(irt_isstr(fins->t), "bad equality IR type");

#if LJ_TARGET_UNALIGNED
#define FOLD_SNEW_MAX_LEN  4  /* Handle string lengths 0, 1, 2, 3, 4. */
#define FOLD_SNEW_TYPE8    IRT_I8  /* Creates shorter immediates. */
#else
#define FOLD_SNEW_MAX_LEN  1  /* Handle string lengths 0 or 1. */
#define FOLD_SNEW_TYPE8    IRT_U8  /* Prefer unsigned loads. */
#endif

  PHIBARRIER(fleft);
  if (len <= FOLD_SNEW_MAX_LEN) {
    IROp op = (IROp)fins->o;
    IRRef strref = fleft->op1;
    if (IR(strref)->o != IR_STRREF)
      return NEXTFOLD;
    if (op == IR_EQ) {
      emitir(IRTGI(IR_EQ), fleft->op2, lj_ir_kint(J, len));
      /* Caveat: fins/fleft/fright is no longer valid after emitir. */
    } else {
      /* NE is not expanded since this would need an OR of two conds. */
      if (!irref_isk(fleft->op2))  /* Only handle the constant length case. */
        return NEXTFOLD;
      if (IR(fleft->op2)->i != len)
        return DROPFOLD;
    }
    if (len > 0) {
      /* A 4 byte load for length 3 is ok -- all strings have an extra NUL. */
      uint16_t ot = (uint16_t)(len == 1 ? IRT(IR_XLOAD, FOLD_SNEW_TYPE8) :
                               len == 2 ? IRT(IR_XLOAD, IRT_U16) :
                               IRTI(IR_XLOAD));
      TRef tmp = emitir(ot, strref,
                        IRXLOAD_READONLY | (len > 1 ? IRXLOAD_UNALIGNED : 0));
      TRef val = kfold_xload(J, IR(tref_ref(tmp)), strdata(kstr));
      if (len == 3)
        tmp = emitir(IRTI(IR_BAND), tmp,
                     lj_ir_kint(J, LJ_ENDIAN_SELECT(0x00ffffff, 0xffffff00)));
      fins->op1 = (IRRef1)tmp;
      fins->op2 = (IRRef1)val;
      fins->ot = (IROpT)IRTGI(op);
      return RETRYFOLD;
    } else {
      return DROPFOLD;
    }
  }
  return NEXTFOLD;
}

/* -- Loads --------------------------------------------------------------- */

/* Loads cannot be folded or passed on to CSE in general.
** Alias analysis is needed to check for forwarding opportunities.
**
** Caveat: *all* loads must be listed here or they end up at CSE!
*/

LJFOLD(ALOAD any)
LJFOLDX(lj_opt_fwd_aload)

/* From HREF fwd (see below). Must eliminate, not supported by fwd/backend. */
LJFOLD(HLOAD KKPTR)
LJFOLDF(kfold_hload_kkptr)
{
  UNUSED(J);
  lj_assertJ(ir_kptr(fleft) == niltvg(J2G(J)), "expected niltv");
  return TREF_NIL;
}

LJFOLD(HLOAD any)
LJFOLDX(lj_opt_fwd_hload)

LJFOLD(ULOAD any)
LJFOLDX(lj_opt_fwd_uload)

LJFOLD(CALLL any IRCALL_lj_tab_len)
LJFOLDX(lj_opt_fwd_tab_len)

/* Upvalue refs are really loads, but there are no corresponding stores.
** So CSE is ok for them, except for UREFO across a GC step (see below).
** If the referenced function is const, its upvalue addresses are const, too.
** This can be used to improve CSE by looking for the same address,
** even if the upvalues originate from a different function.
*/
LJFOLD(UREFO KGC any)
LJFOLD(UREFC KGC any)
LJFOLDF(cse_uref)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    IRRef ref = J->chain[fins->o];
    GCfunc *fn = ir_kfunc(fleft);
    GCupval *uv = gco2uv(gcref(fn->l.uvptr[(fins->op2 >> 8)]));
    while (ref > 0) {
      IRIns *ir = IR(ref);
      if (irref_isk(ir->op1)) {
        GCfunc *fn2 = ir_kfunc(IR(ir->op1));
        if (gco2uv(gcref(fn2->l.uvptr[(ir->op2 >> 8)])) == uv) {
          if (fins->o == IR_UREFO && gcstep_barrier(J, ref))
            break;
          return ref;
        }
      }
      ref = ir->prev;
    }
  }
  return EMITFOLD;
}

LJFOLD(HREFK any any)
LJFOLDX(lj_opt_fwd_hrefk)

LJFOLD(HREF TNEW any)
LJFOLDF(fwd_href_tnew)
{
  if (lj_opt_fwd_href_nokey(J))
    return lj_ir_kkptr(J, niltvg(J2G(J)));
  return NEXTFOLD;
}

LJFOLD(HREF TDUP KPRI)
LJFOLD(HREF TDUP KGC)
LJFOLD(HREF TDUP KNUM)
LJFOLDF(fwd_href_tdup)
{
  TValue keyv;
  lj_ir_kvalue(J->L, &keyv, fright);
  if (lj_tab_get(J->L, ir_ktab(IR(fleft->op1)), &keyv) == niltvg(J2G(J)) &&
      lj_opt_fwd_href_nokey(J))
    return lj_ir_kkptr(J, niltvg(J2G(J)));
  return NEXTFOLD;
}

/* We can safely FOLD/CSE array/hash refs and field loads, since there
** are no corresponding stores. But we need to check for any NEWREF with
** an aliased table, as it may invalidate all of the pointers and fields.
** Only HREF needs the NEWREF check -- AREF and HREFK already depend on
** FLOADs. And NEWREF itself is treated like a store (see below).
** LREF is constant (per trace) since coroutine switches are not inlined.
*/
LJFOLD(FLOAD TNEW IRFL_TAB_ASIZE)
LJFOLDF(fload_tab_tnew_asize)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD(fleft->op1);
  return NEXTFOLD;
}

LJFOLD(FLOAD TNEW IRFL_TAB_HMASK)
LJFOLDF(fload_tab_tnew_hmask)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((1 << fleft->op2)-1);
  return NEXTFOLD;
}

LJFOLD(FLOAD TDUP IRFL_TAB_ASIZE)
LJFOLDF(fload_tab_tdup_asize)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((int32_t)ir_ktab(IR(fleft->op1))->asize);
  return NEXTFOLD;
}

LJFOLD(FLOAD TDUP IRFL_TAB_HMASK)
LJFOLDF(fload_tab_tdup_hmask)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && lj_opt_fwd_tptr(J, fins->op1))
    return INTFOLD((int32_t)ir_ktab(IR(fleft->op1))->hmask);
  return NEXTFOLD;
}

LJFOLD(HREF any any)
LJFOLD(FLOAD any IRFL_TAB_ARRAY)
LJFOLD(FLOAD any IRFL_TAB_NODE)
LJFOLD(FLOAD any IRFL_TAB_ASIZE)
LJFOLD(FLOAD any IRFL_TAB_HMASK)
LJFOLDF(fload_tab_ah)
{
  TRef tr = lj_opt_cse(J);
  return lj_opt_fwd_tptr(J, tref_ref(tr)) ? tr : EMITFOLD;
}

/* Strings are immutable, so we can safely FOLD/CSE the related FLOAD. */
LJFOLD(FLOAD KGC IRFL_STR_LEN)
LJFOLDF(fload_str_len_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return INTFOLD((int32_t)ir_kstr(fleft)->len);
  return NEXTFOLD;
}

LJFOLD(FLOAD SNEW IRFL_STR_LEN)
LJFOLDF(fload_str_len_snew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    PHIBARRIER(fleft);
    return fleft->op2;
  }
  return NEXTFOLD;
}

LJFOLD(FLOAD TOSTR IRFL_STR_LEN)
LJFOLDF(fload_str_len_tostr)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD) && fleft->op2 == IRTOSTR_CHAR)
    return INTFOLD(1);
  return NEXTFOLD;
}

/* The C type ID of cdata objects is immutable. */
LJFOLD(FLOAD KGC IRFL_CDATA_CTYPEID)
LJFOLDF(fload_cdata_typeid_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return INTFOLD((int32_t)ir_kcdata(fleft)->ctypeid);
  return NEXTFOLD;
}

/* Get the contents of immutable cdata objects. */
LJFOLD(FLOAD KGC IRFL_CDATA_PTR)
LJFOLD(FLOAD KGC IRFL_CDATA_INT)
LJFOLD(FLOAD KGC IRFL_CDATA_INT64)
LJFOLDF(fload_cdata_int64_kgc)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD)) {
    void *p = cdataptr(ir_kcdata(fleft));
    if (irt_is64(fins->t))
      return INT64FOLD(*(uint64_t *)p);
    else
      return INTFOLD(*(int32_t *)p);
  }
  return NEXTFOLD;
}

LJFOLD(FLOAD CNEW IRFL_CDATA_CTYPEID)
LJFOLD(FLOAD CNEWI IRFL_CDATA_CTYPEID)
LJFOLDF(fload_cdata_typeid_cnew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return fleft->op1;  /* No PHI barrier needed. CNEW/CNEWI op1 is const. */
  return NEXTFOLD;
}

/* Pointer, int and int64 cdata objects are immutable. */
LJFOLD(FLOAD CNEWI IRFL_CDATA_PTR)
LJFOLD(FLOAD CNEWI IRFL_CDATA_INT)
LJFOLD(FLOAD CNEWI IRFL_CDATA_INT64)
LJFOLDF(fload_cdata_ptr_int64_cnew)
{
  if (LJ_LIKELY(J->flags & JIT_F_OPT_FOLD))
    return fleft->op2;  /* Fold even across PHI to avoid allocations. */
  return NEXTFOLD;
}

LJFOLD(FLOAD any IRFL_STR_LEN)
LJFOLD(FLOAD any IRFL_FUNC_ENV)
LJFOLD(FLOAD any IRFL_THREAD_ENV)
LJFOLD(FLOAD any IRFL_CDATA_CTYPEID)
LJFOLD(FLOAD any IRFL_CDATA_PTR)
LJFOLD(FLOAD any IRFL_CDATA_INT)
LJFOLD(FLOAD any IRFL_CDATA_INT64)
LJFOLD(VLOAD any any)  /* Vararg loads have no corresponding stores. */
LJFOLDX(lj_opt_cse)

/* All other field loads need alias analysis. */
LJFOLD(FLOAD any any)
LJFOLDX(lj_opt_fwd_fload)

/* This is for LOOP only. Recording handles SLOADs internally. */
LJFOLD(SLOAD any any)
LJFOLDF(fwd_sload)
{
  if ((fins->op2 & IRSLOAD_FRAME)) {
    TRef tr = lj_opt_cse(J);
    return tref_ref(tr) < J->chain[IR_RETF] ? EMITFOLD : tr;
  } else {
    lj_assertJ(J->slot[fins->op1] != 0, "uninitialized slot accessed");
    return J->slot[fins->op1];
  }
}

/* Only fold for KKPTR. The pointer _and_ the contents must be const. */
LJFOLD(XLOAD KKPTR any)
LJFOLDF(xload_kptr)
{
  TRef tr = kfold_xload(J, fins, ir_kptr(fleft));
  return tr ? tr : NEXTFOLD;
}

LJFOLD(XLOAD any any)
LJFOLDX(lj_opt_fwd_xload)

/* -- Write barriers ------------------------------------------------------ */

/* Write barriers are amenable to CSE, but not across any incremental
** GC steps.
**
** The same logic applies to open upvalue references, because a stack
** may be resized during a GC step (not the current stack, but maybe that
** of a coroutine).
*/
LJFOLD(TBAR any)
LJFOLD(OBAR any any)
LJFOLD(UREFO any any)
LJFOLDF(barrier_tab)
{
  TRef tr = lj_opt_cse(J);
  if (gcstep_barrier(J, tref_ref(tr)))  /* CSE across GC step? */
    return EMITFOLD;  /* Raw emit. Assumes fins is left intact by CSE. */
  return tr;
}

LJFOLD(TBAR TNEW)
LJFOLD(TBAR TDUP)
LJFOLDF(barrier_tnew_tdup)
{
  /* New tables are always white and never need a barrier. */
  if (fins->op1 < J->chain[IR_LOOP])  /* Except across a GC step. */
    return NEXTFOLD;
  return DROPFOLD;
}

/* -- Profiling ----------------------------------------------------------- */

LJFOLD(PROF any any)
LJFOLDF(prof)
{
  IRRef ref = J->chain[IR_PROF];
  if (ref+1 == J->cur.nins)  /* Drop neighbouring IR_PROF. */
    return ref;
  return EMITFOLD;
}

/* -- Stores and allocations ---------------------------------------------- */

/* Stores and allocations cannot be folded or passed on to CSE in general.
** But some stores can be eliminated with dead-store elimination (DSE).
**
** Caveat: *all* stores and allocs must be listed here or they end up at CSE!
*/

LJFOLD(ASTORE any any)
LJFOLD(HSTORE any any)
LJFOLDX(lj_opt_dse_ahstore)

LJFOLD(USTORE any any)
LJFOLDX(lj_opt_dse_ustore)

LJFOLD(FSTORE any any)
LJFOLDX(lj_opt_dse_fstore)

LJFOLD(XSTORE any any)
LJFOLDX(lj_opt_dse_xstore)

LJFOLD(NEWREF any any)  /* Treated like a store. */
LJFOLD(CALLA any any)
LJFOLD(CALLL any any)  /* Safeguard fallback. */
LJFOLD(CALLS any any)
LJFOLD(CALLXS any any)
LJFOLD(XBAR)
LJFOLD(RETF any any)  /* Modifies BASE. */
LJFOLD(TNEW any any)
LJFOLD(TDUP any)
LJFOLD(CNEW any any)
LJFOLD(XSNEW any any)
LJFOLD(BUFHDR any any)
LJFOLDX(lj_ir_emit)

2389
/* ------------------------------------------------------------------------ */
2390

2391
/* Every entry in the generated hash table is a 32 bit pattern:
2392
**
2393
** xxxxxxxx iiiiiii lllllll rrrrrrrrrr
2394
**
2395
**   xxxxxxxx = 8 bit index into fold function table
2396
**    iiiiiii = 7 bit folded instruction opcode
2397
**    lllllll = 7 bit left instruction opcode
2398
** rrrrrrrrrr = 8 bit right instruction opcode or 10 bits from literal field
2399
*/
2400

2401
#include "lj_folddef.h"
2402

2403
/* ------------------------------------------------------------------------ */
2404

2405
/* Fold IR instruction. */
2406
TRef LJ_FASTCALL lj_opt_fold(jit_State *J)
752,320✔
2407
{
2408
  uint32_t key, any;
752,320✔
2409
  IRRef ref;
752,320✔
2410

2411
  if (LJ_UNLIKELY((J->flags & JIT_F_OPT_MASK) != JIT_F_OPT_DEFAULT)) {
752,320✔
2412
    lj_assertJ(((JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE|JIT_F_OPT_DSE) |
608,431✔
2413
                JIT_F_OPT_DEFAULT) == JIT_F_OPT_DEFAULT,
2414
               "bad JIT_F_OPT_DEFAULT");
2415
    /* Folding disabled? Chain to CSE, but not for loads/stores/allocs. */
2416
    if (!(J->flags & JIT_F_OPT_FOLD) && irm_kind(lj_ir_mode[fins->o]) == IRM_N)
608,431✔
2417
      return lj_opt_cse(J);
263,415✔
2418

2419
    /* No FOLD, forwarding or CSE? Emit raw IR for loads, except for SLOAD. */
2420
    if ((J->flags & (JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE)) !=
345,016✔
2421
                    (JIT_F_OPT_FOLD|JIT_F_OPT_FWD|JIT_F_OPT_CSE) &&
343,413✔
2422
        irm_kind(lj_ir_mode[fins->o]) == IRM_L && fins->o != IR_SLOAD)
343,413✔
2423
      return lj_ir_emit(J);
254,495✔
2424

2425
    /* No FOLD or DSE? Emit raw IR for stores. */
2426
    if ((J->flags & (JIT_F_OPT_FOLD|JIT_F_OPT_DSE)) !=
90,521✔
2427
                    (JIT_F_OPT_FOLD|JIT_F_OPT_DSE) &&
88,762✔
2428
        irm_kind(lj_ir_mode[fins->o]) == IRM_S)
88,762✔
2429
      return lj_ir_emit(J);
88,511✔
2430
  }
2431

2432
  /* Fold engine start/retry point. */
2433
retry:
145,899✔
2434
  /* Construct key from opcode and operand opcodes (unless literal/none). */
2435
  key = ((uint32_t)fins->o << 17);
149,192✔
2436
  if (fins->op1 >= J->cur.nk) {
149,192✔
2437
    key += (uint32_t)IR(fins->op1)->o << 10;
141,239✔
2438
    *fleft = *IR(fins->op1);
141,239✔
2439
    if (fins->op1 < REF_TRUE)
141,239✔
2440
      fleft[1] = IR(fins->op1)[1];
18,523✔
2441
  }
2442
  if (fins->op2 >= J->cur.nk) {
149,192✔
2443
    key += (uint32_t)IR(fins->op2)->o;
80,413✔
2444
    *fright = *IR(fins->op2);
80,413✔
2445
    if (fins->op2 < REF_TRUE)
80,413✔
2446
      fright[1] = IR(fins->op2)[1];
52,890✔
2447
  } else {
2448
    key += (fins->op2 & 0x3ffu);  /* Literal mask. Must include IRCONV_*MASK. */
68,779✔
2449
  }
2450

2451
  /* Check for a match in order from most specific to least specific. */
2452
  any = 0;
149,192✔
2453
  for (;;) {
755,656✔
2454
    uint32_t k = key | (any & 0x1ffff);
452,424✔
2455
    uint32_t h = fold_hashkey(k);
452,424✔
2456
    uint32_t fh = fold_hash[h];  /* Lookup key in semi-perfect hash table. */
452,424✔
2457
    if ((fh & 0xffffff) == k || (fh = fold_hash[h+1], (fh & 0xffffff) == k)) {
452,424✔
2458
      ref = (IRRef)tref_ref(fold_func[fh >> 24](J));
143,343✔
2459
      if (ref != NEXTFOLD)
143,343✔
2460
        break;
2461
    }
2462
    if (any == 0xfffff)  /* Exhausted folding. Pass on to CSE. */
353,002✔
2463
      return lj_opt_cse(J);
49,770✔
2464
    any = (any | (any >> 10)) ^ 0xffc00;
303,232✔
2465
  }
2466

2467
  /* Return value processing, ordered by frequency. */
2468
  if (LJ_LIKELY(ref >= MAX_FOLD))
99,422✔
2469
    return TREF(ref, irt_t(IR(ref)->t));
86,744✔
2470
  if (ref == RETRYFOLD)
12,678✔
2471
    goto retry;
3,293✔
2472
  if (ref == KINTFOLD)
9,385✔
2473
    return lj_ir_kint(J, fins->i);
1,108✔
2474
  if (ref == FAILFOLD)
8,277✔
2475
    lj_trace_err(J, LJ_TRERR_GFAIL);
1✔
2476
  lj_assertJ(ref == DROPFOLD, "bad fold result");
2477
  return REF_DROP;
2478
}

/* -- Common-Subexpression Elimination ------------------------------------ */

/* CSE an IR instruction. This is very fast due to the skip-list chains. */
TRef LJ_FASTCALL lj_opt_cse(jit_State *J)
{
  /* Avoid narrow to wide store-to-load forwarding stall */
  IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
  IROp op = fins->o;
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    /* Limited search for same operands in per-opcode chain. */
    IRRef ref = J->chain[op];
    IRRef lim = fins->op1;
    if (fins->op2 > lim) lim = fins->op2;  /* Relies on lit < REF_BIAS. */
    while (ref > lim) {
      if (IR(ref)->op12 == op12)
        return TREF(ref, irt_t(IR(ref)->t));  /* Common subexpression found. */
      ref = IR(ref)->prev;
    }
  }
  /* Otherwise emit IR (inlined for speed). */
  {
    IRRef ref = lj_ir_nextins(J);
    IRIns *ir = IR(ref);
    ir->prev = J->chain[op];
    ir->op12 = op12;
    J->chain[op] = (IRRef1)ref;
    ir->o = fins->o;
    J->guardemit.irt |= fins->t.irt;
    return TREF(ref, irt_t((ir->t = fins->t)));
  }
}

/* CSE with explicit search limit. */
TRef LJ_FASTCALL lj_opt_cselim(jit_State *J, IRRef lim)
{
  IRRef ref = J->chain[fins->o];
  IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
  while (ref > lim) {
    if (IR(ref)->op12 == op12)
      return ref;
    ref = IR(ref)->prev;
  }
  return lj_ir_emit(J);
}

/* ------------------------------------------------------------------------ */

#undef IR
#undef fins
#undef fleft
#undef fright
#undef knumleft
#undef knumright
#undef emitir

#endif