JuliaLang / julia / #37606

31 Aug 2023 03:12AM UTC coverage: 86.167% (-0.03%) from 86.2%
Refine effects based on optimizer-derived information (#50805)

The optimizer may be able to derive information that is not available to
inference. For example, it may SROA a mutable value to derive additional
constant information. Additionally, some effects, like `:consistent`, are
path-dependent and should ideally be scanned once all optimizations are
done. One complication is that we have so far generally taken the
position that the optimizer may perform non-IPO-safe optimizations,
although in practice we never actually implemented any.
This was a sensible choice, because we weren't really doing anything
with the post-optimized IR other than feeding it into codegen anyway.
However, with irinterp and this change, there are now two consumers of
IPO-safely optimized IR. I do still think we may at some point want to
run passes that allow IPO-unsafe optimizations, but we can always add
them at the end of the pipeline.

With these changes, the effect analysis is a lot more precise. For
example, we can now derive `:consistent` for these functions:
```
function f1(b)
    if Base.inferencebarrier(b)
        error()
    end
    return b
end

function f3(x)
    @fastmath sqrt(x)
    return x
end
```
and we can derive `:nothrow` for this function:
```
function f2()
    if Ref(false)[]
        error()
    end
    return true
end
```
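
One way to check such results interactively is to ask the compiler for the effects it infers for a call signature. The sketch below assumes a recent Julia version in which `Base.infer_effects` and the `Core.Compiler.is_*` predicates are available. Both are compiler internals rather than stable API, and the answers depend on the Julia version (the refinement described above is only present in builds that include this change).

```julia
# Redefine the example from the commit message so the snippet is self-contained.
function f2()
    if Ref(false)[]
        error()
    end
    return true
end

# Query the effects inferred for the call `f2()` (empty argument tuple).
effects = Base.infer_effects(f2, ())

# Internal predicates; their answers may differ between releases.
println("nothrow:    ", Core.Compiler.is_nothrow(effects))
println("consistent: ", Core.Compiler.is_consistent(effects))
```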

414 of 414 new or added lines in 13 files covered. (100.0%)

73383 of 85164 relevant lines covered (86.17%)

12690106.67 hits per line

Source File

87.86
/base/float.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
const IEEEFloat = Union{Float16, Float32, Float64}
4

5
## floating point traits ##
6

7
"""
8
    Inf16
9

10
Positive infinity of type [`Float16`](@ref).
11
"""
12
const Inf16 = bitcast(Float16, 0x7c00)
13
"""
14
    NaN16
15

16
A not-a-number value of type [`Float16`](@ref).
17
"""
18
const NaN16 = bitcast(Float16, 0x7e00)
19
"""
20
    Inf32
21

22
Positive infinity of type [`Float32`](@ref).
23
"""
24
const Inf32 = bitcast(Float32, 0x7f800000)
25
"""
26
    NaN32
27

28
A not-a-number value of type [`Float32`](@ref).
29
"""
30
const NaN32 = bitcast(Float32, 0x7fc00000)
31
const Inf64 = bitcast(Float64, 0x7ff0000000000000)
32
const NaN64 = bitcast(Float64, 0x7ff8000000000000)
33

34
const Inf = Inf64
35
"""
36
    Inf, Inf64
37

38
Positive infinity of type [`Float64`](@ref).
39

40
See also: [`isfinite`](@ref), [`typemax`](@ref), [`NaN`](@ref), [`Inf32`](@ref).
41

42
# Examples
43
```jldoctest
44
julia> π/0
45
Inf
46

47
julia> +1.0 / -0.0
48
-Inf
49

50
julia> ℯ^-Inf
51
0.0
52
```
53
"""
54
Inf, Inf64
55

56
const NaN = NaN64
57
"""
58
    NaN, NaN64
59

60
A not-a-number value of type [`Float64`](@ref).
61

62
See also: [`isnan`](@ref), [`missing`](@ref), [`NaN32`](@ref), [`Inf`](@ref).
63

64
# Examples
65
```jldoctest
66
julia> 0/0
67
NaN
68

69
julia> Inf - Inf
70
NaN
71

72
julia> NaN == NaN, isequal(NaN, NaN), NaN === NaN
73
(false, true, true)
74
```
75
"""
76
NaN, NaN64
77

78
# bit patterns
79
reinterpret(::Type{Unsigned}, x::Float64) = reinterpret(UInt64, x)
10,859,367✔
80
reinterpret(::Type{Unsigned}, x::Float32) = reinterpret(UInt32, x)
605,581,679✔
81
reinterpret(::Type{Unsigned}, x::Float16) = reinterpret(UInt16, x)
3,796,213✔
82
reinterpret(::Type{Signed}, x::Float64) = reinterpret(Int64, x)
600,259,752✔
83
reinterpret(::Type{Signed}, x::Float32) = reinterpret(Int32, x)
600,449,756✔
84
reinterpret(::Type{Signed}, x::Float16) = reinterpret(Int16, x)
675,698✔
85

86
sign_mask(::Type{Float64}) =        0x8000_0000_0000_0000
×
87
exponent_mask(::Type{Float64}) =    0x7ff0_0000_0000_0000
×
88
exponent_one(::Type{Float64}) =     0x3ff0_0000_0000_0000
×
89
exponent_half(::Type{Float64}) =    0x3fe0_0000_0000_0000
90✔
90
significand_mask(::Type{Float64}) = 0x000f_ffff_ffff_ffff
×
91

92
sign_mask(::Type{Float32}) =        0x8000_0000
3,708,533✔
93
exponent_mask(::Type{Float32}) =    0x7f80_0000
×
94
exponent_one(::Type{Float32}) =     0x3f80_0000
×
95
exponent_half(::Type{Float32}) =    0x3f00_0000
1,000,025✔
96
significand_mask(::Type{Float32}) = 0x007f_ffff
×
97

98
sign_mask(::Type{Float16}) =        0x8000
1,679,821✔
99
exponent_mask(::Type{Float16}) =    0x7c00
×
100
exponent_one(::Type{Float16}) =     0x3c00
×
101
exponent_half(::Type{Float16}) =    0x3800
999,271✔
102
significand_mask(::Type{Float16}) = 0x03ff
×
103

104
mantissa(x::T) where {T} = reinterpret(Unsigned, x) & significand_mask(T)
1,078,924✔
105

106
for T in (Float16, Float32, Float64)
107
    @eval significand_bits(::Type{$T}) = $(trailing_ones(significand_mask(T)))
×
108
    @eval exponent_bits(::Type{$T}) = $(sizeof(T)*8 - significand_bits(T) - 1)
312,575,479✔
109
    @eval exponent_bias(::Type{$T}) = $(Int(exponent_one(T) >> significand_bits(T)))
×
110
    # maximum float exponent
111
    @eval exponent_max(::Type{$T}) = $(Int(exponent_mask(T) >> significand_bits(T)) - exponent_bias(T) - 1)
×
112
    # maximum float exponent without bias
113
    @eval exponent_raw_max(::Type{$T}) = $(Int(exponent_mask(T) >> significand_bits(T)))
151,321,648✔
114
end
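
To make the loop above concrete, the following sketch recomputes the same layout constants for `Float64` by hand from the three masks. The local variable names are illustrative only; the arithmetic mirrors the `@eval` expressions.

```julia
# Masks copied from the Float64 definitions above.
sig_mask = 0x000f_ffff_ffff_ffff
exp_mask = 0x7ff0_0000_0000_0000
exp_one  = 0x3ff0_0000_0000_0000

sig_bits = trailing_ones(sig_mask)               # width of the significand field
exp_bits = sizeof(Float64) * 8 - sig_bits - 1    # remaining bits minus the sign bit
bias     = Int(exp_one >> sig_bits)              # exponent field of 1.0
raw_max  = Int(exp_mask >> sig_bits)             # all-ones exponent field (Inf/NaN)
exp_max  = raw_max - bias - 1                    # largest exponent of a finite value

@show sig_bits exp_bits bias raw_max exp_max
```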
115

116
"""
117
    exponent_max(T)
118

119
Maximum [`exponent`](@ref) value for a floating point number of type `T`.
120

121
# Examples
122
```jldoctest
123
julia> Base.exponent_max(Float64)
124
1023
125
```
126

127
Note, `exponent_max(T) + 1` is a possible value of the exponent field
128
with bias, which might be used as a sentinel value for `Inf` or `NaN`.
129
"""
130
function exponent_max end
131

132
"""
133
    exponent_raw_max(T)
134

135
Maximum value of the [`exponent`](@ref) field for a floating point number of type `T` without bias,
136
i.e. the maximum integer value representable by [`exponent_bits(T)`](@ref) bits.
137
"""
138
function exponent_raw_max end
139

140
"""
141
IEEE 754 definition of the minimum exponent.
142
"""
143
ieee754_exponent_min(::Type{T}) where {T<:IEEEFloat} = Int(1 - exponent_max(T))::Int
312,575,392✔
144

145
exponent_min(::Type{Float16}) = ieee754_exponent_min(Float16)
312,575,392✔
146
exponent_min(::Type{Float32}) = ieee754_exponent_min(Float32)
×
147
exponent_min(::Type{Float64}) = ieee754_exponent_min(Float64)
×
148

149
function ieee754_representation(
312,575,392✔
150
    ::Type{F}, sign_bit::Bool, exponent_field::Integer, significand_field::Integer
151
) where {F<:IEEEFloat}
152
    T = uinttype(F)
312,575,392✔
153
    ret::T = sign_bit
1,269,090,431✔
154
    ret <<= exponent_bits(F)
1,269,090,431✔
155
    ret |= exponent_field
1,269,090,431✔
156
    ret <<= significand_bits(F)
1,269,090,431✔
157
    ret |= significand_field
1,269,090,431✔
158
end
159

160
# ±floatmax(T)
161
function ieee754_representation(
125,377,412✔
162
    ::Type{F}, sign_bit::Bool, ::Val{:omega}
163
) where {F<:IEEEFloat}
164
    ieee754_representation(F, sign_bit, exponent_raw_max(F) - 1, significand_mask(F))
146,001,656✔
165
end
166

167
# NaN or an infinity
168
function ieee754_representation(
85,360,948✔
169
    ::Type{F}, sign_bit::Bool, significand_field::Integer, ::Val{:nan}
170
) where {F<:IEEEFloat}
171
    ieee754_representation(F, sign_bit, exponent_raw_max(F), significand_field)
194,979,421✔
172
end
173

174
# NaN with default payload
175
function ieee754_representation(
26✔
176
    ::Type{F}, sign_bit::Bool, ::Val{:nan}
177
) where {F<:IEEEFloat}
178
    ieee754_representation(F, sign_bit, one(uinttype(F)) << (significand_bits(F) - 1), Val(:nan))
91✔
179
end
180

181
# Infinity
182
function ieee754_representation(
161,057,828✔
183
    ::Type{F}, sign_bit::Bool, ::Val{:inf}
184
) where {F<:IEEEFloat}
185
    ieee754_representation(F, sign_bit, false, Val(:nan))
194,979,330✔
186
end
187

188
# Subnormal or zero
189
function ieee754_representation(
79,799,247✔
190
    ::Type{F}, sign_bit::Bool, significand_field::Integer, ::Val{:subnormal}
191
) where {F<:IEEEFloat}
192
    ieee754_representation(F, sign_bit, false, significand_field)
180,996,161✔
193
end
194

195
# Zero
196
function ieee754_representation(
151,390,494✔
197
    ::Type{F}, sign_bit::Bool, ::Val{:zero}
198
) where {F<:IEEEFloat}
199
    ieee754_representation(F, sign_bit, false, Val(:subnormal))
180,996,161✔
200
end
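
The methods above all funnel into the same shift-and-or packing of the sign, exponent, and significand fields. A minimal standalone sketch for `Float64` follows; `pack_float64` is a hypothetical helper written for illustration, not part of Base.

```julia
# Hypothetical helper: assemble the raw bits of a Float64 from its three fields.
function pack_float64(sign_bit::Bool, exponent_field::Integer, significand_field::Integer)
    ret = UInt64(sign_bit)
    ret <<= 11                          # make room for the 11-bit exponent field
    ret |= UInt64(exponent_field)
    ret <<= 52                          # make room for the 52-bit significand field
    ret |= UInt64(significand_field)
    return ret
end

inf_bits   = pack_float64(false, 0x7ff, 0)                      # all-ones exponent, zero significand
omega_bits = pack_float64(true, 0x7fe, 0x000f_ffff_ffff_ffff)   # largest finite magnitude, negative
tiny_bits  = pack_float64(false, 0, 1)                          # smallest positive subnormal

reinterpret(Float64, inf_bits)      # Inf
reinterpret(Float64, omega_bits)    # -floatmax(Float64)
reinterpret(Float64, tiny_bits)     # nextfloat(0.0)
```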
201

202
"""
203
    uabs(x::Integer)
204

205
Return the absolute value of `x`, possibly returning a different type should the
206
operation be susceptible to overflow. This typically arises when `x` is a two's complement
207
signed integer, so that `abs(typemin(x)) == typemin(x) < 0`, in which case the result of
208
`uabs(x)` will be an unsigned integer of the same size.
209
"""
210
uabs(x::Integer) = abs(x)
5✔
211
uabs(x::BitSigned) = unsigned(abs(x))
4,006,412✔
212

213
## conversions to floating-point ##
214

215
# TODO: deprecate in 2.0
216
Float16(x::Integer) = convert(Float16, convert(Float32, x)::Float32)
×
217

218
for t1 in (Float16, Float32, Float64)
219
    for st in (Int8, Int16, Int32, Int64)
220
        @eval begin
221
            (::Type{$t1})(x::($st)) = sitofp($t1, x)
263,022,200✔
222
            promote_rule(::Type{$t1}, ::Type{$st}) = $t1
40,356,313✔
223
        end
224
    end
225
    for ut in (Bool, UInt8, UInt16, UInt32, UInt64)
226
        @eval begin
227
            (::Type{$t1})(x::($ut)) = uitofp($t1, x)
111,303,818✔
228
            promote_rule(::Type{$t1}, ::Type{$ut}) = $t1
1,650,518✔
229
        end
230
    end
231
end
232

233
Bool(x::Real) = x==0 ? false : x==1 ? true : throw(InexactError(:Bool, Bool, x))
16,377,383✔
234

235
promote_rule(::Type{Float64}, ::Type{UInt128}) = Float64
53✔
236
promote_rule(::Type{Float64}, ::Type{Int128}) = Float64
724,735✔
237
promote_rule(::Type{Float32}, ::Type{UInt128}) = Float32
23✔
238
promote_rule(::Type{Float32}, ::Type{Int128}) = Float32
23✔
239
promote_rule(::Type{Float16}, ::Type{UInt128}) = Float16
23✔
240
promote_rule(::Type{Float16}, ::Type{Int128}) = Float16
23✔
241

242
function Float64(x::UInt128)
21,182✔
243
    if x < UInt128(1) << 104 # Can fit it in two 52-bit mantissas
21,206✔
244
        low_exp = 0x1p52
×
245
        high_exp = 0x1p104
×
246
        low_bits = (x % UInt64) & Base.significand_mask(Float64)
177✔
247
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
177✔
248
        high_bits = ((x >> 52) % UInt64)
177✔
249
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
177✔
250
        low_value + high_value
193✔
251
    else # Large enough that low bits only affect rounding, pack low bits
252
        low_exp = 0x1p76
×
253
        high_exp = 0x1p128
×
254
        low_bits = ((x >> 12) % UInt64) >> 12 | (x % UInt64) & 0xFFFFFF
21,013✔
255
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
21,013✔
256
        high_bits = ((x >> 76) % UInt64)
21,013✔
257
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
21,013✔
258
        low_value + high_value
21,013✔
259
    end
260
end
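
The `reinterpret(...) - low_exp` pattern used in both branches relies on the fact that every `Float64` in `[2^52, 2^53)` has a spacing of exactly 1, so the low 52 bits of its representation are an integer offset from `2^52`. The sketch below isolates that step; `small_uint_to_float` is an illustrative name, not a Base function.

```julia
# Illustrative helper: convert an integer below 2^52 to Float64 using only bit operations
# and one floating-point subtraction.
function small_uint_to_float(bits::UInt64)
    @assert bits < (UInt64(1) << 52)
    magic = 0x1p52                                    # 2^52 as a Float64
    # OR the integer into the significand field: the result is exactly 2^52 + bits.
    shifted = reinterpret(Float64, reinterpret(UInt64, magic) | bits)
    return shifted - magic                            # exact subtraction recovers bits
end

small_uint_to_float(UInt64(12345))    # 12345.0
```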
261

262
function Float64(x::Int128)
4,005,729✔
263
    sign_bit = ((x >> 127) % UInt64) << 63
3,354,467✔
264
    ux = uabs(x)
4,005,755✔
265
    if ux < UInt128(1) << 104 # Can fit it in two 52-bit mantissas
4,005,755✔
266
        low_exp = 0x1p52
×
267
        high_exp = 0x1p104
×
268
        low_bits = (ux % UInt64) & Base.significand_mask(Float64)
3,334,431✔
269
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
3,334,431✔
270
        high_bits = ((ux >> 52) % UInt64)
3,334,431✔
271
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
3,334,431✔
272
        reinterpret(Float64, sign_bit | reinterpret(UInt64, low_value + high_value))
3,985,719✔
273
    else # Large enough that low bits only affect rounding, pack low bits
274
        low_exp = 0x1p76
×
275
        high_exp = 0x1p128
×
276
        low_bits = ((ux >> 12) % UInt64) >> 12 | (ux % UInt64) & 0xFFFFFF
20,036✔
277
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
20,036✔
278
        high_bits = ((ux >> 76) % UInt64)
20,036✔
279
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
20,036✔
280
        reinterpret(Float64, sign_bit | reinterpret(UInt64, low_value + high_value))
20,036✔
281
    end
282
end
283

284
function Float32(x::UInt128)
30✔
285
    x == 0 && return 0f0
30✔
286
    n = top_set_bit(x) # ndigits0z(x,2)
28✔
287
    if n <= 24
28✔
288
        y = ((x % UInt32) << (24-n)) & 0x007f_ffff
25✔
289
    else
290
        y = ((x >> (n-25)) % UInt32) & 0x00ff_ffff # keep 1 extra bit
3✔
291
        y = (y+one(UInt32))>>1 # round, ties up (extra leading bit in case of next exponent)
3✔
292
        y &= ~UInt32(trailing_zeros(x) == (n-25)) # fix last bit to round to even
3✔
293
    end
294
    d = ((n+126) % UInt32) << 23
28✔
295
    reinterpret(Float32, d + y)
28✔
296
end
297

298
function Float32(x::Int128)
32✔
299
    x == 0 && return 0f0
32✔
300
    s = ((x >>> 96) % UInt32) & 0x8000_0000 # sign bit
31✔
301
    x = abs(x) % UInt128
31✔
302
    n = top_set_bit(x) # ndigits0z(x,2)
31✔
303
    if n <= 24
31✔
304
        y = ((x % UInt32) << (24-n)) & 0x007f_ffff
26✔
305
    else
306
        y = ((x >> (n-25)) % UInt32) & 0x00ff_ffff # keep 1 extra bit
5✔
307
        y = (y+one(UInt32))>>1 # round, ties up (extra leading bit in case of next exponent)
5✔
308
        y &= ~UInt32(trailing_zeros(x) == (n-25)) # fix last bit to round to even
5✔
309
    end
310
    d = ((n+126) % UInt32) << 23
31✔
311
    reinterpret(Float32, s | d + y)
31✔
312
end
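
Both 128-bit methods round to nearest, with ties going to the even significand. A few values just above `2^24`, where `Float32` can no longer represent every integer, show the effect.

```julia
base = UInt128(1) << 24     # 16777216; above this, Float32 spacing is 2

Float32(base + 1)   # halfway between 16777216 and 16777218: rounds to the even significand, 16777216
Float32(base + 2)   # exactly representable
Float32(base + 3)   # halfway between 16777218 and 16777220: rounds to 16777220
```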
313

314
# TODO: optimize
315
Float16(x::UInt128) = convert(Float16, Float64(x))
24✔
316
Float16(x::Int128)  = convert(Float16, Float64(x))
28✔
317

318
Float16(x::Float32) = fptrunc(Float16, x)
6,476,734✔
319
Float16(x::Float64) = fptrunc(Float16, x)
288,497✔
320
Float32(x::Float64) = fptrunc(Float32, x)
457,999,848✔
321

322
Float32(x::Float16) = fpext(Float32, x)
27,509,466✔
323
Float64(x::Float32) = fpext(Float64, x)
478,607,348✔
324
Float64(x::Float16) = fpext(Float64, x)
3,123,282✔
325

326
AbstractFloat(x::Bool)    = Float64(x)
448,143✔
327
AbstractFloat(x::Int8)    = Float64(x)
84✔
328
AbstractFloat(x::Int16)   = Float64(x)
59✔
329
AbstractFloat(x::Int32)   = Float64(x)
67,998✔
330
AbstractFloat(x::Int64)   = Float64(x) # LOSSY
20,699,999✔
331
AbstractFloat(x::Int128)  = Float64(x) # LOSSY
1,327,061✔
332
AbstractFloat(x::UInt8)   = Float64(x)
12,290✔
333
AbstractFloat(x::UInt16)  = Float64(x)
45✔
334
AbstractFloat(x::UInt32)  = Float64(x)
45✔
335
AbstractFloat(x::UInt64)  = Float64(x) # LOSSY
1,596✔
336
AbstractFloat(x::UInt128) = Float64(x) # LOSSY
2,058✔
337

338
Bool(x::Float16) = x==0 ? false : x==1 ? true : throw(InexactError(:Bool, Bool, x))
5✔
339

340
"""
341
    float(x)
342

343
Convert a number or array to a floating point data type.
344

345
See also: [`complex`](@ref), [`oftype`](@ref), [`convert`](@ref).
346

347
# Examples
348
```jldoctest
349
julia> float(1:1000)
350
1.0:1.0:1000.0
351

352
julia> float(typemax(Int32))
353
2.147483647e9
354
```
355
"""
356
float(x) = AbstractFloat(x)
40,594,134✔
357

358
"""
359
    float(T::Type)
360

361
Return an appropriate type to represent a value of type `T` as a floating point value.
362
Equivalent to `typeof(float(zero(T)))`.
363

364
# Examples
365
```jldoctest
366
julia> float(Complex{Int})
367
ComplexF64 (alias for Complex{Float64})
368

369
julia> float(Int)
370
Float64
371
```
372
"""
373
float(::Type{T}) where {T<:Number} = typeof(float(zero(T)))
3,761✔
374
float(::Type{T}) where {T<:AbstractFloat} = T
23,703✔
375
float(::Type{Union{}}, slurp...) = Union{}(0.0)
×
376

377
"""
378
    unsafe_trunc(T, x)
379

380
Return the nearest integral value of type `T` whose absolute value is
381
less than or equal to the absolute value of `x`. If the value is not representable by `T`,
382
an arbitrary value will be returned.
383
See also [`trunc`](@ref).
384

385
# Examples
386
```jldoctest
387
julia> unsafe_trunc(Int, -2.2)
388
-2
389

390
julia> unsafe_trunc(Int, NaN)
391
-9223372036854775808
392
```
393
"""
394
function unsafe_trunc end
395

396
for Ti in (Int8, Int16, Int32, Int64)
397
    @eval begin
398
        unsafe_trunc(::Type{$Ti}, x::IEEEFloat) = fptosi($Ti, x)
46,037,185✔
399
    end
400
end
401
for Ti in (UInt8, UInt16, UInt32, UInt64)
402
    @eval begin
403
        unsafe_trunc(::Type{$Ti}, x::IEEEFloat) = fptoui($Ti, x)
40,875,265✔
404
    end
405
end
406

407
function unsafe_trunc(::Type{UInt128}, x::Float64)
652,444✔
408
    xu = reinterpret(UInt64,x)
652,444✔
409
    k = Int(xu >> 52) & 0x07ff - 1075
652,444✔
410
    xu = (xu & 0x000f_ffff_ffff_ffff) | 0x0010_0000_0000_0000
652,444✔
411
    if k <= 0
652,444✔
412
        UInt128(xu >> -k)
651,407✔
413
    else
414
        UInt128(xu) << k
1,037✔
415
    end
416
end
417
function unsafe_trunc(::Type{Int128}, x::Float64)
651,398✔
418
    copysign(unsafe_trunc(UInt128,x) % Int128, x)
651,422✔
419
end
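
The 128-bit methods decode the float by hand: the exponent field minus 1075 (the bias 1023 plus the 52 significand bits) gives the power of two `k` by which the 53-bit significand, with its implicit leading one restored, must be shifted. A worked instance for `2.0^70` follows; the variable names follow the method above, except `m` and `value`, which are illustrative.

```julia
x  = 2.0^70
xu = reinterpret(UInt64, x)

k = Int((xu >> 52) & 0x07ff) - 1075                         # 1023 + 70 - 1075 = 18
m = (xu & 0x000f_ffff_ffff_ffff) | 0x0010_0000_0000_0000    # significand with implicit bit: 2^52

# Shift right for values below 2^52, left otherwise.
value = k <= 0 ? UInt128(m >> -k) : UInt128(m) << k
value == UInt128(1) << 70    # true
```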
420

421
function unsafe_trunc(::Type{UInt128}, x::Float32)
63✔
422
    xu = reinterpret(UInt32,x)
63✔
423
    k = Int(xu >> 23) & 0x00ff - 150
63✔
424
    xu = (xu & 0x007f_ffff) | 0x0080_0000
63✔
425
    if k <= 0
63✔
426
        UInt128(xu >> -k)
43✔
427
    else
428
        UInt128(xu) << k
20✔
429
    end
430
end
431
function unsafe_trunc(::Type{Int128}, x::Float32)
31✔
432
    copysign(unsafe_trunc(UInt128,x) % Int128, x)
45✔
433
end
434

435
unsafe_trunc(::Type{UInt128}, x::Float16) = unsafe_trunc(UInt128, Float32(x))
18✔
436
unsafe_trunc(::Type{Int128}, x::Float16) = unsafe_trunc(Int128, Float32(x))
17✔
437

438
# matches convert methods
439
# also determines floor, ceil, round
440
trunc(::Type{Signed}, x::IEEEFloat) = trunc(Int,x)
×
441
trunc(::Type{Unsigned}, x::IEEEFloat) = trunc(UInt,x)
×
442
trunc(::Type{Integer}, x::IEEEFloat) = trunc(Int,x)
367✔
443

444
# Bool
445
trunc(::Type{Bool}, x::AbstractFloat) = (-1 < x < 2) ? 1 <= x : throw(InexactError(:trunc, Bool, x))
8✔
446
floor(::Type{Bool}, x::AbstractFloat) = (0 <= x < 2) ? 1 <= x : throw(InexactError(:floor, Bool, x))
6✔
447
ceil(::Type{Bool}, x::AbstractFloat)  = (-1 < x <= 1) ? 0 < x : throw(InexactError(:ceil, Bool, x))
6✔
448
round(::Type{Bool}, x::AbstractFloat) = (-0.5 <= x < 1.5) ? 0.5 < x : throw(InexactError(:round, Bool, x))
8✔
449

450
round(x::IEEEFloat, r::RoundingMode{:ToZero})  = trunc_llvm(x)
24,105,968✔
451
round(x::IEEEFloat, r::RoundingMode{:Down})    = floor_llvm(x)
215,057✔
452
round(x::IEEEFloat, r::RoundingMode{:Up})      = ceil_llvm(x)
690,715✔
453
round(x::IEEEFloat, r::RoundingMode{:Nearest}) = rint_llvm(x)
11,059,338✔
454

455
## floating point promotions ##
456
promote_rule(::Type{Float32}, ::Type{Float16}) = Float32
12,263,228✔
457
promote_rule(::Type{Float64}, ::Type{Float16}) = Float64
×
458
promote_rule(::Type{Float64}, ::Type{Float32}) = Float64
×
459

460
widen(::Type{Float16}) = Float32
11,745,911✔
461
widen(::Type{Float32}) = Float64
9,555,526✔
462

463
## floating point arithmetic ##
464
-(x::IEEEFloat) = neg_float(x)
414,690,317✔
465

466
+(x::T, y::T) where {T<:IEEEFloat} = add_float(x, y)
719,958,853✔
467
-(x::T, y::T) where {T<:IEEEFloat} = sub_float(x, y)
1,246,929,935✔
468
*(x::T, y::T) where {T<:IEEEFloat} = mul_float(x, y)
2,147,483,647✔
469
/(x::T, y::T) where {T<:IEEEFloat} = div_float(x, y)
892,583,313✔
470

471
muladd(x::T, y::T, z::T) where {T<:IEEEFloat} = muladd_float(x, y, z)
819,065,770✔
472

473
# TODO: faster floating point div?
474
# TODO: faster floating point fld?
475
# TODO: faster floating point mod?
476

477
function unbiased_exponent(x::T) where {T<:IEEEFloat}
1,498✔
478
    return (reinterpret(Unsigned, x) & exponent_mask(T)) >> significand_bits(T)
1,078,900✔
479
end
480

481
function explicit_mantissa_noinfnan(x::T) where {T<:IEEEFloat}
1,498✔
482
    m = mantissa(x)
1,078,900✔
483
    issubnormal(x) || (m |= significand_mask(T) + uinttype(T)(1))
2,157,776✔
484
    return m
1,078,900✔
485
end
486

487
function _to_float(number::U, ep) where {U<:Unsigned}
368✔
488
    F = floattype(U)
368✔
489
    S = signed(U)
368✔
490
    epint = unsafe_trunc(S,ep)
523,421✔
491
    lz::signed(U) = unsafe_trunc(S, Core.Intrinsics.ctlz_int(number) - U(exponent_bits(F)))
523,421✔
492
    number <<= lz
523,421✔
493
    epint -= lz
523,421✔
494
    bits = U(0)
368✔
495
    if epint >= 0
523,421✔
496
        bits = number & significand_mask(F)
523,405✔
497
        bits |= ((epint + S(1)) << significand_bits(F)) & exponent_mask(F)
523,405✔
498
    else
499
        bits = (number >> -epint) & significand_mask(F)
16✔
500
    end
501
    return reinterpret(F, bits)
523,421✔
502
end
503

504
@assume_effects :terminates_locally :nothrow function rem_internal(x::T, y::T) where {T<:IEEEFloat}
611,814✔
505
    xuint = reinterpret(Unsigned, x)
611,814✔
506
    yuint = reinterpret(Unsigned, y)
611,814✔
507
    if xuint <= yuint
611,814✔
508
        if xuint < yuint
72,364✔
509
            return x
66,674✔
510
        end
511
        return zero(T)
5,690✔
512
    end
513

514
    e_x = unbiased_exponent(x)
539,450✔
515
    e_y = unbiased_exponent(y)
539,450✔
516
    # Most common case where |y| is "very normal" and |x/y| < 2^EXPONENT_WIDTH
517
    if e_y > (significand_bits(T)) && (e_x - e_y) <= (exponent_bits(T))
539,450✔
518
        m_x = explicit_mantissa_noinfnan(x)
248,470✔
519
        m_y = explicit_mantissa_noinfnan(y)
248,470✔
520
        d = urem_int((m_x << (e_x - e_y)),  m_y)
124,235✔
521
        iszero(d) && return zero(T)
124,235✔
522
        return _to_float(d, e_y - uinttype(T)(1))
108,491✔
523
    end
524
    # Both are subnormals
525
    if e_x == 0 && e_y == 0
415,215✔
526
        return reinterpret(T, urem_int(xuint, yuint) & significand_mask(T))
×
527
    end
528

529
    m_x = explicit_mantissa_noinfnan(x)
830,430✔
530
    e_x -= uinttype(T)(1)
415,215✔
531
    m_y = explicit_mantissa_noinfnan(y)
830,406✔
532
    lz_m_y = uinttype(T)(exponent_bits(T))
44✔
533
    if e_y > 0
415,215✔
534
        e_y -= uinttype(T)(1)
415,191✔
535
    else
536
        m_y = mantissa(y)
24✔
537
        lz_m_y = Core.Intrinsics.ctlz_int(m_y)
24✔
538
    end
539

540
    tz_m_y = Core.Intrinsics.cttz_int(m_y)
415,215✔
541
    sides_zeroes_cnt = lz_m_y + tz_m_y
415,215✔
542

543
    # n>0
544
    exp_diff = e_x - e_y
415,215✔
545
    # Shift hy right until the end or n = 0
546
    right_shift = min(exp_diff, tz_m_y)
415,215✔
547
    m_y >>= right_shift
415,215✔
548
    exp_diff -= right_shift
415,215✔
549
    e_y += right_shift
415,215✔
550
    # Shift hx left until the end or n = 0
551
    left_shift = min(exp_diff, uinttype(T)(exponent_bits(T)))
415,215✔
552
    m_x <<= left_shift
415,215✔
553
    exp_diff -= left_shift
415,215✔
554

555
    m_x = urem_int(m_x, m_y)
415,215✔
556
    iszero(m_x) && return zero(T)
415,215✔
557
    iszero(exp_diff) && return _to_float(m_x, e_y)
414,930✔
558

559
    while exp_diff > sides_zeroes_cnt
402,804✔
560
        exp_diff -= sides_zeroes_cnt
1,215✔
561
        m_x <<= sides_zeroes_cnt
1,215✔
562
        m_x = urem_int(m_x, m_y)
1,215✔
563
    end
1,215✔
564
    m_x <<= exp_diff
401,589✔
565
    m_x = urem_int(m_x, m_y)
401,589✔
566
    return _to_float(m_x, e_y)
401,589✔
567
end
568

569
function rem(x::T, y::T) where {T<:IEEEFloat}
21,411✔
570
    if isfinite(x) && !iszero(x) && isfinite(y) && !iszero(y)
620,764✔
571
        return copysign(rem_internal(abs(x), abs(y)), x)
611,824✔
572
    elseif isinf(x) || isnan(y) || iszero(y)  # y can still be Inf
17,867✔
573
        return T(NaN)
41✔
574
    else
575
        return x
8,899✔
576
    end
577
end
578

579
function mod(x::T, y::T) where {T<:AbstractFloat}
54,670✔
580
    r = rem(x,y)
94,594✔
581
    if r == 0
90,197✔
582
        copysign(r,y)
15,387✔
583
    elseif (r > 0) ⊻ (y > 0)
74,810✔
584
        r+y
26,538✔
585
    else
586
        r
48,272✔
587
    end
588
end
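
The two functions differ only in sign convention: the result of `rem` takes the sign of the dividend `x`, while the result of `mod` takes the sign of the divisor `y`. A few spot checks:

```julia
rem(-7.5, 2.0)    # -1.5  (sign follows x)
mod(-7.5, 2.0)    #  0.5  (sign follows y)
mod(7.5, -2.0)    # -0.5

rem(5.0, Inf)     #  5.0  (finite x with infinite y is returned unchanged)
rem(Inf, 2.0)     #  NaN
rem(1.0, 0.0)     #  NaN
```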
589

590
## floating point comparisons ##
591
==(x::T, y::T) where {T<:IEEEFloat} = eq_float(x, y)
435,653,819✔
592
!=(x::T, y::T) where {T<:IEEEFloat} = ne_float(x, y)
2,147,483,647✔
593
<( x::T, y::T) where {T<:IEEEFloat} = lt_float(x, y)
180,311,247✔
594
<=(x::T, y::T) where {T<:IEEEFloat} = le_float(x, y)
143,146,005✔
595

596
isequal(x::T, y::T) where {T<:IEEEFloat} = fpiseq(x, y)
3,224,170✔
597

598
# interpret as sign-magnitude integer
599
@inline function _fpint(x)
9,539✔
600
    IntT = inttype(typeof(x))
9,496✔
601
    ix = reinterpret(IntT, x)
1,218,567✔
602
    return ifelse(ix < zero(IntT), ix ⊻ typemax(IntT), ix)
1,218,567✔
603
end
604

605
@inline function isless(a::T, b::T) where T<:IEEEFloat
92,898✔
606
    (isnan(a) || isnan(b)) && return !isnan(a)
1,448,176✔
607

608
    return _fpint(a) < _fpint(b)
723,866✔
609
end
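
`_fpint` maps each float to an integer whose signed order agrees with the numeric order, while also placing `-0.0` before `0.0`, which is what lets `isless` define a total order. The standalone sketch below reproduces the mapping for `Float64`; `signmag_key` is a hypothetical name.

```julia
# Hypothetical re-implementation of the sign-magnitude key for Float64.
function signmag_key(x::Float64)
    ix = reinterpret(Int64, x)
    # Negative floats have the sign bit set; flipping the other 63 bits reverses their order.
    return ifelse(ix < 0, ix ⊻ typemax(Int64), ix)
end

xs = [-Inf, -2.0, -1.0, -0.0, 0.0, 1.0, 2.0, Inf]
issorted(map(signmag_key, xs))    # true

isless(-0.0, 0.0)   # true, even though -0.0 == 0.0
isless(1.0, NaN)    # true: NaN sorts after every other value
isless(NaN, 1.0)    # false
```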
610

611
# Exact Float (Tf) vs Integer (Ti) comparisons
612
# Assumes:
613
# - typemax(Ti) == 2^n-1
614
# - typemax(Ti) can't be exactly represented by Tf:
615
#   => Tf(typemax(Ti)) == 2^n or Inf
616
# - typemin(Ti) can be exactly represented by Tf
617
#
618
# 1. convert y::Ti to float fy::Tf
619
# 2. perform Tf comparison x vs fy
620
# 3. if x == fy, check if (1) resulted in rounding:
621
#  a. convert fy back to Ti and compare with original y
622
#  b. unsafe_convert undefined behaviour if fy == Tf(typemax(Ti))
623
#     (but consequently x == fy > y)
624
for Ti in (Int64,UInt64,Int128,UInt128)
625
    for Tf in (Float32,Float64)
626
        @eval begin
627
            function ==(x::$Tf, y::$Ti)
5,660,269✔
628
                fy = ($Tf)(y)
6,173,312✔
629
                (x == fy) & (fy != $(Tf(typemax(Ti)))) & (y == unsafe_trunc($Ti,fy))
8,644,745✔
630
            end
631
            ==(y::$Ti, x::$Tf) = x==y
290,755✔
632

633
            function <(x::$Ti, y::$Tf)
40,810,661✔
634
                fx = ($Tf)(x)
40,812,697✔
635
                (fx < y) | ((fx == y) & ((fx == $(Tf(typemax(Ti)))) | (x < unsafe_trunc($Ti,fx)) ))
40,961,307✔
636
            end
637
            function <=(x::$Ti, y::$Tf)
20,306✔
638
                fx = ($Tf)(x)
211,368✔
639
                (fx < y) | ((fx == y) & ((fx == $(Tf(typemax(Ti)))) | (x <= unsafe_trunc($Ti,fx)) ))
783,285✔
640
            end
641

642
            function <(x::$Tf, y::$Ti)
630,716✔
643
                fy = ($Tf)(y)
817,037✔
644
                (x < fy) | ((x == fy) & (fy < $(Tf(typemax(Ti)))) & (unsafe_trunc($Ti,fy) < y))
1,090,727✔
645
            end
646
            function <=(x::$Tf, y::$Ti)
22,251✔
647
                fy = ($Tf)(y)
22,251✔
648
                (x < fy) | ((x == fy) & (fy < $(Tf(typemax(Ti)))) & (unsafe_trunc($Ti,fy) <= y))
22,872✔
649
            end
650
        end
651
    end
652
end
653
for op in (:(==), :<, :<=)
654
    @eval begin
655
        ($op)(x::Float16, y::Union{Int128,UInt128,Int64,UInt64}) = ($op)(Float64(x), Float64(y))
2,166,172✔
656
        ($op)(x::Union{Int128,UInt128,Int64,UInt64}, y::Float16) = ($op)(Float64(x), Float64(y))
16,656✔
657

658
        ($op)(x::Union{Float16,Float32}, y::Union{Int32,UInt32}) = ($op)(Float64(x), Float64(y))
246,181✔
659
        ($op)(x::Union{Int32,UInt32}, y::Union{Float16,Float32}) = ($op)(Float64(x), Float64(y))
78✔
660

661
        ($op)(x::Float16, y::Union{Int16,UInt16}) = ($op)(Float32(x), Float32(y))
10✔
662
        ($op)(x::Union{Int16,UInt16}, y::Float16) = ($op)(Float32(x), Float32(y))
12✔
663
    end
664
end
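
These methods make mixed float/integer comparisons exact rather than comparing after a lossy conversion. The checks below use `2^53 + 1`, the first positive integer that `Float64` cannot represent.

```julia
n = Int64(2)^53 + 1          # 9007199254740993
Float64(n)                   # rounds to 9.007199254740992e15, i.e. 2^53

Float64(n) == n              # false: the comparison detects the rounding
2.0^53 < n                   # true
n <= 2.0^53                  # false

# typemax(Int64) rounds up to 2^63 when converted, so the two are not equal.
Float64(typemax(Int64)) == typemax(Int64)    # false
```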
665

666

667
abs(x::IEEEFloat) = abs_float(x)
185,897,637✔
668

669
"""
670
    isnan(f) -> Bool
671

672
Test whether a number value is a NaN, an indeterminate value which is neither an infinity
673
nor a finite number ("not a number").
674

675
See also: [`iszero`](@ref), [`isone`](@ref), [`isinf`](@ref), [`ismissing`](@ref).
676
"""
677
isnan(x::AbstractFloat) = (x != x)::Bool
2,147,483,647✔
678
isnan(x::Number) = false
88,831✔
679

680
isfinite(x::AbstractFloat) = !isnan(x - x)
1,017,713,026✔
681
isfinite(x::Real) = decompose(x)[3] != 0
69,253✔
682
isfinite(x::Integer) = true
122,388✔
683

684
"""
685
    isinf(f) -> Bool
686

687
Test whether a number is infinite.
688

689
See also: [`Inf`](@ref), [`iszero`](@ref), [`isfinite`](@ref), [`isnan`](@ref).
690
"""
691
isinf(x::Real) = !isnan(x) & !isfinite(x)
177,883✔
692
isinf(x::IEEEFloat) = abs(x) === oftype(x, Inf)
36,695,924✔
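
The `isfinite` definition above relies on `x - x` being zero for every finite value and `NaN` for infinities and NaNs, so a single subtraction followed by a self-comparison covers all cases:

```julia
for x in (1.5, -0.0, floatmax(Float64), Inf, -Inf, NaN)
    # x - x is NaN exactly when x is infinite or NaN
    println(x, ":  x - x = ", x - x, "  isfinite = ", isfinite(x))
end
```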
693

694
const hx_NaN = hash_uint64(reinterpret(UInt64, NaN))
695
function hash(x::Float64, h::UInt)
82,254✔
696
    # see comments on trunc and hash(Real, UInt)
697
    if typemin(Int64) <= x < typemax(Int64)
82,254✔
698
        xi = fptosi(Int64, x)
82,102✔
699
        if isequal(xi, x)
82,102✔
700
            return hash(xi, h)
32,149✔
701
        end
702
    elseif typemin(UInt64) <= x < typemax(UInt64)
152✔
703
        xu = fptoui(UInt64, x)
94✔
704
        if isequal(xu, x)
94✔
705
            return hash(xu, h)
94✔
706
        end
707
    elseif isnan(x)
58✔
708
        return hx_NaN ⊻ h # NaN does not have a stable bit pattern
51✔
709
    end
710
    return hash_uint64(bitcast(UInt64, x)) - 3h
49,960✔
711
end
712

713
hash(x::Float32, h::UInt) = hash(Float64(x), h)
4,150✔
714

715
function hash(x::Float16, h::UInt)
7✔
716
    # see comments on trunc and hash(Real, UInt)
717
    if isfinite(x) # all finite Float16 fit in Int64
7✔
718
        xi = fptosi(Int64, x)
7✔
719
        if isequal(xi, x)
7✔
720
            return hash(xi, h)
7✔
721
        end
722
    elseif isnan(x)
×
723
        return hx_NaN ⊻ h # NaN does not have a stable bit pattern
×
724
    end
725
    return hash_uint64(bitcast(UInt64, Float64(x))) - 3h
×
726
end
727

728
## generic hashing for rational values ##
729
function hash(x::Real, h::UInt)
8,442✔
730
    # decompose x as num*2^pow/den
731
    num, pow, den = decompose(x)
8,574✔
732

733
    # handle special values
734
    num == 0 && den == 0 && return hash(NaN, h)
8,442✔
735
    num == 0 && return hash(ifelse(den > 0, 0.0, -0.0), h)
8,442✔
736
    den == 0 && return hash(ifelse(num > 0, Inf, -Inf), h)
4,180✔
737

738
    # normalize decomposition
739
    if den < 0
4,180✔
740
        num = -num
832✔
741
        den = -den
832✔
742
    end
743
    num_z = trailing_zeros(num)
4,180✔
744
    num >>= num_z
6,518✔
745
    den_z = trailing_zeros(den)
4,180✔
746
    den >>= den_z
4,183✔
747
    pow += num_z - den_z
4,180✔
748
    # If the real can be represented as an Int64, UInt64, or Float64, hash as those types.
749
    # To be an Integer the denominator must be 1 and the power must be non-negative.
750
    if den == 1
4,180✔
751
        # left = ceil(log2(num*2^pow))
752
        left = top_set_bit(abs(num)) + pow
7,318✔
753
        # 2^-1074 is the minimum Float64 so if the power is smaller, not a Float64
754
        if -1074 <= pow
4,177✔
755
            if 0 <= pow # if pow is non-negative, it is an integer
4,177✔
756
                left <= 63 && return hash(Int64(num) << Int(pow), h)
4,174✔
757
                left <= 64 && !signbit(num) && return hash(UInt64(num) << Int(pow), h)
176✔
758
            end # typemin(Int64) handled by Float64 case
759
            # 2^1024 is the maximum Float64 so if the power is greater, not a Float64
760
            # Float64s only have 53 mantissa bits (including implicit bit)
761
            left <= 1024 && left - pow <= 53 && return hash(ldexp(Float64(num), pow), h)
111✔
762
        end
763
    else
764
        h = hash_integer(den, h)
3✔
765
    end
766
    # handle generic rational values
767
    h = hash_integer(pow, h)
5✔
768
    h = hash_integer(num, h)
5✔
769
    return h
5✔
770
end
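
The purpose of routing all reals through one decomposition is that values which are `isequal` across types hash identically, so they collide as intended in a `Dict` or `Set`:

```julia
hash(1) == hash(1.0) == hash(1//1) == hash(big(1))    # true
hash(0.5) == hash(1//2)                                # true
hash(2.0^70) == hash(big(2)^70)                        # true

d = Dict(1.0 => "one")
d[1]        # "one": the integer key finds the float entry
```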
771

772
#=
773
`decompose(x)`: non-canonical decomposition of rational values as `num*2^pow/den`.
774

775
The decompose function is the point where rational-valued numeric types that support
776
hashing hook into the hashing protocol. `decompose(x)` should return three integer
777
values `num, pow, den`, such that the value of `x` is mathematically equal to
778

779
    num*2^pow/den
780

781
The decomposition need not be canonical in the sense that it just needs to be *some*
782
way to express `x` in this form, not any particular way – with the restriction that
783
`num` and `den` may not share any odd common factors. They may, however, have powers
784
of two in common – the generic hashing code will normalize those as necessary.
785

786
Special values:
787

788
 - `x` is zero: `num` should be zero and `den` should have the same sign as `x`
789
 - `x` is infinite: `den` should be zero and `num` should have the same sign as `x`
790
 - `x` is not a number: `num` and `den` should both be zero
791
=#
792

793
decompose(x::Integer) = x, 0, 1
4,606✔
794

795
function decompose(x::Float16)::NTuple{3,Int}
×
796
    isnan(x) && return 0, 0, 0
×
797
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
×
798
    n = reinterpret(UInt16, x)
×
799
    s = (n & 0x03ff) % Int16
×
800
    e = ((n & 0x7c00) >> 10) % Int
×
801
    s |= Int16(e != 0) << 10
×
802
    d = ifelse(signbit(x), -1, 1)
×
803
    s, e - 25 + (e == 0), d
×
804
end
805

806
function decompose(x::Float32)::NTuple{3,Int}
74✔
807
    isnan(x) && return 0, 0, 0
74✔
808
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
74✔
809
    n = reinterpret(UInt32, x)
66✔
810
    s = (n & 0x007fffff) % Int32
66✔
811
    e = ((n & 0x7f800000) >> 23) % Int
66✔
812
    s |= Int32(e != 0) << 23
66✔
813
    d = ifelse(signbit(x), -1, 1)
66✔
814
    s, e - 150 + (e == 0), d
66✔
815
end
816

817
function decompose(x::Float64)::Tuple{Int64, Int, Int}
18,730✔
818
    isnan(x) && return 0, 0, 0
18,730✔
819
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
18,730✔
820
    n = reinterpret(UInt64, x)
18,723✔
821
    s = (n & 0x000fffffffffffff) % Int64
18,723✔
822
    e = ((n & 0x7ff0000000000000) >> 52) % Int
18,723✔
823
    s |= Int64(e != 0) << 52
18,723✔
824
    d = ifelse(signbit(x), -1, 1)
18,723✔
825
    s, e - 1075 + (e == 0), d
18,723✔
826
end
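
As a rough check of the decomposition contract described in the comment above, the following sketch rebuilds each value from `num, pow, den`. `Base.decompose` is internal and unexported, so calling it directly is an assumption about the current internals; `recompose` is a hypothetical helper introduced only for this illustration.

```julia
# Hypothetical helper: evaluate num * 2^pow / den exactly in BigFloat arithmetic.
recompose(num, pow, den) = big(num) * big(2.0)^pow / den

# Finite values satisfy x == num * 2^pow / den.
for x in (1.5, -0.75, 3.0f0, Float16(0.1), 10)
    num, pow, den = Base.decompose(x)
    @assert recompose(num, pow, den) == x
end

# Special values follow the sign conventions listed in the comment.
@assert Base.decompose(Inf) == (1, 0, 0)     # den == 0, num carries the sign
@assert Base.decompose(-Inf) == (-1, 0, 0)
@assert Base.decompose(NaN) == (0, 0, 0)     # num and den both zero
num, _, den = Base.decompose(-0.0)
@assert num == 0 && den < 0                  # zero: den carries the sign
```

For the floating-point methods the sign lives in `den` (which is `1` or `-1`), which is why negative zero remains distinguishable from positive zero.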
827

828

829
"""
830
    precision(num::AbstractFloat; base::Integer=2)
831
    precision(T::Type; base::Integer=2)
832

833
Get the precision of a floating point number, as defined by the effective number of bits in
834
the significand, or the precision of a floating-point type `T` (its current default, if
835
`T` is a variable-precision type like [`BigFloat`](@ref)).
836

837
If `base` is specified, then it returns the maximum corresponding
838
number of significand digits in that base.
839

840
!!! compat "Julia 1.8"
841
    The `base` keyword requires at least Julia 1.8.
842
"""
843
function precision end
844

845
_precision(::Type{Float16}) = 11
×
846
_precision(::Type{Float32}) = 24
×
847
_precision(::Type{Float64}) = 53
×
848
function _precision(x, base::Integer=2)
36,634,036✔
849
    base > 1 || throw(DomainError(base, "`base` cannot be less than 2."))
36,634,040✔
850
    p = _precision(x)
73,233,468✔
851
    return base == 2 ? Int(p) : floor(Int, p / log2(base))
36,634,041✔
852
end
853
precision(::Type{T}; base::Integer=2) where {T<:AbstractFloat} = _precision(T, base)
73,198,950✔
854
precision(::T; base::Integer=2) where {T<:AbstractFloat} = precision(T; base)
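
A short sketch of how `precision` behaves for the IEEE types, assuming Julia 1.8 or later for the `base` keyword. `throws_domain_error` is a hypothetical helper used only here.

```julia
# Significand widths in bits, including the implicit leading bit.
@assert precision(Float16) == 11
@assert precision(Float32) == 24
@assert precision(Float64) == 53

# For other bases the result is floor(p / log2(base)).
@assert precision(1.0; base=10) == 15      # floor(53 / 3.3219...)
@assert precision(Float32; base=10) == 7   # floor(24 / 3.3219...)
@assert precision(Float64; base=16) == 13  # floor(53 / 4)

# Hypothetical helper: report whether calling `f` raises a DomainError.
function throws_domain_error(f)
    try
        f()
        return false
    catch err
        return err isa DomainError
    end
end

# A base below 2 is rejected.
@assert throws_domain_error(() -> precision(Float64; base=1))
```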
134✔
855

856

857
"""
858
    nextfloat(x::AbstractFloat, n::Integer)
859

860
The result of `n` iterative applications of `nextfloat` to `x` if `n >= 0`, or `-n`
861
applications of [`prevfloat`](@ref) if `n < 0`.
862
"""
863
function nextfloat(f::IEEEFloat, d::Integer)
601,256,836✔
864
    F = typeof(f)
601,125,406✔
865
    fumax = reinterpret(Unsigned, F(Inf))
601,125,406✔
866
    U = typeof(fumax)
601,125,406✔
867

868
    isnan(f) && return f
1,201,384,987✔
869
    fi = reinterpret(Signed, f)
1,201,384,985✔
870
    fneg = fi < 0
1,201,384,985✔
871
    fu = unsigned(fi & typemax(fi))
1,201,384,985✔
872

873
    dneg = d < 0
601,125,607✔
874
    da = uabs(d)
601,125,608✔
875
    if da > typemax(U)
1,201,384,985✔
876
        fneg = dneg
4✔
877
        fu = fumax
4✔
878
    else
879
        du = da % U
601,125,401✔
880
        if fneg ⊻ dneg
1,201,384,981✔
881
            if du > fu
764,636✔
882
                fu = min(fumax, du - fu)
101✔
883
                fneg = !fneg
101✔
884
            else
885
                fu = fu - du
1,398,317✔
886
            end
887
        else
888
            if fumax - fu < du
1,200,620,345✔
889
                fu = fumax
42✔
890
            else
891
                fu = fu + du
1,200,620,303✔
892
            end
893
        end
894
    end
895
    if fneg
1,201,384,985✔
896
        fu |= sign_mask(F)
536,892✔
897
    end
898
    reinterpret(F, fu)
1,201,384,985✔
899
end
900

901
"""
902
    nextfloat(x::AbstractFloat)
903

904
Return the smallest floating point number `y` of the same type as `x` such that `x < y`. If no
905
such `y` exists (e.g. if `x` is `Inf` or `NaN`), then return `x`.
906

907
See also: [`prevfloat`](@ref), [`eps`](@ref), [`issubnormal`](@ref).
908
"""
909
nextfloat(x::AbstractFloat) = nextfloat(x,1)
1,200,996,522✔
910

911
"""
912
    prevfloat(x::AbstractFloat, n::Integer)
913

914
The result of `n` iterative applications of `prevfloat` to `x` if `n >= 0`, or `-n`
915
applications of [`nextfloat`](@ref) if `n < 0`.
916
"""
917
prevfloat(x::AbstractFloat, d::Integer) = nextfloat(x, -d)
9✔
918

919
"""
920
    prevfloat(x::AbstractFloat)
921

922
Return the largest floating point number `y` of the same type as `x` such that `y < x`. If no
923
such `y` exists (e.g. if `x` is `-Inf` or `NaN`), then return `x`.
924
"""
925
prevfloat(x::AbstractFloat) = nextfloat(x,-1)
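
The stepping and saturation behavior of `nextfloat` and `prevfloat` can be illustrated with a few checks. This is a sketch using `Float64` values chosen for illustration, not an exhaustive specification.

```julia
x = 1.0

# The two-argument forms are repeated single steps; a negative count reverses direction.
@assert nextfloat(x, 3) == nextfloat(nextfloat(nextfloat(x)))
@assert prevfloat(x, 2) == prevfloat(prevfloat(x))
@assert nextfloat(x, -2) == prevfloat(x, 2)

# One step up from 1.0 is exactly one ulp.
@assert nextfloat(x) - x == eps(x)

# Stepping saturates at the infinities, and NaN is returned unchanged.
@assert nextfloat(floatmax(Float64)) == Inf
@assert nextfloat(Inf) == Inf
@assert prevfloat(-Inf) == -Inf
@assert isnan(nextfloat(NaN))

# Around zero the neighbors are the smallest subnormals.
@assert nextfloat(0.0) == 5.0e-324
@assert prevfloat(0.0) == -5.0e-324
```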
1,087,758✔
926

927
for Ti in (Int8, Int16, Int32, Int64, Int128, UInt8, UInt16, UInt32, UInt64, UInt128)
928
    for Tf in (Float16, Float32, Float64)
929
        if Ti <: Unsigned || sizeof(Ti) < sizeof(Tf)
930
            # Here `Tf(typemin(Ti))-1` is exact, so we can compare the lower-bound
931
            # directly. `Tf(typemax(Ti))+1` is either always exactly representable, or
932
            # rounded to `Inf` (e.g. when `Ti==UInt128 && Tf==Float32`).
933
            @eval begin
934
                function trunc(::Type{$Ti},x::$Tf)
9,884✔
935
                    if $(Tf(typemin(Ti))-one(Tf)) < x < $(Tf(typemax(Ti))+one(Tf))
418,785✔
936
                        return unsafe_trunc($Ti,x)
418,775✔
937
                    else
938
                        throw(InexactError(:trunc, $Ti, x))
16✔
939
                    end
940
                end
941
                function (::Type{$Ti})(x::$Tf)
3,881✔
942
                    # When typemax(Ti) is not representable by Tf but typemax(Ti) + 1 is,
943
                    # then < Tf(typemax(Ti) + 1) is stricter than <= Tf(typemax(Ti)). Using
944
                    # the former causes us to throw on UInt64(Float64(typemax(UInt64))+1)
945
                    if ($(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti))+one(Tf))) && isinteger(x)
3,888✔
946
                        return unsafe_trunc($Ti,x)
4,585✔
947
                    else
948
                        throw(InexactError($(Expr(:quote,Ti.name.name)), $Ti, x))
304✔
949
                    end
950
                end
951
            end
952
        else
953
            # Here `eps(Tf(typemin(Ti))) > 1`, so the only value which can be truncated to
954
            # `Tf(typemin(Ti))` is itself. Similarly, `Tf(typemax(Ti))` is inexact and will
955
            # be rounded up. This assumes that `Tf(typemin(Ti)) > -Inf`, which is true for
956
            # these types, but not for `Float16` or larger integer types.
957
            @eval begin
958
                function trunc(::Type{$Ti},x::$Tf)
24,219,211✔
959
                    if $(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti)))
32,753,884✔
960
                        return unsafe_trunc($Ti,x)
32,753,890✔
961
                    else
962
                        throw(InexactError(:trunc, $Ti, x))
8✔
963
                    end
964
                end
965
                function (::Type{$Ti})(x::$Tf)
23,212,720✔
966
                    if ($(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti)))) && isinteger(x)
23,943,017✔
967
                        return unsafe_trunc($Ti,x)
23,942,855✔
968
                    else
969
                        throw(InexactError($(Expr(:quote,Ti.name.name)), $Ti, x))
151✔
970
                    end
971
                end
972
            end
973
        end
974
    end
975
end
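
The range checks generated by the loop above can be observed from the outside. The sketch below assumes only public behavior; `throws_inexact` is a hypothetical helper introduced for this example.

```julia
# Hypothetical helper: report whether calling `f` raises an InexactError.
function throws_inexact(f)
    try
        f()
        return false
    catch err
        return err isa InexactError
    end
end

# Narrow integer, wide float: the open interval (typemin - 1, typemax + 1) truncates.
@assert trunc(Int8, 127.9) == 127
@assert trunc(Int8, -128.9) == -128
@assert throws_inexact(() -> trunc(Int8, 128.0))

# Constructors additionally require an integer-valued argument.
@assert Int64(3.0) == 3
@assert UInt8(255.0) == 0xff
@assert throws_inexact(() -> Int64(3.5))
@assert throws_inexact(() -> UInt8(-1.0))

# Same-size case: 2.0^63 equals Float64(typemax(Int64)) after rounding, so it is rejected,
# while -2.0^63 is exactly typemin(Int64) and is accepted.
@assert throws_inexact(() -> Int64(2.0^63))
@assert Int64(-2.0^63) == typemin(Int64)
```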
976

977
"""
978
    issubnormal(f) -> Bool
979

980
Test whether a floating point number is subnormal.
981

982
An IEEE floating point number is [subnormal](https://en.wikipedia.org/wiki/Subnormal_number)
983
when its exponent bits are zero and its significand is not zero.
984

985
# Examples
986
```jldoctest
987
julia> floatmin(Float32)
988
1.1754944f-38
989

990
julia> issubnormal(1.0f-37)
991
false
992

993
julia> issubnormal(1.0f-38)
994
true
995
```
996
"""
997
function issubnormal(x::T) where {T<:IEEEFloat}
3,332,180✔
998
    y = reinterpret(Unsigned, x)
4,932,131✔
999
    (y & exponent_mask(T) == 0) & (y & significand_mask(T) != 0)
4,932,131✔
1000
end
1001

1002
ispow2(x::AbstractFloat) = !iszero(x) && frexp(x)[1] == 0.5
42✔
1003
iseven(x::AbstractFloat) = isinteger(x) && (abs(x) > maxintfloat(x) || iseven(Integer(x)))
52✔
1004
isodd(x::AbstractFloat) = isinteger(x) && abs(x) ≤ maxintfloat(x) && isodd(Integer(x))
28✔
1005

1006
@eval begin
1007
    typemin(::Type{Float16}) = $(bitcast(Float16, 0xfc00))
20✔
1008
    typemax(::Type{Float16}) = $(Inf16)
21✔
1009
    typemin(::Type{Float32}) = $(-Inf32)
20✔
1010
    typemax(::Type{Float32}) = $(Inf32)
247✔
1011
    typemin(::Type{Float64}) = $(-Inf64)
55✔
1012
    typemax(::Type{Float64}) = $(Inf64)
1,299✔
1013
    typemin(x::T) where {T<:Real} = typemin(T)
×
1014
    typemax(x::T) where {T<:Real} = typemax(T)
601,165,157✔
1015

1016
    floatmin(::Type{Float16}) = $(bitcast(Float16, 0x0400))
493,415✔
1017
    floatmin(::Type{Float32}) = $(bitcast(Float32, 0x00800000))
161,573✔
1018
    floatmin(::Type{Float64}) = $(bitcast(Float64, 0x0010000000000000))
×
1019
    floatmax(::Type{Float16}) = $(bitcast(Float16, 0x7bff))
30,841✔
1020
    floatmax(::Type{Float32}) = $(bitcast(Float32, 0x7f7fffff))
130,466✔
1021
    floatmax(::Type{Float64}) = $(bitcast(Float64, 0x7fefffffffffffff))
1,021,529✔
1022

1023
    eps(x::AbstractFloat) = isfinite(x) ? abs(x) >= floatmin(x) ? ldexp(eps(typeof(x)), exponent(x)) : nextfloat(zero(x)) : oftype(x, NaN)
1,374,800✔
1024
    eps(::Type{Float16}) = $(bitcast(Float16, 0x1400))
399,701✔
1025
    eps(::Type{Float32}) = $(bitcast(Float32, 0x34000000))
621,382✔
1026
    eps(::Type{Float64}) = $(bitcast(Float64, 0x3cb0000000000000))
1✔
1027
    eps() = eps(Float64)
547✔
1028
end
1029

1030
"""
1031
    floatmin(T = Float64)
1032

1033
Return the smallest positive normal number representable by the floating-point
1034
type `T`.
1035

1036
# Examples
1037
```jldoctest
1038
julia> floatmin(Float16)
1039
Float16(6.104e-5)
1040

1041
julia> floatmin(Float32)
1042
1.1754944f-38
1043

1044
julia> floatmin()
1045
2.2250738585072014e-308
1046
```
1047
"""
1048
floatmin(x::T) where {T<:AbstractFloat} = floatmin(T)
619,836✔
1049

1050
"""
1051
    floatmax(T = Float64)
1052

1053
Return the largest finite number representable by the floating-point type `T`.
1054

1055
See also: [`typemax`](@ref), [`floatmin`](@ref), [`eps`](@ref).
1056

1057
# Examples
1058
```jldoctest
1059
julia> floatmax(Float16)
1060
Float16(6.55e4)
1061

1062
julia> floatmax(Float32)
1063
3.4028235f38
1064

1065
julia> floatmax()
1066
1.7976931348623157e308
1067

1068
julia> typemax(Float64)
1069
Inf
1070
```
1071
"""
1072
floatmax(x::T) where {T<:AbstractFloat} = floatmax(T)
274,514✔
1073

1074
floatmin() = floatmin(Float64)
16✔
1075
floatmax() = floatmax(Float64)
19✔
1076

1077
"""
1078
    eps(::Type{T}) where T<:AbstractFloat
1079
    eps()
1080

1081
Return the *machine epsilon* of the floating point type `T` (`T = Float64` by
1082
default). This is defined as the gap between 1 and the next larger value representable by
1083
`typeof(one(T))`, and is equivalent to `eps(one(T))`.  (Since `eps(T)` is a
1084
bound on the *relative error* of `T`, it is a "dimensionless" quantity like [`one`](@ref).)
1085

1086
# Examples
1087
```jldoctest
1088
julia> eps()
1089
2.220446049250313e-16
1090

1091
julia> eps(Float32)
1092
1.1920929f-7
1093

1094
julia> 1.0 + eps()
1095
1.0000000000000002
1096

1097
julia> 1.0 + eps()/2
1098
1.0
1099
```
1100
"""
1101
eps(::Type{<:AbstractFloat})
1102

1103
"""
1104
    eps(x::AbstractFloat)
1105

1106
Return the *unit in last place* (ulp) of `x`. This is the distance between consecutive
1107
representable floating point values at `x`. In most cases, if the distance on either side
1108
of `x` is different, then the larger of the two is taken, that is
1109

1110
    eps(x) == max(x-prevfloat(x), nextfloat(x)-x)
1111

1112
The exceptions to this rule are the smallest and largest finite values
1113
(e.g. `nextfloat(-Inf)` and `prevfloat(Inf)` for [`Float64`](@ref)), which round to the
1114
smaller of the values.
1115

1116
The rationale for this behavior is that `eps` bounds the floating point rounding
1117
error. Under the default `RoundNearest` rounding mode, if ``y`` is a real number and ``x``
1118
is the nearest floating point number to ``y``, then
1119

1120
```math
1121
|y-x| \\leq \\operatorname{eps}(x)/2.
1122
```
1123

1124
See also: [`nextfloat`](@ref), [`issubnormal`](@ref), [`floatmax`](@ref).
1125

1126
# Examples
1127
```jldoctest
1128
julia> eps(1.0)
1129
2.220446049250313e-16
1130

1131
julia> eps(prevfloat(2.0))
1132
2.220446049250313e-16
1133

1134
julia> eps(2.0)
1135
4.440892098500626e-16
1136

1137
julia> x = prevfloat(Inf)      # largest finite Float64
1138
1.7976931348623157e308
1139

1140
julia> x + eps(x)/2            # rounds up
1141
Inf
1142

1143
julia> x + prevfloat(eps(x)/2) # rounds down
1144
1.7976931348623157e308
1145
```
1146
"""
1147
eps(::AbstractFloat)
1148

1149

1150
## byte order swaps for arbitrary-endianness serialization/deserialization ##
1151
bswap(x::IEEEFloat) = bswap_int(x)
7✔
1152

1153
# integer type with the same bit width as the float type
1154
uinttype(::Type{Float64}) = UInt64
×
1155
uinttype(::Type{Float32}) = UInt32
×
1156
uinttype(::Type{Float16}) = UInt16
×
1157
inttype(::Type{Float64}) = Int64
×
1158
inttype(::Type{Float32}) = Int32
9,398✔
1159
inttype(::Type{Float16}) = Int16
98✔
1160
# float type with the same bit width as the integer type
1161
floattype(::Type{UInt64}) = Float64
×
1162
floattype(::Type{UInt32}) = Float32
361✔
1163
floattype(::Type{UInt16}) = Float16
7✔
1164
floattype(::Type{Int64}) = Float64
×
1165
floattype(::Type{Int32}) = Float32
×
1166
floattype(::Type{Int16}) = Float16
×
1167

1168

1169
## Array operations on floating point numbers ##
1170

1171
float(A::AbstractArray{<:AbstractFloat}) = A
×
1172

1173
function float(A::AbstractArray{T}) where T
21✔
1174
    if !isconcretetype(T)
21✔
1175
        error("`float` not defined on abstractly-typed arrays; please convert to a more specific type")
×
1176
    end
1177
    convert(AbstractArray{typeof(float(zero(T)))}, A)
21✔
1178
end
1179

1180
float(r::StepRange) = float(r.start):float(r.step):float(last(r))
1✔
1181
float(r::UnitRange) = float(r.start):float(last(r))
1✔
1182
float(r::StepRangeLen{T}) where {T} =
4✔
1183
    StepRangeLen{typeof(float(T(r.ref)))}(float(r.ref), float(r.step), length(r), r.offset)
1184
function float(r::LinRange)
×
1185
    LinRange(float(r.start), float(r.stop), length(r))
×
1186
end
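
A brief sketch of how the array and range methods of `float` behave, using small illustrative inputs:

```julia
# Integer arrays are converted to the corresponding floating-point element type.
A = [1, 2, 3]
F = float(A)
@assert F == [1.0, 2.0, 3.0]
@assert eltype(F) == Float64

# Arrays that already hold floats are returned as-is, without copying.
B = Float32[1, 2]
@assert float(B) === B

# Ranges stay ranges; only the element type changes.
r = float(1:3)
@assert r isa AbstractRange && eltype(r) == Float64
@assert collect(r) == [1.0, 2.0, 3.0]

s = float(1:2:7)
@assert s isa AbstractRange
@assert collect(s) == [1.0, 3.0, 5.0, 7.0]
```

Because the floating-point case returns the same object, mutating the result of `float(B)` also mutates `B`.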