
JuliaLang / julia / #37770

05 May 2024 01:26AM UTC coverage: 85.802% (-1.6%) from 87.442%
#37770

push

local

web-flow
A better mechanism for coordinating internal breaking changes. (#53849)

This was originally supposed to be an issue, but I just started
writing out the whole code in the issue text to explain what I want all
the behavior to be, so instead, here's the actual implementation of it,
with the motivation in the commit message, and the details of the
actual behavior in the code change ;)

Sometimes packages rely on Julia internals. This is in general
discouraged, but of course for some packages, there isn't really any
other option. If your package needs to hook the julia internals in a
deep way or is specifically about introspecting the way that julia
itself works, then some amount of reliance on internals is inevitable.
In general, we're happy to let people touch the internals, as long as
they (and their users) are aware that things will break and it's on them
to fix things.

That said, I think we've been a little bit too *caveat emptor* on this
entire business. There's a number of really key packages that rely on
internals (I'm thinking in particular of Revise, Cthulhu and its
dependency stacks) that if they're broken, it's really hard to even
develop julia itself. In particular, these packages have been broken on
Julia master for more than a week now (following #52415) and there has
been much frustration.

I think one of the biggest issues is that we're generally relying on
`VERSION` checks for these kinds of things. This isn't really a problem
when updating a package between released major versions, but for closely
coupled packages like the above you run into two problems:

1. Since the VERSION number of a package is not known ahead of time,
some breaking changes cannot be made atomically, i.e. we need to merge
the base PR (which bumps people's nightly) in order to get the version
number, which we then need to plug into the various PRs in all the
various packages. If something goes wrong in this process (as it did... (continued)

0 of 3 new or added lines in 1 file covered. (0.0%)

1453 existing lines in 67 files now uncovered.

74896 of 87289 relevant lines covered (85.8%)

14448147.81 hits per line

Source File

80.86
/base/float.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
const IEEEFloat = Union{Float16, Float32, Float64}
4

5
## floating point traits ##
6

7
"""
8
    Inf16
9

10
Positive infinity of type [`Float16`](@ref).
11
"""
12
const Inf16 = bitcast(Float16, 0x7c00)
13
"""
14
    NaN16
15

16
A not-a-number value of type [`Float16`](@ref).
17

18
See also: [`NaN`](@ref).
19
"""
20
const NaN16 = bitcast(Float16, 0x7e00)
21
"""
22
    Inf32
23

24
Positive infinity of type [`Float32`](@ref).
25
"""
26
const Inf32 = bitcast(Float32, 0x7f800000)
27
"""
28
    NaN32
29

30
A not-a-number value of type [`Float32`](@ref).
31

32
See also: [`NaN`](@ref).
33
"""
34
const NaN32 = bitcast(Float32, 0x7fc00000)
35
const Inf64 = bitcast(Float64, 0x7ff0000000000000)
36
const NaN64 = bitcast(Float64, 0x7ff8000000000000)
37

38
const Inf = Inf64
39
"""
40
    Inf, Inf64
41

42
Positive infinity of type [`Float64`](@ref).
43

44
See also: [`isfinite`](@ref), [`typemax`](@ref), [`NaN`](@ref), [`Inf32`](@ref).
45

46
# Examples
47
```jldoctest
48
julia> π/0
49
Inf
50

51
julia> +1.0 / -0.0
52
-Inf
53

54
julia> ℯ^-Inf
55
0.0
56
```
57
"""
58
Inf, Inf64
59

60
const NaN = NaN64
61
"""
62
    NaN, NaN64
63

64
A not-a-number value of type [`Float64`](@ref).
65

66
See also: [`isnan`](@ref), [`missing`](@ref), [`NaN32`](@ref), [`Inf`](@ref).
67

68
# Examples
69
```jldoctest
70
julia> 0/0
71
NaN
72

73
julia> Inf - Inf
74
NaN
75

76
julia> NaN == NaN, isequal(NaN, NaN), isnan(NaN)
77
(false, true, true)
78
```
79

80
!!! note
81
    Always use [`isnan`](@ref) or [`isequal`](@ref) for checking for `NaN`.
82
    Using `x === NaN` may give unexpected results:
83
    ```julia-repl
84
    julia> reinterpret(UInt32, NaN32)
85
    0x7fc00000
86

87
    julia> NaN32p1 = reinterpret(Float32, 0x7fc00001)
88
    NaN32
89

90
    julia> NaN32p1 === NaN32, isequal(NaN32p1, NaN32), isnan(NaN32p1)
91
    (false, true, true)
92
    ```
93
"""
94
NaN, NaN64
95

96
# bit patterns
97
reinterpret(::Type{Unsigned}, x::Float64) = reinterpret(UInt64, x)
32,882,078✔
98
reinterpret(::Type{Unsigned}, x::Float32) = reinterpret(UInt32, x)
604,864,925✔
99
reinterpret(::Type{Unsigned}, x::Float16) = reinterpret(UInt16, x)
2,669,276✔
100
reinterpret(::Type{Signed}, x::Float64) = reinterpret(Int64, x)
600,006,910✔
101
reinterpret(::Type{Signed}, x::Float32) = reinterpret(Int32, x)
600,138,968✔
102
reinterpret(::Type{Signed}, x::Float16) = reinterpret(Int16, x)
751✔
103

104
sign_mask(::Type{Float64}) =        0x8000_0000_0000_0000
×
105
exponent_mask(::Type{Float64}) =    0x7ff0_0000_0000_0000
×
106
exponent_one(::Type{Float64}) =     0x3ff0_0000_0000_0000
×
107
exponent_half(::Type{Float64}) =    0x3fe0_0000_0000_0000
×
108
significand_mask(::Type{Float64}) = 0x000f_ffff_ffff_ffff
×
109

110
sign_mask(::Type{Float32}) =        0x8000_0000
×
111
exponent_mask(::Type{Float32}) =    0x7f80_0000
×
112
exponent_one(::Type{Float32}) =     0x3f80_0000
×
113
exponent_half(::Type{Float32}) =    0x3f00_0000
×
114
significand_mask(::Type{Float32}) = 0x007f_ffff
×
115

116
sign_mask(::Type{Float16}) =        0x8000
×
117
exponent_mask(::Type{Float16}) =    0x7c00
×
118
exponent_one(::Type{Float16}) =     0x3c00
×
119
exponent_half(::Type{Float16}) =    0x3800
×
120
significand_mask(::Type{Float16}) = 0x03ff
×
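The mask constants above can be applied directly to a raw bit pattern. The following is a minimal sketch, assuming only the Float64 mask values shown in the listing; the helper name `split_fields` and the variable names are hypothetical and not part of `Base`.

```julia
# Mask values copied from the Float64 definitions in the listing above.
SIGN_MASK        = 0x8000_0000_0000_0000
EXPONENT_MASK    = 0x7ff0_0000_0000_0000
SIGNIFICAND_MASK = 0x000f_ffff_ffff_ffff

# Hypothetical helper: split a Float64 into its three raw bit fields.
function split_fields(x::Float64)
    bits = reinterpret(UInt64, x)
    sgn  = (bits & SIGN_MASK) >> 63       # 0 or 1
    expo = (bits & EXPONENT_MASK) >> 52   # biased exponent field
    frac = bits & SIGNIFICAND_MASK        # 52 stored significand bits
    return sgn, expo, frac
end

sgn, expo, frac = split_fields(-6.5)      # -6.5 == -1.625 * 2^2

# Rebuild the value from the fields (valid for normal numbers only).
rebuilt = (-1.0)^sgn * (1 + frac / 2.0^52) * 2.0^(Int(expo) - 1023)
```

The rebuild step assumes a normal number; subnormals, infinities and NaN need the special handling that the rest of the file provides.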
121

122
mantissa(x::T) where {T} = reinterpret(Unsigned, x) & significand_mask(T)
3,613,070✔
123

124
for T in (Float16, Float32, Float64)
125
    @eval significand_bits(::Type{$T}) = $(trailing_ones(significand_mask(T)))
×
126
    @eval exponent_bits(::Type{$T}) = $(sizeof(T)*8 - significand_bits(T) - 1)
×
127
    @eval exponent_bias(::Type{$T}) = $(Int(exponent_one(T) >> significand_bits(T)))
×
128
    # maximum float exponent
129
    @eval exponent_max(::Type{$T}) = $(Int(exponent_mask(T) >> significand_bits(T)) - exponent_bias(T) - 1)
×
130
    # maximum float exponent without bias
131
    @eval exponent_raw_max(::Type{$T}) = $(Int(exponent_mask(T) >> significand_bits(T)))
×
132
end
133

134
"""
135
    exponent_max(T)
136

137
Maximum [`exponent`](@ref) value for a floating point number of type `T`.
138

139
# Examples
140
```jldoctest
141
julia> Base.exponent_max(Float64)
142
1023
143
```
144

145
Note, `exponent_max(T) + 1` is a possible value of the exponent field
146
with bias, which might be used as sentinel value for `Inf` or `NaN`.
147
"""
148
function exponent_max end
149

150
"""
151
    exponent_raw_max(T)
152

153
Maximum value of the [`exponent`](@ref) field for a floating point number of type `T` without bias,
154
i.e. the maximum integer value representable by [`exponent_bits(T)`](@ref) bits.
155
"""
156
function exponent_raw_max end
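As an illustration of how the `@eval` loop above derives its constants, the same arithmetic can be done by hand for `Float32`. This is a sketch that copies the three Float32 mask values from the listing; the variable names are made up for the example, and the final checks use only the public functions `exponent`, `floatmax` and `precision`.

```julia
sig_mask = 0x007f_ffff   # significand_mask(Float32)
exp_mask = 0x7f80_0000   # exponent_mask(Float32)
exp_one  = 0x3f80_0000   # exponent_one(Float32)

sig_bits = trailing_ones(sig_mask)               # stored significand bits
exp_bits = sizeof(Float32) * 8 - sig_bits - 1    # everything except sign and significand
bias     = Int(exp_one >> sig_bits)              # exponent field of 1.0f0
raw_max  = Int(exp_mask >> sig_bits)             # all-ones exponent field
exp_max  = raw_max - bias - 1                    # largest exponent of a finite value

# Cross-check against public API.
matches_floatmax  = exponent(floatmax(Float32)) == exp_max
matches_precision = precision(Float32) == sig_bits + 1   # +1 for the implicit bit
```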
157

158
"""
159
IEEE 754 definition of the minimum exponent.
160
"""
161
ieee754_exponent_min(::Type{T}) where {T<:IEEEFloat} = Int(1 - exponent_max(T))::Int
9,312✔
162

163
exponent_min(::Type{Float16}) = ieee754_exponent_min(Float16)
9,312✔
164
exponent_min(::Type{Float32}) = ieee754_exponent_min(Float32)
×
165
exponent_min(::Type{Float64}) = ieee754_exponent_min(Float64)
×
166

167
function ieee754_representation(
168
    ::Type{F}, sign_bit::Bool, exponent_field::Integer, significand_field::Integer
169
) where {F<:IEEEFloat}
170
    T = uinttype(F)
9,312✔
171
    ret::T = sign_bit
1,388,365✔
172
    ret <<= exponent_bits(F)
1,388,365✔
173
    ret |= exponent_field
1,388,365✔
174
    ret <<= significand_bits(F)
1,388,365✔
175
    ret |= significand_field
1,388,365✔
176
end
177

178
# ±floatmax(T)
179
function ieee754_representation(
180
    ::Type{F}, sign_bit::Bool, ::Val{:omega}
181
) where {F<:IEEEFloat}
UNCOV
182
    ieee754_representation(F, sign_bit, exponent_raw_max(F) - 1, significand_mask(F))
×
183
end
184

185
# NaN or an infinity
186
function ieee754_representation(
187
    ::Type{F}, sign_bit::Bool, significand_field::Integer, ::Val{:nan}
188
) where {F<:IEEEFloat}
189
    ieee754_representation(F, sign_bit, exponent_raw_max(F), significand_field)
1,265✔
190
end
191

192
# NaN with default payload
193
function ieee754_representation(
194
    ::Type{F}, sign_bit::Bool, ::Val{:nan}
195
) where {F<:IEEEFloat}
196
    ieee754_representation(F, sign_bit, one(uinttype(F)) << (significand_bits(F) - 1), Val(:nan))
91✔
197
end
198

199
# Infinity
200
function ieee754_representation(
201
    ::Type{F}, sign_bit::Bool, ::Val{:inf}
202
) where {F<:IEEEFloat}
203
    ieee754_representation(F, sign_bit, false, Val(:nan))
1,174✔
204
end
205

206
# Subnormal or zero
207
function ieee754_representation(
208
    ::Type{F}, sign_bit::Bool, significand_field::Integer, ::Val{:subnormal}
209
) where {F<:IEEEFloat}
210
    ieee754_representation(F, sign_bit, false, significand_field)
26,797✔
211
end
212

213
# Zero
214
function ieee754_representation(
215
    ::Type{F}, sign_bit::Bool, ::Val{:zero}
216
) where {F<:IEEEFloat}
217
    ieee754_representation(F, sign_bit, false, Val(:subnormal))
26,797✔
218
end
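The shift-and-or assembly used by `ieee754_representation` can be tried in isolation. The sketch below hard-codes the Float32 field widths (8 exponent bits, 23 significand bits) rather than calling the internal helpers; `pack_float32` is a hypothetical name used only for this example.

```julia
# Hypothetical stand-alone version of the field packing, fixed to Float32.
function pack_float32(sign_bit::Bool, exponent_field::Integer, significand_field::Integer)
    ret = UInt32(sign_bit)
    ret <<= 8                           # make room for the exponent field
    ret |= UInt32(exponent_field)
    ret <<= 23                          # make room for the significand field
    ret |= UInt32(significand_field)
    return ret
end

one_bits  = pack_float32(false, 127, 0)            # biased exponent 127 -> 2^0
inf_bits  = pack_float32(false, 255, 0)            # all-ones exponent, zero significand
nan_bits  = pack_float32(false, 255, 1 << 22)      # all-ones exponent, top significand bit
max_bits  = pack_float32(false, 254, 0x007f_ffff)  # largest finite value
negz_bits = pack_float32(true, 0, 0)               # negative zero
```

Each special case in the listing (`:omega`, `:nan`, `:inf`, `:subnormal`, `:zero`) corresponds to one choice of exponent and significand field passed to this kind of packing routine.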
219

220
"""
221
    uabs(x::Integer)
222

223
Return the absolute value of `x`, possibly returning a different type should the
224
operation be susceptible to overflow. This typically arises when `x` is a two's complement
225
signed integer, so that `abs(typemin(x)) == typemin(x) < 0`, in which case the result of
226
`uabs(x)` will be an unsigned integer of the same size.
227
"""
228
uabs(x::Integer) = abs(x)
1,238,122✔
229
uabs(x::BitSigned) = unsigned(abs(x))
4,700,306✔
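The overflow that `uabs` guards against is easy to reproduce with an 8-bit integer. This short sketch uses only the public functions `abs`, `unsigned` and `typemin`; it mirrors the `BitSigned` method above rather than calling the internal `uabs` itself.

```julia
x = typemin(Int8)            # -128, which has no positive counterpart in Int8

wrapped = abs(x)             # wraps around and is still -128
widened = unsigned(abs(x))   # same bits read as UInt8: 0x80, i.e. 128
```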
230

231
## conversions to floating-point ##
232

233
# TODO: deprecate in 2.0
234
Float16(x::Integer) = convert(Float16, convert(Float32, x)::Float32)
×
235

236
for t1 in (Float16, Float32, Float64)
237
    for st in (Int8, Int16, Int32, Int64)
238
        @eval begin
239
            (::Type{$t1})(x::($st)) = sitofp($t1, x)
258,877,845✔
240
            promote_rule(::Type{$t1}, ::Type{$st}) = $t1
×
241
        end
242
    end
243
    for ut in (Bool, UInt8, UInt16, UInt32, UInt64)
244
        @eval begin
245
            (::Type{$t1})(x::($ut)) = uitofp($t1, x)
103,879,791✔
246
            promote_rule(::Type{$t1}, ::Type{$ut}) = $t1
×
247
        end
248
    end
249
end
250

251
Bool(x::Real) = x==0 ? false : x==1 ? true : throw(InexactError(:Bool, Bool, x))
16,409,537✔
252

253
promote_rule(::Type{Float64}, ::Type{UInt128}) = Float64
×
254
promote_rule(::Type{Float64}, ::Type{Int128}) = Float64
×
255
promote_rule(::Type{Float32}, ::Type{UInt128}) = Float32
×
256
promote_rule(::Type{Float32}, ::Type{Int128}) = Float32
×
257
promote_rule(::Type{Float16}, ::Type{UInt128}) = Float16
×
258
promote_rule(::Type{Float16}, ::Type{Int128}) = Float16
×
259

260
function Float64(x::UInt128)
15✔
261
    if x < UInt128(1) << 104 # Can fit it in two 52 bits mantissas
21,916✔
262
        low_exp = 0x1p52
×
263
        high_exp = 0x1p104
×
264
        low_bits = (x % UInt64) & Base.significand_mask(Float64)
892✔
265
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
892✔
266
        high_bits = ((x >> 52) % UInt64)
892✔
267
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
892✔
268
        low_value + high_value
892✔
269
    else # Large enough that low bits only affect rounding, pack low bits
270
        low_exp = 0x1p76
×
271
        high_exp = 0x1p128
×
272
        low_bits = ((x >> 12) % UInt64) >> 12 | (x % UInt64) & 0xFFFFFF
21,008✔
273
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
21,008✔
274
        high_bits = ((x >> 76) % UInt64)
21,008✔
275
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
21,008✔
276
        low_value + high_value
21,008✔
277
    end
278
end
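The conversion above relies on a bit trick: or-ing an integer below 2^52 into the significand of the constant 2^52 yields exactly 2^52 + n, and subtracting 2^52 recovers n as a float. A minimal sketch of that single step, assuming a `UInt64` input (the function name is hypothetical):

```julia
# Hypothetical helper showing the "or into the significand, then subtract" step.
function small_uint_to_float64(n::UInt64)
    n < (UInt64(1) << 52) || throw(ArgumentError("n must be below 2^52"))
    magic = 0x1p52                                   # 2^52; its stored significand is all zeros
    return reinterpret(Float64, reinterpret(UInt64, magic) | n) - magic
end

a = small_uint_to_float64(UInt64(12345))
b = small_uint_to_float64((UInt64(1) << 52) - 1)     # largest accepted input
```

The 128-bit methods in the listing apply this step twice, once for the low bits and once for the high bits, and add the two partial results.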
279

280
function Float64(x::Int128)
60✔
281
    sign_bit = ((x >> 127) % UInt64) << 63
3,445,833✔
282
    ux = uabs(x)
4,097,121✔
283
    if ux < UInt128(1) << 104 # Can fit it in two 52 bits mantissas
4,097,121✔
284
        low_exp = 0x1p52
×
285
        high_exp = 0x1p104
×
286
        low_bits = (ux % UInt64) & Base.significand_mask(Float64)
3,425,812✔
287
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
3,425,812✔
288
        high_bits = ((ux >> 52) % UInt64)
3,425,812✔
289
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
3,425,812✔
290
        reinterpret(Float64, sign_bit | reinterpret(UInt64, low_value + high_value))
3,425,812✔
291
    else # Large enough that low bits only affect rounding, pack low bits
292
        low_exp = 0x1p76
×
293
        high_exp = 0x1p128
×
294
        low_bits = ((ux >> 12) % UInt64) >> 12 | (ux % UInt64) & 0xFFFFFF
20,021✔
295
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
20,021✔
296
        high_bits = ((ux >> 76) % UInt64)
20,021✔
297
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
20,021✔
298
        reinterpret(Float64, sign_bit | reinterpret(UInt64, low_value + high_value))
20,021✔
299
    end
300
end
301

302
function Float32(x::UInt128)
6✔
303
    x == 0 && return 0f0
324✔
304
    n = top_set_bit(x) # ndigits0z(x,2)
308✔
305
    if n <= 24
308✔
306
        y = ((x % UInt32) << (24-n)) & 0x007f_ffff
305✔
307
    else
308
        y = ((x >> (n-25)) % UInt32) & 0x00ff_ffff # keep 1 extra bit
3✔
309
        y = (y+one(UInt32))>>1 # round, ties up (extra leading bit in case of next exponent)
3✔
310
        y &= ~UInt32(trailing_zeros(x) == (n-25)) # fix last bit to round to even
3✔
311
    end
312
    d = ((n+126) % UInt32) << 23
308✔
313
    reinterpret(Float32, d + y)
308✔
314
end
315

316
function Float32(x::Int128)
8✔
317
    x == 0 && return 0f0
326✔
318
    s = ((x >>> 96) % UInt32) & 0x8000_0000 # sign bit
311✔
319
    x = abs(x) % UInt128
311✔
320
    n = top_set_bit(x) # ndigits0z(x,2)
311✔
321
    if n <= 24
311✔
322
        y = ((x % UInt32) << (24-n)) & 0x007f_ffff
306✔
323
    else
324
        y = ((x >> (n-25)) % UInt32) & 0x00ff_ffff # keep 1 extra bit
5✔
325
        y = (y+one(UInt32))>>1 # round, ties up (extra leading bit in case of next exponent)
5✔
326
        y &= ~UInt32(trailing_zeros(x) == (n-25)) # fix last bit to round to even
5✔
327
    end
328
    d = ((n+126) % UInt32) << 23
311✔
329
    reinterpret(Float32, s | d + y)
311✔
330
end
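The "round, ties up" followed by "fix last bit to round to even" steps can be observed from the outside with inputs that sit exactly halfway between two representable `Float32` values. The values below are chosen for illustration; just above 2^24 the spacing between adjacent `Float32` values is 2.

```julia
# 2^24 + 1 is halfway between 2^24 and 2^24 + 2; the even significand wins.
tie_to_lower = Float32(UInt128(2)^24 + 1)

# 2^24 + 3 is halfway between 2^24 + 2 and 2^24 + 4; the even significand wins.
tie_to_upper = Float32(UInt128(2)^24 + 3)

# 2^24 + 2 is exactly representable and must pass through unchanged.
exact = Float32(UInt128(2)^24 + 2)

# The signed method rounds the magnitude the same way.
negative_tie = Float32(-(Int128(2)^24 + 1))
```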
331

332
# TODO: optimize
333
Float16(x::UInt128) = convert(Float16, Float64(x))
34✔
334
Float16(x::Int128)  = convert(Float16, Float64(x))
34✔
335

336
Float16(x::Float32) = fptrunc(Float16, x)
5,935,265✔
337
Float16(x::Float64) = fptrunc(Float16, x)
40,167✔
338
Float32(x::Float64) = fptrunc(Float32, x)
458,599,687✔
339

340
Float32(x::Float16) = fpext(Float32, x)
26,300,269✔
341
Float64(x::Float32) = fpext(Float64, x)
478,239,657✔
342
Float64(x::Float16) = fpext(Float64, x)
3,549,643✔
343

344
AbstractFloat(x::Bool)    = Float64(x)
1,002,547✔
345
AbstractFloat(x::Int8)    = Float64(x)
192✔
346
AbstractFloat(x::Int16)   = Float64(x)
101✔
347
AbstractFloat(x::Int32)   = Float64(x)
68,113✔
348
AbstractFloat(x::Int64)   = Float64(x) # LOSSY
17,401,002✔
349
AbstractFloat(x::Int128)  = Float64(x) # LOSSY
1,417,682✔
350
AbstractFloat(x::UInt8)   = Float64(x)
8,222✔
351
AbstractFloat(x::UInt16)  = Float64(x)
45✔
352
AbstractFloat(x::UInt32)  = Float64(x)
45✔
353
AbstractFloat(x::UInt64)  = Float64(x) # LOSSY
1,683✔
354
AbstractFloat(x::UInt128) = Float64(x) # LOSSY
2,058✔
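The `# LOSSY` markers above refer to the fact that `Float64` has only 53 significand bits, so 64-bit and 128-bit integers do not always survive the conversion. A small sketch with an arbitrarily chosen value just past that limit:

```julia
n = 2^53 + 1            # needs 54 significant bits
f = Float64(n)          # rounds to the nearest representable value, 2^53
back = Int64(f)         # converting back does not recover n

# 32-bit integers always fit, so the round trip is exact.
small = typemax(Int32)
small_back = Int64(Float64(small))
```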
355

UNCOV
356
Bool(x::Float16) = x==0 ? false : x==1 ? true : throw(InexactError(:Bool, Bool, x))
×
357

358
"""
359
    float(x)
360

361
Convert a number or array to a floating point data type.
362

363
See also: [`complex`](@ref), [`oftype`](@ref), [`convert`](@ref).
364

365
# Examples
366
```jldoctest
367
julia> float(1:1000)
368
1.0:1.0:1000.0
369

370
julia> float(typemax(Int32))
371
2.147483647e9
372
```
373
"""
374
float(x) = AbstractFloat(x)
51,508,587✔
375

376
"""
377
    float(T::Type)
378

379
Return an appropriate type to represent a value of type `T` as a floating point value.
380
Equivalent to `typeof(float(zero(T)))`.
381

382
# Examples
383
```jldoctest
384
julia> float(Complex{Int})
385
ComplexF64 (alias for Complex{Float64})
386

387
julia> float(Int)
388
Float64
389
```
390
"""
391
float(::Type{T}) where {T<:Number} = typeof(float(zero(T)))
3,793✔
392
float(::Type{T}) where {T<:AbstractFloat} = T
23,930✔
393
float(::Type{Union{}}, slurp...) = Union{}(0.0)
×
394

395
"""
396
    unsafe_trunc(T, x)
397

398
Return the nearest integral value of type `T` whose absolute value is
399
less than or equal to the absolute value of `x`. If the value is not representable by `T`,
400
an arbitrary value will be returned.
401
See also [`trunc`](@ref).
402

403
# Examples
404
```jldoctest
405
julia> unsafe_trunc(Int, -2.2)
406
-2
407

408
julia> unsafe_trunc(Int, NaN)
409
-9223372036854775808
410
```
411
"""
412
function unsafe_trunc end
413

414
for Ti in (Int8, Int16, Int32, Int64)
415
    @eval begin
416
        unsafe_trunc(::Type{$Ti}, x::IEEEFloat) = fptosi($Ti, x)
49,726,494✔
417
    end
418
end
419
for Ti in (UInt8, UInt16, UInt32, UInt64)
420
    @eval begin
421
        unsafe_trunc(::Type{$Ti}, x::IEEEFloat) = fptoui($Ti, x)
57,225,925✔
422
    end
423
end
424

425
function unsafe_trunc(::Type{UInt128}, x::Float64)
426
    xu = reinterpret(UInt64,x)
653,288✔
427
    k = Int(xu >> 52) & 0x07ff - 1075
653,288✔
428
    xu = (xu & 0x000f_ffff_ffff_ffff) | 0x0010_0000_0000_0000
653,288✔
429
    if k <= 0
653,288✔
430
        UInt128(xu >> -k)
652,252✔
431
    else
432
        UInt128(xu) << k
1,036✔
433
    end
434
end
435
function unsafe_trunc(::Type{Int128}, x::Float64)
436
    copysign(unsafe_trunc(UInt128,x) % Int128, x)
651,862✔
437
end
438

439
function unsafe_trunc(::Type{UInt128}, x::Float32)
440
    xu = reinterpret(UInt32,x)
622✔
441
    k = Int(xu >> 23) & 0x00ff - 150
622✔
442
    xu = (xu & 0x007f_ffff) | 0x0080_0000
622✔
443
    if k <= 0
622✔
444
        UInt128(xu >> -k)
602✔
445
    else
446
        UInt128(xu) << k
20✔
447
    end
448
end
449
function unsafe_trunc(::Type{Int128}, x::Float32)
450
    copysign(unsafe_trunc(UInt128,x) % Int128, x)
324✔
451
end
452

453
unsafe_trunc(::Type{UInt128}, x::Float16) = unsafe_trunc(UInt128, Float32(x))
14✔
454
unsafe_trunc(::Type{Int128}, x::Float16) = unsafe_trunc(Int128, Float32(x))
12✔
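The `UInt128` methods above truncate by extracting the exponent, restoring the implicit leading bit, and shifting the significand. The following sketch repeats those steps for `Float64` in a stand-alone function (hypothetical name `trunc_by_bits`); like `unsafe_trunc`, it is only meaningful for finite, non-negative inputs that fit in the target type.

```julia
# Hypothetical re-statement of the Float64 -> UInt128 bit manipulation.
function trunc_by_bits(x::Float64)
    xu = reinterpret(UInt64, x)
    k  = (Int(xu >> 52) & 0x07ff) - 1075                      # exponent relative to a 52-bit integer significand
    m  = (xu & 0x000f_ffff_ffff_ffff) | 0x0010_0000_0000_0000  # restore the implicit leading 1
    return k <= 0 ? UInt128(m >> -k) : UInt128(m) << k
end

a = trunc_by_bits(5.75)       # fractional bits are shifted out
b = trunc_by_bits(2.0^100)    # significand is shifted left into the high bits
c = trunc_by_bits(0.5)        # everything is shifted out
```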
455

456
# matches convert methods
457
# also determines trunc, floor, ceil
458
round(::Type{Signed},   x::IEEEFloat, r::RoundingMode) = round(Int, x, r)
×
459
round(::Type{Unsigned}, x::IEEEFloat, r::RoundingMode) = round(UInt, x, r)
×
460
round(::Type{Integer},  x::IEEEFloat, r::RoundingMode) = round(Int, x, r)
3,384✔
461

462
round(x::IEEEFloat, ::RoundingMode{:ToZero})  = trunc_llvm(x)
36,570,720✔
463
round(x::IEEEFloat, ::RoundingMode{:Down})    = floor_llvm(x)
357,044✔
464
round(x::IEEEFloat, ::RoundingMode{:Up})      = ceil_llvm(x)
642,209✔
465
round(x::IEEEFloat, ::RoundingMode{:Nearest}) = rint_llvm(x)
12,356,601✔
466

467
## floating point promotions ##
468
promote_rule(::Type{Float32}, ::Type{Float16}) = Float32
×
469
promote_rule(::Type{Float64}, ::Type{Float16}) = Float64
×
470
promote_rule(::Type{Float64}, ::Type{Float32}) = Float64
×
471

472
widen(::Type{Float16}) = Float32
×
473
widen(::Type{Float32}) = Float64
×
474

475
## floating point arithmetic ##
476
-(x::IEEEFloat) = neg_float(x)
423,647,443✔
477

478
+(x::T, y::T) where {T<:IEEEFloat} = add_float(x, y)
682,253,394✔
479
-(x::T, y::T) where {T<:IEEEFloat} = sub_float(x, y)
1,287,645,115✔
480
*(x::T, y::T) where {T<:IEEEFloat} = mul_float(x, y)
2,147,483,647✔
481
/(x::T, y::T) where {T<:IEEEFloat} = div_float(x, y)
900,931,227✔
482

483
muladd(x::T, y::T, z::T) where {T<:IEEEFloat} = muladd_float(x, y, z)
823,881,717✔
484

485
# TODO: faster floating point div?
486
# TODO: faster floating point fld?
487
# TODO: faster floating point mod?
488

489
function unbiased_exponent(x::T) where {T<:IEEEFloat}
490
    return (reinterpret(Unsigned, x) & exponent_mask(T)) >> significand_bits(T)
3,613,046✔
491
end
492

493
function explicit_mantissa_noinfnan(x::T) where {T<:IEEEFloat}
494
    m = mantissa(x)
3,613,046✔
495
    issubnormal(x) || (m |= significand_mask(T) + uinttype(T)(1))
7,226,052✔
496
    return m
3,613,046✔
497
end
498

499
function _to_float(number::U, ep) where {U<:Unsigned}
500
    F = floattype(U)
368✔
501
    S = signed(U)
368✔
502
    epint = unsafe_trunc(S,ep)
1,791,644✔
503
    lz::signed(U) = unsafe_trunc(S, Core.Intrinsics.ctlz_int(number) - U(exponent_bits(F)))
1,791,644✔
504
    number <<= lz
1,791,644✔
505
    epint -= lz
1,791,644✔
506
    bits = U(0)
368✔
507
    if epint >= 0
1,791,644✔
508
        bits = number & significand_mask(F)
1,791,628✔
509
        bits |= ((epint + S(1)) << significand_bits(F)) & exponent_mask(F)
1,791,628✔
510
    else
511
        bits = (number >> -epint) & significand_mask(F)
16✔
512
    end
513
    return reinterpret(F, bits)
1,791,644✔
514
end
515

516
@assume_effects :terminates_locally :nothrow function rem_internal(x::T, y::T) where {T<:IEEEFloat}
3,182,453✔
517
    xuint = reinterpret(Unsigned, x)
3,182,465✔
518
    yuint = reinterpret(Unsigned, y)
3,182,465✔
519
    if xuint <= yuint
3,182,465✔
520
        if xuint < yuint
1,375,942✔
521
            return x
1,368,172✔
522
        end
523
        return zero(T)
7,770✔
524
    end
525

526
    e_x = unbiased_exponent(x)
1,806,523✔
527
    e_y = unbiased_exponent(y)
1,806,523✔
528
    # Most common case where |y| is "very normal" and |x/y| < 2^EXPONENT_WIDTH
529
    if e_y > (significand_bits(T)) && (e_x - e_y) <= (exponent_bits(T))
1,806,523✔
530
        m_x = explicit_mantissa_noinfnan(x)
2,782,616✔
531
        m_y = explicit_mantissa_noinfnan(y)
2,782,616✔
532
        d = urem_int((m_x << (e_x - e_y)),  m_y)
1,391,308✔
533
        iszero(d) && return zero(T)
1,391,308✔
534
        return _to_float(d, e_y - uinttype(T)(1))
1,376,714✔
535
    end
536
    # Both are subnormals
537
    if e_x == 0 && e_y == 0
415,215✔
538
        return reinterpret(T, urem_int(xuint, yuint) & significand_mask(T))
×
539
    end
540

541
    m_x = explicit_mantissa_noinfnan(x)
830,418✔
542
    e_x -= uinttype(T)(1)
415,215✔
543
    m_y = explicit_mantissa_noinfnan(y)
830,402✔
544
    lz_m_y = uinttype(T)(exponent_bits(T))
44✔
545
    if e_y > 0
415,215✔
546
        e_y -= uinttype(T)(1)
415,191✔
547
    else
548
        m_y = mantissa(y)
24✔
549
        lz_m_y = Core.Intrinsics.ctlz_int(m_y)
24✔
550
    end
551

552
    tz_m_y = Core.Intrinsics.cttz_int(m_y)
415,215✔
553
    sides_zeroes_cnt = lz_m_y + tz_m_y
415,215✔
554

555
    # n>0
556
    exp_diff = e_x - e_y
415,215✔
557
    # Shift hy right until the end or n = 0
558
    right_shift = min(exp_diff, tz_m_y)
415,215✔
559
    m_y >>= right_shift
415,215✔
560
    exp_diff -= right_shift
415,215✔
561
    e_y += right_shift
415,215✔
562
    # Shift hx left until the end or n = 0
563
    left_shift = min(exp_diff, uinttype(T)(exponent_bits(T)))
415,215✔
564
    m_x <<= left_shift
415,215✔
565
    exp_diff -= left_shift
415,215✔
566

567
    m_x = urem_int(m_x, m_y)
415,215✔
568
    iszero(m_x) && return zero(T)
415,215✔
569
    iszero(exp_diff) && return _to_float(m_x, e_y)
414,930✔
570

571
    while exp_diff > sides_zeroes_cnt
402,808✔
572
        exp_diff -= sides_zeroes_cnt
1,215✔
573
        m_x <<= sides_zeroes_cnt
1,215✔
574
        m_x = urem_int(m_x, m_y)
1,215✔
575
    end
1,215✔
576
    m_x <<= exp_diff
401,593✔
577
    m_x = urem_int(m_x, m_y)
401,593✔
578
    return _to_float(m_x, e_y)
401,601✔
579
end
580

581
function rem(x::T, y::T) where {T<:IEEEFloat}
3,055✔
582
    if isfinite(x) && !iszero(x) && isfinite(y) && !iszero(y)
3,191,335✔
583
        return copysign(rem_internal(abs(x), abs(y)), x)
3,182,483✔
584
    elseif isinf(x) || isnan(y) || iszero(y)  # y can still be Inf
17,707✔
585
        return T(NaN)
41✔
586
    else
587
        return x
8,819✔
588
    end
589
end
590

591
function mod(x::T, y::T) where {T<:AbstractFloat}
5,072✔
592
    r = rem(x,y)
124,693✔
593
    if r == 0
120,304✔
594
        copysign(r,y)
16,204✔
595
    elseif (r > 0) ⊻ (y > 0)
104,100✔
596
        r+y
28,633✔
597
    else
598
        r
826✔
599
    end
600
end
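The difference between `rem` and `mod` defined above is the sign of the result: `rem` follows the dividend, `mod` follows the divisor. A few sample values, including the special cases handled by the `elseif` branch of `rem`:

```julia
r1 = rem(-7.0, 3.0)     # sign of x: -1.0
m1 = mod(-7.0, 3.0)     # sign of y:  2.0
m2 = mod(7.0, -3.0)     # sign of y: -2.0

r_inf_x  = rem(Inf, 1.0)    # infinite dividend gives NaN
r_zero_y = rem(1.0, 0.0)    # zero divisor gives NaN
r_inf_y  = rem(1.0, Inf)    # infinite divisor returns x unchanged
```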
601

602
## floating point comparisons ##
603
==(x::T, y::T) where {T<:IEEEFloat} = eq_float(x, y)
446,978,405✔
604
!=(x::T, y::T) where {T<:IEEEFloat} = ne_float(x, y)
2,147,483,647✔
605
<( x::T, y::T) where {T<:IEEEFloat} = lt_float(x, y)
184,989,758✔
606
<=(x::T, y::T) where {T<:IEEEFloat} = le_float(x, y)
144,690,722✔
607

608
isequal(x::T, y::T) where {T<:IEEEFloat} = fpiseq(x, y)
3,110,301✔
609

610
# interpret as sign-magnitude integer
611
@inline function _fpint(x)
6✔
612
    IntT = inttype(typeof(x))
69,340✔
613
    ix = reinterpret(IntT, x)
90,702,473✔
614
    return ifelse(ix < zero(IntT), ix ⊻ typemax(IntT), ix)
90,702,473✔
615
end
616

617
@inline function isless(a::T, b::T) where T<:IEEEFloat
68✔
618
    (isnan(a) || isnan(b)) && return !isnan(a)
90,962,431✔
619

620
    return _fpint(a) < _fpint(b)
45,465,931✔
621
end
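The `_fpint` helper above maps a float's bit pattern to a signed integer whose ordering matches the float ordering, with `-0.0` sorting before `0.0`. A sketch of the same idea fixed to `Float64`; the name `order_key` is hypothetical.

```julia
# Hypothetical Float64-only version of the sign-magnitude mapping.
function order_key(x::Float64)
    ix = reinterpret(Int64, x)
    # For negative floats, flip the magnitude bits so that larger magnitudes sort lower.
    return ifelse(ix < 0, ix ⊻ typemax(Int64), ix)
end

zeros_ordered     = order_key(-0.0) < order_key(0.0)
negatives_ordered = order_key(-Inf) < order_key(-1.0) < order_key(-0.5)
positives_ordered = order_key(0.5) < order_key(1.0) < order_key(Inf)
```

NaN is handled separately in `isless` before this mapping is used, which is why `isless(1.0, NaN)` is `true` while `isless(NaN, 1.0)` is `false`.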
622

623
# Exact Float (Tf) vs Integer (Ti) comparisons
624
# Assumes:
625
# - typemax(Ti) == 2^n-1
626
# - typemax(Ti) can't be exactly represented by Tf:
627
#   => Tf(typemax(Ti)) == 2^n or Inf
628
# - typemin(Ti) can be exactly represented by Tf
629
#
630
# 1. convert y::Ti to float fy::Tf
631
# 2. perform Tf comparison x vs fy
632
# 3. if x == fy, check if (1) resulted in rounding:
633
#  a. convert fy back to Ti and compare with original y
634
#  b. unsafe_convert undefined behaviour if fy == Tf(typemax(Ti))
635
#     (but consequently x == fy > y)
636
for Ti in (Int64,UInt64,Int128,UInt128)
637
    for Tf in (Float32,Float64)
638
        @eval begin
639
            function ==(x::$Tf, y::$Ti)
228,211✔
640
                fy = ($Tf)(y)
8,809,483✔
641
                (x == fy) & (fy != $(Tf(typemax(Ti)))) & (y == unsafe_trunc($Ti,fy))
22,131,660✔
642
            end
643
            ==(y::$Ti, x::$Tf) = x==y
760,478✔
644

645
            function <(x::$Ti, y::$Tf)
5,596✔
646
                fx = ($Tf)(x)
57,577,619✔
647
                (fx < y) | ((fx == y) & ((fx == $(Tf(typemax(Ti)))) | (x < unsafe_trunc($Ti,fx)) ))
57,681,921✔
648
            end
649
            function <=(x::$Ti, y::$Tf)
12,996✔
650
                fx = ($Tf)(x)
256,928✔
651
                (fx < y) | ((fx == y) & ((fx == $(Tf(typemax(Ti)))) | (x <= unsafe_trunc($Ti,fx)) ))
817,444✔
652
            end
653

654
            function <(x::$Tf, y::$Ti)
15,257✔
655
                fy = ($Tf)(y)
824,378✔
656
                (x < fy) | ((x == fy) & (fy < $(Tf(typemax(Ti)))) & (unsafe_trunc($Ti,fy) < y))
1,589,509✔
657
            end
658
            function <=(x::$Tf, y::$Ti)
10,452✔
659
                fy = ($Tf)(y)
26,192✔
660
                (x < fy) | ((x == fy) & (fy < $(Tf(typemax(Ti)))) & (unsafe_trunc($Ti,fy) <= y))
26,923✔
661
            end
662
        end
663
    end
664
end
665
for op in (:(==), :<, :<=)
666
    @eval begin
667
        ($op)(x::Float16, y::Union{Int128,UInt128,Int64,UInt64}) = ($op)(Float64(x), Float64(y))
2,496,839✔
668
        ($op)(x::Union{Int128,UInt128,Int64,UInt64}, y::Float16) = ($op)(Float64(x), Float64(y))
18,499✔
669

670
        ($op)(x::Union{Float16,Float32}, y::Union{Int32,UInt32}) = ($op)(Float64(x), Float64(y))
246,711✔
671
        ($op)(x::Union{Int32,UInt32}, y::Union{Float16,Float32}) = ($op)(Float64(x), Float64(y))
599✔
672

673
        ($op)(x::Float16, y::Union{Int16,UInt16}) = ($op)(Float32(x), Float32(y))
272✔
674
        ($op)(x::Union{Int16,UInt16}, y::Float16) = ($op)(Float32(x), Float32(y))
266✔
675
    end
676
end
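The comparison methods above exist so that mixed float/integer comparisons stay exact even when the integer cannot be represented as a float. The sample values below are chosen to hit both the rounding check and the `typemax` special case.

```julia
f = 2.0^53
n = 2^53 + 1                        # rounds to 2^53 when converted to Float64

converts_equal = Float64(n) == f    # true: the conversion loses the low bit
exact_equal    = f == n             # false: the comparison detects the rounding
exact_less     = f < n              # true

# Float64(typemax(Int64)) rounds up to 2^63, which is not an Int64 value.
top_less  = typemax(Int64) < 2.0^63
top_equal = typemax(Int64) == 2.0^63
```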
677

678

679
abs(x::IEEEFloat) = abs_float(x)
158,823,922✔
680

681
"""
682
    isnan(f) -> Bool
683

684
Test whether a number value is a NaN, an indeterminate value which is neither an infinity
685
nor a finite number ("not a number").
686

687
See also: [`iszero`](@ref), [`isone`](@ref), [`isinf`](@ref), [`ismissing`](@ref).
688
"""
689
isnan(x::AbstractFloat) = (x != x)::Bool
2,147,483,647✔
690
isnan(x::Number) = false
×
691

692
isfinite(x::AbstractFloat) = !isnan(x - x)
626,219,485✔
693
isfinite(x::Real) = decompose(x)[3] != 0
107,249✔
694
isfinite(x::Integer) = true
×
695

696
"""
697
    isinf(f) -> Bool
698

699
Test whether a number is infinite.
700

701
See also: [`Inf`](@ref), [`iszero`](@ref), [`isfinite`](@ref), [`isnan`](@ref).
702
"""
703
isinf(x::Real) = !isnan(x) & !isfinite(x)
141,848✔
704
isinf(x::IEEEFloat) = abs(x) === oftype(x, Inf)
40,758,523✔
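The one-line definitions of `isnan` and `isfinite` above depend on two IEEE 754 facts: NaN compares unequal to itself, and `x - x` is NaN exactly when `x` is infinite or NaN. A sketch that restates them under hypothetical names so the behaviour can be checked directly:

```julia
# Hypothetical re-statements of the definitions in the listing.
my_isnan(x::AbstractFloat)    = x != x
my_isfinite(x::AbstractFloat) = !my_isnan(x - x)

checks = (
    my_isnan(NaN),          # NaN != NaN
    !my_isnan(Inf),
    my_isfinite(1.0),       # 1.0 - 1.0 == 0.0
    !my_isfinite(Inf),      # Inf - Inf is NaN
    !my_isfinite(-Inf),
    !my_isfinite(NaN),      # NaN - NaN is NaN
)
```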
705

706
const hx_NaN = hash_uint64(reinterpret(UInt64, NaN))
707
function hash(x::Float64, h::UInt)
235✔
708
    # see comments on trunc and hash(Real, UInt)
709
    if typemin(Int64) <= x < typemax(Int64)
550,417✔
710
        xi = fptosi(Int64, x)
550,265✔
711
        if isequal(xi, x)
550,265✔
712
            return hash(xi, h)
265,199✔
713
        end
714
    elseif typemin(UInt64) <= x < typemax(UInt64)
152✔
715
        xu = fptoui(UInt64, x)
94✔
716
        if isequal(xu, x)
94✔
717
            return hash(xu, h)
94✔
718
        end
719
    elseif isnan(x)
58✔
720
        return hx_NaN ⊻ h # NaN does not have a stable bit pattern
51✔
721
    end
722
    return hash_uint64(bitcast(UInt64, x)) - 3h
521,623✔
723
end
724

725
hash(x::Float32, h::UInt) = hash(Float64(x), h)
6,745✔
726

727
function hash(x::Float16, h::UInt)
728
    # see comments on trunc and hash(Real, UInt)
729
    if isfinite(x) # all finite Float16 fit in Int64
54✔
730
        xi = fptosi(Int64, x)
54✔
731
        if isequal(xi, x)
54✔
732
            return hash(xi, h)
7✔
733
        end
734
    elseif isnan(x)
×
735
        return hx_NaN ⊻ h # NaN does not have a stable bit pattern
×
736
    end
737
    return hash_uint64(bitcast(UInt64, Float64(x))) - 3h
47✔
738
end
739

740
## generic hashing for rational values ##
741
function hash(x::Real, h::UInt)
242,185✔
742
    # decompose x as num*2^pow/den
743
    num, pow, den = decompose(x)
5,493✔
744

745
    # handle special values
746
    num == 0 && den == 0 && return hash(NaN, h)
242,185✔
747
    num == 0 && return hash(ifelse(den > 0, 0.0, -0.0), h)
242,185✔
748
    den == 0 && return hash(ifelse(num > 0, Inf, -Inf), h)
5,107✔
749

750
    # normalize decomposition
751
    if den < 0
5,107✔
752
        num = -num
886✔
753
        den = -den
886✔
754
    end
755
    num_z = trailing_zeros(num)
5,459✔
756
    num >>= num_z
8,121✔
757
    den_z = trailing_zeros(den)
5,107✔
758
    den >>= den_z
5,110✔
759
    pow += num_z - den_z
5,459✔
760
    # If the real can be represented as an Int64, UInt64, or Float64, hash as those types.
761
    # To be an Integer the denominator must be 1 and the power must be non-negative.
762
    if den == 1
5,107✔
763
        # left = ceil(log2(num*2^pow))
764
        left = top_set_bit(abs(num)) + pow
8,921✔
765
        # 2^-1074 is the minimum Float64 so if the power is smaller, not a Float64
766
        if -1074 <= pow
5,456✔
767
            if 0 <= pow # if pow is non-negative, it is an integer
5,456✔
768
                left <= 63 && return hash(Int64(num) << Int(pow), h)
5,349✔
769
                left <= 64 && !signbit(num) && return hash(UInt64(num) << Int(pow), h)
851✔
770
            end # typemin(Int64) handled by Float64 case
771
            # 2^1024 is the maximum Float64 so if the power is greater, not a Float64
772
            # Float64s only have 53 mantissa bits (including implicit bit)
773
            left <= 1024 && left - pow <= 53 && return hash(ldexp(Float64(num), pow), h)
890✔
774
        end
775
    else
776
        h = hash_integer(den, h)
3✔
777
    end
778
    # handle generic rational values
779
    h = hash_integer(pow, h)
680✔
780
    h = hash_integer(num, h)
683✔
781
    return h
680✔
782
end
783

#=
`decompose(x)`: non-canonical decomposition of rational values as `num*2^pow/den`.

The decompose function is the point where rational-valued numeric types that support
hashing hook into the hashing protocol. `decompose(x)` should return three integer
values `num, pow, den`, such that the value of `x` is mathematically equal to

    num*2^pow/den

The decomposition need not be canonical in the sense that it just needs to be *some*
way to express `x` in this form, not any particular way – with the restriction that
`num` and `den` may not share any odd common factors. They may, however, have powers
of two in common – the generic hashing code will normalize those as necessary.

Special values:

 - `x` is zero: `num` should be zero and `den` should have the same sign as `x`
 - `x` is infinite: `den` should be zero and `num` should have the same sign as `x`
 - `x` is not a number: `num` and `den` should both be zero
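
Worked example, as a sketch: assuming the `Float64` method defined below and calling it
through the internal name `Base.decompose` (not public API), the rules above give:

```julia
# 1.5 == 6755399441055744 * 2^-52 / 1, where 6755399441055744 == 2^52 + 2^51
@assert Base.decompose(1.5) == (6755399441055744, -52, 1)

# zero: `num` is zero and `den` carries the sign of `x`
@assert Base.decompose(-0.0) == (0, -1074, -1)

# infinity: `den` is zero and `num` carries the sign of `x`
@assert Base.decompose(-Inf) == (-1, 0, 0)

# NaN: `num` and `den` are both zero
@assert Base.decompose(NaN) == (0, 0, 0)

# the (non-canonical) triple reproduces the original value exactly
num, pow, den = Base.decompose(1.5)
@assert num * 2.0^pow / den == 1.5
```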
=#

decompose(x::Integer) = x, 0, 1

function decompose(x::Float16)::NTuple{3,Int}
    isnan(x) && return 0, 0, 0
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
    n = reinterpret(UInt16, x)
    s = (n & 0x03ff) % Int16
    e = ((n & 0x7c00) >> 10) % Int
    s |= Int16(e != 0) << 10
    d = ifelse(signbit(x), -1, 1)
    s, e - 25 + (e == 0), d
end

function decompose(x::Float32)::NTuple{3,Int}
    isnan(x) && return 0, 0, 0
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
    n = reinterpret(UInt32, x)
    s = (n & 0x007fffff) % Int32
    e = ((n & 0x7f800000) >> 23) % Int
    s |= Int32(e != 0) << 23
    d = ifelse(signbit(x), -1, 1)
    s, e - 150 + (e == 0), d
end

function decompose(x::Float64)::Tuple{Int64, Int, Int}
    isnan(x) && return 0, 0, 0
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
    n = reinterpret(UInt64, x)
    s = (n & 0x000fffffffffffff) % Int64
    e = ((n & 0x7ff0000000000000) >> 52) % Int
    s |= Int64(e != 0) << 52
    d = ifelse(signbit(x), -1, 1)
    s, e - 1075 + (e == 0), d
end

"""
    precision(num::AbstractFloat; base::Integer=2)
    precision(T::Type; base::Integer=2)

Get the precision of a floating point number, as defined by the effective number of bits in
the significand, or the precision of a floating-point type `T` (its current default, if
`T` is a variable-precision type like [`BigFloat`](@ref)).

If `base` is specified, then it returns the maximum corresponding
number of significand digits in that base.
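
# Examples
For instance, with the IEEE types handled in this file (the `base=10` value is
`floor(53 / log2(10))`, assuming a Julia version that supports the `base` keyword):
```jldoctest
julia> precision(Float64)
53

julia> precision(Float32)
24

julia> precision(Float64; base=10)
15
```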

!!! compat "Julia 1.8"
    The `base` keyword requires at least Julia 1.8.
"""
function precision end

_precision_with_base_2(::Type{Float16}) = 11
_precision_with_base_2(::Type{Float32}) = 24
_precision_with_base_2(::Type{Float64}) = 53
function _precision(x, base::Integer)
    base > 1 || throw(DomainError(base, "`base` cannot be less than 2."))
    p = _precision_with_base_2(x)
    return base == 2 ? Int(p) : floor(Int, p / log2(base))
end
precision(::Type{T}; base::Integer=2) where {T<:AbstractFloat} = _precision(T, base)
precision(::T; base::Integer=2) where {T<:AbstractFloat} = precision(T; base)

"""
    nextfloat(x::AbstractFloat, n::Integer)

The result of `n` iterative applications of `nextfloat` to `x` if `n >= 0`, or `-n`
applications of [`prevfloat`](@ref) if `n < 0`.
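
# Examples
For instance, with `Float64` inputs (the printed values assume IEEE 754 binary64):
```jldoctest
julia> nextfloat(1.0, 2)
1.0000000000000004

julia> nextfloat(1.0, -1)
0.9999999999999999

julia> nextfloat(0.0, 1)
5.0e-324
```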
"""
function nextfloat(f::IEEEFloat, d::Integer)
    F = typeof(f)
    fumax = reinterpret(Unsigned, F(Inf))
    U = typeof(fumax)

    isnan(f) && return f
    fi = reinterpret(Signed, f)
    fneg = fi < 0
    fu = unsigned(fi & typemax(fi))

    dneg = d < 0
    da = uabs(d)
    if da > typemax(U)
        fneg = dneg
        fu = fumax
    else
        du = da % U
        if fneg ⊻ dneg
            if du > fu
                fu = min(fumax, du - fu)
                fneg = !fneg
            else
                fu = fu - du
            end
        else
            if fumax - fu < du
                fu = fumax
            else
                fu = fu + du
            end
        end
    end
    if fneg
        fu |= sign_mask(F)
    end
    reinterpret(F, fu)
end

"""
    nextfloat(x::AbstractFloat)

Return the smallest floating point number `y` of the same type as `x` such that `x < y`. If no
such `y` exists (e.g. if `x` is `Inf` or `NaN`), then return `x`.

See also: [`prevfloat`](@ref), [`eps`](@ref), [`issubnormal`](@ref).
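
# Examples
For instance, with `Float64` inputs (the printed values assume IEEE 754 binary64):
```jldoctest
julia> nextfloat(1.0)
1.0000000000000002

julia> nextfloat(Inf)
Inf
```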
"""
nextfloat(x::AbstractFloat) = nextfloat(x,1)

"""
    prevfloat(x::AbstractFloat, n::Integer)

The result of `n` iterative applications of `prevfloat` to `x` if `n >= 0`, or `-n`
applications of [`nextfloat`](@ref) if `n < 0`.
"""
prevfloat(x::AbstractFloat, d::Integer) = nextfloat(x, -d)

"""
    prevfloat(x::AbstractFloat)

Return the largest floating point number `y` of the same type as `x` such that `y < x`. If no
such `y` exists (e.g. if `x` is `-Inf` or `NaN`), then return `x`.
"""
prevfloat(x::AbstractFloat) = nextfloat(x,-1)

for Ti in (Int8, Int16, Int32, Int64, Int128, UInt8, UInt16, UInt32, UInt64, UInt128)
    for Tf in (Float16, Float32, Float64)
        if Ti <: Unsigned || sizeof(Ti) < sizeof(Tf)
            # Here `Tf(typemin(Ti))-1` is exact, so we can compare the lower-bound
            # directly. `Tf(typemax(Ti))+1` is either always exactly representable, or
            # rounded to `Inf` (e.g. when `Ti==UInt128 && Tf==Float32`).
            @eval begin
                function round(::Type{$Ti},x::$Tf,::RoundingMode{:ToZero})
                    if $(Tf(typemin(Ti))-one(Tf)) < x < $(Tf(typemax(Ti))+one(Tf))
                        return unsafe_trunc($Ti,x)
                    else
                        throw(InexactError(:round, $Ti, x, RoundToZero))
                    end
                end
                function (::Type{$Ti})(x::$Tf)
                    # When typemax(Ti) is not representable by Tf but typemax(Ti) + 1 is,
                    # then < Tf(typemax(Ti) + 1) is stricter than <= Tf(typemax(Ti)). Using
                    # the former causes us to throw on UInt64(Float64(typemax(UInt64))+1)
                    if ($(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti))+one(Tf))) && isinteger(x)
                        return unsafe_trunc($Ti,x)
                    else
                        throw(InexactError($(Expr(:quote,Ti.name.name)), $Ti, x))
                    end
                end
            end
        else
            # Here `eps(Tf(typemin(Ti))) > 1`, so the only value which can be truncated to
            # `Tf(typemin(Ti))` is itself. Similarly, `Tf(typemax(Ti))` is inexact and will
            # be rounded up. This assumes that `Tf(typemin(Ti)) > -Inf`, which is true for
            # these types, but not for `Float16` or larger integer types.
            @eval begin
                function round(::Type{$Ti},x::$Tf,::RoundingMode{:ToZero})
                    if $(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti)))
                        return unsafe_trunc($Ti,x)
                    else
                        throw(InexactError(:round, $Ti, x, RoundToZero))
                    end
                end
                function (::Type{$Ti})(x::$Tf)
                    if ($(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti)))) && isinteger(x)
                        return unsafe_trunc($Ti,x)
                    else
                        throw(InexactError($(Expr(:quote,Ti.name.name)), $Ti, x))
                    end
                end
            end
        end
    end
end

"""
    issubnormal(f) -> Bool

Test whether a floating point number is subnormal.

An IEEE floating point number is [subnormal](https://en.wikipedia.org/wiki/Subnormal_number)
when its exponent bits are zero and its significand is not zero.

# Examples
```jldoctest
julia> floatmin(Float32)
1.1754944f-38

julia> issubnormal(1.0f-37)
false

julia> issubnormal(1.0f-38)
true
```
"""
function issubnormal(x::T) where {T<:IEEEFloat}
    y = reinterpret(Unsigned, x)
    (y & exponent_mask(T) == 0) & (y & significand_mask(T) != 0)
end

ispow2(x::AbstractFloat) = !iszero(x) && frexp(x)[1] == 0.5
iseven(x::AbstractFloat) = isinteger(x) && (abs(x) > maxintfloat(x) || iseven(Integer(x)))
isodd(x::AbstractFloat) = isinteger(x) && abs(x) ≤ maxintfloat(x) && isodd(Integer(x))

@eval begin
    typemin(::Type{Float16}) = $(bitcast(Float16, 0xfc00))
    typemax(::Type{Float16}) = $(Inf16)
    typemin(::Type{Float32}) = $(-Inf32)
    typemax(::Type{Float32}) = $(Inf32)
    typemin(::Type{Float64}) = $(-Inf64)
    typemax(::Type{Float64}) = $(Inf64)
    typemin(x::T) where {T<:Real} = typemin(T)
    typemax(x::T) where {T<:Real} = typemax(T)

    floatmin(::Type{Float16}) = $(bitcast(Float16, 0x0400))
    floatmin(::Type{Float32}) = $(bitcast(Float32, 0x00800000))
    floatmin(::Type{Float64}) = $(bitcast(Float64, 0x0010000000000000))
    floatmax(::Type{Float16}) = $(bitcast(Float16, 0x7bff))
    floatmax(::Type{Float32}) = $(bitcast(Float32, 0x7f7fffff))
    floatmax(::Type{Float64}) = $(bitcast(Float64, 0x7fefffffffffffff))

    eps(::Type{Float16}) = $(bitcast(Float16, 0x1400))
    eps(::Type{Float32}) = $(bitcast(Float32, 0x34000000))
    eps(::Type{Float64}) = $(bitcast(Float64, 0x3cb0000000000000))
    eps() = eps(Float64)
end

eps(x::AbstractFloat) = isfinite(x) ? abs(x) >= floatmin(x) ? ldexp(eps(typeof(x)), exponent(x)) : nextfloat(zero(x)) : oftype(x, NaN)

function eps(x::T) where T<:IEEEFloat
    # For isfinite(x), toggling the LSB will produce either prevfloat(x) or
    # nextfloat(x) but will never change the sign or exponent.
    # For !isfinite(x), this will map Inf to NaN and NaN to NaN or Inf.
    y = reinterpret(T, reinterpret(Unsigned, x) ⊻ true)
    # The absolute difference between these values is eps(x). This is true even
    # for Inf/NaN values.
    return abs(x - y)
end

"""
    floatmin(T = Float64)

Return the smallest positive normal number representable by the floating-point
type `T`.

# Examples
```jldoctest
julia> floatmin(Float16)
Float16(6.104e-5)

julia> floatmin(Float32)
1.1754944f-38

julia> floatmin()
2.2250738585072014e-308
```
"""
floatmin(x::T) where {T<:AbstractFloat} = floatmin(T)

"""
    floatmax(T = Float64)

Return the largest finite number representable by the floating-point type `T`.

See also: [`typemax`](@ref), [`floatmin`](@ref), [`eps`](@ref).

# Examples
```jldoctest
julia> floatmax(Float16)
Float16(6.55e4)

julia> floatmax(Float32)
3.4028235f38

julia> floatmax()
1.7976931348623157e308

julia> typemax(Float64)
Inf
```
"""
floatmax(x::T) where {T<:AbstractFloat} = floatmax(T)

floatmin() = floatmin(Float64)
floatmax() = floatmax(Float64)

"""
    eps(::Type{T}) where T<:AbstractFloat
    eps()

Return the *machine epsilon* of the floating point type `T` (`T = Float64` by
default). This is defined as the gap between 1 and the next larger value representable by
`typeof(one(T))`, and is equivalent to `eps(one(T))`. (Since `eps(T)` is a
bound on the *relative error* of `T`, it is a "dimensionless" quantity like [`one`](@ref).)

# Examples
```jldoctest
julia> eps()
2.220446049250313e-16

julia> eps(Float32)
1.1920929f-7

julia> 1.0 + eps()
1.0000000000000002

julia> 1.0 + eps()/2
1.0
```
"""
eps(::Type{<:AbstractFloat})

"""
    eps(x::AbstractFloat)

Return the *unit in last place* (ulp) of `x`. This is the distance between consecutive
representable floating point values at `x`. In most cases, if the distance on either side
of `x` is different, then the larger of the two is taken, that is

    eps(x) == max(x-prevfloat(x), nextfloat(x)-x)

The exceptions to this rule are the smallest and largest finite values
(e.g. `nextfloat(-Inf)` and `prevfloat(Inf)` for [`Float64`](@ref)), which round to the
smaller of the values.

The rationale for this behavior is that `eps` bounds the floating point rounding
error. Under the default `RoundNearest` rounding mode, if ``y`` is a real number and ``x``
is the nearest floating point number to ``y``, then

```math
|y-x| \\leq \\operatorname{eps}(x)/2.
```

See also: [`nextfloat`](@ref), [`issubnormal`](@ref), [`floatmax`](@ref).

# Examples
```jldoctest
julia> eps(1.0)
2.220446049250313e-16

julia> eps(prevfloat(2.0))
2.220446049250313e-16

julia> eps(2.0)
4.440892098500626e-16

julia> x = prevfloat(Inf)      # largest finite Float64
1.7976931348623157e308

julia> x + eps(x)/2            # rounds up
Inf

julia> x + prevfloat(eps(x)/2) # rounds down
1.7976931348623157e308
```
"""
eps(::AbstractFloat)

## byte order swaps for arbitrary-endianness serialization/deserialization ##
bswap(x::IEEEFloat) = bswap_int(x)

# integer size of float
uinttype(::Type{Float64}) = UInt64
uinttype(::Type{Float32}) = UInt32
uinttype(::Type{Float16}) = UInt16
inttype(::Type{Float64}) = Int64
inttype(::Type{Float32}) = Int32
inttype(::Type{Float16}) = Int16
# float size of integer
floattype(::Type{UInt64}) = Float64
floattype(::Type{UInt32}) = Float32
floattype(::Type{UInt16}) = Float16
floattype(::Type{Int64}) = Float64
floattype(::Type{Int32}) = Float32
floattype(::Type{Int16}) = Float16

## Array operations on floating point numbers ##

float(A::AbstractArray{<:AbstractFloat}) = A

function float(A::AbstractArray{T}) where T
    if !isconcretetype(T)
        error("`float` not defined on abstractly-typed arrays; please convert to a more specific type")
    end
    convert(AbstractArray{typeof(float(zero(T)))}, A)
end

float(r::StepRange) = float(r.start):float(r.step):float(last(r))
float(r::UnitRange) = float(r.start):float(last(r))
float(r::StepRangeLen{T}) where {T} =
    StepRangeLen{typeof(float(T(r.ref)))}(float(r.ref), float(r.step), length(r), r.offset)
function float(r::LinRange)
    LinRange(float(r.start), float(r.stop), length(r))
end