JuliaLang / julia / #38002

06 Feb 2025 06:14AM UTC coverage: 20.322% (-2.4%) from 22.722%
push

local

web-flow
bpart: Fully switch to partitioned semantics (#57253)

This is the final PR in the binding partitions series (modulo bugs and
tweaks), i.e. it closes #54654 and thus closes #40399, which was the
original design sketch.

This thus activates the full designed semantics for binding partitions,
in particular allowing safe replacement of const bindings. It in
particular allows struct redefinitions. This thus closes
timholy/Revise.jl#18 and also closes #38584.

The biggest semantic change here is probably that this gets rid of the
notion of "resolvedness" of a binding. Previously, a lot of the behavior
of our implementation depended on when bindings were "resolved", which
could happen at basically an arbitrary point (in the compiler, in REPL
completion, in a different thread), making a lot of the semantics around
bindings ill- or at least implementation-defined. There are several
related issues in the bugtracker, so this closes #14055 closes #44604
closes #46354 closes #30277

It is also the last step to close #24569.
It also supports bindings for undef->defined transitions and thus closes
#53958 closes #54733 - however, this is not activated yet for
performance reasons and may need some further optimization.

Since resolvedness no longer exists, we need to replace it with some
hopefully more well-defined semantics. I will describe the semantics
below, but before I do I will make two notes:

1. There are a number of cases where these semantics will behave
slightly differently than the old semantics absent some other task going
around resolving random bindings.
2. The new behavior (except for the replacement stuff) was generally
permissible under the old semantics if the bindings happened to be
resolved at the right time.

With all that said, there are essentially three "strengths" of bindings:

1. Implicit Bindings: Anything implicitly obtained from `using Mod`, "no
binding", plus slightly more exotic corner cases around conflicts

2. Weakly declared bindin... (continued)

11 of 111 new or added lines in 7 files covered. (9.91%)

1273 existing lines in 68 files now uncovered.

9908 of 48755 relevant lines covered (20.32%)

105126.48 hits per line

Source File

14.02
/base/float.jl
1
# This file is a part of Julia. License is MIT: https://julialang.org/license
2

3
const IEEEFloat = Union{Float16, Float32, Float64}
4

5
## floating point traits ##
6

7
"""
8
    Inf16
9

10
Positive infinity of type [`Float16`](@ref).
11
"""
12
const Inf16 = bitcast(Float16, 0x7c00)
13
"""
14
    NaN16
15

16
A not-a-number value of type [`Float16`](@ref).
17

18
See also: [`NaN`](@ref).
19
"""
20
const NaN16 = bitcast(Float16, 0x7e00)
21
"""
22
    Inf32
23

24
Positive infinity of type [`Float32`](@ref).
25
"""
26
const Inf32 = bitcast(Float32, 0x7f800000)
27
"""
28
    NaN32
29

30
A not-a-number value of type [`Float32`](@ref).
31

32
See also: [`NaN`](@ref).
33
"""
34
const NaN32 = bitcast(Float32, 0x7fc00000)
35
const Inf64 = bitcast(Float64, 0x7ff0000000000000)
36
const NaN64 = bitcast(Float64, 0x7ff8000000000000)
37

38
const Inf = Inf64
39
"""
40
    Inf, Inf64
41

42
Positive infinity of type [`Float64`](@ref).
43

44
See also: [`isfinite`](@ref), [`typemax`](@ref), [`NaN`](@ref), [`Inf32`](@ref).
45

46
# Examples
47
```jldoctest
48
julia> π/0
49
Inf
50

51
julia> +1.0 / -0.0
52
-Inf
53

54
julia> ℯ^-Inf
55
0.0
56
```
57
"""
58
Inf, Inf64
59

60
const NaN = NaN64
61
"""
62
    NaN, NaN64
63

64
A not-a-number value of type [`Float64`](@ref).
65

66
See also: [`isnan`](@ref), [`missing`](@ref), [`NaN32`](@ref), [`Inf`](@ref).
67

68
# Examples
69
```jldoctest
70
julia> 0/0
71
NaN
72

73
julia> Inf - Inf
74
NaN
75

76
julia> NaN == NaN, isequal(NaN, NaN), isnan(NaN)
77
(false, true, true)
78
```
79

80
!!! note
81
    Always use [`isnan`](@ref) or [`isequal`](@ref) for checking for `NaN`.
82
    Using `x === NaN` may give unexpected results:
83
    ```julia-repl
84
    julia> reinterpret(UInt32, NaN32)
85
    0x7fc00000
86

87
    julia> NaN32p1 = reinterpret(Float32, 0x7fc00001)
88
    NaN32
89

90
    julia> NaN32p1 === NaN32, isequal(NaN32p1, NaN32), isnan(NaN32p1)
91
    (false, true, true)
92
    ```
93
"""
94
NaN, NaN64
95

96
# bit patterns
97
reinterpret(::Type{Unsigned}, x::Float64) = reinterpret(UInt64, x)
8✔
98
reinterpret(::Type{Unsigned}, x::Float32) = reinterpret(UInt32, x)
×
99
reinterpret(::Type{Unsigned}, x::Float16) = reinterpret(UInt16, x)
×
100
reinterpret(::Type{Signed}, x::Float64) = reinterpret(Int64, x)
×
101
reinterpret(::Type{Signed}, x::Float32) = reinterpret(Int32, x)
×
102
reinterpret(::Type{Signed}, x::Float16) = reinterpret(Int16, x)
×
103

104
sign_mask(::Type{Float64}) =        0x8000_0000_0000_0000
×
105
exponent_mask(::Type{Float64}) =    0x7ff0_0000_0000_0000
×
106
exponent_one(::Type{Float64}) =     0x3ff0_0000_0000_0000
×
107
exponent_half(::Type{Float64}) =    0x3fe0_0000_0000_0000
×
108
significand_mask(::Type{Float64}) = 0x000f_ffff_ffff_ffff
×
109

110
sign_mask(::Type{Float32}) =        0x8000_0000
×
111
exponent_mask(::Type{Float32}) =    0x7f80_0000
×
112
exponent_one(::Type{Float32}) =     0x3f80_0000
×
113
exponent_half(::Type{Float32}) =    0x3f00_0000
×
114
significand_mask(::Type{Float32}) = 0x007f_ffff
×
115

116
sign_mask(::Type{Float16}) =        0x8000
×
117
exponent_mask(::Type{Float16}) =    0x7c00
×
118
exponent_one(::Type{Float16}) =     0x3c00
×
119
exponent_half(::Type{Float16}) =    0x3800
×
120
significand_mask(::Type{Float16}) = 0x03ff
×
121

122
mantissa(x::T) where {T} = reinterpret(Unsigned, x) & significand_mask(T)
2✔
123

124
for T in (Float16, Float32, Float64)
125
    sb = trailing_ones(significand_mask(T))
126
    em = exponent_mask(T)
127
    eb = Int(exponent_one(T) >> sb)
128
    @eval significand_bits(::Type{$T}) = $(sb)
×
129
    @eval exponent_bits(::Type{$T}) = $(sizeof(T)*8 - sb - 1)
×
130
    @eval exponent_bias(::Type{$T}) = $(eb)
×
131
    # maximum float exponent
132
    @eval exponent_max(::Type{$T}) = $(Int(em >> sb) - eb - 1)
×
133
    # maximum float exponent without bias
134
    @eval exponent_raw_max(::Type{$T}) = $(Int(em >> sb))
×
135
end
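
The masks above carve a bit pattern into sign, exponent, and significand fields. As a minimal sketch of that layout, the snippet below splits a `Float64` into its three raw fields. The helper name `ieee_fields` and the `*_64` constants are made up for this illustration (they are not part of Base); the mask values are copied from the `Float64` methods in this file.

```julia
# Hypothetical helper, not part of Base: split a Float64 into its raw IEEE 754 fields.
# The mask values are copied from the Float64 methods listed above.
const SIGN_MASK_64        = 0x8000_0000_0000_0000
const EXPONENT_MASK_64    = 0x7ff0_0000_0000_0000
const SIGNIFICAND_MASK_64 = 0x000f_ffff_ffff_ffff
# Same derivation as `sb` in the loop above: count the low set bits of the mask.
const SIGNIFICAND_BITS_64 = trailing_ones(SIGNIFICAND_MASK_64)

function ieee_fields(x::Float64)
    u = reinterpret(UInt64, x)
    sign = (u & SIGN_MASK_64) >> 63
    biased_exponent = (u & EXPONENT_MASK_64) >> SIGNIFICAND_BITS_64
    fraction = u & SIGNIFICAND_MASK_64
    return (sign, biased_exponent, fraction)
end

ieee_fields(1.0)    # sign 0, biased exponent 1023 (the bias itself), fraction 0
ieee_fields(-2.5)   # sign 1, biased exponent 1024, fraction 2^50 (2.5 = 1.25 * 2^1)
```

The fields come back as `UInt64` values, so they compare equal to the corresponding plain integers.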
136

137
"""
138
    exponent_max(T)
139

140
Maximum [`exponent`](@ref) value for a floating point number of type `T`.
141

142
# Examples
143
```jldoctest
144
julia> Base.exponent_max(Float64)
145
1023
146
```
147

148
Note, `exponent_max(T) + 1` is a possible value of the exponent field
149
with bias, which might be used as sentinel value for `Inf` or `NaN`.
150
"""
151
function exponent_max end
152

153
"""
154
    exponent_raw_max(T)
155

156
Maximum value of the [`exponent`](@ref) field for a floating point number of type `T` without bias,
157
i.e. the maximum integer value representable by [`exponent_bits(T)`](@ref) bits.
158
"""
159
function exponent_raw_max end
160

161
"""
162
IEEE 754 definition of the minimum exponent.
163
"""
164
ieee754_exponent_min(::Type{T}) where {T<:IEEEFloat} = Int(1 - exponent_max(T))::Int
×
165

166
exponent_min(::Type{Float16}) = ieee754_exponent_min(Float16)
×
167
exponent_min(::Type{Float32}) = ieee754_exponent_min(Float32)
×
168
exponent_min(::Type{Float64}) = ieee754_exponent_min(Float64)
×
169

170
function ieee754_representation(
×
171
    ::Type{F}, sign_bit::Bool, exponent_field::Integer, significand_field::Integer
172
) where {F<:IEEEFloat}
173
    T = uinttype(F)
×
174
    ret::T = sign_bit
×
175
    ret <<= exponent_bits(F)
×
176
    ret |= exponent_field
×
177
    ret <<= significand_bits(F)
×
178
    ret |= significand_field
×
179
end
180

181
# ±floatmax(T)
182
function ieee754_representation(
×
183
    ::Type{F}, sign_bit::Bool, ::Val{:omega}
184
) where {F<:IEEEFloat}
185
    ieee754_representation(F, sign_bit, exponent_raw_max(F) - 1, significand_mask(F))
×
186
end
187

188
# NaN or an infinity
189
function ieee754_representation(
×
190
    ::Type{F}, sign_bit::Bool, significand_field::Integer, ::Val{:nan}
191
) where {F<:IEEEFloat}
192
    ieee754_representation(F, sign_bit, exponent_raw_max(F), significand_field)
×
193
end
194

195
# NaN with default payload
196
function ieee754_representation(
×
197
    ::Type{F}, sign_bit::Bool, ::Val{:nan}
198
) where {F<:IEEEFloat}
199
    ieee754_representation(F, sign_bit, one(uinttype(F)) << (significand_bits(F) - 1), Val(:nan))
×
200
end
201

202
# Infinity
203
function ieee754_representation(
×
204
    ::Type{F}, sign_bit::Bool, ::Val{:inf}
205
) where {F<:IEEEFloat}
206
    ieee754_representation(F, sign_bit, false, Val(:nan))
×
207
end
208

209
# Subnormal or zero
210
function ieee754_representation(
×
211
    ::Type{F}, sign_bit::Bool, significand_field::Integer, ::Val{:subnormal}
212
) where {F<:IEEEFloat}
213
    ieee754_representation(F, sign_bit, false, significand_field)
×
214
end
215

216
# Zero
217
function ieee754_representation(
×
218
    ::Type{F}, sign_bit::Bool, ::Val{:zero}
219
) where {F<:IEEEFloat}
220
    ieee754_representation(F, sign_bit, false, Val(:subnormal))
×
221
end
222

223
"""
224
    uabs(x::Integer)
225

226
Return the absolute value of `x`, possibly returning a different type should the
227
operation be susceptible to overflow. This typically arises when `x` is a two's complement
228
signed integer, so that `abs(typemin(x)) == typemin(x) < 0`, in which case the result of
229
`uabs(x)` will be an unsigned integer of the same size.
230
"""
231
uabs(x::Integer) = abs(x)
×
232
uabs(x::BitSigned) = unsigned(abs(x))
×
233

234
## conversions to floating-point ##
235

236
# TODO: deprecate in 2.0
237
Float16(x::Integer) = convert(Float16, convert(Float32, x)::Float32)
×
238

239
for t1 in (Float16, Float32, Float64)
240
    for st in (Int8, Int16, Int32, Int64)
241
        @eval begin
242
            (::Type{$t1})(x::($st)) = sitofp($t1, x)
7,928✔
243
            promote_rule(::Type{$t1}, ::Type{$st}) = $t1
×
244
        end
245
    end
246
    for ut in (Bool, UInt8, UInt16, UInt32, UInt64)
247
        @eval begin
248
            (::Type{$t1})(x::($ut)) = uitofp($t1, x)
230✔
249
            promote_rule(::Type{$t1}, ::Type{$ut}) = $t1
×
250
        end
251
    end
252
end
253

254
promote_rule(::Type{Float64}, ::Type{UInt128}) = Float64
×
255
promote_rule(::Type{Float64}, ::Type{Int128}) = Float64
×
256
promote_rule(::Type{Float32}, ::Type{UInt128}) = Float32
×
257
promote_rule(::Type{Float32}, ::Type{Int128}) = Float32
×
258
promote_rule(::Type{Float16}, ::Type{UInt128}) = Float16
×
259
promote_rule(::Type{Float16}, ::Type{Int128}) = Float16
×
260

261
function Float64(x::UInt128)
×
262
    if x < UInt128(1) << 104 # Can fit it in two 52-bit mantissas
×
263
        low_exp = 0x1p52
×
264
        high_exp = 0x1p104
×
265
        low_bits = (x % UInt64) & Base.significand_mask(Float64)
×
266
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
×
267
        high_bits = ((x >> 52) % UInt64)
×
268
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
×
269
        low_value + high_value
×
270
    else # Large enough that low bits only affect rounding, pack low bits
271
        low_exp = 0x1p76
×
272
        high_exp = 0x1p128
×
273
        low_bits = ((x >> 12) % UInt64) >> 12 | (x % UInt64) & 0xFFFFFF
×
274
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
×
275
        high_bits = ((x >> 76) % UInt64)
×
276
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
×
277
        low_value + high_value
×
278
    end
279
end
280

281
function Float64(x::Int128)
×
282
    sign_bit = ((x >> 127) % UInt64) << 63
×
283
    ux = uabs(x)
×
284
    if ux < UInt128(1) << 104 # Can fit it in two 52-bit mantissas
×
285
        low_exp = 0x1p52
×
286
        high_exp = 0x1p104
×
287
        low_bits = (ux % UInt64) & Base.significand_mask(Float64)
×
288
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
×
289
        high_bits = ((ux >> 52) % UInt64)
×
290
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
×
291
        reinterpret(Float64, sign_bit | reinterpret(UInt64, low_value + high_value))
×
292
    else # Large enough that low bits only affect rounding, pack low bits
293
        low_exp = 0x1p76
×
294
        high_exp = 0x1p128
×
295
        low_bits = ((ux >> 12) % UInt64) >> 12 | (ux % UInt64) & 0xFFFFFF
×
296
        low_value = reinterpret(Float64, reinterpret(UInt64, low_exp) | low_bits) - low_exp
×
297
        high_bits = ((ux >> 76) % UInt64)
×
298
        high_value = reinterpret(Float64, reinterpret(UInt64, high_exp) | high_bits) - high_exp
×
299
        reinterpret(Float64, sign_bit | reinterpret(UInt64, low_value + high_value))
×
300
    end
301
end
302

303
function Float32(x::UInt128)
×
304
    x == 0 && return 0f0
×
305
    n = top_set_bit(x) # ndigits0z(x,2)
×
306
    if n <= 24
×
307
        y = ((x % UInt32) << (24-n)) & 0x007f_ffff
×
308
    else
309
        y = ((x >> (n-25)) % UInt32) & 0x00ff_ffff # keep 1 extra bit
×
310
        y = (y+one(UInt32))>>1 # round, ties up (extra leading bit in case of next exponent)
×
311
        y &= ~UInt32(trailing_zeros(x) == (n-25)) # fix last bit to round to even
×
312
    end
313
    d = ((n+126) % UInt32) << 23
×
314
    reinterpret(Float32, d + y)
×
315
end
316

317
function Float32(x::Int128)
×
318
    x == 0 && return 0f0
×
319
    s = ((x >>> 96) % UInt32) & 0x8000_0000 # sign bit
×
320
    x = abs(x) % UInt128
×
321
    n = top_set_bit(x) # ndigits0z(x,2)
×
322
    if n <= 24
×
323
        y = ((x % UInt32) << (24-n)) & 0x007f_ffff
×
324
    else
325
        y = ((x >> (n-25)) % UInt32) & 0x00ff_ffff # keep 1 extra bit
×
326
        y = (y+one(UInt32))>>1 # round, ties up (extra leading bit in case of next exponent)
×
327
        y &= ~UInt32(trailing_zeros(x) == (n-25)) # fix last bit to round to even
×
328
    end
329
    d = ((n+126) % UInt32) << 23
×
330
    reinterpret(Float32, s | d + y)
×
331
end
332

333
# TODO: optimize
334
Float16(x::UInt128) = convert(Float16, Float64(x))
×
335
Float16(x::Int128)  = convert(Float16, Float64(x))
×
336

337
Float16(x::Float32) = fptrunc(Float16, x)
×
338
Float16(x::Float64) = fptrunc(Float16, x)
×
339
Float32(x::Float64) = fptrunc(Float32, x)
×
340

341
Float32(x::Float16) = fpext(Float32, x)
×
342
Float64(x::Float32) = fpext(Float64, x)
×
343
Float64(x::Float16) = fpext(Float64, x)
×
344

345
AbstractFloat(x::Bool)    = Float64(x)
×
346
AbstractFloat(x::Int8)    = Float64(x)
×
347
AbstractFloat(x::Int16)   = Float64(x)
×
348
AbstractFloat(x::Int32)   = Float64(x)
×
349
AbstractFloat(x::Int64)   = Float64(x) # LOSSY
7,911✔
350
AbstractFloat(x::Int128)  = Float64(x) # LOSSY
×
351
AbstractFloat(x::UInt8)   = Float64(x)
×
352
AbstractFloat(x::UInt16)  = Float64(x)
×
353
AbstractFloat(x::UInt32)  = Float64(x)
×
354
AbstractFloat(x::UInt64)  = Float64(x) # LOSSY
12✔
355
AbstractFloat(x::UInt128) = Float64(x) # LOSSY
×
356

357
Bool(x::Float16) = x==0 ? false : x==1 ? true : throw(InexactError(:Bool, Bool, x))
×
358

359
"""
360
    float(x)
361

362
Convert a number or array to a floating point data type.
363

364
See also: [`complex`](@ref), [`oftype`](@ref), [`convert`](@ref).
365

366
# Examples
367
```jldoctest
368
julia> float(1:1000)
369
1.0:1.0:1000.0
370

371
julia> float(typemax(Int32))
372
2.147483647e9
373
```
374
"""
375
float(x) = AbstractFloat(x)
7,923✔
376

377
"""
378
    float(T::Type)
379

380
Return an appropriate type to represent a value of type `T` as a floating point value.
381
Equivalent to `typeof(float(zero(T)))`.
382

383
# Examples
384
```jldoctest
385
julia> float(Complex{Int})
386
ComplexF64 (alias for Complex{Float64})
387

388
julia> float(Int)
389
Float64
390
```
391
"""
392
float(::Type{T}) where {T<:Number} = typeof(float(zero(T)))
×
393
float(::Type{T}) where {T<:AbstractFloat} = T
×
394
float(::Type{Union{}}, slurp...) = Union{}(0.0)
×
395

396
"""
397
    unsafe_trunc(T, x)
398

399
Return the nearest integral value of type `T` whose absolute value is
400
less than or equal to the absolute value of `x`. If the value is not representable by `T`,
401
an arbitrary value will be returned.
402
See also [`trunc`](@ref).
403

404
# Examples
405
```jldoctest
406
julia> unsafe_trunc(Int, -2.2)
407
-2
408

409
julia> unsafe_trunc(Int, NaN)
410
-9223372036854775808
411
```
412
"""
413
function unsafe_trunc end
414

415
for Ti in (Int8, Int16, Int32, Int64)
416
    @eval begin
417
        unsafe_trunc(::Type{$Ti}, x::IEEEFloat) = fptosi($Ti, x)
29✔
418
    end
419
end
420
for Ti in (UInt8, UInt16, UInt32, UInt64)
421
    @eval begin
422
        unsafe_trunc(::Type{$Ti}, x::IEEEFloat) = fptoui($Ti, x)
173✔
423
    end
424
end
425

426
function unsafe_trunc(::Type{UInt128}, x::Float64)
×
427
    xu = reinterpret(UInt64,x)
×
428
    k = Int(xu >> 52) & 0x07ff - 1075
×
429
    xu = (xu & 0x000f_ffff_ffff_ffff) | 0x0010_0000_0000_0000
×
430
    if k <= 0
×
431
        UInt128(xu >> -k)
×
432
    else
433
        UInt128(xu) << k
×
434
    end
435
end
436
function unsafe_trunc(::Type{Int128}, x::Float64)
×
437
    copysign(unsafe_trunc(UInt128,x) % Int128, x)
×
438
end
439

440
function unsafe_trunc(::Type{UInt128}, x::Float32)
×
441
    xu = reinterpret(UInt32,x)
×
442
    k = Int(xu >> 23) & 0x00ff - 150
×
443
    xu = (xu & 0x007f_ffff) | 0x0080_0000
×
444
    if k <= 0
×
445
        UInt128(xu >> -k)
×
446
    else
447
        UInt128(xu) << k
×
448
    end
449
end
450
function unsafe_trunc(::Type{Int128}, x::Float32)
×
451
    copysign(unsafe_trunc(UInt128,x) % Int128, x)
×
452
end
453

454
unsafe_trunc(::Type{UInt128}, x::Float16) = unsafe_trunc(UInt128, Float32(x))
×
455
unsafe_trunc(::Type{Int128}, x::Float16) = unsafe_trunc(Int128, Float32(x))
×
456

457
# matches convert methods
458
# also determines trunc, floor, ceil
459
round(::Type{Signed},   x::IEEEFloat, r::RoundingMode) = round(Int, x, r)
×
460
round(::Type{Unsigned}, x::IEEEFloat, r::RoundingMode) = round(UInt, x, r)
×
461
round(::Type{Integer},  x::IEEEFloat, r::RoundingMode) = round(Int, x, r)
×
462

463
round(x::IEEEFloat, ::RoundingMode{:ToZero})  = trunc_llvm(x)
27✔
464
round(x::IEEEFloat, ::RoundingMode{:Down})    = floor_llvm(x)
×
465
round(x::IEEEFloat, ::RoundingMode{:Up})      = ceil_llvm(x)
20✔
466
round(x::IEEEFloat, ::RoundingMode{:Nearest}) = rint_llvm(x)
13✔
467

468
rounds_up(x, ::RoundingMode{:Down}) = false
×
469
rounds_up(x, ::RoundingMode{:Up}) = true
×
470
rounds_up(x, ::RoundingMode{:ToZero}) = signbit(x)
×
471
rounds_up(x, ::RoundingMode{:FromZero}) = !signbit(x)
×
472
function _round_convert(::Type{T}, x_integer, x, r::Union{RoundingMode{:ToZero}, RoundingMode{:FromZero}, RoundingMode{:Up}, RoundingMode{:Down}}) where {T<:AbstractFloat}
×
473
    x_t = convert(T, x_integer)
×
474
    if rounds_up(x, r)
×
475
        x_t < x ? nextfloat(x_t) : x_t
×
476
    else
477
        x_t > x ? prevfloat(x_t) : x_t
×
478
    end
479
end
480

481
## floating point promotions ##
482
promote_rule(::Type{Float32}, ::Type{Float16}) = Float32
×
483
promote_rule(::Type{Float64}, ::Type{Float16}) = Float64
×
484
promote_rule(::Type{Float64}, ::Type{Float32}) = Float64
×
485

486
widen(::Type{Float16}) = Float32
×
487
widen(::Type{Float32}) = Float64
×
488

489
## floating point arithmetic ##
490
-(x::IEEEFloat) = neg_float(x)
×
491

492
+(x::T, y::T) where {T<:IEEEFloat} = add_float(x, y)
4✔
493
-(x::T, y::T) where {T<:IEEEFloat} = sub_float(x, y)
217✔
494
*(x::T, y::T) where {T<:IEEEFloat} = mul_float(x, y)
20✔
495
/(x::T, y::T) where {T<:IEEEFloat} = div_float(x, y)
7,949✔
496

497
muladd(x::T, y::T, z::T) where {T<:IEEEFloat} = muladd_float(x, y, z)
×
498

499
# TODO: faster floating point div?
500
# TODO: faster floating point fld?
501
# TODO: faster floating point mod?
502

503
function unbiased_exponent(x::T) where {T<:IEEEFloat}
504
    return (reinterpret(Unsigned, x) & exponent_mask(T)) >> significand_bits(T)
2✔
505
end
506

507
function explicit_mantissa_noinfnan(x::T) where {T<:IEEEFloat}
508
    m = mantissa(x)
2✔
509
    issubnormal(x) || (m |= significand_mask(T) + uinttype(T)(1))
4✔
510
    return m
2✔
511
end
512

513
function _to_float(number::U, ep) where {U<:Unsigned}
514
    F = floattype(U)
×
515
    S = signed(U)
×
516
    epint = unsafe_trunc(S,ep)
1✔
517
    lz::signed(U) = unsafe_trunc(S, Core.Intrinsics.ctlz_int(number) - U(exponent_bits(F)))
1✔
518
    number <<= lz
1✔
519
    epint -= lz
1✔
520
    bits = U(0)
×
521
    if epint >= 0
1✔
522
        bits = number & significand_mask(F)
1✔
523
        bits |= ((epint + S(1)) << significand_bits(F)) & exponent_mask(F)
1✔
524
    else
525
        bits = (number >> -epint) & significand_mask(F)
×
526
    end
527
    return reinterpret(F, bits)
1✔
528
end
529

530
@assume_effects :terminates_locally :nothrow function rem_internal(x::T, y::T) where {T<:IEEEFloat}
1✔
531
    xuint = reinterpret(Unsigned, x)
1✔
532
    yuint = reinterpret(Unsigned, y)
1✔
533
    if xuint <= yuint
1✔
534
        if xuint < yuint
×
535
            return x
×
536
        end
537
        return zero(T)
×
538
    end
539

540
    e_x = unbiased_exponent(x)
1✔
541
    e_y = unbiased_exponent(y)
1✔
542
    # Most common case where |y| is "very normal" and |x/y| < 2^EXPONENT_WIDTH
543
    if e_y > (significand_bits(T)) && (e_x - e_y) <= (exponent_bits(T))
1✔
544
        m_x = explicit_mantissa_noinfnan(x)
2✔
545
        m_y = explicit_mantissa_noinfnan(y)
2✔
546
        d = urem_int((m_x << (e_x - e_y)),  m_y)
1✔
547
        iszero(d) && return zero(T)
1✔
548
        return _to_float(d, e_y - uinttype(T)(1))
1✔
549
    end
550
    # Both are subnormals
551
    if e_x == 0 && e_y == 0
×
552
        return reinterpret(T, urem_int(xuint, yuint) & significand_mask(T))
×
553
    end
554

555
    m_x = explicit_mantissa_noinfnan(x)
×
556
    e_x -= uinttype(T)(1)
×
557
    m_y = explicit_mantissa_noinfnan(y)
×
558
    lz_m_y = uinttype(T)(exponent_bits(T))
×
559
    if e_y > 0
×
560
        e_y -= uinttype(T)(1)
×
561
    else
562
        m_y = mantissa(y)
×
563
        lz_m_y = Core.Intrinsics.ctlz_int(m_y)
×
564
    end
565

566
    tz_m_y = Core.Intrinsics.cttz_int(m_y)
×
567
    sides_zeroes_cnt = lz_m_y + tz_m_y
×
568

569
    # n>0
570
    exp_diff = e_x - e_y
×
571
    # Shift hy right until the end or n = 0
572
    right_shift = min(exp_diff, tz_m_y)
×
573
    m_y >>= right_shift
×
574
    exp_diff -= right_shift
×
575
    e_y += right_shift
×
576
    # Shift hx left until the end or n = 0
577
    left_shift = min(exp_diff, uinttype(T)(exponent_bits(T)))
×
578
    m_x <<= left_shift
×
579
    exp_diff -= left_shift
×
580

581
    m_x = urem_int(m_x, m_y)
×
582
    iszero(m_x) && return zero(T)
×
583
    iszero(exp_diff) && return _to_float(m_x, e_y)
×
584

585
    while exp_diff > sides_zeroes_cnt
×
586
        exp_diff -= sides_zeroes_cnt
×
587
        m_x <<= sides_zeroes_cnt
×
588
        m_x = urem_int(m_x, m_y)
×
589
    end
×
590
    m_x <<= exp_diff
×
591
    m_x = urem_int(m_x, m_y)
×
592
    return _to_float(m_x, e_y)
×
593
end
594

595
function rem(x::T, y::T) where {T<:IEEEFloat}
596
    if isfinite(x) && !iszero(x) && isfinite(y) && !iszero(y)
13✔
597
        return copysign(rem_internal(abs(x), abs(y)), x)
13✔
598
    elseif isinf(x) || isnan(y) || iszero(y)  # y can still be Inf
×
599
        return T(NaN)
×
600
    else
601
        return x
×
602
    end
603
end
604

605
function mod(x::T, y::T) where {T<:AbstractFloat}
606
    r = rem(x,y)
1✔
607
    if r == 0
1✔
608
        copysign(r,y)
×
609
    elseif (r > 0) ⊻ (y > 0)
1✔
610
        r+y
1✔
611
    else
612
        r
×
613
    end
614
end
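
The sign conventions and special cases in `rem` and `mod` above can be checked with a few calls. This is only an illustrative sketch; the expected values in the comments follow from tracing the branches shown in the listing.

```julia
# rem keeps the sign of the dividend x; mod takes the sign of the divisor y.
rem(-1.0, 3.0)   # -1.0
mod(-1.0, 3.0)   # 2.0: r and y differ in sign, so r + y is returned
mod(1.0, -3.0)   # -2.0

# Special cases handled by the non-finite branches of rem:
isnan(rem(Inf, 1.0))   # true: infinite dividend
isnan(rem(1.0, 0.0))   # true: zero divisor
rem(1.0, Inf)          # 1.0: a finite x with an infinite y is returned unchanged
```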
615

616
## floating point comparisons ##
617
==(x::T, y::T) where {T<:IEEEFloat} = eq_float(x, y)
242✔
618
!=(x::T, y::T) where {T<:IEEEFloat} = ne_float(x, y)
147✔
619
<( x::T, y::T) where {T<:IEEEFloat} = lt_float(x, y)
577✔
620
<=(x::T, y::T) where {T<:IEEEFloat} = le_float(x, y)
37✔
621

622
isequal(x::T, y::T) where {T<:IEEEFloat} = fpiseq(x, y)
×
623

624
# interpret as sign-magnitude integer
625
@inline function _fpint(x)
626
    IntT = inttype(typeof(x))
×
627
    ix = reinterpret(IntT, x)
×
628
    return ifelse(ix < zero(IntT), ix ⊻ typemax(IntT), ix)
×
629
end
630

631
@inline function isless(a::T, b::T) where T<:IEEEFloat
632
    (isnan(a) || isnan(b)) && return !isnan(a)
×
633

634
    return _fpint(a) < _fpint(b)
×
635
end
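
To illustrate how the `isless` method above differs from the IEEE comparison `<`, here is a short sketch. It uses only exported functions; the comments describe what the code in this listing produces for each call.

```julia
# `<` follows IEEE comparison rules; `isless` defines a total order for sorting.
-0.0 < 0.0          # false: IEEE treats the two zeros as equal
isless(-0.0, 0.0)   # true: the sign-magnitude integer view orders -0.0 first
1.0 < NaN           # false: every IEEE ordering comparison with NaN is false
isless(1.0, NaN)    # true: NaN sorts after every other value
isless(NaN, 1.0)    # false
isless(NaN, NaN)    # false

# sort uses isless by default, so the result is -Inf, -0.0, 0.0, 1.0, NaN
sorted = sort([NaN, 1.0, -0.0, 0.0, -Inf])
```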
636

637
# Exact Float (Tf) vs Integer (Ti) comparisons
638
# Assumes:
639
# - typemax(Ti) == 2^n-1
640
# - typemax(Ti) can't be exactly represented by Tf:
641
#   => Tf(typemax(Ti)) == 2^n or Inf
642
# - typemin(Ti) can be exactly represented by Tf
643
#
644
# 1. convert y::Ti to float fy::Tf
645
# 2. perform Tf comparison x vs fy
646
# 3. if x == fy, check if (1) resulted in rounding:
647
#  a. convert fy back to Ti and compare with original y
648
#  b. unsafe_trunc has undefined behaviour if fy == Tf(typemax(Ti))
649
#     (but consequently x == fy > y)
650
for Ti in (Int64,UInt64,Int128,UInt128)
651
    for Tf in (Float32,Float64)
652
        @eval begin
653
            function ==(x::$Tf, y::$Ti)
654
                fy = ($Tf)(y)
2✔
655
                (x == fy) & (fy != $(Tf(typemax(Ti)))) & (y == unsafe_trunc($Ti,fy))
4✔
656
            end
657
            ==(y::$Ti, x::$Tf) = x==y
×
658

659
            function <(x::$Ti, y::$Tf)
660
                fx = ($Tf)(x)
126✔
661
                (fx < y) | ((fx == y) & ((fx == $(Tf(typemax(Ti)))) | (x < unsafe_trunc($Ti,fx)) ))
127✔
662
            end
663
            function <=(x::$Ti, y::$Tf)
664
                fx = ($Tf)(x)
3✔
665
                (fx < y) | ((fx == y) & ((fx == $(Tf(typemax(Ti)))) | (x <= unsafe_trunc($Ti,fx)) ))
3✔
666
            end
667

668
            function <(x::$Tf, y::$Ti)
669
                fy = ($Tf)(y)
184✔
670
                (x < fy) | ((x == fy) & (fy < $(Tf(typemax(Ti)))) & (unsafe_trunc($Ti,fy) < y))
184✔
671
            end
672
            function <=(x::$Tf, y::$Ti)
673
                fy = ($Tf)(y)
122✔
674
                (x < fy) | ((x == fy) & (fy < $(Tf(typemax(Ti)))) & (unsafe_trunc($Ti,fy) <= y))
122✔
675
            end
676
        end
677
    end
678
end
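
The exact comparison steps described in the comment above matter once an integer is too large to be represented exactly as a float. A small sketch, assuming nothing beyond the methods in this listing (the variable name `big_int` is only for illustration):

```julia
big_int = Int64(2)^53 + 1       # 9007199254740993, not representable as a Float64

Float64(big_int) == 2.0^53      # true: the conversion in step 1 rounds to 2^53
2.0^53 == big_int               # false: the round-trip check in step 3a detects the rounding
2.0^53 < big_int                # true

# typemax(Int64) rounds up to 2^63, which cannot be converted back (step 3b).
Float64(typemax(Int64)) == 2.0^63   # true
typemax(Int64) == 2.0^63            # false
typemax(Int64) < 2.0^63             # true
```

Comparing after a plain conversion to `Float64` would report `big_int` and `2.0^53` as equal, which is why the mixed-type methods do the extra round-trip check.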
679
for op in (:(==), :<, :<=)
680
    @eval begin
681
        ($op)(x::Float16, y::Union{Int128,UInt128,Int64,UInt64}) = ($op)(Float64(x), Float64(y))
×
682
        ($op)(x::Union{Int128,UInt128,Int64,UInt64}, y::Float16) = ($op)(Float64(x), Float64(y))
×
683

684
        ($op)(x::Union{Float16,Float32}, y::Union{Int32,UInt32}) = ($op)(Float64(x), Float64(y))
×
685
        ($op)(x::Union{Int32,UInt32}, y::Union{Float16,Float32}) = ($op)(Float64(x), Float64(y))
×
686

687
        ($op)(x::Float16, y::Union{Int16,UInt16}) = ($op)(Float32(x), Float32(y))
×
688
        ($op)(x::Union{Int16,UInt16}, y::Float16) = ($op)(Float32(x), Float32(y))
×
689
    end
690
end
691

692

693
abs(x::IEEEFloat) = abs_float(x)
14✔
694

695
"""
696
    isnan(f) -> Bool
697

698
Test whether a number value is a NaN, an indeterminate value which is neither an infinity
699
nor a finite number ("not a number").
700

701
See also: [`iszero`](@ref), [`isone`](@ref), [`isinf`](@ref), [`ismissing`](@ref).
702
"""
703
isnan(x::AbstractFloat) = (x != x)::Bool
103✔
704
isnan(x::Number) = false
×
705

706
isfinite(x::AbstractFloat) = !isnan(x - x)
101✔
707
isfinite(x::Real) = decompose(x)[3] != 0
×
708
isfinite(x::Integer) = true
×
709

710
"""
711
    isinf(f) -> Bool
712

713
Test whether a number is infinite.
714

715
See also: [`Inf`](@ref), [`iszero`](@ref), [`isfinite`](@ref), [`isnan`](@ref).
716
"""
717
isinf(x::Real) = !isnan(x) & !isfinite(x)
×
718
isinf(x::IEEEFloat) = abs(x) === oftype(x, Inf)
×
719

720
const hx_NaN = hash_uint64(reinterpret(UInt64, NaN))
721
function hash(x::Float64, h::UInt)
×
722
    # see comments on trunc and hash(Real, UInt)
723
    if typemin(Int64) <= x < typemax(Int64)
×
724
        xi = fptosi(Int64, x)
×
725
        if isequal(xi, x)
×
726
            return hash(xi, h)
×
727
        end
728
    elseif typemin(UInt64) <= x < typemax(UInt64)
×
729
        xu = fptoui(UInt64, x)
×
730
        if isequal(xu, x)
×
731
            return hash(xu, h)
×
732
        end
733
    elseif isnan(x)
×
734
        return hx_NaN ⊻ h # NaN does not have a stable bit pattern
×
735
    end
736
    return hash_uint64(bitcast(UInt64, x)) - 3h
×
737
end
738

739
hash(x::Float32, h::UInt) = hash(Float64(x), h)
×
740

741
function hash(x::Float16, h::UInt)
×
742
    # see comments on trunc and hash(Real, UInt)
743
    if isfinite(x) # all finite Float16 fit in Int64
×
744
        xi = fptosi(Int64, x)
×
745
        if isequal(xi, x)
×
746
            return hash(xi, h)
×
747
        end
748
    elseif isnan(x)
×
749
        return hx_NaN ⊻ h # NaN does not have a stable bit pattern
×
750
    end
751
    return hash_uint64(bitcast(UInt64, Float64(x))) - 3h
×
752
end
753

754
## generic hashing for rational values ##
755
function hash(x::Real, h::UInt)
×
756
    # decompose x as num*2^pow/den
757
    num, pow, den = decompose(x)
×
758

759
    # handle special values
760
    num == 0 && den == 0 && return hash(NaN, h)
×
761
    num == 0 && return hash(ifelse(den > 0, 0.0, -0.0), h)
×
762
    den == 0 && return hash(ifelse(num > 0, Inf, -Inf), h)
×
763

764
    # normalize decomposition
765
    if den < 0
×
766
        num = -num
×
767
        den = -den
×
768
    end
769
    num_z = trailing_zeros(num)
×
770
    num >>= num_z
×
771
    den_z = trailing_zeros(den)
×
772
    den >>= den_z
×
773
    pow += num_z - den_z
×
774
    # If the real can be represented as an Int64, UInt64, or Float64, hash as those types.
775
    # To be an Integer the denominator must be 1 and the power must be non-negative.
776
    if den == 1
×
777
        # left = ceil(log2(num*2^pow))
778
        left = top_set_bit(abs(num)) + pow
×
779
        # 2^-1074 is the minimum Float64 so if the power is smaller, not a Float64
780
        if -1074 <= pow
×
781
            if 0 <= pow # if pow is non-negative, it is an integer
×
782
                left <= 63 && return hash(Int64(num) << Int(pow), h)
×
783
                left <= 64 && !signbit(num) && return hash(UInt64(num) << Int(pow), h)
×
784
            end # typemin(Int64) handled by Float64 case
785
            # 2^1024 is the maximum Float64 so if the power is greater, not a Float64
786
            # Float64s only have 53 mantissa bits (including implicit bit)
787
            left <= 1024 && left - pow <= 53 && return hash(ldexp(Float64(num), pow), h)
×
788
        end
789
    else
790
        h = hash_integer(den, h)
×
791
    end
792
    # handle generic rational values
793
    h = hash_integer(pow, h)
×
794
    h = hash_integer(num, h)
×
795
    return h
×
796
end
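
The point of routing every real type through the same decomposition is that values which are `isequal` hash identically regardless of type. A minimal sketch of that property, using only exported functions:

```julia
# Numerically equal values of different types produce the same hash.
same_one  = hash(1) == hash(1.0) == hash(1//1)   # Int, Float64, Rational
same_half = hash(0.5) == hash(1//2)              # Float64 and Rational
same_f32  = hash(0.5f0) == hash(0.5)             # Float32 is hashed via Float64
```

This is what lets mixed numeric keys behave consistently in a `Dict` or `Set`: for example, `Dict(1 => "a")[1.0]` finds the entry stored under the integer key.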
797

798
#=
799
`decompose(x)`: non-canonical decomposition of rational values as `num*2^pow/den`.
800

801
The decompose function is the point where rational-valued numeric types that support
802
hashing hook into the hashing protocol. `decompose(x)` should return three integer
803
values `num, pow, den`, such that the value of `x` is mathematically equal to
804

805
    num*2^pow/den
806

807
The decomposition need not be canonical in the sense that it just needs to be *some*
808
way to express `x` in this form, not any particular way – with the restriction that
809
`num` and `den` may not share any odd common factors. They may, however, have powers
810
of two in common – the generic hashing code will normalize those as necessary.
811

812
Special values:
813

814
 - `x` is zero: `num` should be zero and `den` should have the same sign as `x`
815
 - `x` is infinite: `den` should be zero and `num` should have the same sign as `x`
816
 - `x` is not a number: `num` and `den` should both be zero
=#

decompose(x::Integer) = x, 0, 1

function decompose(x::Float16)::NTuple{3,Int}
    isnan(x) && return 0, 0, 0
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
    n = reinterpret(UInt16, x)
    s = (n & 0x03ff) % Int16
    e = ((n & 0x7c00) >> 10) % Int
    s |= Int16(e != 0) << 10
    d = ifelse(signbit(x), -1, 1)
    s, e - 25 + (e == 0), d
end

function decompose(x::Float32)::NTuple{3,Int}
    isnan(x) && return 0, 0, 0
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
    n = reinterpret(UInt32, x)
    s = (n & 0x007fffff) % Int32
    e = ((n & 0x7f800000) >> 23) % Int
    s |= Int32(e != 0) << 23
    d = ifelse(signbit(x), -1, 1)
    s, e - 150 + (e == 0), d
end

function decompose(x::Float64)::Tuple{Int64, Int, Int}
    isnan(x) && return 0, 0, 0
    isinf(x) && return ifelse(x < 0, -1, 1), 0, 0
    n = reinterpret(UInt64, x)
    s = (n & 0x000fffffffffffff) % Int64
    e = ((n & 0x7ff0000000000000) >> 52) % Int
    s |= Int64(e != 0) << 52
    d = ifelse(signbit(x), -1, 1)
    s, e - 1075 + (e == 0), d
end

"""
    precision(num::AbstractFloat; base::Integer=2)
    precision(T::Type; base::Integer=2)

Get the precision of a floating point number, as defined by the effective number of bits in
the significand, or the precision of a floating-point type `T` (its current default, if
`T` is a variable-precision type like [`BigFloat`](@ref)).

If `base` is specified, then it returns the maximum corresponding
number of significand digits in that base.

!!! compat "Julia 1.8"
    The `base` keyword requires at least Julia 1.8.
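
# Examples
The following is an illustrative sketch rather than a verified doctest; the values follow
from the definitions of `_precision_with_base_2` and `_precision` for the IEEE types.

```julia
bits64   = precision(Float64)            # 53 significand bits
bits32   = precision(1.0f0)              # 24, taken from the value's type
digits64 = precision(Float64; base=10)   # floor(53 / log2(10)) == 15
```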
"""
function precision end

_precision_with_base_2(::Type{Float16}) = 11
_precision_with_base_2(::Type{Float32}) = 24
_precision_with_base_2(::Type{Float64}) = 53
function _precision(x, base::Integer)
    base > 1 || throw(DomainError(base, "`base` cannot be less than 2."))
    p = _precision_with_base_2(x)
    return base == 2 ? Int(p) : floor(Int, p / log2(base))
end
precision(::Type{T}; base::Integer=2) where {T<:AbstractFloat} = _precision(T, base)
precision(::T; base::Integer=2) where {T<:AbstractFloat} = precision(T; base)


"""
    nextfloat(x::AbstractFloat, n::Integer)

The result of `n` iterative applications of `nextfloat` to `x` if `n >= 0`, or `-n`
applications of [`prevfloat`](@ref) if `n < 0`.
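
# Examples
An illustrative sketch (not a verified doctest) of stepping by several floats at once,
including the saturation at infinity that the `IEEEFloat` method performs:

```julia
up_two    = nextfloat(1.0, 2)                  # two steps above 1.0
down_one  = nextfloat(1.0, -1)                 # same as prevfloat(1.0)
saturated = nextfloat(floatmax(Float64), 10)   # steps past the largest finite value stop at Inf
```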
"""
function nextfloat(f::IEEEFloat, d::Integer)
    F = typeof(f)
    fumax = reinterpret(Unsigned, F(Inf))
    U = typeof(fumax)

    isnan(f) && return f
    fi = reinterpret(Signed, f)
    fneg = fi < 0
    fu = unsigned(fi & typemax(fi))

    dneg = d < 0
    da = uabs(d)
    if da > typemax(U)
        fneg = dneg
        fu = fumax
    else
        du = da % U
        if fneg ⊻ dneg
            if du > fu
                fu = min(fumax, du - fu)
                fneg = !fneg
            else
                fu = fu - du
            end
        else
            if fumax - fu < du
                fu = fumax
            else
                fu = fu + du
            end
        end
    end
    if fneg
        fu |= sign_mask(F)
    end
    reinterpret(F, fu)
end

"""
    nextfloat(x::AbstractFloat)

Return the smallest floating point number `y` of the same type as `x` such that `x < y`.
If no such `y` exists (e.g. if `x` is `Inf` or `NaN`), then return `x`.

See also: [`prevfloat`](@ref), [`eps`](@ref), [`issubnormal`](@ref).
"""
nextfloat(x::AbstractFloat) = nextfloat(x,1)

"""
    prevfloat(x::AbstractFloat, n::Integer)

The result of `n` iterative applications of `prevfloat` to `x` if `n >= 0`, or `-n`
applications of [`nextfloat`](@ref) if `n < 0`.
"""
prevfloat(x::AbstractFloat, d::Integer) = nextfloat(x, -d)

"""
    prevfloat(x::AbstractFloat)

Return the largest floating point number `y` of the same type as `x` such that `y < x`.
If no such `y` exists (e.g. if `x` is `-Inf` or `NaN`), then return `x`.
"""
prevfloat(x::AbstractFloat) = nextfloat(x,-1)

for Ti in (Int8, Int16, Int32, Int64, Int128, UInt8, UInt16, UInt32, UInt64, UInt128)
    for Tf in (Float16, Float32, Float64)
        if Ti <: Unsigned || sizeof(Ti) < sizeof(Tf)
            # Here `Tf(typemin(Ti))-1` is exact, so we can compare the lower-bound
            # directly. `Tf(typemax(Ti))+1` is either always exactly representable, or
            # rounded to `Inf` (e.g. when `Ti==UInt128 && Tf==Float32`).
            @eval begin
                function round(::Type{$Ti},x::$Tf,::RoundingMode{:ToZero})
                    if $(Tf(typemin(Ti))-one(Tf)) < x < $(Tf(typemax(Ti))+one(Tf))
                        return unsafe_trunc($Ti,x)
                    else
                        throw(InexactError(:round, $Ti, x, RoundToZero))
                    end
                end
                function (::Type{$Ti})(x::$Tf)
                    # When typemax(Ti) is not representable by Tf but typemax(Ti) + 1 is,
                    # then < Tf(typemax(Ti) + 1) is stricter than <= Tf(typemax(Ti)). Using
                    # the former causes us to throw on UInt64(Float64(typemax(UInt64))+1)
                    if ($(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti))+one(Tf))) && isinteger(x)
                        return unsafe_trunc($Ti,x)
                    else
                        throw(InexactError($(Expr(:quote,Ti.name.name)), $Ti, x))
                    end
                end
            end
        else
            # Here `eps(Tf(typemin(Ti))) > 1`, so the only value which can be truncated to
            # `Tf(typemin(Ti))` is itself. Similarly, `Tf(typemax(Ti))` is inexact and will
            # be rounded up. This assumes that `Tf(typemin(Ti)) > -Inf`, which is true for
            # these types, but not for `Float16` or larger integer types.
            @eval begin
                function round(::Type{$Ti},x::$Tf,::RoundingMode{:ToZero})
                    if $(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti)))
                        return unsafe_trunc($Ti,x)
                    else
                        throw(InexactError(:round, $Ti, x, RoundToZero))
                    end
                end
                function (::Type{$Ti})(x::$Tf)
                    if ($(Tf(typemin(Ti))) <= x < $(Tf(typemax(Ti)))) && isinteger(x)
                        return unsafe_trunc($Ti,x)
                    else
                        throw(InexactError($(Expr(:quote,Ti.name.name)), $Ti, x))
                    end
                end
            end
        end
    end
end

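#=
Illustration (not part of the original source): the bounds checks generated by the loop
above decide when a float converts to an integer type and when an `InexactError` is
thrown. A minimal sketch using only exported functions; the helper name `throws_inexact`
is hypothetical.

```julia
# Hypothetical helper: report whether calling `f` raises an `InexactError`.
function throws_inexact(f)
    try
        f()
        return false
    catch err
        return err isa InexactError
    end
end

lowest    = Int64(-2.0^63)                        # typemin(Int64) is exactly representable
too_big   = throws_inexact(() -> Int64(2.0^63))   # Float64(typemax(Int64)) rounds up to 2^63
truncated = round(Int8, 127.9, RoundToZero)       # inside the open interval (-129, 128)
overflow  = throws_inexact(() -> round(Int8, 128.0, RoundToZero))
fraction  = throws_inexact(() -> Int8(1.5))       # non-integer values are rejected
```
=#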
"""
    issubnormal(f) -> Bool

Test whether a floating point number is subnormal.

An IEEE floating point number is [subnormal](https://en.wikipedia.org/wiki/Subnormal_number)
when its exponent bits are zero and its significand is not zero.

# Examples
```jldoctest
julia> floatmin(Float32)
1.1754944f-38

julia> issubnormal(1.0f-37)
false

julia> issubnormal(1.0f-38)
true
```
"""
function issubnormal(x::T) where {T<:IEEEFloat}
    y = reinterpret(Unsigned, x)
    (y & exponent_mask(T) == 0) & (y & significand_mask(T) != 0)
end

ispow2(x::AbstractFloat) = !iszero(x) && frexp(x)[1] == 0.5
iseven(x::AbstractFloat) = isinteger(x) && (abs(x) > maxintfloat(x) || iseven(Integer(x)))
isodd(x::AbstractFloat) = isinteger(x) && abs(x) ≤ maxintfloat(x) && isodd(Integer(x))

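#=
Illustration (not part of the original source): how the float predicates above behave.
Finite floats with magnitude above `maxintfloat` are always even integers, which is why
`iseven` short-circuits for them. A minimal sketch:

```julia
pow2_quarter = ispow2(0.25)     # frexp(0.25) == (0.5, -1), so this is a power of two
pow2_three   = ispow2(3.0)      # frexp(3.0) == (0.75, 2)
even_huge    = iseven(2.0^60)   # above maxintfloat(Float64) == 2.0^53
odd_three    = isodd(3.0)
even_frac    = iseven(2.5)      # non-integers are neither even nor odd
odd_frac     = isodd(2.5)
```
=#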
@eval begin
    typemin(::Type{Float16}) = $(bitcast(Float16, 0xfc00))
    typemax(::Type{Float16}) = $(Inf16)
    typemin(::Type{Float32}) = $(-Inf32)
    typemax(::Type{Float32}) = $(Inf32)
    typemin(::Type{Float64}) = $(-Inf64)
    typemax(::Type{Float64}) = $(Inf64)
    typemin(x::T) where {T<:Real} = typemin(T)
    typemax(x::T) where {T<:Real} = typemax(T)

    floatmin(::Type{Float16}) = $(bitcast(Float16, 0x0400))
    floatmin(::Type{Float32}) = $(bitcast(Float32, 0x00800000))
    floatmin(::Type{Float64}) = $(bitcast(Float64, 0x0010000000000000))
    floatmax(::Type{Float16}) = $(bitcast(Float16, 0x7bff))
    floatmax(::Type{Float32}) = $(bitcast(Float32, 0x7f7fffff))
    floatmax(::Type{Float64}) = $(bitcast(Float64, 0x7fefffffffffffff))

    eps(::Type{Float16}) = $(bitcast(Float16, 0x1400))
    eps(::Type{Float32}) = $(bitcast(Float32, 0x34000000))
    eps(::Type{Float64}) = $(bitcast(Float64, 0x3cb0000000000000))
    eps() = eps(Float64)
end

eps(x::AbstractFloat) = isfinite(x) ? abs(x) >= floatmin(x) ? ldexp(eps(typeof(x)), exponent(x)) : nextfloat(zero(x)) : oftype(x, NaN)

function eps(x::T) where T<:IEEEFloat
    # For isfinite(x), toggling the LSB will produce either prevfloat(x) or
    # nextfloat(x) but will never change the sign or exponent.
    # For !isfinite(x), this will map Inf to NaN and NaN to NaN or Inf.
    y = reinterpret(T, reinterpret(Unsigned, x) ⊻ true)
    # The absolute difference between these values is eps(x). This is true even
    # for Inf/NaN values.
    return abs(x - y)
end

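#=
Illustration (not part of the original source): the bit-level `eps` method above flips the
least significant bit of the representation and measures the distance to the result. A
minimal sketch of the same idea for `Float64`; `ulp_by_bitflip` is a hypothetical name.

```julia
# Hypothetical re-implementation of the LSB-toggle idea, restricted to Float64.
function ulp_by_bitflip(x::Float64)
    neighbour = reinterpret(Float64, reinterpret(UInt64, x) ⊻ 0x0000000000000001)
    return abs(x - neighbour)
end

samples = (1.0, 2.0, prevfloat(2.0), floatmin(Float64), 5.0e-324, -3.5)
agrees  = all(ulp_by_bitflip(s) == eps(s) for s in samples)
```
=#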
"""
    floatmin(T = Float64)

Return the smallest positive normal number representable by the floating-point
type `T`.

# Examples
```jldoctest
julia> floatmin(Float16)
Float16(6.104e-5)

julia> floatmin(Float32)
1.1754944f-38

julia> floatmin()
2.2250738585072014e-308
```
"""
floatmin(x::T) where {T<:AbstractFloat} = floatmin(T)

"""
    floatmax(T = Float64)

Return the largest finite number representable by the floating-point type `T`.

See also: [`typemax`](@ref), [`floatmin`](@ref), [`eps`](@ref).

# Examples
```jldoctest
julia> floatmax(Float16)
Float16(6.55e4)

julia> floatmax(Float32)
3.4028235f38

julia> floatmax()
1.7976931348623157e308

julia> typemax(Float64)
Inf
```
"""
floatmax(x::T) where {T<:AbstractFloat} = floatmax(T)

floatmin() = floatmin(Float64)
floatmax() = floatmax(Float64)

"""
    eps(::Type{T}) where T<:AbstractFloat
    eps()

Return the *machine epsilon* of the floating point type `T` (`T = Float64` by
default). This is defined as the gap between 1 and the next larger value representable by
`typeof(one(T))`, and is equivalent to `eps(one(T))`.  (Since `eps(T)` is a
bound on the *relative error* of `T`, it is a "dimensionless" quantity like [`one`](@ref).)

# Examples
```jldoctest
julia> eps()
2.220446049250313e-16

julia> eps(Float32)
1.1920929f-7

julia> 1.0 + eps()
1.0000000000000002

julia> 1.0 + eps()/2
1.0
```
"""
eps(::Type{<:AbstractFloat})

"""
    eps(x::AbstractFloat)

Return the *unit in last place* (ulp) of `x`. This is the distance between consecutive
representable floating point values at `x`. In most cases, if the distances on either side
of `x` differ, then the larger of the two is taken, that is

    eps(x) == max(x-prevfloat(x), nextfloat(x)-x)

The exceptions to this rule are the smallest and largest finite values
(e.g. `nextfloat(-Inf)` and `prevfloat(Inf)` for [`Float64`](@ref)), which round to the
smaller of the values.

The rationale for this behavior is that `eps` bounds the floating point rounding
error. Under the default `RoundNearest` rounding mode, if ``y`` is a real number and ``x``
is the nearest floating point number to ``y``, then

```math
|y-x| \\leq \\operatorname{eps}(x)/2.
```

See also: [`nextfloat`](@ref), [`issubnormal`](@ref), [`floatmax`](@ref).

# Examples
```jldoctest
julia> eps(1.0)
2.220446049250313e-16

julia> eps(prevfloat(2.0))
2.220446049250313e-16

julia> eps(2.0)
4.440892098500626e-16

julia> x = prevfloat(Inf)      # largest finite Float64
1.7976931348623157e308

julia> x + eps(x)/2            # rounds up
Inf

julia> x + prevfloat(eps(x)/2) # rounds down
1.7976931348623157e308
```
"""
eps(::AbstractFloat)

## byte order swaps for arbitrary-endianness serialization/deserialization ##
bswap(x::IEEEFloat) = bswap_int(x)

# integer size of float
uinttype(::Type{Float64}) = UInt64
uinttype(::Type{Float32}) = UInt32
uinttype(::Type{Float16}) = UInt16
inttype(::Type{Float64}) = Int64
inttype(::Type{Float32}) = Int32
inttype(::Type{Float16}) = Int16
# float size of integer
floattype(::Type{UInt64}) = Float64
floattype(::Type{UInt32}) = Float32
floattype(::Type{UInt16}) = Float16
floattype(::Type{Int64}) = Float64
floattype(::Type{Int32}) = Float32
floattype(::Type{Int16}) = Float16


## Array operations on floating point numbers ##

float(A::AbstractArray{<:AbstractFloat}) = A

function float(A::AbstractArray{T}) where T
    if !isconcretetype(T)
        error("`float` not defined on abstractly-typed arrays; please convert to a more specific type")
    end
    convert(AbstractArray{typeof(float(zero(T)))}, A)
end

float(r::StepRange) = float(r.start):float(r.step):float(last(r))
float(r::UnitRange) = float(r.start):float(last(r))
float(r::StepRangeLen{T}) where {T} =
    StepRangeLen{typeof(float(T(r.ref)))}(float(r.ref), float(r.step), length(r), r.offset)
function float(r::LinRange)
    LinRange(float(r.start), float(r.stop), length(r))
end
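#=
Illustration (not part of the original source): the `float` methods above convert arrays
and ranges while keeping their container kind. A minimal sketch:

```julia
v   = float([1, 2, 3])    # Vector{Int} becomes Vector{Float64}
a32 = [1.0f0, 2.0f0]
fv  = float(a32)          # already floating point, so the same array is returned
r   = float(1:3)          # an integer range becomes a floating-point range
```
=#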