24944226215

Committed 26 Apr 2026 12:27AM UTC coverage: 84.328% (+0.01%) from 84.317%

Build # 24944226215

Build Type

push

github

Committed by

web-flow

Commit Message

feat(math): add sw::math::constexpr_math facility with constexpr log2 (#762)

* feat(math): add sw::math::constexpr_math facility with constexpr log2

Introduces a new self-contained library of constexpr math functions for
IEEE-754 float and double, parallel to the existing sw::math::function
facility but specifically for compile-time evaluation. Per the issue
brief: "constexpr is a slightly different use case than templated math
functions ... it would be nice to have this as a separate and
specialized include set ... independently useful outside of Universal's
number systems".

Layout:
  include/sw/math/constexpr_math.hpp       -- aggregator
  include/sw/math/constexpr_math/log2.hpp  -- base-2 logarithm
  internal/constexpr_math/api/log2.cpp     -- regression test (cm_log2)

Namespace: sw::math::constexpr_math

Algorithm (log2):
  1. Decompose x = 2^E * M with M in [1, 2) via std::bit_cast<uint64_t>.
  2. Range-reduce to M in [1/sqrt(2), sqrt(2)] by halving M and
     incrementing E when M > sqrt(2).
  3. With u = (M-1)/(M+1), |u| <= 0.1716, evaluate
       ln(M) = 2 * artanh(u) = 2 * (u + u^3/3 + u^5/5 + ...)
     via Horner. Coefficients are odd reciprocals (1/3, 1/5, ...) --
     derivable from first principles, no external tooling
     (no Sollya, no Remez).
  4. log2(x) = E + ln(M) / ln(2)

Polynomial degree:
  - double: degree 21 (terms up to u^21/21) -> ~5e-17 truncation error
  - float : degree 11 (terms up to u^11/11) -> ~1e-9 truncation error

Special-value handling follows IEEE-754:
  log2(NaN)  -> NaN
  log2(x<0)  -> NaN
  log2(0)    -> -infinity
  log2(+inf) -> +infinity

Subnormals are normalized via a software shift loop (constexpr-safe).

Verification:
  - 22 static_assert checks for exact powers of 2, special values,
    log2(3.14) and log2(0.75) acceptance forms
  - Runtime sweep over 23 hand-picked points + subnormal sample
  - Out-of-band stress sweep (100K random samples in [1e-100, 1e100]):
    max relative error 1.74e-16 = 0.78... (continued)

Coverage Stats

90 of 104 new or added lines in 2 files covered. (86.54%)

4 existing lines in 3 files now uncovered.

45103 of 53485 relevant lines covered (84.33%)

6403422.69 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

91.55

/include/sw/math/constexpr_math/log2.hpp

#pragma once
// log2.hpp: constexpr base-2 logarithm for IEEE-754 float and double
//
// Copyright (C) 2017 Stillwater Supercomputing, Inc.
// SPDX-License-Identifier: MIT
//
// This file is part of the universal numbers project, which is released under an MIT Open Source license.
//
// -----------------------------------------------------------------------------
// Algorithm
// -----------------------------------------------------------------------------
// Given x > 0, IEEE-754 representation gives x = 2^E * M with M in [1, 2).
// We further range-reduce so that M is in [1/sqrt(2), sqrt(2)] by halving M
// and incrementing E when M > sqrt(2). Then with
//
//     u = (M - 1) / (M + 1),   |u| <= (sqrt(2) - 1)/(sqrt(2) + 1) ~ 0.1716
//
// the identity
//
//     ln(M) = 2 * artanh(u) = 2 * (u + u^3/3 + u^5/5 + u^7/7 + ...)
//
// converges very quickly because |u^2| <= 0.0294. The series uses only odd
// reciprocals (1, 1/3, 1/5, ...) so the coefficients are derivable from first
// principles without any external tooling (no Sollya, no Remez, no minimax).
//
// Finally  log2(x) = E + ln(M) / ln(2).
//
// Polynomial degree is chosen so the truncation error is below 1 ulp:
//   - float : degree 11 (terms up to u^11 / 11) gives ~1e-9 error
//   - double: degree 21 (terms up to u^21 / 21) gives ~5e-17 error
//
// -----------------------------------------------------------------------------
// Why is_constant_evaluated dispatch is NOT needed
// -----------------------------------------------------------------------------
// std::bit_cast (C++20) is constexpr for trivially-copyable types. Floating-
// point arithmetic in constexpr is allowed since C++14 on gcc/clang. So the
// same code path works at compile time and at runtime.

#include <bit>
#include <cstdint>
#include <limits>
#include <type_traits>

namespace sw { namespace math { namespace constexpr_math {

namespace detail {

// IEEE-754 layout guards: this implementation decodes fixed exponent/mantissa
// bit positions via std::bit_cast. On platforms whose floating-point format is
// not IEC 559 (e.g., IBM hex float on z/Architecture, VAX), the bit pattern is
// different and the algorithm would silently produce wrong results. Fail at
// compile time rather than at runtime.
static_assert(std::numeric_limits<float>::is_iec559,
              "sw::math::constexpr_math requires IEEE-754 single-precision (IEC 559) float");
static_assert(std::numeric_limits<float>::digits == 24,
              "sw::math::constexpr_math requires float with 24-bit significand (1 implicit + 23 stored)");
static_assert(std::numeric_limits<double>::is_iec559,
              "sw::math::constexpr_math requires IEEE-754 double-precision (IEC 559) double");
static_assert(std::numeric_limits<double>::digits == 53,
              "sw::math::constexpr_math requires double with 53-bit significand (1 implicit + 52 stored)");

// 1 / ln(2), to full double precision. Hardcoded as the well-known value
// log2(e) = 1.4426950408889634073599246810018921... (cf. C99 M_LOG2E).
// Verifiable independently by computing ln(2) via the same atanh series and
// taking the reciprocal.
inline constexpr double LOG2E = 1.4426950408889634;
inline constexpr double SQRT2 = 1.4142135623730951;  // sqrt(2)

}  // namespace detail

// constexpr log2(double): returns the base-2 logarithm of x.
// Special values follow IEEE-754 conventions:
//   log2(NaN)  -> NaN
//   log2(x<0)  -> NaN
//   log2(0)    -> -infinity
//   log2(+inf) -> +infinity
constexpr double log2(double x) {
        if (x != x) return x;                                                // NaN propagation
        if (x < 0.0) return std::numeric_limits<double>::quiet_NaN();
        if (x == 0.0) return -std::numeric_limits<double>::infinity();
        if (x == std::numeric_limits<double>::infinity()) return x;

        // Decompose x into 2^E * M with M in [1, 2).
        std::uint64_t bits = std::bit_cast<std::uint64_t>(x);
        int exp_field = static_cast<int>((bits >> 52) & 0x7FFu);

        // Subnormal: normalize by counting how many bits we need to shift the
        // mantissa left so its MSB sits at bit 52 (the implicit-1 position).
        int subnormal_shift = 0;
        if (exp_field == 0) {
                std::uint64_t mant = bits & ((std::uint64_t{1} << 52) - 1);
                // mant != 0 because x != 0 (handled above)
                while ((mant & (std::uint64_t{1} << 52)) == 0) {
                        mant <<= 1;
                        ++subnormal_shift;
                }
                bits = mant;
                exp_field = 1;  // base exponent for the renormalized mantissa
        }

        int E = exp_field - 1023 - subnormal_shift;

        // Reconstruct M in [1, 2) by clearing the exponent field and setting it
        // to bias (1023). Sign bit is necessarily zero (we already returned for x<0).
        std::uint64_t m_bits = (bits & ((std::uint64_t{1} << 52) - 1))
                               | (std::uint64_t{1023} << 52);
        double M = std::bit_cast<double>(m_bits);

        // Range reduction: shift M into [1/sqrt(2), sqrt(2)] for fast convergence.
        if (M > detail::SQRT2) {
                M *= 0.5;
                E += 1;
        }

        // u = (M - 1) / (M + 1), |u| <= 0.1716
        double u = (M - 1.0) / (M + 1.0);
        double u2 = u * u;

        // Horner evaluation of the artanh series
        //   artanh(u) = u * (1 + u^2/3 + u^4/5 + ... + u^20/21)
        // from the smallest term down. Degree 21 -> ~5e-17 error.
        double s = 1.0 / 21.0;
        s = u2 * s + 1.0 / 19.0;
        s = u2 * s + 1.0 / 17.0;
        s = u2 * s + 1.0 / 15.0;
        s = u2 * s + 1.0 / 13.0;
        s = u2 * s + 1.0 / 11.0;
        s = u2 * s + 1.0 / 9.0;
        s = u2 * s + 1.0 / 7.0;
        s = u2 * s + 1.0 / 5.0;
        s = u2 * s + 1.0 / 3.0;
        s = u2 * s + 1.0;
        double artanh = u * s;

        // log2(M) = (2 * artanh(u)) * (1 / ln(2))
        double log2M = 2.0 * artanh * detail::LOG2E;

        return static_cast<double>(E) + log2M;
}

// constexpr log2(float): same algorithm at lower precision (degree 11 series).
constexpr float log2(float x) {
        if (x != x) return x;
        if (x < 0.0f) return std::numeric_limits<float>::quiet_NaN();
        if (x == 0.0f) return -std::numeric_limits<float>::infinity();
        if (x == std::numeric_limits<float>::infinity()) return x;

        std::uint32_t bits = std::bit_cast<std::uint32_t>(x);
        int exp_field = static_cast<int>((bits >> 23) & 0xFFu);

        int subnormal_shift = 0;
        if (exp_field == 0) {
                std::uint32_t mant = bits & ((std::uint32_t{1} << 23) - 1);
                while ((mant & (std::uint32_t{1} << 23)) == 0) {
                        mant <<= 1;
                        ++subnormal_shift;
                }
                bits = mant;
                exp_field = 1;
        }

        int E = exp_field - 127 - subnormal_shift;

        std::uint32_t m_bits = (bits & ((std::uint32_t{1} << 23) - 1))
                               | (std::uint32_t{127} << 23);
        float M = std::bit_cast<float>(m_bits);

        if (M > static_cast<float>(detail::SQRT2)) {
                M *= 0.5f;
                E += 1;
        }

        float u = (M - 1.0f) / (M + 1.0f);
        float u2 = u * u;

        // Degree 11: ~1e-9 error, well below float epsilon (~1.2e-7).
        float s = 1.0f / 11.0f;
        s = u2 * s + 1.0f / 9.0f;
        s = u2 * s + 1.0f / 7.0f;
        s = u2 * s + 1.0f / 5.0f;
        s = u2 * s + 1.0f / 3.0f;
        s = u2 * s + 1.0f;
        float artanh = u * s;

        float log2M = 2.0f * artanh * static_cast<float>(detail::LOG2E);

        return static_cast<float>(E) + log2M;
}

}}}  // namespace sw::math::constexpr_math

1	#pragma once
2	// log2.hpp: constexpr base-2 logarithm for IEEE-754 float and double
3	//
4	// Copyright (C) 2017 Stillwater Supercomputing, Inc.
5	// SPDX-License-Identifier: MIT
6	//
7	// This file is part of the universal numbers project, which is released under an MIT Open Source license.
8	//
9	// -----------------------------------------------------------------------------
10	// Algorithm
11	// -----------------------------------------------------------------------------
12	// Given x > 0, IEEE-754 representation gives x = 2^E * M with M in [1, 2).
13	// We further range-reduce so that M is in [1/sqrt(2), sqrt(2)] by halving M
14	// and incrementing E when M > sqrt(2). Then with
15	//
16	// u = (M - 1) / (M + 1), \|u\| <= (sqrt(2) - 1)/(sqrt(2) + 1) ~ 0.1716
17	//
18	// the identity
19	//
20	// ln(M) = 2 * artanh(u) = 2 * (u + u^3/3 + u^5/5 + u^7/7 + ...)
21	//
22	// converges very quickly because \|u^2\| <= 0.0294. The series uses only odd
23	// reciprocals (1, 1/3, 1/5, ...) so the coefficients are derivable from first
24	// principles without any external tooling (no Sollya, no Remez, no minimax).
25	//
26	// Finally log2(x) = E + ln(M) / ln(2).
27	//
28	// Polynomial degree is chosen so the truncation error is below 1 ulp:
29	// - float : degree 11 (terms up to u^11 / 11) gives ~1e-9 error
30	// - double: degree 21 (terms up to u^21 / 21) gives ~5e-17 error
31	//
32	// -----------------------------------------------------------------------------
33	// Why is_constant_evaluated dispatch is NOT needed
34	// -----------------------------------------------------------------------------
35	// std::bit_cast (C++20) is constexpr for trivially-copyable types. Floating-
36	// point arithmetic in constexpr is allowed since C++14 on gcc/clang. So the
37	// same code path works at compile time and at runtime.
38
39	#include <bit>
40	#include <cstdint>
41	#include <limits>
42	#include <type_traits>
43
44	namespace sw { namespace math { namespace constexpr_math {
45
46	namespace detail {
47
48	// IEEE-754 layout guards: this implementation decodes fixed exponent/mantissa
49	// bit positions via std::bit_cast. On platforms whose floating-point format is
50	// not IEC 559 (e.g., IBM hex float on z/Architecture, VAX), the bit pattern is
51	// different and the algorithm would silently produce wrong results. Fail at
52	// compile time rather than at runtime.
53	static_assert(std::numeric_limits<float>::is_iec559,
54	"sw::math::constexpr_math requires IEEE-754 single-precision (IEC 559) float");
55	static_assert(std::numeric_limits<float>::digits == 24,
56	"sw::math::constexpr_math requires float with 24-bit significand (1 implicit + 23 stored)");
57	static_assert(std::numeric_limits<double>::is_iec559,
58	"sw::math::constexpr_math requires IEEE-754 double-precision (IEC 559) double");
59	static_assert(std::numeric_limits<double>::digits == 53,
60	"sw::math::constexpr_math requires double with 53-bit significand (1 implicit + 52 stored)");
61
62	// 1 / ln(2), to full double precision. Hardcoded as the well-known value
63	// log2(e) = 1.4426950408889634073599246810018921... (cf. C99 M_LOG2E).
64	// Verifiable independently by computing ln(2) via the same atanh series and
65	// taking the reciprocal.
66	inline constexpr double LOG2E = 1.4426950408889634;
67	inline constexpr double SQRT2 = 1.4142135623730951; // sqrt(2)
68
69	} // namespace detail
70
71	// constexpr log2(double): returns the base-2 logarithm of x.
72	// Special values follow IEEE-754 conventions:
73	// log2(NaN) -> NaN
74	// log2(x<0) -> NaN
75	// log2(0) -> -infinity
76	// log2(+inf) -> +infinity
77	constexpr double log2(double x) {	24✔
78	if (x != x) return x; // NaN propagation	24✔
79	if (x < 0.0) return std::numeric_limits<double>::quiet_NaN();	24✔
80	if (x == 0.0) return -std::numeric_limits<double>::infinity();	24✔
81	if (x == std::numeric_limits<double>::infinity()) return x;	24✔
82
83	// Decompose x into 2^E * M with M in [1, 2).
84	std::uint64_t bits = std::bit_cast<std::uint64_t>(x);	24✔
85	int exp_field = static_cast<int>((bits >> 52) & 0x7FFu);	24✔
86
87	// Subnormal: normalize by counting how many bits we need to shift the
88	// mantissa left so its MSB sits at bit 52 (the implicit-1 position).
89	int subnormal_shift = 0;	24✔
90	if (exp_field == 0) {	24✔
91	std::uint64_t mant = bits & ((std::uint64_t{1} << 52) - 1);	1✔
92	// mant != 0 because x != 0 (handled above)
93	while ((mant & (std::uint64_t{1} << 52)) == 0) {	3✔
94	mant <<= 1;	2✔
95	++subnormal_shift;	2✔
96	}
97	bits = mant;	1✔
98	exp_field = 1; // base exponent for the renormalized mantissa	1✔
99	}
100
101	int E = exp_field - 1023 - subnormal_shift;	24✔
102
103	// Reconstruct M in [1, 2) by clearing the exponent field and setting it
104	// to bias (1023). Sign bit is necessarily zero (we already returned for x<0).
105	std::uint64_t m_bits = (bits & ((std::uint64_t{1} << 52) - 1))	24✔
106	\| (std::uint64_t{1023} << 52);	24✔
107	double M = std::bit_cast<double>(m_bits);	24✔
108
109	// Range reduction: shift M into [1/sqrt(2), sqrt(2)] for fast convergence.
110	if (M > detail::SQRT2) {	24✔
111	M *= 0.5;	11✔
112	E += 1;	11✔
113	}
114
115	// u = (M - 1) / (M + 1), \|u\| <= 0.1716
116	double u = (M - 1.0) / (M + 1.0);	24✔
117	double u2 = u * u;	24✔
118
119	// Horner evaluation of the artanh series
120	// artanh(u) = u * (1 + u^2/3 + u^4/5 + ... + u^20/21)
121	// from the smallest term down. Degree 21 -> ~5e-17 error.
122	double s = 1.0 / 21.0;	24✔
123	s = u2 * s + 1.0 / 19.0;	24✔
124	s = u2 * s + 1.0 / 17.0;	24✔
125	s = u2 * s + 1.0 / 15.0;	24✔
126	s = u2 * s + 1.0 / 13.0;	24✔
127	s = u2 * s + 1.0 / 11.0;	24✔
128	s = u2 * s + 1.0 / 9.0;	24✔
129	s = u2 * s + 1.0 / 7.0;	24✔
130	s = u2 * s + 1.0 / 5.0;	24✔
131	s = u2 * s + 1.0 / 3.0;	24✔
132	s = u2 * s + 1.0;	24✔
133	double artanh = u * s;	24✔
134
135	// log2(M) = (2 * artanh(u)) * (1 / ln(2))
136	double log2M = 2.0 * artanh * detail::LOG2E;	24✔
137
138	return static_cast<double>(E) + log2M;	24✔
139	}
140
141	// constexpr log2(float): same algorithm at lower precision (degree 11 series).
142	constexpr float log2(float x) {	9✔
143	if (x != x) return x;	9✔
144	if (x < 0.0f) return std::numeric_limits<float>::quiet_NaN();	9✔
145	if (x == 0.0f) return -std::numeric_limits<float>::infinity();	9✔
146	if (x == std::numeric_limits<float>::infinity()) return x;	9✔
147
148	std::uint32_t bits = std::bit_cast<std::uint32_t>(x);	9✔
149	int exp_field = static_cast<int>((bits >> 23) & 0xFFu);	9✔
150
151	int subnormal_shift = 0;	9✔
152	if (exp_field == 0) {	9✔
NEW 153	std::uint32_t mant = bits & ((std::uint32_t{1} << 23) - 1);	×
NEW 154	while ((mant & (std::uint32_t{1} << 23)) == 0) {	×
NEW 155	mant <<= 1;	×
NEW 156	++subnormal_shift;	×
157	}
NEW 158	bits = mant;	×
NEW 159	exp_field = 1;	×
160	}
161
162	int E = exp_field - 127 - subnormal_shift;	9✔
163
164	std::uint32_t m_bits = (bits & ((std::uint32_t{1} << 23) - 1))	9✔
165	\| (std::uint32_t{127} << 23);	9✔
166	float M = std::bit_cast<float>(m_bits);	9✔
167
168	if (M > static_cast<float>(detail::SQRT2)) {	9✔
169	M *= 0.5f;	4✔
170	E += 1;	4✔
171	}
172
173	float u = (M - 1.0f) / (M + 1.0f);	9✔
174	float u2 = u * u;	9✔
175
176	// Degree 11: ~1e-9 error, well below float epsilon (~1.2e-7).
177	float s = 1.0f / 11.0f;	9✔
178	s = u2 * s + 1.0f / 9.0f;	9✔
179	s = u2 * s + 1.0f / 7.0f;	9✔
180	s = u2 * s + 1.0f / 5.0f;	9✔
181	s = u2 * s + 1.0f / 3.0f;	9✔
182	s = u2 * s + 1.0f;	9✔
183	float artanh = u * s;	9✔
184
185	float log2M = 2.0f * artanh * static_cast<float>(detail::LOG2E);	9✔
186
187	return static_cast<float>(E) + log2M;	9✔
188	}
189
190	}}} // namespace sw::math::constexpr_math

stillwater-sc / universal / 24944226215

Source File Press 'n' to go to next uncovered line, 'b' for previous

Source File
Press 'n' to go to next uncovered line, 'b' for previous