85%
main: 85%

Ran 09 Feb 2026 06:11PM UTC

Files 13

Run time 0s

Committed 09 Feb 2026 06:10PM UTC

Coverage: 84.812% (+0.01%) from 84.802%

Job # 21835834301.1

Build Type

push

github

Committed by

web-flow

Commit Message

Rewrite to_lower/1 using multi-clause pattern matching for a 1.05x-1.32x speedup (#114)

Replace the binary comprehension approach with cascading pattern match
clauses (32-16-8-4-1 bytes), eliminating the expensive remainder-based
dispatch that dominated runtime for most input sizes.

-- Previous implementation

The old to_lower/1 delegated to to_lower_chunk/1 which selected a binary
comprehension chunk size based on divisibility:

    to_lower_chunk(Data) when byte_size(Data) rem 8 =:= 0 ->
        << <<lower(B0), ..., lower(B7)>> || <<B0,...,B7>> <= Data >>;
    to_lower_chunk(Data) when byte_size(Data) rem 7 =:= 0 -> ...
    to_lower_chunk(Data) when byte_size(Data) rem 6 =:= 0 -> ...
    ...down to rem 2, then a 1-byte fallback.
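
To make the shape of the old dispatch concrete, here is a reduced, compilable sketch. It is not the original code: the module name and the ASCII-only `lower/1` helper are assumptions for illustration, and the chain is cut down to chunk sizes 4, 3, 2, and 1 instead of 8 down to 1.

```erlang
-module(to_lower_old_sketch).
-export([to_lower_chunk/1]).

%% Hypothetical stand-in for the per-byte helper (ASCII A-Z only).
lower(B) when B >= $A, B =< $Z -> B + 32;
lower(B) -> B.

%% Each guard tests divisibility of the whole input, then a binary
%% comprehension consumes fixed-size chunks. Calls inside the bit
%% syntax must be parenthesised: <<(lower(B))>>.
to_lower_chunk(Data) when byte_size(Data) rem 4 =:= 0 ->
    << <<(lower(B0)), (lower(B1)), (lower(B2)), (lower(B3))>>
       || <<B0, B1, B2, B3>> <= Data >>;
to_lower_chunk(Data) when byte_size(Data) rem 3 =:= 0 ->
    << <<(lower(B0)), (lower(B1)), (lower(B2))>>
       || <<B0, B1, B2>> <= Data >>;
to_lower_chunk(Data) when byte_size(Data) rem 2 =:= 0 ->
    << <<(lower(B0)), (lower(B1))>> || <<B0, B1>> <= Data >>;
to_lower_chunk(Data) ->
    %% 1-byte fallback for sizes that none of the guards accept.
    << <<(lower(B))>> || <<B>> <= Data >>.
```

In this sketch a 5-byte input such as `<<"HeLLo">>` fails all three `rem` guards before reaching the 1-byte fallback, which is the behaviour the rewrite targets.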

-- New implementation

    do_to_lower(<<B0,...,B31, Rest/binary>>, Acc) ->  %% 32 bytes
        ...
    do_to_lower(<<B0,...,B15, Rest/binary>>, Acc) ->  %% 16 bytes
        ...
    do_to_lower(<<B0,...,B7, Rest/binary>>, Acc) ->   %% 8 bytes
        ...
    do_to_lower(<<B0,B1,B2,B3, Rest/binary>>, Acc) -> %% 4 bytes
        ...
    do_to_lower(<<B0, Rest/binary>>, Acc) ->           %% 1 byte
        ...
    do_to_lower(<<>>, Acc) -> Acc.
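
The clause bodies are elided above, so the following is only a sketch of how such a cascade could look. It assumes a binary accumulator and an ASCII-only `lower/1` helper, uses a hypothetical module name, and keeps just the 8-4-1 clauses for brevity where the commit uses 32-16-8-4-1.

```erlang
-module(to_lower_new_sketch).
-export([to_lower/1]).

%% Hypothetical stand-in for the per-byte helper (ASCII A-Z only).
lower(B) when B >= $A, B =< $Z -> B + 32;
lower(B) -> B.

to_lower(Data) when is_binary(Data) ->
    do_to_lower(Data, <<>>).

%% Clauses are tried top to bottom, so the widest pattern that still
%% fits the remaining input wins. No division is needed to dispatch.
do_to_lower(<<B0, B1, B2, B3, B4, B5, B6, B7, Rest/binary>>, Acc) ->
    do_to_lower(Rest, <<Acc/binary,
                        (lower(B0)), (lower(B1)), (lower(B2)), (lower(B3)),
                        (lower(B4)), (lower(B5)), (lower(B6)), (lower(B7))>>);
do_to_lower(<<B0, B1, B2, B3, Rest/binary>>, Acc) ->
    do_to_lower(Rest, <<Acc/binary,
                        (lower(B0)), (lower(B1)), (lower(B2)), (lower(B3))>>);
do_to_lower(<<B0, Rest/binary>>, Acc) ->
    do_to_lower(Rest, <<Acc/binary, (lower(B0))>>);
do_to_lower(<<>>, Acc) ->
    Acc.
```

With this sketch a 17-byte input is consumed as 8 + 8 + 1 bytes, so an odd or prime length costs only one extra single-byte step rather than forcing the whole input onto the slowest path.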

-- Why the old approach was slow: x86_64 assembly analysis

The old to_lower_chunk/1 dispatch chain compiled to this on x86_64:

    1. byte_size % 8 == 0?  → lbc$^0   (bitwise AND — cheap)
    2. byte_size % 7 == 0?  → lbc$^1   (idiv r9 — ~20-90 cycles)
    3. byte_size % 6 == 0?  → lbc$^2   (idiv r9 — ~20-90 cycles)
    4. byte_size % 5 == 0?  → lbc$^3   (idiv r9 — ~20-90 cycles)
    5. byte_size % 4 == 0?  → lbc$^4   (bitwise AND — cheap)
    6. byte_size % 3 == 0?  → lbc$^5   (idiv r9 — ~20-90 cycles)
    7. byte_size % 2 == 0?  → lbc$^6   (bitwise AND — cheap)
    8. else                  → lbc$^7   (1 byte at a time)

For sizes not divisible by 8 (the common case), the CPU must execute
up to 4 idiv instructions just to select the chunk size. Primes and
sizes with poor divisibility (e.g. 1... (continued)
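
The dispatch cost described in the table can be tallied with a small model. This is an illustrative sketch, not a measurement: the module and function names are hypothetical, and it simply assumes, as the table states, that power-of-two divisors compile to a bitwise AND while every other divisor costs one idiv.

```erlang
-module(dispatch_cost_sketch).
-export([idiv_checks/1]).

%% Count how many idiv-based guards the old chain evaluates for a
%% given input size before one guard succeeds or the chain falls
%% through to the 1-byte clause.
idiv_checks(Size) when is_integer(Size), Size >= 0 ->
    count([8, 7, 6, 5, 4, 3, 2], Size, 0).

count([], _Size, N) ->
    N;
count([D | Ds], Size, N) ->
    %% D band (D - 1) is 0 exactly when D is a power of two.
    N1 = case D band (D - 1) of
             0 -> N;
             _ -> N + 1
         end,
    case Size rem D of
        0 -> N1;                    %% guard succeeded, chain stops
        _ -> count(Ds, Size, N1)    %% guard failed, try next divisor
    end.
```

Under this model `idiv_checks(16)` is 0, `idiv_checks(12)` is 2, and `idiv_checks(13)` is 4, matching the "up to 4 idiv instructions" figure for sizes with poor divisibility.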

Run Details

2619 of 3088 relevant lines covered (84.81%)

3417.3 hits per line

dnsimple / dns_erlang / 21835834301 / 1