18145139209

Committed 30 Sep 2025 10:26PM UTC coverage: 43.572% (-5.2%) from 48.731%

Build # 18145139209

Build Type

push

github

Committed by

vigna

Commit Message

Unified test

Run Details

7 of 11 new or added lines in 2 files covered. (63.64%)

323 existing lines in 12 files now uncovered.

1559 of 3578 relevant lines covered (43.57%)

2347492.34 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

79.28

/src/codes/vbyte.rs

/*
 * SPDX-FileCopyrightText: 2023 Tommaso Fontana
 * SPDX-FileCopyrightText: 2023 Sebastiano Vigna
 *
 * SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
 */

//! Variable-length byte codes.
//!
//! These codes represent a natural number as a sequence of bytes, the length of
//! the sequence depends on the magnitude of the number. They are used in many
//! contexts, and they are known under a plethora of different names such
//! “vbyte”, “varint”, “[variable-length
//! quantity](https://en.wikipedia.org/wiki/Variable-length_quantity)”, “LEB”,
//! and so on.
//!
//! There are several variants of their definition, but their implied
//! distribution is always ≈ 1/*x*<sup>8/7</sup>
//!
//! The supported range is [0 . . 2⁶⁴).
//!
//! # Definition
//!
//! Since there are a few slightly different variants used in production code
//! and in the literature, before going into the details of this implementation
//! we will try to define a clear taxonomy by explaining in detail the four
//! three properties that define such variants.
//!
//! The simplest variable-length byte code encodes a number with a binary
//! representation of *k* bits using ⌈*k* / 7⌉ bytes. The binary representation
//! is left-padded with zeros so to obtain exactly ⌈*k* / 7⌉ blocks of 7 bits.
//! Each block is stored in a byte, prefixed with a continuation bit which is
//! one for all blocks except for the last one.
//!
//! ## Endianness
//!
//! The first property is the endianness of the bytes: in big-endian codes, the
//! first byte contains the highest (most significant) bits, whereas in
//! little-endian codes, the first byte contains the lowest (less significant)
//! bits.
//!
//! The advantage of the big-endian variant is that is lexicographical, that is,
//! comparing lexicographically a stream of encoded natural numbers will give
//! the same results as comparing lexicographically the encoded numbers, much
//! like it happens for [UTF-8 encoding](https://en.wikipedia.org/wiki/UTF-8).
//!
//! ## Completeness
//!
//! This basic representation discussed above is not *complete*, as there are
//! sequences that are not used. For example, zero can be written in many ways
//! (e.g., `0x00` or `0x80 0x00` ), but we are using only the single-byte
//! representation. Incompleteness leads to a (small) loss in compression.
//!
//! To have completeness, one can offset the representation in *k* bits by the
//! maximum number representable using *k* − 1 bits. That is, we represent the
//! interval [0..2⁷) with one byte, then the interval [2⁷..2⁷ + 2¹⁴] with two
//! bytes, the interval [2⁷ + 2¹⁴..2⁷ + 2¹⁴ + 2²¹] with three bytes, and so on.
//!
//! ## Grouping
//!
//! In the basic representation, the continuation bit is the most significant
//! bit of each byte. However, one can gather all continuation bits in the first
//! byte ([as UTF-8 does](https://en.wikipedia.org/wiki/UTF-8)). This approach
//! makes it possible to compute the length of the code using a call to
//! [`usize::leading_ones`] on the first negated byte, which usually maps to a
//! negation and a call to a fast instruction for the detection of the most
//! significant bit, improving branch prediction.
//!
//! Note that if the code is grouped, choosing a code with the same endianness
//! as your hardware can lead to a performance improvement, as after the first
//! byte the rest of the code can be read with a
//![`read_exact`](std::io::Read::read_exact).
//!
//! ## Sign
//!
//! It is possible to extend the codes to represent signed integers. Two
//! possible approaches are using a [bijective mapping](crate::codes::ToInt)
//! between the integers and the natural numbers, or defining a specialized
//! code.
//!
//! # Implementations
//!
//! We provide two unsigned, ungrouped, complete representations, one
//! [big-endian](VByteBeRead) and one [little-endian](VByteLeRead). We recommend
//! in general big-endian version as it is lexicographical. However, the
//! big-endian version needs a small buffer when writing, so on some hardware
//! the little-endian version might be faster.
//!
//! Note that the endianness of the code is independent from the endianness of
//! the underlying bit stream, as it just depend on the order in which bytes are
//! written.
//!
//! Since this code is byte-aligned, we provide also convenient, fast methods
//! [`vbyte_write`] and [`vbyte_read`] that can be used on types implementing
//! [`std::io::Read`] and [`std::io::Write`]. There are also endianness-specific
//! methods [`vbyte_write_be`], [`vbyte_write_le`], [`vbyte_read_be`], and
//! [`vbyte_read_le`].
//!
//! [`LEB128`]: https://en.wikipedia.org/wiki/LEB128
//! [`VLQ`]: https://en.wikipedia.org/wiki/Variable-length_quantity
//!
//! # Examples
//!
//! - The [LEB128](https://en.wikipedia.org/wiki/LEB128) code used by LLVM is a
//!   little-endian incomplete ungrouped representation. There is both a signed
//!   and an unsigned version; the signed version represent negative numbers
//!   using two's complement.
//!
//! - The [code used by
//!   git](https://github.com/git/git/blob/7fb6aefd2aaffe66e614f7f7b83e5b7ab16d4806/varint.c)
//!   is a big-endian complete ungrouped representation.
//!
//! - [This implementation in
//!   folly](https://github.com/facebook/folly/blob/dd4a5eb057afbc3c7c7da71801df2ee3c61c47d1/folly/Varint.h#L109)
//!   is a little-endian incomplete ungrouped representation.

use crate::traits::*;

/// Return the length of the variable-length byte code for `value` in bytes.
#[must_use]
#[inline(always)]
pub fn byte_len_vbyte(mut value: u64) -> usize {
    let mut len = 1;
    loop {
        value >>= 7;
        if value == 0 {
            return len;
        }
        value -= 1;
        len += 1;
    }
}

/// Return the length of the variable-length byte code for `value` in bits.
#[must_use]
#[inline(always)]
pub fn bit_len_vbyte(value: u64) -> usize {
    8 * byte_len_vbyte(value)
}

/// Trait for reading big-endian variable-length byte codes.
///
/// Note that the endianness of the code is independent
/// from the endianness of the underlying bit stream.
pub trait VByteBeRead<E: Endianness>: BitRead<E> {
    fn read_vbyte_be(&mut self) -> Result<u64, Self::Error>;
}

/// Trait for reading little-endian variable-length byte codes.
///
/// Note that the endianness of the code is independent
/// from the endianness of the underlying bit stream.
pub trait VByteLeRead<E: Endianness>: BitRead<E> {
    fn read_vbyte_le(&mut self) -> Result<u64, Self::Error>;
}

impl<E: Endianness, B: BitRead<E>> VByteBeRead<E> for B {
    #[inline(always)]
    fn read_vbyte_be(&mut self) -> Result<u64, Self::Error> {
        let mut byte = self.read_bits(8)?;
        let mut value = byte & 0x7F;
        while (byte >> 7) != 0 {
            value += 1;
            byte = self.read_bits(8)?;
            value = (value << 7) | (byte & 0x7F);
        }
        Ok(value)
    }
}

impl<E: Endianness, B: BitRead<E>> VByteLeRead<E> for B {
    #[inline(always)]
    fn read_vbyte_le(&mut self) -> Result<u64, Self::Error> {
        let mut result = 0;
        let mut shift = 0;
        loop {
            let byte = self.read_bits(8)?;
            result += (byte & 0x7F) << shift;
            if (byte >> 7) == 0 {
                break;
            }
            shift += 7;
            result += 1 << shift;
        }
        Ok(result)
    }
}

/// Trait for write big-endian variable-length byte codes.
///
/// Note that the endianness of the code is independent
/// from the endianness of the underlying bit stream.
pub trait VByteBeWrite<E: Endianness>: BitWrite<E> {
    fn write_vbyte_be(&mut self, value: u64) -> Result<usize, Self::Error>;
}
/// Trait for write little-endian variable-length byte codes.
///
/// Note that the endianness of the code is independent
/// from the endianness of the underlying bit stream.
pub trait VByteLeWrite<E: Endianness>: BitWrite<E> {
    fn write_vbyte_le(&mut self, value: u64) -> Result<usize, Self::Error>;
}

impl<E: Endianness, B: BitWrite<E>> VByteBeWrite<E> for B {
    #[inline(always)]
    fn write_vbyte_be(&mut self, mut value: u64) -> Result<usize, Self::Error> {
        let mut buf = [0u8; 10];
        let mut pos = buf.len() - 1;
        buf[pos] = (value & 0x7F) as u8;
        value >>= 7;
        while value != 0 {
            value -= 1;
            pos -= 1;
            buf[pos] = 0x80 | (value & 0x7F) as u8;
            value >>= 7;
        }
        let bytes_to_write = buf.len() - pos;
        for &byte in &buf[pos..] {
            self.write_bits(byte as u64, 8)?;
        }
        Ok(bytes_to_write * 8)
    }
}

impl<E: Endianness, B: BitWrite<E>> VByteLeWrite<E> for B {
    #[inline(always)]
    fn write_vbyte_le(&mut self, mut value: u64) -> Result<usize, Self::Error> {
        let mut len = 1;
        loop {
            let byte = (value & 0x7F) as u8;
            value >>= 7;
            if value != 0 {
                self.write_bits((byte | 0x80) as u64, 8)?;
            } else {
                self.write_bits(byte as u64, 8)?;
                break;
            }
            value -= 1;
            len += 1;
        }
        Ok(len * 8)
    }
}

/// Write a natural number to a byte stream using variable-length byte codes and
/// return the number of bytes written.
///
/// This method just delegates to the correct endianness-specific method.
#[inline(always)]
pub fn vbyte_write<E: Endianness, W: std::io::Write>(
    value: u64,
    writer: &mut W,
) -> std::io::Result<usize> {
    if core::any::TypeId::of::<E>() == core::any::TypeId::of::<BigEndian>() {
        vbyte_write_be(value, writer)
    } else {
        vbyte_write_le(value, writer)
    }
}

/// Encode a natural number to a big-endian byte stream using variable-length
/// byte codes and return the number of bytes written.
#[inline(always)]
pub fn vbyte_write_be<W: std::io::Write>(mut value: u64, w: &mut W) -> std::io::Result<usize> {
    let mut buf = [0u8; 10];
    let mut pos = buf.len() - 1;
    buf[pos] = (value & 0x7F) as u8;
    value >>= 7;
    while value != 0 {
        value -= 1;
        pos -= 1;
        buf[pos] = 0x80 | (value & 0x7F) as u8;
        value >>= 7;
    }
    let bytes_to_write = buf.len() - pos;
    w.write_all(&buf[pos..])?;
    Ok(bytes_to_write)
}

/// Encode a natural number to a little-endian byte stream using variable-length
/// byte codes and return the number of bytes written.
#[inline(always)]
pub fn vbyte_write_le<W: std::io::Write>(mut value: u64, writer: &mut W) -> std::io::Result<usize> {
    let mut len = 1;
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value != 0 {
            writer.write_all(&[byte | 0x80])?;
        } else {
            writer.write_all(&[byte])?;
            break;
        }
        value -= 1;
        len += 1;
    }
    Ok(len)
}

#[inline(always)]
/// Decode a natural number from a byte stream using variable-length byte codes.
///
/// This method just delegates to the correct endianness-specific method.
pub fn vbyte_read<E: Endianness, R: std::io::Read>(reader: &mut R) -> std::io::Result<u64> {
    if core::any::TypeId::of::<E>() == core::any::TypeId::of::<BigEndian>() {
        vbyte_read_be(reader)
    } else {
        vbyte_read_le(reader)
    }
}

/// Decode a natural number from a big-endian byte stream using variable-length
/// byte codes.
#[inline(always)]
pub fn vbyte_read_be<R: std::io::Read>(reader: &mut R) -> std::io::Result<u64> {
    let mut buf = [0u8; 1];
    let mut value: u64;
    reader.read_exact(&mut buf)?;
    value = (buf[0] & 0x7F) as u64;
    while (buf[0] >> 7) != 0 {
        value += 1;
        reader.read_exact(&mut buf)?;
        value = (value << 7) | ((buf[0] & 0x7F) as u64);
    }
    Ok(value)
}

/// Decode a natural number from a little-endian byte stream using
/// variable-length byte codes.
#[inline(always)]
pub fn vbyte_read_le<R: std::io::Read>(reader: &mut R) -> std::io::Result<u64> {
    let mut result = 0;
    let mut shift = 0;
    let mut buffer = [0; 1];
    loop {
        reader.read_exact(&mut buffer)?;
        let byte = buffer[0];
        result += ((byte & 0x7F) as u64) << shift;
        if (byte >> 7) == 0 {
            break;
        }
        shift += 7;
        result += 1 << shift;
    }
    Ok(result)
}

#[cfg(test)]
mod test {
    use super::*;

    const UPPER_BOUND_1: u64 = 128;
    const UPPER_BOUND_2: u64 = 128_u64.pow(2) + UPPER_BOUND_1;
    const UPPER_BOUND_3: u64 = 128_u64.pow(3) + UPPER_BOUND_2;
    const UPPER_BOUND_4: u64 = 128_u64.pow(4) + UPPER_BOUND_3;
    const UPPER_BOUND_5: u64 = 128_u64.pow(5) + UPPER_BOUND_4;
    const UPPER_BOUND_6: u64 = 128_u64.pow(6) + UPPER_BOUND_5;
    const UPPER_BOUND_7: u64 = 128_u64.pow(7) + UPPER_BOUND_6;
    const UPPER_BOUND_8: u64 = 128_u64.pow(8) + UPPER_BOUND_7;

    macro_rules! impl_tests {
        ($test_name:ident, $E:ty) => {
            #[test]
            fn $test_name() {
                const MAX: usize = 1 << 20;
                const MIN: usize = 0;
                let mut buffer = std::io::Cursor::new(Vec::with_capacity(128));
                let mut lens = Vec::new();

                for i in MIN..MAX {
                    lens.push(vbyte_write::<$E, _>(i as _, &mut buffer).unwrap());
                }
                buffer.set_position(0);
                for (i, l) in (MIN..MAX).zip(lens.iter()) {
                    let j = vbyte_read::<$E, _>(&mut buffer).unwrap();
                    assert_eq!(byte_len_vbyte(i as _), *l);
                    assert_eq!(j, i as u64);
                }

                let values = [
                    0,
                    UPPER_BOUND_1 - 1,
                    UPPER_BOUND_1 + 1,
                    UPPER_BOUND_2 - 1,
                    UPPER_BOUND_2 + 1,
                    UPPER_BOUND_3 - 1,
                    UPPER_BOUND_3 + 1,
                    UPPER_BOUND_4 - 1,
                    UPPER_BOUND_4 + 1,
                    UPPER_BOUND_5 - 1,
                    UPPER_BOUND_5 + 1,
                    UPPER_BOUND_6 - 1,
                    UPPER_BOUND_6 + 1,
                    UPPER_BOUND_7 - 1,
                    UPPER_BOUND_7 + 1,
                    UPPER_BOUND_8 - 1,
                    UPPER_BOUND_8 + 1,
                    u64::MAX,
                ];

                let tell: u64 = buffer.position();
                for &i in values.iter() {
                    vbyte_write::<$E, _>(i, &mut buffer).unwrap();
                }
                buffer.set_position(tell);
                for &i in values.iter() {
                    assert_eq!(i, vbyte_read::<$E, _>(&mut buffer).unwrap());
                }
            }
        };
    }

    impl_tests!(test_vbytes_be, BE);
    impl_tests!(test_vbytes_le, LE);
}

1	/*
2	* SPDX-FileCopyrightText: 2023 Tommaso Fontana
3	* SPDX-FileCopyrightText: 2023 Sebastiano Vigna
4	*
5	* SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
6	*/
7
8	//! Variable-length byte codes.
9	//!
10	//! These codes represent a natural number as a sequence of bytes, the length of
11	//! the sequence depends on the magnitude of the number. They are used in many
12	//! contexts, and they are known under a plethora of different names such
13	//! “vbyte”, “varint”, “[variable-length
14	//! quantity](https://en.wikipedia.org/wiki/Variable-length_quantity)”, “LEB”,
15	//! and so on.
16	//!
17	//! There are several variants of their definition, but their implied
18	//! distribution is always ≈ 1/x<sup>8/7</sup>
19	//!
20	//! The supported range is [0 . . 2⁶⁴).
21	//!
22	//! # Definition
23	//!
24	//! Since there are a few slightly different variants used in production code
25	//! and in the literature, before going into the details of this implementation
26	//! we will try to define a clear taxonomy by explaining in detail the four
27	//! three properties that define such variants.
28	//!
29	//! The simplest variable-length byte code encodes a number with a binary
30	//! representation of k bits using ⌈k / 7⌉ bytes. The binary representation
31	//! is left-padded with zeros so to obtain exactly ⌈k / 7⌉ blocks of 7 bits.
32	//! Each block is stored in a byte, prefixed with a continuation bit which is
33	//! one for all blocks except for the last one.
34	//!
35	//! ## Endianness
36	//!
37	//! The first property is the endianness of the bytes: in big-endian codes, the
38	//! first byte contains the highest (most significant) bits, whereas in
39	//! little-endian codes, the first byte contains the lowest (less significant)
40	//! bits.
41	//!
42	//! The advantage of the big-endian variant is that is lexicographical, that is,
43	//! comparing lexicographically a stream of encoded natural numbers will give
44	//! the same results as comparing lexicographically the encoded numbers, much
45	//! like it happens for [UTF-8 encoding](https://en.wikipedia.org/wiki/UTF-8).
46	//!
47	//! ## Completeness
48	//!
49	//! This basic representation discussed above is not complete, as there are
50	//! sequences that are not used. For example, zero can be written in many ways
51	//! (e.g., `0x00` or `0x80 0x00` ), but we are using only the single-byte
52	//! representation. Incompleteness leads to a (small) loss in compression.
53	//!
54	//! To have completeness, one can offset the representation in k bits by the
55	//! maximum number representable using k − 1 bits. That is, we represent the
56	//! interval [0..2⁷) with one byte, then the interval [2⁷..2⁷ + 2¹⁴] with two
57	//! bytes, the interval [2⁷ + 2¹⁴..2⁷ + 2¹⁴ + 2²¹] with three bytes, and so on.
58	//!
59	//! ## Grouping
60	//!
61	//! In the basic representation, the continuation bit is the most significant
62	//! bit of each byte. However, one can gather all continuation bits in the first
63	//! byte ([as UTF-8 does](https://en.wikipedia.org/wiki/UTF-8)). This approach
64	//! makes it possible to compute the length of the code using a call to
65	//! [`usize::leading_ones`] on the first negated byte, which usually maps to a
66	//! negation and a call to a fast instruction for the detection of the most
67	//! significant bit, improving branch prediction.
68	//!
69	//! Note that if the code is grouped, choosing a code with the same endianness
70	//! as your hardware can lead to a performance improvement, as after the first
71	//! byte the rest of the code can be read with a
72	//![`read_exact`](std::io::Read::read_exact).
73	//!
74	//! ## Sign
75	//!
76	//! It is possible to extend the codes to represent signed integers. Two
77	//! possible approaches are using a [bijective mapping](crate::codes::ToInt)
78	//! between the integers and the natural numbers, or defining a specialized
79	//! code.
80	//!
81	//! # Implementations
82	//!
83	//! We provide two unsigned, ungrouped, complete representations, one
84	//! [big-endian](VByteBeRead) and one [little-endian](VByteLeRead). We recommend
85	//! in general big-endian version as it is lexicographical. However, the
86	//! big-endian version needs a small buffer when writing, so on some hardware
87	//! the little-endian version might be faster.
88	//!
89	//! Note that the endianness of the code is independent from the endianness of
90	//! the underlying bit stream, as it just depend on the order in which bytes are
91	//! written.
92	//!
93	//! Since this code is byte-aligned, we provide also convenient, fast methods
94	//! [`vbyte_write`] and [`vbyte_read`] that can be used on types implementing
95	//! [`std::io::Read`] and [`std::io::Write`]. There are also endianness-specific
96	//! methods [`vbyte_write_be`], [`vbyte_write_le`], [`vbyte_read_be`], and
97	//! [`vbyte_read_le`].
98	//!
99	//! [`LEB128`]: https://en.wikipedia.org/wiki/LEB128
100	//! [`VLQ`]: https://en.wikipedia.org/wiki/Variable-length_quantity
101	//!
102	//! # Examples
103	//!
104	//! - The [LEB128](https://en.wikipedia.org/wiki/LEB128) code used by LLVM is a
105	//! little-endian incomplete ungrouped representation. There is both a signed
106	//! and an unsigned version; the signed version represent negative numbers
107	//! using two's complement.
108	//!
109	//! - The [code used by
110	//! git](https://github.com/git/git/blob/7fb6aefd2aaffe66e614f7f7b83e5b7ab16d4806/varint.c)
111	//! is a big-endian complete ungrouped representation.
112	//!
113	//! - [This implementation in
114	//! folly](https://github.com/facebook/folly/blob/dd4a5eb057afbc3c7c7da71801df2ee3c61c47d1/folly/Varint.h#L109)
115	//! is a little-endian incomplete ungrouped representation.
116
117	use crate::traits::*;
118
119	/// Return the length of the variable-length byte code for `value` in bytes.
120	#[must_use]
121	#[inline(always)]
122	pub fn byte_len_vbyte(mut value: u64) -> usize {	2,106,600✔
123	let mut len = 1;	4,213,200✔
UNCOV 124	loop {	×
125	value >>= 7;	6,282,448✔
126	if value == 0 {	6,282,448✔
127	return len;	2,106,600✔
128	}
UNCOV 129	value -= 1;	×
UNCOV 130	len += 1;	×
131	}
132	}
133
134	/// Return the length of the variable-length byte code for `value` in bits.
135	#[must_use]
136	#[inline(always)]
137	pub fn bit_len_vbyte(value: u64) -> usize {	9,448✔
138	8 * byte_len_vbyte(value)	9,448✔
139	}
140
141	/// Trait for reading big-endian variable-length byte codes.
142	///
143	/// Note that the endianness of the code is independent
144	/// from the endianness of the underlying bit stream.
145	pub trait VByteBeRead<E: Endianness>: BitRead<E> {
146	fn read_vbyte_be(&mut self) -> Result<u64, Self::Error>;
147	}
148
149	/// Trait for reading little-endian variable-length byte codes.
150	///
151	/// Note that the endianness of the code is independent
152	/// from the endianness of the underlying bit stream.
153	pub trait VByteLeRead<E: Endianness>: BitRead<E> {
154	fn read_vbyte_le(&mut self) -> Result<u64, Self::Error>;
155	}
156
157	impl<E: Endianness, B: BitRead<E>> VByteBeRead<E> for B {
158	#[inline(always)]
159	fn read_vbyte_be(&mut self) -> Result<u64, Self::Error> {	4,424✔
160	let mut byte = self.read_bits(8)?;	13,272✔
UNCOV 161	let mut value = byte & 0x7F;	×
162	while (byte >> 7) != 0 {	9,620✔
163	value += 1;	5,196✔
164	byte = self.read_bits(8)?;	10,392✔
165	value = (value << 7) \| (byte & 0x7F);	5,196✔
166	}
167	Ok(value)	4,424✔
168	}
169	}
170
171	impl<E: Endianness, B: BitRead<E>> VByteLeRead<E> for B {
172	#[inline(always)]
173	fn read_vbyte_le(&mut self) -> Result<u64, Self::Error> {	4,660✔
174	let mut result = 0;	9,320✔
175	let mut shift = 0;	9,320✔
176	loop {	×
177	let byte = self.read_bits(8)?;	34,632✔
UNCOV 178	result += (byte & 0x7F) << shift;	×
UNCOV 179	if (byte >> 7) == 0 {	×
180	break;	4,660✔
181	}
UNCOV 182	shift += 7;	×
183	result += 1 << shift;	×
184	}
UNCOV 185	Ok(result)	×
186	}
187	}
188
189	/// Trait for write big-endian variable-length byte codes.
190	///
191	/// Note that the endianness of the code is independent
192	/// from the endianness of the underlying bit stream.
193	pub trait VByteBeWrite<E: Endianness>: BitWrite<E> {
194	fn write_vbyte_be(&mut self, value: u64) -> Result<usize, Self::Error>;
195	}
196	/// Trait for write little-endian variable-length byte codes.
197	///
198	/// Note that the endianness of the code is independent
199	/// from the endianness of the underlying bit stream.
200	pub trait VByteLeWrite<E: Endianness>: BitWrite<E> {
201	fn write_vbyte_le(&mut self, value: u64) -> Result<usize, Self::Error>;
202	}
203
204	impl<E: Endianness, B: BitWrite<E>> VByteBeWrite<E> for B {
205	#[inline(always)]
206	fn write_vbyte_be(&mut self, mut value: u64) -> Result<usize, Self::Error> {	4,392✔
207	let mut buf = [0u8; 10];	8,784✔
208	let mut pos = buf.len() - 1;	8,784✔
209	buf[pos] = (value & 0x7F) as u8;	4,392✔
210	value >>= 7;	4,392✔
211	while value != 0 {	14,256✔
212	value -= 1;	4,932✔
213	pos -= 1;	4,932✔
214	buf[pos] = 0x80 \| (value & 0x7F) as u8;	4,932✔
215	value >>= 7;	4,932✔
216	}
217	let bytes_to_write = buf.len() - pos;	13,176✔
218	for &byte in &buf[pos..] {	13,716✔
219	self.write_bits(byte as u64, 8)?;	27,972✔
220	}
221	Ok(bytes_to_write * 8)	4,392✔
222	}
223	}
224
225	impl<E: Endianness, B: BitWrite<E>> VByteLeWrite<E> for B {
226	#[inline(always)]
227	fn write_vbyte_le(&mut self, mut value: u64) -> Result<usize, Self::Error> {	4,510✔
228	let mut len = 1;	9,020✔
UNCOV 229	loop {	×
230	let byte = (value & 0x7F) as u8;	20,572✔
231	value >>= 7;	10,286✔
232	if value != 0 {	10,286✔
233	self.write_bits((byte \| 0x80) as u64, 8)?;	17,328✔
234	} else {
235	self.write_bits(byte as u64, 8)?;	4,510✔
236	break;	4,510✔
237	}
238	value -= 1;	5,776✔
239	len += 1;	×
240	}
UNCOV 241	Ok(len * 8)	×
242	}
243	}
244
245	/// Write a natural number to a byte stream using variable-length byte codes and
246	/// return the number of bytes written.
247	///
248	/// This method just delegates to the correct endianness-specific method.
249	#[inline(always)]
250	pub fn vbyte_write<E: Endianness, W: std::io::Write>(	2,097,188✔
251	value: u64,
252	writer: &mut W,
253	) -> std::io::Result<usize> {
254	if core::any::TypeId::of::<E>() == core::any::TypeId::of::<BigEndian>() {	2,097,188✔
255	vbyte_write_be(value, writer)	3,145,782✔
256	} else {
257	vbyte_write_le(value, writer)	1,048,594✔
258	}
259	}
260
261	/// Encode a natural number to a big-endian byte stream using variable-length
262	/// byte codes and return the number of bytes written.
263	#[inline(always)]
264	pub fn vbyte_write_be<W: std::io::Write>(mut value: u64, w: &mut W) -> std::io::Result<usize> {	1,048,594✔
265	let mut buf = [0u8; 10];	2,097,188✔
266	let mut pos = buf.len() - 1;	2,097,188✔
267	buf[pos] = (value & 0x7F) as u8;	1,048,594✔
268	value >>= 7;	1,048,594✔
269	while value != 0 {	5,209,764✔
270	value -= 1;	2,080,585✔
271	pos -= 1;	2,080,585✔
272	buf[pos] = 0x80 \| (value & 0x7F) as u8;	2,080,585✔
273	value >>= 7;	2,080,585✔
274	}
275	let bytes_to_write = buf.len() - pos;	3,145,782✔
276	w.write_all(&buf[pos..])?;	3,145,782✔
277	Ok(bytes_to_write)	1,048,594✔
278	}
279
280	/// Encode a natural number to a little-endian byte stream using variable-length
281	/// byte codes and return the number of bytes written.
282	#[inline(always)]
283	pub fn vbyte_write_le<W: std::io::Write>(mut value: u64, writer: &mut W) -> std::io::Result<usize> {	1,048,594✔
284	let mut len = 1;	2,097,188✔
UNCOV 285	loop {	×
286	let byte = (value & 0x7F) as u8;	6,258,358✔
287	value >>= 7;	3,129,179✔
288	if value != 0 {	3,129,179✔
289	writer.write_all(&[byte \| 0x80])?;	6,241,755✔
290	} else {
291	writer.write_all(&[byte])?;	1,048,594✔
292	break;	1,048,594✔
293	}
294	value -= 1;	2,080,585✔
295	len += 1;	×
296	}
UNCOV 297	Ok(len)	×
298	}
299
300	#[inline(always)]
301	/// Decode a natural number from a byte stream using variable-length byte codes.
302	///
303	/// This method just delegates to the correct endianness-specific method.
304	pub fn vbyte_read<E: Endianness, R: std::io::Read>(reader: &mut R) -> std::io::Result<u64> {	2,097,188✔
305	if core::any::TypeId::of::<E>() == core::any::TypeId::of::<BigEndian>() {	2,097,188✔
306	vbyte_read_be(reader)	2,097,188✔
307	} else {
308	vbyte_read_le(reader)	1,048,594✔
309	}
310	}
311
312	/// Decode a natural number from a big-endian byte stream using variable-length
313	/// byte codes.
314	#[inline(always)]
315	pub fn vbyte_read_be<R: std::io::Read>(reader: &mut R) -> std::io::Result<u64> {	1,048,594✔
316	let mut buf = [0u8; 1];	2,097,188✔
UNCOV 317	let mut value: u64;	×
318	reader.read_exact(&mut buf)?;	3,145,782✔
319	value = (buf[0] & 0x7F) as u64;	1,048,594✔
320	while (buf[0] >> 7) != 0 {	3,129,179✔
321	value += 1;	2,080,585✔
322	reader.read_exact(&mut buf)?;	6,241,755✔
323	value = (value << 7) \| ((buf[0] & 0x7F) as u64);	2,080,585✔
324	}
325	Ok(value)	1,048,594✔
326	}
327
328	/// Decode a natural number from a little-endian byte stream using
329	/// variable-length byte codes.
330	#[inline(always)]
331	pub fn vbyte_read_le<R: std::io::Read>(reader: &mut R) -> std::io::Result<u64> {	1,048,594✔
332	let mut result = 0;	2,097,188✔
333	let mut shift = 0;	2,097,188✔
334	let mut buffer = [0; 1];	2,097,188✔
UNCOV 335	loop {	×
336	reader.read_exact(&mut buffer)?;	9,387,537✔
337	let byte = buffer[0];	3,129,179✔
UNCOV 338	result += ((byte & 0x7F) as u64) << shift;	×
UNCOV 339	if (byte >> 7) == 0 {	×
340	break;	1,048,594✔
341	}
UNCOV 342	shift += 7;	×
343	result += 1 << shift;	×
344	}
UNCOV 345	Ok(result)	×
346	}
347
348	#[cfg(test)]
349	mod test {
350	use super::*;
351
352	const UPPER_BOUND_1: u64 = 128;
353	const UPPER_BOUND_2: u64 = 128_u64.pow(2) + UPPER_BOUND_1;
354	const UPPER_BOUND_3: u64 = 128_u64.pow(3) + UPPER_BOUND_2;
355	const UPPER_BOUND_4: u64 = 128_u64.pow(4) + UPPER_BOUND_3;
356	const UPPER_BOUND_5: u64 = 128_u64.pow(5) + UPPER_BOUND_4;
357	const UPPER_BOUND_6: u64 = 128_u64.pow(6) + UPPER_BOUND_5;
358	const UPPER_BOUND_7: u64 = 128_u64.pow(7) + UPPER_BOUND_6;
359	const UPPER_BOUND_8: u64 = 128_u64.pow(8) + UPPER_BOUND_7;
360
361	macro_rules! impl_tests {
362	($test_name:ident, $E:ty) => {
363	#[test]
364	fn $test_name() {
365	const MAX: usize = 1 << 20;
366	const MIN: usize = 0;
367	let mut buffer = std::io::Cursor::new(Vec::with_capacity(128));
368	let mut lens = Vec::new();
369
370	for i in MIN..MAX {
371	lens.push(vbyte_write::<$E, _>(i as _, &mut buffer).unwrap());
372	}
373	buffer.set_position(0);
374	for (i, l) in (MIN..MAX).zip(lens.iter()) {
375	let j = vbyte_read::<$E, _>(&mut buffer).unwrap();
376	assert_eq!(byte_len_vbyte(i as _), *l);
377	assert_eq!(j, i as u64);
378	}
379
380	let values = [
381	0,
382	UPPER_BOUND_1 - 1,
383	UPPER_BOUND_1 + 1,
384	UPPER_BOUND_2 - 1,
385	UPPER_BOUND_2 + 1,
386	UPPER_BOUND_3 - 1,
387	UPPER_BOUND_3 + 1,
388	UPPER_BOUND_4 - 1,
389	UPPER_BOUND_4 + 1,
390	UPPER_BOUND_5 - 1,
391	UPPER_BOUND_5 + 1,
392	UPPER_BOUND_6 - 1,
393	UPPER_BOUND_6 + 1,
394	UPPER_BOUND_7 - 1,
395	UPPER_BOUND_7 + 1,
396	UPPER_BOUND_8 - 1,
397	UPPER_BOUND_8 + 1,
398	u64::MAX,
399	];
400
401	let tell: u64 = buffer.position();
402	for &i in values.iter() {
403	vbyte_write::<$E, _>(i, &mut buffer).unwrap();
404	}
405	buffer.set_position(tell);
406	for &i in values.iter() {
407	assert_eq!(i, vbyte_read::<$E, _>(&mut buffer).unwrap());
408	}
409	}
410	};
411	}
412
413	impl_tests!(test_vbytes_be, BE);
414	impl_tests!(test_vbytes_le, LE);
415	}

vigna / dsi-bitstream-rs / 18145139209

Source File Press 'n' to go to next uncovered line, 'b' for previous

Source File
Press 'n' to go to next uncovered line, 'b' for previous