• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

actioninja / refpack-rs / 15969602071

30 Jun 2025 09:48AM UTC coverage: 92.053% (+0.9%) from 91.159%
15969602071

push

github

web-flow
Improve compression ratio (#12)

* Update benchmarks

- add all zeros benchmark
- remove redundant benchmark
- add compressed size print

* Improve compression ratio

* fix clippy lints and format

* make number of hash chaining levels a constant

* add CompressionOptions parameter to compress functions

* add random booleans benchmark

This is a surprisingly useful benchmark, as a theoretically optimal compression algorithm will compress to a ratio of 8:1, and compression speeds tend to follow real-world usage better than the other benchmarks.

* make benchmark sizes cover a broader range

* change "fast" algorithm to use hash chaining

* add back original algorithm as "fastest"

* make optimal algorithm use a circular buffer

With this the memory usage for large data sizes is a fraction of what it used to be, but the current implementation needs to be re-optimized for this layout

* fix control bench

* formatting

* ignore optimal_matches_reference test

* remove short_skip_position

In theory it is a small algorithmic improvement, but the extra space needed for alignment and hash table pretty much completely nullify the benefit.
Removing this also simplifies the algorithm somewhat.

* change HashMap to PrefixTable

* merge duplicate search loop

* drop cost_state early

This reduces peak memory usage slightly.

* format

* refactor: rename HASH_CHAIN_MODULO to HASH_CHAIN_BUFFER_SIZE

* refactor: format

* refactor: add MPL headers to files that are missing the header

* refactor: replace commented code with a comment explaining why the code is not present

* doc: add documentation to CompressionOptions

* refactor: do not expose PrefixSearcher in the library interface

* refactor: only compile HashchainPrefixSearcher in test builds

* doc: document PrefixSearcher search function

* doc: add doc comment to encode_slice_hc

* fix: change debug_assert to assert to make sure all tests pass in release

* doc: fully comment encode_slice_hc

* doc: fully... (continued)

834 of 919 new or added lines in 12 files covered. (90.75%)

4 existing lines in 2 files now uncovered.

1529 of 1661 relevant lines covered (92.05%)

370838.14 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

61.84
/src/data/decompression.rs
1
////////////////////////////////////////////////////////////////////////////////
2
// This Source Code Form is subject to the terms of the Mozilla Public         /
3
// License, v. 2.0. If a copy of the MPL was not distributed with this         /
4
// file, You can obtain one at https://mozilla.org/MPL/2.0/.                   /
5
//                                                                             /
6
////////////////////////////////////////////////////////////////////////////////
7

8
//! Decompression parsing, algorithms, and functionality. Exact decompression
9
//! algorithm is subject to change.
10
//!
11
//! Basic concept is to parse the header, identify key information such as
12
//! decompressed header, then parse as a repeating stream of "command" blocks,
13
//! consisting of a control code and any (if any) following literal bytes.
14
//!
15
//! Literal bytes should always be written before performing the control code
16
//! operation.
17
//!
18
//! Control code operations are "run length encoded", which is a format of
19
//! encoding where the "length" of a pointer-style control can be longer than
20
//! the lookback length. This indicates that the section within the lookback
21
//! region should repeat until the length is fulfilled
22
//!
23
//! # Example Byte-by-byte Algorithm
24
//!
25
//! This is *normally* handled via byte-by-byte decoding, where a lookback
26
//! position and write position counter are incremented at the same time,
27
//! however other faster implementations may be possible. This algorithm
28
//! explanation is purely intended to illustrate the concept.
29
//!
30
//! ## Algorithm steps
31
//!
32
//! while in reality the possible allowed values are any arbitrary byte,
33
//! characters are used to indicate whole bytes to simplify illustration
34
//!
35
//! Given the current decoded output of `DEADBEEF`, the next control encountered
36
//! has a lookback of `4`, and a length of `16`.
37
//!
38
//! ### 1. Create Pointers
39
//! Create pointers/get indexes for the current lookback position and current
40
//! output position
41
//!
42
//! ```text
43
//! DEADBEEF
44
//!     ^   ^
45
//!     LB  O
46
//! ```
47
//!
48
//! ### 2. Copy from lookback to output
49
//! Take the current byte at the lookback position, and copy it to the output
50
//! position
51
//!
52
//! ```text
53
//! DEADBEEFB
54
//!     ^   ^
55
//!     LB  O
56
//! ```
57
//!
58
//! ### 3. Advance Pointers
59
//! Increment both the lookback position and the output position
60
//!
61
//! ```text
62
//! DEADBEEFB
63
//!      ^   ^
64
//!      LB  O
65
//! ```
66
//!
67
//! ### 4. Repeat `length` times
68
//! Steps 2 and 3 should repeat a total of times equal to the length of the
69
//! control block.
70
//!
71
//! #### 4.A "What about when the lookback reaches already output?"
72
//! In a case where the length overruns into the already written output, the
73
//! process continues as normal and the output starts to repeat.
74
//!
75
//! in this example output, On the 5th iteration you will encounter this state:
76
//! ```text
77
//! DEADBEEFBEEF
78
//!         ^   ^
79
//!         LB  O
80
//! ```
81
//! The lookback is currently pointing at the first "B" that has been written.
82
//! This isn't actually a problem, and step 2 and 3 should be followed as normal:
83
//!
84
//! ```text
85
//! DEADBEEFBEEFB
86
//!         ^   ^
87
//!         LB  O
88
//! ```
89
//!
90
//! ```text
91
//! DEADBEEFBEEFB
92
//!          ^   ^
93
//!          LB  O
94
//! ```
95
//!
96
//! ## Final Result
97
//! Given that the algorithm was followed properly, the final output of the
98
//! example input should be
99
//! ```text
100
//! DEADBEEFBEEFBEEFBEEFBEEF
101
//! ```
102
use std::io::{Cursor, Read, Seek, Write};
103

104
use crate::data::control::Command;
105
use crate::data::{copy_from_reader, rle_decode_fixed};
106
use crate::format::Format;
107
use crate::header::Header;
108
use crate::RefPackError;
109

110
// Returning the internal buffer is the fastest way to return the data
111
// since that way the buffer doesn't have to be copied,
112
// this function is used to reach optimal performance
113
fn decompress_internal<F: Format>(
101,049✔
114
    reader: &mut (impl Read + Seek),
101,049✔
115
) -> Result<Vec<u8>, RefPackError> {
101,049✔
116
    let Header {
117
        decompressed_length,
101,049✔
118
        ..
119
    } = Header::read::<F::HeaderMode>(reader)?;
101,049✔
120

121
    let mut decompression_buffer = vec![0; decompressed_length as usize];
101,049✔
122
    let mut position = 0usize;
101,049✔
123

124
    loop {
125
        let command = Command::read(reader)?;
262,307✔
126

127
        match command {
262,307✔
128
            Command::Short {
129
                offset,
57,220✔
130
                length,
57,220✔
131
                literal,
57,220✔
132
            }
133
            | Command::Medium {
134
                offset,
1,861✔
135
                length,
1,861✔
136
                literal,
1,861✔
137
            } => {
138
                if literal > 0 {
59,081✔
139
                    position = copy_from_reader(
2,634✔
140
                        &mut decompression_buffer,
2,634✔
141
                        reader,
2,634✔
142
                        position,
2,634✔
143
                        literal as usize,
2,634✔
UNCOV
144
                    )?;
×
145
                }
56,447✔
146
                position = rle_decode_fixed(
59,081✔
147
                    &mut decompression_buffer,
59,081✔
148
                    position,
59,081✔
149
                    offset as usize,
59,081✔
150
                    length as usize,
59,081✔
151
                )
152
                .map_err(|error| RefPackError::ControlError { error, position })?;
59,081✔
153
            }
154
            Command::Long {
155
                offset,
×
156
                length,
×
157
                literal,
×
158
            } => {
159
                if literal > 0 {
×
160
                    position = copy_from_reader(
×
161
                        &mut decompression_buffer,
×
162
                        reader,
×
163
                        position,
×
164
                        literal as usize,
×
165
                    )?;
×
166
                }
×
167
                position = rle_decode_fixed(
×
168
                    &mut decompression_buffer,
×
169
                    position,
×
170
                    offset as usize,
×
171
                    length as usize,
×
172
                )
173
                .map_err(|error| RefPackError::ControlError { error, position })?;
×
174
            }
175
            Command::Literal(literal) => {
102,177✔
176
                position = copy_from_reader(
102,177✔
177
                    &mut decompression_buffer,
102,177✔
178
                    reader,
102,177✔
179
                    position,
102,177✔
180
                    literal as usize,
102,177✔
UNCOV
181
                )?;
×
182
            }
183
            Command::Stop(literal) => {
101,049✔
184
                copy_from_reader(
101,049✔
185
                    &mut decompression_buffer,
101,049✔
186
                    reader,
101,049✔
187
                    position,
101,049✔
188
                    literal as usize,
101,049✔
UNCOV
189
                )?;
×
190
                break;
101,049✔
191
            }
192
        }
193
    }
194

195
    Ok(decompression_buffer)
101,049✔
196
}
101,049✔
197

198
/// Decompress `refpack` data. Accepts arbitrary `Read`s and `Write`s.
199
///
200
/// # Example
201
///
202
/// ```Rust
203
/// use std::io::Cursor;
204
///
205
/// let mut input = Cursor::new(/* some refpack data */);
206
/// let mut output = Cursor::new(Vec::new());
207
///
208
/// // decompress the input into the output
209
/// refpack::compress(&mut input, &mut output);
210
/// // output now contains the decompressed version of the input
211
/// ```
212
/// # Errors
213
/// - [RefPackError::BadMagic]: Header magic was malformed, likely indicating
214
///   either uncompressed data or attempting to decompress data in an incorrect
215
///   format
216
/// - [RefPackError::BadFlags]: Header magic was malformed, likely indicating
217
///   either uncompressed data or attempting to decompress data in an incorrect
218
///   format
219
/// - [RefPackError::ControlError]: Invalid control code operation was attempted
220
///   to be performed. This normally indicated corrupted or invalid refpack
221
///   data
222
/// - [RefPackError::Io]: Generic IO error occured while attempting to read or
223
///   write data
224
pub fn decompress<F: Format>(
×
225
    reader: &mut (impl Read + Seek),
×
226
    writer: &mut impl Write,
×
227
) -> Result<(), RefPackError> {
×
228
    let data = decompress_internal::<F>(reader)?;
×
229

230
    writer.write_all(data.as_slice())?;
×
231
    writer.flush()?;
×
232

233
    Ok(())
×
234
}
×
235

236
/// Wrapped decompress function with a bit easier and cleaner of an API.
237
/// Takes a slice of bytes and returns a Vec of byes
238
/// In implementation this just creates `Cursor`s for the reader and writer and
239
/// calls `decompress`
240
///
241
/// # Returns
242
///
243
/// A Result containing either `Vec<u8>` of the decompressed data or a
244
/// `RefPackError`.
245
///
246
/// # Errors
247
/// - [RefPackError::BadMagic]: Header was malformed, likely indicating either
248
///   uncompressed data or attempting to decompress data in an incorrect format
249
/// - [RefPackError::BadFlags]: Header magic was malformed, likely indicating
250
///   either uncompressed data or attempting to decompress data in an incorrect
251
///   format
252
/// - [RefPackError::ControlError]: Invalid control code operation was attempted
253
///   to be performed. This normally indicated corrupted or invalid refpack
254
///   data
255
/// - [RefPackError::Io]: Generic IO error occured while attempting to read or
256
///   write data
257
#[inline]
258
pub fn easy_decompress<F: Format>(input: &[u8]) -> Result<Vec<u8>, RefPackError> {
101,049✔
259
    let mut reader = Cursor::new(input);
101,049✔
260
    decompress_internal::<F>(&mut reader)
101,049✔
261
}
101,049✔
STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2026 Coveralls, Inc