• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

vigna / sux-rs / 18480907774

13 Oct 2025 11:33PM UTC coverage: 67.167% (+1.8%) from 65.37%
18480907774

push

github

vigna
Further methods moved to bit_vec_ops; added missing docs; iterators on atomic bit vector no longer take an exclusive reference

13 of 29 new or added lines in 2 files covered. (44.83%)

360 existing lines in 5 files now uncovered.

3852 of 5735 relevant lines covered (67.17%)

117331163.32 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

58.62
/src/func/vbuilder.rs
1
/*
2
 * SPDX-FileCopyrightText: 2025 Sebastiano Vigna
3
 *
4
 * SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
5
 */
6

7
#![allow(clippy::type_complexity)]
8
#![allow(clippy::too_many_arguments)]
9

10
use crate::bits::*;
11
use crate::dict::VFilter;
12
use crate::func::{shard_edge::ShardEdge, *};
13
use crate::traits::BitVecOpsMut;
14
use crate::traits::bit_field_slice::{BitFieldSlice, BitFieldSliceMut, Word};
15
use crate::utils::*;
16
use common_traits::{
17
    CastableInto, DowncastableFrom, DowncastableInto, UnsignedInt, UpcastableFrom, UpcastableInto,
18
};
19
use derivative::Derivative;
20
use derive_setters::*;
21
use dsi_progress_logger::*;
22
use log::info;
23
use rand::SeedableRng;
24
use rand::rngs::SmallRng;
25
use rand::{Rng, RngCore};
26
use rayon::iter::ParallelIterator;
27
use rayon::slice::ParallelSlice;
28
use rdst::*;
29
use std::any::TypeId;
30
use std::borrow::{Borrow, Cow};
31
use std::hint::unreachable_unchecked;
32
use std::marker::PhantomData;
33
use std::mem::transmute;
34
use std::ops::{BitXor, BitXorAssign};
35
use std::slice::Iter;
36
use std::sync::Arc;
37
use std::sync::atomic::{AtomicBool, Ordering};
38
use std::time::Instant;
39
use thread_priority::ThreadPriority;
40

41
use super::shard_edge::FuseLge3Shards;
42

43
const LOG2_MAX_SHARDS: u32 = 16;
44

45
/// A builder for [`VFunc`] and [`VFilter`].
46
///
47
/// Keys must implement the [`ToSig`] trait, which provides a method to compute
48
/// a signature of the key.
49
///
50
/// There are two construction modes: in core memory (default) and
51
/// [offline](VBuilder::offline); both use a [`SigStore`]. In the first case,
52
/// space will be allocated in core memory for signatures and associated values
53
/// for all keys; in the second case, such information will be stored in a
54
/// number of on-disk buckets.
55
///
56
/// There are several setters: for example, you can set [set the maximum number
57
/// of threads](VBuilder::max_num_threads).
58
///
59
/// Once signatures have been computed, each parallel thread will process a
60
/// shard of the signature/value pairs. For very large key sets shards will be
61
/// significantly smaller than the number of keys, so the memory usage, in
62
/// particular in offline mode, can be significantly reduced. Note that using
63
/// too many threads might actually be harmful due to memory contention.
64
///
65
/// The generic parameters are explained in the [`VFunc`] documentation. You
66
/// have to choose the type of the output values and the backend. The remaining
67
/// parameters have default values that are the same as those of
68
/// [`VFunc`]/[`VFilter`], and some elaboration about them can be found in their
69
/// documentation.
70
///
71
/// All construction methods require to pass one or two [`RewindableIoLender`]s
72
/// (keys and possibly values), and the construction might fail and keys might
73
/// be scanned again. The structures in the [`lenders`] module provide easy ways
74
/// to build such lenders, even starting from compressed files of UTF-8 strings.
75
/// The type of the keys of the resulting filter or function will be the type of
76
/// the elements of the [`RewindableIoLender`].
77
///
78
/// # Examples
79
///
80
/// In this example, we build a function that maps each key to itself using a
81
/// boxed slice of `usize` as a backend (note that this is really wasteful). The
82
/// setter for the expected number of keys is used to optimize the construction.
83
/// We use the [`FromIntoIterator`] adapter to turn a clonable [`IntoIterator`]
84
/// into a [`RewindableIoLender`]. Note that you need the
85
/// [`dsi-progress-logger`](https://crates.io/crates/dsi-progress-logger) crate.
86
///
87
/// Type inference derives the input type (`usize`) from the type of the items
88
/// returned by the first [`RewindableIoLender`], and the output type (again,
89
/// `usize`, the first type parameter), from the backend type (`Box<[usize]>`,
90
/// the second type parameter):
91
///
92
/// ```rust
93
/// # use sux::func::VBuilder;
94
/// # use dsi_progress_logger::no_logging;
95
/// # use sux::utils::FromIntoIterator;
96
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
97
/// let builder = VBuilder::<_, Box<[usize]>>::default()
98
///     .expected_num_keys(100);
99
/// let func = builder.try_build_func(
100
///    FromIntoIterator::from(0..100),
101
///    FromIntoIterator::from(0..100),
102
///    no_logging![]
103
/// )?;
104
///
105
/// for i in 0..100 {
106
///    assert_eq!(i, func.get(&i));
107
/// }
108
/// #     Ok(())
109
/// # }
110
/// ```
111
///
112
/// Alternatively we can use the bit-field vector backend, that will use
113
/// ⌈log₂(99)⌉ bits per backend element:
114
///
115
/// ```rust
116
/// # use sux::func::VBuilder;
117
/// # use dsi_progress_logger::no_logging;
118
/// # use sux::utils::FromIntoIterator;
119
/// # use sux::bits::BitFieldVec;
120
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
121
/// let builder = VBuilder::<_, BitFieldVec<usize>>::default()
122
///     .expected_num_keys(100);
123
/// let func = builder.try_build_func(
124
///    FromIntoIterator::from(0..100),
125
///    FromIntoIterator::from(0..100),
126
///    no_logging![]
127
/// )?;
128
///
129
/// for i in 0..100 {
130
///    assert_eq!(i, func.get(&i));
131
/// }
132
/// #     Ok(())
133
/// # }
134
/// ```
135
///
136
/// Since the numbers are small, we can also try to use a fixed-size output:
137
/// type inference takes care of making the second range `0..100` a range of
138
/// `u8`. Note that the type of keys is always `usize`, as it is still inferred
139
/// from the type of the items returned by the first [`RewindableIoLender`]:
140
///
141
/// ```rust
142
/// # use sux::func::VBuilder;
143
/// # use dsi_progress_logger::no_logging;
144
/// # use sux::utils::FromIntoIterator;
145
/// # use sux::bits::BitFieldVec;
146
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
147
/// let builder = VBuilder::<_, Box<[u8]>>::default()
148
///     .expected_num_keys(100);
149
/// let func = builder.try_build_func(
150
///    FromIntoIterator::from(0..100),
151
///    FromIntoIterator::from(0..100),
152
///    no_logging![]
153
/// )?;
154
///
155
/// for i in 0..100 {
156
///    assert_eq!(i, func.get(&i));
157
/// }
158
/// #     Ok(())
159
/// # }
160
/// ```
161
///
162
///
163
/// We now try to build a fast 8-bit filter for the same key set, using a boxed
164
/// slice of `u8` as a backend (this is not wasteful, as the filter uses 8-bit
165
/// hashes):
166
///
167
/// ```rust
168
/// # use sux::func::VBuilder;
169
/// # use dsi_progress_logger::no_logging;
170
/// # use sux::utils::FromIntoIterator;
171
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
172
/// let builder = VBuilder::<_, Box<[u8]>>::default()
173
///     .expected_num_keys(100);
174
/// let func = builder.try_build_filter(
175
///    FromIntoIterator::from(0..100),
176
///    no_logging![]
177
/// )?;
178
///
179
/// for i in 0..100 {
180
///    assert!(func[i]);
181
/// }
182
/// #     Ok(())
183
/// # }
184
/// ```
185
///
186
/// Since the keys are very few, we can switch to 64-bit signatures, and no
187
/// shards, which will yield faster queries:
188
///
189
/// ```rust
190
/// # use sux::func::VBuilder;
191
/// # use sux::func::shard_edge::FuseLge3NoShards;
192
/// # use dsi_progress_logger::no_logging;
193
/// # use sux::utils::FromIntoIterator;
194
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
195
/// let builder = VBuilder::<_, Box<[u8]>, [u64; 1], FuseLge3NoShards>::default()
196
///     .expected_num_keys(100);
197
/// let func = builder.try_build_filter(
198
///    FromIntoIterator::from(0..100),
199
///    no_logging![]
200
/// )?;
201
///
202
/// for i in 0..100 {
203
///    assert!(func[i]);
204
/// }
205
/// #     Ok(())
206
/// # }
207
/// ```
208

209
#[derive(Setters, Debug, Derivative)]
210
#[derivative(Default)]
211
#[setters(generate = false)]
212
pub struct VBuilder<
213
    W: BinSafe + Word,
214
    D: BitFieldSlice<W> + Send + Sync = Box<[W]>,
215
    S = [u64; 2],
216
    E: ShardEdge<S, 3> = FuseLge3Shards,
217
> {
218
    /// The expected number of keys.
219
    ///
220
    /// While this setter is optional, setting this value to a reasonable bound
221
    /// on the actual number of keys will significantly speed up the
222
    /// construction.
223
    #[setters(generate = true, strip_option)]
224
    #[derivative(Default(value = "None"))]
225
    expected_num_keys: Option<usize>,
226

227
    /// The maximum number of parallel threads to use. The default is 8.
228
    #[setters(generate = true)]
229
    #[derivative(Default(value = "8"))]
230
    max_num_threads: usize,
231

232
    /// Use disk-based buckets to reduce core memory usage at construction time.
233
    #[setters(generate = true)]
234
    offline: bool,
235

236
    /// Check for duplicated signatures. This is not necessary in general,
237
    /// but if you suspect you might be feeding duplicate keys, you can
238
    /// enable this check.
239
    #[setters(generate = true)]
240
    check_dups: bool,
241

242
    /// Use always the low-memory peel-by-signature algorithm (true) or the
243
    /// high-memory peel-by-index algorithm (false). The former is slightly
244
    /// slower, but it uses much less memory. Normally [`VBuilder`] uses
245
    /// high-mem and switches to low-mem if there are more
246
    /// than two threads and more than four shards.
247
    #[setters(generate = true, strip_option)]
248
    #[derivative(Default(value = "None"))]
249
    low_mem: Option<bool>,
250

251
    /// The seed for the random number generator.
252
    #[setters(generate = true)]
253
    seed: u64,
254

255
    /// The base-2 logarithm of buckets of the [`SigStore`]. The default is 8.
256
    /// This value is automatically overridden, even if set, if you provide an
257
    /// [expected number of keys](VBuilder::expected_num_keys).
258
    #[setters(generate = true, strip_option)]
259
    #[derivative(Default(value = "8"))]
260
    log2_buckets: u32,
261

262
    /// The target relative space loss due to [ε-cost
263
    /// sharding](https://doi.org/10.4230/LIPIcs.ESA.2019.38).s
264
    ///
265
    /// The default is 0.001. Setting a larger target, for example, 0.01, will
266
    /// increase the space overhead due to sharding, but will provide in general
267
    /// finer shards. This might not always happen, however, because the ε-cost
268
    /// bound is just one of the bounds concurring in limiting the amount of
269
    /// sharding for a specific [`ShardEdge`]. For example, increasing the
270
    /// target to 0.01 will provide very fine sharding using `Mwhc3Shards`
271
    /// shard/edge logic, activated by the `mwhc` feature, but will affect only
272
    /// slightly [`FuseLge3Shards`] or
273
    /// [`FuseLge3FullSigs`](crate::func::shard_edge::FuseLge3FullSigs).
274
    #[setters(generate = true, strip_option)]
275
    #[derivative(Default(value = "0.001"))]
276
    eps: f64,
277

278
    /// The bit width of the maximum value.
279
    bit_width: usize,
280
    /// The edge generator.
281
    shard_edge: E,
282
    /// The number of keys.
283
    num_keys: usize,
284
    /// The ratio between the number of vertices and the number of edges
285
    /// (i.e., between the number of variables and the number of equations).
286
    c: f64,
287
    /// Whether we should use [lazy Gaussian
288
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517).
289
    lge: bool,
290
    /// The number of threads to use.
291
    num_threads: usize,
292
    /// Fast-stop for failed attempts.
293
    failed: AtomicBool,
294
    #[doc(hidden)]
295
    _marker_v: PhantomData<(W, D, S)>,
296
}
297

298
/// Fatal build errors.
299
#[derive(thiserror::Error, Debug)]
300
pub enum BuildError {
301
    #[error("Duplicate key")]
302
    /// A duplicate key was detected.
303
    DuplicateKey,
304
    #[error("Duplicate local signatures: use full signatures")]
305
    /// A duplicate key was detected.
306
    DuplicateLocalSignatures,
307
    #[error("Value too large for specified bit size")]
308
    /// A value is too large for the specified bit size.
309
    ValueTooLarge,
310
}
311

312
#[derive(thiserror::Error, Debug)]
313
/// Transient error during the build, leading to
314
/// trying with a different seed.
315
pub enum SolveError {
316
    #[error("Duplicate signature")]
317
    /// A duplicate signature was detected.
318
    DuplicateSignature,
319
    #[error("Duplicate local signature")]
320
    /// A duplicate local signature was detected.
321
    DuplicateLocalSignature,
322
    #[error("Max shard too big")]
323
    /// The maximum shard is too big relatively to the average shard.
324
    MaxShardTooBig,
325
    #[error("Unsolvable shard")]
326
    /// A shard cannot be solved.
327
    UnsolvableShard,
328
}
329

330
/// The result of a peeling procedure.
331
enum PeelResult<
332
    'a,
333
    W: BinSafe + Word + Send + Sync,
334
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync + 'a,
335
    S: Sig + BinSafe,
336
    E: ShardEdge<S, 3>,
337
    V: BinSafe,
338
> {
339
    Complete(),
340
    Partial {
341
        /// The shard index.
342
        shard_index: usize,
343
        /// The shard.
344
        shard: Arc<Vec<SigVal<S, V>>>,
345
        /// The data.
346
        data: ShardData<'a, W, D>,
347
        /// The double stack whose upper stack contains the peeled edges
348
        /// (possibly represented by the vertex from which they have been
349
        /// peeled).
350
        double_stack: DoubleStack<E::Vertex>,
351
        /// The sides stack.
352
        sides_stack: Vec<u8>,
353
    },
354
}
355

356
/// A graph represented compactly.
357
///
358
/// Each (*k*-hyper)edge is a set of *k* vertices (by construction fuse graphs
359
/// do not have degenerate edges), but we represent it internally as a vector.
360
/// We call *side* the position of a vertex in the edge.
361
///
362
/// For each vertex we store information about the edges incident to the vertex
363
/// and the sides of the vertex in such edges. While technically not necessary
364
/// to perform peeling, the knowledge of the sides speeds up the peeling visit
365
/// by reducing the number of tests that are necessary to update the degrees
366
/// once the edge is peeled (see the `peel_by_*` methods). For the same reason
367
/// it also speeds up assignment.
368
///
369
/// Depending on the peeling method (by signature or by index), the graph will
370
/// store edge indices or signature/value pairs (the generic parameter `X`).
371
///
372
/// Edge information is packed together using Djamal's XOR trick (see
373
/// [“Cache-Oblivious Peeling of Random
374
/// Hypergraphs”](https://doi.org/10.1109/DCC.2014.48)): since during the
375
/// peeling b visit we need to know the content of the list only when a single
376
/// edge index is present, we can XOR together all the edge information.
377
///
378
/// We use a single byte to store the degree (six upper bits) and the XOR of the
379
/// sides (lower two bits). The degree can be stored with a small number of bits
380
/// because the graph is random, so the maximum degree is *O*(log log *n*).
381
/// In any case, the Boolean field `overflow` will become `true` in case of
382
/// overflow.
383
///
384
/// When we peel an edge, we just [zero the degree](Self::zero), leaving the
385
/// edge information and the side in place for further processing later.
386
struct XorGraph<X: Copy + Default + BitXor + BitXorAssign> {
387
    edges: Box<[X]>,
388
    degrees_sides: Box<[u8]>,
389
    overflow: bool,
390
}
391

392
impl<X: BitXor + BitXorAssign + Default + Copy> XorGraph<X> {
393
    pub fn new(n: usize) -> XorGraph<X> {
187✔
394
        XorGraph {
395
            edges: vec![X::default(); n].into(),
935✔
396
            degrees_sides: vec![0; n].into(),
374✔
397
            overflow: false,
398
        }
399
    }
400

401
    #[inline(always)]
402
    pub fn add(&mut self, v: usize, x: X, side: usize) {
69,242,100✔
403
        debug_assert!(side < 3);
138,484,200✔
404
        let (degree_size, overflow) = self.degrees_sides[v].overflowing_add(4);
207,726,300✔
405
        self.degrees_sides[v] = degree_size;
69,242,100✔
406
        self.overflow |= overflow;
69,242,100✔
407
        self.degrees_sides[v] ^= side as u8;
69,242,100✔
408
        self.edges[v] ^= x;
69,242,100✔
409
    }
410

411
    #[inline(always)]
412
    pub fn remove(&mut self, v: usize, x: X, side: usize) {
44,901,368✔
413
        debug_assert!(side < 3);
89,802,736✔
414
        self.degrees_sides[v] -= 4;
44,901,368✔
415
        self.degrees_sides[v] ^= side as u8;
44,901,368✔
416
        self.edges[v] ^= x;
44,901,368✔
417
    }
418

419
    #[inline(always)]
420
    pub fn zero(&mut self, v: usize) {
22,450,684✔
421
        self.degrees_sides[v] &= 0b11;
22,450,684✔
422
    }
423

424
    #[inline(always)]
425
    pub fn edge_and_side(&self, v: usize) -> (X, usize) {
32,450,684✔
426
        debug_assert!(self.degree(v) < 2);
97,352,052✔
427
        (self.edges[v] as _, (self.degrees_sides[v] & 0b11) as _)
64,901,368✔
428
    }
429

430
    #[inline(always)]
431
    pub fn degree(&self, v: usize) -> u8 {
101,315,344✔
432
        self.degrees_sides[v] >> 2
101,315,344✔
433
    }
434

435
    pub fn degrees(
187✔
436
        &self,
437
    ) -> std::iter::Map<std::iter::Copied<std::slice::Iter<'_, u8>>, fn(u8) -> u8> {
438
        self.degrees_sides.iter().copied().map(|d| d >> 2)
26,094,784✔
439
    }
440
}
441

442
/// A preallocated stack implementation that avoids the expensive (even if
443
/// rarely taken) branch of the `Vec` implementation in which memory is
444
/// reallocated. Note that using [`Vec::with_capacity`] is not enough, because
445
/// for the CPU the branch is still there.
446
struct FastStack<X: Copy + Default> {
447
    stack: Vec<X>,
448
    top: usize,
449
}
450

451
impl<X: Copy + Default> FastStack<X> {
452
    pub fn new(n: usize) -> FastStack<X> {
24✔
453
        FastStack {
454
            stack: vec![X::default(); n],
48✔
455
            top: 0,
456
        }
457
    }
458

459
    pub fn push(&mut self, x: X) {
24,000,000✔
460
        debug_assert!(self.top < self.stack.len());
72,000,000✔
461
        self.stack[self.top] = x;
24,000,000✔
462
        self.top += 1;
24,000,000✔
463
    }
464

465
    pub fn len(&self) -> usize {
12✔
466
        self.top
12✔
467
    }
468

469
    pub fn iter(&self) -> std::slice::Iter<'_, X> {
24✔
470
        self.stack[..self.top].iter()
48✔
471
    }
472
}
473

474
/// Two stacks in the same vector.
475
///
476
/// This struct implements a pair of stacks sharing the same memory. The lower
477
/// stack grows from the beginning of the vector, the upper stack grows from the
478
/// end of the vector. Since we use the lower stack for the visit and the upper
479
/// stack for peeled edges (possibly represented by the vertex from which they
480
/// have been peeled), the sum of the lengths of the two stacks cannot exceed
481
/// the length of the vector.
482
#[derive(Debug)]
483
struct DoubleStack<V> {
484
    stack: Vec<V>,
485
    lower: usize,
486
    upper: usize,
487
}
488

489
impl<V: Default + Copy> DoubleStack<V> {
490
    fn new(n: usize) -> DoubleStack<V> {
175✔
491
        DoubleStack {
492
            stack: vec![V::default(); n],
525✔
493
            lower: 0,
494
            upper: n,
495
        }
496
    }
497
}
498

499
impl<V: Copy> DoubleStack<V> {
500
    #[inline(always)]
501
    fn push_lower(&mut self, v: V) {
10,951,140✔
502
        debug_assert!(self.lower < self.upper);
21,902,280✔
503
        self.stack[self.lower] = v;
10,951,140✔
504
        self.lower += 1;
10,951,140✔
505
    }
506

507
    #[inline(always)]
508
    fn push_upper(&mut self, v: V) {
10,450,684✔
509
        debug_assert!(self.lower < self.upper);
20,901,368✔
510
        self.upper -= 1;
10,450,684✔
511
        self.stack[self.upper] = v;
10,450,684✔
512
    }
513

514
    #[inline(always)]
515
    fn pop_lower(&mut self) -> Option<V> {
10,951,315✔
516
        if self.lower == 0 {
10,951,315✔
517
            None
175✔
518
        } else {
519
            self.lower -= 1;
10,951,140✔
UNCOV
520
            Some(self.stack[self.lower])
×
521
        }
522
    }
523

524
    fn upper_len(&self) -> usize {
330✔
525
        self.stack.len() - self.upper
660✔
526
    }
527

528
    fn iter_upper(&self) -> Iter<'_, V> {
200✔
529
        self.stack[self.upper..].iter()
400✔
530
    }
531
}
532

533
/// An iterator over segments of data associated with each shard.
534
type ShardDataIter<'a, W, D> = <D as BitFieldSliceMut<W>>::ChunksMut<'a>;
535
/// A segment of data associated with a specific shard.
536
type ShardData<'a, W, D> = <ShardDataIter<'a, W, D> as Iterator>::Item;
537

538
/// Builds a new function using a `Box<[W]>` to store values.
539
///
540
/// Since values are stored in a boxed slice access is particularly fast, but
541
/// the bit width of the output of the function will be exactly the bit width of
542
/// the unsigned type `W`.
543
impl<W: BinSafe + Word, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, Box<[W]>, S, E>
544
where
545
    u128: UpcastableFrom<W>,
546
    SigVal<S, W>: RadixKey,
547
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
548
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
549
{
UNCOV
550
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
×
551
        mut self,
552
        keys: impl RewindableIoLender<B>,
553
        values: impl RewindableIoLender<W>,
554
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
555
    ) -> anyhow::Result<VFunc<T, W, Box<[W]>, S, E>>
556
    where
557
        for<'a> ShardDataIter<'a, W, Box<[W]>>: Send,
558
        for<'a> ShardData<'a, W, Box<[W]>>: Send,
559
    {
560
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
×
561
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();
×
UNCOV
562
        self.build_loop(keys, values, None, &get_val, new_data, pl)
×
563
    }
564
}
565

566
/// Builds a new filter using a `Box<[W]>` to store values.
567
///
568
/// Since values are stored in a boxed slice access is particularly fast, but
569
/// the number of bits of the hashes will be  exactly the bit width of the
570
/// unsigned type `W`.
571
impl<W: BinSafe + Word + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
572
    VBuilder<W, Box<[W]>, S, E>
573
where
574
    u128: UpcastableFrom<W>,
575
    SigVal<S, EmptyVal>: RadixKey,
576
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
577
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
578
{
579
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
36✔
580
        mut self,
581
        keys: impl RewindableIoLender<B>,
582
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
583
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, Box<[W]>, S, E>>>
584
    where
585
        for<'a> ShardDataIter<'a, W, Box<[W]>>: Send,
586
        for<'a> ShardData<'a, W, Box<[W]>>: Send,
587
    {
588
        let filter_mask = W::MAX;
72✔
589
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
12,630,474✔
590
            mix64(shard_edge.edge_hash(sig_val.sig)).downcast()
63,152,190✔
591
        };
592
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();
378✔
593

UNCOV
594
        Ok(VFilter {
×
595
            func: self.build_loop(
72✔
596
                keys,
36✔
597
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
108✔
598
                Some(W::BITS),
36✔
599
                &get_val,
36✔
600
                new_data,
36✔
601
                pl,
36✔
602
            )?,
603
            filter_mask,
36✔
UNCOV
604
            hash_bits: W::BITS as u32,
×
605
        })
606
    }
607
}
608

609
/// Builds a new function using a [bit-field vector](BitFieldVec) on words of
610
/// the unsigned type `W` to store values.
611
///
612
/// Since values are stored in a bit-field vector access will be slower than
613
/// when using a boxed slice, but the bit width of stored values will be the
614
/// minimum necessary. It must be in any case at most the bit width of `W`.
615
///
616
/// Typically `W` will be `usize` or `u64`.
617
impl<W: BinSafe + Word, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, BitFieldVec<W>, S, E>
618
where
619
    u128: UpcastableFrom<W>,
620
    SigVal<S, W>: RadixKey,
621
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
622
{
623
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
27✔
624
        mut self,
625
        keys: impl RewindableIoLender<B>,
626
        values: impl RewindableIoLender<W>,
627
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
628
    ) -> anyhow::Result<VFunc<T, W, BitFieldVec<W>, S, E>> {
629
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
27✔
630
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
273✔
631
        self.build_loop(keys, values, None, &get_val, new_data, pl)
216✔
632
    }
633
}
634

635
/// Builds a new filter using a [bit-field vector](BitFieldVec) on words of the
636
/// unsigned type `W` to store values.
637
///
638
/// Since values are stored in a bit-field vector access will be slower than
639
/// when using a boxed slice, but the number of bits of the hashes can be set at
640
/// will. It must be in any case at most the bit width of `W`.
641
///
642
/// Typically `W` will be `usize` or `u64`.
643
impl<W: BinSafe + Word + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
644
    VBuilder<W, BitFieldVec<W>, S, E>
645
where
646
    u128: UpcastableFrom<W>,
647
    SigVal<S, EmptyVal>: RadixKey,
648
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
649
{
UNCOV
650
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
×
651
        mut self,
652
        keys: impl RewindableIoLender<B>,
653
        filter_bits: usize,
654
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
655
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, BitFieldVec<W>, S, E>>> {
656
        assert!(filter_bits > 0);
×
657
        assert!(filter_bits <= W::BITS);
×
658
        let filter_mask = W::MAX >> (W::BITS - filter_bits);
×
659
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
×
660
            <W as DowncastableFrom<u64>>::downcast_from(mix64(shard_edge.edge_hash(sig_val.sig)))
×
UNCOV
661
                & filter_mask
×
662
        };
UNCOV
663
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
×
664

665
        Ok(VFilter {
×
666
            func: self.build_loop(
×
667
                keys,
×
668
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
×
669
                Some(filter_bits),
×
670
                &get_val,
×
671
                new_data,
×
UNCOV
672
                pl,
×
673
            )?,
674
            filter_mask,
×
UNCOV
675
            hash_bits: filter_bits as _,
×
676
        })
677
    }
678
}
679

680
impl<
681
    W: BinSafe + Word,
682
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
683
    S: Sig + Send + Sync,
684
    E: ShardEdge<S, 3>,
685
> VBuilder<W, D, S, E>
686
{
687
    /// Builds and returns a new function with given keys and values.
688
    ///
689
    /// This method can build functions based both on vectors and on bit-field
690
    /// vectors. The necessary abstraction is provided by the
691
    /// `new_data(bit_width, len)` function, which is called to create the data
692
    /// structure to store the values.
693
    ///
694
    /// When `V` is [`EmptyVal`], the this method builds a function supporting a
695
    /// filter by mapping each key to a mix of its local signature. The
696
    /// necessary abstraction is provided by the `get_val` function, which is
697
    /// called to extract the value from the signature/value pair; in the case
698
    /// of functions it returns the value stored in the signature/value pair,
699
    /// and in the case of filters it returns the hash associated with the
700
    /// signature.
701
    fn build_loop<
63✔
702
        T: ?Sized + ToSig<S> + std::fmt::Debug,
703
        B: ?Sized + Borrow<T>,
704
        V: BinSafe + Default + Send + Sync + Ord + UpcastableInto<u128>,
705
    >(
706
        &mut self,
707
        mut keys: impl RewindableIoLender<B>,
708
        mut values: impl RewindableIoLender<V>,
709
        bit_width: Option<usize>,
710
        get_val: &(impl Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync),
711
        new_data: fn(usize, usize) -> D,
712
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
713
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
714
    where
715
        u128: UpcastableFrom<W>,
716
        SigVal<S, V>: RadixKey,
717
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
718
        for<'a> ShardDataIter<'a, W, D>: Send,
719
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
720
    {
721
        let mut dup_count = 0;
126✔
722
        let mut local_dup_count = 0;
126✔
723
        let mut prng = SmallRng::seed_from_u64(self.seed);
189✔
724

725
        if let Some(expected_num_keys) = self.expected_num_keys {
115✔
726
            self.shard_edge.set_up_shards(expected_num_keys, self.eps);
×
UNCOV
727
            self.log2_buckets = self.shard_edge.shard_high_bits();
×
728
        }
729

730
        pl.info(format_args!("Using 2^{} buckets", self.log2_buckets));
252✔
731

732
        // Loop until success or duplicate detection
UNCOV
733
        loop {
×
734
            let seed = prng.random();
591✔
735

736
            match if self.offline {
197✔
737
                self.try_seed(
144✔
738
                    seed,
72✔
739
                    sig_store::new_offline::<S, V>(
72✔
740
                        self.log2_buckets,
72✔
UNCOV
741
                        LOG2_MAX_SHARDS,
×
742
                        self.expected_num_keys,
72✔
743
                    )?,
744
                    &mut keys,
×
745
                    &mut values,
×
746
                    bit_width,
×
747
                    get_val,
×
748
                    new_data,
×
UNCOV
749
                    pl,
×
750
                )
751
            } else {
752
                self.try_seed(
250✔
753
                    seed,
×
754
                    sig_store::new_online::<S, V>(
×
755
                        self.log2_buckets,
×
756
                        LOG2_MAX_SHARDS,
×
UNCOV
757
                        self.expected_num_keys,
×
758
                    )?,
759
                    &mut keys,
×
760
                    &mut values,
×
761
                    bit_width,
×
762
                    get_val,
×
763
                    new_data,
×
UNCOV
764
                    pl,
×
765
                )
766
            } {
767
                Ok(func) => {
62✔
UNCOV
768
                    return Ok(func);
×
769
                }
770
                Err(error) => {
135✔
771
                    match error.downcast::<SolveError>() {
135✔
772
                        Ok(vfunc_error) => match vfunc_error {
135✔
773
                            // Let's try another seed, but just a few times--most likely,
774
                            // duplicate keys
UNCOV
775
                            SolveError::DuplicateSignature => {
×
776
                                if dup_count >= 3 {
4✔
777
                                    pl.error(format_args!("Duplicate keys (duplicate 128-bit signatures with four different seeds)"));
4✔
778
                                    return Err(BuildError::DuplicateKey.into());
1✔
779
                                }
780
                                pl.warn(format_args!(
×
UNCOV
781
                                "Duplicate 128-bit signature, trying again with a different seed..."
×
782
                                ));
UNCOV
783
                                dup_count += 1;
×
784
                            }
785
                            // Let's try another seed, but just a few times
786
                            SolveError::DuplicateLocalSignature => {
×
787
                                if local_dup_count >= 2 {
×
788
                                    pl.error(format_args!("Duplicate local signatures: use full signatures (duplicate local signatures with three different seeds)"));
×
UNCOV
789
                                    return Err(BuildError::DuplicateLocalSignatures.into());
×
790
                                }
791
                                pl.warn(format_args!(
×
UNCOV
792
                                "Duplicate local signature, trying again with a different seed..."
×
793
                                ));
UNCOV
794
                                local_dup_count += 1;
×
795
                            }
796
                            SolveError::MaxShardTooBig => {
1✔
797
                                pl.warn(format_args!(
3✔
798
                                "The maximum shard is too big, trying again with a different seed..."
1✔
799
                               ));
800
                            }
801
                            // Let's just try another seed
802
                            SolveError::UnsolvableShard => {
130✔
803
                                pl.warn(format_args!(
390✔
804
                                    "Unsolvable shard, trying again with a different seed..."
130✔
805
                                ));
806
                            }
807
                        },
UNCOV
808
                        Err(error) => return Err(error),
×
809
                    }
810
                }
811
            }
812

813
            values = values.rewind().map_err(Into::into)?;
268✔
814
            keys = keys.rewind().map_err(Into::into)?;
134✔
815
        }
816
    }
817
}
818

819
impl<
820
    W: BinSafe + Word,
821
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
822
    S: Sig + Send + Sync,
823
    E: ShardEdge<S, 3>,
824
> VBuilder<W, D, S, E>
825
{
826
    /// Tries to build a function using specific seed. See the comments in the
827
    /// [`VBuilder::build_loop`] method for more details.
828
    ///
829
    /// This methods reads the input, sets up the shards, allocates the backend
830
    /// using `new_data`, and passes the backend and an iterator on shards to
831
    /// the [`VBuilder::try_build_from_shard_iter`] method.
832
    fn try_seed<
197✔
833
        T: ?Sized + ToSig<S> + std::fmt::Debug,
834
        B: ?Sized + Borrow<T>,
835
        V: BinSafe + Default + Send + Sync + Ord + UpcastableInto<u128>,
836
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
837
    >(
838
        &mut self,
839
        seed: u64,
840
        mut sig_store: impl SigStore<S, V>,
841
        keys: &mut impl RewindableIoLender<B>,
842
        values: &mut impl RewindableIoLender<V>,
843
        bit_width: Option<usize>,
844
        get_val: &G,
845
        new_data: fn(usize, usize) -> D,
846
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
847
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
848
    where
849
        SigVal<S, V>: RadixKey,
850
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
851
        for<'a> ShardDataIter<'a, W, D>: Send,
852
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
853
    {
854
        let shard_edge = &mut self.shard_edge;
394✔
855

856
        pl.expected_updates(self.expected_num_keys);
591✔
857
        pl.item_name("key");
394✔
858
        pl.start(format!(
591✔
859
            "Computing and storing {}-bit signatures in {} using seed 0x{:016x}...",
197✔
860
            std::mem::size_of::<S>() * 8,
197✔
861
            sig_store
197✔
862
                .temp_dir()
197✔
863
                .map(|d| d.path().to_string_lossy())
341✔
864
                .unwrap_or(Cow::Borrowed("memory")),
394✔
UNCOV
865
            seed
×
866
        ));
867

868
        // This will be the maximum value for functions and EmptyVal for filters
869
        let mut maybe_max_value = V::default();
394✔
870
        let start = Instant::now();
394✔
871

872
        while let Some(result) = keys.next() {
46,361,677✔
UNCOV
873
            match result {
×
874
                Ok(key) => {
23,180,740✔
UNCOV
875
                    pl.light_update();
×
876
                    // This might be an actual value, if we are building a
877
                    // function, or EmptyVal, if we are building a filter.
878
                    let &maybe_val = values
23,180,740✔
879
                        .next()
880
                        .expect("Not enough values")
UNCOV
881
                        .map_err(Into::into)?;
×
882
                    let sig_val = SigVal {
UNCOV
883
                        sig: T::to_sig(key.borrow(), seed),
×
884
                        val: maybe_val,
885
                    };
886
                    maybe_max_value = Ord::max(maybe_max_value, maybe_val);
×
UNCOV
887
                    sig_store.try_push(sig_val)?;
×
888
                }
UNCOV
889
                Err(e) => return Err(e.into()),
×
890
            }
891
        }
892
        pl.done();
394✔
893

894
        self.num_keys = sig_store.len();
197✔
895
        self.bit_width = if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
197✔
896
            bit_width.expect("Bit width must be set for filters")
345✔
897
        } else {
898
            let len_width = maybe_max_value.upcast().len() as usize;
82✔
899
            if let Some(bit_width) = bit_width {
×
900
                if len_width > bit_width {
×
UNCOV
901
                    return Err(BuildError::ValueTooLarge.into());
×
902
                }
UNCOV
903
                bit_width
×
904
            } else {
905
                len_width
82✔
906
            }
907
        };
908

909
        info!(
197✔
910
            "Computation of signatures from inputs completed in {:.3} seconds ({} keys, {:.3} ns/key)",
160✔
911
            start.elapsed().as_secs_f64(),
320✔
UNCOV
912
            self.num_keys,
×
913
            start.elapsed().as_nanos() as f64 / self.num_keys as f64
160✔
914
        );
915

UNCOV
916
        let start = Instant::now();
×
917

918
        let shard_store = sig_store.into_shard_store(shard_edge.shard_high_bits())?;
197✔
919
        let max_shard = shard_store.shard_sizes().iter().copied().max().unwrap_or(0);
×
UNCOV
920
        let filter = TypeId::of::<V>() == TypeId::of::<EmptyVal>();
×
921

922
        shard_edge.set_up_shards(self.num_keys, self.eps);
×
UNCOV
923
        (self.c, self.lge) = shard_edge.set_up_graphs(self.num_keys, max_shard);
×
924

925
        if filter {
115✔
926
            pl.info(format_args!(
345✔
927
                "Number of keys: {} Bit width: {}",
115✔
928
                self.num_keys, self.bit_width,
115✔
929
            ));
930
        } else {
931
            pl.info(format_args!(
82✔
932
                "Number of keys: {} Max value: {} Bit width: {}",
82✔
933
                self.num_keys,
82✔
934
                maybe_max_value.upcast(),
82✔
935
                self.bit_width,
82✔
936
            ));
937
        }
938

939
        if shard_edge.shard_high_bits() != 0 {
6✔
940
            pl.info(format_args!(
24✔
941
                "Max shard / average shard: {:.2}%",
12✔
942
                (100.0 * max_shard as f64)
12✔
943
                    / (self.num_keys as f64 / shard_edge.num_shards() as f64)
12✔
944
            ));
945
        }
946

UNCOV
947
        if max_shard as f64 > 1.01 * self.num_keys as f64 / shard_edge.num_shards() as f64 {
×
948
            // This might sometimes happen with small sharded graphs
949
            Err(SolveError::MaxShardTooBig.into())
1✔
950
        } else {
951
            let data = new_data(
952
                self.bit_width,
×
UNCOV
953
                shard_edge.num_vertices() * shard_edge.num_shards(),
×
954
            );
UNCOV
955
            self.try_build_from_shard_iter(seed, data, shard_store.into_iter(), get_val, pl)
×
956
                .inspect(|_| {
62✔
957
                    info!(
62✔
958
                        "Construction from signatures completed in {:.3} seconds ({} keys, {:.3} ns/key)",
52✔
959
                        start.elapsed().as_secs_f64(),
104✔
UNCOV
960
                        self.num_keys,
×
961
                        start.elapsed().as_nanos() as f64 / self.num_keys as f64
52✔
962
                    );
963
                })
UNCOV
964
                .map_err(Into::into)
×
965
        }
966
    }
967

968
    /// Builds and return a new function starting from an iterator on shards.
969
    ///
970
    /// See [`VBuilder::build_loop`] for more details on the parameters.
971
    fn try_build_from_shard_iter<
196✔
972
        T: ?Sized + ToSig<S>,
973
        I,
974
        P,
975
        V: BinSafe + Default + Send + Sync,
976
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
977
    >(
978
        &mut self,
979
        seed: u64,
980
        mut data: D,
981
        shard_iter: I,
982
        get_val: &G,
983
        pl: &mut P,
984
    ) -> Result<VFunc<T, W, D, S, E>, SolveError>
985
    where
986
        SigVal<S, V>: RadixKey,
987
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
988
        P: ProgressLog + Clone + Send + Sync,
989
        I: Iterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
990
        for<'a> ShardDataIter<'a, W, D>: Send,
991
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
992
    {
993
        let shard_edge = &self.shard_edge;
392✔
994
        self.num_threads = shard_edge.num_shards().min(self.max_num_threads);
588✔
995

996
        pl.info(format_args!("{}", self.shard_edge));
784✔
997
        pl.info(format_args!(
588✔
998
            "c: {}, Overhead: {:+.4}% Number of threads: {}",
196✔
UNCOV
999
            self.c,
×
1000
            100. * ((shard_edge.num_vertices() * shard_edge.num_shards()) as f64
588✔
1001
                / (self.num_keys as f64)
196✔
1002
                - 1.),
196✔
UNCOV
1003
            self.num_threads
×
1004
        ));
1005

1006
        if self.lge {
196✔
1007
            pl.info(format_args!("Peeling towards lazy Gaussian elimination"));
696✔
1008
            self.par_solve(
348✔
1009
                shard_iter,
174✔
1010
                &mut data,
174✔
1011
                |this, shard_index, shard, data, pl| {
165✔
1012
                    this.lge_shard(shard_index, shard, data, get_val, pl)
1,155✔
1013
                },
1014
                &mut pl.concurrent(),
174✔
1015
                pl,
174✔
1016
            )?;
1017
        } else if self.low_mem == Some(true)
22✔
1018
            || self.low_mem.is_none() && self.num_threads > 3 && shard_edge.num_shards() > 2
14✔
1019
        {
1020
            // Less memory, but slower
1021
            self.par_solve(
10✔
1022
                shard_iter,
×
UNCOV
1023
                &mut data,
×
1024
                |this, shard_index, shard, data, pl| {
10✔
1025
                    this.peel_by_sig_vals_low_mem(shard_index, shard, data, get_val, pl)
70✔
1026
                },
1027
                &mut pl.concurrent(),
×
UNCOV
1028
                pl,
×
1029
            )?;
1030
        } else {
1031
            // More memory, but faster
1032
            self.par_solve(
12✔
1033
                shard_iter,
×
UNCOV
1034
                &mut data,
×
1035
                |this, shard_index, shard, data, pl| {
12✔
1036
                    this.peel_by_sig_vals_high_mem(shard_index, shard, data, get_val, pl)
84✔
1037
                },
1038
                &mut pl.concurrent(),
×
UNCOV
1039
                pl,
×
1040
            )?;
1041
        }
1042

1043
        pl.info(format_args!(
62✔
1044
            "Bits/keys: {} ({:+.4}%)",
×
1045
            data.len() as f64 * self.bit_width as f64 / self.num_keys as f64,
×
UNCOV
1046
            100.0 * (data.len() as f64 / self.num_keys as f64 - 1.),
×
1047
        ));
1048

1049
        Ok(VFunc {
×
1050
            seed,
×
1051
            shard_edge: self.shard_edge,
×
1052
            num_keys: self.num_keys,
×
1053
            data,
×
1054
            _marker_t: std::marker::PhantomData,
×
1055
            _marker_w: std::marker::PhantomData,
×
UNCOV
1056
            _marker_s: std::marker::PhantomData,
×
1057
        })
1058
    }
1059
}
1060

1061
macro_rules! remove_edge {
1062
    ($xor_graph: ident, $e: ident, $side: ident, $edge: ident, $stack: ident, $push:ident) => {
1063
        match $side {
1064
            0 => {
1065
                if $xor_graph.degree($e[1]) == 2 {
1066
                    $stack.$push($e[1].cast());
1067
                }
1068
                $xor_graph.remove($e[1], $edge, 1);
1069
                if $xor_graph.degree($e[2]) == 2 {
1070
                    $stack.$push($e[2].cast());
1071
                }
1072
                $xor_graph.remove($e[2], $edge, 2);
1073
            }
1074
            1 => {
1075
                if $xor_graph.degree($e[0]) == 2 {
1076
                    $stack.$push($e[0].cast());
1077
                }
1078
                $xor_graph.remove($e[0], $edge, 0);
1079
                if $xor_graph.degree($e[2]) == 2 {
1080
                    $stack.$push($e[2].cast());
1081
                }
1082
                $xor_graph.remove($e[2], $edge, 2);
1083
            }
1084
            2 => {
1085
                if $xor_graph.degree($e[0]) == 2 {
1086
                    $stack.$push($e[0].cast());
1087
                }
1088
                $xor_graph.remove($e[0], $edge, 0);
1089
                if $xor_graph.degree($e[1]) == 2 {
1090
                    $stack.$push($e[1].cast());
1091
                }
1092
                $xor_graph.remove($e[1], $edge, 1);
1093
            }
1094
            _ => unsafe { unreachable_unchecked() },
1095
        }
1096
    };
1097
}
1098

1099
impl<
1100
    W: BinSafe + Word + Send + Sync,
1101
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
1102
    S: Sig + BinSafe,
1103
    E: ShardEdge<S, 3>,
1104
> VBuilder<W, D, S, E>
1105
{
1106
    fn count_sort<V: BinSafe>(&self, data: &mut [SigVal<S, V>]) {
187✔
1107
        let num_sort_keys = self.shard_edge.num_sort_keys();
561✔
1108
        let mut count = vec![0; num_sort_keys];
561✔
1109

1110
        let mut copied = Box::new_uninit_slice(data.len());
748✔
1111
        for (&sig_val, copy) in data.iter().zip(copied.iter_mut()) {
23,081,635✔
1112
            count[self.shard_edge.sort_key(sig_val.sig)] += 1;
×
UNCOV
1113
            copy.write(sig_val);
×
1114
        }
1115
        let copied = unsafe { copied.assume_init() };
561✔
1116

1117
        count.iter_mut().fold(0, |acc, c| {
8,613✔
1118
            let old = *c;
16,478✔
1119
            *c = acc;
8,239✔
1120
            acc + old
8,239✔
1121
        });
1122

1123
        for &sig_val in copied.iter() {
23,081,074✔
1124
            let key = self.shard_edge.sort_key(sig_val.sig);
×
1125
            data[count[key]] = sig_val;
×
UNCOV
1126
            count[key] += 1;
×
1127
        }
1128
    }
1129

1130
    /// After this number of keys, in the case of filters we remove duplicate
1131
    /// edges.
1132
    const MAX_NO_LOCAL_SIG_CHECK: usize = 1 << 33;
1133

1134
    /// Solves in parallel shards returned by an iterator, storing
1135
    /// the result in `data`.
1136
    ///
1137
    /// # Arguments
1138
    ///
1139
    /// * `shard_iter`: an iterator returning the shards to solve.
1140
    ///
1141
    /// * `data`: the storage for the solution values.
1142
    ///
1143
    /// * `solve_shard`: a method to solve a shard; it takes the shard index,
1144
    ///   the shard, shard-local data, and a progress logger. It may
1145
    ///   fail by returning an error.
1146
    ///
1147
    /// * `main_pl`: the progress logger for the overall computation.
1148
    ///
1149
    /// * `pl`: a progress logger that will be cloned to display the progress of
1150
    ///   a current shard.
1151
    ///
1152
    /// # Errors
1153
    ///
1154
    /// This method returns an error if one of the shards cannot be solved, or
1155
    /// if duplicates are detected.
1156
    fn par_solve<
196✔
1157
        'b,
1158
        V: BinSafe,
1159
        I: IntoIterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
1160
        SS: Fn(&Self, usize, Arc<Vec<SigVal<S, V>>>, ShardData<'b, W, D>, &mut P) -> Result<(), ()>
1161
            + Send
1162
            + Sync
1163
            + Copy,
1164
        C: ConcurrentProgressLog + Send + Sync,
1165
        P: ProgressLog + Clone + Send + Sync,
1166
    >(
1167
        &self,
1168
        shard_iter: I,
1169
        data: &'b mut D,
1170
        solve_shard: SS,
1171
        main_pl: &mut C,
1172
        pl: &mut P,
1173
    ) -> Result<(), SolveError>
1174
    where
1175
        I::IntoIter: Send,
1176
        SigVal<S, V>: RadixKey,
1177
        for<'a> ShardDataIter<'a, W, D>: Send,
1178
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
1179
    {
1180
        main_pl
392✔
1181
            .item_name("shard")
1182
            .expected_updates(Some(self.shard_edge.num_shards()))
392✔
1183
            .display_memory(true)
1184
            .start("Solving shards...");
1185

1186
        self.failed.store(false, Ordering::Relaxed);
588✔
1187
        let num_shards = self.shard_edge.num_shards();
588✔
1188
        let buffer_size = self.num_threads.ilog2() as usize;
392✔
1189

1190
        let (err_send, err_recv) = crossbeam_channel::bounded::<_>(self.num_threads);
588✔
1191
        let (data_send, data_recv) = crossbeam_channel::bounded::<(
1192
            usize,
1193
            (Arc<Vec<SigVal<S, V>>>, ShardData<'_, W, D>),
1194
        )>(buffer_size);
196✔
1195

1196
        let result = std::thread::scope(|scope| {
588✔
1197
            scope.spawn(move || {
588✔
1198
                let _ = thread_priority::set_current_thread_priority(ThreadPriority::Max);
196✔
1199
                for val in shard_iter
397✔
1200
                    .into_iter()
196✔
1201
                    .zip(data.try_chunks_mut(self.shard_edge.num_vertices()).unwrap())
1,176✔
1202
                    .enumerate()
196✔
1203
                {
1204
                    if data_send.send(val).is_err() {
603✔
UNCOV
1205
                        break;
×
1206
                    }
1207
                }
1208

1209
                drop(data_send);
392✔
1210
            });
1211

1212
            for _thread_id in 0..self.num_threads {
397✔
1213
                let mut main_pl = main_pl.clone();
×
1214
                let mut pl = pl.clone();
×
1215
                let err_send = err_send.clone();
×
UNCOV
1216
                let data_recv = data_recv.clone();
×
1217
                scope.spawn(move || {
201✔
UNCOV
1218
                    loop {
×
1219
                        match data_recv.recv() {
258✔
1220
                            Err(_) => return,
57✔
1221
                            Ok((shard_index, (shard, mut data))) => {
201✔
UNCOV
1222
                                if shard.is_empty() {
×
1223
                                    return;
10✔
1224
                                }
1225

1226
                                main_pl.info(format_args!(
×
1227
                                    "Analyzing shard {}/{}...",
×
1228
                                    shard_index + 1,
×
UNCOV
1229
                                    num_shards
×
1230
                                ));
1231

1232
                                pl.start(format!(
×
1233
                                    "Sorting shard {}/{}...",
×
1234
                                    shard_index + 1,
×
UNCOV
1235
                                    num_shards
×
1236
                                ));
1237

1238
                                {
1239
                                    // SAFETY: only one thread may be accessing the
1240
                                    // shard, and we will be consuming it
1241
                                    let shard = unsafe {
×
UNCOV
1242
                                        &mut *(Arc::as_ptr(&shard) as *mut Vec<SigVal<S, V>>)
×
1243
                                    };
1244

UNCOV
1245
                                    if self.check_dups {
×
1246
                                        shard.radix_sort_builder().sort();
12✔
1247
                                        if shard.par_windows(2).any(|w| w[0].sig == w[1].sig) {
16✔
1248
                                            let _ = err_send.send(SolveError::DuplicateSignature);
8✔
1249
                                            return;
4✔
1250
                                        }
1251
                                    }
1252

1253
                                    if TypeId::of::<E::LocalSig>() != TypeId::of::<S>()
187✔
1254
                                        && self.num_keys > Self::MAX_NO_LOCAL_SIG_CHECK
109✔
1255
                                    {
1256
                                        // Check for duplicate local signatures
1257

1258
                                        // E::SortSig provides a transmutable
1259
                                        // view of SigVal with an implementation
1260
                                        // of RadixKey that is compatible with
1261
                                        // the sort induced by the key returned
1262
                                        // by ShardEdge::sort_key, and equality
1263
                                        // that implies equality of local
1264
                                        // signatures.
1265

1266
                                        // SAFETY: we drop this immediately after sorting.
1267
                                        let shard = unsafe {
×
1268
                                            transmute::<
×
1269
                                                &mut Vec<SigVal<S, V>>,
×
1270
                                                &mut Vec<E::SortSigVal<V>>,
×
UNCOV
1271
                                            >(shard)
×
1272
                                        };
1273

1274
                                        let builder = shard.radix_sort_builder();
×
1275
                                        if self.max_num_threads == 1 {
×
1276
                                            builder
×
1277
                                                .with_single_threaded_tuner()
×
UNCOV
1278
                                                .with_parallel(false)
×
1279
                                        } else {
UNCOV
1280
                                            builder
×
1281
                                        }
UNCOV
1282
                                        .sort();
×
1283

1284
                                        let shard_len = shard.len();
×
UNCOV
1285
                                        shard.dedup();
×
1286

UNCOV
1287
                                        if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
×
1288
                                            // Duplicate local signatures on
1289
                                            // large filters can be simply
1290
                                            // removed: they do not change the
1291
                                            // semantics of the filter because
1292
                                            // hashes are computed on
1293
                                            // local signatures.
1294
                                            pl.info(format_args!(
×
1295
                                                "Removed signatures: {}",
×
UNCOV
1296
                                                shard_len - shard.len()
×
1297
                                            ));
1298
                                        } else {
1299
                                            // For function, we have to try again
1300
                                            if shard_len != shard.len() {
×
1301
                                                let _ = err_send
×
1302
                                                    .send(SolveError::DuplicateLocalSignature);
×
UNCOV
1303
                                                return;
×
1304
                                            }
1305
                                        }
1306
                                    } else if self.shard_edge.num_sort_keys() != 1 {
374✔
1307
                                        // Sorting the signatures increases locality
1308
                                        self.count_sort::<V>(shard);
374✔
1309
                                    }
1310
                                }
1311

1312
                                pl.done_with_count(shard.len());
187✔
1313

1314
                                main_pl.info(format_args!(
×
1315
                                    "Solving shard {}/{}...",
×
1316
                                    shard_index + 1,
×
UNCOV
1317
                                    num_shards
×
1318
                                ));
1319

1320
                                if self.failed.load(Ordering::Relaxed) {
×
UNCOV
1321
                                    return;
×
1322
                                }
1323

1324
                                if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
111✔
1325
                                    // For filters, we fill the array with random data, otherwise
1326
                                    // elements with signature 0 would have a significantly higher
1327
                                    // probability of being false positives.
1328
                                    //
1329
                                    // SAFETY: We work around the fact that [usize] does not implement Fill
1330
                                    Mwc192::seed_from_u64(self.seed).fill_bytes(unsafe {
333✔
1331
                                        data.as_mut_slice().align_to_mut::<u8>().1
111✔
1332
                                    });
1333
                                }
1334

UNCOV
1335
                                if solve_shard(self, shard_index, shard, data, &mut pl).is_err() {
×
1336
                                    let _ = err_send.send(SolveError::UnsolvableShard);
260✔
1337
                                    return;
130✔
1338
                                }
1339

1340
                                if self.failed.load(Ordering::Relaxed) {
×
UNCOV
1341
                                    return;
×
1342
                                }
1343

1344
                                main_pl.info(format_args!(
×
1345
                                    "Completed shard {}/{}",
×
1346
                                    shard_index + 1,
×
UNCOV
1347
                                    num_shards
×
1348
                                ));
UNCOV
1349
                                main_pl.update_and_display();
×
1350
                            }
1351
                        }
1352
                    }
1353
                });
1354
            }
1355

1356
            drop(err_send);
392✔
1357
            drop(data_recv);
392✔
1358

1359
            if let Some(error) = err_recv.into_iter().next() {
330✔
1360
                self.failed.store(true, Ordering::Relaxed);
×
UNCOV
1361
                return Err(error);
×
1362
            }
1363

1364
            Ok(())
62✔
1365
        });
1366

1367
        main_pl.done();
392✔
1368
        result
196✔
1369
    }
1370

1371
    /// Peels a shard via edge indices.
1372
    ///
1373
    /// This peeler uses a [`SigVal`] per key (the shard), a
1374
    /// [`ShardEdge::Vertex`] and a byte per vertex (for the [`XorGraph`]), a
1375
    /// [`ShardEdge::Vertex`] per vertex (for the [`DoubleStack`]), and a final
1376
    /// byte per vertex (for the stack of sides).
1377
    ///
1378
    /// This peeler uses more memory than
1379
    /// [`peek_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) but
1380
    /// less memory than
1381
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem). It
1382
    /// is fairly slow as it has to go through a cache-unfriendly memory
1383
    /// indirection every time it has to retrieve a [`SigVal`] from the shard,
1384
    /// but it is the peeler of choice when [lazy Gaussian
1385
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) is required, as
1386
    /// after a failed peel-by-sig-vals it is not possible to retrieve
1387
    /// information about the signature/value pairs in the shard.
1388
    ///
1389
    /// In theory one could avoid the stack of sides by putting vertices,
1390
    /// instead of edge indices, on the upper stack, and retrieving edge indices
1391
    /// and sides from the [`XorGraph`], as
1392
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) does,
1393
    ///  but this would be less cache friendly. This peeler is only used for
1394
    /// very small instances, and since we are going to pass through lazy
1395
    /// Gaussian elimination some additional speed is a good idea.
1396
    fn peel_by_index<'a, V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
165✔
1397
        &self,
1398
        shard_index: usize,
1399
        shard: Arc<Vec<SigVal<S, V>>>,
1400
        data: ShardData<'a, W, D>,
1401
        get_val: &G,
1402
        pl: &mut impl ProgressLog,
1403
    ) -> Result<PeelResult<'a, W, D, S, E, V>, ()> {
1404
        let shard_edge = &self.shard_edge;
330✔
1405
        let num_vertices = shard_edge.num_vertices();
495✔
1406
        let num_shards = shard_edge.num_shards();
495✔
1407

1408
        pl.start(format!(
495✔
1409
            "Generating graph for shard {}/{}...",
165✔
1410
            shard_index + 1,
165✔
UNCOV
1411
            num_shards
×
1412
        ));
1413

1414
        let mut xor_graph = XorGraph::<E::Vertex>::new(num_vertices);
495✔
1415
        for (edge_index, sig_val) in shard.iter().enumerate() {
1,081,030✔
1416
            for (side, &v) in shard_edge
3,242,100✔
1417
                .local_edge(shard_edge.local_sig(sig_val.sig))
×
1418
                .iter()
×
UNCOV
1419
                .enumerate()
×
1420
            {
UNCOV
1421
                xor_graph.add(v, edge_index.cast(), side);
×
1422
            }
1423
        }
1424
        pl.done_with_count(shard.len());
495✔
1425

1426
        assert!(
165✔
1427
            !xor_graph.overflow,
165✔
1428
            "Degree overflow for shard {}/{}",
×
1429
            shard_index + 1,
×
UNCOV
1430
            num_shards
×
1431
        );
1432

1433
        if self.failed.load(Ordering::Relaxed) {
495✔
UNCOV
1434
            return Err(());
×
1435
        }
1436

1437
        // The lower stack contains vertices to be visited. The upper stack
1438
        // contains peeled edges.
1439
        let mut double_stack = DoubleStack::<E::Vertex>::new(num_vertices);
×
UNCOV
1440
        let mut sides_stack = Vec::<u8>::new();
×
1441

1442
        pl.start(format!(
×
1443
            "Peeling graph for shard {}/{} by edge indices...",
×
1444
            shard_index + 1,
×
UNCOV
1445
            num_shards
×
1446
        ));
1447

1448
        // Preload all vertices of degree one in the visit stack
1449
        for (v, degree) in xor_graph.degrees().enumerate() {
1,223,124✔
1450
            if degree == 1 {
229,108✔
1451
                double_stack.push_lower(v.cast());
687,324✔
1452
            }
1453
        }
1454

1455
        while let Some(v) = double_stack.pop_lower() {
967,413✔
1456
            let v: usize = v.upcast();
×
UNCOV
1457
            if xor_graph.degree(v) == 0 {
×
1458
                continue;
32,940✔
1459
            }
1460
            debug_assert!(xor_graph.degree(v) == 1);
1,352,052✔
1461
            let (edge_index, side) = xor_graph.edge_and_side(v);
1,802,736✔
1462
            xor_graph.zero(v);
1,352,052✔
1463
            double_stack.push_upper(edge_index);
1,352,052✔
1464
            sides_stack.push(side as u8);
1,352,052✔
1465

1466
            let e = shard_edge.local_edge(shard_edge.local_sig(shard[edge_index.upcast()].sig));
3,154,788✔
1467
            remove_edge!(xor_graph, e, side, edge_index, double_stack, push_lower);
450,684✔
1468
        }
1469

1470
        pl.done();
330✔
1471

1472
        if shard.len() != double_stack.upper_len() {
330✔
1473
            pl.info(format_args!(
465✔
1474
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
155✔
1475
                shard_index + 1,
155✔
UNCOV
1476
                num_shards,
×
1477
                double_stack.upper_len(),
310✔
1478
                shard.len(),
155✔
1479
            ));
1480
            return Ok(PeelResult::Partial {
155✔
1481
                shard_index,
310✔
1482
                shard,
310✔
1483
                data,
310✔
1484
                double_stack,
155✔
1485
                sides_stack,
155✔
1486
            });
1487
        }
1488

1489
        self.assign(
×
1490
            shard_index,
×
1491
            data,
×
1492
            double_stack
×
UNCOV
1493
                .iter_upper()
×
1494
                .map(|&edge_index| {
100✔
1495
                    // Turn edge indices into local edge signatures
1496
                    // and associated values
1497
                    let sig_val = &shard[edge_index.upcast()];
300✔
1498
                    let local_sig = shard_edge.local_sig(sig_val.sig);
400✔
1499
                    (
1500
                        local_sig,
200✔
1501
                        get_val(
200✔
1502
                            shard_edge,
200✔
1503
                            SigVal {
100✔
1504
                                sig: local_sig,
100✔
1505
                                val: sig_val.val,
100✔
1506
                            },
1507
                        ),
1508
                    )
1509
                })
1510
                .zip(sides_stack.into_iter().rev()),
×
UNCOV
1511
            pl,
×
1512
        );
1513

UNCOV
1514
        Ok(PeelResult::Complete())
×
1515
    }
1516

1517
    /// Peels a shard via signature/value pairs using a stack of peeled
1518
    /// signatures/value pairs.
1519
    ///
1520
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
1521
    /// it drops it immediately after building the graph.
1522
    ///
1523
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]), a
1524
    /// [`ShardEdge::Vertex`] per vertex (for visit stack, albeit usually the stack
1525
    /// never contains more than a third of the vertices), and a [`SigVal`] and
1526
    /// a byte per key (for the stack of peeled edges).
1527
    ///
1528
    /// This is the fastest and more memory-consuming peeler. It has however
1529
    /// just a small advantage during assignment with respect to
1530
    /// [`peek_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem), which
1531
    /// uses almost half the memory. It is the peeler of choice for low levels
1532
    /// of parallelism.
1533
    ///
1534
    /// This peeler cannot be used in conjunction with [lazy Gaussian
1535
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a failed
1536
    /// peeling it is not possible to retrieve information about the
1537
    /// signature/value pairs in the shard.
1538
    fn peel_by_sig_vals_high_mem<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
12✔
1539
        &self,
1540
        shard_index: usize,
1541
        shard: Arc<Vec<SigVal<S, V>>>,
1542
        data: ShardData<'_, W, D>,
1543
        get_val: &G,
1544
        pl: &mut impl ProgressLog,
1545
    ) -> Result<(), ()>
1546
    where
1547
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
1548
    {
1549
        let shard_edge = &self.shard_edge;
24✔
1550
        let num_vertices = shard_edge.num_vertices();
36✔
1551
        let num_shards = shard_edge.num_shards();
36✔
1552
        let shard_len = shard.len();
24✔
1553

1554
        pl.start(format!(
36✔
1555
            "Generating graph for shard {}/{}...",
12✔
1556
            shard_index + 1,
12✔
UNCOV
1557
            num_shards
×
1558
        ));
1559

1560
        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
36✔
1561
        for &sig_val in shard.iter() {
12,000,024✔
UNCOV
1562
            let local_sig = shard_edge.local_sig(sig_val.sig);
×
1563
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
36,000,000✔
1564
                xor_graph.add(
×
1565
                    v,
×
1566
                    SigVal {
×
1567
                        sig: local_sig,
×
UNCOV
1568
                        val: sig_val.val,
×
1569
                    },
UNCOV
1570
                    side,
×
1571
                );
1572
            }
1573
        }
1574
        pl.done_with_count(shard.len());
36✔
1575

1576
        // We are using a consuming iterator over the shard store, so this
1577
        // drop will free the memory used by the signatures
1578
        drop(shard);
24✔
1579

1580
        assert!(
12✔
1581
            !xor_graph.overflow,
12✔
1582
            "Degree overflow for shard {}/{}",
×
1583
            shard_index + 1,
×
UNCOV
1584
            num_shards
×
1585
        );
1586

1587
        if self.failed.load(Ordering::Relaxed) {
36✔
UNCOV
1588
            return Err(());
×
1589
        }
1590

1591
        let mut sig_vals_stack = FastStack::<SigVal<E::LocalSig, V>>::new(shard_len);
×
UNCOV
1592
        let mut sides_stack = FastStack::<u8>::new(shard_len);
×
1593
        // Experimentally this stack never grows beyond a little more than
1594
        // num_vertices / 4
UNCOV
1595
        let mut visit_stack = Vec::<E::Vertex>::with_capacity(num_vertices / 3);
×
1596

1597
        pl.start(format!(
×
1598
            "Peeling graph for shard {}/{} by signatures (high-mem)...",
×
1599
            shard_index + 1,
×
UNCOV
1600
            num_shards
×
1601
        ));
1602

1603
        // Preload all vertices of degree one in the visit stack
1604
        for (v, degree) in xor_graph.degrees().enumerate() {
13,565,952✔
1605
            if degree == 1 {
2,530,634✔
1606
                visit_stack.push(v.cast());
7,591,902✔
1607
            }
1608
        }
1609

1610
        while let Some(v) = visit_stack.pop() {
25,122,948✔
1611
            let v: usize = v.upcast();
×
UNCOV
1612
            if xor_graph.degree(v) == 0 {
×
1613
                continue;
561,468✔
1614
            }
1615
            let (sig_val, side) = xor_graph.edge_and_side(v);
48,000,000✔
1616
            xor_graph.zero(v);
36,000,000✔
1617
            sig_vals_stack.push(sig_val);
36,000,000✔
1618
            sides_stack.push(side as u8);
36,000,000✔
1619

1620
            let e = self.shard_edge.local_edge(sig_val.sig);
48,000,000✔
1621
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push);
12,000,000✔
1622
        }
1623

1624
        pl.done();
24✔
1625

1626
        if shard_len != sig_vals_stack.len() {
24✔
1627
            pl.info(format_args!(
×
1628
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
×
1629
                shard_index + 1,
×
1630
                num_shards,
×
1631
                sig_vals_stack.len(),
×
UNCOV
1632
                shard_len
×
1633
            ));
UNCOV
1634
            return Err(());
×
1635
        }
1636

1637
        self.assign(
×
1638
            shard_index,
×
1639
            data,
×
1640
            sig_vals_stack
×
1641
                .iter()
×
UNCOV
1642
                .rev()
×
1643
                .map(|&sig_val| (sig_val.sig, get_val(shard_edge, sig_val)))
36,000,000✔
1644
                .zip(sides_stack.iter().copied().rev()),
×
UNCOV
1645
            pl,
×
1646
        );
1647

UNCOV
1648
        Ok(())
×
1649
    }
1650

1651
    /// Peels a shard via signature/value pairs using a stack of vertices to
1652
    /// represent peeled edges.
1653
    ///
1654
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
1655
    /// it drops it immediately after building the graph.
1656
    ///
1657
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]) and a
1658
    /// [`ShardEdge::Vertex`] per vertex (for a [`DoubleStack`]).
1659
    ///
1660
    /// This is by far the less memory-hungry peeler, and it is just slightly
1661
    /// slower than
1662
    /// [`peek_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem),
1663
    /// which uses almost twice the memory. It is the peeler of choice for
1664
    /// significant levels of parallelism.
1665
    ///
1666
    /// This peeler cannot be used in conjunction with [lazy Gaussian
1667
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a
1668
    /// failed peeling it is not possible to retrieve information about the
1669
    /// signature/value pairs in the shard.
1670
    fn peel_by_sig_vals_low_mem<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
10✔
1671
        &self,
1672
        shard_index: usize,
1673
        shard: Arc<Vec<SigVal<S, V>>>,
1674
        data: ShardData<'_, W, D>,
1675
        get_val: &G,
1676
        pl: &mut impl ProgressLog,
1677
    ) -> Result<(), ()>
1678
    where
1679
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
1680
    {
1681
        let shard_edge = &self.shard_edge;
20✔
1682
        let num_vertices = shard_edge.num_vertices();
30✔
1683
        let num_shards = shard_edge.num_shards();
30✔
1684
        let shard_len = shard.len();
20✔
1685

1686
        pl.start(format!(
30✔
1687
            "Generating graph for shard {}/{}...",
10✔
1688
            shard_index + 1,
10✔
UNCOV
1689
            num_shards,
×
1690
        ));
1691

1692
        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
30✔
1693
        for &sig_val in shard.iter() {
10,000,020✔
UNCOV
1694
            let local_sig = shard_edge.local_sig(sig_val.sig);
×
1695
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
30,000,000✔
1696
                xor_graph.add(
×
1697
                    v,
×
1698
                    SigVal {
×
1699
                        sig: local_sig,
×
UNCOV
1700
                        val: sig_val.val,
×
1701
                    },
UNCOV
1702
                    side,
×
1703
                );
1704
            }
1705
        }
1706
        pl.done_with_count(shard.len());
30✔
1707

1708
        // We are using a consuming iterator over the shard store, so this
1709
        // drop will free the memory used by the signatures
1710
        drop(shard);
20✔
1711

1712
        assert!(
10✔
1713
            !xor_graph.overflow,
10✔
1714
            "Degree overflow for shard {}/{}",
×
1715
            shard_index + 1,
×
UNCOV
1716
            num_shards
×
1717
        );
1718

1719
        if self.failed.load(Ordering::Relaxed) {
30✔
UNCOV
1720
            return Err(());
×
1721
        }
1722

UNCOV
1723
        let mut visit_stack = DoubleStack::<E::Vertex>::new(num_vertices);
×
1724

1725
        pl.start(format!(
×
1726
            "Peeling graph for shard {}/{} by signatures (low-mem)...",
×
1727
            shard_index + 1,
×
UNCOV
1728
            num_shards
×
1729
        ));
1730

1731
        // Preload all vertices of degree one in the visit stack
1732
        for (v, degree) in xor_graph.degrees().enumerate() {
11,304,960✔
1733
            if degree == 1 {
2,108,606✔
1734
                visit_stack.push_lower(v.cast());
6,325,818✔
1735
            }
1736
        }
1737

1738
        while let Some(v) = visit_stack.pop_lower() {
20,935,042✔
1739
            let v: usize = v.upcast();
×
UNCOV
1740
            if xor_graph.degree(v) == 0 {
×
1741
                continue;
467,516✔
1742
            }
1743
            let (sig_val, side) = xor_graph.edge_and_side(v);
40,000,000✔
1744
            xor_graph.zero(v);
30,000,000✔
1745
            visit_stack.push_upper(v.cast());
40,000,000✔
1746

1747
            let e = self.shard_edge.local_edge(sig_val.sig);
40,000,000✔
1748
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push_lower);
10,000,000✔
1749
        }
1750

1751
        pl.done();
20✔
1752

1753
        if shard_len != visit_stack.upper_len() {
20✔
1754
            pl.info(format_args!(
×
1755
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
×
1756
                shard_index + 1,
×
1757
                num_shards,
×
1758
                visit_stack.upper_len(),
×
UNCOV
1759
                shard_len
×
1760
            ));
UNCOV
1761
            return Err(());
×
1762
        }
1763

1764
        self.assign(
×
1765
            shard_index,
×
UNCOV
1766
            data,
×
1767
            visit_stack.iter_upper().map(|&v| {
10,000,000✔
1768
                let (sig_val, side) = xor_graph.edge_and_side(v.upcast());
50,000,000✔
1769
                ((sig_val.sig, get_val(shard_edge, sig_val)), side as u8)
30,000,000✔
1770
            }),
UNCOV
1771
            pl,
×
1772
        );
1773

UNCOV
1774
        Ok(())
×
1775
    }
1776

1777
    /// Solves a shard of given index possibly using [lazy Gaussian
1778
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517), and stores the
1779
    /// solution in the given data.
1780
    ///
1781
    /// As a first try, the shard is [peeled by index](VBuilder::peel_by_index).
1782
    /// If the peeling is [partial](PeelResult::Partial), lazy Gaussian
1783
    /// elimination is used to solve the remaining edges.
1784
    ///
1785
    /// This method will scan the double stack, without emptying it, to check
1786
    /// which edges have been peeled. The information will be then passed to
1787
    /// [`VBuilder::assign`] to complete the assignment of values.
1788
    fn lge_shard<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
165✔
1789
        &self,
1790
        shard_index: usize,
1791
        shard: Arc<Vec<SigVal<S, V>>>,
1792
        data: ShardData<'_, W, D>,
1793
        get_val: &G,
1794
        pl: &mut impl ProgressLog,
1795
    ) -> Result<(), ()> {
1796
        let shard_edge = &self.shard_edge;
330✔
1797
        // Let's try to peel first
1798
        match self.peel_by_index(shard_index, shard, data, get_val, pl) {
1,155✔
UNCOV
1799
            Err(()) => Err(()),
×
1800
            Ok(PeelResult::Complete()) => Ok(()),
10✔
UNCOV
1801
            Ok(PeelResult::Partial {
×
1802
                shard_index,
155✔
1803
                shard,
155✔
1804
                mut data,
155✔
1805
                double_stack,
155✔
1806
                sides_stack,
155✔
UNCOV
1807
            }) => {
×
1808
                pl.info(format_args!("Switching to lazy Gaussian elimination..."));
620✔
1809
                // Likely result--we have solve the rest
1810
                pl.start(format!(
465✔
1811
                    "Generating system for shard {}/{}...",
155✔
1812
                    shard_index + 1,
155✔
1813
                    shard_edge.num_shards()
310✔
1814
                ));
1815

1816
                let num_vertices = shard_edge.num_vertices();
465✔
1817
                let mut peeled_edges = BitVec::new(shard.len());
465✔
1818
                let mut used_vars = BitVec::new(num_vertices);
465✔
1819
                for &edge in double_stack.iter_upper() {
450,894✔
UNCOV
1820
                    peeled_edges.set(edge.upcast(), true);
×
1821
                }
1822

1823
                // Create data for a system using non-peeled edges
1824
                //
1825
                // SAFETY: there is no undefined behavior here, but the
1826
                // raw construction methods we use assume that the
1827
                // equations are sorted, that the variables are not repeated,
1828
                // and the variables are in the range [0..num_vertices).
1829
                let mut system = unsafe {
1830
                    crate::utils::mod2_sys::Modulo2System::from_parts(
1831
                        num_vertices,
155✔
1832
                        shard
155✔
1833
                            .iter()
155✔
1834
                            .enumerate()
155✔
1835
                            .filter(|(edge_index, _)| !peeled_edges[*edge_index])
1,080,755✔
1836
                            .map(|(_edge_index, sig_val)| {
630,171✔
1837
                                let local_sig = shard_edge.local_sig(sig_val.sig);
2,520,064✔
1838
                                let mut eq: Vec<_> = shard_edge
2,520,064✔
1839
                                    .local_edge(local_sig)
630,016✔
1840
                                    .iter()
630,016✔
1841
                                    .map(|&x| {
2,520,064✔
1842
                                        used_vars.set(x, true);
5,670,144✔
1843
                                        x as u32
1,890,048✔
1844
                                    })
1845
                                    .collect();
630,016✔
1846
                                eq.sort_unstable();
630,016✔
1847
                                crate::utils::mod2_sys::Modulo2Equation::from_parts(
630,016✔
1848
                                    eq,
630,016✔
1849
                                    get_val(
1,260,032✔
1850
                                        shard_edge,
1,260,032✔
1851
                                        SigVal {
630,016✔
1852
                                            sig: local_sig,
630,016✔
1853
                                            val: sig_val.val,
630,016✔
1854
                                        },
1855
                                    ),
1856
                                )
1857
                            })
1858
                            .collect(),
155✔
1859
                    )
1860
                };
1861

1862
                if self.failed.load(Ordering::Relaxed) {
465✔
UNCOV
1863
                    return Err(());
×
1864
                }
1865

UNCOV
1866
                pl.start("Solving system...");
×
1867
                let result = system.lazy_gaussian_elimination().map_err(|_| ())?;
155✔
UNCOV
1868
                pl.done_with_count(system.num_equations());
×
1869

1870
                for (v, &value) in result.iter().enumerate().filter(|(v, _)| used_vars[*v]) {
1,749,491✔
UNCOV
1871
                    data.set(v, value);
×
1872
                }
1873

1874
                self.assign(
×
1875
                    shard_index,
×
1876
                    data,
×
1877
                    double_stack
×
UNCOV
1878
                        .iter_upper()
×
1879
                        .map(|&edge_index| {
420,614✔
1880
                            let sig_val = &shard[edge_index.upcast()];
1,261,842✔
1881
                            let local_sig = shard_edge.local_sig(sig_val.sig);
1,682,456✔
1882
                            (
1883
                                local_sig,
841,228✔
1884
                                get_val(
841,228✔
1885
                                    shard_edge,
841,228✔
1886
                                    SigVal {
420,614✔
1887
                                        sig: local_sig,
420,614✔
1888
                                        val: sig_val.val,
420,614✔
1889
                                    },
1890
                                ),
1891
                            )
1892
                        })
1893
                        .zip(sides_stack.into_iter().rev()),
×
UNCOV
1894
                    pl,
×
1895
                );
UNCOV
1896
                Ok(())
×
1897
            }
1898
        }
1899
    }
1900

1901
    /// Perform assignment of values based on peeling data.
1902
    ///
1903
    /// This method might be called after a successful peeling procedure, or
1904
    /// after a linear solver has been used to solve the remaining edges.
1905
    ///
1906
    /// `sig_vals_sides` is an iterator returning pairs of signature/value pairs
1907
    /// and sides in reverse peeling order.
1908
    fn assign(
57✔
1909
        &self,
1910
        shard_index: usize,
1911
        mut data: ShardData<'_, W, D>,
1912
        sigs_vals_sides: impl Iterator<Item = ((E::LocalSig, W), u8)>,
1913
        pl: &mut impl ProgressLog,
1914
    ) {
1915
        if self.failed.load(Ordering::Relaxed) {
171✔
UNCOV
1916
            return;
×
1917
        }
1918

1919
        pl.start(format!(
×
1920
            "Assigning values for shard {}/{}...",
×
1921
            shard_index + 1,
×
UNCOV
1922
            self.shard_edge.num_shards()
×
1923
        ));
1924

1925
        for ((sig, val), side) in sigs_vals_sides {
89,682,913✔
1926
            let edge = self.shard_edge.local_edge(sig);
89,682,856✔
1927
            let side = side as usize;
44,841,428✔
1928
            unsafe {
1929
                let xor = match side {
44,841,428✔
1930
                    0 => data.get_unchecked(edge[1]) ^ data.get_unchecked(edge[2]),
3,935,744✔
1931
                    1 => data.get_unchecked(edge[0]) ^ data.get_unchecked(edge[2]),
34,066,525✔
1932
                    2 => data.get_unchecked(edge[0]) ^ data.get_unchecked(edge[1]),
58,358,325✔
1933
                    _ => core::hint::unreachable_unchecked(),
1934
                };
1935

UNCOV
1936
                data.set_unchecked(edge[side], val ^ xor);
×
1937
            }
1938
        }
1939
        pl.done();
57✔
1940
    }
1941
}
STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2026 Coveralls, Inc