• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

vigna / sux-rs / 19758362676

28 Nov 2025 08:33AM UTC coverage: 75.266% (-0.09%) from 75.352%
19758362676

push

github

vigna
Vbuilder was not working properly without a provided expected number of keys

1 of 1 new or added line in 1 file covered. (100.0%)

5 existing lines in 2 files now uncovered.

4388 of 5830 relevant lines covered (75.27%)

124961103.93 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

79.74
/src/func/vbuilder.rs
1
/*
2
 * SPDX-FileCopyrightText: 2025 Sebastiano Vigna
3
 *
4
 * SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
5
 */
6

7
#![allow(clippy::type_complexity)]
8
#![allow(clippy::too_many_arguments)]
9

10
use crate::bits::*;
11
use crate::dict::VFilter;
12
use crate::func::{shard_edge::ShardEdge, *};
13
use crate::traits::BitVecOpsMut;
14
use crate::traits::bit_field_slice::{BitFieldSlice, BitFieldSliceMut, Word};
15
use crate::utils::*;
16
use common_traits::{
17
    CastableInto, DowncastableFrom, DowncastableInto, UnsignedInt, UpcastableFrom, UpcastableInto,
18
};
19
use derivative::Derivative;
20
use derive_setters::*;
21
use dsi_progress_logger::*;
22
use lender::FallibleLending;
23
use log::info;
24
use rand::SeedableRng;
25
use rand::rngs::SmallRng;
26
use rand::{Rng, RngCore};
27
use rayon::iter::ParallelIterator;
28
use rayon::slice::ParallelSlice;
29
use rdst::*;
30
use std::any::TypeId;
31
use std::borrow::{Borrow, Cow};
32
use std::hint::unreachable_unchecked;
33
use std::marker::PhantomData;
34
use std::mem::transmute;
35
use std::ops::{BitXor, BitXorAssign};
36
use std::slice::Iter;
37
use std::sync::Arc;
38
use std::sync::atomic::{AtomicBool, Ordering};
39
use std::time::Instant;
40
use thread_priority::ThreadPriority;
41
use value_traits::slices::{SliceByValue, SliceByValueMut};
42

43
use super::shard_edge::FuseLge3Shards;
44

45
const LOG2_MAX_SHARDS: u32 = 16;
46

47
/// A builder for [`VFunc`] and [`VFilter`].
48
///
49
/// Keys must implement the [`ToSig`] trait, which provides a method to compute
50
/// a signature of the key.
51
///
52
/// There are two construction modes: in core memory (default) and
53
/// [offline](VBuilder::offline); both use a [`SigStore`]. In the first case,
54
/// space will be allocated in core memory for signatures and associated values
55
/// for all keys; in the second case, such information will be stored in a
56
/// number of on-disk buckets.
57
///
58
/// There are several setters: for example, you can set [set the maximum number
59
/// of threads](VBuilder::max_num_threads).
60
///
61
/// Once signatures have been computed, each parallel thread will process a
62
/// shard of the signature/value pairs. For very large key sets shards will be
63
/// significantly smaller than the number of keys, so the memory usage, in
64
/// particular in offline mode, can be significantly reduced. Note that using
65
/// too many threads might actually be harmful due to memory contention.
66
///
67
/// The generic parameters are explained in the [`VFunc`] documentation. You
68
/// have to choose the type of the output values and the backend. The remaining
69
/// parameters have default values that are the same as those of
70
/// [`VFunc`]/[`VFilter`], and some elaboration about them can be found in their
71
/// documentation.
72
///
73
/// All construction methods require to pass one or two [`FallibleRewindableLender`]s
74
/// (keys and possibly values), and the construction might fail and keys might
75
/// be scanned again. The structures in the [`lenders`] module provide easy ways
76
/// to build such lenders, even starting from compressed files of UTF-8 strings.
77
/// The type of the keys of the resulting filter or function will be the type of
78
/// the elements of the [`FallibleRewindableLender`].
79
///
80
/// # Examples
81
///
82
/// In this example, we build a function that maps each key to itself using a
83
/// boxed slice of `usize` as a backend (note that this is really wasteful). The
84
/// setter for the expected number of keys is used to optimize the construction.
85
/// We use the [`FromCloneableIntoIterator`] adapter to turn a clonable [`IntoIterator`]
86
/// into a [`FallibleRewindableLender`]. Note that you need the
87
/// [`dsi-progress-logger`](https://crates.io/crates/dsi-progress-logger) crate.
88
///
89
/// Type inference derives the input type (`usize`) from the type of the items
90
/// returned by the first [`FallibleRewindableLender`], and the output type (again,
91
/// `usize`, the first type parameter), from the backend type (`Box<[usize]>`,
92
/// the second type parameter):
93
///
94
/// ```rust
95
/// # use sux::func::VBuilder;
96
/// # use dsi_progress_logger::no_logging;
97
/// # use sux::utils::FromCloneableIntoIterator;
98
/// let builder = VBuilder::<_, Box<[usize]>>::default()
99
///     .expected_num_keys(100);
100
/// let func = builder.try_build_func(
101
///    FromCloneableIntoIterator::new(0..100),
102
///    FromCloneableIntoIterator::new(0..100),
103
///    no_logging![]
104
/// )?;
105
///
106
/// for i in 0..100 {
107
///    assert_eq!(i, func.get(&i));
108
/// }
109
/// # Ok::<(), Box<dyn std::error::Error>>(())
110
/// ```
111
///
112
/// Alternatively we can use the bit-field vector backend, that will use
113
/// ⌈log₂(99)⌉ bits per backend element:
114
///
115
/// ```rust
116
/// # use sux::func::VBuilder;
117
/// # use dsi_progress_logger::no_logging;
118
/// # use sux::utils::FromCloneableIntoIterator;
119
/// # use sux::bits::BitFieldVec;
120
/// let builder = VBuilder::<_, BitFieldVec<usize>>::default()
121
///     .expected_num_keys(100);
122
/// let func = builder.try_build_func(
123
///    FromCloneableIntoIterator::new(0..100),
124
///    FromCloneableIntoIterator::new(0..100),
125
///    no_logging![]
126
/// )?;
127
///
128
/// for i in 0..100 {
129
///    assert_eq!(i, func.get(&i));
130
/// }
131
/// # Ok::<(), Box<dyn std::error::Error>>(())
132
/// ```
133
///
134
/// Since the numbers are small, we can also try to use a fixed-size output:
135
/// type inference takes care of making the second range `0..100` a range of
136
/// `u8`. Note that the type of keys is always `usize`, as it is still inferred
137
/// from the type of the items returned by the first [`FallibleRewindableLender`]:
138
///
139
/// ```rust
140
/// # use sux::func::VBuilder;
141
/// # use dsi_progress_logger::no_logging;
142
/// # use sux::utils::FromCloneableIntoIterator;
143
/// # use sux::bits::BitFieldVec;
144
/// let builder = VBuilder::<_, Box<[u8]>>::default()
145
///     .expected_num_keys(100);
146
/// let func = builder.try_build_func(
147
///    FromCloneableIntoIterator::new(0..100),
148
///    FromCloneableIntoIterator::new(0..100),
149
///    no_logging![]
150
/// )?;
151
///
152
/// for i in 0..100 {
153
///    assert_eq!(i, func.get(&i));
154
/// }
155
/// # Ok::<(), Box<dyn std::error::Error>>(())
156
/// ```
157
///
158
///
159
/// We now try to build a fast 8-bit filter for the same key set, using a boxed
160
/// slice of `u8` as a backend (this is not wasteful, as the filter uses 8-bit
161
/// hashes):
162
///
163
/// ```rust
164
/// # use sux::func::VBuilder;
165
/// # use dsi_progress_logger::no_logging;
166
/// # use sux::utils::FromCloneableIntoIterator;
167
/// let builder = VBuilder::<_, Box<[u8]>>::default()
168
///     .expected_num_keys(100);
169
/// let func = builder.try_build_filter(
170
///    FromCloneableIntoIterator::new(0..100),
171
///    no_logging![]
172
/// )?;
173
///
174
/// for i in 0..100 {
175
///    assert!(func[i]);
176
/// }
177
/// # Ok::<(), Box<dyn std::error::Error>>(())
178
/// ```
179
///
180
/// Since the keys are very few, we can switch to 64-bit signatures, and no
181
/// shards, which will yield faster queries:
182
///
183
/// ```rust
184
/// # use sux::func::VBuilder;
185
/// # use sux::func::shard_edge::FuseLge3NoShards;
186
/// # use dsi_progress_logger::no_logging;
187
/// # use sux::utils::FromCloneableIntoIterator;
188
/// let builder = VBuilder::<_, Box<[u8]>, [u64; 1], FuseLge3NoShards>::default()
189
///     .expected_num_keys(100);
190
/// let func = builder.try_build_filter(
191
///    FromCloneableIntoIterator::new(0..100),
192
///    no_logging![]
193
/// )?;
194
///
195
/// for i in 0..100 {
196
///    assert!(func[i]);
197
/// }
198
/// # Ok::<(), Box<dyn std::error::Error>>(())
199
/// ```
200

201
#[derive(Setters, Debug, Derivative)]
202
#[derivative(Default)]
203
#[setters(generate = false)]
204
pub struct VBuilder<
205
    W: Word + BinSafe,
206
    D: BitFieldSlice<W> + Send + Sync = Box<[W]>,
207
    S = [u64; 2],
208
    E: ShardEdge<S, 3> = FuseLge3Shards,
209
> {
210
    /// The expected number of keys.
211
    ///
212
    /// While this setter is optional, setting this value to a reasonable bound
213
    /// on the actual number of keys will significantly speed up the
214
    /// construction.
215
    #[setters(generate = true, strip_option)]
216
    #[derivative(Default(value = "None"))]
217
    expected_num_keys: Option<usize>,
218

219
    /// The maximum number of parallel threads to use. The default is 8.
220
    #[setters(generate = true)]
221
    #[derivative(Default(value = "8"))]
222
    max_num_threads: usize,
223

224
    /// Use disk-based buckets to reduce core memory usage at construction time.
225
    #[setters(generate = true)]
226
    offline: bool,
227

228
    /// Check for duplicated signatures. This is not necessary in general,
229
    /// but if you suspect you might be feeding duplicate keys, you can
230
    /// enable this check.
231
    #[setters(generate = true)]
232
    check_dups: bool,
233

234
    /// Use always the low-memory peel-by-signature algorithm (true) or the
235
    /// high-memory peel-by-index algorithm (false). The former is slightly
236
    /// slower, but it uses much less memory. Normally [`VBuilder`] uses
237
    /// high-mem and switches to low-mem if there are more
238
    /// than two threads and more than four shards.
239
    #[setters(generate = true, strip_option)]
240
    #[derivative(Default(value = "None"))]
241
    low_mem: Option<bool>,
242

243
    /// The seed for the random number generator.
244
    #[setters(generate = true)]
245
    seed: u64,
246

247
    /// The base-2 logarithm of buckets of the [`SigStore`]. The default is 8.
248
    /// This value is automatically overridden, even if set, if you provide an
249
    /// [expected number of keys](VBuilder::expected_num_keys).
250
    #[setters(generate = true, strip_option)]
251
    #[derivative(Default(value = "8"))]
252
    log2_buckets: u32,
253

254
    /// The target relative space loss due to [ε-cost
255
    /// sharding](https://doi.org/10.4230/LIPIcs.ESA.2019.38).s
256
    ///
257
    /// The default is 0.001. Setting a larger target, for example, 0.01, will
258
    /// increase the space overhead due to sharding, but will provide in general
259
    /// finer shards. This might not always happen, however, because the ε-cost
260
    /// bound is just one of the bounds concurring in limiting the amount of
261
    /// sharding for a specific [`ShardEdge`]. For example, increasing the
262
    /// target to 0.01 will provide very fine sharding using `Mwhc3Shards`
263
    /// shard/edge logic, activated by the `mwhc` feature, but will affect only
264
    /// slightly [`FuseLge3Shards`] or
265
    /// [`FuseLge3FullSigs`](crate::func::shard_edge::FuseLge3FullSigs).
266
    #[setters(generate = true, strip_option)]
267
    #[derivative(Default(value = "0.001"))]
268
    eps: f64,
269

270
    /// The bit width of the maximum value.
271
    bit_width: usize,
272
    /// The edge generator.
273
    shard_edge: E,
274
    /// The number of keys.
275
    num_keys: usize,
276
    /// The ratio between the number of vertices and the number of edges
277
    /// (i.e., between the number of variables and the number of equations).
278
    c: f64,
279
    /// Whether we should use [lazy Gaussian
280
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517).
281
    lge: bool,
282
    /// The number of threads to use.
283
    num_threads: usize,
284
    /// Fast-stop for failed attempts.
285
    failed: AtomicBool,
286
    #[doc(hidden)]
287
    _marker_v: PhantomData<(W, D, S)>,
288
}
289

290
/// Fatal build errors.
291
#[derive(thiserror::Error, Debug)]
292
pub enum BuildError {
293
    #[error("Duplicate key")]
294
    /// A duplicate key was detected.
295
    DuplicateKey,
296
    #[error("Duplicate local signatures: use full signatures")]
297
    /// A duplicate key was detected.
298
    DuplicateLocalSignatures,
299
    #[error("Value too large for specified bit size")]
300
    /// A value is too large for the specified bit size.
301
    ValueTooLarge,
302
}
303

304
#[derive(thiserror::Error, Debug)]
305
/// Transient error during the build, leading to
306
/// trying with a different seed.
307
pub enum SolveError {
308
    #[error("Duplicate signature")]
309
    /// A duplicate signature was detected.
310
    DuplicateSignature,
311
    #[error("Duplicate local signature")]
312
    /// A duplicate local signature was detected.
313
    DuplicateLocalSignature,
314
    #[error("Max shard too big")]
315
    /// The maximum shard is too big relatively to the average shard.
316
    MaxShardTooBig,
317
    #[error("Unsolvable shard")]
318
    /// A shard cannot be solved.
319
    UnsolvableShard,
320
}
321

322
/// The result of a peeling procedure.
323
enum PeelResult<
324
    'a,
325
    W: Word + BinSafe + Send + Sync,
326
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync + 'a,
327
    S: Sig + BinSafe,
328
    E: ShardEdge<S, 3>,
329
    V: BinSafe,
330
> {
331
    Complete(),
332
    Partial {
333
        /// The shard index.
334
        shard_index: usize,
335
        /// The shard.
336
        shard: Arc<Vec<SigVal<S, V>>>,
337
        /// The data.
338
        data: ShardData<'a, D>,
339
        /// The double stack whose upper stack contains the peeled edges
340
        /// (possibly represented by the vertex from which they have been
341
        /// peeled).
342
        double_stack: DoubleStack<E::Vertex>,
343
        /// The sides stack.
344
        sides_stack: Vec<u8>,
345
        _marker: PhantomData<W>,
346
    },
347
}
348

349
/// A graph represented compactly.
350
///
351
/// Each (*k*-hyper)edge is a set of *k* vertices (by construction fuse graphs
352
/// do not have degenerate edges), but we represent it internally as a vector.
353
/// We call *side* the position of a vertex in the edge.
354
///
355
/// For each vertex we store information about the edges incident to the vertex
356
/// and the sides of the vertex in such edges. While technically not necessary
357
/// to perform peeling, the knowledge of the sides speeds up the peeling visit
358
/// by reducing the number of tests that are necessary to update the degrees
359
/// once the edge is peeled (see the `peel_by_*` methods). For the same reason
360
/// it also speeds up assignment.
361
///
362
/// Depending on the peeling method (by signature or by index), the graph will
363
/// store edge indices or signature/value pairs (the generic parameter `X`).
364
///
365
/// Edge information is packed together using Djamal's XOR trick (see
366
/// [“Cache-Oblivious Peeling of Random
367
/// Hypergraphs”](https://doi.org/10.1109/DCC.2014.48)): since during the
368
/// peeling b visit we need to know the content of the list only when a single
369
/// edge index is present, we can XOR together all the edge information.
370
///
371
/// We use a single byte to store the degree (six upper bits) and the XOR of the
372
/// sides (lower two bits). The degree can be stored with a small number of bits
373
/// because the graph is random, so the maximum degree is *O*(log log *n*).
374
/// In any case, the Boolean field `overflow` will become `true` in case of
375
/// overflow.
376
///
377
/// When we peel an edge, we just [zero the degree](Self::zero), leaving the
378
/// edge information and the side in place for further processing later.
379
struct XorGraph<X: Copy + Default + BitXor + BitXorAssign> {
380
    edges: Box<[X]>,
381
    degrees_sides: Box<[u8]>,
382
    overflow: bool,
383
}
384

385
impl<X: BitXor + BitXorAssign + Default + Copy> XorGraph<X> {
386
    pub fn new(n: usize) -> XorGraph<X> {
187✔
387
        XorGraph {
388
            edges: vec![X::default(); n].into(),
935✔
389
            degrees_sides: vec![0; n].into(),
374✔
390
            overflow: false,
391
        }
392
    }
393

394
    #[inline(always)]
395
    pub fn add(&mut self, v: usize, x: X, side: usize) {
69,242,100✔
396
        debug_assert!(side < 3);
138,484,200✔
397
        let (degree_size, overflow) = self.degrees_sides[v].overflowing_add(4);
207,726,300✔
398
        self.degrees_sides[v] = degree_size;
69,242,100✔
399
        self.overflow |= overflow;
69,242,100✔
400
        self.degrees_sides[v] ^= side as u8;
69,242,100✔
401
        self.edges[v] ^= x;
69,242,100✔
402
    }
403

404
    #[inline(always)]
405
    pub fn remove(&mut self, v: usize, x: X, side: usize) {
44,883,320✔
406
        debug_assert!(side < 3);
89,766,640✔
407
        self.degrees_sides[v] -= 4;
44,883,320✔
408
        self.degrees_sides[v] ^= side as u8;
44,883,320✔
409
        self.edges[v] ^= x;
44,883,320✔
410
    }
411

412
    #[inline(always)]
413
    pub fn zero(&mut self, v: usize) {
22,441,660✔
414
        self.degrees_sides[v] &= 0b11;
22,441,660✔
415
    }
416

417
    #[inline(always)]
418
    pub fn edge_and_side(&self, v: usize) -> (X, usize) {
32,441,660✔
419
        debug_assert!(self.degree(v) < 2);
97,324,980✔
420
        (self.edges[v] as _, (self.degrees_sides[v] & 0b11) as _)
64,883,320✔
421
    }
422

423
    #[inline(always)]
424
    pub fn degree(&self, v: usize) -> u8 {
101,269,584✔
425
        self.degrees_sides[v] >> 2
101,269,584✔
426
    }
427

428
    pub fn degrees(
187✔
429
        &self,
430
    ) -> std::iter::Map<std::iter::Copied<std::slice::Iter<'_, u8>>, fn(u8) -> u8> {
431
        self.degrees_sides.iter().copied().map(|d| d >> 2)
26,093,760✔
432
    }
433
}
434

435
/// A preallocated stack implementation that avoids the expensive (even if
436
/// rarely taken) branch of the `Vec` implementation in which memory is
437
/// reallocated. Note that using [`Vec::with_capacity`] is not enough, because
438
/// for the CPU the branch is still there.
439
struct FastStack<X: Copy + Default> {
440
    stack: Vec<X>,
441
    top: usize,
442
}
443

444
impl<X: Copy + Default> FastStack<X> {
445
    pub fn new(n: usize) -> FastStack<X> {
24✔
446
        FastStack {
447
            stack: vec![X::default(); n],
48✔
448
            top: 0,
449
        }
450
    }
451

452
    pub fn push(&mut self, x: X) {
24,000,000✔
453
        debug_assert!(self.top < self.stack.len());
72,000,000✔
454
        self.stack[self.top] = x;
24,000,000✔
455
        self.top += 1;
24,000,000✔
456
    }
457

458
    pub fn len(&self) -> usize {
12✔
459
        self.top
12✔
460
    }
461

462
    pub fn iter(&self) -> std::slice::Iter<'_, X> {
24✔
463
        self.stack[..self.top].iter()
48✔
464
    }
465
}
466

467
/// Two stacks in the same vector.
468
///
469
/// This struct implements a pair of stacks sharing the same memory. The lower
470
/// stack grows from the beginning of the vector, the upper stack grows from the
471
/// end of the vector. Since we use the lower stack for the visit and the upper
472
/// stack for peeled edges (possibly represented by the vertex from which they
473
/// have been peeled), the sum of the lengths of the two stacks cannot exceed
474
/// the length of the vector.
475
#[derive(Debug)]
476
struct DoubleStack<V> {
477
    stack: Vec<V>,
478
    lower: usize,
479
    upper: usize,
480
}
481

482
impl<V: Default + Copy> DoubleStack<V> {
483
    fn new(n: usize) -> DoubleStack<V> {
175✔
484
        DoubleStack {
485
            stack: vec![V::default(); n],
525✔
486
            lower: 0,
487
            upper: n,
488
        }
489
    }
490
}
491

492
impl<V: Copy> DoubleStack<V> {
493
    #[inline(always)]
494
    fn push_lower(&mut self, v: V) {
10,941,476✔
495
        debug_assert!(self.lower < self.upper);
21,882,952✔
496
        self.stack[self.lower] = v;
10,941,476✔
497
        self.lower += 1;
10,941,476✔
498
    }
499

500
    #[inline(always)]
501
    fn push_upper(&mut self, v: V) {
10,441,660✔
502
        debug_assert!(self.lower < self.upper);
20,883,320✔
503
        self.upper -= 1;
10,441,660✔
504
        self.stack[self.upper] = v;
10,441,660✔
505
    }
506

507
    #[inline(always)]
508
    fn pop_lower(&mut self) -> Option<V> {
10,941,651✔
509
        if self.lower == 0 {
10,941,651✔
510
            None
175✔
511
        } else {
512
            self.lower -= 1;
10,941,476✔
513
            Some(self.stack[self.lower])
10,941,476✔
514
        }
515
    }
516

517
    fn upper_len(&self) -> usize {
330✔
518
        self.stack.len() - self.upper
660✔
519
    }
520

521
    fn iter_upper(&self) -> Iter<'_, V> {
200✔
522
        self.stack[self.upper..].iter()
400✔
523
    }
524
}
525

526
/// An iterator over segments of data associated with each shard.
527
type ShardDataIter<'a, D> = <D as SliceByValueMut>::ChunksMut<'a>;
528
/// A segment of data associated with a specific shard.
529
type ShardData<'a, D> = <ShardDataIter<'a, D> as Iterator>::Item;
530

531
/// Builds a new function using a `Box<[W]>` to store values.
532
///
533
/// Since values are stored in a boxed slice access is particularly fast, but
534
/// the bit width of the output of the function will be exactly the bit width of
535
/// the unsigned type `W`.
536
impl<W: Word + BinSafe, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, Box<[W]>, S, E>
537
where
538
    u128: UpcastableFrom<W>,
539
    SigVal<S, W>: RadixKey,
540
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
541
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
542
{
543
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
×
544
        mut self,
545
        keys: impl FallibleRewindableLender<
546
            RewindError: std::error::Error + Send + Sync + 'static,
547
            Error: std::error::Error + Send + Sync + 'static,
548
        > + for<'lend> FallibleLending<'lend, Lend = &'lend B>,
549
        values: impl FallibleRewindableLender<
550
            RewindError: std::error::Error + Send + Sync + 'static,
551
            Error: std::error::Error + Send + Sync + 'static,
552
        > + for<'lend> FallibleLending<'lend, Lend = &'lend W>,
553
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
554
    ) -> anyhow::Result<VFunc<T, W, Box<[W]>, S, E>>
555
    where
556
        for<'a> <<Box<[W]> as SliceByValueMut>::ChunksMut<'a> as Iterator>::Item:
557
            BitFieldSliceMut<W>,
558
        for<'a> ShardDataIter<'a, Box<[W]>>: Send,
559
        for<'a> ShardData<'a, Box<[W]>>: Send,
560
    {
561
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
×
562
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();
×
563
        self.build_loop(keys, values, None, &get_val, new_data, pl)
×
564
    }
565
}
566

567
/// Builds a new filter using a `Box<[W]>` to store values.
568
///
569
/// Since values are stored in a boxed slice access is particularly fast, but
570
/// the number of bits of the hashes will be  exactly the bit width of the
571
/// unsigned type `W`.
572
impl<W: Word + BinSafe + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
573
    VBuilder<W, Box<[W]>, S, E>
574
where
575
    u128: UpcastableFrom<W>,
576
    SigVal<S, EmptyVal>: RadixKey,
577
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
578
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
579
{
580
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
36✔
581
        mut self,
582
        keys: impl FallibleRewindableLender<
583
            RewindError: std::error::Error + Send + Sync + 'static,
584
            Error: std::error::Error + Send + Sync + 'static,
585
        > + for<'lend> FallibleLending<'lend, Lend = &'lend B>,
586
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
587
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, Box<[W]>, S, E>>>
588
    where
589
        for<'a> <<Box<[W]> as SliceByValueMut>::ChunksMut<'a> as Iterator>::Item:
590
            BitFieldSliceMut<W>,
591
        for<'a> ShardDataIter<'a, Box<[W]>>: Send,
592
        for<'a> ShardData<'a, Box<[W]>>: Send,
593
    {
594
        let filter_mask = W::MAX;
72✔
595
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
12,630,474✔
596
            mix64(shard_edge.edge_hash(sig_val.sig)).downcast()
63,152,190✔
597
        };
598
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();
378✔
599

600
        Ok(VFilter {
×
601
            func: self.build_loop(
72✔
602
                keys,
36✔
603
                FromCloneableIntoIterator::from(itertools::repeat_n(
72✔
604
                    EmptyVal::default(),
36✔
605
                    usize::MAX,
×
606
                )),
607
                Some(W::BITS),
36✔
608
                &get_val,
36✔
609
                new_data,
36✔
610
                pl,
36✔
611
            )?,
612
            filter_mask,
36✔
613
            hash_bits: W::BITS as u32,
36✔
614
        })
615
    }
616
}
617

618
/// Builds a new function using a [bit-field vector](BitFieldVec) on words of
619
/// the unsigned type `W` to store values.
620
///
621
/// Since values are stored in a bit-field vector access will be slower than
622
/// when using a boxed slice, but the bit width of stored values will be the
623
/// minimum necessary. It must be in any case at most the bit width of `W`.
624
///
625
/// Typically `W` will be `usize` or `u64`.
626
impl<W: Word + BinSafe, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, BitFieldVec<W>, S, E>
627
where
628
    u128: UpcastableFrom<W>,
629
    SigVal<S, W>: RadixKey,
630
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
631
{
632
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
27✔
633
        mut self,
634
        keys: impl FallibleRewindableLender<
635
            RewindError: std::error::Error + Send + Sync + 'static,
636
            Error: std::error::Error + Send + Sync + 'static,
637
        > + for<'lend> FallibleLending<'lend, Lend = &'lend B>,
638
        values: impl FallibleRewindableLender<
639
            RewindError: std::error::Error + Send + Sync + 'static,
640
            Error: std::error::Error + Send + Sync + 'static,
641
        > + for<'lend> FallibleLending<'lend, Lend = &'lend W>,
642
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
643
    ) -> anyhow::Result<VFunc<T, W, BitFieldVec<W>, S, E>> {
644
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
27✔
645
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
273✔
646
        self.build_loop(keys, values, None, &get_val, new_data, pl)
216✔
647
    }
648
}
649

650
/// Builds a new filter using a [bit-field vector](BitFieldVec) on words of the
651
/// unsigned type `W` to store values.
652
///
653
/// Since values are stored in a bit-field vector access will be slower than
654
/// when using a boxed slice, but the number of bits of the hashes can be set at
655
/// will. It must be in any case at most the bit width of `W`.
656
///
657
/// Typically `W` will be `usize` or `u64`.
658
impl<W: Word + BinSafe + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
659
    VBuilder<W, BitFieldVec<W>, S, E>
660
where
661
    u128: UpcastableFrom<W>,
662
    SigVal<S, EmptyVal>: RadixKey,
663
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
664
{
665
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
×
666
        mut self,
667
        keys: impl FallibleRewindableLender<
668
            RewindError: std::error::Error + Send + Sync + 'static,
669
            Error: std::error::Error + Send + Sync + 'static,
670
        > + for<'lend> FallibleLending<'lend, Lend = &'lend B>,
671
        filter_bits: usize,
672
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
673
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, BitFieldVec<W>, S, E>>> {
674
        assert!(filter_bits > 0);
×
675
        assert!(filter_bits <= W::BITS);
×
676
        let filter_mask = W::MAX >> (W::BITS - filter_bits);
×
677
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
×
678
            <W as DowncastableFrom<u64>>::downcast_from(mix64(shard_edge.edge_hash(sig_val.sig)))
×
679
                & filter_mask
×
680
        };
681
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
×
682

683
        Ok(VFilter {
×
684
            func: self.build_loop(
×
685
                keys,
×
686
                FromCloneableIntoIterator::from(itertools::repeat_n(
×
687
                    EmptyVal::default(),
×
688
                    usize::MAX,
×
689
                )),
690
                Some(filter_bits),
×
691
                &get_val,
×
692
                new_data,
×
693
                pl,
×
694
            )?,
695
            filter_mask,
×
696
            hash_bits: filter_bits as _,
×
697
        })
698
    }
699
}
700

701
impl<
702
    W: Word + BinSafe,
703
    D: BitFieldSlice<W>
704
        + for<'a> BitFieldSliceMut<W, ChunksMut<'a>: Iterator<Item: BitFieldSliceMut<W>>>
705
        + Send
706
        + Sync,
707
    S: Sig + Send + Sync,
708
    E: ShardEdge<S, 3>,
709
> VBuilder<W, D, S, E>
710
{
711
    /// Builds and returns a new function with given keys and values.
712
    ///
713
    /// This method can build functions based both on vectors and on bit-field
714
    /// vectors. The necessary abstraction is provided by the
715
    /// `new_data(bit_width, len)` function, which is called to create the data
716
    /// structure to store the values.
717
    ///
718
    /// When `V` is [`EmptyVal`], the this method builds a function supporting a
719
    /// filter by mapping each key to a mix of its local signature. The
720
    /// necessary abstraction is provided by the `get_val` function, which is
721
    /// called to extract the value from the signature/value pair; in the case
722
    /// of functions it returns the value stored in the signature/value pair,
723
    /// and in the case of filters it returns the hash associated with the
724
    /// signature.
725
    fn build_loop<
63✔
726
        T: ?Sized + ToSig<S> + std::fmt::Debug,
727
        B: ?Sized + Borrow<T>,
728
        V: BinSafe + Default + Send + Sync + Ord + UpcastableInto<u128>,
729
    >(
730
        &mut self,
731
        mut keys: impl FallibleRewindableLender<
732
            RewindError: std::error::Error + Send + Sync + 'static,
733
            Error: std::error::Error + Send + Sync + 'static,
734
        > + for<'lend> FallibleLending<'lend, Lend = &'lend B>,
735
        mut values: impl FallibleRewindableLender<
736
            RewindError: std::error::Error + Send + Sync + 'static,
737
            Error: std::error::Error + Send + Sync + 'static,
738
        > + for<'lend> FallibleLending<'lend, Lend = &'lend V>,
739
        bit_width: Option<usize>,
740
        get_val: &(impl Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync),
741
        new_data: fn(usize, usize) -> D,
742
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
743
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
744
    where
745
        u128: UpcastableFrom<W>,
746
        SigVal<S, V>: RadixKey,
747
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
748
        for<'a> ShardDataIter<'a, D>: Send,
749
        for<'a> <ShardDataIter<'a, D> as Iterator>::Item: Send,
750
    {
751
        let mut dup_count = 0;
126✔
752
        let mut local_dup_count = 0;
126✔
753
        let mut prng = SmallRng::seed_from_u64(self.seed);
189✔
754

755
        if let Some(expected_num_keys) = self.expected_num_keys {
167✔
756
            self.shard_edge.set_up_shards(expected_num_keys, self.eps);
260✔
757
            self.log2_buckets = self.shard_edge.shard_high_bits();
52✔
758
        }
759

760
        pl.info(format_args!("Using 2^{} buckets", self.log2_buckets));
189✔
761

762
        // Loop until success or duplicate detection
763
        loop {
×
764
            let seed = prng.random();
588✔
765

766
            match if self.offline {
196✔
767
                self.try_seed(
144✔
768
                    seed,
72✔
769
                    sig_store::new_offline::<S, V>(
72✔
770
                        self.log2_buckets,
72✔
771
                        LOG2_MAX_SHARDS,
×
772
                        self.expected_num_keys,
72✔
773
                    )?,
774
                    &mut keys,
72✔
775
                    &mut values,
72✔
776
                    bit_width,
72✔
777
                    get_val,
72✔
778
                    new_data,
72✔
779
                    pl,
72✔
780
                )
781
            } else {
782
                self.try_seed(
248✔
783
                    seed,
124✔
784
                    sig_store::new_online::<S, V>(
124✔
785
                        self.log2_buckets,
124✔
786
                        LOG2_MAX_SHARDS,
×
787
                        self.expected_num_keys,
124✔
788
                    )?,
789
                    &mut keys,
124✔
790
                    &mut values,
124✔
791
                    bit_width,
124✔
792
                    get_val,
124✔
793
                    new_data,
124✔
794
                    pl,
124✔
795
                )
796
            } {
797
                Ok(func) => {
62✔
798
                    return Ok(func);
62✔
799
                }
800
                Err(error) => {
134✔
801
                    match error.downcast::<SolveError>() {
134✔
802
                        Ok(vfunc_error) => match vfunc_error {
268✔
803
                            // Let's try another seed, but just a few times--most likely,
804
                            // duplicate keys
805
                            SolveError::DuplicateSignature => {
×
806
                                if dup_count >= 3 {
4✔
807
                                    pl.error(format_args!("Duplicate keys (duplicate 128-bit signatures with four different seeds)"));
3✔
808
                                    return Err(BuildError::DuplicateKey.into());
1✔
809
                                }
810
                                pl.warn(format_args!(
9✔
811
                                "Duplicate 128-bit signature, trying again with a different seed..."
×
812
                                ));
813
                                dup_count += 1;
3✔
814
                            }
815
                            // Let's try another seed, but just a few times
816
                            SolveError::DuplicateLocalSignature => {
×
817
                                if local_dup_count >= 2 {
×
818
                                    pl.error(format_args!("Duplicate local signatures: use full signatures (duplicate local signatures with three different seeds)"));
×
819
                                    return Err(BuildError::DuplicateLocalSignatures.into());
×
820
                                }
821
                                pl.warn(format_args!(
×
822
                                "Duplicate local signature, trying again with a different seed..."
×
823
                                ));
824
                                local_dup_count += 1;
×
825
                            }
UNCOV
826
                            SolveError::MaxShardTooBig => {
×
UNCOV
827
                                pl.warn(format_args!(
×
UNCOV
828
                                "The maximum shard is too big, trying again with a different seed..."
×
829
                               ));
830
                            }
831
                            // Let's just try another seed
832
                            SolveError::UnsolvableShard => {
130✔
833
                                pl.warn(format_args!(
260✔
834
                                    "Unsolvable shard, trying again with a different seed..."
130✔
835
                                ));
836
                            }
837
                        },
838
                        Err(error) => return Err(error),
×
839
                    }
840
                }
841
            }
842

843
            values = values.rewind()?;
399✔
844
            keys = keys.rewind()?;
399✔
845
        }
846
    }
847
}
848

849
impl<
850
    W: Word + BinSafe,
851
    D: BitFieldSlice<W>
852
        + for<'a> BitFieldSliceMut<W, ChunksMut<'a>: Iterator<Item: BitFieldSliceMut<W>>>
853
        + Send
854
        + Sync,
855
    S: Sig + Send + Sync,
856
    E: ShardEdge<S, 3>,
857
> VBuilder<W, D, S, E>
858
{
859
    /// Tries to build a function using specific seed. See the comments in the
860
    /// [`VBuilder::build_loop`] method for more details.
861
    ///
862
    /// This methods reads the input, sets up the shards, allocates the backend
863
    /// using `new_data`, and passes the backend and an iterator on shards to
864
    /// the [`VBuilder::try_build_from_shard_iter`] method.
865
    fn try_seed<
196✔
866
        T: ?Sized + ToSig<S> + std::fmt::Debug,
867
        B: ?Sized + Borrow<T>,
868
        V: BinSafe + Default + Send + Sync + Ord + UpcastableInto<u128>,
869
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
870
    >(
871
        &mut self,
872
        seed: u64,
873
        mut sig_store: impl SigStore<S, V>,
874
        keys: &mut (
875
                 impl FallibleRewindableLender<
876
            RewindError: std::error::Error + Send + Sync + 'static,
877
            Error: std::error::Error + Send + Sync + 'static,
878
        > + for<'lend> FallibleLending<'lend, Lend = &'lend B>
879
             ),
880
        values: &mut (
881
                 impl FallibleRewindableLender<
882
            RewindError: std::error::Error + Send + Sync + 'static,
883
            Error: std::error::Error + Send + Sync + 'static,
884
        > + for<'lend> FallibleLending<'lend, Lend = &'lend V>
885
             ),
886
        bit_width: Option<usize>,
887
        get_val: &G,
888
        new_data: fn(usize, usize) -> D,
889
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
890
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
891
    where
892
        SigVal<S, V>: RadixKey,
893
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
894
        for<'a> ShardDataIter<'a, D>: Send,
895
        for<'a> <ShardDataIter<'a, D> as Iterator>::Item: Send,
896
    {
897
        let shard_edge = &mut self.shard_edge;
392✔
898

899
        pl.expected_updates(self.expected_num_keys);
588✔
900
        pl.item_name("key");
392✔
901
        pl.start(format!(
588✔
902
            "Computing and storing {}-bit signatures in {} using seed 0x{:016x}...",
×
903
            std::mem::size_of::<S>() * 8,
196✔
904
            sig_store
196✔
905
                .temp_dir()
196✔
906
                .map(|d| d.path().to_string_lossy())
340✔
907
                .unwrap_or(Cow::Borrowed("memory")),
392✔
908
            seed
×
909
        ));
910

911
        // This will be the maximum value for functions and EmptyVal for filters
912
        let mut maybe_max_value = V::default();
392✔
913
        let start = Instant::now();
392✔
914

915
        while let Some(key) = keys.next()? {
69,242,612✔
916
            pl.light_update();
46,161,480✔
917
            // This might be an actual value, if we are building a
918
            // function, or EmptyVal, if we are building a filter.
919
            let &maybe_val = values.next()?.expect("Not enough values");
115,403,700✔
920
            let sig_val = SigVal {
921
                sig: T::to_sig(key.borrow(), seed),
92,322,960✔
922
                val: maybe_val,
923
            };
924
            maybe_max_value = Ord::max(maybe_max_value, maybe_val);
46,161,480✔
925
            sig_store.try_push(sig_val)?;
69,242,220✔
926
        }
927
        pl.done();
392✔
928

929
        self.num_keys = sig_store.len();
196✔
930
        self.bit_width = if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
196✔
931
            bit_width.expect("Bit width must be set for filters")
342✔
932
        } else {
933
            let len_width = maybe_max_value.upcast().len() as usize;
246✔
934
            if let Some(bit_width) = bit_width {
82✔
935
                if len_width > bit_width {
×
936
                    return Err(BuildError::ValueTooLarge.into());
×
937
                }
938
                bit_width
×
939
            } else {
940
                len_width
82✔
941
            }
942
        };
943

944
        info!(
196✔
945
            "Computation of signatures from inputs completed in {:.3} seconds ({} keys, {:.3} ns/key)",
×
946
            start.elapsed().as_secs_f64(),
320✔
947
            self.num_keys,
×
948
            start.elapsed().as_nanos() as f64 / self.num_keys as f64
160✔
949
        );
950

951
        shard_edge.set_up_shards(self.num_keys, self.eps);
784✔
952

953
        let start = Instant::now();
392✔
954

955
        let shard_store = sig_store.into_shard_store(shard_edge.shard_high_bits())?;
980✔
956
        let max_shard = shard_store.shard_sizes().iter().copied().max().unwrap_or(0);
1,176✔
957
        let filter = TypeId::of::<V>() == TypeId::of::<EmptyVal>();
392✔
958

959
        (self.c, self.lge) = shard_edge.set_up_graphs(self.num_keys, max_shard);
980✔
960

961
        if filter {
310✔
962
            pl.info(format_args!(
228✔
963
                "Number of keys: {} Bit width: {}",
114✔
964
                self.num_keys, self.bit_width,
114✔
965
            ));
966
        } else {
967
            pl.info(format_args!(
328✔
968
                "Number of keys: {} Max value: {} Bit width: {}",
82✔
969
                self.num_keys,
82✔
970
                maybe_max_value.upcast(),
82✔
971
                self.bit_width,
82✔
972
            ));
973
        }
974

975
        if shard_edge.shard_high_bits() != 0 {
201✔
976
            pl.info(format_args!(
20✔
977
                "Max shard / average shard: {:.2}%",
5✔
978
                (100.0 * max_shard as f64)
10✔
979
                    / (self.num_keys as f64 / shard_edge.num_shards() as f64)
10✔
980
            ));
981
        }
982

983
        if max_shard as f64 > 1.01 * self.num_keys as f64 / shard_edge.num_shards() as f64 {
588✔
984
            // This might sometimes happen with small sharded graphs
UNCOV
985
            Err(SolveError::MaxShardTooBig.into())
×
986
        } else {
987
            let data = new_data(
988
                self.bit_width,
392✔
989
                shard_edge.num_vertices() * shard_edge.num_shards(),
588✔
990
            );
991
            self.try_build_from_shard_iter(seed, data, shard_store.into_iter(), get_val, pl)
1,568✔
992
                .inspect(|_| {
258✔
993
                    info!(
62✔
994
                        "Construction from signatures completed in {:.3} seconds ({} keys, {:.3} ns/key)",
×
995
                        start.elapsed().as_secs_f64(),
104✔
996
                        self.num_keys,
×
997
                        start.elapsed().as_nanos() as f64 / self.num_keys as f64
52✔
998
                    );
999
                })
1000
                .map_err(Into::into)
196✔
1001
        }
1002
    }
1003

1004
    /// Builds and return a new function starting from an iterator on shards.
1005
    ///
1006
    /// See [`VBuilder::build_loop`] for more details on the parameters.
1007
    fn try_build_from_shard_iter<
196✔
1008
        T: ?Sized + ToSig<S>,
1009
        I,
1010
        P,
1011
        V: BinSafe + Default + Send + Sync,
1012
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
1013
    >(
1014
        &mut self,
1015
        seed: u64,
1016
        mut data: D,
1017
        shard_iter: I,
1018
        get_val: &G,
1019
        pl: &mut P,
1020
    ) -> Result<VFunc<T, W, D, S, E>, SolveError>
1021
    where
1022
        SigVal<S, V>: RadixKey,
1023
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
1024
        P: ProgressLog + Clone + Send + Sync,
1025
        I: Iterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
1026
        for<'a> ShardDataIter<'a, D>: Send,
1027
        for<'a> <ShardDataIter<'a, D> as Iterator>::Item: Send,
1028
    {
1029
        let shard_edge = &self.shard_edge;
392✔
1030
        self.num_threads = shard_edge.num_shards().min(self.max_num_threads);
588✔
1031

1032
        pl.info(format_args!("{}", self.shard_edge));
588✔
1033
        pl.info(format_args!(
588✔
1034
            "c: {}, Overhead: {:+.4}% Number of threads: {}",
×
1035
            self.c,
×
1036
            100. * ((shard_edge.num_vertices() * shard_edge.num_shards()) as f64
588✔
1037
                / (self.num_keys as f64)
196✔
1038
                - 1.),
196✔
1039
            self.num_threads
×
1040
        ));
1041

1042
        if self.lge {
196✔
1043
            pl.info(format_args!("Peeling towards lazy Gaussian elimination"));
522✔
1044
            self.par_solve(
348✔
1045
                shard_iter,
174✔
1046
                &mut data,
174✔
1047
                |this, shard_index, shard, data, pl| {
165✔
1048
                    this.lge_shard(shard_index, shard, data, get_val, pl)
1,155✔
1049
                },
1050
                &mut pl.concurrent(),
174✔
1051
                pl,
174✔
1052
            )?;
1053
        } else if self.low_mem == Some(true)
22✔
1054
            || self.low_mem.is_none() && self.num_threads > 3 && shard_edge.num_shards() > 2
26✔
1055
        {
1056
            // Less memory, but slower
1057
            self.par_solve(
20✔
1058
                shard_iter,
10✔
1059
                &mut data,
10✔
1060
                |this, shard_index, shard, data, pl| {
10✔
1061
                    this.peel_by_sig_vals_low_mem(shard_index, shard, data, get_val, pl)
70✔
1062
                },
1063
                &mut pl.concurrent(),
10✔
1064
                pl,
10✔
1065
            )?;
1066
        } else {
1067
            // More memory, but faster
1068
            self.par_solve(
24✔
1069
                shard_iter,
12✔
1070
                &mut data,
12✔
1071
                |this, shard_index, shard, data, pl| {
12✔
1072
                    this.peel_by_sig_vals_high_mem(shard_index, shard, data, get_val, pl)
84✔
1073
                },
1074
                &mut pl.concurrent(),
12✔
1075
                pl,
12✔
1076
            )?;
1077
        }
1078

1079
        pl.info(format_args!(
186✔
1080
            "Bits/keys: {} ({:+.4}%)",
×
1081
            data.len() as f64 * self.bit_width as f64 / self.num_keys as f64,
62✔
1082
            100.0 * (data.len() as f64 / self.num_keys as f64 - 1.),
62✔
1083
        ));
1084

1085
        Ok(VFunc {
62✔
1086
            seed,
124✔
1087
            shard_edge: self.shard_edge,
124✔
1088
            num_keys: self.num_keys,
124✔
1089
            data,
124✔
1090
            _marker_t: std::marker::PhantomData,
124✔
1091
            _marker_w: std::marker::PhantomData,
62✔
1092
            _marker_s: std::marker::PhantomData,
62✔
1093
        })
1094
    }
1095
}
1096

1097
macro_rules! remove_edge {
1098
    ($xor_graph: ident, $e: ident, $side: ident, $edge: ident, $stack: ident, $push:ident) => {
1099
        match $side {
1100
            0 => {
1101
                if $xor_graph.degree($e[1]) == 2 {
1102
                    $stack.$push($e[1].cast());
1103
                }
1104
                $xor_graph.remove($e[1], $edge, 1);
1105
                if $xor_graph.degree($e[2]) == 2 {
1106
                    $stack.$push($e[2].cast());
1107
                }
1108
                $xor_graph.remove($e[2], $edge, 2);
1109
            }
1110
            1 => {
1111
                if $xor_graph.degree($e[0]) == 2 {
1112
                    $stack.$push($e[0].cast());
1113
                }
1114
                $xor_graph.remove($e[0], $edge, 0);
1115
                if $xor_graph.degree($e[2]) == 2 {
1116
                    $stack.$push($e[2].cast());
1117
                }
1118
                $xor_graph.remove($e[2], $edge, 2);
1119
            }
1120
            2 => {
1121
                if $xor_graph.degree($e[0]) == 2 {
1122
                    $stack.$push($e[0].cast());
1123
                }
1124
                $xor_graph.remove($e[0], $edge, 0);
1125
                if $xor_graph.degree($e[1]) == 2 {
1126
                    $stack.$push($e[1].cast());
1127
                }
1128
                $xor_graph.remove($e[1], $edge, 1);
1129
            }
1130
            _ => unsafe { unreachable_unchecked() },
1131
        }
1132
    };
1133
}
1134

1135
impl<
1136
    W: Word + BinSafe + Send + Sync,
1137
    D: BitFieldSlice<W>
1138
        + for<'a> BitFieldSliceMut<W, ChunksMut<'a>: Iterator<Item: BitFieldSliceMut<W>>>
1139
        + Send
1140
        + Sync,
1141
    S: Sig + BinSafe,
1142
    E: ShardEdge<S, 3>,
1143
> VBuilder<W, D, S, E>
1144
{
1145
    fn count_sort<V: BinSafe>(&self, data: &mut [SigVal<S, V>]) {
187✔
1146
        let num_sort_keys = self.shard_edge.num_sort_keys();
561✔
1147
        let mut count = vec![0; num_sort_keys];
561✔
1148

1149
        let mut copied = Box::new_uninit_slice(data.len());
748✔
1150
        for (&sig_val, copy) in data.iter().zip(copied.iter_mut()) {
69,243,035✔
1151
            count[self.shard_edge.sort_key(sig_val.sig)] += 1;
92,322,800✔
1152
            copy.write(sig_val);
46,161,400✔
1153
        }
1154
        let copied = unsafe { copied.assume_init() };
561✔
1155

1156
        count.iter_mut().fold(0, |acc, c| {
8,611✔
1157
            let old = *c;
16,474✔
1158
            *c = acc;
8,237✔
1159
            acc + old
8,237✔
1160
        });
1161

1162
        for &sig_val in copied.iter() {
46,161,774✔
1163
            let key = self.shard_edge.sort_key(sig_val.sig);
115,403,500✔
1164
            data[count[key]] = sig_val;
46,161,400✔
1165
            count[key] += 1;
23,080,700✔
1166
        }
1167
    }
1168

1169
    /// After this number of keys, in the case of filters we remove duplicate
1170
    /// edges.
1171
    const MAX_NO_LOCAL_SIG_CHECK: usize = 1 << 33;
1172

1173
    /// Solves in parallel shards returned by an iterator, storing
1174
    /// the result in `data`.
1175
    ///
1176
    /// # Arguments
1177
    ///
1178
    /// * `shard_iter`: an iterator returning the shards to solve.
1179
    ///
1180
    /// * `data`: the storage for the solution values.
1181
    ///
1182
    /// * `solve_shard`: a method to solve a shard; it takes the shard index,
1183
    ///   the shard, shard-local data, and a progress logger. It may
1184
    ///   fail by returning an error.
1185
    ///
1186
    /// * `main_pl`: the progress logger for the overall computation.
1187
    ///
1188
    /// * `pl`: a progress logger that will be cloned to display the progress of
1189
    ///   a current shard.
1190
    ///
1191
    /// # Errors
1192
    ///
1193
    /// This method returns an error if one of the shards cannot be solved, or
1194
    /// if duplicates are detected.
1195
    fn par_solve<
196✔
1196
        'b,
1197
        V: BinSafe,
1198
        I: IntoIterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
1199
        SS: Fn(&Self, usize, Arc<Vec<SigVal<S, V>>>, ShardData<'b, D>, &mut P) -> Result<(), ()>
1200
            + Send
1201
            + Sync
1202
            + Copy,
1203
        C: ConcurrentProgressLog + Send + Sync,
1204
        P: ProgressLog + Clone + Send + Sync,
1205
    >(
1206
        &self,
1207
        shard_iter: I,
1208
        data: &'b mut D,
1209
        solve_shard: SS,
1210
        main_pl: &mut C,
1211
        pl: &mut P,
1212
    ) -> Result<(), SolveError>
1213
    where
1214
        I::IntoIter: Send,
1215
        SigVal<S, V>: RadixKey,
1216
        for<'a> ShardDataIter<'a, D>: Send,
1217
        for<'a> <ShardDataIter<'a, D> as Iterator>::Item: Send,
1218
    {
1219
        main_pl
392✔
1220
            .item_name("shard")
1221
            .expected_updates(Some(self.shard_edge.num_shards()))
392✔
1222
            .display_memory(true)
1223
            .start("Solving shards...");
1224

1225
        self.failed.store(false, Ordering::Relaxed);
588✔
1226
        let num_shards = self.shard_edge.num_shards();
588✔
1227
        let buffer_size = self.num_threads.ilog2() as usize;
392✔
1228

1229
        let (err_send, err_recv) = crossbeam_channel::bounded::<_>(self.num_threads);
588✔
1230
        let (data_send, data_recv) = crossbeam_channel::bounded::<(
1231
            usize,
1232
            (Arc<Vec<SigVal<S, V>>>, ShardData<'_, D>),
1233
        )>(buffer_size);
196✔
1234

1235
        let result = std::thread::scope(|scope| {
588✔
1236
            scope.spawn(move || {
588✔
1237
                let _ = thread_priority::set_current_thread_priority(ThreadPriority::Max);
196✔
1238
                for val in shard_iter
397✔
1239
                    .into_iter()
196✔
1240
                    .zip(data.try_chunks_mut(self.shard_edge.num_vertices()).unwrap())
1,176✔
1241
                    .enumerate()
196✔
1242
                {
1243
                    if data_send.send(val).is_err() {
603✔
1244
                        break;
×
1245
                    }
1246
                }
1247

1248
                drop(data_send);
392✔
1249
            });
1250

1251
            for _thread_id in 0..self.num_threads {
397✔
1252
                let mut main_pl = main_pl.clone();
603✔
1253
                let mut pl = pl.clone();
603✔
1254
                let err_send = err_send.clone();
603✔
1255
                let data_recv = data_recv.clone();
603✔
1256
                scope.spawn(move || {
603✔
1257
                    loop {
×
1258
                        match data_recv.recv() {
258✔
1259
                            Err(_) => return,
57✔
1260
                            Ok((shard_index, (shard, mut data))) => {
603✔
1261
                                if shard.is_empty() {
402✔
1262
                                    return;
10✔
1263
                                }
1264

1265
                                main_pl.info(format_args!(
573✔
1266
                                    "Analyzing shard {}/{}...",
×
1267
                                    shard_index + 1,
191✔
1268
                                    num_shards
×
1269
                                ));
1270

1271
                                pl.start(format!(
573✔
1272
                                    "Sorting shard {}/{}...",
×
1273
                                    shard_index + 1,
191✔
1274
                                    num_shards
×
1275
                                ));
1276

1277
                                {
1278
                                    // SAFETY: only one thread may be accessing the
1279
                                    // shard, and we will be consuming it
1280
                                    let shard = unsafe {
191✔
1281
                                        &mut *(Arc::as_ptr(&shard) as *mut Vec<SigVal<S, V>>)
191✔
1282
                                    };
1283

1284
                                    if self.check_dups {
191✔
1285
                                        shard.radix_sort_builder().sort();
12✔
1286
                                        if shard.par_windows(2).any(|w| w[0].sig == w[1].sig) {
18✔
1287
                                            let _ = err_send.send(SolveError::DuplicateSignature);
8✔
1288
                                            return;
4✔
1289
                                        }
1290
                                    }
1291

1292
                                    if TypeId::of::<E::LocalSig>() != TypeId::of::<S>()
187✔
1293
                                        && self.num_keys > Self::MAX_NO_LOCAL_SIG_CHECK
109✔
1294
                                    {
1295
                                        // Check for duplicate local signatures
1296

1297
                                        // E::SortSig provides a transmutable
1298
                                        // view of SigVal with an implementation
1299
                                        // of RadixKey that is compatible with
1300
                                        // the sort induced by the key returned
1301
                                        // by ShardEdge::sort_key, and equality
1302
                                        // that implies equality of local
1303
                                        // signatures.
1304

1305
                                        // SAFETY: we drop this immediately after sorting.
1306
                                        let shard = unsafe {
×
1307
                                            transmute::<
×
1308
                                                &mut Vec<SigVal<S, V>>,
×
1309
                                                &mut Vec<E::SortSigVal<V>>,
×
1310
                                            >(shard)
×
1311
                                        };
1312

1313
                                        let builder = shard.radix_sort_builder();
×
1314
                                        if self.max_num_threads == 1 {
×
1315
                                            builder
×
1316
                                                .with_single_threaded_tuner()
×
1317
                                                .with_parallel(false)
×
1318
                                        } else {
1319
                                            builder
×
1320
                                        }
1321
                                        .sort();
×
1322

1323
                                        let shard_len = shard.len();
×
1324
                                        shard.dedup();
×
1325

1326
                                        if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
×
1327
                                            // Duplicate local signatures on
1328
                                            // large filters can be simply
1329
                                            // removed: they do not change the
1330
                                            // semantics of the filter because
1331
                                            // hashes are computed on
1332
                                            // local signatures.
1333
                                            pl.info(format_args!(
×
1334
                                                "Removed signatures: {}",
×
1335
                                                shard_len - shard.len()
×
1336
                                            ));
1337
                                        } else {
1338
                                            // For function, we have to try again
1339
                                            if shard_len != shard.len() {
×
1340
                                                let _ = err_send
×
1341
                                                    .send(SolveError::DuplicateLocalSignature);
×
1342
                                                return;
×
1343
                                            }
1344
                                        }
1345
                                    } else if self.shard_edge.num_sort_keys() != 1 {
374✔
1346
                                        // Sorting the signatures increases locality
1347
                                        self.count_sort::<V>(shard);
374✔
1348
                                    }
1349
                                }
1350

1351
                                pl.done_with_count(shard.len());
748✔
1352

1353
                                main_pl.info(format_args!(
561✔
1354
                                    "Solving shard {}/{}...",
×
1355
                                    shard_index + 1,
187✔
1356
                                    num_shards
×
1357
                                ));
1358

1359
                                if self.failed.load(Ordering::Relaxed) {
561✔
1360
                                    return;
×
1361
                                }
1362

1363
                                if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
298✔
1364
                                    // For filters, we fill the array with random data, otherwise
1365
                                    // elements with signature 0 would have a significantly higher
1366
                                    // probability of being false positives.
1367
                                    //
1368
                                    // SAFETY: We work around the fact that [usize] does not implement Fill
1369
                                    Mwc192::seed_from_u64(self.seed).fill_bytes(unsafe {
333✔
1370
                                        data.as_mut_slice().align_to_mut::<u8>().1
111✔
1371
                                    });
1372
                                }
1373

1374
                                if solve_shard(self, shard_index, shard, data, &mut pl).is_err() {
1,122✔
1375
                                    let _ = err_send.send(SolveError::UnsolvableShard);
260✔
1376
                                    return;
130✔
1377
                                }
1378

1379
                                if self.failed.load(Ordering::Relaxed) {
171✔
1380
                                    return;
×
1381
                                }
1382

1383
                                main_pl.info(format_args!(
171✔
1384
                                    "Completed shard {}/{}",
×
1385
                                    shard_index + 1,
57✔
1386
                                    num_shards
×
1387
                                ));
1388
                                main_pl.update_and_display();
114✔
1389
                            }
1390
                        }
1391
                    }
1392
                });
1393
            }
1394

1395
            drop(err_send);
392✔
1396
            drop(data_recv);
392✔
1397

1398
            if let Some(error) = err_recv.into_iter().next() {
330✔
1399
                self.failed.store(true, Ordering::Relaxed);
402✔
1400
                return Err(error);
134✔
1401
            }
1402

1403
            Ok(())
62✔
1404
        });
1405

1406
        main_pl.done();
392✔
1407
        result
196✔
1408
    }
1409

1410
    /// Peels a shard via edge indices.
1411
    ///
1412
    /// This peeler uses a [`SigVal`] per key (the shard), a
1413
    /// [`ShardEdge::Vertex`] and a byte per vertex (for the [`XorGraph`]), a
1414
    /// [`ShardEdge::Vertex`] per vertex (for the [`DoubleStack`]), and a final
1415
    /// byte per vertex (for the stack of sides).
1416
    ///
1417
    /// This peeler uses more memory than
1418
    /// [`peek_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) but
1419
    /// less memory than
1420
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem). It
1421
    /// is fairly slow as it has to go through a cache-unfriendly memory
1422
    /// indirection every time it has to retrieve a [`SigVal`] from the shard,
1423
    /// but it is the peeler of choice when [lazy Gaussian
1424
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) is required, as
1425
    /// after a failed peel-by-sig-vals it is not possible to retrieve
1426
    /// information about the signature/value pairs in the shard.
1427
    ///
1428
    /// In theory one could avoid the stack of sides by putting vertices,
1429
    /// instead of edge indices, on the upper stack, and retrieving edge indices
1430
    /// and sides from the [`XorGraph`], as
1431
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) does,
1432
    ///  but this would be less cache friendly. This peeler is only used for
1433
    /// very small instances, and since we are going to pass through lazy
1434
    /// Gaussian elimination some additional speed is a good idea.
1435
    fn peel_by_index<'a, V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
165✔
1436
        &self,
1437
        shard_index: usize,
1438
        shard: Arc<Vec<SigVal<S, V>>>,
1439
        data: ShardData<'a, D>,
1440
        get_val: &G,
1441
        pl: &mut impl ProgressLog,
1442
    ) -> Result<PeelResult<'a, W, D, S, E, V>, ()> {
1443
        let shard_edge = &self.shard_edge;
330✔
1444
        let num_vertices = shard_edge.num_vertices();
495✔
1445
        let num_shards = shard_edge.num_shards();
495✔
1446

1447
        pl.start(format!(
495✔
1448
            "Generating graph for shard {}/{}...",
×
1449
            shard_index + 1,
165✔
1450
            num_shards
×
1451
        ));
1452

1453
        let mut xor_graph = XorGraph::<E::Vertex>::new(num_vertices);
495✔
1454
        for (edge_index, sig_val) in shard.iter().enumerate() {
2,161,730✔
1455
            for (side, &v) in shard_edge
8,645,600✔
1456
                .local_edge(shard_edge.local_sig(sig_val.sig))
3,242,100✔
1457
                .iter()
1,080,700✔
1458
                .enumerate()
1,080,700✔
1459
            {
1460
                xor_graph.add(v, edge_index.cast(), side);
16,210,500✔
1461
            }
1462
        }
1463
        pl.done_with_count(shard.len());
660✔
1464

1465
        assert!(
165✔
1466
            !xor_graph.overflow,
165✔
1467
            "Degree overflow for shard {}/{}",
×
1468
            shard_index + 1,
×
1469
            num_shards
×
1470
        );
1471

1472
        if self.failed.load(Ordering::Relaxed) {
495✔
1473
            return Err(());
×
1474
        }
1475

1476
        // The lower stack contains vertices to be visited. The upper stack
1477
        // contains peeled edges.
1478
        let mut double_stack = DoubleStack::<E::Vertex>::new(num_vertices);
495✔
1479
        let mut sides_stack = Vec::<u8>::new();
330✔
1480

1481
        pl.start(format!(
495✔
1482
            "Peeling graph for shard {}/{} by edge indices...",
×
1483
            shard_index + 1,
165✔
1484
            num_shards
×
1485
        ));
1486

1487
        // Preload all vertices of degree one in the visit stack
1488
        for (v, degree) in xor_graph.degrees().enumerate() {
2,444,695✔
1489
            if degree == 1 {
1,450,670✔
1490
                double_stack.push_lower(v.cast());
685,710✔
1491
            }
1492
        }
1493

1494
        while let Some(v) = double_stack.pop_lower() {
948,085✔
1495
            let v: usize = v.upcast();
1,895,840✔
1496
            if xor_graph.degree(v) == 0 {
947,920✔
1497
                continue;
32,300✔
1498
            }
1499
            debug_assert!(xor_graph.degree(v) == 1);
1,324,980✔
1500
            let (edge_index, side) = xor_graph.edge_and_side(v);
1,766,640✔
1501
            xor_graph.zero(v);
1,324,980✔
1502
            double_stack.push_upper(edge_index);
1,324,980✔
1503
            sides_stack.push(side as u8);
1,324,980✔
1504

1505
            let e = shard_edge.local_edge(shard_edge.local_sig(shard[edge_index.upcast()].sig));
3,091,620✔
1506
            remove_edge!(xor_graph, e, side, edge_index, double_stack, push_lower);
441,660✔
1507
        }
1508

1509
        pl.done();
330✔
1510

1511
        if shard.len() != double_stack.upper_len() {
495✔
1512
            pl.info(format_args!(
465✔
1513
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
×
1514
                shard_index + 1,
155✔
1515
                num_shards,
×
1516
                double_stack.upper_len(),
310✔
1517
                shard.len(),
310✔
1518
            ));
1519
            return Ok(PeelResult::Partial {
155✔
1520
                shard_index,
310✔
1521
                shard,
310✔
1522
                data,
310✔
1523
                double_stack,
310✔
1524
                sides_stack,
155✔
1525
                _marker: std::marker::PhantomData,
155✔
1526
            });
1527
        }
1528

1529
        self.assign(
20✔
1530
            shard_index,
10✔
1531
            data,
10✔
1532
            double_stack
10✔
1533
                .iter_upper()
10✔
1534
                .map(|&edge_index| {
110✔
1535
                    // Turn edge indices into local edge signatures
1536
                    // and associated values
1537
                    let sig_val = &shard[edge_index.upcast()];
300✔
1538
                    let local_sig = shard_edge.local_sig(sig_val.sig);
400✔
1539
                    (
1540
                        local_sig,
200✔
1541
                        get_val(
200✔
1542
                            shard_edge,
200✔
1543
                            SigVal {
100✔
1544
                                sig: local_sig,
100✔
1545
                                val: sig_val.val,
100✔
1546
                            },
1547
                        ),
1548
                    )
1549
                })
1550
                .zip(sides_stack.into_iter().rev()),
40✔
1551
            pl,
10✔
1552
        );
1553

1554
        Ok(PeelResult::Complete())
10✔
1555
    }
1556

1557
    /// Peels a shard via signature/value pairs using a stack of peeled
1558
    /// signatures/value pairs.
1559
    ///
1560
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
1561
    /// it drops it immediately after building the graph.
1562
    ///
1563
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]), a
1564
    /// [`ShardEdge::Vertex`] per vertex (for visit stack, albeit usually the stack
1565
    /// never contains more than a third of the vertices), and a [`SigVal`] and
1566
    /// a byte per key (for the stack of peeled edges).
1567
    ///
1568
    /// This is the fastest and more memory-consuming peeler. It has however
1569
    /// just a small advantage during assignment with respect to
1570
    /// [`peek_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem), which
1571
    /// uses almost half the memory. It is the peeler of choice for low levels
1572
    /// of parallelism.
1573
    ///
1574
    /// This peeler cannot be used in conjunction with [lazy Gaussian
1575
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a failed
1576
    /// peeling it is not possible to retrieve information about the
1577
    /// signature/value pairs in the shard.
1578
    fn peel_by_sig_vals_high_mem<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
12✔
1579
        &self,
1580
        shard_index: usize,
1581
        shard: Arc<Vec<SigVal<S, V>>>,
1582
        data: ShardData<'_, D>,
1583
        get_val: &G,
1584
        pl: &mut impl ProgressLog,
1585
    ) -> Result<(), ()>
1586
    where
1587
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
1588
    {
1589
        let shard_edge = &self.shard_edge;
24✔
1590
        let num_vertices = shard_edge.num_vertices();
36✔
1591
        let num_shards = shard_edge.num_shards();
36✔
1592
        let shard_len = shard.len();
36✔
1593

1594
        pl.start(format!(
36✔
1595
            "Generating graph for shard {}/{}...",
×
1596
            shard_index + 1,
12✔
1597
            num_shards
×
1598
        ));
1599

1600
        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
36✔
1601
        for &sig_val in shard.iter() {
12,000,012✔
1602
            let local_sig = shard_edge.local_sig(sig_val.sig);
48,000,000✔
1603
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
156,000,000✔
1604
                xor_graph.add(
108,000,000✔
1605
                    v,
72,000,000✔
1606
                    SigVal {
36,000,000✔
1607
                        sig: local_sig,
36,000,000✔
1608
                        val: sig_val.val,
36,000,000✔
1609
                    },
1610
                    side,
36,000,000✔
1611
                );
1612
            }
1613
        }
1614
        pl.done_with_count(shard.len());
48✔
1615

1616
        // We are using a consuming iterator over the shard store, so this
1617
        // drop will free the memory used by the signatures
1618
        drop(shard);
24✔
1619

1620
        assert!(
12✔
1621
            !xor_graph.overflow,
12✔
1622
            "Degree overflow for shard {}/{}",
×
1623
            shard_index + 1,
×
1624
            num_shards
×
1625
        );
1626

1627
        if self.failed.load(Ordering::Relaxed) {
36✔
1628
            return Err(());
×
1629
        }
1630

1631
        let mut sig_vals_stack = FastStack::<SigVal<E::LocalSig, V>>::new(shard_len);
36✔
1632
        let mut sides_stack = FastStack::<u8>::new(shard_len);
36✔
1633
        // Experimentally this stack never grows beyond a little more than
1634
        // num_vertices / 4
1635
        let mut visit_stack = Vec::<E::Vertex>::with_capacity(num_vertices / 3);
36✔
1636

1637
        pl.start(format!(
36✔
1638
            "Peeling graph for shard {}/{} by signatures (high-mem)...",
×
1639
            shard_index + 1,
12✔
1640
            num_shards
×
1641
        ));
1642

1643
        // Preload all vertices of degree one in the visit stack
1644
        for (v, degree) in xor_graph.degrees().enumerate() {
27,131,940✔
1645
            if degree == 1 {
16,096,586✔
1646
                visit_stack.push(v.cast());
7,591,902✔
1647
            }
1648
        }
1649

1650
        while let Some(v) = visit_stack.pop() {
25,122,948✔
1651
            let v: usize = v.upcast();
50,245,872✔
1652
            if xor_graph.degree(v) == 0 {
25,122,936✔
1653
                continue;
561,468✔
1654
            }
1655
            let (sig_val, side) = xor_graph.edge_and_side(v);
48,000,000✔
1656
            xor_graph.zero(v);
36,000,000✔
1657
            sig_vals_stack.push(sig_val);
36,000,000✔
1658
            sides_stack.push(side as u8);
36,000,000✔
1659

1660
            let e = self.shard_edge.local_edge(sig_val.sig);
48,000,000✔
1661
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push);
12,000,000✔
1662
        }
1663

1664
        pl.done();
24✔
1665

1666
        if shard_len != sig_vals_stack.len() {
24✔
1667
            pl.info(format_args!(
×
1668
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
×
1669
                shard_index + 1,
×
1670
                num_shards,
×
1671
                sig_vals_stack.len(),
×
1672
                shard_len
×
1673
            ));
1674
            return Err(());
×
1675
        }
1676

1677
        self.assign(
24✔
1678
            shard_index,
12✔
1679
            data,
12✔
1680
            sig_vals_stack
12✔
1681
                .iter()
12✔
1682
                .rev()
12✔
1683
                .map(|&sig_val| (sig_val.sig, get_val(shard_edge, sig_val)))
36,000,012✔
1684
                .zip(sides_stack.iter().copied().rev()),
60✔
1685
            pl,
12✔
1686
        );
1687

1688
        Ok(())
12✔
1689
    }
1690

1691
    /// Peels a shard via signature/value pairs using a stack of vertices to
1692
    /// represent peeled edges.
1693
    ///
1694
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
1695
    /// it drops it immediately after building the graph.
1696
    ///
1697
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]) and a
1698
    /// [`ShardEdge::Vertex`] per vertex (for a [`DoubleStack`]).
1699
    ///
1700
    /// This is by far the less memory-hungry peeler, and it is just slightly
1701
    /// slower than
1702
    /// [`peek_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem),
1703
    /// which uses almost twice the memory. It is the peeler of choice for
1704
    /// significant levels of parallelism.
1705
    ///
1706
    /// This peeler cannot be used in conjunction with [lazy Gaussian
1707
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a
1708
    /// failed peeling it is not possible to retrieve information about the
1709
    /// signature/value pairs in the shard.
1710
    fn peel_by_sig_vals_low_mem<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
10✔
1711
        &self,
1712
        shard_index: usize,
1713
        shard: Arc<Vec<SigVal<S, V>>>,
1714
        data: ShardData<'_, D>,
1715
        get_val: &G,
1716
        pl: &mut impl ProgressLog,
1717
    ) -> Result<(), ()>
1718
    where
1719
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
1720
    {
1721
        let shard_edge = &self.shard_edge;
20✔
1722
        let num_vertices = shard_edge.num_vertices();
30✔
1723
        let num_shards = shard_edge.num_shards();
30✔
1724
        let shard_len = shard.len();
30✔
1725

1726
        pl.start(format!(
30✔
1727
            "Generating graph for shard {}/{}...",
×
1728
            shard_index + 1,
10✔
1729
            num_shards,
×
1730
        ));
1731

1732
        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
30✔
1733
        for &sig_val in shard.iter() {
10,000,010✔
1734
            let local_sig = shard_edge.local_sig(sig_val.sig);
40,000,000✔
1735
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
130,000,000✔
1736
                xor_graph.add(
90,000,000✔
1737
                    v,
60,000,000✔
1738
                    SigVal {
30,000,000✔
1739
                        sig: local_sig,
30,000,000✔
1740
                        val: sig_val.val,
30,000,000✔
1741
                    },
1742
                    side,
30,000,000✔
1743
                );
1744
            }
1745
        }
1746
        pl.done_with_count(shard.len());
40✔
1747

1748
        // We are using a consuming iterator over the shard store, so this
1749
        // drop will free the memory used by the signatures
1750
        drop(shard);
20✔
1751

1752
        assert!(
10✔
1753
            !xor_graph.overflow,
10✔
1754
            "Degree overflow for shard {}/{}",
×
1755
            shard_index + 1,
×
1756
            num_shards
×
1757
        );
1758

1759
        if self.failed.load(Ordering::Relaxed) {
30✔
1760
            return Err(());
×
1761
        }
1762

1763
        let mut visit_stack = DoubleStack::<E::Vertex>::new(num_vertices);
30✔
1764

1765
        pl.start(format!(
30✔
1766
            "Peeling graph for shard {}/{} by signatures (low-mem)...",
×
1767
            shard_index + 1,
10✔
1768
            num_shards
×
1769
        ));
1770

1771
        // Preload all vertices of degree one in the visit stack
1772
        for (v, degree) in xor_graph.degrees().enumerate() {
22,609,950✔
1773
            if degree == 1 {
13,413,566✔
1774
                visit_stack.push_lower(v.cast());
6,325,818✔
1775
            }
1776
        }
1777

1778
        while let Some(v) = visit_stack.pop_lower() {
20,935,042✔
1779
            let v: usize = v.upcast();
41,870,064✔
1780
            if xor_graph.degree(v) == 0 {
20,935,032✔
1781
                continue;
467,516✔
1782
            }
1783
            let (sig_val, side) = xor_graph.edge_and_side(v);
40,000,000✔
1784
            xor_graph.zero(v);
30,000,000✔
1785
            visit_stack.push_upper(v.cast());
40,000,000✔
1786

1787
            let e = self.shard_edge.local_edge(sig_val.sig);
40,000,000✔
1788
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push_lower);
10,000,000✔
1789
        }
1790

1791
        pl.done();
20✔
1792

1793
        if shard_len != visit_stack.upper_len() {
20✔
1794
            pl.info(format_args!(
×
1795
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
×
1796
                shard_index + 1,
×
1797
                num_shards,
×
1798
                visit_stack.upper_len(),
×
1799
                shard_len
×
1800
            ));
1801
            return Err(());
×
1802
        }
1803

1804
        self.assign(
20✔
1805
            shard_index,
10✔
1806
            data,
10✔
1807
            visit_stack.iter_upper().map(|&v| {
10,000,030✔
1808
                let (sig_val, side) = xor_graph.edge_and_side(v.upcast());
50,000,000✔
1809
                ((sig_val.sig, get_val(shard_edge, sig_val)), side as u8)
30,000,000✔
1810
            }),
1811
            pl,
10✔
1812
        );
1813

1814
        Ok(())
10✔
1815
    }
1816

1817
    /// Solves a shard of given index possibly using [lazy Gaussian
1818
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517), and stores the
1819
    /// solution in the given data.
1820
    ///
1821
    /// As a first try, the shard is [peeled by index](VBuilder::peel_by_index).
1822
    /// If the peeling is [partial](PeelResult::Partial), lazy Gaussian
1823
    /// elimination is used to solve the remaining edges.
1824
    ///
1825
    /// This method will scan the double stack, without emptying it, to check
1826
    /// which edges have been peeled. The information will be then passed to
1827
    /// [`VBuilder::assign`] to complete the assignment of values.
1828
    fn lge_shard<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
165✔
1829
        &self,
1830
        shard_index: usize,
1831
        shard: Arc<Vec<SigVal<S, V>>>,
1832
        data: ShardData<'_, D>,
1833
        get_val: &G,
1834
        pl: &mut impl ProgressLog,
1835
    ) -> Result<(), ()> {
1836
        let shard_edge = &self.shard_edge;
330✔
1837
        // Let's try to peel first
1838
        match self.peel_by_index(shard_index, shard, data, get_val, pl) {
1,155✔
1839
            Err(()) => Err(()),
×
1840
            Ok(PeelResult::Complete()) => Ok(()),
10✔
1841
            Ok(PeelResult::Partial {
×
1842
                shard_index,
155✔
1843
                shard,
155✔
1844
                mut data,
155✔
1845
                double_stack,
155✔
1846
                sides_stack,
155✔
1847
                _marker: PhantomData,
×
1848
            }) => {
×
1849
                pl.info(format_args!("Switching to lazy Gaussian elimination..."));
465✔
1850
                // Likely result--we have solve the rest
1851
                pl.start(format!(
465✔
1852
                    "Generating system for shard {}/{}...",
×
1853
                    shard_index + 1,
155✔
1854
                    shard_edge.num_shards()
310✔
1855
                ));
1856

1857
                let num_vertices = shard_edge.num_vertices();
465✔
1858
                let mut peeled_edges = BitVec::new(shard.len());
620✔
1859
                let mut used_vars = BitVec::new(num_vertices);
465✔
1860
                for &edge in double_stack.iter_upper() {
883,430✔
1861
                    peeled_edges.set(edge.upcast(), true);
1,324,680✔
1862
                }
1863

1864
                // Create data for a system using non-peeled edges
1865
                //
1866
                // SAFETY: there is no undefined behavior here, but the
1867
                // raw construction methods we use assume that the
1868
                // equations are sorted, that the variables are not repeated,
1869
                // and the variables are in the range [0..num_vertices).
1870
                let mut system = unsafe {
1871
                    crate::utils::mod2_sys::Modulo2System::from_parts(
1872
                        num_vertices,
155✔
1873
                        shard
155✔
1874
                            .iter()
155✔
1875
                            .enumerate()
155✔
1876
                            .filter(|(edge_index, _)| !peeled_edges[*edge_index])
1,080,755✔
1877
                            .map(|(_edge_index, sig_val)| {
639,195✔
1878
                                let local_sig = shard_edge.local_sig(sig_val.sig);
2,556,160✔
1879
                                let mut eq: Vec<_> = shard_edge
2,556,160✔
1880
                                    .local_edge(local_sig)
639,040✔
1881
                                    .iter()
639,040✔
1882
                                    .map(|&x| {
2,556,160✔
1883
                                        used_vars.set(x, true);
5,751,360✔
1884
                                        x as u32
1,917,120✔
1885
                                    })
1886
                                    .collect();
639,040✔
1887
                                eq.sort_unstable();
639,040✔
1888
                                crate::utils::mod2_sys::Modulo2Equation::from_parts(
639,040✔
1889
                                    eq,
639,040✔
1890
                                    get_val(
1,278,080✔
1891
                                        shard_edge,
1,278,080✔
1892
                                        SigVal {
639,040✔
1893
                                            sig: local_sig,
639,040✔
1894
                                            val: sig_val.val,
639,040✔
1895
                                        },
1896
                                    ),
1897
                                )
1898
                            })
1899
                            .collect(),
155✔
1900
                    )
1901
                };
1902

1903
                if self.failed.load(Ordering::Relaxed) {
465✔
1904
                    return Err(());
×
1905
                }
1906

1907
                pl.start("Solving system...");
310✔
1908
                let result = system.lazy_gaussian_elimination().map_err(|_| ())?;
620✔
1909
                pl.done_with_count(system.num_equations());
100✔
1910

1911
                for (v, &value) in result.iter().enumerate().filter(|(v, _)| used_vars[*v]) {
2,990,880✔
1912
                    data.set_value(v, value);
1,850,325✔
1913
                }
1914

1915
                self.assign(
50✔
1916
                    shard_index,
25✔
1917
                    data,
25✔
1918
                    double_stack
25✔
1919
                        .iter_upper()
25✔
1920
                        .map(|&edge_index| {
411,615✔
1921
                            let sig_val = &shard[edge_index.upcast()];
1,234,770✔
1922
                            let local_sig = shard_edge.local_sig(sig_val.sig);
1,646,360✔
1923
                            (
1924
                                local_sig,
823,180✔
1925
                                get_val(
823,180✔
1926
                                    shard_edge,
823,180✔
1927
                                    SigVal {
411,590✔
1928
                                        sig: local_sig,
411,590✔
1929
                                        val: sig_val.val,
411,590✔
1930
                                    },
1931
                                ),
1932
                            )
1933
                        })
1934
                        .zip(sides_stack.into_iter().rev()),
100✔
1935
                    pl,
25✔
1936
                );
1937
                Ok(())
25✔
1938
            }
1939
        }
1940
    }
1941

1942
    /// Perform assignment of values based on peeling data.
1943
    ///
1944
    /// This method might be called after a successful peeling procedure, or
1945
    /// after a linear solver has been used to solve the remaining edges.
1946
    ///
1947
    /// `sig_vals_sides` is an iterator returning pairs of signature/value pairs
1948
    /// and sides in reverse peeling order.
1949
    fn assign(
57✔
1950
        &self,
1951
        shard_index: usize,
1952
        mut data: ShardData<'_, D>,
1953
        sigs_vals_sides: impl Iterator<Item = ((E::LocalSig, W), u8)>,
1954
        pl: &mut impl ProgressLog,
1955
    ) where
1956
        for<'a> ShardData<'a, D>: SliceByValueMut<Value = W>,
1957
    {
1958
        if self.failed.load(Ordering::Relaxed) {
171✔
1959
            return;
×
1960
        }
1961

1962
        pl.start(format!(
171✔
1963
            "Assigning values for shard {}/{}...",
×
1964
            shard_index + 1,
57✔
1965
            self.shard_edge.num_shards()
114✔
1966
        ));
1967

1968
        for ((sig, val), side) in sigs_vals_sides {
67,235,127✔
1969
            let edge = self.shard_edge.local_edge(sig);
89,646,760✔
1970
            let side = side as usize;
44,823,380✔
1971
            unsafe {
1972
                let xor = match side {
44,823,380✔
1973
                    0 => data.get_value_unchecked(edge[1]) ^ data.get_value_unchecked(edge[2]),
19,668,345✔
1974
                    1 => data.get_value_unchecked(edge[0]) ^ data.get_value_unchecked(edge[2]),
34,052,275✔
1975
                    2 => data.get_value_unchecked(edge[0]) ^ data.get_value_unchecked(edge[1]),
58,337,830✔
1976
                    _ => core::hint::unreachable_unchecked(),
1977
                };
1978

1979
                data.set_value_unchecked(edge[side], val ^ xor);
89,646,760✔
1980
            }
1981
        }
1982
        pl.done();
114✔
1983
    }
1984
}
STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2026 Coveralls, Inc