vigna / sux-rs / build 17841719366 (push via github)

18 Sep 2025 09:21PM UTC coverage: 64.717% (-2.2%) from 66.912%

Commit (vigna): Completed delegations for custom iterators; fixed infinite recursion when delegating traits to MemCase

0 of 7 new or added lines in 3 files covered. (0.0%)

909 existing lines in 21 files now uncovered.

3674 of 5677 relevant lines covered (64.72%)

117564551.11 hits per line

Source file: /src/func/vbuilder.rs (58.7% covered)

/*
 * SPDX-FileCopyrightText: 2025 Sebastiano Vigna
 *
 * SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
 */

#![allow(clippy::type_complexity)]
#![allow(clippy::too_many_arguments)]

use crate::bits::*;
use crate::dict::VFilter;
use crate::func::{shard_edge::ShardEdge, *};
use crate::traits::bit_field_slice::{BitFieldSlice, BitFieldSliceMut, Word};
use crate::utils::*;
use common_traits::{
    CastableInto, DowncastableFrom, DowncastableInto, UnsignedInt, UpcastableFrom, UpcastableInto,
};
use derivative::Derivative;
use derive_setters::*;
use dsi_progress_logger::*;
use epserde::prelude::*;
use log::info;
use rand::SeedableRng;
use rand::rngs::SmallRng;
use rand::{Rng, RngCore};
use rayon::iter::ParallelIterator;
use rayon::slice::ParallelSlice;
use rdst::*;
use std::any::TypeId;
use std::borrow::{Borrow, Cow};
use std::hint::unreachable_unchecked;
use std::marker::PhantomData;
use std::mem::transmute;
use std::ops::{BitXor, BitXorAssign};
use std::slice::Iter;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Instant;
use thread_priority::ThreadPriority;

use super::shard_edge::FuseLge3Shards;

const LOG2_MAX_SHARDS: u32 = 16;

/// A builder for [`VFunc`] and [`VFilter`].
///
/// Keys must implement the [`ToSig`] trait, which provides a method to compute
/// a signature of the key.
///
/// There are two construction modes: in core memory (default) and
/// [offline](VBuilder::offline); both use a [`SigStore`]. In the first case,
/// space will be allocated in core memory for signatures and associated values
/// for all keys; in the second case, such information will be stored in a
/// number of on-disk buckets.
///
/// There are several setters: for example, you can [set the maximum number of
/// threads](VBuilder::max_num_threads).
///
/// Once signatures have been computed, each parallel thread will process a
/// shard of the signature/value pairs. For very large key sets shards will be
/// significantly smaller than the number of keys, so the memory usage, in
/// particular in offline mode, can be significantly reduced. Note that using
/// too many threads might actually be harmful due to memory contention.
///
/// The generic parameters are explained in the [`VFunc`] documentation. You
/// have to choose the type of the output values and the backend. The remaining
/// parameters have default values that are the same as those of
/// [`VFunc`]/[`VFilter`], and some elaboration about them can be found in their
/// documentation.
///
/// All construction methods require passing one or two [`RewindableIoLender`]s
/// (keys and possibly values), and the construction might fail and keys might
/// be scanned again. The structures in the [`lenders`] module provide easy ways
/// to build such lenders, even starting from compressed files of UTF-8 strings.
/// The type of the keys of the resulting filter or function will be the type of
/// the elements of the [`RewindableIoLender`].
///
/// # Examples
///
/// In this example, we build a function that maps each key to itself using a
/// boxed slice of `usize` as a backend (note that this is really wasteful). The
/// setter for the expected number of keys is used to optimize the construction.
/// We use the [`FromIntoIterator`] adapter to turn a cloneable [`IntoIterator`]
/// into a [`RewindableIoLender`]. Note that you need the
/// [`dsi-progress-logger`](https://crates.io/crates/dsi-progress-logger) crate.
///
/// Type inference derives the input type (`usize`) from the type of the items
/// returned by the first [`RewindableIoLender`], and the output type (again,
/// `usize`, the first type parameter), from the backend type (`Box<[usize]>`,
/// the second type parameter):
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[usize]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Alternatively, we can use the bit-field vector backend, which will use
/// ⌈log₂(99)⌉ bits per backend element:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # use sux::bits::BitFieldVec;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, BitFieldVec<usize>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Since the numbers are small, we can also try to use a fixed-size output:
/// type inference takes care of making the second range `0..100` a range of
/// `u8`. Note that the type of keys is always `usize`, as it is still inferred
/// from the type of the items returned by the first [`RewindableIoLender`]:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # use sux::bits::BitFieldVec;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// We now try to build a fast 8-bit filter for the same key set, using a boxed
/// slice of `u8` as a backend (this is not wasteful, as the filter uses 8-bit
/// hashes):
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(func[i]);
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Since the keys are very few, we can switch to 64-bit signatures, and no
/// shards, which will yield faster queries:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use sux::func::shard_edge::FuseLge3NoShards;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>, [u64; 1], FuseLge3NoShards>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(func[i]);
/// }
/// #     Ok(())
/// # }
/// ```
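///
/// Finally, a minimal sketch of offline construction (an editorial addition,
/// assuming a writable temporary directory is available for the on-disk
/// buckets): the [`offline`](VBuilder::offline) setter moves signature/value
/// pairs to disk, while the rest of the call is unchanged:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[usize]>>::default()
///     .offline(true)
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
/// assert_eq!(99, func.get(&99));
/// #     Ok(())
/// # }
/// ```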
#[derive(Setters, Debug, Derivative)]
#[derivative(Default)]
#[setters(generate = false)]
pub struct VBuilder<
    W: ZeroCopy + Word,
    D: BitFieldSlice<W> + Send + Sync = Box<[W]>,
    S = [u64; 2],
    E: ShardEdge<S, 3> = FuseLge3Shards,
> {
    /// The expected number of keys.
    ///
    /// While this setter is optional, setting this value to a reasonable bound
    /// on the actual number of keys will significantly speed up the
    /// construction.
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "None"))]
    expected_num_keys: Option<usize>,

    /// The maximum number of parallel threads to use. The default is 8.
    #[setters(generate = true)]
    #[derivative(Default(value = "8"))]
    max_num_threads: usize,

    /// Use disk-based buckets to reduce core memory usage at construction time.
    #[setters(generate = true)]
    offline: bool,

    /// Check for duplicated signatures. This is not necessary in general,
    /// but if you suspect you might be feeding duplicate keys, you can
    /// enable this check.
    #[setters(generate = true)]
    check_dups: bool,

    /// Always use the low-memory peel-by-signature algorithm (true) or the
    /// high-memory peel-by-index algorithm (false). The former is slightly
    /// slower, but it uses much less memory. Normally [`VBuilder`] uses
    /// high-mem and switches to low-mem if there are more
    /// than three threads and more than two shards.
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "None"))]
    low_mem: Option<bool>,

    /// The seed for the random number generator.
    #[setters(generate = true)]
    seed: u64,

    /// The base-2 logarithm of the number of buckets of the [`SigStore`]. The
    /// default is 8. This value is automatically overridden, even if set, if
    /// you provide an [expected number of keys](VBuilder::expected_num_keys).
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "8"))]
    log2_buckets: u32,

    /// The target relative space loss due to [ε-cost
    /// sharding](https://doi.org/10.4230/LIPIcs.ESA.2019.38).
    ///
    /// The default is 0.001. Setting a larger target, for example, 0.01, will
    /// increase the space overhead due to sharding, but will in general
    /// provide finer shards. This might not always happen, however, because
    /// the ε-cost bound is just one of the bounds concurring in limiting the
    /// amount of sharding for a specific [`ShardEdge`]. For example,
    /// increasing the target to 0.01 will provide very fine sharding using
    /// `Mwhc3Shards` shard/edge logic, activated by the `mwhc` feature, but
    /// will affect only slightly [`FuseLge3Shards`] or
    /// [`FuseLge3FullSigs`](crate::func::shard_edge::FuseLge3FullSigs).
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "0.001"))]
    eps: f64,

    /// The bit width of the maximum value.
    bit_width: usize,
    /// The edge generator.
    shard_edge: E,
    /// The number of keys.
    num_keys: usize,
    /// The ratio between the number of vertices and the number of edges
    /// (i.e., between the number of variables and the number of equations).
    c: f64,
    /// Whether we should use [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517).
    lge: bool,
    /// The number of threads to use.
    num_threads: usize,
    /// Fast-stop for failed attempts.
    failed: AtomicBool,
    #[doc(hidden)]
    _marker_v: PhantomData<(W, D, S)>,
}

/// Fatal build errors.
#[derive(thiserror::Error, Debug)]
pub enum BuildError {
    #[error("Duplicate key")]
    /// A duplicate key was detected.
    DuplicateKey,
    #[error("Duplicate local signatures: use full signatures")]
    /// A duplicate local signature was detected.
    DuplicateLocalSignatures,
    #[error("Value too large for specified bit size")]
    /// A value is too large for the specified bit size.
    ValueTooLarge,
}

#[derive(thiserror::Error, Debug)]
/// Transient error during the build, leading to
/// trying with a different seed.
pub enum SolveError {
    #[error("Duplicate signature")]
    /// A duplicate signature was detected.
    DuplicateSignature,
    #[error("Duplicate local signature")]
    /// A duplicate local signature was detected.
    DuplicateLocalSignature,
    #[error("Max shard too big")]
    /// The maximum shard is too big relative to the average shard.
    MaxShardTooBig,
    #[error("Unsolvable shard")]
    /// A shard cannot be solved.
    UnsolvableShard,
}

/// The result of a peeling procedure.
enum PeelResult<
    'a,
    W: ZeroCopy + Word + Send + Sync,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync + 'a,
    S: Sig + ZeroCopy + Send + Sync,
    E: ShardEdge<S, 3>,
    V: ZeroCopy,
> {
    Complete(),
    Partial {
        /// The shard index.
        shard_index: usize,
        /// The shard.
        shard: Arc<Vec<SigVal<S, V>>>,
        /// The data.
        data: ShardData<'a, W, D>,
        /// The double stack whose upper stack contains the peeled edges
        /// (possibly represented by the vertex from which they have been
        /// peeled).
        double_stack: DoubleStack<E::Vertex>,
        /// The sides stack.
        sides_stack: Vec<u8>,
    },
}

/// A graph represented compactly.
///
/// Each (*k*-hyper)edge is a set of *k* vertices (by construction fuse graphs
/// do not have degenerate edges), but we represent it internally as a vector.
/// We call *side* the position of a vertex in the edge.
///
/// For each vertex we store information about the edges incident to the vertex
/// and the sides of the vertex in such edges. While technically not necessary
/// to perform peeling, the knowledge of the sides speeds up the peeling visit
/// by reducing the number of tests that are necessary to update the degrees
/// once the edge is peeled (see the `peel_by_*` methods). For the same reason
/// it also speeds up assignment.
///
/// Depending on the peeling method (by signature or by index), the graph will
/// store edge indices or signature/value pairs (the generic parameter `X`).
///
/// Edge information is packed together using Djamal's XOR trick (see
/// [“Cache-Oblivious Peeling of Random
/// Hypergraphs”](https://doi.org/10.1109/DCC.2014.48)): since during the
/// peeling visit we need to know the content of the list only when a single
/// edge index is present, we can XOR together all the edge information.
///
/// We use a single byte to store the degree (six upper bits) and the XOR of the
/// sides (lower two bits). The degree can be stored with a small number of bits
/// because the graph is random, so the maximum degree is *O*(log log *n*).
/// In any case, the Boolean field `overflow` will become `true` in case of
/// overflow.
///
/// When we peel an edge, we just [zero the degree](Self::zero), leaving the
/// edge information and the side in place for further processing later.
struct XorGraph<X: Copy + Default + BitXor + BitXorAssign> {
    edges: Box<[X]>,
    degrees_sides: Box<[u8]>,
    overflow: bool,
}

impl<X: BitXor + BitXorAssign + Default + Copy> XorGraph<X> {
    pub fn new(n: usize) -> XorGraph<X> {
        XorGraph {
            edges: vec![X::default(); n].into(),
            degrees_sides: vec![0; n].into(),
            overflow: false,
        }
    }

    #[inline(always)]
    pub fn add(&mut self, v: usize, x: X, side: usize) {
        debug_assert!(side < 3);
        let (degree_size, overflow) = self.degrees_sides[v].overflowing_add(4);
        self.degrees_sides[v] = degree_size;
        self.overflow |= overflow;
        self.degrees_sides[v] ^= side as u8;
        self.edges[v] ^= x;
    }

    #[inline(always)]
    pub fn remove(&mut self, v: usize, x: X, side: usize) {
        debug_assert!(side < 3);
        self.degrees_sides[v] -= 4;
        self.degrees_sides[v] ^= side as u8;
        self.edges[v] ^= x;
    }

    #[inline(always)]
    pub fn zero(&mut self, v: usize) {
        self.degrees_sides[v] &= 0b11;
    }

    #[inline(always)]
    pub fn edge_and_side(&self, v: usize) -> (X, usize) {
        debug_assert!(self.degree(v) < 2);
        (self.edges[v], (self.degrees_sides[v] & 0b11) as _)
    }

    #[inline(always)]
    pub fn degree(&self, v: usize) -> u8 {
        self.degrees_sides[v] >> 2
    }

    pub fn degrees(
        &self,
    ) -> std::iter::Map<std::iter::Copied<std::slice::Iter<'_, u8>>, fn(u8) -> u8> {
        self.degrees_sides.iter().copied().map(|d| d >> 2)
    }
}

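// An editorial sketch (not in the original module) illustrating the XOR trick
// and the degree/side packing described above through the methods of
// `XorGraph`: once all but one incident edge have been removed, the XORed
// fields yield the remaining edge and its side directly.
#[cfg(test)]
#[test]
fn xor_graph_sketch() {
    // One vertex appearing in three edges (indices 7, 9, 12) with sides 0, 1, 2.
    let mut g = XorGraph::<usize>::new(1);
    g.add(0, 7, 0);
    g.add(0, 9, 1);
    g.add(0, 12, 2);
    assert_eq!(g.degree(0), 3);
    // Peel two of the edges: degree and side XOR are updated in a single byte.
    g.remove(0, 7, 0);
    g.remove(0, 9, 1);
    assert_eq!(g.degree(0), 1);
    // The unique remaining incident edge can be read off the XORed fields.
    assert_eq!(g.edge_and_side(0), (12, 2));
}
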
/// A preallocated stack implementation that avoids the expensive (even if
/// rarely taken) branch of the `Vec` implementation in which memory is
/// reallocated. Note that using [`Vec::with_capacity`] is not enough, because
/// for the CPU the branch is still there.
struct FastStack<X: Copy + Default> {
    stack: Vec<X>,
    top: usize,
}

impl<X: Copy + Default> FastStack<X> {
    pub fn new(n: usize) -> FastStack<X> {
        FastStack {
            stack: vec![X::default(); n],
            top: 0,
        }
    }

    pub fn push(&mut self, x: X) {
        debug_assert!(self.top < self.stack.len());
        self.stack[self.top] = x;
        self.top += 1;
    }

    pub fn len(&self) -> usize {
        self.top
    }

    pub fn iter(&self) -> std::slice::Iter<'_, X> {
        self.stack[..self.top].iter()
    }
}

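// An editorial sketch (assumed usage, not in the original module): the
// backing vector is fully allocated up front, so `push` performs only a store
// and an increment; in debug builds the bound is checked by the assertion
// above.
#[cfg(test)]
#[test]
fn fast_stack_sketch() {
    let mut s = FastStack::<u32>::new(3);
    s.push(1);
    s.push(2);
    assert_eq!(s.len(), 2);
    assert_eq!(s.iter().copied().collect::<Vec<_>>(), vec![1, 2]);
}
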
/// Two stacks in the same vector.
///
/// This struct implements a pair of stacks sharing the same memory. The lower
/// stack grows from the beginning of the vector, the upper stack grows from the
/// end of the vector. Since we use the lower stack for the visit and the upper
/// stack for peeled edges (possibly represented by the vertex from which they
/// have been peeled), the sum of the lengths of the two stacks cannot exceed
/// the length of the vector.
#[derive(Debug)]
struct DoubleStack<V> {
    stack: Vec<V>,
    lower: usize,
    upper: usize,
}

impl<V: Default + Copy> DoubleStack<V> {
    fn new(n: usize) -> DoubleStack<V> {
        DoubleStack {
            stack: vec![V::default(); n],
            lower: 0,
            upper: n,
        }
    }
}

impl<V: Copy> DoubleStack<V> {
    #[inline(always)]
    fn push_lower(&mut self, v: V) {
        debug_assert!(self.lower < self.upper);
        self.stack[self.lower] = v;
        self.lower += 1;
    }

    #[inline(always)]
    fn push_upper(&mut self, v: V) {
        debug_assert!(self.lower < self.upper);
        self.upper -= 1;
        self.stack[self.upper] = v;
    }

    #[inline(always)]
    fn pop_lower(&mut self) -> Option<V> {
        if self.lower == 0 {
            None
        } else {
            self.lower -= 1;
            Some(self.stack[self.lower])
        }
    }

    fn upper_len(&self) -> usize {
        self.stack.len() - self.upper
    }

    fn iter_upper(&self) -> Iter<'_, V> {
        self.stack[self.upper..].iter()
    }
}

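// An editorial sketch (assumed usage, not in the original module) of how the
// two stacks share one allocation: visit work goes on the lower stack, peeled
// edges on the upper one, and the two fronts can never cross.
#[cfg(test)]
#[test]
fn double_stack_sketch() {
    let mut s = DoubleStack::<usize>::new(4);
    s.push_lower(10);
    s.push_upper(20);
    s.push_upper(21);
    assert_eq!(s.pop_lower(), Some(10));
    assert_eq!(s.pop_lower(), None);
    assert_eq!(s.upper_len(), 2);
    // The upper stack is iterated from the most recently pushed element.
    assert_eq!(s.iter_upper().copied().collect::<Vec<_>>(), vec![21, 20]);
}
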
/// An iterator over segments of data associated with each shard.
type ShardDataIter<'a, W, D> = <D as BitFieldSliceMut<W>>::ChunksMut<'a>;
/// A segment of data associated with a specific shard.
type ShardData<'a, W, D> = <ShardDataIter<'a, W, D> as Iterator>::Item;

/// Builds a new function using a `Box<[W]>` to store values.
///
/// Since values are stored in a boxed slice, access is particularly fast, but
/// the bit width of the output of the function will be exactly the bit width of
/// the unsigned type `W`.
impl<W: ZeroCopy + Word, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, Box<[W]>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, W>: RadixKey,
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
{
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        values: impl RewindableIoLender<W>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, Box<[W]>, S, E>>
    where
        for<'a> ShardDataIter<'a, W, Box<[W]>>: Send,
        for<'a> ShardData<'a, W, Box<[W]>>: Send,
    {
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();
        self.build_loop(keys, values, None, &get_val, new_data, pl)
    }
}

/// Builds a new filter using a `Box<[W]>` to store values.
///
/// Since values are stored in a boxed slice, access is particularly fast, but
/// the number of bits of the hashes will be exactly the bit width of the
/// unsigned type `W`.
impl<W: ZeroCopy + Word + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
    VBuilder<W, Box<[W]>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, EmptyVal>: RadixKey,
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
{
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, Box<[W]>, S, E>>>
    where
        for<'a> ShardDataIter<'a, W, Box<[W]>>: Send,
        for<'a> ShardData<'a, W, Box<[W]>>: Send,
    {
        let filter_mask = W::MAX;
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
            mix64(shard_edge.edge_hash(sig_val.sig)).downcast()
        };
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();

        Ok(VFilter {
            func: self.build_loop(
                keys,
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
                Some(W::BITS),
                &get_val,
                new_data,
                pl,
            )?,
            filter_mask,
            hash_bits: W::BITS as u32,
        })
    }
}

/// Builds a new function using a [bit-field vector](BitFieldVec) on words of
/// the unsigned type `W` to store values.
///
/// Since values are stored in a bit-field vector, access will be slower than
/// when using a boxed slice, but the bit width of stored values will be the
/// minimum necessary. In any case, it must be at most the bit width of `W`.
///
/// Typically `W` will be `usize` or `u64`.
impl<W: ZeroCopy + Word, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, BitFieldVec<W>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, W>: RadixKey,
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
{
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        values: impl RewindableIoLender<W>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, BitFieldVec<W>, S, E>> {
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
        self.build_loop(keys, values, None, &get_val, new_data, pl)
    }
}

/// Builds a new filter using a [bit-field vector](BitFieldVec) on words of the
/// unsigned type `W` to store values.
///
/// Since values are stored in a bit-field vector, access will be slower than
/// when using a boxed slice, but the number of bits of the hashes can be set at
/// will. In any case, it must be at most the bit width of `W`.
///
/// Typically `W` will be `usize` or `u64`.
impl<W: ZeroCopy + Word + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
    VBuilder<W, BitFieldVec<W>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, EmptyVal>: RadixKey,
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
{
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        filter_bits: usize,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, BitFieldVec<W>, S, E>>> {
        assert!(filter_bits > 0);
        assert!(filter_bits <= W::BITS);
        let filter_mask = W::MAX >> (W::BITS - filter_bits);
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
            <W as DowncastableFrom<u64>>::downcast_from(mix64(shard_edge.edge_hash(sig_val.sig)))
                & filter_mask
        };
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);

        Ok(VFilter {
            func: self.build_loop(
                keys,
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
                Some(filter_bits),
                &get_val,
                new_data,
                pl,
            )?,
            filter_mask,
            hash_bits: filter_bits as _,
        })
    }
}

impl<
    W: ZeroCopy + Word,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
    S: Sig + Send + Sync,
    E: ShardEdge<S, 3>,
> VBuilder<W, D, S, E>
{
    /// Builds and returns a new function with given keys and values.
    ///
    /// This method can build functions based both on vectors and on bit-field
    /// vectors. The necessary abstraction is provided by the
    /// `new_data(bit_width, len)` function, which is called to create the data
    /// structure to store the values.
    ///
    /// When `V` is [`EmptyVal`], this method builds a function supporting a
    /// filter by mapping each key to a mix of its local signature. The
    /// necessary abstraction is provided by the `get_val` function, which is
    /// called to extract the value from the signature/value pair; in the case
    /// of functions it returns the value stored in the signature/value pair,
    /// and in the case of filters it returns the hash associated with the
    /// signature.
    fn build_loop<
        T: ?Sized + ToSig<S> + std::fmt::Debug,
        B: ?Sized + Borrow<T>,
        V: ZeroCopy + Default + Send + Sync + Ord + UpcastableInto<u128>,
    >(
        &mut self,
        mut keys: impl RewindableIoLender<B>,
        mut values: impl RewindableIoLender<V>,
        bit_width: Option<usize>,
        get_val: &(impl Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync),
        new_data: fn(usize, usize) -> D,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
    where
        u128: UpcastableFrom<W>,
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        for<'a> ShardDataIter<'a, W, D>: Send,
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
    {
        let mut dup_count = 0;
        let mut local_dup_count = 0;
        let mut prng = SmallRng::seed_from_u64(self.seed);

        if let Some(expected_num_keys) = self.expected_num_keys {
            self.shard_edge.set_up_shards(expected_num_keys, self.eps);
            self.log2_buckets = self.shard_edge.shard_high_bits();
        }

        pl.info(format_args!("Using 2^{} buckets", self.log2_buckets));

        // Loop until success or duplicate detection
        loop {
            let seed = prng.random();

            match if self.offline {
                self.try_seed(
                    seed,
                    sig_store::new_offline::<S, V>(
                        self.log2_buckets,
                        LOG2_MAX_SHARDS,
                        self.expected_num_keys,
                    )?,
                    &mut keys,
                    &mut values,
                    bit_width,
                    get_val,
                    new_data,
                    pl,
                )
            } else {
                self.try_seed(
                    seed,
                    sig_store::new_online::<S, V>(
                        self.log2_buckets,
                        LOG2_MAX_SHARDS,
                        self.expected_num_keys,
                    )?,
                    &mut keys,
                    &mut values,
                    bit_width,
                    get_val,
                    new_data,
                    pl,
                )
            } {
                Ok(func) => {
                    return Ok(func);
                }
                Err(error) => {
                    match error.downcast::<SolveError>() {
                        Ok(vfunc_error) => match vfunc_error {
                            // Let's try another seed, but just a few times--most likely,
                            // duplicate keys
                            SolveError::DuplicateSignature => {
                                if dup_count >= 3 {
                                    pl.error(format_args!("Duplicate keys (duplicate 128-bit signatures with four different seeds)"));
                                    return Err(BuildError::DuplicateKey.into());
                                }
                                pl.warn(format_args!(
                                    "Duplicate 128-bit signature, trying again with a different seed..."
                                ));
                                dup_count += 1;
                            }
                            // Let's try another seed, but just a few times
                            SolveError::DuplicateLocalSignature => {
                                if local_dup_count >= 2 {
                                    pl.error(format_args!("Duplicate local signatures: use full signatures (duplicate local signatures with three different seeds)"));
                                    return Err(BuildError::DuplicateLocalSignatures.into());
                                }
                                pl.warn(format_args!(
                                    "Duplicate local signature, trying again with a different seed..."
                                ));
                                local_dup_count += 1;
                            }
                            SolveError::MaxShardTooBig => {
                                pl.warn(format_args!(
                                    "The maximum shard is too big, trying again with a different seed..."
                                ));
                            }
                            // Let's just try another seed
                            SolveError::UnsolvableShard => {
                                pl.warn(format_args!(
                                    "Unsolvable shard, trying again with a different seed..."
                                ));
                            }
                        },
                        Err(error) => return Err(error),
                    }
                }
            }

            values = values.rewind()?;
            keys = keys.rewind()?;
        }
    }
}

impl<
    W: ZeroCopy + Word,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
    S: Sig + Send + Sync,
    E: ShardEdge<S, 3>,
> VBuilder<W, D, S, E>
{
    /// Tries to build a function using a specific seed. See the comments in the
    /// [`VBuilder::build_loop`] method for more details.
    ///
    /// This method reads the input, sets up the shards, allocates the backend
    /// using `new_data`, and passes the backend and an iterator on shards to
    /// the [`VBuilder::try_build_from_shard_iter`] method.
    fn try_seed<
        T: ?Sized + ToSig<S> + std::fmt::Debug,
        B: ?Sized + Borrow<T>,
        V: ZeroCopy + Default + Send + Sync + Ord + UpcastableInto<u128>,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &mut self,
        seed: u64,
        mut sig_store: impl SigStore<S, V>,
        keys: &mut impl RewindableIoLender<B>,
        values: &mut impl RewindableIoLender<V>,
        bit_width: Option<usize>,
        get_val: &G,
        new_data: fn(usize, usize) -> D,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
    where
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        for<'a> ShardDataIter<'a, W, D>: Send,
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
    {
        let shard_edge = &mut self.shard_edge;

        pl.expected_updates(self.expected_num_keys);
        pl.item_name("key");
        pl.start(format!(
            "Computing and storing {}-bit signatures in {} using seed 0x{:016x}...",
            std::mem::size_of::<S>() * 8,
            sig_store
                .temp_dir()
                .map(|d| d.path().to_string_lossy())
                .unwrap_or(Cow::Borrowed("memory")),
            seed
        ));

        // This will be the maximum value for functions and EmptyVal for filters
        let mut maybe_max_value = V::default();
        let start = Instant::now();

        while let Some(result) = keys.next() {
            match result {
                Ok(key) => {
                    pl.light_update();
                    // This might be an actual value, if we are building a
                    // function, or EmptyVal, if we are building a filter.
                    let &maybe_val = values.next().expect("Not enough values")?;
                    let sig_val = SigVal {
                        sig: T::to_sig(key.borrow(), seed),
                        val: maybe_val,
                    };
                    maybe_max_value = Ord::max(maybe_max_value, maybe_val);
                    sig_store.try_push(sig_val)?;
                }
                Err(e) => return Err(e.into()),
            }
        }
        pl.done();

        self.num_keys = sig_store.len();
        self.bit_width = if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
            bit_width.expect("Bit width must be set for filters")
        } else {
            let len_width = maybe_max_value.upcast().len() as usize;
            if let Some(bit_width) = bit_width {
                if len_width > bit_width {
                    return Err(BuildError::ValueTooLarge.into());
                }
                bit_width
            } else {
                len_width
            }
        };

        info!(
            "Computation of signatures from inputs completed in {:.3} seconds ({} keys, {:.3} ns/key)",
            start.elapsed().as_secs_f64(),
            self.num_keys,
            start.elapsed().as_nanos() as f64 / self.num_keys as f64
        );

        let start = Instant::now();

        let shard_store = sig_store.into_shard_store(shard_edge.shard_high_bits())?;
        let max_shard = shard_store.shard_sizes().iter().copied().max().unwrap_or(0);
        let filter = TypeId::of::<V>() == TypeId::of::<EmptyVal>();

        shard_edge.set_up_shards(self.num_keys, self.eps);
        (self.c, self.lge) = shard_edge.set_up_graphs(self.num_keys, max_shard);

        if filter {
            pl.info(format_args!(
                "Number of keys: {} Bit width: {}",
                self.num_keys, self.bit_width,
            ));
        } else {
            pl.info(format_args!(
                "Number of keys: {} Max value: {} Bit width: {}",
                self.num_keys,
                maybe_max_value.upcast(),
                self.bit_width,
            ));
        }

        if shard_edge.shard_high_bits() != 0 {
            pl.info(format_args!(
                "Max shard / average shard: {:.2}%",
                (100.0 * max_shard as f64)
                    / (self.num_keys as f64 / shard_edge.num_shards() as f64)
            ));
        }

        if max_shard as f64 > 1.01 * self.num_keys as f64 / shard_edge.num_shards() as f64 {
            // This might sometimes happen with small sharded graphs
            Err(SolveError::MaxShardTooBig.into())
        } else {
            let data = new_data(
                self.bit_width,
                shard_edge.num_vertices() * shard_edge.num_shards(),
            );
            self.try_build_from_shard_iter(seed, data, shard_store.into_iter(), get_val, pl)
                .inspect(|_| {
                    info!(
                        "Construction from signatures completed in {:.3} seconds ({} keys, {:.3} ns/key)",
                        start.elapsed().as_secs_f64(),
                        self.num_keys,
                        start.elapsed().as_nanos() as f64 / self.num_keys as f64
                    );
                })
                .map_err(Into::into)
        }
    }

    /// Builds and return a new function starting from an iterator on shards.
966
    ///
967
    /// See [`VBuilder::build_loop`] for more details on the parameters.
968
    fn try_build_from_shard_iter<
196✔
969
        T: ?Sized + ToSig<S>,
970
        I,
971
        P,
972
        V: ZeroCopy + Default + Send + Sync,
973
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
974
    >(
975
        &mut self,
976
        seed: u64,
977
        mut data: D,
978
        shard_iter: I,
979
        get_val: &G,
980
        pl: &mut P,
981
    ) -> Result<VFunc<T, W, D, S, E>, SolveError>
982
    where
983
        SigVal<S, V>: RadixKey,
984
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
985
        P: ProgressLog + Clone + Send + Sync,
986
        I: Iterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
987
        for<'a> ShardDataIter<'a, W, D>: Send,
988
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
989
    {
990
        let shard_edge = &self.shard_edge;
392✔
991
        self.num_threads = shard_edge.num_shards().min(self.max_num_threads);
588✔
992

993
        pl.info(format_args!("{}", self.shard_edge));
784✔
994
        pl.info(format_args!(
588✔
995
            "c: {}, Overhead: {:+.4}% Number of threads: {}",
196✔
UNCOV
996
            self.c,
×
997
            100. * ((shard_edge.num_vertices() * shard_edge.num_shards()) as f64
588✔
998
                / (self.num_keys as f64)
196✔
999
                - 1.),
196✔
UNCOV
1000
            self.num_threads
×
1001
        ));
1002

1003
        if self.lge {
196✔
1004
            pl.info(format_args!("Peeling towards lazy Gaussian elimination"));
696✔
1005
            self.par_solve(
348✔
1006
                shard_iter,
174✔
1007
                &mut data,
174✔
1008
                |this, shard_index, shard, data, pl| {
165✔
1009
                    this.lge_shard(shard_index, shard, data, get_val, pl)
1,155✔
1010
                },
1011
                &mut pl.concurrent(),
174✔
1012
                pl,
174✔
1013
            )?;
1014
        } else if self.low_mem == Some(true)
22✔
1015
            || self.low_mem.is_none() && self.num_threads > 3 && shard_edge.num_shards() > 2
14✔
1016
        {
1017
            // Less memory, but slower
1018
            self.par_solve(
10✔
UNCOV
1019
                shard_iter,
×
UNCOV
1020
                &mut data,
×
1021
                |this, shard_index, shard, data, pl| {
10✔
1022
                    this.peel_by_sig_vals_low_mem(shard_index, shard, data, get_val, pl)
70✔
1023
                },
1024
                &mut pl.concurrent(),
×
1025
                pl,
×
1026
            )?;
1027
        } else {
1028
            // More memory, but faster
1029
            self.par_solve(
12✔
1030
                shard_iter,
×
UNCOV
1031
                &mut data,
×
1032
                |this, shard_index, shard, data, pl| {
12✔
1033
                    this.peel_by_sig_vals_high_mem(shard_index, shard, data, get_val, pl)
84✔
1034
                },
1035
                &mut pl.concurrent(),
×
1036
                pl,
×
1037
            )?;
1038
        }
1039

1040
        pl.info(format_args!(
62✔
1041
            "Bits/keys: {} ({:+.4}%)",
×
UNCOV
1042
            data.len() as f64 * self.bit_width as f64 / self.num_keys as f64,
×
UNCOV
1043
            100.0 * (data.len() as f64 / self.num_keys as f64 - 1.),
×
1044
        ));
1045

1046
        Ok(VFunc {
×
1047
            seed,
×
1048
            shard_edge: self.shard_edge,
×
UNCOV
1049
            num_keys: self.num_keys,
×
UNCOV
1050
            data,
×
1051
            _marker_t: std::marker::PhantomData,
×
1052
            _marker_w: std::marker::PhantomData,
×
1053
            _marker_s: std::marker::PhantomData,
×
1054
        })
1055
    }
1056
}
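// Editorial note (added commentary, not in the original source): given an
// edge `$e` just peeled from side `$side`, this macro removes the edge from
// its two other endpoints; any endpoint whose degree was 2 before the removal
// is pushed on `$stack`, since it becomes peelable. Knowing the side of the
// peeled vertex avoids testing that endpoint at all.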
macro_rules! remove_edge {
    ($xor_graph: ident, $e: ident, $side: ident, $edge: ident, $stack: ident, $push:ident) => {
        match $side {
            0 => {
                if $xor_graph.degree($e[1]) == 2 {
                    $stack.$push($e[1].cast());
                }
                $xor_graph.remove($e[1], $edge, 1);
                if $xor_graph.degree($e[2]) == 2 {
                    $stack.$push($e[2].cast());
                }
                $xor_graph.remove($e[2], $edge, 2);
            }
            1 => {
                if $xor_graph.degree($e[0]) == 2 {
                    $stack.$push($e[0].cast());
                }
                $xor_graph.remove($e[0], $edge, 0);
                if $xor_graph.degree($e[2]) == 2 {
                    $stack.$push($e[2].cast());
                }
                $xor_graph.remove($e[2], $edge, 2);
            }
            2 => {
                if $xor_graph.degree($e[0]) == 2 {
                    $stack.$push($e[0].cast());
                }
                $xor_graph.remove($e[0], $edge, 0);
                if $xor_graph.degree($e[1]) == 2 {
                    $stack.$push($e[1].cast());
                }
                $xor_graph.remove($e[1], $edge, 1);
            }
            _ => unsafe { unreachable_unchecked() },
        }
    };
}

impl<
    W: ZeroCopy + Word + Send + Sync,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
    S: Sig + ZeroCopy + Send + Sync,
    E: ShardEdge<S, 3>,
> VBuilder<W, D, S, E>
{
    fn count_sort<V: ZeroCopy>(&self, data: &mut [SigVal<S, V>]) {
        let num_sort_keys = self.shard_edge.num_sort_keys();
        let mut count = vec![0; num_sort_keys];

        let mut copied = Box::new_uninit_slice(data.len());
        for (&sig_val, copy) in data.iter().zip(copied.iter_mut()) {
            count[self.shard_edge.sort_key(sig_val.sig)] += 1;
            copy.write(sig_val);
        }
        let copied = unsafe { copied.assume_init() };

        count.iter_mut().fold(0, |acc, c| {
            let old = *c;
            *c = acc;
            acc + old
        });

        for &sig_val in copied.iter() {
            let key = self.shard_edge.sort_key(sig_val.sig);
            data[count[key]] = sig_val;
            count[key] += 1;
        }
    }

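    // Editorial note (added commentary, not in the original source):
    // `count_sort` above is a stable counting sort on `ShardEdge::sort_key`.
    // The `fold` rewrites the per-key counts into an exclusive prefix sum, so
    // `count[key]` holds the first output position for `key`; the final pass
    // then scatters the copied signature/value pairs back into `data` in key
    // order.
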
1127
    /// After this number of keys, in the case of filters we remove duplicate
1128
    /// edges.
1129
    const MAX_NO_LOCAL_SIG_CHECK: usize = 1 << 33;
1130

1131
    /// Solves in parallel shards returned by an iterator, storing
1132
    /// the result in `data`.
1133
    ///
1134
    /// # Arguments
1135
    ///
1136
    /// * `shard_iter`: an iterator returning the shards to solve.
1137
    ///
1138
    /// * `data`: the storage for the solution values.
1139
    ///
1140
    /// * `solve_shard`: a method to solve a shard; it takes the shard index,
1141
    ///   the shard, shard-local data, and a progress logger. It may
1142
    ///   fail by returning an error.
1143
    ///
1144
    /// * `main_pl`: the progress logger for the overall computation.
1145
    ///
1146
    /// * `pl`: a progress logger that will be cloned to display the progress of
1147
    ///   a current shard.
1148
    ///
1149
    /// # Errors
1150
    ///
1151
    /// This method returns an error if one of the shards cannot be solved, or
1152
    /// if duplicates are detected.
1153
    fn par_solve<
196✔
1154
        'b,
1155
        V: ZeroCopy + Send + Sync,
1156
        I: IntoIterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
1157
        SS: Fn(&Self, usize, Arc<Vec<SigVal<S, V>>>, ShardData<'b, W, D>, &mut P) -> Result<(), ()>
1158
            + Send
1159
            + Sync
1160
            + Copy,
1161
        C: ConcurrentProgressLog + Send + Sync,
1162
        P: ProgressLog + Clone + Send + Sync,
1163
    >(
1164
        &self,
1165
        shard_iter: I,
1166
        data: &'b mut D,
1167
        solve_shard: SS,
1168
        main_pl: &mut C,
1169
        pl: &mut P,
1170
    ) -> Result<(), SolveError>
1171
    where
1172
        I::IntoIter: Send,
1173
        SigVal<S, V>: RadixKey,
1174
        for<'a> ShardDataIter<'a, W, D>: Send,
1175
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
1176
    {
1177
        main_pl
392✔
1178
            .item_name("shard")
1179
            .expected_updates(Some(self.shard_edge.num_shards()))
392✔
1180
            .display_memory(true)
1181
            .start("Solving shards...");
1182

1183
        self.failed.store(false, Ordering::Relaxed);
588✔
1184
        let num_shards = self.shard_edge.num_shards();
588✔
1185
        let buffer_size = self.num_threads.ilog2() as usize;
392✔
1186

1187
        let (err_send, err_recv) = crossbeam_channel::bounded::<_>(self.num_threads);
588✔
1188
        let (data_send, data_recv) = crossbeam_channel::bounded::<(
1189
            usize,
1190
            (Arc<Vec<SigVal<S, V>>>, ShardData<'_, W, D>),
1191
        )>(buffer_size);
196✔
1192

1193
        let result = std::thread::scope(|scope| {
588✔
1194
            scope.spawn(move || {
588✔
1195
                let _ = thread_priority::set_current_thread_priority(ThreadPriority::Max);
196✔
1196
                for val in shard_iter
397✔
1197
                    .into_iter()
196✔
1198
                    .zip(data.try_chunks_mut(self.shard_edge.num_vertices()).unwrap())
1,176✔
1199
                    .enumerate()
196✔
1200
                {
1201
                    if data_send.send(val).is_err() {
603✔
UNCOV
1202
                        break;
×
1203
                    }
1204
                }
1205

1206
                drop(data_send);
392✔
1207
            });
1208

1209
            for _thread_id in 0..self.num_threads {
397✔
UNCOV
1210
                let mut main_pl = main_pl.clone();
×
UNCOV
1211
                let mut pl = pl.clone();
×
UNCOV
1212
                let err_send = err_send.clone();
×
UNCOV
1213
                let data_recv = data_recv.clone();
×
1214
                scope.spawn(move || {
201✔
1215
                    loop {
×
1216
                        match data_recv.recv() {
258✔
1217
                            Err(_) => return,
57✔
1218
                            Ok((shard_index, (shard, mut data))) => {
201✔
UNCOV
1219
                                if shard.is_empty() {
×
1220
                                    return;
10✔
1221
                                }
1222

UNCOV
1223
                                main_pl.info(format_args!(
×
1224
                                    "Analyzing shard {}/{}...",
×
UNCOV
1225
                                    shard_index + 1,
×
UNCOV
1226
                                    num_shards
×
1227
                                ));
1228

1229
                                pl.start(format!(
×
1230
                                    "Sorting shard {}/{}...",
×
1231
                                    shard_index + 1,
×
UNCOV
1232
                                    num_shards
×
1233
                                ));
1234

1235
                                {
1236
                                    // SAFETY: only one thread may be accessing the
1237
                                    // shard, and we will be consuming it
UNCOV
1238
                                    let shard = unsafe {
×
UNCOV
1239
                                        &mut *(Arc::as_ptr(&shard) as *mut Vec<SigVal<S, V>>)
×
1240
                                    };
1241

UNCOV
1242
                                    if self.check_dups {
×
1243
                                        shard.radix_sort_builder().sort();
12✔
1244
                                        if shard.par_windows(2).any(|w| w[0].sig == w[1].sig) {
16✔
1245
                                            let _ = err_send.send(SolveError::DuplicateSignature);
                                            return;
                                        }
                                    }

                                    if TypeId::of::<E::LocalSig>() != TypeId::of::<S>()
                                        && self.num_keys > Self::MAX_NO_LOCAL_SIG_CHECK
                                    {
                                        // Check for duplicate local signatures

                                        // E::SortSig provides a transmutable
                                        // view of SigVal with an implementation
                                        // of RadixKey that is compatible with
                                        // the sort induced by the key returned
                                        // by ShardEdge::sort_key, and equality
                                        // that implies equality of local
                                        // signatures.

                                        // SAFETY: we drop this immediately after sorting.
                                        let shard = unsafe {
                                            transmute::<
                                                &mut Vec<SigVal<S, V>>,
                                                &mut Vec<E::SortSigVal<V>>,
                                            >(shard)
                                        };

                                        let builder = shard.radix_sort_builder();
                                        if self.max_num_threads == 1 {
                                            builder
                                                .with_single_threaded_tuner()
                                                .with_parallel(false)
                                        } else {
                                            builder
                                        }
                                        .sort();

                                        let shard_len = shard.len();
                                        shard.dedup();

                                        if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
                                            // Duplicate local signatures on
                                            // large filters can simply be
                                            // removed: they do not change the
                                            // semantics of the filter because
                                            // hashes are computed on
                                            // local signatures.
                                            pl.info(format_args!(
                                                "Removed signatures: {}",
                                                shard_len - shard.len()
                                            ));
                                        } else {
                                            // For functions, we have to try again
                                            if shard_len != shard.len() {
                                                let _ = err_send
                                                    .send(SolveError::DuplicateLocalSignature);
                                                return;
                                            }
                                        }
                                    } else if self.shard_edge.num_sort_keys() != 1 {
                                        // Sorting the signatures increases locality
                                        self.count_sort::<V>(shard);
                                    }
                                }

                                pl.done_with_count(shard.len());

                                main_pl.info(format_args!(
                                    "Solving shard {}/{}...",
                                    shard_index + 1,
                                    num_shards
                                ));

                                if self.failed.load(Ordering::Relaxed) {
                                    return;
                                }

                                if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
                                    // For filters, we fill the array with random data, otherwise
                                    // elements with signature 0 would have a significantly higher
                                    // probability of being false positives.
                                    //
                                    // SAFETY: We work around the fact that [usize] does not implement Fill
                                    Mwc192::seed_from_u64(self.seed).fill_bytes(unsafe {
                                        data.as_mut_slice().align_to_mut::<u8>().1
                                    });
                                }

                                if solve_shard(self, shard_index, shard, data, &mut pl).is_err() {
                                    let _ = err_send.send(SolveError::UnsolvableShard);
                                    return;
                                }

                                if self.failed.load(Ordering::Relaxed) {
                                    return;
                                }

                                main_pl.info(format_args!(
                                    "Completed shard {}/{}",
                                    shard_index + 1,
                                    num_shards
                                ));
                                main_pl.update_and_display();
                            }
                        }
                    }
                });
            }

            drop(err_send);
            drop(data_recv);

            if let Some(error) = err_recv.into_iter().next() {
                self.failed.store(true, Ordering::Relaxed);
                return Err(error);
            }

            Ok(())
        });

        main_pl.done();
        result
    }

    /// Peels a shard via edge indices.
    ///
    /// This peeler uses a [`SigVal`] per key (the shard), a
    /// [`ShardEdge::Vertex`] and a byte per vertex (for the [`XorGraph`]), a
    /// [`ShardEdge::Vertex`] per vertex (for the [`DoubleStack`]), and a final
    /// byte per vertex (for the stack of sides).
    ///
    /// This peeler uses more memory than
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) but
    /// less memory than
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem). It
    /// is fairly slow, as it has to go through a cache-unfriendly memory
    /// indirection every time it has to retrieve a [`SigVal`] from the shard,
    /// but it is the peeler of choice when [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) is required, as
    /// after a failed peel-by-sig-vals it is not possible to retrieve
    /// information about the signature/value pairs in the shard.
    ///
    /// In theory one could avoid the stack of sides by putting vertices,
    /// instead of edge indices, on the upper stack, and retrieving edge indices
    /// and sides from the [`XorGraph`], as
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) does,
    /// but this would be less cache friendly. This peeler is only used for
    /// very small instances, and since we are going to pass through lazy
    /// Gaussian elimination, some additional speed is a good idea.
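    ///
    /// As a minimal sketch of the peeling idea (hypothetical, simplified
    /// types: a plain 3-hypergraph given as vertex triples, not this method's
    /// actual [`XorGraph`]-based machinery), repeatedly removing an edge
    /// incident to a degree-one vertex yields an order in which values can
    /// later be assigned in reverse:
    ///
    /// ```ignore
    /// fn peel(num_vertices: usize, edges: &[[usize; 3]]) -> Option<Vec<usize>> {
    ///     let mut incident: Vec<Vec<usize>> = vec![Vec::new(); num_vertices];
    ///     for (e, verts) in edges.iter().enumerate() {
    ///         for &v in verts {
    ///             incident[v].push(e);
    ///         }
    ///     }
    ///     // Degree = number of unpeeled incident edges
    ///     let mut degree: Vec<usize> = incident.iter().map(Vec::len).collect();
    ///     let mut peeled = vec![false; edges.len()];
    ///     let mut stack: Vec<usize> =
    ///         (0..num_vertices).filter(|&v| degree[v] == 1).collect();
    ///     let mut order = Vec::new();
    ///     while let Some(v) = stack.pop() {
    ///         if degree[v] != 1 {
    ///             continue; // the edge around this vertex is already peeled
    ///         }
    ///         // The unique unpeeled edge incident to v
    ///         let e = *incident[v].iter().find(|&&e| !peeled[e]).unwrap();
    ///         peeled[e] = true;
    ///         order.push(e);
    ///         for &w in &edges[e] {
    ///             degree[w] -= 1;
    ///             if degree[w] == 1 {
    ///                 stack.push(w);
    ///             }
    ///         }
    ///     }
    ///     // Success iff every edge was peeled
    ///     (order.len() == edges.len()).then_some(order)
    /// }
    /// ```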
    fn peel_by_index<
        'a,
        V: ZeroCopy + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'a, W, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<PeelResult<'a, W, D, S, E, V>, ()> {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards
        ));

        let mut xor_graph = XorGraph::<E::Vertex>::new(num_vertices);
        for (edge_index, sig_val) in shard.iter().enumerate() {
            for (side, &v) in shard_edge
                .local_edge(shard_edge.local_sig(sig_val.sig))
                .iter()
                .enumerate()
            {
                xor_graph.add(v, edge_index.cast(), side);
            }
        }
        pl.done_with_count(shard.len());

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        // The lower stack contains vertices to be visited. The upper stack
        // contains peeled edges.
        let mut double_stack = DoubleStack::<E::Vertex>::new(num_vertices);
        let mut sides_stack = Vec::<u8>::new();

        pl.start(format!(
            "Peeling graph for shard {}/{} by edge indices...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                double_stack.push_lower(v.cast());
            }
        }

        while let Some(v) = double_stack.pop_lower() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            debug_assert!(xor_graph.degree(v) == 1);
            let (edge_index, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            double_stack.push_upper(edge_index);
            sides_stack.push(side as u8);

            let e = shard_edge.local_edge(shard_edge.local_sig(shard[edge_index.upcast()].sig));
            remove_edge!(xor_graph, e, side, edge_index, double_stack, push_lower);
        }

        pl.done();

        if shard.len() != double_stack.upper_len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                double_stack.upper_len(),
                shard.len(),
            ));
            return Ok(PeelResult::Partial {
                shard_index,
                shard,
                data,
                double_stack,
                sides_stack,
            });
        }

        self.assign(
            shard_index,
            data,
            double_stack
                .iter_upper()
                .map(|&edge_index| {
                    // Turn edge indices into local edge signatures
                    // and associated values
                    let sig_val = &shard[edge_index.upcast()];
                    let local_sig = shard_edge.local_sig(sig_val.sig);
                    (
                        local_sig,
                        get_val(
                            shard_edge,
                            SigVal {
                                sig: local_sig,
                                val: sig_val.val,
                            },
                        ),
                    )
                })
                .zip(sides_stack.into_iter().rev()),
            pl,
        );

        Ok(PeelResult::Complete())
    }

    /// Peels a shard via signature/value pairs using a stack of peeled
    /// signature/value pairs.
    ///
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
    /// it drops it immediately after building the graph.
    ///
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]), a
    /// [`ShardEdge::Vertex`] per vertex (for the visit stack, although usually
    /// the stack never contains more than a third of the vertices), and a
    /// [`SigVal`] and a byte per key (for the stack of peeled edges).
    ///
    /// This is the fastest and most memory-consuming peeler. However, it has
    /// just a small advantage during assignment with respect to
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem), which
    /// uses almost half the memory. It is the peeler of choice for low levels
    /// of parallelism.
    ///
    /// This peeler cannot be used in conjunction with [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a failed
    /// peeling it is not possible to retrieve information about the
    /// signature/value pairs in the shard.
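    ///
    /// As a minimal sketch of the XOR-aggregation idea behind [`XorGraph`]
    /// (hypothetical, simplified: plain `u64` signatures, no sides, and the
    /// actual implementation may differ), each vertex keeps only a degree
    /// counter and the XOR of its incident edges, so no adjacency lists are
    /// needed and at degree one the accumulator is exactly the remaining
    /// edge:
    ///
    /// ```ignore
    /// #[derive(Default, Clone, Copy)]
    /// struct XorVertex {
    ///     degree: u8, // the real structure also tracks overflow
    ///     acc: u64,   // XOR of the signatures of all incident edges
    /// }
    ///
    /// impl XorVertex {
    ///     fn add(&mut self, sig: u64) {
    ///         self.degree += 1;
    ///         self.acc ^= sig;
    ///     }
    ///     fn remove(&mut self, sig: u64) {
    ///         self.degree -= 1;
    ///         self.acc ^= sig;
    ///     }
    ///     /// When the degree is one, the accumulator is the signature of
    ///     /// the unique remaining incident edge.
    ///     fn only_edge(&self) -> u64 {
    ///         debug_assert_eq!(self.degree, 1);
    ///         self.acc
    ///     }
    /// }
    /// ```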
    fn peel_by_sig_vals_high_mem<
        V: ZeroCopy + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, W, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()>
    where
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
    {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();
        let shard_len = shard.len();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards
        ));

        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
        for &sig_val in shard.iter() {
            let local_sig = shard_edge.local_sig(sig_val.sig);
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
                xor_graph.add(
                    v,
                    SigVal {
                        sig: local_sig,
                        val: sig_val.val,
                    },
                    side,
                );
            }
        }
        pl.done_with_count(shard.len());

        // We are using a consuming iterator over the shard store, so this
        // drop will free the memory used by the signatures
        drop(shard);

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        let mut sig_vals_stack = FastStack::<SigVal<E::LocalSig, V>>::new(shard_len);
        let mut sides_stack = FastStack::<u8>::new(shard_len);
        // Experimentally this stack never grows beyond a little more than
        // num_vertices / 4
        let mut visit_stack = Vec::<E::Vertex>::with_capacity(num_vertices / 3);

        pl.start(format!(
            "Peeling graph for shard {}/{} by signatures (high-mem)...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                visit_stack.push(v.cast());
            }
        }

        while let Some(v) = visit_stack.pop() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            let (sig_val, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            sig_vals_stack.push(sig_val);
            sides_stack.push(side as u8);

            let e = self.shard_edge.local_edge(sig_val.sig);
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push);
        }

        pl.done();

        if shard_len != sig_vals_stack.len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                sig_vals_stack.len(),
                shard_len
            ));
            return Err(());
        }

        self.assign(
            shard_index,
            data,
            sig_vals_stack
                .iter()
                .rev()
                .map(|&sig_val| (sig_val.sig, get_val(shard_edge, sig_val)))
                .zip(sides_stack.iter().copied().rev()),
            pl,
        );

        Ok(())
    }

    /// Peels a shard via signature/value pairs using a stack of vertices to
    /// represent peeled edges.
    ///
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
    /// it drops it immediately after building the graph.
    ///
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]) and a
    /// [`ShardEdge::Vertex`] per vertex (for a [`DoubleStack`]).
    ///
    /// This is by far the least memory-hungry peeler, and it is just slightly
    /// slower than
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem),
    /// which uses almost twice the memory. It is the peeler of choice for
    /// significant levels of parallelism.
    ///
    /// This peeler cannot be used in conjunction with [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a
    /// failed peeling it is not possible to retrieve information about the
    /// signature/value pairs in the shard.
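    ///
    /// As a minimal sketch of the double-stack idea (hypothetical,
    /// simplified: the actual [`DoubleStack`] API may differ), two stacks
    /// share one buffer and grow from opposite ends, so vertices to visit
    /// and peeled vertices fit in a single allocation of `num_vertices`
    /// slots:
    ///
    /// ```ignore
    /// struct DoubleStack<T> {
    ///     buf: Vec<T>,
    ///     lower: usize, // one past the top of the lower stack
    ///     upper: usize, // index of the top of the upper stack
    /// }
    ///
    /// impl<T: Copy + Default> DoubleStack<T> {
    ///     fn new(capacity: usize) -> Self {
    ///         Self { buf: vec![T::default(); capacity], lower: 0, upper: capacity }
    ///     }
    ///     fn push_lower(&mut self, t: T) {
    ///         assert!(self.lower < self.upper, "stacks collided");
    ///         self.buf[self.lower] = t;
    ///         self.lower += 1;
    ///     }
    ///     fn pop_lower(&mut self) -> Option<T> {
    ///         (self.lower > 0).then(|| {
    ///             self.lower -= 1;
    ///             self.buf[self.lower]
    ///         })
    ///     }
    ///     fn push_upper(&mut self, t: T) {
    ///         assert!(self.lower < self.upper, "stacks collided");
    ///         self.upper -= 1;
    ///         self.buf[self.upper] = t;
    ///     }
    ///     fn upper_len(&self) -> usize {
    ///         self.buf.len() - self.upper
    ///     }
    /// }
    /// ```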
    fn peel_by_sig_vals_low_mem<
        V: ZeroCopy + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, W, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()>
    where
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
    {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();
        let shard_len = shard.len();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards,
        ));

        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
        for &sig_val in shard.iter() {
            let local_sig = shard_edge.local_sig(sig_val.sig);
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
                xor_graph.add(
                    v,
                    SigVal {
                        sig: local_sig,
                        val: sig_val.val,
                    },
                    side,
                );
            }
        }
        pl.done_with_count(shard.len());

        // We are using a consuming iterator over the shard store, so this
        // drop will free the memory used by the signatures
        drop(shard);

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        let mut visit_stack = DoubleStack::<E::Vertex>::new(num_vertices);

        pl.start(format!(
            "Peeling graph for shard {}/{} by signatures (low-mem)...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                visit_stack.push_lower(v.cast());
            }
        }

        while let Some(v) = visit_stack.pop_lower() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            let (sig_val, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            visit_stack.push_upper(v.cast());

            let e = self.shard_edge.local_edge(sig_val.sig);
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push_lower);
        }

        pl.done();

        if shard_len != visit_stack.upper_len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                visit_stack.upper_len(),
                shard_len
            ));
            return Err(());
        }

        self.assign(
            shard_index,
            data,
            visit_stack.iter_upper().map(|&v| {
                let (sig_val, side) = xor_graph.edge_and_side(v.upcast());
                ((sig_val.sig, get_val(shard_edge, sig_val)), side as u8)
            }),
            pl,
        );

        Ok(())
    }

    /// Solves a shard of given index, possibly using [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517), and stores the
    /// solution in the given data.
    ///
    /// As a first try, the shard is [peeled by index](VBuilder::peel_by_index).
    /// If the peeling is [partial](PeelResult::Partial), lazy Gaussian
    /// elimination is used to solve the remaining edges.
    ///
    /// This method will scan the double stack, without emptying it, to check
    /// which edges have been peeled. The information will then be passed to
    /// [`VBuilder::assign`] to complete the assignment of values.
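    ///
    /// As a worked example of the kind of system that is generated (the
    /// vertices and values below are made up), each non-peeled edge
    /// {a, b, c} with output value w becomes the equation
    /// x_a ^ x_b ^ x_c = w over the data words:
    ///
    /// ```text
    /// edge {0, 3, 7}, value 5  ->  x_0 ^ x_3 ^ x_7 = 5
    /// edge {1, 3, 4}, value 2  ->  x_1 ^ x_3 ^ x_4 = 2
    /// ```
    ///
    /// Lazy Gaussian elimination solves these equations; the peeled edges
    /// are then assigned on top of that solution in reverse peeling order.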
    fn lge_shard<
        V: ZeroCopy + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, W, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()> {
        let shard_edge = &self.shard_edge;
        // Let's try to peel first
        match self.peel_by_index(shard_index, shard, data, get_val, pl) {
            Err(()) => Err(()),
            Ok(PeelResult::Complete()) => Ok(()),
            Ok(PeelResult::Partial {
                shard_index,
                shard,
                mut data,
                double_stack,
                sides_stack,
            }) => {
                pl.info(format_args!("Switching to lazy Gaussian elimination..."));
                // Likely result: we still have to solve the rest
                pl.start(format!(
                    "Generating system for shard {}/{}...",
                    shard_index + 1,
                    shard_edge.num_shards()
                ));

                let num_vertices = shard_edge.num_vertices();
                let mut peeled_edges = BitVec::new(shard.len());
                let mut used_vars = BitVec::new(num_vertices);
                for &edge in double_stack.iter_upper() {
                    peeled_edges.set(edge.upcast(), true);
                }

                // Create data for a system using non-peeled edges
                //
                // SAFETY: there is no undefined behavior here, but the
                // raw construction methods we use assume that the
                // equations are sorted, that the variables are not repeated,
                // and that the variables are in the range [0..num_vertices).
                let mut system = unsafe {
                    crate::utils::mod2_sys::Modulo2System::from_parts(
                        num_vertices,
                        shard
                            .iter()
                            .enumerate()
                            .filter(|(edge_index, _)| !peeled_edges[*edge_index])
                            .map(|(_edge_index, sig_val)| {
                                let local_sig = shard_edge.local_sig(sig_val.sig);
                                let mut eq: Vec<_> = shard_edge
                                    .local_edge(local_sig)
                                    .iter()
                                    .map(|&x| {
                                        used_vars.set(x, true);
                                        x as u32
                                    })
                                    .collect();
                                eq.sort_unstable();
                                crate::utils::mod2_sys::Modulo2Equation::from_parts(
                                    eq,
                                    get_val(
                                        shard_edge,
                                        SigVal {
                                            sig: local_sig,
                                            val: sig_val.val,
                                        },
                                    ),
                                )
                            })
                            .collect(),
                    )
                };

                if self.failed.load(Ordering::Relaxed) {
                    return Err(());
                }

                pl.start("Solving system...");
                let result = system.lazy_gaussian_elimination().map_err(|_| ())?;
                pl.done_with_count(system.num_equations());

                for (v, &value) in result.iter().enumerate().filter(|(v, _)| used_vars[*v]) {
                    data.set(v, value);
                }

                self.assign(
                    shard_index,
                    data,
                    double_stack
                        .iter_upper()
                        .map(|&edge_index| {
                            let sig_val = &shard[edge_index.upcast()];
                            let local_sig = shard_edge.local_sig(sig_val.sig);
                            (
                                local_sig,
                                get_val(
                                    shard_edge,
                                    SigVal {
                                        sig: local_sig,
                                        val: sig_val.val,
                                    },
                                ),
                            )
                        })
                        .zip(sides_stack.into_iter().rev()),
                    pl,
                );
                Ok(())
            }
        }
    }

    /// Performs assignment of values based on peeling data.
    ///
    /// This method might be called after a successful peeling procedure, or
    /// after a linear solver has been used to solve the remaining edges.
    ///
    /// `sigs_vals_sides` is an iterator returning pairs of signature/value
    /// pairs and sides in reverse peeling order.
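    ///
    /// As a small worked example (with made-up names): if an edge {a, b, c}
    /// was peeled at side 0 with target value w, then in reverse peeling
    /// order x_b and x_c already hold their final values when the edge is
    /// processed, so setting
    ///
    /// ```text
    /// x_a = w ^ x_b ^ x_c
    /// ```
    ///
    /// guarantees x_a ^ x_b ^ x_c = w; and since a vertex has degree one
    /// when it is peeled, x_a is never written again afterwards.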
    fn assign(
        &self,
        shard_index: usize,
        mut data: ShardData<'_, W, D>,
        sigs_vals_sides: impl Iterator<Item = ((E::LocalSig, W), u8)>,
        pl: &mut impl ProgressLog,
    ) {
        if self.failed.load(Ordering::Relaxed) {
            return;
        }

        pl.start(format!(
            "Assigning values for shard {}/{}...",
            shard_index + 1,
            self.shard_edge.num_shards()
        ));

        for ((sig, val), side) in sigs_vals_sides {
            let edge = self.shard_edge.local_edge(sig);
            let side = side as usize;
            unsafe {
                // XOR of the values of the two vertices of the edge
                // other than the one at position `side`
                let xor = match side {
                    0 => data.get_unchecked(edge[1]) ^ data.get_unchecked(edge[2]),
                    1 => data.get_unchecked(edge[0]) ^ data.get_unchecked(edge[2]),
                    2 => data.get_unchecked(edge[0]) ^ data.get_unchecked(edge[1]),
                    _ => core::hint::unreachable_unchecked(),
                };

                data.set_unchecked(edge[side], val ^ xor);
            }
        }
        pl.done();
    }
}