/src/func/vbuilder.rs
/*
 * SPDX-FileCopyrightText: 2025 Sebastiano Vigna
 *
 * SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
 */

#![allow(clippy::type_complexity)]
#![allow(clippy::too_many_arguments)]

use crate::bits::*;
use crate::dict::VFilter;
use crate::func::{shard_edge::ShardEdge, *};
use crate::traits::bit_field_slice::{BitFieldSlice, BitFieldSliceMut, Word};
use crate::utils::*;
use common_traits::{
    CastableInto, DowncastableFrom, DowncastableInto, UnsignedInt, UpcastableFrom, UpcastableInto,
};
use derivative::Derivative;
use derive_setters::*;
use dsi_progress_logger::*;
use log::info;
use rand::SeedableRng;
use rand::rngs::SmallRng;
use rand::{Rng, RngCore};
use rayon::iter::ParallelIterator;
use rayon::slice::ParallelSlice;
use rdst::*;
use std::any::TypeId;
use std::borrow::{Borrow, Cow};
use std::hint::unreachable_unchecked;
use std::marker::PhantomData;
use std::mem::transmute;
use std::ops::{BitXor, BitXorAssign};
use std::slice::Iter;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Instant;
use thread_priority::ThreadPriority;

use super::shard_edge::FuseLge3Shards;

const LOG2_MAX_SHARDS: u32 = 16;

/// A builder for [`VFunc`] and [`VFilter`].
///
/// Keys must implement the [`ToSig`] trait, which provides a method to compute
/// a signature of the key.
///
/// There are two construction modes: in core memory (default) and
/// [offline](VBuilder::offline); both use a [`SigStore`]. In the first case,
/// space will be allocated in core memory for signatures and associated values
/// for all keys; in the second case, such information will be stored in a
/// number of on-disk buckets.
///
/// There are several setters: for example, you can [set the maximum number
/// of threads](VBuilder::max_num_threads).
///
/// Once signatures have been computed, each parallel thread will process a
/// shard of the signature/value pairs. For very large key sets shards will be
/// significantly smaller than the number of keys, so the memory usage, in
/// particular in offline mode, can be significantly reduced. Note that using
/// too many threads might actually be harmful due to memory contention.
///
/// The generic parameters are explained in the [`VFunc`] documentation. You
/// have to choose the type of the output values and the backend. The remaining
/// parameters have default values that are the same as those of
/// [`VFunc`]/[`VFilter`], and some elaboration about them can be found in their
/// documentation.
///
/// All construction methods require passing one or two [`RewindableIoLender`]s
/// (keys and possibly values); the construction might fail, in which case the
/// keys might be scanned again. The structures in the [`lenders`] module
/// provide easy ways to build such lenders, even starting from compressed
/// files of UTF-8 strings. The type of the keys of the resulting filter or
/// function will be the type of the elements of the [`RewindableIoLender`].
///
/// # Examples
///
/// In this example, we build a function that maps each key to itself using a
/// boxed slice of `usize` as a backend (note that this is really wasteful). The
/// setter for the expected number of keys is used to optimize the construction.
/// We use the [`FromIntoIterator`] adapter to turn a clonable [`IntoIterator`]
/// into a [`RewindableIoLender`]. Note that you need the
/// [`dsi-progress-logger`](https://crates.io/crates/dsi-progress-logger) crate.
///
/// Type inference derives the input type (`usize`) from the type of the items
/// returned by the first [`RewindableIoLender`], and the output type (again,
/// `usize`, the first type parameter), from the backend type (`Box<[usize]>`,
/// the second type parameter):
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[usize]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Alternatively, we can use the bit-field vector backend, which will use
/// ⌈log₂(99)⌉ bits per backend element:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # use sux::bits::BitFieldVec;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, BitFieldVec<usize>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Since the numbers are small, we can also try to use a fixed-size output:
/// type inference takes care of making the second range `0..100` a range of
/// `u8`. Note that the type of keys is always `usize`, as it is still inferred
/// from the type of the items returned by the first [`RewindableIoLender`]:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # use sux::bits::BitFieldVec;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
///
/// We now try to build a fast 8-bit filter for the same key set, using a boxed
/// slice of `u8` as a backend (this is not wasteful, as the filter uses 8-bit
/// hashes):
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(func[i]);
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Since the keys are very few, we can switch to 64-bit signatures, and no
/// shards, which will yield faster queries:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use sux::func::shard_edge::FuseLge3NoShards;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>, [u64; 1], FuseLge3NoShards>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(func[i]);
/// }
/// #     Ok(())
/// # }
/// ```
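///
/// Construction can be tuned with the other setters. As a sketch (the values
/// here are purely illustrative), the following variant of the first example
/// limits the number of parallel threads and enables the duplicate-key check:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[usize]>>::default()
///     .expected_num_keys(100)
///     .max_num_threads(2)
///     .check_dups(true);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```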

#[derive(Setters, Debug, Derivative)]
#[derivative(Default)]
#[setters(generate = false)]
pub struct VBuilder<
    W: BinSafe + Word,
    D: BitFieldSlice<W> + Send + Sync = Box<[W]>,
    S = [u64; 2],
    E: ShardEdge<S, 3> = FuseLge3Shards,
> {
    /// The expected number of keys.
    ///
    /// While this setter is optional, setting this value to a reasonable bound
    /// on the actual number of keys will significantly speed up the
    /// construction.
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "None"))]
    expected_num_keys: Option<usize>,

    /// The maximum number of parallel threads to use. The default is 8.
    #[setters(generate = true)]
    #[derivative(Default(value = "8"))]
    max_num_threads: usize,

    /// Use disk-based buckets to reduce core memory usage at construction time.
    #[setters(generate = true)]
    offline: bool,

    /// Check for duplicated signatures. This is not necessary in general,
    /// but if you suspect you might be feeding duplicate keys, you can
    /// enable this check.
    #[setters(generate = true)]
    check_dups: bool,

    /// Always use the low-memory peel-by-signature algorithm (`true`) or the
    /// high-memory peel-by-index algorithm (`false`). The former is slightly
    /// slower, but it uses much less memory. Normally [`VBuilder`] uses the
    /// high-memory algorithm and switches to the low-memory one if there are
    /// more than three threads and more than two shards.
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "None"))]
    low_mem: Option<bool>,

    /// The seed for the random number generator.
    #[setters(generate = true)]
    seed: u64,

    /// The base-2 logarithm of the number of buckets of the [`SigStore`]. The
    /// default is 8. This value is automatically overridden, even if set, if
    /// you provide an [expected number of keys](VBuilder::expected_num_keys).
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "8"))]
    log2_buckets: u32,

    /// The target relative space loss due to [ε-cost
    /// sharding](https://doi.org/10.4230/LIPIcs.ESA.2019.38).
    ///
    /// The default is 0.001. Setting a larger target, for example, 0.01, will
    /// increase the space overhead due to sharding, but will in general
    /// provide finer shards. This might not always happen, however, because
    /// the ε-cost bound is just one of the bounds concurring in limiting the
    /// amount of sharding for a specific [`ShardEdge`]. For example,
    /// increasing the target to 0.01 will provide very fine sharding using
    /// `Mwhc3Shards` shard/edge logic, activated by the `mwhc` feature, but
    /// will affect only slightly [`FuseLge3Shards`] or
    /// [`FuseLge3FullSigs`](crate::func::shard_edge::FuseLge3FullSigs).
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "0.001"))]
    eps: f64,

    /// The bit width of the maximum value.
    bit_width: usize,
    /// The edge generator.
    shard_edge: E,
    /// The number of keys.
    num_keys: usize,
    /// The ratio between the number of vertices and the number of edges
    /// (i.e., between the number of variables and the number of equations).
    c: f64,
    /// Whether we should use [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517).
    lge: bool,
    /// The number of threads to use.
    num_threads: usize,
    /// Fast-stop for failed attempts.
    failed: AtomicBool,
    #[doc(hidden)]
    _marker_v: PhantomData<(W, D, S)>,
}

/// Fatal build errors.
#[derive(thiserror::Error, Debug)]
pub enum BuildError {
    #[error("Duplicate key")]
    /// A duplicate key was detected.
    DuplicateKey,
    #[error("Duplicate local signatures: use full signatures")]
    /// Duplicate local signatures were detected: use full signatures.
    DuplicateLocalSignatures,
    #[error("Value too large for specified bit size")]
    /// A value is too large for the specified bit size.
    ValueTooLarge,
}

#[derive(thiserror::Error, Debug)]
/// Transient error during the build, leading to
/// trying with a different seed.
pub enum SolveError {
    #[error("Duplicate signature")]
    /// A duplicate signature was detected.
    DuplicateSignature,
    #[error("Duplicate local signature")]
    /// A duplicate local signature was detected.
    DuplicateLocalSignature,
    #[error("Max shard too big")]
    /// The maximum shard is too big relative to the average shard.
    MaxShardTooBig,
    #[error("Unsolvable shard")]
    /// A shard cannot be solved.
    UnsolvableShard,
}

/// The result of a peeling procedure.
enum PeelResult<
    'a,
    W: BinSafe + Word + Send + Sync,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync + 'a,
    S: Sig + BinSafe,
    E: ShardEdge<S, 3>,
    V: BinSafe,
> {
    Complete(),
    Partial {
        /// The shard index.
        shard_index: usize,
        /// The shard.
        shard: Arc<Vec<SigVal<S, V>>>,
        /// The data.
        data: ShardData<'a, W, D>,
        /// The double stack whose upper stack contains the peeled edges
        /// (possibly represented by the vertex from which they have been
        /// peeled).
        double_stack: DoubleStack<E::Vertex>,
        /// The sides stack.
        sides_stack: Vec<u8>,
    },
}

/// A graph represented compactly.
///
/// Each (*k*-hyper)edge is a set of *k* vertices (by construction fuse graphs
/// do not have degenerate edges), but we represent it internally as a vector.
/// We call *side* the position of a vertex in the edge.
///
/// For each vertex we store information about the edges incident to the vertex
/// and the sides of the vertex in such edges. While technically not necessary
/// to perform peeling, the knowledge of the sides speeds up the peeling visit
/// by reducing the number of tests that are necessary to update the degrees
/// once the edge is peeled (see the `peel_by_*` methods). For the same reason
/// it also speeds up assignment.
///
/// Depending on the peeling method (by signature or by index), the graph will
/// store edge indices or signature/value pairs (the generic parameter `X`).
///
/// Edge information is packed together using Djamal's XOR trick (see
/// [“Cache-Oblivious Peeling of Random
/// Hypergraphs”](https://doi.org/10.1109/DCC.2014.48)): since during the
/// peeling visit we need to know the content of the list only when a single
/// edge index is present, we can XOR together all the edge information.
///
/// We use a single byte to store the degree (six upper bits) and the XOR of the
/// sides (lower two bits). The degree can be stored with a small number of bits
/// because the graph is random, so the maximum degree is *O*(log log *n*).
/// In any case, the Boolean field `overflow` will become `true` in case of
/// overflow.
///
/// When we peel an edge, we just [zero the degree](Self::zero), leaving the
/// edge information and the side in place for further processing later.
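///
/// As a self-contained sketch of the trick (using plain integers rather than
/// this private structure): a vertex keeps the XOR of the information of its
/// incident edges, and once all but one of them have been peeled the surviving
/// information can be read off directly.
///
/// ```rust
/// // Hypothetical edge indices incident to a single vertex.
/// let edges = [7u32, 42, 1000];
/// let mut acc = 0u32;
/// let mut degree = 0u8;
/// for &e in &edges {
///     acc ^= e; // XOR the edge in
///     degree += 1;
/// }
/// // Peel two of the edges: XOR them out and decrease the degree.
/// acc ^= edges[0];
/// acc ^= edges[2];
/// degree -= 2;
/// // With a single incident edge left, the accumulator is exactly its index.
/// assert_eq!((degree, acc), (1, 42));
/// ```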
struct XorGraph<X: Copy + Default + BitXor + BitXorAssign> {
    edges: Box<[X]>,
    degrees_sides: Box<[u8]>,
    overflow: bool,
}

impl<X: BitXor + BitXorAssign + Default + Copy> XorGraph<X> {
    pub fn new(n: usize) -> XorGraph<X> {
        XorGraph {
            edges: vec![X::default(); n].into(),
            degrees_sides: vec![0; n].into(),
            overflow: false,
        }
    }

    #[inline(always)]
    pub fn add(&mut self, v: usize, x: X, side: usize) {
        debug_assert!(side < 3);
        let (degree_size, overflow) = self.degrees_sides[v].overflowing_add(4);
        self.degrees_sides[v] = degree_size;
        self.overflow |= overflow;
        self.degrees_sides[v] ^= side as u8;
        self.edges[v] ^= x;
    }

    #[inline(always)]
    pub fn remove(&mut self, v: usize, x: X, side: usize) {
        debug_assert!(side < 3);
        self.degrees_sides[v] -= 4;
        self.degrees_sides[v] ^= side as u8;
        self.edges[v] ^= x;
    }

    #[inline(always)]
    pub fn zero(&mut self, v: usize) {
        self.degrees_sides[v] &= 0b11;
    }

    #[inline(always)]
    pub fn edge_and_side(&self, v: usize) -> (X, usize) {
        debug_assert!(self.degree(v) < 2);
        (self.edges[v] as _, (self.degrees_sides[v] & 0b11) as _)
    }

    #[inline(always)]
    pub fn degree(&self, v: usize) -> u8 {
        self.degrees_sides[v] >> 2
    }

    pub fn degrees(
        &self,
    ) -> std::iter::Map<std::iter::Copied<std::slice::Iter<'_, u8>>, fn(u8) -> u8> {
        self.degrees_sides.iter().copied().map(|d| d >> 2)
    }
}

/// A preallocated stack implementation that avoids the expensive (even if
/// rarely taken) branch of the `Vec` implementation in which memory is
/// reallocated. Note that using [`Vec::with_capacity`] is not enough, because
/// for the CPU the branch is still there.
struct FastStack<X: Copy + Default> {
    stack: Vec<X>,
    top: usize,
}

impl<X: Copy + Default> FastStack<X> {
    pub fn new(n: usize) -> FastStack<X> {
        FastStack {
            stack: vec![X::default(); n],
            top: 0,
        }
    }

    pub fn push(&mut self, x: X) {
        debug_assert!(self.top < self.stack.len());
        self.stack[self.top] = x;
        self.top += 1;
    }

    pub fn len(&self) -> usize {
        self.top
    }

    pub fn iter(&self) -> std::slice::Iter<'_, X> {
        self.stack[..self.top].iter()
    }
}

/// Two stacks in the same vector.
///
/// This struct implements a pair of stacks sharing the same memory. The lower
/// stack grows from the beginning of the vector, the upper stack grows from the
/// end of the vector. Since we use the lower stack for the visit and the upper
/// stack for peeled edges (possibly represented by the vertex from which they
/// have been peeled), the sum of the lengths of the two stacks cannot exceed
/// the length of the vector.
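///
/// As a self-contained sketch of the idea (independent of this private
/// structure), two indices over a single buffer are enough:
///
/// ```rust
/// // One buffer, two stacks: `lower` grows upwards, `upper` grows downwards.
/// let mut buf = vec![0usize; 8];
/// let (mut lower, mut upper) = (0, buf.len());
/// // Push on the lower stack.
/// buf[lower] = 1;
/// lower += 1;
/// // Push on the upper stack.
/// upper -= 1;
/// buf[upper] = 2;
/// // The two stacks never overlap as long as `lower <= upper`.
/// assert!(lower <= upper);
/// assert_eq!((buf[lower - 1], buf[upper]), (1, 2));
/// ```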
#[derive(Debug)]
struct DoubleStack<V> {
    stack: Vec<V>,
    lower: usize,
    upper: usize,
}

impl<V: Default + Copy> DoubleStack<V> {
    fn new(n: usize) -> DoubleStack<V> {
        DoubleStack {
            stack: vec![V::default(); n],
            lower: 0,
            upper: n,
        }
    }
}

impl<V: Copy> DoubleStack<V> {
    #[inline(always)]
    fn push_lower(&mut self, v: V) {
        debug_assert!(self.lower < self.upper);
        self.stack[self.lower] = v;
        self.lower += 1;
    }

    #[inline(always)]
    fn push_upper(&mut self, v: V) {
        debug_assert!(self.lower < self.upper);
        self.upper -= 1;
        self.stack[self.upper] = v;
    }

    #[inline(always)]
    fn pop_lower(&mut self) -> Option<V> {
        if self.lower == 0 {
            None
        } else {
            self.lower -= 1;
            Some(self.stack[self.lower])
        }
    }

    fn upper_len(&self) -> usize {
        self.stack.len() - self.upper
    }

    fn iter_upper(&self) -> Iter<'_, V> {
        self.stack[self.upper..].iter()
    }
}

/// An iterator over segments of data associated with each shard.
type ShardDataIter<'a, W, D> = <D as BitFieldSliceMut<W>>::ChunksMut<'a>;
/// A segment of data associated with a specific shard.
type ShardData<'a, W, D> = <ShardDataIter<'a, W, D> as Iterator>::Item;

/// Builds a new function using a `Box<[W]>` to store values.
///
/// Since values are stored in a boxed slice, access is particularly fast, but
/// the bit width of the output of the function will be exactly the bit width of
/// the unsigned type `W`.
impl<W: BinSafe + Word, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, Box<[W]>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, W>: RadixKey,
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
{
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        values: impl RewindableIoLender<W>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, Box<[W]>, S, E>>
    where
        for<'a> ShardDataIter<'a, W, Box<[W]>>: Send,
        for<'a> ShardData<'a, W, Box<[W]>>: Send,
    {
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();
        self.build_loop(keys, values, None, &get_val, new_data, pl)
    }
}

/// Builds a new filter using a `Box<[W]>` to store values.
///
/// Since values are stored in a boxed slice, access is particularly fast, but
/// the number of bits of the hashes will be exactly the bit width of the
/// unsigned type `W`.
impl<W: BinSafe + Word + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
    VBuilder<W, Box<[W]>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, EmptyVal>: RadixKey,
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
{
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, Box<[W]>, S, E>>>
    where
        for<'a> ShardDataIter<'a, W, Box<[W]>>: Send,
        for<'a> ShardData<'a, W, Box<[W]>>: Send,
    {
        let filter_mask = W::MAX;
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
            mix64(shard_edge.edge_hash(sig_val.sig)).downcast()
        };
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();

        Ok(VFilter {
            func: self.build_loop(
                keys,
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
                Some(W::BITS),
                &get_val,
                new_data,
                pl,
            )?,
            filter_mask,
            hash_bits: W::BITS as u32,
        })
    }
}

/// Builds a new function using a [bit-field vector](BitFieldVec) on words of
/// the unsigned type `W` to store values.
///
/// Since values are stored in a bit-field vector, access will be slower than
/// when using a boxed slice, but the bit width of stored values will be the
/// minimum necessary. In any case, it must be at most the bit width of `W`.
///
/// Typically `W` will be `usize` or `u64`.
impl<W: BinSafe + Word, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, BitFieldVec<W>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, W>: RadixKey,
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
{
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        values: impl RewindableIoLender<W>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, BitFieldVec<W>, S, E>> {
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
        self.build_loop(keys, values, None, &get_val, new_data, pl)
    }
}

/// Builds a new filter using a [bit-field vector](BitFieldVec) on words of the
/// unsigned type `W` to store values.
///
/// Since values are stored in a bit-field vector, access will be slower than
/// when using a boxed slice, but the number of bits of the hashes can be set at
/// will. In any case, it must be at most the bit width of `W`.
///
/// Typically `W` will be `usize` or `u64`.
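///
/// As a sketch (mirroring the examples in the [`VBuilder`] documentation, and
/// assuming a 64-bit platform; the 10-bit hash width is purely illustrative):
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # use sux::bits::BitFieldVec;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, BitFieldVec<usize>>::default()
///     .expected_num_keys(100);
/// // The second argument is the number of bits of the hashes.
/// let filter = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    10,
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(filter[i]);
/// }
/// #     Ok(())
/// # }
/// ```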
impl<W: BinSafe + Word + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
    VBuilder<W, BitFieldVec<W>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, EmptyVal>: RadixKey,
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
{
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        filter_bits: usize,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, BitFieldVec<W>, S, E>>> {
        assert!(filter_bits > 0);
        assert!(filter_bits <= W::BITS);
        let filter_mask = W::MAX >> (W::BITS - filter_bits);
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
            <W as DowncastableFrom<u64>>::downcast_from(mix64(shard_edge.edge_hash(sig_val.sig)))
                & filter_mask
        };
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);

        Ok(VFilter {
            func: self.build_loop(
                keys,
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
                Some(filter_bits),
                &get_val,
                new_data,
                pl,
            )?,
            filter_mask,
            hash_bits: filter_bits as _,
        })
    }
}

impl<
    W: BinSafe + Word,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
    S: Sig + Send + Sync,
    E: ShardEdge<S, 3>,
> VBuilder<W, D, S, E>
{
    /// Builds and returns a new function with given keys and values.
    ///
    /// This method can build functions based both on vectors and on bit-field
    /// vectors. The necessary abstraction is provided by the
    /// `new_data(bit_width, len)` function, which is called to create the data
    /// structure to store the values.
    ///
    /// When `V` is [`EmptyVal`], this method builds a function supporting a
    /// filter by mapping each key to a mix of its local signature. The
    /// necessary abstraction is provided by the `get_val` function, which is
    /// called to extract the value from the signature/value pair; in the case
    /// of functions it returns the value stored in the signature/value pair,
    /// and in the case of filters it returns the hash associated with the
    /// signature.
    fn build_loop<
        T: ?Sized + ToSig<S> + std::fmt::Debug,
        B: ?Sized + Borrow<T>,
        V: BinSafe + Default + Send + Sync + Ord + UpcastableInto<u128>,
    >(
        &mut self,
        mut keys: impl RewindableIoLender<B>,
        mut values: impl RewindableIoLender<V>,
        bit_width: Option<usize>,
        get_val: &(impl Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync),
        new_data: fn(usize, usize) -> D,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
    where
        u128: UpcastableFrom<W>,
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        for<'a> ShardDataIter<'a, W, D>: Send,
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
    {
        let mut dup_count = 0;
        let mut local_dup_count = 0;
        let mut prng = SmallRng::seed_from_u64(self.seed);

        if let Some(expected_num_keys) = self.expected_num_keys {
            self.shard_edge.set_up_shards(expected_num_keys, self.eps);
            self.log2_buckets = self.shard_edge.shard_high_bits();
        }

        pl.info(format_args!("Using 2^{} buckets", self.log2_buckets));

        // Loop until success or duplicate detection
        loop {
            let seed = prng.random();

            match if self.offline {
                self.try_seed(
                    seed,
                    sig_store::new_offline::<S, V>(
                        self.log2_buckets,
                        LOG2_MAX_SHARDS,
                        self.expected_num_keys,
                    )?,
                    &mut keys,
                    &mut values,
                    bit_width,
                    get_val,
                    new_data,
                    pl,
                )
            } else {
                self.try_seed(
                    seed,
                    sig_store::new_online::<S, V>(
                        self.log2_buckets,
                        LOG2_MAX_SHARDS,
                        self.expected_num_keys,
                    )?,
                    &mut keys,
                    &mut values,
                    bit_width,
                    get_val,
                    new_data,
                    pl,
                )
            } {
                Ok(func) => {
                    return Ok(func);
                }
                Err(error) => {
                    match error.downcast::<SolveError>() {
                        Ok(vfunc_error) => match vfunc_error {
                            // Let's try another seed, but just a few times--most likely,
                            // duplicate keys
                            SolveError::DuplicateSignature => {
                                if dup_count >= 3 {
                                    pl.error(format_args!("Duplicate keys (duplicate 128-bit signatures with four different seeds)"));
                                    return Err(BuildError::DuplicateKey.into());
                                }
                                pl.warn(format_args!(
                                "Duplicate 128-bit signature, trying again with a different seed..."
                                ));
                                dup_count += 1;
                            }
                            // Let's try another seed, but just a few times
                            SolveError::DuplicateLocalSignature => {
                                if local_dup_count >= 2 {
                                    pl.error(format_args!("Duplicate local signatures: use full signatures (duplicate local signatures with three different seeds)"));
                                    return Err(BuildError::DuplicateLocalSignatures.into());
                                }
                                pl.warn(format_args!(
                                "Duplicate local signature, trying again with a different seed..."
                                ));
                                local_dup_count += 1;
                            }
                            SolveError::MaxShardTooBig => {
                                pl.warn(format_args!(
                                "The maximum shard is too big, trying again with a different seed..."
                                ));
                            }
                            // Let's just try another seed
                            SolveError::UnsolvableShard => {
                                pl.warn(format_args!(
                                    "Unsolvable shard, trying again with a different seed..."
                                ));
                            }
                        },
                        Err(error) => return Err(error),
                    }
                }
            }

            values = values.rewind()?;
            keys = keys.rewind()?;
        }
    }
}

impl<
    W: BinSafe + Word,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
    S: Sig + Send + Sync,
    E: ShardEdge<S, 3>,
> VBuilder<W, D, S, E>
{
    /// Tries to build a function using a specific seed. See the comments in the
    /// [`VBuilder::build_loop`] method for more details.
    ///
    /// This method reads the input, sets up the shards, allocates the backend
    /// using `new_data`, and passes the backend and an iterator on shards to
    /// the [`VBuilder::try_build_from_shard_iter`] method.
    fn try_seed<
        T: ?Sized + ToSig<S> + std::fmt::Debug,
        B: ?Sized + Borrow<T>,
        V: BinSafe + Default + Send + Sync + Ord + UpcastableInto<u128>,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &mut self,
        seed: u64,
        mut sig_store: impl SigStore<S, V>,
        keys: &mut impl RewindableIoLender<B>,
        values: &mut impl RewindableIoLender<V>,
        bit_width: Option<usize>,
        get_val: &G,
        new_data: fn(usize, usize) -> D,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
    where
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        for<'a> ShardDataIter<'a, W, D>: Send,
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
    {
        let shard_edge = &mut self.shard_edge;

        pl.expected_updates(self.expected_num_keys);
        pl.item_name("key");
        pl.start(format!(
            "Computing and storing {}-bit signatures in {} using seed 0x{:016x}...",
            std::mem::size_of::<S>() * 8,
            sig_store
                .temp_dir()
                .map(|d| d.path().to_string_lossy())
                .unwrap_or(Cow::Borrowed("memory")),
            seed
        ));

        // This will be the maximum value for functions and EmptyVal for filters
        let mut maybe_max_value = V::default();
        let start = Instant::now();

        while let Some(result) = keys.next() {
            match result {
                Ok(key) => {
                    pl.light_update();
                    // This might be an actual value, if we are building a
                    // function, or EmptyVal, if we are building a filter.
                    let &maybe_val = values.next().expect("Not enough values")?;
                    let sig_val = SigVal {
                        sig: T::to_sig(key.borrow(), seed),
                        val: maybe_val,
                    };
                    maybe_max_value = Ord::max(maybe_max_value, maybe_val);
                    sig_store.try_push(sig_val)?;
                }
                Err(e) => return Err(e.into()),
            }
        }
        pl.done();

        self.num_keys = sig_store.len();
        self.bit_width = if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
            bit_width.expect("Bit width must be set for filters")
        } else {
            let len_width = maybe_max_value.upcast().len() as usize;
            if let Some(bit_width) = bit_width {
                if len_width > bit_width {
                    return Err(BuildError::ValueTooLarge.into());
                }
                bit_width
            } else {
                len_width
            }
        };

        info!(
            "Computation of signatures from inputs completed in {:.3} seconds ({} keys, {:.3} ns/key)",
            start.elapsed().as_secs_f64(),
            self.num_keys,
            start.elapsed().as_nanos() as f64 / self.num_keys as f64
        );

        let start = Instant::now();

        let shard_store = sig_store.into_shard_store(shard_edge.shard_high_bits())?;
        let max_shard = shard_store.shard_sizes().iter().copied().max().unwrap_or(0);
        let filter = TypeId::of::<V>() == TypeId::of::<EmptyVal>();

        shard_edge.set_up_shards(self.num_keys, self.eps);
        (self.c, self.lge) = shard_edge.set_up_graphs(self.num_keys, max_shard);

        if filter {
            pl.info(format_args!(
                "Number of keys: {} Bit width: {}",
                self.num_keys, self.bit_width,
            ));
        } else {
            pl.info(format_args!(
                "Number of keys: {} Max value: {} Bit width: {}",
                self.num_keys,
                maybe_max_value.upcast(),
                self.bit_width,
            ));
        }

        if shard_edge.shard_high_bits() != 0 {
            pl.info(format_args!(
                "Max shard / average shard: {:.2}%",
                (100.0 * max_shard as f64)
                    / (self.num_keys as f64 / shard_edge.num_shards() as f64)
            ));
        }

        if max_shard as f64 > 1.01 * self.num_keys as f64 / shard_edge.num_shards() as f64 {
            // This might sometimes happen with small sharded graphs
            Err(SolveError::MaxShardTooBig.into())
        } else {
            let data = new_data(
                self.bit_width,
                shard_edge.num_vertices() * shard_edge.num_shards(),
            );
            self.try_build_from_shard_iter(seed, data, shard_store.into_iter(), get_val, pl)
                .inspect(|_| {
                    info!(
                        "Construction from signatures completed in {:.3} seconds ({} keys, {:.3} ns/key)",
                        start.elapsed().as_secs_f64(),
                        self.num_keys,
                        start.elapsed().as_nanos() as f64 / self.num_keys as f64
                    );
                })
                .map_err(Into::into)
        }
    }

    /// Builds and returns a new function starting from an iterator on shards.
    ///
    /// See [`VBuilder::build_loop`] for more details on the parameters.
    fn try_build_from_shard_iter<
        T: ?Sized + ToSig<S>,
        I,
        P,
        V: BinSafe + Default + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &mut self,
        seed: u64,
        mut data: D,
        shard_iter: I,
        get_val: &G,
        pl: &mut P,
    ) -> Result<VFunc<T, W, D, S, E>, SolveError>
    where
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        P: ProgressLog + Clone + Send + Sync,
        I: Iterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
        for<'a> ShardDataIter<'a, W, D>: Send,
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
    {
        let shard_edge = &self.shard_edge;
        self.num_threads = shard_edge.num_shards().min(self.max_num_threads);

        pl.info(format_args!("{}", self.shard_edge));
        pl.info(format_args!(
            "c: {}, Overhead: {:+.4}% Number of threads: {}",
            self.c,
            100. * ((shard_edge.num_vertices() * shard_edge.num_shards()) as f64
                / (self.num_keys as f64)
                - 1.),
            self.num_threads
        ));

        if self.lge {
            pl.info(format_args!("Peeling towards lazy Gaussian elimination"));
            self.par_solve(
                shard_iter,
                &mut data,
                |this, shard_index, shard, data, pl| {
                    this.lge_shard(shard_index, shard, data, get_val, pl)
                },
                &mut pl.concurrent(),
                pl,
            )?;
        } else if self.low_mem == Some(true)
            || self.low_mem.is_none() && self.num_threads > 3 && shard_edge.num_shards() > 2
        {
            // Less memory, but slower
            self.par_solve(
                shard_iter,
                &mut data,
                |this, shard_index, shard, data, pl| {
                    this.peel_by_sig_vals_low_mem(shard_index, shard, data, get_val, pl)
                },
                &mut pl.concurrent(),
                pl,
            )?;
        } else {
            // More memory, but faster
            self.par_solve(
                shard_iter,
                &mut data,
                |this, shard_index, shard, data, pl| {
                    this.peel_by_sig_vals_high_mem(shard_index, shard, data, get_val, pl)
                },
                &mut pl.concurrent(),
                pl,
            )?;
        }

        pl.info(format_args!(
            "Bits/keys: {} ({:+.4}%)",
            data.len() as f64 * self.bit_width as f64 / self.num_keys as f64,
            100.0 * (data.len() as f64 / self.num_keys as f64 - 1.),
        ));

        Ok(VFunc {
            seed,
            shard_edge: self.shard_edge,
            num_keys: self.num_keys,
            data,
            _marker_t: std::marker::PhantomData,
            _marker_w: std::marker::PhantomData,
            _marker_s: std::marker::PhantomData,
        })
    }
}

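/// Removes a peeled edge from the [`XorGraph`].
///
/// `$e` contains the three vertices of the edge, `$edge` is the edge
/// information to XOR out (an index or a signature/value pair), and `$side` is
/// the side of the vertex from which the edge has been peeled. The edge is
/// removed from the two remaining vertices; each of them whose degree was two
/// (and thus drops to one, becoming peelable) is first pushed on `$stack`
/// using the method `$push`.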
macro_rules! remove_edge {
    ($xor_graph: ident, $e: ident, $side: ident, $edge: ident, $stack: ident, $push:ident) => {
        match $side {
            0 => {
                if $xor_graph.degree($e[1]) == 2 {
                    $stack.$push($e[1].cast());
                }
                $xor_graph.remove($e[1], $edge, 1);
                if $xor_graph.degree($e[2]) == 2 {
                    $stack.$push($e[2].cast());
                }
                $xor_graph.remove($e[2], $edge, 2);
            }
            1 => {
                if $xor_graph.degree($e[0]) == 2 {
                    $stack.$push($e[0].cast());
                }
                $xor_graph.remove($e[0], $edge, 0);
                if $xor_graph.degree($e[2]) == 2 {
                    $stack.$push($e[2].cast());
                }
                $xor_graph.remove($e[2], $edge, 2);
            }
            2 => {
                if $xor_graph.degree($e[0]) == 2 {
                    $stack.$push($e[0].cast());
                }
                $xor_graph.remove($e[0], $edge, 0);
                if $xor_graph.degree($e[1]) == 2 {
                    $stack.$push($e[1].cast());
                }
                $xor_graph.remove($e[1], $edge, 1);
            }
            _ => unsafe { unreachable_unchecked() },
        }
    };
}

impl<
    W: BinSafe + Word + Send + Sync,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
    S: Sig + BinSafe,
    E: ShardEdge<S, 3>,
> VBuilder<W, D, S, E>
{
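    /// Sorts the signature/value pairs of a shard by the key returned by
    /// [`ShardEdge::sort_key`] using an out-of-place counting sort: a first
    /// pass counts the occurrences of each key while copying the pairs aside,
    /// a prefix sum turns the counts into starting offsets, and a second pass
    /// writes each pair back into its final position.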
    fn count_sort<V: BinSafe>(&self, data: &mut [SigVal<S, V>]) {
        let num_sort_keys = self.shard_edge.num_sort_keys();
        let mut count = vec![0; num_sort_keys];

        let mut copied = Box::new_uninit_slice(data.len());
        for (&sig_val, copy) in data.iter().zip(copied.iter_mut()) {
            count[self.shard_edge.sort_key(sig_val.sig)] += 1;
            copy.write(sig_val);
        }
        let copied = unsafe { copied.assume_init() };

        count.iter_mut().fold(0, |acc, c| {
            let old = *c;
            *c = acc;
            acc + old
        });

        for &sig_val in copied.iter() {
            let key = self.shard_edge.sort_key(sig_val.sig);
            data[count[key]] = sig_val;
            count[key] += 1;
        }
    }

    /// Beyond this number of keys we check for duplicate local signatures; in
    /// the case of filters, duplicate edges are simply removed.
    const MAX_NO_LOCAL_SIG_CHECK: usize = 1 << 33;

    /// Solves in parallel the shards returned by an iterator, storing
    /// the result in `data`.
    ///
    /// # Arguments
    ///
    /// * `shard_iter`: an iterator returning the shards to solve.
    ///
    /// * `data`: the storage for the solution values.
    ///
    /// * `solve_shard`: a method to solve a shard; it takes the shard index,
    ///   the shard, shard-local data, and a progress logger. It may
    ///   fail by returning an error.
    ///
    /// * `main_pl`: the progress logger for the overall computation.
    ///
    /// * `pl`: a progress logger that will be cloned to display the progress of
    ///   the current shard.
    ///
    /// # Errors
    ///
    /// This method returns an error if one of the shards cannot be solved, or
    /// if duplicates are detected.
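    ///
    /// # Implementation notes
    ///
    /// Shards are produced by a single thread and fanned out to the worker
    /// threads over a bounded channel. The following self-contained sketch
    /// (using hypothetical integer work items instead of the actual shard
    /// types) shows the pattern:
    ///
    /// ```rust
    /// let num_workers = 4;
    /// let (send, recv) = crossbeam_channel::bounded::<usize>(2);
    /// std::thread::scope(|scope| {
    ///     // Producer: enumerates the work items.
    ///     scope.spawn(move || {
    ///         for item in 0..16 {
    ///             if send.send(item).is_err() {
    ///                 break;
    ///             }
    ///         }
    ///         // `send` is dropped here, closing the channel.
    ///     });
    ///     // Workers: pull items until the channel is closed and drained.
    ///     for _ in 0..num_workers {
    ///         let recv = recv.clone();
    ///         scope.spawn(move || while let Ok(_item) = recv.recv() { /* solve */ });
    ///     }
    /// });
    /// ```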
1152
    fn par_solve<
196✔
1153
        'b,
1154
        V: BinSafe,
1155
        I: IntoIterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
1156
        SS: Fn(&Self, usize, Arc<Vec<SigVal<S, V>>>, ShardData<'b, W, D>, &mut P) -> Result<(), ()>
1157
            + Send
1158
            + Sync
1159
            + Copy,
1160
        C: ConcurrentProgressLog + Send + Sync,
1161
        P: ProgressLog + Clone + Send + Sync,
1162
    >(
1163
        &self,
1164
        shard_iter: I,
1165
        data: &'b mut D,
1166
        solve_shard: SS,
1167
        main_pl: &mut C,
1168
        pl: &mut P,
1169
    ) -> Result<(), SolveError>
1170
    where
1171
        I::IntoIter: Send,
1172
        SigVal<S, V>: RadixKey,
1173
        for<'a> ShardDataIter<'a, W, D>: Send,
1174
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
1175
    {
1176
        main_pl
392✔
1177
            .item_name("shard")
1178
            .expected_updates(Some(self.shard_edge.num_shards()))
392✔
1179
            .display_memory(true)
1180
            .start("Solving shards...");
1181

1182
        self.failed.store(false, Ordering::Relaxed);
588✔
1183
        let num_shards = self.shard_edge.num_shards();
588✔
1184
        let buffer_size = self.num_threads.ilog2() as usize;
392✔
1185

1186
        let (err_send, err_recv) = crossbeam_channel::bounded::<_>(self.num_threads);
588✔
1187
        let (data_send, data_recv) = crossbeam_channel::bounded::<(
1188
            usize,
1189
            (Arc<Vec<SigVal<S, V>>>, ShardData<'_, W, D>),
1190
        )>(buffer_size);
196✔
1191

1192
        let result = std::thread::scope(|scope| {
588✔
1193
            scope.spawn(move || {
588✔
1194
                let _ = thread_priority::set_current_thread_priority(ThreadPriority::Max);
196✔
1195
                for val in shard_iter
397✔
1196
                    .into_iter()
196✔
1197
                    .zip(data.try_chunks_mut(self.shard_edge.num_vertices()).unwrap())
1,176✔
1198
                    .enumerate()
196✔
1199
                {
1200
                    if data_send.send(val).is_err() {
603✔
1201
                        break;
×
1202
                    }
1203
                }
1204

1205
                drop(data_send);
392✔
1206
            });
1207

1208
            for _thread_id in 0..self.num_threads {
397✔
1209
                let mut main_pl = main_pl.clone();
×
1210
                let mut pl = pl.clone();
×
1211
                let err_send = err_send.clone();
×
1212
                let data_recv = data_recv.clone();
×
1213
                scope.spawn(move || {
201✔
1214
                    loop {
×
1215
                        match data_recv.recv() {
258✔
1216
                            Err(_) => return,
57✔
1217
                            Ok((shard_index, (shard, mut data))) => {
201✔
1218
                                if shard.is_empty() {
×
1219
                                    return;
10✔
1220
                                }
1221

1222
                                main_pl.info(format_args!(
×
1223
                                    "Analyzing shard {}/{}...",
×
1224
                                    shard_index + 1,
×
1225
                                    num_shards
×
1226
                                ));
1227

1228
                                pl.start(format!(
×
1229
                                    "Sorting shard {}/{}...",
×
1230
                                    shard_index + 1,
×
1231
                                    num_shards
×
1232
                                ));
1233

1234
                                {
1235
                                    // SAFETY: only one thread may be accessing the
1236
                                    // shard, and we will be consuming it
1237
                                    let shard = unsafe {
×
1238
                                        &mut *(Arc::as_ptr(&shard) as *mut Vec<SigVal<S, V>>)
×
1239
                                    };
1240

1241
                                    if self.check_dups {
×
1242
                                        shard.radix_sort_builder().sort();
12✔
1243
                                        if shard.par_windows(2).any(|w| w[0].sig == w[1].sig) {
16✔
1244
                                            let _ = err_send.send(SolveError::DuplicateSignature);
8✔
1245
                                            return;
4✔
1246
                                        }
1247
                                    }
1248

1249
                                    if TypeId::of::<E::LocalSig>() != TypeId::of::<S>()
187✔
1250
                                        && self.num_keys > Self::MAX_NO_LOCAL_SIG_CHECK
109✔
1251
                                    {
1252
                                        // Check for duplicate local signatures
1253

1254
                                        // E::SortSig provides a transmutable
1255
                                        // view of SigVal with an implementation
1256
                                        // of RadixKey that is compatible with
1257
                                        // the sort induced by the key returned
1258
                                        // by ShardEdge::sort_key, and equality
1259
                                        // that implies equality of local
1260
                                        // signatures.
1261

1262
                                        // SAFETY: we drop this immediately after sorting.
1263
                                        let shard = unsafe {
×
1264
                                            transmute::<
×
1265
                                                &mut Vec<SigVal<S, V>>,
×
1266
                                                &mut Vec<E::SortSigVal<V>>,
×
1267
                                            >(shard)
×
1268
                                        };
1269

1270
                                        let builder = shard.radix_sort_builder();
×
1271
                                        if self.max_num_threads == 1 {
×
1272
                                            builder
×
1273
                                                .with_single_threaded_tuner()
×
1274
                                                .with_parallel(false)
×
1275
                                        } else {
1276
                                            builder
×
1277
                                        }
1278
                                        .sort();
×
1279

1280
                                        let shard_len = shard.len();
×
1281
                                        shard.dedup();
×
1282

1283
                                        if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
×
1284
                                            // Duplicate local signatures on
1285
                                            // large filters can be simply
1286
                                            // removed: they do not change the
1287
                                            // semantics of the filter because
1288
                                            // hashes are computed on
1289
                                            // local signatures.
1290
                                            pl.info(format_args!(
×
1291
                                                "Removed signatures: {}",
×
1292
                                                shard_len - shard.len()
×
1293
                                            ));
1294
                                        } else {
1295
                                            // For function, we have to try again
1296
                                            if shard_len != shard.len() {
×
1297
                                                let _ = err_send
×
1298
                                                    .send(SolveError::DuplicateLocalSignature);
×
1299
                                                return;
×
1300
                                            }
1301
                                        }
1302
                                    } else if self.shard_edge.num_sort_keys() != 1 {
374✔
1303
                                        // Sorting the signatures increases locality
1304
                                        self.count_sort::<V>(shard);
374✔
1305
                                    }
1306
                                }
1307

1308
                                pl.done_with_count(shard.len());
187✔
1309

1310
                                main_pl.info(format_args!(
×
1311
                                    "Solving shard {}/{}...",
×
1312
                                    shard_index + 1,
×
1313
                                    num_shards
×
1314
                                ));
1315

1316
                                if self.failed.load(Ordering::Relaxed) {
×
1317
                                    return;
×
1318
                                }
1319

1320
                                if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
111✔
1321
                                    // For filters, we fill the array with random data, otherwise
1322
                                    // elements with signature 0 would have a significantly higher
1323
                                    // probability of being false positives.
1324
                                    //
1325
                                    // SAFETY: We work around the fact that [usize] does not implement Fill
1326
                                    Mwc192::seed_from_u64(self.seed).fill_bytes(unsafe {
333✔
1327
                                        data.as_mut_slice().align_to_mut::<u8>().1
111✔
1328
                                    });
1329
                                }
1330

1331
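                                // Try to solve the shard; on failure, report the
                                // error and stop this worker.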
                                if solve_shard(self, shard_index, shard, data, &mut pl).is_err() {
×
1332
                                    let _ = err_send.send(SolveError::UnsolvableShard);
260✔
1333
                                    return;
130✔
1334
                                }
1335

1336
                                if self.failed.load(Ordering::Relaxed) {
×
1337
                                    return;
×
1338
                                }
1339

1340
                                main_pl.info(format_args!(
×
1341
                                    "Completed shard {}/{}",
×
1342
                                    shard_index + 1,
×
1343
                                    num_shards
×
1344
                                ));
1345
                                main_pl.update_and_display();
×
1346
                            }
1347
                        }
1348
                    }
1349
                });
1350
            }
1351

1352
            drop(err_send);
392✔
1353
            drop(data_recv);
392✔
1354

1355
            if let Some(error) = err_recv.into_iter().next() {
330✔
1356
                self.failed.store(true, Ordering::Relaxed);
×
1357
                return Err(error);
×
1358
            }
1359

1360
            Ok(())
62✔
1361
        });
1362

1363
        main_pl.done();
392✔
1364
        result
196✔
1365
    }
1366

1367
    /// Peels a shard via edge indices.
1368
    ///
1369
    /// This peeler uses a [`SigVal`] per key (the shard), a
1370
    /// [`ShardEdge::Vertex`] and a byte per vertex (for the [`XorGraph`]), a
1371
    /// [`ShardEdge::Vertex`] per vertex (for the [`DoubleStack`]), and a final
1372
    /// byte per key (for the stack of sides).
1373
    ///
1374
    /// This peeler uses more memory than
1375
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) but
1376
    /// less memory than
1377
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem). It
1378
    /// is fairly slow as it has to go through a cache-unfriendly memory
1379
    /// indirection every time it has to retrieve a [`SigVal`] from the shard,
1380
    /// but it is the peeler of choice when [lazy Gaussian
1381
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) is required, as
1382
    /// after a failed peel-by-sig-vals it is not possible to retrieve
1383
    /// information about the signature/value pairs in the shard.
1384
    ///
1385
    /// In theory one could avoid the stack of sides by putting vertices,
1386
    /// instead of edge indices, on the upper stack, and retrieving edge indices
1387
    /// and sides from the [`XorGraph`], as
1388
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) does,
1389
    /// but this would be less cache-friendly. This peeler is only used for
1390
    /// very small instances, and since we are going to pass through lazy
1391
    /// Gaussian elimination, some additional speed is a good idea.
1392
    fn peel_by_index<'a, V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
165✔
1393
        &self,
1394
        shard_index: usize,
1395
        shard: Arc<Vec<SigVal<S, V>>>,
1396
        data: ShardData<'a, W, D>,
1397
        get_val: &G,
1398
        pl: &mut impl ProgressLog,
1399
    ) -> Result<PeelResult<'a, W, D, S, E, V>, ()> {
1400
        let shard_edge = &self.shard_edge;
330✔
1401
        let num_vertices = shard_edge.num_vertices();
495✔
1402
        let num_shards = shard_edge.num_shards();
495✔
1403

1404
        pl.start(format!(
495✔
1405
            "Generating graph for shard {}/{}...",
165✔
1406
            shard_index + 1,
165✔
1407
            num_shards
×
1408
        ));
1409

1410
        let mut xor_graph = XorGraph::<E::Vertex>::new(num_vertices);
495✔
1411
        for (edge_index, sig_val) in shard.iter().enumerate() {
1,081,030✔
1412
            for (side, &v) in shard_edge
3,242,100✔
1413
                .local_edge(shard_edge.local_sig(sig_val.sig))
×
1414
                .iter()
×
1415
                .enumerate()
×
1416
            {
1417
                xor_graph.add(v, edge_index.cast(), side);
×
1418
            }
1419
        }
1420
        pl.done_with_count(shard.len());
495✔
1421

1422
        assert!(
165✔
1423
            !xor_graph.overflow,
165✔
1424
            "Degree overflow for shard {}/{}",
×
1425
            shard_index + 1,
×
1426
            num_shards
×
1427
        );
1428

1429
        if self.failed.load(Ordering::Relaxed) {
495✔
1430
            return Err(());
×
1431
        }
1432

1433
        // The lower stack contains vertices to be visited. The upper stack
1434
        // contains peeled edges.
1435
        let mut double_stack = DoubleStack::<E::Vertex>::new(num_vertices);
×
1436
        let mut sides_stack = Vec::<u8>::new();
×
1437

1438
        pl.start(format!(
×
1439
            "Peeling graph for shard {}/{} by edge indices...",
×
1440
            shard_index + 1,
×
1441
            num_shards
×
1442
        ));
1443

1444
        // Preload all vertices of degree one in the visit stack
1445
        for (v, degree) in xor_graph.degrees().enumerate() {
1,223,124✔
1446
            if degree == 1 {
229,108✔
1447
                double_stack.push_lower(v.cast());
687,324✔
1448
            }
1449
        }
1450

1451
        while let Some(v) = double_stack.pop_lower() {
967,413✔
1452
            let v: usize = v.upcast();
×
1453
            if xor_graph.degree(v) == 0 {
×
1454
                continue;
32,940✔
1455
            }
1456
            debug_assert!(xor_graph.degree(v) == 1);
1,352,052✔
1457
            let (edge_index, side) = xor_graph.edge_and_side(v);
1,802,736✔
1458
            xor_graph.zero(v);
1,352,052✔
1459
            double_stack.push_upper(edge_index);
1,352,052✔
1460
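            // Record which endpoint of the edge was peeled; the side is needed
            // later during assignment.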
            sides_stack.push(side as u8);
1,352,052✔
1461

1462
            let e = shard_edge.local_edge(shard_edge.local_sig(shard[edge_index.upcast()].sig));
3,154,788✔
1463
            remove_edge!(xor_graph, e, side, edge_index, double_stack, push_lower);
450,684✔
1464
        }
1465

1466
        pl.done();
330✔
1467

1468
        if shard.len() != double_stack.upper_len() {
330✔
1469
            pl.info(format_args!(
465✔
1470
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
155✔
1471
                shard_index + 1,
155✔
1472
                num_shards,
×
1473
                double_stack.upper_len(),
310✔
1474
                shard.len(),
155✔
1475
            ));
1476
            return Ok(PeelResult::Partial {
155✔
1477
                shard_index,
310✔
1478
                shard,
310✔
1479
                data,
310✔
1480
                double_stack,
155✔
1481
                sides_stack,
155✔
1482
            });
1483
        }
1484

1485
        self.assign(
×
1486
            shard_index,
×
1487
            data,
×
1488
            double_stack
×
1489
                .iter_upper()
×
1490
                .map(|&edge_index| {
100✔
1491
                    // Turn edge indices into local edge signatures
1492
                    // and associated values
1493
                    let sig_val = &shard[edge_index.upcast()];
300✔
1494
                    let local_sig = shard_edge.local_sig(sig_val.sig);
400✔
1495
                    (
1496
                        local_sig,
200✔
1497
                        get_val(
200✔
1498
                            shard_edge,
200✔
1499
                            SigVal {
100✔
1500
                                sig: local_sig,
100✔
1501
                                val: sig_val.val,
100✔
1502
                            },
1503
                        ),
1504
                    )
1505
                })
1506
                .zip(sides_stack.into_iter().rev()),
×
1507
            pl,
×
1508
        );
1509

1510
        Ok(PeelResult::Complete())
×
1511
    }
1512

1513
    /// Peels a shard via signature/value pairs using a stack of peeled
1514
    /// signature/value pairs.
1515
    ///
1516
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
1517
    /// it drops it immediately after building the graph.
1518
    ///
1519
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]), a
1520
    /// [`ShardEdge::Vertex`] per vertex (for the visit stack, although usually the stack
1521
    /// never contains more than a third of the vertices), and a [`SigVal`] and
1522
    /// a byte per key (for the stack of peeled edges).
1523
    ///
1524
    /// This is the fastest and most memory-consuming peeler. However, it has
1525
    /// just a small advantage during assignment with respect to
1526
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem), which
1527
    /// uses almost half the memory. It is the peeler of choice for low levels
1528
    /// of parallelism.
1529
    ///
1530
    /// This peeler cannot be used in conjunction with [lazy Gaussian
1531
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a failed
1532
    /// peeling it is not possible to retrieve information about the
1533
    /// signature/value pairs in the shard.
1534
    fn peel_by_sig_vals_high_mem<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
12✔
1535
        &self,
1536
        shard_index: usize,
1537
        shard: Arc<Vec<SigVal<S, V>>>,
1538
        data: ShardData<'_, W, D>,
1539
        get_val: &G,
1540
        pl: &mut impl ProgressLog,
1541
    ) -> Result<(), ()>
1542
    where
1543
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
1544
    {
1545
        let shard_edge = &self.shard_edge;
24✔
1546
        let num_vertices = shard_edge.num_vertices();
36✔
1547
        let num_shards = shard_edge.num_shards();
36✔
1548
        let shard_len = shard.len();
24✔
1549

1550
        pl.start(format!(
36✔
1551
            "Generating graph for shard {}/{}...",
12✔
1552
            shard_index + 1,
12✔
1553
            num_shards
×
1554
        ));
1555

1556
        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
36✔
1557
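        // Build the graph: each signature/value pair contributes one hyperedge,
        // registered at each of its endpoints.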
        for &sig_val in shard.iter() {
12,000,024✔
1558
            let local_sig = shard_edge.local_sig(sig_val.sig);
×
1559
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
36,000,000✔
1560
                xor_graph.add(
×
1561
                    v,
×
1562
                    SigVal {
×
1563
                        sig: local_sig,
×
1564
                        val: sig_val.val,
×
1565
                    },
1566
                    side,
×
1567
                );
1568
            }
1569
        }
1570
        pl.done_with_count(shard.len());
36✔
1571

1572
        // We are using a consuming iterator over the shard store, so this
1573
        // drop will free the memory used by the signatures
1574
        drop(shard);
24✔
1575

1576
        assert!(
12✔
1577
            !xor_graph.overflow,
12✔
1578
            "Degree overflow for shard {}/{}",
×
1579
            shard_index + 1,
×
1580
            num_shards
×
1581
        );
1582

1583
        if self.failed.load(Ordering::Relaxed) {
36✔
1584
            return Err(());
×
1585
        }
1586

1587
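        // Peeled signature/value pairs and the corresponding sides, in peeling order.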
        let mut sig_vals_stack = FastStack::<SigVal<E::LocalSig, V>>::new(shard_len);
×
1588
        let mut sides_stack = FastStack::<u8>::new(shard_len);
×
1589
        // Experimentally this stack never grows beyond a little more than
1590
        // num_vertices / 4
1591
        let mut visit_stack = Vec::<E::Vertex>::with_capacity(num_vertices / 3);
×
1592

1593
        pl.start(format!(
×
1594
            "Peeling graph for shard {}/{} by signatures (high-mem)...",
×
1595
            shard_index + 1,
×
1596
            num_shards
×
1597
        ));
1598

1599
        // Preload all vertices of degree one in the visit stack
1600
        for (v, degree) in xor_graph.degrees().enumerate() {
13,565,952✔
1601
            if degree == 1 {
2,530,634✔
1602
                visit_stack.push(v.cast());
7,591,902✔
1603
            }
1604
        }
1605

1606
        while let Some(v) = visit_stack.pop() {
25,122,948✔
1607
            let v: usize = v.upcast();
×
1608
            if xor_graph.degree(v) == 0 {
×
1609
                continue;
561,468✔
1610
            }
1611
            let (sig_val, side) = xor_graph.edge_and_side(v);
48,000,000✔
1612
            xor_graph.zero(v);
36,000,000✔
1613
            sig_vals_stack.push(sig_val);
36,000,000✔
1614
            sides_stack.push(side as u8);
36,000,000✔
1615

1616
            let e = self.shard_edge.local_edge(sig_val.sig);
48,000,000✔
1617
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push);
12,000,000✔
1618
        }
1619

1620
        pl.done();
24✔
1621

1622
        if shard_len != sig_vals_stack.len() {
24✔
1623
            pl.info(format_args!(
×
1624
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
×
1625
                shard_index + 1,
×
1626
                num_shards,
×
1627
                sig_vals_stack.len(),
×
1628
                shard_len
×
1629
            ));
1630
            return Err(());
×
1631
        }
1632

1633
        self.assign(
×
1634
            shard_index,
×
1635
            data,
×
1636
            sig_vals_stack
×
1637
                .iter()
×
1638
                .rev()
×
1639
                .map(|&sig_val| (sig_val.sig, get_val(shard_edge, sig_val)))
36,000,000✔
1640
                .zip(sides_stack.iter().copied().rev()),
×
1641
            pl,
×
1642
        );
1643

1644
        Ok(())
×
1645
    }
1646

1647
    /// Peels a shard via signature/value pairs using a stack of vertices to
1648
    /// represent peeled edges.
1649
    ///
1650
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
1651
    /// it drops it immediately after building the graph.
1652
    ///
1653
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]) and a
1654
    /// [`ShardEdge::Vertex`] per vertex (for a [`DoubleStack`]).
1655
    ///
1656
    /// This is by far the least memory-hungry peeler, and it is only slightly
1657
    /// slower than
1658
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem),
1659
    /// which uses almost twice the memory. It is the peeler of choice for
1660
    /// significant levels of parallelism.
1661
    ///
1662
    /// This peeler cannot be used in conjunction with [lazy Gaussian
1663
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a
1664
    /// failed peeling it is not possible to retrieve information about the
1665
    /// signature/value pairs in the shard.
1666
    fn peel_by_sig_vals_low_mem<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
10✔
1667
        &self,
1668
        shard_index: usize,
1669
        shard: Arc<Vec<SigVal<S, V>>>,
1670
        data: ShardData<'_, W, D>,
1671
        get_val: &G,
1672
        pl: &mut impl ProgressLog,
1673
    ) -> Result<(), ()>
1674
    where
1675
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
1676
    {
1677
        let shard_edge = &self.shard_edge;
20✔
1678
        let num_vertices = shard_edge.num_vertices();
30✔
1679
        let num_shards = shard_edge.num_shards();
30✔
1680
        let shard_len = shard.len();
20✔
1681

1682
        pl.start(format!(
30✔
1683
            "Generating graph for shard {}/{}...",
10✔
1684
            shard_index + 1,
10✔
1685
            num_shards,
×
1686
        ));
1687

1688
        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
30✔
1689
        for &sig_val in shard.iter() {
10,000,020✔
1690
            let local_sig = shard_edge.local_sig(sig_val.sig);
×
1691
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
30,000,000✔
1692
                xor_graph.add(
×
1693
                    v,
×
1694
                    SigVal {
×
1695
                        sig: local_sig,
×
1696
                        val: sig_val.val,
×
1697
                    },
1698
                    side,
×
1699
                );
1700
            }
1701
        }
1702
        pl.done_with_count(shard.len());
30✔
1703

1704
        // We are using a consuming iterator over the shard store, so this
1705
        // drop will free the memory used by the signatures
1706
        drop(shard);
20✔
1707

1708
        assert!(
10✔
1709
            !xor_graph.overflow,
10✔
1710
            "Degree overflow for shard {}/{}",
×
1711
            shard_index + 1,
×
1712
            num_shards
×
1713
        );
1714

1715
        if self.failed.load(Ordering::Relaxed) {
30✔
1716
            return Err(());
×
1717
        }
1718

1719
        let mut visit_stack = DoubleStack::<E::Vertex>::new(num_vertices);
×
1720

1721
        pl.start(format!(
×
1722
            "Peeling graph for shard {}/{} by signatures (low-mem)...",
×
1723
            shard_index + 1,
×
1724
            num_shards
×
1725
        ));
1726

1727
        // Preload all vertices of degree one in the visit stack
1728
        for (v, degree) in xor_graph.degrees().enumerate() {
11,304,960✔
1729
            if degree == 1 {
2,108,606✔
1730
                visit_stack.push_lower(v.cast());
6,325,818✔
1731
            }
1732
        }
1733

1734
        while let Some(v) = visit_stack.pop_lower() {
20,935,042✔
1735
            let v: usize = v.upcast();
×
1736
            if xor_graph.degree(v) == 0 {
×
1737
                continue;
467,516✔
1738
            }
1739
            let (sig_val, side) = xor_graph.edge_and_side(v);
40,000,000✔
1740
            xor_graph.zero(v);
30,000,000✔
1741
            visit_stack.push_upper(v.cast());
40,000,000✔
1742

1743
            let e = self.shard_edge.local_edge(sig_val.sig);
40,000,000✔
1744
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push_lower);
10,000,000✔
1745
        }
1746

1747
        pl.done();
20✔
1748

1749
        if shard_len != visit_stack.upper_len() {
20✔
1750
            pl.info(format_args!(
×
1751
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
×
1752
                shard_index + 1,
×
1753
                num_shards,
×
1754
                visit_stack.upper_len(),
×
1755
                shard_len
×
1756
            ));
1757
            return Err(());
×
1758
        }
1759

1760
        self.assign(
×
1761
            shard_index,
×
1762
            data,
×
1763
            visit_stack.iter_upper().map(|&v| {
10,000,000✔
1764
                let (sig_val, side) = xor_graph.edge_and_side(v.upcast());
50,000,000✔
1765
                ((sig_val.sig, get_val(shard_edge, sig_val)), side as u8)
30,000,000✔
1766
            }),
1767
            pl,
×
1768
        );
1769

1770
        Ok(())
×
1771
    }
1772

1773
    /// Solves a shard of given index possibly using [lazy Gaussian
1774
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517), and stores the
1775
    /// solution in the given data.
1776
    ///
1777
    /// As a first try, the shard is [peeled by index](VBuilder::peel_by_index).
1778
    /// If the peeling is [partial](PeelResult::Partial), lazy Gaussian
1779
    /// elimination is used to solve the remaining edges.
1780
    ///
1781
    /// This method will scan the double stack, without emptying it, to check
1782
    /// which edges have been peeled. This information will then be passed to
1783
    /// [`VBuilder::assign`] to complete the assignment of values.
1784
    fn lge_shard<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
165✔
1785
        &self,
1786
        shard_index: usize,
1787
        shard: Arc<Vec<SigVal<S, V>>>,
1788
        data: ShardData<'_, W, D>,
1789
        get_val: &G,
1790
        pl: &mut impl ProgressLog,
1791
    ) -> Result<(), ()> {
1792
        let shard_edge = &self.shard_edge;
330✔
1793
        // Let's try to peel first
1794
        match self.peel_by_index(shard_index, shard, data, get_val, pl) {
1,155✔
1795
            Err(()) => Err(()),
×
1796
            Ok(PeelResult::Complete()) => Ok(()),
10✔
1797
            Ok(PeelResult::Partial {
×
1798
                shard_index,
155✔
1799
                shard,
155✔
1800
                mut data,
155✔
1801
                double_stack,
155✔
1802
                sides_stack,
155✔
1803
            }) => {
×
1804
                pl.info(format_args!("Switching to lazy Gaussian elimination..."));
620✔
1805
                // Likely result: we have to solve the rest
1806
                pl.start(format!(
465✔
1807
                    "Generating system for shard {}/{}...",
155✔
1808
                    shard_index + 1,
155✔
1809
                    shard_edge.num_shards()
310✔
1810
                ));
1811

1812
                let num_vertices = shard_edge.num_vertices();
465✔
1813
                let mut peeled_edges = BitVec::new(shard.len());
465✔
1814
                let mut used_vars = BitVec::new(num_vertices);
465✔
1815
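                // Mark the edges that were successfully peeled; the remaining
                // ones will form the linear system.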
                for &edge in double_stack.iter_upper() {
450,894✔
1816
                    peeled_edges.set(edge.upcast(), true);
×
1817
                }
1818

1819
                // Create data for a system using non-peeled edges
1820
                //
1821
                // SAFETY: there is no undefined behavior here, but the
1822
                // raw construction methods we use assume that the
1823
                // equations are sorted, that the variables are not repeated,
1824
                // and the variables are in the range [0..num_vertices).
1825
                let mut system = unsafe {
1826
                    crate::utils::mod2_sys::Modulo2System::from_parts(
1827
                        num_vertices,
155✔
1828
                        shard
155✔
1829
                            .iter()
155✔
1830
                            .enumerate()
155✔
1831
                            .filter(|(edge_index, _)| !peeled_edges[*edge_index])
1,080,755✔
1832
                            .map(|(_edge_index, sig_val)| {
630,171✔
1833
                                let local_sig = shard_edge.local_sig(sig_val.sig);
2,520,064✔
1834
                                let mut eq: Vec<_> = shard_edge
2,520,064✔
1835
                                    .local_edge(local_sig)
630,016✔
1836
                                    .iter()
630,016✔
1837
                                    .map(|&x| {
2,520,064✔
1838
                                        used_vars.set(x, true);
5,670,144✔
1839
                                        x as u32
1,890,048✔
1840
                                    })
1841
                                    .collect();
630,016✔
1842
                                eq.sort_unstable();
630,016✔
1843
                                crate::utils::mod2_sys::Modulo2Equation::from_parts(
630,016✔
1844
                                    eq,
630,016✔
1845
                                    get_val(
1,260,032✔
1846
                                        shard_edge,
1,260,032✔
1847
                                        SigVal {
630,016✔
1848
                                            sig: local_sig,
630,016✔
1849
                                            val: sig_val.val,
630,016✔
1850
                                        },
1851
                                    ),
1852
                                )
1853
                            })
1854
                            .collect(),
155✔
1855
                    )
1856
                };
1857

1858
                if self.failed.load(Ordering::Relaxed) {
465✔
1859
                    return Err(());
×
1860
                }
1861

1862
                pl.start("Solving system...");
×
1863
                let result = system.lazy_gaussian_elimination().map_err(|_| ())?;
155✔
1864
                pl.done_with_count(system.num_equations());
×
1865

1866
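                // Copy the solution of the system into the backing data,
                // restricted to the variables appearing in some equation.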
                for (v, &value) in result.iter().enumerate().filter(|(v, _)| used_vars[*v]) {
1,749,491✔
1867
                    data.set(v, value);
×
1868
                }
1869

1870
                self.assign(
×
1871
                    shard_index,
×
1872
                    data,
×
1873
                    double_stack
×
1874
                        .iter_upper()
×
1875
                        .map(|&edge_index| {
420,614✔
1876
                            let sig_val = &shard[edge_index.upcast()];
1,261,842✔
1877
                            let local_sig = shard_edge.local_sig(sig_val.sig);
1,682,456✔
1878
                            (
1879
                                local_sig,
841,228✔
1880
                                get_val(
841,228✔
1881
                                    shard_edge,
841,228✔
1882
                                    SigVal {
420,614✔
1883
                                        sig: local_sig,
420,614✔
1884
                                        val: sig_val.val,
420,614✔
1885
                                    },
1886
                                ),
1887
                            )
1888
                        })
1889
                        .zip(sides_stack.into_iter().rev()),
×
1890
                    pl,
×
1891
                );
1892
                Ok(())
×
1893
            }
1894
        }
1895
    }
1896

1897
    /// Performs the assignment of values based on peeling data.
1898
    ///
1899
    /// This method might be called after a successful peeling procedure, or
1900
    /// after a linear solver has been used to solve the remaining edges.
1901
    ///
1902
    /// `sigs_vals_sides` is an iterator returning pairs of signature/value pairs
1903
    /// and sides in reverse peeling order.
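    ///
    /// For example, assuming the local edge of a peeled signature is
    /// `[v0, v1, v2]` and the peeled side is `0`, the update is conceptually:
    ///
    /// ```ignore
    /// // Choose data[v0] so that data[v0] ^ data[v1] ^ data[v2] == val
    /// data[v0] = val ^ data[v1] ^ data[v2];
    /// ```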
1904
    fn assign(
57✔
1905
        &self,
1906
        shard_index: usize,
1907
        mut data: ShardData<'_, W, D>,
1908
        sigs_vals_sides: impl Iterator<Item = ((E::LocalSig, W), u8)>,
1909
        pl: &mut impl ProgressLog,
1910
    ) {
1911
        if self.failed.load(Ordering::Relaxed) {
171✔
1912
            return;
×
1913
        }
1914

1915
        pl.start(format!(
×
1916
            "Assigning values for shard {}/{}...",
×
1917
            shard_index + 1,
×
1918
            self.shard_edge.num_shards()
×
1919
        ));
1920

1921
        for ((sig, val), side) in sigs_vals_sides {
89,682,913✔
1922
            let edge = self.shard_edge.local_edge(sig);
89,682,856✔
1923
            let side = side as usize;
44,841,428✔
1924
            unsafe {
1925
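                // The value at the peeled endpoint edge[side] is chosen so that
                // the XOR of the values at the three endpoints equals val.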
                let xor = match side {
44,841,428✔
1926
                    0 => data.get_unchecked(edge[1]) ^ data.get_unchecked(edge[2]),
3,935,744✔
1927
                    1 => data.get_unchecked(edge[0]) ^ data.get_unchecked(edge[2]),
34,066,525✔
1928
                    2 => data.get_unchecked(edge[0]) ^ data.get_unchecked(edge[1]),
58,358,325✔
1929
                    _ => core::hint::unreachable_unchecked(),
1930
                };
1931

1932
                data.set_unchecked(edge[side], val ^ xor);
×
1933
            }
1934
        }
1935
        pl.done();
57✔
1936
    }
1937
}