/src/func/vbuilder.rs
/*
 * SPDX-FileCopyrightText: 2025 Sebastiano Vigna
 *
 * SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
 */

#![allow(clippy::type_complexity)]
#![allow(clippy::too_many_arguments)]

use crate::bits::*;
use crate::dict::VFilter;
use crate::func::{shard_edge::ShardEdge, *};
use crate::traits::bit_field_slice::{BitFieldSlice, BitFieldSliceMut, Word};
use crate::utils::*;
use common_traits::{
    CastableInto, DowncastableFrom, DowncastableInto, UnsignedInt, UpcastableFrom, UpcastableInto,
};
use derivative::Derivative;
use derive_setters::*;
use dsi_progress_logger::*;
use epserde::prelude::*;
use log::info;
use rand::rngs::SmallRng;
use rand::SeedableRng;
use rand::{Rng, RngCore};
use rayon::iter::ParallelIterator;
use rayon::slice::ParallelSlice;
use rdst::*;
use std::any::TypeId;
use std::borrow::Cow;
use std::hint::unreachable_unchecked;
use std::marker::PhantomData;
use std::mem::transmute;
use std::ops::{BitXor, BitXorAssign};
use std::slice::Iter;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::Instant;
use thread_priority::ThreadPriority;

use super::shard_edge::FuseLge3Shards;

const LOG2_MAX_SHARDS: u32 = 16;

/// A builder for [`VFunc`] and [`VFilter`].
///
/// Keys must implement the [`ToSig`] trait, which provides a method to compute
/// a signature of the key.
///
/// There are two construction modes: in core memory (default) and
/// [offline](VBuilder::offline); both use a [`SigStore`]. In the first case,
/// space will be allocated in core memory for signatures and associated values
/// for all keys; in the second case, such information will be stored in a
/// number of on-disk buckets.
///
/// There are several setters: for example, you can [set the maximum number of
/// threads](VBuilder::max_num_threads).
///
/// Once signatures have been computed, each parallel thread will process a
/// shard of the signature/value pairs. For very large key sets shards will be
/// significantly smaller than the number of keys, so the memory usage, in
/// particular in offline mode, can be significantly reduced. Note that using
/// too many threads might actually be harmful due to memory contention.
///
/// The generic parameters are explained in the [`VFunc`] documentation. You
/// have to choose the type of the output values and the backend. The remaining
/// parameters have default values that are the same as those of
/// [`VFunc`]/[`VFilter`], and some elaboration about them can be found in their
/// documentation.
///
/// All construction methods require passing one or two [`RewindableIoLender`]s
/// (keys and possibly values); the construction might fail, in which case the
/// keys might be scanned again. The structures in the [`lenders`] module provide easy ways
/// to build such lenders, even starting from compressed files of UTF-8 strings.
/// The type of the keys of the resulting filter or function will be the type of
/// the elements of the [`RewindableIoLender`].
///
/// # Examples
///
/// In this example, we build a function that maps each key to itself using a
/// boxed slice of `usize` as a backend (note that this is really wasteful). The
/// setter for the expected number of keys is used to optimize the construction.
/// We use the [`FromIntoIterator`] adapter to turn a clonable [`IntoIterator`]
/// into a [`RewindableIoLender`]. Note that you need the
/// [`dsi-progress-logger`](https://crates.io/crates/dsi-progress-logger) crate.
///
/// Type inference derives the input type (`usize`) from the type of the items
/// returned by the first [`RewindableIoLender`], and the output type (again,
/// `usize`, the first type parameter), from the backend type (`Box<[usize]>`,
/// the second type parameter):
///
/// ```rust
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// use sux::func::VBuilder;
/// use dsi_progress_logger::no_logging;
/// use sux::utils::FromIntoIterator;
///
/// let builder = VBuilder::<_, Box<[usize]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Alternatively, we can use the bit-field vector backend, which will use
/// ⌈log₂(99)⌉ bits per backend element:
///
/// ```rust
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// use sux::func::VBuilder;
/// use dsi_progress_logger::no_logging;
/// use sux::utils::FromIntoIterator;
/// use sux::bits::BitFieldVec;
///
/// let builder = VBuilder::<_, BitFieldVec<usize>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Since the numbers are small, we can also try to use a fixed-size output:
/// type inference takes care of making the second range `0..100` a range of
/// `u8`. Note that the type of keys is always `usize`, as it is still inferred
/// from the type of the items returned by the first [`RewindableIoLender`]:
///
/// ```rust
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// use sux::func::VBuilder;
/// use dsi_progress_logger::no_logging;
/// use sux::utils::FromIntoIterator;
/// use sux::bits::BitFieldVec;
///
/// let builder = VBuilder::<_, Box<[u8]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
///
/// We now try to build a fast 8-bit filter for the same key set, using a boxed
/// slice of `u8` as a backend (this is not wasteful, as the filter uses 8-bit
/// hashes):
///
/// ```rust
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// use sux::func::VBuilder;
/// use dsi_progress_logger::no_logging;
/// use sux::utils::FromIntoIterator;
///
/// let builder = VBuilder::<_, Box<[u8]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(func[i]);
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Since the keys are very few, we can switch to 64-bit signatures, and no
/// shards, which will yield faster queries:
///
/// ```rust
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// use sux::func::VBuilder;
/// use sux::func::shard_edge::FuseLge3NoShards;
/// use dsi_progress_logger::no_logging;
/// use sux::utils::FromIntoIterator;
///
/// let builder = VBuilder::<_, Box<[u8]>, [u64; 1], FuseLge3NoShards>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(func[i]);
/// }
/// #     Ok(())
/// # }
/// ```

#[derive(Setters, Debug, Derivative)]
#[derivative(Default)]
#[setters(generate = false)]
pub struct VBuilder<
    W: ZeroCopy + Word,
    D: BitFieldSlice<W> + Send + Sync = Box<[W]>,
    S = [u64; 2],
    E: ShardEdge<S, 3> = FuseLge3Shards,
> {
    /// The expected number of keys.
    ///
    /// While this setter is optional, setting this value to a reasonable bound
    /// on the actual number of keys will significantly speed up the
    /// construction.
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "None"))]
    expected_num_keys: Option<usize>,

    /// The maximum number of parallel threads to use. The default is 8.
    #[setters(generate = true)]
    #[derivative(Default(value = "8"))]
    max_num_threads: usize,

    /// Use disk-based buckets to reduce core memory usage at construction time.
    #[setters(generate = true)]
    offline: bool,

    /// Check for duplicated signatures. This is not necessary in general,
    /// but if you suspect you might be feeding duplicate keys, you can
    /// enable this check.
    #[setters(generate = true)]
    check_dups: bool,
    /// Always use the low-memory peel-by-signature algorithm (true) or the
    /// high-memory peel-by-index algorithm (false). The former is slightly
    /// slower, but it uses much less memory. Normally [`VBuilder`] uses
    /// high-mem and switches to low-mem if there are more than three threads
    /// and more than two shards.
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "None"))]
    low_mem: Option<bool>,

    /// The seed for the random number generator.
    #[setters(generate = true)]
    seed: u64,
    /// The base-2 logarithm of the number of buckets of the [`SigStore`]. The
    /// default is 8.
    /// This value is automatically overridden, even if set, if you provide an
    /// [expected number of keys](VBuilder::expected_num_keys).
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "8"))]
    log2_buckets: u32,
    /// The target relative space loss due to [ε-cost
    /// sharding](https://doi.org/10.4230/LIPIcs.ESA.2019.38).
    ///
    /// The default is 0.001. Setting a larger target, for example, 0.01, will
    /// increase the space overhead due to sharding, but will in general provide
    /// finer shards. This might not always happen, however, because the ε-cost
    /// bound is just one of the bounds concurring in limiting the amount of
    /// sharding for a specific [`ShardEdge`]. For example, increasing the
    /// target to 0.01 will provide very fine sharding using `Mwhc3Shards`
    /// shard/edge logic, activated by the `mwhc` feature, but will only
    /// slightly affect [`FuseLge3Shards`] or
    /// [`FuseLge3FullSigs`](crate::func::shard_edge::FuseLge3FullSigs).
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "0.001"))]
    eps: f64,

    /// The bit width of the maximum value.
    bit_width: usize,
    /// The edge generator.
    shard_edge: E,
    /// The number of keys.
    num_keys: usize,
    /// The ratio between the number of vertices and the number of edges
    /// (i.e., between the number of variables and the number of equations).
    c: f64,
    /// Whether we should use [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517).
    lge: bool,
    /// The number of threads to use.
    num_threads: usize,
    /// Fast-stop for failed attempts.
    failed: AtomicBool,
    #[doc(hidden)]
    _marker_v: PhantomData<(W, D, S)>,
}

/// Fatal build errors.
#[derive(thiserror::Error, Debug)]
pub enum BuildError {
    #[error("Duplicate key")]
    /// A duplicate key was detected.
    DuplicateKey,
    #[error("Duplicate local signatures: use full signatures")]
    /// Duplicate local signatures were detected.
    DuplicateLocalSignatures,
    #[error("Value too large for specified bit size")]
    /// A value is too large for the specified bit size.
    ValueTooLarge,
}

#[derive(thiserror::Error, Debug)]
/// Transient error during the build, leading to
/// trying with a different seed.
pub enum SolveError {
    #[error("Duplicate signature")]
    /// A duplicate signature was detected.
    DuplicateSignature,
    #[error("Duplicate local signature")]
    /// A duplicate local signature was detected.
    DuplicateLocalSignature,
    #[error("Max shard too big")]
    /// The maximum shard is too big relative to the average shard.
    MaxShardTooBig,
    #[error("Unsolvable shard")]
    /// A shard cannot be solved.
    UnsolvableShard,
}

/// The result of a peeling procedure.
enum PeelResult<
    'a,
    W: ZeroCopy + Word + Send + Sync,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync + 'a,
    S: Sig + ZeroCopy + Send + Sync,
    E: ShardEdge<S, 3>,
    V: ZeroCopy,
> {
    Complete(),
    Partial {
        /// The shard index.
        shard_index: usize,
        /// The shard.
        shard: Arc<Vec<SigVal<S, V>>>,
        /// The data.
        data: ShardData<'a, W, D>,
        /// The double stack whose upper stack contains the peeled edges
        /// (possibly represented by the vertex from which they have been
        /// peeled).
        double_stack: DoubleStack<E::Vertex>,
        /// The sides stack.
        sides_stack: Vec<u8>,
    },
}

/// A graph represented compactly.
///
/// Each (*k*-hyper)edge is a set of *k* vertices (by construction fuse graphs
/// do not have degenerate edges), but we represent it internally as a vector.
/// We call *side* the position of a vertex in the edge.
///
/// For each vertex we store information about the edges incident to the vertex
/// and the sides of the vertex in such edges. While technically not necessary
/// to perform peeling, the knowledge of the sides speeds up the peeling visit
/// by reducing the number of tests that are necessary to update the degrees
/// once the edge is peeled (see the `peel_by_*` methods). For the same reason
/// it also speeds up assignment.
///
/// Depending on the peeling method (by signature or by index), the graph will
/// store edge indices or signature/value pairs (the generic parameter `X`).
///
/// Edge information is packed together using Djamal's XOR trick (see
/// [“Cache-Oblivious Peeling of Random
/// Hypergraphs”](https://doi.org/10.1109/DCC.2014.48)): since during the
/// peeling visit we need to know the content of the list only when a single
/// edge index is present, we can XOR together all the edge information.
///
/// We use a single byte to store the degree (six upper bits) and the XOR of the
/// sides (lower two bits). The degree can be stored with a small number of bits
/// because the graph is random, so the maximum degree is *O*(log log *n*).
/// In any case, the Boolean field `overflow` will become `true` in case of
/// overflow.
///
/// When we peel an edge, we just [zero the degree](Self::zero), leaving the
/// edge information and the side in place for further processing later.
struct XorGraph<X: Copy + Default + BitXor + BitXorAssign> {
    edges: Box<[X]>,
    degrees_sides: Box<[u8]>,
    overflow: bool,
}

impl<X: BitXor + BitXorAssign + Default + Copy> XorGraph<X> {
    pub fn new(n: usize) -> XorGraph<X> {
        XorGraph {
            edges: vec![X::default(); n].into(),
            degrees_sides: vec![0; n].into(),
            overflow: false,
        }
    }

    #[inline(always)]
    pub fn add(&mut self, v: usize, x: X, side: usize) {
        debug_assert!(side < 3);
        let (degree_size, overflow) = self.degrees_sides[v].overflowing_add(4);
        self.degrees_sides[v] = degree_size;
        self.overflow |= overflow;
        self.degrees_sides[v] ^= side as u8;
        self.edges[v] ^= x;
    }

    #[inline(always)]
    pub fn remove(&mut self, v: usize, x: X, side: usize) {
        debug_assert!(side < 3);
        self.degrees_sides[v] -= 4;
        self.degrees_sides[v] ^= side as u8;
        self.edges[v] ^= x;
    }

    #[inline(always)]
    pub fn zero(&mut self, v: usize) {
        self.degrees_sides[v] &= 0b11;
    }

    #[inline(always)]
    pub fn edge_and_side(&self, v: usize) -> (X, usize) {
        debug_assert!(self.degree(v) < 2);
        (self.edges[v] as _, (self.degrees_sides[v] & 0b11) as _)
    }

    #[inline(always)]
    pub fn degree(&self, v: usize) -> u8 {
        self.degrees_sides[v] >> 2
    }

    pub fn degrees(
        &self,
    ) -> std::iter::Map<std::iter::Copied<std::slice::Iter<'_, u8>>, fn(u8) -> u8> {
        self.degrees_sides.iter().copied().map(|d| d >> 2)
    }
}
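
// The following sketch is added for exposition and is not part of the original
// source: it demonstrates, on a plain `u8`, the degree/side packing described
// in the documentation of `XorGraph` (degree in the upper six bits, XOR of the
// sides in the lower two bits). The module and test names are hypothetical.
#[cfg(test)]
mod degree_side_packing_sketch {
    #[test]
    fn pack_and_unpack() {
        let mut degree_side = 0_u8;
        // Add three incident edges with sides 0, 1 and 2: each addition bumps
        // the degree (upper six bits) and XORs the side into the lower two bits.
        for side in [0_u8, 1, 2] {
            let (ds, overflow) = degree_side.overflowing_add(4);
            assert!(!overflow);
            degree_side = ds ^ side;
        }
        assert_eq!(degree_side >> 2, 3); // degree
        assert_eq!(degree_side & 0b11, 0 ^ 1 ^ 2); // XOR of the sides
        // Peel the edges with sides 0 and 2: decrement the degree and XOR the
        // sides out again.
        for side in [0_u8, 2] {
            degree_side -= 4;
            degree_side ^= side;
        }
        // A single incident edge is left, and its side can be read directly.
        assert_eq!(degree_side >> 2, 1);
        assert_eq!(degree_side & 0b11, 1);
    }
}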

/// A preallocated stack implementation that avoids the expensive (even if
/// rarely taken) branch of the `Vec` implementation in which memory is
/// reallocated. Note that using [`Vec::with_capacity`] is not enough, because
/// for the CPU the branch is still there.
struct FastStack<X: Copy + Default> {
    stack: Vec<X>,
    top: usize,
}

impl<X: Copy + Default> FastStack<X> {
    pub fn new(n: usize) -> FastStack<X> {
        FastStack {
            stack: vec![X::default(); n],
            top: 0,
        }
    }

    pub fn push(&mut self, x: X) {
        debug_assert!(self.top < self.stack.len());
        self.stack[self.top] = x;
        self.top += 1;
    }

    pub fn len(&self) -> usize {
        self.top
    }

    pub fn iter(&self) -> std::slice::Iter<'_, X> {
        self.stack[..self.top].iter()
    }
}
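
// Added for exposition (not in the original source): a `FastStack` never
// grows, so it must be created with the maximum number of elements that will
// ever be pushed; pushing beyond that capacity would trip the debug assertion
// (or panic on an out-of-bounds index in release builds). The test below is a
// hypothetical usage sketch.
#[cfg(test)]
mod fast_stack_usage_sketch {
    use super::FastStack;

    #[test]
    fn push_within_preallocated_capacity() {
        let mut stack = FastStack::<u32>::new(3);
        stack.push(1);
        stack.push(2);
        stack.push(3);
        assert_eq!(stack.len(), 3);
        assert_eq!(stack.iter().copied().collect::<Vec<_>>(), vec![1, 2, 3]);
    }
}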

/// Two stacks in the same vector.
///
/// This struct implements a pair of stacks sharing the same memory. The lower
/// stack grows from the beginning of the vector, the upper stack grows from the
/// end of the vector. Since we use the lower stack for the visit and the upper
/// stack for peeled edges (possibly represented by the vertex from which they
/// have been peeled), the sum of the lengths of the two stacks cannot exceed
/// the length of the vector.
#[derive(Debug)]
struct DoubleStack<V> {
    stack: Vec<V>,
    lower: usize,
    upper: usize,
}

impl<V: Default + Copy> DoubleStack<V> {
    fn new(n: usize) -> DoubleStack<V> {
        DoubleStack {
            stack: vec![V::default(); n],
            lower: 0,
            upper: n,
        }
    }
}

impl<V: Copy> DoubleStack<V> {
    #[inline(always)]
    fn push_lower(&mut self, v: V) {
        debug_assert!(self.lower < self.upper);
        self.stack[self.lower] = v;
        self.lower += 1;
    }

    #[inline(always)]
    fn push_upper(&mut self, v: V) {
        debug_assert!(self.lower < self.upper);
        self.upper -= 1;
        self.stack[self.upper] = v;
    }

    #[inline(always)]
    fn pop_lower(&mut self) -> Option<V> {
        if self.lower == 0 {
            None
        } else {
            self.lower -= 1;
            Some(self.stack[self.lower])
        }
    }

    fn upper_len(&self) -> usize {
        self.stack.len() - self.upper
    }

    fn iter_upper(&self) -> Iter<'_, V> {
        self.stack[self.upper..].iter()
    }
}
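
// Added for exposition (not in the original source): a hypothetical sketch of
// how the two stacks of a `DoubleStack` share a single allocation, so the sum
// of their lengths can never exceed the initial capacity. During peeling, the
// lower stack holds vertices still to be visited and the upper stack holds the
// peeled edges.
#[cfg(test)]
mod double_stack_usage_sketch {
    use super::DoubleStack;

    #[test]
    fn lower_and_upper_share_memory() {
        let mut stack = DoubleStack::<usize>::new(4);
        stack.push_lower(10);
        stack.push_lower(20);
        stack.push_upper(30);
        stack.push_upper(40);
        // The lower stack pops in LIFO order.
        assert_eq!(stack.pop_lower(), Some(20));
        assert_eq!(stack.pop_lower(), Some(10));
        assert_eq!(stack.pop_lower(), None);
        // The upper stack is read back starting from the last pushed element.
        assert_eq!(stack.upper_len(), 2);
        assert_eq!(stack.iter_upper().copied().collect::<Vec<_>>(), vec![40, 30]);
    }
}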

/// An iterator over segments of data associated with each shard.
type ShardDataIter<'a, W, D> = <D as BitFieldSliceMut<W>>::ChunksMut<'a>;
/// A segment of data associated with a specific shard.
type ShardData<'a, W, D> = <ShardDataIter<'a, W, D> as Iterator>::Item;

/// Builds a new function using a `Box<[W]>` to store values.
///
/// Since values are stored in a boxed slice, access is particularly fast, but
/// the bit width of the output of the function will be exactly the bit width of
/// the unsigned type `W`.
impl<W: ZeroCopy + Word, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, Box<[W]>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, W>: RadixKey,
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
{
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug>(
        mut self,
        keys: impl RewindableIoLender<T>,
        values: impl RewindableIoLender<W>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, Box<[W]>, S, E>>
    where
        for<'a> ShardDataIter<'a, W, Box<[W]>>: Send,
        for<'a> ShardData<'a, W, Box<[W]>>: Send,
    {
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();
        self.build_loop(keys, values, None, &get_val, new_data, pl)
    }
}
/// Builds a new filter using a `Box<[W]>` to store values.
///
/// Since values are stored in a boxed slice, access is particularly fast, but
/// the number of bits of the hashes will be exactly the bit width of the
/// unsigned type `W`.
impl<W: ZeroCopy + Word + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
    VBuilder<W, Box<[W]>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, EmptyVal>: RadixKey,
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
{
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug>(
        mut self,
        keys: impl RewindableIoLender<T>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, Box<[W]>, S, E>>>
    where
        for<'a> ShardDataIter<'a, W, Box<[W]>>: Send,
        for<'a> ShardData<'a, W, Box<[W]>>: Send,
    {
        let filter_mask = W::MAX;
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
            mix64(shard_edge.edge_hash(sig_val.sig)).downcast()
        };
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();

        Ok(VFilter {
            func: self.build_loop(
                keys,
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
                Some(W::BITS),
                &get_val,
                new_data,
                pl,
            )?,
            filter_mask,
            hash_bits: W::BITS as u32,
        })
    }
}
/// Builds a new function using a [bit-field vector](BitFieldVec) on words of
/// the unsigned type `W` to store values.
///
/// Since values are stored in a bit-field vector, access will be slower than
/// when using a boxed slice, but the bit width of stored values will be the
/// minimum necessary. In any case, it must be at most the bit width of `W`.
///
/// Typically `W` will be `usize` or `u64`.
impl<W: ZeroCopy + Word, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, BitFieldVec<W>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, W>: RadixKey,
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
{
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug>(
        mut self,
        keys: impl RewindableIoLender<T>,
        values: impl RewindableIoLender<W>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, BitFieldVec<W>, S, E>> {
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
        self.build_loop(keys, values, None, &get_val, new_data, pl)
    }
}
/// Builds a new filter using a [bit-field vector](BitFieldVec) on words of the
/// unsigned type `W` to store values.
///
/// Since values are stored in a bit-field vector, access will be slower than
/// when using a boxed slice, but the number of bits of the hashes can be set at
/// will. In any case, it must be at most the bit width of `W`.
///
/// Typically `W` will be `usize` or `u64`.
impl<W: ZeroCopy + Word + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
    VBuilder<W, BitFieldVec<W>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, EmptyVal>: RadixKey,
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
{
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug>(
        mut self,
        keys: impl RewindableIoLender<T>,
        filter_bits: usize,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, BitFieldVec<W>, S, E>>> {
        assert!(filter_bits > 0);
        assert!(filter_bits <= W::BITS);
        let filter_mask = W::MAX >> (W::BITS - filter_bits);
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
            <W as DowncastableFrom<u64>>::downcast_from(mix64(shard_edge.edge_hash(sig_val.sig)))
                & filter_mask
        };
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);

        Ok(VFilter {
            func: self.build_loop(
                keys,
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
                Some(filter_bits),
                &get_val,
                new_data,
                pl,
            )?,
            filter_mask,
            hash_bits: filter_bits as _,
        })
    }
}
impl<
        W: ZeroCopy + Word,
        D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
        S: Sig + Send + Sync,
        E: ShardEdge<S, 3>,
    > VBuilder<W, D, S, E>
{
    /// Builds and returns a new function with given keys and values.
    ///
    /// This method can build functions based both on vectors and on bit-field
    /// vectors. The necessary abstraction is provided by the
    /// `new_data(bit_width, len)` function, which is called to create the data
    /// structure to store the values.
    ///
    /// When `V` is [`EmptyVal`], this method builds a function supporting a
    /// filter by mapping each key to a mix of its local signature. The
    /// necessary abstraction is provided by the `get_val` function, which is
    /// called to extract the value from the signature/value pair; in the case
    /// of functions it returns the value stored in the signature/value pair,
    /// and in the case of filters it returns the hash associated with the
    /// signature.
    fn build_loop<
        T: ?Sized + ToSig<S> + std::fmt::Debug,
        V: ZeroCopy + Default + Send + Sync + Ord + UpcastableInto<u128>,
    >(
        &mut self,
        mut keys: impl RewindableIoLender<T>,
        mut values: impl RewindableIoLender<V>,
        bit_width: Option<usize>,
        get_val: &(impl Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync),
        new_data: fn(usize, usize) -> D,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
    where
        u128: UpcastableFrom<W>,
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        for<'a> ShardDataIter<'a, W, D>: Send,
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
    {
        let mut dup_count = 0;
        let mut local_dup_count = 0;
        let mut prng = SmallRng::seed_from_u64(self.seed);

        if let Some(expected_num_keys) = self.expected_num_keys {
            self.shard_edge.set_up_shards(expected_num_keys, self.eps);
            self.log2_buckets = self.shard_edge.shard_high_bits();
        }

        pl.info(format_args!("Using 2^{} buckets", self.log2_buckets));

        // Loop until success or duplicate detection
        loop {
            let seed = prng.random();

            match if self.offline {
                self.try_seed(
                    seed,
                    sig_store::new_offline::<S, V>(
                        self.log2_buckets,
                        LOG2_MAX_SHARDS,
                        self.expected_num_keys,
                    )?,
                    &mut keys,
                    &mut values,
                    bit_width,
                    get_val,
                    new_data,
                    pl,
                )
            } else {
                self.try_seed(
                    seed,
                    sig_store::new_online::<S, V>(
                        self.log2_buckets,
                        LOG2_MAX_SHARDS,
                        self.expected_num_keys,
                    )?,
                    &mut keys,
                    &mut values,
                    bit_width,
                    get_val,
                    new_data,
                    pl,
                )
            } {
                Ok(func) => {
                    return Ok(func);
                }
                Err(error) => {
                    match error.downcast::<SolveError>() {
                        Ok(vfunc_error) => match vfunc_error {
                            // Let's try another seed, but just a few times--most likely,
                            // duplicate keys
                            SolveError::DuplicateSignature => {
                                if dup_count >= 3 {
                                    pl.error(format_args!("Duplicate keys (duplicate 128-bit signatures with four different seeds)"));
                                    return Err(BuildError::DuplicateKey.into());
                                }
                                pl.warn(format_args!(
                                "Duplicate 128-bit signature, trying again with a different seed..."
                                ));
                                dup_count += 1;
                            }
                            // Let's try another seed, but just a few times
                            SolveError::DuplicateLocalSignature => {
                                if local_dup_count >= 2 {
                                    pl.error(format_args!("Duplicate local signatures: use full signatures (duplicate local signatures with three different seeds)"));
                                    return Err(BuildError::DuplicateLocalSignatures.into());
                                }
                                pl.warn(format_args!(
                                "Duplicate local signature, trying again with a different seed..."
                                ));
                                local_dup_count += 1;
                            }
                            SolveError::MaxShardTooBig => {
                                pl.warn(format_args!(
                                "The maximum shard is too big, trying again with a different seed..."
                                ));
                            }
                            // Let's just try another seed
                            SolveError::UnsolvableShard => {
                                pl.warn(format_args!(
                                    "Unsolvable shard, trying again with a different seed..."
                                ));
                            }
                        },
                        Err(error) => return Err(error),
                    }
                }
            }

            values = values.rewind()?;
            keys = keys.rewind()?;
        }
    }
}
impl<
        W: ZeroCopy + Word,
        D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
        S: Sig + Send + Sync,
        E: ShardEdge<S, 3>,
    > VBuilder<W, D, S, E>
{
    /// Tries to build a function using a specific seed. See the comments in the
    /// [`VBuilder::build_loop`] method for more details.
    ///
    /// This method reads the input, sets up the shards, allocates the backend
    /// using `new_data`, and passes the backend and an iterator on shards to
    /// the [`VBuilder::try_build_from_shard_iter`] method.
    fn try_seed<
        T: ?Sized + ToSig<S> + std::fmt::Debug,
        V: ZeroCopy + Default + Send + Sync + Ord + UpcastableInto<u128>,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &mut self,
        seed: u64,
        mut sig_store: impl SigStore<S, V>,
        keys: &mut impl RewindableIoLender<T>,
        values: &mut impl RewindableIoLender<V>,
        bit_width: Option<usize>,
        get_val: &G,
        new_data: fn(usize, usize) -> D,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
    where
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        for<'a> ShardDataIter<'a, W, D>: Send,
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
    {
        let shard_edge = &mut self.shard_edge;

        pl.expected_updates(self.expected_num_keys);
        pl.item_name("key");
        pl.start(format!(
            "Computing and storing {}-bit signatures in {} using seed 0x{:016x}...",
            std::mem::size_of::<S>() * 8,
            sig_store
                .temp_dir()
                .map(|d| d.path().to_string_lossy())
                .unwrap_or(Cow::Borrowed("memory")),
            seed
        ));

        // This will be the maximum value for functions and EmptyVal for filters
        let mut maybe_max_value = V::default();
        let start = Instant::now();

        while let Some(result) = keys.next() {
            match result {
                Ok(key) => {
                    pl.light_update();
                    // This might be an actual value, if we are building a
                    // function, or EmptyVal, if we are building a filter.
                    let &maybe_val = values.next().expect("Not enough values")?;
                    let sig_val = SigVal {
                        sig: T::to_sig(key, seed),
                        val: maybe_val,
                    };
                    maybe_max_value = Ord::max(maybe_max_value, maybe_val);
                    sig_store.try_push(sig_val)?;
                }
                Err(e) => return Err(e.into()),
            }
        }
        pl.done();

        self.num_keys = sig_store.len();
        self.bit_width = if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
            bit_width.expect("Bit width must be set for filters")
        } else {
            let len_width = maybe_max_value.upcast().len() as usize;
            if let Some(bit_width) = bit_width {
                if len_width > bit_width {
                    return Err(BuildError::ValueTooLarge.into());
                }
                bit_width
            } else {
                len_width
            }
        };

        info!(
            "Computation of signatures from inputs completed in {:.3} seconds ({} keys, {:.3} ns/key)",
            start.elapsed().as_secs_f64(),
            self.num_keys,
            start.elapsed().as_nanos() as f64 / self.num_keys as f64
        );

        let start = Instant::now();

        let shard_store = sig_store.into_shard_store(shard_edge.shard_high_bits())?;
        let max_shard = shard_store.shard_sizes().iter().copied().max().unwrap_or(0);
        let filter = TypeId::of::<V>() == TypeId::of::<EmptyVal>();

        shard_edge.set_up_shards(self.num_keys, self.eps);
        (self.c, self.lge) = shard_edge.set_up_graphs(self.num_keys, max_shard);

        if filter {
            pl.info(format_args!(
                "Number of keys: {} Bit width: {}",
                self.num_keys, self.bit_width,
            ));
        } else {
            pl.info(format_args!(
                "Number of keys: {} Max value: {} Bit width: {}",
                self.num_keys,
                maybe_max_value.upcast(),
                self.bit_width,
            ));
        }

        if shard_edge.shard_high_bits() != 0 {
            pl.info(format_args!(
                "Max shard / average shard: {:.2}%",
                (100.0 * max_shard as f64)
                    / (self.num_keys as f64 / shard_edge.num_shards() as f64)
            ));
        }

        if max_shard as f64 > 1.01 * self.num_keys as f64 / shard_edge.num_shards() as f64 {
            // This might sometimes happen with small sharded graphs
            Err(SolveError::MaxShardTooBig.into())
        } else {
            let data = new_data(
                self.bit_width,
                shard_edge.num_vertices() * shard_edge.num_shards(),
            );
            self.try_build_from_shard_iter(seed, data, shard_store.into_iter(), get_val, pl)
                .inspect(|_| {
                    info!(
                        "Construction from signatures completed in {:.3} seconds ({} keys, {:.3} ns/key)",
                        start.elapsed().as_secs_f64(),
                        self.num_keys,
                        start.elapsed().as_nanos() as f64 / self.num_keys as f64
                    );
                })
                .map_err(Into::into)
        }
    }

    /// Builds and returns a new function starting from an iterator on shards.
    ///
    /// See [`VBuilder::build_loop`] for more details on the parameters.
971
    fn try_build_from_shard_iter<
196✔
972
        T: ?Sized + ToSig<S>,
973
        I,
974
        P,
975
        V: ZeroCopy + Default + Send + Sync,
976
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
977
    >(
978
        &mut self,
979
        seed: u64,
980
        mut data: D,
981
        shard_iter: I,
982
        get_val: &G,
983
        pl: &mut P,
984
    ) -> Result<VFunc<T, W, D, S, E>, SolveError>
985
    where
986
        SigVal<S, V>: RadixKey,
987
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
988
        P: ProgressLog + Clone + Send + Sync,
989
        I: Iterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
990
        for<'a> ShardDataIter<'a, W, D>: Send,
991
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
992
    {
993
        let shard_edge = &self.shard_edge;
392✔
994
        self.num_threads = shard_edge.num_shards().min(self.max_num_threads);
588✔
995

996
        pl.info(format_args!("{}", self.shard_edge));
784✔
997
        pl.info(format_args!(
588✔
998
            "c: {}, Overhead: {:+.4}% Number of threads: {}",
196✔
999
            self.c,
×
1000
            100. * ((shard_edge.num_vertices() * shard_edge.num_shards()) as f64
588✔
1001
                / (self.num_keys as f64)
196✔
1002
                - 1.),
196✔
1003
            self.num_threads
×
1004
        ));
1005

1006
        if self.lge {
196✔
1007
            pl.info(format_args!("Peeling towards lazy Gaussian elimination"));
696✔
1008
            self.par_solve(
348✔
1009
                shard_iter,
174✔
1010
                &mut data,
174✔
1011
                |this, shard_index, shard, data, pl| {
165✔
1012
                    this.lge_shard(shard_index, shard, data, get_val, pl)
1,155✔
1013
                },
1014
                &mut pl.concurrent(),
174✔
1015
                pl,
174✔
1016
            )?;
1017
        } else if self.low_mem == Some(true)
22✔
1018
            || self.low_mem.is_none() && self.num_threads > 3 && shard_edge.num_shards() > 2
14✔
1019
        {
1020
            // Less memory, but slower
1021
            self.par_solve(
10✔
1022
                shard_iter,
×
1023
                &mut data,
×
1024
                |this, shard_index, shard, data, pl| {
10✔
1025
                    this.peel_by_sig_vals_low_mem(shard_index, shard, data, get_val, pl)
70✔
1026
                },
1027
                &mut pl.concurrent(),
×
1028
                pl,
×
1029
            )?;
1030
        } else {
1031
            // More memory, but faster
1032
            self.par_solve(
12✔
1033
                shard_iter,
×
1034
                &mut data,
×
1035
                |this, shard_index, shard, data, pl| {
12✔
1036
                    this.peel_by_sig_vals_high_mem(shard_index, shard, data, get_val, pl)
84✔
1037
                },
1038
                &mut pl.concurrent(),
×
1039
                pl,
×
1040
            )?;
1041
        }
1042

1043
        pl.info(format_args!(
62✔
1044
            "Bits/keys: {} ({:+.4}%)",
×
1045
            data.len() as f64 * self.bit_width as f64 / self.num_keys as f64,
×
1046
            100.0 * (data.len() as f64 / self.num_keys as f64 - 1.),
×
1047
        ));
1048

1049
        Ok(VFunc {
×
1050
            seed,
×
1051
            shard_edge: self.shard_edge,
×
1052
            num_keys: self.num_keys,
×
1053
            data,
×
1054
            _marker_t: std::marker::PhantomData,
×
1055
            _marker_w: std::marker::PhantomData,
×
1056
            _marker_s: std::marker::PhantomData,
×
1057
        })
1058
    }
1059
}
1060

1061
macro_rules! remove_edge {
1062
    ($xor_graph: ident, $e: ident, $side: ident, $edge: ident, $stack: ident, $push:ident) => {
1063
        match $side {
1064
            0 => {
1065
                if $xor_graph.degree($e[1]) == 2 {
1066
                    $stack.$push($e[1].cast());
1067
                }
1068
                $xor_graph.remove($e[1], $edge, 1);
1069
                if $xor_graph.degree($e[2]) == 2 {
1070
                    $stack.$push($e[2].cast());
1071
                }
1072
                $xor_graph.remove($e[2], $edge, 2);
1073
            }
1074
            1 => {
1075
                if $xor_graph.degree($e[0]) == 2 {
1076
                    $stack.$push($e[0].cast());
1077
                }
1078
                $xor_graph.remove($e[0], $edge, 0);
1079
                if $xor_graph.degree($e[2]) == 2 {
1080
                    $stack.$push($e[2].cast());
1081
                }
1082
                $xor_graph.remove($e[2], $edge, 2);
1083
            }
1084
            2 => {
1085
                if $xor_graph.degree($e[0]) == 2 {
1086
                    $stack.$push($e[0].cast());
1087
                }
1088
                $xor_graph.remove($e[0], $edge, 0);
1089
                if $xor_graph.degree($e[1]) == 2 {
1090
                    $stack.$push($e[1].cast());
1091
                }
1092
                $xor_graph.remove($e[1], $edge, 1);
1093
            }
1094
            _ => unsafe { unreachable_unchecked() },
1095
        }
1096
    };
1097
}
1098

1099
impl<
1100
        W: ZeroCopy + Word + Send + Sync,
1101
        D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync,
1102
        S: Sig + ZeroCopy + Send + Sync,
1103
        E: ShardEdge<S, 3>,
1104
    > VBuilder<W, D, S, E>
1105
{
1106
    fn count_sort<V: ZeroCopy>(&self, data: &mut [SigVal<S, V>]) {
187✔
1107
        let num_sort_keys = self.shard_edge.num_sort_keys();
561✔
1108
        let mut count = vec![0; num_sort_keys];
561✔
1109

1110
        let mut copied = Box::new_uninit_slice(data.len());
748✔
1111
        for (&sig_val, copy) in data.iter().zip(copied.iter_mut()) {
23,081,635✔
1112
            count[self.shard_edge.sort_key(sig_val.sig)] += 1;
×
1113
            copy.write(sig_val);
×
1114
        }
1115
        let copied = unsafe { copied.assume_init() };
561✔
1116

1117
        count.iter_mut().fold(0, |acc, c| {
8,613✔
1118
            let old = *c;
16,478✔
1119
            *c = acc;
8,239✔
1120
            acc + old
8,239✔
1121
        });
1122

1123
        for &sig_val in copied.iter() {
23,081,074✔
1124
            let key = self.shard_edge.sort_key(sig_val.sig);
×
1125
            data[count[key]] = sig_val;
×
1126
            count[key] += 1;
×
1127
        }
1128
    }
1129

1130
    /// After this number of keys, in the case of filters we remove duplicate
1131
    /// edges.
1132
    const MAX_NO_LOCAL_SIG_CHECK: usize = 1 << 33;
1133

1134
    /// Solves in parallel shards returned by an iterator, storing
1135
    /// the result in `data`.
1136
    ///
1137
    /// # Arguments
1138
    ///
1139
    /// * `shard_iter`: an iterator returning the shards to solve.
1140
    ///
1141
    /// * `data`: the storage for the solution values.
1142
    ///
1143
    /// * `solve_shard`: a method to solve a shard; it takes the shard index,
1144
    ///   the shard, shard-local data, and a progress logger. It may
1145
    ///   fail by returning an error.
1146
    ///
1147
    /// * `main_pl`: the progress logger for the overall computation.
1148
    ///
1149
    /// * `pl`: a progress logger that will be cloned to display the progress of
1150
    ///   a current shard.
1151
    ///
1152
    /// # Errors
1153
    ///
1154
    /// This method returns an error if one of the shards cannot be solved, or
1155
    /// if duplicates are detected.
1156
    fn par_solve<
196✔
1157
        'b,
1158
        V: ZeroCopy + Send + Sync,
1159
        I: IntoIterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
1160
        SS: Fn(&Self, usize, Arc<Vec<SigVal<S, V>>>, ShardData<'b, W, D>, &mut P) -> Result<(), ()>
1161
            + Send
1162
            + Sync
1163
            + Copy,
1164
        C: ConcurrentProgressLog + Send + Sync,
1165
        P: ProgressLog + Clone + Send + Sync,
1166
    >(
1167
        &self,
1168
        shard_iter: I,
1169
        data: &'b mut D,
1170
        solve_shard: SS,
1171
        main_pl: &mut C,
1172
        pl: &mut P,
1173
    ) -> Result<(), SolveError>
1174
    where
1175
        I::IntoIter: Send,
1176
        SigVal<S, V>: RadixKey,
1177
        for<'a> ShardDataIter<'a, W, D>: Send,
1178
        for<'a> <ShardDataIter<'a, W, D> as Iterator>::Item: Send,
1179
    {
1180
        main_pl
392✔
1181
            .item_name("shard")
1182
            .expected_updates(Some(self.shard_edge.num_shards()))
392✔
1183
            .display_memory(true)
1184
            .start("Solving shards...");
1185

1186
        self.failed.store(false, Ordering::Relaxed);
588✔
1187
        let num_shards = self.shard_edge.num_shards();
588✔
1188
        let buffer_size = self.num_threads.ilog2() as usize;
392✔
1189

1190
        let (err_send, err_recv) = crossbeam_channel::bounded::<_>(self.num_threads);
588✔
1191
        let (data_send, data_recv) = crossbeam_channel::bounded::<(
1192
            usize,
1193
            (Arc<Vec<SigVal<S, V>>>, ShardData<'_, W, D>),
1194
        )>(buffer_size);
196✔
1195

1196
        let result = std::thread::scope(|scope| {
588✔
1197
            scope.spawn(move || {
588✔
1198
                let _ = thread_priority::set_current_thread_priority(ThreadPriority::Max);
196✔
1199
                for val in shard_iter
397✔
1200
                    .into_iter()
196✔
1201
                    .zip(data.try_chunks_mut(self.shard_edge.num_vertices()).unwrap())
1,176✔
1202
                    .enumerate()
196✔
1203
                {
1204
                    if data_send.send(val).is_err() {
603✔
1205
                        break;
×
1206
                    }
1207
                }
1208

1209
                drop(data_send);
392✔
1210
            });
1211

1212
            for _thread_id in 0..self.num_threads {
397✔
1213
                let mut main_pl = main_pl.clone();
×
1214
                let mut pl = pl.clone();
×
1215
                let err_send = err_send.clone();
×
1216
                let data_recv = data_recv.clone();
×
1217
                scope.spawn(move || {
201✔
1218
                    loop {
×
1219
                        match data_recv.recv() {
258✔
1220
                            Err(_) => return,
57✔
1221
                            Ok((shard_index, (shard, mut data))) => {
201✔
1222
                                if shard.is_empty() {
×
1223
                                    return;
10✔
1224
                                }
1225

1226
                                main_pl.info(format_args!(
×
1227
                                    "Analyzing shard {}/{}...",
×
1228
                                    shard_index + 1,
×
1229
                                    num_shards
×
1230
                                ));
1231

1232
                                pl.start(format!(
×
1233
                                    "Sorting shard {}/{}...",
×
1234
                                    shard_index + 1,
×
1235
                                    num_shards
×
1236
                                ));
1237

1238
                                {
1239
                                    // SAFETY: only one thread may be accessing the
1240
                                    // shard, and we will be consuming it
1241
                                    let shard = unsafe {
×
1242
                                        &mut *(Arc::as_ptr(&shard) as *mut Vec<SigVal<S, V>>)
×
1243
                                    };
1244

1245
                                    if self.check_dups {
×
1246
                                        shard.radix_sort_builder().sort();
12✔
1247
                                        if shard.par_windows(2).any(|w| w[0].sig == w[1].sig) {
18✔
1248
                                            let _ = err_send.send(SolveError::DuplicateSignature);
8✔
1249
                                            return;
4✔
1250
                                        }
1251
                                    }
1252

1253
                                    if TypeId::of::<E::LocalSig>() != TypeId::of::<S>()
187✔
1254
                                        && self.num_keys > Self::MAX_NO_LOCAL_SIG_CHECK
109✔
1255
                                    {
1256
                                        // Check for duplicate local signatures
1257

1258
                                        // E::SortSig provides a transmutable
1259
                                        // view of SigVal with an implementation
                                        // of RadixKey that is compatible with
                                        // the sort induced by the key returned
                                        // by ShardEdge::sort_key, and equality
                                        // that implies equality of local
                                        // signatures.

                                        // SAFETY: we drop this immediately after sorting.
                                        let shard = unsafe {
                                            transmute::<
                                                &mut Vec<SigVal<S, V>>,
                                                &mut Vec<E::SortSigVal<V>>,
                                            >(shard)
                                        };

                                        let builder = shard.radix_sort_builder();
                                        if self.max_num_threads == 1 {
                                            builder
                                                .with_single_threaded_tuner()
                                                .with_parallel(false)
                                        } else {
                                            builder
                                        }
                                        .sort();

                                        let shard_len = shard.len();
                                        shard.dedup();

                                        if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
                                            // Duplicate local signatures on
                                            // large filters can be simply
                                            // removed: they do not change the
                                            // semantics of the filter because
                                            // hashes are computed on
                                            // local signatures.
                                            pl.info(format_args!(
                                                "Removed signatures: {}",
                                                shard_len - shard.len()
                                            ));
                                        } else {
                                            // For functions, we have to try again
                                            if shard_len != shard.len() {
                                                let _ = err_send
                                                    .send(SolveError::DuplicateLocalSignature);
                                                return;
                                            }
                                        }
                                    } else if self.shard_edge.num_sort_keys() != 1 {
                                        // Sorting the signatures increases locality
                                        self.count_sort::<V>(shard);
                                    }
                                }

                                pl.done_with_count(shard.len());

                                main_pl.info(format_args!(
                                    "Solving shard {}/{}...",
                                    shard_index + 1,
                                    num_shards
                                ));

                                if self.failed.load(Ordering::Relaxed) {
                                    return;
                                }

                                if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
                                    // For filters, we fill the array with random data, otherwise
                                    // elements with signature 0 would have a significantly higher
                                    // probability of being false positives.
                                    //
                                    // SAFETY: We work around the fact that [usize] does not implement Fill
                                    Mwc192::seed_from_u64(self.seed).fill_bytes(unsafe {
                                        data.as_mut_slice().align_to_mut::<u8>().1
                                    });
                                }

                                if solve_shard(self, shard_index, shard, data, &mut pl).is_err() {
                                    let _ = err_send.send(SolveError::UnsolvableShard);
                                    return;
                                }

                                if self.failed.load(Ordering::Relaxed) {
                                    return;
                                }

                                main_pl.info(format_args!(
                                    "Completed shard {}/{}",
                                    shard_index + 1,
                                    num_shards
                                ));
                                main_pl.update_and_display();
                            }
                        }
                    }
                });
            }

            drop(err_send);
            drop(data_recv);

            if let Some(error) = err_recv.into_iter().next() {
                self.failed.store(true, Ordering::Relaxed);
                return Err(error);
            }

            Ok(())
        });

        main_pl.done();
        result
    }

    /// Peels a shard via edge indices.
    ///
    /// This peeler uses a [`SigVal`] per key (the shard), a
    /// [`ShardEdge::Vertex`] and a byte per vertex (for the [`XorGraph`]), a
    /// [`ShardEdge::Vertex`] per vertex (for the [`DoubleStack`]), and a final
    /// byte per vertex (for the stack of sides).
    ///
    /// This peeler uses more memory than
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) but
    /// less memory than
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem). It
    /// is fairly slow, as it has to go through a cache-unfriendly memory
    /// indirection every time it has to retrieve a [`SigVal`] from the shard,
    /// but it is the peeler of choice when [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) is required, as
    /// after a failed peel-by-sig-vals it is not possible to retrieve
    /// information about the signature/value pairs in the shard.
    ///
    /// In theory one could avoid the stack of sides by putting vertices,
    /// instead of edge indices, on the upper stack, and retrieving edge
    /// indices and sides from the [`XorGraph`], as
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) does,
    /// but this would be less cache friendly. This peeler is only used for
    /// very small instances, and since we are going to pass through lazy
    /// Gaussian elimination, some additional speed is a good idea.
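    ///
    /// The sketch below illustrates the underlying degree-one peeling idea on
    /// a toy 3-uniform hypergraph using plain `std` types only; it is purely
    /// illustrative and does not use the actual [`XorGraph`]/[`DoubleStack`]
    /// machinery, signatures, or sides.
    ///
    /// ```ignore
    /// // Each edge is a triple of vertices; the real peeler derives edges
    /// // from local signatures via `ShardEdge::local_edge`.
    /// let edges = [[0usize, 1, 2], [1, 2, 3], [3, 4, 5]];
    /// let num_vertices = 6;
    /// let mut degree = vec![0usize; num_vertices];
    /// for e in &edges {
    ///     for &v in e {
    ///         degree[v] += 1;
    ///     }
    /// }
    /// let mut removed = vec![false; edges.len()];
    /// let mut peeled = Vec::new(); // peeled edge indices, in peeling order
    /// let mut stack: Vec<usize> =
    ///     (0..num_vertices).filter(|&v| degree[v] == 1).collect();
    /// while let Some(v) = stack.pop() {
    ///     if degree[v] == 0 {
    ///         // The vertex lost its last edge in the meantime; skip it,
    ///         // exactly like the `degree(v) == 0` check in the real loop.
    ///         continue;
    ///     }
    ///     // The unique surviving edge incident to v can be peeled.
    ///     let (i, e) = edges
    ///         .iter()
    ///         .enumerate()
    ///         .find(|(i, e)| !removed[*i] && e.contains(&v))
    ///         .unwrap();
    ///     removed[i] = true;
    ///     peeled.push(i);
    ///     for &u in e {
    ///         degree[u] -= 1;
    ///         if degree[u] == 1 {
    ///             stack.push(u);
    ///         }
    ///     }
    /// }
    /// // If every edge was peeled, assignment can proceed by visiting
    /// // `peeled` in reverse order.
    /// assert_eq!(peeled.len(), edges.len());
    /// ```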
    fn peel_by_index<
        'a,
        V: ZeroCopy + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'a, W, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<PeelResult<'a, W, D, S, E, V>, ()> {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards
        ));

        let mut xor_graph = XorGraph::<E::Vertex>::new(num_vertices);
        for (edge_index, sig_val) in shard.iter().enumerate() {
            for (side, &v) in shard_edge
                .local_edge(shard_edge.local_sig(sig_val.sig))
                .iter()
                .enumerate()
            {
                xor_graph.add(v, edge_index.cast(), side);
            }
        }
        pl.done_with_count(shard.len());

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        // The lower stack contains vertices to be visited. The upper stack
        // contains peeled edges.
        let mut double_stack = DoubleStack::<E::Vertex>::new(num_vertices);
        let mut sides_stack = Vec::<u8>::new();

        pl.start(format!(
            "Peeling graph for shard {}/{} by edge indices...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                double_stack.push_lower(v.cast());
            }
        }

        while let Some(v) = double_stack.pop_lower() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            debug_assert!(xor_graph.degree(v) == 1);
            let (edge_index, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            double_stack.push_upper(edge_index);
            sides_stack.push(side as u8);

            let e = shard_edge.local_edge(shard_edge.local_sig(shard[edge_index.upcast()].sig));
            remove_edge!(xor_graph, e, side, edge_index, double_stack, push_lower);
        }

        pl.done();

        if shard.len() != double_stack.upper_len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                double_stack.upper_len(),
                shard.len(),
            ));
            return Ok(PeelResult::Partial {
                shard_index,
                shard,
                data,
                double_stack,
                sides_stack,
            });
        }

        self.assign(
            shard_index,
            data,
            double_stack
                .iter_upper()
                .map(|&edge_index| {
                    // Turn edge indices into local edge signatures
                    // and associated values
                    let sig_val = &shard[edge_index.upcast()];
                    let local_sig = shard_edge.local_sig(sig_val.sig);
                    (
                        local_sig,
                        get_val(
                            shard_edge,
                            SigVal {
                                sig: local_sig,
                                val: sig_val.val,
                            },
                        ),
                    )
                })
                .zip(sides_stack.into_iter().rev()),
            pl,
        );

        Ok(PeelResult::Complete())
    }

    /// Peels a shard via signature/value pairs using a stack of peeled
    /// signature/value pairs.
    ///
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
    /// it drops it immediately after building the graph.
    ///
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]), a
    /// [`ShardEdge::Vertex`] per vertex (for the visit stack, although usually
    /// the stack never contains more than a third of the vertices), and a
    /// [`SigVal`] and a byte per key (for the stack of peeled edges).
    ///
    /// This is the fastest but also the most memory-consuming peeler. It has,
    /// however, just a small advantage during assignment with respect to
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem), which
    /// uses almost half the memory. It is the peeler of choice for low levels
    /// of parallelism.
    ///
    /// This peeler cannot be used in conjunction with [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a failed
    /// peeling it is not possible to retrieve information about the
    /// signature/value pairs in the shard.
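    ///
    /// Purely as an illustration of the accounting above, with made-up sizes
    /// and a made-up vertices-per-key expansion factor (none of which are the
    /// crate's actual values), the peak working memory of the two
    /// signature-based peelers can be estimated as follows:
    ///
    /// ```ignore
    /// let sig_val = 16.0; // assumed size of a SigVal, in bytes
    /// let vertex = 4.0; // assumed size of a vertex, in bytes
    /// let keys = 100_000_000.0;
    /// let vertices = 1.12 * keys; // assumed expansion factor
    /// // XorGraph (SigVal + degree byte per vertex) + visit stack (a vertex
    /// // per vertex) + peeled stacks (SigVal + side byte per key):
    /// let high_mem = vertices * (sig_val + 1.0 + vertex) + keys * (sig_val + 1.0);
    /// // The low-memory peeler keeps only the XorGraph and a DoubleStack:
    /// let low_mem = vertices * (sig_val + 1.0 + vertex);
    /// println!(
    ///     "high-mem ~ {:.1} GB, low-mem ~ {:.1} GB",
    ///     high_mem / 1e9,
    ///     low_mem / 1e9
    /// );
    /// ```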
    fn peel_by_sig_vals_high_mem<
        V: ZeroCopy + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, W, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()>
    where
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
    {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();
        let shard_len = shard.len();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards
        ));

        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
        for &sig_val in shard.iter() {
            let local_sig = shard_edge.local_sig(sig_val.sig);
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
                xor_graph.add(
                    v,
                    SigVal {
                        sig: local_sig,
                        val: sig_val.val,
                    },
                    side,
                );
            }
        }
        pl.done_with_count(shard.len());

        // We are using a consuming iterator over the shard store, so this
        // drop will free the memory used by the signatures
        drop(shard);

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        let mut sig_vals_stack = FastStack::<SigVal<E::LocalSig, V>>::new(shard_len);
        let mut sides_stack = FastStack::<u8>::new(shard_len);
        // Experimentally this stack never grows beyond a little more than
        // num_vertices / 4
        let mut visit_stack = Vec::<E::Vertex>::with_capacity(num_vertices / 3);

        pl.start(format!(
            "Peeling graph for shard {}/{} by signatures (high-mem)...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                visit_stack.push(v.cast());
            }
        }

        while let Some(v) = visit_stack.pop() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            let (sig_val, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            sig_vals_stack.push(sig_val);
            sides_stack.push(side as u8);

            let e = self.shard_edge.local_edge(sig_val.sig);
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push);
        }

        pl.done();

        if shard_len != sig_vals_stack.len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                sig_vals_stack.len(),
                shard_len
            ));
            return Err(());
        }

        self.assign(
            shard_index,
            data,
            sig_vals_stack
                .iter()
                .rev()
                .map(|&sig_val| (sig_val.sig, get_val(shard_edge, sig_val)))
                .zip(sides_stack.iter().copied().rev()),
            pl,
        );

        Ok(())
    }

    /// Peels a shard via signature/value pairs using a stack of vertices to
    /// represent peeled edges.
    ///
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
    /// it drops it immediately after building the graph.
    ///
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]) and a
    /// [`ShardEdge::Vertex`] per vertex (for a [`DoubleStack`]).
    ///
    /// This is by far the least memory-hungry peeler, and it is just slightly
    /// slower than
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem),
    /// which uses almost twice the memory. It is the peeler of choice for
    /// significant levels of parallelism.
    ///
    /// This peeler cannot be used in conjunction with [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) as after a
    /// failed peeling it is not possible to retrieve information about the
    /// signature/value pairs in the shard.
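    ///
    /// The reason a stack of vertices can stand in for peeled edges is the
    /// usual XOR-accumulator trick on which a structure like [`XorGraph`] can
    /// be based: each vertex keeps only a degree counter and the XOR of the
    /// payloads of its incident edges, so a degree-one vertex exposes its
    /// unique incident edge directly. A minimal, self-contained sketch with
    /// toy types (not the actual implementation):
    ///
    /// ```ignore
    /// #[derive(Clone, Copy, Default)]
    /// struct Slot {
    ///     degree: u8,
    ///     xor_of_edge_ids: u64,
    /// }
    ///
    /// // Two edges, identified by the payloads 10 and 11, on four vertices.
    /// let edges: &[(u64, [usize; 3])] = &[(10, [0, 1, 2]), (11, [1, 2, 3])];
    /// let mut slots = vec![Slot::default(); 4];
    /// for &(id, vs) in edges {
    ///     for v in vs {
    ///         slots[v].degree += 1;
    ///         slots[v].xor_of_edge_ids ^= id;
    ///     }
    /// }
    /// // Vertex 3 has degree one, so its accumulator is exactly the payload
    /// // of its unique incident edge.
    /// assert_eq!((slots[3].degree, slots[3].xor_of_edge_ids), (1, 11));
    /// // Peeling that edge removes it from its other endpoints, which now
    /// // expose edge 10 in turn.
    /// for v in [1, 2] {
    ///     slots[v].degree -= 1;
    ///     slots[v].xor_of_edge_ids ^= 11;
    /// }
    /// assert_eq!((slots[1].degree, slots[1].xor_of_edge_ids), (1, 10));
    /// ```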
    fn peel_by_sig_vals_low_mem<
        V: ZeroCopy + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, W, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()>
    where
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
    {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();
        let shard_len = shard.len();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards,
        ));

        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
        for &sig_val in shard.iter() {
            let local_sig = shard_edge.local_sig(sig_val.sig);
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
                xor_graph.add(
                    v,
                    SigVal {
                        sig: local_sig,
                        val: sig_val.val,
                    },
                    side,
                );
            }
        }
        pl.done_with_count(shard.len());

        // We are using a consuming iterator over the shard store, so this
        // drop will free the memory used by the signatures
        drop(shard);

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        let mut visit_stack = DoubleStack::<E::Vertex>::new(num_vertices);

        pl.start(format!(
            "Peeling graph for shard {}/{} by signatures (low-mem)...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                visit_stack.push_lower(v.cast());
            }
        }

        while let Some(v) = visit_stack.pop_lower() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            let (sig_val, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            visit_stack.push_upper(v.cast());

            let e = self.shard_edge.local_edge(sig_val.sig);
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push_lower);
        }

        pl.done();

        if shard_len != visit_stack.upper_len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                visit_stack.upper_len(),
                shard_len
            ));
            return Err(());
        }

        self.assign(
            shard_index,
            data,
            visit_stack.iter_upper().map(|&v| {
                let (sig_val, side) = xor_graph.edge_and_side(v.upcast());
                ((sig_val.sig, get_val(shard_edge, sig_val)), side as u8)
            }),
            pl,
        );

        Ok(())
    }

    /// Solves a shard of given index, possibly using [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517), and stores the
    /// solution in the given data.
    ///
    /// As a first try, the shard is [peeled by index](VBuilder::peel_by_index).
    /// If the peeling is [partial](PeelResult::Partial), lazy Gaussian
    /// elimination is used to solve the remaining edges.
    ///
    /// This method will scan the double stack, without emptying it, to check
    /// which edges have been peeled. The information will then be passed to
    /// [`VBuilder::assign`] to complete the assignment of values.
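    ///
    /// Each non-peeled edge contributes one equation modulo 2: the XOR of the
    /// three variables of its local edge must equal the value associated with
    /// the edge. A hand-checked illustration with made-up numbers (this is not
    /// the crate's solver API):
    ///
    /// ```ignore
    /// // Edges on variables {0, 1, 2} and {1, 2, 3} with target values 5 and
    /// // 6 yield, bit by bit over GF(2),
    /// //     x0 ^ x1 ^ x2 = 5
    /// //     x1 ^ x2 ^ x3 = 6
    /// // One solution sets x0 = 5, x3 = 6, x1 = x2 = 0; any solution works,
    /// // since lookups only XOR the three cells of an edge.
    /// let x = [5u64, 0, 0, 6];
    /// assert_eq!(x[0] ^ x[1] ^ x[2], 5);
    /// assert_eq!(x[1] ^ x[2] ^ x[3], 6);
    /// ```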
    fn lge_shard<
        V: ZeroCopy + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, W, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()> {
        let shard_edge = &self.shard_edge;
        // Let's try to peel first
        match self.peel_by_index(shard_index, shard, data, get_val, pl) {
            Err(()) => Err(()),
            Ok(PeelResult::Complete()) => Ok(()),
            Ok(PeelResult::Partial {
                shard_index,
                shard,
                mut data,
                double_stack,
                sides_stack,
            }) => {
                pl.info(format_args!("Switching to lazy Gaussian elimination..."));
                // Likely result: we have to solve the rest
                pl.start(format!(
                    "Generating system for shard {}/{}...",
                    shard_index + 1,
                    shard_edge.num_shards()
                ));

                let num_vertices = shard_edge.num_vertices();
                let mut peeled_edges = BitVec::new(shard.len());
                let mut used_vars = BitVec::new(num_vertices);
                for &edge in double_stack.iter_upper() {
                    peeled_edges.set(edge.upcast(), true);
                }

                // Create data for a system using non-peeled edges
                //
                // SAFETY: there is no undefined behavior here, but the
                // raw construction methods we use assume that the
                // equations are sorted, that the variables are not repeated,
                // and the variables are in the range [0..num_vertices).
                let mut system = unsafe {
                    crate::utils::mod2_sys::Modulo2System::from_parts(
                        num_vertices,
                        shard
                            .iter()
                            .enumerate()
                            .filter(|(edge_index, _)| !peeled_edges[*edge_index])
                            .map(|(_edge_index, sig_val)| {
                                let local_sig = shard_edge.local_sig(sig_val.sig);
                                let mut eq: Vec<_> = shard_edge
                                    .local_edge(local_sig)
                                    .iter()
                                    .map(|&x| {
                                        used_vars.set(x, true);
                                        x as u32
                                    })
                                    .collect();
                                eq.sort_unstable();
                                crate::utils::mod2_sys::Modulo2Equation::from_parts(
                                    eq,
                                    get_val(
                                        shard_edge,
                                        SigVal {
                                            sig: local_sig,
                                            val: sig_val.val,
                                        },
                                    ),
                                )
                            })
                            .collect(),
                    )
                };

                if self.failed.load(Ordering::Relaxed) {
                    return Err(());
                }

                pl.start("Solving system...");
                let result = system.lazy_gaussian_elimination().map_err(|_| ())?;
                pl.done_with_count(system.num_equations());

                for (v, &value) in result.iter().enumerate().filter(|(v, _)| used_vars[*v]) {
                    data.set(v, value);
                }

                self.assign(
                    shard_index,
                    data,
                    double_stack
                        .iter_upper()
                        .map(|&edge_index| {
                            let sig_val = &shard[edge_index.upcast()];
                            let local_sig = shard_edge.local_sig(sig_val.sig);
                            (
                                local_sig,
                                get_val(
                                    shard_edge,
                                    SigVal {
                                        sig: local_sig,
                                        val: sig_val.val,
                                    },
                                ),
                            )
                        })
                        .zip(sides_stack.into_iter().rev()),
                    pl,
                );
                Ok(())
            }
        }
    }

    /// Performs the assignment of values based on peeling data.
    ///
    /// This method might be called after a successful peeling procedure, or
    /// after a linear solver has been used to solve the remaining edges.
    ///
    /// `sigs_vals_sides` is an iterator returning pairs of signature/value
    /// pairs and sides in reverse peeling order.
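    ///
    /// For each peeled edge the cell on the peeled side is set so that the
    /// XOR of the three cells of the edge equals the associated value; since
    /// edges are replayed in reverse peeling order, that cell is never
    /// touched again. A minimal illustration with made-up numbers:
    ///
    /// ```ignore
    /// let mut data = [0u64; 3]; // the three cells of one local edge
    /// let (val, side) = (42u64, 1); // target value and peeled side
    /// let xor = data[0] ^ data[2]; // XOR of the two other cells
    /// data[side] = val ^ xor;
    /// assert_eq!(data[0] ^ data[1] ^ data[2], val);
    /// ```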
    fn assign(
        &self,
        shard_index: usize,
        mut data: ShardData<'_, W, D>,
        sigs_vals_sides: impl Iterator<Item = ((E::LocalSig, W), u8)>,
        pl: &mut impl ProgressLog,
    ) {
        if self.failed.load(Ordering::Relaxed) {
            return;
        }

        pl.start(format!(
            "Assigning values for shard {}/{}...",
            shard_index + 1,
            self.shard_edge.num_shards()
        ));

        for ((sig, val), side) in sigs_vals_sides {
            let edge = self.shard_edge.local_edge(sig);
            let side = side as usize;
            unsafe {
                let xor = match side {
                    0 => data.get_unchecked(edge[1]) ^ data.get_unchecked(edge[2]),
                    1 => data.get_unchecked(edge[0]) ^ data.get_unchecked(edge[2]),
                    2 => data.get_unchecked(edge[0]) ^ data.get_unchecked(edge[1]),
                    _ => core::hint::unreachable_unchecked(),
                };

                data.set_unchecked(edge[side], val ^ xor);
            }
        }
        pl.done();
    }
}