vigna / sux-rs / build 18543058803 (push, github)

15 Oct 2025 09:22PM UTC coverage: 64.418% (-0.7%) from 65.126%
3628 of 5632 relevant lines covered (64.42%)
119122797.27 hits per line

Source file: /src/func/vbuilder.rs (58.59% covered)

/*
 * SPDX-FileCopyrightText: 2025 Sebastiano Vigna
 *
 * SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
 */

#![allow(clippy::type_complexity)]
#![allow(clippy::too_many_arguments)]

use crate::bits::*;
use crate::dict::VFilter;
use crate::func::{shard_edge::ShardEdge, *};
use crate::traits::BitVecOpsMut;
use crate::traits::bit_field_slice::{BitFieldSlice, BitFieldSliceMut, Word};
use crate::utils::*;
use common_traits::{
    CastableInto, DowncastableFrom, DowncastableInto, UnsignedInt, UpcastableFrom, UpcastableInto,
};
use derivative::Derivative;
use derive_setters::*;
use dsi_progress_logger::*;
use log::info;
use rand::SeedableRng;
use rand::rngs::SmallRng;
use rand::{Rng, RngCore};
use rayon::iter::ParallelIterator;
use rayon::slice::ParallelSlice;
use rdst::*;
use std::any::TypeId;
use std::borrow::{Borrow, Cow};
use std::hint::unreachable_unchecked;
use std::marker::PhantomData;
use std::mem::transmute;
use std::ops::{BitXor, BitXorAssign};
use std::slice::Iter;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Instant;
use thread_priority::ThreadPriority;
use value_traits::slices::{SliceByValue, SliceByValueMut};

use super::shard_edge::FuseLge3Shards;

const LOG2_MAX_SHARDS: u32 = 16;

/// A builder for [`VFunc`] and [`VFilter`].
///
/// Keys must implement the [`ToSig`] trait, which provides a method to compute
/// a signature of the key.
///
/// There are two construction modes: in core memory (default) and
/// [offline](VBuilder::offline); both use a [`SigStore`]. In the first case,
/// space will be allocated in core memory for signatures and associated values
/// for all keys; in the second case, such information will be stored in a
/// number of on-disk buckets.
///
/// There are several setters: for example, you can [set the maximum number
/// of threads](VBuilder::max_num_threads).
///
/// Once signatures have been computed, each parallel thread will process a
/// shard of the signature/value pairs. For very large key sets shards will be
/// significantly smaller than the number of keys, so the memory usage, in
/// particular in offline mode, can be significantly reduced. Note that using
/// too many threads might actually be harmful due to memory contention.
///
/// The generic parameters are explained in the [`VFunc`] documentation. You
/// have to choose the type of the output values and the backend. The remaining
/// parameters have default values that are the same as those of
/// [`VFunc`]/[`VFilter`], and some elaboration about them can be found in their
/// documentation.
///
/// All construction methods require passing one or two [`RewindableIoLender`]s
/// (keys and possibly values), as the construction might fail and keys might
/// have to be scanned again. The structures in the [`lenders`] module provide
/// easy ways to build such lenders, even starting from compressed files of
/// UTF-8 strings. The type of the keys of the resulting filter or function will
/// be the type of the elements of the [`RewindableIoLender`].
///
/// # Examples
///
/// In this example, we build a function that maps each key to itself using a
/// boxed slice of `usize` as a backend (note that this is really wasteful). The
/// setter for the expected number of keys is used to optimize the construction.
/// We use the [`FromIntoIterator`] adapter to turn a clonable [`IntoIterator`]
/// into a [`RewindableIoLender`]. Note that you need the
/// [`dsi-progress-logger`](https://crates.io/crates/dsi-progress-logger) crate.
///
/// Type inference derives the input type (`usize`) from the type of the items
/// returned by the first [`RewindableIoLender`], and the output type (again,
/// `usize`, the first type parameter), from the backend type (`Box<[usize]>`,
/// the second type parameter):
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[usize]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Alternatively, we can use the bit-field vector backend, which will use
/// ⌈log₂(99)⌉ bits per backend element:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # use sux::bits::BitFieldVec;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, BitFieldVec<usize>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Since the numbers are small, we can also try to use a fixed-size output:
/// type inference takes care of making the second range `0..100` a range of
/// `u8`. Note that the type of the keys is always `usize`, as it is still
/// inferred from the type of the items returned by the first
/// [`RewindableIoLender`]:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # use sux::bits::BitFieldVec;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_func(
///    FromIntoIterator::from(0..100),
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert_eq!(i, func.get(&i));
/// }
/// #     Ok(())
/// # }
/// ```
///
/// We now try to build a fast 8-bit filter for the same key set, using a boxed
/// slice of `u8` as a backend (this is not wasteful, as the filter uses 8-bit
/// hashes):
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(func[i]);
/// }
/// #     Ok(())
/// # }
/// ```
///
/// Since the keys are very few, we can switch to 64-bit signatures and no
/// shards, which will yield faster queries:
///
/// ```rust
/// # use sux::func::VBuilder;
/// # use sux::func::shard_edge::FuseLge3NoShards;
/// # use dsi_progress_logger::no_logging;
/// # use sux::utils::FromIntoIterator;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// let builder = VBuilder::<_, Box<[u8]>, [u64; 1], FuseLge3NoShards>::default()
///     .expected_num_keys(100);
/// let func = builder.try_build_filter(
///    FromIntoIterator::from(0..100),
///    no_logging![]
/// )?;
///
/// for i in 0..100 {
///    assert!(func[i]);
/// }
/// #     Ok(())
/// # }
/// ```

#[derive(Setters, Debug, Derivative)]
#[derivative(Default)]
#[setters(generate = false)]
pub struct VBuilder<
    W: Word + BinSafe,
    D: BitFieldSlice<W> + Send + Sync = Box<[W]>,
    S = [u64; 2],
    E: ShardEdge<S, 3> = FuseLge3Shards,
> {
    /// The expected number of keys.
    ///
    /// While this setter is optional, setting this value to a reasonable bound
    /// on the actual number of keys will significantly speed up the
    /// construction.
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "None"))]
    expected_num_keys: Option<usize>,

    /// The maximum number of parallel threads to use. The default is 8.
    #[setters(generate = true)]
    #[derivative(Default(value = "8"))]
    max_num_threads: usize,

    /// Use disk-based buckets to reduce core memory usage at construction time.
    #[setters(generate = true)]
    offline: bool,

    /// Check for duplicated signatures. This is not necessary in general,
    /// but if you suspect you might be feeding duplicate keys, you can
    /// enable this check.
    #[setters(generate = true)]
    check_dups: bool,

    /// Whether to always use the low-memory peel-by-signature algorithm
    /// (`true`) or the high-memory peel-by-index algorithm (`false`). The
    /// former is slightly slower, but it uses much less memory. Normally
    /// [`VBuilder`] uses the high-memory algorithm and switches to the
    /// low-memory one if there are more than two threads and more than four
    /// shards.
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "None"))]
    low_mem: Option<bool>,

    /// The seed for the random number generator.
    #[setters(generate = true)]
    seed: u64,

    /// The base-2 logarithm of the number of buckets of the [`SigStore`]. The
    /// default is 8. This value is automatically overridden, even if set, if
    /// you provide an [expected number of keys](VBuilder::expected_num_keys).
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "8"))]
    log2_buckets: u32,

    /// The target relative space loss due to [ε-cost
    /// sharding](https://doi.org/10.4230/LIPIcs.ESA.2019.38).
    ///
    /// The default is 0.001. Setting a larger target, for example, 0.01, will
    /// increase the space overhead due to sharding, but will provide in general
    /// finer shards. This might not always happen, however, because the ε-cost
    /// bound is just one of the bounds concurring in limiting the amount of
    /// sharding for a specific [`ShardEdge`]. For example, increasing the
    /// target to 0.01 will provide very fine sharding using `Mwhc3Shards`
    /// shard/edge logic, activated by the `mwhc` feature, but will affect only
    /// slightly [`FuseLge3Shards`] or
    /// [`FuseLge3FullSigs`](crate::func::shard_edge::FuseLge3FullSigs).
    #[setters(generate = true, strip_option)]
    #[derivative(Default(value = "0.001"))]
    eps: f64,

    /// The bit width of the maximum value.
    bit_width: usize,
    /// The edge generator.
    shard_edge: E,
    /// The number of keys.
    num_keys: usize,
    /// The ratio between the number of vertices and the number of edges
    /// (i.e., between the number of variables and the number of equations).
    c: f64,
    /// Whether we should use [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517).
    lge: bool,
    /// The number of threads to use.
    num_threads: usize,
    /// Fast-stop for failed attempts.
    failed: AtomicBool,
    #[doc(hidden)]
    _marker_v: PhantomData<(W, D, S)>,
}

/// Fatal build errors.
#[derive(thiserror::Error, Debug)]
pub enum BuildError {
    #[error("Duplicate key")]
    /// A duplicate key was detected.
    DuplicateKey,
    #[error("Duplicate local signatures: use full signatures")]
    /// Duplicate local signatures were detected; use full signatures.
    DuplicateLocalSignatures,
    #[error("Value too large for specified bit size")]
    /// A value is too large for the specified bit size.
    ValueTooLarge,
}

#[derive(thiserror::Error, Debug)]
/// Transient errors during the build, which lead to
/// trying again with a different seed.
pub enum SolveError {
    #[error("Duplicate signature")]
    /// A duplicate signature was detected.
    DuplicateSignature,
    #[error("Duplicate local signature")]
    /// A duplicate local signature was detected.
    DuplicateLocalSignature,
    #[error("Max shard too big")]
    /// The maximum shard is too big relative to the average shard.
    MaxShardTooBig,
    #[error("Unsolvable shard")]
    /// A shard cannot be solved.
    UnsolvableShard,
}

/// The result of a peeling procedure.
enum PeelResult<
    'a,
    W: Word + BinSafe + Send + Sync,
    D: BitFieldSlice<W> + BitFieldSliceMut<W> + Send + Sync + 'a,
    S: Sig + BinSafe,
    E: ShardEdge<S, 3>,
    V: BinSafe,
> {
    Complete(),
    Partial {
        /// The shard index.
        shard_index: usize,
        /// The shard.
        shard: Arc<Vec<SigVal<S, V>>>,
        /// The data.
        data: ShardData<'a, D>,
        /// The double stack whose upper stack contains the peeled edges
        /// (possibly represented by the vertex from which they have been
        /// peeled).
        double_stack: DoubleStack<E::Vertex>,
        /// The sides stack.
        sides_stack: Vec<u8>,
        _marker: PhantomData<W>,
    },
}

/// A graph represented compactly.
///
/// Each (*k*-hyper)edge is a set of *k* vertices (by construction fuse graphs
/// do not have degenerate edges), but we represent it internally as a vector.
/// We call *side* the position of a vertex in the edge.
///
/// For each vertex we store information about the edges incident to the vertex
/// and the sides of the vertex in such edges. While technically not necessary
/// to perform peeling, the knowledge of the sides speeds up the peeling visit
/// by reducing the number of tests that are necessary to update the degrees
/// once the edge is peeled (see the `peel_by_*` methods). For the same reason
/// it also speeds up assignment.
///
/// Depending on the peeling method (by signature or by index), the graph will
/// store edge indices or signature/value pairs (the generic parameter `X`).
///
/// Edge information is packed together using Djamal's XOR trick (see
/// [“Cache-Oblivious Peeling of Random
/// Hypergraphs”](https://doi.org/10.1109/DCC.2014.48)): since during the
/// peeling visit we need to know the content of the list only when a single
/// edge index is present, we can XOR together all the edge information.
///
/// We use a single byte to store the degree (six upper bits) and the XOR of the
/// sides (lower two bits). The degree can be stored with a small number of bits
/// because the graph is random, so the maximum degree is *O*(log log *n*).
/// In any case, the Boolean field `overflow` will become `true` in case of
/// overflow.
///
/// When we peel an edge, we just [zero the degree](Self::zero), leaving the
/// edge information and the side in place for further processing later.
struct XorGraph<X: Copy + Default + BitXor + BitXorAssign> {
    edges: Box<[X]>,
    degrees_sides: Box<[u8]>,
    overflow: bool,
}

impl<X: BitXor + BitXorAssign + Default + Copy> XorGraph<X> {
    pub fn new(n: usize) -> XorGraph<X> {
        XorGraph {
            edges: vec![X::default(); n].into(),
            degrees_sides: vec![0; n].into(),
            overflow: false,
        }
    }

    #[inline(always)]
    pub fn add(&mut self, v: usize, x: X, side: usize) {
        debug_assert!(side < 3);
        let (degree_size, overflow) = self.degrees_sides[v].overflowing_add(4);
        self.degrees_sides[v] = degree_size;
        self.overflow |= overflow;
        self.degrees_sides[v] ^= side as u8;
        self.edges[v] ^= x;
    }

    #[inline(always)]
    pub fn remove(&mut self, v: usize, x: X, side: usize) {
        debug_assert!(side < 3);
        self.degrees_sides[v] -= 4;
        self.degrees_sides[v] ^= side as u8;
        self.edges[v] ^= x;
    }

    #[inline(always)]
    pub fn zero(&mut self, v: usize) {
        self.degrees_sides[v] &= 0b11;
    }

    #[inline(always)]
    pub fn edge_and_side(&self, v: usize) -> (X, usize) {
        debug_assert!(self.degree(v) < 2);
        (self.edges[v] as _, (self.degrees_sides[v] & 0b11) as _)
    }

    #[inline(always)]
    pub fn degree(&self, v: usize) -> u8 {
        self.degrees_sides[v] >> 2
    }

    pub fn degrees(
        &self,
    ) -> std::iter::Map<std::iter::Copied<std::slice::Iter<'_, u8>>, fn(u8) -> u8> {
        self.degrees_sides.iter().copied().map(|d| d >> 2)
    }
}

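The degree/side packing and the XOR trick above can be illustrated in isolation. The following is a hypothetical, self-contained sketch (not part of sux-rs): for a single vertex, edge indices are folded together by XOR, while one byte holds the degree in its upper six bits and the XOR of the sides in its lower two bits; once the degree drops to one, the surviving edge and its side can be read off directly, with no adjacency list at all.

```rust
// Hypothetical demo of `XorGraph`'s per-vertex state (illustrative only).

/// Record an incident edge: bump the degree (upper six bits) and fold
/// the side and edge index into their respective accumulators.
fn add(edge_xor: &mut usize, degree_side: &mut u8, e: usize, side: u8) {
    *degree_side += 4; // degree lives in the upper six bits
    *degree_side ^= side; // sides (0, 1, or 2) live in the lower two bits
    *edge_xor ^= e;
}

/// Remove an incident edge, undoing `add` exactly.
fn remove(edge_xor: &mut usize, degree_side: &mut u8, e: usize, side: u8) {
    *degree_side -= 4;
    *degree_side ^= side;
    *edge_xor ^= e;
}

fn degree(degree_side: u8) -> u8 {
    degree_side >> 2
}

fn main() {
    let (mut edge_xor, mut degree_side) = (0usize, 0u8);
    // Three edges incident to the same vertex, with sides 0, 1, and 2.
    add(&mut edge_xor, &mut degree_side, 17, 0);
    add(&mut edge_xor, &mut degree_side, 42, 1);
    add(&mut edge_xor, &mut degree_side, 99, 2);
    assert_eq!(degree(degree_side), 3);

    // Peel two of the edges: the XOR accumulators now describe the survivor.
    remove(&mut edge_xor, &mut degree_side, 17, 0);
    remove(&mut edge_xor, &mut degree_side, 99, 2);
    assert_eq!(degree(degree_side), 1);
    assert_eq!(edge_xor, 42); // the remaining edge index...
    assert_eq!(degree_side & 0b11, 1); // ...and its side
    println!("edge {} on side {}", edge_xor, degree_side & 0b11);
}
```

The subtraction of 4 never borrows into the two side bits, so the two fields can share a byte without interfering.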
/// A preallocated stack implementation that avoids the expensive (even if
/// rarely taken) branch of the `Vec` implementation in which memory is
/// reallocated. Note that using [`Vec::with_capacity`] is not enough, because
/// for the CPU the branch is still there.
struct FastStack<X: Copy + Default> {
    stack: Vec<X>,
    top: usize,
}

impl<X: Copy + Default> FastStack<X> {
    pub fn new(n: usize) -> FastStack<X> {
        FastStack {
            stack: vec![X::default(); n],
            top: 0,
        }
    }

    pub fn push(&mut self, x: X) {
        debug_assert!(self.top < self.stack.len());
        self.stack[self.top] = x;
        self.top += 1;
    }

    pub fn len(&self) -> usize {
        self.top
    }

    pub fn iter(&self) -> std::slice::Iter<'_, X> {
        self.stack[..self.top].iter()
    }
}

/// Two stacks in the same vector.
///
/// This struct implements a pair of stacks sharing the same memory. The lower
/// stack grows from the beginning of the vector, the upper stack grows from the
/// end of the vector. Since we use the lower stack for the visit and the upper
/// stack for peeled edges (possibly represented by the vertex from which they
/// have been peeled), the sum of the lengths of the two stacks cannot exceed
/// the length of the vector.
#[derive(Debug)]
struct DoubleStack<V> {
    stack: Vec<V>,
    lower: usize,
    upper: usize,
}

impl<V: Default + Copy> DoubleStack<V> {
    fn new(n: usize) -> DoubleStack<V> {
        DoubleStack {
            stack: vec![V::default(); n],
            lower: 0,
            upper: n,
        }
    }
}

impl<V: Copy> DoubleStack<V> {
    #[inline(always)]
    fn push_lower(&mut self, v: V) {
        debug_assert!(self.lower < self.upper);
        self.stack[self.lower] = v;
        self.lower += 1;
    }

    #[inline(always)]
    fn push_upper(&mut self, v: V) {
        debug_assert!(self.lower < self.upper);
        self.upper -= 1;
        self.stack[self.upper] = v;
    }

    #[inline(always)]
    fn pop_lower(&mut self) -> Option<V> {
        if self.lower == 0 {
            None
        } else {
            self.lower -= 1;
            Some(self.stack[self.lower])
        }
    }

    fn upper_len(&self) -> usize {
        self.stack.len() - self.upper
    }

    fn iter_upper(&self) -> Iter<'_, V> {
        self.stack[self.upper..].iter()
    }
}

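The sharing invariant of the double stack is easy to see in a small standalone sketch (hypothetical, not part of sux-rs): the lower stack fills slots from the front, the upper stack from the back, so `lower < upper` guarantees the two can never overwrite each other while together using a single allocation.

```rust
// Hypothetical, minimal demo of two stacks sharing one preallocated vector.

struct DoubleStack {
    stack: Vec<u32>,
    lower: usize, // first free slot of the lower stack
    upper: usize, // last used slot of the upper stack
}

impl DoubleStack {
    fn new(n: usize) -> Self {
        DoubleStack { stack: vec![0; n], lower: 0, upper: n }
    }

    fn push_lower(&mut self, v: u32) {
        assert!(self.lower < self.upper, "stacks would collide");
        self.stack[self.lower] = v;
        self.lower += 1;
    }

    fn push_upper(&mut self, v: u32) {
        assert!(self.lower < self.upper, "stacks would collide");
        self.upper -= 1;
        self.stack[self.upper] = v;
    }

    fn pop_lower(&mut self) -> Option<u32> {
        (self.lower > 0).then(|| {
            self.lower -= 1;
            self.stack[self.lower]
        })
    }
}

fn main() {
    let mut ds = DoubleStack::new(4);
    ds.push_lower(1); // e.g., the visit frontier goes in the lower stack
    ds.push_upper(9); // e.g., peeled edges go in the upper stack
    ds.push_upper(8);
    assert_eq!(ds.pop_lower(), Some(1));
    assert_eq!(ds.pop_lower(), None);
    assert_eq!(&ds.stack[ds.upper..], &[8, 9]);
}
```

During peeling, elements popped from the lower (visit) stack free slots that the upper (peeled-edge) stack can then reuse, which is why one vector of length *n* suffices for both.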
/// An iterator over segments of data associated with each shard.
type ShardDataIter<'a, D> = <D as SliceByValueMut>::ChunksMut<'a>;
/// A segment of data associated with a specific shard.
type ShardData<'a, D> = <ShardDataIter<'a, D> as Iterator>::Item;

/// Builds a new function using a `Box<[W]>` to store values.
///
/// Since values are stored in a boxed slice, access is particularly fast, but
/// the bit width of the output of the function will be exactly the bit width of
/// the unsigned type `W`.
impl<W: Word + BinSafe, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, Box<[W]>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, W>: RadixKey,
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
{
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        values: impl RewindableIoLender<W>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, Box<[W]>, S, E>>
    where
        for<'a> <<Box<[W]> as SliceByValueMut>::ChunksMut<'a> as Iterator>::Item:
            BitFieldSliceMut<W>,
        for<'a> ShardDataIter<'a, Box<[W]>>: Send,
        for<'a> ShardData<'a, Box<[W]>>: Send,
    {
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();
        self.build_loop(keys, values, None, &get_val, new_data, pl)
    }
}

/// Builds a new filter using a `Box<[W]>` to store values.
///
/// Since values are stored in a boxed slice, access is particularly fast, but
/// the number of bits of the hashes will be exactly the bit width of the
/// unsigned type `W`.
impl<W: Word + BinSafe + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
    VBuilder<W, Box<[W]>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, EmptyVal>: RadixKey,
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
    Box<[W]>: BitFieldSliceMut<W> + BitFieldSlice<W>,
{
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, Box<[W]>, S, E>>>
    where
        for<'a> <<Box<[W]> as SliceByValueMut>::ChunksMut<'a> as Iterator>::Item:
            BitFieldSliceMut<W>,
        for<'a> ShardDataIter<'a, Box<[W]>>: Send,
        for<'a> ShardData<'a, Box<[W]>>: Send,
    {
        let filter_mask = W::MAX;
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
            mix64(shard_edge.edge_hash(sig_val.sig)).downcast()
        };
        let new_data = |_bit_width: usize, len: usize| vec![W::ZERO; len].into();

        Ok(VFilter {
            func: self.build_loop(
                keys,
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
                Some(W::BITS),
                &get_val,
                new_data,
                pl,
            )?,
            filter_mask,
            hash_bits: W::BITS as u32,
        })
    }
}

/// Builds a new function using a [bit-field vector](BitFieldVec) on words of
/// the unsigned type `W` to store values.
///
/// Since values are stored in a bit-field vector, access will be slower than
/// when using a boxed slice, but the bit width of stored values will be the
/// minimum necessary. It must be in any case at most the bit width of `W`.
///
/// Typically `W` will be `usize` or `u64`.
impl<W: Word + BinSafe, S: Sig + Send + Sync, E: ShardEdge<S, 3>> VBuilder<W, BitFieldVec<W>, S, E>
where
    u128: UpcastableFrom<W>,
    SigVal<S, W>: RadixKey,
    SigVal<E::LocalSig, W>: BitXor + BitXorAssign,
{
    pub fn try_build_func<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
        mut self,
        keys: impl RewindableIoLender<B>,
        values: impl RewindableIoLender<W>,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, BitFieldVec<W>, S, E>> {
        let get_val = |_shard_edge: &E, sig_val: SigVal<E::LocalSig, W>| sig_val.val;
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
        self.build_loop(keys, values, None, &get_val, new_data, pl)
    }
}

/// Builds a new filter using a [bit-field vector](BitFieldVec) on words of the
642
/// unsigned type `W` to store values.
643
///
644
/// Since values are stored in a bit-field vector access will be slower than
645
/// when using a boxed slice, but the number of bits of the hashes can be set at
646
/// will. It must be in any case at most the bit width of `W`.
647
///
648
/// Typically `W` will be `usize` or `u64`.
649
impl<W: Word + BinSafe + DowncastableFrom<u64>, S: Sig + Send + Sync, E: ShardEdge<S, 3>>
650
    VBuilder<W, BitFieldVec<W>, S, E>
651
where
652
    u128: UpcastableFrom<W>,
653
    SigVal<S, EmptyVal>: RadixKey,
654
    SigVal<E::LocalSig, EmptyVal>: BitXor + BitXorAssign,
655
{
656
    pub fn try_build_filter<T: ?Sized + ToSig<S> + std::fmt::Debug, B: ?Sized + Borrow<T>>(
×
657
        mut self,
658
        keys: impl RewindableIoLender<B>,
659
        filter_bits: usize,
660
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
661
    ) -> anyhow::Result<VFilter<W, VFunc<T, W, BitFieldVec<W>, S, E>>> {
662
        assert!(filter_bits > 0);
×
663
        assert!(filter_bits <= W::BITS);
×
664
        let filter_mask = W::MAX >> (W::BITS - filter_bits);
×
665
        let get_val = |shard_edge: &E, sig_val: SigVal<E::LocalSig, EmptyVal>| {
×
666
            <W as DowncastableFrom<u64>>::downcast_from(mix64(shard_edge.edge_hash(sig_val.sig)))
×
667
                & filter_mask
×
668
        };
669
        let new_data = |bit_width, len| BitFieldVec::<W>::new_unaligned(bit_width, len);
×
670

671
        Ok(VFilter {
×
672
            func: self.build_loop(
×
673
                keys,
×
674
                FromIntoIterator::from(itertools::repeat_n(EmptyVal::default(), usize::MAX)),
×
675
                Some(filter_bits),
×
676
                &get_val,
×
677
                new_data,
×
678
                pl,
×
679
            )?,
680
            filter_mask,
×
681
            hash_bits: filter_bits as _,
×
682
        })
683
    }
684
}
685

686
impl<
687
    W: Word + BinSafe,
688
    D: BitFieldSlice<W>
689
        + for<'a> BitFieldSliceMut<W, ChunksMut<'a>: Iterator<Item: BitFieldSliceMut<W>>>
690
        + Send
691
        + Sync,
692
    S: Sig + Send + Sync,
693
    E: ShardEdge<S, 3>,
694
> VBuilder<W, D, S, E>
695
{
696
    /// Builds and returns a new function with given keys and values.
697
    ///
698
    /// This method can build functions based both on vectors and on bit-field
699
    /// vectors. The necessary abstraction is provided by the
700
    /// `new_data(bit_width, len)` function, which is called to create the data
701
    /// structure to store the values.
702
    ///
703
    /// When `V` is [`EmptyVal`], the this method builds a function supporting a
704
    /// filter by mapping each key to a mix of its local signature. The
705
    /// necessary abstraction is provided by the `get_val` function, which is
706
    /// called to extract the value from the signature/value pair; in the case
707
    /// of functions it returns the value stored in the signature/value pair,
708
    /// and in the case of filters it returns the hash associated with the
709
    /// signature.
710
    fn build_loop<
63✔
711
        T: ?Sized + ToSig<S> + std::fmt::Debug,
712
        B: ?Sized + Borrow<T>,
713
        V: BinSafe + Default + Send + Sync + Ord + UpcastableInto<u128>,
714
    >(
715
        &mut self,
716
        mut keys: impl RewindableIoLender<B>,
717
        mut values: impl RewindableIoLender<V>,
718
        bit_width: Option<usize>,
719
        get_val: &(impl Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync),
720
        new_data: fn(usize, usize) -> D,
721
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
722
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
723
    where
724
        u128: UpcastableFrom<W>,
725
        SigVal<S, V>: RadixKey,
726
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
727
        for<'a> ShardDataIter<'a, D>: Send,
728
        for<'a> <ShardDataIter<'a, D> as Iterator>::Item: Send,
    {
        let mut dup_count = 0;
        let mut local_dup_count = 0;
        let mut prng = SmallRng::seed_from_u64(self.seed);

        if let Some(expected_num_keys) = self.expected_num_keys {
            self.shard_edge.set_up_shards(expected_num_keys, self.eps);
            self.log2_buckets = self.shard_edge.shard_high_bits();
        }

        pl.info(format_args!("Using 2^{} buckets", self.log2_buckets));

        // Loop until success or duplicate detection
        loop {
            let seed = prng.random();

            match if self.offline {
                self.try_seed(
                    seed,
                    sig_store::new_offline::<S, V>(
                        self.log2_buckets,
                        LOG2_MAX_SHARDS,
                        self.expected_num_keys,
                    )?,
                    &mut keys,
                    &mut values,
                    bit_width,
                    get_val,
                    new_data,
                    pl,
                )
            } else {
                self.try_seed(
                    seed,
                    sig_store::new_online::<S, V>(
                        self.log2_buckets,
                        LOG2_MAX_SHARDS,
                        self.expected_num_keys,
                    )?,
                    &mut keys,
                    &mut values,
                    bit_width,
                    get_val,
                    new_data,
                    pl,
                )
            } {
                Ok(func) => {
                    return Ok(func);
                }
                Err(error) => {
                    match error.downcast::<SolveError>() {
                        Ok(vfunc_error) => match vfunc_error {
                            // Let's try another seed, but just a few times--most likely,
                            // duplicate keys
                            SolveError::DuplicateSignature => {
                                if dup_count >= 3 {
                                    pl.error(format_args!("Duplicate keys (duplicate 128-bit signatures with four different seeds)"));
                                    return Err(BuildError::DuplicateKey.into());
                                }
                                pl.warn(format_args!(
                                    "Duplicate 128-bit signature, trying again with a different seed..."
                                ));
                                dup_count += 1;
                            }
                            // Let's try another seed, but just a few times
                            SolveError::DuplicateLocalSignature => {
                                if local_dup_count >= 2 {
                                    pl.error(format_args!("Duplicate local signatures: use full signatures (duplicate local signatures with three different seeds)"));
                                    return Err(BuildError::DuplicateLocalSignatures.into());
                                }
                                pl.warn(format_args!(
                                    "Duplicate local signature, trying again with a different seed..."
                                ));
                                local_dup_count += 1;
                            }
                            SolveError::MaxShardTooBig => {
                                pl.warn(format_args!(
                                    "The maximum shard is too big, trying again with a different seed..."
                                ));
                            }
                            // Let's just try another seed
                            SolveError::UnsolvableShard => {
                                pl.warn(format_args!(
                                    "Unsolvable shard, trying again with a different seed..."
                                ));
                            }
                        },
                        Err(error) => return Err(error),
                    }
                }
            }

            values = values.rewind().map_err(Into::into)?;
            keys = keys.rewind().map_err(Into::into)?;
        }
    }
}
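The retry policy of the loop above can be sketched standalone. In this illustration `Outcome`, `attempt`, and `build_with_retries` are hypothetical stand-ins for the `SolveError` handling in `build_loop`, and a plain LCG replaces `SmallRng` to keep the sketch dependency-free:

```rust
use std::num::Wrapping;

// Possible outcomes of a single construction attempt (hypothetical
// stand-ins for the `SolveError` cases handled in `build_loop`).
enum Outcome {
    Success(u64),
    DuplicateSignature,
    UnsolvableShard,
}

// Retry with fresh seeds; give up only after four duplicate-signature
// failures, which indicate that the keys themselves are duplicated.
fn build_with_retries(
    mut seed: u64,
    attempt: impl Fn(u64) -> Outcome,
) -> Result<u64, &'static str> {
    let mut dup_count = 0;
    loop {
        match attempt(seed) {
            Outcome::Success(s) => return Ok(s),
            Outcome::DuplicateSignature => {
                if dup_count >= 3 {
                    return Err("duplicate keys");
                }
                dup_count += 1;
            }
            // Just try another seed
            Outcome::UnsolvableShard => {}
        }
        // Cheap LCG reseed for the sketch; `build_loop` draws seeds
        // from a `SmallRng` instead.
        seed = (Wrapping(seed) * Wrapping(6364136223846793005)
            + Wrapping(1442695040888963407))
        .0;
    }
}

fn main() {
    // Duplicate signatures with four different seeds: report duplicate keys.
    assert_eq!(
        build_with_retries(1, |_| Outcome::DuplicateSignature),
        Err("duplicate keys")
    );
    // A solvable instance eventually finds a working seed.
    assert!(
        build_with_retries(1, |s| if s % 4 == 0 {
            Outcome::Success(s)
        } else {
            Outcome::UnsolvableShard
        })
        .is_ok()
    );
}
```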

impl<
    W: Word + BinSafe,
    D: BitFieldSlice<W>
        + for<'a> BitFieldSliceMut<W, ChunksMut<'a>: Iterator<Item: BitFieldSliceMut<W>>>
        + Send
        + Sync,
    S: Sig + Send + Sync,
    E: ShardEdge<S, 3>,
> VBuilder<W, D, S, E>
{
    /// Tries to build a function using a specific seed. See the comments in the
    /// [`VBuilder::build_loop`] method for more details.
    ///
    /// This method reads the input, sets up the shards, allocates the backend
    /// using `new_data`, and passes the backend and an iterator on shards to
    /// the [`VBuilder::try_build_from_shard_iter`] method.
    fn try_seed<
        T: ?Sized + ToSig<S> + std::fmt::Debug,
        B: ?Sized + Borrow<T>,
        V: BinSafe + Default + Send + Sync + Ord + UpcastableInto<u128>,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &mut self,
        seed: u64,
        mut sig_store: impl SigStore<S, V>,
        keys: &mut impl RewindableIoLender<B>,
        values: &mut impl RewindableIoLender<V>,
        bit_width: Option<usize>,
        get_val: &G,
        new_data: fn(usize, usize) -> D,
        pl: &mut (impl ProgressLog + Clone + Send + Sync),
    ) -> anyhow::Result<VFunc<T, W, D, S, E>>
    where
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        for<'a> ShardDataIter<'a, D>: Send,
        for<'a> <ShardDataIter<'a, D> as Iterator>::Item: Send,
    {
        let shard_edge = &mut self.shard_edge;

        pl.expected_updates(self.expected_num_keys);
        pl.item_name("key");
        pl.start(format!(
            "Computing and storing {}-bit signatures in {} using seed 0x{:016x}...",
            std::mem::size_of::<S>() * 8,
            sig_store
                .temp_dir()
                .map(|d| d.path().to_string_lossy())
                .unwrap_or(Cow::Borrowed("memory")),
            seed
        ));

        // This will be the maximum value for functions and EmptyVal for filters
        let mut maybe_max_value = V::default();
        let start = Instant::now();

        while let Some(result) = keys.next() {
            match result {
                Ok(key) => {
                    pl.light_update();
                    // This might be an actual value, if we are building a
                    // function, or EmptyVal, if we are building a filter.
                    let &maybe_val = values
                        .next()
                        .expect("Not enough values")
                        .map_err(Into::into)?;
                    let sig_val = SigVal {
                        sig: T::to_sig(key.borrow(), seed),
                        val: maybe_val,
                    };
                    maybe_max_value = Ord::max(maybe_max_value, maybe_val);
                    sig_store.try_push(sig_val)?;
                }
                Err(e) => return Err(e.into()),
            }
        }
        pl.done();

        self.num_keys = sig_store.len();
        self.bit_width = if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
            bit_width.expect("Bit width must be set for filters")
        } else {
            let len_width = maybe_max_value.upcast().len() as usize;
            if let Some(bit_width) = bit_width {
                if len_width > bit_width {
                    return Err(BuildError::ValueTooLarge.into());
                }
                bit_width
            } else {
                len_width
            }
        };

        info!(
            "Computation of signatures from inputs completed in {:.3} seconds ({} keys, {:.3} ns/key)",
            start.elapsed().as_secs_f64(),
            self.num_keys,
            start.elapsed().as_nanos() as f64 / self.num_keys as f64
        );

        let start = Instant::now();

        let shard_store = sig_store.into_shard_store(shard_edge.shard_high_bits())?;
        let max_shard = shard_store.shard_sizes().iter().copied().max().unwrap_or(0);
        let filter = TypeId::of::<V>() == TypeId::of::<EmptyVal>();

        shard_edge.set_up_shards(self.num_keys, self.eps);
        (self.c, self.lge) = shard_edge.set_up_graphs(self.num_keys, max_shard);

        if filter {
            pl.info(format_args!(
                "Number of keys: {} Bit width: {}",
                self.num_keys, self.bit_width,
            ));
        } else {
            pl.info(format_args!(
                "Number of keys: {} Max value: {} Bit width: {}",
                self.num_keys,
                maybe_max_value.upcast(),
                self.bit_width,
            ));
        }

        if shard_edge.shard_high_bits() != 0 {
            pl.info(format_args!(
                "Max shard / average shard: {:.2}%",
                (100.0 * max_shard as f64)
                    / (self.num_keys as f64 / shard_edge.num_shards() as f64)
            ));
        }

        if max_shard as f64 > 1.01 * self.num_keys as f64 / shard_edge.num_shards() as f64 {
            // This might sometimes happen with small sharded graphs
            Err(SolveError::MaxShardTooBig.into())
        } else {
            let data = new_data(
                self.bit_width,
                shard_edge.num_vertices() * shard_edge.num_shards(),
            );
            self.try_build_from_shard_iter(seed, data, shard_store.into_iter(), get_val, pl)
                .inspect(|_| {
                    info!(
                        "Construction from signatures completed in {:.3} seconds ({} keys, {:.3} ns/key)",
                        start.elapsed().as_secs_f64(),
                        self.num_keys,
                        start.elapsed().as_nanos() as f64 / self.num_keys as f64
                    );
                })
                .map_err(Into::into)
        }
    }
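`try_seed` sizes the backend from the largest value seen: `len()` (from the `common_traits` crate) yields the number of bits needed to represent the value. A minimal sketch of that computation on a plain `u128`, with `bits_needed` as a hypothetical stand-in:

```rust
// Sketch of the bit-width computation in `try_seed`: the width of a value
// is the number of bits needed to represent it. `bits_needed` is a
// hypothetical stand-in for common_traits' `len()`, shown on `u128`.
fn bits_needed(max_value: u128) -> usize {
    match max_value.checked_ilog2() {
        Some(b) => b as usize + 1, // e.g., 5 = 0b101 needs 3 bits
        None => 0,                 // a maximum value of 0 needs no bits
    }
}

fn main() {
    assert_eq!(bits_needed(0), 0);
    assert_eq!(bits_needed(1), 1);
    assert_eq!(bits_needed(5), 3);
    assert_eq!(bits_needed(255), 8);
    assert_eq!(bits_needed(256), 9);
}
```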

    /// Builds and returns a new function starting from an iterator on shards.
    ///
    /// See [`VBuilder::build_loop`] for more details on the parameters.
    fn try_build_from_shard_iter<
        T: ?Sized + ToSig<S>,
        I,
        P,
        V: BinSafe + Default + Send + Sync,
        G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync,
    >(
        &mut self,
        seed: u64,
        mut data: D,
        shard_iter: I,
        get_val: &G,
        pl: &mut P,
    ) -> Result<VFunc<T, W, D, S, E>, SolveError>
    where
        SigVal<S, V>: RadixKey,
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign,
        P: ProgressLog + Clone + Send + Sync,
        I: Iterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
        for<'a> ShardDataIter<'a, D>: Send,
        for<'a> <ShardDataIter<'a, D> as Iterator>::Item: Send,
    {
        let shard_edge = &self.shard_edge;
        self.num_threads = shard_edge.num_shards().min(self.max_num_threads);

        pl.info(format_args!("{}", self.shard_edge));
        pl.info(format_args!(
            "c: {}, Overhead: {:+.4}% Number of threads: {}",
            self.c,
            100. * ((shard_edge.num_vertices() * shard_edge.num_shards()) as f64
                / (self.num_keys as f64)
                - 1.),
            self.num_threads
        ));

        if self.lge {
            pl.info(format_args!("Peeling towards lazy Gaussian elimination"));
            self.par_solve(
                shard_iter,
                &mut data,
                |this, shard_index, shard, data, pl| {
                    this.lge_shard(shard_index, shard, data, get_val, pl)
                },
                &mut pl.concurrent(),
                pl,
            )?;
        } else if self.low_mem == Some(true)
            || self.low_mem.is_none() && self.num_threads > 3 && shard_edge.num_shards() > 2
        {
            // Less memory, but slower
            self.par_solve(
                shard_iter,
                &mut data,
                |this, shard_index, shard, data, pl| {
                    this.peel_by_sig_vals_low_mem(shard_index, shard, data, get_val, pl)
                },
                &mut pl.concurrent(),
                pl,
            )?;
        } else {
            // More memory, but faster
            self.par_solve(
                shard_iter,
                &mut data,
                |this, shard_index, shard, data, pl| {
                    this.peel_by_sig_vals_high_mem(shard_index, shard, data, get_val, pl)
                },
                &mut pl.concurrent(),
                pl,
            )?;
        }

        pl.info(format_args!(
            "Bits/keys: {} ({:+.4}%)",
            data.len() as f64 * self.bit_width as f64 / self.num_keys as f64,
            100.0 * (data.len() as f64 / self.num_keys as f64 - 1.),
        ));

        Ok(VFunc {
            seed,
            shard_edge: self.shard_edge,
            num_keys: self.num_keys,
            data,
            _marker_t: std::marker::PhantomData,
            _marker_w: std::marker::PhantomData,
            _marker_s: std::marker::PhantomData,
        })
    }
}
1072

1073
macro_rules! remove_edge {
1074
    ($xor_graph: ident, $e: ident, $side: ident, $edge: ident, $stack: ident, $push:ident) => {
1075
        match $side {
1076
            0 => {
1077
                if $xor_graph.degree($e[1]) == 2 {
1078
                    $stack.$push($e[1].cast());
1079
                }
1080
                $xor_graph.remove($e[1], $edge, 1);
1081
                if $xor_graph.degree($e[2]) == 2 {
1082
                    $stack.$push($e[2].cast());
1083
                }
1084
                $xor_graph.remove($e[2], $edge, 2);
1085
            }
1086
            1 => {
1087
                if $xor_graph.degree($e[0]) == 2 {
1088
                    $stack.$push($e[0].cast());
1089
                }
1090
                $xor_graph.remove($e[0], $edge, 0);
1091
                if $xor_graph.degree($e[2]) == 2 {
1092
                    $stack.$push($e[2].cast());
1093
                }
1094
                $xor_graph.remove($e[2], $edge, 2);
1095
            }
1096
            2 => {
1097
                if $xor_graph.degree($e[0]) == 2 {
1098
                    $stack.$push($e[0].cast());
1099
                }
1100
                $xor_graph.remove($e[0], $edge, 0);
1101
                if $xor_graph.degree($e[1]) == 2 {
1102
                    $stack.$push($e[1].cast());
1103
                }
1104
                $xor_graph.remove($e[1], $edge, 1);
1105
            }
1106
            _ => unsafe { unreachable_unchecked() },
1107
        }
1108
    };
1109
}
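The `remove_edge!` macro works on a XOR-based incidence structure: each vertex keeps its degree together with the XOR of the indices of its incident edges, so when the degree drops to 1 the XOR field names the one remaining edge. An illustrative sketch of that trick (`XorGraphSketch` is hypothetical, not the crate's `XorGraph`, and it omits the per-edge sides):

```rust
// Illustrative sketch of the XOR trick behind `remove_edge!`: every vertex
// stores its degree and the XOR of its incident edge indices, so a
// degree-1 vertex reveals its unique incident edge with no edge lists.
struct XorGraphSketch {
    degree: Vec<usize>,
    edge_xor: Vec<usize>,
}

impl XorGraphSketch {
    fn new(n: usize) -> Self {
        Self { degree: vec![0; n], edge_xor: vec![0; n] }
    }
    fn add(&mut self, v: usize, edge: usize) {
        self.degree[v] += 1;
        self.edge_xor[v] ^= edge;
    }
    fn remove(&mut self, v: usize, edge: usize) {
        self.degree[v] -= 1;
        self.edge_xor[v] ^= edge;
    }
    /// Only meaningful when `degree[v] == 1`.
    fn only_edge(&self, v: usize) -> usize {
        self.edge_xor[v]
    }
}

fn main() {
    // Two 3-edges over 4 vertices: edge 0 = {0, 1, 2}, edge 1 = {1, 2, 3}.
    let edges = [[0usize, 1, 2], [1, 2, 3]];
    let mut g = XorGraphSketch::new(4);
    for (i, e) in edges.iter().enumerate() {
        for &v in e {
            g.add(v, i);
        }
    }
    // Vertex 0 has degree 1, and the XOR field names its only edge.
    assert_eq!(g.degree[0], 1);
    assert_eq!(g.only_edge(0), 0);
    // Peel edge 0: remove it from all its endpoints.
    for &v in &edges[0] {
        g.remove(v, 0);
    }
    // Now vertex 1 has degree 1 again, incident only to edge 1.
    assert_eq!(g.degree[1], 1);
    assert_eq!(g.only_edge(1), 1);
}
```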

impl<
    W: Word + BinSafe + Send + Sync,
    D: BitFieldSlice<W>
        + for<'a> BitFieldSliceMut<W, ChunksMut<'a>: Iterator<Item: BitFieldSliceMut<W>>>
        + Send
        + Sync,
    S: Sig + BinSafe,
    E: ShardEdge<S, 3>,
> VBuilder<W, D, S, E>
{
    fn count_sort<V: BinSafe>(&self, data: &mut [SigVal<S, V>]) {
        let num_sort_keys = self.shard_edge.num_sort_keys();
        let mut count = vec![0; num_sort_keys];

        let mut copied = Box::new_uninit_slice(data.len());
        for (&sig_val, copy) in data.iter().zip(copied.iter_mut()) {
            count[self.shard_edge.sort_key(sig_val.sig)] += 1;
            copy.write(sig_val);
        }
        let copied = unsafe { copied.assume_init() };

        count.iter_mut().fold(0, |acc, c| {
            let old = *c;
            *c = acc;
            acc + old
        });

        for &sig_val in copied.iter() {
            let key = self.shard_edge.sort_key(sig_val.sig);
            data[count[key]] = sig_val;
            count[key] += 1;
        }
    }
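The method above is a classic two-pass counting sort: histogram the keys, turn the histogram into an exclusive prefix sum of output positions, then place each element stably. A minimal self-contained sketch of the same passes, with plain `u8` keys standing in for `ShardEdge::sort_key` values:

```rust
/// Minimal sketch of the two-pass counting sort used in `count_sort`,
/// with plain `u8` keys standing in for `ShardEdge::sort_key` values.
fn counting_sort(data: &mut [u8], num_keys: usize) {
    // Copy the input aside and build a histogram of keys.
    let copied: Vec<u8> = data.to_vec();
    let mut count = vec![0usize; num_keys];
    for &v in copied.iter() {
        count[v as usize] += 1;
    }
    // Exclusive prefix sum: count[k] is now the first output slot for key k.
    count.iter_mut().fold(0, |acc, c| {
        let old = *c;
        *c = acc;
        acc + old
    });
    // Second pass: stable placement, bumping each key's cursor.
    for &v in copied.iter() {
        data[count[v as usize]] = v;
        count[v as usize] += 1;
    }
}

fn main() {
    let mut data = [2u8, 0, 1, 2, 0, 1, 1];
    counting_sort(&mut data, 3);
    assert_eq!(data, [0, 0, 1, 1, 1, 2, 2]);
}
```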

    /// Above this number of keys, in the case of filters, we remove duplicate
    /// edges.
    const MAX_NO_LOCAL_SIG_CHECK: usize = 1 << 33;

    /// Solves in parallel the shards returned by an iterator, storing
    /// the result in `data`.
    ///
    /// # Arguments
    ///
    /// * `shard_iter`: an iterator returning the shards to solve.
    ///
    /// * `data`: the storage for the solution values.
    ///
    /// * `solve_shard`: a method to solve a shard; it takes the shard index,
    ///   the shard, shard-local data, and a progress logger. It may
    ///   fail by returning an error.
    ///
    /// * `main_pl`: the progress logger for the overall computation.
    ///
    /// * `pl`: a progress logger that will be cloned to display the progress of
    ///   the current shard.
    ///
    /// # Errors
    ///
    /// This method returns an error if one of the shards cannot be solved, or
    /// if duplicates are detected.
    fn par_solve<
        'b,
        V: BinSafe,
        I: IntoIterator<Item = Arc<Vec<SigVal<S, V>>>> + Send,
        SS: Fn(&Self, usize, Arc<Vec<SigVal<S, V>>>, ShardData<'b, D>, &mut P) -> Result<(), ()>
            + Send
            + Sync
            + Copy,
        C: ConcurrentProgressLog + Send + Sync,
        P: ProgressLog + Clone + Send + Sync,
    >(
        &self,
        shard_iter: I,
        data: &'b mut D,
        solve_shard: SS,
        main_pl: &mut C,
        pl: &mut P,
    ) -> Result<(), SolveError>
    where
        I::IntoIter: Send,
        SigVal<S, V>: RadixKey,
        for<'a> ShardDataIter<'a, D>: Send,
        for<'a> <ShardDataIter<'a, D> as Iterator>::Item: Send,
    {
        main_pl
            .item_name("shard")
            .expected_updates(Some(self.shard_edge.num_shards()))
            .display_memory(true)
            .start("Solving shards...");

        self.failed.store(false, Ordering::Relaxed);
        let num_shards = self.shard_edge.num_shards();
        let buffer_size = self.num_threads.ilog2() as usize;

        let (err_send, err_recv) = crossbeam_channel::bounded::<_>(self.num_threads);
        let (data_send, data_recv) = crossbeam_channel::bounded::<(
            usize,
            (Arc<Vec<SigVal<S, V>>>, ShardData<'_, D>),
        )>(buffer_size);

        let result = std::thread::scope(|scope| {
            scope.spawn(move || {
                let _ = thread_priority::set_current_thread_priority(ThreadPriority::Max);
                for val in shard_iter
                    .into_iter()
                    .zip(data.try_chunks_mut(self.shard_edge.num_vertices()).unwrap())
                    .enumerate()
                {
                    if data_send.send(val).is_err() {
                        break;
                    }
                }

                drop(data_send);
            });

            for _thread_id in 0..self.num_threads {
                let mut main_pl = main_pl.clone();
                let mut pl = pl.clone();
                let err_send = err_send.clone();
                let data_recv = data_recv.clone();
                scope.spawn(move || {
                    loop {
                        match data_recv.recv() {
                            Err(_) => return,
                            Ok((shard_index, (shard, mut data))) => {
                                if shard.is_empty() {
                                    return;
                                }

                                main_pl.info(format_args!(
                                    "Analyzing shard {}/{}...",
                                    shard_index + 1,
                                    num_shards
                                ));

                                pl.start(format!(
                                    "Sorting shard {}/{}...",
                                    shard_index + 1,
                                    num_shards
                                ));

                                {
                                    // SAFETY: only one thread may be accessing the
                                    // shard, and we will be consuming it
                                    let shard = unsafe {
                                        &mut *(Arc::as_ptr(&shard) as *mut Vec<SigVal<S, V>>)
                                    };

                                    if self.check_dups {
                                        shard.radix_sort_builder().sort();
                                        if shard.par_windows(2).any(|w| w[0].sig == w[1].sig) {
                                            let _ = err_send.send(SolveError::DuplicateSignature);
                                            return;
                                        }
                                    }

                                    if TypeId::of::<E::LocalSig>() != TypeId::of::<S>()
                                        && self.num_keys > Self::MAX_NO_LOCAL_SIG_CHECK
                                    {
                                        // Check for duplicate local signatures

                                        // E::SortSig provides a transmutable
                                        // view of SigVal with an implementation
                                        // of RadixKey that is compatible with
                                        // the sort induced by the key returned
                                        // by ShardEdge::sort_key, and equality
                                        // that implies equality of local
                                        // signatures.

                                        // SAFETY: we drop this immediately after sorting.
                                        let shard = unsafe {
                                            transmute::<
                                                &mut Vec<SigVal<S, V>>,
                                                &mut Vec<E::SortSigVal<V>>,
                                            >(shard)
                                        };

                                        let builder = shard.radix_sort_builder();
                                        if self.max_num_threads == 1 {
                                            builder
                                                .with_single_threaded_tuner()
                                                .with_parallel(false)
                                        } else {
                                            builder
                                        }
                                        .sort();

                                        let shard_len = shard.len();
                                        shard.dedup();

                                        if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
                                            // Duplicate local signatures on
                                            // large filters can be simply
                                            // removed: they do not change the
                                            // semantics of the filter because
                                            // hashes are computed on
                                            // local signatures.
                                            pl.info(format_args!(
                                                "Removed signatures: {}",
                                                shard_len - shard.len()
                                            ));
                                        } else {
                                            // For functions, we have to try again
                                            if shard_len != shard.len() {
                                                let _ = err_send
                                                    .send(SolveError::DuplicateLocalSignature);
                                                return;
                                            }
                                        }
                                    } else if self.shard_edge.num_sort_keys() != 1 {
                                        // Sorting the signatures increases locality
                                        self.count_sort::<V>(shard);
                                    }
                                }

                                pl.done_with_count(shard.len());

                                main_pl.info(format_args!(
                                    "Solving shard {}/{}...",
                                    shard_index + 1,
                                    num_shards
                                ));

                                if self.failed.load(Ordering::Relaxed) {
                                    return;
                                }

                                if TypeId::of::<V>() == TypeId::of::<EmptyVal>() {
                                    // For filters, we fill the array with random data, otherwise
                                    // elements with signature 0 would have a significantly higher
                                    // probability of being false positives.
                                    //
                                    // SAFETY: We work around the fact that [usize] does not implement Fill
                                    Mwc192::seed_from_u64(self.seed).fill_bytes(unsafe {
                                        data.as_mut_slice().align_to_mut::<u8>().1
                                    });
                                }

                                if solve_shard(self, shard_index, shard, data, &mut pl).is_err() {
                                    let _ = err_send.send(SolveError::UnsolvableShard);
                                    return;
                                }

                                if self.failed.load(Ordering::Relaxed) {
                                    return;
                                }

                                main_pl.info(format_args!(
                                    "Completed shard {}/{}",
                                    shard_index + 1,
                                    num_shards
                                ));
                                main_pl.update_and_display();
                            }
                        }
                    }
                });
            }

            drop(err_send);
            drop(data_recv);

            if let Some(error) = err_recv.into_iter().next() {
                self.failed.store(true, Ordering::Relaxed);
                return Err(error);
            }

            Ok(())
        });

        main_pl.done();
        result
    }
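The fan-out pattern in `par_solve` — one producer feeding shard/data pairs through a bounded channel to scoped worker threads, with a separate channel collecting errors — can be sketched with the standard library alone. Here `process_shards` is a hypothetical stand-in; std's `mpsc::sync_channel` replaces crossbeam's bounded channel, so the single-consumer receiver is shared behind a `Mutex` (crossbeam's receiver can be cloned instead):

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Fan shards out to `num_threads` workers through a bounded channel and
/// collect one result per shard; the first reported error wins.
fn process_shards(shards: Vec<Vec<u32>>, num_threads: usize) -> Result<Vec<u32>, &'static str> {
    let num_shards = shards.len();
    // Bounded data channel: the producer stays only a few shards ahead.
    let (data_send, data_recv) = sync_channel::<(usize, Vec<u32>)>(2);
    // std's Receiver is single-consumer, so workers share it via a Mutex.
    let data_recv = Arc::new(Mutex::new(data_recv));
    let (err_send, err_recv) = sync_channel::<&'static str>(num_threads);
    let totals = Arc::new(Mutex::new(vec![0u32; num_shards]));

    thread::scope(|scope| {
        // Producer: enumerate shards into the channel, then drop the
        // sender so workers see the end of the stream.
        scope.spawn(move || {
            for val in shards.into_iter().enumerate() {
                if data_send.send(val).is_err() {
                    break;
                }
            }
        });
        for _ in 0..num_threads {
            let data_recv = Arc::clone(&data_recv);
            let err_send = err_send.clone();
            let totals = Arc::clone(&totals);
            scope.spawn(move || {
                loop {
                    // Take the next shard, or exit when the stream ends.
                    let msg = data_recv.lock().unwrap().recv();
                    match msg {
                        Err(_) => return,
                        Ok((shard_index, shard)) => {
                            if shard.is_empty() {
                                let _ = err_send.send("empty shard");
                                return;
                            }
                            // The "solve" step: here, just a sum.
                            totals.lock().unwrap()[shard_index] = shard.iter().sum();
                        }
                    }
                }
            });
        }
        drop(err_send);
    });

    if let Some(error) = err_recv.into_iter().next() {
        return Err(error);
    }
    Ok(Arc::try_unwrap(totals).unwrap().into_inner().unwrap())
}

fn main() {
    let shards: Vec<Vec<u32>> = (0..8).map(|i| vec![i; 4]).collect();
    let totals = process_shards(shards, 3).unwrap();
    assert_eq!(totals, vec![0, 4, 8, 12, 16, 20, 24, 28]);
}
```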

    /// Peels a shard via edge indices.
    ///
    /// This peeler uses a [`SigVal`] per key (the shard), a
    /// [`ShardEdge::Vertex`] and a byte per vertex (for the [`XorGraph`]), a
    /// [`ShardEdge::Vertex`] per vertex (for the [`DoubleStack`]), and a final
    /// byte per vertex (for the stack of sides).
    ///
    /// This peeler uses more memory than
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) but
    /// less memory than
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem). It
    /// is fairly slow as it has to go through a cache-unfriendly memory
    /// indirection every time it has to retrieve a [`SigVal`] from the shard,
    /// but it is the peeler of choice when [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517) is required, as
    /// after a failed peel-by-sig-vals it is not possible to retrieve
    /// information about the signature/value pairs in the shard.
    ///
    /// In theory one could avoid the stack of sides by putting vertices,
    /// instead of edge indices, on the upper stack, and retrieving edge indices
    /// and sides from the [`XorGraph`], as
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem) does,
    /// but this would be less cache friendly. This peeler is only used for
    /// very small instances, and since we are going to pass through lazy
    /// Gaussian elimination some additional speed is a good idea.
    fn peel_by_index<'a, V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'a, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<PeelResult<'a, W, D, S, E, V>, ()> {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards
        ));

        let mut xor_graph = XorGraph::<E::Vertex>::new(num_vertices);
        for (edge_index, sig_val) in shard.iter().enumerate() {
            for (side, &v) in shard_edge
                .local_edge(shard_edge.local_sig(sig_val.sig))
                .iter()
                .enumerate()
            {
                xor_graph.add(v, edge_index.cast(), side);
            }
        }
        pl.done_with_count(shard.len());

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        // The lower stack contains vertices to be visited. The upper stack
        // contains peeled edges.
        let mut double_stack = DoubleStack::<E::Vertex>::new(num_vertices);
        let mut sides_stack = Vec::<u8>::new();

        pl.start(format!(
            "Peeling graph for shard {}/{} by edge indices...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                double_stack.push_lower(v.cast());
            }
        }

        while let Some(v) = double_stack.pop_lower() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            debug_assert!(xor_graph.degree(v) == 1);
            let (edge_index, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            double_stack.push_upper(edge_index);
            sides_stack.push(side as u8);

            let e = shard_edge.local_edge(shard_edge.local_sig(shard[edge_index.upcast()].sig));
            remove_edge!(xor_graph, e, side, edge_index, double_stack, push_lower);
        }

        pl.done();

        if shard.len() != double_stack.upper_len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                double_stack.upper_len(),
                shard.len(),
            ));
            return Ok(PeelResult::Partial {
                shard_index,
                shard,
                data,
                double_stack,
                sides_stack,
                _marker: std::marker::PhantomData,
            });
        }

        self.assign(
            shard_index,
            data,
            double_stack
                .iter_upper()
                .map(|&edge_index| {
100✔
1511
                    // Turn edge indices into local edge signatures
1512
                    // and associated values
1513
                    let sig_val = &shard[edge_index.upcast()];
300✔
1514
                    let local_sig = shard_edge.local_sig(sig_val.sig);
400✔
1515
                    (
1516
                        local_sig,
200✔
1517
                        get_val(
200✔
1518
                            shard_edge,
200✔
1519
                            SigVal {
100✔
1520
                                sig: local_sig,
100✔
1521
                                val: sig_val.val,
100✔
1522
                            },
1523
                        ),
1524
                    )
1525
                })
1526
                .zip(sides_stack.into_iter().rev()),
×
1527
            pl,
×
1528
        );
1529

1530
        Ok(PeelResult::Complete())
×
1531
    }
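
    // Illustrative sketch (not part of the build path): the peeler above
    // relies on the XOR-aggregation trick. Each vertex of the XorGraph keeps
    // just a degree counter and the XOR of its incident edge indices; when the
    // degree drops to one, the accumulator is exactly the index of the unique
    // remaining edge. With plain integers and hypothetical 3-vertex edges, the
    // invariant can be sketched as:
    //
    //     let edges: &[[usize; 3]] = &[[0, 1, 2], [1, 2, 3]];
    //     let mut acc = [0usize; 4];
    //     let mut deg = [0usize; 4];
    //     for (i, e) in edges.iter().enumerate() {
    //         for &v in e {
    //             acc[v] ^= i; // XOR in the edge index
    //             deg[v] += 1;
    //         }
    //     }
    //     // Now deg[0] == 1 and acc[0] == 0: edge 0 is the unique edge
    //     // incident on vertex 0. Peeling edge i decrements the degree of
    //     // its other endpoints and XORs i out of their accumulators.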

    /// Peels a shard via signature/value pairs using a stack of peeled
    /// signature/value pairs.
    ///
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
    /// it drops it immediately after building the graph.
    ///
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]), a
    /// [`ShardEdge::Vertex`] per vertex (for the visit stack, albeit usually
    /// the stack never contains more than a third of the vertices), and a
    /// [`SigVal`] and a byte per key (for the stack of peeled edges).
    ///
    /// This is the fastest and most memory-consuming peeler. It has, however,
    /// just a small advantage during assignment with respect to
    /// [`peel_by_sig_vals_low_mem`](VBuilder::peel_by_sig_vals_low_mem), which
    /// uses almost half the memory. It is the peeler of choice for low levels
    /// of parallelism.
    ///
    /// This peeler cannot be used in conjunction with [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517), as after a
    /// failed peeling it is not possible to retrieve information about the
    /// signature/value pairs in the shard.
    fn peel_by_sig_vals_high_mem<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()>
    where
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
    {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();
        let shard_len = shard.len();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards
        ));

        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
        for &sig_val in shard.iter() {
            let local_sig = shard_edge.local_sig(sig_val.sig);
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
                xor_graph.add(
                    v,
                    SigVal {
                        sig: local_sig,
                        val: sig_val.val,
                    },
                    side,
                );
            }
        }
        pl.done_with_count(shard.len());

        // We are using a consuming iterator over the shard store, so this
        // drop will free the memory used by the signatures
        drop(shard);

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        let mut sig_vals_stack = FastStack::<SigVal<E::LocalSig, V>>::new(shard_len);
        let mut sides_stack = FastStack::<u8>::new(shard_len);
        // Experimentally this stack never grows beyond a little more than
        // num_vertices / 4
        let mut visit_stack = Vec::<E::Vertex>::with_capacity(num_vertices / 3);

        pl.start(format!(
            "Peeling graph for shard {}/{} by signatures (high-mem)...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                visit_stack.push(v.cast());
            }
        }

        while let Some(v) = visit_stack.pop() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            let (sig_val, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            sig_vals_stack.push(sig_val);
            sides_stack.push(side as u8);

            let e = self.shard_edge.local_edge(sig_val.sig);
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push);
        }

        pl.done();

        if shard_len != sig_vals_stack.len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                sig_vals_stack.len(),
                shard_len
            ));
            return Err(());
        }

        self.assign(
            shard_index,
            data,
            sig_vals_stack
                .iter()
                .rev()
                .map(|&sig_val| (sig_val.sig, get_val(shard_edge, sig_val)))
                .zip(sides_stack.iter().copied().rev()),
            pl,
        );

        Ok(())
    }

    /// Peels a shard via signature/value pairs using a stack of vertices to
    /// represent peeled edges.
    ///
    /// This peeler does not need the shard once the [`XorGraph`] is built, so
    /// it drops it immediately after building the graph.
    ///
    /// It uses a [`SigVal`] and a byte per vertex (for the [`XorGraph`]) and a
    /// [`ShardEdge::Vertex`] per vertex (for a [`DoubleStack`]).
    ///
    /// This is by far the least memory-hungry peeler, and it is just slightly
    /// slower than
    /// [`peel_by_sig_vals_high_mem`](VBuilder::peel_by_sig_vals_high_mem),
    /// which uses almost twice the memory. It is the peeler of choice for
    /// significant levels of parallelism.
    ///
    /// This peeler cannot be used in conjunction with [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517), as after a
    /// failed peeling it is not possible to retrieve information about the
    /// signature/value pairs in the shard.
    fn peel_by_sig_vals_low_mem<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()>
    where
        SigVal<E::LocalSig, V>: BitXor + BitXorAssign + Default,
    {
        let shard_edge = &self.shard_edge;
        let num_vertices = shard_edge.num_vertices();
        let num_shards = shard_edge.num_shards();
        let shard_len = shard.len();

        pl.start(format!(
            "Generating graph for shard {}/{}...",
            shard_index + 1,
            num_shards,
        ));

        let mut xor_graph = XorGraph::<SigVal<E::LocalSig, V>>::new(num_vertices);
        for &sig_val in shard.iter() {
            let local_sig = shard_edge.local_sig(sig_val.sig);
            for (side, &v) in shard_edge.local_edge(local_sig).iter().enumerate() {
                xor_graph.add(
                    v,
                    SigVal {
                        sig: local_sig,
                        val: sig_val.val,
                    },
                    side,
                );
            }
        }
        pl.done_with_count(shard.len());

        // We are using a consuming iterator over the shard store, so this
        // drop will free the memory used by the signatures
        drop(shard);

        assert!(
            !xor_graph.overflow,
            "Degree overflow for shard {}/{}",
            shard_index + 1,
            num_shards
        );

        if self.failed.load(Ordering::Relaxed) {
            return Err(());
        }

        let mut visit_stack = DoubleStack::<E::Vertex>::new(num_vertices);

        pl.start(format!(
            "Peeling graph for shard {}/{} by signatures (low-mem)...",
            shard_index + 1,
            num_shards
        ));

        // Preload all vertices of degree one in the visit stack
        for (v, degree) in xor_graph.degrees().enumerate() {
            if degree == 1 {
                visit_stack.push_lower(v.cast());
            }
        }

        while let Some(v) = visit_stack.pop_lower() {
            let v: usize = v.upcast();
            if xor_graph.degree(v) == 0 {
                continue;
            }
            let (sig_val, side) = xor_graph.edge_and_side(v);
            xor_graph.zero(v);
            // Peeled edges are represented just by their peeled vertex; the
            // signature and side are recovered from the graph at assignment
            // time via edge_and_side.
            visit_stack.push_upper(v.cast());

            let e = self.shard_edge.local_edge(sig_val.sig);
            remove_edge!(xor_graph, e, side, sig_val, visit_stack, push_lower);
        }

        pl.done();

        if shard_len != visit_stack.upper_len() {
            pl.info(format_args!(
                "Peeling failed for shard {}/{} (peeled {} out of {} edges)",
                shard_index + 1,
                num_shards,
                visit_stack.upper_len(),
                shard_len
            ));
            return Err(());
        }

        self.assign(
            shard_index,
            data,
            visit_stack.iter_upper().map(|&v| {
                let (sig_val, side) = xor_graph.edge_and_side(v.upcast());
                ((sig_val.sig, get_val(shard_edge, sig_val)), side as u8)
            }),
            pl,
        );

        Ok(())
    }

    /// Solves a shard of given index, possibly using [lazy Gaussian
    /// elimination](https://doi.org/10.1016/j.ic.2020.104517), and stores the
    /// solution in the given data.
    ///
    /// As a first try, the shard is [peeled by index](VBuilder::peel_by_index).
    /// If the peeling is [partial](PeelResult::Partial), lazy Gaussian
    /// elimination is used to solve the remaining edges.
    ///
    /// This method will scan the double stack, without emptying it, to check
    /// which edges have been peeled. The information will then be passed to
    /// [`VBuilder::assign`] to complete the assignment of values.
    fn lge_shard<V: BinSafe, G: Fn(&E, SigVal<E::LocalSig, V>) -> W + Send + Sync>(
        &self,
        shard_index: usize,
        shard: Arc<Vec<SigVal<S, V>>>,
        data: ShardData<'_, D>,
        get_val: &G,
        pl: &mut impl ProgressLog,
    ) -> Result<(), ()> {
        let shard_edge = &self.shard_edge;
        // Let's try to peel first
        match self.peel_by_index(shard_index, shard, data, get_val, pl) {
            Err(()) => Err(()),
            Ok(PeelResult::Complete()) => Ok(()),
            Ok(PeelResult::Partial {
                shard_index,
                shard,
                mut data,
                double_stack,
                sides_stack,
                _marker: PhantomData,
            }) => {
                pl.info(format_args!("Switching to lazy Gaussian elimination..."));
                // Likely result: we have to solve the rest
                pl.start(format!(
                    "Generating system for shard {}/{}...",
                    shard_index + 1,
                    shard_edge.num_shards()
                ));

                let num_vertices = shard_edge.num_vertices();
                let mut peeled_edges = BitVec::new(shard.len());
                let mut used_vars = BitVec::new(num_vertices);
                for &edge in double_stack.iter_upper() {
                    peeled_edges.set(edge.upcast(), true);
                }

                // Create data for a system using non-peeled edges
                //
                // SAFETY: there is no undefined behavior here, but the
                // raw construction methods we use assume that the
                // equations are sorted, that the variables are not repeated,
                // and that the variables are in the range [0..num_vertices).
                let mut system = unsafe {
                    crate::utils::mod2_sys::Modulo2System::from_parts(
                        num_vertices,
                        shard
                            .iter()
                            .enumerate()
                            .filter(|(edge_index, _)| !peeled_edges[*edge_index])
                            .map(|(_edge_index, sig_val)| {
                                let local_sig = shard_edge.local_sig(sig_val.sig);
                                let mut eq: Vec<_> = shard_edge
                                    .local_edge(local_sig)
                                    .iter()
                                    .map(|&x| {
                                        used_vars.set(x, true);
                                        x as u32
                                    })
                                    .collect();
                                eq.sort_unstable();
                                crate::utils::mod2_sys::Modulo2Equation::from_parts(
                                    eq,
                                    get_val(
                                        shard_edge,
                                        SigVal {
                                            sig: local_sig,
                                            val: sig_val.val,
                                        },
                                    ),
                                )
                            })
                            .collect(),
                    )
                };

                if self.failed.load(Ordering::Relaxed) {
                    return Err(());
                }

                pl.start("Solving system...");
                let result = system.lazy_gaussian_elimination().map_err(|_| ())?;
                pl.done_with_count(system.num_equations());

                for (v, &value) in result.iter().enumerate().filter(|(v, _)| used_vars[*v]) {
                    data.set_value(v, value);
                }

                self.assign(
                    shard_index,
                    data,
                    double_stack
                        .iter_upper()
                        .map(|&edge_index| {
                            let sig_val = &shard[edge_index.upcast()];
                            let local_sig = shard_edge.local_sig(sig_val.sig);
                            (
                                local_sig,
                                get_val(
                                    shard_edge,
                                    SigVal {
                                        sig: local_sig,
                                        val: sig_val.val,
                                    },
                                ),
                            )
                        })
                        .zip(sides_stack.into_iter().rev()),
                    pl,
                );
                Ok(())
            }
        }
    }

    /// Performs assignment of values based on peeling data.
    ///
    /// This method might be called after a successful peeling procedure, or
    /// after a linear solver has been used to solve the remaining edges.
    ///
    /// `sigs_vals_sides` is an iterator returning pairs of signature/value
    /// pairs and sides in reverse peeling order.
    fn assign(
        &self,
        shard_index: usize,
        mut data: ShardData<'_, D>,
        sigs_vals_sides: impl Iterator<Item = ((E::LocalSig, W), u8)>,
        pl: &mut impl ProgressLog,
    ) where
        for<'a> ShardData<'a, D>: SliceByValueMut<Value = W>,
    {
        if self.failed.load(Ordering::Relaxed) {
            return;
        }

        pl.start(format!(
            "Assigning values for shard {}/{}...",
            shard_index + 1,
            self.shard_edge.num_shards()
        ));

        for ((sig, val), side) in sigs_vals_sides {
            let edge = self.shard_edge.local_edge(sig);
            let side = side as usize;
            unsafe {
                let xor = match side {
                    0 => data.get_value_unchecked(edge[1]) ^ data.get_value_unchecked(edge[2]),
                    1 => data.get_value_unchecked(edge[0]) ^ data.get_value_unchecked(edge[2]),
                    2 => data.get_value_unchecked(edge[0]) ^ data.get_value_unchecked(edge[1]),
                    _ => core::hint::unreachable_unchecked(),
                };

                data.set_value_unchecked(edge[side], val ^ xor);
            }
        }
        pl.done();
    }
}
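
// A minimal, self-contained sketch (using plain `u64` values and hardcoded
// edges, not the real `ShardData`/`ShardEdge` types) of the invariant that
// `assign` establishes: processing peeled edges in reverse peeling order and
// setting data[e[side]] = val ^ data[e[i]] ^ data[e[j]] (where i, j are the
// other two sides) makes the XOR of the three endpoint values equal to each
// edge's value.
#[cfg(test)]
mod assign_invariant_sketch {
    #[test]
    fn xor_of_endpoints_equals_value() {
        // Hypothetical peeled shard: edges as vertex triples with the peeled
        // side and the target value, already listed in reverse peeling order,
        // so the vertex assigned for an edge is never reassigned by an edge
        // processed later.
        let peeled: [([usize; 3], usize, u64); 2] =
            [([2, 3, 4], 0, 0xCD), ([0, 1, 2], 0, 0xAB)];
        let mut data = [0u64; 5];
        for &(e, side, val) in &peeled {
            let xor = match side {
                0 => data[e[1]] ^ data[e[2]],
                1 => data[e[0]] ^ data[e[2]],
                _ => data[e[0]] ^ data[e[1]],
            };
            data[e[side]] = val ^ xor;
        }
        // Every edge now XORs to its value over its three endpoints.
        for &(e, _, val) in &peeled {
            assert_eq!(data[e[0]] ^ data[e[1]] ^ data[e[2]], val);
        }
    }
}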