djeedai / bevy_hanabi / build 11638278810 (push, via github, by web-flow)

02 Nov 2024 12:30AM UTC coverage: 57.846% (-0.08%) from 57.93%

Improve debugging (#396)

- Several asset-related errors now emit the asset name.
- `BufferTable` now emits a table name in its logs.
- Improved `ExprError` error messages by implementing `Display` for
  `ValueType`.
- `SetAttributeModifier::eval()` now attempts to check that the type of the
  value emitted by the expression, if available, matches the type of the
  attribute being assigned. This prevents generating invalid shader code.

21 of 53 new or added lines in 5 files covered. (39.62%)

1 existing line in 1 file now uncovered.

3561 of 6156 relevant lines covered (57.85%)

22.92 hits per line

Source File
/src/render/buffer_table.rs: 66.23%
1
use std::{
2
    borrow::Cow,
3
    num::{NonZeroU32, NonZeroU64},
4
};
5

6
use bevy::{
7
    log::trace,
8
    render::{
9
        render_resource::{
10
            Buffer, BufferAddress, BufferDescriptor, BufferUsages, CommandEncoder, ShaderSize,
11
            ShaderType,
12
        },
13
        renderer::{RenderDevice, RenderQueue},
14
    },
15
};
16
use bytemuck::{cast_slice, Pod};
17
use copyless::VecHelper;
18

19
use crate::next_multiple_of;
20

21
/// Index of a row in a [`BufferTable`].
22
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
23
pub struct BufferTableId(pub(crate) u32); // TEMP: pub(crate)
24

25
impl BufferTableId {
26
    #[inline]
27
    pub fn offset(&self, index: u32) -> BufferTableId {
×
28
        BufferTableId(self.0 + index)
×
29
    }
30
}
31

32
#[derive(Debug)]
33
struct AllocatedBuffer {
34
    /// Currently allocated buffer, with a capacity of `count` rows.
35
    buffer: Buffer,
36
    /// Size of the currently allocated buffer, in number of rows.
37
    count: u32,
38
    /// Previously allocated buffer if any, cached until the next buffer write
39
    /// so that old data can be copied into the newly-allocated buffer.
40
    old_buffer: Option<Buffer>,
41
    /// Size of the old buffer if any, in number of rows.
42
    old_count: u32,
43
}
44

45
impl AllocatedBuffer {
46
    /// Get the number of rows of the currently allocated GPU buffer.
47
    ///
48
    /// While a resize is pending, this returns the old buffer's row count until the next buffer swap.
49
    pub fn allocated_count(&self) -> u32 {
3✔
50
        if self.old_buffer.is_some() {
3✔
51
            self.old_count
×
52
        } else {
53
            self.count
3✔
54
        }
55
    }
56
}
57

58
/// GPU buffer holding a table with concurrent interleaved CPU/GPU access.
59
///
60
/// The buffer table data structure represents a GPU buffer holding a table made
61
/// of individual rows. Each row of the table has the same layout (same size),
62
/// and can be allocated (assigned to an existing index) or free (available for
63
/// future allocation). The data structure manages a free list of rows, and copies
64
/// rows modified on the CPU to the GPU without touching other rows. This ensures
65
/// that existing rows in the GPU buffer can be accessed and modified by the GPU
66
/// without being overwritten by the CPU and without the need for the CPU to
67
/// read the data back from GPU into CPU memory.
68
///
69
/// The element type `T` needs to implement the following traits:
70
/// - [`Pod`] to allow copy.
71
/// - [`ShaderType`] because it needs to be mapped for a shader.
72
/// - [`ShaderSize`] to ensure a fixed footprint, to allow packing multiple
73
///   instances inside a single buffer. This therefore excludes any
74
///   runtime-sized array.
75
///
76
/// This is similar to a [`BufferVec`] or [`AlignedBufferVec`], but unlike those
77
/// data structures a buffer table preserves rows modified by the GPU without
78
/// overwriting. This is useful when the buffer is also modified by GPU shaders,
79
/// so neither the CPU side nor the GPU side has an up-to-date view of the
80
/// entire table, and so the CPU cannot re-upload the entire table on changes.
81
///
82
/// # Usage
83
///
84
/// - During the [`RenderStage::Prepare`] stage, call
85
///   [`clear_previous_frame_resizes()`] to clear any stale buffer from the
86
///   previous frame. Then insert new rows with [`insert()`] and if you made
87
///   changes call [`allocate_gpu()`] at the end to allocate any new buffer
88
///   needed.
89
/// - During the [`RenderStage::Render`] stage, call [`write_buffer()`] from a
90
///   command encoder before using any row, to perform any buffer resize copy
91
///   pending.
92
///
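/// A minimal sketch of this per-frame flow (illustrative only; it assumes a
/// `RenderDevice`, a `RenderQueue`, and a `CommandEncoder` obtained from the
/// render world, and a hypothetical `GpuRow: Pod + ShaderSize` element type):
///
/// ```ignore
/// let mut table = BufferTable::<GpuRow>::new(
///     BufferUsages::STORAGE,
///     None,
///     Some("my_table".to_string()),
/// );
///
/// // Prepare: drop stale buffers from last frame, then stage new rows on CPU.
/// table.clear_previous_frame_resizes();
/// let row = table.insert(GpuRow::default());
/// // Allocate or grow the GPU buffer; `true` means bind groups must be rebuilt.
/// let reallocated = table.allocate_gpu(&render_device, &render_queue);
///
/// // Render: schedule any pending old-to-new buffer copy before using rows.
/// table.write_buffer(&mut command_encoder);
/// ```
///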
93
/// [`BufferVec`]: bevy::render::render_resource::BufferVec
94
/// [`AlignedBufferVec`]: crate::render::aligned_buffer_vec::AlignedBufferVec
95
#[derive(Debug)]
96
pub struct BufferTable<T: Pod + ShaderSize> {
97
    /// GPU buffer if already allocated, or `None` otherwise.
98
    buffer: Option<AllocatedBuffer>,
99
    /// GPU buffer usages.
100
    buffer_usage: BufferUsages,
101
    /// Optional GPU buffer name, for debugging.
102
    label: Option<String>,
103
    /// Size of a single buffer element, in bytes, in CPU memory (Rust layout).
104
    item_size: usize,
105
    /// Size of a single buffer element, in bytes, aligned to GPU memory
106
    /// constraints.
107
    aligned_size: usize,
108
    /// Capacity of the buffer, in number of rows.
109
    capacity: u32,
110
    /// Size of the "active" portion of the table, which includes allocated rows
111
    /// and any row in the free list. All other rows in the
112
    /// `active_count..capacity` range are implicitly unallocated.
113
    active_count: u32,
114
    /// Free list of rows available in the GPU buffer for a new allocation. This
115
    /// only contains indices in the `0..active_count` range; all row indices in
116
    /// `active_count..capacity` are assumed to be unallocated.
117
    free_indices: Vec<u32>,
118
    /// Pending values accumulated on CPU and not yet written to GPU, and their
119
    /// rows.
120
    pending_values: Vec<(u32, T)>,
121
    /// Extra pending values accumulated on CPU like `pending_values`, but for
122
    /// which there's not enough space in the current GPU buffer. Those values
123
    /// are sorted in index order, occupying the range `buffer.size..`.
124
    extra_pending_values: Vec<T>,
125
}
126

127
impl<T: Pod + ShaderSize> Default for BufferTable<T> {
128
    fn default() -> Self {
25✔
129
        let item_size = std::mem::size_of::<T>();
25✔
130
        let aligned_size = <T as ShaderSize>::SHADER_SIZE.get() as usize;
25✔
131
        assert!(aligned_size >= item_size);
25✔
132
        Self {
133
            buffer: None,
134
            buffer_usage: BufferUsages::all(),
25✔
135
            label: None,
136
            item_size,
137
            aligned_size,
138
            capacity: 0,
139
            active_count: 0,
140
            free_indices: Vec::new(),
25✔
141
            pending_values: Vec::new(),
25✔
142
            extra_pending_values: Vec::new(),
25✔
143
        }
144
    }
145
}
146

147
impl<T: Pod + ShaderSize> BufferTable<T> {
148
    /// Create a new collection.
149
    ///
150
    /// `item_align` is an optional additional alignment for items in the
151
    /// collection. If greater than the natural alignment dictated by WGSL
152
    /// rules, this extra alignment is enforced. Otherwise it's ignored (so you
153
    /// can simply pass `None`). This is useful if, for example, you want to
154
    /// bind individual rows or any subset of the table, to ensure each row is
155
    /// aligned to the device constraints.
156
    ///
157
    /// # Panics
158
    ///
159
    /// Panics if `buffer_usage` contains [`BufferUsages::UNIFORM`] and the
160
    /// layout of the element type `T` does not meet the requirements of the
161
    /// uniform address space, as tested by
162
    /// [`ShaderType::assert_uniform_compat()`].
163
    ///
164
    /// [`BufferUsages::UNIFORM`]: bevy::render::render_resource::BufferUsages::UNIFORM
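    ///
    /// # Example
    ///
    /// A sketch of creating a table whose rows can be bound individually, using
    /// the device's storage buffer offset alignment (assumes a `RenderDevice`
    /// named `render_device` and a hypothetical `GpuRow` element type):
    ///
    /// ```ignore
    /// let item_align = render_device.limits().min_storage_buffer_offset_alignment as u64;
    /// let table = BufferTable::<GpuRow>::new(
    ///     BufferUsages::STORAGE,
    ///     NonZeroU64::new(item_align),
    ///     Some("my_table".to_string()),
    /// );
    /// // Each row is padded up to the requested alignment.
    /// assert_eq!(table.aligned_size() % item_align.max(1) as usize, 0);
    /// ```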
165
    pub fn new(
25✔
166
        buffer_usage: BufferUsages,
167
        item_align: Option<NonZeroU64>,
168
        label: Option<String>,
169
    ) -> Self {
170
        // GPU-aligned item size, compatible with WGSL rules
171
        let item_size = <T as ShaderSize>::SHADER_SIZE.get() as usize;
25✔
172
        // Extra manual alignment for device constraints
173
        let aligned_size = if let Some(item_align) = item_align {
72✔
174
            let item_align = item_align.get() as usize;
×
175
            let aligned_size = next_multiple_of(item_size, item_align);
×
176
            assert!(aligned_size >= item_size);
×
177
            assert!(aligned_size % item_align == 0);
22✔
178
            aligned_size
22✔
179
        } else {
180
            item_size
3✔
181
        };
182
        trace!(
×
183
            "BufferTable[\"{}\"]: item_size={} aligned_size={}",
×
184
            label.as_ref().unwrap_or(&String::new()),
×
185
            item_size,
×
186
            aligned_size
×
187
        );
188
        if buffer_usage.contains(BufferUsages::UNIFORM) {
25✔
189
            <T as ShaderType>::assert_uniform_compat();
×
190
        }
191
        Self {
192
            // Need COPY_SRC and COPY_DST to copy from old to new buffer on resize
193
            buffer_usage: buffer_usage | BufferUsages::COPY_SRC | BufferUsages::COPY_DST,
×
194
            aligned_size,
195
            label,
196
            ..Default::default()
197
        }
198
    }
199

200
    /// Get a safe buffer label for debug display.
201
    ///
202
    /// Falls back to an empty string if no label was specified.
NEW
203
    pub fn safe_label(&self) -> Cow<'_, str> {
×
NEW
204
        self.label
×
205
            .as_ref()
NEW
206
            .map(|s| Cow::Borrowed(&s[..]))
×
NEW
207
            .unwrap_or(Cow::Borrowed(""))
×
208
    }
209

210
    /// Get a safe buffer name for debug display.
211
    ///
212
    /// Same as [`safe_label()`] but includes the buffer ID as well.
213
    ///
214
    /// [`safe_label()`]: self::BufferTable::safe_label
NEW
215
    pub fn safe_name(&self) -> String {
×
NEW
216
        let id = self
×
NEW
217
            .buffer
×
218
            .as_ref()
NEW
219
            .map(|ab| {
×
NEW
220
                let id: NonZeroU32 = ab.buffer.id().into();
×
NEW
221
                id.get()
×
222
            })
223
            .unwrap_or(0);
NEW
224
        format!("#{}:{}", id, self.safe_label())
×
225
    }
226

227
    /// Reference to the GPU buffer, if already allocated.
228
    ///
229
    /// This reference corresponds to the currently allocated GPU buffer, which
230
    /// may not contain all data since the last [`insert()`] call, and could
231
    /// become invalid if a new larger buffer needs to be allocated to store the
232
    /// pending values inserted with [`insert()`].
233
    ///
234
    /// [`insert()`]: BufferTable::insert
235
    #[inline]
236
    pub fn buffer(&self) -> Option<&Buffer> {
6✔
237
        self.buffer.as_ref().map(|ab| &ab.buffer)
18✔
238
    }
239

240
    /// Maximum number of rows the table can hold without reallocation.
241
    ///
242
    /// This is the maximum number of rows that can be added to the table
243
    /// without forcing a new GPU buffer to be allocated and a copy from the old
244
    /// to the new buffer.
245
    ///
246
    /// Note that this doesn't imply that no GPU buffer allocation will ever
247
    /// occur; if a GPU buffer was never allocated, and there are pending
248
    /// CPU rows to insert, then a new buffer will be allocated on next
249
    /// update with this capacity.
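    ///
    /// A small sketch of the distinction with `len()` (hypothetical `GpuRow`
    /// element type):
    ///
    /// ```ignore
    /// let before = table.capacity();
    /// table.insert(GpuRow::default());
    /// // capacity() grows as needed on the CPU side; the GPU buffer itself is
    /// // only (re-)allocated by the next allocate_gpu() call.
    /// assert!(table.capacity() >= before);
    /// assert!(table.capacity() >= table.len());
    /// ```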
250
    #[inline]
251
    #[allow(dead_code)]
252
    pub fn capacity(&self) -> u32 {
27✔
253
        self.capacity
27✔
254
    }
255

256
    /// Current number of rows in use in the table.
257
    #[inline]
258
    #[allow(dead_code)]
259
    pub fn len(&self) -> u32 {
31✔
260
        self.active_count - self.free_indices.len() as u32
31✔
261
    }
262

263
    /// Size of a single row in the table, in bytes, aligned to GPU constraints.
264
    #[inline]
265
    #[allow(dead_code)]
266
    pub fn aligned_size(&self) -> usize {
22✔
267
        self.aligned_size
22✔
268
    }
269

270
    /// Is the table empty?
271
    #[inline]
272
    #[allow(dead_code)]
273
    pub fn is_empty(&self) -> bool {
52✔
274
        self.active_count == 0
52✔
275
    }
276

277
    /// Clear all rows of the table without deallocating any existing GPU
278
    /// buffer.
279
    ///
280
    /// This operation only updates the CPU cache of the table, without touching
281
    /// any GPU buffer. On next GPU buffer update, the GPU buffer will be
282
    /// deallocated.
283
    #[allow(dead_code)]
284
    pub fn clear(&mut self) {
×
285
        self.pending_values.clear();
×
286
        self.extra_pending_values.clear();
×
287
        self.free_indices.clear();
×
288
        self.active_count = 0;
×
289
    }
290

291
    /// Clear any stale buffer used for resize in the previous frame during
292
    /// rendering while the data structure was immutable.
293
    ///
294
    /// This must be called before any new [`insert()`].
295
    ///
296
    /// [`insert()`]: crate::BufferTable::insert
297
    pub fn clear_previous_frame_resizes(&mut self) {
37✔
298
        if let Some(ab) = self.buffer.as_mut() {
42✔
299
            ab.old_buffer = None;
×
300
            ab.old_count = 0;
×
301
        }
302
    }
303

304
    fn to_byte_size(&self, count: u32) -> usize {
7✔
305
        count as usize * self.aligned_size
7✔
306
    }
307

308
    /// Insert a new row into the table.
309
    ///
310
    /// For performance reasons, this buffers the row content on the CPU until
311
    /// the next GPU update, to minimize the number of CPU to GPU transfers.
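    ///
    /// A short sketch (hypothetical `GpuRow` element type; assumes the table
    /// starts empty):
    ///
    /// ```ignore
    /// let id = table.insert(GpuRow::default());
    /// // The value is only staged on the CPU here; it reaches the GPU on the
    /// // next allocate_gpu() + write_buffer() pair.
    /// assert_eq!(table.len(), 1);
    /// ```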
312
    pub fn insert(&mut self, value: T) -> BufferTableId {
27✔
313
        trace!(
27✔
NEW
314
            "Inserting into table buffer '{}' with {} free indices, capacity: {}, active_size: {}",
×
NEW
315
            self.safe_name(),
×
316
            self.free_indices.len(),
×
317
            self.capacity,
×
318
            self.active_count
×
319
        );
320
        let index = if self.free_indices.is_empty() {
54✔
321
            let index = self.active_count;
26✔
322
            if index == self.capacity {
52✔
323
                self.capacity += 1;
26✔
324
            }
325
            debug_assert!(index < self.capacity);
52✔
326
            self.active_count += 1;
26✔
327
            index
26✔
328
        } else {
329
            // Note: this is inefficient O(n) but we need to apply the same logic as the
330
            // EffectCache because we rely on indices being in sync.
331
            self.free_indices.remove(0)
1✔
332
        };
333
        let allocated_count = self
×
334
            .buffer
×
335
            .as_ref()
336
            .map(|ab| ab.allocated_count())
3✔
337
            .unwrap_or(0);
338
        trace!(
×
339
            "Found free index {}, capacity: {}, active_count: {}, allocated_count: {}",
×
340
            index,
×
341
            self.capacity,
×
342
            self.active_count,
×
343
            allocated_count
×
344
        );
345
        if index < allocated_count {
29✔
346
            self.pending_values.alloc().init((index, value));
2✔
347
        } else {
348
            let extra_index = index - allocated_count;
25✔
349
            if extra_index < self.extra_pending_values.len() as u32 {
25✔
350
                self.extra_pending_values[extra_index as usize] = value;
×
351
            } else {
352
                self.extra_pending_values.alloc().init(value);
25✔
353
            }
354
        }
355
        BufferTableId(index)
27✔
356
    }
357

358
    /// Remove a row from the table.
359
    #[allow(dead_code)]
360
    pub fn remove(&mut self, id: BufferTableId) {
2✔
361
        let index = id.0;
2✔
362
        assert!(index < self.active_count);
2✔
363

364
        // If this is the last item in the active zone, just shrink the active zone
365
        // (implicit free list).
366
        if index == self.active_count - 1 {
3✔
367
            self.active_count -= 1;
1✔
368
            self.capacity -= 1;
1✔
369
        } else {
370
            // This is very inefficient but we need to apply the same logic as the
371
            // EffectCache because we rely on indices being in sync.
372
            let pos = self
1✔
373
                .free_indices
1✔
374
                .binary_search(&index) // will fail
1✔
375
                .unwrap_or_else(|e| e); // will get position of insertion
2✔
376
            self.free_indices.insert(pos, index);
×
377
        }
378
    }
379

380
    /// Allocate any GPU buffer if needed, based on the most recent capacity
381
    /// requested.
382
    ///
383
    /// This should be called only once per frame after all allocation requests
384
    /// have been made via [`insert()`] but before the GPU buffer is actually
385
    /// updated. This is an optimization to enable allocating the GPU buffer
386
    /// earlier than it's actually needed. Calling this multiple times will work
387
    /// but will be inefficient and allocate GPU buffers for nothing. Not
388
    /// calling it is safe, as the next update will call it just-in-time anyway.
389
    ///
390
    /// # Returns
391
    ///
392
    /// Returns `true` if a new buffer was (re-)allocated, to indicate any bind
393
    /// group needs to be re-created.
394
    ///
395
    /// [`insert()`]: crate::render::BufferTable::insert
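    ///
    /// A sketch of the intended call site (the bind group handling is
    /// illustrative, not part of this type):
    ///
    /// ```ignore
    /// if table.allocate_gpu(&render_device, &render_queue) {
    ///     // The underlying wgpu buffer changed, so any bind group built from
    ///     // table.buffer() must be re-created.
    ///     bind_group = None;
    /// }
    /// ```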
396
    pub fn allocate_gpu(&mut self, device: &RenderDevice, queue: &RenderQueue) -> bool {
38✔
397
        // The allocated capacity is the capacity of the currently allocated GPU buffer,
398
        // which can be different from the expected capacity (self.capacity) for next
399
        // update.
400
        let allocated_count = self.buffer.as_ref().map(|ab| ab.count).unwrap_or(0);
82✔
401
        let reallocated = if self.capacity > allocated_count {
76✔
402
            let size = self.to_byte_size(self.capacity);
2✔
403
            trace!(
2✔
NEW
404
                "reserve('{}'): increase capacity from {} to {} elements, old size {} bytes, new size {} bytes",
×
NEW
405
                self.safe_name(),
×
406
                allocated_count,
×
407
                self.capacity,
×
408
                self.to_byte_size(allocated_count),
×
409
                size
×
410
            );
411

412
            // Create the new buffer, swapping with the old one if any
413
            let has_init_data = !self.extra_pending_values.is_empty();
2✔
414
            let new_buffer = device.create_buffer(&BufferDescriptor {
2✔
415
                label: self.label.as_ref().map(|s| &s[..]),
2✔
416
                size: size as BufferAddress,
×
417
                usage: self.buffer_usage,
×
418
                mapped_at_creation: has_init_data,
×
419
            });
420

421
            // Use any pending data to initialize the buffer. We only use CPU-available
422
            // data, which was inserted after the buffer was (re-)allocated and
423
            // has not been uploaded to GPU yet.
424
            if has_init_data {
×
425
                // Leave some space to copy the old buffer if any
426
                let base_size = self.to_byte_size(allocated_count) as u64;
2✔
427
                let extra_size = self.to_byte_size(self.extra_pending_values.len() as u32) as u64;
2✔
428

429
                // Scope get_mapped_range_mut() to force a drop before unmap()
430
                {
431
                    let dst_slice = &mut new_buffer
2✔
432
                        .slice(base_size..base_size + extra_size)
2✔
433
                        .get_mapped_range_mut();
2✔
434

435
                    for (index, content) in self.extra_pending_values.drain(..).enumerate() {
6✔
436
                        let byte_size = self.aligned_size; // single row
4✔
437
                        let byte_offset = byte_size * index;
4✔
438

439
                        // Copy Rust value into a GPU-ready format, including GPU padding.
440
                        let src: &[u8] = cast_slice(std::slice::from_ref(&content));
4✔
441
                        let dst_range = byte_offset..byte_offset + self.item_size;
4✔
442
                        trace!(
4✔
443
                            "+ copy: index={} src={:?} dst={:?} byte_offset={} byte_size={}",
×
444
                            index,
×
445
                            src.as_ptr(),
×
446
                            dst_range,
×
447
                            byte_offset,
×
448
                            byte_size
×
449
                        );
450
                        let dst = &mut dst_slice[dst_range];
4✔
451
                        dst.copy_from_slice(src);
4✔
452
                    }
453
                }
454

455
                new_buffer.unmap();
2✔
456
            }
457

458
            if let Some(ab) = self.buffer.as_mut() {
3✔
459
                // If there's any data currently in the GPU buffer, we need to copy it on next
460
                // update to preserve it, but only if there's no pending copy already.
461
                if self.active_count > 0 && ab.old_buffer.is_none() {
2✔
462
                    ab.old_buffer = Some(ab.buffer.clone()); // TODO: swap
1✔
463
                    ab.old_count = ab.count;
1✔
464
                }
465
                ab.buffer = new_buffer;
1✔
466
                ab.count = self.capacity;
1✔
467
            } else {
468
                self.buffer = Some(AllocatedBuffer {
1✔
469
                    buffer: new_buffer,
1✔
470
                    count: self.capacity,
1✔
471
                    old_buffer: None,
1✔
472
                    old_count: 0,
1✔
473
                });
474
            }
475

476
            true
2✔
477
        } else {
478
            false
36✔
479
        };
480

481
        // Immediately schedule a copy of old rows.
482
        // - For old rows, copy into the old buffer because the old-to-new buffer copy
483
        //   will be executed during a command queue while any CPU to GPU upload is
484
        //   prepended before the next command queue. To ensure things do get out of
485
        //   order with the CPU upload overwriting the GPU-to-GPU copy, make sure those
486
        //   two are disjoint.
487
        if let Some(ab) = self.buffer.as_ref() {
7✔
488
            let buffer = ab.old_buffer.as_ref().unwrap_or(&ab.buffer);
×
489
            for (index, content) in self.pending_values.drain(..) {
2✔
490
                let byte_size = self.aligned_size;
2✔
491
                let byte_offset = byte_size * index as usize;
2✔
492

493
                // Copy Rust value into a GPU-ready format, including GPU padding.
494
                // TODO - Do that in insert()!
495
                let mut aligned_buffer: Vec<u8> = vec![0; self.aligned_size];
2✔
496
                let src: &[u8] = cast_slice(std::slice::from_ref(&content));
2✔
497
                let dst_range = ..self.item_size;
2✔
498
                trace!(
2✔
499
                    "+ copy: index={} src={:?} dst={:?} byte_offset={} byte_size={}",
×
500
                    index,
×
501
                    src.as_ptr(),
×
502
                    dst_range,
×
503
                    byte_offset,
×
504
                    byte_size
×
505
                );
506
                let dst = &mut aligned_buffer[dst_range];
2✔
507
                dst.copy_from_slice(src);
2✔
508

509
                // Upload to GPU
510
                // TODO - Merge contiguous blocks into a single write_buffer()
511
                let bytes: &[u8] = cast_slice(&aligned_buffer);
2✔
512
                queue.write_buffer(buffer, byte_offset as u64, bytes);
2✔
513
            }
514
        } else {
515
            debug_assert!(self.pending_values.is_empty());
62✔
516
            debug_assert!(self.extra_pending_values.is_empty());
62✔
517
        }
518

519
        reallocated
38✔
520
    }
521

522
    /// Schedule any pending buffer resize copy (old buffer into the newly-allocated one) on the given command encoder.
523
    pub fn write_buffer(&self, encoder: &mut CommandEncoder) {
37✔
524
        // Check if there's any work to do: either some pending values to upload or some
525
        // existing buffer to copy into a newly-allocated one.
526
        if self.pending_values.is_empty()
37✔
527
            && self
37✔
528
                .buffer
37✔
529
                .as_ref()
37✔
530
                .map(|ab| ab.old_buffer.is_none())
80✔
531
                .unwrap_or(true)
37✔
532
        {
533
            trace!("write_buffer({}): nothing to do", self.safe_name());
36✔
534
            return;
36✔
535
        }
536

537
        trace!(
1✔
NEW
538
            "write_buffer({}): pending_values.len={} item_size={} aligned_size={} buffer={:?}",
×
NEW
539
            self.safe_name(),
×
540
            self.pending_values.len(),
×
541
            self.item_size,
×
542
            self.aligned_size,
×
543
            self.buffer,
×
544
        );
545

546
        // If there's no more GPU buffer, there's nothing to do
547
        let Some(ab) = self.buffer.as_ref() else {
2✔
548
            return;
×
549
        };
550

551
        // Copy any old buffer into the new one, and clear the old buffer. Note that we
552
        // only clear the ref-counted reference to the buffer, not the actual buffer,
553
        // which stays alive until the copy is done (but we don't need to care about
554
        // keeping it alive, wgpu does that for us).
555
        if let Some(old_buffer) = ab.old_buffer.as_ref() {
1✔
556
            let old_size = self.to_byte_size(ab.old_count) as u64;
×
557
            trace!("Copy old buffer id {:?} of size {} bytes into newly-allocated buffer {:?} of size {} bytes.", old_buffer.id(), old_size, ab.buffer.id(), self.to_byte_size(ab.count));
×
558
            encoder.copy_buffer_to_buffer(old_buffer, 0, &ab.buffer, 0, old_size);
1✔
559
        }
560
    }
561
}
562

563
#[cfg(test)]
564
mod tests {
565
    use bevy::math::Vec3;
566
    use bytemuck::{Pod, Zeroable};
567

568
    use super::*;
569

570
    #[repr(C)]
571
    #[derive(Debug, Default, Clone, Copy, Pod, Zeroable, ShaderType)]
572
    pub(crate) struct GpuDummy {
573
        pub v: Vec3,
574
    }
575

576
    #[repr(C)]
577
    #[derive(Debug, Default, Clone, Copy, Pod, Zeroable, ShaderType)]
578
    pub(crate) struct GpuDummyComposed {
579
        pub simple: GpuDummy,
580
        pub tag: u32,
581
        // GPU padding to 16 bytes due to GpuDummy forcing align to 16 bytes
582
    }
583

584
    #[repr(C)]
585
    #[derive(Debug, Clone, Copy, Pod, Zeroable, ShaderType)]
586
    pub(crate) struct GpuDummyLarge {
587
        pub simple: GpuDummy,
588
        pub tag: u32,
589
        pub large: [f32; 128],
590
    }
591

592
    #[test]
593
    fn table_sizes() {
594
        // Rust
595
        assert_eq!(std::mem::size_of::<GpuDummy>(), 12);
596
        assert_eq!(std::mem::align_of::<GpuDummy>(), 4);
597
        assert_eq!(std::mem::size_of::<GpuDummyComposed>(), 16); // tight packing
598
        assert_eq!(std::mem::align_of::<GpuDummyComposed>(), 4);
599
        assert_eq!(std::mem::size_of::<GpuDummyLarge>(), 132 * 4); // tight packing
600
        assert_eq!(std::mem::align_of::<GpuDummyLarge>(), 4);
601

602
        // GPU
603
        assert_eq!(<GpuDummy as ShaderType>::min_size().get(), 16); // Vec3 gets padded to 16 bytes
604
        assert_eq!(<GpuDummy as ShaderSize>::SHADER_SIZE.get(), 16);
605
        assert_eq!(<GpuDummyComposed as ShaderType>::min_size().get(), 32); // align is 16 bytes, forces padding
606
        assert_eq!(<GpuDummyComposed as ShaderSize>::SHADER_SIZE.get(), 32);
607
        assert_eq!(<GpuDummyLarge as ShaderType>::min_size().get(), 544); // align is 16 bytes, forces padding
608
        assert_eq!(<GpuDummyLarge as ShaderSize>::SHADER_SIZE.get(), 544);
609

610
        for (item_align, expected_aligned_size) in [
611
            (0, 16),
612
            (4, 16),
613
            (8, 16),
614
            (16, 16),
615
            (32, 32),
616
            (256, 256),
617
            (512, 512),
618
        ] {
619
            let mut table = BufferTable::<GpuDummy>::new(
620
                BufferUsages::STORAGE,
621
                NonZeroU64::new(item_align),
622
                None,
623
            );
624
            assert_eq!(table.aligned_size(), expected_aligned_size);
625
            assert!(table.is_empty());
626
            table.insert(GpuDummy::default());
627
            assert!(!table.is_empty());
628
            assert_eq!(table.len(), 1);
629
        }
630

631
        for (item_align, expected_aligned_size) in [
632
            (0, 32),
633
            (4, 32),
634
            (8, 32),
635
            (16, 32),
636
            (32, 32),
637
            (256, 256),
638
            (512, 512),
639
        ] {
640
            let mut table = BufferTable::<GpuDummyComposed>::new(
641
                BufferUsages::STORAGE,
642
                NonZeroU64::new(item_align),
643
                None,
644
            );
645
            assert_eq!(table.aligned_size(), expected_aligned_size);
646
            assert!(table.is_empty());
647
            table.insert(GpuDummyComposed::default());
648
            assert!(!table.is_empty());
649
            assert_eq!(table.len(), 1);
650
        }
651

652
        for (item_align, expected_aligned_size) in [
653
            (0, 544),
654
            (4, 544),
655
            (8, 544),
656
            (16, 544),
657
            (32, 544),
658
            (256, 768),
659
            (512, 1024),
660
        ] {
661
            let mut table = BufferTable::<GpuDummyLarge>::new(
662
                BufferUsages::STORAGE,
663
                NonZeroU64::new(item_align),
664
                None,
665
            );
666
            assert_eq!(table.aligned_size(), expected_aligned_size);
667
            assert!(table.is_empty());
668
            table.insert(GpuDummyLarge {
669
                simple: Default::default(),
670
                tag: 0,
671
                large: [0.; 128],
672
            });
673
            assert!(!table.is_empty());
674
            assert_eq!(table.len(), 1);
675
        }
676
    }
677
}
678

679
#[cfg(all(test, feature = "gpu_tests"))]
680
mod gpu_tests {
681
    use std::fmt::Write;
682

683
    use bevy::render::render_resource::BufferSlice;
684
    use tests::*;
685
    use wgpu::{BufferView, CommandBuffer};
686

687
    use super::*;
688
    use crate::test_utils::MockRenderer;
689

690
    /// Read data from GPU back into CPU memory.
691
    ///
692
    /// This call blocks until the data is available on CPU. Used for testing
693
    /// only.
694
    fn read_back_gpu<'a>(device: &RenderDevice, slice: BufferSlice<'a>) -> BufferView<'a> {
6✔
695
        let (tx, rx) = futures::channel::oneshot::channel();
6✔
696
        slice.map_async(wgpu::MapMode::Read, move |result| {
12✔
697
            tx.send(result).unwrap();
6✔
698
        });
699
        device.poll(wgpu::Maintain::Wait);
6✔
700
        let result = futures::executor::block_on(rx);
6✔
701
        assert!(result.is_ok());
6✔
702
        slice.get_mapped_range()
6✔
703
    }
704

705
    /// Submit a command buffer to GPU and wait for completion.
706
    ///
707
    /// This call blocks until the GPU executed the command buffer. Used for
708
    /// testing only.
709
    fn submit_gpu_and_wait(
7✔
710
        device: &RenderDevice,
711
        queue: &RenderQueue,
712
        command_buffer: CommandBuffer,
713
    ) {
714
        // Queue command
715
        queue.submit([command_buffer]);
7✔
716

717
        // Register callback to observe completion
718
        let (tx, rx) = futures::channel::oneshot::channel();
7✔
719
        queue.on_submitted_work_done(move || {
14✔
720
            tx.send(()).unwrap();
7✔
721
        });
722

723
        // Poll device, checking for completion and raising callback
724
        device.poll(wgpu::Maintain::Wait);
7✔
725

726
        // Wait for callback to be raised. This was needed in previous versions, though
727
        // it's a bit unclear if it's still needed or if device.poll() is enough to
728
        // guarantee that the command was executed.
729
        let _ = futures::executor::block_on(rx);
7✔
730
    }
731

732
    /// Convert a byte slice to a string of hexadecimal values separated by a
733
    /// blank space.
734
    fn to_hex_string(slice: &[u8]) -> String {
19✔
735
        let len = slice.len();
19✔
736
        let num_chars = len * 3 - 1;
19✔
737
        let mut s = String::with_capacity(num_chars);
19✔
738
        for b in &slice[..len - 1] {
589✔
739
            write!(&mut s, "{:02x} ", *b).unwrap();
285✔
740
        }
741
        write!(&mut s, "{:02x}", slice[len - 1]).unwrap();
19✔
742
        debug_assert_eq!(s.len(), num_chars);
38✔
743
        s
19✔
744
    }
745

746
    fn write_buffers_and_wait<T: Pod + ShaderSize>(
7✔
747
        table: &BufferTable<T>,
748
        device: &RenderDevice,
749
        queue: &RenderQueue,
750
    ) {
751
        let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
7✔
752
            label: Some("test"),
7✔
753
        });
754
        table.write_buffer(&mut encoder);
7✔
755
        let command_buffer = encoder.finish();
7✔
756
        submit_gpu_and_wait(device, queue, command_buffer);
7✔
757
        println!("Buffer written to GPU");
7✔
758
    }
759

760
    #[test]
761
    fn table_write() {
762
        let renderer = MockRenderer::new();
763
        let device = renderer.device();
764
        let queue = renderer.queue();
765

766
        let item_align = device.limits().min_storage_buffer_offset_alignment as u64;
767
        println!("min_storage_buffer_offset_alignment = {item_align}");
768
        let mut table = BufferTable::<GpuDummyComposed>::new(
769
            BufferUsages::STORAGE | BufferUsages::MAP_READ,
770
            NonZeroU64::new(item_align),
771
            None,
772
        );
773
        let final_align = item_align.max(<GpuDummyComposed as ShaderSize>::SHADER_SIZE.get());
774
        assert_eq!(table.aligned_size(), final_align as usize);
775

776
        // Initial state
777
        assert!(table.is_empty());
778
        assert_eq!(table.len(), 0);
779
        assert_eq!(table.capacity(), 0);
780
        assert!(table.buffer.is_none());
781

782
        // This has no effect while the table is empty
783
        table.clear_previous_frame_resizes();
784
        table.allocate_gpu(&device, &queue);
785
        write_buffers_and_wait(&table, &device, &queue);
786
        assert!(table.is_empty());
787
        assert_eq!(table.len(), 0);
788
        assert_eq!(table.capacity(), 0);
789
        assert!(table.buffer.is_none());
790

791
        // New frame
792
        table.clear_previous_frame_resizes();
793

794
        // Insert some entries
795
        let len = 3;
796
        for i in 0..len {
797
            let row = table.insert(GpuDummyComposed {
798
                tag: i + 1,
799
                ..Default::default()
800
            });
801
            assert_eq!(row.0, i);
802
        }
803
        assert!(!table.is_empty());
804
        assert_eq!(table.len(), len);
805
        assert!(table.capacity() >= len); // contract: could over-allocate...
806
        assert!(table.buffer.is_none()); // not yet allocated on GPU
807

808
        // Allocate GPU buffer for current requested state
809
        table.allocate_gpu(&device, &queue);
810
        assert!(!table.is_empty());
811
        assert_eq!(table.len(), len);
812
        assert!(table.capacity() >= len);
813
        let ab = table
814
            .buffer
815
            .as_ref()
816
            .expect("GPU buffer should be allocated after allocate_gpu()");
817
        assert!(ab.old_buffer.is_none()); // no previous copy
818
        assert_eq!(ab.count, len);
819
        println!(
820
            "Allocated buffer #{:?} of {} rows",
821
            ab.buffer.id(),
822
            ab.count
823
        );
824
        let ab_buffer = ab.buffer.clone();
825

826
        // Another allocate_gpu() is a no-op
827
        table.allocate_gpu(&device, &queue);
828
        assert!(!table.is_empty());
829
        assert_eq!(table.len(), len);
830
        assert!(table.capacity() >= len);
831
        let ab = table
832
            .buffer
833
            .as_ref()
834
            .expect("GPU buffer should be allocated after allocate_gpu()");
835
        assert!(ab.old_buffer.is_none()); // no previous copy
836
        assert_eq!(ab.count, len);
837
        assert_eq!(ab_buffer.id(), ab.buffer.id()); // same buffer
838

839
        // Write buffer (CPU -> GPU)
840
        write_buffers_and_wait(&table, &device, &queue);
841

842
        {
843
            // Read back (GPU -> CPU)
844
            let buffer = table.buffer().expect("Buffer was not allocated").clone(); // clone() for lifetime
845
            {
846
                let slice = buffer.slice(..);
847
                let view = read_back_gpu(&device, slice);
848
                println!(
849
                    "GPU data read back to CPU for validation: {} bytes",
850
                    view.len()
851
                );
852

853
                // Validate content
854
                assert_eq!(view.len(), final_align as usize * table.capacity() as usize);
855
                for i in 0..len as usize {
856
                    let offset = i * final_align as usize;
857
                    let item_size = std::mem::size_of::<GpuDummyComposed>();
858
                    let src = &view[offset..offset + 16];
859
                    println!("{}", to_hex_string(src));
860
                    let dummy_composed: &[GpuDummyComposed] =
861
                        cast_slice(&view[offset..offset + item_size]);
862
                    assert_eq!(dummy_composed[0].tag, (i + 1) as u32);
863
                }
864
            }
865
            buffer.unmap();
866
        }
867

868
        // New frame
869
        table.clear_previous_frame_resizes();
870

871
        // Insert more entries
872
        let old_capacity = table.capacity();
873
        let mut len = len;
874
        while table.capacity() == old_capacity {
875
            let row = table.insert(GpuDummyComposed {
876
                tag: len + 1,
877
                ..Default::default()
878
            });
879
            assert_eq!(row.0, len);
880
            len += 1;
881
        }
882
        println!(
883
            "Added {} rows to grow capacity from {} to {}.",
884
            len - 3,
885
            old_capacity,
886
            table.capacity()
887
        );
888

889
        // This re-allocates a new GPU buffer because the capacity changed
890
        table.allocate_gpu(&device, &queue);
891
        assert!(!table.is_empty());
892
        assert_eq!(table.len(), len);
893
        assert!(table.capacity() >= len);
894
        let ab = table
895
            .buffer
896
            .as_ref()
897
            .expect("GPU buffer should be allocated after allocate_gpu()");
898
        assert_eq!(ab.count, len);
899
        assert!(ab.old_buffer.is_some()); // old buffer to copy
900
        assert_ne!(ab.old_buffer.as_ref().unwrap().id(), ab.buffer.id());
901
        println!(
902
            "Allocated new buffer #{:?} of {} rows",
903
            ab.buffer.id(),
904
            ab.count
905
        );
906

907
        // Write buffer (CPU -> GPU)
908
        write_buffers_and_wait(&table, &device, &queue);
909

910
        {
911
            // Read back (GPU -> CPU)
912
            let buffer = table.buffer().expect("Buffer was not allocated").clone(); // clone() for lifetime
913
            {
914
                let slice = buffer.slice(..);
915
                let view = read_back_gpu(&device, slice);
916
                println!(
917
                    "GPU data read back to CPU for validation: {} bytes",
918
                    view.len()
919
                );
920

921
                // Validate content
922
                assert_eq!(view.len(), final_align as usize * table.capacity() as usize);
923
                for i in 0..len as usize {
924
                    let offset = i * final_align as usize;
925
                    let item_size = std::mem::size_of::<GpuDummyComposed>();
926
                    let src = &view[offset..offset + 16];
927
                    println!("{}", to_hex_string(src));
928
                    let dummy_composed: &[GpuDummyComposed] =
929
                        cast_slice(&view[offset..offset + item_size]);
930
                    assert_eq!(dummy_composed[0].tag, (i + 1) as u32);
931
                }
932
            }
933
            buffer.unmap();
934
        }
935

936
        // New frame
937
        table.clear_previous_frame_resizes();
938

939
        // Delete the last allocated row
940
        let old_capacity = table.capacity();
941
        let len = len - 1;
942
        table.remove(BufferTableId(len));
943
        println!(
944
            "Removed last row to shrink capacity from {} to {}.",
945
            old_capacity,
946
            table.capacity()
947
        );
948

949
        // This doesn't do anything since we only removed a row
950
        table.allocate_gpu(&device, &queue);
951
        assert!(!table.is_empty());
952
        assert_eq!(table.len(), len);
953
        assert!(table.capacity() >= len);
954
        let ab = table
955
            .buffer
956
            .as_ref()
957
            .expect("GPU buffer should be allocated after allocate_gpu()");
958
        assert_eq!(ab.count, len + 1); // GPU buffer kept its size
959
        assert!(ab.old_buffer.is_none());
960

961
        // Write buffer (CPU -> GPU)
962
        write_buffers_and_wait(&table, &device, &queue);
963

964
        {
965
            // Read back (GPU -> CPU)
966
            let buffer = table.buffer().expect("Buffer was not allocated").clone(); // clone() for lifetime
967
            {
968
                let slice = buffer.slice(..);
969
                let view = read_back_gpu(&device, slice);
970
                println!(
971
                    "GPU data read back to CPU for validation: {} bytes",
972
                    view.len()
973
                );
974

975
                // Validate content
976
                assert!(view.len() >= final_align as usize * table.capacity() as usize); // note the >=, the buffer is over-allocated
977
                for i in 0..len as usize {
978
                    let offset = i * final_align as usize;
979
                    let item_size = std::mem::size_of::<GpuDummyComposed>();
980
                    let src = &view[offset..offset + 16];
981
                    println!("{}", to_hex_string(src));
982
                    let dummy_composed: &[GpuDummyComposed] =
983
                        cast_slice(&view[offset..offset + item_size]);
984
                    assert_eq!(dummy_composed[0].tag, (i + 1) as u32);
985
                }
986
            }
987
            buffer.unmap();
988
        }
989

990
        // New frame
991
        table.clear_previous_frame_resizes();
992

993
        // Delete the first allocated row
994
        let old_capacity = table.capacity();
995
        let mut len = len - 1;
996
        table.remove(BufferTableId(0));
997
        assert_eq!(old_capacity, table.capacity());
998
        println!(
999
            "Removed first row to shrink capacity from {} to {} (no change).",
1000
            old_capacity,
1001
            table.capacity()
1002
        );
1003

1004
        // This doesn't do anything since we only removed a row
1005
        table.allocate_gpu(&device, &queue);
1006
        assert!(!table.is_empty());
1007
        assert_eq!(table.len(), len);
1008
        assert!(table.capacity() >= len);
1009
        let ab = table
1010
            .buffer
1011
            .as_ref()
1012
            .expect("GPU buffer should be allocated after allocate_gpu()");
1013
        assert_eq!(ab.count, len + 2); // GPU buffer kept its size
1014
        assert!(ab.old_buffer.is_none());
1015

1016
        // Write buffer (CPU -> GPU)
1017
        write_buffers_and_wait(&table, &device, &queue);
1018

1019
        {
1020
            // Read back (GPU -> CPU)
1021
            let buffer = table.buffer().expect("Buffer was not allocated").clone(); // clone() for lifetime
1022
            {
1023
                let slice = buffer.slice(..);
1024
                let view = read_back_gpu(&device, slice);
1025
                println!(
1026
                    "GPU data read back to CPU for validation: {} bytes",
1027
                    view.len()
1028
                );
1029

1030
                // Validate content
1031
                assert!(view.len() >= final_align as usize * table.capacity() as usize); // note the >=, the buffer is over-allocated
1032
                for i in 0..len as usize {
1033
                    let offset = i * final_align as usize;
1034
                    let item_size = std::mem::size_of::<GpuDummyComposed>();
1035
                    let src = &view[offset..offset + 16];
1036
                    println!("{}", to_hex_string(src));
1037
                    if i > 0 {
1038
                        let dummy_composed: &[GpuDummyComposed] =
1039
                            cast_slice(&view[offset..offset + item_size]);
1040
                        assert_eq!(dummy_composed[0].tag, (i + 1) as u32);
1041
                    }
1042
                }
1043
            }
1044
            buffer.unmap();
1045
        }
1046

1047
        // New frame
1048
        table.clear_previous_frame_resizes();
1049

1050
        // Insert a row; this should get into row #0 in the buffer
1051
        let row = table.insert(GpuDummyComposed {
1052
            tag: 1,
1053
            ..Default::default()
1054
        });
1055
        assert_eq!(row.0, 0);
1056
        len += 1;
1057
        println!(
1058
            "Added 1 row to grow capacity from {} to {}.",
1059
            old_capacity,
1060
            table.capacity()
1061
        );
1062

1063
        // This doesn't reallocate the GPU buffer since we used a free list entry
1064
        table.allocate_gpu(&device, &queue);
1065
        assert!(!table.is_empty());
1066
        assert_eq!(table.len(), len);
1067
        assert!(table.capacity() >= len);
1068
        let ab = table
1069
            .buffer
1070
            .as_ref()
1071
            .expect("GPU buffer should be allocated after allocate_gpu()");
1072
        assert_eq!(ab.count, 4); // 4 == last time we grew
1073
        assert!(ab.old_buffer.is_none());
1074

1075
        // Write buffer (CPU -> GPU)
1076
        write_buffers_and_wait(&table, &device, &queue);
1077

1078
        {
1079
            // Read back (GPU -> CPU)
1080
            let buffer = table.buffer().expect("Buffer was not allocated").clone(); // clone() for lifetime
1081
            {
1082
                let slice = buffer.slice(..);
1083
                let view = read_back_gpu(&device, slice);
1084
                println!(
1085
                    "GPU data read back to CPU for validation: {} bytes",
1086
                    view.len()
1087
                );
1088

1089
                // Validate content
1090
                assert!(view.len() >= final_align as usize * table.capacity() as usize);
1091
                for i in 0..len as usize {
1092
                    let offset = i * final_align as usize;
1093
                    let item_size = std::mem::size_of::<GpuDummyComposed>();
1094
                    let src = &view[offset..offset + 16];
1095
                    println!("{}", to_hex_string(src));
1096
                    let dummy_composed: &[GpuDummyComposed] =
1097
                        cast_slice(&view[offset..offset + item_size]);
1098
                    assert_eq!(dummy_composed[0].tag, (i + 1) as u32);
1099
                }
1100
            }
1101
            buffer.unmap();
1102
        }
1103

1104
        // New frame
1105
        table.clear_previous_frame_resizes();
1106

1107
        // Insert a row; this should get into row #3 at the end of the allocated buffer
1108
        let row = table.insert(GpuDummyComposed {
1109
            tag: 4,
1110
            ..Default::default()
1111
        });
1112
        assert_eq!(row.0, 3);
1113
        len += 1;
1114
        println!(
1115
            "Added 1 row to grow capacity from {} to {}.",
1116
            old_capacity,
1117
            table.capacity()
1118
        );
1119

1120
        // This doesn't reallocate the GPU buffer since we used an implicit free entry
1121
        table.allocate_gpu(&device, &queue);
1122
        assert!(!table.is_empty());
1123
        assert_eq!(table.len(), len);
1124
        assert!(table.capacity() >= len);
1125
        let ab = table
1126
            .buffer
1127
            .as_ref()
1128
            .expect("GPU buffer should be allocated after allocate_gpu()");
1129
        assert_eq!(ab.count, 4); // 4 == last time we grew
1130
        assert!(ab.old_buffer.is_none());
1131

1132
        // Write buffer (CPU -> GPU)
1133
        write_buffers_and_wait(&table, &device, &queue);
1134

1135
        {
1136
            // Read back (GPU -> CPU)
1137
            let buffer = table.buffer().expect("Buffer was not allocated").clone(); // clone() for lifetime
1138
            {
1139
                let slice = buffer.slice(..);
1140
                let view = read_back_gpu(&device, slice);
1141
                println!(
1142
                    "GPU data read back to CPU for validation: {} bytes",
1143
                    view.len()
1144
                );
1145

1146
                // Validate content
1147
                assert!(view.len() >= final_align as usize * table.capacity() as usize);
1148
                for i in 0..len as usize {
1149
                    let offset = i * final_align as usize;
1150
                    let item_size = std::mem::size_of::<GpuDummyComposed>();
1151
                    let src = &view[offset..offset + 16];
1152
                    println!("{}", to_hex_string(src));
1153
                    let dummy_composed: &[GpuDummyComposed] =
1154
                        cast_slice(&view[offset..offset + item_size]);
1155
                    assert_eq!(dummy_composed[0].tag, (i + 1) as u32);
1156
                }
1157
            }
1158
            buffer.unmap();
1159
        }
1160
    }
1161
}