djeedai / bevy_hanabi / 22800469877

07 Mar 2026 02:04PM UTC coverage: 57.546% (-0.2%) from 57.75%
push · github · web-flow
Batch spawners and properties (#525)

Bind the entire spawner and property arrays in all passes, instead of a
single entry. This removes the need to pad those structures. Access them
via an offset in the effect's own metadata.

Add a new `BatchInfo` struct holding the per-batch data. This clarifies
the split of responsibilities between it and `EffectMetadata`, the latter
holding per-effect (not per-batch) data.

Remove the `DispatchBufferIndices` component, which was used to track
the allocated entry for indirect compute dispatch. Instead, align those
allocations 1:1 with the GPU batch info allocations, which are
re-computed each frame.

Add a prefix sum pass before the update pass, which computes the prefix
sum of alive particles after the init pass. This is used to enable
batched update compute dispatch, where the number of compute threads
maps to the total number of alive particles in the batch. In that case,
we need to find which thread updates which particle of which batch,
using that prefix sum. Note that in this change, due to other
limitations still present, each effect instance is still in its own
batch (there's effectively no batching). Enabling full batching requires
more work, notably on the sort pass for ribbons, and the GPU-based init
pass with GPU events.
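
The thread-to-particle mapping that this prefix sum enables can be illustrated CPU-side. This is a sketch with hypothetical helper names; in the actual change this logic lives in the WGSL compute shaders:

```rust
/// Exclusive prefix sum of per-effect alive particle counts
/// (CPU-side illustration of what the GPU prefix sum pass computes).
fn prefix_sum(alive_counts: &[u32]) -> Vec<u32> {
    let mut sums = Vec::with_capacity(alive_counts.len());
    let mut acc = 0u32;
    for &count in alive_counts {
        sums.push(acc);
        acc += count;
    }
    sums
}

/// Map a global update thread index to (effect index, local particle index)
/// inside a batch, by searching the exclusive prefix sum.
fn thread_to_particle(thread_id: u32, sums: &[u32]) -> (usize, u32) {
    // Last effect whose start offset is <= thread_id; effects with zero
    // alive particles are naturally skipped because their start offset
    // equals the next effect's start offset.
    let effect = sums.partition_point(|&start| start <= thread_id) - 1;
    (effect, thread_id - sums[effect])
}
```

For alive counts `[3, 0, 5]` the exclusive sums are `[0, 3, 3]`, so update thread 3 lands on the first particle of the third effect.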

Change the allocation of spawners to occur after sorting. This ensures
all effects in the same batch have sequential allocations, which enables
accessing those spawners with a simple `offset + index` strategy.
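
A minimal CPU-side sketch of such sequential allocation, with a hypothetical function name (the real allocation walks the sorted effect list):

```rust
/// Hand out sequential spawner slots per batch, returning each batch's base
/// offset; instance `i` of batch `b` is then addressed as `bases[b] + i`.
fn allocate_spawners(batch_sizes: &[u32]) -> Vec<u32> {
    let mut bases = Vec::with_capacity(batch_sizes.len());
    let mut next = 0u32;
    for &len in batch_sizes {
        bases.push(next);
        next += len;
    }
    bases
}
```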

Change the render pass to use the same bind group "spawner@2" as the
other passes. This binds the property buffer, although in this change the
metadata buffer is still not available, so properties can't be used yet
in the render pass. The bind groups should be reviewed anyway: with
batching approaching, and with the current change, assumptions about the
frequency of changes are now wrong, and individual bindings should be
re-grouped into more suitable frequency-based groups.

Finally, the... (continued)

193 of 404 new or added lines in 7 files covered. (47.77%)

26 existing lines in 3 files now uncovered.

4793 of 8329 relevant lines covered (57.55%)

198.51 hits per line

Source File: /src/render/gpu_buffer.rs (56.98% covered)
use std::marker::PhantomData;

use bevy::{
    log::trace,
    render::{
        render_resource::{
            BindingResource, Buffer, BufferAddress, BufferDescriptor, BufferUsages, ShaderSize,
            ShaderType,
        },
        renderer::RenderDevice,
    },
};
use bytemuck::Pod;
use wgpu::CommandEncoder;

struct BufferAndSize {
    /// Allocated GPU buffer.
    pub buffer: Buffer,
    /// Size of the buffer, in number of elements.
    pub size: u32,
}

/// GPU-only buffer without CPU-side storage.
///
/// This is a rather specialized helper to allocate an array on the GPU and
/// manage its buffer, depending on the device constraints and the WGSL rules
/// for data alignment, and allowing the buffer to be resized without losing
/// its content (by scheduling a buffer-to-buffer copy on GPU after
/// reallocating).
///
/// The element type `T` needs to implement the following traits:
/// - [`Pod`] to prevent user error. This is not strictly necessary, as there's
///   no copy from or to CPU, but if the placeholder type is not POD this might
///   indicate some user error.
/// - [`ShaderSize`] to ensure a fixed footprint, to allow packing multiple
///   instances inside a single buffer. This therefore excludes any
///   runtime-sized array (`T` being the element type here; it will itself be
///   part of an array).
pub struct GpuBuffer<T: Pod + ShaderSize> {
    /// GPU buffer if already allocated, or `None` otherwise.
    buffer: Option<BufferAndSize>,
    /// Previous GPU buffer, pending copy.
    old_buffer: Option<BufferAndSize>,
    /// GPU buffer usages.
    buffer_usage: BufferUsages,
    /// Optional GPU buffer name, for debugging.
    label: Option<String>,
    /// Used size, in element count. Elements past this are all free. Elements
    /// with a lower index are either allocated or in the free list.
    used_size: u32,
    /// Free list.
    free_list: Vec<u32>,
    _phantom: PhantomData<T>,
}

impl<T: Pod + ShaderType + ShaderSize> Default for GpuBuffer<T> {
    fn default() -> Self {
        Self {
            buffer: None,
            old_buffer: None,
            buffer_usage: BufferUsages::all(),
            label: None,
            used_size: 0,
            free_list: vec![],
            _phantom: PhantomData,
        }
    }
}

impl<T: Pod + ShaderType + ShaderSize> GpuBuffer<T> {
    /// Create a new collection.
    ///
    /// The buffer usage is always augmented by [`BufferUsages::COPY_SRC`] and
    /// [`BufferUsages::COPY_DST`] in order to allow buffer-to-buffer copy when
    /// reallocating, to preserve old content.
    ///
    /// # Panics
    ///
    /// Panics if `buffer_usage` contains [`BufferUsages::UNIFORM`] and the
    /// layout of the element type `T` does not meet the requirements of the
    /// uniform address space, as tested by
    /// [`ShaderType::assert_uniform_compat()`].
    ///
    /// [`BufferUsages::UNIFORM`]: bevy::render::render_resource::BufferUsages::UNIFORM
    #[allow(dead_code)]
    pub fn new(buffer_usage: BufferUsages, label: Option<String>) -> Self {
        // GPU-aligned item size, compatible with WGSL rules
        let item_size = <T as ShaderSize>::SHADER_SIZE.get() as usize;
        trace!("GpuBuffer: item_size={}", item_size);
        if buffer_usage.contains(BufferUsages::UNIFORM) {
            <T as ShaderType>::assert_uniform_compat();
        }
        Self {
            // We need both COPY_SRC and COPY_DST for copy_buffer_to_buffer() on realloc
            buffer_usage: buffer_usage | BufferUsages::COPY_SRC | BufferUsages::COPY_DST,
            label,
            ..Default::default()
        }
    }

    /// Create a new collection from an allocated buffer.
    ///
    /// The buffer usage must contain [`BufferUsages::COPY_SRC`] and
    /// [`BufferUsages::COPY_DST`] in order to allow buffer-to-buffer copy when
    /// reallocating, to preserve old content.
    ///
    /// # Panics
    ///
    /// Panics if `buffer_usage` doesn't contain [`BufferUsages::COPY_SRC`] or
    /// [`BufferUsages::COPY_DST`].
    ///
    /// Panics if `buffer_usage` contains [`BufferUsages::UNIFORM`] and the
    /// layout of the element type `T` does not meet the requirements of the
    /// uniform address space, as tested by
    /// [`ShaderType::assert_uniform_compat()`].
    ///
    /// [`BufferUsages::UNIFORM`]: bevy::render::render_resource::BufferUsages::UNIFORM
    pub fn new_allocated(buffer: Buffer, size: u32, label: Option<String>) -> Self {
        // GPU-aligned item size, compatible with WGSL rules
        let item_size = <T as ShaderSize>::SHADER_SIZE.get() as u32;
        let buffer_usage = buffer.usage();
        assert!(
            buffer_usage.contains(BufferUsages::COPY_SRC | BufferUsages::COPY_DST),
            "GpuBuffer requires COPY_SRC and COPY_DST buffer usages to allow copy on reallocation."
        );
        if buffer_usage.contains(BufferUsages::UNIFORM) {
            <T as ShaderType>::assert_uniform_compat();
        }
        trace!("GpuBuffer: item_size={}", item_size);
        Self {
            buffer: Some(BufferAndSize { buffer, size }),
            buffer_usage,
            label,
            ..Default::default()
        }
    }

    /// Clear the buffer.
    ///
    /// This doesn't de-allocate any GPU buffer.
    pub fn clear(&mut self) {
        self.free_list.clear();
        self.used_size = 0;
    }

    /// Allocate a new entry in the buffer.
    ///
    /// If the GPU buffer has not enough storage, or is not allocated yet, this
    /// schedules a (re-)allocation, which must be applied by calling
    /// [`prepare_buffers()`] once a frame after all [`allocate()`] calls were
    /// made for that frame.
    ///
    /// # Returns
    ///
    /// The index of the allocated entry.
    ///
    /// [`prepare_buffers()`]: Self::prepare_buffers
    /// [`allocate()`]: Self::allocate
    pub fn allocate(&mut self) -> u32 {
        if let Some(index) = self.free_list.pop() {
            index
        } else {
            // Note: we may return an index past the buffer capacity. This will instruct
            // prepare_buffers() to re-allocate the buffer.
            let index = self.used_size;
            self.used_size += 1;
            index
        }
    }

    /// Free an existing entry.
    ///
    /// # Panics
    ///
    /// In debug only, panics if the entry is not allocated (double-free). In
    /// non-debug, the behavior is undefined and will generally lead to bugs.
    // Currently we use GpuBuffer in sorting, and re-allocate everything each frame.
    #[allow(dead_code)]
    pub fn free(&mut self, index: u32) {
        if index < self.used_size {
            debug_assert!(
                !self.free_list.contains(&index),
                "Double-free in GpuBuffer at index #{}",
                index
            );
            self.free_list.push(index);
        }
    }

    /// Get the current GPU buffer, if allocated.
    #[inline]
    pub fn buffer(&self) -> Option<&Buffer> {
        self.buffer.as_ref().map(|b| &b.buffer)
    }

    /// Get a binding for the entire GPU buffer, if allocated.
    #[inline]
    #[allow(dead_code)]
    pub fn as_entire_binding(&self) -> Option<BindingResource<'_>> {
        let buffer = self.buffer()?;
        Some(buffer.as_entire_binding())
    }

    /// Get the current number of allocated entries, in element count.
    ///
    /// This is the CPU view of allocations, which counts the number of
    /// [`allocate()`] and [`free()`] calls.
    ///
    /// [`allocate()`]: Self::allocate
    /// [`free()`]: Self::free
    #[inline]
    #[allow(dead_code)]
    pub fn capacity(&self) -> u32 {
        debug_assert!(self.used_size >= self.free_list.len() as u32);
        self.used_size - self.free_list.len() as u32
    }

    /// Get the current GPU buffer capacity, in element count.
    ///
    /// Note that it is possible for [`allocate()`] to return an index greater
    /// than or equal to the value returned by [`gpu_capacity()`], at least
    /// temporarily until [`prepare_buffers()`] is called.
    ///
    /// [`allocate()`]: Self::allocate
    /// [`gpu_capacity()`]: Self::gpu_capacity
    /// [`prepare_buffers()`]: Self::prepare_buffers
    #[inline]
    pub fn gpu_capacity(&self) -> u32 {
        self.buffer.as_ref().map(|b| b.size).unwrap_or(0)
    }

    /// Size in bytes of a single item in the buffer.
    ///
    /// This is equal to [`ShaderSize::SHADER_SIZE`] for the buffer element `T`.
    #[inline]
    pub fn item_size(&self) -> usize {
        <T as ShaderSize>::SHADER_SIZE.get() as usize
    }

    /// Check if the buffer is empty.
    ///
    /// The check is based on the CPU representation of the buffer, that is the
    /// number of calls to [`allocate()`]. The buffer is considered empty if no
    /// [`allocate()`] call was made since creation or since the last call to
    /// [`clear()`]. Note that entries released with [`free()`] still count as
    /// used. This makes no assumption about the GPU buffer.
    ///
    /// [`allocate()`]: Self::allocate
    /// [`free()`]: Self::free
    /// [`clear()`]: Self::clear
    #[inline]
    #[allow(dead_code)]
    pub fn is_empty(&self) -> bool {
        self.used_size == 0
    }

    /// Allocate or reallocate the GPU buffer if needed.
    ///
    /// This allocates or reallocates a GPU buffer to ensure storage for all
    /// previous calls to [`allocate()`]. This is a no-op if a GPU buffer is
    /// already allocated and has sufficient storage.
    ///
    /// This should be called once a frame after any new [`allocate()`] in that
    /// frame. After this call, [`buffer()`] is guaranteed to return `Some(..)`
    /// if any entry was allocated.
    ///
    /// # Returns
    ///
    /// `true` if the buffer was (re)allocated, or `false` if an existing buffer
    /// was reused which already had enough capacity.
    ///
    /// [`allocate()`]: Self::allocate
    /// [`buffer()`]: Self::buffer
    pub fn prepare_buffers(&mut self, render_device: &RenderDevice) -> bool {
        // Don't do anything if we still have some storage.
        let old_capacity = self.gpu_capacity();
        if self.used_size <= old_capacity {
            return false;
        }

        // Ensure we allocate at least 256 more entries than what we need this frame,
        // and round that to make it nicer for the GPU.
        let new_capacity = (self.used_size + 256).next_multiple_of(1024);
        if new_capacity <= old_capacity {
            return false;
        }

        // Save the old buffer, we will need to copy it to the new one later.
        assert!(
            self.old_buffer.is_none(),
            "Multiple calls to GpuBuffer::prepare_buffers() before write_buffers() was called to copy old content."
        );
        self.old_buffer = self.buffer.take();

        // Allocate a new buffer of the appropriate size.
        let byte_size = self.item_size() * new_capacity as usize;
        trace!(
            "prepare_buffers(): increase capacity from {} to {} elements, new size {} bytes",
            old_capacity,
            new_capacity,
            byte_size
        );
        let buffer = render_device.create_buffer(&BufferDescriptor {
            label: self.label.as_ref().map(|s| &s[..]),
            size: byte_size as BufferAddress,
            usage: BufferUsages::COPY_DST | self.buffer_usage,
            mapped_at_creation: false,
        });
        self.buffer = Some(BufferAndSize {
            buffer,
            size: new_capacity,
        });

        true
    }

    /// Schedule any pending buffer copy.
    ///
    /// If a new buffer was (re-)allocated this frame, this schedules a
    /// buffer-to-buffer copy from the old buffer to the new one. The old
    /// buffer is destroyed on the next call to
    /// [`clear_previous_frame_resizes()`].
    ///
    /// This should be called once a frame after [`prepare_buffers()`]. This is
    /// a no-op if there's no need for a buffer copy.
    ///
    /// [`prepare_buffers()`]: Self::prepare_buffers
    /// [`clear_previous_frame_resizes()`]: Self::clear_previous_frame_resizes
    pub fn write_buffers(&self, command_encoder: &mut CommandEncoder) {
        if let Some(old_buffer) = self.old_buffer.as_ref() {
            let new_buffer = self.buffer.as_ref().unwrap();
            assert!(
                new_buffer.size >= old_buffer.size,
                "New buffer is smaller than the old one. This is unexpected."
            );
            // Buffer sizes are stored in elements; the GPU copy takes bytes.
            let copy_size = old_buffer.size as u64 * <T as ShaderSize>::SHADER_SIZE.get();
            command_encoder.copy_buffer_to_buffer(
                &old_buffer.buffer,
                0,
                &new_buffer.buffer,
                0,
                copy_size,
            );
        }
    }

    /// Clear any stale buffer used for resize in the previous frame during
    /// rendering while the data structure was immutable.
    ///
    /// This must be called before any new [`allocate()`].
    ///
    /// [`allocate()`]: Self::allocate
    pub fn clear_previous_frame_resizes(&mut self) {
        if let Some(old_buffer) = self.old_buffer.take() {
            old_buffer.buffer.destroy();
        }
    }
}
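
To make the allocate/free/grow lifecycle above concrete, here is a standalone CPU-only mock: a hypothetical `MockAllocator` with no GPU types, whose growth rule copies the `+256` / round-to-1024 policy from `prepare_buffers()`:

```rust
/// Hypothetical stand-in for GpuBuffer's CPU-side bookkeeping (illustrative
/// only; the real type also owns the wgpu buffers).
struct MockAllocator {
    used_size: u32,
    free_list: Vec<u32>,
}

impl MockAllocator {
    fn new() -> Self {
        Self {
            used_size: 0,
            free_list: Vec::new(),
        }
    }

    /// Pop a slot from the free list, or append a fresh index past the end.
    fn allocate(&mut self) -> u32 {
        if let Some(index) = self.free_list.pop() {
            index
        } else {
            let index = self.used_size;
            self.used_size += 1;
            index
        }
    }

    /// Return a slot to the free list for later reuse.
    fn free(&mut self, index: u32) {
        if index < self.used_size {
            self.free_list.push(index);
        }
    }

    /// Growth policy mirrored from prepare_buffers(): reserve at least 256
    /// spare entries, rounded up to a multiple of 1024; `None` if the current
    /// GPU capacity already suffices.
    fn grow_capacity(&self, gpu_capacity: u32) -> Option<u32> {
        if self.used_size <= gpu_capacity {
            return None;
        }
        Some((self.used_size + 256).next_multiple_of(1024))
    }
}
```

Three allocations followed by a free reuse index 1 on the next allocation, and growing from an empty GPU buffer with three live entries yields a 1024-element allocation.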

© 2026 Coveralls, Inc