
djeedai / bevy_hanabi / 17710478608

14 Sep 2025 11:21AM UTC coverage: 66.279% (+0.2%) from 66.033%

push

github

web-flow
Move indirect draw args to separate buffer (#495)

Move the indirect draw args out of `EffectMetadata` and into a
separate buffer of their own. This decouples the indirect draw args,
which are largely driven by the GPU, from the effect metadata, which is
largely (and ideally, entirely) driven by the CPU. The new indirect draw
args buffer stores both indexed and non-indexed draw args, the latter
padded with an extra `u32`. This ensures all entries are the same size
and simplifies handling, but more importantly allows retaining a single
unified dispatch of `vfx_indirect` for all effects without adding any
extra indirection or having to split into two passes.
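
The padding scheme described above can be sketched as follows. This is only an illustration, not the crate's actual definitions: the type names `IndexedDrawArgs` and `PaddedDrawArgs`, the constant `SLOT_SIZE`, and the helper `slot_offset` are hypothetical. The field order follows the usual indirect draw argument layouts (five 32-bit words for indexed draws, four for non-indexed draws).

```rust
use std::mem::size_of;

/// Arguments of an indexed indirect draw: five 32-bit words.
/// (Hypothetical name, for illustration only.)
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct IndexedDrawArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// Arguments of a non-indexed indirect draw: four 32-bit words, plus one
/// extra `u32` of padding so that both variants occupy the same slot size.
/// (Hypothetical name, for illustration only.)
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct PaddedDrawArgs {
    pub vertex_count: u32,
    pub instance_count: u32,
    pub first_vertex: u32,
    pub first_instance: u32,
    pub _padding: u32,
}

/// Size in bytes of one slot in the unified draw args buffer.
pub const SLOT_SIZE: usize = size_of::<IndexedDrawArgs>();

/// Byte offset of the slot at `index`, whichever variant it stores.
pub fn slot_offset(index: u32) -> u64 {
    index as u64 * SLOT_SIZE as u64
}
```

Because every slot has the same size, a single compute dispatch can address any effect's draw args by index alone, without first checking whether the effect uses an indexed mesh.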

The main benefit is that this prevents resetting the effect when Bevy
relocates the mesh, which requires re-uploading the mesh location info
into the draw args (base vertex and/or first index, notably), but
otherwise doesn't affect runtime info like the number of particles
alive. Previously when this happened, the entire `EffectMetadata` was
re-uploaded from the CPU with default values for GPU-driven fields,
effectively "resetting" the effect (alive particle count reset to
zero), as the warning in #471 used to highlight.

This change also cleans up the shaders by removing the `dead_count`
atomic particle count, and instead adding the constant `capacity`
particle count, which allows deducing the dead particle count from the
existing `alive_count`. This means `alive_count` becomes the only source
of truth for the number of alive particles. This makes several shaders
much more readable, and saves a couple of atomic instructions.
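
The relationship between the three counts can be sketched on the CPU side as below. The real logic lives in the WGSL shaders; the struct `ParticleCounts` and its methods are hypothetical names used only to show how the dead count follows from `capacity` and `alive_count`.

```rust
/// Hypothetical CPU-side mirror of the per-effect particle counters.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ParticleCounts {
    /// Constant maximum number of particles for the effect.
    pub capacity: u32,
    /// Number of particles currently alive; the only mutable counter.
    pub alive_count: u32,
}

impl ParticleCounts {
    /// Dead particles are simply the slots not currently alive.
    pub fn dead_count(&self) -> u32 {
        self.capacity.saturating_sub(self.alive_count)
    }

    /// Number of particles that can actually be spawned for a request,
    /// limited by the number of dead slots available.
    pub fn spawnable(&self, requested: u32) -> u32 {
        requested.min(self.dead_count())
    }
}
```

With a single mutable counter there is no second atomic to keep in sync, which is the simplification the commit message refers to.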

96 of 122 new or added lines in 5 files covered. (78.69%)

4 existing lines in 1 file now uncovered.

4900 of 7393 relevant lines covered (66.28%)

449.04 hits per line

Source File

63.95
/src/render/gpu_buffer.rs
use std::marker::PhantomData;

use bevy::{
    log::trace,
    render::{
        render_resource::{
            BindingResource, Buffer, BufferAddress, BufferDescriptor, BufferUsages, ShaderSize,
            ShaderType,
        },
        renderer::RenderDevice,
    },
};
use bytemuck::Pod;
use wgpu::CommandEncoder;

struct BufferAndSize {
    /// Allocated GPU buffer.
    pub buffer: Buffer,
    /// Size of the buffer, in number of elements.
    pub size: u32,
}

/// GPU-only buffer without CPU-side storage.
///
/// This is a rather specialized helper to allocate an array on the GPU and
/// manage its buffer, depending on the device constraints and the WGSL rules
/// for data alignment, and allowing the buffer to be resized without losing its
/// content (so, scheduling a buffer-to-buffer copy on GPU after reallocating).
///
/// The element type `T` needs to implement the following traits:
/// - [`Pod`] to prevent user error. This is not strictly necessary, as there's
///   no copy from or to CPU, but if the placeholder type is not POD this might
///   indicate some user error.
/// - [`ShaderSize`] to ensure a fixed footprint, to allow packing multiple
///   instances inside a single buffer. This therefore excludes any
///   runtime-sized array (T being the element type here; it will itself be part
///   of an array).
pub struct GpuBuffer<T: Pod + ShaderSize> {
    /// GPU buffer if already allocated, or `None` otherwise.
    buffer: Option<BufferAndSize>,
    /// Previous GPU buffer, pending copy.
    old_buffer: Option<BufferAndSize>,
    /// GPU buffer usages.
    buffer_usage: BufferUsages,
    /// Optional GPU buffer name, for debugging.
    label: Option<String>,
    /// Used size, in element count. Elements past this are all free. Elements
    /// with a lower index are either allocated or in the free list.
    used_size: u32,
    /// Free list.
    free_list: Vec<u32>,
    _phantom: PhantomData<T>,
}

impl<T: Pod + ShaderType + ShaderSize> Default for GpuBuffer<T> {
    fn default() -> Self { // 9✔
        Self {
            buffer: None,
            old_buffer: None,
            buffer_usage: BufferUsages::all(), // 18✔
            label: None,
            used_size: 0,
            free_list: vec![], // 9✔
            _phantom: PhantomData,
        }
    }
}

impl<T: Pod + ShaderType + ShaderSize> GpuBuffer<T> {
    /// Create a new collection.
    ///
    /// The buffer usage is always augmented by [`BufferUsages::COPY_SRC`] and
    /// [`BufferUsages::COPY_DST`] in order to allow buffer-to-buffer copy when
    /// reallocating, to preserve old content.
    ///
    /// # Panics
    ///
    /// Panics if `buffer_usage` contains [`BufferUsages::UNIFORM`] and the
    /// layout of the element type `T` does not meet the requirements of the
    /// uniform address space, as tested by
    /// [`ShaderType::assert_uniform_compat()`].
    ///
    /// [`BufferUsages::UNIFORM`]: bevy::render::render_resource::BufferUsages::UNIFORM
    #[allow(dead_code)]
    pub fn new(buffer_usage: BufferUsages, label: Option<String>) -> Self { // 6✔
        // GPU-aligned item size, compatible with WGSL rules
        let item_size = <T as ShaderSize>::SHADER_SIZE.get() as usize; // 12✔
        trace!("GpuBuffer: item_size={}", item_size); // 10✔
        if buffer_usage.contains(BufferUsages::UNIFORM) { // 12✔
            <T as ShaderType>::assert_uniform_compat(); // ×
        }
        Self {
            // We need both COPY_SRC and COPY_DST for copy_buffer_to_buffer() on realloc
            buffer_usage: buffer_usage | BufferUsages::COPY_SRC | BufferUsages::COPY_DST, // 12✔
            label,
            ..Default::default()
        }
    }

    /// Create a new collection from an allocated buffer.
    ///
    /// The buffer usage must contain [`BufferUsages::COPY_SRC`] and
    /// [`BufferUsages::COPY_DST`] in order to allow buffer-to-buffer copy when
    /// reallocating, to preserve old content.
    ///
    /// # Panics
    ///
    /// Panics if the usage of `buffer` doesn't contain both
    /// [`BufferUsages::COPY_SRC`] and [`BufferUsages::COPY_DST`].
    ///
    /// Panics if the usage of `buffer` contains [`BufferUsages::UNIFORM`] and
    /// the layout of the element type `T` does not meet the requirements of
    /// the uniform address space, as tested by
    /// [`ShaderType::assert_uniform_compat()`].
    ///
    /// [`BufferUsages::UNIFORM`]: bevy::render::render_resource::BufferUsages::UNIFORM
    pub fn new_allocated(buffer: Buffer, size: u32, label: Option<String>) -> Self { // 3✔
        // GPU-aligned item size, compatible with WGSL rules
        let item_size = <T as ShaderSize>::SHADER_SIZE.get() as u32; // 6✔
        let buffer_usage = buffer.usage(); // 6✔
        assert!( // 3✔
            buffer_usage.contains(BufferUsages::COPY_SRC | BufferUsages::COPY_DST), // 9✔
            "GpuBuffer requires COPY_SRC and COPY_DST buffer usages to allow copy on reallocation." // ×
        );
        if buffer_usage.contains(BufferUsages::UNIFORM) { // 6✔
            <T as ShaderType>::assert_uniform_compat(); // ×
        }
        trace!("GpuBuffer: item_size={}", item_size); // 5✔
        Self {
            buffer: Some(BufferAndSize { buffer, size }), // 6✔
            buffer_usage,
            label,
            ..Default::default()
        }
    }

    /// Clear the buffer.
    ///
    /// This doesn't de-allocate any GPU buffer.
    pub fn clear(&mut self) { // 1,030✔
        self.free_list.clear(); // 2,060✔
        self.used_size = 0; // 1,030✔
    }

    /// Allocate a new entry in the buffer.
    ///
    /// If the GPU buffer does not have enough storage, or is not allocated
    /// yet, this schedules a (re-)allocation, which must be applied by calling
    /// [`prepare_buffers()`] once a frame after all [`allocate()`] calls were
    /// made for that frame.
    ///
    /// # Returns
    ///
    /// The index of the allocated entry.
    ///
    /// [`prepare_buffers()`]: Self::prepare_buffers
    /// [`allocate()`]: Self::allocate
    pub fn allocate(&mut self) -> u32 { // 2✔
        if let Some(index) = self.free_list.pop() { // 2✔
            index // ×
        } else {
            // Note: we may return an index past the buffer capacity. This will instruct
            // prepare_buffers() to re-allocate the buffer.
            let index = self.used_size; // 4✔
            self.used_size += 1; // 2✔
            index // 2✔
        }
    }

    /// Free an existing entry.
    ///
    /// # Panics
    ///
    /// In debug only, panics if the entry is not allocated (double-free). In
    /// non-debug, the behavior is undefined and will generally lead to bugs.
    // Currently we use GpuBuffer in sorting, and re-allocate everything each frame.
    #[allow(dead_code)]
    pub fn free(&mut self, index: u32) { // 1✔
        if index < self.used_size { // 1✔
            debug_assert!( // 1✔
                !self.free_list.contains(&index), // 2✔
                "Double-free in GpuBuffer at index #{}", // ×
                index // ×
            );
            self.free_list.push(index); // 3✔
        }
    }

    /// Get the current GPU buffer, if allocated.
    #[inline]
    pub fn buffer(&self) -> Option<&Buffer> { // 3,042✔
        self.buffer.as_ref().map(|b| &b.buffer) // 9,126✔
    }

    /// Get a binding for the entire GPU buffer, if allocated.
    #[inline]
    pub fn as_entire_binding(&self) -> Option<BindingResource<'_>> { // NEW ×
        let buffer = self.buffer()?; // ×
        Some(buffer.as_entire_binding()) // NEW ×
    }

    /// Get the current buffer capacity, in element count.
    ///
    /// This is the CPU view of allocations, which counts the number of
    /// [`allocate()`] and [`free()`] calls.
    ///
    /// [`allocate()`]: Self::allocate
    /// [`free()`]: Self::free
    #[inline]
    #[allow(dead_code)]
    pub fn capacity(&self) -> u32 { // ×
        debug_assert!(self.used_size >= self.free_list.len() as u32); // ×
        self.used_size - self.free_list.len() as u32 // ×
    }

    /// Get the current GPU buffer capacity, in element count.
    ///
    /// Note that it is possible for [`allocate()`] to return an index greater
    /// than or equal to the value returned by [`gpu_capacity()`], at least
    /// temporarily until [`prepare_buffers()`] is called.
    ///
    /// [`allocate()`]: Self::allocate
    /// [`gpu_capacity()`]: Self::gpu_capacity
    /// [`prepare_buffers()`]: Self::prepare_buffers
    #[inline]
    pub fn gpu_capacity(&self) -> u32 { // 3,090✔
        self.buffer.as_ref().map(|b| b.size).unwrap_or(0) // 12,360✔
    }

    /// Size in bytes of a single item in the buffer.
    ///
    /// This is equal to [`ShaderSize::SHADER_SIZE`] for the buffer element `T`.
    #[inline]
    pub fn item_size(&self) -> usize { // 2✔
        <T as ShaderSize>::SHADER_SIZE.get() as usize // 2✔
    }

    /// Check if the buffer is empty.
    ///
    /// The check is based on the CPU representation of the buffer, that is the
    /// number of calls to [`allocate()`]. The buffer is considered empty if no
    /// [`allocate()`] call was made, or they all have been followed by a
    /// corresponding [`free()`] call. This makes no assumption about the GPU
    /// buffer.
    ///
    /// [`allocate()`]: Self::allocate
    /// [`free()`]: Self::free
    #[inline]
    #[allow(dead_code)]
    pub fn is_empty(&self) -> bool { // ×
        self.used_size == 0 // ×
    }

    /// Allocate or reallocate the GPU buffer if needed.
    ///
    /// This allocates or reallocates a GPU buffer to ensure storage for all
    /// previous calls to [`allocate()`]. This is a no-op if a GPU buffer is
    /// already allocated and has sufficient storage.
    ///
    /// This should be called once a frame after any new [`allocate()`] in that
    /// frame. After this call, [`buffer()`] is guaranteed to return `Some(..)`.
    ///
    /// # Returns
    ///
    /// `true` if the buffer was (re)allocated, or `false` if an existing buffer
    /// was reused which already had enough capacity.
    ///
    /// [`allocate()`]: Self::allocate
    /// [`buffer()`]: Self::buffer
    pub fn prepare_buffers(&mut self, render_device: &RenderDevice) -> bool { // 3,090✔
        // Don't do anything if we still have some storage.
        let old_capacity = self.gpu_capacity(); // 9,270✔
        if self.used_size <= old_capacity { // 3,090✔
            return false; // 3,088✔
        }

        // Ensure we allocate at least 256 more entries than what we need this frame,
        // and round that to make it nicer for the GPU.
        let new_capacity = (self.used_size + 256).next_multiple_of(1024); // ×
        if new_capacity <= old_capacity { // ×
            return false; // ×
        }

        // Save the old buffer, we will need to copy it to the new one later.
        assert!(self.old_buffer.is_none(), "Multiple calls to GpuBuffer::prepare_buffers() before write_buffers() was called to copy old content."); // ×
        self.old_buffer = self.buffer.take(); // 6✔

        // Allocate a new buffer of the appropriate size.
        let byte_size = self.item_size() * new_capacity as usize; // 6✔
        trace!( // 2✔
            "prepare_buffers(): increase capacity from {} to {} elements, new size {} bytes", // 2✔
            old_capacity, // ×
            new_capacity, // ×
            byte_size // ×
        );
        let buffer = render_device.create_buffer(&BufferDescriptor { // 6✔
            label: self.label.as_ref().map(|s| &s[..]), // 8✔
            size: byte_size as BufferAddress, // 2✔
            usage: BufferUsages::COPY_DST | self.buffer_usage, // 2✔
            mapped_at_creation: false, // ×
        });
        self.buffer = Some(BufferAndSize { // 4✔
            buffer, // 2✔
            size: new_capacity, // 2✔
        });

        true // 2✔
    }

    /// Schedule any pending buffer copy.
    ///
    /// If a new buffer was (re-)allocated this frame, this schedules a
    /// buffer-to-buffer copy from the old buffer to the new one. The old
    /// buffer itself is released later, by [`clear_previous_frame_resizes()`].
    ///
    /// This should be called once a frame after [`prepare_buffers()`]. This is
    /// a no-op if there's no need for a buffer copy.
    ///
    /// [`prepare_buffers()`]: Self::prepare_buffers
    /// [`clear_previous_frame_resizes()`]: Self::clear_previous_frame_resizes
    pub fn write_buffers(&self, command_encoder: &mut CommandEncoder) { // 3,090✔
        if let Some(old_buffer) = self.old_buffer.as_ref() { // 3,090✔
            let new_buffer = self.buffer.as_ref().unwrap(); // ×
            assert!( // ×
                new_buffer.size >= old_buffer.size, // ×
                "Old buffer is larger than the new one. This is unexpected." // ×
            );
            command_encoder.copy_buffer_to_buffer( // ×
                &old_buffer.buffer, // ×
                0,
                &new_buffer.buffer, // ×
                0,
                // Buffer sizes are tracked in elements, but the copy size is in bytes.
                old_buffer.size as u64 * self.item_size() as u64, // ×
            );
        }
    }

    /// Clear any stale buffer used for resize in the previous frame during
    /// rendering while the data structure was immutable.
    ///
    /// This must be called before any new [`allocate()`].
    ///
    /// [`allocate()`]: Self::allocate
    pub fn clear_previous_frame_resizes(&mut self) { // 3,090✔
        if let Some(old_buffer) = self.old_buffer.take() { // 3,090✔
            old_buffer.buffer.destroy(); // ×
        }
    }
}
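
The CPU-side bookkeeping in the listing above (free list, used size, and the capacity growth rule) can be exercised without any GPU. The following is a minimal sketch under that assumption: `SlotAllocator` and `grown_capacity` are hypothetical names that only mirror the logic of `allocate()`, `free()`, and the `new_capacity` computation in `prepare_buffers()`.

```rust
/// CPU-only model of the index allocator; no GPU resources are involved.
/// (Hypothetical type, mirroring the `used_size` and `free_list` fields.)
#[derive(Debug, Default)]
pub struct SlotAllocator {
    /// Number of slots handed out so far, including freed ones.
    used_size: u32,
    /// Indices below `used_size` that were freed and can be reused.
    free_list: Vec<u32>,
}

impl SlotAllocator {
    /// Reuse a freed index if one exists, otherwise grow the used range.
    pub fn allocate(&mut self) -> u32 {
        if let Some(index) = self.free_list.pop() {
            index
        } else {
            let index = self.used_size;
            self.used_size += 1;
            index
        }
    }

    /// Return an index to the free list; out-of-range indices are ignored.
    pub fn free(&mut self, index: u32) {
        if index < self.used_size {
            debug_assert!(
                !self.free_list.contains(&index),
                "double-free at index {index}"
            );
            self.free_list.push(index);
        }
    }

    /// Number of slots the backing buffer must be able to hold.
    pub fn used_size(&self) -> u32 {
        self.used_size
    }
}

/// Growth rule from the listing: keep at least 256 spare entries, then
/// round up to the next multiple of 1024.
pub fn grown_capacity(used_size: u32) -> u32 {
    (used_size + 256).next_multiple_of(1024)
}
```

For example, allocating two entries yields indices 0 and 1; freeing index 0 makes the next allocation return 0 again, while `used_size()` stays at 2. A buffer needing 769 entries would grow to 2048 elements, since 769 + 256 = 1025 rounds up past 1024.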