
shader-slang / slang-rhi / 21595320431 / 1

Build 21595320431 · default branch: main
Ran 02 Feb 2026 03:15PM UTC · Files: 158 · Run time: 4s
02 Feb 2026 03:04PM UTC coverage: 39.024% (-0.01%) from 39.037%
Job 21595320431.1 · push · github · web-flow
PyTorch-style caching allocator for the CUDA backend with proper multi-stream support. (#626)

* Add PyTorch-style caching allocator for GPU memory

Implements a caching allocator model that associates memory pages with
CUDA streams, enabling efficient memory reuse without expensive
cuMemAlloc/cuMemFree calls on every allocation.

Key features:
- Page-level stream tracking (m_stream set once, never transfers)
- Lazy CUDA event creation (only for multi-stream scenarios)
- PageCache stores freed pages for reuse instead of freeing to CUDA
- HeapCachingConfig for programmatic configuration
- Environment variable support (SLANG_RHI_ALLOCATOR_*)

This follows PyTorch's caching allocator design where:
- Block ownership remains with original allocation stream
- Events only created when current_stream != block.stream
- Memory reclaimed only after all stream events complete
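The page-cache idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual slang-rhi implementation: `StreamHandle`, `Page`, and `PageCache` here are stand-ins, and the real backend would hold driver stream handles and fall back to `cuMemAlloc` on a cache miss.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Stand-in for a CUstream handle.
using StreamHandle = std::uintptr_t;

// A cached GPU memory page. Per the design above, `stream` is set once at
// first allocation and never transfers to another stream.
struct Page {
    void* data;
    std::size_t size;
    StreamHandle stream;
};

// Parks freed pages per owning stream and hands them back on the next
// allocation from that stream, skipping a cuMemFree/cuMemAlloc round trip.
class PageCache {
public:
    // Return a cached page of at least `size` owned by `stream`, or nullptr
    // on a cache miss (the caller then falls back to the driver allocator).
    Page* tryAcquire(StreamHandle stream, std::size_t size) {
        auto& freeList = m_free[stream];
        for (auto it = freeList.begin(); it != freeList.end(); ++it) {
            if ((*it)->size >= size) {
                Page* page = *it;
                freeList.erase(it);
                return page;
            }
        }
        return nullptr;
    }

    // Store a retired page for reuse instead of freeing it to CUDA.
    void release(Page* page) { m_free[page->stream].push_back(page); }

private:
    std::map<StreamHandle, std::vector<Page*>> m_free;
};
```

Keying the free list by the owning stream is what lets reuse stay synchronization-free in the common case: a page handed back to the stream that produced it is ordered by that stream anyway.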

* Add lazy events optimization for single-stream workloads

- Skip event creation for single-stream submits (PyTorch-style)
- Use cuStreamQuery for non-blocking completion checks
- Add cuStreamQuery to dynamic CUDA API loading
- Add tests for lazy events and rapid alloc/free patterns

This optimization eliminates event overhead for single-stream workloads,
matching PyTorch's behavior where events are only created for cross-stream
synchronization.

* Fix lazy events: remove signalFenceCount check

The internal event is for command buffer retirement, not for user fences.
User fences use setCurrentValue(), which is a separate mechanism.
Single-stream retirement works with cuStreamQuery() regardless of user fences.

* Add test-caching-allocator.cpp to CMakeLists.txt

* Wire up multi-stream page tracking

- Set current stream when creating command encoder (not just at submit)
- Add Page::notifyUse() virtual hook called on every allocation
- Implement PageImpl::notifyUse() to call recordStreamUse() for cross-stream usage
- This enables proper PyTorch-style multi-stream memory synchronization:
  wh... (continued)
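The notifyUse() hook described above can be sketched as follows. This is a hypothetical illustration under the stated design, not the real PageImpl: `TrackedPage` and `extraStreams` are invented names, and the recorded streams stand in for the per-stream events that must complete before the page is reclaimed.

```cpp
#include <cstdint>
#include <set>

using StreamHandle = std::uintptr_t;

// On every allocation the page is notified with the current stream, and
// cross-stream use is recorded only when that stream differs from the
// page's owning stream (the same current_stream != block.stream test
// PyTorch applies).
struct TrackedPage {
    StreamHandle ownerStream;            // set once at allocation
    std::set<StreamHandle> extraStreams; // streams that must signal an event
                                         // before the page can be reused

    void notifyUse(StreamHandle currentStream) {
        if (currentStream != ownerStream)
            extraStreams.insert(currentStream); // recordStreamUse()
    }
};
```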

3945 of 11918 branches covered (33.1%)

Branch coverage included in aggregate %.

12277 of 29651 relevant lines covered (41.41%)

26729.97 hits per line

Source Files on job macos-aarch64 - 21595320431.1
Commit 7305dbb3 on github

© 2026 Coveralls, Inc