
shader-slang / slang-rhi / build 21595320431

Build
Default branch: main
Ran: 02 Feb 2026 03:09PM UTC
Jobs: 3
Files: 240
Run time: 1min
Coverage: 69%

02 Feb 2026 03:04PM UTC · coverage: 69.182% (-0.07%) from 69.255%
Build 21595320431 · commit 7305dbb3 · push · github · web-flow

PyTorch-style caching allocator for the CUDA backend with proper multi-stream support. (#626)

* Add PyTorch-style caching allocator for GPU memory

Implements a caching allocator model that associates memory pages with
CUDA streams, enabling efficient memory reuse without expensive
cuMemAlloc/cuMemFree calls on every allocation.

Key features:
- Page-level stream tracking (m_stream set once, never transfers)
- Lazy CUDA event creation (only for multi-stream scenarios)
- PageCache stores freed pages for reuse instead of freeing to CUDA
- HeapCachingConfig for programmatic configuration
- Environment variable support (SLANG_RHI_ALLOCATOR_*)

This follows PyTorch's caching allocator design where:
- Block ownership remains with original allocation stream
- Events only created when current_stream != block.stream
- Memory reclaimed only after all stream events complete
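
The page/stream-ownership model above can be sketched roughly as follows. This is a hypothetical illustration, not the actual slang-rhi implementation; every name besides Page, PageCache, and m_stream (m_ptr, m_size, m_usedStreams, m_pendingEvents, tryReuse) is invented:

    // Hypothetical sketch of the page/stream-ownership model described above.
    #include <cuda.h>
    #include <map>
    #include <set>
    #include <vector>

    struct Page
    {
        CUdeviceptr m_ptr = 0;
        size_t m_size = 0;
        CUstream m_stream = nullptr;          // set once at allocation, never transfers
        std::set<CUstream> m_usedStreams;     // other streams that touched this page
        std::vector<CUevent> m_pendingEvents; // lazily created, cross-stream only
    };

    // Freed pages are cached by size instead of being released with cuMemFree.
    struct PageCache
    {
        std::multimap<size_t, Page*> m_freePages;

        // Reuse a cached page only once every recorded cross-stream event has
        // completed; cuEventQuery never blocks.
        Page* tryReuse(size_t size)
        {
            for (auto it = m_freePages.lower_bound(size); it != m_freePages.end(); ++it)
            {
                Page* page = it->second;
                bool ready = true;
                for (CUevent e : page->m_pendingEvents)
                    if (cuEventQuery(e) != CUDA_SUCCESS)
                        ready = false; // still in flight on another stream
                if (!ready)
                    continue;
                for (CUevent e : page->m_pendingEvents)
                    cuEventDestroy(e);
                page->m_pendingEvents.clear();
                m_freePages.erase(it);
                return page;
            }
            return nullptr; // caller falls back to cuMemAlloc for a new page
        }
    };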

* Add lazy events optimization for single-stream workloads

- Skip event creation for single-stream submits (PyTorch-style)
- Use cuStreamQuery for non-blocking completion checks
- Add cuStreamQuery to dynamic CUDA API loading
- Add tests for lazy events and rapid alloc/free patterns

This optimization eliminates event overhead for single-stream workloads,
matching PyTorch's behavior where events are only created for cross-stream
synchronization.
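
A minimal sketch of such a non-blocking retirement check, assuming an invented helper isRetired (the real code paths differ):

    // Hypothetical retirement check. On a single-stream submit no event was
    // recorded, so completion is polled with cuStreamQuery, which never blocks.
    #include <cuda.h>

    bool isRetired(CUstream stream, CUevent crossStreamEvent /* null if single-stream */)
    {
        if (crossStreamEvent)
            return cuEventQuery(crossStreamEvent) == CUDA_SUCCESS; // multi-stream path

        // Single-stream path: CUDA_SUCCESS means all work submitted to the
        // stream so far has completed; CUDA_ERROR_NOT_READY means it has not.
        return cuStreamQuery(stream) == CUDA_SUCCESS;
    }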

* Fix lazy events: remove signalFenceCount check

The internal event is for command buffer retirement, not for user fences.
User fences use setCurrentValue() which is a separate mechanism.
Single-stream retirement works with cuStreamQuery() regardless of user fences.

* Add test-caching-allocator.cpp to CMakeLists.txt

* Wire up multi-stream page tracking

- Set current stream when creating command encoder (not just at submit)
- Add Page::notifyUse() virtual hook called on every allocation
- Implement PageImpl::notifyUse() to call recordStreamUse() for cross-stream usage
- This enables proper PyTorch-style multi-stream memory synchronization:
  wh... (continued)
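
Continuing the hypothetical Page sketch from above, the cross-stream hook might look roughly like this; recordStreamUse and notifyUse are named in the commit, but these bodies and onFree are assumptions:

    // Hypothetical: remembering cross-stream use is cheap; events are only
    // created later, when the page is freed back to the cache.
    void recordStreamUse(Page* page, CUstream currentStream)
    {
        if (currentStream != page->m_stream)
            page->m_usedStreams.insert(currentStream); // same-stream fast path records nothing
    }

    // On free, record one event per using stream; the page becomes reusable
    // only after every event completes (see PageCache::tryReuse above).
    void onFree(Page* page)
    {
        for (CUstream s : page->m_usedStreams)
        {
            CUevent e;
            cuEventCreate(&e, CU_EVENT_DISABLE_TIMING); // lazy: created only here
            cuEventRecord(e, s);                        // marks the cross-stream use
            page->m_pendingEvents.push_back(e);
        }
        page->m_usedStreams.clear();
    }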

10853 of 18688 branches covered (58.07%)

Branch coverage included in aggregate %.

201 of 302 new or added lines in 6 files covered (66.56%).

16 existing lines in 2 files now uncovered.

33052 of 44775 relevant lines covered (73.82%)

228616.78 hits per line

New Missed Lines in Diff

Lines  Coverage  ∆        File
1      51.8%     -0.02%   src/cuda-driver-api.cpp
4      81.91%    +1.04%   src/cuda/cuda-command.cpp
8      30.3%     -1.52%   src/heap.h
88     54.08%    -32.79%  src/cuda/cuda-heap.cpp

Uncovered Existing Lines

Lines  Coverage  ∆        File
5      81.91%    +1.04%   src/cuda/cuda-command.cpp
11     54.08%    -32.79%  src/cuda/cuda-heap.cpp
Jobs

ID  Job ID                          Ran                      Files  Coverage
1   macos-aarch64 - 21595320431.1   02 Feb 2026 03:14PM UTC  158    39.02%
2   windows-x86_64 - 21595320431.2  02 Feb 2026 03:11PM UTC  217    68.52%
3   linux-x86_64 - 21595320431.3    02 Feb 2026 03:08PM UTC  167    57.98%
Source Files on build 21595320431: 240 total, 34 changed (10 source changed, 34 coverage changed)