meta-pytorch / opacus / 19090998057 / 2

Build:
DEFAULT BRANCH: main
Ran 05 Nov 2025 04:24AM UTC
Files 134
Run time 4s
27 Oct 2025 03:40PM UTC coverage: 77.918% (-2.3%) from 80.18%
19090998057.2

push

github

meta-codesync[bot]
Evgri243/multi device models (#796)

Summary:
## Types of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue

This PR adds support for multi-device training scenarios where model parameters are distributed across multiple GPU devices (e.g., when assigning different layers directly with `module.to(device[I])` or using `device_map="auto"` with accelerate).

**Problem solved:**
When training large models that don't fit on a single GPU, parameters and gradients can be spread across multiple devices. The existing Opacus optimizers and gradient clipping modules assumed all tensors were on the same device, causing runtime errors during norm computation and gradient clipping operations.

**Changes:**
1. **Sequential multi-device execution support (https://github.com/meta-pytorch/opacus/issues/9)**: Modified `DPOptimizer` and `AdaClipDPOptimizer` to move tensors to appropriate devices before operations like `torch.stack()` and `torch.einsum()`, preventing device mismatch errors during gradient clipping and accumulation.

2. **Multi-device support in GradSampleModuleFastGradientClipping (https://github.com/meta-pytorch/opacus/issues/10)**: Extended multi-device handling to `GradSampleModuleFastGradientClipping`, `DPPerLayerOptimizer`, and additional edge cases in optimizers that were previously uncovered.
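
The device handling described in the two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `stack_on_common_device` and `clip_and_sum` are hypothetical helper names, and picking the first tensor's device as the common device is an assumption. The idea is that per-parameter norms are moved to one device before `torch.stack()`, and the resulting clip factors are moved back to each gradient's device before `torch.einsum()`.

```python
import torch


def stack_on_common_device(tensors, device=None):
    """Move 1-D per-parameter tensors to one device, then stack them column-wise."""
    if not tensors:
        raise ValueError("expected at least one tensor")
    # Assumption: default to the device of the first tensor.
    device = tensors[0].device if device is None else device
    return torch.stack([t.to(device) for t in tensors], dim=1)


def clip_and_sum(grad_samples, max_grad_norm):
    """Clip per-sample gradients by their global L2 norm and sum over the batch."""
    # One L2 norm per sample and per parameter, computed where each gradient lives.
    per_param_norms = [g.reshape(len(g), -1).norm(2, dim=-1) for g in grad_samples]
    # Stacking requires a single device, so the norms are gathered first.
    per_sample_norms = stack_on_common_device(per_param_norms).norm(2, dim=1)
    clip_factors = (max_grad_norm / (per_sample_norms + 1e-6)).clamp(max=1.0)
    # einsum also requires matching devices, so the factors follow each gradient.
    return [
        torch.einsum("i,i...->...", clip_factors.to(g.device), g)
        for g in grad_samples
    ]


if __name__ == "__main__":
    # Falls back to CPU when fewer than two GPUs are available.
    two_gpus = torch.cuda.device_count() >= 2
    dev_a, dev_b = ("cuda:0", "cuda:1") if two_gpus else ("cpu", "cpu")
    grads = [torch.ones(2, 3, device=dev_a), torch.ones(2, 4, device=dev_b)]
    clipped = clip_and_sum(grads, max_grad_norm=1.0)
    print([c.device for c in clipped])
```

Each summed gradient stays on the device of its parameter, so only the small per-sample norm and clip-factor vectors cross device boundaries.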

## How Has This Been Tested

- The code was used to train a 7B Zetta model with LoRA on an 8xH200 GPU node.
- Added test suite in `multidevice_optimizer_test.py` covering:
  - `DPOptimizer`, `AdaClipDPOptimizer`, and `DPPerLayerOptimizer` with multi-device models
  - Both `clip_and_accumulate()` and full `step()` operations
  - Helper function `_clip_and_accumulate_parameter()` with multi-device paramete... (continued)

5575 of 7155 relevant lines covered (77.92%)

0.78 hits per line

  • 7dbbb400 on github