
JetBrains-Research / opacus / build 19234922926
Build coverage: 78% (default branch jbr-main: 81%)

LAST BUILD BRANCH: david-stan/fix-grad-scaling
DEFAULT BRANCH: jbr-main
Ran 10 Nov 2025 02:30PM UTC · Jobs: 3 · Files: 135 · Run time: 39min

27 Oct 2025 03:40PM UTC · coverage: 78.17% · First build
Build 19234922926 · push · github · meta-codesync[bot]

Evgri243/multi device models (#796)

Summary:
## Types of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue

This PR adds support for multi-device training scenarios where model parameters are distributed across multiple GPU devices (e.g., when assigning different layers directly with `module.to(device[i])` or using `device_map="auto"` with Accelerate).

**Problem solved:**
When training large models that don't fit on a single GPU, parameters and gradients can be spread across multiple devices. The existing Opacus optimizers and gradient clipping modules assumed all tensors were on the same device, causing runtime errors during norm computation and gradient clipping operations.
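The layer-by-layer placement described above can be sketched as follows. This is an illustrative setup only, not code from this PR; the device list is a stand-in for whatever layout `device_map="auto"` or manual placement would produce:

```python
import torch
import torch.nn as nn

# Illustrative only: a tiny model whose layers are placed on different
# devices, the layout this PR targets. On a CPU-only machine both halves
# land on "cpu"; on a multi-GPU node these would be "cuda:0", "cuda:1".
devices = ["cuda:0", "cuda:1"] if torch.cuda.device_count() >= 2 else ["cpu", "cpu"]

model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
for layer, dev in zip(model, devices):
    layer.to(dev)

# Each parameter (and later its gradient) now lives on the device its
# layer was assigned to, which is what breaks single-device assumptions.
param_devices = {name: p.device.type for name, p in model.named_parameters()}
```

Any cross-parameter operation (e.g., computing a global gradient norm) then sees tensors on mixed devices, which is where the runtime errors arise.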

**Changes:**
1. **Sequential multi-device execution support (https://github.com/meta-pytorch/opacus/issues/9)**: Modified `DPOptimizer` and `AdaClipDPOptimizer` to move tensors to appropriate devices before operations like `torch.stack()` and `torch.einsum()`, preventing device mismatch errors during gradient clipping and accumulation.

2. **Multi-device support in GradSampleModuleFastGradientClipping (https://github.com/meta-pytorch/opacus/issues/10)**: Extended multi-device handling to `GradSampleModuleFastGradientClipping`, `DPPerLayerOptimizer`, and additional edge cases in optimizers that were previously uncovered.
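The core of the fix in change (1) is moving per-parameter tensors onto a common device before cross-parameter reductions such as `torch.stack()`. A minimal sketch of that pattern (the helper name `stack_per_param_norms` is hypothetical, not Opacus API):

```python
import torch

def stack_per_param_norms(per_param_norms, target_device="cpu"):
    # torch.stack requires all inputs on one device; when parameters are
    # spread across GPUs, their per-parameter norms inherit those devices.
    # Moving each norm to a common target device first avoids the
    # "expected all tensors to be on the same device" runtime error.
    return torch.stack([n.to(target_device) for n in per_param_norms])

# Hypothetical per-parameter gradient norms; on a multi-GPU node these
# would live on cuda:0, cuda:1, ... — here .to("cpu") is a no-op.
norms = [torch.tensor(2.0), torch.tensor(1.5), torch.tensor(0.5)]
total_norm = stack_per_param_norms(norms).norm(2)  # overall clipping norm
```

On a single-device machine every `.to()` call is a no-op, so the pattern adds negligible overhead while making the multi-device case correct.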

## How Has This Been Tested

- The code was used to train a 7B Zetta model with LoRA on an 8xH200 GPU node.
- Added test suite in `multidevice_optimizer_test.py` covering:
  - `DPOptimizer`, `AdaClipDPOptimizer`, and `DPPerLayerOptimizer` with multi-device models
  - Both `clip_and_accumulate()` and full `step()` operations
  - Helper function `_clip_and_accumulate_parameter()` with multi-device paramete... (continued)

40 of 252 new or added lines in 5 files covered. (15.87%)

5672 of 7256 relevant lines covered (78.17%)

1.74 hits per line

New Missed Lines in Diff

Lines  Coverage ∆  File
5      22.58       opacus/optimizers/adaclipoptimizer.py
56     80.07       opacus/tests/grad_sample_module_fast_gradient_clipping_test.py
151    14.69       opacus/tests/multidevice_optimizer_test.py
Jobs

ID  Job ID                  Ran                       Files  Coverage
1   run-3 - 19234922926.1   10 Nov 2025 02:30PM UTC   72     47.93
2   run-2 - 19234922926.2   10 Nov 2025 02:38PM UTC   134    77.92
3   run-1 - 19234922926.3   10 Nov 2025 02:39PM UTC   134    77.93
Commit 7dbbb400 on github

© 2026 Coveralls, Inc