Files: 135
Run time: 3s
push
github
Fix GitHub issue #792: Fast gradient clipping ignores ignore_index masking (#808)

Summary:
Pull Request resolved: https://github.com/meta-pytorch/opacus/pull/808

Context/Motivation:
Fixes https://github.com/meta-pytorch/opacus/issues/792

When using fast/ghost gradient clipping for NLP tasks, `DPLossFastGradientClipping` computes per-sample mean loss via `.mean(dim=1)`, which divides by the full sequence length. This ignores the `ignore_index` parameter from the criterion (e.g., `CrossEntropyLoss(ignore_index=-100)`), causing masked/padded positions to dilute the loss. For tasks like SQuAD where only a few tokens are real targets out of a long sequence, the loss becomes orders of magnitude too small, preventing training.

This diff:
- Modified `DPLossFastGradientClipping.__call__()` to check for `ignore_index` on the criterion and compute mean only over non-ignored positions when present
- Added regression test `github_issue_test.py` verifying ignore_index is respected for both mean and sum reductions, plus a backwards-compatibility test for the no-masking case

Reviewed By: aparna-aketi

Differential Revision: D95489302

fbshipit-source-id: d02146a71
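The dilution described in the commit message can be illustrated with a small, framework-free sketch. This is not the Opacus implementation: the helper names `naive_per_sample_mean` and `masked_per_sample_mean` are hypothetical, and plain Python lists stand in for the per-token loss tensor. It assumes, as with `CrossEntropyLoss(ignore_index=-100)` under `reduction="none"`, that ignored positions contribute a loss of zero.

```python
IGNORE_INDEX = -100  # sentinel from the commit message's CrossEntropyLoss example


def naive_per_sample_mean(token_losses):
    """Average over the full sequence length, analogous to `.mean(dim=1)`."""
    return [sum(row) / len(row) for row in token_losses]


def masked_per_sample_mean(token_losses, targets, ignore_index=IGNORE_INDEX):
    """Average only over positions whose target is not `ignore_index`."""
    means = []
    for loss_row, target_row in zip(token_losses, targets):
        kept = [l for l, t in zip(loss_row, target_row) if t != ignore_index]
        # A fully masked sample has no real targets; report 0.0 for it
        # (an assumption of this sketch, not taken from the fix itself).
        means.append(sum(kept) / len(kept) if kept else 0.0)
    return means


# One sample, sequence length 8, only two real target tokens.
targets = [[-100, -100, 5, 9, -100, -100, -100, -100]]
token_losses = [[0.0, 0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0]]

naive = naive_per_sample_mean(token_losses)             # divides 6.0 by 8
masked = masked_per_sample_mean(token_losses, targets)  # divides 6.0 by 2
print(naive, masked)
```

With two real targets out of eight positions, the full-length mean is four times smaller than the masked mean; the gap grows with the fraction of padded or ignored tokens, which matches the SQuAD scenario described above.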
5798 of 7344 relevant lines covered (78.95%)
0.79 hits per line
| Coverage | ∆ | File | Lines | Relevant | Covered | Missed | Hits/Line |
|---|---|---|---|---|---|---|---|