Ran: push (github) · Files: 66 · Run time: 2s
Fix Fast Gradient Clipping bias gradient calculation for three-dim data (#751)

Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/751

The bias grad calculation for three-dim data was incorrect. Let `G = g g^T`, where `g`, of dimensions `T x d`, is the per-sample activation gradient, `T` the number of tokens and `d` the dimension. The per-sample squared gradient norm with respect to the bias is `vec(G)^T vec(1)`, not the erroneous `vec(G)^T vec(G)` used before. This diff fixes it.

Reviewed By: aparna-aketi, HuanyuZhang
Differential Revision: D70823094
fbshipit-source-id: c1fe1dd7f
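The identity behind the fix can be checked numerically: with `G = g g^T` (shape `T x T`), `vec(G)^T vec(1)` is the sum of all entries of `G`, which equals the squared norm of the column-sum `1^T g` — the per-sample bias gradient. The sketch below is illustrative only (not Opacus code); the function names are made up for this check.

```python
# Illustrative check of the identity from the commit message (not Opacus code).
# g is the per-sample activation gradient, shape (T, d): T tokens, d dims.

def bias_grad_sq_norm_direct(g):
    """Squared norm of the bias gradient 1^T g (sum over tokens)."""
    T, d = len(g), len(g[0])
    col_sums = [sum(g[t][i] for t in range(T)) for i in range(d)]
    return sum(s * s for s in col_sums)

def bias_grad_sq_norm_via_G(g):
    """vec(G)^T vec(1) with G = g g^T, i.e. the sum of all entries of G."""
    T = len(g)
    total = 0.0
    for t in range(T):
        for s in range(T):
            # G[t][s] = <g_t, g_s>
            total += sum(a * b for a, b in zip(g[t], g[s]))
    return total

g = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.25]]
assert abs(bias_grad_sq_norm_direct(g) - bias_grad_sq_norm_via_G(g)) < 1e-9
```

Both routes give the same value because `sum_{t,s} <g_t, g_s> = <sum_t g_t, sum_s g_s>`, which is exactly the squared norm of the token-summed gradient.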
Coverage: 1399 of 2885 relevant lines covered (48.49%) · 0.48 hits per line
| Coverage | ∆ | File | Lines | Relevant | Covered | Missed | Hits/Line |
|---|---|---|---|---|---|---|---|