pytorch / opacus / 17754228286 / 3
80% (main: 80%)

Build
DEFAULT BRANCH: main
Ran 16 Sep 2025 04:22AM UTC
Files: 70
Run time: 2s
16 Sep 2025 12:01AM UTC coverage: 47.815%. Remained the same.
Job 17754228286.3 · push · github · facebook-github-bot

Add support for passing args and kwargs to per-sample loss functions (#786)

Summary:
## Types of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue

It prevents the `TypeError: DPLossFastGradientAdaptiveClipping.__call__() got an unexpected keyword argument 'vocab_size'` error from being raised when assigning `DPLossFastGradientAdaptiveClipping` or `DPLossFastGradientClipping` to the `.loss_function` property of any `PreTrainedModel`.
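
As a toy illustration of the mechanism (the class below is a stand-in, not the Opacus wrapper): any loss callable whose `__call__` only lists `(logits, labels)` rejects the extra keyword arguments that `PreTrainedModel.loss_function` passes along.

```python
# Toy reproduction of the failure mode (not Opacus code).
class StrictLoss:
    def __call__(self, logits, labels):
        return 0.0

loss_fn = StrictLoss()
# Raises: TypeError: StrictLoss.__call__() got an unexpected keyword argument 'vocab_size'
loss_fn(None, None, vocab_size=50257)
```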

Every `PreTrainedModel.loss_function()` call expects `vocab_size` among its keyword arguments:
```python
# transformers.models.gpt2.modeling_gpt2.py:1099
# Flatten the tokens
loss = self.loss_function(
    logits,
    labels,
    vocab_size=self.config.vocab_size,
    **kwargs,
)
```
Meanwhile, `DPLossFastGradientAdaptiveClipping.__call__` and `DPLossFastGradientClipping.__call__` don't have the `vocab_size` keyword argument in their signatures, and `vocab_size` is later needed for tensor flattening:
```python
def ForCausalLMLoss(
    logits,
    labels,
    vocab_size: int,
    num_items_in_batch: Optional[torch.Tensor] = None,
    ignore_index: int = -100,
    shift_labels: Optional[torch.Tensor] = None,
    **kwargs,
) -> torch.Tensor:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()

    if shift_labels is None:
        # Shift so that tokens < n predict n
        labels = nn.functional.pad(labels, (0, 1), value=ignore_index)
        shift_labels = labels[..., 1:].contiguous()

    # Flatten the tokens
    logits = logits.view(-1, vocab_size)  # <-- vocab_size used here
```
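
A minimal sketch of the shape of the change (illustrative, not the actual Opacus implementation): the per-sample loss wrapper's `__call__` accepts and forwards arbitrary positional and keyword arguments to the wrapped criterion, which is assumed to accept them (as transformers' `ForCausalLMLoss` does via `**kwargs`). The class and function names below are hypothetical.

```python
import torch
import torch.nn as nn


class PerSampleLossWrapper:
    """Illustrative wrapper: forwards *args/**kwargs to the underlying criterion."""

    def __init__(self, criterion):
        self.criterion = criterion

    def __call__(self, logits, labels, *args, **kwargs):
        # Extra arguments such as vocab_size=... are no longer rejected;
        # they are passed straight through to the criterion.
        return self.criterion(logits, labels, *args, **kwargs)


def toy_criterion(logits, labels, vocab_size=None, **kwargs):
    # Flatten using vocab_size, mirroring ForCausalLMLoss above.
    return nn.functional.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))


wrapper = PerSampleLossWrapper(toy_criterion)
loss = wrapper(torch.randn(2, 3, 8), torch.randint(0, 8, (2, 3)), vocab_size=8)
```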

## How Has This Been Tested (if it applies)
Tested and trained on transformers' `GPT2LMHeadModel` with LoRA and 4B parameter... (continued)

1466 of 3066 relevant lines covered (47.81%)

0.48 hits per line
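
For reference, the coverage percentage follows directly from the line counts above:

```python
covered, relevant = 1466, 3066
print(f"{covered / relevant:.2%}")  # -> 47.81%
```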

Source Files on job run-3 - 17754228286.3
Files: 70 · Changed: 1 · Source Changed: 1 · Coverage Changed: 0
c9032e95 on github