pytorch / opacus / 17754228286 / 3
80% (main: 80%)

Build
DEFAULT BRANCH: main
Ran 16 Sep 2025 04:22AM UTC
Files: 70
Run time: 2s
16 Sep 2025 12:01AM UTC coverage: 47.815%. Remained the same.
Job 17754228286.3 · push · github · facebook-github-bot

Add support for passing args and kwargs to per-sample loss functions (#786)

Summary:
## Types of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue

It prevents the `TypeError: DPLossFastGradientAdaptiveClipping.__call__() got an unexpected keyword argument 'vocab_size'` error from being raised when assigning `DPLossFastGradientAdaptiveClipping` or `DPLossFastGradientClipping` to the `.loss_function` property of any `PreTrainedModel`.
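
As a toy illustration of the mechanism (the class below is a stand-in, not the Opacus wrapper): any loss callable whose `__call__` only lists `(logits, labels)` rejects the extra keyword arguments that `PreTrainedModel.loss_function` passes along.

```python
# Toy reproduction of the failure mode (not Opacus code).
class StrictLoss:
    def __call__(self, logits, labels):
        return 0.0

loss_fn = StrictLoss()
# Raises: TypeError: StrictLoss.__call__() got an unexpected keyword argument 'vocab_size'
loss_fn(None, None, vocab_size=50257)
```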

Every `PreTrainedModel.loss_function()` call expects `vocab_size` among its keyword arguments:
```python
# transformers.models.gpt2.modeling_gpt2.py:1099
# Flatten the tokens
loss = self.loss_function(
    logits,
    labels,
    vocab_size=self.config.vocab_size,
    **kwargs,
)
```
Meanwhile, `DPLossFastGradientAdaptiveClipping.__call__` and `DPLossFastGradientClipping.__call__` don't have the `vocab_size` keyword argument in their signatures, and `vocab_size` is later needed for tensor flattening:
```python
def ForCausalLMLoss(
    logits,
    labels,
    vocab_size: int,
    num_items_in_batch: Optional[torch.Tensor] = None,
    ignore_index: int = -100,
    shift_labels: Optional[torch.Tensor] = None,
    **kwargs,
) -> torch.Tensor:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()

    if shift_labels is None:
        # Shift so that tokens < n predict n
        labels = nn.functional.pad(labels, (0, 1), value=ignore_index)
        shift_labels = labels[..., 1:].contiguous()

    # Flatten the tokens
    logits = logits.view(-1, vocab_size)  # <-- vocab_size used here
```
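
A minimal sketch of the shape of the change (illustrative, not the actual Opacus implementation): the per-sample loss wrapper's `__call__` accepts and forwards arbitrary positional and keyword arguments to the wrapped criterion, which is assumed to accept them (as transformers' `ForCausalLMLoss` does via `**kwargs`). The class and function names below are hypothetical.

```python
import torch
import torch.nn as nn


class PerSampleLossWrapper:
    """Illustrative wrapper: forwards *args/**kwargs to the underlying criterion."""

    def __init__(self, criterion):
        self.criterion = criterion

    def __call__(self, logits, labels, *args, **kwargs):
        # Extra arguments such as vocab_size=... are no longer rejected;
        # they are passed straight through to the criterion.
        return self.criterion(logits, labels, *args, **kwargs)


def toy_criterion(logits, labels, vocab_size=None, **kwargs):
    # Flatten using vocab_size, mirroring ForCausalLMLoss above.
    return nn.functional.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))


wrapper = PerSampleLossWrapper(toy_criterion)
loss = wrapper(torch.randn(2, 3, 8), torch.randint(0, 8, (2, 3)), vocab_size=8)
```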

## How Has This Been Tested (if it applies)
Tested and trained on transformers' `GPT2LMHeadModel` with LoRA and 4B parameter... (continued)

1466 of 3066 relevant lines covered (47.81%)

0.48 hits per line
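
For reference, the coverage percentage follows directly from the line counts above:

```python
covered, relevant = 1466, 3066
print(f"{covered / relevant:.2%}")  # -> 47.81%
```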

Source Files on job run-3 - 17754228286.3
Files: 70 · Changed: 1 · Source Changed: 1 · Coverage Changed: 0
c9032e95 on github