
pytorch / opacus / 17750090532
Coverage: 80%

Build:
DEFAULT BRANCH: main
Ran: 16 Sep 2025 12:09AM UTC
Jobs: 3
Files: 133
Run time: 1min

16 Sep 2025 12:01AM UTC coverage: 80.307%. Remained the same
Build 17750090532 · push · github · committed by facebook-github-bot:
Add support for passing args and kwargs to per-sample loss functions (#786)

Summary:
## Types of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue

This change prevents the `TypeError: DPLossFastGradientAdaptiveClipping.__call__() got an unexpected keyword argument 'vocab_size'` error from being raised when `DPLossFastGradientAdaptiveClipping` or `DPLossFastGradientClipping` is assigned to the `.loss_function` property of any `PreTrainedModel`.
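
As an illustration, the failure reduces to a callable whose `__call__` does not accept extra keyword arguments; `StrictLoss` below is a hypothetical stand-in, not an Opacus class:
```
# Minimal sketch of the failure mode (hypothetical class, not Opacus code).
# A __call__ signature without **kwargs rejects the vocab_size keyword that
# PreTrainedModel.loss_function callers pass along.
class StrictLoss:
    def __call__(self, logits, labels):
        return 0.0


loss_fn = StrictLoss()
loss_fn(None, None, vocab_size=50257)
# TypeError: StrictLoss.__call__() got an unexpected keyword argument 'vocab_size'
```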

Every `PreTrainedModel.loss_function()` call expects `vocab_size` among its keyword arguments:
```
# transformers.models.gpt2.modeling_gpt2.py:1099
# Flatten the tokens
loss = self.loss_function(
    logits,
    labels,
    vocab_size=self.config.vocab_size,
    **kwargs,
)
```
Meanwhile, `DPLossFastGradientAdaptiveClipping.__call__` and `DPLossFastGradientClipping.__call__` do not accept a `vocab_size` keyword argument in their signatures, even though `vocab_size` is needed later for tensor flattening (see the sketch after the snippet below):
```
def ForCausalLMLoss(
    logits,
    labels,
    vocab_size: int,
    num_items_in_batch: Optional[torch.Tensor] = None,
    ignore_index: int = -100,
    shift_labels: Optional[torch.Tensor] = None,
    **kwargs,
) -> torch.Tensor:
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()

    if shift_labels is None:
        # Shift so that tokens < n predict n
        labels = nn.functional.pad(labels, (0, 1), value=ignore_index)
        shift_labels = labels[..., 1:].contiguous()

    # Flatten the tokens
    logits = logits.view(-1, vocab_size)  # <-- vocab_size is used here
```
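
As referenced above, here is a minimal sketch of the kwargs-forwarding pattern this change enables. The names `KwargsForwardingLoss` and `per_sample_cross_entropy` are hypothetical illustrations, not the actual Opacus implementation:
```
import torch
import torch.nn.functional as F


def per_sample_cross_entropy(logits, labels, vocab_size, ignore_index=-100, **kwargs):
    # Hypothetical per-sample criterion; flattens the tokens the same way
    # ForCausalLMLoss does and keeps a per-token loss (reduction="none").
    return F.cross_entropy(
        logits.view(-1, vocab_size),
        labels.view(-1),
        ignore_index=ignore_index,
        reduction="none",
    )


class KwargsForwardingLoss:
    # Hypothetical wrapper illustrating the pattern: accept and forward
    # *args/**kwargs so callers such as PreTrainedModel.loss_function can
    # pass extras like vocab_size without triggering a TypeError.
    def __init__(self, base_criterion):
        self.base_criterion = base_criterion

    def __call__(self, logits, labels, *args, **kwargs):
        return self.base_criterion(logits, labels, *args, **kwargs)


# Extra keyword arguments are now forwarded instead of rejected.
loss_fn = KwargsForwardingLoss(per_sample_cross_entropy)
logits = torch.randn(2, 5, 10)           # (batch, seq_len, vocab_size)
labels = torch.randint(0, 10, (2, 5))
per_token_loss = loss_fn(logits, labels, vocab_size=10, num_items_in_batch=None)
```
The point is only the `*args, **kwargs` pass-through in `__call__`; the underlying per-sample criterion decides which keyword arguments it consumes.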

## How Has This Been Tested (if it applies)
Tested and trained on transformers' `GPT2LMHeadModel` with LoRA and 4B parameter... (continued)
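
For context, a hedged sketch of how such a wrapper could be attached to a small randomly initialized GPT-2, building on the hypothetical helpers above; this is not the PR's actual test code, and it assumes `loss_function` is settable as described in the motivation:
```
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random GPT-2 so the example runs without downloading weights.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64)
model = GPT2LMHeadModel(config)

# Attach the kwargs-tolerant wrapper from the previous sketch.
model.loss_function = KwargsForwardingLoss(per_sample_cross_entropy)

input_ids = torch.randint(0, config.vocab_size, (2, 8))
# The model forwards vocab_size to loss_function without raising a TypeError.
out = model(input_ids, labels=input_ids)
```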

3 of 3 new or added lines in 2 files covered. (100.0%)

5591 of 6962 relevant lines covered (80.31%)

1.79 hits per line

Jobs

| ID | Job ID | Ran | Files | Coverage |
|----|--------|-----|-------|----------|
| 1 | run-2 - 17750090532.1 | 16 Sep 2025 12:16AM UTC | 132 | 80.08% |
| 2 | run-1 - 17750090532.2 | 16 Sep 2025 12:17AM UTC | 132 | 80.08% |
| 3 | run-3 - 17750090532.3 | 16 Sep 2025 12:09AM UTC | 70 | 47.81% |
Source Files on build 17750090532: 133 files total, 2 changed (2 source changed, 0 coverage changed).
Commit c9032e95 on github