kubeflow / trainer / 23960692171
58%
master: 58%

Build:
LAST BUILD BRANCH: megatron
DEFAULT BRANCH: master
Ran 03 Apr 2026 08:19PM UTC
Jobs 1
Files 40
Run time 1min
03 Apr 2026 08:07PM UTC coverage: 57.89% (-0.2%) from 58.057%
Pull #3408

krishdef7
feat(operator): support multi-slice TPU training via trainer replicas

For multi-slice TPU, JobSet models each TPU slice as a ReplicatedJob
replica, with parallelism = hosts per slice and replicas = slice count.
The operator previously blocked this with two hard constraints:

1. builder.go unconditionally set trainer Replicas = 1, overwriting any
   value from the runtime template.
2. trainingruntime_webhook.go rejected replicas != 1 for all ancestors
   including trainer.

Changes:
- builder.go: add a nil guard for trainer Replicas so the value from the
  runtime template is preserved rather than unconditionally overwritten.
- jobset.go: in Build(), compute perSlice = numNodes / replicas for the
  trainer ancestor so each slice runs the correct number of hosts.
- trainingruntime_webhook.go: allow trainer ancestor replicas > 1 so that
  multi-slice configurations pass admission.
- trainingruntime_webhook_test.go: update invalid_replicas test to
  reflect that trainer replicas > 1 is now valid.
- trainingruntime_test.go: add test case for 4-slice x 8 hosts
  (NumNodes=32), verifying Parallelism=8 per slice and MinMember=34.
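The nil guard described for builder.go can be sketched as follows. This is a minimal illustration, not the operator's actual code: the `ReplicatedJob` struct and the `defaultTrainerReplicas` helper are hypothetical stand-ins, and the sketch assumes Replicas is held as a pointer so that "unset" can be told apart from an explicit value.

```go
package main

import "fmt"

// ReplicatedJob is a simplified, hypothetical stand-in for the type the
// builder mutates. Replicas is a pointer so nil can mean "not set".
type ReplicatedJob struct {
	Name     string
	Replicas *int32
}

// defaultTrainerReplicas (hypothetical helper) sets Replicas to 1 only when
// the runtime template left it unset, instead of always overwriting it.
func defaultTrainerReplicas(rj *ReplicatedJob) {
	if rj.Replicas == nil {
		one := int32(1)
		rj.Replicas = &one
	}
}

func main() {
	four := int32(4)
	multiSlice := ReplicatedJob{Name: "node", Replicas: &four} // template asks for 4 slices
	unset := ReplicatedJob{Name: "node"}                       // template sets nothing

	defaultTrainerReplicas(&multiSlice)
	defaultTrainerReplicas(&unset)

	fmt.Println(*multiSlice.Replicas, *unset.Replicas) // prints "4 1"
}
```

With an unconditional assignment, the first job would also end up with 1 replica, which is the behavior the commit removes.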

Semantics: numNodes = total hosts across all slices.
Per-slice hosts = numNodes / replicas.
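As a worked example of these semantics, the sketch below computes the per-slice host count for the 4-slice x 8-host case from the test above. The function name `hostsPerSlice` is hypothetical, and the error on a non-divisible node count is an assumption of this sketch; the commit message does not say how the operator handles that case.

```go
package main

import (
	"errors"
	"fmt"
)

// hostsPerSlice (hypothetical helper) returns numNodes / replicas, where
// numNodes is the total host count across all slices and replicas is the
// slice count. Rejecting a remainder is an assumption of this sketch.
func hostsPerSlice(numNodes, replicas int32) (int32, error) {
	if replicas < 1 {
		return 0, errors.New("replicas must be at least 1")
	}
	if numNodes%replicas != 0 {
		return 0, fmt.Errorf("numNodes %d is not divisible by replicas %d", numNodes, replicas)
	}
	return numNodes / replicas, nil
}

func main() {
	perSlice, err := hostsPerSlice(32, 4) // 4 slices, 32 hosts in total
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(perSlice) // prints 8, the per-slice Parallelism
}
```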

REF: https://github.com/kubeflow/trainer/issues/3407
Signed-off-by: krishdef7 <gargkrish06@gmail.com>
Pull Request #3408: feat(operator): support multi-slice TPU by enabling trainer replicas > 1

6 of 31 new or added lines in 4 files covered. (19.35%)

8 existing lines in 1 file now uncovered.

2036 of 3517 relevant lines covered (57.89%)

0.67 hits per line

New Missed Lines in Diff

Lines  Coverage  ∆       File
5      0.0       0.0%    pkg/runtime/framework/plugins/jobset/builder.go
20     45.02     -2.26%  pkg/runtime/framework/plugins/jobset/jobset.go

Uncovered Existing Lines

Lines  Coverage  ∆       File
8      45.02     -2.26%  pkg/runtime/framework/plugins/jobset/jobset.go
Jobs
ID  Job ID         Ran                      Files  Coverage
1   23960692171.1  03 Apr 2026 08:19PM UTC  40     57.89
GitHub Action Run
Source Files on build 23960692171
  • Tree
  • List 40
  • Changed 3
  • Source Changed 0
  • Coverage Changed 3
  • Pull Request #3408
  • PR Base - master (#23956910435)