
kubeflow / trainer / 26143795682
Coverage: 62% (master: 65%)

LAST BUILD BRANCH: test/flux-integration-e2e
DEFAULT BRANCH: master
Ran 20 May 2026 05:45AM UTC
Jobs 1
Files 40
Run time 1min

20 May 2026 05:41AM UTC coverage: 61.804% (-0.3%) from 62.134%
26143795682

Pull #3408

github

krishdef7
feat(operator): support multi-slice TPU training via trainer replicas

For multi-slice TPU, JobSet models each TPU slice as a ReplicatedJob
replica, with parallelism = hosts per slice and replicas = slice count.
The operator previously blocked this with two hard constraints:

1. builder.go unconditionally set trainer Replicas = 1, overwriting any
   value from the runtime template.
2. trainingruntime_webhook.go rejected replicas != 1 for every ancestor,
   including trainer.

Changes:
- builder.go: add a nil guard for trainer Replicas so the value from the
  runtime template is preserved instead of being overwritten.
- jobset.go: in Build(), compute perSlice = numNodes / replicas for the
  trainer ancestor so each slice runs the correct number of hosts.
- trainingruntime_webhook.go: allow replicas > 1 for the trainer ancestor
  so that multi-slice configurations pass admission.
- trainingruntime_webhook_test.go: update invalid_replicas test to
  reflect that trainer replicas > 1 is now valid.
- trainingruntime_test.go: add test case for 4-slice x 8 hosts
  (NumNodes=32), verifying Parallelism=8 per slice and MinMember=34.
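
The builder and webhook changes listed above can be pictured with a minimal Go sketch. It is an illustration only, not the code in this PR: the `replicatedJob` type, the `defaultReplicas` and `validateReplicas` helpers, and the ancestor string "trainer" are assumptions made for the example.

```go
package main

import "fmt"

// replicatedJob is a hypothetical stand-in for one ReplicatedJob entry in a
// runtime template. The operator's real types differ.
type replicatedJob struct {
	Ancestor string // hypothetical: ancestor the job belongs to
	Replicas *int32 // nil means the template did not set a value
}

// defaultReplicas applies the nil guard: it falls back to 1 only when the
// template left Replicas unset, and otherwise keeps the template's value.
func defaultReplicas(rj *replicatedJob) {
	if rj.Replicas == nil {
		one := int32(1)
		rj.Replicas = &one
	}
}

// validateReplicas sketches the relaxed admission rule: every ancestor may
// use exactly 1 replica, and only the trainer ancestor may use more than 1.
func validateReplicas(ancestor string, replicas int32) error {
	if replicas == 1 {
		return nil
	}
	if ancestor == "trainer" && replicas > 1 {
		return nil
	}
	return fmt.Errorf("ancestor %q: replicas %d is not allowed", ancestor, replicas)
}

func main() {
	four := int32(4)
	trainer := replicatedJob{Ancestor: "trainer", Replicas: &four}
	defaultReplicas(&trainer) // keeps 4, because the value is already set
	fmt.Println(*trainer.Replicas, validateReplicas(trainer.Ancestor, *trainer.Replicas))
}
```

Using a pointer for Replicas is what makes the guard possible in this sketch: an unset value (nil) can be told apart from an explicit one.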

Semantics: numNodes = total hosts across all slices.
Per-slice hosts = numNodes / replicas.
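
As a worked example of these semantics, the sketch below computes the per-slice host count for the 4-slice x 8-host case (NumNodes=32). The helper name `perSliceHosts` is hypothetical, and the error on a non-divisible node count is an assumption of this example; the commit message does not say how that case is handled.

```go
package main

import (
	"errors"
	"fmt"
)

// perSliceHosts is a hypothetical helper, not the operator's code.
// numNodes is the total host count across all slices; replicas is the
// slice count. The result is the hosts per slice (the Parallelism value).
func perSliceHosts(numNodes, replicas int32) (int32, error) {
	if replicas < 1 {
		return 0, errors.New("replicas must be >= 1")
	}
	// Assumption: reject totals that do not split evenly across slices.
	if numNodes%replicas != 0 {
		return 0, fmt.Errorf("numNodes %d is not divisible by replicas %d", numNodes, replicas)
	}
	return numNodes / replicas, nil
}

func main() {
	hosts, err := perSliceHosts(32, 4)
	if err != nil {
		panic(err)
	}
	fmt.Printf("parallelism per slice: %d\n", hosts) // prints "parallelism per slice: 8"
}
```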

REF: https://github.com/kubeflow/trainer/issues/3407
Signed-off-by: krishdef7 <gargkrish06@gmail.com>
Pull Request #3408: feat(operator): support multi-slice TPU by enabling trainer replicas > 1

8 of 31 new or added lines in 4 files covered. (25.81%)

8 existing lines in 1 file now uncovered.

2186 of 3537 relevant lines covered (61.8%)

0.72 hits per line

Uncovered Changes

Lines  Coverage  ∆       File
21     45.04%    -3.32%  pkg/runtime/framework/plugins/jobset/jobset.go
2      98.41%    -1.59%  pkg/runtime/framework/plugins/jobset/builder.go

Coverage Regressions

Lines  Coverage  ∆      File
8      62.79%    0.0%   pkg/webhooks/trainingruntime_webhook.go

Jobs

ID  Job ID         Ran                      Files  Coverage
1   26143795682.1  20 May 2026 05:45AM UTC  40     61.8%

Source Files on build 26143795682: 40 listed, 3 changed (source changed: 0, coverage changed: 3)
PR Base - master (#26100441005)