
kubeflow / trainer / build 24222426643
Coverage: 58%
Default branch: master
Ran: 10 Apr 2026 01:57AM UTC
Jobs: 1 · Files: 40 · Run time: 1min

10 Apr 2026 01:53AM UTC coverage: 58.057%. Remained the same
Build 24222426643 · push via github (web-flow)
feat: add Megatron-Core GPT Tensor Parallelism example notebook (#3201)

* feat: add Megatron-Core GPT Tensor Parallelism example notebook

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>

* added Megatron notebook to the e2e GPU test

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>

* feat: Parameterize Megatron-Core GPT notebook's tensor parallelism and GPU count, and update the e2e test workflow to pass these parameters.

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
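The parameterization this commit describes could look like a papermill-style parameters cell; the variable names below are illustrative sketches, not the notebook's actual identifiers. The e2e workflow would then override the defaults, e.g. `papermill -p tensor_parallel_size 2 -p num_gpus 2`.

```python
# Hypothetical parameters cell (papermill overrides these defaults from the
# e2e test workflow; names are illustrative).
tensor_parallel_size = 2  # Megatron tensor-model-parallel degree
num_gpus = 2              # total GPUs requested for the TrainJob

# Megatron requires the world size to be divisible by the TP degree,
# so fail fast on an inconsistent parameter combination.
assert num_gpus % tensor_parallel_size == 0, (
    f"num_gpus={num_gpus} must be divisible by "
    f"tensor_parallel_size={tensor_parallel_size}"
)
```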

* change the number of GPUs to 2

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>

* docs: update Megatron GPT tensor parallelism example notebook.

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>

* fix: correct minor typos and improve code readability in Megatron-Core GPT notebook

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>

* fix: update tensor model parallel size retrieval and improve code clarity in Megatron-Core GPT notebook

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>

* fix: correct wording in training function description for clarity in Megatron-Core GPT notebook

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>

* feat: add verification step for TrainJob completion in Megatron-Core GPT notebook

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
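A verification step like the one this commit adds could be sketched as a polling loop; `get_status` here is a hypothetical callable standing in for however the Kubeflow Trainer SDK reports TrainJob state, not an actual SDK API.

```python
import time


def wait_for_completion(get_status, timeout_s: float = 600, poll_s: float = 5) -> str:
    """Poll a status callable until the TrainJob reports a terminal state.

    `get_status` is a stand-in for an SDK call returning the job's condition
    (here assumed to be "Complete" or "Failed" when terminal).
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = get_status()
        if status in ("Complete", "Failed"):
            return status
        time.sleep(poll_s)
    raise TimeoutError("TrainJob did not reach a terminal state within timeout")
```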

* fix: use multi-node setup for Megatron TP to work with GPU time-slicing

GPU time-slicing (replicas=2) advertises 2 GPUs to Kubernetes but
exposes only 1 CUDA device inside the container. This caused torchrun
to launch 1 worker (WORLD_SIZE=1), failing Megatron's TP requirement
of world_size >= 2.

Switch from 1 node with 2 GPUs to 2 nodes with 1 GPU each. This
creates 2 pods, each getting 1 time-sliced GPU, giving WORLD_SIZE=2
for tensor parallelism across pods.

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
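The arithmetic behind this fix is simple: torchrun launches one worker per visible CUDA device on each node, so the world size is nodes × visible devices, not nodes × advertised GPUs. A minimal sketch (the function name is ours, not Megatron's):

```python
def expected_world_size(num_nodes: int, visible_gpus_per_node: int) -> int:
    """torchrun launches one worker per visible CUDA device on each node."""
    return num_nodes * visible_gpus_per_node


# Time-slicing advertises 2 GPUs to Kubernetes, but each container only
# sees 1 CUDA device, so the old 1-node layout produced WORLD_SIZE=1,
# failing Megatron's TP requirement of world_size >= 2.
assert expected_world_size(1, 1) == 1

# Two nodes (pods) with one time-sliced GPU each gives WORLD_SIZE=2,
# satisfying tensor parallelism across pods.
assert expected_world_size(2, 1) == 2
```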

* fix: use PyTorch devel image for Megatron compile_helpers() support

Megatron-Core's compile_helpers() requires `make` and `gcc` to build
C dataset helpers. The default runtime image d... (continued)
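The toolchain requirement can be made explicit with a preflight check before calling `compile_helpers()`; `can_compile_helpers` below is our own illustrative helper, not a Megatron-Core API.

```python
import shutil


def can_compile_helpers() -> bool:
    """Check for the build toolchain Megatron-Core's compile_helpers() needs.

    compile_helpers() shells out to `make`/`gcc` to build the C dataset
    helper extension; a slim runtime image without a toolchain fails here,
    which is why the commit switches to a PyTorch devel base image.
    """
    return shutil.which("make") is not None and shutil.which("gcc") is not None
```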

2032 of 3500 relevant lines covered (58.06%)

0.67 hits per line

Jobs
ID | Job ID        | Ran                     | Files | Coverage
1  | 24222426643.1 | 10 Apr 2026 01:57AM UTC | 40    | 58.06
GitHub Action Run
Source Files on build 24222426643
40 source files · 0 changed · coverage unchanged
  • 253aad27 on github
  • Prev Build on master (#24217645634)
  • Next Build on master (#24245349018)

© 2026 Coveralls, Inc