kubeflow / trainer
58%
master: 58%

LAST BUILD BRANCH: megatron
DEFAULT BRANCH: master
Repo Added 20 Mar 2025 01:49PM UTC
Build 3007 Last
Files 40

04 Apr 2026 05:15AM UTC coverage: 58.057% (+0.1%) from 57.939%
23972072192

Pull #3201

github

XploY04
debug: add NCCL_DEBUG=INFO to diagnose /dev/shm failure on node-1

Signed-off-by: XploY04 <2004agarwalyash@gmail.com>
Pull Request #3201: feat: add Megatron-Core GPT Tensor Parallelism example notebook

2032 of 3500 relevant lines covered (58.06%)

0.67 hits per line
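
The percentages above can be reproduced from the reported line counts. The sketch below is only an illustration of the arithmetic, assuming coverage is computed as covered lines divided by relevant lines; the function name `coverage_percent` is hypothetical and not part of any Coveralls API.

```python
def coverage_percent(covered: int, relevant: int) -> float:
    """Return line coverage as a percentage of relevant lines.

    Assumption: coverage = covered / relevant, as the figures on this
    page suggest. The function name is illustrative only.
    """
    if relevant <= 0:
        raise ValueError("relevant must be a positive line count")
    return 100.0 * covered / relevant


# Figures reported for this build.
current = coverage_percent(2032, 3500)

# Previous coverage as reported on the page (taken as given).
previous = 57.939
delta = current - previous

print(f"coverage: {current:.3f}%")  # coverage: 58.057%
print(f"change:   {delta:+.1f}%")   # change:   +0.1%
```

Rounded to two decimals, the same ratio gives the 58.06% shown in the summary line.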

Source Files on master
  • Tree
  • List 40
  • Changed 1
  • Source Changed 0
  • Coverage Changed 1
Coverage ∆ File Lines Relevant Covered Missed Hits/Line

Recent builds

Builds | Branch | Commit | Type | Ran | Committer | Via | Coverage (%)
23972072192 | megatron | debug: add NCCL_DEBUG=INFO to diagnose /dev/shm failure on node-1 Signed-off-by: XploY04 <2004agarwalyash@gmail.com> | Pull #3201 | 04 Apr 2026 05:19AM UTC | XploY04 | github | 58.06
23960834318 | feat/trainer-multi-slice-tpu | feat(operator): support multi-slice TPU training via trainer replicas For multi-slice TPU, JobSet models each TPU slice as a ReplicatedJob replica, with parallelism = hosts per slice and replicas = slice count. The operator previously blocked thi... | Pull #3408 | 03 Apr 2026 08:24PM UTC | krishdef7 | github | 57.79
23960692171 | feat/trainer-multi-slice-tpu | feat(operator): support multi-slice TPU training via trainer replicas For multi-slice TPU, JobSet models each TPU slice as a ReplicatedJob replica, with parallelism = hosts per slice and replicas = slice count. The operator previously blocked thi... | Pull #3408 | 03 Apr 2026 08:19PM UTC | krishdef7 | github | 57.89
23811025952 | test-statusserver-helpers | fix(statusserver): improve bearer token parsing and add helper tests Signed-off-by: Skolli <tanusuch@gmail.com> | Pull #3405 | 03 Apr 2026 06:24PM UTC | suchirkolli | github | 58.34
23957046905 | dependabot/docker/cmd/trainers/torchtune/pytorch/pytorch-2.11.0-cuda12.8-cudnn9-runtime | chore(deps): bump pytorch/pytorch in /cmd/trainers/torchtune Bumps pytorch/pytorch from 2.9.1-cuda12.8-cudnn9-runtime to 2.11.0-cuda12.8-cudnn9-runtime. --- updated-dependencies: - dependency-name: pytorch/pytorch dependency-version: 2.11.0-cu... | Pull #3381 | 03 Apr 2026 06:23PM UTC | web-flow | github | 58.06
23957029240 | dependabot/docker/cmd/runtimes/deepspeed/nvidia/cuda-13.2.0-devel-ubuntu22.04 | chore(deps): bump nvidia/cuda in /cmd/runtimes/deepspeed Bumps nvidia/cuda from 13.1.1-devel-ubuntu22.04 to 13.2.0-devel-ubuntu22.04. --- updated-dependencies: - dependency-name: nvidia/cuda dependency-version: 13.2.0-devel-ubuntu22.04 depen... | Pull #3380 | 03 Apr 2026 06:22PM UTC | web-flow | github | 58.14
23956912610 | dependabot/pip/cmd/runtimes/deepspeed/deepspeed-0.18.9 | chore(deps): bump deepspeed in /cmd/runtimes/deepspeed Bumps [deepspeed](https://github.com/deepspeedai/DeepSpeed) from 0.18.7 to 0.18.9. - [Release notes](https://github.com/deepspeedai/DeepSpeed/releases) - [Commits](https://github.com/deepspee... | Pull #3402 | 03 Apr 2026 06:19PM UTC | web-flow | github | 58.06
23956913849 | dependabot/pip/cmd/runtimes/deepspeed/deepspeed-0.18.9 | chore(deps): bump deepspeed in /cmd/runtimes/deepspeed Bumps [deepspeed](https://github.com/deepspeedai/DeepSpeed) from 0.18.7 to 0.18.9. - [Release notes](https://github.com/deepspeedai/DeepSpeed/releases) - [Commits](https://github.com/deepspee... | Pull #3402 | 03 Apr 2026 06:18PM UTC | web-flow | github | 58.06
23956914515 | dependabot/pip/cmd/runtimes/deepspeed/datasets-4.8.4 | chore(deps): bump datasets in /cmd/runtimes/deepspeed Bumps [datasets](https://github.com/huggingface/datasets) from 4.7.0 to 4.8.4. - [Release notes](https://github.com/huggingface/datasets/releases) - [Commits](https://github.com/huggingface/da... | Pull #3384 | 03 Apr 2026 06:18PM UTC | web-flow | github | 58.06
23956910435 | master | chore(deps): bump github.com/go-jose/go-jose/v4 from 4.1.3 to 4.1.4 (#3406) Bumps [github.com/go-jose/go-jose/v4](https://github.com/go-jose/go-jose) from 4.1.3 to 4.1.4. - [Release notes](https://github.com/go-jose/go-jose/releases) - [Commits](... | push | 03 Apr 2026 06:18PM UTC | web-flow | github | 58.06
See All Builds (2688)


© 2026 Coveralls, Inc