kubeflow / trainer
62%

DEFAULT BRANCH: master
Repo Added 20 Mar 2025 01:49PM UTC
Build 3574 Last
Files 40
LAST BUILD ON BRANCH master

20 May 2026 02:15PM UTC coverage: 62.429% (+0.2%) from 62.219%
Build 26168404895 (type: push, via: github, committer: web-flow)

fix(runtimes): add validation for LoRA multi-node and immutable trainer args (#3302)

* fix(torchtune): add validation for LoRA multi-node and immutable args

Add two missing validations to validateTorchTune:

1. Reject LoRA fine-tuning when numNodes > 1. getRecipeAndConfig
   silently falls through to full_finetune_distributed in this case,
   discarding LoRA args without any user-visible error.

2. Reject immutable runtime configs (output_dir, tokenizer.path,
   checkpointer.checkpoint_dir, tokenizer.merges_file) in
   spec.trainer.args. These are injected by the runtime via
   extractOverridesFromRuntime and must not be overridden by the user.

Resolves the TODO added by @Electronic-Waste in torchtune.go.

Signed-off-by: Krish Garg <gargkrish06@gmail.com>
Signed-off-by: krishdef7 <gargkrish06@gmail.com>

* fix(runtimes): add validation for LoRA/QLoRA multi-node and immutable trainer args

Signed-off-by: krishdef7 <gargkrish06@gmail.com>

---------

Signed-off-by: Krish Garg <gargkrish06@gmail.com>
Signed-off-by: krishdef7 <gargkrish06@gmail.com>
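
The two validations described in the commit message above can be sketched roughly as follows. This is an illustration only, not the project's code: the actual check is in Go (validateTorchTune in torchtune.go), while the function name `validate_torchtune`, the `uses_lora` flag, and the assumption that trainer args arrive as `key=value` strings are all hypothetical. Only the four immutable config keys are taken from the commit message.

```python
# Illustrative sketch only. The real validation is Go code in torchtune.go;
# every name here except the four config keys is hypothetical.
IMMUTABLE_KEYS = (
    "output_dir",
    "tokenizer.path",
    "checkpointer.checkpoint_dir",
    "tokenizer.merges_file",
)


def validate_torchtune(num_nodes, uses_lora, trainer_args):
    """Return a list of validation error strings (empty when the spec is valid)."""
    errors = []

    # Rule 1: reject LoRA on multi-node jobs rather than silently falling
    # through to a full distributed fine-tune that discards the LoRA args.
    if uses_lora and num_nodes > 1:
        errors.append("LoRA fine-tuning is not supported when numNodes > 1")

    # Rule 2: configs injected by the runtime must not be overridden by the
    # user. Assumes each arg has the form "key=value".
    for arg in trainer_args:
        key = arg.split("=", 1)[0].strip()
        if key in IMMUTABLE_KEYS:
            errors.append(
                f"spec.trainer.args must not set immutable config {key!r}"
            )

    return errors
```

Returning a list instead of raising on the first problem lets a caller report every violation at once, which is how admission-style validation is commonly surfaced; whether the real implementation does the same is not stated in the commit message.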

17 of 17 new or added lines in 1 file covered. (100.0%)

3 existing lines in 1 file now uncovered.

2205 of 3532 relevant lines covered (62.43%)

0.72 hits per line
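
As a quick check of the coverage figures above, the percentage is simply covered lines divided by relevant lines. The helper name `coverage_percent` is hypothetical; the inputs (2205, 3532, and the previous value 62.219) are the numbers reported on this page.

```python
def coverage_percent(covered, relevant):
    """Line coverage as a percentage of relevant lines."""
    return 100 * covered / relevant


current = coverage_percent(2205, 3532)
previous = 62.219  # coverage of the preceding build, as reported above

print(f"{current:.3f}%")             # 62.429%
print(f"{current - previous:+.1f}%")  # +0.2%
```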

Source Files on master
  • Tree
  • List 40
  • Changed 2
  • Source Changed 0
  • Coverage Changed 2

Recent builds

| Build | Branch | Commit | Type | Ran | Committer | Via | Coverage (%) |
|---|---|---|---|---|---|---|---|
| 26168404895 | master | fix(runtimes): add validation for LoRA multi-node and immutable trainer args (#3302) * fix(torchtune): add validation for LoRA multi-node and immutable args Add two missing validations to validateTorchTune: 1. Reject LoRA fine-tuning when numNo... | push | 20 May 2026 02:19PM UTC | web-flow | github | 62.43 |
| 26166068392 | master | feat(ci): add Python dependency scanning to OSV-Scanner workflow (#3530) * feat(ci): add Python dependency scanning to OSV-Scanner workflow Add lockfiles and scanning for Python dependencies in the initializers and python_api. This extends the n... | push | 20 May 2026 01:39PM UTC | web-flow | github | 62.22 |
| 26100441005 | master | feat(ci): add nightly OSV-Scanner vulnerability scan workflow (#3518) * feat: add nightly OSV-Scanner vulnerability scan workflow Adds a nightly GitHub Actions workflow that scans go.mod for known vulnerabilities using OSV-Scanner, uploads SARIF... | push | 19 May 2026 01:34PM UTC | web-flow | github | 62.13 |
| 26098981213 | master | feat(docs): KEP-2599: Decouple runtime lifecycle from TrainJobs to simplify updating runtimes (#3428) * feat(docs): add KEP-2599 for mutable runtimes Proposes allowing TrainingRuntimes and ClusterTrainingRuntimes to be mutable by introducing Tra... | push | 19 May 2026 01:07PM UTC | web-flow | github | 62.13 |
| 25925865057 | master | feat: KEP for inject PET envs into init-container (#3417) * docs(proposals): add draft KEP for inject PET envs into trainer init containers Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> * fix comments Signed-off-by: Peter Pan <Peter.Pan@dao... | push | 15 May 2026 03:25PM UTC | web-flow | github | 62.13 |
| 25800981853 | master | fix(cache): validate cache_index schema collisions in worker dat… (#3216) * fix(data-cache): validate cache_index schema collisions in worker datasource Signed-off-by: Hitanshi Goklani <hitanshigoklani33@gmail.com> * Update pkg/data_cache/src/w... | push | 13 May 2026 01:09PM UTC | web-flow | github | 62.13 |
| 25763132049 | master | feat: storageuri targetjob validation fix (#3510) * Applied the required changes Signed-off-by: Pulkit Agrawal <97938993+pulkit-999@users.noreply.github.com> * feat: add StorageUri and replicated job name validation - Add CEL-based validation ... | push | 12 May 2026 09:29PM UTC | web-flow | github | 62.13 |
| 25762422721 | master | chore(deps): bump tokio from 1.52.2 to 1.52.3 in /pkg/data_cache/test (#3497) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.52.2 to 1.52.3. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tok... | push | 12 May 2026 09:14PM UTC | web-flow | github | 62.13 |
| 25748562511 | master | fix: remove unnecessary setcap CAP_NET_BIND_SERVICE from MPI runtime docker file (#3286) sshd in the MPI runtime listens on port 2222 (non-privileged), so CAP_NET_BIND_SERVICE is not needed. Also removes libcap2-bin which was only installed to pr... | push | 12 May 2026 04:45PM UTC | web-flow | github | 62.13 |
| 25748325579 | master | feat(docs): add AI policy reference to contributing guide (#3493) Direct contributors to review the Kubeflow AI Policy before using AI assistant tooling for contributions. Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Co-aut... | push | 12 May 2026 04:41PM UTC | web-flow | github | 62.13 |
See All Builds (3179)


© 2026 Coveralls, Inc