• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

kubeflow / trainer
51%

Build:
DEFAULT BRANCH: master
Repo Added 20 Mar 2025 01:49PM UTC
Token 3qIdUH6ns6RNy0sBPPQ6ybJp7VqYkScU8 regen
Build 1931 Last
Files 30
Badge
Embed ▾
README BADGES
x

If you need to use a raster PNG badge, change the '.svg' to '.png' in the link

Markdown

Textile

RDoc

HTML

Rst

LAST BUILD ON BRANCH master
branch: SELECT
CHANGE BRANCH
x
Sync Branches
  • No branch selected
  • 2836-expose-builruntimeinfo
  • 2871-allow-podspecoverride-dupl-jobs
  • Bug
  • KEP-volcano-scheduler
  • add-audio-examples
  • add-config-api-tests-2885
  • add-dependabot
  • add-gitattr
  • add-license-scan-badge
  • add-local-example
  • add-local-trainer-client
  • add-local-trainer-example
  • add-manager-field-podtemplateoverride
  • add-ok-to-test
  • add-overlay-manifest-v2
  • add-patch-updates-k8s
  • add-pod-network-plugin-to-diagram
  • add-qwen3-1.7b
  • add-runtime-labels
  • add-sdk-release
  • add-standalone-manifest
  • automate-release
  • bo/feat/remove-launcher-chainer-validation
  • bo/test/add-ut-for-torch-runtime-valid
  • bump-jobset-v0.9.0
  • bump-torch-deepspeed
  • cache-example
  • cache-oss
  • cache_initilizer
  • cache_pipeline
  • changelog-1.9.1
  • changelog-2.0.0
  • changelog-2.0.1
  • changelog-v2.0.0-rc.0
  • changelog-v2.0.0-rc.1
  • changelog-v2.1.0
  • changelog-v2.1.0-rc.0
  • changelog-v2.1.0-rc.1
  • cherry-pick-2666-to-release-2.0
  • cherry-pick-2675-to-release-2.0
  • cherry-pick-2682-to-release-2.0
  • cherry-pick-2683-to-release-2.0
  • cherry-pick-2685-to-release-2.0
  • cherry-pick-2686-to-release-2.0
  • cherry-pick-2691-to-release-2.0
  • cherry-pick-2695-to-release-2.0
  • cherry-pick-2700-to-release-2.0
  • cherry-pick-2703-to-release-2.0
  • cherry-pick-2707-to-release-2.0
  • cherry-pick-2719-to-release-2.0
  • cherry-pick-2726-to-release-2.0
  • cherry-pick-2728-to-release-2.1
  • cherry-pick-2731-to-release-2.0
  • cherry-pick-2734-to-release-2.0
  • cherry-pick-2739-to-release-2.0
  • cherry-pick-2761
  • cherry-pick-2766
  • cherry-pick-2771-to-release-2.0
  • cherry-pick-2774-to-release-2.0
  • cherry-pick-2780
  • cherry-pick-2813
  • cherry-pick-2815
  • cherry-pick-2837-to-release-2.0
  • cherry-pick-2854-to-release-2.0
  • cherry-pick-2877-to-release-2.1
  • cherry-pick-2904-to-release-2.1
  • cherry-pick-2907-to-release-2.1
  • cherry-pick-2908-to-release-2.1
  • cherry-pick-2913-to-release-2.1
  • cherry-pick-2923-to-release-2.1
  • cherry-pick-2926-to-release-2.1
  • cherry-pick-2971-to-release-2.1
  • cherry-pick-3009-to-release-2.1
  • cherry-pick-3010-to-release-2.1
  • cherry-pick-changelog-1.9
  • chore/KEP-runtime-class
  • chore/gha
  • chore/merge-podspacoverride-test-cases
  • chore/upgrade-torchtune-version
  • ci/include-1.32-k8s
  • config-api-implementation
  • coscheduling-indexers-ut
  • deepspeed-runtime
  • dependabot/cargo/pkg/data_cache/arrow-57.0.0
  • dependabot/cargo/pkg/data_cache/arrow-57.1.0
  • dependabot/cargo/pkg/data_cache/arrow-flight-57.1.0
  • dependabot/cargo/pkg/data_cache/async-trait-0.1.89
  • dependabot/cargo/pkg/data_cache/axum-0.8.8
  • dependabot/cargo/pkg/data_cache/bincode-2.0.1
  • dependabot/cargo/pkg/data_cache/bincode-3.0.0
  • dependabot/cargo/pkg/data_cache/bytes-1.11.0
  • dependabot/cargo/pkg/data_cache/crossbeam-channel-0.5.15
  • dependabot/cargo/pkg/data_cache/hickory-resolver-0.25.2
  • dependabot/cargo/pkg/data_cache/iceberg-0.6.0
  • dependabot/cargo/pkg/data_cache/iceberg-0.7.0
  • dependabot/cargo/pkg/data_cache/iceberg-datafusion-0.6.0
  • dependabot/cargo/pkg/data_cache/iceberg-datafusion-0.7.0
  • dependabot/cargo/pkg/data_cache/ring-0.17.14
  • dependabot/cargo/pkg/data_cache/serde-1.0.228
  • dependabot/cargo/pkg/data_cache/test/arrow-flight-57.0.0
  • dependabot/cargo/pkg/data_cache/test/arrow-flight-57.1.0
  • dependabot/cargo/pkg/data_cache/test/arrow-flight-57.2.0
  • dependabot/cargo/pkg/data_cache/test/bincode-2.0.1
  • dependabot/cargo/pkg/data_cache/test/bincode-3.0.0
  • dependabot/cargo/pkg/data_cache/test/bytes-1.11.0
  • dependabot/cargo/pkg/data_cache/test/clap-4.5.51
  • dependabot/cargo/pkg/data_cache/test/clap-4.5.52
  • dependabot/cargo/pkg/data_cache/test/clap-4.5.53
  • dependabot/cargo/pkg/data_cache/test/clap-4.5.54
  • dependabot/cargo/pkg/data_cache/test/serde-1.0.228
  • dependabot/cargo/pkg/data_cache/test/tokio-1.48.0
  • dependabot/cargo/pkg/data_cache/test/tokio-1.49.0
  • dependabot/cargo/pkg/data_cache/test/tonic-0.14.2
  • dependabot/cargo/pkg/data_cache/test/tracing-0.1.43
  • dependabot/cargo/pkg/data_cache/test/tracing-0.1.44
  • dependabot/cargo/pkg/data_cache/test/tracing-subscriber-0.3.20
  • dependabot/cargo/pkg/data_cache/test/tracing-subscriber-0.3.22
  • dependabot/cargo/pkg/data_cache/tokio-1.44.2
  • dependabot/cargo/pkg/data_cache/tokio-1.48.0
  • dependabot/cargo/pkg/data_cache/tonic-0.14.2
  • dependabot/cargo/pkg/data_cache/tower-0.5.2
  • dependabot/docker/cmd/data_cache/rust-1.91-bullseye
  • dependabot/docker/cmd/data_cache/rust-1.92-bullseye
  • dependabot/docker/cmd/initializers/dataset/python-3.14-slim-bookworm
  • dependabot/docker/cmd/initializers/model/python-3.14-slim-bookworm
  • dependabot/docker/cmd/runtimes/deepspeed/mpioperator/base-v0.7.0
  • dependabot/docker/cmd/runtimes/deepspeed/nvidia/cuda-13.0.2-devel-ubuntu22.04
  • dependabot/docker/cmd/runtimes/deepspeed/nvidia/cuda-13.1.0-devel-ubuntu22.04
  • dependabot/docker/cmd/runtimes/mlx/mpioperator/base-v0.7.0
  • dependabot/docker/cmd/runtimes/mlx/nvidia/cuda-13.0.2-devel-ubuntu22.04
  • dependabot/docker/cmd/runtimes/mlx/nvidia/cuda-13.1.0-devel-ubuntu22.04
  • dependabot/docker/cmd/trainer-controller-manager/golang-1.25
  • dependabot/docker/cmd/trainers/torchtune/pytorch/pytorch-2.9.0-cuda12.8-cudnn9-runtime
  • dependabot/docker/cmd/trainers/torchtune/pytorch/pytorch-2.9.1-cuda12.8-cudnn9-runtime
  • dependabot/github_actions/actions/checkout-5
  • dependabot/github_actions/actions/checkout-6
  • dependabot/github_actions/actions/github-script-8
  • dependabot/github_actions/actions/setup-go-6
  • dependabot/github_actions/actions/setup-python-6
  • dependabot/github_actions/actions/stale-10
  • dependabot/github_actions/actions/upload-artifact-5
  • dependabot/github_actions/actions/upload-artifact-6
  • dependabot/github_actions/amannn/action-semantic-pull-request-6.1.1
  • dependabot/github_actions/aquasecurity/trivy-action-0.33.1
  • dependabot/github_actions/github/codeql-action-4
  • dependabot/go_modules/github.com/onsi/ginkgo/v2-2.27.2
  • dependabot/go_modules/github.com/onsi/ginkgo/v2-2.27.3
  • dependabot/go_modules/github.com/onsi/ginkgo/v2-2.27.5
  • dependabot/go_modules/github.com/onsi/gomega-1.38.3
  • dependabot/go_modules/github.com/onsi/gomega-1.39.0
  • dependabot/go_modules/github.com/open-policy-agent/cert-controller-0.15.0
  • dependabot/go_modules/go.uber.org/zap-1.27.1
  • dependabot/go_modules/golang-c94709d3c3
  • dependabot/go_modules/golang-ce64870c5e
  • dependabot/go_modules/golang-cf2caa1bb8
  • dependabot/go_modules/golang-f180a085e8
  • dependabot/go_modules/golang.org/x/crypto-0.45.0
  • dependabot/go_modules/golang.org/x/net-0.38.0
  • dependabot/go_modules/golang.org/x/oauth2-0.27.0
  • dependabot/go_modules/kubernetes-2b83cfd1e1
  • dependabot/go_modules/kubernetes-33780c5637
  • dependabot/go_modules/kubernetes-46bc08174d
  • dependabot/go_modules/kubernetes-bd430bb9c9
  • dependabot/go_modules/kubernetes-e0300699ac
  • dependabot/pip/cmd/initializers/dataset/huggingface-hub-gte-0.27.0-and-lt-1.2
  • dependabot/pip/cmd/initializers/dataset/huggingface-hub-gte-0.27.0-and-lt-1.3
  • dependabot/pip/cmd/initializers/dataset/huggingface-hub-gte-0.27.0-and-lt-1.4
  • dependabot/pip/cmd/initializers/model/huggingface-hub-gte-0.27.0-and-lt-1.2
  • dependabot/pip/cmd/initializers/model/huggingface-hub-gte-0.27.0-and-lt-1.3
  • dependabot/pip/cmd/initializers/model/huggingface-hub-gte-0.27.0-and-lt-1.4
  • dependabot/pip/cmd/runtimes/deepspeed/datasets-4.4.1
  • dependabot/pip/cmd/runtimes/deepspeed/datasets-4.4.2
  • dependabot/pip/cmd/runtimes/deepspeed/deepspeed-0.18.2
  • dependabot/pip/cmd/runtimes/deepspeed/deepspeed-0.18.3
  • dependabot/pip/cmd/runtimes/deepspeed/deepspeed-0.18.4
  • dependabot/pip/cmd/runtimes/deepspeed/mpi4py-4.1.1
  • dependabot/pip/cmd/runtimes/deepspeed/sentencepiece-0.2.1
  • dependabot/pip/cmd/runtimes/deepspeed/torch-2.6.0
  • dependabot/pip/cmd/runtimes/deepspeed/torch-2.7.1
  • dependabot/pip/cmd/runtimes/deepspeed/torch-2.8.0
  • dependabot/pip/cmd/runtimes/deepspeed/torch-2.9.0
  • dependabot/pip/cmd/runtimes/deepspeed/torch-2.9.1
  • dependabot/pip/cmd/runtimes/deepspeed/transformers-4.51.0
  • dependabot/pip/cmd/runtimes/deepspeed/transformers-4.52.1
  • dependabot/pip/cmd/runtimes/deepspeed/transformers-4.53.0
  • dependabot/pip/cmd/runtimes/deepspeed/transformers-4.57.1
  • dependabot/pip/cmd/runtimes/deepspeed/transformers-4.57.2
  • dependabot/pip/cmd/runtimes/deepspeed/transformers-4.57.3
  • dependabot/pip/cmd/runtimes/mlx/datasets-4.4.1
  • dependabot/pip/cmd/runtimes/mlx/datasets-4.4.2
  • dependabot/pip/cmd/runtimes/mlx/mlx-cuda--0.29.3
  • dependabot/pip/cmd/runtimes/mlx/mlx-cuda--0.30.0
  • dependabot/pip/cmd/runtimes/mlx/mlx-cuda--0.30.1
  • dependabot/pip/cmd/runtimes/mlx/mlx-data-0.2.0
  • dependabot/pip/cmd/runtimes/mlx/mlx-lm-0.28.3
  • dependabot/pip/cmd/runtimes/mlx/mlx-lm-0.28.4
  • dependabot/pip/cmd/runtimes/mlx/mlx-lm-0.30.0
  • dependabot/pip/cmd/runtimes/mlx/mlx-lm-0.30.2
  • docs/local-examples-gpu-support
  • dont-merge-gpu-label-test
  • example/trainjob-yaml
  • feat/add-coscheduling-uts
  • feat/add-securitycontext-support-trainjob
  • feat/add-version-file
  • feat/ctr-webhook
  • feat/dataset-preprocess
  • feat/example/add-speech-recognition-with-ddp-example
  • feat/helm-data-cache-config
  • feat/initializers/s3
  • feat/llama3_2-manifests
  • feat/llm-trainer-v2
  • feat/local-model
  • feat/lora-support
  • feat/pvc-check
  • feat/replica-valid
  • feat/sdk-torchtune-config
  • feat/torchtune-plugin
  • feat/trainjob-affinity
  • feat/trainjob-imagepullsecrets
  • feat/webhook-validate-trainjob-name
  • feat/webhook/rfc1035
  • feature/add-xgboost-runtime
  • feature/debabrata
  • feature/helm-charts-v2
  • fix-arg-for-get-args-using-torchtune-config
  • fix-close-pr-message
  • fix-controller-rbac
  • fix-coveralls
  • fix-deepspeed-example
  • fix-deepspeed-npoc
  • fix-e2e-sdk-install
  • fix-example-runtime
  • fix-helm-chart-name
  • fix-helm-charts-config-api-2894
  • fix-kep-volcano
  • fix-latest-tag
  • fix-llm-hp-optimization-error
  • fix-local-tests
  • fix-mlx-runtime
  • fix-mpi-key-mode
  • fix-oci-vm-tf
  • fix-outdated-intstr-lib
  • fix-permissions
  • fix-release-doc
  • fix-resource-allocation
  • fix-suspend-resume-3008
  • fix-tag-manager
  • fix-test-bug
  • fix-trainer-type-annotation
  • fix/cert-and-issuer
  • fix/disable-github-actions
  • fix/helm-chart
  • fix/issue-template
  • fix/kep2401-lint
  • fix/multiple-depends-on
  • fix/python-type-import
  • fix/rbac/event
  • fix/runtime-info-thread-safety
  • fix/sync-podsets-count-to-template-spec
  • fix/tidy-KEP-2401
  • fix/torchtune-c-compiler
  • fix/torchtune-plugin
  • gpu-test-on-pr
  • gsoc-2442-jax-runtime-proposal
  • gsoc25-project7-kep
  • hatchling-package
  • health
  • helm-integration-tests
  • implement-resource-in-use-finalizer
  • implement-resource-in-use-for-cl-training-runtime
  • implement-validation-uts
  • indexers-ut
  • issue-2218-pod-spec-override-kep
  • issue-2547
  • issue-2706-v2-go-mod
  • issue-2789/implement-cluster-training-runtimes-deprecation-process
  • jax-runtime
  • jax-runtime-impl
  • jobset-name-prefix
  • jobset-validation
  • k8s_1.32_upgrade
  • kai_kep
  • kep-2779-trainjob-progress
  • kep-2841-add-flux-hpc
  • kubecon-london-demo
  • kubelow-sdk-release
  • master
  • mlx-cuda-runtime
  • mlx-runtime
  • obtain-runtimeTemplate-via-info
  • openssf-badge
  • override_label_and_annotation
  • patch-1
  • patch-issue-2027
  • pick/example-alpaca
  • pick/fix-torchtune-plugin
  • pkg/apply_unit-tests
  • plugin/flux
  • pr-15
  • pr-17
  • pr-18
  • pr-19
  • pr-20
  • pr-21
  • pr-22
  • pr-24
  • pr-25
  • pr-26
  • pr-27
  • pr-28
  • pr-29
  • pr-30
  • pr-32
  • pr-33
  • pr-35
  • pr-36
  • pr-37
  • pr-38
  • pr-39
  • pr-41
  • pr-42
  • pr-43
  • pr-44
  • pr-45
  • pr-created-condition
  • pr-k8s-lint
  • pr-title-workflow
  • prometheus
  • proposal-2170
  • pss-restricted-fixes
  • refs/tags/v1.9.1
  • refs/tags/v2.0.0-rc.0
  • refs/tags/v2.0.0-rc.1
  • refs/tags/v2.0.1
  • refs/tags/v2.1.0
  • refs/tags/v2.1.0-rc.0
  • refs/tags/v2.1.0-rc.1
  • release-1.9
  • release-2.0
  • release-2.1
  • release-python-doc
  • remove-command-runtimes
  • remove-k8s-version-matrix
  • remove-mpi
  • remove-sdk
  • remove-vendor-specific-parameters
  • revert-2646-fix-trainer-type-annotation
  • roadmap-2025
  • rqst-env-only-if-label-present
  • runtime-rbac
  • runtime_fix
  • safe-gpu-test
  • scorecard-workflow
  • sdk-ancestor-updates
  • sdk-fix-mpirun
  • security-doc
  • separate-models-from-sdk
  • solanyn/question-answer-example
  • support-arm-container
  • support-for-gpu-cluster-using-oci-runner
  • support_kai
  • terrytangyuan-patch-1
  • test-gpu-arc
  • test/fix-flaky-test
  • tmp_secret_verify
  • training-progression#2779
  • treat-ancestor-label-as-identifier
  • trivy-scans
  • update-approvers
  • update-examples-with-unpacking-params
  • update-github-runners
  • update-image-tags
  • update-license
  • update-logs-examples
  • update-manifest-images-to-ghcr
  • update-owners
  • update-release-process
  • update-sdk-reference
  • update-security-context
  • update-slack
  • update-stale-bot-version
  • update-torch-2.9
  • use-tilt
  • validation-mpiruntimes
  • volcano
  • volcano-podgroup-build
  • vuls
  • vzamboulingame-upgrade-go-v1.24
  • workflow/helm
  • workflow/publish-helm-charts

18 Jan 2026 09:12PM UTC coverage: 51.385% (+0.1%) from 51.264%
21118692202

push

github

web-flow
fix: fix resourcePerNode override not applied with Volcano scheduler (#2982)

* volcano integeration for resourcePerNode override in trainjob

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* Fixed the issue where PodGroup resources weren't scaling with spec.trainer.resourcesPerNode when specified on a TrainJob

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* limits removed

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* maintains consistency across all MLPolicy plugins

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* Fix pre-commit formatting

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* Unit test updates for mpi and torch pluggins

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* framework unit test prblm solved

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* fix:All MLPolicy Plugins update SinglePodRequests from TrainJob.spec.trainer.resourcesPerNode

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* Refactor ResourcesPerNode handling to use PodRequest helper for correct scaling with init/sidecar containers

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* refactor: simplify ResourcesPerNode by modifying jobSetTemplateSpec directly

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* test: modify TrainJob resources in coscheduling test to verify node container override

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

* Refactor: remove duplicate resource logic from plugins

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

---------

Signed-off-by: sksingh2005 <shashanksgh3@gmail.com>

7 of 7 new or added lines in 1 file covered. (100.0%)

1243 of 2419 relevant lines covered (51.38%)

0.61 hits per line

Relevant lines Covered
Build:
Build:
2419 RELEVANT LINES 1243 COVERED LINES
0.61 HITS PER LINE
Source Files on master
  • Tree
  • List 30
  • Changed 1
  • Source Changed 0
  • Coverage Changed 1
Coverage ∆ File Lines Relevant Covered Missed Hits/Line

Recent builds

Builds Branch Commit Type Ran Committer Via Coverage
21118692202 master fix: fix resourcePerNode override not applied with Volcano scheduler (#2982) * volcano integeration for resourcePerNode override in trainjob Signed-off-by: sksingh2005 <shashanksgh3@gmail.com> * Fixed the issue where PodGroup resources weren't ... push 18 Jan 2026 09:15PM UTC web-flow github
51.38
21082357673 fix-local-tests fix(ci): Set runc as default Docker runtime for CPU tests Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Pull #3098 16 Jan 2026 10:10PM UTC andreyvelich github
51.26
21082246830 fix-local-tests fix(ci): Set runc as default Docker runtime for CPU tests Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Pull #3098 16 Jan 2026 10:06PM UTC andreyvelich github
51.26
21082199300 fix-local-tests fix(ci): Set runc as default Docker runtime for CPU tests Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Pull #3098 16 Jan 2026 10:04PM UTC andreyvelich github
51.26
21082115826 fix-local-tests fix(ci): Set runc as default Docker runtime for CPU tests Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Pull #3098 16 Jan 2026 10:00PM UTC andreyvelich github
51.26
21082097207 fix-local-tests fix(ci): Set runc as default Docker runtime for CPU tests Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Pull #3098 16 Jan 2026 09:59PM UTC andreyvelich github
51.26
21081732341 fix-local-tests fix(ci): Fix the container local execution tests Pull #3098 16 Jan 2026 09:44PM UTC andreyvelich github
51.26
21075519626 volcano Refactor: remove duplicate resource logic from plugins Signed-off-by: sksingh2005 <shashanksgh3@gmail.com> Pull #2982 16 Jan 2026 05:47PM UTC sksingh2005 github
51.38
21075171915 test-gpu-arc chore: refactored the code Signed-off-by: Akash Jaiswal <akashjaiswal3846@gmail.com> Pull #3067 16 Jan 2026 05:34PM UTC jaiakash github
51.26
21074098477 test-gpu-arc fix: helm dirs Signed-off-by: Akash Jaiswal <akashjaiswal3846@gmail.com> Pull #3067 16 Jan 2026 04:56PM UTC jaiakash github
51.26
See All Builds (1750)

Badge your Repo: trainer

We detected this repo isn’t badged! Grab the embed code to the right, add it to your repo to show off your code coverage, and when the badge is live hit the refresh button to remove this message.

Could not find badge in README.

Embed ▾
README BADGES
x

If you need to use a raster PNG badge, change the '.svg' to '.png' in the link

Markdown

Textile

RDoc

HTML

Rst

Refresh
  • Settings
  • Repo on GitHub
STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2026 Coveralls, Inc