24268402333
68%

Ran 10 Apr 2026 11:20PM UTC

Jobs 1

Files 616

Run time 2min

Committed 10 Apr 2026 11:13PM UTC

coverage: 65.458% (-0.05%) from 65.511%

Build # 24268402333

Build Type

push

github

Committed by

web-flow

Commit Message

Add deflake skill for finding and fixing flaky tests (#4746)

* Add deflake skill for finding and fixing flaky tests

Adds a /deflake skill that analyzes GitHub Actions failures on main to
discover, rank, and plan fixes for flaky tests. The skill includes a
Python collection script that deterministically fetches failed run logs
in parallel, extracts test names from Ginkgo and gotestfmt output, and
aggregates failures into a ranked report.
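
The extract-and-rank step described above might look roughly like the sketch below. It is not the actual collect-flakes script: the `FAIL_LINE` pattern assumes Ginkgo-style summary lines of the form `[FAIL] <test name>`, gotestfmt parsing is left out, and the names `extract_failed_tests` and `rank_flakes` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a failed test shows up as a summary line such as
# "[FAIL] Suite name [It] does the thing". "[FAILED]" lines do not match.
FAIL_LINE = re.compile(r"\[FAIL\]\s+(?P<name>.+?)\s*$")


def extract_failed_tests(log_text):
    """Return the set of test names that failed in one run's log."""
    names = set()
    for line in log_text.splitlines():
        match = FAIL_LINE.search(line)
        if match:
            names.add(match.group("name"))
    return names


def rank_flakes(run_logs):
    """Count in how many runs each test failed, most frequent first."""
    counts = Counter()
    for log_text in run_logs:
        # A set per run means a test counts at most once per run.
        counts.update(extract_failed_tests(log_text))
    return counts.most_common()
```

Counting each test once per run keeps the ranking comparable to a "failed in N of M runs" figure, rather than letting one noisy log dominate.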

Used this skill to identify and fix the #1 flake (workload lifecycle
E2E test, 12/147 runs) in #4745.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address review feedback on collect-flakes script

- Extract per-test log context (50 lines before the failure marker)
  before classifying failure mode, so tests in the same run get
  accurate individual mode labels instead of all inheriting the
  first match from the full run log
- Add try/except around future.result() so one failed run doesn't
  crash the script and lose all collected data
- Fix misleading comment about MAX_PAGES covering 300 Main build
  runs — the API returns all workflows' runs, not just Main build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix per-test failure mode extraction in collect-flakes

The previous attempt to extract per-test failure context used a 50-line
window before the [FAIL] summary line, but Ginkgo's [FAILED] reason
line (e.g., "Timed out after 120s") can appear thousands of lines
earlier. Also needed ANSI stripping when searching for [FAILED] markers.

Now searches backwards from the [FAIL] summary to find all [FAILED]
lines in the failure block, uses the earliest one (which has the
actual failure reason), and extracts context spanning all of them.
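
A rough sketch of that backward search follows, under stated assumptions: the log is already split into lines, the index of the `[FAIL]` summary line is known, and the start of the failure block is passed in as a hypothetical `block_start` parameter (how the real script bounds the block is not stated here). The ANSI pattern covers only common CSI sequences such as color codes.

```python
import re

# Matches common ANSI CSI sequences, e.g. "\x1b[38;5;9m" or "\x1b[0m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(line):
    """Remove ANSI color/style escape sequences from one log line."""
    return ANSI_ESCAPE.sub("", line)


def failure_context(lines, fail_index, block_start=0):
    """Return cleaned lines from the earliest [FAILED] marker in the
    block through the [FAIL] summary line at fail_index.

    block_start is a hypothetical lower bound for the search.
    """
    failed_indices = []
    # Walk backwards from just above the [FAIL] summary line.
    for i in range(fail_index - 1, block_start - 1, -1):
        if "[FAILED]" in strip_ansi(lines[i]):
            failed_indices.append(i)
    if not failed_indices:
        return []
    # The earliest marker is the one carrying the failure reason.
    earliest = min(failed_indices)
    return [strip_ansi(line) for line in lines[earliest:fail_index + 1]]
```

Stripping escapes before the substring check matters because color codes may sit inside or around the marker, which would defeat a plain search on the raw line.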

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Coverage Stats

56342 of 86073 relevant lines covered (65.46%)

62.59 hits per line

Coverage Regressions

Lines	Coverage	∆	File
14	74.44	-5.19%	pkg/client/config.go
14	20.11	-8.05%	pkg/client/manager.go
11	68.83	-14.29%	pkg/client/discovery.go
6	76.15	-5.5%	pkg/secrets/keyring/keyctl_linux.go
3	70.0	-3.33%	pkg/state/local.go
2	82.29	-0.21%	pkg/vmcp/composer/workflow_engine.go

Jobs

ID	Job ID	Ran	Files	Coverage
1	24268402333.1	10 Apr 2026 11:20PM UTC	616	65.46	GitHub Action Run

stacklok / toolhive / 24268402333