26411444620
82%

Ran 25 May 2026 05:02PM UTC

Jobs 28

Files 571

Run time 2min

Committed 25 May 2026 05:01PM UTC coverage: 81.169%. First build

Build # 26411444620

Build Type

push

github

Committed by

web-flow

Commit Message

Benchmark graph context tools on task 394 (#2009)

* docs: benchmark graph context tools

Compare CodeGraph, code-review-graph, Graphify, and baseline on task #394 to guide optional NeoKai integration priority.

* docs: correct graph benchmark findings

Address review feedback by fixing the task #394 answer key, MCP tool counts, and Graphify runtime notes.

* docs: add ast-grep benchmark comparison

Benchmark ast-grep as a structural search baseline alongside the graph context tools for task #394.

* docs: add unseeded graph benchmark round

* docs: add plain unseeded GLM baseline

* docs: add mixed graph benchmark round

* test: add graph tool benchmark as agent session integration test

Proper benchmark using NeoKai daemon sessions with MCP tool servers
attached, not raw Python HTTP calls. 12 test cases (describe.skip by
default): baseline GLM, 4 unseeded tool cases, 4 mixed discovery cases,
plus mixed baseline. Outputs JSON results to /tmp/.

Run: cd packages/daemon && GLM_API_KEY=xxx bun test tests/online/benchmark/benchmark-graph-tools.test.ts

* docs: add agent session benchmark results and fix GLM-SDK compatibility

Run graph tool benchmark through real NeoKai daemon sessions with MCP
servers attached. Key findings:

- GLM-5.x tool_use responses incompatible with Claude Agent SDK context-fetcher
- GLM-4.7 works for text-only and single-tool MCP sessions
- Mixed multi-tool sessions hang due to same SDK incompatibility
- GLM-4.7 did not voluntarily invoke MCP tools in any test case
- All 4 completed tests (baseline, CodeGraph, CRG, ast-grep) produced
  text-only plans with zero tool calls

Restructure benchmark: drop mixed round, keep unseeded tests only,
add text-only baseline prompt, increase timeouts, build indexes before
daemon start to avoid transport PONG timeout.

* fix: address benchmark PR review feedback

- Use BENCHMARK_PROMPT_UNSEDED for MCP cases (not TEXT_ONLY) so tools
  are not suppressed
- Record real commit SHA via git rev-parse... (continued)

Coverage Stats

9317 of 13703 branches covered (67.99%)

Branch coverage included in aggregate %.

77645 of 93434 relevant lines covered (83.1%)

295.38 hits per line
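
The note that branch coverage is included in the aggregate can be checked with a little arithmetic. The sketch below assumes the aggregate pools covered lines and covered branches into a single ratio; that formula is an assumption, not something this page states, although it does reproduce the 81.169% reported for this build. The `Counts` type and the `percent` and `aggregate` helpers are hypothetical names used only for illustration.

```typescript
// Counts copied from the Coverage Stats lines above.
interface Counts {
  covered: number;
  total: number;
}

const branches: Counts = { covered: 9317, total: 13703 };
const lines: Counts = { covered: 77645, total: 93434 };

// Percentage of covered items for one kind of coverage.
function percent(c: Counts): number {
  return (c.covered / c.total) * 100;
}

// Assumption: the aggregate pools line and branch counts into one ratio,
// i.e. (covered lines + covered branches) / (relevant lines + branches).
function aggregate(a: Counts, b: Counts): number {
  return percent({
    covered: a.covered + b.covered,
    total: a.total + b.total,
  });
}

const branchPct = percent(branches).toFixed(2);
const linePct = percent(lines).toFixed(1);
const aggregatePct = aggregate(lines, branches).toFixed(3);

console.log(`branches ${branchPct}% lines ${linePct}% aggregate ${aggregatePct}%`);
```

With the counts from this build, the pooled ratio is 86962 / 107137, which rounds to 81.169%. That is lower than the line-only figure of 83.1% because the less well covered branches pull the aggregate down.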

Jobs

ID	Job ID	Ran	Files	Coverage (%)	CI Run
1	daemon-0-shared - 26411444620.1	25 May 2026 05:02PM UTC	32	81.29	GitHub Action Run
2	daemon-5-space-other - 26411444620.2	25 May 2026 05:02PM UTC	115	40.08	GitHub Action Run
3	daemon-5-space-agent - 26411444620.3	25 May 2026 05:02PM UTC	167	24.46	GitHub Action Run
4	daemon-online-rewind-1 - 26411444620.4	25 May 2026 05:03PM UTC	325	22.49	GitHub Action Run
5	daemon-online-features-1 - 26411444620.5	25 May 2026 05:03PM UTC	325	23.13	GitHub Action Run
6	daemon-online-git - 26411444620.6	25 May 2026 05:03PM UTC	325	19.09	GitHub Action Run
7	daemon-online-rpc-1 - 26411444620.7	25 May 2026 05:03PM UTC	325	19.34	GitHub Action Run
8	daemon-online-rpc-3 - 26411444620.8	25 May 2026 05:03PM UTC	325	19.74	GitHub Action Run
9	daemon-online-components - 26411444620.9	25 May 2026 05:02PM UTC	325	18.07	GitHub Action Run
10	daemon-online-space-2 - 26411444620.10	25 May 2026 05:04PM UTC	325	33.32	GitHub Action Run
11	daemon-5-space-runtime - 26411444620.11	25 May 2026 05:03PM UTC	180	45.9	GitHub Action Run
12	daemon-4-space-storage - 26411444620.12	25 May 2026 05:04PM UTC	156	60.63	GitHub Action Run
13	daemon-online-mcp - 26411444620.13	25 May 2026 05:02PM UTC	325	18.45	GitHub Action Run
14	daemon-online-features-2 - 26411444620.14	25 May 2026 05:03PM UTC	325	22.68	GitHub Action Run
15	daemon-2-handlers - 26411444620.15	25 May 2026 05:02PM UTC	298	28.51	GitHub Action Run
16	daemon-online-sdk - 26411444620.16	25 May 2026 05:03PM UTC	325	22.35	GitHub Action Run
17	daemon-online-websocket - 26411444620.17	25 May 2026 05:02PM UTC	325	18.18	GitHub Action Run
18	daemon-online-convo - 26411444620.18	25 May 2026 05:03PM UTC	325	22.23	GitHub Action Run
19	daemon-online-agent-sdk - 26411444620.19	25 May 2026 05:03PM UTC	325	22.34	GitHub Action Run
20	daemon-online-coordinator - 26411444620.20	25 May 2026 05:02PM UTC	325	7.67	GitHub Action Run
21	daemon-5-space-workflow - 26411444620.21	25 May 2026 05:02PM UTC	111	30.98	GitHub Action Run
22	daemon-online-rewind-2 - 26411444620.22	25 May 2026 05:03PM UTC	325	22.96	GitHub Action Run
23	daemon-online-rpc-2 - 26411444620.23	25 May 2026 05:03PM UTC	325	23.51	GitHub Action Run
24	daemon-online-space-1 - 26411444620.24	25 May 2026 05:03PM UTC	325	33.8	GitHub Action Run
25	web - 26411444620.25	25 May 2026 05:03PM UTC	237	73.25	GitHub Action Run
26	daemon-1-core - 26411444620.26	25 May 2026 05:03PM UTC	331	35.95	GitHub Action Run
27	daemon-online-rpc-4 - 26411444620.27	25 May 2026 05:04PM UTC	325	23.36	GitHub Action Run
28	daemon-online-lifecycle - 26411444620.28	25 May 2026 05:03PM UTC	325	22.75	GitHub Action Run

lsm / neokai / 26411444620