lsm / neokai / 26411444620 / 13
83%
dev: 83%

Build:
DEFAULT BRANCH: dev
Ran 25 May 2026 05:04PM UTC
Files 325
Run time 10s
25 May 2026 05:01PM UTC coverage: 18.449%. First build
26411444620.13

push

github

web-flow
Benchmark graph context tools on task 394 (#2009)

* docs: benchmark graph context tools

Compare CodeGraph, code-review-graph, Graphify, and baseline on task #394 to guide optional NeoKai integration priority.

* docs: correct graph benchmark findings

Address review feedback by fixing the task #394 answer key, MCP tool counts, and Graphify runtime notes.

* docs: add ast-grep benchmark comparison

Benchmark ast-grep as a structural search baseline alongside the graph context tools for task #394.

* docs: add unseeded graph benchmark round

* docs: add plain unseeded GLM baseline

* docs: add mixed graph benchmark round

* test: add graph tool benchmark as agent session integration test

Run the benchmark through NeoKai daemon sessions with MCP tool servers
attached instead of raw Python HTTP calls. 12 test cases (describe.skip by
default): baseline GLM, 4 unseeded tool cases, 4 mixed discovery cases,
plus mixed baseline. Results are written as JSON to /tmp/.

Run: cd packages/daemon && GLM_API_KEY=xxx bun test tests/online/benchmark/benchmark-graph-tools.test.ts

* docs: add agent session benchmark results and fix GLM-SDK compatibility

Run graph tool benchmark through real NeoKai daemon sessions with MCP
servers attached. Key findings:

- GLM-5.x tool_use responses incompatible with Claude Agent SDK context-fetcher
- GLM-4.7 works for text-only and single-tool MCP sessions
- Mixed multi-tool sessions hang due to same SDK incompatibility
- GLM-4.7 did not voluntarily invoke MCP tools in any test case
- All 4 completed tests (baseline, CodeGraph, CRG, ast-grep) produced
  text-only plans with zero tool calls
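
As an illustration of how "zero tool calls" might be checked from a session transcript, here is a minimal sketch. The `ContentBlock` and `AssistantMessage` shapes and both function names are hypothetical, loosely modelled on `text` / `tool_use` content blocks; they are not taken from the NeoKai test code.

```typescript
// Hypothetical transcript shapes (assumption: assistant output is a list of
// content blocks tagged "text" or "tool_use").
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: unknown };

interface AssistantMessage {
  role: "assistant";
  content: ContentBlock[];
}

// Count tool_use blocks per tool name across all assistant messages.
function countToolCalls(messages: AssistantMessage[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const message of messages) {
    for (const block of message.content) {
      if (block.type === "tool_use") {
        counts[block.name] = (counts[block.name] || 0) + 1;
      }
    }
  }
  return counts;
}

// A session counts as text-only when no tool_use block appears at all.
function isTextOnly(messages: AssistantMessage[]): boolean {
  return Object.keys(countToolCalls(messages)).length === 0;
}
```

Counting per tool name rather than keeping a single total would also show which MCP server a model reached for, if any case did produce tool calls.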

Restructure benchmark: drop mixed round, keep unseeded tests only,
add text-only baseline prompt, increase timeouts, build indexes before
daemon start to avoid transport PONG timeout.

* fix: address benchmark PR review feedback

- Use BENCHMARK_PROMPT_UNSEDED for MCP cases (not TEXT_ONLY) so tools
  are not suppressed
- Record real commit SHA via git rev-parse... (continued)

16903 of 91618 relevant lines covered (18.45%)

9.71 hits per line
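
The coverage percentage above follows directly from the two line counts. A minimal sketch of the arithmetic, assuming the figure is simply covered lines divided by relevant lines (the function name is illustrative, not a Coveralls API):

```typescript
// Line coverage as a percentage: covered relevant lines / all relevant lines.
function coveragePercent(covered: number, relevant: number): number {
  if (relevant <= 0) {
    throw new RangeError("relevant line count must be positive");
  }
  return (covered / relevant) * 100;
}

// Figures reported for this job.
const pct = coveragePercent(16903, 91618);

const threeDecimals = pct.toFixed(3); // "18.449", the build headline
const twoDecimals = pct.toFixed(2);   // "18.45", the per-job summary
```

The headline (18.449%) and the summary (18.45%) are the same ratio rounded to different precision.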

Source Files on job daemon-online-mcp - 26411444620.13
  • 41158616 on github

© 2026 Coveralls, Inc