IBM/unitxt, build 13953172985 (coverage badge: 81%)

Build
Default branch: main
Ran: 19 Mar 2025 05:56PM UTC
Jobs: 1
Files: 62
Run time: 1 min

19 Mar 2025 05:42PM UTC coverage: 80.233% (-0.001%) from 80.234%
Build 13953172985: push event via GitHub, committed by web-flow
Improve LLM as Judge consistency (#1688)

* Improve final answer prompt to avoid contradictions with the assessment

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Be consistent in how the criteria are shown to the LLM

Now both the assessment prompt and the final answer prompt show the criteria in the same way

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Set default temperature to 0 to improve consistency

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Adapt LLM as Judge catalog to use temperature = 0

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

---------

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>
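The temperature change in this commit relies on a general property of LLM decoding: temperature scales the logits before softmax sampling, and temperature 0 is conventionally treated as greedy (argmax) decoding, so the same prompt should yield the same verdict. The sketch below is a generic illustration of that idea, not unitxt's code; the `sample_token` helper and the logits are hypothetical. Real inference backends can still show small nondeterminism at temperature 0 (e.g. from floating-point effects), so "deterministic" here holds only in principle.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Hypothetical helper: pick an index via temperature-scaled softmax sampling.

    temperature == 0 is treated as greedy decoding (argmax), the usual
    convention in LLM inference APIs.
    """
    if temperature == 0:
        # Greedy: always return the highest-scoring index, so repeated calls agree.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits for a judge choosing among three verdicts.
logits = [2.0, 1.5, 0.3]
rng = random.Random(0)
greedy = {sample_token(logits, 0, rng) for _ in range(100)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}
print(greedy)   # {0}
print(sampled)  # usually more than one distinct verdict
```

With temperature 0 every call returns verdict 0; with temperature 1.0 the same logits produce a mix of verdicts across calls, which is the inconsistency the commit aims to remove.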

1569 of 1947 branches covered (80.59%)

Branch coverage included in aggregate %.

9816 of 12243 relevant lines covered (80.18%)

0.8 hits per line
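The aggregate figure can be reproduced from the counts above, assuming Coveralls pools covered lines and covered branches into a single ratio when "branch coverage is included in aggregate %". That pooling rule is an inference, not taken from Coveralls documentation, but it reproduces the reported 80.233% exactly.

```python
# Counts copied from this build's summary.
covered_lines, relevant_lines = 9816, 12243
covered_branches, total_branches = 1569, 1947

def pct(covered, total):
    """Percentage of items covered."""
    return 100 * covered / total

line_pct = pct(covered_lines, relevant_lines)
branch_pct = pct(covered_branches, total_branches)
# Assumed aggregation: pool lines and branches into one numerator/denominator.
aggregate_pct = pct(covered_lines + covered_branches, relevant_lines + total_branches)

print(f"lines:     {line_pct:.2f}%")       # lines:     80.18%
print(f"branches:  {branch_pct:.2f}%")     # branches:  80.59%
print(f"aggregate: {aggregate_pct:.3f}%")  # aggregate: 80.233%
```

The pooled ratio is 11385 / 14190, which sits between the line and branch percentages and is weighted toward lines because there are far more of them.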

Uncovered Existing Lines

Lines  Coverage  ∆       File
79     42.95%    -0.18%  unitxt/llm_as_judge.py
Jobs

ID  Job ID         Ran                      Files  Coverage
1   13953172985.1  19 Mar 2025 05:56PM UTC  62     80.23%
Source files on build 13953172985: 62 files, 2 changed (0 with source changes, 2 with coverage changes)
GitHub Actions build #13953172985, commit dae893f5 on GitHub
Previous build on main: #13952338067
Next build on main: #13953625983