
IBM / unitxt / 12853871276
81%

Build:
DEFAULT BRANCH: main
Ran 19 Jan 2025 01:15PM UTC
Jobs 1
Files 61
Run time 1min
19 Jan 2025 01:10PM UTC coverage: 79.592% (+0.1%) from 79.447%
12853871276

push

github

web-flow
Minor llm as judge fix/changes (#1467)

* Import llm judge operators directly from llm_as_judge

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* fix bug in pairwise ranking calculation

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Reinitialize the reduction map between invocations so scores can be computed with a different number of predictions.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Add more criteria

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Improve pairwise option selection prompt

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Change pairwise main_score

Now pairwise main score is the winrate of the first system

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Pairwise instance predictions cannot be a dict anymore

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Pairwise summarization prompt: ask for both response winner to be clearly stated

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Add more gpt models

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Criteria name changes

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Apply pre-commit

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Update src/unitxt/catalog/metrics/llm_as_judge/direct/criterias/question_answer_quality.json

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Remove unused code

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Remove default criteria name

This makes it possible to tell when a criteria wasn't given a name, and thus to deduce which main_score to use

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Move get_criteria to LLMJudge class

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Improve direct score names

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Check criteria type in befo... (continued)

1394 of 1740 branches covered (80.11%)

Branch coverage included in aggregate %.

8824 of 11098 relevant lines covered (79.51%)

0.8 hits per line
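
The figures above can be cross-checked with a short calculation. The sketch below assumes that the aggregate percentage pools covered lines and covered branches over relevant lines plus total branches. That formula is an assumption, not something stated on this page, but it is consistent with the reported 79.592%. The function name coverage_pct is hypothetical.

```python
def coverage_pct(covered_lines, relevant_lines, covered_branches=0, total_branches=0):
    """Return coverage as a percentage, pooling lines and branches.

    Assumption: the aggregate is (covered lines + covered branches)
    divided by (relevant lines + total branches).
    """
    covered = covered_lines + covered_branches
    total = relevant_lines + total_branches
    return 100.0 * covered / total if total else 0.0


# Counts taken from this build report.
line_pct = coverage_pct(8824, 11098)
branch_pct = coverage_pct(0, 0, 1394, 1740)
aggregate_pct = coverage_pct(8824, 11098, 1394, 1740)

print(f"lines: {line_pct:.2f}%")           # lines: 79.51%
print(f"branches: {branch_pct:.2f}%")      # branches: 80.11%
print(f"aggregate: {aggregate_pct:.3f}%")  # aggregate: 79.592%
```

Under this assumption, the aggregate falls between the line and branch figures, weighted toward lines because there are far more relevant lines than branches.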

Uncovered Existing Lines

Lines  Coverage  ∆      File
166    16.98     0.98%  unitxt/llm_as_judge.py
Jobs
ID  Job ID         Ran                      Files  Coverage
1   12853871276.1  19 Jan 2025 01:15PM UTC  61     79.59
GitHub Action Run
  • Github Actions Build #12853871276
  • b17ac7c6 on github
  • Prev Build on main (#12853582818)
  • Next Build on main (#12854108555)