12853871276
81%

Ran 19 Jan 2025 01:15PM UTC

Jobs 1

Files 61

Run time 1min

Committed 19 Jan 2025 01:10PM UTC coverage: 79.592% (+0.1%) from 79.447%

Build # 12853871276

Build Type

push

github

Committed by

web-flow

Commit Message

Minor llm as judge fix/changes (#1467)

* Import llm judge operators directly from llm_as_judge

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* fix bug in pairwise ranking calculation

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Reinitialize the reduction map between invocations so that it can compute with a different number of predictions.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Add more criteria

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Improve pairwise option selection prompt

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Change pairwise main_score

The pairwise main score is now the win rate of the first system.

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Pairwise instance predictions cannot be a dict anymore

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Pairwise summarization prompt: ask for the winning response to be clearly stated

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Add more gpt models

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Criteria name changes

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Apply pre-commit

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Update src/unitxt/catalog/metrics/llm_as_judge/direct/criterias/question_answer_quality.json

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Remove unused code

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Remove default criteria name

This makes it possible to tell whether a criterion was given a name, and thus to deduce which main_score to use.

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Move get_criteria to LLMJudge class

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Improve direct score names

Signed-off-by: Martín Santillán Cooper <msantillancooper@ibm.com>

* Check criteria type in befo... (continued)
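The commit message states that the pairwise main score is now the win rate of the first system. As an illustration only, here is a minimal Python sketch of that idea, assuming each pairwise judgment is recorded as a hypothetical `(system_a, system_b, winner)` tuple with the first system at index 0. The function name `first_system_winrate` and this data layout are assumptions for illustration, not unitxt's actual API.

```python
from typing import Sequence, Tuple

# Hypothetical record: (system_a, system_b, winner), where winner is
# either system_a or system_b. Not unitxt's real data structure.
Comparison = Tuple[int, int, int]


def first_system_winrate(comparisons: Sequence[Comparison]) -> float:
    """Fraction of comparisons involving system 0 that system 0 won."""
    # Keep only the comparisons in which the first system took part.
    played = [c for c in comparisons if 0 in (c[0], c[1])]
    if not played:
        raise ValueError("system 0 appears in no comparison")
    # Count the comparisons where system 0 was judged the winner.
    wins = sum(1 for _, _, winner in played if winner == 0)
    return wins / len(played)


if __name__ == "__main__":
    judgments = [(0, 1, 0), (0, 2, 2), (1, 2, 1), (0, 1, 0)]
    # System 0 appears in three comparisons and wins two of them.
    print(first_system_winrate(judgments))
```

Comparisons that do not involve the first system, such as `(1, 2, 1)` above, are ignored rather than counted as losses; how ties are handled is not covered by this sketch.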

Coverage Stats

1394 of 1740 branches covered (80.11%)

Branch coverage included in aggregate %.

8824 of 11098 relevant lines covered (79.51%)

0.8 hits per line
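"Branch coverage included in aggregate %" can be checked against the numbers above. A minimal sketch, assuming the aggregate pools covered lines and covered branches into a single ratio, (covered lines + covered branches) / (relevant lines + total branches), gives (8824 + 1394) / (11098 + 1740) = 10218 / 12838 ≈ 79.592%, which matches the reported build coverage. The function name `aggregate_coverage` is hypothetical.

```python
def aggregate_coverage(covered_lines: int, relevant_lines: int,
                       covered_branches: int, total_branches: int) -> float:
    """Percentage of lines and branches covered, pooled into one ratio.

    Assumes branches are simply added to lines in both numerator and
    denominator; this is an assumption consistent with the reported figure.
    """
    covered = covered_lines + covered_branches
    total = relevant_lines + total_branches
    return 100.0 * covered / total


if __name__ == "__main__":
    # Figures from this build: 8824/11098 lines, 1394/1740 branches.
    print(round(aggregate_coverage(8824, 11098, 1394, 1740), 3))
```

Because there are far more relevant lines than branches, the pooled figure (79.592%) sits much closer to the line coverage (79.51%) than to the branch coverage (80.11%).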

Coverage Regressions

Lines	Coverage (%)	∆	File
166	16.98	0.98%	unitxt/llm_as_judge.py

Jobs

ID	Job ID	Ran	Files	Coverage
1	12853871276.1	19 Jan 2025 01:15PM UTC	61	79.59	GitHub Action Run

IBM / unitxt / 12853871276
81%
