IBM / unitxt
81%
main: 81%

Repo Added 24 Dec 2024 03:17PM UTC

Files 64

LAST BUILD ON BRANCH default-template-policy

Committed 12 Feb 2025 12:24PM UTC coverage: 81.122% (-0.002%) from 81.124%

Build # 13285293451

Build Type

github

Committed by

web-flow

Commit Message

Merge 18dd20344 into d7200b518

Pull Request #1596: Prioritize using default templates from card over task

Coverage Stats

1498 of 1839 branches covered (81.46%)

Branch coverage included in aggregate %.

9507 of 11727 relevant lines covered (81.07%)

0.81 hits per line
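
The percentages above can be cross-checked with a little arithmetic. The sketch below assumes the aggregate figure is the combined count of covered lines and covered branches divided by the combined count of relevant lines and branches. That formula is inferred from the numbers on this page, not taken from any stated definition, but it does reproduce the reported 81.122%. The function and variable names are illustrative only.

```python
def pct(covered: int, total: int) -> float:
    """Return covered/total as a percentage."""
    return 100.0 * covered / total


# Counts reported for build 13285293451.
lines_covered, lines_total = 9507, 11727
branches_covered, branches_total = 1498, 1839

line_pct = pct(lines_covered, lines_total)
branch_pct = pct(branches_covered, branches_total)

# Assumption: the aggregate pools lines and branches into one ratio.
aggregate_pct = pct(
    lines_covered + branches_covered,
    lines_total + branches_total,
)

print(f"line coverage:      {line_pct:.2f}%")       # 81.07%
print(f"branch coverage:    {branch_pct:.2f}%")     # 81.46%
print(f"aggregate coverage: {aggregate_pct:.3f}%")  # 81.122%
```

Because branch coverage (81.46%) is slightly higher than line coverage (81.07%), the pooled figure lands just above the line-only figure.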

Source Files on default-template-policy

Recent builds

Build	Branch	Commit	Type	Ran	Committer	Via	Coverage (%)
13285293451	default-template-policy	Merge 18dd20344 into d7200b518	Pull #1596	12 Feb 2025 12:40PM UTC	web-flow	github	81.12
13284061874	default-template-policy	Merge e627140f9 into 6c82ceb05	Pull #1596	12 Feb 2025 11:20AM UTC	web-flow	github	81.11
13280933272	default-template-policy	Merge f07f7cf3f into 309749edb	Pull #1596	12 Feb 2025 08:27AM UTC	web-flow	github	81.13
13265747003	default-template-policy	Merge f41a1ae2d into 7d3d51731	Pull #1596	11 Feb 2025 03:01PM UTC	web-flow	github	81.43

See All Builds (1895)