IBM / unitxt
80%
main: 81%

LAST BUILD BRANCH: fix/disable-milu-test-gated-dataset
DEFAULT BRANCH: main
Repo Added 24 Dec 2024 03:17PM UTC
Files 64
LAST BUILD ON BRANCH frames

07 Jan 2025 10:23AM UTC coverage: 80.234%. First build
12649665693

Pull Request #1477: Add multi document support and FRAMES benchmark
Commit: Merge d657f74de into c997e2808
Committer: web-flow
Via: github

1380 of 1711 branches covered (80.65%)

Branch coverage included in aggregate %.

8695 of 10846 relevant lines covered (80.17%)
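
The aggregate figure reported for this build appears to combine line and branch counts into a single ratio. The sketch below reproduces the three percentages from the counts shown above, assuming the aggregate is (covered lines + covered branches) divided by (relevant lines + total branches). That formula is an assumption inferred from the numbers on this page, and the function name `coverage_percent` is hypothetical, not part of any Coveralls API.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return coverage as a percentage; an empty total counts as 0%."""
    if total == 0:
        return 0.0
    return 100.0 * covered / total


# Counts taken from this build report.
lines_covered, lines_relevant = 8695, 10846
branches_covered, branches_total = 1380, 1711

line_pct = coverage_percent(lines_covered, lines_relevant)
branch_pct = coverage_percent(branches_covered, branches_total)

# Assumed aggregate: lines and branches pooled into one ratio.
aggregate_pct = coverage_percent(
    lines_covered + branches_covered,
    lines_relevant + branches_total,
)

print(f"line coverage:      {line_pct:.2f}%")       # 80.17%
print(f"branch coverage:    {branch_pct:.2f}%")     # 80.65%
print(f"aggregate coverage: {aggregate_pct:.3f}%")  # 80.234%
```

Under this assumption the pooled ratio is 10075 of 12557, which matches the 80.234% shown for the build.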

0.8 hits per line

Source Files on frames
Detailed source file information is not available for this build.

Recent builds

Build | Branch | Commit | Type | Ran | Committer | Via | Coverage
12649665693 | frames | Merge d657f74de into c997e2808 | Pull #1477 | 07 Jan 2025 10:28AM UTC | web-flow | github | 80.23
12649054828 | frames | Merge afe865dfd into 44368cdec | Pull #1477 | 07 Jan 2025 09:51AM UTC | web-flow | github | 80.03
12647357363 | frames | Merge 7d38899e6 into 44368cdec | Pull #1477 | 07 Jan 2025 07:58AM UTC | web-flow | github | 80.03
12629939926 | frames | Merge b4a9bc01e into b32bb80fa | Pull #1477 | 06 Jan 2025 09:32AM UTC | web-flow | github | 80.01
See All Builds (1863)

© 2026 Coveralls, Inc