22238458228
100%

Ran 20 Feb 2026 08:09PM UTC

Jobs 1

Files 250

Run time 1min

Committed 20 Feb 2026 07:40PM UTC. Coverage: 100.0% (remained the same)

Build # 22238458228

Build Type

push

github

Committed by

web-flow

Commit Message

Add BB25 normalization for sparse encoders (#1046)

* Add log-odds conjunction fusion for BB25 hybrid search

BB25 normalization outputs calibrated probabilities, but the existing
hybrid fusion uses a convex combination, which discards the Bayesian
probability semantics. This causes BB25 to regress on 4/5 BEIR datasets.

Add log-odds conjunction fusion (from "From Bayesian Inference to Neural
Computation") that correctly combines probability signals in logit space
with per-query dynamic calibration for dense cosine scores.
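
To illustrate the idea of combining signals in logit space, here is a minimal sketch. It is not the txtai implementation: the function names, the z-score-plus-sigmoid calibration of dense scores, and the `weight` parameter are all assumptions made for this example.

```python
import math


def logit(p, eps=1e-6):
    # Clamp to avoid infinities at exactly 0 or 1
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def calibrate(scores):
    # Hypothetical per-query calibration: standardize the dense scores
    # returned for one query, then squash them into (0, 1)
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 when all scores are equal
    return [sigmoid((s - mean) / std) for s in scores]


def logodds_fuse(sparse_probs, dense_scores, weight=0.5):
    # sparse_probs are assumed to already be calibrated probabilities
    dense_probs = calibrate(dense_scores)
    fused = []
    for ps, pd in zip(sparse_probs, dense_probs):
        # Weighted sum of log-odds, mapped back to a probability
        z = (1.0 - weight) * logit(ps) + weight * logit(pd)
        fused.append(sigmoid(z))
    return fused


fused = logodds_fuse([0.9, 0.5, 0.1], [0.8, 0.5, 0.2])
```

Unlike averaging the probabilities directly, summing log-odds lets two signals that agree reinforce each other, while a signal at 0.5 contributes no evidence either way.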

- scoring/normalize.py: Extract Bayesian method check into isbayes()
- scoring/base.py: Add default isbayes() returning False
- scoring/tfidf.py: Add isbayes() delegating to normalizer
- search/base.py: Add logodds(), convex(), rrf() fusion methods;
  dispatch based on isbayes()
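
The delegation pattern in the list above can be sketched roughly as follows. The class bodies are illustrative stand-ins rather than the txtai source; in particular, the set of method names treated as Bayesian and the `fusion()` helper are assumptions, and where `rrf()` fits in the dispatch is not stated in this commit.

```python
class Normalize:
    def __init__(self, method=None):
        self.method = method

    def isbayes(self):
        # Assumption: these method names select the Bayesian path
        return self.method in ("bayes", "bb25")


class Scoring:
    def isbayes(self):
        # Default for scoring methods without Bayesian normalization
        return False


class TFIDF(Scoring):
    def __init__(self, normalizer=None):
        self.normalizer = normalizer

    def isbayes(self):
        # Delegate to the normalizer when one is configured
        return bool(self.normalizer) and self.normalizer.isbayes()


def fusion(scoring):
    # Hypothetical dispatch: probability-aware fusion only when the
    # sparse scores are calibrated probabilities
    return "logodds" if scoring.isbayes() else "convex"
```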

BEIR nDCG@10 results (BB25+LogOdds vs Default):
  arguana +2.23, fiqa +2.03, scidocs +0.62, scifact +1.33, nfcorpus -1.96

* Extract Hybrid class for score fusion strategies

Move logodds, convex, and rrf fusion methods from Search into
a dedicated Hybrid class, following the same pattern as Normalize.
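
For reference, the two non-Bayesian strategies named here are commonly defined as below. This is a generic sketch of convex combination and reciprocal rank fusion, not the Hybrid class itself; the signatures, the dict/list input shapes, and the constant `k=60` are assumptions.

```python
def convex(dense, sparse, weight=0.5):
    # dense and sparse map id -> normalized score; missing ids count as 0.0
    ids = set(dense) | set(sparse)
    return {
        uid: weight * dense.get(uid, 0.0) + (1.0 - weight) * sparse.get(uid, 0.0)
        for uid in ids
    }


def rrf(rankings, k=60):
    # rankings is a list of result lists, each ordered best first as (id, score)
    scores = {}
    for ranking in rankings:
        # Only the rank matters, so the raw score is ignored
        for rank, (uid, _) in enumerate(ranking, start=1):
            scores[uid] = scores.get(uid, 0.0) + 1.0 / (k + rank)
    return scores
```

Convex combination depends on both score scales being comparable, whereas reciprocal rank fusion uses only rank positions and so sidesteps score calibration entirely.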

* Fix coding convention issues in Hybrid class for CI

- Fix black formatting: remove unnecessary parentheses, remove spaces around **
- Fix pylint too-many-branches: extract calibrate() method from logodds()
- Fix pylint unused-variable: rename score to _ in rrf()

* Add BB25 normalization for sparse encoders and fix IVFSparse topn bug

- Support `normalize: bb25` config for sparse encoder scoring, enabling
  Bayesian sigmoid calibration as an alternative to default linear
  normalization. Reuses existing Normalize.bayes() infrastructure.

- Fix dimension check in IVFSparse.topn(): use scores.shape[1] (number
  of data items) instead of scores.shape[0] (number of queries) for the
  argpartition kth bound check. The previous code caused ValueError when
  the number of centroids was less than nprobe.
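
The bound check described above can be illustrated with a small standalone function. This is a sketch of the technique, not the IVFSparse.topn() source: the name `topn`, its signature, and the fallback to a full sort are assumptions. The key point is that `np.argpartition` along axis 1 requires `kth` to be smaller than `scores.shape[1]`.

```python
import numpy as np


def topn(scores, limit):
    # scores has shape (queries, items)
    items = scores.shape[1]
    if limit >= items:
        # Fewer items than requested: argpartition would raise ValueError,
        # so fall back to sorting every item
        indices = np.argsort(-scores, axis=1)
    else:
        # Select the top `limit` items per row, in arbitrary order
        indices = np.argpartition(-scores, limit, axis=1)[:, :limit]
        # Order the selected items by descending score
        selected = np.take_along_axis(scores, indices, axis=1)
        order = np.argsort(-selected, axis=1)
        indices = np.take_along_axis(indices, order, axis=1)
    return indices, np.take_along_axis(scores, indices, axis=1)
```

With 5 queries over 3 items and `limit=4`, a check against `scores.shape[0]` would pass (4 < 5) and `argpartition` would then fail; checking `scores.shape[1]` routes this case to the full sort instead.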

* Add tests for BB25 sparse normalization and IVFSparse topn fix

Run Details

10 of 10 new or added lines in 2 files covered. (100.0%)

9567 of 9567 relevant lines covered (100.0%)

1.0 hits per line

Jobs

ID	Job ID	Ran	Files	Coverage
1	22238458228.1	20 Feb 2026 08:09PM UTC	250	100.0	GitHub Action Run

neuml / txtai / 22238458228
100%
