100%
master: 100%

Ran 22 Feb 2026 01:55AM UTC

Files 250

Run time 4s

Committed 20 Feb 2026 07:40PM UTC

Coverage: 100.0% (+0.4%) from 99.571%

Job # 22247893261.1

Build Type

push

github

Committed by

web-flow

Commit Message

Add BB25 normalization for sparse encoders (#1046)

* Add log-odds conjunction fusion for BB25 hybrid search

BB25 normalization outputs calibrated probabilities, but the existing
hybrid fusion uses convex combination which discards the Bayesian
probability semantics. This causes BB25 to regress on 4/5 BEIR datasets.

Add log-odds conjunction fusion (from "From Bayesian Inference to Neural
Computation") that correctly combines probability signals in logit space
with per-query dynamic calibration for dense cosine scores.
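To illustrate the idea of fusing in logit space, here is a minimal sketch, not the code from this commit. It assumes the sparse scores are already calibrated probabilities, and it assumes a simple per-query calibration for dense cosine scores (standardize within the query, then apply a sigmoid). The function names, the `weight` parameter, and the weighted sum of logits are illustrative assumptions; the actual formula in the cited work may differ.

```python
import numpy as np


def logit(p, eps=1e-6):
    """Map probabilities to log-odds, clipping to avoid infinities."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))


def sigmoid(x):
    """Map log-odds back to probabilities."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def calibrate(cosine):
    """Hypothetical per-query calibration: z-score the scores, then sigmoid."""
    cosine = np.asarray(cosine, dtype=float)
    std = cosine.std()
    if std == 0:
        # All scores identical: no evidence either way
        return np.full_like(cosine, 0.5)
    return sigmoid((cosine - cosine.mean()) / std)


def logodds(sparse, dense, weight=0.5):
    """Combine sparse probabilities and dense cosine scores in logit space."""
    fused = weight * logit(calibrate(dense)) + (1 - weight) * logit(sparse)
    return sigmoid(fused)


# Scores for three candidate documents returned for a single query
fused = logodds(sparse=[0.9, 0.5, 0.1], dense=[0.8, 0.5, 0.2])
```

Unlike a convex combination of raw scores, the result here is itself a probability, so a document that both signals rate as likely relevant ends up with higher fused odds than either signal would give at half weight alone.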

- scoring/normalize.py: Extract Bayesian method check into isbayes()
- scoring/base.py: Add default isbayes() returning False
- scoring/tfidf.py: Add isbayes() delegating to normalizer
- search/base.py: Add logodds(), convex(), rrf() fusion methods;
  dispatch based on isbayes()

BEIR nDCG@10 results (BB25+LogOdds vs Default):
  arguana +2.23, fiqa +2.03, scidocs +0.62, scifact +1.33, nfcorpus -1.96

* Extract Hybrid class for score fusion strategies

Move logodds, convex, and rrf fusion methods from Search into
a dedicated Hybrid class, following the same pattern as Normalize.
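For comparison with the log-odds approach, the other two fusion strategies named above can be sketched as follows. This is an illustration under stated assumptions, not the Hybrid class from this commit: the function signatures, the dict and list input shapes, and the `weight` parameter are hypothetical. The constant `k=60` is the value conventionally used for reciprocal rank fusion.

```python
def convex(sparse, dense, weight=0.5):
    """Convex combination of two {id: score} dicts; missing ids score 0."""
    ids = set(sparse) | set(dense)
    return {
        uid: weight * dense.get(uid, 0.0) + (1 - weight) * sparse.get(uid, 0.0)
        for uid in ids
    }


def rrf(rankings, k=60):
    """Reciprocal rank fusion over lists of (id, score) sorted best first."""
    scores = {}
    for ranking in rankings:
        # Only the rank position matters; the raw score is ignored
        for rank, (uid, _) in enumerate(ranking, start=1):
            scores[uid] = scores.get(uid, 0.0) + 1.0 / (k + rank)
    return scores


combined = convex({"a": 1.0}, {"a": 0.0, "b": 1.0}, weight=0.25)
ranked = rrf([
    [("a", 0.9), ("b", 0.5), ("c", 0.1)],
    [("a", 0.8), ("c", 0.6), ("b", 0.2)],
])
```

Because `rrf` uses only rank positions, it is insensitive to how each retriever scales its scores, whereas `convex` depends on both inputs being normalized to a comparable range.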

* Fix coding convention issues in Hybrid class for CI

- Fix black formatting: remove unnecessary parentheses, remove spaces around **
- Fix pylint too-many-branches: extract calibrate() method from logodds()
- Fix pylint unused-variable: rename score to _ in rrf()

* Add BB25 normalization for sparse encoders and fix IVFSparse topn bug

- Support `normalize: bb25` config for sparse encoder scoring, enabling
  Bayesian sigmoid calibration as an alternative to default linear
  normalization. Reuses existing Normalize.bayes() infrastructure.

- Fix dimension check in IVFSparse.topn(): use scores.shape[1] (number
  of data items) instead of scores.shape[0] (number of queries) for the
  argpartition kth bound check. The previous code caused ValueError when
  the number of centroids was less than nprobe.
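The dimension check described in the second bullet can be reproduced with plain NumPy. The sketch below is a simplified stand-in, not the IVFSparse code: the `topn` function and its arguments are hypothetical, and `scores` is assumed to be a dense array of shape (queries, items). `np.argpartition` along axis 1 requires `kth` to be smaller than `scores.shape[1]`, so that is the dimension the guard has to test.

```python
import numpy as np


def topn(scores, n):
    """Return indices of the top n items per query, best first."""
    scores = np.asarray(scores, dtype=float)

    if n < scores.shape[1]:
        # Safe: kth=n is within bounds for the item axis
        indices = np.argpartition(-scores, n, axis=1)[:, :n]
    else:
        # Fewer items than requested: keep every item
        indices = np.tile(np.arange(scores.shape[1]), (scores.shape[0], 1))

    # Order the selected items by descending score
    selected = np.take_along_axis(scores, indices, axis=1)
    order = np.argsort(-selected, axis=1)
    return np.take_along_axis(indices, order, axis=1)


# 5 queries, 2 items, 3 results requested
scores = np.array([[0.1, 0.9]] * 5)
result = topn(scores, 3)

# Guarding on scores.shape[0] (5 > 3) would reach this call and fail
try:
    np.argpartition(-scores, 3, axis=1)
    raised = False
except ValueError:
    raised = True
```

With 5 queries and only 2 items, a check against `scores.shape[0]` passes and the partition call raises `ValueError`, which matches the failure mode described for the case where there are fewer centroids than `nprobe`.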

* Add tests for BB25 sparse normalization and IVFSparse topn fix

Coverage Stats

9567 of 9567 relevant lines covered (100.0%)

1.0 hits per line

kp-forks / txtai / 22247893261 / 1