joaoh82 / rust_sqlite / 25563224520

08 May 2026 03:09PM UTC coverage: 65.365% (+0.3%) from 65.094%
push · github · web-flow
feat(engine): HNSW probe widened to cosine + dot via per-index metric (SQLR-28) (#113)

Phase 7d.2's `try_hnsw_probe` was L2-only, so any KNN query using
`vec_distance_cosine` or `vec_distance_dot` silently fell through to
brute-force even with an HNSW index attached. Surfaced by the SQLR-23
v2 W10 bench: the HNSW variant clocked ~181 ms vs ~129 ms for
brute-force because the cosine hot loop never touched the graph.
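
To make the failure mode concrete, here is a minimal sketch of three distance functions and of why a graph ordered by one metric cannot safely answer queries under another. The function names and the negated-inner-product convention for the dot metric are assumptions for illustration; the source does not show how SQLRite defines `vec_distance_dot`.

```rust
/// Euclidean (L2) distance between two equal-length vectors.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine distance, 1 - cos(angle between a and b).
/// Zero-norm inputs produce NaN in this sketch; a real engine would
/// have to decide how to treat them.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Dot "distance", assumed here to be the negated inner product so
/// that smaller still means closer.
fn dot_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// Index of the candidate closest to `query` under `dist`.
fn nearest(query: &[f32], candidates: &[Vec<f32>], dist: fn(&[f32], &[f32]) -> f32) -> usize {
    let mut best = 0;
    for i in 1..candidates.len() {
        if dist(query, &candidates[i]) < dist(query, &candidates[best]) {
            best = i;
        }
    }
    best
}
```

For the query `[1, 0]` and the candidates `[10, 0]` and `[1, 1]`, L2 picks the second candidate while cosine and dot pick the first, so the nearest neighbour depends on the metric the ordering was built with.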

Lands as sub-phase 7d.4 — per-index distance metric, no file format
bump (the metric round-trips via the synthesized CREATE INDEX SQL in
`sqlrite_master`):

- New SQL surface: `CREATE INDEX … USING hnsw (col) WITH (metric =
  '<l2|cosine|dot>')`. Omitting the WITH clause defaults to L2,
  so pre-SQLR-28 catalogs round-trip byte-identical. Typo'd metric
  names error at CREATE INDEX time rather than silently defaulting
  to L2 — that silent fallback is exactly what we're fixing.
- New `SqlriteDialect` (wraps sqlparser's `SQLiteDialect`, only
  override is `supports_create_index_with_clause = true`).
- `HnswIndexEntry` grows a `metric: DistanceMetric` field; the load,
  rebuild, and dirty-rebuild paths all consume the per-entry metric
  instead of hard-coded L2.
- `try_hnsw_probe` widens to all three `vec_distance_*` functions
  and only fires when the index entry's metric matches the query
  function. Mismatch → brute-force fallback (correct, just slow).
- W10 bench bumped to v3; the HNSW variant creates the index
  `WITH (metric = 'cosine')`. v1/v2 numbers are not comparable.
- Tests: cosine + dot self-query through the optimizer,
  metric-mismatch fallback, unknown-metric rejection, WITH-on-btree
  rejection, save+reopen preserves cosine metric.
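
The rules in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the method names, the `vec_distance_l2` function name, the exact-match (case-sensitive) parsing, and the shape of the synthesized SQL are all hypothetical. Only `DistanceMetric`, the three metric names, and the `WITH (metric = '…')` clause come from the commit message.

```rust
/// Per-index distance metric; L2 is the default when WITH is omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum DistanceMetric {
    #[default]
    L2,
    Cosine,
    Dot,
}

impl DistanceMetric {
    /// Parse the value of `WITH (metric = '...')`. Unknown names are an
    /// error instead of a silent fallback to L2.
    fn parse(name: &str) -> Result<Self, String> {
        match name {
            "l2" => Ok(Self::L2),
            "cosine" => Ok(Self::Cosine),
            "dot" => Ok(Self::Dot),
            other => Err(format!("unknown HNSW metric '{other}'")),
        }
    }

    fn as_str(self) -> &'static str {
        match self {
            Self::L2 => "l2",
            Self::Cosine => "cosine",
            Self::Dot => "dot",
        }
    }

    /// Metric implied by a distance function in the query
    /// (`vec_distance_l2` is an assumed name).
    fn for_function(func: &str) -> Option<Self> {
        match func {
            "vec_distance_l2" => Some(Self::L2),
            "vec_distance_cosine" => Some(Self::Cosine),
            "vec_distance_dot" => Some(Self::Dot),
            _ => None,
        }
    }
}

/// The probe may use the graph only when the index metric matches the
/// query function; otherwise the caller falls back to brute force.
fn can_use_hnsw(index_metric: DistanceMetric, query_func: &str) -> bool {
    DistanceMetric::for_function(query_func) == Some(index_metric)
}

/// Rebuild catalog SQL for an index. The WITH clause is omitted for L2
/// so that older catalogs would keep their original text.
fn synthesize_create_index(name: &str, table: &str, col: &str, metric: DistanceMetric) -> String {
    let mut sql = format!("CREATE INDEX {name} ON {table} USING hnsw ({col})");
    if metric != DistanceMetric::L2 {
        sql.push_str(&format!(" WITH (metric = '{}')", metric.as_str()));
    }
    sql
}
```

Keeping the metric in the catalog SQL means reopening a database only needs to re-parse that statement, which is consistent with the "no file format bump" claim.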

Unblocks SQLR-25 (republish v2/v3 bench numbers).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

176 of 207 new or added lines in 10 files covered. (85.02%)

1 existing line in 1 file now uncovered.

9140 of 13983 relevant lines covered (65.37%)

1.2 hits per line

Source File
63.33
/src/sql/dialect.rs
Uncovered lines (marked × in the report): `SqlriteDialect::new`, `identifier_quote_style`, `supports_asc_desc_in_column_definition`, `supports_dollar_placeholder`, `supports_notnull_operator`, and the `true` return in `supports_create_index_with_clause`.

//! SQLRite SQL dialect.
//!
//! Wraps sqlparser's `SQLiteDialect` so we get every SQLite-specific
//! tokenizer/parser quirk (delimited identifiers, NOTNULL operator,
//! `LIMIT a, b`, `MATCH`/`REGEXP` infix, …) and overrides only what we
//! need for SQLRite's vector extensions:
//!
//! - `supports_create_index_with_clause = true` — lets the parser
//!   accept `CREATE INDEX … USING hnsw (col) WITH (metric = 'cosine')`.
//!   sqlparser's `SQLiteDialect` returns `false` from this method, so
//!   the WITH clause would otherwise be parked in `index_options` (or
//!   error). The PostgreSQL dialect already turns it on; we copy that
//!   behaviour here without taking the rest of the pgsql parser
//!   divergences.
//!
//! Add new dialect overrides here as the surface grows; everything not
//! explicitly listed defers to the base SQLite dialect.
use sqlparser::ast::{Expr, Statement};
use sqlparser::dialect::{Dialect, SQLiteDialect};
use sqlparser::parser::{Parser, ParserError};

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SqlriteDialect {
    inner: SQLiteDialect,
}

impl SqlriteDialect {
    pub const fn new() -> Self {
        Self {
            inner: SQLiteDialect {},
        }
    }
}

impl Dialect for SqlriteDialect {
    fn is_delimited_identifier_start(&self, ch: char) -> bool {
        self.inner.is_delimited_identifier_start(ch)
    }

    fn identifier_quote_style(&self, identifier: &str) -> Option<char> {
        self.inner.identifier_quote_style(identifier)
    }

    fn is_identifier_start(&self, ch: char) -> bool {
        self.inner.is_identifier_start(ch)
    }

    fn is_identifier_part(&self, ch: char) -> bool {
        self.inner.is_identifier_part(ch)
    }

    fn supports_filter_during_aggregation(&self) -> bool {
        self.inner.supports_filter_during_aggregation()
    }

    fn supports_start_transaction_modifier(&self) -> bool {
        self.inner.supports_start_transaction_modifier()
    }

    fn supports_in_empty_list(&self) -> bool {
        self.inner.supports_in_empty_list()
    }

    fn supports_limit_comma(&self) -> bool {
        self.inner.supports_limit_comma()
    }

    fn supports_asc_desc_in_column_definition(&self) -> bool {
        self.inner.supports_asc_desc_in_column_definition()
    }

    fn supports_dollar_placeholder(&self) -> bool {
        self.inner.supports_dollar_placeholder()
    }

    fn supports_notnull_operator(&self) -> bool {
        self.inner.supports_notnull_operator()
    }

    fn parse_statement(&self, parser: &mut Parser) -> Option<Result<Statement, ParserError>> {
        self.inner.parse_statement(parser)
    }

    fn parse_infix(
        &self,
        parser: &mut Parser,
        expr: &Expr,
        precedence: u8,
    ) -> Option<Result<Expr, ParserError>> {
        self.inner.parse_infix(parser, expr, precedence)
    }

    /// SQLRite-specific extension: `CREATE INDEX … USING hnsw (col)
    /// WITH (metric = 'cosine')` is the canonical way to pick a
    /// non-L2 distance metric for an HNSW index. See
    /// `docs/supported-sql.md` and `try_hnsw_probe`.
    fn supports_create_index_with_clause(&self) -> bool {
        true
    }
}
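
The explicit forwarding in the file above matters because a trait's default methods know nothing about the wrapped value. The following self-contained sketch shows the pattern with a hypothetical two-method trait, `MiniDialect`, standing in for sqlparser's much larger `Dialect` trait. All type names here are invented for illustration.

```rust
/// Hypothetical stand-in for a dialect trait: every capability has a
/// conservative default of `false`.
trait MiniDialect {
    fn supports_limit_comma(&self) -> bool {
        false
    }
    fn supports_create_index_with_clause(&self) -> bool {
        false
    }
}

/// Base dialect that turns on one capability.
struct BaseSqlite;

impl MiniDialect for BaseSqlite {
    fn supports_limit_comma(&self) -> bool {
        true
    }
}

/// Wrapper that forwards what the base overrides and adds one override
/// of its own, mirroring the shape of the file above.
struct Forwarding {
    inner: BaseSqlite,
}

impl MiniDialect for Forwarding {
    fn supports_limit_comma(&self) -> bool {
        self.inner.supports_limit_comma()
    }
    fn supports_create_index_with_clause(&self) -> bool {
        true
    }
}

/// Wrapper that forgets to forward: it silently gets the trait default
/// instead of the base dialect's answer.
#[allow(dead_code)]
struct Forgetful {
    inner: BaseSqlite,
}

impl MiniDialect for Forgetful {
    fn supports_create_index_with_clause(&self) -> bool {
        true
    }
}
```

With these definitions, `Forwarding` reports `true` for both capabilities, while `Forgetful` loses `supports_limit_comma` even though it holds a `BaseSqlite`. That is why a wrapper of this kind has to forward each method the base dialect overrides.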