Retrieval Stack

  1. query rewrite - multi-turn follow-ups rewritten to standalone queries (see Chat Internals)
  2. lexical candidates from FTS5
  3. semantic candidates from vector search
  4. merge + dedupe
  5. rerank top candidates
  6. apply feedback-weighted adjustment
The rewritten query from step 1 is used both for retrieval (steps 2-6) and for final answer generation. This ensures the LLM sees a focused question like “Python usage by Nestlé data teams” instead of the raw “tell me more about it”.
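The merge + dedupe step (4) can be sketched as follows. This is an illustrative implementation, not the actual one: the `Candidate` shape and the keep-best-score-per-document dedupe rule are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    text: str
    score: float  # retrieval score (normalized lexical rank or vector similarity)

def merge_and_dedupe(lexical, semantic):
    """Union lexical (FTS5) and semantic (vector) candidates, keeping the
    best score seen for each doc_id so duplicates are counted once."""
    best = {}
    for cand in list(lexical) + list(semantic):
        prev = best.get(cand.doc_id)
        if prev is None or cand.score > prev.score:
            best[cand.doc_id] = cand
    # Highest-scoring candidates first, ready for the reranker (step 5).
    return sorted(best.values(), key=lambda c: c.score, reverse=True)
```

Note this assumes the two score scales are comparable; in practice the reranker in step 5 is what makes the final ordering robust to scale mismatches.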

Default Retrieval Shape

  • high-recall candidate generation before rerank
  • final top-k context for generation
  • rerank resilience: transient errors are retried 2x with backoff; if retries are exhausted, fall back to vector similarity scores (L2-to-similarity conversion). A reranker_available flag is threaded through timing and response data for observability
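The resilience behavior above can be sketched like this. The exception type, the `1 / (1 + d)` L2-to-similarity conversion, and the candidate dict shape are assumptions for illustration, not the actual implementation.

```python
import time

class TransientRerankError(Exception):
    """Stand-in for a retryable reranker failure (timeout, 5xx, etc.)."""

def l2_to_similarity(distance):
    """Map an L2 distance to a (0, 1] similarity; smaller distance scores higher."""
    return 1.0 / (1.0 + distance)

def rerank_with_fallback(candidates, rerank_fn, retries=2, backoff_s=0.5):
    """Call the reranker, retrying transient errors with exponential backoff.
    On exhaustion, fall back to ordering by similarity derived from each
    candidate's stored L2 distance. Returns (ranked, reranker_available) so
    callers can thread the availability flag into timing/response metadata."""
    for attempt in range(retries + 1):
        try:
            return rerank_fn(candidates), True
        except TransientRerankError:
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))
    ranked = sorted(candidates, key=lambda c: l2_to_similarity(c["l2"]), reverse=True)
    return ranked, False
```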

Feedback Influence

Source-level feedback can demote repeatedly low-value sources. Canonical source-id normalization prevents alias double-counting. Optional time-decay mode can prioritize recent feedback over stale signals.
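A minimal sketch of the feedback aggregation, assuming +1/-1 vote events, a simple lowercase/trailing-slash normalization rule, and an exponential half-life for the time-decay mode; all three are illustrative assumptions, not the documented behavior.

```python
import time

def canonical_source_id(source_id):
    """Normalize source-id aliases so feedback on the same source isn't
    double-counted. (Illustrative rule: lowercase, strip trailing slash.)"""
    return source_id.strip().lower().rstrip("/")

def feedback_weights(events, now=None, half_life_s=7 * 24 * 3600):
    """Aggregate (source_id, vote, timestamp) events into a per-source
    adjustment. Older events decay exponentially, so recent feedback
    outweighs stale signals; a negative total can demote a source."""
    now = now if now is not None else time.time()
    totals = {}
    for source_id, vote, ts in events:
        decay = 0.5 ** ((now - ts) / half_life_s)  # 1.0 now, 0.5 after one half-life
        key = canonical_source_id(source_id)
        totals[key] = totals.get(key, 0.0) + vote * decay
    return totals
```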

Multi-Namespace Queries

“All namespace” flows merge results across namespaces, then rerank globally.
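Sketched below under assumed function names (`search_fn`, `rerank_fn` are hypothetical): each namespace is searched independently, candidates are pooled with their origin namespace attached, and a single global rerank makes scores comparable across namespaces.

```python
def query_all_namespaces(namespaces, search_fn, rerank_fn, top_k=10):
    """Run the query against every namespace, pool the candidates, then
    rerank the pooled set globally rather than per-namespace."""
    pooled = []
    for ns in namespaces:
        for cand in search_fn(ns):
            pooled.append(dict(cand, namespace=ns))  # remember origin namespace
    return rerank_fn(pooled)[:top_k]
```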

No-Result Behavior

When evidence quality is insufficient, chat can route to deterministic follow-up options (retry in library, search online, or rephrase).
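A sketch of the routing decision, assuming a score-threshold notion of "insufficient evidence" (the threshold and the option identifiers are illustrative, not the actual values):

```python
FOLLOW_UPS = ("retry_in_library", "search_online", "rephrase")

def route_on_low_evidence(candidates, min_score=0.3, min_count=1):
    """Answer from evidence when enough strong candidates exist; otherwise
    return the deterministic follow-up options instead of hallucinating."""
    strong = [c for c in candidates if c["score"] >= min_score]
    if len(strong) >= min_count:
        return {"answerable": True, "evidence": strong}
    return {"answerable": False, "options": list(FOLLOW_UPS)}
```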