
Pipeline Summary

Entry Points

  • POST /api/chat (single JSON response)
  • POST /api/chat/stream (SSE stream)
Both paths share core orchestration and persist user messages before downstream retrieval/generation.
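A minimal sketch of how the two entry points can share one orchestration core while still persisting the user message first. It assumes a generator-based design. `Store`, `orchestrate`, `chat_json`, and `chat_stream` are hypothetical names, and the echo output is a placeholder for the real routing, retrieval, and generation.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the messages table."""
    messages: list = field(default_factory=list)

    def save(self, conversation_id: str, role: str, content: str) -> None:
        self.messages.append(
            {"conversation_id": conversation_id, "role": role, "content": content}
        )


def orchestrate(store: Store, conversation_id: str, text: str) -> Iterator[str]:
    """Shared core used by both endpoints; yields answer chunks."""
    # Persist the user turn before any downstream retrieval or generation,
    # so it is kept even if a later stage fails.
    store.save(conversation_id, "user", text)
    # Placeholder for routing, retrieval, and generation.
    yield "Echo: "
    yield text


def chat_json(store: Store, conversation_id: str, text: str) -> dict:
    """POST /api/chat: drain the shared generator into one response body."""
    answer = "".join(orchestrate(store, conversation_id, text))
    store.save(conversation_id, "assistant", answer)
    return {"answer": answer}


def chat_stream(store: Store, conversation_id: str, text: str) -> Iterator[str]:
    """POST /api/chat/stream: forward the same chunks as they are produced."""
    parts = []
    for chunk in orchestrate(store, conversation_id, text):
        parts.append(chunk)
        yield chunk
    store.save(conversation_id, "assistant", "".join(parts))
```

Because `orchestrate` is a generator, its body starts running only when a caller begins iterating. Both wrappers iterate immediately, so the user message is always saved before the first chunk is produced.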

Routing Buckets

Deterministic buckets:
  • clear_meta
  • ambiguous
  • general
  • skill
  • content
  • web_search
Follow-up state for the ambiguous bucket is persisted with the conversation, so multi-turn disambiguation resolves deterministically. Skill dispatch runs before the web_search and content buckets whenever a registered pattern or slash command matches.
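A rule-ordered router can make these buckets deterministic. The sketch below only assumes what is stated above: the six bucket names, persisted ambiguity state, and skill dispatch ahead of web_search and content. The keyword sets, the `/summarize` skill, the placement of clear_meta and general in the rule order, and the source-name matching are all hypothetical illustrations.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: each skill has a slash command and a trigger regex.
SKILLS = {
    "summarize": ("/summarize", re.compile(r"\bsummari[sz]e\b", re.IGNORECASE)),
}
# Hypothetical keyword sets standing in for the real bucket rules.
META = {"clear", "reset", "start over"}
GREETINGS = {"hi", "hello", "thanks"}
WEB_HINTS = ("latest", "today", "news")


@dataclass
class ConversationState:
    # Options offered by a previous "ambiguous" turn, awaiting the user's pick.
    pending_options: Optional[list] = None


def route(message: str, state: ConversationState, sources: list) -> tuple:
    """Return (bucket, detail); the same message and state give the same result."""
    text = message.strip().lower()
    if text in META:
        return ("clear_meta", None)
    # Resolve a pending ambiguity from persisted state, not from a model call.
    if state.pending_options:
        options, state.pending_options = state.pending_options, None
        if text.isdigit() and 1 <= int(text) <= len(options):
            return ("content", options[int(text) - 1])
        for option in options:
            if option.lower() in text:
                return ("content", option)
    # Skill dispatch is checked before the web_search and content buckets.
    for name, (command, pattern) in SKILLS.items():
        if text.startswith(command) or pattern.search(text):
            return ("skill", name)
    if text in GREETINGS:
        return ("general", None)
    # More than one source matches: ask, and store the options for next turn.
    matches = [s for s in sources if s.split()[0].lower() in text]
    if len(matches) > 1:
        state.pending_options = matches
        return ("ambiguous", matches)
    if any(hint in text for hint in WEB_HINTS):
        return ("web_search", None)
    return ("content", None)
```

For example, with sources "Smith 2020" and "Smith 2021", the message "What did Smith argue?" is routed to ambiguous, and a reply of "2" on the next turn resolves to content for "Smith 2021" without any model involvement.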

Multi-Turn Query Rewriting

Before retrieval, the orchestrator uses conversation history to rewrite vague follow-ups into standalone search queries. This follows the rewrite-retrieve-read pattern: the rewritten query flows end-to-end through both retrieval and generation. The rewriter returns one of three actions:
| Action | When | Example |
|---|---|---|
| RETRIEVE (default) | Any message that could benefit from library content | “tell me more”, “yes”, “for the data” |
| CONVERSE | Pure meta-commentary about the conversation itself | “be more specific”, “elaborate on point 2” |
| CLARIFY | Last resort: the message is truly unintelligible | Extremely rare by design |
Key behaviors:
  • RETRIEVE is the default. A slightly off search is better than asking the user to repeat themselves.
  • CLARIFY never fires twice in a row. A hard-coded breaker forces RETRIEVE if the previous assistant response was already a clarification.
  • CONVERSE never makes factual claims about sources. Questions like “did she mention X?” always trigger RETRIEVE.
  • The rewritten query anchors to the specific topic of the previous exchange, not the general source.
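The hard rules above can be enforced as a small guard that runs after the model proposes an action. This is a sketch under the assumption that the rewriter emits its action as a string alongside a rewritten query. `Action`, `Rewrite`, `enforce_policy`, and the `SOURCE_QUESTION` regex are hypothetical; a single regex is a crude stand-in for however the real system detects questions about sources.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRIEVE = "retrieve"
    CONVERSE = "converse"
    CLARIFY = "clarify"


@dataclass
class Rewrite:
    action: Action
    query: str


# Hypothetical heuristic for questions about what a source said.
SOURCE_QUESTION = re.compile(
    r"\b(did|does|do)\s+(he|she|they|it|the\s+\w+)\s+(mention|say|discuss|cover)\b",
    re.IGNORECASE,
)


def enforce_policy(
    proposed_action: str,
    rewritten_query: str,
    user_message: str,
    previous_action: Optional[Action],
) -> Rewrite:
    """Apply the rewriter's hard rules on top of the model's proposed action."""
    try:
        action = Action(proposed_action.strip().lower())
    except ValueError:
        action = Action.RETRIEVE  # Unparseable output falls back to the default.
    # Breaker: CLARIFY never fires twice in a row.
    if action is Action.CLARIFY and previous_action is Action.CLARIFY:
        action = Action.RETRIEVE
    # CONVERSE may not answer factual questions about sources.
    if action is Action.CONVERSE and SOURCE_QUESTION.search(user_message):
        action = Action.RETRIEVE
    # Fall back to the raw message if the model returned an empty query.
    query = rewritten_query.strip() or user_message
    return Rewrite(action, query)
```

Every override in the guard moves toward RETRIEVE, which matches the stated preference for a slightly off search over asking the user to repeat themselves.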

Content Path

  1. Rewrite query for multi-turn context (see above)
  2. Run hybrid retrieval (FTS + vector + rerank) using the rewritten query
  3. Apply feedback-weighted score adjustment
  4. Generate final answer via fallback-aware LLM path, using the rewritten query as the question
  5. Persist assistant message + provenance metadata
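Steps 2 and 3 can be sketched as fuse, rerank, then adjust. The document does not say how FTS and vector results are combined, so this sketch swaps in reciprocal rank fusion (RRF) with the commonly used constant k = 60 as an assumption. The reranker is passed in as a plain callable, and the bounded feedback multiplier in `feedback_adjust` is a hypothetical formula, not the system's actual weighting.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


def feedback_adjust(score: float, ups: int, downs: int, weight: float = 0.2) -> float:
    """Hypothetical adjustment: scales the score by less than +/- weight."""
    net = (ups - downs) / (ups + downs + 1)  # strictly between -1 and 1
    return score * (1.0 + weight * net)


def retrieve(
    fts: List[str],
    vector: List[str],
    rerank: Callable[[str], float],
    feedback: Dict[str, Tuple[int, int]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Fuse FTS and vector rankings, rerank a shortlist, then apply feedback."""
    fused = rrf_fuse([fts, vector])
    shortlist = sorted(fused, key=fused.get, reverse=True)[: top_n * 2]
    scored = []
    for doc_id in shortlist:
        ups, downs = feedback.get(doc_id, (0, 0))
        scored.append((doc_id, feedback_adjust(rerank(doc_id), ups, downs)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Here feedback is keyed by document id. The source does not state whether feedback is aggregated per message, per source, or per chunk, so treat that keying as an assumption too.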

Streaming Path Notes

The SSE stream emits typed events, all of which target a single assistant message slot in frontend state. The terminal event should be either done or error.
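One way to guarantee the terminal event is to wrap the chunk source so that every exit path ends in exactly one done or error event. This sketch uses the standard SSE wire format (an `event:` line, a `data:` line, and a blank line). The `token` event type, the `message_id` field that pins events to one assistant slot, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def sse_event(event: str, data: dict) -> str:
    """Frame one Server-Sent Event: an event line, a data line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_answer(message_id: str, chunks: Iterable[str]) -> Iterator[str]:
    """Yield typed SSE events that all target one assistant message slot."""
    try:
        for chunk in chunks:
            yield sse_event("token", {"message_id": message_id, "text": chunk})
    except Exception as exc:
        # Any failure while producing chunks ends the stream with an error event.
        yield sse_event("error", {"message_id": message_id, "detail": str(exc)})
        return
    # Normal completion ends the stream with a done event.
    yield sse_event("done", {"message_id": message_id})
```

Because done and error sit on mutually exclusive paths, the frontend can treat either one as the signal to finalize the slot.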

Persistence Touchpoints

  • conversations
  • messages (includes max_rerank_score, answer_origin)
  • feedback (separate endpoint; impacts later ranking)
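A minimal relational layout for these touchpoints, shown with Python's built-in `sqlite3` module. Only the table names and the `max_rerank_score` and `answer_origin` columns come from the list above. Every other column, the rating scale, and the helper functions `save_assistant` and `net_feedback` are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE conversations (
    id TEXT PRIMARY KEY,
    state_json TEXT NOT NULL DEFAULT '{}'  -- e.g. pending ambiguity options
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT NOT NULL REFERENCES conversations(id),
    role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL,
    max_rerank_score REAL,  -- NULL for user turns and non-retrieval answers
    answer_origin TEXT      -- hypothetical values, e.g. 'content', 'web_search'
);
CREATE TABLE feedback (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES messages(id),
    rating INTEGER NOT NULL CHECK (rating IN (-1, 1))
);
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database with the sketch schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save_assistant(conn, conversation_id, content, max_rerank_score, answer_origin) -> int:
    """Persist an assistant turn with its provenance fields; return its id."""
    cur = conn.execute(
        "INSERT INTO messages (conversation_id, role, content, max_rerank_score, answer_origin)"
        " VALUES (?, 'assistant', ?, ?, ?)",
        (conversation_id, content, max_rerank_score, answer_origin),
    )
    return cur.lastrowid


def net_feedback(conn, message_id) -> int:
    """Sum of +1/-1 ratings for a message (0 when it has none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(rating), 0) FROM feedback WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    return row[0]
```

Keeping feedback in its own table matches its separate endpoint: ratings can arrive long after the message is written and be aggregated at ranking time.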