Pipeline Summary
Entry Points
- POST /api/chat (single JSON response)
- POST /api/chat/stream (SSE stream)
Routing Buckets
Deterministic buckets:
- clear_meta
- ambiguous
- general
- skill
- content
- web_search
Multi-Turn Query Rewriting
Before retrieval, the orchestrator rewrites vague follow-ups into standalone search queries using conversation history. This is the rewrite-retrieve-read pattern: the rewritten query flows end-to-end through both retrieval and generation. The rewriter returns one of three actions:

| Action | When | Example |
|---|---|---|
| RETRIEVE (default) | Any message that could benefit from library content | “tell me more”, “yes”, “for the data” |
| CONVERSE | Pure meta-commentary about the conversation itself | “be more specific”, “elaborate on point 2” |
| CLARIFY | Last resort - message is truly unintelligible | Extremely rare by design |
- RETRIEVE is the default. A slightly off search is better than asking the user to repeat themselves.
- CLARIFY never fires twice in a row. A hard-coded breaker forces RETRIEVE if the previous response was already a clarification.
- CONVERSE never makes factual claims about sources. Questions like “did she mention X?” always trigger RETRIEVE.
- The rewritten query anchors to the specific topic of the previous exchange, not the general source.
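
The "never twice in a row" rule can be sketched as a small guard that runs after the rewriter returns. This is a minimal illustration, not the project's actual code: `Action`, `RewriteResult`, and `apply_clarify_breaker` are hypothetical names, and the sketch assumes the orchestrator already knows whether the previous assistant response was a clarification.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETRIEVE = "retrieve"  # default: search the library
    CONVERSE = "converse"  # meta-commentary about the conversation
    CLARIFY = "clarify"    # last resort: ask the user what they meant


@dataclass
class RewriteResult:
    # Hypothetical shape of the rewriter output.
    action: Action
    query: str  # standalone search query produced by the rewriter


def apply_clarify_breaker(
    result: RewriteResult, previous_was_clarification: bool
) -> RewriteResult:
    """Force RETRIEVE when CLARIFY would otherwise fire twice in a row."""
    if result.action is Action.CLARIFY and previous_was_clarification:
        return RewriteResult(action=Action.RETRIEVE, query=result.query)
    return result
```

Keeping the breaker outside the rewriter prompt means the guarantee holds even if the model ignores its instructions.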
Content Path
- Rewrite query for multi-turn context (see above)
- Run hybrid retrieval (FTS + vector + rerank) using the rewritten query
- Apply feedback-weighted score adjustment
- Generate final answer via fallback-aware LLM path, using the rewritten query as the question
- Persist assistant message + provenance metadata
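
The feedback-weighted adjustment step above might look roughly like the following. The actual formula is not documented here, so this sketch assumes a simple additive bonus: `Hit`, `adjust_scores`, the `weight` value, and the "net votes per chunk" feedback shape are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical retrieval result after reranking.
    chunk_id: str
    rerank_score: float


def adjust_scores(
    hits: list[Hit], feedback: dict[str, int], weight: float = 0.1
) -> list[Hit]:
    """Shift each rerank score by weight * net votes, then re-sort descending.

    `feedback` maps chunk_id -> (upvotes - downvotes); chunks with no
    recorded feedback keep their original score. The additive form is an
    assumption made for illustration only.
    """
    adjusted = [
        Hit(h.chunk_id, h.rerank_score + weight * feedback.get(h.chunk_id, 0))
        for h in hits
    ]
    return sorted(adjusted, key=lambda h: h.rerank_score, reverse=True)
```

With `hits = [Hit("a", 0.80), Hit("b", 0.75)]` and `feedback = {"b": 1}`, chunk `b` moves ahead of `a` because its adjusted score exceeds 0.80.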
Streaming Path Notes
The SSE stream emits typed events and always targets a single assistant slot in frontend state. The terminal event should be either done or error.
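
As a rough sketch of the consumer side, the following parses a raw SSE buffer and folds each event into one assistant slot. Only the `done` and `error` event names come from this document; the `token` event, its JSON payload, and the slot shape are assumptions made for illustration.

```python
import json
from typing import Iterator


def parse_sse(raw: str) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from SSE text; events end at a blank line."""
    for block in raw.split("\n\n"):
        event, data_lines = "message", []  # "message" is the SSE default type
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # Per the SSE format, drop one leading space after the colon.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            yield event, "\n".join(data_lines)


def apply_event(slot: dict, event: str, data: str) -> dict:
    """Fold one event into the single assistant slot (hypothetical shape)."""
    if slot["status"] != "streaming":
        return slot  # ignore anything after a terminal event
    if event == "token":  # assumed event name and payload
        slot["text"] += json.loads(data)["text"]
    elif event in ("done", "error"):
        slot["status"] = event
    return slot
```

Because every event is applied to the same slot, a late or duplicate event after `done` or `error` cannot create a second assistant message.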
Persistence Touchpoints
- conversations
- messages (includes max_rerank_score, answer_origin)
- feedback (separate endpoint; impacts later ranking)