Store Inventory
| Store | Path | Purpose |
|---|---|---|
| Main relational state | data/samaritan.db | Conversations, messages, feedback, projects, recommendation events |
| Lexical index | data/samaritan_fts.db | FTS5/BM25 retrieval index for chunk text |
| Vector store | data/chroma/ | Embeddings + chunk metadata, namespace-scoped |
| Related-source cache | data/samaritan_connections.db | Cached source-to-source similarity links |
| Learnings | data/samaritan.db | Extracted concepts, tools, code snippets per source (tables: learnings, learning_concepts, learning_tools, learning_code_snippets) |
| Eval artifacts | data/eval/results/*.json | Eval run outputs and comparisons |
Runtime Roles
samaritan.db
System-of-record SQL store for user-facing state:
- conversation timeline
- message provenance fields
- chat feedback rows
- project records
- usage/cost telemetry rows
samaritan_fts.db
Lexical search companion to vector retrieval.
- stores chunk text + lightweight metadata keys
- queried for exact-term and sparse-match recall
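A minimal sketch of what a BM25-ranked lookup against an FTS5 index can look like. The table name `chunks_fts`, its columns, and the `lexical_search` helper are hypothetical; the actual schema in samaritan_fts.db may differ.

```python
import sqlite3

# In-memory stand-in for data/samaritan_fts.db (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE chunks_fts "
    "USING fts5(text, chunk_id UNINDEXED, namespace UNINDEXED)"
)
conn.executemany(
    "INSERT INTO chunks_fts (text, chunk_id, namespace) VALUES (?, ?, ?)",
    [
        ("circuit breaker opens after repeated provider failures", "c1", "docs"),
        ("embeddings are stored per namespace in the vector store", "c2", "docs"),
        ("the breaker resets after a cooldown period", "c3", "docs"),
    ],
)


def lexical_search(conn, query, namespace, limit=5):
    """Return (chunk_id, score) rows for an exact-term query.

    FTS5's bm25() yields lower (more negative) values for better matches,
    so ascending order puts the strongest hits first.
    """
    return conn.execute(
        "SELECT chunk_id, bm25(chunks_fts) AS score "
        "FROM chunks_fts "
        "WHERE chunks_fts MATCH ? AND namespace = ? "
        "ORDER BY score LIMIT ?",
        (query, namespace, limit),
    ).fetchall()


hits = lexical_search(conn, "breaker", "docs")
```

Only the `text` column is tokenized here; `chunk_id` and `namespace` are stored as lightweight metadata keys and used for filtering and joining back to the vector results.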
data/chroma/
Namespace collections for semantic retrieval.
- ingestion writes deterministic chunk IDs and metadata
- retrieval reads vectors + metadata for candidate generation
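One way deterministic chunk IDs can be derived is by hashing the chunk's identity and content. This is a sketch under assumptions: the `chunk_id` and `build_records` helpers, the choice of hashed fields, and the metadata keys are illustrative, not Samaritan's actual scheme.

```python
import hashlib


def chunk_id(namespace: str, source_id: str, chunk_index: int, text: str) -> str:
    """Derive a stable ID from the chunk's identity and content."""
    # The unit separator (\x1f) keeps field boundaries unambiguous.
    payload = "\x1f".join([namespace, source_id, str(chunk_index), text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


def build_records(namespace: str, source_id: str, chunks: list[str]) -> list[dict]:
    """Pair each chunk with its deterministic ID and metadata (hypothetical keys)."""
    return [
        {
            "id": chunk_id(namespace, source_id, i, text),
            "document": text,
            "metadata": {
                "namespace": namespace,
                "source_id": source_id,
                "chunk_index": i,
            },
        }
        for i, text in enumerate(chunks)
    ]


first = build_records("docs", "src-1", ["alpha", "beta"])
second = build_records("docs", "src-1", ["alpha", "beta"])
```

Because the same input always produces the same ID, re-ingesting an unchanged source can overwrite existing entries (for example through an upsert) instead of creating duplicates.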
samaritan_connections.db
Performance cache, not source of truth.
- related-source edges can be recomputed
- invalidated after ingestion/deletion events
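Since the cache holds only recomputable edges, invalidation can be as simple as deleting every link that touches the affected source. The sketch below assumes a hypothetical `source_links` table and per-source invalidation; the real implementation may use a different schema or clear the cache more broadly.

```python
import sqlite3

# In-memory stand-in for data/samaritan_connections.db (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_links ("
    "source_a TEXT, source_b TEXT, similarity REAL, "
    "PRIMARY KEY (source_a, source_b))"
)
conn.executemany(
    "INSERT INTO source_links VALUES (?, ?, ?)",
    [("s1", "s2", 0.91), ("s2", "s3", 0.74), ("s3", "s4", 0.66)],
)


def invalidate_links(conn, source_id):
    """Drop every cached edge that involves source_id; return the count removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM source_links WHERE source_a = ? OR source_b = ?",
            (source_id, source_id),
        )
    return cur.rowcount


removed = invalidate_links(conn, "s2")
remaining = conn.execute("SELECT source_a, source_b FROM source_links").fetchall()
```

Dropped edges are rebuilt the next time related sources are computed, which is safe because this store is never the source of truth.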
Consistency Model
Samaritan writes to multiple stores during ingestion (Chroma + FTS + learnings). These writes are coordinated by application flow, not multi-store transactions. Operationally:
- ingestion completion implies indexes are expected to be query-ready
- temporary drift is possible during in-flight operations
- cache invalidation keeps retrieval paths aligned after updates
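To illustrate what "coordinated by application flow" can mean in practice, here is a sketch of sequential writes with best-effort cleanup on failure. Everything in it is an assumption for illustration: `InMemoryStore` and `ingest` are hypothetical stand-ins, and the source does not state that Samaritan performs compensating deletes.

```python
class InMemoryStore:
    """Stand-in for one backing store (Chroma, FTS, or learnings)."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail  # simulate a write failure for this store
        self.rows = {}

    def write(self, source_id, payload):
        if self.fail:
            raise RuntimeError(f"{self.name} write failed")
        self.rows[source_id] = payload

    def delete(self, source_id):
        self.rows.pop(source_id, None)


def ingest(source_id, payload, stores):
    """Write to each store in order; on failure, undo the earlier writes.

    There is no shared transaction: between the first and last write the
    stores disagree, which is the temporary drift described above.
    """
    written = []
    try:
        for store in stores:
            store.write(source_id, payload)
            written.append(store)
    except Exception:
        for store in reversed(written):
            store.delete(source_id)
        raise


chroma = InMemoryStore("chroma")
fts = InMemoryStore("fts")
learnings = InMemoryStore("learnings", fail=True)

try:
    ingest("src-1", {"chunks": 3}, [chroma, fts, learnings])
    outcome = "ok"
except RuntimeError:
    outcome = "rolled back"
```

A reader that queries between two of these writes can see a chunk in one index and not the other, so only a completed ingestion should be treated as query-ready.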
Durability
Persists across restart:
- SQLite files
- Chroma storage
- eval JSON artifacts
Does not persist across restart:
- retriever object cache
- circuit-breaker runtime state
Deployment Note
Persist/mount the entire data/ directory in non-ephemeral environments.
If data/ is lost, chat history, retrieval indexes, and local artifacts are lost.
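A startup check along these lines can catch an unmounted or partially mounted data/ directory early. The `missing_stores` helper is hypothetical; the relative paths are taken from the Store Inventory table, and a fresh install will legitimately report stores that have not been created yet.

```python
from pathlib import Path

# Relative paths from the Store Inventory table, under data/.
EXPECTED = [
    "samaritan.db",
    "samaritan_fts.db",
    "chroma",
    "samaritan_connections.db",
    "eval/results",
]


def missing_stores(data_dir):
    """Return the expected stores that are absent under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    absent = missing_stores("data")
    if absent:
        print("Missing under data/:", ", ".join(absent))
```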