Store Inventory

| Store | Path | Purpose |
| --- | --- | --- |
| Main relational state | data/samaritan.db | Conversations, messages, feedback, projects, recommendation events |
| Lexical index | data/samaritan_fts.db | FTS5/BM25 retrieval index for chunk text |
| Vector store | data/chroma/ | Embeddings + chunk metadata, namespace-scoped |
| Related-source cache | data/samaritan_connections.db | Cached source-to-source similarity links |
| Learnings | data/samaritan.db | Extracted concepts, tools, code snippets per source (tables: learnings, learning_concepts, learning_tools, learning_code_snippets) |
| Eval artifacts | data/eval/results/*.json | Eval run outputs and comparisons |

Runtime Roles

samaritan.db

System-of-record SQL store for user-facing state:
  • conversation timeline
  • message provenance fields
  • chat feedback rows
  • project records
  • usage/cost telemetry rows

samaritan_fts.db

Lexical search companion to vector retrieval.
  • stores chunk text + lightweight metadata keys
  • queried for exact-term and sparse-match recall
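
As a rough illustration of exact-term lookup with FTS5 and BM25 ranking, the sketch below builds a tiny in-memory index. The table name `chunks`, its columns, and the `lexical_search` helper are hypothetical; the actual schema of samaritan_fts.db is not shown here. It also assumes the SQLite library bundled with Python was compiled with FTS5.

```python
import sqlite3

# Hypothetical schema: chunk text plus lightweight metadata keys.
# UNINDEXED columns are stored but excluded from the full-text index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5("
    "chunk_id UNINDEXED, namespace UNINDEXED, text)"
)
conn.executemany(
    "INSERT INTO chunks (chunk_id, namespace, text) VALUES (?, ?, ?)",
    [
        ("c1", "docs", "SQLite FTS5 ranks matches with the BM25 function"),
        ("c2", "docs", "Chroma stores embeddings for semantic retrieval"),
        ("c3", "notes", "BM25 rewards rare exact terms in short chunks"),
    ],
)


def lexical_search(conn, query, namespace, limit=5):
    """Return (chunk_id, score) pairs for one namespace, best match first.

    FTS5's bm25() yields lower values for better matches, so an
    ascending sort puts the strongest hit at the top.
    """
    return conn.execute(
        "SELECT chunk_id, bm25(chunks) AS score FROM chunks "
        "WHERE chunks MATCH ? AND namespace = ? "
        "ORDER BY score LIMIT ?",
        (query, namespace, limit),
    ).fetchall()


hits = lexical_search(conn, "bm25", "docs")
```

Here `hits` contains only chunk `c1`: `c3` also mentions BM25 but lives in a different namespace, and `c2` has no matching term.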

data/chroma/

Namespace collections for semantic retrieval.
  • ingestion writes deterministic chunk IDs and metadata
  • retrieval reads vectors + metadata for candidate generation
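
One way to obtain deterministic chunk IDs is to hash the fields that identify a chunk. The sketch below is an assumption about how such IDs could be derived, not the scheme Samaritan actually uses; `chunk_id`, `build_records`, and the metadata keys are hypothetical names.

```python
import hashlib


def chunk_id(namespace: str, source_id: str, index: int, text: str) -> str:
    """Derive a stable ID from a chunk's identity and content (hypothetical scheme)."""
    # A separator that cannot occur in normal text avoids ambiguous joins.
    payload = "\x1f".join([namespace, source_id, str(index), text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


def build_records(namespace: str, source_id: str, chunks: list[str]) -> list[dict]:
    """Pair each chunk with its ID and the metadata a vector store would keep."""
    return [
        {
            "id": chunk_id(namespace, source_id, i, text),
            "document": text,
            "metadata": {
                "namespace": namespace,
                "source_id": source_id,
                "chunk_index": i,
            },
        }
        for i, text in enumerate(chunks)
    ]


first = build_records("docs", "src-1", ["alpha", "beta"])
again = build_records("docs", "src-1", ["alpha", "beta"])
other = build_records("notes", "src-1", ["alpha", "beta"])
```

Because the same input always yields the same ID, re-ingesting an unchanged source can overwrite existing entries (for example through an upsert-style write) rather than creating duplicates.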

samaritan_connections.db

Performance cache, not source of truth.
  • related-source edges can be recomputed
  • invalidated after ingestion/deletion events
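
The recompute-on-miss behaviour can be sketched as follows. The `connections` table, its columns, and both helper functions are hypothetical, and the invalidation policy shown (clear everything) is simply the coarsest safe choice for illustration; the real cache may invalidate more selectively.

```python
import sqlite3

# Hypothetical cache schema: one row per source-to-source edge.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connections ("
    "source_id TEXT, related_id TEXT, score REAL, "
    "PRIMARY KEY (source_id, related_id))"
)


def invalidate_all(conn):
    """Drop every cached edge; called after an ingestion or deletion event."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM connections")


def get_related(conn, source_id, recompute):
    """Return cached edges for source_id, recomputing and storing them on a miss."""
    rows = conn.execute(
        "SELECT related_id, score FROM connections "
        "WHERE source_id = ? ORDER BY score DESC",
        (source_id,),
    ).fetchall()
    if rows:
        return rows
    fresh = sorted(recompute(source_id), key=lambda edge: edge[1], reverse=True)
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO connections VALUES (?, ?, ?)",
            [(source_id, related_id, score) for related_id, score in fresh],
        )
    return fresh


calls = []


def fake_similarity(source_id):
    """Stand-in for the real similarity computation."""
    calls.append(source_id)
    return [("c", 0.4), ("b", 0.9)]


first = get_related(conn, "a", fake_similarity)   # miss: recomputed
second = get_related(conn, "a", fake_similarity)  # hit: served from cache
invalidate_all(conn)
third = get_related(conn, "a", fake_similarity)   # miss again after invalidation
```

Since every edge can be rebuilt from the primary stores, losing or clearing this file costs recomputation time but no data.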

Consistency Model

Samaritan writes to multiple stores during ingestion (Chroma + FTS + learnings). These writes are coordinated by application flow, not by multi-store transactions, so there is no atomic commit across stores. Operationally:
  • once ingestion completes, the indexes are expected to be query-ready
  • temporary drift between stores is possible while operations are in flight
  • cache invalidation keeps retrieval paths aligned after updates
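
The following sketch shows what application-coordinated writes can look like when no cross-store transaction exists. Everything in it is illustrative: `MemoryStore` stands in for the real backends, and the compensating deletes on failure are an assumption about one reasonable policy, not a description of Samaritan's actual error handling.

```python
class MemoryStore:
    """In-memory stand-in for one backing store (Chroma, FTS, or learnings)."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.rows = {}

    def write(self, source_id, chunks):
        if self.fail:
            raise RuntimeError(f"{self.name} write failed")
        self.rows[source_id] = list(chunks)

    def delete(self, source_id):
        self.rows.pop(source_id, None)


def ingest(source_id, chunks, stores, invalidate_cache):
    """Write to each store in order, then invalidate caches.

    Between the first and last write the stores disagree (temporary drift).
    If a write fails, earlier writes are undone with compensating deletes
    (assumed policy) and the error is re-raised.
    """
    written = []
    try:
        for store in stores:
            store.write(source_id, chunks)
            written.append(store)
    except Exception:
        for store in reversed(written):
            store.delete(source_id)
        raise
    invalidate_cache()


events = []
chroma, fts, learnings = MemoryStore("chroma"), MemoryStore("fts"), MemoryStore("learnings")
ingest("src-1", ["alpha", "beta"], [chroma, fts, learnings], lambda: events.append("invalidated"))

# Failure case: the second store rejects the write, so the first is cleaned up.
broken = MemoryStore("fts", fail=True)
try:
    ingest("src-2", ["gamma"], [chroma, broken, learnings], lambda: events.append("invalidated"))
    failed = False
except RuntimeError:
    failed = True
```

After the failed run, `src-2` is absent from every store and the cache is not invalidated a second time, which keeps retrieval paths aligned with what was actually committed.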

Durability

Persists across restart:
  • SQLite files
  • Chroma storage
  • eval JSON artifacts
In-memory only (resets on restart):
  • retriever object cache
  • circuit-breaker runtime state

Deployment Note

Persist/mount the entire data/ directory in non-ephemeral environments. If data/ is lost, chat history, retrieval indexes, and local artifacts are lost.
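
A startup check along these lines can flag a data/ directory that was not mounted or has been wiped. This is a minimal sketch: the `missing_paths` helper is hypothetical, and the list of expected entries is taken from the store inventory rather than from any actual Samaritan startup code.

```python
from pathlib import Path

# Entries expected under data/, per the store inventory.
EXPECTED = [
    "samaritan.db",
    "samaritan_fts.db",
    "chroma",
    "samaritan_connections.db",
]


def missing_paths(data_dir):
    """Return the expected entries that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```

An empty result means every expected store is present. A missing samaritan_connections.db is recoverable because that file is only a cache, whereas a missing samaritan.db means chat history is gone.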