Core Signals

Request timing

All API responses include:
  • X-Response-Time
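
A minimal sketch of reading this header from a client. The base URL and the example value "12ms" are assumptions for illustration; this document does not specify the header's value format, so the helper returns the raw string unparsed.

```python
import urllib.request
from typing import Mapping, Optional


def response_time_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the raw X-Response-Time value, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-response-time":
            return value
    return None


def fetch_response_time(url: str, timeout: float = 5.0) -> Optional[str]:
    """Issue a GET request and return the X-Response-Time header, if present."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return response_time_from_headers(resp.headers)
```

For example, `fetch_response_time("http://localhost:8000/health")` would return the header value for a health request, assuming the API is served at that (hypothetical) address.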

Provider state

Inspect provider/circuit status via:
  • GET /api/stats/providers

Cost and usage

Inspect LLM usage/cost telemetry via:
  • GET /api/stats/costs

Health status

  • GET /health
  • GET /health?deep=true (detailed dependency checks)
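
The two health calls can be scripted as below. This is a sketch only: the base URL is an assumption, and because the response body schema is not described here, the body is returned as text rather than interpreted.

```python
import urllib.error
import urllib.request
from typing import Tuple


def health_url(base_url: str, deep: bool = False) -> str:
    """Build the health endpoint URL, optionally requesting dependency checks."""
    url = base_url.rstrip("/") + "/health"
    return (url + "?deep=true") if deep else url


def check_health(base_url: str, deep: bool = False, timeout: float = 5.0) -> Tuple[int, str]:
    """Return (status_code, body_text) for the health endpoint."""
    try:
        with urllib.request.urlopen(health_url(base_url, deep), timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status and body worth inspecting.
        return err.code, err.read().decode("utf-8", errors="replace")
```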

Eval and quality telemetry

  • GET /api/evals/runs
  • GET /api/evals/runs/{run_id}
  • GET /api/evals/compare

Streaming Timing Fields

The terminal payload of a chat stream can include:
  • timing.search_ms
  • timing.rerank_ms
  • timing.llm_ms
  • timing.total_ms
These are useful for latency decomposition and regression analysis.
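
As an illustration of latency decomposition, the sketch below splits total_ms into the three named stages plus an unattributed remainder. It assumes that all values are milliseconds, that total_ms covers the three stages, and that any field may be absent (the payload "can include" them); none of this is confirmed beyond the field names above.

```python
from typing import Dict, Mapping, Optional

# Stage fields named in the stream's terminal payload.
STAGES = ("search_ms", "rerank_ms", "llm_ms")


def decompose_timing(timing: Mapping[str, float]) -> Dict[str, object]:
    """Split total_ms into known stages, their shares, and a remainder."""
    stages = {name: float(timing[name]) for name in STAGES if name in timing}
    total: Optional[float] = timing.get("total_ms")
    result: Dict[str, object] = {"stages": stages, "other_ms": None, "shares": {}}
    if total is None or total <= 0:
        return result  # nothing to compare against
    total = float(total)
    # Time not attributed to any named stage (assumed: network, serialization, etc.).
    result["other_ms"] = total - sum(stages.values())
    result["shares"] = {name: value / total for name, value in stages.items()}
    return result
```

Comparing the shares between two builds is one way to spot which stage regressed; a growing other_ms suggests time being spent outside the three instrumented stages.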

Logging and Tracing

The runtime emits structured logs and supports optional tracing integrations. If an optional tracing backend is unavailable, core API behavior should remain operational.
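
One common way to get this kind of graceful degradation is an optional import with a no-op fallback. The sketch below is illustrative only and is not taken from the runtime's code: the OpenTelemetry API is used as an assumed example backend, and `span` and `handle_request` are hypothetical names.

```python
import contextlib
import logging

logger = logging.getLogger("observability")

try:
    # Assumed example backend; the actual integrations are not named in this document.
    from opentelemetry import trace

    _tracer = trace.get_tracer(__name__)
except ImportError:
    _tracer = None
    logger.info("tracing backend not installed; continuing without spans")


def span(name: str):
    """Return a tracing span if a backend is available, else a no-op context."""
    if _tracer is None:
        return contextlib.nullcontext()
    return _tracer.start_as_current_span(name)


def handle_request(payload: str) -> dict:
    """Hypothetical handler: behaves the same with or without tracing."""
    with span("handle_request"):
        return {"echo": payload}
```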

Services Tab Mapping

The /monitor surface maps to observability data like this:
Services tab    Observability inputs
Health          /health, /health?deep=true
Architecture    architecture docs + runtime status metadata
Eval Runs       /api/evals/runs, /api/evals/compare
Databases       local store topology (data/*.db, data/chroma/) + data-store docs
Exploration     local curated monitor queue for external tools to evaluate
Tracing         provider stats + tracing integration state

Practical Debug Loop

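  1. Confirm the service is up: GET /health, then GET /health?deep=true for dependency checks.
  2. Check provider status and circuit state via GET /api/stats/providers.
  3. Inspect the X-Response-Time header and the stream timing fields (timing.search_ms, timing.rerank_ms, timing.llm_ms, timing.total_ms).
  4. Inspect GET /api/stats/costs for provider error spikes.
  5. Reproduce with a minimal request and capture the logs.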
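
The endpoint checks in this loop can be scripted roughly as follows. This is a sketch under stated assumptions: the base URL is hypothetical, a 2xx status is treated as "ok", and response bodies are only previewed because their schemas are not described here. The fetcher is injectable so the loop can be exercised without a running server.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Mapping, Tuple

# (label, path) pairs for the endpoints named in this document.
CHECKS: List[Tuple[str, str]] = [
    ("health", "/health"),
    ("deep health", "/health?deep=true"),
    ("provider state", "/api/stats/providers"),
    ("cost and usage", "/api/stats/costs"),
]

# A fetcher takes a path and returns (status_code, headers, body_text).
Fetcher = Callable[[str], Tuple[int, Mapping[str, str], str]]


def make_http_fetcher(base_url: str, timeout: float = 5.0) -> Fetcher:
    """Build a fetcher that issues GET requests against base_url."""
    def fetch(path: str) -> Tuple[int, Mapping[str, str], str]:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status, dict(resp.headers.items()), resp.read().decode("utf-8", "replace")
        except urllib.error.HTTPError as err:
            return err.code, dict(err.headers.items()), err.read().decode("utf-8", "replace")
    return fetch


def run_debug_loop(fetch: Fetcher) -> List[Dict[str, object]]:
    """Run each check in order and collect status plus response timing."""
    report: List[Dict[str, object]] = []
    for label, path in CHECKS:
        try:
            status, headers, body = fetch(path)
        except Exception as exc:  # keep going so later checks still run
            report.append({"check": label, "path": path, "ok": False, "error": str(exc)})
            continue
        timing = next((v for k, v in headers.items() if k.lower() == "x-response-time"), None)
        report.append({
            "check": label,
            "path": path,
            "ok": 200 <= status < 300,
            "status": status,
            "response_time": timing,
            "body_preview": body[:200],
        })
    return report
```

Against a live instance this might be called as `run_debug_loop(make_http_fetcher("http://localhost:8000"))`. The final step, reproducing with a minimal request and capturing logs, stays manual.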