Why This Matters
Chunk metadata powers:
- citation rendering
- source identity and deletion
- namespace filtering
- retrieval and ranking diagnostics
Shared Metadata Invariants
All chunk types include:
- namespace
- chunk_index
- optional taxonomy fields (labels/path/confidence/classifier_version)
Validation and serialization rules:
- unknown fields are rejected by typed models
- optional None fields are omitted from the final Chroma payload
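The invariants above can be illustrated with a minimal standard-library sketch. It is not the project's actual model: the class name `BaseChunkMetadata`, the helper names `from_dict` and `to_payload`, and the types chosen for the taxonomy fields are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class BaseChunkMetadata:
    """Hypothetical shared model; field types are assumptions."""

    # Fields every chunk type carries.
    namespace: str
    chunk_index: int
    # Optional taxonomy fields default to None.
    labels: Optional[str] = None
    path: Optional[str] = None
    confidence: Optional[float] = None
    classifier_version: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "BaseChunkMetadata":
        # Reject any key the model does not declare.
        allowed = {f.name for f in fields(cls)}
        unknown = set(raw) - allowed
        if unknown:
            raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
        return cls(**raw)

    def to_payload(self) -> dict[str, Any]:
        # Drop optional fields that are still None before storage.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

With this sketch, `BaseChunkMetadata.from_dict({"namespace": "docs", "chunk_index": 0}).to_payload()` yields only the two populated keys, and passing an undeclared key raises `ValueError`.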
Per-Type Required Fields
YouTube chunks
Required keys include:
- video_id
- title
- duration
- timestamp_start
- source_url
- namespace
- chunk_index
Article chunks
Required keys include:
- source_id
- title
- source_url
- namespace
- chunk_index
File/document chunks
Required keys include:
- source_id
- file_path
- file_name
- file_hash
- file_type
- title
- total_chunks
- namespace
- chunk_index
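The per-type required keys listed above can be checked with a small lookup table. The following is a sketch only: the type labels `"youtube"`, `"article"`, and `"file"`, the name `REQUIRED_KEYS`, and the helper `missing_required_keys` are hypothetical, and treating a `None` value as missing is an assumption.

```python
# Required keys per chunk type, copied from the lists above.
# The dictionary keys ("youtube", "article", "file") are hypothetical labels.
REQUIRED_KEYS: dict[str, frozenset[str]] = {
    "youtube": frozenset({
        "video_id", "title", "duration", "timestamp_start",
        "source_url", "namespace", "chunk_index",
    }),
    "article": frozenset({
        "source_id", "title", "source_url", "namespace", "chunk_index",
    }),
    "file": frozenset({
        "source_id", "file_path", "file_name", "file_hash", "file_type",
        "title", "total_chunks", "namespace", "chunk_index",
    }),
}


def missing_required_keys(chunk_type: str, metadata: dict) -> list[str]:
    """Return the required keys that are absent or None, sorted by name."""
    required = REQUIRED_KEYS[chunk_type]  # KeyError for an unknown type
    # Assumption: a key set to None counts as missing.
    return sorted(k for k in required if metadata.get(k) is None)
```

A value of `0` for `chunk_index` is treated as present, since the check compares against `None` rather than relying on truthiness.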
Chunk ID Patterns
- YouTube: <video_id>_<chunk_index>
- Article: <article_source_id>_<chunk_index>
- File: file_<file_hash12>_<chunk_index>
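The three ID patterns can be sketched as small formatting helpers. The function names are hypothetical, and reading `<file_hash12>` as the first 12 characters of the file hash is an assumption; the source does not say how the 12-character value is derived.

```python
def youtube_chunk_id(video_id: str, chunk_index: int) -> str:
    # Pattern: <video_id>_<chunk_index>
    return f"{video_id}_{chunk_index}"


def article_chunk_id(article_source_id: str, chunk_index: int) -> str:
    # Pattern: <article_source_id>_<chunk_index>
    return f"{article_source_id}_{chunk_index}"


def file_chunk_id(file_hash: str, chunk_index: int) -> str:
    # Pattern: file_<file_hash12>_<chunk_index>
    # Assumption: <file_hash12> is the first 12 characters of the hash.
    return f"file_{file_hash[:12]}_{chunk_index}"


print(youtube_chunk_id("abc123", 4))            # abc123_4
print(file_chunk_id("0123456789abcdef", 0))     # file_0123456789ab_0
```

Because the chunk index is the final underscore-separated segment in every pattern, splitting an ID from the right on `_` recovers the index even when the source identifier itself contains underscores.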