Database
Three Data Stores
| Store | Technology | Path | Purpose |
|---|---|---|---|
| OSINT Cache | SQLite | data/osint_cache.db | Cache for tool results |
| Investigations | SQLite | data/osint_ai.db | Investigations, evidence, entities, claims |
| Embeddings | ChromaDB | data/chroma/ | Semantic search over evidence |
SQLite — OSINT Cache (osint_cache.db)
Simple cache table with TTL:
CREATE TABLE cache ( key TEXT PRIMARY KEY, -- Hash of (tool_name + input) value TEXT, -- Serialized JSON result created_at REAL, -- Unix timestamp expires_at REAL -- Unix timestamp (created_at + TTL));TTL is controlled with CACHE_TTL_SECONDS (default: 86400 = 24h).
SQLite — Investigations (osint_ai.db)
8 tables covering the complete investigation cycle:
investigations
CREATE TABLE investigations ( id INTEGER PRIMARY KEY AUTOINCREMENT, slug TEXT UNIQUE NOT NULL, name TEXT NOT NULL, goal TEXT, status TEXT DEFAULT 'active', -- active, archived created_at TEXT, updated_at TEXT, metadata TEXT -- JSON with additional info);evidences
CREATE TABLE evidences ( id TEXT PRIMARY KEY, -- ev_<uuid> investigation_id INTEGER, sha256 TEXT UNIQUE, -- For deduplication source TEXT, -- Original path or URL source_type TEXT, -- pdf, txt, html, url, osint_result credibility_tier INTEGER, -- 1 (high) to 3 (low) extracted_text TEXT, metadata TEXT, -- JSON ingested_at TEXT);entities
CREATE TABLE entities ( id TEXT PRIMARY KEY, -- ent_<uuid> investigation_id INTEGER, type TEXT, -- Person, Company, Domain, IP, etc. name TEXT NOT NULL, normalized_name TEXT, identifiers TEXT, -- JSON: {dni, cif, iban, email...} metadata TEXT, created_at TEXT, updated_at TEXT);relationships
CREATE TABLE relationships ( id TEXT PRIMARY KEY, investigation_id INTEGER, from_entity_id TEXT, to_entity_id TEXT, relation_type TEXT, -- owns, controls, member_of, etc. evidence_id TEXT, -- Evidence supporting the relationship confidence REAL, -- 0.0 - 1.0 metadata TEXT, created_at TEXT);claims
CREATE TABLE claims ( id TEXT PRIMARY KEY, -- clm_<uuid> investigation_id INTEGER, text TEXT NOT NULL, -- The assertion in natural language evidence_id TEXT, -- Source evidence chunk_id TEXT, -- Specific fragment snippet TEXT, -- Exact text from which it was extracted status TEXT DEFAULT 'unverified', -- verified, disputed, unverified verified_by TEXT, verified_at TEXT, created_at TEXT);alerts
CREATE TABLE alerts ( id TEXT PRIMARY KEY, investigation_id INTEGER, alert_type TEXT, -- entity, search, feed target TEXT, -- Entity or query to monitor frequency TEXT, -- daily, weekly, realtime last_run TEXT, last_result TEXT, active INTEGER DEFAULT 1, created_at TEXT);ChromaDB (data/chroma/)
One collection per investigation: osint_<slug>.
Each evidence chunk is stored with:
{ "id": "ev_001_chunk_003", "embedding": [...], # 384 dimensions (MiniLM-L12) "document": "...chunk text...", "metadata": { "evidence_id": "ev_001", "source": "registry-report.pdf", "chunk_index": 3, "investigation_slug": "acme-case" }}CLI Utility Scripts
In .claude/skills/_shared/scripts/:
| Script | Available Commands |
|---|---|
puruto_db.py | 47 complete CRUD commands for investigations |
puruto_vector.py | ChromaDB operations (search, upsert, delete) |
puruto_hash.py | SHA-256 hash calculation and verification |
puruto_normalize.py | Normalization of Spanish entities |
puruto_alerts.py | Execution of scheduled alerts |
Direct usage example
# View active investigation summarySLUG=$(cat data/.active)python3 .claude/skills/_shared/scripts/puruto_db.py investigation-summary --slug "$SLUG"
# List entitiespython3 .claude/skills/_shared/scripts/puruto_db.py list-entities --slug "$SLUG"
# Semantic searchpython3 .claude/skills/_shared/scripts/puruto_vector.py search \ --slug "$SLUG" \ --query "contracts with Chinese suppliers" \ --n 5Active Investigation
The slug of the active investigation is saved in:
data/.activecat data/.active # View active investigationecho "acme-case" > data/.active # Change active investigation manuallyAll Claude Code skills automatically read data/.active.