Skip to content

Database

Three Data Stores

StoreTechnologyPathPurpose
OSINT CacheSQLitedata/osint_cache.dbCache for tool results
InvestigationsSQLitedata/osint_ai.dbInvestigations, evidence, entities, claims
EmbeddingsChromaDBdata/chroma/Semantic search over evidence

SQLite — OSINT Cache (osint_cache.db)

Simple cache table with TTL:

CREATE TABLE cache (
key TEXT PRIMARY KEY, -- Hash of (tool_name + input)
value TEXT, -- Serialized JSON result
created_at REAL, -- Unix timestamp
expires_at REAL -- Unix timestamp (created_at + TTL)
);

TTL is controlled with CACHE_TTL_SECONDS (default: 86400 = 24h).

SQLite — Investigations (osint_ai.db)

8 tables covering the complete investigation cycle:

investigations

CREATE TABLE investigations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
slug TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
goal TEXT,
status TEXT DEFAULT 'active', -- active, archived
created_at TEXT,
updated_at TEXT,
metadata TEXT -- JSON with additional info
);

evidences

CREATE TABLE evidences (
id TEXT PRIMARY KEY, -- ev_<uuid>
investigation_id INTEGER,
sha256 TEXT UNIQUE, -- For deduplication
source TEXT, -- Original path or URL
source_type TEXT, -- pdf, txt, html, url, osint_result
credibility_tier INTEGER, -- 1 (high) to 3 (low)
extracted_text TEXT,
metadata TEXT, -- JSON
ingested_at TEXT
);

entities

CREATE TABLE entities (
id TEXT PRIMARY KEY, -- ent_<uuid>
investigation_id INTEGER,
type TEXT, -- Person, Company, Domain, IP, etc.
name TEXT NOT NULL,
normalized_name TEXT,
identifiers TEXT, -- JSON: {dni, cif, iban, email...}
metadata TEXT,
created_at TEXT,
updated_at TEXT
);

relationships

CREATE TABLE relationships (
id TEXT PRIMARY KEY,
investigation_id INTEGER,
from_entity_id TEXT,
to_entity_id TEXT,
relation_type TEXT, -- owns, controls, member_of, etc.
evidence_id TEXT, -- Evidence supporting the relationship
confidence REAL, -- 0.0 - 1.0
metadata TEXT,
created_at TEXT
);

claims

CREATE TABLE claims (
id TEXT PRIMARY KEY, -- clm_<uuid>
investigation_id INTEGER,
text TEXT NOT NULL, -- The assertion in natural language
evidence_id TEXT, -- Source evidence
chunk_id TEXT, -- Specific fragment
snippet TEXT, -- Exact text from which it was extracted
status TEXT DEFAULT 'unverified', -- verified, disputed, unverified
verified_by TEXT,
verified_at TEXT,
created_at TEXT
);

alerts

CREATE TABLE alerts (
id TEXT PRIMARY KEY,
investigation_id INTEGER,
alert_type TEXT, -- entity, search, feed
target TEXT, -- Entity or query to monitor
frequency TEXT, -- daily, weekly, realtime
last_run TEXT,
last_result TEXT,
active INTEGER DEFAULT 1,
created_at TEXT
);

ChromaDB (data/chroma/)

One collection per investigation: osint_<slug>.

Each evidence chunk is stored with:

{
"id": "ev_001_chunk_003",
"embedding": [...], # 384 dimensions (MiniLM-L12)
"document": "...chunk text...",
"metadata": {
"evidence_id": "ev_001",
"source": "registry-report.pdf",
"chunk_index": 3,
"investigation_slug": "acme-case"
}
}

CLI Utility Scripts

In .claude/skills/_shared/scripts/:

ScriptAvailable Commands
puruto_db.py47 complete CRUD commands for investigations
puruto_vector.pyChromaDB operations (search, upsert, delete)
puruto_hash.pySHA-256 hash calculation and verification
puruto_normalize.pyNormalization of Spanish entities
puruto_alerts.pyExecution of scheduled alerts

Direct usage example

Ventana de terminal
# View active investigation summary
SLUG=$(cat data/.active)
python3 .claude/skills/_shared/scripts/puruto_db.py investigation-summary --slug "$SLUG"
# List entities
python3 .claude/skills/_shared/scripts/puruto_db.py list-entities --slug "$SLUG"
# Semantic search
python3 .claude/skills/_shared/scripts/puruto_vector.py search \
--slug "$SLUG" \
--query "contracts with Chinese suppliers" \
--n 5

Active Investigation

The slug of the active investigation is saved in:

data/.active
Ventana de terminal
cat data/.active # View active investigation
echo "acme-case" > data/.active # Change active investigation manually

All Claude Code skills automatically read data/.active.