AIRIN

A fully autonomous, no-human-in-the-loop pipeline. Trust is earned by machine-verifiable means only: cryptographic provenance, deterministic citation verification, and multi-model consensus.

1. Harvest

A registry-driven crawler fetches each platform's Terms, Privacy, Usage, and Pricing documents (static fetch first, falling back to a managed scraper), respecting robots.txt and per-host rate limits. Every fetched version is hashed (SHA-256). Unchanged documents are skipped; changed documents produce a new immutable snapshot and enqueue re-analysis. 1402 snapshots are archived so far.
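
The hash-and-skip step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the SnapshotStore class, its in-memory fields, and the ingest method are hypothetical names, and a real archive would sit on persistent storage.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SnapshotStore:
    """In-memory stand-in for the snapshot archive (hypothetical)."""
    latest_hash: dict = field(default_factory=dict)       # url -> last SHA-256 digest
    snapshots: list = field(default_factory=list)         # append-only (url, digest, body)
    reanalysis_queue: list = field(default_factory=list)  # digests awaiting re-analysis

    def ingest(self, url: str, body: bytes) -> bool:
        """Store a new snapshot if the body changed; return True if one was stored."""
        digest = hashlib.sha256(body).hexdigest()
        if self.latest_hash.get(url) == digest:
            return False  # unchanged document: skip
        # Changed (or first-seen) document: append a snapshot, never overwrite.
        self.snapshots.append((url, digest, body))
        self.latest_hash[url] = digest
        self.reanalysis_queue.append(digest)
        return True
```

Calling ingest twice with identical bytes stores one snapshot; a single changed byte yields a different digest and therefore a second snapshot plus a queued re-analysis.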

2. Extraction

Each document is chunked and passed to an LLM tool-call that extracts clauses across 13 risk surfaces (prompt & output ownership, training use, commercial use, privacy, retention, tier differences, indemnity/liability, confidentiality, subprocessors, audit/DPA, governing law) as strict JSON.
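
One way to enforce "strict JSON" on the model's output is to parse it and reject anything that deviates from a fixed shape. The sketch below assumes a hypothetical clause schema (surface, quote, offset); the real field names and the list of allowed surfaces are not specified here, so both are placeholders.

```python
import json

# Hypothetical clause schema: field name -> required type.
REQUIRED_FIELDS = {"surface": str, "quote": str, "offset": int}


def parse_clauses(raw: str, allowed_surfaces: set) -> list:
    """Parse model output and raise ValueError on any deviation from the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of clauses")
    clauses = []
    for item in data:
        if not isinstance(item, dict) or set(item) != set(REQUIRED_FIELDS):
            raise ValueError(f"unexpected clause shape: {item!r}")
        for name, expected in REQUIRED_FIELDS.items():
            value = item[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise ValueError(f"field {name!r} has the wrong type")
        if item["surface"] not in allowed_surfaces:
            raise ValueError(f"unknown surface: {item['surface']!r}")
        clauses.append(item)
    return clauses
```

Rejecting extra keys as well as missing ones keeps a chatty model from smuggling unreviewed fields into later stages.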

3. The verification gate

A finding is published only if its verbatim quote can be re-derived as an exact substring of the stored snapshot. The check tries the recorded offset first and falls back to an exact substring search, so a stale offset never accepts the wrong text. Quotes that cannot be anchored are rejected. This is the deterministic, code-checked substitute for human citation review; there is no moderation queue.
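
The gate can be illustrated with a few lines of string matching. This is a sketch under assumptions: the function name anchor_quote and its signature are hypothetical, and it operates on the snapshot as decoded text.

```python
from typing import Optional


def anchor_quote(snapshot: str, quote: str, offset: Optional[int] = None) -> Optional[int]:
    """Return the offset where `quote` occurs verbatim, or None to reject it."""
    if not quote:
        return None  # an empty quote would match anywhere, so reject it
    # Offset-anchored check: accept only if the exact text sits at that offset.
    if offset is not None and offset >= 0 and snapshot.startswith(quote, offset):
        return offset
    # Stale or missing offset: fall back to an exact substring search.
    found = snapshot.find(quote)
    return found if found != -1 else None
```

A stale offset only costs a fallback search; it can never cause different text to be accepted, because both paths compare the quote character for character. A paraphrased or lightly edited quote fails both paths and is rejected.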

4. Multi-model consensus

Extraction runs through two independent model families: Claude (Anthropic) and Gemini Flash (Google), with Claude Haiku as a fallback. A finding publishes only when the second family independently corroborates it; otherwise it is suppressed as contested. See confidence & consensus for the exact thresholds.
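
A simplified picture of the consensus rule follows. The corroboration test used here (same surface, overlapping quote span in the snapshot) is an assumption made for illustration; the actual thresholds are documented under confidence & consensus. The Finding type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    surface: str  # risk surface the clause was filed under
    start: int    # anchored start offset in the snapshot
    end: int      # exclusive end offset


def corroborates(a: Finding, b: Finding) -> bool:
    """Assumed rule: same surface and overlapping spans in the same snapshot."""
    return a.surface == b.surface and a.start < b.end and b.start < a.end


def split_by_consensus(primary: list, secondary: list) -> tuple:
    """Return (published, contested) findings from the primary family."""
    published, contested = [], []
    for finding in primary:
        if any(corroborates(finding, other) for other in secondary):
            published.append(finding)
        else:
            contested.append(finding)  # suppressed, not published
    return published, contested
```

Under this sketch, a clause that only one family extracts ends up in the contested list and never reaches publication.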

5. Scoring

A versioned, reproducible rubric weights surfaces by use case (creators, GRC, in-house counsel) — not worst-clause-wins. The factual verbatim quote is always kept separate from the disclaimed risk interpretation.
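
To show how a use-case-weighted rubric differs from worst-clause-wins, here is a minimal sketch. The version string, the weights, and the 0-to-1 per-surface risk values are invented for illustration and are not the published rubric.

```python
RUBRIC_VERSION = "example-1"  # hypothetical version tag

# Hypothetical weights: the same surfaces matter differently per use case.
WEIGHTS = {
    "creators": {"training use": 3.0, "retention": 1.0, "governing law": 1.0},
    "in-house counsel": {"training use": 1.0, "retention": 1.0, "governing law": 3.0},
}


def score(surface_risk: dict, use_case: str) -> float:
    """Weighted mean of per-surface risk (0 = benign, 1 = severe)."""
    weights = WEIGHTS[use_case]
    total = sum(weights.get(surface, 0.0) for surface in surface_risk)
    if total == 0:
        return 0.0  # no weighted surfaces present
    weighted = sum(weights.get(s, 0.0) * r for s, r in surface_risk.items())
    return weighted / total


def worst_clause(surface_risk: dict) -> float:
    """The rule the rubric avoids: one bad clause decides the score."""
    return max(surface_risk.values())
```

With the same inputs, the two use cases receive different scores, and both fall below the worst-clause value whenever at least one weighted surface is less risky than the worst one. Because the weights are plain data tied to a version tag, rerunning an old version reproduces old scores.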

6. Contradictions & knowledge graph

Same-document, cross-document, and retention-mismatch conflicts are detected and surfaced side by side (104 found). Every verified finding becomes an immutable, dated fact in a temporal knowledge graph (5315 facts), so "how did X change over time" is a native query.
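
The "immutable, dated fact" idea can be sketched as an append-only log with time-aware queries. The Fact fields and the FactLog class are hypothetical, and a list stands in for whatever graph store the pipeline actually uses.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen: a recorded fact cannot be mutated
class Fact:
    subject: str     # e.g. a platform name
    predicate: str   # e.g. "retention_days"
    value: str
    observed: date   # date of the snapshot the fact was verified against


class FactLog:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self) -> None:
        self._facts = []

    def add(self, fact: Fact) -> None:
        self._facts.append(fact)

    def history(self, subject: str, predicate: str) -> list:
        """All recorded values for one subject/predicate, oldest first."""
        matching = [f for f in self._facts
                    if f.subject == subject and f.predicate == predicate]
        return sorted(matching, key=lambda f: f.observed)

    def as_of(self, subject: str, predicate: str, when: date) -> Optional[Fact]:
        """The most recent fact observed on or before `when`, if any."""
        earlier = [f for f in self.history(subject, predicate) if f.observed <= when]
        return earlier[-1] if earlier else None
```

Because nothing is overwritten, "how did X change over time" is just history(), and "what did the terms say on a given date" is as_of().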

Counts above are live from the database. Published findings: 71936.