Methodology
A fully autonomous, no-human-in-the-loop pipeline. Trust is earned by machine-verifiable means only: cryptographic provenance, deterministic citation verification, and multi-model consensus.
1. Harvest
A registry-driven crawler fetches each platform's Terms, Privacy, Usage, and Pricing documents (static fetch, falling back to a managed scraper), respecting robots.txt and per-host rate limits. Every fetched version is hashed (SHA-256); unchanged documents are skipped, changed documents produce a new immutable snapshot and enqueue re-analysis. 1402 snapshots are archived so far.
2. Extraction
Each document is chunked and passed to an LLM tool-call that extracts clauses across 13 risk surfaces (prompt & output ownership, training use, commercial use, privacy, retention, tier differences, indemnity/liability, confidentiality, subprocessors, audit/DPA, governing law) as strict JSON.
3. The verification gate
A finding is published only if its verbatim quote re-derives as an EXACT substring of the stored snapshot (offset-anchored, else exact substring; a stale offset never accepts the wrong text). Unanchorable quotes are rejected. This is the deterministic, code-checked substitute for human citation review — there is no moderation queue.
4. Multi-model consensus
Extraction runs through two independent model families — Claude (Anthropic) and Gemini Flash (Google) (with Claude Haiku as a fallback). A finding publishes only when the second family independently corroborates it; otherwise it is suppressed as contested. See confidence & consensus for the exact thresholds.
5. Scoring
A versioned, reproducible rubric weights surfaces by use case (creators, GRC, in-house counsel) — not worst-clause-wins. The factual verbatim quote is always kept separate from the disclaimed risk interpretation.
6. Contradictions & knowledge graph
Same-document, cross-document, and retention-mismatch conflicts are detected and surfaced side by side (104 found). Every verified finding becomes an immutable, dated fact in a temporal knowledge graph (5315 facts), so "how did X change over time" is a native query.
Counts above are live from the database. Published findings: 71936.
Generated from live pipeline data. Informational only, not legal advice.
AI Policy Intelligence Brief
Built for compliance officers, legal counsel, and SaaS founders. Subscribe to the email digest — one short brief when a tracked vendor materially changes its terms, training policy, or risk rating. Prefer in-app? Watch platforms in your alerts inbox instead.