Free AI Data Leakage Tool for IT Administrators and MSPs

Five free tools for checking and diagnosing email security configuration. Check SPF, DMARC and DKIM records for any domain, analyse email headers to detect phishing, validate SPF record lookup limits, understand DMARC policies in plain English, and identify what email provider any domain is using.

No login required. All results are generated in your browser using the Cloudflare public DNS resolver. Built for IT administrators, MSPs, and security teams managing Microsoft 365 environments.

AI Data Exposure Scanner | Sabiki
AI Training Data Intelligence

Has your organisation been ingested by AI -- and what else can attackers already see?

Every time an AI company crawls the web, your organisation's data gets swept into training datasets that power ChatGPT, LLaMA, Mistral, Gemini and dozens of other models. This tool reveals the full extent of your digital exposure -- from AI ingestion history to sensitive files that should never have been public.

250B+ pages in Common Crawl
3T+ tokens trained on web data
15 AI crawlers checked
Free, no login required
Enter your organisation's domain
For defensive use only. Queries public indexes and publicly accessible URLs only. Enter a domain you own or are authorised to assess.
1. Querying Common Crawl AI training indexes
2. Mapping AI model exposure from crawl history
3. Scanning sensitive files and endpoints
4. Analysing robots.txt, headers and AI crawler access
5. Compiling exposure intelligence report
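
The first step, querying the Common Crawl indexes, can be approximated with Common Crawl's public CDX-style index server. The sketch below is not the scanner's actual implementation: it only builds a query URL for one crawl and summarises a newline-delimited JSON response by MIME type. The crawl ID, the function names and the `mime` field handling are illustrative assumptions; current crawl IDs are listed in the index server's `collinfo.json`.

```python
import json
from urllib.parse import urlencode

# Public Common Crawl index server. The crawl ID passed in below is only an
# example; real IDs should be read from https://index.commoncrawl.org/collinfo.json
INDEX_HOST = "https://index.commoncrawl.org"


def build_index_query(domain: str, crawl_id: str, limit: int = 50) -> str:
    """Build a CDX query URL for every captured page under `domain` (hypothetical helper)."""
    params = {"url": f"{domain}/*", "output": "json", "limit": str(limit)}
    return f"{INDEX_HOST}/{crawl_id}-index?{urlencode(params)}"


def summarise_captures(ndjson_text: str) -> dict[str, int]:
    """Count captures per MIME type in a newline-delimited JSON index response."""
    counts: dict[str, int] = {}
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        mime = record.get("mime", "unknown")  # assumed field name; falls back if absent
        counts[mime] = counts.get(mime, 0) + 1
    return counts
```

The URL returned by `build_index_query("example.com", "CC-MAIN-2024-10")` can be fetched with any HTTP client, and the response body passed to `summarise_captures` to get a rough view of which document types from a domain appear in that crawl.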
AI model training exposure
Which AI systems have likely ingested your organisation's data
Common Crawl ingestion history
Pages from your domain captured in each global web crawl
Publicly indexed document exposure
Document types found in AI training crawls
Sensitive file and endpoint exposure
Publicly accessible paths that should not be reachable from the internet
Robots.txt intelligence disclosure
Hidden paths inadvertently advertised to attackers via robots.txt
Server disclosure and security headers
Technology stack revealed by HTTP response headers
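
As a rough illustration of what a header check of this kind might look at, the sketch below takes a set of HTTP response headers and reports which ones disclose the technology stack and which common security headers are absent. The two header lists and the `audit_headers` name are assumptions chosen for the example, not the scanner's actual rule set.

```python
# Headers that commonly reveal server software or framework versions
# (illustrative selection, not an exhaustive list).
DISCLOSURE_HEADERS = ("server", "x-powered-by", "x-aspnet-version")

# Widely recommended security headers (illustrative selection).
RECOMMENDED_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
)


def audit_headers(headers: dict[str, str]) -> dict[str, list[str]]:
    """Return disclosed stack details and missing security headers (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise before comparing.
    lowered = {name.lower(): value for name, value in headers.items()}
    disclosed = [f"{name}: {lowered[name]}" for name in DISCLOSURE_HEADERS if name in lowered]
    missing = [name for name in RECOMMENDED_HEADERS if name not in lowered]
    return {"disclosed": disclosed, "missing": missing}
```

For example, a response carrying only `Server: nginx/1.18.0` and `X-Content-Type-Options: nosniff` would be reported as disclosing the server version and missing the other three recommended headers.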
AI training crawler access
Whether your domain is protected against the crawlers that harvest content for AI model training
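
The robots.txt checks described above can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration under stated assumptions: the crawler list is a small sample of publicly documented AI crawler user agents rather than the 15 the tool checks, and `SAMPLE_ROBOTS`, `ai_crawler_access` and `disclosed_paths` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Sample of AI-related crawler user agents (assumption: not the tool's own list).
AI_CRAWLERS = ("GPTBot", "CCBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "Bytespider")

# Hypothetical robots.txt used only to demonstrate the two helpers below.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /backup/  # old exports
"""


def ai_crawler_access(robots_txt: str, site: str = "https://example.com/") -> dict[str, bool]:
    """Map each AI crawler to True if robots.txt lets it fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, site) for agent in AI_CRAWLERS}


def disclosed_paths(robots_txt: str) -> list[str]:
    """List specific paths named in Disallow rules, which anyone can read."""
    paths = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        value = value.strip()
        # A bare "/" or empty value reveals nothing about hidden paths.
        if field.strip().lower() == "disallow" and value not in ("", "/"):
            paths.add(value)
    return sorted(paths)
```

With the sample file, `GPTBot` is blocked while the other listed crawlers fall through to the wildcard rules and remain allowed at the site root, and `/admin/` and `/backup/` show up as paths the file advertises to any reader.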

Know what attackers know -- before they act

Sabiki monitors your M365 tenant exposure continuously and alerts you to new risks.