Free AI Data Leakage Tool for IT Administrators and MSPs

Five free tools for checking and diagnosing email security configuration. Check SPF, DMARC and DKIM records for any domain, analyse email headers to detect phishing, validate SPF record lookup limits, understand DMARC policies in plain English, and identify what email provider any domain is using.

No login required. All results are generated in your browser using the Cloudflare public DNS resolver. Built for IT administrators, MSPs, and security teams managing Microsoft 365 environments.

AI Data Exposure Scanner | Sabiki

AI Training Data Intelligence

Has your organisation been ingested by AI -- and what else can attackers already see?

When AI companies crawl the web, publicly accessible content from your organisation can be swept into the training datasets behind ChatGPT, LLaMA, Mistral, Gemini and dozens of other models. This tool shows how exposed your domain is, from AI ingestion history to sensitive files that should never have been public.

250B+ pages in Common Crawl

3T+ tokens trained on web data

AI crawlers checked

Free, no login required

Enter your organisation's domain

For defensive use only. Queries public indexes and publicly accessible URLs only. Enter a domain you own or are authorised to assess.

Querying Common Crawl AI training indexes

Mapping AI model exposure from crawl history

Scanning sensitive files and endpoints

Analysing robots.txt, headers and AI crawler access

Compiling exposure intelligence report

AI model training exposure

Which AI systems have likely ingested your organisation's data

Common Crawl ingestion history

Pages from your domain captured in each global web crawl
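
The page does not say how the scanner reads Common Crawl. As a rough sketch, assuming it relies on the public Common Crawl CDX index server (index.commoncrawl.org), a per-crawl lookup for one domain might look like this. The function names are hypothetical, and the `limit` parameter is assumed to behave as in the standard CDX server API. Crawl IDs such as CC-MAIN-2024-10 are listed in the index server's collinfo.json.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Common Crawl index server; each crawl exposes a "<crawl-id>-index" endpoint.
INDEX_HOST = "https://index.commoncrawl.org"


def build_query_url(crawl_id: str, domain: str, limit: int = 50) -> str:
    """Build a CDX query URL matching every captured page under `domain`."""
    params = urlencode({"url": f"{domain}/*", "output": "json", "limit": limit})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


def parse_captures(body: str) -> list[dict]:
    """The index answers with one JSON object per line; blank lines are skipped."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def fetch_captures(crawl_id: str, domain: str, limit: int = 50) -> list[dict]:
    """Hypothetical helper: perform the network request for a single crawl."""
    with urlopen(build_query_url(crawl_id, domain, limit), timeout=30) as resp:
        return parse_captures(resp.read().decode("utf-8"))
```

Repeating the lookup over several crawl IDs would give a per-crawl page count, which is roughly what an ingestion history needs.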

Publicly indexed document exposure

Document types found in AI training crawls

Sensitive file and endpoint exposure

Publicly accessible paths that should not be reachable from the internet

Robots.txt intelligence disclosure

Hidden paths inadvertently advertised to attackers via robots.txt
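
To illustrate the idea, here is a minimal sketch that lists the paths a robots.txt file advertises through its Disallow rules. It is not the tool's actual implementation; the function name and the choice to ignore a bare "/" are assumptions made for this example.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the unique Disallow paths in a robots.txt body, in file order."""
    paths: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue
        path = value.strip()
        # An empty Disallow allows everything, and "/" names no specific path.
        if path and path != "/" and path not in paths:
            paths.append(path)
    return paths
```

Each returned path is worth reviewing: a rule like Disallow: /backup/ keeps well-behaved crawlers out but also tells anyone reading the file that the directory exists.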

Server disclosure and security headers

Technology stack revealed by HTTP response headers
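
A check of this kind can be sketched as a simple audit of response headers. The two header lists below are assumptions picked for illustration; the page does not state which headers the scanner inspects.

```python
# Headers that commonly reveal the server stack (illustrative, not exhaustive).
DISCLOSING = ("server", "x-powered-by", "x-aspnet-version")
# Security headers whose absence is usually worth flagging (illustrative).
RECOMMENDED = (
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
)


def audit_headers(headers: dict[str, str]) -> dict[str, object]:
    """Report disclosing headers that are present and recommended ones that are missing."""
    lowered = {name.lower(): value for name, value in headers.items()}
    disclosed = {name: lowered[name] for name in DISCLOSING if name in lowered}
    missing = [name for name in RECOMMENDED if name not in lowered]
    return {"disclosed": disclosed, "missing": missing}
```

Header names are compared in lower case because HTTP header names are case-insensitive.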

AI training crawler access

Whether your domain is protected against the crawlers that harvest content for AI model training
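
One way to approximate this check is to ask Python's standard robots.txt parser whether each AI crawler may fetch the site root. This is a sketch under assumptions: the crawler tokens listed are examples, not the set the tool checks, and robots.txt only restrains crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Example user-agent tokens used by AI-related crawlers (illustrative list).
AI_CRAWLERS = ("GPTBot", "CCBot", "Google-Extended", "ClaudeBot")


def ai_crawler_access(robots_txt: str, site: str = "https://example.com/") -> dict[str, bool]:
    """Map each crawler token to True if robots.txt lets it fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, site) for agent in AI_CRAWLERS}
```

A value of True for a crawler means the domain currently does nothing in robots.txt to keep that crawler out.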

Know what attackers know -- before they act

Sabiki monitors your M365 tenant exposure continuously and alerts you to new risks.
