Crawler Policy
We welcome legitimate search engines and AI-powered services while protecting our infrastructure from harmful bots.
Crawlers We Welcome
Metal Hats Cats supports AI-powered search and knowledge services. The following crawler types have full access, subject to the rate limit listed for each tier:
Premium Search Engines (Tier 1)
Rate Limit: 60 requests/minute
- Googlebot (Google Search)
- bingbot (Bing Search)
- Applebot (Apple Search & Siri)
- Slurp (Yahoo Search)
- DuckDuckBot (DuckDuckGo Search & AI Chat)
AI Assistants & AI-Powered Search (Tier 2)
Rate Limit: 30-40 requests/minute
- Google-Extended (Gemini AI Training)
- BingPreview (Microsoft Copilot)
- GPTBot (OpenAI Training & ChatGPT Search)
- ChatGPT-User (ChatGPT Browsing)
- ClaudeBot (Anthropic AI Training)
- Claude-Web (Claude Browsing)
- PerplexityBot (Perplexity AI Search)
- Applebot-Extended (Apple Intelligence)
- YouBot (You.com AI Search)
AI Training & Archives (Tier 3)
Rate Limit: 20-30 requests/minute
- anthropic-ai (Anthropic Research)
- CCBot (Common Crawl for AI Training)
- Meta-ExternalAgent (Meta AI Training)
- Diffbot (Structured Data for AI)
Social Media Preview Bots
Rate Limit: 30-50 requests/minute
- Slackbot, LinkedInBot, Twitterbot, FacebookBot
- TelegramBot, WhatsApp, Discordbot
- Reddit (Snoobot), Pinterestbot, Mastodon
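
For crawler operators who want to see which tier applies to them, the lists above can be sketched as a lookup table. This is a minimal illustration, not our production configuration: the `Tier` class, the `classify` function, and the case-insensitive substring matching are assumptions made for the example. Only the crawler names and rate limits come from the lists above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tier:
    """Hypothetical container for one tier's published limits."""
    name: str
    min_rpm: int  # lower bound of the published requests/minute range
    max_rpm: int  # upper bound of the published requests/minute range


TIER_1 = Tier("Premium Search Engines", 60, 60)
TIER_2 = Tier("AI Assistants & AI-Powered Search", 30, 40)
TIER_3 = Tier("AI Training & Archives", 20, 30)
SOCIAL = Tier("Social Media Preview Bots", 30, 50)

# Crawler names copied from the lists above.
TIER_TOKENS = {
    TIER_1: ["Googlebot", "bingbot", "Applebot", "Slurp", "DuckDuckBot"],
    TIER_2: ["Google-Extended", "BingPreview", "GPTBot", "ChatGPT-User",
             "ClaudeBot", "Claude-Web", "PerplexityBot",
             "Applebot-Extended", "YouBot"],
    TIER_3: ["anthropic-ai", "CCBot", "Meta-ExternalAgent", "Diffbot"],
    # "Snoobot" is taken from the "Reddit (Snoobot)" entry.
    SOCIAL: ["Slackbot", "LinkedInBot", "Twitterbot", "FacebookBot",
             "TelegramBot", "WhatsApp", "Discordbot", "Snoobot",
             "Pinterestbot", "Mastodon"],
}

# Longest tokens first, so "Applebot-Extended" wins over "Applebot".
_LOOKUP: List[Tuple[str, Tier]] = sorted(
    ((token.lower(), tier)
     for tier, tokens in TIER_TOKENS.items()
     for token in tokens),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def classify(user_agent: str) -> Optional[Tier]:
    """Return the tier whose token appears in the User-Agent, else None."""
    ua = user_agent.lower()
    for token, tier in _LOOKUP:
        if token in ua:
            return tier
    return None
```

Checking longer tokens first is a deliberate choice in this sketch: several names in the lists share a prefix, and a plain first-match scan could place `Applebot-Extended` in Tier 1 instead of Tier 2.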
Blocked Crawlers
We block SEO tools, scrapers, and aggressive bots that provide no value:
- SemrushBot, AhrefsBot, MJ12bot, DotBot (SEO tools)
- PetalBot, BLEXBot, DataForSeoBot (aggressive crawlers)
- Generic scrapers (Scrapy, python-requests, curl, wget)
- Headless browsers (HeadlessChrome, PhantomJS, Selenium)
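
To illustrate how a blocklist like the one above is commonly applied, here is a short sketch that tests a User-Agent header against the listed names. The `is_blocked` function and the case-insensitive substring matching are assumptions for the example; the actual matching rules on our servers may differ.

```python
import re

# Names copied from the blocked list above.
BLOCKED_TOKENS = [
    "SemrushBot", "AhrefsBot", "MJ12bot", "DotBot",       # SEO tools
    "PetalBot", "BLEXBot", "DataForSeoBot",               # aggressive crawlers
    "Scrapy", "python-requests", "curl", "wget",          # generic scrapers
    "HeadlessChrome", "PhantomJS", "Selenium",            # headless browsers
]

# One case-insensitive pattern; re.escape keeps names like
# "python-requests" literal.
BLOCKED_RE = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_TOKENS),
    re.IGNORECASE,
)


def is_blocked(user_agent: str) -> bool:
    """Return True if any blocked name appears in the User-Agent string."""
    return BLOCKED_RE.search(user_agent) is not None
```

If you run a legitimate tool on top of a library such as `python-requests`, a default User-Agent would match this kind of filter, so setting a descriptive User-Agent of your own is the usual first step before requesting access.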
Protection Measures
Rate Limiting
All crawlers are subject to tier-based rate limits to ensure fair resource usage. Limits are designed to be generous for legitimate services while preventing abuse.
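
As a rough picture of what a per-minute limit means in practice, the sketch below implements a sliding-window counter. It is one common technique, shown under assumptions: the `SlidingWindowLimiter` class, the per-client key, and the 60-second window are hypothetical and do not describe our actual enforcement mechanism.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit_per_minute` requests per client in any 60 s span."""

    def __init__(self, limit_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit_per_minute
        self.clock = clock  # injectable so the limiter can be tested
        self.hits: Dict[str, Deque[float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        window = self.hits.setdefault(client_id, deque())
        # Drop timestamps that are 60 seconds old or older.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False  # over the limit: the request would be rejected
        window.append(now)
        return True
```

A Tier 1 crawler would correspond to `SlidingWindowLimiter(60)` in this sketch. Crawlers that pace their requests evenly, for example one request per second at 60 requests/minute, never hit the limit under such a scheme.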
Behavioral Analysis
Unknown crawlers are monitored for suspicious patterns. Important: All allowed AI bots and social preview bots are exempt from behavioral blocking.
Content Protection
Generated content may include watermarks and fingerprints to track unauthorized republication and prove ownership.
Request Access
If you operate a legitimate crawler that's being blocked, or if you need higher rate limits for research purposes, please contact us:
Contact Email: contact@metalhatscats.com
Please include: crawler user-agent, purpose, expected request rate, and IP ranges.
robots.txt
View our complete robots.txt file for technical crawler directives:
View robots.txt
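
Crawler authors can check directives programmatically before sending requests. The sketch below uses Python's standard `urllib.robotparser` module on a hypothetical robots.txt fragment. The fragment, the `example.com` URLs, and the `build_parser` helper are illustrative assumptions and are not a copy of our actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical fragment for illustration only, NOT the site's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: GPTBot
Allow: /
Crawl-delay: 2

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a given user-agent may fetch a given URL.
semrush_ok = parser.can_fetch("SemrushBot", "https://example.com/page")
gptbot_ok = parser.can_fetch("GPTBot", "https://example.com/page")

# Crawl-delay, if present for that agent, is returned in seconds.
gptbot_delay = parser.crawl_delay("GPTBot")
```

To check a live file instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads and parses the file in one step.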