Crawler Policy

We welcome legitimate search engines and AI-powered services while protecting our infrastructure from harmful bots.

๐Ÿค Crawlers We Welcome

Metal Hats Cats supports AI-powered search and knowledge services. The following crawlers have full access to the site, subject to the rate limit listed for their tier:

✅ Premium Search Engines (Tier 1)

Rate Limit: 60 requests/minute

  • Googlebot (Google Search)
  • bingbot (Bing Search)
  • Applebot (Apple Search & Siri)
  • Slurp (Yahoo Search)
  • DuckDuckBot (DuckDuckGo Search & AI Chat)

✅ AI Assistants & AI-Powered Search (Tier 2)

Rate Limit: 30-40 requests/minute

  • Google-Extended (Gemini AI Training)
  • BingPreview (Microsoft Copilot)
  • GPTBot (OpenAI Training & ChatGPT Search)
  • ChatGPT-User (ChatGPT Browsing)
  • ClaudeBot (Anthropic AI Training)
  • Claude-Web (Claude Browsing)
  • PerplexityBot (Perplexity AI Search)
  • Applebot-Extended (Apple Intelligence)
  • YouBot (You.com AI Search)

✅ AI Training & Archives (Tier 3)

Rate Limit: 20-30 requests/minute

  • anthropic-ai (Anthropic Research)
  • CCBot (Common Crawl for AI Training)
  • Meta-ExternalAgent (Meta AI Training)
  • Diffbot (Structured Data for AI)

✅ Social Media Preview Bots

Rate Limit: 30-50 requests/minute

  • Slackbot, LinkedInBot, Twitterbot, FacebookBot
  • TelegramBot, WhatsApp, Discordbot
  • Reddit (Snoobot), Pinterestbot, Mastodon

🚫 Blocked Crawlers

We block SEO tools, scrapers, and aggressive bots that consume resources without providing value in return:

  • SemrushBot, AhrefsBot, MJ12bot, DotBot (SEO tools)
  • PetalBot, BLEXBot, DataForSeoBot (aggressive crawlers)
  • Generic scrapers (Scrapy, python-requests, curl, wget)
  • Headless browsers (HeadlessChrome, PhantomJS, Selenium)
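
One way the allow and block lists above might be applied is by matching tokens against the User-Agent header. The following is a minimal sketch under stated assumptions, not the site's actual implementation: the case-insensitive substring rule, the longest-token-first ordering, the allow-before-block precedence, the use of "Snoobot" as the Reddit token, and the names ALLOWED, BLOCKED, and classify are all hypothetical.

```python
from typing import Dict, Tuple

# Tokens copied from the lists above; grouping keys are hypothetical labels.
ALLOWED: Dict[str, Tuple[str, ...]] = {
    "tier1": ("Googlebot", "bingbot", "Applebot", "Slurp", "DuckDuckBot"),
    "tier2": ("Google-Extended", "BingPreview", "GPTBot", "ChatGPT-User",
              "ClaudeBot", "Claude-Web", "PerplexityBot",
              "Applebot-Extended", "YouBot"),
    "tier3": ("anthropic-ai", "CCBot", "Meta-ExternalAgent", "Diffbot"),
    "social": ("Slackbot", "LinkedInBot", "Twitterbot", "FacebookBot",
               "TelegramBot", "WhatsApp", "Discordbot", "Snoobot",
               "Pinterestbot", "Mastodon"),
}

BLOCKED: Tuple[str, ...] = (
    "SemrushBot", "AhrefsBot", "MJ12bot", "DotBot",
    "PetalBot", "BLEXBot", "DataForSeoBot",
    "Scrapy", "python-requests", "curl", "wget",
    "HeadlessChrome", "PhantomJS", "Selenium",
)

# Longest tokens first, so "Applebot-Extended" is tried before "Applebot".
_ALLOWED_BY_LENGTH = sorted(
    ((token.lower(), group) for group, tokens in ALLOWED.items() for token in tokens),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def classify(user_agent: str) -> Tuple[str, str]:
    """Return ("allowed", group), ("blocked", token), or ("unknown", "")."""
    ua = user_agent.lower()
    # Assumption: an allowed token takes precedence over a blocked one.
    for token, group in _ALLOWED_BY_LENGTH:
        if token in ua:
            return ("allowed", group)
    for token in BLOCKED:
        if token.lower() in ua:
            return ("blocked", token)
    return ("unknown", "")
```

For example, classify("curl/8.4.0") returns ("blocked", "curl"), while a header containing "Applebot-Extended" is placed in "tier2" rather than "tier1". User-Agent headers are easy to forge, so a check like this would only be a first filter.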

๐Ÿ›ก๏ธ Protection Measures

Rate Limiting

All crawlers are subject to tier-based rate limits to ensure fair resource usage. Limits are designed to be generous for legitimate services while preventing abuse.
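
As an illustration of how tier-based limits could be enforced, here is a minimal sliding-window sketch. It is not the site's actual mechanism. The class name SlidingWindowLimiter, the LIMITS_PER_MINUTE table, and the choice of the lower bound wherever the policy gives a range (for example 30 for "30-40 requests/minute") are assumptions made for the example.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

# Requests per minute per tier. Where the policy states a range,
# the lower bound is used here (an assumption).
LIMITS_PER_MINUTE: Dict[str, int] = {
    "tier1": 60,
    "tier2": 30,
    "tier3": 20,
    "social": 30,
}


class SlidingWindowLimiter:
    """Allow at most `limits[tier]` requests per crawler in any window."""

    def __init__(
        self,
        limits: Dict[str, int],
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = limits
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, crawler: str, tier: str) -> bool:
        now = self.clock()
        timestamps = self.hits[crawler]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limits[tier]:
            return False  # over the tier limit; request is rejected
        timestamps.append(now)
        return True
```

A caller would look up the crawler's tier first and then call limiter.allow("CCBot", "tier3") on each request, rejecting or delaying the request when it returns False.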

Behavioral Analysis

Unknown crawlers are monitored for suspicious patterns. Important: All allowed AI bots and social preview bots are exempt from behavioral blocking.

Content Protection

Generated content may include watermarks and fingerprints to track unauthorized republication and prove ownership.

📧 Request Access

If you operate a legitimate crawler that's being blocked, or if you need higher rate limits for research purposes, please contact us:

Contact Email: contact@metalhatscats.com

Please include: crawler user-agent, purpose, expected request rate, and IP ranges.

📄 robots.txt

View our complete robots.txt file for technical crawler directives:

View robots.txt →
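
Crawler operators can check directives like these programmatically with Python's standard urllib.robotparser module. The sketch below parses a hypothetical robots.txt fragment written for this example; it is not a copy of the site's real file, and both the Crawl-delay value and the example.com URL are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical fragment for illustration only, not the site's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Crawl-delay: 2
Allow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# Placeholder URL; substitute the page the crawler intends to fetch.
url = "https://example.com/some-page"

gptbot_allowed = parser.can_fetch("GPTBot", url)
semrush_allowed = parser.can_fetch("SemrushBot", url)
gptbot_delay = parser.crawl_delay("GPTBot")  # seconds between requests

print(gptbot_allowed, semrush_allowed, gptbot_delay)
```

In this made-up fragment, a 2-second crawl delay works out to 30 requests per minute, the lower bound of the Tier 2 range. To check a live file instead, RobotFileParser also offers set_url() and read().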

Last updated: January 2024 | This policy may be updated to adapt to new crawlers and threats.