
How to Tell Real Googlebot from Fake

March 19, 2026 · 6 min read

The problem with trusting user-agents

Every HTTP request includes a user-agent string. Googlebot identifies itself with a user-agent containing Googlebot/2.1. The problem? Anyone can set their user-agent to anything.

Attackers know that many security tools whitelist Googlebot. So they set their scraper's user-agent to "Googlebot" and bypass rate limits, WAF rules, and bot detection entirely.

If your security relies on user-agent strings alone, any attacker with one line of code can walk right past it.
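That "one line of code" is not an exaggeration. A quick sketch in Python's standard library (the target URL is a placeholder): the request below carries Googlebot's published user-agent string, copied verbatim.

```python
import urllib.request

# Googlebot's published user-agent string; any client can copy it verbatim.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# One line of configuration and the request claims to be Googlebot.
req = urllib.request.Request("https://example.com/", headers={"User-Agent": GOOGLEBOT_UA})
```

Nothing in the request itself distinguishes this from the real crawler, which is why the check has to look at something the attacker does not control: the network.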

Reverse DNS: the real identity check

Google's actual crawlers come from specific IP ranges whose reverse DNS resolves to *.googlebot.com. Bing's crawlers resolve to *.search.msn.com.

Reverse DNS (rDNS) looks up an IP address's PTR record and returns the hostname registered for it. One caveat: whoever controls an IP block also controls its PTR records, so a determined attacker could publish a PTR record of crawler.googlebot.com for their own IP. The fix is forward-confirmed rDNS: after the reverse lookup, resolve the returned hostname back to an IP and confirm it matches the original. Only Google's own DNS can make that round trip succeed for googlebot.com hostnames. So if a request says "I'm Googlebot" but the IP resolves to random-vps.hosting-provider.com, or the forward lookup doesn't match, it's fake.
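Here is a minimal sketch of forward-confirmed rDNS in Python. The DNS lookups are injectable so the logic can be tested offline; the function name and suffix list are illustrative, not Inner Warden's actual code.

```python
import socket

GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, then A-record round trip."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        hostname = reverse_lookup(ip)          # e.g. crawl-66-249-66-1.googlebot.com
    except OSError:
        return False                           # no PTR record at all
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False                           # wrong domain: spoofed
    try:
        return forward_lookup(hostname) == ip  # confirm the hostname maps back to this IP
    except OSError:
        return False
```

With real lookups this costs two DNS queries per check, which is why results are worth caching per IP.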

How Inner Warden handles this

Inner Warden checks every request that claims to be a major search engine bot:

Googlebot: IP must resolve to *.googlebot.com or *.google.com
Bingbot: IP must resolve to *.search.msn.com
Yandex: IP must resolve to *.yandex.ru or *.yandex.com
Baidu: IP must resolve to *.baidu.com
DuckDuckBot: IP must resolve to *.duckduckgo.com
AppleBot: IP must resolve to *.apple.com

If the check passes, the request is tagged as a verified bot and excluded from abuse detection. If it fails, the request is tagged as bot:spoofed and treated as a potential attacker.
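The table and the pass/fail flow above can be sketched as a suffix map plus a tagging function. The bot:spoofed tag name comes from the article; the verified tag name, the map, and the function are illustrative assumptions, not Inner Warden's internals.

```python
# Illustrative suffix map mirroring the table above; not Inner Warden's internal data.
BOT_RDNS_SUFFIXES = {
    "googlebot":   (".googlebot.com", ".google.com"),
    "bingbot":     (".search.msn.com",),
    "yandex":      (".yandex.ru", ".yandex.com"),
    "baiduspider": (".baidu.com",),
    "duckduckbot": (".duckduckgo.com",),
    "applebot":    (".apple.com",),
}

def tag_request(user_agent, rdns_hostname):
    """Tag a request given its user-agent and its rDNS hostname (None if no PTR record)."""
    ua = user_agent.lower()
    for bot, suffixes in BOT_RDNS_SUFFIXES.items():
        if bot in ua:
            if rdns_hostname and rdns_hostname.endswith(suffixes):
                return "bot:verified"   # excluded from abuse detection
            return "bot:spoofed"        # claims to be a major bot but rDNS disagrees
    return None                         # not claiming to be a major search bot
```

Note the asymmetry: a request that never claims to be a search bot is simply left to the normal detection pipeline, while a failed claim is itself a strong attack signal.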

What about other bots?

Inner Warden recognizes 25+ legitimate bots (Facebook, Twitter, LinkedIn, Discord, Pinterest, Reddit, Telegram, ChatGPT, Claude, Ahrefs, Semrush, and more). These are whitelisted by user-agent because they're rarely spoofed and don't have well-documented rDNS patterns.
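For these bots the check reduces to user-agent token matching. A sketch, assuming a token list like the one below (an illustrative subset, not Inner Warden's actual list):

```python
# Illustrative subset of user-agent tokens for bots whitelisted without rDNS checks.
KNOWN_BOT_TOKENS = (
    "facebookexternalhit", "Twitterbot", "LinkedInBot", "Discordbot",
    "Pinterestbot", "redditbot", "TelegramBot", "GPTBot", "ClaudeBot",
    "AhrefsBot", "SemrushBot",
)

def is_known_bot(user_agent):
    """User-agent-only match for bots that lack documented rDNS patterns."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in KNOWN_BOT_TOKENS)
```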

For the big search engine bots, the ones attackers actually impersonate, rDNS verification ensures you only whitelist the real thing.

Why this matters

Most security tools (fail2ban, ModSecurity, even some CDN rules) either block all bots (hurting your SEO) or trust user-agents blindly (letting attackers through). Inner Warden does what enterprise CDNs do: verify the identity, then decide.

The rDNS check takes about 10-50ms and only runs when a request claims to be a major bot. It doesn't slow down normal traffic at all.

Set it up

Bot verification is built in. No extra configuration needed. Install Inner Warden and it works automatically:

curl -fsSL https://innerwarden.com/install | sudo bash

Read more: How to detect web scanners · Fail2ban vs Inner Warden