
How to Tell Real Googlebot from Fake

March 19, 2026 · 6 min read

The problem with trusting user-agents

Every HTTP request includes a user-agent string. Googlebot identifies itself with a user-agent containing Googlebot/2.1. The problem? Anyone can set their user-agent to anything.

Attackers know that many security tools whitelist Googlebot. So they set their scraper's user-agent to "Googlebot" and bypass rate limits, WAF rules, and bot detection entirely.

If your security relies on user-agent strings alone, any attacker with one line of code can walk right past it.
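That "one line of code" is not an exaggeration. A quick sketch in Python's standard library (the target URL is a placeholder): the request below carries Googlebot's published user-agent string, copied verbatim.

```python
import urllib.request

# Googlebot's published user-agent string; any client can copy it verbatim.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# One line of configuration and the request claims to be Googlebot.
req = urllib.request.Request("https://example.com/", headers={"User-Agent": GOOGLEBOT_UA})
```

Nothing in the request itself distinguishes this from the real crawler, which is why the check has to look at something the attacker does not control: the network.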

Reverse DNS: the real identity check

Google's actual crawlers come from specific IP ranges whose reverse DNS resolves to *.googlebot.com. Bing's crawlers resolve to *.search.msn.com.

Reverse DNS (rDNS) looks up an IP address's PTR record and returns the hostname registered for it. One caveat: whoever controls an IP block also controls its PTR records, so a determined attacker could publish a PTR record of crawler.googlebot.com for their own IP. The fix is forward-confirmed rDNS: after the reverse lookup, resolve the returned hostname back to an IP and confirm it matches the original. Only Google's own DNS can make that round trip succeed for googlebot.com hostnames. So if a request says "I'm Googlebot" but the IP resolves to random-vps.hosting-provider.com, or the forward lookup doesn't match, it's fake.
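Here is a minimal sketch of forward-confirmed rDNS in Python. The DNS lookups are injectable so the logic can be tested offline; the function name and suffix list are illustrative, not Inner Warden's actual code.

```python
import socket

GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, then A-record round trip."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        hostname = reverse_lookup(ip)          # e.g. crawl-66-249-66-1.googlebot.com
    except OSError:
        return False                           # no PTR record at all
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False                           # wrong domain: spoofed
    try:
        return forward_lookup(hostname) == ip  # confirm the hostname maps back to this IP
    except OSError:
        return False
```

With real lookups this costs two DNS queries per check, which is why results are worth caching per IP.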

How Inner Warden handles this

Inner Warden checks every request that claims to be a major search engine bot:

Googlebot: IP must resolve to *.googlebot.com or *.google.com
Bingbot: IP must resolve to *.search.msn.com
Yandex: IP must resolve to *.yandex.ru or *.yandex.com
Baidu: IP must resolve to *.baidu.com
DuckDuckBot: IP must resolve to *.duckduckgo.com
AppleBot: IP must resolve to *.apple.com

If the check passes, the request is tagged as a verified bot and excluded from abuse detection. If it fails, the request is tagged as bot:spoofed and treated as a potential attacker.
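The table and the pass/fail flow above can be sketched as a suffix map plus a tagging function. The bot:spoofed tag name comes from the article; the verified tag name, the map, and the function are illustrative assumptions, not Inner Warden's internals.

```python
# Illustrative suffix map mirroring the table above; not Inner Warden's internal data.
BOT_RDNS_SUFFIXES = {
    "googlebot":   (".googlebot.com", ".google.com"),
    "bingbot":     (".search.msn.com",),
    "yandex":      (".yandex.ru", ".yandex.com"),
    "baiduspider": (".baidu.com",),
    "duckduckbot": (".duckduckgo.com",),
    "applebot":    (".apple.com",),
}

def tag_request(user_agent, rdns_hostname):
    """Tag a request given its user-agent and its rDNS hostname (None if no PTR record)."""
    ua = user_agent.lower()
    for bot, suffixes in BOT_RDNS_SUFFIXES.items():
        if bot in ua:
            if rdns_hostname and rdns_hostname.endswith(suffixes):
                return "bot:verified"   # excluded from abuse detection
            return "bot:spoofed"        # claims to be a major bot but rDNS disagrees
    return None                         # not claiming to be a major search bot
```

Note the asymmetry: a request that never claims to be a search bot is simply left to the normal detection pipeline, while a failed claim is itself a strong attack signal.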

What about other bots?

Inner Warden recognizes 25+ legitimate bots (Facebook, Twitter, LinkedIn, Discord, Pinterest, Reddit, Telegram, ChatGPT, Claude, Ahrefs, Semrush, and more). These are whitelisted by user-agent because they're rarely spoofed and don't have well-documented rDNS patterns.
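For these bots the check reduces to user-agent token matching. A sketch, assuming a token list like the one below (an illustrative subset, not Inner Warden's actual list):

```python
# Illustrative subset of user-agent tokens for bots whitelisted without rDNS checks.
KNOWN_BOT_TOKENS = (
    "facebookexternalhit", "Twitterbot", "LinkedInBot", "Discordbot",
    "Pinterestbot", "redditbot", "TelegramBot", "GPTBot", "ClaudeBot",
    "AhrefsBot", "SemrushBot",
)

def is_known_bot(user_agent):
    """User-agent-only match for bots that lack documented rDNS patterns."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in KNOWN_BOT_TOKENS)
```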

For the big search engine bots, the ones attackers actually impersonate, rDNS verification ensures you only whitelist the real thing.

Why this matters

Most security tools (fail2ban, ModSecurity, even some CDN rules) either block all bots (hurting your SEO) or trust user-agents blindly (letting attackers through). Inner Warden does what enterprise CDNs do: verify the identity, then decide.

The rDNS check takes about 10-50ms and only runs when a request claims to be a major bot. It doesn't slow down normal traffic at all.

Set it up

Bot verification is built in. No extra configuration needed. Install Inner Warden and it works automatically:

curl -fsSL https://innerwarden.com/install | sudo bash

Read more: How to detect web scanners · Fail2ban vs Inner Warden