Name: InnerWarden
Author: InnerWarden

A category formed faster than the answers did

In the first half of 2026, "Agentic Runtime Security" went from blog phrase to product category. IBM published a framing piece. HiddenLayer announced agentic runtime capabilities. Oligo Security shipped an AI Runtime Defense product. The A2AS paper (arXiv 2510.13825) proposed a formal model: behaviour certificates, authenticated prompts, in-context defences. Lumia Security wrote about unified agentic defence platforms.

The energy is real and the threat is real. Autonomous agents now write code, invoke tools, touch production systems, and occasionally do the wrong thing with high confidence. Someone has to put a fence around that.

What is interesting is where everyone is building the fence. Almost every entrant in this category focuses on the same layer: prompts, context windows, tool-call schemas, behaviour certificates issued to the model. That is a coherent layer to defend. It is also not the load-bearing one.

The LLM does not break the server. The syscall does.

An autonomous agent compromising a Linux host does it the same way every other process compromises a Linux host: by issuing a syscall the kernel agrees to execute. The attacker primitive is execve("/bin/sh", ...) or open("/etc/shadow", O_RDONLY) or connect(sockfd, &attacker_addr, ...). The kernel does not know, and does not care, whether the calling process is a human typing in tmux, a Python script, or an LLM with a tool-call schema.

That is the load-bearing observation. The runtime is the place where intent becomes effect. Everything upstream of the syscall is reasoning about what the agent might do. The syscall is where it actually does it.

Linux has fifteen years of mature primitives for watching and controlling that boundary: eBPF tracepoints, LSM hooks, seccomp filters, cgroups, capabilities, process trees, namespaces. They were not built for AI agents. They were built for the same problem from a different direction: defending the host against any process that acts wrong. Agentic Runtime Security can either reinvent them inside the LLM, or use them.

What prompt-side defences catch, and what they miss

Prompt filters work. They catch jailbreak strings, recognisable injection patterns, requests for the model to ignore prior instructions. Behaviour certificates pin the agent's allowed tool surface. In-context defences keep untrusted input from rewriting the system prompt. For the class of attack where the adversary tries to make the model say something dangerous, this is the right layer.

They miss a different class. An agent might hallucinate a destructive command on its own. No jailbreak, no injection, just confident wrongness. The prompt is clean and the context window is intact. The request that led to rm -rf ~/projects looks like a developer asking for help. The prompt-side defence has nothing to flag because nothing about the prompt is wrong.

The runtime-side defence sees the same event differently: a process is about to delete a directory tree that contains files the operator opened thirty minutes ago. That is a recoverable signal even when the prompt was clean.
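The recent-activity signal described above can be sketched in a few lines of Python. This is an illustration only, not InnerWarden's implementation: the function names, the 30-minute window, and the "review" verdict are assumptions, and it uses file timestamps from stat() as a crude stand-in for kernel-level open events. Access times in particular are unreliable on filesystems mounted with relatime or noatime.

```python
import os
import time
from pathlib import Path


def recently_touched_files(root, window_seconds=1800, now=None):
    """Return files under root whose atime or mtime falls inside the window.

    Hypothetical helper: timestamps are a rough proxy for "the operator
    opened this recently". A real runtime layer would record open events.
    """
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if max(st.st_atime, st.st_mtime) >= cutoff:
                hits.append(path)
    return sorted(hits)


def assess_recursive_delete(root, window_seconds=1800, now=None):
    """Return ("review", hits) if the tree holds recently used files, else ("allow", [])."""
    hits = recently_touched_files(root, window_seconds, now)
    return ("review" if hits else "allow", hits)
```

The point of the sketch is that the decision uses evidence from the host (what was touched, and when), none of which is visible in the prompt.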

The wedge, in one sentence

Prompt guardrails control what the agent says. Runtime guardrails control what the agent does. They operate on different evidence, use different primitives, and fail in different ways. Both are useful. If you can only afford one, the runtime layer is the one that survives the agent being wrong for any reason: adversary, hallucination, drifted fine-tune, brand-new failure mode you have never seen.

What this looks like in InnerWarden

InnerWarden runs the runtime side of this story on Linux. Before an AI agent invokes a command, it asks the local Trusted Advisor endpoint to score the command against 71 Agent Threat Rules (ATR), written in YAML, and 29 prompt-injection patterns. The advisor returns allow, review, or deny along with a reason. The agent treats this as advice; the operator gets a Telegram alert on deny regardless.
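To make the allow, review, deny shape concrete, here is a minimal sketch of an advisory scorer. It is not the Trusted Advisor's API or rule format: the Rule type, the three demo rules, their identifiers, and the advise function are all hypothetical, and the regular expressions are deliberately simplistic.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: str   # regular expression matched against the command string
    verdict: str   # "review" or "deny"
    reason: str


# Hypothetical demo rules; the real ATR rules are not reproduced here.
HYPOTHETICAL_RULES = [
    Rule("demo-001", r"\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r", "review",
         "recursive forced delete"),
    Rule("demo-002", r"/etc/shadow", "deny",
         "reads the password hash file"),
    Rule("demo-003", r"\b(curl|wget)\b.*\|\s*(ba)?sh\b", "deny",
         "pipes a download into a shell"),
]

SEVERITY = {"allow": 0, "review": 1, "deny": 2}


def advise(command, rules=HYPOTHETICAL_RULES):
    """Return the most severe verdict among matching rules, with its reason."""
    verdict, reason = "allow", "no rule matched"
    for rule in rules:
        matched = re.search(rule.pattern, command) is not None
        if matched and SEVERITY[rule.verdict] > SEVERITY[verdict]:
            verdict, reason = rule.verdict, f"{rule.rule_id}: {rule.reason}"
    return {"verdict": verdict, "reason": reason}
```

Matching on command strings is easy to evade with quoting, variables, or indirection, which is why a scorer like this can only ever be advice.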

That is the polite layer. Behind it sits the impolite one: 40 eBPF kernel hooks watching every process exec, every file open, every outbound connection on the host. If the agent ignores the advice, or the operator's own shell is the source of the dangerous syscall, or a compromised dependency is, the kernel-level layer sees it and the cross-layer correlation engine decides whether to kill the process, block the network, or just record evidence.

The two layers compose. Prompt-side filters from any vendor that ships them (Lakera, Prompt Armor, NeMo Guardrails, an in-house policy) keep running on the agent side. InnerWarden does not replace them. It is the layer underneath.

Broader than agent-only

The other thing the runtime layer gets, almost by accident, is universal coverage. The same eBPF hook that watches an AI agent's execve also watches a human attacker's. The cross-layer rule that catches an agent walking from "list users" to "read /etc/shadow" also catches an SSH brute-forcer doing the same thing.
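A sequence rule of this kind can be sketched without any notion of who the actor is. The following is an assumption-laden illustration, not InnerWarden's correlation engine: the Event fields, the lineage identifier (standing in for the root of a process tree), the choice of getent and /etc/passwd as "list users" signals, and the five-minute window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float      # event time in seconds
    lineage: int   # hypothetical id of the process tree the event belongs to
    actor: str     # display label only; the rule never reads it
    kind: str      # "exec" or "open"
    target: str    # executable path or file path


def is_user_enumeration(event):
    """Treat reading /etc/passwd or running getent as 'list users'."""
    if event.kind == "open":
        return event.target == "/etc/passwd"
    return event.kind == "exec" and event.target.rsplit("/", 1)[-1] == "getent"


def is_shadow_read(event):
    return event.kind == "open" and event.target == "/etc/shadow"


def correlate(events, window_seconds=300.0):
    """Return lineages that enumerated users and then opened /etc/shadow in the window."""
    last_enum = {}
    flagged = []
    for event in sorted(events, key=lambda e: e.ts):
        if is_user_enumeration(event):
            last_enum[event.lineage] = event.ts
        elif is_shadow_read(event):
            seen = last_enum.get(event.lineage)
            in_window = seen is not None and event.ts - seen <= window_seconds
            if in_window and event.lineage not in flagged:
                flagged.append(event.lineage)
    return flagged
```

Because the rule keys on the process lineage and the sequence of events, an agent and an SSH session that perform the same two steps are flagged identically.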

Agentic Runtime Security as a category will probably narrow in on the agent-only product. InnerWarden's position is that the agent is one of several actors on the host, and the right defensive layer treats them all the same. Read The Autonomous EDR Thesis for the attacker side of the same primitive, and Protect Your Agent for the agent integration.

Runtime Guardrails, Not Prompt Guardrails

