How InnerWarden uses AI
InnerWarden runs real machine learning, entirely on your own machine, layered into a deterministic security engine. This page walks through exactly where AI lives in the product, from the AI-free sensor at the bottom up to the optional LLM layer at the top, and what each layer is actually doing.
The LLM is an enhancement, never a dependency.
Stated in the source and enforced throughout: if no model is configured, the system proceeds on deterministic detection with no AI cost. That single sentence is the most important thing to understand about how this project uses AI.
Five layers, increasing sophistication, decreasing trust
When AI is enabled, it appears in distinct forms across the five tiers below. The lower layers are deterministic and always on; the model layers are capped, local, and removable.
Sensor
Tier 0 · No AI
The lowest layer is explicitly AI-free. It collects raw signals (eBPF syscalls, logs, network and firmware integrity) and writes them to a local store. No model, no scoring, no network call. The ground-truth evidence stream is produced by deterministic code that cannot be “argued out of” recording something. Intelligence is applied downstream of collection, never as a gate on what gets seen.
Detectors & rules
Tier 1 · No ML
This is where most “decisions” actually happen, and none of it is machine learning. The tier comprises 80+ stateful detectors plus the Agent Threat Rules engine, which screens a command or MCP tool call before an agent acts, using regex, structural AST analysis (tree-sitter), and heuristics. It is classic detection engineering, hand-tuned and auditable.
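As an illustration of the regex part of that screening, here is a minimal sketch in Python. The rule names and patterns are hypothetical and are not taken from the Agent Threat Rules engine; the structural AST analysis and heuristics the text mentions are not shown.

```python
import re

# Hypothetical rules for illustration only; the real rule set is not shown in this page.
RULES = {
    # A download piped straight into a shell, e.g. `curl ... | sh`.
    "pipe_download_to_shell": re.compile(
        r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba)?sh\b"
    ),
    # A recursive, forced delete aimed at the filesystem root.
    "recursive_delete_root": re.compile(
        r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]*\s+/(\s|$)"
    ),
}


def screen_command(command: str) -> list[str]:
    """Return the names of every rule the command matches (empty list = clean)."""
    return [name for name, pattern in RULES.items() if pattern.search(command)]
```

Because each rule is a named, readable pattern, a reviewer can see exactly why a command was flagged, which is the auditability property described above.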
On-host machine learning
Tier 2 · Real, first-party ML
A from-scratch anomaly autoencoder, written in pure Rust (no PyTorch, no external library doing the math), trained locally and nightly, per host, on ~65 behavioural features. It learns what “normal” looks like for your machine. It earns trust over a 7-day observation window before it counts, and is capped at no more than 30% of any score. No cloud.
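One way to read the warm-up window and the 30% cap is as a weighted blend. The sketch below is in Python for brevity and rests on assumptions: the function name, the [0, 1] score range, and the weighted-sum formula are illustrative; only the 7-day window and the 30% ceiling come from the text.

```python
ML_WEIGHT_CAP = 0.30   # from the text: the autoencoder is at most 30% of any score
WARMUP_DAYS = 7        # from the text: 7-day observation window before it counts


def blended_score(rule_score: float, anomaly_score: float, days_observed: int) -> float:
    """Blend a deterministic score with an anomaly score (both assumed in [0, 1]).

    During warm-up the anomaly score is ignored entirely. Afterwards it is
    weighted so that it can never supply more than ML_WEIGHT_CAP of the total.
    """
    if days_observed < WARMUP_DAYS:
        return rule_score
    return (1.0 - ML_WEIGHT_CAP) * rule_score + ML_WEIGHT_CAP * anomaly_score
```

Under this reading, even a maximal anomaly score cannot push an incident with a zero rule score above 0.30.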
Local Warden
Tier 3 · On-device, default
The component most people mean by “the AI that decides.” An ONNX classifier distilled from a SecureBERT teacher (a model pre-trained on cybersecurity text), running entirely in-process with no network calls, ~50 to 200 ms per incident on a normal server CPU. It is the default intelligence, and it never leaves your machine.
LLM triage
Tier 4 · Optional · advisory
“Add AI when you want it.” Bring your own model across 12 providers, including fully-local Ollama. It is strictly advisory unless you explicitly enable auto-execution, it can only choose from actions the system already knows how to run, and you can remove it entirely. Detection, correlation, and response still work without it.
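The "enhancement, never a dependency" property can be sketched as a fallback wrapper. This is an illustrative Python sketch, not the project's LLM router: `triage`, `KNOWN_ACTIONS`, and the shape of the returned dict are hypothetical names chosen for the example.

```python
from typing import Callable, Optional

# Hypothetical identifiers for actions the system already knows how to run.
KNOWN_ACTIONS = frozenset({"block_ip", "kill_process", "honeypot_redirect", "none"})


def triage(
    incident: dict,
    deterministic_action: str,
    llm: Optional[Callable[[dict], str]] = None,
) -> dict:
    """Return the deterministic action, with the LLM's suggestion attached as advice.

    The advice is dropped if no model is configured, if the call fails, or if
    the model proposes something outside the known action set.
    """
    result = {"action": deterministic_action, "advice": None}
    if llm is None:
        return result  # no model configured: deterministic path only
    try:
        suggestion = llm(incident)
    except Exception:
        return result  # a failing provider must not break detection
    if suggestion in KNOWN_ACTIONS:
        result["advice"] = suggestion
    return result
```

In this sketch the model never changes `action`; it can only annotate it, which mirrors the advisory default described above.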
The safety architecture around the AI
The most important engineering here is not the models; it is the deterministic guardrails wrapped around them.
A deterministic gate has the last word
Every model output passes through a deterministic Context Gate before it counts. High and Critical incidents are NEVER auto-dismissed by a model. The AI is subordinated to safety logic, not the other way around.
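A gate of this kind can be very small. The Python sketch below assumes a four-level severity scale and string verdicts; `Severity`, `context_gate`, and the `"escalate"` outcome are hypothetical names, and the text only states that High and Critical incidents are never auto-dismissed, not what happens to them instead.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def context_gate(severity: Severity, model_verdict: str) -> str:
    """Apply the deterministic override after the model has spoken.

    A model may dismiss Low or Medium incidents, but a "dismiss" verdict on a
    High or Critical incident is replaced (here, by an assumed "escalate").
    """
    if model_verdict == "dismiss" and severity >= Severity.HIGH:
        return "escalate"
    return model_verdict
```

The point of the design is ordering: the gate runs last, so no model output can reach the response path without passing through it.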
Nothing acts on its own by default
A fresh install is observe-only. Nothing is blocked or killed unless confidence clears an operator-set threshold AND you have explicitly turned off dry-run. The AI can recommend all day and change nothing until you flip two switches.
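The "two switches" rule is a plain conjunction. In this Python sketch the config field names and the default threshold value are assumptions made for illustration; only the observe-only default and the AND condition come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponderConfig:
    dry_run: bool = True                 # fresh install: observe-only
    confidence_threshold: float = 0.9    # hypothetical default; operator-set in practice


def should_execute(confidence: float, cfg: ResponderConfig) -> bool:
    """An action fires only if dry-run is off AND confidence clears the threshold."""
    return (not cfg.dry_run) and confidence >= cfg.confidence_threshold
```

With the default config, `should_execute` returns `False` for any confidence value, so recommendations are logged but nothing is enforced.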
Bounded, reversible actions
When an action does fire, the AI only ever picks from a fixed set of safe primitives (block IP, kill process, honeypot redirect), each time-bounded, audited to a local hash-chained trail, and reversible. The model chooses among safe options; it never gets a shell.
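To show what "hash-chained" buys you, here is a minimal Python sketch of an append-only trail in which every entry commits to the previous entry's hash. The entry fields, the SHA-256 choice, and the function names are assumptions for illustration; the project's actual trail format is not described in this page.

```python
import hashlib
import json

# The primitives named in the text; the identifiers themselves are hypothetical.
ALLOWED_ACTIONS = frozenset({"block_ip", "kill_process", "honeypot_redirect"})
GENESIS_HASH = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(trail: list, action: str, target: str, ttl_seconds: int) -> dict:
    """Append an audited, time-bounded action; reject anything off the allowlist."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {action!r}")
    prev = trail[-1]["hash"] if trail else GENESIS_HASH
    body = {"action": action, "target": target, "ttl_seconds": ttl_seconds, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    trail.append(entry)
    return entry


def verify_trail(trail: list) -> bool:
    """Recompute every hash and link; any edit to an earlier entry breaks the chain."""
    prev = GENESIS_HASH
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Changing any field of an earlier entry makes `verify_trail` return `False`, so tampering is detectable without a central log server.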
Real ML, and where it's plain statistics
The autoencoder is a real neural network you can read in one file. The behavioural “DNA” engine is classical statistics (cosine distance plus z-scores), not a neural net. An older scoring network is deprecated. Calling each layer what it actually is keeps the system auditable, and the default decision path has no cloud, no API key, and no per-call cost.
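For readers unfamiliar with the two statistics named for the behavioural engine, here is a small Python sketch of each. The function names and the zero-vector and zero-variance conventions are assumptions; how the engine combines the two values is not described in this page.

```python
import math
from statistics import mean, pstdev


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for identical direction, 1.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumed convention: a zero vector is maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def z_score(value: float, history: list[float]) -> float:
    """How many population standard deviations `value` sits from the history mean."""
    sigma = pstdev(history)
    if sigma == 0.0:
        return 0.0  # assumed convention: no variance, no deviation signal
    return (value - mean(history)) / sigma
```

Both are closed-form calculations with no training step, which is why the text classifies this engine as classical statistics and not as a neural network.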
And all of it is open. Every layer above (the sensor, the detectors, the autoencoder, the Local Warden classifier, the LLM router) ships in the open-source core (Apache-2.0); you can read every line in the public repository. The paid Active Defence tier is about enforcement (arming the kernel execution gate, host-level containment, fleet management), not the AI. The intelligence is free and inspectable; you only pay when you want it to act on its own, at scale.
InnerWarden is a deterministic host-security engine. Its default intelligence comes from a locally trained autoencoder and an on-device classifier distilled from SecureBERT. External or local LLMs are an optional, strictly advisory, fully removable layer, and every model output is subordinated to deterministic, red-team-tested safety gates.