Building an Autoencoder That Learns What Normal Looks Like on Your Server
Rule-based detection is excellent at catching known threats. If an attacker runs a reverse shell, fires up a crypto miner, or brute-forces SSH, deterministic rules will catch it every time. But rules have a blind spot: they can only detect what someone has already described. The novel attack, the creative technique, the thing nobody has written a signature for yet, slips through.
We tried solving this with a neural classifier. Version 10 was a supervised model trained on labeled attack data. It worked well on the training set and terribly in production. Too many false positives. Every unusual-but-legitimate workload triggered alerts. The model had learned what attacks look like, but it had no understanding of what normal looks like on your specific server.
The fix was to flip the problem. Instead of teaching a model what attacks look like, teach it what normal looks like. Then anything that does not reconstruct well is anomalous. That is the autoencoder approach, and it is what Inner Warden ships today.
Why an autoencoder, not a classifier
A classifier needs labeled examples of both normal and malicious behavior. This creates two problems. First, you need a comprehensive dataset of attacks, which is always incomplete. Second, the classifier learns to distinguish between the specific attacks in the training set and everything else. A new attack technique that was not in the training data may look more like "everything else" than like the known attacks.
An autoencoder only needs normal data. It learns to compress normal behavior into a small latent space and then reconstruct it. When you feed it something it has never seen before, the reconstruction is poor. The reconstruction error becomes the anomaly score. High error means the input does not look like anything the model learned during training.
Classifier (V10, abandoned):
Input: event features → "attack" or "benign"
Problem: needs labeled attack data
Problem: novel attacks classified as "benign"
Problem: high false positive rate on unusual workloads
Autoencoder (current):
Input: event features → compress → reconstruct → compare
Only needs normal data (your server's own traffic)
Novel attacks = high reconstruction error = flagged
Unusual-but-normal workloads = low error = ignored

Network architecture: 48 to 8 and back
The autoencoder is a bottleneck network with four layers. The input is a 48-dimensional feature vector. The encoder compresses it to 16 dimensions, then to 8. The decoder expands it back to 16, then to 48. The bottleneck of 8 neurons forces the network to learn only the most important patterns in normal behavior.
Input (48) ──→ Encode (16) ──→ Bottleneck (8) ──→ Decode (16) ──→ Output (48)
│ │
└────────────── reconstruction error = ║input - output║² ───────────┘
Activation: ReLU (hidden layers), Sigmoid (output)
Loss: Mean Squared Error between input and output
Weights: ~7.5KB total (48×16 + 16×8 + 8×16 + 16×48 + biases)
Written in pure Rust. No PyTorch. No TensorFlow. No ONNX runtime.
Inference: microseconds per event window.

Why pure Rust? Because Inner Warden runs on production servers where installing Python, CUDA, or a 200MB ML runtime is not acceptable. The entire model, including weights and inference code, compiles into the agent binary. No external dependencies. No GPU required. The ~7.5KB weight file loads in microseconds, and inference is pure matrix multiplication on the CPU.
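To make the forward pass concrete, here is a minimal Rust sketch of the same 48-16-8-16-48 shape. The `Layer` type, the placeholder weight initialization, and the function names are illustrative assumptions, not the agent's actual implementation:

```rust
/// One dense layer; weights are stored row-major as [output][input].
/// Placeholder values stand in for trained weights.
struct Layer {
    weights: Vec<Vec<f32>>,
    biases: Vec<f32>,
}

impl Layer {
    fn new(inputs: usize, outputs: usize) -> Self {
        Layer {
            weights: vec![vec![0.01; inputs]; outputs],
            biases: vec![0.0; outputs],
        }
    }

    /// ReLU on hidden layers, sigmoid on the output layer.
    fn forward(&self, x: &[f32], relu: bool) -> Vec<f32> {
        self.weights
            .iter()
            .zip(&self.biases)
            .map(|(row, b)| {
                let z: f32 = row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>() + b;
                if relu { z.max(0.0) } else { 1.0 / (1.0 + (-z).exp()) }
            })
            .collect()
    }
}

/// Anomaly score: mean squared error between input and reconstruction.
fn reconstruction_error(input: &[f32], output: &[f32]) -> f32 {
    input
        .iter()
        .zip(output)
        .map(|(a, b)| (a - b) * (a - b))
        .sum::<f32>()
        / input.len() as f32
}

fn main() {
    // The 48 -> 16 -> 8 -> 16 -> 48 bottleneck.
    let layers = [
        Layer::new(48, 16),
        Layer::new(16, 8),
        Layer::new(8, 16),
        Layer::new(16, 48),
    ];
    let input = vec![0.1f32; 48];
    let mut x = input.clone();
    for (i, layer) in layers.iter().enumerate() {
        x = layer.forward(&x, i < layers.len() - 1); // sigmoid only on the last layer
    }
    println!("reconstruction error: {:.4}", reconstruction_error(&input, &x));
}
```

A trained model would load its learned weights instead of the placeholders, but the shape of the computation is identical: four matrix-vector products and a comparison against the input.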
48 features from a sliding window
The input vector is built from a sliding window of the last 20 events. Each event has a kind (SSH login, process exec, network connect, file write, etc.), and the feature extractor produces two groups of 24 values each.
Features [0..24]: Event kind frequency
Count of each event kind in the 20-event window,
normalized to [0, 1].
Index 0: ssh_login (e.g., 3/20 = 0.15)
Index 1: ssh_failed (e.g., 8/20 = 0.40)
Index 2: process_exec (e.g., 5/20 = 0.25)
Index 3: network_connect (e.g., 2/20 = 0.10)
Index 4: file_write (e.g., 1/20 = 0.05)
...24 event kinds total
Features [24..48]: Bigram transition frequency
Count of attack-indicative two-event sequences.
Index 24: ssh_failed → ssh_success (brute-force success)
Index 25: exec → connect (post-exploit C2)
Index 26: connect → file_write (download payload)
Index 27: ssh_success → exec (lateral movement)
Index 28: file_write → exec (drop and run)
Index 29: exec → exec (rapid tool chain)
...24 bigrams total

The bigram features are the key innovation. A single ssh_failed event is normal. Eight ssh_failed events followed by one ssh_success is a brute-force that succeeded. The bigram ssh_failed → ssh_success captures this pattern as a single feature. Similarly, exec followed by connect is the signature of post-exploitation: run a tool, then phone home.
On a normal server, most bigram features stay near zero. The autoencoder learns this. When a bigram like exec to connect suddenly appears, the reconstruction error spikes because the model has never seen that pattern during training.
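A sketch of what such an extractor might look like in Rust. The event-kind indices, the two-entry bigram table, and the normalization are illustrative assumptions (the real extractor tracks 24 kinds and 24 bigrams):

```rust
// Sketch of the 48-feature extraction. Kind indices and the bigram table
// below are assumptions for illustration, not Inner Warden's actual mapping.
const WINDOW: usize = 20;
const KINDS: usize = 24;

/// Tracked bigrams: (first kind, second kind) -> feature slots 24..48.
/// The real table has 24 entries; two are shown here.
const BIGRAMS: [(usize, usize); 2] = [
    (1, 0), // ssh_failed -> ssh_login        (brute-force success)
    (2, 3), // process_exec -> network_connect (post-exploit C2)
];

/// Build the 48-dimensional feature vector from a 20-event window.
fn extract_features(window: &[usize]) -> [f32; 48] {
    let mut f = [0.0f32; 48];
    // Features [0..24]: per-kind frequency, normalized to [0, 1].
    for &kind in window {
        f[kind] += 1.0 / WINDOW as f32;
    }
    // Features [24..48]: frequency of tracked adjacent-event bigrams.
    for pair in window.windows(2) {
        if let Some(slot) = BIGRAMS.iter().position(|&b| b == (pair[0], pair[1])) {
            f[KINDS + slot] += 1.0 / WINDOW as f32;
        }
    }
    f
}

fn main() {
    // Eight failed SSH logins, one success, then ordinary process activity.
    let mut events = vec![1usize; 8];
    events.push(0);
    events.extend(std::iter::repeat(2usize).take(11));
    let f = extract_features(&events);
    println!("ssh_failed freq = {:.2}, brute-force bigram = {:.2}", f[1], f[24]);
}
```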
Lifecycle: from installation to activation
The autoencoder does not start scoring events on day one. It follows a careful lifecycle that prevents false positives during the learning period while gradually increasing its influence on the final score.
Day 0: Install
└── Sensor starts collecting events
└── Agent stores events in JSONL files
└── No autoencoder model exists yet
└── Rules and kill chain scoring operate normally
Day 1-7: Observation period
└── Events accumulate (typically 50K-500K per week)
└── Rules are the sole scoring mechanism
└── Autoencoder weight in final score: 0
Day 7, 3:00 AM: First training run
└── Read 7 days of events from JSONL
└── Extract feature vectors (sliding windows)
└── Train for 50 epochs
└── Timeout: 30 minutes
└── RAM budget: 500MB
└── Auto-test: verify reconstruction error on training data
└── Save model weights (~7.5KB)
Day 8+: Nightly retraining
└── Every night at 3 AM, retrain on last 7 days
└── Model adapts to evolving server behavior
└── Maturity score increases each cycle

The maturity curve
A freshly trained autoencoder is not trusted as much as one that has been retraining for 30 days. The maturity score controls how much weight the autoencoder gets in the final severity calculation. It follows an exponential curve:
maturity = 1 - e^(-0.1 * training_cycles)
Training cycle 1 (Day 8): maturity = 0.095 (~10%)
Training cycle 7 (Day 14): maturity = 0.503 (~50%)
Training cycle 14 (Day 21): maturity = 0.753 (~75%)
Training cycle 30 (Day 37): maturity = 0.950 (~95%)
// The curve asymptotes at 1.0 but never reaches it.
// After 30 cycles, the model is effectively at full trust.

This design means a brand-new model with only one training cycle contributes less than 10% of its potential weight to the final score. If it produces a false positive, the impact is minimal. By day 37, the model has retrained 30 times on 30 different 7-day windows. It has seen your server through weekday traffic, weekend lulls, monthly cron jobs, and deployment spikes. At 95% maturity, it has earned its influence.
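The curve above is a one-liner in code. A direct transcription of the formula:

```rust
/// Maturity: exponential saturation in the number of completed training cycles.
/// maturity = 1 - e^(-0.1 * training_cycles)
fn maturity(training_cycles: u32) -> f64 {
    1.0 - (-0.1 * training_cycles as f64).exp()
}

fn main() {
    // Reproduces the table above: cycles 1, 7, 14, 30.
    for cycles in [1u32, 7, 14, 30] {
        println!("cycle {:>2}: maturity = {:.3}", cycles, maturity(cycles));
    }
}
```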
How the anomaly score integrates with rules
Inner Warden computes a final severity score from three sources. Each source contributes a weighted fraction, and the autoencoder's contribution is scaled by its maturity.
final_score = rules_score * 0.4
+ killchain_score * 0.3
+ anomaly_score * 0.3 * maturity
// Example: Day 8 (maturity = 0.095)
// anomaly detects something unusual: anomaly_score = 0.9
// effective anomaly contribution: 0.9 * 0.3 * 0.095 = 0.026
// Barely moves the needle. Good.
// Example: Day 37 (maturity = 0.950)
// same anomaly: 0.9 * 0.3 * 0.950 = 0.256
// Significant contribution. The model has earned trust.
// If rules_score = 0 and killchain_score = 0 but anomaly is high,
// this is exactly the scenario the autoencoder was built for:
// novel attack that no rule covers.

The three-source scoring creates defense in depth. A known attack triggers rules. A multi-step attack triggers the kill chain engine. A novel attack that evades both still gets flagged by the autoencoder. All three need to miss for an attack to go undetected.
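The blend is simple enough to sketch as a function. The 0.4 / 0.3 / 0.3 weights come straight from the formula above; the function name is illustrative:

```rust
/// Final severity: rules and kill chain at fixed weight,
/// the anomaly score scaled down by model maturity.
fn final_score(rules: f64, killchain: f64, anomaly: f64, maturity: f64) -> f64 {
    rules * 0.4 + killchain * 0.3 + anomaly * 0.3 * maturity
}

fn main() {
    // Day 8 vs. day 37: the same anomaly carries very different weight.
    println!("day 8:  {:.3}", final_score(0.0, 0.0, 0.9, 0.095));
    println!("day 37: {:.3}", final_score(0.0, 0.0, 0.9, 0.950));
}
```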
Rules as teacher: self-correcting false positives
The autoencoder has a built-in self-correction mechanism. It only trains on events that the rule engine considers benign. If the rules say an event is normal, the autoencoder learns to reconstruct it with low error. If the autoencoder later flags a similar event as anomalous, the next nightly training cycle absorbs it into the model of normal behavior.
Day 8: New deployment tool runs for the first time.
Rules: no match (benign)
Kill chain: no match (benign)
Autoencoder: high reconstruction error (anomaly!)
Result: small bump in score (maturity is low)
Day 9: Nightly training includes yesterday's deployment events.
Autoencoder learns: this pattern is normal.
Day 10: Same deployment tool runs again.
Autoencoder: low reconstruction error (normal)
False positive eliminated automatically.

This is the key advantage of training on the server's own data. A classifier trained on a generic dataset would keep flagging your custom deployment tool forever. The autoencoder adapts because it retrains every night on your server's actual behavior.
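The mechanism itself is a one-line filter over the training stream: only events the rule engine left unflagged become training data. The `Event` shape here is a stand-in for illustration, not the agent's actual type:

```rust
/// Stand-in event type; the real agent's events carry far more metadata.
#[derive(Clone, Debug, PartialEq)]
struct Event {
    kind: String,
    rule_matched: bool, // did any rule flag this event?
}

/// Rules as teacher: only events the rule engine considers benign
/// are allowed into the autoencoder's training set.
fn training_events(events: &[Event]) -> Vec<Event> {
    events.iter().filter(|e| !e.rule_matched).cloned().collect()
}

fn main() {
    let events = vec![
        Event { kind: "process_exec".into(), rule_matched: false },
        Event { kind: "reverse_shell".into(), rule_matched: true },
    ];
    println!("{} of {} events kept for training", training_events(&events).len(), events.len());
}
```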
Training: 50 epochs at 3 AM
Training happens nightly at 3 AM when most servers have minimal load. The process reads 7 days of events from JSONL files, extracts feature vectors using the sliding window, and runs 50 epochs of gradient descent.
// Training configuration
const EPOCHS: usize = 50;
const LEARNING_RATE: f32 = 0.001;
const WINDOW_SIZE: usize = 20; // sliding window of events
const FEATURE_DIM: usize = 48; // 24 kinds + 24 bigrams
const TIMEOUT: Duration = Duration::from_secs(30 * 60); // 30 min
const RAM_BUDGET: usize = 500 * 1024 * 1024; // 500 MB
// Training reads events from the last 7 days
// Events are stored in /var/lib/innerwarden/incidents-*.jsonl
// Each line is a JSON object with kind, timestamp, metadata
// Feature extraction produces one vector per window position
// Auto-test after training:
// Run inference on a sample of training data
// If mean reconstruction error > threshold, discard model
// Keep previous model weights as fallback

The 500MB RAM budget is a hard limit. If the 7-day event window produces more data than fits in memory, the trainer uses reservoir sampling to get a representative subset. On most servers, a week of events fits comfortably within the budget. The 30-minute timeout ensures training never impacts daytime operations, even on slower hardware.
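Reservoir sampling keeps a uniform fixed-size sample while touching each item only once, in O(k) memory regardless of stream length. A sketch (the small LCG is a stand-in for whatever RNG the trainer actually uses):

```rust
/// Keep a uniform random sample of at most `k` items from a stream,
/// using O(k) memory regardless of how long the stream is.
fn reservoir_sample<T>(stream: impl Iterator<Item = T>, k: usize, seed: u64) -> Vec<T> {
    let mut rng = seed;
    // Minimal linear congruential generator; stands in for a real RNG.
    let mut next = move || {
        rng = rng.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        rng
    };
    let mut sample: Vec<T> = Vec::with_capacity(k);
    for (i, item) in stream.enumerate() {
        if i < k {
            sample.push(item); // fill the reservoir first
        } else {
            // Item i survives with probability k / (i + 1).
            let j = (next() % (i as u64 + 1)) as usize;
            if j < k {
                sample[j] = item;
            }
        }
    }
    sample
}

fn main() {
    let sample = reservoir_sample(0..1_000_000u32, 1000, 42);
    println!("kept {} of 1,000,000 feature vectors", sample.len());
}
```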
7.5KB of learned behavior
The total number of weights in the network is small by design. The four weight matrices plus biases add up to roughly 7.5KB when serialized. For comparison, a typical PyTorch model checkpoint for a similar architecture would be 50-100KB due to optimizer state and metadata. Inner Warden stores only the raw float32 weights.
Layer 1 (encode): 48 x 16 = 768 weights + 16 biases
Layer 2 (bottle): 16 x 8 = 128 weights + 8 biases
Layer 3 (decode): 8 x 16 = 128 weights + 16 biases
Layer 4 (output): 16 x 48 = 768 weights + 48 biases
Total: 1,792 weights + 88 biases = 1,880 parameters
Size: 1,880 x 4 bytes (f32) = 7,520 bytes (~7.5KB)
// Stored at /var/lib/innerwarden/anomaly-model.bin
// Loaded once at agent startup, replaced on nightly retrain
// Inference: 4 matrix multiplications + 4 bias additions
// No dynamic allocation during inference

Microsecond inference with zero allocation. The feature vector goes in, the reconstruction error comes out, and the agent continues processing the next event. There is no batching, no GPU transfer, no Python interpreter startup. Just Rust multiplying small matrices on the stack.
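As a sanity check, the parameter arithmetic above can be reproduced from the layer widths alone (a standalone sketch, not the agent's code):

```rust
/// Layer widths of the 48-16-8-16-48 bottleneck.
const DIMS: [usize; 5] = [48, 16, 8, 16, 48];

/// (weight count, bias count) for a dense network with these widths.
fn param_count(dims: &[usize]) -> (usize, usize) {
    let weights: usize = dims.windows(2).map(|p| p[0] * p[1]).sum();
    let biases: usize = dims[1..].iter().sum();
    (weights, biases)
}

fn main() {
    let (w, b) = param_count(&DIMS);
    // 1,792 weights + 88 biases = 1,880 f32 parameters = 7,520 bytes.
    println!("{} weights + {} biases = {} bytes", w, b, (w + b) * 4);
}
```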
What it looks like in practice
Here is a real scenario. A server has been running Inner Warden for 30 days. The autoencoder has been trained 23 times. Maturity is 0.90. An attacker gets in through a zero-day in a web application that no rule covers.
14:22:01 Event: process_exec (nginx worker spawns /bin/sh)
14:22:01 Event: process_exec (/bin/sh spawns curl)
14:22:02 Event: network_connect (curl -> 185.220.101.XX:443)
14:22:03 Event: file_write (/tmp/.cache_update)
14:22:03 Event: process_exec (/tmp/.cache_update)
14:22:04 Event: network_connect (.cache_update -> 45.XX.XX.XX:4444)
Sliding window features:
exec frequency: 0.30 (normally ~0.05)
connect frequency: 0.10 (normally ~0.02)
bigram exec->connect: 0.15 (normally 0.00)
bigram write->exec: 0.05 (normally 0.00)
Rules score: 0.0 (no signature matches this zero-day)
Kill chain: 0.0 (not enough stages yet)
Anomaly score: 0.87 (reconstruction error far above threshold)
Effective: 0.87 * 0.3 * 0.90 = 0.235
Combined score: 0.235 -> Medium severity incident
Agent AI triage: escalates based on exec->connect bigram pattern

Without the autoencoder, this attack would have been invisible until more kill chain stages triggered. The autoencoder caught it during the initial exploitation phase because it recognized the event pattern as fundamentally different from anything the server normally does.
Why not use an off-the-shelf ML solution?
The goal was never to build the most powerful anomaly detection model. It was to build one that actually runs on production servers without creating operational burden. A 7.5KB model that retrains itself every night and requires zero configuration is more useful than a 2GB model that needs a data science team to maintain.
Nothing to configure
The autoencoder is enabled by default. Install Inner Warden and it starts collecting events immediately. After 7 days, the first training run happens at 3 AM. You do not need to label data, tune hyperparameters, or provision a GPU.
curl -fsSL https://www.innerwarden.com/install | sudo bash

After the first training cycle, you will see anomaly scores in the dashboard alongside rule-based and kill chain scores. The maturity indicator shows how much trust the model has earned. Within about a month, it is operating near full maturity and catching things that no signature could.
What to read next
- Baseline learning - the EMA-based behavioral profiling that complements the autoencoder with process lineage and login hour tracking.
- Cross-layer correlation - how anomaly scores feed into the kill chain engine for multi-stage attack detection.
- Behavioral DNA - fingerprinting attackers across sessions using patterns that the autoencoder helps identify.
- eBPF kernel security - the 40 eBPF hooks that generate the events the autoencoder trains on.