How We Detect Rootkits With CPU Timing (No Kernel Module Needed)
Rootkits hide. That is their entire purpose. They intercept system calls, modify kernel data structures, and make themselves invisible to every tool that relies on the kernel telling the truth. If the kernel is compromised, anything the kernel reports is suspect.
But there is one thing a rootkit cannot fake: how long things take. Every hook, every interception, every extra instruction adds CPU cycles. Those cycles are measurable from userspace with no kernel module, no TPM chip, and no special hardware. Inner Warden's chronomancy crate uses this principle to detect rootkits entirely from userspace, building on published research that achieved a 98.7% F1 score.
The core idea: if it is hooked, it is slower
A clean kernel function takes a predictable number of CPU cycles. When a rootkit hooks that function, it adds its own code: checking conditions, hiding processes, filtering file listings. That code takes time. Even a few hundred extra nanoseconds per call create a measurable statistical anomaly when you sample thousands of calls.
This is not a new idea. MITRE documented it in 2013 as the BIOS Chronomancy technique for firmware timing attestation via CPU cycle counters. The 2025 paper "Trace of the Times" (arXiv:2503.02402) formalized it for kernel function timing anomaly detection and achieved a 98.7% F1 score detecting rootkits in controlled experiments. Inner Warden builds on both of these foundations.
Reading the CPU cycle counter
Modern CPUs have a high-resolution cycle counter that increments every clock tick. On x86_64, you read it with the RDTSC or RDTSCP instruction. On aarch64 (ARM), you read the CNTVCT_EL0 register with the MRS instruction. Both are available from userspace with no privileges required.
// Read the Time-Stamp Counter with RDTSCP, which also returns
// the processor ID. RDTSCP waits for prior instructions to retire,
// preventing out-of-order execution from skewing the measurement.
#[cfg(target_arch = "x86_64")]
fn read_cycles() -> u64 {
    let mut aux = 0u32; // receives the processor ID (IA32_TSC_AUX)
    unsafe { core::arch::x86_64::__rdtscp(&mut aux) }
}
// Measure a syscall's timing
let before = read_cycles();
unsafe { libc::getpid() }; // target syscall (libc calls are unsafe)
let after = read_cycles();
let elapsed = after - before;

// ARM Counter-timer Virtual Count register
// Accessible from EL0 (userspace) by default
#[cfg(target_arch = "aarch64")]
fn read_cycles() -> u64 {
    let cnt: u64;
    unsafe {
        core::arch::asm!("mrs {}, cntvct_el0", out(reg) cnt);
    }
    cnt
}

The key insight: these counters are implemented in hardware. A rootkit cannot intercept the RDTSC instruction itself without introducing even more timing overhead. The measurement tool and the thing being measured operate at different levels, so tampering with the measurement produces the very anomaly the measurement is designed to detect.
From raw cycles to rootkit detection
A single timing measurement tells you nothing. CPU caches, branch prediction, interrupts, and context switches all add noise. The solution is statistical: collect thousands of samples and look for distributional anomalies that cannot be explained by normal system variance.
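To see that noise concretely, here is a minimal, portable sketch that samples one cheap operation many times. It uses std::time::Instant as a stand-in for the raw cycle counter and std::process::id() (a getpid wrapper) as a stand-in probe target; neither is chronomancy's actual mechanism.

```rust
use std::time::Instant;

/// Collect n nanosecond-resolution timing samples of a single cheap
/// operation. Individual samples vary wildly; only the distribution
/// across many samples is meaningful.
fn collect_samples(n: usize) -> Vec<u64> {
    let mut samples = Vec::with_capacity(n);
    for _ in 0..n {
        let start = Instant::now();
        std::hint::black_box(std::process::id()); // target operation
        samples.push(start.elapsed().as_nanos() as u64);
    }
    samples
}

fn main() {
    let mut s = collect_samples(5000);
    s.sort_unstable();
    // Even an "identical" operation shows a wide spread across samples.
    println!("min={} median={} max={}", s[0], s[s.len() / 2], s[s.len() - 1]);
}
```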
Inner Warden's chronomancy crate uses a three-stage detection pipeline:
Stage 1: Collect N timing samples for each target operation (syscall, CPUID, firmware call) and extract the 25th, 50th, 75th, 90th, and 99th percentiles. These quantiles form a compact fingerprint of the timing distribution. A clean system produces tight, consistent quantiles. A hooked system shows inflated upper percentiles because the hook code adds variable overhead depending on its logic path.
Stage 2: Compare the observed quantile vector against a baseline using Mahalanobis distance. Unlike simple threshold comparisons, Mahalanobis distance accounts for the correlation structure between quantiles. If the 90th and 99th percentiles always move together on clean systems, an observation where only the 99th is inflated is flagged as suspicious. This catches subtle hooks that only add overhead in specific code paths.
Stage 3: Convert the distance to a p-value. Under the null hypothesis (no rootkit), the squared Mahalanobis distance follows a chi-squared distribution. If the p-value is below the threshold (typically 0.001), the timing distribution is statistically incompatible with a clean system. This gives a principled, probabilistic decision boundary instead of arbitrary cycle-count thresholds.
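The three stages can be sketched in a few lines of Rust. This is an illustration, not chronomancy's internals: it assumes a diagonal covariance matrix (independent quantiles), which reduces Mahalanobis distance to a standardized Euclidean distance, whereas the real detector uses the full inverse covariance. All numbers are illustrative.

```rust
/// Stage 1: quantile fingerprint (p25/p50/p75/p90/p99) from raw samples.
fn fingerprint(samples: &mut [u64]) -> [f64; 5] {
    samples.sort_unstable();
    let q = |p: f64| samples[((samples.len() - 1) as f64 * p).round() as usize] as f64;
    [q(0.25), q(0.50), q(0.75), q(0.90), q(0.99)]
}

/// Stage 2: squared Mahalanobis distance under a diagonal-covariance
/// model (per-quantile variances instead of a full covariance matrix).
fn mahalanobis_sq(obs: &[f64; 5], mean: &[f64; 5], var: &[f64; 5]) -> f64 {
    obs.iter()
        .zip(mean)
        .zip(var)
        .map(|((o, m), v)| (o - m).powi(2) / v)
        .sum()
}

fn main() {
    // Stage 1 on a toy sample set (values in cycles).
    let mut raw: Vec<u64> = vec![950, 820, 1100, 1350, 1180, 940, 960];
    println!("fingerprint = {:?}", fingerprint(&mut raw));

    // Illustrative clean-system baseline: quantile means and variances.
    let mean = [820.0, 950.0, 1100.0, 1180.0, 1350.0];
    let var = [400.0, 500.0, 900.0, 1600.0, 4900.0];

    // Stages 2 + 3: under the null, d^2 ~ chi-squared with 5 degrees of
    // freedom; the critical value at p = 0.001 is roughly 20.5.
    let hooked = [1150.0, 1420.0, 1880.0, 2100.0, 2650.0];
    let d2 = mahalanobis_sq(&hooked, &mean, &var);
    println!("d^2 = {:.1}, anomalous = {}", d2, d2 > 20.5);
}
```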
use chronomancy::{TimingProbe, ProbeTarget, Verdict};

let probe = TimingProbe::new()
    .target(ProbeTarget::Syscall)  // measure syscall latency
    .samples(5000)                 // collect 5000 timing samples
    .baseline_from_boot();         // compare against boot-time baseline

let result = probe.run();
match result.verdict {
    Verdict::Clean => { /* timing matches baseline */ }
    Verdict::Anomalous { p_value, distance } => {
        // p_value: 0.00003 (far below the 0.001 threshold)
        // distance: 14.7 (Mahalanobis distance)
        // Something is hooking this syscall path.
        report_incident(result);
    }
}

What timing analysis detects
Timing-based detection is effective against threats that are specifically designed to evade traditional tools. Here are the categories chronomancy targets:
- Syscall table hooks: a rootkit that replaces entries in the syscall table (e.g., hooking getdents64 to hide files, or kill to hide processes) adds indirection. Every hooked syscall takes measurably longer because it passes through the rootkit's filter function before reaching the real handler.
- eBPF weaponization: malicious eBPF programs attached to tracepoints or kprobes add processing to every traced event. If an attacker uses eBPF to intercept and modify syscall results, the timing overhead is detectable. The irony: using eBPF for offense creates the same timing signature we look for.
- Firmware rootkits: rootkits that live in UEFI firmware or SMM (System Management Mode) intercept specific CPU instructions. CPUID calls that pass through a hypervisor or firmware hook take longer than bare-metal execution. The MITRE BIOS Chronomancy technique specifically targets this class of threat.
- Hypervisor-based rootkits: a malicious hypervisor (blue pill attack) intercepts privileged instructions via VM exits. Each VM exit adds thousands of cycles. Timing CPUID instructions from inside the VM reveals whether a hidden hypervisor is present.
- Inline function hooks: rootkits that patch kernel function prologues with JMP instructions add branch overhead and disrupt CPU pipeline prediction. The timing signature is subtle but consistent across thousands of samples.
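To illustrate the hypervisor case, here is a minimal x86_64 sketch (not chronomancy's actual probe code) that times CPUID with RDTSC and keeps the minimum over many iterations. The function name and iteration count are illustrative choices.

```rust
#[cfg(target_arch = "x86_64")]
fn min_cpuid_cycles(iters: u32) -> Option<u64> {
    use core::arch::x86_64::{__cpuid, _rdtsc};
    let mut min = u64::MAX;
    for _ in 0..iters {
        // Take the minimum over many runs: interrupts and cache misses
        // only ever add cycles, so the minimum approximates the true
        // cost of the instruction itself.
        let start = unsafe { _rdtsc() };
        std::hint::black_box(unsafe { __cpuid(0) }); // leaf 0: vendor string
        let end = unsafe { _rdtsc() };
        min = min.min(end.wrapping_sub(start));
    }
    Some(min)
}

#[cfg(not(target_arch = "x86_64"))]
fn min_cpuid_cycles(_iters: u32) -> Option<u64> {
    None // CPUID is x86-only
}

fn main() {
    if let Some(cycles) = min_cpuid_cycles(10_000) {
        // Bare metal: typically a few hundred cycles. A hypervisor that
        // traps CPUID adds a VM exit, costing thousands of cycles.
        println!("min CPUID round-trip: {cycles} cycles");
    }
}
```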
Why this works without TPM or kernel modules
Traditional integrity verification relies on TPM (Trusted Platform Module) chips for measured boot, or on loading kernel modules like LKRG to verify kernel data structures. Both approaches have limitations. TPM only verifies the boot chain, not runtime integrity. Kernel modules require maintenance across kernel versions and can themselves be targeted by rootkits.
Timing-based detection avoids both problems. It runs entirely in userspace, requires no special hardware, and works on any Linux kernel. The CPU cycle counter is a hardware feature that exists on every modern processor. The measurement is physics, not software. A rootkit would need to slow down the CPU's clock itself to fake clean timing results, which is not possible from software.
Approach | Hardware | Kernel Module | Runtime Detection
------------------+----------+---------------+------------------
TPM Measured Boot | Yes | No | No (boot only)
LKRG | No | Yes | Yes
eBPF integrity | No | No (but root) | Yes
CPU Timing (ours) | No | No | Yes
CPU Timing needs: a CPU with RDTSC/CNTVCT. That is all.

Academic foundations
Inner Warden's implementation draws from two key research contributions:
BIOS Chronomancy (MITRE, 2013) introduced the concept of using CPU timing to verify firmware integrity. The technique measures BIOS/UEFI function execution times and compares them against known-good baselines. If firmware has been modified (e.g., by a persistent rootkit), the execution timing changes because the modified code takes a different number of cycles. This was the first formalization of timing as an integrity attestation mechanism.
"Trace of the Times" (arXiv:2503.02402, 2025) extended the timing approach to kernel-level rootkit detection. The paper demonstrated that kernel function hooking produces statistically detectable timing anomalies with a 98.7% F1 score. Key contributions include the quantile-based feature extraction method, the use of Mahalanobis distance for multivariate anomaly detection, and validation against real rootkits including Diamorphine, Reptile, and custom eBPF-based hooks.
The chronomancy crate
Inner Warden implements this as the chronomancy crate, written in Rust with zero unsafe code outside the inline assembly for cycle counter reads. The crate is also available as a standalone open-source tool for anyone who wants timing-based rootkit detection without running the full Inner Warden stack.
chronomancy/
  src/
    lib.rs          // public API
    probe.rs        // timing measurement engine
    baseline.rs     // boot-time baseline collection
    detector.rs     // Mahalanobis + chi-squared analysis
    arch/
      x86_64.rs     // RDTSC/RDTSCP implementation
      aarch64.rs    // MRS CNTVCT_EL0 implementation
    targets/
      syscall.rs    // syscall timing probes
      cpuid.rs      // CPUID instruction timing
      firmware.rs   // firmware call timing

The crate supports both x86_64 and aarch64 architectures. On x86_64, it uses RDTSCP for serialized reads (preventing out-of-order execution from corrupting measurements). On aarch64, it reads the CNTVCT_EL0 counter, which is available from userspace (EL0) on all ARMv8+ processors.
# Install the standalone tool
cargo install chronomancy
# Run a full timing scan
chronomancy scan --samples 10000
# Output:
# [CLEAN] getpid p=0.847 distance=1.2
# [CLEAN] getuid p=0.634 distance=2.1
# [ALERT] getdents64 p=0.00003 distance=14.7
# ^ This syscall is being hooked.
# Something is filtering directory listings.

Dealing with noise
CPU timing is inherently noisy. Context switches, interrupts, cache misses, TLB flushes, and CPU frequency scaling all affect measurements. Reaching detection accuracy like the published 98.7% F1 score requires careful noise reduction:
- Outlier trimming: discard the top and bottom 1% of samples. Interrupt jitter produces extreme outliers that skew statistics. Trimming removes these without affecting the core distribution.
- CPU pinning: pin the measurement thread to a single core using sched_setaffinity. Cross-core migration causes cache invalidation that looks like timing anomalies.
- Warmup rounds: run 500 warmup iterations before collecting real samples. This primes the instruction cache, branch predictor, and TLB so that measurements reflect steady-state behavior.
- Adaptive baselines: the baseline is not a fixed number. It is a full statistical distribution collected at boot time and periodically refreshed. This handles CPU frequency changes, thermal throttling, and varying system load.
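The outlier-trimming step can be sketched as a small standalone function (illustrative, not the crate's internals):

```rust
/// Discard the top and bottom 1% of samples before computing statistics,
/// so interrupt- and context-switch-induced spikes cannot skew the
/// quantile fingerprint.
fn trim_outliers(samples: &mut Vec<u64>) {
    samples.sort_unstable();
    let cut = samples.len() / 100; // 1% at each tail
    samples.drain(..cut);
    let keep = samples.len() - cut;
    samples.truncate(keep);
}

fn main() {
    // 1000 well-behaved samples plus a handful of huge outliers that
    // mimic interrupts landing mid-measurement.
    let mut samples: Vec<u64> = (0..1000).map(|i| 900 + (i % 50)).collect();
    samples.extend([250_000, 310_000, 275_000]);
    trim_outliers(&mut samples);
    // The spikes fall inside the trimmed top 1% and disappear:
    // 983 samples remain, max = 949.
    println!(
        "{} samples remain, max = {}",
        samples.len(),
        samples.iter().max().unwrap()
    );
}
```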
How it fits into the Inner Warden pipeline
The sensor runs chronomancy probes on a configurable interval (default: every 5 minutes). Results flow through the same event pipeline as eBPF events, log parsing, and network analysis. When chronomancy detects an anomaly, it generates an incident with the specific syscall or instruction that is exhibiting anomalous timing, the statistical confidence, and the raw timing data.
chronomancy probe (every 5min)
-> collect 5000 timing samples per target
-> extract quantile fingerprint
-> compute Mahalanobis distance vs baseline
-> chi-squared p-value
-> if p < 0.001: generate incident
-> agent correlates with other signals
  -> response: alert, investigate, isolate

The agent correlates chronomancy alerts with other indicators. A timing anomaly on getdents64 combined with hidden processes detected by /proc enumeration provides high-confidence rootkit detection. A CPUID timing anomaly combined with DMI fingerprint mismatches confirms hypervisor-level compromise.
Real-world scenario: detecting a getdents64 hook
Consider a classic rootkit like Diamorphine. It hooks the getdents64 syscall to hide files and processes whose names start with a specific prefix. When you run ls or ps, the rootkit intercepts the kernel's directory listing results and removes entries before they reach userspace.
From the perspective of traditional tools, everything looks normal. The hidden process does not appear in ps, top, or /proc. The hidden file does not appear in ls. There are no log entries because the rootkit operates below the logging layer.
But chronomancy sees it. On a clean system, getdents64 takes approximately 800-1200 cycles for a small directory. With Diamorphine's hook active, the same call takes 1400-2200 cycles because the hook function iterates through the directory entries, checks each name against the hide prefix, and rebuilds the buffer without the hidden entries. The 75th and 99th percentiles shift upward. The Mahalanobis distance spikes. The p-value drops below 0.0001.
# Clean system baseline (getdents64, cycles):
# p25=820 p50=950 p75=1100 p90=1180 p99=1350
#
# With Diamorphine loaded:
# p25=1150 p50=1420 p75=1880 p90=2100 p99=2650
#
# Mahalanobis distance: 18.3
# Chi-squared p-value: 0.0000012
# Verdict: ANOMALOUS
#
# The getdents64 syscall is being intercepted.
# Correlating with /proc enumeration discrepancy...
# Hidden PID 31337 detected via /proc brute-force.
# Classification: kernel rootkit (syscall table hook)

Limitations and honest trade-offs
Timing-based detection is powerful, but it is not perfect. Understanding its limitations is important for proper deployment:
- Noisy environments: heavily loaded servers with constant context switching produce noisier baselines. The detection threshold needs to be adjusted for the specific workload. This is why Inner Warden uses adaptive baselines rather than fixed thresholds.
- Sophisticated evasion: a rootkit that adds no overhead during the probe window (e.g., one whose hooks activate only under specific conditions and stay dormant otherwise) could evade timing detection. Randomizing probe timing and targets mitigates this.
- VM overhead: virtual machines add baseline timing overhead to privileged instructions. The baseline must be established within the same VM. Cross-environment baselines are not valid.
- Not a replacement: timing analysis complements other detection methods. It is one layer in a defense-in-depth strategy. Inner Warden combines it with eBPF monitoring, log analysis, file integrity checking, and behavioral correlation.
Enable timing-based detection
Chronomancy probes are enabled by default in Inner Warden. You can configure the probe interval, sample count, and detection threshold in your configuration file.
# /etc/innerwarden/config.toml
[sensor.chronomancy]
enabled = true
interval_secs = 300 # probe every 5 minutes
samples = 5000 # samples per probe target
p_value_threshold = 0.001 # detection sensitivity
targets = [
"syscall", # syscall table integrity
"cpuid", # hypervisor detection
"firmware", # firmware/UEFI integrity
]

What to do next
- eBPF for kernel security to understand how Inner Warden monitors syscalls in real time using eBPF tracepoints and kprobes.
- Firmware integrity monitoring to learn how chronomancy integrates with UEFI and SMM verification for Ring -2 protection.
- Why CrowdStrike can't see firmware to see why traditional EDR tools miss the threats that timing analysis catches.