Detection Engineering

208 Sigma Rules on eBPF: Bridging Community Detection to Kernel Telemetry


Sigma is the open standard for detection rules. Think of it as Snort for logs, or YARA for events. Security teams write rules in YAML that describe suspicious behavior, and the community has published thousands of them. The problem is that Sigma grew up around Windows and SIEMs: Splunk, Elastic, Microsoft Sentinel. Most field names follow Windows Event Log and Sysmon conventions. The backends assume log indexers. Live eBPF telemetry on Linux was not part of the original design.

We imported 208 Sigma rules from SigmaHQ and made them work with Inner Warden's eBPF kernel telemetry. This post explains what worked, what broke, and what we learned about bridging community detection logic to kernel-level event streams.

What Sigma rules are

Sigma is a generic signature format for log events. A rule describes what to look for (the detection logic), what it means (the description and MITRE mapping), and how noisy it is (the level and false positive notes). The community maintains over 3,000 rules in the SigmaHQ repository, covering everything from credential dumping to DNS tunneling.

Example Sigma rule
title: Suspicious Curl Download to Tmp
logsource:
  category: process_creation
  product: linux
detection:
  selection:
    Image|endswith: '/curl'
    CommandLine|contains|all:
      - '-o'
      - '/tmp/'
  condition: selection
level: medium
tags:
  - attack.execution
  - attack.t1059

The beauty of Sigma is portability. One rule, many backends. Except the backends we know of all target log indexers: Splunk SPL, Elasticsearch DSL, Azure KQL. We could not find one for live kernel telemetry, so we built it.

The problem: Sigma speaks Windows SIEM

Sigma's field names come from Windows Event Logs. A process creation rule uses Image for the binary path, CommandLine for the full command, ParentImage for the parent process binary, and User for the account name. These come from Sysmon Event ID 1 on Windows.

eBPF on Linux does not have these fields. When the sys_enter_execve tracepoint fires, you get a binary path in details.filename, command arguments in details.command, and a parent process path in details.parent. The data is the same conceptually, but the field names and structure are completely different.

Beyond field names, there are deeper differences. Sigma assumes you are querying indexed logs retroactively. Inner Warden evaluates events in real time as they stream from the kernel. There is no query language, no index, no aggregation. Each event must be matched against all applicable rules in microseconds, not milliseconds.

Importing 208 rules from SigmaHQ

We filtered the SigmaHQ repository for Linux-relevant rules and imported 208 of them. The breakdown by logsource category:

Rules by category

  • process_creation, 120 rules. Process execution patterns: reverse shells, crypto miners, suspicious binaries, encoded commands, LOLBins.
  • auditd, 53 rules. Audit framework events: file access, permission changes, module loading, user management.
  • builtin, 22 rules. System log patterns: SSH events, cron activity, authentication anomalies from syslog/journald.
  • file_event, 8 rules. File creation and modification patterns: webshell drops, cron persistence, authorized_keys tampering.
  • network, 5 rules. Network connection patterns: C2 callbacks, DNS tunneling indicators, unusual outbound ports.

Each rule was converted from its SigmaHQ YAML format into Inner Warden's internal rule representation. The rules live alongside our hand-written detectors and are evaluated on the same event stream. No separate pipeline, no separate engine.

Field aliasing: translating Windows to Linux

The first challenge was mapping Sigma's Windows-centric field names to Inner Warden's eBPF event structure. This is not just renaming. Some fields require extraction from nested structures, and some Sigma fields have no direct equivalent.

Field mapping table
Sigma field          → Inner Warden field
─────────────────────────────────────────────
Image                → details.filename
CommandLine          → details.command
ParentImage          → details.parent
ParentCommandLine    → details.parent_command
User                 → details.user
CurrentDirectory     → details.cwd
TargetFilename       → details.target_path
SourceIp             → details.source_ip
DestinationIp        → details.dest_ip
DestinationPort      → details.dest_port

The alias layer sits between the Sigma parser and the matching engine. When a rule references Image|endswith: '/curl', the alias layer rewrites it to details.filename ends_with "/curl" before evaluation. This happens at parse time, not at match time, so there is no runtime cost.
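
A minimal sketch of parse-time aliasing, assuming a flat lookup table like the one above. The names rewrite_key and lookup, and the nested-dict event shape, are illustrative assumptions rather than Inner Warden's actual code.

```python
# Hypothetical alias table mirroring part of the mapping above.
FIELD_ALIASES = {
    "Image": "details.filename",
    "CommandLine": "details.command",
    "ParentImage": "details.parent",
    "User": "details.user",
}


def rewrite_key(sigma_key):
    """Split a key like 'Image|endswith' into (aliased path, modifiers).

    Runs once when the rule is loaded. Raises KeyError for a field with
    no known equivalent, so the caller can decide to disable the rule.
    """
    name, *modifiers = sigma_key.split("|")
    return FIELD_ALIASES[name], tuple(modifiers)


def lookup(event, path):
    """Resolve a dotted path such as 'details.filename' in a nested dict."""
    value = event
    for part in path.split("."):
        value = value[part]
    return value


event = {"details": {"filename": "/usr/bin/curl", "command": "curl -o /tmp/x"}}
path, modifiers = rewrite_key("Image|endswith")
print(path, modifiers, lookup(event, path))
```

Because the rewrite happens when the rule is loaded, the matcher only ever sees the translated path and never consults the alias table per event.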

Rewriting the Sigma parser for real-time evaluation

Sigma's detection syntax looks simple in basic cases, but production rules use features that require careful parsing. We had to implement support for multiple named selections, filter negation, the |contains|all modifier, YAML list values as OR conditions, and boolean condition expressions.

Complex Sigma condition
detection:
  selection_tool:
    Image|endswith:
      - '/wget'
      - '/curl'
  selection_dest:
    CommandLine|contains|all:
      - 'http'
      - '/tmp/'
  filter_package_manager:
    ParentImage|endswith:
      - '/apt'
      - '/yum'
      - '/dnf'
  condition: selection_tool and selection_dest
             and not filter_package_manager

This rule has two selections that must both match (AND), a filter that excludes package manager activity (NOT), list values that act as alternatives (OR), and the |contains|all modifier that requires all strings to be present in the same field. Our parser compiles this into an evaluation tree at load time, so matching at runtime is a simple tree walk with no string parsing.

Compiled evaluation (internal representation)
And(
  Or(                         // selection_tool
    EndsWith(filename, "/wget"),
    EndsWith(filename, "/curl"),
  ),
  And(                        // selection_dest (contains|all)
    Contains(command, "http"),
    Contains(command, "/tmp/"),
  ),
  Not(                        // filter_package_manager
    Or(
      EndsWith(parent, "/apt"),
      EndsWith(parent, "/yum"),
      EndsWith(parent, "/dnf"),
    ),
  ),
)
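
The same tree can be sketched in Python as nested closures. This is an illustration of the idea, not the sensor's implementation: the helper names and the flat event dict (filename, command, parent) are assumptions made for the example.

```python
def ends_with(field, suffix):
    return lambda event: event.get(field, "").endswith(suffix)


def contains(field, needle):
    return lambda event: needle in event.get(field, "")


def all_of(*nodes):
    return lambda event: all(node(event) for node in nodes)


def any_of(*nodes):
    return lambda event: any(node(event) for node in nodes)


def negate(node):
    return lambda event: not node(event)


# Built once at load time; matching is a plain function call per event.
rule = all_of(
    any_of(ends_with("filename", "/wget"), ends_with("filename", "/curl")),
    all_of(contains("command", "http"), contains("command", "/tmp/")),
    negate(any_of(
        ends_with("parent", "/apt"),
        ends_with("parent", "/yum"),
        ends_with("parent", "/dnf"),
    )),
)

download = {
    "filename": "/usr/bin/curl",
    "command": "curl -o /tmp/p http://203.0.113.5/p",
    "parent": "/bin/bash",
}
from_apt = dict(download, parent="/usr/bin/apt")

print(rule(download), rule(from_apt))
```

The first event satisfies both selections and is not excluded by the filter. The second is identical except for its parent, so the filter rejects it.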

The key modifiers we support: |contains, |endswith, |startswith, |contains|all, |re, and |base64offset. Each modifier compiles into a specific match operation. Plain field values without modifiers default to exact string comparison.
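
As a rough sketch of how a modifier chain might compile into a match operation, the function below handles the string modifiers and |re. It is an assumption-laden illustration: compile_matcher is a hypothetical name, |base64offset is left out for brevity, and comparisons are case-sensitive here even though the Sigma specification treats plain string matches as case-insensitive by default.

```python
import re


def compile_matcher(modifiers, values):
    """Build a predicate over one field's string value.

    modifiers: tuple such as ("contains", "all"); empty means exact match.
    values: a single string or a list of strings taken from the rule.
    """
    values = values if isinstance(values, list) else [values]
    require_all = "all" in modifiers
    ops = [m for m in modifiers if m != "all"]
    op = ops[0] if ops else "equals"

    if op == "re":
        patterns = [re.compile(v) for v in values]
        checks = [lambda s, p=p: p.search(s) is not None for p in patterns]
    elif op == "contains":
        checks = [lambda s, v=v: v in s for v in values]
    elif op == "startswith":
        checks = [lambda s, v=v: s.startswith(v) for v in values]
    elif op == "endswith":
        checks = [lambda s, v=v: s.endswith(v) for v in values]
    elif op == "equals":
        checks = [lambda s, v=v: s == v for v in values]
    else:
        raise ValueError(f"unsupported modifier: {op}")

    # A list of values is OR by default; the "all" modifier turns it into AND.
    combine = all if require_all else any
    return lambda s: combine(check(s) for check in checks)


to_tmp = compile_matcher(("contains", "all"), ["-o", "/tmp/"])
print(to_tmp("curl -o /tmp/x http://example.test"))
```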

What works well

Process creation rules are the sweet spot. The 120 process_creation rules map almost perfectly to execve tracepoint events. The data is all there: binary path, full command line, parent process, user. The rules detect real threats that our hand-written detectors did not cover. We found rules for GTFOBins abuse, living-off-the-land binaries, and obscure persistence mechanisms that we would not have written ourselves.

  • Reverse shell detection: Sigma rules catch patterns like bash -i >& /dev/tcp/ and python -c 'import socket' across many language runtimes.
  • Credential access: rules for reading /etc/shadow, dumping SSH keys, and accessing credential stores via standard Linux tools.
  • Defense evasion: rules that catch timestomping via touch, log clearing via truncate, and history evasion via unset HISTFILE.
  • Persistence: rules for crontab modification, systemd service creation, and .bashrc backdoors. These complement our eBPF file monitoring.

What does not work

Correlation rules are a different story. Some Sigma rules assume you can correlate across multiple events in a time window, the kind of sequence logic that EQL (Event Query Language) expresses: for example, "a login followed by a privilege escalation within 60 seconds." This is a stateful query that requires an event database. Our sensor evaluates rules per event, so these correlation rules need a different approach. We handle this at the kill chain layer instead, using Redis Streams for cross-event correlation.
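
To make the difference from per-event matching concrete, here is a small in-memory sketch of the "login followed by a privilege escalation within 60 seconds" example. It deliberately uses a plain dict rather than Redis Streams, and the event keys (kind, user, ts) are hypothetical.

```python
WINDOW_SECONDS = 60


class LoginThenPrivesc:
    """Flags a privilege escalation that follows a login by the same user."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.last_login = {}  # user -> timestamp of the most recent login

    def observe(self, event):
        """Feed one event; return True when the sequence completes in time."""
        user, ts = event["user"], event["ts"]
        if event["kind"] == "login":
            self.last_login[user] = ts
            return False
        if event["kind"] == "privilege_escalation":
            login_ts = self.last_login.get(user)
            return login_ts is not None and 0 <= ts - login_ts <= self.window
        return False


correlator = LoginThenPrivesc()
correlator.observe({"kind": "login", "user": "alice", "ts": 100})
print(correlator.observe({"kind": "privilege_escalation", "user": "alice", "ts": 130}))
```

The state kept between calls is exactly what a stateless per-event matcher lacks, which is why this logic has to live in a separate layer.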

Rules that depend on Windows-specific fields with no Linux equivalent also fail. A few rules reference IntegrityLevel or LogonId, which have no counterpart in Linux process telemetry. These rules are imported but silently disabled.
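
One way such rules could be identified at import time is to compare the fields a rule references against the alias table. The sketch below assumes the detection section has already been parsed into a dict. SUPPORTED_FIELDS and unsupported_fields are hypothetical names.

```python
# Sigma fields with a known eBPF equivalent (taken from the mapping table).
SUPPORTED_FIELDS = {
    "Image", "CommandLine", "ParentImage", "ParentCommandLine", "User",
    "CurrentDirectory", "TargetFilename", "SourceIp", "DestinationIp",
    "DestinationPort",
}


def unsupported_fields(detection):
    """Return the set of referenced fields that have no alias."""
    missing = set()
    for name, selection in detection.items():
        if name == "condition":
            continue
        # A selection is either one field map or a list of field maps.
        items = selection if isinstance(selection, list) else [selection]
        for item in items:
            if not isinstance(item, dict):
                continue  # bare keyword entries reference no field
            for key in item:
                field = key.split("|")[0]
                if field not in SUPPORTED_FIELDS:
                    missing.add(field)
    return missing


detection = {
    "selection": {"Image|endswith": "/su", "IntegrityLevel": "System"},
    "condition": "selection",
}
print(unsupported_fields(detection))
```

Logging the returned set when a rule is disabled would make the otherwise silent decision auditable.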

The gap: 176 pass, 32 need work

Of the 208 imported rules, 176 parse correctly and are active in production. The remaining 32 fail parsing because they use condition syntax our engine does not yet support. Most of these rely on advanced features: nested parentheses with mixed boolean operators, the "1 of selection_*" wildcard syntax, or the "all of them" quantifier.
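
For readers unfamiliar with the quantifier syntax, the sketch below shows what such an expression means by expanding it into plain boolean logic. The function name is hypothetical, and this is one possible approach rather than the planned parser extension. It also simplifies "them" to mean every selection passed in.

```python
from fnmatch import fnmatchcase


def expand_quantifier(quantifier, pattern, selection_names):
    """Expand '1 of selection_*' or 'all of them' into and/or logic.

    quantifier: "1" (any match is enough) or "all" (every match required).
    pattern: a selection name with optional '*' wildcard, or "them".
    """
    if pattern == "them":
        names = list(selection_names)
    else:
        names = [n for n in selection_names if fnmatchcase(n, pattern)]
    if not names:
        raise ValueError(f"no selection matches {pattern!r}")
    if quantifier not in ("1", "all"):
        raise ValueError(f"unsupported quantifier: {quantifier!r}")
    joiner = " or " if quantifier == "1" else " and "
    return "(" + joiner.join(names) + ")"


names = ["selection_tool", "selection_dest", "filter_package_manager"]
print(expand_quantifier("1", "selection_*", names))
```

Expanding at load time means the existing evaluation tree needs no new node types, only a preprocessing step before the condition is compiled.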

Parse results
Imported: 208
Active: 176
Pending: 32

The 32 pending rules are tracked in our parser backlog. Each one needs a specific condition syntax extension. We are adding support incrementally, prioritized by the severity of the detection the rule provides.

False positive reality

Community rules are written to be broadly applicable. That means some of them are noisy in specific environments. When you run 176 Sigma rules against live eBPF telemetry, you learn fast which ones fire too often.

The worst offender was "Inline Python Execution," which triggers on any command containing python -c. On a server running Ansible, this fires dozens of times per hour. "Shell Spawned by Web Server" fires constantly if your web application legitimately calls shell scripts for image processing or PDF generation. "Suspicious Curl Usage" matches every curl command that downloads to a non-standard path.

These are not bad rules. They detect real attack patterns. But the gap between "this pattern is suspicious in general" and "this pattern is suspicious on this specific server" is where most false positives live.

Dynamic suppression with allowlist.toml

Rebuilding the sensor every time you need to suppress a noisy rule is not practical. Inner Warden uses an allowlist.toml file that the sensor watches for changes. You can suppress rules by ID, by field values, or by combinations of both. Changes take effect immediately: no rebuild, no restart.

allowlist.toml
# Suppress a rule entirely
[[suppress]]
rule_id = "sigma.inline_python_execution"
reason  = "Ansible runs python -c constantly"

# Suppress a rule only for specific processes
[[suppress]]
rule_id = "sigma.shell_spawned_by_web_server"
match   = { parent = "/usr/bin/convert" }
reason  = "ImageMagick calls /bin/sh for delegates"

# Suppress by command pattern
[[suppress]]
rule_id = "sigma.suspicious_curl_download"
match   = { command = "*apt.postgresql.org*" }
reason  = "PostgreSQL repo setup script"

The allowlist is checked after a rule matches but before an incident is emitted. Suppressed matches are counted in telemetry so you can review them later, but they do not generate incidents or trigger responses. This keeps the detection engine aggressive while giving operators control over what surfaces.
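
A minimal sketch of that post-match check, with the three entries above written as Python dicts instead of parsed TOML. The matching semantics are an assumption: "*" is treated as a shell-style wildcard and everything else as a literal. The function and counter names are illustrative.

```python
from collections import Counter
from fnmatch import fnmatchcase

SUPPRESSIONS = [
    {"rule_id": "sigma.inline_python_execution"},
    {"rule_id": "sigma.shell_spawned_by_web_server",
     "match": {"parent": "/usr/bin/convert"}},
    {"rule_id": "sigma.suspicious_curl_download",
     "match": {"command": "*apt.postgresql.org*"}},
]

suppressed_counts = Counter()  # kept so suppressed matches stay reviewable


def is_suppressed(rule_id, details, suppressions=SUPPRESSIONS):
    """Return True if a matched rule should not become an incident."""
    for entry in suppressions:
        if entry["rule_id"] != rule_id:
            continue
        conditions = entry.get("match", {})
        # An entry without "match" suppresses the rule entirely.
        if all(fnmatchcase(str(details.get(field, "")), pattern)
               for field, pattern in conditions.items()):
            suppressed_counts[rule_id] += 1
            return True
    return False


print(is_suppressed("sigma.shell_spawned_by_web_server",
                    {"parent": "/usr/bin/convert"}))
```

Running the check only after a rule has matched keeps the common path, events that match nothing, free of any allowlist cost.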

How Sigma rules fit the sensor pipeline

Sigma rules run in the same detector pipeline as Inner Warden's built-in detectors. They receive the same normalized events from the same collectors. The sensor does not distinguish between a hand-written detector and a Sigma rule at evaluation time.

Detection pipeline
eBPF tracepoint / kprobe / LSM
  → ring buffer → eBPF collector
    → normalized Event
      ├── built-in detectors (49 hand-written)
      └── Sigma rules (176 active)
        → allowlist.toml check
          → incident (if not suppressed)
            → agent (AI triage + response)

Sigma rules add breadth. Built-in detectors add depth. The hand-written detectors maintain state across events, track sessions, count failures, and implement rate-based logic. Sigma rules are stateless per-event matchers. Together they cover both the common patterns the community has catalogued and the complex multi-step attacks that require stateful correlation.

Lessons learned

  • Field aliasing is the easy part. A mapping table solves 90% of the translation problem. The hard part is the condition parser, especially when rules use nested boolean logic with wildcards.
  • Community rules need local tuning. Every server has a different baseline. A rule that is perfect for one environment is noisy in another. The allowlist is not optional. It is a core part of the system.
  • Real-time evaluation is stricter than log queries. When you query logs, you can afford regex, aggregation, and multi-event correlation. When you evaluate 176 rules per execve event in real time, every microsecond matters. We compile rules into evaluation trees at load time and avoid any string parsing at match time.
  • 208 rules for free is a good deal. Even after suppressing the noisy ones and waiting on the 32 that do not parse yet, we gained over 100 detection patterns that we would not have built ourselves. The community already did the research. We just built the bridge.

What comes next

We are working on three fronts. First, finishing the condition parser to support the remaining 32 rules. Second, adding automatic tuning that adjusts suppression thresholds based on per-server baselines. Third, contributing Linux-native Sigma rules back to SigmaHQ based on attack patterns we see in production that the community has not documented yet.

The long-term goal is bidirectional. Import community detection logic into kernel telemetry. Export kernel-level insights back to the community. Security is a shared problem, and Sigma is the right format for sharing solutions.
