Why Aya
We had two real options when we started: libbpf-rs with C programs, or Aya with the program itself written in Rust. We picked Aya for one reason that mattered more than every other one combined: a single language across user space and kernel space. Reviewers do not have to context switch. The same clippy lints apply. The same error type bubbles all the way from aya::Bpf::load to the program-side return code.
The cost is real. Aya is younger than libbpf, the ecosystem has fewer copy-paste examples, and the Rust eBPF target has its own sharp edges. After a year of running 40 hooks across tracepoints, kprobes, LSM, and XDP, the trade was the right one for us.
The no_std boundary is the most important thing in the repo
The kernel-side crate is #![no_std]. No allocator. No std types. No panics that allocate. The user-space loader is normal Rust. They share a third crate for the wire types, and that third crate is also #![no_std].
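The three-crate split can be sketched as a Cargo workspace. The crate names here are hypothetical, not the actual repo layout:

```toml
# Hypothetical layout; real crate names differ. One workspace, three crates:
#   warden-ebpf   — kernel side, #![no_std], built for the bpf target
#   warden-common — shared wire types, #![no_std]
#   warden        — user-space loader, normal std Rust
[workspace]
members = ["warden", "warden-common", "warden-ebpf"]
resolver = "2"

# warden-common/Cargo.toml keeps zero dependencies — plain #[repr(C)]
# structs only — so nothing can accidentally drag std into the shared crate.
```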
Once you accept that the shared types crate has to live in the no_std world, a lot of decisions get easy. No String, no Vec, fixed-size byte arrays for paths and comms, plain old POD structs with #[repr(C)]. The user side does any allocation needed when it lifts events off the ring buffer.
```rust
#[repr(C)]
#[derive(Copy, Clone)]
pub struct ExecEvent {
    pub pid: u32,
    pub uid: u32,
    pub comm: [u8; 16],
    pub argv: [u8; 256],
    pub argv_len: u16,
    pub flags: u16,
}
```

CO-RE and BTF: do not skip this
Compile-once, run-everywhere only works if you actually use it. That means kernel struct field reads go through aya_ebpf::helpers::bpf_probe_read_kernel with BTF-aware field offsets, not hand-coded offsets you read out of one specific kernel header. We learned this the hard way the first time we tried to run on a 6.x kernel after building against 5.15.
The fix is not glamorous: ship vmlinux BTF awareness in the loader, generate kernel struct bindings against /sys/kernel/btf/vmlinux when present, fall back to a bundled BTF for kernels that do not expose it, and refuse to load if neither is available with a clear error message.
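The selection order is simple enough to capture in one pure function. This is a sketch — `BtfSource` and the bundled-blob flag are illustrative names, and in the real loader the chosen BTF is what gets handed to aya at load time:

```rust
use std::path::Path;

/// Illustrative enum: where the loader will take BTF from.
#[derive(Debug, PartialEq)]
pub enum BtfSource {
    KernelSysFs, // kernel exposes /sys/kernel/btf/vmlinux
    Bundled,     // fall back to BTF shipped inside our binary
}

/// Decide the BTF source, refusing with a clear error when neither exists.
pub fn select_btf(sysfs: &Path, have_bundled: bool) -> Result<BtfSource, String> {
    if sysfs.exists() {
        Ok(BtfSource::KernelSysFs)
    } else if have_bundled {
        Ok(BtfSource::Bundled)
    } else {
        Err("no BTF available: kernel does not expose \
             /sys/kernel/btf/vmlinux and no bundled BTF was shipped"
            .into())
    }
}

fn main() {
    // On most modern kernels this resolves to KernelSysFs.
    match select_btf(Path::new("/sys/kernel/btf/vmlinux"), true) {
        Ok(src) => println!("loading with BTF from {:?}", src),
        Err(e) => eprintln!("refusing to load: {e}"),
    }
}
```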
Ring buffer with epoll, not perf event arrays
Use BPF_MAP_TYPE_RINGBUF wherever your minimum kernel allows it (5.8+). It is a single MPSC ring across all CPUs, you do not have to drain a per-CPU array, and Aya gives you a polled file descriptor you can plug into tokio with one helper.
```rust
let mut events: RingBuf<_> = bpf
    .take_map("EVENTS")
    .ok_or(LoadError::MapMissing)?
    .try_into()?;
let async_fd = AsyncFd::new(events.as_raw_fd())?;
loop {
    let mut guard = async_fd.readable_mut().await?;
    while let Some(item) = events.next() {
        let ev: ExecEvent = unsafe { std::ptr::read_unaligned(item.as_ptr() as *const _) };
        tx.send(ev).await.ok();
    }
    guard.clear_ready();
}
```

Two things to remember. First, the read is unaligned because the ring buffer does not promise alignment for arbitrary record sizes. Second, you must keep the kernel-side reservation and commit balanced. Reserve, fill, commit. If you panic between reserve and commit, the slot is leaked and the ring wedges.
Verifier failures we hit (and how we got past them)
The verifier is honest. Every failure we hit was a real bug that would have been a kernel oops if it had loaded. The recurring ones:
Unbounded loops. The verifier rejects loops it cannot prove terminate. We bound every loop with a #[unroll] attribute or a const-bounded counter. For string scanning we cap at 256 bytes and accept a small false-negative rate on gigantic argv.
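The bounded-scan pattern looks like this in plain Rust — the kernel-side version has the same shape, and the constant cap is what the verifier needs to see to prove termination:

```rust
/// Hard cap: the scan stops here even when argv is longer, which is
/// where the small false-negative rate on gigantic argv comes from.
const SCAN_CAP: usize = 256;

/// Find the first NUL within the cap; return the capped length if none.
pub fn bounded_strlen(buf: &[u8]) -> usize {
    let limit = buf.len().min(SCAN_CAP);
    // Constant-bounded loop: provably terminating, so the verifier accepts it.
    for i in 0..limit {
        if buf[i] == 0 {
            return i;
        }
    }
    limit
}

fn main() {
    assert_eq!(bounded_strlen(b"ls\0-la"), 2);
    println!("scan capped at {SCAN_CAP} bytes");
}
```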
Pointer arithmetic the verifier cannot follow. If you derive a pointer through a helper, you have to bounds-check the result before you read through it, even if you know it is safe. The verifier is not going to take your word for it.
Stack frames over 512 bytes. Push large state into a per-CPU array map. Treat that map as your scratchpad.
When the verifier rejects a program, dump the log with aya_log::EbpfLogger and bpftool prog load with --debug. The log is verbose, the offending instruction is named, the fix is usually one line.
Mixing program types
Forty hooks across tracepoints (process exec, signals, network connect), kprobes (filesystem and credential operations), LSM (file open and bprm_check), and XDP (early packet inspection on a couple of perimeter interfaces). Each type has its own constraints and its own sweet spot.
Tracepoints are the safest first stop. Stable ABI, fixed context, no ABI drift between kernels. Kprobes are powerful and fragile, since the function names you attach to are not a stable interface. LSM hooks are the right choice when you want a yes/no decision in the kernel path; we use them for a tiny number of guard rails, not for general telemetry. XDP is spectacular for rate-limiting at the NIC and a poor fit for anything that requires correlation with process state.
Debugging with bpftool
Once you accept that bpftool is the kernel-side equivalent of ps, debugging gets a lot calmer. bpftool prog list shows what is loaded. bpftool map dump id N dumps a map without you having to write a reader. bpftool prog tracelog tails the trace pipe. Combined with aya_log, that covers about ninety percent of the debugging surface.
CPU budget is real
Every hook runs in the hot path of whatever it attached to. An exec tracepoint handler that takes a millisecond costs five percent of a core on a workload doing fifty execs per second, and fork-heavy hosts hit that rate easily. Profile every hook on a busy host. We aim for sub-microsecond p99 in the kernel side. Anything that wants to do more work goes into the user-space consumer.
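The budget arithmetic is just per-event cost times event rate; the fifty-execs-per-second rate below is an assumed figure for illustration:

```rust
/// Fraction of one core a hook consumes: per-event seconds times events/sec.
fn core_fraction(hook_seconds: f64, events_per_second: f64) -> f64 {
    hook_seconds * events_per_second
}

fn main() {
    // 1 ms per exec at an assumed 50 execs/s eats 5% of a core.
    let f = core_fraction(0.001, 50.0);
    println!("{:.0}% of a core", f * 100.0);
}
```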
The rule of thumb that has held up: the kernel side decides "is this event worth shipping" with the cheapest possible filter. Everything else lives in user space.
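A sketch of what "cheapest possible filter" means in practice. The field layout follows the ExecEvent struct above; the specific policy here (drop pid 0, drop our own agent's events) is illustrative, not the actual rule set:

```rust
/// Cheapest-possible ship/drop decision: one integer compare and one
/// fixed-size prefix check. No parsing, no allocation, no loop over argv.
pub fn worth_shipping(pid: u32, comm: &[u8; 16]) -> bool {
    if pid == 0 {
        return false; // kernel context, nothing to correlate: drop
    }
    if comm.starts_with(b"warden") {
        return false; // our own agent would feed back on itself: drop
    }
    true // everything else ships; enrichment happens in user space
}

fn main() {
    let mut bash = [0u8; 16];
    bash[..4].copy_from_slice(b"bash");
    println!("ship bash exec: {}", worth_shipping(42, &bash));
}
```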
Read more: eBPF for kernel security · Inner Warden architecture tour