Why We Switched to jemalloc (and How glibc malloc Was Eating 1GB RAM)
Inner Warden's agent dashboard was using 1.3GB of RSS on a server with 2GB total RAM. The application was not leaking memory. Valgrind showed no leaks. The heap was clean. But the process kept growing, and it never gave memory back to the OS. The culprit was glibc's malloc, and the fix was three lines of Rust.
This is the story of how we diagnosed the problem, why glibc malloc fragments under specific workloads, and why jemalloc solved it completely.
The problem: RSS grows forever
We exposed the agent dashboard to the internet so operators could check their server status remotely. Within hours, bots found it. They were not attacking. They were scanning: sending HTTP requests to random paths, probing for WordPress, phpMyAdmin, and .env files. Each request allocated memory for parsing, routing, and response generation. After the request completed, Rust dropped all the allocations.
But the RSS kept growing. After 24 hours of bot traffic, the process was at 1.3GB. After restarting, it was at 40MB. Then it started growing again.
Heap profiling showed only about 45MB of live allocations, yet RSS sat at 1,310MB. The gap is entirely fragmentation: glibc malloc allocated memory in small chunks, freed them, but could not return the pages to the OS because of how it manages arenas and thread caches.
Why glibc malloc does this
glibc's malloc (ptmalloc2) uses a system of arenas and bins to manage memory. When a thread allocates memory, it is assigned an arena. When it frees memory, the freed block goes into a bin for reuse. The problem is that glibc backs memory with brk() (and mmap() only for large allocations), and it can return brk-backed pages to the OS only when the free space sits at the top of the heap. If even one small live allocation sits at the top, all the freed memory below it is trapped.
This is exactly what happens with async Rust web servers. Tokio spawns tasks across threads. Each task makes small, short-lived allocations for request parsing, JSON serialization, and response building. The allocations are freed quickly, but they fragment across arenas. glibc sees the freed memory but cannot release it because the arena structure prevents consolidation.
This is not a bug. It is a design tradeoff. glibc prioritizes allocation speed for general-purpose workloads. But for long-running services with many short-lived allocations, it creates unbounded RSS growth.
The 3-line fix: jemalloc
jemalloc handles this differently. It uses madvise(MADV_DONTNEED) to tell the kernel that freed pages can be reclaimed. The pages stay in the virtual address space, but the physical memory is returned. When the application touches that memory again, the kernel transparently supplies fresh zero-filled pages.
```toml
[dependencies]
tikv-jemallocator = "0.6"
```

```rust
#[cfg(target_os = "linux")]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
```

That is the entire change: three lines in main.rs, one dependency in Cargo.toml. The results were immediate:
Before and after
| Metric | glibc malloc | jemalloc |
|---|---|---|
| RSS at startup | 38 MB | 42 MB |
| RSS after 1 hour | 210 MB | 68 MB |
| RSS after 24 hours | 1,310 MB | 120 MB |
| RSS after 7 days | OOM killed | 115 MB |
| Memory returned to OS | Never | Continuously |
With jemalloc, RSS stabilizes around 115-120MB regardless of traffic volume. The bot traffic still comes. The allocations still happen. But jemalloc returns freed pages to the OS continuously, so the RSS stays bounded.
Other memory optimizations we made
jemalloc fixed the allocator fragmentation, but we also optimized the application layer to reduce peak allocations:
- JSONL file cache - the agent reads events and incidents from JSONL files. Instead of loading entire files into memory, we read only the tail using byte-offset cursors. The cursor tracks where we left off so each read processes only new data.
- Narrative accumulator - the narrative summary used to buffer all incidents in memory. We switched to a streaming accumulator that processes incidents in batches and emits the summary incrementally.
- Rate limiting on dashboard - bots sending hundreds of requests per second could overwhelm the dashboard. We added rate limiting to cap concurrent request processing, which bounds peak memory usage.
- Data retention - JSONL files are automatically rotated and old data is cleaned up. Without this, the files grow indefinitely and reading them becomes increasingly expensive.
When should you switch to jemalloc?
If your Rust application runs on Linux and you observe any of these symptoms, jemalloc is likely the fix:
- RSS grows over time but Valgrind shows no leaks
- RSS is much larger than actual heap usage (check with `jemalloc_ctl` or `/proc/self/smaps`)
- The application uses async Rust (Tokio, async-std) with many concurrent tasks
- The application handles many short-lived requests (HTTP server, message queue consumer)
- Restarting the process temporarily fixes the memory usage
On macOS, the system allocator already handles this well (it uses a different strategy than glibc). The `#[cfg(target_os = "linux")]` guard ensures jemalloc only activates on Linux, where it is needed.
Try it yourself
Inner Warden ships with jemalloc enabled. Install it and check the memory usage under load:
```shell
curl -fsSL https://innerwarden.com/install | sudo bash

# Check RSS of the agent process
ps aux | grep innerwarden-agent | awk '{print $6/1024 " MB"}'
```

What to do next
- Grafana + Prometheus monitoring - track Inner Warden's memory usage and other metrics in Grafana.
- AI isolation architecture - how the agent processes thousands of events without giving AI execution access.
- View on GitHub - see the jemalloc configuration and all memory optimizations in the source.