Why We Switched to jemalloc (and How glibc malloc Was Eating 1GB RAM)

Inner Warden's agent dashboard was using 1.3GB of RSS on a server with 2GB total RAM. The application was not leaking memory. Valgrind showed no leaks. The heap was clean. But the process kept growing, and it never gave memory back to the OS. The culprit was glibc's malloc, and the fix was three lines of Rust.

This is the story of how we diagnosed the problem, why glibc malloc fragments under specific workloads, and why jemalloc solved it completely.

The problem: RSS grows forever

We exposed the agent dashboard to the internet so operators could check their server status remotely. Within hours, bots found it. They were not attacking. They were scanning: sending HTTP requests to random paths, probing for WordPress, phpMyAdmin, and .env files. Each request allocated memory for parsing, routing, and response generation. After the request completed, Rust dropped all the allocations.

But the RSS kept growing. After 24 hours of bot traffic, the process was at 1.3GB. After restarting, it was at 40MB. Then it started growing again.

RSS at startup        38 MB
RSS after 1 hour      210 MB
RSS after 6 hours     680 MB
RSS after 24 hours    1,310 MB
Actual heap in use    ~45 MB

The gap between heap in use (45MB) and RSS (1,310MB) is entirely fragmentation. glibc malloc allocated memory in small chunks, freed them, but could not return the pages to the OS because of how it manages arenas and thread caches.
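
You can watch this gap from inside the process without external tools. A minimal Linux-only sketch (assuming the default 4 KiB page size) that reads resident memory from /proc/self/statm:

```rust
// Read the process's resident set size from /proc/self/statm.
// Field 2 is the number of resident pages; we assume 4 KiB pages,
// the default on x86-64 Linux.
use std::fs;

fn rss_bytes() -> Option<u64> {
    let statm = fs::read_to_string("/proc/self/statm").ok()?;
    let resident_pages: u64 = statm.split_whitespace().nth(1)?.parse().ok()?;
    Some(resident_pages * 4096)
}

fn main() {
    match rss_bytes() {
        Some(rss) => println!("RSS: {:.1} MB", rss as f64 / (1024.0 * 1024.0)),
        None => println!("/proc/self/statm unavailable (not Linux?)"),
    }
}
```

Logging this number next to the allocator's own in-use figure makes the fragmentation visible in your own metrics, not just in ps output.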

Why glibc malloc does this

glibc's malloc (ptmalloc2) manages memory through a system of arenas and bins. When a thread allocates memory, it is assigned an arena. When it frees memory, the freed block goes into a bin for reuse. The problem is how memory is returned: the main heap grows via brk() (with large allocations going through mmap()), and brk()-backed pages can only be given back to the OS when the region at the top of the heap is free. If even one small live allocation sits at the top, all the freed memory below it is trapped.

This is exactly what happens with async Rust web servers. Tokio spawns tasks across threads. Each task makes small, short-lived allocations for request parsing, JSON serialization, and response building. The allocations are freed quickly, but they fragment across arenas. glibc sees the freed memory but cannot release it because the arena structure prevents consolidation.
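
The pattern looks roughly like this (an illustrative single-threaded sketch, not Inner Warden's code; real fragmentation also depends on thread count and arena assignment, so a toy like this will not necessarily reproduce the RSS growth):

```rust
// Sketch of the allocation pattern that fragments ptmalloc2:
// many short-lived buffers of mixed sizes, with an occasional
// long-lived allocation left behind, pinning freed memory below it.
fn simulate_requests(n: usize) -> Vec<Vec<u8>> {
    let mut pinned = Vec::new();
    for i in 0..n {
        // Short-lived "request" allocations, dropped each iteration.
        let header = vec![0u8; 512];
        let body = vec![0u8; 16 * 1024];
        let json = format!("{{\"path\": \"/probe-{}\"}}", i);
        // Every 1000th request leaves a small long-lived allocation
        // behind (e.g. a log entry) near the top of the heap.
        if i % 1000 == 0 {
            pinned.push(json.into_bytes());
        }
        drop((header, body));
    }
    pinned
}

fn main() {
    let survivors = simulate_requests(10_000);
    println!("{} long-lived allocations remain", survivors.len());
}
```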

This is not a bug. It is a design tradeoff. glibc prioritizes allocation speed for general-purpose workloads. But for long-running services with many short-lived allocations, it creates unbounded RSS growth.

The 3-line fix: jemalloc

jemalloc handles this differently. It uses madvise(MADV_DONTNEED) to tell the kernel that freed pages can be reclaimed. The pages stay in the virtual address space but the physical memory is returned. When the application touches that memory again, the kernel transparently faults in fresh zero-filled pages.
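
The mechanism can be seen in isolation with a raw-syscall sketch (Linux-only; the constants are the x86-64 Linux values): map an anonymous page, dirty it, then release the physical page while keeping the mapping valid.

```rust
// Demonstrates madvise(MADV_DONTNEED) on one anonymous page: the
// physical page is released, but the virtual mapping stays valid
// and faults back in as a zero page on the next access.
use std::ffi::c_void;
use std::ptr;

extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32,
            fd: i32, offset: i64) -> *mut c_void;
    fn madvise(addr: *mut c_void, len: usize, advice: i32) -> i32;
    fn munmap(addr: *mut c_void, len: usize) -> i32;
}

// Linux x86-64 constants.
const PROT_READ: i32 = 0x1;
const PROT_WRITE: i32 = 0x2;
const MAP_PRIVATE: i32 = 0x02;
const MAP_ANONYMOUS: i32 = 0x20;
const MADV_DONTNEED: i32 = 4;

fn dontneed_roundtrip() -> bool {
    const LEN: usize = 4096;
    unsafe {
        let p = mmap(ptr::null_mut(), LEN, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if p as isize == -1 { return false; } // MAP_FAILED
        *(p as *mut u8) = 42;                 // dirty the page
        let ok = madvise(p, LEN, MADV_DONTNEED) == 0
            && *(p as *const u8) == 0;        // refaults as a zero page
        munmap(p, LEN);
        ok
    }
}

fn main() {
    assert!(dontneed_roundtrip());
    println!("page released and refaulted as zeros");
}
```

This is exactly the trade jemalloc makes: the virtual address space stays large, but the RSS tracks what the application actually uses.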

Cargo.toml:

[dependencies]
tikv-jemallocator = "0.6"

main.rs:

#[cfg(target_os = "linux")]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

That is the entire change. Three lines in main.rs, one dependency in Cargo.toml. The results were immediate:

Before and after

Metric                   glibc malloc    jemalloc
RSS at startup           38 MB           42 MB
RSS after 1 hour         210 MB          68 MB
RSS after 24 hours       1,310 MB        120 MB
RSS after 7 days         OOM killed      115 MB
Memory returned to OS    Never           Continuously

With jemalloc, RSS stabilizes around 115-120MB regardless of traffic volume. The bot traffic still comes. The allocations still happen. But jemalloc returns freed pages to the OS continuously, so the RSS stays bounded.

Other memory optimizations we made

jemalloc fixed the allocator fragmentation, but we also optimized the application layer to reduce peak allocations:

  • JSONL file cache - the agent reads events and incidents from JSONL files. Instead of loading entire files into memory, we read only the tail using byte-offset cursors. The cursor tracks where we left off so each read processes only new data.
  • Narrative accumulator - the narrative summary used to buffer all incidents in memory. We switched to a streaming accumulator that processes incidents in batches and emits the summary incrementally.
  • Rate limiting on dashboard - bots sending hundreds of requests per second could overwhelm the dashboard. We added rate limiting to cap concurrent request processing, which bounds peak memory usage.
  • Data retention - JSONL files are automatically rotated and old data is cleaned up. Without this, the files grow indefinitely and reading them becomes increasingly expensive.
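
The byte-offset cursor from the first item can be sketched with the standard library alone (JsonlCursor and its methods are illustrative names, not Inner Warden's actual API):

```rust
// A byte-offset cursor for tailing a JSONL file: each call reads
// only the lines appended since the previous call.
use std::fs::File;
use std::io::{BufRead, BufReader, Seek, SeekFrom};

struct JsonlCursor {
    path: String,
    offset: u64, // byte position of the first unread line
}

impl JsonlCursor {
    fn new(path: &str) -> Self {
        Self { path: path.to_string(), offset: 0 }
    }

    /// Return lines appended since the last read and advance the cursor.
    fn read_new_lines(&mut self) -> std::io::Result<Vec<String>> {
        let mut file = File::open(&self.path)?;
        file.seek(SeekFrom::Start(self.offset))?;
        let mut reader = BufReader::new(file);
        let mut lines = Vec::new();
        let mut line = String::new();
        loop {
            let n = reader.read_line(&mut line)?;
            if n == 0 { break; }
            self.offset += n as u64; // includes the trailing newline
            lines.push(line.trim_end().to_string());
            line.clear();
        }
        Ok(lines)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("events.jsonl");
    std::fs::write(&path, "{\"event\":\"start\"}\n")?;
    let mut cursor = JsonlCursor::new(path.to_str().unwrap());
    println!("new lines: {:?}", cursor.read_new_lines()?);
    Ok(())
}
```

Peak memory is then bounded by the size of new data per poll, not by the total file size.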

When should you switch to jemalloc?

If your Rust application runs on Linux and you observe any of these symptoms, jemalloc is likely the fix:

  • RSS grows over time but Valgrind shows no leaks
  • RSS is much larger than actual heap usage (check with jemalloc_ctl or /proc/self/smaps)
  • The application uses async Rust (Tokio, async-std) with many concurrent tasks
  • The application handles many short-lived requests (HTTP server, message queue consumer)
  • Restarting the process temporarily fixes the memory usage

On macOS, the system allocator already handles this well (it uses a different strategy than glibc). The #[cfg(target_os = "linux")] guard ensures jemalloc only activates on Linux where it is needed.

Try it yourself

Inner Warden ships with jemalloc enabled. Install it and check the memory usage under load:

Install
curl -fsSL https://innerwarden.com/install | sudo bash
Check memory usage
# Check RSS of the agent process (the [i] bracket trick stops
# grep from matching its own process in the ps output)
ps aux | grep '[i]nnerwarden-agent' | awk '{print $6/1024 " MB"}'

What to do next