This is the Tebako fork of jemalloc, a
general-purpose malloc(3) implementation that emphasizes fragmentation
avoidance and scalable concurrency support.
Tebako uses jemalloc as its default memory allocator due to its strong performance characteristics and rich feature set.
jemalloc first came into use as the FreeBSD libc allocator in 2005, and since then it has found its way into numerous applications that rely on its predictable behavior. In 2010 jemalloc development efforts broadened to include developer support features such as heap profiling and extensive monitoring/tuning hooks. Modern jemalloc releases continue to be integrated back into FreeBSD, and therefore versatility remains critical.
This fork is based on the Facebook fork of the original jemalloc project. The original jemalloc project by Jason Evans has been discontinued by its original author and is no longer actively maintained.
While the original project is frozen, this fork continues to:
-
Fix security issues and code quality problems
-
Optimize performance for modern hardware
-
Maintain compatibility across evolving platforms
-
Address bugs reported by the community
CMake (Recommended):
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build
Autotools (Traditional):
./configure
make
make install
For detailed build instructions, configuration options, platform-specific builds, and advanced usage, see INSTALL.adoc.
Build and install using CMake:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build --prefix /usr/local
After installation, use in your CMake projects:
cmake_minimum_required(VERSION 3.24)
project(myapp)
find_package(jemalloc CONFIG REQUIRED)
add_executable(myapp main.c)
target_link_libraries(myapp PRIVATE jemalloc::jemalloc)
For the complete CMake integration guide, including add_subdirectory() and
FetchContent methods, configuration options, platform-specific notes, and
troubleshooting, see CMAKE_INTEGRATION.adoc.
The Tebako fork of jemalloc fully supports musl libc, a lightweight and fast alternative to glibc.
Two build approaches are supported:
-
Native musl builds using Alpine Linux
-
Cross-compilation using musl-tools on Ubuntu/Debian
The recommended way to build and test with musl is using Alpine Linux:
# Install build dependencies
apk add cmake make gcc musl-dev linux-headers ninja
# Build and test with CMake
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build --prefix /usr/local
# Run tests
cd build
ctest --output-on-failure
Why CMake on Alpine: The native CMake build system works on Alpine Linux without requiring autotools.
# Install musl tools
sudo apt-get install musl-tools
# Build with CMake
CC=musl-gcc CXX=musl-g++ cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
-
MADV_DONTNEED zeros memory pages on musl (safer than glibc’s behavior)
-
No glibc-specific malloc hooks (__malloc_hook, __free_hook, etc.)
-
All threading and math functions integrated into musl libc (no separate libraries)
-
Smaller binary footprint and faster performance
-
Ideal for containers, embedded systems, and static linking
Known limitation: C++ memory exhaustion tests (like
cpp/infallible_new_false) may hang on Alpine/musl in containerized
environments. This is due to differences in OOM behavior between musl and glibc.
These tests are automatically skipped on musl builds.
Heap profiling works correctly on musl libc with Alpine Linux, in both static and dynamic linking scenarios. The historical deadlock reported in GitHub issue #585 (2017) has been resolved in the jemalloc 5.x series.
For comprehensive profiling documentation including Alpine-specific setup, see Profiling Guide § Alpine Linux.
Build with profiling:
cmake -B build -DJEMALLOC_ENABLE_PROF=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build
Use profiling:
# Enable profiling at runtime
MALLOC_CONF="prof:true,lg_prof_sample:19" ./your_app
# Generate heap dump at program exit
MALLOC_CONF="prof:true,prof_final:true" ./your_app
Static linking with profiling:
gcc -static your_app.c -ljemalloc -lpthread -no-pie -o your_app
Note: The -no-pie flag is required for static builds on Alpine.
Link your application against libjemalloc:
// No code changes required - use standard malloc/free
void *ptr = malloc(1024);
free(ptr);
Configure behavior via environment variables:
# Disable thread caching
MALLOC_CONF="tcache:false"
# Enable per-CPU arenas
MALLOC_CONF="percpu_arena:percpu"
# Multiple options
MALLOC_CONF="tcache:false,dirty_decay_ms:10000"
When migrating from glibc to jemalloc:
-
Audit code for alignment assumptions - Search for SIMD types, alignas(), SSE/AVX intrinsics
-
Use explicit alignment - Replace plain malloc() with aligned_alloc() or mallocx() where needed
-
Test thoroughly - Run with UndefinedBehaviorSanitizer (-fsanitize=alignment) to catch misaligned accesses
-
Review language bindings - Verify FFI boundaries handle alignment correctly (Rust, Go, etc.)
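One way to carry out the audit step above is to place a small debug check wherever code relies on a particular alignment. The following is a minimal sketch under that assumption; `is_aligned` is a hypothetical helper name, not a jemalloc API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of jemalloc): returns true when ptr is a
 * multiple of alignment. alignment must be a nonzero power of two. */
static bool is_aligned(const void *ptr, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (uintptr_t)(alignment - 1)) == 0;
}
```

A call such as `assert(is_aligned(p, 16))` placed just before an aligned SIMD store turns a silent assumption into an immediate failure that is easy to debug.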
jemalloc’s alignment guarantees differ from glibc, which can cause crashes in applications expecting glibc-specific behavior. This section explains the differences and provides solutions.
Understanding alignment behavior is essential when porting from glibc to jemalloc, especially for:
-
Code migrating from glibc-based systems
-
Applications using SIMD types or overaligned structures
-
Applications using SSE/AVX instructions expecting specific alignment
-
C++ code with alignas() specifiers
jemalloc provides alignment based on the size class of the allocation:
| Allocation size | Minimum alignment | Notes |
|---|---|---|
| 1-8 bytes | 8 bytes | Sufficient for |
| 9-16 bytes | 16 bytes | Sufficient for most SIMD types |
| 17-32 bytes | 16 or 32 bytes | Platform-dependent |
| 33-64 bytes | 32 or 64 bytes | Platform-dependent |
| 65+ bytes | Based on size class | Minimum 16 bytes on most platforms |
Warning: jemalloc differs from glibc’s behavior in critical ways.
The differences are:
-
glibc behavior: Always returns ≥16-byte alignment on x86_64 for ALL allocations
-
jemalloc behavior: Returns alignment based on allocation size
What this means:
-
Code expecting 16-byte alignment for 8-byte allocations will fail with jemalloc.
// Code that assumes every malloc() result is 16-byte aligned
uint64_t* p = malloc(sizeof(uint64_t)); // Allocates 8 bytes
// With glibc: p has 16-byte alignment ✓
// With jemalloc: p has 8-byte alignment ✗
uintptr_t tagged = (uintptr_t)p | 0x8; // Uses bit 3 as a flag, assuming it is always zero
uint64_t* q = (uint64_t*)(tagged & ~(uintptr_t)0xF); // May not equal p with jemalloc!
For guaranteed alignment, use alignment-specific allocation functions:
void* p;
int result = posix_memalign(&p, 16, size);
if (result != 0) {
// Handle allocation failure
}
// p is guaranteed to be 16-byte aligned

// Size must be a multiple of alignment
size_t aligned_size = (size + 15) & ~(size_t)15; // Round up to multiple of 16
void* p = aligned_alloc(16, aligned_size);
if (p == NULL) {
// Handle allocation failure
}
// p is guaranteed to be 16-byte aligned

#include <jemalloc/jemalloc.h>
// Most flexible - size doesn't need to be multiple of alignment
void* p = mallocx(size, MALLOCX_ALIGN(16));
if (p == NULL) {
// Handle allocation failure
}
// p is guaranteed to be 16-byte aligned
C++17 over-aligned operator new:
// C++17 automatically calls aligned operator new
struct alignas(32) Data {
int64_t values[2]; // 16 bytes, but needs 32-byte alignment
};
// Uses ::operator new(size_t, std::align_val_t)
// jemalloc's C++ integration handles this via mallocx()
auto* p = new Data();
// Correctly aligned at 32 bytes
delete p;
Note: For C++ users, when compiling with jemalloc's C++ support (enabled by default), aligned new operators are handled correctly.
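For the C functions above, the round-up step and the resulting alignment guarantee can be wrapped in one place. This is a minimal sketch assuming a C11 toolchain that provides `aligned_alloc()` (MSVC, for example, does not); `round_up` and `alloc_aligned` are hypothetical names, not jemalloc APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round size up to the next multiple of alignment (a power of two). */
static size_t round_up(size_t size, size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical wrapper around C11 aligned_alloc(): rounds the size up so
 * the "size is a multiple of alignment" requirement always holds.
 * Returns NULL on failure; release the result with free(). */
static void *alloc_aligned(size_t alignment, size_t size) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (size == 0) {
        size = 1; /* avoid the implementation-defined zero-size case */
    }
    return aligned_alloc(alignment, round_up(size, alignment));
}
```

Keeping the rounding inside the wrapper means callers cannot forget it, which is the most common mistake when switching from `malloc()` to `aligned_alloc()`.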
To check the alignment of an existing pointer:
#include <stdint.h>
#include <stdio.h>
#include <jemalloc/jemalloc.h>
void* p = malloc(size);
// Get actual allocation size (may be larger than requested)
size_t actual_size = malloc_usable_size(p);
// Alignment is based on size class of actual_size
// For most cases: alignment = min(actual_size, 16) on x86_64
// For larger allocations: alignment follows size class rules
// To check pointer alignment at runtime:
uintptr_t addr = (uintptr_t)p;
size_t alignment = addr & -addr; // Lowest set bit = largest power of two that divides addr
printf("Pointer %p has %zu-byte alignment\n", p, alignment);
| Platform | Architectures | Compilers | CI Coverage | CMake | vcpkg |
|---|---|---|---|---|---|
| Linux (glibc) | x64, ARM64, x86 (cross-compile) | gcc, clang | ✅ Full | ✅ Yes | ✅ Yes |
| Linux (musl) | x64, ARM64 | gcc, clang | ✅ Full | ✅ Yes | ✅ Yes |
| Windows | x64, x86, ARM64 | MSVC, MinGW | ✅ Full | ✅ Yes | ✅ Yes |
| macOS | x64 (Intel), ARM64 (Apple Silicon) | clang | ✅ Full | ✅ Yes | ✅ Yes |
| FreeBSD | x64 | gcc | ✅ Full | ✅ Yes | ✅ Yes |
| Linux RISC-V | riscv64 | gcc | ✅ Full | ✅ Yes | ✅ Yes |
Note: CMake 3.24+ required. vcpkg support requires the Tebako fork overlay (see CMAKE_INTEGRATION.adoc § vcpkg Integration).
Note: RISC-V CI testing uses QEMU emulation via GitHub Actions, which can be 10-25x slower than native execution for compute-intensive tasks. The emulated environment validates that jemalloc builds and runs correctly on RISC-V architecture.
The Tebako fork continues jemalloc development and differs from the original project in the following ways:
-
Simplified building using CMake as the primary build system
-
vcpkg port support for easy installation and dependency management
-
Full ARM architecture support across all major platforms
-
Comprehensive cross-platform CI/CD testing using GitHub Actions
-
Ongoing development to fix bugs, security issues, and maintain compatibility
The Tebako fork simplifies jemalloc building by making CMake the primary build system. While the traditional autotools build system is still available for compatibility, CMake is now the recommended approach for all platforms.
Native CMake build system - works on all platforms without Unix tools.
Quick start:
# All platforms (Windows, Linux, macOS, FreeBSD)
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build --prefix /path/to/install
Windows with MSVC (no MSYS2/bash/autoconf needed):
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cmake --install build --prefix C:\path\to\install
Common Options:
-
-DJEMALLOC_BUILD_SHARED=ON/OFF: Build shared library (default: ON)
-
-DJEMALLOC_BUILD_STATIC=ON/OFF: Build static library (default: ON)
-
-DJEMALLOC_ENABLE_PROF=ON/OFF: Enable heap profiling (default: OFF)
-
-DJEMALLOC_ENABLE_STATS=ON/OFF: Enable statistics (default: ON)
For complete CMake integration guide including find_package(), add_subdirectory(),
FetchContent, and advanced configuration, see CMAKE_INTEGRATION.adoc.
Still available for compatibility, but CMake is now the recommended build system:
./autogen.sh
./configure
make
make install
See INSTALL.adoc for detailed autotools instructions.
This fork provides official vcpkg port support for easy installation and dependency management:
-
Overlay port included - Ready-to-use vcpkg port in ports/jemalloc/
-
Manifest mode support - Works with vcpkg.json dependency declarations
-
Classic mode support - Global installation available
-
Cross-platform - Supports all vcpkg target platforms
-
Automatic CMake integration - Installs proper CMake config files
Quick vcpkg installation:
# Install with overlay
vcpkg install jemalloc --overlay-ports=path/to/jemalloc/ports
# Or use manifest mode (vcpkg.json)
{
"dependencies": ["jemalloc"],
"vcpkg-configuration": {
"overlay-ports": ["path/to/jemalloc/ports"]
}
}
For the detailed vcpkg integration guide, including manifest mode, version pinning, and troubleshooting, see CMAKE_INTEGRATION.adoc § vcpkg Integration.
This fork provides comprehensive ARM64 support across all major platforms.
First jemalloc variant with COMPLETE Windows ARM64 support.
-
Full CI/CD coverage on windows-11-arm GitHub Actions runners
-
All build configurations working (Debug, Release, static, shared, MSVC)
-
Native CMake build - no Unix tools required
-
Native ARM64 intrinsics (__yield() for CPU spinwait via YieldProcessor())
-
vcpkg integration tested and validated
-
All Visual Studio versions supported (2015-2022)
Build on Windows ARM64:
# Simple! Just use CMake with MSVC
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
Automatic fix for ARM64 outline atomics linking error:
-
Detects musl + ARM64 configuration automatically
-
Adds -mno-outline-atomics flag to prevent __getauxval errors
-
Resolves GitHub issue #2782
-
Tested on Alpine Linux and Ubuntu cross-compilation
Full native support for Apple Silicon:
-
M1/M2/M3/M4 processor optimization
-
Universal binary support (x86_64 + ARM64)
-
Tested on macOS 13-15
-
Zone allocator integration
On ARM64 systems with ARMv8.5-A or newer processors (ARMv9), jemalloc
automatically detects and uses the SB (Speculation Barrier) instruction
instead of the older ISB (Instruction Synchronization Barrier) for improved
spin delay performance.
The ARM SB implementation on Linux was contributed by @salvatoredipietro in GitHub issue #2843. (Thank you!)
Performance improvement: The gain depends on the instruction set version and the CPU. AWS Graviton 3 is reported to show approximately 30% improvement (GitHub issue #2843), and Apple Silicon M3 approximately 11.3% improvement in benchmarks.
For detailed benchmark comparison and methodology, see Profiling Guide § ARMv8.5+ speculation barrier optimization.
Cross-platform support:
-
Linux ARM64: Runtime detection via getauxval(AT_HWCAP) and the HWCAP_SB flag
-
macOS ARM64: Runtime detection via sysctlbyname("hw.optional.arm.FEAT_SB")
-
Windows ARM64: Runtime detection via registry (ID_AA64ISAR1_EL1 - CP 4031)
Automatic and transparent - no configuration needed on any platform.
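The Linux detection path listed above can be sketched as follows. This is an illustration of the getauxval(AT_HWCAP) technique, not jemalloc's actual implementation; `hwcap_has_sb` and `cpu_has_sb` are hypothetical names, and the fallback `HWCAP_SB` definition (bit 29 in the Linux arm64 hwcap ABI) is included only so the sketch compiles on other platforms.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h> /* getauxval(), AT_HWCAP */
#endif

/* HWCAP_SB is bit 29 in the Linux arm64 hwcap ABI. The fallback below is
 * only used where the system headers do not define it. */
#ifndef HWCAP_SB
#define HWCAP_SB (1UL << 29)
#endif

/* Pure helper: does a raw AT_HWCAP value advertise the SB instruction? */
static bool hwcap_has_sb(unsigned long hwcap) {
    return (hwcap & HWCAP_SB) != 0;
}

/* Hypothetical probe: true only on Linux ARM64 CPUs that report SB. */
static bool cpu_has_sb(void) {
#if defined(__linux__) && defined(__aarch64__)
    return hwcap_has_sb(getauxval(AT_HWCAP));
#else
    return false; /* other platforms use different mechanisms */
#endif
}
```

Separating the bit test from the platform query keeps the logic testable on any machine, while the probe itself stays behind the platform guard.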
Supported hardware:
-
Linux: AWS Graviton 3/4 (ARMv9 Neoverse V1/V2), Neoverse N2, future (2022+)
-
macOS: All Apple Silicon - M1, M2, M3, M4 and future variants (ARMv8.5-A+)
-
Windows: Snapdragon X Elite/Plus (ARMv8.7-A+) and future variants
Fallback: Automatically uses ISB on ARMv8.0-A through ARMv8.4-A processors
Note: ARMv8.5-A introduced the SB instruction in 2019. Performance improvement varies by microarchitecture - server-class ARM cores (Neoverse) typically show larger gains than client cores (Apple Silicon).
Usage (optimization is automatic):
# Build jemalloc normally
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build
# Link your application
gcc myapp.c -ljemalloc -o myapp
./myapp # Automatically uses SB on ARMv8.5-A+, ISB on older
To verify it’s working:
Linux:
# Check for SB support in CPU features (-w avoids matching "ssbs")
grep -ow 'sb' /proc/cpuinfo | head -1
macOS:
# Check for SB support via sysctl
sysctl -n hw.optional.arm.FEAT_SB # Returns 1 if supported
Windows:
# Check registry for ARMv8.5-A support (SB included)
# ID_AA64ISAR1_EL1 bits 39:36 (the SB field) should be nonzero
Benchmark on your system:
# Compile benchmark (included in source)
clang -O2 -o benchmark_sb benchmark_sb_macos.c
./benchmark_sb
Verify at runtime (cross-platform):
#include <stdio.h>
extern int arm_has_sb_instruction; // Works on all ARM64 platforms
int main() {
printf("SB instruction support: %s\n",
arm_has_sb_instruction ? "YES (ARMv8.5-A+)" : "NO (ARMv8.0-8.4)");
return 0;
}
For platform-specific build instructions and ARM64 details, see CMAKE_INTEGRATION.adoc § Platform-Specific Notes.
ARM64 systems support multiple page sizes. jemalloc must be compiled for the correct page size or it will crash at runtime.
Warning: Building jemalloc with an incorrect page size will cause immediate crashes. This is the #1 compatibility issue on ARM64 platforms.
ARM64 kernels can be configured with different page sizes:
-
4 KiB (4096 bytes) - Traditional Linux default, most compatible
-
16 KiB (16384 bytes) - Apple Silicon Macs, Raspberry Pi 5, some ARM servers
-
64 KiB (65536 bytes) - ARM64 servers optimized for performance (AWS Graviton, etc.)
To determine your system’s page size, run:
# On Linux/macOS/FreeBSD
getconf PAGESIZE
# Example outputs:
# 4096 → 4K pages: use -DJEMALLOC_LG_PAGE=12 (2^12 = 4096)
# 16384 → 16K pages: use -DJEMALLOC_LG_PAGE=14 (2^14 = 16384)
# 65536 → 64K pages: use -DJEMALLOC_LG_PAGE=16 (2^16 = 65536)
| Platform | Page size | CMake command |
|---|---|---|
| x86_64 (Intel/AMD) | 4 KiB | |
| Apple Silicon (M1/M2/M3) | 16 KiB | |
| Raspberry Pi 5 | 16 KiB | |
| AWS Graviton 2/3/4 | 64 KiB | |
| ARM64 generic Linux | Check with | Set |
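The mapping from a page size in bytes to a JEMALLOC_LG_PAGE value is a base-2 logarithm. The sketch below shows that arithmetic on a POSIX system; `lg_page_from_size` and `host_lg_page` are hypothetical helper names, and the code is an illustration rather than what the CMake detection does internally.

```c
#include <unistd.h> /* sysconf(), _SC_PAGESIZE (POSIX) */

/* Returns log2(page_size) for a power-of-two page size, or -1 otherwise. */
static int lg_page_from_size(long page_size) {
    if (page_size <= 0 || (page_size & (page_size - 1)) != 0) {
        return -1;
    }
    int lg = 0;
    while ((1L << lg) < page_size) {
        lg++;
    }
    return lg;
}

/* Hypothetical helper: the JEMALLOC_LG_PAGE value that matches this host. */
static int host_lg_page(void) {
    return lg_page_from_size(sysconf(_SC_PAGESIZE));
}
```

On the build host this gives the same answer as taking log2 of the `getconf PAGESIZE` output. For cross-compilation the value must come from the target system instead.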
Warning: For optimal efficiency, always use the exact page size of your deployment target.
|
Note
|
If things crash:
|
When building on the target system, CMake automatically detects the page size:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
cmake --install build
Note: Automatic detection only works when building ON the target architecture. For cross-compilation, use manual configuration below.
When cross-compiling or needing explicit control, set the page size using
-DJEMALLOC_LG_PAGE=XX, where XX is the log2 of the page size.
# For 4 KiB pages (most x86_64, some ARM64)
cmake -B build -DJEMALLOC_LG_PAGE=12
# For 16 KiB pages (Apple Silicon, Raspberry Pi 5)
cmake -B build -DJEMALLOC_LG_PAGE=14
# For 64 KiB pages (ARM64 servers)
cmake -B build -DJEMALLOC_LG_PAGE=16

# On Ubuntu x86_64, cross-compiling for Raspberry Pi 5
sudo apt-get install gcc-aarch64-linux-gnu
# MUST specify 16K pages (Pi 5 uses 16K pages)
cmake -B build \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=aarch64 \
-DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
-DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
-DJEMALLOC_LG_PAGE=14 # Critical: Pi 5 = 16K pages
cmake --build build
The Tebako fork includes native heap profiling support on Windows using the
CaptureStackBackTrace() Windows API. This enables full profiling capabilities
on all supported Windows platforms. This support was authored by @roblabla.
Features:
-
Works on x64, x86, and ARM64 architectures
-
Works with both MSVC and MinGW toolchains
-
Native API: Uses Windows CaptureStackBackTrace() for fast backtrace collection
-
CMake integrated: Automatically detected during build configuration
For comprehensive profiling documentation including build instructions, usage examples, configuration options, and platform-specific considerations, see profiling.adoc.
The Tebako fork enables malloc_conf configuration override from libraries on
Windows MSVC builds, matching the Unix weak symbol behavior. This was authored by @roblabla.
Traditional limitation: On MSVC, weak symbols aren’t supported, so
malloc_conf could only be overridden from object files (.obj), not from
library files (.lib). This broke important scenarios:
-
Rust applications (always generate .lib files)
-
Multi-library projects where configuration is in a library
-
Dynamic library runtime configuration
-
Plugin architectures
Tebako solution: Uses MSVC /alternatename linker directive to emulate weak
symbols, enabling library-level overrides identical to Unix platforms.
Override malloc_conf from any source file in your application or library:
// In your application or library (not just main.obj!)
const char *je_malloc_conf = "narenas:2,tcache:true,dirty_decay_ms:5000";
This works identically to weak symbols on Unix platforms and is especially useful for:
-
Rust applications using FFI to jemalloc
-
Multi-language projects with library dependencies
-
Applications built from multiple static libraries
-
Systems where configuration must be embedded in libraries
The implementation uses the MSVC /alternatename linker pragma:
#pragma comment(linker, "/alternatename:je_malloc_conf=malloc_conf_default")
This tells the linker: "If je_malloc_conf isn’t defined elsewhere, use
malloc_conf_default (which is NULL)." Applications and libraries can override
it just like on Unix.
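For comparison, the Unix behavior that /alternatename emulates can be sketched with the GCC/Clang weak attribute. This is an illustration only: `demo_malloc_conf` and `effective_conf` are hypothetical names chosen to avoid clashing with jemalloc's real symbols, and the attribute syntax is not accepted by MSVC.

```c
#include <stddef.h>

/* Weak default definition, standing in for the allocator's built-in
 * default. A strong definition of the same symbol in another object file
 * linked into the program replaces it at link time.
 * demo_malloc_conf is a hypothetical name, not jemalloc's real symbol. */
__attribute__((weak)) const char *demo_malloc_conf = NULL;

/* Returns the option string that would be applied at startup. */
static const char *effective_conf(void) {
    return demo_malloc_conf != NULL ? demo_malloc_conf : "";
}
```

If another translation unit defines `const char *demo_malloc_conf = "narenas:2";`, the linker picks that strong definition and `effective_conf()` returns it. With no override, the weak NULL default stays in effect.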
Platform support:
-
✅ Windows MSVC x64, x86, ARM64 (uses /alternatename)
-
✅ Windows MinGW x64, x86 (uses standard weak attributes)
-
✅ All Unix platforms (uses weak attributes)
Based on upstream PR #2689 by @roblabla.
jemalloc supports Windows via both MSVC (Visual Studio 2015+) and MinGW toolchains, with comprehensive CI testing across x64, x86, and ARM64 architectures.
The build system automatically uses MSVC-compatible header wrappers for portability:
-
msvc_compat/strings.h - Provides ffs(), ffsl(), ffsll() functions using MSVC intrinsics
-
msvc_compat/windows_extra.h - Additional Windows compatibility layer
These headers are used for all Windows builds (MSVC and MinGW) to ensure consistent behavior.
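As a reminder of what the ffs() family computes, here is a portable reference sketch. It is not the code in msvc_compat/strings.h, which the text above says uses MSVC intrinsics; `demo_ffs` is a hypothetical name, and it takes an unsigned argument to keep the shifts well defined.

```c
/* Portable reference for ffs(): returns the 1-based index of the least
 * significant set bit, or 0 when no bit is set. demo_ffs is a hypothetical
 * name; real implementations map this onto compiler intrinsics. */
static int demo_ffs(unsigned int x) {
    if (x == 0) {
        return 0;
    }
    int index = 1;
    while ((x & 1u) == 0) {
        x >>= 1;
        index++;
    }
    return index;
}
```

The loop form is slow compared with a single bit-scan instruction, which is why a compatibility header would normally prefer intrinsics, but it is handy as a reference when checking results.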
Based on upstream PR #2420 by @threeseed (harana-oss organization).
The Tebako fork of jemalloc uses a comprehensive GitHub Actions CI/CD system that achieves:
-
Native CMake build system - All ~220 CI jobs use CMake (no autotools)
-
Extensive platform coverage - Comprehensive testing across:
-
Linux (glibc): Ubuntu 22.04/24.04 on x64, ARM64, and 32-bit (cross-compilation)
-
Linux (musl): Alpine Linux 3.20 on x64 and ARM64
-
Linux (RISC-V): Ubuntu 24.04 on riscv64
-
Windows: Server 2022/2025 and Windows 11 on x64, x86, and ARM64
-
macOS: versions 13-15 on Intel x64 and Apple Silicon ARM64
-
FreeBSD: versions 14.x and 15.x
-
Two-level testing approach:
-
Level 1: Fast smoke tests (~16 jobs including musl and RISC-V, ~2-3 minutes)
-
Level 2: Comprehensive testing (~200+ jobs covering all configure flag combinations, ~30-45 minutes)
-
Quality checks - Automated trailing whitespace detection and static analysis with CodeChecker
Note: The original jemalloc project relied on a fragmented CI approach using Travis CI, AppVeyor, and Cirrus CI.
-
Profiling Guide - Comprehensive heap profiling across all platforms
-
PROFILING_INTERNALS.md - In-depth profiling implementation details
-
CMAKE_INTEGRATION.adoc - Comprehensive CMake and vcpkg integration guide
-
RELEASING.adoc - Complete release process guide
-
INSTALL.adoc - Building and installation instructions
Contributions are welcome! Please:
-
Fork the repository
-
Create a feature branch from main
-
Make your changes with clear, semantic commit messages
-
Run the full test suite: make check
-
Submit a pull request to the main branch
-
Tebako fork: https://github.com/tamatebako/jemalloc
-
Original jemalloc (discontinued): https://github.com/jemalloc/jemalloc
-
Facebook fork (parent of this fork): https://github.com/facebook/jemalloc
-
Tebako project: https://github.com/tamatebako
jemalloc is licensed under the BSD-2-Clause license. See the COPYING file for details.
This fork is based on the work of:
-
Jason Evans - Original jemalloc author
-
Facebook - Maintained fork with additional features
-
FreeBSD community - Integration and testing
-
All contributors to the jemalloc project
We continue their excellent work while maintaining and enhancing jemalloc for modern use cases.