Component
confconsole Let's Encrypt plugin — plugins.d/Lets_Encrypt/add-water-srv
Environment
- Appliance: turnkey-odoo-18.0-bookworm-amd64
- confconsole: 2.1.6.1
- dehydrated: 0.7.0-3
- python3: 3.11.2
Summary
The internal control socket used by add-water-srv binds to 127.0.0.1:9977
without setting SO_REUSEADDR. If add-water is stopped and started again
within the TCP TIME_WAIT window (~60s), the bind() fails with
OSError: [Errno 98] Address already in use. The handle_token_input thread
dies, so the Bottle server on port 80 comes up but the control channel is gone.
dehydrated then calls the deploy_challenge hook, add-water-client tries to
connect to 127.0.0.1:9977 and fails with
ConnectionRefusedError: [Errno 111] Connection refused, and the whole
certificate run aborts.
Root cause
In add-water-srv, handle_token_input():
def handle_token_input():
    host = '127.0.0.1'
    port = 9977
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))  # no SO_REUSEADDR -> fails while previous socket is in TIME_WAIT
    sock.listen(1)
No socket option is set before bind(). Confirmed: running
grep -rn "SO_REUSEADDR\|setsockopt" over the plugin directory returns nothing.
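To see the failure mode in isolation, here is a minimal sketch that mimics the same bind pattern outside confconsole. It is not taken from add-water-srv: the function name restart_bind_errno is made up for illustration, and it uses a kernel-assigned port instead of 9977 so it cannot collide with a running add-water. The server side closes the accepted connection first, which is what leaves the local endpoint lingering after the listener is gone.

```python
import errno
import socket


def restart_bind_errno():
    """Bind, serve one connection, close, then rebind the same port.

    Returns the errno of a failed rebind, or None if the rebind worked.
    Hypothetical helper for illustration; SO_REUSEADDR is deliberately not set.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
    first.listen(1)
    port = first.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _ = first.accept()
    conn.close()                      # server closes first, so its side lingers
    client.recv(1)                    # returns b"" once the server's FIN arrives
    client.close()
    first.close()                     # "stop" the service

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))   # "start" it again right away
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
    return None


if __name__ == "__main__":
    err = restart_bind_errno()
    print("rebind:", errno.errorcode[err] if err is not None else "ok")
```

On a Linux system like the one in Environment this is expected to report EADDRINUSE, matching the journalctl output; other platforms may behave differently.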
Steps to reproduce
1. Run the wrapper successfully once:
/usr/lib/confconsole/plugins.d/Lets_Encrypt/dehydrated-wrapper --register --force --log-info
2. Immediately run it again (within ~60s).
3. The second run fails.
Actual output (second run)
[...] dehydrated-wrapper: INFO: stopping apache2
[...] dehydrated-wrapper: INFO: running dehydrated
+ Deploying challenge tokens...
[...] confconsole.hook.sh: INFO: Deploying challenge for <domain>
Traceback (most recent call last):
  File ".../add-water-client", line 41, in <module>
    sock.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR: deploy_challenge hook returned with non-zero exit code
[...] dehydrated-wrapper: FATAL: dehydrated exited with a non-zero exit code.
And in journalctl -u add-water:
Bottle v0.12.23 server starting up (using WSGIRefServer())...
Listening on http://:::80/
Exception in thread Thread-1 (handle_token_input):
  File ".../add-water-srv", line 87, in handle_token_input
    sock.bind((host, port))
OSError: [Errno 98] Address already in use
Expected behavior
Re-running the wrapper shortly after a previous run should succeed (e.g. retry
after a transient failure) instead of aborting on a lingering control socket.
Proposed fix
Set SO_REUSEADDR before binding the control socket in handle_token_input():
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) # add this
sock.bind((host, port))
sock.listen(1)
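A self-contained sketch of the same change, for trying it outside the plugin. The helper name open_control_socket and its parameters are assumptions made for this example, not existing confconsole code; only the setsockopt() call corresponds to the proposed patch.

```python
import socket


def open_control_socket(host="127.0.0.1", port=9977):
    """Return a listening TCP socket with SO_REUSEADDR set before bind().

    Hypothetical helper; in add-water-srv the equivalent lines live inline
    in handle_token_input().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Must be set before bind(), otherwise it has no effect on this bind.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()   # do not leak the descriptor if bind/listen fails
        raise
    return sock
```

Calling open_control_socket(port=0) gives a listener on a kernel-assigned port, which is convenient for a stop/start test that does not interfere with a live add-water instance.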
Notes / impact
- This does NOT affect the daily renewal cron (it only runs once per day, so it
never hits the TIME_WAIT window). It bites operators who retry manually after a
failed/aborted run, or who test against staging and then immediately switch to
production.
- Possibly related minor quirk: dehydrated-wrapper logs
"WARNING: Python is still listening on port 80" on successful runs too, and
then force-stops add-water. It may be worth tightening the shutdown ordering so
that the warning only appears when something is genuinely wrong.