SNOW-3192362: Fix auth callback recv blind sleep with MSG_DONTWAIT#2804

Draft
sfc-gh-fpawlowski wants to merge 1 commit into main from fix/py314-auth-recv-select-cooldown

@sfc-gh-fpawlowski
Contributor

SNOW-3192362: Fix auth callback recv blind sleep with MSG_DONTWAIT

Summary

  • Replace time.sleep(cooldown) with select.select([client_socket], [], [], cooldown) in the MSG_DONTWAIT retry path of _try_receive_block, making the cooldown data-aware
  • Fixes flaky test_auth_callback_success failures (particularly with dontwait=true, timeout=0.05) on loaded CI runners

Root Cause

In _try_receive_block (src/snowflake/connector/auth/_http_server.py:194-201), the MSG_DONTWAIT code path uses a blind time.sleep() between non-blocking recv attempts:

# recv with MSG_DONTWAIT raises BlockingIOError instantly if no data
except BlockingIOError:
    if attempt < attempts - 1:
        cooldown = min(attempt_timeout, 0.25) if attempt_timeout else 0.25
        time.sleep(cooldown)  # <-- BLIND: data arriving during sleep is invisible

With timeout=0.05 and max_attempts=15:

  • attempt_timeout = 0.05 / 15 = ~3.3ms
  • Each iteration: instant recv check (0ms useful work) + 3.3ms blind sleep
  • 12 recv attempts = ~0ms of data-checking time, ~36ms of blind sleeping
  • On a loaded CI runner, the data arrival window is easily missed entirely
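The arithmetic above can be reproduced with a small sketch. The helper names are hypothetical, and the sketch assumes one cooldown follows every attempt except the last, as the `attempt < attempts - 1` guard implies. Under that assumption, 12 attempts give 11 sleeps (about 36.7 ms in total), and 15 attempts would give 14 sleeps (about 46.7 ms).

```python
def cooldown_for(timeout: float, max_attempts: int) -> float:
    """Cooldown between non-blocking recv attempts, mirroring the quoted expression."""
    attempt_timeout = timeout / max_attempts
    return min(attempt_timeout, 0.25) if attempt_timeout else 0.25


def total_blind_sleep(cooldown: float, attempts: int) -> float:
    """Total time spent sleeping, assuming no sleep follows the final attempt."""
    sleeps = max(attempts - 1, 0)
    return sleeps * cooldown


if __name__ == "__main__":
    cooldown = cooldown_for(timeout=0.05, max_attempts=15)
    print(f"cooldown per attempt: {cooldown * 1000:.1f} ms")
    for attempts in (12, 15):
        total = total_blind_sleep(cooldown, attempts)
        print(f"{attempts} attempts -> {total * 1000:.1f} ms of blind sleep")
```

Whichever attempt count applies, nearly the whole timeout budget is spent inside `time.sleep()`, where arriving data goes unnoticed.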

Compare with the dontwait=false path: recv blocks for up to attempt_timeout, returning as soon as data arrives. Every millisecond is spent actively waiting for data.

The core issue: time.sleep() is not socket-aware. Data arriving on the socket during the sleep cannot wake it up. The process sleeps for the full cooldown duration regardless of whether data is available.

Fix

One-line change in src/snowflake/connector/auth/_http_server.py, line 201:

# Before:
    time.sleep(cooldown)

# After:
    select.select([client_socket], [], [], cooldown)

select.select() monitors the socket's read-readiness with a timeout of cooldown. If data arrives during the wait, select returns early and the next recv succeeds immediately. If no data arrives, select times out after cooldown, so the wall-clock behavior in that case matches the original time.sleep.
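The difference between the two waits can be observed on a connected socket pair. This is a standalone sketch, not code from the connector: `wait_blind`, `wait_data_aware`, and `measure` are hypothetical helpers, and the delays are arbitrary values chosen to make the effect visible.

```python
import select
import socket
import threading
import time


def wait_blind(sock: socket.socket, cooldown: float) -> float:
    """Sleep for the full cooldown regardless of socket state; return elapsed seconds."""
    start = time.monotonic()
    time.sleep(cooldown)
    return time.monotonic() - start


def wait_data_aware(sock: socket.socket, cooldown: float) -> float:
    """Wait until the socket is readable or the cooldown expires; return elapsed seconds."""
    start = time.monotonic()
    select.select([sock], [], [], cooldown)
    return time.monotonic() - start


def measure(wait_fn, cooldown: float, send_delay=None) -> float:
    """Run wait_fn on one end of a socket pair, optionally sending data after send_delay."""
    reader, writer = socket.socketpair()
    timer = None
    if send_delay is not None:
        timer = threading.Timer(send_delay, writer.sendall, args=(b"x",))
        timer.start()
    try:
        return wait_fn(reader, cooldown)
    finally:
        if timer is not None:
            timer.join()
        reader.close()
        writer.close()


if __name__ == "__main__":
    print("blind sleep, data after 20 ms:", measure(wait_blind, 0.3, send_delay=0.02))
    print("select wait, data after 20 ms:", measure(wait_data_aware, 0.3, send_delay=0.02))
    print("select wait, no data:         ", measure(wait_data_aware, 0.1))
```

The blind sleep should take the full cooldown even though data arrived early, while the select-based wait should return shortly after the data is sent and fall back to the full cooldown only when nothing arrives.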

Why this is correct and safe

  1. Only affects the MSG_DONTWAIT path: BlockingIOError is never raised without MSG_DONTWAIT, so the dontwait=false path is completely untouched
  2. Never runs on Windows: _use_msg_dont_wait() returns False on Windows (lines 26-30), so this code path is Unix-only, where select works on sockets
  3. select already imported and used: module-level import at line 9, used in _try_poll at line 180
  4. Spurious wakeups handled: if select returns readable but recv still raises BlockingIOError, the existing retry loop handles it gracefully
  5. No signature/API changes: only the internal sleep mechanism changes
  6. time import stays: still used in the bind retry loop (line 116)
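To illustrate how the points above fit together, here is a minimal sketch of a retry loop with the select-based cooldown. It is a hypothetical stand-in, not the body of `_try_receive_block`: the function name, parameters, and `bufsize` default are assumptions, and only the `except BlockingIOError` branch follows the snippet quoted in the PR. It relies on `socket.MSG_DONTWAIT`, which is available on Unix-like platforms only.

```python
import select
import socket


def recv_nonblocking_with_cooldown(client_socket, attempts, attempt_timeout, bufsize=1024):
    """Try a non-blocking recv up to `attempts` times, waiting on the socket in between.

    Returns the received bytes, or None if every attempt found no data.
    """
    for attempt in range(attempts):
        try:
            # MSG_DONTWAIT makes this single call non-blocking.
            return client_socket.recv(bufsize, socket.MSG_DONTWAIT)
        except BlockingIOError:
            if attempt < attempts - 1:
                cooldown = min(attempt_timeout, 0.25) if attempt_timeout else 0.25
                # Returns as soon as the socket is readable, or after `cooldown`.
                # If the wakeup is spurious, the next recv raises BlockingIOError
                # again and the loop simply continues.
                select.select([client_socket], [], [], cooldown)
    return None
```

Because the select result is ignored and the loop always goes back to `recv`, a readable report that turns out to be wrong costs one extra iteration rather than an error.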

Test plan

  • python -m pytest test/unit/test_auth_callback_server.py -v — all 318 tests pass
  • python -m pytest test/unit/test_auth_callback_server.py -v -k "test_auth_callback_success" — the specific failing tests
  • Full unit suite: python -m pytest test/unit/ -x --timeout=120

Replace time.sleep(cooldown) with select.select([client_socket], [], [], cooldown)
in the MSG_DONTWAIT retry path so the cooldown is data-aware. If data arrives
during the wait, select returns early and the next recv succeeds immediately,
fixing flaky test_auth_callback_success failures on loaded CI runners.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sfc-gh-fpawlowski self-assigned this Mar 23, 2026
sfc-gh-fpawlowski added the NO-CHANGELOG-UPDATES (this pull request does not need to update CHANGELOG.md) and NO_ASYNC_CHANGES labels Mar 23, 2026