fix: add rate limiting and retry logic for Aurora and OSDG API calls#52

Open
adarsh-7-satyam wants to merge 1 commit into
chaoss:mainfrom
adarsh-7-satyam:feat/rate-limit-handling

@adarsh-7-satyam

Closes #51

Problem

Neither the Aurora API client (aurora_api.py) nor the OSDG API route (app.py) had any rate limiting, retry logic, or HTTP 429 handling. Specifically:

  • aurora_api.py called requests.request() with no timeout and no retry, so any transient failure or rate limit would immediately crash the classification
  • app.py (OSDG route) had a hardcoded timeout=1000 (1000 seconds, about 16.7 minutes; clearly a bug) with no retry or 429 detection
  • Both call sites caught rate limit errors inside a generic except block, so the frontend received a misleading 500 error instead of a 429

This is a critical gap for the DMP 2026 goal of bulk testing 100 projects from the DPG registry, since both Aurora and OSDG APIs enforce a 1 request/second rate limit.


Changes Made

backend/aurora_api.py

  • Added import time
  • Replaced the bare requests.request() call with a retry loop (max 3 attempts)
  • Added explicit 429 detection with exponential backoff: waits 1s, 2s, then 4s after successive 429 responses
  • Added timeout=30 to prevent indefinite hanging
  • Returns a clean, descriptive error dict after retries are exhausted instead of crashing
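
The retry loop described above could look roughly like the following. This is a minimal sketch, not the code in the PR: the function name call_with_backoff, the injectable send and sleep parameters, and the exact error message are assumptions made so the example runs without network access. It also assumes the wait happens after every 429, including the last one, which is what produces the 1s, 2s, 4s sequence.

```python
import time


def call_with_backoff(send, max_retries=3, retry_delay=1, sleep=time.sleep):
    """Call send() up to max_retries times, backing off on HTTP 429.

    send is a hypothetical zero-argument callable that performs the HTTP
    request and returns an object exposing status_code and json().
    """
    for attempt in range(max_retries):
        response = send()
        if response.status_code == 429:
            # Exponential backoff: 1s, 2s, 4s for attempts 0, 1, 2.
            wait = retry_delay * (2 ** attempt)
            sleep(wait)
            continue
        if response.status_code >= 400:
            # Any other error is not retried.
            raise RuntimeError(f"Aurora API returned HTTP {response.status_code}")
        return response.json()
    # Every attempt was rate limited: return an error dict instead of crashing.
    return {"error": f"Aurora API rate limit exceeded after {max_retries} attempts"}
```

With the requests library, send could be something like lambda: requests.request("POST", url, headers=headers, data=payload, timeout=30), where url, headers, and payload are placeholders for whatever aurora_api.py already builds.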

backend/app.py

  • Added import time
  • Replaced the bare requests.post() OSDG call with a retry loop (max 3 attempts)
  • Added explicit 429 detection with exponential backoff: waits 1s, 2s, then 4s after successive 429 responses
  • Fixed the timeout=1000 bug by changing it to timeout=30
  • Returns proper HTTP 429 status to the frontend when all retries are exhausted, instead of a misleading 500
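
One way to keep a rate limit from being reported as a generic 500 is to catch it before the catch-all except. The sketch below is an illustration under assumptions: RateLimitExceeded, handle_osdg_request, and call_osdg are hypothetical names, and the route is modelled as a plain function returning a (body, status) pair rather than the actual handler in app.py.

```python
class RateLimitExceeded(Exception):
    """Hypothetical exception raised once every retry has hit HTTP 429."""


def handle_osdg_request(call_osdg):
    """Run call_osdg() and map the outcome to a (body, http_status) pair.

    call_osdg is a hypothetical zero-argument callable wrapping the
    OSDG request together with its retry loop.
    """
    try:
        data = call_osdg()
    except RateLimitExceeded as exc:
        # Handled before the generic except so it is not reported as a 500.
        return {"error": str(exc)}, 429
    except Exception as exc:
        return {"error": f"OSDG classification failed: {exc}"}, 500
    return data, 200
```

If app.py is a Flask application (an assumption), a view function can return such a (body, status) tuple directly.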

How It Works

Both APIs now follow this flow on every request:

  1. Make the API call with timeout=30
  2. If response is 429 → wait retry_delay * (2 ** attempt) seconds (retry_delay = 1, attempt counted from 0) and retry
  3. If response is any other error → raise immediately
  4. If all 3 attempts are exhausted → return a descriptive error with the correct status code
  5. If successful → continue normally

Exponential backoff sequence: 1s → 2s → 4s
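
The backoff sequence follows directly from the formula in step 2. As a small worked example (backoff_delays is a hypothetical helper name, not something in the PR):

```python
def backoff_delays(max_retries=3, retry_delay=1):
    """Return the wait in seconds applied after each rate-limited attempt."""
    return [retry_delay * (2 ** attempt) for attempt in range(max_retries)]


delays = backoff_delays()
print(delays)       # [1, 2, 4]
print(sum(delays))  # 7, the total seconds spent waiting if every attempt returns 429
```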


Testing Done

  • Verified import time present in both files
  • Verified max_retries = 3 and retry_delay = 1 present in both files
  • Verified time.sleep(wait) present in both files
  • Verified status_code == 429 check present in both files
  • Verified timeout=1000 is completely removed from app.py
  • Verified timeout=30 set correctly in both files
  • All 10 verification checks passed
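
The presence checks listed above could be scripted along these lines. This is a sketch under assumptions: check_source and the two marker lists are hypothetical, and the exact set of 10 checks used for this PR is not shown here.

```python
# Markers taken from the checklist above; the helper itself is hypothetical.
REQUIRED_MARKERS = [
    "import time",
    "max_retries = 3",
    "retry_delay = 1",
    "time.sleep(wait)",
    "status_code == 429",
    "timeout=30",
]
FORBIDDEN_MARKERS = ["timeout=1000"]


def check_source(source):
    """Return (missing, leftover) marker lists for one file's source text."""
    missing = [m for m in REQUIRED_MARKERS if m not in source]
    leftover = [m for m in FORBIDDEN_MARKERS if m in source]
    return missing, leftover


# Example usage against the two files touched by this PR:
# from pathlib import Path
# for path in ("backend/aurora_api.py", "backend/app.py"):
#     print(path, check_source(Path(path).read_text()))
```

Both returned lists are empty when a file passes every check.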

Notes

  • No new dependencies added: the change uses only the time module from Python's standard library
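  • No changes to the API response format for successful requests; existing frontend behaviour is unchanged apart from the 429 status now returned by the OSDG route when retries are exhausted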
  • This fix is a prerequisite for the DMP 2026 intern's bulk testing workflow against the DPG registry

Signed-off-by: Adarsh Satyam <adarsh5.satyam@gmail.com>

Development

Successfully merging this pull request may close these issues.

External API calls have no rate limiting or retry logic, causing failures during bulk testing
