Background
We currently run a long-lived Ruby process that periodically scans the database for Job records in a processing state and checks whether their expected output file exists in S3.
This polling approach is simple and works at our current scale, but it has some tradeoffs:
- Continuous DB reads
- Repeated S3 checks
- Latency between job completion and detection
- Requires a perpetual scanning process
Amazon S3 can emit event notifications when objects are created and deliver them to an SQS queue. This opens up the possibility of shifting from a polling-based model to an event-driven model.
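As a rough idea of what the bucket-side setup involves, the sketch below builds notification settings in the shape accepted by `Aws::S3::Client#put_bucket_notification_configuration` in the aws-sdk-s3 gem. The queue ARN, the `jobs/` prefix, and the helper name `s3_to_sqs_notification` are placeholders for illustration, not values from our environment.

```ruby
require "json"

# Hypothetical helper: returns the notification settings hash that would be
# passed as `notification_configuration:` to put_bucket_notification_configuration.
def s3_to_sqs_notification(queue_arn:, prefix:)
  {
    queue_configurations: [
      {
        queue_arn: queue_arn,
        # Fire on every kind of object creation (Put, Post, Copy, multipart upload).
        events: ["s3:ObjectCreated:*"],
        # Only notify for keys under the given prefix.
        filter: { key: { filter_rules: [{ name: "prefix", value: prefix }] } }
      }
    ]
  }
end

# Placeholder ARN and prefix, for illustration only.
config = s3_to_sqs_notification(
  queue_arn: "arn:aws:sqs:us-east-1:123456789012:job-output-events",
  prefix: "jobs/"
)
puts JSON.pretty_generate(config)
```

The SQS queue also needs an access policy that lets the S3 service principal send messages to it, so the setup touches both the bucket and the queue.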
Goal
Investigate whether we should move from:
DB + S3 polling → detect completion
to:
S3 event → SQS message → worker updates Job state
This would allow the system to react to job completion instead of repeatedly checking for it.
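To make the event-driven flow concrete, here is a minimal sketch of the first thing a worker would do with an SQS message: pull the bucket and key out of the S3 notification body. It uses only the Ruby standard library. `CompletedObject` and `parse_s3_event` are hypothetical names, the sample payload is invented, and fetching messages from SQS is left out.

```ruby
require "json"
require "cgi"

# Hypothetical value object for one "object created" notification.
CompletedObject = Struct.new(:bucket, :key, :sequencer)

# Parses the body of one SQS message carrying an S3 event notification.
# Returns an array because one notification may contain several records.
def parse_s3_event(body)
  payload = JSON.parse(body)
  # S3 sends a test message without "Records" when notifications are enabled.
  records = payload.fetch("Records", [])

  records.filter_map do |record|
    # Ignore anything that is not an object-creation event.
    next unless record["eventName"].to_s.start_with?("ObjectCreated:")

    s3 = record.fetch("s3")
    object = s3.fetch("object")
    CompletedObject.new(
      s3.fetch("bucket").fetch("name"),
      CGI.unescape(object.fetch("key")), # keys arrive URL-encoded
      object["sequencer"]
    )
  end
end

# Invented sample payload, trimmed to the fields used above.
sample = {
  "Records" => [
    {
      "eventName" => "ObjectCreated:Put",
      "s3" => {
        "bucket" => { "name" => "example-output-bucket" },
        "object" => { "key" => "jobs/42/result+final.csv", "sequencer" => "0055AED6DCD90281E5" }
      }
    }
  ]
}.to_json

parse_s3_event(sample).each { |obj| puts "#{obj.bucket}/#{obj.key}" }
```

The worker would then map each key back to a Job record. How that lookup works depends on our key naming scheme, which this sketch does not assume.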
Questions to Explore
- How complex is the S3 → SQS setup?
- What changes would be required in our Ruby worker architecture?
- How would we safely handle duplicate or out-of-order events?
- Would this materially reduce:
  - DB load?
  - S3 API calls?
  - long-running process complexity?
- Operational tradeoffs:
  - observability
  - retries
  - failure handling
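On the duplicate and out-of-order question: S3 notifications and standard SQS queues both deliver at least once, so the worker has to tolerate repeats. One common way to do that is a guarded state transition that only moves a job from processing to completed, which makes repeat events harmless. The sketch below shows the idea with an in-memory stand-in for the Job table. `Job`, `JobStore`, and `complete_for` are hypothetical names, and a real version would more likely be a conditional UPDATE that checks the current state.

```ruby
# Hypothetical in-memory stand-in for Job records, keyed by expected output key.
Job = Struct.new(:id, :output_key, :state)

class JobStore
  def initialize(jobs)
    @jobs_by_key = jobs.to_h { |job| [job.output_key, job] }
  end

  # Applies an "output file created" event for the given S3 key.
  # Returns :completed, :already_handled, or :unknown_key.
  def complete_for(key)
    job = @jobs_by_key[key]
    # The event may refer to a key we do not track (or do not track yet).
    return :unknown_key if job.nil?
    # Duplicate or late events find the job outside the processing state: no-op.
    return :already_handled unless job.state == :processing

    job.state = :completed
    :completed
  end
end

store = JobStore.new([Job.new(1, "jobs/1/out.csv", :processing)])
puts store.complete_for("jobs/1/out.csv") # first delivery
puts store.complete_for("jobs/1/out.csv") # duplicate delivery
```

The distinct return values give the caller something to log or count, which also feeds into the observability question above. For `:unknown_key`, one option is to leave the message on the queue so it is retried later.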
Non-Goals (for now)
This is not a commitment to implement — only an investigation.
We are not trying to prematurely optimize or redesign the pipeline without evidence.
Outcome
Document:
- Pros / cons
- Implementation complexity
- Estimated effort
- Recommendation: stay with polling vs move to event-driven
Motivation
If job volume increases, an event-driven approach may provide:
- faster completion detection
- lower infrastructure load
- better scalability
Worth understanding before we need it.