Job scheduler possible OOM errors #86
Open
Description
Task
- Limit the size of the in-memory job queue on each remote worker
- Without a limit, the distributed scheduler can queue thousands of jobs on a single node and cause OOM errors
- Each node should have a fixed queue limit and enforce it, either by exposing its current queue size to the scheduler or by returning an error when the scheduler tries to queue a job on a full node
- Also make sure that, when checking for jobs to run, the scheduler doesn't load them all into memory at once (e.g. 10,000 jobs in a single fetch)
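
A rough sketch of how the two limits could fit together, in Python for illustration only. Every name here (`WorkerNode`, `QueueFullError`, `iter_batches`, `dispatch`, `max_queue_size`, `batch_size`) is hypothetical and not taken from the codebase; the real scheduler and transport will look different. The sketch shows both enforcement options on the node (an exposed queue size and an error on overflow) and a scheduler loop that pulls pending jobs in small batches instead of one large load.

```python
import queue
from itertools import islice
from typing import Iterable, Iterator, List


class QueueFullError(Exception):
    """Hypothetical error a node returns when its queue is at capacity."""


class WorkerNode:
    """Hypothetical remote worker with a fixed-size in-memory job queue."""

    def __init__(self, max_queue_size: int) -> None:
        self.max_queue_size = max_queue_size
        # queue.Queue enforces the bound itself once maxsize > 0.
        self._jobs: "queue.Queue[str]" = queue.Queue(maxsize=max_queue_size)

    def queue_size(self) -> int:
        """Option 1: expose the current queue size to the scheduler."""
        return self._jobs.qsize()

    def enqueue(self, job_id: str) -> None:
        """Option 2: reject the job with an error when the queue is full."""
        try:
            self._jobs.put_nowait(job_id)
        except queue.Full:
            raise QueueFullError(
                f"queue limit of {self.max_queue_size} reached"
            ) from None


def iter_batches(jobs: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield jobs in lists of at most batch_size, never the whole set."""
    it = iter(jobs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def dispatch(jobs: Iterable[str], node: WorkerNode, batch_size: int = 100) -> int:
    """Queue jobs on the node batch by batch; stop once the node is full.

    Returns the number of jobs the node accepted. In a real scheduler the
    jobs that were not accepted would stay pending in the job store.
    """
    accepted = 0
    for batch in iter_batches(jobs, batch_size):
        for job_id in batch:
            try:
                node.enqueue(job_id)
            except QueueFullError:
                return accepted
            accepted += 1
    return accepted
```

With a node limit of 3 and a lazy source of 10,000 pending jobs, `dispatch` stops after three accepted jobs and has read only a couple of small batches from the source, so neither side holds the full backlog in memory.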