Job scheduler possible OOM errors #86

@Bethibande

Description

Task

  • Limit the size of the in-memory job queue on remote workers
  • Otherwise, the distributed scheduler might queue thousands of jobs on a single node and cause OOM errors
  • Each node should enforce a fixed limit, either by exposing its current queue size to the scheduler or by returning an error when the scheduler tries to queue a job beyond that limit
  • Also make sure that, when checking for jobs to run, the scheduler doesn't load an unbounded number of jobs (e.g. 10,000) into memory at once
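
The points above could be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: `BoundedWorkerQueue`, `QueueFullError`, `remaining_capacity`, `iter_due_jobs`, and the `fetch_page(offset, limit)` callback are all hypothetical names, and the job store is assumed to support offset-based paging. It shows both enforcement options from the task (exposing the queue size, and rejecting a job when the queue is full) plus loading due jobs in fixed-size pages.

```python
import queue
from typing import Callable, Iterator, List


class QueueFullError(Exception):
    """Raised when a worker refuses a job because its queue is at capacity."""


class BoundedWorkerQueue:
    """Hypothetical per-node job queue with a fixed capacity."""

    def __init__(self, max_size: int) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._max_size = max_size
        self._jobs: "queue.Queue[str]" = queue.Queue(maxsize=max_size)

    def remaining_capacity(self) -> int:
        # Option 1: expose the current load so the scheduler can check
        # before sending work. qsize() is only approximate when several
        # threads use the queue, so this is a hint, not a guarantee.
        return self._max_size - self._jobs.qsize()

    def submit(self, job_id: str) -> None:
        # Option 2: reject the job instead of growing without bound.
        try:
            self._jobs.put_nowait(job_id)
        except queue.Full:
            raise QueueFullError(
                f"queue full ({self._max_size} jobs)"
            ) from None

    def take(self) -> str:
        # Remove and return the oldest queued job (FIFO order).
        return self._jobs.get_nowait()


def iter_due_jobs(
    fetch_page: Callable[[int, int], List[str]],
    batch_size: int,
) -> Iterator[List[str]]:
    """Yield due jobs in pages of at most batch_size.

    fetch_page(offset, limit) is an assumed storage callback; at most one
    page is held in memory at a time.
    """
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return
        yield page
        offset += len(page)
```

With `max_size=2`, a third `submit` call raises `QueueFullError`, which the scheduler could treat as a signal to try another node or to retry later. Offset-based paging only works if the set of due jobs does not shift between pages; if dispatched jobs are removed from the store while iterating, paging by a stable key (for example the last seen job ID) would be a safer choice.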
