
fix: update armada to respect node pod limits #4517

Closed
jparraga-stackav wants to merge 4 commits into armadaproject:master from jparraga-stackav:issue-4515


Conversation

@jparraga-stackav commented Nov 6, 2025

What type of PR is this?

Bug fix

What this PR does / why we need it:

This pull request updates Armada to respect node pod limits. Without this change it is possible for Armada to try to schedule pods onto nodes that have no capacity for additional concurrently running pods. When this happens the pods get stuck in a pending state and eventually have their leases returned. Presumably Armada counts these pods towards fair share even though they cannot run.

  1. Configures the Armada server's default job limits to include the pods resource
  2. Configures the Armada scheduler to consider the pods resource in scheduling decisions by default
  3. Updates the executor to sanitize pod resources before submitting pods to k8s
  4. Configures the executor to sanitize the pods resource by default
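
For context, the limit at play is Kubernetes' per-node pod capacity, exposed as `node.Status.Allocatable["pods"]`. A minimal sketch of the kind of feasibility check the scheduler needs, illustrative only and not Armada's actual scheduler code (the function name is made up):

```go
import v1 "k8s.io/api/core/v1"

// nodeHasPodCapacity reports whether a node can accept one more pod given the
// number of pods already counted against it. CPU and memory headroom alone is
// not enough; the node's allocatable "pods" resource bounds how many pods can
// run concurrently.
func nodeHasPodCapacity(node *v1.Node, allocatedPods int64) bool {
	allocatable, ok := node.Status.Allocatable[v1.ResourcePods]
	if !ok {
		// No pod limit reported; assume the node has capacity.
		return true
	}
	return allocatedPods+1 <= allocatable.Value()
}
```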

Which issue(s) this PR fixes:

Fixes #4515

Special notes for your reviewer:

To roll out safely, the executors must be updated first. After that, the scheduler and server can be rolled out in any order.

Jason Parraga added 2 commits November 6, 2025 00:47
Signed-off-by: Jason Parraga <jparraga@stackav.com>
Signed-off-by: Jason Parraga <jparraga@stackav.com>
FailedPodChecks podchecks.FailedChecks
PendingPodChecks *podchecks.Checks
FatalPodSubmissionErrors []string
ResourcesToSanitize []string
Member

Can you please add docs for the new field?
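
A doc comment along these lines might cover it, as a sketch based on the PR description (the exact wording is the author's call):

```go
// ResourcesToSanitize lists resource names (for example "pods") that are
// removed from pod requests and limits before submission to Kubernetes.
// These resources may be used by the scheduler for placement decisions but
// are not valid container resources at runtime.
ResourcesToSanitize []string
```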

Member

In our docs we often link to config on https://pkg.go.dev; for the executor it will be https://pkg.go.dev/github.com/armadaproject/armada/internal/executor/configuration#ApplicationConfiguration. It is easier for end users if they can see the config field docs there.

}

// Sanitizes pod resources that may be used during scheduling but are invalid at runtime.
func (submitService *SubmitService) sanitizePodResources(pod *v1.Pod) {

@dejanzele (Member) Nov 7, 2025

nit: this can be simplified to a plain sanitizePodResources function; there is no need for it to be a receiver method on SubmitService. The same applies to sanitizeResourceList.
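
A rough sketch of what the plain-function form could look like, assuming the list of resources to sanitize is passed in explicitly (parameter names are illustrative, not the PR's actual signature):

```go
import v1 "k8s.io/api/core/v1"

// sanitizePodResources strips resource names that are only meaningful to the
// scheduler (e.g. "pods") from every container's requests and limits before
// the pod is submitted to Kubernetes.
func sanitizePodResources(pod *v1.Pod, resourcesToSanitize []string) {
	for i := range pod.Spec.InitContainers {
		sanitizeResourceList(pod.Spec.InitContainers[i].Resources.Requests, resourcesToSanitize)
		sanitizeResourceList(pod.Spec.InitContainers[i].Resources.Limits, resourcesToSanitize)
	}
	for i := range pod.Spec.Containers {
		sanitizeResourceList(pod.Spec.Containers[i].Resources.Requests, resourcesToSanitize)
		sanitizeResourceList(pod.Spec.Containers[i].Resources.Limits, resourcesToSanitize)
	}
}

// sanitizeResourceList deletes the named resources from a ResourceList in place.
// delete on a nil map is a no-op, so unset requests/limits are handled safely.
func sanitizeResourceList(resources v1.ResourceList, resourcesToSanitize []string) {
	for _, name := range resourcesToSanitize {
		delete(resources, v1.ResourceName(name))
	}
}
```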

@JamesMurkin (Contributor)

Hey, so generally a good direction, but there are a few things I think we'd like to consider / possibly change:

  1. Generally we're trying to move to a world where all mutation happens on the executor API, and the executor does fewer edits. This PR does not move us in that direction.
  2. We already basically have this functionality in the executor API.
  3. I think we should debate whether we add pods to more of the config:
  • indexedResources - I think add it here
  • dominantResourceFairnessResourcesToConsider - possibly add it here. It'd mean we fair-share on pods, which is probably correct in some cases (happy to leave this for now)
  4. Could we confirm that if you overwrite the config it does actually remove pods? (I can't remember if it merges or overwrites and I don't have time to test now.)
  5. I'm somewhat debating whether we should add it as defaultJobLimits, as it means the API and scheduler now need to be configured in unison. It could just be a scheduler concern (tbh all of this PR could be just a scheduler concern), but I think this is probably the most configurable option for now.

@dave-gantenbein (Member)

@dejanzele @nikola-jokic to review

@dejanzele (Member)

Closing in favour of #4841

@dejanzele closed this Apr 24, 2026
dejanzele added a commit that referenced this pull request Apr 27, 2026
…ity (#4841)

## Summary

- Adds `scheduling.respectNodePodLimits` feature flag (default `false`)
that enables the scheduler to track `pods` as a resource and reject
scheduling to nodes that have exhausted their pod limit
(`node.Status.Allocatable["pods"]`)
- When enabled, the scheduler programmatically registers `pods` in
`supportedResourceTypes` and `indexedResources` at startup, and injects
`pods: 1` into every job's internal resource requirements (see the sketch after this list)
- The executor now always reports non-Armada pod count in
`NonArmadaAllocatedResources` so the scheduler can subtract
system/DaemonSet pods from available capacity
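
As an illustration of the injection in the second bullet, a minimal sketch (not the merged implementation; the function name is made up):

```go
import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// addPodsRequirement ensures a job's resource requests account for one pod
// slot, so nodes whose allocatable "pods" count is exhausted are rejected
// even when they still have CPU and memory headroom.
func addPodsRequirement(requests v1.ResourceList) v1.ResourceList {
	if requests == nil {
		requests = v1.ResourceList{}
	}
	if _, ok := requests[v1.ResourcePods]; !ok {
		requests[v1.ResourcePods] = *resource.NewQuantity(1, resource.DecimalSI)
	}
	return requests
}
```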

Fixes #4515

This PR builds on top of #4517; big thanks to @Sovietaced for the initial work.

## Operator upgrade notes

- **Executor change is unconditional.** After the executor upgrade,
`NonArmadaAllocatedResources` gains a `pods` key in every report
regardless of whether any scheduler has the flag enabled. Dashboards,
metrics, or custom consumers that iterate this map generically (e.g. sum
over all keys) will start including pod counts. Audit Prometheus /
Grafana panels before rollout.
- **Rollback is clean.** Reverting the scheduler flag to `false` stops
the scheduler from tracking pods; reverting the executor binary removes
the `pods` key from its reports. Neither requires data migration.
- **Rolling upgrade order is flexible.** Old scheduler + new executor is
safe (scheduler's `FromNodeProto` silently drops unknown resources). New
scheduler + old executor briefly overestimates free pod capacity by the
count of non-Armada pods per node (~10-30 typically, DaemonSets + system
pods), since the old executor does not report them. The overestimate
resolves as executors are upgraded.

## Known limitations

- `pods` is **not** added to
`dominantResourceFairnessResourcesToConsider`. On dense-pod nodes (e.g.
GKE's 110-pod limit) a queue running many small pods can monopolize pod
slots without a fair-share penalty. Deferred per reviewer request;
follow-up if this becomes a problem in practice.

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>