Generic resource manager by AbrarQuazi · Pull Request #1728 · sifiveinc/wake

Abrar Quazi (AbrarQuazi) · 2026-01-13T16:13:04Z

Resource management system to control how many resources jobs are allowed to utilize. Add resource limits in .wakeroot like so:

{
  "resource_limits": {
    "test_resource": 4
  }
}

And configure how many resources a job will use in the plan: requireResources "test_resource" 1

Abrar Quazi (AbrarQuazi) · 2026-01-13T16:13:56Z

    export Resources: List String
+    # Resource requirements that must be acquired before the job runs.
+    # These are consumable units managed by the Wake runtime (lisenses, etc).
+    export ResourceReqs: List ResourceRequirement


Will need a better name for this to differentiate between Resources above

I know I've said this a couple times, but are you finding a benefit to putting it here rather than in Usage? If so, I'll stop beating that drum. If not, that one's already used for limiting job launching, and would allow runners to report arbitrary usage metrics (for example, I could see a heuristic-based runner reporting "number of iterations to find an allowed parameter set").

Also, you should probably have Map String a rather than List ThingContainingStringKey to avoid having to deal with some resource getting listed multiple times with different values.

Abrar Quazi (AbrarQuazi) · 2026-01-13T16:14:39Z

+        !jobtable->imp->resource_manager.can_acquire(task.resources)) {
+
+      // TODO move this check potentially in prim_job_launch, have it fail the job
+      // instead of exiting the process.


Reminder for this todo

The way we do this for memory/CPU predictions that exceed the system limits is throttle everything down so the overlarge job is the only one that's running, but we do still allow it to run -- if the prediction was accurate and was a hard limit, we let the failure happen naturally in the course of the job, rather than trying to be preemptive about it and risk killing something that could have dynamically scaled down. These custom limits are admittedly a bit more likely to be hard limits than the memory/CPU, but for ease of mental modelling I think we should keep the same behaviour.

Abrar Quazi (AbrarQuazi) · 2026-01-13T16:15:20Z

 };

+// Policy for resource limits configuration
+// Configured in .wakeroot as:


Maybe it might be better to have this in its own file, outside of .wakeroot?

For better or worse, .wakeroot has become the workspace configuration file. I really don't think we need to introduce yet another one.

Sam May (ag-eitilt) · 2026-01-27T17:53:42Z

    export Resources: List String
+    # Resource requirements that must be acquired before the job runs.
+    # These are consumable units managed by the Wake runtime (lisenses, etc).
+    export ResourceReqs: List ResourceRequirement


I know I've said this a couple times, but are you finding a benefit to putting it here rather than in Usage? If so, I'll stop beating that drum. If not, that one's already used for limiting job launching, and would allow runners to report arbitrary usage metrics (for example, I could see a heuristic-based runner reporting "number of iterations to find an allowed parameter set").

Also, you should probably have Map String a rather than List ThingContainingStringKey to avoid having to deal with some resource getting listed multiple times with different values.

Sam May (ag-eitilt) · 2026-01-27T17:58:55Z

+def serializeResourcesToJson (resources: List ResourceRequirement): String =
+    def serializeOne (ResourceRequirement name count) =
+        "\{\"name\": \"{name}\", \"count\": {str count}\}"


Probably better returning a JValue that gets piped into formatJSON at point of use; we'd be able to use this same serializer with prettyJSON for outputting to human-readable files.

EDIT: Ah, you already have resourceToJson -- just use that for both cases.

Sam May (ag-eitilt) · 2026-01-27T18:03:12Z

 };

+// Policy for resource limits configuration
+// Configured in .wakeroot as:


For better or worse, .wakeroot has become the workspace configuration file. I really don't think we need to introduce yet another one.

Sam May (ag-eitilt) · 2026-01-27T18:10:39Z

+        !jobtable->imp->resource_manager.can_acquire(task.resources)) {
+
+      // TODO move this check potentially in prim_job_launch, have it fail the job
+      // instead of exiting the process.


The way we do this for memory/CPU predictions that exceed the system limits is throttle everything down so the overlarge job is the only one that's running, but we do still allow it to run -- if the prediction was accurate and was a hard limit, we let the failure happen naturally in the course of the job, rather than trying to be preemptive about it and risk killing something that could have dynamically scaled down. These custom limits are admittedly a bit more likely to be hard limits than the memory/CPU, but for ease of mental modelling I think we should keep the same behaviour.

Sam May (ag-eitilt) · 2026-01-27T18:14:10Z

+        for (const auto& req : task.resources) {
+          int64_t limit = jobtable->imp->resource_manager.limit(req.name);
+          if (limit >= 0 && req.count > limit) {


I might be misreading that function, but why aren't we using can_acquire here?

Sam May (ag-eitilt) · 2026-01-27T18:21:42Z

+  return true;
+}
+
+void ResourceManager::acquire(const std::vector<ResourceRequirement>& requirements) {


I know we're still single-threaded, but if we do wind up allowing multiple threads over the same pool (more likely in the form of multi-wake communication rather than as multithreading a single instance's job manager) then having can_acquire and acquire be separate opens us up to racing more than something atomic would. Not sure how folding it into a single function would wind up looking for the rest of the code, and it's likely nothing we'd have to worry about in the near future, but might be something to look into.

Generic resource manager

06455fc

Abrar Quazi (AbrarQuazi) requested a review from Nathan Graybeal (ngraybeal) January 13, 2026 16:13

Abrar Quazi (AbrarQuazi) commented Jan 13, 2026

View reviewed changes

Sam May (ag-eitilt) reviewed Jan 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generic resource manager#1728

Generic resource manager#1728
Abrar Quazi (AbrarQuazi) wants to merge 1 commit into
masterfrom
generic-resource-manager

Abrar Quazi (AbrarQuazi) commented Jan 13, 2026

Uh oh!

Abrar Quazi (AbrarQuazi) Jan 13, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026

Uh oh!

Abrar Quazi (AbrarQuazi) Jan 13, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026

Uh oh!

Abrar Quazi (AbrarQuazi) Jan 13, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Abrar Quazi (AbrarQuazi) commented Jan 13, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Sam May (ag-eitilt) Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Sam May (ag-eitilt) Jan 27, 2026 •

edited

Loading