Add datacenter_compute_allocation template (multi-reasoner) #68
Merged
…low-up to energy_grid_planning)
Inside-the-fence GPU allocation across the 5 hyperscaler campuses the
upstream energy_grid_planning $300M solve approves. Four reasoner stages
on a shared ontology:
- Stage 1 Predictive: heterogeneous-graph GNN forecasts per-lab training
intensity (cross-lab co_dated edges carry industry co-movement;
--no-gnn falls back to a precomputed CSV)
- Stage 2 Rules: hardware compatibility (memory + GPU-type allowlist)
+ priority-tier classification (P0/P1/P2 from contract_tier)
- Stage 3 Graph: reverse-PageRank on the WorkloadDependency.blocks DAG
- Stage 4 Prescriptive: assignment MIP indexed by a 3D Scenario sweep
(PowerEnvelopeLevel x MarginFloor x DiversityCap = 48 cells)
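As a rough illustration of how a three-axis sweep yields 48 cells, the sketch below enumerates the cross product in plain Python. Only the 3 x 4 x 4 shape and the baseline triple come from this description; every other level label is a placeholder, and the template's own Scenario Concepts are not reproduced here.

```python
from itertools import product

# Level labels are illustrative placeholders; only the 3 x 4 x 4 shape and
# the baseline triple (100pct / unconstrained / none) come from the PR text.
POWER_ENVELOPE_LEVELS = ["85pct", "100pct", "110pct"]
MARGIN_FLOORS = ["unconstrained", "floor_a", "floor_b", "floor_c"]
DIVERSITY_CAPS = ["none", "cap_a", "cap_b", "cap_c"]

# One scenario cell per (envelope, margin floor, diversity cap) combination.
cells = list(product(POWER_ENVELOPE_LEVELS, MARGIN_FLOORS, DIVERSITY_CAPS))
baseline = ("100pct", "unconstrained", "none")

print(len(cells))         # 48
print(baseline in cells)  # True
```

Indexing the MIP by cell keeps one decision variable set per combination, so the baseline can be read out of the same solve as the tighter cells.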
Each stage writes derived properties back to the ontology that the
next stage reads. After the solve, the chosen baseline cell
(100pct / unconstrained / none) is persisted as a singleton
AllocationPlan Concept plus an Assignment.is_chosen unary Relationship,
mirroring telco_network_recovery's RestorePlan / is_selected_upgrade
pattern.
Runs in both chain mode (binds to upstream Model("Energy Grid
Infrastructure") and reads x_approve outcomes) and standalone mode
(loads the bundled DC snapshot).
References:
- runbook.md: operational runbook (prereqs, two-mode run, GNN fallback,
troubleshooting)
- references/runbook.md: analyst paste-test runbook with per-stage
skill callouts (rai-build-starter-ontology, rai-querying,
rai-discovery, rai-predictive-modeling/-training, rai-rules-authoring,
rai-graph-analysis, rai-prescriptive-problem-formulation,
rai-prescriptive-results-interpretation, rai-ontology-design)
Match the structural conventions of telco_network_recovery and
energy_grid_planning:
- runbook.md is now the analyst paste-test walkthrough (chain ASCII +
10 Workflow prompts with per-stage rai-* skill callouts + Data), not
an operational doc. Same heading shape and Prompt/Response
fenced-block style as the peer multi-reasoner runbooks.
- references/ directory removed (no peer template has it).
- README.md gains the peer sections: Prerequisites (with Access /
Tools / Snowflake setup), Quickstart (numbered
download/venv/install/rai-init/run with expected output and runtime),
Template structure (file tree), How it works (code-snippet walkthrough
per stage), Troubleshooting (collapsible <details> blocks), Learn
more, Support.
- Operational content (prereqs, run modes, GNN-vs-baseline lift hint,
troubleshooting table) folded from the old runbook.md into the
appropriate README sections.
- Module docstring gains the Output: block alongside Run: (peer
convention from telco_network_recovery and energy_grid_planning).
No behavior change in the script.
…y_grid_planning style
Apply the dev-templates-review checklist against the upstream sibling
(energy_grid_planning) and align:
Runbook
- Shorten prompts from 100-250 words to ~30-70 words each, matching
energy's brevity.
- Convert imperative prompts to question form (Steps 5, 7, 8, 9).
- Describe the structural test in Step 7 ("workloads where, if they
slip, many later workloads slip too, and those gate even more")
instead of naming reverse-PageRank.
- Tighten Response paragraphs to 1-2 sentences with the headline numbers,
matching energy's punchier shape.
- Fold the persist step into Stage 4 in the chain ASCII so the headline
plan + persisted-ontology bullets appear together, matching energy's
pattern.
- Add a sequential-cascade preface: "Prompts below are designed to run
in order in a single session so each step inherits the ontology
state from the previous step."
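The structural test described for Step 7 above (workloads whose slip cascades to many later workloads) is what a PageRank over the reversed `blocks` edges measures. A minimal pure-Python sketch follows, assuming edges are `(blocker, blocked)` pairs; the function name, damping value, and workload names are hypothetical, and the template's own graph reasoner call is not shown.

```python
def reverse_pagerank(blocks, damping=0.85, iters=100):
    """PageRank on the reversed DAG: rank flows from blocked -> blocker.

    blocks: iterable of (blocker, blocked) edges (illustrative shape).
    """
    nodes = sorted({n for edge in blocks for n in edge})
    # Reversed adjacency: each blocked workload points at its blockers.
    out = {n: [] for n in nodes}
    for blocker, blocked in blocks:
        out[blocked].append(blocker)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Nodes with no reversed out-edges spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for u in out[v]:
                new[u] += damping * rank[v] / len(out[v])
        rank = new
    return rank


# Hypothetical mini-DAG: one pretrain gates two finetunes, one of which gates an eval.
edges = [
    ("pretrain_a", "finetune_b"),
    ("pretrain_a", "finetune_c"),
    ("finetune_b", "eval_d"),
]
scores = reverse_pagerank(edges)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

In this toy graph the pretrain ranks first because both direct and indirect dependents feed rank back to it.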
README
- "What this template is for" now names reasoning types in **bold**
("predictive ... rules ... graph ... prescriptive") per the
dev-templates-review checklist + energy convention.
- Replace the apologetic "Rules alone classify... Graph alone ranks..."
enumeration with the peer-style bolded reasoning-type paragraph.
- Drop the "GNN lift over a per-lab tabular baseline" framing in
How it works Stage 1; describe the cross-concept signal a
heterogeneous GNN propagates instead. ("Why X reasoner over Y" prose
is inside-baseball and belongs in evals, not READMEs.)
No script behavior change.
…ual lineage
The literal cross-process chain (binding to upstream
Model('Energy Grid Infrastructure') populated by energy_grid_planning)
hits a cross-process model-state visibility issue in PyRel 1.x and isn't
the demo we want anyway. Reframe energy_grid_planning as conceptual
lineage instead: the bundled data_centers.csv is a snapshot of that
upstream $300M-approved campus set, and this template demonstrates the
operator-side allocation decision that picks up after.
Script
- Drop --standalone and --investment-level CLI flags. Only --no-gnn
and --gnn-strict remain.
- Drop the chain-bind branch in main() (upstream Model() rebind,
InvestmentLevel filter, x_approve lookup, side-table dollars_per_mwh
attach, RuntimeError pre-flight).
- Drop the data_center_attrs.csv side table (dollars_per_mwh now lives
directly on data_centers.csv).
- main() now just loads the snapshot CSV into a fresh
Model('Datacenter Compute Allocation'). ~110 LOC simpler.
README
- Reframe lede as conceptual sequence: "This template demonstrates the
operator-side decision that picks up where energy_grid_planning
leaves off. ... The two templates form a conceptual sequence over the
same domain, not a literal engine-level chain."
- Drop the "0. Chain bind" row from Reasoner overview (4 stages now,
not 5).
- Quickstart now has 7 steps (was 9); single run command + --no-gnn.
- Drop "Cross-template ontology extension" key design pattern bullet
(no longer applicable).
- Drop data_center_attrs.csv from What's included + Template structure.
- Drop chain-mode troubleshooting entry.
- Customize section: rephrase DC editing in terms of the snapshot CSV
rather than chain-mode investment-level switching.
Runbook
- Lead-in: "campuses ... a snapshot of the campus set the upstream
energy_grid_planning $300M solve approved" rather than "the 5
hyperscaler campuses the upstream solve approved".
Distill the AI-compute capital-allocation framework into the operator
template's narrative + structural additions. Operator POV preserved
throughout; no name-checks of any specific source.
Narrative (README, items 1-5):
- Power envelope cells now framed explicitly as the cone of
uncertainty: 85% lower-cone curtailment / 100% expected / 110%
upper-cone planning point. Planning to the midpoint is how operators
end up under-provisioned to anchors when frontier demand outruns the
forecast.
- Gross-margin discipline reframed at the envelope level, not
per-token.
- Anchor concentration reframed as a strategic floor: must be served
wherever feasible because penalty is contractual + reputational, not
just foregone revenue.
- New bullet on the generational layer cake (H100/H200/GB200 sit at
different points on the price-per-effective-GPU-hour curve; effective
capacity != nameplate).
- Four-factor objective reframed as envelope-level ROI across anchor /
opportunistic / research uses, single capital-allocation lens.
Adaptation opportunities (README, items 9-10):
- Pool fungibility added to "What this template abstracts" as the
natural extension capturing the dedicated / swing / scratch
distinction (fungible pools are strategically more valuable than
single-purpose pools of equal nameplate).
- Per-lab range forecast (p10/p50/p90) added as the other natural
extension capturing lab-side uncertainty at the same fidelity as the
envelope axis.
Structural, item 7 (under-provisioning penalty):
- Stage 2 (rules) derives Workload.under_provisioning_penalty from
priority_tier: P0 = 1.0, P1 = 0.3, P2 = 0.0.
- Stage 4 (prescriptive) objective amplifies the assignment reward by
(1 + under_provisioning_penalty), so the solver treats anchor
under-provisioning as a 2x foregone-revenue loss instead of 1x.
- README Reasoner overview + How it works updated.
- Runbook Step 6 (rules) prompt + response updated.
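To make the item 7 amplifier concrete, here is a small sketch of the tier-to-penalty mapping and the `(1 + under_provisioning_penalty)` multiplier. The penalty values are the ones stated above; the function name and the base reward of 100.0 are hypothetical, and the real objective is built inside the prescriptive stage, not in plain Python like this.

```python
# Tier -> penalty mapping as stated for Stage 2 (rules).
UNDER_PROVISIONING_PENALTY = {"P0": 1.0, "P1": 0.3, "P2": 0.0}


def amplified_reward(base_reward: float, priority_tier: str) -> float:
    """Scale an assignment reward by (1 + under_provisioning_penalty)."""
    return base_reward * (1.0 + UNDER_PROVISIONING_PENALTY[priority_tier])


# Hypothetical base reward, identical across tiers to isolate the multiplier.
for tier in ("P0", "P1", "P2"):
    print(tier, round(amplified_reward(100.0, tier), 2))
```

With equal base rewards, a P0 workload weighs twice as much as a P2 workload, which is the 2x framing in the bullet above.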
Structural, item 8 (DemandScenario overlay):
- After main solve + AllocationPlan persist, replay the chosen plan
under four risk scenarios: expected (factor 1.0), diffusion_slowdown
(0.85), scaling_break (0.70), frontier_loss (0.50). P0 anchor revenue
is contractual; P1/P2 opportunistic seats realize only the scenario
factor.
- DemandScenario Concept (4 rows) + DemandScenarioOutlook(scenario)
Concept persisted as ontology so the stranded-capacity exposure
survives the chain run.
- README Reasoner overview + How it works + expected output sample
updated.
- Runbook Step 9 (results interpretation) combined with the overlay;
Step 10 (persist) extended to mention the new concepts.
Module docstring Output: block extended to cover the new outputs.
No behavior change on the baseline cell (100pct / unconstrained /
none) -- all 110 workloads still fit, so the under-provisioning
amplifier doesn't change which workloads are selected when all fit.
The amplifier matters at tighter cells where P0 vs P1/P2 trade-offs
become active.
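The overlay in item 8 is a post-solve multiplication, which the sketch below tries to capture: P0 value is kept whole and non-P0 value is scaled by the scenario factor. The four factors are taken from the text; the function name and the two input amounts are hypothetical and are not the template's actual figures.

```python
# Scenario factors as listed in the commit message.
SCENARIO_FACTORS = {
    "expected": 1.0,
    "diffusion_slowdown": 0.85,
    "scaling_break": 0.70,
    "frontier_loss": 0.50,
}


def demand_scenario_overlay(p0_value: float, non_p0_value: float) -> dict:
    """Return {scenario: (realized, stranded)} for a fixed chosen plan.

    P0 value is treated as contractual (always realized); P1/P2 value
    realizes only the scenario factor. Inputs are illustrative.
    """
    outlook = {}
    for name, factor in SCENARIO_FACTORS.items():
        realized = p0_value + factor * non_p0_value
        stranded = (1.0 - factor) * non_p0_value
        outlook[name] = (realized, stranded)
    return outlook


# Hypothetical split of the chosen plan's value between P0 and non-P0 seats.
result = demand_scenario_overlay(p0_value=24_000_000.0, non_p0_value=1_000_000.0)
for name, (realized, stranded) in result.items():
    print(f"{name}: realized={realized:,.0f} stranded={stranded:,.0f}")
```

Because the plan is held fixed, the overlay changes only when the P0 / non-P0 split of the chosen cell changes.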
After running datacenter_compute_allocation.py end-to-end with the new
objective (under-provisioning amplification + DemandScenario overlay),
update README + runbook where actual outputs drift from prior text.
- 85pct margin / none diversity cell now fits 20 workloads (not 18):
the (1 + under_provisioning_penalty) amplifier on P0 nudges the
solver into selecting 2 additional P2 evals that fit under the 85%
floor without violating it. Update both:
- README Quickstart expected-output table: 85pct row 18 -> 20,
revenue 22,032,899.59 -> 22,032,951.05, cost 3,304,895.87 ->
3,304,939.84
- README "Expected per-cell behavior" bullet: drops 89 -> 90
workloads, mention the retained P1 finetunes + P2 evals
- Runbook Step 9 Response: 89 workloads dropped -> 90, add the
14 P0 + 4 P1 + 2 P2 breakdown
- Baseline cell total_cost shifts $4,190,130.34 -> $4,187,977.91
(~$2K solver-noise within TIME_LIMIT). Update README expected output
to match the actual run; "$4.19M" rounding holds elsewhere.
- DemandScenario overlay numbers (~$200K / $400K / $667K stranded)
match real run to within cents.
No script change; this is doc-to-real reconciliation only.
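The 85% margin-floor figures above can be sanity-checked with a few lines of arithmetic. The sketch assumes realized margin is `(revenue - cost) / revenue`; that definition is not spelled out in this conversation, so treat it as an assumption. The revenue and cost pairs are the before and after values quoted above.

```python
def realized_margin(revenue: float, cost: float) -> float:
    """Assumed definition: share of revenue left after cost."""
    return (revenue - cost) / revenue


MARGIN_FLOOR = 0.85

# (revenue, cost) pairs for the 85pct margin / none diversity cell.
cells = {
    "18 workloads (prior text)": (22_032_899.59, 3_304_895.87),
    "20 workloads (actual run)": (22_032_951.05, 3_304_939.84),
}

margins = {label: realized_margin(rev, cost) for label, (rev, cost) in cells.items()}
for label, margin in margins.items():
    print(f"{label}: margin={margin:.6f} meets floor={margin >= MARGIN_FLOOR}")
```

Under that assumed definition both solutions sit just above the floor, which fits the description of extra evals fitting under the 85% floor without violating it.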
Real paste-test of the runbook Step 2 prompt against the live ontology
shows 11 concepts (not 12 as previously stated), with WorkloadGpuCompat
missing from the prior enumeration: 5 DataCenterRequest, 28 GpuPool,
6 AILab, 110 Workload, 181 WorkloadGpuCompat, 138 WorkloadDependency,
2,190 LabMetric (date range 2025-05-11 .. 2026-05-10), 6 LabGrowth,
plus 3/4/4 Scenario Concepts.
Also add the explicit date range to the Response since the Step 2
prompt asks for it.
GNN's strongest task types are node classification and link prediction,
not time-series forecasting. The prior per-LabMetric regression on
training_intensity_growth_rate was effectively a tabular forecast
dressed up with cross-concept edges -- the cross-lab co_dated edges
gave it lift, but the *task* was the wrong fit.
Reframe Stage 1 as the operator's truly load-bearing forward-looking
signal: per-workload binary classification of utilization probability
(will this workload actually use its allocated capacity at high duty
cycle, or stall / be repaced?). Stranded capacity -- depreciation
accruing without offsetting revenue -- is the operator's biggest
economic exposure, and a per-workload signal is sharper than a per-lab
demand multiplier the contracts already lock in.
The heterogeneous message-passing is genuinely load-bearing now:
- LabMetric -> Workload: a workload owned by a fast-ramping lab
inherits signal from lab-side recent activity
- WorkloadDependency.blocks: a workload downstream of a high-utilization
gating pretrain inherits signal through the dep chain
- cross-lab co_dated (LabMetric <-> LabMetric): industry-wide
co-movement signal a per-workload tabular model can't see
Data changes:
- DROP data/train_metrics.csv, val_metrics.csv, test_metrics.csv,
lab_growth_forecasts.csv (per-lab forecast paradigm).
- ADD data/workload_utilization_train.csv (80 workloads + label),
_val.csv (15 workloads + label), _test.csv (110 workloads, no label
-- ALL workloads get a prediction).
- ADD data/workload_utilization_fallback.csv -- the deterministic
latent probability used to generate the synthetic labels; this is
what --no-gnn loads.
Labels generated by a deterministic synthetic process (seed=42) from:
- lab recent training-intensity (lab_growth_forecasts seed values)
- workload type (pretrain bonus, eval penalty)
- dep-DAG gating position (workloads gating many downstream get bonus)
- small gaussian noise
Then thresholded to binary at 0.5.
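A minimal sketch of a label generator with the shape described above: a seeded latent score built from lab intensity, workload type, gating position, and gaussian noise, then thresholded at 0.5. All weights, field names, and the `>=` threshold direction are assumptions for illustration; the actual generator script is not part of this conversation.

```python
import random

# Hypothetical per-type adjustments (pretrain bonus, eval penalty).
TYPE_ADJUST = {"pretrain": 0.15, "finetune": 0.0, "eval": -0.15}


def synth_labels(workloads, seed=42, noise_sd=0.05):
    """Return {workload_id: (latent_probability, binary_label)}."""
    rng = random.Random(seed)  # fixed seed keeps the labels reproducible
    labels = {}
    for w in workloads:
        latent = (
            0.5 * w["lab_intensity"]      # lab recent training intensity
            + TYPE_ADJUST[w["type"]]      # workload-type bonus or penalty
            + 0.02 * w["n_gated"]         # bonus for gating many downstream workloads
            + rng.gauss(0.0, noise_sd)    # small gaussian noise
        )
        latent = min(1.0, max(0.0, latent))
        labels[w["id"]] = (latent, int(latent >= 0.5))
    return labels


# Illustrative rows only; these are not rows from the bundled CSVs.
demo = [
    {"id": "w1", "type": "pretrain", "lab_intensity": 0.9, "n_gated": 6},
    {"id": "w2", "type": "finetune", "lab_intensity": 0.6, "n_gated": 1},
    {"id": "w3", "type": "eval", "lab_intensity": 0.4, "n_gated": 0},
]
print(synth_labels(demo))
```

Keeping the pre-threshold latent value alongside the label mirrors the fallback CSV idea: the latent probability can stand in for a model prediction when the GNN is skipped.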
Script changes:
- stage1_predictive() rewritten as per-Workload binary_classification:
- Drop LabGrowth concept entirely. Workload.utilization_probability
is the direct output, bound from gnn.predictions().probs[1].
- Task tables join Workload to TrainTable/ValTable/TestTable by
workload_id (single-PK join, GNN-friendly).
- GNN(task_type="binary_classification", eval_metric="roc_auc").
- stage4_prescriptive() objective replaces projected_demand_growth
with utilization_probability. Same structure: the same four-factor
product, multiplied by (1 + under_provisioning_penalty).
- Module docstring updated.
Doc changes:
- README front matter description, "What this template is for"
reasoning-types paragraph, "What you'll build", "What's included",
Reasoner overview Stage 1 row, "How it works" Stage 1 code snippet
+ narrative, Customize bullets (drop tabular-baseline RMSE
comparison, replace with ROC-AUC comparison), Troubleshooting
collapsible, "What this template abstracts" point-estimate item
reframed for per-workload range forecast.
- Runbook chain ASCII Stage 1, Step 2 Response (drop LabGrowth from
the 10-concept enumeration), Step 3 Discovery routing description,
Step 5 prompt + Response entirely rewritten.
- Quickstart Expected output Stage 1 sample updated to show
utilization-probability top/bottom 5 instead of per-lab multipliers.
…ions
The prior per-workload single-shot labels (80 training examples) gave
the GNN too little signal to discriminate -- with 68/42 class imbalance
and only 80 examples, the model converged to the positive-class prior
and compressed all predictions to 0.78-0.91.
Reframe as the realistic operator data shape: each workload is observed
monthly. The historical labels CSV now carries 9 months × 110 workloads
of (workload, observation_date, is_high_utilization) tuples, generated
deterministically (seed=42) from:
- structural propensity per workload (lab growth, type bonus, dep
gating position)
- per-month lab activity perturbation (mean training_intensity_growth_rate
from LabMetric for that month)
- cross-lab macro shock per month (cross-lab mean)
- period-specific gaussian noise
Splits:
Train: 7 historical months × 110 workloads = 770 observations
Val: 1 month × 110 = 110
Test: current month × 110 = 110 (no labels; predict for all workloads)
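The split arithmetic above can be reproduced with a chronological partition, sketched below in plain Python. The month dates, workload ids, and function name are hypothetical; only the 7 / 1 / current-month layout and the 110-workload count come from the text.

```python
from datetime import date


def split_by_month(observations, n_train=7, n_val=1):
    """Chronological split of (workload_id, obs_date, label) tuples.

    The earliest n_train months go to train, the next n_val to val, and
    the remaining month(s) to test, where the label is dropped.
    """
    months = sorted({obs_date for _, obs_date, _ in observations})
    train_months = set(months[:n_train])
    val_months = set(months[n_train:n_train + n_val])

    train, val, test = [], [], []
    for workload_id, obs_date, label in observations:
        if obs_date in train_months:
            train.append((workload_id, obs_date, label))
        elif obs_date in val_months:
            val.append((workload_id, obs_date, label))
        else:
            test.append((workload_id, obs_date))  # 2-arity: no label
    return train, val, test


# Illustrative data: 9 monthly observation dates x 110 workloads.
months = [date(2025, m, 1) for m in range(4, 13)]
observations = [
    (f"wl_{i:03d}", month, i % 2) for month in months for i in range(110)
]
train, val, test = split_by_month(observations)
print(len(train), len(val), len(test))  # 770 110 110
```

Splitting on dates instead of shuffling rows keeps every training observation strictly earlier than the validation and test months.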
Script changes:
- _train_gnn_and_predict: task relationships now use `at {Date:obs_date}`
time slots. Train + Val are 3-arity (workload, obs_date, label);
Test is 2-arity (workload, obs_date).
- LabMetric.metric_date typed as Date (was String). PropertyTransformer
has datetime=[LabMetric.metric_date], time_col=[LabMetric.metric_date]
per the rai-predictive-training triple-coupling rule for
has_time_column=True.
- GNN(has_time_column=True, ...). Same edges, same task type.
- Import Date from relationalai.semantics.
Doc changes:
- README "What's included" + "Template structure" updated for the new
data shape (770/110/110 instead of 80/15/110).
- README Reasoner overview Stage 1 reframed: "Heterogeneous-graph
temporal GNN" with explicit mention of has_time_column=True and the
per-month observation count.
- README "How it works" Stage 1 code snippet rewritten to show the
temporal task relationships with at clause + the GNN constructor
with has_time_column=True.
- Runbook chain ASCII Stage 1 updated to mention the 770 historical
obs + 110 val structure.
- Runbook Step 5 prompt + Response reframed for temporal training data.
Same chain shape downstream; same Stage 4 objective. The expected
output of Stage 1 should now show better-discriminated per-workload
probabilities (frontier 0.85+, Stability 0.20-0.35) once the GNN has
real training variety to learn from.
Real chain run with temporal GNN + Date-typed metric_date /
observation_date:
Stage 1 GNN now genuinely discriminates per-workload:
n_total=110, n>=0.5: 99, n<0.5: 11
Top 5 (frontier pretrains): p ≈ 0.880-0.881 (Claude/Grok/GPT-Next
shards)
Bottom 5 (Stability evals): p ≈ 0.338-0.342
(was compressed at 0.78-0.91 in v3 with 80-example single-shot labels;
the 770 historical (workload, month) observations + temporal alignment
give the GNN real signal to learn from.)
Stage 4 baseline cell (100pct / unconstrained / none) shifts slightly:
total_cost_usd: $4,187,977.91 -> $4,204,035.68
realized_margin: 0.834322 -> 0.833687
(revenue, n_assigned, anchor_share, binding_axis all unchanged.)
Stage 4 85% margin / none cell:
n_assigned: 20 -> 18
revenue: $22,032,951.05 -> $22,032,899.59
cost: $3,304,939.84 -> $3,304,895.87
breakdown: 14 P0 + 4 P1 + 2 P2 -> 14 P0 + 4 P1 + 0 P2
The 85%-floor cell change has a meaningful narrative reason: P2 evals
now have utilization_probability ~0.34 (vs P0 pretrains at ~0.88), so
the four-factor objective devalues them ~2.6x relative to P0. They no
longer "sneak in" to fill the 85%-floor cell -- the cell goes back to
the 18-workload anchor-only shape.
DemandScenario overlay numbers unchanged to within cents (P0 vs non-P0
strategic-value split is the same; the overlay is purely a post-solve
multiplication).
Doc changes:
- README Quickstart Expected-output table updated: Stage 1
distribution (n>=0.5: 99, n<0.5: 11) + top-5 / bottom-5 lists;
baseline cell row (cost 4,204,035.68); 85pct cell row (n=18, revenue
22,032,899.59, cost 3,304,895.87); AllocationPlan singleton row.
- README "Expected per-cell behavior" 85% bullet: "drops 90 -> drops
92", remove "+ 2 P2 evals" mention, add the
under-provisioning-amplifier + GNN-utilization rationale.
- Runbook Step 9 Response: baseline cost $4.19M -> $4.20M; 85% margin
cliff "90 workloads dropped -> 92", same breakdown rationale.
- Runbook Step 5 Response + chain-ASCII Stage 1 annotation: Stability
evals "0.20-0.35" -> "~0.34" (matches actual GNN output range).
…PTIMAL)
HiGHS at 900s consistently hits TIME_LIMIT on this 48-cell sweep --
acceptable for a tutorial but a poor demo experience (15 min wall
time before the script prints Stage 4 results).
Switch to Gurobi as the recommended solver; it typically converges
to OPTIMAL across all feasible cells well under 60s. HiGHS remains
documented as a fallback for customers without a Gurobi-licensed
prescriptive engine (raise time_limit_sec to ~900 and accept the
feasible-but-not-proven-optimal solution).
Changes:
- problem.solve("gurobi", time_limit_sec=60)
- Stage 4 print "Solving 48-cell scenario sweep with Gurobi..."
- README Reasoner overview row: MIP (Gurobi)
- README Prerequisites: Gurobi-enabled prescriptive engine
(recommended); HiGHS works as a fallback
- README Quickstart Expected output: Termination OPTIMAL (was TIME_LIMIT)
- README Customize bullet: swap to "highs" for non-Gurobi prescriptive
engines, raise time_limit_sec to ~900
- Runbook Step 8 Response: Gurobi typically OPTIMAL under 60s
- Troubleshooting `TIME_LIMIT` collapsible reframed: only happens if
Gurobi can't reach OPTIMAL on a cell in 60s, or if you've swapped
to HiGHS
Real run with Gurobi:
- Termination: OPTIMAL (was TIME_LIMIT)
- Solve time: 9.4s (was 900s)
- Objective: $1,301,756,328.76 (was $1,255,809,068.45 -- better
solution because Gurobi reached OPTIMAL where HiGHS hit the wall)
Per-cell shifts driven by the better solution:
- Baseline total_cost: $4,204,035.68 -> $4,197,891.06 (~$4.20M, still)
- realized_margin: 0.833687 -> 0.833930
- 85pct/none cell: n_assigned 18 -> 20 (Gurobi finds the v2-era
20-workload solution by squeezing 2 P2 evals in under the floor);
revenue $22,032,899.59 -> $22,032,951.05; cost $3,304,895.87 ->
$3,304,939.27; breakdown back to 14 P0 + 4 P1 + 2 P2.
- "drops 92" -> "drops 90" in narrative.
- Diversity frontier 70% cap revenue: $4,171,456 -> $4,437,625 (Gurobi
finds a better assignment within the 70% anchor cap).
DemandScenario overlay: unchanged (computation depends only on the
chosen-cell P0/non-P0 strategic-value split, which is unchanged).
Doc updates:
- README Quickstart expected output: 85pct row, baseline cost,
AllocationPlan singleton row.
- README Expected runtime: Gurobi ~10s solve; total wall time ~5 min
end-to-end with Gurobi vs ~17 min with HiGHS.
- README "Expected per-cell behavior" 85% bullet.
- Runbook Step 8 Response: ~10s solve detail.
- Runbook Step 9 Response: 90 dropped, 14 P0 + 4 P1 + 2 P2.
…ng peer templates
Drop gurobi/HiGHS mentions from README and runbook narrative. Stage 4
script call switches to problem.solve("highs", time_limit_sec=120),
matching energy_grid_planning and telco_network_recovery conventions
(both peers use highs+120s). README Prerequisites now just says
"prescriptive engine for Stage 4"; Customize section has a single
solver-tuning bullet that's solver-agnostic.
TIME_LIMIT remains the expected termination on the default config;
already-documented as 'signal, not failure' per
rai-prescriptive-results-interpretation. No behavior change on the
baseline cell (all 110 still fit); tight cells may have slightly
different feasible solutions across runs but the documented patterns
hold.
…vention until GNN GAs)
Matches telco_network_recovery, subscriber_retention, and
demand_forecasting: all GNN-using templates carry private: true until
the GNN reasoner is generally available.
…enario Analysis) to match peer convention
…ocs to live Gurobi run
- Flatten script to module scope under `# ---- Stage N ----` banners
(no `def main`, no `_Ctx`, helpers reduced to `load_csv` +
`_train_gnn_and_predict`)
- Trim scenario sweep: 2 × 3 × 4 = 24 cells (drop the 110pct envelope
and the 75% margin floor; anchor_max_40pct_with_type_floor is retained
as the designed-INFEASIBLE example)
- Top-level `SOLVER` + `SOLVER_TIME_LIMIT_SEC` constants; default
Gurobi, HiGHS option
- Stage 4 status labelling: distinguish INFEASIBLE (proven) from
UNSOLVED (timeout)
- Predictive: `device="cuda"`, matches the configured `GPU_NV_S` engine
- pyproject.toml: pin `relationalai==1.4.2`
- Reconcile README expected-output + per-cell behavior + framing to
the actual run: 16 OPTIMAL / 8 INFEASIBLE, baseline
110/$25.28M/83%/95%, OPTIMAL in 6.3s
- Tighten runbook Response blocks (mean 44 words, in range with
canonical runbooks)
- Refresh `data/workload_utilization_fallback.csv` from a real GNN run
so the `--no-gnn` path matches GNN-shape probabilities (~0.78 top /
~0.45 bottom)
- Clean up apologetics in README, stale function references after
refactor, and the phantom `data_center_attrs.csv` listing
- README front matter description: "3D-scenario MIP" → "24-cell
scenario MIP"
- README Template structure tree: power_envelope 3→2 rows,
margin_floors 4→3 rows, workload_utilization_fallback annotated as
"GNN-shape" not "deterministic"
- Runbook Data footer: "3 / 4 / 4 scenario rows" → "2 / 3 / 4"
somacdivad
approved these changes
May 18, 2026
What this template does
Inside-the-fence GPU allocation across the 5 hyperscaler campuses an upstream interconnection-planning step (see energy_grid_planning) has approved and energized. A datacenter operator decides which workloads from 6 AI labs (frontier-anchor, applied, research) get which GPUs in which pool this period, accountable on three publicly-discussed dimensions: substation power envelope, gross-margin after energy + depreciation, and anchor-concentration risk. The compounding wrinkle: a GPU-hour reserved by a workload that stalls is depreciation accruing without offsetting revenue, so stranded capacity is the operator's biggest economic exposure.
Reasoner chain (4 stages on a shared ontology)
- Stage 1 (Predictive): … `LabMetric→Workload`, `WorkloadDependency.blocks`, and cross-lab `co_dated` edges. Writes `Workload.utilization_probability` (110 rows). Frontier pretrain shards top the distribution (~0.78), Stability evals bottom (~0.38).
- Stage 2 (Rules): … `Compatibility(workload, gpu_pool)` (1,918 pairs), `Workload.priority_tier` (P0=15 / P1=80 / P2=15), `.priority_weight` (100/10/1), `.under_provisioning_penalty` (1.0/0.3/0.0).
- Stage 3 (Graph): … `Workload.gating_score`; GPT-Next pretrain shard 02 tops at 0.0310.
- Stage 4 (Prescriptive): … `priority × gating × utilization × strategic_value × (1 + penalty)`. Persists `AllocationPlan` singleton, `Assignment.is_chosen` (110 rows), `DemandScenario` + `DemandScenarioOutlook` (4 risk scenarios), all queryable as ontology.
Headline output (live Gurobi run, 4 min 21 s end-to-end)
- Baseline cell (`100pct / unconstrained / none`): 110 workloads assigned · $25.28M revenue · $4.18M cost · 83% margin · 95% anchor · binding axis = power envelope.
Verified end-to-end
- … `(workload, pool)` cost varies within multi-optimal MIP tolerance; GNN run-to-run probabilities vary within seed/CUDA non-determinism.
- … `# ---- Stage N ----` banners; no `def main`, no `_Ctx`).
- … `energy_grid_planning` and `telco_network_recovery`).
- … `dev-templates-review` skill updated with a new bullet covering multi-reasoner script structure and one covering Runbook Response density.
Test plan
- … `GPU_NV_S` + Gurobi prescriptive engine, 4 min 21 s wall)
- … `--no-gnn` fallback CSV refreshed from a real GNN run (no more GNN-vs-fallback drift)