<!DOCTYPE html>
<html lang="en-US">
<head>
<meta name="description" content="A practitioner's editorial on cgroup tiering, Effective Constrained Clock, and why symmetric CPU tiers change the economics of virtualization density.">
<meta property="og:type" content="article">
<meta property="og:title" content="Deterministic Density: Rethinking CPU Architecture with Cgroup Tiering">
<meta property="og:description" content="Symmetric cgroup tiering turns stranded production idle time into useful throughput. Here's the math on EPYC 9655 vs 9575F.">
<meta property="og:url" content="https://gprocunier.github.io/deterministic-density/">
<meta name="author" content="Greg Procunier">
<meta property="og:image" content="https://gprocunier.github.io/deterministic-density/og-image.png">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Deterministic Density: Rethinking CPU Architecture with Cgroup Tiering">
<meta name="twitter:description" content="Symmetric cgroup tiering turns stranded production idle time into useful throughput. Here's the math on EPYC 9655 vs 9575F.">
<meta name="twitter:image" content="https://gprocunier.github.io/deterministic-density/og-image.png">
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Deterministic Density</title>
<meta name="google-site-verification" content="-cAcLaA0l0O_JyCuMrNDwKoISaFm8JtOsfjnvXLGgA4">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Red+Hat+Display:wght@500;700&family=Red+Hat+Mono:wght@400;500&family=Red+Hat+Text:wght@400;500;700&display=swap" rel="stylesheet">
<link rel="stylesheet" href="assets/site.css">
<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
mermaid.initialize({
startOnLoad: true,
theme: 'base',
securityLevel: 'loose',
flowchart: {
useMaxWidth: true,
htmlLabels: true,
nodeSpacing: 30,
rankSpacing: 42
},
themeVariables: {
fontFamily: '"Red Hat Text", "Helvetica Neue", Arial, sans-serif',
fontSize: '18px',
primaryColor: '#fff4e5',
primaryBorderColor: '#e0e0e0',
primaryTextColor: '#151515',
lineColor: '#8a8d90',
clusterBkg: '#ffffff',
clusterBorder: '#c7c7c7'
}
});
</script>
</head>
<body>
<div class="site-shell">
<header class="site-header">
<div class="site-header__inner">
<p class="eyebrow">Oversubscription</p>
<div class="site-brand">
<div>
<h1 class="site-brand__title"><a href="index.html">Deterministic Density</a></h1>
<p class="site-brand__tagline">A practitioner's editorial on cgroup tiering, Effective Constrained Clock, and why symmetric CPU tiers change the economics of virtualization density.</p>
</div>
</div>
<div class="site-header__actions">
<a href="https://github.com/gprocunier/deterministic-density"><kbd>READ ON GITHUB</kbd></a>
<a href="https://github.com/gprocunier/openstack-cgroup-tiering"><kbd>CGROUP THESIS</kbd></a>
<a href="https://gprocunier.github.io/calabi/host-resource-management.html"><kbd>CALABI PROJECT</kbd></a>
</div>
</div>
</header>
<main class="page-shell">
<div class="content-column">
<article class="markdown-body">
<p><a id="overview"></a></p>
<h1
id="deterministic-virtualization-rethinking-cpu-architecture-with-cgroup-tiering">Deterministic
Virtualization: Rethinking CPU Architecture with Cgroup Tiering</h1>
<p>When I think about how we used to size virtualization hosts, the
pattern was simple: buy the fastest, biggest CPU the budget could
tolerate, drop everything into one pool, and hope the noisy neighbors
stayed quiet.</p>
<p>My cgroup-tiering idea, first described in my <a
href="https://github.com/gprocunier/openstack-cgroup-tiering">2025
OpenStack cgroup-tiering thesis</a> and later adapted for single-host
KVM/OpenShift in my <a
href="https://gprocunier.github.io/calabi/host-resource-management.html">Calabi
project</a>, changes that starting point. I read it as a symmetric tier
model, where the useful capacity comes from equal Gold, Silver, and
Bronze slices instead of from squeezing every last vCPU into a flat
pool.</p>
<p>The chip comparison at the heart of this essay needs a little
context from the tiering model first, which the next sections provide.
When I compare processors like the AMD EPYC 9655 and the EPYC 9575F
under this model, I care less about raw throughput and more about how
large a symmetric guest pool each chip can support while keeping the
Gold/Silver/Bronze balance intact. I am not affiliated with AMD. I
singled out these two parts because, to my eye, they represent the
current state of dense, high-clock-frequency server silicon in 2026,
which makes them useful reference points for this kind of
tiered-capacity argument.</p>
<hr />
<p><a id="density-paradigm"></a></p>
<h2
id="the-density-paradigm-traditional-virtualization-vs-guardrail-tiering">The
Density Paradigm: Traditional Virtualization vs. Guardrail Tiering</h2>
<p>To explain why this matters, I start with the old habit of treating a
host as a single pile of CPU time, then compare it with the tiered
model.</p>
<p><a id="traditional-approach"></a></p>
<h3 id="the-traditional-approach-flat-pools-and-hardware-silos">The
Traditional Approach: Flat Pools and Hardware Silos</h3>
<p>In a standard virtualization host, the "noisy neighbor" problem is a
constant, unmanaged threat. The hypervisor's default scheduler treats
all guest threads relatively equally. If a developer runs an unoptimized
script that pegs the CPU at 100%, the hypervisor will happily steal
execution time from a mission-critical production database to serve the
developer's script.</p>
<p>Because running mixed environments (Production and Non-Production) on
a flat host is a bad default, traditional best practices tend to favor
defensive design. I treat <code>1.5:1</code> as a conservative midpoint
between published guidance from <a
href="https://blogs.vmware.com/cloud-foundation/2025/06/04/vcpu-to-pcpu-ratio-guidelines/">VMware
VCF</a>, <a
href="https://learn.microsoft.com/en-us/biztalk/technical-guides/checklist-optimizing-performance-on-hyper-v">Microsoft
Hyper-V</a>, and <a
href="https://www.nutanix.com/tech-center/blog/understanding-cpu-resource-management-in-nutanix-ahv">Nutanix
AHV</a>: VMware says there is no single right ratio and that
<code>1:1</code> is the safe starting point when quick response matters;
Microsoft says <code>1:1</code> is best for CPU-intensive workloads; and
Nutanix says latency-sensitive workloads should stay at no
oversubscription, or up to <code>2x</code>, while non-critical workloads
may go higher.</p>
<ol type="1">
<li><strong>Physical Segregation:</strong> Organizations build entirely
separate hardware silos (e.g., a "Prod Cluster" and a "Dev
Cluster").</li>
<li><strong>Defensive Sizing:</strong> Production workloads are rarely
oversubscribed. To guarantee latency, Prod VMs are often provisioned at
a 1:1 or strictly managed 1.5:1 ratio across the board to prevent
contention.</li>
<li><strong>The Result:</strong> Massive amounts of stranded, wasted
compute. Production servers sit at 15% average utilization, wasting 85%
of their hardware capacity just to maintain enough headroom for brief,
unpredictable traffic spikes.</li>
</ol>
<p><a id="tiered-approach"></a></p>
<h3 id="the-tiered-approach-mixed-tenancy-and-consolidation">The Tiered
Approach: Mixed Tenancy and Consolidation</h3>
<p>The Calabi/cgroup-tiering model addresses the noisy neighbor problem
structurally. It keeps one tier from taking over the whole shared pool
when the other tiers are also runnable.</p>
<p><strong>The Density Uplift:</strong> Because the model is symmetric,
mixed tenancy becomes practical. I can put CPU-bound non-production
workloads beside latency-sensitive production workloads on the same host
as long as the Gold/Silver/Bronze allocation stays balanced and the
vendor-style baseline is kept conservative.</p>
<ul>
<li>Instead of running a Prod host at 15% utilization and a Dev host at
80% utilization, you combine them.</li>
<li>The host average utilization climbs by reclaiming stranded headroom
from the silos.</li>
<li>The blended host can safely run at a deterministic <strong>3:1
symmetric oversubscription ratio</strong> across the three tiers.</li>
</ul>
<p>What I like about that arrangement is that it turns stranded
production idle time into useful development throughput without making
the tier ratios chaotic.</p>
<hr />
<p><a id="foundation"></a></p>
<h2 id="the-foundation-strict-isolation-and-symmetric-tiering">The
Foundation: Strict Isolation and Symmetric Tiering</h2>
<p>To make this density work, the Calabi model still depends on
isolation. If the hypervisor competes with the guests for physical CPU
cycles, the whole setup gets noisy fast. So I reserve a fixed slice of
logical threads, for example 12 threads, for the host OS and emulators,
and treat the rest as the shared guest pool.</p>
<p>Once I am inside the shared guest pool, the model avoids thread
mobbing through <strong>Symmetric Tiering</strong>. Capacity comes from
an equal partition of that pool across Gold, Silver, and Bronze, and the
weights only decide how those equal slices behave under contention. The
main planning risk is asymmetric demand, because if Gold fills faster
than Silver or Bronze, the unused capacity in those tiers does not
automatically become useful Gold capacity.</p>
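<p>A minimal sketch of that partition in Python (the 64C/128T and 96C/192T thread counts belong to the two chips compared in this essay; the 12-thread reserve and the 3:1 ceiling are the model's own parameters, and the helper name is mine):</p>

```python
# Symmetric tier sizing: the guest pool splits into equal Gold/Silver/
# Bronze slices, and every slice is oversubscribed at the same ratio.
def tier_plan(total_threads: int, host_reserve: int = 12, ratio: int = 3):
    pool = total_threads - host_reserve      # shared guest pool
    return {"pool": pool,
            "total_vcpu": pool * ratio,      # symmetric 3:1 ceiling
            "per_tier_vcpu": pool * ratio // 3}

# EPYC 9575F: 64C/128T; EPYC 9655: 96C/192T
plan_9575f = tier_plan(128)
plan_9655 = tier_plan(192)
print(plan_9575f)  # {'pool': 116, 'total_vcpu': 348, 'per_tier_vcpu': 116}
print(plan_9655)   # {'pool': 180, 'total_vcpu': 540, 'per_tier_vcpu': 180}

# The planning risk in code form: Gold demand beyond its equal slice
# overflows, no matter how much slack Silver and Bronze hold.
gold_demand = 140
assert gold_demand > plan_9575f["per_tier_vcpu"]
```

<p>The final assertion is the asymmetric-demand risk stated directly: unused Silver or Bronze capacity does not convert into Gold capacity.</p>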
<p><a id="host-architecture-and-tiering"></a></p>
<h3 id="conceptual-flow-host-architecture-and-tiering">Conceptual Flow:
Host Architecture and Tiering</h3>
<div class="mermaid">graph TD
Host["Physical CPU"] --> Reserved["Host Reserved CPUs (12 Threads)"]
Host --> Guest["Shared Guest Domain Pool"]
Reserved --> HK["Housekeeping<br/>Networking, libvirt, OVS"]
Reserved --> Emu["Emulator Threads<br/>QEMU, Storage I/O"]
Guest --> Gold["Gold Tier<br/>Weight: 512"]
Guest --> Silver["Silver Tier<br/>Weight: 333"]
Guest --> Bronze["Bronze Tier<br/>Weight: 167"]
%% Symmetric constraint annotation
Gold -.-|"1:1:1 vCPU Ratio"| Silver
Silver -.-|"1:1:1 vCPU Ratio"| Bronze</div>
<hr />
<p><a id="ecc-and-sla-floor"></a></p>
<h2 id="effective-constrained-clock-ecc-and-the-sla-floor">Effective
Constrained Clock (ECC) and The SLA Floor</h2>
<p>At full saturation, the cgroup weights (512 for Gold, 333 for Silver,
167 for Bronze) shape how the Linux Completely Fair Scheduler divides
CPU time. The weights sum to 1012, so when runnable work is evenly
represented across the three tiers, Gold gets about
<strong>50.6%</strong> of the physical core's time (512/1012), Silver
gets <strong>32.9%</strong>, and Bronze gets <strong>16.5%</strong>.</p>
<p>When I translate those percentages into clock speed, I get the
<strong>Effective Constrained Clock (ECC)</strong>: Gold's roughly 50%
share of a 4.5 GHz all-core boost clock works out to about 2.28 GHz of
sustained execution under contention.</p>
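<p>The shares and ECC figures in the table below fall straight out of the weights. A small Python sketch (the function name is mine; the weights and boost clocks are the essay's):</p>

```python
# Effective Constrained Clock (ECC): cgroup weight share x all-core boost.
WEIGHTS = {"gold": 512, "silver": 333, "bronze": 167}  # sums to 1012

def ecc(all_core_boost_ghz: float) -> dict:
    total = sum(WEIGHTS.values())
    return {tier: round(all_core_boost_ghz * w / total, 2)
            for tier, w in WEIGHTS.items()}

print(ecc(4.5))  # EPYC 9575F -> {'gold': 2.28, 'silver': 1.48, 'bronze': 0.74}
print(ecc(4.1))  # EPYC 9655  -> {'gold': 2.07, 'silver': 1.35, 'bronze': 0.68}
```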
<p><a id="worst-case-contention-sla"></a></p>
<h3 id="the-worst-case-contention-sla">The Worst-Case Contention
SLA</h3>
<table>
<thead>
<tr>
<th style="text-align: left;">Tier</th>
<th style="text-align: left;">Time Slice</th>
<th style="text-align: left;">EPYC 9575F (4.5 GHz All-Core Boost)</th>
<th style="text-align: left;">EPYC 9655 (4.1 GHz All-Core Boost)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><strong>Gold</strong></td>
<td style="text-align: left;">~50.6%</td>
<td style="text-align: left;"><strong>~2.28 GHz</strong></td>
<td style="text-align: left;">~2.07 GHz</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Silver</strong></td>
<td style="text-align: left;">~32.9%</td>
<td style="text-align: left;"><strong>~1.48 GHz</strong></td>
<td style="text-align: left;">~1.35 GHz</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Bronze</strong></td>
<td style="text-align: left;">~16.5%</td>
<td style="text-align: left;"><strong>~0.74 GHz</strong></td>
<td style="text-align: left;">~0.68 GHz</td>
</tr>
</tbody>
</table>
<p>That does sound low, but the point is that it is a contention floor,
not a boost target. The actual experience still depends on the workload
mix, but the tier model keeps the service curve predictable. For
context, that is a different world from the 2016 era, when a flagship
enterprise CPU like the Xeon E5-2699 v4 sat around a 2.2 GHz base clock
and people already treated that as decent headroom.</p>
<hr />
<p><a id="map-latency-tolerance"></a></p>
<h2 id="map-latency-tolerance-not-environments">Map Latency Tolerance,
Not Environments</h2>
<p>A common mistake is to map environments directly to tiers, for
example, "Production is Gold, Development is Bronze." I do not think
that works well. I map tiers based on the workload's tolerance for
latency, not on the environment label.</p>
<ul>
<li><p><strong>Gold (Synchronous/Interactive):</strong> Anything a human
or live API is actively waiting on.</p>
<ul>
<li><em>Prod:</em> Kubernetes Masters (<code>etcd</code>), primary
transactional databases, real-time auth.</li>
<li><em>Dev:</em> Active developer VDI workspaces, interactive
debuggers, "inner-loop" code compilation.</li>
</ul></li>
<li><p><strong>Silver (Asynchronous/Infrastructure):</strong> Highly
available but fundamentally asynchronous or heavily cached.</p>
<ul>
<li><em>Prod:</em> Ingress routers, Observability (Prometheus/Grafana),
Message Brokers (Kafka), Software-Defined Storage (Ceph).</li>
<li><em>Dev:</em> Staging control planes, internal Identity/DNS
servers.</li>
</ul></li>
<li><p><strong>Bronze (Deferrable/Batch):</strong> Background jobs where
execution time is flexible.</p>
<ul>
<li><em>Prod:</em> Data warehousing (ETL), async email/PDF rendering,
log archival.</li>
<li><em>Dev:</em> Nightly automated test suites, CI/CD PR runners,
static code analysis.</li>
</ul></li>
</ul>
<p>Because of the cgroup weights, a Bronze background job cannot
dominate the shared pool when Gold and Silver are also runnable. That is
the property I care about.</p>
<hr />
<p><a id="idle-borrowing"></a></p>
<h2 id="the-sweet-spot-and-idle-borrowing">The "Sweet Spot" and Idle
Borrowing</h2>
<p>The ECC floors represent the contention case. The cgroup weights only
matter when there is a scheduler queue. If Gold and Silver VMs are idle,
the Bronze tier can use the spare CPU through <strong>Idle
Borrowing</strong>.</p>
<p><a id="cfs-idle-borrowing"></a></p>
<h3 id="conceptual-flow-cfs-idle-borrowing-logic">Conceptual Flow: CFS
Idle Borrowing Logic</h3>
<div class="mermaid">graph TD
Start["Bronze vCPU Demands Compute"] --> QueueCheck{"Are Gold/Silver vCPUs demanding<br/>compute on this physical core?"}
QueueCheck -->|"Yes"| Contended["Apply Cgroup Weights"]
Contended --> Throttled["Bronze runs at 16.5% Time Slice<br/>Effective: ~0.74 GHz"]
QueueCheck -->|"No"| Idle["Idle Borrowing Activated"]
Idle --> Native["Bronze can use spare CPU<br/>Effective: Native host speed while idle"]</div>
<p>To keep that steady state, I would plan for Gold and Silver to leave
enough average headroom for Bronze to borrow when the higher tiers are
idle.</p>
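<p>A toy model of that behavior (illustrative Python, not the scheduler itself: under CFS weighting, a tier's share is its weight over the summed weights of the tiers that are actually runnable, so idle tiers simply drop out of the denominator):</p>

```python
# Idle borrowing in one function: only runnable tiers count toward the
# weight denominator, so a lone Bronze tier gets the whole core.
WEIGHTS = {"gold": 512, "silver": 333, "bronze": 167}

def share(tier: str, runnable: set) -> float:
    if tier not in runnable:
        return 0.0
    return WEIGHTS[tier] / sum(WEIGHTS[t] for t in runnable)

# Full contention: Bronze is held to its ~16.5% floor.
print(round(share("bronze", {"gold", "silver", "bronze"}), 3))  # 0.165
# Gold and Silver idle: Bronze borrows the whole core.
print(share("bronze", {"bronze"}))  # 1.0
```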
<hr />
<p><a id="capacity-planning"></a></p>
<h2 id="capacity-planning-traditional-vs-tiered-density">Capacity
Planning: Traditional vs. Tiered Density</h2>
<p>To compare the economics, I set a traditional flat-pool host beside
the Calabi symmetric tiered model. In the tiered case, capacity comes
from equal Gold, Silver, and Bronze slices, so the host ceiling is the
symmetric guest pool rather than the most aggressive single-tier
packing.</p>
<p><em>(Note: Guest pools calculated after reserving 12 host threads for
the hypervisor: 9575F = 116 threads; 9655 = 180 threads.)</em></p>
<p><a id="scenario-a"></a></p>
<h3 id="scenario-a-t-shirt-sizing-optimal-density-stacking">Scenario A:
T-Shirt Sizing (Optimal Density Stacking)</h3>
<p>In <a
href="https://www.linkedin.com/pulse/size-matters-cloud-vm-statistics-tech-stack-tuesday-howard-young-77rff">Howard
Young's summary of Zadara's 2024 VM sampling</a>, 60.8% of the sample is
2 vCPU, 24.2% is 4 vCPU, and 10.1% is 8 vCPU. That is close enough to a
plain 60/30/10 split for this comparison. The table below uses exact
10-VM packs in that ratio: 6 x 2-vCPU, 3 x 4-vCPU, and 1 x 8-vCPU, or 32
vCPU per pack.</p>
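<p>The table rows below can be reproduced with a few lines of Python (the helper is mine; the 32-vCPU pack and the 116/180-thread guest pools come from this essay):</p>

```python
# T-shirt pack math: one 10-VM pack = 6x2 + 3x4 + 1x8 vCPU = 32 vCPU.
PACK_VCPU = 6 * 2 + 3 * 4 + 1 * 8  # 32

def packs(pool_threads: int, ratio: float) -> dict:
    budget = int(pool_threads * ratio)  # vCPU budget at this ratio
    n = budget // PACK_VCPU             # whole 10-VM packs that fit
    return {"vms": n * 10, "vcpu": n * PACK_VCPU,
            "slack": budget - n * PACK_VCPU}

for pool, chip in ((116, "9575F"), (180, "9655")):
    for ratio in (1.0, 1.5, 3.0):
        print(chip, ratio, packs(pool, ratio))
```

<p>Running this reproduces both tables, including the intentional reserve column (20, 14, and 28 leftover vCPU on each chip).</p>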
<p><strong>EPYC 9575F</strong></p>
<table>
<thead>
<tr>
<th style="text-align: left;">Mix Component</th>
<th style="text-align: left;">1:1</th>
<th style="text-align: left;">1.5:1</th>
<th style="text-align: left;">3:1 tiered</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><strong>2-vCPU VMs</strong></td>
<td style="text-align: left;">18</td>
<td style="text-align: left;">30</td>
<td style="text-align: left;">60</td>
</tr>
<tr>
<td style="text-align: left;"><strong>4-vCPU VMs</strong></td>
<td style="text-align: left;">9</td>
<td style="text-align: left;">15</td>
<td style="text-align: left;">30</td>
</tr>
<tr>
<td style="text-align: left;"><strong>8-vCPU VMs</strong></td>
<td style="text-align: left;">3</td>
<td style="text-align: left;">5</td>
<td style="text-align: left;">10</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Total VMs</strong></td>
<td style="text-align: left;"><strong>30</strong></td>
<td style="text-align: left;"><strong>50</strong></td>
<td style="text-align: left;"><strong>100</strong></td>
</tr>
<tr>
<td style="text-align: left;"><strong>Gain vs 1:1</strong></td>
<td style="text-align: left;">N/A</td>
<td style="text-align: left;">1.67x</td>
<td style="text-align: left;"><strong>3.33x</strong></td>
</tr>
<tr>
<td style="text-align: left;"><strong>vCPU consumed</strong></td>
<td style="text-align: left;">96</td>
<td style="text-align: left;">160</td>
<td style="text-align: left;">320</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Reserve/slack</strong></td>
<td style="text-align: left;">20</td>
<td style="text-align: left;">14</td>
<td style="text-align: left;">28</td>
</tr>
</tbody>
</table>
<p><strong>EPYC 9655</strong></p>
<table>
<thead>
<tr>
<th style="text-align: left;">Mix Component</th>
<th style="text-align: left;">1:1</th>
<th style="text-align: left;">1.5:1</th>
<th style="text-align: left;">3:1 tiered</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><strong>2-vCPU VMs</strong></td>
<td style="text-align: left;">30</td>
<td style="text-align: left;">48</td>
<td style="text-align: left;">96</td>
</tr>
<tr>
<td style="text-align: left;"><strong>4-vCPU VMs</strong></td>
<td style="text-align: left;">15</td>
<td style="text-align: left;">24</td>
<td style="text-align: left;">48</td>
</tr>
<tr>
<td style="text-align: left;"><strong>8-vCPU VMs</strong></td>
<td style="text-align: left;">5</td>
<td style="text-align: left;">8</td>
<td style="text-align: left;">16</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Total VMs</strong></td>
<td style="text-align: left;"><strong>50</strong></td>
<td style="text-align: left;"><strong>80</strong></td>
<td style="text-align: left;"><strong>160</strong></td>
</tr>
<tr>
<td style="text-align: left;"><strong>Gain vs 1:1</strong></td>
<td style="text-align: left;">N/A</td>
<td style="text-align: left;">1.60x</td>
<td style="text-align: left;"><strong>3.20x</strong></td>
</tr>
<tr>
<td style="text-align: left;"><strong>vCPU consumed</strong></td>
<td style="text-align: left;">160</td>
<td style="text-align: left;">256</td>
<td style="text-align: left;">512</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Reserve/slack</strong></td>
<td style="text-align: left;">20</td>
<td style="text-align: left;">14</td>
<td style="text-align: left;">28</td>
</tr>
</tbody>
</table>
<div class="note">
<div class="title">
<p>Note</p>
</div>
<p>The 9655 shows slightly less relative gain than the 9575F because its
strict <code>1:1</code> baseline is already stronger. The absolute
outcome is still better on the 9655, but the percentage uplift
compresses because the denominator is larger.</p>
</div>
<p><strong>Value:</strong> On this 60/30/10 small-instance mix, the
<code>1.5:1</code> flat-pool midpoint already gives a visible lift over
strict <code>1:1</code>, but the tiered model is where the step-change
appears. The 9575F moves from 30 mixed-size VMs at <code>1:1</code> to
50 at <code>1.5:1</code> and 100 in the tiered model, while the 9655
moves from 50 to 80 to 160. I would treat the leftover vCPU in each
column as intentional reserve for host housekeeping, QEMU emulator
threads, IOThreads, and small mix skew, not as accidental waste.</p>
<p><a id="scenario-b"></a></p>
<h3 id="scenario-b-openshift-estate-and-orthogonal-tenancy">Scenario B:
OpenShift Estate and Orthogonal Tenancy</h3>
<p>Consider a primary OpenShift estate shaped like this:</p>
<ul>
<li><strong>Gold:</strong> 3 masters, 24 vCPU total (8 vCPU each)</li>
<li><strong>Silver:</strong> 3 infra VMs, 24 vCPU total: 10 vCPU for <a
href="https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/4.10/html/planning_your_deployment/infrastructure-requirements_rhodf">OpenShift
Data Foundation</a>, 4 vCPU for <a
href="https://docs.redhat.com/en/documentation/red_hat_ansible_automation_platform/2.4/html-single/red_hat_ansible_automation_platform_installation_guide/red_hat_ansible_automation_platform_installation_guide">Red
Hat Ansible Automation Platform</a>, and 10 vCPU for <a
href="https://docs.redhat.com/en/documentation/openshift_container_platform/4.10/html/logging/configuring-your-logging-deployment">OpenShift
Logging</a> plus <a
href="https://docs.redhat.com/en/documentation/red_hat_build_of_keycloak/26.4/html-single/high_availability_guide/">Red
Hat build of Keycloak</a></li>
<li><strong>Bronze:</strong> 3 standard workers, 24 vCPU total (8 vCPU
each)</li>
</ul>
<p>That makes the primary estate a 72-vCPU footprint before any second
tenant is added.</p>
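<p>The same arithmetic as Scenario A, applied to whole estates (a minimal Python sketch; the helper name is mine, the 72-vCPU estate and pool sizes are the essay's):</p>

```python
# Orthogonal tenancy: capacity left after the primary 72-vCPU estate,
# expressed as 2-vCPU tenant slots and whole additional estates.
ESTATE_VCPU = 72

def after_primary(pool_threads: int, ratio: float) -> dict:
    remaining = int(pool_threads * ratio) - ESTATE_VCPU
    return {"remaining": remaining,
            "tenant_slots": remaining // 2,
            "extra_estates": remaining // ESTATE_VCPU,
            "slack": remaining % ESTATE_VCPU}

for pool, chip in ((116, "9575F"), (180, "9655")):
    for ratio in (1.0, 1.5, 3.0):
        print(chip, ratio, after_primary(pool, ratio))
```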
<p><strong>EPYC 9575F</strong></p>
<table>
<thead>
<tr>
<th style="text-align: left;">Capacity After Primary Estate</th>
<th style="text-align: left;">1:1</th>
<th style="text-align: left;">1.5:1</th>
<th style="text-align: left;">3:1 tiered</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><strong>vCPU remaining</strong></td>
<td style="text-align: left;">44</td>
<td style="text-align: left;">102</td>
<td style="text-align: left;">276</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Tenant slots</strong> <em>(2 vCPU
each)</em></td>
<td style="text-align: left;">22</td>
<td style="text-align: left;">51</td>
<td style="text-align: left;">138</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Additional 72-vCPU
estates</strong></td>
<td style="text-align: left;">0 full estates + slack (44 vCPU)</td>
<td style="text-align: left;">1 full estate + slack (30 vCPU)</td>
<td style="text-align: left;">3 full estates + slack (60 vCPU)</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Gain vs 1:1</strong></td>
<td style="text-align: left;">N/A</td>
<td style="text-align: left;">2.32x</td>
<td style="text-align: left;"><strong>6.27x</strong></td>
</tr>
<tr>
<td style="text-align: left;"><strong>Gain vs 1.5:1</strong></td>
<td style="text-align: left;">N/A</td>
<td style="text-align: left;">N/A</td>
<td style="text-align: left;"><strong>2.71x</strong></td>
</tr>
</tbody>
</table>
<p><strong>EPYC 9655</strong></p>
<table>
<thead>
<tr>
<th style="text-align: left;">Capacity After Primary Estate</th>
<th style="text-align: left;">1:1</th>
<th style="text-align: left;">1.5:1</th>
<th style="text-align: left;">3:1 tiered</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><strong>vCPU remaining</strong></td>
<td style="text-align: left;">108</td>
<td style="text-align: left;">198</td>
<td style="text-align: left;">468</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Tenant slots</strong> <em>(2 vCPU
each)</em></td>
<td style="text-align: left;">54</td>
<td style="text-align: left;">99</td>
<td style="text-align: left;">234</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Additional 72-vCPU
estates</strong></td>
<td style="text-align: left;">1 full estate + slack (36 vCPU)</td>
<td style="text-align: left;">2 full estates + slack (54 vCPU)</td>
<td style="text-align: left;">6 full estates + slack (36 vCPU)</td>
</tr>
<tr>
<td style="text-align: left;"><strong>Gain vs 1:1</strong></td>
<td style="text-align: left;">N/A</td>
<td style="text-align: left;">1.83x</td>
<td style="text-align: left;"><strong>4.33x</strong></td>
</tr>
<tr>
<td style="text-align: left;"><strong>Gain vs 1.5:1</strong></td>
<td style="text-align: left;">N/A</td>
<td style="text-align: left;">N/A</td>
<td style="text-align: left;"><strong>2.36x</strong></td>
</tr>
</tbody>
</table>
<p><strong>Value:</strong> The <code>1:1</code> columns are the control
case, and the primary <code>72-vCPU</code> estate fits on both chips.
The difference is what remains after that first estate is in place: on
the 9575F, <code>1:1</code> leaves only <code>44 vCPU</code>, which is
not enough for a second full estate, while the 9655 can fit one more.
The <code>1.5:1</code> midpoint moves those chips to one and two
additional estates. The tiered model moves them to three and six. That
is the practical tenancy jump: not a marginal improvement in worker
count, but a host that can absorb whole additional clusters without
abandoning predictable service tiers.</p>
<hr />
<p><a id="conclusion"></a></p>
<h2 id="conclusion-density-vs-baseline-sla-guarantees">Conclusion:
Density vs. Baseline SLA Guarantees</h2>
<p>My read is that the Calabi and OpenStack cgroup-tiering models I
described above work well as long as the symmetric tier constraint is
respected.
Compared with the conservative oversubscription assumptions many teams
used in the 2016 era, the same chassis can absorb a much larger workload
mix without giving up the SLA floor. For me, <code>1.5:1</code> is a
midpoint, not a universal rule, because it sits between VMware's
<code>1:1</code> starting point and Nutanix's <code>2x</code> upper
bound for latency-sensitive workloads. I think that is the practical
gain here: legacy flat pools give way to a denser host without losing
the ability to reason about the tiers.</p>
<p>Choosing the right silicon to drive this model requires an honest
assessment of the organization's worst-case scenario:</p>
<ul>
<li><strong>Choose the AMD EPYC 9655 if density and throughput matter
most.</strong> It gives better hardware ROI and more symmetric guest
capacity for CI/CD and worker nodes.</li>
<li><strong>Choose the AMD EPYC 9575F if you want more per-core
headroom.</strong> You pay more per symmetric slot, but you get a faster
CPU baseline for workloads that are sensitive to contention floors.</li>
</ul>
</article>
</div>
<aside class="side-column">
<section class="context-block">
<h2>Why This Matters</h2>
<p>In 2026, the interesting problem is not whether high-density core silicon exists. It is how to use it without collapsing into flat-pool noise. This essay is my reflection on shaping those chips into predictable mixed-tenancy hosts.</p>
</section>
<section class="context-block">
<h2>Try It Yourself</h2>
<p>The <a href="https://gprocunier.github.io/calabi/">Calabi project</a> turns this model into working automation — cgroup tiering, NUMA-aware placement, and OpenShift node tuning you can deploy today. If the density math here makes sense, Calabi is how you operationalize it.</p>
</section>
<section class="source-block">
<h2>Primary Links</h2>
<ul class="path-list">
<li>
<a href="https://github.com/gprocunier/deterministic-density">
<strong>Repository</strong>
<span>Source for this essay and Pages site.</span>
</a>
</li>
<li>
<a href="https://github.com/gprocunier/openstack-cgroup-tiering">
<strong>2025 Cgroup Thesis</strong>
<span>The original OpenStack cgroup-tiering work.</span>
</a>
</li>
<li>
<a href="https://gprocunier.github.io/calabi/host-resource-management.html">
<strong>Calabi Project</strong>
<span>The single-host KVM and OpenShift adaptation.</span>
</a>
</li>
</ul>
</section>
<section class="toc-block">
<h2>On This Page</h2>
<ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#density-paradigm">The Density Paradigm</a></li>
<li><a href="#foundation">Strict Isolation and Symmetric Tiering</a></li>
<li><a href="#ecc-and-sla-floor">ECC and the SLA Floor</a></li>
<li><a href="#map-latency-tolerance">Map Latency Tolerance</a></li>
<li><a href="#idle-borrowing">Idle Borrowing</a></li>
<li><a href="#capacity-planning">Capacity Planning</a></li>
<li><a href="#scenario-a">Scenario A</a></li>
<li><a href="#scenario-b">Scenario B</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</section>
</aside>
</main>
<footer class="site-footer">
Published from repository docs on 2026-04-12.
</footer>
</div>
</body>
</html>