docs: restructure topology-aware documentation. #628

Open
klihub wants to merge 2 commits into containers:main from klihub:docs/topology-aware/restructure

@klihub (Collaborator) commented Feb 6, 2026

This PR is an attempt to restructure the topology-aware documentation. I threw Copilot at it, asking it to reorganize the content using the ToC from the corresponding balloons PR #627.

@klihub requested a review from askervin on February 6, 2026 07:57
@klihub force-pushed the docs/topology-aware/restructure branch 6 times, most recently from 0d1a829 to d108a06, on February 12, 2026 16:34
@klihub changed the title from "[draft/test]: docs: restructure topology-aware documentation." to "docs: restructure topology-aware documentation." on Feb 12, 2026
@klihub requested a review from kad on February 12, 2026 17:06
@klihub force-pushed the docs/topology-aware/restructure branch from d108a06 to 4ef44cf on February 13, 2026 07:26

## Background
## 1. Overview

### What Problems Does the Topology-Aware Policy Solve?
Collaborator

I really like this introduction/background on why resource policies matter in the first place. Let's take it in.

In a later phase, if we add a separate "Why to use a resource policy and how to choose the right one" (sub)document, perhaps we can move this background information there and link to it from the topology-aware and balloons policy documents.

@klihub force-pushed the docs/topology-aware/restructure branch from e144b3c to 720e3f5 on February 13, 2026 08:29
@klihub force-pushed the docs/topology-aware/restructure branch 3 times, most recently from ada0163 to 93d7449, on February 24, 2026 12:45
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
@klihub force-pushed the docs/topology-aware/restructure branch from 93d7449 to 2424fdb on February 24, 2026 12:53

## Controlling Topology Hints Via Annotations

### Topology Hints
Collaborator

The next paragraph used to be a general introduction to topology hints, covering the whole subsection. But now it has become a sub-subsection itself. I still assume we wouldn't like to go to 4th-level sections, so perhaps these two subsections ("topology hints" and "enabling or disabling...") could be merged.

Collaborator Author

@askervin Shouldn't we also move the earlier 'Implicit Hardware Topology Hints' section here and combine it with the rest, so that everything related to hints is described together or close to each other?

Collaborator

Sounds good. It's easier to find the information under only one topic/section.

Collaborator Author

I ended up moving the first section about (Implicit Hardware) Topology Hints together with the rest later, turning them into a single chapter and using bold paragraph headers as topic separators.

Comment on lines 1160 to 1161
# Prefer isolated CPUs when available
prefer-isolated-cpus.resource-policy.nri.io/pod: "true"
Collaborator

Perhaps we should not recommend isolcpus for generic HPC?

Collaborator Author

I removed the HPC workload example altogether. It did not make sense with the isolated allocation attempt of multiple CPUs.

  name: default
spec:
  preferSharedCPUs: false
  preferIsolatedCPUs: true
Collaborator

(same as above, isolcpus might be difficult...)

  name: default
spec:
  preferSharedCPUs: false
  preferIsolatedCPUs: true
Collaborator

Suspecting that isolcpus for any Guaranteed container as a default is not a safe choice.

Collaborator Author

Yeah, this is a bad/hallucinated example. Usually the shared preference should be set, or left at its default of true, so that isolated CPUs are preferred only for guaranteed containers with a CPU request of < 2. Plus, globally preferring isolated CPUs for all exclusive allocations requires an explicitly annotated opt-out from shared allocation. We need to check and rethink the examples more carefully.

Collaborator Author

I updated this Guaranteed realtime app to ask for a single CPU, to let the isolated CPU preference make more sense.

cpu: "4"
memory: "8Gi"

# Best-effort background task
Collaborator

Could we have one more workload in this cookbook? I'm thinking of a highly responsive, relatively low-latency Burstable container. We could give it a slightly elevated scheduling priority (yet not realtime) and set its memory locality to the NUMA level. That would again clearly be something that is not achievable with the stock managers.

Collaborator Author

Okay, I added a prioritized app like that, and changed the realtime app to take a single exclusive CPU and explicitly prefer isolated CPUs for it.
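
For illustration, a realtime pod along the lines described above might look roughly like the sketch below. This is not the cookbook text from the PR. The annotation key is the one shown in the snippet under review, while the pod name, container name, image, and memory size are hypothetical placeholders. Setting requests equal to limits puts the container in the Guaranteed QoS class, and the single-CPU request is what lets the isolated CPU preference make sense.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: realtime-app                  # hypothetical name
  annotations:
    # Annotation key as quoted in the review snippet above.
    prefer-isolated-cpus.resource-policy.nri.io/pod: "true"
spec:
  containers:
  - name: worker                      # hypothetical container name
    image: registry.example.com/realtime-app:latest   # placeholder image
    resources:
      # requests == limits makes this a Guaranteed container;
      # a single full CPU asks for one exclusive CPU.
      requests:
        cpu: "1"
        memory: "1Gi"                 # assumed size, for illustration only
      limits:
        cpu: "1"
        memory: "1Gi"
```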

@klihub force-pushed the docs/topology-aware/restructure branch 5 times, most recently from 6579de9 to 0da59cc, on February 27, 2026 07:57
Reorganize topology-aware.md taking inspiration from a similar
update to balloons and using this table of contents:
  - Overview with problem statement and key features
  - Installation and Configuration
  - Configuration Options (8 subsections)
  - Cookbook with practical examples
  - Troubleshooting

Co-authored-by: GitHub Copilot <noreply@github.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
@klihub force-pushed the docs/topology-aware/restructure branch from 0da59cc to 8c29598 on February 27, 2026 15:57
@klihub marked this pull request as ready for review on February 27, 2026 15:57