walkup to root when getting cpu.max in cgroup v2 #712

Closed
link89 wants to merge 1 commit into lxc:main from link89:fix-cpu-max-cgroup-v2

link89 commented Mar 17, 2026

close #711

Signed-off-by: weihong.xu <xuweihong.cn@gmail.com>
link89 force-pushed the fix-cpu-max-cgroup-v2 branch from de18d5f to f3f045b on March 17, 2026 16:54
Comment thread src/proc_cpuview.c
 		return false;

-	if (!cgroup_ops->get(cgroup_ops, "cpu", cg, file, &str))
+	if (!cgroup_ops->get_walkup_to_root(cgroup_ops, "cpu", cg, file, &str))
Member

Have you tested this? I have a gut feeling that this is not correct, because for example strcmp("max 100000", "max") is not zero.
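
To illustrate the concern: a whole-string comparison does not recognize the unlimited case when the period follows the quota, whereas comparing only the first field does. The sketch below is not LXCFS code; `quota_is_unlimited` is a hypothetical helper introduced only for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not an LXCFS API): true when the quota field, i.e.
 * the first token of a cpu.max string, is the literal "max". */
static bool quota_is_unlimited(const char *cpu_max)
{
    /* Compare only the first three characters ... */
    if (strncmp(cpu_max, "max", 3) != 0)
        return false;
    /* ... and make sure the token ends there. */
    return cpu_max[3] == '\0' || cpu_max[3] == ' ' || cpu_max[3] == '\n';
}
```

With this helper, `quota_is_unlimited("max 100000\n")` is true, while `strcmp("max 100000", "max")` returns a non-zero value because the strings differ after the third character.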

Author

link89 Mar 19, 2026

> Have you tested this?

Yes, it is already deployed in our production environment.

> because for example strcmp("max 100000", "max") is not zero.

Replacing cgroup_ops->get with cgroup_ops->get_walkup_to_root only ensures that when lxcfs cannot find cpu.max in /user.slice/user-1008.slice/session-219.scope, it looks in the parent directory instead (/user.slice/user-1008.slice/cpu.max). And the parsing of cpu.max remains the same.
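
The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the LXCFS implementation: `cg_read_fn`, `read_walkup`, and `fake_read` are hypothetical names, the filesystem is replaced by an in-memory stand-in, and the cpu.max value is made up.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reader callback (not an LXCFS API): returns true and fills
 * buf when `file` exists in cgroup directory `cg`. */
typedef bool (*cg_read_fn)(const char *cg, const char *file,
                           char *buf, size_t len);

/* Try `file` in `cg`; when it is missing, retry in each parent up to "/". */
static bool read_walkup(cg_read_fn read_fn, const char *cg, const char *file,
                        char *buf, size_t len)
{
    char path[4096];

    if (cg[0] != '/' || strlen(cg) >= sizeof(path))
        return false;
    strcpy(path, cg);

    for (;;) {
        char *slash;

        if (read_fn(path, file, buf, len))
            return true;
        if (strcmp(path, "/") == 0)
            return false;       /* reached the root without finding it */

        slash = strrchr(path, '/');
        if (slash == path)
            path[1] = '\0';     /* parent is the root itself */
        else
            *slash = '\0';      /* drop the last path component */
    }
}

/* Stand-in for the filesystem: only the user slice has cpu.max.
 * The value "200000 100000" is made up for the example. */
static bool fake_read(const char *cg, const char *file, char *buf, size_t len)
{
    if (strcmp(cg, "/user.slice/user-1008.slice") != 0 ||
        strcmp(file, "cpu.max") != 0)
        return false;
    snprintf(buf, len, "%s", "200000 100000");
    return true;
}
```

Calling `read_walkup(fake_read, "/user.slice/user-1008.slice/session-219.scope", "cpu.max", buf, sizeof(buf))` misses in the session scope and then succeeds one level up. Note that this only handles a missing file; it does not inspect the content it finds.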

Member

> Replacing cgroup_ops->get with cgroup_ops->get_walkup_to_root only ensures that when lxcfs cannot find cpu.max in /user.slice/user-1008.slice/session-219.scope, it looks in the parent directory instead (/user.slice/user-1008.slice/cpu.max).

Yes, this makes sense. Another question is why LXCFS even looks at the /user.slice/user-1008.slice/session-219.scope cgroup in your case. LXCFS always tries to determine the "init" process of the container; see, for example, proc_stat_read. It will go to /user.slice/user-1008.slice/session-219.scope only if the requesting process is running in the same PID namespace as LXCFS itself, i.e. there is no LXC/Docker container. Am I right in understanding that you don't use any kind of container in your setup?

> And the parsing of cpu.max remains the same.

Thanks, this makes sense. It would have been awesome if you had described all of this in your commit message and issue ;-) Otherwise I have to guess what kind of setup you have and why this fix helps.

Author

> Another question is why LXCFS even looks at the /user.slice/user-1008.slice/session-219.scope cgroup in your case. LXCFS always tries to determine the "init" process of the container.

I am using LXCFS without a container. As an HPC admin, I found that I can use LXCFS together with pam_namespace to override /proc/cpuinfo, which makes programs written in Python and other languages cgroup-aware.

@mihalicyn
Member

Looks like a duplicate of #690

Author

link89 commented Mar 19, 2026

> Looks like a duplicate of #690

Yes. In my case cpu.max does not exist in the current directory, so all I need to do is read the value from the parent. In the case of #690, the current directory contains part of the value, so it has to read the values from all parents and merge them to get the right one.

I didn't expect that situation when I solved my own problem. But the patch looks a little bit overcomplicated to me.

Member

mihalicyn commented Mar 19, 2026

> the current directory contains part of the value, so it has to read the values from all parents and merge them to get the right one.

No, there is no need to merge values, but you do need to detect when the current level carries no information about the limit. In that case the cpu.max content looks like max SOME_NUMBER2. If you see that, go up the tree and try again, repeating until you reach something like SOME_NUMBER1 SOME_NUMBER2.
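
A minimal sketch of the walk described above, assuming the cpu.max contents of each level have already been read into an array ordered from the current cgroup towards the root. `parse_cpu_max` and `nearest_cpu_limit` are hypothetical names used only for this illustration, not LXCFS functions, and the array is a stand-in for reading each directory in turn.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse cgroup v2 cpu.max content ("<quota|max> <period>").
 * Returns 1 if a numeric quota is set, 0 if the quota is "max",
 * and -1 if the content is malformed. */
static int parse_cpu_max(const char *s, int64_t *quota, int64_t *period)
{
    char first[32];

    if (sscanf(s, "%31s %" SCNd64, first, period) != 2)
        return -1;
    if (strcmp(first, "max") == 0)
        return 0;               /* no limit information at this level */
    if (sscanf(first, "%" SCNd64, quota) != 1)
        return -1;
    return 1;
}

/* levels[0] is the current cgroup's cpu.max, levels[n - 1] the topmost one.
 * Returns 1 and fills quota/period from the nearest level with a numeric
 * quota, 0 if every level says "max", -1 on malformed content. */
static int nearest_cpu_limit(const char *const *levels, size_t n,
                             int64_t *quota, int64_t *period)
{
    for (size_t i = 0; i < n; i++) {
        int ret = parse_cpu_max(levels[i], quota, period);

        if (ret != 0)
            return ret;         /* found a limit, or hit an error */
    }
    return 0;
}
```

For example, with levels `"max 100000\n"`, `"max 100000\n"`, `"150000 100000\n"` the function skips the first two entries and reports a quota of 150000 with a period of 100000.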

> I didn't expect that situation when I solved my own problem. But the patch looks a little bit overcomplicated to me.

No worries. This stuff is quite complicated and it is hard to figure out everything at once.

Thanks for your work on this. Let me know if you are interested in continuing.

@mihalicyn
Member

I'm closing this PR for now because we have another one with more activity, #690, which will hopefully be ready to merge soon.

mihalicyn closed this Mar 24, 2026
Development

Successfully merging this pull request may close these issues.

lxcfs fails to read cpu.max for cgroup v2 with systemd slices/scopes