Hi developers,
My institution recently rolled out Arbiter2 on our production login nodes. However, we are seeing users' CPU usage still spike even after constraints have been applied. A screenshot of the utilization plot is attached for your reference.

Oddly, we don't see this behavior on the dev nodes, which should be exact clones of the production ones. On the dev nodes, the CPU usage plot looks like a clean rectangle (attached below). The spikes also occur only for CPU, not for memory.

Has your team seen this behavior before? Our current guess is that it comes from how cgroup CPU limiting works through a quota/period scheduling mechanism: a process can burst to 100% CPU utilization at the start of a period, exhaust its quota, and then get throttled for the remainder of that period. Alternatively, because the production nodes have many more users and processes, it may simply take Arbiter2 longer to process them and apply the throttling.
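
To make the first guess concrete, here is a minimal sketch of the quota/period behavior we have in mind. It is a toy model of a single CPU-bound process on one core, not Arbiter2 code and not the kernel scheduler itself. The function names and the 50 ms quota / 100 ms period values are assumptions chosen for illustration only.

```python
def simulate_quota_period(quota_us, period_us, total_us, step_us=1000):
    """Toy model: the process runs flat out until its quota for the
    current period is used up, then is throttled until the next period.
    Returns one utilization sample (1.0 or 0.0) per step."""
    samples = []
    for t in range(0, total_us, step_us):
        offset = t % period_us  # position inside the current period
        samples.append(1.0 if offset < quota_us else 0.0)
    return samples


def window_average(samples, window):
    """Average utilization over consecutive, non-overlapping windows."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


# Hypothetical limit: 50 ms of CPU time per 100 ms period (half a core).
samples = simulate_quota_period(quota_us=50_000, period_us=100_000,
                                total_us=300_000)

per_period = window_average(samples, window=100)  # 100 ms windows
fine_grained = window_average(samples, window=25)  # 25 ms windows

print("averaged per period:", per_period)
print("averaged per 25 ms: ", fine_grained)
```

In this model, averaging over a full period gives a flat 0.5, while 25 ms windows alternate between 1.0 and 0.0. If that is what is happening, whether a plot shows spikes or a rectangle would depend on how the sampling interval lines up with the period, which might differ between our production and dev nodes.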
Any thoughts would be greatly appreciated!
Thanks,
Minyan