
CMP-4007: Fix aide-worker memory growth caused by cgroup page cache accumulation#877

Open
Vincent056 wants to merge 1 commit into openshift:master from Vincent056:fix-aide-worker-memory-growth

Conversation

@Vincent056
Contributor

@Vincent056 Vincent056 commented Mar 2, 2026

Summary

  • aide-worker pods exhibit continuous memory growth toward the resource limit across AIDE scan cycles. Root cause: AIDE reads the entire host filesystem, and the kernel page cache for those reads is charged to the container's cgroup. Without reclamation this cache accumulates, consuming ~530 MiB of the 600 MiB limit while the Go daemon itself only uses ~10 MiB.
  • Use cgroup v2 memory.reclaim to evict file-backed page cache after each AIDE scan and DB init, reducing reported memory from ~570 MiB to ~11 MiB.
  • Fix a file descriptor leak in getNonEmptyFile, pre-compile regex patterns, handle AlreadyExists on ConfigMap creation, and call runtime.GC/debug.FreeOSMemory after scans.

Root Cause Analysis

The aide-worker container runs AIDE as a privileged process scanning the host root filesystem mounted at /hostroot. Every file AIDE reads generates kernel page cache entries that are charged to the container's cgroup memory. oc adm top pods reports container_memory_working_set_bytes, which includes this page cache, so reported memory grows toward the limit after each scan cycle. Increasing the resource limit only lets memory grow to the new limit.
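The file/anon split can be read straight out of the container's cgroup v2 memory.stat file (e.g. /sys/fs/cgroup/memory.stat inside the container). A minimal sketch, where parseMemoryStat is an illustrative helper and not code from the PR:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryStat extracts selected counters (in bytes) from the contents
// of a cgroup v2 memory.stat file. Illustrative helper, not part of the PR.
func parseMemoryStat(data string, keys ...string) map[string]uint64 {
	want := map[string]bool{}
	for _, k := range keys {
		want[k] = true
	}
	out := map[string]uint64{}
	for _, line := range strings.Split(data, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 || !want[fields[0]] {
			continue
		}
		if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			out[fields[0]] = v
		}
	}
	return out
}

func main() {
	// Sample matching the breakdown observed before the fix.
	sample := "anon 10485760\nfile 555745280\nslab 31457280\n"
	stats := parseMemoryStat(sample, "anon", "file")
	fmt.Printf("file=%d MiB anon=%d MiB\n", stats["file"]>>20, stats["anon"]>>20) // file=530 MiB anon=10 MiB
}
```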

Cgroup memory breakdown before fix:

Category            Size     % of Total
file (page cache)   530 MiB  93%
anon (Go process)   10 MiB   2%
kernel/slab         30 MiB   5%

Test Results

Tested on OCP 4.18.22 with FIO 1.3.8 (6 nodes, 3 masters + 3 workers):

Metric                     Before       After      Reduction
oc adm top memory          152-562 MiB  10-13 MiB  ~97%
cgroup memory.current      568-597 MiB  11-45 MiB  ~97%
cgroup file (page cache)   529-588 MiB  0-33 MiB   ~97%
Pod restarts               0            0          n/a

Changes

  1. cmd/manager/daemon_util.go: Add reclaimCgroupPageCache() using cgroup v2 memory.reclaim, getOwnCgroupPath() to discover the container cgroup, and releaseMemoryAfterScan() for explicit GC. Fix file descriptor leak in getNonEmptyFile().
  2. cmd/manager/loops.go: Call reclaim and GC after each AIDE scan in aideLoop and after DB initialization in handleAIDEInit.
  3. cmd/manager/logcollector_util.go: Pre-compile regex patterns at package level. Handle AlreadyExists error on ConfigMap creation with delete-and-recreate.
  4. pkg/controller/fileintegrity/fileintegrity_controller.go: Update outdated GODEBUG comment.
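The cgroup discovery in item 1 can be sketched as follows, assuming a pure cgroup v2 host with cgroupfs mounted at /sys/fs/cgroup. ownCgroupPath is an illustrative stand-in for the PR's getOwnCgroupPath, which may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// ownCgroupPath converts the contents of /proc/self/cgroup into the absolute
// cgroupfs path of the calling process. On cgroup v2 the file contains a
// single entry of the form "0::<path>". Illustrative sketch only.
func ownCgroupPath(procSelfCgroup string) (string, error) {
	for _, line := range strings.Split(strings.TrimSpace(procSelfCgroup), "\n") {
		if strings.HasPrefix(line, "0::") { // the cgroup v2 entry
			return "/sys/fs/cgroup" + strings.TrimPrefix(line, "0::"), nil
		}
	}
	return "", fmt.Errorf("no cgroup v2 entry found in /proc/self/cgroup")
}

func main() {
	p, err := ownCgroupPath("0::/kubepods.slice/kubepods-pod123.slice/crio-abc.scope\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```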

@Vincent056 Vincent056 force-pushed the fix-aide-worker-memory-growth branch from 717303f to fe75940 Compare March 2, 2026 16:39
@Vincent056 Vincent056 changed the title Fix aide-worker memory growth caused by cgroup page cache accumulation CMP-4007: Fix aide-worker memory growth caused by cgroup page cache accumulation Mar 2, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 2, 2026
@openshift-ci-robot

openshift-ci-robot commented Mar 2, 2026

@Vincent056: This pull request references CMP-4007 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from xiaojiey and yuumasato March 2, 2026 16:39
@openshift-ci
Contributor

openshift-ci bot commented Mar 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Vincent056

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2026
@xiaojiey

xiaojiey commented Mar 16, 2026

With a 3 master + 3 worker cluster, I can see a huge difference with and without the PR.
Without the PR, I can see:

  1. Memory grows from ~10 MiB to 300-570 MiB
  2. Most memory is 'file' (page cache) in cgroup stats
  3. Memory does NOT drop after scans complete
  4. No 'reclaimed cgroup page cache' messages in logs
  5. Eventually may approach resource limit (600 MiB default)

With PR #877 fix applied, I can see:

  1. Memory stays at ~10-20 MiB consistently
  2. 'file' (page cache) stays near 0 MiB
  3. Logs show: 'reclaimed cgroup page cache after AIDE scan'

The problem is that only the first scan result is logged in the aide pod logs. Not sure whether it is an environment issue. I will double-check tomorrow:

$ oc logs pod/aide-test-memory-growth-79xcn --all-containers 
ERROR: ld.so: object '/opt/libaide_md5_guard.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/opt/libaide_md5_guard.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
2026-03-16T14:03:37Z: Starting the AIDE runner daemon
W0316 14:03:37.988839       1 client_config.go:659] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2026-03-16T14:03:37Z: debug: aide files locked by aideLoop
2026-03-16T14:03:37Z: debug: Getting FileIntegrity openshift-file-integrity/test-memory-growth
2026-03-16T14:03:37Z: running aide check
2026-03-16T14:03:37Z: debug: Still waiting for file integrity instance initialization
2026-03-16T14:03:37Z: aide check returned status 22
2026-03-16T14:03:37Z: debug: /hostroot/etc/kubernetes/aide.log.new is missing or empty, did not copy
2026-03-16T14:03:37Z: debug: aide files unlocked by aideLoop
2026-03-16T14:03:37Z: debug: aide files locked by handleFailedResult
2026-03-16T14:03:37Z: debug: AIDE check failed, continuing to collect log file
2026-03-16T14:03:37Z: debug: Opening /hostroot/etc/kubernetes/aide.log
2026-03-16T14:03:37Z: debug: creating temporary configMap 'aide-test-memory-growth-ip-10-0-44-216.us-east-2.compute.internal' to report a FAILED scan result
2026-03-16T14:03:37Z: debug: Still waiting for file integrity instance initialization
2026-03-16T14:03:38Z: debug: uncompressed log size: 777
2026-03-16T14:03:38Z: debug: added 0 changed 0 removed 0
2026-03-16T14:03:38Z: debug: aide files unlocked by handleFailedResult

AIDE scans the entire host filesystem, and the resulting kernel page
cache is charged to the container's cgroup. Without reclamation, reported
memory grows toward the resource limit after each scan cycle.

Use cgroup v2 memory.reclaim to evict file-backed page cache after each
AIDE scan and database initialization. This reduced aide-worker memory
from ~570 MiB to ~11 MiB in testing on OCP 4.18.22.

Use raw syscalls (syscall.Open/Write/Close) for memory.reclaim instead
of os.OpenFile, because Go's runtime registers fds with its epoll poller
and the cgroup v2 file's poll support causes the goroutine to hang
waiting for write-readiness that never arrives.

Additional fixes:
- Close leaked file descriptor in getNonEmptyFile when file is empty
- Pre-compile regex patterns used in log parsing
- Handle AlreadyExists on ConfigMap creation to avoid unnecessary retries
- Call runtime.GC and debug.FreeOSMemory after scan to return heap to OS
- Update outdated GODEBUG comment (madvdontneed=1 is default since Go 1.16)
@Vincent056 Vincent056 force-pushed the fix-aide-worker-memory-growth branch from fe75940 to 81bd8e5 Compare March 17, 2026 04:15
@xiaojiey

xiaojiey commented Mar 17, 2026

With this update, the scan no longer gets stuck; it can be triggered successfully and runs as expected.
With a 3 master + 3 worker cluster, I can see a huge difference with and without the PR.
Without the PR, I can see:

  1. Memory grows from ~10 MiB to 300-570 MiB
  2. Most memory is 'file' (page cache) in cgroup stats
  3. Memory does NOT drop after scans complete
  4. No 'reclaimed cgroup page cache' messages in logs
  5. Eventually may approach resource limit (600 MiB default)

With PR #877 fix applied, I can see:

  1. Memory stays at ~10-20 MiB consistently
  2. 'file' (page cache) stays near 0 MiB
  3. Logs show: 'reclaimed cgroup page cache after AIDE scan'

More details:

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io 
NAME                                                           NODE                                        STATUS
test-memory-growth-ip-10-0-23-112.us-east-2.compute.internal   ip-10-0-23-112.us-east-2.compute.internal   Succeeded
test-memory-growth-ip-10-0-50-14.us-east-2.compute.internal    ip-10-0-50-14.us-east-2.compute.internal    Succeeded
test-memory-growth-ip-10-0-63-206.us-east-2.compute.internal   ip-10-0-63-206.us-east-2.compute.internal   Succeeded
test-memory-growth-ip-10-0-64-246.us-east-2.compute.internal   ip-10-0-64-246.us-east-2.compute.internal   Succeeded
test-memory-growth-ip-10-0-65-28.us-east-2.compute.internal    ip-10-0-65-28.us-east-2.compute.internal    Succeeded
test-memory-growth-ip-10-0-7-117.us-east-2.compute.internal    ip-10-0-7-117.us-east-2.compute.internal    Succeeded
$ oc logs pod/aide-test-memory-growth-9ntwf --all-containers |tail
2026-03-17T04:34:55Z: reclaimed cgroup page cache after AIDE scan
2026-03-17T04:35:15Z: debug: aide files locked by aideLoop
2026-03-17T04:35:15Z: running aide check
2026-03-17T04:35:56Z: aide check returned status 0
2026-03-17T04:35:56Z: debug: copying /hostroot/etc/kubernetes/aide.log.new to /hostroot/etc/kubernetes/aide.log
2026-03-17T04:35:56Z: debug: aide files unlocked by aideLoop
2026-03-17T04:35:56Z: debug: creating temporary configMap 'aide-test-memory-growth-ip-10-0-7-117.us-east-2.compute.internal' to report a successful scan result
2026-03-17T04:35:56Z: reclaimed cgroup page cache after AIDE scan
2026-03-17T04:36:16Z: debug: aide files locked by aideLoop
2026-03-17T04:36:16Z: running aide check
$ cat /tmp/aide-memory-growth-1773721445.csv | column -t -s,
timestamp   pod_name                       node                                       memory_mb  file_cache_mb  anon_mb  cycle
1773721459  aide-test-memory-growth-9ntwf  ip-10-0-7-117.us-east-2.compute.internal   599        515            37       1
1773721475  aide-test-memory-growth-cb22p  ip-10-0-63-206.us-east-2.compute.internal  12         0              8        1
1773721490  aide-test-memory-growth-fv2jv  ip-10-0-23-112.us-east-2.compute.internal  146        486            14       1
1773721504  aide-test-memory-growth-jj685  ip-10-0-50-14.us-east-2.compute.internal   599        542            25       1
1773721520  aide-test-memory-growth-qtrrf  ip-10-0-64-246.us-east-2.compute.internal  599        5              7        1
1773721534  aide-test-memory-growth-x486j  ip-10-0-65-28.us-east-2.compute.internal   14         5              6        1
1773721623  aide-test-memory-growth-9ntwf  ip-10-0-7-117.us-east-2.compute.internal   14         1              11       2
1773721638  aide-test-memory-growth-cb22p  ip-10-0-63-206.us-east-2.compute.internal  10         0              8        2
1773721653  aide-test-memory-growth-fv2jv  ip-10-0-23-112.us-east-2.compute.internal  599        516            36       2
1773721669  aide-test-memory-growth-jj685  ip-10-0-50-14.us-east-2.compute.internal   599        511            37       2
1773721684  aide-test-memory-growth-qtrrf  ip-10-0-64-246.us-east-2.compute.internal  15         5              7        2
1773721699  aide-test-memory-growth-x486j  ip-10-0-65-28.us-east-2.compute.internal   14         5              6        2
1773721789  aide-test-memory-growth-9ntwf  ip-10-0-7-117.us-east-2.compute.internal   33         235            14       3
1773721803  aide-test-memory-growth-cb22p  ip-10-0-63-206.us-east-2.compute.internal  599        559            19       3
1773721818  aide-test-memory-growth-fv2jv  ip-10-0-23-112.us-east-2.compute.internal  12         0              11       3
1773721832  aide-test-memory-growth-jj685  ip-10-0-50-14.us-east-2.compute.internal   11         0              10       3
1773721847  aide-test-memory-growth-qtrrf  ip-10-0-64-246.us-east-2.compute.internal  15         179            9        3
1773721863  aide-test-memory-growth-x486j  ip-10-0-65-28.us-east-2.compute.internal   599        1              15       3
1773721950  aide-test-memory-growth-9ntwf  ip-10-0-7-117.us-east-2.compute.internal   599        546            25       4
1773721966  aide-test-memory-growth-cb22p  ip-10-0-63-206.us-east-2.compute.internal  599        523            31       4
1773721980  aide-test-memory-growth-fv2jv  ip-10-0-23-112.us-east-2.compute.internal  545        557            22       4
1773721996  aide-test-memory-growth-jj685  ip-10-0-50-14.us-east-2.compute.internal   11         0              10       4
1773722010  aide-test-memory-growth-qtrrf  ip-10-0-64-246.us-east-2.compute.internal  599        559            18       4
1773722027  aide-test-memory-growth-x486j  ip-10-0-65-28.us-east-2.compute.internal   599        495            47       4
1773722116  aide-test-memory-growth-9ntwf  ip-10-0-7-117.us-east-2.compute.internal   599        1              11       5
1773722131  aide-test-memory-growth-cb22p  ip-10-0-63-206.us-east-2.compute.internal  9          0              8        5
1773722146  aide-test-memory-growth-fv2jv  ip-10-0-23-112.us-east-2.compute.internal  599        430            11       5
1773722160  aide-test-memory-growth-jj685  ip-10-0-50-14.us-east-2.compute.internal   144        449            13       5
1773722176  aide-test-memory-growth-qtrrf  ip-10-0-64-246.us-east-2.compute.internal  599        523            33       5
1773722191  aide-test-memory-growth-x486j  ip-10-0-65-28.us-east-2.compute.internal   599        495            47       5

@openshift-ci
Contributor

openshift-ci bot commented Mar 17, 2026

@Vincent056: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

