Bug description
After upgrading this instance from 33.0.4 to 34.0.1, PROPFIND requests to the WebDAV files endpoint (/remote.php/dav/files/...) targeting a directory that contains a large number of entries cause the serving php-fpm worker to consume a full CPU core and exceed the fastcgi read timeout (nginx logs upstream timed out). Driven by a desktop client performing virtual-files discovery, multiple such requests accumulated and saturated all CPU cores: system load reached ~30 on a 4-core host, with the Nextcloud php-fpm process consuming ~384% CPU while every other container on the host was idle.
This instance ran 33.0.4 prior to the upgrade and did not exhibit this symptom, which is why I am reporting it as a possible regression in the 34.0.x line. I want to be explicit that I have not root-caused it (see "What I could not determine" below) — I am only reporting observed behaviour.
Steps to reproduce
I do not have a minimal isolated reproduction. Observed in normal operation:
- A desktop client with Virtual Files (Files-on-Demand) enabled syncs an account whose tree includes a directory with many files/subfolders.
- The client issues a PROPFIND on that directory.
- The serving php-fpm worker pins a CPU core and the request exceeds nginx's fastcgi read timeout. With the client's periodic retries, additional workers accumulate until all cores are saturated.
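To narrow this down without the desktop client, the request in the steps above could be approximated and timed with a short script. This is a minimal sketch, not a confirmed reproduction. The base URL, user, app password and folder name are placeholders, and the `Depth: 1` header with an empty body (which RFC 4918 treats as an allprop request) is an assumption about what a directory listing looks like, not a capture of what the desktop client actually sends.

```python
import base64
import time
import urllib.parse
import urllib.request


def build_propfind(base_url, user, password, folder, depth="1"):
    """Build a PROPFIND request for one directory (all arguments are placeholders)."""
    path = urllib.parse.quote(f"{user}/{folder.strip('/')}")
    url = f"{base_url.rstrip('/')}/remote.php/dav/files/{path}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="PROPFIND",
        headers={"Depth": depth, "Authorization": f"Basic {token}"},
    )


def time_propfind(request, timeout=300):
    """Send the request and return (HTTP status, elapsed seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # drain the multistatus body so the whole response is timed
        return response.status, time.monotonic() - start
```

Calling `time_propfind(build_propfind(...))` against the affected directory on both 33.0.4 and 34.0.1 would give comparable numbers. A client-side `timeout` shorter than the fastcgi read timeout raises an exception instead of returning, which is itself a useful signal.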
Expected behaviour
PROPFIND on such a directory completes within a reasonable time without saturating CPU (as observed on 33.0.4).
Actual behaviour
The PROPFIND does not return within the fastcgi timeout and the worker remains CPU-bound. Multiple concurrent/retried requests accumulate and saturate all cores.
Logs (sanitized — host, user, internal IPs and folder name redacted)
nginx error.log:
[error] upstream timed out (110: Operation timed out) while reading response header from upstream,
request: "PROPFIND /remote.php/dav/files/<user>/<folder-with-many-entries> HTTP/1.1",
upstream: "fastcgi://127.0.0.1:9000"
[error] upstream timed out (110: Operation timed out) while reading response header from upstream,
request: "GET /index.php/204 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000" (this line then repeats ~every 60s)
Process / container CPU at the time of the event:
ps top consumers: ~37 × php-fpm84 processes, each 15–27% CPU.
docker stats: nextcloud at 384.42% CPU; all other containers below 1% (MariaDB was not among the CPU consumers — the load was entirely in php-fpm, not the database).
/proc/loadavg: ~30 (flat across 1/5/15 min) on a 4-core host.
Recovery: docker restart nextcloud cleared the workers and load returned to baseline (~0.7). Stopping the desktop client prevented re-triggering.
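To put the figures above in per-core terms, the arithmetic can be sketched as follows. The sample `/proc/loadavg` string is illustrative only (it is built from the approximate value of 30 reported here, not copied from a captured file), and the function names are hypothetical.

```python
def load_per_core(loadavg_text, cores):
    """Divide the 1/5/15-minute load averages by the core count."""
    one, five, fifteen = (float(value) for value in loadavg_text.split()[:3])
    return tuple(round(value / cores, 2) for value in (one, five, fifteen))


def host_cpu_fraction(docker_cpu_percent, cores):
    """docker stats reports 100% per core, so divide by cores * 100."""
    return round(docker_cpu_percent / (cores * 100), 3)


# Illustrative input shaped like /proc/loadavg, using the reported load of ~30.
print(load_per_core("30.00 30.00 30.00 38/512 4242", cores=4))  # (7.5, 7.5, 7.5)
print(host_cpu_fraction(384.42, cores=4))                       # 0.961
```

In other words, roughly 7.5 runnable tasks were queued per core, and the nextcloud container alone accounted for about 96% of the host's total CPU capacity.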
What I could NOT determine (no assumptions included)
- I did not capture a stack trace or profiler output of a CPU-bound worker, so I cannot identify the responsible code path. I can collect and attach one (e.g. cat /proc/<pid>/stack, strace -p <pid>, or an Excimer/Xdebug profile) if a maintainer tells me what is most useful and the condition recurs.
- I cannot state the exact directory characteristic (entry count / nesting depth) that triggers it.
- I have not confirmed this on a clean upstream (non-linuxserver) image.
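If the condition recurs, the first step toward a stack trace or strace capture would be choosing which worker `<pid>` to inspect. A minimal sketch of one way to do that is to sample CPU ticks from `/proc/<pid>/stat` twice and rank the workers by the difference. The `php-fpm` name prefix matches the `php-fpm84` process name seen here but is otherwise an assumption, the function names are hypothetical, and the script has to run somewhere that can see the workers' PIDs (on the host or inside the container).

```python
import os
import time


def command_name(stat_line):
    """Extract the command name, which sits between the first '(' and the last ')'."""
    return stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]


def cpu_ticks(stat_line):
    """Return utime + stime (clock ticks) from one /proc/<pid>/stat line."""
    # Split after the last ')' because the command name may contain spaces.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])  # stat fields 14 (utime) and 15 (stime)


def snapshot(name_prefix, proc="/proc"):
    """Map pid -> CPU ticks for every process whose name starts with name_prefix."""
    ticks = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as handle:
                line = handle.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        if command_name(line).startswith(name_prefix):
            ticks[int(entry)] = cpu_ticks(line)
    return ticks


def busiest_workers(name_prefix="php-fpm", interval=2.0, proc="/proc"):
    """Return (pid, ticks used during interval) pairs, busiest first."""
    before = snapshot(name_prefix, proc)
    time.sleep(interval)
    after = snapshot(name_prefix, proc)
    deltas = {pid: after[pid] - before[pid] for pid in after if pid in before}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)
```

The first PID returned by `busiest_workers()` would be the candidate for the tools listed above. This only identifies the busy process; it says nothing about which code path is responsible.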
Note on worker accumulation
In this deployment the workers did not self-terminate because php-fpm max_execution_time = 0 and no request_terminate_timeout is set (image default). That explains why the workers accumulated rather than being reaped; I mention it only as context for the cascade, not as part of the suspected bug. Setting request_terminate_timeout bounded the blast radius but does not address the underlying high CPU on the PROPFIND itself.
Server configuration
- Nextcloud Server: 34.0.1 (previously 33.0.4 — symptom not observed there)
- Image: linuxserver/nextcloud, PHP 8.4 (php-fpm), nginx → fastcgi
- Host: 4 cores, 128 GB RAM, Docker on Unraid
- Database: MariaDB (idle during the event)
- Reverse proxy in front: nginx (SWAG)
Client
- Nextcloud Desktop Client 33.0.5 (build 20260519), Windows, Virtual Files (wincfapi) mode, syncing the account root.