Possible 34.0.x regression: PROPFIND on a large directory saturates php-fpm CPU (not observed on 33.0.4) #61607

Description

@ACETyr

Bug description

After upgrading this instance from 33.0.4 to 34.0.1, PROPFIND requests to the WebDAV files endpoint (/remote.php/dav/files/...) that target a directory with a large number of entries cause the serving php-fpm worker to consume a full CPU core and exceed nginx's fastcgi read timeout (nginx logs "upstream timed out"). The requests were driven by a desktop client performing virtual-files discovery; several of them accumulated and saturated all CPU cores. System load reached ~30 on a 4-core host, and the Nextcloud container's php-fpm processes consumed ~384% CPU in total while every other container on the host was idle.

This instance ran 33.0.4 prior to the upgrade and did not exhibit this symptom, which is why I am reporting it as a possible regression in the 34.0.x line. I want to be explicit that I have not root-caused it (see "What I could not determine" below) — I am only reporting observed behaviour.

Steps to reproduce

I do not have a minimal isolated reproduction. Observed in normal operation:

  1. A desktop client with Virtual Files (Files-on-Demand) enabled syncs an account whose tree includes a directory with many files/subfolders.
  2. The client issues a PROPFIND on that directory.
  3. The serving php-fpm worker pins a CPU core and the request exceeds nginx's fastcgi read timeout. With the client's periodic retries, additional workers accumulate until all cores are saturated.
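
For anyone trying to isolate step 2 without the desktop client, a single PROPFIND can be timed by hand. This is only a minimal sketch: the URL, user and app password are placeholders, and the property list is an assumption that is smaller than what the desktop client may request, so it may not trigger the same code path.

```python
import base64
import time
import urllib.request

# Assumed minimal property set; the real desktop client may ask for more.
PROPFIND_BODY = b"""<?xml version="1.0"?>
<d:propfind xmlns:d="DAV:">
  <d:prop>
    <d:getlastmodified/>
    <d:getetag/>
    <d:resourcetype/>
  </d:prop>
</d:propfind>"""


def build_propfind(url, user, password, depth="1"):
    """Build a PROPFIND request with Basic auth and a Depth header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={
            "Depth": depth,  # "1" lists the directory's direct children
            "Content-Type": "application/xml; charset=utf-8",
            "Authorization": f"Basic {token}",
        },
    )


def time_propfind(req, timeout=300):
    """Send the request; return (HTTP status, body size in bytes, seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
        return resp.status, len(body), time.monotonic() - start
```

Usage would look like `time_propfind(build_propfind("https://<host>/remote.php/dav/files/<user>/<folder-with-many-entries>", "<user>", "<app-password>"))`, run once against 33.0.4 and once against 34.0.1 if both are available. A successful listing is expected to return status 207 (Multi-Status).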

Expected behaviour

PROPFIND on such a directory completes within a reasonable time without saturating CPU (as observed on 33.0.4).

Actual behaviour

The PROPFIND does not return within the fastcgi timeout and the worker remains CPU-bound. Multiple concurrent/retried requests accumulate and saturate all cores.

Logs (sanitized — host, user, internal IPs and folder name redacted)

nginx error.log:

[error] upstream timed out (110: Operation timed out) while reading response header from upstream,
  request: "PROPFIND /remote.php/dav/files/<user>/<folder-with-many-entries> HTTP/1.1",
  upstream: "fastcgi://127.0.0.1:9000"
[error] upstream timed out (110: Operation timed out) while reading response header from upstream,
  request: "GET /index.php/204 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000"   (this line then repeats ~every 60s)

Process / container CPU at the time of the event:

  • ps top consumers: ~37 × php-fpm84 processes, each 15–27% CPU.
  • docker stats: nextcloud at 384.42% CPU; all other containers below 1% (MariaDB was not among the CPU consumers — the load was entirely in php-fpm, not the database).
  • /proc/loadavg: ~30 (flat across 1/5/15 min) on a 4-core host.
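
To put the figures above in per-core terms, the arithmetic can be sketched as follows. The helper names are hypothetical, and the sample /proc/loadavg line is made up for illustration; only the ~30 load, 384.42% and 4 cores come from the observations above. It assumes docker stats reports CPU% relative to a single core, so 400% would mean four fully busy cores.

```python
def parse_loadavg(text):
    """Return the 1/5/15-minute load averages from a /proc/loadavg line."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def load_per_core(load, cores):
    """Runnable tasks per core; values above 1.0 mean tasks are queuing."""
    return load / cores


def container_core_fraction(docker_cpu_percent, cores):
    """Fraction of the host's total CPU used by one container."""
    return docker_cpu_percent / 100.0 / cores


if __name__ == "__main__":
    # Illustrative line shaped like /proc/loadavg (not captured output).
    one, five, fifteen = parse_loadavg("30.00 30.00 30.00 38/1200 12345")
    print(load_per_core(one, 4))               # 7.5 runnable tasks per core
    print(container_core_fraction(384.42, 4))  # about 0.96 of the host CPU
```

A load of ~30 on 4 cores is about 7.5 runnable tasks per core, and 384.42% corresponds to roughly 96% of the host's CPU capacity going to the nextcloud container.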

Recovery: docker restart nextcloud cleared the workers and load returned to baseline (~0.7). Stopping the desktop client prevented re-triggering.

What I could NOT determine (no assumptions included)

  • I did not capture a stack trace or profiler output of a CPU-bound worker, so I cannot identify the responsible code path. I can collect and attach one (e.g. cat /proc/<pid>/stack, strace -p <pid>, or an Excimer/Xdebug profile) if a maintainer tells me what is most useful and the condition recurs.
  • I cannot state the exact directory characteristic (entry count / nesting depth) that triggers it.
  • I have not confirmed this on a clean upstream (non-linuxserver) image.

Note on worker accumulation

In this deployment the workers did not self-terminate because PHP's max_execution_time is 0 and php-fpm's request_terminate_timeout is not set (image default). That explains why the workers accumulated rather than being reaped; I mention it only as context for the cascade, not as part of the suspected bug. Setting request_terminate_timeout bounds the blast radius but does not address the underlying high CPU usage of the PROPFIND itself.
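
The accumulation can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement: it assumes a stuck request never completes on its own, that the client retries at a fixed interval, and that each retry occupies a new worker. The retry interval and timeout values below are hypothetical; the client's actual retry cadence was not measured.

```python
def busy_workers(elapsed_s, retry_interval_s,
                 terminate_timeout_s=None, max_children=None):
    """Count workers still busy at elapsed_s in the toy model.

    A request starts at t = 0 and then every retry_interval_s seconds.
    Each one stays busy until killed after terminate_timeout_s seconds,
    or forever if no timeout is set. max_children caps the pool size.
    """
    busy = 0
    start = 0
    while start <= elapsed_s:
        age = elapsed_s - start
        if terminate_timeout_s is None or age < terminate_timeout_s:
            busy += 1
        start += retry_interval_s
    if max_children is not None:
        busy = min(busy, max_children)
    return busy


if __name__ == "__main__":
    # Hypothetical numbers: one retry per minute, observed after 10 minutes.
    print(busy_workers(600, 60))                           # 11, no timeout
    print(busy_workers(600, 60, terminate_timeout_s=120))  # 2, with timeout
```

Under these assumptions the number of busy workers grows without bound (up to the pool limit) when no timeout is set, and stays at roughly timeout divided by retry interval once request_terminate_timeout is configured. Each surviving worker still burns a full core until it is killed.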

Server configuration

  • Nextcloud Server: 34.0.1 (previously 33.0.4 — symptom not observed there)
  • Image: linuxserver/nextcloud, PHP 8.4 (php-fpm), nginx → fastcgi
  • Host: 4 cores, 128 GB RAM, Docker on Unraid
  • Database: MariaDB (idle during the event)
  • Reverse proxy in front: nginx (SWAG)

Client

  • Nextcloud Desktop Client 33.0.5 (build 20260519), Windows, Virtual Files (wincfapi) mode, syncing the account root.
