Skip to content

Add lazy loading for io counter to prevent crash on package import#45275

Open
tbequinor wants to merge 4 commits intoAzure:mainfrom
tbequinor:main
Open

Add lazy loading for io counter to prevent crash on package import#45275
tbequinor wants to merge 4 commits intoAzure:mainfrom
tbequinor:main

Conversation

@tbequinor
Copy link

Description

When I bumped azure-monitor-opentelemetry-exporter from 1.0.0b36 to 1.0.0b48 in my app, it failed to start. I am running a debian docker container in Azure App Service. With Supervisor and uwsgi. Here is parts of the stack trace:

...
File "/usr/src/app/.venv/lib/python3.13/site-packages/azure/monitor/opentelemetry/exporter/_performance_counters/init.py", line 4, in
from azure.monitor.opentelemetry.exporter._performance_counters._manager import (
enable_performance_counters,
)
File "/usr/src/app/.venv/lib/python3.13/site-packages/azure/monitor/opentelemetry/exporter/_performance_counters/_manager.py", line 52, in
_io_counters_initial = _PROCESS.io_counters()
File "/usr/src/app/.venv/lib/python3.13/site-packages/psutil/init.py", line 830, in io_counters
return self._proc.io_counters()
~~~~~~~~~~~~~~~~~~~~~~^^
File "/usr/src/app/.venv/lib/python3.13/site-packages/psutil/_pslinux.py", line 1718, in wrapper
raise AccessDenied(self.pid, self._name)
psutil.AccessDenied: (pid=41)
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. going in full dynamic mode ***
--- no python application found, check your startup logs for errors ---

Added lazy loading, so it won't do this on import, and more robust handling for when psutil don't have the required access.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings February 20, 2026 09:54
@github-actions github-actions bot added Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Exporter Monitor OpenTelemetry Exporter labels Feb 20, 2026
@github-actions
Copy link

Thank you for your contribution @tbequinor! We will review the pull request and get back to you soon.

@tbequinor tbequinor closed this Feb 20, 2026
@tbequinor tbequinor reopened this Feb 20, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a startup/import crash in azure-monitor-opentelemetry-exporter caused by calling psutil.Process().io_counters() at module import time in restricted environments (e.g., container/App Service), by deferring initialization until the metric callback runs.

Changes:

  • Make process I/O counter initialization lazy to avoid import-time psutil.AccessDenied failures.
  • Disable process I/O counters when io_counters is inaccessible/unavailable and return a safe zero observation.
  • Update/add unit tests to account for the new lazy-init/disable behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
sdk/monitor/azure-monitor-opentelemetry-exporter/azure/monitor/opentelemetry/exporter/_performance_counters/_manager.py Removes import-time I/O counter read and adds lazy initialization + disable-on-failure logic in _get_process_io.
sdk/monitor/azure-monitor-opentelemetry-exporter/tests/performance_counters/test_performance_counters.py Resets new global state and adds tests for AccessDenied + disabled I/O behavior.

Comment on lines +178 to +204
# Lazily initialize counters to avoid import-time failures in restricted environments.
if not _IO_INITIALIZED:
_IO_INITIALIZED = True
if _IO_AVAILABLE:
io_counters_func = getattr(_PROCESS, "io_counters", None)
if not callable(io_counters_func):
_IO_AVAILABLE = False
yield Observation(0, {})
return
io_counters_initial = io_counters_func()
_IO_LAST_COUNT = int(getattr(io_counters_initial, "read_bytes", 0)) + int(
getattr(io_counters_initial, "write_bytes", 0)
)
_IO_LAST_TIME = datetime.now()

if not _IO_AVAILABLE:
yield Observation(0, {})
return

io_counters_func = getattr(_PROCESS, "io_counters", None)
if not callable(io_counters_func):
_IO_AVAILABLE = False
yield Observation(0, {})
return

io_counters = io_counters_func()
rw_count = int(getattr(io_counters, "read_bytes", 0)) + int(getattr(io_counters, "write_bytes", 0))
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When _IO_INITIALIZED is False, the callback initializes _IO_LAST_COUNT/_IO_LAST_TIME using an io_counters() call, but then continues in the same invocation to call io_counters() again and compute a rate over a near-zero elapsed time. This can create a large spike on the first observed value and does an extra syscall. Consider treating the first call as a baseline: after setting _IO_LAST_COUNT/_IO_LAST_TIME, yield Observation(0, {}) (or return) so rate calculation starts on the next collection interval.

Copilot uses AI. Check for mistakes.
Comment on lines +207 to +236
@mock.patch("azure.monitor.opentelemetry.exporter._performance_counters._manager._PROCESS")
def test_get_process_io_access_denied_disables_io(self, mock_process):
"""Test process I/O retrieval disables I/O counters when access is denied."""
mock_process.io_counters.side_effect = psutil.AccessDenied(1)

import azure.monitor.opentelemetry.exporter._performance_counters._manager as manager_module

manager_module._IO_AVAILABLE = True
manager_module._IO_INITIALIZED = False

result = list(_get_process_io(None))

self.assertEqual(len(result), 1)
self.assertEqual(result[0].value, 0)
self.assertFalse(manager_module._IO_AVAILABLE)
mock_process.io_counters.assert_called_once_with()

@mock.patch("azure.monitor.opentelemetry.exporter._performance_counters._manager._PROCESS")
def test_get_process_io_disabled_does_not_call_io_counters(self, mock_process):
"""Test process I/O retrieval does not call io_counters when unavailable."""
import azure.monitor.opentelemetry.exporter._performance_counters._manager as manager_module

manager_module._IO_AVAILABLE = False
manager_module._IO_INITIALIZED = True

result = list(_get_process_io(None))

self.assertEqual(len(result), 1)
self.assertEqual(result[0].value, 0)
mock_process.io_counters.assert_not_called()
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests cover AccessDenied and the disabled path, but there’s no coverage for the new lazy-initialization behavior on the first _get_process_io() invocation (when _IO_INITIALIZED is False). Adding a test for that path would catch issues like first-sample spikes / double io_counters() calls, and would directly validate the regression described in the PR (no import-time crash / baseline initialization behavior).

Copilot uses AI. Check for mistakes.
Comment on lines 215 to 218
except (psutil.NoSuchProcess, psutil.AccessDenied, Exception) as e: # pylint: disable=broad-except
_logger.exception("Error getting process I/O rate: %s", e)
_IO_AVAILABLE = False
_logger.debug("Disabling process I/O counter due to inaccessible io_counters: %s", e)
yield Observation(0, {})
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except (psutil.NoSuchProcess, psutil.AccessDenied, Exception) block effectively catches all exceptions (because Exception is included) and permanently sets _IO_AVAILABLE = False. This means a transient/unknown failure will disable the metric for the lifetime of the process and may hide real bugs. Consider only disabling on expected non-recoverable cases (e.g., AccessDenied, missing/non-callable io_counters, NotImplementedError), and for unexpected exceptions keep logging (ideally with stack trace) without flipping _IO_AVAILABLE to False.

Copilot uses AI. Check for mistakes.
@tbequinor
Copy link
Author

tbequinor commented Feb 20, 2026

@microsoft-github-policy-service agree company="equinor"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Exporter Monitor OpenTelemetry Exporter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments