Summary
The New Relic Python agent currently assumes long-lived processes (web servers, Celery prefork).
In fork-per-job frameworks like RQ, each job runs in its own short-lived child process.
This leads to data loss and high overhead when trying to instrument these jobs with the existing agent.
Desired Behavior
- Allow the parent process (worker launcher) to call `register_application()` once and keep a harvester alive.
- Child processes should be able to accept distributed trace headers, run the job, and emit lightweight job events to the parent.
- Parent consumer should handle the actual New Relic API calls (`accept_distributed_trace_headers`, `record_custom_event`, `record_exception`).
- This reduces per-job handshake cost and prevents metric/event loss when children exit with `os._exit()`.
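
To make the parent-side half of this concrete, here is a minimal sketch of how a consumer might replay one child event through the agent API. It is an illustration under assumptions, not an existing agent feature: `handle_event`, the `ForkedJob` event type, the `RQ` group, and the `job_name` payload field are all hypothetical names. The `agent` argument is expected to be the `newrelic.agent` module and is passed in so the sketch can be run without the agent installed. Replaying exceptions is left out, since that would first need an agreed format for serializing them in the child.

```python
def handle_event(agent, application, event):
    """Parent side: replay one child job event through the New Relic agent API.

    `agent` is assumed to be the `newrelic.agent` module and `application`
    the object returned by `register_application()` in the parent.
    """
    payload = dict(event.get("payload", {}))
    payload["outcome"] = event["type"]  # e.g. "job_finished" (hypothetical)

    # Trace headers can only be accepted while a transaction is active,
    # so the parent opens a background task on the child's behalf.
    task_name = payload.get("job_name", "unknown")  # hypothetical field
    with agent.BackgroundTask(application, name=task_name, group="RQ"):
        headers = event.get("trace_headers")
        if headers:
            agent.accept_distributed_trace_headers(
                headers, transport_type="Queue"
            )
        agent.record_custom_event("ForkedJob", payload)
```

In a real worker the parent would call `newrelic.agent.register_application()` once at startup and then invoke `handle_event(newrelic.agent, application, event)` for every event it receives, so no child ever performs its own handshake.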
Possible Solution
- Provide an official recipe for fork-per-job frameworks:
  - Parent process: start a consumer thread (UDS datagram, stdout, or Redis Streams) and register once.
  - Child process: fire-and-forget emit of `{type, payload, trace_headers}`.
- Or, add an optional feature flag (e.g. `fork_per_job.enabled=true`) with a built-in transport adapter.
- Maintain backward compatibility: default behavior unchanged.
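
The recipe option could look roughly like the following stdlib-only sketch, which uses a Unix domain datagram socket pair as the transport. Everything here is an assumption made for illustration: the helper names (`emit`, `consume`, `run_job_in_child`, `demo`), the `__stop__` sentinel, and the `job_finished` / `job_failed` event types are hypothetical, and the `traceparent` value is a placeholder rather than a valid header. The New Relic calls themselves are left to whatever callable is passed as `handle_event`.

```python
import json
import os
import socket
import threading

STOP = "__stop__"  # hypothetical sentinel type that ends the consumer loop


def emit(sock, event_type, payload, trace_headers=None):
    """Child side: fire-and-forget send of one job event as a single datagram."""
    message = {"type": event_type, "payload": payload,
               "trace_headers": trace_headers or {}}
    try:
        sock.send(json.dumps(message).encode("utf-8"))
    except OSError:
        pass  # telemetry failures must never break the job


def consume(sock, handle_event):
    """Parent side: pass each received event to handle_event until STOP arrives."""
    while True:
        event = json.loads(sock.recv(65536))
        if event["type"] == STOP:
            break
        handle_event(event)


def run_job_in_child(child_sock, job, trace_headers):
    """Fork, run the job in the child, report the outcome, exit via os._exit()."""
    pid = os.fork()
    if pid == 0:
        try:
            job()
            emit(child_sock, "job_finished", {"pid": os.getpid()}, trace_headers)
        except Exception as exc:
            emit(child_sock, "job_failed", {"error": repr(exc)}, trace_headers)
        finally:
            os._exit(0)  # skips atexit hooks, as fork-per-job workers do
    os.waitpid(pid, 0)


def demo():
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    received = []
    run_job_in_child(child_sock, lambda: None, {"traceparent": "placeholder"})
    run_job_in_child(child_sock, lambda: 1 / 0, {})
    emit(child_sock, STOP, {})
    # The demo drains the socket after the jobs ran to stay deterministic;
    # a real worker would keep this consumer thread running the whole time.
    consumer = threading.Thread(target=consume, args=(parent_sock, received.append))
    consumer.start()
    consumer.join(timeout=5)
    parent_sock.close()
    child_sock.close()
    return received
```

Because the child only serializes a small dictionary and writes one datagram before calling `os._exit()`, it needs no agent state of its own. The socket buffers the datagram in the kernel, so the event survives the child's exit and the parent can read it afterwards.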
Additional Context
- Today’s workaround is calling `register_application()`/`shutdown_agent()` per job, which is both slow and lossy.
- RQ is widely used for its simplicity and isolation guarantees, but its fork-per-job model makes New Relic integration hard.
- This feature would let RQ users adopt New Relic APM more easily, while keeping job isolation.