feat(observability): ESL channel lifecycle traces + sip.call_id correlation #83

Merged: Otoru merged 5 commits into main from feat/esl-lifecycle-traces on Jul 1, 2026

Otoru (Owner) commented on Jul 1, 2026

Adds OpenTelemetry spans for the semantic FreeSWITCH channel lifecycle and CUSTOM subclasses, plus the metrics and session/consumer/load-balancer telemetry that go with them.

Why

Genesis traces were missing the call lifecycle — a call could be answered, bridged, transferred, and hung up without a single span showing it. This adds freeswitch.channel.* and freeswitch.{sofia,callcenter,conference,valet}.* spans so a call can be followed end to end, and centralizes every metric instrument in one module (duplicate instrument definitions were tripping OTel SDK warnings).

Every channel span carries sip.call_id (the standard SIP Call-ID, from variable_sip_call_id). It is a stable per-call identifier, so it serves as a join key when correlating Genesis traces with another system's view of the same call. The join happens at the observability backend (Grafana/Tempo) by filtering/grouping on that attribute — not in code. Bridge spans also carry bridge.a_uuid / bridge.b_uuid for cross-leg grouping.
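
A minimal sketch of the attribute mapping described above, not the Genesis implementation. It treats a parsed ESL event as a plain dict of headers. The function name `channel_span_attributes` and the `channel.uuid` attribute key are hypothetical; `variable_sip_call_id` comes from the description, while the `Unique-ID` and `Bridge-A-Unique-ID` / `Bridge-B-Unique-ID` header names are assumed from typical FreeSWITCH events.

```python
from typing import Dict, Mapping


def channel_span_attributes(headers: Mapping[str, str]) -> Dict[str, str]:
    """Build span attributes for one channel event (illustrative only)."""
    # (ESL header name, span attribute name); "channel.uuid" is a made-up key.
    mapping = (
        ("variable_sip_call_id", "sip.call_id"),
        ("Unique-ID", "channel.uuid"),
        ("Bridge-A-Unique-ID", "bridge.a_uuid"),
        ("Bridge-B-Unique-ID", "bridge.b_uuid"),
    )
    # Skip headers that are absent or empty so spans never carry blank values.
    return {attr: headers[name] for name, attr in mapping if headers.get(name)}


event = {
    "Event-Name": "CHANNEL_BRIDGE",
    "Unique-ID": "leg-a",
    "variable_sip_call_id": "abc123@10.0.0.5",
    "Bridge-A-Unique-ID": "leg-a",
    "Bridge-B-Unique-ID": "leg-b",
}
attrs = channel_span_attributes(event)
```

With attributes shaped like this, any other system that records the same SIP Call-ID can be lined up against these spans by filtering on `sip.call_id` in the backend, with no shared trace context required.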

Notable points

  • Lifecycle / CUSTOM processors are on by default; opt out with GENESIS_TRACE_ESL_LIFECYCLE=0 or GENESIS_TRACE_CUSTOM_SUBCLASSES=0.
  • They only enrich telemetry — they never consume events that route to user handlers.
  • UUIDs go on spans only; metric attributes use low-cardinality enums.
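
The cardinality rule in the last bullet can be sketched roughly as an allow-list filter. This is an illustration under assumptions, not the code in this PR: `METRIC_ATTRIBUTE_KEYS`, `metric_attributes`, and the attribute keys other than `sip.call_id` are hypothetical names.

```python
from typing import Dict, Mapping

# Hypothetical allow-list: only bounded, enum-like keys may label a metric.
METRIC_ATTRIBUTE_KEYS = frozenset({"event.name", "hangup.cause", "direction"})


def metric_attributes(span_attributes: Mapping[str, str]) -> Dict[str, str]:
    """Keep only low-cardinality keys; UUIDs and Call-IDs stay on the span."""
    return {
        key: value
        for key, value in span_attributes.items()
        if key in METRIC_ATTRIBUTE_KEYS
    }


span_attrs = {
    "event.name": "CHANNEL_HANGUP",
    "hangup.cause": "NORMAL_CLEARING",
    "channel.uuid": "leg-a",              # per-call value: span only
    "sip.call_id": "abc123@10.0.0.5",     # per-call value: span only
}
labels = metric_attributes(span_attrs)
```

An allow-list fails safe: a new per-call attribute added to spans later does not become a metric label unless someone opts it in.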

Docs

AGENTS.md and docs/content/docs/Observability/{tracing,metrics,server}.md updated.

Otoru added 3 commits June 30, 2026 22:55
Add OpenTelemetry spans for the semantic FreeSWITCH channel lifecycle and
CUSTOM subclasses so Genesis traces can be correlated with the passive
sniffer (Otoru/sniffer) at the observability backend.

Correlation is attribute-based: every freeswitch.channel.* span carries
sip.call_id (= variable_sip_call_id), matching the sniffer's voip.call_id.
The join happens in Grafana/Tempo, not in code — no sniffer changes required.
Bridge spans carry bridge.a_uuid/bridge.b_uuid for cross-leg grouping.
W3C traceparent propagation is intentionally out of scope.

Centralize all metric instruments in genesis/protocol/metrics.py (removes
duplicate instrument definitions); add calls.active, channel.bridge.events,
channel.transfers, channel.codec.changes, dialplan.applications,
hangup.causes.q850, event.processing.duration, events.without_sip_call_id,
session.commands, consumer.handlers, loadbalancer.selections/errors, and
observable gauges for queue depth.

New processors (channel_lifecycle_processor, custom_subclass_processor) emit
freeswitch.channel.{create,progress,progress_media,answer,bridge,unbridge,
hangup,hangup_complete,destroy,execute,execute_complete,codec} and
freeswitch.{sofia.transfer,sofia.register,callcenter.info,conference.*,
valet.info} spans. Instrument session.sendmsg/await_complete,
consumer.start/stop, queue.wait_and_acquire, and load balancer selection.

Cardinality rule: UUIDs on spans only; metric attributes use low-cardinality
enums. All tests pass on 3.10/3.11/3.12 (tox); black + mypy clean.

Remove all references to the sniffer product and its internal voip.call_id
attribute from Genesis docs and code comments. sip.call_id is the standard
SIP Call-ID; document it as a generic cross-system join key instead.

- tracing.md: format lifecycle/CUSTOM/session spans with the standard
  Description/Attributes pattern; rename 'Sniffer correlation' section to
  'Cross-system correlation (sip.call_id)'.
- metrics.md: drop sniffer mention from the lifecycle intro and the
  events.without_sip_call_id description.
- server.md: split the opening paragraph into endpoint/config/start-mode lists.
- AGENTS.md: rename the correlation subsection and drop sniffer naming.
- lifecycle.py / telemetry.py / channel.py / tests: drop sniffer naming from
  docstrings and comments; rename SIP_CALL_ID test payload value.
Otoru changed the title from "feat(observability): ESL channel lifecycle traces + sniffer correlation" to "feat(observability): ESL channel lifecycle traces + sip.call_id correlation" on Jul 1, 2026
Otoru added 2 commits June 30, 2026 23:11
- lifecycle: extract attribute-key constants (S1192), replace empty
  `with start_as_current_span(...): pass` blocks with an _attr_span
  helper using start_span + end() (S108), make channel/custom processors
  sync and use `protocol` in a debug log (S1172, S7503), replace the
  13-branch if/elif with a _LIFECYCLE_EMITTERS dispatch dict (S3776)
- telemetry: split build_event_attributes into _header_attr_name /
  _scalar helpers and hoist _EXPLICIT_ATTRS to module scope (S3776)
- metrics: drop unnecessary list() on the WeakSet (S7504)
- base: record GenesisError instead of generic Exception (S112)
- ring: centralise the loadbalancer.backend attribute key (S1192)
- tests: call the now-sync processors without await
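
For readers unfamiliar with the dispatch-dict refactor named in the lifecycle item above, here is a rough, stdlib-only sketch of the shape. The names `_LIFECYCLE_EMITTERS`, `_attr_span`, and `channel_lifecycle_processor` come from the commit message; everything else is assumed, including the function bodies, the simplified single-argument signature, the `hangup.cause` key, and the in-memory `recorded` list that stands in for a real tracer.

```python
from typing import Callable, Dict, List, Mapping, Tuple

Headers = Mapping[str, str]

# Stand-in for a tracer: spans are appended here instead of being exported.
recorded: List[Tuple[str, Dict[str, str]]] = []


def _attr_span(name: str, attributes: Dict[str, str]) -> None:
    """Record an attribute-only span (stub for start_span followed by end())."""
    recorded.append((name, attributes))


def _emit_answer(headers: Headers) -> None:
    _attr_span(
        "freeswitch.channel.answer",
        {"sip.call_id": headers.get("variable_sip_call_id", "")},
    )


def _emit_hangup(headers: Headers) -> None:
    _attr_span(
        "freeswitch.channel.hangup",
        {
            "sip.call_id": headers.get("variable_sip_call_id", ""),
            "hangup.cause": headers.get("Hangup-Cause", ""),
        },
    )


# One entry per event name replaces a long if/elif chain.
_LIFECYCLE_EMITTERS: Dict[str, Callable[[Headers], None]] = {
    "CHANNEL_ANSWER": _emit_answer,
    "CHANNEL_HANGUP": _emit_hangup,
}


def channel_lifecycle_processor(headers: Headers) -> None:
    """Synchronous processor: look up the emitter and ignore unknown events."""
    emitter = _LIFECYCLE_EMITTERS.get(headers.get("Event-Name", ""))
    if emitter is not None:
        emitter(headers)


channel_lifecycle_processor(
    {"Event-Name": "CHANNEL_ANSWER", "variable_sip_call_id": "abc123"}
)
channel_lifecycle_processor({"Event-Name": "HEARTBEAT"})  # no emitter: ignored
```

Adding a lifecycle event then means adding one emitter function and one dict entry, which keeps the processor's branching flat.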
Otoru merged commit 051243c into main on Jul 1, 2026
12 checks passed