feat(observability): ESL channel lifecycle traces + sip.call_id correlation #83
Merged
Add OpenTelemetry spans for the semantic FreeSWITCH channel lifecycle and
CUSTOM subclasses so Genesis traces can be correlated with the passive
sniffer (Otoru/sniffer) at the observability backend.
Correlation is attribute-based: every freeswitch.channel.* span carries
sip.call_id (= variable_sip_call_id), matching the sniffer's voip.call_id.
The join happens in Grafana/Tempo, not in code — no sniffer changes required.
Bridge spans carry bridge.a_uuid/bridge.b_uuid for cross-leg grouping.
W3C traceparent propagation is intentionally out of scope.
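As a rough illustration of the attribute-based correlation described above, the sketch below derives the correlation attributes from an ESL event represented as a plain dict. The function name `correlation_attributes` is hypothetical, and the `Bridge-A-Unique-ID`/`Bridge-B-Unique-ID` header names are an assumption about the bridge event payload; this is not the code from this PR.

```python
from typing import Dict, Mapping


def correlation_attributes(event: Mapping[str, str]) -> Dict[str, str]:
    """Build the span attributes used for cross-system correlation.

    Hypothetical helper: the real attribute builder in Genesis may differ.
    """
    attrs: Dict[str, str] = {}

    # sip.call_id mirrors the channel variable variable_sip_call_id.
    call_id = event.get("variable_sip_call_id")
    if call_id:
        attrs["sip.call_id"] = call_id

    # Assumed header names for the two legs of a bridge event.
    a_uuid = event.get("Bridge-A-Unique-ID")
    b_uuid = event.get("Bridge-B-Unique-ID")
    if a_uuid and b_uuid:
        attrs["bridge.a_uuid"] = a_uuid
        attrs["bridge.b_uuid"] = b_uuid

    return attrs


bridge_event = {
    "Event-Name": "CHANNEL_BRIDGE",
    "variable_sip_call_id": "abc123@example.invalid",
    "Bridge-A-Unique-ID": "leg-a",
    "Bridge-B-Unique-ID": "leg-b",
}
print(correlation_attributes(bridge_event))
```

Events without `variable_sip_call_id` simply produce no `sip.call_id` attribute in this sketch, which is the case the events.without_sip_call_id metric is meant to count.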
Centralize all metric instruments in genesis/protocol/metrics.py (removes
duplicate instrument definitions); add calls.active, channel.bridge.events,
channel.transfers, channel.codec.changes, dialplan.applications,
hangup.causes.q850, event.processing.duration, events.without_sip_call_id,
session.commands, consumer.handlers, loadbalancer.selections/errors, and
observable gauges for queue depth.
New processors (channel_lifecycle_processor, custom_subclass_processor) emit
freeswitch.channel.{create,progress,progress_media,answer,bridge,unbridge,
hangup,hangup_complete,destroy,execute,execute_complete,codec} and
freeswitch.{sofia.transfer,sofia.register,callcenter.info,conference.*,
valet.info} spans. Instrument session.sendmsg/await_complete,
consumer.start/stop, queue.wait_and_acquire, and load balancer selection.
Cardinality rule: UUIDs on spans only; metric attributes use low-cardinality
enums. All tests pass on 3.10/3.11/3.12 (tox); black + mypy clean.
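The cardinality rule can be sketched as a filter that keeps every attribute on the span but passes only an allow-listed subset to metrics. The key names and the `split_attributes` helper below are assumptions made for illustration, not the attribute set used in this PR.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical allow-list of low-cardinality keys that are safe on metrics.
_METRIC_SAFE_KEYS = frozenset({"event.name", "hangup.cause", "call.direction"})


def split_attributes(
    attrs: Mapping[str, str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (span_attrs, metric_attrs) for one event.

    Spans keep everything, including UUIDs; metrics keep only enum-like keys
    so the number of distinct time series stays bounded.
    """
    span_attrs = dict(attrs)
    metric_attrs = {k: v for k, v in attrs.items() if k in _METRIC_SAFE_KEYS}
    return span_attrs, metric_attrs


span_attrs, metric_attrs = split_attributes(
    {
        "event.name": "CHANNEL_HANGUP",
        "hangup.cause": "NORMAL_CLEARING",
        "channel.uuid": "5f0c2d1e-0000-4000-8000-000000000001",
        "sip.call_id": "abc123@example.invalid",
    }
)
print(metric_attrs)
```

An allow-list is used rather than a deny-list so that a newly added per-call attribute does not reach metrics by accident.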
Remove all references to the sniffer product and its internal voip.call_id attribute from Genesis docs and code comments. sip.call_id is the standard SIP Call-ID; document it as a generic cross-system join key instead.
- tracing.md: format lifecycle/CUSTOM/session spans with the standard Description/Attributes pattern; rename 'Sniffer correlation' section to 'Cross-system correlation (sip.call_id)'.
- metrics.md: drop sniffer mention from the lifecycle intro and the events.without_sip_call_id description.
- server.md: split the opening paragraph into endpoint/config/start-mode lists.
- AGENTS.md: rename the correlation subsection and drop sniffer naming.
- lifecycle.py / telemetry.py / channel.py / tests: drop sniffer naming from docstrings and comments; rename SIP_CALL_ID test payload value.
- lifecycle: extract attribute-key constants (S1192), replace empty `with start_as_current_span(...): pass` blocks with an _attr_span helper using start_span + end() (S108), make channel/custom processors sync and use `protocol` in a debug log (S1172, S7503), replace the 13-branch if/elif with a _LIFECYCLE_EMITTERS dispatch dict (S3776)
- telemetry: split build_event_attributes into _header_attr_name / _scalar helpers and hoist _EXPLICIT_ATTRS to module scope (S3776)
- metrics: drop unnecessary list() on the WeakSet (S7504)
- base: record GenesisError instead of generic Exception (S112)
- ring: centralise the loadbalancer.backend attribute key (S1192)
- tests: call the now-sync processors without await
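The if/elif-to-dispatch-dict refactor mentioned in the lifecycle item could look roughly like the sketch below. Only the name `_LIFECYCLE_EMITTERS` comes from the commit message; its contents, the emitter signatures, and `process_lifecycle_event` are assumptions, and the emitters return a span name instead of creating a real span so the example stays self-contained.

```python
from typing import Callable, Dict, Mapping, Optional

Event = Mapping[str, str]


def _emit_answer(event: Event) -> str:
    # A real emitter would start and end a span here; this one only names it.
    return "freeswitch.channel.answer"


def _emit_hangup(event: Event) -> str:
    return "freeswitch.channel.hangup"


# One entry per handled event name replaces one branch of the old if/elif chain.
_LIFECYCLE_EMITTERS: Dict[str, Callable[[Event], str]] = {
    "CHANNEL_ANSWER": _emit_answer,
    "CHANNEL_HANGUP": _emit_hangup,
}


def process_lifecycle_event(event: Event) -> Optional[str]:
    """Look up the emitter for this event name; ignore unknown events."""
    emitter = _LIFECYCLE_EMITTERS.get(event.get("Event-Name", ""))
    if emitter is None:
        return None
    return emitter(event)


print(process_lifecycle_event({"Event-Name": "CHANNEL_ANSWER"}))
print(process_lifecycle_event({"Event-Name": "HEARTBEAT"}))
```

With this shape, supporting another event means adding one function and one dict entry, and the processor body keeps a constant branch count.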
Adds OpenTelemetry spans for the semantic FreeSWITCH channel lifecycle and CUSTOM subclasses, plus the metrics and session/consumer/load-balancer telemetry that go with them.
Why
Genesis traces were missing the call lifecycle: a call could be answered, bridged, transferred, and hung up without a single span showing it. This adds freeswitch.channel.* and freeswitch.{sofia,callcenter,conference,valet}.* spans so a call can be followed end to end, and centralizes every metric instrument in one module (duplicate instrument definitions were tripping OTel SDK warnings).
Every channel span carries sip.call_id (the standard SIP Call-ID, from variable_sip_call_id). It is a stable per-call identifier, so it serves as a join key when correlating Genesis traces with another system's view of the same call. The join happens at the observability backend (Grafana/Tempo) by filtering/grouping on that attribute, not in code. Bridge spans also carry bridge.a_uuid/bridge.b_uuid for cross-leg grouping.
Notable points
GENESIS_TRACE_ESL_LIFECYCLE=0 or GENESIS_TRACE_CUSTOM_SUBCLASSES=0.
Docs
AGENTS.md and docs/content/docs/Observability/{tracing,metrics,server}.md updated.
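To make the sip.call_id join from the Why section concrete, the sketch below groups span records from two sources by that attribute. The record shape (dicts with `source`, `name`, and `attributes` keys) and the `group_by_call_id` helper are assumptions made for illustration; it mimics the kind of grouping a backend query performs and is not how Grafana/Tempo is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_by_call_id(spans: Iterable[Mapping]) -> Dict[str, List[str]]:
    """Group 'source:name' labels of spans that share a sip.call_id."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for span in spans:
        call_id = span.get("attributes", {}).get("sip.call_id")
        if call_id:  # spans without the join key cannot be correlated
            grouped[call_id].append(f"{span['source']}:{span['name']}")
    return dict(grouped)


spans = [
    {"source": "genesis", "name": "freeswitch.channel.answer",
     "attributes": {"sip.call_id": "abc123"}},
    {"source": "other", "name": "sip.invite",
     "attributes": {"sip.call_id": "abc123"}},
    {"source": "genesis", "name": "freeswitch.channel.create",
     "attributes": {}},
]
print(group_by_call_id(spans))
```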