Merged
3 changes: 3 additions & 0 deletions .gitignore
@@ -94,3 +94,6 @@ docker/freeswitch/conf/**/.fsxml
docker/freeswitch/conf/**/*.fsxml
docker/freeswitch-test/config/logs/
docker/freeswitch-test/config/recordings/freeswitch/

# Local FreeSWITCH source checkout (research reference, not part of the project)
/freeswitch/
65 changes: 65 additions & 0 deletions AGENTS.md
@@ -287,6 +287,71 @@ async def my_feature():
raise
```

### Centralized metrics

All OTel metric instruments live in `genesis/protocol/metrics.py`. **Do not
re-declare an instrument that already exists there** — duplicate instrument
creation for the same metric name trips static analysis and produces OTel SDK
warnings. Import the instrument (and the `safe_add` / `safe_record` helpers)
from `genesis.protocol.metrics` instead:

```python
from genesis.protocol.metrics import (
    calls_active_counter,
    safe_add,
    safe_record,
    event_processing_duration,
)
```

`safe_add(counter, *args, **kwargs)` and `safe_record(histogram, *args, **kwargs)`
swallow OTel/no-provider errors so a missing exporter never crashes the protocol.
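To make the swallow-and-continue behaviour concrete, here is a minimal stand-in. It is a sketch only: the real helper in `genesis.protocol.metrics` may catch a narrower set of exceptions or log differently, and `BrokenCounter` / `ListCounter` are hypothetical test doubles, not Genesis or OTel classes.

```python
import logging

logger = logging.getLogger("metrics_sketch")


def safe_add(counter, *args, **kwargs):
    """Call counter.add(...) and swallow any error (sketch, not the Genesis source)."""
    try:
        counter.add(*args, **kwargs)
    except Exception:  # assumption: the real helper may be more selective
        logger.debug("metric add failed", exc_info=True)


class BrokenCounter:
    """Hypothetical instrument that fails as if no provider were configured."""

    def add(self, amount, attributes=None):
        raise RuntimeError("no meter provider configured")


class ListCounter:
    """Hypothetical instrument that records what was added."""

    def __init__(self):
        self.calls = []

    def add(self, amount, attributes=None):
        self.calls.append((amount, attributes))


# The failing instrument does not propagate its error to the caller.
safe_add(BrokenCounter(), 1, attributes={"direction": "inbound"})

# A working instrument receives the arguments unchanged.
working = ListCounter()
safe_add(working, 1, attributes={"direction": "inbound"})
```

The point of the wrapper is that telemetry failures stay on the telemetry side; protocol code calls `safe_add` and carries on regardless.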

### ESL channel lifecycle spans

`genesis/protocol/lifecycle.py` registers two event processors
(`channel_lifecycle_processor`, `custom_subclass_processor`) that emit
`freeswitch.channel.*`, `freeswitch.sofia.*`, `freeswitch.callcenter.*`,
`freeswitch.conference.*` and `freeswitch.valet.*` spans describing the
FreeSWITCH channel lifecycle. They run after the core protocol processors
(auth, command reply, disconnect) and only enrich telemetry: they never
consume events that route to user handlers. Both are on by default; opt out
with `GENESIS_TRACE_ESL_LIFECYCLE=0` or `GENESIS_TRACE_CUSTOM_SUBCLASSES=0`.
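A default-on flag of this kind can be read roughly as follows. This is a sketch under assumptions: only the `=0` opt-out is documented here, so treating every other value as "enabled" is a guess, and `flag_enabled` is a hypothetical helper rather than a Genesis function.

```python
import os


def flag_enabled(name, environ=None, default=True):
    """Return False only when the variable is set to "0" (assumed parsing rule)."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if value is None:
        return default  # unset means the default applies (on)
    return value.strip() != "0"


# Simulated environment: lifecycle spans opted out, CUSTOM subclass spans untouched.
env = {"GENESIS_TRACE_ESL_LIFECYCLE": "0"}
lifecycle_on = flag_enabled("GENESIS_TRACE_ESL_LIFECYCLE", env)
custom_on = flag_enabled("GENESIS_TRACE_CUSTOM_SUBCLASSES", env)
```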

Emitted spans (non-exhaustive): `freeswitch.channel.create`, `.progress`,
`.progress_media`, `.answer`, `.bridge`, `.unbridge`, `.hangup`,
`.hangup_complete`, `.destroy`, `.execute`, `.execute_complete`, `.codec`,
`freeswitch.call.update`, `freeswitch.sofia.transfer`, `freeswitch.sofia.register`,
`freeswitch.callcenter.info`, `freeswitch.conference.maintenance`,
`freeswitch.conference.cdr`, `freeswitch.valet.info`.

### Cross-system correlation (sip.call_id)

Every channel lifecycle span carries `sip.call_id`, taken from the ESL
`variable_sip_call_id` header, which holds the standard SIP `Call-ID`. This is a
stable per-call identifier that any other SIP observer of the same call will also
have, so it is the natural join key when correlating Genesis traces with
another system's traces of the same call. The join happens **at the
observability backend** (Grafana/Tempo), by filtering/grouping on `sip.call_id`,
not in code.

Cross-leg grouping: bridge spans carry `bridge.a_uuid` and `bridge.b_uuid`
(from `Bridge-A-Unique-ID` / `Bridge-B-Unique-ID`), so the a-leg and b-leg of a
call can be tied together at the backend.
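Genesis itself does not perform this grouping, but the idea can be illustrated on exported span attributes. The sketch below assumes each bridge span is available as a plain dict with the two attributes above; `group_legs` is a hypothetical helper, and a real backend would express the same thing as a query.

```python
def group_legs(bridge_spans):
    """Group channel UUIDs that are connected through bridge spans (union-find)."""
    parent = {}

    def find(uuid):
        parent.setdefault(uuid, uuid)
        while parent[uuid] != uuid:
            parent[uuid] = parent[parent[uuid]]  # path halving
            uuid = parent[uuid]
        return uuid

    for span in bridge_spans:
        a_root = find(span["bridge.a_uuid"])
        b_root = find(span["bridge.b_uuid"])
        if a_root != b_root:
            parent[b_root] = a_root

    groups = {}
    for uuid in list(parent):
        groups.setdefault(find(uuid), set()).add(uuid)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical data: leg-b is bridged twice (for example after a transfer).
spans = [
    {"bridge.a_uuid": "leg-a", "bridge.b_uuid": "leg-b"},
    {"bridge.a_uuid": "leg-b", "bridge.b_uuid": "leg-c"},
    {"bridge.a_uuid": "leg-x", "bridge.b_uuid": "leg-y"},
]
calls = group_legs(spans)
```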

The `genesis.events.without_sip_call_id` counter tracks channel events that
lack the correlation key (a correlation-gap signal). W3C `traceparent` /
`X-Tracespan` propagation is intentionally **out of scope**; the attribute join
is sufficient.

### Cardinality rule

UUIDs go **on spans only**, never as metric attributes. Metric attributes use
low-cardinality enums/labels (`channel.state`, `direction`, `hangup.cause`,
`application.name`, `bridge.result`, `transfer.type`, `loadbalancer.backend`,
...). `queue.depth` is a span attribute (not a metric label) for the same
reason.
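One way to keep the rule mechanical is an allow-list that decides which attributes may reach a metric. The sketch below is illustrative only: the allow-list contains just the labels named above (the real set is longer), and `split_attributes` is a hypothetical helper.

```python
# Partial allow-list: only the low-cardinality labels named in this section.
ALLOWED_METRIC_ATTRIBUTES = frozenset({
    "channel.state",
    "direction",
    "hangup.cause",
    "application.name",
    "bridge.result",
    "transfer.type",
    "loadbalancer.backend",
})


def split_attributes(attributes):
    """Return (metric_attrs, span_only_attrs) according to the allow-list."""
    metric_attrs = {
        key: value for key, value in attributes.items()
        if key in ALLOWED_METRIC_ATTRIBUTES
    }
    span_only = {
        key: value for key, value in attributes.items()
        if key not in ALLOWED_METRIC_ATTRIBUTES
    }
    return metric_attrs, span_only


observed = {
    "channel.uuid": "hypothetical-uuid-1",  # high cardinality: span only
    "direction": "inbound",
    "hangup.cause": "NORMAL_CLEARING",
    "queue.depth": 7,  # unbounded value: span only
}
metric_attrs, span_only = split_attributes(observed)
```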

## Pre-PR Checklist

**CRITICAL: Always run the full CI stack locally before opening a PR.**
71 changes: 71 additions & 0 deletions docs/content/docs/Observability/metrics.md
@@ -90,3 +90,74 @@ For programmatic access to load counts per destination, use the load balancer's
- **`genesis_timeouts_total`** (Counter)
  - Description: Number of timeouts
  - Attributes: `timeout.type` (wait, command, connection), `timeout.operation`, `timeout.duration`

## Channel lifecycle metrics

These metrics describe what a call is doing across its lifecycle — from the
moment FreeSWITCH creates the channel until it is destroyed. Use them together
with the [tracing](./tracing) spans to follow a call end to end, and see
[Cross-system correlation](./tracing#cross-system-correlation-sipcall_id) for
how `sip.call_id` lets you join these traces with another system's view of the
same call.

- **`genesis.calls.active`** (UpDownCounter)
  - Description: Number of calls currently active, by state and direction. Goes up when a channel is created and back down when it is destroyed.
  - Attributes: `channel.state`, `direction`

- **`genesis.channel.bridge.events`** (Counter)
  - Description: Bridges established and torn down, from the authoritative `CHANNEL_BRIDGE` / `CHANNEL_UNBRIDGE` events.
  - Attributes: `bridge.result` (`established`, `unbridged`), `hangup.cause`

- **`genesis.channel.transfers`** (Counter)
  - Description: Call transfers observed through the `sofia::transferor` and `sofia::transferee` events.
  - Attributes: `transfer.type` (`blind`, `attended`), `transfer.role`

- **`genesis.channel.codec.changes`** (Counter)
  - Description: Codec renegotiations observed through `CODEC` events.
  - Attributes: `channel.read_codec`, `channel.write_codec`

- **`genesis.dialplan.applications`** (Counter)
  - Description: Dialplan applications executed, from `CHANNEL_EXECUTE` and `CHANNEL_EXECUTE_COMPLETE`.
  - Attributes: `application.name`, `application.result` (`started`, `success`, `fail`)

- **`genesis.channel.hangup.causes.q850`** (Counter)
  - Description: Hangup causes grouped by Q.850 code.
  - Attributes: `hangup.cause.q850`

- **`genesis.event.processing.duration`** (Histogram)
  - Description: How long it takes to dispatch a single event through the processors and routing.
  - Attributes: `event.name`

- **`genesis.events.without_sip_call_id`** (Counter)
  - Description: Channel events that arrived without a `variable_sip_call_id`. A high value means those calls cannot be joined to another system's view of the same call via `sip.call_id`.
  - Attributes: (none)
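The up/down behaviour of `genesis.calls.active` can be modelled with a plain counter. This is a simplified sketch, not the Genesis implementation: it keys on direction only and leaves out `channel.state`, it assumes `CHANNEL_CREATE` and `CHANNEL_DESTROY` are the triggering events, and `on_channel_event` is a hypothetical function.

```python
from collections import Counter

# Stand-in for the UpDownCounter, keyed by call direction only.
active_calls = Counter()


def on_channel_event(event_name, direction):
    """Increment on channel creation, decrement on channel destruction."""
    if event_name == "CHANNEL_CREATE":
        active_calls[direction] += 1
    elif event_name == "CHANNEL_DESTROY":
        active_calls[direction] -= 1


# Two channels are created and the inbound one is destroyed again.
for name, direction in [
    ("CHANNEL_CREATE", "inbound"),
    ("CHANNEL_CREATE", "outbound"),
    ("CHANNEL_DESTROY", "inbound"),
]:
    on_channel_event(name, direction)
```

Because the instrument goes down as well as up, its current value reads as "calls in flight" rather than "calls seen so far".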

## Session, consumer, load balancer and queue metrics

- **`genesis.session.commands`** (Counter)
  - Description: `sendmsg` commands sent through a session, by application.
  - Attributes: `application.name`

- **`genesis.session.command.duration`** (Histogram)
  - Description: How long a session `sendmsg` command takes to complete.
  - Attributes: `application.name`

- **`genesis.consumer.handlers`** (Counter)
  - Description: How many times a consumer handler was invoked, by event.
  - Attributes: `event.name`

- **`genesis.loadbalancer.selections`** (Counter)
  - Description: Destinations picked by the load balancer, including when it falls back to the first available destination.
  - Attributes: `loadbalancer.backend`, `loadbalancer.result` (`selected`, `fallback`)

- **`genesis.loadbalancer.errors`** (Counter)
  - Description: Errors raised while selecting a destination.
  - Attributes: `loadbalancer.backend`, `error`

- **`genesis.commands.queue.depth`** (ObservableGauge)
  - Description: How many command replies are still pending. Useful to spot backpressure on the command path.
  - Attributes: (none)

- **`genesis.events.queue.depth`** (ObservableGauge)
  - Description: How many events are waiting to be processed. Useful to spot backpressure on the event path.
  - Attributes: (none)
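The two queue-depth metrics are observable gauges, meaning the value is read through a callback at collection time instead of being pushed on every change. The sketch below shows that pattern with the standard library only; `ObservableDepth` is a hypothetical stand-in, not the OTel instrument, and using an `asyncio.Queue` as the event queue is an assumption.

```python
import asyncio


class ObservableDepth:
    """Hypothetical stand-in for an observable gauge: holds a callback, not a value."""

    def __init__(self, name, callback):
        self.name = name
        self._callback = callback

    def collect(self):
        # Invoked by the "exporter" whenever a reading is needed.
        return self._callback()


async def main():
    events = asyncio.Queue()
    gauge = ObservableDepth("genesis.events.queue.depth", events.qsize)

    before = gauge.collect()
    for seq in range(3):
        events.put_nowait({"Event-Name": "HEARTBEAT", "seq": seq})
    during = gauge.collect()
    events.get_nowait()  # one event gets processed
    after = gauge.collect()
    return before, during, after


readings = asyncio.run(main())
```

A depth that keeps growing between collections is the backpressure signal mentioned in the descriptions above.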
15 changes: 14 additions & 1 deletion docs/content/docs/Observability/server.md
@@ -3,7 +3,20 @@ title: HTTP Server
weight: 30
---

A built-in HTTP server exposes health, readiness, and metrics. Port **8000** by default; set `GENESIS_OBSERVABILITY_PORT` to change it. With the CLI, the server starts automatically; with the library, you start it yourself (see below).
Genesis ships a built-in HTTP server that exposes three endpoints:

- **`/health`** — liveness probe (is the process up and connected?)
- **`/ready`** — readiness probe (can the app accept work yet?)
- **`/metrics`** — Prometheus scrape endpoint for all Genesis metrics

The server listens on port **8000** by default. Change it with the
`GENESIS_OBSERVABILITY_PORT` environment variable.

How it starts depends on how you run Genesis:

- **CLI** (`genesis consumer` / `genesis outbound`): the server starts
automatically.
- **Library**: you start the server yourself (see [Library](#library) below).
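A probe script can derive the endpoint URLs from the same environment variable. This is a sketch under assumptions: the host `127.0.0.1` and the choice to treat HTTP 200 as success are guesses about a local setup, and `observability_url` / `probe` are hypothetical helpers, not part of Genesis.

```python
import os
import urllib.request

DEFAULT_PORT = 8000  # documented default port


def observability_url(path, environ=None, host="127.0.0.1"):
    """Build the URL of an observability endpoint such as /health."""
    env = os.environ if environ is None else environ
    port = int(env.get("GENESIS_OBSERVABILITY_PORT", DEFAULT_PORT))
    return f"http://{host}:{port}{path}"


def probe(path, environ=None, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200 (assumed success code)."""
    url = observability_url(path, environ)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection refused, timeout, HTTP error status, ...
        return False


default_url = observability_url("/health", environ={})
custom_url = observability_url(
    "/metrics", environ={"GENESIS_OBSERVABILITY_PORT": "9100"}
)
```

Calling `probe("/ready")` against a running instance would then return `True` or `False` without raising.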

## Endpoints

119 changes: 119 additions & 0 deletions docs/content/docs/Observability/tracing.md
@@ -73,6 +73,125 @@ Genesis automatically creates spans for the following operations:
  - Description: Ringing a group of destinations
  - Attributes: `ring_group.mode`, `ring_group.size`, `ring_group.timeout`, `ring_group.has_balancer`, `ring_group.has_variables`, `ring_group.balanced`, `ring_group.result`, `ring_group.duration`, `ring_group.answered_uuid`, `ring_group.answered_dial_path`, `ring_group.error` (if error)

**ESL Channel Lifecycle Spans (`freeswitch.channel.*`):**

These spans follow a call across its FreeSWITCH lifecycle, from channel
creation to destruction. They carry the channel UUIDs and the SIP correlation
key on the span (see [Cross-system correlation](#cross-system-correlation-sipcall_id)).

- **`freeswitch.channel.create`**
  - Description: A new channel was created
  - Attributes: `channel.uuid`, `channel.call_uuid`, `channel.direction`, `sip.call_id`, `channel.destination_number`, `channel.context`

- **`freeswitch.channel.progress`** / **`freeswitch.channel.progress_media`**
  - Description: The call is progressing / early media is flowing
  - Attributes: `channel.state`, `answer.state`, codec names

- **`freeswitch.channel.answer`**
  - Description: The call was answered
  - Attributes: `channel.state`, `answer.state`, codec names

- **`freeswitch.channel.bridge`**
  - Description: Two channels were bridged together
  - Attributes: `bridge.a_uuid`, `bridge.b_uuid`, `other_leg.*`
  - Events: `bridge.established`

- **`freeswitch.channel.unbridge`**
  - Description: The bridge between two channels was torn down
  - Attributes: `bridge.a_uuid`, `bridge.b_uuid`, `hangup.cause`
  - Events: `bridge.torn_down`

- **`freeswitch.channel.hangup`**
  - Description: The channel is hanging up
  - Attributes: `hangup.cause`, `channel.state`
  - Events: `hangup.cause.<normalized>`

- **`freeswitch.channel.hangup_complete`**
  - Description: Hangup is complete and the call is finalized
  - Attributes: `hangup.cause`, `hangup.cause.q850`
  - Events: `call.finalized`

- **`freeswitch.channel.destroy`**
  - Description: The channel was destroyed
  - Attributes: `channel.uuid`, `sip.call_id`

- **`freeswitch.channel.execute`** / **`freeswitch.channel.execute_complete`**
  - Description: A dialplan application started / finished executing
  - Attributes: `application.name`, `application.uuid`, `application.data` / `application.response`
  - Events: `app.<name>.done`

- **`freeswitch.channel.codec`**
  - Description: The channel negotiated (or renegotiated) its codecs
  - Attributes: `channel.read_codec.*`, `channel.write_codec.*`

- **`freeswitch.call.update`**
  - Description: The caller ID or bridged state changed
  - Attributes: `bridged.to`, `caller.transfer_source`
  - Events: `caller_id.mutated`
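The span names above line up with FreeSWITCH event names, which the following sketch makes explicit. The mapping is inferred from the names and is an assumption, not taken from `lifecycle.py`; `span_for_event` is a hypothetical helper, and reading the channel UUID from the `Unique-ID` header is likewise assumed.

```python
CHANNEL_SUFFIXES = (
    "CREATE", "PROGRESS", "PROGRESS_MEDIA", "ANSWER", "BRIDGE", "UNBRIDGE",
    "HANGUP", "HANGUP_COMPLETE", "DESTROY", "EXECUTE", "EXECUTE_COMPLETE",
)

# Assumed mapping from ESL Event-Name to span name.
EVENT_TO_SPAN = {
    f"CHANNEL_{suffix}": f"freeswitch.channel.{suffix.lower()}"
    for suffix in CHANNEL_SUFFIXES
}
EVENT_TO_SPAN["CODEC"] = "freeswitch.channel.codec"
EVENT_TO_SPAN["CALL_UPDATE"] = "freeswitch.call.update"


def span_for_event(headers):
    """Return (span_name, attributes) for a lifecycle event, or None otherwise."""
    span_name = EVENT_TO_SPAN.get(headers.get("Event-Name"))
    if span_name is None:
        return None
    attributes = {
        "channel.uuid": headers.get("Unique-ID"),
        "sip.call_id": headers.get("variable_sip_call_id"),
    }
    # Drop attributes whose header was absent from the event.
    return span_name, {k: v for k, v in attributes.items() if v is not None}


example = span_for_event({
    "Event-Name": "CHANNEL_ANSWER",
    "Unique-ID": "hypothetical-uuid-1",
    "variable_sip_call_id": "abc123@203.0.113.5",
})
```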

**CUSTOM Subclass Spans:**

These spans cover the `CUSTOM` event subclasses FreeSWITCH emits for
transfers, registrations, callcenter, conference and valet parking.

- **`freeswitch.sofia.transfer`**
  - Description: A call transfer was observed
  - Attributes: `transfer.role` (`transferor` / `transferee`), `transfer.type` (`blind` / `attended`)
  - Events: `transfer.initiated`

- **`freeswitch.sofia.register`** / **`freeswitch.sofia.reinvite`** / **`freeswitch.sofia.replaced`**
  - Description: A SIP registration, reinvite or replace was observed
  - Attributes: `register.aor`, `register.action`, `gateway.name` / `gateway.state`, `sofia.profile`

- **`freeswitch.callcenter.info`**
  - Description: A callcenter queue event
  - Attributes: `cc.queue`, `cc.action`, `cc.agent`, `cc.member_uuid`, `cc.count`, `cc.selection`

- **`freeswitch.conference.maintenance`** / **`freeswitch.conference.cdr`**
  - Description: A conference maintenance or CDR event
  - Attributes: `conference.name`, `conference.profile`, `conference.action`, `conference.member_id`

- **`freeswitch.valet.info`**
  - Description: A valet parking event
  - Attributes: `valet.lot`, `valet.extension`, `valet.action`, `bridge.to_uuid`
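A `CUSTOM` event identifies itself through its subclass (for example `sofia::transferor`), and the span names above can be derived from that string. The sketch below is an assumption about how such a derivation could work, not the Genesis implementation; the prefix table, the `valet_parking` subclass prefix, and `span_for_subclass` are all hypothetical.

```python
# Assumed mapping from subclass prefix to span namespace.
SUBCLASS_PREFIXES = {
    "sofia": "sofia",
    "callcenter": "callcenter",
    "conference": "conference",
    "valet_parking": "valet",
}


def span_for_subclass(subclass):
    """Return (span_name, attributes) for a CUSTOM subclass, or None if unknown."""
    module, separator, action = subclass.partition("::")
    namespace = SUBCLASS_PREFIXES.get(module)
    if namespace is None or not separator or not action:
        return None
    # Both transfer subclasses collapse into one span; the role becomes an attribute.
    if module == "sofia" and action in ("transferor", "transferee"):
        return "freeswitch.sofia.transfer", {"transfer.role": action}
    return f"freeswitch.{namespace}.{action}", {}


transfer = span_for_subclass("sofia::transferee")
conference = span_for_subclass("conference::maintenance")
unknown = span_for_subclass("myapp::custom_thing")
```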

**Session / Consumer / Queue Spans:**

- **`session.sendmsg`** (`Session` module)
  - Description: A `sendmsg` command was sent through a session
  - Attributes: `channel.uuid`, `application.name`, `application.uuid`, `application.block`

- **`session.await_complete`** (`Session` module)
  - Description: Waits for a blocking `sendmsg` to complete (child of `session.sendmsg` when `block=True`)
  - Attributes: `channel.uuid`, `application.uuid`

- **`consumer.start`** / **`consumer.stop`** (`Consumer` module)
  - Description: The consumer subscribed to events / stopped
  - Attributes: `consumer.host`, `consumer.port`

- **`queue.wait_and_acquire`** (`Queue` module)
  - Description: Waiting to acquire an item from the queue
  - Attributes: `queue.id`, `queue.item_id`, `queue.depth` (span attribute, not a metric label)

## Cross-system correlation (sip.call_id)

Every `freeswitch.channel.*` span carries **`sip.call_id`**, taken from the ESL
`variable_sip_call_id` header. Its value is the standard SIP `Call-ID`, a
stable per-call identifier that any other SIP observer of the same call will
also have. That makes it a natural join key when you want to correlate Genesis
traces with traces from another system that observed the same call.

- The join happens **at the observability backend** (Grafana/Tempo or similar),
by filtering or grouping on `sip.call_id` — not in code.
- Cross-leg grouping: bridge spans carry **`bridge.a_uuid`** and
**`bridge.b_uuid`**, so the a-leg and b-leg of a call can be tied together.
- The `genesis.events.without_sip_call_id` metric counts channel events that
arrived without the correlation key — a signal that those calls cannot be
joined to another system's view.
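Although the join is meant to happen in the backend, its effect can be shown on exported span data. The sketch below assumes spans are available as dicts with a `name` and an `attributes` mapping; `join_by_call_id` and the `sbc.*` span names are hypothetical and stand in for whatever the other system emits.

```python
from collections import defaultdict


def join_by_call_id(genesis_spans, other_spans):
    """Pair span names from two systems that share the same sip.call_id."""
    by_call = defaultdict(lambda: {"genesis": [], "other": []})
    for source, spans in (("genesis", genesis_spans), ("other", other_spans)):
        for span in spans:
            call_id = span.get("attributes", {}).get("sip.call_id")
            if call_id:  # spans without the key cannot be joined
                by_call[call_id][source].append(span["name"])
    # Keep only calls that both systems observed.
    return {
        call_id: sides
        for call_id, sides in by_call.items()
        if sides["genesis"] and sides["other"]
    }


genesis_spans = [
    {"name": "freeswitch.channel.create", "attributes": {"sip.call_id": "call-1"}},
    {"name": "freeswitch.channel.answer", "attributes": {"sip.call_id": "call-1"}},
    {"name": "freeswitch.channel.create", "attributes": {}},  # no correlation key
]
other_spans = [
    {"name": "sbc.invite", "attributes": {"sip.call_id": "call-1"}},
    {"name": "sbc.invite", "attributes": {"sip.call_id": "call-2"}},
]
joined = join_by_call_id(genesis_spans, other_spans)
```

The span without `sip.call_id` drops out of the result, which is the gap that `genesis.events.without_sip_call_id` is meant to surface.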

The lifecycle/CUSTOM processors are on by default. Opt out with
`GENESIS_TRACE_ESL_LIFECYCLE=0` or `GENESIS_TRACE_CUSTOM_SUBCLASSES=0`.

## Configuration

Install the OpenTelemetry SDK: