Fix AMI callback queue desynchronization causing dialer lockup under high call volume #66

Open
psycho84tr wants to merge 1 commit into IssabelFoundation:master from psycho84tr:fix-ami-callback-desync-lockup

@psycho84tr
Summary

Under high outbound call volume, the dialer can lock up completely due to a race condition in AMI (Asterisk Manager Interface) callback queue processing. This affects production systems running predictive/progressive dialer campaigns and requires a manual restart to recover.

Root Cause

Three related issues in AMIClientConn.class.php:

  1. wait_response() infinite loop — When an AMI response is lost or delayed, the synchronous wait loop blocks indefinitely, freezing the entire dialer event loop.

  2. Orphan responses break the event chain — AMI responses arriving without a matching callback in _queue_requests cause process_event() to return FALSE, halting event processing.

  3. Unsent request queue mismatch — When a queued request hasn't been sent yet but a response arrives for a different request, the handler assignment fails silently.
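
To make issue 2 concrete, here is a minimal sketch of a FIFO callback queue that treats an orphan response as a warning rather than a failure. It is not the actual AMIClientConn code: the class name CallbackQueueSketch, its methods, and the warnings array (standing in for the real logger) are all hypothetical.

```php
<?php
// Hypothetical model of a FIFO callback queue; not the real AMIClientConn.
class CallbackQueueSketch
{
    private $queue = array();    // pending callbacks, oldest first
    public $warnings = array();  // stand-in for the dialer's log

    // Register the callback that should receive the next response.
    public function enqueue($callback)
    {
        $this->queue[] = $callback;
    }

    // Dispatch one response. Always returns TRUE so the caller keeps
    // processing later events, even when the response is an orphan.
    public function dispatchResponse(array $response)
    {
        if (count($this->queue) == 0) {
            // Orphan: no callback is waiting. Record it and carry on.
            $this->warnings[] = 'orphan response dropped: ' . json_encode($response);
            return true;
        }
        $callback = array_shift($this->queue);
        call_user_func($callback, $response);
        return true;
    }
}

$q = new CallbackQueueSketch();
$seen = array();
$q->enqueue(function ($r) use (&$seen) { $seen[] = $r['ActionID']; });

// The first response has a callback; the second one is an orphan.
$first  = $q->dispatchResponse(array('Response' => 'Success', 'ActionID' => 'a1'));
$orphan = $q->dispatchResponse(array('Response' => 'Success', 'ActionID' => 'a2'));
```

In this model the orphan only produces a log entry. Returning FALSE at that point is what, per the description above, halted event processing.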

Additionally, AMIEventProcess.class.php crashes with a PHP Fatal Error (Call to a member function asyncOriginate() on a non-object) when _ejecutarOriginate() is called before the AMI connection is established after a process restart.

Changes

AMIClientConn.class.php:

  • Add 30-second timeout to wait_response() to prevent indefinite blocking
  • Handle orphan responses gracefully (log warning, return TRUE to continue processing)
  • Handle unsent-request/response mismatch by sending the next queued request and continuing
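
The timeout change can be pictured as a bounded polling loop. The sketch below is an assumption about the general shape, not the code in this PR: waitWithTimeout(), its parameters, and the simulated poll callbacks are hypothetical, and the real wait_response() reads from the AMI socket instead of calling a closure.

```php
<?php
// Hypothetical helper: call $poll until it returns a non-NULL response or
// $timeoutSeconds elapse. Returns NULL on timeout instead of blocking forever.
function waitWithTimeout($poll, $timeoutSeconds = 30.0, $sleepMicros = 1000)
{
    $deadline = microtime(true) + $timeoutSeconds;
    do {
        $response = call_user_func($poll);
        if ($response !== null) {
            return $response;
        }
        usleep($sleepMicros); // avoid a busy loop between polls
    } while (microtime(true) < $deadline);
    return null; // caller decides how to recover from the lost response
}

// Simulated source that answers on the third poll.
$polls = 0;
$answered = waitWithTimeout(function () use (&$polls) {
    $polls++;
    return ($polls >= 3) ? array('Response' => 'Success') : null;
}, 1.0);

// Simulated source that never answers; a short timeout keeps the demo fast.
$lost = waitWithTimeout(function () { return null; }, 0.05);
```

The caller has to treat a NULL result as "response lost" and continue. microtime(true) is wall-clock time, so a system clock adjustment during the wait could shorten or lengthen the effective timeout.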

AMIEventProcess.class.php:

  • Add NULL check for $this->_ami in _ejecutarOriginate() before calling marcarLlamada()
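
A guard of this kind might look like the following sketch. FakeAmi and OriginateGuardSketch are hypothetical stand-ins for the dialer's classes, and the description does not say what the fix does when the connection is missing. Recording the channel and returning FALSE is an assumption made for illustration only.

```php
<?php
// Hypothetical stand-in for the AMI client object.
class FakeAmi
{
    public $originated = array();

    public function asyncOriginate($channel)
    {
        $this->originated[] = $channel;
        return true;
    }
}

// Hypothetical stand-in for the process that places calls.
class OriginateGuardSketch
{
    public $ami = null;          // stays NULL until the AMI connection is up
    public $deferred = array();  // calls skipped while disconnected (assumption)

    public function originate($channel)
    {
        if (is_null($this->ami)) {
            // Without this check, the call below would be a fatal error.
            $this->deferred[] = $channel;
            return false;
        }
        return $this->ami->asyncOriginate($channel);
    }
}

$p = new OriginateGuardSketch();
$early = $p->originate('SIP/100');   // connection not ready yet
$p->ami = new FakeAmi();
$late = $p->originate('SIP/101');    // connection established
```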

Symptoms (before fix)

  • Dialer freezes completely during high-volume campaigns
  • Agents get logged out from their sessions
  • No new calls are placed
  • Web interface becomes unresponsive
  • Only a full dialer restart recovers the system
  • Log shows: "segundo Response sobreescribe primer Response" (second Response overwrites first Response) and "se pierde respuesta porque no hay callback encolado" (response is lost because no callback is queued)

Testing

  • Tested on a production Issabel 4 system with ~13 agents running predictive outbound campaigns
  • Confirmed the dialer survives high call volume without freezing after the fix
  • Warning messages appear in the log but the system continues operating normally

Made with Cursor

Under high call volume, the AMI event processing in AMIClientConn can
lock up the entire dialer due to three related issues:

1. wait_response() blocks indefinitely when an AMI response is lost
   or delayed, freezing the dialer event loop.
   Fix: Add a 30-second timeout so the dialer can recover.

2. Orphan AMI responses (no matching callback in queue) cause
   process_event() to return FALSE, breaking the event chain.
   Fix: Log a warning and return TRUE to continue processing.

3. When a queued request has not been sent yet but a response arrives,
   the handler assignment fails silently.
   Fix: Skip the unsent request, send the next one, and continue.

Additionally, in AMIEventProcess, _ejecutarOriginate() crashes with
"Call to a member function asyncOriginate() on a non-object" when
the AMI connection is not yet established after a process restart.
   Fix: Add a NULL check for $this->_ami before calling marcarLlamada().

These fixes prevent the dialer from freezing during high-volume
outbound campaigns and allow graceful recovery from AMI communication
glitches without requiring a manual restart.