EDIT/UPDATE:: A bug in Envisalink EVL4 (DSC) firmware version 01.0.203 (2023-01-27) uncovered a bug AlarmServer (or one of the supporting libraries), I've been running the old master branch from 2017) on Python 2.7, tornado 4.4.1).
The bug in the EVL4 firmware caused the EVL4 to continually send 67302, 67303, 67304 (Partition 2,3,4 busy). The traffic was about 30 messages from the Envisalink every second.
What happened was:
- AlarmServer would connect and login,
- The EVL4 started sending
67302, 67303, 67304 over and over.
- AlarmServer would send the
001 command to get status updates.
- EVL4 sends the normal status update responses mixed in with
67302, 67303, 67304.
- AlarmServer logged this output, which caused some sort of recursion to occur. When the recursion depth hit the Python default recursion limit of 1000, an exception was thrown
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method _TracebackLogger.__del__ of <tornado.concurrent. _TracebackLogger object at ...
Note: simply raising Python's recursion limit to 10,000 via sys.setrecursionlimit(10000) did NOT work around the problem. My recollection is that AlarmServer waits a second for the envisalink to stop sending updates to know that the initial state dump is complete. With the envisalink constantly sending messages, AlarmServer never sent anything to the envisalink after the initial 001 status dump.
There is a separate old/(known?) bug that AlarmServer isn't able to recover from an exception thrown while processing data from the envisalink. The AlarmServer process continues to run and other (I'm not sure of the correct terminology here: thread, coroutine, callback) continue to run. Things like ping will still get sent to the envisalink, but no responses will ever be processed.
So the problems/questions are:
- (This issue) why did these messages from the envisalink cause recursion to occur and how to fix that ?
- (old issue) Any easy way to make AlarmServer able to recover from an exception when receiving, or get AlarmServer to exit and restart if the receive (thread/coroutine/process) gets broken
I'm still investigating but wanted to post this in case someone else sees this problem or has a clue what's going on.
Here's the log:
...
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@158: RX < 610 - Zone 064 Restored
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed zone 64
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6734D4"
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 4 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 4
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6734D4"
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 4 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 4
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6734D4"
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 4 is Busy
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 4
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6734D4"
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 4 is Busy
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 4
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Traceback (most recent call last):
File "/usr/local/lib/python2.7/logging/__init__.py", line 861, in emit
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method _TracebackLogger.__del__ of <tornado.concurrent.
_TracebackLogger object at 0xb74d672c>> ignored
EDIT/UPDATE:: A bug in Envisalink EVL4 (DSC) firmware version 01.0.203 (2023-01-27) uncovered a bug AlarmServer (or one of the supporting libraries), I've been running the old master branch from 2017) on Python 2.7, tornado 4.4.1).
The bug in the EVL4 firmware caused the EVL4 to continually send
67302,67303,67304(Partition 2,3,4 busy). The traffic was about 30 messages from the Envisalink every second.What happened was:
67302,67303,67304over and over.001command to get status updates.67302,67303,67304.Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method _TracebackLogger.__del__ of <tornado.concurrent. _TracebackLogger object at ...Note: simply raising Python's recursion limit to 10,000 via
sys.setrecursionlimit(10000)did NOT work around the problem. My recollection is that AlarmServer waits a second for the envisalink to stop sending updates to know that the initial state dump is complete. With the envisalink constantly sending messages, AlarmServer never sent anything to the envisalink after the initial001status dump.There is a separate old/(known?) bug that AlarmServer isn't able to recover from an exception thrown while processing data from the envisalink. The AlarmServer process continues to run and other (I'm not sure of the correct terminology here: thread, coroutine, callback) continue to run. Things like ping will still get sent to the envisalink, but no responses will ever be processed.
So the problems/questions are:
I'm still investigating but wanted to post this in case someone else sees this problem or has a clue what's going on.
Here's the log: