I got called in to debug a system that had a sidecar in A2 after a mupdate to 17.2:
humility -a /data/local/images/sidecar/d/sp/build-sidecar-d-image-default-v1.0.56.zip --ip fe80::aa40:25ff:fe05:8e00%dut2 ringbuf seq
humility: connecting to fe80::aa40:25ff:fe05:8e00%5
humility: ring buffer drv_oxide_vpd::__RINGBUF in sequencer:
humility: ring buffer drv_packrat_vpd_loader::__RINGBUF in sequencer:
humility: ring buffer drv_sidecar_seq_server::__RINGBUF in sequencer:
NDX LINE GEN COUNT PAYLOAD
17 368 406 1 TofinoSequencerPolicyUpdate(Disabled)
18 350 406 1 TofinoSequencerTick(Disabled, A2 { error: None })
19 368 406 1 TofinoSequencerPolicyUpdate(Disabled)
20 350 406 1 TofinoSequencerTick(Disabled, A2 { error: None })
21 368 406 1 TofinoSequencerPolicyUpdate(Disabled)
22 350 406 1 TofinoSequencerTick(Disabled, A2 { error: None })
23 368 406 1 TofinoSequencerPolicyUpdate(Disabled)
24 350 406 1 TofinoSequencerTick(Disabled, A2 { error: None })
25 368 406 1 TofinoSequencerPolicyUpdate(Disabled)
26 350 406 1 TofinoSequencerTick(Disabled, A2 { error: None })
27 368 406 1 TofinoSequencerPolicyUpdate(Disabled)
28 350 406 1 TofinoSequencerTick(Disabled, A2 { error: None })
29 368 406 1 TofinoSequencerPolicyUpdate(Disabled)
30 350 406 1 TofinoSequencerTick(Disabled, A2 { error: None })
31 368 406 1 TofinoSequencerPolicyUpdate(Disabled)
0 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
1 368 407 1 TofinoSequencerPolicyUpdate(Disabled)
2 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
3 368 407 1 TofinoSequencerPolicyUpdate(Disabled)
4 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
5 368 407 1 TofinoSequencerPolicyUpdate(Disabled)
6 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
7 368 407 1 TofinoSequencerPolicyUpdate(Disabled)
8 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
9 368 407 1 TofinoSequencerPolicyUpdate(Disabled)
10 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
11 368 407 1 TofinoSequencerPolicyUpdate(Disabled)
12 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
13 368 407 1 TofinoSequencerPolicyUpdate(Disabled)
14 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
15 368 407 1 TofinoSequencerPolicyUpdate(Disabled)
16 350 407 1 TofinoSequencerTick(Disabled, A2 { error: None })
Looks like a commanded power-off of some kind; let's check thermal:
humility -a /data/local/images/sidecar/d/sp/build-sidecar-d-image-default-v1.0.56.zip --ip fe80::aa40:25ff:fe05:8e00%dut2 ringbuf thermal
humility: connecting to fe80::aa40:25ff:fe05:8e00%5
humility: ring buffer drv_i2c_devices::emc2305::__RINGBUF in thermal:
humility: ring buffer drv_i2c_devices::max31790::__RINGBUF in thermal:
humility: ring buffer task_thermal::__RINGBUF in thermal:
TOTAL VARIANT
6678 PowerDownAt
97 ControlPwm
11 AutoState(Boot)
2 AutoState(Running)
2 AutoState(Uncontrollable)
1 AutoState(Overheated)
8 FanAdded
8 AddedDynamicInput
2 PowerDownDueTo
2 PowerModeChanged
2 FanControllerInitialized
1 Start
1 ThermalMode(Auto)
1 CriticalDueTo
1 SetFanWatchdogOk
NDX LINE GEN COUNT PAYLOAD
8 1210 211 1 PowerDownAt(0x66e840)
9 1210 211 1 PowerDownAt(0x66ec28)
10 1210 211 1 PowerDownAt(0x66f010)
11 1210 211 1 PowerDownAt(0x66f3f8)
12 1210 211 1 PowerDownAt(0x66f7e8)
13 1210 211 1 PowerDownAt(0x66fbc8)
14 1210 211 1 PowerDownAt(0x66ffb0)
15 1210 211 1 PowerDownAt(0x670398)
16 1210 211 1 PowerDownAt(0x670780)
17 1210 211 1 PowerDownAt(0x670b68)
18 1210 211 1 PowerDownAt(0x670f50)
19 1210 211 1 PowerDownAt(0x671338)
20 1210 211 1 PowerDownAt(0x671720)
21 1210 211 1 PowerDownAt(0x671b08)
22 1210 211 1 PowerDownAt(0x671ef0)
23 1210 211 1 PowerDownAt(0x6722d8)
24 1210 211 1 PowerDownAt(0x6726c0)
25 1210 211 1 PowerDownAt(0x672aa8)
26 1210 211 1 PowerDownAt(0x672e90)
27 1210 211 1 PowerDownAt(0x673278)
28 1210 211 1 PowerDownAt(0x673660)
29 1210 211 1 PowerDownAt(0x673a48)
30 1210 211 1 PowerDownAt(0x673e30)
31 1210 211 1 PowerDownAt(0x674218)
0 1210 212 1 PowerDownAt(0x674600)
1 1210 212 1 PowerDownAt(0x6749e8)
2 1210 212 1 PowerDownAt(0x674dd0)
3 1210 212 1 PowerDownAt(0x6751b8)
4 1210 212 1 PowerDownAt(0x6755a0)
5 1210 212 1 PowerDownAt(0x675988)
6 1210 212 1 PowerDownAt(0x675d70)
7 1210 212 1 PowerDownAt(0x676158)
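As a quick check on how often PowerDownAt is being logged, the payloads in the dump above can be differenced. This is only a sketch: the helper name `deltas` is made up for illustration, and treating the payload as a tick count (possibly milliseconds) is an assumption, not something the dump confirms.

```rust
/// Differences between consecutive timestamps. Wrapping subtraction is used
/// so that a counter rollover would not panic in a debug build.
fn deltas(ts: &[u64]) -> Vec<u64> {
    ts.windows(2).map(|w| w[1].wrapping_sub(w[0])).collect()
}

/// The first six `PowerDownAt` payloads, copied from the thermal ring
/// buffer dump (NDX 8 through 13, GEN 211).
const POWER_DOWN_AT: [u64; 6] = [
    0x66e840, 0x66ec28, 0x66f010, 0x66f3f8, 0x66f7e8, 0x66fbc8,
];
```

Calling `deltas(&POWER_DOWN_AT)` gives steps of about 1000 per entry (0x3e8), so if the unit really is milliseconds, the 32-entry buffer holds only about half a minute of history once this starts.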
This looks like a thermal shutdown. Two things of note here: the repeated PowerDownAt seems like a bug, and it pushes the other, more useful entries out of the ring buffer. We did have a CriticalDueTo logged in the history, so I'm postulating that this was due to a xcvr read error. Additional attempts to go back to A0 did not work, as we were immediately shut down again:
humility -a /data/local/images/sidecar/d/sp/build-sidecar-d-image-default-v1.0.56.zip --ip fe80::aa40:25ff:fe05:8e00%dut2 ringbuf seq
humility: connecting to fe80::aa40:25ff:fe05:8e00%5
humility: ring buffer drv_oxide_vpd::__RINGBUF in sequencer:
humility: ring buffer drv_packrat_vpd_loader::__RINGBUF in sequencer:
humility: ring buffer drv_sidecar_seq_server::__RINGBUF in sequencer:
NDX LINE GEN COUNT PAYLOAD
17 368 409 1 TofinoSequencerPolicyUpdate(Disabled)
18 350 409 1 TofinoSequencerTick(Disabled, A2 { error: None })
19 368 409 1 TofinoSequencerPolicyUpdate(Disabled)
20 350 409 1 TofinoSequencerTick(Disabled, A2 { error: None })
21 368 409 1 TofinoSequencerPolicyUpdate(Disabled)
22 350 409 1 TofinoSequencerTick(Disabled, A2 { error: None })
23 368 409 1 TofinoSequencerPolicyUpdate(Disabled)
24 350 409 1 TofinoSequencerTick(Disabled, A2 { error: None })
25 368 409 1 TofinoSequencerPolicyUpdate(Disabled)
26 350 409 1 TofinoSequencerTick(Disabled, A2 { error: None })
27 368 409 1 TofinoSequencerPolicyUpdate(Disabled)
28 350 409 1 TofinoSequencerTick(Disabled, A2 { error: None })
29 368 409 1 TofinoSequencerPolicyUpdate(Disabled)
30 350 409 1 TofinoSequencerTick(Disabled, A2 { error: None })
31 368 409 1 TofinoSequencerPolicyUpdate(Disabled)
0 350 410 1 TofinoSequencerTick(Disabled, A2 { error: None })
1 368 410 1 TofinoSequencerPolicyUpdate(Disabled)
2 350 410 1 TofinoSequencerTick(Disabled, A2 { error: None })
3 368 410 1 TofinoSequencerPolicyUpdate(Disabled)
4 350 410 1 TofinoSequencerTick(Disabled, A2 { error: None })
5 368 410 1 TofinoSequencerPolicyUpdate(Disabled)
6 350 410 1 TofinoSequencerTick(Disabled, A2 { error: None })
7 368 410 1 TofinoSequencerPolicyUpdate(Disabled)
8 350 410 1 TofinoSequencerTick(Disabled, A2 { error: None })
9 403 410 1 ClearingTofinoSequencerFault(None)
10 368 410 1 TofinoSequencerPolicyUpdate(LatchOffOnFault)
11 368 410 1 TofinoSequencerPolicyUpdate(Disabled)
12 350 410 1 TofinoSequencerTick(Disabled, A2 { error: None })
13 368 410 1 TofinoSequencerPolicyUpdate(Disabled)
14 350 410 1 TofinoSequencerTick(Disabled, A2 { error: None })
15 368 410 1 TofinoSequencerPolicyUpdate(Disabled)
16 350 410 1 TofinoSequencerTick(Disabled, A2 { error: None })
I took a Hubris dump for further debugging, stored at /staff/core/hubris-2369, and proceeded to ignition-cycle the sidecar, at which point it came back online with no problem.
Some thoughts:
- This shouldn't be so sticky as to require an ignition cycle.
- Spamming the ring buffer with PowerDownAt entries, each with a new timestamp, is really unhelpful.
- If this is a xcvr temp issue as suspected, it is most probably incorrect behavior to totally shut down the switch!
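
One possible way to address the ring buffer spam is to coalesce consecutive entries of the same variant even when their payloads differ, keeping only the latest payload and a count. The sketch below is not the Hubris ringbuf API; `Trace`, `Entry`, and `CoalescingRing` are hypothetical names, and the `u32` payloads on `CriticalDueTo` and `PowerDownDueTo` are placeholders, since the dump does not show what those variants carry.

```rust
use std::collections::VecDeque;
use std::mem::discriminant;

/// Hypothetical trace events; variant names are borrowed from the dump,
/// payload types are assumptions made for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Trace {
    PowerDownAt(u64),
    CriticalDueTo(u32),
    PowerDownDueTo(u32),
}

/// One slot: the most recent payload, plus how many consecutive events of
/// the same variant it stands for.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    payload: Trace,
    count: u32,
}

/// Fixed-capacity buffer that drops the oldest entry when full.
struct CoalescingRing {
    entries: VecDeque<Entry>,
    capacity: usize,
}

impl CoalescingRing {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { entries: VecDeque::with_capacity(capacity), capacity }
    }

    /// Record an event. If the newest entry has the same variant, overwrite
    /// its payload and bump its count instead of using a new slot.
    fn record(&mut self, payload: Trace) {
        if let Some(last) = self.entries.back_mut() {
            if discriminant(&last.payload) == discriminant(&payload) {
                last.payload = payload;
                last.count = last.count.saturating_add(1);
                return;
            }
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front(); // evict the oldest entry
        }
        self.entries.push_back(Entry { payload, count: 1 });
    }
}
```

With this scheme, a run of PowerDownAt events occupies a single slot, so an earlier CriticalDueTo or PowerDownDueTo entry would stay visible. The trade-off is that the individual timestamps in the run are lost; only the latest one and the count survive.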