6 changes: 2 additions & 4 deletions scripts.d/ta/410_nvme_controllers_with_invalid_irq.sh
@@ -22,10 +22,8 @@ for PCI_DEVICE_ID in $(sudo lspci -mm | grep 'Non-Volatile memory controller' |
echo "The NVMe device at PCI address ${PCI_DEVICE_ID} appears to have"
echo "invalid IRQ routing. This is indicated by the presence of a negative number in the"
echo "\"Interrupt:\" line from lspci."
echo "This might not cause a problem, but it might prevent an NVMe drive from being claimed"
echo "by a Weka process."
echo "This can be caused by the presence of an enabled APIC device. Review your hardware,"
echo "firmware, and linux kernel settings if this is causing a problem"
echo "This can sometimes prevent a WEKA Process from receiving interrupts from the NVME"

Review comment:

Weka doesn't care about the interrupts. The issue was that the kernel couldn't allocate interrupts, which caused the kernel itself to fail to use the device. That in turn made Weka unable to use it, because the device couldn't be scanned to confirm that it is a Weka-signed device.

echo "Please review your hardware, firmware, and linux kernel settings if this is causing a problem"
fi
done
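
The failure mode described in the review comment (the kernel itself failing to use the device) can be probed directly rather than inferred from the "Interrupt:" line. The following is a minimal sketch, not part of this PR. It assumes that a device the kernel is using shows a `driver` symlink under its sysfs PCI directory. `SYSFS_PCI_ROOT`, `nvme_has_kernel_driver`, and `bound_driver_name` are hypothetical names introduced for illustration. Note that sysfs uses the full domain-prefixed address (for example `0000:3b:00.0`), whereas `lspci -mm` omits the domain unless `-D` is passed.

```shell
#!/bin/bash
# Sketch only: these names are illustrative and not part of the script under review.
# SYSFS_PCI_ROOT can be overridden so the functions can be exercised
# against a fake directory tree.
SYSFS_PCI_ROOT="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"

# Returns 0 if the PCI device at the given full address has a "driver"
# symlink in sysfs, i.e. some kernel driver is currently bound to it.
nvme_has_kernel_driver() {
    local pci_address="$1"
    [[ -L "${SYSFS_PCI_ROOT}/${pci_address}/driver" ]]
}

# Prints the name of the bound driver, or "none" if nothing is bound.
bound_driver_name() {
    local pci_address="$1"
    local link="${SYSFS_PCI_ROOT}/${pci_address}/driver"
    if [[ -L "${link}" ]]; then
        basename "$(readlink "${link}")"
    else
        echo "none"
    fi
}
```

A device that reports "none" here has not been claimed by any kernel driver, which is consistent with, but does not prove, the interrupt allocation failure the comment describes.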

35 changes: 35 additions & 0 deletions scripts.d/ta/415_check_extended_apic.sh
@@ -0,0 +1,35 @@
#!/bin/bash

#set -ue # Fail with an error code if there's any sub-command/variable error

DESCRIPTION="Check that extended APIC is available for assigning IRQs"
# script type is single, parallel, sequential, or parallel-compare-backends
SCRIPT_TYPE="parallel"
JIRA_REFERENCE=""
WTA_REFERENCE=""
KB_REFERENCE=""

RETURN_CODE="0"

# Check that extended APIC (or x2apic) is available, because it is required to
# provide more space for IRQs

grep -m1 -q -E '^flags.*(\<extapic|\<x2apic)' /proc/cpuinfo 2>/dev/null

Review comment:

Do we know for certain that all relevant platforms have x2apic, down to the oldest supported server platforms?

I have never paid enough attention to this to know myself.

Reply from the PR author (collaborator):

Apparently x2apic came out in 2008 with Nehalem. I think that's older than Weka.
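
One way to answer the platform question empirically is to report which of the two flags a given host exposes, so results can be compared across server generations. This is a hedged sketch rather than a proposed change. `apic_flags_present` is a hypothetical helper name, and it assumes the flags on the `flags` line are space-separated, as they are in `/proc/cpuinfo` on x86.

```shell
# Sketch only: apic_flags_present is an illustrative helper name.
# Prints which of the two flags checked by the script (extapic, x2apic)
# appear as whole words on the first "flags" line of a cpuinfo-style file,
# or "none" if neither is present.
apic_flags_present() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local flags_line flag found=""
    flags_line="$(grep -m1 -E '^flags' "${cpuinfo}" 2>/dev/null)"
    for flag in extapic x2apic; do
        # Pad with spaces so only whole-word matches count.
        if [[ " ${flags_line#*:} " == *" ${flag} "* ]]; then
            found="${found:+${found} }${flag}"
        fi
    done
    echo "${found:-none}"
}
```

Called with no argument it reads `/proc/cpuinfo`. Passing a path makes it possible to run it against cpuinfo captures collected from older hardware.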

EXT_APIC_STATUS=$?
if [[ ${EXT_APIC_STATUS} -eq 0 ]] ; then
    RETURN_CODE="0"
else
    RETURN_CODE="254"
    echo "There is no extended APIC available. This can prevent the assignment"
    echo "of enough IRQs to support all hardware, resulting in the kernel"
    echo "error message: vector space exhaustion. This in turn can completely"
    echo "prevent the kernel accessing devices such as NVMEs."
    echo "A frequent cause of no extended APIC is the disabling of IOMMUs"
fi

if [[ ${RETURN_CODE} -eq "0" ]]; then
    echo "Extended APIC reports available"
else
    echo "No extended APIC available"
fi
exit ${RETURN_CODE}
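
The script's message names "vector space exhaustion" as the kernel symptom. A complementary check, sketched here under assumptions, is to look for that phrase in the kernel log. `count_vector_exhaustion` is a hypothetical helper name, and the exact wording of the kernel message may differ between kernel versions, so the match is on the phrase from the script's own output only.

```shell
# Sketch only: count_vector_exhaustion is an illustrative helper name.
# Counts lines mentioning "vector space exhaustion" (case-insensitive) in
# kernel log text read from a file, or from stdin when no file is given.
count_vector_exhaustion() {
    grep -c -i 'vector space exhaustion' "${1:--}"
}
```

It could be used as `dmesg | count_vector_exhaustion`, which may require elevated privileges on hosts that restrict `dmesg`. A non-zero count would support the diagnosis printed by the script.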