---
type: automatic_troubleshooting
sub_type: bsod
os: [Windows]
date: '2026-03-09'
---

## Symptom

A Windows system running Elastic Defend experiences a Blue Screen of Death (BSOD) or kernel crash. The memory dump analysis references `elastic_endpoint_driver.sys` or `elastic-endpoint-driver.sys`. The crash may occur shortly after an agent upgrade, after a third-party security product is installed or reconfigured, during heavy I/O workloads, or after the system has been running for some time with many network connections. In severe cases the system enters a boot loop.


## Summary

Elastic Defend uses a kernel-mode driver (`elastic_endpoint_driver.sys`) for file system filtering, network monitoring, and process/object callbacks. Most BSOD issues traced to the endpoint driver fall into a few categories: regressions introduced in specific driver versions, conflicts with other kernel-mode drivers (third-party security products), or running on unsupported OS versions.

Collecting a full kernel memory dump and sharing it with Elastic Support is essential for root-cause determination. The bugcheck code alone is not sufficient; the faulting call stack identifies which code path triggered the crash. Even then, the presence of Elastic Defend in the call stack does not by itself mean it is responsible for the crash.


## Common issues

### Network driver pool corruption (8.17.8, 8.18.3, 9.0.3)

A regression in the network driver introduced in Elastic Defend versions 8.17.8, 8.18.3, and 9.0.3 can cause kernel pool corruption on systems with a large number of long-lived network connections that remain inactive for 30+ minutes. The corruption manifests as BSODs with various bugcheck codes including `IRQL_NOT_LESS_OR_EQUAL`, `SYSTEM_SERVICE_EXCEPTION`, `KERNEL_MODE_HEAP_CORRUPTION`, or `PAGE_FAULT_IN_NONPAGED_AREA`.

This is the most frequently reported BSOD pattern and affects Windows Server environments with persistent connections (e.g. database servers, backup servers running Veeam with PostgreSQL).

**Affected versions**: 8.17.8, 8.18.3, 9.0.3 only.

**Fixed versions**: 8.17.9, 8.18.4, 9.0.4. Hotfix builds are also available: 8.18.3+build202507101319 and 9.0.3+build202507110136.

**Mitigation**: Upgrade to a fixed version. If immediate upgrade is not possible, set `advanced.kernel.network: false` in the Elastic Defend advanced policy settings to disable the kernel network driver.
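
To triage many endpoints quickly, a reported version string can be checked against the lists above. This is a minimal sketch, not an Elastic tool: it assumes the endpoint version is available as a plain string such as `8.18.3` or `8.18.3+build202507101319`, and the function name `network_regression_status` is hypothetical.

```python
# Hypothetical helper: classify an Elastic Defend version string against the
# network driver regression described in this document.

AFFECTED_BASE_VERSIONS = {"8.17.8", "8.18.3", "9.0.3"}
# Hotfix builds that ship the fix on top of an affected base version.
HOTFIX_BUILDS = {"8.18.3+build202507101319", "9.0.3+build202507110136"}


def network_regression_status(version: str) -> str:
    """Return 'hotfixed', 'affected', or 'not affected'."""
    version = version.strip()
    if version in HOTFIX_BUILDS:
        return "hotfixed"
    # Compare only the MAJOR.MINOR.PATCH part; any other build suffix on an
    # affected base version is conservatively treated as affected.
    base = version.split("+", 1)[0]
    if base in AFFECTED_BASE_VERSIONS:
        return "affected"
    return "not affected"
```

For example, `network_regression_status("8.18.3")` returns `"affected"`, while the listed hotfix build string returns `"hotfixed"`. A result of `"not affected"` only rules out this specific regression, not other issues in this document.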

### ODX-enabled volume crash (8.19.8, 9.1.8, 9.2.2)

A regression introduced in versions 8.19.8, 9.1.8, and 9.2.2 causes BSODs on systems with ODX (Offloaded Data Transfer) enabled volumes, particularly affecting Hyper-V clusters and Windows Server 2016 Datacenter. The crash can appear 2-3 hours after an agent upgrade, often triggered when the storage subsystem processes asynchronous offload write operations.

**Affected versions**: 8.19.8, 9.1.8, 9.2.2 only.

**Fixed version**: 9.2.4.

**Mitigation**: Upgrade to 9.2.4 or later, which contains the fix.

### Third-party kernel driver conflicts

Other security products running kernel-mode drivers can interfere with Elastic Defend's driver initialization or runtime operation. The most commonly reported conflicts include:

- **Trellix Access Control**: Trellix's kernel driver can intercept the Windows Base Filtering Engine (BFE) service, causing Defend's WFP (Windows Filtering Platform) driver initialization to hang or take an extremely long time. This interaction was introduced by an Elastic Defend refactor in 8.16.0. Fixed in 8.17.6, 8.18.1, and 9.0.1. Upgrade to a fixed version to resolve.

- **CrowdStrike, Kaspersky, Windows Defender coexistence**: Running multiple endpoint security products increases the probability of kernel-level interactions. Each additional kernel-mode filter driver introduces another point of contention for file system, registry, and network callbacks. When BSODs occur on systems with multiple security products, reduce the overlap by removing redundant products.
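
When reviewing the module list of a crash dump, the check for co-installed security drivers can be scripted. This is a minimal sketch, assuming the module names have been copied out of WinDbg into a list; the mapping contains only the drivers named in the investigation steps of this document and is not exhaustive, and `find_security_drivers` is a hypothetical name.

```python
# Hypothetical helper: spot known third-party security drivers in a module list.

KNOWN_SECURITY_DRIVERS = {
    "klflt.sys": "Kaspersky",
    "mfehidk.sys": "Trellix/McAfee",
    "csagent.sys": "CrowdStrike",
}


def find_security_drivers(module_names: list[str]) -> dict[str, str]:
    """Return {driver_file: vendor} for every known security driver present."""
    found: dict[str, str] = {}
    for raw in module_names:
        # Keep only the file name if a full image path was supplied.
        name = raw.strip().lower().rsplit("\\", 1)[-1]
        # WinDbg's `lm` usually lists module names without the .sys extension.
        if not name.endswith(".sys"):
            name += ".sys"
        vendor = KNOWN_SECURITY_DRIVERS.get(name)
        if vendor is not None:
            found[name] = vendor
    return found
```

For example, `find_security_drivers(["nt", "klflt", "elastic_endpoint_driver"])` returns `{"klflt.sys": "Kaspersky"}`. Matching is case-insensitive so that names copied from different tools compare equally.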

### Unsupported OS version

Upgrading Elastic Defend to a version that does not support the host's Windows version causes immediate BSODs or boot loops. Support for Windows Server 2012 R2 was dropped in 8.13.0 and re-added in 8.16.0. The system crashes during driver load because the driver uses kernel APIs unavailable on the older OS.

**Recovery**: Boot into Safe Mode or the Windows Recovery Environment (WinRE) and delete `C:\Windows\System32\drivers\elastic-endpoint-driver.sys`. This prevents the driver from loading on the next boot. Then move the agent to a policy without the Elastic Defend integration, or upgrade to a version that re-added support (8.16.0+ for Windows Server 2012 R2).

**Prevention**: Check the [Elastic Defend support matrix](https://www.elastic.co/support/matrix) before upgrading agents across a fleet. Use separate agent policies for older OS versions that require pinned agent versions.
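
The Windows Server 2012 R2 support gap described above can be expressed as a half-open version range. This sketch encodes only that one gap (8.13.0 up to, but not including, 8.16.0); the support matrix remains the authoritative source for every other OS and version, and the function names are hypothetical.

```python
# Hypothetical check for the Windows Server 2012 R2 support gap described above.

# [first unsupported version, first version with support re-added)
UNSUPPORTED_ON_2012_R2 = ((8, 13, 0), (8, 16, 0))


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn '8.16.0' or '9.0.3+build...' into (8, 16, 0) or (9, 0, 3)."""
    core = version.strip().split("+", 1)[0].split("-", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad '8.16' to (8, 16, 0)
    return parts[0], parts[1], parts[2]


def supports_server_2012_r2(version: str) -> bool:
    """False if the version falls inside the known 2012 R2 support gap."""
    low, high = UNSUPPORTED_ON_2012_R2
    return not (low <= parse_version(version) < high)
```

Before assigning a version to the policy used by 2012 R2 hosts, `supports_server_2012_r2("8.15.3")` returns `False`, while `"8.12.2"` and `"8.16.0"` return `True`. Tuple comparison keeps the range check correct across minor and patch boundaries.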

## Investigation priorities

1) Collect the full kernel memory dump (`C:\Windows\MEMORY.DMP` or minidumps from `C:\Windows\Minidump\`). Share the dump with Elastic.
2) Check the Elastic Defend version at the time of crash. Query `.fleet-agents*` for the agent version and `metrics-endpoint.metadata_current_*` for the endpoint version and OS details. Cross-reference against the known affected versions listed above (8.17.8, 8.18.3, 9.0.3 for network driver; 8.19.8, 9.1.8, 9.2.2 for ODX).
3) Determine whether the BSOD started after a specific agent or OS upgrade. Check `.fleet-agents*` for recent version changes and correlate with the crash timeline.
4) Identify other kernel-mode security products installed on the system. Look for drivers like `klflt.sys` (Kaspersky), `mfehidk.sys` (Trellix/McAfee), `csagent.sys` (CrowdStrike), or other filter drivers in the WinDbg module list.
5) Check the Windows version against the Elastic Defend support matrix. Query `metrics-endpoint.metadata_current_*` for `host.os.version` and `host.os.name`.
6) Look for gaps in endpoint metadata timestamps in `metrics-endpoint.metadata_current_*` — an offline gap followed by a version change often indicates a crash-recovery-rollback sequence.
7) Check `metrics-endpoint.policy-*` for `connect_kernel` failures, which indicate the driver failed to load or initialize properly after a crash.
8) If the system is in a boot loop, guide the user to boot into Safe Mode, delete the driver file at `C:\Windows\System32\drivers\elastic-endpoint-driver.sys`, then boot normally and downgrade or uninstall the agent.
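
Step 6 can be automated once metadata documents for a single host have been exported in time order. The sketch below assumes each sample is a `(timestamp, version)` pair; the function name is hypothetical, and the one-hour `max_gap` default is an arbitrary assumption that should be set above the normal metadata reporting interval of your deployment.

```python
from datetime import datetime, timedelta


def find_crash_rollbacks(
    samples: list[tuple[datetime, str]],
    max_gap: timedelta = timedelta(hours=1),
) -> list[tuple[datetime, datetime, str, str]]:
    """Return (last_seen, next_seen, old_version, new_version) for every gap
    longer than max_gap across which the reported version changed."""
    ordered = sorted(samples, key=lambda s: s[0])
    hits = []
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        # An offline gap alone is not enough; the version must also differ.
        if t1 - t0 > max_gap and v0 != v1:
            hits.append((t0, t1, v0, v1))
    return hits
```

Each hit marks a window worth correlating with the crash timeline and with the version changes seen in `.fleet-agents*`.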
---
type: automatic_troubleshooting
sub_type: device_control
link: https://elastic.co/docs/solutions/security/configure-elastic-defend/configure-an-integration-policy-for-elastic-defend
os: [Windows]
date: '2026-03-11'
---

## Symptom

A custom notification message has been configured in the Elastic Defend Device Control policy to display when a USB device is blocked, but the Windows system tray popup does not appear. Instead, the user sees only a generic Windows Explorer error stating the device is not accessible. Alternatively, device-specific allow/block rules based on `device.serial_number` do not match the intended device because the serial number field contains `0` or a seemingly random value.

> **Contributor review comment:** These two situations are completely different. Should a single MD doc cover them both or is it better to break this doc up? I'm holding off on reading this file until this is answered.
>
> I have no personal preference, I'm just bringing this up in case it helps with context windows.



## Summary

Elastic Defend's Device Control feature (available since 9.2) can block USB storage devices and display a custom notification message to the user. On Windows, USB mount events are generated by the SYSTEM service process rather than the interactive user session. Prior to 9.4.0, the notification popup was sent to the desktop session associated with the USB mount event — which is the SYSTEM service desktop. Because most systems do not have an interactive desktop for the SYSTEM service, the popup silently failed to display.

A separate limitation affects device-specific rules. The `device.serial_number` field is populated by querying the kernel driver via `IOCTL_STORAGE_QUERY_PROPERTY`, but USB serial numbers are inherently unreliable on Windows — many devices report `0`, an empty string, or a random value. This makes `device.serial_number` unsuitable as the sole identifier for device-specific allow/block rules. The `device.id` field contains the Windows PNP Device ID which is more consistently populated, but it has different semantics than a serial number and cannot be used interchangeably with `device.serial_number` in Device Control rules.


## Common issues

### Custom notification popup not appearing (pre-9.4.0)

When a USB device is blocked by Device Control, the endpoint generates a notification event and attempts to display the configured custom message as a Windows system tray popup. On versions prior to 9.4.0, the popup is targeted at the desktop session that originated the USB mount event. Because USB mount events come from the SYSTEM service process (`services.exe`, which runs in Session 0), the popup is sent to the SYSTEM service desktop, which is not an interactive session on most systems. The popup is created but never rendered.

This affects all Device Control block actions regardless of user privilege level. Even administrators logged into an interactive session do not see the popup because the popup is dispatched to the wrong session.

**Fixed in**: 9.4.0. The fix changes the behavior so that when a USB device event triggers a block notification, the popup is shown on all interactive desktop sessions instead of only the session that originated the mount event.

**Workaround** (pre-9.4.0): There is no workaround for displaying the custom notification. The device is still blocked — the user will see the standard Windows Explorer "device not accessible" error, which confirms the block is in effect even though the custom message is not shown.
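
The difference between the pre-9.4.0 and 9.4.0+ behavior can be illustrated with a toy model. This is not Elastic's implementation: the `Session` type and the `popup_targets` / `visible_popups` functions are hypothetical, and the model captures only the session-targeting rule described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    session_id: int
    interactive: bool  # False for Session 0 (services), True for a user desktop


def parse_version(version: str) -> tuple[int, int, int]:
    core = version.strip().split("+", 1)[0].split("-", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def popup_targets(version: str, origin_session_id: int, sessions: list[Session]) -> list[int]:
    """Sessions the popup is dispatched to under each version's rule."""
    if parse_version(version) >= (9, 4, 0):
        # 9.4.0+: every interactive desktop session.
        return [s.session_id for s in sessions if s.interactive]
    # Pre-9.4.0: only the session that originated the USB mount event.
    return [origin_session_id]


def visible_popups(version: str, origin_session_id: int, sessions: list[Session]) -> list[int]:
    """Dispatched popups that land on a desktop a user can actually see."""
    interactive = {s.session_id for s in sessions if s.interactive}
    return [sid for sid in popup_targets(version, origin_session_id, sessions) if sid in interactive]
```

With sessions 0 (services), 1 and 2 (logged-on users) and the mount event originating in Session 0, `visible_popups("9.3.0", 0, sessions)` is empty, while `visible_popups("9.4.0", 0, sessions)` returns `[1, 2]`.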

### Windows Do Not Disturb suppressing notifications

Even on 9.4.0+ where the popup is correctly dispatched to interactive desktops, Windows Do Not Disturb (Focus Assist) can suppress the notification. When DND is enabled, Windows silently drops or queues system tray popups from all applications, including Elastic Defend.

To diagnose: check Settings > System > Focus assist (Windows 10) or Settings > System > Notifications > Do not disturb (Windows 11/Server 2022+). If DND is enabled or configured for automatic rules (e.g. during presentations, full-screen apps), the notification will not appear until DND is disabled.

Also verify that notifications are enabled for the Elastic Endpoint application in Settings > System > Notifications. If the Elastic Endpoint application is explicitly set to "Off", no notifications will appear regardless of DND state.

### `device.serial_number` unreliable for device-specific rules

The `device.serial_number` field frequently contains `0` or a random single-digit value, making device-specific Device Control rules ineffective when they rely on this field. This is not a bug in Elastic Defend — USB serial numbers are inherently unreliable on Windows. Many USB devices do not have manufacturer-programmed serial numbers, and the Windows storage stack returns a generated instance ID instead of a true serial number.

Elastic Defend queries the serial number from the kernel driver using `DeviceIoControl` with `IOCTL_STORAGE_QUERY_PROPERTY`. When the device does not report a serial number, the query returns `0` or an empty value. Some devices report inconsistent values across different USB ports or after re-enumeration.

**Workaround**: Instead of relying solely on `device.serial_number`, use a combination of `device.vendor_id` and `device.product_id` to identify device classes. Query `logs-endpoint.events.device-*` for the target device to check which fields are reliably populated. The `device.id` field (which contains the Windows PNP Device ID) is more consistently available, but it is not currently usable as a condition field in Device Control rules.

**Improvement planned**: Groundwork has been done to re-gather the serial number in user space after the device connects, which may improve reliability for devices that expose the serial number through registry enumeration rather than the kernel storage query.
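
Checking which fields are reliably populated for a target device can be scripted over the events returned from `logs-endpoint.events.device-*`. This is a minimal sketch, assuming the events have been flattened into dicts keyed by dotted field names; the function names are hypothetical, and treating `0` or an empty serial as a placeholder reflects this document's guidance rather than an Elastic API.

```python
# Hypothetical helpers for judging Device Control identifier fields.

PLACEHOLDER_SERIALS = {"", "0"}
FIELDS = ("device.serial_number", "device.id", "device.vendor_id", "device.product_id")


def _value(event: dict, field: str) -> str:
    raw = event.get(field)
    return "" if raw is None else str(raw).strip()


def classify_field(events: list[dict], field: str) -> str:
    """Return 'missing', 'inconsistent', 'placeholder', or 'reliable'."""
    values = {_value(e, field) for e in events}
    if not values or "" in values:
        return "missing"
    if len(values) > 1:
        return "inconsistent"  # e.g. changes across USB ports or re-enumeration
    if field == "device.serial_number" and values <= PLACEHOLDER_SERIALS:
        return "placeholder"
    return "reliable"


def field_report(events: list[dict]) -> dict[str, str]:
    return {f: classify_field(events, f) for f in FIELDS}


def recommend_rule_fields(events: list[dict]) -> list[str]:
    """Prefer vendor/product IDs; add the serial only when it is reliable.
    device.id is left out because it is not usable as a rule condition."""
    fields = [f for f in ("device.vendor_id", "device.product_id")
              if classify_field(events, f) == "reliable"]
    if classify_field(events, "device.serial_number") == "reliable":
        fields.append("device.serial_number")
    return fields
```

Run it over several mount events for the same physical device, ideally on different ports, so that inconsistent values have a chance to show up.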

### Device Control rules not matching expected devices

When `device.serial_number` is unreliable, rules that use serial number conditions will either fail to match intended devices or unintentionally match unrelated devices that happen to share the same `0` or generated value. A rule configured to allow a specific USB drive by serial number `0` would match every device that reports `0` — effectively allowing all devices without true serial numbers.

Review Device Control rules that use `device.serial_number` conditions. For each rule, query `logs-endpoint.events.device-*` to verify the actual serial number value reported for the target device. If the value is `0`, empty, or inconsistent, switch to identifying the device by `device.vendor_id` and `device.product_id` instead.
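
The rule review above can be sketched as a cross-check between rule conditions and observed device events. The rule shape (`{"name": ..., "serial_number": ...}`) is a simplified, hypothetical stand-in for the real Device Control policy format, events are assumed to be flattened dicts with dotted keys, and `audit_serial_rules` is a hypothetical name.

```python
# Hypothetical audit of serial-number-based Device Control rules.

PLACEHOLDER_SERIALS = {"", "0"}


def _serial(value) -> str:
    return "" if value is None else str(value).strip()


def audit_serial_rules(rules: list[dict], events: list[dict]) -> dict[str, list[str]]:
    """Return {rule_name: problems} for rules whose serial condition looks unsafe."""
    # Which distinct devices (by device.id) reported each serial value?
    serial_to_devices: dict[str, set] = {}
    for event in events:
        serial = _serial(event.get("device.serial_number"))
        serial_to_devices.setdefault(serial, set()).add(event.get("device.id"))

    findings: dict[str, list[str]] = {}
    for rule in rules:
        if "serial_number" not in rule:
            continue  # rule does not use a serial condition
        serial = _serial(rule["serial_number"])
        problems = []
        if serial in PLACEHOLDER_SERIALS:
            problems.append("placeholder serial: matches every device reporting it")
        devices = serial_to_devices.get(serial, set())
        if not devices:
            problems.append("serial not seen in device events")
        elif len(devices) > 1:
            problems.append(f"serial reported by {len(devices)} different devices")
        if problems:
            findings[rule["name"]] = problems
    return findings
```

A rule that allows serial `0` is flagged twice when two different devices report `0`, which is exactly the over-matching case described above.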


## Investigation priorities

1) Confirm the Elastic Defend version. If pre-9.4.0, the missing notification popup is a known issue — upgrade to 9.4.0+ to resolve it.
2) If on 9.4.0+ and notifications still do not appear, check Windows Do Not Disturb and notification settings on the affected endpoint. Verify DND is disabled and Elastic Endpoint notifications are enabled.
3) For serial number issues, query `logs-endpoint.events.device-*` for the target device and inspect the `device.serial_number`, `device.id`, `device.vendor_id`, and `device.product_id` fields. Determine which fields are reliably populated and adjust Device Control rules accordingly.
4) Verify the Device Control policy configuration via `get_package_configurations` — confirm that custom notification text is configured and that device rules reference fields with reliable values.
5) Check `logs-endpoint.events.device-*` for device mount/unmount events to confirm Elastic Defend is detecting the USB device at all. If no device events are present, the Device Control feature may not be enabled in the policy.
6) For block actions that should be working but are not preventing device access, confirm the Device Control mode is set to "Block" rather than "Detect" in the integration policy.
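
The missing-popup checks above can be condensed into a small triage helper. The inputs are assumptions: you supply the endpoint version, the configured Device Control mode, and the Do Not Disturb and per-app notification state observed on the endpoint. The function name is hypothetical and the messages only restate the steps above.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    core = version.strip().split("+", 1)[0].split("-", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def triage_missing_popup(
    version: str, mode: str, dnd_enabled: bool, app_notifications_on: bool
) -> list[str]:
    """Return the likely reasons a Device Control block popup is not shown."""
    findings = []
    if mode.strip().lower() != "block":
        findings.append("Device Control is not in Block mode (for example, Detect).")
    if parse_version(version) < (9, 4, 0):
        findings.append("Pre-9.4.0 known issue: popup goes to the SYSTEM session; upgrade to 9.4.0+.")
    else:
        # Only relevant once the popup is dispatched to interactive desktops.
        if dnd_enabled:
            findings.append("Windows Do Not Disturb / Focus Assist may be suppressing the popup.")
        if not app_notifications_on:
            findings.append("Notifications are turned off for Elastic Endpoint in Windows settings.")
    return findings
```

An empty result means none of the documented causes apply, and the device events in `logs-endpoint.events.device-*` are the next place to look.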