output/flush: Correct EVE flushing logic #15088

Closed
jlucovsky wants to merge 4 commits into OISF:main from jlucovsky:flush.v3.7

Conversation

@jlucovsky (Contributor)

Continuation of #15085

Fix deficiencies in the EVE-flushing logic and adopt a simpler approach that:

  • Maintains a list of EVE file contexts

  • Periodically traverses the context list and flushes each context, at the interval set by the configuration value heartbeat.output-flush-interval

  • Coordinates with other users of the file contexts

    Add flushing logic driven by the file contexts. This is a simpler
    solution that removes the need for logger registration changes.

    Overview:
    Use the heartbeat-driven thread to periodically flush all registered EVE
    contexts via a global flush list.

    The global flush list is a mutex-protected TAILQ of LogFileFlushEntry
    nodes; each node points to a LogFileCtx. The list is guarded by
    log_file_flush_mutex.

    Periodic flushing is performed by a thread at the interval given by
    heartbeat.output-flush-interval (valid range: 1-60 seconds).
    LogFileFlushAll() is invoked to initiate flushing of the registered
    LogFileCtx structs; each struct's fp_mutex is held while the flush
    occurs, to synchronize with LogFileWrite activity.

    Interacts with file-rotation via the fp_mutex.

    Deadlock prevention: the log_file_flush_mutex must be obtained before
    the fp_mutex.

Note that holding the log_file_flush_mutex for the duration of the flushes is not strictly necessary; it is held anyway for:

  • Defense in depth
  • Low-latency acquisition by the background thread
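
For concreteness, here is a minimal C sketch of the pattern the overview describes. The names LogFileCtx, LogFileFlushEntry, log_file_flush_mutex, fp_mutex, and LogFileFlushAll() come from this PR; the field layout, the Flush callback, and the registration helpers are illustrative assumptions, not the exact Suricata code.

    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/queue.h>

    typedef struct LogFileCtx_ {
        pthread_mutex_t fp_mutex;   /* also taken by LogFileWrite and file rotation */
        void (*Flush)(struct LogFileCtx_ *);
    } LogFileCtx;

    typedef struct LogFileFlushEntry_ {
        LogFileCtx *ctx;
        TAILQ_ENTRY(LogFileFlushEntry_) entries;
    } LogFileFlushEntry;

    static TAILQ_HEAD(, LogFileFlushEntry_) flush_list =
            TAILQ_HEAD_INITIALIZER(flush_list);
    static pthread_mutex_t log_file_flush_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Register an EVE file context for periodic flushing. */
    void LogFileFlushRegister(LogFileCtx *ctx)
    {
        LogFileFlushEntry *e = calloc(1, sizeof(*e));
        if (e == NULL)
            return;
        e->ctx = ctx;
        pthread_mutex_lock(&log_file_flush_mutex);
        TAILQ_INSERT_TAIL(&flush_list, e, entries);
        pthread_mutex_unlock(&log_file_flush_mutex);
    }

    /* Remove a context from the flush list when it is closed. */
    void LogFileFlushUnregister(LogFileCtx *ctx)
    {
        pthread_mutex_lock(&log_file_flush_mutex);
        LogFileFlushEntry *e;
        TAILQ_FOREACH(e, &flush_list, entries) {
            if (e->ctx == ctx) {
                TAILQ_REMOVE(&flush_list, e, entries);
                free(e);
                break;          /* safe: iteration stops after removal */
            }
        }
        pthread_mutex_unlock(&log_file_flush_mutex);
    }

    /* Called by the heartbeat thread every output-flush-interval seconds.
     * Lock order: log_file_flush_mutex first, then each fp_mutex; taking
     * them in this order everywhere prevents deadlock against writers
     * and file rotation. */
    void LogFileFlushAll(void)
    {
        pthread_mutex_lock(&log_file_flush_mutex);
        LogFileFlushEntry *e;
        TAILQ_FOREACH(e, &flush_list, entries) {
            pthread_mutex_lock(&e->ctx->fp_mutex);
            e->ctx->Flush(e->ctx);
            pthread_mutex_unlock(&e->ctx->fp_mutex);
        }
        pthread_mutex_unlock(&log_file_flush_mutex);
    }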

Link to ticket: https://redmine.openinfosecfoundation.org/issues/8286

Describe changes:

  • Removed previous flushing logic based on packet logger registration
  • Added EVE flushing logic and register/unregister functions
  • Updated log-flush worker thread
  • Updated suricata.yaml/documentation to eliminate mention of threads performing flushing (a config sketch follows this list)
  • Clarified flushing applies to file output only
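
For reference, the setting would look roughly like this in suricata.yaml. This is a sketch: only the key name heartbeat.output-flush-interval is taken from this PR, and the exact section layout is an assumption.

    heartbeat:
      # Flush all registered EVE file outputs every N seconds
      # (valid range 1-60). Leaving it unset disables the
      # periodic flush.
      output-flush-interval: 30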

Updates:

  • Expanded commit message with design overview
  • Conditionally set Write/Close function pointers to lock/nolock variants (nolock when heartbeat flushing is disabled); see the sketch after this list.
  • Added performance summary
  • Addressed review comments
  • Removed unused function
  • Fixed commit-compile check issue
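
The lock/nolock selection could look like the following, building on the LogFileCtx sketch above and assuming the struct also carries a Write function-pointer member. The function names here are hypothetical, not the exact Suricata API.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical unlocked write path; the real function differs. */
    int LogFileWriteNoLock(LogFileCtx *ctx, const char *buf, size_t len);

    static int LogFileWriteLocked(LogFileCtx *ctx, const char *buf, size_t len)
    {
        pthread_mutex_lock(&ctx->fp_mutex);
        int r = LogFileWriteNoLock(ctx, buf, len);
        pthread_mutex_unlock(&ctx->fp_mutex);
        return r;
    }

    /* With heartbeat flushing disabled, no background thread ever
     * touches the file handle, so the per-write mutex cost can be
     * skipped entirely. Assumes an int (*Write)(LogFileCtx *,
     * const char *, size_t) member on LogFileCtx. */
    void LogFileSetWriteFn(LogFileCtx *ctx, bool flush_enabled)
    {
        ctx->Write = flush_enabled ? LogFileWriteLocked : LogFileWriteNoLock;
    }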

Performance Summary

Suricata Benchmark Analysis

Date: Fri Mar 20 02:31:46 PM EDT 2026
Binary: ./src/suricata | Config: suricata.yaml | PCAP: /home/jlucovsky/pcap


Test Configuration

Parameter            Values
Threading            true, false
Buffer sizes         0 (unset), 4kb, 8kb, 16kb, 32kb, 64kb
Flush intervals (s)  0, 15, 30, 60
Warmup runs          1
Min / Max runs       3 / 10

Summary

All 48 configurations cluster tightly between ~523–534 seconds, with two notable outliers at 32kb buffer + flush=0. The overall spread is small (~10s across "normal" configs), meaning no single parameter has a dramatic effect — but consistency (low std dev) varies considerably.

Metric         Value               Configuration
Fastest mean   523.484 s           threaded=false, 8kb, flush=30
Slowest mean   566.436 s           threaded=false, 32kb, flush=0 ⚠️
Most stable    ±0.261 s std dev    threaded=true, 0 buffer, flush=15
Most unstable  ±22.526 s std dev   threaded=false, 32kb, flush=0 ⚠️

Key Findings

1. Major Outlier: threaded=false, 32kb, flush=0

This configuration is a clear anomaly:

  • Mean of 566.4s vs ~524–528s for most other configs
  • Standard deviation of ±22.5s (range: 548–592s)
  • The equivalent threaded=true config (32kb, flush=0) also shows elevated variance (±10.7s, mean 532.6s)
  • Something about large buffers combined with no flush interval appears to cause instability — particularly when threading is off

Recommendation: Avoid 32kb+ buffer sizes with flush=0 in production.

2. Threaded=false Is Slightly Faster for Buffered I/O

For buffer sizes of 4–16kb, threaded=false configurations tend to run 2–4 seconds faster on average than their threaded=true counterparts. However, for the unset (0) buffer case, threaded=false performs worse (~530–532s vs ~527–529s).

This difference is within noise for most configs, but the pattern is consistent enough to be noteworthy.

3. 8kb Buffer Is the Sweet Spot

Across both threading modes, 8kb buffer consistently delivers the best or near-best performance with low variance:

Config                           Mean (s)   Std Dev (s)
threaded=false, 8kb, flush=30    523.484    ±2.670
threaded=false, 8kb, flush=0     524.201    ±1.269
threaded=false, 8kb, flush=60    524.992    ±1.858
threaded=true, 8kb, flush=30     524.665    ±1.251
threaded=true, 8kb, flush=15     525.802    ±1.988

4. Flush Interval Has Mixed Impact

Flush interval shows no consistent directional effect on mean performance, but flush=0 increases variance — especially at larger buffer sizes. Using flush=30 or flush=60 tends to stabilize results without a meaningful speed penalty.

5. Large Buffers (32kb, 64kb) Offer No Benefit

Buffers larger than 16kb do not improve performance and introduce more instability (higher std dev). The gains from buffering plateau around 8–16kb.


Full Results Table

Threaded = true

Buffer Flush=0 Flush=15 Flush=30 Flush=60
0 527.052 ± 2.662 527.814 ± 0.261 528.998 ± 0.830 528.333 ± 0.696
4kb 528.095 ± 2.497 527.261 ± 4.113 529.825 ± 3.324 527.871 ± 2.121
8kb 527.429 ± 1.010 525.802 ± 1.988 524.665 ± 1.251 527.086 ± 3.211
16kb 529.427 ± 0.891 527.174 ± 2.955 527.305 ± 3.325 529.413 ± 0.835
32kb 532.629 ± 10.672 ⚠️ 530.651 ± 2.989 527.177 ± 2.537 529.380 ± 2.450
64kb 529.396 ± 1.089 527.405 ± 2.920 528.758 ± 1.013 527.458 ± 2.388

Threaded = false

Buffer Flush=0 Flush=15 Flush=30 Flush=60
0 532.564 ± 2.339 529.360 ± 1.659 532.122 ± 2.294 528.272 ± 1.572
4kb 525.913 ± 0.851 524.182 ± 1.480 526.829 ± 0.941 524.874 ± 1.415
8kb 524.201 ± 1.269 525.880 ± 1.291 523.484 ± 2.670 524.992 ± 1.858
16kb 525.404 ± 3.380 528.273 ± 1.930 533.021 ± 0.679 527.344 ± 4.980
32kb 566.436 ± 22.526 ⚠️ 525.643 ± 1.795 526.295 ± 1.120 528.016 ± 3.201
64kb 527.183 ± 2.612 527.571 ± 1.382 526.214 ± 1.457 530.293 ± 2.570

Values shown as mean ± std dev in seconds. Global best: 523.484 s (threaded=false, 8kb, flush=30). ⚠️ = notable outlier.


Recommendations

Optimal configuration: threaded=false, buffer-size=8kb, output-flush-interval=30

  • Fastest mean (523.5s), acceptable variance

Safe conservative choice: threaded=true, buffer-size=8kb, output-flush-interval=30

  • Nearly as fast (524.7s), very low variance (±1.3s), threading may be preferable for operational stability

Avoid:

  • buffer-size=32kb with output-flush-interval=0 in either threading mode — high variance, worst-case runtimes of 545–592s
  • buffer-size=0 with threaded=false — consistently slower than buffered equivalents (~530–533s range)
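
Expressed as configuration, the recommended settings might look like this. A sketch only: heartbeat.output-flush-interval is from this PR, while placing threaded and buffer-size under the eve-log output is an assumption based on the benchmark's parameter names.

    outputs:
      - eve-log:
          enabled: yes
          filetype: regular
          filename: eve.json
          threaded: false     # marginally faster with buffered I/O (see above)
          buffer-size: 8kb    # the 8-16kb sweet spot; larger adds no benefit

    heartbeat:
      output-flush-interval: 30   # seconds; flush=0 showed higher variance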

Provide values to any of the below to override the defaults.

  • To use a Suricata-Verify or Suricata-Update pull request,
    link to the pull request in the respective _BRANCH variable.
  • Leave unused overrides blank or remove.

SV_REPO=
SV_BRANCH=
SU_REPO=
SU_BRANCH=

Commits:

  1. Add flushing logic driven by the file contexts. This is a simpler
     solution that removes the need for logger registration changes.
     (Design overview as quoted above.)
     Issue: 8286
  2. Remove packet-based flush logic in favor of the simpler solution.
     Issue: 8286

  3. Remove log flush functions and update registration logic, as
     context-based flushing doesn't require it.
     Issue: 8286

  4. Update the output-flushing description in the documentation and
     config template to reflect the EVE-based approach.
     Issue: 8286
@codecov

codecov Bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 22.72727% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.61%. Comparing base (bc4a055) to head (4f9b54c).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15088   +/-   ##
=======================================
  Coverage   82.61%   82.61%           
=======================================
  Files         990      990           
  Lines      271655   271581   -74     
=======================================
- Hits       224415   224378   -37     
+ Misses      47240    47203   -37     
Flag Coverage Δ
fuzzcorpus 61.06% <13.63%> (+0.02%) ⬆️
livemode 18.36% <22.72%> (-0.04%) ⬇️
netns 18.38% <22.72%> (-0.03%) ⬇️
pcap 45.24% <22.72%> (-0.04%) ⬇️
suricata-verify 66.13% <22.72%> (+<0.01%) ⬆️
unittests 58.84% <13.63%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@victorjulien added this to the 9.0 milestone Mar 23, 2026
@suricata-qa

WARNING:

field                        baseline   test   %
SURI_TLPR1_stats_chk
  .app_layer.flow.ftp-data   607        580    95.55%

Pipeline = 30472

@victorjulien (Member)

Merged in #15091, thanks!
