Proposes a dedicated NetFlow generator using an actor-model simulation where network hosts execute behavioral sequences that produce correlated flow records as a byproduct. Covers v5 and v9 wire formats, explains why the block cache is inapplicable (temporal compression), and follows the dedicated-generator precedent set by ProcessTree/FileTree/Container.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8603b88 to b2ea346
garrison-stauffer left a comment
Looks very cool! One question that I might have missed the answer to in the doc: is there a single client for the agent? The agent aggregates by these fields; in a customer environment this is typically a router, firewall, or L3 switch, as opposed to individual clients on the network.
Good question. Let me clarify the ADR.
> Each field is individually valid. The flow, while technically valid, does not
> represent 'realistic' sufficient to make claims about the target, except in
> extremis.

fwiw I will likely end up needing a custom generator for CWS as well.
They will need a "shape" of kernel traffic that exercises the right code paths. Completely arbitrary kernel events could easily bypass the functionality of CWS.
So I am also super curious about what we do here.
> - **Semantic assumptions**: The behavior definitions encode assumptions about
>   "realistic" traffic that may not match all deployment environments
> - **Runtime generation cost**: Unlike block-cache generators, the dedicated

Should we establish any baseline requirements for runtime generators?
- "Must have associated benchmarks"
- "Throughput must be X"
Hmm. Let's talk about this when I get into the implementation side. I think yes, but I'm not exactly sure what to claim yet.
Happy to start with this and have a follow-up ADR that addresses the runtime generation costs.
It's easier for me to argue about this once we have some guarantees/better understanding of what we mean by "lading must run faster than the target"/ "lading must not be the bottleneck". The more benchmarking I'm iterating on, the clearer that's getting.
We don't currently enforce any invariants/constraints on lading - as such, it wouldn't make sense to constrain the solution we need for the problem you've highlighted.
> It's easier for me to argue about this once we have some guarantees/better understanding of what we mean by "lading must run faster than the target"/ "lading must not be the bottleneck". The more benchmarking I'm iterating on, the clearer that's getting.

Agreed. So far that's been a design goal without a quantitative measure. I'd be real pleased to have that resolved.
Note: we might want to add/tweak some of the documentation in AGENTS.md afterward.
Agreed. I'll ping you; I'm curious to work on that together. You've improved over me in this area.
Introduces exporters as a first-class config concept: each exporter represents a router/firewall/switch with its own bind address, protocol version, source_id, flows_per_second, and actor pool. The Agent aggregates by ExporterAddr (UDP source IP), so distinct loopback addresses (127.0.0.1, 127.0.0.2, etc.) model distinct network devices.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@garrison-stauffer solid catch here. I had completely overlooked this. I've adjusted the proposed config so that we explicitly set exporters / clients. I'm thinking the interior implementation will just have separate forks of the same model, serialization code, etc., and bind to unique sockets on the same host.
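As a rough illustration of the "unique sockets on the same host" idea, here is a minimal sketch using only the Rust standard library. The `Exporter` struct, its fields, and both function names are hypothetical and are not taken from the ADR or from lading. It also assumes a Linux host, where the whole 127.0.0.0/8 range is bound to the loopback interface; other platforms may need an explicit alias for 127.0.0.2.

```rust
use std::io;
use std::net::{IpAddr, Ipv4Addr, SocketAddr, UdpSocket};

/// Hypothetical exporter description; names are illustrative, not lading's config.
struct Exporter {
    /// Local address the socket binds to; this becomes the UDP source IP.
    bind_addr: Ipv4Addr,
    /// Opaque bytes standing in for a serialized NetFlow packet.
    payload: Vec<u8>,
}

/// Send one datagram per exporter to `collector`, each from its own bound socket.
fn send_from_each(exporters: &[Exporter], collector: SocketAddr) -> io::Result<()> {
    for exporter in exporters {
        // Port 0 asks the OS for an ephemeral source port.
        let socket = UdpSocket::bind((exporter.bind_addr, 0))?;
        socket.send_to(&exporter.payload, collector)?;
    }
    Ok(())
}

/// Receive `n` datagrams on `collector` and return the source IP of each one.
fn source_ips(collector: &UdpSocket, n: usize) -> io::Result<Vec<IpAddr>> {
    let mut buf = [0u8; 1500];
    let mut ips = Vec::with_capacity(n);
    for _ in 0..n {
        let (_len, from) = collector.recv_from(&mut buf)?;
        ips.push(from.ip());
    }
    Ok(ips)
}
```

A receiver that groups by `from.ip()` would see two exporters bound to 127.0.0.1 and 127.0.0.2 as two distinct devices, even though both run in one process.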
> mask: 24
> weight: 100
>
> - addr: "127.0.0.2" # IoT gateway

Ah cool, I was wondering if we'd be able to use multiple local IPs (loopbacks? not sure what it is called); this looks perfect.
@blt question for ya: we don't state any limits on what the lading configuration can express, and we allow unbounded lists. Eventually, would we want some kind of configuration verification to check that the user is not doing something that would make lading very slow?
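One possible shape for such a check, sketched under loud assumptions: `ExporterConfig`, `ConfigError`, `validate`, and both limits are hypothetical, and the numeric bounds are placeholders rather than measured thresholds. Only the `flows_per_second` field name comes from the proposed config.

```rust
/// Placeholder bounds; real values would have to come from benchmarks.
const MAX_EXPORTERS: usize = 64;
const MAX_TOTAL_FLOWS_PER_SECOND: u64 = 1_000_000;

/// Hypothetical, trimmed-down view of one exporter entry.
struct ExporterConfig {
    flows_per_second: u64,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    NoExporters,
    TooManyExporters { found: usize, max: usize },
    RateTooHigh { total: u64, max: u64 },
}

/// Reject configurations whose size or aggregate rate exceeds the bounds above.
fn validate(exporters: &[ExporterConfig]) -> Result<(), ConfigError> {
    if exporters.is_empty() {
        return Err(ConfigError::NoExporters);
    }
    if exporters.len() > MAX_EXPORTERS {
        return Err(ConfigError::TooManyExporters {
            found: exporters.len(),
            max: MAX_EXPORTERS,
        });
    }
    // Saturating addition keeps an absurd config from overflowing the sum.
    let total = exporters
        .iter()
        .fold(0u64, |acc, e| acc.saturating_add(e.flows_per_second));
    if total > MAX_TOTAL_FLOWS_PER_SECOND {
        return Err(ConfigError::RateTooHigh {
            total,
            max: MAX_TOTAL_FLOWS_PER_SECOND,
        });
    }
    Ok(())
}
```

Running a check like this at config load time would surface the problem before any sockets are bound, which keeps the failure cheap and the error message specific.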
What does this PR do?
Proposes a dedicated NetFlow generator using an actor-model simulation
where network hosts execute behavioral sequences that produce correlated
flow records as a byproduct. Covers v5 and v9 wire formats, explains why
the block cache is inapplicable (temporal compression), and follows the
dedicated-generator precedent set by ProcessTree/FileTree/Container.
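
For readers unfamiliar with the v5 wire format mentioned above, the following is a minimal sketch of encoding the fixed 24-byte NetFlow v5 packet header, with all fields in network byte order. The `V5Header` type and its `encode` method are illustrative names, not the API proposed in the ADR; the 48-byte flow records that follow the header are omitted.

```rust
/// Length of the NetFlow v5 packet header in bytes.
const V5_HEADER_LEN: usize = 24;

/// NetFlow v5 packet header fields (the version field is implied by the type).
struct V5Header {
    /// Number of flow records in this packet (v5 allows 1 to 30).
    count: u16,
    /// Milliseconds since the exporting device booted.
    sys_uptime_ms: u32,
    /// Seconds since the Unix epoch at export time.
    unix_secs: u32,
    /// Residual nanoseconds at export time.
    unix_nsecs: u32,
    /// Running count of flows seen by the exporter.
    flow_sequence: u32,
    engine_type: u8,
    engine_id: u8,
    /// Sampling mode and interval, packed into 16 bits.
    sampling_interval: u16,
}

impl V5Header {
    /// Serialize the header into its big-endian wire representation.
    fn encode(&self) -> [u8; V5_HEADER_LEN] {
        let mut out = [0u8; V5_HEADER_LEN];
        out[0..2].copy_from_slice(&5u16.to_be_bytes()); // version
        out[2..4].copy_from_slice(&self.count.to_be_bytes());
        out[4..8].copy_from_slice(&self.sys_uptime_ms.to_be_bytes());
        out[8..12].copy_from_slice(&self.unix_secs.to_be_bytes());
        out[12..16].copy_from_slice(&self.unix_nsecs.to_be_bytes());
        out[16..20].copy_from_slice(&self.flow_sequence.to_be_bytes());
        out[20] = self.engine_type;
        out[21] = self.engine_id;
        out[22..24].copy_from_slice(&self.sampling_interval.to_be_bytes());
        out
    }
}
```

The v9 format is template-based rather than fixed-layout, so it would need a different encoder; this sketch covers v5 only.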