Merged
48 changes: 24 additions & 24 deletions TODO.md

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions commands/eforge/config.md
@@ -57,6 +57,7 @@ When writing to the overlay, files are partial — they contain ONLY the user's
| Add proxy URI templates | `proxy_uri_templates.yaml` | `dns_registry.yaml` (validate domain exists); use `domain_class` and `referrer_policy` for certificate/update infrastructure |
| Modify proxy User-Agent pools | `proxy_user_agents.yaml` | `dns_registry.yaml` for package/update hostnames |
| Add site map entries | `site_maps.yaml` | `dns_registry.yaml` (validate domain exists) |
| Modify inbound web visitor mix | `web_session_profiles.yaml` | `site_maps.yaml`, `traffic_rates.yaml`, `timing_profiles.yaml` |
| Modify bash commands | `bash_commands.yaml` | Validate role names match persona names; keep `typo_model` rates/counts realistic |
| Modify traffic rate defaults | `traffic_rates.yaml` | (standalone — intensity-based rate table for all system traffic) |
| Modify systemd schedules | `systemd_schedules.yaml` | (standalone) |
@@ -66,6 +67,7 @@ When writing to the overlay, files are partial — they contain ONLY the user's
| Modify ProcessAccess masks | `process_access_patterns.yaml` | (standalone — Event 10 baseline source/target pairs and GrantedAccess masks) |
| Modify CreateRemoteThread pairs | `create_remote_thread_patterns.yaml` | (standalone — Event 8 baseline source/target pairs) |
| Modify Windows auth realism | `windows_auth_realism.yaml` | (standalone — Security log auth timing and failed-logon profile knobs) |
| Modify baseline auth noise | `auth_noise.yaml` | (standalone — stale scheduled-credential accounts and irregular recurrence timing) |
| Modify causal/source timing | `timing_profiles.yaml` | (standalone — causal prerequisite, source latency, teardown, and Windows/Sysmon collision-spacing knobs) |
| ~~Format definitions~~ | Not user-customizable | Engine internals — requires code changes |
| ~~Evaluation rules~~ | Not user-customizable | Must match format definitions — requires code changes |
18 changes: 16 additions & 2 deletions commands/eforge/references/config-dependency-graph.md
@@ -47,13 +47,21 @@ Each row is a file; columns show what it depends on and what depends on it.
| Direction | File | Relationship |
|-----------|------|-------------|
| depends on | nothing | Standalone rate table |
| **depended on by** | Engine (runtime) | Drives all baseline traffic rate calculations (user activity, web, DNS, SMB, Kerberos, LDAP, persona connections) |
| **depended on by** | Engine (runtime) | Drives all baseline traffic rate calculations (user activity, web top-level actions, DNS, SMB, Kerberos, LDAP, persona connections) |

### web_session_profiles.yaml
| Direction | File | Relationship |
|-----------|------|-------------|
| depends on | `site_maps.yaml` | Human visitor sessions use site maps to expand top-level page loads into assets and same-origin API calls |
| depends on | `traffic_rates.yaml` | `web` rates count top-level visitor actions; subresources are dependent fanout |
| depends on | `timing_profiles.yaml` | Uses web session/navigation and asset/tool fanout timing relationships |
| **depended on by** | Engine (runtime) | Drives inbound `web_server` visitor classes, tool/API request shapes, status codes, and User-Agents |

### timing_profiles.yaml
| Direction | File | Relationship |
|-----------|------|-------------|
| depends on | nothing | Standalone timing relationship profile |
| **depended on by** | Engine (runtime) | Drives causal prerequisite offsets, source-latency offsets, teardown margins, and Windows/Sysmon tied-timestamp collision spacing |
| **depended on by** | Engine (runtime) | Drives causal prerequisite offsets, source-latency offsets, web session/fanout timing, sensor observation timing, teardown margins, and Windows/Sysmon tied-timestamp collision spacing |
| validated by | `eforge validate-config` | Enforces valid relationship classes, before/after positions, non-negative timing windows, and coherent min/max bounds |

### kerberos_realism.yaml
@@ -137,6 +145,12 @@ Each row is a file; columns show what it depends on and what depends on it.
| depends on | nothing | Standalone (uses distro/role filters) |
| **depended on by** | Engine (runtime) | Adds diversity to syslog baseline |

### auth_noise.yaml
| Direction | File | Relationship |
|-----------|------|-------------|
| depends on | nothing | Standalone authentication-noise profile data |
| **depended on by** | Engine (runtime) | Drives stale scheduled-credential account pools, recurrence timing, jitter, skips, and backoff |

### network_params.yaml
| Direction | File | Relationship |
|-----------|------|-------------|
64 changes: 59 additions & 5 deletions commands/eforge/references/config-dns-network.md
@@ -12,10 +12,11 @@ Schema documentation for the network-related config files. User customizations g
2. [traffic_profiles.yaml](#traffic_profilesyaml)
3. [proxy_uri_templates.yaml](#proxy_uri_templatesyaml)
4. [site_maps.yaml](#site_mapsyaml)
5. [network_params.yaml](#network_paramsyaml)
6. [tls_issuers.yaml](#tls_issuersyaml)
7. [tls_realism.yaml](#tls_realismyaml)
8. [smb_file_transfers.yaml](#smb_file_transfersyaml)
5. [web_session_profiles.yaml](#web_session_profilesyaml)
6. [network_params.yaml](#network_paramsyaml)
7. [tls_issuers.yaml](#tls_issuersyaml)
8. [tls_realism.yaml](#tls_realismyaml)
9. [smb_file_transfers.yaml](#smb_file_transfersyaml)

---

@@ -338,6 +339,59 @@ Minimal single-page structure for domains with no curated or tag-based match.

---

## web_session_profiles.yaml

Visitor-class definitions for inbound `web_server` baseline traffic. Human visitors use `site_maps.yaml` to emit a top-level page request plus required JS/CSS/images/fonts/API fanout. Crawler, health-check, API-client, and opportunistic-probe visitors use configured request lists so tool traffic keeps realistic paths, status codes, referrers, and User-Agents.

The `traffic_rates.yaml` `web` value counts top-level visitor actions only. Subresources required to render a human page load do not consume that budget.
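
The budget rule above can be sketched as simple arithmetic. This is an illustration only: the function name and every number below are hypothetical and are not taken from `traffic_rates.yaml` or `site_maps.yaml`. It shows that subresource fanout is added on top of the `web` count rather than drawn from it.

```python
def expected_requests(top_level_actions: int,
                      human_page_loads: int,
                      subresources_per_page: int) -> int:
    """Estimate HTTP rows per hour for one web_server (illustrative only).

    top_level_actions     -- the `web` rate: all visitor actions per hour
    human_page_loads      -- how many of those actions are human page loads
    subresources_per_page -- assumed average asset/API fanout per page load
    """
    # Fanout is dependent traffic: it does not consume the `web` budget.
    fanout = human_page_loads * subresources_per_page
    return top_level_actions + fanout


# Hypothetical: 100 actions/hr, 70 of them human page loads, 8 assets each.
print(expected_requests(100, 70, 8))  # 100 top-level rows + 560 fanout rows
```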

### Structure

```yaml
visitor_classes:
  human_browser:
    weight: 70
    kind: session          # session|requests
    external: true
    internal: true
    browsing_intensity: normal
    user_agent_pool: browser_any
    user_agent_pool_by_os:
      linux: browser_linux

  opportunistic_probe:
    weight: 5
    kind: requests
    external: true
    internal: false
    request_count: [1, 5]
    user_agent_pool: scanner
    referrer_mode: none
    requests:
      - {path: "/wp-login.php", method: "GET", status: 404, type: "text/html", weight: 22}

user_agent_pools:
  browser_any:
    - "Mozilla/5.0 (...) Chrome/120.0.0.0 Safari/537.36"
  scanner:
    - "python-requests/2.31.0"
```
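One way to read `weight` as a relative frequency is a weighted draw per top-level visitor action. The sketch below assumes that interpretation. `pick_visitor_class` and `requests_for_action` are hypothetical helpers, not engine functions, and only the two classes from the example above are included.

```python
import random

# Weights and kinds copied from the structure example; other fields omitted.
VISITOR_CLASSES = {
    "human_browser": {"weight": 70, "kind": "session"},
    "opportunistic_probe": {"weight": 5, "kind": "requests", "request_count": [1, 5]},
}


def pick_visitor_class(rng: random.Random) -> str:
    """Draw one class name with probability proportional to its weight."""
    names = list(VISITOR_CLASSES)
    weights = [VISITOR_CLASSES[name]["weight"] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def requests_for_action(name: str, rng: random.Random) -> int:
    """Number of configured requests emitted for one visitor action."""
    cls = VISITOR_CLASSES[name]
    if cls["kind"] == "requests":
        low, high = cls["request_count"]
        return rng.randint(low, high)  # inclusive on both ends
    # Assumption: a session action is one top-level page load; its asset
    # fanout would come from site_maps.yaml and is not modeled here.
    return 1


rng = random.Random(0)
picked = pick_visitor_class(rng)
print(picked, requests_for_action(picked, rng))
```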

### Field Reference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `visitor_classes.<name>.weight` | number | yes | Relative visitor-class frequency |
| `visitor_classes.<name>.kind` | string | yes | `session` for site-map browsing, `requests` for configured tool/API paths |
| `external` / `internal` | bool | no | Whether the class can be used for external or internal clients |
| `browsing_intensity` | string | if `kind: session` | Site-map session depth (`light`, `normal`, `heavy`) |
| `request_count` | `[min, max]` | if `kind: requests` | Number of configured requests per visitor action |
| `requests[].path` / `method` / `status` / `type` | mixed | if `kind: requests` | Source-native HTTP request shape |
| `user_agent_pool` | string | yes | Pool name under `user_agent_pools` |
| `user_agent_pool_by_os` | mapping | no | OS-specific override pools for known internal clients |

---

## network_params.yaml

MAC OUI (vendor) prefixes, public NTP server defaults, and DNS tunnel transaction timing. Scenario-defined internal/domain NTP servers are preferred at generation time; `public_ntp_servers` is the fallback pool for non-domain environments and for upstream refids on internal NTP servers.
@@ -480,7 +534,7 @@ Three top-level keys (`low`, `medium`, `high`), each containing the same traffic
| Key | Unit | Description |
|-----|------|-------------|
| `user_activity` | events/user/hr | Endpoint user activity (logons, processes, connections) |
| `web` | requests/web_server/hr | Background HTTP requests to web_server hosts |
| `web` | top-level actions/web_server/hr | User-driven page/API/tool requests to web_server hosts; page assets are emitted as dependent requests and do not consume this budget |
| `dns_interval` | seconds between queries | Lower = more DNS traffic |
| `ntp` | syncs/host/hr | NTP time sync frequency |
| `smb_interval` | seconds between SMB ops | Lower = more SMB/file share traffic |
61 changes: 61 additions & 0 deletions commands/eforge/references/config-host-activity.md
@@ -285,6 +285,36 @@ Failed-logon profiles control source-native Windows 4625 fields and DC-side vali

---

## Auth Noise (`auth_noise.yaml`)

Controls baseline authentication noise that is not scenario-authored, especially stale scheduled credentials.

```yaml
scheduled_stale_credentials:
  account_base_names: [svc_backup, svc_monitor, svc_report, svc_deploy, svc_scan]
  host_count_min: 1
  host_count_max: 2
  interval_ranges:
    - min_minutes: 55
      max_minutes: 95
      weight: 30
    - min_minutes: 105
      max_minutes: 155
      weight: 45
  first_occurrence_seconds_min: 0
  first_occurrence_seconds_max: 2700
  jitter_seconds_min: -420
  jitter_seconds_max: 780
  skip_probability: 0.16
  backoff_probability: 0.10
  backoff_seconds_min: 900
  backoff_seconds_max: 3600
```

`account_base_names` should be plausible disabled service or automation principals; the engine still avoids collisions with scenario users and service accounts. Interval ranges, jitter, skip probability, and backoff probability produce deterministic but non-modulo recurrence so stale scheduled-task failures do not land on exact hourly or two-hour cadences. Run `eforge validate-config` after overlay changes; ranges must be ordered, weights must be positive, and probabilities must be between 0 and 0.95.
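
A rough sketch of how these knobs could combine into a deterministic, non-modulo schedule. The values are copied from the example above, but the way interval, jitter, skip, and backoff are combined here is an assumption for illustration, as are the `schedule` function and the seed string. The engine's actual algorithm may differ.

```python
import random

# Values copied from the auth_noise.yaml example.
INTERVAL_RANGES = [
    {"min_minutes": 55, "max_minutes": 95, "weight": 30},
    {"min_minutes": 105, "max_minutes": 155, "weight": 45},
]
FIRST_OCCURRENCE_S = (0, 2700)
JITTER_S = (-420, 780)
SKIP_PROBABILITY = 0.16
BACKOFF_PROBABILITY = 0.10
BACKOFF_S = (900, 3600)


def schedule(seed: str, horizon_s: int) -> list[int]:
    """Return offsets (seconds) of emitted failures within the horizon.

    Seeding the RNG with a stable string makes the output repeatable.
    """
    rng = random.Random(seed)
    t = rng.randint(*FIRST_OCCURRENCE_S)
    emitted = []
    while t < horizon_s:
        # Assumed meaning of a skip: the task fires but no failure is logged.
        if rng.random() >= SKIP_PROBABILITY:
            emitted.append(t)
        chosen = rng.choices(
            INTERVAL_RANGES,
            weights=[r["weight"] for r in INTERVAL_RANGES],
            k=1,
        )[0]
        step = rng.randint(chosen["min_minutes"] * 60, chosen["max_minutes"] * 60)
        step += rng.randint(*JITTER_S)  # keeps runs off exact cadences
        if rng.random() < BACKOFF_PROBABILITY:
            step += rng.randint(*BACKOFF_S)  # assumed: occasional extra delay
        t += step
    return emitted


# Hypothetical account/host pair used as the seed.
print(schedule("svc_backup@ws01", 24 * 3600))
```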

---

## timing_profiles.yaml

Data-driven timing windows for causal relationships, source-native latency, teardown margins, and Windows/Sysmon same-timestamp collision spacing. Use this when tuning realism of correlated event gaps without changing scenario YAML.
@@ -313,6 +343,32 @@ relationships:
    position: after
    min_ms: 800
    max_ms: 2500
  web.session_navigation:
    class: human_workflow
    position: after
    min_ms: 3000
    max_ms: 30000
  web.asset_stylesheet_script_after_page:
    class: burst_fanout
    position: after
    min_ms: 50
    max_ms: 200
  web.tool_request_gap:
    class: burst_fanout
    position: after
    min_ms: 120
    max_ms: 1500

network_sensor_observation:
  default_profile: well_synced
  profiles:
    well_synced:
      clock_skew_us:
        min: -1500
        max: 1500
      path_delay_us:
        min: 50
        max: 2000

windows_event_time:
  collision_spacing:
@@ -334,6 +390,9 @@ windows_event_time:
| `windows_event_time.collision_spacing.near_zero_until` | int | yes | Same-host tied-event collisions that can remain near-zero before larger spacing begins |
| `windows_event_time.collision_spacing.near_gap_min_us` / `near_gap_max_us` | int | yes | Microsecond spacing for small tied clusters |
| `windows_event_time.collision_spacing.large_gap_min_ms` / `large_gap_max_ms` | int | yes | Millisecond spacing for large tied clusters that would otherwise compress into synthetic-looking bursts |
| `network_sensor_observation.default_profile` | string | yes | Sensor timing profile used for multi-sensor Zeek observation offsets |
| `network_sensor_observation.profiles.<name>.clock_skew_us` | mapping | yes | `{min, max}` per-sensor clock skew in microseconds |
| `network_sensor_observation.profiles.<name>.path_delay_us` | mapping | yes | `{min, max}` per-flow tap/capture delay in microseconds |

### Conventions

@@ -342,6 +401,8 @@ windows_event_time:
`ssl.log` and `x509.log` timestamps should occur after conn start but before conn end for
the same UID.
- Use seconds or minutes for human or bulk workflow relationships; do not force everything into microseconds.
- Web session timing uses `web.session_navigation` for user-driven page-to-page actions and `web.asset_*_after_page` / `web.tool_request_gap` for render fanout and tool/API bursts.
- Keep the default `network_sensor_observation` profile in low milliseconds for well-synced Zeek fleets; use overlays only when modeling known sensor clock drift or queued/remote capture paths.
- Run `eforge validate-config` after overlay changes; it rejects invalid relationship classes, positions, negative windows, and inverted min/max ranges.
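
The rejection rules in the last bullet can be approximated as follows. This is a sketch, not the `eforge validate-config` implementation: `check_relationship` is a hypothetical helper, the class set lists only the two classes visible in the example (the real set is likely larger), and only `min_ms`/`max_ms` windows are handled.

```python
ALLOWED_POSITIONS = {"before", "after"}
# Assumption: only the classes that appear in the example above.
KNOWN_CLASSES = {"human_workflow", "burst_fanout"}


def check_relationship(name: str, rel: dict) -> list[str]:
    """Return a list of problems for one relationship entry (empty if OK)."""
    errors = []
    if rel.get("class") not in KNOWN_CLASSES:
        errors.append(f"{name}: unknown class {rel.get('class')!r}")
    if rel.get("position") not in ALLOWED_POSITIONS:
        errors.append(f"{name}: position must be 'before' or 'after'")
    low, high = rel.get("min_ms"), rel.get("max_ms")
    if not isinstance(low, int) or not isinstance(high, int):
        errors.append(f"{name}: min_ms and max_ms must both be integers")
    else:
        if low < 0 or high < 0:
            errors.append(f"{name}: timing window must be non-negative")
        if low > high:
            errors.append(f"{name}: min_ms is greater than max_ms")
    return errors


ok = {"class": "human_workflow", "position": "after", "min_ms": 3000, "max_ms": 30000}
bad = {"class": "burst_fanout", "position": "after", "min_ms": 200, "max_ms": 50}
print(check_relationship("web.session_navigation", ok))
print(check_relationship("web.asset_stylesheet_script_after_page", bad))
```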

---
2 changes: 2 additions & 0 deletions commands/eforge/references/config-validation.md
@@ -81,6 +81,8 @@ Run `eforge info <field>` to get specific values (e.g., `eforge info paths.activ
| 34 | create_remote_thread_patterns.yaml structure | ERROR | Baseline pair missing source/target PID keys, image paths, or positive weight |
| 35 | smb_file_transfers.yaml structure | ERROR | Missing SMB file-analysis thresholds/probabilities, invalid probability ranges, empty MIME/analyzer lists, invalid filename templates, or non-positive weights |
| 36 | kerberos_realism.yaml structure | ERROR | Invalid Kerberos 4768 pre-auth/ticket/encryption distribution, unsupported hex values, PKINIT without certificate profile, non-PKINIT with certificate fields, excessive no-preauth/PKINIT/RC4 weights, or malformed certificate profile fields |
| 37 | web_session_profiles.yaml structure | ERROR | Invalid inbound web visitor class, missing User-Agent pool, malformed configured request, or invalid request-count range |
| 38 | auth_noise.yaml structure | ERROR | Invalid stale scheduled-credential account pool, host-count range, recurrence interval range, jitter range, skip probability, or backoff bounds |

## Scenario Validation: traffic_rates

4 changes: 2 additions & 2 deletions commands/eforge/references/evidence-formats.md
@@ -175,7 +175,7 @@ EDR/XDR telemetry rendered in MITRE CAR-based eCAR format. Represents what an ED
**File:** `syslog.log`
**Format:** RFC 5424 syslog

Authentication and system logs from Linux hosts. Generated syslog uses RFC 5424 with year-bearing ISO/RFC3339 timestamps. `eforge eval` still accepts older BSD/RFC3164-style syslog as a legacy ingest fallback. All generated syslog entries are rendered from `SyslogContext` on `SecurityEvent` — the emitter doesn't derive messages from other contexts. This enables correlated dispatch: a logon event carries both `AuthContext` (for Windows 4624) and `SyslogContext` (for sshd accepted) on the same SecurityEvent.
Authentication and system logs from Linux hosts. Generated syslog uses RFC 5424 with year-bearing ISO/RFC3339 timestamps. `eforge eval` still accepts older BSD/RFC3164-style syslog as a legacy ingest fallback. All generated syslog entries are rendered from `SyslogContext` on `SecurityEvent` — the emitter doesn't derive messages from other contexts. This enables correlated dispatch: a logon event carries both `AuthContext` (for Windows 4624) and `SyslogContext` (for sshd accepted) on the same SecurityEvent. Remote Linux `sshd` failed-password rows reuse the same source port as the companion Zeek SSH connection tuple.

| Program | Description | Notes |
|---------|-------------|-------|
@@ -313,7 +313,7 @@ Fields are whitespace-delimited; values with spaces, such as User-Agent strings,

**Status and byte semantics:** For explicit proxy mode, client-side Zeek HTTP records describe the client-to-proxy exchange. Plain HTTP denials therefore show the proxy's status code and proxy response size, not the origin's status/body. For intercepted HTTPS, the CONNECT setup status is tracked separately from the inspected request status, so a successful tunnel setup can coexist with a denied inspected GET.

**Session depth:** Persona HTTP traffic generates multi-request browsing sessions with subresource cascades. Each page load triggers follow-on requests for JS, CSS, images, and fonts, producing realistic request clusters in the proxy log. The number of pages and subresources per session is controlled by the persona's `browsing_intensity` setting (light/normal/heavy).
**Session depth:** Persona HTTP traffic and inbound `web_server` human visitors generate multi-request browsing sessions with subresource cascades. Each page load triggers follow-on requests for JS, CSS, images, fonts, and same-origin API calls, producing realistic request clusters in proxy and web access logs. Persona browsing depth is controlled by `browsing_intensity`; inbound web visitor classes, tool/API requests, and User-Agent pools are controlled by `web_session_profiles.yaml`.

**Known Limitations:**
- Only generated for systems with the `forward_proxy` role declared
6 changes: 3 additions & 3 deletions commands/eforge/references/scenario-reference.md
@@ -115,7 +115,7 @@ If `proxy_access` is requested and `environment.proxy` is omitted, validation wa

The `roles` field declares a system's function in the network. The engine uses roles to generate both **outbound** traffic (connections the host initiates) and **inbound** traffic (connections the host receives):

- `web_server` — outbound: database queries, LDAP auth, API calls; inbound: HTTPS/HTTP from external clients and internal users
- `web_server` — outbound: database queries, LDAP auth, API calls; inbound: HTTPS/HTTP from external clients and internal users. Human inbound traffic is generated as browsing sessions: top-level page views consume the `web` traffic-rate budget, and required assets/API calls fan out from each page load.
- `database` — outbound: replication, updates; inbound: SQL queries from web/app servers
- `mail_server` — outbound: SMTP relay, LDAP lookups; inbound: SMTP from internet, webmail from users
- `file_server` — outbound: Kerberos/LDAP auth; inbound: SMB file access from workstations. File-server roles also increase baseline SMB target selection beyond normal DC SYSVOL/GPO traffic.
@@ -306,7 +306,7 @@ Work hours are automatically parsed into a `work_hours_parsed` dict containing:

### Browsing Intensity

The `browsing_intensity` field controls how much HTTP traffic a persona generates per browsing session. It affects proxy log depth (number of page loads and subresource cascades) for baseline web activity.
The `browsing_intensity` field controls how much HTTP traffic a persona generates per browsing session. It affects proxy log depth (number of page loads and subresource cascades) for baseline web activity. Inbound `web_server` background traffic uses the separate `web_session_profiles.yaml` visitor mix: `traffic_rates.web` counts top-level visitor actions, then page assets and same-origin API calls fan out automatically.

```yaml
personas:
@@ -524,7 +524,7 @@ The generation engine automatically provides several layers of realism in baseli

**NTP time synchronization:** In AD environments, all domain-joined workstations sync NTP from the domain controller (W32Time service), not from external NIST servers. NTP stratum is stable per server — a DC serving as NTP always reports the same stratum value. External NTP servers are only used for non-domain environments.

**Multi-sensor timing realism:** When multiple Zeek sensors observe the same connection, each sensor's records have a deterministic propagation delay (100-500 microseconds) based on the sensor's position. Sensors farther from the packet source see events slightly later. Byte and packet counts are identical across sensors (both see the same packets on the wire), but timestamps and durations differ.
**Multi-sensor timing realism:** When multiple Zeek sensors observe the same connection, each sensor's records use the well-synced network sensor timing profile in `config/activity/timing_profiles.yaml`. The default profile keeps stable per-sensor clock skew within +/-1.5 ms and per-flow path/capture delay within 50-2000 microseconds. Byte and packet counts remain canonical unless sensor observation variance is explicitly allowed for that source-native row.
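
As an illustration of the default profile, the sketch below combines a clock skew that is fixed per sensor with a path delay drawn per flow. The bounds come from the well-synced profile. The function names, the sensor name, the flow UID, and the use of string-seeded RNGs are assumptions made for the example, not the engine's implementation.

```python
import random

# Bounds from the default well_synced profile, in microseconds.
CLOCK_SKEW_US = (-1500, 1500)
PATH_DELAY_US = (50, 2000)


def sensor_skew_us(sensor: str) -> int:
    """Stable per-sensor clock skew: same sensor name, same skew."""
    return random.Random(f"skew:{sensor}").randint(*CLOCK_SKEW_US)


def observed_ts_us(true_ts_us: int, sensor: str, flow_uid: str) -> int:
    """Timestamp a sensor would record for a flow (illustrative model)."""
    # Per-flow tap/capture delay, repeatable for a given sensor and flow.
    delay = random.Random(f"delay:{sensor}:{flow_uid}").randint(*PATH_DELAY_US)
    return true_ts_us + sensor_skew_us(sensor) + delay


# Hypothetical sensor and flow identifiers.
start = 1_700_000_000_000_000
print(observed_ts_us(start, "zeek-core", "CabcD1") - start)
print(observed_ts_us(start, "zeek-dmz", "CabcD1") - start)
```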

**Linux syslog depth:** Linux hosts generate 18 categories of syslog messages: SSH login/key exchange (70% key / 30% password), package management, systemd timer execution, logrotate detail, journald statistics, plus systemd lifecycle, cron, UFW, logind, and more. Distro-aware (Ubuntu vs RHEL) with appropriate daemon names and paths.
