Merged
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -97,7 +97,7 @@ For details on the overlay system, manual editing, and cross-file dependencies,

EvidenceForge creates multi-format security log datasets from YAML scenario definitions. You describe an environment (users, systems, network topology) and a storyline (attack events), and EvidenceForge generates temporally consistent logs across all formats simultaneously — complete with cross-referenced LogonIDs, PIDs, timestamps, and UIDs.

Every attack scenario includes a `GROUND_TRUTH.md` file documenting exactly what happened, when, and where — making the datasets immediately usable for threat hunting training.
Every generated scenario includes a `GROUND_TRUTH.md` file. Attack scenarios document exactly what happened, when, and where, while baseline-only scenarios explicitly document that no malicious events were generated.
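
A minimal scenario sketch of the YAML input the README describes — every field name here is an illustrative assumption, not the actual EvidenceForge schema:

```yaml
# Hypothetical scenario sketch — field names are assumptions, not the real schema
name: lateral-movement-demo
environment:
  users:
    - {name: alice, role: admin}
  systems:
    - {hostname: dc01, os: windows}
    - {hostname: web01, os: linux}
storyline:
  - time: "2024-01-15T02:13:00Z"
    technique: credential_dumping
    actor: alice
    target: dc01
```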

### Key Capabilities

@@ -106,7 +106,7 @@ Every attack scenario includes a `GROUND_TRUTH.md` file documenting exactly what
- **Realistic baseline noise** — 26 lateral movement patterns, process→network correlation, network-level red herrings, and 18 Linux syslog categories create noise that analysts must work through
- **OS-aware generation** — Windows systems produce Windows Event + Sysmon logs; Linux systems produce syslog + bash history
- **Network visibility modeling** — Define sensor placement (SPAN/TAP), direction, and monitored segments
- **Ground truth documentation** — Every attack scenario generates a GROUND_TRUTH.md with narrative, timeline, and IOCs
- **Ground truth documentation** — Every run generates a GROUND_TRUTH.md; attack scenarios include narrative, timeline, and IOCs
- **Parallel generation** — Threaded emitters write all formats simultaneously with temporal consistency
- **Scenario validation** — Cross-reference checking, uniqueness constraints, and network topology validation
- **Data quality evaluation** — 5-dimension scoring framework (23 sub-scores) with acceptance criteria
2 changes: 1 addition & 1 deletion TODO.md
@@ -334,7 +334,7 @@ Verification is complete: dedicated `tests/unit/test_world_model.py` coverage wa
- [x] Security: cap firewall deny baseline amplification (`deny_ratio`/hourly deny volume) to prevent scenario-driven local DoS — `NetworkSensor.deny_ratio` now enforces `<= 50.0`.
- [x] Security: prevent IPv6 scenario DoS in DNS AAAA fallback (`_ipv4_to_fake_ipv6` no longer evaluates for IPv6 destination IPs; AAAA uses mapped IPv6 or preserves IPv6 literal).
- [x] Security: bounded/pruned ActivityGenerator DNS cache (60s prune cadence, 600s TTL-horizon eviction, 50k hard cap) to prevent unbounded memory growth from unique `(src_ip, hostname)` keys.
- [ ] `eforge generate --force` overwrite can fail for scenarios that do not emit `GROUND_TRUTH.md` — explicit-proxy smoke testing exposed that replacing an existing output directory expects staged ground truth even when fresh no-storyline generation produced only `data/`. Decide whether no-storyline generation should always write an empty `GROUND_TRUTH.md` or overwrite swap should tolerate its absence.
- [x] `eforge generate --force` overwrite can fail for scenarios that do not emit `GROUND_TRUTH.md` — fixed the root contract so every successful generation emits a matched `data/`, `GROUND_TRUTH.md`, and `OBSERVATION_MANIFEST.json` sidecar set, including baseline-only scenarios. The CLI swap stays strict and now requires staged data, ground truth, and observation manifest before replacing old output. Verification passed with focused engine/CLI/ground-truth/manifest tests, `eforge validate-config`, Ruff checks, and full normal `uv run pytest -v` (`3051 passed, 15 skipped`).

- [x] **`uv.lock` not committed** — gitignored, so CI `setup-uv@v4` cache fails. Remove from `.gitignore` and commit.
- [x] **`eforge validate` can't find personas in dev mode** — works when installed (`eforge validate`) but not via `uv run eforge validate`. Blocks dev workflow.
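
The bounded DNS-cache item above (prune cadence, TTL-horizon eviction, hard cap) can be sketched as a generic bounded TTL cache — a simplified model under assumed names, not the actual `ActivityGenerator` code:

```python
import time

# Sketch of a bounded TTL cache in the spirit of the DNS-cache fix above.
# Class and parameter names are assumptions, not the real implementation.
class BoundedTTLCache:
    def __init__(self, ttl=600.0, prune_interval=60.0, max_entries=50_000):
        self._ttl = ttl
        self._prune_interval = prune_interval
        self._max_entries = max_entries
        self._store = {}          # key -> (value, expiry_timestamp)
        self._last_prune = 0.0

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        # Prune expired entries on a fixed cadence, not on every call
        if now - self._last_prune >= self._prune_interval:
            self._prune(now)
        # Hard cap: evict the entry closest to expiry before inserting
        if len(self._store) >= self._max_entries and key not in self._store:
            oldest = min(self._store, key=lambda k: self._store[k][1])
            del self._store[oldest]
        self._store[key] = (value, now + self._ttl)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]

    def _prune(self, now):
        for k in [k for k, (_, exp) in self._store.items() if exp <= now]:
            del self._store[k]
        self._last_prune = now
```

Passing `now` explicitly keeps the cache testable; production callers would rely on the `time.monotonic()` default.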
7 changes: 4 additions & 3 deletions commands/eforge/generate.md
@@ -93,7 +93,8 @@ Generation writes log files to a `data/` subdirectory alongside the scenario fil
scenarios/<scenario-name>/
scenario.yaml ← input
ENVIRONMENT.md ← created by /eforge scenario
GROUND_TRUTH.md ← generated (answer key)
GROUND_TRUTH.md ← generated answer key (empty for benign baseline-only runs)
OBSERVATION_MANIFEST.json ← generated source-observation sidecar
data/ ← generated log files
windows/
security.xml
@@ -104,14 +104,14 @@ scenarios/<scenario-name>/
...
```

If `data/`, `GROUND_TRUTH.md`, or `ENVIRONMENT.md` already exist, the CLI prompts before overwriting. Use `--force` to skip the prompt (for automation / AI use).
If generated output (`data/`, `GROUND_TRUTH.md`, or `OBSERVATION_MANIFEST.json`) already exists, the CLI prompts before overwriting. Use `--force` to skip the prompt (for automation / AI use). `ENVIRONMENT.md` is scenario-authored and is preserved.

### 3. Post-Generation

After successful generation:
- List the generated files and their sizes
- Check that expected formats were produced
- If the scenario had a storyline, note that `GROUND_TRUTH.md` was generated alongside the scenario file — this is the answer key containing the full attack timeline and IOCs
- Note that `GROUND_TRUTH.md` and `OBSERVATION_MANIFEST.json` were generated alongside the scenario file. For baseline-only runs, `GROUND_TRUTH.md` explicitly says no malicious events were generated.
- `ENVIRONMENT.md` (created by `/eforge scenario`) is already in the same directory — no copying needed
- Note that the causal expansion engine auto-generates prerequisite events (DNS lookups before connections, Kerberos TGT/TGS before logons, audit events from command patterns, etc.) — these appear in the logs but are not explicitly listed in the scenario YAML
- Summarize the output for the user
3 changes: 2 additions & 1 deletion commands/eforge/references/evidence-formats.md
@@ -10,7 +10,8 @@ This document lists every evidence type EvidenceForge can generate, where to fin

```
output/
GROUND_TRUTH.md # Attack narrative, timeline, IOCs
GROUND_TRUTH.md # Ground truth sidecar; empty for baseline-only runs
OBSERVATION_MANIFEST.json # Source-observation sidecar for eval
ENVIRONMENT.md # Student-facing environment description (created by /eforge scenario skill)
<hostname.domain>/ # Per-host directories (FQDN)
windows_event_security.xml # Windows Security channel events
9 changes: 5 additions & 4 deletions docs/design/PRD.md
@@ -36,7 +36,7 @@ The tool addresses the need for realistic, large-volume training datasets withou
- Schema validation for scenario files (Pydantic-based)
- Cross-reference validation (users, systems, personas, groups referenced correctly)
- Evaluation framework with concrete metrics (format compliance, consistency, statistical properties)
- Ground truth documentation (GROUND_TRUTH.md) for scenarios with malicious activity
- Ground truth documentation (GROUND_TRUTH.md) for every generated scenario
- Network topology and sensor placement modeling for traffic visibility
- Persona-based temporal activity distribution with configurable work hours, intensity, and risk profiles
- Comprehensive test coverage (95%+) with pytest
@@ -154,7 +154,7 @@ eforge generate SCENARIO_FILE [--output DIR] [--verbose] [--debug]
9. Write to organized directory structure with incremental flushing (10K event buffer)
10. Show progress with Rich progress bars (per-hour baseline, per-event storyline)
11. Log details to `generation.log` in output directory
12. Generate GROUND_TRUTH.md when malicious/suspicious activities are present
12. Generate GROUND_TRUTH.md and OBSERVATION_MANIFEST.json sidecars

#### Workflow 6: Evaluate Output
```bash
@@ -430,7 +430,8 @@ Generated logs are written to a timestamped output directory:
output/
scenario-name-YYYYMMDD-HHMMSS/
generation.log # Detailed generation log
GROUND_TRUTH.md # Attack ground truth (if malicious activity present)
GROUND_TRUTH.md # Ground truth sidecar (empty for baseline-only scenarios)
OBSERVATION_MANIFEST.json # Source-observation sidecar
windows_events.xml # Windows Event Logs
zeek_conn.log # Zeek connection logs
ecar.json # ECAR events
@@ -442,7 +443,7 @@ output/

**GROUND_TRUTH.md Format**

When a scenario includes malicious or suspicious activities (not baseline-only scenarios), the generator creates a GROUND_TRUTH.md file documenting the attack for training and evaluation purposes.
Every successful generation creates a GROUND_TRUTH.md file. Attack/red-herring scenarios document the narrative, timeline, and IOCs for training and evaluation; baseline-only scenarios explicitly state that no malicious events were generated.

```markdown
# Ground Truth: [Scenario Name]
3 changes: 2 additions & 1 deletion docs/reference/EVIDENCE_FORMATS.md
@@ -10,7 +10,8 @@ This document lists every evidence type EvidenceForge can generate, where to fin

```
output/
GROUND_TRUTH.md # Attack narrative, timeline, IOCs
GROUND_TRUTH.md # Ground truth sidecar; empty for baseline-only runs
OBSERVATION_MANIFEST.json # Source-observation sidecar for eval
ENVIRONMENT.md # Student-facing environment description (created by /eforge scenario skill)
<hostname.domain>/ # Per-host directories (FQDN)
windows_event_security.xml # Windows Security channel events
10 changes: 7 additions & 3 deletions src/evidenceforge/cli/commands.py
@@ -278,7 +278,7 @@ def generate(
console.print(f"\n[bold]Data directory:[/bold] {data_dir}")
console.print(f"[bold]Ground truth:[/bold] {ground_truth_dir / 'GROUND_TRUTH.md'}")

# Check for existing generated output (data/ and GROUND_TRUTH.md only).
# Check for existing generated output (data/ and generated sidecars only).
# ENVIRONMENT.md is authored by /eforge scenario, not the engine — never touch it.
existing = []
if data_dir.exists():
@@ -387,15 +387,19 @@ def progress_callback(event_type: str, data: dict) -> None:

# Transactional swap: backup old → install new → cleanup backup.
# If any step fails (including KeyboardInterrupt), old output is
# restored from backup. data/ and GROUND_TRUTH.md are always kept
# as a matched pair — partial preservation is never valid.
# restored from backup. data/ and generated sidecars are always kept
# as a matched set — partial preservation is never valid.
if staging_dir:
staged_gt = gen_gt_dir / "GROUND_TRUTH.md"
staged_manifest = gen_gt_dir / OBSERVATION_MANIFEST_FILENAME
if not gen_data_dir.exists():
raise RuntimeError("Staged data/ directory missing after generation")
if not staged_gt.exists():
raise RuntimeError("Staged GROUND_TRUTH.md missing after generation")
if not staged_manifest.exists():
raise RuntimeError(
f"Staged {OBSERVATION_MANIFEST_FILENAME} missing after generation"
)

# Clean up stale rollback dirs from prior killed runs
for stale in ground_truth_dir.glob(".eforge_rollback_*"):
29 changes: 16 additions & 13 deletions src/evidenceforge/generation/engine/core.py
@@ -119,7 +119,7 @@ def generate(self) -> None:
2. Generate baseline activity (hour-by-hour iteration)
3. Execute storyline events (if present)
4. Finalize and close emitters
5. Generate GROUND_TRUTH.md (if malicious activity present)
5. Generate GROUND_TRUTH.md and OBSERVATION_MANIFEST.json sidecars
"""
logger.info(f"Starting generation for scenario: {self.scenario.name}")

@@ -185,17 +185,20 @@ def generate(self) -> None:
self._finalize()
self._report_progress("phase_end", {"phase": "finalize"})

# Phase 5: Generate ground truth (if malicious activity or red herrings present)
if self.malicious_events or self.red_herring_events:
logger.info(
f"Generating GROUND_TRUTH.md with {len(self.malicious_events)} malicious events"
)
self._report_progress(
"phase_start",
{"phase": "ground_truth", "description": "Generating ground truth documentation"},
)
self._generate_ground_truth()
self._report_progress("phase_end", {"phase": "ground_truth"})
# Phase 5: Generate sidecars for every successful run. Baseline-only
# datasets still need an empty GROUND_TRUTH.md so CLI overwrite swaps
# can keep data and metadata as a matched pair.
logger.info(
"Generating GROUND_TRUTH.md with %d malicious events and %d red herrings",
len(self.malicious_events),
len(self.red_herring_events),
)
self._report_progress(
"phase_start",
{"phase": "ground_truth", "description": "Generating ground truth documentation"},
)
self._generate_ground_truth()
self._report_progress("phase_end", {"phase": "ground_truth"})

logger.info("Generation complete")

@@ -464,7 +467,7 @@ def _finalize(self) -> None:
logger.info("All emitters closed")

def _generate_ground_truth(self) -> None:
"""Generate GROUND_TRUTH.md documentation."""
"""Generate GROUND_TRUTH.md and observation manifest sidecars."""
from evidenceforge.events.observation_manifest import (
OBSERVATION_MANIFEST_FILENAME,
write_observation_manifest,
10 changes: 5 additions & 5 deletions src/evidenceforge/generation/ground_truth.py
@@ -509,34 +509,34 @@ def _format_iocs(self, iocs: dict[str, set]) -> str:
Returns:
Formatted IOC sections (Markdown)
"""
if not iocs:
if not iocs or not any(values for values in iocs.values()):
return "*No IOCs extracted.*\n"

sections = []

# Network IOCs
if "network" in iocs:
if iocs.get("network"):
sections.append("### Network IOCs\n")
for ioc in sorted(iocs["network"]):
sections.append(f"- {ioc}")
sections.append("")

# Process IOCs
if "processes" in iocs:
if iocs.get("processes"):
sections.append("### Process IOCs\n")
for ioc in sorted(iocs["processes"]):
sections.append(f"- {ioc}")
sections.append("")

# User IOCs
if "users" in iocs:
if iocs.get("users"):
sections.append("### User IOCs\n")
for ioc in sorted(iocs["users"]):
sections.append(f"- {ioc} (compromised account)")
sections.append("")

# File IOCs
if "files" in iocs:
if iocs.get("files"):
sections.append("### File IOCs\n")
for ioc in sorted(iocs["files"]):
sections.append(f"- {ioc}")
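
The hunk above swaps key-membership tests (`"network" in iocs`) for truthiness checks (`iocs.get("network")`) because a key can be present while mapping to an empty set. A standalone illustration of the distinction, with the same guard pattern as the fixed code:

```python
# A key that exists but maps to an empty set passes `in` but fails truthiness.
iocs = {"network": set(), "users": {"svc_backup"}}

assert "network" in iocs          # True: key exists...
assert not iocs.get("network")    # ...but the set is empty, so no section should render

# Guard used in the fixed _format_iocs: render only when some bucket is non-empty.
def has_any_iocs(iocs: dict[str, set]) -> bool:
    return bool(iocs) and any(iocs.values())

assert has_any_iocs(iocs)                    # "users" is non-empty
assert not has_any_iocs({"network": set()})  # all buckets empty
assert not has_any_iocs({})                  # no buckets at all
```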