From 7caf4b27c9af4b3ae02fa0969ac3bd98b304cb29 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 15:04:45 -0400 Subject: [PATCH 01/61] fix: improve observation coherence and TLS realism --- TODO.md | 3 + .../config/activity/tls_realism.yaml | 7 +- src/evidenceforge/events/observation.py | 20 +++- .../generation/activity/tls_realism.py | 6 +- .../generation/engine/storyline.py | 19 +++- tests/unit/test_bulk_events.py | 78 +++++++++++++++ tests/unit/test_dhcp_and_certs.py | 16 ++- tests/unit/test_dispatcher.py | 97 +++++++++++++++++++ 8 files changed, 232 insertions(+), 14 deletions(-) diff --git a/TODO.md b/TODO.md index bedd3327..4d9ca82d 100644 --- a/TODO.md +++ b/TODO.md @@ -250,6 +250,9 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Observation-aware automated eval and manifest — generation now writes `OBSERVATION_MANIFEST.json` beside ground truth, `eforge eval` loads it when present, coverage-style causality metrics report raw and observation-adjusted scores for expected non-visible evidence, and correctness/contradiction checks remain strict. Verification passed with config validation, Ruff checks/format checks, focused eval/manifest tests, and full normal `uv run pytest -v` (`3047 passed, 15 skipped`). - [x] Post-host-activity score check — synced `dev`, cleaned up stale TODOs, regenerated/evaluated `scenarios/iteration-test` from the current iteration-test prompt with `enterprise_standard` observation, and ran one blind expert-panel review without entering another fix loop. Automated eval passed at `92.39` over `108,858` records; blind synthetic-confidence averaged `82.75`. Highest-leverage follow-ups are Linux SSH/syslog lifecycle ordering, Zeek observation-tree consistency, X.509 metadata coherence, Windows OS-build/local-SID identity, and static web asset manifests. - [x] Current-dev calibration pass — regenerated and evaluated `scenarios/iteration-test` from current `dev`, fixed actionable cleanliness issues in OCSP optional-field rendering, observation-manifest accounting for sensor-filtered network evidence, Kerberos/domain-logon causal ordering, storyline event timing, storyline trace matching, temporal trace comparison, and visible Windows logon-before-process ordering. Verification passed with `uv run eforge validate-config`, scenario validation with only expected sensor/observation/pivot-linkability warnings, quantitative eval at `94.64` with all hard gates passing, Ruff checks, focused regressions (`164 passed`), and full normal `uv run pytest -v` (`3075 passed, 15 skipped`). +- [ ] **IN PROGRESS** Up-to-10 current-dev assessment continuation — run iterative EvidenceForge realism loops from the latest calibrated iteration-test state, fix the highest-leverage verified findings, commit each completed fix pass, regenerate/evaluate, and preserve loop artifacts. + - Loop 1 baseline eval completed at `93.89` across `107,377` records; blind synthetic-confidence scores were Threat Hunter `76`, Detection `82`, Network `82`, Host/EDR `86`. + - Loop 1 fix pass completed and verified: fixed external CIDR-only segment scan target resolution, coherent SSH syslog and Zeek UID observation decisions, OS-aware TLS destination filtering for Windows update/trust-list domains, and Let's Encrypt RSA/ECDSA chain templates. 
Verification passed with `uv run eforge validate-config`, focused regressions (`11 passed` plus the adjusted certificate regression), `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -v` (`3057 passed, 37 skipped`). - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/config/activity/tls_realism.yaml b/src/evidenceforge/config/activity/tls_realism.yaml index 148e5de3..31698307 100644 --- a/src/evidenceforge/config/activity/tls_realism.yaml +++ b/src/evidenceforge/config/activity/tls_realism.yaml @@ -139,10 +139,13 @@ certificate_chains: key_length: 4096 child_signature_algorithms: ["sha384WithRSAEncryption"] templates: - - name: lets_encrypt - issuer_patterns: ["*Let's Encrypt*"] + - name: lets_encrypt_rsa + issuer_patterns: ["CN=R3, O=Let's Encrypt, C=US"] intermediates: - "CN=ISRG Root X1, O=Internet Security Research Group, C=US" + - name: lets_encrypt_ecdsa + issuer_patterns: ["CN=E1, O=Let's Encrypt, C=US"] + intermediates: - "CN=ISRG Root X2, O=Internet Security Research Group, C=US" - name: digicert issuer_patterns: ["*DigiCert*"] diff --git a/src/evidenceforge/events/observation.py b/src/evidenceforge/events/observation.py index ff03ee07..19aa0f5b 100644 --- a/src/evidenceforge/events/observation.py +++ b/src/evidenceforge/events/observation.py @@ -188,14 +188,15 @@ def _event_identity(self, source: str, format_name: str, event: SecurityEvent) - group = self._coherent_group_key(source, event) host = self._host_key_for_event(event) timestamp = int(event.timestamp.timestamp() * 1_000_000) + coherent = self._uses_coherent_source_identity(source, group) return "|".join( [ source, - format_name, - event.event_type, + source if coherent else format_name, + source if coherent else event.event_type, host, group, - str(timestamp), + "" if coherent else str(timestamp), ] ) @@ -211,6 +212,10 @@ def _raw_identity(self, source: str, entry: RawLogEntry) -> str: ) def _coherent_group_key(self, source: str, event: SecurityEvent) -> str: + if source == "syslog" and event.syslog and event.syslog.app_name == "sshd": + pid = event.syslog.pid if event.syslog.pid not in (None, "") else "" + if pid: + return f"sshd:{pid}" if event.network: uid = getattr(event.network, "uid", "") or getattr(event.network, "zeek_uid", "") if uid: @@ -233,6 +238,15 @@ def _coherent_group_key(self, source: str, event: SecurityEvent) -> str: return f"ids:{event.ids.sid}:{event.ids.message}" return "event" + @staticmethod + def _uses_coherent_source_identity(source: str, group: str) -> bool: + """Return whether observation delay/drop should be shared within a source group.""" + if source == "syslog" and group.startswith("sshd:"): + return 
True + if source == "zeek" and (group.startswith("uid:") or group.startswith("dns:")): + return True + return False + def _host_key_for_event(self, event: SecurityEvent) -> str: host = event.dst_host or event.src_host if host: diff --git a/src/evidenceforge/generation/activity/tls_realism.py b/src/evidenceforge/generation/activity/tls_realism.py index b8601132..78de6e3c 100644 --- a/src/evidenceforge/generation/activity/tls_realism.py +++ b/src/evidenceforge/generation/activity/tls_realism.py @@ -343,7 +343,7 @@ def _tls_profile_domains( source_os: str, ) -> list[str]: """Build a profile domain pool from explicit domains, OS overrides, and DNS tags.""" - from evidenceforge.generation.activity.dns_registry import get_domains_by_tag + from evidenceforge.generation.activity.dns_registry import get_domain_tags, get_domains_by_tag from evidenceforge.generation.activity.proxy_uri import get_proxy_domain_class override: dict[str, Any] = {} @@ -374,6 +374,10 @@ def _tls_profile_domains( seen: set[str] = set() unique_domains: list[str] = [] for domain in domains: + domain_tags = set(get_domain_tags(domain)) + os_tags = domain_tags & {"windows", "linux"} + if source_os in {"windows", "linux"} and os_tags and source_os not in os_tags: + continue if get_proxy_domain_class(domain) in _CLEARTEXT_CERT_INFRA_DOMAIN_CLASSES: continue if domain not in seen: diff --git a/src/evidenceforge/generation/engine/storyline.py b/src/evidenceforge/generation/engine/storyline.py index 0ac7df80..9dfeaea5 100644 --- a/src/evidenceforge/generation/engine/storyline.py +++ b/src/evidenceforge/generation/engine/storyline.py @@ -2119,11 +2119,20 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: ) if seg: if is_external_scan: - segment_systems = [ - candidate - for candidate in self.scenario.environment.systems - if candidate.hostname in (seg.systems or []) - ] + segment_hostnames = set(seg.systems or []) + if segment_hostnames: + segment_systems = [ + candidate + for candidate in self.scenario.environment.systems + if candidate.hostname in segment_hostnames + ] + else: + net = ipaddress.ip_network(seg.cidr, strict=False) + segment_systems = [ + candidate + for candidate in self.scenario.environment.systems + if ipaddress.ip_address(candidate.ip) in net + ] all_hosts = [] for candidate in segment_systems: public_target = ( diff --git a/tests/unit/test_bulk_events.py b/tests/unit/test_bulk_events.py index 0e145a93..8616ed4e 100644 --- a/tests/unit/test_bulk_events.py +++ b/tests/unit/test_bulk_events.py @@ -30,6 +30,10 @@ DnsQueryEventSpec, DnsTunnelEventSpec, ExplicitCredentialsEventSpec, + NetworkConfig, + NetworkSegment, + NetworkSensor, + PortScanEventSpec, WebScanEventSpec, WorkstationLockEventSpec, WorkstationUnlockEventSpec, @@ -568,6 +572,80 @@ def test_profile_includes_filtered_and_rejected_closed_ports(self): assert {sample[1] for sample in samples} >= {"S0", "REJ"} +class TestPortScanTargetResolution: + def test_external_target_segment_uses_inferred_segment_members(self): + """External segment scans should work when segment.systems is omitted.""" + from unittest.mock import Mock + + start = datetime(2026, 4, 16, 12, 0, 0, tzinfo=UTC) + web = System( + hostname="WEB-01", + ip="10.10.3.10", + os="Ubuntu 22.04", + type="server", + roles=["web_server"], + ) + proxy = System( + hostname="PROXY-01", + ip="10.10.3.20", + os="Ubuntu 22.04", + type="server", + roles=["forward_proxy"], + ) + network = NetworkConfig( + public_cidrs=["203.14.220.0/28"], + segments=[ + NetworkSegment(name="dmz", 
cidr="10.10.3.0/24", exposure="both"), + ], + sensors=[ + NetworkSensor( + type="firewall", + name="fw-perimeter", + monitoring_segments=["dmz"], + log_formats=["cisco_asa"], + ) + ], + ) + + class Visibility: + _vip_to_real_ip = {"203.14.220.10": "10.10.3.10"} + + @staticmethod + def get_public_inbound_address(ip: str) -> str | None: + return "203.14.220.10" if ip == "10.10.3.10" else None + + engine = object.__new__(StorylineMixin) + engine.scenario = SimpleNamespace( + environment=SimpleNamespace(systems=[web, proxy], network=network) + ) + engine.dispatcher = SimpleNamespace(visibility_engine=Visibility()) + engine.state_manager = SimpleNamespace(set_current_time=lambda _time: None) + engine.activity_generator = Mock() + engine.activity_generator._ip_to_system = {web.ip: web, proxy.ip: proxy} + + spec = PortScanEventSpec( + source_ip="185.70.41.45", + target_segment="dmz", + target_count=8, + ports=[80], + scan_rate=10, + ) + event = engine._execute_typed_event( + spec=spec, + actor=User(username="apache", full_name="Apache", email="apache@example.com"), + system=web, + time=start, + activity="External DMZ scan", + explicit_types={"port_scan"}, + ) + + assert event["target_count"] == 1 + assert event["total_connections"] == 1 + connection_kwargs = engine.activity_generator.generate_connection.call_args.kwargs + assert connection_kwargs["src_ip"] == "185.70.41.45" + assert connection_kwargs["dst_ip"] == "203.14.220.10" + + # ── WebScanEventSpec ────────────────────────────────────────────────────── diff --git a/tests/unit/test_dhcp_and_certs.py b/tests/unit/test_dhcp_and_certs.py index 9a722128..dcabf9fc 100644 --- a/tests/unit/test_dhcp_and_certs.py +++ b/tests/unit/test_dhcp_and_certs.py @@ -341,7 +341,7 @@ def test_tls_destination_os_overrides_replace_generic_package_domains(self): } assert not {domain for domain in windows_domains if "ubuntu.com" in domain} - assert "download.windowsupdate.com" not in linux_domains + assert not {domain for domain in linux_domains if "windowsupdate.com" in domain} def test_tls_destination_picker_excludes_cleartext_cert_infra_domains(self): """OCSP/CRL responders are HTTP objects, not direct TLS SNI destinations.""" @@ -379,6 +379,14 @@ def test_public_ca_chain_templates_keep_issuer_family(self): assert any( "Sectigo" in subject or "USERTrust" in subject for subject in sectigo["intermediates"] ) + lets_encrypt_rsa = chain_template_for_issuer("CN=R3, O=Let's Encrypt, C=US") + lets_encrypt_ecdsa = chain_template_for_issuer("CN=E1, O=Let's Encrypt, C=US") + assert lets_encrypt_rsa["intermediates"] == [ + "CN=ISRG Root X1, O=Internet Security Research Group, C=US" + ] + assert lets_encrypt_ecdsa["intermediates"] == [ + "CN=ISRG Root X2, O=Internet Security Research Group, C=US" + ] def test_tls_destination_servers_avoid_human_saas_profiles(self): """Server-origin TLS background should not pick browser/SaaS-heavy destinations.""" @@ -822,9 +830,9 @@ def test_intermediate_ca_profile_is_stable_across_leaf_certificates(self): def test_intermediate_signature_algorithm_follows_issuer_key(self): """Intermediate certificate signatures should be signed by the issuer key.""" generator = ActivityGenerator(StateManager(), {}) - issuer_name = "CN=E1, O=Let's Encrypt, C=US" + issuer_name = "CN=Amazon RSA 2048 M01, O=Amazon, C=US" intermediate = None - for seed in range(1, 50): + for seed in range(1, 200): chain = generator._build_tls_certificate_chain( leaf=X509Context( fuid="FLeaf", @@ -837,6 +845,8 @@ def 
test_intermediate_signature_algorithm_follows_issuer_key(self): connection_uid=f"CLeE1{seed}", rng=random.Random(seed), ) + if len(chain) < 2: + continue candidate = chain[1] if ( certificate_subject_key_profile(candidate.certificate_subject)[0] diff --git a/tests/unit/test_dispatcher.py b/tests/unit/test_dispatcher.py index f6021a52..82ad9a5e 100644 --- a/tests/unit/test_dispatcher.py +++ b/tests/unit/test_dispatcher.py @@ -203,6 +203,103 @@ def test_source_delay_uses_copy_and_preserves_canonical_state(self, monkeypatch) assert event.timestamp == _make_ts() assert dispatcher.source_evidence_status["story-001"]["sysmon"] == {"delayed": 1} + def test_zeek_observation_delay_is_coherent_per_uid(self, monkeypatch): + """Zeek protocol rows for one UID should share source collection delay.""" + monkeypatch.setattr( + "evidenceforge.events.observation.get_observation_profile", + lambda _name: { + "default": { + "missingness": 0.0, + "delay_ms": {"min_ms": 0, "max_ms": 0}, + "host_missingness_multiplier": {"min": 1.0, "max": 1.0}, + }, + "sources": { + "zeek": { + "missingness": 0.0, + "delay_ms": {"min_ms": 5, "max_ms": 1000}, + } + }, + }, + ) + sm = MagicMock(spec=StateManager) + conn = _make_mock_emitter("zeek_conn", handles=True) + http = _make_mock_emitter("zeek_http", handles=True) + dispatcher = EventDispatcher( + state_manager=sm, + emitters={"zeek_conn": conn, "zeek_http": http}, + observation_policy=ObservationPolicy("zeek_delay_test"), + ) + + event = SecurityEvent( + timestamp=_make_ts(), + event_type="connection", + network=NetworkContext( + src_ip="10.0.1.10", + src_port=51111, + dst_ip="10.0.2.20", + dst_port=443, + protocol="tcp", + zeek_uid="CUID123456789", + ), + ) + dispatcher.dispatch(event) + + conn_event = conn.emit.call_args.args[0] + http_event = http.emit.call_args.args[0] + assert conn_event.timestamp == http_event.timestamp + assert conn_event.timestamp > event.timestamp + + def test_syslog_ssh_lifecycle_delay_preserves_session_order(self, monkeypatch): + """SSH lifecycle syslog rows with one sshd PID should share collection delay.""" + monkeypatch.setattr( + "evidenceforge.events.observation.get_observation_profile", + lambda _name: { + "default": { + "missingness": 0.0, + "delay_ms": {"min_ms": 0, "max_ms": 0}, + "host_missingness_multiplier": {"min": 1.0, "max": 1.0}, + }, + "sources": { + "syslog": { + "missingness": 0.0, + "delay_ms": {"min_ms": 5, "max_ms": 1000}, + } + }, + }, + ) + policy = ObservationPolicy("syslog_delay_test") + host = HostContext( + hostname="APP-01", + ip="10.0.3.10", + os="Ubuntu 22.04", + os_category="linux", + system_type="server", + ) + connection = SecurityEvent( + timestamp=_make_ts(), + event_type="syslog", + src_host=host, + syslog=SyslogContext( + app_name="sshd", + pid=5158, + message='Connection from 10.0.1.10 port 52713 on 10.0.3.10 port 22 rdomain ""', + ), + ) + accepted = SecurityEvent( + timestamp=_make_ts() + timedelta(milliseconds=120), + event_type="ssh_session", + dst_host=host, + syslog=SyslogContext( + app_name="sshd", + pid=5158, + message="Accepted publickey for admin from 10.0.1.10 port 52713 ssh2", + ), + ) + + delay = policy.decide("syslog", connection).delay + assert delay == policy.decide("syslog", accepted).delay + assert connection.timestamp + delay < accepted.timestamp + delay + def test_network_visibility_records_filtered_source_status(self): """Network visibility filtering is reflected in source evidence status.""" sm = MagicMock(spec=StateManager) From b476c16841d35370c6f1715b49b409e9f8be1fd5 Mon Sep 17 
00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 15:36:37 -0400 Subject: [PATCH 02/61] fix: improve web and kerberos baseline realism --- TODO.md | 2 + .../config/activity/web_session_profiles.yaml | 2 + .../generation/activity/http_content.py | 37 ++++ .../generation/activity/site_maps.py | 122 +++++++++++-- .../generation/engine/baseline.py | 162 ++++++++++++++---- tests/unit/test_baseline_canonical.py | 79 +++++++++ tests/unit/test_http_content.py | 15 ++ tests/unit/test_phase5_system_traffic.py | 53 ++++++ tests/unit/test_site_maps.py | 27 ++- tests/unit/test_web_session_profiles.py | 7 + 10 files changed, 455 insertions(+), 51 deletions(-) diff --git a/TODO.md b/TODO.md index 4d9ca82d..49a98b41 100644 --- a/TODO.md +++ b/TODO.md @@ -253,6 +253,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [ ] **IN PROGRESS** Up-to-10 current-dev assessment continuation — run iterative EvidenceForge realism loops from the latest calibrated iteration-test state, fix the highest-leverage verified findings, commit each completed fix pass, regenerate/evaluate, and preserve loop artifacts. - Loop 1 baseline eval completed at `93.89` across `107,377` records; blind synthetic-confidence scores were Threat Hunter `76`, Detection `82`, Network `82`, Host/EDR `86`. - Loop 1 fix pass completed and verified: fixed external CIDR-only segment scan target resolution, coherent SSH syslog and Zeek UID observation decisions, OS-aware TLS destination filtering for Windows update/trust-list domains, and Let's Encrypt RSA/ECDSA chain templates. Verification passed with `uv run eforge validate-config`, focused regressions (`11 passed` plus the adjusted certificate regression), `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -v` (`3057 passed, 37 skipped`). + - Loop 2 regeneration/eval completed at `93.80` JSON overall (`94/100` human-readable) across `116,087` records. Hard probes found zero SSH ordering violations, zero Zeek UID gaps, zero Let's Encrypt R3/X2 mismatches, and zero non-Windows `windowsupdate.com` proxy rows. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `78`, Network `68`, Host/EDR `76` (average `74.5`). + - Loop 2 fix pass completed and verified: stabilized per-host deployed web asset cache-buster tokens, made health/status endpoint response sizes small and stable, scoped health-check visitor profiles to server/domain-controller sources, reduced machine-account DC Kerberos volume, skewed service class selection, and fixed member-server SPN targeting for source-native `SMB`/`file_server` names. Verification passed with `uv run eforge validate-config`, focused web/Kerberos regressions, `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -q` (`3064 passed, 37 skipped`). Regenerated eval passed at `95.19` JSON overall across `92,476` records; probes found max static asset variants per host `2`, zero workstation-sourced health-check rows, max health response size `589` bytes, and DC-local Kerberos 4769 rows reduced to `68/491`. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. 
Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/config/activity/web_session_profiles.yaml b/src/evidenceforge/config/activity/web_session_profiles.yaml index afc1d4dc..d4599b2b 100644 --- a/src/evidenceforge/config/activity/web_session_profiles.yaml +++ b/src/evidenceforge/config/activity/web_session_profiles.yaml @@ -37,6 +37,8 @@ visitor_classes: kind: requests external: false internal: true + source_type_any: ["server", "domain_controller"] + source_role_any: ["monitoring", "load_balancer", "forward_proxy", "app_server"] request_count: [1, 2] user_agent_pool: health_check referrer_mode: none diff --git a/src/evidenceforge/generation/activity/http_content.py b/src/evidenceforge/generation/activity/http_content.py index 7f9b3334..d7900008 100644 --- a/src/evidenceforge/generation/activity/http_content.py +++ b/src/evidenceforge/generation/activity/http_content.py @@ -51,6 +51,20 @@ "text/plain": (100, 20_000), } +_HEALTH_ENDPOINT_PATHS = { + "/health", + "/healthz", + "/ready", + "/readyz", + "/status", + "/api/health", + "/api/status", + "/api/v1/health", + "/api/v1/status", + "/api/v2/health", + "/livez", +} + def infer_mime_type_from_path(path: str, default: str = "text/html") -> str: """Infer a response MIME type from a URI path extension. @@ -73,10 +87,31 @@ def response_size_for_mime(rng: random.Random, content_type: str) -> int: return rng.randint(lo, hi) +def is_health_endpoint_path(uri: str) -> bool: + """Return whether a URI path is a small operational health endpoint.""" + clean_path = uri.split("?", 1)[0].split("#", 1)[0].lower().rstrip("/") + if not clean_path: + clean_path = "/" + return clean_path in _HEALTH_ENDPOINT_PATHS + + +def response_size_for_health_endpoint(status_code: int, host: str, uri: str) -> int: + """Return a stable, source-native body size for health/status endpoints.""" + if status_code >= 400: + return response_size_for_status(status_code, host, uri) + clean_path = uri.split("?", 1)[0].split("#", 1)[0].lower().rstrip("/") + rng = random.Random(_stable_seed(f"web_health_response:{status_code}:{host}:{clean_path}")) + if clean_path.endswith("/status") or clean_path == "/status": + return rng.randint(18, 180) + return rng.randint(42, 720) + + def is_stable_resource_path(uri: str) -> bool: """Return whether repeated 200 responses should keep a stable body size.""" clean_path = uri.split("?", 1)[0].split("#", 1)[0].lower() suffix = PurePosixPath(clean_path).suffix.lower() + if is_health_endpoint_path(uri): + return True if clean_path in {"/", "/index.html", "/robots.txt", "/sitemap.xml", "/favicon.ico"}: return True return suffix in { @@ -99,6 +134,8 @@ def is_stable_resource_path(uri: str) -> bool: def response_size_for_status(status_code: int, host: str, uri: str) -> int: """Return a stable source-native web response body size for an HTTP status.""" + if status_code < 400 and is_health_endpoint_path(uri): + return response_size_for_health_endpoint(status_code, host, uri) if status_code < 400: rng = random.Random(_stable_seed(f"web_response:{status_code}:{host}:{uri}")) return 
response_size_for_mime(rng, normalize_mime_type_for_path(uri, "text/html")) diff --git a/src/evidenceforge/generation/activity/site_maps.py b/src/evidenceforge/generation/activity/site_maps.py index 2364caba..634768b0 100644 --- a/src/evidenceforge/generation/activity/site_maps.py +++ b/src/evidenceforge/generation/activity/site_maps.py @@ -17,6 +17,7 @@ from evidenceforge.config import get_activity_directory from evidenceforge.config.overlay import deep_merge_dict, load_with_overlay +from evidenceforge.utils.rng import _stable_seed _SITE_MAPS_PATH = get_activity_directory() / "site_maps.yaml" _CACHED_DATA: dict[str, Any] | None = None @@ -69,20 +70,97 @@ class SiteMap: cdn_domains: list[str] = field(default_factory=list) -def _substitute_vars(rng: random.Random, path: str, data: dict[str, Any]) -> str: +_STATIC_ASSET_TYPES = { + "application/javascript", + "application/json", + "image/png", + "image/svg+xml", + "image/webp", + "text/css", + "font/woff", + "font/woff2", +} + +_DYNAMIC_RESOURCE_MARKERS = ( + "/avatar/", + "/content/", + "/comments/", + "/patient/", + "/post/", + "/profile-displayphoto", + "/thumb/", + "/u/", +) + + +def _uses_stable_asset_tokens(path: str, content_type: str | None) -> bool: + """Return whether template cache-buster tokens should be stable for a host.""" + if "{hex" not in path: + return False + if content_type not in _STATIC_ASSET_TYPES: + return False + lowered = path.lower() + if any(marker in lowered for marker in _DYNAMIC_RESOURCE_MARKERS): + return False + return ( + "/assets/" in lowered + or "/static/" in lowered + or "bundle" in lowered + or "chunk" in lowered + or lowered.split("?", 1)[0].endswith((".css", ".js", ".png", ".svg", ".webp", ".woff2")) + ) + + +def _stable_hex_token(hostname: str, template: str, token: str, occurrence: int) -> str: + bits = 64 if token == "{hex16}" else 32 + width = bits // 4 + mask = (1 << bits) - 1 + seed = _stable_seed(f"site_map_asset:{hostname}:{template}:{token}:{occurrence}") + return f"{seed & mask:0{width}x}" + + +def _replace_hex_tokens( + rng: random.Random, + path: str, + *, + hostname: str, + template: str, + stable_asset_tokens: bool, +) -> str: + for token, bits in (("{hex8}", 32), ("{hex16}", 64)): + occurrence = 0 + while token in path: + if stable_asset_tokens: + replacement = _stable_hex_token(hostname, template, token, occurrence) + else: + replacement = f"{rng.getrandbits(bits):0{bits // 4}x}" + path = path.replace(token, replacement, 1) + occurrence += 1 + return path + + +def _substitute_vars( + rng: random.Random, + path: str, + data: dict[str, Any], + *, + hostname: str = "", + content_type: str | None = None, +) -> str: """Replace template variables in a URI path.""" + stable_asset_tokens = _uses_stable_asset_tokens(path, content_type) + template = path if "{guid}" in path: path = path.replace("{guid}", str(uuid.UUID(int=rng.getrandbits(128))), 1) if "{guid}" in path: path = path.replace("{guid}", str(uuid.UUID(int=rng.getrandbits(128))), 1) - if "{hex8}" in path: - path = path.replace("{hex8}", f"{rng.getrandbits(32):08x}", 1) - if "{hex8}" in path: - path = path.replace("{hex8}", f"{rng.getrandbits(32):08x}", 1) - if "{hex16}" in path: - path = path.replace("{hex16}", f"{rng.getrandbits(64):016x}", 1) - if "{hex16}" in path: - path = path.replace("{hex16}", f"{rng.getrandbits(64):016x}", 1) + path = _replace_hex_tokens( + rng, + path, + hostname=hostname, + template=template, + stable_asset_tokens=stable_asset_tokens, + ) if "{search_term}" in path: search_terms = data.get("search_terms", 
["enterprise+software"]) path = path.replace("{search_term}", rng.choice(search_terms)) @@ -107,11 +185,18 @@ def _build_subresources( rng: random.Random, raw_list: list[dict[str, str]], data: dict[str, Any], + hostname: str, ) -> list[SubresourceDef]: """Convert raw YAML subresource dicts to SubresourceDef objects.""" result = [] for item in raw_list: - path = _substitute_vars(rng, item.get("path", "/"), data) + path = _substitute_vars( + rng, + item.get("path", "/"), + data, + hostname=item.get("host") or hostname, + content_type=item.get("type", "application/octet-stream"), + ) result.append( SubresourceDef( path=path, @@ -125,6 +210,7 @@ def _build_subresources( def _build_pages_from_curated( rng: random.Random, + hostname: str, domain_entry: dict[str, Any], data: dict[str, Any], ) -> list[PageDef]: @@ -132,7 +218,12 @@ def _build_pages_from_curated( pages = [] for page_raw in domain_entry.get("pages", []): nav_targets = [_substitute_vars(rng, t, data) for t in page_raw.get("nav_targets", [])] - subresources = _build_subresources(rng, page_raw.get("subresources", []), data) + subresources = _build_subresources( + rng, + page_raw.get("subresources", []), + data, + hostname, + ) pages.append( PageDef( path=_substitute_vars(rng, page_raw["path"], data), @@ -146,6 +237,7 @@ def _build_pages_from_curated( def _build_pages_from_tag( rng: random.Random, + hostname: str, tag_entry: dict[str, Any], data: dict[str, Any], ) -> list[PageDef]: @@ -156,7 +248,7 @@ def _build_pages_from_tag( pattern_name = tmpl.get("subresource_pattern", "") raw_subs = patterns.get(pattern_name, []) nav_targets = [_substitute_vars(rng, t, data) for t in tmpl.get("nav_targets", [])] - subresources = _build_subresources(rng, raw_subs, data) + subresources = _build_subresources(rng, raw_subs, data, hostname) pages.append( PageDef( path=_substitute_vars(rng, tmpl["path"], data), @@ -197,7 +289,7 @@ def get_site_map( domains = data.get("domains", {}) if hostname in domains: entry = domains[hostname] - pages = _build_pages_from_curated(rng, entry, data) + pages = _build_pages_from_curated(rng, hostname, entry, data) cdn = entry.get("cdn_domains", []) return SiteMap(hostname=hostname, pages=pages, cdn_domains=cdn) @@ -205,10 +297,10 @@ def get_site_map( tags = data.get("tags", {}) for tag in domain_tags: if tag in tags: - pages = _build_pages_from_tag(rng, tags[tag], data) + pages = _build_pages_from_tag(rng, hostname, tags[tag], data) return SiteMap(hostname=hostname, pages=pages) # Tier 3: Generic fallback generic = data.get("generic", {}) - pages = _build_pages_from_tag(rng, generic, data) + pages = _build_pages_from_tag(rng, hostname, generic, data) return SiteMap(hostname=hostname, pages=pages) diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index 762133f7..9e1d7d1d 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -591,6 +591,39 @@ def _windows_scheduled_task_offsets( "sunday": 6, } +_DC_KERBEROS_MEMBER_SVC_DIST = ( + ("cifs", 52), + ("host", 16), + ("http", 12), + ("ldap", 8), + ("rpcss", 5), + ("wsman", 4), + ("termsrv", 3), +) +_DC_KERBEROS_LOCAL_SVC_DIST = ( + ("ldap", 42), + ("cifs", 22), + ("host", 18), + ("DNS", 12), + ("rpcss", 4), + ("http", 2), +) +_KERBEROS_MEMBER_SERVICE_MARKERS = { + "app-server", + "crm", + "exchange", + "file-server", + "iis", + "mssql", + "print", + "sharepoint", + "smb", + "sql-server", + "web", + "web-server", + "windows-search", +} + def 
_merge_systemd_schedules(default: dict, overlay: dict) -> dict: """Merge overlay systemd schedules into defaults (keyed by service name).""" @@ -643,6 +676,47 @@ def _machine_account_ntlm_offset_seconds(tgt_offset_seconds: float, rng: random. return max(0.0, min(3599.0, candidate)) +def _dc_kerberos_cycle_range(multiplier: float) -> tuple[int, int]: + """Return per-client DC Kerberos cycle bounds without letting DC roles explode volume.""" + return scale_count_range(1, 3, min(max(multiplier, 0.25), 2.5)) + + +def _dc_kerberos_tgs_range(multiplier: float) -> tuple[int, int]: + """Return service-ticket burst bounds for one machine-account Kerberos cycle.""" + return scale_count_range(1, 2, min(max(multiplier, 0.25), 1.5)) + + +def _pick_dc_kerberos_service(rng: random.Random, *, target_is_dc: bool) -> str: + """Pick a Kerberos service class with source-native skew instead of uniform buckets.""" + dist = _DC_KERBEROS_LOCAL_SVC_DIST if target_is_dc else _DC_KERBEROS_MEMBER_SVC_DIST + values = [entry[0] for entry in dist] + weights = [entry[1] for entry in dist] + return rng.choices(values, weights=weights, k=1)[0] + + +def _pick_dc_kerberos_target( + rng: random.Random, + member_servers: list[str], + dc_hostname: str, +) -> tuple[str, bool]: + """Pick a service-ticket target, favoring member services over the DC itself.""" + if member_servers and rng.random() < 0.82: + return rng.choice(member_servers), False + return dc_hostname, True + + +def _is_kerberos_member_server(system: Any) -> bool: + """Return whether a Windows host should receive routine machine-account TGS traffic.""" + host_type = str(getattr(system, "type", "")).lower() + if host_type not in {"server", "workstation"}: + return False + services = {str(value).lower().replace("_", "-") for value in getattr(system, "services", [])} + roles = { + str(value).lower().replace("_", "-") for value in (getattr(system, "roles", None) or []) + } + return bool((services | roles) & _KERBEROS_MEMBER_SERVICE_MARKERS) + + class BaselineMixin: """Mixin providing baseline activity generation methods.""" @@ -5032,7 +5106,9 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 None, ) for client in windows_clients: - num_cycles = self._scaled_randint(rng, dc_system, "dc_kerberos", 3, 8) + dc_kerberos_multiplier = self._activity_multiplier(dc_system, "dc_kerberos") + cycle_lo, cycle_hi = _dc_kerberos_cycle_range(dc_kerberos_multiplier) + num_cycles = rng.randint(cycle_lo, cycle_hi) base_interval = 3600 / (num_cycles + 1) for i in range(num_cycles): offset = base_interval * (i + 1) + rng.gauss(0, base_interval * 0.15) @@ -5050,43 +5126,27 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 if rng.random() < 0.22: num_tgs = 0 else: - num_tgs = self._scaled_randint( - rng, - dc_system, - "dc_kerberos", - 1, - 5, - ) + tgs_lo, tgs_hi = _dc_kerberos_tgs_range(dc_kerberos_multiplier) + num_tgs = rng.randint(tgs_lo, tgs_hi) member_servers = [ s.hostname for s in self.scenario.environment.systems if _get_os_category(s.os) == "windows" and s.ip not in dc_ips - and any( - svc in s.services - for svc in [ - "file-server", - "sql-server", - "web", - "iis", - "exchange", - "sharepoint", - "crm", - "print", - ] - ) - ] or [dc_hostname] + and _is_kerberos_member_server(s) + ] elapsed_ms = 0 for tgs_i in range(num_tgs): elapsed_ms += _machine_account_tgs_gap_ms(rng, first=tgs_i == 0) ts2 = ts + timedelta(milliseconds=elapsed_ms) if ts2 >= current_hour + timedelta(hours=1): continue - svc = rng.choice(["cifs", "ldap", "http", 
"host"]) - if rng.random() < 0.60 and member_servers: - target = rng.choice(member_servers) - else: - target = dc_hostname + target, target_is_dc = _pick_dc_kerberos_target( + rng, + member_servers, + dc_hostname, + ) + svc = _pick_dc_kerberos_service(rng, target_is_dc=target_is_dc) self.activity_generator.generate_kerberos_service_ticket( username=username, service_name=f"{svc}/{target}", @@ -5769,7 +5829,8 @@ def _emit_web_server_access( if top_level_budget <= 0: return - internal_ips = [s.ip for s in systems if s.ip != sys_obj.ip] + internal_client_systems = [s for s in systems if s.ip != sys_obj.ip] + internal_ips = [s.ip for s in internal_client_systems] segment = self._get_segment_for_system(sys_obj) exposure = segment.exposure if segment else self._get_system_exposure(sys_obj) ext_ratio = ( @@ -5794,6 +5855,32 @@ def _choose_client_ip() -> str: return rng.choices(internal_ips, weights=int_ip_weights, k=1)[0] return "10.0.0.1" + def _profile_restricted_internal_pool( + profile: dict[str, Any], + ) -> tuple[list[str], list[float]] | None: + raw_types = profile.get("source_type_any") + raw_roles = profile.get("source_role_any") + type_filter = ( + {str(value) for value in raw_types} if isinstance(raw_types, list) else set() + ) + role_filter = ( + {str(value) for value in raw_roles} if isinstance(raw_roles, list) else set() + ) + if not type_filter and not role_filter: + return None + + candidates = [] + for candidate in internal_client_systems: + candidate_type = str(getattr(candidate, "type", "")) + candidate_roles = {str(role) for role in (getattr(candidate, "roles", None) or [])} + if candidate_type in type_filter or candidate_roles & role_filter: + candidates.append(candidate) + if not candidates: + return [], [] + ips = [candidate.ip for candidate in candidates] + weights = [1.0 / (i + 1) for i in range(len(ips))] + return ips, weights + def _effective_dst_ip(is_external_client: bool) -> str: dispatcher = getattr(self, "dispatcher", None) if is_external_client and dispatcher is not None: @@ -5833,6 +5920,21 @@ def _tool_gap_ms() -> int: attempts += 1 client_ip = _choose_client_ip() is_external_client = not _is_private_ip(client_ip) + profile_name, profile = pick_web_visitor_profile( + rng, + is_external=is_external_client, + ) + + restricted_pool = None + if not is_external_client: + restricted_pool = _profile_restricted_internal_pool(profile) + if restricted_pool is not None: + restricted_ips, restricted_weights = restricted_pool + if not restricted_ips: + continue + client_ip = rng.choices(restricted_ips, weights=restricted_weights, k=1)[0] + is_external_client = False + dst_port = 443 if is_external_client and rng.random() < 0.85 else 80 dst_service = "ssl" if dst_port == 443 else "http" http_host = ( @@ -5842,10 +5944,6 @@ def _tool_gap_ms() -> int: ) client_sys = ip_map.get(client_ip) source_os = _get_os_category(client_sys.os) if client_sys is not None else None - profile_name, profile = pick_web_visitor_profile( - rng, - is_external=is_external_client, - ) ua_rng = random.Random( _stable_seed( f"web_client_ua:{client_ip}:{http_host}:{profile_name}:{source_os or 'external'}" diff --git a/tests/unit/test_baseline_canonical.py b/tests/unit/test_baseline_canonical.py index 146f0c0e..2aa54819 100644 --- a/tests/unit/test_baseline_canonical.py +++ b/tests/unit/test_baseline_canonical.py @@ -1343,3 +1343,82 @@ def test_web_server_access_keeps_scanner_requests_source_native(self, monkeypatc assert {kw["http"].status_code for kw in collected} == {404} assert {kw["http"].uri for kw 
in collected} == {"/wp-login.php"} assert all(kw["http"].referrer == "" for kw in collected) + + def test_health_checks_use_server_scoped_internal_sources(self, monkeypatch): + """Monitoring UAs should not be sourced from ordinary workstations.""" + from random import Random + from types import SimpleNamespace + from unittest.mock import MagicMock + + from evidenceforge.generation.activity import web_session_profiles + from evidenceforge.generation.engine.baseline import BaselineMixin + + monkeypatch.setattr( + web_session_profiles, + "pick_web_visitor_profile", + lambda rng, *, is_external: ( + "health_check", + { + "kind": "requests", + "request_count": [1, 1], + "user_agent_pool": "health_check", + "source_type_any": ["server", "domain_controller"], + "source_role_any": ["monitoring", "load_balancer", "forward_proxy"], + "referrer_mode": "none", + "requests": [ + { + "path": "/api/v1/health", + "method": "GET", + "status": 200, + "type": "application/json", + "weight": 1, + } + ], + }, + ), + ) + + target = self._make_web_system("internal") + workstation = SimpleNamespace( + hostname="WS-01", + ip="10.0.10.20", + os="Windows 11", + type="workstation", + roles=[], + services=[], + ) + monitor = SimpleNamespace( + hostname="MON-01", + ip="10.0.10.30", + os="Linux Ubuntu 22.04", + type="server", + roles=["monitoring"], + services=["prometheus"], + ) + + collected = [] + activity_gen = MagicMock() + activity_gen._ip_to_system = {workstation.ip: workstation, monitor.ip: monitor} + activity_gen.generate_connection.side_effect = lambda **kw: collected.append(kw) + engine = MagicMock() + engine.activity_generator = activity_gen + engine._resolve_traffic_rate.return_value = (4, 4) + engine._get_segment_for_system.return_value = SimpleNamespace( + exposure="internal", + external_ratio=None, + ) + engine._generate_external_client_ip.return_value = "8.8.8.8" + + BaselineMixin._emit_web_server_access( + engine, + target, + [target, workstation, monitor], + Random(42), + datetime(2024, 3, 15, 10, 0, 0, tzinfo=UTC), + ) + + assert collected + assert {kw["src_ip"] for kw in collected} == {monitor.ip} + assert all(kw["source_system"] is monitor for kw in collected) + assert all(kw["http"].uri == "/api/v1/health" for kw in collected) + assert all(42 <= kw["http"].response_body_len <= 720 for kw in collected) diff --git a/tests/unit/test_http_content.py b/tests/unit/test_http_content.py index dede0e8b..e98dd6cb 100644 --- a/tests/unit/test_http_content.py +++ b/tests/unit/test_http_content.py @@ -7,8 +7,10 @@ from evidenceforge.generation.activity.http_content import ( infer_mime_type_from_path, + is_health_endpoint_path, is_stable_resource_path, normalize_mime_type_for_path, + response_size_for_health_endpoint, response_size_for_mime, response_size_for_status, ) @@ -48,6 +50,7 @@ def test_stable_resource_path_identifies_static_web_content(): assert is_stable_resource_path("/assets/vendor.js?cache=1") assert is_stable_resource_path("/robots.txt") assert is_stable_resource_path("/index.html") + assert is_stable_resource_path("/api/v1/health") assert not is_stable_resource_path("/api/v1/events") @@ -58,3 +61,15 @@ def test_success_response_size_is_stable_for_same_resource(): assert first == second assert first != sibling + + +def test_health_endpoint_response_sizes_are_small_and_stable(): + assert is_health_endpoint_path("/api/v1/health?probe=1") + + first = response_size_for_health_endpoint(200, "portal.example.com", "/api/v1/health") + second = response_size_for_status(200, "portal.example.com", 
"/api/v1/health") + status = response_size_for_status(200, "portal.example.com", "/status") + + assert first == second + assert 42 <= first <= 720 + assert 18 <= status <= 180 diff --git a/tests/unit/test_phase5_system_traffic.py b/tests/unit/test_phase5_system_traffic.py index 8b395e72..29cf13d1 100644 --- a/tests/unit/test_phase5_system_traffic.py +++ b/tests/unit/test_phase5_system_traffic.py @@ -30,8 +30,13 @@ from evidenceforge.generation.activity import ActivityGenerator from evidenceforge.generation.engine.baseline import ( + _dc_kerberos_cycle_range, + _dc_kerberos_tgs_range, + _is_kerberos_member_server, _machine_account_ntlm_offset_seconds, _machine_account_tgs_gap_ms, + _pick_dc_kerberos_service, + _pick_dc_kerberos_target, _registry_writer_candidates, ) from evidenceforge.generation.state_manager import StateManager @@ -633,6 +638,54 @@ def test_machine_account_ntlm_offset_avoids_same_second_kerberos(self): assert all(0 <= offset <= 3599 for offset in offsets) assert all(abs(offset - tgt_offset) >= 2.0 for offset in offsets) + def test_dc_kerberos_counts_are_capped_for_high_activity_dcs(self): + """DC activity multipliers should not explode machine-account TGS volume.""" + assert _dc_kerberos_cycle_range(8.0) == (2, 8) + assert _dc_kerberos_tgs_range(8.0) == (2, 3) + + def test_dc_kerberos_service_distribution_is_skewed(self): + """Baseline service-ticket classes should not be uniform buckets.""" + from collections import Counter + + member_rng = random.Random(21) + dc_rng = random.Random(22) + member_counts = Counter( + _pick_dc_kerberos_service(member_rng, target_is_dc=False) for _ in range(500) + ) + dc_counts = Counter( + _pick_dc_kerberos_service(dc_rng, target_is_dc=True) for _ in range(500) + ) + + assert member_counts["cifs"] > member_counts["http"] > member_counts["termsrv"] + assert dc_counts["ldap"] > dc_counts["cifs"] > dc_counts["http"] + + def test_dc_kerberos_targets_prefer_member_servers_when_available(self): + rng = random.Random(23) + picks = [_pick_dc_kerberos_target(rng, ["FILE-01", "APP-01"], "DC-01") for _ in range(200)] + + member_count = sum(1 for _target, is_dc in picks if not is_dc) + assert member_count > 140 + + def test_kerberos_member_server_detector_handles_roles_and_source_native_services(self): + file_server = System( + hostname="FILE-SRV-01", + ip="10.0.0.20", + os="Windows Server 2019", + type="server", + services=["SMB", "Windows Search"], + roles=["file_server"], + ) + ordinary_workstation = System( + hostname="WS-01", + ip="10.0.0.30", + os="Windows 11", + type="workstation", + services=["dns-client"], + ) + + assert _is_kerberos_member_server(file_server) + assert not _is_kerberos_member_server(ordinary_workstation) + def test_detects_mssql_from_services(self): """DB servers should be detected from system services list.""" from evidenceforge.generation.engine import GenerationEngine diff --git a/tests/unit/test_site_maps.py b/tests/unit/test_site_maps.py index 4b6e54b3..299cd0d1 100644 --- a/tests/unit/test_site_maps.py +++ b/tests/unit/test_site_maps.py @@ -153,14 +153,33 @@ def test_deterministic_with_same_seed(self): def test_different_seeds_produce_different_vars(self): """Different seeds should produce different template substitutions.""" - sm1 = get_site_map("outlook.office365.com", [], random.Random(1)) - sm2 = get_site_map("outlook.office365.com", [], random.Random(999)) - # The page paths are fixed (/owa/, /owa/#/mail, etc.) 
but subresource - # paths contain {hex16} which should differ + sm1 = get_site_map("github.com", [], random.Random(1)) + sm2 = get_site_map("github.com", [], random.Random(999)) + # User/content URLs should still vary between sessions. subs1 = [s.path for p in sm1.pages for s in p.subresources] subs2 = [s.path for p in sm2.pages for s in p.subresources] assert subs1 != subs2 + def test_deployment_static_asset_hashes_are_stable_per_host(self): + """Cache-busted JS/CSS bundles should look like a deployed app version.""" + sm1 = get_site_map("portal.example.com", ["web"], random.Random(1)) + sm2 = get_site_map("portal.example.com", ["web"], random.Random(999)) + + def _paths(site_map): + return [s.path for p in site_map.pages for s in p.subresources] + + paths1 = _paths(sm1) + paths2 = _paths(sm2) + app_bundles1 = [path for path in paths1 if "/assets/js/app.bundle." in path] + app_bundles2 = [path for path in paths2 if "/assets/js/app.bundle." in path] + content_images1 = [path for path in paths1 if "/assets/img/content/" in path] + content_images2 = [path for path in paths2 if "/assets/img/content/" in path] + + assert app_bundles1 + assert app_bundles1 == app_bundles2 + assert content_images1 + assert content_images1 != content_images2 + def test_favicon_in_curated_domains(self): """Curated domains should include favicon.ico in subresources.""" rng = random.Random(42) diff --git a/tests/unit/test_web_session_profiles.py b/tests/unit/test_web_session_profiles.py index 49c255e0..967ac02e 100644 --- a/tests/unit/test_web_session_profiles.py +++ b/tests/unit/test_web_session_profiles.py @@ -42,6 +42,13 @@ def test_external_profile_selection_excludes_internal_health_checks(): assert name != "health_check" +def test_health_check_profile_is_server_scoped(): + profile = load_web_session_profiles()["visitor_classes"]["health_check"] + + assert profile["source_type_any"] == ["server", "domain_controller"] + assert "monitoring" in profile["source_role_any"] + + def test_user_agent_honors_source_os_pool(): profile = load_web_session_profiles()["visitor_classes"]["human_browser"] ua = pick_web_user_agent(random.Random(1), profile, source_os="linux") From 7e6c7ea895185ec199f83fe3de96f4a712ba3e21 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 15:56:09 -0400 Subject: [PATCH 03/61] fix: repair service logon and linux telemetry realism --- TODO.md | 2 + .../activity/extra_syslog_messages.yaml | 7 +- .../generation/activity/generator.py | 105 +++++++++++++++++- tests/unit/test_activity.py | 72 ++++++++++++ tests/unit/test_spawn_rules.py | 59 ++++++++++ 5 files changed, 240 insertions(+), 5 deletions(-) diff --git a/TODO.md b/TODO.md index 49a98b41..bdb85971 100644 --- a/TODO.md +++ b/TODO.md @@ -255,6 +255,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - Loop 1 fix pass completed and verified: fixed external CIDR-only segment scan target resolution, coherent SSH syslog and Zeek UID observation decisions, OS-aware TLS destination filtering for Windows update/trust-list domains, and Let's Encrypt RSA/ECDSA chain templates. Verification passed with `uv run eforge validate-config`, focused regressions (`11 passed` plus the adjusted certificate regression), `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -v` (`3057 passed, 37 skipped`). - Loop 2 regeneration/eval completed at `93.80` JSON overall (`94/100` human-readable) across `116,087` records. 
Hard probes found zero SSH ordering violations, zero Zeek UID gaps, zero Let's Encrypt R3/X2 mismatches, and zero non-Windows `windowsupdate.com` proxy rows. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `78`, Network `68`, Host/EDR `76` (average `74.5`). - Loop 2 fix pass completed and verified: stabilized per-host deployed web asset cache-buster tokens, made health/status endpoint response sizes small and stable, scoped health-check visitor profiles to server/domain-controller sources, reduced machine-account DC Kerberos volume, skewed service class selection, and fixed member-server SPN targeting for source-native `SMB`/`file_server` names. Verification passed with `uv run eforge validate-config`, focused web/Kerberos regressions, `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -q` (`3064 passed, 37 skipped`). Regenerated eval passed at `95.19` JSON overall across `92,476` records; probes found max static asset variants per host `2`, zero workstation-sourced health-check rows, max health response size `589` bytes, and DC-local Kerberos 4769 rows reduced to `68/491`. + - Loop 3 blind review completed against `/private/tmp/eforge-assess-loop3-neutral-b476c16/data`. Blind synthetic-confidence scores were Threat Hunter `62`, Detection `68`, Network `72`, Host/EDR `72` (average `68.5`). Top verified fix targets are the DC service-logon + `explorer.exe` parent contradiction, Zeek HTTP `trans_depth > 1` on one-request unique-UID connections, repeated Linux syslog daemon lines, and role-drifted bash command pools. + - Loop 3 fix pass completed and verified: service-logon sessions now stay service-shaped instead of spawning Explorer children, process telemetry carries LogonType 5 token/session semantics, one-request HTTP connections clamp Zeek `trans_depth` to 1, Linux eCAR command rendering no longer quotes expandable glob tokens, DB host shell noise avoids web-admin command pools, and high-repeat syslog daemon messages have reduced weights. Verification passed with `uv run eforge validate-config`, focused activity/spawn/config regressions (`269 passed, 1 skipped` plus focused new tests), `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -q` (`3069 passed, 37 skipped`). - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
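Note on the extra_syslog_messages.yaml hunk that follows: it adds explicit `weight: 1` entries so high-repeat daemon lines (rsyslogd, accounts-daemon, irqbalance) are drawn less often relative to other programs. As a reading aid, here is a minimal sketch of the kind of weight-aware picker such entries imply; the actual consumer of this config is not part of this patch series, so the function name, signature, and the unweighted default value are assumptions, not the project's implementation.

import random
from typing import Any

def pick_extra_syslog_message(
    programs: list[dict[str, Any]],
    rng: random.Random,
    default_weight: float = 2.0,  # assumption: the real default is not shown in this series
) -> tuple[str, str]:
    """Pick one (app, message) pair, honoring per-program `weight` when present."""
    # Weight each program entry, falling back to the assumed default when the
    # YAML omits `weight`, then pick one of that program's messages uniformly.
    weights = [float(entry.get("weight", default_weight)) for entry in programs]
    entry = rng.choices(programs, weights=weights, k=1)[0]
    return entry["app"], rng.choice(entry["messages"])

Under this reading, setting `weight: 1` on a program roughly halves how often its lines appear compared to an unannotated program, which matches the Loop 3 goal of reducing repeated Linux syslog daemon lines without removing them.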
diff --git a/src/evidenceforge/config/activity/extra_syslog_messages.yaml b/src/evidenceforge/config/activity/extra_syslog_messages.yaml index bb4a3190..41d4d886 100644 --- a/src/evidenceforge/config/activity/extra_syslog_messages.yaml +++ b/src/evidenceforge/config/activity/extra_syslog_messages.yaml @@ -29,6 +29,7 @@ programs: - "[system] Activating via systemd: service name='org.freedesktop.timedate1'" - app: rsyslogd + weight: 1 messages: - "imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog'" - '[origin software="rsyslogd"] rsyslogd was HUPed' @@ -37,8 +38,8 @@ programs: transient: true weight: 2 messages: - - "admin : TTY=pts/0 ; PWD=/home/admin ; USER=root ; COMMAND=/bin/systemctl status nginx" - - "deploy : TTY=pts/1 ; PWD=/srv/app ; USER=root ; COMMAND=/usr/bin/systemctl reload nginx" + - "admin : TTY=pts/0 ; PWD=/home/admin ; USER=root ; COMMAND=/bin/systemctl status ssh" + - "deploy : TTY=pts/1 ; PWD=/srv/app ; USER=root ; COMMAND=/usr/bin/systemctl status app-agent" - "ops : TTY=pts/2 ; PWD=/home/ops ; USER=root ; COMMAND=/usr/bin/journalctl -u ssh -n 50" - "ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/apt list --upgradable" @@ -71,6 +72,7 @@ programs: - "sda: remaining active paths: 1" - app: accounts-daemon + weight: 1 messages: - "user 'admin' has logged in" @@ -98,5 +100,6 @@ programs: - "cooling device 0 intel_powerclamp type: 0x02" - app: irqbalance + weight: 1 messages: - "Balancing is ineffective IRQs are pinned and balanced" diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index fb249dc6..09ec2cb1 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -36,6 +36,7 @@ import re import shlex import uuid +from dataclasses import replace from datetime import UTC, datetime, timedelta from threading import Lock from typing import Any, Optional @@ -1274,9 +1275,9 @@ def _linux_command_process_from_stage(stage: str) -> tuple[str, str] | None: if alias is not None: image, command_line = alias if index + 1 < len(parts): - command_line = f"{command_line} {shlex.join(parts[index + 1 :])}" + command_line = f"{command_line} {_shell_display_join(parts[index + 1 :])}" return image, command_line - command_line = shlex.join(parts[index:]) + command_line = _shell_display_join(parts[index:]) if parts[index].startswith("/"): return parts[index], command_line mapped = _LINUX_COMMAND_IMAGE_OVERRIDES.get(executable) @@ -1285,6 +1286,17 @@ def _linux_command_process_from_stage(stage: str) -> tuple[str, str] | None: return None +def _shell_display_join(parts: list[str]) -> str: + """Render shell argv for telemetry without quoting expandable glob tokens.""" + rendered: list[str] = [] + for part in parts: + if any(marker in part for marker in ("*", "?", "[")): + rendered.append(part) + else: + rendered.append(shlex.quote(part)) + return " ".join(rendered) + + def _strip_linux_shell_redirections(parts: list[str]) -> list[str]: """Remove shell redirection operators and targets from argv tokens.""" cleaned: list[str] = [] @@ -3067,6 +3079,8 @@ def generate_logon( else: session_kind = { 3: "network", + 4: "batch", + 5: "service", 10: "rdp", }.get(logon_type, "interactive") @@ -3934,6 +3948,7 @@ def _derive_current_directory( process_name: str, command_line: str, parent_pid: int, + logon_type: int = 2, ) -> str: """Derive a source-native process working directory for Sysmon Event 1.""" if _get_os_category(system.os) != "windows": @@ 
-3950,6 +3965,8 @@ def _derive_current_directory( if username in _SYSTEM_ACCOUNTS or username.endswith("$"): return system_dir + "\\" + if logon_type == 5: + return system_dir + "\\" parent_image = ( self._lookup_process_name(system.hostname, parent_pid, _get_os_category(system.os)) @@ -4268,6 +4285,7 @@ def generate_process( ) self.state_manager.set_current_time(time) session = self.state_manager.get_session(process_logon_id) + process_logon_type = session.logon_type if session is not None else 2 if session is not None and time <= session.start_time: offset_ms = 100 + ( _stable_seed( @@ -4313,6 +4331,8 @@ def generate_process( self.state_manager.set_current_time(time) if process_username != user.username and process_username not in _SYSTEM_ACCOUNTS: _integrity = "Medium" + if _get_os_category(system.os) == "windows" and process_logon_type == 5: + _integrity = "High" if _integrity == "Medium" else _integrity if _get_os_category(system.os) == "windows": _integrity, _token_elevation, _mandatory_label = _windows_token_profile( process_username, @@ -4385,6 +4405,8 @@ def generate_process( username=process_username, user_sid=self._get_sid(process_username), logon_id=process_logon_id, + logon_type=process_logon_type, + elevated=_integrity in {"High", "System"}, ), process=ProcessContext( pid=pid, @@ -4408,6 +4430,7 @@ def generate_process( process_name=process_name, command_line=command_line, parent_pid=parent_pid, + logon_type=process_logon_type, ), ), edr=EdrContext(object_id=proc_obj_id, actor_id=parent_obj_id), @@ -4983,6 +5006,9 @@ def generate_connection( """ from evidenceforge.events.contexts import NetworkContext + if http is not None and http.trans_depth != 1: + http = replace(http, trans_depth=1) + caller_provided_duration = duration is not None caller_provided_conn_state = conn_state is not None caller_provided_payload = ( @@ -7095,6 +7121,42 @@ def generate_bash_command( if activity_type_or_command in _activity_type_commands: command_list = _activity_type_commands[activity_type_or_command] + if activity_type_or_command == "process_user_apps": + from evidenceforge.generation.activity.bash_commands import _resolve_server_role + + server_role = _resolve_server_role( + system.hostname, + list(getattr(system, "services", []) or []), + ) + if server_role == "db": + command_list = [ + "ls -la", + "tail -f /var/log/mysql/error.log", + "mysql -u root -p -e 'SHOW PROCESSLIST'", + "pg_isready", + "du -sh /var/lib/mysql/*", + "systemctl status mysql", + "free -m", + "uptime", + "cat /etc/hostname", + "ss -tulnp", + "w", + "htop", + "ip addr show", + ] + elif server_role != "web": + web_markers = ( + "apache", + "nginx", + "certbot", + "/var/www", + "ab -n", + ) + command_list = [ + command + for command in command_list + if not any(marker in command for marker in web_markers) + ] command = _get_rng().choice(command_list) else: # Literal command string (direct commands, typos, etc.) 
@@ -9153,6 +9215,7 @@ def generate_service_logon( logon_type=5, source_ip="-", start_time=time, + session_kind="service", ) host = self._build_host_context(system) reporting_pid = self._get_system_pid(system.hostname, "lsass", 0x2E0) @@ -11086,7 +11149,7 @@ def _ensure_session_explorer_pid( return None if session.system != system.hostname or session.username != user.username: return None - if session.logon_type == 3 or session.session_kind == "network": + if session.logon_type in {3, 5} or session.session_kind in {"network", "service"}: return None sys_pids = getattr(self, "_system_pids", {}).get(system.hostname, {}) @@ -11242,6 +11305,7 @@ def _select_parent_pid( else None ) is_network_logon = active_session and active_session.logon_type == 3 + is_service_logon = active_session and active_session.logon_type == 5 if is_network_logon: # Network logon: parent is services.exe or svchost.exe @@ -11257,6 +11321,15 @@ def _select_parent_pid( return sys_pids.get( "services", sys_pids.get("svchost_dcom", sys_pids.get("wininit", 4)) ) + if is_service_logon: + if exe_name in self._WINDOWS_SHELLS: + return sys_pids.get( + "svchost_netsvcs", + sys_pids.get("svchost_dcom", sys_pids.get("services", 4)), + ) + return sys_pids.get( + "services", sys_pids.get("svchost_dcom", sys_pids.get("wininit", 4)) + ) if exe_name == "explorer.exe": return self._windows_explorer_parent_pid( @@ -11396,6 +11469,7 @@ def _resolve_parent( else None ) is_network_logon = active_session and active_session.logon_type == 3 + is_service_logon = active_session and active_session.logon_type == 5 if is_network_logon: if remote_wrapper_pid is not None: return remote_wrapper_pid @@ -11433,6 +11507,15 @@ def _resolve_parent( return sys_pids.get( "services", sys_pids.get("svchost_dcom", sys_pids.get("wininit", 4)) ) + if is_service_logon: + if is_shell: + return sys_pids.get( + "svchost_netsvcs", + sys_pids.get("svchost_dcom", sys_pids.get("services", 4)), + ) + return sys_pids.get( + "services", sys_pids.get("svchost_dcom", sys_pids.get("wininit", 4)) + ) if os_cat == "windows" and exe_name == "explorer.exe": return self._windows_explorer_parent_pid(system, user, time, logon_id) @@ -11550,6 +11633,22 @@ def _sanitize_user_parent_pid( parent_proc = self.state_manager.get_process(system.hostname, parent_pid) parent_image = (parent_proc.image if parent_proc is not None else "").lower() process_exe = process_name.rsplit("\\", 1)[-1].rsplit("/", 1)[-1].lower() + session = self.state_manager.get_session(logon_id) + if ( + os_category == "windows" + and session is not None + and session.logon_type == 5 + and parent_image.rsplit("\\", 1)[-1].rsplit("/", 1)[-1] == "explorer.exe" + ): + sys_pids = getattr(self, "_system_pids", {}).get(system.hostname, {}) + if process_exe in self._WINDOWS_SHELLS: + return sys_pids.get( + "svchost_netsvcs", + sys_pids.get("svchost_dcom", sys_pids.get("services", parent_pid)), + ) + return sys_pids.get( + "services", sys_pids.get("svchost_dcom", sys_pids.get("wininit", parent_pid)) + ) is_browser_child = process_exe in _WINDOWS_BROWSER_EXES and not ( self._is_top_level_browser_launch(process_name, command_line) ) diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 431be9da..011f7103 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -3213,6 +3213,37 @@ def test_generate_connection_emits_zeek(self, activity_gen, state_manager, mock_ assert event.network.dst_port == dst_port assert event.network.service == "ssl" + def 
test_generate_connection_clamps_http_depth_for_one_request_connections( + self, activity_gen, state_manager, mock_emitters + ): + """A fresh connection UID should not inherit page-session transaction depth.""" + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + state_manager.set_current_time(timestamp) + http = HttpContext( + method="GET", + host="portal.example.com", + uri="/static/app.js", + response_body_len=2048, + trans_depth=4, + ) + + activity_gen.generate_connection( + "10.0.0.1", + "93.184.216.34", + timestamp, + dst_port=80, + proto="tcp", + service="http", + duration=0.5, + orig_bytes=300, + resp_bytes=2048, + http=http, + ) + + event = mock_emitters["zeek_conn"].emit.call_args[0][0] + assert event.http.trans_depth == 1 + assert http.trans_depth == 4 + def test_generate_connection_with_bytes(self, activity_gen, state_manager, mock_emitters): """generate_connection should include byte counts in NetworkContext.""" timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) @@ -3698,6 +3729,41 @@ def test_generate_bash_command_emits_correlated_linux_process( assert terminate_events assert process_events[-1].timestamp < terminate_events[-1].timestamp + def test_process_user_apps_bash_pool_respects_database_role( + self, activity_gen, test_user, monkeypatch, mock_emitters + ): + """Generic user-app shell noise on DB hosts should not pick web-admin commands.""" + + class AssertingRng: + def choice(self, seq): + joined = "\n".join(seq) + assert "apache" not in joined + assert "nginx" not in joined + assert "certbot" not in joined + assert "ab -n" not in joined + return "du -sh /var/lib/mysql/*" + + monkeypatch.setattr(generator_module, "_get_rng", lambda: AssertingRng()) + linux = System( + hostname="DB-PROD-01", + ip="10.0.0.2", + os="Ubuntu 22.04", + type="server", + services=["mysql"], + assigned_user=test_user.username, + ) + + activity_gen.generate_bash_command( + test_user, + linux, + datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC), + "process_user_apps", + emit_process_telemetry=False, + ) + + event = mock_emitters["windows_event_security"].emit.call_args[0][0] + assert event.shell.command == "du -sh /var/lib/mysql/*" + def test_generate_bash_command_does_not_emit_process_for_shell_builtin( self, activity_gen, test_user, state_manager, mock_emitters ): @@ -3880,6 +3946,12 @@ def test_linux_shell_redirection_removed_from_process_argv(self): "mysqldump --single-transaction ehr patients", ) + def test_linux_shell_glob_tokens_remain_unquoted_in_process_argv(self): + """Expanded shell globs should not be rendered as literal quoted wildcards.""" + process = generator_module._linux_command_process_from_shell("du -sh /var/log/*") + + assert process == ("/usr/bin/du", "du -sh /var/log/*") + def test_linux_shell_control_operators_split_process_argv(self): """Shell control operators should separate child process argv entries.""" processes = generator_module._linux_command_processes_from_shell( diff --git a/tests/unit/test_spawn_rules.py b/tests/unit/test_spawn_rules.py index 3e6a08eb..891efd9d 100644 --- a/tests/unit/test_spawn_rules.py +++ b/tests/unit/test_spawn_rules.py @@ -624,6 +624,65 @@ def test_network_logon_uses_services_parent( f"Network logon parent should be services/svchost, got {parent_proc.image}" ) + def test_service_logon_process_uses_service_parent_and_token( + self, state_manager, mock_emitters, win_system + ): + """Type 5 service logon processes should not look like Explorer children.""" + ag, _pids = _setup_activity_gen(state_manager, mock_emitters, win_system) + 
svc_user = User( + username="svc_mhsync", + full_name="Meridian Sync Service", + email="svc_mhsync@example.com", + enabled=True, + persona="service", + ) + logon_time = datetime(2024, 3, 18, 12, 0, 0, tzinfo=UTC) + process_time = datetime(2024, 3, 18, 12, 0, 2, tzinfo=UTC) + + logon_id = ag.generate_service_logon( + system=win_system, + time=logon_time, + service_account=svc_user.username, + ) + parent_pid = ag._resolve_parent( + win_system, + svc_user, + process_time, + logon_id, + r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe", + ) + pid = ag.generate_process( + user=svc_user, + system=win_system, + time=process_time, + logon_id=logon_id, + process_name=r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe", + command_line=r"powershell.exe -NoP -EncodedCommand SQBtAHAAbwByAHQ=", + parent_pid=parent_pid, + ) + + proc = state_manager.get_process(win_system.hostname, pid) + assert proc is not None + parent_proc = state_manager.get_process(win_system.hostname, proc.parent_pid) + assert parent_proc is not None + parent_exe = parent_proc.image.rsplit("\\", 1)[-1].lower() + assert parent_exe in {"services.exe", "svchost.exe"} + assert proc.integrity_level == "High" + + process_events = [ + call.args[0] + for call in mock_emitters["windows_event_security"].emit.call_args_list + if call.args[0].event_type == "process_create" + ] + assert process_events + event = process_events[-1] + assert event.auth.logon_type == 5 + assert event.process.parent_image.rsplit("\\", 1)[-1].lower() in { + "services.exe", + "svchost.exe", + } + assert event.process.current_directory == "C:\\Windows\\System32\\" + class TestLinuxProcessTreeRealism: """Linux process trees should use spawn rules for parent selection.""" From 02bd4bd980c807ab937a948700efbf565d7474b8 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 16:07:11 -0400 Subject: [PATCH 04/61] fix: scope bash tool affinity by role pool --- TODO.md | 1 + .../generation/activity/bash_commands.py | 9 +++++---- tests/unit/test_expert_round4.py | 12 ++++++++++++ 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/TODO.md b/TODO.md index bdb85971..e41bd37e 100644 --- a/TODO.md +++ b/TODO.md @@ -257,6 +257,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - Loop 2 fix pass completed and verified: stabilized per-host deployed web asset cache-buster tokens, made health/status endpoint response sizes small and stable, scoped health-check visitor profiles to server/domain-controller sources, reduced machine-account DC Kerberos volume, skewed service class selection, and fixed member-server SPN targeting for source-native `SMB`/`file_server` names. Verification passed with `uv run eforge validate-config`, focused web/Kerberos regressions, `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -q` (`3064 passed, 37 skipped`). Regenerated eval passed at `95.19` JSON overall across `92,476` records; probes found max static asset variants per host `2`, zero workstation-sourced health-check rows, max health response size `589` bytes, and DC-local Kerberos 4769 rows reduced to `68/491`. - Loop 3 blind review completed against `/private/tmp/eforge-assess-loop3-neutral-b476c16/data`. Blind synthetic-confidence scores were Threat Hunter `62`, Detection `68`, Network `72`, Host/EDR `72` (average `68.5`). 
Top verified fix targets are the DC service-logon + `explorer.exe` parent contradiction, Zeek HTTP `trans_depth > 1` on one-request unique-UID connections, repeated Linux syslog daemon lines, and role-drifted bash command pools. - Loop 3 fix pass completed and verified: service-logon sessions now stay service-shaped instead of spawning Explorer children, process telemetry carries LogonType 5 token/session semantics, one-request HTTP connections clamp Zeek `trans_depth` to 1, Linux eCAR command rendering no longer quotes expandable glob tokens, DB host shell noise avoids web-admin command pools, and high-repeat syslog daemon messages have reduced weights. Verification passed with `uv run eforge validate-config`, focused activity/spawn/config regressions (`269 passed, 1 skipped` plus focused new tests), `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -q` (`3069 passed, 37 skipped`). + - Loop 4 regeneration/eval hard-probe follow-up completed and verified: the first post-`7e6c7ea` regeneration passed automated eval at `95.30` across `86,698` records and fixed the service-logon/Zeek-depth/quoted-glob probes, but hard probing still found DB-host `apache2` bash/eCAR commands caused by username-only bash tool-affinity caching. Fixed the cache to scope affinity by user plus command pool so web-admin tool preferences cannot leak into DB sessions. Verification passed with focused regressions, `uv run eforge validate-config`, related activity/spawn/expert/baseline tests (`246 passed, 1 skipped`), Ruff checks, and full normal `uv run pytest --no-cov -q` (`3070 passed, 37 skipped`). - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/generation/activity/bash_commands.py b/src/evidenceforge/generation/activity/bash_commands.py index 3d53f766..f9cfb2ec 100644 --- a/src/evidenceforge/generation/activity/bash_commands.py +++ b/src/evidenceforge/generation/activity/bash_commands.py @@ -218,7 +218,7 @@ def _typo_allowed( return True -_USER_TOOL_AFFINITY: dict[str, list[str]] = {} +_USER_TOOL_AFFINITY: dict[tuple[str, tuple[str, ...]], list[str]] = {} def _get_user_pool(username: str, full_pool: list[str]) -> list[str]: @@ -228,8 +228,9 @@ def _get_user_pool(username: str, full_pool: list[str]) -> list[str]: 80% of role-specific commands come from the primary tools, 20% from the full pool — so users have consistent tooling preferences. 
""" - if username in _USER_TOOL_AFFINITY: - return _USER_TOOL_AFFINITY[username] + cache_key = (username, tuple(full_pool)) + if cache_key in _USER_TOOL_AFFINITY: + return _USER_TOOL_AFFINITY[cache_key] # Identify tool families by prefix keywords _TOOL_FAMILIES = { @@ -260,7 +261,7 @@ def _get_user_pool(username: str, full_pool: list[str]) -> list[str]: if len(primary_pool) < 3: primary_pool = full_pool - _USER_TOOL_AFFINITY[username] = primary_pool + _USER_TOOL_AFFINITY[cache_key] = primary_pool return primary_pool diff --git a/tests/unit/test_expert_round4.py b/tests/unit/test_expert_round4.py index 1b481c77..42868f4d 100644 --- a/tests/unit/test_expert_round4.py +++ b/tests/unit/test_expert_round4.py @@ -103,6 +103,18 @@ def test_different_users_may_get_different_pools(self): assert all(cmd in pool for cmd in pool_a) assert all(cmd in pool for cmd in pool_b) + def test_same_user_pool_affinity_is_role_specific(self): + """A user's web-admin affinity should not leak into later DB sessions.""" + commands = load_bash_commands() + web_pool = commands["webadmin"] + db_pool = commands["dba"] + + _get_user_pool("marcus.chen", web_pool) + db_affinity = _get_user_pool("marcus.chen", db_pool) + + assert all(command in db_pool for command in db_affinity) + assert not any("apache2" in command or "nginx" in command for command in db_affinity) + class TestPerUserBrowserAffinity: """P1-7c: Per-user browser affinity.""" From 67ac00d4baddbb1430bd30a93936ca67b9aebfa8 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 16:11:34 -0400 Subject: [PATCH 05/61] fix: prefer host services in bash templates --- TODO.md | 1 + .../generation/activity/bash_commands.py | 29 ++++++++++++++++--- tests/unit/test_expert_round4.py | 12 ++++++++ 3 files changed, 38 insertions(+), 4 deletions(-) diff --git a/TODO.md b/TODO.md index e41bd37e..c1caa911 100644 --- a/TODO.md +++ b/TODO.md @@ -258,6 +258,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - Loop 3 blind review completed against `/private/tmp/eforge-assess-loop3-neutral-b476c16/data`. Blind synthetic-confidence scores were Threat Hunter `62`, Detection `68`, Network `72`, Host/EDR `72` (average `68.5`). Top verified fix targets are the DC service-logon + `explorer.exe` parent contradiction, Zeek HTTP `trans_depth > 1` on one-request unique-UID connections, repeated Linux syslog daemon lines, and role-drifted bash command pools. - Loop 3 fix pass completed and verified: service-logon sessions now stay service-shaped instead of spawning Explorer children, process telemetry carries LogonType 5 token/session semantics, one-request HTTP connections clamp Zeek `trans_depth` to 1, Linux eCAR command rendering no longer quotes expandable glob tokens, DB host shell noise avoids web-admin command pools, and high-repeat syslog daemon messages have reduced weights. Verification passed with `uv run eforge validate-config`, focused activity/spawn/config regressions (`269 passed, 1 skipped` plus focused new tests), `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -q` (`3069 passed, 37 skipped`). - Loop 4 regeneration/eval hard-probe follow-up completed and verified: the first post-`7e6c7ea` regeneration passed automated eval at `95.30` across `86,698` records and fixed the service-logon/Zeek-depth/quoted-glob probes, but hard probing still found DB-host `apache2` bash/eCAR commands caused by username-only bash tool-affinity caching. 
Fixed the cache to scope affinity by user plus command pool so web-admin tool preferences cannot leak into DB sessions. Verification passed with focused regressions, `uv run eforge validate-config`, related activity/spawn/expert/baseline tests (`246 passed, 1 skipped`), Ruff checks, and full normal `uv run pytest --no-cov -q` (`3070 passed, 37 skipped`). + - Loop 4 regeneration/eval second hard-probe follow-up completed and verified: post-`02bd4bd` regeneration passed automated eval at `95.71` across `87,420` records and kept the service-logon/Zeek-depth/quoted-glob probes clean, but two DB-host `apache2` commands remained via global `{service}` template substitution. Fixed service placeholders to prefer the current host's service list before falling back to global service names. Verification passed with focused regressions, `uv run eforge validate-config`, related activity/spawn/expert/baseline tests (`247 passed, 1 skipped`), Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3071 passed, 37 skipped`). - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
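Before the bash_commands.py diff below, a minimal sketch of the cache-key change the bullet above describes; the 80/20 tool-family selection is simplified to a plain random sample, and the username and command strings are illustrative only.

```python
import random

# Sketch only: keying affinity by username alone would let a cached web-admin pool answer
# a later DB-pool lookup; keying by (username, pool) keeps the two roles separate.
_affinity: dict[tuple[str, tuple[str, ...]], list[str]] = {}


def user_pool(username: str, full_pool: list[str], rng: random.Random) -> list[str]:
    key = (username, tuple(full_pool))  # the pool identity is part of the cache key
    if key not in _affinity:
        _affinity[key] = rng.sample(full_pool, k=max(3, len(full_pool) // 2))
    return _affinity[key]


rng = random.Random(7)
web_pool = ["systemctl status nginx", "certbot renew", "tail -f /var/log/apache2/error.log", "ls -la"]
db_pool = ["systemctl status mysql", "pg_isready", "du -sh /var/lib/mysql/*", "ls -la"]
user_pool("marcus.chen", web_pool, rng)
assert set(user_pool("marcus.chen", db_pool, rng)) <= set(db_pool)  # no web commands leak in
```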
diff --git a/src/evidenceforge/generation/activity/bash_commands.py b/src/evidenceforge/generation/activity/bash_commands.py index f9cfb2ec..c9d5dd97 100644 --- a/src/evidenceforge/generation/activity/bash_commands.py +++ b/src/evidenceforge/generation/activity/bash_commands.py @@ -68,14 +68,35 @@ def _resolve_server_role(hostname: str, services: list[str]) -> str: return "generic" -def _resolve_template(template: str, rng: random.Random, params: dict[str, list[str]]) -> str: +def _service_template_values(system_services: list[str] | None, fallback: list[str]) -> list[str]: + """Return service placeholder values that fit the current host when possible.""" + contextual: list[str] = [] + for service in system_services or []: + normalized = service.strip().lower() + if not normalized or normalized in {"dns-client", "systemd"}: + continue + if normalized == "ssh": + normalized = "sshd" + contextual.append(normalized) + return contextual or fallback + + +def _resolve_template( + template: str, + rng: random.Random, + params: dict[str, list[str]], + system_services: list[str] | None = None, +) -> str: """Resolve {placeholder} tokens in a command template.""" result = template # Iterate to handle templates with multiple placeholders for key, values in params.items(): token = "{" + key + "}" while token in result: - result = result.replace(token, rng.choice(values), 1) + candidates = ( + _service_template_values(system_services, values) if key == "service" else values + ) + result = result.replace(token, rng.choice(candidates), 1) return result @@ -325,9 +346,9 @@ def pick_bash_command_entry( if username and rng.random() < 0.80: pool = _get_user_pool(username, pool) template = rng.choice(pool) - return _resolve_template(template, rng, params), False + return _resolve_template(template, rng, params, system_services), False # Common command (60%) common = commands.get("common", ["ls"]) template = rng.choice(common) - return _resolve_template(template, rng, params), False + return _resolve_template(template, rng, params, system_services), False diff --git a/tests/unit/test_expert_round4.py b/tests/unit/test_expert_round4.py index 42868f4d..ede02951 100644 --- a/tests/unit/test_expert_round4.py +++ b/tests/unit/test_expert_round4.py @@ -10,6 +10,7 @@ ) from evidenceforge.generation.activity.bash_commands import ( _get_user_pool, + _resolve_template, load_bash_commands, ) from evidenceforge.generation.activity.dns_registry import pick_domain_and_ip @@ -115,6 +116,17 @@ def test_same_user_pool_affinity_is_role_specific(self): assert all(command in db_pool for command in db_affinity) assert not any("apache2" in command or "nginx" in command for command in db_affinity) + def test_service_placeholder_prefers_host_services(self): + """Generic service placeholders should not pull web services onto DB hosts.""" + command = _resolve_template( + "systemctl status {service}", + random.Random(42), + {"service": ["apache2", "nginx"]}, + ["mysql", "ssh", "dns-client"], + ) + + assert command in {"systemctl status mysql", "systemctl status sshd"} + class TestPerUserBrowserAffinity: """P1-7c: Per-user browser affinity.""" From caf06a70c6f0baea1d6f04eacc18f0c8d54940d8 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Fri, 15 May 2026 16:48:38 -0400 Subject: [PATCH 06/61] fix: align cli http and analyzer timing realism --- TODO.md | 4 + .../config/activity/application_catalog.yaml | 2 +- .../generation/activity/generator.py | 180 ++++++++++++++++-- .../generation/activity/helpers.py | 7 + src/evidenceforge/generation/emitters/ecar.py | 54 +++++- .../generation/emitters/zeek_files.py | 28 ++- .../generation/emitters/zeek_x509.py | 10 +- .../generation/engine/baseline.py | 14 +- tests/unit/test_activity.py | 65 +++++++ tests/unit/test_ecar_spec_compliance.py | 1 + tests/unit/test_zeek_activity_contexts.py | 17 +- tests/unit/test_zeek_files.py | 41 ++++ tests/unit/test_zeek_ssl.py | 39 ++++ 13 files changed, 431 insertions(+), 31 deletions(-) diff --git a/TODO.md b/TODO.md index c1caa911..515634a2 100644 --- a/TODO.md +++ b/TODO.md @@ -259,6 +259,10 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - Loop 3 fix pass completed and verified: service-logon sessions now stay service-shaped instead of spawning Explorer children, process telemetry carries LogonType 5 token/session semantics, one-request HTTP connections clamp Zeek `trans_depth` to 1, Linux eCAR command rendering no longer quotes expandable glob tokens, DB host shell noise avoids web-admin command pools, and high-repeat syslog daemon messages have reduced weights. Verification passed with `uv run eforge validate-config`, focused activity/spawn/config regressions (`269 passed, 1 skipped` plus focused new tests), `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -q` (`3069 passed, 37 skipped`). - Loop 4 regeneration/eval hard-probe follow-up completed and verified: the first post-`7e6c7ea` regeneration passed automated eval at `95.30` across `86,698` records and fixed the service-logon/Zeek-depth/quoted-glob probes, but hard probing still found DB-host `apache2` bash/eCAR commands caused by username-only bash tool-affinity caching. Fixed the cache to scope affinity by user plus command pool so web-admin tool preferences cannot leak into DB sessions. Verification passed with focused regressions, `uv run eforge validate-config`, related activity/spawn/expert/baseline tests (`246 passed, 1 skipped`), Ruff checks, and full normal `uv run pytest --no-cov -q` (`3070 passed, 37 skipped`). - Loop 4 regeneration/eval second hard-probe follow-up completed and verified: post-`02bd4bd` regeneration passed automated eval at `95.71` across `87,420` records and kept the service-logon/Zeek-depth/quoted-glob probes clean, but two DB-host `apache2` commands remained via global `{service}` template substitution. Fixed service placeholders to prefer the current host's service list before falling back to global service names. Verification passed with focused regressions, `uv run eforge validate-config`, related activity/spawn/expert/baseline tests (`247 passed, 1 skipped`), Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3071 passed, 37 skipped`). + - Loop 4 final regeneration/eval completed from commit `67ac00d`: regenerated unchanged `scenarios/iteration-test/scenario.yaml`, quantitative eval passed at `95.94` JSON overall (`96/100` human-readable) across `89,334` records, and hard probes found zero HTTP `trans_depth > 1` rows, zero service-logon Explorer/medium-integrity/nonzero-terminal contradictions, zero quoted-glob commands, and zero DB-host `apache2`/`certbot`/`nginx reload`/`ab` web-admin command hits. 
+ - Loop 4 blind review completed against `/private/tmp/eforge-assess-loop4-neutral-67ac00d/data`. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `82`, Network `82`, Host/EDR `78` (average `80.0`). Top next fix targets are canonical process-to-network HTTP client mismatch (`api.example.com/status` command lines paired with unrelated proxy/Zeek hosts and user agents), repeated Linux SSH/logind ordering and exact password-auth timing artifacts, Windows eCAR thread-ID morphology, and Zeek protocol timing/lifecycle bounds. + - Loop 4 blind-review follow-up fix pass completed and verified: command-line HTTP clients now drive canonical HTTP/proxy/TLS destination metadata so curl/wget-style eCAR process rows do not point at unrelated proxy/Zeek hosts or user agents; Linux SSH compound syslog timing uses source-native subsecond connection/PAM/logind spacing instead of exact one-second triads; Windows eCAR process-owned TIDs align to Windows allocation morphology; Zeek certificate files/x509 analyzer rows are bounded inside the owning connection lifetime; sudo baseline noise uses auth/authpriv-style facility semantics. Verification passed with focused regressions, `uv run eforge validate-config`, related unit files (`325 passed, 1 skipped`), Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3075 passed, 37 skipped`). + - Loop 5 regeneration/eval **IN PROGRESS**: regenerate `scenarios/iteration-test/scenario.yaml` from the Loop 4 follow-up commit, rerun quantitative eval and hard probes, then launch the next blind-review panel if the probes stay clean. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
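The ecar.py diff below imports `align_windows_id` from `evidenceforge.utils.windows_ids` without showing it. As an assumption only, a minimal version consistent with the new `tid % 4 == 0` regression and with Windows allocating thread IDs in multiples of four might look like this; it is not the project's actual implementation.

```python
def align_windows_id(value: int) -> int:
    """Round a synthetic TID down to a multiple of 4 (assumed behavior, not the project's code)."""
    return max(4, value - (value % 4))


# _stable_tid() produces values in the 1000-60999 range; alignment keeps the Windows shape.
assert align_windows_id(1003) == 1000
assert align_windows_id(1003) % 4 == 0
```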
diff --git a/src/evidenceforge/config/activity/application_catalog.yaml b/src/evidenceforge/config/activity/application_catalog.yaml index 0aab0266..5dd72844 100644 --- a/src/evidenceforge/config/activity/application_catalog.yaml +++ b/src/evidenceforge/config/activity/application_catalog.yaml @@ -541,7 +541,7 @@ applications: linux: image_path: "/usr/bin/curl" command_templates: - - "curl -s https://api.example.com/status" + - "curl -s {external_api_url}" - "curl -sS -o /dev/null -w '%{http_code}' {internal_url}" - "curl -X GET {internal_url} -H 'Accept: application/json'" categories: [user_app] diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 09ec2cb1..02d9ff43 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -294,6 +294,102 @@ def _extract_nmap_ports(command_line: str) -> list[int]: return list(dict.fromkeys(port for port in ports if 0 < port <= 65535)) +def _extract_http_url_from_command(command_line: str) -> str | None: + """Return the first HTTP(S) URL embedded in a process command line.""" + for match in re.finditer(r"https?://[^\s'\"<>]+", command_line): + candidate = match.group(0).rstrip(").,;]") + parsed = urlsplit(candidate) + if parsed.scheme in {"http", "https"} and parsed.hostname: + return candidate + return None + + +def _http_user_agent_for_process(process_name: str, command_line: str) -> str: + """Return a source-native HTTP User-Agent for command-line HTTP clients.""" + exe = process_name.rsplit("\\", 1)[-1].rsplit("/", 1)[-1].lower() + command = command_line.lower() + if exe in {"curl", "curl.exe"} or command.startswith("curl "): + return "curl/7.88.1" + if exe in {"wget", "wget.exe"} or command.startswith("wget "): + return "Wget/1.21.3" + if "python" in exe and "requests" in command: + return "python-requests/2.31.0" + return "" + + +def _is_tool_http_user_agent(user_agent: str) -> bool: + """Return true when the UA identifies a command-line/library HTTP client.""" + ua = user_agent.strip().lower() + return ua.startswith( + ( + "curl/", + "wget/", + "python-requests/", + "go-http-client/", + "apache-httpclient/", + "powershell/", + ) + ) + + +def _http_method_for_process_command(command_line: str) -> str: + """Infer the HTTP method visible for a simple CLI HTTP command.""" + lowered = f" {command_line.lower()} " + if " -i " in lowered or " --head " in lowered or " --head" in lowered: + return "HEAD" + method_match = re.search(r"(?:\s-X\s+|\s--request\s+)([A-Za-z]+)", command_line) + if method_match: + return method_match.group(1).upper() + return "GET" + + +def _http_context_from_process_command( + process_name: str, + command_line: str, + *, + response_body_len: int, +) -> tuple[HttpContext, str, int, str] | None: + """Build canonical HTTP request metadata from a process command URL. + + Returns ``(context, host, port, service)`` so the owning process, proxy, and + Zeek records agree on host, path, method, and User-Agent for the same flow. 
+ """ + http_url = _extract_http_url_from_command(command_line) + if not http_url: + return None + parsed = urlsplit(http_url) + host = parsed.hostname or "" + if not host: + return None + service = "ssl" if parsed.scheme == "https" else "http" + port = parsed.port or (443 if service == "ssl" else 80) + path = parsed.path or "/" + if parsed.query: + path = f"{path}?{parsed.query}" + user_agent = _http_user_agent_for_process(process_name, command_line) + if not user_agent: + return None + + from evidenceforge.generation.activity.http_content import infer_mime_type_from_path + + mime_type = infer_mime_type_from_path(path) + context = HttpContext( + method=_http_method_for_process_command(command_line), + host=host if port in (80, 443) else f"{host}:{port}", + uri=path, + version="1.1", + user_agent=user_agent, + request_body_len=0, + response_body_len=response_body_len, + status_code=200, + status_msg="OK", + referrer="", + resp_mime_types=[mime_type] if mime_type else [], + tags=[], + ) + return context, host, port, service + + def _parse_port_tokens(tokens: list[str]) -> list[int]: """Parse nmap port tokens until the next option or target token.""" ports: list[int] = [] @@ -1722,6 +1818,31 @@ def _build_host_context(self, system: System) -> HostContext: roles=list(system.roles), ) + def _system_for_hostname(self, hostname: str) -> Any | None: + """Resolve a scenario system by short hostname or FQDN.""" + wanted = hostname.lower().rstrip(".") + if not wanted: + return None + systems = [] + seen_hosts: set[str] = set() + for system in getattr(self, "_ip_to_system", {}).values(): + system_host_key = str(getattr(system, "hostname", "") or "") + if system_host_key in seen_hosts: + continue + seen_hosts.add(system_host_key) + systems.append(system) + for system in systems: + system_host = str(getattr(system, "hostname", "") or "").lower().rstrip(".") + ad_domain = str(getattr(self, "_ad_domain", "") or "").lower().rstrip(".") + system_fqdn = ( + f"{system_host}.{ad_domain}" + if system_host and ad_domain and "." 
not in system_host + else system_host + ) + if wanted in {system_host, system_fqdn}: + return system + return None + def _resolve_process_identity( self, *, @@ -1976,7 +2097,10 @@ def _build_proxy_context( ) user_agent = "" - apply_domain_user_agent = http is None or not is_browser_like_proxy_domain(proxy_hostname) + apply_domain_user_agent = http is None or ( + not _is_tool_http_user_agent(http.user_agent) + and not is_browser_like_proxy_domain(proxy_hostname) + ) domain_user_agent = ( pick_proxy_domain_user_agent( rng, @@ -6292,8 +6416,9 @@ def generate_connection( ) from evidenceforge.generation.activity.proxy_uri import is_browser_like_proxy_domain - apply_domain_user_agent = event.http is None or not is_browser_like_proxy_domain( - proxy_hostname + apply_domain_user_agent = event.http is None or ( + not _is_tool_http_user_agent(event.http.user_agent) + and not is_browser_like_proxy_domain(proxy_hostname) ) domain_user_agent = ( pick_proxy_domain_user_agent( @@ -6934,9 +7059,13 @@ def generate_ssh_session( if event.dst_host and event.dst_host.os_category == "linux": from evidenceforge.events.contexts import SyslogContext + conn_delay_ms = rng.randint(70, 160) + pam_delay_ms = conn_delay_ms + rng.randint(45, 110) + logind_delay_ms = pam_delay_ms + rng.randint(420, 760) + # sshd connection message (precedes auth in real SSH lifecycle) conn_msg_event = SecurityEvent( - timestamp=time - timedelta(seconds=1), + timestamp=time - timedelta(milliseconds=conn_delay_ms), event_type="syslog", src_host=event.dst_host, syslog=SyslogContext( @@ -6972,7 +7101,7 @@ def generate_ssh_session( # pam_unix session opened (syslog-only, no eCAR/Zeek correlation) hostname = target_system.hostname pam_event = SecurityEvent( - timestamp=time + timedelta(seconds=1), + timestamp=time + timedelta(milliseconds=pam_delay_ms), event_type="syslog", src_host=event.dst_host, syslog=SyslogContext( @@ -6989,7 +7118,7 @@ def generate_ssh_session( self.dispatcher.dispatch(pam_event) # systemd-logind new session (syslog-only) - logind_time = time + timedelta(seconds=2) + logind_time = time + timedelta(milliseconds=logind_delay_ms) # Session ID: monotonic + unique per host. StateManager owns this # sequence because baseline syslog noise and explicit SSH sessions # both produce systemd-logind messages for the same host. @@ -8021,6 +8150,11 @@ def _emit_process_network_correlation( conn_time = time + timedelta(milliseconds=rng.randint(50, 500)) ext_hostname = None + dst_port = conn_info["dst_port"] + service = conn_info["service"] + http_context = None + resp_bytes = rng.randint(500, 50000) + emit_dns = bool(conn_info["external"]) if conn_info["external"]: # External connection: domain-first selection. 
App-specific mappings @@ -8035,9 +8169,28 @@ def _emit_process_network_correlation( from evidenceforge.generation.activity.dns_registry import ( pick_domain_and_ip as _pick_domain_and_ip, ) + from evidenceforge.generation.activity.dns_registry import resolve_domain_ip dns_tags = conn_info.get("dns_tags") or [] - if conn_info["service"] == "ssl": + process_http = _http_context_from_process_command( + process_name, + command_line, + response_body_len=resp_bytes, + ) + if process_http is not None: + http_context, ext_hostname, dst_port, service = process_http + command_target = self._system_for_hostname(ext_hostname) + if command_target is not None: + dst_ip = command_target.ip + else: + host_lower = ext_hostname.lower().rstrip(".") + ad_domain = str(getattr(self, "_ad_domain", "") or "").lower().rstrip(".") + if host_lower.endswith(".local") or ( + ad_domain and host_lower.endswith(f".{ad_domain}") + ): + return + dst_ip = resolve_domain_ip(ext_hostname, src_host=system.hostname) + elif service == "ssl": if hasattr(self, "_pick_profiled_tls_destination"): ext_hostname, dst_ip = self._pick_profiled_tls_destination( rng, @@ -8071,9 +8224,9 @@ def _emit_process_network_correlation( # Internal connection: use DB server or any internal server db_servers = getattr(self, "_db_servers", []) all_ips = getattr(self, "_all_system_ips", []) - if conn_info["service"] in ("mssql", "mysql", "postgresql") and db_servers: + if service in ("mssql", "mysql", "postgresql") and db_servers: # Filter to DB servers that match the requested service - svc = conn_info["service"] + svc = service compatible = [ e for e in db_servers @@ -8092,14 +8245,15 @@ def _emit_process_network_correlation( src_ip=system.ip, dst_ip=dst_ip, time=conn_time, - dst_port=conn_info["dst_port"], + dst_port=dst_port, proto="tcp", - service=conn_info["service"], + service=service, duration=rng.uniform(0.3, 15.0), orig_bytes=rng.randint(200, 5000), - resp_bytes=rng.randint(500, 50000), - emit_dns=conn_info["external"], + resp_bytes=resp_bytes, + emit_dns=emit_dns, pid=pid, + http=http_context, hostname=ext_hostname if conn_info["external"] else None, ) diff --git a/src/evidenceforge/generation/activity/helpers.py b/src/evidenceforge/generation/activity/helpers.py index 4cf9b1d9..fc95dc63 100644 --- a/src/evidenceforge/generation/activity/helpers.py +++ b/src/evidenceforge/generation/activity/helpers.py @@ -154,6 +154,13 @@ def _get_os_category(os_string: str) -> str: "https://gitlab.corp.local/team/project/-/pipelines", "https://grafana.corp.local/d/system-overview", ], + "external_api_url": [ + "https://api.github.com/rate_limit", + "https://api.gitlab.com/version", + "https://api.cloudflare.com/client/v4/user", + "https://api.slack.com/methods/api.test", + "https://api.snapcraft.io/v2/snaps/refresh", + ], } # Parameterized command-line value pools for process_query variety diff --git a/src/evidenceforge/generation/emitters/ecar.py b/src/evidenceforge/generation/emitters/ecar.py index d49096a1..9907a3a4 100644 --- a/src/evidenceforge/generation/emitters/ecar.py +++ b/src/evidenceforge/generation/emitters/ecar.py @@ -32,6 +32,7 @@ from evidenceforge.generation.activity.timing_profiles import sample_timing_delta from evidenceforge.generation.emitters.host_base import HostMultiplexEmitter from evidenceforge.utils.rng import _stable_seed +from evidenceforge.utils.windows_ids import align_windows_id _ECAR_SORT_PRIORITY = { ("USER_SESSION", "LOGIN"): 0, @@ -192,12 +193,21 @@ def _apply_edr_context(event_data: dict[str, Any], event: SecurityEvent) 
-> None event_data["tid"] = event.edr.tid @staticmethod - def _stable_tid(hostname: str, pid: int, timestamp: datetime, salt: str) -> int: + def _stable_tid( + hostname: str, + pid: int, + timestamp: datetime, + salt: str, + os_category: str = "", + ) -> int: """Return a plausible source thread ID for process-owned eCAR events.""" if pid <= 0: return -1 bucket_ms = int(timestamp.timestamp() * 1000) - return 1000 + (_stable_seed(f"ecar_tid:{hostname}:{pid}:{bucket_ms}:{salt}") % 60000) + tid = 1000 + (_stable_seed(f"ecar_tid:{hostname}:{pid}:{bucket_ms}:{salt}") % 60000) + if os_category == "windows": + return align_windows_id(tid) + return tid def _render_logon(self, event: SecurityEvent) -> None: """Render eCAR USER_SESSION/LOGIN event (logged on dst_host).""" @@ -300,7 +310,13 @@ def _render_process_create(self, event: SecurityEvent) -> None: self._apply_edr_context(event_data, event) event_data.setdefault( "tid", - self._stable_tid(self._host_name(host), proc.pid, event_ts, "process_create"), + self._stable_tid( + self._host_name(host), + proc.pid, + event_ts, + "process_create", + getattr(host, "os_category", ""), + ), ) self.emit_event(event_data) @@ -321,7 +337,13 @@ def _render_process_terminate(self, event: SecurityEvent) -> None: self._apply_edr_context(event_data, event) event_data.setdefault( "tid", - self._stable_tid(self._host_name(host), proc.pid, event.timestamp, "process_terminate"), + self._stable_tid( + self._host_name(host), + proc.pid, + event.timestamp, + "process_terminate", + getattr(host, "os_category", ""), + ), ) self.emit_event(event_data) @@ -348,7 +370,13 @@ def _render_file_event(self, event: SecurityEvent) -> None: self._apply_edr_context(event_data, event) event_data.setdefault( "tid", - self._stable_tid(self._host_name(host), event_data["pid"], event.timestamp, "file"), + self._stable_tid( + self._host_name(host), + event_data["pid"], + event.timestamp, + "file", + getattr(host, "os_category", ""), + ), ) self.emit_event(event_data) @@ -370,7 +398,13 @@ def _render_registry_event(self, event: SecurityEvent) -> None: self._apply_edr_context(event_data, event) event_data.setdefault( "tid", - self._stable_tid(self._host_name(host), event_data["pid"], event.timestamp, "registry"), + self._stable_tid( + self._host_name(host), + event_data["pid"], + event.timestamp, + "registry", + getattr(host, "os_category", ""), + ), ) self.emit_event(event_data) @@ -398,7 +432,13 @@ def _render_module_event(self, event: SecurityEvent) -> None: self._apply_edr_context(event_data, event) event_data.setdefault( "tid", - self._stable_tid(self._host_name(host), event_data["pid"], event.timestamp, "module"), + self._stable_tid( + self._host_name(host), + event_data["pid"], + event.timestamp, + "module", + getattr(host, "os_category", ""), + ), ) self.emit_event(event_data) diff --git a/src/evidenceforge/generation/emitters/zeek_files.py b/src/evidenceforge/generation/emitters/zeek_files.py index ff260dd3..df93f83a 100644 --- a/src/evidenceforge/generation/emitters/zeek_files.py +++ b/src/evidenceforge/generation/emitters/zeek_files.py @@ -112,8 +112,10 @@ def emit(self, event: SecurityEvent) -> None: fuid=cert.fuid, position=depth, ) - event_data = { - "ts": self._offset_timestamp( + cert_ts = _bounded_in_connection_timestamp( + event.timestamp, + net.duration, + self._offset_timestamp( event.timestamp, max( analyzer_delay_ms, @@ -124,6 +126,9 @@ def emit(self, event: SecurityEvent) -> None: + 1, ), ), + ) + event_data = { + "ts": cert_ts, "fuid": cert.fuid, "tx_hosts": 
[net.dst_ip], "rx_hosts": [net.src_ip], @@ -228,6 +233,25 @@ def _bounded_file_transfer_observation( return file_ts, bounded_duration +def _bounded_in_connection_timestamp( + conn_ts: datetime, + conn_duration: float | None, + preferred_ts: datetime, +) -> datetime: + """Keep source-side analyzer rows inside the owning conn.log lifetime.""" + if conn_duration is None or conn_duration <= 0: + return max(conn_ts, preferred_ts) + + epsilon = 0.001 + conn_end = conn_ts + timedelta(seconds=conn_duration) + latest_ts = conn_end - timedelta(seconds=epsilon) + if preferred_ts > latest_ts: + return latest_ts if latest_ts > conn_ts else conn_ts + if preferred_ts < conn_ts: + return conn_ts + return preferred_ts + + def _related_http_analyzer_timestamp(event: SecurityEvent) -> datetime | None: """Return the owning HTTP analyzer timestamp when this file belongs to http.log.""" net = event.network diff --git a/src/evidenceforge/generation/emitters/zeek_x509.py b/src/evidenceforge/generation/emitters/zeek_x509.py index 4db0132c..a573b8a9 100644 --- a/src/evidenceforge/generation/emitters/zeek_x509.py +++ b/src/evidenceforge/generation/emitters/zeek_x509.py @@ -27,6 +27,7 @@ from evidenceforge.events.base import SecurityEvent from evidenceforge.generation.activity.tls_realism import certificate_analyzer_delay_ms from evidenceforge.generation.emitters.zeek_base import SensorMultiplexEmitter +from evidenceforge.generation.emitters.zeek_files import _bounded_in_connection_timestamp class ZeekX509Emitter(SensorMultiplexEmitter): @@ -76,8 +77,15 @@ def _emit_certificate( sensor_hostnames = list(dict.fromkeys([*x509_sensor_hostnames, *ssl_sensor_hostnames])) targets = sensor_hostnames or self._sensor_hostnames new_targets = targets + timestamp = self._offset_timestamp(event.timestamp, analyzer_delay_ms) + if event.network is not None: + timestamp = _bounded_in_connection_timestamp( + event.timestamp, + event.network.duration, + timestamp, + ) event_data: dict[str, Any] = { - "ts": self._offset_timestamp(event.timestamp, analyzer_delay_ms), + "ts": timestamp, "id": x509.fuid, "fingerprint": x509.fingerprint, "certificate.version": x509.certificate_version, diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index 9e1d7d1d..9634ec6d 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -5397,9 +5397,15 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 auth_msg = f"Accepted password for {ssh_user} from {ip} port {port} ssh2" _msg_offset = rng.randint(10, 50) login_times: list[datetime] = [] - for _ in range(4): + for _ in range(3): login_times.append(ts + timedelta(milliseconds=_msg_offset)) - _msg_offset += rng.randint(1, 50) + _msg_offset += rng.randint(12, 70) + # systemd-logind is observed as a different process from + # sshd, so source-observation delay can be independent. + # Keep enough visible margin that New-session rows cannot + # sort before auth/PAM under the default syslog delay profile. 
+ _msg_offset += rng.randint(420, 760) + login_times.append(ts + timedelta(milliseconds=_msg_offset)) ssh_sid = self.state_manager.next_linux_logind_session_id( system.hostname, rng, @@ -5592,12 +5598,16 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 16, ) pid = 500 + (_h % 59500) # range 500-59999 + facility = 10 if app == "sudo" else 3 + severity = 5 if app == "sudo" else 6 self.activity_generator.generate_syslog_event( system=system, time=ts, app_name=app, message=msg, pid=pid, + facility=facility, + severity=severity, ) # ICMP ping between systems on same subnet diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 011f7103..e4da0c82 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -41,6 +41,7 @@ from evidenceforge.generation.activity import generator as generator_module from evidenceforge.generation.activity.generator import ( _extract_image_from_command, + _http_context_from_process_command, _jitter_default_connection_duration, ) from evidenceforge.generation.activity.tls_realism import ( @@ -63,6 +64,70 @@ def test_missing_process_object_id_returns_empty(self): assert second == "" +class TestProcessHttpCommandCorrelation: + def test_http_context_from_curl_command_preserves_url_and_user_agent(self): + """CLI HTTP command lines should drive the canonical HTTP flow metadata.""" + result = _http_context_from_process_command( + "/usr/bin/curl", + "curl -s https://api.github.com/rate_limit?resource=core", + response_body_len=1234, + ) + + assert result is not None + http, host, port, service = result + assert host == "api.github.com" + assert port == 443 + assert service == "ssl" + assert http.host == "api.github.com" + assert http.uri == "/rate_limit?resource=core" + assert http.user_agent == "curl/7.88.1" + assert http.response_body_len == 1234 + + def test_proxy_context_preserves_cli_http_user_agent(self): + """Proxy logs should not replace a caller-provided CLI User-Agent.""" + generator = ActivityGenerator(StateManager(), {}) + source = System( + hostname="LINUX-01", + ip="10.0.0.20", + os="Ubuntu 24.04", + type="workstation", + ) + proxy = System( + hostname="proxy01", + ip="10.0.0.5", + os="Ubuntu 24.04", + type="server", + ) + http = HttpContext( + method="GET", + host="api.github.com", + uri="/rate_limit", + user_agent="curl/7.88.1", + response_body_len=1234, + status_code=200, + status_msg="OK", + resp_mime_types=["application/json"], + ) + + proxy_context = generator._build_proxy_context( + src_ip=source.ip, + dst_ip="140.82.112.5", + dst_port=443, + service="ssl", + duration=1.2, + orig_bytes=320, + resp_bytes=1234, + hostname="api.github.com", + source_system=source, + proxy_sys=proxy, + http=http, + explicit_mode=True, + ) + + assert proxy_context.url == "https://api.github.com/rate_limit" + assert proxy_context.user_agent == "curl/7.88.1" + + class TestNetworkValidation: """Tests for network connection validation.""" diff --git a/tests/unit/test_ecar_spec_compliance.py b/tests/unit/test_ecar_spec_compliance.py index a0618d3f..88808d23 100644 --- a/tests/unit/test_ecar_spec_compliance.py +++ b/tests/unit/test_ecar_spec_compliance.py @@ -939,6 +939,7 @@ def test_process_create_derives_tid_when_context_has_pid(self, emitter, ts): row = emitter.emit_event.call_args.args[0] assert row["tid"] > 0 + assert row["tid"] % 4 == 0 class TestPpidOnlyOnProcess: diff --git a/tests/unit/test_zeek_activity_contexts.py b/tests/unit/test_zeek_activity_contexts.py index bfe1628e..812222e0 100644 --- 
a/tests/unit/test_zeek_activity_contexts.py +++ b/tests/unit/test_zeek_activity_contexts.py @@ -445,7 +445,7 @@ def test_ssh_session_pam_message_uses_non_root_user_uid(self, activity_gen): assert "admin(uid=1001) by (uid=0)" in pam_messages[0] assert "admin(uid=0)" not in pam_messages[0] - def test_ssh_syslog_sub_events_are_second_ordered(self, activity_gen): + def test_ssh_syslog_sub_events_are_source_ordered_with_subsecond_texture(self, activity_gen): gen, events = activity_gen user = User(username="admin", full_name="Admin User", email="admin@example.com") @@ -477,11 +477,18 @@ def test_ssh_syslog_sub_events_are_second_ordered(self, activity_gen): "Accepted password for admin from 10.0.10.50 port 51111 ssh2", "pam_unix(sshd:session): session opened for user admin(uid=1001) by (uid=0)", ] - assert times == [ - base_time - timedelta(seconds=1), - base_time, - base_time + timedelta(seconds=1), + assert times[0] < base_time < times[2] + assert timedelta(milliseconds=70) <= base_time - times[0] <= timedelta(milliseconds=160) + assert timedelta(milliseconds=115) <= times[2] - base_time <= timedelta(milliseconds=270) + assert times[2] - times[0] != timedelta(seconds=1) + + logind_events = [ + event + for event in events + if event.syslog is not None and event.syslog.app_name == "systemd-logind" ] + assert len(logind_events) == 1 + assert logind_events[0].timestamp - times[2] >= timedelta(milliseconds=420) def test_ssh_systemd_session_ids_stay_in_same_integer_regime(self, activity_gen): gen, events = activity_gen diff --git a/tests/unit/test_zeek_files.py b/tests/unit/test_zeek_files.py index 9db90da8..5da19720 100644 --- a/tests/unit/test_zeek_files.py +++ b/tests/unit/test_zeek_files.py @@ -385,6 +385,47 @@ def test_certificate_file_timestamp_follows_parent_ssl_record(self): assert file_row["ts"] > ssl_row["ts"] + def test_certificate_file_timestamp_stays_inside_parent_connection(self): + """Certificate files should not render after the owning conn.log row ends.""" + fmt = load_format("zeek_files") + base_ts = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + with tempfile.TemporaryDirectory() as tmpdir: + output = Path(tmpdir) / "files.json" + emitter = ZeekFilesEmitter(fmt, output) + cert = X509Context( + fuid="FShortCert12345", + fingerprint="e" * 40, + certificate_subject="CN=short.example.test", + certificate_issuer="CN=Example Issuer", + ) + event = SecurityEvent( + timestamp=base_ts, + event_type="connection", + network=NetworkContext( + src_ip="10.0.0.1", + src_port=50000, + dst_ip="10.0.0.10", + dst_port=443, + protocol="tcp", + service="ssl", + conn_state="SF", + zeek_uid="CShortUID123456", + duration=0.01, + ), + ssl=SslContext( + server_name="short.example.test", + cert_chain_fuids=[cert.fuid], + ), + x509=cert, + ) + + emitter.emit(event) + emitter.close() + + file_row = json.loads(output.read_text().splitlines()[0]) + + assert base_ts.timestamp() <= file_row["ts"] <= base_ts.timestamp() + 0.01 + def test_certificate_file_timestamps_follow_chain_depth_order(self): """Certificate file observations should preserve TLS chain order.""" fmt = load_format("zeek_files") diff --git a/tests/unit/test_zeek_ssl.py b/tests/unit/test_zeek_ssl.py index 3bde0763..4d50877a 100644 --- a/tests/unit/test_zeek_ssl.py +++ b/tests/unit/test_zeek_ssl.py @@ -644,6 +644,45 @@ def test_tls_analyzer_logs_have_stage_timestamp_offsets(self): assert "id.orig_h" not in ocsp_file_row assert ocsp_file_row["mime_type"] == "application/ocsp-response" + def test_x509_timestamp_stays_inside_parent_connection(self): + 
"""x509 analyzer rows should not outlive their owning connection interval.""" + x509_fmt = load_format("zeek_x509") + base_ts = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + + with tempfile.TemporaryDirectory() as tmpdir: + out_dir = Path(tmpdir) + x509_emitter = ZeekX509Emitter(x509_fmt, out_dir / "x509.json") + event = SecurityEvent( + timestamp=base_ts, + event_type="connection", + network=NetworkContext( + src_ip="10.0.0.1", + src_port=50000, + dst_ip="8.8.8.8", + dst_port=443, + protocol="tcp", + service="ssl", + conn_state="SF", + zeek_uid="CShortX509UID12", + duration=0.01, + ), + x509=X509Context( + fuid="Fshortx50912345", + fingerprint="abc123", + certificate_serial="01", + certificate_subject="CN=short.example.com", + certificate_issuer="CN=Example CA", + certificate_not_valid_before=1700000000.0, + certificate_not_valid_after=1730000000.0, + ), + ) + + x509_emitter.emit(event) + x509_emitter.close() + x509_row = json.loads((out_dir / "x509.json").read_text().splitlines()[0]) + + assert base_ts.timestamp() <= x509_row["ts"] <= base_ts.timestamp() + 0.01 + def test_revoked_ocsp_status_renders_revocation_metadata(self): """Revoked OCSP rows should include source-native revocation details.""" ocsp_fmt = load_format("zeek_ocsp") From 508501cf413f6bce1c0a37b592a8c992b6c09f25 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 16:56:21 -0400 Subject: [PATCH 07/61] fix: preserve cli http network effect context --- TODO.md | 2 +- .../generation/activity/generator.py | 27 +++++++++++++++++-- tests/unit/test_activity.py | 13 +++++++++ 3 files changed, 39 insertions(+), 3 deletions(-) diff --git a/TODO.md b/TODO.md index 515634a2..484c66db 100644 --- a/TODO.md +++ b/TODO.md @@ -261,7 +261,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - Loop 4 regeneration/eval second hard-probe follow-up completed and verified: post-`02bd4bd` regeneration passed automated eval at `95.71` across `87,420` records and kept the service-logon/Zeek-depth/quoted-glob probes clean, but two DB-host `apache2` commands remained via global `{service}` template substitution. Fixed service placeholders to prefer the current host's service list before falling back to global service names. Verification passed with focused regressions, `uv run eforge validate-config`, related activity/spawn/expert/baseline tests (`247 passed, 1 skipped`), Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3071 passed, 37 skipped`). - Loop 4 final regeneration/eval completed from commit `67ac00d`: regenerated unchanged `scenarios/iteration-test/scenario.yaml`, quantitative eval passed at `95.94` JSON overall (`96/100` human-readable) across `89,334` records, and hard probes found zero HTTP `trans_depth > 1` rows, zero service-logon Explorer/medium-integrity/nonzero-terminal contradictions, zero quoted-glob commands, and zero DB-host `apache2`/`certbot`/`nginx reload`/`ab` web-admin command hits. - Loop 4 blind review completed against `/private/tmp/eforge-assess-loop4-neutral-67ac00d/data`. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `82`, Network `82`, Host/EDR `78` (average `80.0`). Top next fix targets are canonical process-to-network HTTP client mismatch (`api.example.com/status` command lines paired with unrelated proxy/Zeek hosts and user agents), repeated Linux SSH/logind ordering and exact password-auth timing artifacts, Windows eCAR thread-ID morphology, and Zeek protocol timing/lifecycle bounds. 
- - Loop 4 blind-review follow-up fix pass completed and verified: command-line HTTP clients now drive canonical HTTP/proxy/TLS destination metadata so curl/wget-style eCAR process rows do not point at unrelated proxy/Zeek hosts or user agents; Linux SSH compound syslog timing uses source-native subsecond connection/PAM/logind spacing instead of exact one-second triads; Windows eCAR process-owned TIDs align to Windows allocation morphology; Zeek certificate files/x509 analyzer rows are bounded inside the owning connection lifetime; sudo baseline noise uses auth/authpriv-style facility semantics. Verification passed with focused regressions, `uv run eforge validate-config`, related unit files (`325 passed, 1 skipped`), Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3075 passed, 37 skipped`). + - Loop 4 blind-review follow-up fix pass completed and verified: command-line HTTP clients now drive canonical HTTP/proxy/TLS destination metadata so curl/wget-style eCAR process rows do not point at unrelated proxy/Zeek hosts or user agents, including the post-regeneration hard-probe catch where stale effect context retargeted a rendered curl command to a wget-style proxy destination; Linux SSH compound syslog timing uses source-native subsecond connection/PAM/logind spacing instead of exact one-second triads; Windows eCAR process-owned TIDs align to Windows allocation morphology; Zeek certificate files/x509 analyzer rows are bounded inside the owning connection lifetime; sudo baseline noise uses auth/authpriv-style facility semantics. Verification passed with focused regressions, `uv run eforge validate-config`, related unit files (`339 passed, 1 skipped`), Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3076 passed, 37 skipped`). - Loop 5 regeneration/eval **IN PROGRESS**: regenerate `scenarios/iteration-test/scenario.yaml` from the Loop 4 follow-up commit, rerun quantitative eval and hard probes, then launch the next blind-review panel if the probes stay clean. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
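The SSH syslog spacing change described above replaces the exact one-second triad with jittered subsecond offsets. A minimal sketch of that idea, assuming a hypothetical helper name and illustrative jitter bounds (the regression in test_zeek_activity_contexts.py pins the rendered spacing to roughly 70-160 ms before and 115-270 ms after the Accepted-password line, with the logind line trailing the PAM line by at least 420 ms; the upper bounds used here are assumptions, not the shipped values):

import random
from datetime import datetime, timedelta

def ssh_login_triad_times(base_time: datetime, rng: random.Random) -> dict[str, datetime]:
    """Illustrative subsecond spacing for connection/accepted/PAM/logind syslog lines."""
    accepted = base_time
    # Connection banner lands shortly before the Accepted-password line.
    connection = accepted - timedelta(milliseconds=rng.uniform(70, 160))
    # PAM session-open follows the accepted login by a short, non-integer gap.
    pam_open = accepted + timedelta(milliseconds=rng.uniform(115, 270))
    # systemd-logind registers the session later still; upper bound is assumed.
    logind_new = pam_open + timedelta(milliseconds=rng.uniform(420, 900))
    return {
        "connection": connection,
        "accepted": accepted,
        "pam_session_opened": pam_open,
        "logind_new_session": logind_new,
    }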
diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 02d9ff43..5c063c2b 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -390,6 +390,21 @@ def _http_context_from_process_command( return context, host, port, service +def _network_effect_context_for_process( + process_name: str, + command_line: str, + effect_process_name: str, + effect_command_line: str, +) -> tuple[str, str]: + """Choose the process identity used for immediate network side effects.""" + if _extract_http_url_from_command(command_line) and _http_user_agent_for_process( + process_name, + command_line, + ): + return process_name, command_line + return effect_process_name, effect_command_line + + def _parse_port_tokens(tokens: list[str]) -> list[int]: """Parse nmap port tokens until the next option or target token.""" ports: list[int] = [] @@ -8468,6 +8483,14 @@ def execute_baseline_activity( process_name, command_line, ) + network_process_name, network_command_line = ( + _network_effect_context_for_process( + process_name, + command_line, + effect_process_name, + effect_command_line, + ) + ) # Spawn child/utility processes for apps that have them if activity_type == "process_user_apps": @@ -8503,8 +8526,8 @@ def execute_baseline_activity( # (tight PID+timestamp coupling alongside profile-driven volume) self._emit_process_network_correlation( system, - effect_process_name, - effect_command_line, + network_process_name, + network_command_line, process_time, pid, rng, diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index e4da0c82..9ea72c69 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -43,6 +43,7 @@ _extract_image_from_command, _http_context_from_process_command, _jitter_default_connection_duration, + _network_effect_context_for_process, ) from evidenceforge.generation.activity.tls_realism import ( certificate_analyzer_delay_ms, @@ -127,6 +128,18 @@ def test_proxy_context_preserves_cli_http_user_agent(self): assert proxy_context.url == "https://api.github.com/rate_limit" assert proxy_context.user_agent == "curl/7.88.1" + def test_network_effect_context_keeps_rendered_cli_http_command(self): + """A stale process-state lookup should not retarget a rendered curl command.""" + process_name, command_line = _network_effect_context_for_process( + "/usr/bin/curl", + "curl -s https://api.slack.com/methods/api.test", + "/usr/bin/wget", + "wget https://images.netscaler.dev/agent.dat", + ) + + assert process_name == "/usr/bin/curl" + assert command_line == "curl -s https://api.slack.com/methods/api.test" + class TestNetworkValidation: """Tests for network connection validation.""" From 811b2f15f32006b6d1cc6ccfadee59b44e4661a7 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Fri, 15 May 2026 17:08:01 -0400 Subject: [PATCH 08/61] fix: bind process http commands to proxy flows --- TODO.md | 3 +- .../generation/activity/generator.py | 41 +++++++++++ tests/unit/test_activity.py | 70 +++++++++++++++++++ 3 files changed, 113 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 484c66db..e43ba71a 100644 --- a/TODO.md +++ b/TODO.md @@ -262,7 +262,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - Loop 4 final regeneration/eval completed from commit `67ac00d`: regenerated unchanged `scenarios/iteration-test/scenario.yaml`, quantitative eval passed at `95.94` JSON overall (`96/100` human-readable) across `89,334` records, and hard probes found zero HTTP `trans_depth > 1` rows, zero service-logon Explorer/medium-integrity/nonzero-terminal contradictions, zero quoted-glob commands, and zero DB-host `apache2`/`certbot`/`nginx reload`/`ab` web-admin command hits. - Loop 4 blind review completed against `/private/tmp/eforge-assess-loop4-neutral-67ac00d/data`. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `82`, Network `82`, Host/EDR `78` (average `80.0`). Top next fix targets are canonical process-to-network HTTP client mismatch (`api.example.com/status` command lines paired with unrelated proxy/Zeek hosts and user agents), repeated Linux SSH/logind ordering and exact password-auth timing artifacts, Windows eCAR thread-ID morphology, and Zeek protocol timing/lifecycle bounds. - Loop 4 blind-review follow-up fix pass completed and verified: command-line HTTP clients now drive canonical HTTP/proxy/TLS destination metadata so curl/wget-style eCAR process rows do not point at unrelated proxy/Zeek hosts or user agents, including the post-regeneration hard-probe catch where stale effect context retargeted a rendered curl command to a wget-style proxy destination; Linux SSH compound syslog timing uses source-native subsecond connection/PAM/logind spacing instead of exact one-second triads; Windows eCAR process-owned TIDs align to Windows allocation morphology; Zeek certificate files/x509 analyzer rows are bounded inside the owning connection lifetime; sudo baseline noise uses auth/authpriv-style facility semantics. Verification passed with focused regressions, `uv run eforge validate-config`, related unit files (`339 passed, 1 skipped`), Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3076 passed, 37 skipped`). - - Loop 5 regeneration/eval **IN PROGRESS**: regenerate `scenarios/iteration-test/scenario.yaml` from the Loop 4 follow-up commit, rerun quantitative eval and hard probes, then launch the next blind-review panel if the probes stay clean. + - Loop 5 regeneration/eval hard-probe follow-up fix pass completed and verified: regeneration from `508501c` passed quantitative eval at `95.70` JSON overall (`96/100` human-readable) across `83,915` records, and hard probes kept Zeek `trans_depth`, service-logon parentage, DB role drift, Windows eCAR TID shape, Zeek analyzer timing bounds, and SSH ordering clean. The same probes caught remaining curl/proxy host mismatches caused by profile-driven traffic being attributed to a still-running one-shot CLI process; fixed canonical connection generation to derive HTTP/proxy destination metadata from the owning process command when an attributed PID has a concrete HTTP URL. 
Verification passed with focused CLI/proxy tests, related activity/source tests (`340 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3077 passed, 37 skipped`). + - Loop 5 final regeneration/eval **IN PROGRESS**: regenerate `scenarios/iteration-test/scenario.yaml` from the Loop 5 follow-up commit, rerun quantitative eval and hard probes, then launch the next blind-review panel if the probes stay clean. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 5c063c2b..40f33a88 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -5161,6 +5161,47 @@ def generate_connection( if source_system is None and hasattr(self, "_ip_to_system"): source_system = self._ip_to_system.get(src_ip) + if ( + http is None + and pid > 0 + and source_system is not None + and proto == "tcp" + and (dst_port in {80, 443, 8080} or service is None or service in {"http", "ssl"}) + ): + proc = self.state_manager.get_process(source_system.hostname, pid) + if proc is not None: + command_http = _http_context_from_process_command( + proc.image, + proc.command_line, + response_body_len=resp_bytes or _get_rng().randint(500, 50000), + ) + if command_http is not None: + command_http_context, command_host, command_port, command_service = command_http + command_target = self._system_for_hostname(command_host) + host_lower = command_host.lower().rstrip(".") + ad_domain_for_command = ( + str( + getattr(self, "_ad_domain", "") or "", + ) + .lower() + .rstrip(".") + ) + command_is_unknown_internal = command_target is None and ( + host_lower.endswith(".local") + or ( + ad_domain_for_command + and host_lower.endswith(f".{ad_domain_for_command}") + ) + ) + if not command_is_unknown_internal: + http = command_http_context + hostname = command_host + dst_port = command_port + service = command_service + if command_target is not None: + dst_ip = command_target.ip + emit_dns = True + # Resolve hostname ONCE for DNS/proxy consistency. # All downstream uses (causal DNS expansion, proxy hostname) # share this single resolved value instead of doing independent lookups. 
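The hunk above gates the override on an attributed PID whose command line names a concrete HTTP URL, and it excludes internal-looking hostnames (`.local` or the AD domain) so unknown internal targets are not retargeted. A condensed sketch of the destination-selection rule, with simplified names and no state-manager lookup (not the shipped code):

from urllib.parse import urlparse

def destination_from_command(command_line: str, default_host: str, default_port: int) -> tuple[str, int]:
    """Prefer the host/port named in the owning process command line, if any."""
    for token in command_line.split():
        if token.startswith(("http://", "https://")):
            parsed = urlparse(token)
            if parsed.hostname:
                # Fall back to scheme defaults when the URL carries no explicit port.
                port = parsed.port or (443 if parsed.scheme == "https" else 80)
                return parsed.hostname, port
    return default_host, default_port

# destination_from_command("curl -s https://api.slack.com/methods/api.test", "13.107.246.52", 443)
# -> ("api.slack.com", 443), matching the proxy-context assertion in the test below.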
diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 9ea72c69..3200bb22 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -140,6 +140,76 @@ def test_network_effect_context_keeps_rendered_cli_http_command(self): assert process_name == "/usr/bin/curl" assert command_line == "curl -s https://api.slack.com/methods/api.test" + def test_generate_connection_uses_process_http_command_for_proxy_context(self, monkeypatch): + """Later network effects attributed to curl should keep the command URL.""" + state = StateManager() + generator = ActivityGenerator( + state, + {}, + dispatcher=EventDispatcher(state_manager=state, emitters={}), + ) + source = System( + hostname="APP-INT-01", + ip="10.10.2.30", + os="Ubuntu 24.04", + type="server", + ) + proxy = System( + hostname="PROXY-01", + ip="10.10.3.20", + os="Ubuntu 24.04", + type="server", + ) + generator._ip_to_system = {source.ip: source, proxy.ip: proxy} + generator._proxy_mode = "explicit" + generator._proxy_listener_port = 8080 + generator._proxy_routes = {source.ip: [proxy]} + generator._ad_domain = "meridianhcs.local" + + timestamp = datetime(2024, 3, 18, 12, 0, tzinfo=UTC) + state.set_current_time(timestamp) + pid = state.create_process( + system=source.hostname, + parent_pid=4, + image="/usr/bin/curl", + command_line="curl -s https://api.slack.com/methods/api.test", + username="sarah.martinez", + integrity_level="Medium", + logon_id="0x1234", + ) + + captured: list[dict[str, object]] = [] + original_build_proxy_context = generator._build_proxy_context + + def capture_proxy_context(**kwargs): + captured.append(kwargs) + return original_build_proxy_context(**kwargs) + + monkeypatch.setattr(generator, "_build_proxy_context", capture_proxy_context) + + generator.generate_connection( + src_ip=source.ip, + dst_ip="13.107.246.52", + time=timestamp + timedelta(seconds=1), + dst_port=443, + proto="tcp", + service="ssl", + duration=2.0, + orig_bytes=400, + resp_bytes=1200, + emit_dns=True, + pid=pid, + source_system=source, + ) + + assert captured + assert captured[0]["hostname"] == "api.slack.com" + assert captured[0]["dst_port"] == 443 + http = captured[0]["http"] + assert isinstance(http, HttpContext) + assert http.user_agent == "curl/7.88.1" + assert http.uri == "/methods/api.test" + class TestNetworkValidation: """Tests for network connection validation.""" From 380f38c31b7179f889cb507babe19aabdee0c14a Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 17:30:05 -0400 Subject: [PATCH 09/61] docs: record loop 5 blind review results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index e43ba71a..a3431fec 100644 --- a/TODO.md +++ b/TODO.md @@ -263,7 +263,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - Loop 4 blind review completed against `/private/tmp/eforge-assess-loop4-neutral-67ac00d/data`. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `82`, Network `82`, Host/EDR `78` (average `80.0`). Top next fix targets are canonical process-to-network HTTP client mismatch (`api.example.com/status` command lines paired with unrelated proxy/Zeek hosts and user agents), repeated Linux SSH/logind ordering and exact password-auth timing artifacts, Windows eCAR thread-ID morphology, and Zeek protocol timing/lifecycle bounds. 
- Loop 4 blind-review follow-up fix pass completed and verified: command-line HTTP clients now drive canonical HTTP/proxy/TLS destination metadata so curl/wget-style eCAR process rows do not point at unrelated proxy/Zeek hosts or user agents, including the post-regeneration hard-probe catch where stale effect context retargeted a rendered curl command to a wget-style proxy destination; Linux SSH compound syslog timing uses source-native subsecond connection/PAM/logind spacing instead of exact one-second triads; Windows eCAR process-owned TIDs align to Windows allocation morphology; Zeek certificate files/x509 analyzer rows are bounded inside the owning connection lifetime; sudo baseline noise uses auth/authpriv-style facility semantics. Verification passed with focused regressions, `uv run eforge validate-config`, related unit files (`339 passed, 1 skipped`), Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3076 passed, 37 skipped`). - Loop 5 regeneration/eval hard-probe follow-up fix pass completed and verified: regeneration from `508501c` passed quantitative eval at `95.70` JSON overall (`96/100` human-readable) across `83,915` records, and hard probes kept Zeek `trans_depth`, service-logon parentage, DB role drift, Windows eCAR TID shape, Zeek analyzer timing bounds, and SSH ordering clean. The same probes caught remaining curl/proxy host mismatches caused by profile-driven traffic being attributed to a still-running one-shot CLI process; fixed canonical connection generation to derive HTTP/proxy destination metadata from the owning process command when an attributed PID has a concrete HTTP URL. Verification passed with focused CLI/proxy tests, related activity/source tests (`340 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3077 passed, 37 skipped`). - - Loop 5 final regeneration/eval **IN PROGRESS**: regenerate `scenarios/iteration-test/scenario.yaml` from the Loop 5 follow-up commit, rerun quantitative eval and hard probes, then launch the next blind-review panel if the probes stay clean. + - Loop 5 final regeneration/eval completed from commit `811b2f1`: regenerated `scenarios/iteration-test/scenario.yaml`, quantitative eval passed at `95.26` JSON overall (`95/100` human-readable) across `93,567` records, and hard probes found zero CLI/proxy host mismatches, zero `api.example.com` remnants, zero HTTP `trans_depth > 1` rows, zero service-logon Explorer/medium-integrity/nonzero-terminal contradictions, zero DB-host web-admin command hits, zero expandable Linux quoted-glob commands, zero Windows eCAR TID morphology violations, zero Zeek file/x509 connection-bound violations, and zero SSH exact-one-second/order violations. + - Loop 5 blind review completed against `/private/tmp/eforge-assess-loop5-neutral-811b2f1-20260515T2109/data`. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `72`, Network `78`, Host/EDR `78` (average `75.0`). Top next fix targets are duplicate Linux `systemd-logind` session removals/SSH lifecycle state, web asset-before-document ordering plus cache behavior, Windows WinSxS component path/build identity, PsExec service-process lineage/privilege semantics, and lower-priority endpoint observation smoothness/command-pool variance. 
- [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From ac300949484b56bbf0f949866b41598c1600fae6 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 18:52:52 -0400 Subject: [PATCH 10/61] fix: improve loop 6 session and web realism --- TODO.md | 16 ++++ .../config/activity/system_processes.yaml | 4 +- .../config/activity/timing_profiles.yaml | 20 ++--- .../generation/activity/proxy_uri.py | 7 +- .../generation/activity/system_processes.py | 47 +++++++++-- .../generation/emitters/syslog.py | 28 +++---- .../generation/emitters/sysmon.py | 16 ++++ .../generation/engine/baseline.py | 18 +++- tests/unit/test_baseline_canonical.py | 83 +++++++++++++++++++ tests/unit/test_dispatcher.py | 48 +++++++++++ tests/unit/test_phase5_process_pools.py | 19 ++++- tests/unit/test_sysmon_new_events.py | 22 +++++ tests/unit/test_timing_profiles.py | 16 +++- tests/unit/test_ua_os_mismatch.py | 14 ++++ 14 files changed, 319 insertions(+), 39 deletions(-) diff --git a/TODO.md b/TODO.md index a3431fec..5a89047d 100644 --- a/TODO.md +++ b/TODO.md @@ -265,6 +265,22 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - Loop 5 regeneration/eval hard-probe follow-up fix pass completed and verified: regeneration from `508501c` passed quantitative eval at `95.70` JSON overall (`96/100` human-readable) across `83,915` records, and hard probes kept Zeek `trans_depth`, service-logon parentage, DB role drift, Windows eCAR TID shape, Zeek analyzer timing bounds, and SSH ordering clean. The same probes caught remaining curl/proxy host mismatches caused by profile-driven traffic being attributed to a still-running one-shot CLI process; fixed canonical connection generation to derive HTTP/proxy destination metadata from the owning process command when an attributed PID has a concrete HTTP URL. Verification passed with focused CLI/proxy tests, related activity/source tests (`340 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3077 passed, 37 skipped`). - Loop 5 final regeneration/eval completed from commit `811b2f1`: regenerated `scenarios/iteration-test/scenario.yaml`, quantitative eval passed at `95.26` JSON overall (`95/100` human-readable) across `93,567` records, and hard probes found zero CLI/proxy host mismatches, zero `api.example.com` remnants, zero HTTP `trans_depth > 1` rows, zero service-logon Explorer/medium-integrity/nonzero-terminal contradictions, zero DB-host web-admin command hits, zero expandable Linux quoted-glob commands, zero Windows eCAR TID morphology violations, zero Zeek file/x509 connection-bound violations, and zero SSH exact-one-second/order violations. 
- Loop 5 blind review completed against `/private/tmp/eforge-assess-loop5-neutral-811b2f1-20260515T2109/data`. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `72`, Network `78`, Host/EDR `78` (average `75.0`). Top next fix targets are duplicate Linux `systemd-logind` session removals/SSH lifecycle state, web asset-before-document ordering plus cache behavior, Windows WinSxS component path/build identity, PsExec service-process lineage/privilege semantics, and lower-priority endpoint observation smoothness/command-pool variance. + - Loop 6 fix pass completed and verified: duplicate pre-window `systemd-logind` + removals now receive unique rendered session IDs, web page-asset fanout timing + stays beyond source-observation delay windows, repeated fingerprinted web assets + are served from browser cache instead of repeatedly hitting the server, standalone + proxy static-asset requests no longer claim unseen same-origin page referrers, + and TiWorker WinSxS component paths/metadata resolve by host Windows build. + Verification passed with `uv run eforge validate-config`, focused + syslog/web/proxy/TiWorker regressions, `uv run ruff check .`, + `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -q` + (`3082 passed, 37 skipped`). + - [x] Loop 6 hard-probe follow-up: one residual rendered Zeek DMZ static asset + still appeared before its same-origin page after source-native conn/http offsets, + so widened the data-driven web asset fanout windows beyond the combined Zeek + observation offsets. Verification passed with focused timing/cache regressions, + `uv run eforge validate-config`, Ruff checks/format checks, and full normal + `uv run pytest --no-cov -q` (`3082 passed, 37 skipped`). - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
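The browser-cache change listed above is implemented in the baseline.py hunk further below; read in isolation, it amounts to a per-client seen-set keyed on (client IP, host, path). A condensed sketch under that reading, with module-level state standing in for the engine attribute:

# Only the first request for a fingerprinted static asset from a given client
# reaches the server; later repeats are treated as browser-cache hits and skipped.
_static_cache_seen: dict[tuple[str, str, str], int] = {}

def should_emit_static_request(
    client_ip: str, host: str, path: str, is_page_load: bool, is_stable: bool
) -> bool:
    if not is_stable or is_page_load:
        return True  # pages and non-fingerprinted paths always hit the server
    key = (client_ip, host, path)
    if key in _static_cache_seen:
        _static_cache_seen[key] += 1
        return False
    _static_cache_seen[key] = 1
    return True

Page loads and non-fingerprinted paths still reach the server, so only immutable, hash-named assets collapse to a single 200 per client.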
diff --git a/src/evidenceforge/config/activity/system_processes.yaml b/src/evidenceforge/config/activity/system_processes.yaml index f689508c..ac9d1d35 100644 --- a/src/evidenceforge/config/activity/system_processes.yaml +++ b/src/evidenceforge/config/activity/system_processes.yaml @@ -28,7 +28,7 @@ scheduled_tasks: - "usoclient.exe StartInteractiveScan" - "usoclient.exe ResumeUpdate" parent: svchost_wusvcs - - image: "C:\\Windows\\WinSxS\\amd64_microsoft-windows-servicingstack_31bf3856ad364e35_10.0.19041.3636_none_7c91d6e7c9f7f1f5\\TiWorker.exe" + - image: "C:\\Windows\\WinSxS\\amd64_microsoft-windows-servicingstack_31bf3856ad364e35_{servicing_stack_version}_none_7c91d6e7c9f7f1f5\\TiWorker.exe" command_templates: - "TiWorker.exe -Embedding" parent: svchost_netsvcs @@ -326,7 +326,7 @@ system_binaries: # --- Windows Update and telemetry --- - {exe: "usoclient.exe", path: "C:\\Windows\\System32\\usoclient.exe"} - - {exe: "TiWorker.exe", path: "C:\\Windows\\WinSxS\\amd64_microsoft-windows-servicingstack_31bf3856ad364e35_10.0.19041.3636_none_7c91d6e7c9f7f1f5\\TiWorker.exe"} + - {exe: "TiWorker.exe", path: "C:\\Windows\\WinSxS\\amd64_microsoft-windows-servicingstack_31bf3856ad364e35_{servicing_stack_version}_none_7c91d6e7c9f7f1f5\\TiWorker.exe"} - {exe: "cleanmgr.exe", path: "C:\\Windows\\System32\\cleanmgr.exe"} - {exe: "wsqmcons.exe", path: "C:\\Windows\\System32\\wsqmcons.exe"} - {exe: "CompatTelRunner.exe", path: "C:\\Windows\\System32\\CompatTelRunner.exe"} diff --git a/src/evidenceforge/config/activity/timing_profiles.yaml b/src/evidenceforge/config/activity/timing_profiles.yaml index 29f80824..82b25036 100644 --- a/src/evidenceforge/config/activity/timing_profiles.yaml +++ b/src/evidenceforge/config/activity/timing_profiles.yaml @@ -119,28 +119,28 @@ relationships: web.asset_stylesheet_script_after_page: class: burst_fanout position: after - min_ms: 50 - max_ms: 200 + min_ms: 1500 + max_ms: 2600 web.asset_image_after_page: class: burst_fanout position: after - min_ms: 200 - max_ms: 800 + min_ms: 1700 + max_ms: 3200 web.asset_font_after_page: class: burst_fanout position: after - min_ms: 300 - max_ms: 600 + min_ms: 1700 + max_ms: 3000 web.asset_api_after_page: class: burst_fanout position: after - min_ms: 500 - max_ms: 2000 + min_ms: 1800 + max_ms: 3600 web.asset_other_after_page: class: burst_fanout position: after - min_ms: 100 - max_ms: 500 + min_ms: 1500 + max_ms: 2800 web.tool_request_gap: class: burst_fanout position: after diff --git a/src/evidenceforge/generation/activity/proxy_uri.py b/src/evidenceforge/generation/activity/proxy_uri.py index 920772f5..258ef297 100644 --- a/src/evidenceforge/generation/activity/proxy_uri.py +++ b/src/evidenceforge/generation/activity/proxy_uri.py @@ -14,7 +14,10 @@ from evidenceforge.config import get_activity_directory from evidenceforge.config.overlay import deep_merge_dict, load_with_overlay -from evidenceforge.generation.activity.http_content import normalize_mime_type_for_path +from evidenceforge.generation.activity.http_content import ( + is_stable_resource_path, + normalize_mime_type_for_path, +) _TEMPLATES_PATH = get_activity_directory() / "proxy_uri_templates.yaml" _CACHED_DATA: dict[str, Any] | None = None @@ -166,5 +169,7 @@ def pick_proxy_uri( path = _substitute_vars(rng, path, data) content_type = normalize_mime_type_for_path(path, content_type) + if referrer_policy != "none" and is_stable_resource_path(path): + referrer_policy = "none" return path, content_type, method, user_agent, referrer_policy diff --git 
a/src/evidenceforge/generation/activity/system_processes.py b/src/evidenceforge/generation/activity/system_processes.py index e40e14f6..664cf956 100644 --- a/src/evidenceforge/generation/activity/system_processes.py +++ b/src/evidenceforge/generation/activity/system_processes.py @@ -63,7 +63,11 @@ def get_system_binary_exes() -> set[str]: return exes -def get_system_binary_path(exe_name: str, username: str | None = None) -> str | None: +def get_system_binary_path( + exe_name: str, + username: str | None = None, + host: Any | None = None, +) -> str | None: """Look up the full image path for a system binary by exe name. Case-insensitive lookup. Resolves ``{username}`` placeholders if @@ -91,6 +95,8 @@ def get_system_binary_path(exe_name: str, username: str | None = None) -> str | else: # No username context — return None to let caller fall back return None + if path: + path = _resolve_host_placeholders(path, host) return path @@ -106,7 +112,28 @@ def _resolve_template(template: str, rng: random.Random, entry_params: dict | No return result -def pick_scheduled_task(rng: random.Random) -> tuple[str, str, str]: +def _windows_servicing_stack_version(host: Any | None) -> str: + """Return a plausible servicing-stack component version for a Windows host.""" + os_name = str(getattr(host, "os", "") or "").lower() if host is not None else "" + system_type = str(getattr(host, "system_type", getattr(host, "type", "")) or "").lower() + if "windows 11" in os_name: + return "10.0.22621.3155" + if "server" in os_name or system_type in {"server", "domain_controller"}: + if "2019" in os_name: + return "10.0.17763.5329" + return "10.0.20348.2322" + return "10.0.19041.3636" + + +def _resolve_host_placeholders(value: str, host: Any | None = None) -> str: + """Resolve host-owned placeholders in system-process paths and commands.""" + return value.replace( + "{servicing_stack_version}", + _windows_servicing_stack_version(host), + ) + + +def pick_scheduled_task(rng: random.Random, host: Any | None = None) -> tuple[str, str, str]: """Pick a random scheduled task. Returns (image_path, command_line, parent_key). @@ -119,11 +146,17 @@ def pick_scheduled_task(rng: random.Random) -> tuple[str, str, str]: entry = rng.choice(tasks) cmd_template = rng.choice(entry["command_templates"]) cmd = _resolve_template(cmd_template, rng, entry.get("params")) - return entry["image"], cmd, entry.get("parent", "services") + return ( + _resolve_host_placeholders(entry["image"], host), + _resolve_host_placeholders(cmd, host), + entry.get("parent", "services"), + ) def pick_system_service_process( - rng: random.Random, host_type: str = "workstation" + rng: random.Random, + host_type: str = "workstation", + host: Any | None = None, ) -> tuple[str, str, str]: """Pick a random system service process appropriate for the host role. 
@@ -151,4 +184,8 @@ def pick_system_service_process( entry = rng.choice(pool) cmd_template = rng.choice(entry["command_templates"]) cmd = _resolve_template(cmd_template, rng, entry.get("params")) - return entry["image"], cmd, entry.get("parent", "services") + return ( + _resolve_host_placeholders(entry["image"], host), + _resolve_host_placeholders(cmd, host), + entry.get("parent", "services"), + ) diff --git a/src/evidenceforge/generation/emitters/syslog.py b/src/evidenceforge/generation/emitters/syslog.py index 5e15125c..20530f58 100644 --- a/src/evidenceforge/generation/emitters/syslog.py +++ b/src/evidenceforge/generation/emitters/syslog.py @@ -277,7 +277,6 @@ def _normalize_logind_session_ids_for_lines(lines: list[str], host_key: str) -> next_by_pid = {pid: max(2, start) - 1 for pid, start in first_by_pid.items()} prewindow_next_by_pid = {pid: max(2, start) - 1 for pid, start in first_by_pid.items()} rewritten_by_original: dict[tuple[str, str], int] = {} - prewindow_removed_by_original: dict[tuple[str, str], int] = {} normalized: list[str] = [] for index, line in enumerate(lines): new_match = _LOGIND_NEW_SESSION_RE.search(line) @@ -304,22 +303,17 @@ def _normalize_logind_session_ids_for_lines(lines: list[str], host_key: str) -> rewritten = rewritten_by_original.get(key) if rewritten is None: pid = removed_match.group("pid") - session = int(removed_match.group("session")) - first_visible = max(2, first_by_pid.get(pid, session + 1)) - if session >= first_visible: - rewritten = prewindow_removed_by_original.get(key) - if rewritten is None: - step_seed = _stable_seed( - "syslog_logind_prewindow_session_step:" - f"{host_key}:{pid}:{removed_match.group('session')}:{index}" - ) - prewindow_next_by_pid[pid] = ( - prewindow_next_by_pid.get(pid, first_visible - 1) - - 1 - - (step_seed % 3) - ) - rewritten = prewindow_next_by_pid[pid] - prewindow_removed_by_original[key] = rewritten + first_visible = max( + 2, first_by_pid.get(pid, int(removed_match.group("session")) + 1) + ) + step_seed = _stable_seed( + "syslog_logind_prewindow_session_step:" + f"{host_key}:{pid}:{removed_match.group('session')}:{index}" + ) + prewindow_next_by_pid[pid] = ( + prewindow_next_by_pid.get(pid, first_visible - 1) - 1 - (step_seed % 3) + ) + rewritten = prewindow_next_by_pid[pid] if rewritten is not None: line = ( f"{line[: removed_match.start('session')]}" diff --git a/src/evidenceforge/generation/emitters/sysmon.py b/src/evidenceforge/generation/emitters/sysmon.py index daf57720..d406522a 100644 --- a/src/evidenceforge/generation/emitters/sysmon.py +++ b/src/evidenceforge/generation/emitters/sysmon.py @@ -504,6 +504,9 @@ def _normalize_os_binary_metadata( return metadata if not cls._is_windows_os_binary_path(image_path): return metadata + component_version = cls._servicing_stack_version_from_path(image_path) + if component_version and orig.lower() == "tiworker.exe": + return component_version, desc, prod, company, orig return cls._host_windows_file_version(host), desc, prod, company, orig @staticmethod @@ -515,6 +518,19 @@ def _is_windows_os_binary_path(image_path: str) -> bool: or image_lower.startswith("c:\\windows\\") ) + @staticmethod + def _servicing_stack_version_from_path(image_path: str) -> str: + image_lower = image_path.replace("/", "\\").lower() + marker = "microsoft-windows-servicingstack_31bf3856ad364e35_" + if marker not in image_lower: + return "" + tail = image_lower.split(marker, 1)[1] + version = tail.split("_", 1)[0] + parts = version.split(".") + if len(parts) == 4 and all(part.isdigit() for 
part in parts): + return version + return "" + @staticmethod def _host_windows_file_version(host: Any) -> str: os_name = str(getattr(host, "os", "") or "").lower() diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index 9634ec6d..cfc5108b 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -4443,7 +4443,11 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 svc_offset = rng.uniform(0, 3599) svc_ts = current_hour + timedelta(seconds=svc_offset) self.state_manager.set_current_time(svc_ts) - svc_image, svc_cmd, svc_parent_key = _pick_svc(rng, sys_type_str) + svc_image, svc_cmd, svc_parent_key = _pick_svc( + rng, + sys_type_str, + system, + ) svc_parent = sys_pids.get( svc_parent_key, sys_pids.get("services", sys_pids.get("wininit", 4)) ) @@ -4627,7 +4631,7 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 ): ts = current_hour + timedelta(seconds=offset) self.state_manager.set_current_time(ts) - task_image, task_cmd, task_parent_key = pick_scheduled_task(rng) + task_image, task_cmd, task_parent_key = pick_scheduled_task(rng, system) parent_pid = sys_pids.get( task_parent_key, sys_pids.get("services", sys_pids.get("wininit", 4)) ) @@ -5904,6 +5908,7 @@ def _effective_dst_ip(is_external_client: bool) -> str: def _status_message(status: int) -> str: return { 200: "OK", + 304: "Not Modified", 403: "Forbidden", 404: "Not Found", 405: "Method Not Allowed", @@ -5985,6 +5990,15 @@ def _tool_gap_ms() -> int: if req.hostname != http_host: continue req_ts = base_ts + timedelta(milliseconds=req.time_offset_ms) + if is_stable_resource_path(req.path) and not req.is_page_load: + cache_seen = getattr(self, "_web_static_cache_seen", None) + if not isinstance(cache_seen, dict): + cache_seen = self._web_static_cache_seen = {} + cache_key = (client_ip, http_host, req.path) + if cache_key in cache_seen: + cache_seen[cache_key] += 1 + continue + cache_seen[cache_key] = 1 self.activity_generator.generate_connection( src_ip=client_ip, dst_ip=effective_dst_ip, diff --git a/tests/unit/test_baseline_canonical.py b/tests/unit/test_baseline_canonical.py index 2aa54819..4d78c624 100644 --- a/tests/unit/test_baseline_canonical.py +++ b/tests/unit/test_baseline_canonical.py @@ -1285,6 +1285,89 @@ def test_web_server_access_uses_browsing_session_shape(self, monkeypatch): kw["http"].uri.endswith(".css") or kw["http"].uri.endswith(".js") for kw in collected ) + def test_web_server_access_uses_browser_cache_for_repeated_static_assets(self, monkeypatch): + """Repeated browser assets from one client should not all hit the server.""" + from random import Random + from types import SimpleNamespace + from unittest.mock import MagicMock + + from evidenceforge.generation.activity import browsing_session, web_session_profiles + from evidenceforge.generation.activity.browsing_session import BrowsingRequest + from evidenceforge.generation.engine.baseline import BaselineMixin + + monkeypatch.setattr( + web_session_profiles, + "pick_web_visitor_profile", + lambda rng, *, is_external: ( + "human_browser", + { + "kind": "session", + "browsing_intensity": "normal", + "user_agent_pool": "browser_any", + }, + ), + ) + monkeypatch.setattr( + browsing_session, + "generate_browsing_session", + lambda **kwargs: [ + BrowsingRequest( + time_offset_ms=0, + hostname=kwargs["hostname"], + path="/", + method="GET", + content_type="text/html", + referrer="", + trans_depth=1, + 
is_page_load=True, + response_body_len=4096, + request_body_len=0, + ), + BrowsingRequest( + time_offset_ms=900, + hostname=kwargs["hostname"], + path="/assets/js/app.bundle.1234abcd.js", + method="GET", + content_type="application/javascript", + referrer=f"https://{kwargs['hostname']}/", + trans_depth=2, + is_page_load=False, + response_body_len=180_000, + request_body_len=0, + ), + ], + ) + + collected = [] + activity_gen = MagicMock() + activity_gen._ip_to_system = {} + activity_gen.generate_connection.side_effect = lambda **kw: collected.append(kw) + engine = MagicMock() + engine.activity_generator = activity_gen + engine._resolve_traffic_rate.return_value = (2, 2) + engine._get_segment_for_system.return_value = SimpleNamespace( + exposure="external", + external_ratio=None, + ) + engine._generate_external_client_ip.return_value = "8.8.4.20" + sys_obj = self._make_web_system("external", public_hostnames=["portal.example.com"]) + + BaselineMixin._emit_web_server_access( + engine, + sys_obj, + [sys_obj], + Random(4), + datetime(2024, 3, 15, 10, 0, 0, tzinfo=UTC), + ) + + page_rows = [kw for kw in collected if kw["http"].uri == "/"] + asset_rows = [ + kw for kw in collected if kw["http"].uri == "/assets/js/app.bundle.1234abcd.js" + ] + assert len(page_rows) == 2 + assert len(asset_rows) == 1 + assert asset_rows[0]["http"].status_code == 200 + def test_web_server_access_keeps_scanner_requests_source_native(self, monkeypatch): """Scanner visitors should keep configured error paths and blank referrers.""" from random import Random diff --git a/tests/unit/test_dispatcher.py b/tests/unit/test_dispatcher.py index 82ad9a5e..8c3c159d 100644 --- a/tests/unit/test_dispatcher.py +++ b/tests/unit/test_dispatcher.py @@ -800,6 +800,54 @@ def test_syslog_rewrites_prewindow_logind_removals_below_visible_news(self, tmp_ assert first_removed < new_session assert later_removed == new_session + def test_syslog_rewrites_duplicate_prewindow_logind_removals_uniquely(self, tmp_path): + """Removed-only logind rows should not collapse to one duplicate session ID.""" + from datetime import UTC, datetime + + from evidenceforge.formats import load_format + from evidenceforge.generation.emitters.syslog import SyslogEmitter + + format_def = load_format("syslog") + output_path = tmp_path / "syslog.log" + emitter = SyslogEmitter(format_def, output_path, buffer_size=10) + for timestamp, message in [ + ( + datetime(2024, 3, 18, 12, 1, 55, tzinfo=UTC), + "Removed session 12940.", + ), + ( + datetime(2024, 3, 18, 12, 9, 11, tzinfo=UTC), + "Removed session 12940.", + ), + ( + datetime(2024, 3, 18, 12, 18, 13, tzinfo=UTC), + "New session 12945 of user root.", + ), + ]: + emitter.emit_raw( + { + "timestamp": timestamp, + "hostname": "linux01", + "app_name": "systemd-logind", + "pid": 24094, + "facility": 10, + "severity": 6, + "message": message, + } + ) + emitter.close() + + lines = output_path.read_text(encoding="utf-8").splitlines() + removed_sessions = [ + int(line.split("Removed session ", 1)[1].rstrip(".")) + for line in lines + if "Removed session" in line + ] + new_session = int(lines[-1].split("New session ", 1)[1].split(" ", 1)[0]) + + assert len(removed_sessions) == len(set(removed_sessions)) + assert all(session < new_session for session in removed_sessions) + def test_syslog_sorts_same_second_ssh_lifecycle(self, tmp_path): """Same-second SSH syslog groups should keep lifecycle order.""" from datetime import UTC, datetime diff --git a/tests/unit/test_phase5_process_pools.py b/tests/unit/test_phase5_process_pools.py 
index 7926ae47..b89a3f47 100644 --- a/tests/unit/test_phase5_process_pools.py +++ b/tests/unit/test_phase5_process_pools.py @@ -23,6 +23,7 @@ """Unit tests for Phase 5.1.4: Expanded process template pools.""" from datetime import UTC, datetime +from types import SimpleNamespace from unittest.mock import Mock from evidenceforge.generation.activity import ( @@ -32,7 +33,10 @@ PROCESS_TEMPLATES_LINUX, ActivityGenerator, ) -from evidenceforge.generation.activity.system_processes import load_system_processes +from evidenceforge.generation.activity.system_processes import ( + _resolve_host_placeholders, + load_system_processes, +) from evidenceforge.generation.state_manager import StateManager from evidenceforge.models import System, User @@ -114,6 +118,19 @@ def test_system_process_templates_avoid_windows_internal_path_artifacts(self): assert all("S-1-5-21 1" not in arg for arg in params) assert all("UsGthrCtrlFltPipeMssGthrPipe" in arg for arg in params) + def test_tiworker_servicing_stack_placeholder_resolves_by_host_build(self): + """TiWorker WinSxS component paths should follow the host OS family.""" + template = ( + r"C:\Windows\WinSxS\amd64_microsoft-windows-servicingstack_31bf3856ad364e35_" + r"{servicing_stack_version}_none_7c91d6e7c9f7f1f5\TiWorker.exe" + ) + + workstation = SimpleNamespace(os="Windows 10 Enterprise", type="workstation") + server = SimpleNamespace(os="Windows Server 2022", type="server") + + assert "10.0.19041.3636" in _resolve_host_placeholders(template, workstation) + assert "10.0.20348.2322" in _resolve_host_placeholders(template, server) + class TestBaselinePatterns: """Verify baseline patterns include new activity types.""" diff --git a/tests/unit/test_sysmon_new_events.py b/tests/unit/test_sysmon_new_events.py index 15bcf3fd..6dbb7aa9 100644 --- a/tests/unit/test_sysmon_new_events.py +++ b/tests/unit/test_sysmon_new_events.py @@ -855,6 +855,28 @@ def test_hashes_follow_rendered_binary_identity(self): image, workstation ) == SysmonEventEmitter._generate_hashes(image, server) + def test_tiworker_metadata_uses_servicing_stack_component_version(self): + """WinSxS TiWorker metadata should match the rendered component path.""" + server = HostContext( + hostname="SRV-01", + ip="10.0.1.20", + os="Windows Server 2022", + os_category="windows", + system_type="server", + domain="corp.local", + fqdn="SRV-01.corp.local", + netbios_domain="CORP", + ) + image = ( + r"C:\Windows\WinSxS\amd64_microsoft-windows-servicingstack_31bf3856ad364e35_" + r"10.0.20348.2322_none_7c91d6e7c9f7f1f5\TiWorker.exe" + ) + + metadata = SysmonEventEmitter._get_pe_metadata(image, server) + + assert metadata[0] == "10.0.20348.2322" + assert metadata[4] == "TiWorker.exe" + def test_image_load_hashes_follow_rendered_file_identity(self): """Same DLL path with different rendered PE metadata must not share hashes.""" image = r"C:\Program Files\Mozilla Firefox\lgpllibs.dll" diff --git a/tests/unit/test_timing_profiles.py b/tests/unit/test_timing_profiles.py index f36ffbce..067ef6f6 100644 --- a/tests/unit/test_timing_profiles.py +++ b/tests/unit/test_timing_profiles.py @@ -71,7 +71,21 @@ def test_timing_profiles_load_default_relationship(): assert navigation_window.relationship_class == "human_workflow" assert navigation_window.min_ms >= 3000 assert asset_window.relationship_class == "burst_fanout" - assert asset_window.max_ms <= 200 + assert asset_window.min_ms >= 1500 + + zeek_conn_window = get_timing_window( + "source.zeek_conn_start", + default_min_ms=0, + default_max_ms=0, + default_position="after", + 
) + zeek_http_window = get_timing_window( + "source.zeek_http_request", + default_min_ms=0, + default_max_ms=0, + default_position="after", + ) + assert asset_window.min_ms > zeek_conn_window.max_ms + zeek_http_window.max_ms sensor_timing = network_sensor_observation_timing() assert sensor_timing.clock_skew_min_us == -1500 diff --git a/tests/unit/test_ua_os_mismatch.py b/tests/unit/test_ua_os_mismatch.py index 152bf53e..568a224f 100644 --- a/tests/unit/test_ua_os_mismatch.py +++ b/tests/unit/test_ua_os_mismatch.py @@ -135,6 +135,20 @@ def test_certificate_infra_templates_are_not_browser_like(self): assert content_type in allowed_types assert referrer_policy == "none" + def test_standalone_static_proxy_paths_do_not_claim_same_origin_referrers(self): + """Single proxy asset requests should not imply an unseen page load.""" + from evidenceforge.generation.activity.proxy_uri import pick_proxy_uri + + path, _content_type, _method, _ua_override, referrer_policy = pick_proxy_uri( + random.Random(0), + "example.org", + ["web"], + source_os="windows", + ) + + assert path == "/favicon.ico" + assert referrer_policy == "none" + def test_non_browser_proxy_domains_are_not_browser_session_targets(self): """Proxy domain_class controls whether a host can use browser-style site maps.""" from evidenceforge.generation.activity.proxy_uri import is_browser_like_proxy_domain From 5702bbf62e1cc93a7286e217923a32c85ae634b1 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 19:16:59 -0400 Subject: [PATCH 11/61] docs: record loop 6 blind review results --- TODO.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/TODO.md b/TODO.md index 5a89047d..4dc3dd06 100644 --- a/TODO.md +++ b/TODO.md @@ -281,6 +281,16 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r observation offsets. Verification passed with focused timing/cache regressions, `uv run eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3082 passed, 37 skipped`). + - Loop 6 final regeneration and assessment completed from commit `ac30094`: + regenerated `scenarios/iteration-test/scenario.yaml`, quantitative eval passed at + `96/100` across `76,801` records, and hard probes found zero duplicate logind + removals, zero page-asset ordering inversions, zero repeated full static 200s, + and zero TiWorker component/version mismatches. Blind synthetic-confidence scores + were Threat Hunter `74`, Detection `74`, Network `72`, Host/EDR `82` + (average `75.5`). Top Loop 7 targets are PsExec/service-control process lineage + and token semantics, one-shot CLI/eCAR process lifetimes, proxy cache behavior + for immutable third-party static assets, stale Let's Encrypt OCSP responder + mapping, and repeated Linux daemon/syslog texture. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. 
Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 3f66d069fc3e81d68ae0adc987a7f69c942a6fd2 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 19:25:44 -0400 Subject: [PATCH 12/61] fix: improve loop 7 service and cli realism --- TODO.md | 13 ++ .../generation/activity/generator.py | 156 +++++++++++++++--- .../generation/engine/baseline.py | 6 + .../generation/engine/storyline.py | 141 ++++++++++++++-- tests/unit/test_explicit_proxy.py | 99 +++++++++++ tests/unit/test_process_lifetimes.py | 82 ++++++++- tests/unit/test_storyline_command_networks.py | 57 +++++++ 7 files changed, 514 insertions(+), 40 deletions(-) diff --git a/TODO.md b/TODO.md index 4dc3dd06..04f52307 100644 --- a/TODO.md +++ b/TODO.md @@ -291,6 +291,19 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r and token semantics, one-shot CLI/eCAR process lifetimes, proxy cache behavior for immutable third-party static assets, stale Let's Encrypt OCSP responder mapping, and repeated Linux daemon/syslog texture. + - [x] Loop 7 fix pass: routed recent PsExec/HealthMonitor storyline follow-on + command utilities through the installed service wrapper with SYSTEM/0x3e7 + identity, normalized `%SystemRoot%` service images for wrapper reuse, and + shortened one-shot curl/wget/cmd `/c`/PowerShell command lifetimes while + leaving interactive shells unbounded. Hard-probe follow-ups also reject expired + Linux one-shot process owners for later network/proxy attribution and start + explicit-proxy one-shot clients near the request time, then terminate bounded + foreground process owners after observed network activity; this prevents stale + `curl` PIDs from stretching across unrelated requests or remaining open. + Verified with focused regressions, related activity/storyline/spawn tests + (`213 passed`), config validation, Ruff, format check, and full normal + `uv run pytest --no-cov -q` (`3092 passed, 37 skipped`). Regeneration and + blind review follow. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
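The bounded-lifetime attribution rule above ties several of these bullets together: one-shot clients receive short lifetimes, and later network effects refuse to attach to a PID whose lifetime (plus a small grace period) has already elapsed. A minimal sketch of that predicate, with an abbreviated lifetime table (the full lifetime tables and the 5-second grace period appear in the generator.py diff below):

from datetime import datetime, timedelta

# Abbreviated stand-in for the per-executable lifetime tables; values in seconds.
ONE_SHOT_MAX_SECONDS = {"curl": 12.0, "wget": 12.0}

def attribution_expired(image: str, start_time: datetime, effect_time: datetime) -> bool:
    """Return whether a bounded one-shot process is too old to own a new network effect."""
    exe = image.rsplit("/", 1)[-1].rsplit("\\", 1)[-1].lower()
    max_seconds = ONE_SHOT_MAX_SECONDS.get(exe)
    if max_seconds is None:
        return False  # unbounded processes (e.g. interactive shells) stay attributable
    return effect_time > start_time + timedelta(seconds=max_seconds + 5.0)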
diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 40f33a88..93efd14a 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -628,7 +628,9 @@ def _linux_foreground_lifetime(process_name: str, command_line: str) -> tuple[fl return (0.2, 2.0) if exe_name in {"grep", "head", "tail", "wc", "env", "printenv", "ss", "ip", "ps"}: return (0.5, 4.0) - if exe_name in {"gzip", "tar", "zip", "scp", "curl", "wget", "kubectl", "docker"}: + if exe_name in {"curl", "wget"}: + return (0.8, 12.0) + if exe_name in {"gzip", "tar", "zip", "scp", "kubectl", "docker"}: return (3.0, 18.0) if exe_name in {"make", "gcc", "cargo", "npm", "python", "python3", "mysqldump"}: return (8.0, 45.0) @@ -637,6 +639,15 @@ def _linux_foreground_lifetime(process_name: str, command_line: str) -> tuple[fl return (1.0, 8.0) +_LINUX_ONE_SHOT_NETWORK_EXES: set[str] = { + "curl", + "wget", + "scp", + "kubectl", + "mysqldump", +} + + def _windows_foreground_lifetime( process_name: str, command_line: str ) -> tuple[float, float] | None: @@ -657,6 +668,8 @@ def _windows_foreground_lifetime( ) ): return None + if exe_name in {"curl.exe", "curl", "wget.exe", "wget"}: + return (0.8, 12.0) if exe_name in { "whoami.exe", "hostname.exe", @@ -682,7 +695,25 @@ def _windows_foreground_lifetime( "wevtutil.exe", }: return (0.4, 6.0) - if exe_name in {"powershell.exe", "pwsh.exe", "cmd.exe", "wmic.exe", "certutil.exe"}: + padded_command = f" {command} " + if exe_name == "cmd.exe": + if " /c " in padded_command: + return (0.4, 8.0) + return None + if exe_name in {"powershell.exe", "pwsh.exe"}: + one_shot_markers = ( + " -command ", + " -encodedcommand ", + " -enc ", + " -file ", + " invoke-webrequest", + " iwr ", + " downloadstring", + ) + if any(marker in padded_command for marker in one_shot_markers): + return (2.0, 25.0) + return None + if exe_name in {"wmic.exe", "certutil.exe"}: return (4.0, 35.0) if exe_name == "sqlcmd.exe" and " -q " in f" {command} ": return (2.0, 25.0) @@ -1774,6 +1805,7 @@ def __init__( self._recent_connection_tuples: dict[tuple[str, int, str, int, str], float] = {} self._recent_icmp_observations: set[tuple[str, int, str, int, int]] = set() self._ssh_source_ports: set[tuple[str, str, int]] = set() + self._terminated_process_keys: set[tuple[str, int]] = set() self._dns_cache: dict[tuple[str, str, str], float] = {} self._dns_cache_last_prune = 0.0 self._tls_seen_server_names: set[str] = set() @@ -2356,6 +2388,7 @@ def _ensure_explicit_proxy_client_process( and proc.image.lower() == image_lower and proc.start_time is not None and proc.start_time <= time + and not self._foreground_process_expired_for_attribution(source_system, proc, time) ] if running_candidates: proc = max(running_candidates, key=lambda candidate: candidate.start_time) @@ -2372,7 +2405,11 @@ def _ensure_explicit_proxy_client_process( f"{source_system.hostname}:{user.username}:{image}:{proxy_context.host}" ) ) - lead_seconds = process_rng.uniform(12.0, 240.0) + process_lifetime = _windows_foreground_lifetime(image, command_line) + if process_lifetime is not None: + lead_seconds = process_rng.uniform(0.4, min(8.0, process_lifetime[1])) + else: + lead_seconds = process_rng.uniform(12.0, 240.0) process_time = time - timedelta(seconds=lead_seconds) min_process_time = session.start_time + timedelta(milliseconds=500) if process_time < min_process_time: @@ -2408,6 +2445,7 @@ def _caller_explicit_proxy_process_image( 
source_system: System | None, pid: int, process_image: str | None, + time: datetime, proxy_context: ProxyContext, proxy_sys: System, dst_port: int, @@ -2417,6 +2455,12 @@ def _caller_explicit_proxy_process_image( return None running = self.state_manager.get_process(source_system.hostname, pid) + if running is not None and self._foreground_process_expired_for_attribution( + source_system, + running, + time=time, + ): + return None candidate_image = running.image if running is not None else process_image if not candidate_image: return None @@ -4274,6 +4318,37 @@ def _process_effect_context( return process_name, command_line return proc.image, proc.command_line + def _foreground_process_expired_for_attribution( + self, + system: System, + proc: Any, + time: datetime, + ) -> bool: + """Return whether a bounded foreground process is too old for new effects.""" + if proc is None or proc.start_time is None: + return False + lifetime = self._foreground_process_lifetime_for_attribution(system, proc) + if lifetime is None: + return False + max_process_time = proc.start_time + timedelta(seconds=lifetime[1] + 5.0) + return time > max_process_time + + def _foreground_process_lifetime_for_attribution( + self, + system: System, + proc: Any, + ) -> tuple[float, float] | None: + """Return bounded foreground lifetime for process-owned network attribution.""" + os_category = _get_os_category(system.os) + if os_category == "windows": + return _windows_foreground_lifetime(proc.image, proc.command_line) + if os_category == "linux": + exe_name = proc.image.rsplit("/", 1)[-1].lower() + if exe_name not in _LINUX_ONE_SHOT_NETWORK_EXES: + return None + return _linux_foreground_lifetime(proc.image, proc.command_line) + return None + def _space_browser_launch( self, *, @@ -5011,6 +5086,10 @@ def generate_process_termination( """ from evidenceforge.events.contexts import ProcessContext + termination_key = (system.hostname, pid) + if termination_key in self._terminated_process_keys: + return + running_proc = self.state_manager.get_process(system.hostname, pid) if ( running_proc is not None @@ -5080,6 +5159,7 @@ def generate_process_termination( ) self.dispatcher.dispatch(event) + self._terminated_process_keys.add(termination_key) logger.debug( f"Generated process termination: {process_name} (PID: {pid}) on {system.hostname}" @@ -5544,6 +5624,7 @@ def generate_connection( source_system=source_system, pid=pid, process_image=process_image, + time=time, proxy_context=proxy_context, proxy_sys=proxy_sys, dst_port=dst_port, @@ -5784,28 +5865,23 @@ def generate_connection( if ( resolved_process and resolved_process.start_time - and _get_os_category(resolved_source_system.os) == "windows" + and self._foreground_process_expired_for_attribution( + resolved_source_system, + resolved_process, + time, + ) ): - process_lifetime = _windows_foreground_lifetime( + logger.debug( + "Dropping expired foreground process attribution: " + "host=%s pid=%s image=%s dst=%s:%s", + resolved_source_system.hostname, + pid, resolved_process.image, - resolved_process.command_line, + dst_ip, + dst_port, ) - if process_lifetime is not None: - max_process_time = resolved_process.start_time + timedelta( - seconds=process_lifetime[1] + 5.0 - ) - if time > max_process_time: - logger.debug( - "Dropping expired foreground process attribution: " - "host=%s pid=%s image=%s dst=%s:%s", - resolved_source_system.hostname, - pid, - resolved_process.image, - dst_ip, - dst_port, - ) - pid = -1 - resolved_process = None + pid = -1 + resolved_process = None elif 
resolved_process is None and pid != 4: logger.debug( "Dropping stale connection PID attribution: host=%s pid=%s dst=%s:%s", @@ -6944,6 +7020,42 @@ def generate_connection( application=wfp_application, ) + if ( + pid > 0 + and resolved_source_system is not None + and process_ctx is not None + and (resolved_source_system.hostname, pid) not in self._terminated_process_keys + ): + running = self.state_manager.get_process(resolved_source_system.hostname, pid) + lifetime = ( + self._foreground_process_lifetime_for_attribution(resolved_source_system, running) + if running is not None + else None + ) + if lifetime is not None and re.match(r"^[a-zA-Z0-9._$-]+$", running.username): + known_users = getattr(self, "_users_by_username", {}) + process_user = known_users.get(running.username) or User( + username=running.username, + full_name=running.username, + email=f"{running.username}@example.local", + ) + term_rng = random.Random( + _stable_seed( + "connection_owned_foreground_termination:" + f"{resolved_source_system.hostname}:{pid}:{time.isoformat()}" + ) + ) + min_delay = min(max(lifetime[0], 0.5), 4.0) + max_delay = max(min_delay + 0.5, min(lifetime[1] + 8.0, 45.0)) + self.generate_process_termination( + user=process_user, + system=resolved_source_system, + time=time + timedelta(seconds=term_rng.uniform(min_delay, max_delay)), + pid=pid, + process_name=running.image, + logon_id=running.logon_id, + ) + return uid def generate_ssh_session( diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index cfc5108b..f2274cf9 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -2200,6 +2200,12 @@ def _terminate_stale_processes(self, current_hour: datetime) -> None: "tasklist.exe", "sc.exe", "wevtutil.exe", + "curl", + "wget", + "scp", + "kubectl", + "mysqldump", + "sqlcmd", ) # Collect all seeded system PIDs for this system as a safety net diff --git a/src/evidenceforge/generation/engine/storyline.py b/src/evidenceforge/generation/engine/storyline.py index 9dfeaea5..b347a3b3 100644 --- a/src/evidenceforge/generation/engine/storyline.py +++ b/src/evidenceforge/generation/engine/storyline.py @@ -727,6 +727,97 @@ def _record_storyline_service_install( "installed_at": time, } + @staticmethod + def _normalize_storyline_service_file_name(service_file_name: str) -> str: + """Return a Windows service image path in source-native expanded form.""" + image = service_file_name.strip().strip('"') + replacements = { + "%SystemRoot%": r"C:\Windows", + "%systemroot%": r"C:\Windows", + r"\SystemRoot": r"C:\Windows", + } + for marker, replacement in replacements.items(): + if image.startswith(marker): + image = replacement + image[len(marker) :] + break + return image.replace("/", "\\") + + @staticmethod + def _service_account_user(service_account: str) -> User | None: + """Return a User model for service identities that can own process telemetry.""" + normalized = service_account.strip().replace("/", "\\") + account_key = normalized.upper() + if account_key in {"LOCALSYSTEM", "LOCAL SYSTEM", "NT AUTHORITY\\SYSTEM", "SYSTEM"}: + return User( + username="SYSTEM", + full_name="Local System", + email="system@example.local", + ) + return None + + def _storyline_service_context_for_process( + self, + actor: User, + system: System, + time: datetime, + process_name: str, + ) -> tuple[User, str, int] | None: + """Return service identity/logon/parent PID for recent service-backed commands.""" + if 
_get_os_category(system.os) != "windows": + return None + services = getattr(self, "_last_storyline_service_by_system", {}) + service = services.get(system.hostname) + if not service: + return None + + installed_at = service.get("installed_at") + if isinstance(installed_at, datetime): + if time < installed_at or time - installed_at > timedelta(minutes=30): + return None + + service_file_name = str(service.get("service_file_name") or "") + if not service_file_name: + return None + service_image = self._normalize_storyline_service_file_name(service_file_name) + service_exe = service_image.rsplit("\\", 1)[-1].lower() + if service_exe not in {"psexesvc.exe", "healthmonitorsvc.exe"}: + return None + + process_exe = process_name.rsplit("\\", 1)[-1].rsplit("/", 1)[-1].lower() + if process_exe == service_exe: + return None + service_child_exes = { + "cmd.exe", + "powershell.exe", + "pwsh.exe", + "net.exe", + "net1.exe", + "whoami.exe", + "hostname.exe", + "ipconfig.exe", + "nltest.exe", + "klist.exe", + "sc.exe", + "wevtutil.exe", + "wmic.exe", + "certutil.exe", + } + if process_exe not in service_child_exes: + return None + + service_user = self._service_account_user(str(service.get("service_account") or "")) + if service_user is None: + return None + + service_pid, _service_image = self._ensure_storyline_service_process_for_beacon( + actor=service_user, + system=system, + time=time, + ) + if service_pid <= 0: + return None + return service_user, "0x3e7", service_pid + @staticmethod def _scheduled_task_lookup_key(system: System, task_name: str) -> tuple[str, str]: """Return a normalized host/task key for correlating schtasks with 4698.""" @@ -775,6 +866,7 @@ def _ensure_storyline_service_process_for_beacon( service_file_name = str(service.get("service_file_name") or "") if not service_file_name: return -1, None + service_file_name = self._normalize_storyline_service_file_name(service_file_name) image_lower = service_file_name.lower() running = [ @@ -1329,19 +1421,30 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: process_command_line = inferred_command_line output_file = self._extract_output_file(command_line, os_category) - parent_pid = self.activity_generator._resolve_parent( - system, actor, time, logon_id, process_name + process_actor = actor + process_logon_id = logon_id + service_context = self._storyline_service_context_for_process( + actor=actor, + system=system, + time=time, + process_name=process_name, ) + if service_context is not None: + process_actor, process_logon_id, parent_pid = service_context + else: + parent_pid = self.activity_generator._resolve_parent( + system, actor, time, logon_id, process_name + ) exe_name = process_name.rsplit("\\", 1)[-1].rsplit("/", 1)[-1].lower() service_backed_process = "service_installed" in explicit_types and exe_name in { "psexesvc.exe", "healthmonitorsvc.exe", } pid = self.activity_generator.generate_process( - user=actor, + user=process_actor, system=system, time=time, - logon_id=logon_id, + logon_id=process_logon_id, process_name=process_name, command_line=process_command_line, parent_pid=parent_pid, @@ -1349,7 +1452,7 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: from_storyline=True, suppress_command_file_effect=output_file is not None, ) - self.activity_generator._record_user_process(system, actor, pid, process_name) + self.activity_generator._record_user_process(system, process_actor, pid, process_name) self._record_last_storyline_process(system, pid, process_name) malicious_event["process_name"] = 
process_name malicious_event["command_line"] = command_line @@ -1360,7 +1463,11 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: if output_file: if os_category == "linux" and output_file.startswith("~/"): - home = "/root" if actor.username == "root" else f"/home/{actor.username}" + home = ( + "/root" + if process_actor.username == "root" + else f"/home/{process_actor.username}" + ) output_file = f"{home}/{output_file[2:]}" file_time = time + timedelta(seconds=rng.uniform(0.5, 3.0)) from evidenceforge.events.base import SecurityEvent @@ -1379,14 +1486,14 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: timestamp=file_time, event_type="file_create", src_host=host_ctx, - auth=AuthContext(username=actor.username), + auth=AuthContext(username=process_actor.username), process=ProcessContext( pid=pid, parent_pid=parent_pid, image=process_name, command_line=process_command_line, - username=actor.username, - logon_id=logon_id, + username=process_actor.username, + logon_id=process_logon_id, start_time=running_proc.start_time if running_proc is not None else None, @@ -1501,11 +1608,11 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: self._emit_scp_receiver_artifacts( source_system=system, target_system=target_system, - actor=actor, + actor=process_actor, source_pid=pid, source_process=process_name, source_command=command_line, - target_user=scp_destination[2] or actor.username, + target_user=scp_destination[2] or process_actor.username, target_path=scp_destination[1], transfer_time=transfer_time, source_port=source_port, @@ -1526,10 +1633,10 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: if uses_explicit_creds and os_category == "windows": cred_time = time - timedelta(milliseconds=rng.randint(5, 50)) self.activity_generator.generate_explicit_credentials( - user=actor, + user=process_actor, system=system, time=cred_time, - target_username=actor.username, + target_username=process_actor.username, target_server="localhost", process_name=process_name, process_pid=pid, @@ -1539,11 +1646,11 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: self.activity_generator._expand_and_emit( "process_create", time, - actor=actor, + actor=process_actor, target_system=system, command_line=command_line, os_category=os_category, - logon_id=logon_id, + logon_id=process_logon_id, skip_types=explicit_types, ) @@ -1554,12 +1661,12 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: term_delay = rng.uniform(lifetime[0], lifetime[1]) term_time = time + timedelta(seconds=term_delay) self._queue_story_process_termination( - actor=actor, + actor=process_actor, system=system, time=term_time, pid=pid, process_name=process_name, - logon_id=logon_id, + logon_id=process_logon_id, ) if os_category == "linux": self._storyline_shell_available_at[shell_key] = term_time diff --git a/tests/unit/test_explicit_proxy.py b/tests/unit/test_explicit_proxy.py index 280518a8..9764cef0 100644 --- a/tests/unit/test_explicit_proxy.py +++ b/tests/unit/test_explicit_proxy.py @@ -605,6 +605,105 @@ def test_matching_caller_proxy_process_is_preserved_for_storyline_download(self) assert client_event.process.username == "SYSTEM" assert client_event.process.command_line.endswith("AA==") + def test_one_shot_proxy_client_process_starts_near_request_time(self): + generator, _emitters = _generator( + [ + NetworkSensor( + type="network", + name="client-tap", + monitoring_segments=["workstations"], + direction="outbound", + log_formats=["zeek"], + ) 
+ ] + ) + _seed_proxy_client_user_session(generator) + workstation = generator._ip_to_system["10.0.1.10"] + proxy = generator._ip_to_system["10.0.3.10"] + generator._explicit_proxy_client_process_hint = Mock( + return_value=( + r"C:\Windows\System32\curl.exe", + 'curl.exe --proxy http://PROXY-01.example.org:8080 "https://www.bing.com/"', + ) + ) + request_time = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + + pid, image = generator._ensure_explicit_proxy_client_process( + source_system=workstation, + time=request_time, + proxy_context=ProxyContext( + client_ip=workstation.ip, + method="CONNECT", + url="www.bing.com:443", + host="www.bing.com", + status_code=200, + user_agent="curl/8.4.0", + proxy_fqdn="PROXY-01.example.org", + ), + proxy_sys=proxy, + dst_port=443, + ) + + proc = generator.state_manager.get_process(workstation.hostname, pid) + assert image == r"C:\Windows\System32\curl.exe" + assert proc is not None + lead_seconds = (request_time - proc.start_time).total_seconds() + assert 0 < lead_seconds <= 8.0 + + def test_one_shot_proxy_client_process_terminates_after_request(self): + generator, _emitters = _generator( + [ + NetworkSensor( + type="network", + name="client-tap", + monitoring_segments=["workstations"], + direction="outbound", + log_formats=["zeek"], + ) + ] + ) + _seed_proxy_client_user_session(generator) + workstation = generator._ip_to_system["10.0.1.10"] + generator._explicit_proxy_client_process_hint = Mock( + return_value=( + r"C:\Windows\System32\curl.exe", + 'curl.exe --proxy http://PROXY-01.example.org:8080 "https://www.bing.com/"', + ) + ) + generator._build_proxy_context = Mock( + return_value=ProxyContext( + client_ip=workstation.ip, + method="CONNECT", + url="www.bing.com:443", + host="www.bing.com", + status_code=200, + user_agent="curl/8.4.0", + proxy_fqdn="PROXY-01.example.org", + cache_result="MISS", + ) + ) + + generator.generate_connection( + src_ip=workstation.ip, + dst_ip="204.79.197.200", + time=datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC), + dst_port=443, + proto="tcp", + service="ssl", + duration=1.0, + orig_bytes=500, + resp_bytes=5000, + source_system=workstation, + hostname="www.bing.com", + conn_state="SF", + ) + + active_images = [ + proc.image + for proc in generator.state_manager.get_processes_on_system(workstation.hostname) + ] + assert r"C:\Windows\System32\curl.exe" not in active_images + def test_documentation_ip_with_external_hostname_routes_through_proxy(self): generator, emitters = _generator( [ diff --git a/tests/unit/test_process_lifetimes.py b/tests/unit/test_process_lifetimes.py index 26be0590..4b40c823 100644 --- a/tests/unit/test_process_lifetimes.py +++ b/tests/unit/test_process_lifetimes.py @@ -7,8 +7,14 @@ import pytest -from evidenceforge.generation.activity.generator import _windows_foreground_lifetime +from evidenceforge.generation.activity import ActivityGenerator +from evidenceforge.generation.activity.generator import ( + _linux_foreground_lifetime, + _windows_foreground_lifetime, +) from evidenceforge.generation.engine.baseline import _eligible_for_hourly_module_load +from evidenceforge.generation.state_manager import StateManager +from evidenceforge.models.scenario import System from evidenceforge.models.state import RunningProcess @@ -61,6 +67,80 @@ def test_windows_one_shot_admin_utilities_have_short_lifetimes( assert lifetime[1] <= 6.0 +@pytest.mark.parametrize( + ("image", "command_line"), + [ + ( + r"C:\Windows\System32\curl.exe", + "curl.exe --proxy http://PROXY-01:8080 http://www.bing.com/", + ), + ( + 
r"C:\Windows\System32\cmd.exe", + "cmd.exe /c whoami /all", + ), + ( + r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe", + "powershell.exe -NoProfile -Command Invoke-WebRequest https://example.test", + ), + ], +) +def test_windows_one_shot_shell_and_http_commands_have_bounded_lifetimes( + image: str, command_line: str +) -> None: + lifetime = _windows_foreground_lifetime(image, command_line) + + assert lifetime is not None + assert lifetime[1] <= 25.0 + + +@pytest.mark.parametrize( + ("image", "command_line"), + [ + ("/usr/bin/curl", "curl -sS https://grafana.example/api/health"), + ("/usr/bin/wget", "wget -qO- https://api.example/status"), + ], +) +def test_linux_http_cli_commands_have_short_lifetimes(image: str, command_line: str) -> None: + lifetime = _linux_foreground_lifetime(image, command_line) + + assert lifetime is not None + assert lifetime[1] <= 12.0 + + +def test_expired_linux_curl_is_not_valid_for_later_network_attribution() -> None: + start = datetime(2024, 3, 18, 13, 28, 11, tzinfo=UTC) + proc = _process("/usr/bin/curl", "curl -sS https://grafana.example/api/health", start) + system = System( + hostname="APP-INT-01", + ip="10.10.2.30", + os="Ubuntu 22.04", + type="server", + ) + generator = ActivityGenerator(StateManager(), {}) + + assert not generator._foreground_process_expired_for_attribution( + system, + proc, + start + timedelta(seconds=10), + ) + assert generator._foreground_process_expired_for_attribution( + system, + proc, + start + timedelta(minutes=5), + ) + + +def test_interactive_windows_shells_are_not_forced_to_short_lifetimes() -> None: + assert _windows_foreground_lifetime(r"C:\Windows\System32\cmd.exe", "cmd.exe /k") is None + assert ( + _windows_foreground_lifetime( + r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe", + "powershell.exe", + ) + is None + ) + + def test_hourly_module_noise_skips_stale_one_shot_processes() -> None: start = datetime(2024, 3, 18, 13, 28, 11, tzinfo=UTC) proc = _process( diff --git a/tests/unit/test_storyline_command_networks.py b/tests/unit/test_storyline_command_networks.py index 527f301e..940b8310 100644 --- a/tests/unit/test_storyline_command_networks.py +++ b/tests/unit/test_storyline_command_networks.py @@ -106,6 +106,9 @@ def generate_bash_command(self, *args: Any, **kwargs: Any) -> None: def _resolve_parent(self, *args: Any, **kwargs: Any) -> int: return 1 + def _get_system_pid(self, *args: Any, **kwargs: Any) -> int: + return 500 + def generate_process(self, *args: Any, **kwargs: Any) -> int: self.processes.append(kwargs) return 4242 @@ -138,6 +141,9 @@ class _FakeStateManager: def get_sessions_for_user(self, username: str) -> list[SimpleNamespace]: return [SimpleNamespace(system="SRC", logon_id="0xabc")] + def get_processes_on_system(self, hostname: str) -> list[SimpleNamespace]: + return [] + def mark_story_process(self, hostname: str, pid: int) -> None: return None @@ -318,6 +324,57 @@ def test_process_url_network_reuses_storyline_authored_domain_ip(self): assert conn["hostname"] == "cdn-assets-update.com" assert conn["preserve_dst_ip"] is True + def test_recent_psexesvc_service_runs_follow_on_commands_as_system(self): + source = System( + hostname="DC-01", + ip="10.10.0.10", + os="Windows Server 2022", + type="domain_controller", + ) + actor = User( + username="alice", + full_name="Alice Example", + email="alice@example.com", + ) + engine = object.__new__(StorylineMixin) + engine.scenario = SimpleNamespace( + environment=SimpleNamespace(systems=[source], service_accounts=[]) + ) + 
engine.state_manager = _FakeStateManager() + engine.activity_generator = _FakeActivityGenerator() + engine.dispatcher = SimpleNamespace(visibility_engine=None) + service_time = datetime(2026, 5, 11, 12, 0, tzinfo=UTC) + engine._record_storyline_service_install( + system=source, + service_name="PSEXESVC", + service_file_name=r"%SystemRoot%\PSEXESVC.exe", + service_account="LocalSystem", + time=service_time, + ) + spec = SimpleNamespace( + type="process", + process_name=r"C:\Windows\System32\cmd.exe", + command_line="cmd.exe /c whoami /all", + ) + + engine._execute_typed_event( + spec=spec, + actor=actor, + system=source, + time=service_time.replace(second=2), + activity="run remote command through psexec service", + explicit_types={"process"}, + ) + + service_proc = engine.activity_generator.processes[0] + child_proc = engine.activity_generator.processes[1] + assert service_proc["user"].username == "SYSTEM" + assert service_proc["process_name"] == r"C:\Windows\PSEXESVC.exe" + assert service_proc["parent_pid"] == 500 + assert child_proc["user"].username == "SYSTEM" + assert child_proc["logon_id"] == "0x3e7" + assert child_proc["parent_pid"] == 4242 + def test_storyline_dhcp_lease_reuses_existing_host_lease_identity(self): source = System( hostname="ROGUE-LAPTOP", From 63743030dbc9c22466255d170a22e6888ae74cde Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 19:54:00 -0400 Subject: [PATCH 13/61] fix: close tracked foreground processes at finalize --- TODO.md | 8 ++ .../generation/activity/generator.py | 119 ++++++++++++++++-- src/evidenceforge/generation/engine/core.py | 3 + tests/unit/test_process_lifetimes.py | 87 ++++++++++++- 4 files changed, 206 insertions(+), 11 deletions(-) diff --git a/TODO.md b/TODO.md index 04f52307..fea2661d 100644 --- a/TODO.md +++ b/TODO.md @@ -304,6 +304,14 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r (`213 passed`), config validation, Ruff, format check, and full normal `uv run pytest --no-cov -q` (`3092 passed, 37 skipped`). Regeneration and blind review follow. + - [x] Loop 7 hard-probe follow-up: one late Linux bash/eCAR + `curl -sI https://localhost` process remained open despite being a bounded + foreground command. Added a finalization backstop for tracked one-shot shell + process telemetry before emitters close while preserving commands whose + expected termination falls beyond the visible window. Verification passed + with focused lifetime/storyline/proxy regressions, config validation, Ruff, + format check, and full normal `uv run pytest --no-cov -q` + (`3094 passed, 37 skipped`). Regeneration and blind review follow. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
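
The hard probe behind the Loop 7 follow-up above is essentially a create/terminate pairing check over the generated ground truth. A minimal sketch of that kind of check is shown below; it assumes the events are available as JSON lines with `event_type`, `hostname`, `pid`, and `image` fields, which are illustrative assumptions rather than the project's actual schema, and it ignores PID reuse, which is acceptable for a spot check.

    import json

    # One-shot Linux executables whose create events should normally pair with a terminate.
    ONE_SHOT_EXES = {"curl", "wget", "scp", "kubectl", "mysqldump"}

    def open_one_shot_processes(path: str) -> list[tuple[str, int, str]]:
        """Return (host, pid, image) for one-shot creates with no matching terminate."""
        created: dict[tuple[str, int], str] = {}
        terminated: set[tuple[str, int]] = set()
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                event = json.loads(line)
                if event.get("event_type") not in {"process_create", "process_terminate"}:
                    continue
                key = (event.get("hostname"), event.get("pid"))
                if event["event_type"] == "process_terminate":
                    terminated.add(key)
                    continue
                # Field names here are assumptions for illustration only.
                exe = str(event.get("image", "")).rsplit("/", 1)[-1].lower()
                if exe in ONE_SHOT_EXES:
                    created[key] = event["image"]
        return [
            (host, pid, image)
            for (host, pid), image in created.items()
            if (host, pid) not in terminated
        ]

An unmatched create such as the single lingering `curl -sI https://localhost` process surfaces directly from a listing like this; the finalization backstop in the generator diff that follows is what closes those entries before the emitters shut down.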
diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 93efd14a..b3f418c2 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -1814,6 +1814,9 @@ def __init__( self._tls_ocsp_windows: dict[tuple[str, str, int], tuple[int, int]] = {} self._ntp_association_profiles: dict[tuple[str, str], dict[str, float | int]] = {} self._bash_history_next_time: dict[tuple[str, str], datetime] = {} + self._foreground_process_finalizers: dict[ + tuple[str, int], tuple[System, str, str, str, datetime] + ] = {} self._loaded_modules_by_process: set[tuple[str, int, str, str]] = set() self._last_one_shot_cli_launch_by_exe: dict[tuple[str, str, str, str], datetime] = {} self._last_one_shot_cli_launch_by_command: dict[ @@ -1826,6 +1829,92 @@ def __init__( self._causal_engine = causal_engine or CausalExpansionEngine() self._expanding_types: set[str] = set() + def _remember_foreground_process_finalizer( + self, + *, + system: System, + user: User, + pid: int, + process_name: str, + logon_id: str, + termination_time: datetime, + ) -> None: + """Track a bounded foreground process until its terminate event is observed.""" + self._foreground_process_finalizers[(system.hostname, pid)] = ( + system, + user.username, + process_name, + logon_id, + ensure_utc(termination_time), + ) + + def finalize_foreground_process_lifetimes(self, end_time: datetime) -> None: + """Close any tracked one-shot foreground shell processes still running. + + Most shell telemetry emits its terminate row immediately after the create row. This + finalization pass is a safety net for slice-end and session-interleaving edge cases + where a bounded foreground command stayed active in state despite its expected + lifetime being inside the visible window. 
+ """ + known_users = getattr(self, "_users_by_username", {}) + window_end = ensure_utc(end_time) + for key, ( + system, + username, + process_name, + logon_id, + termination_time, + ) in sorted(self._foreground_process_finalizers.items(), key=lambda item: item[1][4]): + if key in self._terminated_process_keys or termination_time > window_end: + continue + running = self.state_manager.get_process(system.hostname, key[1]) + if running is None: + continue + process_user = known_users.get(username) or User( + username=username, + full_name=username, + email=f"{username}@example.local", + ) + self.generate_process_termination( + user=process_user, + system=system, + time=termination_time, + pid=key[1], + process_name=running.image or process_name, + logon_id=running.logon_id or logon_id, + ) + + def _generate_bounded_foreground_process_termination( + self, + *, + user: User, + system: System, + start_time: datetime, + pid: int, + process_name: str, + logon_id: str, + lifetime: tuple[float, float], + rng: random.Random, + ) -> None: + """Emit and track termination for a bounded foreground command process.""" + termination_time = start_time + timedelta(seconds=rng.uniform(*lifetime)) + self._remember_foreground_process_finalizer( + system=system, + user=user, + pid=pid, + process_name=process_name, + logon_id=logon_id, + termination_time=termination_time, + ) + self.generate_process_termination( + user=user, + system=system, + time=termination_time, + pid=pid, + process_name=process_name, + logon_id=logon_id, + ) + def _ntp_association_profile(self, src_ip: str, dst_ip: str) -> dict[str, float | int]: """Return stable NTP client/server association fields.""" key = (src_ip, dst_ip) @@ -7559,13 +7648,15 @@ def _maybe_emit_bash_process_telemetry( self._record_user_process(system, user, pid, image) lifetime = _linux_foreground_lifetime(image, process_command_line) if lifetime is not None: - self.generate_process_termination( + self._generate_bounded_foreground_process_termination( user=user, system=system, - time=process_time + timedelta(seconds=rng.uniform(*lifetime)), + start_time=process_time, pid=pid, process_name=image, logon_id=session.logon_id, + lifetime=lifetime, + rng=rng, ) def _schedule_bash_history_time( @@ -8694,13 +8785,15 @@ def execute_baseline_activity( effect_command_line, ) if lifetime is not None: - self.generate_process_termination( + self._generate_bounded_foreground_process_termination( user=user, system=system, - time=time + timedelta(seconds=rng.uniform(*lifetime)), + start_time=process_time, pid=pid, process_name=effect_process_name, logon_id=logon_id, + lifetime=lifetime, + rng=rng, ) elif os_category == "linux": self._emit_bash_command_event( @@ -8714,13 +8807,15 @@ def execute_baseline_activity( effect_command_line, ) if lifetime is not None: - self.generate_process_termination( + self._generate_bounded_foreground_process_termination( user=user, system=system, - time=process_time + timedelta(seconds=rng.uniform(*lifetime)), + start_time=process_time, pid=pid, process_name=effect_process_name, logon_id=logon_id, + lifetime=lifetime, + rng=rng, ) # Legacy PROCESS_TEMPLATES only for process_system (not user apps/code/build/query) @@ -8743,13 +8838,15 @@ def execute_baseline_activity( self._record_user_process(system, user, pid, process_name) lifetime = _windows_foreground_lifetime(process_name, command_line) if lifetime is not None: - self.generate_process_termination( + self._generate_bounded_foreground_process_termination( user=user, system=system, - time=time + 
timedelta(seconds=rng.uniform(*lifetime)), + start_time=time, pid=pid, process_name=process_name, logon_id=logon_id, + lifetime=lifetime, + rng=rng, ) elif os_category == "linux" and activity_type in PROCESS_TEMPLATES_LINUX: rng = _get_rng() @@ -8776,13 +8873,15 @@ def execute_baseline_activity( self._emit_bash_command_event(user, system, process_time, command_line) lifetime = _linux_foreground_lifetime(process_name, command_line) if lifetime is not None: - self.generate_process_termination( + self._generate_bounded_foreground_process_termination( user=user, system=system, - time=process_time + timedelta(seconds=rng.uniform(*lifetime)), + start_time=process_time, pid=pid, process_name=process_name, logon_id=logon_id, + lifetime=lifetime, + rng=rng, ) # Connection activities diff --git a/src/evidenceforge/generation/engine/core.py b/src/evidenceforge/generation/engine/core.py index 703b8e61..c6ebab38 100644 --- a/src/evidenceforge/generation/engine/core.py +++ b/src/evidenceforge/generation/engine/core.py @@ -460,6 +460,9 @@ def _finalize(self) -> None: """ logger.info("Finalizing generation") + if self.activity_generator is not None and self.end_time is not None: + self.activity_generator.finalize_foreground_process_lifetimes(self.end_time) + for format_name, emitter in self.emitters.items(): logger.info(f"Stopping {format_name} emitter thread") emitter.close() diff --git a/tests/unit/test_process_lifetimes.py b/tests/unit/test_process_lifetimes.py index 4b40c823..38770091 100644 --- a/tests/unit/test_process_lifetimes.py +++ b/tests/unit/test_process_lifetimes.py @@ -7,6 +7,7 @@ import pytest +from evidenceforge.events.dispatcher import EventDispatcher from evidenceforge.generation.activity import ActivityGenerator from evidenceforge.generation.activity.generator import ( _linux_foreground_lifetime, @@ -14,7 +15,7 @@ ) from evidenceforge.generation.engine.baseline import _eligible_for_hourly_module_load from evidenceforge.generation.state_manager import StateManager -from evidenceforge.models.scenario import System +from evidenceforge.models.scenario import System, User from evidenceforge.models.state import RunningProcess @@ -107,6 +108,90 @@ def test_linux_http_cli_commands_have_short_lifetimes(image: str, command_line: assert lifetime[1] <= 12.0 +def test_finalize_foreground_process_lifetimes_closes_tracked_one_shot() -> None: + start = datetime(2024, 3, 18, 17, 56, 39, tzinfo=UTC) + state = StateManager() + state.set_current_time(start) + dispatcher = EventDispatcher(state_manager=state, emitters={}) + generator = ActivityGenerator(state, {}, dispatcher=dispatcher) + system = System( + hostname="APP-INT-01", + ip="10.10.2.30", + os="Ubuntu 22.04", + type="server", + ) + user = User( + username="marcus.chen", + full_name="Marcus Chen", + email="marcus.chen@example.local", + ) + pid = state.create_process( + system=system.hostname, + parent_pid=0, + image="/usr/bin/curl", + command_line="curl -sI https://localhost", + username=user.username, + integrity_level="Medium", + logon_id="0x1234", + ) + + generator._remember_foreground_process_finalizer( + system=system, + user=user, + pid=pid, + process_name="/usr/bin/curl", + logon_id="0x1234", + termination_time=start + timedelta(seconds=5), + ) + + generator.finalize_foreground_process_lifetimes(start + timedelta(minutes=1)) + + assert state.get_process(system.hostname, pid) is None + assert (system.hostname, pid) in generator._terminated_process_keys + + +def test_finalize_foreground_process_lifetimes_preserves_commands_beyond_window() -> 
None: + start = datetime(2024, 3, 18, 17, 59, 58, tzinfo=UTC) + state = StateManager() + state.set_current_time(start) + dispatcher = EventDispatcher(state_manager=state, emitters={}) + generator = ActivityGenerator(state, {}, dispatcher=dispatcher) + system = System( + hostname="APP-INT-01", + ip="10.10.2.30", + os="Ubuntu 22.04", + type="server", + ) + user = User( + username="marcus.chen", + full_name="Marcus Chen", + email="marcus.chen@example.local", + ) + pid = state.create_process( + system=system.hostname, + parent_pid=0, + image="/usr/bin/curl", + command_line="curl -sI https://localhost", + username=user.username, + integrity_level="Medium", + logon_id="0x1234", + ) + + generator._remember_foreground_process_finalizer( + system=system, + user=user, + pid=pid, + process_name="/usr/bin/curl", + logon_id="0x1234", + termination_time=start + timedelta(seconds=5), + ) + + generator.finalize_foreground_process_lifetimes(start + timedelta(seconds=2)) + + assert state.get_process(system.hostname, pid) is not None + assert (system.hostname, pid) not in generator._terminated_process_keys + + def test_expired_linux_curl_is_not_valid_for_later_network_attribution() -> None: start = datetime(2024, 3, 18, 13, 28, 11, tzinfo=UTC) proc = _process("/usr/bin/curl", "curl -sS https://grafana.example/api/health", start) From a477484e0ca63569566a7dbb50d497dce1d79ccc Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 20:08:42 -0400 Subject: [PATCH 14/61] docs: record loop 7 blind review results --- TODO.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/TODO.md b/TODO.md index fea2661d..fc77c4c1 100644 --- a/TODO.md +++ b/TODO.md @@ -312,6 +312,18 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r with focused lifetime/storyline/proxy regressions, config validation, Ruff, format check, and full normal `uv run pytest --no-cov -q` (`3094 passed, 37 skipped`). Regeneration and blind review follow. + - Loop 7 final regeneration and assessment completed from commit `6374303`: + regenerated `scenarios/iteration-test/scenario.yaml`, quantitative eval passed + at `96/100` across `76,838` records, and hard probes found zero PsExec service + semantic violations, zero one-shot CLI lifetimes over 120 seconds, zero + unexplained one-shot terminate gaps after accounting for `enterprise_standard` + eCAR observation drops, zero duplicate or regressing `systemd-logind` removals, + zero web asset-before-referrer inversions, and zero repeated full static 200s. + Blind synthetic-confidence scores were Threat Hunter `72`, Detection `68`, + Network `74`, Host/EDR `76` (average `72.5`). Top Loop 8 targets are + PsExec/security service-install source-native fields (`ServiceStartType`), + HTTP/proxy status-outcome texture, Linux bash/sudo command-pool realism, DNS + tunnel response grammar/TTL diversity, and attack-phase pivot continuity. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). 
Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From e768f4cfbed41b46cd256be43243aefd17cb042a Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 20:12:42 -0400 Subject: [PATCH 15/61] fix: correct service install start semantics --- TODO.md | 7 +++++++ .../config/formats/windows_event_security.yaml | 2 +- src/evidenceforge/events/contexts.py | 2 +- .../generation/activity/generator.py | 2 +- src/evidenceforge/generation/causal/rules.py | 17 +++++++++++++++++ tests/unit/test_activity.py | 1 + tests/unit/test_causal_engine.py | 15 +++++++++++++++ 7 files changed, 43 insertions(+), 3 deletions(-) diff --git a/TODO.md b/TODO.md index fc77c4c1..485cd2ca 100644 --- a/TODO.md +++ b/TODO.md @@ -324,6 +324,13 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r PsExec/security service-install source-native fields (`ServiceStartType`), HTTP/proxy status-outcome texture, Linux bash/sudo command-pool realism, DNS tunnel response grammar/TTL diversity, and attack-phase pivot continuity. + - [x] Loop 8 fix pass: corrected PsExec/service-control Windows 4697 + service-install source-native fields by defaulting generated service + installs to demand/manual `ServiceStartType=3`, preserving explicit + `sc create ... start= auto`, and validating with focused service tests, + config validation, Ruff, format check, and full normal + `uv run pytest --no-cov -q` (`3095 passed, 37 skipped`). Regeneration and + blind review follow. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
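
The Loop 8 entry above changes what a freshly generated 4697 record should carry. One way to spot-check regenerated data for the same property is to pull `ServiceStartType` wherever `PSEXESVC` shows up in the rendered Security events. The sketch below assumes the events are rendered as standard Windows event XML with `<Data Name="...">` elements; the helper name, file layout, and parsing approach are illustrative, not part of the codebase.

    import re
    from pathlib import Path

    # Matches the two EventData fields we care about in standard Windows event XML.
    DATA_FIELD = re.compile(r'<Data Name="(ServiceName|ServiceStartType)">([^<]*)</Data>')

    def psexec_start_types(security_xml_path: str) -> set[str]:
        """Collect ServiceStartType values for PSEXESVC service-install events."""
        start_types: set[str] = set()
        text = Path(security_xml_path).read_text(encoding="utf-8")
        # Rough per-event split; good enough for a spot check over concatenated events.
        for event_xml in text.split("</Event>"):
            fields = dict(DATA_FIELD.findall(event_xml))
            if fields.get("ServiceName", "").upper() == "PSEXESVC":
                start_types.add(fields.get("ServiceStartType", ""))
        return start_types

After this change the expected result is `{"3"}` unless a storyline explicitly issued `sc create ... start= auto`.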
diff --git a/src/evidenceforge/config/formats/windows_event_security.yaml b/src/evidenceforge/config/formats/windows_event_security.yaml index 1eea80b2..e7795133 100644 --- a/src/evidenceforge/config/formats/windows_event_security.yaml +++ b/src/evidenceforge/config/formats/windows_event_security.yaml @@ -1345,7 +1345,7 @@ output: {{ ServiceName }} {{ ServiceFileName }} {{ ServiceType | default('0x10') }} - {{ ServiceStartType | default('2') }} + {{ ServiceStartType | default('3') }} {{ ServiceAccount | default('LocalSystem') }} {% elif EventID in [4698, 4699, 4700, 4701] %} {{ SubjectUserSid }} diff --git a/src/evidenceforge/events/contexts.py b/src/evidenceforge/events/contexts.py index 48574b55..c81943c6 100644 --- a/src/evidenceforge/events/contexts.py +++ b/src/evidenceforge/events/contexts.py @@ -288,7 +288,7 @@ class ServiceContext: service_name: str service_file_name: str # Full command line / binary path service_type: str = "0x10" # 0x10=Own Process, 0x20=Share Process - service_start_type: str = "2" # 2=Auto, 3=Manual, 4=Disabled + service_start_type: str = "3" # 2=Auto, 3=Manual/Demand, 4=Disabled service_account: str = "LocalSystem" diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index b3f418c2..5fb7b54a 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -9827,7 +9827,7 @@ def generate_service_installed( service_name: str, service_file_name: str, service_type: str = "0x10", - service_start_type: str = "2", + service_start_type: str = "3", service_account: str = "LocalSystem", ) -> None: """Generate service installed event (4697) on target system.""" diff --git a/src/evidenceforge/generation/causal/rules.py b/src/evidenceforge/generation/causal/rules.py index e204b5e2..38992751 100644 --- a/src/evidenceforge/generation/causal/rules.py +++ b/src/evidenceforge/generation/causal/rules.py @@ -385,6 +385,22 @@ def _make_sid(rid: int | None = None) -> str: if match and "service_installed" not in skip: svc_name = match.group(1) svc_path = match.group(2) + service_start_type = "3" + start_match = re.search( + r"\bstart=\s*(delayed-auto|auto|demand|disabled|boot|system)\b", + cmd, + re.IGNORECASE, + ) + if start_match: + start_value = start_match.group(1).lower() + service_start_type = { + "boot": "0", + "system": "1", + "auto": "2", + "delayed-auto": "2", + "demand": "3", + "disabled": "4", + }[start_value] expanded.append( ExpandedEvent( method="generate_service_installed", @@ -393,6 +409,7 @@ def _make_sid(rid: int | None = None) -> str: "system": ctx.target_system, "service_name": svc_name, "service_file_name": svc_path, + "service_start_type": service_start_type, }, timing=timing, description="4697 service installed from sc create", diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 3200bb22..b7455ea1 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -1686,6 +1686,7 @@ def test_service_payload_file_event_precedes_service_install( call.args[0] for call in mock_emitters["windows_event_security"].emit.call_args_list ] service_event = next(event for event in events if event.event_type == "service_installed") + assert service_event.service.service_start_type == "3" file_event = next( event for event in events diff --git a/tests/unit/test_causal_engine.py b/tests/unit/test_causal_engine.py index f9747911..57810528 100644 --- a/tests/unit/test_causal_engine.py +++ 
b/tests/unit/test_causal_engine.py @@ -513,6 +513,21 @@ def test_expand_sc_create(self): assert len(result) == 1 assert result[0].method == "generate_service_installed" assert result[0].kwargs["service_name"] == "EvilSvc" + assert result[0].kwargs["service_start_type"] == "3" + + def test_expand_sc_create_auto_start(self): + rule = SupplementaryAuditEvents() + ctx = _make_ctx( + os_category="windows", + command_line='sc create EvilSvc binpath= "C:\\temp\\evil.exe" start= auto', + actor="attacker", + target_system="WS-01", + ) + result = rule.expand("process_create", ctx) + assert len(result) == 1 + assert result[0].method == "generate_service_installed" + assert result[0].kwargs["service_name"] == "EvilSvc" + assert result[0].kwargs["service_start_type"] == "2" def test_expand_wevtutil_cl(self): rule = SupplementaryAuditEvents() From e21a25f5d3d868a266deade8ec2970a46834ffd3 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 20:19:12 -0400 Subject: [PATCH 16/61] docs: record loop 8 assessment results --- TODO.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/TODO.md b/TODO.md index 485cd2ca..669191b3 100644 --- a/TODO.md +++ b/TODO.md @@ -331,6 +331,17 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r config validation, Ruff, format check, and full normal `uv run pytest --no-cov -q` (`3095 passed, 37 skipped`). Regeneration and blind review follow. + - [x] Loop 8 regeneration, hard probes, quantitative eval, and blind review + completed from commit `e768f4c`: automated eval passed at `96/100` across + `76,837` records, hard probes found zero PSEXESVC service-start or + Security/Sysmon lineage violations (`ServiceStartType=3`, SYSTEM children + under `PSEXESVC.exe`), and blind synthetic-confidence scores were Threat + Hunter `72`, Detection `64`, Network `78`, Host/EDR `80` (average `73.5`). + Top Loop 9 targets are HTTP/proxy status-outcome cleanliness, Linux + bash/syslog command-pool repetition, and rare Windows admin-tool overuse. + - [ ] **IN PROGRESS** Loop 9 fix pass: add realistic HTTP/proxy + status-outcome texture first, targeting the dataset-wide `200` success-rate + fingerprint in Zeek HTTP and proxy access logs. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 46bd9d2b8634f25ff61a79c4147baab85a271db1 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Fri, 15 May 2026 20:25:27 -0400 Subject: [PATCH 17/61] fix: diversify web and proxy status outcomes --- TODO.md | 12 +++- .../generation/activity/browsing_session.py | 69 ++++++++++++++++++- .../generation/activity/generator.py | 35 ++++++---- .../generation/engine/baseline.py | 30 +++++++- tests/unit/test_browsing_session.py | 52 ++++++++++---- 5 files changed, 163 insertions(+), 35 deletions(-) diff --git a/TODO.md b/TODO.md index 669191b3..768905bc 100644 --- a/TODO.md +++ b/TODO.md @@ -339,9 +339,15 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r Hunter `72`, Detection `64`, Network `78`, Host/EDR `80` (average `73.5`). Top Loop 9 targets are HTTP/proxy status-outcome cleanliness, Linux bash/syslog command-pool repetition, and rare Windows admin-tool overuse. - - [ ] **IN PROGRESS** Loop 9 fix pass: add realistic HTTP/proxy - status-outcome texture first, targeting the dataset-wide `200` success-rate - fingerprint in Zeek HTTP and proxy access logs. + - [x] Loop 9 fix pass: added realistic HTTP/proxy status-outcome texture by + carrying sampled browser request statuses through `BrowsingRequest` into + generated `HttpContext`, setting status-aware response sizes/MIME fan-out, + and raising proxy CONNECT terminal-failure variance. Isolated browsing + sessions now sample roughly 75% `200` with `304`, `206`, `404`, `403`, and + 5xx outcomes in the tail. Verified with config validation, focused + web/proxy tests, Ruff, format check, and full normal + `uv run pytest --no-cov -q` (`3097 passed, 37 skipped`). Regeneration and + blind review follow. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
diff --git a/src/evidenceforge/generation/activity/browsing_session.py b/src/evidenceforge/generation/activity/browsing_session.py index 0afa65a8..43013474 100644 --- a/src/evidenceforge/generation/activity/browsing_session.py +++ b/src/evidenceforge/generation/activity/browsing_session.py @@ -45,6 +45,7 @@ class BrowsingRequest: is_page_load: bool # True for the page itself, False for subresources response_body_len: int # Estimated response size in bytes request_body_len: int # Estimated request size in bytes + status_code: int = 200 # HTTP response status _INTENSITY_PARAMS: dict[str, dict[str, tuple[int, int]]] = { @@ -80,6 +81,49 @@ def _request_size(rng: random.Random, method: str) -> int: return 0 +def _sample_status_code( + rng: random.Random, + *, + is_page_load: bool, + method: str, + content_type: str, +) -> int: + """Sample realistic browser HTTP outcomes for page and asset requests.""" + if method == "POST": + statuses = [200, 204, 302, 400, 401, 403, 404, 500, 503] + weights = [70, 4, 4, 4, 3, 4, 3, 5, 3] + elif not is_page_load: + statuses = [200, 206, 304, 403, 404, 500, 503] + if content_type.startswith(("image/", "video/", "font/")): + weights = [70, 8, 12, 2, 4, 2, 2] + else: + weights = [74, 2, 12, 3, 4, 3, 2] + else: + statuses = [200, 301, 302, 401, 403, 404, 500, 503] + weights = [80, 2, 4, 2, 3, 5, 2, 2] + return rng.choices(statuses, weights=weights, k=1)[0] + + +def _response_size_for_status_code( + rng: random.Random, + hostname: str, + path: str, + content_type: str, + status_code: int, +) -> int: + """Generate a response body size consistent with the HTTP status.""" + if status_code in {204, 304}: + return 0 + if status_code in {301, 302}: + return rng.randint(120, 480) + if status_code == 206: + full_size = _response_size(rng, hostname, path, content_type) + return max(128, int(full_size * rng.uniform(0.15, 0.65))) + if status_code >= 400: + return response_size_for_status(status_code, hostname, path) + return _response_size(rng, hostname, path, content_type) + + def _sample_profile_timing_ms( rng: random.Random, key: str, @@ -270,6 +314,12 @@ def generate_browsing_session( page = site_map.pages[current_page_idx] page_content_type = normalize_mime_type_for_path(page.path, page.content_type) + page_status = _sample_status_code( + rng, + is_page_load=True, + method="GET", + content_type=page_content_type, + ) visited_indices.append(current_page_idx) page_url = _make_referrer(hostname, page.path, port) @@ -284,8 +334,15 @@ def generate_browsing_session( referrer=previous_page_url, trans_depth=1, is_page_load=True, - response_body_len=_response_size(rng, hostname, page.path, page_content_type), + response_body_len=_response_size_for_status_code( + rng, + hostname, + page.path, + page_content_type, + page_status, + ), request_body_len=_request_size(rng, "GET"), + status_code=page_status, ) ) @@ -297,6 +354,12 @@ def generate_browsing_session( for sub_idx, sub in enumerate(subresources): sub_hostname = sub.host or hostname sub_content_type = normalize_mime_type_for_path(sub.path, sub.content_type) + sub_status = _sample_status_code( + rng, + is_page_load=False, + method=sub.method, + content_type=sub_content_type, + ) delay = _subresource_delay_ms(rng, sub_content_type) @@ -310,13 +373,15 @@ def generate_browsing_session( referrer=page_url, trans_depth=sub_idx + 2, # Page is depth 1, subs start at 2 is_page_load=False, - response_body_len=_response_size( + response_body_len=_response_size_for_status_code( rng, sub_hostname, sub.path, sub_content_type, + sub_status, ), 
request_body_len=_request_size(rng, sub.method), + status_code=sub_status, ) ) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 5fb7b54a..4f050aba 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -2292,11 +2292,11 @@ def _build_proxy_context( else: cache_result = "MISS" elif explicit_mode and proxy_method == "CONNECT": - if cache_roll < 0.975: + if cache_roll < 0.88: cache_result = "NONE" - elif cache_roll < 0.988: + elif cache_roll < 0.925: cache_result = "DENIED" - elif cache_roll < 0.995: + elif cache_roll < 0.965: cache_result = "AUTH_REQUIRED" else: cache_result = "GATEWAY_ERROR" @@ -6673,10 +6673,14 @@ def generate_connection( cache_result = "MISS" elif proxy_cacheable and cache_roll < 0.30: cache_result = "HIT" - elif cache_roll < 0.95: + elif cache_roll < 0.91: cache_result = "MISS" - else: + elif cache_roll < 0.945: cache_result = "DENIED" + elif cache_roll < 0.975: + cache_result = "AUTH_REQUIRED" + else: + cache_result = "GATEWAY_ERROR" # W3C sc-bytes/cs-bytes are proxy-side accounting fields: # payload plus HTTP/proxy headers for allowed responses, # or proxy-generated error pages for failures. @@ -6686,22 +6690,29 @@ def generate_connection( ) if cache_result == "DENIED": _sc = rng.randint(500, 2000) # proxy error page + elif cache_result == "AUTH_REQUIRED": + _sc = rng.randint(300, 1200) + elif cache_result == "GATEWAY_ERROR": + _sc = rng.randint(250, 1800) elif cache_result == "HIT": _sc = _response_bytes + rng.randint(*_PROXY_SC_OVERHEAD) else: _sc = _response_bytes + rng.randint(*_PROXY_SC_OVERHEAD) + proxy_status_code = ( + event.http.status_code + if event.http is not None + else { + "DENIED": 403, + "AUTH_REQUIRED": 407, + "GATEWAY_ERROR": rng.choice([502, 503, 504]), + }.get(cache_result, 200) + ) event.proxy = ProxyContext( client_ip=src_ip, method=proxy_method, url=url, host=proxy_hostname, - status_code=( - event.http.status_code - if event.http is not None - else 200 - if cache_result != "DENIED" - else 403 - ), + status_code=proxy_status_code, sc_bytes=_sc, cs_bytes=_cs, time_taken=int((duration or 0) * 1000), diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index f2274cf9..94050717 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -4055,6 +4055,22 @@ def _emit_browsing_session( if not session_requests: return + def _http_status_message(status: int) -> str: + return { + 200: "OK", + 204: "No Content", + 206: "Partial Content", + 301: "Moved Permanently", + 302: "Found", + 304: "Not Modified", + 400: "Bad Request", + 401: "Unauthorized", + 403: "Forbidden", + 404: "Not Found", + 500: "Internal Server Error", + 503: "Service Unavailable", + }.get(status, "OK") + # Pick a consistent UA for the entire session if os_cat == "linux": _session_uas = [ @@ -4101,11 +4117,13 @@ def _emit_browsing_session( user_agent=session_ua, request_body_len=req.request_body_len, response_body_len=req.response_body_len, - status_code=200, - status_msg="OK", + status_code=req.status_code, + status_msg=_http_status_message(req.status_code), referrer=req.referrer, trans_depth=req.trans_depth, - resp_mime_types=[req.content_type] if req.content_type else [], + resp_mime_types=[req.content_type] + if req.content_type and req.status_code in {200, 206} + else [], tags=[], ) @@ -5914,7 +5932,13 @@ def 
_effective_dst_ip(is_external_client: bool) -> str: def _status_message(status: int) -> str: return { 200: "OK", + 204: "No Content", + 206: "Partial Content", + 301: "Moved Permanently", + 302: "Found", 304: "Not Modified", + 400: "Bad Request", + 401: "Unauthorized", 403: "Forbidden", 404: "Not Found", 405: "Method Not Allowed", diff --git a/tests/unit/test_browsing_session.py b/tests/unit/test_browsing_session.py index 864870f3..fb2a69d0 100644 --- a/tests/unit/test_browsing_session.py +++ b/tests/unit/test_browsing_session.py @@ -272,22 +272,42 @@ def test_extension_drives_content_type(self): assert 500 <= request.response_body_len <= 5_000 def test_stable_static_asset_size_for_same_host_and_path(self): - first = generate_browsing_session( - random.Random(42), - "portal.customer.example", - [], - require_browser_like_domain=False, - ) - second = generate_browsing_session( - random.Random(43), - "portal.customer.example", - [], - require_browser_like_domain=False, - ) - first_favicon = next(r for r in first if r.path == "/favicon.ico") - second_favicon = next(r for r in second if r.path == "/favicon.ico") + successful_favicons = [] + for seed in range(60): + requests = generate_browsing_session( + random.Random(seed), + "portal.customer.example", + [], + require_browser_like_domain=False, + ) + favicon = next(r for r in requests if r.path == "/favicon.ico") + if favicon.status_code == 200: + successful_favicons.append(favicon) + if len(successful_favicons) >= 2: + break + + assert len(successful_favicons) >= 2 + assert {r.response_body_len for r in successful_favicons} == { + successful_favicons[0].response_body_len + } + + def test_sessions_include_non_success_http_outcomes(self): + statuses = [] + for seed in range(40): + requests = generate_browsing_session(random.Random(seed), "github.com", []) + statuses.extend(request.status_code for request in requests) + + assert 200 in statuses + assert any(status != 200 for status in statuses) + + def test_empty_body_statuses_have_zero_response_body(self): + requests = [] + for seed in range(80): + requests.extend(generate_browsing_session(random.Random(seed), "github.com", [])) - assert first_favicon.response_body_len == second_favicon.response_body_len + empty_body = [request for request in requests if request.status_code in {204, 304}] + assert empty_body + assert all(request.response_body_len == 0 for request in empty_body) def test_subresource_timing_uses_timing_profile_overlay(self, tmp_path, monkeypatch): overlay = tmp_path / ".eforge" / "config" / "activity" @@ -331,3 +351,5 @@ def test_same_seed_same_output(self): assert a.hostname == b.hostname assert a.path == b.path assert a.referrer == b.referrer + assert a.status_code == b.status_code + assert a.response_body_len == b.response_body_len From 454edf01df7fe35a4af177afbebdcf65b4f3b39e Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 20:33:55 -0400 Subject: [PATCH 18/61] docs: record loop 9 assessment results --- TODO.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/TODO.md b/TODO.md index 768905bc..7c1844a6 100644 --- a/TODO.md +++ b/TODO.md @@ -348,6 +348,21 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r web/proxy tests, Ruff, format check, and full normal `uv run pytest --no-cov -q` (`3097 passed, 37 skipped`). Regeneration and blind review follow. 
+ - [x] Loop 9 regeneration, HTTP/proxy hard probes, quantitative eval, and + blind review completed from commit `46bd9d2`: automated eval passed at + `95.96` JSON overall (`96/100` human-readable) across `78,166` records, + hard probes confirmed the status-texture gate now passes with proxy `200` + ratio `78.3%` and Zeek HTTP `200` ratio `93.0%`, and blind + synthetic-confidence scores were Threat Hunter `70`, Detection `66`, + Network `68`, Host/EDR `82` (average `71.5`). Top Loop 10 target is + rare Windows admin-tool overuse: generated eCAR still contains 21 bare + `ntdsutil.exe` launches on DC-01 under `services.exe` and 65 + `dsquery.exe` process creates across Windows hosts; Linux bash/syslog + command-pool repetition remains the next broad target. + - [ ] **IN PROGRESS** Loop 10 fix pass: reduce rare Windows admin-tool + overuse at the owning config/generation layer while preserving explicit + storyline/admin context, then regenerate/evaluate and run the final blind + review for the requested up-to-10 batch. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From f0f5c3dd8000849085c81c89bf8a91791e0121c1 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Fri, 15 May 2026 20:38:38 -0400 Subject: [PATCH 19/61] fix: reduce rare admin tool background noise --- .../config/activity/application_catalog.yaml | 3 ++ .../config/activity/system_processes.yaml | 4 -- src/evidenceforge/config/schemas.py | 1 + .../activity/application_catalog.py | 35 ++++++++++++++- src/evidenceforge/generation/world_model.py | 4 ++ tests/unit/test_application_catalog.py | 44 +++++++++++++++++++ tests/unit/test_log_realism_fixes.py | 1 + tests/unit/test_phase5_process_pools.py | 16 +++++++ 8 files changed, 103 insertions(+), 5 deletions(-) diff --git a/src/evidenceforge/config/activity/application_catalog.yaml b/src/evidenceforge/config/activity/application_catalog.yaml index 5dd72844..83e84b0c 100644 --- a/src/evidenceforge/config/activity/application_catalog.yaml +++ b/src/evidenceforge/config/activity/application_catalog.yaml @@ -802,6 +802,7 @@ applications: - id: dsquery display_name: "Directory Service Query" + selection_weight: 2 platforms: windows: image_path: "C:\\Windows\\System32\\dsquery.exe" @@ -820,6 +821,7 @@ applications: - 'dsquery.exe group -samid "*admin*" -limit {ad_limit}' categories: [query] personas: [sysadmin, help_desk] + system_types: [domain_controller] - id: ldapsearch display_name: "LDAP Search" @@ -868,6 +870,7 @@ applications: - id: ntdsutil display_name: "NTDS Utility" + selection_weight: 1 platforms: windows: image_path: 'C:\Windows\System32\ntdsutil.exe' diff --git a/src/evidenceforge/config/activity/system_processes.yaml b/src/evidenceforge/config/activity/system_processes.yaml index ac9d1d35..150c2054 100644 --- a/src/evidenceforge/config/activity/system_processes.yaml +++ b/src/evidenceforge/config/activity/system_processes.yaml @@ -106,10 +106,6 @@ system_services: command_templates: - "dns.exe" parent: services - - image: "C:\\Windows\\System32\\ntdsutil.exe" - command_templates: - - "ntdsutil.exe" - parent: services - image: "C:\\Windows\\System32\\ismserv.exe" command_templates: - "ismserv.exe" diff --git a/src/evidenceforge/config/schemas.py b/src/evidenceforge/config/schemas.py index 66fcfd08..c4b426a5 100644 --- a/src/evidenceforge/config/schemas.py +++ b/src/evidenceforge/config/schemas.py @@ -107,6 +107,7 @@ class ApplicationEntry(BaseModel, extra="forbid"): categories: list[str] personas: list[str] system_types: list[str] | None = None + selection_weight: int = Field(default=10, gt=0) # --- Persona --- diff --git a/src/evidenceforge/generation/activity/application_catalog.py b/src/evidenceforge/generation/activity/application_catalog.py index c996b3ce..d9a690cc 100644 --- a/src/evidenceforge/generation/activity/application_catalog.py +++ b/src/evidenceforge/generation/activity/application_catalog.py @@ -141,6 +141,38 @@ def is_persona_allowed(exe_basename: str, os_category: str, persona: str) -> boo return True # Unknown apps are unrestricted +def is_system_type_allowed( + exe_basename: str, + os_category: str, + system_type: str | None, +) -> bool: + """Check if an app can be selected on the given system type. + + Unknown apps remain unrestricted so explicit scenario commands and raw + process names are not blocked by catalog absence. 
+ """ + if not system_type: + return True + data = load_catalog() + lower = exe_basename.lower() + for app in data["applications"]: + platform = app.get("platforms", {}).get(os_category) + if not platform: + continue + path = platform["image_path"] + if os_category == "windows": + basename = path.rsplit("\\", 1)[-1].lower() + else: + basename = path.rsplit("/", 1)[-1].lower() + if ( + basename == lower + or (lower + ".exe") == basename + or basename.replace(".exe", "") == lower + ): + return system_type in app.get("system_types", _ALL_SYSTEM_TYPES) + return True + + def get_app_categories(exe_basename: str, os_category: str) -> list[str]: """Return the catalog categories for an executable, or [] if not found.""" data = load_catalog() @@ -392,7 +424,8 @@ def pick_app_and_command( if not apps: return None - app = rng.choice(apps) + weights = [int(app.get("selection_weight", 10)) for app in apps] + app = rng.choices(apps, weights=weights, k=1)[0] app = _apply_browser_affinity(rng, apps, app, username) platform = app["platforms"][os_category] diff --git a/src/evidenceforge/generation/world_model.py b/src/evidenceforge/generation/world_model.py index 10e56c52..41ec4552 100644 --- a/src/evidenceforge/generation/world_model.py +++ b/src/evidenceforge/generation/world_model.py @@ -858,6 +858,7 @@ def _destination_score(exe: str) -> int: get_app_categories, has_catalog_entry, is_persona_allowed, + is_system_type_allowed, load_catalog, resolve_image_path, ) @@ -877,6 +878,9 @@ def _destination_score(exe: str) -> int: def _is_allowed(exe: str) -> bool: if not has_catalog_entry(exe, os_cat): return False + system_type = getattr(system, "type", None) + if not is_system_type_allowed(exe, os_cat, system_type): + return False allowed = is_persona_allowed(exe, os_cat, persona) # Server-admin sessions also grant sysadmin-level tool access if not allowed and is_server_admin: diff --git a/tests/unit/test_application_catalog.py b/tests/unit/test_application_catalog.py index 071a5c37..f52ad2cf 100644 --- a/tests/unit/test_application_catalog.py +++ b/tests/unit/test_application_catalog.py @@ -10,6 +10,7 @@ _USER_BROWSER_AFFINITY, get_apps_for_persona, get_pe_metadata, + is_system_type_allowed, load_catalog, pick_app_and_command, ) @@ -138,6 +139,11 @@ def test_default_persona_has_common_apps(self): app_ids = {a["id"] for a in apps} assert "chrome" in app_ids or "firefox" in app_ids + def test_dsquery_system_type_restricted_for_generic_selection(self): + assert not is_system_type_allowed("dsquery.exe", "windows", "workstation") + assert not is_system_type_allowed("dsquery.exe", "windows", "server") + assert is_system_type_allowed("dsquery.exe", "windows", "domain_controller") + class TestPeMetadataLookup: """Tests for PE metadata lookup from catalog.""" @@ -222,6 +228,44 @@ def test_no_apps_returns_none(self): result = pick_app_and_command(rng, "default", "windows", "nonexistent_category") assert result is None + def test_selection_weight_biases_catalog_choice(self, monkeypatch): + """Application entries with lower selection_weight should be rarer.""" + from evidenceforge.generation.activity import application_catalog + + apps = [ + { + "id": "common", + "selection_weight": 100, + "platforms": { + "windows": { + "image_path": r"C:\Tools\common.exe", + "command_templates": ["common.exe"], + } + }, + }, + { + "id": "rare", + "selection_weight": 1, + "platforms": { + "windows": { + "image_path": r"C:\Tools\rare.exe", + "command_templates": ["rare.exe"], + } + }, + }, + ] + monkeypatch.setattr( + 
application_catalog, "get_apps_for_persona", lambda *args, **kwargs: apps + ) + + rng = random.Random(42) + choices = Counter( + pick_app_and_command(rng, "sysadmin", "windows", "query")[0].rsplit("\\", 1)[-1] + for _ in range(400) + ) + + assert choices["common.exe"] > choices["rare.exe"] * 20 + def test_command_templates_are_not_bare_words(self): """P1-3: Command templates should have arguments, not just an exe name.""" rng = random.Random(42) diff --git a/tests/unit/test_log_realism_fixes.py b/tests/unit/test_log_realism_fixes.py index a52c2235..c73ed516 100644 --- a/tests/unit/test_log_realism_fixes.py +++ b/tests/unit/test_log_realism_fixes.py @@ -554,6 +554,7 @@ def test_dc_admin_tools_not_on_workstation(self): apps = get_apps_for_persona("sysadmin", "windows", "query", system_type="workstation") app_ids = {a["id"] for a in apps} + assert "dsquery" not in app_ids, "dsquery should not be generic workstation texture" assert "dcdiag" not in app_ids, "dcdiag should not be on workstations" assert "repadmin" not in app_ids, "repadmin should not be on workstations" diff --git a/tests/unit/test_phase5_process_pools.py b/tests/unit/test_phase5_process_pools.py index b89a3f47..d9f17c7d 100644 --- a/tests/unit/test_phase5_process_pools.py +++ b/tests/unit/test_phase5_process_pools.py @@ -22,6 +22,7 @@ """Unit tests for Phase 5.1.4: Expanded process template pools.""" +import random from datetime import UTC, datetime from types import SimpleNamespace from unittest.mock import Mock @@ -36,6 +37,7 @@ from evidenceforge.generation.activity.system_processes import ( _resolve_host_placeholders, load_system_processes, + pick_system_service_process, ) from evidenceforge.generation.state_manager import StateManager from evidenceforge.models import System, User @@ -131,6 +133,20 @@ def test_tiworker_servicing_stack_placeholder_resolves_by_host_build(self): assert "10.0.19041.3636" in _resolve_host_placeholders(template, workstation) assert "10.0.20348.2322" in _resolve_host_placeholders(template, server) + def test_ntdsutil_not_generic_domain_controller_service_texture(self): + """NTDS utility should appear via explicit admin context, not service noise.""" + data = load_system_processes() + dc_services = data["system_services"]["domain_controller"] + + assert all("ntdsutil.exe" not in entry["image"].lower() for entry in dc_services) + + host = SimpleNamespace(os="Windows Server 2022", type="domain_controller") + picks = [ + pick_system_service_process(random.Random(seed), "domain_controller", host)[0].lower() + for seed in range(100) + ] + assert all("ntdsutil.exe" not in image for image in picks) + class TestBaselinePatterns: """Verify baseline patterns include new activity types.""" From 34731ff8b0a7425c727b1e044db4a08af67ae76f Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 20:48:38 -0400 Subject: [PATCH 20/61] docs: record loop 10 assessment results --- TODO.md | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/TODO.md b/TODO.md index 7c1844a6..396cec30 100644 --- a/TODO.md +++ b/TODO.md @@ -250,7 +250,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Observation-aware automated eval and manifest — generation now writes `OBSERVATION_MANIFEST.json` beside ground truth, `eforge eval` loads it when present, coverage-style causality metrics report raw and observation-adjusted scores for expected non-visible evidence, and correctness/contradiction checks remain strict. 
Verification passed with config validation, Ruff checks/format checks, focused eval/manifest tests, and full normal `uv run pytest -v` (`3047 passed, 15 skipped`). - [x] Post-host-activity score check — synced `dev`, cleaned up stale TODOs, regenerated/evaluated `scenarios/iteration-test` from the current iteration-test prompt with `enterprise_standard` observation, and ran one blind expert-panel review without entering another fix loop. Automated eval passed at `92.39` over `108,858` records; blind synthetic-confidence averaged `82.75`. Highest-leverage follow-ups are Linux SSH/syslog lifecycle ordering, Zeek observation-tree consistency, X.509 metadata coherence, Windows OS-build/local-SID identity, and static web asset manifests. - [x] Current-dev calibration pass — regenerated and evaluated `scenarios/iteration-test` from current `dev`, fixed actionable cleanliness issues in OCSP optional-field rendering, observation-manifest accounting for sensor-filtered network evidence, Kerberos/domain-logon causal ordering, storyline event timing, storyline trace matching, temporal trace comparison, and visible Windows logon-before-process ordering. Verification passed with `uv run eforge validate-config`, scenario validation with only expected sensor/observation/pivot-linkability warnings, quantitative eval at `94.64` with all hard gates passing, Ruff checks, focused regressions (`164 passed`), and full normal `uv run pytest -v` (`3075 passed, 15 skipped`). -- [ ] **IN PROGRESS** Up-to-10 current-dev assessment continuation — run iterative EvidenceForge realism loops from the latest calibrated iteration-test state, fix the highest-leverage verified findings, commit each completed fix pass, regenerate/evaluate, and preserve loop artifacts. +- [x] Up-to-10 current-dev assessment continuation — completed 10 iterative EvidenceForge realism loops from the latest calibrated iteration-test state, fixed the highest-leverage verified findings, committed each completed fix pass, regenerated/evaluated, and preserved loop artifacts. - Loop 1 baseline eval completed at `93.89` across `107,377` records; blind synthetic-confidence scores were Threat Hunter `76`, Detection `82`, Network `82`, Host/EDR `86`. - Loop 1 fix pass completed and verified: fixed external CIDR-only segment scan target resolution, coherent SSH syslog and Zeek UID observation decisions, OS-aware TLS destination filtering for Windows update/trust-list domains, and Let's Encrypt RSA/ECDSA chain templates. Verification passed with `uv run eforge validate-config`, focused regressions (`11 passed` plus the adjusted certificate regression), `uv run ruff check .`, `uv run ruff format --check .`, and full normal `uv run pytest --no-cov -v` (`3057 passed, 37 skipped`). - Loop 2 regeneration/eval completed at `93.80` JSON overall (`94/100` human-readable) across `116,087` records. Hard probes found zero SSH ordering violations, zero Zeek UID gaps, zero Let's Encrypt R3/X2 mismatches, and zero non-Windows `windowsupdate.com` proxy rows. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `78`, Network `68`, Host/EDR `76` (average `74.5`). @@ -359,10 +359,24 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r `ntdsutil.exe` launches on DC-01 under `services.exe` and 65 `dsquery.exe` process creates across Windows hosts; Linux bash/syslog command-pool repetition remains the next broad target. 
- - [ ] **IN PROGRESS** Loop 10 fix pass: reduce rare Windows admin-tool - overuse at the owning config/generation layer while preserving explicit - storyline/admin context, then regenerate/evaluate and run the final blind - review for the requested up-to-10 batch. + - [x] Loop 10 fix pass completed and verified: removed `ntdsutil.exe` from + generic domain-controller service-process texture, added data-driven + application selection weights, restricted `dsquery.exe` to DC generic + selection, and enforced catalog `system_types` in the WorldPlanner + process-owner path so workstation/file-server LDAP-like traffic cannot + spawn forbidden admin tools. Verification passed with config validation, + focused app/catalog/world-model regressions, Ruff, format check, and full + normal `uv run pytest --no-cov -q` (`3100 passed, 37 skipped`). + Regeneration and final assessment from commit `f0f5c3d` passed automated + eval at `96/100` across `80,616` records; hard probes found zero eCAR + process-create rows for `ntdsutil.exe`, `dsquery.exe`, `repadmin.exe`, or + `nltest.exe`, preserved proxy status texture (`200` ratio `80.6%`) and + Zeek HTTP status texture (`200` ratio `91.6%`), and blind + synthetic-confidence scores were Threat Hunter `68`, Detection `62`, + Network `66`, Host/EDR `74` (average `67.5`). Best next target for a + future batch is Linux bash/syslog command-pool repetition, followed by + Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache + behavior. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From e9ff69ccd6f68aa10b40caaed6dd5a2d92ee31a1 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 21:09:16 -0400 Subject: [PATCH 21/61] fix: diversify linux command texture --- TODO.md | 1 + .../config/activity/bash_commands.yaml | 47 ++++++++++ .../activity/extra_syslog_messages.yaml | 75 ++++++++++++++-- src/evidenceforge/config/schemas.py | 1 + .../generation/activity/bash_commands.py | 88 +++++++++++++++++-- .../generation/activity/extra_syslog.py | 54 +++++++++++- .../generation/engine/baseline.py | 37 ++++++-- src/evidenceforge/generation/engine/core.py | 4 + tests/unit/test_bash_history_noise.py | 25 ++++++ tests/unit/test_validate_config.py | 30 +++++++ 10 files changed, 335 insertions(+), 27 deletions(-) diff --git a/TODO.md b/TODO.md index 396cec30..9ff57c53 100644 --- a/TODO.md +++ b/TODO.md @@ -377,6 +377,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r future batch is Linux bash/syslog command-pool repetition, followed by Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior. 
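The selection-weight fix recorded above corresponds to the `rng.choices(apps, weights=weights, k=1)` call Patch 19 adds to `pick_app_and_command`. A standalone sketch of that weighting, with hypothetical catalog entries standing in for `application_catalog.yaml` content:

```python
# Standalone sketch of selection_weight-biased app picking, mirroring the
# rng.choices(..., weights=...) call Patch 19 adds. The two catalog entries
# below are hypothetical stand-ins, not shipped application_catalog.yaml data.
import random
from collections import Counter

apps = [
    {"id": "common_tool", "selection_weight": 100},
    {"id": "ntdsutil", "selection_weight": 1},  # rare admin tool stays rare
]


def pick_app(rng: random.Random) -> str:
    weights = [int(app.get("selection_weight", 10)) for app in apps]
    return rng.choices(apps, weights=weights, k=1)[0]["id"]


counts = Counter(pick_app(random.Random(seed)) for seed in range(1_000))
assert counts["common_tool"] > counts["ntdsutil"] * 20
```

With a 100:1 weight ratio the rare entry still appears occasionally, which keeps admin-tool texture possible without the bare `ntdsutil.exe` repetition the loop 9 probe flagged.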
+- [ ] **IN PROGRESS** Current-dev assessment continuation loops 11-20 — continue the iterative EvidenceForge realism loop from Loop 10, starting with Linux bash/syslog command-pool repetition, then Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior unless fresh blind findings reprioritize the work. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/config/activity/bash_commands.yaml b/src/evidenceforge/config/activity/bash_commands.yaml index a126f785..d590d6fa 100644 --- a/src/evidenceforge/config/activity/bash_commands.yaml +++ b/src/evidenceforge/config/activity/bash_commands.yaml @@ -27,29 +27,76 @@ common: - "w" - "whoami" - "uname -a" + - "uname -sr" + - "uname -mrs" + - "cat /proc/version | cut -d' ' -f1-3" - "uptime" - "df -h" + - "df -h /" + - "df -h /var" + - "df -h /tmp" - "free -m" + - "free -h" - "ps aux" + - "ps -ef" + - "ps -ef | head" + - "ps aux --sort=-%mem | head" - "ps aux | grep {service}" - "clear" - "hostname -f" - "hostname" - "date" + - "date -u" - "history" - "cat /etc/hostname" - "cat /etc/os-release" + - "cat /etc/issue" + - "cat /etc/passwd | head" - "ll" - "exit" - "cat /etc/resolv.conf" - "cat /proc/cpuinfo | grep 'model name' | head -1" + - "grep -m1 'model name' /proc/cpuinfo" - "echo $SHELL" + - "locale" + - "umask" + - "ulimit -n" - "env | head -20" + - "env | sort | head" - "which python3" + - "command -v python3" - "file /usr/bin/ls" + - "stat /etc/passwd" + - "getent hosts localhost" + - "getent passwd $(whoami)" + - "groups" + - "users" + - "last -5" + - "who -a" + - "ip -br addr" + - "ip route" + - "ip route get 8.8.8.8" + - "ss -tan | head" + - "ss -s" - "ls -ltr /var/log/ | tail -10" + - "ls -lt /var/log | head" + - "ls -ltr /var/log | tail" + - "ls -ld /var/log" + - "ls -lah /tmp | head" + - "find /tmp -maxdepth 1 -type f | head" + - "du -sh /var/log" + - "du -sh /home/* 2>/dev/null | head" - "journalctl --no-pager -n 5" - "journalctl -p err --no-pager -n 10" + - "journalctl --since '10 min ago' --no-pager -n 20" + - "journalctl -xe --no-pager | tail -20" + - "systemctl --failed --no-pager" + - "tail -20 /var/log/syslog" + - "tail -50 /var/log/auth.log" + - "grep -i error /var/log/syslog | tail" + - "grep -i failed /var/log/auth.log | tail" + - "lsmod | head" + - "dmesg --ctime | tail -20" sysadmin: - "systemctl status sshd" diff --git a/src/evidenceforge/config/activity/extra_syslog_messages.yaml b/src/evidenceforge/config/activity/extra_syslog_messages.yaml index 41d4d886..e6f7c922 100644 --- a/src/evidenceforge/config/activity/extra_syslog_messages.yaml +++ b/src/evidenceforge/config/activity/extra_syslog_messages.yaml @@ -37,22 +37,79 @@ programs: - app: sudo transient: 
true weight: 2 + params: + sudo_user: + - admin + - deploy + - ops + - ubuntu + - svc_app + - backup + tty: + - pts/0 + - pts/1 + - pts/2 + - pts/3 + - pts/5 + cwd: + - /home/admin + - /home/deploy + - /home/ops + - /srv/app + - /etc + - /var/log + - /tmp + service: + - ssh + - cron + - app-agent + - nginx + - apache2 + - mysql + - postgresql + sudo_command: + - /bin/systemctl status {service} + - /bin/systemctl restart {service} + - /usr/bin/journalctl -u {service} -n 50 + - /usr/bin/journalctl -u {service} --since "30 min ago" + - /usr/bin/apt list --upgradable + - /usr/bin/apt-cache policy openssl + - /usr/bin/tail -n 100 /var/log/auth.log + - /usr/bin/tail -n 80 /var/log/syslog + - /usr/bin/grep -i error /var/log/syslog + - /usr/bin/ss -ltnp + - /usr/bin/lsblk + - /usr/bin/df -h /var + - /usr/bin/du -sh /var/log + - /usr/bin/find /var/log -type f -mtime -1 + - /usr/sbin/service {service} status messages: - - "admin : TTY=pts/0 ; PWD=/home/admin ; USER=root ; COMMAND=/bin/systemctl status ssh" - - "deploy : TTY=pts/1 ; PWD=/srv/app ; USER=root ; COMMAND=/usr/bin/systemctl status app-agent" - - "ops : TTY=pts/2 ; PWD=/home/ops ; USER=root ; COMMAND=/usr/bin/journalctl -u ssh -n 50" - - "ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/apt list --upgradable" + - "{sudo_user} : TTY={tty} ; PWD={cwd} ; USER=root ; COMMAND={sudo_command}" - app: sudo transient: true roles: [web_server, forward_proxy] weight: 1 + params: + web_user: + - www-data + - nginx + - apache + denied_command: + - /bin/cat /etc/shadow + - /usr/bin/head /etc/shadow + - /usr/bin/tail /var/log/auth.log + - /usr/bin/find /root -maxdepth 1 + - /usr/bin/id + - /usr/bin/curl -s http://169.254.169.254/latest/meta-data/ + - /bin/ls /root + - /usr/bin/grep -R password /etc + denied_tty: + - unknown + - pts/0 + - pts/2 messages: - - "www-data : command not allowed ; TTY=unknown ; USER=root ; COMMAND=/bin/cat /etc/shadow" - - "www-data : command not allowed ; TTY=unknown ; USER=root ; COMMAND=/usr/bin/head /etc/shadow" - - "www-data : command not allowed ; TTY=unknown ; USER=root ; COMMAND=/usr/bin/tail /var/log/auth.log" - - "nginx : command not allowed ; TTY=unknown ; USER=root ; COMMAND=/usr/bin/find /root -maxdepth 1" - - "apache : command not allowed ; TTY=unknown ; USER=root ; COMMAND=/usr/bin/id" + - "{web_user} : command not allowed ; TTY={denied_tty} ; USER=root ; COMMAND={denied_command}" - app: dhclient messages: diff --git a/src/evidenceforge/config/schemas.py b/src/evidenceforge/config/schemas.py index c4b426a5..dc090a54 100644 --- a/src/evidenceforge/config/schemas.py +++ b/src/evidenceforge/config/schemas.py @@ -160,6 +160,7 @@ class SyslogProgramEntry(BaseModel, extra="forbid"): app: str messages: list[str] + params: dict[str, list[str]] | None = None distro: str | None = None roles: list[str] | None = None transient: bool | None = None diff --git a/src/evidenceforge/generation/activity/bash_commands.py b/src/evidenceforge/generation/activity/bash_commands.py index c9d5dd97..8e9b5d8a 100644 --- a/src/evidenceforge/generation/activity/bash_commands.py +++ b/src/evidenceforge/generation/activity/bash_commands.py @@ -10,6 +10,7 @@ """ import random +from collections import Counter, deque from typing import Any from evidenceforge.config import get_activity_directory @@ -240,6 +241,16 @@ def _typo_allowed( _USER_TOOL_AFFINITY: dict[tuple[str, tuple[str, ...]], list[str]] = {} +_COMMAND_RECENCY_LIMIT = 14 +_COMMAND_CANDIDATE_ATTEMPTS = 16 +_COMMAND_RECENCY: dict[tuple[str, str], deque[str]] = {} 
+_COMMAND_GLOBAL_COUNTS: Counter[str] = Counter() + + +def reset_bash_command_memory() -> None: + """Clear per-generation bash command memory.""" + _COMMAND_RECENCY.clear() + _COMMAND_GLOBAL_COUNTS.clear() def _get_user_pool(username: str, full_pool: list[str]) -> list[str]: @@ -286,6 +297,49 @@ def _get_user_pool(username: str, full_pool: list[str]) -> list[str]: return primary_pool +def _remember_command(system_hostname: str, username: str, command: str) -> None: + """Record command selection so later picks avoid exact repeated strings.""" + key = (system_hostname.lower(), username.lower()) + recent = _COMMAND_RECENCY.setdefault(key, deque(maxlen=_COMMAND_RECENCY_LIMIT)) + recent.append(command) + _COMMAND_GLOBAL_COUNTS[command] += 1 + + +def _choose_template_with_memory( + rng: random.Random, + pool: list[str], + params: dict[str, list[str]], + system_services: list[str] | None, + system_hostname: str, + username: str, +) -> str: + """Pick a command while suppressing recent and globally overused exact repeats.""" + if not pool: + return "ls" + + key = (system_hostname.lower(), username.lower()) + recent = set(_COMMAND_RECENCY.get(key, ())) + soft_cap = max(4, min(8, max(1, len(pool) // 4))) + attempts = _COMMAND_CANDIDATE_ATTEMPTS + candidates: list[str] = [] + for _ in range(attempts): + template = rng.choice(pool) + command = _resolve_template(template, rng, params, system_services) + candidates.append(command) + if command not in recent and _COMMAND_GLOBAL_COUNTS[command] < soft_cap: + _remember_command(system_hostname, username, command) + return command + + for command in candidates: + if command not in recent: + _remember_command(system_hostname, username, command) + return command + + command = min(candidates, key=lambda candidate: _COMMAND_GLOBAL_COUNTS[candidate]) + _remember_command(system_hostname, username, command) + return command + + def pick_bash_command( rng: random.Random, persona: str, @@ -297,7 +351,7 @@ def pick_bash_command( ) -> str: """Pick a bash command appropriate for the user's role on this server. - Distribution: 60% common, 35% role-specific, 5% typo. + Distribution: roughly 45% common, 50% role-specific, up to 5% typo. Role-specific commands use per-user tool affinity (80% primary tools, 20% full pool) for consistent user behavior. 
""" @@ -335,20 +389,40 @@ def pick_bash_command_entry( session_command_count=session_command_count, prior_typo_count=prior_typo_count, ): - return _generate_typo(rng, username, commands), True + command = _generate_typo(rng, username, commands) + _remember_command(system_hostname, username, command) + return command, True # Scale remaining thresholds into the non-typo portion _remaining = 1.0 - _user_typo_rate - if roll < _user_typo_rate + _remaining * 0.37: + if roll < _user_typo_rate + _remaining * 0.52: # Role-specific command with per-user tool affinity pool_key = _get_role_pool(persona, server_role) pool = commands.get(pool_key, commands.get("common", ["ls"])) if username and rng.random() < 0.80: pool = _get_user_pool(username, pool) - template = rng.choice(pool) - return _resolve_template(template, rng, params, system_services), False + return ( + _choose_template_with_memory( + rng, + pool, + params, + system_services, + system_hostname, + username, + ), + False, + ) # Common command (60%) common = commands.get("common", ["ls"]) - template = rng.choice(common) - return _resolve_template(template, rng, params, system_services), False + return ( + _choose_template_with_memory( + rng, + common, + params, + system_services, + system_hostname, + username, + ), + False, + ) diff --git a/src/evidenceforge/generation/activity/extra_syslog.py b/src/evidenceforge/generation/activity/extra_syslog.py index e075e0c7..1d181d91 100644 --- a/src/evidenceforge/generation/activity/extra_syslog.py +++ b/src/evidenceforge/generation/activity/extra_syslog.py @@ -55,7 +55,19 @@ def filter_syslog_messages( Returns: List of (app_name, messages, weight) tuples matching the host context. """ - result = [] + return [ + (entry["app"], entry["messages"], int(entry.get("weight", 10))) + for entry in filter_syslog_message_entries(programs, is_rhel_like, host_roles) + ] + + +def filter_syslog_message_entries( + programs: list[dict[str, Any]], + is_rhel_like: bool, + host_roles: list[str] | None, +) -> list[dict[str, Any]]: + """Filter syslog programs by distro and host roles, preserving entry metadata.""" + result: list[dict[str, Any]] = [] for entry in programs: # Distro filter distro = entry.get("distro") @@ -68,5 +80,43 @@ def filter_syslog_messages( if not host_roles or not any(r in host_roles for r in required_roles): continue - result.append((entry["app"], entry["messages"], int(entry.get("weight", 10)))) + result.append(entry) return result + + +def _service_template_values(system_services: list[str] | None, fallback: list[str]) -> list[str]: + """Return service placeholder values that fit the current host when possible.""" + contextual: list[str] = [] + for service in system_services or []: + normalized = service.strip().lower() + if not normalized or normalized in {"dns-client", "systemd"}: + continue + if normalized == "ssh": + normalized = "sshd" + contextual.append(normalized) + return contextual or fallback + + +def render_extra_syslog_message( + entry: dict[str, Any], + rng: Any, + *, + positional_value: Any, + system_services: list[str] | None = None, + values: dict[str, Any] | None = None, +) -> str: + """Render a syslog message template with data-driven placeholder pools.""" + template = rng.choice(entry.get("messages", [""])) + render_values: dict[str, Any] = dict(values or {}) + for key, candidates in (entry.get("params") or {}).items(): + pool = ( + _service_template_values(system_services, candidates) + if key == "service" + else candidates + ) + if pool: + render_values[key] = 
rng.choice(pool) + for key, value in list(render_values.items()): + if isinstance(value, str) and "{" in value: + render_values[key] = value.format(positional_value, **render_values) + return template.format(positional_value, **render_values) diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index 94050717..02f65f8e 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -5566,19 +5566,25 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 # Additional diverse syslog programs — loaded from YAML with # role/distro tags for data-driven filtering. from evidenceforge.generation.activity.extra_syslog import ( - filter_syslog_messages, + filter_syslog_message_entries, load_extra_syslog_messages, + render_extra_syslog_message, ) _all_programs = load_extra_syslog_messages() - filtered = filter_syslog_messages(_all_programs, is_rhel_like, system.roles) + filtered = filter_syslog_message_entries( + _all_programs, + is_rhel_like, + system.roles, + ) if not filtered: continue - app, msgs, _entry_weight = rng.choices( + entry = rng.choices( filtered, - weights=[weight for _app, _messages, weight in filtered], + weights=[int(candidate.get("weight", 10)) for candidate in filtered], k=1, )[0] + app = entry["app"] # Format placeholders vary by daemon if app == "dhclient": # DHCP syslog must be tied to the canonical lease @@ -5586,15 +5592,28 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 continue elif app == "NetworkManager": # NM uses monotonic kernel uptime seconds in [brackets] - msg = rng.choice(msgs).format(uptime) + msg = render_extra_syslog_message( + entry, + rng, + positional_value=uptime, + system_services=system.services, + ) elif app == "systemd-resolved": dns_server = rng.choice(dns_ips) if dns_ips else "10.0.0.1" - msg = rng.choice(msgs).format( - rng.randint(100000, 999999), - dns_server=dns_server, + msg = render_extra_syslog_message( + entry, + rng, + positional_value=rng.randint(100000, 999999), + system_services=system.services, + values={"dns_server": dns_server}, ) else: - msg = rng.choice(msgs).format(rng.randint(100000, 999999)) + msg = render_extra_syslog_message( + entry, + rng, + positional_value=rng.randint(100000, 999999), + system_services=system.services, + ) # Map syslog app names to sys_pids keys for persistent daemons. # Only map to sys_pids entries that are the SAME daemon. _APP_TO_PID_KEY = { diff --git a/src/evidenceforge/generation/engine/core.py b/src/evidenceforge/generation/engine/core.py index c6ebab38..6acb2cbf 100644 --- a/src/evidenceforge/generation/engine/core.py +++ b/src/evidenceforge/generation/engine/core.py @@ -101,6 +101,10 @@ def __init__( # Hawkes process state per user for cross-hour continuity self._hawkes_states: dict = {} + from evidenceforge.generation.activity.bash_commands import reset_bash_command_memory + + reset_bash_command_memory() + def _report_progress(self, event_type: str, data: dict) -> None: """Report progress to callback if registered. diff --git a/tests/unit/test_bash_history_noise.py b/tests/unit/test_bash_history_noise.py index 42b231f3..1855a68c 100644 --- a/tests/unit/test_bash_history_noise.py +++ b/tests/unit/test_bash_history_noise.py @@ -8,6 +8,8 @@ not just the attack user. 
""" +import random +from collections import Counter from datetime import UTC, datetime, timedelta from unittest.mock import Mock @@ -236,6 +238,29 @@ def randint(self, lower, _upper): assert is_typo is False assert command + def test_bash_picker_suppresses_repeated_exact_commands(self, monkeypatch): + """Generated bash histories should not overuse one exact command string.""" + from evidenceforge.generation.activity import bash_commands + + bash_commands.reset_bash_command_memory() + monkeypatch.setattr(bash_commands, "_typo_rate", lambda _username, _commands: 0.0) + + rng = random.Random(7) + picked = [ + bash_commands.pick_bash_command_entry( + rng, + "sysadmin", + "WEB-01", + ["nginx", "ssh"], + username="deploy", + session_command_count=80, + )[0] + for _ in range(80) + ] + + counts = Counter(picked) + assert max(counts.values()) <= 8 + class TestBashHistoryChronological: """Bash history entries should be chronologically sorted.""" diff --git a/tests/unit/test_validate_config.py b/tests/unit/test_validate_config.py index fe7b0794..51b92f91 100644 --- a/tests/unit/test_validate_config.py +++ b/tests/unit/test_validate_config.py @@ -3,6 +3,8 @@ """Regression test: eforge validate-config must ship 100% clean.""" +import random + from evidenceforge.cli.validate_config import validate_config @@ -1014,6 +1016,34 @@ def load_invalid_extra_syslog_messages(): for issue in result.issues ) + def test_extra_syslog_sudo_templates_render_contextual_services(self): + from evidenceforge.generation.activity.extra_syslog import render_extra_syslog_message + + entry = { + "app": "sudo", + "messages": [ + "{sudo_user} : TTY={tty} ; PWD={cwd} ; USER=root ; COMMAND={sudo_command}" + ], + "params": { + "sudo_user": ["deploy"], + "tty": ["pts/1"], + "cwd": ["/srv/app"], + "service": ["ssh"], + "sudo_command": ["/bin/systemctl status {service}"], + }, + } + + message = render_extra_syslog_message( + entry, + random.Random(5), + positional_value=123456, + system_services=["nginx"], + ) + + assert message == ( + "deploy : TTY=pts/1 ; PWD=/srv/app ; USER=root ; COMMAND=/bin/systemctl status nginx" + ) + def test_validate_config_rejects_invalid_4672_emission_probability(self, monkeypatch): from evidenceforge.generation.activity import windows_auth_realism From 499382974cf893ad53367badd9f203b0b5018e8d Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 21:28:35 -0400 Subject: [PATCH 22/61] docs: record loop 11 assessment results --- TODO.md | 1 + 1 file changed, 1 insertion(+) diff --git a/TODO.md b/TODO.md index 9ff57c53..7d2c328a 100644 --- a/TODO.md +++ b/TODO.md @@ -378,6 +378,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior. - [ ] **IN PROGRESS** Current-dev assessment continuation loops 11-20 — continue the iterative EvidenceForge realism loop from Loop 10, starting with Linux bash/syslog command-pool repetition, then Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior unless fresh blind findings reprioritize the work. + - [x] Loop 11 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `e9ff69c`: diversified Linux bash/syslog command texture with per-generation bash command memory, expanded common command tails, and data-driven sudo syslog placeholder pools. 
Verification passed with focused regressions, `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3102 passed, 37 skipped`). Regenerated eval passed at `95.83/100` across `78,559` records; hard probes showed max bash exact repeat dropped from `21` to `8` and max sudo exact repeat from `28` to `2`. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `84`, Network `74`, Host/EDR `88` (average `82.0`), indicating deeper concrete defects surfaced after the command-pool tell was reduced. Top Loop 12 targets are eCAR read-only command file-create artifacts, DHCP syslog ordering, Linux systemd parent PID ownership, Windows `root` identity bleed/4624 caller-process semantics, web static response byte/MIME state, and ICMP echo byte symmetry. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From b7c8a70e40a13cf51756c0be50ec771810cb53d5 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 21:41:54 -0400 Subject: [PATCH 23/61] fix: repair source-native host contradictions --- TODO.md | 1 + .../generation/activity/generator.py | 56 ++++++++++++++-- .../generation/emitters/windows.py | 28 +++++++- .../generation/engine/emitter_setup.py | 22 +++++-- .../generation/engine/storyline.py | 21 +++++- tests/unit/test_activity.py | 37 +++++++++++ tests/unit/test_baseline_canonical.py | 10 ++- tests/unit/test_emitters.py | 65 +++++++++++++++++++ tests/unit/test_storyline_command_networks.py | 16 +++++ tests/unit/test_system_process_stability.py | 14 ++++ 10 files changed, 255 insertions(+), 15 deletions(-) diff --git a/TODO.md b/TODO.md index 7d2c328a..2fa3aed6 100644 --- a/TODO.md +++ b/TODO.md @@ -379,6 +379,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r behavior. - [ ] **IN PROGRESS** Current-dev assessment continuation loops 11-20 — continue the iterative EvidenceForge realism loop from Loop 10, starting with Linux bash/syslog command-pool repetition, then Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior unless fresh blind findings reprioritize the work. - [x] Loop 11 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `e9ff69c`: diversified Linux bash/syslog command texture with per-generation bash command memory, expanded common command tails, and data-driven sudo syslog placeholder pools. Verification passed with focused regressions, `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3102 passed, 37 skipped`). 
Regenerated eval passed at `95.83/100` across `78,559` records; hard probes showed max bash exact repeat dropped from `21` to `8` and max sudo exact repeat from `28` to `2`. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `84`, Network `74`, Host/EDR `88` (average `82.0`), indicating deeper concrete defects surfaced after the command-pool tell was reduced. Top Loop 12 targets are eCAR read-only command file-create artifacts, DHCP syslog ordering, Linux systemd parent PID ownership, Windows `root` identity bleed/4624 caller-process semantics, web static response byte/MIME state, and ICMP echo byte symmetry. + - [ ] **IN PROGRESS** Loop 12 fix pass — repair source-native host contradictions from the loop-11 panel: read-only command output parsing, Linux PID 1 systemd ownership, DHCP client lifecycle ordering, and Windows 4624/4648 caller identity semantics. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 4f050aba..d2c365f1 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -9214,9 +9214,18 @@ def generate_explicit_credentials( and target_username.split("\\")[-1].split("@", 1)[0].lower() in _LINUX_LOCAL_ACCOUNTS ): return + subject_user = self._coerce_windows_explicit_credentials_subject( + user, + system, + target_username, + ) reporting_pid = self._get_system_pid(system.hostname, "lsass", 0x2E0) - subject_logon_id = self._ensure_explicit_credentials_subject_logon(user, system, time) - subject = self._account_subject_fields(user.username, system, subject_logon_id) + subject_logon_id = self._ensure_explicit_credentials_subject_logon( + subject_user, + system, + time, + ) + subject = self._account_subject_fields(subject_user.username, system, subject_logon_id) process_pid = process_pid or 0 if process_pid > 0 and process_name: running_process = self.state_manager.get_process(system.hostname, process_pid) @@ -9232,7 +9241,7 @@ def generate_explicit_credentials( if scenario_start is not None and ensure_utc(process_time) < ensure_utc(scenario_start): process_time = time - timedelta(milliseconds=500) process_pid = self.generate_process( - user, + subject_user, system, process_time, subject_logon_id, @@ -9268,6 +9277,45 @@ def generate_explicit_credentials( ) self.dispatcher.dispatch(event) + def _coerce_windows_explicit_credentials_subject( + self, + user: User, + system: System, + target_username: str, + ) -> User: + """Return a Windows-native subject for 4648 when the narrative actor is Unix-local.""" + if _get_os_category(system.os) != "windows": + return user + if 
user.username.lower() not in _LINUX_LOCAL_ACCOUNTS: + return user + + candidate = target_username.split("\\")[-1].split("@", 1)[0] + known_users = getattr(self, "_users_by_username", {}) + if candidate and candidate.lower() not in _LINUX_LOCAL_ACCOUNTS: + if candidate in known_users: + return known_users[candidate] + return User( + username=candidate, + full_name=candidate, + email=f"{candidate}@{self._valid_fallback_email_domain()}", + ) + + assigned_user = getattr(system, "assigned_user", "") + if assigned_user: + assigned = known_users.get(assigned_user) + if assigned is not None: + return assigned + return User( + username=assigned_user, + full_name=assigned_user, + email=f"{assigned_user}@{self._valid_fallback_email_domain()}", + ) + return User( + username="Administrator", + full_name="Administrator", + email=f"administrator@{self._valid_fallback_email_domain()}", + ) + def _explicit_credentials_source_ip(self, system: System, target_server: str) -> str: """Return source network metadata for remote explicit-credential use.""" target = target_server.strip().lower() @@ -10899,7 +10947,7 @@ def generate_dhcp_lease( for idx, message in enumerate(messages): self.generate_syslog_event( system=system, - time=time + timedelta(milliseconds=idx * 120), + time=time + timedelta(milliseconds=idx * 1500), app_name="dhclient", message=message, pid=dhclient_pid, diff --git a/src/evidenceforge/generation/emitters/windows.py b/src/evidenceforge/generation/emitters/windows.py index fa8c4496..b4bb0ab7 100644 --- a/src/evidenceforge/generation/emitters/windows.py +++ b/src/evidenceforge/generation/emitters/windows.py @@ -40,7 +40,7 @@ from typing import Any from evidenceforge.events.base import SecurityEvent -from evidenceforge.events.contexts import HostContext +from evidenceforge.events.contexts import AuthContext, HostContext from evidenceforge.formats.format_def import FormatDefinition from evidenceforge.generation.activity.timing_profiles import ( sample_timing_delta, @@ -423,6 +423,7 @@ def _render_logon(self, event: SecurityEvent) -> None: if auth.logon_type in (3, 10) and event.src_host is not None else host.hostname ) + process_pid, process_name = self._logon_caller_process_identity(host, auth) event_data = { "EventID": 4624, @@ -442,8 +443,8 @@ def _render_logon(self, event: SecurityEvent) -> None: "TargetLogonId": auth.logon_id, "LogonType": auth.logon_type, "WorkstationName": workstation_name, - "ProcessId": f"0x{auth.reporting_pid:x}" if auth.reporting_pid else "0x2e0", - "ProcessName": r"C:\Windows\System32\lsass.exe", + "ProcessId": f"0x{process_pid:x}" if process_pid else "0x0", + "ProcessName": process_name, "IpAddress": self._ipv6_mapped(auth.source_ip), "IpPort": auth.source_port if auth.logon_type in (3, 10) else 0, "LogonProcessName": auth.logon_process, @@ -475,6 +476,27 @@ def _render_logon(self, event: SecurityEvent) -> None: } self.emit_event(priv_data) + def _logon_caller_process_identity( + self, + host: HostContext, + auth: AuthContext, + ) -> tuple[int, str]: + """Return EventData ProcessId/ProcessName for source-native 4624 semantics.""" + caller_by_type = { + 2: ("winlogon", 0x280, r"C:\Windows\System32\winlogon.exe"), + 4: ("services", 0x2BC, r"C:\Windows\System32\services.exe"), + 5: ("services", 0x2BC, r"C:\Windows\System32\services.exe"), + 7: ("winlogon", 0x280, r"C:\Windows\System32\winlogon.exe"), + 10: ("winlogon", 0x280, r"C:\Windows\System32\winlogon.exe"), + 11: ("winlogon", 0x280, r"C:\Windows\System32\winlogon.exe"), + } + role, default_pid, process_name = 
caller_by_type.get( + auth.logon_type, + ("lsass", auth.reporting_pid or 0x2E0, r"C:\Windows\System32\lsass.exe"), + ) + sys_pids = getattr(self, "_system_pids", {}).get(host.hostname, {}) + return int(sys_pids.get(role, default_pid)), process_name + def _render_special_privileges(self, event: SecurityEvent) -> None: """Render standalone Windows 4672 (Special Privileges Assigned). diff --git a/src/evidenceforge/generation/engine/emitter_setup.py b/src/evidenceforge/generation/engine/emitter_setup.py index bb675183..6e088e6d 100644 --- a/src/evidenceforge/generation/engine/emitter_setup.py +++ b/src/evidenceforge/generation/engine/emitter_setup.py @@ -764,12 +764,24 @@ def _c(parent, image, cmd, user): _advance_boot_clock() return sm.create_process(hn, parent, image, cmd, user, "System") - pids["systemd"] = _c( - 0, - "/usr/lib/systemd/systemd", - "/usr/lib/systemd/systemd --system --deserialize 26", - "root", + import uuid + + from evidenceforge.models.state import RunningProcess + + systemd_object_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, f"linux-systemd:{hn}")) + sm.state.running_processes[(hn, 1)] = RunningProcess( + pid=1, + parent_pid=0, + image="/usr/lib/systemd/systemd", + command_line="/usr/lib/systemd/systemd --system --deserialize 26", + username="root", + system=hn, + start_time=sm.state.current_time, + integrity_level="System", + ecar_object_id=systemd_object_id, ) + sm._process_object_ids[(hn, 1)] = systemd_object_id + pids["systemd"] = 1 journal_path = "/usr/lib/systemd/systemd-journald" pids["journald"] = _c(pids["systemd"], journal_path, journal_path, "root") diff --git a/src/evidenceforge/generation/engine/storyline.py b/src/evidenceforge/generation/engine/storyline.py index b347a3b3..d2f9a6aa 100644 --- a/src/evidenceforge/generation/engine/storyline.py +++ b/src/evidenceforge/generation/engine/storyline.py @@ -3408,14 +3408,33 @@ def _extract_output_file(command_line: str, os_category: str) -> str | None: Detects common output file patterns in PowerShell, cmd, and Linux commands. Returns the file path if found, None otherwise. 
""" + try: + parts = shlex.split(command_line, posix=os_category != "windows") + except ValueError: + parts = command_line.split() + command_name = parts[0].rsplit("\\", 1)[-1].rsplit("/", 1)[-1].lower() if parts else "" + patterns = [ r'Export-Csv\s+[\'"]?([^\s\'">;]+)', # PowerShell Export-Csv r'-OutFile\s+[\'"]?([^\s\'">;]+)', # PowerShell -OutFile r'Out-File\s+[\'"]?([^\s\'">;]+)', # PowerShell Out-File r'>\s*[\'"]?([^\s\'">;]+)', # Shell redirect > - r'-o\s+[\'"]?([^\s\'">;]+)', # Common -o flag r'--output[= ]\s*[\'"]?([^\s\'">;]+)', # --output flag ] + short_o_output_tools = { + "curl", + "wget", + "nmap", + "tar", + "zip", + "7z", + "mysql", + "mysqldump", + "psql", + "sqlcmd", + } + if command_name in short_o_output_tools: + patterns.append(r'-o\s+[\'"]?([^\s\'">;]+)') # Tool-specific output flag for pattern in patterns: match = re.search(pattern, command_line, re.IGNORECASE) if match: diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index b7455ea1..de6f16dd 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -3233,6 +3233,43 @@ def test_generate_explicit_credentials_skips_linux_local_target_on_windows( ] assert all(event.event_type != "explicit_credentials" for event in emitted) + def test_generate_explicit_credentials_coerces_linux_subject_on_windows( + self, activity_gen, test_system, state_manager, mock_emitters + ): + """A Unix-local narrative actor should not bootstrap a Windows root logon.""" + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + state_manager.set_current_time(timestamp) + root_user = User(username="root", full_name="root", email="root@example.local") + windows_user = User( + username="aisha.johnson", + full_name="Aisha Johnson", + email="aisha.johnson@example.local", + enabled=True, + ) + activity_gen._users_by_username = {windows_user.username: windows_user} + + activity_gen.generate_explicit_credentials( + user=root_user, + system=test_system, + time=timestamp, + target_username=windows_user.username, + target_server="DC-01", + process_name=r"C:\Windows\System32\runas.exe", + process_pid=0, + source_ip="10.10.3.10", + ) + + emitted = [ + call.args[0] for call in mock_emitters["windows_event_security"].emit.call_args_list + ] + logon = next(event for event in emitted if event.event_type == "logon") + process = next(event for event in emitted if event.event_type == "process_create") + explicit = next(event for event in emitted if event.event_type == "explicit_credentials") + assert logon.auth.username == windows_user.username + assert process.auth.username == windows_user.username + assert explicit.auth.subject_username == windows_user.username + assert all(getattr(event.auth, "username", "") != "root" for event in emitted) + def test_generate_process_with_parent_pid( self, activity_gen, test_user, test_system, state_manager, mock_emitters ): diff --git a/tests/unit/test_baseline_canonical.py b/tests/unit/test_baseline_canonical.py index 4d78c624..1a9c9211 100644 --- a/tests/unit/test_baseline_canonical.py +++ b/tests/unit/test_baseline_canonical.py @@ -721,18 +721,24 @@ def test_generate_dhcp_lease_emits_canonical_syslog_timeline( msg_types=["REQUEST", "ACK"], ) - syslog_messages = [ - call[0][0].syslog.message + syslog_events = [ + call[0][0] for call in mock_emitters["syslog"].emit.call_args_list if call[0][0].event_type == "syslog" and call[0][0].syslog is not None and call[0][0].syslog.app_name == "dhclient" ] + syslog_messages = [event.syslog.message for event in syslog_events] assert 
syslog_messages == [ "DHCPREQUEST for 10.0.10.2 on eth0 to 10.0.0.1 port 67", "DHCPACK of 10.0.10.2 from 10.0.0.1", "bound to 10.0.10.2 -- renewal in 3600 seconds.", ] + gaps = [ + syslog_events[idx].timestamp - syslog_events[idx - 1].timestamp + for idx in range(1, len(syslog_events)) + ] + assert min(gaps) >= timedelta(milliseconds=1500) class TestAnonymousLogon: diff --git a/tests/unit/test_emitters.py b/tests/unit/test_emitters.py index 2f7263fc..04837b20 100644 --- a/tests/unit/test_emitters.py +++ b/tests/unit/test_emitters.py @@ -25,6 +25,7 @@ import json import re from datetime import UTC, datetime, timedelta +from unittest.mock import Mock import pytest @@ -101,6 +102,70 @@ def test_emit_logon_event(self, format_def, temp_output): assert "WIN-TEST-01.corp.local" in content assert 'jsmith' in content + @pytest.mark.parametrize( + ("logon_type", "expected_role", "expected_process"), + [ + (2, "winlogon", r"C:\Windows\System32\winlogon.exe"), + (5, "services", r"C:\Windows\System32\services.exe"), + (10, "winlogon", r"C:\Windows\System32\winlogon.exe"), + (3, "lsass", r"C:\Windows\System32\lsass.exe"), + ], + ) + def test_render_logon_uses_source_native_caller_process( + self, + format_def, + temp_output, + logon_type, + expected_role, + expected_process, + ): + """4624 ProcessName should reflect the logon type's caller, not always lsass.""" + emitter = WindowsEventEmitter(format_def, temp_output, buffer_size=1) + emitter.emit_event = Mock() + emitter._system_pids = { + "WIN-TEST-01": { + "winlogon": 612, + "services": 704, + "lsass": 736, + } + } + host = HostContext( + hostname="WIN-TEST-01", + ip="10.0.0.10", + os="Windows 10", + os_category="windows", + system_type="workstation", + domain="corp.local", + fqdn="WIN-TEST-01.corp.local", + netbios_domain="CORP", + ) + event = SecurityEvent( + timestamp=datetime(2024, 1, 15, 10, 30, 45, tzinfo=UTC), + event_type="logon", + dst_host=host, + auth=AuthContext( + username="jsmith", + user_sid="S-1-5-21-1-2-3-1001", + logon_id="0x12345", + logon_type=logon_type, + auth_package="Negotiate", + source_ip="-" if logon_type in {2, 5} else "10.0.0.50", + source_port=0 if logon_type in {2, 5} else 50123, + logon_process="User32" if logon_type in {2, 10} else "Kerberos", + subject_sid="S-1-5-18", + subject_username="SYSTEM", + subject_domain="NT AUTHORITY", + subject_logon_id="0x3e7", + reporting_pid=736, + ), + ) + + emitter._render_logon(event) + + rendered = emitter.emit_event.call_args.args[0] + assert rendered["ProcessName"] == expected_process + assert rendered["ProcessId"] == f"0x{emitter._system_pids['WIN-TEST-01'][expected_role]:x}" + def test_emit_event_aligns_provider_execution_ids(self, format_def, temp_output): """Security XML provider PID/TID values should look Windows-native.""" emitter = WindowsEventEmitter(format_def, temp_output, buffer_size=1) diff --git a/tests/unit/test_storyline_command_networks.py b/tests/unit/test_storyline_command_networks.py index 940b8310..32da1b99 100644 --- a/tests/unit/test_storyline_command_networks.py +++ b/tests/unit/test_storyline_command_networks.py @@ -43,6 +43,22 @@ def test_parse_http_url_target_rejects_malformed_bracketed_host(self): assert target is None + def test_extract_output_file_ignores_find_or_operator(self): + output_file = StorylineMixin._extract_output_file( + "find /var/www/html -name *.conf -o -name *.env", + "linux", + ) + + assert output_file is None + + def test_extract_output_file_accepts_short_o_for_output_tools(self): + output_file = StorylineMixin._extract_output_file( + 
"curl -s -o /tmp/stage.ps1 https://example.test/stage.ps1", + "linux", + ) + + assert output_file == "/tmp/stage.ps1" + def test_extract_scp_target_from_remote_destination(self): target = StorylineMixin._extract_scp_target( "scp /tmp/patient_claims.sql.gz root@10.10.2.30:/var/tmp/", diff --git a/tests/unit/test_system_process_stability.py b/tests/unit/test_system_process_stability.py index f9a365d2..7fdbb957 100644 --- a/tests/unit/test_system_process_stability.py +++ b/tests/unit/test_system_process_stability.py @@ -207,6 +207,20 @@ def test_all_seeded_linux_pids_survive_termination( f"Seeded system process '{role}' (PID {pid}) was terminated" ) + def test_linux_seeded_systemd_uses_pid_one(self, state_manager, mock_emitters, linux_system): + """Linux systemd should anchor source-native process trees at PID 1.""" + _engine, pids = self._seed_and_get_pids(state_manager, mock_emitters, linux_system) + + systemd = state_manager.get_process(linux_system.hostname, pids["systemd"]) + journald = state_manager.get_process(linux_system.hostname, pids["journald"]) + + assert pids["systemd"] == 1 + assert systemd is not None + assert systemd.parent_pid == 0 + assert state_manager.get_process_object_id(linux_system.hostname, 1) + assert journald is not None + assert journald.parent_pid == 1 + def test_user_processes_still_terminate(self, state_manager, mock_emitters, win_system): """Non-system user processes should still be terminated normally.""" engine, pids = self._seed_and_get_pids(state_manager, mock_emitters, win_system) From 9ae822ce71fe5a06e502db32efe90fdc74073418 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 22:00:44 -0400 Subject: [PATCH 24/61] docs: record loop 12 assessment results --- TODO.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 2fa3aed6..7c72c7b7 100644 --- a/TODO.md +++ b/TODO.md @@ -379,7 +379,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r behavior. - [ ] **IN PROGRESS** Current-dev assessment continuation loops 11-20 — continue the iterative EvidenceForge realism loop from Loop 10, starting with Linux bash/syslog command-pool repetition, then Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior unless fresh blind findings reprioritize the work. - [x] Loop 11 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `e9ff69c`: diversified Linux bash/syslog command texture with per-generation bash command memory, expanded common command tails, and data-driven sudo syslog placeholder pools. Verification passed with focused regressions, `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3102 passed, 37 skipped`). Regenerated eval passed at `95.83/100` across `78,559` records; hard probes showed max bash exact repeat dropped from `21` to `8` and max sudo exact repeat from `28` to `2`. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `84`, Network `74`, Host/EDR `88` (average `82.0`), indicating deeper concrete defects surfaced after the command-pool tell was reduced. Top Loop 12 targets are eCAR read-only command file-create artifacts, DHCP syslog ordering, Linux systemd parent PID ownership, Windows `root` identity bleed/4624 caller-process semantics, web static response byte/MIME state, and ICMP echo byte symmetry. 
- - [ ] **IN PROGRESS** Loop 12 fix pass — repair source-native host contradictions from the loop-11 panel: read-only command output parsing, Linux PID 1 systemd ownership, DHCP client lifecycle ordering, and Windows 4624/4648 caller identity semantics. + - [x] Loop 12 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `b7c8a70`: repaired source-native host contradictions from the loop-11 panel, including read-only command output parsing, Linux PID 1 systemd ownership, DHCP client lifecycle ordering, Windows explicit-credential subject coercion, and 4624 caller-process semantics. Verification passed with focused regressions (`318 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3110 passed, 37 skipped`). Regenerated eval passed at `96.17/100` across `79,317` records; hard probes showed zero dash-prefixed read-only file artifacts, zero Linux systemd parent PID violations, zero Windows `root` identity mentions, zero 4624 caller-process mismatches, and zero DHCP ACK-before-REQUEST ordering failures. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `68`, Host/EDR `74` (average `73.5`), with top Loop 13 targets now static web/Zeek HTTP response semantics, native Sysmon Event 1/GUID fidelity, dual-Zeek sensor observation determinism, DC 4776 workstation attribution, and pooled host command/daemon texture. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 93463e41e08852bbae0023f410aaf32ad44c21f2 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 22:09:33 -0400 Subject: [PATCH 25/61] fix: preserve source-native web response semantics --- TODO.md | 1 + .../generation/activity/generator.py | 101 +++++++++- .../generation/activity/http_content.py | 43 +++++ .../generation/engine/baseline.py | 29 ++- tests/unit/test_activity.py | 23 +++ tests/unit/test_baseline_canonical.py | 173 ++++++++++++++++++ tests/unit/test_http_content.py | 25 +++ 7 files changed, 379 insertions(+), 16 deletions(-) diff --git a/TODO.md b/TODO.md index 7c72c7b7..4ca92de5 100644 --- a/TODO.md +++ b/TODO.md @@ -380,6 +380,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [ ] **IN PROGRESS** Current-dev assessment continuation loops 11-20 — continue the iterative EvidenceForge realism loop from Loop 10, starting with Linux bash/syslog command-pool repetition, then Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior unless fresh blind findings reprioritize the work. 
- [x] Loop 11 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `e9ff69c`: diversified Linux bash/syslog command texture with per-generation bash command memory, expanded common command tails, and data-driven sudo syslog placeholder pools. Verification passed with focused regressions, `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3102 passed, 37 skipped`). Regenerated eval passed at `95.83/100` across `78,559` records; hard probes showed max bash exact repeat dropped from `21` to `8` and max sudo exact repeat from `28` to `2`. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `84`, Network `74`, Host/EDR `88` (average `82.0`), indicating deeper concrete defects surfaced after the command-pool tell was reduced. Top Loop 12 targets are eCAR read-only command file-create artifacts, DHCP syslog ordering, Linux systemd parent PID ownership, Windows `root` identity bleed/4624 caller-process semantics, web static response byte/MIME state, and ICMP echo byte symmetry. - [x] Loop 12 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `b7c8a70`: repaired source-native host contradictions from the loop-11 panel, including read-only command output parsing, Linux PID 1 systemd ownership, DHCP client lifecycle ordering, Windows explicit-credential subject coercion, and 4624 caller-process semantics. Verification passed with focused regressions (`318 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3110 passed, 37 skipped`). Regenerated eval passed at `96.17/100` across `79,317` records; hard probes showed zero dash-prefixed read-only file artifacts, zero Linux systemd parent PID violations, zero Windows `root` identity mentions, zero 4624 caller-process mismatches, and zero DHCP ACK-before-REQUEST ordering failures. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `68`, Host/EDR `74` (average `73.5`), with top Loop 13 targets now static web/Zeek HTTP response semantics, native Sysmon Event 1/GUID fidelity, dual-Zeek sensor observation determinism, DC 4776 workstation attribution, and pooled host command/daemon texture. + - [ ] **IN PROGRESS** Loop 13 fix pass — repair static web/Zeek HTTP response semantics so cacheable hashed/static assets keep stable source-native content lengths and zero-body responses do not render as successful `200` objects with concrete MIME bodies. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
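The Loop 13 entry added above describes the invariant that the `http_content.py` and `generator.py` hunks below enforce: Zeek-style response MIME metadata only makes sense when a response body is actually observable, so bodyless statuses (`204`, `304`) and `HEAD` responses must not advertise a concrete MIME type even when a content type is known. A minimal standalone sketch of that gating rule, using an illustrative function name rather than the project's helper:

```python
def mime_types_for_observable_body(
    status_code: int,
    mime_type: str,
    body_len: int,
    method: str = "GET",
) -> list[str]:
    """Return response MIME metadata only when a body is actually on the wire."""
    if not mime_type or body_len <= 0:
        return []  # nothing was transferred, so there is nothing to type
    if method.upper() == "HEAD" or status_code in {204, 304}:
        return []  # bodyless by definition, regardless of any size hint
    return [mime_type]


# Behaviour expected by the regression tests further down in this patch:
assert mime_types_for_observable_body(200, "text/css", 4096) == ["text/css"]
assert mime_types_for_observable_body(304, "text/css", 0) == []
assert mime_types_for_observable_body(200, "text/html", 2048, method="HEAD") == []
```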
diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index d2c365f1..cd874799 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -370,26 +370,93 @@ def _http_context_from_process_command( if not user_agent: return None - from evidenceforge.generation.activity.http_content import infer_mime_type_from_path + from evidenceforge.generation.activity.http_content import ( + infer_mime_type_from_path, + is_stable_resource_path, + response_mime_types_for_status, + response_size_for_status, + ) mime_type = infer_mime_type_from_path(path) + method = _http_method_for_process_command(command_line) + body_len = 0 if method == "HEAD" else response_body_len + if method != "HEAD" and is_stable_resource_path(path): + body_len = response_size_for_status(200, host, path) context = HttpContext( - method=_http_method_for_process_command(command_line), + method=method, host=host if port in (80, 443) else f"{host}:{port}", uri=path, version="1.1", user_agent=user_agent, request_body_len=0, - response_body_len=response_body_len, + response_body_len=body_len, status_code=200, status_msg="OK", referrer="", - resp_mime_types=[mime_type] if mime_type else [], + resp_mime_types=response_mime_types_for_status( + 200, + mime_type, + body_len, + method=method, + ), tags=[], ) return context, host, port, service +def _normalize_http_context_for_source_native_response(http: HttpContext) -> HttpContext: + """Keep caller-provided HTTP metadata source-native before cross-source fan-out.""" + from evidenceforge.generation.activity.http_content import ( + http_status_message, + is_stable_resource_path, + response_mime_types_for_status, + ) + + method = (http.method or "GET").upper() + status_code = http.status_code + response_body_len = max(0, http.response_body_len) + status_msg = http.status_msg + bodyless_status = status_code in {204, 304} + + if bodyless_status: + response_body_len = 0 + elif ( + status_code == 200 + and response_body_len == 0 + and method not in {"CONNECT", "HEAD"} + and is_stable_resource_path(http.uri) + ): + status_code = 304 + status_msg = http_status_message(status_code) + elif method != "CONNECT": + status_msg = http_status_message(status_code) + + resp_mime_types = list(http.resp_mime_types) + if not resp_mime_types or response_body_len <= 0 or method == "HEAD" or bodyless_status: + mime_type = resp_mime_types[0] if resp_mime_types else "" + resp_mime_types = response_mime_types_for_status( + status_code, + mime_type, + response_body_len, + method=method, + ) + + if ( + status_code == http.status_code + and status_msg == http.status_msg + and response_body_len == http.response_body_len + and resp_mime_types == list(http.resp_mime_types) + ): + return http + return replace( + http, + response_body_len=response_body_len, + status_code=status_code, + status_msg=status_msg, + resp_mime_types=resp_mime_types, + ) + + def _network_effect_context_for_process( process_name: str, command_line: str, @@ -5316,6 +5383,8 @@ def generate_connection( if http is not None and http.trans_depth != 1: http = replace(http, trans_depth=1) + if http is not None: + http = _normalize_http_context_for_source_native_response(http) caller_provided_duration = duration is not None caller_provided_conn_state = conn_state is not None @@ -6795,11 +6864,20 @@ def generate_connection( if http_ua_override: ua = http_ua_override status_code, status_msg = _get_http_status(dst_ip, uri) - resp_body_len = 
resp_bytes or rng.randint(200, 50000) - if status_code in (301, 302): - resp_body_len = rng.randint(100, 300) - elif status_code == 304: + from evidenceforge.generation.activity.http_content import ( + is_stable_resource_path, + response_mime_types_for_status, + response_size_for_mime, + response_size_for_status, + ) + + if status_code in {204, 304}: resp_body_len = 0 + else: + if status_code >= 300 or is_stable_resource_path(uri): + resp_body_len = response_size_for_status(status_code, host, uri) + else: + resp_body_len = resp_bytes or response_size_for_mime(rng, mime_type) from evidenceforge.generation.activity.referrer import pick_referrer _http_referer = ( @@ -6818,7 +6896,12 @@ def generate_connection( status_code=status_code, status_msg=status_msg, referrer=_http_referer, - resp_mime_types=[mime_type] if status_code == 200 else [], + resp_mime_types=response_mime_types_for_status( + status_code, + mime_type, + resp_body_len, + method=http_method, + ), tags=[], ) # Probabilistic file transfer for HTTP responses with content diff --git a/src/evidenceforge/generation/activity/http_content.py b/src/evidenceforge/generation/activity/http_content.py index d7900008..a1207f50 100644 --- a/src/evidenceforge/generation/activity/http_content.py +++ b/src/evidenceforge/generation/activity/http_content.py @@ -65,6 +65,24 @@ "/livez", } +_HTTP_STATUS_MESSAGES: dict[int, str] = { + 200: "OK", + 204: "No Content", + 206: "Partial Content", + 301: "Moved Permanently", + 302: "Found", + 304: "Not Modified", + 400: "Bad Request", + 401: "Unauthorized", + 403: "Forbidden", + 404: "Not Found", + 405: "Method Not Allowed", + 500: "Internal Server Error", + 502: "Bad Gateway", + 503: "Service Unavailable", + 504: "Gateway Timeout", +} + def infer_mime_type_from_path(path: str, default: str = "text/html") -> str: """Infer a response MIME type from a URI path extension. 
@@ -87,6 +105,26 @@ def response_size_for_mime(rng: random.Random, content_type: str) -> int: return rng.randint(lo, hi) +def http_status_message(status_code: int) -> str: + """Return a conventional HTTP reason phrase for a status code.""" + return _HTTP_STATUS_MESSAGES.get(status_code, "OK") + + +def response_mime_types_for_status( + status_code: int, + mime_type: str, + response_body_len: int, + *, + method: str = "GET", +) -> list[str]: + """Return Zeek-style response MIME metadata only when a body is observable.""" + if not mime_type or response_body_len <= 0: + return [] + if method.upper() == "HEAD" or status_code in {204, 304}: + return [] + return [mime_type] + + def is_health_endpoint_path(uri: str) -> bool: """Return whether a URI path is a small operational health endpoint.""" clean_path = uri.split("?", 1)[0].split("#", 1)[0].lower().rstrip("/") @@ -134,6 +172,11 @@ def is_stable_resource_path(uri: str) -> bool: def response_size_for_status(status_code: int, host: str, uri: str) -> int: """Return a stable source-native web response body size for an HTTP status.""" + if status_code in {204, 304}: + return 0 + if status_code in {301, 302}: + rng = random.Random(_stable_seed(f"web_redirect:{status_code}:{host}:{uri}")) + return rng.randint(120, 480) if status_code < 400 and is_health_endpoint_path(uri): return response_size_for_health_endpoint(status_code, host, uri) if status_code < 400: diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index 02f65f8e..68ead926 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -4011,6 +4011,7 @@ def _emit_browsing_session( pick_domain_and_ip, resolve_domain_ip, ) + from evidenceforge.generation.activity.http_content import response_mime_types_for_status from evidenceforge.generation.activity.proxy_uri import is_browser_like_proxy_domain domain_tags = get_domain_tags(hostname) if hostname else [] @@ -4121,9 +4122,12 @@ def _http_status_message(status: int) -> str: status_msg=_http_status_message(req.status_code), referrer=req.referrer, trans_depth=req.trans_depth, - resp_mime_types=[req.content_type] - if req.content_type and req.status_code in {200, 206} - else [], + resp_mime_types=response_mime_types_for_status( + req.status_code, + req.content_type, + req.response_body_len, + method=req.method, + ), tags=[], ) @@ -5859,6 +5863,7 @@ def _emit_web_server_access( from evidenceforge.generation.activity.http_content import ( is_stable_resource_path, normalize_mime_type_for_path, + response_mime_types_for_status, response_size_for_mime, response_size_for_status, ) @@ -6067,11 +6072,16 @@ def _tool_gap_ms() -> int: user_agent=chosen_ua, request_body_len=req.request_body_len, response_body_len=req.response_body_len, - status_code=200, - status_msg="OK", + status_code=req.status_code, + status_msg=_status_message(req.status_code), referrer=req.referrer, trans_depth=req.trans_depth, - resp_mime_types=[req.content_type] if req.content_type else [], + resp_mime_types=response_mime_types_for_status( + req.status_code, + req.content_type, + req.response_body_len, + method=req.method, + ), tags=[], ), hostname=http_host, @@ -6121,7 +6131,12 @@ def _tool_gap_ms() -> int: status_code=status, status_msg=_status_message(status), referrer=referrer, - resp_mime_types=[mime] if status == 200 else [], + resp_mime_types=response_mime_types_for_status( + status, + mime, + resp_bytes, + method=method, + ), tags=[], ), hostname=http_host, diff 
--git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index de6f16dd..fe5dda1e 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -45,6 +45,7 @@ _jitter_default_connection_duration, _network_effect_context_for_process, ) +from evidenceforge.generation.activity.http_content import response_size_for_status from evidenceforge.generation.activity.tls_realism import ( certificate_analyzer_delay_ms, certificate_file_size, @@ -84,6 +85,28 @@ def test_http_context_from_curl_command_preserves_url_and_user_agent(self): assert http.user_agent == "curl/7.88.1" assert http.response_body_len == 1234 + def test_http_context_from_static_curl_uses_stable_resource_size(self): + """Repeated CLI downloads of static resources should keep one object size.""" + first = _http_context_from_process_command( + "/usr/bin/curl", + "curl -s https://cdn.example.com/favicon.ico", + response_body_len=1234, + ) + second = _http_context_from_process_command( + "/usr/bin/curl", + "curl -s https://cdn.example.com/favicon.ico", + response_body_len=98765, + ) + + assert first is not None + assert second is not None + first_http = first[0] + second_http = second[0] + expected_size = response_size_for_status(200, "cdn.example.com", "/favicon.ico") + assert first_http.response_body_len == expected_size + assert second_http.response_body_len == expected_size + assert first_http.resp_mime_types == ["image/x-icon"] + def test_proxy_context_preserves_cli_http_user_agent(self): """Proxy logs should not replace a caller-provided CLI User-Agent.""" generator = ActivityGenerator(StateManager(), {}) diff --git a/tests/unit/test_baseline_canonical.py b/tests/unit/test_baseline_canonical.py index 1a9c9211..3f3ad9cb 100644 --- a/tests/unit/test_baseline_canonical.py +++ b/tests/unit/test_baseline_canonical.py @@ -348,6 +348,79 @@ def test_caller_http_context_not_overwritten( assert event.http.uri == "/api/v1/resource/42" assert event.http.status_code == 204 + def test_static_zero_body_success_normalizes_to_not_modified( + self, activity_gen, state_manager, mock_emitters, timestamp + ): + """Static GET responses should not fan out as 200 OK with zero body and MIME.""" + activity_gen.generate_connection( + src_ip="10.0.10.50", + dst_ip="10.0.10.5", + time=timestamp, + dst_port=80, + proto="tcp", + service="http", + duration=0.1, + orig_bytes=200, + resp_bytes=0, + http=HttpContext( + method="GET", + host="WEB-01", + uri="/assets/css/main.063cbaf5.css", + version="1.1", + user_agent="Mozilla/5.0", + request_body_len=0, + response_body_len=0, + status_code=200, + status_msg="OK", + resp_mime_types=["text/css"], + tags=[], + ), + ) + + event = mock_emitters["zeek_http"].emit.call_args[0][0] + assert event.http.status_code == 304 + assert event.http.status_msg == "Not Modified" + assert event.http.response_body_len == 0 + assert event.http.resp_mime_types == [] + + def test_auto_http_static_resource_uses_stable_response_size( + self, activity_gen, state_manager, mock_emitters, timestamp, monkeypatch + ): + """Auto-generated HTTP contexts should not size static resources from flow bytes.""" + from evidenceforge.generation.activity import generator as generator_module + from evidenceforge.generation.activity import proxy_uri + from evidenceforge.generation.activity.http_content import response_size_for_status + + monkeypatch.setattr( + proxy_uri, + "pick_proxy_uri", + lambda *args, **kwargs: ("/favicon.ico", "image/x-icon", "GET", "", "none"), + ) + monkeypatch.setattr(generator_module, "_get_http_status", 
lambda dst_ip, uri: (200, "OK")) + + activity_gen.generate_connection( + src_ip="10.0.10.50", + dst_ip="10.0.10.5", + time=timestamp, + dst_port=80, + proto="tcp", + service="http", + duration=0.2, + orig_bytes=300, + resp_bytes=50_000, + conn_state="SF", + hostname="portal.example.com", + ) + + event = mock_emitters["zeek_http"].emit.call_args[0][0] + assert event.http.uri == "/favicon.ico" + assert event.http.response_body_len == response_size_for_status( + 200, + "portal.example.com", + "/favicon.ico", + ) + assert event.http.resp_mime_types == ["image/x-icon"] + class TestSmbFileTransferCorrelation: """SMB data transfers should produce Zeek files.log context when substantial.""" @@ -1374,6 +1447,106 @@ def test_web_server_access_uses_browser_cache_for_repeated_static_assets(self, m assert len(asset_rows) == 1 assert asset_rows[0]["http"].status_code == 200 + def test_web_server_access_preserves_cache_and_partial_statuses(self, monkeypatch): + """Browser cache hits and partial content must not be rewritten as 200 responses.""" + from random import Random + from types import SimpleNamespace + from unittest.mock import MagicMock + + from evidenceforge.generation.activity import browsing_session, web_session_profiles + from evidenceforge.generation.activity.browsing_session import BrowsingRequest + from evidenceforge.generation.engine.baseline import BaselineMixin + + monkeypatch.setattr( + web_session_profiles, + "pick_web_visitor_profile", + lambda rng, *, is_external: ( + "human_browser", + { + "kind": "session", + "browsing_intensity": "normal", + "user_agent_pool": "browser_any", + }, + ), + ) + monkeypatch.setattr( + browsing_session, + "generate_browsing_session", + lambda **kwargs: [ + BrowsingRequest( + time_offset_ms=0, + hostname=kwargs["hostname"], + path="/", + method="GET", + content_type="text/html", + referrer="", + trans_depth=1, + is_page_load=True, + response_body_len=4096, + request_body_len=0, + status_code=200, + ), + BrowsingRequest( + time_offset_ms=100, + hostname=kwargs["hostname"], + path="/assets/css/main.063cbaf5.css", + method="GET", + content_type="text/css", + referrer=f"https://{kwargs['hostname']}/", + trans_depth=2, + is_page_load=False, + response_body_len=0, + request_body_len=0, + status_code=304, + ), + BrowsingRequest( + time_offset_ms=200, + hostname=kwargs["hostname"], + path="/assets/js/app.bundle.bf9655b3.js", + method="GET", + content_type="application/javascript", + referrer=f"https://{kwargs['hostname']}/", + trans_depth=3, + is_page_load=False, + response_body_len=1152, + request_body_len=0, + status_code=206, + ), + ], + ) + + collected = [] + activity_gen = MagicMock() + activity_gen._ip_to_system = {} + activity_gen.generate_connection.side_effect = lambda **kw: collected.append(kw) + engine = MagicMock() + engine.activity_generator = activity_gen + engine._resolve_traffic_rate.return_value = (1, 1) + engine._get_segment_for_system.return_value = SimpleNamespace( + exposure="external", + external_ratio=None, + ) + engine._generate_external_client_ip.return_value = "8.8.4.20" + sys_obj = self._make_web_system("external", public_hostnames=["portal.example.com"]) + + BaselineMixin._emit_web_server_access( + engine, + sys_obj, + [sys_obj], + Random(4), + datetime(2024, 3, 15, 10, 0, 0, tzinfo=UTC), + ) + + by_uri = {kw["http"].uri: kw["http"] for kw in collected} + assert by_uri["/assets/css/main.063cbaf5.css"].status_code == 304 + assert by_uri["/assets/css/main.063cbaf5.css"].response_body_len == 0 + assert 
by_uri["/assets/css/main.063cbaf5.css"].resp_mime_types == [] + assert by_uri["/assets/js/app.bundle.bf9655b3.js"].status_code == 206 + assert by_uri["/assets/js/app.bundle.bf9655b3.js"].response_body_len == 1152 + assert by_uri["/assets/js/app.bundle.bf9655b3.js"].resp_mime_types == [ + "application/javascript" + ] + def test_web_server_access_keeps_scanner_requests_source_native(self, monkeypatch): """Scanner visitors should keep configured error paths and blank referrers.""" from random import Random diff --git a/tests/unit/test_http_content.py b/tests/unit/test_http_content.py index e98dd6cb..ed6dbf37 100644 --- a/tests/unit/test_http_content.py +++ b/tests/unit/test_http_content.py @@ -10,6 +10,7 @@ is_health_endpoint_path, is_stable_resource_path, normalize_mime_type_for_path, + response_mime_types_for_status, response_size_for_health_endpoint, response_size_for_mime, response_size_for_status, @@ -35,6 +36,30 @@ def test_response_size_for_gif_uses_image_range(): assert 500 <= size <= 50_000 +def test_empty_body_statuses_have_zero_stable_response_size(): + assert response_size_for_status(204, "portal.example.com", "/assets/main.css") == 0 + assert response_size_for_status(304, "portal.example.com", "/assets/main.css") == 0 + + +def test_redirect_response_size_is_small_and_stable(): + first = response_size_for_status(302, "portal.example.com", "/login") + second = response_size_for_status(302, "portal.example.com", "/login") + + assert first == second + assert 120 <= first <= 480 + + +def test_response_mime_types_require_visible_body_and_success_status(): + assert response_mime_types_for_status(200, "text/css", 4096) == ["text/css"] + assert response_mime_types_for_status(206, "application/javascript", 512) == [ + "application/javascript" + ] + assert response_mime_types_for_status(304, "text/css", 0) == [] + assert response_mime_types_for_status(200, "text/css", 0) == [] + assert response_mime_types_for_status(200, "text/css", 2048, method="HEAD") == [] + assert response_mime_types_for_status(403, "text/html", 900) == ["text/html"] + + def test_error_response_size_is_template_stable_by_status_host_and_uri(): first = response_size_for_status(404, "portal.example.com", "/.git/HEAD") second = response_size_for_status(404, "portal.example.com", "/.git/HEAD") From af301b9ba908272f058046a7fd9cd3f25d63258f Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 22:37:33 -0400 Subject: [PATCH 26/61] docs: record loop 13 assessment results --- TODO.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/TODO.md b/TODO.md index 4ca92de5..429fb275 100644 --- a/TODO.md +++ b/TODO.md @@ -2,7 +2,7 @@ **Status:** Phase 8.5 (Dual src/dst HostContext) COMPLETE; Pre-MVP quality fixes ongoing **Started:** 2026-03-11 -**Last Updated:** 2026-05-15 +**Last Updated:** 2026-05-16 See [CHANGELOG.md](CHANGELOG.md) for detailed development history of completed phases. @@ -380,7 +380,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [ ] **IN PROGRESS** Current-dev assessment continuation loops 11-20 — continue the iterative EvidenceForge realism loop from Loop 10, starting with Linux bash/syslog command-pool repetition, then Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior unless fresh blind findings reprioritize the work. 
- [x] Loop 11 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `e9ff69c`: diversified Linux bash/syslog command texture with per-generation bash command memory, expanded common command tails, and data-driven sudo syslog placeholder pools. Verification passed with focused regressions, `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3102 passed, 37 skipped`). Regenerated eval passed at `95.83/100` across `78,559` records; hard probes showed max bash exact repeat dropped from `21` to `8` and max sudo exact repeat from `28` to `2`. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `84`, Network `74`, Host/EDR `88` (average `82.0`), indicating deeper concrete defects surfaced after the command-pool tell was reduced. Top Loop 12 targets are eCAR read-only command file-create artifacts, DHCP syslog ordering, Linux systemd parent PID ownership, Windows `root` identity bleed/4624 caller-process semantics, web static response byte/MIME state, and ICMP echo byte symmetry. - [x] Loop 12 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `b7c8a70`: repaired source-native host contradictions from the loop-11 panel, including read-only command output parsing, Linux PID 1 systemd ownership, DHCP client lifecycle ordering, Windows explicit-credential subject coercion, and 4624 caller-process semantics. Verification passed with focused regressions (`318 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3110 passed, 37 skipped`). Regenerated eval passed at `96.17/100` across `79,317` records; hard probes showed zero dash-prefixed read-only file artifacts, zero Linux systemd parent PID violations, zero Windows `root` identity mentions, zero 4624 caller-process mismatches, and zero DHCP ACK-before-REQUEST ordering failures. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `68`, Host/EDR `74` (average `73.5`), with top Loop 13 targets now static web/Zeek HTTP response semantics, native Sysmon Event 1/GUID fidelity, dual-Zeek sensor observation determinism, DC 4776 workstation attribution, and pooled host command/daemon texture. - - [ ] **IN PROGRESS** Loop 13 fix pass — repair static web/Zeek HTTP response semantics so cacheable hashed/static assets keep stable source-native content lengths and zero-body responses do not render as successful `200` objects with concrete MIME bodies. + - [x] Loop 13 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `93463e4`: repaired static web/Zeek HTTP response semantics so cacheable hashed/static assets keep stable source-native content lengths and zero-body responses no longer render as successful `200` objects with concrete MIME bodies. Verification passed with focused HTTP/browsing/web tests (`69 passed, 1 skipped`), the broader HTTP/proxy/emitter slice (`286 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3117 passed, 37 skipped`). Regenerated eval passed at `96.64/100` across `82,065` records; hard probes showed zero static `200` zero-body rows, zero bad `304` body/MIME rows, zero zero-body `206` rows, zero refined web static `200` unstable-size groups, and zero Zeek static `200` unstable-size groups. 
Blind synthetic-confidence scores were Threat Hunter `76`, Detection `68`, Network `72`, Host/EDR `82` (average `74.5`), and the prior web/static-body issue disappeared from reviewer findings. Top Loop 14 targets are SSH command target/network-destination contradictions, Zeek UDP/53 DNS-service zero-payload rows, native Kerberos 4624 `WorkstationName` semantics, Sysmon provider thread-ID distribution, and developer-tool current-directory realism. + - [ ] **IN PROGRESS** Loop 14 fix pass — repair the highest-leverage source-native contradictions from Loop 13, starting with SSH command-to-network destination ownership and Zeek UDP/53 DNS-service zero-payload rows; bundle Kerberos 4624 `WorkstationName` semantics if the owning layer is compact and low risk. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 7a824498a2e26466a2a2fb1d707858fc2d306481 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 22:54:49 -0400 Subject: [PATCH 27/61] fix: align source-native command and DNS semantics --- .../generation/activity/generator.py | 276 +++++++++++++++++- .../generation/emitters/sysmon.py | 27 +- .../generation/emitters/windows.py | 26 +- tests/unit/test_activity.py | 81 +++++ tests/unit/test_dns_realism.py | 26 ++ tests/unit/test_emitters.py | 47 +++ tests/unit/test_sysmon_emitter.py | 12 + 7 files changed, 476 insertions(+), 19 deletions(-) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index cd874799..f7254e88 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -472,6 +472,87 @@ def _network_effect_context_for_process( return effect_process_name, effect_command_line +def _is_ip_literal(value: str) -> bool: + """Return whether a command target is an IP literal.""" + try: + ipaddress.ip_address(value.strip("[]")) + except ValueError: + return False + return True + + +def _normalize_command_host_token(value: str) -> str: + """Normalize a host token from command-line arguments.""" + host = value.strip().strip("'\"") + if not host: + return "" + if "://" in host: + parsed = urlsplit(host) + host = parsed.hostname or host + if "@" in host: + host = host.rsplit("@", 1)[1] + host = host.strip("[]") + if ":" in host and not _is_ip_literal(host): + name, maybe_port = host.rsplit(":", 1) + if maybe_port.isdigit(): + host = name + return host.rstrip(".") + + +def _command_tokens(command_line: str) -> list[str]: + """Split a process command line enough to recover network target arguments.""" + try: + tokens = shlex.split(command_line, posix=False) + except ValueError: + tokens = command_line.split() + return 
[token.strip().strip("'\"") for token in tokens if token.strip().strip("'\"")] + + +def _extract_network_command_target(command_line: str, service: str) -> str | None: + """Extract a user-visible network target from common client command lines.""" + normalized_service = service.lower() + if normalized_service == "ssh": + tokens = _command_tokens(command_line) + if not tokens: + return None + option_args = { + "-b", + "-c", + "-e", + "-f", + "-i", + "-j", + "-l", + "-m", + "-o", + "-p", + "-s", + "-w", + } + skip_next = False + for token in tokens[1:]: + lower = token.lower() + if skip_next: + skip_next = False + continue + if lower in option_args: + skip_next = True + continue + if lower.startswith("-"): + continue + target = _normalize_command_host_token(token) + if target: + return target + return None + if normalized_service == "rdp": + match = re.search(r"(?:^|\s)/v:([^\s]+)", command_line, re.IGNORECASE) + return _normalize_command_host_token(match.group(1)) if match else None + if normalized_service == "ldap": + match = re.search(r"ldap://([^\s/\"']+)", command_line, re.IGNORECASE) + return _normalize_command_host_token(match.group(1)) if match else None + return None + + def _parse_port_tokens(tokens: list[str]) -> list[int]: """Parse nmap port tokens until the next option or target token.""" ports: list[int] = [] @@ -2046,6 +2127,137 @@ def _system_for_hostname(self, hostname: str) -> Any | None: return system return None + def _unique_environment_systems(self) -> list[Any]: + """Return scenario systems once, preserving environment order where possible.""" + systems: list[Any] = [] + seen_hosts: set[str] = set() + for system in getattr(self, "_ip_to_system", {}).values(): + hostname = str(getattr(system, "hostname", "") or "") + if hostname in seen_hosts: + continue + seen_hosts.add(hostname) + systems.append(system) + return systems + + def _system_for_command_alias(self, hostname: str, service: str) -> Any | None: + """Resolve common generic command aliases to environment systems.""" + wanted = hostname.lower().rstrip(".") + if not wanted: + return None + systems = self._unique_environment_systems() + if not systems: + return None + + def system_matches(system: Any, markers: tuple[str, ...]) -> bool: + haystack = " ".join( + [ + str(getattr(system, "hostname", "") or ""), + str(getattr(system, "type", "") or ""), + " ".join(getattr(system, "roles", []) or []), + " ".join(getattr(system, "services", []) or []), + ] + ).lower() + return any(marker in haystack for marker in markers) + + if service == "ssh": + if wanted.startswith("web") or "web" in wanted: + markers = ("web", "apache", "nginx", "http") + elif wanted.startswith("db") or "db" in wanted: + markers = ("db", "database", "mysql", "postgres", "mssql") + elif wanted.startswith("app") or "app" in wanted: + markers = ("app", "api") + elif "bastion" in wanted or "jump" in wanted: + markers = ("bastion", "proxy", "jump") + else: + markers = () + if markers: + candidates = [ + system + for system in systems + if _get_os_category(getattr(system, "os", "")) == "linux" + and system_matches(system, markers) + ] + if candidates: + return candidates[0] + return None + + def _resolve_command_network_target( + self, + target: str, + service: str, + ) -> tuple[str, str | None] | None: + """Resolve a command-line network target to a destination IP and hostname hint.""" + normalized = _normalize_command_host_token(target) + if not normalized: + return None + if _is_ip_literal(normalized): + return normalized, None + target_system = 
self._system_for_hostname(normalized) or self._system_for_command_alias( + normalized, service + ) + if target_system is None: + return None + return target_system.ip, normalized + + def _pick_command_target_placeholder( + self, + rng: random.Random, + command_line: str, + source_system: System, + ) -> str | None: + """Choose an environment-valid replacement for command `{ssh_target}` placeholders.""" + systems = [ + system for system in self._unique_environment_systems() if system.ip != source_system.ip + ] + if not systems: + return None + command_lower = command_line.lower() + if "ldap://" in command_lower: + candidates = [ + system + for system in systems + if getattr(system, "type", "") == "domain_controller" + or "domain_controller" in (getattr(system, "roles", []) or []) + ] + elif "mstsc" in command_lower: + candidates = [ + system + for system in systems + if _get_os_category(getattr(system, "os", "")) == "windows" + and getattr(system, "type", "") in {"server", "domain_controller"} + ] + else: + candidates = [ + system + for system in systems + if _get_os_category(getattr(system, "os", "")) == "linux" + ] + if not candidates: + candidates = systems + target = rng.choice(candidates) + ad_domain = str(getattr(self, "_ad_domain", "") or "").strip(".") + style = rng.random() + if style < 0.18: + return target.ip + if style < 0.32 and ad_domain: + return f"{target.hostname}.{ad_domain}" + return str(target.hostname) + + def _parameterize_command_for_system( + self, + rng: random.Random, + command_line: str, + *, + username: str, + system: System, + ) -> str: + """Parameterize command templates with environment-aware network targets.""" + if "{ssh_target}" in command_line: + target = self._pick_command_target_placeholder(rng, command_line, system) + if target: + command_line = command_line.replace("{ssh_target}", target) + return _parameterize_command(rng, command_line, username=username) + def _resolve_process_identity( self, *, @@ -4323,6 +4535,34 @@ def _derive_current_directory( if exe in {"onedrive.exe", "teams.exe", "outlook.exe"}: return profile_dir + "\\" + if exe in { + "cargo.exe", + "docker.exe", + "git.exe", + "kubectl.exe", + "node.exe", + "npm.cmd", + "npm.exe", + "ssh.exe", + }: + if exe == "ssh.exe": + return profile_dir + "\\" + repo_names = ( + "clinical-portal", + "integration-api", + "ops-automation", + "platform-services", + "security-tools", + ) + repo = repo_names[ + _stable_seed( + f"windows_project_cwd:{system.hostname}:{username}:{process_name}:" + f"{command_line}" + ) + % len(repo_names) + ] + return profile_dir + f"\\source\\repos\\{repo}\\" + if exe in {"chrome.exe", "msedge.exe", "firefox.exe"}: install_dir = image.rsplit("\\", 1)[0] if "\\" in image else "" if parent_dir and parent_dir == install_dir.lower(): @@ -6002,7 +6242,7 @@ def generate_connection( if service == "dns" and proto in ("udp", "tcp") and dst_port == 53: query_len = len(dns.query) if dns is not None and dns.query else 12 query_type = (dns.query_type if dns is not None else "").upper() - min_query_payload = query_len + 16 + min_query_payload = max(40, query_len + 16) if query_type in {"TXT", "NULL"}: min_query_payload += 18 elif query_type == "SRV": @@ -6180,7 +6420,10 @@ def generate_connection( if conn_state in ("S0", "REJ"): duration = None resp_bytes = 0 - orig_bytes = 0 + if service == "dns" and proto == "udp" and dst_port == 53: + orig_bytes = max(orig_bytes or 0, 40) + else: + orig_bytes = 0 elif conn_state in ("S2", "S3"): if duration is not None: duration = duration * 
rng.uniform(0.3, 0.8) @@ -8577,7 +8820,25 @@ def _emit_process_network_correlation( # Internal connection: use DB server or any internal server db_servers = getattr(self, "_db_servers", []) all_ips = getattr(self, "_all_system_ips", []) - if service in ("mssql", "mysql", "postgresql") and db_servers: + command_target = _extract_network_command_target(command_line, service) + resolved_command_target = ( + self._resolve_command_network_target(command_target, service) + if command_target + else None + ) + if resolved_command_target is not None: + dst_ip, command_hostname = resolved_command_target + if command_hostname: + ext_hostname = command_hostname + emit_dns = True + elif command_target: + logger.debug( + "Skipping %s process network effect with unresolved command target %s", + service, + command_target, + ) + return + elif service in ("mssql", "mysql", "postgresql") and db_servers: # Filter to DB servers that match the requested service svc = service compatible = [ @@ -8607,7 +8868,7 @@ def _emit_process_network_correlation( emit_dns=emit_dns, pid=pid, http=http_context, - hostname=ext_hostname if conn_info["external"] else None, + hostname=ext_hostname, ) def execute_baseline_activity( @@ -8794,7 +9055,12 @@ def execute_baseline_activity( ) if result: process_name, command_line = result - command_line = _parameterize_command(rng, command_line, username=user.username) + command_line = self._parameterize_command_for_system( + rng, + command_line, + username=user.username, + system=system, + ) process_time = time if os_category == "linux": process_time = self._schedule_bash_history_time( diff --git a/src/evidenceforge/generation/emitters/sysmon.py b/src/evidenceforge/generation/emitters/sysmon.py index d406522a..e9946c30 100644 --- a/src/evidenceforge/generation/emitters/sysmon.py +++ b/src/evidenceforge/generation/emitters/sysmon.py @@ -551,20 +551,29 @@ def _get_sysmon_thread_id(self, hostname: str) -> int: cache = getattr(self, "_sysmon_thread_pools", None) if cache is None: cache = self._sysmon_thread_pools = {} - offsets = getattr(self, "_sysmon_thread_pool_offsets", None) - if offsets is None: - offsets = self._sysmon_thread_pool_offsets = {} + counters = getattr(self, "_sysmon_thread_counters", None) + if counters is None: + counters = self._sysmon_thread_counters = {} + last_threads = getattr(self, "_sysmon_last_thread_by_host", None) + if last_threads is None: + last_threads = self._sysmon_last_thread_by_host = {} if hostname not in cache: rng = random.Random(_stable_seed(f"sysmon_threads_{hostname}")) cache[hostname] = [ windows_id_randint(rng, 1000, 5000) for _ in range(rng.randint(3, 5)) ] - offsets[hostname] = _stable_seed(f"sysmon_thread_offset_{hostname}") % len( - cache[hostname] - ) - offset = offsets[hostname] - offsets[hostname] = (offset + 1) % len(cache[hostname]) - return cache[hostname][offset] + counters[hostname] = 0 + pool = cache[hostname] + counter = counters.get(hostname, 0) + counters[hostname] = counter + 1 + rng = random.Random(_stable_seed(f"sysmon_thread_choice:{hostname}:{counter}")) + previous = last_threads.get(hostname) + if previous in pool and rng.random() < 0.58: + return previous + weights = [max(1, len(pool) * 3 - index * 2) for index, _thread_id in enumerate(pool)] + thread_id = rng.choices(pool, weights=weights, k=1)[0] + last_threads[hostname] = thread_id + return thread_id def _get_sysmon_pid(self, hostname: str) -> int: """Return stable Sysmon service PID for a given host. 
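The `sysmon.py` hunk above replaces the old round-robin offset walk with bursty ThreadID reuse: each host keeps a small pooled set of provider thread IDs, the previously used ID is repeated a little more than half the time, and otherwise a front-weighted pick keeps pool usage deliberately uneven. A condensed, self-contained sketch of that selection behaviour (class name and seed are illustrative, not part of the emitter API):

```python
import random


class BurstyThreadPicker:
    """Pick provider ThreadIDs with bursty reuse instead of perfect rotation."""

    def __init__(self, pool: list[int], seed: int = 0) -> None:
        self._pool = pool
        self._rng = random.Random(seed)
        self._last: int | None = None

    def next_thread_id(self) -> int:
        # Mostly stick with the thread that handled the previous event.
        if self._last in self._pool and self._rng.random() < 0.58:
            return self._last
        # Otherwise weight earlier pool entries so counts stay unbalanced.
        weights = [max(1, len(self._pool) * 3 - i * 2) for i in range(len(self._pool))]
        self._last = self._rng.choices(self._pool, weights=weights, k=1)[0]
        return self._last


picker = BurstyThreadPicker([1204, 2380, 3412, 4096], seed=7)
sample = [picker.next_thread_id() for _ in range(120)]
# A spread like this is what the new Sysmon emitter regression checks for:
# several distinct IDs, but clearly unequal usage rather than a balanced cycle.
```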
diff --git a/src/evidenceforge/generation/emitters/windows.py b/src/evidenceforge/generation/emitters/windows.py index b4bb0ab7..d1b80aaf 100644 --- a/src/evidenceforge/generation/emitters/windows.py +++ b/src/evidenceforge/generation/emitters/windows.py @@ -157,6 +157,26 @@ def _subject_domain(username: str, netbios_domain: str) -> str: return netbios_domain +def _logon_workstation_name(auth: AuthContext, host: HostContext, event: SecurityEvent) -> str: + """Return native Windows WorkstationName semantics for successful logons.""" + if auth.workstation_name: + return auth.workstation_name + if ( + auth.logon_type == 3 + and (auth.auth_package or "").lower() == "kerberos" + and auth.source_ip not in {"", "-", host.ip} + ): + seed = _stable_seed( + f"kerberos_4624_workstation:{host.hostname}:{auth.logon_id}:" + f"{auth.source_ip}:{event.timestamp.isoformat()}" + ) + if seed % 100 < 72: + return "-" + if auth.logon_type in (3, 10) and event.src_host is not None: + return event.src_host.hostname + return host.hostname + + def _auth_subject_domain(auth: Any, netbios_domain: str) -> str: """Normalize SubjectDomainName for well-known Windows subject identities.""" subject_name = getattr(auth, "subject_username", "") or getattr(auth, "username", "") @@ -418,11 +438,7 @@ def _render_logon(self, event: SecurityEvent) -> None: rng = random.Random() auth = event.auth host = self._get_host(event) - workstation_name = auth.workstation_name or ( - event.src_host.hostname - if auth.logon_type in (3, 10) and event.src_host is not None - else host.hostname - ) + workstation_name = _logon_workstation_name(auth, host, event) process_pid, process_name = self._logon_caller_process_identity(host, auth) event_data = { diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index fe5dda1e..23232de5 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -1627,6 +1627,87 @@ def test_generate_process_derives_user_current_directory( f"C:\\Users\\{test_user.username}\\Documents\\" ) + def test_generate_process_derives_project_current_directory_for_dev_tools( + self, activity_gen, test_user, test_system, state_manager, mock_emitters + ): + """Relative developer-tool commands should run from a project directory.""" + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + state_manager.set_current_time(timestamp) + process_name = r"C:\Program Files\nodejs\node.exe" + command_line = "node.exe scripts/build.js" + + activity_gen.generate_process( + test_user, + test_system, + timestamp, + "0x12345", + process_name, + command_line, + ) + + process_events = [ + call[0][0] + for call in mock_emitters["windows_event_security"].emit.call_args_list + if call[0][0].event_type == "process_create" + and call[0][0].process + and call[0][0].process.image == process_name + ] + assert process_events + current_directory = process_events[0].process.current_directory + assert current_directory.startswith(f"C:\\Users\\{test_user.username}\\source\\repos\\") + assert current_directory != r"C:\Program Files\nodejs\\" + + def test_ssh_process_network_effect_uses_command_target( + self, activity_gen, test_user, state_manager, mock_emitters + ): + """SSH Sysmon/eCAR flow destinations should agree with the process command line.""" + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + workstation = System( + hostname="WS-01", + ip="10.0.1.10", + os="Windows 11", + type="workstation", + ) + web_server = System( + hostname="WEB-EXT-01", + ip="10.0.3.10", + os="Ubuntu 22.04", + type="server", + 
roles=["web_server"], + ) + activity_gen._ip_to_system = {workstation.ip: workstation, web_server.ip: web_server} + activity_gen._all_system_ips = [workstation.ip, web_server.ip] + state_manager.set_current_time(timestamp) + process_name = r"C:\Windows\System32\OpenSSH\ssh.exe" + command_line = "ssh.exe testuser@WEB-EXT-01" + pid = activity_gen.generate_process( + test_user, + workstation, + timestamp, + "0x12345", + process_name, + command_line, + ) + mock_emitters["zeek_conn"].reset_mock() + + activity_gen._emit_process_network_correlation( + workstation, + process_name, + command_line, + timestamp, + pid, + random.Random(1), + ) + + network_events = [ + call.args[0] + for call in mock_emitters["zeek_conn"].emit.call_args_list + if call.args[0].event_type == "connection" + ] + assert network_events + assert network_events[-1].network.dst_ip == web_server.ip + assert network_events[-1].network.dst_port == 22 + def test_process_follow_on_file_event_after_process_create( self, activity_gen, test_user, test_system, state_manager, mock_emitters ): diff --git a/tests/unit/test_dns_realism.py b/tests/unit/test_dns_realism.py index 76eb839a..b6328c6c 100644 --- a/tests/unit/test_dns_realism.py +++ b/tests/unit/test_dns_realism.py @@ -926,6 +926,31 @@ def test_udp_dns_with_explicit_conn_state_uses_udp_history( assert event.network.history in {"Dd", "D"} assert not set(event.network.history) & set("SshAaFfRr") + def test_unanswered_udp_dns_keeps_request_payload( + self, activity_gen, timestamp, state_manager, mock_emitters + ): + """A Zeek dns-service S0 row still needs a visible UDP DNS request payload.""" + state_manager.set_current_time(timestamp) + + activity_gen.generate_connection( + src_ip="10.0.1.50", + dst_ip="8.8.8.8", + time=timestamp, + dst_port=53, + proto="udp", + service="dns", + conn_state="S0", + ) + + event = mock_emitters["zeek_conn"].emit.call_args[0][0] + assert event.network.protocol == "udp" + assert event.network.service == "dns" + assert event.network.conn_state == "S0" + assert event.network.history == "D" + assert event.network.orig_bytes >= 40 + assert event.network.orig_ip_bytes > event.network.orig_bytes + assert event.network.resp_bytes == 0 + def test_denied_dns_query_has_no_response_payload( self, activity_gen, timestamp, state_manager, mock_emitters ): @@ -963,6 +988,7 @@ def test_denied_dns_query_has_no_response_payload( event = mock_emitters["zeek_conn"].emit.call_args[0][0] assert event.network.conn_state == "S0" + assert event.network.orig_bytes >= 40 assert event.network.resp_bytes == 0 assert event.network.resp_pkts == 0 assert event.dns.answers == [] diff --git a/tests/unit/test_emitters.py b/tests/unit/test_emitters.py index 04837b20..4791eac2 100644 --- a/tests/unit/test_emitters.py +++ b/tests/unit/test_emitters.py @@ -241,6 +241,53 @@ def test_network_logon_workstation_name_uses_source_host(self, format_def, temp_ assert "FS-01.example.com" in content assert '%%1843' in content + def test_kerberos_network_logon_can_render_blank_workstation_name( + self, format_def, temp_output + ): + """Native Kerberos type-3 4624 often leaves WorkstationName unset.""" + emitter = WindowsEventEmitter(format_def, temp_output, buffer_size=1) + event = SecurityEvent( + timestamp=datetime(2024, 1, 15, 10, 0, 45, tzinfo=UTC), + event_type="logon", + src_host=HostContext( + hostname="WS-01", + ip="10.0.1.10", + fqdn="WS-01.example.com", + os="Windows 11", + os_category="windows", + system_type="workstation", + ), + dst_host=HostContext( + hostname="FS-01", + ip="10.0.2.20", + 
fqdn="FS-01.example.com", + os="Windows Server 2022", + os_category="windows", + system_type="server", + ), + auth=AuthContext( + username="jsmith", + user_sid="S-1-5-21-1-2-3-1001", + logon_id="0xkerb1", + logon_type=3, + source_ip="10.0.1.10", + auth_package="Kerberos", + logon_process="Kerberos", + lm_package="-", + subject_sid="S-1-5-18", + subject_username="SYSTEM", + subject_domain="NT AUTHORITY", + subject_logon_id="0x3e7", + reporting_pid=744, + ), + ) + + emitter.emit(event) + emitter.close() + + content = temp_output.read_text() + assert '-' in content + def test_logon_elevated_token_reflects_auth_context(self, format_def, temp_output): """4624 ElevatedToken should vary with canonical auth.elevated.""" emitter = WindowsEventEmitter(format_def, temp_output, buffer_size=10) diff --git a/tests/unit/test_sysmon_emitter.py b/tests/unit/test_sysmon_emitter.py index 7f015d20..186f699f 100644 --- a/tests/unit/test_sysmon_emitter.py +++ b/tests/unit/test_sysmon_emitter.py @@ -88,6 +88,18 @@ def test_emit_sysmon_process_create(self, format_def, temp_output): assert 'SHA1=ABC123' in content assert 'C:\\Windows\\explorer.exe' in content + def test_sysmon_thread_ids_reuse_pool_without_round_robin_balance( + self, format_def, temp_output + ): + """Sysmon provider threads should be reused in bursts, not perfectly round-robin.""" + emitter = SysmonEventEmitter(format_def, temp_output, buffer_size=1) + + thread_ids = [emitter._get_sysmon_thread_id("WS-01") for _ in range(120)] + counts = {thread_id: thread_ids.count(thread_id) for thread_id in set(thread_ids)} + + assert 3 <= len(counts) <= 5 + assert max(counts.values()) - min(counts.values()) >= 10 + def test_emit_sysmon_aligns_provider_execution_ids(self, format_def, temp_output): """Sysmon XML provider PID/TID values should be 4-byte aligned.""" emitter = SysmonEventEmitter(format_def, temp_output, buffer_size=1) From 16740cc4cb39f93784454d801e867fa704a9ffc8 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 23:11:29 -0400 Subject: [PATCH 28/61] docs: record loop 14 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 429fb275..b67355ea 100644 --- a/TODO.md +++ b/TODO.md @@ -381,7 +381,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 11 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `e9ff69c`: diversified Linux bash/syslog command texture with per-generation bash command memory, expanded common command tails, and data-driven sudo syslog placeholder pools. Verification passed with focused regressions, `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3102 passed, 37 skipped`). Regenerated eval passed at `95.83/100` across `78,559` records; hard probes showed max bash exact repeat dropped from `21` to `8` and max sudo exact repeat from `28` to `2`. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `84`, Network `74`, Host/EDR `88` (average `82.0`), indicating deeper concrete defects surfaced after the command-pool tell was reduced. Top Loop 12 targets are eCAR read-only command file-create artifacts, DHCP syslog ordering, Linux systemd parent PID ownership, Windows `root` identity bleed/4624 caller-process semantics, web static response byte/MIME state, and ICMP echo byte symmetry. 
- [x] Loop 12 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `b7c8a70`: repaired source-native host contradictions from the loop-11 panel, including read-only command output parsing, Linux PID 1 systemd ownership, DHCP client lifecycle ordering, Windows explicit-credential subject coercion, and 4624 caller-process semantics. Verification passed with focused regressions (`318 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3110 passed, 37 skipped`). Regenerated eval passed at `96.17/100` across `79,317` records; hard probes showed zero dash-prefixed read-only file artifacts, zero Linux systemd parent PID violations, zero Windows `root` identity mentions, zero 4624 caller-process mismatches, and zero DHCP ACK-before-REQUEST ordering failures. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `68`, Host/EDR `74` (average `73.5`), with top Loop 13 targets now static web/Zeek HTTP response semantics, native Sysmon Event 1/GUID fidelity, dual-Zeek sensor observation determinism, DC 4776 workstation attribution, and pooled host command/daemon texture. - [x] Loop 13 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `93463e4`: repaired static web/Zeek HTTP response semantics so cacheable hashed/static assets keep stable source-native content lengths and zero-body responses no longer render as successful `200` objects with concrete MIME bodies. Verification passed with focused HTTP/browsing/web tests (`69 passed, 1 skipped`), the broader HTTP/proxy/emitter slice (`286 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3117 passed, 37 skipped`). Regenerated eval passed at `96.64/100` across `82,065` records; hard probes showed zero static `200` zero-body rows, zero bad `304` body/MIME rows, zero zero-body `206` rows, zero refined web static `200` unstable-size groups, and zero Zeek static `200` unstable-size groups. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `68`, Network `72`, Host/EDR `82` (average `74.5`), and the prior web/static-body issue disappeared from reviewer findings. Top Loop 14 targets are SSH command target/network-destination contradictions, Zeek UDP/53 DNS-service zero-payload rows, native Kerberos 4624 `WorkstationName` semantics, Sysmon provider thread-ID distribution, and developer-tool current-directory realism. - - [ ] **IN PROGRESS** Loop 14 fix pass — repair the highest-leverage source-native contradictions from Loop 13, starting with SSH command-to-network destination ownership and Zeek UDP/53 DNS-service zero-payload rows; bundle Kerberos 4624 `WorkstationName` semantics if the owning layer is compact and low risk. + - [x] Loop 14 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `7a82449`: repaired SSH command-to-network destination ownership, preserved request payload for Zeek UDP/53 DNS-service failures, added native variance to Kerberos type 3 4624 `WorkstationName`, made Sysmon provider ThreadID selection less round-robin, and moved developer-tool `CurrentDirectory` values into user project directories. Verification passed with focused tests (`6 passed`), broader related slice (`350 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3122 passed, 37 skipped`). 
Regenerated eval passed at `96.60/100` across `81,716` records; hard probes found zero UDP/53 DNS zero/small request-payload rows, zero SSH command/network mismatches, `26.8%` blank/dash Kerberos type 3 `WorkstationName`, zero perfectly balanced Sysmon ThreadID hosts, and zero developer-tool install/system current directories. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `78`, Network `84`, Host/EDR `78` (average `79.0`). Top Loop 15 targets are Windows account-state contradictions after lockout/disabled failures, ICMP and ASA/Zeek source-native network contradictions, user-session process ownership for `sihost.exe`, high-volume Linux daemon boilerplate, bash cadence/typo texture, proxy HTTPS semantics, and OneDrive Sysmon Event 7 version metadata. + - [ ] **IN PROGRESS** Loop 15 fix pass — repair the highest-leverage Loop 14 findings, starting with Windows account-state semantics for locked/disabled 4625 failures before successful 4624 logons; then address ICMP/ASA-Zeek source-native contradictions if the first fix stays compact. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 9c9dcef8ba60eade0be6cdc40cf9d60fee4965fd Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 23:19:37 -0400 Subject: [PATCH 29/61] fix: enforce auth and network source semantics --- .../generation/activity/generator.py | 34 ++++++----- .../generation/emitters/cisco_asa.py | 2 +- .../generation/emitters/zeek_base.py | 5 +- tests/unit/test_baseline_canonical.py | 8 ++- tests/unit/test_cisco_asa_emitter.py | 12 ++++ tests/unit/test_phase5_failed_logon.py | 59 +++++++++++++++++-- tests/unit/test_zeek_multiplex.py | 43 ++++++++++++++ 7 files changed, 138 insertions(+), 25 deletions(-) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index f7254e88..c6628076 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -4020,32 +4020,26 @@ def generate_failed_logon( # Use target_username if provided, otherwise use the actor's username effective_username = target_username or user.username - # Determine failure substatus with correct SID handling + # Determine failure substatus with source-native account-state semantics. + # Ordinary known/enabled accounts should fail as bad passwords; locked + # or disabled states require an explicit account-state model so they do + # not contradict later successful logons. 
rng = _get_rng() - substatus_roll = rng.random() known_account = self._is_known_failed_logon_account(effective_username, user) failed_profile = self._failed_logon_profile(logon_type, system, source_ip, rng) validation_path = self._failed_logon_validation_path(logon_type, failed_profile, rng) - if known_account and substatus_roll < 0.80: - substatus = "0xc000006a" # Wrong password + if self._is_disabled_failed_logon_account(effective_username, user): + substatus = "0xc0000072" # Account disabled user_sid = self._get_sid(effective_username) - failure_reason = "%%2313" - elif not known_account and substatus_roll < 0.60: + failure_reason = "%%2307" + elif not known_account: substatus = "0xc0000064" # User not found: NULL SID user_sid = "S-1-0-0" failure_reason = "%%2313" - elif substatus_roll < 0.85: + else: substatus = "0xc000006a" # Wrong password user_sid = self._get_sid(effective_username) failure_reason = "%%2313" - elif substatus_roll < 0.95: - substatus = "0xc0000234" # Account locked out - user_sid = self._get_sid(effective_username) - failure_reason = "%%2304" - else: - substatus = "0xc0000072" # Account disabled - user_sid = self._get_sid(effective_username) - failure_reason = "%%2307" remote_linux_source = ( _get_os_category(system.os) == "linux" @@ -4365,6 +4359,16 @@ def _is_known_failed_logon_account(self, username: str, actor: User) -> bool: return True return False + @staticmethod + def _is_disabled_failed_logon_account(username: str, actor: User) -> bool: + """Return whether this failed-logon target is explicitly disabled.""" + if actor.enabled: + return False + normalized = username.split("@", 1)[0].lower() + if normalized == actor.username.lower(): + return True + return bool(actor.email and username.lower() == actor.email.lower()) + def generate_logoff( self, user: User, diff --git a/src/evidenceforge/generation/emitters/cisco_asa.py b/src/evidenceforge/generation/emitters/cisco_asa.py index 0233cd0f..32f5ebd1 100644 --- a/src/evidenceforge/generation/emitters/cisco_asa.py +++ b/src/evidenceforge/generation/emitters/cisco_asa.py @@ -45,7 +45,7 @@ # ASA facility: local4 (20) _ASA_FACILITY = 20 -_TCP_SUCCESS_TEARDOWN_REASONS = ("TCP FINs", "TCP FINs", "TCP FINs", "TCP Reset-O", "TCP Reset-I") +_TCP_SUCCESS_TEARDOWN_REASONS = ("TCP FINs",) _TCP_PARTIAL_TEARDOWN_REASONS = ("Conn-timeout", "TCP Reset-O", "TCP Reset-I") diff --git a/src/evidenceforge/generation/emitters/zeek_base.py b/src/evidenceforge/generation/emitters/zeek_base.py index 85f899fc..34e2c4eb 100644 --- a/src/evidenceforge/generation/emitters/zeek_base.py +++ b/src/evidenceforge/generation/emitters/zeek_base.py @@ -564,7 +564,10 @@ def _dispatch(self, event_data: dict[str, Any]) -> None: render_data["ts"] = ts + timedelta(microseconds=sensor_delay_us) elif isinstance(ts, (int, float)): render_data["ts"] = ts + sensor_delay_us / 1_000_000 - if render_data.get("_allow_sensor_observation_variance"): + if ( + render_data.get("_allow_sensor_observation_variance") + and str(render_data.get("proto") or "").lower() != "icmp" + ): _apply_sensor_observation_variance(render_data, hostname, original_uid) _enforce_http_body_invariants(render_data) _enforce_ip_byte_invariants(render_data) diff --git a/tests/unit/test_baseline_canonical.py b/tests/unit/test_baseline_canonical.py index 3f3ad9cb..5d7b6609 100644 --- a/tests/unit/test_baseline_canonical.py +++ b/tests/unit/test_baseline_canonical.py @@ -608,7 +608,11 @@ def test_remote_linux_failed_logon_reuses_ssh_source_port_for_zeek_tuple( match = re.search(r"from (?P\S+) port 
(?P\d+) ssh2", syslog_event.syslog.message) assert match is not None ssh_source_port = int(match.group("port")) - zeek_events = [call.args[0] for call in mock_emitters["zeek_conn"].emit.call_args_list] + zeek_events = [ + call.args[0] + for call in mock_emitters["zeek_conn"].emit.call_args_list + if call.args[0].event_type == "connection" and call.args[0].network is not None + ] assert any( event.network.src_ip == source_ip @@ -616,7 +620,7 @@ def test_remote_linux_failed_logon_reuses_ssh_source_port_for_zeek_tuple( and event.network.dst_ip == linux.ip and event.network.dst_port == 22 and event.network.service == "ssh" - and event.timestamp < syslog_event.timestamp + and abs((event.timestamp - syslog_event.timestamp).total_seconds()) <= 1.0 for event in zeek_events ) diff --git a/tests/unit/test_cisco_asa_emitter.py b/tests/unit/test_cisco_asa_emitter.py index 43cac3d0..13efdc6e 100644 --- a/tests/unit/test_cisco_asa_emitter.py +++ b/tests/unit/test_cisco_asa_emitter.py @@ -318,6 +318,18 @@ def test_teardown_byte_count_is_not_exact_zeek_payload_sum(self, asa_emitter, tm assert byte_match is not None assert int(byte_match.group(1)) != 5120 + def test_successful_tcp_teardown_uses_fin_reason(self, asa_emitter, tmp_path): + """ASA teardown reason should agree with a normal Zeek SF/FIN close.""" + event = _make_connection_event(protocol="tcp", conn_state="SF") + + asa_emitter.emit(event) + asa_emitter.flush() + + output = (tmp_path / "fw01" / "cisco_asa.log").read_text() + teardown = next(line for line in output.splitlines() if "%ASA-6-302014:" in line) + assert "TCP FINs" in teardown + assert "TCP Reset" not in teardown + def test_same_interface_permit_is_not_rendered_as_perimeter_flow(self, asa_emitter, tmp_path): """ASA should not mirror same-interface internal permits by default.""" event = _make_connection_event( diff --git a/tests/unit/test_phase5_failed_logon.py b/tests/unit/test_phase5_failed_logon.py index 23d8be2a..c506857a 100644 --- a/tests/unit/test_phase5_failed_logon.py +++ b/tests/unit/test_phase5_failed_logon.py @@ -89,13 +89,60 @@ def test_emits_failed_logon( assert event.event_type == "failed_logon" assert event.auth.username == "alice.smith" assert event.auth.failure_status == "0xc000006d" - assert event.auth.failure_substatus in ( - "0xc000006a", - "0xc0000064", - "0xc0000234", - "0xc0000072", + assert event.auth.failure_substatus == "0xc000006a" + + def test_enabled_known_user_failed_logon_never_uses_stateful_substatus( + self, activity_gen, test_user, win_system, timestamp, state_manager, mock_emitters + ): + """Enabled known accounts should fail as bad passwords unless state is modeled.""" + state_manager.set_current_time(timestamp) + + for _ in range(50): + activity_gen.generate_failed_logon(test_user, win_system, timestamp) + + events = [ + call[0][0] + for call in mock_emitters["windows_event_security"].emit.call_args_list + if call[0][0].event_type == "failed_logon" + ] + assert events + assert {event.auth.failure_substatus for event in events} == {"0xc000006a"} + + def test_disabled_user_failed_logon_uses_disabled_substatus( + self, activity_gen, win_system, timestamp, state_manager, mock_emitters + ): + """Explicitly disabled accounts should render disabled-account failures.""" + state_manager.set_current_time(timestamp) + disabled_user = User( + username="svc_old_backup", + full_name="svc_old_backup", + email="svc_old_backup@example.com", + enabled=False, + ) + + activity_gen.generate_failed_logon(disabled_user, win_system, timestamp, logon_type=4) + + event = 
mock_emitters["windows_event_security"].emit.call_args[0][0] + assert event.auth.failure_substatus == "0xc0000072" + assert event.auth.failure_reason == "%%2307" + + def test_unknown_target_failed_logon_uses_unknown_user_substatus( + self, activity_gen, test_user, win_system, timestamp, state_manager, mock_emitters + ): + """Unknown target accounts should not be rendered as locked or disabled users.""" + state_manager.set_current_time(timestamp) + + activity_gen.generate_failed_logon( + test_user, + win_system, + timestamp, + target_username="not.a.real.user", ) + event = mock_emitters["windows_event_security"].emit.call_args[0][0] + assert event.auth.failure_substatus == "0xc0000064" + assert event.auth.user_sid == "S-1-0-0" + def test_no_session_created( self, activity_gen, test_user, win_system, timestamp, state_manager ): @@ -426,7 +473,7 @@ def test_known_user_failed_logon_uses_wrong_password_substatus( if call[0][0].event_type == "failed_logon" ] assert events - assert all(event.auth.failure_substatus != "0xc0000064" for event in events) + assert {event.auth.failure_substatus for event in events} == {"0xc000006a"} def test_interactive_failed_logon_uses_local_windows_shape( self, state_manager, mock_emitters, timestamp diff --git a/tests/unit/test_zeek_multiplex.py b/tests/unit/test_zeek_multiplex.py index 6329406c..4e3669a9 100644 --- a/tests/unit/test_zeek_multiplex.py +++ b/tests/unit/test_zeek_multiplex.py @@ -119,6 +119,49 @@ def test_second_sensor_observation_preserves_lossless_packetization(self): assert row["orig_ip_bytes"] >= row["orig_bytes"] + (40 * row["orig_pkts"]) assert row["resp_ip_bytes"] >= row["resp_bytes"] + (40 * row["resp_pkts"]) + def test_sensor_observation_preserves_icmp_echo_accounting(self): + """ICMP echo payload and IP-byte accounting should not vary by sensor.""" + fmt = load_format("zeek_conn") + with tempfile.TemporaryDirectory() as tmpdir: + base = Path(tmpdir) + emitter = ZeekEmitter(fmt, base, sensor_hostnames=["core", "dmz"]) + + emitter.emit_event( + { + "ts": datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC), + "uid": "CTestIcmp1234567", + "id.orig_h": "10.0.0.1", + "id.orig_p": 8, + "id.resp_h": "10.0.0.2", + "id.resp_p": 0, + "proto": "icmp", + "service": "icmp", + "duration": 0.04, + "orig_bytes": 120, + "resp_bytes": 120, + "orig_pkts": 1, + "resp_pkts": 1, + "orig_ip_bytes": 148, + "resp_ip_bytes": 148, + "conn_state": "SF", + "history": "Dd", + "_allow_sensor_observation_variance": True, + "_sensor_hostnames": ["core", "dmz"], + } + ) + emitter.close() + + core = json.loads((base / "core" / "conn.json").read_text().splitlines()[0]) + dmz = json.loads((base / "dmz" / "conn.json").read_text().splitlines()[0]) + + assert core["uid"] != dmz["uid"] + assert core["ts"] != dmz["ts"] + for row in (core, dmz): + assert row["orig_bytes"] == row["resp_bytes"] == 120 + assert row["orig_ip_bytes"] == row["resp_ip_bytes"] == 148 + assert row["orig_ip_bytes"] - row["orig_bytes"] == 28 + assert row["resp_ip_bytes"] - row["resp_bytes"] == 28 + def test_sensor_timestamp_offsets_vary_by_flow(self): """Cross-sensor timestamps should not collapse into one fixed offset band.""" fmt = load_format("zeek_conn") From 024db1a406f3b643ed212bce8c3fc1762ac96939 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Fri, 15 May 2026 23:38:25 -0400 Subject: [PATCH 30/61] docs: record loop 15 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index b67355ea..393d0916 100644 --- a/TODO.md +++ b/TODO.md @@ -382,7 +382,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 12 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `b7c8a70`: repaired source-native host contradictions from the loop-11 panel, including read-only command output parsing, Linux PID 1 systemd ownership, DHCP client lifecycle ordering, Windows explicit-credential subject coercion, and 4624 caller-process semantics. Verification passed with focused regressions (`318 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3110 passed, 37 skipped`). Regenerated eval passed at `96.17/100` across `79,317` records; hard probes showed zero dash-prefixed read-only file artifacts, zero Linux systemd parent PID violations, zero Windows `root` identity mentions, zero 4624 caller-process mismatches, and zero DHCP ACK-before-REQUEST ordering failures. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `68`, Host/EDR `74` (average `73.5`), with top Loop 13 targets now static web/Zeek HTTP response semantics, native Sysmon Event 1/GUID fidelity, dual-Zeek sensor observation determinism, DC 4776 workstation attribution, and pooled host command/daemon texture. - [x] Loop 13 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `93463e4`: repaired static web/Zeek HTTP response semantics so cacheable hashed/static assets keep stable source-native content lengths and zero-body responses no longer render as successful `200` objects with concrete MIME bodies. Verification passed with focused HTTP/browsing/web tests (`69 passed, 1 skipped`), the broader HTTP/proxy/emitter slice (`286 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3117 passed, 37 skipped`). Regenerated eval passed at `96.64/100` across `82,065` records; hard probes showed zero static `200` zero-body rows, zero bad `304` body/MIME rows, zero zero-body `206` rows, zero refined web static `200` unstable-size groups, and zero Zeek static `200` unstable-size groups. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `68`, Network `72`, Host/EDR `82` (average `74.5`), and the prior web/static-body issue disappeared from reviewer findings. Top Loop 14 targets are SSH command target/network-destination contradictions, Zeek UDP/53 DNS-service zero-payload rows, native Kerberos 4624 `WorkstationName` semantics, Sysmon provider thread-ID distribution, and developer-tool current-directory realism. - [x] Loop 14 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `7a82449`: repaired SSH command-to-network destination ownership, preserved request payload for Zeek UDP/53 DNS-service failures, added native variance to Kerberos type 3 4624 `WorkstationName`, made Sysmon provider ThreadID selection less round-robin, and moved developer-tool `CurrentDirectory` values into user project directories. 
Verification passed with focused tests (`6 passed`), broader related slice (`350 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3122 passed, 37 skipped`). Regenerated eval passed at `96.60/100` across `81,716` records; hard probes found zero UDP/53 DNS zero/small request-payload rows, zero SSH command/network mismatches, `26.8%` blank/dash Kerberos type 3 `WorkstationName`, zero perfectly balanced Sysmon ThreadID hosts, and zero developer-tool install/system current directories. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `78`, Network `84`, Host/EDR `78` (average `79.0`). Top Loop 15 targets are Windows account-state contradictions after lockout/disabled failures, ICMP and ASA/Zeek source-native network contradictions, user-session process ownership for `sihost.exe`, high-volume Linux daemon boilerplate, bash cadence/typo texture, proxy HTTPS semantics, and OneDrive Sysmon Event 7 version metadata. - - [ ] **IN PROGRESS** Loop 15 fix pass — repair the highest-leverage Loop 14 findings, starting with Windows account-state semantics for locked/disabled 4625 failures before successful 4624 logons; then address ICMP/ASA-Zeek source-native contradictions if the first fix stays compact. + - [x] Loop 15 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `9c9dcef`: enforced source-native auth state semantics so enabled known accounts fail as bad passwords, disabled synthetic accounts fail as disabled, and unknown targets fail as no-such-user; preserved ICMP echo payload/IP-byte accounting across Zeek sensors; and made ASA successful TCP teardown reasons agree with Zeek `SF`/FIN semantics. Verification passed with focused auth/Zeek/ASA tests (`90 passed`), broader related slice (`302 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3127 passed, 37 skipped`). Regenerated eval passed at `95.52/100` across `83,251` records; hard probes found zero locked/disabled-before-success auth violations, zero bad ICMP echo accounting rows, and zero ASA reset-to-Zeek-`SF` mismatches. Blind synthetic-confidence scores were Threat Hunter `74`, Detection `68`, Network `72`, Host/EDR `64` (average `69.5`). Top Loop 16 targets are Windows `.cmd`/`.bat` process-image contradictions, Zeek interval float precision, Linux journald/syslog boilerplate volume, Sysmon ProcessGuid embedded-time clustering, active TCP source-port reuse, Mimikatz LSASS remote-thread over-modeling, Sysmon Event 3/Event 10 texture, and bash command/cadence repetition. + - [ ] **IN PROGRESS** Loop 16 fix pass — repair high-leverage source-native findings from Loop 15, starting with Windows `.cmd`/`.bat` process-image canonicalization and Zeek interval microsecond precision; bundle low-risk Linux/syslog volume reduction if compact. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). 
Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 21f3a79feada36008aca53d184a043a004a36ee0 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Fri, 15 May 2026 23:48:06 -0400 Subject: [PATCH 31/61] fix: repair source-native process and zeek texture --- TODO.md | 3 +- .../generation/activity/generator.py | 30 +++++++++++++++++ .../generation/emitters/zeek_base.py | 27 ++++++++++++++-- .../generation/engine/baseline.py | 2 +- tests/unit/test_activity.py | 32 +++++++++++++++++++ tests/unit/test_zeek_format_accuracy.py | 24 +++++++------- 6 files changed, 100 insertions(+), 18 deletions(-) diff --git a/TODO.md b/TODO.md index 393d0916..027db95f 100644 --- a/TODO.md +++ b/TODO.md @@ -383,7 +383,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 13 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `93463e4`: repaired static web/Zeek HTTP response semantics so cacheable hashed/static assets keep stable source-native content lengths and zero-body responses no longer render as successful `200` objects with concrete MIME bodies. Verification passed with focused HTTP/browsing/web tests (`69 passed, 1 skipped`), the broader HTTP/proxy/emitter slice (`286 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3117 passed, 37 skipped`). Regenerated eval passed at `96.64/100` across `82,065` records; hard probes showed zero static `200` zero-body rows, zero bad `304` body/MIME rows, zero zero-body `206` rows, zero refined web static `200` unstable-size groups, and zero Zeek static `200` unstable-size groups. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `68`, Network `72`, Host/EDR `82` (average `74.5`), and the prior web/static-body issue disappeared from reviewer findings. Top Loop 14 targets are SSH command target/network-destination contradictions, Zeek UDP/53 DNS-service zero-payload rows, native Kerberos 4624 `WorkstationName` semantics, Sysmon provider thread-ID distribution, and developer-tool current-directory realism. - [x] Loop 14 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `7a82449`: repaired SSH command-to-network destination ownership, preserved request payload for Zeek UDP/53 DNS-service failures, added native variance to Kerberos type 3 4624 `WorkstationName`, made Sysmon provider ThreadID selection less round-robin, and moved developer-tool `CurrentDirectory` values into user project directories. Verification passed with focused tests (`6 passed`), broader related slice (`350 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3122 passed, 37 skipped`). Regenerated eval passed at `96.60/100` across `81,716` records; hard probes found zero UDP/53 DNS zero/small request-payload rows, zero SSH command/network mismatches, `26.8%` blank/dash Kerberos type 3 `WorkstationName`, zero perfectly balanced Sysmon ThreadID hosts, and zero developer-tool install/system current directories. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `78`, Network `84`, Host/EDR `78` (average `79.0`). 
Top Loop 15 targets are Windows account-state contradictions after lockout/disabled failures, ICMP and ASA/Zeek source-native network contradictions, user-session process ownership for `sihost.exe`, high-volume Linux daemon boilerplate, bash cadence/typo texture, proxy HTTPS semantics, and OneDrive Sysmon Event 7 version metadata. - [x] Loop 15 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `9c9dcef`: enforced source-native auth state semantics so enabled known accounts fail as bad passwords, disabled synthetic accounts fail as disabled, and unknown targets fail as no-such-user; preserved ICMP echo payload/IP-byte accounting across Zeek sensors; and made ASA successful TCP teardown reasons agree with Zeek `SF`/FIN semantics. Verification passed with focused auth/Zeek/ASA tests (`90 passed`), broader related slice (`302 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3127 passed, 37 skipped`). Regenerated eval passed at `95.52/100` across `83,251` records; hard probes found zero locked/disabled-before-success auth violations, zero bad ICMP echo accounting rows, and zero ASA reset-to-Zeek-`SF` mismatches. Blind synthetic-confidence scores were Threat Hunter `74`, Detection `68`, Network `72`, Host/EDR `64` (average `69.5`). Top Loop 16 targets are Windows `.cmd`/`.bat` process-image contradictions, Zeek interval float precision, Linux journald/syslog boilerplate volume, Sysmon ProcessGuid embedded-time clustering, active TCP source-port reuse, Mimikatz LSASS remote-thread over-modeling, Sysmon Event 3/Event 10 texture, and bash command/cadence repetition. - - [ ] **IN PROGRESS** Loop 16 fix pass — repair high-leverage source-native findings from Loop 15, starting with Windows `.cmd`/`.bat` process-image canonicalization and Zeek interval microsecond precision; bundle low-risk Linux/syslog volume reduction if compact. + - [x] Loop 16 fix pass completed and verified: Windows `.cmd`/`.bat` process launches now canonicalize to the real `cmd.exe` host before state/event creation, shared Zeek JSON rendering rounds float values to microsecond precision, and Linux `systemd-journald` storage-accounting boilerplate takes a smaller share of ambient syslog noise. Verification passed with focused regressions, broader process/catalog/Zeek/syslog slices, `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3128 passed, 37 skipped`). Regeneration and blind review follow. + - [ ] **IN PROGRESS** Loop 16 regeneration, hard probes, quantitative eval, and blind review. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. 
Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index c6628076..f1798c78 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -716,6 +716,25 @@ def _extract_image_from_command(command_line: str) -> str: return cleaned.split()[0] +def _windows_script_host_process( + process_name: str, + command_line: str, +) -> tuple[str, str]: + """Return the real Windows process image for batch-script execution.""" + basename = ntpath.basename(process_name).lower() + if not basename.endswith((".cmd", ".bat")): + return process_name, command_line + + host_image = r"C:\Windows\System32\cmd.exe" + stripped = command_line.strip() + command_lower = stripped.lower() + if command_lower.startswith(("cmd.exe ", r"c:\windows\system32\cmd.exe ")): + return host_image, command_line + if command_lower.startswith("cmd "): + return host_image, f"cmd.exe {stripped[4:]}" + return host_image, f"cmd.exe /c {stripped or ntpath.basename(process_name)}" + + def _windows_token_profile(username: str, integrity_level: str) -> tuple[str, str, str]: """Return source-native Windows token fields for a process owner.""" normalized = username.upper().split("\\")[-1] @@ -4824,6 +4843,11 @@ def generate_process( from evidenceforge.events.contexts import ProcessContext self.state_manager.set_current_time(time) + if _get_os_category(system.os) == "windows": + process_name, command_line = _windows_script_host_process( + process_name, + command_line, + ) # Determine integrity level per UAC model: # - SYSTEM processes: "System" (handled in generate_system_process) @@ -8103,6 +8127,12 @@ def generate_system_process( """ from evidenceforge.events.contexts import ProcessContext + if _get_os_category(system.os) == "windows": + process_name, command_line = _windows_script_host_process( + process_name, + command_line, + ) + exe_name = ntpath.basename(process_name).lower() if _get_os_category(system.os) == "windows" and exe_name in _WINDOWS_SINGLETON_SERVICE_EXES: for proc in self.state_manager.get_processes_on_system(system.hostname): diff --git a/src/evidenceforge/generation/emitters/zeek_base.py b/src/evidenceforge/generation/emitters/zeek_base.py index 34e2c4eb..2823fae9 100644 --- a/src/evidenceforge/generation/emitters/zeek_base.py +++ b/src/evidenceforge/generation/emitters/zeek_base.py @@ -65,6 +65,29 @@ def _swap_host_list_value(value: Any, original_ip: Any, visible_ip: Any) -> Any: return [visible_ip if item == original_ip else item for item in value] +def _round_zeek_float(value: float) -> float: + """Round Zeek interval-like values to source-native microsecond precision.""" + rounded = round(value, 6) + if rounded == 0 and value > 0: + return 0.000001 + if rounded == 0 and value < 0: + return -0.000001 + return rounded + + +def _normalize_zeek_float_precision(value: Any) -> Any: + """Normalize floats in rendered Zeek JSON while preserving JSON structure.""" + if isinstance(value, bool): + return value + if isinstance(value, float): + return _round_zeek_float(value) + if isinstance(value, list): + return [_normalize_zeek_float_precision(item) for item in value] + if isinstance(value, dict): + return {key: _normalize_zeek_float_precision(item) for key, item in value.items()} + return value + + def 
_sensor_variation_fraction(hostname: str, uid: Any, field: str, magnitude: float) -> float: """Return a deterministic signed per-sensor observation variation.""" seed = _stable_seed(f"zeek_sensor_observation:{hostname}:{uid}:{field}") @@ -641,9 +664,7 @@ def _render_zeek_json(self, event_data: dict[str, Any]) -> str: rendered = self._template.render(**template_context) try: data = json.loads(rendered) - # Round timestamp to 6 decimal places (Zeek standard) - if "ts" in data and isinstance(data["ts"], float): - data["ts"] = round(data["ts"], 6) + data = _normalize_zeek_float_precision(data) return json.dumps(data, separators=(",", ":")) except json.JSONDecodeError: return rendered.strip() diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index 68ead926..f59d3733 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -5540,7 +5540,7 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 message=msg, pid=sys_pids.get("timesyncd", rng.randint(400, 800)), ) - elif source_roll < 0.59: + elif source_roll < 0.51: # Journald runtime statistics (max_size and type stable per host) machine_id = self._machine_ids.get(system.hostname, "0" * 32) _j_rng = random.Random(_stable_seed(f"journald:{system.hostname}")) diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 23232de5..d966baa1 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -1601,6 +1601,38 @@ def test_generate_process_creates_process( assert event.process.image == process_name assert event.process.command_line == command_line + def test_generate_process_hosts_windows_batch_scripts_under_cmd( + self, activity_gen, test_user, test_system, state_manager, mock_emitters + ): + """Windows batch scripts should not become the process image.""" + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + state_manager.set_current_time(timestamp) + logon_id = "0x12345" + + pid = activity_gen.generate_process( + test_user, + test_system, + timestamp, + logon_id, + r"C:\Program Files\nodejs\npm.cmd", + "cmd.exe /c npm run dev", + ) + + proc = state_manager.get_process(test_system.hostname, pid) + assert proc is not None + assert proc.image == r"C:\Windows\System32\cmd.exe" + assert proc.command_line == "cmd.exe /c npm run dev" + + process_event = next( + call[0][0] + for call in mock_emitters["windows_event_security"].emit.call_args_list + if call[0][0].event_type == "process_create" + and call[0][0].process + and call[0][0].process.pid == pid + ) + assert process_event.process.image == r"C:\Windows\System32\cmd.exe" + assert process_event.process.command_line == "cmd.exe /c npm run dev" + def test_generate_process_derives_user_current_directory( self, activity_gen, test_user, test_system, state_manager, mock_emitters ): diff --git a/tests/unit/test_zeek_format_accuracy.py b/tests/unit/test_zeek_format_accuracy.py index c249510a..da113e7e 100644 --- a/tests/unit/test_zeek_format_accuracy.py +++ b/tests/unit/test_zeek_format_accuracy.py @@ -209,11 +209,8 @@ def test_timestamp_precision(self): if output_file.exists(): output_file.unlink() - def test_duration_full_precision(self): - """Verify conn.log duration uses full float precision, not rounded to 6 decimals. - - Real Zeek duration example: 0.0002410411834716797 (19 decimal digits). 
- """ + def test_duration_uses_microsecond_precision(self): + """Verify conn.log duration is rendered at native-looking microsecond precision.""" from evidenceforge.formats import load_format from evidenceforge.generation.emitters.zeek import ZeekEmitter @@ -257,17 +254,13 @@ def test_duration_full_precision(self): raw_line = f.readline() generated = json.loads(raw_line) - # Duration must NOT be rounded to 6 decimal places - assert generated["duration"] == test_duration, ( - f"Duration should preserve full precision: " - f"expected {test_duration}, got {generated['duration']}" - ) + assert generated["duration"] == 0.000241 - # Verify the raw JSON string has more than 6 decimal digits for duration + # Verify the raw JSON string does not expose raw Python float precision. duration_str = raw_line.split('"duration":')[1].split(",")[0] decimal_part = duration_str.split(".")[1] - assert len(decimal_part) > 6, ( - f"Duration should have >6 decimal digits in JSON, got: {duration_str}" + assert len(decimal_part) <= 6, ( + f"Duration should have <=6 decimal digits in JSON, got: {duration_str}" ) finally: @@ -404,6 +397,11 @@ def test_dns_emitter_output_fields(self): # Verify rtt is float assert isinstance(generated["rtt"], float) + rtt_str = line.split('"rtt":')[1].split(",")[0] + decimal_part = rtt_str.split(".")[1] + assert len(decimal_part) <= 6, ( + f"RTT should have <=6 decimal digits in JSON, got: {rtt_str}" + ) finally: if output_file.exists(): From aeb457b8e75af4e7f0165532ab867ffc8b7dbb53 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 00:07:50 -0400 Subject: [PATCH 32/61] docs: record loop 16 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 027db95f..1573b54d 100644 --- a/TODO.md +++ b/TODO.md @@ -384,7 +384,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 14 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `7a82449`: repaired SSH command-to-network destination ownership, preserved request payload for Zeek UDP/53 DNS-service failures, added native variance to Kerberos type 3 4624 `WorkstationName`, made Sysmon provider ThreadID selection less round-robin, and moved developer-tool `CurrentDirectory` values into user project directories. Verification passed with focused tests (`6 passed`), broader related slice (`350 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3122 passed, 37 skipped`). Regenerated eval passed at `96.60/100` across `81,716` records; hard probes found zero UDP/53 DNS zero/small request-payload rows, zero SSH command/network mismatches, `26.8%` blank/dash Kerberos type 3 `WorkstationName`, zero perfectly balanced Sysmon ThreadID hosts, and zero developer-tool install/system current directories. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `78`, Network `84`, Host/EDR `78` (average `79.0`). Top Loop 15 targets are Windows account-state contradictions after lockout/disabled failures, ICMP and ASA/Zeek source-native network contradictions, user-session process ownership for `sihost.exe`, high-volume Linux daemon boilerplate, bash cadence/typo texture, proxy HTTPS semantics, and OneDrive Sysmon Event 7 version metadata. 
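The "less round-robin" Sysmon provider ThreadID change called out in the Loop 14 entry above amounts to reusing a small per-host pool of thread IDs in sticky bursts, so per-ID counts come out uneven rather than perfectly balanced. A minimal sketch under assumed names, pool size, and burst lengths (the emitter's real selection logic is not shown here):

```python
import random


class BurstyThreadIdPicker:
    """Reuse provider thread IDs in sticky bursts instead of an even rotation (illustrative)."""

    def __init__(self, rng: random.Random, pool_size: int = 4) -> None:
        self._rng = rng
        # Small per-host pool of 4-byte-aligned thread IDs, as the emitter tests expect.
        self._pool = [rng.randint(1000, 9000) & ~0x3 for _ in range(pool_size)]
        self._current = self._pool[0]
        self._remaining = 0  # draws left before switching to another pooled ID

    def next_thread_id(self) -> int:
        if self._remaining <= 0:
            self._current = self._rng.choice(self._pool)
            self._remaining = self._rng.randint(5, 25)
        self._remaining -= 1
        return self._current


picker = BurstyThreadIdPicker(random.Random(3))
draws = [picker.next_thread_id() for _ in range(120)]
counts = {tid: draws.count(tid) for tid in set(draws)}
print(sorted(counts.values()))  # typically uneven, unlike a perfect 30/30/30/30 rotation
```

This is the distribution shape the `test_sysmon_thread_ids_reuse_pool_without_round_robin_balance` regression checks for: a handful of distinct IDs over many draws, with a clear imbalance between the most- and least-used ID.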
- [x] Loop 15 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `9c9dcef`: enforced source-native auth state semantics so enabled known accounts fail as bad passwords, disabled synthetic accounts fail as disabled, and unknown targets fail as no-such-user; preserved ICMP echo payload/IP-byte accounting across Zeek sensors; and made ASA successful TCP teardown reasons agree with Zeek `SF`/FIN semantics. Verification passed with focused auth/Zeek/ASA tests (`90 passed`), broader related slice (`302 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3127 passed, 37 skipped`). Regenerated eval passed at `95.52/100` across `83,251` records; hard probes found zero locked/disabled-before-success auth violations, zero bad ICMP echo accounting rows, and zero ASA reset-to-Zeek-`SF` mismatches. Blind synthetic-confidence scores were Threat Hunter `74`, Detection `68`, Network `72`, Host/EDR `64` (average `69.5`). Top Loop 16 targets are Windows `.cmd`/`.bat` process-image contradictions, Zeek interval float precision, Linux journald/syslog boilerplate volume, Sysmon ProcessGuid embedded-time clustering, active TCP source-port reuse, Mimikatz LSASS remote-thread over-modeling, Sysmon Event 3/Event 10 texture, and bash command/cadence repetition. - [x] Loop 16 fix pass completed and verified: Windows `.cmd`/`.bat` process launches now canonicalize to the real `cmd.exe` host before state/event creation, shared Zeek JSON rendering rounds float values to microsecond precision, and Linux `systemd-journald` storage-accounting boilerplate takes a smaller share of ambient syslog noise. Verification passed with focused regressions, broader process/catalog/Zeek/syslog slices, `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3128 passed, 37 skipped`). Regeneration and blind review follow. - - [ ] **IN PROGRESS** Loop 16 regeneration, hard probes, quantitative eval, and blind review. + - [x] Loop 16 regeneration, hard probes, quantitative eval, and blind review completed from commit `21f3a79`: regenerated output passed automated eval at exact `96.09/100` across `81,173` records; hard probes confirmed zero Windows/eCAR `.cmd`/`.bat` process-image rows, zero Zeek interval precision violations with max six decimal places, journald at `419/9,101` syslog rows (`4.6%`), and no active TCP source-port overlaps in this regenerated dataset. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `76`, Network `82`, Host/EDR `78` (average `77.0`). Top Loop 17 targets are Zeek UDP/IP byte arithmetic and paired DNS payload contradictions, Windows lock/unlock binding to a Type 3 network-logon LUID, and Linux SSH/syslog microsecond suffix preservation. + - [ ] **IN PROGRESS** Loop 17 fix pass — repair Loop 16's highest-leverage source-native contradictions, starting with Zeek UDP/IP byte accounting and cross-sensor DNS payload consistency; also fix Windows lock/unlock LUID ownership if compact, with Linux SSH/syslog timestamp suffix jitter next. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. 
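The Zeek UDP/IP byte accounting named as the first Loop 17 target reduces to fixed per-packet header arithmetic: each UDP or ICMP echo packet adds a 20-byte IPv4 header plus an 8-byte transport header on top of the payload, so a one-packet direction should show `ip_bytes - bytes == 28`. A small standalone check of that arithmetic (helper names are illustrative; constants assume IPv4 without options):

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header with no options
UDP_HEADER = 8
ICMP_HEADER = 8   # echo request/reply header


def udp_ip_bytes(payload_bytes: int, packets: int) -> int:
    """conn.log-style ip_bytes for one UDP direction: payload plus per-packet headers."""
    return payload_bytes + packets * (IPV4_HEADER + UDP_HEADER)


def icmp_echo_ip_bytes(payload_bytes: int, packets: int) -> int:
    """Same accounting for one ICMP echo direction."""
    return payload_bytes + packets * (IPV4_HEADER + ICMP_HEADER)


# One-packet 80-byte DNS query -> 108 IP bytes; 120-byte ICMP echo -> 148 IP bytes,
# i.e. the fixed 28-byte delta the regression tests probe for.
assert udp_ip_bytes(80, 1) == 108
assert icmp_echo_ip_bytes(120, 1) == 148
```

The later `zeek_base.py` change in this series enforces exactly this invariant for UDP rows (and locks ICMP/DNS packet accounting across sensors), which is why the multiplex tests assert a constant 28-byte difference per packet.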
Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From eaf090ada2233e1206e770a753adc4aa51be8789 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 00:18:11 -0400 Subject: [PATCH 33/61] fix: repair network and session source semantics --- .../generation/activity/generator.py | 85 ++++++++++++++---- .../generation/emitters/zeek_base.py | 23 ++++- .../generation/engine/baseline.py | 18 +++- .../generation/engine/storyline.py | 38 ++++++-- tests/unit/test_activity.py | 39 +++++++++ tests/unit/test_storyline_command_networks.py | 86 ++++++++++++++++++- tests/unit/test_zeek_activity_contexts.py | 4 + tests/unit/test_zeek_fanout.py | 12 ++- tests/unit/test_zeek_multiplex.py | 34 ++++++++ 9 files changed, 306 insertions(+), 33 deletions(-) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index f1798c78..dffb40e4 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -682,6 +682,30 @@ def _nmap_conn_state(port: int, target_system: System | None = None) -> str: "--type=", "--utility-sub-type=", ) +_WINDOWS_INTERACTIVE_SESSION_LOGON_TYPES = frozenset({2, 10, 11}) +_SSH_SYSLOG_MICRO_JITTER_BANDS = { + "connection": 101, + "accepted": 301, + "pam": 501, + "logind": 701, + "closed": 901, +} + + +def _ssh_syslog_time( + base_time: datetime, + label: str, + milliseconds: int, + *seed_parts: Any, + before: bool = False, +) -> datetime: + """Return an SSH syslog lifecycle timestamp with non-repeating sub-ms texture.""" + band_start = _SSH_SYSLOG_MICRO_JITTER_BANDS.get(label, 101) + seed = _stable_seed( + "ssh_syslog_micro_jitter:" + label + ":" + ":".join(str(part) for part in seed_parts) + ) + delta = timedelta(milliseconds=milliseconds, microseconds=band_start + (seed % 89)) + return base_time - delta if before else base_time + delta def _session_started_by(session: Any, time: datetime) -> bool: @@ -7684,10 +7708,23 @@ def generate_ssh_session( conn_delay_ms = rng.randint(70, 160) pam_delay_ms = conn_delay_ms + rng.randint(45, 110) logind_delay_ms = pam_delay_ms + rng.randint(420, 760) + ssh_syslog_seed = ( + target_system.hostname, + source_ip, + src_port, + sshd_pid, + time.isoformat(), + ) # sshd connection message (precedes auth in real SSH lifecycle) conn_msg_event = SecurityEvent( - timestamp=time - timedelta(milliseconds=conn_delay_ms), + timestamp=_ssh_syslog_time( + time, + "connection", + conn_delay_ms, + *ssh_syslog_seed, + before=True, + ), event_type="syslog", src_host=event.dst_host, syslog=SyslogContext( @@ -7703,27 +7740,33 @@ def generate_ssh_session( ) self.dispatcher.dispatch(conn_msg_event) - # Primary event: sshd Accepted password - event.syslog = SyslogContext( - app_name="sshd", - pid=sshd_pid, - facility=10, - severity=6, - message=( - f"Accepted password for {user.username} from {source_ip} port {src_port} ssh2" - ), - ) - self.dispatcher.dispatch(event) # Emit follow-up syslog entries (pam_unix + systemd-logind) if event.dst_host and event.dst_host.os_category == "linux": 
from evidenceforge.events.contexts import SyslogContext + accepted_event = SecurityEvent( + timestamp=_ssh_syslog_time(time, "accepted", 0, *ssh_syslog_seed), + event_type="syslog", + src_host=event.dst_host, + syslog=SyslogContext( + app_name="sshd", + pid=sshd_pid, + facility=10, + severity=6, + message=( + f"Accepted password for {user.username} " + f"from {source_ip} port {src_port} ssh2" + ), + ), + ) + self.dispatcher.dispatch(accepted_event) + # pam_unix session opened (syslog-only, no eCAR/Zeek correlation) hostname = target_system.hostname pam_event = SecurityEvent( - timestamp=time + timedelta(milliseconds=pam_delay_ms), + timestamp=_ssh_syslog_time(time, "pam", pam_delay_ms, *ssh_syslog_seed), event_type="syslog", src_host=event.dst_host, syslog=SyslogContext( @@ -7740,7 +7783,7 @@ def generate_ssh_session( self.dispatcher.dispatch(pam_event) # systemd-logind new session (syslog-only) - logind_time = time + timedelta(milliseconds=logind_delay_ms) + logind_time = _ssh_syslog_time(time, "logind", logind_delay_ms, *ssh_syslog_seed) # Session ID: monotonic + unique per host. StateManager owns this # sequence because baseline syslog noise and explicit SSH sessions # both produce systemd-logind messages for the same host. @@ -9783,7 +9826,12 @@ def generate_workstation_lock( ) -> None: """Generate workstation lock event (4800).""" session = self.state_manager.get_session(logon_id) - if session is None or session.system != system.hostname or session.start_time > time: + if ( + session is None + or session.system != system.hostname + or session.start_time > time + or session.logon_type not in _WINDOWS_INTERACTIVE_SESSION_LOGON_TYPES + ): return if not hasattr(self, "_last_workstation_lock_time"): self._last_workstation_lock_time = {} @@ -9815,7 +9863,12 @@ def generate_workstation_unlock( ) -> None: """Generate workstation unlock event (4801 + 4624 type 7).""" session = self.state_manager.get_session(logon_id) - if session is None or session.system != system.hostname or session.start_time > time: + if ( + session is None + or session.system != system.hostname + or session.start_time > time + or session.logon_type not in _WINDOWS_INTERACTIVE_SESSION_LOGON_TYPES + ): return lock_key = (system.hostname, user.username, logon_id) lock_time = getattr(self, "_last_workstation_lock_time", {}).get(lock_key) diff --git a/src/evidenceforge/generation/emitters/zeek_base.py b/src/evidenceforge/generation/emitters/zeek_base.py index 2823fae9..c8cd7354 100644 --- a/src/evidenceforge/generation/emitters/zeek_base.py +++ b/src/evidenceforge/generation/emitters/zeek_base.py @@ -139,6 +139,19 @@ def _jitter_numeric_observation( render_data[field] = max(type(value)(minimum), type(value)(varied)) +def _locks_sensor_packet_accounting(render_data: dict[str, Any]) -> bool: + """Return whether a flow's byte counters should stay identical across sensors.""" + proto = str(render_data.get("proto") or "").lower() + if proto == "icmp": + return True + if proto != "udp": + return False + service = str(render_data.get("service") or "").lower() + if service == "dns": + return True + return render_data.get("id.orig_p") == 53 or render_data.get("id.resp_p") == 53 + + def _apply_sensor_observation_variance( render_data: dict[str, Any], hostname: str, @@ -279,6 +292,9 @@ def _enforce_ip_byte_invariants(render_data: dict[str, Any]) -> None: render_data[f"{side}_ip_bytes"] = 0 continue packet_count = packets if isinstance(packets, int) and packets > 0 else 1 + if proto == "udp": + render_data[f"{side}_ip_bytes"] = payload + 
(header_bytes * packet_count) + continue minimum_ip_bytes = payload + (header_bytes * packet_count) if ip_bytes < minimum_ip_bytes: render_data[f"{side}_ip_bytes"] = minimum_ip_bytes @@ -587,10 +603,9 @@ def _dispatch(self, event_data: dict[str, Any]) -> None: render_data["ts"] = ts + timedelta(microseconds=sensor_delay_us) elif isinstance(ts, (int, float)): render_data["ts"] = ts + sensor_delay_us / 1_000_000 - if ( - render_data.get("_allow_sensor_observation_variance") - and str(render_data.get("proto") or "").lower() != "icmp" - ): + if render_data.get( + "_allow_sensor_observation_variance" + ) and not _locks_sensor_packet_accounting(render_data): _apply_sensor_observation_variance(render_data, hostname, original_uid) _enforce_http_body_invariants(render_data) _enforce_ip_byte_invariants(render_data) diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index f59d3733..92f133da 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -51,6 +51,7 @@ _dns_rtt, _linux_foreground_lifetime, _linux_uid_for_user, + _ssh_syslog_time, _windows_foreground_lifetime, ) from evidenceforge.generation.activity.helpers import _get_os_category @@ -5428,16 +5429,27 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 else: auth_msg = f"Accepted password for {ssh_user} from {ip} port {port} ssh2" _msg_offset = rng.randint(10, 50) + ssh_syslog_seed = ( + system.hostname, + ip, + port, + sshd_pid, + ts.isoformat(), + ) login_times: list[datetime] = [] - for _ in range(3): - login_times.append(ts + timedelta(milliseconds=_msg_offset)) + for _label in ("connection", "accepted", "pam"): + login_times.append( + _ssh_syslog_time(ts, _label, _msg_offset, *ssh_syslog_seed) + ) _msg_offset += rng.randint(12, 70) # systemd-logind is observed as a different process from # sshd, so source-observation delay can be independent. # Keep enough visible margin that New-session rows cannot # sort before auth/PAM under the default syslog delay profile. 
_msg_offset += rng.randint(420, 760) - login_times.append(ts + timedelta(milliseconds=_msg_offset)) + login_times.append( + _ssh_syslog_time(ts, "logind", _msg_offset, *ssh_syslog_seed) + ) ssh_sid = self.state_manager.next_linux_logind_session_id( system.hostname, rng, diff --git a/src/evidenceforge/generation/engine/storyline.py b/src/evidenceforge/generation/engine/storyline.py index d2f9a6aa..b69f4e4a 100644 --- a/src/evidenceforge/generation/engine/storyline.py +++ b/src/evidenceforge/generation/engine/storyline.py @@ -43,6 +43,7 @@ from typing import Any from evidenceforge.generation.activity.application_catalog import resolve_image_path +from evidenceforge.generation.activity.generator import _ssh_syslog_time from evidenceforge.generation.activity.helpers import _get_os_category from evidenceforge.generation.activity.http_content import ( is_stable_resource_path, @@ -3339,7 +3340,17 @@ def _next_scan_path() -> dict[str, Any]: elif spec.type == "workstation_lock": sessions = self.state_manager.get_sessions_for_user(actor.username) - session = next((s for s in sessions if s.system == system.hostname), None) + session = max( + ( + s + for s in sessions + if s.system == system.hostname + and s.logon_type in (2, 10, 11) + and s.start_time <= time + ), + key=lambda s: s.start_time, + default=None, + ) logon_id = session.logon_id if session else "0x0" self.activity_generator.generate_workstation_lock( user=actor, @@ -3350,7 +3361,17 @@ def _next_scan_path() -> dict[str, Any]: elif spec.type == "workstation_unlock": sessions = self.state_manager.get_sessions_for_user(actor.username) - session = next((s for s in sessions if s.system == system.hostname), None) + session = max( + ( + s + for s in sessions + if s.system == system.hostname + and s.logon_type in (2, 10, 11) + and s.start_time <= time + ), + key=lambda s: s.start_time, + default=None, + ) logon_id = session.logon_id if session else "0x0" self.activity_generator.generate_workstation_unlock( user=actor, @@ -3521,9 +3542,16 @@ def _emit_scp_receiver_artifacts( integrity_level="High" if target_user == "root" else "Medium", ) sshd_actor_id = self.state_manager.get_process_object_id(target_system.hostname, sshd_pid) + ssh_syslog_seed = ( + target_system.hostname, + source_system.ip, + source_port, + sshd_pid, + transfer_time.isoformat(), + ) self.activity_generator.generate_syslog_event( system=target_system, - time=transfer_time + timedelta(milliseconds=80), + time=_ssh_syslog_time(transfer_time, "connection", 80, *ssh_syslog_seed), app_name="sshd", message=( f"Connection from {source_system.ip} port {source_port} " @@ -3534,7 +3562,7 @@ def _emit_scp_receiver_artifacts( ) self.activity_generator.generate_syslog_event( system=target_system, - time=transfer_time + timedelta(milliseconds=350), + time=_ssh_syslog_time(transfer_time, "accepted", 350, *ssh_syslog_seed), app_name="sshd", message=f"Accepted publickey for {target_user} from {source_system.ip} port {source_port} ssh2", pid=sshd_pid, @@ -3542,7 +3570,7 @@ def _emit_scp_receiver_artifacts( ) self.activity_generator.generate_syslog_event( system=target_system, - time=transfer_time + timedelta(milliseconds=900), + time=_ssh_syslog_time(transfer_time, "pam", 900, *ssh_syslog_seed), app_name="sshd", message=( f"pam_unix(sshd:session): session opened for user " diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index d966baa1..83f0cf1c 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -2381,6 +2381,45 @@ def 
test_workstation_unlock_skips_ended_session( assert "workstation_unlocked" not in emitted_types assert "logon" not in emitted_types + def test_workstation_lock_unlock_reject_network_session_luid( + self, activity_gen, test_user, test_system, state_manager, mock_emitters + ): + """4800/4801 and Type 7 unlock should never reuse a Type 3 network LUID.""" + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + network_logon_id = "0xabc123" + state_manager.register_session( + logon_id=network_logon_id, + username=test_user.username, + system=test_system.hostname, + logon_type=3, + source_ip="10.0.0.55", + start_time=timestamp - timedelta(minutes=5), + ) + + activity_gen.generate_workstation_lock( + test_user, + test_system, + timestamp, + network_logon_id, + ) + activity_gen.generate_workstation_unlock( + test_user, + test_system, + timestamp + timedelta(minutes=5), + network_logon_id, + ) + + emitted_types = [ + call[0][0].event_type + for call in mock_emitters["windows_event_security"].emit.call_args_list + ] + assert "workstation_locked" not in emitted_types + assert "workstation_unlocked" not in emitted_types + assert not any( + call[0][0].event_type == "logon" and call[0][0].auth.logon_type == 7 + for call in mock_emitters["windows_event_security"].emit.call_args_list + ) + def test_credential_dump_command_uses_high_integrity_token( self, activity_gen, test_user, test_system, state_manager, mock_emitters ): diff --git a/tests/unit/test_storyline_command_networks.py b/tests/unit/test_storyline_command_networks.py index 32da1b99..d780a16e 100644 --- a/tests/unit/test_storyline_command_networks.py +++ b/tests/unit/test_storyline_command_networks.py @@ -3,10 +3,12 @@ """Tests for network evidence inferred from storyline commands.""" -from datetime import UTC, datetime +import random +from datetime import UTC, datetime, timedelta from types import SimpleNamespace from typing import Any +from evidenceforge.events.contexts import HostContext from evidenceforge.generation.engine.storyline import StorylineMixin from evidenceforge.models.scenario import System, User @@ -115,6 +117,7 @@ def __init__(self) -> None: self.explicit_credentials: list[dict] = [] self.processes: list[dict] = [] self.dhcp_leases: list[dict] = [] + self.syslog_events: list[dict] = [] def generate_bash_command(self, *args: Any, **kwargs: Any) -> None: return None @@ -125,6 +128,17 @@ def _resolve_parent(self, *args: Any, **kwargs: Any) -> int: def _get_system_pid(self, *args: Any, **kwargs: Any) -> int: return 500 + def _build_host_context(self, system: System) -> HostContext: + return HostContext( + hostname=system.hostname, + ip=system.ip, + os=system.os, + os_category="linux" + if "linux" in system.os.lower() or "ubuntu" in system.os.lower() + else "windows", + system_type=system.type, + ) + def generate_process(self, *args: Any, **kwargs: Any) -> int: self.processes.append(kwargs) return 4242 @@ -149,17 +163,29 @@ def generate_explicit_credentials(self, **kwargs: Any) -> None: def generate_dhcp_lease(self, **kwargs: Any) -> None: self.dhcp_leases.append(kwargs) + def generate_syslog_event(self, **kwargs: Any) -> None: + self.syslog_events.append(kwargs) + def _expand_and_emit(self, *args: Any, **kwargs: Any) -> None: return None class _FakeStateManager: + def set_current_time(self, *args: Any, **kwargs: Any) -> None: + return None + def get_sessions_for_user(self, username: str) -> list[SimpleNamespace]: return [SimpleNamespace(system="SRC", logon_id="0xabc")] def get_processes_on_system(self, hostname: str) -> 
list[SimpleNamespace]: return [] + def create_process(self, *args: Any, **kwargs: Any) -> int: + return 6505 + + def get_process_object_id(self, hostname: str, pid: int) -> str: + return f"{hostname}:{pid}" + def mark_story_process(self, hostname: str, pid: int) -> None: return None @@ -215,6 +241,64 @@ def capture_receiver_artifacts(**kwargs) -> None: assert engine.activity_generator.connections[0]["src_port"] == 45678 assert receiver_ports == [45678] + def test_scp_receiver_ssh_syslog_uses_distinct_submillisecond_suffixes(self): + source = System( + hostname="SRC", + ip="10.10.4.10", + os="Ubuntu 22.04", + type="workstation", + ) + target = System( + hostname="DST", + ip="10.10.2.30", + os="Ubuntu 22.04", + type="server", + ) + actor = User( + username="alice", + full_name="Alice Example", + email="alice@example.com", + ) + engine = object.__new__(StorylineMixin) + engine.state_manager = _FakeStateManager() + engine.activity_generator = _FakeActivityGenerator() + engine.dispatcher = SimpleNamespace(dispatch=lambda event: None) + transfer_time = datetime(2024, 3, 18, 17, 15, 2, 638000, tzinfo=UTC) + + engine._emit_scp_receiver_artifacts( + source_system=source, + target_system=target, + actor=actor, + source_pid=4242, + source_process="/usr/bin/scp", + source_command="scp /tmp/archive.tar.gz root@DST:/var/tmp/archive.tar.gz", + target_user="root", + target_path="/var/tmp/archive.tar.gz", + transfer_time=transfer_time, + source_port=40117, + rng=random.Random(7), + ) + + syslog_times = [event["time"] for event in engine.activity_generator.syslog_events] + assert len(syslog_times) == 3 + assert syslog_times[0] < syslog_times[1] < syslog_times[2] + assert ( + timedelta(milliseconds=80) + < syslog_times[0] - transfer_time + < timedelta(milliseconds=81) + ) + assert ( + timedelta(milliseconds=350) + < syslog_times[1] - transfer_time + < timedelta(milliseconds=351) + ) + assert ( + timedelta(milliseconds=900) + < syslog_times[2] - transfer_time + < timedelta(milliseconds=901) + ) + assert len({timestamp.microsecond % 1000 for timestamp in syslog_times}) == 3 + def test_net_domain_queries_do_not_auto_emit_4648(self): source = System( hostname="SRC", diff --git a/tests/unit/test_zeek_activity_contexts.py b/tests/unit/test_zeek_activity_contexts.py index 812222e0..df76d6ab 100644 --- a/tests/unit/test_zeek_activity_contexts.py +++ b/tests/unit/test_zeek_activity_contexts.py @@ -481,6 +481,7 @@ def test_ssh_syslog_sub_events_are_source_ordered_with_subsecond_texture(self, a assert timedelta(milliseconds=70) <= base_time - times[0] <= timedelta(milliseconds=160) assert timedelta(milliseconds=115) <= times[2] - base_time <= timedelta(milliseconds=270) assert times[2] - times[0] != timedelta(seconds=1) + assert len({timestamp.microsecond % 1000 for timestamp in times}) == len(times) logind_events = [ event @@ -489,6 +490,9 @@ def test_ssh_syslog_sub_events_are_source_ordered_with_subsecond_texture(self, a ] assert len(logind_events) == 1 assert logind_events[0].timestamp - times[2] >= timedelta(milliseconds=420) + assert logind_events[0].timestamp.microsecond % 1000 not in { + timestamp.microsecond % 1000 for timestamp in times + } def test_ssh_systemd_session_ids_stay_in_same_integer_regime(self, activity_gen): gen, events = activity_gen diff --git a/tests/unit/test_zeek_fanout.py b/tests/unit/test_zeek_fanout.py index 841cbccf..a0090484 100644 --- a/tests/unit/test_zeek_fanout.py +++ b/tests/unit/test_zeek_fanout.py @@ -457,8 +457,8 @@ def test_secondary_sensor_varies_observation_counters(self): 
assert dmz["orig_ip_bytes"] >= dmz["orig_bytes"] + dmz["orig_pkts"] * 40 assert dmz["resp_ip_bytes"] >= dmz["resp_bytes"] + dmz["resp_pkts"] * 40 - def test_secondary_sensor_varies_small_locked_observations(self): - """Tiny one-packet DNS-like observations should not clone after integer rounding.""" + def test_secondary_sensor_preserves_dns_packet_accounting(self): + """DNS packet sizes should stay identical across sensors observing the same query.""" with tempfile.TemporaryDirectory() as tmpdir: base = Path(tmpdir) conn_emitter = ZeekEmitter( @@ -506,7 +506,7 @@ def test_secondary_sensor_varies_small_locked_observations(self): core = json.loads((base / "core" / "conn.json").read_text()) dmz = json.loads((base / "dmz" / "conn.json").read_text()) - clone_fields = ( + locked_fields = ( "duration", "orig_bytes", "resp_bytes", @@ -515,4 +515,8 @@ def test_secondary_sensor_varies_small_locked_observations(self): "orig_ip_bytes", "resp_ip_bytes", ) - assert any(core[field] != dmz[field] for field in clone_fields) + assert core["uid"] != dmz["uid"] + assert core["ts"] != dmz["ts"] + assert all(core[field] == dmz[field] for field in locked_fields) + assert core["orig_ip_bytes"] - core["orig_bytes"] == 28 + assert core["resp_ip_bytes"] - core["resp_bytes"] == 28 diff --git a/tests/unit/test_zeek_multiplex.py b/tests/unit/test_zeek_multiplex.py index 4e3669a9..6fb3dce6 100644 --- a/tests/unit/test_zeek_multiplex.py +++ b/tests/unit/test_zeek_multiplex.py @@ -162,6 +162,40 @@ def test_sensor_observation_preserves_icmp_echo_accounting(self): assert row["orig_ip_bytes"] - row["orig_bytes"] == 28 assert row["resp_ip_bytes"] - row["resp_bytes"] == 28 + def test_udp_dns_ip_bytes_use_valid_header_accounting(self): + """UDP DNS rows should not render impossible IP-header deltas.""" + fmt = load_format("zeek_conn") + with tempfile.TemporaryDirectory() as tmpdir: + output_file = Path(tmpdir) / "conn.json" + emitter = ZeekEmitter(fmt, output_file) + + emitter.emit_event( + { + "ts": datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC), + "uid": "CTestDns1234567", + "id.orig_h": "10.0.0.1", + "id.orig_p": 41710, + "id.resp_h": "10.0.0.53", + "id.resp_p": 53, + "proto": "udp", + "service": "dns", + "duration": 0.02, + "orig_bytes": 80, + "resp_bytes": 177, + "orig_pkts": 1, + "resp_pkts": 1, + "orig_ip_bytes": 113, + "resp_ip_bytes": 211, + "conn_state": "SF", + "history": "Dd", + } + ) + emitter.close() + + row = json.loads(output_file.read_text().splitlines()[0]) + assert row["orig_ip_bytes"] - row["orig_bytes"] == 28 + assert row["resp_ip_bytes"] - row["resp_bytes"] == 28 + def test_sensor_timestamp_offsets_vary_by_flow(self): """Cross-sensor timestamps should not collapse into one fixed offset band.""" fmt = load_format("zeek_conn") From 4484c50e847a4ef328ec945e8b98c1e280866a83 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 00:39:15 -0400 Subject: [PATCH 34/61] docs: record loop 17 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 1573b54d..8191f89c 100644 --- a/TODO.md +++ b/TODO.md @@ -385,7 +385,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 15 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `9c9dcef`: enforced source-native auth state semantics so enabled known accounts fail as bad passwords, disabled synthetic accounts fail as disabled, and unknown targets fail as no-such-user; preserved ICMP echo payload/IP-byte accounting across Zeek sensors; and made ASA successful TCP teardown reasons agree with Zeek `SF`/FIN semantics. Verification passed with focused auth/Zeek/ASA tests (`90 passed`), broader related slice (`302 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3127 passed, 37 skipped`). Regenerated eval passed at `95.52/100` across `83,251` records; hard probes found zero locked/disabled-before-success auth violations, zero bad ICMP echo accounting rows, and zero ASA reset-to-Zeek-`SF` mismatches. Blind synthetic-confidence scores were Threat Hunter `74`, Detection `68`, Network `72`, Host/EDR `64` (average `69.5`). Top Loop 16 targets are Windows `.cmd`/`.bat` process-image contradictions, Zeek interval float precision, Linux journald/syslog boilerplate volume, Sysmon ProcessGuid embedded-time clustering, active TCP source-port reuse, Mimikatz LSASS remote-thread over-modeling, Sysmon Event 3/Event 10 texture, and bash command/cadence repetition. - [x] Loop 16 fix pass completed and verified: Windows `.cmd`/`.bat` process launches now canonicalize to the real `cmd.exe` host before state/event creation, shared Zeek JSON rendering rounds float values to microsecond precision, and Linux `systemd-journald` storage-accounting boilerplate takes a smaller share of ambient syslog noise. Verification passed with focused regressions, broader process/catalog/Zeek/syslog slices, `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3128 passed, 37 skipped`). Regeneration and blind review follow. - [x] Loop 16 regeneration, hard probes, quantitative eval, and blind review completed from commit `21f3a79`: regenerated output passed automated eval at exact `96.09/100` across `81,173` records; hard probes confirmed zero Windows/eCAR `.cmd`/`.bat` process-image rows, zero Zeek interval precision violations with max six decimal places, journald at `419/9,101` syslog rows (`4.6%`), and no active TCP source-port overlaps in this regenerated dataset. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `76`, Network `82`, Host/EDR `78` (average `77.0`). Top Loop 17 targets are Zeek UDP/IP byte arithmetic and paired DNS payload contradictions, Windows lock/unlock binding to a Type 3 network-logon LUID, and Linux SSH/syslog microsecond suffix preservation. - - [ ] **IN PROGRESS** Loop 17 fix pass — repair Loop 16's highest-leverage source-native contradictions, starting with Zeek UDP/IP byte accounting and cross-sensor DNS payload consistency; also fix Windows lock/unlock LUID ownership if compact, with Linux SSH/syslog timestamp suffix jitter next. 
+ - [x] Loop 17 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `eaf090a`: repaired Zeek UDP/DNS packet accounting and cross-sensor DNS payload consistency, blocked workstation lock/unlock/Type 7 unlock generation from Type 3/5 sessions, selected interactive/RDP sessions for storyline lock/unlock events, and added source-local sub-millisecond jitter for SSH syslog lifecycles across baseline, explicit SSH, and SCP receiver artifacts. Verification passed with focused regressions, broader related slices (`406 passed, 13 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3131 passed, 37 skipped`). Regenerated eval passed at exact `96.09/100` across `81,173` records; hard probes found zero UDP/DNS IP-byte violations, zero cross-sensor DNS payload mismatches, zero Type 3/5 LUID lock/unlock violations, and zero repeated low-microsecond SSH lifecycle suffix groups. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `74`, Network `72`, Host/EDR `68` (average `71.5`). Top Loop 18 targets are exact `0.8` Zeek TLS duration clustering, proxy HTTPS source-mode semantics, Sysmon/Security LogonGuid fidelity, SCP receiver-side eCAR process attribution, bash-history authoredness, and SMB/REJ statistical polish. + - [ ] **IN PROGRESS** Loop 18 fix pass — repair the highest-leverage Loop 17 network timing tell first: exact `0.8` second Zeek TLS/web duration clustering; if compact, also start proxy HTTPS source-mode cleanup or LogonGuid propagation. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From bc738f261f8c65c397130d7960e183d6288014d3 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 00:44:04 -0400 Subject: [PATCH 35/61] fix: vary completed TLS duration floors --- .../generation/activity/generator.py | 5 +++- src/evidenceforge/generation/emitters/zeek.py | 28 +++++++++++++++++-- tests/unit/test_baseline_canonical.py | 2 +- tests/unit/test_zeek_ssl.py | 3 +- 4 files changed, 33 insertions(+), 5 deletions(-) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index dffb40e4..17be57e0 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -6584,7 +6584,10 @@ def generate_connection( ) tls_min_duration = tls_min_window.min_ms / 1000 if duration is None or duration < tls_min_duration: - duration = tls_min_duration + rng.uniform(0.0, 0.4) + max_extra = max( + 0.016, min(0.65, (tls_min_window.max_ms - tls_min_window.min_ms) / 1000) + ) + duration = tls_min_duration + rng.uniform(0.015, max_extra) else: duration += rng.expovariate(1.0 / 0.35) if rng.random() < 0.08: diff --git a/src/evidenceforge/generation/emitters/zeek.py b/src/evidenceforge/generation/emitters/zeek.py index 1b938171..17b29ec0 100644 --- a/src/evidenceforge/generation/emitters/zeek.py +++ b/src/evidenceforge/generation/emitters/zeek.py @@ -27,6 +27,7 @@ from evidenceforge.events.base import SecurityEvent from evidenceforge.generation.activity.timing_profiles import get_timing_window from evidenceforge.generation.emitters.zeek_base import SensorMultiplexEmitter +from evidenceforge.utils.rng import _stable_seed _ZEEK_SERVICE_ALIASES: dict[str, str] = { "kerberos": "krb", @@ -38,6 +39,21 @@ } +def _tls_completed_duration_floor(event: SecurityEvent, min_ms: int, max_ms: int) -> float: + """Return a deterministic TLS analyzer duration floor with source-native texture.""" + net = event.network + if net is None: + return min_ms / 1000 + span_ms = max(1, max_ms - min_ms) + seed = _stable_seed( + "zeek_tls_duration_floor:" + f"{net.zeek_uid}:{net.src_ip}:{net.src_port}:{net.dst_ip}:{net.dst_port}:" + f"{event.timestamp.isoformat()}" + ) + extra_ms = 1 + (seed % span_ms) + return (min_ms + extra_ms) / 1000 + + class ZeekEmitter(SensorMultiplexEmitter): """Emitter for Zeek conn.log format (JSON). 
@@ -108,8 +124,16 @@ def emit(self, event: SecurityEvent) -> None: default_class="same_observation", ) min_duration = tls_min_window.min_ms / 1000 - if duration is None or duration < min_duration: - duration = min_duration + if ( + duration is None + or duration < min_duration + or abs(duration - min_duration) < 0.000001 + ): + duration = _tls_completed_duration_floor( + event, + tls_min_window.min_ms, + tls_min_window.max_ms, + ) event_data = { "ts": event.timestamp, "uid": net.zeek_uid, diff --git a/tests/unit/test_baseline_canonical.py b/tests/unit/test_baseline_canonical.py index 5d7b6609..7daf6bd3 100644 --- a/tests/unit/test_baseline_canonical.py +++ b/tests/unit/test_baseline_canonical.py @@ -206,7 +206,7 @@ def test_completed_tls_duration_contains_zeek_analyzer_evidence( event = mock_emitters["zeek_conn"].emit.call_args[0][0] assert event.ssl is not None assert event.x509 is not None - assert event.network.duration >= 0.8 + assert event.network.duration > 0.8 class TestForegroundProcessTermination: diff --git a/tests/unit/test_zeek_ssl.py b/tests/unit/test_zeek_ssl.py index 4d50877a..d9338e0e 100644 --- a/tests/unit/test_zeek_ssl.py +++ b/tests/unit/test_zeek_ssl.py @@ -744,7 +744,8 @@ def test_tls_conn_duration_contains_ssl_analyzer_offset(self): conn_row = json.loads((out_dir / "conn.json").read_text().splitlines()[0]) - assert conn_row["duration"] >= 0.8 + assert conn_row["duration"] > 0.8 + assert conn_row["duration"] != 0.8 def test_x509_rejects_partial_handshake(self): """x509.log should not emit certificates for incomplete TLS handshakes.""" From e98f744b78cdfcec7584a5ecf745ed467c2673ee Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 01:01:26 -0400 Subject: [PATCH 36/61] docs: record loop 18 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 8191f89c..03ac7b49 100644 --- a/TODO.md +++ b/TODO.md @@ -386,7 +386,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 16 fix pass completed and verified: Windows `.cmd`/`.bat` process launches now canonicalize to the real `cmd.exe` host before state/event creation, shared Zeek JSON rendering rounds float values to microsecond precision, and Linux `systemd-journald` storage-accounting boilerplate takes a smaller share of ambient syslog noise. Verification passed with focused regressions, broader process/catalog/Zeek/syslog slices, `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3128 passed, 37 skipped`). Regeneration and blind review follow. - [x] Loop 16 regeneration, hard probes, quantitative eval, and blind review completed from commit `21f3a79`: regenerated output passed automated eval at exact `96.09/100` across `81,173` records; hard probes confirmed zero Windows/eCAR `.cmd`/`.bat` process-image rows, zero Zeek interval precision violations with max six decimal places, journald at `419/9,101` syslog rows (`4.6%`), and no active TCP source-port overlaps in this regenerated dataset. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `76`, Network `82`, Host/EDR `78` (average `77.0`). Top Loop 17 targets are Zeek UDP/IP byte arithmetic and paired DNS payload contradictions, Windows lock/unlock binding to a Type 3 network-logon LUID, and Linux SSH/syslog microsecond suffix preservation. 
- [x] Loop 17 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `eaf090a`: repaired Zeek UDP/DNS packet accounting and cross-sensor DNS payload consistency, blocked workstation lock/unlock/Type 7 unlock generation from Type 3/5 sessions, selected interactive/RDP sessions for storyline lock/unlock events, and added source-local sub-millisecond jitter for SSH syslog lifecycles across baseline, explicit SSH, and SCP receiver artifacts. Verification passed with focused regressions, broader related slices (`406 passed, 13 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3131 passed, 37 skipped`). Regenerated eval passed at exact `96.09/100` across `81,173` records; hard probes found zero UDP/DNS IP-byte violations, zero cross-sensor DNS payload mismatches, zero Type 3/5 LUID lock/unlock violations, and zero repeated low-microsecond SSH lifecycle suffix groups. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `74`, Network `72`, Host/EDR `68` (average `71.5`). Top Loop 18 targets are exact `0.8` Zeek TLS duration clustering, proxy HTTPS source-mode semantics, Sysmon/Security LogonGuid fidelity, SCP receiver-side eCAR process attribution, bash-history authoredness, and SMB/REJ statistical polish. - - [ ] **IN PROGRESS** Loop 18 fix pass — repair the highest-leverage Loop 17 network timing tell first: exact `0.8` second Zeek TLS/web duration clustering; if compact, also start proxy HTTPS source-mode cleanup or LogonGuid propagation. + - [x] Loop 18 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `bc738f2`: repaired the high-volume exact `0.8` Zeek TLS/web duration floor by adding deterministic post-floor texture in both generator-owned TLS durations and Zeek render-time fallback floors. Verification passed with focused TLS/activity tests, broader Zeek/activity/timing slices (`289 passed, 13 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3131 passed, 37 skipped`). Regenerated eval passed at exact `96.09/100` across `81,173` records; hard probes found zero exact `0.8` TLS rows, zero rows in the `0.800`-`0.801` band, max repeated TLS duration bucket of `2`, and preserved prior UDP/DNS, cross-sensor DNS, Windows LUID, and SSH syslog gates. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `72`, Network `74`, Host/EDR `64` (average `71.5`). Top Loop 19 targets are HTTP/proxy source-native response semantics, cross-sensor Zeek timing-band regularity, public TLS/web long-tail texture, same-LUID Security/Sysmon LogonGuid consistency, and bash/host authoredness. + - [ ] **IN PROGRESS** Loop 19 fix pass — repair the highest-leverage Loop 18 HTTP/proxy source-native contradictions first: Windows Update Agent paired with Ubuntu package paths and Zeek/proxy HTTP redirect/error rows inheriting MIME types from requested file extensions. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. 
Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From dc4616c252253d1941a0b7088ea3c63a1c95297e Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 01:06:54 -0400 Subject: [PATCH 37/61] fix: repair proxy http response semantics --- .../config/activity/dns_registry.yaml | 2 +- .../config/activity/proxy_uri_templates.yaml | 1 + .../config/activity/proxy_user_agents.yaml | 3 +- .../generation/activity/generator.py | 9 +++- .../generation/activity/http_content.py | 2 + .../generation/activity/proxy_uri.py | 19 +++++++- tests/unit/test_activity.py | 17 ++++++++ tests/unit/test_http_content.py | 2 + tests/unit/test_ua_os_mismatch.py | 43 ++++++++++++++++--- 9 files changed, 88 insertions(+), 10 deletions(-) diff --git a/src/evidenceforge/config/activity/dns_registry.yaml b/src/evidenceforge/config/activity/dns_registry.yaml index e5d82323..7f32a4dd 100644 --- a/src/evidenceforge/config/activity/dns_registry.yaml +++ b/src/evidenceforge/config/activity/dns_registry.yaml @@ -186,7 +186,7 @@ domains: tags: [saas, background, outlook, teams, onedrive] - domain: packages.microsoft.com ips: ["13.107.246.52", "13.107.246.53"] - tags: [background, windows, linux] + tags: [background, linux] - domain: res.cdn.office.net ips: ["13.107.6.171", "13.107.9.171"] tags: [cdn, outlook, teams, onedrive] diff --git a/src/evidenceforge/config/activity/proxy_uri_templates.yaml b/src/evidenceforge/config/activity/proxy_uri_templates.yaml index 7f4e5cc6..65f83694 100644 --- a/src/evidenceforge/config/activity/proxy_uri_templates.yaml +++ b/src/evidenceforge/config/activity/proxy_uri_templates.yaml @@ -67,6 +67,7 @@ domains: packages.microsoft.com: domain_class: software_update + os: linux referrer_policy: none paths: - "/config/ubuntu/22.04/packages-microsoft-prod.deb" diff --git a/src/evidenceforge/config/activity/proxy_user_agents.yaml b/src/evidenceforge/config/activity/proxy_user_agents.yaml index 82a81f92..88ede4ad 100644 --- a/src/evidenceforge/config/activity/proxy_user_agents.yaml +++ b/src/evidenceforge/config/activity/proxy_user_agents.yaml @@ -13,7 +13,6 @@ domain_overrides: hosts: - download.windowsupdate.com - ctldl.windowsupdate.com - - packages.microsoft.com user_agents: - "Windows-Update-Agent/10.0.10011.16384 Client-Protocol/2.33" google_update_windows: @@ -83,6 +82,7 @@ workstation: - changelogs.ubuntu.com - deb.debian.org - security.debian.org + - packages.microsoft.com user_agents: - "apt-http/2.4.11 (amd64)" - "Debian APT-HTTP/1.3 (2.4.11)" @@ -124,6 +124,7 @@ server: - changelogs.ubuntu.com - deb.debian.org - security.debian.org + - packages.microsoft.com user_agents: - "apt-http/2.4.11 (amd64)" - "Debian APT-HTTP/1.3 (2.4.11)" diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 17be57e0..f33393de 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -432,7 +432,14 @@ def _normalize_http_context_for_source_native_response(http: HttpContext) -> Htt status_msg = 
http_status_message(status_code) resp_mime_types = list(http.resp_mime_types) - if not resp_mime_types or response_body_len <= 0 or method == "HEAD" or bodyless_status: + if ( + not resp_mime_types + or response_body_len <= 0 + or method == "HEAD" + or bodyless_status + or status_code in {301, 302} + or status_code >= 400 + ): mime_type = resp_mime_types[0] if resp_mime_types else "" resp_mime_types = response_mime_types_for_status( status_code, diff --git a/src/evidenceforge/generation/activity/http_content.py b/src/evidenceforge/generation/activity/http_content.py index a1207f50..3781a837 100644 --- a/src/evidenceforge/generation/activity/http_content.py +++ b/src/evidenceforge/generation/activity/http_content.py @@ -122,6 +122,8 @@ def response_mime_types_for_status( return [] if method.upper() == "HEAD" or status_code in {204, 304}: return [] + if status_code in {301, 302} or status_code >= 400: + return ["text/html"] return [mime_type] diff --git a/src/evidenceforge/generation/activity/proxy_uri.py b/src/evidenceforge/generation/activity/proxy_uri.py index 258ef297..981aa46d 100644 --- a/src/evidenceforge/generation/activity/proxy_uri.py +++ b/src/evidenceforge/generation/activity/proxy_uri.py @@ -81,6 +81,18 @@ def is_browser_like_proxy_domain(hostname: str) -> bool: return domain_class not in _NON_BROWSER_DOMAIN_CLASSES +def _entry_matches_source_os(entry: Any, source_os: str | None) -> bool: + """Return whether a URI template entry is compatible with the source OS.""" + if not isinstance(entry, dict): + return False + entry_os = entry.get("os") + if not entry_os or not source_os: + return True + if isinstance(entry_os, list): + return source_os in {str(value) for value in entry_os} + return str(entry_os) == source_os + + def _substitute_vars(rng: random.Random, path: str, data: dict[str, Any]) -> str: """Replace template variables in a URI path.""" while "{guid}" in path: @@ -130,13 +142,16 @@ def pick_proxy_uri( # 1. Exact domain match domains = data.get("domains", {}) entry = domains.get(hostname) + if not _entry_matches_source_os(entry, source_os): + entry = None # 2. Tag-based fallback if entry is None: tags = data.get("tags", {}) for tag in domain_tags: - if tag in tags: - entry = tags[tag] + candidate = tags.get(tag) + if _entry_matches_source_os(candidate, source_os): + entry = candidate break # 3. 
Generic fallback diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 83f0cf1c..97687098 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -44,6 +44,7 @@ _http_context_from_process_command, _jitter_default_connection_duration, _network_effect_context_for_process, + _normalize_http_context_for_source_native_response, ) from evidenceforge.generation.activity.http_content import response_size_for_status from evidenceforge.generation.activity.tls_realism import ( @@ -67,6 +68,22 @@ def test_missing_process_object_id_returns_empty(self): class TestProcessHttpCommandCorrelation: + def test_http_normalization_rewrites_error_asset_mime_to_error_body(self): + """Caller-provided HTTP errors should not keep MIME from requested asset extension.""" + http = HttpContext( + method="GET", + host="portal.example.com", + uri="/assets/logo.svg", + response_body_len=900, + status_code=503, + status_msg="Service Unavailable", + resp_mime_types=["image/svg+xml"], + ) + + normalized = _normalize_http_context_for_source_native_response(http) + + assert normalized.resp_mime_types == ["text/html"] + def test_http_context_from_curl_command_preserves_url_and_user_agent(self): """CLI HTTP command lines should drive the canonical HTTP flow metadata.""" result = _http_context_from_process_command( diff --git a/tests/unit/test_http_content.py b/tests/unit/test_http_content.py index ed6dbf37..99ca25b4 100644 --- a/tests/unit/test_http_content.py +++ b/tests/unit/test_http_content.py @@ -58,6 +58,8 @@ def test_response_mime_types_require_visible_body_and_success_status(): assert response_mime_types_for_status(200, "text/css", 0) == [] assert response_mime_types_for_status(200, "text/css", 2048, method="HEAD") == [] assert response_mime_types_for_status(403, "text/html", 900) == ["text/html"] + assert response_mime_types_for_status(301, "application/javascript", 220) == ["text/html"] + assert response_mime_types_for_status(404, "image/jpeg", 900) == ["text/html"] def test_error_response_size_is_template_stable_by_status_host_and_uri(): diff --git a/tests/unit/test_ua_os_mismatch.py b/tests/unit/test_ua_os_mismatch.py index 568a224f..1af52ba7 100644 --- a/tests/unit/test_ua_os_mismatch.py +++ b/tests/unit/test_ua_os_mismatch.py @@ -116,11 +116,6 @@ def test_certificate_infra_templates_are_not_browser_like(self): "crl.microsoft.com": {"application/pkix-crl"}, "settings-win.data.microsoft.com": {"application/json"}, "update.googleapis.com": {"application/json", "application/octet-stream"}, - "packages.microsoft.com": { - "application/vnd.debian.binary-package", - "application/x-gzip", - "text/plain", - }, "archive.ubuntu.com": {"application/x-gzip", "text/plain"}, } for host, allowed_types in infra_domains.items(): @@ -135,6 +130,44 @@ def test_certificate_infra_templates_are_not_browser_like(self): assert content_type in allowed_types assert referrer_policy == "none" + path, content_type, _method, _ua_override, referrer_policy = pick_proxy_uri( + random.Random(42), + "packages.microsoft.com", + ["background"], + source_os="linux", + ) + assert "/ubuntu/" in path or path.endswith(".deb") + assert content_type in { + "application/vnd.debian.binary-package", + "application/x-gzip", + "text/plain", + } + assert referrer_policy == "none" + + def test_linux_package_templates_do_not_apply_to_windows_sources(self): + """OS-scoped exact templates should fall back instead of pairing Windows hosts with apt paths.""" + from evidenceforge.generation.activity.dns_registry import 
get_domains_by_tag + from evidenceforge.generation.activity.proxy_uri import pick_proxy_uri + + path, content_type, _method, ua_override, _referrer_policy = pick_proxy_uri( + random.Random(42), + "packages.microsoft.com", + ["background"], + source_os="windows", + ) + assert "/ubuntu/" not in path + assert not path.endswith((".deb", "Packages.gz")) + assert content_type not in { + "application/vnd.debian.binary-package", + "application/x-gzip", + } + assert ua_override is None + + windows_background_domains = { + entry["domain"] for entry in get_domains_by_tag("background", "windows") + } + assert "packages.microsoft.com" not in windows_background_domains + def test_standalone_static_proxy_paths_do_not_claim_same_origin_referrers(self): """Single proxy asset requests should not imply an unseen page load.""" from evidenceforge.generation.activity.proxy_uri import pick_proxy_uri From b4c99b1ce1f80a174158b18096dbf19a7e799083 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 01:13:39 -0400 Subject: [PATCH 38/61] fix: preserve redirect response mime semantics --- .../generation/activity/generator.py | 14 ++++-- tests/unit/test_zeek_activity_contexts.py | 50 +++++++++++++++++++ 2 files changed, 60 insertions(+), 4 deletions(-) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index f33393de..d47adedf 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -7210,15 +7210,21 @@ def generate_connection( tags=[], ) # Probabilistic file transfer for HTTP responses with content - if resp_body_len > 100 and rng.random() < 0.3: + if ( + 200 <= status_code < 300 + and resp_body_len > 100 + and event.http.resp_mime_types + and rng.random() < 0.3 + ): from evidenceforge.events.contexts import FileTransferContext from evidenceforge.utils.ids import generate_zeek_uid fuid = generate_zeek_uid("F") + file_mime_type = event.http.resp_mime_types[0] file_hashes = _file_transfer_hashes( f"http:{host}:{uri}:{resp_body_len}:{fuid}", ["SHA1"] - if mime_type in {"application/x-dosexec", "application/octet-stream"} + if file_mime_type in {"application/x-dosexec", "application/octet-stream"} else [], ) event.file_transfer = FileTransferContext( @@ -7226,7 +7232,7 @@ def generate_connection( source="HTTP", depth=0, analyzers=[], - mime_type=mime_type, + mime_type=file_mime_type, duration=rng.uniform(0.0, 0.01), local_orig=_is_private_ip(dst_ip), is_orig=False, @@ -7242,7 +7248,7 @@ def generate_connection( # PE analysis for Windows executables in file transfers if ( - mime_type in ("application/x-dosexec", "application/octet-stream") + file_mime_type in ("application/x-dosexec", "application/octet-stream") and rng.random() < 0.1 ): from evidenceforge.events.contexts import PeContext diff --git a/tests/unit/test_zeek_activity_contexts.py b/tests/unit/test_zeek_activity_contexts.py index df76d6ab..d5827fb8 100644 --- a/tests/unit/test_zeek_activity_contexts.py +++ b/tests/unit/test_zeek_activity_contexts.py @@ -1271,6 +1271,56 @@ def test_duplicate_icmp_tuple_times_are_disambiguated(self, activity_gen): class TestFileTransferContext: """Verify FileTransferContext populated probabilistically for HTTP.""" + def test_redirect_asset_response_does_not_attach_asset_file_transfer( + self, activity_gen, monkeypatch + ): + """Redirect bodies keep text/html MIME instead of asset extension MIME.""" + gen, events = activity_gen + + class LowRandom(random.Random): + def 
random(self) -> float: + return 0.05 + + import evidenceforge.generation.activity.generator as generator_module + import evidenceforge.generation.activity.proxy_uri as proxy_uri_module + + monkeypatch.setattr(generator_module, "_get_rng", lambda: LowRandom(7)) + monkeypatch.setattr( + generator_module, + "_get_http_status", + lambda _dst_ip, _uri: (301, "Moved Permanently"), + ) + monkeypatch.setattr( + proxy_uri_module, + "pick_proxy_uri", + lambda *_args, **_kwargs: ( + "/assets/app.js", + "application/javascript", + "GET", + "", + "none", + ), + ) + + gen.generate_connection( + src_ip="10.0.10.50", + dst_ip="93.184.216.34", + time=datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC), + dst_port=80, + proto="tcp", + service="http", + duration=1.0, + orig_bytes=200, + resp_bytes=5000, + conn_state="SF", + ) + + event = events[-1] + assert event.http is not None + assert event.http.status_code == 301 + assert event.http.resp_mime_types == ["text/html"] + assert event.file_transfer is None + def test_file_transfer_sometimes_populated(self, activity_gen): """Over many HTTP connections, some should have FileTransferContext.""" gen, events = activity_gen From 6b207bd569c0dc72e1b856fd5b64c2edfda1f775 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 01:33:25 -0400 Subject: [PATCH 39/61] docs: record loop 19 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 03ac7b49..ffd3459d 100644 --- a/TODO.md +++ b/TODO.md @@ -387,7 +387,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 16 regeneration, hard probes, quantitative eval, and blind review completed from commit `21f3a79`: regenerated output passed automated eval at exact `96.09/100` across `81,173` records; hard probes confirmed zero Windows/eCAR `.cmd`/`.bat` process-image rows, zero Zeek interval precision violations with max six decimal places, journald at `419/9,101` syslog rows (`4.6%`), and no active TCP source-port overlaps in this regenerated dataset. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `76`, Network `82`, Host/EDR `78` (average `77.0`). Top Loop 17 targets are Zeek UDP/IP byte arithmetic and paired DNS payload contradictions, Windows lock/unlock binding to a Type 3 network-logon LUID, and Linux SSH/syslog microsecond suffix preservation. - [x] Loop 17 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `eaf090a`: repaired Zeek UDP/DNS packet accounting and cross-sensor DNS payload consistency, blocked workstation lock/unlock/Type 7 unlock generation from Type 3/5 sessions, selected interactive/RDP sessions for storyline lock/unlock events, and added source-local sub-millisecond jitter for SSH syslog lifecycles across baseline, explicit SSH, and SCP receiver artifacts. Verification passed with focused regressions, broader related slices (`406 passed, 13 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3131 passed, 37 skipped`). Regenerated eval passed at exact `96.09/100` across `81,173` records; hard probes found zero UDP/DNS IP-byte violations, zero cross-sensor DNS payload mismatches, zero Type 3/5 LUID lock/unlock violations, and zero repeated low-microsecond SSH lifecycle suffix groups. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `74`, Network `72`, Host/EDR `68` (average `71.5`). 
Top Loop 18 targets are exact `0.8` Zeek TLS duration clustering, proxy HTTPS source-mode semantics, Sysmon/Security LogonGuid fidelity, SCP receiver-side eCAR process attribution, bash-history authoredness, and SMB/REJ statistical polish. - [x] Loop 18 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `bc738f2`: repaired the high-volume exact `0.8` Zeek TLS/web duration floor by adding deterministic post-floor texture in both generator-owned TLS durations and Zeek render-time fallback floors. Verification passed with focused TLS/activity tests, broader Zeek/activity/timing slices (`289 passed, 13 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3131 passed, 37 skipped`). Regenerated eval passed at exact `96.09/100` across `81,173` records; hard probes found zero exact `0.8` TLS rows, zero rows in the `0.800`-`0.801` band, max repeated TLS duration bucket of `2`, and preserved prior UDP/DNS, cross-sensor DNS, Windows LUID, and SSH syslog gates. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `72`, Network `74`, Host/EDR `64` (average `71.5`). Top Loop 19 targets are HTTP/proxy source-native response semantics, cross-sensor Zeek timing-band regularity, public TLS/web long-tail texture, same-LUID Security/Sysmon LogonGuid consistency, and bash/host authoredness. - - [ ] **IN PROGRESS** Loop 19 fix pass — repair the highest-leverage Loop 18 HTTP/proxy source-native contradictions first: Windows Update Agent paired with Ubuntu package paths and Zeek/proxy HTTP redirect/error rows inheriting MIME types from requested file extensions. + - [x] Loop 19 fix, regeneration, hard probes, quantitative eval, and blind review completed from commits `dc4616c` and `b4c99b1`: repaired HTTP/proxy source-native response semantics by blocking Windows Update Agent from Linux package paths, OS-gating `packages.microsoft.com` URI/UA selection, normalizing redirect/error HTTP MIME to source-native HTML bodies, and preventing HTTP file-transfer fan-out from reintroducing asset MIME on redirects/errors. Verification passed with focused HTTP/proxy/Zeek tests, broader related slices (`269 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3134 passed, 37 skipped`). Regenerated eval passed at exact `96.23/100` across `90,613` records; hard probes found zero Windows Update Agent + Ubuntu package path violations, zero redirect/error asset-MIME violations, zero exact `0.8` TLS rows, and zero rows in the `0.800`-`0.801` TLS duration band. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `76`, Host/EDR `82` (average `77.5`). Top Loop 20 targets are user-shell/UWP processes incorrectly modeled as SYSTEM/session 0, duplicate `explorer.exe` shells from one `userinit.exe`, Zeek HTTP connection reuse, Snort TLS-failure vs Zeek established-TLS conflicts, and Linux `scp` eCAR lifecycle coverage. + - [ ] **IN PROGRESS** Loop 20 fix pass — repair the highest-leverage Loop 19 endpoint source-native process/session contradictions first: user-shell/UWP processes (`sihost.exe`, `RuntimeBroker.exe`, `backgroundTaskHost.exe`, `SearchHost.exe`) should bind to interactive user sessions instead of SYSTEM/session 0, and interactive logon startup should not create multiple simultaneous primary `explorer.exe` shells from the same `userinit.exe`. 
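A minimal sketch of how hard probes for these two invariants could be expressed over generated endpoint process records, assuming JSON-lines output with `image`, `username`, `hostname`, `logon_id`, and `event_type` fields (the field names and file layout are illustrative assumptions, not the project's actual schema):

```python
import json
from collections import Counter
from pathlib import Path

SHELL_HELPERS = {"sihost.exe", "runtimebroker.exe", "backgroundtaskhost.exe", "searchhost.exe"}


def probe_shell_session_invariants(process_log: Path) -> tuple[int, int]:
    """Count SYSTEM-owned shell helpers and duplicate explorer.exe creates per logon id."""
    system_owned = 0
    explorer_creates: Counter[tuple[str, str]] = Counter()
    for line in process_log.read_text().splitlines():
        row = json.loads(line)
        image = row.get("image", "").rsplit("\\", 1)[-1].lower()
        if image in SHELL_HELPERS and row.get("username", "").upper() == "SYSTEM":
            system_owned += 1
        if image == "explorer.exe" and row.get("event_type") == "process_create":
            explorer_creates[(row.get("hostname", ""), row.get("logon_id", ""))] += 1
    duplicate_shells = sum(1 for count in explorer_creates.values() if count > 1)
    return system_owned, duplicate_shells
```

Both counts should come back zero once the fix pass lands: a nonzero first value flags a shell/UWP helper still bound to SYSTEM or session 0, and a nonzero second value flags more than one primary `explorer.exe` create under a single interactive LUID.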
- [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 76bc107a127c6db9748449092a943c7ff0e3d2ee Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 01:45:29 -0400 Subject: [PATCH 40/61] fix: bind shell helpers to user sessions --- .../config/activity/system_processes.yaml | 20 --- .../generation/activity/generator.py | 121 ++++++++++++++++-- tests/unit/test_activity.py | 54 ++++++++ tests/unit/test_phase5_system_traffic.py | 102 +++++++++++++++ 4 files changed, 263 insertions(+), 34 deletions(-) diff --git a/src/evidenceforge/config/activity/system_processes.yaml b/src/evidenceforge/config/activity/system_processes.yaml index 150c2054..35d01bf1 100644 --- a/src/evidenceforge/config/activity/system_processes.yaml +++ b/src/evidenceforge/config/activity/system_processes.yaml @@ -76,26 +76,10 @@ system_services: - "conhost.exe 0x4" - "\\??\\C:\\Windows\\system32\\conhost.exe 0xffffffff -ForceV1" parent: csrss_s0 - - image: "C:\\Windows\\System32\\RuntimeBroker.exe" - command_templates: - - "RuntimeBroker.exe -Embedding" - parent: svchost_dcom - loaded_modules: - - path: "C:\\Windows\\System32\\combase.dll" - - path: "C:\\Windows\\System32\\ole32.dll" - - path: "C:\\Windows\\System32\\twinapi.appcore.dll" - image: "C:\\Windows\\System32\\spoolsv.exe" command_templates: - "spoolsv.exe" parent: services - - image: "C:\\Windows\\System32\\sihost.exe" - command_templates: - - "sihost.exe" - parent: svchost_netsvcs - - image: "C:\\Windows\\System32\\backgroundTaskHost.exe" - command_templates: - - "backgroundTaskHost.exe" - parent: svchost_local_system domain_controller: - image: "C:\\Windows\\System32\\dfsr.exe" @@ -122,10 +106,6 @@ system_services: parent: services workstation: - - image: "C:\\Windows\\SystemApps\\MicrosoftWindows.Client.CBS_cw5n1h2txyewy\\SearchHost.exe" - command_templates: - - "SearchHost.exe" - parent: svchost_netsvcs - image: "C:\\Windows\\System32\\SearchProtocolHost.exe" command_templates: - "SearchProtocolHost.exe {search_pipe_args}" diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index d47adedf..7690d4c8 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -79,6 +79,7 @@ from evidenceforge.generation.emitters import WindowsEventEmitter, ZeekEmitter from evidenceforge.generation.state_manager import StateManager from evidenceforge.models.scenario import System, User +from evidenceforge.models.state import ActiveSession from evidenceforge.utils.ids import generate_stable_zeek_uid from evidenceforge.utils.rng import _stable_seed from evidenceforge.utils.time import ensure_utc @@ -662,6 +663,18 @@ def 
_nmap_conn_state(port: int, target_system: System | None = None) -> str: "shellexperiencehost.exe", "applicationframehost.exe", } +_WINDOWS_SHELL_UWP_USER_PROCESS_EXES = frozenset( + { + "sihost.exe", + "searchhost.exe", + "runtimebroker.exe", + "backgroundtaskhost.exe", + "textinputhost.exe", + "startmenuexperiencehost.exe", + "shellexperiencehost.exe", + "applicationframehost.exe", + } +) _WINDOWS_ONE_SHOT_CLI_EXES = { "dsquery.exe", "gpresult.exe", @@ -2308,6 +2321,51 @@ def _parameterize_command_for_system( command_line = command_line.replace("{ssh_target}", target) return _parameterize_command(rng, command_line, username=username) + def _active_interactive_windows_session( + self, + system: System, + time: datetime, + ) -> ActiveSession | None: + """Return the newest user-owned interactive Windows session on a host.""" + if _get_os_category(system.os) != "windows": + return None + + candidates = [ + session + for session in self.state_manager.list_active_sessions() + if ( + session.system == system.hostname + and session.username not in _SYSTEM_ACCOUNTS + and not session.username.endswith("$") + and session.logon_type in _WINDOWS_INTERACTIVE_SESSION_LOGON_TYPES + and session.session_kind not in {"network", "service"} + and _session_started_by(session, time) + ) + ] + if not candidates: + return None + + assigned_user = getattr(system, "assigned_user", None) + if assigned_user: + assigned_candidates = [ + session for session in candidates if session.username == assigned_user + ] + if assigned_candidates: + candidates = assigned_candidates + return max(candidates, key=lambda session: session.start_time) + + def _user_model_for_username(self, username: str) -> User: + """Resolve a known scenario user, or build a safe fallback user object.""" + known_users = getattr(self, "_users_by_username", {}) + known_user = known_users.get(username) + if known_user is not None: + return known_user + return User( + username=username, + full_name=username, + email=f"{username}@{self._valid_fallback_email_domain()}", + ) + def _resolve_process_identity( self, *, @@ -2334,21 +2392,9 @@ def _resolve_process_identity( ): return username, logon_id - candidates = [ - session - for session in self.state_manager.list_active_sessions() - if ( - session.system == system.hostname - and session.username not in _SYSTEM_ACCOUNTS - and session.logon_type in (2, 10, 11) - and _session_started_by(session, time) - ) - ] - if not candidates: + session = self._active_interactive_windows_session(system, time) + if session is None: return username, logon_id - - candidates.sort(key=lambda session: session.start_time, reverse=True) - session = candidates[0] return session.username, session.logon_id def _remember_connection_tuple( @@ -5011,6 +5057,26 @@ def generate_process( _token_elevation = "%%1938" _mandatory_label = "S-1-16-8192" + if ( + not from_storyline + and _get_os_category(system.os) == "windows" + and _exe_lower == "explorer.exe" + and process_logon_id not in _SYSTEM_ACCOUNT_LOGON_IDS.values() + ): + explorer_pid = self._ensure_session_explorer_pid( + system, + self._user_model_for_username(process_username), + time, + process_logon_id, + ) + if explorer_pid is not None: + self.state_manager.update_process_activity_time( + system.hostname, + explorer_pid, + time, + ) + return explorer_pid + singleton_pid = self._existing_windows_singleton_pid(system, process_name, time) if singleton_pid is not None: running_proc = self.state_manager.get_process(system.hostname, singleton_pid) @@ -8193,6 +8259,33 @@ def 
generate_system_process( ) exe_name = ntpath.basename(process_name).lower() + if ( + _get_os_category(system.os) == "windows" + and exe_name in _WINDOWS_SHELL_UWP_USER_PROCESS_EXES + ): + session = self._active_interactive_windows_session(system, time) + if session is None: + return 0 + session_user = self._user_model_for_username(session.username) + if self.state_manager.get_process(system.hostname, parent_pid) is None: + parent_pid = self._resolve_parent( + system, + session_user, + time, + session.logon_id, + process_name, + ) + return self.generate_process( + user=session_user, + system=system, + time=time, + logon_id=session.logon_id, + process_name=process_name, + command_line=command_line, + parent_pid=parent_pid, + allow_existing_browser_reuse=False, + ) + if _get_os_category(system.os) == "windows" and exe_name in _WINDOWS_SINGLETON_SERVICE_EXES: for proc in self.state_manager.get_processes_on_system(system.hostname): if ntpath.basename(proc.image).lower() == exe_name: diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 97687098..0b747220 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -395,6 +395,60 @@ def test_interactive_logons_get_distinct_userinit_parents( ) assert first_explorer.parent_pid != second_explorer.parent_pid + def test_repeated_explorer_creation_reuses_session_shell( + self, activity_gen, test_user, test_system, state_manager, mock_emitters + ): + """Baseline explorer.exe launches should reuse the interactive session shell.""" + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + state_manager.set_current_time(timestamp) + smss_pid = state_manager.create_process( + test_system.hostname, + 4, + r"C:\Windows\System32\smss.exe", + r"C:\Windows\System32\smss.exe", + "SYSTEM", + "System", + ) + activity_gen._system_pids = {test_system.hostname: {"smss": smss_pid}} + logon_id = activity_gen.generate_logon(test_user, test_system, timestamp, logon_type=2) + session = state_manager.get_session(logon_id) + assert session is not None + assert session.explorer_pid is not None + mock_emitters["windows_event_security"].reset_mock() + + first_pid = activity_gen.generate_process( + test_user, + test_system, + timestamp + timedelta(seconds=1), + logon_id, + r"C:\Windows\explorer.exe", + "explorer.exe", + parent_pid=4, + ) + second_pid = activity_gen.generate_process( + test_user, + test_system, + timestamp + timedelta(seconds=2), + logon_id, + r"C:\Windows\explorer.exe", + "explorer.exe", + parent_pid=4, + ) + + assert first_pid == session.explorer_pid + assert second_pid == session.explorer_pid + emitted = [ + call.args[0] for call in mock_emitters["windows_event_security"].emit.call_args_list + ] + assert all( + not ( + event.event_type == "process_create" + and event.process is not None + and event.process.image.lower().endswith("explorer.exe") + ) + for event in emitted + ) + def test_repeated_one_shot_cli_processes_get_human_scale_spacing( self, activity_gen, test_user, test_system, state_manager ): diff --git a/tests/unit/test_phase5_system_traffic.py b/tests/unit/test_phase5_system_traffic.py index 29cf13d1..b2eaaf45 100644 --- a/tests/unit/test_phase5_system_traffic.py +++ b/tests/unit/test_phase5_system_traffic.py @@ -29,6 +29,7 @@ import pytest from evidenceforge.generation.activity import ActivityGenerator +from evidenceforge.generation.activity.system_processes import load_system_processes from evidenceforge.generation.engine.baseline import ( _dc_kerberos_cycle_range, _dc_kerberos_tgs_range, @@ -618,6 
+619,107 @@ def test_process_tree_depth(self, state_manager, mock_emitters, win_system): ) +class TestSystemProcessSessionOwnership: + """Test source-native ownership for system-pool Windows process candidates.""" + + def test_shell_uwp_processes_use_active_interactive_session(self, state_manager, mock_emitters): + """Shell/UWP processes selected by system traffic should not run as SYSTEM/session 0.""" + from evidenceforge.generation.engine import GenerationEngine + + system = System( + hostname="WKS-01", + ip="10.0.10.1", + os="Windows 10", + type="workstation", + assigned_user="alice", + ) + user = User(username="alice", full_name="Alice", email="alice@example.com") + engine = object.__new__(GenerationEngine) + engine.state_manager = state_manager + engine._system_pids = {} + pids: dict[str, int] = {} + engine._seed_windows_process_tree(system, pids) + + ag = ActivityGenerator(state_manager, mock_emitters) + ag._system_pids = {system.hostname: pids} + ag._users_by_username = {user.username: user} + timestamp = datetime(2024, 3, 15, 10, 0, 0, tzinfo=UTC) + logon_id = ag.generate_logon(user, system, timestamp, logon_type=2) + mock_emitters["windows_event_security"].reset_mock() + + pid = ag.generate_system_process( + system, + timestamp + timedelta(seconds=3), + r"C:\Windows\System32\sihost.exe", + "sihost.exe", + parent_pid=pids["svchost_netsvcs"], + username="SYSTEM", + ) + + assert pid != 0 + proc = state_manager.get_process(system.hostname, pid) + assert proc is not None + assert proc.username == user.username + assert proc.logon_id == logon_id + emitted = [ + call.args[0] for call in mock_emitters["windows_event_security"].emit.call_args_list + ] + process_event = next( + event + for event in emitted + if event.event_type == "process_create" and event.process.pid == pid + ) + assert process_event.auth.username == user.username + assert process_event.auth.logon_id == logon_id + assert process_event.auth.logon_type == 2 + assert process_event.process.integrity_level == "Medium" + assert all( + event.event_type != "system_process_create" or event.process.pid != pid + for event in emitted + ) + + def test_shell_uwp_processes_skip_without_interactive_session( + self, activity_gen, state_manager, mock_emitters + ): + """Desktop-only shell helpers should not appear on hosts without a desktop session.""" + system = System(hostname="DC-01", ip="10.0.0.10", os="Windows Server 2022", type="server") + state_manager.set_current_time(datetime(2024, 3, 15, 10, 0, 0, tzinfo=UTC)) + + pid = activity_gen.generate_system_process( + system, + datetime(2024, 3, 15, 10, 0, 1, tzinfo=UTC), + r"C:\Windows\System32\RuntimeBroker.exe", + "RuntimeBroker.exe -Embedding", + parent_pid=4, + username="SYSTEM", + ) + + assert pid == 0 + emitted = [ + call.args[0] for call in mock_emitters["windows_event_security"].emit.call_args_list + ] + assert all( + "runtimebroker.exe" not in (event.process.image or "").lower() + for event in emitted + if event.process is not None + ) + + def test_system_service_pools_exclude_desktop_shell_helpers(self): + """System service config should not list user-session shell/UWP processes.""" + service_pools = load_system_processes()["system_services"] + pool_images = { + image.rsplit("\\", 1)[-1].lower() + for pool_name in ("all", "workstation") + for entry in service_pools[pool_name] + for image in [entry["image"]] + } + + assert "sihost.exe" not in pool_images + assert "runtimebroker.exe" not in pool_images + assert "backgroundtaskhost.exe" not in pool_images + assert "searchhost.exe" 
not in pool_images + + class TestInfrastructureTrafficGeneration: """Test Kerberos/LDAP/DB traffic detection and generation.""" From f991a778395321121f9bb192db114b88e53fef71 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 02:08:39 -0400 Subject: [PATCH 41/61] docs: record loop 20 assessment results --- TODO.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/TODO.md b/TODO.md index ffd3459d..31d505df 100644 --- a/TODO.md +++ b/TODO.md @@ -377,7 +377,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r future batch is Linux bash/syslog command-pool repetition, followed by Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior. -- [ ] **IN PROGRESS** Current-dev assessment continuation loops 11-20 — continue the iterative EvidenceForge realism loop from Loop 10, starting with Linux bash/syslog command-pool repetition, then Zeek HTTP `CONNECT` source-visibility semantics and richer proxy cache behavior unless fresh blind findings reprioritize the work. +- [x] Current-dev assessment continuation loops 11-20 — continued the iterative EvidenceForge realism loop from Loop 10 through Loop 20, fixing the highest-leverage verified blind-review findings each pass and committing each completed fix before regeneration/review. - [x] Loop 11 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `e9ff69c`: diversified Linux bash/syslog command texture with per-generation bash command memory, expanded common command tails, and data-driven sudo syslog placeholder pools. Verification passed with focused regressions, `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3102 passed, 37 skipped`). Regenerated eval passed at `95.83/100` across `78,559` records; hard probes showed max bash exact repeat dropped from `21` to `8` and max sudo exact repeat from `28` to `2`. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `84`, Network `74`, Host/EDR `88` (average `82.0`), indicating deeper concrete defects surfaced after the command-pool tell was reduced. Top Loop 12 targets are eCAR read-only command file-create artifacts, DHCP syslog ordering, Linux systemd parent PID ownership, Windows `root` identity bleed/4624 caller-process semantics, web static response byte/MIME state, and ICMP echo byte symmetry. - [x] Loop 12 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `b7c8a70`: repaired source-native host contradictions from the loop-11 panel, including read-only command output parsing, Linux PID 1 systemd ownership, DHCP client lifecycle ordering, Windows explicit-credential subject coercion, and 4624 caller-process semantics. Verification passed with focused regressions (`318 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3110 passed, 37 skipped`). Regenerated eval passed at `96.17/100` across `79,317` records; hard probes showed zero dash-prefixed read-only file artifacts, zero Linux systemd parent PID violations, zero Windows `root` identity mentions, zero 4624 caller-process mismatches, and zero DHCP ACK-before-REQUEST ordering failures. 
Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `68`, Host/EDR `74` (average `73.5`), with top Loop 13 targets now static web/Zeek HTTP response semantics, native Sysmon Event 1/GUID fidelity, dual-Zeek sensor observation determinism, DC 4776 workstation attribution, and pooled host command/daemon texture. - [x] Loop 13 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `93463e4`: repaired static web/Zeek HTTP response semantics so cacheable hashed/static assets keep stable source-native content lengths and zero-body responses no longer render as successful `200` objects with concrete MIME bodies. Verification passed with focused HTTP/browsing/web tests (`69 passed, 1 skipped`), the broader HTTP/proxy/emitter slice (`286 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, and full normal `uv run pytest --no-cov -q` (`3117 passed, 37 skipped`). Regenerated eval passed at `96.64/100` across `82,065` records; hard probes showed zero static `200` zero-body rows, zero bad `304` body/MIME rows, zero zero-body `206` rows, zero refined web static `200` unstable-size groups, and zero Zeek static `200` unstable-size groups. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `68`, Network `72`, Host/EDR `82` (average `74.5`), and the prior web/static-body issue disappeared from reviewer findings. Top Loop 14 targets are SSH command target/network-destination contradictions, Zeek UDP/53 DNS-service zero-payload rows, native Kerberos 4624 `WorkstationName` semantics, Sysmon provider thread-ID distribution, and developer-tool current-directory realism. @@ -388,7 +388,9 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 17 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `eaf090a`: repaired Zeek UDP/DNS packet accounting and cross-sensor DNS payload consistency, blocked workstation lock/unlock/Type 7 unlock generation from Type 3/5 sessions, selected interactive/RDP sessions for storyline lock/unlock events, and added source-local sub-millisecond jitter for SSH syslog lifecycles across baseline, explicit SSH, and SCP receiver artifacts. Verification passed with focused regressions, broader related slices (`406 passed, 13 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3131 passed, 37 skipped`). Regenerated eval passed at exact `96.09/100` across `81,173` records; hard probes found zero UDP/DNS IP-byte violations, zero cross-sensor DNS payload mismatches, zero Type 3/5 LUID lock/unlock violations, and zero repeated low-microsecond SSH lifecycle suffix groups. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `74`, Network `72`, Host/EDR `68` (average `71.5`). Top Loop 18 targets are exact `0.8` Zeek TLS duration clustering, proxy HTTPS source-mode semantics, Sysmon/Security LogonGuid fidelity, SCP receiver-side eCAR process attribution, bash-history authoredness, and SMB/REJ statistical polish. - [x] Loop 18 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `bc738f2`: repaired the high-volume exact `0.8` Zeek TLS/web duration floor by adding deterministic post-floor texture in both generator-owned TLS durations and Zeek render-time fallback floors. 
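A minimal sketch of what stable-seeded post-floor duration texture can look like, assuming a hypothetical `textured_duration` helper and a `0.8`-second floor (the helper name, floor value, and jitter shape are illustrative, not the generator's actual API):

```python
import hashlib
import random


def textured_duration(raw_duration: float, flow_key: str, floor: float = 0.8) -> float:
    """Apply a duration floor, then add small flow-stable jitter above it.

    flow_key identifies the flow (e.g. "src:dst:port:ts"); the same key always
    yields the same jitter, so generation stays deterministic while rendered
    durations no longer cluster on the exact floor value.
    """
    if raw_duration >= floor:
        return raw_duration
    seed = int.from_bytes(hashlib.sha256(flow_key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    # Skewed jitter: most flows land a few ms above the floor, a few land much higher.
    jitter = rng.uniform(0.002, 0.05) if rng.random() < 0.8 else rng.uniform(0.05, 0.4)
    return round(floor + jitter, 6)
```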
Verification passed with focused TLS/activity tests, broader Zeek/activity/timing slices (`289 passed, 13 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3131 passed, 37 skipped`). Regenerated eval passed at exact `96.09/100` across `81,173` records; hard probes found zero exact `0.8` TLS rows, zero rows in the `0.800`-`0.801` band, max repeated TLS duration bucket of `2`, and preserved prior UDP/DNS, cross-sensor DNS, Windows LUID, and SSH syslog gates. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `72`, Network `74`, Host/EDR `64` (average `71.5`). Top Loop 19 targets are HTTP/proxy source-native response semantics, cross-sensor Zeek timing-band regularity, public TLS/web long-tail texture, same-LUID Security/Sysmon LogonGuid consistency, and bash/host authoredness. - [x] Loop 19 fix, regeneration, hard probes, quantitative eval, and blind review completed from commits `dc4616c` and `b4c99b1`: repaired HTTP/proxy source-native response semantics by blocking Windows Update Agent from Linux package paths, OS-gating `packages.microsoft.com` URI/UA selection, normalizing redirect/error HTTP MIME to source-native HTML bodies, and preventing HTTP file-transfer fan-out from reintroducing asset MIME on redirects/errors. Verification passed with focused HTTP/proxy/Zeek tests, broader related slices (`269 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3134 passed, 37 skipped`). Regenerated eval passed at exact `96.23/100` across `90,613` records; hard probes found zero Windows Update Agent + Ubuntu package path violations, zero redirect/error asset-MIME violations, zero exact `0.8` TLS rows, and zero rows in the `0.800`-`0.801` TLS duration band. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `76`, Host/EDR `82` (average `77.5`). Top Loop 20 targets are user-shell/UWP processes incorrectly modeled as SYSTEM/session 0, duplicate `explorer.exe` shells from one `userinit.exe`, Zeek HTTP connection reuse, Snort TLS-failure vs Zeek established-TLS conflicts, and Linux `scp` eCAR lifecycle coverage. - - [ ] **IN PROGRESS** Loop 20 fix pass — repair the highest-leverage Loop 19 endpoint source-native process/session contradictions first: user-shell/UWP processes (`sihost.exe`, `RuntimeBroker.exe`, `backgroundTaskHost.exe`, `SearchHost.exe`) should bind to interactive user sessions instead of SYSTEM/session 0, and interactive logon startup should not create multiple simultaneous primary `explorer.exe` shells from the same `userinit.exe`. + - [x] Loop 20 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `76bc107`: repaired user-shell/UWP process ownership by removing desktop helpers from system-service pools, rerouting any remaining `sihost.exe`, `RuntimeBroker.exe`, `backgroundTaskHost.exe`, and `SearchHost.exe` system-process selections into the active interactive session, and reusing the per-session `explorer.exe` shell instead of emitting duplicate primary shell creates. Verification passed with focused regressions, related activity/system-process/spawn-rule tests (`236 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3138 passed, 37 skipped`). 
Regenerated eval passed at exact `96.25/100` across `79,758` records; hard probes found zero shell/UWP SYSTEM/session-0 violations across Security, Sysmon, and eCAR, zero duplicate primary `explorer.exe` create clusters, and preserved prior Windows Update/Ubuntu, redirect/error MIME, and TLS duration gates. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `74`, Network `82`, Host/EDR `70` (average `74.5`). Top Loop 21 target is the concrete Windows 5156/eCAR browser-flow attribution defect where browser-like Zeek HTTP rows join to host sockets/processes attributed to `svchost.exe`; next targets are Linux syslog/bash cadence, public DNS/X.509 corpus realism, Zeek HTTP connection reuse, and cross-sensor timing texture. +- [ ] **IN PROGRESS** Current-dev assessment continuation loops 21-30 — continue the requested next 10 EvidenceForge realism loops from Loop 20, targeting the highest-leverage verified findings from each panel while keeping the draft PR open. + - [ ] **IN PROGRESS** Loop 21 fix pass — verify and repair the Detection Engineer's Windows Filtering Platform/process-network attribution finding first: browser-like Zeek HTTP User-Agents should not join to Windows 5156/eCAR flows owned by `svchost.exe` unless the traffic is explicitly service-native; fix at the canonical process/network ownership layer and add a rendered-output hard probe. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From c9a7b72e5edf5d3d7961d1a32d8442820d065160 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 02:19:27 -0400 Subject: [PATCH 42/61] fix: attribute browser http flows to user processes --- .../config/activity/web_session_profiles.yaml | 1 + .../generation/activity/generator.py | 269 +++++++++++++++++- tests/unit/test_edr_flow_pid.py | 133 ++++++++- tests/unit/test_web_session_profiles.py | 6 + 4 files changed, 397 insertions(+), 12 deletions(-) diff --git a/src/evidenceforge/config/activity/web_session_profiles.yaml b/src/evidenceforge/config/activity/web_session_profiles.yaml index d4599b2b..8535f02c 100644 --- a/src/evidenceforge/config/activity/web_session_profiles.yaml +++ b/src/evidenceforge/config/activity/web_session_profiles.yaml @@ -13,6 +13,7 @@ visitor_classes: kind: session external: true internal: true + source_type_any: ["workstation"] browsing_intensity: normal user_agent_pool: browser_any user_agent_pool_by_os: diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 7690d4c8..f90b10c1 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -2753,16 +2753,64 @@ def _explicit_proxy_client_process_hint( proxy_sys: System, ) -> tuple[str, str] | None: """Map user-owned proxy User-Agents to the process that owns the socket.""" + browser_hint = self._browser_http_client_process_hint( + user_agent=user_agent, + hostname=hostname, + dst_port=dst_port, + ) + if browser_hint is not None: + return browser_hint + ua = (user_agent or "").lower() if not ua: return None - scheme = "https" if dst_port == 443 else "http" - target_url = f"{scheme}://{hostname}/" if hostname else f"{scheme}://" + target_url = self._http_target_url(hostname=hostname, uri="/", dst_port=dst_port) proxy_url = ( f"http://{self._proxy_fqdn(proxy_sys)}:{getattr(self, '_proxy_listener_port', 8080)}" ) + if ua.startswith("curl/") or " curl/" in ua: + image = r"C:\Windows\System32\curl.exe" + return image, f'curl.exe --proxy {proxy_url} "{target_url}"' + if "powershell" in ua or "invoke-webrequest" in ua: + image = r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" + return image, ( + f'powershell.exe -NoProfile -Command "Invoke-WebRequest ' + f"-Proxy '{proxy_url}' -Uri '{target_url}' -UseBasicParsing\"" + ) + return None + + @staticmethod + def _http_target_url(*, hostname: str, uri: str, dst_port: int) -> str: + """Build the URL used in source-native client process command lines.""" + path = uri or "/" + if path.startswith(("http://", "https://")): + return path + if not path.startswith("/"): + path = f"/{path}" + scheme = "https" if dst_port == 443 else "http" + if not hostname: + return f"{scheme}://" + host = hostname + if dst_port not in (80, 443) and ":" not in host: + host = f"{host}:{dst_port}" + return f"{scheme}://{host}{path}" + + def _browser_http_client_process_hint( + self, + *, + user_agent: str, + hostname: str, + dst_port: int, + uri: str = "/", + ) -> tuple[str, str] | None: + """Map browser-like Windows HTTP User-Agents to their owning process.""" + ua = (user_agent or "").lower() + if not ua: + return None + + target_url = self._http_target_url(hostname=hostname, uri=uri, dst_port=dst_port) if "firefox/" in ua: image = r"C:\Program Files\Mozilla Firefox\firefox.exe" return image, f'"{image}" -osint -url {target_url}' @@ -2775,15 +2823,6 @@ def _explicit_proxy_client_process_hint( if "trident/" in ua or "msie " in ua: image = r"C:\Program Files\Internet Explorer\iexplore.exe" return image, f'"{image}" {target_url}' - 
if ua.startswith("curl/") or " curl/" in ua: - image = r"C:\Windows\System32\curl.exe" - return image, f'curl.exe --proxy {proxy_url} "{target_url}"' - if "powershell" in ua or "invoke-webrequest" in ua: - image = r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" - return image, ( - f'powershell.exe -NoProfile -Command "Invoke-WebRequest ' - f"-Proxy '{proxy_url}' -Uri '{target_url}' -UseBasicParsing\"" - ) return None def _select_explicit_proxy_client_session( @@ -2903,6 +2942,206 @@ def _ensure_explicit_proxy_client_process( self.state_manager.set_current_time(time) return pid, image + def _ensure_browser_http_client_process( + self, + *, + source_system: System | None, + time: datetime, + http: HttpContext, + dst_port: int, + ) -> tuple[int, str | None]: + """Create or reuse the browser process that owns a Windows HTTP socket.""" + if source_system is None or _get_os_category(source_system.os) != "windows": + return -1, None + + hint = self._browser_http_client_process_hint( + user_agent=http.user_agent, + hostname=http.host, + dst_port=dst_port, + uri=http.uri, + ) + if hint is None: + return -1, None + + image, command_line = hint + session = self._active_interactive_windows_session(source_system, time) + if session is None: + return -1, None + user = self._user_model_for_username(session.username) + + image_lower = image.lower() + running_candidates = [ + proc + for proc in self.state_manager.get_processes_on_system(source_system.hostname) + if proc.username == user.username + and proc.logon_id == session.logon_id + and proc.image.lower() == image_lower + and proc.start_time is not None + and proc.start_time <= time + and not self._foreground_process_expired_for_attribution(source_system, proc, time) + ] + if running_candidates: + proc = max(running_candidates, key=lambda candidate: candidate.start_time) + self.state_manager.update_process_activity_time( + source_system.hostname, + proc.pid, + time, + ) + return proc.pid, proc.image + + process_rng = random.Random( + _stable_seed( + "browser_http_client_process:" + f"{source_system.hostname}:{user.username}:{image}:{http.host}:{http.uri}" + ) + ) + lead_seconds = process_rng.uniform(0.4, 8.0) + process_time = time - timedelta(seconds=lead_seconds) + min_process_time = session.start_time + timedelta(milliseconds=500) + if process_time < min_process_time: + process_time = min_process_time + if process_time >= time: + process_time = time - timedelta(milliseconds=100) + + parent_pid = self._select_parent_pid( + source_system, + user, + image, + time=process_time, + logon_id=session.logon_id, + ) + pid = self.generate_process( + user=user, + system=source_system, + time=process_time, + logon_id=session.logon_id, + process_name=image, + command_line=command_line, + parent_pid=parent_pid, + suppress_command_file_effect=True, + allow_existing_browser_reuse=False, + ) + self._record_user_process(source_system, user, pid, image) + self.state_manager.update_process_activity_time(source_system.hostname, pid, time) + self.state_manager.set_current_time(time) + running = self.state_manager.get_process(source_system.hostname, pid) + if running is not None: + return pid, running.image + return pid, image + + def _set_connection_process_context( + self, + event: SecurityEvent, + *, + source_system: System, + pid: int, + image: str | None = None, + ) -> None: + """Update canonical connection process ownership from StateManager.""" + running = self.state_manager.get_process(source_system.hostname, pid) + if running is not None: + 
event.process = ProcessContext( + pid=pid, + parent_pid=running.parent_pid, + image=running.image, + command_line=running.command_line, + username=running.username, + logon_id=running.logon_id, + start_time=running.start_time, + parent_start_time=self._lookup_parent_start_time( + source_system.hostname, + running.parent_pid, + ), + ) + elif image: + event.process = ProcessContext( + pid=pid, + parent_pid=0, + image=image, + command_line="", + username="", + ) + else: + event.process = None + event.network.initiating_pid = pid + if event.edr is not None: + event.edr.actor_id = ( + self.state_manager.get_process_object_id(source_system.hostname, pid) + if pid > 0 + else "" + ) + + def _repair_browser_http_process_attribution( + self, + event: SecurityEvent, + *, + source_system: System | None, + time: datetime, + ) -> None: + """Prevent browser-like HTTP rows from inheriting service-process ownership.""" + if ( + source_system is None + or event.http is None + or event.network is None + or _get_os_category(source_system.os) != "windows" + ): + return + + hint = self._browser_http_client_process_hint( + user_agent=event.http.user_agent, + hostname=event.http.host, + dst_port=event.network.dst_port, + uri=event.http.uri, + ) + if hint is None: + return + + expected_image = hint[0].lower() + current_pid = event.network.initiating_pid + if current_pid > 0: + current = self.state_manager.get_process(source_system.hostname, current_pid) + if ( + current is not None + and current.image.lower() == expected_image + and not self._foreground_process_expired_for_attribution( + source_system, + current, + time, + ) + ): + self._set_connection_process_context( + event, + source_system=source_system, + pid=current_pid, + ) + self.state_manager.update_process_activity_time( + source_system.hostname, + current_pid, + time, + ) + return + + client_pid, client_image = self._ensure_browser_http_client_process( + source_system=source_system, + time=time, + http=event.http, + dst_port=event.network.dst_port, + ) + if client_pid > 0: + self._set_connection_process_context( + event, + source_system=source_system, + pid=client_pid, + image=client_image, + ) + return + + self._set_connection_process_context( + event, + source_system=source_system, + pid=-1, + ) + def _caller_explicit_proxy_process_image( self, *, @@ -7546,6 +7785,14 @@ def generate_connection( allow_failure=False, ) + self._repair_browser_http_process_attribution( + event, + source_system=resolved_source_system, + time=time, + ) + pid = event.network.initiating_pid + process_ctx = event.process + # Automatic weird.log synthesis is intentionally disabled for now. The # Zeek weird type space is broad and state-sensitive; poorly matched # weird rows are more damaging than sparse weird.log output. 
Explicit diff --git a/tests/unit/test_edr_flow_pid.py b/tests/unit/test_edr_flow_pid.py index bf96f44e..d2be5715 100644 --- a/tests/unit/test_edr_flow_pid.py +++ b/tests/unit/test_edr_flow_pid.py @@ -31,9 +31,10 @@ import pytest +from evidenceforge.events.contexts import HttpContext from evidenceforge.generation.activity import ActivityGenerator from evidenceforge.generation.state_manager import StateManager -from evidenceforge.models import System +from evidenceforge.models import System, User @pytest.fixture @@ -225,6 +226,136 @@ def test_inferred_dns_pid_prefers_dns_client_service( assert event.edr is not None assert event.edr.actor_id == state_manager.get_process_object_id("WKS-01", local_svc_pid) + @staticmethod + def _browser_http_context() -> HttpContext: + return HttpContext( + method="GET", + host="intranet.example.org", + uri="/", + version="1.1", + user_agent=( + "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0" + ), + request_body_len=0, + response_body_len=2048, + status_code=200, + status_msg="OK", + resp_mime_types=["text/html"], + tags=[], + ) + + def test_browser_http_flow_uses_interactive_browser_instead_of_svchost( + self, activity_gen, state_manager, timestamp, win_system, mock_emitters + ): + """Browser-like HTTP should resolve to a user browser process, not service svchost.""" + user = User(username="jdoe", full_name="Jane Doe", email="jdoe@example.org") + activity_gen._users_by_username = {user.username: user} + state_manager.set_current_time(timestamp - timedelta(minutes=10)) + logon_id = state_manager.create_session( + username=user.username, + system=win_system.hostname, + logon_type=2, + source_ip=win_system.ip, + ) + explorer_pid = state_manager.create_process( + win_system.hostname, + 4, + r"C:\Windows\explorer.exe", + "explorer.exe", + user.username, + "Medium", + logon_id=logon_id, + ) + session = state_manager.get_session(logon_id) + assert session is not None + session.explorer_pid = explorer_pid + svchost_pid = state_manager.create_process( + win_system.hostname, + 4, + r"C:\Windows\System32\svchost.exe", + "svchost.exe -k netsvcs", + "NETWORK SERVICE", + "System", + logon_id="0x3e4", + ) + activity_gen._ip_to_system = {win_system.ip: win_system} + activity_gen._system_pids = {win_system.hostname: {"svchost_netsvcs": svchost_pid}} + + activity_gen.generate_connection( + src_ip=win_system.ip, + dst_ip="10.0.20.10", + time=timestamp, + dst_port=80, + proto="tcp", + service="http", + duration=0.5, + orig_bytes=400, + resp_bytes=2048, + conn_state="SF", + source_system=win_system, + http=self._browser_http_context(), + ) + + event = self._find_connection_event(mock_emitters) + assert event is not None + assert event.process is not None + assert event.process.pid == event.network.initiating_pid + assert event.process.pid != svchost_pid + assert event.process.username == user.username + assert event.process.image.endswith(r"\Mozilla Firefox\firefox.exe") + wfp_event = next( + call.args[0] + for call in mock_emitters["windows_event_security"].emit.call_args_list + if call.args[0].event_type == "wfp_connection" + ) + assert wfp_event.process is not None + assert wfp_event.process.image.endswith(r"\Mozilla Firefox\firefox.exe") + + def test_browser_http_flow_without_interactive_session_clears_svchost_attribution( + self, activity_gen, state_manager, timestamp, win_system, mock_emitters + ): + """A browser UA without a user session should not be rendered as svchost-owned.""" + state_manager.set_current_time(timestamp) + svchost_pid = 
state_manager.create_process( + win_system.hostname, + 4, + r"C:\Windows\System32\svchost.exe", + "svchost.exe -k netsvcs", + "NETWORK SERVICE", + "System", + logon_id="0x3e4", + ) + activity_gen._ip_to_system = {win_system.ip: win_system} + activity_gen._system_pids = {win_system.hostname: {"svchost_netsvcs": svchost_pid}} + + activity_gen.generate_connection( + src_ip=win_system.ip, + dst_ip="10.0.20.10", + time=timestamp, + dst_port=80, + proto="tcp", + service="http", + duration=0.5, + orig_bytes=400, + resp_bytes=2048, + conn_state="SF", + source_system=win_system, + http=self._browser_http_context(), + ) + + event = self._find_connection_event(mock_emitters) + assert event is not None + assert event.network.initiating_pid == -1 + assert event.process is None + assert event.edr is not None + assert event.edr.actor_id == "" + wfp_events = [ + call.args[0] + for call in mock_emitters["windows_event_security"].emit.call_args_list + if call.args[0].event_type == "wfp_connection" + ] + assert not wfp_events + def test_connection_timestamp_not_before_process_start( self, activity_gen, state_manager, timestamp, win_system, mock_emitters ): diff --git a/tests/unit/test_web_session_profiles.py b/tests/unit/test_web_session_profiles.py index 967ac02e..901a220f 100644 --- a/tests/unit/test_web_session_profiles.py +++ b/tests/unit/test_web_session_profiles.py @@ -49,6 +49,12 @@ def test_health_check_profile_is_server_scoped(): assert "monitoring" in profile["source_role_any"] +def test_internal_human_browser_profile_is_workstation_scoped(): + profile = load_web_session_profiles()["visitor_classes"]["human_browser"] + + assert profile["source_type_any"] == ["workstation"] + + def test_user_agent_honors_source_os_pool(): profile = load_web_session_profiles()["visitor_classes"]["human_browser"] ua = pick_web_user_agent(random.Random(1), profile, source_os="linux") From 6b589ecd88fe098cf0f4d2b2dd19848f62f184d8 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 02:30:34 -0400 Subject: [PATCH 43/61] docs: record loop 21 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 31d505df..c2c7f2a1 100644 --- a/TODO.md +++ b/TODO.md @@ -390,7 +390,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 19 fix, regeneration, hard probes, quantitative eval, and blind review completed from commits `dc4616c` and `b4c99b1`: repaired HTTP/proxy source-native response semantics by blocking Windows Update Agent from Linux package paths, OS-gating `packages.microsoft.com` URI/UA selection, normalizing redirect/error HTTP MIME to source-native HTML bodies, and preventing HTTP file-transfer fan-out from reintroducing asset MIME on redirects/errors. Verification passed with focused HTTP/proxy/Zeek tests, broader related slices (`269 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3134 passed, 37 skipped`). Regenerated eval passed at exact `96.23/100` across `90,613` records; hard probes found zero Windows Update Agent + Ubuntu package path violations, zero redirect/error asset-MIME violations, zero exact `0.8` TLS rows, and zero rows in the `0.800`-`0.801` TLS duration band. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `76`, Host/EDR `82` (average `77.5`). 
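A minimal sketch of a redirect/error asset-MIME hard probe over a JSON-lines Zeek `http.log`, assuming the rendered rows keep Zeek's native `status_code`/`resp_mime_types` field names (the path and MIME set are illustrative):

```python
import json
from pathlib import Path

ASSET_MIMES = {"image/png", "image/jpeg", "text/css", "application/javascript", "font/woff2"}


def redirect_error_asset_mime_rows(http_log: Path) -> list[dict]:
    """Flag Zeek http.log rows where a 3xx/4xx/5xx response still carries asset MIME types."""
    bad_rows = []
    with http_log.open() as handle:
        for line in handle:
            row = json.loads(line)
            status = row.get("status_code") or 0
            mimes = set(row.get("resp_mime_types") or [])
            if status >= 300 and mimes & ASSET_MIMES:
                bad_rows.append(row)
    return bad_rows


# Usage (path is illustrative): assert not redirect_error_asset_mime_rows(Path("zeek/http.log"))
```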
Top Loop 20 targets are user-shell/UWP processes incorrectly modeled as SYSTEM/session 0, duplicate `explorer.exe` shells from one `userinit.exe`, Zeek HTTP connection reuse, Snort TLS-failure vs Zeek established-TLS conflicts, and Linux `scp` eCAR lifecycle coverage. - [x] Loop 20 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `76bc107`: repaired user-shell/UWP process ownership by removing desktop helpers from system-service pools, rerouting any remaining `sihost.exe`, `RuntimeBroker.exe`, `backgroundTaskHost.exe`, and `SearchHost.exe` system-process selections into the active interactive session, and reusing the per-session `explorer.exe` shell instead of emitting duplicate primary shell creates. Verification passed with focused regressions, related activity/system-process/spawn-rule tests (`236 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3138 passed, 37 skipped`). Regenerated eval passed at exact `96.25/100` across `79,758` records; hard probes found zero shell/UWP SYSTEM/session-0 violations across Security, Sysmon, and eCAR, zero duplicate primary `explorer.exe` create clusters, and preserved prior Windows Update/Ubuntu, redirect/error MIME, and TLS duration gates. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `74`, Network `82`, Host/EDR `70` (average `74.5`). Top Loop 21 target is the concrete Windows 5156/eCAR browser-flow attribution defect where browser-like Zeek HTTP rows join to host sockets/processes attributed to `svchost.exe`; next targets are Linux syslog/bash cadence, public DNS/X.509 corpus realism, Zeek HTTP connection reuse, and cross-sensor timing texture. - [ ] **IN PROGRESS** Current-dev assessment continuation loops 21-30 — continue the requested next 10 EvidenceForge realism loops from Loop 20, targeting the highest-leverage verified findings from each panel while keeping the draft PR open. - - [ ] **IN PROGRESS** Loop 21 fix pass — verify and repair the Detection Engineer's Windows Filtering Platform/process-network attribution finding first: browser-like Zeek HTTP User-Agents should not join to Windows 5156/eCAR flows owned by `svchost.exe` unless the traffic is explicitly service-native; fix at the canonical process/network ownership layer and add a rendered-output hard probe. + - [x] Loop 21 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `c9a7b72`: repaired browser-like HTTP process/network ownership by resolving Windows browser User-Agents to active interactive browser processes when possible, clearing misleading service-process attribution when no user session exists, and scoping internal human-browser web visitors to workstation clients. Verification passed with focused regressions, related eCAR/proxy/web/baseline tests (`107 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3141 passed, 37 skipped`). Regenerated eval passed at exact `96.40/100` across `80,222` records; hard probes found zero browser-like Zeek HTTP rows joined to Windows 5156 or eCAR `svchost.exe`, while all `497` matched WFP browser rows were browser-owned and service/tool HTTP retained service ownership where appropriate. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `68`, Network `84`, Host/EDR `72` (average `75.5`). 
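A minimal sketch of the browser-UA/WFP join probe described above, assuming JSON-lines renderings of Zeek `http.log` and Windows Security events that keep native field names (`id.orig_h`, `SourceAddress`, `Application`, ...); the paths, top-level field layout, and UA markers are assumptions:

```python
import json
from pathlib import Path

BROWSER_UA_MARKERS = ("Firefox/", "Edg/", "Chrome/", "Trident/")


def browser_rows_owned_by_svchost(http_log: Path, security_log: Path) -> list[tuple[dict, dict]]:
    """Join browser-UA Zeek http rows to 5156 rows on the 4-tuple and flag svchost ownership."""
    wfp_by_tuple: dict[tuple, dict] = {}
    for line in security_log.read_text().splitlines():
        row = json.loads(line)
        if row.get("EventID") != 5156:
            continue
        key = (row.get("SourceAddress"), str(row.get("SourcePort")),
               row.get("DestAddress"), str(row.get("DestPort")))
        wfp_by_tuple[key] = row

    offenders = []
    for line in http_log.read_text().splitlines():
        row = json.loads(line)
        ua = row.get("user_agent") or ""
        if not any(marker in ua for marker in BROWSER_UA_MARKERS):
            continue
        key = (row.get("id.orig_h"), str(row.get("id.orig_p")),
               row.get("id.resp_h"), str(row.get("id.resp_p")))
        wfp = wfp_by_tuple.get(key)
        if wfp and "svchost.exe" in (wfp.get("Application") or "").lower():
            offenders.append((row, wfp))
    return offenders
```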
Top Loop 22 target is Linux endpoint realism: bash command cadence and fleet-wide repeated syslog daemon pools, followed by public DNS/X.509 metadata templates, Zeek multi-sensor timing texture, HTTP connection reuse, and SYSTEM Sysmon `LogonGuid` morphology. + - [ ] **IN PROGRESS** Loop 22 fix pass — repair the multi-reviewer Linux endpoint realism finding first: make Linux syslog daemon noise host-role-specific and less fleet-uniform, and make bash histories session-aware with human pacing/repetition limits; add rendered-output hard probes for repeated daemon-message concentration and short-gap bash command bursts. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From ecc45ef5f88e06ec42f7af86cf9b6753d778c2ec Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 02:38:41 -0400 Subject: [PATCH 44/61] fix: reduce linux endpoint cadence fingerprints --- TODO.md | 3 +- src/evidenceforge/cli/validate_config.py | 13 ++++ .../activity/extra_syslog_messages.yaml | 60 +++++++++++++++--- src/evidenceforge/config/schemas.py | 2 + .../generation/activity/bash_commands.py | 4 +- .../generation/activity/extra_syslog.py | 28 ++++++++- .../generation/activity/generator.py | 38 +++++++++-- .../generation/engine/baseline.py | 1 + tests/unit/test_bash_history_noise.py | 27 +++++++- tests/unit/test_validate_config.py | 63 +++++++++++++++++++ 10 files changed, 221 insertions(+), 18 deletions(-) diff --git a/TODO.md b/TODO.md index c2c7f2a1..8fda4bf9 100644 --- a/TODO.md +++ b/TODO.md @@ -391,7 +391,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 20 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `76bc107`: repaired user-shell/UWP process ownership by removing desktop helpers from system-service pools, rerouting any remaining `sihost.exe`, `RuntimeBroker.exe`, `backgroundTaskHost.exe`, and `SearchHost.exe` system-process selections into the active interactive session, and reusing the per-session `explorer.exe` shell instead of emitting duplicate primary shell creates. Verification passed with focused regressions, related activity/system-process/spawn-rule tests (`236 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3138 passed, 37 skipped`). Regenerated eval passed at exact `96.25/100` across `79,758` records; hard probes found zero shell/UWP SYSTEM/session-0 violations across Security, Sysmon, and eCAR, zero duplicate primary `explorer.exe` create clusters, and preserved prior Windows Update/Ubuntu, redirect/error MIME, and TLS duration gates. 
Blind synthetic-confidence scores were Threat Hunter `72`, Detection `74`, Network `82`, Host/EDR `70` (average `74.5`). Top Loop 21 target is the concrete Windows 5156/eCAR browser-flow attribution defect where browser-like Zeek HTTP rows join to host sockets/processes attributed to `svchost.exe`; next targets are Linux syslog/bash cadence, public DNS/X.509 corpus realism, Zeek HTTP connection reuse, and cross-sensor timing texture. - [ ] **IN PROGRESS** Current-dev assessment continuation loops 21-30 — continue the requested next 10 EvidenceForge realism loops from Loop 20, targeting the highest-leverage verified findings from each panel while keeping the draft PR open. - [x] Loop 21 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `c9a7b72`: repaired browser-like HTTP process/network ownership by resolving Windows browser User-Agents to active interactive browser processes when possible, clearing misleading service-process attribution when no user session exists, and scoping internal human-browser web visitors to workstation clients. Verification passed with focused regressions, related eCAR/proxy/web/baseline tests (`107 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3141 passed, 37 skipped`). Regenerated eval passed at exact `96.40/100` across `80,222` records; hard probes found zero browser-like Zeek HTTP rows joined to Windows 5156 or eCAR `svchost.exe`, while all `497` matched WFP browser rows were browser-owned and service/tool HTTP retained service ownership where appropriate. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `68`, Network `84`, Host/EDR `72` (average `75.5`). Top Loop 22 target is Linux endpoint realism: bash command cadence and fleet-wide repeated syslog daemon pools, followed by public DNS/X.509 metadata templates, Zeek multi-sensor timing texture, HTTP connection reuse, and SYSTEM Sysmon `LogonGuid` morphology. - - [ ] **IN PROGRESS** Loop 22 fix pass — repair the multi-reviewer Linux endpoint realism finding first: make Linux syslog daemon noise host-role-specific and less fleet-uniform, and make bash histories session-aware with human pacing/repetition limits; add rendered-output hard probes for repeated daemon-message concentration and short-gap bash command bursts. + - [x] Loop 22 fix pass — repaired the multi-reviewer Linux endpoint realism finding at the generator/config layers: bash histories now use per-host/user session pacing with bounded quick-command streaks, command pools suppress high exact-repeat counts, and extra syslog daemon pools are scoped by system type/role with lower weights and expanded templates. Verification before regeneration passed with focused regressions (`46 passed`), related activity/config tests (`212 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). + - [ ] **IN PROGRESS** Loop 22 regeneration, hard probes, quantitative eval, and blind review. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. 
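A minimal sketch of the planned short-gap/exact-repeat bash cadence probe, assuming rendered `.bash_history` files use the timestamped `#<epoch>` layout (an assumption about the collector output, not a confirmed format):

```python
from pathlib import Path


def bash_cadence_stats(history_file: Path) -> tuple[float, int]:
    """Return (fraction of inter-command gaps <= 10s, max exact-repeat count) for one history file.

    Assumes the rendered .bash_history alternates a "#<epoch>" timestamp line
    with the command line that follows it.
    """
    timestamps: list[int] = []
    counts: dict[str, int] = {}
    lines = history_file.read_text().splitlines()
    for idx, line in enumerate(lines):
        if line.startswith("#") and line[1:].isdigit():
            timestamps.append(int(line[1:]))
            if idx + 1 < len(lines):
                cmd = lines[idx + 1]
                counts[cmd] = counts.get(cmd, 0) + 1
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    short_fraction = sum(gap <= 10 for gap in gaps) / len(gaps) if gaps else 0.0
    return short_fraction, max(counts.values(), default=0)
```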
Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/cli/validate_config.py b/src/evidenceforge/cli/validate_config.py index 42fad68d..59e70a40 100644 --- a/src/evidenceforge/cli/validate_config.py +++ b/src/evidenceforge/cli/validate_config.py @@ -1920,6 +1920,19 @@ def _record_ids_rule_identity( if not isinstance(entry, dict): continue app = str(entry.get("app") or "") + _VALID_SYSLOG_SYSTEM_TYPES = {"workstation", "server", "domain_controller"} + for system_type in entry.get("system_types", []): + if system_type not in _VALID_SYSLOG_SYSTEM_TYPES: + result.issues.append( + Issue( + "ERROR", + "extra_syslog_messages.yaml", + ( + f'App "{app}" has invalid system_type "{system_type}" ' + f"(valid: {sorted(_VALID_SYSLOG_SYSTEM_TYPES)})" + ), + ) + ) for message in entry.get("messages", []): if not isinstance(message, str): continue diff --git a/src/evidenceforge/config/activity/extra_syslog_messages.yaml b/src/evidenceforge/config/activity/extra_syslog_messages.yaml index e6f7c922..0fca3861 100644 --- a/src/evidenceforge/config/activity/extra_syslog_messages.yaml +++ b/src/evidenceforge/config/activity/extra_syslog_messages.yaml @@ -5,9 +5,11 @@ # messages: list of message templates ({} placeholders filled at generation time) # distro: optional - "ubuntu" to restrict to Debian/Ubuntu (excluded on RHEL-like) # roles: optional - list of required host roles (any match = include) +# exclude_roles: optional - list of host roles to suppress +# system_types: optional - workstation/server/domain_controller scope # transient: optional - true if the process forks per invocation (random PID) # -# If distro/roles are omitted, the entry appears on all Linux hosts. +# If distro/roles/system_types are omitted, the entry appears on all Linux hosts. # # Depended on by: generation engine (syslog baseline diversity) # Depends on: nothing — standalone (distro/role filters are soft matches) @@ -16,6 +18,8 @@ programs: - app: NetworkManager + system_types: [workstation] + weight: 3 messages: - " [{}] dhcp4 (ens160): state changed renew -> bound" - " [{}] device (ens160): state change: disconnected -> prepare" @@ -23,6 +27,7 @@ programs: - " [{}] manager: NetworkManager state is now CONNECTED_GLOBAL" - app: dbus-daemon + weight: 2 messages: - "[system] Activating via systemd: service name='org.freedesktop.hostname1'" - "[system] Successfully activated service 'org.freedesktop.resolve1'" @@ -118,27 +123,57 @@ programs: - "bound to {ip} -- renewal in {renewal} seconds." 
- app: polkitd + weight: 2 + params: + auth_subject: + - unix-process + - unix-session + - system-bus-name messages: - - "Registered Authentication Agent for unix-process" - - "Unregistered Authentication Agent for unix-process" + - "Registered Authentication Agent for {auth_subject}" + - "Unregistered Authentication Agent for {auth_subject}" - "Operator of unix-process:{} successfully authenticated as 'root'" - app: multipathd + system_types: [server] + roles: [database] + weight: 1 + params: + device: + - sda + - sdb + - dm-0 + active_paths: + - "1" + - "2" messages: - - "sda: add missing path" - - "sda: remaining active paths: 1" + - "{device}: add missing path" + - "{device}: remaining active paths: {active_paths}" - app: accounts-daemon + system_types: [workstation] weight: 1 + params: + desktop_user: + - admin + - ubuntu + - deploy messages: - - "user 'admin' has logged in" + - "user '{desktop_user}' has logged in" - app: packagekitd + system_types: [workstation] + distro: ubuntu + weight: 1 messages: - "search-names transaction /{}" + - "resolve transaction /{}" + - "get-updates transaction /{}" - app: unattended-upgr distro: ubuntu + system_types: [server] + weight: 2 messages: - "Allowed origins are: o=Ubuntu,a=jammy" - "No packages found that can be upgraded unattended" @@ -146,15 +181,26 @@ programs: - app: systemd-resolved distro: ubuntu + weight: 2 messages: - "Using degraded feature set UDP instead of UDP+EDNS0 for DNS server {dns_server}." - "Grace period over, resuming full feature set for DNS server {dns_server}." - "Positive Trust Anchors: . IN DS 20326" - app: thermald + system_types: [workstation] + weight: 1 + params: + cooling_device: + - "0" + - "1" + zone: + - x86_pkg_temp + - acpitz messages: - "Unsupported cpu model, use default config" - - "cooling device 0 intel_powerclamp type: 0x02" + - "cooling device {cooling_device} intel_powerclamp type: 0x02" + - "thermal zone {zone}: trip point updated" - app: irqbalance weight: 1 diff --git a/src/evidenceforge/config/schemas.py b/src/evidenceforge/config/schemas.py index dc090a54..1bab08a6 100644 --- a/src/evidenceforge/config/schemas.py +++ b/src/evidenceforge/config/schemas.py @@ -163,6 +163,8 @@ class SyslogProgramEntry(BaseModel, extra="forbid"): params: dict[str, list[str]] | None = None distro: str | None = None roles: list[str] | None = None + exclude_roles: list[str] | None = None + system_types: list[str] | None = None transient: bool | None = None weight: int = Field(default=10, gt=0) diff --git a/src/evidenceforge/generation/activity/bash_commands.py b/src/evidenceforge/generation/activity/bash_commands.py index 8e9b5d8a..4847f392 100644 --- a/src/evidenceforge/generation/activity/bash_commands.py +++ b/src/evidenceforge/generation/activity/bash_commands.py @@ -319,7 +319,7 @@ def _choose_template_with_memory( key = (system_hostname.lower(), username.lower()) recent = set(_COMMAND_RECENCY.get(key, ())) - soft_cap = max(4, min(8, max(1, len(pool) // 4))) + soft_cap = max(3, min(6, max(1, len(pool) // 5))) attempts = _COMMAND_CANDIDATE_ATTEMPTS candidates: list[str] = [] for _ in range(attempts): @@ -331,7 +331,7 @@ def _choose_template_with_memory( return command for command in candidates: - if command not in recent: + if command not in recent and _COMMAND_GLOBAL_COUNTS[command] < soft_cap + 2: _remember_command(system_hostname, username, command) return command diff --git a/src/evidenceforge/generation/activity/extra_syslog.py b/src/evidenceforge/generation/activity/extra_syslog.py index 1d181d91..fa3acfd1 
100644 --- a/src/evidenceforge/generation/activity/extra_syslog.py +++ b/src/evidenceforge/generation/activity/extra_syslog.py @@ -44,6 +44,7 @@ def filter_syslog_messages( programs: list[dict[str, Any]], is_rhel_like: bool, host_roles: list[str] | None, + system_type: str | None = None, ) -> list[tuple[str, list[str], int]]: """Filter syslog programs by distro and host roles. @@ -51,13 +52,19 @@ def filter_syslog_messages( programs: Raw program entries from YAML. is_rhel_like: True for CentOS/RHEL/Rocky/Alma hosts. host_roles: List of roles assigned to the host, or None. + system_type: Scenario system type, if known. Returns: List of (app_name, messages, weight) tuples matching the host context. """ return [ (entry["app"], entry["messages"], int(entry.get("weight", 10))) - for entry in filter_syslog_message_entries(programs, is_rhel_like, host_roles) + for entry in filter_syslog_message_entries( + programs, + is_rhel_like, + host_roles, + system_type, + ) ] @@ -65,19 +72,36 @@ def filter_syslog_message_entries( programs: list[dict[str, Any]], is_rhel_like: bool, host_roles: list[str] | None, + system_type: str | None = None, ) -> list[dict[str, Any]]: """Filter syslog programs by distro and host roles, preserving entry metadata.""" result: list[dict[str, Any]] = [] + normalized_roles = {role.lower() for role in (host_roles or [])} + normalized_type = (system_type or "").lower() for entry in programs: # Distro filter distro = entry.get("distro") if distro == "ubuntu" and is_rhel_like: continue + # System type filter — workstation-only desktop daemons should not + # appear as high-volume server noise, and server-only daemons should + # not leak onto laptops. + allowed_types = entry.get("system_types") + if allowed_types and normalized_type not in {str(t).lower() for t in allowed_types}: + continue + # Role filter — if roles specified, host must have at least one required_roles = entry.get("roles") if required_roles: - if not host_roles or not any(r in host_roles for r in required_roles): + required = {str(role).lower() for role in required_roles} + if not normalized_roles or not normalized_roles.intersection(required): + continue + + excluded_roles = entry.get("exclude_roles") + if excluded_roles: + excluded = {str(role).lower() for role in excluded_roles} + if normalized_roles.intersection(excluded): continue result.append(entry) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index f90b10c1..47491254 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -2025,6 +2025,8 @@ def __init__( self._tls_ocsp_windows: dict[tuple[str, str, int], tuple[int, int]] = {} self._ntp_association_profiles: dict[tuple[str, str], dict[str, float | int]] = {} self._bash_history_next_time: dict[tuple[str, str], datetime] = {} + self._bash_history_command_counts: dict[tuple[str, str], int] = {} + self._bash_history_quick_streaks: dict[tuple[str, str], int] = {} self._foreground_process_finalizers: dict[ tuple[str, int], tuple[System, str, str, str, datetime] ] = {} @@ -8413,9 +8415,29 @@ def _schedule_bash_history_time( ) ) if dwell_seconds <= 2.0: - dwell_seconds += jitter_rng.uniform(0.4, 4.8) + command_count = self._bash_history_command_counts.get(key, 0) + quick_streak = self._bash_history_quick_streaks.get(key, 0) + roll = jitter_rng.random() + if command_count == 0: + extra_delay = jitter_rng.uniform(4.0, 18.0) + elif roll < 0.16 and quick_streak == 0: + extra_delay = 
jitter_rng.uniform(4.0, 12.0) + elif roll < 0.68: + extra_delay = jitter_rng.uniform(18.0, 95.0) + elif roll < 0.93: + extra_delay = jitter_rng.uniform(95.0, 420.0) + else: + extra_delay = jitter_rng.uniform(420.0, 1500.0) + dwell_seconds += extra_delay + self._bash_history_quick_streaks[key] = quick_streak + 1 if extra_delay < 14.0 else 0 + elif dwell_seconds < 45.0: + dwell_seconds = dwell_seconds * jitter_rng.uniform(1.0, 2.2) + jitter_rng.uniform( + 4.0, 18.0 + ) else: - dwell_seconds *= jitter_rng.uniform(0.85, 1.25) + dwell_seconds = max(dwell_seconds, dwell_seconds * jitter_rng.uniform(0.95, 1.35)) + self._bash_history_quick_streaks[key] = 0 + self._bash_history_command_counts[key] = self._bash_history_command_counts.get(key, 0) + 1 self._bash_history_next_time[key] = scheduled_time + timedelta(seconds=dwell_seconds) return scheduled_time @@ -8449,11 +8471,17 @@ def generate_bash_command_with_noise( for _ in range(n_noise): # Delay based on complexity of previous command if any(prev_cmd.startswith(p) for p in _COMPLEX_PREFIXES): - delay = rng.uniform(10.0, 60.0) + delay = rng.uniform(20.0, 120.0) elif any(prev_cmd.startswith(p) for p in _MEDIUM_PREFIXES): - delay = rng.uniform(3.0, 15.0) + delay = rng.uniform(8.0, 45.0) else: - delay = rng.uniform(1.0, 5.0) + delay = rng.choice( + [ + rng.uniform(4.0, 14.0), + rng.uniform(18.0, 90.0), + rng.uniform(90.0, 240.0), + ] + ) cumulative_delay += delay noise_time = time + timedelta(seconds=cumulative_delay) noise_cmd, is_typo = pick_bash_command_entry( diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index 92f133da..f7d84c54 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -5592,6 +5592,7 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 _all_programs, is_rhel_like, system.roles, + sys_type, ) if not filtered: continue diff --git a/tests/unit/test_bash_history_noise.py b/tests/unit/test_bash_history_noise.py index 1855a68c..e508c0c0 100644 --- a/tests/unit/test_bash_history_noise.py +++ b/tests/unit/test_bash_history_noise.py @@ -259,7 +259,7 @@ def test_bash_picker_suppresses_repeated_exact_commands(self, monkeypatch): ] counts = Counter(picked) - assert max(counts.values()) <= 8 + assert max(counts.values()) <= 6 class TestBashHistoryChronological: @@ -302,6 +302,31 @@ def test_simple_command_dwell_is_not_exact_two_second_cadence( assert deltas assert any(delta != 2.0 for delta in deltas) + assert any(delta > 10.0 for delta in deltas) + + def test_simple_command_dwell_avoids_mechanical_short_bursts( + self, state_manager, mock_emitters, linux_system, root_user + ): + ag = ActivityGenerator(state_manager, mock_emitters) + start = datetime(2024, 3, 18, 14, 0, 0, tzinfo=UTC) + + for offset in range(12): + ag.generate_bash_command( + root_user, + linux_system, + start + timedelta(seconds=offset * 2), + "ls", + ) + + events = [call.args[0] for call in mock_emitters["bash_history"].emit.call_args_list] + deltas = [ + (events[idx].timestamp - events[idx - 1].timestamp).total_seconds() + for idx in range(1, len(events)) + ] + + assert deltas + assert sum(delta <= 10.0 for delta in deltas) <= 3 + assert any(delta >= 60.0 for delta in deltas) def test_shred_remove_clears_rendered_history(self, tmp_path): """A destructive shred of .bash_history should erase prior collected entries.""" diff --git a/tests/unit/test_validate_config.py b/tests/unit/test_validate_config.py index 51b92f91..5aba0885 
100644 --- a/tests/unit/test_validate_config.py +++ b/tests/unit/test_validate_config.py @@ -990,6 +990,31 @@ def load_invalid_extra_syslog_messages(): for issue in result.issues ) + def test_validate_config_rejects_invalid_extra_syslog_system_type(self, monkeypatch): + from evidenceforge.generation.activity import extra_syslog + + def load_invalid_extra_syslog_messages(): + return [ + { + "app": "packagekitd", + "system_types": ["laptop"], + "messages": ["search-names transaction /12345"], + } + ] + + monkeypatch.setattr( + extra_syslog, "load_extra_syslog_messages", load_invalid_extra_syslog_messages + ) + + result = validate_config() + + assert any( + issue.severity == "ERROR" + and issue.file == "extra_syslog_messages.yaml" + and 'App "packagekitd" has invalid system_type "laptop"' in issue.message + for issue in result.issues + ) + def test_validate_config_rejects_networkmanager_same_state_transition(self, monkeypatch): from evidenceforge.generation.activity import extra_syslog @@ -1044,6 +1069,44 @@ def test_extra_syslog_sudo_templates_render_contextual_services(self): "deploy : TTY=pts/1 ; PWD=/srv/app ; USER=root ; COMMAND=/bin/systemctl status nginx" ) + def test_extra_syslog_filters_by_system_type_and_excluded_roles(self): + from evidenceforge.generation.activity.extra_syslog import filter_syslog_message_entries + + programs = [ + { + "app": "packagekitd", + "system_types": ["workstation"], + "messages": ["search-names transaction /{}"], + }, + { + "app": "multipathd", + "system_types": ["server"], + "roles": ["database"], + "messages": ["{device}: add missing path"], + }, + { + "app": "accounts-daemon", + "exclude_roles": ["database"], + "messages": ["user 'admin' has logged in"], + }, + ] + + db_server = filter_syslog_message_entries( + programs, + is_rhel_like=False, + host_roles=["database"], + system_type="server", + ) + workstation = filter_syslog_message_entries( + programs, + is_rhel_like=False, + host_roles=[], + system_type="workstation", + ) + + assert [entry["app"] for entry in db_server] == ["multipathd"] + assert [entry["app"] for entry in workstation] == ["packagekitd", "accounts-daemon"] + def test_validate_config_rejects_invalid_4672_emission_probability(self, monkeypatch): from evidenceforge.generation.activity import windows_auth_realism From f8c19f0c4267394f5ab31b036f71c490f0f3d27e Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 02:50:51 -0400 Subject: [PATCH 45/61] docs: record loop 22 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 8fda4bf9..46464b37 100644 --- a/TODO.md +++ b/TODO.md @@ -392,7 +392,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [ ] **IN PROGRESS** Current-dev assessment continuation loops 21-30 — continue the requested next 10 EvidenceForge realism loops from Loop 20, targeting the highest-leverage verified findings from each panel while keeping the draft PR open. - [x] Loop 21 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `c9a7b72`: repaired browser-like HTTP process/network ownership by resolving Windows browser User-Agents to active interactive browser processes when possible, clearing misleading service-process attribution when no user session exists, and scoping internal human-browser web visitors to workstation clients. 
Verification passed with focused regressions, related eCAR/proxy/web/baseline tests (`107 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3141 passed, 37 skipped`). Regenerated eval passed at exact `96.40/100` across `80,222` records; hard probes found zero browser-like Zeek HTTP rows joined to Windows 5156 or eCAR `svchost.exe`, while all `497` matched WFP browser rows were browser-owned and service/tool HTTP retained service ownership where appropriate. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `68`, Network `84`, Host/EDR `72` (average `75.5`). Top Loop 22 target is Linux endpoint realism: bash command cadence and fleet-wide repeated syslog daemon pools, followed by public DNS/X.509 metadata templates, Zeek multi-sensor timing texture, HTTP connection reuse, and SYSTEM Sysmon `LogonGuid` morphology. - [x] Loop 22 fix pass — repaired the multi-reviewer Linux endpoint realism finding at the generator/config layers: bash histories now use per-host/user session pacing with bounded quick-command streaks, command pools suppress high exact-repeat counts, and extra syslog daemon pools are scoped by system type/role with lower weights and expanded templates. Verification before regeneration passed with focused regressions (`46 passed`), related activity/config tests (`212 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). - - [ ] **IN PROGRESS** Loop 22 regeneration, hard probes, quantitative eval, and blind review. + - [x] Loop 22 regeneration, hard probes, quantitative eval, and blind review completed from commit `ecc45ef`: repaired Linux endpoint cadence fingerprints by adding session-aware bash pacing, tighter exact-command repeat suppression, and system-type/role-scoped extra syslog pools. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes found zero server-side desktop-daemon syslog rows, zero non-database `multipathd` rows, bash `<=10s` gaps at `47/726` (`6.47%`), median bash delta `85s`, and max per-file exact-command repeat `4`. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `66`, Network `84`, Host/EDR `42` (Host/EDR verdict Real at confidence `58`; average synthetic-confidence `67.5`). Top Loop 23 target is the hard Zeek multi-sensor timing fingerprint where mirrored DMZ records always trail core records by a tiny positive offset; next targets are public DNS/X.509 corpus templates, remaining Linux daemon-message repetition, DNS TXT tunnel vocabulary/cadence, and HTTP connection reuse. + - [ ] **IN PROGRESS** Loop 23 fix pass — repair the Zeek multi-sensor timing texture so mirrored records have sensor-specific clock offset/drift/capture/parser jitter and occasional realistic ordering inversions instead of an always-positive DMZ-after-core millisecond offset. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). 
Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 91546d70d8bdeaeb1ff7d98f814a5166f461c213 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 02:54:18 -0400 Subject: [PATCH 46/61] fix: vary zeek multi-sensor timing offsets --- TODO.md | 3 +- .../generation/emitters/zeek_base.py | 48 ++++++++++++++----- tests/unit/test_zeek_multiplex.py | 2 + 3 files changed, 40 insertions(+), 13 deletions(-) diff --git a/TODO.md b/TODO.md index 46464b37..69834bf4 100644 --- a/TODO.md +++ b/TODO.md @@ -393,7 +393,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 21 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `c9a7b72`: repaired browser-like HTTP process/network ownership by resolving Windows browser User-Agents to active interactive browser processes when possible, clearing misleading service-process attribution when no user session exists, and scoping internal human-browser web visitors to workstation clients. Verification passed with focused regressions, related eCAR/proxy/web/baseline tests (`107 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3141 passed, 37 skipped`). Regenerated eval passed at exact `96.40/100` across `80,222` records; hard probes found zero browser-like Zeek HTTP rows joined to Windows 5156 or eCAR `svchost.exe`, while all `497` matched WFP browser rows were browser-owned and service/tool HTTP retained service ownership where appropriate. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `68`, Network `84`, Host/EDR `72` (average `75.5`). Top Loop 22 target is Linux endpoint realism: bash command cadence and fleet-wide repeated syslog daemon pools, followed by public DNS/X.509 metadata templates, Zeek multi-sensor timing texture, HTTP connection reuse, and SYSTEM Sysmon `LogonGuid` morphology. - [x] Loop 22 fix pass — repaired the multi-reviewer Linux endpoint realism finding at the generator/config layers: bash histories now use per-host/user session pacing with bounded quick-command streaks, command pools suppress high exact-repeat counts, and extra syslog daemon pools are scoped by system type/role with lower weights and expanded templates. Verification before regeneration passed with focused regressions (`46 passed`), related activity/config tests (`212 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). - [x] Loop 22 regeneration, hard probes, quantitative eval, and blind review completed from commit `ecc45ef`: repaired Linux endpoint cadence fingerprints by adding session-aware bash pacing, tighter exact-command repeat suppression, and system-type/role-scoped extra syslog pools. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes found zero server-side desktop-daemon syslog rows, zero non-database `multipathd` rows, bash `<=10s` gaps at `47/726` (`6.47%`), median bash delta `85s`, and max per-file exact-command repeat `4`. 
Blind synthetic-confidence scores were Threat Hunter `78`, Detection `66`, Network `84`, Host/EDR `42` (Host/EDR verdict Real at confidence `58`; average synthetic-confidence `67.5`). Top Loop 23 target is the hard Zeek multi-sensor timing fingerprint where mirrored DMZ records always trail core records by a tiny positive offset; next targets are public DNS/X.509 corpus templates, remaining Linux daemon-message repetition, DNS TXT tunnel vocabulary/cadence, and HTTP connection reuse. - - [ ] **IN PROGRESS** Loop 23 fix pass — repair the Zeek multi-sensor timing texture so mirrored records have sensor-specific clock offset/drift/capture/parser jitter and occasional realistic ordering inversions instead of an always-positive DMZ-after-core millisecond offset. + - [x] Loop 23 fix pass — repaired the Zeek multi-sensor timing texture at the shared Zeek multiplex layer by applying independent clock skew/drift plus per-flow capture delay to every observing sensor, not only secondary sensors, so mirrored rows can have positive or negative cross-sensor deltas while remaining within a well-synced low-millisecond envelope. Verification before regeneration passed with focused timing tests (`3 passed`), related Zeek fanout/multiplex/HTTP/SSL/files/activity tests (`116 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). + - [ ] **IN PROGRESS** Loop 23 regeneration, hard probes, quantitative eval, and blind review. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/src/evidenceforge/generation/emitters/zeek_base.py b/src/evidenceforge/generation/emitters/zeek_base.py index c8cd7354..ccd5c14d 100644 --- a/src/evidenceforge/generation/emitters/zeek_base.py +++ b/src/evidenceforge/generation/emitters/zeek_base.py @@ -106,6 +106,28 @@ def _sensor_clock_skew_us(hostname: str) -> int: return timing.clock_skew_min_us + (seed % max(1, width)) +def _sensor_clock_drift_us(hostname: str, ts: Any) -> int: + """Return small time-bucketed clock drift for a sensor timestamp.""" + if isinstance(ts, datetime): + epoch_seconds = int(ts.timestamp()) + elif isinstance(ts, (int, float)): + epoch_seconds = int(ts) + else: + epoch_seconds = 0 + # Drift moves slowly, not per packet. Fifteen-minute buckets are enough to + # avoid a perfectly fixed offset while keeping well-synced sensors close. 
+ bucket = epoch_seconds // 900 + seed = _stable_seed(f"zeek_sensor_clock_drift:{hostname}:{bucket}") + return (seed % 401) - 200 + + +def _sensor_clock_adjustment_us(hostname: str, ts: Any) -> int: + """Return stable skew plus bounded drift within the configured skew window.""" + timing = network_sensor_observation_timing() + skew = _sensor_clock_skew_us(hostname) + _sensor_clock_drift_us(hostname, ts) + return max(timing.clock_skew_min_us, min(timing.clock_skew_max_us, skew)) + + def _sensor_path_delay_us(hostname: str, original_uid: Any) -> int: """Return per-flow capture timestamp variance for a sensor observation.""" timing = network_sensor_observation_timing() @@ -590,19 +612,21 @@ def _dispatch(self, event_data: dict[str, Any]) -> None: original_dst_ip, swaps["dst_ip"], ) - # Sensors have stable clock skew plus per-flow capture timing - # variance. Keep the offset shared across Zeek log families for - # a flow, but avoid a fixed cross-sensor clone delay. + # Each sensor has independent clock skew/drift plus per-flow + # capture timing. Apply it to every sensor in a multi-sensor + # observation so cross-sensor deltas can be positive or + # negative instead of always "secondary = primary + delay". + ts = render_data.get("ts") + if len(targets) > 1 and ts is not None: + sensor_delay_us = _sensor_clock_adjustment_us( + hostname, + ts, + ) + _sensor_path_delay_us(hostname, original_uid) + if isinstance(ts, datetime): + render_data["ts"] = ts + timedelta(microseconds=sensor_delay_us) + elif isinstance(ts, (int, float)): + render_data["ts"] = ts + sensor_delay_us / 1_000_000 if i > 0: - sensor_delay_us = _sensor_clock_skew_us(hostname) + _sensor_path_delay_us( - hostname, original_uid - ) - ts = render_data.get("ts") - if ts is not None: - if isinstance(ts, datetime): - render_data["ts"] = ts + timedelta(microseconds=sensor_delay_us) - elif isinstance(ts, (int, float)): - render_data["ts"] = ts + sensor_delay_us / 1_000_000 if render_data.get( "_allow_sensor_observation_variance" ) and not _locks_sensor_packet_accounting(render_data): diff --git a/tests/unit/test_zeek_multiplex.py b/tests/unit/test_zeek_multiplex.py index 6fb3dce6..fd2c022e 100644 --- a/tests/unit/test_zeek_multiplex.py +++ b/tests/unit/test_zeek_multiplex.py @@ -238,6 +238,8 @@ def test_sensor_timestamp_offsets_vary_by_flow(self): for port in sorted(core_by_port) ] + assert any(offset < 0 for offset in offsets) + assert any(offset > 0 for offset in offsets) assert max(offsets) - min(offsets) > 0.0005 assert len(set(offsets)) > 30 assert max(abs(offset) for offset in offsets) <= 0.005 From a097f3029da3fb45e6208b1a2bb0aa68258691cc Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 03:06:55 -0400 Subject: [PATCH 47/61] docs: record loop 23 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 69834bf4..05296b5f 100644 --- a/TODO.md +++ b/TODO.md @@ -394,7 +394,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 22 fix pass — repaired the multi-reviewer Linux endpoint realism finding at the generator/config layers: bash histories now use per-host/user session pacing with bounded quick-command streaks, command pools suppress high exact-repeat counts, and extra syslog daemon pools are scoped by system type/role with lower weights and expanded templates. 
Verification before regeneration passed with focused regressions (`46 passed`), related activity/config tests (`212 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). - [x] Loop 22 regeneration, hard probes, quantitative eval, and blind review completed from commit `ecc45ef`: repaired Linux endpoint cadence fingerprints by adding session-aware bash pacing, tighter exact-command repeat suppression, and system-type/role-scoped extra syslog pools. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes found zero server-side desktop-daemon syslog rows, zero non-database `multipathd` rows, bash `<=10s` gaps at `47/726` (`6.47%`), median bash delta `85s`, and max per-file exact-command repeat `4`. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `66`, Network `84`, Host/EDR `42` (Host/EDR verdict Real at confidence `58`; average synthetic-confidence `67.5`). Top Loop 23 target is the hard Zeek multi-sensor timing fingerprint where mirrored DMZ records always trail core records by a tiny positive offset; next targets are public DNS/X.509 corpus templates, remaining Linux daemon-message repetition, DNS TXT tunnel vocabulary/cadence, and HTTP connection reuse. - [x] Loop 23 fix pass — repaired the Zeek multi-sensor timing texture at the shared Zeek multiplex layer by applying independent clock skew/drift plus per-flow capture delay to every observing sensor, not only secondary sensors, so mirrored rows can have positive or negative cross-sensor deltas while remaining within a well-synced low-millisecond envelope. Verification before regeneration passed with focused timing tests (`3 passed`), related Zeek fanout/multiplex/HTTP/SSL/files/activity tests (`116 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). - - [ ] **IN PROGRESS** Loop 23 regeneration, hard probes, quantitative eval, and blind review. + - [x] Loop 23 regeneration, hard probes, quantitative eval, and blind review completed from commit `91546d7`: repaired the hard one-way Zeek multi-sensor timing fingerprint. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes matched `2,782` core/DMZ Zeek rows across conn/http/dns/ssl, confirmed every checked format now has both positive and negative offsets, and found zero always-positive formats. Blind synthetic-confidence scores were Threat Hunter `74`, Detection `66`, Network `84`, Host/EDR `62` (average `71.5`). Top Loop 24 target is public DNS/X.509 corpus realism because the network specialist continues to flag templated NS/MX/SOA answers, exact host-plus-wildcard SANs, and clustered certificate validity periods at high confidence; next targets are remaining Linux daemon-message repetition, DNS TXT tunnel grammar/cadence, HTTP connection reuse, and Sysmon SYSTEM `LogonGuid` morphology. + - [ ] **IN PROGRESS** Loop 24 fix pass — repair public DNS and X.509 corpus realism by replacing provider-agnostic NS/MX/SOA templates and exact host-plus-wildcard certificate SAN/validity patterns with data-driven provider/domain-class records and less uniform certificate profiles. 
- [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 999a20e99ee8053550a56171eb5100e97d3a0325 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 03:18:39 -0400 Subject: [PATCH 48/61] fix: diversify public dns and certificate profiles --- TODO.md | 3 +- src/evidenceforge/cli/validate_config.py | 12 ++ .../config/activity/public_dns_profiles.yaml | 148 ++++++++++++++++++ .../config/activity/tls_issuers.yaml | 22 +-- .../config/activity/tls_realism.yaml | 8 + src/evidenceforge/config/schemas.py | 78 +++++++++ .../generation/activity/generator.py | 147 +++++++++++++++-- .../activity/public_dns_profiles.py | 50 ++++++ tests/unit/test_dhcp_and_certs.py | 27 +++- tests/unit/test_phase5_network_diversity.py | 3 +- tests/unit/test_public_dns_profiles.py | 37 +++++ tests/unit/test_validate_config.py | 36 +++++ 12 files changed, 534 insertions(+), 37 deletions(-) create mode 100644 src/evidenceforge/config/activity/public_dns_profiles.yaml create mode 100644 src/evidenceforge/generation/activity/public_dns_profiles.py create mode 100644 tests/unit/test_public_dns_profiles.py diff --git a/TODO.md b/TODO.md index 05296b5f..9f44fcad 100644 --- a/TODO.md +++ b/TODO.md @@ -395,7 +395,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 22 regeneration, hard probes, quantitative eval, and blind review completed from commit `ecc45ef`: repaired Linux endpoint cadence fingerprints by adding session-aware bash pacing, tighter exact-command repeat suppression, and system-type/role-scoped extra syslog pools. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes found zero server-side desktop-daemon syslog rows, zero non-database `multipathd` rows, bash `<=10s` gaps at `47/726` (`6.47%`), median bash delta `85s`, and max per-file exact-command repeat `4`. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `66`, Network `84`, Host/EDR `42` (Host/EDR verdict Real at confidence `58`; average synthetic-confidence `67.5`). Top Loop 23 target is the hard Zeek multi-sensor timing fingerprint where mirrored DMZ records always trail core records by a tiny positive offset; next targets are public DNS/X.509 corpus templates, remaining Linux daemon-message repetition, DNS TXT tunnel vocabulary/cadence, and HTTP connection reuse. - [x] Loop 23 fix pass — repaired the Zeek multi-sensor timing texture at the shared Zeek multiplex layer by applying independent clock skew/drift plus per-flow capture delay to every observing sensor, not only secondary sensors, so mirrored rows can have positive or negative cross-sensor deltas while remaining within a well-synced low-millisecond envelope. 
Verification before regeneration passed with focused timing tests (`3 passed`), related Zeek fanout/multiplex/HTTP/SSL/files/activity tests (`116 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). - [x] Loop 23 regeneration, hard probes, quantitative eval, and blind review completed from commit `91546d7`: repaired the hard one-way Zeek multi-sensor timing fingerprint. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes matched `2,782` core/DMZ Zeek rows across conn/http/dns/ssl, confirmed every checked format now has both positive and negative offsets, and found zero always-positive formats. Blind synthetic-confidence scores were Threat Hunter `74`, Detection `66`, Network `84`, Host/EDR `62` (average `71.5`). Top Loop 24 target is public DNS/X.509 corpus realism because the network specialist continues to flag templated NS/MX/SOA answers, exact host-plus-wildcard SANs, and clustered certificate validity periods at high confidence; next targets are remaining Linux daemon-message repetition, DNS TXT tunnel grammar/cadence, HTTP connection reuse, and Sysmon SYSTEM `LogonGuid` morphology. - - [ ] **IN PROGRESS** Loop 24 fix pass — repair public DNS and X.509 corpus realism by replacing provider-agnostic NS/MX/SOA templates and exact host-plus-wildcard certificate SAN/validity patterns with data-driven provider/domain-class records and less uniform certificate profiles. + - [x] Loop 24 fix pass — repaired public DNS and X.509 corpus realism by replacing provider-agnostic NS/MX/SOA templates with data-driven provider/domain-class NS, MX, and SOA answer profiles; making public certificate SANs weighted instead of exact host-plus-wildcard by default; and widening/reweighting short-lived public CA validity profiles. Verification passed with focused DNS/TLS/config tests, `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3148 passed, 37 skipped`). + - [ ] **IN PROGRESS** Loop 24 regeneration, hard probes, quantitative eval, and blind review. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. 
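The Loop 24 entries above describe replacing provider-agnostic NS/MX/SOA templates with data-driven, suffix-matched provider profiles plus weighted fallback selection. As a rough illustration of that selection idea only (not part of the patch), here is a minimal, self-contained sketch; `PROFILES`, `_stable_seed`, and `pick_ns_answers` are hypothetical stand-ins for the configuration and helpers the diff below actually adds (`public_dns_profiles.yaml`, `_public_dns_profile`, `_public_dns_ns_answers`).

```python
import hashlib
import random

# Hypothetical, trimmed-down profile data for illustration; the real lists live
# in public_dns_profiles.yaml and are much larger.
PROFILES = [
    {
        "name": "cloudflare",
        "weight": 20,
        "match_suffixes": ["cloudflare.com"],
        "answer_sets": [["abby.ns.cloudflare.com", "ray.ns.cloudflare.com"]],
    },
    {
        "name": "registrar_dns",
        "weight": 17,
        "match_suffixes": [],
        "answer_sets": [["dns1.registrar-servers.com", "dns2.registrar-servers.com"]],
    },
    {
        "name": "domain_owned",
        "weight": 4,
        "match_suffixes": [],
        "answer_sets": [["ns1.{domain}", "ns2.{domain}"]],
    },
]


def _stable_seed(token: str) -> int:
    # Deterministic seed so the same domain maps to the same provider on every run.
    return int(hashlib.sha256(token.encode()).hexdigest(), 16)


def pick_ns_answers(domain: str) -> list[str]:
    domain = domain.lower().rstrip(".")
    # Suffix matches win so well-known domains keep their real-world nameservers.
    chosen = next(
        (
            p
            for p in PROFILES
            if any(domain == s or domain.endswith("." + s) for s in p["match_suffixes"])
        ),
        None,
    )
    if chosen is None:
        # Otherwise draw a provider by weight, seeded per domain.
        weighted = [p for p in PROFILES if p["weight"] > 0]
        rng = random.Random(_stable_seed(f"ns_profile:{domain}"))
        chosen = rng.choices(weighted, weights=[p["weight"] for p in weighted], k=1)[0]
    rng = random.Random(_stable_seed(f"ns_answers:{domain}:{chosen['name']}"))
    answers = rng.choice(chosen["answer_sets"])
    return [a.format(domain=domain) for a in answers]


print(pick_ns_answers("cloudflare.com"))  # suffix match keeps real Cloudflare NS names
print(pick_ns_answers("example.org"))     # weighted but deterministic provider choice
```

The per-domain seeding keeps regenerated datasets stable across runs while still spreading unmatched domains across providers in proportion to their weights, which is the behavior the regression tests added in this patch check for.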
diff --git a/src/evidenceforge/cli/validate_config.py b/src/evidenceforge/cli/validate_config.py index 59e70a40..30d4eecb 100644 --- a/src/evidenceforge/cli/validate_config.py +++ b/src/evidenceforge/cli/validate_config.py @@ -197,6 +197,9 @@ def validate_config() -> ValidationResult: "activity/tls_realism.yaml": { "dict_fields": {"san", "serial_numbers", "ocsp", "certificate_chains", "destinations"}, }, + "activity/public_dns_profiles.yaml": { + "list_fields": {"nameserver_profiles": "name", "mail_profiles": "name"}, + }, "activity/smb_file_transfers.yaml": { "list_fields": {"mime_types": None, "analyzer_sets": None}, }, @@ -470,6 +473,7 @@ def validate_config() -> ValidationResult: from evidenceforge.generation.activity.process_network import load_process_network_map from evidenceforge.generation.activity.proxy_uri import load_proxy_uri_templates from evidenceforge.generation.activity.proxy_user_agents import load_proxy_user_agents + from evidenceforge.generation.activity.public_dns_profiles import load_public_dns_profiles from evidenceforge.generation.activity.site_maps import load_site_maps from evidenceforge.generation.activity.spawn_rules import load_spawn_rules from evidenceforge.generation.activity.system_processes import load_system_processes @@ -480,6 +484,7 @@ def validate_config() -> ValidationResult: from evidenceforge.generation.activity.windows_auth_realism import load_windows_auth_realism dns_data = load_dns_registry() + public_dns_profiles_data = load_public_dns_profiles() ids_data = load_ids_signatures() catalog_data = load_catalog() traffic_data = load_traffic_profiles() @@ -1719,6 +1724,7 @@ def _record_ids_rule_identity( ProcessAccessPatternEntry, ProcessNetworkEntry, ProxyUserAgentOverrideEntry, + PublicDnsProfilesConfig, PublicNtpServerEntry, RemoteThreadStartLocationEntry, ScheduledTaskEntry, @@ -1892,6 +1898,12 @@ def _record_ids_rule_identity( if tls_realism_data: _SCHEMA_CHECKS.append(([tls_realism_data], TlsRealismConfig, "tls_realism.yaml")) + # public_dns_profiles.yaml + if public_dns_profiles_data: + _SCHEMA_CHECKS.append( + ([public_dns_profiles_data], PublicDnsProfilesConfig, "public_dns_profiles.yaml") + ) + # kerberos_realism.yaml from evidenceforge.generation.activity.kerberos_realism import load_kerberos_realism diff --git a/src/evidenceforge/config/activity/public_dns_profiles.yaml b/src/evidenceforge/config/activity/public_dns_profiles.yaml new file mode 100644 index 00000000..34e8695b --- /dev/null +++ b/src/evidenceforge/config/activity/public_dns_profiles.yaml @@ -0,0 +1,148 @@ +# public_dns_profiles.yaml - provider-style public DNS answer profiles. +# +# Purpose: Keeps public NS/MX/SOA companion answers from collapsing into +# generated-looking ns1/ns2, mail, and hostmaster templates. +# +# User customizations go in: +# .eforge/config/activity/public_dns_profiles.yaml +# +# Overlay behavior: profile lists merge by name. 
+ +nameserver_profiles: + - name: google + weight: 0 + match_suffixes: + - google.com + - googleapis.com + - gstatic.com + - pki.goog + - youtube.com + answer_sets: + - ["ns1.google.com", "ns2.google.com", "ns3.google.com", "ns4.google.com"] + soa_rnames: ["dns-admin.google.com"] + + - name: microsoft_azure + weight: 0 + match_suffixes: + - azure.com + - bing.com + - live.com + - microsoft.com + - microsoftonline.com + - msn.com + - office.com + - office.net + - office365.com + - windows.net + - windowsupdate.com + answer_sets: + - ["ns1-39.azure-dns.com", "ns2-39.azure-dns.net", "ns3-39.azure-dns.org", "ns4-39.azure-dns.info"] + - ["ns1-205.azure-dns.com", "ns2-205.azure-dns.net", "ns3-205.azure-dns.org", "ns4-205.azure-dns.info"] + soa_rnames: ["azuredns-hostmaster.microsoft.com"] + + - name: amazon_route53 + weight: 22 + match_suffixes: + - amazon.com + - amazonaws.com + - awsstatic.com + - cloudfront.net + answer_sets: + - ["ns-119.awsdns-14.org", "ns-421.awsdns-52.com", "ns-918.awsdns-50.net", "ns-1796.awsdns-32.co.uk"] + - ["ns-52.awsdns-06.com", "ns-823.awsdns-38.net", "ns-1374.awsdns-43.org", "ns-1641.awsdns-13.co.uk"] + - ["ns-279.awsdns-34.com", "ns-1010.awsdns-62.net", "ns-1178.awsdns-19.org", "ns-1884.awsdns-43.co.uk"] + soa_rnames: ["awsdns-hostmaster.amazon.com"] + + - name: cloudflare + weight: 20 + match_suffixes: + - cloudflare.com + answer_sets: + - ["abby.ns.cloudflare.com", "ray.ns.cloudflare.com"] + - ["amir.ns.cloudflare.com", "daisy.ns.cloudflare.com"] + - ["elsa.ns.cloudflare.com", "hank.ns.cloudflare.com"] + - ["jill.ns.cloudflare.com", "norm.ns.cloudflare.com"] + soa_rnames: ["dns.cloudflare.com"] + + - name: akamai + weight: 11 + answer_sets: + - ["a1-66.akam.net", "a11-67.akam.net", "a16-64.akam.net", "a18-65.akam.net", "a26-66.akam.net"] + - ["use1.akam.net", "use2.akam.net", "usw2.akam.net", "eur2.akam.net", "asia1.akam.net"] + soa_rnames: ["hostmaster.akamai.com"] + + - name: fastly + weight: 9 + answer_sets: + - ["ns1.fastly.net", "ns2.fastly.net", "ns3.fastly.net", "ns4.fastly.net"] + - ["ns1.p04.dynect.net", "ns2.p04.dynect.net", "ns3.p04.dynect.net", "ns4.p04.dynect.net"] + soa_rnames: ["hostmaster.fastly.com"] + + - name: registrar_dns + weight: 17 + answer_sets: + - ["dns1.registrar-servers.com", "dns2.registrar-servers.com"] + - ["ns1.dnsimple.com", "ns2.dnsimple-edge.net", "ns3.dnsimple.com", "ns4.dnsimple-edge.org"] + - ["ns1.hover.com", "ns2.hover.com"] + - ["ns1.name-services.com", "ns2.name-services.com", "ns3.name-services.com", "ns4.name-services.com"] + soa_rnames: ["hostmaster.registrar-servers.com", "admin.dnsimple.com"] + + - name: domain_owned + weight: 4 + answer_sets: + - ["ns1.{domain}", "ns2.{domain}"] + - ["dns1.{domain}", "dns2.{domain}"] + soa_rnames: ["dns-admin.{domain}", "hostmaster.{domain}"] + +mail_profiles: + - name: google_workspace + weight: 24 + match_suffixes: + - google.com + - gmail.com + answer_sets: + - ["1 aspmx.l.google.com", "5 alt1.aspmx.l.google.com", "5 alt2.aspmx.l.google.com", "10 alt3.aspmx.l.google.com", "10 alt4.aspmx.l.google.com"] + + - name: microsoft_365 + weight: 23 + match_suffixes: + - microsoft.com + - office.com + - office365.com + - microsoftonline.com + answer_sets: + - ["0 {domain_hyphen}.mail.protection.outlook.com"] + + - name: proofpoint + weight: 14 + answer_sets: + - ["5 mxa-001b2d01.gslb.pphosted.com", "5 mxb-001b2d01.gslb.pphosted.com"] + - ["10 mxa-0023b701.gslb.pphosted.com", "10 mxb-0023b701.gslb.pphosted.com"] + + - name: mimecast + weight: 11 + answer_sets: + - ["10 
us-smtp-inbound-1.mimecast.com", "10 us-smtp-inbound-2.mimecast.com"] + - ["10 eu-smtp-inbound-1.mimecast.com", "10 eu-smtp-inbound-2.mimecast.com"] + + - name: amazon_ses + weight: 8 + answer_sets: + - ["10 inbound-smtp.us-east-1.amazonaws.com"] + - ["10 feedback-smtp.us-west-2.amazonses.com"] + + - name: fastmail + weight: 7 + answer_sets: + - ["10 in1-smtp.messagingengine.com", "20 in2-smtp.messagingengine.com"] + + - name: null_mx + weight: 4 + answer_sets: + - ["0 ."] + + - name: domain_owned + weight: 5 + answer_sets: + - ["10 mx1.{domain}", "20 mx2.{domain}"] + - ["10 mailhost.{domain}"] diff --git a/src/evidenceforge/config/activity/tls_issuers.yaml b/src/evidenceforge/config/activity/tls_issuers.yaml index b33388e7..d7a55179 100644 --- a/src/evidenceforge/config/activity/tls_issuers.yaml +++ b/src/evidenceforge/config/activity/tls_issuers.yaml @@ -13,23 +13,23 @@ issuers: - name: "CN=R3, O=Let's Encrypt, C=US" - weight: 30 - validity_days_min: 89 + weight: 20 + validity_days_min: 84 validity_days_max: 90 not_before_max_days: 60 key_types: - {type: "rsa", length: 2048, weight: 100} - name: "CN=E1, O=Let's Encrypt, C=US" - weight: 10 - validity_days_min: 89 + weight: 6 + validity_days_min: 84 validity_days_max: 90 not_before_max_days: 60 key_types: - {type: "ecdsa", length: 256, weight: 100} - name: "CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1, O=DigiCert Inc, C=US" - weight: 15 + weight: 18 validity_days_min: 365 validity_days_max: 397 not_before_max_days: 300 @@ -38,7 +38,7 @@ issuers: - {type: "rsa", length: 4096, weight: 50} - name: "CN=Sectigo RSA Domain Validation Secure Server CA, O=Sectigo Limited, L=Salford, ST=Greater Manchester, C=GB" - weight: 10 + weight: 13 validity_days_min: 90 validity_days_max: 397 not_before_max_days: 300 @@ -47,16 +47,16 @@ issuers: - {type: "rsa", length: 4096, weight: 30} - name: "CN=GTS CA 1C3, O=Google Trust Services LLC, C=US" - weight: 15 - validity_days_min: 89 - validity_days_max: 90 + weight: 12 + validity_days_min: 85 + validity_days_max: 92 not_before_max_days: 60 key_types: - {type: "ecdsa", length: 256, weight: 80} - {type: "rsa", length: 2048, weight: 20} - name: "CN=Amazon RSA 2048 M01, O=Amazon, C=US" - weight: 10 + weight: 14 validity_days_min: 365 validity_days_max: 397 not_before_max_days: 300 @@ -64,7 +64,7 @@ issuers: - {type: "rsa", length: 2048, weight: 100} - name: "CN=GlobalSign Atlas R3 DV TLS CA 2024 Q1, O=GlobalSign nv-sa, C=BE" - weight: 10 + weight: 13 validity_days_min: 90 validity_days_max: 397 not_before_max_days: 300 diff --git a/src/evidenceforge/config/activity/tls_realism.yaml b/src/evidenceforge/config/activity/tls_realism.yaml index 31698307..cc75ab71 100644 --- a/src/evidenceforge/config/activity/tls_realism.yaml +++ b/src/evidenceforge/config/activity/tls_realism.yaml @@ -19,6 +19,14 @@ san: - net.au - org.au - org.uk + profile_weights: + apex_exact: 34 + apex_www: 26 + apex_wildcard: 14 + subdomain_exact: 34 + subdomain_parent: 18 + subdomain_wildcard: 16 + subdomain_sibling: 12 serial_numbers: byte_lengths: diff --git a/src/evidenceforge/config/schemas.py b/src/evidenceforge/config/schemas.py index 1bab08a6..250350bd 100644 --- a/src/evidenceforge/config/schemas.py +++ b/src/evidenceforge/config/schemas.py @@ -41,6 +41,62 @@ def tags_non_empty(cls, v: list[str]) -> list[str]: return v +class PublicDnsAnswerProfile(BaseModel, extra="forbid"): + """A public DNS provider-style answer profile.""" + + name: str + weight: int + match_suffixes: list[str] = Field(default_factory=list) + answer_sets: 
list[list[str]] + soa_rnames: list[str] = Field(default_factory=list) + + @field_validator("weight") + @classmethod + def weight_non_negative(cls, v: int) -> int: + if v < 0: + raise ValueError("weight must be non-negative") + return v + + @field_validator("match_suffixes", "soa_rnames") + @classmethod + def optional_strings_non_empty(cls, v: list[str], info) -> list[str]: + if any(not item for item in v): + raise ValueError(f"{info.field_name} entries must be non-empty") + return v + + @field_validator("answer_sets") + @classmethod + def answer_sets_non_empty(cls, v: list[list[str]]) -> list[list[str]]: + if not v: + raise ValueError("answer_sets must not be empty") + for answer_set in v: + if not answer_set: + raise ValueError("answer_sets entries must not be empty") + if any(not answer for answer in answer_set): + raise ValueError("answer strings must be non-empty") + return v + + +class PublicDnsProfilesConfig(BaseModel, extra="forbid"): + """Root schema for public_dns_profiles.yaml.""" + + nameserver_profiles: list[PublicDnsAnswerProfile] + mail_profiles: list[PublicDnsAnswerProfile] + + @field_validator("nameserver_profiles", "mail_profiles") + @classmethod + def profiles_non_empty( + cls, + v: list[PublicDnsAnswerProfile], + info, + ) -> list[PublicDnsAnswerProfile]: + if not v: + raise ValueError(f"{info.field_name} must not be empty") + if sum(profile.weight for profile in v) <= 0: + raise ValueError(f"{info.field_name} must include at least one positive weight") + return v + + # --- Application Catalog --- @@ -204,6 +260,28 @@ class TlsSanConfig(BaseModel, extra="forbid"): """SAN generation settings in tls_realism.yaml.""" multi_label_public_suffixes: list[str] + profile_weights: dict[str, int] = Field(default_factory=dict) + _VALID_PROFILE_KEYS: ClassVar[set[str]] = { + "apex_exact", + "apex_www", + "apex_wildcard", + "subdomain_exact", + "subdomain_parent", + "subdomain_wildcard", + "subdomain_sibling", + } + + @field_validator("profile_weights") + @classmethod + def profile_weights_valid(cls, v: dict[str, int]) -> dict[str, int]: + unknown = set(v) - cls._VALID_PROFILE_KEYS + if unknown: + raise ValueError(f"unknown SAN profile weights: {sorted(unknown)}") + if any(weight < 0 for weight in v.values()): + raise ValueError("SAN profile weights must be non-negative") + if v and sum(v.values()) <= 0: + raise ValueError("SAN profile weights must have a positive total") + return v class TlsSerialLength(BaseModel, extra="forbid"): diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 47491254..9a800bc8 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -1454,12 +1454,82 @@ def _jitter_default_connection_duration( def _dns_registrable_domain(hostname: str) -> str: """Return a practical DNS owner name for mail/TXT companion lookups.""" - parts = [part for part in hostname.rstrip(".").split(".") if part] + from evidenceforge.generation.activity.tls_realism import multi_label_public_suffixes + + parts = [part.lower() for part in hostname.rstrip(".").split(".") if part] if len(parts) <= 2: - return hostname.rstrip(".") + return ".".join(parts) + lowered = ".".join(parts) + for suffix in multi_label_public_suffixes(): + suffix_parts = suffix.split(".") + if lowered.endswith(f".{suffix}") and len(parts) > len(suffix_parts): + return ".".join(parts[-(len(suffix_parts) + 1) :]) return ".".join(parts[-2:]) +def _public_dns_profile(kind: str, domain: str) 
-> dict[str, Any]: + """Return a stable provider-style public DNS profile for a domain.""" + from evidenceforge.generation.activity.public_dns_profiles import load_public_dns_profiles + + profiles = load_public_dns_profiles().get(kind, []) + lowered = domain.lower().rstrip(".") + for profile in profiles: + suffixes = [str(suffix).lower().rstrip(".") for suffix in profile.get("match_suffixes", [])] + if any(lowered == suffix or lowered.endswith(f".{suffix}") for suffix in suffixes): + return profile + + weighted = [profile for profile in profiles if int(profile.get("weight", 0)) > 0] + if not weighted: + return {} + rng = random.Random(_stable_seed(f"public_dns_profile:{kind}:{lowered}")) + weights = [int(profile.get("weight", 0)) for profile in weighted] + return rng.choices(weighted, weights=weights, k=1)[0] + + +def _render_public_dns_answer(template: str, domain: str) -> str: + """Render a public DNS answer template using source-owned domain tokens.""" + return template.format( + domain=domain, + domain_hyphen=domain.replace(".", "-"), + ) + + +def _public_dns_answer_set(kind: str, domain: str) -> list[str]: + """Return stable provider-style answers for a public DNS record family.""" + profile = _public_dns_profile(kind, domain) + answer_sets = profile.get("answer_sets", []) + if not answer_sets: + return [] + rng = random.Random(_stable_seed(f"public_dns_answers:{kind}:{domain}:{profile.get('name')}")) + answers = rng.choice(answer_sets) + return [_render_public_dns_answer(str(answer), domain) for answer in answers] + + +def _public_dns_ns_answers(domain: str) -> list[str]: + """Return realistic public NS answers for a domain.""" + answers = _public_dns_answer_set("nameserver_profiles", domain) + return answers or [f"ns1.{domain}", f"ns2.{domain}"] + + +def _public_dns_mx_answers(domain: str) -> list[str]: + """Return realistic public MX answers for a domain.""" + answers = _public_dns_answer_set("mail_profiles", domain) + return answers or [f"10 mail.{domain}"] + + +def _public_dns_soa_answers(domain: str) -> list[str]: + """Return a realistic public SOA answer for a domain.""" + profile = _public_dns_profile("nameserver_profiles", domain) + nameservers = _public_dns_ns_answers(domain) + rnames = profile.get("soa_rnames", []) if profile else [] + if rnames: + rng = random.Random(_stable_seed(f"public_dns_soa_rname:{domain}:{profile.get('name')}")) + rname = _render_public_dns_answer(str(rng.choice(rnames)), domain) + else: + rname = f"dns-admin.{domain}" + return [f"{nameservers[0]} {rname}"] + + def _dns_txt_query_and_answer(rng: random.Random, hostname: str) -> tuple[str, str]: """Build a plausible TXT lookup for mail/authentication background noise.""" domain = _dns_registrable_domain(hostname) @@ -1749,8 +1819,8 @@ def _proxy_http_response_body_len( def _tls_san_dns_names(cert_name: str) -> list[str]: - """Build DNS SANs without wildcarding public suffixes.""" - from evidenceforge.generation.activity.tls_realism import multi_label_public_suffixes + """Build deterministic but varied DNS SANs without public-suffix wildcards.""" + from evidenceforge.generation.activity.tls_realism import load_tls_realism try: import ipaddress as _ipa @@ -1760,15 +1830,48 @@ def _tls_san_dns_names(cert_name: str) -> list[str]: except ValueError: pass - labels = [part for part in cert_name.rstrip(".").split(".") if part] + normalized = cert_name.rstrip(".").lower() + labels = [part for part in normalized.split(".") if part] if len(labels) < 2: - return [cert_name] - parent = ".".join(labels[1:]) - if 
len(labels) == 2 or parent in multi_label_public_suffixes(): - wildcard_base = cert_name - else: - wildcard_base = parent - return [cert_name, f"*.{wildcard_base}"] + return [normalized] + + base_domain = _dns_registrable_domain(normalized) + is_apex = normalized == base_domain + default_weights = { + "apex_exact": 34, + "apex_www": 26, + "apex_wildcard": 14, + "subdomain_exact": 34, + "subdomain_parent": 18, + "subdomain_wildcard": 16, + "subdomain_sibling": 12, + } + config_weights = load_tls_realism().get("san", {}).get("profile_weights", {}) + weights_by_name = {**default_weights, **config_weights} + profile_names = ( + ("apex_exact", "apex_www", "apex_wildcard") + if is_apex + else ("subdomain_exact", "subdomain_parent", "subdomain_wildcard", "subdomain_sibling") + ) + weights = [max(0, int(weights_by_name.get(name, 0))) for name in profile_names] + if sum(weights) <= 0: + weights = [1] * len(profile_names) + rng = random.Random(_stable_seed(f"tls_san_profile:{normalized}")) + profile = rng.choices(profile_names, weights=weights, k=1)[0] + + names = [normalized] + if profile == "apex_www": + names.append(f"www.{base_domain}") + elif profile == "apex_wildcard": + names.append(f"*.{base_domain}") + elif profile == "subdomain_parent": + names.append(base_domain) + elif profile == "subdomain_wildcard": + names.append(f"*.{base_domain}") + elif profile == "subdomain_sibling": + sibling = rng.choice(("api", "assets", "cdn", "static", "www")) + names.append(f"{sibling}.{base_domain}") + return list(dict.fromkeys(names)) def _is_ip_literal(value: str) -> bool: @@ -8914,7 +9017,10 @@ def _emit_dns_lookup( if _dns_hostname_allows_mx(hostname): qtype, qtype_name = 15, "MX" query = _dns_registrable_domain(hostname) - answers = [f"10 mail.{query}"] + if _dns_is_internal_name(query, ad_domain): + answers = [f"10 mail.{query}"] + else: + answers = _public_dns_mx_answers(query) else: qtype, qtype_name = 16, "TXT" query, txt_answer = _dns_txt_query_and_answer(rng, hostname) @@ -9004,16 +9110,25 @@ def _emit_dns_lookup( elif companion_kind == "NS": companion_qtype = 2 companion_query = _dns_registrable_domain(hostname) - companion_answers = [f"ns1.{companion_query}", f"ns2.{companion_query}"] + if _dns_is_internal_name(companion_query, ad_domain): + companion_answers = [f"ns1.{companion_query}", f"ns2.{companion_query}"] + else: + companion_answers = _public_dns_ns_answers(companion_query) elif companion_kind == "MX" and _dns_hostname_allows_mx(hostname): companion_qtype = 15 companion_query = _dns_registrable_domain(hostname) - companion_answers = [f"10 mail.{companion_query}"] + if _dns_is_internal_name(companion_query, ad_domain): + companion_answers = [f"10 mail.{companion_query}"] + else: + companion_answers = _public_dns_mx_answers(companion_query) else: companion_kind = "SOA" companion_qtype = 6 companion_query = _dns_registrable_domain(hostname) - companion_answers = [f"ns1.{companion_query} hostmaster.{companion_query}"] + if _dns_is_internal_name(companion_query, ad_domain): + companion_answers = [f"ns1.{companion_query} hostmaster.{companion_query}"] + else: + companion_answers = _public_dns_soa_answers(companion_query) companion_ctx = DnsContext( query=companion_query, trans_id=rng.randint(1, 65535), diff --git a/src/evidenceforge/generation/activity/public_dns_profiles.py b/src/evidenceforge/generation/activity/public_dns_profiles.py new file mode 100644 index 00000000..1c9f7fe8 --- /dev/null +++ b/src/evidenceforge/generation/activity/public_dns_profiles.py @@ -0,0 +1,50 @@ +# Copyright 
(c) 2026 Cisco Systems, Inc. and its affiliates +# SPDX-License-Identifier: MIT + +"""Public DNS answer profiles for NS/MX/SOA companion lookups.""" + +from __future__ import annotations + +from typing import Any + +from evidenceforge.config import get_activity_directory +from evidenceforge.config.overlay import load_with_overlay, merge_keyed_list + +_PROFILES_PATH = get_activity_directory() / "public_dns_profiles.yaml" +_CACHED_DATA: dict[str, Any] | None = None + + +def _merge_public_dns_profiles(default: dict, overlay: dict) -> dict: + """Merge public DNS profile overlays by profile name.""" + result = dict(default) + for key in ("nameserver_profiles", "mail_profiles"): + if key in overlay: + result[key] = merge_keyed_list( + default.get(key, []), + overlay[key], + key_field="name", + ) + for key, value in overlay.items(): + if key not in {"nameserver_profiles", "mail_profiles"}: + result[key] = value + return result + + +def load_public_dns_profiles() -> dict[str, Any]: + """Load public DNS answer profiles, merged with local overlay if present.""" + global _CACHED_DATA + if _CACHED_DATA is not None: + return _CACHED_DATA + + _CACHED_DATA = load_with_overlay( + _PROFILES_PATH, + "activity/public_dns_profiles.yaml", + _merge_public_dns_profiles, + ) + return _CACHED_DATA + + +def reset_public_dns_profiles_cache() -> None: + """Clear cached public DNS profiles. Intended for tests.""" + global _CACHED_DATA + _CACHED_DATA = None diff --git a/tests/unit/test_dhcp_and_certs.py b/tests/unit/test_dhcp_and_certs.py index dcabf9fc..127f291b 100644 --- a/tests/unit/test_dhcp_and_certs.py +++ b/tests/unit/test_dhcp_and_certs.py @@ -157,15 +157,26 @@ def test_rsa_named_issuers_only_emit_rsa_certificate_metadata(self): assert observed == {"rsa"}, issuer["name"] def test_san_dns_never_wildcards_public_suffix(self): - """Generated SAN lists should not contain impossible public-suffix wildcards.""" - assert _tls_san_dns_names("stackoverflow.com") == [ - "stackoverflow.com", - "*.stackoverflow.com", - ] - assert _tls_san_dns_names("gcr.io") == ["gcr.io", "*.gcr.io"] - assert _tls_san_dns_names("www.gstatic.com") == ["www.gstatic.com", "*.gstatic.com"] - assert _tls_san_dns_names("example.co.uk") == ["example.co.uk", "*.example.co.uk"] + """Generated SAN lists should vary while avoiding public-suffix wildcards.""" + assert _tls_san_dns_names("stackoverflow.com")[0] == "stackoverflow.com" + assert _tls_san_dns_names("gcr.io")[0] == "gcr.io" + assert _tls_san_dns_names("www.gstatic.com")[0] == "www.gstatic.com" + assert _tls_san_dns_names("example.co.uk")[0] == "example.co.uk" assert _tls_san_dns_names("203.0.113.45") == [] + all_names = { + name + for domain in [ + "stackoverflow.com", + "gcr.io", + "www.gstatic.com", + "example.co.uk", + "files.pythonhosted.org", + ] + for name in _tls_san_dns_names(domain) + } + assert "*.co.uk" not in all_names + assert "*.io" not in all_names + assert any(not name.startswith("*.") for name in all_names) def test_ocsp_status_is_stable_by_certificate_but_not_globally_flat(self): """OCSP status should be stable per cert while still varying across certs.""" diff --git a/tests/unit/test_phase5_network_diversity.py b/tests/unit/test_phase5_network_diversity.py index 329f64a6..93324077 100644 --- a/tests/unit/test_phase5_network_diversity.py +++ b/tests/unit/test_phase5_network_diversity.py @@ -165,7 +165,8 @@ def test_dns_lookup_emits_zeek_dns( elif qtype_name == "SRV": assert dns_ctx.query.startswith("_") elif qtype_name == "MX": - assert "mail." 
in dns_ctx.answers[0] + assert dns_ctx.answers + assert dns_ctx.answers[0].split(maxsplit=1)[0].isdigit() net = dns_se.network assert net.src_ip == "10.0.10.1" assert net.dst_port == 53 diff --git a/tests/unit/test_public_dns_profiles.py b/tests/unit/test_public_dns_profiles.py new file mode 100644 index 00000000..f0ad5197 --- /dev/null +++ b/tests/unit/test_public_dns_profiles.py @@ -0,0 +1,37 @@ +# Copyright (c) 2026 Cisco Systems, Inc. and its affiliates +# SPDX-License-Identifier: MIT + +"""Public DNS profile realism regression tests.""" + +from evidenceforge.generation.activity.generator import ( + _dns_registrable_domain, + _public_dns_mx_answers, + _public_dns_ns_answers, + _public_dns_soa_answers, +) + + +def test_registrable_domain_handles_common_multi_label_suffixes(): + assert _dns_registrable_domain("www.example.co.uk") == "example.co.uk" + assert _dns_registrable_domain("assets.service.com.au") == "service.com.au" + + +def test_public_dns_profiles_avoid_default_ns_mx_soa_templates(): + domain = "pypi.org" + + assert _public_dns_ns_answers(domain) != [f"ns1.{domain}", f"ns2.{domain}"] + assert _public_dns_mx_answers(domain) != [f"10 mail.{domain}"] + assert _public_dns_soa_answers(domain) != [f"ns1.{domain} hostmaster.{domain}"] + + +def test_public_dns_profiles_preserve_well_known_provider_overrides(): + assert _public_dns_ns_answers("google.com") == [ + "ns1.google.com", + "ns2.google.com", + "ns3.google.com", + "ns4.google.com", + ] + assert _public_dns_mx_answers("microsoft.com") == [ + "0 microsoft-com.mail.protection.outlook.com" + ] + assert _public_dns_soa_answers("microsoft.com")[0].endswith("azuredns-hostmaster.microsoft.com") diff --git a/tests/unit/test_validate_config.py b/tests/unit/test_validate_config.py index 5aba0885..c6243948 100644 --- a/tests/unit/test_validate_config.py +++ b/tests/unit/test_validate_config.py @@ -205,6 +205,42 @@ def load_invalid_tls_realism(): for issue in result.issues ) + def test_validate_config_rejects_invalid_public_dns_profile(self, monkeypatch): + from evidenceforge.generation.activity import public_dns_profiles + + def load_invalid_public_dns_profiles(): + return { + "nameserver_profiles": [ + { + "name": "bad", + "weight": -1, + "answer_sets": [["ns1.example.net"]], + } + ], + "mail_profiles": [], + } + + monkeypatch.setattr( + public_dns_profiles, + "load_public_dns_profiles", + load_invalid_public_dns_profiles, + ) + + result = validate_config() + + assert any( + issue.severity == "ERROR" + and issue.file == "public_dns_profiles.yaml" + and "weight must be non-negative" in issue.message + for issue in result.issues + ) + assert any( + issue.severity == "ERROR" + and issue.file == "public_dns_profiles.yaml" + and "mail_profiles must not be empty" in issue.message + for issue in result.issues + ) + def test_validate_config_warns_for_unknown_ocsp_responder(self, monkeypatch): from evidenceforge.generation.activity import dns_registry, tls_realism From 09076c187bc856c3ccfcb03b5136c0ec6d382830 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 03:33:41 -0400 Subject: [PATCH 49/61] docs: record loop 24 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 9f44fcad..df9f0d8b 100644 --- a/TODO.md +++ b/TODO.md @@ -396,7 +396,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 23 fix pass — repaired the Zeek multi-sensor timing texture at the shared Zeek multiplex layer by applying independent clock skew/drift plus per-flow capture delay to every observing sensor, not only secondary sensors, so mirrored rows can have positive or negative cross-sensor deltas while remaining within a well-synced low-millisecond envelope. Verification before regeneration passed with focused timing tests (`3 passed`), related Zeek fanout/multiplex/HTTP/SSL/files/activity tests (`116 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). - [x] Loop 23 regeneration, hard probes, quantitative eval, and blind review completed from commit `91546d7`: repaired the hard one-way Zeek multi-sensor timing fingerprint. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes matched `2,782` core/DMZ Zeek rows across conn/http/dns/ssl, confirmed every checked format now has both positive and negative offsets, and found zero always-positive formats. Blind synthetic-confidence scores were Threat Hunter `74`, Detection `66`, Network `84`, Host/EDR `62` (average `71.5`). Top Loop 24 target is public DNS/X.509 corpus realism because the network specialist continues to flag templated NS/MX/SOA answers, exact host-plus-wildcard SANs, and clustered certificate validity periods at high confidence; next targets are remaining Linux daemon-message repetition, DNS TXT tunnel grammar/cadence, HTTP connection reuse, and Sysmon SYSTEM `LogonGuid` morphology. - [x] Loop 24 fix pass — repaired public DNS and X.509 corpus realism by replacing provider-agnostic NS/MX/SOA templates with data-driven provider/domain-class NS, MX, and SOA answer profiles; making public certificate SANs weighted instead of exact host-plus-wildcard by default; and widening/reweighting short-lived public CA validity profiles. Verification passed with focused DNS/TLS/config tests, `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3148 passed, 37 skipped`). - - [ ] **IN PROGRESS** Loop 24 regeneration, hard probes, quantitative eval, and blind review. + - [x] Loop 24 regeneration, hard probes, quantitative eval, and blind review completed from commit `999a20e`: regenerated eval passed at exact `96.23/100` across `78,991` records; hard probes found zero old public DNS `ns1/ns2`, `mail.`, or `ns1/hostmaster` companion templates, public host-cert exact host-plus-wildcard SANs at `35/431` (`8.12%`), and `69` distinct certificate validity day counts. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `69`, Network `68`, Host/EDR `64` (average `69.25`). The Network reviewer explicitly called DNS a realism strength after the fix. Top Loop 25 target is Zeek HTTP connection reuse because all `3,373` HTTP rows still have `trans_depth=1` with zero repeated HTTP UIDs; next targets are Linux syslog daemon-message statefulness, DNS TXT tunnel cadence/grammar, proxy/C2 browser artifacts, and larger/messier X.509 SAN sets. 
+ - [ ] **IN PROGRESS** Loop 25 fix pass — repair Zeek HTTP connection/session realism by introducing source-native persistent HTTP/1.1 reuse for browser-like asset fetches and internal web applications, with repeated UIDs and `trans_depth > 1` where appropriate. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From dd56f084ddd672a42e474464c88ec3a8d619f751 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 03:48:14 -0400 Subject: [PATCH 50/61] fix: model persistent zeek http transactions --- src/evidenceforge/events/contexts.py | 1 + .../generation/activity/generator.py | 52 +++++++++- .../generation/emitters/cisco_asa.py | 6 +- src/evidenceforge/generation/emitters/ecar.py | 6 ++ .../generation/emitters/snort.py | 6 +- .../generation/emitters/sysmon.py | 5 +- src/evidenceforge/generation/emitters/zeek.py | 6 +- .../generation/emitters/zeek_weird.py | 6 +- .../generation/engine/baseline.py | 80 +++++++++++++-- tests/unit/test_activity.py | 97 +++++++++++++++++++ tests/unit/test_baseline_canonical.py | 6 ++ tests/unit/test_zeek_http.py | 39 ++++++++ 12 files changed, 294 insertions(+), 16 deletions(-) diff --git a/src/evidenceforge/events/contexts.py b/src/evidenceforge/events/contexts.py index c81943c6..e281ff55 100644 --- a/src/evidenceforge/events/contexts.py +++ b/src/evidenceforge/events/contexts.py @@ -164,6 +164,7 @@ class NetworkContext: missed_bytes: int = 0 initiating_pid: int = -1 # PID of process that opened this connection (-1 = unknown) link_local: bool = False # True for same-broadcast-domain traffic such as DHCP + application_layer_only: bool = False # Additional protocol transaction on an existing flow @dataclass(slots=True) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 9a800bc8..2e6afb90 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -1377,6 +1377,7 @@ def _tcp_success_history(rng: random.Random) -> str: _PROXY_SC_OVERHEAD = (50, 250) # Via, X-Cache, Age, etc. _AUTO_WEIRD_ENABLED = False # weird.log realism is deferred; explicit contexts still render. 
_EXPLICIT_PROXY_TUNNEL_TIMEOUT_S = 240 +_HTTP_PERSISTENT_CONNECTION_TIMEOUT_S = 45.0 # Kerberos TGS service name distribution (weighted) _KERBEROS_SVC_DIST = ( @@ -2116,6 +2117,9 @@ def __init__( self._explicit_proxy_tunnels: dict[ tuple[str, str, str, str, int], tuple[datetime, str] ] = {} + self._http_persistent_connections: dict[ + tuple[str, str, int, str, str], tuple[datetime, str, int] + ] = {} self._recent_connection_tuples: dict[tuple[str, int, str, int, str], float] = {} self._recent_icmp_observations: set[tuple[str, int, str, int, int]] = set() self._ssh_source_ports: set[tuple[str, str, int]] = set() @@ -6090,8 +6094,6 @@ def generate_connection( """ from evidenceforge.events.contexts import NetworkContext - if http is not None and http.trans_depth != 1: - http = replace(http, trans_depth=1) if http is not None: http = _normalize_http_context_for_source_native_response(http) @@ -6688,6 +6690,36 @@ def generate_connection( ): resolved_source_system = self._ip_to_system[src_ip] + http_application_layer_only = False + reused_http_uid = "" + http_persistent_key: tuple[str, str, int, str, str] | None = None + if http is not None and proto == "tcp" and service == "http" and dst_port > 0: + http_host_key = (http.host or hostname or dst_ip).lower().rstrip(".") + http_user_agent_key = (http.user_agent or "").lower() + http_persistent_key = ( + src_ip, + dst_ip, + dst_port, + http_host_key, + http_user_agent_key, + ) + if http.trans_depth > 1: + cached = self._http_persistent_connections.get(http_persistent_key) + if cached is not None: + last_activity, cached_uid, cached_src_port = cached + elapsed = (time - last_activity).total_seconds() + if 0 <= elapsed <= _HTTP_PERSISTENT_CONNECTION_TIMEOUT_S: + src_port = cached_src_port + reused_http_uid = cached_uid + http_application_layer_only = True + self._http_persistent_connections[http_persistent_key] = ( + time, + cached_uid, + cached_src_port, + ) + if not http_application_layer_only: + http = replace(http, trans_depth=1) + if proto == "icmp": src_port = 0 dst_port = 0 @@ -6823,6 +6855,8 @@ def generate_connection( close_time=close_time, ) uid = self.state_manager.get_zeek_uid(conn_id) + if reused_http_uid: + uid = reused_http_uid if orig_bytes is not None and resp_bytes is not None: self.state_manager.update_connection_bytes(conn_id, orig_bytes, resp_bytes) @@ -7217,6 +7251,7 @@ def generate_connection( ip_proto=ip_proto, missed_bytes=missed_bytes, initiating_pid=pid, + application_layer_only=http_application_layer_only, ), edr=EdrContext(object_id=str(uuid.uuid4()), actor_id=conn_actor_id), ) @@ -7906,6 +7941,18 @@ def generate_connection( if not _AUTO_WEIRD_ENABLED: rng.random() + if ( + http_persistent_key is not None + and event.http is not None + and event.network.conn_state == "SF" + and not event.network.application_layer_only + ): + self._http_persistent_connections[http_persistent_key] = ( + time, + uid, + src_port, + ) + # Phase 3: Dispatch to matching emitters (visibility handled by dispatcher) self.dispatcher.dispatch(event) logger.debug(f"Generated connection: {src_ip} -> {dst_ip}:{dst_port} (UID: {uid})") @@ -7919,6 +7966,7 @@ def generate_connection( wfp_system and _get_os_category(wfp_system.os) == "windows" and (pid > 0 or wfp_application is not None) + and not event.network.application_layer_only ): self.generate_wfp_connection( system=wfp_system, diff --git a/src/evidenceforge/generation/emitters/cisco_asa.py b/src/evidenceforge/generation/emitters/cisco_asa.py index 32f5ebd1..17ee4268 100644 --- 
a/src/evidenceforge/generation/emitters/cisco_asa.py +++ b/src/evidenceforge/generation/emitters/cisco_asa.py @@ -286,7 +286,11 @@ def _teardown_byte_count(net: Any, protocol: str, conn_id: int) -> int: def can_handle(self, event: SecurityEvent) -> bool: """Handle all connection events with network context.""" - return event.event_type in self._supported_types and event.network is not None + return ( + event.event_type in self._supported_types + and event.network is not None + and not event.network.application_layer_only + ) def emit(self, event: SecurityEvent) -> None: """Render ASA syslog records from a connection event. diff --git a/src/evidenceforge/generation/emitters/ecar.py b/src/evidenceforge/generation/emitters/ecar.py index 9907a3a4..e79fcd9d 100644 --- a/src/evidenceforge/generation/emitters/ecar.py +++ b/src/evidenceforge/generation/emitters/ecar.py @@ -141,6 +141,12 @@ def can_handle(self, event: SecurityEvent) -> bool: """ if event.firewall is not None and event.firewall.action == "deny": return False + if ( + event.event_type == "connection" + and event.network is not None + and event.network.application_layer_only + ): + return False return event.event_type in self._supported_types def emit(self, event: SecurityEvent) -> None: diff --git a/src/evidenceforge/generation/emitters/snort.py b/src/evidenceforge/generation/emitters/snort.py index 3a3d1480..bc7a12d2 100644 --- a/src/evidenceforge/generation/emitters/snort.py +++ b/src/evidenceforge/generation/emitters/snort.py @@ -46,7 +46,11 @@ class SnortEmitter(SensorMultiplexEmitter): def can_handle(self, event: SecurityEvent) -> bool: """Handle connection events that carry an IdsContext.""" - return event.event_type in self._supported_types and event.ids is not None + return ( + event.event_type in self._supported_types + and event.ids is not None + and not (event.network is not None and event.network.application_layer_only) + ) def emit(self, event: SecurityEvent) -> None: """Render IdsContext to Snort fast alert format.""" diff --git a/src/evidenceforge/generation/emitters/sysmon.py b/src/evidenceforge/generation/emitters/sysmon.py index e9946c30..ee53194f 100644 --- a/src/evidenceforge/generation/emitters/sysmon.py +++ b/src/evidenceforge/generation/emitters/sysmon.py @@ -682,7 +682,10 @@ def emit(self, event: SecurityEvent) -> None: self._render_sysmon_process_access(event) elif event.event_type == "connection": # Connection events can produce Event 3 (NetworkConnect) and/or Event 22 (DNSQuery) - if self._passes_event3_filter(event): + is_application_layer_only = ( + event.network is not None and event.network.application_layer_only + ) + if not is_application_layer_only and self._passes_event3_filter(event): self._render_sysmon_network_connect(event) if event.dns and self._passes_event22_filter(event): self._render_sysmon_dns_query(event) diff --git a/src/evidenceforge/generation/emitters/zeek.py b/src/evidenceforge/generation/emitters/zeek.py index 17b29ec0..65eda25f 100644 --- a/src/evidenceforge/generation/emitters/zeek.py +++ b/src/evidenceforge/generation/emitters/zeek.py @@ -67,7 +67,11 @@ class ZeekEmitter(SensorMultiplexEmitter): def can_handle(self, event: SecurityEvent) -> bool: """Zeek conn emitter handles connection and session events with network context.""" - return event.event_type in self._supported_types and event.network is not None + return ( + event.event_type in self._supported_types + and event.network is not None + and not event.network.application_layer_only + ) @staticmethod def 
_normalize_history_for_state(conn_state: str, history: str) -> str: diff --git a/src/evidenceforge/generation/emitters/zeek_weird.py b/src/evidenceforge/generation/emitters/zeek_weird.py index 36ab6b4e..c3368b30 100644 --- a/src/evidenceforge/generation/emitters/zeek_weird.py +++ b/src/evidenceforge/generation/emitters/zeek_weird.py @@ -40,7 +40,11 @@ class ZeekWeirdEmitter(SensorMultiplexEmitter): def can_handle(self, event: SecurityEvent) -> bool: """Only handle connection events that carry WeirdContext.""" - return event.event_type == "connection" and event.weird is not None + return ( + event.event_type == "connection" + and event.weird is not None + and not (event.network is not None and event.network.application_layer_only) + ) def emit(self, event: SecurityEvent) -> None: """Render weird.log entry from WeirdContext + NetworkContext.""" diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index f7d84c54..699ff0f6 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -4087,10 +4087,37 @@ def _http_status_message(status: int) -> str: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0", ] session_ua = rng.choice(_session_uas) + request_groups: dict[str, dict[str, int]] = {} + for req in session_requests: + group = request_groups.setdefault( + req.hostname, + { + "last_offset_ms": req.time_offset_ms, + "request_body_len": 0, + "response_body_len": 0, + }, + ) + group["last_offset_ms"] = max(group["last_offset_ms"], req.time_offset_ms) + group["request_body_len"] += req.request_body_len + group["response_body_len"] += req.response_body_len + seen_request_groups: set[str] = set() for req in session_requests: req_ts = base_ts + timedelta(milliseconds=req.time_offset_ms) self.state_manager.set_current_time(req_ts) + group = request_groups[req.hostname] + first_in_group = req.hostname not in seen_request_groups + seen_request_groups.add(req.hostname) + conn_duration = rng.uniform(0.05, 2.0) + conn_orig_bytes = req.request_body_len + conn_resp_bytes = req.response_body_len + trans_depth = req.trans_depth + if first_in_group: + trans_depth = 1 + remaining_ms = max(0, group["last_offset_ms"] - req.time_offset_ms) + conn_duration = (remaining_ms / 1000) + rng.uniform(0.25, 1.75) + conn_orig_bytes = max(req.request_body_len, group["request_body_len"]) + conn_resp_bytes = max(req.response_body_len, group["response_body_len"]) # Resolve destination IP for CDN subresources req_dst_ip = dst_ip @@ -4122,7 +4149,7 @@ def _http_status_message(status: int) -> str: status_code=req.status_code, status_msg=_http_status_message(req.status_code), referrer=req.referrer, - trans_depth=req.trans_depth, + trans_depth=trans_depth, resp_mime_types=response_mime_types_for_status( req.status_code, req.content_type, @@ -4139,9 +4166,9 @@ def _http_status_message(status: int) -> str: dst_port=conn.get("port", 443), proto=conn.get("proto", "tcp"), service=conn.get("service"), - duration=rng.uniform(0.05, 2.0), - orig_bytes=req.request_body_len, - resp_bytes=req.response_body_len, + duration=conn_duration, + orig_bytes=conn_orig_bytes, + resp_bytes=conn_resp_bytes, emit_dns=req.is_page_load or req_hostname != hostname, source_system=system, hostname=req_hostname, @@ -6046,6 +6073,7 @@ def _tool_gap_ms() -> int: require_browser_like_domain=False, ) current_page_allowed = False + visible_requests = [] for req in session_requests: 
if req.is_page_load: if top_level_emitted >= top_level_budget: @@ -6056,7 +6084,6 @@ def _tool_gap_ms() -> int: continue if req.hostname != http_host: continue - req_ts = base_ts + timedelta(milliseconds=req.time_offset_ms) if is_stable_resource_path(req.path) and not req.is_page_load: cache_seen = getattr(self, "_web_static_cache_seen", None) if not isinstance(cache_seen, dict): @@ -6066,6 +6093,28 @@ def _tool_gap_ms() -> int: cache_seen[cache_key] += 1 continue cache_seen[cache_key] = 1 + visible_requests.append(req) + + request_groups: dict[str, dict[str, int]] = {} + for req in visible_requests: + group = request_groups.setdefault( + req.hostname, + { + "last_offset_ms": req.time_offset_ms, + "request_body_len": 0, + "response_body_len": 0, + }, + ) + group["last_offset_ms"] = max(group["last_offset_ms"], req.time_offset_ms) + group["request_body_len"] += max(200, req.request_body_len) + group["response_body_len"] += req.response_body_len + seen_request_groups: set[str] = set() + + for req in visible_requests: + req_ts = base_ts + timedelta(milliseconds=req.time_offset_ms) + group = request_groups[req.hostname] + first_in_group = req.hostname not in seen_request_groups + seen_request_groups.add(req.hostname) self.activity_generator.generate_connection( src_ip=client_ip, dst_ip=effective_dst_ip, @@ -6073,9 +6122,22 @@ def _tool_gap_ms() -> int: dst_port=dst_port, proto="tcp", service=dst_service, - duration=rng.uniform(0.03, 2.0), - orig_bytes=max(200, req.request_body_len), - resp_bytes=req.response_body_len, + duration=( + (max(0, group["last_offset_ms"] - req.time_offset_ms) / 1000) + + rng.uniform(0.25, 1.75) + if first_in_group + else rng.uniform(0.03, 2.0) + ), + orig_bytes=( + group["request_body_len"] + if first_in_group + else max(200, req.request_body_len) + ), + resp_bytes=( + max(req.response_body_len, group["response_body_len"]) + if first_in_group + else req.response_body_len + ), source_system=client_sys, http=HttpContext( method=req.method, @@ -6088,7 +6150,7 @@ def _tool_gap_ms() -> int: status_code=req.status_code, status_msg=_status_message(req.status_code), referrer=req.referrer, - trans_depth=req.trans_depth, + trans_depth=1 if first_in_group else req.trans_depth, resp_mime_types=response_mime_types_for_status( req.status_code, req.content_type, diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 0b747220..79f47f3a 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -3676,6 +3676,103 @@ def test_generate_connection_clamps_http_depth_for_one_request_connections( assert event.http.trans_depth == 1 assert http.trans_depth == 4 + def test_generate_connection_reuses_http_uid_for_persistent_transactions(self, state_manager): + """Later HTTP transactions on a warm connection should reuse one Zeek UID.""" + + class CollectorEmitter: + def __init__(self, predicate): + self._predicate = predicate + self.events = [] + + def can_handle(self, event): + return self._predicate(event) + + def emit(self, event): + self.events.append(event) + + conn_emitter = CollectorEmitter( + lambda event: ( + event.event_type == "connection" + and event.network is not None + and not event.network.application_layer_only + ) + ) + http_emitter = CollectorEmitter( + lambda event: event.event_type == "connection" and event.http is not None + ) + edr_emitter = CollectorEmitter( + lambda event: ( + event.event_type == "connection" + and event.network is not None + and not event.network.application_layer_only + ) + ) + emitters = { + "zeek_conn": 
conn_emitter, + "zeek_http": http_emitter, + "ecar": edr_emitter, + } + dispatcher = EventDispatcher(state_manager=state_manager, emitters=emitters) + generator = ActivityGenerator(state_manager, emitters, dispatcher=dispatcher) + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + state_manager.set_current_time(timestamp) + + first_uid = generator.generate_connection( + "10.0.0.1", + "93.184.216.34", + timestamp, + dst_port=80, + proto="tcp", + service="http", + duration=1.5, + orig_bytes=450, + resp_bytes=4096, + http=HttpContext( + method="GET", + host="portal.example.com", + uri="/", + user_agent="Mozilla/5.0", + response_body_len=4096, + trans_depth=1, + ), + emit_dns=False, + ) + second_uid = generator.generate_connection( + "10.0.0.1", + "93.184.216.34", + timestamp + timedelta(milliseconds=700), + dst_port=80, + proto="tcp", + service="http", + duration=0.2, + orig_bytes=320, + resp_bytes=8192, + http=HttpContext( + method="GET", + host="portal.example.com", + uri="/assets/app.js", + user_agent="Mozilla/5.0", + response_body_len=8192, + trans_depth=2, + ), + emit_dns=False, + ) + + assert first_uid + assert second_uid == first_uid + assert len(conn_emitter.events) == 1 + assert len(edr_emitter.events) == 1 + assert len(http_emitter.events) == 2 + + first_event, second_event = http_emitter.events + assert first_event.network.zeek_uid == first_uid + assert first_event.network.application_layer_only is False + assert first_event.http.trans_depth == 1 + assert second_event.network.zeek_uid == first_uid + assert second_event.network.src_port == first_event.network.src_port + assert second_event.network.application_layer_only is True + assert second_event.http.trans_depth == 2 + def test_generate_connection_with_bytes(self, activity_gen, state_manager, mock_emitters): """generate_connection should include byte counts in NetworkContext.""" timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) diff --git a/tests/unit/test_baseline_canonical.py b/tests/unit/test_baseline_canonical.py index 7daf6bd3..9c70bfc0 100644 --- a/tests/unit/test_baseline_canonical.py +++ b/tests/unit/test_baseline_canonical.py @@ -1550,6 +1550,12 @@ def test_web_server_access_preserves_cache_and_partial_statuses(self, monkeypatc assert by_uri["/assets/js/app.bundle.bf9655b3.js"].resp_mime_types == [ "application/javascript" ] + root_row = next(kw for kw in collected if kw["http"].uri == "/") + assert root_row["http"].trans_depth == 1 + assert root_row["duration"] >= 0.2 + assert root_row["resp_bytes"] >= 4096 + 1152 + assert by_uri["/assets/css/main.063cbaf5.css"].trans_depth == 2 + assert by_uri["/assets/js/app.bundle.bf9655b3.js"].trans_depth == 3 def test_web_server_access_keeps_scanner_requests_source_native(self, monkeypatch): """Scanner visitors should keep configured error paths and blank referrers.""" diff --git a/tests/unit/test_zeek_http.py b/tests/unit/test_zeek_http.py index 38650800..7290c74b 100644 --- a/tests/unit/test_zeek_http.py +++ b/tests/unit/test_zeek_http.py @@ -30,6 +30,7 @@ from evidenceforge.events.base import SecurityEvent from evidenceforge.events.contexts import HttpContext, NetworkContext from evidenceforge.formats import load_format +from evidenceforge.generation.emitters.zeek import ZeekEmitter from evidenceforge.generation.emitters.zeek_http import ZeekHttpEmitter @@ -218,6 +219,44 @@ def test_rejects_without_http_context(self): ) assert emitter.can_handle(event) is False + def test_accepts_application_layer_transactions(self): + fmt = load_format("zeek_http") + emitter = 
ZeekHttpEmitter(fmt, Path("/tmp/test.json")) + event = SecurityEvent( + timestamp=datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC), + event_type="connection", + network=NetworkContext( + src_ip="10.0.0.1", + src_port=50000, + dst_ip="8.8.8.8", + dst_port=80, + protocol="tcp", + service="http", + application_layer_only=True, + ), + http=HttpContext(method="GET", host="example.com", uri="/app.js", trans_depth=2), + ) + assert emitter.can_handle(event) is True + + def test_conn_emitter_rejects_application_layer_transactions(self): + fmt = load_format("zeek_conn") + emitter = ZeekEmitter(fmt, Path("/tmp/test.json")) + event = SecurityEvent( + timestamp=datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC), + event_type="connection", + network=NetworkContext( + src_ip="10.0.0.1", + src_port=50000, + dst_ip="8.8.8.8", + dst_port=80, + protocol="tcp", + service="http", + application_layer_only=True, + ), + http=HttpContext(method="GET", host="example.com", uri="/app.js", trans_depth=2), + ) + assert emitter.can_handle(event) is False + class TestHttpRenderTiming: """Verify http.log uses analyzer/request timing, not cloned conn start time.""" From ebc2d42efdc770fe2a9430fe5cb77b1d40e3ed67 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 04:03:55 -0400 Subject: [PATCH 51/61] docs: record loop 25 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index df9f0d8b..9965447a 100644 --- a/TODO.md +++ b/TODO.md @@ -397,7 +397,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 23 regeneration, hard probes, quantitative eval, and blind review completed from commit `91546d7`: repaired the hard one-way Zeek multi-sensor timing fingerprint. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes matched `2,782` core/DMZ Zeek rows across conn/http/dns/ssl, confirmed every checked format now has both positive and negative offsets, and found zero always-positive formats. Blind synthetic-confidence scores were Threat Hunter `74`, Detection `66`, Network `84`, Host/EDR `62` (average `71.5`). Top Loop 24 target is public DNS/X.509 corpus realism because the network specialist continues to flag templated NS/MX/SOA answers, exact host-plus-wildcard SANs, and clustered certificate validity periods at high confidence; next targets are remaining Linux daemon-message repetition, DNS TXT tunnel grammar/cadence, HTTP connection reuse, and Sysmon SYSTEM `LogonGuid` morphology. - [x] Loop 24 fix pass — repaired public DNS and X.509 corpus realism by replacing provider-agnostic NS/MX/SOA templates with data-driven provider/domain-class NS, MX, and SOA answer profiles; making public certificate SANs weighted instead of exact host-plus-wildcard by default; and widening/reweighting short-lived public CA validity profiles. Verification passed with focused DNS/TLS/config tests, `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3148 passed, 37 skipped`). - [x] Loop 24 regeneration, hard probes, quantitative eval, and blind review completed from commit `999a20e`: regenerated eval passed at exact `96.23/100` across `78,991` records; hard probes found zero old public DNS `ns1/ns2`, `mail.`, or `ns1/hostmaster` companion templates, public host-cert exact host-plus-wildcard SANs at `35/431` (`8.12%`), and `69` distinct certificate validity day counts. 
Blind synthetic-confidence scores were Threat Hunter `76`, Detection `69`, Network `68`, Host/EDR `64` (average `69.25`). The Network reviewer explicitly called DNS a realism strength after the fix. Top Loop 25 target is Zeek HTTP connection reuse because all `3,373` HTTP rows still have `trans_depth=1` with zero repeated HTTP UIDs; next targets are Linux syslog daemon-message statefulness, DNS TXT tunnel cadence/grammar, proxy/C2 browser artifacts, and larger/messier X.509 SAN sets. - - [ ] **IN PROGRESS** Loop 25 fix pass — repair Zeek HTTP connection/session realism by introducing source-native persistent HTTP/1.1 reuse for browser-like asset fetches and internal web applications, with repeated UIDs and `trans_depth > 1` where appropriate. + - [x] Loop 25 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `dd56f08`: introduced source-native persistent HTTP/1.1 transaction reuse for browser-like HTTP asset fetches and internal web sessions, suppressing duplicate connection-level rows for application-layer-only follow-on requests while preserving `zeek_http` rows with repeated UIDs and `trans_depth > 1`. Verification passed with focused regressions, related ActivityGenerator/baseline/Zeek/proxy tests (`284 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3151 passed, 37 skipped`). Regenerated eval passed at exact `95.99/100` across `79,163` records; hard probes found `277` HTTP rows with `trans_depth > 1`, `106` repeated HTTP UID groups, zero duplicate conn UID groups, and every reused HTTP UID backed by exactly one conn row. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `67`, Network `92`, Host/EDR `64` (average `75.25`). The old all-`trans_depth=1` tell is fixed, but Network found a new harder source-native HTTP/conn contradiction: some same-UID HTTP transactions occur after parent connection close, have non-monotonic `trans_depth`, or exceed parent conn byte counters. + - [ ] **IN PROGRESS** Loop 26 fix pass — repair persistent Zeek HTTP transaction sequencing and parent `conn.log` accounting so reused HTTP rows are timestamp-ordered with monotonic contiguous `trans_depth`, fit within the parent connection duration, and have aggregate body sizes reflected in parent flow bytes. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 4f92a112ca0effe70d6a5c472da9f7cbe71d55f9 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 04:09:54 -0400 Subject: [PATCH 52/61] fix: align persistent http flow accounting --- .../generation/activity/generator.py | 62 ++++++-- .../generation/emitters/zeek_http.py | 12 ++ .../generation/engine/baseline.py | 146 ++++++++++++------ tests/unit/test_activity.py | 90 ++++++++++- tests/unit/test_baseline_canonical.py | 4 +- tests/unit/test_zeek_http.py | 44 +++++- 6 files changed, 286 insertions(+), 72 deletions(-) diff --git a/src/evidenceforge/generation/activity/generator.py b/src/evidenceforge/generation/activity/generator.py index 2e6afb90..484da9e0 100644 --- a/src/evidenceforge/generation/activity/generator.py +++ b/src/evidenceforge/generation/activity/generator.py @@ -36,7 +36,7 @@ import re import shlex import uuid -from dataclasses import replace +from dataclasses import dataclass, replace from datetime import UTC, datetime, timedelta from threading import Lock from typing import Any, Optional @@ -103,6 +103,22 @@ logger = logging.getLogger(__name__) + +@dataclass(slots=True) +class _HttpPersistentConnection: + close_deadline: datetime + uid: str + src_port: int + next_trans_depth: int + orig_budget: int + resp_budget: int + used_orig: int + used_resp: int + + +_HTTP_PERSISTENT_REUSE_GUARD = timedelta(milliseconds=900) + + _WINDOWS_SINGLETON_SERVICE_EXES = frozenset( { "spoolsv.exe", @@ -1377,7 +1393,6 @@ def _tcp_success_history(rng: random.Random) -> str: _PROXY_SC_OVERHEAD = (50, 250) # Via, X-Cache, Age, etc. _AUTO_WEIRD_ENABLED = False # weird.log realism is deferred; explicit contexts still render. _EXPLICIT_PROXY_TUNNEL_TIMEOUT_S = 240 -_HTTP_PERSISTENT_CONNECTION_TIMEOUT_S = 45.0 # Kerberos TGS service name distribution (weighted) _KERBEROS_SVC_DIST = ( @@ -2118,7 +2133,7 @@ def __init__( tuple[str, str, str, str, int], tuple[datetime, str] ] = {} self._http_persistent_connections: dict[ - tuple[str, str, int, str, str], tuple[datetime, str, int] + tuple[str, str, int, str, str], _HttpPersistentConnection ] = {} self._recent_connection_tuples: dict[tuple[str, int, str, int, str], float] = {} self._recent_icmp_observations: set[tuple[str, int, str, int, int]] = set() @@ -6706,17 +6721,24 @@ def generate_connection( if http.trans_depth > 1: cached = self._http_persistent_connections.get(http_persistent_key) if cached is not None: - last_activity, cached_uid, cached_src_port = cached - elapsed = (time - last_activity).total_seconds() - if 0 <= elapsed <= _HTTP_PERSISTENT_CONNECTION_TIMEOUT_S: - src_port = cached_src_port - reused_http_uid = cached_uid + reuse_deadline = cached.close_deadline - _HTTP_PERSISTENT_REUSE_GUARD + elapsed = (time - reuse_deadline).total_seconds() + request_body = http.request_body_len or 0 + response_body = http.response_body_len or 0 + fits_parent_flow = ( + cached.used_orig + request_body <= cached.orig_budget + and cached.used_resp + response_body <= cached.resp_budget + ) + if elapsed <= 0 and fits_parent_flow: + src_port = cached.src_port + reused_http_uid = cached.uid http_application_layer_only = True - self._http_persistent_connections[http_persistent_key] = ( - time, - cached_uid, - cached_src_port, - ) + http = replace(http, trans_depth=cached.next_trans_depth) + cached.next_trans_depth += 1 + cached.used_orig += request_body + cached.used_resp += response_body + else: + self._http_persistent_connections.pop(http_persistent_key, None) if not http_application_layer_only: http = replace(http, trans_depth=1) @@ -7946,11 +7968,17 @@ def generate_connection( and event.http is not None and 
event.network.conn_state == "SF" and not event.network.application_layer_only + and event.network.duration is not None ): - self._http_persistent_connections[http_persistent_key] = ( - time, - uid, - src_port, + self._http_persistent_connections[http_persistent_key] = _HttpPersistentConnection( + close_deadline=event.timestamp + timedelta(seconds=event.network.duration), + uid=uid, + src_port=src_port, + next_trans_depth=max(2, event.http.trans_depth + 1), + orig_budget=max(event.network.orig_bytes or 0, event.http.request_body_len or 0), + resp_budget=max(event.network.resp_bytes or 0, event.http.response_body_len or 0), + used_orig=event.http.request_body_len or 0, + used_resp=event.http.response_body_len or 0, ) # Phase 3: Dispatch to matching emitters (visibility handled by dispatcher) diff --git a/src/evidenceforge/generation/emitters/zeek_http.py b/src/evidenceforge/generation/emitters/zeek_http.py index 0f96038c..fc6687ce 100644 --- a/src/evidenceforge/generation/emitters/zeek_http.py +++ b/src/evidenceforge/generation/emitters/zeek_http.py @@ -22,12 +22,15 @@ """Zeek http.log emitter.""" +from datetime import datetime, timedelta from typing import Any from evidenceforge.events.base import SecurityEvent from evidenceforge.generation.activity.timing_profiles import sample_packet_timing_delta from evidenceforge.generation.emitters.zeek_base import SensorMultiplexEmitter +_MIN_HTTP_TRANSACTION_TIMESTAMP_GAP = timedelta(milliseconds=1) + class ZeekHttpEmitter(SensorMultiplexEmitter): """Emitter for Zeek http.log format (NDJSON). @@ -40,6 +43,10 @@ class ZeekHttpEmitter(SensorMultiplexEmitter): _flat_filename = "zeek_http.json" _supported_types: set[str] = {"connection"} + def __init__(self, *args: Any, **kwargs: Any) -> None: + super().__init__(*args, **kwargs) + self._last_http_ts_by_uid: dict[tuple[str, str, int, str, int], datetime] = {} + def can_handle(self, event: SecurityEvent) -> bool: if event.event_type not in self._supported_types: return False @@ -67,6 +74,11 @@ def emit(self, event: SecurityEvent) -> None: event.timestamp, ), ) + uid_key = (net.zeek_uid, net.src_ip, net.src_port, net.dst_ip, net.dst_port) + previous_ts = self._last_http_ts_by_uid.get(uid_key) + if previous_ts is not None and event_ts <= previous_ts: + event_ts = previous_ts + _MIN_HTTP_TRANSACTION_TIMESTAMP_GAP + self._last_http_ts_by_uid[uid_key] = event_ts event_data: dict[str, Any] = { "ts": event_ts, "uid": net.zeek_uid, diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index 699ff0f6..cb50b3fb 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -90,6 +90,58 @@ logger = logging.getLogger(__name__) +_HttpGroupKey = tuple[str, int] +_HttpPlanValue = tuple[_HttpGroupKey, int, bool, int] + + +def _plan_http_request_groups( + requests: list[Any], + *, + request_body_floor: int = 0, +) -> tuple[dict[int, _HttpPlanValue], dict[_HttpGroupKey, dict[str, int]]]: + """Plan source-native HTTP transaction depth and parent flow accounting.""" + group_counters: dict[str, int] = {} + active_group: dict[str, _HttpGroupKey] = {} + depths: dict[_HttpGroupKey, int] = {} + last_emit_offset: dict[_HttpGroupKey, int] = {} + plan: dict[int, _HttpPlanValue] = {} + groups: dict[_HttpGroupKey, dict[str, int]] = {} + + for index, req in enumerate(requests): + hostname = str(req.hostname) + if req.is_page_load or hostname not in active_group: + group_counters[hostname] = group_counters.get(hostname, 0) + 1 
+ active_group[hostname] = (hostname, group_counters[hostname]) + depths[active_group[hostname]] = 0 + + group_key = active_group[hostname] + depths[group_key] += 1 + trans_depth = depths[group_key] + emit_offset_ms = req.time_offset_ms + if group_key in last_emit_offset: + emit_offset_ms = max(emit_offset_ms, last_emit_offset[group_key] + 600) + last_emit_offset[group_key] = emit_offset_ms + plan[index] = (group_key, trans_depth, trans_depth == 1, emit_offset_ms) + + group = groups.setdefault( + group_key, + { + "first_offset_ms": emit_offset_ms, + "last_offset_ms": emit_offset_ms, + "request_body_len": 0, + "response_body_len": 0, + "request_count": 0, + }, + ) + group["first_offset_ms"] = min(group["first_offset_ms"], emit_offset_ms) + group["last_offset_ms"] = max(group["last_offset_ms"], emit_offset_ms) + group["request_body_len"] += max(request_body_floor, req.request_body_len) + group["response_body_len"] += req.response_body_len + group["request_count"] += 1 + + return plan, groups + + def _session_started_by(session: Any, time: datetime) -> bool: """Return whether a session exists at the given activity time.""" session_start = session.start_time @@ -4087,37 +4139,33 @@ def _http_status_message(status: int) -> str: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0", ] session_ua = rng.choice(_session_uas) - request_groups: dict[str, dict[str, int]] = {} - for req in session_requests: - group = request_groups.setdefault( - req.hostname, - { - "last_offset_ms": req.time_offset_ms, - "request_body_len": 0, - "response_body_len": 0, - }, - ) - group["last_offset_ms"] = max(group["last_offset_ms"], req.time_offset_ms) - group["request_body_len"] += req.request_body_len - group["response_body_len"] += req.response_body_len - seen_request_groups: set[str] = set() + request_plan, request_groups = _plan_http_request_groups(session_requests) - for req in session_requests: - req_ts = base_ts + timedelta(milliseconds=req.time_offset_ms) + planned_requests = sorted( + enumerate(session_requests), + key=lambda item: (request_plan[item[0]][3], item[0]), + ) + for req_index, req in planned_requests: + group_key, trans_depth, first_in_group, emit_offset_ms = request_plan[req_index] + req_ts = base_ts + timedelta(milliseconds=emit_offset_ms) self.state_manager.set_current_time(req_ts) - group = request_groups[req.hostname] - first_in_group = req.hostname not in seen_request_groups - seen_request_groups.add(req.hostname) + group = request_groups[group_key] conn_duration = rng.uniform(0.05, 2.0) conn_orig_bytes = req.request_body_len conn_resp_bytes = req.response_body_len - trans_depth = req.trans_depth if first_in_group: - trans_depth = 1 - remaining_ms = max(0, group["last_offset_ms"] - req.time_offset_ms) - conn_duration = (remaining_ms / 1000) + rng.uniform(0.25, 1.75) - conn_orig_bytes = max(req.request_body_len, group["request_body_len"]) - conn_resp_bytes = max(req.response_body_len, group["response_body_len"]) + remaining_ms = max(0, group["last_offset_ms"] - emit_offset_ms) + conn_duration = (remaining_ms / 1000) + rng.uniform(1.25, 3.0) + request_overhead = 120 * group["request_count"] + response_overhead = 160 * group["request_count"] + conn_orig_bytes = max( + req.request_body_len, + group["request_body_len"] + request_overhead, + ) + conn_resp_bytes = max( + req.response_body_len, + group["response_body_len"] + response_overhead, + ) # Resolve destination IP for CDN subresources req_dst_ip = dst_ip @@ -6095,26 
+6143,21 @@ def _tool_gap_ms() -> int: cache_seen[cache_key] = 1 visible_requests.append(req) - request_groups: dict[str, dict[str, int]] = {} - for req in visible_requests: - group = request_groups.setdefault( - req.hostname, - { - "last_offset_ms": req.time_offset_ms, - "request_body_len": 0, - "response_body_len": 0, - }, - ) - group["last_offset_ms"] = max(group["last_offset_ms"], req.time_offset_ms) - group["request_body_len"] += max(200, req.request_body_len) - group["response_body_len"] += req.response_body_len - seen_request_groups: set[str] = set() - - for req in visible_requests: - req_ts = base_ts + timedelta(milliseconds=req.time_offset_ms) - group = request_groups[req.hostname] - first_in_group = req.hostname not in seen_request_groups - seen_request_groups.add(req.hostname) + request_plan, request_groups = _plan_http_request_groups( + visible_requests, + request_body_floor=200, + ) + + planned_requests = sorted( + enumerate(visible_requests), + key=lambda item: (request_plan[item[0]][3], item[0]), + ) + for req_index, req in planned_requests: + group_key, trans_depth, first_in_group, emit_offset_ms = request_plan[req_index] + req_ts = base_ts + timedelta(milliseconds=emit_offset_ms) + group = request_groups[group_key] + request_overhead = 120 * group["request_count"] + response_overhead = 160 * group["request_count"] self.activity_generator.generate_connection( src_ip=client_ip, dst_ip=effective_dst_ip, @@ -6123,18 +6166,21 @@ def _tool_gap_ms() -> int: proto="tcp", service=dst_service, duration=( - (max(0, group["last_offset_ms"] - req.time_offset_ms) / 1000) - + rng.uniform(0.25, 1.75) + (max(0, group["last_offset_ms"] - emit_offset_ms) / 1000) + + rng.uniform(1.25, 3.0) if first_in_group else rng.uniform(0.03, 2.0) ), orig_bytes=( - group["request_body_len"] + group["request_body_len"] + request_overhead if first_in_group else max(200, req.request_body_len) ), resp_bytes=( - max(req.response_body_len, group["response_body_len"]) + max( + req.response_body_len, + group["response_body_len"] + response_overhead, + ) if first_in_group else req.response_body_len ), @@ -6150,7 +6196,7 @@ def _tool_gap_ms() -> int: status_code=req.status_code, status_msg=_status_message(req.status_code), referrer=req.referrer, - trans_depth=1 if first_in_group else req.trans_depth, + trans_depth=trans_depth, resp_mime_types=response_mime_types_for_status( req.status_code, req.content_type, diff --git a/tests/unit/test_activity.py b/tests/unit/test_activity.py index 79f47f3a..c8d564bb 100644 --- a/tests/unit/test_activity.py +++ b/tests/unit/test_activity.py @@ -3724,9 +3724,10 @@ def emit(self, event): dst_port=80, proto="tcp", service="http", - duration=1.5, + duration=2.0, orig_bytes=450, - resp_bytes=4096, + resp_bytes=12_288, + conn_state="SF", http=HttpContext( method="GET", host="portal.example.com", @@ -3747,6 +3748,7 @@ def emit(self, event): duration=0.2, orig_bytes=320, resp_bytes=8192, + conn_state="SF", http=HttpContext( method="GET", host="portal.example.com", @@ -3773,6 +3775,90 @@ def emit(self, event): assert second_event.network.application_layer_only is True assert second_event.http.trans_depth == 2 + def test_generate_connection_does_not_reuse_http_uid_after_parent_close(self, state_manager): + """A late HTTP request should start a new flow instead of overrunning conn.log.""" + + class CollectorEmitter: + def __init__(self, predicate): + self._predicate = predicate + self.events = [] + + def can_handle(self, event): + return self._predicate(event) + + def emit(self, event): + 
self.events.append(event) + + conn_emitter = CollectorEmitter( + lambda event: ( + event.event_type == "connection" + and event.network is not None + and not event.network.application_layer_only + ) + ) + http_emitter = CollectorEmitter( + lambda event: event.event_type == "connection" and event.http is not None + ) + emitters = { + "zeek_conn": conn_emitter, + "zeek_http": http_emitter, + } + dispatcher = EventDispatcher(state_manager=state_manager, emitters=emitters) + generator = ActivityGenerator(state_manager, emitters, dispatcher=dispatcher) + timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + state_manager.set_current_time(timestamp) + + first_uid = generator.generate_connection( + "10.0.0.1", + "93.184.216.34", + timestamp, + dst_port=80, + proto="tcp", + service="http", + duration=0.25, + orig_bytes=450, + resp_bytes=4096, + conn_state="SF", + http=HttpContext( + method="GET", + host="portal.example.com", + uri="/", + user_agent="Mozilla/5.0", + response_body_len=4096, + trans_depth=1, + ), + emit_dns=False, + ) + second_uid = generator.generate_connection( + "10.0.0.1", + "93.184.216.34", + timestamp + timedelta(seconds=2), + dst_port=80, + proto="tcp", + service="http", + duration=0.25, + orig_bytes=320, + resp_bytes=8192, + conn_state="SF", + http=HttpContext( + method="GET", + host="portal.example.com", + uri="/assets/app.js", + user_agent="Mozilla/5.0", + response_body_len=8192, + trans_depth=2, + ), + emit_dns=False, + ) + + assert first_uid + assert second_uid + assert second_uid != first_uid + assert len(conn_emitter.events) == 2 + assert len(http_emitter.events) == 2 + assert http_emitter.events[1].network.application_layer_only is False + assert http_emitter.events[1].http.trans_depth == 1 + def test_generate_connection_with_bytes(self, activity_gen, state_manager, mock_emitters): """generate_connection should include byte counts in NetworkContext.""" timestamp = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) diff --git a/tests/unit/test_baseline_canonical.py b/tests/unit/test_baseline_canonical.py index 9c70bfc0..919c4ff0 100644 --- a/tests/unit/test_baseline_canonical.py +++ b/tests/unit/test_baseline_canonical.py @@ -1497,7 +1497,7 @@ def test_web_server_access_preserves_cache_and_partial_statuses(self, monkeypatc method="GET", content_type="text/css", referrer=f"https://{kwargs['hostname']}/", - trans_depth=2, + trans_depth=5, is_page_load=False, response_body_len=0, request_body_len=0, @@ -1510,7 +1510,7 @@ def test_web_server_access_preserves_cache_and_partial_statuses(self, monkeypatc method="GET", content_type="application/javascript", referrer=f"https://{kwargs['hostname']}/", - trans_depth=3, + trans_depth=4, is_page_load=False, response_body_len=1152, request_body_len=0, diff --git a/tests/unit/test_zeek_http.py b/tests/unit/test_zeek_http.py index 7290c74b..d4b6116b 100644 --- a/tests/unit/test_zeek_http.py +++ b/tests/unit/test_zeek_http.py @@ -24,7 +24,7 @@ import json import tempfile -from datetime import UTC, datetime +from datetime import UTC, datetime, timedelta from pathlib import Path from evidenceforge.events.base import SecurityEvent @@ -288,3 +288,45 @@ def test_emit_offsets_http_timestamp_from_connection_timestamp(self, tmp_path): assert data["ts"] > base_ts.timestamp() offset_us = round((data["ts"] - base_ts.timestamp()) * 1_000_000) assert offset_us % 1000 != 0 + + def test_emit_preserves_same_uid_transaction_timestamp_order(self, tmp_path, monkeypatch): + """Per-request analyzer jitter must not reorder same-UID transaction depths.""" + fmt 
= load_format("zeek_http") + output = tmp_path / "http.json" + emitter = ZeekHttpEmitter(fmt, output, buffer_size=1) + base_ts = datetime(2024, 1, 15, 10, 0, 0, tzinfo=UTC) + deltas = [timedelta(milliseconds=450), timedelta(milliseconds=1)] + + monkeypatch.setattr( + "evidenceforge.generation.emitters.zeek_http.sample_packet_timing_delta", + lambda *_args, **_kwargs: deltas.pop(0), + ) + + def make_event(timestamp: datetime, trans_depth: int, uri: str) -> SecurityEvent: + return SecurityEvent( + timestamp=timestamp, + event_type="connection", + network=NetworkContext( + src_ip="10.0.0.1", + src_port=50000, + dst_ip="93.184.216.34", + dst_port=80, + protocol="tcp", + service="http", + zeek_uid="ChttpTiming1234", + ), + http=HttpContext( + method="GET", + host="example.com", + uri=uri, + trans_depth=trans_depth, + ), + ) + + emitter.emit(make_event(base_ts, 1, "/")) + emitter.emit(make_event(base_ts + timedelta(milliseconds=100), 2, "/app.js")) + emitter.close() + + rows = [json.loads(line) for line in output.read_text().splitlines()] + assert [row["trans_depth"] for row in rows] == [1, 2] + assert rows[1]["ts"] > rows[0]["ts"] From c13e429ad0c857c29dcece550bb0a218c652ad15 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 04:45:37 -0400 Subject: [PATCH 53/61] docs: record loop 26 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 9965447a..58cd66f8 100644 --- a/TODO.md +++ b/TODO.md @@ -398,7 +398,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 24 fix pass — repaired public DNS and X.509 corpus realism by replacing provider-agnostic NS/MX/SOA templates with data-driven provider/domain-class NS, MX, and SOA answer profiles; making public certificate SANs weighted instead of exact host-plus-wildcard by default; and widening/reweighting short-lived public CA validity profiles. Verification passed with focused DNS/TLS/config tests, `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3148 passed, 37 skipped`). - [x] Loop 24 regeneration, hard probes, quantitative eval, and blind review completed from commit `999a20e`: regenerated eval passed at exact `96.23/100` across `78,991` records; hard probes found zero old public DNS `ns1/ns2`, `mail.`, or `ns1/hostmaster` companion templates, public host-cert exact host-plus-wildcard SANs at `35/431` (`8.12%`), and `69` distinct certificate validity day counts. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `69`, Network `68`, Host/EDR `64` (average `69.25`). The Network reviewer explicitly called DNS a realism strength after the fix. Top Loop 25 target is Zeek HTTP connection reuse because all `3,373` HTTP rows still have `trans_depth=1` with zero repeated HTTP UIDs; next targets are Linux syslog daemon-message statefulness, DNS TXT tunnel cadence/grammar, proxy/C2 browser artifacts, and larger/messier X.509 SAN sets. - [x] Loop 25 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `dd56f08`: introduced source-native persistent HTTP/1.1 transaction reuse for browser-like HTTP asset fetches and internal web sessions, suppressing duplicate connection-level rows for application-layer-only follow-on requests while preserving `zeek_http` rows with repeated UIDs and `trans_depth > 1`. 
Verification passed with focused regressions, related ActivityGenerator/baseline/Zeek/proxy tests (`284 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3151 passed, 37 skipped`). Regenerated eval passed at exact `95.99/100` across `79,163` records; hard probes found `277` HTTP rows with `trans_depth > 1`, `106` repeated HTTP UID groups, zero duplicate conn UID groups, and every reused HTTP UID backed by exactly one conn row. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `67`, Network `92`, Host/EDR `64` (average `75.25`). The old all-`trans_depth=1` tell is fixed, but Network found a new harder source-native HTTP/conn contradiction: some same-UID HTTP transactions occur after parent connection close, have non-monotonic `trans_depth`, or exceed parent conn byte counters. - - [ ] **IN PROGRESS** Loop 26 fix pass — repair persistent Zeek HTTP transaction sequencing and parent `conn.log` accounting so reused HTTP rows are timestamp-ordered with monotonic contiguous `trans_depth`, fit within the parent connection duration, and have aggregate body sizes reflected in parent flow bytes. + - [x] Loop 26 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `4f92a11`: repaired persistent Zeek HTTP transaction sequencing and parent `conn.log` accounting by aggregating parent flow bytes/duration across planned HTTP request groups, guarding reuse near parent close, constraining reuse to parent byte budgets, and preserving same-UID Zeek HTTP timestamp order after analyzer jitter. Verification passed with focused regressions, related ActivityGenerator/baseline/Zeek/proxy tests (`266 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3153 passed, 37 skipped`). Regenerated eval passed at exact `95.25/100` across `76,677` records; hard probes found `189` HTTP rows with `trans_depth > 1`, `82` repeated HTTP UID groups, zero duplicate conn UID groups, zero reused groups without exactly one parent conn, zero after-parent-close rows, zero non-monotonic `trans_depth` groups, and zero HTTP body sums exceeding parent conn byte counters. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `66`, Network `62`, Host/EDR `66` (average `69.0`). Top Loop 27 target is Linux syslog daemon-message statefulness because Host/EDR and Threat Hunter both flagged high-volume exact daemon phrase reuse across dissimilar hosts; next targets are DNS TXT tunnel grammar/cadence, public X.509 SAN diversity, C2/proxy HTTP shape variation, eCAR FLOW principal context, and Sysmon SYSTEM `LogonGuid` morphology. + - [ ] **IN PROGRESS** Loop 27 fix pass — reduce cross-host exact Linux syslog daemon-message repetition by making daemon pools host-role/state aware, adding source-native parameter/message variation, and clustering maintenance messages around plausible host-local service/package activity. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. 
Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 38e431dfb021253e1b2a50eba187a3b3ad05b109 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 04:49:55 -0400 Subject: [PATCH 54/61] fix: diversify linux syslog daemon noise --- .../activity/extra_syslog_messages.yaml | 70 ++++++++++++++----- .../generation/engine/baseline.py | 25 +++++-- tests/unit/test_validate_config.py | 48 +++++++++++++ 3 files changed, 121 insertions(+), 22 deletions(-) diff --git a/src/evidenceforge/config/activity/extra_syslog_messages.yaml b/src/evidenceforge/config/activity/extra_syslog_messages.yaml index 0fca3861..9225a8c5 100644 --- a/src/evidenceforge/config/activity/extra_syslog_messages.yaml +++ b/src/evidenceforge/config/activity/extra_syslog_messages.yaml @@ -29,15 +29,18 @@ programs: - app: dbus-daemon weight: 2 messages: - - "[system] Activating via systemd: service name='org.freedesktop.hostname1'" - - "[system] Successfully activated service 'org.freedesktop.resolve1'" - - "[system] Activating via systemd: service name='org.freedesktop.timedate1'" + - "[system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.{}'" + - "[system] Successfully activated service 'org.freedesktop.resolve1' for ':1.{}'" + - "[system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service' requested by ':1.{}'" + - "[system] Successfully activated service 'org.freedesktop.locale1' for ':1.{}'" + - "[system] Activating via systemd: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service' requested by ':1.{}'" - app: rsyslogd weight: 1 messages: - - "imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog'" - - '[origin software="rsyslogd"] rsyslogd was HUPed' + - "imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' fd {}" + - '[origin software="rsyslogd"] rsyslogd was HUPed after config reload id {}' + - "omfwd: remote syslog target stayed connected, action queue watermark {}" - app: sudo transient: true @@ -130,8 +133,8 @@ programs: - unix-session - system-bus-name messages: - - "Registered Authentication Agent for {auth_subject}" - - "Unregistered Authentication Agent for {auth_subject}" + - "Registered Authentication Agent for {auth_subject}:{0} (system bus name :1.{0})" + - "Unregistered Authentication Agent for {auth_subject}:{0} (system bus name :1.{0})" - "Operator of unix-process:{} successfully authenticated as 'root'" - app: multipathd @@ -173,19 +176,43 @@ programs: - app: unattended-upgr distro: ubuntu system_types: [server] - weight: 2 + weight: 1 + params: + origin_set: + - "o=Ubuntu,a=jammy-security" + - "o=Ubuntu,a=jammy-updates" + - "o=Ubuntu,a=jammy; o=Ubuntu,a=jammy-security" + package_name: + - openssl + - ca-certificates + - base-files + - libc6 + - python3-apt + - tzdata messages: - - "Allowed origins are: o=Ubuntu,a=jammy" - - "No packages found that can be upgraded unattended" - - "dpkg --status-fd: processing triggers for man-db" + - "Allowed origins are: 
{origin_set}" + - "Checking for unattended upgrades from {origin_set}" + - "No packages found that can be upgraded unattended (run {})" + - "Package {package_name} kept back for phased update percentage {}" + - "dpkg --status-fd: processing triggers for man-db ({package_name})" - app: systemd-resolved distro: ubuntu - weight: 2 + weight: 1 + params: + feature_set: + - UDP + - TCP + - UDP+EDNS0 + trust_anchor: + - "20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d" + - "38696 8 2 683d2d0acb8c9b712a1948b27f741219298d0a450d612c483af444a4c0fb2b16" + - "20326 8 1 2f6d2c6bdab2a2d15eb06fcddcb0b195" messages: - - "Using degraded feature set UDP instead of UDP+EDNS0 for DNS server {dns_server}." - - "Grace period over, resuming full feature set for DNS server {dns_server}." - - "Positive Trust Anchors: . IN DS 20326" + - "Using degraded feature set {feature_set} instead of UDP+EDNS0 for DNS server {dns_server} after transaction {0}." + - "Grace period over, resuming full feature set {feature_set} for DNS server {dns_server} after probe {0}." + - "Positive Trust Anchors: . IN DS {trust_anchor}" + - "Cache miss for transaction {}, retrying DNS server {dns_server} with {feature_set}." - app: thermald system_types: [workstation] @@ -204,5 +231,16 @@ programs: - app: irqbalance weight: 1 + params: + cpu: + - "0" + - "1" + - "2" + - "3" + numa_node: + - "0" + - "1" messages: - - "Balancing is ineffective IRQs are pinned and balanced" + - "Balancing is ineffective: IRQs are pinned by affinity policy on CPU {cpu} during sample {0}" + - "IRQ {} affinity hint keeps vector on CPU {cpu}" + - "NUMA node {numa_node}: no movable IRQs found during rebalance pass {0}" diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index cb50b3fb..f70e9e38 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -5571,23 +5571,36 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 pid=sshd_pid, facility=10, ) - elif source_roll < 0.39: + elif source_roll < 0.36: if is_rhel_like: continue # RHEL doesn't have snapd + snap_name = rng.choice( + [ + "core20", + "core22", + "lxd", + "microk8s", + "snapd-desktop-integration", + ] + ) + change_id = 1000 + (_stable_seed(f"snapd_change:{system.hostname}") % 8000) + task_id = rng.randint(1, 9) self.activity_generator.generate_syslog_event( system=system, time=ts, app_name="snapd", message=rng.choice( [ - "autorefresh.go:540: auto-refresh: all snaps are up-to-date", - "daemon.go:460: gracefully waiting for running hooks", - "stateengine.go:150: state ensure starting", + f"autorefresh.go:540: auto-refresh for {snap_name}: no updates found", + f"daemon.go:460: gracefully waiting for hook {snap_name}.configure", + f"stateengine.go:150: state ensure starting change {change_id + task_id}", + f"taskrunner.go:271: change {change_id} task {task_id} done for {snap_name}", + f"snapmgr.go:523: refresh candidates checked for {snap_name}", ] ), pid=sys_pids.get("snapd", rng.randint(500, 2000)), ) - elif source_roll < 0.47: + elif source_roll < 0.45: if not has_ntp_client: continue if is_rhel_like: @@ -5627,7 +5640,7 @@ def _svc_pid(*keys: str, _pids: dict = sys_pids) -> int: # noqa: B006 message=msg, pid=sys_pids.get("timesyncd", rng.randint(400, 800)), ) - elif source_roll < 0.51: + elif source_roll < 0.50: # Journald runtime statistics (max_size and type stable per host) machine_id = self._machine_ids.get(system.hostname, "0" * 32) 
_j_rng = random.Random(_stable_seed(f"journald:{system.hostname}")) diff --git a/tests/unit/test_validate_config.py b/tests/unit/test_validate_config.py index c6243948..7c63ec0b 100644 --- a/tests/unit/test_validate_config.py +++ b/tests/unit/test_validate_config.py @@ -1143,6 +1143,54 @@ def test_extra_syslog_filters_by_system_type_and_excluded_roles(self): assert [entry["app"] for entry in db_server] == ["multipathd"] assert [entry["app"] for entry in workstation] == ["packagekitd", "accounts-daemon"] + def test_extra_syslog_high_volume_daemons_avoid_exact_boilerplate(self): + from evidenceforge.generation.activity.extra_syslog import ( + load_extra_syslog_messages, + render_extra_syslog_message, + ) + + programs = load_extra_syslog_messages() + high_volume_apps = { + "dbus-daemon", + "rsyslogd", + "unattended-upgr", + "systemd-resolved", + "irqbalance", + } + old_exact_messages = { + "[system] Activating via systemd: service name='org.freedesktop.hostname1'", + "[system] Successfully activated service 'org.freedesktop.resolve1'", + "[system] Activating via systemd: service name='org.freedesktop.timedate1'", + '[origin software="rsyslogd"] rsyslogd was HUPed', + "Allowed origins are: o=Ubuntu,a=jammy", + "No packages found that can be upgraded unattended", + "dpkg --status-fd: processing triggers for man-db", + "Positive Trust Anchors: . IN DS 20326", + "Balancing is ineffective IRQs are pinned and balanced", + } + + checked_apps = set() + for entry in programs: + app = entry.get("app") + if app not in high_volume_apps: + continue + checked_apps.add(app) + messages = entry.get("messages", []) + assert not old_exact_messages.intersection(messages) + assert any("{" in message for message in messages) + for message in messages: + rendered = render_extra_syslog_message( + {**entry, "messages": [message]}, + random.Random(5), + positional_value=123456, + system_services=["sshd", "nginx"], + values={"dns_server": "10.10.2.10"}, + ) + assert "{" not in rendered + assert "}" not in rendered + + assert checked_apps == high_volume_apps + def test_validate_config_rejects_invalid_4672_emission_probability(self, monkeypatch): from evidenceforge.generation.activity import windows_auth_realism From 3e78053dfe00128805b6c7ed7b9d57869231d430 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 05:12:59 -0400 Subject: [PATCH 55/61] docs: record loop 27 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 58cd66f8..9885465b 100644 --- a/TODO.md +++ b/TODO.md @@ -399,7 +399,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 24 regeneration, hard probes, quantitative eval, and blind review completed from commit `999a20e`: regenerated eval passed at exact `96.23/100` across `78,991` records; hard probes found zero old public DNS `ns1/ns2`, `mail.`, or `ns1/hostmaster` companion templates, public host-cert exact host-plus-wildcard SANs at `35/431` (`8.12%`), and `69` distinct certificate validity day counts. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `69`, Network `68`, Host/EDR `64` (average `69.25`). The Network reviewer explicitly called DNS a realism strength after the fix. 
Top Loop 25 target is Zeek HTTP connection reuse because all `3,373` HTTP rows still have `trans_depth=1` with zero repeated HTTP UIDs; next targets are Linux syslog daemon-message statefulness, DNS TXT tunnel cadence/grammar, proxy/C2 browser artifacts, and larger/messier X.509 SAN sets. - [x] Loop 25 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `dd56f08`: introduced source-native persistent HTTP/1.1 transaction reuse for browser-like HTTP asset fetches and internal web sessions, suppressing duplicate connection-level rows for application-layer-only follow-on requests while preserving `zeek_http` rows with repeated UIDs and `trans_depth > 1`. Verification passed with focused regressions, related ActivityGenerator/baseline/Zeek/proxy tests (`284 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3151 passed, 37 skipped`). Regenerated eval passed at exact `95.99/100` across `79,163` records; hard probes found `277` HTTP rows with `trans_depth > 1`, `106` repeated HTTP UID groups, zero duplicate conn UID groups, and every reused HTTP UID backed by exactly one conn row. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `67`, Network `92`, Host/EDR `64` (average `75.25`). The old all-`trans_depth=1` tell is fixed, but Network found a new harder source-native HTTP/conn contradiction: some same-UID HTTP transactions occur after parent connection close, have non-monotonic `trans_depth`, or exceed parent conn byte counters. - [x] Loop 26 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `4f92a11`: repaired persistent Zeek HTTP transaction sequencing and parent `conn.log` accounting by aggregating parent flow bytes/duration across planned HTTP request groups, guarding reuse near parent close, constraining reuse to parent byte budgets, and preserving same-UID Zeek HTTP timestamp order after analyzer jitter. Verification passed with focused regressions, related ActivityGenerator/baseline/Zeek/proxy tests (`266 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3153 passed, 37 skipped`). Regenerated eval passed at exact `95.25/100` across `76,677` records; hard probes found `189` HTTP rows with `trans_depth > 1`, `82` repeated HTTP UID groups, zero duplicate conn UID groups, zero reused groups without exactly one parent conn, zero after-parent-close rows, zero non-monotonic `trans_depth` groups, and zero HTTP body sums exceeding parent conn byte counters. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `66`, Network `62`, Host/EDR `66` (average `69.0`). Top Loop 27 target is Linux syslog daemon-message statefulness because Host/EDR and Threat Hunter both flagged high-volume exact daemon phrase reuse across dissimilar hosts; next targets are DNS TXT tunnel grammar/cadence, public X.509 SAN diversity, C2/proxy HTTP shape variation, eCAR FLOW principal context, and Sysmon SYSTEM `LogonGuid` morphology. - - [ ] **IN PROGRESS** Loop 27 fix pass — reduce cross-host exact Linux syslog daemon-message repetition by making daemon pools host-role/state aware, adding source-native parameter/message variation, and clustering maintenance messages around plausible host-local service/package activity. 
+ - [x] Loop 27 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `38e431d`: reduced cross-host exact Linux syslog daemon-message repetition by expanding/parameterizing high-volume daemon pools, lowering stale direct snapd/syslog boilerplate, and adding validator coverage for the former exact phrases. Verification passed with focused config validation tests, related baseline/syslog slices, `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3154 passed, 37 skipped`). Regenerated eval passed at exact `95.92/100` across `76,542` records; hard probes found zero old exact flagged daemon phrases, zero daemon messages repeated `25+` times, zero messages repeated `100+` times, and top exact repeats moved to SSH PAM session lifecycles. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `66`, Network `64`, Host/EDR `58` (average `67.5`). Top Loop 28 target is DNS TXT tunnel grammar and C2/proxy cadence because Threat Hunter and Detection both flagged tight, storyline-bound behavior; next targets are scenario-authored attack-name legibility (defer unless scenario edits are authorized), Linux `phpsessionclean`/`irqbalance` texture, X.509 SAN and AD SRV realism, eCAR FLOW principal context, and Sysmon SYSTEM `LogonGuid` morphology. + - [ ] **IN PROGRESS** Loop 28 fix pass — loosen DNS TXT tunnel and C2/proxy status behavior by adding source-native grammar variation, benign/background TXT collisions, mixed DNS/proxy outcomes, wider beacon jitter, and less stable proxy response-size texture without changing scenario-authored names. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From 350b0f5d27536df633d661b379e53ec856802c44 Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 05:19:39 -0400 Subject: [PATCH 56/61] fix: loosen dns tunnel and c2 cadence --- .../generation/engine/storyline.py | 255 ++++++++++++++---- tests/unit/test_bulk_events.py | 122 +++++++++ 2 files changed, 332 insertions(+), 45 deletions(-) diff --git a/src/evidenceforge/generation/engine/storyline.py b/src/evidenceforge/generation/engine/storyline.py index b69f4e4a..bd4a6975 100644 --- a/src/evidenceforge/generation/engine/storyline.py +++ b/src/evidenceforge/generation/engine/storyline.py @@ -38,6 +38,7 @@ import re import shlex import uuid +from dataclasses import replace from datetime import datetime, timedelta from types import SimpleNamespace from typing import Any @@ -66,6 +67,58 @@ def _is_exfil_connection_spec(spec: Any) -> bool: return "exfil" in desc or "t1041" in tech or "t1048" in tech +def _is_c2_http_request( + *, + description: str | None, + technique: str | None, + uri: str | None, + activity: str | None = None, +) -> bool: + """Return True when a storyline HTTP request should look like C2/tasking.""" + uri_l = (uri or "").lower() + text = f"{description or ''} {technique or ''} {activity or ''} {uri_l}".lower() + text_markers = ( + "c2", + "beacon", + "callback", + "checkin", + "tasking", + "command and control", + "t1041", + "t1071", + ) + path_markers = ( + "/v2/", + "/callback", + "/checkin", + "/beacon", + "/task", + "/cmd", + "/gate", + ) + return any(marker in text for marker in text_markers) or any( + marker in uri_l for marker in path_markers + ) + + +def _c2_http_response_size(rng: random.Random, *, method: str, uri: str) -> int: + """Return varied source-native response body sizes for C2-like HTTP requests.""" + method_u = method.upper() + uri_l = uri.lower() + if method_u == "POST": + return rng.randint(160, 2600) + if any(marker in uri_l for marker in ("/status", "/check", "/heartbeat", "/ping")): + band = rng.choices(["ack", "config", "task"], weights=[55, 34, 11], k=1)[0] + if band == "ack": + return rng.randint(90, 1800) + if band == "config": + return rng.randint(2400, 14500) + return rng.randint(18_000, 86_000) + if any(marker in uri_l for marker in ("/client", "/stage", "/update", "/loader")): + return rng.randint(8_000, 94_000) + return rng.randint(220, 11_000) + + def _is_round_transfer_size(value: int) -> bool: """Return True for large human-authored round byte counts.""" if value < 1_000_000: @@ -197,13 +250,14 @@ def _iter_dns_tunnel_ticks( for tick_index, tick_time in enumerate( _iter_periodic_ticks(start_time, interval_sec, duration_sec, count, jitter, rng) ): - if tick_index > 0 and rng.random() < 0.025: - pause_offset += rng.uniform(interval_sec * 3.0, interval_sec * 18.0) - if tick_index > 0 and rng.random() < 0.035: + if tick_index > 0 and rng.random() < 0.045: + pause_offset += rng.uniform(interval_sec * 4.0, interval_sec * 26.0) + if tick_index > 0 and rng.random() < 0.055: continue - paced_time = tick_time + timedelta( - seconds=pause_offset + rng.expovariate(1.0 / max(interval_sec * 0.18, 0.001)) - ) + local_spacing = rng.expovariate(1.0 / max(interval_sec * 0.55, 0.001)) + if tick_index > 0 and rng.random() < 0.11: + local_spacing += rng.uniform(interval_sec * 1.4, interval_sec * 6.5) + paced_time = tick_time + timedelta(seconds=pause_offset + local_spacing) if end_time is not None and paced_time > end_time: break yield paced_time @@ -434,14 +488,63 @@ def _observed_web_scan_status(path_entry: dict[str, Any], rng) -> int: def _dns_tunnel_extra_labels(query_count: int, rng) -> list[str]: """Return optional DNS 
tunnel labels that make query grammar less uniform.""" roll = rng.random() - if roll < 0.55: + if roll < 0.42: return [] edge = f"{rng.choice(('a', 'b', 'c', 'd', 'e', 'n', 'x'))}{rng.randint(1, 99)}" - if roll < 0.74: + if roll < 0.62: return [edge] + if roll < 0.78: + return [rng.choice(("cdn", "api", "img", "edge", "r")), edge] if roll < 0.9: - return [f"s{query_count & 0xFFFF:x}"] - return [edge, f"r{rng.randint(1, 12)}"] + return [f"s{query_count & 0xFFFF:x}", rng.choice(("a", "b", "r"))] + return [edge, f"r{rng.randint(1, 12)}", rng.choice(("cdn", "cache", "svc"))] + + +def _dns_tunnel_background_txt_record(rng: random.Random) -> tuple[str, str, int]: + """Return a benign TXT query/answer that can collide with tunnel-era DNS.""" + domain = rng.choice( + ( + "meridianhcs.com", + "microsoft.com", + "github.com", + "sendgrid.net", + "okta.com", + "duo.com", + "zoom.us", + "atlassian.net", + ) + ) + selector = rng.choice(("selector1", "selector2", "s1", "mail", "k1", "mta")) + style = rng.choices(("spf", "dkim", "dmarc", "verify"), weights=[38, 32, 20, 10], k=1)[0] + if style == "spf": + answer = rng.choice( + ( + "v=spf1 include:spf.protection.outlook.com -all", + "v=spf1 include:sendgrid.net include:_spf.google.com ~all", + "v=spf1 ip4:203.0.113.0/24 include:amazonses.com -all", + ) + ) + return domain, answer, rng.choice((300, 600, 1800, 3600)) + if style == "dkim": + token = rng.randbytes(rng.randint(9, 18)).hex() + return ( + f"{selector}._domainkey.{domain}", + f"v=DKIM1; k=rsa; p={token}", + rng.choice((300, 600, 900, 1800)), + ) + if style == "dmarc": + policy = rng.choice(("none", "quarantine", "reject")) + return ( + f"_dmarc.{domain}", + f"v=DMARC1; p={policy}; rua=mailto:dmarc@{domain}", + rng.choice((300, 600, 1800)), + ) + token = rng.randbytes(8).hex() + return ( + f"_verify.{domain}", + f"verification={token}", + rng.choice((60, 300, 600)), + ) def _web_scan_path_allows_referrer(path_entry: dict[str, Any]) -> bool: @@ -1700,20 +1803,11 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: _uri_raw = spec.uri or "/" _uri = _uri_raw.lower() _mime_type = normalize_mime_type_for_path(_uri_raw, "text/html") - _desc = f"{spec.description or ''} {spec.technique or ''} {_uri}".lower() - _is_c2_http = any( - marker in _desc - for marker in ( - "c2", - "beacon", - "callback", - "checkin", - "tasking", - "exfil", - "upload", - "t1041", - "t1071", - ) + _is_c2_http = _is_c2_http_request( + description=spec.description, + technique=spec.technique, + uri=_uri_raw, + activity=activity, ) if _is_c2_http and _mime_type == "text/html": _mime_type = rng.choices( @@ -1731,6 +1825,8 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: kw in _uri for kw in ("/callback", "/task", "/cmd", "/beacon", "/gate") ): resp_bytes = rng.randint(500, 5000) + elif _is_c2_http: + resp_bytes = _c2_http_response_size(rng, method=_method, uri=_uri_raw) elif _method == "POST": resp_bytes = rng.randint(200, 5000) else: @@ -2373,6 +2469,9 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: # Allow mode: resolve service, http context, hostname, byte sizing service = spec.service http_ctx = None + http_is_c2 = False + http_method = "" + http_uri = "" conn_hostname = None emit_dns = False s_ob, s_rb = _size_storyline_connection(spec, rng) @@ -2390,19 +2489,16 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: _method = spec.method or "GET" _uri_raw = spec.uri or "/" + http_method = _method + http_uri = _uri_raw _mime_type = 
normalize_mime_type_for_path(_uri_raw, "text/html") - _desc = f"{spec.description or ''} {spec.technique or ''} {_uri_raw}".lower() - _is_c2_http = any( - marker in _desc - for marker in ( - "c2", - "beacon", - "callback", - "checkin", - "tasking", - "t1071", - ) + _is_c2_http = _is_c2_http_request( + description=spec.description, + technique=spec.technique, + uri=_uri_raw, + activity=activity, ) + http_is_c2 = _is_c2_http if _is_c2_http and _mime_type == "text/html": _mime_type = rng.choices( ["application/json", "text/plain", "application/octet-stream"], @@ -2414,7 +2510,7 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: elif _method == "POST": resp_bytes = rng.randint(200, 2000) elif _is_c2_http: - resp_bytes = rng.randint(180, 6500) + resp_bytes = _c2_http_response_size(rng, method=_method, uri=_uri_raw) else: resp_bytes = response_size_for_mime(rng, _mime_type) from evidenceforge.generation.activity.referrer import pick_referrer @@ -2480,6 +2576,22 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: start, interval_sec, duration_sec, count, spec.jitter, rng ): self.state_manager.set_current_time(tick_time) + tick_http_ctx = http_ctx + tick_resp_bytes = s_rb + if http_ctx is not None and http_is_c2 and spec.response_body_len is None: + tick_http_body_len = _c2_http_response_size( + rng, + method=http_method or http_ctx.method, + uri=http_uri or http_ctx.uri, + ) + tick_http_ctx = replace( + http_ctx, + response_body_len=tick_http_body_len, + tags=list(http_ctx.tags), + resp_fuids=list(http_ctx.resp_fuids), + resp_mime_types=list(http_ctx.resp_mime_types), + ) + tick_resp_bytes = max(s_rb, tick_http_body_len + rng.randint(300, 5000)) if story_pid <= 0: story_pid, story_image = self._ensure_storyline_service_process_for_beacon( actor, @@ -2531,11 +2643,11 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: service="ssl" if spec.dst_port == 443 else "http", duration=rng.uniform(0.05, 2.0), orig_bytes=s_ob, - resp_bytes=s_rb, + resp_bytes=tick_resp_bytes, conn_state="SF", emit_dns=emit_dns and attempt_count == 0, source_system=src_sys, - http=http_ctx, + http=tick_http_ctx, proxy=proxy_ctx, hostname=conn_hostname if conn_hostname is not None else spec.hostname, pid=story_pid, @@ -2564,11 +2676,11 @@ def _ground_truth_uid(uid: str, src_ip: str, dst_ip: str) -> str: service=service, duration=rng.uniform(0.5, 10.0), orig_bytes=s_ob, - resp_bytes=s_rb, + resp_bytes=tick_resp_bytes, conn_state=s_conn_state, emit_dns=emit_dns and attempt_count == 0, source_system=src_sys, - http=http_ctx, + http=tick_http_ctx, hostname=conn_hostname, pid=story_pid, process_image=story_image, @@ -3168,16 +3280,69 @@ def _next_scan_path() -> dict[str, Any]: chunk_idx = 0 tunnel_salt = rng.randbytes(4) + scenario = getattr(self, "scenario", None) + environment = getattr(scenario, "environment", None) + background_systems = [ + candidate + for candidate in getattr(environment, "systems", []) + if getattr(candidate, "ip", "") and getattr(candidate, "ip", "") != query_src_ip + ] + background_window_sec = ( + duration_sec + if duration_sec is not None + else interval_sec * float(count if count is not None else 120) + ) + if background_systems and background_window_sec > 0: + background_count = min(36, max(12, len(background_systems) * 2 + rng.randint(3, 9))) + for _ in range(background_count): + bg_system = rng.choice(background_systems) + bg_query, bg_answer, bg_ttl = _dns_tunnel_background_txt_record(rng) + bg_rtt = rng.uniform(min_rtt, max_rtt) + bg_dns = 
DnsContext( + query=bg_query, + query_type="TXT", + qtype=16, + rcode="NOERROR", + rcode_num=0, + answers=[bg_answer], + TTLs=[float(bg_ttl)], + trans_id=rng.randint(1, 65535), + AA=False, + RD=True, + RA=True, + rejected=False, + rtt=bg_rtt, + ) + bg_offset = rng.uniform(-240.0, background_window_sec + 240.0) + bg_time = start + timedelta(seconds=bg_offset) + self.activity_generator.generate_connection( + src_ip=bg_system.ip, + dst_ip=rng.choice(dns_server_ips), + time=bg_time, + dst_port=53, + proto="udp", + service="dns", + dns=bg_dns, + emit_dns=False, + resp_bytes=max(90, len(bg_query) + len(bg_answer) + rng.randint(35, 120)), + duration=bg_rtt, + source_system=bg_system, + ) + for tick_time in _iter_dns_tunnel_ticks( start, interval_sec, duration_sec, count, spec.jitter, rng ): self.state_manager.set_current_time(tick_time) - label_length = ( - rng.randint(max(20, spec.label_length - 10), spec.label_length) - if spec.label_length >= 20 - else spec.label_length - ) + if spec.label_length >= 24: + min_label_length = max(14, int(spec.label_length * 0.45)) + label_length = int( + rng.triangular(min_label_length, spec.label_length, spec.label_length - 4) + ) + elif spec.label_length >= 20: + label_length = rng.randint(max(16, spec.label_length - 8), spec.label_length) + else: + label_length = spec.label_length if spec.encoding == "hex": effective_bytes_per_label = label_length // 2 elif spec.encoding == "base32": diff --git a/tests/unit/test_bulk_events.py b/tests/unit/test_bulk_events.py index 8616ed4e..b90bd506 100644 --- a/tests/unit/test_bulk_events.py +++ b/tests/unit/test_bulk_events.py @@ -13,7 +13,9 @@ from evidenceforge.generation.engine.storyline import ( StorylineMixin, + _c2_http_response_size, _effective_rate_interval, + _is_c2_http_request, _iter_dns_tunnel_ticks, _iter_periodic_ticks, _observed_web_scan_status, @@ -302,6 +304,7 @@ def test_dns_tunnel_ticks_include_natural_gaps(self): assert len(ticks) < 451 assert max(intervals) > 8.0 + assert sum(interval < 3.0 for interval in intervals) < len(intervals) * 0.82 assert len({round(interval, 1) for interval in intervals}) > 20 def test_duration_shorter_than_interval(self): @@ -436,6 +439,76 @@ def test_service_backed_beacon_uses_installed_service_process(self, monkeypatch) assert connection_kwargs["pid"] == 4242 assert connection_kwargs["process_image"] == r"C:\Windows\System32\HealthMonitorSvc.exe" + def test_v2_status_beacon_gets_c2_http_texture(self, monkeypatch): + """Beacon activity should not render /v2/status as stable text/html page traffic.""" + from unittest.mock import Mock + + from evidenceforge.generation.engine import storyline + + start = datetime(2026, 4, 16, 16, 30, 0, tzinfo=UTC) + system = System(hostname="DC-01", ip="10.0.2.10", os="Windows Server 2019", type="server") + actor = User(username="SYSTEM", full_name="SYSTEM", email="system@example.com") + + engine = object.__new__(StorylineMixin) + engine.scenario = SimpleNamespace(environment=SimpleNamespace(systems=[system])) + engine.dispatcher = SimpleNamespace(visibility_engine=None) + engine.state_manager = Mock() + engine.activity_generator = Mock() + engine.activity_generator._ip_to_system = {system.ip: system} + engine.activity_generator._proxy_routes = {} + engine.activity_generator._proxy_mode = "transparent" + monkeypatch.setattr(storyline, "_iter_periodic_ticks", Mock(return_value=iter([start]))) + + spec = BeaconEventSpec( + dst_ip="45.33.32.30", + dst_port=443, + hostname="api.example.net", + service="http", + method="GET", + uri="/v2/status", + 
interval="10m", + count=1, + action="allow", + jitter=0.0, + ) + + engine._execute_typed_event( + spec=spec, + actor=actor, + system=system, + time=start, + activity="HTTPS beacon from DC-01", + explicit_types={"beacon"}, + ) + + http = engine.activity_generator.generate_connection.call_args.kwargs["http"] + assert http.response_body_len != 54_400 + assert http.resp_mime_types[0] in { + "application/json", + "text/plain", + "application/octet-stream", + } + + +class TestC2HttpTexture: + def test_v2_paths_are_c2_even_without_spec_description(self): + assert _is_c2_http_request( + description=None, + technique=None, + uri="/v2/status", + activity=None, + ) + + def test_c2_status_response_sizes_have_multiple_bands(self): + sizes = [ + _c2_http_response_size(random.Random(seed), method="GET", uri="/v2/status") + for seed in range(50) + ] + + assert min(sizes) < 2_000 + assert max(sizes) > 18_000 + assert len({size // 1000 for size in sizes}) > 10 + class TestEffectiveRateInterval: def test_count_based_rate_stays_exact(self): @@ -1160,6 +1233,55 @@ def capture_connection(**kwargs): assert len(label_lengths) > 1 assert len(label_depths) > 1 + def test_dns_tunnel_generation_adds_benign_txt_collisions_from_other_hosts(self): + engine = object.__new__(StorylineMixin) + captured = [] + + source = System(hostname="APP-01", ip="10.0.0.10", os="Ubuntu Server", type="server") + peers = [ + System(hostname="WS-01", ip="10.0.0.20", os="Windows 10", type="workstation"), + System(hostname="MAIL-01", ip="10.0.0.30", os="Ubuntu Server", type="server"), + ] + + def capture_connection(**kwargs): + captured.append(kwargs) + + engine.state_manager = SimpleNamespace(set_current_time=lambda _time: None) + engine.scenario = SimpleNamespace(environment=SimpleNamespace(systems=[source, *peers])) + engine.activity_generator = SimpleNamespace( + _dns_server_ips=["10.0.0.53"], + generate_connection=capture_connection, + ) + spec = DnsTunnelEventSpec( + base_domain="tunnel.example.test", + encoding="hex", + label_length=30, + payload_size=128, + interval="2s", + duration="1m", + ) + + engine._execute_typed_event( + spec=spec, + actor=User(username="attacker", full_name="Attacker", email="a@example.com"), + system=source, + time=datetime(2024, 1, 15, 10, 0, tzinfo=UTC), + activity="DNS exfiltration", + explicit_types={"dns_tunnel"}, + ) + + benign_txt = [ + item + for item in captured + if item["dns"].query_type == "TXT" + and not item["dns"].query.endswith("tunnel.example.test") + ] + benign_sources = {item["src_ip"] for item in benign_txt} + + assert len(benign_txt) >= 12 + assert benign_sources <= {peer.ip for peer in peers} + assert len(benign_sources) > 1 + def test_dns_tunnel_generation_skews_ttls_and_expands_answer_vocabulary(self): engine = object.__new__(StorylineMixin) captured = [] From 5994b26b74a5d8b183eeb71b7ba57b5e90b365cb Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 05:35:07 -0400 Subject: [PATCH 57/61] docs: record loop 28 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 9885465b..97a96644 100644 --- a/TODO.md +++ b/TODO.md @@ -400,7 +400,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 25 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `dd56f08`: introduced source-native persistent HTTP/1.1 transaction reuse for browser-like HTTP asset fetches and internal web sessions, suppressing duplicate connection-level rows for application-layer-only follow-on requests while preserving `zeek_http` rows with repeated UIDs and `trans_depth > 1`. Verification passed with focused regressions, related ActivityGenerator/baseline/Zeek/proxy tests (`284 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3151 passed, 37 skipped`). Regenerated eval passed at exact `95.99/100` across `79,163` records; hard probes found `277` HTTP rows with `trans_depth > 1`, `106` repeated HTTP UID groups, zero duplicate conn UID groups, and every reused HTTP UID backed by exactly one conn row. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `67`, Network `92`, Host/EDR `64` (average `75.25`). The old all-`trans_depth=1` tell is fixed, but Network found a new harder source-native HTTP/conn contradiction: some same-UID HTTP transactions occur after parent connection close, have non-monotonic `trans_depth`, or exceed parent conn byte counters. - [x] Loop 26 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `4f92a11`: repaired persistent Zeek HTTP transaction sequencing and parent `conn.log` accounting by aggregating parent flow bytes/duration across planned HTTP request groups, guarding reuse near parent close, constraining reuse to parent byte budgets, and preserving same-UID Zeek HTTP timestamp order after analyzer jitter. Verification passed with focused regressions, related ActivityGenerator/baseline/Zeek/proxy tests (`266 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3153 passed, 37 skipped`). Regenerated eval passed at exact `95.25/100` across `76,677` records; hard probes found `189` HTTP rows with `trans_depth > 1`, `82` repeated HTTP UID groups, zero duplicate conn UID groups, zero reused groups without exactly one parent conn, zero after-parent-close rows, zero non-monotonic `trans_depth` groups, and zero HTTP body sums exceeding parent conn byte counters. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `66`, Network `62`, Host/EDR `66` (average `69.0`). Top Loop 27 target is Linux syslog daemon-message statefulness because Host/EDR and Threat Hunter both flagged high-volume exact daemon phrase reuse across dissimilar hosts; next targets are DNS TXT tunnel grammar/cadence, public X.509 SAN diversity, C2/proxy HTTP shape variation, eCAR FLOW principal context, and Sysmon SYSTEM `LogonGuid` morphology. 
- [x] Loop 27 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `38e431d`: reduced cross-host exact Linux syslog daemon-message repetition by expanding/parameterizing high-volume daemon pools, lowering stale direct snapd/syslog boilerplate, and adding validator coverage for the former exact phrases. Verification passed with focused config validation tests, related baseline/syslog slices, `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3154 passed, 37 skipped`). Regenerated eval passed at exact `95.92/100` across `76,542` records; hard probes found zero old exact flagged daemon phrases, zero daemon messages repeated `25+` times, zero messages repeated `100+` times, and top exact repeats moved to SSH PAM session lifecycles. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `66`, Network `64`, Host/EDR `58` (average `67.5`). Top Loop 28 target is DNS TXT tunnel grammar and C2/proxy cadence because Threat Hunter and Detection both flagged tight, storyline-bound behavior; next targets are scenario-authored attack-name legibility (defer unless scenario edits are authorized), Linux `phpsessionclean`/`irqbalance` texture, X.509 SAN and AD SRV realism, eCAR FLOW principal context, and Sysmon SYSTEM `LogonGuid` morphology. - - [ ] **IN PROGRESS** Loop 28 fix pass — loosen DNS TXT tunnel and C2/proxy status behavior by adding source-native grammar variation, benign/background TXT collisions, mixed DNS/proxy outcomes, wider beacon jitter, and less stable proxy response-size texture without changing scenario-authored names. + - [x] Loop 28 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `350b0f5`: loosened DNS TXT tunnel and C2/proxy status behavior by recognizing `/v2/*` and beacon activity as C2 HTTP, varying C2 response body sizes per beacon tick, widening DNS tunnel pacing/label grammar, and adding benign TXT collisions from other hosts around the tunnel window without changing scenario-authored names. Verification passed with focused regressions, related DNS/storyline/proxy tests (`198 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3158 passed, 37 skipped`). Regenerated eval passed at exact `96.00/100` across `74,504` records; hard probes found `226` TXT records across `12` source IPs, `34` benign TXT records from `11` non-tunnel sources, westbridge TXT median gap `2.15s`/p95 `26.26s`, and `/v2/status` proxy response sizes ranging `699`-`67,971` bytes across `8` thousand-byte buckets. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `62`, Network `61`, Host/EDR `58` (average `64.75`). Top Loop 29 target is Linux syslog timer/daemon texture (`phpsessionclean`, `irqbalance`, and trust-anchor repetition); next targets are eCAR FLOW principal context, X.509 SAN/AD SRV realism, and scenario-authored name legibility if edits are later authorized. + - [ ] **IN PROGRESS** Loop 29 fix pass — repair Linux syslog timer and daemon texture by tying `phpsessionclean` eligibility/counts to host role/package state, reducing equal-count cross-host schedules, and replacing formulaic `irqbalance`/systemd-resolved trust-anchor messages with source-native ranges and stateful variation. 
- [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From e37a5f3392fe88d38ff5a91e3fdc7abe699aab1c Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 05:42:43 -0400 Subject: [PATCH 58/61] fix: vary linux syslog timer texture --- TODO.md | 3 +- .../eforge/references/config-host-activity.md | 8 +++ .../activity/extra_syslog_messages.yaml | 61 ++++++++++++++---- .../config/activity/systemd_schedules.yaml | 13 +++- src/evidenceforge/config/schemas.py | 6 ++ .../generation/engine/baseline.py | 63 +++++++++++++++++-- tests/unit/test_validate_config.py | 46 ++++++++++++++ 7 files changed, 183 insertions(+), 17 deletions(-) diff --git a/TODO.md b/TODO.md index 97a96644..17eea9e6 100644 --- a/TODO.md +++ b/TODO.md @@ -401,7 +401,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 26 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `4f92a11`: repaired persistent Zeek HTTP transaction sequencing and parent `conn.log` accounting by aggregating parent flow bytes/duration across planned HTTP request groups, guarding reuse near parent close, constraining reuse to parent byte budgets, and preserving same-UID Zeek HTTP timestamp order after analyzer jitter. Verification passed with focused regressions, related ActivityGenerator/baseline/Zeek/proxy tests (`266 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3153 passed, 37 skipped`). Regenerated eval passed at exact `95.25/100` across `76,677` records; hard probes found `189` HTTP rows with `trans_depth > 1`, `82` repeated HTTP UID groups, zero duplicate conn UID groups, zero reused groups without exactly one parent conn, zero after-parent-close rows, zero non-monotonic `trans_depth` groups, and zero HTTP body sums exceeding parent conn byte counters. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `66`, Network `62`, Host/EDR `66` (average `69.0`). Top Loop 27 target is Linux syslog daemon-message statefulness because Host/EDR and Threat Hunter both flagged high-volume exact daemon phrase reuse across dissimilar hosts; next targets are DNS TXT tunnel grammar/cadence, public X.509 SAN diversity, C2/proxy HTTP shape variation, eCAR FLOW principal context, and Sysmon SYSTEM `LogonGuid` morphology. 
- [x] Loop 27 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `38e431d`: reduced cross-host exact Linux syslog daemon-message repetition by expanding/parameterizing high-volume daemon pools, lowering stale direct snapd/syslog boilerplate, and adding validator coverage for the former exact phrases. Verification passed with focused config validation tests, related baseline/syslog slices, `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3154 passed, 37 skipped`). Regenerated eval passed at exact `95.92/100` across `76,542` records; hard probes found zero old exact flagged daemon phrases, zero daemon messages repeated `25+` times, zero messages repeated `100+` times, and top exact repeats moved to SSH PAM session lifecycles. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `66`, Network `64`, Host/EDR `58` (average `67.5`). Top Loop 28 target is DNS TXT tunnel grammar and C2/proxy cadence because Threat Hunter and Detection both flagged tight, storyline-bound behavior; next targets are scenario-authored attack-name legibility (defer unless scenario edits are authorized), Linux `phpsessionclean`/`irqbalance` texture, X.509 SAN and AD SRV realism, eCAR FLOW principal context, and Sysmon SYSTEM `LogonGuid` morphology. - [x] Loop 28 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `350b0f5`: loosened DNS TXT tunnel and C2/proxy status behavior by recognizing `/v2/*` and beacon activity as C2 HTTP, varying C2 response body sizes per beacon tick, widening DNS tunnel pacing/label grammar, and adding benign TXT collisions from other hosts around the tunnel window without changing scenario-authored names. Verification passed with focused regressions, related DNS/storyline/proxy tests (`198 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3158 passed, 37 skipped`). Regenerated eval passed at exact `96.00/100` across `74,504` records; hard probes found `226` TXT records across `12` source IPs, `34` benign TXT records from `11` non-tunnel sources, westbridge TXT median gap `2.15s`/p95 `26.26s`, and `/v2/status` proxy response sizes ranging `699`-`67,971` bytes across `8` thousand-byte buckets. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `62`, Network `61`, Host/EDR `58` (average `64.75`). Top Loop 29 target is Linux syslog timer/daemon texture (`phpsessionclean`, `irqbalance`, and trust-anchor repetition); next targets are eCAR FLOW principal context, X.509 SAN/AD SRV realism, and scenario-authored name legibility if edits are later authorized. - - [ ] **IN PROGRESS** Loop 29 fix pass — repair Linux syslog timer and daemon texture by tying `phpsessionclean` eligibility/counts to host role/package state, reducing equal-count cross-host schedules, and replacing formulaic `irqbalance`/systemd-resolved trust-anchor messages with source-native ranges and stateful variation. + - [x] Loop 29 fix pass — repaired Linux syslog timer and daemon texture by adding data-driven systemd schedule filters for role, excluded role, service/package state, per-host probability, slot skip probability, and slot jitter; scoped `phpsessionclean` to PHP-backed web hosts instead of generic web/proxy hosts; and replaced formulaic `irqbalance`/systemd-resolved trust-anchor messages with source-native bounded ranges. 
Verification passed with focused config/syslog tests (`44 passed`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3159 passed, 37 skipped`). + - [ ] **IN PROGRESS** Loop 29 regeneration, hard probes, quantitative eval, and blind review. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/commands/eforge/references/config-host-activity.md b/commands/eforge/references/config-host-activity.md index e4314509..19874870 100644 --- a/commands/eforge/references/config-host-activity.md +++ b/commands/eforge/references/config-host-activity.md @@ -112,6 +112,8 @@ schedules: jitter_minutes: 60 distro: all role: web_server + services_any: [nginx, apache2] + slot_skip_probability: 0.08 cron_user: root cron_commands: debian: "/usr/bin/certbot renew --quiet --deploy-hook 'systemctl reload nginx'" @@ -130,6 +132,12 @@ schedules: | `jitter_minutes` | int | yes | Max jitter offset (per-host deterministic) | | `distro` | string | yes | `all`, `debian`, or `rhel` | | `role` | string | no | Host role filter (e.g., `web_server`) | +| `roles` | list[string] | no | Host role filter where any role may match | +| `exclude_roles` | list[string] | no | Host roles that suppress this schedule | +| `services_any` | list[string] | no | Required host service/package signals where any service may match | +| `host_probability` | float | no | Deterministic per-host enable probability between `0.0` and `1.0` | +| `slot_skip_probability` | float | no | Deterministic per-slot skip probability for frequent timers | +| `slot_jitter_seconds` | int | no | Extra runtime jitter for frequent timer slots | | `process_path` | string | no | Path to service binary for process create events | **Systemd timer additional fields:** diff --git a/src/evidenceforge/config/activity/extra_syslog_messages.yaml b/src/evidenceforge/config/activity/extra_syslog_messages.yaml index 9225a8c5..75e45d3b 100644 --- a/src/evidenceforge/config/activity/extra_syslog_messages.yaml +++ b/src/evidenceforge/config/activity/extra_syslog_messages.yaml @@ -204,15 +204,27 @@ programs: - UDP - TCP - UDP+EDNS0 - trust_anchor: - - "20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d" - - "38696 8 2 683d2d0acb8c9b712a1948b27f741219298d0a450d612c483af444a4c0fb2b16" - - "20326 8 1 2f6d2c6bdab2a2d15eb06fcddcb0b195" + degraded_feature_set: + - UDP + - TCP + scope: + - global + - eth0 + - ens160 + - br0 + cache_bucket: + - positive + - negative + - stale + packet_size: + - "512" + - "1232" + - "1472" messages: - - "Using degraded feature set {feature_set} instead of UDP+EDNS0 for DNS server {dns_server} after transaction {0}." 
- - "Grace period over, resuming full feature set {feature_set} for DNS server {dns_server} after probe {0}." - - "Positive Trust Anchors: . IN DS {trust_anchor}" - - "Cache miss for transaction {}, retrying DNS server {dns_server} with {feature_set}." + - "Using degraded feature set {degraded_feature_set} instead of UDP+EDNS0 for DNS server {dns_server} after transaction {0}." + - "Grace period over, resuming full feature set UDP+EDNS0 for DNS server {dns_server} after probe {0}." + - "Flushed {cache_bucket} cache scope {scope} after DNS server {dns_server} changed features." + - "Transaction {0} switched to {feature_set} with advertised UDP packet size {packet_size}." - app: thermald system_types: [workstation] @@ -240,7 +252,34 @@ programs: numa_node: - "0" - "1" + irq: + - "16" + - "24" + - "32" + - "45" + - "64" + - "86" + - "122" + - "137" + - "154" + - "181" + device: + - ens160 + - nvme0q1 + - ahci + - virtio0-input + - mlx5_comp0 + affinity_mask: + - "00000001" + - "00000002" + - "00000004" + - "00000008" + moved: + - "0" + - "1" + - "2" messages: - - "Balancing is ineffective: IRQs are pinned by affinity policy on CPU {cpu} during sample {0}" - - "IRQ {} affinity hint keeps vector on CPU {cpu}" - - "NUMA node {numa_node}: no movable IRQs found during rebalance pass {0}" + - "IRQ {irq} affinity hint keeps vector on CPU {cpu} ({device})" + - "IRQ {irq} classified for CPU {cpu} balancing on {device}" + - "Skipping IRQ {irq}: banned by affinity mask {affinity_mask} ({device})" + - "NUMA node {numa_node}: balancing pass complete, {moved} IRQs moved" diff --git a/src/evidenceforge/config/activity/systemd_schedules.yaml b/src/evidenceforge/config/activity/systemd_schedules.yaml index 6a5e2921..cd222af7 100644 --- a/src/evidenceforge/config/activity/systemd_schedules.yaml +++ b/src/evidenceforge/config/activity/systemd_schedules.yaml @@ -12,6 +12,12 @@ # jitter_minutes - Max jitter offset in minutes (per-host deterministic) # distro - "all", "debian", or "rhel" # role - Optional role filter (e.g., "web_server") +# roles - Optional role list filter (any match) +# exclude_roles - Optional role list filter to suppress a schedule +# services_any - Optional service/package signal list (any match) +# host_probability - Optional deterministic per-host enable probability +# slot_skip_probability - Optional per-slot skip probability for frequent timers +# slot_jitter_seconds - Optional per-slot extra runtime jitter in seconds # process_path - Path to the service binary (for process create events) # # For systemd_timer type: @@ -133,7 +139,12 @@ schedules: typical_hour: 0 jitter_minutes: 5 distro: debian - role: web_server + roles: [web_server] + exclude_roles: [forward_proxy] + services_any: [php-fpm] + host_probability: 0.98 + slot_skip_probability: 0.16 + slot_jitter_seconds: 180 process_path: "/usr/lib/php/sessionclean" start_message: "Starting phpsessionclean.service - Clean PHP session files." finish_message: "Finished phpsessionclean.service - Clean PHP session files." 
diff --git a/src/evidenceforge/config/schemas.py b/src/evidenceforge/config/schemas.py index 250350bd..c42e47b0 100644 --- a/src/evidenceforge/config/schemas.py +++ b/src/evidenceforge/config/schemas.py @@ -203,6 +203,12 @@ class SystemdScheduleEntry(BaseModel, extra="forbid"): typical_day: str | None = None # Optional role filter role: str | None = None + roles: list[str] | None = None + exclude_roles: list[str] | None = None + services_any: list[str] | None = None + host_probability: float | None = Field(default=None, ge=0.0, le=1.0) + slot_skip_probability: float | None = Field(default=None, ge=0.0, le=1.0) + slot_jitter_seconds: int | None = Field(default=None, ge=0, le=1800) # Optional fields for cron type cron_user: str | None = None cron_commands: dict[str, str] | None = None diff --git a/src/evidenceforge/generation/engine/baseline.py b/src/evidenceforge/generation/engine/baseline.py index f70e9e38..65a4015a 100644 --- a/src/evidenceforge/generation/engine/baseline.py +++ b/src/evidenceforge/generation/engine/baseline.py @@ -703,6 +703,51 @@ def _load_systemd_schedules() -> list[dict[str, Any]]: return _CACHED_SCHEDULES +def _deterministic_probability_enabled(key: str, probability: float | None) -> bool: + """Return whether a stable per-key probability gate is enabled.""" + if probability is None: + return True + clamped = max(0.0, min(1.0, float(probability))) + if clamped <= 0.0: + return False + if clamped >= 1.0: + return True + return (_stable_seed(key) % 10_000) / 10_000.0 < clamped + + +def _schedule_applies_to_system(sched: dict[str, Any], system: Any, has_web_role: bool) -> bool: + """Return whether a Linux schedule matches host role and service/package state.""" + roles = {str(role).lower() for role in (getattr(system, "roles", []) or [])} + services = {str(service).lower() for service in (getattr(system, "services", []) or [])} + + legacy_role = sched.get("role") + if legacy_role: + role = str(legacy_role).lower() + if role == "web_server": + if role not in roles and not has_web_role: + return False + elif role not in roles: + return False + + required_roles = {str(role).lower() for role in (sched.get("roles") or [])} + if required_roles and not roles.intersection(required_roles): + return False + + excluded_roles = {str(role).lower() for role in (sched.get("exclude_roles") or [])} + if excluded_roles and roles.intersection(excluded_roles): + return False + + required_services = {str(service).lower() for service in (sched.get("services_any") or [])} + if required_services and not services.intersection(required_services): + return False + + service = sched.get("service", "") + return _deterministic_probability_enabled( + f"sched_host_enabled:{getattr(system, 'hostname', '')}:{service}", + sched.get("host_probability"), + ) + + def _machine_account_tgs_gap_ms(rng: random.Random, *, first: bool) -> int: """Return a realistic gap before machine-account service-ticket requests.""" if first: @@ -1070,9 +1115,8 @@ def _generate_scheduled_tasks( if distro == "rhel" and not is_rhel_like: continue - # Filter by role - role = sched.get("role") - if role == "web_server" and not has_web_role: + # Filter by role and service/package signals + if not _schedule_applies_to_system(sched, system, has_web_role): continue service = sched["service"] @@ -1113,7 +1157,18 @@ def _generate_scheduled_tasks( if frequency == "30min": # Generate two events per hour for fm in (fire_minute_1, fire_minute_2): - ts = current_hour + timedelta(minutes=fm, seconds=rng.uniform(0, 30)) + slot_key = ( + 
f"sched_slot:{system.hostname}:{service}:{current_hour.isoformat()}:{fm}" + ) + skip_probability = sched.get("slot_skip_probability") + if skip_probability is not None and not _deterministic_probability_enabled( + slot_key, 1.0 - float(skip_probability) + ): + continue + jitter_seconds = max(30.0, float(sched.get("slot_jitter_seconds") or 30)) + ts = current_hour + timedelta( + minutes=fm, seconds=rng.uniform(0, jitter_seconds) + ) self._emit_scheduled_event(sched, system, ts, rng, sys_pids, is_rhel_like) else: ts = current_hour + timedelta(minutes=fire_minute, seconds=rng.uniform(0, 59)) diff --git a/tests/unit/test_validate_config.py b/tests/unit/test_validate_config.py index 7c63ec0b..95c252da 100644 --- a/tests/unit/test_validate_config.py +++ b/tests/unit/test_validate_config.py @@ -1178,6 +1178,12 @@ def test_extra_syslog_high_volume_daemons_avoid_exact_boilerplate(self): messages = entry.get("messages", []) assert not old_exact_messages.intersection(messages) assert any("{" in message for message in messages) + if app == "systemd-resolved": + assert "trust_anchor" not in (entry.get("params") or {}) + assert all("Positive Trust Anchors" not in message for message in messages) + if app == "irqbalance": + assert all("{}" not in message and "{0}" not in message for message in messages) + assert all("from CPU" not in message for message in messages) for message in messages: rendered = render_extra_syslog_message( {**entry, "messages": [message]}, @@ -1188,9 +1194,49 @@ def test_extra_syslog_high_volume_daemons_avoid_exact_boilerplate(self): ) assert "{" not in rendered assert "}" not in rendered + if app == "systemd-resolved": + assert "UDP+EDNS0 instead of UDP+EDNS0" not in rendered assert checked_apps == high_volume_apps + def test_systemd_schedule_filters_by_role_and_service_state(self): + from types import SimpleNamespace + + from evidenceforge.generation.engine.baseline import _schedule_applies_to_system + + sched = { + "service": "phpsessionclean", + "roles": ["web_server"], + "exclude_roles": ["forward_proxy"], + "services_any": ["php-fpm"], + "host_probability": 1.0, + } + + php_web = SimpleNamespace( + hostname="WEB-EXT-01", + roles=["web_server"], + services=["apache2", "php-fpm"], + ) + nginx_only = SimpleNamespace( + hostname="APP-INT-01", + roles=["web_server"], + services=["nginx", "systemd"], + ) + proxy = SimpleNamespace( + hostname="PROXY-01", + roles=["forward_proxy"], + services=["squid", "php-fpm"], + ) + + assert _schedule_applies_to_system(sched, php_web, has_web_role=True) + assert not _schedule_applies_to_system(sched, nginx_only, has_web_role=True) + assert not _schedule_applies_to_system(sched, proxy, has_web_role=True) + assert not _schedule_applies_to_system( + {**sched, "host_probability": 0.0}, + php_web, + has_web_role=True, + ) + def test_validate_config_rejects_invalid_4672_emission_probability(self, monkeypatch): from evidenceforge.generation.activity import windows_auth_realism From 73e123e87d6fece16f63e6a5c84e012910fe3bed Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 06:03:47 -0400 Subject: [PATCH 59/61] docs: record loop 29 assessment results --- TODO.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 17eea9e6..afbe5dc8 100644 --- a/TODO.md +++ b/TODO.md @@ -402,7 +402,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 27 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `38e431d`: reduced cross-host exact Linux syslog daemon-message repetition by expanding/parameterizing high-volume daemon pools, lowering stale direct snapd/syslog boilerplate, and adding validator coverage for the former exact phrases. Verification passed with focused config validation tests, related baseline/syslog slices, `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3154 passed, 37 skipped`). Regenerated eval passed at exact `95.92/100` across `76,542` records; hard probes found zero old exact flagged daemon phrases, zero daemon messages repeated `25+` times, zero messages repeated `100+` times, and top exact repeats moved to SSH PAM session lifecycles. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `66`, Network `64`, Host/EDR `58` (average `67.5`). Top Loop 28 target is DNS TXT tunnel grammar and C2/proxy cadence because Threat Hunter and Detection both flagged tight, storyline-bound behavior; next targets are scenario-authored attack-name legibility (defer unless scenario edits are authorized), Linux `phpsessionclean`/`irqbalance` texture, X.509 SAN and AD SRV realism, eCAR FLOW principal context, and Sysmon SYSTEM `LogonGuid` morphology. - [x] Loop 28 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `350b0f5`: loosened DNS TXT tunnel and C2/proxy status behavior by recognizing `/v2/*` and beacon activity as C2 HTTP, varying C2 response body sizes per beacon tick, widening DNS tunnel pacing/label grammar, and adding benign TXT collisions from other hosts around the tunnel window without changing scenario-authored names. Verification passed with focused regressions, related DNS/storyline/proxy tests (`198 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3158 passed, 37 skipped`). Regenerated eval passed at exact `96.00/100` across `74,504` records; hard probes found `226` TXT records across `12` source IPs, `34` benign TXT records from `11` non-tunnel sources, westbridge TXT median gap `2.15s`/p95 `26.26s`, and `/v2/status` proxy response sizes ranging `699`-`67,971` bytes across `8` thousand-byte buckets. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `62`, Network `61`, Host/EDR `58` (average `64.75`). Top Loop 29 target is Linux syslog timer/daemon texture (`phpsessionclean`, `irqbalance`, and trust-anchor repetition); next targets are eCAR FLOW principal context, X.509 SAN/AD SRV realism, and scenario-authored name legibility if edits are later authorized. 
- [x] Loop 29 fix pass — repaired Linux syslog timer and daemon texture by adding data-driven systemd schedule filters for role, excluded role, service/package state, per-host probability, slot skip probability, and slot jitter; scoped `phpsessionclean` to PHP-backed web hosts instead of generic web/proxy hosts; and replaced formulaic `irqbalance`/systemd-resolved trust-anchor messages with source-native bounded ranges. Verification passed with focused config/syslog tests (`44 passed`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3159 passed, 37 skipped`). - - [ ] **IN PROGRESS** Loop 29 regeneration, hard probes, quantitative eval, and blind review. + - [x] Loop 29 regeneration, hard probes, quantitative eval, deliberation, and blind review completed from commit `e37a5f3`: regenerated eval passed at exact `95.99/100` across `76,333` records; hard probes verified `phpsessionclean` only on `WEB-EXT-01`, zero non-PHP `phpsessionclean` hosts, zero `Positive Trust Anchors` lines, zero old `irqbalance` sample/pass lines, zero large IRQ counters, zero self-move IRQ lines, and zero `systemd-resolved` self-contradictions. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `64`, Network `60`, Host/EDR `39` (Host/EDR verdict Real at confidence `61`; average `61.25`). Top Loop 30 generator-owned target is eCAR FLOW principal attribution because Detection found dataset-wide blank FLOW principals even when process/user context exists; next targets are TLS/X.509 SAN diversity plus AD SRV responses, DB bash/eCAR command timing, and additional DNS/C2 texture. Scenario-authored name legibility remains the highest broad tell but is deferred unless scenario edits are authorized. + - [ ] **IN PROGRESS** Loop 30 fix pass — add realistic mixed eCAR FLOW principal attribution so user-owned process flows can carry process/user identity, kernel/system flows remain source-native, and some records still model vendor collection gaps. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. From bc3772d5e500bc2f9ccec7246f783e00ec3bd34e Mon Sep 17 00:00:00 2001 From: "David J. 
Bianco" Date: Sat, 16 May 2026 06:09:10 -0400 Subject: [PATCH 60/61] fix: mix ecar flow principal attribution --- TODO.md | 3 +- .../eforge/references/config-host-activity.md | 12 +- src/evidenceforge/cli/validate_config.py | 6 +- src/evidenceforge/config/activity/README.md | 2 +- .../config/activity/endpoint_noise.yaml | 6 + src/evidenceforge/config/schemas.py | 10 ++ .../generation/activity/endpoint_noise.py | 5 + src/evidenceforge/generation/emitters/ecar.py | 82 ++++++++++ tests/unit/test_ecar_spec_compliance.py | 146 ++++++++++++++++++ tests/unit/test_validate_config.py | 6 + 10 files changed, 273 insertions(+), 5 deletions(-) diff --git a/TODO.md b/TODO.md index afbe5dc8..51d220e9 100644 --- a/TODO.md +++ b/TODO.md @@ -403,7 +403,8 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 28 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `350b0f5`: loosened DNS TXT tunnel and C2/proxy status behavior by recognizing `/v2/*` and beacon activity as C2 HTTP, varying C2 response body sizes per beacon tick, widening DNS tunnel pacing/label grammar, and adding benign TXT collisions from other hosts around the tunnel window without changing scenario-authored names. Verification passed with focused regressions, related DNS/storyline/proxy tests (`198 passed, 1 skipped`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3158 passed, 37 skipped`). Regenerated eval passed at exact `96.00/100` across `74,504` records; hard probes found `226` TXT records across `12` source IPs, `34` benign TXT records from `11` non-tunnel sources, westbridge TXT median gap `2.15s`/p95 `26.26s`, and `/v2/status` proxy response sizes ranging `699`-`67,971` bytes across `8` thousand-byte buckets. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `62`, Network `61`, Host/EDR `58` (average `64.75`). Top Loop 29 target is Linux syslog timer/daemon texture (`phpsessionclean`, `irqbalance`, and trust-anchor repetition); next targets are eCAR FLOW principal context, X.509 SAN/AD SRV realism, and scenario-authored name legibility if edits are later authorized. - [x] Loop 29 fix pass — repaired Linux syslog timer and daemon texture by adding data-driven systemd schedule filters for role, excluded role, service/package state, per-host probability, slot skip probability, and slot jitter; scoped `phpsessionclean` to PHP-backed web hosts instead of generic web/proxy hosts; and replaced formulaic `irqbalance`/systemd-resolved trust-anchor messages with source-native bounded ranges. Verification passed with focused config/syslog tests (`44 passed`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3159 passed, 37 skipped`). - [x] Loop 29 regeneration, hard probes, quantitative eval, deliberation, and blind review completed from commit `e37a5f3`: regenerated eval passed at exact `95.99/100` across `76,333` records; hard probes verified `phpsessionclean` only on `WEB-EXT-01`, zero non-PHP `phpsessionclean` hosts, zero `Positive Trust Anchors` lines, zero old `irqbalance` sample/pass lines, zero large IRQ counters, zero self-move IRQ lines, and zero `systemd-resolved` self-contradictions. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `64`, Network `60`, Host/EDR `39` (Host/EDR verdict Real at confidence `61`; average `61.25`). 
Top Loop 30 generator-owned target is eCAR FLOW principal attribution because Detection found dataset-wide blank FLOW principals even when process/user context exists; next targets are TLS/X.509 SAN diversity plus AD SRV responses, DB bash/eCAR command timing, and additional DNS/C2 texture. Scenario-authored name legibility remains the highest broad tell but is deferred unless scenario edits are authorized. - - [ ] **IN PROGRESS** Loop 30 fix pass — add realistic mixed eCAR FLOW principal attribution so user-owned process flows can carry process/user identity, kernel/system flows remain source-native, and some records still model vendor collection gaps. + - [x] Loop 30 fix pass — added data-driven mixed eCAR FLOW principal attribution so user-owned process flows usually carry process/user identity, service/root flows carry it less often, inbound listener flows can occasionally expose local service principals, and unknown/rejected flows remain source-native gaps. Verification passed with focused eCAR/config tests (`93 passed`), broader eCAR tests (`88 passed`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3162 passed, 37 skipped`). + - [ ] **IN PROGRESS** Loop 30 regeneration, hard probes, quantitative eval, and blind review. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`. diff --git a/commands/eforge/references/config-host-activity.md b/commands/eforge/references/config-host-activity.md index 19874870..34dff836 100644 --- a/commands/eforge/references/config-host-activity.md +++ b/commands/eforge/references/config-host-activity.md @@ -330,7 +330,7 @@ scheduled_stale_credentials: ## Endpoint Noise (`endpoint_noise.yaml`) -Controls endpoint background timing and registry-emission policies that are too source-specific for scenario YAML. Use it to tune routine Windows scheduled-process spacing and whether DHCP interface registry values appear as ambient Sysmon/EDR noise. +Controls endpoint background timing, registry-emission, and EDR attribution policies that are too source-specific for scenario YAML. Use it to tune routine Windows scheduled-process spacing, whether DHCP interface registry values appear as ambient Sysmon/EDR noise, and how often eCAR FLOW rows expose process/user principal context. 
```yaml windows_scheduled_processes: @@ -351,11 +351,19 @@ registry_noise: emit_on_lease_events: true suppress_system_types: [server, domain_controller] suppress_roles: [domain_controller, dns_server, file_server, web_server] + +ecar_flow_identity: + user_process_probability: 0.88 + service_process_probability: 0.48 + root_process_probability: 0.42 + inbound_listener_probability: 0.36 ``` `windows_scheduled_processes` replaces hour-end clamping with profile-driven trigger windows, per-host phase offsets, jitter, and skips. Keep `trigger_window_end_seconds` comfortably below 3599 to avoid synthetic `xx:59:59` clusters. -`registry_noise.dhcp_interface_values` reserves DHCP interface registry writes for actual DHCP lease/reconfigure activity. Static infrastructure roles should stay in `suppress_system_types` or `suppress_roles` so they do not repeatedly rewrite DHCP values as ambient registry noise. Run `eforge validate-config` after overlay changes; it rejects inverted ranges, empty value-name lists, and invalid probabilities. +`registry_noise.dhcp_interface_values` reserves DHCP interface registry writes for actual DHCP lease/reconfigure activity. Static infrastructure roles should stay in `suppress_system_types` or `suppress_roles` so they do not repeatedly rewrite DHCP values as ambient registry noise. + +`ecar_flow_identity` controls mixed FLOW principal attribution. User-owned process flows usually carry `principal`, service/root flows carry it less often, inbound listener flows carry it occasionally, and unknown or rejected flows remain unattributed. Run `eforge validate-config` after overlay changes; it rejects inverted ranges, empty value-name lists, and invalid probabilities. --- diff --git a/src/evidenceforge/cli/validate_config.py b/src/evidenceforge/cli/validate_config.py index 30d4eecb..1f153c80 100644 --- a/src/evidenceforge/cli/validate_config.py +++ b/src/evidenceforge/cli/validate_config.py @@ -234,7 +234,11 @@ def validate_config() -> ValidationResult: }, }, "activity/endpoint_noise.yaml": { - "dict_fields": {"windows_scheduled_processes", "registry_noise"}, + "dict_fields": { + "windows_scheduled_processes", + "registry_noise", + "ecar_flow_identity", + }, }, "activity/host_activity_profiles.yaml": { "dict_fields": { diff --git a/src/evidenceforge/config/activity/README.md b/src/evidenceforge/config/activity/README.md index 6c3d3762..657674e9 100644 --- a/src/evidenceforge/config/activity/README.md +++ b/src/evidenceforge/config/activity/README.md @@ -22,7 +22,7 @@ caches data after first load. Two files (`network_params.yaml`, | `kerberos_realism.yaml` | `kerberos_realism.py` | Kerberos 4768 TGT PreAuthType, TicketOptions, encryption, and PKINIT certificate field distributions with overlay support. | | `windows_auth_realism.yaml` | `windows_auth_realism.py` | Windows Security authentication realism knobs such as minimum 4800→4801 lock/unlock gap, failed-logon validation paths, companion network evidence, and 4672 privilege profiles. | | `auth_noise.yaml` | `auth_noise.py` | Baseline authentication-noise profiles such as stale scheduled-credential account pools and irregular recurrence timing. | -| `endpoint_noise.yaml` | `endpoint_noise.py` | Endpoint background timing and registry-emission policies for Windows scheduled processes and DHCP interface registry writes. 
| +| `endpoint_noise.yaml` | `endpoint_noise.py` | Endpoint background timing, registry-emission, and EDR attribution policies for Windows scheduled processes, DHCP interface registry writes, and eCAR FLOW principal context. | | `host_activity_profiles.yaml` | `host_activity_profiles.py` | Coarse host/persona/role rate multipliers for baseline volume, endpoint noise, firewall deny bursts, and data-driven artifact variation. | | `observation_profiles.yaml` | `config/observation_profiles.py` | Named source-observation profiles for optional source-level missingness and delays. Scenario `observation_profile` defaults to `complete`; generation records status in `OBSERVATION_MANIFEST.json` for eval. | | `proxy_uri_templates.yaml` | `proxy_uri.py` | Per-domain URI path templates for proxy logs (Windows Update, CRL, OCSP, Azure AD, etc.). | diff --git a/src/evidenceforge/config/activity/endpoint_noise.yaml b/src/evidenceforge/config/activity/endpoint_noise.yaml index 204d9142..32a987f5 100644 --- a/src/evidenceforge/config/activity/endpoint_noise.yaml +++ b/src/evidenceforge/config/activity/endpoint_noise.yaml @@ -35,3 +35,9 @@ registry_noise: - forward_proxy - app_server - database + +ecar_flow_identity: + user_process_probability: 0.88 + service_process_probability: 0.48 + root_process_probability: 0.42 + inbound_listener_probability: 0.36 diff --git a/src/evidenceforge/config/schemas.py b/src/evidenceforge/config/schemas.py index c42e47b0..0cb17905 100644 --- a/src/evidenceforge/config/schemas.py +++ b/src/evidenceforge/config/schemas.py @@ -1259,11 +1259,21 @@ class RegistryNoiseConfig(BaseModel, extra="forbid"): dhcp_interface_values: DhcpInterfaceRegistryNoiseConfig +class EcarFlowIdentityConfig(BaseModel, extra="forbid"): + """eCAR FLOW principal-attribution probability policy.""" + + user_process_probability: float = Field(ge=0.0, le=1.0) + service_process_probability: float = Field(ge=0.0, le=1.0) + root_process_probability: float = Field(ge=0.0, le=1.0) + inbound_listener_probability: float = Field(ge=0.0, le=1.0) + + class EndpointNoiseConfig(BaseModel, extra="forbid"): """Root schema for endpoint_noise.yaml.""" windows_scheduled_processes: WindowsScheduledProcessNoiseConfig registry_noise: RegistryNoiseConfig + ecar_flow_identity: EcarFlowIdentityConfig # --- Observation Profiles --- diff --git a/src/evidenceforge/generation/activity/endpoint_noise.py b/src/evidenceforge/generation/activity/endpoint_noise.py index 2bcff1d5..b7df8bf5 100644 --- a/src/evidenceforge/generation/activity/endpoint_noise.py +++ b/src/evidenceforge/generation/activity/endpoint_noise.py @@ -47,3 +47,8 @@ def windows_scheduled_process_config() -> dict[str, Any]: def registry_noise_config() -> dict[str, Any]: """Return ambient endpoint registry-noise policy.""" return load_endpoint_noise().get("registry_noise", {}) + + +def ecar_flow_identity_config() -> dict[str, Any]: + """Return eCAR FLOW process/principal attribution policy.""" + return load_endpoint_noise().get("ecar_flow_identity", {}) diff --git a/src/evidenceforge/generation/emitters/ecar.py b/src/evidenceforge/generation/emitters/ecar.py index e79fcd9d..404b7bfb 100644 --- a/src/evidenceforge/generation/emitters/ecar.py +++ b/src/evidenceforge/generation/emitters/ecar.py @@ -29,6 +29,7 @@ from evidenceforge.events.base import SecurityEvent from evidenceforge.events.contexts import HostContext +from evidenceforge.generation.activity.endpoint_noise import ecar_flow_identity_config from evidenceforge.generation.activity.timing_profiles import 
sample_timing_delta from evidenceforge.generation.emitters.host_base import HostMultiplexEmitter from evidenceforge.utils.rng import _stable_seed @@ -74,6 +75,23 @@ "%%2313": "bad_password", } +_SERVICE_PRINCIPAL_NAMES = { + "system", + "local service", + "network service", + "nt authority\\system", + "nt authority\\local service", + "nt authority\\network service", + "apache", + "mysql", + "nginx", + "postgres", + "postfix", + "squid", + "sshd", + "www-data", +} + def _ecar_sort_key(line: str) -> tuple[int, int, str]: """Extract timestamp_ms for chronological per-host eCAR output sorting.""" @@ -96,6 +114,29 @@ def _ecar_failed_logon_reason(auth: Any, os_category: str) -> str: return _ECAR_FAILURE_REASON_BY_WINDOWS_CODE.get(reason, "authentication_failure") +def _ecar_probability_enabled(key: str, probability: float) -> bool: + """Return whether a stable per-record probability gate is enabled.""" + clamped = max(0.0, min(1.0, float(probability))) + if clamped <= 0.0: + return False + if clamped >= 1.0: + return True + return (_stable_seed(key) % 10_000) / 10_000.0 < clamped + + +def _flow_principal_probability(username: str, direction: str) -> float: + """Return the configured probability for FLOW principal attribution.""" + cfg = ecar_flow_identity_config() + normalized = username.strip().lower() + if direction == "INBOUND": + return float(cfg.get("inbound_listener_probability", 0.36)) + if normalized == "root": + return float(cfg.get("root_process_probability", 0.42)) + if normalized in _SERVICE_PRINCIPAL_NAMES: + return float(cfg.get("service_process_probability", 0.48)) + return float(cfg.get("user_process_probability", 0.88)) + + class EcarEmitter(HostMultiplexEmitter): """Emitter for eCAR (extended Cyber Analytics Repository) format. @@ -495,6 +536,14 @@ def _render_connection(self, event: SecurityEvent) -> None: "protocol": net.protocol, "_host_fqdn": self._host_fqdn(event.src_host), } + principal = self._flow_principal_for_process( + event, + event.src_host, + source_proc, + "OUTBOUND", + ) + if principal: + event_data["principal"] = principal self._apply_edr_context(event_data, event) self.emit_event(event_data) @@ -536,9 +585,42 @@ def _render_connection(self, event: SecurityEvent) -> None: if not listener_observed: event_data["outcome"] = "failure" event_data["connection_state"] = net.conn_state + else: + inbound_proc = self._lookup_running_process(event.dst_host, inbound_pid) + principal = self._flow_principal_for_process( + event, + event.dst_host, + inbound_proc, + "INBOUND", + ) + if principal: + event_data["principal"] = principal # INBOUND flow gets its own objectID (separate telemetry observation) self.emit_event(event_data) + def _flow_principal_for_process( + self, + event: SecurityEvent, + host: HostContext | None, + process: Any | None, + direction: str, + ) -> str: + """Return a source-native mixed FLOW principal attribution value.""" + if host is None or process is None: + return "" + username = str(getattr(process, "username", "") or "").strip() + if not username or username == "-": + return "" + pid = int(getattr(process, "pid", -1) or -1) + net = event.network + probability = _flow_principal_probability(username, direction) + key = ( + f"ecar_flow_principal:{direction}:{host.hostname}:{pid}:" + f"{net.src_ip}:{net.src_port}:{net.dst_ip}:{net.dst_port}:" + f"{int(event.timestamp.timestamp() * 1000)}" + ) + return username if _ecar_probability_enabled(key, probability) else "" + def _lookup_running_process(self, host: HostContext, pid: int) -> Any | None: """Read 
a process from attached state when a connection only carries a PID.""" state_manager = getattr(self, "_state_manager", None) diff --git a/tests/unit/test_ecar_spec_compliance.py b/tests/unit/test_ecar_spec_compliance.py index 88808d23..a00dfbbe 100644 --- a/tests/unit/test_ecar_spec_compliance.py +++ b/tests/unit/test_ecar_spec_compliance.py @@ -636,6 +636,100 @@ def test_actor_linked_flow_renders_after_process_create(self, emitter, monkeypat assert emitted[0]["timestamp"] > emitter._process_create_timestamp(event, process) + def test_outbound_flow_can_render_user_principal(self, emitter, monkeypatch, ts): + """User-owned FLOW records should be able to carry mixed principal attribution.""" + monkeypatch.setattr( + "evidenceforge.generation.emitters.ecar.ecar_flow_identity_config", + lambda: { + "user_process_probability": 1.0, + "service_process_probability": 0.0, + "root_process_probability": 0.0, + "inbound_listener_probability": 0.0, + }, + ) + emitted: list[dict] = [] + monkeypatch.setattr(emitter, "emit_event", emitted.append) + event = SecurityEvent( + timestamp=ts, + event_type="connection", + src_host=HostContext( + hostname="ws01", + ip="10.0.0.10", + os="Windows 11", + os_category="windows", + system_type="workstation", + fqdn="ws01.example.org", + ), + process=ProcessContext( + pid=1234, + parent_pid=777, + image=r"C:\Program Files\Mozilla Firefox\firefox.exe", + command_line="firefox.exe", + username="alice", + start_time=ts, + ), + network=NetworkContext( + src_ip="10.0.0.10", + src_port=49152, + dst_ip="93.184.216.34", + dst_port=443, + protocol="tcp", + initiating_pid=1234, + ), + ) + + emitter._render_connection(event) + + assert emitted[0]["object"] == "FLOW" + assert emitted[0]["direction"] == "OUTBOUND" + assert emitted[0]["principal"] == "alice" + + def test_service_flow_can_omit_principal(self, emitter, monkeypatch, ts): + """Service-owned FLOW records should still model vendor attribution gaps.""" + monkeypatch.setattr( + "evidenceforge.generation.emitters.ecar.ecar_flow_identity_config", + lambda: { + "user_process_probability": 1.0, + "service_process_probability": 0.0, + "root_process_probability": 0.0, + "inbound_listener_probability": 0.0, + }, + ) + emitted: list[dict] = [] + monkeypatch.setattr(emitter, "emit_event", emitted.append) + event = SecurityEvent( + timestamp=ts, + event_type="connection", + src_host=HostContext( + hostname="dc01", + ip="10.0.0.10", + os="Windows Server 2022", + os_category="windows", + system_type="domain_controller", + fqdn="dc01.example.org", + ), + process=ProcessContext( + pid=444, + parent_pid=4, + image=r"C:\Windows\System32\svchost.exe", + command_line="svchost.exe -k netsvcs", + username="SYSTEM", + start_time=ts, + ), + network=NetworkContext( + src_ip="10.0.0.10", + src_port=49153, + dst_ip="10.0.0.20", + dst_port=88, + protocol="tcp", + initiating_pid=444, + ), + ) + + emitter._render_connection(event) + + assert "principal" not in emitted[0] + def test_inbound_flow_uses_destination_listener_pid(self, emitter, monkeypatch, ts): """Inbound host observations should use the local listener PID when known.""" emitted: list[dict] = [] @@ -668,6 +762,58 @@ def test_inbound_flow_uses_destination_listener_pid(self, emitter, monkeypatch, assert emitted[0]["direction"] == "INBOUND" assert emitted[0]["pid"] == 24118 + def test_inbound_listener_flow_can_render_principal(self, emitter, monkeypatch, ts): + """Observed listener-side FLOW rows can carry local service principal context.""" + monkeypatch.setattr( + 
"evidenceforge.generation.emitters.ecar.ecar_flow_identity_config", + lambda: { + "user_process_probability": 0.0, + "service_process_probability": 0.0, + "root_process_probability": 0.0, + "inbound_listener_probability": 1.0, + }, + ) + emitted: list[dict] = [] + monkeypatch.setattr(emitter, "emit_event", emitted.append) + state = StateManager() + state.set_current_time(ts) + pid = state.create_process( + "WEB-EXT-01", + 0, + "/usr/sbin/apache2", + "/usr/sbin/apache2 -DFOREGROUND", + "www-data", + "System", + ) + emitter._state_manager = state + emitter._system_pids = {"WEB-EXT-01": {"apache2": pid}} + event = SecurityEvent( + timestamp=ts, + event_type="connection", + dst_host=HostContext( + hostname="WEB-EXT-01", + ip="10.0.0.20", + os="Ubuntu 22.04", + os_category="linux", + system_type="server", + fqdn="web-ext-01.example.org", + ), + network=NetworkContext( + src_ip="198.51.100.7", + src_port=49152, + dst_ip="10.0.0.20", + dst_port=443, + protocol="tcp", + initiating_pid=-1, + ), + ) + + emitter._render_connection(event) + + assert emitted[0]["direction"] == "INBOUND" + assert emitted[0]["pid"] == pid + assert emitted[0]["principal"] == "www-data" + def test_rejected_inbound_flow_does_not_claim_listener_pid(self, emitter, monkeypatch, ts): """Rejected inbound attempts should not be attributed to a server process.""" emitted: list[dict] = [] diff --git a/tests/unit/test_validate_config.py b/tests/unit/test_validate_config.py index 95c252da..21ab7d27 100644 --- a/tests/unit/test_validate_config.py +++ b/tests/unit/test_validate_config.py @@ -69,6 +69,12 @@ def load_invalid_endpoint_noise(): "suppress_roles": ["domain_controller"], } }, + "ecar_flow_identity": { + "user_process_probability": 0.88, + "service_process_probability": 0.48, + "root_process_probability": 0.42, + "inbound_listener_probability": 0.36, + }, } monkeypatch.setattr(endpoint_noise, "load_endpoint_noise", load_invalid_endpoint_noise) From 044b0972b6b7dac0e4ae419a76fba7284e1011c7 Mon Sep 17 00:00:00 2001 From: "David J. Bianco" Date: Sat, 16 May 2026 06:27:37 -0400 Subject: [PATCH 61/61] docs: record loop 30 assessment results --- TODO.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/TODO.md b/TODO.md index 51d220e9..b9d2bbc2 100644 --- a/TODO.md +++ b/TODO.md @@ -389,7 +389,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 18 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `bc738f2`: repaired the high-volume exact `0.8` Zeek TLS/web duration floor by adding deterministic post-floor texture in both generator-owned TLS durations and Zeek render-time fallback floors. Verification passed with focused TLS/activity tests, broader Zeek/activity/timing slices (`289 passed, 13 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3131 passed, 37 skipped`). Regenerated eval passed at exact `96.09/100` across `81,173` records; hard probes found zero exact `0.8` TLS rows, zero rows in the `0.800`-`0.801` band, max repeated TLS duration bucket of `2`, and preserved prior UDP/DNS, cross-sensor DNS, Windows LUID, and SSH syslog gates. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `72`, Network `74`, Host/EDR `64` (average `71.5`). 
Top Loop 19 targets are HTTP/proxy source-native response semantics, cross-sensor Zeek timing-band regularity, public TLS/web long-tail texture, same-LUID Security/Sysmon LogonGuid consistency, and bash/host authoredness. - [x] Loop 19 fix, regeneration, hard probes, quantitative eval, and blind review completed from commits `dc4616c` and `b4c99b1`: repaired HTTP/proxy source-native response semantics by blocking Windows Update Agent from Linux package paths, OS-gating `packages.microsoft.com` URI/UA selection, normalizing redirect/error HTTP MIME to source-native HTML bodies, and preventing HTTP file-transfer fan-out from reintroducing asset MIME on redirects/errors. Verification passed with focused HTTP/proxy/Zeek tests, broader related slices (`269 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3134 passed, 37 skipped`). Regenerated eval passed at exact `96.23/100` across `90,613` records; hard probes found zero Windows Update Agent + Ubuntu package path violations, zero redirect/error asset-MIME violations, zero exact `0.8` TLS rows, and zero rows in the `0.800`-`0.801` TLS duration band. Blind synthetic-confidence scores were Threat Hunter `76`, Detection `76`, Network `76`, Host/EDR `82` (average `77.5`). Top Loop 20 targets are user-shell/UWP processes incorrectly modeled as SYSTEM/session 0, duplicate `explorer.exe` shells from one `userinit.exe`, Zeek HTTP connection reuse, Snort TLS-failure vs Zeek established-TLS conflicts, and Linux `scp` eCAR lifecycle coverage. - [x] Loop 20 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `76bc107`: repaired user-shell/UWP process ownership by removing desktop helpers from system-service pools, rerouting any remaining `sihost.exe`, `RuntimeBroker.exe`, `backgroundTaskHost.exe`, and `SearchHost.exe` system-process selections into the active interactive session, and reusing the per-session `explorer.exe` shell instead of emitting duplicate primary shell creates. Verification passed with focused regressions, related activity/system-process/spawn-rule tests (`236 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3138 passed, 37 skipped`). Regenerated eval passed at exact `96.25/100` across `79,758` records; hard probes found zero shell/UWP SYSTEM/session-0 violations across Security, Sysmon, and eCAR, zero duplicate primary `explorer.exe` create clusters, and preserved prior Windows Update/Ubuntu, redirect/error MIME, and TLS duration gates. Blind synthetic-confidence scores were Threat Hunter `72`, Detection `74`, Network `82`, Host/EDR `70` (average `74.5`). Top Loop 21 target is the concrete Windows 5156/eCAR browser-flow attribution defect where browser-like Zeek HTTP rows join to host sockets/processes attributed to `svchost.exe`; next targets are Linux syslog/bash cadence, public DNS/X.509 corpus realism, Zeek HTTP connection reuse, and cross-sensor timing texture. -- [ ] **IN PROGRESS** Current-dev assessment continuation loops 21-30 — continue the requested next 10 EvidenceForge realism loops from Loop 20, targeting the highest-leverage verified findings from each panel while keeping the draft PR open. +- [x] Current-dev assessment continuation loops 21-30 — completed the requested next 10 EvidenceForge realism loops from Loop 20, targeting the highest-leverage verified findings from each panel while keeping the draft PR open. 
- [x] Loop 21 fix, regeneration, hard probes, quantitative eval, and blind review completed from commit `c9a7b72`: repaired browser-like HTTP process/network ownership by resolving Windows browser User-Agents to active interactive browser processes when possible, clearing misleading service-process attribution when no user session exists, and scoping internal human-browser web visitors to workstation clients. Verification passed with focused regressions, related eCAR/proxy/web/baseline tests (`107 passed, 1 skipped`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3141 passed, 37 skipped`). Regenerated eval passed at exact `96.40/100` across `80,222` records; hard probes found zero browser-like Zeek HTTP rows joined to Windows 5156 or eCAR `svchost.exe`, while all `497` matched WFP browser rows were browser-owned and service/tool HTTP retained service ownership where appropriate. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `68`, Network `84`, Host/EDR `72` (average `75.5`). Top Loop 22 target is Linux endpoint realism: bash command cadence and fleet-wide repeated syslog daemon pools, followed by public DNS/X.509 metadata templates, Zeek multi-sensor timing texture, HTTP connection reuse, and SYSTEM Sysmon `LogonGuid` morphology. - [x] Loop 22 fix pass — repaired the multi-reviewer Linux endpoint realism finding at the generator/config layers: bash histories now use per-host/user session pacing with bounded quick-command streaks, command pools suppress high exact-repeat counts, and extra syslog daemon pools are scoped by system type/role with lower weights and expanded templates. Verification before regeneration passed with focused regressions (`46 passed`), related activity/config tests (`212 passed`), `eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3144 passed, 37 skipped`). - [x] Loop 22 regeneration, hard probes, quantitative eval, and blind review completed from commit `ecc45ef`: repaired Linux endpoint cadence fingerprints by adding session-aware bash pacing, tighter exact-command repeat suppression, and system-type/role-scoped extra syslog pools. Regenerated eval passed at exact `96.14/100` across `73,808` records; hard probes found zero server-side desktop-daemon syslog rows, zero non-database `multipathd` rows, bash `<=10s` gaps at `47/726` (`6.47%`), median bash delta `85s`, and max per-file exact-command repeat `4`. Blind synthetic-confidence scores were Threat Hunter `78`, Detection `66`, Network `84`, Host/EDR `42` (Host/EDR verdict Real at confidence `58`; average synthetic-confidence `67.5`). Top Loop 23 target is the hard Zeek multi-sensor timing fingerprint where mirrored DMZ records always trail core records by a tiny positive offset; next targets are public DNS/X.509 corpus templates, remaining Linux daemon-message repetition, DNS TXT tunnel vocabulary/cadence, and HTTP connection reuse. @@ -404,7 +404,7 @@ Replaced manual per-emitter field coordination with SecurityEvent intermediate r - [x] Loop 29 fix pass — repaired Linux syslog timer and daemon texture by adding data-driven systemd schedule filters for role, excluded role, service/package state, per-host probability, slot skip probability, and slot jitter; scoped `phpsessionclean` to PHP-backed web hosts instead of generic web/proxy hosts; and replaced formulaic `irqbalance`/systemd-resolved trust-anchor messages with source-native bounded ranges. 
Verification passed with focused config/syslog tests (`44 passed`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3159 passed, 37 skipped`). - [x] Loop 29 regeneration, hard probes, quantitative eval, deliberation, and blind review completed from commit `e37a5f3`: regenerated eval passed at exact `95.99/100` across `76,333` records; hard probes verified `phpsessionclean` only on `WEB-EXT-01`, zero non-PHP `phpsessionclean` hosts, zero `Positive Trust Anchors` lines, zero old `irqbalance` sample/pass lines, zero large IRQ counters, zero self-move IRQ lines, and zero `systemd-resolved` self-contradictions. Blind synthetic-confidence scores were Threat Hunter `82`, Detection `64`, Network `60`, Host/EDR `39` (Host/EDR verdict Real at confidence `61`; average `61.25`). Top Loop 30 generator-owned target is eCAR FLOW principal attribution because Detection found dataset-wide blank FLOW principals even when process/user context exists; next targets are TLS/X.509 SAN diversity plus AD SRV responses, DB bash/eCAR command timing, and additional DNS/C2 texture. Scenario-authored name legibility remains the highest broad tell but is deferred unless scenario edits are authorized. - [x] Loop 30 fix pass — added data-driven mixed eCAR FLOW principal attribution so user-owned process flows usually carry process/user identity, service/root flows carry it less often, inbound listener flows can occasionally expose local service principals, and unknown/rejected flows remain source-native gaps. Verification passed with focused eCAR/config tests (`93 passed`), broader eCAR tests (`88 passed`), `uv run eforge validate-config`, Ruff checks/format checks, `git diff --check`, and full normal `uv run pytest --no-cov -q` (`3162 passed, 37 skipped`). - - [ ] **IN PROGRESS** Loop 30 regeneration, hard probes, quantitative eval, and blind review. + - [x] Loop 30 regeneration, hard probes, quantitative eval, deliberation, and blind review completed from commit `bc3772d`: regenerated eval passed at exact `95.99/100` across `76,333` records; hard probes found `4,579/13,240` eCAR FLOW records now carry mixed principals (`34.58%` overall, `52.31%` outbound, `15.30%` inbound), with zero `pid=-1` principal leaks and zero failed-flow principal claims. Blind synthetic-confidence scores were Threat Hunter `84`, Detection `62`, Network `68`, Host/EDR `39` (Host/EDR verdict Real at confidence `61`; average `63.25`). The eCAR all-blank FLOW principal tell is fixed, but the panel surfaced a verified higher-leverage contradiction: DB bash history records the `scp /tmp/mhs-archive.sql.gz` command at `17:22:09Z` while eCAR records the same process/flow at `17:15:38-17:15:39Z`. Next generator-owned targets are bash-history/process timing alignment, web/proxy path-template diversity, and richer TLS/X.509 SAN distributions; scenario-authored name legibility remains deferred unless scenario edits are authorized. - [x] Full slow-suite regression cleanup after loop-65 merge — explicit-proxy storyline beacons now preserve authored hostname+destination IP pairs only when the storyline marks that pair as intentional, normal proxy-origin DNS resolution remains intact, and the parallel-generation LogonID assertion treats Type 7 unlock reuse as valid slice-of-time Windows behavior. Verified with targeted proxy/parallel tests, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run pytest -v --include-slow` (`2875 passed, 23 skipped`). 
Detection Engineer blind review completed for the regenerated Loop 61 dataset at `scenarios/iteration-test/data`; reviewer verdict: Synthetic, 63/100 confidence. Main findings: one PROXY-01 sshd accepted-login lifecycle gap/self-source artifact and Windows 4648 explicit-credential caller PID/image provenance ambiguity around `WS-MCHEN-01`.
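For readers who want to reproduce the FLOW-principal hard probes referenced in the Loop 30 entry, the counting involved is straightforward. This is a rough sketch under stated assumptions: it assumes the generated eCAR output is JSON-lines with `object`, `direction`, and `principal` fields, and the `scenarios/iteration-test/data` path and `*ecar*.json*` glob are illustrative guesses at the layout, not the scenario's confirmed file structure.

```python
import json
from pathlib import Path


def flow_principal_coverage(data_dir: str) -> dict[str, tuple[int, int]]:
    """Count eCAR FLOW records with and without a principal, split by direction."""
    counts: dict[str, tuple[int, int]] = {}
    for path in Path(data_dir).rglob("*ecar*.json*"):  # assumed layout and glob
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("object") != "FLOW":
                continue
            direction = record.get("direction", "UNKNOWN")
            attributed, total = counts.get(direction, (0, 0))
            has_principal = bool(str(record.get("principal", "")).strip())
            counts[direction] = (attributed + int(has_principal), total + 1)
    return counts


if __name__ == "__main__":
    coverage = flow_principal_coverage("scenarios/iteration-test/data")
    for direction, (attributed, total) in sorted(coverage.items()):
        pct = 100.0 * attributed / total if total else 0.0
        print(f"{direction}: {attributed}/{total} attributed ({pct:.2f}%)")
```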