Skip to content

Exposure upsert does not refresh linkage fields on conflict, leaving stale InstalledSoftwareId / MatchedVersion #83

@FrodeHus

Description

@FrodeHus

Summary

ExposureDerivationService.DeriveAndUpsertPostgresAsync updates only LastObservedAt, Status, ResolvedAt, and LastSeenRunId on the ON CONFLICT DO UPDATE branch. The linkage fields (InstalledSoftwareId, SoftwareProductId, MatchedVersion, MatchSource) stay frozen at the original INSERT. Over time the exposure row stops describing the install that's currently driving it.

This isn't producing wrong dashboard counts, but it's a real debugging hazard. In a tenant we inspected:

  • 5 335 critical-Open exposures have InstalledSoftwareId = NULL because the originally-linked install was pruned by the stale-install sweep, and a later derivation re-emitted the exposure via a different install — but the upsert never re-pointed the linkage.
  • 30 862 more critical-Open exposures are linked to an install whose current Version differs from the exposure's frozen MatchedVersion.

End result: questions like "which install is this exposure citing?" or "what version was matched against the applicability?" can't be answered from the row.

Recommended fix

In src/PatchHound.Infrastructure/Services/ExposureDerivationService.cs (the INSERT … ON CONFLICT … DO UPDATE statement in DeriveAndUpsertPostgresAsync), extend the DO UPDATE SET clause:

ON CONFLICT ("TenantId", "DeviceId", "VulnerabilityId")
DO UPDATE SET
    "LastObservedAt"      = GREATEST(EXCLUDED."LastObservedAt",
                                     "DeviceVulnerabilityExposures"."LastObservedAt"),
    "Status"              = 'Open',
    "ResolvedAt"          = NULL,
    "LastSeenRunId"       = EXCLUDED."LastSeenRunId",
    "InstalledSoftwareId" = EXCLUDED."InstalledSoftwareId",
    "SoftwareProductId"   = EXCLUDED."SoftwareProductId",
    "MatchedVersion"      = EXCLUDED."MatchedVersion",
    "MatchSource"         = EXCLUDED."MatchSource"

The deduped CTE already picks one row per (device_id, vulnerability_id) with a deterministic preference for Product over Cpe matches (see comment around CASE match_source WHEN 'Product' THEN 0 ELSE 1 END), so the EXCLUDED.* values are the canonical "current evidence" tuple.

Mirror the change in the InMemory test path

DeviceVulnerabilityExposure.Reobserve(observedAt, runId) currently only updates LastObservedAt / Status / LastSeenRunId. Either:

  • Add a richer Reobserve(observedAt, runId, installedSoftwareId, softwareProductId, matchedVersion, matchSource) overload and have InMemoryBulkExposureWriter.UpsertAsync call it, or
  • Update the fields directly on the entity in the writer (less ceremony, but breaks encapsulation).

Prefer the entity overload for consistency with the rest of the codebase's factory-boundary discipline (see feedback_canonical_entity_factory_validation.md).

Acceptance criteria

  • Postgres upsert refreshes the four linkage columns on conflict.
  • InMemory writer matches.
  • New CTE test asserts that an exposure re-derived from a different install ends up pointing at the new install, with the new version.
  • No regression in ExposureDerivationServiceCteTests (all 6 still pass).
  • No regression in episode-closure flow — verify LastSeenRunId is still bumped on every conflict so ResolveStaleAsync continues to leave true stale exposures behind.

Notes

Metadata

Metadata

Assignees

No one assigned

    Labels

    .NETPull requests that update .NET codeenhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions