Skip to content

Add tracer spans inside list path so slow ps calls are visible#208

Draft
sjmiller609 wants to merge 1 commit intomainfrom
hypeship/trace-list-instances
Draft

Add tracer spans inside list path so slow ps calls are visible#208
sjmiller609 wants to merge 1 commit intomainfrom
hypeship/trace-list-instances

Conversation

@sjmiller609
Copy link
Copy Markdown
Collaborator

Summary

The /instances Server span has no children today, so a slow hypeman ps (and the api ListInstances calls from reaper workflows) is completely opaque in traces. The handler walks every metadata file, derives state, scans rotated app logs for boot markers, and then runs a second pass that takes per-instance write locks to persist any markers it just hydrated — but none of that work emits spans. When something stalls inside the loop, traces show a single multi-second server span with nothing under it.

This adds tracer.Start spans throughout the listing pipeline:

  • instances.list — wraps ListInstances
  • instances.list.persist_boot_markers — wraps the post-loop persist pass (records how many instances actually persisted)
  • instances.list_metadata — wraps listInstances (records metadata_files and instances counts)
  • instances.list_metadata.hydrate_one — per-instance, with instance_id
  • instances.derive_state — wraps deriveStateWithOptions
  • instances.hydrate_boot_markers — wraps hydrateBootMarkersFromLogs
  • instances.parse_boot_markers — wraps parseBootMarkers (records log_paths count, since the rotated-log scan size matters here)
  • instances.persist_boot_markers — wraps maybePersistBootMarkers (so we can see lock-wait time)

To make those spans nest correctly, hydrateBootMarkersFromLogs and parseBootMarkers now take a context.Context; the only non-test caller already had ctx in scope, and the two test callers pass t.Context().

Test plan

  • go test ./lib/instances/ -run='TestParseBootMarkers|TestHydrateBootMarkers'
  • go test ./cmd/api/api/ -run='TestList'
  • After deploy: confirm a GET /instances trace in SigNoz shows the new instances.* child spans and that the slow region is now attributable.

🤖 Generated with Claude Code

The /instances Server span has no children today, so a slow ps is
opaque in traces. Instrument the listing pipeline:
- instances.list (ListInstances)
- instances.list.persist_boot_markers (post-loop pass)
- instances.list_metadata (listInstances) + per-id hydrate_one
- instances.derive_state
- instances.hydrate_boot_markers
- instances.parse_boot_markers (records log_paths count)
- instances.persist_boot_markers (under per-instance lock)

Plumb context.Context through hydrateBootMarkersFromLogs and
parseBootMarkers so child spans hang off the right parent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant