fix: improve ssm failure diagnostics and opentelemetry dependency management #274

Merged
l50 merged 2 commits into main from chore/renovate-cap-opentelemetry
May 10, 2026
l50 commented May 10, 2026

Key Changes:

  • Enhanced SSM error handling and diagnostics in EC2 and Red Taskfiles for clearer operator feedback
  • Added version cap for opentelemetry Rust crates in Renovate config to prevent dependency conflicts
  • Provided actionable recovery steps for SSM command delivery failures
  • Improved status reporting by including detailed status information in error messages

Added:

  • A Renovate rule in .github/renovate.json5 that caps the opentelemetry Rust crates below 0.32, with explanatory comments and an explicit package list, to avoid an unresolvable dependency tree until tracing-opentelemetry supports 0.32

Changed:

  • SSM command error handling in .taskfiles/ec2/Taskfile.yaml and .taskfiles/red/Taskfile.yaml:
    • Enhanced failure diagnostics by querying and displaying StatusDetails from AWS SSM for remote build, deploy, config deploy, setup, start, report, launch, tool installation, and exec tasks
    • Added explicit checks for Undeliverable status, printing clear operator guidance, probable causes (e.g., ConnectionLost), and specific recovery instructions to reboot the affected EC2 instance
    • Improved error messages to include both status and detailed reason, reducing ambiguity for operators during EC2 or SSM outages or connectivity issues
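
The Undeliverable handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from the Taskfiles: the function names `report_ssm_failure` and `fetch_ssm_status_details`, the message wording, and the return codes are all assumptions. Only the AWS CLI commands `aws ssm get-command-invocation` and `aws ec2 reboot-instances` are real.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): turn an SSM status pair into operator guidance.
# Usage: report_ssm_failure STATUS STATUS_DETAILS INSTANCE_ID
# Returns 2 for delivery failures, 1 for any other failure (assumed convention).
report_ssm_failure() {
  local status="$1" details="$2" instance_id="$3"
  if [ "$details" = "Undeliverable" ]; then
    # Delivery failure: the command never reached the instance.
    echo "ERROR: SSM could not deliver the command to ${instance_id} (status=${status}, details=${details})"
    echo "Probable cause: the SSM agent on the instance is unreachable (e.g. ConnectionLost)."
    echo "Recovery: aws ec2 reboot-instances --instance-ids ${instance_id}"
    return 2
  fi
  # Script failure: the command was delivered but did not succeed.
  echo "ERROR: SSM command failed on ${instance_id} (status=${status}, details=${details})"
  return 1
}

# Hypothetical wrapper showing where StatusDetails would come from.
# Requires AWS credentials, so it is defined here but not called.
# Usage: fetch_ssm_status_details COMMAND_ID INSTANCE_ID
fetch_ssm_status_details() {
  aws ssm get-command-invocation \
    --command-id "$1" \
    --instance-id "$2" \
    --query 'StatusDetails' \
    --output text
}
```

Keeping the classification separate from the AWS call means the messaging can be exercised without credentials. A task would pass the fetched status and details to `report_ssm_failure` once polling ends in a non-success state.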

…uild issues

**Added:**

- Added a rule in renovate.json5 to cap opentelemetry-rust monorepo crates below 0.32 to prevent dependency conflicts until tracing-opentelemetry supports 0.32
dreadnode-renovate-bot (bot) added the area/github label (Changes made to GitHub Actions workflows) May 10, 2026
l50 changed the title from "fix: improve SSM error diagnostics and recovery instructions for EC2 tasks" to "chore(renovate): cap opentelemetry crates at <0.32 until tracing-opentelemetry catches up" May 10, 2026
…d Taskfiles

**Changed:**

- Enhanced error handling for SSM command failures in multiple EC2 and red Taskfile tasks by capturing and displaying StatusDetails from AWS, providing more detailed failure diagnostics
- Added specific messaging and recovery instructions for when an SSM command cannot be delivered, including suggestions to check connection status and reboot the affected instance
- Improved clarity of error output for operators by distinguishing between script failures and delivery failures in SSM command invocations
l50 changed the title from "chore(renovate): cap opentelemetry crates at <0.32 until tracing-opentelemetry catches up" to "fix: improve ssm failure diagnostics and opentelemetry dependency management" May 10, 2026
l50 merged commit a67b9e8 into main May 10, 2026
7 checks passed
l50 deleted the chore/renovate-cap-opentelemetry branch May 10, 2026 15:23