Skip to content

Improve error reporting in final JSON results#29

Open
cjonas9 wants to merge 23 commits intodevfrom
improve-error-reporting
Open

Improve error reporting in final JSON results#29
cjonas9 wants to merge 23 commits intodevfrom
improve-error-reporting

Conversation

@cjonas9
Copy link
Copy Markdown
Contributor

@cjonas9 cjonas9 commented Apr 3, 2026

What

Following the stepped-pacer redesign (#23):

  • Adds a per-step-interval timeline of windows/snapshots to the final JSON results. Each entry captures target RPS, success/error counts, error rate (%), and latency percentiles (p50, p95, p99, p99.9) for that window only
  • Adds CLI argument --error-percent which specifies the tolerance for errors (e.g. --error-percent 60 would kill the test after receiving over 60% error/timeout in any 5s window).
  • Removes the redundant log file mirror (as it's now superseded by the richer JSON)
Visually, the results JSON now looks like:
{
  /*
  Old: cumulative summary data of the entire run
  */
  "start": ...,
  "end": ...,
  "duration_seconds": 123,
  "endpoints": {
    "getLedgers": {
       // summary for the entire run of performance, error breakdowns by type, etc.
      },

    /*
    NEW: data of how the RPC performed at each RPS phase for this endpoint
    */
      "timeline": [
        {
          "target_rps": 7.14, "success": 142, "errors": 0, "error_rate_pct": 0,
          "p50_ms": 165.631, "p95_ms": 537.087, "p99_ms": 900.095, "p99.9_ms": 935.423
        },
        {
          "target_rps": 14.29, "success": 277, "errors": 0, "error_rate_pct": 0,
          "p50_ms": 262.399, "p95_ms": 734.719, "p99_ms": 974.335, "p99.9_ms": 1098.751
        },
        ...,
        {
          "target_rps": 50, "success": 0, "errors": 116, "error_rate_pct": 100,
          "p50_ms": 15007.743, "p95_ms": 15007.743, "p99_ms": 15007.743, "p99.9_ms": 15007.743
        }
      ]
    }
  }
}

The timeline intervals default to 5s when no ramp-up is provided as there's no natural "RPS phase" notion without a ramp-up that steps through various RPS levels. Degradation still occurs with a fixed RPS level, so this allows for a timeline-d degradation curve without a ramp-up period.

Why

Conceptually, this is a minor rework of the aggregator: stats are accumulated in windows which are deposited into the aggregator and are reset thereafter. This allows for the aggregation of stats by their current RPS level, which provides useful/digestible instantaneous info (as opposed to relying on cumulative stats).

The richer JSON gives operators a clear view of how the RPC degraded as load increased. Previously, the results JSON flattened all metrics into a single summary bucket per endpoint without any notion of an RPS-timeline, so there was no way to tell at what RPS the RPC buckled. This gives an operator running a load test a concise/minimal degradation curve (e.g. when errors started, how latency + error rates changed at each step) without requiring them to dig through through logs.

Known limitations

N/A

Base automatically changed from stepped-pacer to dev April 10, 2026 17:47
@cjonas9 cjonas9 force-pushed the improve-error-reporting branch from 169bb2e to fad8658 Compare April 10, 2026 18:47
@cjonas9 cjonas9 marked this pull request as ready for review April 10, 2026 18:48
@socket-security
Copy link
Copy Markdown

socket-security bot commented Apr 10, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Updatedgithub.com/​stellar/​go-stellar-sdk@​v0.3.0 ⏵ v0.5.075 +1100100100100

View full report

@socket-security
Copy link
Copy Markdown

socket-security bot commented Apr 10, 2026

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action Severity Alert  (click "▶" to expand/collapse)
Warn High
License policy violation: golang github.com/creachadair/mds under BSD-3-Clause-HP

License: BSD-3-Clause-HP - The applicable license policy does not permit this license (5) (LICENSE)

From: ?golang/github.com/creachadair/jrpc2@v1.3.5golang/github.com/creachadair/mds@v0.26.2

ℹ Read more on: This package | This alert | What is a license policy violation?

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore golang/github.com/creachadair/mds@v0.26.2. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

Warn High
Obfuscated code: golang github.com/pelletier/go-toml/v2 is 98.0% likely obfuscated

Confidence: 0.98

Location: Package overview

From: ?golang/github.com/spf13/viper@v1.21.0golang/github.com/pelletier/go-toml/v2@v2.3.0

ℹ Read more on: This package | This alert | What is obfuscated code?

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Packages should not obfuscate their code. Consider not using packages with obfuscated code.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore golang/github.com/pelletier/go-toml/v2@v2.3.0. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report

// mergedHistogram builds a combined histogram from all windows.
func (e *EndpointStats) mergedHistogram() *hdrhistogram.Histogram {
merged := hdrhistogram.New(1, 60000000, 3)
for i := range e.windows {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you’re using the counter to index into the array, the expended range loop is more concise, ie for _, elem := range e.windows

return nil, err
}
fmt.Fprintf(&buf, "\n %q: ", name)
buf.Write(epJson)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m almost certain that if you give each endpoint (and timeline) structure a custom marshalling routine that doesn’t do indentation, you can eliminate most if not all of the byte-by-byte serialization here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love that, I was unsuccessful in trying this before but I'll give that approach a shot. Thanks!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent some time looking into this, and I'm not confident it's possible (or, at this point, worthwhile).

The main issue is that any notion of "marshal + indent" happens in those two, disjoint steps: marshal things to bytes, then write those bytes. The latter step is a standalone json function rather than something we can implement, and encoding/json's MarshalJSON interface only lets us control what gets serialized rather than how it's formatted. json.Indent, seemingly, is completely agnostic to types or any idea higher-level than "is this a { or [ character"?

Separately, I really wanted this to be true, as it's a bit baffling that this doesn't natively exist. The alternative to this Fprintf byte-by-byte approach is to make MarshalJSON do our marshaling, then have a custom indenter that does what we want at the byte level (i.e. not adding newlines + indents to things in an array). Which is still ugly + more work than it should be, but I do feel like digging my heels in on keeping timeline entries on one line. But if there's some way of doing this that I missed, let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants