Fix 8 bugs: TOCTOU races, panics, and missed cases from PR #150 by cep21 · Pull Request #151 · cep21/circuit

cep21 · 2026-02-26T20:40:13Z

Summary

Follow-up to #150 — 8 additional bugs found across the same areas. All fixes are backwards-compatible (panic→works or panic→error; no behavior changes for existing valid usage).

Fixes

#	Bug	File	Fix
1	`ConsecutiveErrOpenerFactory` config race	`closers/simplelogic/closers.go`	Copy config before `Merge` (same pattern as #150 for hystrix factories)
2	`openCircuit` can emit `Opened()` 2-3× under concurrent calls	`circuit.go`	`CompareAndSwap` before emitting metric
3	`close` can emit `Closed()` multiple times	`circuit.go`	`CompareAndSwap` before emitting metric
4	`Opener.SetConfigNotThreadSafe` nil-deref on `props.Now`	`closers/hystrix/opener.go`	Use nil-safe `props.now()` helper (already existed at line 53)
5	`RollingCounter.GetBuckets`/`.String()` divide-by-zero on zero value	`faststats/rolling_counter.go`	Guard `NumBuckets == 0`
6	`RollingCounter.UnmarshalJSON` nil-deref on incomplete JSON	`faststats/rolling_counter.go`	Return error on missing required fields (panic→error)
7	Fallback `MaxConcurrentRequests` double-read	`circuit.go`	Read once into local (same pattern as #150 for main throttle)
8	`MockClock.AfterFunc` — negative duration doesn't fire immediately	`internal/clock/clock.go`	`d == 0` → `d <= 0` (matches real `time.AfterFunc`)

Backwards compatibility

CAS in open/close — Opened()/Closed() now fire after isOpen is set (previously before). Audited all in-repo Metrics implementations — none call IsOpen() from inside the callback. The old ordering was non-deterministic under the TOCTOU anyway.
UnmarshalJSON — incomplete JSON (e.g. {}) now returns an error instead of panicking. Round-trip (Marshal→Unmarshal) is unaffected and fully covered by new tests.
GetBuckets — returns nil for zero-value counter instead of panicking. range nil is a no-op so callers are safe.

Test coverage

Every fix has a regression test:

TestOpenCircuit_EmitsOpenedExactlyOnce / TestCloseCircuit_EmitsClosedExactlyOnce — 100 concurrent goroutines
TestConsecutiveErrOpenerFactory_ConcurrentCreation — 100 concurrent factory calls under -race
TestOpener_SetConfigNotThreadSafe_NilNow — direct call without Now set
TestRollingCounter_Empty — extended for zero-value GetBuckets/String
TestRollingCounter_UnmarshalJSON_IncompleteInput — 5 partial-JSON variants × 2 receiver states
TestRollingCounter_UnmarshalJSON_RoundTrip — key BC test: 3 counter states marshal→unmarshal cleanly
TestMockClock_AfterFunc — extended for negative duration

Verification

go test -race -count=10 ./...   # all pass
go vet ./...                    # clean
golangci-lint run ./...         # 0 issues

- fix: use CAS in openCircuit/close to emit Opened()/Closed() exactly once - fix: copy config in ConsecutiveErrOpenerFactory closure (race hygiene, matches PR #150) - fix: read Fallback.MaxConcurrentRequests once in throttle check (matches PR #150) - fix: use nil-safe props.now() in Opener.SetConfigNotThreadSafe - fix: guard NumBuckets==0 in RollingCounter.GetBuckets (fixes String() panic) - fix: nil-check pointer fields in RollingCounter.UnmarshalJSON - fix: MockClock.AfterFunc fires immediately for d<=0 (matches real time.AfterFunc) All fixes are backwards-compatible (panic->works, race->clean, or strictly additive guards). Regression tests added for each. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…g inconsistent state The previous fix (nil-check each pointer field and skip) made zero-value receivers safe but created a new deferred-panic vector: on a pre-initialized counter, UnmarshalJSON({}) would set buckets=nil while leaving NumBuckets unchanged, causing GetBuckets to panic later with index out of range. UnmarshalJSON is a round-trip API — MarshalJSON always emits all fields. Incomplete JSON is corrupt/truncated input. Return an error (receiver unmodified) rather than silently partial-apply. Backwards-compatible vs released master: panic -> error (strictly better). Tests added: - IncompleteInput: 5 partial-JSON variants x 2 receiver states (all must error, receiver preserved) - RoundTrip: 3 counter states marshal+unmarshal cleanly (the BC guarantee) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Systematic audit for timing-sensitive tests that can flake on loaded CI runners. - TestCircuitAttemptsToReopenOnlyOnce: convert to MockClock for deterministic timing. The previous real-clock version with SleepWindow: 1ms flaked when CI scheduling delays exceeded 1ms (same pattern PR #141 fixed in the sister test TestCircuitAttemptsToReopen). This is the flake that caused the CI failure. - TestThrottled, TestFallbackCircuitConcurrency: replace time.Sleep-based concurrency with a barrier pattern. Wait until ALL goroutines have either entered the throttled section or been rejected before releasing — guarantees the over-limit goroutine attempts while slots are occupied, regardless of scheduling. Also fixed: errCount++ was not atomic. - TestRollingCounter_Inc: use StringAt(now)/RollingSumAt(now) instead of String()/RollingSum(). The latter use real time.Now() which can advance past the 10ms rolling window on a slow runner, rolling out buckets before later Inc(now) calls. - TestTickUntil: register AfterFunc callbacks BEFORE starting the TickUntil goroutine. If TickUntil advances mock time first, callbacks get scheduled relative to the advanced time, breaking the final m.Now() == now+3h assertion. - TestManagerConcurrentFactoryConfiguration: drop require.NoError on Execute. With 300 concurrent executions under -race and only a 2x sleep/timeout margin, CI jitter can cause spurious timeouts. The test is about concurrent factory creation, not execution timing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…efile ## Test infrastructure - circuit_property_test.go: invariant-asserting stress tests + deterministic state-machine property test (testing/quick + MockClock + reference model). Verified catches the TOCTOU bugs from PR #151 by temporarily reverting CAS. - TestCircuit_RunMetricsExactlyOnce_Concurrent: verifies the exactly-once RunMetrics contract under 100K concurrent Execute calls - TestCircuit_TransitionAlternation_Concurrent: verifies Opened/Closed alternate correctly (closed ≤ opened ≤ closed+1) under contention - TestCircuit_FallbackMetricsExactlyOnce_Concurrent: same for fallback path - TestCircuit_StateMachineProperty: 500 random op sequences replayed against real circuit + trivial reference model, 6 invariants checked per step, fully deterministic via MockClock - faststats/fuzz_test.go: 5 Go-native fuzz targets + naiveRollingCounter differential oracle. Seeds run as unit tests in ~5ms. - FuzzRollingCounterOps: Inc/RollingSumAt/GetBuckets vs naive oracle - FuzzRollingCounterJSON: round-trip + hostile-input safety - FuzzSortedDurationsPercentile: bounds + monotonicity - FuzzRollingBucketAdvance: index bounds + LastAbsIndex monotonicity - FuzzTimedCheckJSON: round-trip + hostile-input safety - Tightened existing stress tests (were assertion-free, just t.Logf): - circuit_stress_test.go: TestCircuitStateTransitionRace now verifies Opened/Closed alternation + ConcurrentCommands balance - faststats/rolling_stress_test.go: deleted 2 permanent-skip tests, added real TestTimedCheckConcurrency, added RollingSum≤TotalSum check ## Bug fixes (found by fuzzers in ~10s) - faststats/rolling_percentile.go: Percentile(NaN) panicked — NaN falls through both p<=0 and p>=100 guards, int(math.Floor(NaN)) is INT64_MIN on amd64, index-out-of-range. Now returns -1. - faststats/rolling_counter.go: UnmarshalJSON accepted hostile JSON with NumBuckets > len(Buckets), then GetBuckets panicked. Now rejected at unmarshal time with a descriptive error. ## Build tooling - Makefile: restored after Jan 2024 deletion. make (fast dev loop), make ci (pre-PR gate mirroring GitHub Actions), make fuzz, make bench, make cover, make fix, make help. - .github/workflows/build.yml: Build/Test steps now call make build / make test-race (same commands, now DRY). - .github/workflows/fuzz.yml: daily scheduled fuzzing, 5min per target, uploads crash inputs as artifacts on failure. Manual dispatch enabled. - README.md: Development section points to make targets. Fixed dead Makefile link. - .gitignore: added /coverage.html (from make cover).

…efile (#152) (#152) * Add property/fuzz/invariant tests; fix 2 fuzz-found bugs; restore Makefile ## Test infrastructure - circuit_property_test.go: invariant-asserting stress tests + deterministic state-machine property test (testing/quick + MockClock + reference model). Verified catches the TOCTOU bugs from PR #151 by temporarily reverting CAS. - TestCircuit_RunMetricsExactlyOnce_Concurrent: verifies the exactly-once RunMetrics contract under 100K concurrent Execute calls - TestCircuit_TransitionAlternation_Concurrent: verifies Opened/Closed alternate correctly (closed ≤ opened ≤ closed+1) under contention - TestCircuit_FallbackMetricsExactlyOnce_Concurrent: same for fallback path - TestCircuit_StateMachineProperty: 500 random op sequences replayed against real circuit + trivial reference model, 6 invariants checked per step, fully deterministic via MockClock - faststats/fuzz_test.go: 5 Go-native fuzz targets + naiveRollingCounter differential oracle. Seeds run as unit tests in ~5ms. - FuzzRollingCounterOps: Inc/RollingSumAt/GetBuckets vs naive oracle - FuzzRollingCounterJSON: round-trip + hostile-input safety - FuzzSortedDurationsPercentile: bounds + monotonicity - FuzzRollingBucketAdvance: index bounds + LastAbsIndex monotonicity - FuzzTimedCheckJSON: round-trip + hostile-input safety - Tightened existing stress tests (were assertion-free, just t.Logf): - circuit_stress_test.go: TestCircuitStateTransitionRace now verifies Opened/Closed alternation + ConcurrentCommands balance - faststats/rolling_stress_test.go: deleted 2 permanent-skip tests, added real TestTimedCheckConcurrency, added RollingSum≤TotalSum check ## Bug fixes (found by fuzzers in ~10s) - faststats/rolling_percentile.go: Percentile(NaN) panicked — NaN falls through both p<=0 and p>=100 guards, int(math.Floor(NaN)) is INT64_MIN on amd64, index-out-of-range. Now returns -1. - faststats/rolling_counter.go: UnmarshalJSON accepted hostile JSON with NumBuckets > len(Buckets), then GetBuckets panicked. Now rejected at unmarshal time with a descriptive error. ## Build tooling - Makefile: restored after Jan 2024 deletion. make (fast dev loop), make ci (pre-PR gate mirroring GitHub Actions), make fuzz, make bench, make cover, make fix, make help. - .github/workflows/build.yml: Build/Test steps now call make build / make test-race (same commands, now DRY). - .github/workflows/fuzz.yml: daily scheduled fuzzing, 5min per target, uploads crash inputs as artifacts on failure. Manual dispatch enabled. - README.md: Development section points to make targets. Fixed dead Makefile link. - .gitignore: added /coverage.html (from make cover). * Fix pre-existing flake: RollingSumAt < 0 check must be post-quiescence TestRollingCounterBucketRolloverRace had a rollingSum < 0 check inside the hot concurrent loop since PR #139/#150. This check was always wrong for the lock-free RollingCounter design (documented at rolling_bucket.go): Thread A (Inc): buckets[idx].Add(1) Thread B (rollover): Swap(0) returns the incremented value, rollingSum.Add(-toDec) — briefly over-decrements Thread A (Inc cont): rollingSum.Add(1) — corrects A reader between B and A's rollingSum write observes a transient negative. The value is correct once writers quiesce. Moved the < 0 check to after wg.Wait(). The rollingSum <= TotalSum check stays in the hot loop (safe: reads are correctly ordered).

cep21 and others added 3 commits February 26, 2026 20:21

cep21 marked this pull request as ready for review February 27, 2026 19:37

cep21 merged commit 1224f2f into master Feb 27, 2026
6 checks passed

cep21 deleted the fix-toctou-panics-and-missed-cases branch February 27, 2026 19:37

cep21 mentioned this pull request Feb 28, 2026

Add property/fuzz/invariant tests; fix 2 fuzz-found bugs; restore Makefile (#152) #152

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix 8 bugs: TOCTOU races, panics, and missed cases from PR #150#151

Fix 8 bugs: TOCTOU races, panics, and missed cases from PR #150#151
cep21 merged 3 commits into
masterfrom
fix-toctou-panics-and-missed-cases

cep21 commented Feb 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cep21 commented Feb 26, 2026

Summary

Fixes

Backwards compatibility

Test coverage

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant