Fix CPU/Energy regressions for issue #139 #402

Open

bald-ai wants to merge 1 commit into steipete:main from bald-ai:codex/perf-issue-139

Conversation


@bald-ai bald-ai commented Feb 19, 2026

Fixes #139

This is my first PR on this codebase, and I tried to be diligent and explicit about what I changed and how I validated it.

What was causing the high CPU/energy usage

The issue was not one single bug; it was three independent performance problems that could stack:

  1. Codex CLI failure path could stay active too long (main culprit).
  2. OpenAI dashboard web fetch could spend too long retrying under bad auth/cookies.
  3. Menu bar blink task woke too often while idle.

What this PR changes

1) Main culprit: Codex CLI failed-run window is now short and bounded

File: Sources/CodexBarCore/Providers/Codex/CodexStatusProbe.swift

  • Reduced default timeout from 18s to 8s.
  • Changed retry policy:
    • before: retry on .parseFailed and .timedOut
    • now: retry only on .parseFailed
  • Added short parse retry timeout (4s).

Result: bad CLI states fail fast and wait for next scheduled refresh instead of burning CPU for long windows.
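The new policy can be sketched roughly as follows. All type and member names here are hypothetical, since the actual internals of `CodexStatusProbe.swift` are not shown in this PR description:

```swift
import Foundation

// Hypothetical sketch of the retry policy described above; the real
// names in CodexStatusProbe.swift may differ.
enum ProbeError {
    case parseFailed
    case timedOut
}

struct ProbePolicy {
    var primaryTimeout: TimeInterval = 8    // reduced from 18s
    var parseRetryTimeout: TimeInterval = 4 // short budget for the parse retry

    // Before: retry on .parseFailed and .timedOut.
    // Now: only .parseFailed retries; a timeout fails fast and waits
    // for the next scheduled refresh.
    func shouldRetry(after error: ProbeError) -> Bool {
        switch error {
        case .parseFailed: return true
        case .timedOut:    return false
        }
    }

    // Worst case: one full attempt plus one parse retry.
    var worstCaseWindow: TimeInterval { primaryTimeout + parseRetryTimeout }
}

let policy = ProbePolicy()
print(policy.shouldRetry(after: .timedOut), policy.worstCaseWindow) // false 12.0
```

Under this sketch the worst-case failed-run window is bounded at roughly 12s, which lines up with the ~12.7s measured mean reported in the impact table.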

2) OpenAI web dashboard fetch timeouts are capped lower

File: Sources/CodexBar/UsageStore.swift

  • Primary timeout: 15s
  • Retry timeout: 8s

Result: bad web session/cookie cases stop much sooner.
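The cap can be modeled as a simple two-tier budget; this is a sketch, not the actual request code in `UsageStore.swift`:

```swift
import Foundation

// Hypothetical sketch of the capped fetch timeouts described above.
struct FetchBudget {
    static let primary: TimeInterval = 15 // first attempt
    static let retry: TimeInterval = 8    // tighter cap for the retry

    // A bad web session/cookie cannot keep the fetch alive for long:
    // the retry attempt gets barely half the primary budget.
    static func timeout(isRetry: Bool) -> TimeInterval {
        isRetry ? retry : primary
    }
}

print(FetchBudget.timeout(isRetry: false), FetchBudget.timeout(isRetry: true)) // 15.0 8.0
```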

3) Idle blink loop is now adaptive

File: Sources/CodexBar/StatusItemController+Animation.swift

  • Removed fixed 75ms wakeups while idle.
  • Keep 75ms cadence only during active blink animation.

Result: less idle wakeup noise in normal usage.
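The idle/active split can be illustrated with a minimal loop sketch; the real `StatusItemController+Animation.swift` is likely structured quite differently:

```swift
import Foundation

// Hypothetical sketch of the adaptive blink loop.
final class BlinkLoop {
    var isAnimating = false
    private(set) var tickCount = 0

    // Drive up to `maxTicks` frames at the 75 ms cadence, but only while
    // a blink animation is active; when idle, return immediately so the
    // process takes zero timer wakeups.
    func pump(maxTicks: Int) {
        guard isAnimating else { return } // idle: no 75 ms wakeups
        for _ in 0..<maxTicks where isAnimating {
            tickCount += 1
            Thread.sleep(forTimeInterval: 0.075)
        }
    }
}

let loop = BlinkLoop()
loop.pump(maxTicks: 4)   // idle: returns at once, no ticks
loop.isAnimating = true
loop.pump(maxTicks: 4)   // animating: 4 ticks at 75 ms
print(loop.tickCount)    // 4
```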

4) Documentation updates

  • Updated Codex provider behavior notes: docs/codex.md
  • Added pre-fix simulation report: docs/perf-energy-issue-139-simulation-report-2026-02-19.md
  • Added post-fix validation report: docs/perf-energy-issue-139-main-fix-validation-2026-02-19.md

Measured impact (before vs after)

Main culprit comparison (Codex CLI failed path):

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Failed-run window | 42.00s (18+24 code-path budget) | 12.67s measured mean | -69.8% |
| Avg child CPU during failed run | 113.32% | 89.34% | -21.2% |
| CPU-time exposure (CPU × duration) | 4759.44 | 1132.94 | -76.2% |
| Leftover child processes after failed run | not captured pre-fix | 0 | improved |
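The deltas can be recomputed from the raw before/after figures as a quick sanity check:

```swift
// Recompute the table's delta column from its raw before/after values.
func delta(_ before: Double, _ after: Double) -> Double {
    (after / before - 1) * 100
}

print(delta(42.00, 12.67))     // ≈ -69.8 (failed-run window)
print(delta(113.32, 89.34))    // ≈ -21.2 (avg child CPU)
print(delta(4759.44, 1132.94)) // ≈ -76.2 (CPU-time exposure)
```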

Validation run

Commands executed:

  • ./Scripts/lint.sh format
  • ./Scripts/lint.sh lint (strict swiftlint)
  • swift test
  • pnpm check
  • ./Scripts/compile_and_run.sh

All passed.

Attachments / transparency

I will upload these to the PR thread:

  • Activity Monitor before/after screenshots (CPU/Energy impact).
  • ai_conversation_full.jsonl (the conversation where I finalized and implemented this fix end-to-end).

I had earlier exploration chats too, but this attached one is the conversation where the final implementation was locked in.

AI assistance disclosure

This PR was prepared with AI assistance (analysis, implementation support, and test/report drafting), with manual review and validation by me before submission.


bald-ai commented Feb 19, 2026

ai_conversation_full.zip

@ratulsarna
Collaborator

A question before we merge: can you share how you landed on the new timeout values (8s/4s for Codex CLI and 15s/8s for OpenAI web), and whether you saw any increased “no data/stale” behavior in slower/flaky conditions? I’m aligned with the faster-fail direction, just want to explicitly confirm the tradeoff we’re accepting.

@ratulsarna ratulsarna added the question Further information is requested label Feb 20, 2026

bald-ai commented Feb 20, 2026

Hi, I wrote the answer and let AI format it for easier readability. If you want me to do some precise testing, no problem, but you will have to tell me exactly what you need. Also feel free to change the numbers if you can make a better guess. Ideally it would be better to get from Peter a better way to get the info without launching the entire CLI. Maybe he can hook you up with a solution from big token?


AI answer

Hi, I honestly didn’t know the exact “correct” way to pick those timeout values, so I made a practical guess to get the best fix without accidentally removing intended behaviour.

Reason:

  • Happy-path runs are usually in seconds.
  • These values still allow retries.
  • They hard-cap bad loops so we don’t burn CPU forever.

What I saw in my logs:

Codex RPC (happy path):

  • Median: 1.10s
  • P95: 2.95s
  • Max: 11.26s (rare outlier)

OpenAI web refresh (Feb 20, 2026):

  • Median: 3.40s
  • P95: 4.08s
  • Max: 8.65s
  • Runs over 8s: 1/35

No-data / stale behavior:

  • I didn’t run a dedicated flaky-network benchmark.
  • I did set up runtime logging yesterday.
  • Total samples: 767
  • Healthy samples: 754
  • Overall healthy rate: 98.31%
  • Feb 20 healthy rate: 99.79% (474/475)
  • Last 120 samples healthy rate: 99.17% (119/120)

So from what I collected, failures looked like short blips, not long degraded periods.
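The healthy-rate percentages follow directly from the raw sample counts above:

```swift
// Recompute the healthy-rate percentages from the raw sample counts.
func healthyRate(_ healthy: Int, _ total: Int) -> Double {
    Double(healthy) / Double(total) * 100
}

print(healthyRate(754, 767)) // ≈ 98.31 (overall)
print(healthyRate(474, 475)) // ≈ 99.79 (Feb 20)
print(healthyRate(119, 120)) // ≈ 99.17 (last 120 samples)
```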


bald-ai commented Feb 20, 2026

Oh, btw, the logs are in my private version; I shipped this PR without them and made a custom build with logging for myself. That felt like the right way to do it.

Development

Successfully merging this pull request may close these issues.

consuming too much power on my macbook