Test overhaul + 2 security bug fixes by yyy900 · Pull Request #140 · openonion/connectonion

yyy900 · 2026-05-14T08:46:01Z

Summary

Test infrastructure overhaul + two security bug fixes uncovered while writing tests.

Two bugs found and fixed during testing (both flagged in commit messages):

bash_parser.py permission bypass — the AST walker stopped at the first word of each command and never descended into commandsubstitution nodes. echo $(rm -rf /) reported only ["echo"], so a user with Bash(echo *) approved could chain in any command via $() or backticks. Now recurses into substitutions.
host/auth.py 500 on malformed timestamp — abs(now - timestamp) raised TypeError when timestamp was a string, surfacing as 500 instead of 401. Now guarded with isinstance and returns clean unauthorized.

Other changes:

Test layout collapsed from 5 dirs (unit/integration/cli/e2e/real_api) to 2: unit/ and e2e/ (with e2e/cli/, e2e/real_api/, e2e/manual/ subdirs). Markers auto-applied by folder.
Dead tests removed (1044 LOC): test_smoke, test_imports, test_env_autoload, test_bash_chain_permissions, test_benchmarks, test_announce_only — all integers' worth of assert True / module-level skips / "needs rewrite" markers.
Metadata-only tests removed (21 tests across 11 files): assert handler._event_type == 'X', assert len(plugin) == N, plugin == [...] — these break only on cosmetic refactors, not on real bugs. Kept test_events.py (where the decorator IS the unit under test).
70+ new unit tests covering 10 previously-uncovered modules: runtime_input, ulw, bash_parser, fuzzy, element_finder format, session/merge, session/storage, ws_router/ping, agent error paths, auto_compact, scroll, exceptions.
pyproject.toml: dropped 4 unused dependencies (litellm, click, toml, PyJWT) — verified zero imports across the repo. click is still installed transitively via typer.
Env-dep failures fixed: test_plugin_system.py and test_config_permissions.py were silently failing locally for anyone without OPENAI_API_KEY — now use MockLLM.
Docs updated: CLAUDE.md, tests/README.md, TEST_ORGANIZATION.md, e2e/cli READMEs, agent.md; 13 source-file LLM-Note path refs updated.

Numbers

	Before	After
Unit tests	~1830	1870 (+40 net, more delta below)
Dead test LOC	1044 lines	0
Metadata-only tests	21	0 (outside `test_events.py`)
0-coverage modules	10+	0 of the audited list
Unused declared deps	4	0
Test directories	5	2

Test plan

pytest tests/unit — 1870 pass, 5 skip (PIL/Windows/optional TUI), 0 fail
pytest --collect-only — 2147 collected, 139 deselected (default not real_api and not network)
No stale path refs (grep "tested by \[tests/cli\|tests/integration\|tests/real_api" returns 0)
Real-API tests not run here (require keys; verify in CI with secrets)
CI green on all matrix Python versions

Commits

chore(deps) — drop 4 unused deps
test: overhaul test layout — reorg + dead/metadata cleanup + MockLLM fixes
fix(security) — bash_parser $() recursion
fix(auth) — non-numeric timestamp → 401
test: add unit tests for 10 previously-uncovered modules
docs — test layout docs + LLM-Note paths

Verified zero imports of these packages across the repo: - litellm: never imported (project has its own multi-provider LLM abstraction) - click: never imported directly (transitive via typer, will still install) - toml: never imported (no tomli/tomllib/tomlkit either) - PyJWT: never imported (jwt source code removed earlier, declaration left behind)

@Skip

Layout changes: - tests/integration/ removed (8 files redistributed: 5 to unit/, 3 to e2e/) - tests/cli/ → tests/e2e/cli/ - tests/real_api/ → tests/e2e/real_api/ - tests/integration/manual/ → tests/e2e/manual/ - pytest.ini and tests/conftest.py drop the integration marker Dead-code removal (1044 LOC): - test_smoke.py (assert True placeholder) - test_imports.py (zero-assert importlib smoke) - test_env_autoload.py (module-level skip — never runs in suite) - test_bash_chain_permissions.py (all @Skip with "needs rewrite") - test_benchmarks.py (MockLLM "benchmarks" + History API removed) - test_announce_only.py (skipif CI + if __name__ → demo, not test) Partial-skip cleanup (5 dead tests removed): - test_real_gemini_co.py::test_gemini_flash_via_co (permanent skip) - test_cli_deploy.py: test_deploy_requires_api_key, TestDeployCleanup, test_deploy_loads_api_key_from_env_file (skip + assert True + no asserts) - test_cli_auth_google.py::TestAuthGoogleIntegration (class-level skip) - test_cli_auth_microsoft.py::TestAuthMicrosoftIntegration (class-level skip) Metadata-only test removal (21 tests across 11 files): - All `assert handler._event_type == 'X'` style assertions - All `assert len(plugin) == N` / `plugin == [...]` literal export checks - Whole TestXxxPlugin classes when both methods were metadata-only - Kept test_events.py decorator tests (those test the decorator itself) Env-dependency fixes: - test_plugin_system.py: pass llm=MockLLM() instead of model="gpt-4o-mini" - test_config_permissions.py: same — avoids OPENAI_API_KEY requirement

The AST walker in extract_commands_from_bash and _extract_subcommands stopped at the first word of a command node and never visited child commandsubstitution nodes. Result: `echo $(rm -rf /)` reported only ["echo"], so an attacker who had `Bash(echo *)` approved could chain in any command via $() or `...` and the chain check would pass. Fix: after capturing the first word of a command node, recurse into each part's sub-AST (parts / list / command attrs) so substitutions are visited. Adds 25 unit tests covering: - single command / && / || / | / ; chains - $() and backtick substitution extraction - nested $(echo $(date)) - permission chain rejection when $() contains unauthorized command - wildcard / when / source / disallowed-permission paths

extract_and_authenticate did `abs(now - timestamp)` without type-checking the payload field. A signed request with timestamp="not-a-number" crashed the handler with TypeError, which surfaces as 500 to the caller instead of a clean unauthorized response. Fix: guard with `isinstance(timestamp, (int, float))` and reject as "unauthorized: timestamp must be numeric" before the math. Adds 10 attack-surface tests: - tamper-after-signing (prompt / to / extra field) all rejected - identity-swap (sig made with key A, from=key B) rejected - boundary timestamp (exactly at expiry, one past expiry) - timestamp=0 treated as missing - string timestamp returns clean error (regression for this fix) - replay-within-window documented as currently allowed - hex case-insensitivity verified

70+ new functional tests, no metadata-only assertions: - test_runtime_input.py (9): IO no-op gates, frame prefix injection, trace event fields, empty-prompt skip, missing-id defaults - test_ulw.py (16): mode_change state transitions, max-turns IO actions (continue/switch_mode/unknown/no-IO), prompt poll + injection - test_fuzzy.py (14): subsequence matching, case-insensitivity, score ordering (consecutive > scattered), word-boundary bonuses for / _ - . space - test_element_finder_format.py (13): LLM-facing format contract — empty list, text/placeholder/aria/role/input_type rendering, href query-stripping + 30-char truncation, frame=main implicit - test_session_merge.py (7): iteration tie-breaker, updated-timestamp fallback, missing-field defaults - test_session_storage.py (15): save/get round trip, last-write-wins, TTL expiry (running exempt), list dedup + sort, checkpoint 24h TTL - test_ws_ping.py (3): 30s interval send, multi-period loop, send-failure propagation - test_agent_error_paths.py (7): max iterations message, per-call override, stop_signal short-circuit, InsufficientCredits + generic LLM exception propagation, tool exception captured in trace - test_auto_compact.py (15): threshold + message-count gates, io event contract (compacting / done / error), _do_compact structure preservation, _format_messages_for_summary truncation rules - test_scroll.py (8 + 2 PIL-skip): strategy fallback chain, exception in one strategy doesn't short-circuit, screenshot diff defensiveness - test_exceptions.py (17): InsufficientCredits / LLMConnectionError / ProviderServiceError / ToolRejectedError attribute extraction, message formatting, __cause__ preservation, defensive defaults

User-facing docs: - CLAUDE.md: Test Organization section rewritten for unit + e2e two-layer structure; add Test Quality Rules section warning against metadata-only assertions - tests/README.md: directory tree + Run by Category + Test Categories sections rewritten - tests/TEST_ORGANIZATION.md: full rewrite — Strategy / Folder Structure / Markers / Decision Tree / Naming / Notes - tests/e2e/cli/README.md + aws/README.md: sh script paths updated to tests/e2e/cli/ prefix - docs/concepts/agent.md: "pytest -m real_api = integration tests" comment corrected Source LLM-Note docstring refs (13 files): - tests/cli/test_X.py → tests/e2e/cli/test_X.py - tests/real_api/test_X.py → tests/e2e/real_api/test_X.py - bash_parser.py: tests/integration/test_bash_chain_permissions.py (deleted) → tests/unit/test_bash_parser.py

… shadow `useful_plugins/__init__.py` does `from .auto_compact import auto_compact`, which rebinds the package attribute `connectonion.useful_plugins.auto_compact` from the *module* to the plugin-export *list* of the same name. As a result `import connectonion.useful_plugins.auto_compact as ac_module` actually binds `ac_module` to the list, and on Python 3.10 mock.patch's string-path resolution also lands on the list, breaking attribute lookup with: AttributeError: [<function check_and_compact at 0x...>] does not have the attribute '_do_compact' Fix the test file to grab the real module from `sys.modules` and use `patch.object(module, attr)` throughout. Same workaround applied for `connectonion.llm_do` since `__init__.py` shadows it the same way. Tests pass on Python 3.13 locally; CI will verify 3.10/3.11/3.12.

…_commands handle_sub_sync_one / handle_sub_list / handle_sub_remove use Rich console with inline markup (e.g. `0x` printed as `[bold red]0x[/bold red]`). Under capsys the colored escape codes show up in the captured stdout and split the literal substrings the tests look for: 'alice' is not a \x1b[1;31m0x\x1b[0m address... ↑ "not a 0x address" never matches Fix on the test side: add a tiny `_plain()` helper that strips `\x1b\[[0-9;]*m` and route every `readouterr().out` assertion through it. The test failure was a pre-existing upstream issue (fails on plain main), surfaced again on this PR's rebase target.

yyy900 force-pushed the cleanup/test-overhaul branch from 50c9b70 to 731f702 Compare May 14, 2026 08:54

yf added 8 commits May 14, 2026 19:24

yyy900 force-pushed the cleanup/test-overhaul branch from e55ca6b to 1aaa406 Compare May 14, 2026 09:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test overhaul + 2 security bug fixes#140

Test overhaul + 2 security bug fixes#140
yyy900 wants to merge 8 commits into
mainfrom
cleanup/test-overhaul

yyy900 commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yyy900 commented May 14, 2026

Summary

Numbers

Test plan

Commits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant