Test overhaul + 2 security bug fixes#140
Open
yyy900 wants to merge 8 commits into
Open
Conversation
50c9b70 to
731f702
Compare
added 8 commits
May 14, 2026 19:24
Verified zero imports of these packages across the repo: - litellm: never imported (project has its own multi-provider LLM abstraction) - click: never imported directly (transitive via typer, will still install) - toml: never imported (no tomli/tomllib/tomlkit either) - PyJWT: never imported (jwt source code removed earlier, declaration left behind)
Layout changes: - tests/integration/ removed (8 files redistributed: 5 to unit/, 3 to e2e/) - tests/cli/ → tests/e2e/cli/ - tests/real_api/ → tests/e2e/real_api/ - tests/integration/manual/ → tests/e2e/manual/ - pytest.ini and tests/conftest.py drop the integration marker Dead-code removal (1044 LOC): - test_smoke.py (assert True placeholder) - test_imports.py (zero-assert importlib smoke) - test_env_autoload.py (module-level skip — never runs in suite) - test_bash_chain_permissions.py (all @Skip with "needs rewrite") - test_benchmarks.py (MockLLM "benchmarks" + History API removed) - test_announce_only.py (skipif CI + if __name__ → demo, not test) Partial-skip cleanup (5 dead tests removed): - test_real_gemini_co.py::test_gemini_flash_via_co (permanent skip) - test_cli_deploy.py: test_deploy_requires_api_key, TestDeployCleanup, test_deploy_loads_api_key_from_env_file (skip + assert True + no asserts) - test_cli_auth_google.py::TestAuthGoogleIntegration (class-level skip) - test_cli_auth_microsoft.py::TestAuthMicrosoftIntegration (class-level skip) Metadata-only test removal (21 tests across 11 files): - All `assert handler._event_type == 'X'` style assertions - All `assert len(plugin) == N` / `plugin == [...]` literal export checks - Whole TestXxxPlugin classes when both methods were metadata-only - Kept test_events.py decorator tests (those test the decorator itself) Env-dependency fixes: - test_plugin_system.py: pass llm=MockLLM() instead of model="gpt-4o-mini" - test_config_permissions.py: same — avoids OPENAI_API_KEY requirement
The AST walker in extract_commands_from_bash and _extract_subcommands stopped at the first word of a command node and never visited child commandsubstitution nodes. Result: `echo $(rm -rf /)` reported only ["echo"], so an attacker who had `Bash(echo *)` approved could chain in any command via $() or `...` and the chain check would pass. Fix: after capturing the first word of a command node, recurse into each part's sub-AST (parts / list / command attrs) so substitutions are visited. Adds 25 unit tests covering: - single command / && / || / | / ; chains - $() and backtick substitution extraction - nested $(echo $(date)) - permission chain rejection when $() contains unauthorized command - wildcard / when / source / disallowed-permission paths
extract_and_authenticate did `abs(now - timestamp)` without type-checking the payload field. A signed request with timestamp="not-a-number" crashed the handler with TypeError, which surfaces as 500 to the caller instead of a clean unauthorized response. Fix: guard with `isinstance(timestamp, (int, float))` and reject as "unauthorized: timestamp must be numeric" before the math. Adds 10 attack-surface tests: - tamper-after-signing (prompt / to / extra field) all rejected - identity-swap (sig made with key A, from=key B) rejected - boundary timestamp (exactly at expiry, one past expiry) - timestamp=0 treated as missing - string timestamp returns clean error (regression for this fix) - replay-within-window documented as currently allowed - hex case-insensitivity verified
70+ new functional tests, no metadata-only assertions: - test_runtime_input.py (9): IO no-op gates, frame prefix injection, trace event fields, empty-prompt skip, missing-id defaults - test_ulw.py (16): mode_change state transitions, max-turns IO actions (continue/switch_mode/unknown/no-IO), prompt poll + injection - test_fuzzy.py (14): subsequence matching, case-insensitivity, score ordering (consecutive > scattered), word-boundary bonuses for / _ - . space - test_element_finder_format.py (13): LLM-facing format contract — empty list, text/placeholder/aria/role/input_type rendering, href query-stripping + 30-char truncation, frame=main implicit - test_session_merge.py (7): iteration tie-breaker, updated-timestamp fallback, missing-field defaults - test_session_storage.py (15): save/get round trip, last-write-wins, TTL expiry (running exempt), list dedup + sort, checkpoint 24h TTL - test_ws_ping.py (3): 30s interval send, multi-period loop, send-failure propagation - test_agent_error_paths.py (7): max iterations message, per-call override, stop_signal short-circuit, InsufficientCredits + generic LLM exception propagation, tool exception captured in trace - test_auto_compact.py (15): threshold + message-count gates, io event contract (compacting / done / error), _do_compact structure preservation, _format_messages_for_summary truncation rules - test_scroll.py (8 + 2 PIL-skip): strategy fallback chain, exception in one strategy doesn't short-circuit, screenshot diff defensiveness - test_exceptions.py (17): InsufficientCredits / LLMConnectionError / ProviderServiceError / ToolRejectedError attribute extraction, message formatting, __cause__ preservation, defensive defaults
User-facing docs: - CLAUDE.md: Test Organization section rewritten for unit + e2e two-layer structure; add Test Quality Rules section warning against metadata-only assertions - tests/README.md: directory tree + Run by Category + Test Categories sections rewritten - tests/TEST_ORGANIZATION.md: full rewrite — Strategy / Folder Structure / Markers / Decision Tree / Naming / Notes - tests/e2e/cli/README.md + aws/README.md: sh script paths updated to tests/e2e/cli/ prefix - docs/concepts/agent.md: "pytest -m real_api = integration tests" comment corrected Source LLM-Note docstring refs (13 files): - tests/cli/test_X.py → tests/e2e/cli/test_X.py - tests/real_api/test_X.py → tests/e2e/real_api/test_X.py - bash_parser.py: tests/integration/test_bash_chain_permissions.py (deleted) → tests/unit/test_bash_parser.py
… shadow
`useful_plugins/__init__.py` does `from .auto_compact import auto_compact`,
which rebinds the package attribute `connectonion.useful_plugins.auto_compact`
from the *module* to the plugin-export *list* of the same name. As a result
`import connectonion.useful_plugins.auto_compact as ac_module` actually binds
`ac_module` to the list, and on Python 3.10 mock.patch's string-path
resolution also lands on the list, breaking attribute lookup with:
AttributeError: [<function check_and_compact at 0x...>] does not have
the attribute '_do_compact'
Fix the test file to grab the real module from `sys.modules` and use
`patch.object(module, attr)` throughout. Same workaround applied for
`connectonion.llm_do` since `__init__.py` shadows it the same way.
Tests pass on Python 3.13 locally; CI will verify 3.10/3.11/3.12.
…_commands
handle_sub_sync_one / handle_sub_list / handle_sub_remove use Rich console
with inline markup (e.g. `0x` printed as `[bold red]0x[/bold red]`). Under
capsys the colored escape codes show up in the captured stdout and split
the literal substrings the tests look for:
'alice' is not a \x1b[1;31m0x\x1b[0m address...
↑ "not a 0x address" never matches
Fix on the test side: add a tiny `_plain()` helper that strips
`\x1b\[[0-9;]*m` and route every `readouterr().out` assertion through it.
The test failure was a pre-existing upstream issue (fails on plain main),
surfaced again on this PR's rebase target.
e55ca6b to
1aaa406
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Test infrastructure overhaul + two security bug fixes uncovered while writing tests.
Two bugs found and fixed during testing (both flagged in commit messages):
bash_parser.pypermission bypass — the AST walker stopped at the first word of each command and never descended intocommandsubstitutionnodes.echo $(rm -rf /)reported only["echo"], so a user withBash(echo *)approved could chain in any command via$()or backticks. Now recurses into substitutions.host/auth.py500 on malformed timestamp —abs(now - timestamp)raisedTypeErrorwhentimestampwas a string, surfacing as 500 instead of 401. Now guarded withisinstanceand returns clean unauthorized.Other changes:
unit/ande2e/(withe2e/cli/,e2e/real_api/,e2e/manual/subdirs). Markers auto-applied by folder.test_smoke,test_imports,test_env_autoload,test_bash_chain_permissions,test_benchmarks,test_announce_only— all integers' worth ofassert True/ module-level skips / "needs rewrite" markers.assert handler._event_type == 'X',assert len(plugin) == N,plugin == [...]— these break only on cosmetic refactors, not on real bugs. Kepttest_events.py(where the decorator IS the unit under test).runtime_input,ulw,bash_parser,fuzzy,element_finderformat,session/merge,session/storage,ws_router/ping, agent error paths,auto_compact,scroll, exceptions.pyproject.toml: dropped 4 unused dependencies (litellm,click,toml,PyJWT) — verified zero imports across the repo.clickis still installed transitively viatyper.test_plugin_system.pyandtest_config_permissions.pywere silently failing locally for anyone withoutOPENAI_API_KEY— now useMockLLM.Numbers
test_events.py)Test plan
pytest tests/unit— 1870 pass, 5 skip (PIL/Windows/optional TUI), 0 failpytest --collect-only— 2147 collected, 139 deselected (defaultnot real_api and not network)grep "tested by \[tests/cli\|tests/integration\|tests/real_api"returns 0)Commits
chore(deps)— drop 4 unused depstest: overhaul test layout— reorg + dead/metadata cleanup + MockLLM fixesfix(security)— bash_parser$()recursionfix(auth)— non-numeric timestamp → 401test: add unit tests for 10 previously-uncovered modulesdocs— test layout docs + LLM-Note paths