Files
odysseus/tests
Tal.Yuan fc1351d0f8 refactor(tools): split tool_implementations.py into src/tools/ package (#4423)
* test(tools): add shim protection test for tool_implementations split

Covers all 48 top-level functions (33 do_* + 15 _helpers) extracted from
the original module. Guards the upcoming split: the shim must re-export
every symbol so existing 'from src.tool_implementations import X' imports
keep working. Passes on baseline (pre-split).

* refactor(tools): add src/tools/ package with shared _common

Slice 1 Task 2 (#4082/#4071). Adds the package skeleton and moves the
shared _parse_tool_args helper into src/tools/_common.py. Domain modules
will import from here. tool_implementations.py is untouched at this step.

* refactor(tools): extract system domain into src/tools/system.py

Slice 1 (#4082/#4071), Task 3: move the system-domain tool functions
(do_manage_skills/_skill_dump/do_manage_tasks/do_manage_endpoints/
do_manage_mcp/do_manage_webhooks/do_manage_tokens/do_manage_settings/
do_api_call/do_app_api) and the app_api blocklist constants out of
tool_implementations.py into a new src/tools/system.py module.

tool_implementations.py re-imports all of them so it stays a working
backward-compatible facade (shim test stays green).

- do_manage_mcp resolves get_mcp_manager via a function-local import
  from tool_implementations so the test that patches
  src.tool_implementations.get_mcp_manager still applies post-move.
- do_app_api imports _internal_headers and _INTERNAL_BASE (still in
  tool_implementations) function-locally to avoid a circular import.
- Repoint test_context_budget introspection assertion to the moved
  code's new home in src/tools/system.py.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(tools): extract cookbook domain into src/tools/cookbook.py

Moves the model-serving (cookbook) tool domain out of tool_implementations.py
into src/tools/cookbook.py as part of slice 1 (#4082/#4071):

- 13 do_* tools: download/serve/list/stop/tail/search/adopt/cached models,
  list downloads/cancel, list cookbook servers, serve presets
- 9 private helpers: _cookbook_servers, _resolve_cookbook_host,
  _cookbook_env_for_host, _infer_serve_{port,host}, _ensure_served_endpoint,
  _cookbook_register_task, _cookbook_apply_retry_suggestion,
  _scan_running_model_processes, _cookbook_kill_session
- _MODEL_PROCESS_PATTERNS constant (used only by _scan_running_model_processes)

tool_implementations.py stays a backward-compatible facade via a re-import
from src.tools.cookbook; src/tools/__init__ re-exports the same symbols.

_internal_headers and _INTERNAL_BASE stay in tool_implementations.py (shared
by system.py's do_app_api and many cookbook funcs). Each cookbook function
that needs them does a function-local import to avoid a top-level circular
dependency, matching the system-domain split.

Verified: compileall clean; shim test green; cookbook-touching suite
(652 passed, 1 skipped); full suite 3587 passed, 2 failed
(pre-existing test_api_chat_security, unrelated).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(tools): extract search domain into src/tools/search.py

* refactor(tools): extract notes domain into src/tools/notes.py

* refactor(tools): extract calendar domain into src/tools/calendar.py

Repoints tests/test_caldav_bidirectional_sync.py source-introspection
to src/tools/calendar.py (do_manage_calendar moved there).

* refactor(tools): extract image domain into src/tools/image.py

* refactor(tools): extract research domain into src/tools/research.py

* refactor(tools): extract contacts domain into src/tools/contacts.py

* refactor(tools): extract vault domain into src/tools/vault.py

Repoints tests/test_vault_password_not_in_argv.py source-introspection
to src/tools/vault.py (the vault do_* helpers moved there).

* refactor(tools): collapse tool_implementations to clean re-export shim

Move shared _INTERNAL_BASE/_internal_headers to src/tools/_common.py and
drop the duplicate _parse_tool_args (already in _common). tool_implementations.py
is now a pure re-export facade (+ 3 pre-existing email-context helpers, out of
scope). Domain files' function-local imports of these names still resolve via
the facade re-export.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tools): port upstream cookbook workflow changes to split module

Rebase onto dev dropped c504214 ("Cookbook model workflow fixes") edits
to do_serve_model / do_tail_serve_output: the extraction commit moved
the pre-edit bodies into src/tools/cookbook.py and git auto-accepted the
deletion from tool_implementations.py, losing dev's changes. Restore them
in their post-split home:

- do_serve_model: add where/log_path/next_tools and the expanded
  "Next required check" output message
- do_tail_serve_output: empty-output fallback message replacing
  "(empty pane)"

(do_manage_settings web_fetch alias edit was already applied to
src/tools/system.py during the system-extract conflict resolution.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tools): break admin_tools circular import in split facade

After rebasing onto dev (#3629 moved the admin manage_* tools into
src/agent_tools/admin_tools), the facade re-exported them via a top-level
`from src.agent_tools.admin_tools import ...`. But src.agent_tools.__init__
imports this facade at top level, so the eager import re-entered the
partially-initialized agent_tools package and broke collection.

Re-export the admin symbols (do_manage_endpoints/mcp/webhooks/tokens/
settings, _MCP_DENIED_COMMANDS, _validate_mcp_command) lazily through
module __getattr__ instead, and drop them from src/tools/__init__ (they
no longer live in the src.tools package). system.py now holds only the
skills/tasks/api bridges; admin tools live solely in admin_tools.py,
matching upstream.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tools): re-export dropped helpers through the split shim

Address review finding from #4423: the compatibility facade claimed to
preserve every original top-level symbol but omitted three helpers the
old src.tool_implementations exposed. Re-export them and pin them in
the shim protection test:

- _string_arg, _validate_cookbook_ssh_target <- src/tools/cookbook.py
- _mcp_allowed_commands <- src/agent_tools/admin_tools.py (lazily via
  __getattr__, to keep the agent_tools.__init__ <-> facade import acyclic
  after the #3629 admin-tools migration)

All three added to tests/test_tool_implementations_shim.py _EXPECTED so
the test contract now matches its "every original top-level function"
comment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(tools): self-verify shim re-exports every domain do_*

The hand-maintained _EXPECTED list in the shim protection test can drift
silently when a new tool is added to a domain module but not re-exported
by the facade — exactly the omission a reviewer flagged post-split.
Add an auto-discovering test that enumerates every do_* from the domain
modules (incl. admin_tools) and asserts reachability through the shim,
so a forgotten re-export fails the build automatically.

Uses hasattr (not dir(ti)) because the admin symbols are re-exported
lazily via module __getattr__ and don't appear in dir(ti).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(tools): self-verify every in-repo facade import resolves

RaresKeY's P3 on the shim test was a claim-vs-reality gap: the docstring
said it protected "every from src.tool_implementations import X" but the
hand-maintained _EXPECTED list omitted three underscore helpers, so the
claim wasn't enforced. Re-exporting the three (cf1f5e3) fixed the known
gap; this closes the structural one.

Add test_every_facade_import_in_repo_resolves: ast-enumerate every
`from src.tool_implementations import X` site in src/ and tests/ and
assert hasattr(ti, X) for each. A forgotten re-export that anything in
the repo imports now fails the build automatically — including underscore
helpers, which the do_* discovery test does not cover.

Together with test_shim_reexports_every_domain_do_function, the shim
contract is now self-verifying. Demote _EXPECTED in the docstring to the
curated historical/downstream surface (the three helpers have no in-repo
consumer, so they stay manual by necessity) instead of "ground truth".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tools): dedupe _parse_tool_args + align shim guard with route consumers

Addresses two P3s from review (RaresKeY, 2026-06-26):

1. maintainability — _common carried a full copy of _parse_tool_args
   alongside the canonical src.tool_utils one; future parser fixes could
   diverge. The two bodies were byte-identical in logic, so _common now
   re-exports from tool_utils (a leaf module, no circular-import risk).
   The single-source test is extended to assert _common._parse_tool_args
   and tool_implementations._parse_tool_args are the same object as
   tool_utils._parse_tool_args.

2. test — the shim guard's import-site scan only walked src/ and tests/,
   missing routes/chat_routes.py's clear_active_email/set_active_email
   imports, and _EXPECTED omitted the active-email facade helpers. The
   scan now walks every first-party Python dir (pruning venvs/caches/data
   in-place), and set/get/clear_active_email are added to _EXPECTED
   (get_active_email has no in-repo importer, so the scan alone can't see
   it).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: yuandonghao <yuandonghao@cohl.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 15:40:04 +01:00
..
2026-05-31 23:58:26 +09:00
2026-06-01 02:22:17 +00:00
2026-05-31 23:58:26 +09:00

Test Suite Notes

Purpose

This file documents the shared test helpers and the review expectations that go with them. The suite is being refactored incrementally, so this is a working reference for that effort - not a claim that the suite is already fully organized. Read it before adding a new helper or before reviewing a PR that touches tests/helpers/.

For the broader rules - test taxonomy, determinism/isolation rules, the behavioral-vs-source-text policy, and helper/factory extraction rules - see TESTING_STANDARD.md. This file is the concrete helper reference; that file is the standard the refactor works toward.

Running focused subsets (taxonomy markers)

tests/conftest.py tags every test at collection time with two markers derived from its filename by tests/_taxonomy.py: an area_* marker (e.g. area_security) and a finer sub_* marker (e.g. sub_owner_scope). This adds markers only - it moves no files and changes no test behavior. Use them to run a focused slice:

./venv/bin/python -m pytest -m area_security
./venv/bin/python -m pytest -m "area_services and sub_cookbook"

Areas are security, routes, services, cli, js, helpers, unit, and uncategorized. Classification is conservative and token-based: a file that matches no area keyword falls back to area_uncategorized with its filename as the sub-area. The area_* names are registered in pyproject.toml; the dynamic sub_* names are registered before collection by pytest_configure in tests/conftest.py, so unknown-mark warnings still flag genuine typos.

For common focused runs, use tests/run_focus.py. It validates area and sub-area names, accepts sub-areas with or without the sub_ prefix, and passes extra pytest arguments after --:

./venv/bin/python tests/run_focus.py --area security
./venv/bin/python tests/run_focus.py --area services --sub-area cookbook
./venv/bin/python tests/run_focus.py --sub-area sub_cookbook
./venv/bin/python tests/run_focus.py --keyword taxonomy
./venv/bin/python tests/run_focus.py --last-failed
./venv/bin/python tests/run_focus.py --dry-run --area services --sub-area cookbook
./venv/bin/python tests/run_focus.py --area services -- --maxfail=1 -q

Fast lane and duration visibility

--fast runs the fast lane: the tests that are not marked slow (it adds the marker expression not slow). It composes with --area/--sub-area using and. Because no tests may be marked slow yet, --fast can initially match the full focused selection; it becomes a real speed-up as slow marks are added from duration evidence. Use it for quick local or reviewer feedback; it does not replace broader focused or full-suite validation before merge.

--durations N and --durations-min FLOAT add pytest's slowest-test reporting so you can see where time goes. They are reporting only and do not count as a focus selector, so --durations must be combined with a real selector (--area, --sub-area, --keyword, --last-failed, or --fast).

Use the project Python environment before running these commands. The examples use the repo's documented ./venv/bin/python path so they do not accidentally fall back to system Python.

./venv/bin/python tests/run_focus.py --fast
./venv/bin/python tests/run_focus.py --area services --fast
./venv/bin/python tests/run_focus.py --area services --durations 25
./venv/bin/python tests/run_focus.py --area services --fast --durations 25 --durations-min 0.05

The slow marker is opt-in. Mark a test slow only with duration evidence (from --durations), not by guessing - see the fast-lane policy in TESTING_STANDARD.md. --fast is for quick reviewer feedback and must not replace the full suite before merge. A slow mark only excludes a test from the fast lane; the test stays runnable directly, e.g.:

./venv/bin/python -m pytest tests/test_auth_config_lock_concurrency.py
./venv/bin/python -m pytest -m slow

Order-sensitivity reporting (report-only)

tests/run_order_report.py runs pytest with the collected test items shuffled by a seeded RNG, to surface order-sensitive tests (hidden coupling through shared import state, module caches, databases, etc.). It is report-only: it is not wired into CI, adds no gate, and changes no normal pytest collection or ordering - the shuffle exists only inside this runner. The seed is always printed, and pytest targets/options go after a literal --:

./venv/bin/python tests/run_order_report.py --seed 123 -- tests/cli/ -q
./venv/bin/python tests/run_order_report.py -- tests/cli/ -q   # generates and prints a seed

The same seed reproduces the same order when the reported working directory, pytest target arguments, and test environment are also the same. The runner prints all command arguments with shell-safe POSIX quoting and uses the invoking Python interpreter.

A generated-seed run starts with output like:

[order-report] working directory: /path/to/odysseus
[order-report] shuffling test order with seed 284734921
[order-report] reproduce from this working directory with the same test environment:
[order-report] reproduce with: /path/to/odysseus/venv/bin/python /path/to/odysseus/tests/run_order_report.py --seed 284734921 -- tests/cli/ -q

Run the printed command from the reported working directory to reproduce the same fixed-seed order:

[order-report] working directory: /path/to/odysseus
[order-report] shuffling test order with seed 284734921
[order-report] reproduce from this working directory with the same test environment:
[order-report] reproduce with: /path/to/odysseus/venv/bin/python /path/to/odysseus/tests/run_order_report.py --seed 284734921 -- tests/cli/ -q

Pytest output remains visible between the report header and footer. A failing run ends with pytest's normal failure report followed by:

FAILED tests/example_test.py::test_example - AssertionError
[order-report] seed 284734921: pytest exit code 1 (report-only; fix order-sensitive failures in separate scoped PRs)

Failures discovered this way are real isolation bugs: fix them in separate scoped PRs - do not silence them with skip/xfail, and do not "fix" them by depending on a particular order.

The runner propagates pytest's exit code, so it composes with normal local workflows; "report-only" means it is not a CI gate, not that failures are swallowed.

Core principles

  • Keep PRs small and homogeneous: one kind of change per PR.
  • Prefer explicit local setup over hidden global fixtures.
  • Avoid expanding the root conftest.py unless absolutely necessary.
  • Do not mix file moves with logic changes in the same PR.
  • Do not weaken tests with skip/xfail just to make CI pass.
  • Validate the focused files you changed, plus any neighboring or order-sensitive groups they interact with.

Helper conventions

The helpers below live under tests/helpers/. They exist to remove repeated boilerplate that already appeared across multiple tests. Reach for one only when your test matches its intended use; do not stretch a helper to cover a new case.

tests.helpers.cli_loader.load_script

Use when a test needs to import a script under scripts/ without repeating SourceFileLoader / importlib.util boilerplate.

  • Intended for script/CLI tests that load a single file from scripts/.
  • Not for arbitrary package imports - use a normal import for those.
  • When migrating an existing test to it, keep the existing stubs and assertions unchanged. Any sys.modules stubs the script needs at import time must still be injected (e.g. via monkeypatch) before calling load_script.

tests.helpers.import_state.clear_module

Use when a test must drop one cached module and its parent-package attribute before a fresh import.

  • Clears sys.modules[name].
  • Clears the parent-package attribute when present.
  • Good replacement for local sys.modules.pop(...) + delattr(parent, child) blocks.

tests.helpers.import_state.preserve_import_state

Use when a test temporarily installs stubs into sys.modules and needs deterministic cleanup afterward.

  • Context manager: restores both sys.modules entries and parent-package attributes on exit (normal or exception).
  • Useful around module-level stubs or temporary imports.
  • Prefer narrow, explicit module names over broad ones.

tests.helpers.import_state.clear_fake_database_modules

Use only for the guarded fake/stub database cleanup pattern.

  • Preserves a real-looking core.database (one with a string __file__).
  • Removes a fake/stub core.database and the related src.database state.
  • Do not use as a general database reset fixture.

tests.helpers.import_state.clear_fake_endpoint_resolver_modules

Use only for the guarded fake/stub src.endpoint_resolver cleanup pattern.

  • Preserves real resolver modules (those with a truthy __file__).
  • Evicts fake/stub resolver modules and the dependent route modules that were cached against them.
  • Accepts explicit extra dependent module names to evict alongside the defaults.

tests.helpers.sqlite_db.make_temp_sqlite

Use for the repeated file-backed temp sqlite setup in tests.

  • Only constructs (SessionLocal, engine, tmpfile) from the repeated block.
  • Does not patch modules and does not clean up the temp file.
  • The caller must bind SessionLocal explicitly onto whatever module the code under test reads, and must keep the returned objects alive.
  • Do not use it as a general DB fixture framework.

tests.helpers.db_stubs.make_core_db_stub

Use for small import-time core.database stubs with a placeholder SessionLocal.

  • Pass model names via models when MagicMock attributes are sufficient.
  • Pass attributes when an import needs exact placeholder values.
  • Set install_core_package=True only when the test also needs a fake parent core module stub.
  • Keep custom fake sessions and route-specific database behavior local.

What not to abstract yet

Some remaining patterns should stay as-is for now rather than being forced into helpers:

  • Large mixed files such as security/review regression files.
  • Broad setup-oriented sys.modules stub installers.
  • One-off custom module patching.
  • Custom DB session, route, and app setup.

Validation expectations

Run validation locally before opening or approving a PR. Practical checks:

  • git diff --check - catch whitespace and conflict-marker errors.
  • ./venv/bin/python -m py_compile <changed files> - confirm changed files compile.
  • Focused ./venv/bin/python -m pytest on the changed test files.
  • ./venv/bin/python -m pytest on neighboring or order-sensitive test groups that share import state with the changed files.
  • grep for the old boilerplate when replacing it, to confirm no stragglers remain.
  • A fresh audit worktree when changing the helpers themselves, so stale __pycache__ or import state cannot mask a regression.

Current roadmap

  1. Import-state cleanup - complete.
  2. Document helper conventions (this file).
  3. Pilot the repeated import-time core.database stub helper.
  4. Add further tiny helpers only when the repeated semantics are clear.
  5. Start low-risk file moves only after helper conventions are documented.
  6. Avoid moving high-risk security/route regression files first.