Files
odysseus/tests
Kenny Van de Maele 8ce945d338 feat: Add plan mode to the chat agent (#638)
* feat: Add plan mode to the chat agent

Adds a plan mode: the agent investigates read-only, proposes a checklist, and
waits for approval before changing anything. On approval it runs with full
tools and checks items off as it goes. Enforcement reuses the existing
disabled_tools gate.

Includes a slash command: `/plan [on|off]` (and `/toggle plan`) to flip the
plan toggle from the chat input.

- src/tool_security.py, src/mcp_manager.py: read-only allowlist (tools + MCP).
- src/agent_loop.py, routes/chat_routes.py: union the disabled set, prepend the
  plan directive, force agent mode.
- static/: plan toggle pill, Approve & Run, dockable plan window, task-list
  checkboxes, and the /plan slash command.
- tests/test_plan_mode.py.

* Plan mode: persistent re-referenceable plan + agent write-back

Three improvements so a long plan survives a weak model and stays in reach:

1. Re-reference the plan (out-of-context fix). On the execution turn the frontend
   sends the approved checklist back (`approved_plan`); the backend pins it as a
   top-of-context `## ACTIVE PLAN` system note (kept by the context trimmer), so
   the agent can always re-read the plan instead of losing the thread on a long
   run. New `build_active_plan_note()` (unit-tested).

2. Re-open / dock the plan anytime. The plan checklist is stored per-session
   (localStorage). When a plan exists, the plan-mode button opens a small menu
   ("Show plan" / "Plan mode: On/Off") that re-opens the side-dockable plan
   window — so it can stay docked while the agent works. The window live-refreshes
   as the plan changes.

3. Agent write-back: new `update_plan` tool. The agent calls it to tick steps
   `- [x]` after finishing them, or to revise steps when the user asks. Marker
   tool (no I/O) → `plan_update` SSE event → the stored plan + docked window
   update live. The ACTIVE PLAN note instructs the agent to use it.

Backend: src/agent_loop.py (param + pin + note builder + emit + prompt blurb),
src/tool_execution.py (update_plan handler), routes/chat_routes.py (parse
`approved_plan`, relay `plan_update`), registration in tool_schemas / agent_tools
/ tool_index (always-available, not admin-gated).
Frontend: static/js/chat.js (plan store, send `approved_plan`, handle
`plan_update`, capture restated checklists), static/app.js (plan-button menu),
static/js/planWindow.js (`isPlanWindowOpen`), static/js/storage.js (PLAN key).
Tests: tests/test_plan_mode.py (plan-note), tests/test_update_plan_tool.py.

* Plan mode: drop bash/python, rely on read-only discovery tools

Shell can mutate (write files, hit the network) and can't be constrained to
read-only at the tool layer, so plan mode no longer relies on a prompt to keep
it well-behaved — bash/python are removed from the read-only allowlist and added
to the fail-closed block set. Discovery is covered by the dedicated read-only
tools (read_file, grep, glob, ls) instead.

Rewrites the plan-mode directive to state shell is disabled and lists the
available read-only tools positively. Addresses review feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Comment: note _MCP_READONLY_VERBS are prefixes not whole words

Clarifies that entries like "summar" are intentional stems matched via
startswith (covers summarise/summarize/summary), not typos. Addresses review
feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Plan mode: clarify why gating inverts the allowlist into a denylist

Rename _PLAN_MODE_FALLBACK_BLOCK -> _PLAN_MODE_KNOWN_MUTATORS and rewrite the
comments. The tool gate is a denylist (disabled_tools); plan mode's policy is an
allowlist, so it returns the inverse (all known tool names minus the allowlist).
The static mutator set is a backstop for the schema-derived name list, which
misses XML-only tools and can fail to import. Addresses review feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Plan mode: stop hardcoding the read-only tool list in the directive

The model is already shown its available (read-only) tools by _assemble_prompt,
which removes every disabled tool. Enumerating them again in the directive only
duplicated that list and would drift as tools change. Point at the tools listed
below instead. Addresses review feedback on #638.
2026-06-05 16:32:25 +02:00
..
2026-06-01 02:22:17 +00:00

Test Suite Notes

Purpose

This file documents the shared test helpers and the review expectations that go with them. The suite is being refactored incrementally, so this is a working reference for that effort — not a claim that the suite is already fully organized. Read it before adding a new helper or before reviewing a PR that touches tests/helpers/.

Core principles

  • Keep PRs small and homogeneous: one kind of change per PR.
  • Prefer explicit local setup over hidden global fixtures.
  • Avoid expanding the root conftest.py unless absolutely necessary.
  • Do not mix file moves with logic changes in the same PR.
  • Do not weaken tests with skip/xfail just to make CI pass.
  • Validate the focused files you changed, plus any neighboring or order-sensitive groups they interact with.

Helper conventions

The helpers below live under tests/helpers/. They exist to remove repeated boilerplate that already appeared across multiple tests. Reach for one only when your test matches its intended use; do not stretch a helper to cover a new case.

tests.helpers.cli_loader.load_script

Use when a test needs to import a script under scripts/ without repeating SourceFileLoader / importlib.util boilerplate.

  • Intended for script/CLI tests that load a single file from scripts/.
  • Not for arbitrary package imports — use a normal import for those.
  • When migrating an existing test to it, keep the existing stubs and assertions unchanged. Any sys.modules stubs the script needs at import time must still be injected (e.g. via monkeypatch) before calling load_script.

tests.helpers.import_state.clear_module

Use when a test must drop one cached module and its parent-package attribute before a fresh import.

  • Clears sys.modules[name].
  • Clears the parent-package attribute when present.
  • Good replacement for local sys.modules.pop(...) + delattr(parent, child) blocks.

tests.helpers.import_state.preserve_import_state

Use when a test temporarily installs stubs into sys.modules and needs deterministic cleanup afterward.

  • Context manager: restores both sys.modules entries and parent-package attributes on exit (normal or exception).
  • Useful around module-level stubs or temporary imports.
  • Prefer narrow, explicit module names over broad ones.

tests.helpers.import_state.clear_fake_database_modules

Use only for the guarded fake/stub database cleanup pattern.

  • Preserves a real-looking core.database (one with a string __file__).
  • Removes a fake/stub core.database and the related src.database state.
  • Do not use as a general database reset fixture.

tests.helpers.import_state.clear_fake_endpoint_resolver_modules

Use only for the guarded fake/stub src.endpoint_resolver cleanup pattern.

  • Preserves real resolver modules (those with a truthy __file__).
  • Evicts fake/stub resolver modules and the dependent route modules that were cached against them.
  • Accepts explicit extra dependent module names to evict alongside the defaults.

What not to abstract yet

Some remaining patterns should stay as-is for now rather than being forced into helpers:

  • Large mixed files such as security/review regression files.
  • Setup-oriented sys.modules stub installers.
  • One-off custom module patching.
  • DB/session/route setup, until it has been audited separately.

Validation expectations

Run validation locally before opening or approving a PR. Practical checks:

  • git diff --check — catch whitespace and conflict-marker errors.
  • python3 -m py_compile <changed files> — confirm changed files compile.
  • Focused pytest on the changed test files.
  • pytest on neighboring or order-sensitive test groups that share import state with the changed files.
  • grep for the old boilerplate when replacing it, to confirm no stragglers remain.
  • A fresh audit worktree when changing the helpers themselves, so stale __pycache__ or import state cannot mask a regression.

Current roadmap

  1. Import-state cleanup — complete.
  2. Document helper conventions (this file).
  3. Audit fake DB / SessionLocal / route setup duplication.
  4. Add tiny helpers only when the repeated semantics are clear.
  5. Start low-risk file moves only after helper conventions are documented.
  6. Avoid moving high-risk security/route regression files first.