odysseus/tests/test_plan_mode.py
Kenny Van de Maele 8ce945d338 feat: Add plan mode to the chat agent (#638)
* feat: Add plan mode to the chat agent

Adds a plan mode: the agent investigates read-only, proposes a checklist, and
waits for approval before changing anything. On approval it runs with full
tools and checks items off as it goes. Enforcement reuses the existing
disabled_tools gate.

Includes a slash command: `/plan [on|off]` (and `/toggle plan`) to flip the
plan toggle from the chat input.

- src/tool_security.py, src/mcp_manager.py: read-only allowlist (tools + MCP).
- src/agent_loop.py, routes/chat_routes.py: union the disabled set, prepend the
  plan directive, force agent mode.
- static/: plan toggle pill, Approve & Run, dockable plan window, task-list
  checkboxes, and the /plan slash command.
- tests/test_plan_mode.py.

* Plan mode: persistent re-referenceable plan + agent write-back

Three improvements so a long plan survives a weak model and stays in reach:

1. Re-reference the plan (out-of-context fix). On the execution turn the frontend
   sends the approved checklist back (`approved_plan`); the backend pins it as a
   top-of-context `## ACTIVE PLAN` system note (kept by the context trimmer), so
   the agent can always re-read the plan instead of losing the thread on a long
   run. New `build_active_plan_note()` (unit-tested).

2. Re-open / dock the plan anytime. The plan checklist is stored per-session
   (localStorage). When a plan exists, the plan-mode button opens a small menu
   ("Show plan" / "Plan mode: On/Off") that re-opens the side-dockable plan
   window — so it can stay docked while the agent works. The window live-refreshes
   as the plan changes.

3. Agent write-back: new `update_plan` tool. The agent calls it to tick steps
   `- [x]` after finishing them, or to revise steps when the user asks. Marker
   tool (no I/O) → `plan_update` SSE event → the stored plan + docked window
   update live. The ACTIVE PLAN note instructs the agent to use it.
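
The pinned note from item 1 can be sketched roughly as below. This is a minimal sketch, not the code in src/agent_loop.py: the function name `build_active_plan_note_sketch` and the exact wording of the note are assumptions. Only the `ACTIVE PLAN` heading, the embedded checklist, the "IN ORDER" guidance, and the empty-input behaviour come from the description and the unit test.

```python
def build_active_plan_note_sketch(approved_plan: str) -> str:
    """Return a system note that pins the approved checklist, or "" if empty.

    Hypothetical stand-in for build_active_plan_note(); the real wording
    of the note is not shown in this commit message.
    """
    plan = (approved_plan or "").strip()
    if not plan:
        # Never inject a blank pin into the context.
        return ""
    return (
        "## ACTIVE PLAN\n"
        "Execute the steps below IN ORDER. After finishing a step, call "
        "`update_plan` to mark it `- [x]`.\n\n"
        f"{plan}\n"
    )


if __name__ == "__main__":
    print(build_active_plan_note_sketch("- [ ] step one\n- [ ] step two"))
```

Returning an empty string for blank input lets the caller skip the pin with a simple truthiness check.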

Backend: src/agent_loop.py (param + pin + note builder + emit + prompt blurb),
src/tool_execution.py (update_plan handler), routes/chat_routes.py (parse
`approved_plan`, relay `plan_update`), registration in tool_schemas / agent_tools
/ tool_index (always-available, not admin-gated).
Frontend: static/js/chat.js (plan store, send `approved_plan`, handle
`plan_update`, capture restated checklists), static/app.js (plan-button menu),
static/js/planWindow.js (`isPlanWindowOpen`), static/js/storage.js (PLAN key).
Tests: tests/test_plan_mode.py (plan-note), tests/test_update_plan_tool.py.
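
The write-back in item 3 boils down to rewriting checklist markers in the stored plan. The commit message does not say whether `update_plan` receives a whole revised checklist or a step reference, so the sketch below assumes a hypothetical helper `tick_step` that ticks one item by its 0-based position. It illustrates the `- [ ]` to `- [x]` rewrite only, not the handler in src/tool_execution.py.

```python
import re

# Matches a markdown task-list item such as "- [ ] step" or "  * [x] step".
_TASK_ITEM = re.compile(r"^(\s*[-*] )\[( |x|X)\](.*)$")


def tick_step(plan: str, step_index: int) -> str:
    """Mark the step_index-th checklist item (0-based) as done.

    Hypothetical helper: lines that are not task items are left untouched,
    and an out-of-range index returns the plan unchanged.
    """
    lines = plan.splitlines()
    seen = -1
    for i, line in enumerate(lines):
        match = _TASK_ITEM.match(line)
        if match is None:
            continue
        seen += 1
        if seen == step_index:
            lines[i] = f"{match.group(1)}[x]{match.group(3)}"
            break
    return "\n".join(lines)


if __name__ == "__main__":
    print(tick_step("- [ ] step one\n- [ ] step two", 1))
```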

* Plan mode: drop bash/python, rely on read-only discovery tools

Shell can mutate (write files, hit the network) and can't be constrained to
read-only at the tool layer, so plan mode no longer relies on a prompt to keep
it well-behaved — bash/python are removed from the read-only allowlist and added
to the fail-closed block set. Discovery is covered by the dedicated read-only
tools (read_file, grep, glob, ls) instead.

Rewrites the plan-mode directive to state that shell is disabled and to list the
available read-only tools positively. Addresses review feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Comment: note _MCP_READONLY_VERBS are prefixes not whole words

Clarifies that entries like "summar" are intentional stems matched via
startswith (covers summarise/summarize/summary), not typos. Addresses review
feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
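
For illustration, stem matching combined with server hints might look like the sketch below. The stem tuple `_READONLY_VERB_STEMS` and the function name are hypothetical; the real `_MCP_READONLY_VERBS` contents are not shown here apart from "summar". The precedence (server-provided `readOnlyHint` / `destructiveHint` first, then the leading-verb heuristic, failing closed otherwise) follows what tests/test_plan_mode.py asserts.

```python
# Hypothetical stems; "summar" covers summarise/summarize/summary.
_READONLY_VERB_STEMS = ("get", "list", "search", "read", "find", "summar")


def mcp_tool_is_readonly_sketch(tool: dict) -> bool:
    """Classify an MCP tool description as read-only (True) or not (False)."""
    annotations = tool.get("annotations") or {}
    # Server-provided hints win over the name heuristic.
    if annotations.get("destructiveHint") is True:
        return False
    hint = annotations.get("readOnlyHint")
    if isinstance(hint, bool):
        return hint
    # No hint: look at the leading verb and match it against the stems.
    # str.startswith accepts a tuple, so one call checks every stem.
    leading_verb = str(tool.get("name", "")).lower().split("_", 1)[0]
    return leading_verb.startswith(_READONLY_VERB_STEMS)


if __name__ == "__main__":
    for name in ("summarize_doc", "list_files", "send_message", "frobnicate"):
        print(name, mcp_tool_is_readonly_sketch({"name": name}))
```

Unknown verbs such as `frobnicate` fall through to `False`, so an unrecognised tool is blocked rather than allowed.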

* Plan mode: clarify why gating inverts the allowlist into a denylist

Rename _PLAN_MODE_FALLBACK_BLOCK -> _PLAN_MODE_KNOWN_MUTATORS and rewrite the
comments. The tool gate is a denylist (disabled_tools); plan mode's policy is an
allowlist, so it returns the inverse (all known tool names minus the allowlist).
The static mutator set is a backstop for the schema-derived name list, which
misses XML-only tools and can fail to import. Addresses review feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Plan mode: stop hardcoding the read-only tool list in the directive

The model is already shown its available (read-only) tools by _assemble_prompt,
which removes every disabled tool. Enumerating them again in the directive only
duplicated that list and would drift as tools change. Point at the tools listed
below instead. Addresses review feedback on #638.
2026-06-05 16:32:25 +02:00

105 lines
4.3 KiB
Python

"""Plan mode gating regression tests.

Plan mode restricts the agent to read-only/inspection tools so it can investigate
and propose a plan without mutating anything. These tests pin the
security-relevant contract:

- The read-only allowlist contains only inspection tools (no writes/sends/manage_*).
- `plan_mode_disabled_tools()` blocks every mutating tool and never blocks an
  allowlisted one.
- It fails CLOSED: if the tool-schema list can't be loaded, it still blocks a
  known-mutating set rather than returning nothing (which would allow mutations).

Pure-function tests: no FastAPI app boot, no DB.
"""

from src.tool_security import (
    PLAN_MODE_READONLY_TOOLS,
    _PLAN_MODE_KNOWN_MUTATORS,
    plan_mode_disabled_tools,
)


def test_allowlist_has_no_obvious_mutating_tools():
    # Sanity: the read-only allowlist must not contain mutating/external tools.
    mutating_markers = ("write_", "send_", "manage_", "create_", "edit_", "delete_")
    for name in PLAN_MODE_READONLY_TOOLS:
        assert not name.startswith(mutating_markers), f"{name} should not be read-only"


def test_plan_mode_blocks_mutating_tools():
    disabled = plan_mode_disabled_tools()
    # A representative spread of mutating/external tools must be blocked.
    for name in (
        "write_file", "send_email", "reply_to_email", "manage_memory",
        "manage_settings", "create_document", "edit_document", "download_model",
        "generate_image", "trigger_research",
    ):
        assert name in disabled, f"{name} must be blocked in plan mode"


def test_plan_mode_allows_readonly_tools():
    disabled = plan_mode_disabled_tools()
    # Read-only investigation tools stay enabled, including the discovery tools
    # (grep/glob/ls) that replace freestyle shell.
    for name in ("read_file", "grep", "glob", "ls", "web_search", "web_fetch", "search_chats"):
        assert name not in disabled, f"{name} should be usable in plan mode"


def test_plan_mode_blocks_shell():
    # bash/python can mutate and can't be constrained read-only, so plan mode
    # must block them (the whole point of dropping shell from plan mode).
    disabled = plan_mode_disabled_tools()
    for name in ("bash", "python"):
        assert name in disabled, f"{name} must be blocked in plan mode"


def test_disabled_never_intersects_allowlist():
    assert plan_mode_disabled_tools() & PLAN_MODE_READONLY_TOOLS == set()


def test_mcp_readonly_classification():
    from src.mcp_manager import mcp_tool_is_readonly as ro

    # Server-provided hints win over the name heuristic.
    assert ro({"name": "zap", "annotations": {"readOnlyHint": True}}) is True
    assert ro({"name": "list_things", "annotations": {"readOnlyHint": False}}) is False
    assert ro({"name": "get_x", "annotations": {"destructiveHint": True}}) is False
    # No hint → leading-verb heuristic, fail closed for ambiguous names.
    assert ro({"name": "list_files"}) is True
    assert ro({"name": "search_docs"}) is True
    assert ro({"name": "send_message"}) is False
    assert ro({"name": "frobnicate"}) is False


def test_fail_closed_fallback_blocks_mutations(monkeypatch):
    # If the schema list can't load, we must still block (fail closed), not
    # return an empty set that would silently allow every mutating tool.
    import src.tool_security as ts

    def _boom():
        raise ImportError("simulated circular import failure")

    # Force the dynamic path to fail by making the lazy import explode.
    monkeypatch.setitem(
        __import__("sys").modules, "src.agent_tools", None
    )
    disabled = ts.plan_mode_disabled_tools()
    assert disabled, "plan mode must never fail open (empty disabled set)"
    assert "write_file" in disabled
    assert "send_email" in disabled
    assert disabled == set(_PLAN_MODE_KNOWN_MUTATORS)


def test_active_plan_note_pins_checklist():
    """The approved-plan note re-grounds execution so a long plan survives
    history truncation (the agent can always re-read it)."""
    from src.agent_loop import build_active_plan_note

    plan = "- [ ] step one\n- [ ] step two"
    note = build_active_plan_note(plan)
    assert "ACTIVE PLAN" in note
    assert plan in note  # the actual checklist is embedded
    assert "IN ORDER" in note  # execution guidance present
    # Empty input → no note (so we never inject a blank pin).
    assert build_active_plan_note("") == ""
    assert build_active_plan_note("   ") == ""