* refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils
Move all copies of _truncate(), get_mcp_manager(), and set_mcp_manager()
into a single leaf module (src/tool_utils.py) that imports only from
src.constants. This eliminates the lazy-import hack
('from src import agent_tools' inside function bodies) in tool_execution.py
and tool_implementations.py, and fixes a latent bug: the _truncate copy in
tool_execution.py was missing the isinstance guard and would crash on None.
Also deletes mcp_servers/_common.py — it was dead code with zero callers
anywhere in the codebase, containing its own copy of truncate() and
constants that already exist in src/constants.py.
* fix(tools): route remaining get_mcp_manager imports to src.tool_utils
The maintainer's feedback flagged src/task_scheduler.py:1857 and
routes/task_routes.py:977. A project-wide search found a third call site
in src/agent_loop.py that also imported get_mcp_manager from
src.agent_tools instead of src.tool_utils.
All three are now sourced from the canonical location in src.tool_utils.
---------
Co-authored-by: mcnoliveira <mcnoliveira@gmail.com>
MAX_OUTPUT_CHARS, MAX_READ_CHARS, and MAX_DIFF_LINES are now
defined once in src/constants.py and imported by the three files
that previously duplicated them (tool_execution.py,
tool_implementations.py, agent_tools.py). agent_tools.py re-exports
them for backward compatibility.
Co-authored-by: mcnoliveira <mcnoliveira@gmail.com>
* feat: Add plan mode to the chat agent
Adds a plan mode: the agent investigates read-only, proposes a checklist, and
waits for approval before changing anything. On approval it runs with full
tools and checks items off as it goes. Enforcement reuses the existing
disabled_tools gate.
Includes a slash command: `/plan [on|off]` (and `/toggle plan`) to flip the
plan toggle from the chat input.
- src/tool_security.py, src/mcp_manager.py: read-only allowlist (tools + MCP).
- src/agent_loop.py, routes/chat_routes.py: union the disabled set, prepend the
plan directive, force agent mode.
- static/: plan toggle pill, Approve & Run, dockable plan window, task-list
checkboxes, and the /plan slash command.
- tests/test_plan_mode.py.
* Plan mode: persistent re-referenceable plan + agent write-back
Three improvements so a long plan survives a weak model and stays in reach:
1. Re-reference the plan (out-of-context fix). On the execution turn the frontend
sends the approved checklist back (`approved_plan`); the backend pins it as a
top-of-context `## ACTIVE PLAN` system note (kept by the context trimmer), so
the agent can always re-read the plan instead of losing the thread on a long
run. New `build_active_plan_note()` (unit-tested).
2. Re-open / dock the plan anytime. The plan checklist is stored per-session
(localStorage). When a plan exists, the plan-mode button opens a small menu
("Show plan" / "Plan mode: On/Off") that re-opens the side-dockable plan
window — so it can stay docked while the agent works. The window live-refreshes
as the plan changes.
3. Agent write-back: new `update_plan` tool. The agent calls it to tick steps
`- [x]` after finishing them, or to revise steps when the user asks. Marker
tool (no I/O) → `plan_update` SSE event → the stored plan + docked window
update live. The ACTIVE PLAN note instructs the agent to use it.
Backend: src/agent_loop.py (param + pin + note builder + emit + prompt blurb),
src/tool_execution.py (update_plan handler), routes/chat_routes.py (parse
`approved_plan`, relay `plan_update`), registration in tool_schemas / agent_tools
/ tool_index (always-available, not admin-gated).
Frontend: static/js/chat.js (plan store, send `approved_plan`, handle
`plan_update`, capture restated checklists), static/app.js (plan-button menu),
static/js/planWindow.js (`isPlanWindowOpen`), static/js/storage.js (PLAN key).
Tests: tests/test_plan_mode.py (plan-note), tests/test_update_plan_tool.py.
* Plan mode: drop bash/python, rely on read-only discovery tools
Shell can mutate (write files, hit the network) and can't be constrained to
read-only at the tool layer, so plan mode no longer relies on a prompt to keep
it well-behaved — bash/python are removed from the read-only allowlist and added
to the fail-closed block set. Discovery is covered by the dedicated read-only
tools (read_file, grep, glob, ls) instead.
Rewrites the plan-mode directive to state shell is disabled and lists the
available read-only tools positively. Addresses review feedback on #638.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Comment: note _MCP_READONLY_VERBS are prefixes not whole words
Clarifies that entries like "summar" are intentional stems matched via
startswith (covers summarise/summarize/summary), not typos. Addresses review
feedback on #638.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Plan mode: clarify why gating inverts the allowlist into a denylist
Rename _PLAN_MODE_FALLBACK_BLOCK -> _PLAN_MODE_KNOWN_MUTATORS and rewrite the
comments. The tool gate is a denylist (disabled_tools); plan mode's policy is an
allowlist, so it returns the inverse (all known tool names minus the allowlist).
The static mutator set is a backstop for the schema-derived name list, which
misses XML-only tools and can fail to import. Addresses review feedback on #638.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Plan mode: stop hardcoding the read-only tool list in the directive
The model is already shown its available (read-only) tools by _assemble_prompt,
which removes every disabled tool. Enumerating them again in the directive only
duplicated that list and would drift as tools change. Point at the tools listed
below instead. Addresses review feedback on #638.
Let the agent pause and ask the user a multiple-choice question when a
task is genuinely ambiguous and the answer changes what it does next —
choosing between approaches, confirming an assumption, picking a target —
instead of guessing.
Modeled on the existing `ui_control` marker pattern: the `ask_user` tool
returns an `ask_user` payload that the agent loop emits as an SSE event
and then ends the turn. The frontend renders the question with clickable
option buttons, a free-text "Other" input, and an x to dismiss; the user's
choice is sent as the next message and the agent resumes with it in
context.
- src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O.
Validates a non-empty question + 2..6 options, normalizes string/object
options, returns the payload.
- src/agent_loop.py: emit the `ask_user` event and break the round loop so
the turn ends and waits for the user's selection. Stream the question as
assistant text so it persists/replays (prevents a re-ask loop).
- Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS,
FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any
user can be asked); the structured args serialize via the default
json.dumps path.
- routes/chat_routes.py: relay the `ask_user` event to the client.
- static/js/chat.js + static/style.css: render the question card (options +
free-text Other + dismiss x; removed once answered). Reuses CSS vars and
the .modal-close button; emoji go through the monochrome-SVG pipeline.
Bump chat.js cache pin.
- tests/test_ask_user_tool.py: payload, multi flag, string options, option
cap, validation errors, serializer round-trip, registration.
Gives the agent first-class code navigation instead of shelling out via bash
(token-heavy, unreliable on weaker models, unstructured). Mirrors the
Grep/Glob/Read primitives that Claude Code / opencode expose.
- grep: regex search over file contents across a tree. Uses ripgrep when
available (with explicit excludes so junk dirs are skipped even without a
.gitignore); falls back to a pure-Python walk+regex when rg is absent.
Returns file:line:match, capped.
- glob: find files by glob pattern (recursive), newest first.
- ls: list a directory (folders first, then files with sizes).
- read_file: optional offset/limit for line-range reads of large files
(plain-path calls stay back-compatible).
All confined by the same path policy as read_file (_resolve_tool_path:
data/tmp allowlist + sensitive-file deny). Junk dirs (.git, node_modules,
venv, __pycache__, dist/build, …) skipped. Output capped (200 hits,
400 chars/line). Admin-gated like the other filesystem tools.
Wiring: schemas + native arg->content serializer (src/tool_schemas.py), tool
tags (src/agent_tools.py), always-available + descriptions (src/tool_index.py),
admin gate (src/tool_security.py), dispatch + impls (src/tool_execution.py).
Tests: tests/test_code_nav_tools.py — match/skip-junk/ignore-case/glob-filter,
allowlist rejection, glob/ls, read-range, and the no-ripgrep Python fallback.
* Add edit_file tool + file-change diffs
edit_file is an exact old_string -> new_string replacement on a file on disk
(fails if old_string is missing or non-unique unless replace_all); write_file
also returns a unified diff. Diffs render collapsed in the tool bubble
(filename + +adds/-dels, theme colors); the raw JSON command box is hidden.
Security: edit_file is a sensitive filesystem-write tool, treated everywhere
write_file is —
- added to NON_ADMIN_BLOCKED_TOOLS (is_public_blocked_tool / blocked_tools_for_owner),
so on auth-enabled deployments a non-admin cannot run it; execute_tool_block
refuses it for non-admin owners.
- confined by the same path policy as read_file/write_file (allowlist +
sensitive-file deny) via _resolve_tool_path.
Disambiguation in tool descriptions + bash prompt: edit_file/write_file are the
only way to write files (they show a diff) — never edit_document (editor panel)
or a bash heredoc/redirect.
Tests (tests/test_edit_file.py): non-admin block (policy + execution gate),
successful edit, not-found old_string, non-unique old_string (+ replace_all),
and path outside the allowed roots.
Files: src/tool_execution.py, src/agent_loop.py, src/tool_schemas.py,
src/agent_tools.py, src/tool_index.py, static/js/chat.js, static/style.css,
tests/test_edit_file.py.
* Drop redundant import os in write_file closure
os is already imported at module top.
Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner:
* Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file.
* `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400.
* `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt).
Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-*` / `cookbook-*` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll.
`serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes.
Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop.
Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles.
`adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd.
Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).
* feat(web-fetch): add web_fetch tool to read a specific URL's content
* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution
Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.