odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-15 17:25:26 -04:00

Author	SHA1	Message	Date
Kenny Van de Maele	620fdd0859	feat(agent): confine agent file/shell tools to a selectable workspace (#3665) * feat(agent): workspace confinement via context-local binding + get_workspace tool Bind the per-turn workspace once in execute_tool_block; the shared path resolvers (_resolve_tool_path / _resolve_search_root) and the subprocess cwd helper (agent_cwd) read it, so file tools + bash/python are confined centrally and a new tool that uses the shared helpers cannot accidentally bypass it. Adds the admin-gated /api/workspace/browse picker, a workspace pill + directory modal (reusing existing modal/button CSS), the /workspace slash command, and a get_workspace tool (replaces a system-prompt block). Confinement is OS-agnostic (realpath/normcase/commonpath) and docker-safe (container paths, no host assumptions). Reopens #2023. * ux(workspace): clarify workspace is not a sandbox Picker modal note + pill tooltip + get_workspace tool/output wording now state plainly: read_file/write_file/edit_file/grep/glob/ls are confined to the folder, but bash/python only start there (cwd) and are not sandboxed. Modal note reuses the existing .muted class. * fix(agent): treat an active workspace as file-work intent A vague low-signal message (e.g. "look at the local project") matches no domain keywords, so tool retrieval is skipped and only always-available tools are offered — leaving the agent with no file access even though a workspace is set. When a workspace is active, include the file/code tools (incl. get_workspace) on low-signal turns so the agent can act on the folder. Also requires the tool index (ChromaDB) to be reachable for normal retrieval; that is an environment dependency, not part of this change. * ux(workspace): hide pill + overflow entry in chat mode Workspace only scopes the agent's file/shell tools, so the pill and the overflow 'Workspace' entry are agent-only now — hidden in chat mode like the bash toggle. 
Mode read from the DOM in syncWorkspaceIndicator; applyMode() is called from the agent/chat setMode handler. * prompt(tools): steer bash/python to defer to the dedicated file tools bash/python schema descriptions (what native-tool-calling models read) were bare and gave no steer, so models would do file ops via the shell (e.g. writing SVG/HTML, which then dumps raw markup into the tool preview). Tell bash/python in the schema + tool-index + prompt section to prefer read_file/write_file/ edit_file/grep/glob/ls and only be used for what those do not cover. * prompt(tools): keep bash/python deferral generic (no hardcoded tool names) Reference 'a dedicated tool' rather than listing read_file/write_file/grep/etc. by name, so the guidance does not go stale if those tools are renamed. * style(workspace): drop em-dashes from added code comments/strings * ux(workspace): terser non-sandbox note in picker (no tool-name list) * ux(workspace): mirror terse non-sandbox wording in pill tooltip * chore: untrack local venv symlink (run-only, not part of the feature) * prompt(workspace): keep get_workspace text generic (no hardcoded tool names) * fix(agent): low-signal + workspace surfaces only read-only file tools Intersect the files tool group with PLAN_MODE_READONLY_TOOLS so a vague message in a workspace exposes read_file/grep/glob/ls/get_workspace for exploration, but not write_file/edit_file/bash/python -- those wait for a request that actually calls for them (RAG retrieval still adds them on a real ask). * feat(workspace): cap browse listing at 500 dirs with a truncated hint Mirror the filesystem_tools._CODENAV_MAX_HITS pattern with a module-local _MAX_BROWSE_DIRS so a directory with thousands of children does not dump every row into the picker; the response carries a truncated flag and the modal tells the user to type a path to jump in. 
* chore: untrack local venv symlink (run-only artifact) * fix(workspace): vet the workspace root against the sensitive-path deny list at bind time The in-workspace resolver deny-lists sensitive paths inside the workspace, but the empty-path search root is the workspace itself, so a workspace of ~/.ssh could be listed via ls with no path. vet_workspace() (public, in tool_execution next to the resolvers) rejects non-directories and sensitive roots before the path is ever bound; chat_routes uses it instead of its inline isdir check. * fix(workspace): reject filesystem roots and stop showing rejected workspaces as active Review findings from #3665: P2: vet_workspace accepted / (and would accept drive/UNC roots), which makes every absolute path 'inside' the workspace and collapses confinement into host-wide file access. A root is its own dirname, so reject when dirname(resolved) == resolved; the browse response now carries a selectable flag and the picker disables 'Use this folder' on unselectable dirs. P3: /workspace set stored any string client-side and the chat route silently dropped rejected values, so the pill could claim a confinement that was not in effect. New admin-gated /api/workspace/vet validates manual paths before they persist (canonical path returned), and when a posted workspace is rejected at send time the stream emits workspace_rejected so the client clears the stored value and toasts instead of continuing silently. * fix(workspace): check caller privilege before vetting the posted workspace Review finding: /api/chat_stream called vet_workspace() on the posted value for every caller and emitted workspace_rejected on failure, so a non-admin who can chat but cannot use file/shell tools could distinguish existing directories from missing/file/sensitive/root paths by whether the event appeared. 
The resolution now lives in _resolve_request_workspace, which drops the submitted value uniformly for non-admin callers, with no vetting and no event, before the path ever touches the filesystem. Admin and single-user behavior is unchanged. Test pins that valid and invalid paths are indistinguishable for a non-admin and that vet_workspace is never invoked for them.	2026-06-11 18:17:54 +02:00
pewdiepie-archdaemon	1a529d63d9	Fix remaining CI regressions	2026-06-09 10:21:56 +09:00
pewdiepie-archdaemon	3b01760e95	Prepare tested main sync cleanup	2026-06-09 09:34:42 +09:00
RaresKeY	3a91c11ff8	fix: block app_api access to Cookbook host controls (#3231)	2026-06-07 19:20:11 +02:00
RaresKeY	a3784da172	fix: block app_api access to shell routes (#3225)	2026-06-07 15:19:08 +02:00
Nicholai	33edc40eae	fix: route misfenced web lookups to web tools Fixes #3067	2026-06-06 03:46:31 -06:00
Nicholai	86abcb75d0	fix: split Chroma embedding lanes (#3046)	2026-06-06 03:17:19 -06:00
Nicholai	463713c2c6	feat(search): unify session transcript search (#2877)	2026-06-05 18:08:31 -06:00
Kenny Van de Maele	8ce945d338	feat: Add plan mode to the chat agent (#638) * feat: Add plan mode to the chat agent Adds a plan mode: the agent investigates read-only, proposes a checklist, and waits for approval before changing anything. On approval it runs with full tools and checks items off as it goes. Enforcement reuses the existing disabled_tools gate. Includes a slash command: `/plan [on|off]` (and `/toggle plan`) to flip the plan toggle from the chat input. - src/tool_security.py, src/mcp_manager.py: read-only allowlist (tools + MCP). - src/agent_loop.py, routes/chat_routes.py: union the disabled set, prepend the plan directive, force agent mode. - static/: plan toggle pill, Approve & Run, dockable plan window, task-list checkboxes, and the /plan slash command. - tests/test_plan_mode.py. * Plan mode: persistent re-referenceable plan + agent write-back Three improvements so a long plan survives a weak model and stays in reach: 1. Re-reference the plan (out-of-context fix). On the execution turn the frontend sends the approved checklist back (`approved_plan`); the backend pins it as a top-of-context `## ACTIVE PLAN` system note (kept by the context trimmer), so the agent can always re-read the plan instead of losing the thread on a long run. New `build_active_plan_note()` (unit-tested). 2. Re-open / dock the plan anytime. The plan checklist is stored per-session (localStorage). When a plan exists, the plan-mode button opens a small menu ("Show plan" / "Plan mode: On/Off") that re-opens the side-dockable plan window — so it can stay docked while the agent works. The window live-refreshes as the plan changes. 3. Agent write-back: new `update_plan` tool. The agent calls it to tick steps `- [x]` after finishing them, or to revise steps when the user asks. Marker tool (no I/O) → `plan_update` SSE event → the stored plan + docked window update live. The ACTIVE PLAN note instructs the agent to use it. 
Backend: src/agent_loop.py (param + pin + note builder + emit + prompt blurb), src/tool_execution.py (update_plan handler), routes/chat_routes.py (parse `approved_plan`, relay `plan_update`), registration in tool_schemas / agent_tools / tool_index (always-available, not admin-gated). Frontend: static/js/chat.js (plan store, send `approved_plan`, handle `plan_update`, capture restated checklists), static/app.js (plan-button menu), static/js/planWindow.js (`isPlanWindowOpen`), static/js/storage.js (PLAN key). Tests: tests/test_plan_mode.py (plan-note), tests/test_update_plan_tool.py. * Plan mode: drop bash/python, rely on read-only discovery tools Shell can mutate (write files, hit the network) and can't be constrained to read-only at the tool layer, so plan mode no longer relies on a prompt to keep it well-behaved — bash/python are removed from the read-only allowlist and added to the fail-closed block set. Discovery is covered by the dedicated read-only tools (read_file, grep, glob, ls) instead. Rewrites the plan-mode directive to state shell is disabled and lists the available read-only tools positively. Addresses review feedback on #638. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Comment: note _MCP_READONLY_VERBS are prefixes not whole words Clarifies that entries like "summar" are intentional stems matched via startswith (covers summarise/summarize/summary), not typos. Addresses review feedback on #638. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Plan mode: clarify why gating inverts the allowlist into a denylist Rename _PLAN_MODE_FALLBACK_BLOCK -> _PLAN_MODE_KNOWN_MUTATORS and rewrite the comments. The tool gate is a denylist (disabled_tools); plan mode's policy is an allowlist, so it returns the inverse (all known tool names minus the allowlist). The static mutator set is a backstop for the schema-derived name list, which misses XML-only tools and can fail to import. Addresses review feedback on #638. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Plan mode: stop hardcoding the read-only tool list in the directive The model is already shown its available (read-only) tools by _assemble_prompt, which removes every disabled tool. Enumerating them again in the directive only duplicated that list and would drift as tools change. Point at the tools listed below instead. Addresses review feedback on #638.	2026-06-05 16:32:25 +02:00
Kenny Van de Maele	0a2adc9c96	Add ask_user tool: agent-posed multiple-choice questions (#2111) Let the agent pause and ask the user a multiple-choice question when a task is genuinely ambiguous and the answer changes what it does next — choosing between approaches, confirming an assumption, picking a target — instead of guessing. Modeled on the existing `ui_control` marker pattern: the `ask_user` tool returns an `ask_user` payload that the agent loop emits as an SSE event and then ends the turn. The frontend renders the question with clickable option buttons, a free-text "Other" input, and an x to dismiss; the user's choice is sent as the next message and the agent resumes with it in context. - src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O. Validates a non-empty question + 2..6 options, normalizes string/object options, returns the payload. - src/agent_loop.py: emit the `ask_user` event and break the round loop so the turn ends and waits for the user's selection. Stream the question as assistant text so it persists/replays (prevents a re-ask loop). - Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS, FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any user can be asked); the structured args serialize via the default json.dumps path. - routes/chat_routes.py: relay the `ask_user` event to the client. - static/js/chat.js + static/style.css: render the question card (options + free-text Other + dismiss x; removed once answered). Reuses CSS vars and the .modal-close button; emoji go through the monochrome-SVG pipeline. Bump chat.js cache pin. - tests/test_ask_user_tool.py: payload, multi flag, string options, option cap, validation errors, serializer round-trip, registration.	2026-06-05 11:49:11 +02:00
Kenny Van de Maele	367858a587	Merge branch 'main' into dev Bring main's maintainer-curated work (cookbook scheduler, calendar rendering/sync, settings polish, agent debug loop) into dev so dev is a superset of main (resolves the dev/main drift, #2543).	2026-06-05 10:50:51 +02:00
Nicholai	4df4cfeaff	Merge pull request #2387 from cirim-au/fix/manage-memory-always-available fix(tool_index): add manage_memory to ALWAYS_AVAILABLE	2026-06-05 02:14:10 -06:00
pewdiepie-archdaemon	f8aaeab245	Merge remote-tracking branch 'origin/dev'	2026-06-05 12:14:34 +09:00
pewdiepie-archdaemon	f19ac6ed03	Merge branch 'main' of github.com:pewdiepie-archdaemon/odysseus # Conflicts: # static/js/cookbookRunning.js	2026-06-05 11:23:15 +09:00
Kenny Van de Maele	7b4365fe57	Make write_file/edit_file always-available like read_file (#2684) read_file/grep/glob/ls are in ALWAYS_AVAILABLE but the on-disk write tools (write_file, edit_file) were only surfaced via per-query tool-RAG retrieval. On a bare 'edit X' request the retriever could miss them, so the model was never offered edit_file/write_file and wrongly fell back to edit_document (editor panel) or improvised with bash sed. Add both to ALWAYS_AVAILABLE next to read_file; they stay admin-gated by tool_security so non-admin exposure is unchanged. Fixes #2683	2026-06-05 00:02:14 +02:00
Kenny Van de Maele	1f00fff837	feat: add code-navigation tools (grep, glob, ls) + read_file line ranges (#1670) Gives the agent first-class code navigation instead of shelling out via bash (token-heavy, unreliable on weaker models, unstructured). Mirrors the Grep/Glob/Read primitives that Claude Code / opencode expose. - grep: regex search over file contents across a tree. Uses ripgrep when available (with explicit excludes so junk dirs are skipped even without a .gitignore); falls back to a pure-Python walk+regex when rg is absent. Returns file:line:match, capped. - glob: find files by glob pattern (recursive), newest first. - ls: list a directory (folders first, then files with sizes). - read_file: optional offset/limit for line-range reads of large files (plain-path calls stay back-compatible). All confined by the same path policy as read_file (_resolve_tool_path: data/tmp allowlist + sensitive-file deny). Junk dirs (.git, node_modules, venv, __pycache__, dist/build, …) skipped. Output capped (200 hits, 400 chars/line). Admin-gated like the other filesystem tools. Wiring: schemas + native arg->content serializer (src/tool_schemas.py), tool tags (src/agent_tools.py), always-available + descriptions (src/tool_index.py), admin gate (src/tool_security.py), dispatch + impls (src/tool_execution.py). Tests: tests/test_code_nav_tools.py — match/skip-junk/ignore-case/glob-filter, allowlist rejection, glob/ls, read-range, and the no-ripgrep Python fallback.	2026-06-04 18:37:32 +02:00
Kenny Van de Maele	7443c36bd9	feat: Add edit_file tool + file-change diffs (#1239) * Add edit_file tool + file-change diffs edit_file is an exact old_string -> new_string replacement on a file on disk (fails if old_string is missing or non-unique unless replace_all); write_file also returns a unified diff. Diffs render collapsed in the tool bubble (filename + +adds/-dels, theme colors); the raw JSON command box is hidden. Security: edit_file is a sensitive filesystem-write tool, treated everywhere write_file is — - added to NON_ADMIN_BLOCKED_TOOLS (is_public_blocked_tool / blocked_tools_for_owner), so on auth-enabled deployments a non-admin cannot run it; execute_tool_block refuses it for non-admin owners. - confined by the same path policy as read_file/write_file (allowlist + sensitive-file deny) via _resolve_tool_path. Disambiguation in tool descriptions + bash prompt: edit_file/write_file are the only way to write files (they show a diff) — never edit_document (editor panel) or a bash heredoc/redirect. Tests (tests/test_edit_file.py): non-admin block (policy + execution gate), successful edit, not-found old_string, non-unique old_string (+ replace_all), and path outside the allowed roots. Files: src/tool_execution.py, src/agent_loop.py, src/tool_schemas.py, src/agent_tools.py, src/tool_index.py, static/js/chat.js, static/style.css, tests/test_edit_file.py. * Drop redundant import os in write_file closure os is already imported at module top.	2026-06-04 18:29:10 +02:00
pewdiepie-archdaemon	9112861d8e	cookbook agent debug loop: persistent log files, auto-adopt orphan tmux, Codex/Claude skill parity Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner: * Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file. * `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400. * `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt). Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-` / `cookbook-` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll. `serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes. Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. 
Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop. Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles. `adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd. Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).	2026-06-04 23:27:18 +09:00
Alexander Kenley	7b45a94b6d	Fix calendar routing and user-local time context (#408) * fix(chat): add user-local time context * fix(chat): route calendar follow-up phrasing * refactor(chat): log tool intent routing reasons * test(chat): align user time prompt shim --------- Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>	2026-06-04 13:20:04 +01:00
Dan (cirim)	911fd61100	fix(tool_index): add manage_memory to ALWAYS_AVAILABLE	2026-06-04 14:04:32 +10:00
lekt8	77614e9feb	Don't force-include the email toolset on every "tell me" query (#1707) (#1735) The agent tool-RAG force-includes a keyword hint's tools whenever any of its keywords appears in the query (word-boundary match). The email-intent hint listed "tell", which matches a huge fraction of requests — e.g. "visit <url> and tell me the title" — so the whole email toolset was force-included and crowded out the relevant tools. The model then saw a prompt dominated by email tools and reported it had no web search / could not visit the URL. Remove "tell" from the email keyword set. Genuine email intent still fires on email/mail/gmail/inbox/unread/message/send/reply. Test drives get_tools_for_query directly with retrieval stubbed (the keyword hints are deterministic, no embeddings needed): a "...tell me..." web query no longer pulls in email tools, a real email request still does. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 13:33:43 +09:00
pewdiepie-archdaemon	ff93a6c63b	Polish email and cookbook flows	2026-06-02 22:42:07 +09:00
mist	e249fa4557	Tools: match keyword hints on word boundaries `get_tools_for_query` force-includes whole tool families when the query mentions an intent keyword, but matched with a raw substring test (`kw in ql`). Short hints therefore fired inside unrelated words, bloating the tool set with irrelevant tools: - "fix" matched "prefix" -> document tools - "line" matched "deadline"/"online" -> document tools - "serve" matched "observe"/"reserve" -> cookbook serve tools - "reply" matched "replying" -> all email tools - "unread" matched "unreadable" -> all email tools Match each keyword on word boundaries instead (`re.search(rf"\b{re.escape(kw)}\b", ql)`), the same fix already applied to the keyword matcher in topic_analyzer.py. Genuine intent keywords ("reply to this email", "edit the document", "serve the model") still match. This only removes substring-inside-a-word matches; it does not change whole-word matches (so e.g. an unrelated whole word like "tell" is a separate keyword-choice question, left untouched here). Checks: python -m pytest tests/test_tool_index_keyword_boundaries.py (4 passed; 3 of them fail on the pre-fix substring code), python -m py_compile src/tool_index.py, git diff --check.	2026-06-02 20:32:20 +09:00
Rifqi Akram	5b1e56407b	Add SSRF-guarded web fetch agent tool * feat(web-fetch): add web_fetch tool to read a specific URL's content * test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution Add explicit SSRF regression tests for the web_fetch path covering loopback, private LAN ranges, link-local/metadata, IPv6 private/local, redirect-into-private, and unsupported schemes. Harden _public_http_url to fail closed when a hostname resolves to no addresses.	2026-06-01 16:57:28 +09:00
pewdiepie-archdaemon	e5c99a5eee	Odysseus v1.0	2026-05-31 23:58:26 +09:00
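
Sketch for e249fa4557 (word-boundary keyword hints). The regex below is the one quoted in the commit message; the hint table, the family names, and both function names are hypothetical stand-ins, since the real tables in src/tool_index.py are not shown in this log.

```python
import re

# Hypothetical hint table (tool family -> intent keywords), for illustration only.
KEYWORD_HINTS = {
    "documents": ["fix", "line"],
    "cookbook_serve": ["serve"],
    "email": ["reply", "unread"],
}


def keyword_matches(keyword, query):
    """Return True only when `keyword` appears as a whole word in `query`."""
    ql = query.lower()
    # re.escape keeps keywords with regex metacharacters literal; \b anchors
    # the match to word boundaries so "fix" no longer fires inside "prefix".
    return re.search(rf"\b{re.escape(keyword)}\b", ql) is not None


def forced_families(query):
    """Collect every family whose hint list has a whole-word match in `query`."""
    return {
        family
        for family, keywords in KEYWORD_HINTS.items()
        if any(keyword_matches(kw, query) for kw in keywords)
    }
```

With a plain substring test (`kw in ql`), "add a prefix before the deadline" would pull in the document family twice over; with the boundary match it pulls in nothing, while "reply to this email" still selects the email family.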

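Sketch for 620fdd0859 (workspace confinement). The commit message names realpath/normcase/commonpath as the confinement primitives and `dirname(resolved) == resolved` as the filesystem-root test. The two helpers below are hypothetical illustrations of those checks, not the project's `_resolve_tool_path` or `vet_workspace`, and they leave out the sensitive-path deny list.

```python
import os


def is_filesystem_root(path):
    """A root is its own dirname, e.g. "/" on POSIX or a drive root on Windows."""
    resolved = os.path.realpath(path)
    return os.path.dirname(resolved) == resolved


def is_inside_workspace(candidate, workspace):
    """True when `candidate` resolves to the workspace itself or a path beneath it."""
    # realpath collapses ".." segments and symlinks; normcase folds case and
    # separators on Windows and is a no-op on POSIX.
    root = os.path.normcase(os.path.realpath(workspace))
    target = os.path.normcase(os.path.realpath(candidate))
    try:
        # commonpath compares whole path components, so "/x/ws-evil" is not
        # treated as being inside "/x/ws" the way a string prefix test would.
        return os.path.commonpath([root, target]) == root
    except ValueError:
        # Raised on Windows when the two paths live on different drives.
        return False
```

If a filesystem root were accepted as the workspace, `commonpath` would equal the root for every absolute path, which is why the root check has to run before a workspace is bound.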
25 Commits
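
Sketch for 8ce945d338 (plan mode gating). The commit message describes the tool gate as a denylist (disabled_tools) and plan mode's policy as an allowlist, so plan mode returns the inverse, backed by a static set of known mutators in case the schema-derived name list is incomplete. The set contents and the function name here are assumptions for illustration; the real definitions in src/tool_security.py are not shown in this log.

```python
# Hypothetical sets; the real allowlist and mutator list may differ.
READONLY_ALLOWLIST = frozenset({"read_file", "grep", "glob", "ls"})
KNOWN_MUTATORS = frozenset({"write_file", "edit_file", "bash", "python"})


def plan_mode_disabled_tools(known_tool_names, already_disabled=()):
    """Turn the read-only allowlist into the denylist the tool gate expects."""
    # Inverse of the allowlist: every known tool that is not explicitly read-only.
    inverse = set(known_tool_names) - READONLY_ALLOWLIST
    # Backstop: block the static mutators even if the known-name list is empty
    # or incomplete, so the gate fails closed.
    backstop = KNOWN_MUTATORS - READONLY_ALLOWLIST
    # Union with whatever the caller had already disabled for this turn.
    return set(already_disabled) | inverse | backstop
```

Passing an empty name list still yields the mutator set, which mirrors the fail-closed behaviour the commit describes for the case where tool schemas cannot be imported.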