Commit Graph

34 Commits

Author SHA1 Message Date
pewdiepie-archdaemon 6d507f8128 Merge remote-tracking branch 'origin/dev' into test-main-dev-merge-20260615
# Conflicts:
#	src/tool_implementations.py
#	static/js/research/panel.js
2026-06-15 21:20:15 +09:00
pewdiepie-archdaemon 2cbd55b8bd Open email context for agent, email search across All Mail, cookbook serve polish
- Agent: pass the open email reader (uid/folder/account/from/subject/body
  preview) on every chat submit so 'reply to this' / 'write email saying
  hi' route to ui_control open_email_reply with the right UID instead of
  inventing a new .md draft. Code-level enforcement (chat_routes strips
  create_document + send_email when active_email is set); cross-session
  active_doc_id is now trusted instead of being silently dropped.
  set_active_email/clear_active_email tool-layer helpers in
  tool_implementations.

- ui_control open_email_reply: optional body argument so the agent can
  open-and-write in one call; envelope now forwards uid/folder/account/
  body/panel through tool_output. Tool description sharpened and the
  parser rejects empty bodies on reply/reply-all (forces the agent to
  write rather than open an empty draft).

- Email library: search now runs against [Gmail]/All Mail when the
  current folder is INBOX (archived emails surface). Whirlpool spinner
  + 'Searching…' placeholder while in flight. Each search result is
  stamped with its source folder so clicks open the right email instead
  of whatever shares its UID in INBOX. Search no longer re-applies the
  same text pill locally (which only checks subject/from/snippet, never
  body) so body-only matches don't get dropped after IMAP returns them.
  Initial inbox load bumped 100→500.

- Email favorites: 'Favorite (pin to top)' / 'Unfavorite' in both the
  card menu and the open-reader more menu, backed by a new
  /api/email/flag/{uid}?on=true|false endpoint. Flagged emails always
  bubble to the top of the grid regardless of active sort.

- AI reply in doc editor: never overwrites existing draft text or the
  quoted history. AI suggestion is prepended; AI-generated 'On …
  wrote:' re-quotes are stripped so the original quote isn't visually
  edited.

- Cookbook serve: pre-launch GPU driver / has_gpu / install / version-
  floor checks (vllm minimax_m2 needs 0.10.0+, deepseek_r1 needs 0.7.0
  etc.) before the launch chain starts. Detect 'another model already
  running on this host' and offer Stop & launch (with graceful then
  force tmux kill helpers, port release wait). Per-vendor deep-link
  buttons (vLLM recipe / SGLang cookbook) with hardware hash. Backend
  picker is now a custom dropdown with accent-coloured logos for vLLM,
  SGLang, llama.cpp, Ollama, Diffusers; same glyphs added next to
  package names in Dependencies. Runtime-readiness note moved inside
  the panel (green when ready, red when missing) with an × dismiss.
  Esc collapses the expanded card; expanded card scrolls when it
  overflows; Trust Remote / Auto Tool / Reasoning Parser / Enforce
  Eager / Prefix Caching / Expert Parallel / Speculative / MoE Env on
  one row (Reasoning Parser auto-detected per model family).
  Dtype→Row 1, GPUs→Row 2 (rightmost). Removed redundant GPU 'auto'
  input — command builders read from the GPU button strip. Default
  cookbook open is Download tab.

- Cookbook hwfit: 'Model (latest)' / 'Model (oldest)' header sorts by
  release_date; release dates can be backfilled with the new
  scripts/backfill_model_release_dates.py and recipe metadata pulled
  with scripts/import_from_vllm_recipes.py against the upstream
  vllm-project/recipes catalog (vllm_recipe + min_vllm_version stamped
  on entries).

- Calendar: Quick add hint cycles a random Odysseus-themed example per
  open (wooden horse Friday, crew muster 10am daily, council on
  Ithaca, …). Typing a time like '11pm' in the event title updates
  the hero clock live.

- Doc editor: email-mode Reply button (sparkle icon, accent) opens the
  same Fast/Full + context popover the email reader uses; Ctrl+Alt+M
  toggles markdown preview.

- Memories panel: custom sort picker with per-option icons, default
  'Latest', visible Enabled/Disabled toggle text matching the section
  description style.
2026-06-15 20:47:51 +09:00
Dividesbyzer0 a07fe35936 fix(agent): honor explicit web search requests
Promote explicit web-search phrasing to tool use and keep web_search/web_fetch available for that turn even when the stale web toggle is false.
2026-06-15 15:02:10 +09:00
Kenny Van de Maele 620fdd0859 feat(agent): confine agent file/shell tools to a selectable workspace (#3665)
* feat(agent): workspace confinement via context-local binding + get_workspace tool

Bind the per-turn workspace once in execute_tool_block; the shared path
resolvers (_resolve_tool_path / _resolve_search_root) and the subprocess cwd
helper (agent_cwd) read it, so file tools + bash/python are confined centrally
and a new tool that uses the shared helpers cannot accidentally bypass it.

Adds the admin-gated /api/workspace/browse picker, a workspace pill + directory
modal (reusing existing modal/button CSS), the /workspace slash command, and a
get_workspace tool (replaces a system-prompt block). Confinement is OS-agnostic
(realpath/normcase/commonpath) and docker-safe (container paths, no host
assumptions). Reopens #2023.

* ux(workspace): clarify workspace is not a sandbox

Picker modal note + pill tooltip + get_workspace tool/output wording now state
plainly: read_file/write_file/edit_file/grep/glob/ls are confined to the folder,
but bash/python only start there (cwd) and are not sandboxed. Modal note reuses
the existing .muted class.

* fix(agent): treat an active workspace as file-work intent

A vague low-signal message (e.g. "look at the local project") matches no
domain keywords, so tool retrieval is skipped and only always-available tools
are offered — leaving the agent with no file access even though a workspace is
set. When a workspace is active, include the file/code tools (incl.
get_workspace) on low-signal turns so the agent can act on the folder.

Also requires the tool index (ChromaDB) to be reachable for normal retrieval;
that is an environment dependency, not part of this change.

* ux(workspace): hide pill + overflow entry in chat mode

Workspace only scopes the agent's file/shell tools, so the pill and the
overflow 'Workspace' entry are agent-only now — hidden in chat mode like the
bash toggle. Mode read from the DOM in syncWorkspaceIndicator; applyMode() is
called from the agent/chat setMode handler.

* prompt(tools): steer bash/python to defer to the dedicated file tools

bash/python schema descriptions (what native-tool-calling models read) were
bare and gave no steer, so models would do file ops via the shell (e.g. writing
SVG/HTML, which then dumps raw markup into the tool preview). Tell bash/python
in the schema + tool-index + prompt section to prefer read_file/write_file/
edit_file/grep/glob/ls and only be used for what those do not cover.

* prompt(tools): keep bash/python deferral generic (no hardcoded tool names)

Reference 'a dedicated tool' rather than listing read_file/write_file/grep/etc.
by name, so the guidance does not go stale if those tools are renamed.

* style(workspace): drop em-dashes from added code comments/strings

* ux(workspace): terser non-sandbox note in picker (no tool-name list)

* ux(workspace): mirror terse non-sandbox wording in pill tooltip

* chore: untrack local venv symlink (run-only, not part of the feature)

* prompt(workspace): keep get_workspace text generic (no hardcoded tool names)

* fix(agent): low-signal + workspace surfaces only read-only file tools

Intersect the files tool group with PLAN_MODE_READONLY_TOOLS so a vague message
in a workspace exposes read_file/grep/glob/ls/get_workspace for exploration, but
not write_file/edit_file/bash/python -- those wait for a request that actually
calls for them (RAG retrieval still adds them on a real ask).

* feat(workspace): cap browse listing at 500 dirs with a truncated hint

Mirror the filesystem_tools._CODENAV_MAX_HITS pattern with a module-local
_MAX_BROWSE_DIRS so a directory with thousands of children does not dump every
row into the picker; the response carries a truncated flag and the modal tells
the user to type a path to jump in.

* chore: untrack local venv symlink (run-only artifact)

* fix(workspace): vet the workspace root against the sensitive-path deny list at bind time

The in-workspace resolver deny-lists sensitive paths inside the workspace,
but the empty-path search root is the workspace itself, so a workspace of
~/.ssh could be listed via ls with no path. vet_workspace() (public, in
tool_execution next to the resolvers) rejects non-directories and sensitive
roots before the path is ever bound; chat_routes uses it instead of its
inline isdir check.

* fix(workspace): reject filesystem roots and stop showing rejected workspaces as active

Review findings from #3665:

P2: vet_workspace accepted / (and would accept drive/UNC roots), which makes
every absolute path 'inside' the workspace and collapses confinement into
host-wide file access. A root is its own dirname, so reject when
dirname(resolved) == resolved; the browse response now carries a selectable
flag and the picker disables 'Use this folder' on unselectable dirs.

P3: /workspace set stored any string client-side and the chat route silently
dropped rejected values, so the pill could claim a confinement that was not
in effect. New admin-gated /api/workspace/vet validates manual paths before
they persist (canonical path returned), and when a posted workspace is
rejected at send time the stream emits workspace_rejected so the client
clears the stored value and toasts instead of continuing silently.

* fix(workspace): check caller privilege before vetting the posted workspace

Review finding: /api/chat_stream called vet_workspace() on the posted value
for every caller and emitted workspace_rejected on failure, so a non-admin
who can chat but cannot use file/shell tools could distinguish existing
directories from missing/file/sensitive/root paths by whether the event
appeared. The resolution now lives in _resolve_request_workspace, which
drops the submitted value uniformly for non-admin callers, with no vetting
and no event, before the path ever touches the filesystem. Admin and
single-user behavior is unchanged. Test pins that valid and invalid paths
are indistinguishable for a non-admin and that vet_workspace is never
invoked for them.
2026-06-11 18:17:54 +02:00
pewdiepie-archdaemon f5ad59317c Tool retrieval: HARD drop manage_memory when query is a contact-save pattern
Description-level steering wasn't enough — even with the explicit 'DO
NOT use for info about another person' in manage_memory's description,
models kept choosing memory over manage_contact. They can't if memory
isn't in the toolset.

New logic in ToolIndex.get_tools_for_query: detect three contact-save
patterns and discard manage_memory from the returned set (overriding
ALWAYS_AVAILABLE):

1. 'save [up to 3 words] for/to <name>' where <name> isn't a timing /
   pronoun stopword (later, tomorrow, me, you, future, etc.). Catches
   the canonical 'save this for X' and the wider 'save this address
   for X', 'save it for X'.
2. 'to/in/into (my) contacts' or 'address book'. Catches both 'add X
   to my contacts' and 'put this in my address book for X'.
3. Possessive: 'save (his/her/their) (address/phone/email/...)'.
   Stronger signal — also force-adds manage_contact to the set in
   case the keyword fallback missed it.

Verified: 8 positive contact patterns all drop memory, 10 false-
positive 'save X for later/tomorrow/me/the next thing' all keep it.
2026-06-11 09:46:34 +09:00
pewdiepie-archdaemon df47536b8d manage_memory descriptions: explicit deferral to manage_contact for person info
Even with manage_contact in the retrieved tool set, models were still
defaulting to manage_memory when the user pasted an address + 'save for
<person>'. Both tools were in front of the model and it picked memory.

Tighten both descriptions to steer at decision-time:
- agent_loop.py manage_memory description: clarify scope is facts
  about the USER, with an explicit 'DO NOT use for info about another
  person' + a 'use manage_contact instead' line.
- tool_index.py manage_memory description: same in shorter form, so the
  embedded retrieval signal is consistent with the prompt-time
  description.
2026-06-11 09:25:23 +09:00
pewdiepie-archdaemon 8a00f954a9 Tool retrieval: catch 'add X to (my) contacts' / 'address book' phrasings
The literal phrase 'add to contacts' missed when there was a name
between 'add' and 'to', e.g. 'add Pat to my contacts'. Anchor on the
tail with 'to my contacts', 'to contacts', 'to address book' so word
boundaries fire regardless of what sits in front.
2026-06-11 09:18:30 +09:00
pewdiepie-archdaemon 8632072ce0 Contacts: postal-address support via vCard ADR, keep tool prompt minimal
Closes the gap that pushed the agent into manage_memory when the user
pasted an address and said 'save this for X'. manage_contact now
accepts an optional address arg end-to-end:

- routes/contacts_routes.py:
  - _normalize_contact carries an 'address' field
  - _build_vcard emits ADR:;;<address>;;;; (street component of the
    RFC-6350 7-part ADR), only when address is non-empty
  - _parse_vcards reads ADR, joins non-empty components with ', '
  - _create_contact and _update_contact thread address through;
    update preserves existing address when caller passes empty
- src/tool_implementations.py do_manage_contact:
  - add accepts address; require at least name+address or email
    (was: email required) so address-only contacts are addable
  - update accepts address; require name OR emails OR address
- src/tool_schemas.py: schema gets a single 'address' string field
- src/tool_index.py + src/agent_loop.py: descriptions get one
  'address' arg mention and a 'use this for save-X-for-person /
  address pastes / phone-with-name' steering line. Net: a few
  bytes added, not a paragraph.

Also: removed a stray name from the schema's manage_contact example
strings ('save Jonathan's email…') — no real names in the codebase.
2026-06-11 09:14:52 +09:00
pewdiepie-archdaemon 153b788134 Tool retrieval: surface manage_contact for 'save X for <person>' patterns
When the user dumps a postal address or phone number alongside a
person's name and says 'save this for X', the vector retriever was
missing manage_contact because its description only mentioned the
literal word 'contact'. The model defaulted to manage_memory (which is
in ALWAYS_AVAILABLE), so the saved fact ended up as un-named memory
that wouldn't surface on a later 'what's X's address?' search.

- Rewrite manage_contact's index description to anchor on the
  semantics: 'save info about another person', including postal/
  mailing address, ZIP, phone, etc. Now it embeds close to address-
  paste queries.
- Extend the keyword intent-map with 'save this for', 'save it for',
  'mailing address', 'postal code', 'their address', etc. — common
  ways users say 'this belongs to a contact' without the literal word
  'contact'.
2026-06-11 08:56:42 +09:00
pewdiepie-archdaemon 4f7061fd61 Settings overhaul + UI polish pass
Two months of iteration on the Settings panel, integration forms, and
small visual nudges across the app. Highlights:

Settings restructure
- Add Models: split into separate Local + API cards (no more in-card
  tabs); each fuses Type/Provider with the URL input.
- Added Models: new dedicated sidebar tab, with Probe + Clear-offline
  pulled into its header; Local/API sub-section icons accent-tinted.
- Search: Web Search and a new Deep Research card (Model + tuning),
  with a cross-link to AI Defaults. Provider hints use real clickable
  anchors; Web Search Test button shows a whirlpool spinner.
- AI Defaults: Image Generation card returns; Research Model card
  carries only Endpoint+Model with a cross-link to Search; Vision /
  Default / Utility fallbacks unified under one numbered-row design
  matching Search's chain.
- API Permissions (was 'API Tokens'): per-row rename, inline
  Permissions toggle that expands the scope-edit panel, in-field
  copy icons (icon→check on success). Empty state accent-tinted.
- Integrations: + Add Integration drops a type-picker menu directly
  under the button (drop-up on tight viewports); each integration
  form (API, CalDAV, CardDAV, Email, Codex/Claude, Vault, MCP) uses
  the same accent-outlined Save/Test/Cancel buttons right-aligned.
- Danger Zone: Wipe→Delete with trash icons; new 'Delete everything'
  row at the bottom that loops every category.

AI Synthesis (Reminders)
- Persona dropdown sourced from PROMPT_TEMPLATES + custom preset.
- src/reminder_personas.py mirrors the five built-ins for the
  server-side synthesis path.
- dispatch_reminder() reads reminder_llm_persona and uses the
  persona's system prompt; empty/unknown falls back to warm-neutral.

Esc handling
- Kebab menus and the provider picker intercept Esc in capture phase
  so dismissing a popup no longer closes the whole Settings modal.

Accent tinting
- Scoped CSS rule across data-settings-panel=ai/services/added-models/
  search/integrations/reminders for card h2 icons + the Added Models
  sub-section icons.

Codex/Claude integration form
- No more auto-creation on form open — explicit Create token button.
- New tokens start with every scope granted; existing tokens move out
  of the integration form into the API Permissions card.
- Setup reveal: copy buttons inline inside the token + setup code
  blocks; shorter subtitle wording.

Misc visual polish
- Save/Test/Cancel uniformly accent-outlined and right-aligned on
  every integration form.
- Provider logos render inline next to the search fallback selects
  and the Deep Research Search dropdown.
- Trash icons in fallback rows bumped to 20x20 so they fill the 32px
  button.
- Image generation default flipped to off.
2026-06-10 15:15:13 +09:00
pewdiepie-archdaemon 1a529d63d9 Fix remaining CI regressions 2026-06-09 10:21:56 +09:00
pewdiepie-archdaemon 3b01760e95 Prepare tested main sync cleanup 2026-06-09 09:34:42 +09:00
RaresKeY 3a91c11ff8 fix: block app_api access to Cookbook host controls (#3231) 2026-06-07 19:20:11 +02:00
RaresKeY a3784da172 fix: block app_api access to shell routes (#3225) 2026-06-07 15:19:08 +02:00
Nicholai 33edc40eae fix: route misfenced web lookups to web tools
Fixes #3067
2026-06-06 03:46:31 -06:00
Nicholai 86abcb75d0 fix: split Chroma embedding lanes (#3046) 2026-06-06 03:17:19 -06:00
Nicholai 463713c2c6 feat(search): unify session transcript search (#2877) 2026-06-05 18:08:31 -06:00
Kenny Van de Maele 8ce945d338 feat: Add plan mode to the chat agent (#638)
* feat: Add plan mode to the chat agent

Adds a plan mode: the agent investigates read-only, proposes a checklist, and
waits for approval before changing anything. On approval it runs with full
tools and checks items off as it goes. Enforcement reuses the existing
disabled_tools gate.

Includes a slash command: `/plan [on|off]` (and `/toggle plan`) to flip the
plan toggle from the chat input.

- src/tool_security.py, src/mcp_manager.py: read-only allowlist (tools + MCP).
- src/agent_loop.py, routes/chat_routes.py: union the disabled set, prepend the
  plan directive, force agent mode.
- static/: plan toggle pill, Approve & Run, dockable plan window, task-list
  checkboxes, and the /plan slash command.
- tests/test_plan_mode.py.

* Plan mode: persistent re-referenceable plan + agent write-back

Three improvements so a long plan survives a weak model and stays in reach:

1. Re-reference the plan (out-of-context fix). On the execution turn the frontend
   sends the approved checklist back (`approved_plan`); the backend pins it as a
   top-of-context `## ACTIVE PLAN` system note (kept by the context trimmer), so
   the agent can always re-read the plan instead of losing the thread on a long
   run. New `build_active_plan_note()` (unit-tested).

2. Re-open / dock the plan anytime. The plan checklist is stored per-session
   (localStorage). When a plan exists, the plan-mode button opens a small menu
   ("Show plan" / "Plan mode: On/Off") that re-opens the side-dockable plan
   window — so it can stay docked while the agent works. The window live-refreshes
   as the plan changes.

3. Agent write-back: new `update_plan` tool. The agent calls it to tick steps
   `- [x]` after finishing them, or to revise steps when the user asks. Marker
   tool (no I/O) → `plan_update` SSE event → the stored plan + docked window
   update live. The ACTIVE PLAN note instructs the agent to use it.

Backend: src/agent_loop.py (param + pin + note builder + emit + prompt blurb),
src/tool_execution.py (update_plan handler), routes/chat_routes.py (parse
`approved_plan`, relay `plan_update`), registration in tool_schemas / agent_tools
/ tool_index (always-available, not admin-gated).
Frontend: static/js/chat.js (plan store, send `approved_plan`, handle
`plan_update`, capture restated checklists), static/app.js (plan-button menu),
static/js/planWindow.js (`isPlanWindowOpen`), static/js/storage.js (PLAN key).
Tests: tests/test_plan_mode.py (plan-note), tests/test_update_plan_tool.py.

* Plan mode: drop bash/python, rely on read-only discovery tools

Shell can mutate (write files, hit the network) and can't be constrained to
read-only at the tool layer, so plan mode no longer relies on a prompt to keep
it well-behaved — bash/python are removed from the read-only allowlist and added
to the fail-closed block set. Discovery is covered by the dedicated read-only
tools (read_file, grep, glob, ls) instead.

Rewrites the plan-mode directive to state shell is disabled and lists the
available read-only tools positively. Addresses review feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Comment: note _MCP_READONLY_VERBS are prefixes not whole words

Clarifies that entries like "summar" are intentional stems matched via
startswith (covers summarise/summarize/summary), not typos. Addresses review
feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Plan mode: clarify why gating inverts the allowlist into a denylist

Rename _PLAN_MODE_FALLBACK_BLOCK -> _PLAN_MODE_KNOWN_MUTATORS and rewrite the
comments. The tool gate is a denylist (disabled_tools); plan mode's policy is an
allowlist, so it returns the inverse (all known tool names minus the allowlist).
The static mutator set is a backstop for the schema-derived name list, which
misses XML-only tools and can fail to import. Addresses review feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Plan mode: stop hardcoding the read-only tool list in the directive

The model is already shown its available (read-only) tools by _assemble_prompt,
which removes every disabled tool. Enumerating them again in the directive only
duplicated that list and would drift as tools change. Point at the tools listed
below instead. Addresses review feedback on #638.
2026-06-05 16:32:25 +02:00
Kenny Van de Maele 0a2adc9c96 Add ask_user tool: agent-posed multiple-choice questions (#2111)
Let the agent pause and ask the user a multiple-choice question when a
task is genuinely ambiguous and the answer changes what it does next —
choosing between approaches, confirming an assumption, picking a target —
instead of guessing.

Modeled on the existing `ui_control` marker pattern: the `ask_user` tool
returns an `ask_user` payload that the agent loop emits as an SSE event
and then ends the turn. The frontend renders the question with clickable
option buttons, a free-text "Other" input, and an x to dismiss; the user's
choice is sent as the next message and the agent resumes with it in
context.

- src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O.
  Validates a non-empty question + 2..6 options, normalizes string/object
  options, returns the payload.
- src/agent_loop.py: emit the `ask_user` event and break the round loop so
  the turn ends and waits for the user's selection. Stream the question as
  assistant text so it persists/replays (prevents a re-ask loop).
- Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS,
  FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any
  user can be asked); the structured args serialize via the default
  json.dumps path.
- routes/chat_routes.py: relay the `ask_user` event to the client.
- static/js/chat.js + static/style.css: render the question card (options +
  free-text Other + dismiss x; removed once answered). Reuses CSS vars and
  the .modal-close button; emoji go through the monochrome-SVG pipeline.
  Bump chat.js cache pin.
- tests/test_ask_user_tool.py: payload, multi flag, string options, option
  cap, validation errors, serializer round-trip, registration.
2026-06-05 11:49:11 +02:00
Kenny Van de Maele 367858a587 Merge branch 'main' into dev
Bring main's maintainer-curated work (cookbook scheduler, calendar rendering/sync, settings polish, agent debug loop) into dev so dev is a superset of main (resolves the dev/main drift, #2543).
2026-06-05 10:50:51 +02:00
Nicholai 4df4cfeaff Merge pull request #2387 from cirim-au/fix/manage-memory-always-available
fix(tool_index): add manage_memory to ALWAYS_AVAILABLE
2026-06-05 02:14:10 -06:00
pewdiepie-archdaemon f8aaeab245 Merge remote-tracking branch 'origin/dev' 2026-06-05 12:14:34 +09:00
pewdiepie-archdaemon f19ac6ed03 Merge branch 'main' of github.com:pewdiepie-archdaemon/odysseus
# Conflicts:
#	static/js/cookbookRunning.js
2026-06-05 11:23:15 +09:00
Kenny Van de Maele 7b4365fe57 Make write_file/edit_file always-available like read_file (#2684)
read_file/grep/glob/ls are in ALWAYS_AVAILABLE but the on-disk write tools
(write_file, edit_file) were only surfaced via per-query tool-RAG retrieval.
On a bare 'edit X' request the retriever could miss them, so the model was
never offered edit_file/write_file and wrongly fell back to edit_document
(editor panel) or improvised with bash sed. Add both to ALWAYS_AVAILABLE
next to read_file; they stay admin-gated by tool_security so non-admin
exposure is unchanged.

Fixes #2683
2026-06-05 00:02:14 +02:00
Kenny Van de Maele 1f00fff837 feat: add code-navigation tools (grep, glob, ls) + read_file line ranges (#1670)
Gives the agent first-class code navigation instead of shelling out via bash
(token-heavy, unreliable on weaker models, unstructured). Mirrors the
Grep/Glob/Read primitives that Claude Code / opencode expose.

- grep: regex search over file contents across a tree. Uses ripgrep when
  available (with explicit excludes so junk dirs are skipped even without a
  .gitignore); falls back to a pure-Python walk+regex when rg is absent.
  Returns file:line:match, capped.
- glob: find files by glob pattern (recursive), newest first.
- ls: list a directory (folders first, then files with sizes).
- read_file: optional offset/limit for line-range reads of large files
  (plain-path calls stay back-compatible).

All confined by the same path policy as read_file (_resolve_tool_path:
data/tmp allowlist + sensitive-file deny). Junk dirs (.git, node_modules,
venv, __pycache__, dist/build, …) skipped. Output capped (200 hits,
400 chars/line). Admin-gated like the other filesystem tools.

Wiring: schemas + native arg->content serializer (src/tool_schemas.py), tool
tags (src/agent_tools.py), always-available + descriptions (src/tool_index.py),
admin gate (src/tool_security.py), dispatch + impls (src/tool_execution.py).

Tests: tests/test_code_nav_tools.py — match/skip-junk/ignore-case/glob-filter,
allowlist rejection, glob/ls, read-range, and the no-ripgrep Python fallback.
2026-06-04 18:37:32 +02:00
Kenny Van de Maele 7443c36bd9 feat: Add edit_file tool + file-change diffs (#1239)
* Add edit_file tool + file-change diffs

edit_file is an exact old_string -> new_string replacement on a file on disk
(fails if old_string is missing or non-unique unless replace_all); write_file
also returns a unified diff. Diffs render collapsed in the tool bubble
(filename + +adds/-dels, theme colors); the raw JSON command box is hidden.

Security: edit_file is a sensitive filesystem-write tool, treated everywhere
write_file is —
  - added to NON_ADMIN_BLOCKED_TOOLS (is_public_blocked_tool / blocked_tools_for_owner),
    so on auth-enabled deployments a non-admin cannot run it; execute_tool_block
    refuses it for non-admin owners.
  - confined by the same path policy as read_file/write_file (allowlist +
    sensitive-file deny) via _resolve_tool_path.

Disambiguation in tool descriptions + bash prompt: edit_file/write_file are the
only way to write files (they show a diff) — never edit_document (editor panel)
or a bash heredoc/redirect.

Tests (tests/test_edit_file.py): non-admin block (policy + execution gate),
successful edit, not-found old_string, non-unique old_string (+ replace_all),
and path outside the allowed roots.

Files: src/tool_execution.py, src/agent_loop.py, src/tool_schemas.py,
src/agent_tools.py, src/tool_index.py, static/js/chat.js, static/style.css,
tests/test_edit_file.py.

* Drop redundant import os in write_file closure

os is already imported at module top.
2026-06-04 18:29:10 +02:00
pewdiepie-archdaemon 9112861d8e cookbook agent debug loop: persistent log files, auto-adopt orphan tmux, Codex/Claude skill parity
Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner:

* Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file.
* `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400.
* `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt).

Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-*` / `cookbook-*` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll.

`serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes.

Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop.

Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles.

`adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd.

Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).
2026-06-04 23:27:18 +09:00
Alexander Kenley 7b45a94b6d Fix calendar routing and user-local time context (#408)
* fix(chat): add user-local time context

* fix(chat): route calendar follow-up phrasing

* refactor(chat): log tool intent routing reasons

* test(chat): align user time prompt shim

---------

Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-04 13:20:04 +01:00
Dan (cirim) 911fd61100 fix(tool_index): add manage_memory to ALWAYS_AVAILABLE 2026-06-04 14:04:32 +10:00
lekt8 77614e9feb Don't force-include the email toolset on every "tell me" query (#1707) (#1735)
The agent tool-RAG force-includes a keyword hint's tools whenever any of its
keywords appears in the query (word-boundary match). The email-intent hint listed
"tell", which matches a huge fraction of requests — e.g. "visit <url> and tell
me the title" — so the whole email toolset was force-included and crowded out the
relevant tools. The model then saw a prompt dominated by email tools and reported
it had no web search / could not visit the URL.

Remove "tell" from the email keyword set. Genuine email intent still fires on
email/mail/gmail/inbox/unread/message/send/reply.

Test drives get_tools_for_query directly with retrieval stubbed (the keyword
hints are deterministic, no embeddings needed): a "...tell me..." web query no
longer pulls in email tools, a real email request still does.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:33:43 +09:00
pewdiepie-archdaemon ff93a6c63b Polish email and cookbook flows 2026-06-02 22:42:07 +09:00
mist e249fa4557 Tools: match keyword hints on word boundaries
`get_tools_for_query` force-includes whole tool families when the query
mentions an intent keyword, but matched with a raw substring test
(`kw in ql`). Short hints therefore fired inside unrelated words, bloating
the tool set with irrelevant tools:

  - "fix" matched "prefix"      -> document tools
  - "line" matched "deadline"/"online" -> document tools
  - "serve" matched "observe"/"reserve" -> cookbook serve tools
  - "reply" matched "replying"  -> all email tools
  - "unread" matched "unreadable" -> all email tools

Match each keyword on word boundaries instead
(`re.search(rf"\b{re.escape(kw)}\b", ql)`), the same fix already applied to
the keyword matcher in topic_analyzer.py. Genuine intent keywords
("reply to this email", "edit the document", "serve the model") still match.

This only removes substring-inside-a-word matches; it does not change whole
-word matches (so e.g. an unrelated whole word like "tell" is a separate
keyword-choice question, left untouched here).

Checks: python -m pytest tests/test_tool_index_keyword_boundaries.py (4 passed;
3 of them fail on the pre-fix substring code), python -m py_compile
src/tool_index.py, git diff --check.
2026-06-02 20:32:20 +09:00
Rifqi Akram 5b1e56407b Add SSRF-guarded web fetch agent tool
* feat(web-fetch): add web_fetch tool to read a specific URL's content

* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution

Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.
2026-06-01 16:57:28 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00