* test(tools): add shim protection test for tool_implementations split
Covers all 48 top-level functions (33 do_* + 15 _helpers) extracted from
the original module. Guards the upcoming split: the shim must re-export
every symbol so existing 'from src.tool_implementations import X' imports
keep working. Passes on baseline (pre-split).
* refactor(tools): add src/tools/ package with shared _common
Slice 1 Task 2 (#4082/#4071). Adds the package skeleton and moves the
shared _parse_tool_args helper into src/tools/_common.py. Domain modules
will import from here. tool_implementations.py is untouched at this step.
* refactor(tools): extract system domain into src/tools/system.py
Slice 1 (#4082/#4071), Task 3: move the system-domain tool functions
(do_manage_skills/_skill_dump/do_manage_tasks/do_manage_endpoints/
do_manage_mcp/do_manage_webhooks/do_manage_tokens/do_manage_settings/
do_api_call/do_app_api) and the app_api blocklist constants out of
tool_implementations.py into a new src/tools/system.py module.
tool_implementations.py re-imports all of them so it stays a working
backward-compatible facade (shim test stays green).
- do_manage_mcp resolves get_mcp_manager via a function-local import
from tool_implementations so the test that patches
src.tool_implementations.get_mcp_manager still applies post-move.
- do_app_api imports _internal_headers and _INTERNAL_BASE (still in
tool_implementations) function-locally to avoid a circular import.
- Repoint test_context_budget introspection assertion to the moved
code's new home in src/tools/system.py.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(tools): extract cookbook domain into src/tools/cookbook.py
Moves the model-serving (cookbook) tool domain out of tool_implementations.py
into src/tools/cookbook.py as part of slice 1 (#4082/#4071):
- 13 do_* tools: download/serve/list/stop/tail/search/adopt/cached models,
list downloads/cancel, list cookbook servers, serve presets
- 9 private helpers: _cookbook_servers, _resolve_cookbook_host,
_cookbook_env_for_host, _infer_serve_{port,host}, _ensure_served_endpoint,
_cookbook_register_task, _cookbook_apply_retry_suggestion,
_scan_running_model_processes, _cookbook_kill_session
- _MODEL_PROCESS_PATTERNS constant (used only by _scan_running_model_processes)
tool_implementations.py stays a backward-compatible facade via a re-import
from src.tools.cookbook; src/tools/__init__ re-exports the same symbols.
_internal_headers and _INTERNAL_BASE stay in tool_implementations.py (shared
by system.py's do_app_api and many cookbook funcs). Each cookbook function
that needs them does a function-local import to avoid a top-level circular
dependency, matching the system-domain split.
Verified: compileall clean; shim test green; cookbook-touching suite
(652 passed, 1 skipped); full suite 3587 passed, 2 failed
(pre-existing test_api_chat_security, unrelated).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(tools): extract search domain into src/tools/search.py
* refactor(tools): extract notes domain into src/tools/notes.py
* refactor(tools): extract calendar domain into src/tools/calendar.py
Repoints tests/test_caldav_bidirectional_sync.py source-introspection
to src/tools/calendar.py (do_manage_calendar moved there).
* refactor(tools): extract image domain into src/tools/image.py
* refactor(tools): extract research domain into src/tools/research.py
* refactor(tools): extract contacts domain into src/tools/contacts.py
* refactor(tools): extract vault domain into src/tools/vault.py
Repoints tests/test_vault_password_not_in_argv.py source-introspection
to src/tools/vault.py (the vault do_* helpers moved there).
* refactor(tools): collapse tool_implementations to clean re-export shim
Move shared _INTERNAL_BASE/_internal_headers to src/tools/_common.py and
drop the duplicate _parse_tool_args (already in _common). tool_implementations.py
is now a pure re-export facade (+ 3 pre-existing email-context helpers, out of
scope). Domain files' function-local imports of these names still resolve via
the facade re-export.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tools): port upstream cookbook workflow changes to split module
Rebase onto dev dropped c504214 ("Cookbook model workflow fixes") edits
to do_serve_model / do_tail_serve_output: the extraction commit moved
the pre-edit bodies into src/tools/cookbook.py and git auto-accepted the
deletion from tool_implementations.py, losing dev's changes. Restore them
in their post-split home:
- do_serve_model: add where/log_path/next_tools and the expanded
"Next required check" output message
- do_tail_serve_output: empty-output fallback message replacing
"(empty pane)"
(do_manage_settings web_fetch alias edit was already applied to
src/tools/system.py during the system-extract conflict resolution.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tools): break admin_tools circular import in split facade
After rebasing onto dev (#3629 moved the admin manage_* tools into
src/agent_tools/admin_tools), the facade re-exported them via a top-level
`from src.agent_tools.admin_tools import ...`. But src.agent_tools.__init__
imports this facade at top level, so the eager import re-entered the
partially-initialized agent_tools package and broke collection.
Re-export the admin symbols (do_manage_endpoints/mcp/webhooks/tokens/
settings, _MCP_DENIED_COMMANDS, _validate_mcp_command) lazily through
module __getattr__ instead, and drop them from src/tools/__init__ (they
no longer live in the src.tools package). system.py now holds only the
skills/tasks/api bridges; admin tools live solely in admin_tools.py,
matching upstream.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tools): re-export dropped helpers through the split shim
Address review finding from #4423: the compatibility facade claimed to
preserve every original top-level symbol but omitted three helpers the
old src.tool_implementations exposed. Re-export them and pin them in
the shim protection test:
- _string_arg, _validate_cookbook_ssh_target <- src/tools/cookbook.py
- _mcp_allowed_commands <- src/agent_tools/admin_tools.py (lazily via
__getattr__, to keep the agent_tools.__init__ <-> facade import acyclic
after the #3629 admin-tools migration)
All three added to tests/test_tool_implementations_shim.py _EXPECTED so
the test contract now matches its "every original top-level function"
comment.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(tools): self-verify shim re-exports every domain do_*
The hand-maintained _EXPECTED list in the shim protection test can drift
silently when a new tool is added to a domain module but not re-exported
by the facade — exactly the omission a reviewer flagged post-split.
Add an auto-discovering test that enumerates every do_* from the domain
modules (incl. admin_tools) and asserts reachability through the shim,
so a forgotten re-export fails the build automatically.
Uses hasattr (not dir(ti)) because the admin symbols are re-exported
lazily via module __getattr__ and don't appear in dir(ti).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(tools): self-verify every in-repo facade import resolves
RaresKeY's P3 on the shim test was a claim-vs-reality gap: the docstring
said it protected "every from src.tool_implementations import X" but the
hand-maintained _EXPECTED list omitted three underscore helpers, so the
claim wasn't enforced. Re-exporting the three (cf1f5e3) fixed the known
gap; this closes the structural one.
Add test_every_facade_import_in_repo_resolves: ast-enumerate every
`from src.tool_implementations import X` site in src/ and tests/ and
assert hasattr(ti, X) for each. A forgotten re-export that anything in
the repo imports now fails the build automatically — including underscore
helpers, which the do_* discovery test does not cover.
Together with test_shim_reexports_every_domain_do_function, the shim
contract is now self-verifying. Demote _EXPECTED in the docstring to the
curated historical/downstream surface (the three helpers have no in-repo
consumer, so they stay manual by necessity) instead of "ground truth".
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tools): dedupe _parse_tool_args + align shim guard with route consumers
Addresses two P3s from review (RaresKeY, 2026-06-26):
1. maintainability — _common carried a full copy of _parse_tool_args
alongside the canonical src.tool_utils one; future parser fixes could
diverge. The two bodies were byte-identical in logic, so _common now
re-exports from tool_utils (a leaf module, no circular-import risk).
The single-source test is extended to assert _common._parse_tool_args
and tool_implementations._parse_tool_args are the same object as
tool_utils._parse_tool_args.
2. test — the shim guard's import-site scan only walked src/ and tests/,
missing routes/chat_routes.py's clear_active_email/set_active_email
imports, and _EXPECTED omitted the active-email facade helpers. The
scan now walks every first-party Python dir (pruning venvs/caches/data
in-place), and set/get/clear_active_email are added to _EXPECTED
(get_active_email has no in-repo importer, so the scan alone can't see
it).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: yuandonghao <yuandonghao@cohl.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Upstream bug (present in pewdiepie-archdaemon/odysseus main): the task
executor passes task.endpoint_url VERBATIM to the model HTTP call, unlike
the chat path which stores build_chat_url(normalize_base(base)) on the
session. A task carrying an explicit bare OpenAI-compatible base such as
"http://host:11434/v1" therefore POSTs to a 404 ("page not found"); the
agent loop swallows the empty body into "The model returned an empty
response" and marks the run success, so nothing surfaces the failure.
Tasks that omit an endpoint dodge this only because _resolve_defaults()
cribs an already-full URL from a recent chat session. The API/token path
(e.g. an external client that POSTs /api/tasks with endpoint_url=".../v1")
hits it every time.
Fix: route every resolved task endpoint through _normalize_chat_endpoint()
at the three resolution sites (_execute_llm_task, the persona/research
session path, and _execute_research_task). The helper is idempotent
(strips any existing chat suffix, re-appends the correct one) and leaves
native-Ollama (/api...) and already-concrete URLs untouched, so other
providers are unaffected. Proven via isolated repro: ".../v1" -> 404 ->
empty; ".../v1/chat/completions" -> 200 -> real gemma4:31b output.
Regression test asserts the bare-/v1 -> full-chat-URL mapping, idempotency,
and the native-Ollama/empty passthroughs.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
src/exceptions.py was a byte-for-byte duplicate of the canonical
core/exceptions.py. Replace its class bodies with a re-export shim
(mirroring the core/constants.py -> src/constants.py pattern) so the
exception classes are defined in exactly one place. Also fix the stale
"# src/exceptions.py" header comment in core/exceptions.py.
No behavior change: both import paths resolve to the same class objects
(verified by identity), so `except SessionNotFoundError` works regardless
of which module it was imported from. Ran py_compile and
pytest tests/test_app.py (12 passed).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Part of #3629 (the `admin_tools.py` bullet). Moves the config/integration admin
tools off the legacy elif dispatch chain in tool_implementations.py onto the
agent_tools registry:
manage_endpoints, manage_mcp, manage_webhooks, manage_tokens, manage_settings
The do_* implementations (and manage_mcp's command-allowlist / RCE guard:
_validate_mcp_command, _mcp_allowed_commands, and the _MCP_* constants) move
verbatim into the new src/agent_tools/admin_tools.py. They register through a
single ADMIN_TOOL_HANDLERS map that TOOL_HANDLERS.update()s, and the five elif
branches plus their imports are dropped from tool_execution.py, so these tools
now flow through _direct_fallback like the other migrated clusters. The names
are re-exported from src.agent_tools for back-compat.
Dedup:
- _parse_tool_args was duplicated in tool_implementations.py and
document_tools.py. It now lives once in src.tool_utils (which imports nothing
from the project beyond src.constants, so this introduces no cycle) and both
call sites import it from there. The orphaned `import json` in document_tools
is removed with it.
- The five tools share one _owner_adapter(fn) factory that threads ctx["owner"]
into the owner-taking do_* signature, instead of five near-identical wrappers.
Tests: new tests/test_admin_tools_registry.py pins the registration, the
re-export back-compat, the owner-threading adapter, and the single-source
_parse_tool_args (across admin_tools and document_tools). Existing MCP /
settings / webhook suites are repointed at the new module.
* feat(discovery): detect llama.cpp servers and label local providers
Scan port 8080 (llama-server) and 11435 (APFEL) during discovery, fingerprint
llama.cpp via its native /props endpoint, and label well-known local serving
ports (8080 llama.cpp, 8000 vLLM, 1234 LM Studio, 11434 Ollama) consistently
in both the Python provider helper and the JS endpoint UI. Adds a llama.cpp
hint to the /setup slash command.
* fix(discovery): don't infer the serving tool from the port alone
Per review: vLLM, SGLang, llama.cpp and plain OpenAI-compatible servers all
share 8000/8080, so labeling by port mislabels real setups (a vLLM box on 8080
shown as llama.cpp). Drop the port->tool assertions from _provider_label and
providerLabel; the authoritative signal is the /props fingerprint done during
discovery, which is unchanged. Loopback now reads a neutral 'local endpoint' /
'Local'. Tests updated to assert the neutral labels.
* fix: use atomic write in APIKeyManager.save() to prevent data loss
Opening api_keys.json with 'w' truncates the file before writing, so a
crash, disk-full, or mid-write error leaves all stored provider API keys
corrupted. Switch to atomic write (temp file + fsync + os.replace) so
the original file is always intact on any failure.
Fixes#4591
* chore: trigger CI re-run
* chore: update PR description
* chore: fix how-to-test section for description check
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* First bare fix
* Adding the option toggle
* toggle function fix
* Final fix, added missing /auth/
* Extended toggle text & added tests
* Comments change
* Description toggle change
* br tag fix
* description change based on suggestion
* fix(routes): serve 404 instead of 500 when an HTML page file is missing
_serve_html_with_nonce opened the HTML file with no error handling, and
callers such as /backgrounds and /login pass their paths in with no
existence check, so a missing or unreadable file raised an unhandled
OSError that surfaced as a 500. Wrap the read and raise HTTPException(404)
instead; the normal render path (CSP-nonce substitution) is unchanged.
Fixes#4594
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): distinguish missing page (404) from read failure (500)
The previous fix caught a broad OSError and returned 404 for every
failure, which masks real server-side problems (permission errors, I/O
failures) as "not found" and lets them slip past error alerting. Split
FileNotFoundError (genuine 404) from other OSError, which now logs the
exception and returns a generic 500 — without leaking the OS error
string or file path into the response body.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): treat unreadable bundled HTML page as logged 500, not 404
Per PR #4637 review: every caller of the page-render helper serves a fixed,
server-owned template (index/login/backgrounds), never a client-supplied
path. So a missing or unreadable file is a server fault (broken deployment),
not a client "not found" — a 404 there mislabels a server error and hides a
missing core template from 5xx alerting, contradicting the OSError->500
rationale this PR is built on. Collapse both branches into a single logged,
leak-free 500.
Move the helper to src.app_helpers.serve_html_with_nonce so the behavior can
be unit-tested without importing the whole app (app.py is the slim
orchestrator; the test harness stubs src.database, so importing app in tests
is not viable). Add tests pinning missing/unreadable -> 500 (not 404) and
nonce injection on the happy path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix: Images cannot be seen by model that is vision capable
* fix: skip http(s) image_url for Ollama (images[] is base64-only)
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* fix(llm): detect mistral.ai provider and support reasoning_effort
Four coupled bugs broke Mistral thinking model support:
1. _detect_provider() had no mistral.ai host check, so all Mistral
endpoints fell through to the generic 'openai' provider string.
_provider_display_name() correctly identified them as 'Mistral',
making any 'if provider == "Mistral"' check elsewhere dead code.
2. reasoning_effort parameter was never sent in the request payload,
so Mistral never activated thinking mode even when the user
configured a thinking-capable model (mistral-small-latest,
mistral-medium-latest, magistral-*).
3. Mistral returns content as a typed array
([{"type":"thinking",...},{"type":"text",...}]) when
reasoning is on, not as a plain string. Both the streaming and
non-streaming parsers expected strings and silently dropped the
thinking content.
4. _THINKING_MODEL_PATTERNS didn't include magistral or mistral-*
model prefixes, so the frontend wouldn't tag reasoning output
as thinking even after the above were fixed.
Fix:
- Add mistral.ai to _detect_provider() host checks
- Add a _normalize_mistral_content() helper that splits the typed
array into (text, thinking) strings
- Inject payload["reasoning_effort"] = "high" when provider is
Mistral and _supports_thinking(model) is true, in both stream_llm
and llm_call_async payload construction
- Wire the normalizer into both response parsers
- Extend _THINKING_MODEL_PATTERNS to include magistral,
mistral-small, mistral-medium, mistral-large
Tested on Docker install with mistral-small-latest +
reasoning_effort=high. Reasoning streams correctly into the
thinking panel after the fix.
Fixes#4678
* fix(llm): address review — lowercase provider id, configurable effort, tests
Addresses vdmkenny's review on PR #4698:
1. Removed duplicate 'if provider == "mistral"' block in stream_llm
— two back-to-back copies, one was dead-redundant.
2. Dropped personal-context comment ('free-tier limits are generous
for this user') and made reasoning_effort configurable via env var
ODYSSEUS_MISTRAL_REASONING_EFFORT (high / medium / low / none).
Default remains 'high' for backward compat with the tested behavior.
3. Recased provider id from 'Mistral' to 'mistral' to match the
lowercase convention used by every other provider id in the file
(openai, anthropic, ollama, copilot, ...). _provider_display_name()
still returns the Title-Case 'Mistral' for UI labels — only the
runtime id used in 'if provider == ...' checks was recased.
4. Added tests/test_llm_core_mistral_content.py with 13 tests pinning
_normalize_mistral_content()'s contract: string passthrough, the
Mistral array format (thinking + text blocks), and edge cases
(empty, garbage, None, wrong types, missing fields, string-vs-array
inner thinking field).
Also fixed a gap the review didn't catch: the non-streaming paths
(llm_call sync + llm_call_async) were missing the reasoning_effort
injection entirely. Added the same injection to both, so Deep Research
and agent tool calls also activate Mistral thinking.
All 13 new tests pass. Existing reasoning/streaming/ollama-thinking
tests still pass (38 tests, no regressions).
Fixes#4678
* fix(security): redact credential-bearing URLs and PII from logs
Several log statements emitted sensitive data in clear text:
- model_routes / chat_routes / contacts_routes logged endpoint URLs raw.
Admin-configured URLs can embed credentials in userinfo or query
(e.g. https://user:pass@host, ?api_key=...). Route them through a
shared core.log_safety.redact_url() that drops userinfo/query/fragment.
- note_routes / task_scheduler logged operator email addresses (smtp_user,
recipient). Replaced with presence booleans, which keeps the diagnostic
("why didn't this send") without writing PII to logs.
model_routes already had a local redactor on its HTTPStatusError branch;
the generic except branch was missed, so reuse the existing helper there.
Clears CodeQL py/clear-text-logging-sensitive-data alerts 264, 317, 324,
325, 343, 344, 528.
* fix(security): re-bracket IPv6 hosts and single-source the URL redactor
Address review on #4750:
- redact_url now re-brackets IPv6 literals so host:port stays
unambiguous (https://[2001:db8::1]:8443/v1, not the bracket-less
ambiguous form).
- point model_routes._redact_url_for_log at the shared helper so the
two redactors are single-sourced (also picks up the IPv6 fix).
Moves create_session, list_sessions, send_to_session and manage_session out of
ai_interaction.py into src/agent_tools/session_tools.py (the do_ prefix
dropped) and registers them in TOOL_HANDLERS, so dispatch flows through the
registry instead of the dispatch_ai_tool elif in tool_execution.py. Same
pattern as the model-interaction move.
The bodies move verbatim; each fetches the runtime-set session manager via a
get_session_manager() shim, and reuses _resolve_model / AI_CHAT_TIMEOUT from
ai_interaction. manage_session's internal 'list' alias is repointed from the
old do_list_sessions to the moved list_sessions. stream_ai_tool (dead, no
callers) and do_pipeline stay put. dispatch_ai_tool loses its four now-unused
branches.
Tests: test_session_tools_registry covers registration, owner threading, the
manage_session->list_sessions delegation, graceful no-manager handling, and
registry dispatch. Verified end-to-end against a live SessionManager.
Detached bash jobs (#!bg) could be launched and auto-reported on completion,
but the agent had no way to act on a running one: no on-demand output read and
no kill (it blocked until the 1h max-runtime). bg_jobs had the pieces
(_read_output, list_for_session, internal _kill) but none was exposed.
Adds:
- bg_jobs.kill(job_id): tears down the process tree, marks the job killed, and
sets followed_up so the monitor does not also auto-continue a deliberate kill.
- manage_bg_jobs registry tool with actions list / output / kill, scoped to the
chat that launched the job (cross-session access reads as not found).
- Wiring: TOOL_HANDLERS/TAGS, function schema, RAG index + keyword hints, parser
name map, dispatch (threads session_id via _direct_fallback). Gated like bash
(NON_ADMIN_BLOCKED_TOOLS; plan-mode mutator).
- agent_loop: background-job intent regex maps to the files domain (and the tool
joins _DOMAIN_TOOL_MAP[files]) so short commands like 'kill that job' are not
dropped by the low-signal gate that skips tool retrieval.
- bg launch message tells the model to call manage_bg_jobs itself for check/stop
rather than printing raw tool syntax to the user.
Tests: tests/test_bg_job_tools.py (kill semantics, per-chat scoping, actions,
and the intent classifier).
* fix(tools): prune skipped dirs before descending in glob tool
GlobTool used pathlib.Path.rglob which descends into every directory
(including node_modules, .git, dist, etc.) and filters AFTER the walk.
On repos with large junk directories this causes the glob tool to hang
for minutes.
Replace rglob with os.walk that prunes _CODENAV_SKIP_DIRS before
descending — matching the approach GrepTool already uses. Also add a
fast path for literal patterns (no wildcards → direct path lookup).
Fixes#4493
* fix(tools): use regex glob matching to fix * semantics and literal fallback
Replace fnmatch with _glob_to_regex so that * stays within a single
path segment (matching pathlib/rglob semantics) and **/ spans zero or
more directories. Literal patterns now fall through to os.walk when
the direct path lookup misses, so e.g. 'foo.py' still finds files at
any depth.
Add tests for:
- bare literal matching in subdirectories
- multi-segment single-star patterns (sub/*.txt)
- * not crossing / boundaries
- ** matching at arbitrary depth
Closes#4493
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
The harmony stream router only recognized the analysis and final channels, so
gpt-oss's standard `commentary` channel (tool-call preambles / function-arg
bodies) was unhandled: the literal `<|channel|>commentary` marker, the
`to=functions.*` recipient, and the commentary body all leaked into the
visible answer. Add commentary to the marker regex + the suffix-hold table, and
route its body to thinking (only `final` is user-facing). Adds a regression
test (split-chunk + recipient + body), verified to fail without the fix.
* fix(agent): index api_call so RAG tool selection can retrieve it
api_call exists in FUNCTION_TOOL_SCHEMAS and the agent's system prompt
advertises configured API integrations, but the tool had no entry in
BUILTIN_TOOL_DESCRIPTIONS. RAG tool selection embeds those descriptions and
retrieves the top-K per message, so a tool without one can never be selected:
the agent claims it can call Home Assistant/Miniflux/Gitea/etc. and then
never receives the api_call schema (unless the Personal Assistant
ASSISTANT_ALWAYS_AVAILABLE path applies).
Add a retrieval-rich description for api_call, plus an ast-based parity test
asserting every FUNCTION_TOOL_SCHEMAS tool has an index description so the
next added tool cannot silently drift the same way.
Fixes#3794
* fix(agent): route API-integration intent to api_call at selection time
Addresses review (RaresKeY) on #3923: indexing api_call in the ToolIndex
description was necessary but not sufficient — the #3794 repro ('Use the
api_call tool to call Home Assistant GET /api/states') matched no domain in
_classify_agent_request, classified as low-signal, so the agent loop skipped
retrieval entirely and the schema filter sent only ALWAYS_AVAILABLE
(manage_memory/ask_user/update_plan). api_call never reached the model.
- _classify_agent_request: detect API-integration intent (api_call,
integration(s), Home Assistant/Miniflux/Gitea/Linkding/Jellyfin) -> new
'integrations' domain, so the turn is no longer low-signal.
- _DOMAIN_TOOL_MAP['integrations'] = {api_call}: deterministically seeds
api_call into relevant tools after retrieval, independent of embeddings.
- _DOMAIN_RULES['integrations']: rule pack (required — _domain_rules_for_tools
indexes _DOMAIN_RULES[domain] directly).
- tool_index _KEYWORD_HINTS: parity hint for the retrieval / keyword-fallback
paths.
- Regression drives the real classifier -> domain-map -> FUNCTION_TOOL_SCHEMAS
filter chain and asserts api_call is advertised for the #3794 prompt.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Moves chat_with_model, ask_teacher and list_models out of ai_interaction.py
into src/agent_tools/model_interaction_tools.py (the do_ prefix dropped) and
registers them in TOOL_HANDLERS, so dispatch flows through the registry instead
of the dispatch_ai_tool elif in tool_execution.py.
The implementations are relocated, not wrapped. ai_interaction.py keeps only
the shared helpers they reuse (_resolve_model, AI_CHAT_TIMEOUT), still used by
the not-yet-migrated session/pipeline tools. dispatch_ai_tool loses its three
now-unused branches.
Also removes the dead do_second_opinion: it was already off the live tool
surface (no tag/schema/parsing/dispatch; tool_index.py notes it was removed),
so the function and its stale frontend catalog entries (admin.js, assistant.js)
are deleted.
Tests: owner-scope test points at the new list_models location and drops the
moved tools from the dispatch_ai_tool parametrize; a new
test_model_interaction_registry covers registration, owner threading, and
registry dispatch.
* fix(security): allowlist manage_mcp 'add' to close the agent-path RCE
do_manage_mcp('add') passed model- and prompt-injection-controlled command,
args, and env straight to a stdio subprocess spawn with no validation, and it
persisted an enabled server row before connecting (so a payload also survived
to re-execute on restart). A string smuggled into a skill description, memory
entry, fetched page, or email body could register a server running arbitrary
code as the app UID, e.g. command='sh' args=['-c','...'].
Add _validate_mcp_command, applied on the agent path before any DB write or
spawn:
- Hard-deny interpreters, runtimes, package runners, shells, and exec-wrappers
(even if an operator lists one in ODYSSEUS_MCP_ALLOWED_COMMANDS).
- Require a bare basename (no path components, no shell metacharacters) that is
present in the operator allowlist (empty by default).
- Reject code-exec argv flags by prefix so glued forms are caught too
(-c/-e/-m/--eval/--exec/--print/--module/--command/--require), remote-URL
args, and env keys that inject code into the child (LD_PRELOAD, NODE_OPTIONS,
PYTHONPATH, DYLD_*, PATH, ...).
A rejected registration returns an error, writes no row, and makes no
connection. The trusted admin route is unchanged. Mirrors the policy intent of
_validate_serve_cmd but inverted for the model-reachable surface.
Supersedes #438; incorporates the bypass forms found in its review (interpreter
script paths, -m pip, glued -c/-e, --eval=, eval subcommands, package runners,
remote URLs) and adds integration coverage on the real do_manage_mcp path.
Closes#2891
* fix(security): deny versioned/alias runtimes in manage_mcp allowlist
Addresses RaresKeY's review on #4433. The hard-deny matched command names
exactly, so versioned or alias runtime forms (python3.11, node18, pip3,
ruby3.2, java, javac, bunx, tsx, ts-node, pypy3, ...) slipped past and, if an
operator allowlisted one, re-opened the prompt-injection-controlled MCP
registration path.
- Canonicalize a trailing version suffix before the deny check so versioned
forms collapse to the family (python3.11 -> python, node18 -> node, pip3 ->
pip); both the raw basename and the canonical form are denied.
- Broaden the denied-family set (java/javac/jshell/jbang/kotlin/dotnet/mono/
swift/osascript/tsx/ts-node/bunx/pypy/jruby/raku/luajit/wish/expect/iex).
Deny runs before the operator allowlist, so an alias cannot be allowlisted back
in. Canonicalization only feeds the deny check, so a legit name that ends in a
digit still reaches the normal allowlist check rather than being mis-denied.
Adds validator + integration regressions for versioned/alias runtimes asserting
no DB row and no connection, including the allowlisted-anyway case.
The scheduled-task runner built the agent's tool set from RAG retrieval plus
ASSISTANT_ALWAYS_AVAILABLE. Neither includes bash/python (nor the file tools),
and no keyword hint force-includes them, so a task only saw the shell when the
tool-embedding index happened to surface it. On hosts where that index is empty
or degraded (e.g. a fresh Docker deploy), retrieval returns nothing and the task
agent never receives bash/python — telling the user the shell is disabled even
for an admin owner.
Offer the shell/file group to task agents by default, mirroring the chat agent
where these are on unless a privilege or global setting turns them off. The
existing blocked_tools_for_owner() gate in stream_agent_loop still strips the
whole group for non-admin multi-user owners and only admits it for admins and
single-user (AUTH_ENABLED=false) deployments, so this changes what is offered,
not who is allowed. A crew that defines an explicit enabled_tools allowlist
still has its restriction honored.
Also merge the operator's global disabled_tools setting into the scheduler's
disabled set before composing relevant_tools and before entering the agent
loop, matching what chat already does. Without it, the global tool-disable
contract did not reach unattended scheduled tasks: an admin or AUTH_ENABLED=false
task could still see and call shell/file tools the operator had turned off
globally, since the prompt/schema/execution gates only enforce the disabled
tools passed in.
The non-native (prompted) tool-call path fed tool output back to the model as a plain "[Tool execution results]" user message, bypassing the untrusted_context_message wrapper that THREAT_MODEL.md requires for tool output. That path is what models without native tool-calling (many smaller local models) use, so prompt-injection inside a tool result (fetched page, file read, MCP/email output) could be read as instructions there.
Wrap it via untrusted_context_message("tool execution results", ...), the same hardening already applied to skills (#788) and escalation traces (#275). Also update _recent_context_for_retrieval, which used the old "[Tool execution results]" prefix as a sentinel to keep tool envelopes out of the retrieval query, to recognise the wrapped envelope via metadata.trusted.
The native path keeps returning tool-role messages (a user-role wrapper would break the native tool-call contract); it is covered by UNTRUSTED_CONTEXT_POLICY. Adds tests/test_tool_output_prompt_injection.py.
Fixes#1627.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Added PASSWORD_MIN_LENGTH and RESERVED_USERNAMES to src/constants.py as the
single source of truth. Previously PASSWORD_MIN_LENGTH was hardcoded as 8 in
four route handlers and all three JS validation paths; RESERVED_USERNAMES was
an inline frozenset duplicated in core/auth.py, routes/assistant_routes.py,
routes/research_routes.py, and src/task_scheduler.py.
Added GET /api/auth/policy (unauthenticated) so the frontend reads the real
values from the server instead of hardcoding them in JS.
Added missing empty-username guard to /setup and admin POST /users. Both
returned a misleading 500/409 on whitespace-only input. /signup already had the
check; this makes all three consistent.
docker-compose.yml injects FASTEMBED_CACHE_PATH=${FASTEMBED_CACHE_PATH:-},
which sets the variable to an empty string when the host has not defined it.
FASTEMBED_CACHE_DIR used os.getenv("FASTEMBED_CACHE_PATH", default), and
os.getenv only returns the default when the variable is ABSENT -- so the empty
value won and FASTEMBED_CACHE_DIR became "". os.makedirs("") then raised
[Errno 2] No such file or directory: '', FastEmbed failed to initialise, and
every vector feature (RAG, semantic memory, tool index) silently degraded on
the default Docker stack.
Treat an empty value like an absent one via `os.getenv(...) or default`.
Add a regression test covering the empty, unset, and explicit cases.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: check-in calendar digest leaks every user's events (no owner scope)
* Seed dtend on calendar events in digest test so the NOT NULL column is satisfied
The outbound UA for web_fetch / web_search was inlined in four places with
two different values and nothing keeping them current: content.py pinned a
mid-2021 Chrome 91 build, and providers.py sent a bare Mozilla/5.0 in three
spots. Some sites serve a degraded or blocked page to a UA that old.
Add WEB_FETCH_USER_AGENT to src/constants.py (env-overridable, matching the
existing Copilot/Kimi UA-constant pattern) and import it in content.py and
providers.py. Default to a current, common desktop UA so pages return their
normal HTML: the market-leading desktop OS (Windows; NT 10.0 covers Windows
10 and 11) and browser (Chrome) on a current stable build. The version is now
bumped in one place.
Service-specific self-identifying agents (Copilot, Kimi, webhooks, cookbook)
are intentionally left separate. Adds a regression pinning the constant shape,
the env override, and a guard against a new inline Mozilla literal in the
search sources.
Closes#4324
When a CardDAV contact matched the search query but had no email
address (only phone numbers), the tool silently dropped it and
returned 'No contacts found'. Fall back to the contact's phone
number(s) so the caller still receives usable information.
Refs: #4178 (the contacts-domain classifier fix that made the model
actually call resolve_contact for contacts queries, surfacing this
pre-existing gap)
* fix(search): add download budgets to web_fetch with truncation notice and hard ceiling
MAX_OUTPUT_CHARS only trims what the agent sees; fetch_webpage_content
buffered and cached the entire response body first, so a large or hostile
URL could pull arbitrarily many bytes into memory and the content cache.
The fetch is now a capped streaming GET (SSRF redirect guard unchanged):
a soft default budget (WEB_FETCH_SOFT_MAX_BYTES, 2 MB), a per-call
override via full/max_bytes on the web_fetch tool, and a hard ceiling
(WEB_FETCH_HARD_MAX_BYTES, 20 MB) that the override can never exceed.
When Content-Length already declares a body over the ceiling the fetch
is refused before any body bytes are buffered. Truncated results carry
truncated/fetched_bytes/total_bytes, the tool output leads with a
partial-content notice telling the model how to re-fetch with full=true,
and the tool schema documents the flag. A truncated PDF is reported as
a budget error since a cut PDF is unparseable. The effective cap is part
of the content-cache key so a truncated fetch is never served to a
full-budget request.
Existing tests that faked httpx.get or the old _get_public_url signature
are adapted to the streaming interface; behavior pins are unchanged.
Fixes#3812
* fix(search): close compressed-body cap bypass and protect the partial notice
Addresses RaresKeY's review on #3955:
- Force Accept-Encoding: identity for the capped fetch. With gzip/deflate the
wire bytes (and Content-Length) can be a fraction of the decoded body, so a
tiny compressed response could pass the hard-cap preflight and then expand
past the ceiling in a single decoded chunk before the streamed cap could
slice it. Identity makes Content-Length the true body size and keeps each
streamed chunk bounded by the network read, so the hard ceiling actually
bounds memory.
- Lead web_fetch output with the partial-content notice and cap the page
title. The notice is the user-facing contract for partial fetches, but the
title is untrusted, uncapped page content; placed ahead of the notice a giant
title could push it past MAX_OUTPUT_CHARS and drop it. The notice now leads
and the title is capped as a second guard.
Adds regressions: the fetch advertises identity encoding, and a truncated
result with an oversized title still surfaces the partial notice.
* fix(search): reject compressed responses that ignore the identity request
Requesting Accept-Encoding: identity is not enough on its own: a server can
ignore it and still return Content-Encoding: gzip, and httpx.iter_bytes would
decode that, so a tiny compressed body could balloon into one decoded chunk
far past the hard cap before the streamed loop slices it (and Content-Length,
the compressed wire length, makes the preflight and size metadata unreliable).
Refuse a non-identity Content-Encoding before reading the body. Adds a
regression where the server ignores the identity request and returns gzip;
the fetch is refused before any body is decoded.
* log(app): add warnings to silent except Exception blocks
- Internal tool auth header failure now logs a warning instead of
silently passing, making auth bypass easier to spot in logs.
- Token last_used_at update failure now logs at DEBUG (fire-and-forget,
non-critical, but useful when debugging token tracking issues).
- Image ownership verification failure now logs a warning so unexpected
access-check errors surface instead of silently allowing the request.
* log(chat_routes): add warnings to silent except Exception blocks
- clear_orphaned_session_endpoint: log before rollback so failures
appear in traces when users see stale/deleted model options.
- _endpoint_has_model (JSON parse): log malformed cached_models instead
of silently treating endpoint as valid.
- _has_any_visible_model (JSON parse): log malformed cached_models
instead of silently returning empty list.
- timezone header parse: log failure so time-zone-related tool bugs
(wrong scheduled times, calendar events) are traceable.
- attachments JSON parse: log failure so silently-dropped attachments
are visible in server logs.
* log(email_routes): add warnings to silent except Exception blocks
- Email alias resolution failure now logs a warning instead of silently
returning an empty list, making broken account configs diagnosable.
* log(document_routes): add warnings to silent except Exception blocks
- Export ZIP request body parse failure now logs a warning so empty
exports caused by malformed requests are diagnosable.
- clear_active_document failure on detach now logs a warning to help
trace doc re-injection bugs like #1160.
* log(agent_loop): add warnings to silent except Exception blocks
- builtin tool overrides load failure now logs a warning so misconfigured
settings don't silently fall back to defaults without a trace.
- Timezone context injection failure now logs a warning to help debug
incorrect scheduled times in agent-created tasks.
- PDF form-backed document detection failure now logs a warning so
broken form-doc UI is traceable to the root cause.
* log(llm_core): add warnings to silent except Exception blocks
- Malformed URL in _is_ollama_native_url now logs a warning so bad
endpoint configs are traceable instead of silently returning False.
- Model list fetch failure now logs a warning with the endpoint URL so
endpoints that silently vanish from the model picker are diagnosable.
* log: pass exception via exc_info instead of string interpolation
* fix(logging): avoid logging raw URLs in llm_core error paths
Drop the raw url/base_chat_url from the Ollama-detection and
model-list-fetch warning logs added by this sweep, since these values
can contain private hostnames, internal IPs, credentials, or other
deployment details.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>