odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-28 23:52:09 -04:00

Author	SHA1	Message	Date
Alexandre Teixeira	fcbddf3845	Merge pull request #4280 from GeekLuffy/feat/llm-self-eval feat(teacher): implement Tier 2 LLM self-evaluation	2026-06-26 18:35:01 +01:00
Alexandre Teixeira	ab01e7a000	Merge pull request #4448 from Muhammad-Ikhwan-Fathulloh/dev fix(upload): cache upload manifest and improve rename reliability	2026-06-26 18:04:59 +01:00
Alexandre Teixeira	626414584b	fix(upload): remove trailing whitespace	2026-06-26 18:01:04 +01:00
GeekLuffy	d5a45c1ce3	feat(teacher): add teacher_tier2_enabled setting and strict parser	2026-06-26 22:26:15 +05:30
Tal.Yuan	fc1351d0f8	refactor(tools): split tool_implementations.py into src/tools/ package (#4423 ) * test(tools): add shim protection test for tool_implementations split Covers all 48 top-level functions (33 do_* + 15 _helpers) extracted from the original module. Guards the upcoming split: the shim must re-export every symbol so existing 'from src.tool_implementations import X' imports keep working. Passes on baseline (pre-split). * refactor(tools): add src/tools/ package with shared _common Slice 1 Task 2 (#4082/#4071). Adds the package skeleton and moves the shared _parse_tool_args helper into src/tools/_common.py. Domain modules will import from here. tool_implementations.py is untouched at this step. * refactor(tools): extract system domain into src/tools/system.py Slice 1 (#4082/#4071), Task 3: move the system-domain tool functions (do_manage_skills/_skill_dump/do_manage_tasks/do_manage_endpoints/ do_manage_mcp/do_manage_webhooks/do_manage_tokens/do_manage_settings/ do_api_call/do_app_api) and the app_api blocklist constants out of tool_implementations.py into a new src/tools/system.py module. tool_implementations.py re-imports all of them so it stays a working backward-compatible facade (shim test stays green). - do_manage_mcp resolves get_mcp_manager via a function-local import from tool_implementations so the test that patches src.tool_implementations.get_mcp_manager still applies post-move. - do_app_api imports _internal_headers and _INTERNAL_BASE (still in tool_implementations) function-locally to avoid a circular import. - Repoint test_context_budget introspection assertion to the moved code's new home in src/tools/system.py. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(tools): extract cookbook domain into src/tools/cookbook.py Moves the model-serving (cookbook) tool domain out of tool_implementations.py into src/tools/cookbook.py as part of slice 1 (#4082/#4071): - 13 do_* tools: download/serve/list/stop/tail/search/adopt/cached models, list downloads/cancel, list cookbook servers, serve presets - 9 private helpers: _cookbook_servers, _resolve_cookbook_host, _cookbook_env_for_host, _infer_serve_{port,host}, _ensure_served_endpoint, _cookbook_register_task, _cookbook_apply_retry_suggestion, _scan_running_model_processes, _cookbook_kill_session - _MODEL_PROCESS_PATTERNS constant (used only by _scan_running_model_processes) tool_implementations.py stays a backward-compatible facade via a re-import from src.tools.cookbook; src/tools/__init__ re-exports the same symbols. _internal_headers and _INTERNAL_BASE stay in tool_implementations.py (shared by system.py's do_app_api and many cookbook funcs). Each cookbook function that needs them does a function-local import to avoid a top-level circular dependency, matching the system-domain split. Verified: compileall clean; shim test green; cookbook-touching suite (652 passed, 1 skipped); full suite 3587 passed, 2 failed (pre-existing test_api_chat_security, unrelated). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(tools): extract search domain into src/tools/search.py * refactor(tools): extract notes domain into src/tools/notes.py * refactor(tools): extract calendar domain into src/tools/calendar.py Repoints tests/test_caldav_bidirectional_sync.py source-introspection to src/tools/calendar.py (do_manage_calendar moved there). * refactor(tools): extract image domain into src/tools/image.py * refactor(tools): extract research domain into src/tools/research.py * refactor(tools): extract contacts domain into src/tools/contacts.py * refactor(tools): extract vault domain into src/tools/vault.py Repoints tests/test_vault_password_not_in_argv.py source-introspection to src/tools/vault.py (the vault do_* helpers moved there). * refactor(tools): collapse tool_implementations to clean re-export shim Move shared _INTERNAL_BASE/_internal_headers to src/tools/_common.py and drop the duplicate _parse_tool_args (already in _common). tool_implementations.py is now a pure re-export facade (+ 3 pre-existing email-context helpers, out of scope). Domain files' function-local imports of these names still resolve via the facade re-export. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tools): port upstream cookbook workflow changes to split module Rebase onto dev dropped `c504214` ("Cookbook model workflow fixes") edits to do_serve_model / do_tail_serve_output: the extraction commit moved the pre-edit bodies into src/tools/cookbook.py and git auto-accepted the deletion from tool_implementations.py, losing dev's changes. Restore them in their post-split home: - do_serve_model: add where/log_path/next_tools and the expanded "Next required check" output message - do_tail_serve_output: empty-output fallback message replacing "(empty pane)" (do_manage_settings web_fetch alias edit was already applied to src/tools/system.py during the system-extract conflict resolution.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tools): break admin_tools circular import in split facade After rebasing onto dev (#3629 moved the admin manage_* tools into src/agent_tools/admin_tools), the facade re-exported them via a top-level `from src.agent_tools.admin_tools import ...`. But src.agent_tools.__init__ imports this facade at top level, so the eager import re-entered the partially-initialized agent_tools package and broke collection. Re-export the admin symbols (do_manage_endpoints/mcp/webhooks/tokens/ settings, _MCP_DENIED_COMMANDS, _validate_mcp_command) lazily through module __getattr__ instead, and drop them from src/tools/__init__ (they no longer live in the src.tools package). system.py now holds only the skills/tasks/api bridges; admin tools live solely in admin_tools.py, matching upstream. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tools): re-export dropped helpers through the split shim Address review finding from #4423: the compatibility facade claimed to preserve every original top-level symbol but omitted three helpers the old src.tool_implementations exposed. Re-export them and pin them in the shim protection test: - _string_arg, _validate_cookbook_ssh_target <- src/tools/cookbook.py - _mcp_allowed_commands <- src/agent_tools/admin_tools.py (lazily via __getattr__, to keep the agent_tools.__init__ <-> facade import acyclic after the #3629 admin-tools migration) All three added to tests/test_tool_implementations_shim.py _EXPECTED so the test contract now matches its "every original top-level function" comment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(tools): self-verify shim re-exports every domain do_* The hand-maintained _EXPECTED list in the shim protection test can drift silently when a new tool is added to a domain module but not re-exported by the facade — exactly the omission a reviewer flagged post-split. Add an auto-discovering test that enumerates every do_* from the domain modules (incl. admin_tools) and asserts reachability through the shim, so a forgotten re-export fails the build automatically. Uses hasattr (not dir(ti)) because the admin symbols are re-exported lazily via module __getattr__ and don't appear in dir(ti). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(tools): self-verify every in-repo facade import resolves RaresKeY's P3 on the shim test was a claim-vs-reality gap: the docstring said it protected "every from src.tool_implementations import X" but the hand-maintained _EXPECTED list omitted three underscore helpers, so the claim wasn't enforced. Re-exporting the three (`cf1f5e3`) fixed the known gap; this closes the structural one. Add test_every_facade_import_in_repo_resolves: ast-enumerate every `from src.tool_implementations import X` site in src/ and tests/ and assert hasattr(ti, X) for each. A forgotten re-export that anything in the repo imports now fails the build automatically — including underscore helpers, which the do_* discovery test does not cover. Together with test_shim_reexports_every_domain_do_function, the shim contract is now self-verifying. Demote _EXPECTED in the docstring to the curated historical/downstream surface (the three helpers have no in-repo consumer, so they stay manual by necessity) instead of "ground truth". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tools): dedupe _parse_tool_args + align shim guard with route consumers Addresses two P3s from review (RaresKeY, 2026-06-26): 1. maintainability — _common carried a full copy of _parse_tool_args alongside the canonical src.tool_utils one; future parser fixes could diverge. The two bodies were byte-identical in logic, so _common now re-exports from tool_utils (a leaf module, no circular-import risk). The single-source test is extended to assert _common._parse_tool_args and tool_implementations._parse_tool_args are the same object as tool_utils._parse_tool_args. 2. test — the shim guard's import-site scan only walked src/ and tests/, missing routes/chat_routes.py's clear_active_email/set_active_email imports, and _EXPECTED omitted the active-email facade helpers. The scan now walks every first-party Python dir (pruning venvs/caches/data in-place), and set/get/clear_active_email are added to _EXPECTED (get_active_email has no in-repo importer, so the scan alone can't see it). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: yuandonghao <yuandonghao@cohl.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 15:40:04 +01:00
nikakhalatiani	6cd489f79d	Retry oversized embedding requests (#1106 )	2026-06-26 14:21:27 +01:00
Magiomakes	4e46e415ea	fix(tasks): normalize task endpoint URL to /chat/completions before model call (#4619 ) Upstream bug (present in pewdiepie-archdaemon/odysseus main): the task executor passes task.endpoint_url VERBATIM to the model HTTP call, unlike the chat path which stores build_chat_url(normalize_base(base)) on the session. A task carrying an explicit bare OpenAI-compatible base such as "http://host:11434/v1" therefore POSTs to a 404 ("page not found"); the agent loop swallows the empty body into "The model returned an empty response" and marks the run success, so nothing surfaces the failure. Tasks that omit an endpoint dodge this only because _resolve_defaults() cribs an already-full URL from a recent chat session. The API/token path (e.g. an external client that POSTs /api/tasks with endpoint_url=".../v1") hits it every time. Fix: route every resolved task endpoint through _normalize_chat_endpoint() at the three resolution sites (_execute_llm_task, the persona/research session path, and _execute_research_task). The helper is idempotent (strips any existing chat suffix, re-appends the correct one) and leaves native-Ollama (/api...) and already-concrete URLs untouched, so other providers are unaffected. Proven via isolated repro: ".../v1" -> 404 -> empty; ".../v1/chat/completions" -> 200 -> real gemma4:31b output. Regression test asserts the bare-/v1 -> full-chat-URL mapping, idempotency, and the native-Ollama/empty passthroughs. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 18:02:31 +02:00
Solanki Sumit	6a2a39f892	refactor(exceptions): dedupe src/exceptions via core re-export (#4785 ) src/exceptions.py was a byte-for-byte duplicate of the canonical core/exceptions.py. Replace its class bodies with a re-export shim (mirroring the core/constants.py -> src/constants.py pattern) so the exception classes are defined in exactly one place. Also fix the stale "# src/exceptions.py" header comment in core/exceptions.py. No behavior change: both import paths resolve to the same class objects (verified by identity), so `except SessionNotFoundError` works regardless of which module it was imported from. Ran py_compile and pytest tests/test_app.py (12 passed). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 16:50:07 +02:00
GeekLuffy	413e628a30	Merge remote-tracking branch 'upstream/dev' into feat/llm-self-eval	2026-06-24 13:07:10 +05:30
Kenny Van de Maele	5ce2056521	refactor(tools): migrate config/integration admin tools to the registry (#4742 ) Part of #3629 (the `admin_tools.py` bullet). Moves the config/integration admin tools off the legacy elif dispatch chain in tool_implementations.py onto the agent_tools registry: manage_endpoints, manage_mcp, manage_webhooks, manage_tokens, manage_settings The do_* implementations (and manage_mcp's command-allowlist / RCE guard: _validate_mcp_command, _mcp_allowed_commands, and the _MCP_* constants) move verbatim into the new src/agent_tools/admin_tools.py. They register through a single ADMIN_TOOL_HANDLERS map that TOOL_HANDLERS.update()s, and the five elif branches plus their imports are dropped from tool_execution.py, so these tools now flow through _direct_fallback like the other migrated clusters. The names are re-exported from src.agent_tools for back-compat. Dedup: - _parse_tool_args was duplicated in tool_implementations.py and document_tools.py. It now lives once in src.tool_utils (which imports nothing from the project beyond src.constants, so this introduces no cycle) and both call sites import it from there. The orphaned `import json` in document_tools is removed with it. - The five tools share one _owner_adapter(fn) factory that threads ctx["owner"] into the owner-taking do_* signature, instead of five near-identical wrappers. Tests: new tests/test_admin_tools_registry.py pins the registration, the re-export back-compat, the owner-threading adapter, and the single-source _parse_tool_args (across admin_tools and document_tools). Existing MCP / settings / webhook suites are repointed at the new module.	2026-06-24 09:29:10 +02:00
Joel Alejandro Escareño Fernández	e0ccf250a4	feat(discovery): detect llama.cpp servers and label local providers (#4729 ) * feat(discovery): detect llama.cpp servers and label local providers Scan port 8080 (llama-server) and 11435 (APFEL) during discovery, fingerprint llama.cpp via its native /props endpoint, and label well-known local serving ports (8080 llama.cpp, 8000 vLLM, 1234 LM Studio, 11434 Ollama) consistently in both the Python provider helper and the JS endpoint UI. Adds a llama.cpp hint to the /setup slash command. * fix(discovery): don't infer the serving tool from the port alone Per review: vLLM, SGLang, llama.cpp and plain OpenAI-compatible servers all share 8000/8080, so labeling by port mislabels real setups (a vLLM box on 8080 shown as llama.cpp). Drop the port->tool assertions from _provider_label and providerLabel; the authoritative signal is the /props fingerprint done during discovery, which is unchanged. Loopback now reads a neutral 'local endpoint' / 'Local'. Tests updated to assert the neutral labels.	2026-06-23 23:39:56 +02:00
Michael	72c0bde8a9	fix: use atomic write in APIKeyManager.save() to prevent credential data loss (#4591 ) (#4597 ) * fix: use atomic write in APIKeyManager.save() to prevent data loss Opening api_keys.json with 'w' truncates the file before writing, so a crash, disk-full, or mid-write error leaves all stored provider API keys corrupted. Switch to atomic write (temp file + fsync + os.replace) so the original file is always intact on any failure. Fixes #4591 * chore: trigger CI re-run * chore: update PR description * chore: fix how-to-test section for description check --------- Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>	2026-06-23 23:28:53 +02:00
Dividesbyzer0	2e16394b41	fix(agent): parse misfenced read_file calls (#4799 )	2026-06-23 23:20:13 +02:00
Jakub Grula	060dbf0681	feat: Allow admins to choose if they want to share defaults (#4752 ) * First bare fix * Adding the option toggle * toggle function fix * Final fix, added missing /auth/ * Extended toggle text & added tests * Comments change * Description toggle change * br tag fix * description change based on suggestion	2026-06-23 23:06:45 +02:00
Ahmed Dlshad	8f5e36a079	fix(routes): log and cleanly 500 on unreadable HTML page (#4637 ) * fix(routes): serve 404 instead of 500 when an HTML page file is missing _serve_html_with_nonce opened the HTML file with no error handling, and callers such as /backgrounds and /login pass their paths in with no existence check, so a missing or unreadable file raised an unhandled OSError that surfaced as a 500. Wrap the read and raise HTTPException(404) instead; the normal render path (CSP-nonce substitution) is unchanged. Fixes #4594 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(routes): distinguish missing page (404) from read failure (500) The previous fix caught a broad OSError and returned 404 for every failure, which masks real server-side problems (permission errors, I/O failures) as "not found" and lets them slip past error alerting. Split FileNotFoundError (genuine 404) from other OSError, which now logs the exception and returns a generic 500 — without leaking the OS error string or file path into the response body. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(routes): treat unreadable bundled HTML page as logged 500, not 404 Per PR #4637 review: every caller of the page-render helper serves a fixed, server-owned template (index/login/backgrounds), never a client-supplied path. So a missing or unreadable file is a server fault (broken deployment), not a client "not found" — a 404 there mislabels a server error and hides a missing core template from 5xx alerting, contradicting the OSError->500 rationale this PR is built on. Collapse both branches into a single logged, leak-free 500. Move the helper to src.app_helpers.serve_html_with_nonce so the behavior can be unit-tested without importing the whole app (app.py is the slim orchestrator; the test harness stubs src.database, so importing app in tests is not viable). Add tests pinning missing/unreadable -> 500 (not 404) and nonce injection on the happy path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 16:12:32 +02:00
Michael	e8175c9535	fix: Images cannot be seen by model that is vision capable (#4726 ) * fix: Images cannot be seen by model that is vision capable * fix: skip http(s) image_url for Ollama (images[] is base64-only) --------- Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>	2026-06-23 10:32:57 +02:00
aubrey	bd9149f79a	fix(llm): detect mistral.ai provider and support reasoning_effort (#4698 ) * fix(llm): detect mistral.ai provider and support reasoning_effort Four coupled bugs broke Mistral thinking model support: 1. _detect_provider() had no mistral.ai host check, so all Mistral endpoints fell through to the generic 'openai' provider string. _provider_display_name() correctly identified them as 'Mistral', making any 'if provider == "Mistral"' check elsewhere dead code. 2. reasoning_effort parameter was never sent in the request payload, so Mistral never activated thinking mode even when the user configured a thinking-capable model (mistral-small-latest, mistral-medium-latest, magistral-). 3. Mistral returns content as a typed array ([{"type":"thinking",...},{"type":"text",...}]) when reasoning is on, not as a plain string. Both the streaming and non-streaming parsers expected strings and silently dropped the thinking content. 4. _THINKING_MODEL_PATTERNS didn't include magistral or mistral- model prefixes, so the frontend wouldn't tag reasoning output as thinking even after the above were fixed. Fix: - Add mistral.ai to _detect_provider() host checks - Add a _normalize_mistral_content() helper that splits the typed array into (text, thinking) strings - Inject payload["reasoning_effort"] = "high" when provider is Mistral and _supports_thinking(model) is true, in both stream_llm and llm_call_async payload construction - Wire the normalizer into both response parsers - Extend _THINKING_MODEL_PATTERNS to include magistral, mistral-small, mistral-medium, mistral-large Tested on Docker install with mistral-small-latest + reasoning_effort=high. Reasoning streams correctly into the thinking panel after the fix. Fixes #4678 * fix(llm): address review — lowercase provider id, configurable effort, tests Addresses vdmkenny's review on PR #4698: 1. Removed duplicate 'if provider == "mistral"' block in stream_llm — two back-to-back copies, one was dead-redundant. 2. Dropped personal-context comment ('free-tier limits are generous for this user') and made reasoning_effort configurable via env var ODYSSEUS_MISTRAL_REASONING_EFFORT (high / medium / low / none). Default remains 'high' for backward compat with the tested behavior. 3. Recased provider id from 'Mistral' to 'mistral' to match the lowercase convention used by every other provider id in the file (openai, anthropic, ollama, copilot, ...). _provider_display_name() still returns the Title-Case 'Mistral' for UI labels — only the runtime id used in 'if provider == ...' checks was recased. 4. Added tests/test_llm_core_mistral_content.py with 13 tests pinning _normalize_mistral_content()'s contract: string passthrough, the Mistral array format (thinking + text blocks), and edge cases (empty, garbage, None, wrong types, missing fields, string-vs-array inner thinking field). Also fixed a gap the review didn't catch: the non-streaming paths (llm_call sync + llm_call_async) were missing the reasoning_effort injection entirely. Added the same injection to both, so Deep Research and agent tool calls also activate Mistral thinking. All 13 new tests pass. Existing reasoning/streaming/ollama-thinking tests still pass (38 tests, no regressions). Fixes #4678	2026-06-23 10:28:17 +02:00
nopoz	7e5db9a3c6	fix(security): redact credential-bearing URLs and PII from logs (#4750 ) * fix(security): redact credential-bearing URLs and PII from logs Several log statements emitted sensitive data in clear text: - model_routes / chat_routes / contacts_routes logged endpoint URLs raw. Admin-configured URLs can embed credentials in userinfo or query (e.g. https://user:pass@host, ?api_key=...). Route them through a shared core.log_safety.redact_url() that drops userinfo/query/fragment. - note_routes / task_scheduler logged operator email addresses (smtp_user, recipient). Replaced with presence booleans, which keeps the diagnostic ("why didn't this send") without writing PII to logs. model_routes already had a local redactor on its HTTPStatusError branch; the generic except branch was missed, so reuse the existing helper there. Clears CodeQL py/clear-text-logging-sensitive-data alerts 264, 317, 324, 325, 343, 344, 528. * fix(security): re-bracket IPv6 hosts and single-source the URL redactor Address review on #4750: - redact_url now re-brackets IPv6 literals so host:port stays unambiguous (https://[2001:db8::1]:8443/v1, not the bracket-less ambiguous form). - point model_routes._redact_url_for_log at the shared helper so the two redactors are single-sourced (also picks up the IPv6 fix).	2026-06-22 23:12:39 +02:00
Gabriel Peña	91bba117c1	fix ask-user choices across reloads (#4669 )	2026-06-22 20:49:49 +02:00
pewdiepie-archdaemon	993d504de3	Clear remaining CodeQL path and parser alerts	2026-06-22 02:45:05 +00:00
pewdiepie-archdaemon	fbdec22dcb	CodeQL hardening for cookbook sync	2026-06-22 02:39:18 +00:00
pewdiepie-archdaemon	19dd82b8f6	CI test fixes for dev sync	2026-06-22 02:20:15 +00:00
pewdiepie-archdaemon	57e7229219	CI fixes for cookbook workflow sync	2026-06-22 02:08:25 +00:00
pewdiepie-archdaemon	92daf4e560	Cookbook launch and gallery upload fixes	2026-06-22 01:49:15 +00:00
pewdiepie-archdaemon	75f04bc088	Merge origin/dev into main	2026-06-21 11:08:50 +00:00
pewdiepie-archdaemon	c504214925	Cookbook model workflow fixes	2026-06-21 11:02:35 +00:00
Kenny Van de Maele	ed18192a8e	refactor(tools): move session tools to the agent_tools registry (#4454 ) Moves create_session, list_sessions, send_to_session and manage_session out of ai_interaction.py into src/agent_tools/session_tools.py (the do_ prefix dropped) and registers them in TOOL_HANDLERS, so dispatch flows through the registry instead of the dispatch_ai_tool elif in tool_execution.py. Same pattern as the model-interaction move. The bodies move verbatim; each fetches the runtime-set session manager via a get_session_manager() shim, and reuses _resolve_model / AI_CHAT_TIMEOUT from ai_interaction. manage_session's internal 'list' alias is repointed from the old do_list_sessions to the moved list_sessions. stream_ai_tool (dead, no callers) and do_pipeline stay put. dispatch_ai_tool loses its four now-unused branches. Tests: test_session_tools_registry covers registration, owner threading, the manage_session->list_sessions delegation, graceful no-manager handling, and registry dispatch. Verified end-to-end against a live SessionManager.	2026-06-19 11:55:22 +02:00
Kenny Van de Maele	cdae9879f2	feat(agent): add manage_bg_jobs tool to inspect and kill background bash jobs (#4577 ) Detached bash jobs (#!bg) could be launched and auto-reported on completion, but the agent had no way to act on a running one: no on-demand output read and no kill (it blocked until the 1h max-runtime). bg_jobs had the pieces (_read_output, list_for_session, internal _kill) but none was exposed. Adds: - bg_jobs.kill(job_id): tears down the process tree, marks the job killed, and sets followed_up so the monitor does not also auto-continue a deliberate kill. - manage_bg_jobs registry tool with actions list / output / kill, scoped to the chat that launched the job (cross-session access reads as not found). - Wiring: TOOL_HANDLERS/TAGS, function schema, RAG index + keyword hints, parser name map, dispatch (threads session_id via _direct_fallback). Gated like bash (NON_ADMIN_BLOCKED_TOOLS; plan-mode mutator). - agent_loop: background-job intent regex maps to the files domain (and the tool joins _DOMAIN_TOOL_MAP[files]) so short commands like 'kill that job' are not dropped by the low-signal gate that skips tool retrieval. - bg launch message tells the model to call manage_bg_jobs itself for check/stop rather than printing raw tool syntax to the user. Tests: tests/test_bg_job_tools.py (kill semantics, per-chat scoping, actions, and the intent classifier).	2026-06-19 00:28:22 -07:00
pewdiepie-archdaemon	20c6ba8f32	Bump APP_VERSION to 1.0.1	2026-06-19 00:34:37 +00:00
pewdiepie-archdaemon	9adb940ef9	Agent stream: 10s heartbeat keepalive on the SSE subscribe so long-running thinking models dont drop the connection	2026-06-19 00:34:30 +00:00
pewdiepie-archdaemon	2fbfd22946	Agent loop: compact one-line tool-usage hints for local/small models so the system prompt doesnt eat the context budget	2026-06-19 00:34:24 +00:00
Michael	39a802bea2	fix(tools): prune skipped dirs before descending in glob tool (#4538 ) * fix(tools): prune skipped dirs before descending in glob tool GlobTool used pathlib.Path.rglob which descends into every directory (including node_modules, .git, dist, etc.) and filters AFTER the walk. On repos with large junk directories this causes the glob tool to hang for minutes. Replace rglob with os.walk that prunes _CODENAV_SKIP_DIRS before descending — matching the approach GrepTool already uses. Also add a fast path for literal patterns (no wildcards → direct path lookup). Fixes #4493 * fix(tools): use regex glob matching to fix * semantics and literal fallback Replace fnmatch with _glob_to_regex so that * stays within a single path segment (matching pathlib/rglob semantics) and */ spans zero or more directories. Literal patterns now fall through to os.walk when the direct path lookup misses, so e.g. 'foo.py' still finds files at any depth. Add tests for: - bare literal matching in subdirectories - multi-segment single-star patterns (sub/.txt) - * not crossing / boundaries - ** matching at arbitrary depth Closes #4493 --------- Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>	2026-06-18 22:02:29 +02:00
RaresKeY	1cc8a373b0	fix(cookbook): validate agent SSH targets (#4429 )	2026-06-18 21:41:33 +02:00
nubs	0bfc7750a2	fix(llm): route gpt-oss harmony commentary channel without leaking markers/tool-args (#4523 ) The harmony stream router only recognized the analysis and final channels, so gpt-oss's standard `commentary` channel (tool-call preambles / function-arg bodies) was unhandled: the literal `<\|channel\|>commentary` marker, the `to=functions.*` recipient, and the commentary body all leaked into the visible answer. Add commentary to the marker regex + the suffix-hold table, and route its body to thinking (only `final` is user-facing). Adds a regression test (split-chunk + recipient + body), verified to fail without the fix.	2026-06-18 21:12:25 +02:00
Mazen Tamer Salah	b51d83b16d	fix(agent): index api_call so RAG tool selection can retrieve it (#3923 ) * fix(agent): index api_call so RAG tool selection can retrieve it api_call exists in FUNCTION_TOOL_SCHEMAS and the agent's system prompt advertises configured API integrations, but the tool had no entry in BUILTIN_TOOL_DESCRIPTIONS. RAG tool selection embeds those descriptions and retrieves the top-K per message, so a tool without one can never be selected: the agent claims it can call Home Assistant/Miniflux/Gitea/etc. and then never receives the api_call schema (unless the Personal Assistant ASSISTANT_ALWAYS_AVAILABLE path applies). Add a retrieval-rich description for api_call, plus an ast-based parity test asserting every FUNCTION_TOOL_SCHEMAS tool has an index description so the next added tool cannot silently drift the same way. Fixes #3794 * fix(agent): route API-integration intent to api_call at selection time Addresses review (RaresKeY) on #3923: indexing api_call in the ToolIndex description was necessary but not sufficient — the #3794 repro ('Use the api_call tool to call Home Assistant GET /api/states') matched no domain in _classify_agent_request, classified as low-signal, so the agent loop skipped retrieval entirely and the schema filter sent only ALWAYS_AVAILABLE (manage_memory/ask_user/update_plan). api_call never reached the model. - _classify_agent_request: detect API-integration intent (api_call, integration(s), Home Assistant/Miniflux/Gitea/Linkding/Jellyfin) -> new 'integrations' domain, so the turn is no longer low-signal. - _DOMAIN_TOOL_MAP['integrations'] = {api_call}: deterministically seeds api_call into relevant tools after retrieval, independent of embeddings. - _DOMAIN_RULES['integrations']: rule pack (required — _domain_rules_for_tools indexes _DOMAIN_RULES[domain] directly). - tool_index _KEYWORD_HINTS: parity hint for the retrieval / keyword-fallback paths. - Regression drives the real classifier -> domain-map -> FUNCTION_TOOL_SCHEMAS filter chain and asserts api_call is advertised for the #3794 prompt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-18 08:43:25 +00:00
Kenny Van de Maele	56ba144875	refactor(tools): move model-interaction tools to the agent_tools registry (#4445 ) Moves chat_with_model, ask_teacher and list_models out of ai_interaction.py into src/agent_tools/model_interaction_tools.py (the do_ prefix dropped) and registers them in TOOL_HANDLERS, so dispatch flows through the registry instead of the dispatch_ai_tool elif in tool_execution.py. The implementations are relocated, not wrapped. ai_interaction.py keeps only the shared helpers they reuse (_resolve_model, AI_CHAT_TIMEOUT), still used by the not-yet-migrated session/pipeline tools. dispatch_ai_tool loses its three now-unused branches. Also removes the dead do_second_opinion: it was already off the live tool surface (no tag/schema/parsing/dispatch; tool_index.py notes it was removed), so the function and its stale frontend catalog entries (admin.js, assistant.js) are deleted. Tests: owner-scope test points at the new list_models location and drops the moved tools from the dispatch_ai_tool parametrize; a new test_model_interaction_registry covers registration, owner threading, and registry dispatch.	2026-06-18 05:56:37 +00:00
Muhammad-Ikhwan-Fathulloh	b3ed60e95a	fix: optimize upload manifest performance and fix owner rename bug	2026-06-16 23:11:30 +07:00
Kenny Van de Maele	93569b141b	fix(security): allowlist manage_mcp 'add' to close the agent-path RCE (#4433 ) * fix(security): allowlist manage_mcp 'add' to close the agent-path RCE do_manage_mcp('add') passed model- and prompt-injection-controlled command, args, and env straight to a stdio subprocess spawn with no validation, and it persisted an enabled server row before connecting (so a payload also survived to re-execute on restart). A string smuggled into a skill description, memory entry, fetched page, or email body could register a server running arbitrary code as the app UID, e.g. command='sh' args=['-c','...']. Add _validate_mcp_command, applied on the agent path before any DB write or spawn: - Hard-deny interpreters, runtimes, package runners, shells, and exec-wrappers (even if an operator lists one in ODYSSEUS_MCP_ALLOWED_COMMANDS). - Require a bare basename (no path components, no shell metacharacters) that is present in the operator allowlist (empty by default). - Reject code-exec argv flags by prefix so glued forms are caught too (-c/-e/-m/--eval/--exec/--print/--module/--command/--require), remote-URL args, and env keys that inject code into the child (LD_PRELOAD, NODE_OPTIONS, PYTHONPATH, DYLD_, PATH, ...). A rejected registration returns an error, writes no row, and makes no connection. The trusted admin route is unchanged. Mirrors the policy intent of _validate_serve_cmd but inverted for the model-reachable surface. Supersedes #438; incorporates the bypass forms found in its review (interpreter script paths, -m pip, glued -c/-e, --eval=, eval subcommands, package runners, remote URLs) and adds integration coverage on the real do_manage_mcp path. Closes #2891 fix(security): deny versioned/alias runtimes in manage_mcp allowlist Addresses RaresKeY's review on #4433. The hard-deny matched command names exactly, so versioned or alias runtime forms (python3.11, node18, pip3, ruby3.2, java, javac, bunx, tsx, ts-node, pypy3, ...) slipped past and, if an operator allowlisted one, re-opened the prompt-injection-controlled MCP registration path. - Canonicalize a trailing version suffix before the deny check so versioned forms collapse to the family (python3.11 -> python, node18 -> node, pip3 -> pip); both the raw basename and the canonical form are denied. - Broaden the denied-family set (java/javac/jshell/jbang/kotlin/dotnet/mono/ swift/osascript/tsx/ts-node/bunx/pypy/jruby/raku/luajit/wish/expect/iex). Deny runs before the operator allowlist, so an alias cannot be allowlisted back in. Canonicalization only feeds the deny check, so a legit name that ends in a digit still reaches the normal allowlist check rather than being mis-denied. Adds validator + integration regressions for versioned/alias runtimes asserting no DB row and no connection, including the allowlisted-anyway case.	2026-06-16 14:34:53 +00:00
Ashvin	dd20c2bc75	fix(tasks): offer shell/file tools to scheduled task agents by default (#4398 ) The scheduled-task runner built the agent's tool set from RAG retrieval plus ASSISTANT_ALWAYS_AVAILABLE. Neither includes bash/python (nor the file tools), and no keyword hint force-includes them, so a task only saw the shell when the tool-embedding index happened to surface it. On hosts where that index is empty or degraded (e.g. a fresh Docker deploy), retrieval returns nothing and the task agent never receives bash/python — telling the user the shell is disabled even for an admin owner. Offer the shell/file group to task agents by default, mirroring the chat agent where these are on unless a privilege or global setting turns them off. The existing blocked_tools_for_owner() gate in stream_agent_loop still strips the whole group for non-admin multi-user owners and only admits it for admins and single-user (AUTH_ENABLED=false) deployments, so this changes what is offered, not who is allowed. A crew that defines an explicit enabled_tools allowlist still has its restriction honored. Also merge the operator's global disabled_tools setting into the scheduler's disabled set before composing relevant_tools and before entering the agent loop, matching what chat already does. Without it, the global tool-disable contract did not reach unattended scheduled tasks: an admin or AUTH_ENABLED=false task could still see and call shell/file tools the operator had turned off globally, since the prompt/schema/execution gates only enforce the disabled tools passed in.	2026-06-16 13:27:30 +00:00
Rudy Wolf	4e477741e7	harden(agent-loop): wrap non-native tool results as untrusted data (#1629 ) The non-native (prompted) tool-call path fed tool output back to the model as a plain "[Tool execution results]" user message, bypassing the untrusted_context_message wrapper that THREAT_MODEL.md requires for tool output. That path is what models without native tool-calling (many smaller local models) use, so prompt-injection inside a tool result (fetched page, file read, MCP/email output) could be read as instructions there. Wrap it via untrusted_context_message("tool execution results", ...), the same hardening already applied to skills (#788) and escalation traces (#275). Also update _recent_context_for_retrieval, which used the old "[Tool execution results]" prefix as a sentinel to keep tool envelopes out of the retrieval query, to recognise the wrapped envelope via metadata.trusted. The native path keeps returning tool-role messages (a user-role wrapper would break the native tool-call contract); it is covered by UNTRUSTED_CONTEXT_POLICY. Adds tests/test_tool_output_prompt_injection.py. Fixes #1627. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 13:35:07 +02:00
Karl Jussila	ee72d71872	fix(auth): centralize password and username validation constants (#4120 ) Added PASSWORD_MIN_LENGTH and RESERVED_USERNAMES to src/constants.py as the single source of truth. Previously PASSWORD_MIN_LENGTH was hardcoded as 8 in four route handlers and all three JS validation paths; RESERVED_USERNAMES was an inline frozenset duplicated in core/auth.py, routes/assistant_routes.py, routes/research_routes.py, and src/task_scheduler.py. Added GET /api/auth/policy (unauthenticated) so the frontend reads the real values from the server instead of hardcoding them in JS. Added missing empty-username guard to /setup and admin POST /users. Both returned a misleading 500/409 on whitespace-only input. /signup already had the check; this makes all three consistent.	2026-06-16 09:52:15 +02:00
RaresKeY	260ce8ba59	fix(email): enforce MCP owner boundaries (#4335 ) * fix(email): enforce MCP owner boundaries * fix(email): fail closed for unowned MCP fallback	2026-06-16 04:31:24 +01:00
RaresKeY	33fe7276be	fix(endpoints): normalize URL handling (#4338 )	2026-06-16 03:59:18 +01:00
RaresKeY	4d10c16d02	fix(auth): clean up rename and null-owner ownership (#4340 )	2026-06-16 03:33:02 +01:00
TheDragonTail	0f966d6b9f	fix(embeddings): fall back to default cache dir when FASTEMBED_CACHE_PATH is empty (#3434 ) docker-compose.yml injects FASTEMBED_CACHE_PATH=${FASTEMBED_CACHE_PATH:-}, which sets the variable to an empty string when the host has not defined it. FASTEMBED_CACHE_DIR used os.getenv("FASTEMBED_CACHE_PATH", default), and os.getenv only returns the default when the variable is ABSENT -- so the empty value won and FASTEMBED_CACHE_DIR became "". os.makedirs("") then raised [Errno 2] No such file or directory: '', FastEmbed failed to initialise, and every vector feature (RAG, semantic memory, tool index) silently degraded on the default Docker stack. Treat an empty value like an absent one via `os.getenv(...) or default`. Add a regression test covering the empty, unset, and explicit cases. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 03:11:48 +01:00
Afonso Coutinho	7b09491557	fix: check-in calendar digest leaks every user's events (missing owner scope) (#1925 ) * fix: check-in calendar digest leaks every user's events (no owner scope) * Seed dtend on calendar events in digest test so the NOT NULL column is satisfied	2026-06-16 02:42:41 +01:00
Kenny Van de Maele	fafaf089c5	refactor(search): centralize the web-scraping User-Agent into one constant (#4325 ) The outbound UA for web_fetch / web_search was inlined in four places with two different values and nothing keeping them current: content.py pinned a mid-2021 Chrome 91 build, and providers.py sent a bare Mozilla/5.0 in three spots. Some sites serve a degraded or blocked page to a UA that old. Add WEB_FETCH_USER_AGENT to src/constants.py (env-overridable, matching the existing Copilot/Kimi UA-constant pattern) and import it in content.py and providers.py. Default to a current, common desktop UA so pages return their normal HTML: the market-leading desktop OS (Windows; NT 10.0 covers Windows 10 and 11) and browser (Chrome) on a current stable build. The version is now bumped in one place. Service-specific self-identifying agents (Copilot, Kimi, webhooks, cookbook) are intentionally left separate. Adds a regression pinning the constant shape, the env override, and a guard against a new inline Mozilla literal in the search sources. Closes #4324	2026-06-16 01:33:47 +00:00
holden093	dd2e23c9af	fix(agent): report phone numbers from resolve_contact when a matched contact has no email (#4327 ) When a CardDAV contact matched the search query but had no email address (only phone numbers), the tool silently dropped it and returned 'No contacts found'. Fall back to the contact's phone number(s) so the caller still receives usable information. Refs: #4178 (the contacts-domain classifier fix that made the model actually call resolve_contact for contacts queries, surfacing this pre-existing gap)	2026-06-16 00:03:33 +02:00
Kenny Van de Maele	074a1e6eff	fix(search): add download budgets to web_fetch with truncation notice and hard ceiling (#3955 ) * fix(search): add download budgets to web_fetch with truncation notice and hard ceiling MAX_OUTPUT_CHARS only trims what the agent sees; fetch_webpage_content buffered and cached the entire response body first, so a large or hostile URL could pull arbitrarily many bytes into memory and the content cache. The fetch is now a capped streaming GET (SSRF redirect guard unchanged): a soft default budget (WEB_FETCH_SOFT_MAX_BYTES, 2 MB), a per-call override via full/max_bytes on the web_fetch tool, and a hard ceiling (WEB_FETCH_HARD_MAX_BYTES, 20 MB) that the override can never exceed. When Content-Length already declares a body over the ceiling the fetch is refused before any body bytes are buffered. Truncated results carry truncated/fetched_bytes/total_bytes, the tool output leads with a partial-content notice telling the model how to re-fetch with full=true, and the tool schema documents the flag. A truncated PDF is reported as a budget error since a cut PDF is unparseable. The effective cap is part of the content-cache key so a truncated fetch is never served to a full-budget request. Existing tests that faked httpx.get or the old _get_public_url signature are adapted to the streaming interface; behavior pins are unchanged. Fixes #3812 * fix(search): close compressed-body cap bypass and protect the partial notice Addresses RaresKeY's review on #3955: - Force Accept-Encoding: identity for the capped fetch. With gzip/deflate the wire bytes (and Content-Length) can be a fraction of the decoded body, so a tiny compressed response could pass the hard-cap preflight and then expand past the ceiling in a single decoded chunk before the streamed cap could slice it. Identity makes Content-Length the true body size and keeps each streamed chunk bounded by the network read, so the hard ceiling actually bounds memory. - Lead web_fetch output with the partial-content notice and cap the page title. The notice is the user-facing contract for partial fetches, but the title is untrusted, uncapped page content; placed ahead of the notice a giant title could push it past MAX_OUTPUT_CHARS and drop it. The notice now leads and the title is capped as a second guard. Adds regressions: the fetch advertises identity encoding, and a truncated result with an oversized title still surfaces the partial notice. * fix(search): reject compressed responses that ignore the identity request Requesting Accept-Encoding: identity is not enough on its own: a server can ignore it and still return Content-Encoding: gzip, and httpx.iter_bytes would decode that, so a tiny compressed body could balloon into one decoded chunk far past the hard cap before the streamed loop slices it (and Content-Length, the compressed wire length, makes the preflight and size metadata unreliable). Refuse a non-identity Content-Encoding before reading the body. Adds a regression where the server ignores the identity request and returns gzip; the fetch is refused before any body is decoded.	2026-06-15 17:38:09 +00:00
Lucas Daniel	f4e8990635	chore: add warnings to silent except Exception blocks (#3212 ) * log(app): add warnings to silent except Exception blocks - Internal tool auth header failure now logs a warning instead of silently passing, making auth bypass easier to spot in logs. - Token last_used_at update failure now logs at DEBUG (fire-and-forget, non-critical, but useful when debugging token tracking issues). - Image ownership verification failure now logs a warning so unexpected access-check errors surface instead of silently allowing the request. * log(chat_routes): add warnings to silent except Exception blocks - clear_orphaned_session_endpoint: log before rollback so failures appear in traces when users see stale/deleted model options. - _endpoint_has_model (JSON parse): log malformed cached_models instead of silently treating endpoint as valid. - _has_any_visible_model (JSON parse): log malformed cached_models instead of silently returning empty list. - timezone header parse: log failure so time-zone-related tool bugs (wrong scheduled times, calendar events) are traceable. - attachments JSON parse: log failure so silently-dropped attachments are visible in server logs. * log(email_routes): add warnings to silent except Exception blocks - Email alias resolution failure now logs a warning instead of silently returning an empty list, making broken account configs diagnosable. * log(document_routes): add warnings to silent except Exception blocks - Export ZIP request body parse failure now logs a warning so empty exports caused by malformed requests are diagnosable. - clear_active_document failure on detach now logs a warning to help trace doc re-injection bugs like #1160. * log(agent_loop): add warnings to silent except Exception blocks - builtin tool overrides load failure now logs a warning so misconfigured settings don't silently fall back to defaults without a trace. - Timezone context injection failure now logs a warning to help debug incorrect scheduled times in agent-created tasks. - PDF form-backed document detection failure now logs a warning so broken form-doc UI is traceable to the root cause. * log(llm_core): add warnings to silent except Exception blocks - Malformed URL in _is_ollama_native_url now logs a warning so bad endpoint configs are traceable instead of silently returning False. - Model list fetch failure now logs a warning with the endpoint URL so endpoints that silently vanish from the model picker are diagnosable. * log: pass exception via exc_info instead of string interpolation * fix(logging): avoid logging raw URLs in llm_core error paths Drop the raw url/base_chat_url from the Ollama-detection and model-list-fetch warning logs added by this sweep, since these values can contain private hostnames, internal IPs, credentials, or other deployment details. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-15 17:49:27 +01:00

1 2 3 4 5 ...

473 Commits