odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-16 09:45:24 -04:00

Author	SHA1	Message	Date
ThomasAngel	a0b0420e6f	chore: Switch duckduckgo-search to ddgs (#3143 ) * Switch to ddgs duckduckgo_search was deprecated, this is the recommended replacement * Update test_service_search_provider_guards.py According to review comment	2026-06-10 17:59:47 +02:00
Mazen Tamer Salah	96975f8dd9	fix(contacts): tolerate non-string body in /api/contacts/import (#3638 ) import_vcf built `text = data.get("vcf") or data.get("text") or ""`, so a non-string JSON value (a number, list, etc.) stayed in place and the following `text.strip()` raised AttributeError, returning HTTP 500. Coerce vcf/text/csv with str() so non-string input degrades to the existing structured "no data" response, matching the file's convention elsewhere. Adds tests/test_contacts_import_nonstring.py covering non-string vcf, non-string csv, and an empty body.	2026-06-10 17:50:22 +02:00
Mazen Tamer Salah	4e210d3337	fix(research): stop rescanning the research dir on every status poll (#3637 ) get_status() called get_avg_duration() unconditionally, and that helper globs and JSON-parses every file under the research data dir. The SSE status stream polls get_status() roughly once a second, so with a few saved reports each poll re-read and re-parsed all of them, including for sessions that are not active (the disk branch never even used the value). Compute avg_duration only for active sessions and memoize it on the task entry, so a long stream computes it once instead of on every poll. Behaviour is unchanged: active streams still report avg_duration. Adds tests/test_research_status_avg_duration.py: an inactive session does no avg scan, and an active session computes it once across many polls.	2026-06-10 17:40:44 +02:00
RaresKeY	800d391234	fix(auth): roll back rename on owner migration failure (#3616 )	2026-06-10 17:28:27 +02:00
Ashvin	9c8df89973	fix(auth): case-insensitive skill owner match on rename (#3614 ) SKILL.md files written with mixed-case owner (e.g. 'owner: Alice') were skipped because the regex had no IGNORECASE flag. _usage.json keys like 'Alice::skill-name' were missed by the startswith prefix check for the same reason. Both comparisons now match the same way the deep_research and memory blocks do — case-insensitively against old_username. Fixes #3611	2026-06-10 17:20:36 +02:00
Maruf Hasan	edce608008	fix(ui): raw SVG markup displayed instead of search icon for web_search tool label (#3601 ) * fix(ui): escaped SVG renders as raw markup during web_search tool label The _toolLabels['web_search'] entry embedded an SVG HTML string concatenated with label text. At render time the entire value was passed through esc(), HTML-escaping <svg> tags so the icon displayed as raw text instead of rendering visually. Fix: separate icon from label text via a _toolIcons map. The SVG is injected as raw innerHTML (unescaped) in .agent-thread-icon, while the label text remains safely escaped. * test: add behavioral test for web_search tool icon rendering Co-authored-by: TheDragonTail <jakeoldfield2@gmail.com> --------- Co-authored-by: TheDragonTail <jakeoldfield2@gmail.com>	2026-06-10 16:50:43 +02:00
RaresKeY	ee6cfbd25a	fix(auth): drop reserved usernames loaded from auth config (#3727 )	2026-06-10 16:31:26 +02:00
RaresKeY	cd3fb4e96b	fix(auth): fail closed when deleting user tokens fails (#3733 )	2026-06-10 16:24:27 +02:00
SurprisedDuck	e115b0155c	fix(security): don't grant tool access in the pre-setup window (#3506 ) * fix(security): don't grant tool access in the pre-setup window owner_is_admin_or_single_user() returned True whenever auth was not configured, which conflated two very different states: - intentional single-user mode (operator set AUTH_ENABLED=false), and - the pre-setup window (auth enabled, but no admin created yet). In the second state, blocked_tools_for_owner() returned an empty set, so server-execution tools (bash/python) and other admin-only tools were ungated. The auth middleware already 401s /api/ requests pre-setup, but a caller that bypasses it (trusted loopback / internal-tool path) could reach those tools before setup completed. Treat "not configured" as admin only when auth is intentionally disabled (AUTH_ENABLED=false), mirroring the AUTH_ENABLED parsing in app.py and core.middleware. Single-user mode is preserved; the pre-setup window is now non-admin as defense-in-depth. Adds regression tests for both states. Fixes #3201 Supported by Claude Opus 4.8 * refactor(security): reuse _auth_disabled() instead of a duplicate helper Addresses review on #3506: src/auth_helpers.py already has _auth_disabled() with the identical AUTH_ENABLED parse. Drop the duplicate _auth_intentionally_disabled() and call the existing helper via a lazy import inside owner_is_admin_or_single_user (mirroring the lazy core.auth import) to avoid any import cycle. Removes the now-unused `import os`. Behaviour and the two regression tests are unchanged. Supported by Claude Opus 4.8 --------- Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>	2026-06-10 14:37:26 +02:00
ooovenenoso	725d174243	fix(research): track analyzed URLs separately (#3125 ) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-10 12:08:22 +01:00
Yeoh Ing Ji	3e49658204	refactor(tools): extract document tools to handle registry (#3666 ) * feat(tools): add document management tool handlers to the agent_tools module * feat(tools): extraced document tools for create, update, edit, suggest, and manage from tool_implementations.py * feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools * refactor(tools): add document tool dispatcher and updated tool calling path * refactor(tools): remove duplicated document management functions * refactor(tools): removing unused functions and adding new import paths * refactor(tools): update document tool execute methods to use context dictionary * refactor(tests): update import paths for document tools in test files * refactor(tests): update owner parameter format in document management tests * refactor(tests): update import path for _owned_document_query * feat(tools): add document management tool handlers to the agent_tools module * feat(tools): extraced document tools for create, update, edit, suggest, and manage from tool_implementations.py * feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools * refactor(tools): add document tool dispatcher and updated tool calling path * refactor(tools): remove duplicated document management functions * refactor(tools): removing unused functions and adding new import paths * refactor(tools): update document tool execute methods to use context dictionary * refactor(tests): update import paths for document tools in test files * refactor(tests): update owner parameter format in document management tests * refactor(tests): update import path for _owned_document_query * refactor: update import paths for document tools * fix(tests): correct source path for document ID test	2026-06-10 10:41:52 +02:00
Alexandre Teixeira	fc8e6366dd	test: mark first slow tests from duration evidence (#3711 )	2026-06-10 01:07:38 +02:00
Lucas Daniel	55ff22c6d5	fix(chat): stabilize system prompt, sequence memory extraction, and send stable session id to preserve KV cache (#3360 ) * fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache Fixes #2927. As diagnosed in the issue, three things in Odysseus's request pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache continuity, forcing a full prompt re-evaluation (15-30s+) on every turn: 1. Dynamic content folded into the system prompt every turn. Both the chat preface (ChatProcessor.build_context_preface) and the agent system prompt (_build_system_prompt) injected current_datetime_prompt() — text that changes every minute — directly into system-role messages, which llm_core then concatenates into the single system message sent as the cached prefix. Any byte difference there invalidates the entire cache. Moved this to a new current_datetime_context_message() helper that returns a standalone user-role message, inserted near the end of the array (right before the latest user turn) instead of mixed into the system prompt. The static system prefix (preset prompt + safety policy + agent base prompt) now stays byte-identical across turns of the same session. 2. Memory/skill extraction side-requests competed with the main completion. run_post_response_tasks fired extract_and_store / maybe_extract_skill via asyncio.create_task — fire-and-forget coroutines that could overlap the next turn's main request and steal llama.cpp's limited processing slots, evicting the cached checkpoint. They're now queued through a new _run_extraction_jobs_sequentially helper that waits for the session's stream to go idle and runs the jobs strictly one at a time. 3. No stable session identifier was sent to local backends, so llama.cpp assigned a new processing slot via LRU every turn ("session_id=<empty> server-selected (LCP/LRU)"), losing slot affinity. Added _apply_local_cache_affinity() in llm_core, which sets session_id and cache_prompt: true on outgoing payloads — gated to self-hosted OpenAI-compatible endpoints only (never api.openai.com or other cloud providers, which reject unrecognized request fields with a 400). Threaded session_id through stream_llm / llm_call_async / stream_agent_loop from the existing Odysseus session id. Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload- assembly and scheduling code paths: byte-identical system prefix across two turns of the same session (with a regression check that genuinely changed instructions DO still change it), the dynamic time block landing as a user-role message, extraction jobs waiting for the stream to go idle and running sequentially, and the outgoing payload carrying a stable session_id (same across turns of one session, different across sessions) only for self-hosted endpoints. Updated tests/test_user_time.py for the new message placement. * fix(tests): accept owner= kwarg in normalize_model_id monkeypatch The upstream normalize_model_id signature now takes an owner= keyword argument, and chat_helpers.py passes owner=getattr(sess, "owner", None) at the call site. Update the test stub lambda to **kwargs so it handles the new argument without breaking, and update chat_helpers.py to forward the owner parameter consistently. --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 22:46:54 +01:00
Lucas Daniel	d273085744	fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut (#3540 ) * fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut * fix(integrations): avoid mutating response dict in-place on truncation * fix(integrations): truncate dict responses and bound list sentinel overhead - Dict path now walks keys in insertion order, adding them one at a time while checking that the accumulated dict + _truncated marker fits within the 12 000-char limit. Previously the marker was appended without removing any content, so large dicts were not actually truncated. - List path now subtracts the sentinel's serialised size (+ element-separator padding) from the budget before binary-searching, so the final array including the sentinel stays at or under the limit. - Add regression tests: large-dict actually-truncated, small-dict pass-through, and list-with-sentinel respects the size bound. --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 22:34:08 +01:00
Michael	2e6fff2212	fix: preserve reasoning_content in sanitized messages for Moonshot/Kimi (#3152 ) Providers like Moonshot (Kimi K2.5/K2.6) require the reasoning_content field to be present on assistant tool-call messages in multi-turn conversations. The sanitizer's allow-list was missing this field, causing HTTP 400: 'thinking is enabled but reasoning_content is missing in assistant tool call message at index N'. Add reasoning_content to the allowed field set in _sanitize_llm_messages and cover with regression tests. Fixes #3118 Co-authored-by: michaelxer <michaelxer@users.noreply.github.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 21:44:38 +01:00
Alexandre Teixeira	a22c0fa85e	test: pilot core database stub helper (#3685 )	2026-06-09 22:23:33 +02:00
arnodecorte	38dc9a0a41	Allow cookbook scopes for API tokens (#3090 ) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 21:03:40 +01:00
Rohith Matam	fbd8ee9033	fix: fall back for npx cache subprocess check (#3560 ) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 20:41:23 +01:00
RaresKeY	5d33393a28	fix(gallery): fail closed for null-user owner scope (#3613 )	2026-06-09 20:20:21 +02:00
Alexandre Teixeira	cdfda4bd16	test: add fast lane and duration visibility (#3659 )	2026-06-09 20:11:47 +02:00
Sid	9e74a327f8	fix(llm): remove max_output_tokens from ChatGPT Subscription payload (#3656 ) ChatGPT's Codex API rejects any request that includes max_output_tokens, returning HTTP 400 "Unsupported parameter: max_output_tokens". This caused Deep Research to always fail during the endpoint probe when a ChatGPT Subscription model was selected. Remove the conditional that set payload["max_output_tokens"] in _build_chatgpt_responses_payload(). The parameter is simply not sent. Also update the two affected tests: - Rename test_chatgpt_subscription_payload_uses_max_output_tokens → test_chatgpt_subscription_payload_omits_max_output_tokens - Rename test_chatgpt_subscription_payload_omits_empty_max_output_tokens → test_chatgpt_subscription_payload_omits_max_output_tokens_when_zero - Assert max_output_tokens is absent rather than present Fixes #3650	2026-06-09 17:42:12 +02:00
RosenTomov	c46d37d876	test(tool_execution): stop two tests leaking src.tool_execution into the suite (#2686 ) * Make in-venv pip-fallback test independent of the runner's environment test_pip_install_fallback_chain_propagates_failure_in_venv simulated the in-venv case by probing the real interpreter (sys.prefix != sys.base_prefix). That assumes the test runner is itself inside a venv. CI runs pytest with no venv, so venv_check reported not-in-venv, the negated guard flipped, the --user branch fired, and the assertion failed. Make venv_check exit 0 directly to simulate the in-venv condition deterministically, mirroring the outside-venv companion test. * Stop agent-tool import shims from leaking into the admin-gate test test_function_call_non_object_args and test_unknown_tool_calls stub heavy DB/auth deps at import time to load the real agent-tool stack, but they popped src.tool_execution and left core.auth stubbed without restoring. Popping and re-importing src.tool_execution rebinds the src package's tool_execution attribute, so test_edit_file's later 'import src.tool_execution as te' resolved to a different module object than the one execute_tool_block lives in. The monkeypatch on _owner_is_admin then missed, the non-admin edit_file gate never fired, and the edit went through (exit_code 0). Stop touching src.tool_execution and restore the heavy stubs after import. Verified the full suite is green on Linux (Python 3.11, matching CI). --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 16:35:10 +01:00
Alexandre Teixeira	d4ab09e8e1	test: add focused test selection runner (#3556 )	2026-06-09 17:03:47 +02:00
Sheikh Rahat Mahmud	9180847c0e	feat(diagnostics): add consolidated service health endpoint for degraded-state reporting (#964 ) * Add consolidated service health endpoint for degraded-state reporting ROADMAP (High Priority) asks for "Better degraded-state reporting for ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no single readout of which subsystems are actually working: /api/health is only a liveness ping and each subsystem's signal lives in a different module, so a misconfigured self-host install gives no consolidated picture. This adds an admin-only GET /api/diagnostics/services endpoint backed by a new src/service_health.py aggregator. Each subsystem reports a uniform {name, status, detail, meta} where status is ok \| degraded \| down \| disabled, and the response rolls up an overall verdict (worst non-disabled status). Probes are deliberately non-intrusive and safe to poll: - ChromaDB: reads the .healthy flags on the RAG and memory vector stores. - SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No search query is run. - ntfy: GET the server's built-in /v1/health. No test notification is sent. - email: short IMAP connect+logout per configured account (no credentials in meta). - providers: probe each enabled ModelEndpoint's model list (no api_key in meta). Probe functions take their inputs as parameters and isolate the network call to injectable callables, so they unit-test without touching the network (same pattern as the merged provider-endpoint tests). Network probes run concurrently off the event loop via asyncio.to_thread with bounded per-probe timeouts. memory_vector is now passed into setup_diagnostics_routes (new optional param, backward-compatible) so ChromaDB's vector-memory store can be reported too. Tests: tests/test_service_health.py — 29 tests covering every status mapping per subsystem, the overall rollup, and that no secrets leak into meta. Verification: python -m pytest tests/test_service_health.py -q # 29 passed python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q Backend + tests only; an Admin/Settings UI badge that renders this endpoint is a natural follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(diagnostics): bound service-health wall-clock and redact secrets Addresses review on #964. Blocker 1 — genuinely bounded wall-clock: - providers_health and email_health now fan out per-item probes across a bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET), instead of probing endpoints/accounts sequentially. Stragglers are reported as a controlled `timeout` and never block; the pool is shut down with wait=False so the response returns on time regardless of endpoint/account count. - The IMAP connect path now honors the service-health budget: _imap_connect gained a pass-through `timeout` param and the probe calls it with _PROBE_TIMEOUT instead of the default 15s. - collect_service_health runs the four network subsystems concurrently, each under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop. Blocker 2 — no secret/raw-error leakage in the response: - _safe_url strips userinfo, query, and fragment from every URL surfaced in meta (searxng instance, ntfy base, provider name fallback), keeping only scheme/host/port/path. - _classify_error maps every probe failure to a controlled category token (timeout, connection_refused, dns_error, tls_error, network_error, http_error, auth_or_protocol_error, …) — raw str(exception), which can embed credentialed URLs or server text, is never returned. Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py): - URL userinfo/query redaction for searxng/ntfy/providers. - secret-bearing exception strings map to categories and don't leak. - multiple slow providers/accounts stay bounded (single + 25-endpoint cases). - subsystems run concurrently; aggregate deadline yields a controlled result. - route-level unauthenticated (401) / non-admin (403) / admin (200) coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(diagnostics): isolate route tests so they don't leak module globals The new route tests replaced src.service_health.collect_service_health and routes.diagnostics_routes.require_admin via direct assignment, which persisted for the rest of the pytest session. In CI's full alphabetical run that fake collector (returning services=[]) leaked into the later collect_service_health tests and failed them. Switch to monkeypatch.setattr so both are restored after each test. No production code change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 16:00:24 +01:00
Maanas	c1674fc2aa	refactor(tools): migrate execution logic to src/agent_tools/ package with handler registry (#3435 ) * refactor(tools): implement strict cohesive class coordinator pattern per #2917 * test: update edit_file tests to use EditFileTool class * fix(tools): restore tool_policy param and security backstop in coordinator * refactor(tools): migrate domain tools to agent_tools package per #2917 * test: update test imports for new agent_tools package * fix: resolve circular import between tool_execution and agent_tools * fix: remove leftover git conflict markers * fix(tools): resolve pytest failure and document _apply method * fix(tools): clean up whitespace and remove dead _tool_python helper --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 14:35:36 +01:00
Joshua Valderrama	35b4dd2824	fix: session context drifting — messages leaking between chats (#135 ) (#267 ) * docs: add implementation plan for fixing chat context drifting (#135) * fix: make Session.history immutable + fix {}.history crash - Session.history now exposes a COPY of the internal _history list - add_message() replaces history with a fresh copy each time - get_context_messages() derives from _history directly - replace_messages() updates both _history and history - truncate_messages() updates both _history and history - _persist_message() line 207: fixed {}.history fallback crash - Added 11 tests for session isolation and edge cases Addresses #135 root cause #1: shared mutable references * fix: task scheduler uses SessionManager methods instead of overwriting sessions - Added ensure_task_session() to SessionManager (checks cache first) - Task scheduler now uses ensure_task_session() instead of direct dict assignment - Task scheduler now uses SessionManager.add_message() for message persistence - Removed direct sess_obj.history.append() that was silently losing data Addresses #135 root causes #2 and #3 * fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old Prevents the cleanup task from deleting sessions that were just created and haven't received any messages yet (message_count == 0). Addresses #135 root cause #5 * test: comprehensive session isolation tests (10/10 passing) * refactor: consolidate _session_manager into singleton pattern - Added set_session_manager_instance / get_session_manager_instance to core/models - kept backward-compat aliases (set_session_manager, get_session_manager) - session_manager.py re-exports the singleton functions - ai_interaction.set_session_manager now syncs with the core singleton - context_compactor uses get_session_manager_instance() instead of getattr hack - app.py initializes the singleton once Addresses #135 root cause #4: fragile global wiring * test: add concurrent session isolation integration tests Verifies: - Concurrent add_message to different sessions doesn't cross-contaminate - Rapid parallel writes maintain isolation - Read-write concurrent access is safe All 3 async tests pass, proving the immutable history fix works under concurrency * fix: pre-import core.models in conftest to prevent test pollution test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at module level during collection. Any test collected after it imports Session as a MagicMock. Pre-importing core.models in conftest.py before test_agent_loop.py's module-level code runs prevents this. * fix: make .history authoritative mutable list, address PR review Per review feedback: keep .history as the authoritative mutable list so existing code doing .history.pop(), .history = [...], etc. still works. Fix the cross-contamination bug by ensuring __post_init__() gives each Session its OWN unique history list (never shared). Changes: - core/models.py: .history IS the authoritative list. _history aliases it. Each Session gets its own list in __post_init__. - core/session_manager.py: add_message() delegates to Session.add_message() instead of appending directly — no double-append, single source of truth. - tests/test_session_manager.py: updated test to reflect that .history references see new messages (same list, not a snapshot). - docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for shipping — useful design context but too much process/doc to ship). All 272 tests pass (3 pre-existing failures unrelated). * Fix session manager message persistence * Fix session history alias regressions * Fix session history aliasing and task delivery	2026-06-09 14:12:52 +01:00
Maruf Hasan	c3fcaf15b7	feat(providers): add NVIDIA AI provider endpoint support (#3456 ) * feat: add NVIDIA as an AI provider (integrate.api.nvidia.com) * feat: add NVIDIA option to provider settings dropdown and aliases * test: add NVIDIA provider detection and endpoint tests * Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering - nvidia.com -> 'nvidia' curated key for proper provider routing - _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed - _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip, kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse, -embedqa, -nemoretriever * Expand non-chat model filtering for NVIDIA embedding/guard/video models Add _NON_CHAT_PREFIXES: embed, recurrent Add _NON_CHAT_CONTAINS: topic-control, guard, calibration, ai-synthetic-video, cosmos-reason2 Catches remaining unfiltered non-chat models from NVIDIA catalog: embedding (llama-nemotron-embed, embed-qa), guard (llama-guard, nemoguard-topic-control), calibration (ising-calibration), video (ai-synthetic-video-detector, cosmos-reason2), recurrent (recurrentgemma-2b) * Filter non-chat models in _probe_endpoint via _is_chat_model() Previously _is_chat_model() was only used in the per-model probe and _first_chat_model(), so non-chat models still appeared in the model picker even though they were filtered in those specific paths. Applying the filter at _probe_endpoint() return ensures non-chat models (embeddings, safety guards, reward, calibration, video detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never enter cached_models and never appear in the picker. * Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models Prefix checks (mid.startswith) miss models with org prefixes like baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc. Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught regardless of the org prefix. Adds: embed, bge, recurrent, starcoder, gemma-2b * fix(model-routes): drop collision-prone substrings from global non-chat filter The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES and _NON_CHAT_CONTAINS tuples. These are intended to filter out embedding, retrieval, safety, and vision models from NVIDIA's catalog that are not chat-completions-capable. However, four of the added substrings collide with legitimate chat models served by other providers: - gemma-2b matches google/gemma-2b-it (instruct chat model) - starcoder matches bigcode/starcoder2-15b (code completion model) - recurrent matches google/recurrentgemma-2b (language model) - guard matches meta-llama/Llama-Guard-3-8B (safety classifier) Removing these four from the global tuples keeps the NVIDIA-specific filtering intact (safety, embedding, retrieval, and vision models are still caught by other tokens such as content-safety, -safety, -reward, embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while preventing false negatives for instruct/code models on other providers. Tests added for gemma-2b-it, google/gemma-2b-it, and bigcode/starcoder2-15b-instruct asserting they are recognized as chat models. Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS entries redundant since the prefix check runs first. Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * style: fix indentation of groq and xai test cases in test_provider_endpoints.py --------- Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>	2026-06-09 11:06:12 +02:00
Mazen Tamer Salah	3c4ec8828b	fix(embeddings): survive numpy embeddings when restoring a reset lane (#3410 ) When a lane reset fails to rewrite the recreated collection, the recovery path re-adds the preserved rows. It read the embeddings with `preserved.get("embeddings") or []` and gated the loop with `if ids and docs and old_embeddings:`. chromadb returns embeddings as a numpy ndarray, whose truth value is ambiguous, so both expressions raise ValueError inside the except block — the restore is abandoned and every preserved row is lost (the collection was already deleted), exactly when the code is trying to avoid data loss. Use an explicit `is None` check and `len(...)`, and convert ndarray batches to lists before re-adding. Adds tests/test_embedding_lane_ndarray_restore.py (preserved embeddings come back as np.ndarray); existing test_embedding_lanes.py still passes.	2026-06-09 10:40:17 +02:00
Ashvin	2fdb4813db	fix(auth): sync file-backed and in-memory owner caches on user rename (#3397 ) The DB owner-rename loop in rename_user patched every SQL column named owner, but three non-SQL stores were left behind: 1. session_manager.sessions -- in-memory Session objects carry s.owner set at server-boot time. get_sessions_for_user() does an exact s.owner == username check, so the renamed user chat sidebar goes empty until a server restart. 2. data/deep_research/.json -- each completed research report is a standalone JSON file with an owner field. research_routes filters by d.get(owner) == user, making every report invisible to the renamed user. 3. data/memory.json -- a flat JSON array; each entry carries an owner field. memory_manager.load(owner=user) filters on it, so all memories vanish from the memory panel. Fix: after the SQL loop, patch all three: - iterate sm.sessions and update owner in-place (exposed via app.state) - walk data/deep_research/.json and rewrite owner with atomic_write_json - update matching entries in memory.json with atomic_write_json All three use the same case-insensitive lower() comparison the SQL loop already uses. Each step is independently wrapped so a single failure does not abort the others or the rename itself. Fixes #3362	2026-06-09 10:19:45 +02:00
nubs	f1cda91683	fix(agent): scope skill index to owner (#2404 ) Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>	2026-06-09 09:51:29 +02:00
Kenny Van de Maele	0aba00f4cf	refactor(tools): remove dead workspace-confinement plumbing (#3590 ) Commit `e6b1009` removed the workspace feature's entry point (deleted routes/workspace_routes.py + static/js/workspace.js and dropped the workspace-param parsing in chat_routes), but left the downstream backend plumbing dangling: chat_routes passed a hardcoded workspace=None into stream_agent_loop, which forwarded it to execute_tool_block, so the workspace value was permanently None and every workspace-gated branch was unreachable. Remove the now-dead code (no behavior change, since workspace was always None): - src/tool_execution.py: drop _resolve_tool_path_in_workspace and the workspace params/branches on execute_tool_block, _direct_fallback, _call_mcp_tool, _do_edit_file, and _resolve_search_root; restore the bash/python/bg cwd to _AGENT_WORKDIR. - src/agent_loop.py: drop the workspace param on stream_agent_loop, the dead 'ACTIVE WORKSPACE' system-prompt block, and the workspace forward. - routes/chat_routes.py: drop the hardcoded workspace=None arg and var. - tests: delete test_workspace_confine.py (tested the removed feature) and the workspace assertion in test_tool_policy.py. Full suite: 2903 passed, 1 skipped.	2026-06-09 08:30:50 +02:00
Afonso Coutinho	fbed9027b0	fix: backup import dropping a user's skill on cross-tenant title/id collision (#2057 ) * Fix backup import dropping a user's skill on cross-tenant title/id collision The skills block of import_data deduped incoming skills against skills_manager.load_all(), which returns EVERY tenant's skills. So when a user imports their own backup, any skill whose id or title collides with another user's skill was silently skipped — the importing user lost their own data. This is the same cross-tenant bug already fixed for the memories block just above (#1743); the skills block was left with the old pattern. Filter the dedup sets to the importing user's own skills (owner == user); the full store is still saved back, preserving other users' skills. * Restore sys.modules after stubbing so backup test does not break collection of later src.* test modules * Patch backup_routes auth helpers via monkeypatch instead of sys.modules stubs so the test is import-order robust * Give FakeSkillsManager an add_skill method matching the disk-backed skills API	2026-06-09 08:04:22 +02:00
Disorder AA	d9141c6e56	fix(cookbook): allow spaces and non-ASCII characters in model directory paths (#3473 ) * fix(cookbook): allow spaces in model directory paths Allow POSIX external-drive paths and Windows drive paths with spaces while keeping shell metacharacters rejected. * fix(cookbook): also allow non-ASCII (Unicode) characters in model dir paths The ASCII-only allowlist that rejected spaces also rejected Cyrillic, accented Latin and CJK folder names (e.g. /Volumes/Модели, D:\AI Models\Модели) with 400 Invalid local_dir. Switch the path character class from [A-Za-z0-9._ -] to [\w. -] (\w is Unicode-aware on Python 3 str patterns) so localized folder names validate, while shell metacharacters (; & \| ` $ quotes newlines) stay rejected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(cookbook): reject local_dir path segments starting with '-' The local_dir allowlist includes '-', so a directory like /models/-rf (or D:\models\-rf) could be parsed as a CLI flag by hf/etc. (option injection) — and quoting does not stop a value from being read as an option. Guard against it inside the validator so the safety stays fully self-contained there rather than depending on consumers' quoting. --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:58:38 +02:00
onemorethan0	8ae2b5f58c	fix(llm): suppress thinking mode for qwen3/gemma4 on Ollama /v1 endpoint (#3228 ) * fix(llm): suppress thinking for qwen3/gemma4 on Ollama /v1 compat endpoint When using qwen3, QwQ, gemma4, or other thinking models via Ollama's OpenAI-compatible /v1 endpoint, the model routes all output into its <think>...</think> reasoning block. Since Odysseus strips thinking content from round_response and only accumulates native tool_calls, this produces a round with 0 chars, 0 native calls, 0 tool blocks — the agent appears to silently do nothing. Root cause: Odysseus classifies the /v1 endpoint as provider="openai" (not "ollama"), so the payload is built as a standard OpenAI payload without any Ollama-specific options. Ollama's /v1 endpoint accepts "think": false as a top-level parameter to suppress extended thinking, but this was never sent. Fix: - Add _is_ollama_openai_compat_url() to detect local Ollama /v1 URLs - Inject "think": false in both stream_llm and llm_call_async for thinking models (qwen3, QwQ, gemma4, DeepSeek-R1, etc.) on this endpoint Verified with qwen3:14b on Ollama 0.24: with think=False the model correctly emits native tool_calls in a single streaming chunk and the agent executes bash/file/web tools as expected. * fix(llm): extend _is_ollama_openai_compat_url to match localhost on any port Per reviewer feedback on PR #3228: 1. Generalize host detection to mirror _is_ollama_native_url: match any localhost/127.0.0.1/0.0.0.0/::1 host (not just port 11434) so that custom OLLAMA_HOST ports and container remaps are also covered. 2. Add tests/test_llm_core_ollama_thinking.py covering: - _is_ollama_openai_compat_url for all positive/negative URL cases including IPv6, non-default port, native /api path, and real OpenAI - Payload injection: think:false set for Ollama /v1 thinking model, not set for non-thinking model, not set for real OpenAI endpoint, and set for localhost on a non-default port (the new case)	2026-06-09 07:35:15 +02:00
pewdiepie-archdaemon	1a529d63d9	Fix remaining CI regressions	2026-06-09 10:21:56 +09:00
Ocean Bennett	db1bbfe588	fix(sessions): keep fresh chats during auto tidy (#1871 ) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 01:06:20 +01:00
Kenny Van de Maele	2404b00f18	refactor(uploads): centralize upload byte-limits in upload_limits.py (#3364 ) (#3518 ) Move every per-route upload byte-limit into src/upload_limits.py as a validated, env-overridable constant via read_byte_limit_env: - Add GALLERY_UPLOAD_MAX_BYTES, GALLERY_TRANSFORM_UPLOAD_MAX_BYTES, MEMORY_IMPORT_MAX_BYTES, PERSONAL_UPLOAD_MAX_BYTES, EMAIL_COMPOSE_UPLOAD_MAX_BYTES, STT_MAX_AUDIO_BYTES, ICS_MAX_BYTES. - Routes import their constant instead of defining it locally: replaces 4 raw int(os.getenv(...)) and removes 3 hardcoded literals. - The 3 previously-hardcoded limits (email compose, STT audio, calendar ICS) are now env-overridable with the same ODYSSEUS_*_MAX_BYTES naming. - Defaults unchanged, so behavior is unchanged unless an env var is set; an invalid value now fails fast with a clear message instead of a bare int() ValueError. - Document all env vars in .env.example and the README. Fixes #3364	2026-06-09 01:24:30 +02:00
Alexandre Teixeira	a240f28af9	test(taxonomy): auto-mark tests by area and sub-area (#3491 )	2026-06-09 01:13:28 +02:00
Ocean Bennett	e7c1d75884	fix(models): query v1 models for llama-server endpoints (#3380 ) * fix(models): query v1 models for llama-server endpoints * test(models): accept owner kwargs in llama-server regression	2026-06-09 01:09:02 +02:00
Mateus Oliveira	f7ae85590b	refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils (#3478 ) * refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils Move all copies of _truncate(), get_mcp_manager(), and set_mcp_manager() into a single leaf module (src/tool_utils.py) that imports only from src.constants. This eliminates the lazy-import hack ('from src import agent_tools' inside function bodies) in tool_execution.py and tool_implementations.py, and fixes a latent bug: the _truncate copy in tool_execution.py was missing the isinstance guard and would crash on None. Also deletes mcp_servers/_common.py — it was dead code with zero callers anywhere in the codebase, containing its own copy of truncate() and constants that already exist in src/constants.py. * fix(tools): route remaining get_mcp_manager imports to src.tool_utils The maintainer's feedback flagged src/task_scheduler.py:1857 and routes/task_routes.py:977. A project-wide search found a third call site in src/agent_loop.py that also imported get_mcp_manager from src.agent_tools instead of src.tool_utils. All three are now sourced from the canonical location in src.tool_utils. --------- Co-authored-by: mcnoliveira <mcnoliveira@gmail.com>	2026-06-09 01:05:30 +02:00
Ocean Bennett	62ffcb6236	fix(cookbook): preserve same-host ssh profile selection (#3373 ) * fix(cookbook): preserve same-host ssh profile selection * fix(cookbook): resolve same-host ssh profiles in running tab and port lookups	2026-06-09 00:36:10 +02:00
Wes Huber	85c6056c87	test(models): add regression coverage for Z.AI coding endpoint probing (#2244 ) Add focused tests for the z.ai/api/coding path override: - _match_provider_curated: 5 tests verifying coding vs base key - _probe_endpoint: 3 tests verifying model preservation, curated append on partial response, and base-zai exclusion Rebased onto dev per reviewer request. Fixes #2230 Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-08 23:07:29 +01:00
Rohith Matam	049833e309	fix: skip malformed document tool call items (#3494 )	2026-06-08 23:25:31 +02:00
Cookiejunky	4e497f4878	fix(cookbook): guard break-system-packages pip flag (#3510 )	2026-06-08 23:10:20 +02:00
Lucas Daniel	5462030cde	fix(auth): per-user allowed-models checklist ignores cache, [None] doesn't block (#3355 ) Three issues combined to make the per-user 'Allowed models' checklist unreliable (#3032): 1. admin.js _loadModelsForUser fetched /api/models, which is backed by cached_models — endpoints that haven't been probed yet (e.g. a freshly-added DeepSeek API endpoint) simply didn't show up in the checklist. Switched to /api/model-endpoints, which always reflects every configured endpoint regardless of cache state. 2. _saveModels sent allowed_models: [] both when the admin clicked [All] (no restriction) and [None] (block everything) — the backend had no way to distinguish the two. 3. _enforce_chat_privileges treated an empty allowed_models list as 'no restriction' (falsy -> skip the check), so [None] had no effect. Added an explicit block_all_models privilege flag (defaulting to False, and forced to False for admins) that admin.js now sets when zero models are checked. _enforce_chat_privileges checks it first and 403s regardless of allowed_models contents.	2026-06-08 22:52:39 +02:00
Lucas Daniel	0a324f20d2	fix(agent): stop treating illustrative Markdown fences as tool calls for native function-calling models (#3356 ) * fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models _resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block parser whenever a model produced no native tool_calls, regardless of whether that model has a reliable native function-calling channel. Native models (GPT/Claude/Grok/Qwen3/DeepSeek-V, etc. - _is_api_model true) commonly write illustrative ```bash/```python/```json examples in guide-only prose; the fallback parser matched these and executed them as real commands, sometimes looping for several rounds as the model tried to clarify with more examples (#3222). Restrict the textual fenced-block fallback to non-native models, which rely on it as their only tool-invocation channel. Native models are trusted to use their structured tool_calls channel for real invocations; when they don't emit one, a bare fence in their response is prose, not an action. The native tool_calls path itself is untouched. This sits one layer below #3088's guide-only policy enforcement: that PR blocks tool exposure/execution on explicit no-tools requests, while this fixes the parser so ordinary illustrative fences are never misread as calls in the first place, on any turn. * fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence _resolve_tool_blocks previously short-circuited the entire textual parser (tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native function-calling models with no native tool_calls. That also dropped Patterns 2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content as text), which are real calls a model couldn't emit on its structured channel (e.g. DeepSeek-V falling back to DSML), not illustrative examples. parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks passes skip_fenced=is_api_model so fenced examples stop being executed for native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and recoverable. cleaned_round mirrors the same gate when persisting round text, so an illustrative fence that wasn't executed isn't stripped from saved/reloaded history either (it was streaming once and then disappearing on reload).	2026-06-08 22:25:28 +02:00
Mazen Tamer Salah	8e494cc1c4	fix(chat): keep balanced trailing ')' when extracting URLs (#3406 ) extract_urls() stripped any trailing ')' unconditionally via `re.sub(r'[.,;:!?\)]+$', '', url)`. That corrupts URLs that legitimately end in a parenthesis — most commonly Wikipedia disambiguation links like https://en.wikipedia.org/wiki/Python_(programming_language), which became ...Python_(programming_language and then 404 when fetched by the web/research tools. Strip trailing sentence punctuation as before, but only drop a ')' when it is unbalanced (more ')' than '('), so a prose-glued "(see https://example.com)" still loses its closing paren while balanced URLs keep theirs. Added tests/test_extract_urls.py covering balanced, unbalanced, nested, and trailing-punctuation cases.	2026-06-08 21:33:29 +02:00
nubs	932b7f2446	fix(email): close IMAP socket when connect/login fails (#3174 ) (#3363 ) * fix(email): close IMAP socket when connect/login fails (#3174) _imap_connect opened a live socket via _open_imap_connection and then called conn.login() with no try/finally, and _open_imap_connection called conn.starttls() unguarded. When auth fails (e.g. an Office 365 app password on an MFA-enabled tenant, #3174) or STARTTLS is rejected, the already-open socket was orphaned. Every IMAP caller funnels through _imap_connect, including the 30-minute _auto_summarize_poller, so a persistently misconfigured account leaked one descriptor per pass toward FD exhaustion. The previously merged leak fixes (#1325/#1330/#1423/#1530) only guard the post-connect body and monkeypatch _imap_connect to succeed, so this connect-time path was uncovered. Wrap login() and starttls() so a failure calls conn.shutdown() (low-level close; logout() can't run pre-auth) before re-raising. Adds two regression tests that fail without the guard. * fix(email): guard MCP IMAP+SMTP connect-time leaks too (#3174) Folds in the sibling connect-time leaks vdmkenny flagged on #3363, so the whole connect-then-step leak class is closed in one place: - mcp_servers/email_server.py::_imap_connect — guard starttls() and login(); close pre-auth with conn.shutdown() before re-raising. - mcp_servers/email_server.py::_smtp_connect — guard starttls() and login(); SMTP has no shutdown(), so close with conn.close() (socket close, no QUIT). Routes SMTP (_send_smtp_message) is already safe via 'with smtplib.SMTP(...)'. Adds four regression tests (one per guard), verified to fail without the fix.	2026-06-08 21:21:41 +02:00
Alex Little	a58f526992	fix(presets): scope expand-prompt model resolution to owner (#3477 ) * fix(presets): scope expand-prompt model resolution to owner /api/presets/expand resolved its model endpoint with no owner, so in a multi-user setup it could match another user's endpoint and use its URL and decrypted api_key. Pass effective_user(request) to _resolve_model so resolution is owner-scoped. Adds a regression test. * fix(presets): scope teacher and audit model resolution to owner Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Alex Little <alexwilliamlittle@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>	2026-06-08 21:12:02 +02:00
Mazen Tamer Salah	5198516979	fix(sessions): copy message metadata when forking a session (#3409 ) fork_session passed each source message's metadata dict by reference into the new session. add_message() -> _persist_message() stamps _db_id (and timestamp) onto that dict in place, so persisting the fork overwrote the SOURCE messages' _db_id with the forked rows' ids — silently breaking edit/delete-by-id on the original conversation. Copy the metadata dict per message so the fork and source no longer alias. Adds tests/test_fork_session_metadata.py asserting the source session's message metadata is unchanged after a fork.	2026-06-08 20:49:15 +02:00

1 2 3 4 5 ...

700 Commits