odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-16 01:35:36 -04:00

Author	SHA1	Message	Date
Alexandre Teixeira	a22c0fa85e	test: pilot core database stub helper (#3685)	2026-06-09 22:23:33 +02:00
arnodecorte	38dc9a0a41	Allow cookbook scopes for API tokens (#3090) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 21:03:40 +01:00
Rohith Matam	fbd8ee9033	fix: fall back for npx cache subprocess check (#3560) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 20:41:23 +01:00
RaresKeY	5d33393a28	fix(gallery): fail closed for null-user owner scope (#3613)	2026-06-09 20:20:21 +02:00
Alexandre Teixeira	cdfda4bd16	test: add fast lane and duration visibility (#3659)	2026-06-09 20:11:47 +02:00
Sid	9e74a327f8	fix(llm): remove max_output_tokens from ChatGPT Subscription payload (#3656 ) ChatGPT's Codex API rejects any request that includes max_output_tokens, returning HTTP 400 "Unsupported parameter: max_output_tokens". This caused Deep Research to always fail during the endpoint probe when a ChatGPT Subscription model was selected. Remove the conditional that set payload["max_output_tokens"] in _build_chatgpt_responses_payload(). The parameter is simply not sent. Also update the two affected tests: - Rename test_chatgpt_subscription_payload_uses_max_output_tokens → test_chatgpt_subscription_payload_omits_max_output_tokens - Rename test_chatgpt_subscription_payload_omits_empty_max_output_tokens → test_chatgpt_subscription_payload_omits_max_output_tokens_when_zero - Assert max_output_tokens is absent rather than present Fixes #3650	2026-06-09 17:42:12 +02:00
RosenTomov	c46d37d876	test(tool_execution): stop two tests leaking src.tool_execution into the suite (#2686 ) * Make in-venv pip-fallback test independent of the runner's environment test_pip_install_fallback_chain_propagates_failure_in_venv simulated the in-venv case by probing the real interpreter (sys.prefix != sys.base_prefix). That assumes the test runner is itself inside a venv. CI runs pytest with no venv, so venv_check reported not-in-venv, the negated guard flipped, the --user branch fired, and the assertion failed. Make venv_check exit 0 directly to simulate the in-venv condition deterministically, mirroring the outside-venv companion test. * Stop agent-tool import shims from leaking into the admin-gate test test_function_call_non_object_args and test_unknown_tool_calls stub heavy DB/auth deps at import time to load the real agent-tool stack, but they popped src.tool_execution and left core.auth stubbed without restoring. Popping and re-importing src.tool_execution rebinds the src package's tool_execution attribute, so test_edit_file's later 'import src.tool_execution as te' resolved to a different module object than the one execute_tool_block lives in. The monkeypatch on _owner_is_admin then missed, the non-admin edit_file gate never fired, and the edit went through (exit_code 0). Stop touching src.tool_execution and restore the heavy stubs after import. Verified the full suite is green on Linux (Python 3.11, matching CI). --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 16:35:10 +01:00
Alexandre Teixeira	d4ab09e8e1	test: add focused test selection runner (#3556)	2026-06-09 17:03:47 +02:00
Sheikh Rahat Mahmud	9180847c0e	feat(diagnostics): add consolidated service health endpoint for degraded-state reporting (#964 ) * Add consolidated service health endpoint for degraded-state reporting ROADMAP (High Priority) asks for "Better degraded-state reporting for ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no single readout of which subsystems are actually working: /api/health is only a liveness ping and each subsystem's signal lives in a different module, so a misconfigured self-host install gives no consolidated picture. This adds an admin-only GET /api/diagnostics/services endpoint backed by a new src/service_health.py aggregator. Each subsystem reports a uniform {name, status, detail, meta} where status is ok \| degraded \| down \| disabled, and the response rolls up an overall verdict (worst non-disabled status). Probes are deliberately non-intrusive and safe to poll: - ChromaDB: reads the .healthy flags on the RAG and memory vector stores. - SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No search query is run. - ntfy: GET the server's built-in /v1/health. No test notification is sent. - email: short IMAP connect+logout per configured account (no credentials in meta). - providers: probe each enabled ModelEndpoint's model list (no api_key in meta). Probe functions take their inputs as parameters and isolate the network call to injectable callables, so they unit-test without touching the network (same pattern as the merged provider-endpoint tests). Network probes run concurrently off the event loop via asyncio.to_thread with bounded per-probe timeouts. memory_vector is now passed into setup_diagnostics_routes (new optional param, backward-compatible) so ChromaDB's vector-memory store can be reported too. Tests: tests/test_service_health.py — 29 tests covering every status mapping per subsystem, the overall rollup, and that no secrets leak into meta. 
Verification: python -m pytest tests/test_service_health.py -q # 29 passed python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q Backend + tests only; an Admin/Settings UI badge that renders this endpoint is a natural follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(diagnostics): bound service-health wall-clock and redact secrets Addresses review on #964. Blocker 1 — genuinely bounded wall-clock: - providers_health and email_health now fan out per-item probes across a bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET), instead of probing endpoints/accounts sequentially. Stragglers are reported as a controlled `timeout` and never block; the pool is shut down with wait=False so the response returns on time regardless of endpoint/account count. - The IMAP connect path now honors the service-health budget: _imap_connect gained a pass-through `timeout` param and the probe calls it with _PROBE_TIMEOUT instead of the default 15s. - collect_service_health runs the four network subsystems concurrently, each under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop. Blocker 2 — no secret/raw-error leakage in the response: - _safe_url strips userinfo, query, and fragment from every URL surfaced in meta (searxng instance, ntfy base, provider name fallback), keeping only scheme/host/port/path. - _classify_error maps every probe failure to a controlled category token (timeout, connection_refused, dns_error, tls_error, network_error, http_error, auth_or_protocol_error, …) — raw str(exception), which can embed credentialed URLs or server text, is never returned. Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py): - URL userinfo/query redaction for searxng/ntfy/providers. 
- secret-bearing exception strings map to categories and don't leak. - multiple slow providers/accounts stay bounded (single + 25-endpoint cases). - subsystems run concurrently; aggregate deadline yields a controlled result. - route-level unauthenticated (401) / non-admin (403) / admin (200) coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(diagnostics): isolate route tests so they don't leak module globals The new route tests replaced src.service_health.collect_service_health and routes.diagnostics_routes.require_admin via direct assignment, which persisted for the rest of the pytest session. In CI's full alphabetical run that fake collector (returning services=[]) leaked into the later collect_service_health tests and failed them. Switch to monkeypatch.setattr so both are restored after each test. No production code change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 16:00:24 +01:00
Maanas	c1674fc2aa	refactor(tools): migrate execution logic to src/agent_tools/ package with handler registry (#3435 ) * refactor(tools): implement strict cohesive class coordinator pattern per #2917 * test: update edit_file tests to use EditFileTool class * fix(tools): restore tool_policy param and security backstop in coordinator * refactor(tools): migrate domain tools to agent_tools package per #2917 * test: update test imports for new agent_tools package * fix: resolve circular import between tool_execution and agent_tools * fix: remove leftover git conflict markers * fix(tools): resolve pytest failure and document _apply method * fix(tools): clean up whitespace and remove dead _tool_python helper --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 14:35:36 +01:00
Joshua Valderrama	35b4dd2824	fix: session context drifting — messages leaking between chats (#135 ) (#267 ) * docs: add implementation plan for fixing chat context drifting (#135) * fix: make Session.history immutable + fix {}.history crash - Session.history now exposes a COPY of the internal _history list - add_message() replaces history with a fresh copy each time - get_context_messages() derives from _history directly - replace_messages() updates both _history and history - truncate_messages() updates both _history and history - _persist_message() line 207: fixed {}.history fallback crash - Added 11 tests for session isolation and edge cases Addresses #135 root cause #1: shared mutable references * fix: task scheduler uses SessionManager methods instead of overwriting sessions - Added ensure_task_session() to SessionManager (checks cache first) - Task scheduler now uses ensure_task_session() instead of direct dict assignment - Task scheduler now uses SessionManager.add_message() for message persistence - Removed direct sess_obj.history.append() that was silently losing data Addresses #135 root causes #2 and #3 * fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old Prevents the cleanup task from deleting sessions that were just created and haven't received any messages yet (message_count == 0). 
Addresses #135 root cause #5 * test: comprehensive session isolation tests (10/10 passing) * refactor: consolidate _session_manager into singleton pattern - Added set_session_manager_instance / get_session_manager_instance to core/models - kept backward-compat aliases (set_session_manager, get_session_manager) - session_manager.py re-exports the singleton functions - ai_interaction.set_session_manager now syncs with the core singleton - context_compactor uses get_session_manager_instance() instead of getattr hack - app.py initializes the singleton once Addresses #135 root cause #4: fragile global wiring * test: add concurrent session isolation integration tests Verifies: - Concurrent add_message to different sessions doesn't cross-contaminate - Rapid parallel writes maintain isolation - Read-write concurrent access is safe All 3 async tests pass, proving the immutable history fix works under concurrency * fix: pre-import core.models in conftest to prevent test pollution test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at module level during collection. Any test collected after it imports Session as a MagicMock. Pre-importing core.models in conftest.py before test_agent_loop.py's module-level code runs prevents this. * fix: make .history authoritative mutable list, address PR review Per review feedback: keep .history as the authoritative mutable list so existing code doing .history.pop(), .history = [...], etc. still works. Fix the cross-contamination bug by ensuring __post_init__() gives each Session its OWN unique history list (never shared). Changes: - core/models.py: .history IS the authoritative list. _history aliases it. Each Session gets its own list in __post_init__. - core/session_manager.py: add_message() delegates to Session.add_message() instead of appending directly — no double-append, single source of truth. - tests/test_session_manager.py: updated test to reflect that .history references see new messages (same list, not a snapshot). 
- docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for shipping — useful design context but too much process/doc to ship). All 272 tests pass (3 pre-existing failures unrelated). * Fix session manager message persistence * Fix session history alias regressions * Fix session history aliasing and task delivery	2026-06-09 14:12:52 +01:00
Maruf Hasan	c3fcaf15b7	feat(providers): add NVIDIA AI provider endpoint support (#3456 ) * feat: add NVIDIA as an AI provider (integrate.api.nvidia.com) * feat: add NVIDIA option to provider settings dropdown and aliases * test: add NVIDIA provider detection and endpoint tests * Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering - nvidia.com -> 'nvidia' curated key for proper provider routing - _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed - _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip, kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse, -embedqa, -nemoretriever * Expand non-chat model filtering for NVIDIA embedding/guard/video models Add _NON_CHAT_PREFIXES: embed, recurrent Add _NON_CHAT_CONTAINS: topic-control, guard, calibration, ai-synthetic-video, cosmos-reason2 Catches remaining unfiltered non-chat models from NVIDIA catalog: embedding (llama-nemotron-embed, embed-qa), guard (llama-guard, nemoguard-topic-control), calibration (ising-calibration), video (ai-synthetic-video-detector, cosmos-reason2), recurrent (recurrentgemma-2b) * Filter non-chat models in _probe_endpoint via _is_chat_model() Previously _is_chat_model() was only used in the per-model probe and _first_chat_model(), so non-chat models still appeared in the model picker even though they were filtered in those specific paths. Applying the filter at _probe_endpoint() return ensures non-chat models (embeddings, safety guards, reward, calibration, video detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never enter cached_models and never appear in the picker. * Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models Prefix checks (mid.startswith) miss models with org prefixes like baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc. Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught regardless of the org prefix. 
Adds: embed, bge, recurrent, starcoder, gemma-2b * fix(model-routes): drop collision-prone substrings from global non-chat filter The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES and _NON_CHAT_CONTAINS tuples. These are intended to filter out embedding, retrieval, safety, and vision models from NVIDIA's catalog that are not chat-completions-capable. However, four of the added substrings collide with legitimate chat models served by other providers: - gemma-2b matches google/gemma-2b-it (instruct chat model) - starcoder matches bigcode/starcoder2-15b (code completion model) - recurrent matches google/recurrentgemma-2b (language model) - guard matches meta-llama/Llama-Guard-3-8B (safety classifier) Removing these four from the global tuples keeps the NVIDIA-specific filtering intact (safety, embedding, retrieval, and vision models are still caught by other tokens such as content-safety, -safety, -reward, embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while preventing false negatives for instruct/code models on other providers. Tests added for gemma-2b-it, google/gemma-2b-it, and bigcode/starcoder2-15b-instruct asserting they are recognized as chat models. Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS entries redundant since the prefix check runs first. Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * style: fix indentation of groq and xai test cases in test_provider_endpoints.py --------- Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>	2026-06-09 11:06:12 +02:00
Mazen Tamer Salah	3c4ec8828b	fix(embeddings): survive numpy embeddings when restoring a reset lane (#3410 ) When a lane reset fails to rewrite the recreated collection, the recovery path re-adds the preserved rows. It read the embeddings with `preserved.get("embeddings") or []` and gated the loop with `if ids and docs and old_embeddings:`. chromadb returns embeddings as a numpy ndarray, whose truth value is ambiguous, so both expressions raise ValueError inside the except block — the restore is abandoned and every preserved row is lost (the collection was already deleted), exactly when the code is trying to avoid data loss. Use an explicit `is None` check and `len(...)`, and convert ndarray batches to lists before re-adding. Adds tests/test_embedding_lane_ndarray_restore.py (preserved embeddings come back as np.ndarray); existing test_embedding_lanes.py still passes.	2026-06-09 10:40:17 +02:00
Ashvin	2fdb4813db	fix(auth): sync file-backed and in-memory owner caches on user rename (#3397 ) The DB owner-rename loop in rename_user patched every SQL column named owner, but three non-SQL stores were left behind: 1. session_manager.sessions -- in-memory Session objects carry s.owner set at server-boot time. get_sessions_for_user() does an exact s.owner == username check, so the renamed user chat sidebar goes empty until a server restart. 2. data/deep_research/.json -- each completed research report is a standalone JSON file with an owner field. research_routes filters by d.get(owner) == user, making every report invisible to the renamed user. 3. data/memory.json -- a flat JSON array; each entry carries an owner field. memory_manager.load(owner=user) filters on it, so all memories vanish from the memory panel. Fix: after the SQL loop, patch all three: - iterate sm.sessions and update owner in-place (exposed via app.state) - walk data/deep_research/.json and rewrite owner with atomic_write_json - update matching entries in memory.json with atomic_write_json All three use the same case-insensitive lower() comparison the SQL loop already uses. Each step is independently wrapped so a single failure does not abort the others or the rename itself. Fixes #3362	2026-06-09 10:19:45 +02:00
nubs	f1cda91683	fix(agent): scope skill index to owner (#2404) Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>	2026-06-09 09:51:29 +02:00
Kenny Van de Maele	0aba00f4cf	refactor(tools): remove dead workspace-confinement plumbing (#3590 ) Commit `e6b1009` removed the workspace feature's entry point (deleted routes/workspace_routes.py + static/js/workspace.js and dropped the workspace-param parsing in chat_routes), but left the downstream backend plumbing dangling: chat_routes passed a hardcoded workspace=None into stream_agent_loop, which forwarded it to execute_tool_block, so the workspace value was permanently None and every workspace-gated branch was unreachable. Remove the now-dead code (no behavior change, since workspace was always None): - src/tool_execution.py: drop _resolve_tool_path_in_workspace and the workspace params/branches on execute_tool_block, _direct_fallback, _call_mcp_tool, _do_edit_file, and _resolve_search_root; restore the bash/python/bg cwd to _AGENT_WORKDIR. - src/agent_loop.py: drop the workspace param on stream_agent_loop, the dead 'ACTIVE WORKSPACE' system-prompt block, and the workspace forward. - routes/chat_routes.py: drop the hardcoded workspace=None arg and var. - tests: delete test_workspace_confine.py (tested the removed feature) and the workspace assertion in test_tool_policy.py. Full suite: 2903 passed, 1 skipped.	2026-06-09 08:30:50 +02:00
Afonso Coutinho	fbed9027b0	fix: backup import dropping a user's skill on cross-tenant title/id collision (#2057 ) * Fix backup import dropping a user's skill on cross-tenant title/id collision The skills block of import_data deduped incoming skills against skills_manager.load_all(), which returns EVERY tenant's skills. So when a user imports their own backup, any skill whose id or title collides with another user's skill was silently skipped — the importing user lost their own data. This is the same cross-tenant bug already fixed for the memories block just above (#1743); the skills block was left with the old pattern. Filter the dedup sets to the importing user's own skills (owner == user); the full store is still saved back, preserving other users' skills. * Restore sys.modules after stubbing so backup test does not break collection of later src.* test modules * Patch backup_routes auth helpers via monkeypatch instead of sys.modules stubs so the test is import-order robust * Give FakeSkillsManager an add_skill method matching the disk-backed skills API	2026-06-09 08:04:22 +02:00
Disorder AA	d9141c6e56	fix(cookbook): allow spaces and non-ASCII characters in model directory paths (#3473 ) * fix(cookbook): allow spaces in model directory paths Allow POSIX external-drive paths and Windows drive paths with spaces while keeping shell metacharacters rejected. * fix(cookbook): also allow non-ASCII (Unicode) characters in model dir paths The ASCII-only allowlist that rejected spaces also rejected Cyrillic, accented Latin and CJK folder names (e.g. /Volumes/Модели, D:\AI Models\Модели) with 400 Invalid local_dir. Switch the path character class from [A-Za-z0-9._ -] to [\w. -] (\w is Unicode-aware on Python 3 str patterns) so localized folder names validate, while shell metacharacters (; & \| ` $ quotes newlines) stay rejected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(cookbook): reject local_dir path segments starting with '-' The local_dir allowlist includes '-', so a directory like /models/-rf (or D:\models\-rf) could be parsed as a CLI flag by hf/etc. (option injection) — and quoting does not stop a value from being read as an option. Guard against it inside the validator so the safety stays fully self-contained there rather than depending on consumers' quoting. --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:58:38 +02:00
onemorethan0	8ae2b5f58c	fix(llm): suppress thinking mode for qwen3/gemma4 on Ollama /v1 endpoint (#3228 ) * fix(llm): suppress thinking for qwen3/gemma4 on Ollama /v1 compat endpoint When using qwen3, QwQ, gemma4, or other thinking models via Ollama's OpenAI-compatible /v1 endpoint, the model routes all output into its <think>...</think> reasoning block. Since Odysseus strips thinking content from round_response and only accumulates native tool_calls, this produces a round with 0 chars, 0 native calls, 0 tool blocks — the agent appears to silently do nothing. Root cause: Odysseus classifies the /v1 endpoint as provider="openai" (not "ollama"), so the payload is built as a standard OpenAI payload without any Ollama-specific options. Ollama's /v1 endpoint accepts "think": false as a top-level parameter to suppress extended thinking, but this was never sent. Fix: - Add _is_ollama_openai_compat_url() to detect local Ollama /v1 URLs - Inject "think": false in both stream_llm and llm_call_async for thinking models (qwen3, QwQ, gemma4, DeepSeek-R1, etc.) on this endpoint Verified with qwen3:14b on Ollama 0.24: with think=False the model correctly emits native tool_calls in a single streaming chunk and the agent executes bash/file/web tools as expected. * fix(llm): extend _is_ollama_openai_compat_url to match localhost on any port Per reviewer feedback on PR #3228: 1. Generalize host detection to mirror _is_ollama_native_url: match any localhost/127.0.0.1/0.0.0.0/::1 host (not just port 11434) so that custom OLLAMA_HOST ports and container remaps are also covered. 2. Add tests/test_llm_core_ollama_thinking.py covering: - _is_ollama_openai_compat_url for all positive/negative URL cases including IPv6, non-default port, native /api path, and real OpenAI - Payload injection: think:false set for Ollama /v1 thinking model, not set for non-thinking model, not set for real OpenAI endpoint, and set for localhost on a non-default port (the new case)	2026-06-09 07:35:15 +02:00
pewdiepie-archdaemon	1a529d63d9	Fix remaining CI regressions	2026-06-09 10:21:56 +09:00
Ocean Bennett	db1bbfe588	fix(sessions): keep fresh chats during auto tidy (#1871) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 01:06:20 +01:00
Kenny Van de Maele	2404b00f18	refactor(uploads): centralize upload byte-limits in upload_limits.py (#3364 ) (#3518 ) Move every per-route upload byte-limit into src/upload_limits.py as a validated, env-overridable constant via read_byte_limit_env: - Add GALLERY_UPLOAD_MAX_BYTES, GALLERY_TRANSFORM_UPLOAD_MAX_BYTES, MEMORY_IMPORT_MAX_BYTES, PERSONAL_UPLOAD_MAX_BYTES, EMAIL_COMPOSE_UPLOAD_MAX_BYTES, STT_MAX_AUDIO_BYTES, ICS_MAX_BYTES. - Routes import their constant instead of defining it locally: replaces 4 raw int(os.getenv(...)) and removes 3 hardcoded literals. - The 3 previously-hardcoded limits (email compose, STT audio, calendar ICS) are now env-overridable with the same ODYSSEUS_*_MAX_BYTES naming. - Defaults unchanged, so behavior is unchanged unless an env var is set; an invalid value now fails fast with a clear message instead of a bare int() ValueError. - Document all env vars in .env.example and the README. Fixes #3364	2026-06-09 01:24:30 +02:00
Alexandre Teixeira	a240f28af9	test(taxonomy): auto-mark tests by area and sub-area (#3491)	2026-06-09 01:13:28 +02:00
Ocean Bennett	e7c1d75884	fix(models): query v1 models for llama-server endpoints (#3380) * fix(models): query v1 models for llama-server endpoints * test(models): accept owner kwargs in llama-server regression	2026-06-09 01:09:02 +02:00
Mateus Oliveira	f7ae85590b	refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils (#3478 ) * refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils Move all copies of _truncate(), get_mcp_manager(), and set_mcp_manager() into a single leaf module (src/tool_utils.py) that imports only from src.constants. This eliminates the lazy-import hack ('from src import agent_tools' inside function bodies) in tool_execution.py and tool_implementations.py, and fixes a latent bug: the _truncate copy in tool_execution.py was missing the isinstance guard and would crash on None. Also deletes mcp_servers/_common.py — it was dead code with zero callers anywhere in the codebase, containing its own copy of truncate() and constants that already exist in src/constants.py. * fix(tools): route remaining get_mcp_manager imports to src.tool_utils The maintainer's feedback flagged src/task_scheduler.py:1857 and routes/task_routes.py:977. A project-wide search found a third call site in src/agent_loop.py that also imported get_mcp_manager from src.agent_tools instead of src.tool_utils. All three are now sourced from the canonical location in src.tool_utils. --------- Co-authored-by: mcnoliveira <mcnoliveira@gmail.com>	2026-06-09 01:05:30 +02:00
Ocean Bennett	62ffcb6236	fix(cookbook): preserve same-host ssh profile selection (#3373) * fix(cookbook): preserve same-host ssh profile selection * fix(cookbook): resolve same-host ssh profiles in running tab and port lookups	2026-06-09 00:36:10 +02:00
Wes Huber	85c6056c87	test(models): add regression coverage for Z.AI coding endpoint probing (#2244 ) Add focused tests for the z.ai/api/coding path override: - _match_provider_curated: 5 tests verifying coding vs base key - _probe_endpoint: 3 tests verifying model preservation, curated append on partial response, and base-zai exclusion Rebased onto dev per reviewer request. Fixes #2230 Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-08 23:07:29 +01:00
Rohith Matam	049833e309	fix: skip malformed document tool call items (#3494)	2026-06-08 23:25:31 +02:00
Cookiejunky	4e497f4878	fix(cookbook): guard break-system-packages pip flag (#3510)	2026-06-08 23:10:20 +02:00
Lucas Daniel	5462030cde	fix(auth): per-user allowed-models checklist ignores cache, [None] doesn't block (#3355 ) Three issues combined to make the per-user 'Allowed models' checklist unreliable (#3032): 1. admin.js _loadModelsForUser fetched /api/models, which is backed by cached_models — endpoints that haven't been probed yet (e.g. a freshly-added DeepSeek API endpoint) simply didn't show up in the checklist. Switched to /api/model-endpoints, which always reflects every configured endpoint regardless of cache state. 2. _saveModels sent allowed_models: [] both when the admin clicked [All] (no restriction) and [None] (block everything) — the backend had no way to distinguish the two. 3. _enforce_chat_privileges treated an empty allowed_models list as 'no restriction' (falsy -> skip the check), so [None] had no effect. Added an explicit block_all_models privilege flag (defaulting to False, and forced to False for admins) that admin.js now sets when zero models are checked. _enforce_chat_privileges checks it first and 403s regardless of allowed_models contents.	2026-06-08 22:52:39 +02:00
Lucas Daniel	0a324f20d2	fix(agent): stop treating illustrative Markdown fences as tool calls for native function-calling models (#3356 ) * fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models _resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block parser whenever a model produced no native tool_calls, regardless of whether that model has a reliable native function-calling channel. Native models (GPT/Claude/Grok/Qwen3/DeepSeek-V, etc. - _is_api_model true) commonly write illustrative ```bash/```python/```json examples in guide-only prose; the fallback parser matched these and executed them as real commands, sometimes looping for several rounds as the model tried to clarify with more examples (#3222). Restrict the textual fenced-block fallback to non-native models, which rely on it as their only tool-invocation channel. Native models are trusted to use their structured tool_calls channel for real invocations; when they don't emit one, a bare fence in their response is prose, not an action. The native tool_calls path itself is untouched. This sits one layer below #3088's guide-only policy enforcement: that PR blocks tool exposure/execution on explicit no-tools requests, while this fixes the parser so ordinary illustrative fences are never misread as calls in the first place, on any turn. * fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence _resolve_tool_blocks previously short-circuited the entire textual parser (tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native function-calling models with no native tool_calls. That also dropped Patterns 2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content as text), which are real calls a model couldn't emit on its structured channel (e.g. DeepSeek-V falling back to DSML), not illustrative examples. 
parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks passes skip_fenced=is_api_model so fenced examples stop being executed for native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and recoverable. cleaned_round mirrors the same gate when persisting round text, so an illustrative fence that wasn't executed isn't stripped from saved/reloaded history either (it was streaming once and then disappearing on reload).	2026-06-08 22:25:28 +02:00
Mazen Tamer Salah	8e494cc1c4	fix(chat): keep balanced trailing ')' when extracting URLs (#3406 ) extract_urls() stripped any trailing ')' unconditionally via `re.sub(r'[.,;:!?\)]+$', '', url)`. That corrupts URLs that legitimately end in a parenthesis — most commonly Wikipedia disambiguation links like https://en.wikipedia.org/wiki/Python_(programming_language), which became ...Python_(programming_language and then 404 when fetched by the web/research tools. Strip trailing sentence punctuation as before, but only drop a ')' when it is unbalanced (more ')' than '('), so a prose-glued "(see https://example.com)" still loses its closing paren while balanced URLs keep theirs. Added tests/test_extract_urls.py covering balanced, unbalanced, nested, and trailing-punctuation cases.	2026-06-08 21:33:29 +02:00
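A minimal sketch of the balanced-parenthesis trimming that 8e494cc1c4 describes. This is not the project's `extract_urls()`: the URL regex and the `trim_url` helper are assumptions made for illustration.

```python
import re

# Hypothetical URL matcher; the real pattern in the project may differ.
_URL_RE = re.compile(r"https?://[^\s<>\"']+")
_TRAILING_PUNCT = ".,;:!?"


def trim_url(url: str) -> str:
    """Strip trailing sentence punctuation; drop ')' only while it is unbalanced."""
    while url:
        last = url[-1]
        if last in _TRAILING_PUNCT:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # More closers than openers: the ')' belongs to the surrounding prose.
            url = url[:-1]
        else:
            break
    return url


def extract_urls(text: str) -> list:
    """Return every URL found in text, trimmed of prose-glued punctuation."""
    return [trim_url(match) for match in _URL_RE.findall(text)]
```

Under these assumptions a Wikipedia disambiguation link keeps its closing parenthesis, while a URL wrapped in prose parentheses loses the stray one.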
nubs	932b7f2446	fix(email): close IMAP socket when connect/login fails (#3174 ) (#3363 ) * fix(email): close IMAP socket when connect/login fails (#3174) _imap_connect opened a live socket via _open_imap_connection and then called conn.login() with no try/finally, and _open_imap_connection called conn.starttls() unguarded. When auth fails (e.g. an Office 365 app password on an MFA-enabled tenant, #3174) or STARTTLS is rejected, the already-open socket was orphaned. Every IMAP caller funnels through _imap_connect, including the 30-minute _auto_summarize_poller, so a persistently misconfigured account leaked one descriptor per pass toward FD exhaustion. The previously merged leak fixes (#1325/#1330/#1423/#1530) only guard the post-connect body and monkeypatch _imap_connect to succeed, so this connect-time path was uncovered. Wrap login() and starttls() so a failure calls conn.shutdown() (low-level close; logout() can't run pre-auth) before re-raising. Adds two regression tests that fail without the guard. * fix(email): guard MCP IMAP+SMTP connect-time leaks too (#3174) Folds in the sibling connect-time leaks vdmkenny flagged on #3363, so the whole connect-then-step leak class is closed in one place: - mcp_servers/email_server.py::_imap_connect — guard starttls() and login(); close pre-auth with conn.shutdown() before re-raising. - mcp_servers/email_server.py::_smtp_connect — guard starttls() and login(); SMTP has no shutdown(), so close with conn.close() (socket close, no QUIT). Routes SMTP (_send_smtp_message) is already safe via 'with smtplib.SMTP(...)'. Adds four regression tests (one per guard), verified to fail without the fix.	2026-06-08 21:21:41 +02:00
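The connect-time guard from 932b7f2446 could look roughly like the sketch below. The helper name `login_or_close` and its parameters are hypothetical; only `starttls()`, `login()` and `shutdown()` are methods the commit message names, and they exist on `imaplib.IMAP4`.

```python
def login_or_close(conn, username: str, password: str, use_starttls: bool = False):
    """Run the connect-time steps on an already-open IMAP connection.

    If STARTTLS or login fails, close the socket before re-raising so the
    descriptor is not orphaned.
    """
    try:
        if use_starttls:
            conn.starttls()
        conn.login(username, password)
    except Exception:
        # logout() cannot run before authentication, so use the low-level close.
        conn.shutdown()
        raise
    return conn
```

A caller would pass in a freshly opened `imaplib.IMAP4` or `imaplib.IMAP4_SSL` object. For SMTP the same shape applies with `conn.close()` in place of `conn.shutdown()`, as the commit message notes.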
Alex Little	a58f526992	fix(presets): scope expand-prompt model resolution to owner (#3477 ) * fix(presets): scope expand-prompt model resolution to owner /api/presets/expand resolved its model endpoint with no owner, so in a multi-user setup it could match another user's endpoint and use its URL and decrypted api_key. Pass effective_user(request) to _resolve_model so resolution is owner-scoped. Adds a regression test. * fix(presets): scope teacher and audit model resolution to owner Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Alex Little <alexwilliamlittle@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>	2026-06-08 21:12:02 +02:00
Mazen Tamer Salah	5198516979	fix(sessions): copy message metadata when forking a session (#3409 ) fork_session passed each source message's metadata dict by reference into the new session. add_message() -> _persist_message() stamps _db_id (and timestamp) onto that dict in place, so persisting the fork overwrote the SOURCE messages' _db_id with the forked rows' ids — silently breaking edit/delete-by-id on the original conversation. Copy the metadata dict per message so the fork and source no longer alias. Adds tests/test_fork_session_metadata.py asserting the source session's message metadata is unchanged after a fork.	2026-06-08 20:49:15 +02:00
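The aliasing bug in 5198516979 can be reproduced in a few lines. The `Session` class below is a toy stand-in, not the project's session model; the counter plays the role of database row ids.

```python
import itertools
from typing import Optional

_row_ids = itertools.count(1)  # stand-in for database-assigned ids


class Session:
    """Toy session whose add_message stamps _db_id onto the metadata in place."""

    def __init__(self) -> None:
        self.messages = []

    def add_message(self, role: str, content: str, metadata: Optional[dict] = None) -> None:
        metadata = {} if metadata is None else metadata
        metadata["_db_id"] = next(_row_ids)  # mutates the dict the caller passed
        self.messages.append({"role": role, "content": content, "metadata": metadata})


def fork_session(source: Session) -> Session:
    """Copy messages into a new session without sharing metadata dicts."""
    fork = Session()
    for msg in source.messages:
        # dict(...) makes a per-message copy, so stamping the fork's _db_id
        # cannot overwrite the source message's _db_id.
        fork.add_message(msg["role"], msg["content"], dict(msg["metadata"]))
    return fork
```

A shallow copy is enough here because only top-level keys are stamped. If metadata held nested mutable values that are also modified in place, `copy.deepcopy` would be the safer choice.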
Giuseppe Castelluccio	095c74b985	fix(security): fail closed in /api/models auth gate on unexpected errors (#3489 ) GET /api/models swallowed any non-HTTPException raised while checking whether the caller is authenticated (bare except Exception: pass), so a broken auth_manager or an exception from get_current_user silently granted the full model list to an anonymous caller instead of rejecting the request. Now any unexpected exception logs and returns HTTP 500. Split out of #2360 per reviewer request to keep the deny-list and the auth-gate fix as separate, single-purpose PRs. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-08 20:23:39 +02:00
CorVous	34a3f8637a	fix(memory): make auto-memory extraction reliable for reasoning models (#3190 ) * fix(memory): auto-memory extracted nothing — flatten window so the prompt ends on a user turn extract_and_store appended the recent window as raw alternating role messages after the system prompt. Since the window is the last N messages, the prompt usually ENDED on an assistant turn — and a chat model given a prompt ending on an assistant turn returns an empty completion (nothing to answer). The result was facts=[] → "Auto memory extraction ran: 0 candidates" on every run, so no memories were ever stored, while skill extraction (which flattens the transcript into a single user message) worked fine. Flatten the window into one user message ending with an explicit instruction, mirroring the skill extractor, so the model always responds. Also harden parsing for reasoning models, matching the audit path which already does this: - raise max_tokens 500 → 4096 (a reasoning model spends the budget on <think> before emitting JSON; 500 truncated it before any JSON appeared); - strip <think>/prose preambles via strip_think and slice the embedded JSON array before json.loads, instead of bombing on char 0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore: tighten memory-extraction-empty-completion — clarify JSON-slice comment re prior strip steps Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(memory): reframe the comment to the accurate root cause (raw-chat framing) The earlier comment leaned on "ends on an assistant turn -> empty completion", which is only one failure mode. The dominant cause, confirmed by a controlled repro (0/6 old vs 6/6 new on this model), is that passing the window as raw chat messages makes the model treat it as a conversation to continue rather than a transcript to analyze, so it returns [] even when durable facts are present. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(memory): cover extraction JSON parsing + slice trailing commentary unconditionally Factor the strip/fence/slice/json.loads logic out of extract_and_store into a pure module-level helper _parse_extraction_json(raw) -> list and drop the 'text[0] != "["' guard so the array is sliced whenever both brackets exist (fixes trailing commentary like '[...] Done!' reaching json.loads). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 19:57:44 +02:00
Mazen Tamer Salah	8449baea80	fix(api-tokens): preserve scopes on a partial token update (#3407 ) PATCH /api/tokens/{id} unconditionally recomputed scopes from payload.get("scopes"). On a rename — body {"name": "..."} with no "scopes" key — that is None, so _normalize_scopes(None) returned the default ["chat"] and the handler overwrote token.scopes, silently dropping every scope the token had been granted (e.g. email:read, calendar:write). Only write scopes when the request actually includes them, and return the token's real stored scopes in the response (matching the GET /tokens display shape) instead of the recomputed default. tests/test_api_token_routes.py: add rename-preserves-scopes, explicit-scopes-applied, and missing-token-404 cases for the PATCH handler.	2026-06-08 19:37:31 +02:00
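The partial-update rule in 8449baea80 comes down to checking for the key rather than reading it with `.get()`. The sketch below uses plain dicts and hypothetical helper names; it is not the actual route handler.

```python
DEFAULT_SCOPES = ["chat"]


def normalize_scopes(raw) -> list:
    """Hypothetical normalizer: empty or missing input falls back to the default."""
    if not raw:
        return list(DEFAULT_SCOPES)
    return sorted({str(scope).strip() for scope in raw if str(scope).strip()})


def apply_token_patch(token: dict, payload: dict) -> dict:
    """Apply a PATCH body to a stored token, touching only the fields it includes."""
    if "name" in payload:
        token["name"] = payload["name"]
    if "scopes" in payload:
        # Only recompute scopes when the request actually carries them.
        token["scopes"] = normalize_scopes(payload["scopes"])
    # Report the stored scopes, not a recomputed default.
    return {"id": token["id"], "name": token["name"], "scopes": list(token["scopes"])}
```

With this shape a rename body such as `{"name": "..."}` leaves `token["scopes"]` untouched, while a body that includes `"scopes"` still replaces them.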
Mazen Tamer Salah	d58202d10e	fix(presets): persist presets atomically to avoid corruption on crash (#2169 ) PresetManager.save() used a plain open("w") + json.dump, which truncates presets.json before writing the new content. A crash, power loss, or serialization error mid-write leaves the file truncated/empty and every saved preset is lost. Route the write through core.atomic_io.atomic_write_json (tmp file + os.replace), matching how the rest of the codebase persists JSON state. The helper is imported lazily so this module stays free of the heavy core package import graph at module load time. Adds tests/test_preset_atomic_save.py covering the source contract, a failed-write leaving the existing file intact, and a round trip.	2026-06-08 19:16:37 +02:00
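The tmp-file-plus-`os.replace` pattern that d58202d10e refers to can be sketched with the standard library alone. This is an illustrative stand-in, not the project's `core.atomic_io.atomic_write_json`, whose exact signature is not shown in the log.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the name atomically on the same filesystem, so a
        # reader sees either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # A failed write must not leave the temp file behind; the original
        # file at `path` has not been touched at this point.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

Creating the temp file in the target's own directory matters: `os.replace` is only atomic when source and destination are on the same filesystem.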
Mazen Tamer Salah	1209f258d7	fix(caldav): skip the prune when any object fails to parse (#3454 ) * fix(caldav): don't prune the whole window when no objects could be parsed The post-sync prune deletes local origin=="caldav" rows in the window whose UID the server didn't just return. With an empty seen_uids it falls back to `uid.isnot(None)` — a match-all delete. That's right when the calendar is genuinely empty, but when the server returns objects and every one fails to parse (malformed iCal / an icalendar error), seen_uids is empty only because nothing could be read, so the match-all branch silently deletes every local event in the 90-day-back/365-day-forward window. Track whether any object failed to parse and gate the prune with a small pure helper `_should_prune_window(seen_uids, parse_failed)`: prune when something was read, or when the calendar is genuinely empty (no objects, no parse errors), but never when objects came back unreadable. Adds tests/test_caldav_prune_parse_failure.py for the three cases. * fix(caldav): skip the prune on any parse failure, not just total Review follow-up (#3454): _should_prune_window returned True whenever seen_uids was non-empty, so a partial parse failure (say 48 of 50 objects parse) still pruned the 2 unreadable-but-still-upstream events, because their UIDs were absent from seen_uids. Any parse failure makes seen_uids an incomplete view of the server, so pruning against it is unsafe whether the failure is total or partial. Skip the prune on any parse failure (return not parse_failed); only prune on a clean read (a genuinely empty window is still safe to prune). Tradeoff: one permanently-unparseable event pauses deletion mirroring until it is fixed, which is the safe direction (false-keep beats false-delete). Replace the now-incorrect "partial failure still prunes" assertion with a partial-failure regression: one object parses, one fails, so the prune is skipped and the unparsed event's local copy is not deleted. 
--------- Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>	2026-06-08 18:59:14 +02:00
Mazen Tamer Salah	d71284194b	fix(memory): only delete memories the model explicitly drops in tidy (#3455 ) * fix(memory): only delete memories the model explicitly drops in tidy The AI memory-tidy path computed deletions as the complement of the model's `keep` list (`if mid not in keep_ids: continue`). When the model returned a valid response that simply omitted some existing ids — a common LLM lapse — every omitted memory was silently deleted, even though it was neither a duplicate nor listed in `drop`. Honor the explicit `drop` set instead: delete only ids the model dropped (minus any it saw only truncated), and preserve everything else, still applying cleaned text/category from `keep`. Adds tests/test_consolidate_memory_explicit_drops.py: a memory the model omits from both keep and drop survives; an explicitly dropped one is removed. * refactor(memory): remove now-dead keep_ids from tidy After deletion switched to drop_ids and text/category rewrites to cleaned_by_id, keep_ids was written but never read. Remove the init, the .add(mid) in the keep loop, and the truncated .update() (its truncated-protection is already covered by `drop_ids -= truncated_ids`). Pure deletion, no behavior change; tests stay green. Addresses review feedback on #3455. --------- Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>	2026-06-08 18:54:45 +02:00
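The explicit-drop rule from d71284194b, as a small sketch. The response shape (a `keep` list of dicts with `id`, `text`, `category`, plus a `drop` list of ids) and the function name are assumptions; `drop_ids`, `truncated_ids` and `cleaned_by_id` are the names the commit message uses.

```python
def apply_tidy(memories: dict, keep: list, drop: list, truncated_ids: set) -> dict:
    """Return the memories that survive a tidy pass.

    Only ids the model explicitly dropped are deleted, and never ones it
    saw only in truncated form. Ids it merely omitted are preserved.
    """
    drop_ids = set(drop) - set(truncated_ids)
    cleaned_by_id = {item["id"]: item for item in keep}

    result = {}
    for mid, memory in memories.items():
        if mid in drop_ids:
            continue
        cleaned = cleaned_by_id.get(mid)
        if cleaned is not None:
            # Apply the model's cleaned text/category where it supplied them.
            memory = {
                **memory,
                "text": cleaned.get("text", memory["text"]),
                "category": cleaned.get("category", memory["category"]),
            }
        result[mid] = memory
    return result
```

The design choice is the safe direction for an LLM-driven cleanup: an omission by the model keeps a memory rather than deleting it.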
Aman Tewary	d458cade98	docs(email): clarify Outlook password auth failures Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-08 15:32:16 +01:00
Mostafa Eid	d6882a895e	feat(chat): recall last user message on empty composer ArrowUp (#1175 ) Pressing ArrowUp on an empty #message composer restores the last sent user text, matching common chat-app UX (Slack, Discord, ChatGPT). - Read from #chat-history .msg-user dataset.raw (same path as resend/regenerate), not session sidebar metadata - Literal empty check (whitespace-only drafts are preserved); ignore Shift/Alt/Ctrl/Meta and IME composition - Extract wiring to composerArrowUpRecall.js; rAF + 250ms retry only (no global MutationObserver) - Add tests/test_composer_arrow_up_recall_js.py Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-08 13:06:05 +02:00
Vykos	4a9085d252	fix(endpoint): scope secondary endpoint lookups by owner * Scope secondary endpoint lookups by owner * Reject unregistered image endpoint URLs for non-admins * Adjust owner-scope tests for rebased routes * Allow non-admins to compare endpoints they own The compare owner-scope guard called _reject_raw_endpoint_url_for_non_admin with endpoint_id=None, so it rejected every signed-in non-admin /api/compare/start request — even for endpoints the caller owns — because compare resolves endpoints by URL and carries no endpoint_id. That locked non-admins out of compare entirely. Resolve the owned ModelEndpoint first and pass its id, so a registered endpoint the caller owns is allowed while only truly raw, unregistered URLs are rejected (mirrors the gallery inpaint/harmonize checks in this PR). Replace the source-only reject test with deterministic reject + allow regressions that no longer depend on the dev DB contents. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Bind compare sessions to the resolved owner-scoped endpoint /api/compare/start created the [CMP] helper sessions with the raw caller-supplied endpoint URL and only used the owner-scoped lookup to decide whether to copy an API key. That stopped key borrowing but still let a non-admin inject an arbitrary raw endpoint URL into the compare session path. Now, when the supplied URL resolves to a registered endpoint visible to the caller, the session binds to that row's own normalized base URL (build_chat_url(normalize_base(ep.base_url))) plus its headers — the same registered-endpoint shape session_routes uses. The raw URL survives only when ep is None, which non-admins already hit a 403 on, leaving raw URLs reachable solely for admins / single-user mode with no borrowed key. 
Adds compare-specific behavior tests: another user's private endpoint is rejected (nothing created), the session binds to the stored URL rather than the raw input, and an admin raw URL is allowed but carries no inherited key. Addresses the review on #1511. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Validate both compare endpoints before creating any session start_comparison resolved + created each [CMP] session inside one loop, so a request pairing a valid owned endpoint A with an unregistered raw endpoint B raised 403 only after A's session was already created — and its Authorization header copied in. The rejected request left a partial compare session with that header behind. Split the flow into two phases: phase 1 resolves and owner-validates both endpoints (running the raw-URL reject helper) and stashes the session URL + headers; phase 2 creates the two sessions only once both passed. A 403 on either endpoint now aborts with nothing created and no header copied. Adds a regression test: owned endpoint A + unregistered/raw endpoint B -> 403 with no sessions created. Addresses the follow-up review on #1511. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Resolve compare credentials by endpoint id, not URL alone Two endpoints visible to a caller can share a base_url but hold different api_keys. _owned_endpoint_by_url returned whichever row sorted first, so /api/compare/start could copy the wrong key into the [CMP] session. Add _owned_endpoint_by_id (same owner scoping) and optional endpoint_a_id/ endpoint_b_id form fields. The id pins the exact registered endpoint; URL resolution remains only for legacy/admin raw-URL callers. An id the caller can't see 404s instead of falling back to a same-URL row. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Loosen research-routes owner-scope assertion to the stable substring The rebased _resolve_research_endpoint generalized its owner derivation to honor an explicit owner arg first (owner = owner or getattr(sess, ...)), so the exact-line assertion broke CI. Assert the stable session-derivation substring instead of the full line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 11:51:55 +01:00
stocky789	1e0d9b92af	feat: add ChatGPT Subscription provider (#2876 ) * feat: Add ChatGPT Subscription support and related features - Introduced a new provider option for ChatGPT Subscription in the endpoint selection UI. - Implemented OAuth flow for ChatGPT Subscription sign-in, including polling for authorization status. - Updated admin interface to handle ChatGPT Subscription, including disabling API key input and providing user guidance. - Enhanced cost tracking logic to differentiate between subscription and non-subscription endpoints. - Added new slash commands for managing skills, including listing, searching, and invoking skills. - Implemented caching for skill catalog to optimize performance. - Updated tests to cover new ChatGPT Subscription functionality and ensure proper endpoint probing. - Refactored existing code to accommodate new features and improve maintainability. * refactor: share provider device-flow setup - reuse one device-flow backend for Copilot and ChatGPT Subscription - add one frontend device-flow helper for Settings and /setup - put GitHub Copilot back into Add Models, now as a dropdown option - make provider selection just select; clicking Add starts sign-in - stop ChatGPT Subscription setup from opening auth tabs automatically - make /setup copilot and /setup chatgpt-subscription work from chat - show ChatGPT Subscription in the /setup suggestions - show the real error message when setup fails - add focused tests for the shared flow and setup UI * feat(chatgpt-subscription): harden credential lifecycle and streamline auth UX Backend: - Resolve runtime bearer for provider-auth endpoints at probe time via a shared _resolve_probe_key() that delegates to resolve_endpoint_runtime, applied across all probe/refresh call sites. - Skip live completion probes and health pings for discovery-only providers (centralized behind _is_discovery_only_provider) — the Codex/Responses API has no such endpoints, so status is derived from cached models. 
- Never persist the short lived ChatGPT bearer to the plaintext sessions table; proactively clear any stale bearer left by an earlier code path. - Revoke orphaned ProviderAuthSession credentials when the last endpoint backing them is deleted (_delete_orphaned_provider_auth), surfaced via cleared_provider_auth in the delete response. Frontend (admin.js): - Auto-start the device-auth flow on provider selection so the authorization panel (code + Authorize) shows immediately instead of behind a "Sign in" click. - Remove the redundant top button for device auth providers, move retry into the panel via an inline "Try again". - Drop the self-evident hint text and add an execCommand clipboard fallback so Copy works in non-secure (HTTP/LAN) contexts. * fix: harden chatgpt subscription provider * chore: remove PR media from branch * Fix chatgpt subscription recovery and token handling --------- Co-authored-by: 5p00kyy <admin@5p00ky.dev>	2026-06-08 10:19:18 +02:00
Mike	ac94885c84	refactor(constants): single source of truth for data dir (#3368 ) * refactor(constants): single source of truth for data dir + merge core/src constants Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(contributing): use named src.constants for data paths, drop core/constants references Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 09:58:52 +02:00
Lucas Daniel	adc6ac9394	fix(compare): stream Compare panes directly to stop upstream promptly The previous approach polled request.is_disconnected() inside the async-for body of the chat/agent streaming loops. That happens too late: by the time the poll runs, __anext__() has already awaited and consumed the next upstream chunk, so a slow or silent generation could still run for a full round-trip (or until a read timeout) after the client disconnected. It was also unconditional, which would have made ordinary chat navigation/refresh/tab-close stop a run that the detached-run design intentionally keeps going server-side. Both problems trace back to the same root cause: chat_stream always wraps its generator in agent_runs (the detached-run manager), which decouples the generator's lifetime from the SSE response on purpose so normal chat/agent streams survive the client going away. Polling disconnection inside a detached generator can never be "prompt" — the generator isn't tied to that request anymore — and doing so defeats the whole point of detaching it. Compare panes don't need (or want) that: each pane's session exists only to drive that one generation, there's nothing meaningful to /resume, and the user expects the pane's Stop button — which aborts the fetch and closes the SSE — to cancel the upstream call right away. So route compare-mode requests around the agent_runs wrapper entirely and stream the generator directly as the SSE body. Starlette already cancels a streaming response's body iterator (raising CancelledError/GeneratorExit into it) the instant it notices the client disconnected — including while the generator is mid-await on the next upstream chunk — and the existing except (CancelledError, GeneratorExit) handlers in both the chat-mode and agent-mode loops already save the partial response exactly once. No polling needed; the redesign just stops getting in its own way. 
Normal (non-Compare) chat and agent streams are untouched and keep going through agent_runs, preserving detached-run semantics (surviving tab close / navigation / refresh, reconnect via /api/chat/resume). Replaces the source-text assertions in tests/test_compare_stop_disconnect_poll.py with runtime tests that actually exercise the cancellation contract: a Compare-shaped generator is cancelled mid-await (not after the next chunk arrives) and saves its partial exactly once; a normal completion still saves exactly once via the completion path; agent_runs keeps a detached run alive when its subscriber disconnects and only stops it on an explicit stop()/cancel (also saving the partial exactly once); and the cancellation contract is pinned for both chat-mode- and agent-mode-shaped chunk sequences.	2026-06-08 01:13:45 +01:00
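The cancellation contract adc6ac9394 relies on can be demonstrated with plain `asyncio`, without Starlette. Everything below is a toy: `fake_upstream` and `pane_stream` are hypothetical names, and cancelling the consuming task stands in for the framework cancelling the response body iterator.

```python
import asyncio


async def fake_upstream():
    """Stand-in for a slow upstream model stream."""
    yield "a"
    await asyncio.sleep(3600)  # a slow or silent generation
    yield "b"


async def pane_stream(upstream, saved_partials: list):
    """Relay chunks; on cancellation, save the partial response and re-raise."""
    parts = []
    try:
        async for chunk in upstream:
            parts.append(chunk)
            yield chunk
    except (asyncio.CancelledError, GeneratorExit):
        # Reached even while blocked awaiting the next upstream chunk.
        saved_partials.append("".join(parts))
        raise


async def demo():
    saved, seen = [], []

    async def consume():
        async for chunk in pane_stream(fake_upstream(), saved):
            seen.append(chunk)

    task = asyncio.create_task(consume())
    await asyncio.sleep(0.05)  # let the first chunk through
    task.cancel()              # simulates the client disconnecting
    try:
        await task
    except asyncio.CancelledError:
        pass
    return seen, saved
```

Running `asyncio.run(demo())` cancels the consumer while the generator is still awaiting the second chunk, so the partial is saved without waiting for that chunk to arrive.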
Lucas Daniel	fa7c4f8ea9	fix(search): catch HTTPStatusError so 403/404 URLs degrade gracefully instead of 500 (#2203 ) raise_for_status() raises httpx.HTTPStatusError for 4xx/5xx responses, but the surrounding try/except only caught httpx.RequestError (network errors) and RateLimitError (429). Any other HTTP error code propagated uncaught up through chat_processor -> chat_helpers -> chat_routes and surfaced as a 500 Internal Server Error. Added an explicit except httpx.HTTPStatusError clause that logs a warning and returns an empty result, matching the behaviour already in place for network errors. Also adds focused regression tests that exercise the real fetch_webpage_content() path with a mocked _get_public_url: - 403/404 responses return the standard empty-result shape instead of raising, proving the new HTTPStatusError handling works end to end. - 429 responses still take their own dedicated rate-limit branch (the status_code == 429 check runs before raise_for_status() is reached), keeping that behaviour distinct from the new generic HTTPStatusError handling. Dropped the unrelated builtin_mcp.py change that had been carried over from a rebase; that fix is tracked separately in #2018 and this branch should stay scoped to the search content fetch path. Closes #2148	2026-06-08 01:09:21 +01:00
Alexandre Teixeira	77b75ca97e	docs(tests): define testing standard and taxonomy (#3372 )	2026-06-08 01:15:47 +02:00
Kenny Van de Maele	505d8bae5a	fix(cookbook): locate cookbook_state.json via DATA_DIR, not hardcoded /app/data (#3332 ) Three call sites hardcoded Path("/app/data/cookbook_state.json"), which only exists in Docker; on a native run the real path is <repo>/data, so the state file looked missing and cookbook serve-state was silently ignored. Two others used os.environ.get("DATA_DIR", "data") (a relative fallback, since DATA_DIR is never set as an env var). Route all five through core.constants.DATA_DIR so the path is consistent and absolute on both Docker and native. Part of #3331. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 00:13:47 +01:00


685 Commits