odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-15 17:25:26 -04:00

Author	SHA1	Message	Date
Nacho Mata	73823c878e	fix(windows): detect per-user Git for Windows bash under %LocalAppData%\Programs\Git (#3738 ) find_bash() rejected the WindowsApps WSL stub and then probed only %LocalAppData%\Git, so per-user Git for Windows installs (winget / Inno Setup {userpf}) under %LocalAppData%\Programs\Git were never found and the Cookbook reported "needs Git Bash" despite Git being installed. Add the Programs\Git subfolder to the LocalAppData fallback root.	2026-06-11 13:41:12 +02:00
RaresKeY	50fedff2f2	fix(email): scope learned sender signatures by owner (#3724 )	2026-06-11 13:26:59 +02:00
Max Hsu	66c25cbc2f	fix(models): reassign default endpoint when current default is disabled (#3649 ) Adding a new endpoint only auto-set the global default chat endpoint when none was configured (`if not settings.get("default_endpoint_id")`). When the existing default pointed at an endpoint the user had since disabled, it was never reassigned, so features that read the raw `default_endpoint_id` setting (notably Memory → Tidy) failed with "No default model configured — set one in Settings" even though an enabled endpoint existed. Reassign the default when the configured endpoint is missing/disabled, via a new pure `_default_endpoint_needs_assignment` helper. Adds unit coverage for the helper plus route-level regression tests for the disabled/enabled cases. Fixes #3586 Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 13:17:31 +02:00
Léo	09ec880c06	Merge pull request #3567 from shdrs/fix/no-scroll-snapping fix(docs): remove intrusive scroll-snap UX on landing page	2026-06-11 13:12:40 +02:00
Léo	5e16126bde	Merge branch 'dev' into fix/no-scroll-snapping	2026-06-11 13:08:50 +02:00
cyq	c01034f9cb	fix(settings): scrub camelCase secret keys (#3707 )	2026-06-11 12:53:33 +02:00
broken💎shaders	8adca3a924	Merge branch 'dev' into fix/no-scroll-snapping	2026-06-11 11:43:53 +08:00
RaresKeY	d5603ee575	fix(research): migrate active task owners on rename (#3618 )	2026-06-11 01:17:02 +02:00
Mazen Tamer Salah	9c00da6d1c	fix(hwfit): tolerate non-numeric gpu_count in /api/hwfit/models (#3639 ) * fix(hwfit): tolerate non-numeric gpu_count in /api/hwfit/models The route did `n = int(gpu_count)` with no guard, so a non-numeric query param like `?gpu_count=abc` raised ValueError and returned HTTP 500. Parse it defensively (mirroring the gpu_group guard a few lines above): a malformed value is ignored, exactly like omitting the param, and valid values still apply. Adds tests/test_hwfit_gpu_count_nonnumeric.py: a non-numeric gpu_count returns a ranking instead of raising, and a numeric value is still accepted. * test(hwfit): cover non-numeric manual_gpu_count too Follow-up to the gpu_count guard: add a regression test for the sibling manual_gpu_count query param (the hardware simulator in _apply_manual_hardware), which dev already guards by defaulting to 1 on a non-numeric value. This pins that behaviour so the endpoint's count parsing is fully covered and cannot regress to a 500.	2026-06-11 01:01:58 +02:00
RaresKeY	d1a5a7d680	fix(hwfit): validate remote SSH detection targets (#3718 )	2026-06-11 00:43:49 +02:00
Mazen Tamer Salah	218b9ecbc8	fix(startup): ping real endpoints in warmup/keepalive (#3641 ) _warmup_endpoints called model_discovery.get_endpoints(), which does not exist on ModelDiscovery. It raised AttributeError on every startup and on every 60s keepalive tick, was swallowed by the outer except, and pinged nothing, so the cold-start prevention the loop exists for never ran. Add ModelDiscovery.warmup_ping_urls(), which resolves the /models probe URLs from the real discover_models() output, and call it from the warmup loop via asyncio.to_thread (discovery does a blocking port scan, so keep it off the event loop). Adds tests/test_warmup_ping_urls.py: resolves /models URLs from discovered items, honors the limit, degrades to [] on discovery failure, and documents that get_endpoints never existed.	2026-06-10 19:21:45 +02:00
Srinesh R	d9a4b99046	fix: handle batch events format in manage_calendar tool (#3503 ) * fix: handle batch events format in manage_calendar tool Models like deepseek-v4-flash emit batch events array instead of individual create_event calls. The tool defaulted to list_events (no action key), so events were never created despite the model confirming success. - Add batch normalization in do_manage_calendar - Map start/end objects to flat dtstart/dtend strings - Add tests for both object and flat string formats * fix: surface partial batch failures in manage_calendar Partial failures were silently dropped - batches with mixed success/failure would report only created count with no error visibility. - Return non-zero exit code for any failures - Surface both created and failed counts in response - Include first error message for debugging - Add test for partial failure case * chore: strip trailing whitespace in batch normalization block * chore: strip whitespace-only blank lines in batch events test	2026-06-10 19:13:08 +02:00
Mazen Tamer Salah	f5b91f1e9e	fix(tasks): read Memory.text in classify_events personal context (#3640 ) The classify_events task pulled user memories to give the LLM personal context, but read `m.content`, which the Memory ORM does not have (the column is `text`). That raised AttributeError on the first row; the surrounding except swallowed it and logged at debug, so the personal-context block was silently always empty and events were classified without it. Extract the rendering into `_memory_context_lines` (reads `text`, robust via getattr, keeps the 200-char and 40-line caps) and raise the swallowed-exception log to warning so a future schema mismatch is visible. Adds tests/test_classify_events_memory_text.py for the field, truncation, blank skipping, missing-attr robustness, and the line cap.	2026-06-10 19:03:45 +02:00
Max Hsu	8bf8212846	fix(chat): copy only the displayed reply from the message copy buttons (#3731 ) The AI-message copy buttons copied dataset.raw, which is the full accumulated model output — still containing the <think time="..."> reasoning block and any tool-call markup that the renderer strips for display. Pasting therefore leaked the model's thinking, and the first heading after </think> lost its markdown formatting because it was glued to the closing tag. Add chatRenderer.copyMessageText(), which mirrors the display pipeline (stripToolBlocks then extractThinkingBlocks) and falls back to the raw text when stripping leaves nothing (thinking-only turns), and route both copy handlers — the message footer and the slash-reply footer — through it. The interrupted-turn Continue flow intentionally keeps reading dataset.raw. Fixes #3722 Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:29:22 +02:00
ThomasAngel	a0b0420e6f	chore: Switch duckduckgo-search to ddgs (#3143 ) * Switch to ddgs duckduckgo_search was deprecated, this is the recommended replacement * Update test_service_search_provider_guards.py According to review comment	2026-06-10 17:59:47 +02:00
Mazen Tamer Salah	96975f8dd9	fix(contacts): tolerate non-string body in /api/contacts/import (#3638 ) import_vcf built `text = data.get("vcf") or data.get("text") or ""`, so a non-string JSON value (a number, list, etc.) stayed in place and the following `text.strip()` raised AttributeError, returning HTTP 500. Coerce vcf/text/csv with str() so non-string input degrades to the existing structured "no data" response, matching the file's convention elsewhere. Adds tests/test_contacts_import_nonstring.py covering non-string vcf, non-string csv, and an empty body.	2026-06-10 17:50:22 +02:00
Mazen Tamer Salah	4e210d3337	fix(research): stop rescanning the research dir on every status poll (#3637 ) get_status() called get_avg_duration() unconditionally, and that helper globs and JSON-parses every file under the research data dir. The SSE status stream polls get_status() roughly once a second, so with a few saved reports each poll re-read and re-parsed all of them, including for sessions that are not active (the disk branch never even used the value). Compute avg_duration only for active sessions and memoize it on the task entry, so a long stream computes it once instead of on every poll. Behaviour is unchanged: active streams still report avg_duration. Adds tests/test_research_status_avg_duration.py: an inactive session does no avg scan, and an active session computes it once across many polls.	2026-06-10 17:40:44 +02:00
RaresKeY	800d391234	fix(auth): roll back rename on owner migration failure (#3616 )	2026-06-10 17:28:27 +02:00
Ashvin	9c8df89973	fix(auth): case-insensitive skill owner match on rename (#3614 ) SKILL.md files written with mixed-case owner (e.g. 'owner: Alice') were skipped because the regex had no IGNORECASE flag. _usage.json keys like 'Alice::skill-name' were missed by the startswith prefix check for the same reason. Both comparisons now match the same way the deep_research and memory blocks do — case-insensitively against old_username. Fixes #3611	2026-06-10 17:20:36 +02:00
Ashvin	6f73c8afaa	fix(sessions): use owner_filter for list_sessions queries when auth disabled (#3622 ) Direct DbSession.owner == user becomes WHERE owner IS NULL when user is None (auth disabled), hiding all sessions that carry an explicit owner. Same flaw on the Document and GalleryImage sub-queries (active-doc and gallery badges). Replace all three with owner_filter(), which is a no-op when user is falsy. Fixes #3620	2026-06-10 17:07:07 +02:00
Shashwat Deep	e384c5a2a6	fix(db): close sqlite migration connections on exception paths (#3600 ) The _migrate_* startup helpers in core/database.py opened a raw sqlite3.connect() inside a try and called conn.close() as the last statement in that try. If any earlier statement raised (locked DB, unexpected schema, a failed ALTER), close() was skipped and the bare except only logged the error — leaking the connection (file handle + lock) for the lifetime of the process. These migrations run on every startup. Wrap each in the conn = None + try/except/finally pattern already used by _migrate_chat_messages_fts in this same file, so the connection is closed on all exit paths. 25 functions; no change on the success path. Helpers that already close safely are left untouched: _migrate_chat_messages_fts and _migrate_backfill_task_folders (the latter uses SQLAlchemy's engine.connect() context manager). Same bug class as the previously merged DB-connection-leak fix (#64) and the IMAP logout-on-all-paths fix (#1530).	2026-06-10 17:03:01 +02:00
Maruf Hasan	edce608008	fix(ui): raw SVG markup displayed instead of search icon for web_search tool label (#3601 ) * fix(ui): escaped SVG renders as raw markup during web_search tool label The _toolLabels['web_search'] entry embedded an SVG HTML string concatenated with label text. At render time the entire value was passed through esc(), HTML-escaping <svg> tags so the icon displayed as raw text instead of rendering visually. Fix: separate icon from label text via a _toolIcons map. The SVG is injected as raw innerHTML (unescaped) in .agent-thread-icon, while the label text remains safely escaped. * test: add behavioral test for web_search tool icon rendering Co-authored-by: TheDragonTail <jakeoldfield2@gmail.com> --------- Co-authored-by: TheDragonTail <jakeoldfield2@gmail.com>	2026-06-10 16:50:43 +02:00
RaresKeY	ee6cfbd25a	fix(auth): drop reserved usernames loaded from auth config (#3727 )	2026-06-10 16:31:26 +02:00
RaresKeY	cd3fb4e96b	fix(auth): fail closed when deleting user tokens fails (#3733 )	2026-06-10 16:24:27 +02:00
SurprisedDuck	e115b0155c	fix(security): don't grant tool access in the pre-setup window (#3506 ) * fix(security): don't grant tool access in the pre-setup window owner_is_admin_or_single_user() returned True whenever auth was not configured, which conflated two very different states: - intentional single-user mode (operator set AUTH_ENABLED=false), and - the pre-setup window (auth enabled, but no admin created yet). In the second state, blocked_tools_for_owner() returned an empty set, so server-execution tools (bash/python) and other admin-only tools were ungated. The auth middleware already 401s /api/ requests pre-setup, but a caller that bypasses it (trusted loopback / internal-tool path) could reach those tools before setup completed. Treat "not configured" as admin only when auth is intentionally disabled (AUTH_ENABLED=false), mirroring the AUTH_ENABLED parsing in app.py and core.middleware. Single-user mode is preserved; the pre-setup window is now non-admin as defense-in-depth. Adds regression tests for both states. Fixes #3201 Supported by Claude Opus 4.8 * refactor(security): reuse _auth_disabled() instead of a duplicate helper Addresses review on #3506: src/auth_helpers.py already has _auth_disabled() with the identical AUTH_ENABLED parse. Drop the duplicate _auth_intentionally_disabled() and call the existing helper via a lazy import inside owner_is_admin_or_single_user (mirroring the lazy core.auth import) to avoid any import cycle. Removes the now-unused `import os`. Behaviour and the two regression tests are unchanged. Supported by Claude Opus 4.8 --------- Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>	2026-06-10 14:37:26 +02:00
broken💎shaders	59fc6604be	Merge branch 'dev' into fix/no-scroll-snapping	2026-06-10 19:58:30 +08:00
ooovenenoso	725d174243	fix(research): track analyzed URLs separately (#3125 ) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-10 12:08:22 +01:00
Yeoh Ing Ji	3e49658204	refactor(tools): extract document tools to handle registry (#3666 ) * feat(tools): add document management tool handlers to the agent_tools module * feat(tools): extraced document tools for create, update, edit, suggest, and manage from tool_implementations.py * feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools * refactor(tools): add document tool dispatcher and updated tool calling path * refactor(tools): remove duplicated document management functions * refactor(tools): removing unused functions and adding new import paths * refactor(tools): update document tool execute methods to use context dictionary * refactor(tests): update import paths for document tools in test files * refactor(tests): update owner parameter format in document management tests * refactor(tests): update import path for _owned_document_query * feat(tools): add document management tool handlers to the agent_tools module * feat(tools): extraced document tools for create, update, edit, suggest, and manage from tool_implementations.py * feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools * refactor(tools): add document tool dispatcher and updated tool calling path * refactor(tools): remove duplicated document management functions * refactor(tools): removing unused functions and adding new import paths * refactor(tools): update document tool execute methods to use context dictionary * refactor(tests): update import paths for document tools in test files * refactor(tests): update owner parameter format in document management tests * refactor(tests): update import path for _owned_document_query * refactor: update import paths for document tools * fix(tests): correct source path for document ID test	2026-06-10 10:41:52 +02:00
Alexandre Teixeira	fc8e6366dd	test: mark first slow tests from duration evidence (#3711 )	2026-06-10 01:07:38 +02:00
Lucas Daniel	55ff22c6d5	fix(chat): stabilize system prompt, sequence memory extraction, and send stable session id to preserve KV cache (#3360 ) * fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache Fixes #2927. As diagnosed in the issue, three things in Odysseus's request pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache continuity, forcing a full prompt re-evaluation (15-30s+) on every turn: 1. Dynamic content folded into the system prompt every turn. Both the chat preface (ChatProcessor.build_context_preface) and the agent system prompt (_build_system_prompt) injected current_datetime_prompt() — text that changes every minute — directly into system-role messages, which llm_core then concatenates into the single system message sent as the cached prefix. Any byte difference there invalidates the entire cache. Moved this to a new current_datetime_context_message() helper that returns a standalone user-role message, inserted near the end of the array (right before the latest user turn) instead of mixed into the system prompt. The static system prefix (preset prompt + safety policy + agent base prompt) now stays byte-identical across turns of the same session. 2. Memory/skill extraction side-requests competed with the main completion. run_post_response_tasks fired extract_and_store / maybe_extract_skill via asyncio.create_task — fire-and-forget coroutines that could overlap the next turn's main request and steal llama.cpp's limited processing slots, evicting the cached checkpoint. They're now queued through a new _run_extraction_jobs_sequentially helper that waits for the session's stream to go idle and runs the jobs strictly one at a time. 3. No stable session identifier was sent to local backends, so llama.cpp assigned a new processing slot via LRU every turn ("session_id=<empty> server-selected (LCP/LRU)"), losing slot affinity. Added _apply_local_cache_affinity() in llm_core, which sets session_id and cache_prompt: true on outgoing payloads — gated to self-hosted OpenAI-compatible endpoints only (never api.openai.com or other cloud providers, which reject unrecognized request fields with a 400). Threaded session_id through stream_llm / llm_call_async / stream_agent_loop from the existing Odysseus session id. Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload- assembly and scheduling code paths: byte-identical system prefix across two turns of the same session (with a regression check that genuinely changed instructions DO still change it), the dynamic time block landing as a user-role message, extraction jobs waiting for the stream to go idle and running sequentially, and the outgoing payload carrying a stable session_id (same across turns of one session, different across sessions) only for self-hosted endpoints. Updated tests/test_user_time.py for the new message placement. * fix(tests): accept owner= kwarg in normalize_model_id monkeypatch The upstream normalize_model_id signature now takes an owner= keyword argument, and chat_helpers.py passes owner=getattr(sess, "owner", None) at the call site. Update the test stub lambda to **kwargs so it handles the new argument without breaking, and update chat_helpers.py to forward the owner parameter consistently. --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 22:46:54 +01:00
Lucas Daniel	d273085744	fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut (#3540 ) * fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut * fix(integrations): avoid mutating response dict in-place on truncation * fix(integrations): truncate dict responses and bound list sentinel overhead - Dict path now walks keys in insertion order, adding them one at a time while checking that the accumulated dict + _truncated marker fits within the 12 000-char limit. Previously the marker was appended without removing any content, so large dicts were not actually truncated. - List path now subtracts the sentinel's serialised size (+ element-separator padding) from the budget before binary-searching, so the final array including the sentinel stays at or under the limit. - Add regression tests: large-dict actually-truncated, small-dict pass-through, and list-with-sentinel respects the size bound. --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 22:34:08 +01:00
Kenny Van de Maele	8753daf357	chore: backport main-only changes to dev AGPL relicense + Cookbook serve fix (#3704 ) * Change project license to AGPL-3.0-or-later * Fix Cookbook serve server selection --------- Co-authored-by: pewdiepie-archdaemon <pewdiepie-archdaemon@users.noreply.github.com>	2026-06-09 23:20:34 +02:00
Michael	2e6fff2212	fix: preserve reasoning_content in sanitized messages for Moonshot/Kimi (#3152 ) Providers like Moonshot (Kimi K2.5/K2.6) require the reasoning_content field to be present on assistant tool-call messages in multi-turn conversations. The sanitizer's allow-list was missing this field, causing HTTP 400: 'thinking is enabled but reasoning_content is missing in assistant tool call message at index N'. Add reasoning_content to the allowed field set in _sanitize_llm_messages and cover with regression tests. Fixes #3118 Co-authored-by: michaelxer <michaelxer@users.noreply.github.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 21:44:38 +01:00
TimHoogervorst	8878443426	fix(calanders): Removed/merged duplicate calender delete endpoints (#3682 ) * merged two delete_calander functions performing the same thing * added proper 404 raise when nothing is found * removed 404 HTTPException and jus reverted it back to raise	2026-06-09 22:35:55 +02:00
Alexandre Teixeira	a22c0fa85e	test: pilot core database stub helper (#3685 )	2026-06-09 22:23:33 +02:00
TimHoogervorst	b1af29c7bc	fix(chat): add aria-label and title attributes to dismiss button for accessibility (#3693 )	2026-06-09 22:15:40 +02:00
OdWar420	2fae3b5f64	perf(http): gzip-compress text responses (#3690 ) The frontend's text assets shipped uncompressed on every cold load. Add Starlette's GZipMiddleware. Measured on the current assets: - style.css 1,127 KB -> 238 KB (-79%) - index.html 202 KB -> 35 KB (-83%) - chat.js 238 KB -> 60 KB (-75%) minimum_size=1024 skips tiny bodies; Starlette excludes `text/event-stream` by default, so the SSE streams (chat, shell, research, model-probe — all served with media_type="text/event-stream") are never compressed or buffered. Composes cleanly with the existing security-header middleware. No behavioural change. Built by OdWar -- with Claude thinking alongside.	2026-06-09 22:12:24 +02:00
arnodecorte	38dc9a0a41	Allow cookbook scopes for API tokens (#3090 ) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 21:03:40 +01:00
Rohith Matam	fbd8ee9033	fix: fall back for npx cache subprocess check (#3560 ) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 20:41:23 +01:00
Kenny Van de Maele	de80b065f2	fix(macos): start ChromaDB in start-macos.sh so tool calling works (#3664 ) * fix(macos): start ChromaDB from start-macos.sh so tool calling works start-macos.sh never started ChromaDB, so the tool index failed to initialize and tool/MCP injection silently degraded on native macOS installs (no Docker). Start a local chroma from the venv before launching, mirroring the existing Apfel background+trap pattern: idempotent (skips if 8100 is already serving), honors CHROMADB_HOST/CHROMADB_PORT (skips when remote), logs to a file, persists to data/chroma, and is killed in the exit trap. Fixes #3297 * fix(macos): bind/probe ChromaDB on IPv4 loopback to match app resolution Binding to the literal localhost can land on IPv6 ::1 while the app connects to localhost->127.0.0.1, leaving them unable to reach each other. Pin bind + probe to 127.0.0.1 (0.0.0.0 still honored). * style(macos): trim chromadb comments (present-tense, no issue refs)	2026-06-09 19:37:18 +01:00
Rares Tudor	016157019c	fix(tools): use _INTERNAL_BASE in serve-session endpoint registration (#3675 ) #3322 renamed the loopback base to _INTERNAL_BASE, but a later Cookbook commit reintroduced one call site using the old _COOKBOOK_BASE name, raising NameError whenever the agent registers a model endpoint for a running serve session. Fixes #3669	2026-06-09 20:31:29 +02:00
RaresKeY	5d33393a28	fix(gallery): fail closed for null-user owner scope (#3613 )	2026-06-09 20:20:21 +02:00
Alexandre Teixeira	cdfda4bd16	test: add fast lane and duration visibility (#3659 )	2026-06-09 20:11:47 +02:00
Sid	9e74a327f8	fix(llm): remove max_output_tokens from ChatGPT Subscription payload (#3656 ) ChatGPT's Codex API rejects any request that includes max_output_tokens, returning HTTP 400 "Unsupported parameter: max_output_tokens". This caused Deep Research to always fail during the endpoint probe when a ChatGPT Subscription model was selected. Remove the conditional that set payload["max_output_tokens"] in _build_chatgpt_responses_payload(). The parameter is simply not sent. Also update the two affected tests: - Rename test_chatgpt_subscription_payload_uses_max_output_tokens → test_chatgpt_subscription_payload_omits_max_output_tokens - Rename test_chatgpt_subscription_payload_omits_empty_max_output_tokens → test_chatgpt_subscription_payload_omits_max_output_tokens_when_zero - Assert max_output_tokens is absent rather than present Fixes #3650	2026-06-09 17:42:12 +02:00
Ashvin	60d25e0e26	fix(cookbook): use COOKBOOK_STATE_FILE constant for state path (#3623 ) The module derived its state file path as Path(os.environ.get("DATA_DIR", "data")) / "cookbook_state.json". The correct env var is ODYSSEUS_DATA_DIR, which is already read by src/constants.py and exported as COOKBOOK_STATE_FILE. When ODYSSEUS_DATA_DIR is set (Docker, custom installs), the old code read the wrong env var and silently wrote state to data/cookbook_state.json relative to CWD while every other file resolved under the custom data directory. Fixes #3621	2026-06-09 17:39:06 +02:00
RosenTomov	c46d37d876	test(tool_execution): stop two tests leaking src.tool_execution into the suite (#2686 ) * Make in-venv pip-fallback test independent of the runner's environment test_pip_install_fallback_chain_propagates_failure_in_venv simulated the in-venv case by probing the real interpreter (sys.prefix != sys.base_prefix). That assumes the test runner is itself inside a venv. CI runs pytest with no venv, so venv_check reported not-in-venv, the negated guard flipped, the --user branch fired, and the assertion failed. Make venv_check exit 0 directly to simulate the in-venv condition deterministically, mirroring the outside-venv companion test. * Stop agent-tool import shims from leaking into the admin-gate test test_function_call_non_object_args and test_unknown_tool_calls stub heavy DB/auth deps at import time to load the real agent-tool stack, but they popped src.tool_execution and left core.auth stubbed without restoring. Popping and re-importing src.tool_execution rebinds the src package's tool_execution attribute, so test_edit_file's later 'import src.tool_execution as te' resolved to a different module object than the one execute_tool_block lives in. The monkeypatch on _owner_is_admin then missed, the non-admin edit_file gate never fired, and the edit went through (exit_code 0). Stop touching src.tool_execution and restore the heavy stubs after import. Verified the full suite is green on Linux (Python 3.11, matching CI). --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 16:35:10 +01:00
Alexandre Teixeira	d4ab09e8e1	test: add focused test selection runner (#3556 )	2026-06-09 17:03:47 +02:00
Sheikh Rahat Mahmud	9180847c0e	feat(diagnostics): add consolidated service health endpoint for degraded-state reporting (#964 ) * Add consolidated service health endpoint for degraded-state reporting ROADMAP (High Priority) asks for "Better degraded-state reporting for ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no single readout of which subsystems are actually working: /api/health is only a liveness ping and each subsystem's signal lives in a different module, so a misconfigured self-host install gives no consolidated picture. This adds an admin-only GET /api/diagnostics/services endpoint backed by a new src/service_health.py aggregator. Each subsystem reports a uniform {name, status, detail, meta} where status is ok \| degraded \| down \| disabled, and the response rolls up an overall verdict (worst non-disabled status). Probes are deliberately non-intrusive and safe to poll: - ChromaDB: reads the .healthy flags on the RAG and memory vector stores. - SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No search query is run. - ntfy: GET the server's built-in /v1/health. No test notification is sent. - email: short IMAP connect+logout per configured account (no credentials in meta). - providers: probe each enabled ModelEndpoint's model list (no api_key in meta). Probe functions take their inputs as parameters and isolate the network call to injectable callables, so they unit-test without touching the network (same pattern as the merged provider-endpoint tests). Network probes run concurrently off the event loop via asyncio.to_thread with bounded per-probe timeouts. memory_vector is now passed into setup_diagnostics_routes (new optional param, backward-compatible) so ChromaDB's vector-memory store can be reported too. Tests: tests/test_service_health.py — 29 tests covering every status mapping per subsystem, the overall rollup, and that no secrets leak into meta. Verification: python -m pytest tests/test_service_health.py -q # 29 passed python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q Backend + tests only; an Admin/Settings UI badge that renders this endpoint is a natural follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(diagnostics): bound service-health wall-clock and redact secrets Addresses review on #964. Blocker 1 — genuinely bounded wall-clock: - providers_health and email_health now fan out per-item probes across a bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET), instead of probing endpoints/accounts sequentially. Stragglers are reported as a controlled `timeout` and never block; the pool is shut down with wait=False so the response returns on time regardless of endpoint/account count. - The IMAP connect path now honors the service-health budget: _imap_connect gained a pass-through `timeout` param and the probe calls it with _PROBE_TIMEOUT instead of the default 15s. - collect_service_health runs the four network subsystems concurrently, each under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop. Blocker 2 — no secret/raw-error leakage in the response: - _safe_url strips userinfo, query, and fragment from every URL surfaced in meta (searxng instance, ntfy base, provider name fallback), keeping only scheme/host/port/path. - _classify_error maps every probe failure to a controlled category token (timeout, connection_refused, dns_error, tls_error, network_error, http_error, auth_or_protocol_error, …) — raw str(exception), which can embed credentialed URLs or server text, is never returned. Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py): - URL userinfo/query redaction for searxng/ntfy/providers. - secret-bearing exception strings map to categories and don't leak. - multiple slow providers/accounts stay bounded (single + 25-endpoint cases). - subsystems run concurrently; aggregate deadline yields a controlled result. - route-level unauthenticated (401) / non-admin (403) / admin (200) coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(diagnostics): isolate route tests so they don't leak module globals The new route tests replaced src.service_health.collect_service_health and routes.diagnostics_routes.require_admin via direct assignment, which persisted for the rest of the pytest session. In CI's full alphabetical run that fake collector (returning services=[]) leaked into the later collect_service_health tests and failed them. Switch to monkeypatch.setattr so both are restored after each test. No production code change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 16:00:24 +01:00
Maanas	c1674fc2aa	refactor(tools): migrate execution logic to src/agent_tools/ package with handler registry (#3435 ) * refactor(tools): implement strict cohesive class coordinator pattern per #2917 * test: update edit_file tests to use EditFileTool class * fix(tools): restore tool_policy param and security backstop in coordinator * refactor(tools): migrate domain tools to agent_tools package per #2917 * test: update test imports for new agent_tools package * fix: resolve circular import between tool_execution and agent_tools * fix: remove leftover git conflict markers * fix(tools): resolve pytest failure and document _apply method * fix(tools): clean up whitespace and remove dead _tool_python helper --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 14:35:36 +01:00
Joshua Valderrama	35b4dd2824	fix: session context drifting — messages leaking between chats (#135 ) (#267 ) * docs: add implementation plan for fixing chat context drifting (#135) * fix: make Session.history immutable + fix {}.history crash - Session.history now exposes a COPY of the internal _history list - add_message() replaces history with a fresh copy each time - get_context_messages() derives from _history directly - replace_messages() updates both _history and history - truncate_messages() updates both _history and history - _persist_message() line 207: fixed {}.history fallback crash - Added 11 tests for session isolation and edge cases Addresses #135 root cause #1: shared mutable references * fix: task scheduler uses SessionManager methods instead of overwriting sessions - Added ensure_task_session() to SessionManager (checks cache first) - Task scheduler now uses ensure_task_session() instead of direct dict assignment - Task scheduler now uses SessionManager.add_message() for message persistence - Removed direct sess_obj.history.append() that was silently losing data Addresses #135 root causes #2 and #3 * fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old Prevents the cleanup task from deleting sessions that were just created and haven't received any messages yet (message_count == 0). Addresses #135 root cause #5 * test: comprehensive session isolation tests (10/10 passing) * refactor: consolidate _session_manager into singleton pattern - Added set_session_manager_instance / get_session_manager_instance to core/models - kept backward-compat aliases (set_session_manager, get_session_manager) - session_manager.py re-exports the singleton functions - ai_interaction.set_session_manager now syncs with the core singleton - context_compactor uses get_session_manager_instance() instead of getattr hack - app.py initializes the singleton once Addresses #135 root cause #4: fragile global wiring * test: add concurrent session isolation integration tests Verifies: - Concurrent add_message to different sessions doesn't cross-contaminate - Rapid parallel writes maintain isolation - Read-write concurrent access is safe All 3 async tests pass, proving the immutable history fix works under concurrency * fix: pre-import core.models in conftest to prevent test pollution test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at module level during collection. Any test collected after it imports Session as a MagicMock. Pre-importing core.models in conftest.py before test_agent_loop.py's module-level code runs prevents this. * fix: make .history authoritative mutable list, address PR review Per review feedback: keep .history as the authoritative mutable list so existing code doing .history.pop(), .history = [...], etc. still works. Fix the cross-contamination bug by ensuring __post_init__() gives each Session its OWN unique history list (never shared). Changes: - core/models.py: .history IS the authoritative list. _history aliases it. Each Session gets its own list in __post_init__. - core/session_manager.py: add_message() delegates to Session.add_message() instead of appending directly — no double-append, single source of truth. - tests/test_session_manager.py: updated test to reflect that .history references see new messages (same list, not a snapshot). - docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for shipping — useful design context but too much process/doc to ship). All 272 tests pass (3 pre-existing failures unrelated). * Fix session manager message persistence * Fix session history alias regressions * Fix session history aliasing and task delivery	2026-06-09 14:12:52 +01:00

1 2 3 4 5 ...

1122 Commits