Commit Graph

1118 Commits

Author SHA1 Message Date
Léo 5e16126bde Merge branch 'dev' into fix/no-scroll-snapping 2026-06-11 13:08:50 +02:00
cyq c01034f9cb fix(settings): scrub camelCase secret keys (#3707) 2026-06-11 12:53:33 +02:00
broken💎shaders 8adca3a924 Merge branch 'dev' into fix/no-scroll-snapping 2026-06-11 11:43:53 +08:00
RaresKeY d5603ee575 fix(research): migrate active task owners on rename (#3618) 2026-06-11 01:17:02 +02:00
Mazen Tamer Salah 9c00da6d1c fix(hwfit): tolerate non-numeric gpu_count in /api/hwfit/models (#3639)
* fix(hwfit): tolerate non-numeric gpu_count in /api/hwfit/models

The route did `n = int(gpu_count)` with no guard, so a non-numeric query param
like `?gpu_count=abc` raised ValueError and returned HTTP 500. Parse it
defensively (mirroring the gpu_group guard a few lines above): a malformed value
is ignored, exactly like omitting the param, and valid values still apply.

Adds tests/test_hwfit_gpu_count_nonnumeric.py: a non-numeric gpu_count returns a
ranking instead of raising, and a numeric value is still accepted.

* test(hwfit): cover non-numeric manual_gpu_count too

Follow-up to the gpu_count guard: add a regression test for the sibling
manual_gpu_count query param (the hardware simulator in _apply_manual_hardware),
which dev already guards by defaulting to 1 on a non-numeric value. This pins
that behaviour so the endpoint's count parsing is fully covered and cannot
regress to a 500.
2026-06-11 01:01:58 +02:00
RaresKeY d1a5a7d680 fix(hwfit): validate remote SSH detection targets (#3718) 2026-06-11 00:43:49 +02:00
Mazen Tamer Salah 218b9ecbc8 fix(startup): ping real endpoints in warmup/keepalive (#3641)
_warmup_endpoints called model_discovery.get_endpoints(), which does not exist
on ModelDiscovery. It raised AttributeError on every startup and on every 60s
keepalive tick, was swallowed by the outer except, and pinged nothing, so the
cold-start prevention the loop exists for never ran.

Add ModelDiscovery.warmup_ping_urls(), which resolves the /models probe URLs
from the real discover_models() output, and call it from the warmup loop via
asyncio.to_thread (discovery does a blocking port scan, so keep it off the event
loop).

Adds tests/test_warmup_ping_urls.py: resolves /models URLs from discovered
items, honors the limit, degrades to [] on discovery failure, and documents that
get_endpoints never existed.
2026-06-10 19:21:45 +02:00
Srinesh R d9a4b99046 fix: handle batch events format in manage_calendar tool (#3503)
* fix: handle batch events format in manage_calendar tool

Models like deepseek-v4-flash emit batch events array instead of individual create_event calls. The tool defaulted to list_events (no action key), so events were never created despite the model confirming success.

- Add batch normalization in do_manage_calendar

- Map start/end objects to flat dtstart/dtend strings

- Add tests for both object and flat string formats

* fix: surface partial batch failures in manage_calendar

Partial failures were silently dropped - batches with mixed success/failure would report only created count with no error visibility.

- Return non-zero exit code for any failures

- Surface both created and failed counts in response

- Include first error message for debugging

- Add test for partial failure case

* chore: strip trailing whitespace in batch normalization block

* chore: strip whitespace-only blank lines in batch events test
2026-06-10 19:13:08 +02:00
Mazen Tamer Salah f5b91f1e9e fix(tasks): read Memory.text in classify_events personal context (#3640)
The classify_events task pulled user memories to give the LLM personal context,
but read `m.content`, which the Memory ORM does not have (the column is `text`).
That raised AttributeError on the first row; the surrounding except swallowed it
and logged at debug, so the personal-context block was silently always empty and
events were classified without it.

Extract the rendering into `_memory_context_lines` (reads `text`, robust via
getattr, keeps the 200-char and 40-line caps) and raise the swallowed-exception
log to warning so a future schema mismatch is visible.

Adds tests/test_classify_events_memory_text.py for the field, truncation, blank
skipping, missing-attr robustness, and the line cap.
2026-06-10 19:03:45 +02:00
Max Hsu 8bf8212846 fix(chat): copy only the displayed reply from the message copy buttons (#3731)
The AI-message copy buttons copied dataset.raw, which is the full
accumulated model output — still containing the <think time="...">
reasoning block and any tool-call markup that the renderer strips for
display. Pasting therefore leaked the model's thinking, and the first
heading after </think> lost its markdown formatting because it was
glued to the closing tag.

Add chatRenderer.copyMessageText(), which mirrors the display pipeline
(stripToolBlocks then extractThinkingBlocks) and falls back to the raw
text when stripping leaves nothing (thinking-only turns), and route
both copy handlers — the message footer and the slash-reply footer —
through it. The interrupted-turn Continue flow intentionally keeps
reading dataset.raw.

Fixes #3722

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:29:22 +02:00
ThomasAngel a0b0420e6f chore: Switch duckduckgo-search to ddgs (#3143)
* Switch to ddgs

duckduckgo_search was deprecated, this is the recommended replacement

* Update test_service_search_provider_guards.py

According to review comment
2026-06-10 17:59:47 +02:00
Mazen Tamer Salah 96975f8dd9 fix(contacts): tolerate non-string body in /api/contacts/import (#3638)
import_vcf built `text = data.get("vcf") or data.get("text") or ""`, so a
non-string JSON value (a number, list, etc.) stayed in place and the following
`text.strip()` raised AttributeError, returning HTTP 500. Coerce vcf/text/csv
with str() so non-string input degrades to the existing structured "no data"
response, matching the file's convention elsewhere.

Adds tests/test_contacts_import_nonstring.py covering non-string vcf, non-string
csv, and an empty body.
2026-06-10 17:50:22 +02:00
Mazen Tamer Salah 4e210d3337 fix(research): stop rescanning the research dir on every status poll (#3637)
get_status() called get_avg_duration() unconditionally, and that helper globs
and JSON-parses every file under the research data dir. The SSE status stream
polls get_status() roughly once a second, so with a few saved reports each poll
re-read and re-parsed all of them, including for sessions that are not active
(the disk branch never even used the value).

Compute avg_duration only for active sessions and memoize it on the task entry,
so a long stream computes it once instead of on every poll. Behaviour is
unchanged: active streams still report avg_duration.

Adds tests/test_research_status_avg_duration.py: an inactive session does no
avg scan, and an active session computes it once across many polls.
2026-06-10 17:40:44 +02:00
RaresKeY 800d391234 fix(auth): roll back rename on owner migration failure (#3616) 2026-06-10 17:28:27 +02:00
Ashvin 9c8df89973 fix(auth): case-insensitive skill owner match on rename (#3614)
SKILL.md files written with mixed-case owner (e.g. 'owner: Alice') were
skipped because the regex had no IGNORECASE flag. _usage.json keys like
'Alice::skill-name' were missed by the startswith prefix check for the
same reason.

Both comparisons now match the same way the deep_research and memory
blocks do — case-insensitively against old_username.

Fixes #3611
2026-06-10 17:20:36 +02:00
Ashvin 6f73c8afaa fix(sessions): use owner_filter for list_sessions queries when auth disabled (#3622)
Direct DbSession.owner == user becomes WHERE owner IS NULL when user is None
(auth disabled), hiding all sessions that carry an explicit owner. Same flaw
on the Document and GalleryImage sub-queries (active-doc and gallery badges).
Replace all three with owner_filter(), which is a no-op when user is falsy.

Fixes #3620
2026-06-10 17:07:07 +02:00
Shashwat Deep e384c5a2a6 fix(db): close sqlite migration connections on exception paths (#3600)
The _migrate_* startup helpers in core/database.py opened a raw
sqlite3.connect() inside a try and called conn.close() as the last
statement in that try. If any earlier statement raised (locked DB,
unexpected schema, a failed ALTER), close() was skipped and the bare
except only logged the error — leaking the connection (file handle +
lock) for the lifetime of the process. These migrations run on every
startup.

Wrap each in the conn = None + try/except/finally pattern already used
by _migrate_chat_messages_fts in this same file, so the connection is
closed on all exit paths. 25 functions; no change on the success path.
Helpers that already close safely are left untouched: _migrate_chat_messages_fts
and _migrate_backfill_task_folders (the latter uses SQLAlchemy's
engine.connect() context manager).

Same bug class as the previously merged DB-connection-leak fix (#64)
and the IMAP logout-on-all-paths fix (#1530).
2026-06-10 17:03:01 +02:00
Maruf Hasan edce608008 fix(ui): raw SVG markup displayed instead of search icon for web_search tool label (#3601)
* fix(ui): escaped SVG renders as raw markup during web_search tool label

The _toolLabels['web_search'] entry embedded an SVG HTML string
concatenated with label text. At render time the entire value was
passed through esc(), HTML-escaping <svg> tags so the icon
displayed as raw text instead of rendering visually.

Fix: separate icon from label text via a _toolIcons map. The SVG
is injected as raw innerHTML (unescaped) in .agent-thread-icon,
while the label text remains safely escaped.

* test: add behavioral test for web_search tool icon rendering

Co-authored-by: TheDragonTail <jakeoldfield2@gmail.com>

---------

Co-authored-by: TheDragonTail <jakeoldfield2@gmail.com>
2026-06-10 16:50:43 +02:00
RaresKeY ee6cfbd25a fix(auth): drop reserved usernames loaded from auth config (#3727) 2026-06-10 16:31:26 +02:00
RaresKeY cd3fb4e96b fix(auth): fail closed when deleting user tokens fails (#3733) 2026-06-10 16:24:27 +02:00
SurprisedDuck e115b0155c fix(security): don't grant tool access in the pre-setup window (#3506)
* fix(security): don't grant tool access in the pre-setup window

owner_is_admin_or_single_user() returned True whenever auth was not
configured, which conflated two very different states:

  - intentional single-user mode (operator set AUTH_ENABLED=false), and
  - the pre-setup window (auth enabled, but no admin created yet).

In the second state, blocked_tools_for_owner() returned an empty set, so
server-execution tools (bash/python) and other admin-only tools were
ungated. The auth middleware already 401s /api/ requests pre-setup, but a
caller that bypasses it (trusted loopback / internal-tool path) could reach
those tools before setup completed.

Treat "not configured" as admin only when auth is intentionally disabled
(AUTH_ENABLED=false), mirroring the AUTH_ENABLED parsing in app.py and
core.middleware. Single-user mode is preserved; the pre-setup window is now
non-admin as defense-in-depth.

Adds regression tests for both states.

Fixes #3201

Supported by Claude Opus 4.8

* refactor(security): reuse _auth_disabled() instead of a duplicate helper

Addresses review on #3506: src/auth_helpers.py already has _auth_disabled()
with the identical AUTH_ENABLED parse. Drop the duplicate
_auth_intentionally_disabled() and call the existing helper via a lazy import
inside owner_is_admin_or_single_user (mirroring the lazy core.auth import) to
avoid any import cycle. Removes the now-unused `import os`. Behaviour and the
two regression tests are unchanged.

Supported by Claude Opus 4.8

---------

Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>
2026-06-10 14:37:26 +02:00
broken💎shaders 59fc6604be Merge branch 'dev' into fix/no-scroll-snapping 2026-06-10 19:58:30 +08:00
ooovenenoso 725d174243 fix(research): track analyzed URLs separately (#3125)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-10 12:08:22 +01:00
Yeoh Ing Ji 3e49658204 refactor(tools): extract document tools to handle registry (#3666)
* feat(tools): add document management tool handlers to the agent_tools module

* feat(tools): extraced document tools for create, update, edit, suggest, and manage from tool_implementations.py

* feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools

* refactor(tools): add document tool dispatcher and updated tool calling path

* refactor(tools): remove duplicated document management functions

* refactor(tools): removing unused functions and adding new import paths

* refactor(tools): update document tool execute methods to use context dictionary

* refactor(tests): update import paths for document tools in test files

* refactor(tests): update owner parameter format in document management tests

* refactor(tests): update import path for _owned_document_query

* feat(tools): add document management tool handlers to the agent_tools module

* feat(tools): extraced document tools for create, update, edit, suggest, and manage from tool_implementations.py

* feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools

* refactor(tools): add document tool dispatcher and updated tool calling path

* refactor(tools): remove duplicated document management functions

* refactor(tools): removing unused functions and adding new import paths

* refactor(tools): update document tool execute methods to use context dictionary

* refactor(tests): update import paths for document tools in test files

* refactor(tests): update owner parameter format in document management tests

* refactor(tests): update import path for _owned_document_query

* refactor: update import paths for document tools

* fix(tests): correct source path for document ID test
2026-06-10 10:41:52 +02:00
Alexandre Teixeira fc8e6366dd test: mark first slow tests from duration evidence (#3711) 2026-06-10 01:07:38 +02:00
Lucas Daniel 55ff22c6d5 fix(chat): stabilize system prompt, sequence memory extraction, and send stable session id to preserve KV cache (#3360)
* fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache

Fixes #2927. As diagnosed in the issue, three things in Odysseus's request
pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache
continuity, forcing a full prompt re-evaluation (15-30s+) on every turn:

1. Dynamic content folded into the system prompt every turn. Both the chat
   preface (ChatProcessor.build_context_preface) and the agent system prompt
   (_build_system_prompt) injected current_datetime_prompt() — text that
   changes every minute — directly into system-role messages, which llm_core
   then concatenates into the single system message sent as the cached
   prefix. Any byte difference there invalidates the entire cache. Moved this
   to a new current_datetime_context_message() helper that returns a
   standalone user-role message, inserted near the end of the array (right
   before the latest user turn) instead of mixed into the system prompt. The
   static system prefix (preset prompt + safety policy + agent base prompt)
   now stays byte-identical across turns of the same session.

2. Memory/skill extraction side-requests competed with the main completion.
   run_post_response_tasks fired extract_and_store / maybe_extract_skill via
   asyncio.create_task — fire-and-forget coroutines that could overlap the
   next turn's main request and steal llama.cpp's limited processing slots,
   evicting the cached checkpoint. They're now queued through a new
   _run_extraction_jobs_sequentially helper that waits for the session's
   stream to go idle and runs the jobs strictly one at a time.

3. No stable session identifier was sent to local backends, so llama.cpp
   assigned a new processing slot via LRU every turn ("session_id=<empty>
   server-selected (LCP/LRU)"), losing slot affinity. Added
   _apply_local_cache_affinity() in llm_core, which sets session_id and
   cache_prompt: true on outgoing payloads — gated to self-hosted
   OpenAI-compatible endpoints only (never api.openai.com or other cloud
   providers, which reject unrecognized request fields with a 400). Threaded
   session_id through stream_llm / llm_call_async / stream_agent_loop from
   the existing Odysseus session id.

Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload-
assembly and scheduling code paths: byte-identical system prefix across two
turns of the same session (with a regression check that genuinely changed
instructions DO still change it), the dynamic time block landing as a
user-role message, extraction jobs waiting for the stream to go idle and
running sequentially, and the outgoing payload carrying a stable session_id
(same across turns of one session, different across sessions) only for
self-hosted endpoints. Updated tests/test_user_time.py for the new message
placement.

* fix(tests): accept owner= kwarg in normalize_model_id monkeypatch

The upstream normalize_model_id signature now takes an owner= keyword
argument, and chat_helpers.py passes owner=getattr(sess, "owner", None)
at the call site. Update the test stub lambda to **kwargs so it handles
the new argument without breaking, and update chat_helpers.py to forward
the owner parameter consistently.

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 22:46:54 +01:00
Lucas Daniel d273085744 fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut (#3540)
* fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut

* fix(integrations): avoid mutating response dict in-place on truncation

* fix(integrations): truncate dict responses and bound list sentinel overhead

- Dict path now walks keys in insertion order, adding them one at a time
  while checking that the accumulated dict + _truncated marker fits within
  the 12 000-char limit. Previously the marker was appended without removing
  any content, so large dicts were not actually truncated.
- List path now subtracts the sentinel's serialised size (+ element-separator
  padding) from the budget before binary-searching, so the final array
  including the sentinel stays at or under the limit.
- Add regression tests: large-dict actually-truncated, small-dict pass-through,
  and list-with-sentinel respects the size bound.

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 22:34:08 +01:00
Kenny Van de Maele 8753daf357 chore: backport main-only changes to dev AGPL relicense + Cookbook serve fix (#3704)
* Change project license to AGPL-3.0-or-later

* Fix Cookbook serve server selection

---------

Co-authored-by: pewdiepie-archdaemon <pewdiepie-archdaemon@users.noreply.github.com>
2026-06-09 23:20:34 +02:00
Michael 2e6fff2212 fix: preserve reasoning_content in sanitized messages for Moonshot/Kimi (#3152)
Providers like Moonshot (Kimi K2.5/K2.6) require the reasoning_content
field to be present on assistant tool-call messages in multi-turn
conversations.  The sanitizer's allow-list was missing this field,
causing HTTP 400: 'thinking is enabled but reasoning_content is missing
in assistant tool call message at index N'.

Add reasoning_content to the allowed field set in
_sanitize_llm_messages and cover with regression tests.

Fixes #3118

Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 21:44:38 +01:00
TimHoogervorst 8878443426 fix(calanders): Removed/merged duplicate calender delete endpoints (#3682)
* merged two delete_calander functions performing the same thing

* added proper 404 raise when nothing is found

* removed 404 HTTPException and jus reverted it back to raise
2026-06-09 22:35:55 +02:00
Alexandre Teixeira a22c0fa85e test: pilot core database stub helper (#3685) 2026-06-09 22:23:33 +02:00
TimHoogervorst b1af29c7bc fix(chat): add aria-label and title attributes to dismiss button for accessibility (#3693) 2026-06-09 22:15:40 +02:00
OdWar420 2fae3b5f64 perf(http): gzip-compress text responses (#3690)
The frontend's text assets shipped uncompressed on every cold load. Add
Starlette's GZipMiddleware. Measured on the current assets:

- style.css   1,127 KB -> 238 KB  (-79%)
- index.html    202 KB ->  35 KB  (-83%)
- chat.js       238 KB ->  60 KB  (-75%)

minimum_size=1024 skips tiny bodies; Starlette excludes `text/event-stream` by
default, so the SSE streams (chat, shell, research, model-probe — all served with
media_type="text/event-stream") are never compressed or buffered. Composes
cleanly with the existing security-header middleware. No behavioural change.

Built by OdWar -- with Claude thinking alongside.
2026-06-09 22:12:24 +02:00
arnodecorte 38dc9a0a41 Allow cookbook scopes for API tokens (#3090)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 21:03:40 +01:00
Rohith Matam fbd8ee9033 fix: fall back for npx cache subprocess check (#3560)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 20:41:23 +01:00
Kenny Van de Maele de80b065f2 fix(macos): start ChromaDB in start-macos.sh so tool calling works (#3664)
* fix(macos): start ChromaDB from start-macos.sh so tool calling works

start-macos.sh never started ChromaDB, so the tool index failed to initialize
and tool/MCP injection silently degraded on native macOS installs (no Docker).
Start a local chroma from the venv before launching, mirroring the existing
Apfel background+trap pattern: idempotent (skips if 8100 is already serving),
honors CHROMADB_HOST/CHROMADB_PORT (skips when remote), logs to a file, persists
to data/chroma, and is killed in the exit trap.

Fixes #3297

* fix(macos): bind/probe ChromaDB on IPv4 loopback to match app resolution

Binding to the literal localhost can land on IPv6 ::1 while the app connects to
localhost->127.0.0.1, leaving them unable to reach each other. Pin bind + probe
to 127.0.0.1 (0.0.0.0 still honored).

* style(macos): trim chromadb comments (present-tense, no issue refs)
2026-06-09 19:37:18 +01:00
Rares Tudor 016157019c fix(tools): use _INTERNAL_BASE in serve-session endpoint registration (#3675)
#3322 renamed the loopback base to _INTERNAL_BASE, but a later Cookbook
commit reintroduced one call site using the old _COOKBOOK_BASE name,
raising NameError whenever the agent registers a model endpoint for a
running serve session.

Fixes #3669
2026-06-09 20:31:29 +02:00
RaresKeY 5d33393a28 fix(gallery): fail closed for null-user owner scope (#3613) 2026-06-09 20:20:21 +02:00
Alexandre Teixeira cdfda4bd16 test: add fast lane and duration visibility (#3659) 2026-06-09 20:11:47 +02:00
Sid 9e74a327f8 fix(llm): remove max_output_tokens from ChatGPT Subscription payload (#3656)
ChatGPT's Codex API rejects any request that includes max_output_tokens,
returning HTTP 400 "Unsupported parameter: max_output_tokens". This caused
Deep Research to always fail during the endpoint probe when a ChatGPT
Subscription model was selected.

Remove the conditional that set payload["max_output_tokens"] in
_build_chatgpt_responses_payload(). The parameter is simply not sent.

Also update the two affected tests:
- Rename test_chatgpt_subscription_payload_uses_max_output_tokens →
  test_chatgpt_subscription_payload_omits_max_output_tokens
- Rename test_chatgpt_subscription_payload_omits_empty_max_output_tokens →
  test_chatgpt_subscription_payload_omits_max_output_tokens_when_zero
- Assert max_output_tokens is absent rather than present

Fixes #3650
2026-06-09 17:42:12 +02:00
Ashvin 60d25e0e26 fix(cookbook): use COOKBOOK_STATE_FILE constant for state path (#3623)
The module derived its state file path as Path(os.environ.get("DATA_DIR", "data"))
/ "cookbook_state.json". The correct env var is ODYSSEUS_DATA_DIR, which is
already read by src/constants.py and exported as COOKBOOK_STATE_FILE. When
ODYSSEUS_DATA_DIR is set (Docker, custom installs), the old code read the wrong
env var and silently wrote state to data/cookbook_state.json relative to CWD
while every other file resolved under the custom data directory.

Fixes #3621
2026-06-09 17:39:06 +02:00
RosenTomov c46d37d876 test(tool_execution): stop two tests leaking src.tool_execution into the suite (#2686)
* Make in-venv pip-fallback test independent of the runner's environment

test_pip_install_fallback_chain_propagates_failure_in_venv simulated the in-venv case by probing the real interpreter (sys.prefix != sys.base_prefix). That assumes the test runner is itself inside a venv. CI runs pytest with no venv, so venv_check reported not-in-venv, the negated guard flipped, the --user branch fired, and the assertion failed. Make venv_check exit 0 directly to simulate the in-venv condition deterministically, mirroring the outside-venv companion test.

* Stop agent-tool import shims from leaking into the admin-gate test

test_function_call_non_object_args and test_unknown_tool_calls stub heavy DB/auth deps at import time to load the real agent-tool stack, but they popped src.tool_execution and left core.auth stubbed without restoring. Popping and re-importing src.tool_execution rebinds the src package's tool_execution attribute, so test_edit_file's later 'import src.tool_execution as te' resolved to a different module object than the one execute_tool_block lives in. The monkeypatch on _owner_is_admin then missed, the non-admin edit_file gate never fired, and the edit went through (exit_code 0). Stop touching src.tool_execution and restore the heavy stubs after import. Verified the full suite is green on Linux (Python 3.11, matching CI).

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 16:35:10 +01:00
Alexandre Teixeira d4ab09e8e1 test: add focused test selection runner (#3556) 2026-06-09 17:03:47 +02:00
Sheikh Rahat Mahmud 9180847c0e feat(diagnostics): add consolidated service health endpoint for degraded-state reporting (#964)
* Add consolidated service health endpoint for degraded-state reporting

ROADMAP (High Priority) asks for "Better degraded-state reporting for
ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no
single readout of which subsystems are actually working: /api/health is only a
liveness ping and each subsystem's signal lives in a different module, so a
misconfigured self-host install gives no consolidated picture.

This adds an admin-only GET /api/diagnostics/services endpoint backed by a new
src/service_health.py aggregator. Each subsystem reports a uniform
{name, status, detail, meta} where status is ok | degraded | down | disabled,
and the response rolls up an overall verdict (worst non-disabled status).

Probes are deliberately non-intrusive and safe to poll:
- ChromaDB: reads the .healthy flags on the RAG and memory vector stores.
- SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No
  search query is run.
- ntfy: GET the server's built-in /v1/health. No test notification is sent.
- email: short IMAP connect+logout per configured account (no credentials in
  meta).
- providers: probe each enabled ModelEndpoint's model list (no api_key in meta).

Probe functions take their inputs as parameters and isolate the network call to
injectable callables, so they unit-test without touching the network (same
pattern as the merged provider-endpoint tests). Network probes run concurrently
off the event loop via asyncio.to_thread with bounded per-probe timeouts.

memory_vector is now passed into setup_diagnostics_routes (new optional param,
backward-compatible) so ChromaDB's vector-memory store can be reported too.

Tests: tests/test_service_health.py — 29 tests covering every status mapping
per subsystem, the overall rollup, and that no secrets leak into meta.

Verification:
  python -m pytest tests/test_service_health.py -q          # 29 passed
  python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py
  python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q

Backend + tests only; an Admin/Settings UI badge that renders this endpoint is
a natural follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(diagnostics): bound service-health wall-clock and redact secrets

Addresses review on #964.

Blocker 1 — genuinely bounded wall-clock:
- providers_health and email_health now fan out per-item probes across a
  bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET),
  instead of probing endpoints/accounts sequentially. Stragglers are reported
  as a controlled `timeout` and never block; the pool is shut down with
  wait=False so the response returns on time regardless of endpoint/account
  count.
- The IMAP connect path now honors the service-health budget: _imap_connect
  gained a pass-through `timeout` param and the probe calls it with
  _PROBE_TIMEOUT instead of the default 15s.
- collect_service_health runs the four network subsystems concurrently, each
  under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall
  wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop.

Blocker 2 — no secret/raw-error leakage in the response:
- _safe_url strips userinfo, query, and fragment from every URL surfaced in
  meta (searxng instance, ntfy base, provider name fallback), keeping only
  scheme/host/port/path.
- _classify_error maps every probe failure to a controlled category token
  (timeout, connection_refused, dns_error, tls_error, network_error,
  http_error, auth_or_protocol_error, …) — raw str(exception), which can embed
  credentialed URLs or server text, is never returned.

Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py):
- URL userinfo/query redaction for searxng/ntfy/providers.
- secret-bearing exception strings map to categories and don't leak.
- multiple slow providers/accounts stay bounded (single + 25-endpoint cases).
- subsystems run concurrently; aggregate deadline yields a controlled result.
- route-level unauthenticated (401) / non-admin (403) / admin (200) coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(diagnostics): isolate route tests so they don't leak module globals

The new route tests replaced src.service_health.collect_service_health and
routes.diagnostics_routes.require_admin via direct assignment, which persisted
for the rest of the pytest session. In CI's full alphabetical run that fake
collector (returning services=[]) leaked into the later collect_service_health
tests and failed them. Switch to monkeypatch.setattr so both are restored after
each test. No production code change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 16:00:24 +01:00
Maanas c1674fc2aa refactor(tools): migrate execution logic to src/agent_tools/ package with handler registry (#3435)
* refactor(tools): implement strict cohesive class coordinator pattern per #2917

* test: update edit_file tests to use EditFileTool class

* fix(tools): restore tool_policy param and security backstop in coordinator

* refactor(tools): migrate domain tools to agent_tools package per #2917

* test: update test imports for new agent_tools package

* fix: resolve circular import between tool_execution and agent_tools

* fix: remove leftover git conflict markers

* fix(tools): resolve pytest failure and document _apply method

* fix(tools): clean up whitespace and remove dead _tool_python helper

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 14:35:36 +01:00
Joshua Valderrama 35b4dd2824 fix: session context drifting — messages leaking between chats (#135) (#267)
* docs: add implementation plan for fixing chat context drifting (#135)

* fix: make Session.history immutable + fix {}.history crash

- Session.history now exposes a COPY of the internal _history list
- add_message() replaces history with a fresh copy each time
- get_context_messages() derives from _history directly
- replace_messages() updates both _history and history
- truncate_messages() updates both _history and history
- _persist_message() line 207: fixed {}.history fallback crash
- Added 11 tests for session isolation and edge cases

Addresses #135 root cause #1: shared mutable references

* fix: task scheduler uses SessionManager methods instead of overwriting sessions

- Added ensure_task_session() to SessionManager (checks cache first)
- Task scheduler now uses ensure_task_session() instead of direct dict assignment
- Task scheduler now uses SessionManager.add_message() for message persistence
- Removed direct sess_obj.history.append() that was silently losing data

Addresses #135 root causes #2 and #3

* fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old

Prevents the cleanup task from deleting sessions that were just created
and haven't received any messages yet (message_count == 0).

Addresses #135 root cause #5

* test: comprehensive session isolation tests (10/10 passing)

* refactor: consolidate _session_manager into singleton pattern

- Added set_session_manager_instance / get_session_manager_instance to core/models
- kept backward-compat aliases (set_session_manager, get_session_manager)
- session_manager.py re-exports the singleton functions
- ai_interaction.set_session_manager now syncs with the core singleton
- context_compactor uses get_session_manager_instance() instead of getattr hack
- app.py initializes the singleton once

Addresses #135 root cause #4: fragile global wiring

* test: add concurrent session isolation integration tests

Verifies:
- Concurrent add_message to different sessions doesn't cross-contaminate
- Rapid parallel writes maintain isolation
- Read-write concurrent access is safe

All 3 async tests pass, proving the immutable history fix works under concurrency

* fix: pre-import core.models in conftest to prevent test pollution

test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at
module level during collection. Any test collected after it imports
Session as a MagicMock. Pre-importing core.models in conftest.py
before test_agent_loop.py's module-level code runs prevents this.

* fix: make .history authoritative mutable list, address PR review

Per review feedback: keep .history as the authoritative mutable list so
existing code doing .history.pop(), .history = [...], etc. still works.
Fix the cross-contamination bug by ensuring __post_init__() gives each
Session its OWN unique history list (never shared).

Changes:
- core/models.py: .history IS the authoritative list. _history aliases it.
  Each Session gets its own list in __post_init__.
- core/session_manager.py: add_message() delegates to Session.add_message()
  instead of appending directly — no double-append, single source of truth.
- tests/test_session_manager.py: updated test to reflect that .history
  references see new messages (same list, not a snapshot).
- docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for
  shipping — useful design context but too much process/doc to ship).

All 272 tests pass (3 pre-existing failures unrelated).

* Fix session manager message persistence

* Fix session history alias regressions

* Fix session history aliasing and task delivery
2026-06-09 14:12:52 +01:00
Maruf Hasan c3fcaf15b7 feat(providers): add NVIDIA AI provider endpoint support (#3456)
* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)

* feat: add NVIDIA option to provider settings dropdown and aliases

* test: add NVIDIA provider detection and endpoint tests

* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering

- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
  kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
  -embedqa, -nemoretriever

* Expand non-chat model filtering for NVIDIA embedding/guard/video models

Add _NON_CHAT_PREFIXES: embed, recurrent
Add _NON_CHAT_CONTAINS: topic-control, guard, calibration,
  ai-synthetic-video, cosmos-reason2

Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)

* Filter non-chat models in _probe_endpoint via _is_chat_model()

Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.

* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models

Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.

Adds: embed, bge, recurrent, starcoder, gemma-2b

* fix(model-routes): drop collision-prone substrings from global non-chat filter

The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:

  - gemma-2b  matches google/gemma-2b-it (instruct chat model)
  - starcoder matches bigcode/starcoder2-15b (code completion model)
  - recurrent matches google/recurrentgemma-2b (language model)
  - guard     matches meta-llama/Llama-Guard-3-8B (safety classifier)

Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.

Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS

Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS
entries redundant since the prefix check runs first.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* style: fix indentation of groq and xai test cases in test_provider_endpoints.py

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-09 11:06:12 +02:00
Mazen Tamer Salah 3c4ec8828b fix(embeddings): survive numpy embeddings when restoring a reset lane (#3410)
When a lane reset fails to rewrite the recreated collection, the recovery path
re-adds the preserved rows. It read the embeddings with
`preserved.get("embeddings") or []` and gated the loop with
`if ids and docs and old_embeddings:`. chromadb returns embeddings as a numpy
ndarray, whose truth value is ambiguous, so both expressions raise ValueError
inside the except block — the restore is abandoned and every preserved row is
lost (the collection was already deleted), exactly when the code is trying to
avoid data loss.

Use an explicit `is None` check and `len(...)`, and convert ndarray batches to
lists before re-adding.

Adds tests/test_embedding_lane_ndarray_restore.py (preserved embeddings come
back as np.ndarray); existing test_embedding_lanes.py still passes.
2026-06-09 10:40:17 +02:00
Ashvin 2fdb4813db fix(auth): sync file-backed and in-memory owner caches on user rename (#3397)
The DB owner-rename loop in rename_user patched every SQL column named
owner, but three non-SQL stores were left behind:

1. session_manager.sessions -- in-memory Session objects carry s.owner
   set at server-boot time. get_sessions_for_user() does an exact
   s.owner == username check, so the renamed user chat sidebar goes empty
   until a server restart.

2. data/deep_research/*.json -- each completed research report is a
   standalone JSON file with an owner field. research_routes filters
   by d.get(owner) == user, making every report invisible to the
   renamed user.

3. data/memory.json -- a flat JSON array; each entry carries an owner
   field. memory_manager.load(owner=user) filters on it, so all memories
   vanish from the memory panel.

Fix: after the SQL loop, patch all three:
- iterate sm.sessions and update owner in-place (exposed via app.state)
- walk data/deep_research/*.json and rewrite owner with atomic_write_json
- update matching entries in memory.json with atomic_write_json

All three use the same case-insensitive lower() comparison the SQL loop
already uses. Each step is independently wrapped so a single failure
does not abort the others or the rename itself.

Fixes #3362
2026-06-09 10:19:45 +02:00
nubs f1cda91683 fix(agent): scope skill index to owner (#2404)
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-09 09:51:29 +02:00