Commit Graph

1069 Commits

Author SHA1 Message Date
Maanas c1674fc2aa refactor(tools): migrate execution logic to src/agent_tools/ package with handler registry (#3435)
* refactor(tools): implement strict cohesive class coordinator pattern per #2917

* test: update edit_file tests to use EditFileTool class

* fix(tools): restore tool_policy param and security backstop in coordinator

* refactor(tools): migrate domain tools to agent_tools package per #2917

* test: update test imports for new agent_tools package

* fix: resolve circular import between tool_execution and agent_tools

* fix: remove leftover git conflict markers

* fix(tools): resolve pytest failure and document _apply method

* fix(tools): clean up whitespace and remove dead _tool_python helper

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 14:35:36 +01:00
Joshua Valderrama 35b4dd2824 fix: session context drifting — messages leaking between chats (#135) (#267)
* docs: add implementation plan for fixing chat context drifting (#135)

* fix: make Session.history immutable + fix {}.history crash

- Session.history now exposes a COPY of the internal _history list
- add_message() replaces history with a fresh copy each time
- get_context_messages() derives from _history directly
- replace_messages() updates both _history and history
- truncate_messages() updates both _history and history
- _persist_message() line 207: fixed {}.history fallback crash
- Added 11 tests for session isolation and edge cases

Addresses #135 root cause #1: shared mutable references

* fix: task scheduler uses SessionManager methods instead of overwriting sessions

- Added ensure_task_session() to SessionManager (checks cache first)
- Task scheduler now uses ensure_task_session() instead of direct dict assignment
- Task scheduler now uses SessionManager.add_message() for message persistence
- Removed direct sess_obj.history.append() that was silently losing data

Addresses #135 root causes #2 and #3

* fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old

Prevents the cleanup task from deleting sessions that were just created
and haven't received any messages yet (message_count == 0).

Addresses #135 root cause #5

* test: comprehensive session isolation tests (10/10 passing)

* refactor: consolidate _session_manager into singleton pattern

- Added set_session_manager_instance / get_session_manager_instance to core/models
- kept backward-compat aliases (set_session_manager, get_session_manager)
- session_manager.py re-exports the singleton functions
- ai_interaction.set_session_manager now syncs with the core singleton
- context_compactor uses get_session_manager_instance() instead of getattr hack
- app.py initializes the singleton once

Addresses #135 root cause #4: fragile global wiring

* test: add concurrent session isolation integration tests

Verifies:
- Concurrent add_message to different sessions doesn't cross-contaminate
- Rapid parallel writes maintain isolation
- Read-write concurrent access is safe

All 3 async tests pass, proving the immutable history fix works under concurrency

* fix: pre-import core.models in conftest to prevent test pollution

test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at
module level during collection. Any test collected after it imports
Session as a MagicMock. Pre-importing core.models in conftest.py
before test_agent_loop.py's module-level code runs prevents this.

* fix: make .history authoritative mutable list, address PR review

Per review feedback: keep .history as the authoritative mutable list so
existing code doing .history.pop(), .history = [...], etc. still works.
Fix the cross-contamination bug by ensuring __post_init__() gives each
Session its OWN unique history list (never shared).

Changes:
- core/models.py: .history IS the authoritative list. _history aliases it.
  Each Session gets its own list in __post_init__.
- core/session_manager.py: add_message() delegates to Session.add_message()
  instead of appending directly — no double-append, single source of truth.
- tests/test_session_manager.py: updated test to reflect that .history
  references see new messages (same list, not a snapshot).
- docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for
  shipping — useful design context but too much process/doc to ship).

All 272 tests pass (3 pre-existing failures unrelated).

* Fix session manager message persistence

* Fix session history alias regressions

* Fix session history aliasing and task delivery
2026-06-09 14:12:52 +01:00
Maruf Hasan c3fcaf15b7 feat(providers): add NVIDIA AI provider endpoint support (#3456)
* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)

* feat: add NVIDIA option to provider settings dropdown and aliases

* test: add NVIDIA provider detection and endpoint tests

* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering

- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
  kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
  -embedqa, -nemoretriever

* Expand non-chat model filtering for NVIDIA embedding/guard/video models

Add _NON_CHAT_PREFIXES: embed, recurrent
Add _NON_CHAT_CONTAINS: topic-control, guard, calibration,
  ai-synthetic-video, cosmos-reason2

Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)

* Filter non-chat models in _probe_endpoint via _is_chat_model()

Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.

* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models

Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.

Adds: embed, bge, recurrent, starcoder, gemma-2b

* fix(model-routes): drop collision-prone substrings from global non-chat filter

The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:

  - gemma-2b  matches google/gemma-2b-it (instruct chat model)
  - starcoder matches bigcode/starcoder2-15b (code completion model)
  - recurrent matches google/recurrentgemma-2b (language model)
  - guard     matches meta-llama/Llama-Guard-3-8B (safety classifier)

Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.

Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS

Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS
entries redundant since the prefix check runs first.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* style: fix indentation of groq and xai test cases in test_provider_endpoints.py

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-09 11:06:12 +02:00
Mazen Tamer Salah 3c4ec8828b fix(embeddings): survive numpy embeddings when restoring a reset lane (#3410)
When a lane reset fails to rewrite the recreated collection, the recovery path
re-adds the preserved rows. It read the embeddings with
`preserved.get("embeddings") or []` and gated the loop with
`if ids and docs and old_embeddings:`. chromadb returns embeddings as a numpy
ndarray, whose truth value is ambiguous, so both expressions raise ValueError
inside the except block — the restore is abandoned and every preserved row is
lost (the collection was already deleted), exactly when the code is trying to
avoid data loss.

Use an explicit `is None` check and `len(...)`, and convert ndarray batches to
lists before re-adding.

Adds tests/test_embedding_lane_ndarray_restore.py (preserved embeddings come
back as np.ndarray); existing test_embedding_lanes.py still passes.
2026-06-09 10:40:17 +02:00
Ashvin 2fdb4813db fix(auth): sync file-backed and in-memory owner caches on user rename (#3397)
The DB owner-rename loop in rename_user patched every SQL column named
owner, but three non-SQL stores were left behind:

1. session_manager.sessions -- in-memory Session objects carry s.owner
   set at server-boot time. get_sessions_for_user() does an exact
   s.owner == username check, so the renamed user chat sidebar goes empty
   until a server restart.

2. data/deep_research/*.json -- each completed research report is a
   standalone JSON file with an owner field. research_routes filters
   by d.get(owner) == user, making every report invisible to the
   renamed user.

3. data/memory.json -- a flat JSON array; each entry carries an owner
   field. memory_manager.load(owner=user) filters on it, so all memories
   vanish from the memory panel.

Fix: after the SQL loop, patch all three:
- iterate sm.sessions and update owner in-place (exposed via app.state)
- walk data/deep_research/*.json and rewrite owner with atomic_write_json
- update matching entries in memory.json with atomic_write_json

All three use the same case-insensitive lower() comparison the SQL loop
already uses. Each step is independently wrapped so a single failure
does not abort the others or the rename itself.

Fixes #3362
2026-06-09 10:19:45 +02:00
nubs f1cda91683 fix(agent): scope skill index to owner (#2404)
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-09 09:51:29 +02:00
Kenny Van de Maele 0aba00f4cf refactor(tools): remove dead workspace-confinement plumbing (#3590)
Commit e6b1009 removed the workspace feature's entry point (deleted
routes/workspace_routes.py + static/js/workspace.js and dropped the
workspace-param parsing in chat_routes), but left the downstream backend
plumbing dangling: chat_routes passed a hardcoded workspace=None into
stream_agent_loop, which forwarded it to execute_tool_block, so the
workspace value was permanently None and every workspace-gated branch
was unreachable.

Remove the now-dead code (no behavior change, since workspace was always
None):
- src/tool_execution.py: drop _resolve_tool_path_in_workspace and the
  workspace params/branches on execute_tool_block, _direct_fallback,
  _call_mcp_tool, _do_edit_file, and _resolve_search_root; restore the
  bash/python/bg cwd to _AGENT_WORKDIR.
- src/agent_loop.py: drop the workspace param on stream_agent_loop, the
  dead 'ACTIVE WORKSPACE' system-prompt block, and the workspace forward.
- routes/chat_routes.py: drop the hardcoded workspace=None arg and var.
- tests: delete test_workspace_confine.py (tested the removed feature) and
  the workspace assertion in test_tool_policy.py.

Full suite: 2903 passed, 1 skipped.
2026-06-09 08:30:50 +02:00
Afonso Coutinho fbed9027b0 fix: backup import dropping a user's skill on cross-tenant title/id collision (#2057)
* Fix backup import dropping a user's skill on cross-tenant title/id collision

The skills block of import_data deduped incoming skills against
skills_manager.load_all(), which returns EVERY tenant's skills. So when
a user imports their own backup, any skill whose id or title collides
with another user's skill was silently skipped — the importing user
lost their own data. This is the same cross-tenant bug already fixed
for the memories block just above (#1743); the skills block was left
with the old pattern. Filter the dedup sets to the importing user's own
skills (owner == user); the full store is still saved back, preserving
other users' skills.

* Restore sys.modules after stubbing so backup test does not break collection of later src.* test modules

* Patch backup_routes auth helpers via monkeypatch instead of sys.modules stubs so the test is import-order robust

* Give FakeSkillsManager an add_skill method matching the disk-backed skills API
2026-06-09 08:04:22 +02:00
Disorder AA d9141c6e56 fix(cookbook): allow spaces and non-ASCII characters in model directory paths (#3473)
* fix(cookbook): allow spaces in model directory paths

Allow POSIX external-drive paths and Windows drive paths with spaces while keeping shell metacharacters rejected.

* fix(cookbook): also allow non-ASCII (Unicode) characters in model dir paths

The ASCII-only allowlist that rejected spaces also rejected Cyrillic,
accented Latin and CJK folder names (e.g. /Volumes/Модели,
D:\AI Models\Модели) with 400 Invalid local_dir. Switch the path
character class from [A-Za-z0-9._ -] to [\w. -] (\w is Unicode-aware on
Python 3 str patterns) so localized folder names validate, while shell
metacharacters (; & | ` $ quotes newlines) stay rejected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(cookbook): reject local_dir path segments starting with '-'

The local_dir allowlist includes '-', so a directory like /models/-rf
(or D:\models\-rf) could be parsed as a CLI flag by hf/etc. (option
injection) — and quoting does not stop a value from being read as an
option. Guard against it inside the validator so the safety stays fully
self-contained there rather than depending on consumers' quoting.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 07:58:38 +02:00
onemorethan0 8ae2b5f58c fix(llm): suppress thinking mode for qwen3/gemma4 on Ollama /v1 endpoint (#3228)
* fix(llm): suppress thinking for qwen3/gemma4 on Ollama /v1 compat endpoint

When using qwen3, QwQ, gemma4, or other thinking models via Ollama's
OpenAI-compatible /v1 endpoint, the model routes all output into its
<think>...</think> reasoning block. Since Odysseus strips thinking
content from round_response and only accumulates native tool_calls,
this produces a round with 0 chars, 0 native calls, 0 tool blocks —
the agent appears to silently do nothing.

Root cause: Odysseus classifies the /v1 endpoint as provider="openai"
(not "ollama"), so the payload is built as a standard OpenAI payload
without any Ollama-specific options. Ollama's /v1 endpoint accepts
"think": false as a top-level parameter to suppress extended thinking,
but this was never sent.

Fix:
- Add _is_ollama_openai_compat_url() to detect local Ollama /v1 URLs
- Inject "think": false in both stream_llm and llm_call_async for
  thinking models (qwen3, QwQ, gemma4, DeepSeek-R1, etc.) on this
  endpoint

Verified with qwen3:14b on Ollama 0.24: with think=False the model
correctly emits native tool_calls in a single streaming chunk and
the agent executes bash/file/web tools as expected.

* fix(llm): extend _is_ollama_openai_compat_url to match localhost on any port

Per reviewer feedback on PR #3228:

1. Generalize host detection to mirror _is_ollama_native_url: match any
   localhost/127.0.0.1/0.0.0.0/::1 host (not just port 11434) so that
   custom OLLAMA_HOST ports and container remaps are also covered.

2. Add tests/test_llm_core_ollama_thinking.py covering:
   - _is_ollama_openai_compat_url for all positive/negative URL cases
     including IPv6, non-default port, native /api path, and real OpenAI
   - Payload injection: think:false set for Ollama /v1 thinking model,
     not set for non-thinking model, not set for real OpenAI endpoint,
     and set for localhost on a non-default port (the new case)
2026-06-09 07:35:15 +02:00
pewdiepie-archdaemon 637a34515d Merge remote-tracking branch 'origin/main' into dev 2026-06-09 10:41:48 +09:00
pewdiepie-archdaemon d397b3db2f Restore dropped regression fixes 2026-06-09 10:31:43 +09:00
pewdiepie-archdaemon 1a529d63d9 Fix remaining CI regressions 2026-06-09 10:21:56 +09:00
Boody f605bb3864 fix: Enforce dynamic custom search result limits in backend (#2359)
* fixed confusing credentials prompt

* fix(setup): return status from create_default_admin function

* fix(setup): initialize admin creation status in main function

* fix(setup): enhance admin creation feedback and status handling

* Enhance admin user login messages with conditional feedback based on creation status

* Refine admin user creation feedback messages for clarity and actionability and formatted code

* Add fallback error message for admin creation failure in setup script

* Add run script for Uvicorn with dotenv integration

* Refactor server runner to use argparse for host and port configuration

* Remove captured output print statement from server runner

* Fix server runner to ensure cross-platform compatibility and improve log handling

* removed run.py to match original repo

* Fixing custom search not working properly

* Refactor search settings event listeners for improved functionality and clarity

* Update search function signatures to use Optional for count parameter

* revert changes

* fixed broken merge issue

* Delete services/chat_data_scraper.py

added by mistake

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 02:20:59 +01:00
pewdiepie-archdaemon 37c573d865 Fix model endpoint route test regressions 2026-06-09 10:16:38 +09:00
pewdiepie-archdaemon 6f29b287f6 Remove stale plan slash toggle 2026-06-09 09:54:46 +09:00
pewdiepie-archdaemon 4715a5505d Fix duplicate cookbook server helper export 2026-06-09 09:53:41 +09:00
pewdiepie-archdaemon 84ca74f04b Restore cookbook server key exports 2026-06-09 09:51:53 +09:00
pewdiepie-archdaemon e6b1009b89 Remove non-merge-ready workspace and terminal agent hooks 2026-06-09 09:48:59 +09:00
pewdiepie-archdaemon fa8c93ec0a Cookbook UI: Ollama browser, advanced serve fold, API tokens form, diagnosis toolbar, polish
Surface a lot of accumulated cookbook + UI work as a single non-agent
commit so the agent rework lands cleanly.

Highlights:
- Ollama as a first-class backend in the Cookbook:
  * Download input accepts ollama-style names (name:tag) → backend=ollama
  * /api/cookbook/ollama/library (cached scrape of ollama.com + curated
    fallback so classic models like qwen2.5 stay reachable)
  * "Browse Ollama library" toggle below Download with size chips
  * Engine=Ollama in hwfit toolbar merges the Ollama library into the
    main scan list as per-tag rows with the same Fit/Param/Quant/VRAM
    columns; click → fills Download input
- API Tokens form added to Integrations panel (matching wired
  loadTokens()/initTokenForm() that had no HTML)
- Serve panel polish: Advanced fold tightening (-8px nudges on vLLM
  checks, Extra args, Spec row), n_cpu_moe + Split Mode controls
  pulled up 8px to align with the row's checkboxes, GGUF File dropdown
  exposed for Ollama backend, GPU re-render on Edit serve restore,
  _forceBackend flag so saved serveState wins over backend detection,
  cookbook:servers-changed CustomEvent so panels don't need refresh
- Models page redesign: Add Models row (URL + hidden API key reveal +
  Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All +
  Clear-offline buttons in Added Models toolbar, offline-pill removed
  (opacity already conveys state), Engine dropdown gains Ollama option
- _ping_endpoint probes /v1/models then base, accepts 4xx as
  reachable (vLLM returns 404 on bare /v1, fully working endpoints
  were showing offline)
- Diagnosis card: × dismiss + Copy bundle buttons restored on the
  serve error feedback card
- Orphan tmux sweep re-enabled behind a 60s rate-limit + background
  Thread (off the main event loop) so dead serves get discovered
- cookbook_routes auto-register watchdog: drops the endpoint if the
  serve session exits non-zero within the first ~3min
- ollama-rocm sidecar awareness in download wrapper (`docker exec
  ollama-rocm ollama pull` when host ollama isn't installed)
- Skill extractor sets initial_status="published" when
  auto_approve_skills pref is on (audit demotes later)
- Skill list / model list / cookbook scan misc polish
2026-06-09 09:46:19 +09:00
pewdiepie-archdaemon 646f8bd2a9 Remove remaining plan mode frontend code 2026-06-09 09:44:22 +09:00
pewdiepie-archdaemon 2a2a93d845 Remove plan mode from merge-ready UI 2026-06-09 09:40:20 +09:00
pewdiepie-archdaemon 06a04efc59 Merge branch 'dev'
# Conflicts:
#	routes/task_routes.py
#	src/caldav_sync.py
2026-06-09 09:36:01 +09:00
pewdiepie-archdaemon 3b01760e95 Prepare tested main sync cleanup 2026-06-09 09:34:42 +09:00
Ocean Bennett db1bbfe588 fix(sessions): keep fresh chats during auto tidy (#1871)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 01:06:20 +01:00
Kenny Van de Maele 2404b00f18 refactor(uploads): centralize upload byte-limits in upload_limits.py (#3364) (#3518)
Move every per-route upload byte-limit into src/upload_limits.py as a
validated, env-overridable constant via read_byte_limit_env:

- Add GALLERY_UPLOAD_MAX_BYTES, GALLERY_TRANSFORM_UPLOAD_MAX_BYTES,
  MEMORY_IMPORT_MAX_BYTES, PERSONAL_UPLOAD_MAX_BYTES,
  EMAIL_COMPOSE_UPLOAD_MAX_BYTES, STT_MAX_AUDIO_BYTES, ICS_MAX_BYTES.
- Routes import their constant instead of defining it locally: replaces 4
  raw int(os.getenv(...)) and removes 3 hardcoded literals.
- The 3 previously-hardcoded limits (email compose, STT audio, calendar
  ICS) are now env-overridable with the same ODYSSEUS_*_MAX_BYTES naming.
- Defaults unchanged, so behavior is unchanged unless an env var is set;
  an invalid value now fails fast with a clear message instead of a bare
  int() ValueError.
- Document all env vars in .env.example and the README.

Fixes #3364
2026-06-09 01:24:30 +02:00
Alexandre Teixeira a240f28af9 test(taxonomy): auto-mark tests by area and sub-area (#3491) 2026-06-09 01:13:28 +02:00
Ocean Bennett e7c1d75884 fix(models): query v1 models for llama-server endpoints (#3380)
* fix(models): query v1 models for llama-server endpoints

* test(models): accept owner kwargs in llama-server regression
2026-06-09 01:09:02 +02:00
Mateus Oliveira f7ae85590b refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils (#3478)
* refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils

Move all copies of _truncate(), get_mcp_manager(), and set_mcp_manager()
into a single leaf module (src/tool_utils.py) that imports only from
src.constants. This eliminates the lazy-import hack
('from src import agent_tools' inside function bodies) in tool_execution.py
and tool_implementations.py, and fixes a latent bug: the _truncate copy in
tool_execution.py was missing the isinstance guard and would crash on None.

Also deletes mcp_servers/_common.py — it was dead code with zero callers
anywhere in the codebase, containing its own copy of truncate() and
constants that already exist in src/constants.py.

* fix(tools): route remaining get_mcp_manager imports to src.tool_utils

The maintainer's feedback flagged src/task_scheduler.py:1857 and
routes/task_routes.py:977. A project-wide search found a third call site
in src/agent_loop.py that also imported get_mcp_manager from
src.agent_tools instead of src.tool_utils.

All three are now sourced from the canonical location in src.tool_utils.

---------

Co-authored-by: mcnoliveira <mcnoliveira@gmail.com>
2026-06-09 01:05:30 +02:00
Ocean Bennett 62ffcb6236 fix(cookbook): preserve same-host ssh profile selection (#3373)
* fix(cookbook): preserve same-host ssh profile selection

* fix(cookbook): resolve same-host ssh profiles in running tab and port lookups
2026-06-09 00:36:10 +02:00
Wes Huber 85c6056c87 test(models): add regression coverage for Z.AI coding endpoint probing (#2244)
Add focused tests for the z.ai/api/coding path override:
- _match_provider_curated: 5 tests verifying coding vs base key
- _probe_endpoint: 3 tests verifying model preservation, curated
  append on partial response, and base-zai exclusion

Rebased onto dev per reviewer request.

Fixes #2230

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-08 23:07:29 +01:00
Rohith Matam 049833e309 fix: skip malformed document tool call items (#3494) 2026-06-08 23:25:31 +02:00
Cookiejunky 4e497f4878 fix(cookbook): guard break-system-packages pip flag (#3510) 2026-06-08 23:10:20 +02:00
Lucas Daniel 5462030cde fix(auth): per-user allowed-models checklist ignores cache, [None] doesn't block (#3355)
Three issues combined to make the per-user 'Allowed models' checklist
unreliable (#3032):

1. admin.js _loadModelsForUser fetched /api/models, which is backed by
   cached_models — endpoints that haven't been probed yet (e.g. a
   freshly-added DeepSeek API endpoint) simply didn't show up in the
   checklist. Switched to /api/model-endpoints, which always reflects
   every configured endpoint regardless of cache state.

2. _saveModels sent allowed_models: [] both when the admin clicked
   [All] (no restriction) and [None] (block everything) — the backend
   had no way to distinguish the two.

3. _enforce_chat_privileges treated an empty allowed_models list as
   'no restriction' (falsy -> skip the check), so [None] had no effect.

Added an explicit block_all_models privilege flag (defaulting to False,
and forced to False for admins) that admin.js now sets when zero models
are checked. _enforce_chat_privileges checks it first and 403s
regardless of allowed_models contents.
2026-06-08 22:52:39 +02:00
Lucas Daniel 0a324f20d2 fix(agent): stop treating illustrative Markdown fences as tool calls for native function-calling models (#3356)
* fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models

_resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block
parser whenever a model produced no native tool_calls, regardless of whether
that model has a reliable native function-calling channel. Native models
(GPT/Claude/Grok/Qwen3/DeepSeek-V, etc. - _is_api_model true) commonly write
illustrative ```bash/```python/```json examples in guide-only prose; the
fallback parser matched these and executed them as real commands, sometimes
looping for several rounds as the model tried to clarify with more examples
(#3222).

Restrict the textual fenced-block fallback to non-native models, which rely
on it as their only tool-invocation channel. Native models are trusted to use
their structured tool_calls channel for real invocations; when they don't
emit one, a bare fence in their response is prose, not an action. The native
tool_calls path itself is untouched.

This sits one layer below #3088's guide-only policy enforcement: that PR
blocks tool exposure/execution on explicit no-tools requests, while this fixes
the parser so ordinary illustrative fences are never misread as calls in the
first place, on any turn.

* fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence

_resolve_tool_blocks previously short-circuited the entire textual parser
(tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native
function-calling models with no native tool_calls. That also dropped Patterns
2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content
as text), which are real calls a model couldn't emit on its structured channel
(e.g. DeepSeek-V falling back to DSML), not illustrative examples.

parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY
Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks
passes skip_fenced=is_api_model so fenced examples stop being executed for
native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and
recoverable. cleaned_round mirrors the same gate when persisting round text, so
an illustrative fence that wasn't executed isn't stripped from saved/reloaded
history either (it was streaming once and then disappearing on reload).
2026-06-08 22:25:28 +02:00
Mazen Tamer Salah 8e494cc1c4 fix(chat): keep balanced trailing ')' when extracting URLs (#3406)
extract_urls() stripped any trailing ')' unconditionally via
`re.sub(r'[.,;:!?\)]+$', '', url)`. That corrupts URLs that legitimately
end in a parenthesis — most commonly Wikipedia disambiguation links like
https://en.wikipedia.org/wiki/Python_(programming_language), which became
...Python_(programming_language and then 404 when fetched by the web/research
tools.

Strip trailing sentence punctuation as before, but only drop a ')' when it is
unbalanced (more ')' than '('), so a prose-glued "(see https://example.com)"
still loses its closing paren while balanced URLs keep theirs.

Added tests/test_extract_urls.py covering balanced, unbalanced, nested, and
trailing-punctuation cases.
2026-06-08 21:33:29 +02:00
nubs 932b7f2446 fix(email): close IMAP socket when connect/login fails (#3174) (#3363)
* fix(email): close IMAP socket when connect/login fails (#3174)

_imap_connect opened a live socket via _open_imap_connection and then
called conn.login() with no try/finally, and _open_imap_connection called
conn.starttls() unguarded. When auth fails (e.g. an Office 365 app password
on an MFA-enabled tenant, #3174) or STARTTLS is rejected, the already-open
socket was orphaned. Every IMAP caller funnels through _imap_connect,
including the 30-minute _auto_summarize_poller, so a persistently
misconfigured account leaked one descriptor per pass toward FD exhaustion.

The previously merged leak fixes (#1325/#1330/#1423/#1530) only guard the
post-connect body and monkeypatch _imap_connect to succeed, so this
connect-time path was uncovered. Wrap login() and starttls() so a failure
calls conn.shutdown() (low-level close; logout() can't run pre-auth) before
re-raising. Adds two regression tests that fail without the guard.

* fix(email): guard MCP IMAP+SMTP connect-time leaks too (#3174)

Folds in the sibling connect-time leaks vdmkenny flagged on #3363, so the
whole connect-then-step leak class is closed in one place:

- mcp_servers/email_server.py::_imap_connect — guard starttls() and login();
  close pre-auth with conn.shutdown() before re-raising.
- mcp_servers/email_server.py::_smtp_connect — guard starttls() and login();
  SMTP has no shutdown(), so close with conn.close() (socket close, no QUIT).

Routes SMTP (_send_smtp_message) is already safe via 'with smtplib.SMTP(...)'.
Adds four regression tests (one per guard), verified to fail without the fix.
2026-06-08 21:21:41 +02:00
Alex Little a58f526992 fix(presets): scope expand-prompt model resolution to owner (#3477)
* fix(presets): scope expand-prompt model resolution to owner

/api/presets/expand resolved its model endpoint with no owner, so in a
multi-user setup it could match another user's endpoint and use its URL
and decrypted api_key. Pass effective_user(request) to _resolve_model so
resolution is owner-scoped. Adds a regression test.

* fix(presets): scope teacher and audit model resolution to owner

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alex Little <alexwilliamlittle@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-08 21:12:02 +02:00
nopoz ed6cc88974 ci: harden existing workflows for the security gate (#3498)
Pin actions to commit SHAs, set persist-credentials: false on every
checkout, and scope token permissions to the jobs that use them. Suppress
the two findings that are safe by design: the description bot's
pull_request_target trigger (no fork code runs) and an intentional
word-split in the docker manifest step.

Clears actionlint and zizmor against dev so the blocking gate from #1314
can pass once both land.
2026-06-08 20:58:59 +02:00
Mazen Tamer Salah 5198516979 fix(sessions): copy message metadata when forking a session (#3409)
fork_session passed each source message's metadata dict by reference into the
new session. add_message() -> _persist_message() stamps _db_id (and timestamp)
onto that dict in place, so persisting the fork overwrote the SOURCE messages'
_db_id with the forked rows' ids — silently breaking edit/delete-by-id on the
original conversation.

Copy the metadata dict per message so the fork and source no longer alias.

Adds tests/test_fork_session_metadata.py asserting the source session's
message metadata is unchanged after a fork.
2026-06-08 20:49:15 +02:00
Giuseppe Castelluccio 095c74b985 fix(security): fail closed in /api/models auth gate on unexpected errors (#3489)
GET /api/models swallowed any non-HTTPException raised while checking
whether the caller is authenticated (bare except Exception: pass), so a
broken auth_manager or an exception from get_current_user silently
granted the full model list to an anonymous caller instead of rejecting
the request. Now any unexpected exception logs and returns HTTP 500.

Split out of #2360 per reviewer request to keep the deny-list and the
auth-gate fix as separate, single-purpose PRs.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 20:23:39 +02:00
CorVous 34a3f8637a fix(memory): make auto-memory extraction reliable for reasoning models (#3190)
* fix(memory): auto-memory extracted nothing — flatten window so the prompt ends on a user turn

extract_and_store appended the recent window as raw alternating role messages
after the system prompt. Since the window is the last N messages, the prompt
usually ENDED on an assistant turn — and a chat model given a prompt ending on
an assistant turn returns an empty completion (nothing to answer). The result
was facts=[] → "Auto memory extraction ran: 0 candidates" on every run, so no
memories were ever stored, while skill extraction (which flattens the transcript
into a single user message) worked fine.

Flatten the window into one user message ending with an explicit instruction,
mirroring the skill extractor, so the model always responds. Also harden parsing
for reasoning models, matching the audit path which already does this:
- raise max_tokens 500 → 4096 (a reasoning model spends the budget on <think>
  before emitting JSON; 500 truncated it before any JSON appeared);
- strip <think>/prose preambles via strip_think and slice the embedded JSON
  array before json.loads, instead of bombing on char 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: tighten memory-extraction-empty-completion — clarify JSON-slice comment re prior strip steps

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(memory): reframe the comment to the accurate root cause (raw-chat framing)

The earlier comment leaned on "ends on an assistant turn -> empty completion",
which is only one failure mode. The dominant cause, confirmed by a controlled
repro (0/6 old vs 6/6 new on this model), is that passing the window as raw chat
messages makes the model treat it as a conversation to continue rather than a
transcript to analyze, so it returns [] even when durable facts are present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(memory): cover extraction JSON parsing + slice trailing commentary unconditionally

Factor the strip/fence/slice/json.loads logic out of extract_and_store into
a pure module-level helper _parse_extraction_json(raw) -> list and drop the
'text[0] != "["' guard so the array is sliced whenever both brackets exist
(fixes trailing commentary like '[...] Done!' reaching json.loads).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 19:57:44 +02:00
Mazen Tamer Salah 8449baea80 fix(api-tokens): preserve scopes on a partial token update (#3407)
PATCH /api/tokens/{id} unconditionally recomputed scopes from
payload.get("scopes"). On a rename — body {"name": "..."} with no "scopes"
key — that is None, so _normalize_scopes(None) returned the default ["chat"]
and the handler overwrote token.scopes, silently dropping every scope the
token had been granted (e.g. email:read, calendar:write).

Only write scopes when the request actually includes them, and return the
token's real stored scopes in the response (matching the GET /tokens display
shape) instead of the recomputed default.

tests/test_api_token_routes.py: add rename-preserves-scopes,
explicit-scopes-applied, and missing-token-404 cases for the PATCH handler.
2026-06-08 19:37:31 +02:00
Mazen Tamer Salah d58202d10e fix(presets): persist presets atomically to avoid corruption on crash (#2169)
PresetManager.save() used a plain open("w") + json.dump, which truncates
presets.json before writing the new content. A crash, power loss, or
serialization error mid-write leaves the file truncated/empty and every
saved preset is lost.

Route the write through core.atomic_io.atomic_write_json (tmp file +
os.replace), matching how the rest of the codebase persists JSON state.
The helper is imported lazily so this module stays free of the heavy core
package import graph at module load time.

Adds tests/test_preset_atomic_save.py covering the source contract, a
failed-write leaving the existing file intact, and a round trip.
2026-06-08 19:16:37 +02:00
Mazen Tamer Salah 1209f258d7 fix(caldav): skip the prune when any object fails to parse (#3454)
* fix(caldav): don't prune the whole window when no objects could be parsed

The post-sync prune deletes local origin=="caldav" rows in the window whose UID
the server didn't just return. With an empty seen_uids it falls back to
`uid.isnot(None)` — a match-all delete. That's right when the calendar is
genuinely empty, but when the server returns objects and every one fails to
parse (malformed iCal / an icalendar error), seen_uids is empty only because
nothing could be read, so the match-all branch silently deletes every local
event in the 90-day-back/365-day-forward window.

Track whether any object failed to parse and gate the prune with a small pure
helper `_should_prune_window(seen_uids, parse_failed)`: prune when something was
read, or when the calendar is genuinely empty (no objects, no parse errors), but
never when objects came back unreadable.

Adds tests/test_caldav_prune_parse_failure.py for the three cases.

* fix(caldav): skip the prune on any parse failure, not just total

Review follow-up (#3454): _should_prune_window returned True whenever seen_uids
was non-empty, so a partial parse failure (say 48 of 50 objects parse) still
pruned the 2 unreadable-but-still-upstream events, because their UIDs were absent
from seen_uids. Any parse failure makes seen_uids an incomplete view of the
server, so pruning against it is unsafe whether the failure is total or partial.

Skip the prune on any parse failure (return not parse_failed); only prune on a
clean read (a genuinely empty window is still safe to prune). Tradeoff: one
permanently-unparseable event pauses deletion mirroring until it is fixed, which
is the safe direction (false-keep beats false-delete).

Replace the now-incorrect "partial failure still prunes" assertion with a
partial-failure regression: one object parses, one fails, so the prune is
skipped and the unparsed event's local copy is not deleted.

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-08 18:59:14 +02:00
Mazen Tamer Salah d71284194b fix(memory): only delete memories the model explicitly drops in tidy (#3455)
* fix(memory): only delete memories the model explicitly drops in tidy

The AI memory-tidy path computed deletions as the complement of the model's
`keep` list (`if mid not in keep_ids: continue`). When the model returned a
valid response that simply omitted some existing ids — a common LLM lapse — every
omitted memory was silently deleted, even though it was neither a duplicate nor
listed in `drop`.

Honor the explicit `drop` set instead: delete only ids the model dropped (minus
any it saw only truncated), and preserve everything else, still applying cleaned
text/category from `keep`.

Adds tests/test_consolidate_memory_explicit_drops.py: a memory the model omits
from both keep and drop survives; an explicitly dropped one is removed.

* refactor(memory): remove now-dead keep_ids from tidy

After deletion switched to drop_ids and text/category rewrites to cleaned_by_id,
keep_ids was written but never read. Remove the init, the .add(mid) in the keep
loop, and the truncated .update() (its truncated-protection is already covered by
`drop_ids -= truncated_ids`). Pure deletion, no behavior change; tests stay green.

Addresses review feedback on #3455.

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-08 18:54:45 +02:00
Aman Tewary d458cade98 docs(email): clarify Outlook password auth failures
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-08 15:32:16 +01:00
PewDiePie fe19d072e3 Revert "fix: expose supports_tools toggle for local endpoints in UI (#3195)" (#3438)
This reverts commit 7b68413433.

Co-authored-by: pewdiepie-archdaemon <pewdiepie-archdaemon@users.noreply.github.com>
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-08 14:46:01 +02:00
PewDiePie 09565acc1e Revert "feat(model-picker): add remove-from-recent button to Recent section rows (#2894)" (#3437)
This reverts commit 2a422c00ec.

Co-authored-by: pewdiepie-archdaemon <pewdiepie-archdaemon@users.noreply.github.com>
2026-06-08 14:41:25 +02:00
Mostafa Eid d6882a895e feat(chat): recall last user message on empty composer ArrowUp (#1175)
Pressing ArrowUp on an empty #message composer restores the last sent user text, matching common chat-app UX (Slack, Discord, ChatGPT).

- Read from #chat-history .msg-user dataset.raw (same path as resend/regenerate), not session sidebar metadata

- Literal empty check (whitespace-only drafts are preserved); ignore Shift/Alt/Ctrl/Meta and IME composition

- Extract wiring to composerArrowUpRecall.js; rAF + 250ms retry only (no global MutationObserver)

- Add tests/test_composer_arrow_up_recall_js.py

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-08 13:06:05 +02:00