SKILL.md files written with mixed-case owner (e.g. 'owner: Alice') were
skipped because the regex had no IGNORECASE flag. _usage.json keys like
'Alice::skill-name' were missed by the startswith prefix check for the
same reason.
Both comparisons now match the same way the deep_research and memory
blocks do — case-insensitively against old_username.
Fixes#3611
Direct DbSession.owner == user becomes WHERE owner IS NULL when user is None
(auth disabled), hiding all sessions that carry an explicit owner. Same flaw
on the Document and GalleryImage sub-queries (active-doc and gallery badges).
Replace all three with owner_filter(), which is a no-op when user is falsy.
Fixes#3620
The _migrate_* startup helpers in core/database.py opened a raw
sqlite3.connect() inside a try and called conn.close() as the last
statement in that try. If any earlier statement raised (locked DB,
unexpected schema, a failed ALTER), close() was skipped and the bare
except only logged the error — leaking the connection (file handle +
lock) for the lifetime of the process. These migrations run on every
startup.
Wrap each in the conn = None + try/except/finally pattern already used
by _migrate_chat_messages_fts in this same file, so the connection is
closed on all exit paths. 25 functions; no change on the success path.
Helpers that already close safely are left untouched: _migrate_chat_messages_fts
and _migrate_backfill_task_folders (the latter uses SQLAlchemy's
engine.connect() context manager).
Same bug class as the previously merged DB-connection-leak fix (#64)
and the IMAP logout-on-all-paths fix (#1530).
* fix(ui): escaped SVG renders as raw markup during web_search tool label
The _toolLabels['web_search'] entry embedded an SVG HTML string
concatenated with label text. At render time the entire value was
passed through esc(), HTML-escaping <svg> tags so the icon
displayed as raw text instead of rendering visually.
Fix: separate icon from label text via a _toolIcons map. The SVG
is injected as raw innerHTML (unescaped) in .agent-thread-icon,
while the label text remains safely escaped.
* test: add behavioral test for web_search tool icon rendering
Co-authored-by: TheDragonTail <jakeoldfield2@gmail.com>
---------
Co-authored-by: TheDragonTail <jakeoldfield2@gmail.com>
* fix(security): don't grant tool access in the pre-setup window
owner_is_admin_or_single_user() returned True whenever auth was not
configured, which conflated two very different states:
- intentional single-user mode (operator set AUTH_ENABLED=false), and
- the pre-setup window (auth enabled, but no admin created yet).
In the second state, blocked_tools_for_owner() returned an empty set, so
server-execution tools (bash/python) and other admin-only tools were
ungated. The auth middleware already 401s /api/ requests pre-setup, but a
caller that bypasses it (trusted loopback / internal-tool path) could reach
those tools before setup completed.
Treat "not configured" as admin only when auth is intentionally disabled
(AUTH_ENABLED=false), mirroring the AUTH_ENABLED parsing in app.py and
core.middleware. Single-user mode is preserved; the pre-setup window is now
non-admin as defense-in-depth.
Adds regression tests for both states.
Fixes#3201
Supported by Claude Opus 4.8
* refactor(security): reuse _auth_disabled() instead of a duplicate helper
Addresses review on #3506: src/auth_helpers.py already has _auth_disabled()
with the identical AUTH_ENABLED parse. Drop the duplicate
_auth_intentionally_disabled() and call the existing helper via a lazy import
inside owner_is_admin_or_single_user (mirroring the lazy core.auth import) to
avoid any import cycle. Removes the now-unused `import os`. Behaviour and the
two regression tests are unchanged.
Supported by Claude Opus 4.8
---------
Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>
* fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache
Fixes#2927. As diagnosed in the issue, three things in Odysseus's request
pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache
continuity, forcing a full prompt re-evaluation (15-30s+) on every turn:
1. Dynamic content folded into the system prompt every turn. Both the chat
preface (ChatProcessor.build_context_preface) and the agent system prompt
(_build_system_prompt) injected current_datetime_prompt() — text that
changes every minute — directly into system-role messages, which llm_core
then concatenates into the single system message sent as the cached
prefix. Any byte difference there invalidates the entire cache. Moved this
to a new current_datetime_context_message() helper that returns a
standalone user-role message, inserted near the end of the array (right
before the latest user turn) instead of mixed into the system prompt. The
static system prefix (preset prompt + safety policy + agent base prompt)
now stays byte-identical across turns of the same session.
2. Memory/skill extraction side-requests competed with the main completion.
run_post_response_tasks fired extract_and_store / maybe_extract_skill via
asyncio.create_task — fire-and-forget coroutines that could overlap the
next turn's main request and steal llama.cpp's limited processing slots,
evicting the cached checkpoint. They're now queued through a new
_run_extraction_jobs_sequentially helper that waits for the session's
stream to go idle and runs the jobs strictly one at a time.
3. No stable session identifier was sent to local backends, so llama.cpp
assigned a new processing slot via LRU every turn ("session_id=<empty>
server-selected (LCP/LRU)"), losing slot affinity. Added
_apply_local_cache_affinity() in llm_core, which sets session_id and
cache_prompt: true on outgoing payloads — gated to self-hosted
OpenAI-compatible endpoints only (never api.openai.com or other cloud
providers, which reject unrecognized request fields with a 400). Threaded
session_id through stream_llm / llm_call_async / stream_agent_loop from
the existing Odysseus session id.
Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload-
assembly and scheduling code paths: byte-identical system prefix across two
turns of the same session (with a regression check that genuinely changed
instructions DO still change it), the dynamic time block landing as a
user-role message, extraction jobs waiting for the stream to go idle and
running sequentially, and the outgoing payload carrying a stable session_id
(same across turns of one session, different across sessions) only for
self-hosted endpoints. Updated tests/test_user_time.py for the new message
placement.
* fix(tests): accept owner= kwarg in normalize_model_id monkeypatch
The upstream normalize_model_id signature now takes an owner= keyword
argument, and chat_helpers.py passes owner=getattr(sess, "owner", None)
at the call site. Update the test stub lambda to **kwargs so it handles
the new argument without breaking, and update chat_helpers.py to forward
the owner parameter consistently.
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut
* fix(integrations): avoid mutating response dict in-place on truncation
* fix(integrations): truncate dict responses and bound list sentinel overhead
- Dict path now walks keys in insertion order, adding them one at a time
while checking that the accumulated dict + _truncated marker fits within
the 12 000-char limit. Previously the marker was appended without removing
any content, so large dicts were not actually truncated.
- List path now subtracts the sentinel's serialised size (+ element-separator
padding) from the budget before binary-searching, so the final array
including the sentinel stays at or under the limit.
- Add regression tests: large-dict actually-truncated, small-dict pass-through,
and list-with-sentinel respects the size bound.
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
Providers like Moonshot (Kimi K2.5/K2.6) require the reasoning_content
field to be present on assistant tool-call messages in multi-turn
conversations. The sanitizer's allow-list was missing this field,
causing HTTP 400: 'thinking is enabled but reasoning_content is missing
in assistant tool call message at index N'.
Add reasoning_content to the allowed field set in
_sanitize_llm_messages and cover with regression tests.
Fixes#3118
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* merged two delete_calander functions performing the same thing
* added proper 404 raise when nothing is found
* removed 404 HTTPException and jus reverted it back to raise
The frontend's text assets shipped uncompressed on every cold load. Add
Starlette's GZipMiddleware. Measured on the current assets:
- style.css 1,127 KB -> 238 KB (-79%)
- index.html 202 KB -> 35 KB (-83%)
- chat.js 238 KB -> 60 KB (-75%)
minimum_size=1024 skips tiny bodies; Starlette excludes `text/event-stream` by
default, so the SSE streams (chat, shell, research, model-probe — all served with
media_type="text/event-stream") are never compressed or buffered. Composes
cleanly with the existing security-header middleware. No behavioural change.
Built by OdWar -- with Claude thinking alongside.
* fix(macos): start ChromaDB from start-macos.sh so tool calling works
start-macos.sh never started ChromaDB, so the tool index failed to initialize
and tool/MCP injection silently degraded on native macOS installs (no Docker).
Start a local chroma from the venv before launching, mirroring the existing
Apfel background+trap pattern: idempotent (skips if 8100 is already serving),
honors CHROMADB_HOST/CHROMADB_PORT (skips when remote), logs to a file, persists
to data/chroma, and is killed in the exit trap.
Fixes#3297
* fix(macos): bind/probe ChromaDB on IPv4 loopback to match app resolution
Binding to the literal localhost can land on IPv6 ::1 while the app connects to
localhost->127.0.0.1, leaving them unable to reach each other. Pin bind + probe
to 127.0.0.1 (0.0.0.0 still honored).
* style(macos): trim chromadb comments (present-tense, no issue refs)
#3322 renamed the loopback base to _INTERNAL_BASE, but a later Cookbook
commit reintroduced one call site using the old _COOKBOOK_BASE name,
raising NameError whenever the agent registers a model endpoint for a
running serve session.
Fixes#3669
ChatGPT's Codex API rejects any request that includes max_output_tokens,
returning HTTP 400 "Unsupported parameter: max_output_tokens". This caused
Deep Research to always fail during the endpoint probe when a ChatGPT
Subscription model was selected.
Remove the conditional that set payload["max_output_tokens"] in
_build_chatgpt_responses_payload(). The parameter is simply not sent.
Also update the two affected tests:
- Rename test_chatgpt_subscription_payload_uses_max_output_tokens →
test_chatgpt_subscription_payload_omits_max_output_tokens
- Rename test_chatgpt_subscription_payload_omits_empty_max_output_tokens →
test_chatgpt_subscription_payload_omits_max_output_tokens_when_zero
- Assert max_output_tokens is absent rather than present
Fixes#3650
The module derived its state file path as Path(os.environ.get("DATA_DIR", "data"))
/ "cookbook_state.json". The correct env var is ODYSSEUS_DATA_DIR, which is
already read by src/constants.py and exported as COOKBOOK_STATE_FILE. When
ODYSSEUS_DATA_DIR is set (Docker, custom installs), the old code read the wrong
env var and silently wrote state to data/cookbook_state.json relative to CWD
while every other file resolved under the custom data directory.
Fixes#3621
* Make in-venv pip-fallback test independent of the runner's environment
test_pip_install_fallback_chain_propagates_failure_in_venv simulated the in-venv case by probing the real interpreter (sys.prefix != sys.base_prefix). That assumes the test runner is itself inside a venv. CI runs pytest with no venv, so venv_check reported not-in-venv, the negated guard flipped, the --user branch fired, and the assertion failed. Make venv_check exit 0 directly to simulate the in-venv condition deterministically, mirroring the outside-venv companion test.
* Stop agent-tool import shims from leaking into the admin-gate test
test_function_call_non_object_args and test_unknown_tool_calls stub heavy DB/auth deps at import time to load the real agent-tool stack, but they popped src.tool_execution and left core.auth stubbed without restoring. Popping and re-importing src.tool_execution rebinds the src package's tool_execution attribute, so test_edit_file's later 'import src.tool_execution as te' resolved to a different module object than the one execute_tool_block lives in. The monkeypatch on _owner_is_admin then missed, the non-admin edit_file gate never fired, and the edit went through (exit_code 0). Stop touching src.tool_execution and restore the heavy stubs after import. Verified the full suite is green on Linux (Python 3.11, matching CI).
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* Add consolidated service health endpoint for degraded-state reporting
ROADMAP (High Priority) asks for "Better degraded-state reporting for
ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no
single readout of which subsystems are actually working: /api/health is only a
liveness ping and each subsystem's signal lives in a different module, so a
misconfigured self-host install gives no consolidated picture.
This adds an admin-only GET /api/diagnostics/services endpoint backed by a new
src/service_health.py aggregator. Each subsystem reports a uniform
{name, status, detail, meta} where status is ok | degraded | down | disabled,
and the response rolls up an overall verdict (worst non-disabled status).
Probes are deliberately non-intrusive and safe to poll:
- ChromaDB: reads the .healthy flags on the RAG and memory vector stores.
- SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No
search query is run.
- ntfy: GET the server's built-in /v1/health. No test notification is sent.
- email: short IMAP connect+logout per configured account (no credentials in
meta).
- providers: probe each enabled ModelEndpoint's model list (no api_key in meta).
Probe functions take their inputs as parameters and isolate the network call to
injectable callables, so they unit-test without touching the network (same
pattern as the merged provider-endpoint tests). Network probes run concurrently
off the event loop via asyncio.to_thread with bounded per-probe timeouts.
memory_vector is now passed into setup_diagnostics_routes (new optional param,
backward-compatible) so ChromaDB's vector-memory store can be reported too.
Tests: tests/test_service_health.py — 29 tests covering every status mapping
per subsystem, the overall rollup, and that no secrets leak into meta.
Verification:
python -m pytest tests/test_service_health.py -q # 29 passed
python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py
python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q
Backend + tests only; an Admin/Settings UI badge that renders this endpoint is
a natural follow-up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(diagnostics): bound service-health wall-clock and redact secrets
Addresses review on #964.
Blocker 1 — genuinely bounded wall-clock:
- providers_health and email_health now fan out per-item probes across a
bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET),
instead of probing endpoints/accounts sequentially. Stragglers are reported
as a controlled `timeout` and never block; the pool is shut down with
wait=False so the response returns on time regardless of endpoint/account
count.
- The IMAP connect path now honors the service-health budget: _imap_connect
gained a pass-through `timeout` param and the probe calls it with
_PROBE_TIMEOUT instead of the default 15s.
- collect_service_health runs the four network subsystems concurrently, each
under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall
wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop.
Blocker 2 — no secret/raw-error leakage in the response:
- _safe_url strips userinfo, query, and fragment from every URL surfaced in
meta (searxng instance, ntfy base, provider name fallback), keeping only
scheme/host/port/path.
- _classify_error maps every probe failure to a controlled category token
(timeout, connection_refused, dns_error, tls_error, network_error,
http_error, auth_or_protocol_error, …) — raw str(exception), which can embed
credentialed URLs or server text, is never returned.
Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py):
- URL userinfo/query redaction for searxng/ntfy/providers.
- secret-bearing exception strings map to categories and don't leak.
- multiple slow providers/accounts stay bounded (single + 25-endpoint cases).
- subsystems run concurrently; aggregate deadline yields a controlled result.
- route-level unauthenticated (401) / non-admin (403) / admin (200) coverage.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(diagnostics): isolate route tests so they don't leak module globals
The new route tests replaced src.service_health.collect_service_health and
routes.diagnostics_routes.require_admin via direct assignment, which persisted
for the rest of the pytest session. In CI's full alphabetical run that fake
collector (returning services=[]) leaked into the later collect_service_health
tests and failed them. Switch to monkeypatch.setattr so both are restored after
each test. No production code change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* refactor(tools): implement strict cohesive class coordinator pattern per #2917
* test: update edit_file tests to use EditFileTool class
* fix(tools): restore tool_policy param and security backstop in coordinator
* refactor(tools): migrate domain tools to agent_tools package per #2917
* test: update test imports for new agent_tools package
* fix: resolve circular import between tool_execution and agent_tools
* fix: remove leftover git conflict markers
* fix(tools): resolve pytest failure and document _apply method
* fix(tools): clean up whitespace and remove dead _tool_python helper
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* docs: add implementation plan for fixing chat context drifting (#135)
* fix: make Session.history immutable + fix {}.history crash
- Session.history now exposes a COPY of the internal _history list
- add_message() replaces history with a fresh copy each time
- get_context_messages() derives from _history directly
- replace_messages() updates both _history and history
- truncate_messages() updates both _history and history
- _persist_message() line 207: fixed {}.history fallback crash
- Added 11 tests for session isolation and edge cases
Addresses #135 root cause #1: shared mutable references
* fix: task scheduler uses SessionManager methods instead of overwriting sessions
- Added ensure_task_session() to SessionManager (checks cache first)
- Task scheduler now uses ensure_task_session() instead of direct dict assignment
- Task scheduler now uses SessionManager.add_message() for message persistence
- Removed direct sess_obj.history.append() that was silently losing data
Addresses #135 root causes #2 and #3
* fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old
Prevents the cleanup task from deleting sessions that were just created
and haven't received any messages yet (message_count == 0).
Addresses #135 root cause #5
* test: comprehensive session isolation tests (10/10 passing)
* refactor: consolidate _session_manager into singleton pattern
- Added set_session_manager_instance / get_session_manager_instance to core/models
- kept backward-compat aliases (set_session_manager, get_session_manager)
- session_manager.py re-exports the singleton functions
- ai_interaction.set_session_manager now syncs with the core singleton
- context_compactor uses get_session_manager_instance() instead of getattr hack
- app.py initializes the singleton once
Addresses #135 root cause #4: fragile global wiring
* test: add concurrent session isolation integration tests
Verifies:
- Concurrent add_message to different sessions doesn't cross-contaminate
- Rapid parallel writes maintain isolation
- Read-write concurrent access is safe
All 3 async tests pass, proving the immutable history fix works under concurrency
* fix: pre-import core.models in conftest to prevent test pollution
test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at
module level during collection. Any test collected after it imports
Session as a MagicMock. Pre-importing core.models in conftest.py
before test_agent_loop.py's module-level code runs prevents this.
* fix: make .history authoritative mutable list, address PR review
Per review feedback: keep .history as the authoritative mutable list so
existing code doing .history.pop(), .history = [...], etc. still works.
Fix the cross-contamination bug by ensuring __post_init__() gives each
Session its OWN unique history list (never shared).
Changes:
- core/models.py: .history IS the authoritative list. _history aliases it.
Each Session gets its own list in __post_init__.
- core/session_manager.py: add_message() delegates to Session.add_message()
instead of appending directly — no double-append, single source of truth.
- tests/test_session_manager.py: updated test to reflect that .history
references see new messages (same list, not a snapshot).
- docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for
shipping — useful design context but too much process/doc to ship).
All 272 tests pass (3 pre-existing failures unrelated).
* Fix session manager message persistence
* Fix session history alias regressions
* Fix session history aliasing and task delivery
* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)
* feat: add NVIDIA option to provider settings dropdown and aliases
* test: add NVIDIA provider detection and endpoint tests
* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering
- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
-embedqa, -nemoretriever
* Expand non-chat model filtering for NVIDIA embedding/guard/video models
Add _NON_CHAT_PREFIXES: embed, recurrent
Add _NON_CHAT_CONTAINS: topic-control, guard, calibration,
ai-synthetic-video, cosmos-reason2
Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)
* Filter non-chat models in _probe_endpoint via _is_chat_model()
Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.
* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models
Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.
Adds: embed, bge, recurrent, starcoder, gemma-2b
* fix(model-routes): drop collision-prone substrings from global non-chat filter
The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:
- gemma-2b matches google/gemma-2b-it (instruct chat model)
- starcoder matches bigcode/starcoder2-15b (code completion model)
- recurrent matches google/recurrentgemma-2b (language model)
- guard matches meta-llama/Llama-Guard-3-8B (safety classifier)
Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.
Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS
Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS
entries redundant since the prefix check runs first.
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
* style: fix indentation of groq and xai test cases in test_provider_endpoints.py
---------
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
When a lane reset fails to rewrite the recreated collection, the recovery path
re-adds the preserved rows. It read the embeddings with
`preserved.get("embeddings") or []` and gated the loop with
`if ids and docs and old_embeddings:`. chromadb returns embeddings as a numpy
ndarray, whose truth value is ambiguous, so both expressions raise ValueError
inside the except block — the restore is abandoned and every preserved row is
lost (the collection was already deleted), exactly when the code is trying to
avoid data loss.
Use an explicit `is None` check and `len(...)`, and convert ndarray batches to
lists before re-adding.
Adds tests/test_embedding_lane_ndarray_restore.py (preserved embeddings come
back as np.ndarray); existing test_embedding_lanes.py still passes.
The DB owner-rename loop in rename_user patched every SQL column named
owner, but three non-SQL stores were left behind:
1. session_manager.sessions -- in-memory Session objects carry s.owner
set at server-boot time. get_sessions_for_user() does an exact
s.owner == username check, so the renamed user chat sidebar goes empty
until a server restart.
2. data/deep_research/*.json -- each completed research report is a
standalone JSON file with an owner field. research_routes filters
by d.get(owner) == user, making every report invisible to the
renamed user.
3. data/memory.json -- a flat JSON array; each entry carries an owner
field. memory_manager.load(owner=user) filters on it, so all memories
vanish from the memory panel.
Fix: after the SQL loop, patch all three:
- iterate sm.sessions and update owner in-place (exposed via app.state)
- walk data/deep_research/*.json and rewrite owner with atomic_write_json
- update matching entries in memory.json with atomic_write_json
All three use the same case-insensitive lower() comparison the SQL loop
already uses. Each step is independently wrapped so a single failure
does not abort the others or the rename itself.
Fixes#3362
Commit e6b1009 removed the workspace feature's entry point (deleted
routes/workspace_routes.py + static/js/workspace.js and dropped the
workspace-param parsing in chat_routes), but left the downstream backend
plumbing dangling: chat_routes passed a hardcoded workspace=None into
stream_agent_loop, which forwarded it to execute_tool_block, so the
workspace value was permanently None and every workspace-gated branch
was unreachable.
Remove the now-dead code (no behavior change, since workspace was always
None):
- src/tool_execution.py: drop _resolve_tool_path_in_workspace and the
workspace params/branches on execute_tool_block, _direct_fallback,
_call_mcp_tool, _do_edit_file, and _resolve_search_root; restore the
bash/python/bg cwd to _AGENT_WORKDIR.
- src/agent_loop.py: drop the workspace param on stream_agent_loop, the
dead 'ACTIVE WORKSPACE' system-prompt block, and the workspace forward.
- routes/chat_routes.py: drop the hardcoded workspace=None arg and var.
- tests: delete test_workspace_confine.py (tested the removed feature) and
the workspace assertion in test_tool_policy.py.
Full suite: 2903 passed, 1 skipped.
* Fix backup import dropping a user's skill on cross-tenant title/id collision
The skills block of import_data deduped incoming skills against
skills_manager.load_all(), which returns EVERY tenant's skills. So when
a user imports their own backup, any skill whose id or title collides
with another user's skill was silently skipped — the importing user
lost their own data. This is the same cross-tenant bug already fixed
for the memories block just above (#1743); the skills block was left
with the old pattern. Filter the dedup sets to the importing user's own
skills (owner == user); the full store is still saved back, preserving
other users' skills.
* Restore sys.modules after stubbing so backup test does not break collection of later src.* test modules
* Patch backup_routes auth helpers via monkeypatch instead of sys.modules stubs so the test is import-order robust
* Give FakeSkillsManager an add_skill method matching the disk-backed skills API
* fix(cookbook): allow spaces in model directory paths
Allow POSIX external-drive paths and Windows drive paths with spaces while keeping shell metacharacters rejected.
* fix(cookbook): also allow non-ASCII (Unicode) characters in model dir paths
The ASCII-only allowlist that rejected spaces also rejected Cyrillic,
accented Latin and CJK folder names (e.g. /Volumes/Модели,
D:\AI Models\Модели) with 400 Invalid local_dir. Switch the path
character class from [A-Za-z0-9._ -] to [\w. -] (\w is Unicode-aware on
Python 3 str patterns) so localized folder names validate, while shell
metacharacters (; & | ` $ quotes newlines) stay rejected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(cookbook): reject local_dir path segments starting with '-'
The local_dir allowlist includes '-', so a directory like /models/-rf
(or D:\models\-rf) could be parsed as a CLI flag by hf/etc. (option
injection) — and quoting does not stop a value from being read as an
option. Guard against it inside the validator so the safety stays fully
self-contained there rather than depending on consumers' quoting.
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): suppress thinking for qwen3/gemma4 on Ollama /v1 compat endpoint
When using qwen3, QwQ, gemma4, or other thinking models via Ollama's
OpenAI-compatible /v1 endpoint, the model routes all output into its
<think>...</think> reasoning block. Since Odysseus strips thinking
content from round_response and only accumulates native tool_calls,
this produces a round with 0 chars, 0 native calls, 0 tool blocks —
the agent appears to silently do nothing.
Root cause: Odysseus classifies the /v1 endpoint as provider="openai"
(not "ollama"), so the payload is built as a standard OpenAI payload
without any Ollama-specific options. Ollama's /v1 endpoint accepts
"think": false as a top-level parameter to suppress extended thinking,
but this was never sent.
Fix:
- Add _is_ollama_openai_compat_url() to detect local Ollama /v1 URLs
- Inject "think": false in both stream_llm and llm_call_async for
thinking models (qwen3, QwQ, gemma4, DeepSeek-R1, etc.) on this
endpoint
Verified with qwen3:14b on Ollama 0.24: with think=False the model
correctly emits native tool_calls in a single streaming chunk and
the agent executes bash/file/web tools as expected.
* fix(llm): extend _is_ollama_openai_compat_url to match localhost on any port
Per reviewer feedback on PR #3228:
1. Generalize host detection to mirror _is_ollama_native_url: match any
localhost/127.0.0.1/0.0.0.0/::1 host (not just port 11434) so that
custom OLLAMA_HOST ports and container remaps are also covered.
2. Add tests/test_llm_core_ollama_thinking.py covering:
- _is_ollama_openai_compat_url for all positive/negative URL cases
including IPv6, non-default port, native /api path, and real OpenAI
- Payload injection: think:false set for Ollama /v1 thinking model,
not set for non-thinking model, not set for real OpenAI endpoint,
and set for localhost on a non-default port (the new case)
* fixed confusing credentials prompt
* fix(setup): return status from create_default_admin function
* fix(setup): initialize admin creation status in main function
* fix(setup): enhance admin creation feedback and status handling
* Enhance admin user login messages with conditional feedback based on creation status
* Refine admin user creation feedback messages for clarity and actionability and formatted code
* Add fallback error message for admin creation failure in setup script
* Add run script for Uvicorn with dotenv integration
* Refactor server runner to use argparse for host and port configuration
* Remove captured output print statement from server runner
* Fix server runner to ensure cross-platform compatibility and improve log handling
* removed run.py to match original repo
* Fixing custom search not working properly
* Refactor search settings event listeners for improved functionality and clarity
* Update search function signatures to use Optional for count parameter
* revert changes
* fixed broken merge issue
* Delete services/chat_data_scraper.py
added by mistake
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
Surface a lot of accumulated cookbook + UI work as a single non-agent
commit so the agent rework lands cleanly.
Highlights:
- Ollama as a first-class backend in the Cookbook:
* Download input accepts ollama-style names (name:tag) → backend=ollama
* /api/cookbook/ollama/library (cached scrape of ollama.com + curated
fallback so classic models like qwen2.5 stay reachable)
* "Browse Ollama library" toggle below Download with size chips
* Engine=Ollama in hwfit toolbar merges the Ollama library into the
main scan list as per-tag rows with the same Fit/Param/Quant/VRAM
columns; click → fills Download input
- API Tokens form added to Integrations panel (matching wired
loadTokens()/initTokenForm() that had no HTML)
- Serve panel polish: Advanced fold tightening (-8px nudges on vLLM
checks, Extra args, Spec row), n_cpu_moe + Split Mode controls
pulled up 8px to align with the row's checkboxes, GGUF File dropdown
exposed for Ollama backend, GPU re-render on Edit serve restore,
_forceBackend flag so saved serveState wins over backend detection,
cookbook:servers-changed CustomEvent so panels don't need refresh
- Models page redesign: Add Models row (URL + hidden API key reveal +
Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All +
Clear-offline buttons in Added Models toolbar, offline-pill removed
(opacity already conveys state), Engine dropdown gains Ollama option
- _ping_endpoint probes /v1/models then base, accepts 4xx as
reachable (vLLM returns 404 on bare /v1, fully working endpoints
were showing offline)
- Diagnosis card: × dismiss + Copy bundle buttons restored on the
serve error feedback card
- Orphan tmux sweep re-enabled behind a 60s rate-limit + background
Thread (off the main event loop) so dead serves get discovered
- cookbook_routes auto-register watchdog: drops the endpoint if the
serve session exits non-zero within the first ~3min
- ollama-rocm sidecar awareness in download wrapper (`docker exec
ollama-rocm ollama pull` when host ollama isn't installed)
- Skill extractor sets initial_status="published" when
auto_approve_skills pref is on (audit demotes later)
- Skill list / model list / cookbook scan misc polish