Normalize OpenAI-compatible chat URL shapes so base /v1 endpoints route to /v1/chat/completions while already-full chat endpoints remain idempotent.
Preserve native local Ollama routing for bare localhost:11434 endpoints, keep localhost:11434/v1 as OpenAI-compatible, and add focused regression coverage for provider detection, chat target URLs, and model listing from /v1.
Part of #541.
Move the research route domain into the canonical routes/research/ subpackage while preserving the legacy routes.research_routes import path through a sys.modules compatibility shim.
The moved canonical module is behavior-preserving, app wiring now imports the canonical route setup function, source-introspection tests point at the new canonical path, and shim regression coverage pins legacy/canonical same-object behavior plus string-targeted monkeypatch reach-through.
Refs #4082.
Refs #4071.
Two py/polynomial-redos sinks ran regexes with two adjacent \s-matching
quantifiers over untrusted model text, backtracking O(n^2) when the tail failed
on a whitespace flood:
- routes/skills_routes.py: the last-resort verdict-from-prose extractor used
`["\'\s:]*\s*` — the class already matches \s, so the trailing \s* was a
redundant second quantifier. Dropped it (extracted to a documented module
constant _VERDICT_PROSE_RE); the matched text is identical, the scan linear.
- src/agent_loop.py _EXPLICIT_CONTINUATION_RE: `\s*[.!?]*\s*$` put two \s*
around `[.!?]*`. Rewrote as `\s*(?:[.!?]+\s*)?$` — same accepted tails (no
two \s* adjacent), linear. Portable form (no possessive quantifiers).
Both verified output-equivalent to the originals across a fuzz corpus. Adds
tests/test_redos_verdict_continuation.py pinning the unchanged match sets and
bounding the flood inputs (old patterns took seconds at 40k whitespace chars).
Layer a defensive cleanup on top of the generated-query web-search flow so the final selected query is sanitized before reaching comprehensive_web_search.
- remove fenced code blocks from the final search query
- preserve inline code as plain text
- collapse whitespace and cap query length
- cover generated-query success plus LLM failure/empty fallback paths
Partially addresses #4547.
Keep strong references to builtin MCP startup tasks until completion and kill/reap the npx probe subprocess when cancellation interrupts the probe. Includes focused regression coverage for both lifecycle paths.
Replay user-site .pth hooks when checking cookbook runtime dependencies so packages installed with --user are visible to dependency completion. Includes focused regression coverage.
Wrap blocking _resolve_model calls in asyncio.to_thread across async model interaction paths so endpoint/model resolution does not stall the event loop. Preserve owner-scoped resolution and add focused regression coverage.
Move scheduled-task current-time context out of the system prompt and into a user-role context message so the system prompt remains stable for prompt caching. Preserve time grounding on both the agent-loop path and fallback direct-call path, with focused regression coverage.
* fix(security): prevent ReDoS in XML and args tool-call parsers
Four py/polynomial-redos sinks in tool_parsing.py ran lazy/greedy regexes over
untrusted model output (tool-call markup is attacker-influenced via prompt
injection). When the closing delimiter was absent, each rescanned to
end-of-string from every opener -> O(n^2):
- args => { ... } in _parse_tool_call_block: greedy \{([\s\S]*)\} restarted
from every `args:{` opener. Now finds the opener once and takes through the
last `}` (rfind) — equivalent capture, O(n).
- _XML_INVOKE_RE: lazy <invoke ...>([\s\S]*?)</invoke>. Now _iter_xml_invoke
pairs each opener with the first reachable </invoke> and stops when none is.
- _XML_DIRECT_TOOL_RE and the <tag>([\s\S]*?)</\1> param scan in
_parse_tool_code_block: lazy backreference patterns. Now _iter_backref_blocks
pairs each opener with the nearest matching closer and memoizes tag names
with no remaining closer, so an opener flood stays O(n).
All four are output-equivalent to the originals on well-formed tool-call markup;
the lazy patterns remain defined (still re-exported via agent_tools) but no
longer drive a finditer over untrusted text. Adds tests/test_redos_xml_tool_parsers.py
pinning correctness and bounding the opener-flood inputs (old paths took 4-15s).
* fix(security): harden invoke-parameter and distinct-name tag scans
Forward-only the two residual ReDoS paths in the XML/tool parsers that the
outer-delimiter fix left quadratic:
- _parse_xml_invoke parsed <parameter> with _XML_PARAM_RE.finditer, so a
closed <invoke> body full of unclosed <parameter> openers rescanned the
body from every opener (O(n^2), ~11s at 8k openers). Now scans forward-only
via _iter_named_blocks, factored out of _iter_xml_invoke.
- _iter_backref_blocks only memoized repeated missing tag names; a flood of
distinct unclosed names searched the suffix once per name (O(n^2)). It now
indexes every closer by name in one linear pass and binary-searches per
opener (O(n log n)). Covers the direct and tool_code backref scans.
Output-equivalent to the prior scanners (200k randomized trials match the
memoized version for both the direct ci=True and tool_code ci=False configs).
Adds regressions for the closed-invoke parameter flood and the distinct-name
floods (45k openers now run in ~0.05s, were 5-6s).
Keep an unhealthy MemoryVectorStore instance available for health reporting instead of discarding it as disabled. This lets health checks report a degraded/down vector-store state while preserving focused regression coverage for initializer behavior.
An account configured with SMTP only (no imap_host) has no inbox, but the
inbox list path still called _imap_connect, which handed an empty host to
imaplib. imaplib.IMAP4("", 993) silently dials localhost:993 and fails with
"[Errno 111] Connection refused", so the email panel's poll logged a
"Failed to list emails" ERROR every ~60s and surfaced a scary error in the UI.
_imap_connect now fails fast with a typed EmailNotConfiguredError (subclass of
RuntimeError, so existing broad handlers keep working) when no imap_host is set,
and the inbox list returns an empty result for that case instead of an error.
SMTP send is unaffected.
* fix: tool results misthreaded when a native call fails to convert
* Unpack the third converted_calls return from _resolve_tool_blocks in the fenced-example tests
Guard the agent_max_tool_calls settings read so hand-edited or agent-written non-numeric settings.json values fall back to 0 instead of crashing agent-mode chat stream initialization. Add regression coverage for guarded coercion.
Use the upload handler's tolerant index loader when reading upload metadata so corrupt uploads.json degrades to missing metadata instead of a 500. Return 400 for malformed vision JSON request bodies and add regression coverage for both paths.
Accept calendar datetime phrases such as "3pm tomorrow" by adding a time-first natural-language parser branch mirroring the reminder parser. Add regression coverage proving time-first forms match their existing day-first equivalents.
* fix(security): prevent ReDoS in LLM-output tool/think parsers
The regexes that parse untrusted model output in text_helpers.py and
tool_parsing.py are delimiter-bounded with a lazy [\s\S]*? (or an
ambiguous (\s+[^>]*)?). Applied with re.sub/re.finditer over a whole
response, they degrade to O(n^2) when the closing delimiter is absent:
the engine rescans to end-of-string from every opener. Model output is
untrusted, so a prompt-injected or malicious model can stall the agent
loop with many unclosed openers (measured ~25s on a 60KB <thought flood).
- text_helpers.py: replace ambiguous <thought(\s+[^>]*)?> with
<thought([^>]*)> (identical capture, no \s+/[^>]* overlap); skip the
Gemma <|channel>...<channel|> subs when no <channel|> closer is present.
- tool_parsing.py: gate _TOOL_CALL_RE, _XML_TOOL_CALL_RE and _TOOL_CODE_RE
(in parse_tool_blocks and strip_tool_blocks) on a cheap presence check
for their closing delimiter. With no closer the regex cannot match, so
skipping is equivalent; only the wasted O(n^2) rescan is removed.
Resolves CodeQL py/polynomial-redos #230, #231, #232, #233, #235, #236,
#524. The _XML_OPEN_TOOL_CALL_RE alerts (#234, #477) are false positives
(its greedy [\s\S]*\Z is linear) and left untouched.
* fix(security): close ReDoS gaps in tool/think parsers from review
Addresses two review findings on the closer-guard approach:
- Whole-string "closer exists?" checks were bypassable: a stale closer
before an opener flood, or a closer with no reachable inner `}`, kept
the guard true while every opener still rescanned to end-of-string
(O(n^2)). Replace the substring guards with `_iter_delimited`, a
forward-only scan that pairs each opener with a *later* closer and
stops once none is reachable (O(n)). `parse_tool_blocks` and
`strip_tool_blocks` (via `_strip_delimited`) both use it for the
[TOOL_CALL], <tool_call>/<function_call>, and <tool_code> formats.
Verified equivalent to the original regexes on well-formed inputs.
- `<thought([^>]*)>` dropped the tag-name boundary and corrupted
unrelated tags (`<thoughtful>` -> `<thinkful>`). Use `<thought(\s[^>]*)?>`:
the single fixed `\s` keeps the pattern linear (no `\s+`/`[^>]*`
overlap) while restoring the boundary; capture is byte-for-byte
identical for real `<thought ...>` openers.
Adds regressions for stale-closer-before-opener, closer-present-without-
inner-brace, and the <thoughtful>/<thoughts> passthrough.
* fix(security): close Gemma channel ReDoS guard flagged in review
vdmkenny noted the same bypassable whole-string guard remained in
text_helpers.py: `if "<channel|>" in out.lower()` gating the Gemma
thought/response channel subs. A stale `<channel|>` before a
`<|channel>thought` opener flood keeps the guard true while every opener
still rescans to end-of-string (measured ~7.3s at 4k openers).
Replace it with `_sub_delimited`, the same forward-only scan used for the
tool-call parsers: pair each opener with a later closer, stop when none is
reachable (O(n)). Verified output-equivalent to the original capture regexes
on well-formed multi-channel inputs; the stale-closer case now runs in <2ms.
Adds a regression for stale-closer-before-opener on the Gemma path.
* fix(security): harden strip_think() think-tag ReDoS flagged in review
The earlier fixes hardened normalize_thinking_markup and the delimiter
scanners, but the production entrypoint strip_think() still ran
_THINK_CLOSED_RE / _THINK_ATTR_RE / _THINK_OPEN_RE (and the stray-tag
_THINK_TAG_RE) over untrusted model output. Those kept the same ReDoS
shapes: the lazy `<open>[\s\S]*?</close>` rescanned to end-of-string from
every opener, and `(?:\s+[^>]*)?` / `[^>]*` attribute scans ran to
end-of-string from every opener on a "many openers, no closer" flood. On
the prior head, malformed `<think` / `<thinking` / `<thought` floods took
6-14s through strip_think(). The shipped `<thought>` normalization had the
same residual: the single-opener case was linear but an opener flood was
still O(n^2) (~4.4s).
- Replace the lazy multi-pass _THINK_CLOSED_RE loop with the existing
forward-only _sub_delimited scan (pair each opener with the first
reachable closer, stop when none is reachable). One pass collapses
sequential and nested blocks as before.
- Bound every opener/stray-tag attribute scan at `<` (`[^<>]` not `[^>]`)
so a no-`>` opener flood can't drive a single match attempt to
end-of-string. Identical capture for well-formed think/thought tags.
- email_helpers._strip_think: compute had_think from the single linear
_THINK_TAG_RE instead of the lazy closed/open `.search()` calls, which
had the same O(n^2) on the email reply/summary/extraction paths.
All flood variants now finish in <10ms (were 6-14s). Output verified
byte-for-byte identical to the prior implementation over a 34-case corpus
(nested, mismatched, attr, uppercase, Gemma, prose, prompt-echo). Adds
strip_think() timing regressions for malformed openers, opener floods
(all three tag names), the closed-opener flood, and the malformed-closer
flood.
* docs: trim verbose comments in think-tag ReDoS fix
Recognize api.cerebras.ai as a Cerebras cloud provider so llama.cpp/LM Studio cache-affinity fields are not attached even when endpoint_kind is misconfigured as local. Add regression coverage for provider detection, self-hosted classification, and payload field exclusion.
Strip fenced code blocks before extracting visual-report headings so heading-looking lines inside code fences do not desync TOC anchors. Add regression coverage for backtick and tilde fences while preserving normal heading extraction.
Install libmagic1 and image-scoped python-magic in the Docker image so upload MIME detection can use content sniffing. Add regression coverage for the Dockerfile dependency pair and the libmagic-present sniffing path.
* Refresh README screenshot
* fix(notes): allow inline editing of checklist items
* fix(notes): delete checklist item if inline edit is empty
* fix(notes): use debounce for text click to bypass toggle on double click
* fix(notes): use Edit button exclusively for inline edit to avoid UX delay on toggle
---------
Co-authored-by: pewdiepie-archdaemon <pewdiepie-archdaemon@users.noreply.github.com>
* fix(api): handle varying response formats for model IDs from compatible providers
merge conflict for pr-2204 resolved
* fix(modal): keep body-portaled dropdowns above their tool modal at any stack depth (#4720) (#4724)
* fix(memory): keep the Brain memory item menu above the modal at any stack depth
The memory item "⋮" dropdown is portaled to <body> with a hardcoded
z-index of 10001. Tool modals, however, get a monotonically increasing
z-index from modalManager's bring-to-front counter (_modalTopZ), which
climbs unbounded as modals are opened/restored over a session. Once that
counter passes 10001, the Brain modal stacks above the body-portaled
dropdown, so the menu renders behind the panel — visible only where it
spills past the modal's edge (#4720).
Derive the dropdown's z-index from the owning modal's current z-index
(+1), keeping 10001 as a floor for the common low-counter case, so the
menu always sits just above its modal however high the counter has climbed.
Verified with document.elementFromPoint at the dropdown's location: with a
high modal z-index the old build returns the modal at every sampled point
(menu behind); the fixed build returns the dropdown (menu on top). The
default low-counter case is unchanged (z stays 10001).
* refactor(modal): route body-portaled dropdowns through a shared topPortalZ() helper
The hardcoded z-index:10001 the Brain memory menu used (#4720) is the same
literal shared by ~16 body-portaled dropdowns across calendar, cookbook,
cookbookServe, documentLibrary, emailLibrary, gallery, notes, emojiPicker and
memory — each renders behind its owning tool modal once modalManager's
bring-to-front counter climbs past the literal over a long session.
Promote the per-dropdown fix into a single topPortalZ() helper in
toolWindowZOrder.js — the existing source of truth for tool-window z, already
imported by modalManager's _bringToFront and notes.js — returning
max(topToolWindowZ(), dock-chip floor) + 1, so a portaled dropdown always sits
just above the live tool-window stack however high the counter has climbed.
Route all 16 sites through it. The slashCommands tour tooltips and the
cookbookServe VRAM dialog are intentionally left out (neither is a modal-owned
portaled dropdown).
Add tests/test_portal_dropdown_z_js.py covering the helper, including the #4720
scenario (modal counter at 99999 -> dropdown at 100000). Existing
test_notes_z_order_js.py stays green.
* fix(llm): detect mistral.ai provider and support reasoning_effort (#4698)
* fix(llm): detect mistral.ai provider and support reasoning_effort
Four coupled bugs broke Mistral thinking model support:
1. _detect_provider() had no mistral.ai host check, so all Mistral
endpoints fell through to the generic 'openai' provider string.
_provider_display_name() correctly identified them as 'Mistral',
making any 'if provider == "Mistral"' check elsewhere dead code.
2. reasoning_effort parameter was never sent in the request payload,
so Mistral never activated thinking mode even when the user
configured a thinking-capable model (mistral-small-latest,
mistral-medium-latest, magistral-*).
3. Mistral returns content as a typed array
([{"type":"thinking",...},{"type":"text",...}]) when
reasoning is on, not as a plain string. Both the streaming and
non-streaming parsers expected strings and silently dropped the
thinking content.
4. _THINKING_MODEL_PATTERNS didn't include magistral or mistral-*
model prefixes, so the frontend wouldn't tag reasoning output
as thinking even after the above were fixed.
Fix:
- Add mistral.ai to _detect_provider() host checks
- Add a _normalize_mistral_content() helper that splits the typed
array into (text, thinking) strings
- Inject payload["reasoning_effort"] = "high" when provider is
Mistral and _supports_thinking(model) is true, in both stream_llm
and llm_call_async payload construction
- Wire the normalizer into both response parsers
- Extend _THINKING_MODEL_PATTERNS to include magistral,
mistral-small, mistral-medium, mistral-large
Tested on Docker install with mistral-small-latest +
reasoning_effort=high. Reasoning streams correctly into the
thinking panel after the fix.
Fixes#4678
* fix(llm): address review — lowercase provider id, configurable effort, tests
Addresses vdmkenny's review on PR #4698:
1. Removed duplicate 'if provider == "mistral"' block in stream_llm
— two back-to-back copies, one was dead-redundant.
2. Dropped personal-context comment ('free-tier limits are generous
for this user') and made reasoning_effort configurable via env var
ODYSSEUS_MISTRAL_REASONING_EFFORT (high / medium / low / none).
Default remains 'high' for backward compat with the tested behavior.
3. Recased provider id from 'Mistral' to 'mistral' to match the
lowercase convention used by every other provider id in the file
(openai, anthropic, ollama, copilot, ...). _provider_display_name()
still returns the Title-Case 'Mistral' for UI labels — only the
runtime id used in 'if provider == ...' checks was recased.
4. Added tests/test_llm_core_mistral_content.py with 13 tests pinning
_normalize_mistral_content()'s contract: string passthrough, the
Mistral array format (thinking + text blocks), and edge cases
(empty, garbage, None, wrong types, missing fields, string-vs-array
inner thinking field).
Also fixed a gap the review didn't catch: the non-streaming paths
(llm_call sync + llm_call_async) were missing the reasoning_effort
injection entirely. Added the same injection to both, so Deep Research
and agent tool calls also activate Mistral thinking.
All 13 new tests pass. Existing reasoning/streaming/ollama-thinking
tests still pass (38 tests, no regressions).
Fixes#4678
* fix: Images cannot be seen by model that is vision capable (#4726)
* fix: Images cannot be seen by model that is vision capable
* fix: skip http(s) image_url for Ollama (images[] is base64-only)
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* fix(chat): strip executed email tool fences from the live stream (#3993) (#4275)
* fix(chat): strip executed email tool fences from the live stream (#3993)
The backend strips every fenced tool block from persisted text (the regex in
src/tool_parsing.py is built from the full TOOL_TAGS set, which includes the
email tools), so a reloaded session renders cleanly. The live frontend path
uses a separate hardcoded EXEC_FENCE_RE in static/js/chatRenderer.js that only
listed web_search/read_file/write_file/create_document/edit_document/
update_document — so executed email tool fences (list_emails, etc.) lingered as
raw code blocks in the live assistant bubble until the user reloaded.
Add the nine email tool tags to EXEC_FENCE_RE so the live render settles into
the same clean layout as the history reload. bash/python stay excluded on
purpose: those are languages a user may legitimately have asked the model to
show as code, not tool invocations.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(chat): single-source live exec-fence tool list from TOOL_TAGS (#3993)
Per review: EXEC_FENCE_RE was a second, hand-maintained copy of the
executable-tool list, so any tool not in it — and every future tool added to
TOOL_TAGS — would leave its executed fence lingering in the live bubble until
reload (the original #3993 bug, recurring one tool at a time).
EXEC_FENCE_RE is now built from an explicit EXEC_TOOL_TAGS list that mirrors
TOOL_TAGS (src/agent_tools/__init__.py) minus bash/python, which stay excluded
as legitimate code-example languages. A new regression test
(test_exec_fence_re_covers_all_executable_tools) extracts both lists from
source and fails if they drift, so the whole class is caught in CI instead of
by a user — the "minimum acceptable middle ground" from the review, made exact
(set equality, not just coverage).
Verified: pytest tests/test_live_strip_email_tool_fences.py (5 passed);
node --check static/js/chatRenderer.js; and a node run of the built regex
confirms email/generate_image/manage_memory/ls fences strip while
bash/python/sh are preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(chat): build live exec-fence list from /api/tools at runtime (#3993)
Make TOOL_TAGS the single source for live exec-fence stripping. chatRenderer.js
no longer hard-codes a tool list; it fetches the backend's authoritative set
once from GET /api/tools (sorted(TOOL_TAGS)) and builds EXEC_FENCE_RE from it at
load, minus bash/python. No second list to drift, and a future tool added to
TOOL_TAGS is covered automatically — without touching the streaming path.
Until the fetch resolves EXEC_FENCE_RE is null and exec fences aren't stripped
(a sub-second window before the first stream); the backend already strips
persisted history, so a reload always renders clean.
Drop test_exec_fence_re_covers_all_executable_tools (no hand-maintained list to
guard) and add source-level guards: the frontend keeps no hard-coded list and
fetches /api/tools, and the endpoint serves the full sorted(TOOL_TAGS).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CVCKth4g8pWh7pwFDVm4iL
* fix(chat): warn on /api/tools fetch failure instead of swallowing it (#3993)
A fresh-context review flagged that loadExecFenceRegex's catch silently
discarded errors: if the one-shot fetch fails, EXEC_FENCE_RE stays null for the
whole session and live exec fences go unstripped until reload, with zero signal.
console.warn it, and correct the comment to describe the failure mode honestly
(was understated as just a sub-second startup window).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CVCKth4g8pWh7pwFDVm4iL
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(routes): log and cleanly 500 on unreadable HTML page (#4637)
* fix(routes): serve 404 instead of 500 when an HTML page file is missing
_serve_html_with_nonce opened the HTML file with no error handling, and
callers such as /backgrounds and /login pass their paths in with no
existence check, so a missing or unreadable file raised an unhandled
OSError that surfaced as a 500. Wrap the read and raise HTTPException(404)
instead; the normal render path (CSP-nonce substitution) is unchanged.
Fixes#4594
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): distinguish missing page (404) from read failure (500)
The previous fix caught a broad OSError and returned 404 for every
failure, which masks real server-side problems (permission errors, I/O
failures) as "not found" and lets them slip past error alerting. Split
FileNotFoundError (genuine 404) from other OSError, which now logs the
exception and returns a generic 500 — without leaking the OS error
string or file path into the response body.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): treat unreadable bundled HTML page as logged 500, not 404
Per PR #4637 review: every caller of the page-render helper serves a fixed,
server-owned template (index/login/backgrounds), never a client-supplied
path. So a missing or unreadable file is a server fault (broken deployment),
not a client "not found" — a 404 there mislabels a server error and hides a
missing core template from 5xx alerting, contradicting the OSError->500
rationale this PR is built on. Collapse both branches into a single logged,
leak-free 500.
Move the helper to src.app_helpers.serve_html_with_nonce so the behavior can
be unit-tested without importing the whole app (app.py is the slim
orchestrator; the test harness stubs src.database, so importing app in tests
is not viable). Add tests pinning missing/unreadable -> 500 (not 404) and
nonce injection on the happy path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* feat(catalog): add Gemma 4 12B/QAT entries and RTX 3050 bandwidth (#4728)
Add official Gemma 4 12B-it plus QAT-INT4/INT8 catalog entries (with their
GGUF sources), QAT quantization support across the quant tables and the
prequantized-prefix list, and the missing RTX 3050 / 3050 Ti memory
bandwidth so speed estimates stop falling back to the generic cuda value.
* fix debugging on windows (#4679)
* fix: Real-ESRGAN install + Cookbook deps-panel crash on the Python 3.14 image (#4694)
* fix(docker): make Real-ESRGAN installable on the Python 3.14 image
realesrgan's deps basicsr/gfpgan/facexlib (unmaintained since 2022) read
their version in setup.py via `exec(...); locals()['__version__']`, which
raises KeyError on Python 3.13+ — PEP 667 made locals() in a function an
independent snapshot that exec() can no longer mutate. That fails the
Cookbook "install realesrgan" sdist build on the python:3.14 base.
Add a `realesrgan-wheels` builder stage that fetches the pinned sdists,
patches get_version() to exec into an explicit namespace dict, and builds
wheels; the final stage installs them --no-deps so a later
`pip install realesrgan` resolves from wheels instead of rebuilding the
broken sdists. torch stays a runtime pull to keep the base image lean.
Also add the runtime libs opencv-python (cv2) needs — libgl1,
libglib2.0-0t64, libxcb1 — which the slim base omits; without them the
install succeeds but `import cv2` dies with
`libxcb.so.1: cannot open shared object file`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(cookbook): don't let a package's sys.exit() on import hang the deps panel
The local optional-dependency probe imports each package in-process and
catches ImportError / Exception. But a package can call sys.exit() at
import time — e.g. rembg does `sys.exit(1)` when no onnxruntime backend
loads. SystemExit is a BaseException, not Exception, so it escaped the
probe, propagated out of the list_packages endpoint, and hung the whole
Dependencies panel / worker (the UI loads forever).
Catch (Exception, SystemExit) so one broken optional package is reported
as not-usable instead of taking down the panel.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): 500 (not 404) when the app-shell index.html is missing (#4791)
Follow-up to #4637. serve_index — the handler for / and the SPA deep-link
routes (/notes, /calendar, /cookbook, /email, /memory, /gallery, /tasks,
/library) — pre-checked os.path.exists and raised its own
HTTPException(404, "index.html not found") when the bundle was missing. So a
missing core template returned 404 before serve_html_with_nonce's 500 could
fire, the one inconsistency left after #4637.
index.html is a fixed, app-bundled template; a missing one is a broken
deployment (server fault), not a client "not found", so it should surface as a
logged 500 in 5xx alerting rather than a 404. Keep the static->root fallback,
drop the redundant existence guard and the dead-end 404, and let the shared
helper handle the missing case.
Verified against the running app: / and /notes return 200 with the bundle
present and a logged 500 when index.html is absent.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix(setup): load .env so a pre-seeded admin password is honored on native installs (#4787)
setup.py read ODYSSEUS_ADMIN_USER / ODYSSEUS_ADMIN_PASSWORD via os.getenv()
but never loaded .env, so on native Linux/macOS installs a password
pre-seeded in .env (documented in docs/setup.md and .env.example) was
silently ignored and a random one generated, breaking the first login.
Docker was unaffected because compose passes the vars into the container env.
Call load_dotenv(BASE_DIR/.env, encoding="utf-8-sig") at the top of main(),
mirroring app.py (utf-8-sig tolerates a Notepad UTF-8 BOM). load_dotenv does
not override already-exported OS vars, so the existing precedence is kept.
python-dotenv is already a required dependency.
Adds a regression test that pre-seeds credentials only in .env (not the
shell) and asserts the stored bcrypt hash matches the pre-seeded password.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: email poller marks calendar extraction processed on LLM failure (#4622)
Move calendar processed-marker insert into the LLM success path (else branch).
Previously, the INSERT ran even after a transient LLM failure, causing the
poller to skip retrying calendar extraction on subsequent runs.
Minimal change: only touches the try/except/else control flow in
_auto_summarize_pass_single() — preserves existing formatting and line endings.
* feat(ui): add toggle for padding around chat area (#4691)
* feat: Allow admins to choose if they want to share defaults (#4752)
* First bare fix
* Adding the option toggle
* toggle function fix
* Final fix, added missing /auth/
* Extended toggle text & added tests
* Comments change
* Description toggle change
* br tag fix
* description change based on suggestion
* fix(agent): parse misfenced read_file calls (#4799)
* fix: use atomic write in APIKeyManager.save() to prevent credential data loss (#4591) (#4597)
* fix: use atomic write in APIKeyManager.save() to prevent data loss
Opening api_keys.json with 'w' truncates the file before writing, so a
crash, disk-full, or mid-write error leaves all stored provider API keys
corrupted. Switch to atomic write (temp file + fsync + os.replace) so
the original file is always intact on any failure.
Fixes#4591
* chore: trigger CI re-run
* chore: update PR description
* chore: fix how-to-test section for description check
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* feat(discovery): detect llama.cpp servers and label local providers (#4729)
* feat(discovery): detect llama.cpp servers and label local providers
Scan port 8080 (llama-server) and 11435 (APFEL) during discovery, fingerprint
llama.cpp via its native /props endpoint, and label well-known local serving
ports (8080 llama.cpp, 8000 vLLM, 1234 LM Studio, 11434 Ollama) consistently
in both the Python provider helper and the JS endpoint UI. Adds a llama.cpp
hint to the /setup slash command.
* fix(discovery): don't infer the serving tool from the port alone
Per review: vLLM, SGLang, llama.cpp and plain OpenAI-compatible servers all
share 8000/8080, so labeling by port mislabels real setups (a vLLM box on 8080
shown as llama.cpp). Drop the port->tool assertions from _provider_label and
providerLabel; the authoritative signal is the /props fingerprint done during
discovery, which is unchanged. Loopback now reads a neutral 'local endpoint' /
'Local'. Tests updated to assert the neutral labels.
* refactor(tools): migrate config/integration admin tools to the registry (#4742)
Part of #3629 (the `admin_tools.py` bullet). Moves the config/integration admin
tools off the legacy elif dispatch chain in tool_implementations.py onto the
agent_tools registry:
manage_endpoints, manage_mcp, manage_webhooks, manage_tokens, manage_settings
The do_* implementations (and manage_mcp's command-allowlist / RCE guard:
_validate_mcp_command, _mcp_allowed_commands, and the _MCP_* constants) move
verbatim into the new src/agent_tools/admin_tools.py. They register through a
single ADMIN_TOOL_HANDLERS map that TOOL_HANDLERS.update()s, and the five elif
branches plus their imports are dropped from tool_execution.py, so these tools
now flow through _direct_fallback like the other migrated clusters. The names
are re-exported from src.agent_tools for back-compat.
Dedup:
- _parse_tool_args was duplicated in tool_implementations.py and
document_tools.py. It now lives once in src.tool_utils (which imports nothing
from the project beyond src.constants, so this introduces no cycle) and both
call sites import it from there. The orphaned `import json` in document_tools
is removed with it.
- The five tools share one _owner_adapter(fn) factory that threads ctx["owner"]
into the owner-taking do_* signature, instead of five near-identical wrappers.
Tests: new tests/test_admin_tools_registry.py pins the registration, the
re-export back-compat, the owner-threading adapter, and the single-source
_parse_tool_args (across admin_tools and document_tools). Existing MCP /
settings / webhook suites are repointed at the new module.
* refactor(exceptions): dedupe src/exceptions via core re-export (#4785)
src/exceptions.py was a byte-for-byte duplicate of the canonical
core/exceptions.py. Replace its class bodies with a re-export shim
(mirroring the core/constants.py -> src/constants.py pattern) so the
exception classes are defined in exactly one place. Also fix the stale
"# src/exceptions.py" header comment in core/exceptions.py.
No behavior change: both import paths resolve to the same class objects
(verified by identity), so `except SessionNotFoundError` works regardless
of which module it was imported from. Ran py_compile and
pytest tests/test_app.py (12 passed).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(tasks): normalize task endpoint URL to /chat/completions before model call (#4619)
Upstream bug (present in pewdiepie-archdaemon/odysseus main): the task
executor passes task.endpoint_url VERBATIM to the model HTTP call, unlike
the chat path which stores build_chat_url(normalize_base(base)) on the
session. A task carrying an explicit bare OpenAI-compatible base such as
"http://host:11434/v1" therefore POSTs to a 404 ("page not found"); the
agent loop swallows the empty body into "The model returned an empty
response" and marks the run success, so nothing surfaces the failure.
Tasks that omit an endpoint dodge this only because _resolve_defaults()
cribs an already-full URL from a recent chat session. The API/token path
(e.g. an external client that POSTs /api/tasks with endpoint_url=".../v1")
hits it every time.
Fix: route every resolved task endpoint through _normalize_chat_endpoint()
at the three resolution sites (_execute_llm_task, the persona/research
session path, and _execute_research_task). The helper is idempotent
(strips any existing chat suffix, re-appends the correct one) and leaves
native-Ollama (/api...) and already-concrete URLs untouched, so other
providers are unaffected. Proven via isolated repro: ".../v1" -> 404 ->
empty; ".../v1/chat/completions" -> 200 -> real gemma4:31b output.
Regression test asserts the bare-/v1 -> full-chat-URL mapping, idempotency,
and the native-Ollama/empty passthroughs.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(model-routes): harden _probe_endpoint against malformed model-list responses (#4789)
* fix(model-routes): harden _probe_endpoint against malformed model-list responses
_probe_endpoint parsed model lists with data.get(...) at four sites without
checking that data is a dict, and built the list with a truthiness-only
filter. A /models (or /api/tags) endpoint returning HTTP 200 with valid but
non-dict JSON ([], "x", null, 123) made data.get(...) raise AttributeError,
and a non-string id like 123 passed the filter and then hit .startswith() /
.lower() in the Z.AI/Kimi curated merge and _is_chat_model(). Both errors are
swallowed by the broad except Exception, but the comprehension dies mid-list
so the ENTIRE probed model list is discarded and the endpoint silently
degrades — masking a misconfigured/non-compliant upstream as "no models".
- Guard each data.get(...) with isinstance(data, dict) so a non-dict body
falls through the existing `or []` default.
- Restrict the OpenAI and Ollama model-list comprehensions to non-empty str
values, protecting the .startswith() merges and both _is_chat_model calls.
- Add an isinstance guard at the top of _is_chat_model (defense in depth for
all four call sites).
No behavior change for well-formed {"data":[...]} / {"models":[...]}
responses. Adds regression tests (non-dict body via caplog, mixed/all
non-string ids, _is_chat_model boundary) that fail before the fix and pass
after.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(model-routes): extract _openai_model_ids / _ollama_model_names helpers
Per review on #4789: the malformed-response guards were inlined four times in
_probe_endpoint (two OpenAI-id comprehensions, two Ollama-name comprehensions).
Pull each into a small, directly-testable helper so the security-relevant
parsing lives in one place and a future malformed-shape fix doesn't have to be
applied in four spots (CONTRIBUTING flags repeated logic for this reason).
Behavior is unchanged. Adds direct unit tests for both helpers (non-dict body,
non-string ids, non-dict entries, name>model precedence).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(cookbook): only block model launch on real port collisions (#4760)
* Fix#4507: only block model launch on real port collisions
Quick-run hardcoded port 8000 and never called _nextAvailablePort(), so
every launch collided. Both pre-launch guards (serve panel + quick-run)
were count-based and fired regardless of port.
- quick-run now auto-assigns a free port (8080 for llama.cpp)
- both guards parse the new port and only prompt on a real overlap,
stopping only the colliding serve
- dialog reports the actual port instead of a hardcoded 8000
* refactor(cookbook): share _taskPort for port parsing; auto-assign llama.cpp port
Addresses review on #4760:
- _taskPort regex now matches --port= as well as --port (space)
- _nextAvailablePort and both launch guards reuse _taskPort instead of inline regex
- quick-run llama.cpp no longer pins 8080, so two can run concurrently
* fix(cookbook): _taskPort also parses -p; add port-parsing tests
Addresses review on #4760:
- _taskPort now matches -p <n> too, so it's the complete single reader
(was missing the short flag that other readers already handle)
- add tests/test_cookbook_port_parsing_js.py covering the port forms,
shared-reader reuse, and llama.cpp auto-assign
* test(cookbook): extract pure port helpers and test behavior
Addresses review on #4760: the prior tests only asserted source strings.
- extract portOf() and nextFreePort() into static/js/cookbookPorts.js
- cookbookRunning.js imports them; _taskPort and _nextAvailablePort delegate
- tests run the helpers via node and assert real behavior: all port forms
(--port, --port=, -p, -p=), next-free-port skipping taken ports, and the
same-port-clash / different-port-coexist outcome
---------
Co-authored-by: samy <samy@odysseus.boukouro.com>
* fix(ui): route tasks.js + skills.js dropdowns through topPortalZ() (#4768)
Fixes#4767. #4724 routed 16 body-portaled dropdowns through the shared
topPortalZ() helper so they always render just above the currently-raised tool
modal, but two were missed and still used a hardcoded z-index, so they hit the
same #4720 bug once a modal's bring-to-front counter climbed past the literal:
- tasks.js _showTaskDropdown(): inline z-index:100000 on .task-dropdown
- skills.js kebab menu (.skill-kebab-menu): z-index:100002 in style.css
Both now set zIndex from topPortalZ() after they are appended to the body,
matching the other migrated sites. The dead CSS z-index on .skill-kebab-menu is
removed (the inline value always wins). test_portal_dropdown_z_js.py gains a
source guard asserting both files use topPortalZ() and that no hardcoded
100000/100002 portal literal survives in either file or style.css.
* do_list_models in ai_interaction.py dropped
---------
Co-authored-by: Max Hsu <maxmilian@users.noreply.github.com>
Co-authored-by: aubrey <kyuhex@gmail.com>
Co-authored-by: Michael <52305679+michaelxer@users.noreply.github.com>
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Ahmed Dlshad <ahmed.dlshad.m@gmail.com>
Co-authored-by: Joel Alejandro Escareño Fernández <52678667+TheAlexz@users.noreply.github.com>
Co-authored-by: Kalin Stoyanov <kgs.void@gmail.com>
Co-authored-by: Pedro Barbosa <devpedrobarbosa@gmail.com>
Co-authored-by: Solanki Sumit <125974181+YAMRAJ13y@users.noreply.github.com>
Co-authored-by: Rudra Sarker <78224940+rudra496@users.noreply.github.com>
Co-authored-by: Skoh <101289702+SkohTV@users.noreply.github.com>
Co-authored-by: Jakub Grula <ramsters110@gmail.com>
Co-authored-by: Dividesbyzer0 <54127744+zoomdbz@users.noreply.github.com>
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
Co-authored-by: Magiomakes <114195802+Magiomakes@users.noreply.github.com>
Co-authored-by: Samy <12219635+touzenesmy@users.noreply.github.com>
Co-authored-by: samy <samy@odysseus.boukouro.com>
A single-day all-day event whose source writes DTEND equal to DTSTART
(treating DTEND as an inclusive bound rather than the RFC 5545 exclusive
one) was stored verbatim as a zero-duration row. list_events selects
events overlapping the window with `dtstart < end AND dtend > start`, so
that row is filtered out for any window starting at or after its date and
the event never appears, even though the import reported success.
Events created via the API never hit this because creation always
synthesizes a positive duration; only the two import paths can persist a
non-positive one. Clamp a non-positive end at import (import_ics and the
CalDAV pull) to the same default span used when DTEND is absent: one day
for all-day events, one hour otherwise.
Also repair the persisted state for users who already imported before this
clamp existed. Their stored zero-duration row is invisible, and re-importing
the same ICS hit the duplicate branch and skipped without touching it, so
the event stayed hidden. The duplicate branch now backfills the clamp onto
the matched row before skipping, and the response reports a `repaired` count.
(The CalDAV pull already rewrites dtend on re-sync, so it self-heals.)
_parse_msg_content deserializes stored multimodal content (image/audio
blocks) back into a list. It treated ANY string starting with '[{' and
containing the substring "type" as serialized content, requiring only
that each element be a dict — never that "type" be a real content-block
kind. So a plain text message whose content happens to be a JSON array
of typed objects (e.g. a user pasting an API schema sample like
[{"type": "object", ...}]) was silently parsed from str into a list on
the next hydration, destroying the original string. This runs on every
session load from the DB (_db_to_session -> get_session). Restrict the
round-trip to non-empty lists whose every element is a dict whose
"type" is a recognized block kind (text/image/image_url/audio/...);
real multimodal content (verified: document_processor emits exactly
these) still round-trips, JSON-looking text is left untouched.
The Codex cookbook bridge authorized cookie sessions with require_user()
only, allowing non-admin accounts to read cookbook task state, server
topology, task logs, tmux sessions, and model presets. The stop/adopt
routes also execute local or SSH-backed tmux commands.
Add _require_cookbook_scope() that enforces require_admin() for
cookie-session callers while preserving the existing API-token scope
checks. Apply it to all nine /api/codex/cookbook/* routes.
Fixes#4542
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Prevent untrusted source/context guard text from being merged into the current visible user request during provider message sanitization.
Changes:
- Detect untrusted context blocks during LLM message sanitization
- Insert a short assistant boundary before the current user request
- Keep the visible user prompt as its own user message
- Preserve normal consecutive user-message merging for non-untrusted cases
- Strengthen prompt-security wording to avoid mentioning guard wrappers
- Add regression coverage for untrusted context followed by a user prompt
Notes:
- Untrusted context remains role:user for safety
- This does not add prompt debug logging
- This does not change frontend draft persistence
The lazy `<think>.*?</think>` pattern (one compiled `_THINK_RE`, one inline
copy) is applied with `re.sub` over whole model responses. With a `<think>`
opener and no closer, the engine rescans to end-of-string from every opener
-> O(n^2) on attacker-influenced output (prompt injection can echo thousands
of openers via tool output / retrieved content). CodeQL py/polynomial-redos.
Replace both with `_strip_think_blocks`, a forward-only linear scan that is
byte-for-byte equivalent to the original narrow regex: only literal
`<think>`/`</think>` (any case) match, a dangling opener with no closer is
left intact, and an orphan `</think>` is never stripped. Routing through the
broader `text_helpers.strip_think` was avoided on purpose -- it also strips
`<thinking>`, attributes and prompt echoes, which would change what the
loop's progress/circling heuristics see.
Adds tests/test_redos_think_blocks.py pinning regex-equivalence on a battery
of well-formed/edge inputs plus a linear-time bound on hostile input.
Per code audit #4388: Wrap _migrate_single_user and
_drop_reserved_loaded_users with _config_lock to ensure atomic
config reads/writes and prevent potential race conditions during
concurrent access.
This is a defense-in-depth fix - these methods run at startup
before concurrent requests are accepted, but adding the lock
makes the code consistent with other config mutations.
The email-account endpoints coerced user-supplied ports with a bare int(data.get("imap_port") or 993), so a non-numeric port (e.g. "imap") raised ValueError and surfaced as an HTTP 500 in the create, update, and test-config endpoints.
Add a _coerce_port(value, default) -> (port, error) helper and use it in all three endpoints, returning the endpoints standard {"ok": False, "error": ...} response (matching the existing "name required" validation) instead of crashing. A blank or missing port still falls back to the default (993/465).
ROADMAP "Self-host troubleshooting cookbook" asks to document the weird
30-second fixes that otherwise become 30-minute searches. Adds a "Common
self-host traps" subsection under Troubleshooting covering: the UTF-8 BOM
.env gotcha (app.py loads with utf-8-sig), macOS AirPlay holding port 7000
(the start script uses 7860), the plain-HTTP Tailscale/LAN clipboard
limitation, self-hosted ntfy delivery (NTFY_BIND/NTFY_BASE_URL + the ntfy
Android Instant-delivery toggle), Dovecot cleartext-auth on LAN mail stacks,
and Radicale full-collection-URL sync.
Docs only; grounded in existing repo behavior (.env.example NTFY_* block,
app.py utf-8-sig loader, start-macos.sh port choice).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(tools): add shim protection test for tool_implementations split
Covers all 48 top-level functions (33 do_* + 15 _helpers) extracted from
the original module. Guards the upcoming split: the shim must re-export
every symbol so existing 'from src.tool_implementations import X' imports
keep working. Passes on baseline (pre-split).
* refactor(tools): add src/tools/ package with shared _common
Slice 1 Task 2 (#4082/#4071). Adds the package skeleton and moves the
shared _parse_tool_args helper into src/tools/_common.py. Domain modules
will import from here. tool_implementations.py is untouched at this step.
* refactor(tools): extract system domain into src/tools/system.py
Slice 1 (#4082/#4071), Task 3: move the system-domain tool functions
(do_manage_skills/_skill_dump/do_manage_tasks/do_manage_endpoints/
do_manage_mcp/do_manage_webhooks/do_manage_tokens/do_manage_settings/
do_api_call/do_app_api) and the app_api blocklist constants out of
tool_implementations.py into a new src/tools/system.py module.
tool_implementations.py re-imports all of them so it stays a working
backward-compatible facade (shim test stays green).
- do_manage_mcp resolves get_mcp_manager via a function-local import
from tool_implementations so the test that patches
src.tool_implementations.get_mcp_manager still applies post-move.
- do_app_api imports _internal_headers and _INTERNAL_BASE (still in
tool_implementations) function-locally to avoid a circular import.
- Repoint test_context_budget introspection assertion to the moved
code's new home in src/tools/system.py.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(tools): extract cookbook domain into src/tools/cookbook.py
Moves the model-serving (cookbook) tool domain out of tool_implementations.py
into src/tools/cookbook.py as part of slice 1 (#4082/#4071):
- 13 do_* tools: download/serve/list/stop/tail/search/adopt/cached models,
list downloads/cancel, list cookbook servers, serve presets
- 9 private helpers: _cookbook_servers, _resolve_cookbook_host,
_cookbook_env_for_host, _infer_serve_{port,host}, _ensure_served_endpoint,
_cookbook_register_task, _cookbook_apply_retry_suggestion,
_scan_running_model_processes, _cookbook_kill_session
- _MODEL_PROCESS_PATTERNS constant (used only by _scan_running_model_processes)
tool_implementations.py stays a backward-compatible facade via a re-import
from src.tools.cookbook; src/tools/__init__ re-exports the same symbols.
_internal_headers and _INTERNAL_BASE stay in tool_implementations.py (shared
by system.py's do_app_api and many cookbook funcs). Each cookbook function
that needs them does a function-local import to avoid a top-level circular
dependency, matching the system-domain split.
Verified: compileall clean; shim test green; cookbook-touching suite
(652 passed, 1 skipped); full suite 3587 passed, 2 failed
(pre-existing test_api_chat_security, unrelated).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(tools): extract search domain into src/tools/search.py
* refactor(tools): extract notes domain into src/tools/notes.py
* refactor(tools): extract calendar domain into src/tools/calendar.py
Repoints tests/test_caldav_bidirectional_sync.py source-introspection
to src/tools/calendar.py (do_manage_calendar moved there).
* refactor(tools): extract image domain into src/tools/image.py
* refactor(tools): extract research domain into src/tools/research.py
* refactor(tools): extract contacts domain into src/tools/contacts.py
* refactor(tools): extract vault domain into src/tools/vault.py
Repoints tests/test_vault_password_not_in_argv.py source-introspection
to src/tools/vault.py (the vault do_* helpers moved there).
* refactor(tools): collapse tool_implementations to clean re-export shim
Move shared _INTERNAL_BASE/_internal_headers to src/tools/_common.py and
drop the duplicate _parse_tool_args (already in _common). tool_implementations.py
is now a pure re-export facade (+ 3 pre-existing email-context helpers, out of
scope). Domain files' function-local imports of these names still resolve via
the facade re-export.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tools): port upstream cookbook workflow changes to split module
Rebase onto dev dropped c504214 ("Cookbook model workflow fixes") edits
to do_serve_model / do_tail_serve_output: the extraction commit moved
the pre-edit bodies into src/tools/cookbook.py and git auto-accepted the
deletion from tool_implementations.py, losing dev's changes. Restore them
in their post-split home:
- do_serve_model: add where/log_path/next_tools and the expanded
"Next required check" output message
- do_tail_serve_output: empty-output fallback message replacing
"(empty pane)"
(do_manage_settings web_fetch alias edit was already applied to
src/tools/system.py during the system-extract conflict resolution.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tools): break admin_tools circular import in split facade
After rebasing onto dev (#3629 moved the admin manage_* tools into
src/agent_tools/admin_tools), the facade re-exported them via a top-level
`from src.agent_tools.admin_tools import ...`. But src.agent_tools.__init__
imports this facade at top level, so the eager import re-entered the
partially-initialized agent_tools package and broke collection.
Re-export the admin symbols (do_manage_endpoints/mcp/webhooks/tokens/
settings, _MCP_DENIED_COMMANDS, _validate_mcp_command) lazily through
module __getattr__ instead, and drop them from src/tools/__init__ (they
no longer live in the src.tools package). system.py now holds only the
skills/tasks/api bridges; admin tools live solely in admin_tools.py,
matching upstream.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tools): re-export dropped helpers through the split shim
Address review finding from #4423: the compatibility facade claimed to
preserve every original top-level symbol but omitted three helpers the
old src.tool_implementations exposed. Re-export them and pin them in
the shim protection test:
- _string_arg, _validate_cookbook_ssh_target <- src/tools/cookbook.py
- _mcp_allowed_commands <- src/agent_tools/admin_tools.py (lazily via
__getattr__, to keep the agent_tools.__init__ <-> facade import acyclic
after the #3629 admin-tools migration)
All three added to tests/test_tool_implementations_shim.py _EXPECTED so
the test contract now matches its "every original top-level function"
comment.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(tools): self-verify shim re-exports every domain do_*
The hand-maintained _EXPECTED list in the shim protection test can drift
silently when a new tool is added to a domain module but not re-exported
by the facade — exactly the omission a reviewer flagged post-split.
Add an auto-discovering test that enumerates every do_* from the domain
modules (incl. admin_tools) and asserts reachability through the shim,
so a forgotten re-export fails the build automatically.
Uses hasattr (not dir(ti)) because the admin symbols are re-exported
lazily via module __getattr__ and don't appear in dir(ti).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(tools): self-verify every in-repo facade import resolves
RaresKeY's P3 on the shim test was a claim-vs-reality gap: the docstring
said it protected "every from src.tool_implementations import X" but the
hand-maintained _EXPECTED list omitted three underscore helpers, so the
claim wasn't enforced. Re-exporting the three (cf1f5e3) fixed the known
gap; this closes the structural one.
Add test_every_facade_import_in_repo_resolves: ast-enumerate every
`from src.tool_implementations import X` site in src/ and tests/ and
assert hasattr(ti, X) for each. A forgotten re-export that anything in
the repo imports now fails the build automatically — including underscore
helpers, which the do_* discovery test does not cover.
Together with test_shim_reexports_every_domain_do_function, the shim
contract is now self-verifying. Demote _EXPECTED in the docstring to the
curated historical/downstream surface (the three helpers have no in-repo
consumer, so they stay manual by necessity) instead of "ground truth".
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(tools): dedupe _parse_tool_args + align shim guard with route consumers
Addresses two P3s from review (RaresKeY, 2026-06-26):
1. maintainability — _common carried a full copy of _parse_tool_args
alongside the canonical src.tool_utils one; future parser fixes could
diverge. The two bodies were byte-identical in logic, so _common now
re-exports from tool_utils (a leaf module, no circular-import risk).
The single-source test is extended to assert _common._parse_tool_args
and tool_implementations._parse_tool_args are the same object as
tool_utils._parse_tool_args.
2. test — the shim guard's import-site scan only walked src/ and tests/,
missing routes/chat_routes.py's clear_active_email/set_active_email
imports, and _EXPECTED omitted the active-email facade helpers. The
scan now walks every first-party Python dir (pruning venvs/caches/data
in-place), and set/get/clear_active_email are added to _EXPECTED
(get_active_email has no in-repo importer, so the scan alone can't see
it).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: yuandonghao <yuandonghao@cohl.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* feat: add dismiss (×) button to all toast notifications (#1355)
* Refresh README presentation
* fix: reset pointer-events on toast dismiss button click
Action toasts set pointer-events:auto on #toast for their clickable
button, but the × close-button handler only cleared the auto-hide timer
without resetting pointer-events. This left an invisible fixed overlay
blocking clicks in the top-right area after manual dismissal.
- Add pointerEvents reset in both showToast and showError close handlers
- Add DOM behavior tests for pointer-events across all toast types
---------
Co-authored-by: pewdiepie-archdaemon <pewdiepie-archdaemon@users.noreply.github.com>
* fix: include in-memory templates in group participant character list
_getCharacterList() only fetched user templates from the /api/presets/templates
endpoint. When a character was just created in the Character tab, the async
auto-save to the templates API might not have completed by the time the Group
tab loaded its participant dropdown — causing newly created characters to be
missing.
Now also merges the in-memory userTemplates array from presets.js as a
fallback. These are updated as soon as the async save completes (via the
loadUserTemplates callback), so they bridge the gap between character creation
and API persistence.
Fixes#3207
* fix: optimistic userTemplates update on character save
Update the in-memory userTemplates array immediately when saveCustomPreset()
succeeds, before the fire-and-forget templates API POST completes. This
bridges the timing gap where _getCharacterList() calls getUserTemplates()
and gets stale data because loadUserTemplates() hasn't been triggered yet.
* test: verify group participant dropdown merges in-memory templates
Source-level guards for the #3207 fix:
- group.js imports and calls getUserTemplates() to merge in-memory templates
- presets.js exports getUserTemplates and does optimistic in-memory update on save
5 tests ensuring the fix can't be silently reverted.
* fix: generate client-side id for optimistic update, return shallow copy from getUserTemplates
1. New characters now get a 'user-<hex>' id immediately on save, matching
the server's convention (uuid.uuid4().hex[:8]). Previously the id was ''
which the merge guard in _getCharacterList filtered as falsy.
2. getUserTemplates() now returns [...userTemplates] so callers cannot
accidentally mutate module state.
* fix(group.js): fix selection drop-downs behavior
- add an identifier to the selection drop-downs
based on what type it is.
- fix behavior of continuously adding a row
when a user clicks the "Group" tab button.
- fix behavior of not repopulating existing
selection drop-downs whenever a user
clicks the "Group" tab button.
* fix(#3207): remove duplicate of latest persona
- fix the duplication of the latest persona
or character being shown in selection
drop-downs.
- remove unnecessary blocks of code in
`_getCharacterList()`
- add functionality to show error toast if saving
a preset template/character fails.
- add functionality to revert optimistic update
of preset template/character if saving fails.
* chore(group.js,preset.js): fix test & format errors
remove trailing whitespaces in lines 230 and 232
in /static/group.js
add back the expected syntax from
tests/test_group_character_dropdown.py
* fix(presets.js,group.js): fix runtime errors
as stated in a comment by @alteixeira20,
runtime errors exist for the applied fixes.
fixes:
- missing ending `]`
querySelectorAll("select.preset-input[data-selection-type=character")
in `group.js`
- spelling error in `modelSelection.vale` in `group.js`
- fix the ordering logic error in optimistic rollback where `Object.assign` is called first before the clone happens in `saveCustomPreset` in `presets.js`.
- add tests for the cloning logic bug with the same format as previous tests by checking the order of LOC in `tests/test_group_character_dropdown.py`.
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* fix(cookbook): prefer native llama-server on local Windows
* fix(cookbook): harden local llama-server launch commands
* fix(cookbook): build serve commands for selected target
Fixes#4767. #4724 routed 16 body-portaled dropdowns through the shared
topPortalZ() helper so they always render just above the currently-raised tool
modal, but two were missed and still used a hardcoded z-index, so they hit the
same #4720 bug once a modal's bring-to-front counter climbed past the literal:
- tasks.js _showTaskDropdown(): inline z-index:100000 on .task-dropdown
- skills.js kebab menu (.skill-kebab-menu): z-index:100002 in style.css
Both now set zIndex from topPortalZ() after they are appended to the body,
matching the other migrated sites. The dead CSS z-index on .skill-kebab-menu is
removed (the inline value always wins). test_portal_dropdown_z_js.py gains a
source guard asserting both files use topPortalZ() and that no hardcoded
100000/100002 portal literal survives in either file or style.css.
* Fix#4507: only block model launch on real port collisions
Quick-run hardcoded port 8000 and never called _nextAvailablePort(), so
every launch collided. Both pre-launch guards (serve panel + quick-run)
were count-based and fired regardless of port.
- quick-run now auto-assigns a free port (8080 for llama.cpp)
- both guards parse the new port and only prompt on a real overlap,
stopping only the colliding serve
- dialog reports the actual port instead of a hardcoded 8000
* refactor(cookbook): share _taskPort for port parsing; auto-assign llama.cpp port
Addresses review on #4760:
- _taskPort regex now matches --port= as well as --port (space)
- _nextAvailablePort and both launch guards reuse _taskPort instead of inline regex
- quick-run llama.cpp no longer pins 8080, so two can run concurrently
* fix(cookbook): _taskPort also parses -p; add port-parsing tests
Addresses review on #4760:
- _taskPort now matches -p <n> too, so it's the complete single reader
(was missing the short flag that other readers already handle)
- add tests/test_cookbook_port_parsing_js.py covering the port forms,
shared-reader reuse, and llama.cpp auto-assign
* test(cookbook): extract pure port helpers and test behavior
Addresses review on #4760: the prior tests only asserted source strings.
- extract portOf() and nextFreePort() into static/js/cookbookPorts.js
- cookbookRunning.js imports them; _taskPort and _nextAvailablePort delegate
- tests run the helpers via node and assert real behavior: all port forms
(--port, --port=, -p, -p=), next-free-port skipping taken ports, and the
same-port-clash / different-port-coexist outcome
---------
Co-authored-by: samy <samy@odysseus.boukouro.com>
* fix(model-routes): harden _probe_endpoint against malformed model-list responses
_probe_endpoint parsed model lists with data.get(...) at four sites without
checking that data is a dict, and built the list with a truthiness-only
filter. A /models (or /api/tags) endpoint returning HTTP 200 with valid but
non-dict JSON ([], "x", null, 123) made data.get(...) raise AttributeError,
and a non-string id like 123 passed the filter and then hit .startswith() /
.lower() in the Z.AI/Kimi curated merge and _is_chat_model(). Both errors are
swallowed by the broad except Exception, but the comprehension dies mid-list
so the ENTIRE probed model list is discarded and the endpoint silently
degrades — masking a misconfigured/non-compliant upstream as "no models".
- Guard each data.get(...) with isinstance(data, dict) so a non-dict body
falls through the existing `or []` default.
- Restrict the OpenAI and Ollama model-list comprehensions to non-empty str
values, protecting the .startswith() merges and both _is_chat_model calls.
- Add an isinstance guard at the top of _is_chat_model (defense in depth for
all four call sites).
No behavior change for well-formed {"data":[...]} / {"models":[...]}
responses. Adds regression tests (non-dict body via caplog, mixed/all
non-string ids, _is_chat_model boundary) that fail before the fix and pass
after.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(model-routes): extract _openai_model_ids / _ollama_model_names helpers
Per review on #4789: the malformed-response guards were inlined four times in
_probe_endpoint (two OpenAI-id comprehensions, two Ollama-name comprehensions).
Pull each into a small, directly-testable helper so the security-relevant
parsing lives in one place and a future malformed-shape fix doesn't have to be
applied in four spots (CONTRIBUTING flags repeated logic for this reason).
Behavior is unchanged. Adds direct unit tests for both helpers (non-dict body,
non-string ids, non-dict entries, name>model precedence).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upstream bug (present in pewdiepie-archdaemon/odysseus main): the task
executor passes task.endpoint_url VERBATIM to the model HTTP call, unlike
the chat path which stores build_chat_url(normalize_base(base)) on the
session. A task carrying an explicit bare OpenAI-compatible base such as
"http://host:11434/v1" therefore POSTs to a 404 ("page not found"); the
agent loop swallows the empty body into "The model returned an empty
response" and marks the run success, so nothing surfaces the failure.
Tasks that omit an endpoint dodge this only because _resolve_defaults()
cribs an already-full URL from a recent chat session. The API/token path
(e.g. an external client that POSTs /api/tasks with endpoint_url=".../v1")
hits it every time.
Fix: route every resolved task endpoint through _normalize_chat_endpoint()
at the three resolution sites (_execute_llm_task, the persona/research
session path, and _execute_research_task). The helper is idempotent
(strips any existing chat suffix, re-appends the correct one) and leaves
native-Ollama (/api...) and already-concrete URLs untouched, so other
providers are unaffected. Proven via isolated repro: ".../v1" -> 404 ->
empty; ".../v1/chat/completions" -> 200 -> real gemma4:31b output.
Regression test asserts the bare-/v1 -> full-chat-URL mapping, idempotency,
and the native-Ollama/empty passthroughs.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
src/exceptions.py was a byte-for-byte duplicate of the canonical
core/exceptions.py. Replace its class bodies with a re-export shim
(mirroring the core/constants.py -> src/constants.py pattern) so the
exception classes are defined in exactly one place. Also fix the stale
"# src/exceptions.py" header comment in core/exceptions.py.
No behavior change: both import paths resolve to the same class objects
(verified by identity), so `except SessionNotFoundError` works regardless
of which module it was imported from. Ran py_compile and
pytest tests/test_app.py (12 passed).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Part of #3629 (the `admin_tools.py` bullet). Moves the config/integration admin
tools off the legacy elif dispatch chain in tool_implementations.py onto the
agent_tools registry:
manage_endpoints, manage_mcp, manage_webhooks, manage_tokens, manage_settings
The do_* implementations (and manage_mcp's command-allowlist / RCE guard:
_validate_mcp_command, _mcp_allowed_commands, and the _MCP_* constants) move
verbatim into the new src/agent_tools/admin_tools.py. They register through a
single ADMIN_TOOL_HANDLERS map that TOOL_HANDLERS.update()s, and the five elif
branches plus their imports are dropped from tool_execution.py, so these tools
now flow through _direct_fallback like the other migrated clusters. The names
are re-exported from src.agent_tools for back-compat.
Dedup:
- _parse_tool_args was duplicated in tool_implementations.py and
document_tools.py. It now lives once in src.tool_utils (which imports nothing
from the project beyond src.constants, so this introduces no cycle) and both
call sites import it from there. The orphaned `import json` in document_tools
is removed with it.
- The five tools share one _owner_adapter(fn) factory that threads ctx["owner"]
into the owner-taking do_* signature, instead of five near-identical wrappers.
Tests: new tests/test_admin_tools_registry.py pins the registration, the
re-export back-compat, the owner-threading adapter, and the single-source
_parse_tool_args (across admin_tools and document_tools). Existing MCP /
settings / webhook suites are repointed at the new module.
* feat(discovery): detect llama.cpp servers and label local providers
Scan port 8080 (llama-server) and 11435 (APFEL) during discovery, fingerprint
llama.cpp via its native /props endpoint, and label well-known local serving
ports (8080 llama.cpp, 8000 vLLM, 1234 LM Studio, 11434 Ollama) consistently
in both the Python provider helper and the JS endpoint UI. Adds a llama.cpp
hint to the /setup slash command.
* fix(discovery): don't infer the serving tool from the port alone
Per review: vLLM, SGLang, llama.cpp and plain OpenAI-compatible servers all
share 8000/8080, so labeling by port mislabels real setups (a vLLM box on 8080
shown as llama.cpp). Drop the port->tool assertions from _provider_label and
providerLabel; the authoritative signal is the /props fingerprint done during
discovery, which is unchanged. Loopback now reads a neutral 'local endpoint' /
'Local'. Tests updated to assert the neutral labels.
* fix: use atomic write in APIKeyManager.save() to prevent data loss
Opening api_keys.json with 'w' truncates the file before writing, so a
crash, disk-full, or mid-write error leaves all stored provider API keys
corrupted. Switch to atomic write (temp file + fsync + os.replace) so
the original file is always intact on any failure.
Fixes#4591
* chore: trigger CI re-run
* chore: update PR description
* chore: fix how-to-test section for description check
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* First bare fix
* Adding the option toggle
* toggle function fix
* Final fix, added missing /auth/
* Extended toggle text & added tests
* Comments change
* Description toggle change
* br tag fix
* description change based on suggestion
Move calendar processed-marker insert into the LLM success path (else branch).
Previously, the INSERT ran even after a transient LLM failure, causing the
poller to skip retrying calendar extraction on subsequent runs.
Minimal change: only touches the try/except/else control flow in
_auto_summarize_pass_single() — preserves existing formatting and line endings.
setup.py read ODYSSEUS_ADMIN_USER / ODYSSEUS_ADMIN_PASSWORD via os.getenv()
but never loaded .env, so on native Linux/macOS installs a password
pre-seeded in .env (documented in docs/setup.md and .env.example) was
silently ignored and a random one generated, breaking the first login.
Docker was unaffected because compose passes the vars into the container env.
Call load_dotenv(BASE_DIR/.env, encoding="utf-8-sig") at the top of main(),
mirroring app.py (utf-8-sig tolerates a Notepad UTF-8 BOM). load_dotenv does
not override already-exported OS vars, so the existing precedence is kept.
python-dotenv is already a required dependency.
Adds a regression test that pre-seeds credentials only in .env (not the
shell) and asserts the stored bcrypt hash matches the pre-seeded password.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to #4637. serve_index — the handler for / and the SPA deep-link
routes (/notes, /calendar, /cookbook, /email, /memory, /gallery, /tasks,
/library) — pre-checked os.path.exists and raised its own
HTTPException(404, "index.html not found") when the bundle was missing. So a
missing core template returned 404 before serve_html_with_nonce's 500 could
fire, the one inconsistency left after #4637.
index.html is a fixed, app-bundled template; a missing one is a broken
deployment (server fault), not a client "not found", so it should surface as a
logged 500 in 5xx alerting rather than a 404. Keep the static->root fallback,
drop the redundant existence guard and the dead-end 404, and let the shared
helper handle the missing case.
Verified against the running app: / and /notes return 200 with the bundle
present and a logged 500 when index.html is absent.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix(docker): make Real-ESRGAN installable on the Python 3.14 image
realesrgan's deps basicsr/gfpgan/facexlib (unmaintained since 2022) read
their version in setup.py via `exec(...); locals()['__version__']`, which
raises KeyError on Python 3.13+ — PEP 667 made locals() in a function an
independent snapshot that exec() can no longer mutate. That fails the
Cookbook "install realesrgan" sdist build on the python:3.14 base.
Add a `realesrgan-wheels` builder stage that fetches the pinned sdists,
patches get_version() to exec into an explicit namespace dict, and builds
wheels; the final stage installs them --no-deps so a later
`pip install realesrgan` resolves from wheels instead of rebuilding the
broken sdists. torch stays a runtime pull to keep the base image lean.
Also add the runtime libs opencv-python (cv2) needs — libgl1,
libglib2.0-0t64, libxcb1 — which the slim base omits; without them the
install succeeds but `import cv2` dies with
`libxcb.so.1: cannot open shared object file`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(cookbook): don't let a package's sys.exit() on import hang the deps panel
The local optional-dependency probe imports each package in-process and
catches ImportError / Exception. But a package can call sys.exit() at
import time — e.g. rembg does `sys.exit(1)` when no onnxruntime backend
loads. SystemExit is a BaseException, not Exception, so it escaped the
probe, propagated out of the list_packages endpoint, and hung the whole
Dependencies panel / worker (the UI loads forever).
Catch (Exception, SystemExit) so one broken optional package is reported
as not-usable instead of taking down the panel.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Add official Gemma 4 12B-it plus QAT-INT4/INT8 catalog entries (with their
GGUF sources), QAT quantization support across the quant tables and the
prequantized-prefix list, and the missing RTX 3050 / 3050 Ti memory
bandwidth so speed estimates stop falling back to the generic cuda value.
* fix(routes): serve 404 instead of 500 when an HTML page file is missing
_serve_html_with_nonce opened the HTML file with no error handling, and
callers such as /backgrounds and /login pass their paths in with no
existence check, so a missing or unreadable file raised an unhandled
OSError that surfaced as a 500. Wrap the read and raise HTTPException(404)
instead; the normal render path (CSP-nonce substitution) is unchanged.
Fixes#4594
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): distinguish missing page (404) from read failure (500)
The previous fix caught a broad OSError and returned 404 for every
failure, which masks real server-side problems (permission errors, I/O
failures) as "not found" and lets them slip past error alerting. Split
FileNotFoundError (genuine 404) from other OSError, which now logs the
exception and returns a generic 500 — without leaking the OS error
string or file path into the response body.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): treat unreadable bundled HTML page as logged 500, not 404
Per PR #4637 review: every caller of the page-render helper serves a fixed,
server-owned template (index/login/backgrounds), never a client-supplied
path. So a missing or unreadable file is a server fault (broken deployment),
not a client "not found" — a 404 there mislabels a server error and hides a
missing core template from 5xx alerting, contradicting the OSError->500
rationale this PR is built on. Collapse both branches into a single logged,
leak-free 500.
Move the helper to src.app_helpers.serve_html_with_nonce so the behavior can
be unit-tested without importing the whole app (app.py is the slim
orchestrator; the test harness stubs src.database, so importing app in tests
is not viable). Add tests pinning missing/unreadable -> 500 (not 404) and
nonce injection on the happy path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix(chat): strip executed email tool fences from the live stream (#3993)
The backend strips every fenced tool block from persisted text (the regex in
src/tool_parsing.py is built from the full TOOL_TAGS set, which includes the
email tools), so a reloaded session renders cleanly. The live frontend path
uses a separate hardcoded EXEC_FENCE_RE in static/js/chatRenderer.js that only
listed web_search/read_file/write_file/create_document/edit_document/
update_document — so executed email tool fences (list_emails, etc.) lingered as
raw code blocks in the live assistant bubble until the user reloaded.
Add the nine email tool tags to EXEC_FENCE_RE so the live render settles into
the same clean layout as the history reload. bash/python stay excluded on
purpose: those are languages a user may legitimately have asked the model to
show as code, not tool invocations.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(chat): single-source live exec-fence tool list from TOOL_TAGS (#3993)
Per review: EXEC_FENCE_RE was a second, hand-maintained copy of the
executable-tool list, so any tool not in it — and every future tool added to
TOOL_TAGS — would leave its executed fence lingering in the live bubble until
reload (the original #3993 bug, recurring one tool at a time).
EXEC_FENCE_RE is now built from an explicit EXEC_TOOL_TAGS list that mirrors
TOOL_TAGS (src/agent_tools/__init__.py) minus bash/python, which stay excluded
as legitimate code-example languages. A new regression test
(test_exec_fence_re_covers_all_executable_tools) extracts both lists from
source and fails if they drift, so the whole class is caught in CI instead of
by a user — the "minimum acceptable middle ground" from the review, made exact
(set equality, not just coverage).
Verified: pytest tests/test_live_strip_email_tool_fences.py (5 passed);
node --check static/js/chatRenderer.js; and a node run of the built regex
confirms email/generate_image/manage_memory/ls fences strip while
bash/python/sh are preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(chat): build live exec-fence list from /api/tools at runtime (#3993)
Make TOOL_TAGS the single source for live exec-fence stripping. chatRenderer.js
no longer hard-codes a tool list; it fetches the backend's authoritative set
once from GET /api/tools (sorted(TOOL_TAGS)) and builds EXEC_FENCE_RE from it at
load, minus bash/python. No second list to drift, and a future tool added to
TOOL_TAGS is covered automatically — without touching the streaming path.
Until the fetch resolves EXEC_FENCE_RE is null and exec fences aren't stripped
(a sub-second window before the first stream); the backend already strips
persisted history, so a reload always renders clean.
Drop test_exec_fence_re_covers_all_executable_tools (no hand-maintained list to
guard) and add source-level guards: the frontend keeps no hard-coded list and
fetches /api/tools, and the endpoint serves the full sorted(TOOL_TAGS).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CVCKth4g8pWh7pwFDVm4iL
* fix(chat): warn on /api/tools fetch failure instead of swallowing it (#3993)
A fresh-context review flagged that loadExecFenceRegex's catch silently
discarded errors: if the one-shot fetch fails, EXEC_FENCE_RE stays null for the
whole session and live exec fences go unstripped until reload, with zero signal.
console.warn it, and correct the comment to describe the failure mode honestly
(was understated as just a sub-second startup window).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CVCKth4g8pWh7pwFDVm4iL
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: Images cannot be seen by model that is vision capable
* fix: skip http(s) image_url for Ollama (images[] is base64-only)
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* fix(llm): detect mistral.ai provider and support reasoning_effort
Four coupled bugs broke Mistral thinking model support:
1. _detect_provider() had no mistral.ai host check, so all Mistral
endpoints fell through to the generic 'openai' provider string.
_provider_display_name() correctly identified them as 'Mistral',
making any 'if provider == "Mistral"' check elsewhere dead code.
2. reasoning_effort parameter was never sent in the request payload,
so Mistral never activated thinking mode even when the user
configured a thinking-capable model (mistral-small-latest,
mistral-medium-latest, magistral-*).
3. Mistral returns content as a typed array
([{"type":"thinking",...},{"type":"text",...}]) when
reasoning is on, not as a plain string. Both the streaming and
non-streaming parsers expected strings and silently dropped the
thinking content.
4. _THINKING_MODEL_PATTERNS didn't include magistral or mistral-*
model prefixes, so the frontend wouldn't tag reasoning output
as thinking even after the above were fixed.
Fix:
- Add mistral.ai to _detect_provider() host checks
- Add a _normalize_mistral_content() helper that splits the typed
array into (text, thinking) strings
- Inject payload["reasoning_effort"] = "high" when provider is
Mistral and _supports_thinking(model) is true, in both stream_llm
and llm_call_async payload construction
- Wire the normalizer into both response parsers
- Extend _THINKING_MODEL_PATTERNS to include magistral,
mistral-small, mistral-medium, mistral-large
Tested on Docker install with mistral-small-latest +
reasoning_effort=high. Reasoning streams correctly into the
thinking panel after the fix.
Fixes#4678
* fix(llm): address review — lowercase provider id, configurable effort, tests
Addresses vdmkenny's review on PR #4698:
1. Removed duplicate 'if provider == "mistral"' block in stream_llm
— two back-to-back copies, one was dead-redundant.
2. Dropped personal-context comment ('free-tier limits are generous
for this user') and made reasoning_effort configurable via env var
ODYSSEUS_MISTRAL_REASONING_EFFORT (high / medium / low / none).
Default remains 'high' for backward compat with the tested behavior.
3. Recased provider id from 'Mistral' to 'mistral' to match the
lowercase convention used by every other provider id in the file
(openai, anthropic, ollama, copilot, ...). _provider_display_name()
still returns the Title-Case 'Mistral' for UI labels — only the
runtime id used in 'if provider == ...' checks was recased.
4. Added tests/test_llm_core_mistral_content.py with 13 tests pinning
_normalize_mistral_content()'s contract: string passthrough, the
Mistral array format (thinking + text blocks), and edge cases
(empty, garbage, None, wrong types, missing fields, string-vs-array
inner thinking field).
Also fixed a gap the review didn't catch: the non-streaming paths
(llm_call sync + llm_call_async) were missing the reasoning_effort
injection entirely. Added the same injection to both, so Deep Research
and agent tool calls also activate Mistral thinking.
All 13 new tests pass. Existing reasoning/streaming/ollama-thinking
tests still pass (38 tests, no regressions).
Fixes#4678
* fix(memory): keep the Brain memory item menu above the modal at any stack depth
The memory item "⋮" dropdown is portaled to <body> with a hardcoded
z-index of 10001. Tool modals, however, get a monotonically increasing
z-index from modalManager's bring-to-front counter (_modalTopZ), which
climbs unbounded as modals are opened/restored over a session. Once that
counter passes 10001, the Brain modal stacks above the body-portaled
dropdown, so the menu renders behind the panel — visible only where it
spills past the modal's edge (#4720).
Derive the dropdown's z-index from the owning modal's current z-index
(+1), keeping 10001 as a floor for the common low-counter case, so the
menu always sits just above its modal however high the counter has climbed.
Verified with document.elementFromPoint at the dropdown's location: with a
high modal z-index the old build returns the modal at every sampled point
(menu behind); the fixed build returns the dropdown (menu on top). The
default low-counter case is unchanged (z stays 10001).
* refactor(modal): route body-portaled dropdowns through a shared topPortalZ() helper
The hardcoded z-index:10001 the Brain memory menu used (#4720) is the same
literal shared by ~16 body-portaled dropdowns across calendar, cookbook,
cookbookServe, documentLibrary, emailLibrary, gallery, notes, emojiPicker and
memory — each renders behind its owning tool modal once modalManager's
bring-to-front counter climbs past the literal over a long session.
Promote the per-dropdown fix into a single topPortalZ() helper in
toolWindowZOrder.js — the existing source of truth for tool-window z, already
imported by modalManager's _bringToFront and notes.js — returning
max(topToolWindowZ(), dock-chip floor) + 1, so a portaled dropdown always sits
just above the live tool-window stack however high the counter has climbed.
Route all 16 sites through it. The slashCommands tour tooltips and the
cookbookServe VRAM dialog are intentionally left out (neither is a modal-owned
portaled dropdown).
Add tests/test_portal_dropdown_z_js.py covering the helper, including the #4720
scenario (modal counter at 99999 -> dropdown at 100000). Existing
test_notes_z_order_js.py stays green.
* fix(security): redact credential-bearing URLs and PII from logs
Several log statements emitted sensitive data in clear text:
- model_routes / chat_routes / contacts_routes logged endpoint URLs raw.
Admin-configured URLs can embed credentials in userinfo or query
(e.g. https://user:pass@host, ?api_key=...). Route them through a
shared core.log_safety.redact_url() that drops userinfo/query/fragment.
- note_routes / task_scheduler logged operator email addresses (smtp_user,
recipient). Replaced with presence booleans, which keeps the diagnostic
("why didn't this send") without writing PII to logs.
model_routes already had a local redactor on its HTTPStatusError branch;
the generic except branch was missed, so reuse the existing helper there.
Clears CodeQL py/clear-text-logging-sensitive-data alerts 264, 317, 324,
325, 343, 344, 528.
* fix(security): re-bracket IPv6 hosts and single-source the URL redactor
Address review on #4750:
- redact_url now re-brackets IPv6 literals so host:port stays
unambiguous (https://[2001:db8::1]:8443/v1, not the bracket-less
ambiguous form).
- point model_routes._redact_url_for_log at the shared helper so the
two redactors are single-sourced (also picks up the IPv6 fix).
* fix(security): escape backslashes in calendar bg-image CSS url()
The calendar event-background CSS escaped ' -> \' for a bg: image URL but
not backslashes first. Inside a single-quoted url('...'), \ is the CSS
escape char, so a URL value ending in/containing a backslash escapes the
closing quote and breaks out of the string, injecting arbitrary CSS. The
bg:<url> value is per-event and CalDAV-syncable, hence untrusted (CodeQL
js/incomplete-sanitization).
Add a single canonical _cssUrlEscape() in calendar/utils.js that escapes
backslashes FIRST, then quotes, and route all four sinks through it:
calendar.js:416 / :1263 (the flagged #463/#464), the event-form preview
(:2931), and _calBgCss() in utils.js — the latter two share the identical
bug but were unflagged. Output is byte-identical to the old escaping for
legitimate URLs (which contain no backslashes); only malicious input differs.
Resolves CodeQL js/incomplete-sanitization #463, #464.
* fix(security): route remaining calendar bg url() sinks through _cssUrlEscape
Review (vdmkenny) flagged that the centralization missed an injectable
sibling sink: the edit-form color-picker swatch (calendar.js:2856) built
`url('${url}')` from `existing.color` (a CalDAV-syncable, untrusted `bg:`
value) raw, then interpolated it into `style="background:..."` via innerHTML
- the same `'`/`\` breakout class as the sinks already fixed. The custom-dot
preview (:2953) was likewise raw (non-exploitable - a CSSOM `.style`
assignment of a URL the current user just picked - but it broke the invariant).
Route both through `_cssUrlEscape`, and normalize the two pre-escaped-variable
sites (_calItemBgStyle, _renderWeek) to the same inline form so all five
url() interpolations in calendar.js follow one rule. Add a whole-file
invariant test asserting every `url('${...}')` calls `_cssUrlEscape` - this
catches a future missed sink, the exact failure mode here. Behavior-identical
for legitimate URLs (no visual change).
* fix: document read fails with 403 when auth is disabled
Add _auth_disabled() bypass in _verify_doc_owner() and the
/api/documents/{session_id} route guard so documents remain accessible
in single-user / no-auth mode.
Minimal change: only adds the auth-disabled check alongside existing
403 raises — preserves existing formatting and line endings.
* refactor: hoist _auth_disabled import to module level
Address reviewer feedback on PR #4623 — no circular import exists
(src.auth_helpers only imports stdlib + fastapi), so the inline
imports are unnecessary. Moves the import to module top in both
document_helpers.py and document_routes.py.
* test: add regression tests for auth-disabled document access (PR #4623)
Remote Cookbook hwfit probes failed on Windows hosts because the PowerShell script was sent as nested -Command quoting through OpenSSH. Use -EncodedCommand for remote probes, auto-detect platform when omitted (including Darwin for Mac SSH hosts), and return a clearer error when SSH works but the probe fails.
Co-authored-by: Cursor <cursoragent@cursor.com>
The app's ad-hoc dropdown/context menus each wire their own document-level
outside-click listener, but that listener only removes itself on an *outside*
click. Every other dismissal path -- clicking a menu item (which calls
el.remove() directly), a Cancel button, Escape, or the "close the
previously-open menu" reopen sweep -- tears the node down without
unregistering the listener, orphaning it on `document`. The stranded listener
then lingers and can break the next menu interaction: the recurring "the
button stops working until I refresh the page" class of bug (e.g. delete an
email, then the kebab/more button is dead on the other rows).
Route all 16 of these menus through the existing escMenuStack helper
(bindMenuDismiss / dismissOrRemove), exactly as documentLibrary.js
_showLibDropdown, cookbookRunning.js, and research/panel.js already do: a
single idempotent close() owns the teardown and is released on every dismissal
path, reopen sweeps use dismissOrRemove() instead of a bare .remove(), and
Escape flows through the central LIFO esc-stack arbiter. Net -49 lines.
Menus migrated: cookbook _showDepMenu; document export menu and
_openDocAiReplyChoice; emailInbox _showEmailMenu; emailLibrary
_showReaderMoreMenu / _showCardMenu / _showBulkActionsMenu; gallery
_showGalleryBulkMenu; notes _pickCustomDate / _openNoteCornerMenu; settings
(3 unified-integrations dropdowns); skills _openSkillMenu; tasks
_showTaskDropdown; compare _toggleExportMenu.
Per-menu semantics preserved (anchor-as-inside tests, the tasks 250ms
ghost-click guard, emailLibrary's reader-more-active anchor class and the
bulk-Cancel select-mode reset, settings' reused-vs-recreated lifecycles).
Six menus with custom lifecycles (notes _openReminderMenu, sessions
long-press, document markdown-toolbar, emojiPicker, compare model selector)
are intentionally left for a follow-up -- each needs individual review.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The native Windows launcher binds to 127.0.0.1 via its own -BindHost
parameter and does not read APP_BIND/ODYSSEUS_HOST from .env, so editing
.env alone leaves the server on loopback. Document the -BindHost flag in
the Native Windows setup section, with the existing keep-auth-on /
don't-expose-publicly caveats.
Fixes#4552
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Add a post-delete verification step: after the CardDAV server returns
2xx/404, force-re-fetch the contact list and confirm the UID is gone.
If the UID is still present, log a warning and return False instead of
silently reporting success.
This catches the case where _resolve_resource_url falls back to the
guessed {uid}.vcf URL but the contact's real resource URL differs —
the DELETE hits the wrong URL, server returns 404 (treated as success),
but the contact remains. Previously this caused silent persistence
failures and agent loops.
Inline backtick spans were converted to <code> only at the end of
mdToHtml, after the bare-URL autolink and <a>/allowed-HTML passes. A URL
inside inline code is preceded by a space, so the autolink wrapped it in
an <a> tag and swapped it for an ___ALLOWED_HTML_ placeholder, corrupting
commands like `irm http://127.0.0.1:3000/x`.
Extract inline code into placeholders before the link passes, mirroring
the existing fenced-code-block handling, and restore them last so
placeholders carried inside restored <a> blocks resolve. Escape the code
at extraction time since it now bypasses the global escape pass.
The fallback regex in email_pollers.py that recovers a
[{"action": ...}, ...] JSON array from raw model output used lazy
[^[\]]*? runs inside a (?:,\s*\{...\}\s*)* repetition, which backtracks
exponentially (CodeQL py/redos) on inputs like [{"action"},{ + }},{{ * N.
It runs on the LLM reply to an email→calendar prompt embedding the
untrusted email body, so a crafted email can stall the background poller.
Extract the pattern to a module-level _CAL_ACTION_ARRAY_RE and rewrite the
object-content class from the lazy [^[\]]*? to a greedy brace-delimited
[^{}], which removes the quantifier ambiguity. The match is linear (a 500KB
adversarial input now resolves in <1ms) and equivalent on well-formed
arrays; it is also strictly more robust for values containing '[' or ']'
(the old class bailed on those and extracted nothing).
Resolves CodeQL py/redos #198.
* feat(a11y): add a Text size control and an OpenDyslexic font option
Text size: a Theme > Font & Layout control (Default / Larger) that scales the whole UI via CSS zoom, so the many hard-coded px sizes scale too (density only moves the root font-size). Stored globally so it persists across theme switches; applied early in the boot script to avoid a flash. OpenDyslexic: a dyslexia-friendly self-hosted font (SIL OFL 1.1), bundled as woff2 alongside Fira Code/Inter and wired into the Font select. Reuses the existing density/font pattern end to end; no new colours, spacing, or component styles.
* fix(a11y): keep modals on-screen at Larger text size
Inline vh heights on .modal-content overrode the ui-scale-125 max-height
compensation, so Cookbook (and the email/doc/skills/PDF modals) overflowed
the viewport at 125% — pushing the header and close button off-screen.
Let the compensation own those heights.
* fix(a11y): keep PDF export modal at its original 86vh on Default size
The DELETE /api/personal/file disk-delete containment check used the
shared PERSONAL_UPLOADS_DIR root, so one admin could delete another
user's personal upload by passing its path (uploads are partitioned per
owner under <root>/<owner>/). Confine the check to the caller's own
per-owner subdir via _personal_upload_dir_for_owner(owner). RAG removal
and listing exclusion are unchanged (they still serve non-upload indexed
sources). Adds a regression test for the cross-owner case.
Moves create_session, list_sessions, send_to_session and manage_session out of
ai_interaction.py into src/agent_tools/session_tools.py (the do_ prefix
dropped) and registers them in TOOL_HANDLERS, so dispatch flows through the
registry instead of the dispatch_ai_tool elif in tool_execution.py. Same
pattern as the model-interaction move.
The bodies move verbatim; each fetches the runtime-set session manager via a
get_session_manager() shim, and reuses _resolve_model / AI_CHAT_TIMEOUT from
ai_interaction. manage_session's internal 'list' alias is repointed from the
old do_list_sessions to the moved list_sessions. stream_ai_tool (dead, no
callers) and do_pipeline stay put. dispatch_ai_tool loses its four now-unused
branches.
Tests: test_session_tools_registry covers registration, owner threading, the
manage_session->list_sessions delegation, graceful no-manager handling, and
registry dispatch. Verified end-to-end against a live SessionManager.
chatRenderer.js built the model-info popup HTML by concatenating the
model name (from the LLM response's model/answered_by field) into
popup.innerHTML without escaping, so a model advertised as an HTML/script
payload executed when the user clicked the role label. Wrap both
insertions with the uiModule.esc() helper the same function already uses.
Also apply existing escape helpers at two latent sinks flagged by CodeQL,
fed only by self-authored/server values today: document-tab title via
_esc(), and the calendar event background URL (escape the double quote
that would otherwise break out of the style="..." attribute).
Never imported anywhere in the codebase (unused since v1.0); it is the only
root dependency and nothing depends on it. Removing it also drops 6 transitive
packages from the lockfile.
Fixes#4565
Detached bash jobs (#!bg) could be launched and auto-reported on completion,
but the agent had no way to act on a running one: no on-demand output read and
no kill (it blocked until the 1h max-runtime). bg_jobs had the pieces
(_read_output, list_for_session, internal _kill) but none was exposed.
Adds:
- bg_jobs.kill(job_id): tears down the process tree, marks the job killed, and
sets followed_up so the monitor does not also auto-continue a deliberate kill.
- manage_bg_jobs registry tool with actions list / output / kill, scoped to the
chat that launched the job (cross-session access reads as not found).
- Wiring: TOOL_HANDLERS/TAGS, function schema, RAG index + keyword hints, parser
name map, dispatch (threads session_id via _direct_fallback). Gated like bash
(NON_ADMIN_BLOCKED_TOOLS; plan-mode mutator).
- agent_loop: background-job intent regex maps to the files domain (and the tool
joins _DOMAIN_TOOL_MAP[files]) so short commands like 'kill that job' are not
dropped by the low-signal gate that skips tool retrieval.
- bg launch message tells the model to call manage_bg_jobs itself for check/stop
rather than printing raw tool syntax to the user.
Tests: tests/test_bg_job_tools.py (kill semantics, per-chat scoping, actions,
and the intent classifier).
Three different background loops (_reconnectTask reachability poll,
_checkServeReachability, _pollBackgroundStatus) each independently
flipped Ollama sidecar tasks between running and stopped because the
`docker exec ollama-rocm ollama show <tag>` cmd exits cleanly after
its verification print, which the loops misread as the serve dying.
Added _isOllamaSidecarTask(task) and an early-bail in each of the
three loops so the task stays pinned to running once the show-cmd
exits 0. Also the tmux-graceful-kill path prepends a
`docker exec ollama-rocm ollama stop <tag>` before tearing down
the tmux session, so the Ollama-side model load gets unloaded too
(was leaving the model resident in the daemon after Stop).
Frontend half of the backend-detection + per-OS install command work,
plus a pile of mobile/UX fixes:
Backend awareness:
- _gpuEnvPrefix() picks CUDA_VISIBLE_DEVICES / HIP_VISIBLE_DEVICES /
nothing based on detected hwfit backend + scanned-host match (so a
stale ajax scan does not leak CUDA env vars into a kierkegaard
Vulkan launch). Replaces 6 hardcoded CUDA_VISIBLE_DEVICES sites.
- GGML_CUDA_ENABLE_UNIFIED_MEMORY only emitted when backend is
actually CUDA (was leaking onto Vulkan/ROCm via saved presets).
Per-target install command:
- Dep rows render a single mono command box + Copy button when the
server resolved pkg.install_cmd_for_target. Reused in the build-deps
install failure toast so the toast and the row show the same line.
- Diagnosis patterns split cmake/g++/git out of the generic
llama-cpp-python catch-all so a missing-cmake failure surfaces a
cmake-specific message + per-distro Copy buttons.
Form toggles always visible:
- Reasoning Parser, Expert Parallel, MoE Env Vars no longer gated on
model-family detection. Detection still hints (parser tag shown when
matched); toggle works with sensible defaults otherwise. MiniMax M-
series added to MoE family detector so the auto-fill is right.
Mobile + GPU default:
- Launch tab cached-list flex collapsed to 0px on mobile because the
desktop `flex: 1 1 0` had no parent height to grow into. Override
to `flex: 0 0 auto` in the cookbook mobile @media block.
- doclib-card expand on mobile (Firefox no :has() support) pins
explicit px heights so the launch form actually appears.
- llama_mode defaults to gpu when hwfit detected cuda/rocm/vulkan/
metal on the current target, instead of always cpu (which was
forcing -ngl 0 on first-open and burning 35GB models on CPU).
When a llama.cpp launch needs cmake/build-essential/git the user used to
get a four-distro dump ("apt: x / pacman: y / dnf: z / brew: w") and
had to pick the right one. Now:
- shell_routes /api/cookbook/packages probes /etc/os-release on the
target in the same SSH round-trip as the existing system-prereq
check, classifies into debian / arch / fedora / alpine / suse /
macos, and builds a single install_cmd_for_target string from the
(os_family, backend) matrix. CUDA hosts get nvidia-cuda-toolkit;
ROCm gets rocm-dev / rocm-hip-sdk; Vulkan gets libvulkan-dev /
vulkan-headers; etc.
- llama_cpp catalog entry gets system_prereqs: [cmake, g++, git].
When any of those are missing on the target, the row picks up
pkg.build_deps_missing + pkg.install_cmd_for_target for the
frontend to render.
- New POST /api/cookbook/install-system-deps endpoint runs the right
package manager via passwordless sudo on the target. Allowlisted to
{cmake, build-essential, g++, gcc, git, tmux, make}; sudo -n only
so it can never hang waiting for a password (returns a clear
"passwordless sudo unavailable" error via stderr instead).
Three classes of incorrect detection fixed:
(1) AMD GPU + no ROCm installed (e.g. Strix Halo) was reported as
backend=rocm everywhere, so launch commands emitted
HIP_VISIBLE_DEVICES (silent no-op on Vulkan) and the from-source
build path failed. Both _probe_amd_sysfs (routes/cookbook_routes)
and _detect_amd (services/hwfit/hardware) now probe rocminfo /
hipconfig / vulkaninfo at detection time and report vulkan when
only Vulkan is present.
(2) Build helper was picking the CUDA branch on AMD hosts whenever a
stray pip-installed nvcc was on PATH (vLLM wheels carry one
without libcudart). Added _odysseus_has_nvidia_hw() that checks
nvidia-smi / /dev/nvidia* / lspci, and gates both the nvcc PATH
augmentation and the CUDA elif branch on real hardware.
(3) Build chain reordered to ROCm/HIP > CUDA > Vulkan > CPU. Vulkan
tier added between CUDA and CPU as a portable fallback for hosts
with a GPU but no native toolchain (the common Strix Halo case).
Same _append_llama_cpp_linux_accel_build_lines also auto-attempts
sudo -n apt/pacman/dnf install of cmake/build-essential/git when
they are missing, surfacing a clear no-passwordless-sudo warning
otherwise.
Cookbook now needs to docker-exec into ollama-rocm (and any other sibling
container holding a model server) from inside its own container, so:
- Dockerfile installs the Docker CLI from the static binary tarball
(the Debian docker.io package ships dockerd but not the client on slim)
- docker-compose.yml bind-mounts /var/run/docker.sock and adds group_add
for the host docker group (default GID 963)
- entrypoint.sh detects the socket GID, creates a local group with that
GID, and runs usermod -aG before gosu-dropping to the app user so the
supplementary group propagates through (gosu strips by default)
* fix(tools): prune skipped dirs before descending in glob tool
GlobTool used pathlib.Path.rglob which descends into every directory
(including node_modules, .git, dist, etc.) and filters AFTER the walk.
On repos with large junk directories this causes the glob tool to hang
for minutes.
Replace rglob with os.walk that prunes _CODENAV_SKIP_DIRS before
descending — matching the approach GrepTool already uses. Also add a
fast path for literal patterns (no wildcards → direct path lookup).
Fixes#4493
* fix(tools): use regex glob matching to fix * semantics and literal fallback
Replace fnmatch with _glob_to_regex so that * stays within a single
path segment (matching pathlib/rglob semantics) and **/ spans zero or
more directories. Literal patterns now fall through to os.walk when
the direct path lookup misses, so e.g. 'foo.py' still finds files at
any depth.
Add tests for:
- bare literal matching in subdirectories
- multi-segment single-star patterns (sub/*.txt)
- * not crossing / boundaries
- ** matching at arbitrary depth
Closes#4493
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
The Dependencies tab's llama.cpp docker recipe surfaced
\`docker pull ghcr.io/ggerganov/llama.cpp:server-cuda\`. The upstream
repo moved from github.com/ggerganov/llama.cpp to
github.com/ggml-org/llama.cpp and the old GHCR namespace no longer
publishes images, so copying the recipe failed with:
failed to resolve reference "ghcr.io/ggerganov/llama.cpp:server-cuda":
not found
Point the recipe at \`ghcr.io/ggml-org/llama.cpp:server-cuda\`, which is
already the namespace routes/cookbook_routes.py uses for the source
clone. Adds a regression test in the same shape as
test_cookbook_diagnosis_js.py asserting the new namespace and forbidding
the dead one.
No CSS/HTML/SVG/style changes — the file is a pure data module
(no DOM access) consumed by other renderers; only the displayed command
text changes.
Two background tasks scheduled on every chat completion in
routes/chat_helpers.py — the memory/skill extraction dispatch and the
session auto-namer — are created via bare asyncio.create_task(...).
asyncio only holds a weak reference to the outer task, so the GC can
collect it mid-execution and the work silently never runs.
Add a module-private _BG_TASKS set and a _spawn_bg() helper that mirrors
WebhookManager._spawn_tracked (the pattern #3964 / #4336 established for
the webhook emitters two lines apart in the same function). Route both
call sites through it so the lifecycle owner is explicit.
Adds an AST-level guard test that fails on any bare
asyncio.create_task(...) statement in routes/chat_helpers.py to prevent
a regression — same shape as test_webhook_emitters_use_manager.py from
#4336.
The same bare pattern exists in routes/email_routes.py and
routes/cookbook_routes.py; left out of this PR per CONTRIBUTING.md's
"one fix per PR" and tracked in #4443's "Additional Information" for a
follow-up.
The diagnosis panel offered a "Kill vLLM processes" (pkill -f vllm) recovery
for ANY Python traceback — including pip build failures and other tracebacks
that have nothing to do with vLLM. That advice is useless for a build failure
and harmful if an unrelated vLLM server happens to be running.
ERROR_PATTERNS in static/js/cookbook-diagnosis.js had one catch-all traceback
matcher that always attached the vLLM-kill fix. Split it into three (all
keeping the existing healthy-server suppression):
- pip build failure (Failed to build / metadata-generation-failed /
subprocess-exited-with-error / Could not build wheels) -> "a dependency
failed to build" message, no kill.
- vLLM-specific traceback (tail mentions vllm) -> keeps the kill, now scoped.
- any other traceback -> neutral "check the captured output" message, no kill.
How to test:
- node --check static/js/cookbook-diagnosis.js
- Trigger a wheel-build failure (old package on a newer Python) or a non-vLLM
traceback and open the diagnosis. Before: generic traceback message + "Kill
vLLM processes" button. After: a build-failure / neutral message with no kill;
only a real vLLM traceback still offers it.
Fixes#4516
Co-authored-by: Claude
The persistent login cookie's max_age hardcoded 60 * 60 * 24 * 7, an
independent copy of the session token lifetime that core/auth.py already
defines once as TOKEN_TTL (and reports to the frontend via /api/auth/policy
as session_days). If TOKEN_TTL changes, the cookie silently drifts: the
browser keeps a cookie for a token whose lifetime no longer matches.
Import TOKEN_TTL and use it for the cookie max_age so the session lifetime
has a single source of truth. No behaviour change at the current value.
Fixes#4471
The harmony stream router only recognized the analysis and final channels, so
gpt-oss's standard `commentary` channel (tool-call preambles / function-arg
bodies) was unhandled: the literal `<|channel|>commentary` marker, the
`to=functions.*` recipient, and the commentary body all leaked into the
visible answer. Add commentary to the marker regex + the suffix-hold table, and
route its body to thinking (only `final` is user-facing). Adds a regression
test (split-chunk + recipient + body), verified to fail without the fix.
_patch_prefs installs a fake routes.prefs_routes with a bare
sys.modules[...] = assignment that is never undone. The stub is an empty
ModuleType without _save_for_user, so a later test whose code path runs
`from routes.prefs_routes import _save_for_user` (e.g. test_backup_import_skills)
fails with ImportError under an unfavorable test order.
Install the stub with monkeypatch.setitem instead (the helper already takes
monkeypatch and uses it for DATA_DIR) so it is reverted at teardown.
Repro: pytest tests/test_skill_index_prompt_injection.py tests/test_backup_import_skills.py
(1 failed before, 5 passed after).
* fix(agent): index api_call so RAG tool selection can retrieve it
api_call exists in FUNCTION_TOOL_SCHEMAS and the agent's system prompt
advertises configured API integrations, but the tool had no entry in
BUILTIN_TOOL_DESCRIPTIONS. RAG tool selection embeds those descriptions and
retrieves the top-K per message, so a tool without one can never be selected:
the agent claims it can call Home Assistant/Miniflux/Gitea/etc. and then
never receives the api_call schema (unless the Personal Assistant
ASSISTANT_ALWAYS_AVAILABLE path applies).
Add a retrieval-rich description for api_call, plus an ast-based parity test
asserting every FUNCTION_TOOL_SCHEMAS tool has an index description so the
next added tool cannot silently drift the same way.
Fixes#3794
* fix(agent): route API-integration intent to api_call at selection time
Addresses review (RaresKeY) on #3923: indexing api_call in the ToolIndex
description was necessary but not sufficient — the #3794 repro ('Use the
api_call tool to call Home Assistant GET /api/states') matched no domain in
_classify_agent_request, classified as low-signal, so the agent loop skipped
retrieval entirely and the schema filter sent only ALWAYS_AVAILABLE
(manage_memory/ask_user/update_plan). api_call never reached the model.
- _classify_agent_request: detect API-integration intent (api_call,
integration(s), Home Assistant/Miniflux/Gitea/Linkding/Jellyfin) -> new
'integrations' domain, so the turn is no longer low-signal.
- _DOMAIN_TOOL_MAP['integrations'] = {api_call}: deterministically seeds
api_call into relevant tools after retrieval, independent of embeddings.
- _DOMAIN_RULES['integrations']: rule pack (required — _domain_rules_for_tools
indexes _DOMAIN_RULES[domain] directly).
- tool_index _KEYWORD_HINTS: parity hint for the retrieval / keyword-fallback
paths.
- Regression drives the real classifier -> domain-map -> FUNCTION_TOOL_SCHEMAS
filter chain and asserts api_call is advertised for the #3794 prompt.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(document): allow render-pdf to be framed and 503 cleanly on missing PyMuPDF
Fixes#2101.
Two related bugs in the PDF-form library preview flow:
1. SecurityHeadersMiddleware was sending X-Frame-Options: DENY and
frame-ancestors 'none' on /api/document/{doc_id}/render-pdf, but
static/js/documentLibrary.js embeds the response in an <iframe> for
the library card preview. The browser blocked the load with
ERR_BLOCKED_BY_RESPONSE, leaving the user with a blank panel.
Extend the existing is_tool_render exemption to also cover
/api/document/.../render-pdf. Per-document owner checks still run in
the route handler, so the exemption is scoped the same way as the
tool-render exemption it mirrors. /api/document/.../export-pdf is
left untouched — it's a download (Content-Disposition: attachment),
not an iframe embed.
2. routes/document_routes.py:render_pdf called fill_fields, which
raises RuntimeError via _require_fitz() when the optional PyMuPDF
dependency isn't installed. That RuntimeError bubbled out as a
generic 500 with a cryptic 'PDF render failed' detail.
Reuse the existing _load_pdf_viewer_fitz() helper to fail fast with
a 503 and a user-actionable install hint (mentions
requirements-optional.txt and AGPL-3.0), matching the convention
used by the other PDF endpoints.
Tests cover both fixes:
- middleware headers on /api/document/.../render-pdf (iframeable, but
X-Content-Type-Options and Referrer-Policy are still set)
- middleware headers on /api/document/.../export-pdf (must stay strict)
- middleware path matching precision (similar-but-different paths stay
strict)
- middleware headers on /api/tools/.../render (no regression)
- middleware headers on /api/chat (no regression)
- render-pdf returns 503 with install hint when PyMuPDF is missing
- 503 is raised before any file I/O (fail-fast ordering)
* chore: address maintainer feedback on PDF previews same-origin framing and comment trimming
* chore: make render-pdf regression tests order-independent
Moves chat_with_model, ask_teacher and list_models out of ai_interaction.py
into src/agent_tools/model_interaction_tools.py (the do_ prefix dropped) and
registers them in TOOL_HANDLERS, so dispatch flows through the registry instead
of the dispatch_ai_tool elif in tool_execution.py.
The implementations are relocated, not wrapped. ai_interaction.py keeps only
the shared helpers they reuse (_resolve_model, AI_CHAT_TIMEOUT), still used by
the not-yet-migrated session/pipeline tools. dispatch_ai_tool loses its three
now-unused branches.
Also removes the dead do_second_opinion: it was already off the live tool
surface (no tag/schema/parsing/dispatch; tool_index.py notes it was removed),
so the function and its stale frontend catalog entries (admin.js, assistant.js)
are deleted.
Tests: owner-scope test points at the new list_models location and drops the
moved tools from the dispatch_ai_tool parametrize; a new
test_model_interaction_registry covers registration, owner threading, and
registry dispatch.
* fix(security): allowlist manage_mcp 'add' to close the agent-path RCE
do_manage_mcp('add') passed model- and prompt-injection-controlled command,
args, and env straight to a stdio subprocess spawn with no validation, and it
persisted an enabled server row before connecting (so a payload also survived
to re-execute on restart). A string smuggled into a skill description, memory
entry, fetched page, or email body could register a server running arbitrary
code as the app UID, e.g. command='sh' args=['-c','...'].
Add _validate_mcp_command, applied on the agent path before any DB write or
spawn:
- Hard-deny interpreters, runtimes, package runners, shells, and exec-wrappers
(even if an operator lists one in ODYSSEUS_MCP_ALLOWED_COMMANDS).
- Require a bare basename (no path components, no shell metacharacters) that is
present in the operator allowlist (empty by default).
- Reject code-exec argv flags by prefix so glued forms are caught too
(-c/-e/-m/--eval/--exec/--print/--module/--command/--require), remote-URL
args, and env keys that inject code into the child (LD_PRELOAD, NODE_OPTIONS,
PYTHONPATH, DYLD_*, PATH, ...).
A rejected registration returns an error, writes no row, and makes no
connection. The trusted admin route is unchanged. Mirrors the policy intent of
_validate_serve_cmd but inverted for the model-reachable surface.
Supersedes #438; incorporates the bypass forms found in its review (interpreter
script paths, -m pip, glued -c/-e, --eval=, eval subcommands, package runners,
remote URLs) and adds integration coverage on the real do_manage_mcp path.
Closes#2891
* fix(security): deny versioned/alias runtimes in manage_mcp allowlist
Addresses RaresKeY's review on #4433. The hard-deny matched command names
exactly, so versioned or alias runtime forms (python3.11, node18, pip3,
ruby3.2, java, javac, bunx, tsx, ts-node, pypy3, ...) slipped past and, if an
operator allowlisted one, re-opened the prompt-injection-controlled MCP
registration path.
- Canonicalize a trailing version suffix before the deny check so versioned
forms collapse to the family (python3.11 -> python, node18 -> node, pip3 ->
pip); both the raw basename and the canonical form are denied.
- Broaden the denied-family set (java/javac/jshell/jbang/kotlin/dotnet/mono/
swift/osascript/tsx/ts-node/bunx/pypy/jruby/raku/luajit/wish/expect/iex).
Deny runs before the operator allowlist, so an alias cannot be allowlisted back
in. Canonicalization only feeds the deny check, so a legit name that ends in a
digit still reaches the normal allowlist check rather than being mis-denied.
Adds validator + integration regressions for versioned/alias runtimes asserting
no DB row and no connection, including the allowlisted-anyway case.
* fix(hwfit): use CPU fallback for cpu_only speed estimates
* fix(hwfit): preserve ARM fallback for cpu_only estimates
---------
Co-authored-by: Cata <cata@bigjohn.local>
_showDiagnosis referenced an undefined `body` (left over from the refactor
that moved the diagnosis text into the toolbar), throwing a ReferenceError
whenever a failed task rendered fix buttons. Because open() wraps its render
in try/finally with no catch, the throw escaped before the modal was
un-hidden, so the whole Cookbook silently failed to open.
- cookbook-diagnosis.js: append the fixes row to `diag` (the in-scope
container) instead of the removed `body` element.
- cookbook.js: guard the render passes in open() so one broken task card
can't leave the entire panel stuck hidden.
Fixes#4406
The scheduled-task runner built the agent's tool set from RAG retrieval plus
ASSISTANT_ALWAYS_AVAILABLE. Neither includes bash/python (nor the file tools),
and no keyword hint force-includes them, so a task only saw the shell when the
tool-embedding index happened to surface it. On hosts where that index is empty
or degraded (e.g. a fresh Docker deploy), retrieval returns nothing and the task
agent never receives bash/python — telling the user the shell is disabled even
for an admin owner.
Offer the shell/file group to task agents by default, mirroring the chat agent
where these are on unless a privilege or global setting turns them off. The
existing blocked_tools_for_owner() gate in stream_agent_loop still strips the
whole group for non-admin multi-user owners and only admits it for admins and
single-user (AUTH_ENABLED=false) deployments, so this changes what is offered,
not who is allowed. A crew that defines an explicit enabled_tools allowlist
still has its restriction honored.
Also merge the operator's global disabled_tools setting into the scheduler's
disabled set before composing relevant_tools and before entering the agent
loop, matching what chat already does. Without it, the global tool-disable
contract did not reach unattended scheduled tasks: an admin or AUTH_ENABLED=false
task could still see and call shell/file tools the operator had turned off
globally, since the prompt/schema/execution gates only enforce the disabled
tools passed in.
cmd_list filtered on the event START falling inside the window
(dtstart >= start AND dtstart < end). The canonical web route
(routes/calendar_routes.py) and the recurrence contract test use
OVERLAP semantics for non-recurring events: dtstart < end AND
dtend > start. So an event that began before the window but is still
ongoing inside it — e.g. a 09:00-17:00 conference listed at 14:00, or
any multi-day event spanning the window — was silently dropped by the
CLI even though the web UI shows it. Use overlap, matching the route.
dtend is NOT NULL in the schema, so no null-end regression.
The non-native (prompted) tool-call path fed tool output back to the model as a plain "[Tool execution results]" user message, bypassing the untrusted_context_message wrapper that THREAT_MODEL.md requires for tool output. That path is what models without native tool-calling (many smaller local models) use, so prompt-injection inside a tool result (fetched page, file read, MCP/email output) could be read as instructions there.
Wrap it via untrusted_context_message("tool execution results", ...), the same hardening already applied to skills (#788) and escalation traces (#275). Also update _recent_context_for_retrieval, which used the old "[Tool execution results]" prefix as a sentinel to keep tool envelopes out of the retrieval query, to recognise the wrapped envelope via metadata.trusted.
The native path keeps returning tool-role messages (a user-role wrapper would break the native tool-call contract); it is covered by UNTRUSTED_CONTEXT_POLICY. Adds tests/test_tool_output_prompt_injection.py.
Fixes#1627.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The in-process tool loopback stamps current_user = "internal-tool" and
require_admin grants admin to that sentinel; it is also a reserved username.
That security-sensitive string was hand-typed in ~7 places (stamp, admin gate,
RESERVED_USERNAMES, and standalone admin-equivalent checks in note/research/
shell/task routes), where a typo silently breaks an auth gate.
Add INTERNAL_TOOL_USER in core/middleware.py next to INTERNAL_TOOL_TOKEN/
INTERNAL_TOOL_HEADER and use it at every such site. A typo is now an
ImportError, not a silent mismatch. auth.py importing middleware is acyclic
(middleware imports no app modules). Behaviour is unchanged.
The multi-sentinel sets bundling internal-tool with api/demo/system
(assistant_routes, task_scheduler, research_routes) are a separate reserved-set
dedup, left for a follow-up.
Closes#4332
Added PASSWORD_MIN_LENGTH and RESERVED_USERNAMES to src/constants.py as the
single source of truth. Previously PASSWORD_MIN_LENGTH was hardcoded as 8 in
four route handlers and all three JS validation paths; RESERVED_USERNAMES was
an inline frozenset duplicated in core/auth.py, routes/assistant_routes.py,
routes/research_routes.py, and src/task_scheduler.py.
Added GET /api/auth/policy (unauthenticated) so the frontend reads the real
values from the server instead of hardcoding them in JS.
Added missing empty-username guard to /setup and admin POST /users. Both
returned a misleading 500/409 on whitespace-only input. /signup already had the
check; this makes all three consistent.
* docs(architecture): add Phase 0 runtime inventory document
Per #4082 requirements, this no-code planning document maps:
- Largest runtime modules (Python + frontend)
- Import dependency graph and cross-layer violations
- Route ownership grouped by feature domain
- Tool registry boundaries and split candidates
- Risk-ranked candidate slices with recommended first 3 PRs
- Safety guardrails and validation commands for follow-up work
Closes#4082
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(architecture): correct inventory metrics per review feedback
Address @alteixeira20 review on #4148 (CHANGES_REQUESTED):
- src/ flat .py: ~60 -> 91; routes/: 52 -> 54
- core/database.py importers: 49 -> 94; src/agent_loop.py: -> 21
- src/ -> routes/ import lines: ~20 -> 38
- src/ subdirs: 3 -> 2 (agent_tools/, search/); drop non-existent agent/
- move main.py and src/agent/ out of current-structure into new
section 10 'Future Direction (NOT current state)'
- route grouping: frame as one domain per PR, not a broad
reorganization (helper imports / registration / test path risk)
* docs(architecture): round-2 fixes — move to specs/, correct counts, frame as candidate
Per @alteixeira20 + @RaresKeY review on #4148:
- Move docs/architecture-runtime-inventory.md -> specs/ (docs/ is
GitHub Pages public content, per @RaresKeY)
- src/ -> routes/ import lines: 38 -> 30 (direct grep of import lines
referencing routes/, matching reviewer's count)
- self-caught count drift: tests 552 -> 544; routes->src 349 -> 351;
src->core 49 -> 99
- frame section 6 (rankings/package shapes/split order/route grouping)
and section 10 (future direction) as candidate proposals pending
maintainer agreement, not a committed plan (per @RaresKeY)
* docs(architecture): round-3 reviewer fixes — fix tool categorization, counts, appendix
Self-review as reviewer found:
- §5.2 tool categories were wrong: listed filesystem/shell/email-sending
tools that are NOT in tool_implementations.py (they live in src/agent_tools/).
Rewrote to the actual 33 do_* functions grouped by domain
(system/cookbook/calendar/notes/search/research/contacts/vault/image)
- §2.1 builtin_actions.py: 0 -> 2 classes, ~26 -> ~24 functions
- §5.1: '33+' -> '33' (exact count)
- Appendix A: 'Complete File Listing' -> 'File Listing'; src noted as
'61 of 91 shown' (was claiming complete but listed 61)
- Last updated date refreshed
* docs(architecture): round-4 — verify remaining counts, soften §6.3 framing
- task_scheduler ~6 -> 5 funcs; tool_index ~580 -> 542 lines (verified vs dev)
- §6.3 'Recommended First 3 Slices' -> 'Candidate' (ownership unsettled, per review)
- verified §4 route-domain line counts, §2.2 frontend counts, mcp_servers=4
- full test suite: 3267 passed, 1 skipped, 0 failed
* docs(architecture): refresh Phase 0 inventory metrics + document counting method
Refresh every count against current dev (b58af42) per review on #4148:
- src/ flat .py: 91 -> 95; tests/test_*.py: 544 -> 583
- core.database importers: 94 -> 102; src.agent_loop importers: 21 -> 22
- src/ -> routes/ lines: 30 -> 31; routes/ -> src/: 351 -> 374; src/ -> core/: 99 -> 106
- Last updated: dev@9d7a3d6 -> dev@b58af42
Add a "How the metrics are computed" note under section 3.4 with the exact
grep/find command for each count, so the numbers are reproducible and future
dev drift is a one-command recheck instead of another review round (per the
request to note the counting method).
Documentation-only; no code changes.
Co-Authored-By: Claude <noreply@anthropic.com>
* docs(architecture): refresh remaining counts + add snapshot basis note
Reviewer self-audit of the previous refresh caught more stale counts after
the rebase onto dev@b58af42:
- tool_implementations importers: 18 -> 17 (§3.2, §6.2, Appendix B)
- core/database classes: 27 -> 28 (§2.1, §6.2)
- mcp_servers .py files: 4 -> 5 (§1.1)
- routes/ -> core/ import lines: 124 -> 126 (§3.4)
Line counts in §2.1/§2.2 also drifted over the rebased range but are left
as-is and covered by a new "Snapshot basis" note in the header: line counts
are a snapshot that drifts as dev moves (recompute with wc -l), while the
importer/file/import-line counts are the authoritative ones refreshed here.
This keeps the inventory honest about live metric vs structural snapshot, so
dev drift no longer triggers a review round.
Documentation-only; no code changes.
Co-Authored-By: Claude <noreply@anthropic.com>
* docs(architecture): fix missed tool_implementations importer count in §6.3
Follow-up to the previous refresh: §6.3 Slice 1 still read "18 importers"
after the 18->17 update elsewhere. Correct to 17 for consistency. Doc-only.
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: yuandonghao <yuandonghao@cohl.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
docker-compose.yml injects FASTEMBED_CACHE_PATH=${FASTEMBED_CACHE_PATH:-},
which sets the variable to an empty string when the host has not defined it.
FASTEMBED_CACHE_DIR used os.getenv("FASTEMBED_CACHE_PATH", default), and
os.getenv only returns the default when the variable is ABSENT -- so the empty
value won and FASTEMBED_CACHE_DIR became "". os.makedirs("") then raised
[Errno 2] No such file or directory: '', FastEmbed failed to initialise, and
every vector feature (RAG, semantic memory, tool index) silently degraded on
the default Docker stack.
Treat an empty value like an absent one via `os.getenv(...) or default`.
Add a regression test covering the empty, unset, and explicit cases.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: check-in calendar digest leaks every user's events (no owner scope)
* Seed dtend on calendar events in digest test so the NOT NULL column is satisfied
The outbound UA for web_fetch / web_search was inlined in four places with
two different values and nothing keeping them current: content.py pinned a
mid-2021 Chrome 91 build, and providers.py sent a bare Mozilla/5.0 in three
spots. Some sites serve a degraded or blocked page to a UA that old.
Add WEB_FETCH_USER_AGENT to src/constants.py (env-overridable, matching the
existing Copilot/Kimi UA-constant pattern) and import it in content.py and
providers.py. Default to a current, common desktop UA so pages return their
normal HTML: the market-leading desktop OS (Windows; NT 10.0 covers Windows
10 and 11) and browser (Chrome) on a current stable build. The version is now
bumped in one place.
Service-specific self-identifying agents (Copilot, Kimi, webhooks, cookbook)
are intentionally left separate. Adds a regression pinning the constant shape,
the env override, and a guard against a new inline Mozilla literal in the
search sources.
Closes#4324
The three public webhook emitters in chat_helpers and webhook_routes
schedule deliveries via asyncio.create_task(webhook_manager.fire(...)),
which bypasses WebhookManager._bg_tasks. asyncio only holds a weak
reference to the outer task, so the GC can collect it mid-delivery and
the webhook is silently dropped.
Route all three through webhook_manager.fire_and_forget() so the task
is tracked by _spawn_tracked() and the manager owns the full lifecycle.
Adds an AST-level guard test that scans routes/ for direct
asyncio.create_task wrapping webhook_manager.fire(...) to prevent
regressions.
When a CardDAV contact matched the search query but had no email
address (only phone numbers), the tool silently dropped it and
returned 'No contacts found'. Fall back to the contact's phone
number(s) so the caller still receives usable information.
Refs: #4178 (the contacts-domain classifier fix that made the model
actually call resolve_contact for contacts queries, surfacing this
pre-existing gap)
* fix(api): attribute bearer-token actions to the token owner on owner-scoped routes
Owner-scoped chat, session, and upload routes called
get_current_user(), which resolves a bearer ody_ API token to the
sandboxed "api" pseudo-user. A paired API-token client (companion, CLI,
IDE extension) therefore saw and created a separate "api"-owned silo
instead of the owner's data.
effective_user() already exists for exactly this: it attributes a token's
actions to request.state.api_token_owner, is identical to
get_current_user() for cookie sessions, and falls back safely when a
token has no owner. session_routes.py was already migrated; this
completes the migration for the remaining owner-scoped routes:
- chat_helpers.py: chat-privilege enforcement, message attribution, prefs/context
- chat_routes.py: orphaned-endpoint owner, session-auth owner, message search
- upload_routes.py: upload owner attribution + access checks
The /api/models swap is intentionally omitted: #4292 already migrated it
to effective_user (plus the chat-scope gate and ownerless-token 403), so
this PR keeps dev's version of routes/model_routes.py unchanged.
chat_routes.py keeps importing get_current_user for the workspace owner
gate; session_routes.py drops the now-unused import.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test: target effective_user in auth monkeypatches and owner-scope assertion
The owner-scoped routes now call effective_user() instead of
get_current_user(), so the tests that stubbed get_current_user (or
asserted on it) follow suit:
- test_chat_helpers.py, test_review_regressions.py,
test_kv_cache_invalidation_2927.py: monkeypatch effective_user
- test_session_endpoint_owner_scope.py: assert the owner-scope guard uses
effective_user(request)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix(search): add download budgets to web_fetch with truncation notice and hard ceiling
MAX_OUTPUT_CHARS only trims what the agent sees; fetch_webpage_content
buffered and cached the entire response body first, so a large or hostile
URL could pull arbitrarily many bytes into memory and the content cache.
The fetch is now a capped streaming GET (SSRF redirect guard unchanged):
a soft default budget (WEB_FETCH_SOFT_MAX_BYTES, 2 MB), a per-call
override via full/max_bytes on the web_fetch tool, and a hard ceiling
(WEB_FETCH_HARD_MAX_BYTES, 20 MB) that the override can never exceed.
When Content-Length already declares a body over the ceiling the fetch
is refused before any body bytes are buffered. Truncated results carry
truncated/fetched_bytes/total_bytes, the tool output leads with a
partial-content notice telling the model how to re-fetch with full=true,
and the tool schema documents the flag. A truncated PDF is reported as
a budget error since a cut PDF is unparseable. The effective cap is part
of the content-cache key so a truncated fetch is never served to a
full-budget request.
Existing tests that faked httpx.get or the old _get_public_url signature
are adapted to the streaming interface; behavior pins are unchanged.
Fixes#3812
* fix(search): close compressed-body cap bypass and protect the partial notice
Addresses RaresKeY's review on #3955:
- Force Accept-Encoding: identity for the capped fetch. With gzip/deflate the
wire bytes (and Content-Length) can be a fraction of the decoded body, so a
tiny compressed response could pass the hard-cap preflight and then expand
past the ceiling in a single decoded chunk before the streamed cap could
slice it. Identity makes Content-Length the true body size and keeps each
streamed chunk bounded by the network read, so the hard ceiling actually
bounds memory.
- Lead web_fetch output with the partial-content notice and cap the page
title. The notice is the user-facing contract for partial fetches, but the
title is untrusted, uncapped page content; placed ahead of the notice a giant
title could push it past MAX_OUTPUT_CHARS and drop it. The notice now leads
and the title is capped as a second guard.
Adds regressions: the fetch advertises identity encoding, and a truncated
result with an oversized title still surfaces the partial notice.
* fix(search): reject compressed responses that ignore the identity request
Requesting Accept-Encoding: identity is not enough on its own: a server can
ignore it and still return Content-Encoding: gzip, and httpx.iter_bytes would
decode that, so a tiny compressed body could balloon into one decoded chunk
far past the hard cap before the streamed loop slices it (and Content-Length,
the compressed wire length, makes the preflight and size metadata unreliable).
Refuse a non-identity Content-Encoding before reading the body. Adds a
regression where the server ignores the identity request and returns gzip;
the fetch is refused before any body is decoded.
providers.py redefined REQUEST_TIMEOUT = 20 locally, shadowing the same
value in src/constants.py and risking drift if the constant is bumped.
Import it from src.constants and drop the local copy; same value, one
source of truth.
Closes#4329
* fix(api): normalize non-object JSON bodies to empty dict in token PATCH
Valid non-dict JSON (e.g. [] or null) reaches payload.get(...) and
raises AttributeError. Normalize to {} so the route returns a controlled
response instead of an unhandled 500.
Fixes#3966
* test(api): add regression tests for PATCH with non-object JSON bodies
Covers array body ([]), null body, and normal object body as requested
in alteixeira20's review of #3976.
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
NVIDIA Grace Blackwell GB10 / DGX Spark was missing from GPU_BANDWIDTH, so
_lookup_bandwidth() returned None for it and _estimate_speed() fell through
to the crude FALLBACK_K path (k/active-params). That over-stated tok/s and
let speed scores saturate regardless of the box's real ~273 GB/s LPDDR5X
pool — distorting model ranking on these 128GB unified-memory rigs.
Add "gb10": 273 (GB/s). nvidia-smi reports the device name as "NVIDIA GB10",
which substring-matches the new key, so detected GB10 boxes now estimate
speed from the real bandwidth instead of the fallback.
* log(app): add warnings to silent except Exception blocks
- Internal tool auth header failure now logs a warning instead of
silently passing, making auth bypass easier to spot in logs.
- Token last_used_at update failure now logs at DEBUG (fire-and-forget,
non-critical, but useful when debugging token tracking issues).
- Image ownership verification failure now logs a warning so unexpected
access-check errors surface instead of silently allowing the request.
* log(chat_routes): add warnings to silent except Exception blocks
- clear_orphaned_session_endpoint: log before rollback so failures
appear in traces when users see stale/deleted model options.
- _endpoint_has_model (JSON parse): log malformed cached_models instead
of silently treating endpoint as valid.
- _has_any_visible_model (JSON parse): log malformed cached_models
instead of silently returning empty list.
- timezone header parse: log failure so time-zone-related tool bugs
(wrong scheduled times, calendar events) are traceable.
- attachments JSON parse: log failure so silently-dropped attachments
are visible in server logs.
* log(email_routes): add warnings to silent except Exception blocks
- Email alias resolution failure now logs a warning instead of silently
returning an empty list, making broken account configs diagnosable.
* log(document_routes): add warnings to silent except Exception blocks
- Export ZIP request body parse failure now logs a warning so empty
exports caused by malformed requests are diagnosable.
- clear_active_document failure on detach now logs a warning to help
trace doc re-injection bugs like #1160.
* log(agent_loop): add warnings to silent except Exception blocks
- builtin tool overrides load failure now logs a warning so misconfigured
settings don't silently fall back to defaults without a trace.
- Timezone context injection failure now logs a warning to help debug
incorrect scheduled times in agent-created tasks.
- PDF form-backed document detection failure now logs a warning so
broken form-doc UI is traceable to the root cause.
* log(llm_core): add warnings to silent except Exception blocks
- Malformed URL in _is_ollama_native_url now logs a warning so bad
endpoint configs are traceable instead of silently returning False.
- Model list fetch failure now logs a warning with the endpoint URL so
endpoints that silently vanish from the model picker are diagnosable.
* log: pass exception via exc_info instead of string interpolation
* fix(logging): avoid logging raw URLs in llm_core error paths
Drop the raw url/base_chat_url from the Ollama-detection and
model-list-fetch warning logs added by this sweep, since these values
can contain private hostnames, internal IPs, credentials, or other
deployment details.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(core): abstract runtime path logic for frozen distribution packages
* Address review feedback: revert browser MCP check, persistent data dir default when frozen, and add path tests
* feat(email): add Google OAuth2 for Google Workspace / .edu IMAP & SMTP
Google deprecated basic-auth (password) access for Google Workspace
accounts in May 2025. This means any .edu or org Google email account
could no longer connect via IMAP/SMTP with a username + password —
the email feature was silently broken for a large class of users.
This PR adds full OAuth2 (XOAUTH2) support for Google accounts so
Workspace / .edu emails work out of the box.
## What changed
### Backend
- `core/database.py`: add `oauth_provider`, `oauth_access_token`,
`oauth_refresh_token`, `oauth_token_expiry`, and `display_name`
columns to `EmailAccount` + idempotent migration
- `routes/email_helpers.py`: XOAUTH2 auth in `_imap_connect()` and
`_send_smtp_message()`, automatic token refresh, OAuth fields in
`_get_email_config()`
- `routes/email_routes.py`: OAuth authorize + callback routes,
`_smtp_ready()` fix, OAuth fields through `_deliver()` closure,
`display_name` in `From:` header
### Frontend
- `static/js/settings.js`: "Google Workspace / .edu" provider preset,
"Connect with Google" button, success/error banner, display name field
- `static/js/document.js`: `_accountCanSend()` recognises OAuth accounts
as SMTP-capable
* security: sign OAuth state, scope callback by owner, fix quotes & logs
Addresses reviewer feedback on the email OAuth2 PR:
- OAuth state is now HMAC-SHA256 signed (keyed with the app secret from
secret_storage) encoding account_id + owner + a random nonce, and is
verified with constant-time comparison in the callback before any
token write. Replaces the bare account_id state, closing the CSRF /
state-guessing gap.
- Callback extracts the owner from the verified state and re-checks it
against EmailAccount.owner before writing tokens, matching the
ownership guards used elsewhere in the email routes. Single-user mode
(owner == "") still accepts any account, consistent with
_assert_owns_account.
- Replaced curly/smart quotes in the Name/Email/Display Name input rows
with plain ASCII so getElementById lookups and event wiring work.
- Stripped account name, SMTP host/user, owner, and raw provider error
text from send-config and OAuth logs; failures now surface as generic
error codes in the redirect instead of raw exception strings.
* test(email): add OAuth2 state, _smtp_ready, and XOAUTH2 tests
Move the OAuth state sign/verify helpers out of the setup_email_routes
closure into module-level make_oauth_state/verify_oauth_state in
email_helpers.py so they can be unit-tested, then add tests/test_email_oauth.py:
- signed state round-trips account_id + owner, nonce is unique per call
- tampered account_id, forged signature, and garbage states are rejected
- _smtp_ready treats an OAuth account (no password) as send-capable, and
still rejects host+user-only accounts with neither password nor OAuth
- _xoauth2_string / _xoauth2_bytes produce the correct SASL XOAUTH2 framing
14 new tests; existing test_security_regressions.py still passes (28).
* refactor(email): single XOAUTH2 frame helper, use RuntimeError
Polish from self-review before merge:
- Collapse the XOAUTH2 framing to one source of truth: _xoauth2_raw()
returns the unencoded SASL string used by both the SMTP and IMAP auth
callbacks (each library base64-encodes it), and _xoauth2_bytes() is
just its .encode(). Removes the unused base64 _xoauth2_string helper
and the duplicated inline frame in _send_smtp_message.
- Raise RuntimeError (not bare Exception) for the "OAuth token
unavailable" path, matching the convention used across src/.
- Update tests accordingly.
All 14 OAuth tests + 28 security regressions pass; SMTP/IMAP XOAUTH2
verified live against a real Workspace account.
* tests(email-oauth): cover the security-sensitive OAuth paths before merge
The previous tests only exercised pure helpers (state signing, _smtp_ready,
XOAUTH2 framing). This adds coverage for the actual token-custody and
ownership behaviour, pinning the real route handlers rather than
re-implementations of their logic.
Real OAuth callback route (pulled live from setup_email_routes()):
- missing code -> generic missing_code redirect, no account id / owner in URL
- provider error -> generic google_error redirect, raw error not echoed
- tampered/invalid state -> invalid_state redirect, auth code never leaked
- signed state with owner mismatch -> token write refused (ownership_error),
DB row left untouched
- signed state with matching owner -> tokens written encrypted, and only to
the intended account (a second account stays untouched)
Real accounts-list route:
- exposes oauth_provider status but never the access/refresh token values,
encrypted or otherwise
Token storage / refresh helpers (isolated in-memory SQLite, mocked HTTP):
- refreshed access token stored encrypted; expiry is a timestamp, not a token
- fresh token uses cache (no refresh call); expired token triggers refresh
- refresh HTTP failure returns None silently, no exception or secret surfaced
- missing client credentials short-circuits to None
Password-account regression:
- password IMAP accounts call conn.login(); OAuth accounts call XOAUTH2
authenticate() and never login()
28 tests pass (14 prior + 14 new).
* fix(email-oauth): drop raw exception text from token-refresh log
Google token refresh failures now log the account id only, matching
the conservative logging used elsewhere on the OAuth path — no raw
provider/exception details surfacing in logs.
* fix(email-oauth): bring OAuth UI parity to the Integrations email form
The Google Workspace / .edu provider preset, Display Name field, and
Connect-with-Google flow were only wired into the Email-tab account
form. The Integrations-tab form (a separate code path for the same
account type) was missing all three, so the OAuth option was invisible
from that entry point. Mirrors the same PROVIDERS entry, OAuth section,
and connect handler so both forms behave identically.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* fix(notes): fail closed when an unauthenticated request reaches owner-scoped routes
The notes CRUD routes resolved the acting user with bare get_current_user().
A request that reached them with no identity (auth-middleware regression,
SSRF from a sibling service) came through as user=None — which every query
treats as the single-user mode: list all accounts' notes, read/update/
delete/pin/archive any row, reorder globally.
Resolve the owner through require_user() instead, which already encodes the
right policy: 401 when auth is configured, while the documented anonymous
modes (AUTH_ENABLED=false, LOCALHOST_BYPASS on loopback, unconfigured
first-run) still resolve to the single-user path. fire-reminder in the same
file already gated this way; the CRUD routes now match, and the inline
require_user import there is folded into the module import.
Extracted from #2940 (stabilization slice).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test(notes): drive fail-closed test via ASGITransport, not sync TestClient
The focused fail-closed test hung at `TestClient(app).get(...)` on some
environments. Starlette's sync TestClient runs the app in a background
event-loop thread (anyio blocking portal) and then dispatches each sync
endpoint onto a second worker thread; that handshake deadlocks on certain
anyio/httpx/platform combos. The identity injection also used
BaseHTTPMiddleware (@app.middleware("http")), the other known TestClient
deadlock source.
Switch to the repo's existing httpx.ASGITransport + AsyncClient idiom so the
whole request runs on the test's own event loop (no portal thread, no
BaseHTTPMiddleware). Identity now comes from a pure-ASGI shim that writes the
same request.state fields the real auth middleware sets, and a non-loopback
client peer keeps require_user's loopback fall-throughs out of the picture.
Same assertions and coverage; production code unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
_reminder_minutes matched the offset with (?:m|min|minute|minutes)\b and
(?:h|hr|hour|hours)\b. The trailing \b makes the common plural
abbreviations "mins"/"hrs" fail to match (after "min" the "s" is a word
char, so no boundary), so reminder_minutes "5 mins" or "2 hrs" returned
None and the event was created with no reminder, silently.
Widen the two unit regexes and the matching reminder_only description
regex to a strict superset that also accepts mins/hrs. The sibling
duration parser already accepts these forms (it has no \b), so this only
brings the reminder parser in line.
* test: align README presentation guards with the #4306 refresh
The 'Refresh README presentation' change (#4306) swapped the ASCII banner
for a centered wordmark image and moved the native quickstart into
docs/setup.md, which left four base tests failing on dev and froze the
merge gate:
- test_security_regressions::test_readme_native_quickstart_uses_loopback
now also accepts the loopback guidance from docs/setup.md, where the
quickstart moved (no behaviour change; the guidance is intact there).
- test_readme_ascii_fenced guards the new wordmark title instead of the
removed ASCII banner, and keeps a defensive check that any reintroduced
box-drawing banner stays inside a code fence (the original #1390 mode).
- The five unreferenced demo gifs under docs/ (chat, compare, document,
notes, research) are removed so test_docs_no_orphan_images passes; they
were de-referenced by the refresh. Recoverable from history if a docs
page wants to embed them again.
* chore: refresh PR checks
---------
Co-authored-by: Alexandre Teixeira <alexandremagteixeira@gmail.com>
build_models_url returns /models (no /v1 prefix) for non-local generic
OpenAI-compatible hosts (intentional, see endpoint_resolver.py:206). The
tests added in #4272 expected /v1/models, which is the local/deepseek
behavior. Match production semantics.
The two delete-ordering tests did monkeypatch.chdir(tmp_path) and wrote the
image under tmp_path/data/generated_images, but DATA_DIR (and therefore
gallery_routes.GALLERY_IMAGE_DIR) is always an absolute path, so the delete
resolver pointed at the repo's real data dir and ignored the chdir.
test_file_removed_on_successful_delete therefore failed on dev (the file at
the tmp path was never the one being removed), and test_file_kept_when_commit_fails
passed only by accident. Set GALLERY_IMAGE_DIR to the seeded tmp dir via
monkeypatch so both tests exercise the real path and pass deterministically.
* test(hwfit): assert the Apple matcher, not the general lookup, in the non-Apple guard
f7aa2de (#2564) added test_non_apple_gpu_with_cores_does_not_match, which
asserts _lookup_bandwidth(RTX 4090) is None. But '4090': 1008 has been in
the general GPU_BANDWIDTH table since v1.0, so _lookup_bandwidth correctly
returns the card's real bandwidth and the test fails (expected None, got
1008) - reddening the required pytest gate on dev and, by inheritance,
every open PR.
The guard's actual intent is that the Apple-specific bandwidth path does
not false-match a non-Apple card that carries a gpu_cores count. Point
the two asserts at _lookup_apple_bandwidth, which returns None for any
name without 'apple' regardless of the general table. The general-lookup
behavior (4090 -> 1008) is correct and untouched.
* fix(hwfit): route string GPU names through the Apple bandwidth helper
Second half of the #2564 regression (RaresKeY review on #4303). That
change moved the Apple tiers out of the generic GPU_BANDWIDTH table into
the dict-only _lookup_apple_bandwidth, but _lookup_bandwidth only called
that helper for dict inputs. A bare-string caller like
_lookup_bandwidth("Apple M3 Max") therefore fell through to the generic
table, found no Apple key, and returned None instead of the conservative
tier. Route both dict and string inputs through the Apple helper (a
string carries no gpu_cores, so it gets the model's lowest tier).
Regression added for the string path plus a non-Apple string control.
* fix: resolve Apple Silicon bandwidth variants
* fix(hwfit): preserve string lookup path in _lookup_bandwidth
* fix(hwfit): guard Apple bandwidth lookup against false GPU matches
Add "apple" not in gn check to _lookup_apple_bandwidth() so that
non-Apple GPUs with "m3"/"m4"/"m5" in their names (e.g. NVIDIA
Quadro M4 000) don't incorrectly match Apple bandwidth tiers.
Addresses @o3LL review comment on PR #2564.
#4159 (4b0a977) made build_models_url insert /v1 for path-less bases, so
the TestBuildersRejectLookalikeHosts model assertions that expected
/models started failing and turned the pytest gate red on dev.
Both the generic OpenAI branch and the real Anthropic branch now end in
/v1/models, so a URL-only assertion no longer proves a lookalike host
dodged the Anthropic/Ollama branch. Assert _detect_provider == "openai"
directly and keep the /v1/models expectation.
- Agent: pass the open email reader (uid/folder/account/from/subject/body
preview) on every chat submit so 'reply to this' / 'write email saying
hi' route to ui_control open_email_reply with the right UID instead of
inventing a new .md draft. Code-level enforcement (chat_routes strips
create_document + send_email when active_email is set); cross-session
active_doc_id is now trusted instead of being silently dropped.
set_active_email/clear_active_email tool-layer helpers in
tool_implementations.
- ui_control open_email_reply: optional body argument so the agent can
open-and-write in one call; envelope now forwards uid/folder/account/
body/panel through tool_output. Tool description sharpened and the
parser rejects empty bodies on reply/reply-all (forces the agent to
write rather than open an empty draft).
- Email library: search now runs against [Gmail]/All Mail when the
current folder is INBOX (archived emails surface). Whirlpool spinner
+ 'Searching…' placeholder while in flight. Each search result is
stamped with its source folder so clicks open the right email instead
of whatever shares its UID in INBOX. Search no longer re-applies the
same text pill locally (which only checks subject/from/snippet, never
body) so body-only matches don't get dropped after IMAP returns them.
Initial inbox load bumped 100→500.
- Email favorites: 'Favorite (pin to top)' / 'Unfavorite' in both the
card menu and the open-reader more menu, backed by a new
/api/email/flag/{uid}?on=true|false endpoint. Flagged emails always
bubble to the top of the grid regardless of active sort.
- AI reply in doc editor: never overwrites existing draft text or the
quoted history. AI suggestion is prepended; AI-generated 'On …
wrote:' re-quotes are stripped so the original quote isn't visually
edited.
- Cookbook serve: pre-launch GPU driver / has_gpu / install / version-
floor checks (vllm minimax_m2 needs 0.10.0+, deepseek_r1 needs 0.7.0
etc.) before the launch chain starts. Detect 'another model already
running on this host' and offer Stop & launch (with graceful then
force tmux kill helpers, port release wait). Per-vendor deep-link
buttons (vLLM recipe / SGLang cookbook) with hardware hash. Backend
picker is now a custom dropdown with accent-coloured logos for vLLM,
SGLang, llama.cpp, Ollama, Diffusers; same glyphs added next to
package names in Dependencies. Runtime-readiness note moved inside
the panel (green when ready, red when missing) with an × dismiss.
Esc collapses the expanded card; expanded card scrolls when it
overflows; Trust Remote / Auto Tool / Reasoning Parser / Enforce
Eager / Prefix Caching / Expert Parallel / Speculative / MoE Env on
one row (Reasoning Parser auto-detected per model family).
Dtype→Row 1, GPUs→Row 2 (rightmost). Removed redundant GPU 'auto'
input — command builders read from the GPU button strip. Default
cookbook open is Download tab.
- Cookbook hwfit: 'Model (latest)' / 'Model (oldest)' header sorts by
release_date; release dates can be backfilled with the new
scripts/backfill_model_release_dates.py and recipe metadata pulled
with scripts/import_from_vllm_recipes.py against the upstream
vllm-project/recipes catalog (vllm_recipe + min_vllm_version stamped
on entries).
- Calendar: Quick add hint cycles a random Odysseus-themed example per
open (wooden horse Friday, crew muster 10am daily, council on
Ithaca, …). Typing a time like '11pm' in the event title updates
the hero clock live.
- Doc editor: email-mode Reply button (sparkle icon, accent) opens the
same Fast/Full + context popover the email reader uses; Ctrl+Alt+M
toggles markdown preview.
- Memories panel: custom sort picker with per-option icons, default
'Latest', visible Enabled/Disabled toggle text matching the section
description style.
* Agent: make skill-prescribed tools actually callable
The skill index and matched-skill procedures are injected into the
prompt, but tool selection never followed: manage_skills wasn't in the
RAG-selected schema list (so the model substituted manage_memory), and
a matched skill could prescribe tools (grep, read_file) the model had
no schema for. Now:
- manage_skills rides along whenever the owner has any skills indexed
- a Jaccard-matched skill's requires_toolsets join the selection
- viewing a skill mid-turn via manage_skills unlocks its
requires_toolsets for subsequent rounds
- admin-intent turns send _ADMIN_TOOLS schemas, matching the prompt
text _build_base_prompt already advertises
- index_for(active_toolsets=None) no longer hides requires_toolsets
skills from callers that don't know the active set
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Agent: validate skill requires_toolsets against known tools, not TOOL_SECTIONS
grep/glob/ls ship as function schemas without a prompt-prose section,
so gating on TOOL_SECTIONS silently dropped them from a skill's
requires_toolsets.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
* fix(research): preserve Discuss spin-off primer during context trimming
trim_for_context() kept only system_msgs[:1] as essential and dropped the
rest under budget pressure. A research "Discuss" spin-off seeds the report
as a system message that sits after the preface system messages, so it
landed in extra_system and was the first thing evicted once the chat grew
— the conversation then lost its grounding and drifted off task.
Treat any system message carrying research_spinoff_from metadata as
essential, alongside the leading system prompt, so the seeded report
survives trimming. maybe_compact already retains all system messages.
Tests: tests/test_context_compactor.py::TestResearchPrimerPreserved
* fix(research): ground Discuss spin-off chats on the seeded report
build_chat_context injected global memory (pinned + hybrid-retrieved) and
personal-doc RAG every turn, keyed off the user-level memory_enabled pref
and a request-scoped use_rag flag — never the session. A research spin-off,
whose primer declares the report the sole knowledge base, thus had
unrelated keyword-matched facts pulled in ("wrong data") competing with the
report; its rag=False flag was also ignored (use_rag defaulted on).
Add _session_is_research_spinoff(sess) (detects the primer research_spinoff_from
metadata; handles ChatMessage and dict forms) and, for such sessions,
disable memory injection and force RAG off.
Tests: tests/test_chat_helpers.py spin-off detection cases
---------
Co-authored-by: Dan (cirim) <claude@cirim.org>
Clicking the card body outside the edit <textarea> bubbled to the card's
click handler and collapsed the card, silently discarding unsaved skill
edits (issue #4002). The textarea's own stopPropagation only shields
clicks landing on it. Bail out of the card click handler while a
.skill-md-editor is present so the card only leaves edit mode via Save
(Cancel button is handled separately by #3580). Mirrors the same guard
into the built-in capability card, which shared the bug.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The scripts/odysseus-backup snapshot/restore CLI was undocumented in
README.md and docs/. Add docs/backup-restore.md covering the snapshot,
list, verify, and restore subcommands, default include/skip behavior
(deep_research and mail-attachments skipped unless flagged), the
destructive-restore warning and its data.before-restore-* stash, a cron
example, and Docker-vs-native data/ paths (including the ChromaDB named
volume caveat). Link it from the README Data section.
Addresses the "Backup/restore guide and helper flow for data/" item in
ROADMAP.md. Docs only; no change to the tool.
Fixes#2583
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(auth): add per-user admin promote/demote toggle
Admin-only API and Users-tab control to grant/revoke admin rights; refuses to demote the last admin.
* fix(auth): restore pre-admin privilege restrictions on demotion
Promoting now stashes the user's privilege map (privileges_before_admin)
and demoting restores it instead of resetting to defaults, so a
promote/demote round trip can no longer broaden a restricted user's
access. Users without a stash (created as admin, or promoted before this
fix) still demote to DEFAULT_PRIVILEGES so a born-admin's stored all-True
map — including can_use_bash — can't survive demotion.
---------
Co-authored-by: K M Merajul Arefin <merajul.arefin@therapservices.net>
delete_gallery_image() deleted the on-disk file before setting
is_active=False and committing. If that commit failed and rolled back,
the record stayed active but its file was already gone — a broken,
unviewable image (data loss).
Soft-delete and commit first, then remove the file best-effort, so a
missing or locked file can no longer 500 a delete that already succeeded
logically.
Adds tests/test_gallery_delete_file_ordering.py covering the
commit-failure (file kept) and success (file removed) paths.
start-macos.sh guarded venv creation with `[ ! -d venv ]`, which trusts any
existing venv/ directory even when a prior run was interrupted before pip was
bootstrapped into it. Re-runs then failed with "No module named pip" and never
self-healed, contradicting the script's "safe to re-run" promise.
Validate that the venv has a working pip before reusing it, and rebuild it
otherwise.
Fixes#3105
When a session ID is sent to POST /api/memory/import but that session no
longer exists in the DB, the previous code raised HTTP 404. The import
endpoint only needs the session as an LLM-config source; the file being
imported has nothing to do with the session. A fallback to the utility
endpoint (already used when no session_id is supplied at all) is correct
and safe.
The extract endpoint is intentionally left alone — it reads the session's
message history and therefore genuinely requires a live session.
Co-authored-by: clochard04 <clochard724@gmail.com>
windowDrag.js ran its own top-edge fullscreen system (cy <= SNAP_PX →
_enterFs()) independently of the tileManager.js snap zones, causing
duplicate/unexpected fullscreen behavior when dragging window chips
toward the top of the screen.
Hardcode enableFullscreen to false. tileManager.js remains the single
source of truth for fullscreen/maximize snap behavior and is untouched.
* fix(security): encrypt CardDAV password at rest in settings.json
CardDAV password was stored in plaintext in data/settings.json, while
other secrets (email, CalDAV) are encrypted using src.secret_storage.
On read (_get_carddav_config): decrypt the password via decrypt().
On write (update_config): encrypt the password via encrypt() before
saving to settings.json.
decrypt() is a no-op on plaintext, so existing deployments upgrade
transparently on the first read after the next config save.
* test: add coverage for CardDAV password encryption
Nine tests covering:
- encrypt-on-save and decrypt-on-read round-trip
- encrypted value is stored with enc: prefix (plaintext absent from file)
- legacy plaintext passthrough
- CARDDAV_PASSWORD env var passthrough (not decrypted)
- empty password / no settings file
- double-save does not corrupt
- encrypt() idempotent on already-encrypted value
* fix(kimi): resolve Kimi Code API 403 errors and User-Agent restrictions
Kimi Code subscription keys require a whitelisted coding-agent User-Agent to avoid access_terminated_error 403s. This adds User-Agent probing and caching for Kimi Code endpoints.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(kimi): omit temperature for kimi-for-coding API calls
Kimi Code rejects any non-default temperature with HTTP 400, which broke deep research probes and low-temp LLM rounds.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
The 'HF token: NOT SET' shell hint shown when downloading a gated/private
model told users to add a token under 'Odysseus Settings -> Cookbook ->
HuggingFace Token'. There is no Cookbook section under the app Settings;
the HuggingFace Token field lives under the Cookbook page's Settings tab
(static/js/cookbook.js — data-backend="Settings" group). Following the
old hint led nowhere. Reverse the path to match the real UI.
Fixes#3829
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
The dashboard background status reconciler (_pollBackgroundStatus) only
recovered "done" for dependency installs when the backend reported a
finished task as "stopped". A real model download whose tmux pane is
gone after DOWNLOAD_OK (so the dead-session check misses the landed
snapshot) fell through to `task.type === 'download' ? 'crashed'`, so a
completed download was shown as crashed (and stalled on the Serve tab).
Recover "done" from the terminal DOWNLOAD_OK sentinel, mirroring the
dep-install recovery already present. The background poll runs blind, so
it keys off the conclusive exit-0 sentinel only — not the `/snapshots/`
path, which can be printed mid-stream for multi-file downloads and would
risk marking an incomplete download done.
Fixes#3897
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add check for mobile screen width (<= 768px) to prevent accidental submissions via the Enter key.
- Update event listeners in static/app.js and static/js/chat.js to respect this constraint.
When a download's tmux pane is gone, the status endpoint trusted only the
HF-cache probe to tell completed from stopped. The probe derives its cache
root from its own environment, but the download runner exports
HF_HOME=<local_dir> (the #2722 fix), so custom-dir downloads land in
<local_dir>/hub where the probe never looks - and ollama pulls don't touch
the HF cache at all. Finished downloads were reported as stopped forever,
and tasks already persisted as completed were demoted back to stopped on
the next poll. This is the backend half of #3897, deliberately left out of
the frontend fix in #4000.
- honor the conclusive runner markers first: DOWNLOAD_OK -> completed
(keeping the "Fetching 0 files" error guard), DOWNLOAD_FAILED -> error
- pass the task's local_dir through to the cache probes so they check the
cache the download actually wrote to, keeping the env-var fallback for
default-cache downloads
- move the probe scripts and marker classification into
routes/cookbook_output.py (dependency-free) with behavioral tests
Fixes#4017
* fix(agent): don't let a materialized default budget defeat context scaling
#1230 scales agent_input_token_budget to the model's context window unless
the user explicitly set a budget, detected via is_setting_overridden(). But
the settings-save path materializes every DEFAULT_SETTINGS key into
settings.json (load_settings merges defaults; handlers persist the merged
dict), so the persisted default 6000 reads as "overridden" and the budget
code takes the min(6000, ctx) branch — silently re-capping long-context
models at 6000 for anyone who has ever saved a setting. This reintroduces
the exact regression #1170/#1230 set out to fix.
Add is_setting_customized() (saved value != default) and gate the scaling
on it instead of mere presence. A persisted default is not a user choice.
is_setting_overridden has exactly one consumer (this budget path), so the
change is contained. Tests cover the materialized-default regression, a
deliberately-chosen budget still being honoured, and the absent-key case.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent): rework context-budget fix per review (#4122)
Address RaresKeY's review:
P2 (explicitness): is_setting_customized treated a saved value equal to the
default as "not explicit", which ALSO blocked a user from deliberately pinning
the default budget. Reframe the default value itself as the AUTO sentinel —
agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context
window", any other value is an explicit cap. A materialized default still reads
as auto (fixing the original regression), and any non-default value the user
chooses is now honoured. Drop the now-unused is_setting_customized helper.
P2 (fallback context): auto-scaling trusted get_context_length() even when it
returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known
window), over-allocating on self-hosted/proxy setups. Add get_context_length_known()
(also returns whether the window was actually discovered); the budget block
passes 0 when unknown so auto-scaling stays conservative instead of inflating to
an unproven window.
hard_max stays auto-only — a deliberate explicit budget wins (#1190); kept that
contract and answered the reviewer's question rather than silently reversing it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(agent): lock the materialized-default budget regression (review on #4121)
Per WGlynn's review on the issue: add an end-to-end regression that saves an
UNRELATED setting (which makes the settings-save path materialize the budget
default into settings.json) and asserts the budget still auto-scales rather than
re-reading as an explicit 6000 cap — locking the exact reopening shut.
To make the test bite the production decision (not just re-derive it), extract
`budget_is_explicit()` into src/context_budget.py and use it from the agent loop.
It keys off value-vs-default (the default is the auto sentinel), NOT settings
presence — which is the whole point, since the save path materializes defaults.
Note: after this PR's rework, is_setting_overridden has ZERO production callers,
so the merged-dict materialization smell can't reach any setting through a
presence check today (WGlynn's durability concern).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent): bind the budget context window to its own provenance (review #4122)
RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop
kept only the `known` flag from get_context_length_known() and budgeted off the
passed-in `context_length`, which can come from a *different* lookup. Two failures:
- local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT
fallback while the fresh probe proves the real (smaller) served context — we'd
scale off the stale value;
- callers that don't pass context_length (scheduled tasks, teacher escalation,
skill test runs, bg_monitor) were capped at 6000 even when a long window is
discoverable.
Extract budget_context_for_model() which returns the freshly-probed window when
known else 0, binding the flag to the value it proves; the agent loop uses it.
Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(agent): fix stale budget comments + tighten to the contract (review #4122)
- settings.py: an explicit budget is clamped to the window only — hard_max is
auto-only (#1190); drop the incorrect "and to hard_max".
- is_setting_overridden docstring: drop the stale "adaptive budgets" example;
point value-sensitive callers at context_budget.budget_is_explicit.
- Tighten the budget-block comments to the contract (default = auto sentinel,
non-default = explicit cap, hard_max = auto-only ceiling).
Comment/docstring-only; no behaviour change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(agent): correct budget issue citations (#1190 → merged #1230/#1273)
The context-budget contract (auto-sentinel, explicit budgets honoured,
hard_max auto-only) merged via #1230 — #1190 was the earlier, closed,
superseded PR. Re-point the contract comments at #1230 (the live source,
already cited for the auto-sentinel two lines up in settings.py).
The configurable hard_max setting (`agent_input_token_hard_max`) was a
reviewer requirement first raised on #1190, omitted from the merged #1230,
and actually added in #1273 — credit #1273 for it and correct the test
comment's history (it previously implied this PR completed the requirement).
Comment/docstring-only; no behaviour change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make src.youtube_handler a compatibility wrapper around services.youtube.youtube_handler so transcript state, URL parsing, and timeout behavior no longer diverge.
Resolve remove_directory_from_rag paths through the same PERSONAL_DIR confinement helper used by add_directory_to_rag before removal sinks are reached.
Lock the API key encryption key file to owner-only permissions on creation and when reading existing keys, with regression coverage for permissions and encryption roundtrip.
Clarify in the Docker install section that Apple Silicon Docker cannot use Metal GPU acceleration for Cookbook model serving and point users to the native Apple Silicon path.
Fixes#4232
Convert email search and archive handlers from async def to sync def so FastAPI runs their blocking IMAP I/O in the threadpool instead of the event loop.
When the backend (vllm / sglang / llama_cpp / diffusers) is missing on
the chosen serve target, the runtime-readiness note already flips red
and reads '<backend> missing on <host>.' but offered no fix path.
Append an accent-coloured link that calls openCookbookDependencies with
expandRecipe + the model's repo id, so one click switches to the
Dependencies tab, expands the right backend row's recipe panel, and
pre-selects the model so the user just hits Run.
- Drop the 'Install via' label and the pill-tag variant buttons. The
toggle is now the same sliding-pill mode-toggle used by the
Agent/Chat selector in the chat input. Pip/uv on the left, Docker on
the right, default = Pip/uv. CSS: extended .mode-chat::before's
translateX(100%) rule to also fire on .mode-right so non-chat
callers can use the same animation without claiming the chat-only
class name.
- Copy button moves inside the <pre>: absolute-positioned at the
top-right corner, icon-only, padding-right on the pre makes room.
Matches the Setup-token copy pattern in the integrations form.
Each recipe catalog entry now carries two variants:
variants.pip → uv pip install …
variants.docker → docker pull <image>
A small 'Install via' pill row in the panel toggles between them
(default = Pip/uv per the user's preference). Switching variant or
changing the model re-renders the <pre> via _refreshRecipePre(); the
display text drops the 'source venv/bin/activate' prefix for Docker
since docker pull doesn't need a venv. Run honours the active variant
so picking Docker queues 'docker pull …' as the tmux task.
Recipes now hold ONLY the install command(s). The rendered <pre>
prepends a 'source <envPath>/bin/activate' line so the user sees a
paste-ready sequence; Run uses env_prefix (same path the Install
button uses) to activate the configured venv before the install
command, so the install lands in the existing environment rather
than a fresh .venv in whatever CWD the tmux task happens to start in.
- cookbook-deps-recipes.js: trim each recipe to its single pip command
- cookbook.js: _recipeDisplayText() prepends the activate context for
display; pre's data-dep-recipe-install holds the raw install-only
command list so Run knows what to send; Run builds env_prefix the
same way _installDep does.
The recipe dropdown was a static catalog (MiniMax / Any vLLM model). Now
it lists every model already downloaded on the active server (the same
_cachedModelIds set the Launch tab + dl-dots already drive), plus an
'Other (generic …)' fallback. The change handler uses pickRecipe(backend,
modelId) to find the best match — MiniMax ids land on the MiniMax recipe,
everything else falls back to the generic install.
cookbook-diagnosis.js: openCookbookDependencies's pre-select logic now
matches by full option value (model id) instead of label substring, since
the dropdown values are full repo ids now.
Before the quickrun (Run) button fires /api/model/serve, ask the deps
API whether the chosen backend (vllm / sglang / llama_cpp) is actually
installed on the target server. If not:
- Toast: '<backend> not installed on <host>. Opening Dependencies …'
- Route the user into the Dependencies tab via the existing
_openCookbookDependencies helper (now exported as
openCookbookDependencies)
- Auto-expand the recipe panel for that backend
- Pre-select the user's model in the panel's picker so the right
recipe is highlighted out of the box
The serve task is suppressed; the Run button is re-enabled. Once the
install task finishes in Running, the user clicks Run again.
cookbook-diagnosis.js: openCookbookDependencies takes an opts object
that, when expandRecipe is set, finds the row's caret and clicks it,
then matches a recipe label by model (currently only MiniMax has a
specific entry; the generic fallback stays selected otherwise).
Each row for vllm, sglang, llama_cpp now carries an expand caret that
opens an inline recipe panel below the row. The panel has:
- 'Serving which model?' select populated from a new tiny catalog
- <pre> code block showing the exact shell sequence for that pair
- Copy: clipboard the commands
- Run: launch the joined 'cmd1 && cmd2 && …' as a tmux task on the
currently-selected deps server (same plumbing as Install)
New file: src/static/js/cookbook-deps-recipes.js — single source of
truth for the recipes. Seeded with MiniMax M2/M2.7 + a generic fallback
for each backend (all three use 'uv venv → source .venv/bin/activate
→ uv pip install ... --torch-backend auto', the recipe the user
pasted). Adding model-specific recipes is now a one-entry edit.
Next commit: Launch-tab pre-flight that intercepts the serve click
when the backend isn't installed and deep-links into this panel.
Rows inside the Advanced details were inheriting the standard
6px row-gap from .hwfit-serve-row (used to give the Core knobs
some breathing room). Inside Advanced — where the rows are
mostly single-line dropdowns — that read as half a row of empty
space between every pair.
Now inside Advanced only:
- grid row-gap drops to 4px
- label → control margin-top drops to 1px (was 2px)
- checks row gap also drops to 4px
Outside Advanced (Core, etc.) the original spacing stays.
Was rendering as a separate body block below the Copy/× toolbar.
Now the diagnosis message and the suggested-action text sit inline
on the left of the toolbar, with Copy and × pinned to the right —
reads as one self-contained header strip instead of stacked rows.
The tmux default 2000-line scrollback was getting blown out by
long vLLM tracebacks (DeepSeek-V4-Flash launch crash had the root
cause scrolled off; the user saw only the tail "See root cause
above"). Bumped:
- tmux server history-limit to 100000 at session creation (prepended
to each tmux new-session command so both local + ssh remote inherit
the larger scrollback)
- crash-watchdog capture-pane from -S -200 → -S -2000 so the
diagnosis includes the actual exception line
These models OOM on --kv-cache-dtype auto (≈bf16) at any usable
context with current tensor-parallel layouts. _detectModelOptimizations
now seeds opts.kvCacheDtype='fp8' for them, and the serve panel's KV
Cache select picks that up as the default unless the user has a
saved override on this skill.
The DeepSeek branch in _detectModelOptimizations matched only V3
and R1 literally. DeepSeek-V4-Flash (and future Vx / Rx) didn't
hit any branch, so the Expert Parallel checkbox + Speculative
defaults never surfaced in the Run panel. Widened to a regex that
catches v3/v3.1/v4/v5/v10+ and r1/r2/… for both the expert-parallel
flag and the MTP speculative defaults.
The +/- step buttons next to the Speculative tokens count read as
clutter for a 1-10 single-digit input — the native number-input
spinner + manual typing is enough. Reduced the input width to 44px
so it sits tight next to the method dropdown.
Reads the last hwfit scan's backend (window._hwfitSystemCache.backend)
and picks the right vLLM install path per vendor:
- NVIDIA/CUDA (default)
- uv: uv pip install -U vllm --torch-backend auto
- docker: docker pull vllm/vllm-openai:latest
- AMD/ROCm
- uv: uv pip install -U vllm --torch-backend rocm
- docker: docker pull rocm/vllm-dev:main
The <pre> previews are re-painted on render to match what Run will
actually launch, and the confirm dialog tags the backend so the user
knows what they're committing to.
Was just a copy-paste reference. Each row now has a Run button that
launches the command as a tmux task on the currently-selected deps
server (same path Reinstall already uses) — Odysseus does the work,
the user watches progress in the Active tab. Dropped the plain
pip option since the existing per-package Install button already
covers it; kept uv (recommended) and Docker pull as the two
alternatives.
- Moved "Extra args" out from above the vLLM advanced checks
(Reasoning Parser, Speculative, MoE Env) to AFTER them, so it
reads as "after the advanced toggles, anything else".
- Added a collapsed "Manual install (vLLM)" details block to the
Dependencies tab description with three copy-paste recipes:
uv venv + uv pip (recommended), plain pip, and docker pull
vllm/vllm-openai:latest. Useful when the in-app Install button
can't run (offline target, custom torch backend, etc).
max_tokens=0 made stream_agent_loop omit the param entirely, which
on some OpenAI-compat upstreams (DeepSeek in the report) meant the
model defaulted to a very short or zero-token completion — the user
saw "the model returned an empty" even though normal chat with the
same model worked (chat sends its preset's max_tokens). Match the
chat default.
- .adm-model-logo + .settings-select { margin-left: -4px } pulls
the select 4px closer to its logo chip so the row reads as one
unit instead of having an obvious gap between the icon and the
dropdown.
- Fallback-row selects get flex:1 so the trash-can sits flush
against the right edge of the row — matching the right edge of
the Endpoint and Model selects in the rows above the fallback
list (was rendering tight to the model select's content width).
Provider SVGs in providers.js declare only viewBox (no width/height),
so when injected into the 18×18 logo chips they fell back to the
browser default of 300×150 and blew out the row.
- CSS: SVGs inside settings logo chips (`span[id$="-logo"]`,
the 18px wrappers in fallback rows) now stretch to 100%/100% of
their container.
- Added matching `-logo` chip next to the Endpoint dropdowns in
Default Chat Model and Utility Model cards.
- New `_syncEndpointLogo` helper mirrors the selected endpoint
option's text label through providerLogo() (the select value is
a UUID and wouldn't match anything otherwise), and
`_fillEndpointSelect` calls it on each render.
- The API occasionally returns the same skill twice (built-in
shadow vs user copy, or a write/read race) which made the
duplicate-detector tag BOTH copies as the "recommended" keeper
(the find-skills card showing duplicate #1 twice). Loading now
filters out repeats by lowercased name before render.
- Reordered the per-skill kebab menu: Publish/Unpublish → Select
→ Edit → Test → Audit → Delete. Select previously sat at the
bottom; lifting it next to Publish puts the bulk-mode entry
point with the other bulk-style action.
- When Settings is expanded, the toggle bar's bottom radius/border
flattens and merges into the row below (zero gap, softer
top-border on the body) so the row visually reads as the toggle's
open-state content instead of an unrelated card below it.
- .research-query margin-top trimmed from 6px to 2px (lifts the
textarea ~4px closer to the description line above).
Auto handles 90%+ of cases — the row of category buttons was
visual noise on the main panel. Now:
- Removed the .research-category-row from above the textarea.
- Added a Format <select> inside Settings (next to Rounds) with
Auto / Product / Compare / How-to / Fact-check options. Default
is Auto, same as before.
- Updated all the JS that read .research-cat.active / data-cat to
read #research-category.value instead (_saveSettings, _readSettings,
_resetCategoryToAuto, _editJob, _restoreSavedSettings).
Same wire to the backend — settings.category still carries through.
Was rendering on its own row under "Multi-step web research with an
LLM-in-the-loop agent". Now appended to that same flex-wrap line as
"— past runs in Library, Research" so the header section stays one
visual block instead of two.
Was rendering on a second row below the "Past research" header,
inflating it to two rows. Now appended to the title span as a small
inline chip — "Past research — all in Library, Research" — keeping
the header at one row. Same click → close panel + open Library tab.
The capture-phase scroll listener was firing for scrolls anywhere
in the modal — including the Trending models list, which lives
inside the Direct Download fold body. Scrolling that list was
auto-folding the section that contains it.
Bail early if the scroll target is the fold body or a descendant —
the section only folds on scrolls in sibling scrollers (.cookbook-body,
.hwfit-list, .modal-content).
Each time the panel opens we pick a random entry from a list of 10
diverse research prompts (history, tech, food, science, fact-check,
how-to) so the textarea hint feels fresh and shows the breadth of
queries the tool handles instead of always nudging toward the same
Odysseus example.
- Removed standalone "Edit cmd & relaunch" — "Edit in serve panel"
renamed to "Edit & relaunch" and is now the single edit entry.
Tooltip notes that the raw cmd is still editable inside the panel.
- Tagged each item with a group (run / edit / endpoint / copy /
danger) and renderer inserts a thin divider whenever the group
changes, so the menu reads as visual blocks instead of one long
list.
- Header h2 inside the Active group now says "Active" (matches
the renamed tab) instead of "Running".
- Both context-menu Reconnect entries (the normal one and the
recover-from-vanished-process fix) say "Reconnect tmux" so the
user knows what the action actually does.
- Sibling cookbook-server-section-* blocks inside the Active group
get a top divider + 14px gap so transitions between server
groups (local / remote-host / etc) read clearly.
Previously the global GPU-toggle total was set once and never
overridden, so a first scan on the local 1-GPU container left
the Run-panel GPU button row stuck on GPU 0 even after switching
to a 4-GPU remote host. Now any scan returning a positive total
updates the binding; zero/missing values still don't clobber a
known-good count (no flicker during in-flight re-probes).
Mirrored the panel's runtime readiness note into a small chip
appended to the .memory-item-title at the top of the expanded
serve card. The in-panel note becomes a hidden source-of-truth.
This way the "vLLM ready on … : vLLM CLI: …; python package:
vllm 0.22.0" status sits inline with the model name where the
user is already looking, instead of buried below the toolbar row.
On initial render, compare total_ram_gb vs gpu_vram_gb — if RAM is
the larger pool, pre-select the RAM (count=0) button instead of the
max-GPU button. Boxes with more system RAM than VRAM (low-VRAM
GPU + lots of system memory, or CPU-only servers with a small
adapter) now open on the dominant pool.
New order: [Standard ▾] [Search ............] [Engine] [Quant] [Context]
so the two primary picks (type + free text) sit together at the
left, with the more advanced filters lined up to the right.
display:none toggle was instant and felt jarring during auto-fold/
auto-expand. Swapped to a CSS class `.is-folded` that transitions
max-height (0 ↔ 1200px) and opacity (0 ↔ 1) over ~280ms with ease,
so both manual chevron clicks and the scroll-driven toggles slide
in/out smoothly.
scroll handler now tracks per-target scrollTop via WeakMap. Downward
scroll on any scroller in the cookbook modal folds Direct Download;
scrolling back to top (scrollTop <= 0) unfolds it. Manual chevron
clicks still win — they persist to localStorage; auto-toggles
don't, so the user's last explicit pick survives reload.
IntersectionObserver missed the case because scrolling inside the
nested .hwfit-list (max-height:52vh own scroller) doesn't move the
header out of view at all. The user wants any downward scroll in
the scan/download area to fold Direct Download.
Switched to a capture-phase scroll listener on #cookbook-modal that
catches every scroll event from any nested scroller (.hwfit-list,
.cookbook-body, .modal-content). Folds only on downward scrolls so
scrolling back up doesn't keep re-folding.
The scroll listener on .cookbook-body never fired — the user is
likely scrolling inside the nested .hwfit-list (max-height:52vh)
which doesn't bubble to its parent. IntersectionObserver fires
whenever the Direct Download header crosses the viewport edge
regardless of which container moved.
Folds only when boundingClientRect.top < 0 (header pushed up past
the top) so modal close / detach doesn't trigger it.
Previous .modal-body / .cookbook-content lookup matched neither the
desktop scroller (.cookbook-body) nor the mobile one (#cookbook-modal
.modal-content), so the scroll listener was attached to document.body
and never fired. Walk up to whichever scroller actually exists.
Added a scroll listener on the parent .modal-body / cookbook-content
that folds the Direct Download body once its h2 header has scrolled
above the container's top edge. Frees the viewport for the Scan
section below while leaving the chevron clickable to expand again.
Auto-fold doesn't write to localStorage (only manual clicks do)
so the user's last explicit preference still wins on reload.
- Added a trending-up (market-up) SVG before the label, tinted
accent so the section reads as "what's hot".
- Chevron ▸ moved from the left to the right side of the toggle
row (still rotates via the existing CSS).
- Bumped the toggle row taller (26→34px) with 13px font + 18px
icon so the section header has more presence.
- Brain admin-card header rows get min-height:32px so cards with
toggles and cards without (Inject Skills) align.
- Cookbook Trending models tab nudged up 8px (top:-3 → -11).
- Removed the ↻ RESCAN button in hwfit toolbar; manual EDIT still
available and auto-probe runs on container restart.
- Reordered the toolbar so Audit sits left of Select (matches the
brain memories layout where bulk actions live before Select)
- Renamed "Audit all" → "Audit"
- Star icon in Audit now tinted with var(--accent, var(--red))
- Select button gets the same dot/X SVG swap used in brain
memories (dot in idle state, X when bulk-select mode is active)
Per user report — the tag's mode metadata coincided with a
500 error on agent mode (especially on mobile). Removing the
UI tag, the chat.js writes of metadata.mode, and the CSS pill
so agent mode posts work cleanly again.
Touches:
- chat.js: drop _sendMode capture + meta.mode writes (user + assistant)
- chatRenderer.js: roleTimestamp() back to a single (when) arg, drop
the .role-mode-tag append; updated three call sites
- style.css: dropped .role-mode-tag and .role-mode-agent rules
- Each input now has a sibling .email-field-prefix span (To / Cc /
Bcc / Subject) absolute-positioned at the left edge in the
accent color. Inputs get padding-left:44px (64px for Subject)
so typed text doesn't slide under the prefix.
- Placeholders shrink back to just the example so only the
prefix gets the accent color, not the example text.
- Cc toggle moved another 2px up (calc(50% + 4px) → calc(50% + 2px)).
- Removed the <label>To/Cc/Bcc/Subject</label> elements — they
doubled what the placeholder said.
- Placeholders now carry both the field name AND an example so an
empty input still tells the user what to type:
To recipient@example.com
Cc cc@example.com, example2
Bcc bcc@example.com
Subject
Adds a per-field X (24x24 SVG, opacity 0.4 → 1 + accent on hover)
absolute-positioned at the right edge of each Cc/Bcc field. Click
hides both rows, clears their inputs, and restores the Cc opener
on the To row. Inputs get padding-right:32px so the close button
doesn't overlap typed text.
Was `top: calc(50% + 4px)` which left the button 4px below the
true vertical center of the input — visibly misaligned. Dropped
the +4 offset so the toggle anchors at top:50% / translateY(-50%)
and tracks the input's center.
Also removed the redundant base rule's position:relative + top:2px
nudge — it was being overridden by the more-specific
.email-field .email-cc-toggle absolute positioning anyway.
- Renamed "Note (no timer)" → "Note".
- Clicking it now opens a small modal with a textarea + Save/Cancel.
- The typed text becomes the todo item; due_date is omitted so no
timer fires. Esc cancels; Cmd/Ctrl+Enter saves.
Re-adds the timer-less note path next to the time-based presets.
Picking it POSTs the same payload but omits due_date so the entry
lives in notes as a plain reply todo with no reminder firing.
Toast: "Reply note saved" instead of "Todo reminder set for …".
Was sticking on toggled-on state if the user closed the library
while in select-mode — reopening showed the Cancel/X toggle even
though no emails were selected. Force-reset state._selectMode and
state._selectedUids in openEmailLibrary so each open starts fresh.
Correct behavior:
1. Cached draft + first click → opens the cached reply
2. Cached draft + second click → clears the cache and opens the
Fast/Full + context menu so the user can request a fresh draft
3. No cache → opens the menu directly
Per-button shownOnce dataset tracks the first-click state so the
second click triggers the menu instead of replaying the cached
reply again.
- AI reply: removed the cached_ai_reply shortcut so clicking the
button always reopens the Fast/Full + context menu. Lets the user
ask for a fresh draft (with new steering) instead of being locked
into the cached one.
- .email-cc-toggle gets position:relative + top:2px so it
baseline-aligns with the To: field chips next to it in the
document email compose.
- Initial button: dot-in-circle SVG + "Select" label
- After click (select-mode on): X SVG + "Cancel" label + .active class
- Same SVG glyphs as memory.js so the two pages feel consistent.
Hooked into the toolbar Select toggle AND the bulk-bar Cancel button
so both reset the icon state.
When the chevron opens the details, the To/Cc rows pop up as an
absolutely-positioned panel anchored to the bottom of the meta
block — with bg, border, rounded corners and a shadow. Nothing
in the rest of the header reflows: From row stays put, the action
cluster stays put, the email body content stays put. This kills
all the height-/spacing-jump quirks the inline-expanded design
was fighting.
- .email-reader-meta .recipient-chip drops max-width and overflow
truncation so the full name renders in each chip. The parent
.recipient-chips span already has overflow-x:auto, so users can
swipe horizontally to reveal any chip whose tail is clipped off
the right edge of the row.
- Strong (From: / To: / Cc:) labels get explicit white-space:nowrap
+ flex-shrink:0 so they never truncate even when the row is
squeezed to its minimum width.
- Horizontal: max-width and left already clamped to viewport-16.
- Vertical: prefer below the button, but flip ABOVE if there's
more space there (e.g. button near the bottom of the viewport).
- max-height clamped to viewport-16 with overflow:auto as a final
guard so the menu can never extend past the screen edge.
Dropped the two-step (pick mode → context → OK) flow. Now the
context textarea is at the top of the popover and Fast (left) /
Full (right) sit below as the confirm buttons themselves — they
fire the draft with whatever's currently in the textarea (empty
= no steering).
The document-level capture listener was closing the popover on
ANY click — including clicks inside the context textarea, which
made it impossible to focus the input. Replaced with an inline
handler that bails when the click target is inside the menu.
Restructured flow:
1. Click Fast or Full → reveals an optional context textarea
("Add context (optional)") below
2. Type optional steering note or leave blank
3. Click OK → triggers the draft with the chosen mode + note
Dropped the standalone … note-toggle button — the textarea is now
gated on picking a mode, which makes it easier to discover.
- Removed the conditional Draft fast / Draft full buttons. Note
textarea is always-on via the … toggle, and whatever's in it
is picked up by the existing Fast / Full buttons as noteHint.
- Clamped the popover max-width and left position to
Math.min(220, viewport-16) + 8px margin so the (now wider) menu
doesn't spill off the right edge on narrow mobile screens.
Top row keeps Fast / Full + a new horizontal-dots button. Clicking
the dots reveals a textarea ("e.g. reply nicely but say no"); as
soon as text is in it the panel shows Draft fast / Draft full
buttons that pass the note through as noteHint to the AI reply
endpoint. Empty textarea hides the draft buttons so the user only
gets the steered draft when they've actually typed direction.
Firefox mobile rendered the backdrop-filter:blur + var(--panel)
combination on the slide-out sidebar as semi-transparent, so the
chat input bar's selected-model label (e.g. "minimax") was
visible behind the drawer. Force background:var(--panel) and
backdrop-filter:none inside the mobile @media block.
The left-edge gradient fade was likely the source of the perceived
shadow under the icons on mobile. Forced background:none and the
matching padding-left:0 on mobile so the cluster reads as bare
icons without any soft edge.
Was only the date moving — the expanded card had a more-specific
`padding: 4px 0 6px` shorthand on the title row that zeroed out
the padding-left from my earlier nudge. Added the expanded-card
selector to the padding-left:4px rule so the title now lines up
with the meta line in both list and expanded states.
Was using flex-wrap:wrap on the From row, which let the chip span
flip onto a new row below From: when the available width briefly
dropped — then snap back as the chip span's overflow-scroll kicked
in. Switching to flex-wrap:nowrap keeps the label glued to the
chip; the chip span shrinks/scrolls horizontally instead.
In docked mode the header already reserves vertical space for the
absolute action cluster, so the To/Cc details fit without any
height tradeoff — force [hidden] open and hide the chevron toggle
so the recipients are always visible there.
Was From→To = 0 (meta gap 2 + details margin-top 0) while To→Cc
was 6 (details gap). Set details gap to 2 in docked too so all
three meta rows have the same vertical distance. Dropped the
per-row margin-top:4 docked override since spacing now comes
entirely from gaps.
Docked header is flex-direction:column, and the base
align-items:flex-start was sizing the meta to its chip width and
parking it at the left — the absolute cluster's right:0 then
landed at the meta's right edge in the middle of the pane.
align-items:stretch makes meta fill the header width so right:0
hits the actual right edge.
Dropped the docked-specific overrides (cluster flowing below meta,
padding-right:0, header min-height:0). The same container-query
rules drive both: cluster floats top-right and wraps to 2 rows
when the reader width crosses 600px, snaps to overlay below 380px.
Docked pane width is just another container width.
1. Moved the min-height from .email-reader-header to .email-reader-meta
(92px) inside the <600 container query. Targeting the container
itself in its own @container rule was flaky; using a descendant
that affects the parent's intrinsic height works reliably.
2. Dropped the margin-top:0 reset on the cluster in the <380 overlay
rule — that was clearing the base -7px lift and sliding the
cluster ~7px downward at the breakpoint. Now both states use the
same -7px lift so the visual position is stable across the
transition.
Dropped the @media(769px) from-row min-height + align-items:center
and the strong > top:-2px nudge — leftovers from the grid layout
that were forcing extra height and label offsets the block-flow
meta doesn't need.
Consolidated docked overrides into a single flat block (no @media
wrapper) and merged the two .email-reader-meta declarations into
one. Same visual result, much less competing CSS to debug.
When the cluster wraps to 2 rows (44 + 4 gap + 44 = 92px tall), it
was peeking out below the header bottom because min-height stayed
at 60px (only ~44px of cluster room). Bumped min-height to 108px
inside the same <600 container query so the wrapped cluster sits
fully inside the header with 8px breathing room top + bottom.
load_settings() already catches PermissionError, but load_features() caught only
FileNotFoundError/JSONDecodeError/ValueError. An existing-but-unreadable
data/features.json (e.g. root-owned after a deploy) therefore raised instead of
falling back to DEFAULT_FEATURES, taking down GET /api/auth/features and anything
that reads feature flags. Add PermissionError to the except tuple to match
load_settings().
Adds tests/test_load_features_permission_error.py.
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* ci: add security scanning suite and governance
Consolidates the security CI work into one reviewable change. Adds, as
separate workflow files under .github/workflows/:
- secret-scan.yml gitleaks (pinned + checksum-verified), full history
- workflow-security.yml actionlint + zizmor, audits the workflows themselves
- dependency-review.yml PR dependency gate + advisory pip-audit
- container-scan.yml hadolint (blocking) + Trivy image scan (advisory)
- codeql.yml CodeQL for Python and JS, main + weekly
Plus .github/dependabot.yml (pip/npm/actions/docker), .github/CODEOWNERS,
and docs/security-ci.md explaining each check and the one-time settings.
All additive: no existing files are modified. Actions are pinned to commit
SHAs, tokens default-deny (permissions: {}), advisory scans never block,
and SARIF upload is gated to push so fork PRs do not fail on a read-only
token. Composes with the correctness CI in #1015.
* ci(security): isolate Trivy from the Dockerfile lint gate
Address review on #1314 (points 2 and 3).
container-scan.yml now runs only hadolint (the blocking Dockerfile lint)
and keeps the broad pull_request + push:[main] trigger so the required
check always reports and never hangs a PR.
The advisory image scan moves to container-trivy.yml, split by event:
- pull_request / workflow_dispatch: build and scan under contents:read
only, no SARIF upload. The image build runs PR-supplied Dockerfile
instructions, so this path holds no write scope.
- push to main: build, scan, and upload SARIF with security-events:write.
Only this trusted path is granted write.
This stops PR jobs from requesting security-events:write they never use,
and a paths-ignore (matching docker-publish.yml) skips the image rebuild
on docs-only changes.
docs/security-ci.md: correct the trigger description to "every pull
request and every push to main", matching the workflows and the existing
ci.yml convention.
Verified locally: zizmor --offline --min-severity=low and actionlint are
clean on the changed and new workflow files.
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* fix: read allow_bash/allow_web_search from JSON body (#3229)
API callers using Content-Type: application/json had bash and web
tools silently disabled because allow_bash / allow_web_search were
only read from FormData (which is empty for JSON requests).
Changes:
- Fall back to JSON body for allow_bash and allow_web_search values
- Only add bash/web_search to disabled_tools when explicitly set to a
falsy value; when unset (None), defer to per-user privilege checks
- Admins with can_use_bash=True now get bash enabled by default
Fixes#3229
* fix: always send explicit allow_bash/allow_web_search from frontend
The backend 'is not None' guard (from prior commit) is correct for API
callers, but the frontend only sent allow_bash=true when the toggle was
ON — omission meant 'unspecified' which the backend treated as 'don't
disable'. Now the frontend always sends an explicit true/false value:
- allow_bash: sent on every request (checked ? 'true' : 'false')
- allow_web_search: explicit 'false' when toggle is off in agent mode
With explicit frontend values, the 'is not None' guard is safe:
- explicit true → tool enabled
- explicit false → tool disabled
- None (API caller omission) → defer to per-user privilege
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* fix: expand cookbook error output tail from 12 to 50 lines
When a task reaches status 'error', the status endpoint was returning
only the last 12 lines of the subprocess log. The existing context-menu
'Copy last 50 lines' action was therefore copying the same 12 lines,
making it useless for diagnosing failures that produce long stack traces
or build output.
- Set _tail_lines = 50 when status == 'error', keep 12 for running tasks
- Initialise exit_code = None before the status-classification block so
it is always defined in the result dict (was only set inside the
is_alive branch, potential NameError in the dead-session path)
- Include exit_code in the task-status response dict
- JS poller captures exit_code from live data into local task state
The frontend output panel and 'Copy last 50 lines' now show the actual
error context without any UI changes.
* refactor: extract output-tail logic to testable helper + behavioral tests
Addresses review feedback on #1538: the previous tests were source-level
string guards. Extract the tail-slicing into a dependency-free helper
(routes/cookbook_output.error_aware_output_tail) and replace the guards
with behavioral tests that exercise the actual logic:
- error status with a 200-line snapshot -> exactly the last 50 lines
- running/ready/completed/stopped/unknown -> last 12 lines
- short snapshot -> all lines, no padding
- empty snapshot -> empty string
- error tail is a strict superset (suffix-compatible) of the non-error tail
The helper has no FastAPI/SQLAlchemy imports so it unit-tests without
standing up the app.
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
Add a documentation-only test layout inventory for the first low-risk split of the flat tests directory.
Records the current 28-file area_cli set, including tests/test_research_cli_status.py, and documents validation/non-goals for the future mechanical move.
Closes#3712
Part of #2523
* feat(agent): workspace confinement via context-local binding + get_workspace tool
Bind the per-turn workspace once in execute_tool_block; the shared path
resolvers (_resolve_tool_path / _resolve_search_root) and the subprocess cwd
helper (agent_cwd) read it, so file tools + bash/python are confined centrally
and a new tool that uses the shared helpers cannot accidentally bypass it.
Adds the admin-gated /api/workspace/browse picker, a workspace pill + directory
modal (reusing existing modal/button CSS), the /workspace slash command, and a
get_workspace tool (replaces a system-prompt block). Confinement is OS-agnostic
(realpath/normcase/commonpath) and docker-safe (container paths, no host
assumptions). Reopens#2023.
* ux(workspace): clarify workspace is not a sandbox
Picker modal note + pill tooltip + get_workspace tool/output wording now state
plainly: read_file/write_file/edit_file/grep/glob/ls are confined to the folder,
but bash/python only start there (cwd) and are not sandboxed. Modal note reuses
the existing .muted class.
* fix(agent): treat an active workspace as file-work intent
A vague low-signal message (e.g. "look at the local project") matches no
domain keywords, so tool retrieval is skipped and only always-available tools
are offered — leaving the agent with no file access even though a workspace is
set. When a workspace is active, include the file/code tools (incl.
get_workspace) on low-signal turns so the agent can act on the folder.
Also requires the tool index (ChromaDB) to be reachable for normal retrieval;
that is an environment dependency, not part of this change.
* ux(workspace): hide pill + overflow entry in chat mode
Workspace only scopes the agent's file/shell tools, so the pill and the
overflow 'Workspace' entry are agent-only now — hidden in chat mode like the
bash toggle. Mode read from the DOM in syncWorkspaceIndicator; applyMode() is
called from the agent/chat setMode handler.
* prompt(tools): steer bash/python to defer to the dedicated file tools
bash/python schema descriptions (what native-tool-calling models read) were
bare and gave no steer, so models would do file ops via the shell (e.g. writing
SVG/HTML, which then dumps raw markup into the tool preview). Tell bash/python
in the schema + tool-index + prompt section to prefer read_file/write_file/
edit_file/grep/glob/ls and only be used for what those do not cover.
* prompt(tools): keep bash/python deferral generic (no hardcoded tool names)
Reference 'a dedicated tool' rather than listing read_file/write_file/grep/etc.
by name, so the guidance does not go stale if those tools are renamed.
* style(workspace): drop em-dashes from added code comments/strings
* ux(workspace): terser non-sandbox note in picker (no tool-name list)
* ux(workspace): mirror terse non-sandbox wording in pill tooltip
* chore: untrack local venv symlink (run-only, not part of the feature)
* prompt(workspace): keep get_workspace text generic (no hardcoded tool names)
* fix(agent): low-signal + workspace surfaces only read-only file tools
Intersect the files tool group with PLAN_MODE_READONLY_TOOLS so a vague message
in a workspace exposes read_file/grep/glob/ls/get_workspace for exploration, but
not write_file/edit_file/bash/python -- those wait for a request that actually
calls for them (RAG retrieval still adds them on a real ask).
* feat(workspace): cap browse listing at 500 dirs with a truncated hint
Mirror the filesystem_tools._CODENAV_MAX_HITS pattern with a module-local
_MAX_BROWSE_DIRS so a directory with thousands of children does not dump every
row into the picker; the response carries a truncated flag and the modal tells
the user to type a path to jump in.
* chore: untrack local venv symlink (run-only artifact)
* fix(workspace): vet the workspace root against the sensitive-path deny list at bind time
The in-workspace resolver deny-lists sensitive paths inside the workspace,
but the empty-path search root is the workspace itself, so a workspace of
~/.ssh could be listed via ls with no path. vet_workspace() (public, in
tool_execution next to the resolvers) rejects non-directories and sensitive
roots before the path is ever bound; chat_routes uses it instead of its
inline isdir check.
* fix(workspace): reject filesystem roots and stop showing rejected workspaces as active
Review findings from #3665:
P2: vet_workspace accepted / (and would accept drive/UNC roots), which makes
every absolute path 'inside' the workspace and collapses confinement into
host-wide file access. A root is its own dirname, so reject when
dirname(resolved) == resolved; the browse response now carries a selectable
flag and the picker disables 'Use this folder' on unselectable dirs.
P3: /workspace set stored any string client-side and the chat route silently
dropped rejected values, so the pill could claim a confinement that was not
in effect. New admin-gated /api/workspace/vet validates manual paths before
they persist (canonical path returned), and when a posted workspace is
rejected at send time the stream emits workspace_rejected so the client
clears the stored value and toasts instead of continuing silently.
* fix(workspace): check caller privilege before vetting the posted workspace
Review finding: /api/chat_stream called vet_workspace() on the posted value
for every caller and emitted workspace_rejected on failure, so a non-admin
who can chat but cannot use file/shell tools could distinguish existing
directories from missing/file/sensitive/root paths by whether the event
appeared. The resolution now lives in _resolve_request_workspace, which
drops the submitted value uniformly for non-admin callers, with no vetting
and no event, before the path ever touches the filesystem. Admin and
single-user behavior is unchanged. Test pins that valid and invalid paths
are indistinguishable for a non-admin and that vet_workspace is never
invoked for them.
Replace hardcoded [:2000] and [:4000] slicing with the shared _truncate
helper from tool_utils, which uses MAX_OUTPUT_CHARS and adds an explicit
truncation indicator when content is cut.
Scoped down from the original PR: only agent/tool-output display
behavior, no integrations.py changes.
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers
_apply_local_cache_affinity adds session_id + cache_prompt for llama.cpp
KV-cache slot affinity (#2927), gated on _is_self_hosted_openai_compatible,
which treated any unknown OpenAI-compatible host as self-hosted. Strict
cloud providers added as custom endpoints (Mistral at api.mistral.ai)
reject unknown body fields, so every request failed with 422
extra_forbidden. Self-hosted now also requires the endpoint to resolve as
local via model_context.is_local_endpoint: loopback/private/tailscale
host, or endpoint kind explicitly configured as "local" (the escape hatch
for tunneled self-hosted servers). is_local_endpoint is promoted to a
public name since llm_core now shares it.
Fixes#3793
* test(llm): sweep cloud OpenAI-compatible hosts in affinity gating
Parametrized cases adapted from #3839 (credit: Shabablinchikow): deepseek,
x.ai, together, fireworks, and the Gemini OpenAI-compat endpoint must all
stay free of the llama.cpp extras, not just the Mistral host from #3793.
* fix(llm): narrow the Tailscale range to 100.64.0.0/10 in is_local_endpoint
Review finding on #3945: _PRIVATE_PREFIXES carried a bare "100." prefix,
treating all of 100.0.0.0/8 as local while Tailscale only uses the CGNAT
block 100.64.0.0/10. Public 100.x hosts (e.g. AWS ranges outside the
block) were classified local and still received the llama.cpp extras
this PR exists to keep away from strict providers. Match the narrowed
classification routes/model_routes.py already uses, with boundary tests
just below, inside, and just above the range.
_search_fts ran the FTS MATCH query, then looked up each hit's full row with its
own db.query(...).filter(id == message_id).first() inside a loop, so a search
returning N hits issued N extra SELECTs. Fetch all hit rows in a single IN(...)
query via _fetch_messages_by_id and reassemble results in hit (relevance) order.
Adds tests/test_session_search_batch_fetch.py asserting a single batched query
(and no query for empty input). Existing session-search tests stay green.
raw.githubusercontent.com serves Markdown as text/plain, JSON APIs and raw
config files serve application/json, and a lot of code and tool documentation
lives in .md/.txt. fetch_webpage_content only handled PDF and HTML, so a
non-HTML body produced empty content and web_fetch reported 'no readable text
content'. Add a branch that returns the body verbatim for non-HTML text/*,
JSON (application/json and +json), and a .md/.txt/.text/.json URL-suffix
fallback for mislabeled octet-stream. HTML and PDF handling unchanged.
Fixes#3808
* fix: use correct element IDs for privilege-gated button hiding
The privilege-gated button hiding in initializeEventListeners() used
stale element IDs that no longer exist in the DOM:
- 'tool-bash-btn' -> 'bash-toggle-btn' (the actual shell button ID)
- 'tool-image-btn' -> 'set-imgEnabledToggle' (admin settings toggle,
since no standalone image button exists in the composer)
Without this fix, users without can_use_bash / can_generate_images
privileges still see buttons that appear to work but then fail.
* fix: remove incorrect image generation toggle targeting
The set-imgEnabledToggle is the global admin Image Generation master
switch, not a per-user composer control. Non-admins without
can_generate_images never render that toggle, so the lookup is null
and the branch no-ops. Admins without the privilege get the app-wide
toggle force-unchecked based on personal privilege, which is confusing.
There is no composer image button in the DOM, so nothing to hide here.
Drop the can_generate_images block entirely as vdmkenny requested.
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* fix(email): keep FETCH attributes Gmail sends after the header literal
imaplib returns a UID FETCH response as an interleaved list of
(meta, literal) tuples plus bare bytes elements. Which attributes land
where is server-specific: Dovecot sends FLAGS before the RFC822.HEADER
literal (inside the tuple meta), Gmail sends them after it, as a bare
` FLAGS (\Seen))` element. The email list grouping loop and the search
loop only inspected tuples, so on Gmail every message lost its FLAGS and
the whole mailbox rendered as unread/unflagged, with mark-read appearing
to have no effect.
Extract the grouping into _group_uid_fetch_records(), fold bare bytes
parts into the current message meta there, and reuse it in both the
batched list fetch and the per-UID search fetch. Covered by unit tests
with captured Gmail-shaped and Dovecot-shaped responses.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test(email): use raw byte literals for IMAP backslash escapes
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Replaced the grid layout (which made From row height depend on
cluster height, causing To/Cc to shoot up or down at the wrap
breakpoint) with a plain block stack:
- meta = position:relative block
- From row + details = natural block flow with padding-right
reserving space for the absolute cluster on the right
- cluster = position:absolute top-right, width changes per
container query (308px wide / 158px narrow / 180px overlay)
- padding-right tightens from 320px → 170px → 0 as the cluster
shrinks and finally goes overlay
- details margin-top dropped from -10px to 0 since there's no
grid row gap to compensate for
To/Cc now hugs From with no jumps when the cluster wraps or
overlays.
fire() and fire_and_forget() scheduled delivery with bare create_task()/
loop.create_task() and kept no reference. asyncio holds only a weak reference to
a task, so the GC could collect a delivery (or the fire() coroutine itself)
before it completed, silently dropping the webhook.
Track in-flight tasks in a set on the manager via a _spawn_tracked() helper that
holds a strong reference for the task's lifetime and discards it on completion
(add_done_callback), and route both schedule sites through it.
Adds tests/test_webhook_task_refs.py.
Starlette 1.2.0 prefers httpx2 in the test client and emits a
StarletteDeprecationWarning on TestClient import when only classic httpx
is installed. Adding httpx2 silences the suite-wide warning; runtime code
keeps importing httpx directly and is unaffected.
Fixes#3942
Removed the medium-mode -12px details margin compensation — it
under/over-shot depending on grid row sizing. Replaced with a
:has() rule: when the user expands To/Cc, the From row gets
min-height 92px (matching the cluster's 2-row max height). Row 1
becomes the same size whether the cluster is 1 row (wide) or 2
rows (narrow), so resizing across the 600px wrap breakpoint no
longer makes To/Cc shoot up 4px.
When the cluster snaps to absolute overlay at <380px, it stops
contributing to grid row sizing — row 1 was collapsing to the From
row's natural height, which made the To/Cc details slide upward and
left the floating cluster visually misaligned against them. Setting
min-height:88px on the From row inside the same container query
holds row 1 at the cluster's two-row height so nothing jumps.
PATCH and DELETE /api/tokens/{id} both called require_admin but never
checked that the token belonged to the requesting admin. Any admin could
rename, re-scope, or delete another admin's token by ID.
create_token already stamps owner on every token — update and delete
just never read it. Fixed by comparing token.owner against
get_current_user(request) after the 404 guard, same pattern the rest of
the auth routes use. Check is skipped when current_user is falsy
(AUTH_ENABLED=false / single-user mode).
Fixes#3898
Was fanning out to 3 rows because the 152px max-width (3 icons +
2 gaps exact) had no slack — subpixel rounding could push the
third icon over and trigger another wrap. Bumped to 158px in the
in-grid mode (600px breakpoint) and 180px in the absolute-overlay
mode (380px breakpoint, where the 22px padding-left from the
gradient fade was also eating into the 3-icon row width).
Was wrapping into 4+ rows at narrow widths because the cluster's
grid column could shrink below the 3-icon cap. Set both min-width
and max-width to the 3-icon row width and justify-self:end on the
cluster so the icons stay glued to the right edge instead of
sliding toward the middle when the cluster is wider than its
content.
Anthropic removed the sampling parameters (temperature, top_p, top_k)
starting with Claude Opus 4.7 — sending temperature at all, even 0.0,
returns HTTP 400. _build_anthropic_payload sent it unconditionally, so
every native-Anthropic request to Opus 4.7/4.8 failed: the research probe
(ResearchHandler._probe_endpoint, temperature=0) aborted runs before they
started, and all DeepResearcher._llm calls 400'd.
Add _anthropic_rejects_temperature (version-gates opus-N-M >= (4,7)) and
omit temperature in the Anthropic builder for those models. Older Claude
models (Opus 4.6 and below, Sonnet/Haiku) keep temperature and the
existing [0,1] clamp.
The version gate is hardened against real-world model id shapes:
- a word-boundary anchor so a substring like `octopus-4-8` is not read
as Opus and stripped of temperature;
- a 1-2 digit minor cap so a dated id such as `claude-opus-4-20250514`
(Opus 4.0, listed in ANTHROPIC_MODELS) parses as major-only and keeps
temperature, while dated 4.7+ snapshots still match;
- a non-string guard so a non-string model can't raise AttributeError
(the previous builder never called .lower() on it).
Adds regression tests covering 4.7/4.8 omission, older/dated/legacy
retention, the substring overmatch, and non-string inputs.
Fixes#3065
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 600px / 380px breakpoints were @container docpane queries but
the email reader isn't inside a docpane container — they never
fired and the cluster wrapped to 3+ rows at narrow widths. Added
container-type:inline-size + container-name:emailreader on
.email-reader-header and switched the queries to that container,
so the 2-row cap now actually applies.
Three-step shrink:
1. > 600px pane: cluster sits in col 2 as 1 row of 6
2. 380-600px pane: cluster capped at 3-icon width so wrapping
stops at 3 + 3 (max 2 rows) — chips share width with the 2-row
cluster instead of multiplying into 3+ rows
3. < 380px pane: cluster snaps to absolute overlay with left-edge
box-shadow, still capped at 3-icon width so it's the same 2-row
shape but floating above the truncated chips
Grid tracks now:
- col 1: minmax(60px, 250px) — chip natural width capped at 250px,
with the 60px (4 char) floor enforced on From / To / Cc alike
- col 2: minmax(48px, 1fr) — takes the rest, shrinks first when
the pane narrows
Removed the hard max-width on the action cluster so on wide panes
it stays as one row of 6. Once col 2 shrinks below the 1-row width,
flex-wrap kicks in and the icons re-stack to 3+3. Chips only start
to shrink past that point.
- Action cluster's max-width is calc(48*3 + 4*2) so the 6 icons
always lay out as 3 top / 3 bottom by default.
- When the pane narrows the chips in col 1 shrink first (with 60px
min so 4 chars + ellipsis stay visible).
- At <380px the cluster snaps to absolute overlay with a left-edge
box-shadow so it reads as floating above the truncated chip.
Two-step shrink behavior:
1. As the pane narrows, the action cluster (max-width:50% of meta)
wraps to a 2-row icon stack first
2. Then the recipient chip span starts overflow-scrolling, but
keeps a 60px min-width (~4 chars) so the first chars of the
sender/recipient name stay visible
Previously only the From row affected the action cluster's column
width — To/Cc detail rows spanned both columns and ignored the
cluster. Now:
- meta-details lives in col 1 only so the To/Cc chips shrink
together with the From chip when the pane narrows
- action cluster spans rows 1 and 2 so its width is set by the
widest col-1 content; a long To/Cc list triggers the wrap to a
2-row icon stack just like a long From sender does
Meta switched to CSS grid in undocked mode:
- row 1, col 1: From row (label + chip + chevron)
- row 1, col 2: action cluster
- row 2, span: To/Cc details
The cluster shrinks alongside the chip and flex-wraps into a 2-row
icon stack before crowding the chip. At very narrow pane widths
(< 380px via @container docpane) it snaps back to absolute overlay
so From: still fits.
Docked mode overrides meta back to flex column so the cluster
flows naturally last — under From, and under To/Cc when expanded.
Was rendering as a transparent ghost — From chip / sender text bled
through the gaps between icons. Added a left-fading gradient
backed by var(--bg) so the cluster reads as an opaque overlay
while chips poking out from underneath blend smoothly into its
left edge.
- Window-level recipient-chip click handler now bails if the chip
is inside .email-reader-meta — the per-reader handler still
toggles the expanded-address view on click.
- The from-sender (magnifying glass) search button SVG is now
tinted with var(--accent-primary) so it stands out as a deliberate
search action against the neutral Reply / Forward / etc icons.
Moved the action cluster out of the From row to a sibling of meta
inside .email-reader-meta. Undocked: cluster is absolute-positioned
top-right of the header so it overlays the From line as before.
Docked: cluster is in-flow at the bottom of the meta column, so it
sits below the From row when collapsed and below the To/Cc rows
when the user expands the recipient details via the chevron.
The previous commit read toggleState.mode before it was declared
(send-time site near line 632) and outside its closure (assistant
finalize site near line 3426). Both threw ReferenceError / TDZ on
first send, which crashed the chat send + render pipeline.
Read fresh via Storage.loadToggleState() at each site, defaulting to
'chat' on any error. Mode-tag rendering otherwise unchanged.
Cluster is now in-flow with margin-left:auto and flex-wrap:wrap so
when the chip text grows long enough to crowd it, the buttons split
to a second row of icons before they have to cover the chip. The
absolute-overlay behavior kicks back in at very narrow pane widths
(<380px via @container docpane) so From: still fits on one row when
the pane is truly cramped.
Pulled the From row's negative margin from -8 to -4 so the whole
row (From: label AND chip) sits 4px lower together. Action cluster
below now justifies flex-end so the icons sit at the right edge
of the row instead of left-aligned.
Find-GitBash accepted the Microsoft Store / WSL bash.exe alias and only probed <root>\Git, so it never detected per-user Git for Windows installs under %LocalAppData%\Programs\Git and could skip the launcher's "install Git Bash" note even when no usable Git Bash was present.
Reject the WSL stub (system32/sysnative/windowsapps) and also probe %LocalAppData%\Programs\Git, mirroring core/platform_compat.find_bash.
Refs #3740
Sometimes the user lands in chat mode without realizing — surface the
mode the message went out on as a small uppercase pill right after the
timestamp in the role header.
- roleTimestamp(when, mode) gains an optional mode arg. Agent renders
in accent; Chat renders in muted/neutral. Other values render
nothing (back-compat for older history without the field).
- The three roleTimestamp call sites pass metadata?.mode through.
- chat.js writes mode into the user-message metadata at send time and
into the assistant metadata when the active-stream render lands,
reading toggleState.mode so research/agent overrides upstream still
flow through correctly.
Historical messages from before this change just don't show the pill —
graceful fallback, no migration needed.
find_bash() rejected the WindowsApps WSL stub and then probed only %LocalAppData%\Git, so per-user Git for Windows installs (winget / Inno Setup {userpf}) under %LocalAppData%\Programs\Git were never found and the Cookbook reported "needs Git Bash" despite Git being installed.
Add the Programs\Git subfolder to the LocalAppData fallback root.
When the modal is docked there's no room to overlay the actions on
the From line. Now:
- From row gets flex-wrap so the action cluster drops to its own
row below the From label + chevron
- Action cluster goes position:static, flex-basis:100%, no gradient
fade, no padding-left, left-aligned
- Whole From row pulled up 8px to claim back vertical space
- Header min-height drops back to 0 since buttons no longer
overlap
Also bumped the gap from From to To/Cc details by 2px (-8 → -6).
opacity 0.55 → 0.45 and explicit color:var(--fg), matching the
.cal-search-icon treatment so the email chip-bar magnifier reads at
the same muted intensity as the calendar search field.
Bumped header min-height to 60px and padding-top to 8px so the
44px-tall action buttons (absolutely positioned inside the From
row) have room without overflowing the header. From row gets
min-height:44px on desktop so the buttons fit cleanly inside it.
Dropped the now-redundant negative margin nudges on the From row
and the strong label.
Search input gets position:relative;top:-1px so the placeholder text
sits 1px higher inside the chip bar.
AI reply choice popover: drop the '...' kebab and the 'Draft with
note' textarea row entirely. Replace the concentric-circle Full icon
with our standard accent dot (filled 6px circle in viewBox 24).
Pinned .email-reader-actions-inline to absolute top:0 right:0 of the
From row with a gradient fade. When the window narrows the cluster
stays on the From line and the recipient-chips span scrolls under
it, so users can swipe/drag to reveal recipients tucked behind the
buttons instead of seeing From: jump above the action row.
Main button: open cached AI draft if one exists, otherwise generate
a fast draft inline. No more intermediate Fast/Full/Note menu.
Caret on the side opens a focused popover with just a textarea +
Generate button — the user types instructions (e.g. 'thank them and
confirm Tuesday at 2', 'decline politely') and submitting fires the
full-mode generation with those instructions as the noteHint.
- _aiReplySplitButtonHtml(data) centralizes the new HTML so all three
reader render sites use the same markup.
- _showAiReplyChoice rewritten — drops the Fast/Full toggle row plus
the kebab + 'Draft with note' two-step. Ctrl/Cmd+Enter submits.
- _handleAiReplyButton routes based on which inner button was clicked
(caret → popover, main → run-or-open).
- The three reader event registrations now listen on .ai-reply-split
so both inner buttons feed the same handler.
- Desktop (>= 769px): From row gets margin-top -4px so the whole
From + action cluster sits 4px higher in the header.
- Mobile @media block untouched.
- To/Cc gap bumped 4px → 6px for slight breathing room.
Strong labels reserve min-width:36px so the chips after each label
start at the same x — From, To, Cc all line up. Killed the
docked/docpane grid-stack overrides that were splitting label and
chips onto separate rows, since chips already scroll horizontally
inside each row when there are too many.
- Docked: From row + action cluster nudged up 4px
- Chevron pulled 4px left so it sits tight to the From chip
- To/Cc detail block pulled up 8px to hug the From row
- 4px gap between To and Cc rows (was 2px)
Adding a new endpoint only auto-set the global default chat endpoint when
none was configured (`if not settings.get("default_endpoint_id")`). When the
existing default pointed at an endpoint the user had since disabled, it was
never reassigned, so features that read the raw `default_endpoint_id` setting
(notably Memory → Tidy) failed with "No default model configured — set one in
Settings" even though an enabled endpoint existed.
Reassign the default when the configured endpoint is missing/disabled, via a
new pure `_default_endpoint_needs_assignment` helper. Adds unit coverage for
the helper plus route-level regression tests for the disabled/enabled cases.
Fixes#3586
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Found the culprit — the docked-modal CSS forced .email-reader-meta-row
into a single-column grid, which collapsed the From row into a
vertical stack and pushed the action buttons below it.
Fix:
- Merged the primary + secondary action rows into one flat
.email-reader-actions-inline cluster inside the From row
- Made the cluster flex-wrap so it stays inline when undocked and
wraps below the chip when truly cramped (docked, narrow tab)
- Excluded .email-reader-meta-from from the docked-modal and
narrow-docpane grid-stack rules — those overrides now only
apply to the To/Cc detail rows
Absolutely-positioned 13px search SVG at the left edge of the chip bar
(same circle+line glyph used elsewhere). Bar padding-left bumped 8→26
to leave room. pointer-events:none on the icon so clicks still land
on the input, opacity 0.55 to match other muted prefix icons.
Star → bookmark banner SVG also in the card title row (em.is_flagged
glyph) and the inbox toolbar's _starIcon / _starFilledIcon, so every
favorites affordance matches the chats sidebar bookmark.
Search dropdown gains a third suggestion kind:
- kind: 'email' rows surface emails from the snapshot whose subject or
sender name match the typed term (top 4, scored by startsWith vs
substring). Render row carries a small envelope glyph + bolded
subject + 'from name' on the right.
- Picking one closes the dropdown and expands that exact card via
_toggleCardPreview, scrolling it into view.
Restructured the DOM so the Reply / Reply-all / Forward row lives
INSIDE the email-reader-meta-from div (after the chips span), and
the Summary / AI / More row sits directly below as a sibling of
From inside the meta. Killed the outer email-reader-actions
wrapper that kept letting the buttons drift out of position.
CSS now pushes the primary row right via margin-left:auto on the
From row and right-aligns the secondary row below it.
Reorganized the action cluster into two visible rows so each fits
the available width:
- Top row (on the From line): Reply / Reply-all / Forward
- Bottom row (under it): Summary / AI reply / More
Action cluster goes back to flex-direction:column, the row
wrappers are flex rows again (no more display:contents flatten).
After picking a filter from the dropdown the pill was 'icon + Unread'.
Drop the text — the icon is the affordance — so the pill collapses to
just the glyph + ×. Hover surfaces the friendly label via the title
attribute. Contact + text pills still carry their text label.
align-items: flex-start on the header keeps the action cluster
locked to the From line when the user expands the To/Cc details
— previously it drifted to vertical center as the meta grew taller.
Typing a filter keyword now surfaces the matching filter row in the
autocomplete (each with its existing dropdown icon). Picking one pins
a filter pill and drives the global filter state.
Keyword catalog (_LIB_FILTER_OPTIONS):
- has-attachments ← 'attachment', 'attachments', 'has attachment', 'attach'
- unread ← 'unread', 'new', 'unseen'
- favorites ← 'favorite', 'starred', 'star', 'flagged'
- undone ← 'undone', 'pending', 'todo'
- reminders ← 'reminder', 'reminders'
- unanswered ← 'unanswered', 'unreplied', 'no reply'
- pending_30d ← 'pending 30d', 'pending', 'recent pending'
- stale_30d ← 'stale', 'old', 'stale 30d'
- tag:urgent ← 'urgent', 'critical'
- tag:reply-soon ← 'reply soon', 'reply', 'follow up'
- tag:spam ← 'spam', 'junk'
- tag:newsletter ← 'newsletter', 'newsletters', 'subscriptions'
- tag:marketing ← 'marketing', 'promo', 'promotional'
Filter pill behaviour:
- Only one filter pill is active at a time — adding a new one replaces
any existing filter pill.
- _applyFilterPillSideEffect drives the existing #email-lib-filter
select (or the #email-attach-btn toggle for has-attachments). The
server-side list refetch follows for free via the existing 'change'
handler.
- Removing the filter pill clears the side effect.
Pill render gains the filter icon as a leading glyph; the suggestion
row renders icon + label in the accent colour so it visually reads as
a filter, not a contact.
After the toolbar reshuffle the action block is now two stacked rows
(Summary/More above Reply/Forward/AI), making it taller than the meta
block. The mobile header rule was align-items:center, which then pulled
the From:/To: rows down into the vertical middle of the header — the
'From: is in the middle' symptom. Switch to flex-start so meta sticks
to the top edge where the user expects it.
With the meta collapsed to a single visible From row + chevron,
there is room to put the action cluster on that same row as a
right-aligned sibling. Dropped the absolute positioning and
gradient-fade overlap — actions now flex-end via margin-left:auto
so From sits on the left and Reply / Reply-all / Forward / AI /
Summary / More all sit on the right of the same row.
Also moved the chevron inside the recipient-chips span so it sits
adjacent to the sender chip instead of wrapping onto a second line.
1. Multiple pills now AND together — 'alice + bob' means both alice
AND bob are somewhere on the email, not 'from alice OR from bob'.
(some → every in the filter.)
2. Default autocomplete focus is now -1 (no row pre-selected) so plain
Enter commits the input as a text pill — typing then Enter behaves
like a normal search. ArrowDown / ArrowUp + Enter still picks a
contact suggestion. Tab still autocompletes the most-relevant match
regardless of arrow state.
3. Pill × button nudged up 4px so it sits on the visual centerline
inside the 18px pill height.
Only the From row shows by default. When the email has To and/or
Cc recipients, a small chevron sits next to the From chip — click
it to inline-expand the To/Cc rows below (rotates 180deg open).
Trims the header to a single visible row in the common case,
leaving the action cluster plenty of vertical headroom to stay
on a single row.
Reverted the single-row meta strip — misread the user's ask. Each
meta field gets its own row (From / To / Cc stacked), label sits
tight to the chips on the same line, recipient chips inside the
row still scroll horizontally so long lists slide under the
floating action cluster.
Three stacked meta rows wasted vertical space — From, To, Cc now
share one horizontal strip with each label tight to its chips. The
strip itself scrolls horizontally so the action cluster (still
floating top-right) can cover the right edge and the user can drag
to reveal recipients hidden underneath.
This also gives the actions a single shared row, since the meta
no longer dictates a multi-row header height.
Four fixes from the first round of usage:
1. Pill height was larger than the chip-bar's row — shrink to a fixed
18px-tall pill (line-height + height pinned) so it sits inside
the input row.
2. List refresh wiped pill state — when _loadEmails replaces
state._libEmails (refresh, folder switch, etc.), refresh the
snapshot to the new list and re-apply the pill filter so pills
persist instead of resetting to 'show all emails'.
3. Click-to-add only worked inside the open email reader. Extend the
capture-phase handler to ALSO catch clicks on .email-meta-sender
inside the library grid — the list card's sender name is the most
natural place to want to pivot from.
4. Esc inside the chip-input didn't close the modal. New behaviour:
if the autocomplete dropdown is open, Esc closes only the dropdown
(and swallows the event); otherwise Esc blurs the input and bubbles
so the existing modal Esc handler can close the library.
Also wires data-email + data-name on .email-meta-sender so the click
handler has reliable targeting.
The primary/secondary row wrappers were still creating nested flex
containers — even with parent flex-direction:row the two row divs
sized to content and could stack visually. Switching the wrappers
to display:contents collapses them entirely so all 6 buttons
become direct flex children of .email-reader-actions and lay out
on a single row guaranteed.
Replace the single text-input + IMAP search round-trip with a deterministic
local chip-bar filter modelled on the gallery's tag pills.
What lives in the bar
- Each filter is a pill: { type: 'contact', name, email } or
{ type: 'text', text }.
- Click anywhere in the bar lands the cursor in the input field.
- Typing populates a dropdown of matching contacts + recently-seen senders
(cached per modal open via _buildSuggestionSource).
- Tab / Enter on a highlighted suggestion → adds a contact pill.
- Enter on free text with no suggestion match → adds a text pill.
- Backspace on empty input → pops the last pill.
- × on a pill removes that one.
- Arrow keys navigate the suggestion list.
Filtering
- _applyPillFilter snapshots the loaded list once, then for every render
shows emails where ANY pill matches:
contact pill — from_address equals OR to/cc contains the pill's email
text pill — broad substring match across subject/from/snippet
Click-to-add
- Capture-phase click handler on .recipient-chip inside the email reader
drops the person into the library as a contact pill (and reopens the
library window if it was closed/minimized).
Removed the debounced /api/email/search IMAP call and its 'Loading emails'
side effect. The dropped server search was the source of the 'type
jonathan, get stuck on Loading' bug.
Reply / Reply all / Forward / AI / Summary / More now flow inline
on one row instead of being split into a primary (Summary+More) and
secondary (Reply group) stack. Mobile + docked overrides also
flipped from column to row.
From/To/Cc back on the left, action cluster (Reply / Reply-all /
Forward / AI / Summary / More) absolute-positioned top-right with a
gradient fade so chips that overflow slide cleanly underneath. The
recipient-chips lists no longer wrap — they scroll horizontally,
matching the account-chip strip pattern, so users can drag/swipe
to reveal recipients hidden under the action cluster.
Mobile (@media max-width:768px) gets the same row+absolute layout
instead of the previous column with actions on top. The narrow
container query (docpane max-width:460px) still falls back to
in-flow column so it doesn't overlap on very narrow panes.
Was: from/to/cc/date meta on the left, action cluster (Reply / Reply
all / Forward / AI reply / Summary / More) pinned to the right of
the header. Now: actions stretch across the top in their two existing
sub-rows, the from/to/cc meta sits below.
Pure CSS — no template restructure. The .email-reader-header flexbox
flips to flex-direction:column, .email-reader-actions gets order:-1
to render first, and the existing flex-end aligned action-row rules
swap to flex-start so buttons read left-to-right across the top
toolbar. Mobile media query overrides bend the same way so the
layout is consistent across breakpoints.
The Fast/Full popover now has a kebab (three-dot) button alongside the
two preset choices. Clicking it expands a textarea below with a
'Draft with note' send button. The textarea is for the user to tell
the AI how to reply ('confirm Tuesday at 2', 'decline politely', 'say
we'll need an extra week') instead of accepting a generic draft.
Plumbing:
- emailLibrary.js: kebab button + note panel inside .email-ai-reply-choice
menu. Submitting calls _runAiReplyFromButton with mode='ai-reply-full'
and a noteHint string.
- _runAiReplyFromButton signature gains noteHint; passes it through
state._onEmailClick as opts.noteHint.
- emailInbox.js consumer: forwards opts.noteHint into _openEmail's new
5th arg, which puts it in the /api/email/ai-reply POST body as
user_hint.
- routes/email_routes.py /ai-reply: reads user_hint, appends a
'User's instructions for THIS reply' section to the user message
(priority over default tone/length). Also skips the per-message
AI-reply cache when a hint is set — the cached generic draft would
silently override the instructions otherwise.
Restructure the action cluster so it stays as two visible rows inside
.email-reader-actions instead of flattening via display:contents:
- Top row: Summary, More
- Bottom row: Reply, Reply all (conditional), Forward, AI reply
Dropped the Search button — wasn't part of the requested layout.
CSS: .email-reader-actions becomes flex column with both rows
right-aligned; .email-reader-actions-row becomes a real flex row
(no more display:contents flattening) so each row stays on its own
line. Whole block continues to sit beside the From/To meta inside
.email-reader-header.
After dropping the 'Default' chip, _loadAccounts started setting
state._libAccountId asynchronously while _loadEmails fired in parallel
with the still-null id. The list request was going out with no
account_id (so the server defaulted) while subsequent per-email reads
used the explicit id set after _loadAccounts resolved — back to the
same desync the chip-removal was meant to fix.
Sequence them: await _loadAccounts first, then kick off the folders /
reminders / emails fetches. The list always carries the right
account_id from the very first call.
Replying from an email opened in a new tab was dragging that window to
the left-sidebar dock — same treatment as the main email library, even
though the user had explicitly opted to pop it into its own floating
viewer. Annoying when the viewer is mid-screen and Reply yanks it.
Add an early bail in _snapEmailModalToLeftSidebar for modals whose id
starts with 'email-view-' (the 'open in new tab' reader). Compose still
opens; the floating viewer just stays where it is, on top of the
library. User can move/close it themselves.
Bug: clicking the dot to change the server-side default account while
viewing 'Default' left a desynced state — the email list still showed
the OLD default's cached UIDs, but the server's default now pointed
at a different account. Opening any email used the visible UID +
account_id='' on the read endpoint, which resolved against the NEW
default account → wrong email content (or older mail entirely).
Fix: remove the 'Default' chip. _loadAccounts now auto-selects the
is_default account (or the first one) into state._libAccountId so the
list view + every per-email request always carries an explicit
account_id and can't desync from set-default.
The dot button still lives on each account chip for changing which
account the server treats as the default — but it no longer affects
which account the list is currently displaying.
- .email-reader-actions flex-wrap nowrap → wrap so when the cluster
exceeds the room next to a tall multi-recipient meta block, the
buttons wrap within the actions area instead of pushing the whole
block onto its own row below From/To.
- New rule: .email-reader-more-wrap gets order:99 so the More kebab
sits at the far right of the flattened flex row instead of in the
middle (its source order put it ahead of the secondary row's AI
Reply / Summary buttons after display:contents flattening).
The All/Unread/Favorites/etc selector was a native <select>, which
can't render SVG inside <option>. Replace it with a custom picker
that:
- Keeps the existing <select id="email-lib-filter"> as the value
store (hidden via display:none). All existing 'change' listeners
keep working — the picker just dispatches a change event after
updating the select's value.
- Renders a styled button + drop-out menu built from the select's
options (preserves optgroup labels like 'Tags').
- Each option carries an SVG icon: lines for All, ringed dot for
Unread, star for Favorites, empty checkbox for Undone, bell for
Reminders, reply arrow for Unanswered/Reply-soon, clock for
Pending, calendar-x for Stale, exclamation-triangle for Urgent,
ban for Spam, newsletter and megaphone for the marketing tags.
- Icons use var(--accent) so they pick up the user's theme color.
- Click outside / Esc closes the menu (Esc handler is capture-phase
+ stopPropagation so it doesn't bubble to the modal-close listener
and shut the whole email window).
CSS scoped under .email-filter-picker.
More menu reorganization:
- Group 1: Open in new tab, Remind to reply
- Group 2 (state): Mark as Unread/Read, Mark as Done/Not Done, Move to
Archive, Save sender to contacts
- Group 3 (destructive, unchanged): Move to Spam, Move to Trash,
Delete Permanently
- Renames: Done→'Mark as Done', Archive→'Move to Archive', Mark
Read/Unread→'Mark as Read'/'Mark as Unread'.
- Mark Unread moves out of group 1 down into the state-change group
alongside Done; Save sender to contacts moves down into the same
state group.
Toolbar row reshuffle (applies to both the email-list card reader and
the email document view):
- Row 1 (primary): Reply, Reply all, Forward, Search, More — Forward
no longer has to fight Search/More for space in the secondary row.
- Row 2 (secondary): AI reply, Summary — gets its own dedicated row.
The 6px dot was easy to miss on touch / small-cursor setups. Replace
padding-only sizing with explicit width:18px;height:18px on the
button, dot centered inside via justify-content. Anchor moved from
right:9 → right:6 so the visible dot stays where it was; the extra
clickable area extends inward from the chip edge.
Found the reason yesterday's tool-retrieval drop wasn't taking effect:
in _build_agent_prompt, when relevant_tools was provided, it computed
tool_names = set(ALWAYS_AVAILABLE) | set(relevant_tools)
which silently re-added every tool get_tools_for_query had just
deliberately discarded. So when a 'save this for <person>' query
dropped manage_memory from the retrieved set, the prompt builder put
it right back, and the model saw both tools again.
Trust the relevant_tools set. get_tools_for_query already starts from
ALWAYS_AVAILABLE — any discard there is intentional and should
propagate. Only force-include ask_user and update_plan here as belt-
and-suspenders since the agent loop relies on those for its own
control flow.
Other callers (task_scheduler) already union ALWAYS_AVAILABLE or
ASSISTANT_ALWAYS_AVAILABLE into relevant_tools before passing it in,
so they're unaffected.
GET /api/contacts/config masks the saved password as '***' (or ''
when none). Mirror that into the password input's placeholder so users
can see at a glance that a password is on file — matching the email
account form's '(unchanged)' pattern.
Description-level steering wasn't enough — even with the explicit 'DO
NOT use for info about another person' in manage_memory's description,
models kept choosing memory over manage_contact. They can't if memory
isn't in the toolset.
New logic in ToolIndex.get_tools_for_query: detect three contact-save
patterns and discard manage_memory from the returned set (overriding
ALWAYS_AVAILABLE):
1. 'save [up to 3 words] for/to <name>' where <name> isn't a timing /
pronoun stopword (later, tomorrow, me, you, future, etc.). Catches
the canonical 'save this for X' and the wider 'save this address
for X', 'save it for X'.
2. 'to/in/into (my) contacts' or 'address book'. Catches both 'add X
to my contacts' and 'put this in my address book for X'.
3. Possessive: 'save (his/her/their) (address/phone/email/...)'.
Stronger signal — also force-adds manage_contact to the set in
case the keyword fallback missed it.
Verified: 8 positive contact patterns all drop memory, 10 false-
positive 'save X for later/tomorrow/me/the next thing' all keep it.
Even with manage_contact in the retrieved tool set, models were still
defaulting to manage_memory when the user pasted an address + 'save for
<person>'. Both tools were in front of the model and it picked memory.
Tighten both descriptions to steer at decision-time:
- agent_loop.py manage_memory description: clarify scope is facts
about the USER, with an explicit 'DO NOT use for info about another
person' + a 'use manage_contact instead' line.
- tool_index.py manage_memory description: same in shorter form, so the
embedded retrieval signal is consistent with the prompt-time
description.
The contacts manager in Settings was stuck at name+email inline only —
no address field, no phone input on add, no search to find anything in
a list of 100+ contacts.
UI:
- Add form gets phone and address inputs alongside name/email. The
email-required gate becomes name-OR-email so address/phone-only
entries are creatable.
- Edit form gets an address input, threaded into the PUT body.
- Search input above the list filters client-side by name / emails /
phones / address (debounced 80ms). Count badge shows N/M when a
filter is active.
Backend:
- /api/contacts/{uid} PUT now accepts address and routes it through
_update_contact (which already supports it after the previous
commit). Validation loosened: name OR email OR address.
- /api/contacts/add POST now accepts phone + address. Phone goes
through an immediate _update_contact since _create_contact's
signature only takes name+email+address.
8px ring read as a sliver next to the chip label. Bump to a 10x10 SVG
with stroke-width:3 for the hollow ring so it presents like the
sidebar notif dot at this size. Chip padding-right bumped 20→24 so
the larger glyph isn't crushed against the text.
The literal phrase 'add to contacts' missed when there was a name
between 'add' and 'to', e.g. 'add Pat to my contacts'. Anchor on the
tail with 'to my contacts', 'to contacts', 'to address book' so word
boundaries fire regardless of what sits in front.
Replace the star polygon with a small 8px circle dot — filled +
accent-tinted on the default account, hollow + muted on others.
Vertical position bumped up 2px via top: calc(50% - 2px) so it
visually centers against the chip's text baseline instead of
geometric center.
Closes the gap that pushed the agent into manage_memory when the user
pasted an address and said 'save this for X'. manage_contact now
accepts an optional address arg end-to-end:
- routes/contacts_routes.py:
- _normalize_contact carries an 'address' field
- _build_vcard emits ADR:;;<address>;;;; (street component of the
RFC-6350 7-part ADR), only when address is non-empty
- _parse_vcards reads ADR, joins non-empty components with ', '
- _create_contact and _update_contact thread address through;
update preserves existing address when caller passes empty
- src/tool_implementations.py do_manage_contact:
- add accepts address; require at least name+address or email
(was: email required) so address-only contacts are addable
- update accepts address; require name OR emails OR address
- src/tool_schemas.py: schema gets a single 'address' string field
- src/tool_index.py + src/agent_loop.py: descriptions get one
'address' arg mention and a 'use this for save-X-for-person /
address pastes / phone-with-name' steering line. Net: a few
bytes added, not a paragraph.
Also: removed a stray name from the schema's manage_contact example
strings ('save Jonathan's email…') — no real names in the codebase.
- The 'All (default)' chip showed only the default account, so the
label was misleading. Rename to just 'Default' to match behavior.
- Each user account chip gets a star button (filled if it IS the
default, hollow otherwise). Clicking calls the existing
POST /api/email/accounts/{id}/set-default and refreshes the strip.
Cross-account aggregation (a true 'All') is a separate bigger lift
that needs UID namespacing and merge/sort in _list_emails_sync;
flagged for follow-up rather than smuggled into this change.
When the user dumps a postal address or phone number alongside a
person's name and says 'save this for X', the vector retriever was
missing manage_contact because its description only mentioned the
literal word 'contact'. The model defaulted to manage_memory (which is
in ALWAYS_AVAILABLE), so the saved fact ended up as un-named memory
that wouldn't surface on a later 'what's X's address?' search.
- Rewrite manage_contact's index description to anchor on the
semantics: 'save info about another person', including postal/
mailing address, ZIP, phone, etc. Now it embeds close to address-
paste queries.
- Extend the keyword intent-map with 'save this for', 'save it for',
'mailing address', 'postal code', 'their address', etc. — common
ways users say 'this belongs to a contact' without the literal word
'contact'.
Closes the auto-send hole that let earlier models invent signatures
(e.g. signing 'David' for a user named Felix) and SMTP them to real
recipients before the user could review.
New setting: agent_email_confirm (default True).
When on, the MCP send_email and reply_to_email tools no longer SMTP
directly — they write the composed email to scheduled_emails with a new
status 'agent_draft' (far-future send_at so the scheduled-send poller
ignores them) and return a {pending: true, pending_id, to, subject,
body, message: ...} payload. The model surfaces that to the user.
Backend endpoints to approve / cancel:
- GET /api/email/pending → list staged drafts for the owner
- POST /api/email/pending/{id}/approve → flip status to 'pending' +
backdate send_at so the
existing scheduled-send
poller delivers immediately
- DELETE /api/email/pending/{id} → status = 'cancelled'
UI:
- Settings / AI Defaults gets a new 'Email Safety' card with the
toggle, default on.
- Tool descriptions for send_email and reply_to_email now include the
pending behavior + an explicit 'DO NOT invent a signature, do not
type a person's name' guardrail.
Pass 2 (next): inline chat card with Send / Discard buttons so the user
doesn't have to type a confirmation reply. Today's prompt + the listing
endpoint give the model a clean path to surface drafts.
Old rule fixed the loading wrap at min-height:180px so the spinner
landed near the top of the email-list section. Switch to
position:absolute inset:0 over the grid (with #email-lib-grid set to
position:relative) so the whirlpool + 'Loading emails' label center
within the entire visible email area regardless of section height.
- 'Marking done' / 'Marking read' / 'Marking unread' label was 2px low
vs. the whirlpool spinner inside the Actions button. The existing
loading-label CSS only scoped to #email-lib-bulk-delete; extend it
to also cover #email-lib-bulk-actions and bump top from 0 to -2px.
- 'All' checkbox label was inline-styled top:2px so the box + text sat
lower than the surrounding bulk-action items. Reset to top:0 to
match memory + skills select-all rows.
Two pain points:
- IMAP server search is genuinely slow.
- The grid blanked to a whirlpool on every keystroke, so even fast
searches felt dead because you couldn't see your own results.
Fix:
- _localSearchFilter runs synchronously on every keystroke, filtering
the pre-search snapshot by subject / from-name / from-address /
snippet so the grid responds immediately. Snapshot is taken on the
first non-empty keystroke and restored when the input is cleared.
- _doSearch no longer renders the loading-whirlpool spinner into the
grid. The local filter already shows useful results; surface
'Searching…' in the stats badge to indicate the server search is in
flight.
- When server results land, they replace the grid; if the user has
already typed past them, the seq guard skips the stale render.
* fix(hwfit): tolerate non-numeric gpu_count in /api/hwfit/models
The route did `n = int(gpu_count)` with no guard, so a non-numeric query param
like `?gpu_count=abc` raised ValueError and returned HTTP 500. Parse it
defensively (mirroring the gpu_group guard a few lines above): a malformed value
is ignored, exactly like omitting the param, and valid values still apply.
Adds tests/test_hwfit_gpu_count_nonnumeric.py: a non-numeric gpu_count returns a
ranking instead of raising, and a numeric value is still accepted.
* test(hwfit): cover non-numeric manual_gpu_count too
Follow-up to the gpu_count guard: add a regression test for the sibling
manual_gpu_count query param (the hardware simulator in _apply_manual_hardware),
which dev already guards by defaulting to 1 on a non-numeric value. This pins
that behaviour so the endpoint's count parsing is fully covered and cannot
regress to a 500.
Before: only delete showed a spinner/disabled buttons. Picking Done on
92 selected emails fired off 184 sequential HTTP calls (mark-answered
+ mark-read) with zero UI feedback, so it looked like the click did
nothing for the ~20-30 seconds it took to grind through.
- All five bulk actions (delete / archive / done / read / unread) now
swap the target button into a whirlpool+verb-ing state, dim siblings,
and show 'N/M…' progress in the count label that ticks as each
request resolves.
- Per-uid work runs in parallel with a hard cap of 6 in flight, so a
90-email Done finishes in ~3 server round-trips of latency instead
of 90, but we still don't open 90 simultaneous IMAP-backed connections.
Group 1 — per-email view actions:
Open in new tab → Mark Unread/Read → Remind to reply
Group 2 — non-destructive state changes:
Save sender to contacts → Done/Not Done → Archive
Group 3 — destructive (own divider):
Move to Spam → Move to Trash → Delete Permanently
Adds support for { separator: true } items in the actions array,
rendered as .dropdown-divider rows.
Repro: filter Undone → Select All → uncheck a few → Actions → Done →
nothing visible happens. Reason: the bulk-Done branch only flipped
em.is_answered on the in-memory entries; the cards stayed in
state._libEmails so they kept rendering, but now with the done check
ticked. From the user's POV — still 'undone' filter, cards still
there — it looked like the action was a no-op.
When the filter is 'undone' specifically, treat marking done as a
view-removal (same animate-then-prune step archive/delete uses).
When clicking an email higher up in the list, its top edge can be hiding
behind the modal header or off-screen. After applying the
.email-card-expanded class + the new minHeight, scrollIntoView(block:start)
on the next animation frame so the user sees the whole card.
The expanded email card painted a kebab menu in its title row because
the per-card .memory-item-actions menu at the bottom was hidden while
expanded. Both pointed at _showCardMenu(em). Remove the duplicate:
- Drop the email-card-header-menu button (and its rightCluster
wrapper) — title row now just holds the nav arrows.
- Remove the CSS rule that hid .memory-item-actions on
.email-card-expanded so the bottom kebab stays visible.
- Unread-dot insert point retargets to .email-card-nav-arrows now
that the rightCluster is gone.
state._selectedUids holds whatever the server returns for em.uid (string
or number); the bulk action looped Array.from(...) and did strict ===
against state._libEmails entries. When the types disagreed, the find()
returned undefined, the in-memory is_answered flip never happened, and
the post-loop _renderGrid() painted the cards back into their original
not-done state — looking like 'mark done' did nothing even though the
server-side call had succeeded.
- Compare via String() on both sides so the in-memory state actually
flips.
- Surface HTTP failure from mark-answered/mark-read so the existing
failedReadSync toast can fire if the calls don't go through.
Wrap the height:28px / top:0 rule in @media (min-width:769px) so it
can't leak into mobile, where a different touch-friendly variant
already sets min-height:36px + top:-2px.
Base .memory-toolbar-btn is 24px tall at top:-4px. Bump the compose
button alone to 28px (4px taller) and top:-2px (moves down 2px) so
it reads as the primary action in the toolbar without affecting
Select/Refresh.
Previously _prepareEmailWindowForDocument would:
1. Check if there was horizontal room for both email + doc.
2. If not, try collapsing the sidebar to recover space.
3. If even that wasn't enough, _clearEmailDocumentSplit() — the
email tab-down the user has been disliking.
Drop step 3. We still try collapsing the sidebar (free easy room),
but if the layout is still cramped, just dock anyway and let the
user manage their layout. _clearEmailDocumentSplit() is still
called on the legitimate close paths.
Wire the existing built-in PERSONAS catalog through to scheduled tasks
the same way I wired it to reminder synthesis. Repurposes the
dormant scheduled_tasks.character_id column.
UI (static/js/tasks.js)
- New 'Persona' select in the LLM / Research task form, with the five
built-in characters (socrates/razor/nietzsche/spark/odysseus) plus a
default 'no persona' option. Pre-populates from existing.character_id
on edit. Non-llm/research types explicitly clear it on save.
API (routes/task_routes.py)
- TaskCreate + TaskUpdate gain character_id: Optional[str].
- _task_to_dict echoes character_id back so the form can hydrate on
edit. Update endpoint stores '' as None to allow clearing.
Runner (src/task_scheduler.py)
- When task.character_id is set and matches a built-in persona, prepend
the persona prompt to the task system prompt so the model speaks in
that voice while still knowing it's running a scheduled task.
- crew_member.personality still wins as the base; character_id stacks
on top.
I removed the .email-menu-wrap markup from email rows earlier but
left the JS that queries it and calls .addEventListener on the
result. Since the query returns null, every _createEmailItem call
threw and the row never made it into the list — most visibly:
clicking a sender name to filter by them didn't appear to work,
because the row wiring (including the sender click handler) was
ripped out mid-construction.
- Drop the unconditional menuWrap.addEventListener('click', ...)
block — there's no menu to open.
- Drop the early-return guard on touchstart that referenced the
removed wrap.
- The two remaining .email-menu-wrap queries are already guarded
with 'if (menuWrap)' so they safely no-op.
The bell is already gated on settings.reminder_channel === 'email', but
the check only ran at email-library init — so switching the reminder
channel in Settings didn't update the bell until you reopened Email.
- Settings/Reminders channel-change handler now dispatches
odysseus-reminder-channel-changed { channel } after saving.
- emailLibrary listens for it and re-runs _syncEmailReminderBellVisibility
with the new channel value.
The strip already lives where account chips render, so the text label
beside the whirlpool was redundant. Strip the label + the fallback
'Accounts...' text — the spinner alone tells the user accounts are
loading.
Dropped the .email-menu-wrap / .email-menu-btn from each row. Other
handlers that check 'if (e.target.closest(.email-menu-wrap)) return;'
safely no-op when the element doesn't exist. Row click + swipe still
open the email and its in-reader actions.
Transparent at rest, accent gradient animates in on hover with a 0.18s
ease transition. Drag affordance + col-resize cursor still work; the
stripe just stops bothering you when not touched.
Right-side handle mirrors the gradient direction (left-to-right
gradient flipped to right-to-left).
Gmail composer chips arrive with inline border:1px solid #ddd + an
assumed white background, so on dark themes they read as a barely-
visible white box with the filename invisible. Override .gmail_chip /
.gmail_drive_chip inside .email-reader-body:
- Strip inline width:386px / height:20px (use auto + max-width:100%),
- Re-flow as inline-flex with a 6px gap so icon + filename align.
- Background tinted with var(--fg) 4%, border = var(--border).
- Anchor uses var(--accent) and the filename span uses var(--fg) so
text is always legible regardless of theme.
- Icon img clamped to 16x16.
Before: the attachment chip just dimmed (opacity 0.6) while the file
downloaded — easy to miss on a large attachment.
Now: replace the paperclip SVG with a 12px whirlpool spinner for the
duration of the fetch, restoring the original icon when the download
finishes (or errors out). Same loading vocabulary as Test / Scan /
Probe / Send buttons elsewhere in the UI.
When the email-answered event fires (user just sent a reply, so the
source email auto-marks as done), the row was getting the .active
class instantly with no visible cue beyond the checkbox tick. Add a
brief .email-auto-done-flash class on the row that runs two keyframe
animations:
- email-auto-done-row: tints the row background with the accent for
~1.2s then fades to transparent.
- email-auto-done-check: pops the done checkbox to 1.4× with an
accent ring that expands outward over 0.6s.
Class self-removes after 1.2s so it doesn't replay on re-renders.
The drag handle painted a 35% accent gradient strip on the page edge
of any docked panel. The col-resize cursor on hover is enough to
surface the affordance; the stripe felt like a stray UI element.
The single-row chip strip relied on native horizontal scroll, which is
hard to reach without a horizontal wheel. Wire two scroll mechanisms
on the strip once it's rendered:
- Vertical wheel → horizontal scroll (intercept only when overflow
exists and the wheel motion is primarily vertical, so normal page
scroll still works elsewhere).
- Mouse grab-and-drag: cursor goes grab/grabbing, mousedown→move
bumps scrollLeft by the cursor delta. A 5px drag threshold cancels
the chip click so the user can drag-scroll without accidentally
switching accounts.
- Revert the email row layout — sender/date stay above subject again,
matching the original two-line item that the user actually wanted.
- The account filter chips (#email-lib-accounts) wrapped onto multiple
rows on desktop. Promote the mobile-only horizontal-scroll rule to
apply at every breakpoint so the chips always sit on one row with
overflow scroll, regardless of screen size.
Group row's auto-sort-sessions-btn padding-left 6→4, and
.sort-dropdown-item left padding 8→6 so 'Last Active', 'Newest First',
'By Folder', '↑↓ Rearrange', '● Select' all shift in by the same
amount, matching the Group nudge.
Subject was on its own line below sender/date. Move it inline so each
email occupies one row: sender capped at 35% width (ellipsis), subject
takes the remaining space (ellipsis), date pins to the right. Tighter
list density at the cost of dropping the spare line for snippet text
(none was being rendered anyway).
Two related fixes for the Chats section header:
- The 'manage' label only slid out when the button itself was hovered.
Add section-header-flex:hover to the reveal rule so hovering the sort
icon (or anywhere in the section header) also opens the label.
- Parent-hover opacity bumped 0.45 → 0.85 so the 'manage' text reads
much more clearly when revealed. Direct hover on the button still
pushes to full opacity 1.
The shared .section-header-btn:hover rule paints a tinted background
across all section header buttons. On the Chats manage button this
showed as a box behind the sliding 'manage' label, which the user
didn't want. Override background to transparent for that one button.
The session-dropdown Esc handler only closed .session-dropdown-menu,
leaving the .session-folder-submenu (Move to folder → folder list)
orphaned on screen. Same gap on the click-away path. Extend both
selectors to cover the submenu so a single Esc / outside-click
dismisses the whole stack.
Email's 'new' label is absolutely positioned to the LEFT of the '+'
icon, which works there because the '+' is the visible/clickable
anchor. The chats manage button has no visible glyph at rest, so the
label was rendered outside the button's bounding box — hovering
'manage' lost the :hover state and clicking it missed.
Override list-item-plus-label inside chats-manage-btn:
position: static (in flex flow) + max-width:0 / max-width:80px
expand-on-hover so the button's clickable rect grows alongside the
text. Hover stays sticky; click hits.
The list-item-plus-label slide-in needs a visible anchor element so
the button takes up consistent width and the absolutely-positioned
label can fly in to the left of it. Email uses the '+' SVG as that
anchor; here we use an empty 13x13 spacer span instead — same
footprint, no glyph. Result: empty button at rest (still visible per
the chats-manage-btn fade rules), 'manage' slides in from the left
on direct hover.
Removed the book/library SVG and list-item-plus-btn/-label classes.
The button is now a plain text button styled like email's 'new' label
(9.5px, 0.02em letter-spacing), reusing the existing chats-manage-btn
opacity hover-reveal rules so it still fades until you hover the
section.
Two months of iteration on the Settings panel, integration forms, and
small visual nudges across the app. Highlights:
Settings restructure
- Add Models: split into separate Local + API cards (no more in-card
tabs); each fuses Type/Provider with the URL input.
- Added Models: new dedicated sidebar tab, with Probe + Clear-offline
pulled into its header; Local/API sub-section icons accent-tinted.
- Search: Web Search and a new Deep Research card (Model + tuning),
with a cross-link to AI Defaults. Provider hints use real clickable
anchors; Web Search Test button shows a whirlpool spinner.
- AI Defaults: Image Generation card returns; Research Model card
carries only Endpoint+Model with a cross-link to Search; Vision /
Default / Utility fallbacks unified under one numbered-row design
matching Search's chain.
- API Permissions (was 'API Tokens'): per-row rename, inline
Permissions toggle that expands the scope-edit panel, in-field
copy icons (icon→check on success). Empty state accent-tinted.
- Integrations: + Add Integration drops a type-picker menu directly
under the button (drop-up on tight viewports); each integration
form (API, CalDAV, CardDAV, Email, Codex/Claude, Vault, MCP) uses
the same accent-outlined Save/Test/Cancel buttons right-aligned.
- Danger Zone: Wipe→Delete with trash icons; new 'Delete everything'
row at the bottom that loops every category.
AI Synthesis (Reminders)
- Persona dropdown sourced from PROMPT_TEMPLATES + custom preset.
- src/reminder_personas.py mirrors the five built-ins for the
server-side synthesis path.
- dispatch_reminder() reads reminder_llm_persona and uses the
persona's system prompt; empty/unknown falls back to warm-neutral.
Esc handling
- Kebab menus and the provider picker intercept Esc in capture phase
so dismissing a popup no longer closes the whole Settings modal.
Accent tinting
- Scoped CSS rule across data-settings-panel=ai/services/added-models/
search/integrations/reminders for card h2 icons + the Added Models
sub-section icons.
Codex/Claude integration form
- No more auto-creation on form open — explicit Create token button.
- New tokens start with every scope granted; existing tokens move out
of the integration form into the API Permissions card.
- Setup reveal: copy buttons inline inside the token + setup code
blocks; shorter subtitle wording.
Misc visual polish
- Save/Test/Cancel uniformly accent-outlined and right-aligned on
every integration form.
- Provider logos render inline next to the search fallback selects
and the Deep Research Search dropdown.
- Trash icons in fallback rows bumped to 20x20 so they fill the 32px
button.
- Image generation default flipped to off.
Lift the LLM/Image Type select to the left of the URL input and the Add
button to its right, so the primary action (URL + Add) sits on one row.
Scan / Ollama / Key / Test stay on the action row below.
Drop the in-card Local/API tab strip — each is now its own admin card with
a normal h2 heading (Local on top, API below). The API key input is
always visible (no more click-to-reveal toggle), matching how cloud
providers actually work. Local keeps the optional key reveal since
local servers usually don't need one.
Dead code removed: wireModelsTabs IIFE and the adm-epApiKeyBtn toggle wire.
Move the Added Models endpoint lists out of the Add Models card into a
dedicated sidebar tab between Add Models and AI Defaults. The card now
focuses purely on adding (Local / API tabs), while the new panel owns
the existing endpoints + Probe and Clear-offline controls.
admin.js: defensive fallback so a stale 'added' value in localStorage
falls back to 'local' instead of leaving both panes hidden.
Move the Added Local + Added API lists out of the per-type tabs into
a dedicated third tab. Each Add tab is now just the form; the new tab
collects both lists together with Local / API subheadings.
Card layout:
Add Models [Probe] [Clear offline]
[Local] [API] [Added Models]
Tab content:
Local → Add Local form
API → Add API form
Added Models → Local list + API list (subheadings)
All endpoint list/form IDs preserved. Tab switcher JS is generic so
the new 'added' tab works without code changes.
Earlier split into 4 flat cards wasn't what was asked for. Restore to
a single 'Add Models' card with two tabs at the top:
Local → Add form + Added Local Models list
API → Add form + Added API Endpoints list
Probe / Clear-offline live on the card header and act on both lists.
Active tab is remembered in localStorage so the user lands back where
they were. All form/list IDs preserved (adm-epLocalUrl, adm-epList-local,
adm-epList-api, etc.) so admin.js continues to work unchanged.
Replaces the .adm-section-toggle fold-open JS with a tab-switcher; the
fold elements no longer exist so the old handler was already a no-op.
The previous 'Add Models' card had two collapsible folds (Local + API)
inside it and 'Added Models' had two inline subsections. Both folded
states added a click-to-expand step that wasn't earning its keep —
users coming to Settings to add a model don't want a fold, they want
the form.
Reshape: four flat admin-cards in the Services panel, each with its
own h2 title matching the rest of Settings:
Add Local Model (was Add Models → Local fold)
Add API (was Add Models → API fold)
Added Local Models (was Added Models → Local subsection)
Added API Endpoints (was Added Models → API subsection)
The collapsible JS hook in admin.js already guards on
'if (!head) return' so removing the .adm-section-toggle headers
turns it into a clean no-op — no breakage.
All input/list IDs preserved (adm-epLocalUrl, adm-epList-local,
adm-epList-api, etc.) so the rest of admin.js continues to work
unchanged. Probe / Clear-offline live on the Local card and act on
both lists together (existing behavior).
The Teacher Mode feature stays out of the default UI per the 2.0
roadmap — backend escalation is already dormant when teacher_model is
unset (its default) and we want to focus on core reliability before
surfacing escalation as a feature.
Nothing removed from the backend:
- src/teacher_escalation.py still gates on get_setting('teacher_model')
- agent_loop.py's run_teacher_inline hook is a no-op without the setting
- settings backup/restore round-trips the teacher_model key unchanged
- power users can still set it via manage_settings or the JSON backup
settings.js's initTeacherModel already early-returns when the card's
DOM ids are missing, so the JS side is clean.
To re-surface the card, revert this commit.
| [Fira Code](https://github.com/tonsky/FiraCode) | SIL Open Font License 1.1 | Nikita Prokopov & contributors |
| [Inter](https://github.com/rsms/inter) | SIL Open Font License 1.1 | Rasmus Andersson |
| [GohuFont](https://font.gohu.org/) (`fonts/custom/GohuFont.ttf`) | WTFPL | Hugo Chargois |
| [OpenDyslexic](https://opendyslexic.org/) (`fonts/OpenDyslexic-{Regular,Bold}.woff2`) | SIL Open Font License 1.1 ([`licenses/OpenDyslexic-OFL.txt`](licenses/OpenDyslexic-OFL.txt)) | Abbie Gonzalez |
> **Branch note:** `dev` is the default branch and contains the latest development changes, but it may be unstable. For the more stable curated branch, use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main).
<p align="center">
A self-hosted AI workspace for chat, agents, research, documents, email, notes, calendar, and local model workflows.
A self-hosted AI workspace -- meant to be the self-hosted version of the UI experience you get from ChatGPT and Claude. But with more jank and fun. Running on your own hardware, with your own data -- local-first, privacy-first, and no trojan.
- **Chat** -- chat with any local model or API; adding them is super simple.<br><sub>vLLM · llama.cpp · Ollama · OpenRouter · OpenAI · GitHub Copilot</sub>
- **Agent** -- hand it tools and let it run the whole task itself.<br><sub>built on [opencode](https://github.com/anomalyco/opencode) · MCP · web · files · shell · skills · memory</sub>
- **Cookbook** -- Scans your hardware, recommends models, click to download and serve.. easy!<br><sub>built on [llmfit](https://github.com/AlexsJones/llmfit) · VRAM-aware · GGUF / FP8 / AWQ · fit scoring · vLLM / llama.cpp serving</sub>
- **Deep Research** -- multi-step runs that gather, read, and synthesize sources into a nice visual report.<br><sub>adapted from [Tongyi DeepResearch](https://github.com/Alibaba-NLP/DeepResearch)</sub>
- **Compare** -- a fun tool to compare models side by side. Test completely blind, no bias!<br><sub>multi-model · blind test · synthesis</sub>
- **Documents** -- YOU write the text, AI is there to assist, not the opposite.<br><sub>multi-tab editor · markdown · HTML · CSV · syntax highlighting · AI edits · suggestions</sub>
- **Memory / Skills** -- Persistent memory and skills, your agent evolves over time as it better understands you and your tasks!<br><sub>ChromaDB · fastembed (ONNX) · vector + keyword retrieval · import/export</sub>
- **Email** -- IMAP/SMTP inbox with AI triage built in: urgency reminders, auto-tag, auto-summary, auto-reply drafts, auto-spam.<br><sub>IMAP · SMTP · per-account routing · CalDAV-aware</sub>
- **Notes & Tasks** -- Quick notes with reminders, a todo list, and scheduled tasks the agent can act on.<br><sub>note pings · checklist · cron-style tasks · ntfy / browser / email channels</sub>
- **Calendar** -- Local-first calendar with CalDAV sync to Radicale / Nextcloud / Apple / Fastmail.<br><sub>CalDAV pull · .ics import/export · per-calendar colors · agent-aware</sub>
- **Works on mobile** -- looks and runs great on your phone, not just desktop.<br><sub>responsive · installable (PWA) · touch gestures</sub>
- **Extras** -- more to explore, happy if you give it a go!<br><sub>image editor · theme editor · file uploads (vision + PDF) · web search · presets · sessions · 2FA</sub>
## Demo
A full, hover-to-play tour lives on the landing page (`docs/index.html`).
<details>
<summary>Screenshots / clips</summary>
### Chat & Agents

### Deep Research

### Compare

### Documents

### Notes & Tasks

</details>
---
## Quick Start
Defaults work out of the box: clone, run, then configure models/search/email
inside **Settings**. Only edit `.env` for deployment-level overrides like
`APP_BIND`, `APP_PORT`, `AUTH_ENABLED`, `DATABASE_URL`, or a pre-seeded admin password.
> `dev` is the default branch and gets the newest changes first. Use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main) if you want the more curated branch.
On first setup, Odysseus creates an admin account (`admin` unless
`ODYSSEUS_ADMIN_USER` is set) and prints a temporary password in the terminal.
For Docker installs, the same line is in `docker compose logs odysseus`.
Use that for the first login, then change it in **Settings**.
Contributing? See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, testing, and
cp .env.example .env# optional, but recommended for explicit defaults
cp .env.example .env
docker compose up -d --build
```
To include optional extras in the image (PDF viewer, Office extraction; includes AGPL PyMuPDF), build with `docker compose build --build-arg INSTALL_OPTIONAL=true` before `up`.
Open `http://localhost:7000` when the containers are healthy. Docker Compose
binds the web UI to `127.0.0.1` by default. If the port is taken, set
`APP_PORT=7001` in `.env` and recreate the container. Set `APP_BIND=0.0.0.0`
only when you intentionally want LAN/reverse-proxy access.
Open `http://localhost:7000` when the containers are healthy. The first admin password is printed in `docker compose logs odysseus`.
If `python` points at an older interpreter, use `py -3.12` (or another installed
3.11+ version) for the venv step.
**Requirements:** Python 3.11+. The core app (chat, agent, memory, documents,
email, calendar, deep research) runs fully native. For full **Cookbook** background
model downloads and the agent shell tool, also install
[Git for Windows](https://git-scm.com/download/win) (provides `bash.exe`).
Local GPU *serving* of vLLM/SGLang needs Linux/WSL2; for a local model on Windows,
[Ollama](https://ollama.com/download) is the easiest path — point Odysseus at
`http://localhost:11434/v1` in Settings.
Open `http://localhost:7000`, log in with the generated admin password,
and configure everything else inside **Settings**.
## Troubleshooting & Advanced Setup
### `chromadb-client` conflicts with embedded ChromaDB
If `chromadb-client` (the lightweight HTTP-only package) is installed alongside the full `chromadb` package, Odysseus starts but ChromaDB silently falls back to HTTP-only mode and fails.
**Fix:** uninstall `chromadb-client` and force-reinstall the full package:
```bash
./venv/bin/pip uninstall chromadb-client -y
./venv/bin/pip install --force-reinstall chromadb
```
### HTTPS + LAN/Tailscale exposure
To expose Odysseus on a local network or Tailscale with HTTPS:
1. Change the bind address to `0.0.0.0` in `.env` (`APP_BIND=0.0.0.0` or `ODYSSEUS_HOST=0.0.0.0`).
2. Generate a locally-trusted cert for your LAN/Tailscale IPs using [mkcert](https://github.com/FiloSottile/mkcert):
4. Install the `mkcert` CA on any other device you want to access Odysseus from (e.g., for iOS, email the `rootCA.pem` to yourself, install the profile, and trust it in Certificate Trust Settings).
### Optional Dependencies
`requirements-optional.txt` contains packages that unlock extra features. It is not installed by default.
| Package | Feature unlocked |
|---------|-----------------|
| `faster-whisper` | Local speech-to-text (microphone -> text) via the "local" STT provider. |
| `ddgs` | DuckDuckGo as a search provider option. |
| `PyMuPDF` | PDF page rendering in the side viewer panel and form-filling. (Note: AGPL-3.0) |
| `markitdown` | Office/EPUB document text extraction (converts .docx/.xlsx/.pptx/.xls/.epub to Markdown). |
### Outlook / Office 365 email
Odysseus email accounts currently use IMAP/SMTP username-password auth. Outlook
and Microsoft 365 generally require OAuth instead, so normal Microsoft mailbox
passwords will fail. See [docs/email-outlook.md](docs/email-outlook.md) for the
current limitation and the planned integration direction.
## Security Notes
Odysseus is a self-hosted workspace with powerful local tools: shell access, file uploads, model downloads, web research, email/calendar integrations, and API tokens. Treat it like an admin console.
- Keep `AUTH_ENABLED=true` for any network-accessible deployment.
- Keep `LOCALHOST_BYPASS=false` outside local development.
- Use `SECURE_COOKIES=true` when Odysseus is served through HTTPS by a trusted reverse proxy or private access gateway.
- Do not expose it directly to the public internet without HTTPS and a trusted reverse proxy or private access layer.
- Keep `.env`, `data/`, `logs/`, databases, uploads, generated media, backups, auth/session files, API keys, and model/provider tokens out of Git and private shares. They are ignored by default.
- Review `data/auth.json` after first boot: disable open signup unless you intentionally want it, make only your own account admin, and keep demo/test accounts non-admin.
- Non-admin users do not get shell/Python/file read/write by default, and admin-only routes/tools such as MCP management, API tokens, webhooks, model/cookbook serving, backup/vault, and app settings are admin-gated. Other features are controlled by per-user privileges, so review each user's privileges before exposing a deployment.
- Rotate any API keys or tokens that were ever pasted into a shared chat, demo, screenshot, or log.
- If you enable API tokens or webhooks, create separate tokens per integration and delete unused ones.
- Prefer binding manual development runs to `127.0.0.1`; bind to `0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
- Keep ChromaDB, SearXNG, ntfy, Ollama, vLLM, llama.cpp, databases, and raw model/provider APIs internal-only. Expose only the authenticated Odysseus web/API entrypoint through your trusted proxy or private access layer.
- Before publishing a fork, run `git status --short` and confirm no private files from `.env`, `data/`, `logs/`, uploads, backups, or local databases are staged.
### Private or proxied deployments
Odysseus serves plain HTTP on its app port. Docker Compose binds Odysseus and the bundled services to `127.0.0.1` by default, so a typical production/private setup is:
1. Keep Odysseus on localhost, for example `127.0.0.1:7000`.
2. Terminate HTTPS at a trusted reverse proxy or private access gateway.
3. Put the authenticated Odysseus web/API entrypoint behind that layer.
4. Keep raw service and model ports internal-only.
Cloudflare Access, Tailscale, Caddy, nginx, and Traefik can all fit this pattern; none are required by Odysseus. If your access layer reaches Odysseus on the same host, proxy to `http://127.0.0.1:7000` and keep `AUTH_ENABLED=true`, `LOCALHOST_BYPASS=false`, and `SECURE_COOKIES=true`.
Common internal-only ports from the default docs/compose setup:
| Port | Service |
|---|---|
| `7000` | Odysseus raw app port |
| `8080` | SearXNG |
| `8091` | ntfy |
| `8100` | ChromaDB host port for manual/compose access |
| `11434` | Ollama |
| `8000-8020` | Common local model/provider APIs |
A full hover-to-play tour lives on the landing page: [`docs/index.html`](docs/index.html).
## Contributing
Help is welcome. The best entry points are fresh-install testing, provider setup
bugs, mobile/editor polish, docs, and small focused refactors. See
[ROADMAP.md](ROADMAP.md) for the current help-wanted list.
## Configuration
Most setup is done inside the app with `/setup` or **Settings**. Use `.env`
for deployment-level defaults and secrets you want present before first boot.
Key settings:
Help is welcome. The best entry points are fresh-install testing, provider setup bugs, mobile/editor polish, docs, and small focused refactors. See [CONTRIBUTING.md](CONTRIBUTING.md) and [ROADMAP.md](ROADMAP.md).
| Variable | Default | Description |
|---|---|---|
| `LLM_HOST` | `localhost` | Your LLM server (e.g. `llm-host.local:8000`) |
| `LLM_HOSTS` | -- | Comma-separated list for model discovery |
| `OPENAI_API_KEY` | -- | Optional OpenAI key. Prefer adding providers in the app unless pre-seeding. |
| `SEARXNG_INSTANCE` | `http://localhost:8080` | SearXNG URL. Docker overrides this to `http://searxng:8080`. |
| `SEARXNG_SECRET` | generated on first Docker boot | Optional SearXNG cookie/CSRF secret. Leave blank unless you need to pin it. |
| `APP_BIND` | `127.0.0.1` | Docker Compose host bind address for the web UI. Use `0.0.0.0` only for intentional LAN/reverse-proxy access. |
| `APP_PORT` | `7000` | Docker Compose host port for the web UI. |
| `ODYSSEUS_CHAT_UPLOAD_MAX_BYTES` | `10485760` | Chat/agent attachment cap in bytes. Raise for larger local PDFs or text documents. |
| `ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES` | `104857600` | Gallery image upload cap in bytes (100 MB). |
| `ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES` | `26214400` | Gallery transform input cap in bytes (25 MB). |
| `ODYSSEUS_MEMORY_IMPORT_MAX_BYTES` | `10485760` | Memory import file cap in bytes (10 MB). |
| `ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES` | `26214400` | Personal document upload cap in bytes (25 MB). |
| `ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES` | `26214400` | Email compose attachment cap in bytes (25 MB). |
| `ODYSSEUS_STT_MAX_AUDIO_BYTES` | `26214400` | Speech-to-text audio cap in bytes (25 MB). |
| `ODYSSEUS_ICS_MAX_BYTES` | `10485760` | Calendar `.ics` import cap in bytes (10 MB). |
## Security
All upload-limit vars are validated (must be a positive integer) and optional; an invalid value fails fast at startup.
### Built-in MCP servers (optional setup)
Odysseus auto-registers a few built-in MCP servers at startup. The npx-based ones (currently the browser server, `@playwright/mcp`) only start when their npm package is already in the local npx cache. If a package isn't cached, that server is skipped with a startup log message explaining what to do, so a fresh install does not block on a multi-minute npm download or hang if Playwright system deps are missing.
To enable the browser MCP (page navigation, screenshots, vision), run once:
```bash
npx -y @playwright/mcp@latest --version
```
That installs `@playwright/mcp` plus Playwright (~300MB total). Restart Odysseus and the server will register at startup.
Odysseus is a self-hosted workspace with powerful local tools. Keep auth enabled, keep private data out of Git, and do not expose raw model/service ports publicly. Deployment details are in the [setup guide](docs/setup.md#security-notes).
## Star History
@@ -451,19 +72,5 @@ All user data lives in `data/` (gitignored): `app.db` (sessions, messages, docum
</a>
## License
AGPL-3.0-or-later -- see [LICENSE](LICENSE) and [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md).
```
|
|||
|||||
| | | |||||||
)_) )_) )_) ~|~
)___))___))___)\ |
)____)____)_____)\\|
_____|____|____|_____\\\__
\ /
~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
~^~ all aboard! ~^~
~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
```
AGPL-3.0-or-later -- see [LICENSE](LICENSE) and [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md).
This page keeps the detailed install, deployment, troubleshooting, and configuration notes out of the front README.
## Quick Start
> **Branch note:**`dev` is the default branch and contains the latest development changes, but it may be unstable. For the more stable curated branch, use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main).
Defaults work out of the box: clone, run, then configure models/search/email
inside **Settings**. Only edit `.env` for deployment-level overrides like
`APP_BIND`, `APP_PORT`, `AUTH_ENABLED`, `DATABASE_URL`, or a pre-seeded admin password.
On first setup, Odysseus creates an admin account (`admin` unless
`ODYSSEUS_ADMIN_USER` is set) and prints a temporary password in the terminal.
For Docker installs, the same line is in `docker compose logs odysseus`.
Use that for the first login, then change it in **Settings**.
Contributing? See [CONTRIBUTING.md](../CONTRIBUTING.md) for setup, testing, and
cp .env.example .env # optional, but recommended for explicit defaults
docker compose up -d --build
```
To include optional extras in the image (PDF viewer, Office extraction; includes AGPL PyMuPDF), build with `docker compose build --build-arg INSTALL_OPTIONAL=true` before `up`.
Open `http://localhost:7000` when the containers are healthy. Docker Compose
binds the web UI to `127.0.0.1` by default. If the port is taken, set
`APP_PORT=7001` in `.env` and recreate the container. Set `APP_BIND=0.0.0.0`
only when you intentionally want LAN/reverse-proxy access.
> **On Apple Silicon (M-series) Macs:** Docker can't reach the Metal GPU, so
> Cookbook serves local models on CPU only. For GPU-accelerated model serving,
> run natively instead — see [Apple Silicon](#apple-silicon) below.
The manual `uvicorn` command takes the same address as `--host 0.0.0.0`. Bind
outside loopback only for a trusted LAN/VPN such as Tailscale: keep
`AUTH_ENABLED=true` and do not expose the port directly to the public internet.
**Requirements:** Python 3.11+. The core app (chat, agent, memory, documents,
email, calendar, deep research) runs fully native. For full **Cookbook** background
model downloads and the agent shell tool, also install
[Git for Windows](https://git-scm.com/download/win) (provides `bash.exe`).
Local GPU *serving* of vLLM/SGLang needs Linux/WSL2; for a local model on Windows,
[Ollama](https://ollama.com/download) is the easiest path — point Odysseus at
`http://localhost:11434/v1` in Settings.
Open `http://localhost:7000`, log in with the generated admin password,
and configure everything else inside **Settings**.
## Troubleshooting & Advanced Setup
### `chromadb-client` conflicts with embedded ChromaDB
If `chromadb-client` (the lightweight HTTP-only package) is installed alongside the full `chromadb` package, Odysseus starts but ChromaDB silently falls back to HTTP-only mode and fails.
**Fix:** uninstall `chromadb-client` and force-reinstall the full package:
```bash
./venv/bin/pip uninstall chromadb-client -y
./venv/bin/pip install --force-reinstall chromadb
```
### HTTPS + LAN/Tailscale exposure
To expose Odysseus on a local network or Tailscale with HTTPS:
1. Change the bind address to `0.0.0.0` in `.env` (`APP_BIND=0.0.0.0` or `ODYSSEUS_HOST=0.0.0.0`).
2. Generate a locally-trusted cert for your LAN/Tailscale IPs using [mkcert](https://github.com/FiloSottile/mkcert):
4. Install the `mkcert` CA on any other device you want to access Odysseus from (e.g., for iOS, email the `rootCA.pem` to yourself, install the profile, and trust it in Certificate Trust Settings).
### Common self-host traps (30-second fixes)
A grab-bag of small gotchas that otherwise turn into long debugging sessions.
- **`AUTH_ENABLED=false` is ignored / you're still forced to log in (Windows).** If you edited `.env` in Notepad it may have saved a UTF-8 **BOM**, turning the first key into `AUTH_ENABLED` so it is never matched. Odysseus loads `.env` with `encoding="utf-8-sig"` to tolerate a leading BOM, but the safe fix is to re-save `.env` as **UTF-8 without BOM** (VS Code: *Save with Encoding → UTF-8*).
- **macOS: the app isn't at `http://localhost:7000`.** macOS AirPlay Receiver usually holds port `7000`, so the macOS start script serves on **`7860`** instead — open `http://localhost:7860`. To use `7000`, free it (System Settings → General → AirDrop & Handoff → turn off *AirPlay Receiver*) and set `APP_PORT=7000`.
- **Copy buttons do nothing over a plain-HTTP Tailscale/LAN URL.** Browsers only expose the clipboard API (`navigator.clipboard`) on **secure origins** — HTTPS, or `localhost`. Over `http://100.x.y.z:7860` it is blocked. Serve over HTTPS (see *HTTPS + LAN/Tailscale exposure* above); `localhost` is exempt, so copy still works on the host itself.
- **Self-hosted ntfy reminders don't reach your phone.** Two things: (1) the bundled ntfy binds to loopback by default — to reach it from your phone set `NTFY_BIND` to your host/Tailscale IP and `NTFY_BASE_URL` to the same server URL in `.env`, then recreate the ntfy container (see the `NTFY_*` block in `.env.example`); (2) in the ntfy **Android** app, subscribe to the topic with **Instant delivery** enabled — non-`ntfy.sh` servers don't get instant push otherwise.
- **Local mail (Dovecot) login fails: "Plaintext authentication disallowed on non-encrypted connections."** Your IMAP/SMTP server is refusing cleartext auth over an unencrypted link. Prefer enabling TLS on the mail server; on a trusted LAN only, you can allow cleartext (Dovecot: `disable_plaintext_auth = no`).
- **Calendar/contacts (Radicale) won't sync.** Point Odysseus at the **full collection URL** with its trailing slash — e.g. `http://host:5232/<user>/<collection-id>/` — not just the server root. Radicale shows this address for each calendar/address book in its web UI.
### Optional Dependencies
`requirements-optional.txt` contains packages that unlock extra features. It is not installed by default.
| Package | Feature unlocked |
|---------|-----------------|
| `faster-whisper` | Local speech-to-text (microphone -> text) via the "local" STT provider. |
| `ddgs` | DuckDuckGo as a search provider option. |
| `PyMuPDF` | PDF page rendering in the side viewer panel and form-filling. (Note: AGPL-3.0) |
| `markitdown` | Office/EPUB document text extraction (converts .docx/.xlsx/.pptx/.xls/.epub to Markdown). |
### Faster, reproducible installs with uv (optional)
[uv](https://docs.astral.sh/uv/) works as a drop-in replacement for the
venv + pip steps in the native install guides, no project changes are needed but this change results in faster installs along with a lockfile for reproducible environments. After [installing `uv`](https://docs.astral.sh/uv/getting-started/installation/), use:
```bash
uv venv venv --python 3.13
uv pip install -r requirements.txt
# then continue as usual: python setup.py, uvicorn, ...
```
`requirements.txt` is intentionally unpinned, so two installs at different times can produce different package versions. If you want a reproducible environment (e.g. across your own machines, or to roll back after a bad upgrade), snapshot and restore exact versions with:
```bash
uv pip compile requirements.txt -o requirements.lock # snapshot current resolution
uv pip sync requirements.lock # reproduce it exactly later
```
`requirements.lock` is gitignored and platform-specific (compile it on the OS you deploy to). Regenerate it deliberately when you want to take upgrades. The plain `uv pip install -r requirements.txt` keeps following the unpinned requirements like pip does.
### Outlook / Office 365 email
Odysseus email accounts currently use IMAP/SMTP username-password auth. Outlook
and Microsoft 365 generally require OAuth instead, so normal Microsoft mailbox
passwords will fail. See [docs/email-outlook.md](docs/email-outlook.md) for the
current limitation and the planned integration direction.
## Security Notes
Odysseus is a self-hosted workspace with powerful local tools: shell access, file uploads, model downloads, web research, email/calendar integrations, and API tokens. Treat it like an admin console.
- Keep `AUTH_ENABLED=true` for any network-accessible deployment.
- Keep `LOCALHOST_BYPASS=false` outside local development.
- Use `SECURE_COOKIES=true` when Odysseus is served through HTTPS by a trusted reverse proxy or private access gateway.
- Do not expose it directly to the public internet without HTTPS and a trusted reverse proxy or private access layer.
- Keep `.env`, `data/`, `logs/`, databases, uploads, generated media, backups, auth/session files, API keys, and model/provider tokens out of Git and private shares. They are ignored by default.
- Review `data/auth.json` after first boot: disable open signup unless you intentionally want it, make only your own account admin, and keep demo/test accounts non-admin.
- Non-admin users do not get shell/Python/file read/write by default, and admin-only routes/tools such as MCP management, API tokens, webhooks, model/cookbook serving, backup/vault, and app settings are admin-gated. Other features are controlled by per-user privileges, so review each user's privileges before exposing a deployment.
- Rotate any API keys or tokens that were ever pasted into a shared chat, demo, screenshot, or log.
- If you enable API tokens or webhooks, create separate tokens per integration and delete unused ones.
- Prefer binding manual development runs to `127.0.0.1`; bind to `0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
- Keep ChromaDB, SearXNG, ntfy, Ollama, vLLM, llama.cpp, databases, and raw model/provider APIs internal-only. Expose only the authenticated Odysseus web/API entrypoint through your trusted proxy or private access layer.
- Before publishing a fork, run `git status --short` and confirm no private files from `.env`, `data/`, `logs/`, uploads, backups, or local databases are staged.
### Private or proxied deployments
Odysseus serves plain HTTP on its app port. Docker Compose binds Odysseus and the bundled services to `127.0.0.1` by default, so a typical production/private setup is:
1. Keep Odysseus on localhost, for example `127.0.0.1:7000`.
2. Terminate HTTPS at a trusted reverse proxy or private access gateway.
3. Put the authenticated Odysseus web/API entrypoint behind that layer.
4. Keep raw service and model ports internal-only.
Cloudflare Access, Tailscale, Caddy, nginx, and Traefik can all fit this pattern; none are required by Odysseus. If your access layer reaches Odysseus on the same host, proxy to `http://127.0.0.1:7000` and keep `AUTH_ENABLED=true`, `LOCALHOST_BYPASS=false`, and `SECURE_COOKIES=true`.
`ALLOWED_ORIGINS` lists exact permitted origins for cross-origin browser/API clients; ordinary same-origin reverse-proxy access usually does not need a special CORS entry.
Common internal-only ports from the default docs/compose setup:
| Port | Service |
|---|---|
| `7000` | Odysseus raw app port |
| `8080` | SearXNG |
| `8091` | ntfy |
| `8100` | ChromaDB host port for manual/compose access |
| `11434` | Ollama |
| `8000-8020` | Common local model/provider APIs |
## Configuration
Most setup is done inside the app with `/setup` or **Settings**. Use `.env`
for deployment-level defaults and secrets you want present before first boot.
Key settings:
| Variable | Default | Description |
|---|---|---|
| `LLM_HOST` | `localhost` | Your LLM server (e.g. `llm-host.local:8000`) |
| `LLM_HOSTS` | -- | Comma-separated list for model discovery |
| `OPENAI_API_KEY` | -- | Optional OpenAI key. Prefer adding providers in the app unless pre-seeding. |
| `SEARXNG_INSTANCE` | `http://localhost:8080` | SearXNG URL. Docker overrides this to `http://searxng:8080`. |
| `SEARXNG_SECRET` | generated on first Docker boot | Optional SearXNG cookie/CSRF secret. Leave blank unless you need to pin it. |
| `APP_BIND` | `127.0.0.1` | Docker Compose host bind address for the web UI. Use `0.0.0.0` only for intentional LAN/reverse-proxy access. |
| `APP_PORT` | `7000` | Docker Compose host port for the web UI. |
| `APP_DATA_DIR` | `./data` | Docker Compose host directory for application data volumes. |
| `ODYSSEUS_CHAT_UPLOAD_MAX_BYTES` | `10485760` | Chat/agent attachment cap in bytes. Raise for larger local PDFs or text documents. |
| `ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES` | `104857600` | Gallery image upload cap in bytes (100 MB). |
| `ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES` | `26214400` | Gallery transform input cap in bytes (25 MB). |
| `ODYSSEUS_MEMORY_IMPORT_MAX_BYTES` | `10485760` | Memory import file cap in bytes (10 MB). |
| `ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES` | `26214400` | Personal document upload cap in bytes (25 MB). |
| `ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES` | `26214400` | Email compose attachment cap in bytes (25 MB). |
| `ODYSSEUS_STT_MAX_AUDIO_BYTES` | `26214400` | Speech-to-text audio cap in bytes (25 MB). |
| `ODYSSEUS_ICS_MAX_BYTES` | `10485760` | Calendar `.ics` import cap in bytes (10 MB). |
All upload-limit vars are validated (must be a positive integer) and optional; an invalid value fails fast at startup.
### Built-in MCP servers (optional setup)
Odysseus auto-registers a few built-in MCP servers at startup. The npx-based ones (currently the browser server, `@playwright/mcp`) only start when their npm package is already in the local npx cache. If a package isn't cached, that server is skipped with a startup log message explaining what to do, so a fresh install does not block on a multi-minute npm download or hang if Playwright system deps are missing.
To enable the browser MCP (page navigation, screenshots, vision), run once:
```bash
npx -y @playwright/mcp@latest --version
```
That installs `@playwright/mcp` plus Playwright (~300MB total). Restart Odysseus and the server will register at startup.
@@ -102,6 +102,7 @@ python3 ~/.claude/skills/odysseus/scripts/odysseus_api.py POST /api/codex/memory
## Email draft + send
- Prefer `POST /api/codex/emails/draft-document` for agent-written email replies. It creates an editable Odysseus Document with `language: "email"` and does not touch IMAP/send.
@@ -102,6 +102,7 @@ python3 integrations/codex/scripts/odysseus_api.py POST /api/codex/memory '{"tex
## Email draft + send
- Prefer `POST /api/codex/emails/draft-document` for Codex-written email replies. It creates an editable Odysseus Document with `language: "email"` and does not touch IMAP/send.
Fail"Couldn't find Python 3.11+ for Windows setup. Install Python 3.11+ (or open the Python launcher with 'py -3.11') from https://www.python.org/downloads/, then re-run this script."
logger.info(f"[doc-inject] found by ID: title={active_doc.title!r}, lang={active_doc.language!r}, is_active={active_doc.is_active}, content_len={len(active_doc.current_contentor'')}")
else:
logger.warning(f"[doc-inject] NOT FOUND by ID {active_doc_id}")
@@ -656,9 +766,18 @@ def setup_chat_routes(
# Build disabled-tools set from frontend toggles + user privileges
disabled_tools=set()
ifstr(allow_bash).lower()!="true":
# Only disable bash/web_search when the caller *explicitly* set them
# to a falsy value. When unset (None), defer to per-user privilege
# checks below — this lets admins with can_use_bash=True use bash
# by default without having to send allow_bash in every request.
runner_lines.append(' echo "[odysseus] No matching prebuilt llama-server for this host (arch=$_odysseus_arch) — will build from source."')
runner_lines.append(' fi')
runner_lines.append(' if [ -z "$_odysseus_have_prebuilt" ]; then')
# Detect pip-installed nvcc (from vLLM/nvidia CUDA wheels) and put it on PATH
# so cmake's CUDA configure can find it. We keep this after the ROCm/HIP
# check — a machine with both stacks should honor the native HIP toolchain on
# AMD hosts instead of accidentally preferring a stray nvcc wheel.
runner_lines.append(' for _cudir in ~/.local/lib/python*/site-packages/nvidia/cu13 ~/.local/lib/python*/site-packages/nvidia/cu12 ~/.local/lib/python*/site-packages/nvidia/cuda_nvcc; do')
runner_lines.append(' if _odysseus_has_nvidia_hw; then')
runner_lines.append(' for _cudir in ~/.local/lib/python*/site-packages/nvidia/cu13 ~/.local/lib/python*/site-packages/nvidia/cu12 ~/.local/lib/python*/site-packages/nvidia/cuda_nvcc; do')
runner_lines.append(' echo "[odysseus] WARNING: missing build deps ($_missing) — passwordless sudo is unavailable, cannot auto-install. Cookbook Diagnosis will explain the fix after the build fails."')
runner_lines.append(' echo "[odysseus] WARNING: no HIP/CUDA toolchain found — building llama-server for CPU only."')
runner_lines.append(' echo "[odysseus] WARNING: no HIP/CUDA/Vulkan toolchain found — building llama-server for CPU only."')
runner_lines.append(' echo "[odysseus] GPU inference will not be available for this llama.cpp build."')
runner_lines.append(' echo "[odysseus] Install ROCm for AMD GPUs or vLLM/CUDA tooling for NVIDIA, then re-launch this serve task."')
runner_lines.append(' echo "[odysseus] Install Vulkan (libvulkan-dev) / ROCm for AMD GPUs or CUDA tooling for NVIDIA, then re-launch this serve task."')
r"Please ensure sgl_kernel is properly installed",
"SGLang native dependencies are missing on this server.",
[
{"label":"install OS packages: libnuma-dev python3.12-dev build-essential","op":"manual"},
{"label":"upgrade sglang-kernel after OS packages are installed","op":"manual"},
],
),
(
r"sglang.*command not found|No module named sglang|SGLang is not installed",
"SGLang is not installed or not in PATH on this server.",
[{"label":"install SGLang in Cookbook Dependencies","op":"dependency","package":"sglang[all]"}],
),
# System build deps come BEFORE the generic llama.cpp catch-all so
# cmake / build-essential / git missing → a specific OS-package
# remediation instead of "install llama-cpp-python[server]" (which
# itself fails to compile when cmake is absent).
(
r"llama-server.*command not found|llama\.cpp.*not found|No module named.*llama_cpp|No module named 'starlette_context'|git: command not found|cmake: command not found",
r"cmake: command not found|cmake.*not found.*[Cc]ould not",
"cmake is required to build llama.cpp from source but isn't installed on this server.",
# Probe scripts for the dead-session download check, run as
# `python3 -c <PROBE> <repo_id> <cache_root>` (locally or over SSH).
# cache_root is the task's custom download dir, '' for the default HF cache.
# It has to be passed explicitly: the download runner exports
# HF_HOME=<local_dir>, so that task's cache lives under <local_dir>/hub, and
# the probe process's own environment knows nothing about it.
HF_CACHE_COMPLETE_PROBE=(
"import os,sys;"
"repo=sys.argv[1];"
"root=os.path.expanduser(sys.argv[2]) if len(sys.argv)>2 and sys.argv[2] else '';"
"base=os.path.join(root,'hub') if root else (os.environ.get('HUGGINGFACE_HUB_CACHE') or os.path.join(os.environ.get('HF_HOME', os.path.expanduser('~/.cache/huggingface')), 'hub'));"
"ok=os.path.isdir(snap) and any(os.path.isdir(os.path.join(snap,x)) and os.listdir(os.path.join(snap,x)) for x in os.listdir(snap));"
"inc=False;"
"blobs=os.path.join(d,'blobs');"
"inc=os.path.isdir(blobs) and any(x.endswith('.incomplete') for x in os.listdir(blobs));"
"sys.exit(0 if ok and not inc else 1)"
)
HF_CACHE_INCOMPLETE_PROBE=(
"import os,sys;"
"repo=sys.argv[1];"
"root=os.path.expanduser(sys.argv[2]) if len(sys.argv)>2 and sys.argv[2] else '';"
"base=os.path.join(root,'hub') if root else (os.environ.get('HUGGINGFACE_HUB_CACHE') or os.path.join(os.environ.get('HF_HOME', os.path.expanduser('~/.cache/huggingface')), 'hub'));"
{"role":"system","content":"You are a reminder assistant. Write a single short, warm, motivating sentence (max 25 words) reminding the user about the note below. Do not add greetings, preamble, or hashtags. Output only the sentence."},
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.