Add official Gemma 4 12B-it plus QAT-INT4/INT8 catalog entries (with their
GGUF sources), QAT quantization support across the quant tables and the
prequantized-prefix list, and the missing RTX 3050 / 3050 Ti memory
bandwidth so speed estimates stop falling back to the generic cuda value.
* fix(routes): serve 404 instead of 500 when an HTML page file is missing
_serve_html_with_nonce opened the HTML file with no error handling, and
callers such as /backgrounds and /login pass their paths in with no
existence check, so a missing or unreadable file raised an unhandled
OSError that surfaced as a 500. Wrap the read and raise HTTPException(404)
instead; the normal render path (CSP-nonce substitution) is unchanged.
Fixes #4594
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): distinguish missing page (404) from read failure (500)
The previous fix caught a broad OSError and returned 404 for every
failure, which masks real server-side problems (permission errors, I/O
failures) as "not found" and lets them slip past error alerting. Split
FileNotFoundError (genuine 404) from other OSError, which now logs the
exception and returns a generic 500 — without leaking the OS error
string or file path into the response body.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(routes): treat unreadable bundled HTML page as logged 500, not 404
Per PR #4637 review: every caller of the page-render helper serves a fixed,
server-owned template (index/login/backgrounds), never a client-supplied
path. So a missing or unreadable file is a server fault (broken deployment),
not a client "not found" — a 404 there mislabels a server error and hides a
missing core template from 5xx alerting, contradicting the OSError->500
rationale this PR is built on. Collapse both branches into a single logged,
leak-free 500.
Move the helper to src.app_helpers.serve_html_with_nonce so the behavior can
be unit-tested without importing the whole app (app.py is the slim
orchestrator; the test harness stubs src.database, so importing app in tests
is not viable). Add tests pinning missing/unreadable -> 500 (not 404) and
nonce injection on the happy path.
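A minimal sketch of the collapsed behavior, for illustration only. It assumes a stdlib-only stand-in (HttpError) in place of FastAPI's HTTPException, and a hypothetical {{CSP_NONCE}} placeholder; the real helper's signature and placeholder may differ.

```python
import logging
from pathlib import Path

logger = logging.getLogger("app_helpers_sketch")


class HttpError(Exception):
    """Stdlib-only stand-in for FastAPI's HTTPException (assumption)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def serve_html_with_nonce(path: Path, nonce: str,
                          placeholder: str = "{{CSP_NONCE}}") -> str:
    """Read a server-owned template and substitute the CSP nonce."""
    try:
        html = path.read_text(encoding="utf-8")
    except OSError:
        # FileNotFoundError is an OSError subclass, so missing and unreadable
        # files take the same branch. Details go to the log, not the response.
        logger.exception("failed to read bundled HTML page")
        raise HttpError(500, "Internal Server Error") from None
    return html.replace(placeholder, nonce)
```

The response detail is a fixed string, so neither the file path nor the OS error text can reach the client.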
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix(chat): strip executed email tool fences from the live stream (#3993)
The backend strips every fenced tool block from persisted text (the regex in
src/tool_parsing.py is built from the full TOOL_TAGS set, which includes the
email tools), so a reloaded session renders cleanly. The live frontend path
uses a separate hardcoded EXEC_FENCE_RE in static/js/chatRenderer.js that only
listed web_search/read_file/write_file/create_document/edit_document/
update_document — so executed email tool fences (list_emails, etc.) lingered as
raw code blocks in the live assistant bubble until the user reloaded.
Add the nine email tool tags to EXEC_FENCE_RE so the live render settles into
the same clean layout as the history reload. bash/python stay excluded on
purpose: those are languages a user may legitimately have asked the model to
show as code, not tool invocations.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(chat): single-source live exec-fence tool list from TOOL_TAGS (#3993)
Per review: EXEC_FENCE_RE was a second, hand-maintained copy of the
executable-tool list, so any tool not in it — and every future tool added to
TOOL_TAGS — would leave its executed fence lingering in the live bubble until
reload (the original #3993 bug, recurring one tool at a time).
EXEC_FENCE_RE is now built from an explicit EXEC_TOOL_TAGS list that mirrors
TOOL_TAGS (src/agent_tools/__init__.py) minus bash/python, which stay excluded
as legitimate code-example languages. A new regression test
(test_exec_fence_re_covers_all_executable_tools) extracts both lists from
source and fails if they drift, so the whole class is caught in CI instead of
by a user — the "minimum acceptable middle ground" from the review, made exact
(set equality, not just coverage).
Verified: pytest tests/test_live_strip_email_tool_fences.py (5 passed);
node --check static/js/chatRenderer.js; and a node run of the built regex
confirms email/generate_image/manage_memory/ls fences strip while
bash/python/sh are preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(chat): build live exec-fence list from /api/tools at runtime (#3993)
Make TOOL_TAGS the single source for live exec-fence stripping. chatRenderer.js
no longer hard-codes a tool list; it fetches the backend's authoritative set
once from GET /api/tools (sorted(TOOL_TAGS)) and builds EXEC_FENCE_RE from it at
load, minus bash/python. No second list to drift, and a future tool added to
TOOL_TAGS is covered automatically — without touching the streaming path.
Until the fetch resolves EXEC_FENCE_RE is null and exec fences aren't stripped
(a sub-second window before the first stream); the backend already strips
persisted history, so a reload always renders clean.
Drop test_exec_fence_re_covers_all_executable_tools (no hand-maintained list to
guard) and add source-level guards: the frontend keeps no hard-coded list and
fetches /api/tools, and the endpoint serves the full sorted(TOOL_TAGS).
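The build step can be sketched as follows. This is a Python rendering of the idea, not the chatRenderer.js code: the function names and the exact fence pattern are assumptions, and the tag list stands in for the /api/tools response.

```python
import re

FENCE = "`" * 3                      # three backticks, built to keep this block fence-safe
NON_TOOL_LANGS = {"bash", "python"}  # legitimate code-example languages, never stripped


def build_exec_fence_re(tool_tags):
    """Build one regex matching a fenced block opened by any executable tool tag."""
    tags = sorted(set(tool_tags) - NON_TOOL_LANGS)
    if not tags:
        return None
    alternation = "|".join(re.escape(t) for t in tags)
    pattern = FENCE + "(?:" + alternation + ")" + r"[ \t]*\n.*?" + FENCE + r"\n?"
    return re.compile(pattern, re.DOTALL)


def strip_exec_fences(text, fence_re):
    """Strip executed tool fences; a None regex (list not loaded yet) strips nothing."""
    return text if fence_re is None else fence_re.sub("", text)
```

Passing None mirrors the window before the tool list has loaded: the text is returned untouched rather than raising.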
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CVCKth4g8pWh7pwFDVm4iL
* fix(chat): warn on /api/tools fetch failure instead of swallowing it (#3993)
A fresh-context review flagged that loadExecFenceRegex's catch silently
discarded errors: if the one-shot fetch fails, EXEC_FENCE_RE stays null for the
whole session and live exec fences go unstripped until reload, with zero signal.
console.warn it, and correct the comment to describe the failure mode honestly
(was understated as just a sub-second startup window).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CVCKth4g8pWh7pwFDVm4iL
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: Images cannot be seen by model that is vision capable
* fix: skip http(s) image_url for Ollama (images[] is base64-only)
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* fix(llm): detect mistral.ai provider and support reasoning_effort
Four coupled bugs broke Mistral thinking model support:
1. _detect_provider() had no mistral.ai host check, so all Mistral
endpoints fell through to the generic 'openai' provider string.
_provider_display_name() correctly identified them as 'Mistral',
making any 'if provider == "Mistral"' check elsewhere dead code.
2. reasoning_effort parameter was never sent in the request payload,
so Mistral never activated thinking mode even when the user
configured a thinking-capable model (mistral-small-latest,
mistral-medium-latest, magistral-*).
3. Mistral returns content as a typed array
([{"type":"thinking",...},{"type":"text",...}]) when
reasoning is on, not as a plain string. Both the streaming and
non-streaming parsers expected strings and silently dropped the
thinking content.
4. _THINKING_MODEL_PATTERNS didn't include magistral or mistral-*
model prefixes, so the frontend wouldn't tag reasoning output
as thinking even after the above were fixed.
Fix:
- Add mistral.ai to _detect_provider() host checks
- Add a _normalize_mistral_content() helper that splits the typed
array into (text, thinking) strings
- Inject payload["reasoning_effort"] = "high" when provider is
Mistral and _supports_thinking(model) is true, in both stream_llm
and llm_call_async payload construction
- Wire the normalizer into both response parsers
- Extend _THINKING_MODEL_PATTERNS to include magistral,
mistral-small, mistral-medium, mistral-large
Tested on Docker install with mistral-small-latest +
reasoning_effort=high. Reasoning streams correctly into the
thinking panel after the fix.
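A rough sketch of what the normalizer does, under assumptions: the inner field names ("text", "thinking") and the string-or-list shape of the thinking payload are inferred from the array format quoted above, and the function name here is illustrative rather than the project's.

```python
from typing import Any, Tuple


def normalize_mistral_content(content: Any) -> Tuple[str, str]:
    """Split message content into (text, thinking) strings."""
    if isinstance(content, str):
        return content, ""           # plain-string replies pass through unchanged
    if not isinstance(content, list):
        return "", ""                # None or any unexpected type
    text_parts, thinking_parts = [], []
    for block in content:
        if not isinstance(block, dict):
            continue                 # skip garbage entries
        kind = block.get("type")
        if kind == "text" and isinstance(block.get("text"), str):
            text_parts.append(block["text"])
        elif kind == "thinking":
            inner = block.get("thinking")
            if isinstance(inner, str):
                thinking_parts.append(inner)
            elif isinstance(inner, list):
                # Assumed nested form: a list of {"type": "text", "text": ...}
                for piece in inner:
                    if isinstance(piece, dict) and isinstance(piece.get("text"), str):
                        thinking_parts.append(piece["text"])
    return "".join(text_parts), "".join(thinking_parts)
```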
Fixes #4678
* fix(llm): address review — lowercase provider id, configurable effort, tests
Addresses vdmkenny's review on PR #4698:
1. Removed duplicate 'if provider == "mistral"' block in stream_llm
— two back-to-back copies, one was dead-redundant.
2. Dropped personal-context comment ('free-tier limits are generous
for this user') and made reasoning_effort configurable via env var
ODYSSEUS_MISTRAL_REASONING_EFFORT (high / medium / low / none).
Default remains 'high' for backward compat with the tested behavior.
3. Recased provider id from 'Mistral' to 'mistral' to match the
lowercase convention used by every other provider id in the file
(openai, anthropic, ollama, copilot, ...). _provider_display_name()
still returns the Title-Case 'Mistral' for UI labels — only the
runtime id used in 'if provider == ...' checks was recased.
4. Added tests/test_llm_core_mistral_content.py with 13 tests pinning
_normalize_mistral_content()'s contract: string passthrough, the
Mistral array format (thinking + text blocks), and edge cases
(empty, garbage, None, wrong types, missing fields, string-vs-array
inner thinking field).
Also fixed a gap the review didn't catch: the non-streaming paths
(llm_call sync + llm_call_async) were missing the reasoning_effort
injection entirely. Added the same injection to both, so Deep Research
and agent tool calls also activate Mistral thinking.
All 13 new tests pass. Existing reasoning/streaming/ollama-thinking
tests still pass (38 tests, no regressions).
Fixes #4678
* fix(memory): keep the Brain memory item menu above the modal at any stack depth
The memory item "⋮" dropdown is portaled to <body> with a hardcoded
z-index of 10001. Tool modals, however, get a monotonically increasing
z-index from modalManager's bring-to-front counter (_modalTopZ), which
climbs unbounded as modals are opened/restored over a session. Once that
counter passes 10001, the Brain modal stacks above the body-portaled
dropdown, so the menu renders behind the panel — visible only where it
spills past the modal's edge (#4720).
Derive the dropdown's z-index from the owning modal's current z-index
(+1), keeping 10001 as a floor for the common low-counter case, so the
menu always sits just above its modal however high the counter has climbed.
Verified with document.elementFromPoint at the dropdown's location: with a
high modal z-index the old build returns the modal at every sampled point
(menu behind); the fixed build returns the dropdown (menu on top). The
default low-counter case is unchanged (z stays 10001).
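The derivation reduces to a max with a floor. A small sketch of the arithmetic (Python here for illustration; the actual change is in the frontend, and the names are hypothetical):

```python
PORTAL_Z_FLOOR = 10001  # the previous hardcoded value, kept as a minimum


def dropdown_z(modal_z):
    """z-index for a body-portaled dropdown owned by a modal at modal_z."""
    if modal_z is None:
        return PORTAL_Z_FLOOR          # no owning modal found: old behavior
    return max(modal_z + 1, PORTAL_Z_FLOOR)
```

With a modal at 500 the result stays 10001; with a modal at 99999 it becomes 100000.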
* refactor(modal): route body-portaled dropdowns through a shared topPortalZ() helper
The hardcoded z-index:10001 the Brain memory menu used (#4720) is the same
literal shared by ~16 body-portaled dropdowns across calendar, cookbook,
cookbookServe, documentLibrary, emailLibrary, gallery, notes, emojiPicker and
memory — each renders behind its owning tool modal once modalManager's
bring-to-front counter climbs past the literal over a long session.
Promote the per-dropdown fix into a single topPortalZ() helper in
toolWindowZOrder.js — the existing source of truth for tool-window z, already
imported by modalManager's _bringToFront and notes.js — returning
max(topToolWindowZ(), dock-chip floor) + 1, so a portaled dropdown always sits
just above the live tool-window stack however high the counter has climbed.
Route all 16 sites through it. The slashCommands tour tooltips and the
cookbookServe VRAM dialog are intentionally left out (neither is a modal-owned
portaled dropdown).
Add tests/test_portal_dropdown_z_js.py covering the helper, including the #4720
scenario (modal counter at 99999 -> dropdown at 100000). Existing
test_notes_z_order_js.py stays green.
* fix(security): redact credential-bearing URLs and PII from logs
Several log statements emitted sensitive data in clear text:
- model_routes / chat_routes / contacts_routes logged endpoint URLs raw.
Admin-configured URLs can embed credentials in userinfo or query
(e.g. https://user:pass@host, ?api_key=...). Route them through a
shared core.log_safety.redact_url() that drops userinfo/query/fragment.
- note_routes / task_scheduler logged operator email addresses (smtp_user,
recipient). Replaced with presence booleans, which keeps the diagnostic
("why didn't this send") without writing PII to logs.
model_routes already had a local redactor on its HTTPStatusError branch;
the generic except branch was missed, so reuse the existing helper there.
Clears CodeQL py/clear-text-logging-sensitive-data alerts 264, 317, 324,
325, 343, 344, 528.
* fix(security): re-bracket IPv6 hosts and single-source the URL redactor
Address review on #4750:
- redact_url now re-brackets IPv6 literals so host:port stays
unambiguous (https://[2001:db8::1]:8443/v1, not the bracket-less
ambiguous form).
- point model_routes._redact_url_for_log at the shared helper so the
two redactors are single-sourced (also picks up the IPv6 fix).
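A minimal sketch of such a redactor using only urllib.parse. It is an illustration under assumptions, not the contents of core.log_safety: the fallback string for unparseable input is invented here.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return scheme://host[:port]/path with userinfo, query and fragment dropped."""
    try:
        parts = urlsplit(url)
        host = parts.hostname or ""
        port = parts.port              # raises ValueError on a malformed port
    except ValueError:
        return "<unparseable-url>"     # hypothetical fallback; never echo the raw value
    if ":" in host:
        host = f"[{host}]"             # hostname strips brackets from IPv6 literals
    netloc = f"{host}:{port}" if port is not None else host
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))
```

Rebuilding the netloc from hostname and port, rather than editing the original string, is what guarantees the userinfo cannot survive.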
* fix(security): escape backslashes in calendar bg-image CSS url()
The calendar event-background CSS escaped ' -> \' for a bg: image URL but
not backslashes first. Inside a single-quoted url('...'), \ is the CSS
escape char, so a URL value ending in/containing a backslash escapes the
closing quote and breaks out of the string, injecting arbitrary CSS. The
bg:<url> value is per-event and CalDAV-syncable, hence untrusted (CodeQL
js/incomplete-sanitization).
Add a single canonical _cssUrlEscape() in calendar/utils.js that escapes
backslashes FIRST, then quotes, and route all four sinks through it:
calendar.js:416 / :1263 (the flagged #463/#464), the event-form preview
(:2931), and _calBgCss() in utils.js — the latter two share the identical
bug but were unflagged. Output is byte-identical to the old escaping for
legitimate URLs (which contain no backslashes); only malicious input differs.
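Why the order matters can be shown in a few lines. This is a Python illustration of the escaping rule; the real helper is JavaScript in calendar/utils.js, and the function names below are stand-ins.

```python
def css_url_escape(value: str) -> str:
    """Escape a value for use inside a single-quoted CSS url('...')."""
    # Backslashes first: otherwise the backslash added for a quote would be
    # doubled, or an attacker's trailing backslash would eat our escape.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def bg_css(url: str) -> str:
    """Build the background-image value from an untrusted URL."""
    return f"url('{css_url_escape(url)}')"
```

For an input ending in a backslash followed by a quote, quote-only escaping leaves an escaped backslash and a bare quote, which closes the string early. Escaping backslashes first keeps both characters inert.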
Resolves CodeQL js/incomplete-sanitization #463, #464.
* fix(security): route remaining calendar bg url() sinks through _cssUrlEscape
Review (vdmkenny) flagged that the centralization missed an injectable
sibling sink: the edit-form color-picker swatch (calendar.js:2856) built
`url('${url}')` from `existing.color` (a CalDAV-syncable, untrusted `bg:`
value) raw, then interpolated it into `style="background:..."` via innerHTML
- the same `'`/`\` breakout class as the sinks already fixed. The custom-dot
preview (:2953) was likewise raw (non-exploitable - a CSSOM `.style`
assignment of a URL the current user just picked - but it broke the invariant).
Route both through `_cssUrlEscape`, and normalize the two pre-escaped-variable
sites (_calItemBgStyle, _renderWeek) to the same inline form so all five
url() interpolations in calendar.js follow one rule. Add a whole-file
invariant test asserting every `url('${...}')` calls `_cssUrlEscape` - this
catches a future missed sink, the exact failure mode here. Behavior-identical
for legitimate URLs (no visual change).
* fix: document read fails with 403 when auth is disabled
Add _auth_disabled() bypass in _verify_doc_owner() and the
/api/documents/{session_id} route guard so documents remain accessible
in single-user / no-auth mode.
Minimal change: only adds the auth-disabled check alongside existing
403 raises — preserves existing formatting and line endings.
* refactor: hoist _auth_disabled import to module level
Address reviewer feedback on PR #4623 — no circular import exists
(src.auth_helpers only imports stdlib + fastapi), so the inline
imports are unnecessary. Moves the import to module top in both
document_helpers.py and document_routes.py.
* test: add regression tests for auth-disabled document access (PR #4623)
Remote Cookbook hwfit probes failed on Windows hosts because the PowerShell
script was sent as nested -Command quoting through OpenSSH. Use
-EncodedCommand for remote probes, auto-detect platform when omitted
(including Darwin for Mac SSH hosts), and return a clearer error when SSH
works but the probe fails.
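For reference, PowerShell's -EncodedCommand expects the script as Base64 over UTF-16LE bytes. A minimal sketch of building such an argv (the function name is hypothetical, and the extra -NoProfile / -NonInteractive flags are a choice made here, not necessarily what the probe uses):

```python
import base64


def powershell_encoded_command(script: str) -> list:
    """Build a PowerShell argv that carries the script without shell quoting."""
    # -EncodedCommand takes Base64 of the UTF-16LE encoded script text.
    encoded = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
    return ["powershell", "-NoProfile", "-NonInteractive",
            "-EncodedCommand", encoded]
```

Because the Base64 alphabet contains no quotes, spaces, or shell metacharacters, the payload passes through the SSH command line without nested quoting.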
Co-authored-by: Cursor <cursoragent@cursor.com>
The app's ad-hoc dropdown/context menus each wire their own document-level
outside-click listener, but that listener only removes itself on an *outside*
click. Every other dismissal path -- clicking a menu item (which calls
el.remove() directly), a Cancel button, Escape, or the "close the
previously-open menu" reopen sweep -- tears the node down without
unregistering the listener, orphaning it on `document`. The stranded listener
then lingers and can break the next menu interaction: the recurring "the
button stops working until I refresh the page" class of bug (e.g. delete an
email, then the kebab/more button is dead on the other rows).
Route all 16 of these menus through the existing escMenuStack helper
(bindMenuDismiss / dismissOrRemove), exactly as documentLibrary.js
_showLibDropdown, cookbookRunning.js, and research/panel.js already do: a
single idempotent close() owns the teardown and is released on every dismissal
path, reopen sweeps use dismissOrRemove() instead of a bare .remove(), and
Escape flows through the central LIFO esc-stack arbiter. Net -49 lines.
Menus migrated: cookbook _showDepMenu; document export menu and
_openDocAiReplyChoice; emailInbox _showEmailMenu; emailLibrary
_showReaderMoreMenu / _showCardMenu / _showBulkActionsMenu; gallery
_showGalleryBulkMenu; notes _pickCustomDate / _openNoteCornerMenu; settings
(3 unified-integrations dropdowns); skills _openSkillMenu; tasks
_showTaskDropdown; compare _toggleExportMenu.
Per-menu semantics preserved (anchor-as-inside tests, the tasks 250ms
ghost-click guard, emailLibrary's reader-more-active anchor class and the
bulk-Cancel select-mode reset, settings' reused-vs-recreated lifecycles).
Six menus with custom lifecycles (notes _openReminderMenu, sessions
long-press, document markdown-toolbar, emojiPicker, compare model selector)
are intentionally left for a follow-up -- each needs individual review.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The native Windows launcher binds to 127.0.0.1 via its own -BindHost
parameter and does not read APP_BIND/ODYSSEUS_HOST from .env, so editing
.env alone leaves the server on loopback. Document the -BindHost flag in
the Native Windows setup section, with the existing keep-auth-on /
don't-expose-publicly caveats.
Fixes #4552
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Add a post-delete verification step: after the CardDAV server returns
2xx/404, force-re-fetch the contact list and confirm the UID is gone.
If the UID is still present, log a warning and return False instead of
silently reporting success.
This catches the case where _resolve_resource_url falls back to the
guessed {uid}.vcf URL but the contact's real resource URL differs —
the DELETE hits the wrong URL, server returns 404 (treated as success),
but the contact remains. Previously this caused silent persistence
failures and agent loops.
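The verification step, sketched with injected callables so it runs standalone. send_delete and fetch_uids are hypothetical stand-ins for the CardDAV DELETE request and the forced contact-list refresh.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("carddav_sketch")


def delete_contact(uid: str,
                   send_delete: Callable[[str], int],
                   fetch_uids: Callable[[], Iterable[str]]) -> bool:
    """Delete a contact and confirm it is really gone."""
    status = send_delete(uid)
    if not (200 <= status < 300 or status == 404):
        return False                       # server rejected the DELETE outright
    # A 404 may mean the DELETE hit a guessed URL, not the real resource,
    # so re-fetch the list and check before reporting success.
    if uid in set(fetch_uids()):
        logger.warning("contact %s still present after DELETE returned %s",
                       uid, status)
        return False
    return True
```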
Inline backtick spans were converted to <code> only at the end of
mdToHtml, after the bare-URL autolink and <a>/allowed-HTML passes. A URL
inside inline code is preceded by a space, so the autolink wrapped it in
an <a> tag and swapped it for an ___ALLOWED_HTML_ placeholder, corrupting
commands like `irm http://127.0.0.1:3000/x`.
Extract inline code into placeholders before the link passes, mirroring
the existing fenced-code-block handling, and restore them last so
placeholders carried inside restored <a> blocks resolve. Escape the code
at extraction time since it now bypasses the global escape pass.
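The ordering can be sketched in Python as a reduced model of the pipeline. This is not mdToHtml itself: the NUL-delimited placeholder and the simplified autolink regex are assumptions made for the example.

```python
import html
import re

INLINE_CODE_RE = re.compile(r"`([^`\n]+)`")
BARE_URL_RE = re.compile(r"\bhttps?://[^\s<]+")
PLACEHOLDER_RE = re.compile(r"\x00CODE(\d+)\x00")


def md_inline_to_html(text: str) -> str:
    """Convert inline code and bare URLs, protecting code from the autolinker."""
    codes = []

    def stash(match):
        # Escape here: stashed code never sees the global escape pass below.
        codes.append("<code>" + html.escape(match.group(1)) + "</code>")
        return f"\x00CODE{len(codes) - 1}\x00"

    text = INLINE_CODE_RE.sub(stash, text)     # 1. pull inline code out first
    text = html.escape(text)                   # 2. global escape pass
    text = BARE_URL_RE.sub(                    # 3. autolink what is left
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)
    # 4. restore code last, so URLs inside it were never seen by step 3
    return PLACEHOLDER_RE.sub(lambda m: codes[int(m.group(1))], text)
```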
The fallback regex in email_pollers.py that recovers a
[{"action": ...}, ...] JSON array from raw model output used lazy
[^[\]]*? runs inside a (?:,\s*\{...\}\s*)* repetition, which backtracks
exponentially (CodeQL py/redos) on inputs like [{"action"},{ + }},{{ * N.
It runs on the LLM reply to an email→calendar prompt embedding the
untrusted email body, so a crafted email can stall the background poller.
Extract the pattern to a module-level _CAL_ACTION_ARRAY_RE and rewrite the
object-content class from the lazy [^[\]]*? to a greedy brace-delimited
[^{}], which removes the quantifier ambiguity. The match is linear (a 500KB
adversarial input now resolves in <1ms) and equivalent on well-formed
arrays; it is also strictly more robust for values containing '[' or ']'
(the old class bailed on those and extracted nothing).
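A simplified stand-in for the rewritten pattern, to show the shape of the fix. It is not the project's _CAL_ACTION_ARRAY_RE (for one, it does not require the "action" key), and like any [^{}]-based class it will not match objects that contain nested braces.

```python
import json
import re

# Each object body is a greedy run of non-brace characters, so there is only
# one way to split the input between objects and nothing to backtrack over.
ACTION_ARRAY_RE = re.compile(r"\[\s*\{[^{}]*\}(?:\s*,\s*\{[^{}]*\})*\s*\]")


def extract_action_array(raw: str):
    """Recover the first flat JSON array of objects from raw model output."""
    match = ACTION_ARRAY_RE.search(raw)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except ValueError:
        return None                    # matched the shape but was not valid JSON
```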
Resolves CodeQL py/redos #198.
* feat(a11y): add a Text size control and an OpenDyslexic font option
Text size: a Theme > Font & Layout control (Default / Larger) that scales
the whole UI via CSS zoom, so the many hard-coded px sizes scale too
(density only moves the root font-size). Stored globally so it persists
across theme switches; applied early in the boot script to avoid a flash.
OpenDyslexic: a dyslexia-friendly self-hosted font (SIL OFL 1.1), bundled
as woff2 alongside Fira Code/Inter and wired into the Font select.
Reuses the existing density/font pattern end to end; no new colours,
spacing, or component styles.
* fix(a11y): keep modals on-screen at Larger text size
Inline vh heights on .modal-content overrode the ui-scale-125 max-height
compensation, so Cookbook (and the email/doc/skills/PDF modals) overflowed
the viewport at 125% — pushing the header and close button off-screen.
Let the compensation own those heights.
* fix(a11y): keep PDF export modal at its original 86vh on Default size
The DELETE /api/personal/file disk-delete containment check used the
shared PERSONAL_UPLOADS_DIR root, so one admin could delete another
user's personal upload by passing its path (uploads are partitioned per
owner under <root>/<owner>/). Confine the check to the caller's own
per-owner subdir via _personal_upload_dir_for_owner(owner). RAG removal
and listing exclusion are unchanged (they still serve non-upload indexed
sources). Adds a regression test for the cross-owner case.
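The tightened check, sketched with pathlib. The root path and function names are hypothetical, and the owner value is assumed to be a validated identifier rather than raw client input.

```python
from pathlib import Path

PERSONAL_UPLOADS_DIR = Path("/srv/app/personal_uploads")  # hypothetical root


def personal_upload_dir_for_owner(owner: str) -> Path:
    """Per-owner subdirectory; owner is assumed to be a validated id."""
    return PERSONAL_UPLOADS_DIR / owner


def may_delete_upload(requested: str, owner: str) -> bool:
    """Allow disk deletion only inside the caller's own upload directory."""
    owner_dir = personal_upload_dir_for_owner(owner).resolve()
    target = Path(requested).resolve()     # collapses ".." before comparing
    try:
        target.relative_to(owner_dir)
    except ValueError:
        return False                       # outside this owner's subtree
    return target != owner_dir             # never delete the directory itself
```

Checking against the shared root instead of owner_dir is exactly the original bug: another owner's file is still "inside the root".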
Moves create_session, list_sessions, send_to_session and manage_session out of
ai_interaction.py into src/agent_tools/session_tools.py (the do_ prefix
dropped) and registers them in TOOL_HANDLERS, so dispatch flows through the
registry instead of the dispatch_ai_tool elif in tool_execution.py. Same
pattern as the model-interaction move.
The bodies move verbatim; each fetches the runtime-set session manager via a
get_session_manager() shim, and reuses _resolve_model / AI_CHAT_TIMEOUT from
ai_interaction. manage_session's internal 'list' alias is repointed from the
old do_list_sessions to the moved list_sessions. stream_ai_tool (dead, no
callers) and do_pipeline stay put. dispatch_ai_tool loses its four now-unused
branches.
Tests: test_session_tools_registry covers registration, owner threading, the
manage_session->list_sessions delegation, graceful no-manager handling, and
registry dispatch. Verified end-to-end against a live SessionManager.
chatRenderer.js built the model-info popup HTML by concatenating the
model name (from the LLM response's model/answered_by field) into
popup.innerHTML without escaping, so a model advertised as an HTML/script
payload executed when the user clicked the role label. Wrap both
insertions with the uiModule.esc() helper the same function already uses.
Also apply existing escape helpers at two latent sinks flagged by CodeQL,
fed only by self-authored/server values today: document-tab title via
_esc(), and the calendar event background URL (escape the double quote
that would otherwise break out of the style="..." attribute).
Never imported anywhere in the codebase (unused since v1.0); it is the only
root dependency and nothing depends on it. Removing it also drops 6 transitive
packages from the lockfile.
Fixes #4565
Detached bash jobs (#!bg) could be launched and auto-reported on completion,
but the agent had no way to act on a running one: no on-demand output read and
no kill (it blocked until the 1h max-runtime). bg_jobs had the pieces
(_read_output, list_for_session, internal _kill) but none was exposed.
Adds:
- bg_jobs.kill(job_id): tears down the process tree, marks the job killed, and
sets followed_up so the monitor does not also auto-continue a deliberate kill.
- manage_bg_jobs registry tool with actions list / output / kill, scoped to the
chat that launched the job (cross-session access reads as not found).
- Wiring: TOOL_HANDLERS/TAGS, function schema, RAG index + keyword hints, parser
name map, dispatch (threads session_id via _direct_fallback). Gated like bash
(NON_ADMIN_BLOCKED_TOOLS; plan-mode mutator).
- agent_loop: background-job intent regex maps to the files domain (and the tool
joins _DOMAIN_TOOL_MAP[files]) so short commands like 'kill that job' are not
dropped by the low-signal gate that skips tool retrieval.
- bg launch message tells the model to call manage_bg_jobs itself for check/stop
rather than printing raw tool syntax to the user.
Tests: tests/test_bg_job_tools.py (kill semantics, per-chat scoping, actions,
and the intent classifier).
Three different background loops (_reconnectTask reachability poll,
_checkServeReachability, _pollBackgroundStatus) each independently
flipped Ollama sidecar tasks between running and stopped because the
`docker exec ollama-rocm ollama show <tag>` cmd exits cleanly after
its verification print, which the loops misread as the serve dying.
Added _isOllamaSidecarTask(task) and an early-bail in each of the
three loops so the task stays pinned to running once the show-cmd
exits 0. The tmux graceful-kill path now also prepends a
`docker exec ollama-rocm ollama stop <tag>` before tearing down
the tmux session, so the model is unloaded on the Ollama side too
(previously it stayed resident in the daemon after Stop).
Frontend half of the backend-detection + per-OS install command work,
plus a pile of mobile/UX fixes:
Backend awareness:
- _gpuEnvPrefix() picks CUDA_VISIBLE_DEVICES / HIP_VISIBLE_DEVICES /
nothing based on detected hwfit backend + scanned-host match (so a
stale ajax scan does not leak CUDA env vars into a kierkegaard
Vulkan launch). Replaces 6 hardcoded CUDA_VISIBLE_DEVICES sites.
- GGML_CUDA_ENABLE_UNIFIED_MEMORY only emitted when backend is
actually CUDA (was leaking onto Vulkan/ROCm via saved presets).
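The selection rule can be sketched as below. This is Python for illustration; the real _gpuEnvPrefix() is frontend code, and the parameter names are assumptions.

```python
def gpu_env_prefix(backend, gpu_ids, scanned_host, target_host):
    """Return the env-var prefix for a launch command, or '' when none applies."""
    # A scan taken on a different host says nothing about this target.
    if not gpu_ids or scanned_host != target_host:
        return ""
    ids = ",".join(str(i) for i in gpu_ids)
    if backend == "cuda":
        return f"CUDA_VISIBLE_DEVICES={ids} "
    if backend == "rocm":
        return f"HIP_VISIBLE_DEVICES={ids} "
    return ""                              # vulkan, metal, cpu: no device env var
```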
Per-target install command:
- Dep rows render a single mono command box + Copy button when the
server resolved pkg.install_cmd_for_target. Reused in the build-deps
install failure toast so the toast and the row show the same line.
- Diagnosis patterns split cmake/g++/git out of the generic
llama-cpp-python catch-all so a missing-cmake failure surfaces a
cmake-specific message + per-distro Copy buttons.
Form toggles always visible:
- Reasoning Parser, Expert Parallel, MoE Env Vars no longer gated on
model-family detection. Detection still hints (parser tag shown when
matched); toggle works with sensible defaults otherwise. MiniMax M-
series added to MoE family detector so the auto-fill is right.
Mobile + GPU default:
- Launch tab cached-list flex collapsed to 0px on mobile because the
desktop `flex: 1 1 0` had no parent height to grow into. Override
to `flex: 0 0 auto` in the cookbook mobile @media block.
- doclib-card expand on mobile (Firefox no :has() support) pins
explicit px heights so the launch form actually appears.
- llama_mode defaults to gpu when hwfit detected cuda/rocm/vulkan/
metal on the current target, instead of always cpu (which was
forcing -ngl 0 on first-open and burning 35GB models on CPU).
When a llama.cpp launch needs cmake/build-essential/git the user used to
get a four-distro dump ("apt: x / pacman: y / dnf: z / brew: w") and
had to pick the right one. Now:
- shell_routes /api/cookbook/packages probes /etc/os-release on the
target in the same SSH round-trip as the existing system-prereq
check, classifies into debian / arch / fedora / alpine / suse /
macos, and builds a single install_cmd_for_target string from the
(os_family, backend) matrix. CUDA hosts get nvidia-cuda-toolkit;
ROCm gets rocm-dev / rocm-hip-sdk; Vulkan gets libvulkan-dev /
vulkan-headers; etc.
- llama_cpp catalog entry gets system_prereqs: [cmake, g++, git].
When any of those are missing on the target, the row picks up
pkg.build_deps_missing + pkg.install_cmd_for_target for the
frontend to render.
- New POST /api/cookbook/install-system-deps endpoint runs the right
package manager via passwordless sudo on the target. Allowlisted to
{cmake, build-essential, g++, gcc, git, tmux, make}; sudo -n only
so it can never hang waiting for a password (returns a clear
"passwordless sudo unavailable" error via stderr instead).
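A sketch of the classification and the allowlisted command build, under assumptions: the ID-to-family table and the per-family install prefixes below are illustrative, cover only three families, and ignore the backend dimension and macOS (which has no /etc/os-release).

```python
import shlex

ALLOWED_PACKAGES = frozenset(
    {"cmake", "build-essential", "g++", "gcc", "git", "tmux", "make"})

_FAMILY_IDS = {                       # illustrative mapping, not exhaustive
    "debian": {"debian", "ubuntu"},
    "arch": {"arch"},
    "fedora": {"fedora", "rhel", "centos"},
    "alpine": {"alpine"},
    "suse": {"suse", "opensuse", "sles"},
}

_INSTALL_PREFIX = {                   # sudo -n fails fast instead of prompting
    "debian": "sudo -n apt-get install -y",
    "arch": "sudo -n pacman -S --noconfirm",
    "fedora": "sudo -n dnf install -y",
}


def classify_os_family(os_release_text: str):
    """Map /etc/os-release contents to a family name, or None if unknown."""
    fields = {}
    for line in os_release_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and not key.startswith("#"):
            fields[key] = value.strip().strip('"').strip("'")
    # ID first, then the ID_LIKE fallbacks in the order the distro lists them.
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for os_id in (i.lower() for i in ids):
        for family, known in _FAMILY_IDS.items():
            if os_id in known:
                return family
    return None


def install_cmd(family: str, packages) -> str:
    """Build one install line, rejecting anything outside the allowlist."""
    rejected = set(packages) - ALLOWED_PACKAGES
    if rejected or family not in _INSTALL_PREFIX:
        raise ValueError(f"refusing install: {sorted(rejected) or family}")
    quoted = " ".join(shlex.quote(p) for p in sorted(packages))
    return f"{_INSTALL_PREFIX[family]} {quoted}"
```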
Three classes of incorrect detection fixed:
(1) AMD GPU + no ROCm installed (e.g. Strix Halo) was reported as
backend=rocm everywhere, so launch commands emitted
HIP_VISIBLE_DEVICES (silent no-op on Vulkan) and the from-source
build path failed. Both _probe_amd_sysfs (routes/cookbook_routes)
and _detect_amd (services/hwfit/hardware) now probe rocminfo /
hipconfig / vulkaninfo at detection time and report vulkan when
only Vulkan is present.
(2) Build helper was picking the CUDA branch on AMD hosts whenever a
stray pip-installed nvcc was on PATH (vLLM wheels carry one
without libcudart). Added _odysseus_has_nvidia_hw() that checks
nvidia-smi / /dev/nvidia* / lspci, and gates both the nvcc PATH
augmentation and the CUDA elif branch on real hardware.
(3) Build chain reordered to ROCm/HIP > CUDA > Vulkan > CPU. Vulkan
tier added between CUDA and CPU as a portable fallback for hosts
with a GPU but no native toolchain (the common Strix Halo case).
Same _append_llama_cpp_linux_accel_build_lines also auto-attempts
sudo -n apt/pacman/dnf install of cmake/build-essential/git when
they are missing, surfacing a clear no-passwordless-sudo warning
otherwise.
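The resulting selection order, reduced to a pure function for illustration. The boolean inputs are assumed to come from the probes described above; the function name is hypothetical.

```python
def pick_build_backend(has_rocm: bool, has_nvcc: bool,
                       has_nvidia_hw: bool, has_vulkan: bool) -> str:
    """Choose the llama.cpp build tier: ROCm/HIP > CUDA > Vulkan > CPU."""
    if has_rocm:
        return "rocm"
    # nvcc alone is not enough: pip wheels can ship one without real hardware.
    if has_nvcc and has_nvidia_hw:
        return "cuda"
    if has_vulkan:
        return "vulkan"                # portable fallback when a GPU is present
    return "cpu"
```

An AMD host with a stray nvcc on PATH and only Vulkan installed now resolves to "vulkan" rather than the CUDA branch.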