Three different background loops (_reconnectTask reachability poll,
_checkServeReachability, _pollBackgroundStatus) each independently
flipped Ollama sidecar tasks between running and stopped because the
`docker exec ollama-rocm ollama show <tag>` cmd exits cleanly after
its verification print, which the loops misread as the serve dying.
Added _isOllamaSidecarTask(task) and an early-bail in each of the
three loops so the task stays pinned to running once the show-cmd
exits 0. The tmux graceful-kill path also now runs
`docker exec ollama-rocm ollama stop <tag>` before tearing down
the tmux session, so the model is unloaded from the Ollama daemon too
(previously it stayed resident after Stop).
Frontend half of the backend-detection + per-OS install command work,
plus a pile of mobile/UX fixes:
Backend awareness:
- _gpuEnvPrefix() picks CUDA_VISIBLE_DEVICES / HIP_VISIBLE_DEVICES /
nothing based on detected hwfit backend + scanned-host match (so a
stale ajax scan does not leak CUDA env vars into a kierkegaard
Vulkan launch). Replaces 6 hardcoded CUDA_VISIBLE_DEVICES sites.
- GGML_CUDA_ENABLE_UNIFIED_MEMORY only emitted when backend is
actually CUDA (was leaking onto Vulkan/ROCm via saved presets).
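The selection rule in the first bullet can be sketched as follows. The real helper lives in the frontend; this Python version is only an illustration, and the function name, argument names, and the exact string form of the prefix are assumptions.

```python
from typing import Optional


def gpu_env_prefix(backend: Optional[str], scanned_host: Optional[str],
                   target_host: str, gpu_ids: str) -> str:
    """Return the env-var prefix for a launch command (hypothetical helper)."""
    # A scan taken on a different host says nothing about this target,
    # so emit no device-selection variable at all.
    if not backend or scanned_host != target_host:
        return ""
    if backend == "cuda":
        return f"CUDA_VISIBLE_DEVICES={gpu_ids} "
    if backend == "rocm":
        return f"HIP_VISIBLE_DEVICES={gpu_ids} "
    # Vulkan, Metal, CPU: neither variable applies.
    return ""
```

With this shape, a scan cached from ajax yields an empty prefix when the launch target is kierkegaard, whatever backend the stale scan reported.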
Per-target install command:
- Dep rows render a single mono command box + Copy button when the
server resolved pkg.install_cmd_for_target. Reused in the build-deps
install failure toast so the toast and the row show the same line.
- Diagnosis patterns split cmake/g++/git out of the generic
llama-cpp-python catch-all so a missing-cmake failure surfaces a
cmake-specific message + per-distro Copy buttons.
Form toggles always visible:
- Reasoning Parser, Expert Parallel, MoE Env Vars no longer gated on
model-family detection. Detection still hints (parser tag shown when
matched); toggle works with sensible defaults otherwise. MiniMax
M-series added to MoE family detector so the auto-fill is right.
Mobile + GPU default:
- Launch tab cached-list flex collapsed to 0px on mobile because the
desktop `flex: 1 1 0` had no parent height to grow into. Override
to `flex: 0 0 auto` in the cookbook mobile @media block.
- doclib-card expand on mobile (Firefox has no :has() support) now pins
explicit px heights so the launch form actually appears.
- llama_mode defaults to gpu when hwfit detected cuda/rocm/vulkan/
metal on the current target, instead of always cpu (which was
forcing -ngl 0 on first-open and burning 35GB models on CPU).
When a llama.cpp launch needs cmake/build-essential/git the user used to
get a four-distro dump ("apt: x / pacman: y / dnf: z / brew: w") and
had to pick the right one. Now:
- shell_routes /api/cookbook/packages probes /etc/os-release on the
target in the same SSH round-trip as the existing system-prereq
check, classifies into debian / arch / fedora / alpine / suse /
macos, and builds a single install_cmd_for_target string from the
(os_family, backend) matrix. CUDA hosts get nvidia-cuda-toolkit;
ROCm gets rocm-dev / rocm-hip-sdk; Vulkan gets libvulkan-dev /
vulkan-headers; etc.
- llama_cpp catalog entry gets system_prereqs: [cmake, g++, git].
When any of those are missing on the target, the row picks up
pkg.build_deps_missing + pkg.install_cmd_for_target for the
frontend to render.
- New POST /api/cookbook/install-system-deps endpoint runs the right
package manager via passwordless sudo on the target. Allowlisted to
{cmake, build-essential, g++, gcc, git, tmux, make}; sudo -n only
so it can never hang waiting for a password (returns a clear
"passwordless sudo unavailable" error via stderr instead).
Three classes of incorrect detection fixed:
(1) AMD GPU + no ROCm installed (e.g. Strix Halo) was reported as
backend=rocm everywhere, so launch commands emitted
HIP_VISIBLE_DEVICES (silent no-op on Vulkan) and the from-source
build path failed. Both _probe_amd_sysfs (routes/cookbook_routes)
and _detect_amd (services/hwfit/hardware) now probe rocminfo /
hipconfig / vulkaninfo at detection time and report vulkan when
only Vulkan is present.
(2) Build helper was picking the CUDA branch on AMD hosts whenever a
stray pip-installed nvcc was on PATH (vLLM wheels carry one
without libcudart). Added _odysseus_has_nvidia_hw() that checks
nvidia-smi / /dev/nvidia* / lspci, and gates both the nvcc PATH
augmentation and the CUDA elif branch on real hardware.
(3) Build chain reordered to ROCm/HIP > CUDA > Vulkan > CPU. Vulkan
tier added between CUDA and CPU as a portable fallback for hosts
with a GPU but no native toolchain (the common Strix Halo case).
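The three fixes above can be sketched as two small functions. This is an illustration, not the code in services/hwfit/hardware or the build helper: it assumes "probe" means checking whether the tool is on PATH, that a host with none of the tools falls back to "cpu", and that the chain is expressed as an ordered list. All names below are hypothetical.

```python
import shutil
from typing import Callable, List, Optional


def detect_amd_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Classify an AMD GPU host by the toolchain that is actually installed."""
    if which("rocminfo") or which("hipconfig"):
        return "rocm"
    if which("vulkaninfo"):
        return "vulkan"
    # Assumption: with neither toolchain present, treat the host as CPU-only.
    return "cpu"


def build_tiers(has_rocm: bool, has_nvidia_hw: bool, has_nvcc: bool,
                has_vulkan: bool) -> List[str]:
    """Return build attempts in priority order, always ending with CPU."""
    tiers = []
    if has_rocm:
        tiers.append("rocm")
    # nvcc alone is not enough: a pip-installed nvcc can exist on AMD hosts.
    if has_nvidia_hw and has_nvcc:
        tiers.append("cuda")
    if has_vulkan:
        tiers.append("vulkan")
    tiers.append("cpu")
    return tiers
```

Passing `which` as a parameter keeps the detection testable without touching the real PATH.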
The same _append_llama_cpp_linux_accel_build_lines helper also auto-attempts
a sudo -n apt/pacman/dnf install of cmake/build-essential/git when
they are missing, and surfaces a clear no-passwordless-sudo warning
when passwordless sudo is unavailable.
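The os-release classification and the non-interactive install step described in these notes can be sketched together. This is a minimal illustration under stated assumptions: the ID-to-family table, the function names, and the exact command strings are hypothetical, macOS is left out (it has no /etc/os-release), and package names are not translated per distro. Only the allowlist is taken from the notes above.

```python
import shlex
from typing import Dict, List, Optional

# Allowlist as stated in the install-system-deps notes.
ALLOWED = {"cmake", "build-essential", "g++", "gcc", "git", "tmux", "make"}

# Hypothetical mapping from os-release ID / ID_LIKE tokens to a family.
FAMILY_BY_ID = {
    "debian": "debian", "ubuntu": "debian",
    "arch": "arch",
    "fedora": "fedora", "rhel": "fedora",
    "alpine": "alpine",
    "suse": "suse", "opensuse": "suse",
}

# sudo -n fails immediately instead of prompting for a password.
INSTALL_PREFIX = {
    "debian": "sudo -n apt-get install -y",
    "arch": "sudo -n pacman -S --noconfirm",
    "fedora": "sudo -n dnf install -y",
    "alpine": "sudo -n apk add",
    "suse": "sudo -n zypper --non-interactive install",
}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, dropping comments and surrounding quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def classify_family(fields: Dict[str, str]) -> Optional[str]:
    """Try ID first, then each ID_LIKE token, and return the first known family."""
    tokens = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for token in tokens:
        family = FAMILY_BY_ID.get(token.lower())
        if family:
            return family
    return None


def install_command(family: str, packages: List[str]) -> str:
    """Build one install line, refusing anything outside the allowlist."""
    rejected = [p for p in packages if p not in ALLOWED]
    if rejected:
        raise ValueError(f"packages not allowlisted: {rejected}")
    prefix = INSTALL_PREFIX.get(family)
    if prefix is None:
        raise ValueError(f"no install command for family {family!r}")
    return prefix + " " + " ".join(shlex.quote(p) for p in packages)
```

Checking the allowlist before building the string means a rejected package never reaches the shell.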
Cookbook now needs to docker-exec into ollama-rocm (and any other sibling
container holding a model server) from inside its own container, so:
- Dockerfile installs the Docker CLI from the static binary tarball
(the Debian docker.io package ships dockerd but not the client on slim)
- docker-compose.yml bind-mounts /var/run/docker.sock and adds group_add
for the host docker group (default GID 963)
- entrypoint.sh detects the socket GID, creates a local group with that
GID, and runs usermod -aG before gosu-dropping to the app user so the
supplementary group propagates through (gosu strips by default)
build_models_url returns /models (no /v1 prefix) for non-local generic
OpenAI-compatible hosts (intentional, see endpoint_resolver.py:206). The
tests added in #4272 expected /v1/models, which is the local/deepseek
behavior. Match production semantics.
The two delete-ordering tests did monkeypatch.chdir(tmp_path) and wrote the
image under tmp_path/data/generated_images, but DATA_DIR (and therefore
gallery_routes.GALLERY_IMAGE_DIR) is always an absolute path, so the delete
resolver pointed at the repo's real data dir and ignored the chdir.
test_file_removed_on_successful_delete therefore failed on dev (the file at
the tmp path was never the one being removed), and test_file_kept_when_commit_fails
passed only by accident. Set GALLERY_IMAGE_DIR to the seeded tmp dir via
monkeypatch so both tests exercise the real path and pass deterministically.
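A minimal sketch of why the chdir had no effect and why patching the attribute does. The `gallery_routes` object here is a stand-in namespace, the `/srv/app/...` path is a placeholder, and `unittest.mock.patch.object` is used in place of pytest's monkeypatch only to keep the example self-contained.

```python
import os
import tempfile
from pathlib import Path
from types import SimpleNamespace
from unittest import mock

# Stand-in for the real module: the directory is an absolute path fixed up front.
gallery_routes = SimpleNamespace(
    GALLERY_IMAGE_DIR=Path("/srv/app/data/generated_images")
)


def resolve_image(name: str) -> Path:
    """Resolve an image the way a delete handler might (hypothetical)."""
    return gallery_routes.GALLERY_IMAGE_DIR / name


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp).resolve()
        seeded = tmp_path / "data" / "generated_images"
        seeded.mkdir(parents=True)
        old_cwd = os.getcwd()
        os.chdir(tmp_path)
        try:
            # An absolute path ignores the working directory entirely.
            after_chdir = resolve_image("a.png")
            # Patching the attribute is what redirects the resolver.
            with mock.patch.object(gallery_routes, "GALLERY_IMAGE_DIR", seeded):
                after_patch = resolve_image("a.png")
        finally:
            os.chdir(old_cwd)
        return after_chdir, after_patch, seeded
```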
* test(hwfit): assert the Apple matcher, not the general lookup, in the non-Apple guard
f7aa2de (#2564) added test_non_apple_gpu_with_cores_does_not_match, which
asserts _lookup_bandwidth(RTX 4090) is None. But '4090': 1008 has been in
the general GPU_BANDWIDTH table since v1.0, so _lookup_bandwidth correctly
returns the card's real bandwidth and the test fails (expected None, got
1008) - reddening the required pytest gate on dev and, by inheritance,
every open PR.
The guard's actual intent is that the Apple-specific bandwidth path does
not false-match a non-Apple card that carries a gpu_cores count. Point
the two asserts at _lookup_apple_bandwidth, which returns None for any
name without 'apple' regardless of the general table. The general-lookup
behavior (4090 -> 1008) is correct and untouched.
* fix(hwfit): route string GPU names through the Apple bandwidth helper
Second half of the #2564 regression (RaresKeY review on #4303). That
change moved the Apple tiers out of the generic GPU_BANDWIDTH table into
the dict-only _lookup_apple_bandwidth, but _lookup_bandwidth only called
that helper for dict inputs. A bare-string caller like
_lookup_bandwidth("Apple M3 Max") therefore fell through to the generic
table, found no Apple key, and returned None instead of the conservative
tier. Route both dict and string inputs through the Apple helper (a
string carries no gpu_cores, so it gets the model's lowest tier).
Regression added for the string path plus a non-Apple string control.
* fix: resolve Apple Silicon bandwidth variants
* fix(hwfit): preserve string lookup path in _lookup_bandwidth
* fix(hwfit): guard Apple bandwidth lookup against false GPU matches
Add an `"apple" not in gn` check to _lookup_apple_bandwidth() so that
non-Apple GPUs with "m3"/"m4"/"m5" in their names (e.g. NVIDIA
Quadro M4000) don't incorrectly match Apple bandwidth tiers.
Addresses @o3LL review comment on PR #2564.
#4159 (4b0a977) made build_models_url insert /v1 for path-less bases, so
the TestBuildersRejectLookalikeHosts model assertions that expected
/models started failing and turned the pytest gate red on dev.
Both the generic OpenAI branch and the real Anthropic branch now end in
/v1/models, so a URL-only assertion no longer proves a lookalike host
dodged the Anthropic/Ollama branch. Assert _detect_provider == "openai"
directly and keep the /v1/models expectation.
- Agent: pass the open email reader (uid/folder/account/from/subject/body
preview) on every chat submit so 'reply to this' / 'write email saying
hi' route to ui_control open_email_reply with the right UID instead of
inventing a new .md draft. Code-level enforcement (chat_routes strips
create_document + send_email when active_email is set); cross-session
active_doc_id is now trusted instead of being silently dropped.
set_active_email/clear_active_email tool-layer helpers in
tool_implementations.
- ui_control open_email_reply: optional body argument so the agent can
open-and-write in one call; envelope now forwards uid/folder/account/
body/panel through tool_output. Tool description sharpened and the
parser rejects empty bodies on reply/reply-all (forces the agent to
write rather than open an empty draft).
- Email library: search now runs against [Gmail]/All Mail when the
current folder is INBOX (archived emails surface). Whirlpool spinner
+ 'Searching…' placeholder while in flight. Each search result is
stamped with its source folder so clicks open the right email instead
of whatever shares its UID in INBOX. Search no longer re-applies the
same text pill locally (which only checks subject/from/snippet, never
body) so body-only matches don't get dropped after IMAP returns them.
Initial inbox load bumped 100→500.
- Email favorites: 'Favorite (pin to top)' / 'Unfavorite' in both the
card menu and the open-reader more menu, backed by a new
/api/email/flag/{uid}?on=true|false endpoint. Flagged emails always
bubble to the top of the grid regardless of active sort.
- AI reply in doc editor: never overwrites existing draft text or the
quoted history. AI suggestion is prepended; AI-generated 'On …
wrote:' re-quotes are stripped so the original quote isn't visually
edited.
- Cookbook serve: pre-launch GPU driver / has_gpu / install / version-
floor checks (vllm minimax_m2 needs 0.10.0+, deepseek_r1 needs 0.7.0
etc.) before the launch chain starts. Detect 'another model already
running on this host' and offer Stop & launch (with graceful then
force tmux kill helpers, port release wait). Per-vendor deep-link
buttons (vLLM recipe / SGLang cookbook) with hardware hash. Backend
picker is now a custom dropdown with accent-coloured logos for vLLM,
SGLang, llama.cpp, Ollama, Diffusers; same glyphs added next to
package names in Dependencies. Runtime-readiness note moved inside
the panel (green when ready, red when missing) with an × dismiss.
Esc collapses the expanded card; expanded card scrolls when it
overflows; Trust Remote / Auto Tool / Reasoning Parser / Enforce
Eager / Prefix Caching / Expert Parallel / Speculative / MoE Env on
one row (Reasoning Parser auto-detected per model family).
Dtype→Row 1, GPUs→Row 2 (rightmost). Removed redundant GPU 'auto'
input — command builders read from the GPU button strip. Default
cookbook open is Download tab.
- Cookbook hwfit: 'Model (latest)' / 'Model (oldest)' header sorts by
release_date; release dates can be backfilled with the new
scripts/backfill_model_release_dates.py and recipe metadata pulled
with scripts/import_from_vllm_recipes.py against the upstream
vllm-project/recipes catalog (vllm_recipe + min_vllm_version stamped
on entries).
- Calendar: Quick add hint cycles a random Odysseus-themed example per
open (wooden horse Friday, crew muster 10am daily, council on
Ithaca, …). Typing a time like '11pm' in the event title updates
the hero clock live.
- Doc editor: email-mode Reply button (sparkle icon, accent) opens the
same Fast/Full + context popover the email reader uses; Ctrl+Alt+M
toggles markdown preview.
- Memories panel: custom sort picker with per-option icons, default
'Latest', visible Enabled/Disabled toggle text matching the section
description style.
* Agent: make skill-prescribed tools actually callable
The skill index and matched-skill procedures are injected into the
prompt, but tool selection never followed: manage_skills wasn't in the
RAG-selected schema list (so the model substituted manage_memory), and
a matched skill could prescribe tools (grep, read_file) the model had
no schema for. Now:
- manage_skills rides along whenever the owner has any skills indexed
- a Jaccard-matched skill's requires_toolsets join the selection
- viewing a skill mid-turn via manage_skills unlocks its
requires_toolsets for subsequent rounds
- admin-intent turns send _ADMIN_TOOLS schemas, matching the prompt
text _build_base_prompt already advertises
- index_for(active_toolsets=None) no longer hides requires_toolsets
skills from callers that don't know the active set
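The first two rules in the list above can be sketched as a small selection function. This is only an illustration: whitespace tokenization, the 0.2 threshold, the `description` field, and the function names are assumptions; `manage_skills` and `requires_toolsets` are the names used in the notes.

```python
from typing import Dict, Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_tools(query: str, rag_tools: Iterable[str], skills: List[Dict],
                 threshold: float = 0.2) -> List[str]:
    """Extend the RAG-selected tools with skill-related ones, preserving order."""
    selected = list(dict.fromkeys(rag_tools))
    # manage_skills rides along whenever any skill is indexed.
    if skills and "manage_skills" not in selected:
        selected.append("manage_skills")
    q_tokens = set(query.lower().split())
    for skill in skills:
        s_tokens = set(skill["description"].lower().split())
        if jaccard(q_tokens, s_tokens) >= threshold:
            # A matched skill contributes the tools it prescribes.
            for tool in skill.get("requires_toolsets", []):
                if tool not in selected:
                    selected.append(tool)
    return selected
```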
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Agent: validate skill requires_toolsets against known tools, not TOOL_SECTIONS
grep/glob/ls ship as function schemas without a prompt-prose section,
so gating on TOOL_SECTIONS silently dropped them from a skill's
requires_toolsets.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
* fix(research): preserve Discuss spin-off primer during context trimming
trim_for_context() kept only system_msgs[:1] as essential and dropped the
rest under budget pressure. A research "Discuss" spin-off seeds the report
as a system message that sits after the preface system messages, so it
landed in extra_system and was the first thing evicted once the chat grew
— the conversation then lost its grounding and drifted off task.
Treat any system message carrying research_spinoff_from metadata as
essential, alongside the leading system prompt, so the seeded report
survives trimming. maybe_compact already retains all system messages.
Tests: tests/test_context_compactor.py::TestResearchPrimerPreserved
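The essential-message rule can be sketched as follows. This is not the project's trim_for_context: messages are plain dicts, cost is approximated by character count rather than tokens, and the eviction order (extra system messages first, then oldest chat turns) is an assumption based on the description above.

```python
from typing import Dict, List


def is_essential_system(index: int, msg: Dict) -> bool:
    """The leading system prompt and any research primer must survive trimming."""
    if index == 0:
        return True
    return bool((msg.get("metadata") or {}).get("research_spinoff_from"))


def trim_for_context(messages: List[Dict], budget: int) -> List[Dict]:
    system = [m for m in messages if m["role"] == "system"]
    essential = {id(m) for i, m in enumerate(system) if is_essential_system(i, m)}
    # Eviction candidates: non-essential system messages, then chat turns oldest first.
    candidates = [m for m in messages
                  if m["role"] == "system" and id(m) not in essential]
    candidates += [m for m in messages if m["role"] != "system"]
    dropped = set()
    total = sum(len(m["content"]) for m in messages)
    for m in candidates:
        if total <= budget:
            break
        dropped.add(id(m))
        total -= len(m["content"])
    # Filter by identity so equal-looking messages are not confused.
    return [m for m in messages if id(m) not in dropped]
```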
* fix(research): ground Discuss spin-off chats on the seeded report
build_chat_context injected global memory (pinned + hybrid-retrieved) and
personal-doc RAG every turn, keyed off the user-level memory_enabled pref
and a request-scoped use_rag flag — never the session. A research spin-off,
whose primer declares the report the sole knowledge base, thus had
unrelated keyword-matched facts pulled in ("wrong data") competing with the
report; its rag=False flag was also ignored (use_rag defaulted on).
Add _session_is_research_spinoff(sess), which detects the primer's
research_spinoff_from metadata (handling both ChatMessage and dict forms),
and for such sessions disable memory injection and force RAG off.
Tests: tests/test_chat_helpers.py spin-off detection cases
---------
Co-authored-by: Dan (cirim) <claude@cirim.org>
Clicking the card body outside the edit <textarea> bubbled to the card's
click handler and collapsed the card, silently discarding unsaved skill
edits (issue #4002). The textarea's own stopPropagation only shields
clicks landing on it. Bail out of the card click handler while a
.skill-md-editor is present so the card only leaves edit mode via Save
(Cancel button is handled separately by #3580). Mirrors the same guard
into the built-in capability card, which shared the bug.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The scripts/odysseus-backup snapshot/restore CLI was undocumented in
README.md and docs/. Add docs/backup-restore.md covering the snapshot,
list, verify, and restore subcommands, default include/skip behavior
(deep_research and mail-attachments skipped unless flagged), the
destructive-restore warning and its data.before-restore-* stash, a cron
example, and Docker-vs-native data/ paths (including the ChromaDB named
volume caveat). Link it from the README Data section.
Addresses the "Backup/restore guide and helper flow for data/" item in
ROADMAP.md. Docs only; no change to the tool.
Fixes #2583
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(auth): add per-user admin promote/demote toggle
Admin-only API and Users-tab control to grant/revoke admin rights; refuses to demote the last admin.
* fix(auth): restore pre-admin privilege restrictions on demotion
Promoting now stashes the user's privilege map (privileges_before_admin)
and demoting restores it instead of resetting to defaults, so a
promote/demote round trip can no longer broaden a restricted user's
access. Users without a stash (created as admin, or promoted before this
fix) still demote to DEFAULT_PRIVILEGES so a born-admin's stored all-True
map — including can_use_bash — can't survive demotion.
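The promote/demote round trip can be sketched as below. Users are plain dicts, the contents of DEFAULT_PRIVILEGES are placeholders, and setting every privilege to True on promotion is an assumption made so the restore step is visible; the notes do not say what promotion does to the stored map.

```python
import copy
from typing import Dict, List

# Placeholder defaults; the real DEFAULT_PRIVILEGES map is not shown in the notes.
DEFAULT_PRIVILEGES = {"can_use_bash": False, "can_use_email": True}


def promote(user: Dict) -> None:
    if user.get("is_admin"):
        return
    # Stash the current restrictions so demotion can restore them exactly.
    user["privileges_before_admin"] = copy.deepcopy(user["privileges"])
    user["is_admin"] = True
    # Assumption: an admin's stored map is all-True.
    user["privileges"] = {key: True for key in user["privileges"]}


def demote(user: Dict, all_users: List[Dict]) -> None:
    admins = [u for u in all_users if u.get("is_admin")]
    if user.get("is_admin") and len(admins) <= 1:
        raise ValueError("refusing to demote the last admin")
    user["is_admin"] = False
    stash = user.pop("privileges_before_admin", None)
    # No stash (born admin, or promoted before the fix): fall back to defaults.
    user["privileges"] = (copy.deepcopy(stash) if stash is not None
                          else dict(DEFAULT_PRIVILEGES))
```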
---------
Co-authored-by: K M Merajul Arefin <merajul.arefin@therapservices.net>