The recipe dropdown was a static catalog (MiniMax / Any vLLM model). Now
it lists every model already downloaded on the active server (the same
_cachedModelIds set the Launch tab + dl-dots already drive), plus an
'Other (generic …)' fallback. The change handler uses pickRecipe(backend,
modelId) to find the best match — MiniMax ids land on the MiniMax recipe,
everything else falls back to the generic install.
cookbook-diagnosis.js: openCookbookDependencies's pre-select logic now
matches by full option value (model id) instead of label substring, since
the dropdown values are full repo ids now.
Before the quickrun (Run) button fires /api/model/serve, ask the deps
API whether the chosen backend (vllm / sglang / llama_cpp) is actually
installed on the target server. If not:
- Toast: '<backend> not installed on <host>. Opening Dependencies …'
- Route the user into the Dependencies tab via the existing
_openCookbookDependencies helper (now exported as
openCookbookDependencies)
- Auto-expand the recipe panel for that backend
- Pre-select the user's model in the panel's picker so the right
recipe is highlighted out of the box
The serve task is suppressed; the Run button is re-enabled. Once the
install task finishes in Running, the user clicks Run again.
cookbook-diagnosis.js: openCookbookDependencies takes an opts object
that, when expandRecipe is set, finds the row's caret and clicks it,
then matches a recipe label by model (currently only MiniMax has a
specific entry; the generic fallback stays selected otherwise).
Was rendering as a separate body block below the Copy/× toolbar.
Now the diagnosis message and the suggested-action text sit inline
on the left of the toolbar, with Copy and × pinned to the right —
reads as one self-contained header strip instead of stacked rows.
Surface a lot of accumulated cookbook + UI work as a single non-agent
commit so the agent rework lands cleanly.
Highlights:
- Ollama as a first-class backend in the Cookbook:
* Download input accepts ollama-style names (name:tag) → backend=ollama
* /api/cookbook/ollama/library (cached scrape of ollama.com + curated
fallback so classic models like qwen2.5 stay reachable)
* "Browse Ollama library" toggle below Download with size chips
* Engine=Ollama in hwfit toolbar merges the Ollama library into the
main scan list as per-tag rows with the same Fit/Param/Quant/VRAM
columns; click → fills Download input
- API Tokens form added to Integrations panel (matching wired
loadTokens()/initTokenForm() that had no HTML)
- Serve panel polish: Advanced fold tightening (-8px nudges on vLLM
checks, Extra args, Spec row), n_cpu_moe + Split Mode controls
pulled up 8px to align with the row's checkboxes, GGUF File dropdown
exposed for Ollama backend, GPU re-render on Edit serve restore,
_forceBackend flag so saved serveState wins over backend detection,
cookbook:servers-changed CustomEvent so panels don't need refresh
- Models page redesign: Add Models row (URL + hidden API key reveal +
Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All +
Clear-offline buttons in Added Models toolbar, offline-pill removed
(opacity already conveys state), Engine dropdown gains Ollama option
- _ping_endpoint probes /v1/models then base, accepts 4xx as
reachable (vLLM returns 404 on bare /v1, fully working endpoints
were showing offline)
- Diagnosis card: × dismiss + Copy bundle buttons restored on the
serve error feedback card
- Orphan tmux sweep re-enabled behind a 60s rate-limit + background
Thread (off the main event loop) so dead serves get discovered
- cookbook_routes auto-register watchdog: drops the endpoint if the
serve session exits non-zero within the first ~3min
- ollama-rocm sidecar awareness in download wrapper (`docker exec
ollama-rocm ollama pull` when host ollama isn't installed)
- Skill extractor sets initial_status="published" when
auto_approve_skills pref is on (audit demotes later)
- Skill list / model list / cookbook scan misc polish
This commit consolidates all Windows Cookbook background fixes into a single comprehensive commit based on the latest main branch.
Key fixes included:
1. React looksSuccessful Mismatch: Append 'DOWNLOAD_OK' for pip install commands in routes/cookbook_routes.py.
2. Local Windows SSH Wrapper & Log Directory Mismatch: Bypassed ssh wrappers and dynamically selected odysseus-tmux logs for local tasks in static/js/cookbookRunning.js.
3. WSL Bash Filtration: Filtered out the WSL bash stub at C:\Windows\System32\bash.exe in core/platform_compat.py.
4. Drive-Colon Path Normalization: Replaced .as_posix() with git_bash_path() in routes/shell_routes.py and src/bg_jobs.py.
5. GGUF-Only Hardware Fitting: Restructured local Windows recommendations to rank GGUF only in services/hwfit/fit.py.
6. Safe Win32 Process Liveness Probe: Replaced os.kill(pid, 0) with a safe Win32 API probe using GetExitCodeProcess in core/platform_compat.py.
7. Prebuilt llama-cpp-python Wheels: Supply the CPU extra index during compilation failure fallback.
8. Enforce UTF-8 log encoding: Set PYTHONIOENCODING=utf-8 on Windows bootstrap runners.
9. Fix Linux Llama.cpp Build script syntax error in routes/cookbook_helpers.py.
10. Page Reload Status Check: Run sys.executable instead of 'python3' to bypass Microsoft Store execution stubs on local Windows hosts.
11. Llama.cpp serve build bypass: Bypassed cmake compilation checks on local Windows and verified python bindings directly.
12. Serve Command Path Validation: Masked safe GGUF path printf subshells '' inside the serve command validator.
13. CPU Mismatch Diagnostics: Intercepted AVX2-lacking '0xc000001d' (Illegal Instruction) crashes in static/js/cookbook-diagnosis.js and guided users to Ollama.
14. Windows Pytest stability: Fixed stub import leakage in test files.
Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner:
* Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file.
* `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400.
* `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt).
Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-*` / `cookbook-*` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll.
`serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes.
Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop.
Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles.
`adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd.
Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).
Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
ascending then truncated top-N, which made "highest VRAM" mathematically
unreachable). Every column path uses reverse=True for the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.
Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
tooltip. Smart title-suffix strips the parts already in the repo name
(QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
(#1962) for model-add requests.
Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
"Download complete"). True overall % for multi-shard downloads:
((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
live shard line, the pill is hidden + relabels as "reconnect" + revives
on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
to 8s), scan persisted done/error/crashed downloads and probe their
tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
state is done but tmux is still alive revives the existing task and
refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
/api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
status got stuck. Dir text on the running row, font matched to uptime.
Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
matches body; action buttons get icons on the left (Retry/Copy/Edit/
Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
Backend (services/hwfit + routes):
- rank_models picks visible set by REQUESTED column, not always score —
sorting by Param now shows highest-param models PERIOD (incl. too_tight).
- New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang
cannot serve them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so
FP8 wins when both fit.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above
M2.5 on equal composite score; >=100B integers not misread as versions.
- /api/cookbook/hf-latest no longer drops models without an "NB" pattern in
the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered).
- Cached-model scan: atexit flushes models JSON even if the script is
killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s.
- KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New
"stalled" status for shells <1 MB with no .incomplete files.
- /api/cookbook/state POST guard: rejects "done" download tasks lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stops stale tabs from poisoning persisted state.
- hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization
Q4_K_M -> BF16 (it is the native base, not a quant).
Frontend (static/js):
- Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/
50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score
on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit
column header tagged with active budget (RAM / GPU / N GPU).
- Foldable Download admin-card: the Download h2 is the chevron trigger;
state persists in localStorage.
- Download card surfaces destination dir (Dir: <path>). Same dir on running
task row, font/color matched to uptime (9px Fira Code muted, opacity .4).
- Serve panel ctx text input always resets to model max on open. Sub-MB
cached models show with red "download stalled" badge.
- Bulk-select Cancel + Delete reset the Select button label on exit.
- Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/
required; bare "Download complete" no longer marks the task done after
the first config file. Clear button now sends tmux kill-session too.
True overall % for multi-shard downloads: ((N-1)+frac)/total instead of
hf_transfer per-shard aggregate.
- Diagnosis card simplified: removed fold toggle, copy button, dismiss X.
Suggestion font matches message body (12px).
- HF token field flashes green check + "Saved" on save.
- Cached scan no longer counts stalled rows as downloaded in Scan/Download.
CSS:
- dep Install button width pinned to 76px to match Installed split.
- task-sub row +1px; task-status badge gets margin-right 8px.
- Ctx slider styled like gallery editor sliders (thin pill rail, red thumb).
- Bulk-select cancel button top -3px -> -5px.