Commit Graph

61 Commits

Author SHA1 Message Date
pewdiepie-archdaemon 2cbd55b8bd Open email context for agent, email search across All Mail, cookbook serve polish
- Agent: pass the open email reader (uid/folder/account/from/subject/body
  preview) on every chat submit so 'reply to this' / 'write email saying
  hi' route to ui_control open_email_reply with the right UID instead of
  inventing a new .md draft. Code-level enforcement (chat_routes strips
  create_document + send_email when active_email is set); cross-session
  active_doc_id is now trusted instead of being silently dropped.
  set_active_email/clear_active_email tool-layer helpers in
  tool_implementations.

- ui_control open_email_reply: optional body argument so the agent can
  open-and-write in one call; envelope now forwards uid/folder/account/
  body/panel through tool_output. Tool description sharpened and the
  parser rejects empty bodies on reply/reply-all (forces the agent to
  write rather than open an empty draft).

- Email library: search now runs against [Gmail]/All Mail when the
  current folder is INBOX (archived emails surface). Whirlpool spinner
  + 'Searching…' placeholder while in flight. Each search result is
  stamped with its source folder so clicks open the right email instead
  of whatever shares its UID in INBOX. Search no longer re-applies the
  same text pill locally (which only checks subject/from/snippet, never
  body) so body-only matches don't get dropped after IMAP returns them.
  Initial inbox load bumped 100→500.

- Email favorites: 'Favorite (pin to top)' / 'Unfavorite' in both the
  card menu and the open-reader more menu, backed by a new
  /api/email/flag/{uid}?on=true|false endpoint. Flagged emails always
  bubble to the top of the grid regardless of active sort.

- AI reply in doc editor: never overwrites existing draft text or the
  quoted history. AI suggestion is prepended; AI-generated 'On …
  wrote:' re-quotes are stripped so the original quote isn't visually
  edited.

- Cookbook serve: pre-launch GPU driver / has_gpu / install / version-
  floor checks (vllm minimax_m2 needs 0.10.0+, deepseek_r1 needs 0.7.0
  etc.) before the launch chain starts. Detect 'another model already
  running on this host' and offer Stop & launch (with graceful then
  force tmux kill helpers, port release wait). Per-vendor deep-link
  buttons (vLLM recipe / SGLang cookbook) with hardware hash. Backend
  picker is now a custom dropdown with accent-coloured logos for vLLM,
  SGLang, llama.cpp, Ollama, Diffusers; same glyphs added next to
  package names in Dependencies. Runtime-readiness note moved inside
  the panel (green when ready, red when missing) with an × dismiss.
  Esc collapses the expanded card; expanded card scrolls when it
  overflows; Trust Remote / Auto Tool / Reasoning Parser / Enforce
  Eager / Prefix Caching / Expert Parallel / Speculative / MoE Env on
  one row (Reasoning Parser auto-detected per model family).
  Dtype→Row 1, GPUs→Row 2 (rightmost). Removed redundant GPU 'auto'
  input — command builders read from the GPU button strip. Default
  cookbook open is Download tab.

- Cookbook hwfit: 'Model (latest)' / 'Model (oldest)' header sorts by
  release_date; release dates can be backfilled with the new
  scripts/backfill_model_release_dates.py and recipe metadata pulled
  with scripts/import_from_vllm_recipes.py against the upstream
  vllm-project/recipes catalog (vllm_recipe + min_vllm_version stamped
  on entries).

- Calendar: Quick add hint cycles a random Odysseus-themed example per
  open (wooden horse Friday, crew muster 10am daily, council on
  Ithaca, …). Typing a time like '11pm' in the event title updates
  the hero clock live.

- Doc editor: email-mode Reply button (sparkle icon, accent) opens the
  same Fast/Full + context popover the email reader uses; Ctrl+Alt+M
  toggles markdown preview.

- Memories panel: custom sort picker with per-option icons, default
  'Latest', visible Enabled/Disabled toggle text matching the section
  description style.
2026-06-15 20:47:51 +09:00
pewdiepie-archdaemon 768bcb565a Cookbook/Dependencies: variant toggle now uses the agent/chat mode-toggle; Copy inside code
- Drop the 'Install via' label and the pill-tag variant buttons. The
  toggle is now the same sliding-pill mode-toggle used by the
  Agent/Chat selector in the chat input. Pip/uv on the left, Docker on
  the right, default = Pip/uv. CSS: extended .mode-chat::before's
  translateX(100%) rule to also fire on .mode-right so non-chat
  callers can use the same animation without claiming the chat-only
  class name.
- Copy button moves inside the <pre>: absolute-positioned at the
  top-right corner, icon-only, padding-right on the pre makes room.
  Matches the Setup-token copy pattern in the integrations form.
2026-06-14 22:54:38 +09:00
pewdiepie-archdaemon 63b4ad2e9c Cookbook/Dependencies: Pip/uv vs Docker variant toggle on recipe panel
Each recipe catalog entry now carries two variants:
  variants.pip    → uv pip install …
  variants.docker → docker pull <image>

A small 'Install via' pill row in the panel toggles between them
(default = Pip/uv per the user's preference). Switching variant or
changing the model re-renders the <pre> via _refreshRecipePre(); the
display text drops the 'source venv/bin/activate' prefix for Docker
since docker pull doesn't need a venv. Run honours the active variant
so picking Docker queues 'docker pull …' as the tmux task.
2026-06-14 22:47:26 +09:00
pewdiepie-archdaemon d70eb99a0d Cookbook/Dependencies recipes: install into configured venv, drop 'uv venv'
Recipes now hold ONLY the install command(s). The rendered <pre>
prepends a 'source <envPath>/bin/activate' line so the user sees a
paste-ready sequence; Run uses env_prefix (same path the Install
button uses) to activate the configured venv before the install
command, so the install lands in the existing environment rather
than a fresh .venv in whatever CWD the tmux task happens to start in.

- cookbook-deps-recipes.js: trim each recipe to its single pip command
- cookbook.js: _recipeDisplayText() prepends the activate context for
  display; pre's data-dep-recipe-install holds the raw install-only
  command list so Run knows what to send; Run builds env_prefix the
  same way _installDep does.
2026-06-14 22:45:12 +09:00
pewdiepie-archdaemon d44de3af43 Cookbook/Dependencies: populate recipe model picker from downloaded models
The recipe dropdown was a static catalog (MiniMax / Any vLLM model). Now
it lists every model already downloaded on the active server (the same
_cachedModelIds set the Launch tab + dl-dots already drive), plus an
'Other (generic …)' fallback. The change handler uses pickRecipe(backend,
modelId) to find the best match — MiniMax ids land on the MiniMax recipe,
everything else falls back to the generic install.

cookbook-diagnosis.js: openCookbookDependencies's pre-select logic now
matches by full option value (model id) instead of label substring, since
the dropdown values are full repo ids now.
2026-06-14 22:40:52 +09:00
pewdiepie-archdaemon 600fa6be8a Cookbook/Dependencies: per-backend recipe panel (vllm/sglang/llama_cpp)
Each row for vllm, sglang, llama_cpp now carries an expand caret that
opens an inline recipe panel below the row. The panel has:
  - 'Serving which model?' select populated from a new tiny catalog
  - <pre> code block showing the exact shell sequence for that pair
  - Copy: clipboard the commands
  - Run: launch the joined 'cmd1 && cmd2 && …' as a tmux task on the
    currently-selected deps server (same plumbing as Install)

New file: src/static/js/cookbook-deps-recipes.js — single source of
truth for the recipes. Seeded with MiniMax M2/M2.7 + a generic fallback
for each backend (all three use 'uv venv → source .venv/bin/activate
→ uv pip install ... --torch-backend auto', the recipe the user
pasted). Adding model-specific recipes is now a one-entry edit.

Next commit: Launch-tab pre-flight that intercepts the serve click
when the backend isn't installed and deep-links into this panel.
2026-06-14 22:33:49 +09:00
pewdiepie-archdaemon 781a3ee829 Cookbook: rename 'Run' tab → 'Launch' (cookbook.js:1865) 2026-06-14 22:23:38 +09:00
pewdiepie-archdaemon 4074e77d93 Cookbook: auto-set KV cache to fp8 for DeepSeek V3/V4/R1 MoE families
These models OOM on --kv-cache-dtype auto (≈bf16) at any usable
context with current tensor-parallel layouts. _detectModelOptimizations
now seeds opts.kvCacheDtype='fp8' for them, and the serve panel's KV
Cache select picks that up as the default unless the user has a
saved override on this skill.
2026-06-14 08:57:29 +09:00
pewdiepie-archdaemon d3944be1be Cookbook: detect DeepSeek V4+ as MoE so Expert Parallel + Spec show
The DeepSeek branch in _detectModelOptimizations matched only V3
and R1 literally. DeepSeek-V4-Flash (and future Vx / Rx) didn't
hit any branch, so the Expert Parallel checkbox + Speculative
defaults never surfaced in the Run panel. Widened to a regex that
catches v3/v3.1/v4/v5/v10+ and r1/r2/… for both the expert-parallel
flag and the MTP speculative defaults.
2026-06-14 08:51:57 +09:00
pewdiepie-archdaemon 1d7d9c5e9c Cookbook deps: drop the manual vLLM install block + Run handlers 2026-06-14 08:49:20 +09:00
pewdiepie-archdaemon adac89c8e2 Cookbook deps: NVIDIA vs AMD ROCm-aware vLLM install commands
Reads the last hwfit scan's backend (window._hwfitSystemCache.backend)
and picks the right vLLM install path per vendor:

- NVIDIA/CUDA (default)
  - uv:     uv pip install -U vllm --torch-backend auto
  - docker: docker pull vllm/vllm-openai:latest
- AMD/ROCm
  - uv:     uv pip install -U vllm --torch-backend rocm
  - docker: docker pull rocm/vllm-dev:main

The <pre> previews are re-painted on render to match what Run will
actually launch, and the confirm dialog tags the backend so the user
knows what they're committing to.
2026-06-14 08:46:58 +09:00
pewdiepie-archdaemon 65a2e51af8 Cookbook deps: convert manual install snippets to Run buttons
Was just a copy-paste reference. Each row now has a Run button that
launches the command as a tmux task on the currently-selected deps
server (same path Reinstall already uses) — Odysseus does the work,
the user watches progress in the Active tab. Dropped the plain
pip option since the existing per-package Install button already
covers it; kept uv (recommended) and Docker pull as the two
alternatives.
2026-06-14 08:45:43 +09:00
pewdiepie-archdaemon 04a97adbb3 Cookbook: Extra args under Reasoning/Spec + manual vLLM install hints in Dependencies
- Moved "Extra args" out from above the vLLM advanced checks
  (Reasoning Parser, Speculative, MoE Env) to AFTER them, so it
  reads as "after the advanced toggles, anything else".
- Added a collapsed "Manual install (vLLM)" details block to the
  Dependencies tab description with three copy-paste recipes:
  uv venv + uv pip (recommended), plain pip, and docker pull
  vllm/vllm-openai:latest. Useful when the in-app Install button
  can't run (offline target, custom torch backend, etc).
2026-06-14 08:43:10 +09:00
pewdiepie-archdaemon 654f9f82c7 Cookbook: don't auto-fold Direct Download from inside its own body
The capture-phase scroll listener was firing for scrolls anywhere
in the modal — including the Trending models list, which lives
inside the Direct Download fold body. Scrolling that list was
auto-folding the section that contains it.

Bail early if the scroll target is the fold body or a descendant —
the section only folds on scrolls in sibling scrollers (.cookbook-body,
.hwfit-list, .modal-content).
2026-06-13 22:26:04 +09:00
pewdiepie-archdaemon ac4de93928 Cookbook: rename Serve tab → Run (label only, data-backend stays Serve) 2026-06-13 21:55:24 +09:00
pewdiepie-archdaemon 44a60c1261 Cookbook toolbar: move Search next to Standard, Engine/Quant/Context to right
New order: [Standard ▾] [Search ............] [Engine] [Quant] [Context]
so the two primary picks (type + free text) sit together at the
left, with the more advanced filters lined up to the right.
2026-06-13 21:30:05 +09:00
pewdiepie-archdaemon f09f606bec Cookbook fold: smooth max-height + opacity transition
display:none toggle was instant and felt jarring during auto-fold/
auto-expand. Swapped to a CSS class `.is-folded` that transitions
max-height (0 ↔ 1200px) and opacity (0 ↔ 1) over ~280ms with ease,
so both manual chevron clicks and the scroll-driven toggles slide
in/out smoothly.
2026-06-13 20:14:34 +09:00
pewdiepie-archdaemon e6349c016e Cookbook auto-fold: auto-expand when scrolling back to top
scroll handler now tracks per-target scrollTop via WeakMap. Downward
scroll on any scroller in the cookbook modal folds Direct Download;
scrolling back to top (scrollTop <= 0) unfolds it. Manual chevron
clicks still win — they persist to localStorage; auto-toggles
don't, so the user's last explicit pick survives reload.
2026-06-13 20:12:30 +09:00
pewdiepie-archdaemon e630605aef Cookbook auto-fold: capture-phase scroll listener catches hwfit-list
IntersectionObserver missed the case because scrolling inside the
nested .hwfit-list (max-height:52vh own scroller) doesn't move the
header out of view at all. The user wants any downward scroll in
the scan/download area to fold Direct Download.

Switched to a capture-phase scroll listener on #cookbook-modal that
catches every scroll event from any nested scroller (.hwfit-list,
.cookbook-body, .modal-content). Folds only on downward scrolls so
scrolling back up doesn't keep re-folding.
2026-06-13 20:10:46 +09:00
pewdiepie-archdaemon 74e563dabc Cookbook auto-fold: use IntersectionObserver to catch any scroll source
The scroll listener on .cookbook-body never fired — the user is
likely scrolling inside the nested .hwfit-list (max-height:52vh)
which doesn't bubble to its parent. IntersectionObserver fires
whenever the Direct Download header crosses the viewport edge
regardless of which container moved.

Folds only when boundingClientRect.top < 0 (header pushed up past
the top) so modal close / detach doesn't trigger it.
2026-06-13 20:07:32 +09:00
pewdiepie-archdaemon ae0b29af3d Cookbook auto-fold: target the actual scroll container (.cookbook-body)
Previous .modal-body / .cookbook-content lookup matched neither the
desktop scroller (.cookbook-body) nor the mobile one (#cookbook-modal
.modal-content), so the scroll listener was attached to document.body
and never fired. Walk up to whichever scroller actually exists.
2026-06-13 20:05:33 +09:00
pewdiepie-archdaemon d68c75a82c Cookbook: auto-fold Direct Download when its header scrolls past top
Added a scroll listener on the parent .modal-body / cookbook-content
that folds the Direct Download body once its h2 header has scrolled
above the container's top edge. Frees the viewport for the Scan
section below while leaving the chevron clickable to expand again.

Auto-fold doesn't write to localStorage (only manual clicks do)
so the user's last explicit preference still wins on reload.
2026-06-13 20:03:14 +09:00
pewdiepie-archdaemon a615f7f786 Cookbook Trending: drop ↻ refresh button (trending list reloads on toggle) 2026-06-13 20:01:54 +09:00
pewdiepie-archdaemon 0808de0b3b Cookbook Trending: shrink trending-up icon 18px → 15px 2026-06-13 20:01:26 +09:00
pewdiepie-archdaemon aba3a7ae43 Cookbook Trending: accent trending-up icon + chevron on right + larger row
- Added a trending-up (market-up) SVG before the label, tinted
  accent so the section reads as "what's hot".
- Chevron ▸ moved from the left to the right side of the toggle
  row (still rotates via the existing CSS).
- Bumped the toggle row taller (26→34px) with 13px font + 18px
  icon so the section header has more presence.
2026-06-13 19:59:41 +09:00
pewdiepie-archdaemon f78084c230 Brain cards 32px tall + Trending tab up 8px + drop hwfit Rescan
- Brain admin-card header rows get min-height:32px so cards with
  toggles and cards without (Inject Skills) align.
- Cookbook Trending models tab nudged up 8px (top:-3 → -11).
- Removed the ↻ RESCAN button in hwfit toolbar; manual EDIT still
  available and auto-probe runs on container restart.
2026-06-13 19:56:22 +09:00
pewdiepie-archdaemon d397b3db2f Restore dropped regression fixes 2026-06-09 10:31:43 +09:00
pewdiepie-archdaemon 4715a5505d Fix duplicate cookbook server helper export 2026-06-09 09:53:41 +09:00
pewdiepie-archdaemon 84ca74f04b Restore cookbook server key exports 2026-06-09 09:51:53 +09:00
pewdiepie-archdaemon fa8c93ec0a Cookbook UI: Ollama browser, advanced serve fold, API tokens form, diagnosis toolbar, polish
Surface a lot of accumulated cookbook + UI work as a single non-agent
commit so the agent rework lands cleanly.

Highlights:
- Ollama as a first-class backend in the Cookbook:
  * Download input accepts ollama-style names (name:tag) → backend=ollama
  * /api/cookbook/ollama/library (cached scrape of ollama.com + curated
    fallback so classic models like qwen2.5 stay reachable)
  * "Browse Ollama library" toggle below Download with size chips
  * Engine=Ollama in hwfit toolbar merges the Ollama library into the
    main scan list as per-tag rows with the same Fit/Param/Quant/VRAM
    columns; click → fills Download input
- API Tokens form added to Integrations panel (matching wired
  loadTokens()/initTokenForm() that had no HTML)
- Serve panel polish: Advanced fold tightening (-8px nudges on vLLM
  checks, Extra args, Spec row), n_cpu_moe + Split Mode controls
  pulled up 8px to align with the row's checkboxes, GGUF File dropdown
  exposed for Ollama backend, GPU re-render on Edit serve restore,
  _forceBackend flag so saved serveState wins over backend detection,
  cookbook:servers-changed CustomEvent so panels don't need refresh
- Models page redesign: Add Models row (URL + hidden API key reveal +
  Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All +
  Clear-offline buttons in Added Models toolbar, offline-pill removed
  (opacity already conveys state), Engine dropdown gains Ollama option
- _ping_endpoint probes /v1/models then base, accepts 4xx as
  reachable (vLLM returns 404 on bare /v1, fully working endpoints
  were showing offline)
- Diagnosis card: × dismiss + Copy bundle buttons restored on the
  serve error feedback card
- Orphan tmux sweep re-enabled behind a 60s rate-limit + background
  Thread (off the main event loop) so dead serves get discovered
- cookbook_routes auto-register watchdog: drops the endpoint if the
  serve session exits non-zero within the first ~3min
- ollama-rocm sidecar awareness in download wrapper (`docker exec
  ollama-rocm ollama pull` when host ollama isn't installed)
- Skill extractor sets initial_status="published" when
  auto_approve_skills pref is on (audit demotes later)
- Skill list / model list / cookbook scan misc polish
2026-06-09 09:46:19 +09:00
pewdiepie-archdaemon 3b01760e95 Prepare tested main sync cleanup 2026-06-09 09:34:42 +09:00
Ocean Bennett 62ffcb6236 fix(cookbook): preserve same-host ssh profile selection (#3373)
* fix(cookbook): preserve same-host ssh profile selection

* fix(cookbook): resolve same-host ssh profiles in running tab and port lookups
2026-06-09 00:36:10 +02:00
Sebastian Andres El Khoury Seoane 8d9d4ec9c6 feat(platform): Add support for APFEL as part of the dependencies and models for the Cookbook. (#2657)
* feat(platform): add support for Apple Silicon detection in platform compatibility

test(tests): enhance shell_routes tests for Apple Silicon compatibility

* fix issues with missing import

* fix: correct package name in package-lock.json and enhance package installation commands in shell_routes.py and cookbook.js

* feat: add Apfel startup and health checks on macOS

- bootstrap Apfel via Homebrew on arm64 macOS
- start `apfel --serve --port 11435` detached for Odysseus
- verify readiness via `/health`
- clean up the Apfel process on exit or Ctrl+C

* fix: duplicate variable declaration post-merge conflict
- Should fix `node` CI issues.

* fix: issues with the update status of the APFEL dependency.
- fixed by changing the main conditional that determines the update.

* Fix: Remove unnecessary whitespaces and formatting for the model_routes.py file.

* Fix: whitespace issues with the model_routes file

* Fix: Remove unnecessary whitespaces and formatting for the model_routes.py file. Final

* Fix: Fixed updates using PIP for APFEL instead of custom cmd
2026-06-07 17:28:02 +02:00
Léo 573d431399 fix(cookbook): don't infer server OS from the browser's user-agent (#3223)
_getPlatform('local') fell back to navigator.userAgent to decide the
*server's* platform. On a Mac/Linux homeserver opened from a Windows
browser this returned 'windows', so the GGUF serve builder emitted the
Windows python-only shape (`python -m llama_cpp.server`, no
`llama-server ||` fallback). That command fails on the Unix host with
"No module named llama_cpp" even though native llama-server is installed,
and the diagnosis then misleadingly tells the user to pip-install
llama-cpp-python.

Trust the server-side hardware probe over the user-agent: a non-empty
probe backend (metal/cuda/rocm/cpu_*) means a Unix server; local Windows
instead carries platform:"windows" which already sets _envState.platform
and short-circuits. Only fall back to the browser hint when there is no
server-side signal at all. Keeps #1389/#2961's local-Windows path intact.

Fixes #3221

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:20:05 +02:00
ooovenenoso 2bdf43b74d feat(cookbook): add Gemma4 thinking chat template (#2955)
* feat(cookbook): add Gemma4 thinking chat template

* fix(cookbook): place Gemma4 thinking token in system turn
2026-06-05 22:43:31 +02:00
the_peaceful b5c45326e4 Fix Windows Cookbook background tasks, exit statuses, and empty SSH logs wrapper (#1389)
This commit consolidates all Windows Cookbook background fixes into a single comprehensive commit based on the latest main branch.

Key fixes included:
1. React looksSuccessful Mismatch: Append 'DOWNLOAD_OK' for pip install commands in routes/cookbook_routes.py.
2. Local Windows SSH Wrapper & Log Directory Mismatch: Bypassed ssh wrappers and dynamically selected odysseus-tmux logs for local tasks in static/js/cookbookRunning.js.
3. WSL Bash Filtration: Filtered out the WSL bash stub at C:\Windows\System32\bash.exe in core/platform_compat.py.
4. Drive-Colon Path Normalization: Replaced .as_posix() with git_bash_path() in routes/shell_routes.py and src/bg_jobs.py.
5. GGUF-Only Hardware Fitting: Restructured local Windows recommendations to rank GGUF only in services/hwfit/fit.py.
6. Safe Win32 Process Liveness Probe: Replaced os.kill(pid, 0) with a safe Win32 API probe using GetExitCodeProcess in core/platform_compat.py.
7. Prebuilt llama-cpp-python Wheels: Supply the CPU extra index during compilation failure fallback.
8. Enforce UTF-8 log encoding: Set PYTHONIOENCODING=utf-8 on Windows bootstrap runners.
9. Fix Linux Llama.cpp Build script syntax error in routes/cookbook_helpers.py.
10. Page Reload Status Check: Run sys.executable instead of 'python3' to bypass Microsoft Store execution stubs on local Windows hosts.
11. Llama.cpp serve build bypass: Bypassed cmake compilation checks on local Windows and verified python bindings directly.
12. Serve Command Path Validation: Masked safe GGUF path printf subshells '' inside the serve command validator.
13. CPU Mismatch Diagnostics: Intercepted AVX2-lacking '0xc000001d' (Illegal Instruction) crashes in static/js/cookbook-diagnosis.js and guided users to Ollama.
14. Windows Pytest stability: Fixed stub import leakage in test files.
2026-06-05 14:41:07 +02:00
pewdiepie-archdaemon 9112861d8e cookbook agent debug loop: persistent log files, auto-adopt orphan tmux, Codex/Claude skill parity
Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner:

* Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file.
* `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400.
* `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt).

Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-*` / `cookbook-*` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll.

`serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes.

Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop.

Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles.

`adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd.

Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).
2026-06-04 23:27:18 +09:00
pewdiepie-archdaemon 562bc4dedc Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI
Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
  ascending then truncated top-N, which made "highest VRAM" mathematically
  unreachable). Every column path uses reverse=True for the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
  re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
  default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
  Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
  with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
  actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
  QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.

Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
  re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
  later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
  tooltip. Smart title-suffix strips the parts already in the repo name
  (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
  (#1962) for model-add requests.

Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
  "Download complete"). True overall % for multi-shard downloads:
  ((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
  live shard line, the pill is hidden + relabels as "reconnect" + revives
  on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
  to 8s), scan persisted done/error/crashed downloads and probe their
  tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
  state is done but tmux is still alive revives the existing task and
  refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
  /api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
  DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
  shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
  status got stuck. Dir text on the running row, font matched to uptime.

Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
  metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
  matches body; action buttons get icons on the left (Retry/Copy/Edit/
  Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
  downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
2026-06-03 20:25:25 +09:00
pewdiepie-archdaemon 3706d756f3 Merge remote-tracking branch 'origin/main' into visual-pr-playground
# Conflicts:
#	routes/cookbook_routes.py
#	routes/hwfit_routes.py
#	services/hwfit/fit.py
#	services/hwfit/models.py
#	static/js/cookbook-diagnosis.js
#	static/js/cookbook-hwfit.js
#	static/js/cookbook.js
#	static/js/cookbookRunning.js
2026-06-03 16:49:10 +09:00
pewdiepie-archdaemon eb79b76432 Cookbook: scoring fixes, UI polish, false-finished + stale-state bug fixes
Backend (services/hwfit + routes):
- rank_models picks visible set by REQUESTED column, not always score —
  sorting by Param now shows highest-param models PERIOD (incl. too_tight).
- New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang
  cannot serve them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so
  FP8 wins when both fit.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above
  M2.5 on equal composite score; >=100B integers not misread as versions.
- /api/cookbook/hf-latest no longer drops models without an "NB" pattern in
  the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered).
- Cached-model scan: atexit flushes models JSON even if the script is
  killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s.
- KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New
  "stalled" status for shells <1 MB with no .incomplete files.
- /api/cookbook/state POST guard: rejects "done" download tasks lacking
  DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
  shard is N<total — stops stale tabs from poisoning persisted state.
- hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization
  Q4_K_M -> BF16 (it is the native base, not a quant).

Frontend (static/js):
- Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/
  50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score
  on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit
  column header tagged with active budget (RAM / GPU / N GPU).
- Foldable Download admin-card: the Download h2 is the chevron trigger;
  state persists in localStorage.
- Download card surfaces destination dir (Dir: <path>). Same dir on running
  task row, font/color matched to uptime (9px Fira Code muted, opacity .4).
- Serve panel ctx text input always resets to model max on open. Sub-MB
  cached models show with red "download stalled" badge.
- Bulk-select Cancel + Delete reset the Select button label on exit.
- Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/
  required; bare "Download complete" no longer marks the task done after
  the first config file. Clear button now sends tmux kill-session too.
  True overall % for multi-shard downloads: ((N-1)+frac)/total instead of
  hf_transfer per-shard aggregate.
- Diagnosis card simplified: removed fold toggle, copy button, dismiss X.
  Suggestion font matches message body (12px).
- HF token field flashes green check + "Saved" on save.
- Cached scan no longer counts stalled rows as downloaded in Scan/Download.

CSS:
- dep Install button width pinned to 76px to match Installed split.
- task-sub row +1px; task-status badge gets margin-right 8px.
- Ctx slider styled like gallery editor sliders (thin pill rail, red thumb).
- Bulk-select cancel button top -3px -> -5px.
2026-06-03 16:32:20 +09:00
ghreprimand 6f001af2a3 Add a 'Rebuild llama.cpp' Cookbook action to force a fresh GPU build (#1787)
The serve bootstrap builds llama-server from source only when it is missing
from PATH, so a host that first compiled CPU-only (no nvcc present at build
time) reuses that CPU-only binary on every later serve and never gets a GPU
build, even after a CUDA/ROCm toolkit is installed. There was no UI lever to
force a rebuild.

Adds a 'Rebuild llama.cpp' button to the Cookbook Dependencies tab. It clears
the cached ~/bin/llama-server symlink and ~/llama.cpp/build directory (locally
or on the selected remote server) so the next serve recompiles and picks up
CUDA/HIP if a toolchain is now present. It installs and downloads nothing.

- routes/cookbook_helpers.py: _llama_cpp_rebuild_cmd() (single source of truth)
- routes/shell_routes.py: POST /api/cookbook/rebuild-engine (admin-only, reuses
  the existing SSH plumbing for remote hosts)
- static/js/cookbook.js: header button + handler honoring the deps server selector
- tests: cover the command shape and a clean run on a fresh HOME

Motivated by #831 (RTX 4070 user stuck on a CPU-only build with no way to
re-trigger the build).

Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-03 13:28:19 +09:00
lekt8 0e6cbd8315 Drop GPU-only flags from the CPU-only (-ngl 0) serve command (#1433)
A CPU-only llama.cpp serve config still emitted --flash-attn on and exported
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 (independent toggles, often left on by an Auto
profile), so the command mixed "zero GPU layers" with CUDA/flash-attn and failed
to start (issue #1291). Gate both on a _cpuOnly check (ngl == 0). GPU serving is
unchanged — the gate only affects the ngl=0 path.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 04:26:15 +09:00
LittleLlama 50a486b608 fix(cookbook): add NVFP4 to quantization picker dropdown (#1378)
Fixes #1328
2026-06-03 03:26:43 +09:00
spooky f667667da3 fix: distinguish external cookbook runtimes (#1188) 2026-06-02 23:20:00 +09:00
spooky 5b87e69221 feat: add vllm kv cache dtype option (#1185) 2026-06-02 23:17:16 +09:00
pewdiepie-archdaemon ff93a6c63b Polish email and cookbook flows 2026-06-02 22:42:07 +09:00
spooky cd4f496cb4 Fix native Cookbook quant classification 2026-06-02 13:07:20 +09:00
Juan Pablo Jiménez eda99360d1 Fix Cookbook dependency install completion state
* Fix Cookbook dependency install completion state

Mark Cookbook dependency installs as complete when the background runner
exits successfully, even when HuggingFace-specific download markers are
absent.

* Add focused regression coverage for cookbook dependency completion.

Keep the fix narrowly scoped while carrying env_path through dependency tasks and locking the completion reconciliation behavior with targeted tests.
2026-06-02 12:59:29 +09:00
spooky 0f3280ee05 Expose advanced llama.cpp serve controls 2026-06-02 12:46:16 +09:00
Leo 6fca7e86b7 Cookbook serve profiles and engine filter
* Cookbook: Engine filter + intelligent hardware-computed serve profiles

Two related Cookbook serving improvements for accurate, hardware-aware model
serving (especially on consumer GPUs that can only run GGUF/llama.cpp).

Engine filter
- New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant
  picker. Pure client-side view filter over the fetched list via the same
  _detectBackend() the serve commands use, so what you filter to is exactly what
  would launch. Re-renders from cache (no refetch). Empty-state message + the
  instant-cache-paint path account for it too.

Intelligent serve profiles (Quality / Balanced / Speed)
- services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM +
  model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type,
  context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU
  instead of failing; a model that fits stays fully on GPU; quant tracks profile
  intent; vision models keep image-encoder headroom. Reuses models.py VRAM math
  so filtering and serving agree on what fits. Pure/deterministic (no t/s claims
  — partial-offload speed isn't reliably predictable; fit is what's computed).
- /api/hwfit/profiles endpoint returns the profiles + the model's trained
  context limit, with loose name matching (strips org/ prefix, -GGUF suffix,
  quant tag) so a local GGUF folder name resolves to its catalog entry.
- _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn /
  --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It
  previously only set -ngl/-c, which is why it OOM'd or ran slow.
- Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV
  Cache / Flash Attn fields. Context is clamped to the model's trained limit
  (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch —
  fixes a crash where a stale 256k/16M preset + quantized KV cache caused an
  amdgpu ErrorDeviceLost.

Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed
VRAM, context cap, launchable flags, vision headroom, no-GPU empty.
Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd
green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k,
matching hand-tuning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook: make column-header sorting discoverable (incl. Newest)

Sorting in Cookbook is via clickable column headers (pewds' design), but the
headers had no visual cue that they're interactive — so sorting in general, and
the Newest sort on the Model header specifically, was undiscoverable.

- Style sortable headers as interactive: pointer cursor, hover underline, and
  the active sort column bolded/highlighted. There was no CSS for
  .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort,
  not just Newest.
- The Model column header sorts by release_date (newest first), reusing the
  existing header-click sort wiring and the "newest" SORT_KEY.

No new sort control — uses the existing column-header paradigm.

Checks: node --check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2)

In the Serve tab the model is a specific GGUF file already on disk, so its quant
can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K"
as if you could re-quantize it. That's meaningless when serving a fixed file.

- compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE
  mode), the quant is locked to the file's and profiles differ only in the real
  serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget
  use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode
  (no override) still varies the quant to show download options.
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant.
- The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from
  the repo/file name) and passes them, so profiles match what's actually served.

Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by
KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k
ncm15) — no nonsensical quant changes.

Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor

Two serve-panel additions:

1. **Vision toggle.** A "Vision" checkbox that serves the model with its
   multimodal projector so it can read images. The mmproj path is resolved at
   runtime (find mmproj-*.gguf next to the model), so dropping an mmproj file in
   the model folder makes the toggle just work; `--mmproj … --image-max-tokens
   1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found.

2. **Live GPU-memory monitor.** A readout that polls /api/cookbook/gpus every 4s
   while the panel is open and shows VRAM used/total/%, free, and — crucially on
   a discrete card — **RAM spillover** (AMD gtt_used_mb), with a plain-language
   health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise
   CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint
   (previously read for total only and discarded for 'used').

Lets you see at a glance whether a config fits VRAM (fast) or is paging to system
RAM over PCIe (slow) instead of guessing.

Checks: node --check + py_compile pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:34:42 +09:00