Salastil/odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-15 17:25:26 -04:00

Author	SHA1	Message	Date
pewdiepie-archdaemon	2cbd55b8bd	Open email context for agent, email search across All Mail, cookbook serve polish - Agent: pass the open email reader (uid/folder/account/from/subject/body preview) on every chat submit so 'reply to this' / 'write email saying hi' route to ui_control open_email_reply with the right UID instead of inventing a new .md draft. Code-level enforcement (chat_routes strips create_document + send_email when active_email is set); cross-session active_doc_id is now trusted instead of being silently dropped. set_active_email/clear_active_email tool-layer helpers in tool_implementations. - ui_control open_email_reply: optional body argument so the agent can open-and-write in one call; envelope now forwards uid/folder/account/body/panel through tool_output. Tool description sharpened and the parser rejects empty bodies on reply/reply-all (forces the agent to write rather than open an empty draft). - Email library: search now runs against [Gmail]/All Mail when the current folder is INBOX (archived emails surface). Whirlpool spinner + 'Searching…' placeholder while in flight. Each search result is stamped with its source folder so clicks open the right email instead of whatever shares its UID in INBOX. Search no longer re-applies the same text pill locally (which only checks subject/from/snippet, never body) so body-only matches don't get dropped after IMAP returns them. Initial inbox load bumped 100→500. - Email favorites: 'Favorite (pin to top)' / 'Unfavorite' in both the card menu and the open-reader more menu, backed by a new /api/email/flag/{uid}?on=true|false endpoint. Flagged emails always bubble to the top of the grid regardless of active sort. - AI reply in doc editor: never overwrites existing draft text or the quoted history. AI suggestion is prepended; AI-generated 'On … wrote:' re-quotes are stripped so the original quote isn't visually edited. 
- Cookbook serve: pre-launch GPU driver / has_gpu / install / version-floor checks (vllm minimax_m2 needs 0.10.0+, deepseek_r1 needs 0.7.0 etc.) before the launch chain starts. Detect 'another model already running on this host' and offer Stop & launch (with graceful then force tmux kill helpers, port release wait). Per-vendor deep-link buttons (vLLM recipe / SGLang cookbook) with hardware hash. Backend picker is now a custom dropdown with accent-coloured logos for vLLM, SGLang, llama.cpp, Ollama, Diffusers; same glyphs added next to package names in Dependencies. Runtime-readiness note moved inside the panel (green when ready, red when missing) with an × dismiss. Esc collapses the expanded card; expanded card scrolls when it overflows; Trust Remote / Auto Tool / Reasoning Parser / Enforce Eager / Prefix Caching / Expert Parallel / Speculative / MoE Env on one row (Reasoning Parser auto-detected per model family). Dtype→Row 1, GPUs→Row 2 (rightmost). Removed redundant GPU 'auto' input; command builders read from the GPU button strip. Default cookbook open is Download tab. - Cookbook hwfit: 'Model (latest)' / 'Model (oldest)' header sorts by release_date; release dates can be backfilled with the new scripts/backfill_model_release_dates.py and recipe metadata pulled with scripts/import_from_vllm_recipes.py against the upstream vllm-project/recipes catalog (vllm_recipe + min_vllm_version stamped on entries). - Calendar: Quick add hint cycles a random Odysseus-themed example per open (wooden horse Friday, crew muster 10am daily, council on Ithaca, …). Typing a time like '11pm' in the event title updates the hero clock live. - Doc editor: email-mode Reply button (sparkle icon, accent) opens the same Fast/Full + context popover the email reader uses; Ctrl+Alt+M toggles markdown preview. - Memories panel: custom sort picker with per-option icons, default 'Latest', visible Enabled/Disabled toggle text matching the section description style.	2026-06-15 20:47:51 +09:00
pewdiepie-archdaemon	768bcb565a	Cookbook/Dependencies: variant toggle now uses the agent/chat mode-toggle; Copy inside code - Drop the 'Install via' label and the pill-tag variant buttons. The toggle is now the same sliding-pill mode-toggle used by the Agent/Chat selector in the chat input. Pip/uv on the left, Docker on the right, default = Pip/uv. CSS: extended .mode-chat::before's translateX(100%) rule to also fire on .mode-right so non-chat callers can use the same animation without claiming the chat-only class name. - Copy button moves inside the <pre>: absolute-positioned at the top-right corner, icon-only, padding-right on the pre makes room. Matches the Setup-token copy pattern in the integrations form.	2026-06-14 22:54:38 +09:00
pewdiepie-archdaemon	63b4ad2e9c	Cookbook/Dependencies: Pip/uv vs Docker variant toggle on recipe panel Each recipe catalog entry now carries two variants: variants.pip → uv pip install … variants.docker → docker pull <image> A small 'Install via' pill row in the panel toggles between them (default = Pip/uv per the user's preference). Switching variant or changing the model re-renders the <pre> via _refreshRecipePre(); the display text drops the 'source venv/bin/activate' prefix for Docker since docker pull doesn't need a venv. Run honours the active variant so picking Docker queues 'docker pull …' as the tmux task.	2026-06-14 22:47:26 +09:00
pewdiepie-archdaemon	d70eb99a0d	Cookbook/Dependencies recipes: install into configured venv, drop 'uv venv' Recipes now hold ONLY the install command(s). The rendered <pre> prepends a 'source <envPath>/bin/activate' line so the user sees a paste-ready sequence; Run uses env_prefix (same path the Install button uses) to activate the configured venv before the install command, so the install lands in the existing environment rather than a fresh .venv in whatever CWD the tmux task happens to start in. - cookbook-deps-recipes.js: trim each recipe to its single pip command - cookbook.js: _recipeDisplayText() prepends the activate context for display; pre's data-dep-recipe-install holds the raw install-only command list so Run knows what to send; Run builds env_prefix the same way _installDep does.	2026-06-14 22:45:12 +09:00
pewdiepie-archdaemon	d44de3af43	Cookbook/Dependencies: populate recipe model picker from downloaded models The recipe dropdown was a static catalog (MiniMax / Any vLLM model). Now it lists every model already downloaded on the active server (the same _cachedModelIds set the Launch tab + dl-dots already drive), plus an 'Other (generic …)' fallback. The change handler uses pickRecipe(backend, modelId) to find the best match — MiniMax ids land on the MiniMax recipe, everything else falls back to the generic install. cookbook-diagnosis.js: openCookbookDependencies's pre-select logic now matches by full option value (model id) instead of label substring, since the dropdown values are full repo ids now.	2026-06-14 22:40:52 +09:00
pewdiepie-archdaemon	600fa6be8a	Cookbook/Dependencies: per-backend recipe panel (vllm/sglang/llama_cpp) Each row for vllm, sglang, llama_cpp now carries an expand caret that opens an inline recipe panel below the row. The panel has: - 'Serving which model?' select populated from a new tiny catalog - <pre> code block showing the exact shell sequence for that pair - Copy: clipboard the commands - Run: launch the joined 'cmd1 && cmd2 && …' as a tmux task on the currently-selected deps server (same plumbing as Install) New file: src/static/js/cookbook-deps-recipes.js — single source of truth for the recipes. Seeded with MiniMax M2/M2.7 + a generic fallback for each backend (all three use 'uv venv → source .venv/bin/activate → uv pip install ... --torch-backend auto', the recipe the user pasted). Adding model-specific recipes is now a one-entry edit. Next commit: Launch-tab pre-flight that intercepts the serve click when the backend isn't installed and deep-links into this panel.	2026-06-14 22:33:49 +09:00
pewdiepie-archdaemon	781a3ee829	Cookbook: rename 'Run' tab → 'Launch' (cookbook.js:1865)	2026-06-14 22:23:38 +09:00
pewdiepie-archdaemon	4074e77d93	Cookbook: auto-set KV cache to fp8 for DeepSeek V3/V4/R1 MoE families These models OOM on --kv-cache-dtype auto (≈bf16) at any usable context with current tensor-parallel layouts. _detectModelOptimizations now seeds opts.kvCacheDtype='fp8' for them, and the serve panel's KV Cache select picks that up as the default unless the user has a saved override on this skill.	2026-06-14 08:57:29 +09:00
pewdiepie-archdaemon	d3944be1be	Cookbook: detect DeepSeek V4+ as MoE so Expert Parallel + Spec show The DeepSeek branch in _detectModelOptimizations matched only V3 and R1 literally. DeepSeek-V4-Flash (and future Vx / Rx) didn't hit any branch, so the Expert Parallel checkbox + Speculative defaults never surfaced in the Run panel. Widened to a regex that catches v3/v3.1/v4/v5/v10+ and r1/r2/… for both the expert-parallel flag and the MTP speculative defaults.	2026-06-14 08:51:57 +09:00
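The two DeepSeek commits above (family detection, then the fp8 KV-cache default) can be sketched as follows. This is a minimal Python illustration, not the project's code: the real logic lives in the JavaScript helper `_detectModelOptimizations`, and the regex, the function names, and the option keys here are assumptions reconstructed from the commit messages only.

```python
import re

# Hypothetical pattern: DeepSeek V3 and later (v3, v3.1, v4, v5, v10+)
# or any R-series model (r1, r2, ...). V1/V2 deliberately do not match.
_DEEPSEEK_MOE = re.compile(
    r"deepseek[-_ ]?(?:v(?:[3-9]|[1-9]\d+)(?:\.\d+)?|r[1-9]\d*)",
    re.IGNORECASE,
)


def detect_model_optimizations(model_id: str) -> dict:
    """Return default serve options inferred from the model id (assumed keys)."""
    opts = {}
    if _DEEPSEEK_MOE.search(model_id):
        opts["expert_parallel"] = True   # surfaces the Expert Parallel toggle
        opts["kv_cache_dtype"] = "fp8"   # avoids OOM seen with the 'auto' dtype
    return opts


def effective_kv_cache_dtype(model_id: str, saved_override=None) -> str:
    """A saved user override wins; otherwise use the detected default, else 'auto'."""
    if saved_override:
        return saved_override
    return detect_model_optimizations(model_id).get("kv_cache_dtype", "auto")
```

Note that a pattern this broad would also match distilled R1 checkpoints, which may or may not be the intended behaviour.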
pewdiepie-archdaemon	1d7d9c5e9c	Cookbook deps: drop the manual vLLM install block + Run handlers	2026-06-14 08:49:20 +09:00
pewdiepie-archdaemon	adac89c8e2	Cookbook deps: NVIDIA vs AMD ROCm-aware vLLM install commands Reads the last hwfit scan's backend (window._hwfitSystemCache.backend) and picks the right vLLM install path per vendor: - NVIDIA/CUDA (default) - uv: uv pip install -U vllm --torch-backend auto - docker: docker pull vllm/vllm-openai:latest - AMD/ROCm - uv: uv pip install -U vllm --torch-backend rocm - docker: docker pull rocm/vllm-dev:main The <pre> previews are re-painted on render to match what Run will actually launch, and the confirm dialog tags the backend so the user knows what they're committing to.	2026-06-14 08:46:58 +09:00
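The vendor-aware selection described in the commit above amounts to a small lookup table. The sketch below is an illustration in Python; the command strings are copied verbatim from the commit message, while the function name, the `"cuda"`/`"rocm"` keys, and the fallback rule are assumptions.

```python
# Command strings are quoted from the commit message; everything else is hypothetical.
VLLM_INSTALL_COMMANDS = {
    "cuda": {
        "uv": "uv pip install -U vllm --torch-backend auto",
        "docker": "docker pull vllm/vllm-openai:latest",
    },
    "rocm": {
        "uv": "uv pip install -U vllm --torch-backend rocm",
        "docker": "docker pull rocm/vllm-dev:main",
    },
}


def vllm_install_command(probe_backend, variant="uv"):
    """Pick the install command for the last hardware probe's backend.

    Anything other than 'rocm' (including a missing probe) falls back to
    the NVIDIA/CUDA path, which the commit describes as the default.
    """
    vendor = "rocm" if (probe_backend or "").lower() == "rocm" else "cuda"
    return VLLM_INSTALL_COMMANDS[vendor][variant]
```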
pewdiepie-archdaemon	65a2e51af8	Cookbook deps: convert manual install snippets to Run buttons Was just a copy-paste reference. Each row now has a Run button that launches the command as a tmux task on the currently-selected deps server (same path Reinstall already uses) — Odysseus does the work, the user watches progress in the Active tab. Dropped the plain pip option since the existing per-package Install button already covers it; kept uv (recommended) and Docker pull as the two alternatives.	2026-06-14 08:45:43 +09:00
pewdiepie-archdaemon	04a97adbb3	Cookbook: Extra args under Reasoning/Spec + manual vLLM install hints in Dependencies - Moved "Extra args" out from above the vLLM advanced checks (Reasoning Parser, Speculative, MoE Env) to AFTER them, so it reads as "after the advanced toggles, anything else". - Added a collapsed "Manual install (vLLM)" details block to the Dependencies tab description with three copy-paste recipes: uv venv + uv pip (recommended), plain pip, and docker pull vllm/vllm-openai:latest. Useful when the in-app Install button can't run (offline target, custom torch backend, etc).	2026-06-14 08:43:10 +09:00
pewdiepie-archdaemon	654f9f82c7	Cookbook: don't auto-fold Direct Download from inside its own body The capture-phase scroll listener was firing for scrolls anywhere in the modal — including the Trending models list, which lives inside the Direct Download fold body. Scrolling that list was auto-folding the section that contains it. Bail early if the scroll target is the fold body or a descendant — the section only folds on scrolls in sibling scrollers (.cookbook-body, .hwfit-list, .modal-content).	2026-06-13 22:26:04 +09:00
pewdiepie-archdaemon	ac4de93928	Cookbook: rename Serve tab → Run (label only, data-backend stays Serve)	2026-06-13 21:55:24 +09:00
pewdiepie-archdaemon	44a60c1261	Cookbook toolbar: move Search next to Standard, Engine/Quant/Context to right New order: [Standard ▾] [Search ............] [Engine] [Quant] [Context] so the two primary picks (type + free text) sit together at the left, with the more advanced filters lined up to the right.	2026-06-13 21:30:05 +09:00
pewdiepie-archdaemon	f09f606bec	Cookbook fold: smooth max-height + opacity transition display:none toggle was instant and felt jarring during auto-fold/auto-expand. Swapped to a CSS class `.is-folded` that transitions max-height (0 ↔ 1200px) and opacity (0 ↔ 1) over ~280ms with ease, so both manual chevron clicks and the scroll-driven toggles slide in/out smoothly.	2026-06-13 20:14:34 +09:00
pewdiepie-archdaemon	e6349c016e	Cookbook auto-fold: auto-expand when scrolling back to top scroll handler now tracks per-target scrollTop via WeakMap. Downward scroll on any scroller in the cookbook modal folds Direct Download; scrolling back to top (scrollTop <= 0) unfolds it. Manual chevron clicks still win — they persist to localStorage; auto-toggles don't, so the user's last explicit pick survives reload.	2026-06-13 20:12:30 +09:00
pewdiepie-archdaemon	e630605aef	Cookbook auto-fold: capture-phase scroll listener catches hwfit-list IntersectionObserver missed the case because scrolling inside the nested .hwfit-list (max-height:52vh own scroller) doesn't move the header out of view at all. The user wants any downward scroll in the scan/download area to fold Direct Download. Switched to a capture-phase scroll listener on #cookbook-modal that catches every scroll event from any nested scroller (.hwfit-list, .cookbook-body, .modal-content). Folds only on downward scrolls so scrolling back up doesn't keep re-folding.	2026-06-13 20:10:46 +09:00
pewdiepie-archdaemon	74e563dabc	Cookbook auto-fold: use IntersectionObserver to catch any scroll source The scroll listener on .cookbook-body never fired — the user is likely scrolling inside the nested .hwfit-list (max-height:52vh) which doesn't bubble to its parent. IntersectionObserver fires whenever the Direct Download header crosses the viewport edge regardless of which container moved. Folds only when boundingClientRect.top < 0 (header pushed up past the top) so modal close / detach doesn't trigger it.	2026-06-13 20:07:32 +09:00
pewdiepie-archdaemon	ae0b29af3d	Cookbook auto-fold: target the actual scroll container (.cookbook-body) Previous .modal-body / .cookbook-content lookup matched neither the desktop scroller (.cookbook-body) nor the mobile one (#cookbook-modal .modal-content), so the scroll listener was attached to document.body and never fired. Walk up to whichever scroller actually exists.	2026-06-13 20:05:33 +09:00
pewdiepie-archdaemon	d68c75a82c	Cookbook: auto-fold Direct Download when its header scrolls past top Added a scroll listener on the parent .modal-body / cookbook-content that folds the Direct Download body once its h2 header has scrolled above the container's top edge. Frees the viewport for the Scan section below while leaving the chevron clickable to expand again. Auto-fold doesn't write to localStorage (only manual clicks do) so the user's last explicit preference still wins on reload.	2026-06-13 20:03:14 +09:00
pewdiepie-archdaemon	a615f7f786	Cookbook Trending: drop ↻ refresh button (trending list reloads on toggle)	2026-06-13 20:01:54 +09:00
pewdiepie-archdaemon	0808de0b3b	Cookbook Trending: shrink trending-up icon 18px → 15px	2026-06-13 20:01:26 +09:00
pewdiepie-archdaemon	aba3a7ae43	Cookbook Trending: accent trending-up icon + chevron on right + larger row - Added a trending-up (market-up) SVG before the label, tinted accent so the section reads as "what's hot". - Chevron ▸ moved from the left to the right side of the toggle row (still rotates via the existing CSS). - Bumped the toggle row taller (26→34px) with 13px font + 18px icon so the section header has more presence.	2026-06-13 19:59:41 +09:00
pewdiepie-archdaemon	f78084c230	Brain cards 32px tall + Trending tab up 8px + drop hwfit Rescan - Brain admin-card header rows get min-height:32px so cards with toggles and cards without (Inject Skills) align. - Cookbook Trending models tab nudged up 8px (top:-3 → -11). - Removed the ↻ RESCAN button in hwfit toolbar; manual EDIT still available and auto-probe runs on container restart.	2026-06-13 19:56:22 +09:00
pewdiepie-archdaemon	d397b3db2f	Restore dropped regression fixes	2026-06-09 10:31:43 +09:00
pewdiepie-archdaemon	4715a5505d	Fix duplicate cookbook server helper export	2026-06-09 09:53:41 +09:00
pewdiepie-archdaemon	84ca74f04b	Restore cookbook server key exports	2026-06-09 09:51:53 +09:00
pewdiepie-archdaemon	fa8c93ec0a	Cookbook UI: Ollama browser, advanced serve fold, API tokens form, diagnosis toolbar, polish Surface a lot of accumulated cookbook + UI work as a single non-agent commit so the agent rework lands cleanly. Highlights: - Ollama as a first-class backend in the Cookbook: * Download input accepts ollama-style names (name:tag) → backend=ollama * /api/cookbook/ollama/library (cached scrape of ollama.com + curated fallback so classic models like qwen2.5 stay reachable) * "Browse Ollama library" toggle below Download with size chips * Engine=Ollama in hwfit toolbar merges the Ollama library into the main scan list as per-tag rows with the same Fit/Param/Quant/VRAM columns; click → fills Download input - API Tokens form added to Integrations panel (matching wired loadTokens()/initTokenForm() that had no HTML) - Serve panel polish: Advanced fold tightening (-8px nudges on vLLM checks, Extra args, Spec row), n_cpu_moe + Split Mode controls pulled up 8px to align with the row's checkboxes, GGUF File dropdown exposed for Ollama backend, GPU re-render on Edit serve restore, _forceBackend flag so saved serveState wins over backend detection, cookbook:servers-changed CustomEvent so panels don't need refresh - Models page redesign: Add Models row (URL + hidden API key reveal + Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All + Clear-offline buttons in Added Models toolbar, offline-pill removed (opacity already conveys state), Engine dropdown gains Ollama option - _ping_endpoint probes /v1/models then base, accepts 4xx as reachable (vLLM returns 404 on bare /v1, fully working endpoints were showing offline) - Diagnosis card: × dismiss + Copy bundle buttons restored on the serve error feedback card - Orphan tmux sweep re-enabled behind a 60s rate-limit + background Thread (off the main event loop) so dead serves get discovered - cookbook_routes auto-register watchdog: drops the endpoint if the serve session exits non-zero within the first 
~3min - ollama-rocm sidecar awareness in download wrapper (`docker exec ollama-rocm ollama pull` when host ollama isn't installed) - Skill extractor sets initial_status="published" when auto_approve_skills pref is on (audit demotes later) - Skill list / model list / cookbook scan misc polish	2026-06-09 09:46:19 +09:00
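The `_ping_endpoint` change in the commit above (probe `/v1/models`, then the base URL, and treat a 4xx answer as proof that a server is listening) can be sketched like this. It is a minimal illustration, not the project's implementation: the function names are hypothetical, the base URL is assumed to be the server root, and treating every status below 500 as reachable is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status for a GET, including 4xx/5xx error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, just not with a 2xx


def is_reachable(base_url: str, fetch_status: Callable[[str], int] = http_status) -> bool:
    """Probe <base>/v1/models first, then the bare base URL."""
    base = base_url.rstrip("/")
    for url in (f"{base}/v1/models", base):
        try:
            status = fetch_status(url)
        except OSError:
            continue  # connection refused, DNS failure, timeout: try the next URL
        if status < 500:
            return True  # a 404 still means something is listening
    return False
```

Passing the fetcher as a parameter keeps the probe order testable without a live server.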
pewdiepie-archdaemon	3b01760e95	Prepare tested main sync cleanup	2026-06-09 09:34:42 +09:00
Ocean Bennett	62ffcb6236	fix(cookbook): preserve same-host ssh profile selection (#3373) * fix(cookbook): preserve same-host ssh profile selection * fix(cookbook): resolve same-host ssh profiles in running tab and port lookups	2026-06-09 00:36:10 +02:00
Sebastian Andres El Khoury Seoane	8d9d4ec9c6	feat(platform): Add support for APFEL as part of the dependencies and models for the Cookbook. (#2657) * feat(platform): add support for Apple Silicon detection in platform compatibility test(tests): enhance shell_routes tests for Apple Silicon compatibility * fix issues with missing import * fix: correct package name in package-lock.json and enhance package installation commands in shell_routes.py and cookbook.js * feat: add Apfel startup and health checks on macOS - bootstrap Apfel via Homebrew on arm64 macOS - start `apfel --serve --port 11435` detached for Odysseus - verify readiness via `/health` - clean up the Apfel process on exit or Ctrl+C * fix: duplicate variable declaration post-merge conflict - Should fix `node` CI issues. * fix: issues with the update status of the APFEL dependency. - fixed by changing the main conditional that determines the update. * Fix: Remove unnecessary whitespaces and formatting for the model_routes.py file. * Fix: whitespace issues with the model_routes file * Fix: Remove unnecessary whitespaces and formatting for the model_routes.py file. Final * Fix: Fixed updates using PIP for APFEL instead of custom cmd	2026-06-07 17:28:02 +02:00
Léo	573d431399	fix(cookbook): don't infer server OS from the browser's user-agent (#3223) _getPlatform('local') fell back to navigator.userAgent to decide the server's platform. On a Mac/Linux homeserver opened from a Windows browser this returned 'windows', so the GGUF serve builder emitted the Windows python-only shape (`python -m llama_cpp.server`, no `llama-server ||` fallback). That command fails on the Unix host with "No module named llama_cpp" even though native llama-server is installed, and the diagnosis then misleadingly tells the user to pip-install llama-cpp-python. Trust the server-side hardware probe over the user-agent: a non-empty probe backend (metal/cuda/rocm/cpu_*) means a Unix server; local Windows instead carries platform:"windows" which already sets _envState.platform and short-circuits. Only fall back to the browser hint when there is no server-side signal at all. Keeps #1389/#2961's local-Windows path intact. Fixes #3221 Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 13:20:05 +02:00
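The precedence described in the commit above (explicit server platform first, then the hardware probe, and the browser hint only as a last resort) can be sketched as a three-step fallback. This is an illustration in Python of logic that the commit places in the JavaScript helper `_getPlatform`; the function name, parameter names, and the `"unix"` label are assumptions.

```python
def resolve_server_platform(env_platform, probe_backend, browser_hint):
    """Decide the server's platform without trusting the browser first.

    env_platform:  platform reported by the server itself (e.g. "windows"), or None
    probe_backend: hardware probe backend (e.g. "metal", "cuda", "rocm"), or ""
    browser_hint:  platform guessed from the browser's user-agent, or None
    """
    if env_platform:
        return env_platform      # local Windows sets this and short-circuits
    if probe_backend:
        return "unix"            # a non-empty probe backend implies a Unix server
    return browser_hint or "unix"  # no server-side signal at all: use the hint
```

With this ordering, a Mac homeserver opened from a Windows browser resolves to a Unix platform as soon as the probe has reported `metal`.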
ooovenenoso	2bdf43b74d	feat(cookbook): add Gemma4 thinking chat template (#2955) * feat(cookbook): add Gemma4 thinking chat template * fix(cookbook): place Gemma4 thinking token in system turn	2026-06-05 22:43:31 +02:00
the_peaceful	b5c45326e4	Fix Windows Cookbook background tasks, exit statuses, and empty SSH logs wrapper (#1389) This commit consolidates all Windows Cookbook background fixes into a single comprehensive commit based on the latest main branch. Key fixes included: 1. React looksSuccessful Mismatch: Append 'DOWNLOAD_OK' for pip install commands in routes/cookbook_routes.py. 2. Local Windows SSH Wrapper & Log Directory Mismatch: Bypassed ssh wrappers and dynamically selected odysseus-tmux logs for local tasks in static/js/cookbookRunning.js. 3. WSL Bash Filtration: Filtered out the WSL bash stub at C:\Windows\System32\bash.exe in core/platform_compat.py. 4. Drive-Colon Path Normalization: Replaced .as_posix() with git_bash_path() in routes/shell_routes.py and src/bg_jobs.py. 5. GGUF-Only Hardware Fitting: Restructured local Windows recommendations to rank GGUF only in services/hwfit/fit.py. 6. Safe Win32 Process Liveness Probe: Replaced os.kill(pid, 0) with a safe Win32 API probe using GetExitCodeProcess in core/platform_compat.py. 7. Prebuilt llama-cpp-python Wheels: Supply the CPU extra index during compilation failure fallback. 8. Enforce UTF-8 log encoding: Set PYTHONIOENCODING=utf-8 on Windows bootstrap runners. 9. Fix Linux Llama.cpp Build script syntax error in routes/cookbook_helpers.py. 10. Page Reload Status Check: Run sys.executable instead of 'python3' to bypass Microsoft Store execution stubs on local Windows hosts. 11. Llama.cpp serve build bypass: Bypassed cmake compilation checks on local Windows and verified python bindings directly. 12. Serve Command Path Validation: Masked safe GGUF path printf subshells '' inside the serve command validator. 13. CPU Mismatch Diagnostics: Intercepted AVX2-lacking '0xc000001d' (Illegal Instruction) crashes in static/js/cookbook-diagnosis.js and guided users to Ollama. 14. Windows Pytest stability: Fixed stub import leakage in test files.	2026-06-05 14:41:07 +02:00
pewdiepie-archdaemon	9112861d8e	cookbook agent debug loop: persistent log files, auto-adopt orphan tmux, Codex/Claude skill parity Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner: * Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file. * `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400. * `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt). Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-` / `cookbook-` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll. `serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes. Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. 
Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop. Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles. `adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd. Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).	2026-06-04 23:27:18 +09:00
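The intent-without-action supervisor from the commit above can be sketched as a pure function that the agent loop would call at the end of each turn. This is a minimal illustration: the function name, the intent regex, and the nudge wording are hypothetical, and only the cap of 2 nudges per chat comes from the commit message.

```python
import re

MAX_NUDGES_PER_CHAT = 2  # cap stated in the commit message

# Hypothetical intent detector: phrases such as "Let me ..." or "I'll ...".
_INTENT = re.compile(r"\b(?:let me|i'll|i will)\b", re.IGNORECASE)


def maybe_nudge(reply_text, tool_calls, nudges_sent):
    """Return a system nudge if the model promised an action but emitted no tool call.

    Returns None when a tool call was emitted, when no intent phrase is found,
    or when the per-chat cap has already been reached.
    """
    if tool_calls or nudges_sent >= MAX_NUDGES_PER_CHAT:
        return None
    if not _INTENT.search(reply_text):
        return None
    return "You said you would take an action. Emit the tool call now."
```

The cap matters because a model that cannot use the tool at all would otherwise keep the loop running indefinitely.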
pewdiepie-archdaemon	562bc4dedc	Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI Backend (services/hwfit + routes): - VRAM column sort now shows global highest first (was special-cased to ascending then truncated top-N, which made "highest VRAM" mathematically unreachable). Every column path uses reverse=True for the truncation. - Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep re-probing the rig during a session; Rescan button still forces fresh. - Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them); default non-prequantized to BF16 on 2+ GPUs. - AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties. - Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5. - hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the actual 156 GB / 284 GB disk footprints). - New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT / QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES. Frontend — Scan/Download: - Engine + Quant swapped in the toolbar; Quant defaults to "All". - Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag re-sorts by vram ascending (smallest fitting first); back to Max → score. - Ctx slider rail now visible — was background:transparent in a duplicate later-cascade rule. Hardcoded grey + !important. - Search input moved to the far right of the toolbar. - Type/Standard default; "Context" not uppercased; Search placeholder dimmed. - Engine "?" + Quant "?" inline help chips inside their dropdown boxes. - Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc. - Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in tooltip. 
Smart title-suffix strips the parts already in the repo name (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)"). - Conditional warning for safetensors models on non-GPU rigs only. - Dependency Install / Installed / Installed▾ / N/A all 75.85px wide. - Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag. - Foldable Download admin-card (h2 chevron); line under h2 only when folded. - HF token save gets a green ✓ + "Saved" flash. - Cached scan no longer counts stalled rows as downloaded. - Footer: "Request it →" link with GitHub mark to the public discussion (#1962) for model-add requests. Frontend — Running tab: - Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare "Download complete"). True overall % for multi-shard downloads: ((N-1)+frac)/total instead of hf_transfer's per-shard aggregate. - ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m". - Clear button kills the tmux session too; if the output still shows a live shard line, the pill is hidden + relabels as "reconnect" + revives on click. - Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled to 8s), scan persisted done/error/crashed downloads and probe their tmux session — if alive, flip status back to running and reattach. - Per-launch zombie probe: clicking Download on a model whose persisted state is done but tmux is still alive revives the existing task and refuses to start a duplicate. - Pre-launch GPU probe: vllm / sglang / diffusers serve check /api/cookbook/gpus first; warns + confirms if no GPU is visible. - Server-side state guard: rejects "done" POSTs for downloads lacking DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned shard is N<total — stale tabs can't poison persisted state any more. - Running count includes tasks whose output looks active even if persisted status got stuck. Dir text on the running row, font matched to uptime. 
Serve panel: - Ctx text input always resets to model max on open (default 20000 when metadata is missing). - Max Seqs default 8 -> 4. KV Cache dtype select 32px tall. - Lightning icon on Launch (same as Action toggle). - Diagnosis card simplified (no fold/copy/dismiss), suggestion font matches body; action buttons get icons on the left (Retry/Copy/Edit/Install/Kill/Switch/etc.). - Incomplete-download serve warning when model status is downloading / stalled / has_incomplete. - MTP "?" tooltip ("supported on a few model families … up to ~3× faster").	2026-06-03 20:25:25 +09:00
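The multi-shard progress formula quoted in the commit above, ((N-1)+frac)/total, can be written out directly. The sketch below is an illustration in Python; the function and parameter names are hypothetical, and the clamping and validation are additions not mentioned in the commit.

```python
def overall_download_fraction(shard_index, shard_fraction, total_shards):
    """Overall progress for a multi-shard download.

    shard_index:    1-based number N of the shard currently downloading
    shard_fraction: progress of that shard, 0.0 to 1.0
    total_shards:   total number of shards

    Implements ((N - 1) + frac) / total: all earlier shards count as complete,
    and the current shard contributes its partial progress.
    """
    if total_shards < 1 or not 1 <= shard_index <= total_shards:
        raise ValueError("shard_index must be within 1..total_shards")
    frac = min(max(shard_fraction, 0.0), 1.0)  # clamp noisy progress values
    return ((shard_index - 1) + frac) / total_shards
```

For example, halfway through shard 3 of 4 gives (2 + 0.5) / 4 = 0.625, where a per-shard figure alone would report 50%.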
pewdiepie-archdaemon	3706d756f3	Merge remote-tracking branch 'origin/main' into visual-pr-playground # Conflicts: # routes/cookbook_routes.py # routes/hwfit_routes.py # services/hwfit/fit.py # services/hwfit/models.py # static/js/cookbook-diagnosis.js # static/js/cookbook-hwfit.js # static/js/cookbook.js # static/js/cookbookRunning.js	2026-06-03 16:49:10 +09:00
pewdiepie-archdaemon	eb79b76432	Cookbook: scoring fixes, UI polish, false-finished + stale-state bug fixes Backend (services/hwfit + routes): - rank_models picks visible set by REQUESTED column, not always score — sorting by Param now shows highest-param models PERIOD (incl. too_tight). - New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang cannot serve them); default non-prequantized to BF16 on 2+ GPUs. - AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so FP8 wins when both fit. - Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5 on equal composite score; >=100B integers not misread as versions. - /api/cookbook/hf-latest no longer drops models without an "NB" pattern in the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered). - Cached-model scan: atexit flushes models JSON even if the script is killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s. - KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New "stalled" status for shells <1 MB with no .incomplete files. - /api/cookbook/state POST guard: rejects "done" download tasks lacking DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned shard is N<total — stops stale tabs from poisoning persisted state. - hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization Q4_K_M -> BF16 (it is the native base, not a quant). Frontend (static/js): - Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/ 50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit column header tagged with active budget (RAM / GPU / N GPU). - Foldable Download admin-card: the Download h2 is the chevron trigger; state persists in localStorage. - Download card surfaces destination dir (Dir: <path>). Same dir on running task row, font/color matched to uptime (9px Fira Code muted, opacity .4). 
- Serve panel ctx text input always resets to model max on open. Sub-MB cached models show with red "download stalled" badge. - Bulk-select Cancel + Delete reset the Select button label on exit. - Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/ required; bare "Download complete" no longer marks the task done after the first config file. Clear button now sends tmux kill-session too. True overall % for multi-shard downloads: ((N-1)+frac)/total instead of hf_transfer per-shard aggregate. - Diagnosis card simplified: removed fold toggle, copy button, dismiss X. Suggestion font matches message body (12px). - HF token field flashes green check + "Saved" on save. - Cached scan no longer counts stalled rows as downloaded in Scan/Download. CSS: - dep Install button width pinned to 76px to match Installed split. - task-sub row +1px; task-status badge gets margin-right 8px. - Ctx slider styled like gallery editor sliders (thin pill rail, red thumb). - Bulk-select cancel button top -3px -> -5px.	2026-06-03 16:32:20 +09:00
ghreprimand	6f001af2a3	Add a 'Rebuild llama.cpp' Cookbook action to force a fresh GPU build (#1787) The serve bootstrap builds llama-server from source only when it is missing from PATH, so a host that first compiled CPU-only (no nvcc present at build time) reuses that CPU-only binary on every later serve and never gets a GPU build, even after a CUDA/ROCm toolkit is installed. There was no UI lever to force a rebuild. Adds a 'Rebuild llama.cpp' button to the Cookbook Dependencies tab. It clears the cached ~/bin/llama-server symlink and ~/llama.cpp/build directory (locally or on the selected remote server) so the next serve recompiles and picks up CUDA/HIP if a toolchain is now present. It installs and downloads nothing. - routes/cookbook_helpers.py: _llama_cpp_rebuild_cmd() (single source of truth) - routes/shell_routes.py: POST /api/cookbook/rebuild-engine (admin-only, reuses the existing SSH plumbing for remote hosts) - static/js/cookbook.js: header button + handler honoring the deps server selector - tests: cover the command shape and a clean run on a fresh HOME Motivated by #831 (RTX 4070 user stuck on a CPU-only build with no way to re-trigger the build). Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-03 13:28:19 +09:00
lekt8	0e6cbd8315	Drop GPU-only flags from the CPU-only (-ngl 0) serve command (#1433) A CPU-only llama.cpp serve config still emitted --flash-attn on and exported GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 (independent toggles, often left on by an Auto profile), so the command mixed "zero GPU layers" with CUDA/flash-attn and failed to start (issue #1291). Gate both on a _cpuOnly check (ngl == 0). GPU serving is unchanged — the gate only affects the ngl=0 path. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 04:26:15 +09:00
LittleLlama	50a486b608	fix(cookbook): add NVFP4 to quantization picker dropdown (#1378) Fixes #1328	2026-06-03 03:26:43 +09:00
spooky	f667667da3	fix: distinguish external cookbook runtimes (#1188)	2026-06-02 23:20:00 +09:00
spooky	5b87e69221	feat: add vllm kv cache dtype option (#1185)	2026-06-02 23:17:16 +09:00
pewdiepie-archdaemon	ff93a6c63b	Polish email and cookbook flows	2026-06-02 22:42:07 +09:00
spooky	cd4f496cb4	Fix native Cookbook quant classification	2026-06-02 13:07:20 +09:00
Juan Pablo Jiménez	eda99360d1	Fix Cookbook dependency install completion state * Fix Cookbook dependency install completion state Mark Cookbook dependency installs as complete when the background runner exits successfully, even when HuggingFace-specific download markers are absent. * Add focused regression coverage for cookbook dependency completion. Keep the fix narrowly scoped while carrying env_path through dependency tasks and locking the completion reconciliation behavior with targeted tests.	2026-06-02 12:59:29 +09:00
spooky	0f3280ee05	Expose advanced llama.cpp serve controls	2026-06-02 12:46:16 +09:00
Leo	6fca7e86b7	Cookbook serve profiles and engine filter * Cookbook: Engine filter + intelligent hardware-computed serve profiles Two related Cookbook serving improvements for accurate, hardware-aware model serving (especially on consumer GPUs that can only run GGUF/llama.cpp). Engine filter - New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant picker. Pure client-side view filter over the fetched list via the same _detectBackend() the serve commands use, so what you filter to is exactly what would launch. Re-renders from cache (no refetch). Empty-state message + the instant-cache-paint path account for it too. Intelligent serve profiles (Quality / Balanced / Speed) - services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM + model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type, context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU instead of failing; a model that fits stays fully on GPU; quant tracks profile intent; vision models keep image-encoder headroom. Reuses models.py VRAM math so filtering and serving agree on what fits. Pure/deterministic (no t/s claims — partial-offload speed isn't reliably predictable; fit is what's computed). - /api/hwfit/profiles endpoint returns the profiles + the model's trained context limit, with loose name matching (strips org/ prefix, -GGUF suffix, quant tag) so a local GGUF folder name resolves to its catalog entry. - _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn / --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It previously only set -ngl/-c, which is why it OOM'd or ran slow. - Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV Cache / Flash Attn fields. Context is clamped to the model's trained limit (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch — fixes a crash where a stale 256k/16M preset + quantized KV cache caused an amdgpu ErrorDeviceLost. 
Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed VRAM, context cap, launchable flags, vision headroom, no-GPU empty. Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k, matching hand-tuning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook: make column-header sorting discoverable (incl. Newest) Sorting in Cookbook is via clickable column headers (pewds' design), but the headers had no visual cue that they're interactive — so sorting in general, and the Newest sort on the Model header specifically, was undiscoverable. - Style sortable headers as interactive: pointer cursor, hover underline, and the active sort column bolded/highlighted. There was no CSS for .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort, not just Newest. - The Model column header sorts by release_date (newest first), reusing the existing header-click sort wiring and the "newest" SORT_KEY. No new sort control — uses the existing column-header paradigm. Checks: node --check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2) In the Serve tab the model is a specific GGUF file already on disk, so its quant can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K" as if you could re-quantize it. That's meaningless when serving a fixed file. - compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE mode), the quant is locked to the file's and profiles differ only in the real serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode (no override) still varies the quant to show download options. 
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant. - The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from the repo/file name) and passes them, so profiles match what's actually served. Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k ncm15) — no nonsensical quant changes. Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor Two serve-panel additions: 1. Vision toggle. A "Vision" checkbox that serves the model with its multimodal projector so it can read images. The mmproj path is resolved at runtime (find mmproj-*.gguf next to the model), so dropping an mmproj file in the model folder makes the toggle just work; `--mmproj … --image-max-tokens 1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found. 2. Live GPU-memory monitor. A readout that polls /api/cookbook/gpus every 4s while the panel is open and shows VRAM used/total/%, free, and — crucially on a discrete card — RAM spillover (AMD gtt_used_mb), with a plain-language health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint (previously read for total only and discarded for 'used'). Lets you see at a glance whether a config fits VRAM (fast) or is paging to system RAM over PCIe (slow) instead of guessing. Checks: node --check + py_compile pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 12:34:42 +09:00
