odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-17 10:15:27 -04:00

Author	SHA1	Message	Date
pewdiepie-archdaemon	6d507f8128	Merge remote-tracking branch 'origin/dev' into test-main-dev-merge-20260615 # Conflicts: # src/tool_implementations.py # static/js/research/panel.js	2026-06-15 21:20:15 +09:00
pewdiepie-archdaemon	2cbd55b8bd	Open email context for agent, email search across All Mail, cookbook serve polish - Agent: pass the open email reader (uid/folder/account/from/subject/body preview) on every chat submit so 'reply to this' / 'write email saying hi' route to ui_control open_email_reply with the right UID instead of inventing a new .md draft. Code-level enforcement (chat_routes strips create_document + send_email when active_email is set); cross-session active_doc_id is now trusted instead of being silently dropped. set_active_email/clear_active_email tool-layer helpers in tool_implementations. - ui_control open_email_reply: optional body argument so the agent can open-and-write in one call; envelope now forwards uid/folder/account/ body/panel through tool_output. Tool description sharpened and the parser rejects empty bodies on reply/reply-all (forces the agent to write rather than open an empty draft). - Email library: search now runs against [Gmail]/All Mail when the current folder is INBOX (archived emails surface). Whirlpool spinner + 'Searching…' placeholder while in flight. Each search result is stamped with its source folder so clicks open the right email instead of whatever shares its UID in INBOX. Search no longer re-applies the same text pill locally (which only checks subject/from/snippet, never body) so body-only matches don't get dropped after IMAP returns them. Initial inbox load bumped 100→500. - Email favorites: 'Favorite (pin to top)' / 'Unfavorite' in both the card menu and the open-reader more menu, backed by a new /api/email/flag/{uid}?on=true\|false endpoint. Flagged emails always bubble to the top of the grid regardless of active sort. - AI reply in doc editor: never overwrites existing draft text or the quoted history. AI suggestion is prepended; AI-generated 'On … wrote:' re-quotes are stripped so the original quote isn't visually edited. 
- Cookbook serve: pre-launch GPU driver / has_gpu / install / version- floor checks (vllm minimax_m2 needs 0.10.0+, deepseek_r1 needs 0.7.0 etc.) before the launch chain starts. Detect 'another model already running on this host' and offer Stop & launch (with graceful then force tmux kill helpers, port release wait). Per-vendor deep-link buttons (vLLM recipe / SGLang cookbook) with hardware hash. Backend picker is now a custom dropdown with accent-coloured logos for vLLM, SGLang, llama.cpp, Ollama, Diffusers; same glyphs added next to package names in Dependencies. Runtime-readiness note moved inside the panel (green when ready, red when missing) with an × dismiss. Esc collapses the expanded card; expanded card scrolls when it overflows; Trust Remote / Auto Tool / Reasoning Parser / Enforce Eager / Prefix Caching / Expert Parallel / Speculative / MoE Env on one row (Reasoning Parser auto-detected per model family). Dtype→Row 1, GPUs→Row 2 (rightmost). Removed redundant GPU 'auto' input — command builders read from the GPU button strip. Default cookbook open is Download tab. - Cookbook hwfit: 'Model (latest)' / 'Model (oldest)' header sorts by release_date; release dates can be backfilled with the new scripts/backfill_model_release_dates.py and recipe metadata pulled with scripts/import_from_vllm_recipes.py against the upstream vllm-project/recipes catalog (vllm_recipe + min_vllm_version stamped on entries). - Calendar: Quick add hint cycles a random Odysseus-themed example per open (wooden horse Friday, crew muster 10am daily, council on Ithaca, …). Typing a time like '11pm' in the event title updates the hero clock live. - Doc editor: email-mode Reply button (sparkle icon, accent) opens the same Fast/Full + context popover the email reader uses; Ctrl+Alt+M toggles markdown preview. - Memories panel: custom sort picker with per-option icons, default 'Latest', visible Enabled/Disabled toggle text matching the section description style.	2026-06-15 20:47:51 +09:00
Mazen Tamer Salah	9c00da6d1c	fix(hwfit): tolerate non-numeric gpu_count in /api/hwfit/models (#3639) * fix(hwfit): tolerate non-numeric gpu_count in /api/hwfit/models The route did `n = int(gpu_count)` with no guard, so a non-numeric query param like `?gpu_count=abc` raised ValueError and returned HTTP 500. Parse it defensively (mirroring the gpu_group guard a few lines above): a malformed value is ignored, exactly like omitting the param, and valid values still apply. Adds tests/test_hwfit_gpu_count_nonnumeric.py: a non-numeric gpu_count returns a ranking instead of raising, and a numeric value is still accepted. * test(hwfit): cover non-numeric manual_gpu_count too Follow-up to the gpu_count guard: add a regression test for the sibling manual_gpu_count query param (the hardware simulator in _apply_manual_hardware), which dev already guards by defaulting to 1 on a non-numeric value. This pins that behaviour so the endpoint's count parsing is fully covered and cannot regress to a 500.	2026-06-11 01:01:58 +02:00
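The defensive parse described in the commit above can be sketched roughly as follows. This is an illustration only, not the repository's code: the helper name `parse_optional_int` is hypothetical, and the assumption is that a malformed value should behave exactly like an omitted parameter.

```python
from typing import Optional


def parse_optional_int(raw: Optional[str]) -> Optional[int]:
    """Return int(raw), or None when raw is missing or not a valid integer.

    Hypothetical helper: a None result means "treat the param as omitted",
    so a request like ?gpu_count=abc no longer raises ValueError.
    """
    if raw is None:
        return None
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Malformed input is ignored rather than surfaced as HTTP 500.
        return None
```

A route would then apply the GPU-count filter only when the result is not None.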
RaresKeY	d1a5a7d680	fix(hwfit): validate remote SSH detection targets (#3718)	2026-06-11 00:43:49 +02:00
pewdiepie-archdaemon	fa8c93ec0a	Cookbook UI: Ollama browser, advanced serve fold, API tokens form, diagnosis toolbar, polish Surface a lot of accumulated cookbook + UI work as a single non-agent commit so the agent rework lands cleanly. Highlights: - Ollama as a first-class backend in the Cookbook: * Download input accepts ollama-style names (name:tag) → backend=ollama * /api/cookbook/ollama/library (cached scrape of ollama.com + curated fallback so classic models like qwen2.5 stay reachable) * "Browse Ollama library" toggle below Download with size chips * Engine=Ollama in hwfit toolbar merges the Ollama library into the main scan list as per-tag rows with the same Fit/Param/Quant/VRAM columns; click → fills Download input - API Tokens form added to Integrations panel (matching wired loadTokens()/initTokenForm() that had no HTML) - Serve panel polish: Advanced fold tightening (-8px nudges on vLLM checks, Extra args, Spec row), n_cpu_moe + Split Mode controls pulled up 8px to align with the row's checkboxes, GGUF File dropdown exposed for Ollama backend, GPU re-render on Edit serve restore, _forceBackend flag so saved serveState wins over backend detection, cookbook:servers-changed CustomEvent so panels don't need refresh - Models page redesign: Add Models row (URL + hidden API key reveal + Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All + Clear-offline buttons in Added Models toolbar, offline-pill removed (opacity already conveys state), Engine dropdown gains Ollama option - _ping_endpoint probes /v1/models then base, accepts 4xx as reachable (vLLM returns 404 on bare /v1, fully working endpoints were showing offline) - Diagnosis card: × dismiss + Copy bundle buttons restored on the serve error feedback card - Orphan tmux sweep re-enabled behind a 60s rate-limit + background Thread (off the main event loop) so dead serves get discovered - cookbook_routes auto-register watchdog: drops the endpoint if the serve session exits non-zero within the first 
~3min - ollama-rocm sidecar awareness in download wrapper (`docker exec ollama-rocm ollama pull` when host ollama isn't installed) - Skill extractor sets initial_status="published" when auto_approve_skills pref is on (audit demotes later) - Skill list / model list / cookbook scan misc polish	2026-06-09 09:46:19 +09:00
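The reachability rule mentioned for _ping_endpoint in the commit above (a 4xx response still counts as reachable, since vLLM answers 404 on bare /v1) might be expressed as a small pure function. This is a sketch under assumptions: the function name `is_reachable` is hypothetical, and treating 3xx as reachable is a choice made here, not something the commit states.

```python
from typing import Optional


def is_reachable(status_code: Optional[int]) -> bool:
    """Classify an HTTP probe result as reachable or offline.

    None stands for "no response at all" (timeout, connection refused).
    Any 2xx-4xx status means a server answered, so the endpoint is up;
    5xx is treated as offline. The 3xx handling is an assumption.
    """
    if status_code is None:
        return False
    return 200 <= status_code < 500
```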
pewdiepie-archdaemon	3706d756f3	Merge remote-tracking branch 'origin/main' into visual-pr-playground # Conflicts: # routes/cookbook_routes.py # routes/hwfit_routes.py # services/hwfit/fit.py # services/hwfit/models.py # static/js/cookbook-diagnosis.js # static/js/cookbook-hwfit.js # static/js/cookbook.js # static/js/cookbookRunning.js	2026-06-03 16:49:10 +09:00
pewdiepie-archdaemon	eb79b76432	Cookbook: scoring fixes, UI polish, false-finished + stale-state bug fixes Backend (services/hwfit + routes): - rank_models picks visible set by REQUESTED column, not always score — sorting by Param now shows highest-param models PERIOD (incl. too_tight). - New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang cannot serve them); default non-prequantized to BF16 on 2+ GPUs. - AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so FP8 wins when both fit. - Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5 on equal composite score; >=100B integers not misread as versions. - /api/cookbook/hf-latest no longer drops models without an "NB" pattern in the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered). - Cached-model scan: atexit flushes models JSON even if the script is killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s. - KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New "stalled" status for shells <1 MB with no .incomplete files. - /api/cookbook/state POST guard: rejects "done" download tasks lacking DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned shard is N<total — stops stale tabs from poisoning persisted state. - hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization Q4_K_M -> BF16 (it is the native base, not a quant). Frontend (static/js): - Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/ 50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit column header tagged with active budget (RAM / GPU / N GPU). - Foldable Download admin-card: the Download h2 is the chevron trigger; state persists in localStorage. - Download card surfaces destination dir (Dir: <path>). Same dir on running task row, font/color matched to uptime (9px Fira Code muted, opacity .4). 
- Serve panel ctx text input always resets to model max on open. Sub-MB cached models show with red "download stalled" badge. - Bulk-select Cancel + Delete reset the Select button label on exit. - Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/ required; bare "Download complete" no longer marks the task done after the first config file. Clear button now sends tmux kill-session too. True overall % for multi-shard downloads: ((N-1)+frac)/total instead of hf_transfer per-shard aggregate. - Diagnosis card simplified: removed fold toggle, copy button, dismiss X. Suggestion font matches message body (12px). - HF token field flashes green check + "Saved" on save. - Cached scan no longer counts stalled rows as downloaded in Scan/Download. CSS: - dep Install button width pinned to 76px to match Installed split. - task-sub row +1px; task-status badge gets margin-right 8px. - Ctx slider styled like gallery editor sliders (thin pill rail, red thumb). - Bulk-select cancel button top -3px -> -5px.	2026-06-03 16:32:20 +09:00
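The overall-progress formula quoted in the commit above, ((N-1)+frac)/total, can be worked through in a few lines. This is a minimal sketch, assuming N is the 1-based index of the shard currently downloading and frac is that shard's completed fraction; the function name and the clamping are illustrative additions.

```python
def overall_progress(shard_index: int, shard_fraction: float, total_shards: int) -> float:
    """Overall download fraction for a multi-shard download.

    shard_index is 1-based (the N in the commit message); shards before it
    are assumed complete. Inputs are clamped so the result stays in [0, 1].
    """
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    n = min(max(shard_index, 1), total_shards)
    frac = min(max(shard_fraction, 0.0), 1.0)
    return ((n - 1) + frac) / total_shards


# Third of four shards, half done: (2 + 0.5) / 4
print(overall_progress(3, 0.5, 4))  # 0.625
```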
Shaw	16f7feee0a	fix(hwfit): honor manual "metal" backend in the hardware simulator (#1090) The Cookbook's manual hardware simulator ("what if I had this setup") let users pick a backend, but _apply_manual_hardware only accepted cuda/rocm/cpu_x86/ cpu_arm and silently coerced anything else to cuda. So selecting Apple/Metal simulated a CUDA box instead — and ranked safetensors-only repos a Mac can't serve, even though the rest of hwfit (services.hwfit.fit, the serve-command generation) already supports Metal as GGUF-only via llama.cpp/Ollama. Add "metal" to the accepted backends (now a named _MANUAL_BACKENDS set, kept a subset of what fit.py understands) and set unified_memory=True for it — Apple Silicon shares one memory pool with the GPU — while clearing that flag for the discrete (cuda/rocm) and CPU backends. _apply_manual_hardware is lifted to module scope so it is directly unit-testable; both route call sites are unchanged. Adds tests/test_hwfit_manual_backend.py, including an end-to-end check that a simulated Metal box only recommends GGUF-servable models. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 23:12:34 +09:00
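The backend handling described in the commit above could look roughly like this. It is a sketch, not the actual _apply_manual_hardware: the set name _MANUAL_BACKENDS comes from the commit message, while the function name, the returned dict shape, and keeping "cuda" as the fallback for unknown values are assumptions.

```python
# Backends the manual simulator accepts, per the commit message.
_MANUAL_BACKENDS = frozenset({"cuda", "rocm", "metal", "cpu_x86", "cpu_arm"})


def normalize_manual_backend(requested: str) -> dict:
    """Pick the simulated backend and its unified_memory flag.

    Hypothetical helper: unknown values fall back to "cuda" (assumed to
    match the earlier coercion); only "metal" shares one memory pool
    between CPU and GPU, so the flag is cleared for everything else.
    """
    backend = requested if requested in _MANUAL_BACKENDS else "cuda"
    return {"backend": backend, "unified_memory": backend == "metal"}
```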
pewdiepie-archdaemon	ff93a6c63b	Polish email and cookbook flows	2026-06-02 22:42:07 +09:00
Leo	6fca7e86b7	Cookbook serve profiles and engine filter * Cookbook: Engine filter + intelligent hardware-computed serve profiles Two related Cookbook serving improvements for accurate, hardware-aware model serving (especially on consumer GPUs that can only run GGUF/llama.cpp). Engine filter - New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant picker. Pure client-side view filter over the fetched list via the same _detectBackend() the serve commands use, so what you filter to is exactly what would launch. Re-renders from cache (no refetch). Empty-state message + the instant-cache-paint path account for it too. Intelligent serve profiles (Quality / Balanced / Speed) - services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM + model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type, context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU instead of failing; a model that fits stays fully on GPU; quant tracks profile intent; vision models keep image-encoder headroom. Reuses models.py VRAM math so filtering and serving agree on what fits. Pure/deterministic (no t/s claims — partial-offload speed isn't reliably predictable; fit is what's computed). - /api/hwfit/profiles endpoint returns the profiles + the model's trained context limit, with loose name matching (strips org/ prefix, -GGUF suffix, quant tag) so a local GGUF folder name resolves to its catalog entry. - _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn / --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It previously only set -ngl/-c, which is why it OOM'd or ran slow. - Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV Cache / Flash Attn fields. Context is clamped to the model's trained limit (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch — fixes a crash where a stale 256k/16M preset + quantized KV cache caused an amdgpu ErrorDeviceLost. 
Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed VRAM, context cap, launchable flags, vision headroom, no-GPU empty. Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k, matching hand-tuning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook: make column-header sorting discoverable (incl. Newest) Sorting in Cookbook is via clickable column headers (pewds' design), but the headers had no visual cue that they're interactive — so sorting in general, and the Newest sort on the Model header specifically, was undiscoverable. - Style sortable headers as interactive: pointer cursor, hover underline, and the active sort column bolded/highlighted. There was no CSS for .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort, not just Newest. - The Model column header sorts by release_date (newest first), reusing the existing header-click sort wiring and the "newest" SORT_KEY. No new sort control — uses the existing column-header paradigm. Checks: node --check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2) In the Serve tab the model is a specific GGUF file already on disk, so its quant can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K" as if you could re-quantize it. That's meaningless when serving a fixed file. - compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE mode), the quant is locked to the file's and profiles differ only in the real serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode (no override) still varies the quant to show download options. 
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant. - The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from the repo/file name) and passes them, so profiles match what's actually served. Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k ncm15) — no nonsensical quant changes. Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor Two serve-panel additions: 1. Vision toggle. A "Vision" checkbox that serves the model with its multimodal projector so it can read images. The mmproj path is resolved at runtime (find mmproj-.gguf next to the model), so dropping an mmproj file in the model folder makes the toggle just work; `--mmproj … --image-max-tokens 1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found. 2. Live GPU-memory monitor. A readout that polls /api/cookbook/gpus every 4s while the panel is open and shows VRAM used/total/%, free, and — crucially on a discrete card — RAM spillover (AMD gtt_used_mb), with a plain-language health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint (previously read for total only and discarded for 'used'). Lets you see at a glance whether a config fits VRAM (fast) or is paging to system RAM over PCIe (slow) instead of guessing. Checks: node --check + py_compile pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 12:34:42 +09:00
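The context clamp described in the serve-profiles commit above (cap at the model's trained limit, plus an absolute 1M sanity ceiling) can be sketched as below. The real clamp lives in the Serve panel; this Python version only illustrates the rule. The function name is hypothetical, and reading "1M" as 1,000,000 tokens is an assumption.

```python
from typing import Optional

# Assumption: the commit's "absolute 1M sanity ceiling" taken as 1,000,000 tokens.
ABSOLUTE_CTX_CEILING = 1_000_000


def clamp_context(requested: int, trained_limit: Optional[int]) -> int:
    """Clamp a requested context length to what the model can use.

    When the trained limit is unknown (None or 0), only the absolute
    ceiling applies. The result is never below 1.
    """
    ceiling = ABSOLUTE_CTX_CEILING
    if trained_limit:
        ceiling = min(ceiling, trained_limit)
    return max(1, min(requested, ceiling))
```

With this rule, a stale 16M preset on a model trained for 131072 tokens resolves to 131072 rather than being passed through to the launch command.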
pewdiepie-archdaemon	e5c99a5eee	Odysseus v1.0	2026-05-31 23:58:26 +09:00

11 Commits