mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-19 19:25:27 -04:00
eb79b76432bc78b571e45f8b4639baa2673a51e1
9 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
eb79b76432 |
Cookbook: scoring fixes, UI polish, false-finished + stale-state bug fixes
Backend (services/hwfit + routes):
- rank_models picks visible set by REQUESTED column, not always score; sorting by Param now shows highest-param models PERIOD (incl. too_tight).
- New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang cannot serve them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so FP8 wins when both fit.
- Version-aware tiebreaker (parse Mn.n / Vn): MiniMax-M2.7 ranks above M2.5 on equal composite score; >=100B integers not misread as versions.
- /api/cookbook/hf-latest no longer drops models without an "NB" pattern in the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered).
- Cached-model scan: atexit flushes models JSON even if the script is killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s.
- KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New "stalled" status for shells <1 MB with no .incomplete files.
- /api/cookbook/state POST guard: rejects "done" download tasks lacking DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned shard is N<total; stops stale tabs from poisoning persisted state.
- hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization Q4_K_M -> BF16 (it is the native base, not a quant).
Frontend (static/js):
- Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit column header tagged with active budget (RAM / GPU / N GPU).
- Foldable Download admin-card: the Download h2 is the chevron trigger; state persists in localStorage.
- Download card surfaces destination dir (Dir: <path>). Same dir on running task row, font/color matched to uptime (9px Fira Code muted, opacity .4).
- Serve panel ctx text input always resets to model max on open. Sub-MB cached models show with red "download stalled" badge.
- Bulk-select Cancel + Delete reset the Select button label on exit.
- Cookbook running: false-finished bug fixed. DOWNLOAD_OK or /snapshots/ required; bare "Download complete" no longer marks the task done after the first config file. Clear button now sends tmux kill-session too. True overall % for multi-shard downloads: ((N-1)+frac)/total instead of hf_transfer per-shard aggregate.
- Diagnosis card simplified: removed fold toggle, copy button, dismiss X. Suggestion font matches message body (12px).
- HF token field flashes green check + "Saved" on save.
- Cached scan no longer counts stalled rows as downloaded in Scan/Download.
CSS:
- dep Install button width pinned to 76px to match Installed split.
- task-sub row +1px; task-status badge gets margin-right 8px.
- Ctx slider styled like gallery editor sliders (thin pill rail, red thumb).
- Bulk-select cancel button top -3px -> -5px.
|
||
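The multi-shard progress formula named in the commit above, ((N-1)+frac)/total, can be sketched as follows. This is only an illustration, not the repository's code: the function name `overall_progress` and its parameters are hypothetical, and N is assumed to be the 1-based index of the shard currently downloading.

```python
def overall_progress(shard_index: int, shard_fraction: float, total_shards: int) -> float:
    """Return overall download progress in [0, 1] for a multi-shard download.

    shard_index is assumed to be 1-based (the shard currently downloading);
    shard_fraction is how much of that shard is done, from 0.0 to 1.0.
    """
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    if not 1 <= shard_index <= total_shards:
        raise ValueError("shard_index must be between 1 and total_shards")
    # Clamp the per-shard fraction so a noisy progress line cannot push
    # the overall value outside [0, 1].
    frac = min(max(shard_fraction, 0.0), 1.0)
    # (N - 1) shards are fully done, plus the fraction of the current one.
    return ((shard_index - 1) + frac) / total_shards


# Halfway through shard 3 of 10 is 25% overall, not 50%.
progress = overall_progress(3, 0.5, 10)
```

Reporting the per-shard fraction alone would show 50% here, which is the kind of misleading figure the commit describes replacing.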
|
|
d42e6a7acc |
Scope skill mutations to caller owner
SkillsManager.update_skill walks every SKILL.md on disk and matches by
slug only; the 'owner' key in its scalar_keys whitelist meant a caller
could pass updates={'owner': 'attacker', 'description': 'pwned'} and the
first matching file on disk got silently re-owned. Two users with the
same slug under different category directories (which is supported by
the on-disk layout <category>/<name>/SKILL.md) could each stomp the
other's skill via the manage_skills tool or the in-process callers in
tool_implementations.py (edit, patch, publish, delete).
update_skill and delete_skill now require the caller's owner and only
match a file whose parsed owner field matches. The default of None
means 'no scope' and only matches ownerless skills, so an unsafe call
without an explicit owner is now a no-op. 'owner' is also removed from
scalar_keys so the updates dict cannot be used to reassign ownership
even when the manager is called from an in-process path that didn't
supply the owner argument.
The in-process callers in tool_implementations.py are updated to pass
owner=owner (which was already in scope at every call site) so the
HTTP and agent paths both go through the scoped check. The HTTP route
at routes/skills_routes.py:1499 was already owner-scoped via
sm.load(owner=user); the fix brings the in-process path up to the
same standard.
|
||
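The owner scoping described in the commit above might look roughly like the sketch below. It is a simplified stand-in, not the actual SkillsManager: the `Skill` dataclass, the in-memory list (in place of walking SKILL.md files on disk), and the contents of `SCALAR_KEYS` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Keys a caller may change through `updates` (hypothetical set).
# 'owner' is deliberately absent so ownership cannot be reassigned.
SCALAR_KEYS = {"description", "category"}


@dataclass
class Skill:
    slug: str
    owner: Optional[str]
    description: str = ""
    category: str = ""


def update_skill(skills: List[Skill], slug: str, updates: Dict[str, str],
                 owner: Optional[str] = None) -> Optional[Skill]:
    """Update the first skill matching BOTH slug and owner.

    owner=None only matches ownerless skills, so a call that forgets to
    pass an owner cannot touch another user's skill.
    """
    for skill in skills:
        if skill.slug != slug or skill.owner != owner:
            continue
        for key, value in updates.items():
            if key in SCALAR_KEYS:  # silently ignore 'owner' and unknown keys
                setattr(skill, key, value)
        return skill
    return None  # no match: the call is a no-op


skills = [Skill("deploy", "alice", "a"), Skill("deploy", "bob", "b")]
updated = update_skill(skills, "deploy",
                       {"owner": "attacker", "description": "new text"},
                       owner="bob")
```

In this usage only bob's entry changes, its owner stays "bob", and alice's skill with the same slug is left alone.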
|
|
9b1acf6612 |
Fix year extraction in research queries
* fix: extract full year in research query entities, not just the century
* fix: same year capture-group bug in the services search copy
* test: research query extracts the full year
|
||
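The commit above does not show the pattern involved, so the following is only a guess at the general shape of a "century instead of year" capture-group bug. Both regular expressions and the function names are hypothetical. In Python, `re.findall` returns the captured group rather than the whole match when the pattern contains exactly one group, which yields "20" instead of "2024".

```python
import re

# Hypothetical buggy shape: the single capturing group makes findall
# return only the century prefix.
BUGGY_YEAR = re.compile(r"\b(19|20)\d{2}\b")

# Hypothetical fixed shape: a non-capturing group makes findall return
# the whole four-digit match.
FIXED_YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years_buggy(query: str) -> list:
    return BUGGY_YEAR.findall(query)


def extract_years(query: str) -> list:
    return FIXED_YEAR.findall(query)


query = "best GPUs released in 2024 versus 1999"
centuries = extract_years_buggy(query)  # ['20', '19']
years = extract_years(query)            # ['2024', '1999']
```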
|
|
033852ab14 | fix: require GGUF sources for llama downloads (#368) | ||
|
|
9955f5bc95 |
Fix VRAM estimates for pre-quantized HF repos
The Cookbook fit scanner was reporting impossibly low VRAM requirements
for some pre-quantized models — e.g. cyankiwi/Qwen3-Coder-Next-REAM-AWQ-4bit
shown as 7.1 GB ('perfect' on a 12 GB card) when the real load is ~40 GB.
Root cause is in the catalog builder. When _entry_from_modelinfo falls
back to safetensors metadata for the parameter count, it stored
safetensors.total directly. For pre-quantized repos that figure reflects
*packed* element counts: AWQ/GPTQ-Int4 pack 8x 4-bit weights into one
I32, AWQ-8bit/GPTQ-Int8/FP8 pack 4x. The catalog therefore recorded
~1/8 of the real parameter count, and min_vram_gb = packed * bpp
double-applied the quantization.
Fix the safetensors fallback:
* prefer the per-dtype parameters dict when available and unpack only the
I32/I64 entries (the F16/BF16 scale/zero tensors and embeddings are
already at their real element counts)
* fall back to total * pack_factor when only total is exposed
Patch the catalog entries that were affected by the old fallback so the
fit ratings reflect reality without waiting for a full catalog rebuild:
* cyankiwi/Qwen3-Coder-Next-REAM-AWQ-4bit 11.4B -> 79.7B (40.8 GB VRAM)
* stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ 4.6B -> 30.5B
* stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ 5.1B -> 30.5B
* warshanks/Qwen3-8B-abliterated-AWQ 2.2B -> 8.2B
* QuantTrio/sarvam-30b-AWQ 7B -> 30B
* QuantTrio/sarvam-105b-AWQ 19B -> 105B
Closes #377.
|
||
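The safetensors fallback described in the commit above can be sketched as below. This is not the catalog builder's code: `real_param_count` and its arguments are hypothetical names, the metadata is passed in as a plain dict, and applying the same pack factor to I64 entries as to I32 entries is an assumption of this sketch.

```python
from typing import Mapping, Optional

# Dtypes assumed to hold packed quantized weights.
PACKED_DTYPES = {"I32", "I64"}


def real_param_count(total: int, pack_factor: int,
                     per_dtype: Optional[Mapping[str, int]] = None) -> int:
    """Estimate the unpacked parameter count of a pre-quantized repo.

    pack_factor is 8 for 4-bit formats (AWQ/GPTQ-Int4) and 4 for the
    8-bit formats, per the commit message.
    """
    if per_dtype:
        # Unpack only the packed integer tensors; F16/BF16 scales, zeros
        # and embeddings already report their real element counts.
        return sum(
            count * pack_factor if dtype in PACKED_DTYPES else count
            for dtype, count in per_dtype.items()
        )
    # Only the aggregate is available: scale the whole figure.
    return total * pack_factor


# 1000 packed I32 elements at 4 bits each, plus 50 F16 scale values.
with_dtypes = real_param_count(0, 8, {"I32": 1000, "F16": 50})
total_only = real_param_count(1000, 8)
```

The per-dtype path gives 8050 here, while the total-only path would scale any unpacked tensors as well, which is why the commit prefers the dict when it is available.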
|
|
14e8cffa41 |
Fail closed on untrusted teacher draft confidence
Follow-up to #275.
get_relevant_skills() treats a missing/unparseable confidence as 1.0, so it always clears the injection threshold. For teacher-escalation drafts -- auto-written from a possibly untrusted trace and then injected as authoritative guidance -- that means a draft can be auto-injected regardless of the configured confidence bar.
Require teacher-escalation drafts to carry an explicit, parseable confidence that meets min_confidence; fail closed otherwise. Hand-authored legacy drafts keep the lenient "unset -> keep" behavior so they don't silently vanish, and published skills are unaffected.
Ran: python -m py_compile services/memory/skills.py + a get_relevant_skills unit check (teacher drafts with None/garbage/0.8 excluded at min=0.85; 0.9 included; legacy + published unaffected; gate-off control unchanged).
Co-authored-by: Fernando Lazzarin <263019791+waitdeadai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
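The fail-closed rule in the commit above can be sketched as a small gate. This is an illustration only, not the code in services/memory/skills.py: `parse_confidence`, `passes_gate`, and the `is_teacher_draft` flag are hypothetical, and treating values outside [0, 1] as unparseable is an assumption of this sketch. Published skills, which the commit says are unaffected, are not modeled.

```python
from typing import Any, Optional


def parse_confidence(raw: Any) -> Optional[float]:
    """Return a confidence in [0, 1], or None if missing or unparseable."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    # NaN fails both comparisons, so it is rejected here as well.
    return value if 0.0 <= value <= 1.0 else None


def passes_gate(raw_confidence: Any, min_confidence: float,
                is_teacher_draft: bool) -> bool:
    confidence = parse_confidence(raw_confidence)
    if confidence is None:
        # Teacher-escalation drafts fail closed; hand-authored legacy
        # drafts keep the lenient "unset -> keep" behavior.
        return not is_teacher_draft
    return confidence >= min_confidence


# Mirrors the cases listed in the commit's unit check at min=0.85.
results = [passes_gate(c, 0.85, True) for c in (None, "garbage", 0.8, 0.9)]
legacy_unset = passes_gate(None, 0.85, False)
```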
|
|
0888a3b3e6 | Add native Windows compatibility layer | ||
|
|
f1817fd560 |
Add macOS Apple Silicon Cookbook support
* Add Apple Silicon (Metal) GPU detection and unified-memory fit tuning
hardware.py detects Apple Silicon locally and over SSH, reporting backend=metal, the chip name, and a RAM-scaled fraction of unified memory as the usable GPU budget. fit.py gains an M1-M4 memory-bandwidth table for realistic tok/s and drops vLLM-only formats (AWQ/GPTQ/FP8) that can't be served on Metal.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit |
||
|
|
e5c99a5eee | Odysseus v1.0 |