mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-30 00:22:10 -04:00
Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI
Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
  ascending then truncated top-N, which made "highest VRAM" mathematically
  unreachable). Every column path uses reverse=True for the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
  re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
  default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
  Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
  with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
  actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
  QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.
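The VRAM sort fix in the first Backend bullet comes down to the order of sorting and truncating. A minimal sketch of the failure mode, in JavaScript for consistency with the diff below (the backend itself uses Python's `reverse=True`); the row shape, the `vram_gb` field, and both function names are hypothetical and not taken from services/hwfit:

```javascript
// Hypothetical rows; only the numeric vram_gb field matters for this sketch.
const rows = [
  { name: 'a', vram_gb: 12 },
  { name: 'b', vram_gb: 160 },
  { name: 'c', vram_gb: 48 },
  { name: 'd', vram_gb: 284 },
];

// Fixed shape: sort descending, then truncate, so the global maximum survives.
function topNDescending(items, key, n) {
  return [...items].sort((x, y) => y[key] - x[key]).slice(0, n);
}

// Buggy shape: sort ascending, then truncate. The largest rows are cut
// before the user can ever see them, whatever direction is shown later.
function topNAscending(items, key, n) {
  return [...items].sort((x, y) => x[key] - y[key]).slice(0, n);
}

const fixedNames = topNDescending(rows, 'vram_gb', 2).map(r => r.name);
const buggyNames = topNAscending(rows, 'vram_gb', 2).map(r => r.name);
console.log(fixedNames, buggyNames);
```

With the ascending variant, row `d` (the highest VRAM) can never appear in a top-2 result, which is the "mathematically unreachable" case the bullet describes.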
Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
  re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
  later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
  tooltip. Smart title-suffix strips the parts already in the repo name
  (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
  (#1962) for model-add requests.
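The title-suffix rule from the Quant bullet can be read as a small pure function. The sketch below restates that rule without the HTML escaping and markup; the function name `quantSuffix` is hypothetical, and the 9-character cut-off is the one quoted in the bullet:

```javascript
// Returns the "(…)" suffix for a model title, or '' when the repo name
// already contains every part of the quant tag (or there is no tag).
function quantSuffix(repoId, quantTag) {
  const short = (repoId || '').split('/').pop() || '';
  const tag = (quantTag || '').trim();
  if (!tag) return '';
  const lowerShort = short.toLowerCase();
  // Split "AWQ-4bit" / "Q4_K_M" style tags into their parts.
  const parts = tag.split(/[-_]/).filter(Boolean);
  // Keep only the parts the repo name does not already mention.
  const remaining = parts.filter(p => !lowerShort.includes(p.toLowerCase()));
  if (remaining.length === 0) return '';
  let display = remaining.join('-');
  if (display.length > 9) display = display.slice(0, 9) + '…';
  return `(${display})`;
}

console.log(quantSuffix('QuantTrio/MiniMax-M2-AWQ', 'AWQ-4bit'));
console.log(quantSuffix('DeepSeek-V4-Flash', 'FP4-MoE-Mixed'));
```

Matching is by substring, so a short part such as `M` would be treated as "already present" in almost any repo name; that is a property of the rule as described, not something this sketch tries to correct.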
Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
  "Download complete"). True overall % for multi-shard downloads:
  ((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
  live shard line, the pill is hidden + relabels as "reconnect" + revives
  on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
  to 8s), scan persisted done/error/crashed downloads and probe their
  tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
  state is done but tmux is still alive revives the existing task and
  refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
  /api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
  DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
  shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
  status got stuck. Dir text on the running row, font matched to uptime.
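The progress formula and finish check in the first Running-tab bullets can be sketched as below. All function names are hypothetical. The linear ETA extrapolation is an assumption, since the message only shows the ticker format, and `isDownloadFinished` only mirrors the two markers the message names:

```javascript
// Overall fraction for a multi-shard download: ((N-1) + frac) / total,
// where N is the 1-based index of the shard in flight and frac is 0..1.
function overallFraction(shardIndex, shardFrac, total) {
  if (!(total > 0)) return 0;
  const frac = Math.min(Math.max(shardFrac, 0), 1);
  const finished = Math.min(Math.max(shardIndex - 1, 0), total);
  return Math.min((finished + frac) / total, 1);
}

// Assumed ETA model: remaining time scales linearly with remaining work.
function etaSeconds(elapsedSeconds, fraction) {
  if (!(fraction > 0)) return null; // no progress yet, nothing to extrapolate
  if (fraction >= 1) return 0;
  return elapsedSeconds * (1 - fraction) / fraction;
}

// "12m 34s" below one hour, "1h 23m" from one hour up.
function formatDuration(seconds) {
  const s = Math.max(0, Math.round(seconds));
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  return h > 0 ? `${h}h ${m}m` : `${m}m ${s % 60}s`;
}

// Strict finish check: a bare "Download complete" line is not enough.
function isDownloadFinished(output) {
  return output.includes('DOWNLOAD_OK') || output.includes('/snapshots/');
}

const frac = overallFraction(3, 0.5, 10); // third of ten shards, half done
console.log(`downloading: ${formatDuration(754)} · ETA ${formatDuration(4980)}`, frac);
```

A per-shard percentage would report 50% in the example, while the overall figure is 25%, which is the discrepancy the bullet corrects.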
Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
  metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
  matches body; action buttons get icons on the left (Retry/Copy/Edit/
  Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
  downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
+71
-12
@@ -527,6 +527,9 @@ export async function _hwfitFetch(fresh = false) {
 if (useCase) params.set('use_case', useCase);
 if (quantPref) params.set('quant', quantPref);
 if (targetCtx) params.set('ctx', String(targetCtx));
+// Fit-only filter — set by the dot in the Fit column header.
+const _fitOnly = (() => { try { return localStorage.getItem('hwfit_fit_only_v1') === '1'; } catch { return false; } })();
+if (_fitOnly) params.set('fit_only', '1');
 }
 const endpoint = isImageMode ? `/api/hwfit/image-models?${params}` : `/api/hwfit/models?${params}`;
 const res = await fetch(endpoint);
@@ -888,9 +891,15 @@ export function _hwfitRenderList(el, models) {
 arrow = isReversed ? ' \u25B2' : ' \u25BC';
 }
 const dataAttr = col.key ? ` data-sort="${col.key}"` : '';
-const label = (col.cls === 'hwfit-fit' && _budget)
-? `${col.label} <span style="font-size:0.75em;opacity:0.6;font-weight:normal;">(${_budget})</span>`
-: col.label;
+// Fit column gets a small dot to its left that toggles "show only models
+// that fit" — replaces the old Fits On/Off button next to the toolbar.
+let label = col.label;
+if (col.cls === 'hwfit-fit') {
+const _fitOnly = (() => { try { return localStorage.getItem('hwfit_fit_only_v1') === '1'; } catch { return false; } })();
+label = `<span class="hwfit-fit-dot${_fitOnly ? ' active' : ''}" title="${_fitOnly ? 'Showing only models that fit. Click to also show too-tight rows.' : 'Click to show only models that fit your hardware.'}" data-fit-dot>●</span>${col.label}`;
+// (Budget tag removed — the GPU/RAM/N-GPU suffix next to "Fit" was noise;
+// the toggle row already shows which budget is active.)
+}
 html += `<span class="hwfit-col ${col.cls}${sortable}${active}"${dataAttr}>${label}${arrow}</span>`;
 }
 html += '</div>';
@@ -910,9 +919,31 @@ export function _hwfitRenderList(el, models) {
 const dlDot = (_cachedModelIds && (_cachedModelIds.has(m.name) || [..._cachedModelIds].some(id => id === m.name?.split('/').pop()))) ? '<span class="hwfit-dl-dot" title="Downloaded">\u25CF</span>' : '';
 html += `<div class="hwfit-row" data-model="${esc(m.name)}">`;
 html += `<span class="hwfit-col hwfit-fit" style="color:${fitColor}">${esc(fitLabel)}</span>`;
-html += `<span class="hwfit-col hwfit-name">${modelLogo(m.name)}${esc(m.name?.split('/').pop() || m.name)}${moeBadge}${imgBadge}${dlDot}</span>`;
+// Append quant to the title when it's not already in the repo name. The
+// suffix strips quant-parts the name already contains — e.g. for
+// QuantTrio/MiniMax-M2-AWQ + quant=AWQ-4bit we just show "(4bit)", not
+// "(AWQ-4bit)". DeepSeek-V4-Flash + FP4-MoE-Mixed keeps the full tag
+// (none of those parts are in the repo id).
+const _short = m.name?.split('/').pop() || m.name || '';
+const _quantTag = (m.quant || '').trim();
+const _lowerShort = _short.toLowerCase();
+let _quantSuffix = '';
+if (_quantTag) {
+const _parts = _quantTag.split(/[-_]/).filter(Boolean);
+const _remaining = _parts.filter(p => !_lowerShort.includes(p.toLowerCase()));
+if (_remaining.length && _remaining.length < _parts.length + 1) { // at least one part is new
+let _display = _remaining.join('-');
+if (_display.length > 9) _display = _display.slice(0, 9) + '…';
+_quantSuffix = ` <span class="hwfit-name-quant" title="${esc(_quantTag)} — full storage format">(${esc(_display)})</span>`;
+}
+}
+html += `<span class="hwfit-col hwfit-name">${modelLogo(m.name)}${esc(_short)}${_quantSuffix}${moeBadge}${imgBadge}${dlDot}</span>`;
 html += `<span class="hwfit-col hwfit-c-params">${esc(pcount)}</span>`;
-html += `<span class="hwfit-col hwfit-c-quant">${esc(m.quant || '?')}</span>`;
+// Truncate the Quant cell to 9 chars + ellipsis so long tags like
+// "FP4-MoE-Mixed" don't push neighboring columns. Full tag stays in title.
+const _qRaw = m.quant || '?';
+const _qShort = _qRaw.length > 9 ? _qRaw.slice(0, 9) + '…' : _qRaw;
+html += `<span class="hwfit-col hwfit-c-quant" title="${esc(_qRaw)}">${esc(_qShort)}</span>`;
 html += `<span class="hwfit-col hwfit-c-vram">${vramLabel}</span>`;
 html += `<span class="hwfit-col hwfit-c-ctx">${m.is_image_gen ? '\u2014' : ctx}</span>`;
 html += `<span class="hwfit-col hwfit-c-speed">${m.is_image_gen ? '\u2014' : tps + ' t/s'}</span>`;
@@ -934,7 +965,26 @@ export function _hwfitRenderList(el, models) {
 });
 // Clickable header columns → sort (click again to toggle direction)
 el.querySelectorAll('.hwfit-header .hwfit-sortable').forEach(col => {
-col.addEventListener('click', () => {
+col.addEventListener('click', (e) => {
+// The little dot inside the Fit header is its own toggle (fit-only
+// filter), don't let it fall through to a sort click.
+if (e.target.closest('[data-fit-dot]')) {
+const on = !e.target.classList.contains('active');
+try { localStorage.setItem('hwfit_fit_only_v1', on ? '1' : '0'); } catch {}
+// Un-toggling the fit filter (off → showing too-tight rows again) is
+// typically because the user wants to see the LARGE models they can't
+// run yet — re-sort by VRAM descending so the biggest surface first.
+if (!on) {
+const sortSel = document.getElementById('hwfit-sort');
+if (sortSel) {
+sortSel.value = 'vram';
+sortSel.dataset.reverse = '0'; // descending (biggest first)
+}
+}
+_hwfitCache = null;
+_hwfitFetch();
+return;
+}
 const sortKey = col.dataset.sort;
 if (!sortKey) return;
 const sel = document.getElementById('hwfit-sort');
@@ -1018,7 +1068,16 @@ export function _expandModelRow(row, modelData) {
 if (modelData.is_image_gen) {
 html += `<div style="font-size:10px;opacity:0.5;margin-top:4px;">${esc((modelData.capabilities || []).join(' \u00B7 ') || '')}${modelData.description ? ' \u2014 ' + esc(modelData.description) : ''}</div>`;
 } else if (_requiresAcceleratorBackend(modelData)) {
-html += `<div class="hwfit-panel-note">This is a safetensors GPU-serving format. Use vLLM/SGLang with a visible CUDA/ROCm accelerator, or pick a GGUF download for llama.cpp/Ollama.</div>`;
+// Only show the "needs CUDA/ROCm" note when the host doesn't already have
+// one. With a visible CUDA/ROCm accelerator the note is noise — the user
+// can already serve the model and reading the warning on every row makes
+// the panel feel like everything's broken.
+const _sys = _hwfitCache?.system || {};
+const _backend = (_sys.backend || '').toLowerCase();
+const _hasGpuAccel = !!_sys.has_gpu && (_backend === 'cuda' || _backend === 'rocm');
+if (!_hasGpuAccel) {
+html += `<div class="hwfit-panel-note">This is a safetensors GPU-serving format. Use vLLM/SGLang with a visible CUDA/ROCm accelerator, or pick a GGUF download for llama.cpp/Ollama.</div>`;
+}
 }
 html += `</div>`;
 
@@ -1243,14 +1302,14 @@ export function _hwfitInit() {
 const targetCtx = _ctxValue();
 try { localStorage.setItem(_CTX_KEY, String(targetCtx)); } catch {}
 // Ctx drag affects sort mode: a specific ctx target (anything < Max)
-// implies the user is hunting for "what fits at this context length",
-// so re-rank by fit (lowest first). Dragging back to Max means no
-// ctx constraint → go back to the default score-based ranking.
+// implies "what runs at this context length" — sort by VRAM ascending
+// so the cheapest-fitting models surface first. Dragging back to Max
+// releases the constraint → go back to the default score ranking.
 const sortSel = document.getElementById('hwfit-sort');
 if (sortSel) {
 if (targetCtx) {
-sortSel.value = 'fit';
-sortSel.dataset.reverse = '1';
+sortSel.value = 'vram';
+sortSel.dataset.reverse = '1'; // ascending = smallest VRAM first
 } else {
 sortSel.value = 'score';
 sortSel.dataset.reverse = '';
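The diff reads and writes the `hwfit_fit_only_v1` flag in three places, each time wrapped in `try`/`catch` because storage access can throw (for example when the browser blocks it). A minimal sketch of that pattern as two helpers, assuming a storage object is passed in so the logic can run outside a browser; the helper names and the in-memory stand-ins are hypothetical and not part of the commit:

```javascript
const FIT_ONLY_KEY = 'hwfit_fit_only_v1'; // key name taken from the diff

// Read the flag; any storage failure counts as "filter off".
function readFitOnly(storage) {
  try { return storage.getItem(FIT_ONLY_KEY) === '1'; } catch { return false; }
}

// Persist the flag as '1' / '0'; returns false if the write was rejected.
function writeFitOnly(storage, on) {
  try { storage.setItem(FIT_ONLY_KEY, on ? '1' : '0'); return true; } catch { return false; }
}

// Hypothetical in-memory stand-in with the two Storage methods used above.
function makeMemoryStorage() {
  const data = new Map();
  return {
    getItem: key => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

// Stand-in for a browser that blocks storage access entirely.
const blockedStorage = {
  getItem() { throw new Error('storage blocked'); },
  setItem() { throw new Error('storage blocked'); },
};

const mem = makeMemoryStorage();
writeFitOnly(mem, true);
console.log(readFitOnly(mem), readFitOnly(blockedStorage));
```

In the browser the same helpers would be called with `window.localStorage`; passing the storage in is only a convenience for testing.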