Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI

Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
  ascending then truncated top-N, which made "highest VRAM"
  mathematically unreachable). Every column path uses reverse=True for
  the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
  re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve
  them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn): MiniMax-M2.7 ranks above
  M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization
  flipped Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base
  variants registered with new FP4-MoE-Mixed / FP8-Mixed quant keys
  (calibrated BPP from the actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP /
  QUANT_SPEED_MULT / QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM /
  PREQUANTIZED_PREFIXES.

Frontend (Scan/Download):
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max.
  Drag re-sorts by vram ascending (smallest fitting first); back to
  Max → score.
- Ctx slider rail now visible; it was background:transparent in a
  duplicate later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder
  dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM
  desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
  tooltip. Smart title-suffix strips the parts already in the repo name
  (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when
  folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
  (#1962) for model-add requests.

Frontend (Running tab):
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
  "Download complete"). True overall % for multi-shard downloads:
  ((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
  live shard line, the pill is hidden + relabels as "reconnect" +
  revives on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
  to 8s), scan persisted done/error/crashed downloads and probe their
  tmux session; if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
  state is done but tmux is still alive revives the existing task and
  refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
  /api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
  DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
  shard is N<total, so stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if
  persisted status got stuck. Dir text on the running row, font matched
  to uptime.

Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
  metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
  matches body; action buttons get icons on the left
  (Retry/Copy/Edit/Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is downloading /
  stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
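
Note on the multi-shard percentage: the ((N-1)+frac)/total formula from the Running tab section can be sketched as below. This is a minimal illustration, not the code in this commit; the function and parameter names are hypothetical, and it assumes shards download one at a time, with N being the 1-based index of the shard in flight and frac its 0..1 progress.

```python
def overall_fraction(shard_index: int, shard_fraction: float, total_shards: int) -> float:
    """Overall progress of a sequential multi-shard download.

    shard_index    -- 1-based index of the shard currently downloading (N)
    shard_fraction -- progress of that shard, 0.0 to 1.0 (frac)
    total_shards   -- total number of shards (total)
    All names here are illustrative, not taken from the project.
    """
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    # Clamp inputs so a noisy log line cannot push the result outside 0..1.
    n = min(max(shard_index, 1), total_shards)
    frac = min(max(shard_fraction, 0.0), 1.0)
    # Shards before N are complete; shard N contributes its partial fraction.
    return ((n - 1) + frac) / total_shards


print(f"{overall_fraction(3, 0.5, 10):.0%}")  # shard 3 of 10, half done -> 25%
```

A per-shard bar would read 50% in this situation, which is why the overall figure is computed from the shard index rather than from the progress of the current shard alone.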
2026-06-16 17:55:26 -04:00 · 2026-06-03 20:25:25 +09:00
parent 3706d756f3
commit 562bc4dedc
12 changed files with 669 additions and 115 deletions
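
Note on the calibrated BPP values: the arithmetic behind "calibrated BPP from the actual 156 GB / 284 GB disk footprints" can be approximated as below. This is a sketch under stated assumptions: that 156 GB belongs to DeepSeek-V4-Flash (FP4-MoE-Mixed) and 284 GB to DeepSeek-V4-Flash-Base (FP8-Mixed), as the min_vram_gb fields in the diff suggest, and that GB means decimal gigabytes. Whether the project's BPP tables store bits or bytes per parameter is not stated here, so both are printed. The helper name is hypothetical.

```python
GB = 1_000_000_000  # assumption: decimal gigabytes


def bytes_per_param(disk_gb: float, params: int) -> float:
    """Average bytes stored per parameter for a checkpoint of a given size."""
    if params <= 0:
        raise ValueError("params must be positive")
    return disk_gb * GB / params


PARAMS = 284_000_000_000  # parameters_raw for both Flash entries in the diff

flash = bytes_per_param(156.0, PARAMS)       # assumed FP4-MoE-Mixed footprint
flash_base = bytes_per_param(284.0, PARAMS)  # assumed FP8-Mixed footprint

print(f"FP4-MoE-Mixed: {flash:.3f} bytes/param ({flash * 8:.2f} bits)")
# FP4-MoE-Mixed: 0.549 bytes/param (4.39 bits)
print(f"FP8-Mixed:     {flash_base:.3f} bytes/param ({flash_base * 8:.2f} bits)")
# FP8-Mixed:     1.000 bytes/param (8.00 bits)
```

The FP4-MoE-Mixed figure landing above 4 bits is consistent with a mixed scheme in which only part of the weights are stored in 4-bit form.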
@@ -5110,6 +5110,100 @@
  "release_date": "2023-10-29",
  "_discovered": true
 },
+ {
+  "name": "deepseek-ai/DeepSeek-V4-Flash",
+  "provider": "deepseek-ai",
+  "parameter_count": "284B",
+  "parameters_raw": 284000000000,
+  "active_parameters": 13000000000,
+  "is_moe": true,
+  "min_ram_gb": 200.0,
+  "recommended_ram_gb": 320.0,
+  "min_vram_gb": 156.0,
+  "quantization": "FP4-MoE-Mixed",
+  "context_length": 1000000,
+  "use_case": "General-purpose reasoning, long-context",
+  "capabilities": [
+   "long_context",
+   "reasoning",
+   "moe"
+  ],
+  "pipeline_tag": "text-generation",
+  "architecture": "deepseek_v4_moe",
+  "hf_downloads": 3542202,
+  "hf_likes": 0,
+  "release_date": "2026-05-15"
+ },
+ {
+  "name": "deepseek-ai/DeepSeek-V4-Flash-Base",
+  "provider": "deepseek-ai",
+  "parameter_count": "284B",
+  "parameters_raw": 284000000000,
+  "active_parameters": 13000000000,
+  "is_moe": true,
+  "min_ram_gb": 290.0,
+  "recommended_ram_gb": 460.0,
+  "min_vram_gb": 284.0,
+  "quantization": "FP8-Mixed",
+  "context_length": 1000000,
+  "use_case": "Base pretrained \u2014 fine-tuning starting point",
+  "capabilities": [
+   "long_context",
+   "moe"
+  ],
+  "pipeline_tag": "text-generation",
+  "architecture": "deepseek_v4_moe",
+  "hf_downloads": 0,
+  "hf_likes": 0,
+  "release_date": "2026-05-15"
+ },
+ {
+  "name": "deepseek-ai/DeepSeek-V4-Pro",
+  "provider": "deepseek-ai",
+  "parameter_count": "1.6T",
+  "parameters_raw": 1600000000000,
+  "active_parameters": 49000000000,
+  "is_moe": true,
+  "min_ram_gb": 1100.0,
+  "recommended_ram_gb": 1800.0,
+  "min_vram_gb": 880.0,
+  "quantization": "FP4-MoE-Mixed",
+  "context_length": 1000000,
+  "use_case": "Flagship reasoning, long-context",
+  "capabilities": [
+   "long_context",
+   "reasoning",
+   "moe"
+  ],
+  "pipeline_tag": "text-generation",
+  "architecture": "deepseek_v4_moe",
+  "hf_downloads": 0,
+  "hf_likes": 0,
+  "release_date": "2026-05-15"
+ },
+ {
+  "name": "deepseek-ai/DeepSeek-V4-Pro-Base",
+  "provider": "deepseek-ai",
+  "parameter_count": "1.6T",
+  "parameters_raw": 1600000000000,
+  "active_parameters": 49000000000,
+  "is_moe": true,
+  "min_ram_gb": 1700.0,
+  "recommended_ram_gb": 2600.0,
+  "min_vram_gb": 1600.0,
+  "quantization": "FP8-Mixed",
+  "context_length": 1000000,
+  "use_case": "Base pretrained \u2014 fine-tuning starting point",
+  "capabilities": [
+   "long_context",
+   "moe"
+  ],
+  "pipeline_tag": "text-generation",
+  "architecture": "deepseek_v4_moe",
+  "hf_downloads": 0,
+  "hf_likes": 0,
+  "release_date": "2026-05-15"
+ },
 {
  "name": "deepseek-ai/deepseek-coder-6.7b-base",
  "provider": "DeepSeek",
@@ -13886,53 +13980,6 @@
  "gguf_sources": [],
  "capabilities": []
 },
- {
-  "name": "deepseek-ai/DeepSeek-V4-Flash",
-  "provider": "DeepSeek",
-  "parameter_count": "158B",
-  "parameters_raw": 158000000000,
-  "min_ram_gb": 165.0,
-  "recommended_ram_gb": 205.0,
-  "min_vram_gb": 165.0,
-  "quantization": "FP8",
-  "context_length": 1000000,
-  "use_case": "General purpose, reasoning (MoE)",
-  "is_moe": true,
-  "num_experts": null,
-  "active_experts": null,
-  "active_parameters": 13000000000,
-  "architecture": "deepseek_v4",
-  "pipeline_tag": "text-generation",
-  "release_date": "2026-04-22",
-  "gguf_sources": [
-   {
-    "repo": "unsloth/DeepSeek-V4-Flash",
-    "provider": "unsloth"
-   }
-  ],
-  "capabilities": []
- },
- {
-  "name": "deepseek-ai/DeepSeek-V4-Pro",
-  "provider": "DeepSeek",
-  "parameter_count": "1600B",
-  "parameters_raw": 1600000000000,
-  "min_ram_gb": 928.5,
-  "recommended_ram_gb": 1207.0,
-  "min_vram_gb": 928.5,
-  "quantization": "Q4_K_M",
-  "context_length": 1000000,
-  "use_case": "Frontier reasoning (MoE)",
-  "is_moe": true,
-  "num_experts": null,
-  "active_experts": null,
-  "active_parameters": 49000000000,
-  "architecture": "deepseek_v4",
-  "pipeline_tag": "text-generation",
-  "release_date": "2026-04-22",
-  "gguf_sources": [],
-  "capabilities": []
- },
 {
  "name": "google/gemma-4-E2B-it",
  "provider": "Google",