mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-17 02:05:22 -04:00
f1817fd560
* Add Apple Silicon (Metal) GPU detection and unified-memory fit tuning

  hardware.py detects Apple Silicon locally and over SSH, reporting
  backend=metal, the chip name, and a RAM-scaled fraction of unified memory
  as the usable GPU budget. fit.py gains an M1-M4 memory-bandwidth table for
  realistic tok/s and drops vLLM-only formats (AWQ/GPTQ/FP8) that can't be
  served on Metal.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  (cherry picked from commit 32ac81dbc6)

* Generate macOS/Metal serve commands and surface the Metal GPU

  cookbook_routes.py adds a macOS serve path (Ollama, Metal-aware llama.cpp
  build using `sysctl hw.ncpu` instead of `nproc`, and a clear error if vLLM
  is attempted). The frontend defaults Metal serving to llama.cpp and offers
  llama.cpp/Ollama instead of vLLM/SGLang. The odysseus-cookbook CLI's `gpus`
  command reports the Metal GPU via sysctl/vm_stat.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  (cherry picked from commit 4ba01ce25d)

* Add launchd LaunchAgent for macOS (systemd equivalent)

  com.odysseus.ui.plist + install-service-macos.sh run Odysseus at login and
  restart on crash, the macOS counterpart to odysseus-ui.service. The
  installer auto-fills paths from the venv, so there's no hand-editing.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  (cherry picked from commit 3d4b6b2c7b)

* Document macOS install (brew, Ollama, AirPlay port, launchd)

  README + setup.py cover the Homebrew / Apple Silicon path: brew install
  python@3.11 tmux ollama, Metal serving via Ollama/llama.cpp, the launchd
  service, and the macOS AirPlay Receiver conflict on ports 7000/5000.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  (cherry picked from commit 8dc9a3578a)

* Add downloadable macOS launcher app builder

  build-macos-app.sh generates dist/Odysseus.app and a drag-to-Applications
  dist/Odysseus.dmg. The app starts the local server from this repo's venv
  and opens the UI in a chrome-less app window (Chromium --app mode, falling
  back to the default browser). It's a launcher wrapper (it drives the venv
  rather than bundling Python), so the install path is baked in at build
  time.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  (cherry picked from commit 7927940c38)

* Harden macOS Cookbook support: hide MLX, fix Metal build cache

  Builds on the adopted PR #213 macOS/Metal work with two fixes and tests:
  - fit.py: always drop MLX-quantized models. Odysseus only generates serve
    commands for llama.cpp/Ollama (Metal) and vLLM/SGLang (CUDA); MLX needs
    the mlx_lm runtime and the catalog's MLX repos ship no GGUF alternative,
    so they were surfaced on Apple Silicon but could never be served.
  - cookbook_routes.py (macOS branch only): `rm -rf build` before configure
    so a poisoned CMakeCache from a prior failed CUDA attempt can't make
    every later build fail; explicit -DCMAKE_BUILD_TYPE=Release; a clear
    "brew install cmake" hint if cmake is missing. Linux/CUDA path unchanged.
  - tests/test_hwfit_macos.py: MLX hidden on metal, MLX still hidden on CUDA
    (regression guard), Metal detection on Apple Silicon, and skipped on
    Linux/Intel (proves non-macOS detection is untouched).

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Propagate unified_memory flag and document macOS GPU/Docker caveat

  - hardware.py: detect_system now carries the unified_memory flag from GPU
    detection into the system dict (it was set by _detect_apple_silicon /
    AMD-APU detection but dropped during result assembly, so the API always
    reported null). Lets callers distinguish unified from discrete VRAM.
  - README: prominent warning that Docker on Apple Silicon can't reach the
    Metal GPU (runs a Linux VM), so Cookbook must run natively for GPU
    serving; fix stale text that said Cookbook recommends MLX models (now
    hidden as unservable).
  - test: detect_system propagates unified_memory.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Put Odysseus's venv bin on PATH for cookbook runners

  Native (non-Docker) installs run from a virtualenv whose bin holds the `hf`
  CLI and `python3` the cookbook download/serve tmux scripts shell out to.
  Those scripts start in a fresh login shell with the venv NOT activated, so
  on a native macOS install `hf download` failed with "hf: command not
  found", and the `pip --user` self-heal missed because macOS has no bare
  `pip` command.
  - cookbook_helpers.py: _local_tooling_path_export(): a pure helper
    returning a PATH export for the running interpreter's bin dir (escaped
    for double quotes).
  - cookbook_routes.py: download + serve runners prepend that dir on local
    runs (gated off SSH/Windows); swap the `pip` install fallbacks to
    `python3 -m pip`.
  - tests: helper output for normal and spaced paths.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Document macOS llama.cpp serving prerequisites

  Clarify the two serving paths on Apple Silicon: the recommended zero-build
  route (brew install llama.cpp ships a Metal llama-server Cookbook finds on
  PATH), and the from-source fallback, which requires cmake + Xcode Command
  Line Tools. Without those the build is skipped and serving silently
  degrades to a slow CPU build, so new users now know to install them (or use
  the prebuilt) up front.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Recommend only GGUF-servable models on Metal

  Apple Silicon's only serving engines are llama.cpp and Ollama, both
  GGUF-only (vLLM/SGLang are CUDA/ROCm and don't run on macOS). The catalog
  tags raw safetensors repos with a default Q4_K_M quant, so the fit-ranking
  was recommending ~397/501 models that have no GGUF and fail to serve on
  Metal with "No GGUF found" (e.g. microsoft/Phi-mini-MoE-instruct). Drop any
  model without a real GGUF (is_gguf/gguf_sources) on Apple Silicon; this
  subsumes the previous AWQ/GPTQ/FP8 special-case into one rule. On CUDA
  these stay visible since vLLM serves safetensors directly. Metal
  recommendations go 501 -> 104, all actually servable.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Remove macOS launchd LaunchAgent (cherry-picked extra)

  Drop the launchd service from the PR #213 cherry-picks: the
  install-service-macos.sh installer, the com.odysseus.ui.plist template, and
  the README section documenting them. Tangential to the core Cookbook/Metal
  support and not wanted. The build-macos-app.sh launcher is kept.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add one-command macOS quick start (start-macos.sh)

  Running Odysseus natively on a Mac previously meant ~7 manual terminal
  steps (brew deps, venv, activate, pip, setup.py, uvicorn with the right
  port), which is not friendly for a generic macOS user, and the native run
  is required because Docker on macOS can't reach the Metal GPU.
  - start-macos.sh: installs Homebrew deps (python@3.11, tmux, prebuilt Metal
    llama.cpp), creates the venv, installs requirements, runs setup, and
    launches on a non-AirPlay port (7860). Idempotent; re-run to start again.
  - README: the Apple Silicon section now leads with this one-command quick
    start and the clickable .app, with engine/port/manual details folded into
    a collapsible block. Added a pointer at the top of the manual-install
    section.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* macOS quick start: auto-open browser when ready

  The "open this URL" line scrolled out of view as uvicorn kept logging after
  it, so users missed it. Now start-macos.sh waits (in the background) until
  the server accepts connections, prints a boxed "ready" banner at that point
  (i.e. after the startup burst, not before), and opens the URL in the
  default browser automatically. Skippable with ODYSSEUS_NO_OPEN=1 for
  headless/SSH use.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Don't assume/force a specific Python version on macOS

  The README claimed "system Python is 3.9", a machine-specific
  generalization that's often wrong (macOS ships no recent Python by default;
  many users already have 3.11+). Make it generic, and make start-macos.sh
  detect an existing Python 3.11+ and use it, only installing python@3.11
  when none is found instead of forcing it on top of the user's Python.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Align start-macos.sh venv path with build-macos-app.sh

  start-macos.sh created the environment in .venv/, but build-macos-app.sh
  and the manual install steps use venv/, so the clickable .app wouldn't
  reuse the quick-start's environment and would rebuild a second one. Use
  venv/ everywhere.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* README: state clearly that MLX is unsupported on Apple Silicon

  Odysseus has no mlx_lm runtime; it serves GGUF (llama.cpp/Ollama) and CUDA
  (vLLM/SGLang) only. MLX-only models can't run on a Mac and are hidden from
  Cookbook; make that explicit in both the quick start and the details.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* start-macos.sh: build the venv with an arm64 Python on Apple Silicon

  A clean-room run surfaced this: with a universal2/x86 Python (e.g. the
  python.org installer under /usr/local), the venv's compiled extensions
  install as arm64 but get loaded as x86_64 when launched from the .app
  bundle, so it crashes with "incompatible architecture (have arm64, need
  x86_64)". The terminal run happened to work only because a universal binary
  defaults to arm64 there. On Apple Silicon, look only under /opt/homebrew
  (arm64-only) for the build Python, and install Homebrew's python@3.11 if
  none is present, so the venv is arm64-only and launches correctly from both
  the terminal and the .app. Intel and non-mac paths are unchanged. Verified
  end-to-end in a clean clone: .app now boots on Metal with no arch error.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Address dev-exp review: macOS setup robustness + doc/UX fixes

  From the voltagent dev-exp review of the branch:
  - README: fix broken anchor links (the em-dash heading produced a slug the
    links didn't match); simplify the heading to a stable slug.
  - cookbook_routes.py: add /opt/homebrew/bin and /usr/local/bin to the serve
    PATH so a brew-installed llama-server/ollama is found instead of falling
    back to a slow source build.
  - start-macos.sh: guard against an empty Python path; fail fast with a
    clear message on port-in-use; ERR trap with a "safe to re-run" message;
    show pip progress (drop --quiet on the slow requirements install); stop
    the background browser-opener cleanly on exit/Ctrl+C (no orphaned
    poller).
  - setup.py: bind hint to 127.0.0.1; suppress the manual run-hint when
    launched by start-macos.sh (ODYSSEUS_SKIP_RUN_HINT) so the URL isn't
    contradictory.
  - build-macos-app.sh: the .app only opens the browser once the server is
    actually ready (not after the readiness timeout).
  - cookbookServe.js: drop "Diffusers" from the Metal backend picker:
    diffusion_server.py is CUDA-only, so it was an unservable option on
    macOS.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: yunggilja <yunggilja@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
553 lines
21 KiB
Python
import os
import platform
import subprocess
import time

CACHE_TTL = 1800  # 30 min; hardware rarely changes, use the Rescan button to force a re-probe


_remote_host = None  # set by detect_system(host=...)
_remote_port = None  # set by detect_system(ssh_port=...)
_remote_platform = None  # set by detect_system(platform=...): "windows", "linux", "termux"
_last_gpu_error = None  # set by _detect_nvidia() when nvidia-smi errors (driver mismatch, etc.)


def _run(cmd):
    """Run cmd locally, or over SSH when _remote_host is set.

    Returns stripped stdout on exit code 0, otherwise None. Errors and
    timeouts are swallowed so callers can treat None as "not available".
    """
    try:
        if _remote_host:
            # Run command on remote host via SSH
            if isinstance(cmd, list):
                cmd_str = " ".join(cmd)
            else:
                cmd_str = cmd
            ssh_cmd = ["ssh", "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no"]
            # str() so an integer port doesn't break the comparison or the argv list
            if _remote_port and str(_remote_port) != "22":
                ssh_cmd += ["-p", str(_remote_port)]
            ssh_cmd += [_remote_host, cmd_str]
            r = subprocess.run(
                ssh_cmd,
                capture_output=True, text=True, timeout=15,
            )
        else:
            r = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        if r.returncode == 0:
            return r.stdout.strip()
    except Exception:
        pass
    return None


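# Illustrative sketch (not part of this module): how the SSH argv that _run
# assembles could be built as a standalone, testable function.
# build_ssh_command is a hypothetical name. It swaps the plain " ".join(cmd)
# used above for shlex.join, which quotes arguments containing spaces; that
# substitution is an assumption about what a hardened variant might do.

```python
import shlex


def build_ssh_command(host, port, cmd):
    """Return the argv list for running cmd on host over SSH."""
    # shlex.join quotes each argument; the module above joins with plain spaces.
    cmd_str = shlex.join(cmd) if isinstance(cmd, list) else cmd
    ssh_cmd = ["ssh", "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no"]
    # Only pass -p for a non-default port, accepting int or str.
    if port and str(port) != "22":
        ssh_cmd += ["-p", str(port)]
    return ssh_cmd + [host, cmd_str]


argv = build_ssh_command("user@server", 2222, ["sysctl", "-n", "hw.memsize"])
print(argv)
```

# The remote command travels as a single trailing argument, so the remote
# shell is what splits it back into words.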
def _group_gpus(gpus):
    """Group identical GPUs by (name, rounded VRAM).

    vLLM tensor-parallel only works across IDENTICAL GPUs, so a mixed box must
    be split into homogeneous pools. Each group carries the device indices so a
    serve command can pin CUDA_VISIBLE_DEVICES to exactly one pool. Biggest pool
    (by total VRAM) first, since that's the sensible auto-default serving target.
    """
    groups = {}
    order = []
    for g in gpus:
        key = (g["name"], round(g["vram_gb"]))
        if key not in groups:
            groups[key] = {
                "name": g["name"],
                "vram_each": round(g["vram_gb"], 1),
                "count": 0,
                "indices": [],
            }
            order.append(key)
        groups[key]["count"] += 1
        groups[key]["indices"].append(g.get("index"))
    out = []
    for key in order:
        grp = groups[key]
        grp["vram_total"] = round(grp["vram_each"] * grp["count"], 1)
        out.append(grp)
    out.sort(key=lambda x: x["vram_total"], reverse=True)
    return out


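# Illustrative sketch (not part of this module): how a caller might turn the
# grouped output into a CUDA_VISIBLE_DEVICES value for the largest pool.
# pick_serving_pool and the sample data are hypothetical; the dict shape
# mirrors what _group_gpus returns.

```python
def pick_serving_pool(gpu_groups):
    """Return (group, CUDA_VISIBLE_DEVICES string) for the first pool."""
    if not gpu_groups:
        return None, ""
    # Groups are expected to arrive sorted by vram_total, largest first.
    pool = gpu_groups[0]
    visible = ",".join(str(i) for i in pool["indices"] if i is not None)
    return pool, visible


# Hypothetical mixed box: two 24 GB cards (indices 0 and 2) plus one 12 GB card.
sample_groups = [
    {"name": "GPU A", "vram_each": 24.0, "count": 2, "indices": [0, 2], "vram_total": 48.0},
    {"name": "GPU B", "vram_each": 12.0, "count": 1, "indices": [1], "vram_total": 12.0},
]
pool, visible = pick_serving_pool(sample_groups)
print(pool["name"], visible)
```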
def _detect_nvidia():
    """Detect NVIDIA GPUs via nvidia-smi. Returns a GPU info dict, or None."""
    global _last_gpu_error
    _last_gpu_error = None
    out = _run(["nvidia-smi", "--query-gpu=memory.total,name", "--format=csv,noheader,nounits"])
    # Remote fallback: a non-interactive SSH shell often has a minimal PATH
    # that omits where nvidia-smi lives (/usr/bin, /usr/local/cuda/bin), so the
    # first call silently returns nothing → "No GPU" on hosts that DO have GPUs.
    # Retry through a login shell with the common CUDA bin dirs on PATH.
    if not out and _remote_host:
        out = _run(
            "bash -lc 'export PATH=\"$PATH:/usr/bin:/usr/local/bin:/usr/local/cuda/bin\"; "
            "nvidia-smi --query-gpu=memory.total,name --format=csv,noheader,nounits'"
        )
    # Last resort: call nvidia-smi by absolute path. Some hosts have a login
    # shell that isn't bash (or a profile that errors), so the bash -lc retry
    # above still comes back empty even though the binary is right there.
    if not out and _remote_host:
        for _p in ("/usr/bin/nvidia-smi", "/usr/local/bin/nvidia-smi", "/usr/local/cuda/bin/nvidia-smi"):
            out = _run(f"{_p} --query-gpu=memory.total,name --format=csv,noheader,nounits")
            if out:
                break
    if not out:
        return None

    # nvidia-smi present but unable to talk to the driver (e.g. it was updated
    # without a reboot). It prints an error and no GPU rows; surface that as a
    # driver error rather than the misleading "No GPU".
    _low = out.lower()
    if ("nvml" in _low or "driver/library version mismatch" in _low
            or "couldn't communicate" in _low or "no devices were found" in _low
            or "failed to initialize" in _low):
        _last_gpu_error = out.strip().split("\n")[0][:140] or "NVIDIA driver error"
        return None

    gpus = []
    # nvidia-smi lists GPUs in index order (0,1,2,...), so the row position is
    # the CUDA device index we'd pass to CUDA_VISIBLE_DEVICES.
    for idx, line in enumerate(out.strip().split("\n")):
        parts = [p.strip() for p in line.split(",")]
        if len(parts) >= 2:
            try:
                vram_mb = float(parts[0])
                gpus.append({"index": idx, "name": parts[1], "vram_gb": vram_mb / 1024.0})
            except ValueError:
                continue

    if not gpus:
        return None
    total_vram = sum(g["vram_gb"] for g in gpus)
    groups = _group_gpus(gpus)
    return {
        "gpu_name": gpus[0]["name"],
        "gpu_vram_gb": round(total_vram, 1),
        "gpu_count": len(gpus),
        "gpus": gpus,
        "gpu_groups": groups,
        "homogeneous": len(groups) <= 1,
        "backend": "cuda",
    }


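# Illustrative sketch (not part of this module): the CSV parsing step on its
# own, fed a made-up two-row sample instead of real nvidia-smi output.
# parse_nvidia_smi_csv and the GPU names are hypothetical.

```python
def parse_nvidia_smi_csv(text):
    """Parse 'memory.total, name' rows (MiB, no header) into GPU dicts."""
    gpus = []
    for idx, line in enumerate(text.strip().split("\n")):
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 2:
            continue  # not a "<MiB>, <name>" row
        try:
            vram_mb = float(parts[0])
        except ValueError:
            continue  # first column was not numeric
        gpus.append({"index": idx, "name": parts[1], "vram_gb": vram_mb / 1024.0})
    return gpus


sample = "24576, Example GPU X\n12288, Example GPU Y"
gpus = parse_nvidia_smi_csv(sample)
print(gpus)
```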
def _detect_amd():
    """Detect AMD GPUs. Handles both discrete cards (with mem_info_vram_total)
    and APUs / unified-memory SoCs like Strix Halo (which expose
    mem_info_vis_vram_total instead, or only mem_info_gtt_total)."""
    def _read(path):
        if _remote_host:
            val = _run(["cat", path])
            return val.strip() if val else None
        try:
            with open(path) as f:
                return f.read().strip()
        except Exception:
            return None

    def _list_drm_cards():
        if _remote_host:
            out = _run(["ls", "/sys/class/drm"])
            if not out:
                return []
            return [e for e in out.split() if e.startswith("card") and "-" not in e]
        try:
            return [e for e in os.listdir("/sys/class/drm") if e.startswith("card") and "-" not in e]
        except Exception:
            return []

    try:
        cards = []
        is_apu = False
        for _cidx, entry in enumerate(_list_drm_cards()):
            base = f"/sys/class/drm/{entry}/device"
            vendor = _read(f"{base}/vendor")
            if vendor != "0x1002":
                continue
            # Discrete cards usually report real VRAM in mem_info_vram_total,
            # while some AMD APUs / Docker views expose a tiny vram_total and
            # the usable pool in vis_vram_total. Use the larger of those two;
            # only fall back to GTT if neither VRAM field is available.
            vram_raw = _read(f"{base}/mem_info_vram_total")
            vis_raw = _read(f"{base}/mem_info_vis_vram_total")
            gtt_raw = _read(f"{base}/mem_info_gtt_total")
            vram_val = int(vram_raw) if vram_raw and vram_raw.isdigit() else 0
            vis_val = int(vis_raw) if vis_raw and vis_raw.isdigit() else 0
            gtt_val = int(gtt_raw) if gtt_raw and gtt_raw.isdigit() else 0
            vram_bytes = max(vram_val, vis_val)
            if vram_bytes <= 0:
                vram_bytes = gtt_val
            if vis_val and vis_val >= vram_val:
                is_apu = True
            if vram_bytes <= 0:
                continue
            name = _read(f"{base}/product_name") or f"AMD GPU ({entry})"
            cards.append({"index": _cidx, "name": name, "vram_gb": vram_bytes / (1024**3)})

        if not cards:
            return None
        total_vram = sum(c["vram_gb"] for c in cards)
        groups = _group_gpus(cards)
        # NOTE: for APUs with BIOS UMA carveout (e.g. Strix Halo), vis_vram_total
        # is the real usable GPU memory: it's physically backed but reserved
        # by BIOS so it doesn't appear in /proc/meminfo. Don't cap it at system
        # RAM: the two pools are separate from the OS's perspective.
        return {
            "gpu_name": cards[0]["name"],
            "gpu_vram_gb": round(total_vram, 1),
            "gpu_count": len(cards),
            "gpus": cards,
            "gpu_groups": groups,
            "homogeneous": len(groups) <= 1,
            "backend": "rocm",
            "unified_memory": is_apu,
        }
    except Exception:
        return None


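# Illustrative sketch (not part of this module): the memory-pool choice from
# the loop above, isolated so it can be exercised with made-up sysfs values.
# pick_amd_pool and the byte counts are hypothetical.

```python
def pick_amd_pool(vram_raw, vis_raw, gtt_raw):
    """Return (usable_bytes, looks_unified) from raw sysfs strings (or None)."""
    def to_int(raw):
        return int(raw) if raw and raw.isdigit() else 0

    vram_val, vis_val, gtt_val = to_int(vram_raw), to_int(vis_raw), to_int(gtt_raw)
    # Larger of the two VRAM fields; GTT only when both are missing or zero.
    usable = max(vram_val, vis_val) or gtt_val
    # Flagged as unified when the visible pool is at least as large as vram_total.
    looks_unified = bool(vis_val and vis_val >= vram_val)
    return usable, looks_unified


GIB = 1024**3
discrete = pick_amd_pool(str(16 * GIB), str(256 * 1024**2), str(32 * GIB))
apu = pick_amd_pool(str(512 * 1024**2), str(96 * GIB), "0")
gtt_only = pick_amd_pool(None, None, str(8 * GIB))
print(discrete, apu, gtt_only)
```

# Note that a card whose visible pool equals its full VRAM would also be
# flagged by this condition, since the test is >= rather than >.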
def _detect_apple_silicon():
    """Detect Apple Silicon (M-series) GPUs.

    Macs have no discrete VRAM; the GPU shares the system's unified memory.
    We report a fraction of total RAM as the usable GPU budget (matching macOS's
    default Metal working-set limit) so the Cookbook recommends models that
    actually run on the GPU instead of classifying the machine as CPU-only.

    backend="metal" is what services.hwfit.fit and the serve-command generation
    key off of (they already understand MLX / llama.cpp-Metal). Works locally
    (platform.system()=="Darwin") and over SSH (uname -s == Darwin).
    """
    # Gate to macOS: locally via platform, remotely via uname.
    if _remote_host:
        if "darwin" not in (_run(["uname", "-s"]) or "").lower():
            return None
        arch = (_run(["uname", "-m"]) or "").lower()
    else:
        if platform.system() != "Darwin":
            return None
        arch = platform.machine().lower()

    # Only Apple Silicon (arm64) has a Metal GPU worth serving LLMs on; Intel
    # Macs fall through to the CPU path.
    if "arm" not in arch and "aarch64" not in arch:
        return None

    # Chip name, e.g. "Apple M4 Max"; it carries the Pro/Max/Ultra variant that
    # the fit bandwidth table keys off of.
    brand = (_run(["sysctl", "-n", "machdep.cpu.brand_string"]) or "Apple Silicon").strip()

    # Total unified memory in bytes.
    memsize = _run(["sysctl", "-n", "hw.memsize"])
    try:
        total_gb = int(memsize) / (1024**3) if memsize else 0.0
    except ValueError:
        total_gb = 0.0
    if total_gb <= 0:
        return None

    # Usable GPU budget. macOS lets Metal use most of unified memory, but the
    # default working-set limit scales with RAM: small machines have to keep
    # more back for the OS + app. These fractions track Apple's
    # recommendedMaxWorkingSetSize defaults across the lineup. Honour an
    # explicit override if the user raised it with
    # `sudo sysctl iogpu.wired_limit_mb=…`.
    if total_gb <= 16:
        frac = 0.67
    elif total_gb <= 64:
        frac = 0.75
    else:
        frac = 0.80
    vram_gb = round(total_gb * frac, 1)
    wired = _run(["sysctl", "-n", "iogpu.wired_limit_mb"])
    try:
        wired_mb = int(wired) if wired else 0
        if wired_mb > 0:
            vram_gb = round(wired_mb / 1024.0, 1)
    except ValueError:
        pass

    gpu = {"index": 0, "name": brand, "vram_gb": vram_gb}
    return {
        "gpu_name": brand,
        "gpu_vram_gb": vram_gb,
        "gpu_count": 1,
        "gpus": [gpu],
        "gpu_groups": _group_gpus([gpu]),
        "homogeneous": True,
        "backend": "metal",
        # Unified memory: the "VRAM" above is carved out of system RAM, not a
        # separate pool; downstream fit logic uses this to avoid double-budgeting.
        "unified_memory": True,
    }


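# Illustrative sketch (not part of this module): the unified-memory budget
# rule as a pure function, so the tiers can be checked by hand.
# metal_budget_gb is a hypothetical name; the fractions are the ones used in
# _detect_apple_silicon, and the RAM sizes below are only example inputs.

```python
def metal_budget_gb(total_gb, wired_limit_mb=0):
    """Usable GPU budget in GiB for a machine with total_gb of unified memory."""
    # An explicit iogpu.wired_limit_mb override wins over the tiered default.
    if wired_limit_mb > 0:
        return round(wired_limit_mb / 1024.0, 1)
    if total_gb <= 16:
        frac = 0.67
    elif total_gb <= 64:
        frac = 0.75
    else:
        frac = 0.80
    return round(total_gb * frac, 1)


for ram in (8, 16, 36, 64, 128):
    print(ram, "GiB RAM ->", metal_budget_gb(ram), "GiB GPU budget")
```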
def _read_file(path):
    """Read a file, locally or via SSH."""
    if _remote_host:
        return _run(["cat", path])
    try:
        with open(path) as f:
            return f.read()
    except Exception:
        return None


def _parse_meminfo():
    """Parse /proc/meminfo into a dict of key -> KB values."""
    text = _read_file("/proc/meminfo")
    if not text:
        return {}
    result = {}
    for line in text.split("\n"):
        if ":" in line:
            key, val = line.split(":", 1)
            parts = val.strip().split()
            if parts:
                try:
                    result[key.strip()] = int(parts[0])
                except ValueError:
                    pass
    return result


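# Illustrative sketch (not part of this module): the same parsing applied to
# an in-memory sample, followed by the kB -> GiB conversion that
# _get_ram_gb performs. parse_meminfo_text and the sample values are
# hypothetical.

```python
def parse_meminfo_text(text):
    """Parse /proc/meminfo-style text into a dict of key -> integer value."""
    result = {}
    for line in text.split("\n"):
        if ":" not in line:
            continue
        key, val = line.split(":", 1)
        parts = val.strip().split()
        if parts:
            try:
                result[key.strip()] = int(parts[0])
            except ValueError:
                pass  # skip rows whose first field is not an integer
    return result


sample = (
    "MemTotal:       32768000 kB\n"
    "MemAvailable:   16384000 kB\n"
    "HugePages_Total:       0\n"
)
info = parse_meminfo_text(sample)
total_gb = info["MemTotal"] / (1024**2)  # kB -> GiB
print(info, total_gb)
```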
def _get_ram_gb():
    """Total RAM in GiB, or 0.0 if every probe fails.

    Tries /proc/meminfo, then os.sysconf (local only), then sysctl hw.memsize.
    """
    meminfo = _parse_meminfo()
    if "MemTotal" in meminfo:
        return meminfo["MemTotal"] / (1024**2)

    if not _remote_host:
        try:
            pages = os.sysconf("SC_PHYS_PAGES")
            page_size = os.sysconf("SC_PAGE_SIZE")
            if pages and page_size:
                return (pages * page_size) / (1024**3)
        except Exception:
            pass

    # macOS has no /proc/meminfo, so fall back to sysctl (works locally and over
    # SSH to a remote Mac, where the sysconf path above isn't taken).
    memsize = _run(["sysctl", "-n", "hw.memsize"])
    if memsize:
        try:
            return int(memsize.strip()) / (1024**3)
        except ValueError:
            pass
    return 0.0


def _get_available_ram_gb():
    """Available RAM in GiB from MemAvailable, else an estimate of 70% of total RAM."""
    meminfo = _parse_meminfo()
    if "MemAvailable" in meminfo:
        return meminfo["MemAvailable"] / (1024**2)
    return _get_ram_gb() * 0.7


def _get_cpu_name():
    text = _read_file("/proc/cpuinfo")
    if text:
        for line in text.split("\n"):
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()

    # macOS has no /proc/cpuinfo; sysctl gives the chip name (e.g. "Apple M4").
    # Harmlessly returns nothing on Linux, so it's safe to try unconditionally.
    brand = _run(["sysctl", "-n", "machdep.cpu.brand_string"])
    if brand and brand.strip():
        return brand.strip()

    if not _remote_host:
        return platform.processor() or "unknown"
    return "unknown"


def _get_cpu_count():
    if _remote_host:
        # nproc on Linux; hw.ncpu via sysctl on a remote Mac (no nproc there).
        out = _run(["nproc"]) or _run(["sysctl", "-n", "hw.ncpu"])
        if out:
            try:
                return int(out.strip())
            except ValueError:
                pass
        # fallback: count "processor" lines in /proc/cpuinfo
        text = _read_file("/proc/cpuinfo")
        if text:
            return sum(1 for line in text.split("\n") if line.startswith("processor"))
    return os.cpu_count() or 1


def _detect_windows():
    """Detect Windows hardware in a single SSH call using PowerShell."""
    # Single PowerShell command that gathers all hardware info at once
    ps_cmd = (
        "$r = @{}; "
        "$os = Get-CimInstance Win32_OperatingSystem; "
        "$r.ram_gb = [math]::Round($os.TotalVisibleMemorySize / 1048576, 1); "
        "$r.avail_gb = [math]::Round($os.FreePhysicalMemory / 1048576, 1); "
        "$cpu = Get-CimInstance Win32_Processor | Select-Object -First 1; "
        "$r.cpu_name = $cpu.Name; "
        "$r.cpu_cores = (Get-CimInstance Win32_Processor | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum; "
        "$r.arch = $cpu.AddressWidth; "
        # GPU detection via nvidia-smi (fastest) or WMI fallback
        "try { "
        " $nv = nvidia-smi --query-gpu=memory.total,name --format=csv,noheader,nounits 2>$null; "
        " if ($LASTEXITCODE -eq 0 -and $nv) { "
        " $gpus = @(); "
        " foreach ($line in $nv -split \"`n\") { "
        " $p = $line -split ','; "
        " if ($p.Count -ge 2) { $gpus += @{name=$p[1].Trim(); vram_mb=[double]$p[0].Trim()} } "
        " }; "
        " $r.gpu_name = $gpus[0].name; "
        " $r.gpu_vram_gb = [math]::Round(($gpus | Measure-Object -Property vram_mb -Sum).Sum / 1024, 1); "
        " $r.gpu_count = $gpus.Count; "
        " $r.gpu_backend = 'cuda'; "
        " } "
        "} catch {}; "
        "if (-not $r.gpu_name) { "
        " $wmiGpu = Get-CimInstance Win32_VideoController | Where-Object { $_.AdapterRAM -gt 0 } | Select-Object -First 1; "
        " if ($wmiGpu) { "
        " $r.gpu_name = $wmiGpu.Name; "
        " $r.gpu_vram_gb = [math]::Round($wmiGpu.AdapterRAM / 1073741824, 1); "
        " $r.gpu_count = 1; "
        " $r.gpu_backend = 'cpu_x86'; "  # WMI doesn't tell us CUDA/ROCm
        " } "
        "}; "
        "$r | ConvertTo-Json -Compress"
    )
    out = _run(f'powershell -Command "{ps_cmd}"')
    if not out:
        return None
    import json as _json
    try:
        d = _json.loads(out)
        result = {
            "total_ram_gb": d.get("ram_gb", 0),
            "available_ram_gb": d.get("avail_gb", 0),
            "cpu_cores": d.get("cpu_cores", 1),
            "cpu_name": d.get("cpu_name", "unknown"),
            "has_gpu": bool(d.get("gpu_name")),
            "gpu_name": d.get("gpu_name"),
            "gpu_vram_gb": d.get("gpu_vram_gb"),
            "gpu_count": d.get("gpu_count", 0),
            "backend": d.get("gpu_backend", "cpu_x86"),
        }
        # PowerShell only reports aggregate GPU info, not per-card detail, so we
        # can't tell a mixed box from a uniform one here; assume one homogeneous
        # pool spanning all reported GPUs (the common Windows case).
        _n = result["gpu_count"] or 0
        if result["has_gpu"] and _n > 0:
            _each = round((result["gpu_vram_gb"] or 0) / _n, 1)
            result["gpus"] = [
                {"index": i, "name": result["gpu_name"], "vram_gb": _each} for i in range(_n)
            ]
            result["gpu_groups"] = [{
                "name": result["gpu_name"],
                "vram_each": _each,
                "count": _n,
                "indices": list(range(_n)),
                "vram_total": result["gpu_vram_gb"],
            }]
            result["homogeneous"] = True
        return result
    except Exception:
        return None


_cache_by_host = {}  # host -> (timestamp, result)


def detect_system(host="", ssh_port="", platform="", fresh=False):
    """Detect system hardware: RAM, CPU, GPU. Cached per host (hardware rarely
    changes, and probing a remote host over SSH is slow). Pass fresh=True to
    bypass the cache and re-probe (the "Rescan" button).
    If host is set (e.g. 'user@server'), runs detection commands over SSH.
    platform: "windows", "linux", "termux", or "" (auto-detect).
    """
    global _remote_host, _remote_port, _remote_platform

    cache_key = host or "_local"
    now = time.time()
    if not fresh and cache_key in _cache_by_host:
        ts, cached = _cache_by_host[cache_key]
        if (now - ts) < CACHE_TTL:
            return cached

    _remote_host = host or None
    _remote_port = ssh_port or None
    _remote_platform = platform or None

    # Windows: single PowerShell command for all hardware info
    if _remote_platform == "windows" and _remote_host:
        result = _detect_windows()
        if result:
            _remote_host = None
            _remote_platform = None
            _cache_by_host[cache_key] = (now, result)
            return result
        # If Windows detection failed, return error
        result = {"error": f"Cannot connect to {host}", "host": host}
        _remote_host = None
        _remote_platform = None
        _cache_by_host[cache_key] = (now, result)
        return result

    # Linux/Termux: existing multi-command detection
    total_ram = round(_get_ram_gb(), 1)
    # If remote host returns 0 RAM, connection likely failed
    if _remote_host and total_ram <= 0:
        result = {"error": f"Cannot connect to {host}", "host": host}
        _cache_by_host[cache_key] = (now, result)
        _remote_host = None
        _remote_platform = None
        return result
    available_ram = round(_get_available_ram_gb(), 1)
    cpu_cores = _get_cpu_count()
    cpu_name = _get_cpu_name()

    gpu_info = _detect_apple_silicon() or _detect_nvidia() or _detect_amd()

    if gpu_info:
        result = {
            "total_ram_gb": total_ram,
            "available_ram_gb": available_ram,
            "cpu_cores": cpu_cores,
            "cpu_name": cpu_name,
            "has_gpu": True,
            "gpu_name": gpu_info["gpu_name"],
            "gpu_vram_gb": gpu_info["gpu_vram_gb"],
            "gpu_count": gpu_info["gpu_count"],
            "gpus": gpu_info.get("gpus", []),
            "gpu_groups": gpu_info.get("gpu_groups", []),
            "homogeneous": gpu_info.get("homogeneous", True),
            "backend": gpu_info["backend"],
            # Apple Silicon / AMD APUs share system RAM with the GPU; carry the
            # flag through so callers can tell unified from discrete VRAM.
            "unified_memory": gpu_info.get("unified_memory", False),
        }
    else:
        if _remote_host:
            arch_out = _run(["uname", "-m"]) or ""
        else:
            import platform as _platform
            arch_out = _platform.machine().lower()
        backend = "cpu_arm" if "aarch64" in arch_out or "arm" in arch_out else "cpu_x86"
        result = {
            "total_ram_gb": total_ram,
            "available_ram_gb": available_ram,
            "cpu_cores": cpu_cores,
            "cpu_name": cpu_name,
            "has_gpu": False,
            "gpu_name": None,
            "gpu_vram_gb": None,
            "gpu_count": 0,
            "backend": backend,
            # Set when nvidia-smi exists but failed (e.g. driver/library
            # version mismatch); lets the UI say "GPU driver error" instead
            # of the misleading "No GPU".
            "gpu_error": _last_gpu_error,
        }

    _remote_host = None
    _remote_platform = None
    _cache_by_host[cache_key] = (now, result)
    return result
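# Illustrative sketch (not part of this module): the per-host TTL cache that
# detect_system applies, reduced to a generic helper. cached_probe, the
# injectable `now` argument, and the fake probe are hypothetical and exist
# only to make the expiry rule easy to exercise without waiting 30 minutes.

```python
import time

CACHE_TTL = 1800  # seconds, same value as the module constant
_cache = {}  # key -> (timestamp, value)


def cached_probe(key, probe, fresh=False, now=None):
    """Return the cached value for key, calling probe() when missing, expired, or fresh."""
    now = time.time() if now is None else now
    if not fresh and key in _cache:
        ts, value = _cache[key]
        if (now - ts) < CACHE_TTL:
            return value
    value = probe()
    _cache[key] = (now, value)
    return value


calls = []


def fake_probe():
    calls.append(1)
    return {"probe_number": len(calls)}


first = cached_probe("_local", fake_probe, now=0)
second = cached_probe("_local", fake_probe, now=100)                 # served from cache
rescanned = cached_probe("_local", fake_probe, fresh=True, now=200)  # forced re-probe
expired = cached_probe("_local", fake_probe, now=200 + CACHE_TTL)    # TTL elapsed
print(first, second, rescanned, expired)
```

# As in detect_system, error results would be cached by this scheme too, so a
# failed probe is only retried after the TTL or with fresh=True.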