mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-28 07:35:27 -04:00
Cookbook backend detection: report Vulkan on AMD hosts without ROCm; gate CUDA build on actual NVIDIA hardware
Three classes of incorrect detection fixed:
(1) AMD GPU + no ROCm installed (e.g. Strix Halo) was reported as
backend=rocm everywhere, so launch commands emitted
HIP_VISIBLE_DEVICES (silent no-op on Vulkan) and the from-source
build path failed. Both _probe_amd_sysfs (routes/cookbook_routes)
and _detect_amd (services/hwfit/hardware) now probe rocminfo /
hipconfig / vulkaninfo at detection time and report vulkan when
only Vulkan is present.
(2) The build helper picked the CUDA branch on AMD hosts whenever a
stray pip-installed nvcc was on PATH (vLLM wheels carry one
without libcudart). Added _odysseus_has_nvidia_hw(), which checks
nvidia-smi / /dev/nvidia* / lspci, and gated both the nvcc PATH
augmentation and the CUDA elif branch on real hardware being present.
(3) Build chain reordered to ROCm/HIP > CUDA > Vulkan > CPU. A Vulkan
tier now sits between CUDA and CPU as a portable fallback for hosts
with a GPU but no native toolchain (the common Strix Halo case).
The same helper, _append_llama_cpp_linux_accel_build_lines, also
attempts a sudo -n apt/pacman/dnf install of cmake/build-essential/git
when they are missing, and surfaces a clear no-passwordless-sudo
warning when that install cannot run.
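The hardware gate from (2) and the tier order from (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's code: `has_nvidia_hw` and `pick_build_tier` are hypothetical names, the probes are injected so the logic can be exercised without a GPU, and the exact checks performed by `_odysseus_has_nvidia_hw()` may differ.

```python
import glob
import shutil
import subprocess


def has_nvidia_hw(which=shutil.which, glob_fn=glob.glob, run=subprocess.run):
    """Return True only when real NVIDIA hardware appears to be present.

    Hypothetical stand-in for the commit's _odysseus_has_nvidia_hw().
    """
    # nvidia-smi is installed with the driver, not with pip wheels.
    if which("nvidia-smi"):
        return True
    # Device nodes are created by the NVIDIA kernel driver.
    if glob_fn("/dev/nvidia*"):
        return True
    # Fall back to the PCI listing when lspci is available.
    if which("lspci"):
        try:
            out = run(["lspci"], capture_output=True, text=True, timeout=5).stdout
        except (OSError, subprocess.SubprocessError):
            return False
        return "nvidia" in out.lower()
    return False


def pick_build_tier(has_rocm, has_nvcc, nvidia_hw, has_vulkan):
    """Apply the order from the commit message: ROCm/HIP > CUDA > Vulkan > CPU."""
    if has_rocm:
        return "rocm"
    # A stray nvcc on PATH is not enough; real hardware must be present.
    if has_nvcc and nvidia_hw:
        return "cuda"
    if has_vulkan:
        return "vulkan"
    return "cpu"
```

With a pip-installed nvcc on an AMD host that has only Vulkan, `pick_build_tier(has_rocm=False, has_nvcc=True, nvidia_hw=False, has_vulkan=True)` selects the Vulkan tier instead of CUDA.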
@@ -282,7 +282,17 @@ def _detect_amd():
  "gpus": cards,
  "gpu_groups": groups,
  "homogeneous": len(groups) <= 1,
- "backend": "rocm",
+ # Pick the actual runtime label: ROCm/HIP only when its
+ # toolchain is installed, otherwise Vulkan if vulkaninfo is
+ # present (mesa RADV works fine on RDNA/CDNA when ROCm
+ # packages are absent — see Strix Halo where ROCm support
+ # is still backporting). Reporting "rocm" on a Vulkan-only
+ # host misleads downstream env-var pinning
+ # (HIP_VISIBLE_DEVICES is a no-op there).
+ "backend": (
+     "rocm" if (_run(["which", "rocminfo"]) or _run(["which", "hipconfig"]))
+     else ("vulkan" if _run(["which", "vulkaninfo"]) else "rocm")
+ ),
  "unified_memory": is_apu,
  # AMD ISA/family so downstream can tell datacenter Instinct (CDNA,
  # where vLLM/SGLang run AWQ/GPTQ reliably) from consumer Radeon
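The nested conditional added to `_detect_amd()` amounts to a three-way decision. The sketch below restates it as a standalone function, assuming `shutil.which` as a stand-in for the repository's `_run(["which", ...])` helper, whose implementation is not shown in this hunk. `amd_backend_label` is a hypothetical name used only for illustration.

```python
import shutil


def amd_backend_label(which=shutil.which):
    """Pick a runtime label for an AMD host from the tools found on PATH."""
    # ROCm/HIP wins whenever either of its tools is installed.
    if which("rocminfo") or which("hipconfig"):
        return "rocm"
    # Otherwise report Vulkan when vulkaninfo is present.
    if which("vulkaninfo"):
        return "vulkan"
    # Neither found: keep the previous default, as the diff does.
    return "rocm"
```

Note that a host with neither toolchain is still labelled `rocm`, which matches the final `else` branch in the diff and preserves the pre-change behavior for that case.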