Cookbook backend detection: report Vulkan on AMD hosts without ROCm; gate CUDA build on actual NVIDIA hardware

Three classes of incorrect detection fixed:

(1) AMD GPU + no ROCm installed (e.g. Strix Halo) was reported as
    backend=rocm everywhere, so launch commands emitted
    HIP_VISIBLE_DEVICES (silent no-op on Vulkan) and the from-source
    build path failed. Both _probe_amd_sysfs (routes/cookbook_routes)
    and _detect_amd (services/hwfit/hardware) now probe rocminfo /
    hipconfig / vulkaninfo at detection time and report vulkan when
    only Vulkan is present.
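
A minimal Python sketch of the selection rule in (1), and of why the label matters for launch commands. It uses shutil.which as the PATH probe; the diff below shells out to `which` through an internal _run helper instead. The names pick_amd_backend, launch_env and DEVICE_ENV_VAR are hypothetical. Treating GGML_VK_VISIBLE_DEVICES as the Vulkan pinning variable is an assumption about llama.cpp's Vulkan backend and should be checked against the build in use.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Assumed label -> device-pinning env var. GGML_VK_VISIBLE_DEVICES is what we
# understand llama.cpp's Vulkan backend reads; verify it for your build.
DEVICE_ENV_VAR: Dict[str, str] = {
    "rocm": "HIP_VISIBLE_DEVICES",
    "vulkan": "GGML_VK_VISIBLE_DEVICES",
}


def pick_amd_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "rocm" or "vulkan" for an AMD host (hypothetical helper)."""
    # ROCm/HIP only counts when its toolchain is actually on PATH.
    if which("rocminfo") or which("hipconfig"):
        return "rocm"
    # Otherwise prefer Vulkan (e.g. mesa RADV) if vulkaninfo is present.
    if which("vulkaninfo"):
        return "vulkan"
    # Neither probe matched: keep the old "rocm" label, as the diff does.
    return "rocm"


def launch_env(backend: str, device_ids: List[int]) -> Dict[str, str]:
    """Env vars that pin a launch to the given devices (hypothetical helper)."""
    return {DEVICE_ENV_VAR[backend]: ",".join(str(i) for i in device_ids)}
```

Passing a dict's .get as `which` lets the probe run on any machine. For example, pick_amd_backend({"vulkaninfo": "/usr/bin/vulkaninfo"}.get) returns "vulkan", so the launch env pins devices with the Vulkan variable instead of HIP_VISIBLE_DEVICES.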

(2) Build helper was picking the CUDA branch on AMD hosts whenever a
    stray pip-installed nvcc was on PATH (vLLM wheels carry one
    without libcudart). Added _odysseus_has_nvidia_hw() that checks
    nvidia-smi / /dev/nvidia* / lspci, and gates both the nvcc PATH
    augmentation and the CUDA elif branch on real hardware.
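
A Python sketch of the three hardware checks named in (2). The commit does not show _odysseus_has_nvidia_hw() itself, and it may well be a shell function inside the generated build script. The names has_nvidia_hw and cuda_path_dirs, as well as the exact lspci match, are therefore assumptions. The probes are injectable so they can be exercised without NVIDIA hardware.

```python
import glob
import os
import shutil
import subprocess
from typing import Callable, List, Optional


def _lspci_output() -> str:
    """Return lspci output, or "" if lspci is missing or fails."""
    if not shutil.which("lspci"):
        return ""
    try:
        result = subprocess.run(
            ["lspci"], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.SubprocessError):
        return ""
    return result.stdout


def has_nvidia_hw(
    which: Callable[[str], Optional[str]] = shutil.which,
    dev_nodes: Callable[[], List[str]] = lambda: glob.glob("/dev/nvidia*"),
    lspci: Callable[[], str] = _lspci_output,
) -> bool:
    """Hypothetical Python analogue of _odysseus_has_nvidia_hw()."""
    # 1. nvidia-smi normally comes with the NVIDIA driver, not with pip wheels.
    if which("nvidia-smi"):
        return True
    # 2. /dev/nvidia* nodes appear once the kernel driver is loaded.
    if dev_nodes():
        return True
    # 3. PCI enumeration catches cards whose driver is not loaded yet.
    for line in lspci().splitlines():
        if "NVIDIA" in line and ("VGA" in line or "3D controller" in line):
            return True
    return False


def cuda_path_dirs(nvcc_path: Optional[str], nvidia_hw: bool) -> List[str]:
    """Directories to prepend to PATH for CUDA builds (hypothetical helper).

    A stray nvcc is ignored unless real NVIDIA hardware was detected.
    """
    if nvcc_path and nvidia_hw:
        return [os.path.dirname(nvcc_path)]
    return []
```

The gate checks hardware rather than the compiler. A pip-installed nvcc proves only that a compiler binary exists. It does not prove there is a GPU, or a libcudart to link against.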

(3) Build chain reordered to ROCm/HIP > CUDA > Vulkan > CPU. The new
    Vulkan tier sits between CUDA and CPU as a portable fallback for
    hosts with a GPU but no native toolchain (the common Strix Halo
    case). The same helper, _append_llama_cpp_linux_accel_build_lines,
    now also tries sudo -n apt/pacman/dnf installs of cmake,
    build-essential and git when they are missing. If passwordless sudo
    is unavailable, it prints a clear warning instead.
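
A minimal Python sketch of the tier chain and the prerequisite step in (3). The real helper appends shell lines, but its exact probes, package names and CMake invocation are not shown in this commit. The names pick_build_tier, build_lines, TIER_FLAGS and INSTALL_CMDS are therefore hypothetical, and so are the per-distro package lists. The CMake switches are the current llama.cpp option names; older trees used different ones.

```python
import shlex
from typing import Dict, List, Optional

# Current llama.cpp CMake switches per tier (older trees used other names).
TIER_FLAGS: Dict[str, List[str]] = {
    "hip": ["-DGGML_HIP=ON"],
    "cuda": ["-DGGML_CUDA=ON"],
    "vulkan": ["-DGGML_VULKAN=ON"],
    "cpu": [],
}

# Assumed per-distro package names for cmake, a C/C++ toolchain, and git.
INSTALL_CMDS: Dict[str, str] = {
    "apt-get": "sudo -n apt-get install -y cmake build-essential git",
    "pacman": "sudo -n pacman -S --needed --noconfirm cmake base-devel git",
    "dnf": "sudo -n dnf install -y cmake gcc gcc-c++ make git",
}


def pick_build_tier(has_hip: bool, has_nvidia_hw: bool, has_nvcc: bool,
                    has_vulkan: bool) -> str:
    """Walk the ROCm/HIP > CUDA > Vulkan > CPU chain; first match wins."""
    if has_hip:
        return "hip"
    # A stray nvcc alone is not enough; real NVIDIA hardware must be present.
    if has_nvidia_hw and has_nvcc:
        return "cuda"
    if has_vulkan:
        return "vulkan"
    return "cpu"


def build_lines(tier: str, pkg_mgr: Optional[str], missing: List[str]) -> List[str]:
    """Emit shell lines: optional prerequisite install, then the CMake build."""
    lines: List[str] = []
    if missing:
        names = ", ".join(missing)
        warn = "echo " + shlex.quote(
            f"WARNING: missing {names}; could not install automatically "
            "(no passwordless sudo or unsupported package manager)"
        ) + " >&2"
        if pkg_mgr in INSTALL_CMDS:
            # sudo -n fails instead of prompting when a password is required.
            lines += [
                "if sudo -n true 2>/dev/null; then",
                "  " + INSTALL_CMDS[pkg_mgr],
                "else",
                "  " + warn,
                "fi",
            ]
        else:
            lines.append(warn)
    lines.append(" ".join(["cmake", "-B", "build", *TIER_FLAGS[tier]]))
    lines.append("cmake --build build --config Release -j")
    return lines
```

For example, pick_build_tier(False, False, True, True) returns "vulkan". This is the Strix Halo case, where a wheel-installed nvcc is on PATH but there is no NVIDIA GPU and no ROCm.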
pewdiepie-archdaemon
2026-06-19 00:33:07 +00:00
parent b3e186746a
commit 1324e1b0d5
3 changed files with 293 additions and 20 deletions
+11 -1
@@ -282,7 +282,17 @@ def _detect_amd():
"gpus": cards,
"gpu_groups": groups,
"homogeneous": len(groups) <= 1,
-"backend": "rocm",
+# Pick the actual runtime label: ROCm/HIP only when its
+# toolchain is installed, otherwise Vulkan if vulkaninfo is
+# present (mesa RADV works fine on RDNA/CDNA when ROCm
+# packages are absent — see Strix Halo where ROCm support
+# is still backporting). Reporting "rocm" on a Vulkan-only
+# host misleads downstream env-var pinning
+# (HIP_VISIBLE_DEVICES is a no-op there).
+"backend": (
+"rocm" if (_run(["which", "rocminfo"]) or _run(["which", "hipconfig"]))
+else ("vulkan" if _run(["which", "vulkaninfo"]) else "rocm")
+),
"unified_memory": is_apu,
# AMD ISA/family so downstream can tell datacenter Instinct (CDNA,
# where vLLM/SGLang run AWQ/GPTQ reliably) from consumer Radeon