Add a 'Rebuild llama.cpp' Cookbook action to force a fresh GPU build (#1787)

The serve bootstrap builds llama-server from source only when it is missing
from PATH, so a host that first compiled CPU-only (no nvcc present at build
time) reuses that CPU-only binary on every later serve and never gets a GPU
build, even after a CUDA/ROCm toolkit is installed. There was no UI lever to
force a rebuild.
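
The reuse behaviour described above can be sketched as follows. This is a minimal illustration rather than the project's bootstrap code: `needs_source_build` is a hypothetical name, and the sketch assumes the decision is nothing more than a PATH lookup for `llama-server`.

```python
import shutil


def needs_source_build(binary="llama-server", path=None):
    """Hypothetical helper: request a source build only when the binary is
    absent from PATH.

    Nothing here inspects how an existing binary was compiled, which is why
    a CPU-only llama-server found on PATH keeps being reused.
    """
    return shutil.which(binary, path=path) is None
```

Under this assumption, installing a CUDA or ROCm toolkit later changes nothing, because the lookup still succeeds against the old binary.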

Adds a 'Rebuild llama.cpp' button to the Cookbook Dependencies tab. It clears
the cached ~/bin/llama-server symlink and ~/llama.cpp/build directory (locally
or on the selected remote server) so the next serve recompiles and picks up
CUDA/HIP if a toolchain is now present. It installs and downloads nothing.
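
A rough sketch of the command shape, inferred only from what the tests in this commit assert. `sketch_rebuild_cmd` is a hypothetical stand-in for `_llama_cpp_rebuild_cmd()`; the step order, the `&&` joining, and the wording of the message after "Cleared the cached llama.cpp build" are assumptions.

```python
def sketch_rebuild_cmd():
    """Hypothetical stand-in for _llama_cpp_rebuild_cmd(); the real helper
    may order or phrase these steps differently."""
    steps = [
        # Make sure ~/bin exists so a never-served host does not fail.
        'mkdir -p "$HOME/bin"',
        # Drop the cached symlink and the build tree; both are no-ops if absent.
        'rm -f "$HOME/bin/llama-server"',
        'rm -rf "$HOME/llama.cpp/build"',
        # Assumed message tail; only the leading phrase is asserted by the tests.
        'echo "Cleared the cached llama.cpp build; the next serve will recompile."',
    ]
    return " && ".join(steps)
```

Because `rm -f` and `rm -rf` succeed on missing paths, the command stays safe to run on a fresh HOME, and it contains no install or download step.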

- routes/cookbook_helpers.py: _llama_cpp_rebuild_cmd() (single source of truth)
- routes/shell_routes.py: POST /api/cookbook/rebuild-engine (admin-only, reuses
  the existing SSH plumbing for remote hosts)
- static/js/cookbook.js: header button + handler honoring the deps server selector
- tests: cover the command shape and a clean run on a fresh HOME
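
One way a single command string can serve both the local and the remote case is sketched below. This does not show the project's SSH plumbing: `build_argv` and its `remote` parameter are hypothetical, and the sketch assumes a plain `ssh` invocation.

```python
import shlex


def build_argv(cmd, remote=None):
    """Hypothetical: run the script under bash locally, or hand the same
    string to ssh for the selected remote host."""
    if remote is None:
        return ["bash", "-c", cmd]
    # ssh passes its command to the remote shell as one string, so the script
    # is quoted once; "$HOME" is then expanded by the remote bash, not locally.
    return ["ssh", remote, "bash -c " + shlex.quote(cmd)]
```

Keeping the command in one helper means the local and remote paths cannot drift apart, which is the point of the "single source of truth" note above.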

Motivated by #831 (RTX 4070 user stuck on a CPU-only build with no way to
re-trigger the build).

Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
ghreprimand
2026-06-02 23:28:19 -05:00
committed by GitHub
parent 51857c9008
commit 6f001af2a3
4 changed files with 135 additions and 0 deletions
+33
@@ -10,6 +10,7 @@ from routes.cookbook_helpers import (
    _append_llama_cpp_linux_accel_build_lines,
    _append_serve_exit_code_lines,
    _append_serve_preflight_exit_lines,
    _llama_cpp_rebuild_cmd,
    _local_tooling_path_export,
    _pip_install_attempt,
    _pip_install_fallback_chain,
@@ -338,6 +339,38 @@ def test_llama_cpp_linux_bootstrap_keeps_cpu_fallback_when_no_gpu_toolchain():
    assert 'WARNING: no HIP/CUDA toolchain found — building llama-server for CPU only.' in script
    assert 'Install ROCm for AMD GPUs or vLLM/CUDA tooling for NVIDIA' in script


def test_llama_cpp_rebuild_cmd_clears_cached_build_paths():
    cmd = _llama_cpp_rebuild_cmd()
    # Must remove both the cached symlink and the build dir the serve bootstrap
    # links/creates, so the next serve recompiles from source.
    assert 'rm -f "$HOME/bin/llama-server"' in cmd
    assert 'rm -rf "$HOME/llama.cpp/build"' in cmd
    # Recreates ~/bin so a never-served host does not error on a missing dir.
    assert 'mkdir -p "$HOME/bin"' in cmd
    # Clear-only: the command must not install or fetch anything.
    assert 'pip install' not in cmd
    assert 'git clone' not in cmd
    assert 'curl' not in cmd and 'wget' not in cmd


def test_llama_cpp_rebuild_cmd_runs_clean_on_a_fresh_home(tmp_path):
    """The command should succeed even when neither path exists yet."""
    import os

    env = dict(os.environ)
    env["HOME"] = str(tmp_path)
    result = subprocess.run(
        ["bash", "-c", _llama_cpp_rebuild_cmd()],
        capture_output=True, text=True, env=env, timeout=10,
    )
    assert result.returncode == 0, result.stderr
    assert (tmp_path / "bin").is_dir()
    assert "Cleared the cached llama.cpp build" in result.stdout


def test_cached_model_scan_reports_plain_dir_gguf(tmp_path):
    """Custom download dirs may sit inside the HF hub cache and contain plain
    per-model folders. They must show up in Serve and keep the GGUF signal."""