Add macOS Apple Silicon Cookbook support

* Add Apple Silicon (Metal) GPU detection and unified-memory fit tuning

hardware.py detects Apple Silicon locally and over SSH, reporting
backend=metal, the chip name, and a RAM-scaled fraction of unified
memory as the usable GPU budget. fit.py gains an M1-M4 memory-bandwidth
table for realistic tok/s and drops vLLM-only formats (AWQ/GPTQ/FP8)
that can't be served on Metal.
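
A rough illustration of the two ideas above (a RAM-scaled GPU budget and a bandwidth-based throughput estimate) might look like the sketch below. The tier thresholds, fractions, and function names are assumptions made for illustration, and the bandwidth figures are approximate published numbers for the base chips only; none of them are taken from hardware.py or fit.py.

```python
# Illustrative only: thresholds, fractions and names are assumptions,
# not the actual values used in hardware.py / fit.py.

# Approximate memory bandwidth of the base M-series chips, in GB/s.
BASE_CHIP_BANDWIDTH_GBPS = {"M1": 68.0, "M2": 100.0, "M3": 100.0, "M4": 120.0}


def usable_gpu_budget_gb(total_ram_gb: float) -> float:
    """Return the share of unified memory treated as GPU-usable.

    Larger machines can hand a bigger fraction to the GPU because the OS
    and other apps need a roughly fixed amount of RAM.
    """
    if total_ram_gb <= 16:
        fraction = 0.65
    elif total_ram_gb <= 64:
        fraction = 0.75
    else:
        fraction = 0.80
    return total_ram_gb * fraction


def estimate_tokens_per_second(chip: str, model_size_gb: float) -> float:
    """Bandwidth-bound decode estimate: every token reads the weights once."""
    return BASE_CHIP_BANDWIDTH_GBPS[chip] / model_size_gb


if __name__ == "__main__":
    print(usable_gpu_budget_gb(16))
    print(estimate_tokens_per_second("M2", 5.0))
```

The throughput rule of thumb ignores compute limits and KV-cache traffic, so it is best read as an upper bound rather than a prediction.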

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 32ac81dbc6)

* Generate macOS/Metal serve commands and surface the Metal GPU

cookbook_routes.py adds a macOS serve path (Ollama, Metal-aware
llama.cpp build using `sysctl hw.ncpu` instead of `nproc`, and a clear
error if vLLM is attempted). The frontend defaults Metal serving to
llama.cpp and offers llama.cpp/Ollama instead of vLLM/SGLang. The
odysseus-cookbook CLI's `gpus` command reports the Metal GPU via
sysctl/vm_stat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 4ba01ce25d)

* Add launchd LaunchAgent for macOS (systemd equivalent)

com.odysseus.ui.plist + install-service-macos.sh run Odysseus at login
and restart on crash, the macOS counterpart to odysseus-ui.service. The
installer auto-fills paths from the venv, so there's no hand-editing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 3d4b6b2c7b)

* Document macOS install (brew, Ollama, AirPlay port, launchd)

README + setup.py cover the Homebrew / Apple Silicon path: brew install
python@3.11 tmux ollama, Metal serving via Ollama/llama.cpp, the launchd
service, and the macOS AirPlay Receiver conflict on ports 7000/5000.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 8dc9a3578a)

* Add downloadable macOS launcher app builder

build-macos-app.sh generates dist/Odysseus.app and a drag-to-Applications
dist/Odysseus.dmg. The app starts the local server from this repo's venv and
opens the UI in a chrome-less app window (Chromium --app mode, falling back to
the default browser). It's a launcher wrapper — it drives the venv rather than
bundling Python — so the install path is baked in at build time.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 7927940c38)

* Harden macOS Cookbook support: hide MLX, fix Metal build cache

Builds on the adopted PR #213 macOS/Metal work with two fixes and tests:

- fit.py: always drop MLX-quantized models. Odysseus only generates serve
  commands for llama.cpp/Ollama (Metal) and vLLM/SGLang (CUDA); MLX needs the
  mlx_lm runtime and the catalog's MLX repos ship no GGUF alternative, so they
  were surfaced on Apple Silicon but could never be served.
- cookbook_routes.py (macOS branch only): `rm -rf build` before configure so a
  poisoned CMakeCache from a prior failed CUDA attempt can't make every later
  build fail; explicit -DCMAKE_BUILD_TYPE=Release; a clear "brew install cmake"
  hint if cmake is missing. Linux/CUDA path unchanged.
- tests/test_hwfit_macos.py: MLX hidden on Metal, MLX still hidden on CUDA
  (regression guard), Metal detection on Apple Silicon, and Metal detection
  skipped on Linux/Intel (proves non-macOS detection is untouched).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Propagate unified_memory flag and document macOS GPU/Docker caveat

- hardware.py: detect_system now carries the unified_memory flag from GPU
  detection into the system dict (it was set by _detect_apple_silicon / AMD-APU
  detection but dropped during result assembly, so the API always reported
  null). Lets callers distinguish unified from discrete VRAM.
- README: prominent warning that Docker on Apple Silicon can't reach the Metal
  GPU (runs a Linux VM) — Cookbook must run natively for GPU serving; fix stale
  text that said Cookbook recommends MLX models (now hidden as unservable).
- test: detect_system propagates unified_memory.
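
The propagation fix can be pictured with a minimal sketch. Only `detect_system` and the `unified_memory` key come from the text above; the helper name `assemble_system` and the other dictionary keys are hypothetical.

```python
# Hypothetical shapes: only the `unified_memory` key is taken from the commit
# message; the helper name and the remaining keys are illustrative.
from typing import Any, Dict, Optional


def assemble_system(gpu: Optional[Dict[str, Any]], ram_gb: float) -> Dict[str, Any]:
    """Build the system dict, carrying unified_memory over from GPU detection."""
    system: Dict[str, Any] = {"ram_gb": ram_gb, "backend": None, "unified_memory": None}
    if gpu:
        system["backend"] = gpu.get("backend")
        # The bug described above: detection set this key, but result assembly
        # never copied it, so callers always saw None (null in JSON).
        system["unified_memory"] = gpu.get("unified_memory")
    return system


if __name__ == "__main__":
    print(assemble_system({"backend": "metal", "unified_memory": True}, 32.0))
```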

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Put Odysseus's venv bin on PATH for cookbook runners

Native (non-Docker) installs run from a virtualenv whose bin holds the `hf` CLI
and `python3` that the cookbook download/serve tmux scripts shell out to. Those
scripts start in a fresh login shell with the venv NOT activated, so on a native
macOS install `hf download` failed with "hf: command not found", and the
`pip --user` self-heal missed because macOS has no bare `pip` command.

- cookbook_helpers.py: _local_tooling_path_export() — pure helper returning a
  PATH export for the running interpreter's bin dir (escaped for double quotes).
- cookbook_routes.py: download + serve runners prepend that dir on local runs
  (gated off SSH/Windows); swap the `pip` install fallbacks to `python3 -m pip`.
- tests: helper output for normal and spaced paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Document macOS llama.cpp serving prerequisites

Clarify the two serving paths on Apple Silicon: the recommended zero-build
route (brew install llama.cpp ships a Metal llama-server that Cookbook finds on
PATH), and the from-source fallback, which requires cmake + Xcode Command Line
Tools. Without those the build is skipped and serving silently degrades to a slow
CPU build, so new users now know to install them (or use the prebuilt) up front.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Recommend only GGUF-servable models on Metal

Apple Silicon's only serving engines are llama.cpp and Ollama, both GGUF-only
(vLLM/SGLang are CUDA/ROCm and don't run on macOS). The catalog tags raw
safetensors repos with a default Q4_K_M quant, so the fit-ranking was
recommending ~397/501 models that have no GGUF and fail to serve on Metal with
"No GGUF found" (e.g. microsoft/Phi-mini-MoE-instruct).

Drop any model without a real GGUF (is_gguf/gguf_sources) on Apple Silicon —
subsumes the previous AWQ/GPTQ/FP8 special-case into one rule. On CUDA these
stay visible since vLLM serves safetensors directly. Metal recommendations go
501 -> 104, all actually servable.
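
The single rule described above could be sketched as follows. The field names `is_gguf` and `gguf_sources` come from the text; the shape of a catalog entry, the function names, and the second repo id are assumptions for illustration.

```python
from typing import Any, Dict, List


def has_real_gguf(model: Dict[str, Any]) -> bool:
    """True if the catalog entry is a GGUF repo or lists GGUF sources."""
    return bool(model.get("is_gguf")) or bool(model.get("gguf_sources"))


def servable_models(models: List[Dict[str, Any]], backend: str) -> List[Dict[str, Any]]:
    """Drop models that cannot be served on the given backend."""
    if backend == "metal":
        # llama.cpp and Ollama are GGUF-only, so anything else fails to serve.
        return [m for m in models if has_real_gguf(m)]
    # On CUDA, vLLM serves safetensors directly, so nothing is filtered here.
    return list(models)


if __name__ == "__main__":
    catalog = [
        {"id": "microsoft/Phi-mini-MoE-instruct", "is_gguf": False, "gguf_sources": []},
        {"id": "example-org/example-model-GGUF", "is_gguf": True},  # hypothetical repo
    ]
    print([m["id"] for m in servable_models(catalog, "metal")])
```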

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Remove macOS launchd LaunchAgent (cherry-picked extra)

Drop the launchd service from the PR #213 cherry-picks: the
install-service-macos.sh installer, the com.odysseus.ui.plist template, and the
README section documenting them. Tangential to the core Cookbook/Metal support
and not wanted. The build-macos-app.sh launcher is kept.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add one-command macOS quick start (start-macos.sh)

Running Odysseus natively on a Mac previously meant ~7 manual terminal steps
(brew deps, venv, activate, pip, setup.py, uvicorn with the right port) — not
friendly for a generic macOS user, and the native run is required because Docker
on macOS can't reach the Metal GPU.

- start-macos.sh: installs Homebrew deps (python@3.11, tmux, prebuilt Metal
  llama.cpp), creates the venv, installs requirements, runs setup, and launches
  on a non-AirPlay port (7860). Idempotent; re-run to start again.
- README: the Apple Silicon section now leads with this one-command quick start
  and the clickable .app, with engine/port/manual details folded into a
  collapsible block. Added a pointer at the top of the manual-install section.
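
The port choice matters because a busy port otherwise surfaces only as a bind error at launch. A minimal pre-flight check, written here in Python rather than the script's own shell, might look like this; the function names are hypothetical, while the port numbers and `ODYSSEUS_PORT` come from this commit.

```python
import socket

AIRPLAY_PORTS = (5000, 7000)  # ports macOS AirPlay Receiver holds, per the README
DEFAULT_PORT = 7860           # default used by start-macos.sh


def port_in_use(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def check_port(port: int = DEFAULT_PORT, host: str = "127.0.0.1") -> None:
    """Fail fast with a readable message instead of a late bind error."""
    if port_in_use(host, port):
        hint = " (macOS AirPlay Receiver uses this port)" if port in AIRPLAY_PORTS else ""
        raise SystemExit(
            f"Port {port} is already in use{hint}. Set ODYSSEUS_PORT to another port."
        )


if __name__ == "__main__":
    check_port()
    print(f"Port {DEFAULT_PORT} is free")
```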

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* macOS quick start: auto-open browser when ready

The "open this URL" line scrolled out of view as uvicorn kept logging after it,
so users missed it. Now start-macos.sh waits (in the background) until the
server accepts connections, prints a boxed "ready" banner at that point (i.e.
after the startup burst, not before), and opens the URL in the default browser
automatically. Skippable with ODYSSEUS_NO_OPEN=1 for headless/SSH use.
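
The wait-then-open behaviour can be sketched in Python as below. This is an illustration of the polling idea, not the shell code in start-macos.sh; the function names are hypothetical, and only the `ODYSSEUS_NO_OPEN=1` switch is taken from the text above.

```python
import os
import socket
import time
import webbrowser


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def open_when_ready(host: str, port: int, timeout: float = 120.0) -> bool:
    """Open the UI once the server is reachable; return False on timeout."""
    if not wait_for_port(host, port, timeout):
        return False
    # Headless/SSH sessions can opt out of the browser launch.
    if os.environ.get("ODYSSEUS_NO_OPEN") != "1":
        webbrowser.open(f"http://{host}:{port}")
    return True


if __name__ == "__main__":
    ready = open_when_ready("127.0.0.1", 7860, timeout=5.0)
    print("ready" if ready else "timed out")
```

Polling for an accepted connection, rather than sleeping for a fixed time, is what lets the banner appear after the startup log burst instead of before it.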

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Don't assume/force a specific Python version on macOS

The README claimed "system Python is 3.9" — a machine-specific generalization
that's often wrong (macOS ships no recent Python by default; many users already
have 3.11+). Make it generic, and make start-macos.sh detect an existing
Python 3.11+ and use it, only installing python@3.11 when none is found instead
of forcing it on top of the user's Python.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Align start-macos.sh venv path with build-macos-app.sh

start-macos.sh created the environment in .venv/, but build-macos-app.sh and
the manual install steps use venv/ — so the clickable .app wouldn't reuse the
quick-start's environment and would rebuild a second one. Use venv/ everywhere.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* README: state clearly that MLX is unsupported on Apple Silicon

Odysseus has no mlx_lm runtime; it serves GGUF (llama.cpp/Ollama) and CUDA
(vLLM/SGLang) only. MLX-only models can't run on a Mac and are hidden from
Cookbook — make that explicit in both the quick start and the details.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* start-macos.sh: build the venv with an arm64 Python on Apple Silicon

A clean-room run surfaced this: with a universal2/x86 Python (e.g. the
python.org installer under /usr/local), the venv's compiled extensions install
as arm64 but get loaded as x86_64 when launched from the .app bundle, so it
crashes with "incompatible architecture (have arm64, need x86_64)". The terminal
run happened to work only because a universal binary defaults to arm64 there.

On Apple Silicon, look only under /opt/homebrew (arm64-only) for the build
Python, and install Homebrew's python@3.11 if none is present — so the venv is
arm64-only and launches correctly from both the terminal and the .app. Intel
and non-mac paths are unchanged. Verified end-to-end in a clean clone: .app now
boots on Metal with no arch error.
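
The search rule described above reduces to a small decision, sketched here in Python. The function name and the non-Apple-Silicon directory list are assumptions for illustration; only the `/opt/homebrew` restriction on Apple Silicon comes from the text.

```python
import platform
from typing import List


def python_search_dirs(system: str, machine: str) -> List[str]:
    """Directories to search for the Python used to build the venv."""
    if system == "Darwin" and machine == "arm64":
        # Homebrew on Apple Silicon lives under /opt/homebrew and is arm64-only,
        # so a venv built from it cannot pick up x86_64 or universal2 binaries.
        return ["/opt/homebrew/bin"]
    # Intel Macs and other platforms keep a broader search (illustrative list).
    return ["/usr/local/bin", "/usr/bin"]


if __name__ == "__main__":
    print(python_search_dirs(platform.system(), platform.machine()))
```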

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Address dev-exp review: macOS setup robustness + doc/UX fixes

From the voltagent dev-exp review of the branch:
- README: fix broken anchor links (the em-dash heading produced a slug the links
  didn't match); simplify the heading to a stable slug.
- cookbook_routes.py: add /opt/homebrew/bin and /usr/local/bin to the serve PATH
  so a brew-installed llama-server/ollama is found instead of falling back to a
  slow source build.
- start-macos.sh: guard against an empty Python path; fail fast with a clear
  message on port-in-use; ERR trap with a "safe to re-run" message; show pip
  progress (drop --quiet on the slow requirements install); stop the background
  browser-opener cleanly on exit/Ctrl+C (no orphaned poller).
- setup.py: bind hint to 127.0.0.1; suppress the manual run-hint when launched
  by start-macos.sh (ODYSSEUS_SKIP_RUN_HINT) so the URL isn't contradictory.
- build-macos-app.sh: the .app only opens the browser once the server is
  actually ready (not after the readiness timeout).
- cookbookServe.js: drop "Diffusers" from the Metal backend picker —
  diffusion_server.py is CUDA-only, so it was an unservable option on macOS.
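
The serve PATH change in the list above amounts to searching the Homebrew bin directories as well as the inherited PATH. A minimal Python sketch of that lookup follows; the function name is hypothetical, and the real change is a bash `export PATH=...` line in the generated runner script.

```python
import os
import shutil
from typing import Optional, Sequence

# /opt/homebrew = Apple Silicon, /usr/local = Intel Homebrew.
HOMEBREW_BIN_DIRS = ("/opt/homebrew/bin", "/usr/local/bin")


def find_tool(name: str, extra_dirs: Sequence[str] = HOMEBREW_BIN_DIRS) -> Optional[str]:
    """Locate an executable, searching extra_dirs before the inherited PATH."""
    parts = list(extra_dirs)
    env_path = os.environ.get("PATH")
    if env_path:
        parts.append(env_path)
    return shutil.which(name, path=os.pathsep.join(parts))


if __name__ == "__main__":
    for tool in ("llama-server", "ollama"):
        print(tool, "->", find_tool(tool))
```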

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: yunggilja <yunggilja@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
John Chaplin
2026-06-01 15:29:19 +09:30
committed by GitHub
parent b998c52dd0
commit f1817fd560
13 changed files with 835 additions and 29 deletions
+73 -3
@@ -59,6 +59,10 @@ image build. Open `http://localhost:7000` after the containers are healthy.
If port `7000` is already taken, set `APP_PORT=7001` (or another free port)
in `.env`, recreate the container, and open `http://localhost:7001`.
> **On Apple Silicon, Docker can't use the Metal GPU** (it runs a Linux VM), so
> Cookbook will serve models on the CPU only. For GPU-accelerated Cookbook,
> run the app natively — see [Apple Silicon](#apple-silicon-m-series).
Cookbook remote servers use an Odysseus-owned SSH key from `./data/ssh`
inside Docker. In **Cookbook -> Settings -> Servers**, generate/copy the
public key and add it to the remote server's `~/.ssh/authorized_keys`.
@@ -111,8 +115,12 @@ The Cookbook model catalog check should print a non-zero count. If it prints
`0`, rebuild the Odysseus image with `docker compose build --no-cache odysseus`.
### Option 2: Manual install — Linux / macOS
-**Requirements:** Python 3.11+. On Linux/Termux, Cookbook also requires `tmux`
-for background model downloads and serves.
+**Requirements:** Python 3.11+. Cookbook also requires `tmux` for background
+model downloads and serves.
> **On macOS (Apple Silicon)?** Skip the manual steps below — run
> `./start-macos.sh` for a one-command setup. See
> [Apple Silicon](#apple-silicon-m-series).
Install system packages first:
```bash
@@ -124,19 +132,81 @@ sudo pacman -S tmux
# Fedora
sudo dnf install tmux
# macOS (Homebrew). macOS ships no recent Python by default — install 3.11+
# (skip the python line if you already have Python 3.11 or newer):
brew install python@3.11 tmux
```
Then install Odysseus:
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
-python3 -m venv venv
+python3 -m venv venv # on macOS use: python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py # creates data dirs and prints an initial admin password
python -m uvicorn app:app --host 0.0.0.0 --port 7000
```
#### Apple Silicon (M-series)
> **On a Mac, run Odysseus natively (not in Docker) so Cookbook can use the
> Metal GPU.** Cookbook serves models on whatever machine Odysseus runs on, and
> Docker on macOS is a Linux VM with **no access to the GPU** — in a container
> your Mac looks like a CPU-only Linux box.
**Quick start — one command.** From a fresh clone:
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
./start-macos.sh
```
That installs what's needed via Homebrew (Python 3.11+, `tmux`, and a prebuilt
Metal `llama-server`), sets everything up, and launches Odysseus at
**http://127.0.0.1:7860**. Log in with the admin password it prints, open
**Cookbook**, and it detects your GPU (`backend: metal`) and recommends GGUF
models that fit your Mac. (MLX models aren't supported on macOS and are hidden —
see below.) Re-run `./start-macos.sh` any time to start it again (use another
port with `ODYSSEUS_PORT=7900 ./start-macos.sh`).
**Prefer a clickable app?** After your first `./start-macos.sh`, build a
launcher `Odysseus.app` (+ a drag-to-Applications `.dmg`) that starts the server
and opens the UI in its own window:
```bash
./build-macos-app.sh # → dist/Odysseus.app and dist/Odysseus.dmg
```
<details>
<summary>What <code>start-macos.sh</code> does, serving engines, and manual steps</summary>
`start-macos.sh` is just the manual steps wrapped up: Homebrew deps → a Python
`venv` → `pip install -r requirements.txt` → `python setup.py` → `uvicorn` on a
non-AirPlay port. Run them by hand if you prefer (the Linux steps above, but use
`python3.11 -m venv` and `--port 7860`).
**Serving engines on Metal** — Cookbook only recommends models it can serve here:
- **llama.cpp** — `brew install llama.cpp` (done by `start-macos.sh`) provides a
prebuilt Metal `llama-server`, no compile. Without it, Cookbook builds it from
source on first serve, which needs `cmake` + Xcode Command Line Tools
(`brew install cmake && xcode-select --install`).
- **Ollama** — `brew install ollama` is another simple Metal-accelerated option.
- vLLM/SGLang are CUDA/ROCm-only and do **not** run on macOS.
**MLX models are not supported on Apple Silicon.** Odysseus serves models via
llama.cpp/Ollama (GGUF) and vLLM/SGLang (CUDA) — it has no MLX (`mlx_lm`)
runtime. So MLX-only models can't be served on a Mac and are deliberately
**hidden** from Cookbook's recommendations there; pick a GGUF build instead.
**Port 7000 & AirPlay** — macOS AirPlay Receiver holds ports 7000/5000, so
`start-macos.sh` defaults to **7860**. To use 7000, turn AirPlay Receiver off in
System Settings → General → AirDrop & Handoff.
**Build prerequisites baked in** — the `.app` wraps this repo's `venv` (it
doesn't bundle Python), so the path is fixed at build time — rebuild if you move
the repo.
</details>
### Option 3: Manual install — Windows (PowerShell)
Windows support is not actively tested. Use it with caution; Docker on Linux
or a Linux/macOS manual install is the safer path for now.
+169
@@ -0,0 +1,169 @@
#!/bin/bash
# Build a downloadable macOS launcher app + .dmg for Odysseus.
#
# ./build-macos-app.sh
#
# Produces:
# dist/Odysseus.app — double-click: starts the local server (using this
# repo's venv) and opens the UI in an app-style window.
# dist/Odysseus.dmg — drag-to-Applications disk image (the downloadable).
#
# This is a *launcher* wrapper: it drives the venv we set up in this repo, it
# does not bundle Python. The install path is baked into the app at build time,
# so rebuild if you move the repo. Override the port with ODYSSEUS_PORT.
set -e
REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_NAME="Odysseus"
INSTALL_DIR="$REPO_DIR"
PORT="${ODYSSEUS_PORT:-7860}"
DIST="$REPO_DIR/dist"
APP="$DIST/$APP_NAME.app"
echo "Building $APP_NAME.app"
echo " install dir: $INSTALL_DIR"
echo " port: $PORT"
rm -rf "$APP"
mkdir -p "$APP/Contents/MacOS" "$APP/Contents/Resources"
# ── Icon (best effort) — center-crop docs/odysseus.jpg to a square .icns ──
if [ -f "$REPO_DIR/docs/odysseus.jpg" ] && command -v sips >/dev/null 2>&1; then
  TMPIMG="$(mktemp -d)"
  # Center-crop to a square, scale to 512 (sips' icns encoder caps at 512), and
  # let sips emit the .icns directly — more robust across macOS versions than
  # building an .iconset by hand.
  sips -c 720 720 "$REPO_DIR/docs/odysseus.jpg" --out "$TMPIMG/sq.png" >/dev/null 2>&1 || cp "$REPO_DIR/docs/odysseus.jpg" "$TMPIMG/sq.png"
  sips -z 512 512 "$TMPIMG/sq.png" --out "$TMPIMG/icon.png" >/dev/null 2>&1
  if sips -s format icns "$TMPIMG/icon.png" --out "$APP/Contents/Resources/odysseus.icns" >/dev/null 2>&1; then
    echo " icon: odysseus.icns"
  else
    echo " icon: (skipped — conversion failed)"
  fi
  rm -rf "$TMPIMG"
else
  echo " icon: (skipped — no docs/odysseus.jpg)"
fi
# ── Info.plist ──
cat > "$APP/Contents/Info.plist" <<PLIST
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleName</key> <string>$APP_NAME</string>
<key>CFBundleDisplayName</key> <string>$APP_NAME</string>
<key>CFBundleIdentifier</key> <string>com.odysseus.launcher</string>
<key>CFBundleVersion</key> <string>1.0</string>
<key>CFBundleShortVersionString</key><string>1.0</string>
<key>CFBundlePackageType</key> <string>APPL</string>
<key>CFBundleExecutable</key> <string>$APP_NAME</string>
<key>CFBundleIconFile</key> <string>odysseus</string>
<key>LSMinimumSystemVersion</key> <string>11.0</string>
<key>NSHighResolutionCapable</key> <true/>
<key>LSUIElement</key> <false/>
</dict>
</plist>
PLIST
# ── Launcher executable (placeholders filled below) ──
cat > "$APP/Contents/MacOS/$APP_NAME.tmpl" <<'LAUNCHER'
#!/bin/bash
# Odysseus.app — start the local server and open the UI in an app window.
INSTALL_DIR="__INSTALL_DIR__"
PORT="__PORT__"
URL="http://127.0.0.1:${PORT}"
export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:$PATH"
UVICORN="$INSTALL_DIR/venv/bin/uvicorn"
LOG="$INSTALL_DIR/logs/odysseus-app.log"
notify() { /usr/bin/osascript -e "display notification \"$1\" with title \"Odysseus\"" >/dev/null 2>&1; }
die_gui() {
  /usr/bin/osascript -e "display dialog \"$1\" with title \"Odysseus\" buttons {\"OK\"} default button 1 with icon stop" >/dev/null 2>&1
  exit 1
}
[ -x "$UVICORN" ] || die_gui "Odysseus isn't set up yet. Open Terminal and run:
cd $INSTALL_DIR
python3.11 -m venv venv
./venv/bin/pip install -r requirements.txt
./venv/bin/python setup.py"
# Open the UI in a chrome-less app window (Chromium browsers), else default browser.
open_ui() {
  local b base exe bin
  for b in "Google Chrome" "Microsoft Edge" "Brave Browser" "Chromium"; do
    for base in "/Applications" "$HOME/Applications"; do
      if [ -d "$base/$b.app" ]; then
        exe="$(/usr/bin/defaults read "$base/$b.app/Contents/Info" CFBundleExecutable 2>/dev/null)"
        bin="$base/$b.app/Contents/MacOS/$exe"
        if [ -x "$bin" ]; then
          "$bin" --app="$URL" --new-window >/dev/null 2>&1 &
          return 0
        fi
      fi
    done
  done
  /usr/bin/open "$URL"
}
mkdir -p "$INSTALL_DIR/logs"
# Already running? Just open the UI.
if /usr/bin/curl -s -o /dev/null --max-time 2 "$URL"; then
  open_ui
  exit 0
fi
notify "Starting…"
cd "$INSTALL_DIR" || die_gui "Install folder not found: $INSTALL_DIR"
"$UVICORN" app:app --host 127.0.0.1 --port "$PORT" >>"$LOG" 2>&1 &
SERVER_PID=$!
# Quitting the app stops the server it started.
trap 'kill $SERVER_PID 2>/dev/null; exit 0' TERM INT
# Wait for readiness (first run downloads an embedding model — allow ~2 min).
READY=0
for i in $(seq 1 120); do
  /usr/bin/curl -s -o /dev/null --max-time 2 "$URL" && { READY=1; break; }
  kill -0 "$SERVER_PID" 2>/dev/null || die_gui "Odysseus failed to start. Log:
$LOG"
  sleep 1
done
if [ "$READY" = "1" ]; then
  open_ui
else
  notify "Odysseus is taking a while — open $URL once it finishes starting."
fi
wait "$SERVER_PID"
LAUNCHER
sed -e "s|__INSTALL_DIR__|$INSTALL_DIR|g" -e "s|__PORT__|$PORT|g" \
"$APP/Contents/MacOS/$APP_NAME.tmpl" > "$APP/Contents/MacOS/$APP_NAME"
rm -f "$APP/Contents/MacOS/$APP_NAME.tmpl"
chmod +x "$APP/Contents/MacOS/$APP_NAME"
# Refresh Finder's icon cache for the new bundle.
touch "$APP"
# ── .dmg (drag-to-Applications) ──
echo "Packaging dist/$APP_NAME.dmg"
STAGE="$(mktemp -d)/dmg"
mkdir -p "$STAGE"
cp -R "$APP" "$STAGE/"
ln -s /Applications "$STAGE/Applications"
rm -f "$DIST/$APP_NAME.dmg"
hdiutil create -volname "$APP_NAME" -srcfolder "$STAGE" -ov -format UDZO "$DIST/$APP_NAME.dmg" >/dev/null
rm -rf "$STAGE"
echo ""
echo "Done:"
echo " $APP"
echo " $DIST/$APP_NAME.dmg"
echo ""
echo "Run it: open '$APP'"
echo "Install: open '$DIST/$APP_NAME.dmg' (drag Odysseus to Applications)"
+22
@@ -102,6 +102,28 @@ def _shell_path(p: str) -> str:
    return '"' + p + '"'


def _local_tooling_path_export(executable: str) -> str:
    """Bash line prepending the running interpreter's bin dir to PATH.

    When Odysseus runs from a virtualenv, that bin dir holds the tools the
    cookbook runners shell out to (`hf`, `python`). tmux runners start from a
    fresh login shell with the venv NOT activated, so without this they can't
    find `hf` and downloads fail with "hf: command not found" — notably on
    macOS, where the `pip --user` self-heal also misses (`pip` isn't a command,
    only `pip3`/`python3 -m pip`). Local runs only; meaningless over SSH.
    """
    bin_dir = os.path.dirname(os.path.abspath(executable))
    # Escape for a double-quoted context: $PATH must still expand, but spaces
    # and shell metacharacters in the path must be preserved literally.
    esc = (
        bin_dir.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("$", "\\$")
        .replace("`", "\\`")
    )
    return f'export PATH="{esc}:$PATH"'


def _ps_squote(v: str) -> str:
    """Escape a value for PowerShell single-quoted string interpolation.
    Belt-and-suspenders on top of _validate_token's regex — if the regex
+55 -8
@@ -7,6 +7,7 @@ import os
import re
import shlex
import shutil
import sys
import uuid
from pathlib import Path
@@ -25,7 +26,7 @@ from routes.cookbook_helpers import (
    _validate_repo_id, _validate_include, _validate_remote_host, _validate_token,
    _validate_local_dir, _validate_ssh_port, _validate_gpus, _shell_path,
    _ps_squote, _bash_squote, _validate_serve_cmd, _parse_serve_phase,
-    _safe_env_prefix,
+    _safe_env_prefix, _local_tooling_path_export,
    ModelDownloadRequest, ServeRequest,
)
@@ -357,16 +358,22 @@ def setup_cookbook_routes() -> APIRouter:
lines.append(f"export HF_TOKEN='{_bash_squote(req.hf_token)}'")
# Ensure pip-user scripts (e.g. hf CLI installed via --user) are on PATH
lines.append('export PATH="$HOME/.local/bin:$PATH"')
# When Odysseus runs from a venv (e.g. native macOS install), put its bin
# on PATH so the tmux shell finds the bundled `hf`/`python3` without an
# activated venv. Local bash runs only — meaningless over SSH/Windows.
if not req.remote_host and req.platform != "windows":
lines.append(_local_tooling_path_export(sys.executable))
# Best-effort install hf CLI (always). hf_transfer (Rust parallel downloader)
# is fast but flaky on large files — it tends to crash near the end at high
# throughput. Retries set disable_hf_transfer to fall back to the plain,
# slower-but-reliable downloader (resumes cleanly from the .incomplete files).
-lines.append("command -v hf >/dev/null 2>&1 || pip install --user --break-system-packages -q -U huggingface_hub 2>/dev/null || pip install -q -U huggingface_hub 2>/dev/null")
+# Use `python3 -m pip` not `pip` — macOS has no bare `pip` command.
+lines.append("command -v hf >/dev/null 2>&1 || python3 -m pip install --user --break-system-packages -q -U huggingface_hub 2>/dev/null || python3 -m pip install -q -U huggingface_hub 2>/dev/null")
if req.disable_hf_transfer:
    lines.append("export HF_HUB_ENABLE_HF_TRANSFER=0")
    lines.append("export HF_HUB_DOWNLOAD_MAX_WORKERS=4")
else:
-    lines.append("python3 -c 'import hf_transfer' 2>/dev/null || pip install --user --break-system-packages -q hf_transfer 2>/dev/null || pip install -q hf_transfer 2>/dev/null")
+    lines.append("python3 -c 'import hf_transfer' 2>/dev/null || python3 -m pip install --user --break-system-packages -q hf_transfer 2>/dev/null || python3 -m pip install -q hf_transfer 2>/dev/null")
    lines.append("python3 -c 'import hf_transfer' 2>/dev/null && export HF_HUB_ENABLE_HF_TRANSFER=1")
    lines.append("export HF_HUB_DOWNLOAD_MAX_WORKERS=8")
@@ -845,6 +852,10 @@ def setup_cookbook_routes() -> APIRouter:
# ── Linux/Termux: bash + tmux (existing flow) ──
runner_lines = ["#!/bin/bash"]
runner_lines.extend(_user_shell_path_bootstrap())
# Put Odysseus's own venv bin on PATH (local runs only) so the serve
# shell resolves the bundled python3/hf, mirroring the download flow.
if not remote:
runner_lines.append(_local_tooling_path_export(sys.executable))
runner_lines.append("export FLASHINFER_DISABLE_VERSION_CHECK=1")
if req.hf_token:
    runner_lines.append(f"export HF_TOKEN='{_bash_squote(req.hf_token)}'")
@@ -864,7 +875,10 @@ def setup_cookbook_routes() -> APIRouter:
# Jinja2 rejects (do_tojson ensure_ascii). Build it once from
# source if missing; keep llama-cpp-python only as a fallback.
runner_lines.append('# Ensure a llama.cpp server (prefer native llama-server)')
-runner_lines.append('export PATH="$HOME/.local/bin:$HOME/bin:$HOME/llama.cpp/build/bin:$PATH"')
+# Include the Homebrew bin dirs so a brew-installed llama-server /
+# ollama is found (otherwise macOS falls back to a slow source build).
+# /opt/homebrew = Apple Silicon, /usr/local = Intel; harmless on Linux.
+runner_lines.append('export PATH="$HOME/.local/bin:$HOME/bin:$HOME/llama.cpp/build/bin:/opt/homebrew/bin:/usr/local/bin:$PATH"')
runner_lines.append('if [ -d /data/data/com.termux ]; then')
runner_lines.append(' # Termux: no native build — use the Python bindings (CPU).')
runner_lines.append(' if ! python3 -c "import llama_cpp" 2>/dev/null; then')
@@ -876,17 +890,50 @@ def setup_cookbook_routes() -> APIRouter:
runner_lines.append(' echo "Native llama-server not found — building from source (one-time, may take a few minutes)..."') runner_lines.append(' echo "Native llama-server not found — building from source (one-time, may take a few minutes)..."')
runner_lines.append(' mkdir -p ~/bin') runner_lines.append(' mkdir -p ~/bin')
runner_lines.append(' cd ~ && [ -d llama.cpp ] || git clone --depth 1 https://github.com/ggml-org/llama.cpp') runner_lines.append(' cd ~ && [ -d llama.cpp ] || git clone --depth 1 https://github.com/ggml-org/llama.cpp')
# GPU build if CUDA is present; fall back to a plain (CPU) build. # Build with the right accelerator: Metal on macOS (llama.cpp
runner_lines.append(' cd ~/llama.cpp && { cmake -B build -DGGML_CUDA=ON 2>/dev/null || cmake -B build; } \\') # enables it automatically, no flag), CUDA on Linux when present,
runner_lines.append(' && cmake --build build -j"$(nproc)" --target llama-server \\') # else a plain CPU build. nproc is Linux-only — fall back to
runner_lines.append(' && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server') # `sysctl hw.ncpu` on macOS. (Tip: `brew install llama.cpp` ships
# a prebuilt llama-server and skips this whole source build.)
runner_lines.append(' NPROC="$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)"')
runner_lines.append(' if [ "$(uname -s)" = "Darwin" ]; then')
runner_lines.append(' command -v cmake >/dev/null 2>&1 || echo "WARNING: cmake not found — install it with: brew install cmake (or: brew install llama.cpp for a prebuilt llama-server)."')
# Start from a clean cache: a prior failed configure (e.g. a CUDA
# attempt) poisons build/CMakeCache.txt, so a plain `cmake -B build`
# would reuse the bad settings and fail again. CMAKE_BUILD_TYPE is
# explicit so the binary is optimized (Metal auto-enables on macOS).
runner_lines.append(' cd ~/llama.cpp && rm -rf build && cmake -B build -DCMAKE_BUILD_TYPE=Release \\')
runner_lines.append(' && cmake --build build -j"$NPROC" --target llama-server \\')
runner_lines.append(' && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server')
runner_lines.append(' else')
runner_lines.append(' cd ~/llama.cpp && { cmake -B build -DGGML_CUDA=ON 2>/dev/null || cmake -B build; } \\')
runner_lines.append(' && cmake --build build -j"$NPROC" --target llama-server \\')
runner_lines.append(' && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server')
runner_lines.append(' fi')
runner_lines.append(' # If the native build failed, fall back to the Python bindings.')
runner_lines.append(' if ! command -v llama-server &>/dev/null && ! python3 -c "import llama_cpp" 2>/dev/null; then')
runner_lines.append(' echo "llama-server build failed — installing Python bindings as fallback..."')
runner_lines.append(' pip install --user --break-system-packages -q llama-cpp-python 2>/dev/null || pip install -q llama-cpp-python 2>/dev/null || true')
runner_lines.append(' fi')
runner_lines.append('fi')
elif "ollama" in req.cmd:
# Ollama manages its own model store and HTTP server. Just make
# sure the binary exists and the daemon is up before running the
# command (the natural serving engine on Apple Silicon / Metal).
runner_lines.append('if ! command -v ollama &>/dev/null; then')
runner_lines.append(' echo "ERROR: Ollama not found. Install it (macOS: brew install ollama, or https://ollama.com/download), then launch again."')
runner_lines.append(' exit 127')
runner_lines.append('fi')
runner_lines.append('if ! curl -sf http://localhost:11434/api/tags >/dev/null 2>&1; then')
runner_lines.append(' echo "Starting ollama server..."; (ollama serve >/dev/null 2>&1 &)')
runner_lines.append(' for _ in 1 2 3 4 5 6 7 8 9 10; do curl -sf http://localhost:11434/api/tags >/dev/null 2>&1 && break; sleep 1; done')
runner_lines.append('fi')
elif "vllm serve" in req.cmd:
# vLLM is CUDA/ROCm-only and does not run on macOS at all.
runner_lines.append('if [ "$(uname -s)" = "Darwin" ]; then')
runner_lines.append(' echo "ERROR: vLLM does not run on macOS. Use Ollama or llama.cpp (Metal) instead."')
runner_lines.append(' exit 1')
runner_lines.append('fi')
# Put ~/.local/bin on PATH first — without a venv, vllm installs
# there via --user and the non-login serve shell otherwise can't
# find the `vllm` CLI ("command not found"). Mirrors llama.cpp above.
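The accelerator choice made by the generated shell above can be summarized in Python. This is a minimal sketch, not code from this PR: `cmake_configure_args` and its `has_cuda` parameter are hypothetical names used only for illustration. The real runner decides at shell runtime via `uname -s` and by attempting a CUDA configure first, rather than by taking a flag.

```python
def cmake_configure_args(system: str, has_cuda: bool = False) -> list[str]:
    """Return a cmake configure command for a llama.cpp build.

    Hypothetical helper: `system` is the output of `uname -s`, and
    `has_cuda` stands in for the runner's "try CUDA, else CPU" fallback.
    """
    args = ["cmake", "-B", "build"]
    if system == "Darwin":
        # Metal is enabled automatically on macOS, so only the build type is set.
        args.append("-DCMAKE_BUILD_TYPE=Release")
    elif has_cuda:
        # Linux with CUDA present: request the CUDA backend explicitly.
        args.append("-DGGML_CUDA=ON")
    # Otherwise: plain CPU build with no extra flags.
    return args


print(cmake_configure_args("Darwin"))
print(cmake_configure_args("Linux", has_cuda=True))
```

Keeping the decision in one place makes the three cases (Metal, CUDA, CPU) easy to test without running cmake.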
+71 -3
@@ -95,21 +95,89 @@ def cmd_list(args) -> None:
# ─── gpus ────────────────────────────────────────────────────────────
def _macos_metal_gpu() -> list | None:
"""Apple Silicon has no discrete VRAM — report total unified memory as the
GPU budget so the web UI's picker shows the Mac's Metal GPU instead of
'no GPU'. `free` is approximated from vm_stat (page-granular); macOS doesn't
expose Metal utilization to the shell, so util is 0. Returns None off macOS."""
if sys.platform != "darwin":
return None
def _sysctl(key: str) -> str | None:
try:
r = subprocess.run(["sysctl", "-n", key], capture_output=True, text=True, timeout=5)
return r.stdout.strip() if r.returncode == 0 else None
except Exception:
return None
memsize = _sysctl("hw.memsize")
if not memsize or not memsize.isdigit():
return None
total_mb = int(memsize) // (1024 * 1024)
name = _sysctl("machdep.cpu.brand_string") or "Apple Silicon"
free_mb = total_mb
try:
vm = subprocess.run(["vm_stat"], capture_output=True, text=True, timeout=5)
if vm.returncode == 0:
page_size, pages = 4096, {}
for line in vm.stdout.splitlines():
if "page size of" in line:
m = re.search(r"page size of (\d+)", line)
if m:
page_size = int(m.group(1))
elif ":" in line:
k, v = line.split(":", 1)
v = v.strip().rstrip(".")
if v.isdigit():
pages[k.strip()] = int(v)
free_pages = (pages.get("Pages free", 0) + pages.get("Pages inactive", 0)
+ pages.get("Pages speculative", 0))
if free_pages:
free_mb = (free_pages * page_size) // (1024 * 1024)
except Exception:
pass
return [{
"index": 0,
"name": name,
"free_mb": free_mb,
"total_mb": total_mb,
"used_mb": max(0, total_mb - free_mb),
"util_pct": 0,
"uuid": "apple-metal-0",
"unified_memory": True,
"busy": (free_mb / total_mb) < 0.5 if total_mb else False,
}]
def cmd_gpus(args) -> None:
"""Same shape the web UI gets — index/name/free_mb/total_mb/used_mb/
-util_pct/uuid. Returns `[]` with an `error` field if nvidia-smi is
-missing (laptop / CPU-only box). Pass `--host user@box` to run over
-SSH against a remote machine."""
+util_pct/uuid. On Apple Silicon (no nvidia-smi) reports the Metal GPU's
+unified memory instead. Returns `[]` with an `error` field only on a
+CPU-only non-Mac box. Pass `--host user@box` to run over SSH."""
query = "nvidia-smi --query-gpu=index,name,memory.free,memory.total,memory.used,utilization.gpu,uuid --format=csv,noheader,nounits"
prefix = _ssh_prefix(args.host, args.ssh_port)
cmd = prefix + (query.split() if not prefix else [query])
try:
out = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
except FileNotFoundError:
# No nvidia-smi locally → try the Metal fallback before giving up.
if not prefix:
mac = _macos_metal_gpu()
if mac is not None:
emit({"ok": True, "gpus": mac, "backend": "metal"}, args)
return
msg = "ssh not found" if prefix else "nvidia-smi not found"
emit({"ok": False, "error": msg, "gpus": []}, args)
return
if out.returncode != 0:
# nvidia-smi present but errored (or no NVIDIA GPU) — fall back to Metal.
if not prefix:
mac = _macos_metal_gpu()
if mac is not None:
emit({"ok": True, "gpus": mac, "backend": "metal"}, args)
return
emit({"ok": False, "error": out.stderr.strip()[:200], "gpus": []}, args)
return
gpus = []
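The `vm_stat` parsing in `_macos_metal_gpu` can be tried in isolation. The sketch below follows the same steps (read the page size from the header, sum free, inactive and speculative pages, convert to MiB). The function name `reclaimable_mb` and the sample output are illustrative assumptions, not values captured from a real machine.

```python
import re
from typing import Optional

# Made-up vm_stat output for illustration; real output has many more rows.
SAMPLE_VM_STAT = """\
Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                               10000.
Pages active:                            200000.
Pages inactive:                           50000.
Pages speculative:                         5536.
"""


def reclaimable_mb(vm_stat_output: str) -> Optional[int]:
    """Approximate free memory in MiB from vm_stat text (hypothetical helper)."""
    page_size, pages = 4096, {}
    for line in vm_stat_output.splitlines():
        m = re.search(r"page size of (\d+)", line)
        if m:
            # Header line carries the page size (16 KiB on Apple Silicon).
            page_size = int(m.group(1))
        elif ":" in line:
            key, value = line.split(":", 1)
            value = value.strip().rstrip(".")  # counts end with a trailing dot
            if value.isdigit():
                pages[key.strip()] = int(value)
    free_pages = sum(
        pages.get(k, 0)
        for k in ("Pages free", "Pages inactive", "Pages speculative")
    )
    if not free_pages:
        return None
    return (free_pages * page_size) // (1024 * 1024)


print(reclaimable_mb(SAMPLE_VM_STAT))  # 65536 pages * 16384 bytes = 1024 MiB
```

Counting inactive and speculative pages as free is a deliberate approximation: macOS can reclaim them on demand, so they are available to a model even though they are not strictly unused.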
+27 -6
@@ -19,12 +19,22 @@ GPU_BANDWIDTH = {
"6950 xt": 576, "6900 xt": 512, "6800 xt": 512, "6800": 512, "6700 xt": 384, "6600 xt": 256, "6600": 224,
"mi300x": 5300, "mi300": 5300, "mi250x": 3277, "mi250": 3277, "mi210": 1638, "mi100": 1229,
"9070 xt": 624, "9070": 488,
# Apple Silicon unified-memory bandwidth (GB/s). Keyed off the chip name
# reported by sysctl machdep.cpu.brand_string (e.g. "Apple M4 Max"). The order
# of entries here does not matter: keys are length-sorted below, which
# guarantees "m4 max" is tried before "m4".
"m1 ultra": 800, "m1 max": 400, "m1 pro": 200, "m1": 68,
"m2 ultra": 800, "m2 max": 400, "m2 pro": 200, "m2": 100,
"m3 ultra": 800, "m3 max": 300, "m3 pro": 150, "m3": 100,
"m4 max": 410, "m4 pro": 273, "m4": 120,
}
# Pre-sort keys by length descending for correct substring matching
_BW_KEYS_SORTED = sorted(GPU_BANDWIDTH.keys(), key=len, reverse=True)
-FALLBACK_K = {"cuda": 220, "rocm": 180, "cpu_x86": 70, "cpu_arm": 90}
+# metal: backstop for Apple Silicon chips not in GPU_BANDWIDTH (e.g. a future
+# M5) — the named chips above take the accurate bandwidth path instead.
+FALLBACK_K = {"cuda": 220, "rocm": 180, "metal": 150, "cpu_x86": 70, "cpu_arm": 90}
USE_CASE_WEIGHTS = {
"general": (0.45, 0.30, 0.15, 0.10),
@@ -411,17 +421,28 @@ def rank_models(system, use_case=None, limit=50, search=None, sort="score", quan
# If user picked a prequantized format (AWQ/FP8/GPTQ), filter to only those models
filter_native = quant and any(quant.startswith(p) for p in ("AWQ-", "GPTQ-", "FP8"))
-# MLX-quantized models only run on Apple Silicon (Metal). Exclude them on
-# every other backend (CUDA / ROCm / CPU) so Linux/Windows users don't see
-# unrunnable suggestions.
system_backend = (system.get("backend") or "").lower()
apple_silicon = system_backend in ("mps", "metal", "apple")
for m in models:
native_q = m.get("quantization", "")
-# Drop MLX models on non-Apple hardware
-if not apple_silicon and native_q.startswith("mlx-"):
+# MLX-quantized models need the MLX runtime (mlx_lm), which Odysseus
+# doesn't generate serve commands for — only llama.cpp/Ollama (Metal)
# and vLLM/SGLang (CUDA). MLX repos ship no GGUF alternative, so they're
# unrunnable on every backend we support. Always drop them, on Apple
# Silicon too, so the Cookbook never recommends a model it can't serve.
if native_q.startswith("mlx-"):
continue
# On Apple Silicon the only serving engines are llama.cpp and Ollama,
# both GGUF-only (vLLM/SGLang are CUDA/ROCm and don't run on macOS). So
# a model is Metal-servable ONLY if it ships a real GGUF. Drop everything
# else — raw safetensors repos (which the catalog still tags with a
# default GGUF quant) and vLLM-only AWQ/GPTQ/FP8 builds alike. Without
# this the Cookbook recommends models the Mac can't run; on CUDA these
# stay visible because vLLM serves safetensors directly.
if apple_silicon and not (m.get("is_gguf") or m.get("gguf_sources")):
continue
# Format filter: AWQ tab → only AWQ models, FP8 tab → only FP8 models
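The reason the bandwidth keys are length-sorted can be shown with a small lookup. This is a sketch under assumptions: the table is trimmed to three entries from the hunk above, and `lookup_bandwidth` is a hypothetical name, since the actual lookup function in fit.py is not part of this diff.

```python
from typing import Optional

# Subset of the Apple Silicon entries added in this PR (GB/s).
GPU_BANDWIDTH = {"m4": 120, "m4 pro": 273, "m4 max": 410}

# Longest keys first, so "m4 max" is tested before the bare "m4".
_BW_KEYS_SORTED = sorted(GPU_BANDWIDTH.keys(), key=len, reverse=True)


def lookup_bandwidth(gpu_name: str) -> Optional[int]:
    """Return the bandwidth of the first key found in the chip name."""
    name = gpu_name.lower()
    for key in _BW_KEYS_SORTED:
        if key in name:
            return GPU_BANDWIDTH[key]
    return None  # caller falls back to a per-backend constant


print(lookup_bandwidth("Apple M4 Max"))  # matches "m4 max", not "m4"
print(lookup_bandwidth("Apple M4"))
print(lookup_bandwidth("Apple M5"))      # unknown chip
```

Without the sort, a plain dict scan could hit "m4" first for an M4 Max and report less than a third of its real bandwidth. An unknown chip returns nothing here, which is the case the `metal` entry in `FALLBACK_K` is meant to cover.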
+97 -2
@@ -204,6 +204,82 @@ def _detect_amd():
return None
def _detect_apple_silicon():
"""Detect Apple Silicon (M-series) GPUs.
Macs have no discrete VRAM; the GPU shares the system's unified memory.
We report a fraction of total RAM as the usable GPU budget (matching macOS's
default Metal working-set limit) so the Cookbook recommends models that
actually run on the GPU instead of classifying the machine as CPU-only.
backend="metal" is what services.hwfit.fit and the serve-command generation
key off of (fit.py already treats "metal" as Apple Silicon). Works locally
(platform.system()=="Darwin") and over SSH (uname -s == Darwin).
"""
# Gate to macOS — locally via platform, remotely via uname.
if _remote_host:
if "darwin" not in (_run(["uname", "-s"]) or "").lower():
return None
arch = (_run(["uname", "-m"]) or "").lower()
else:
if platform.system() != "Darwin":
return None
arch = platform.machine().lower()
# Only Apple Silicon (arm64) has a Metal GPU worth serving LLMs on; Intel
# Macs fall through to the CPU path.
if "arm" not in arch and "aarch64" not in arch:
return None
# Chip name, e.g. "Apple M4 Max" — carries the Pro/Max/Ultra variant that
# the fit bandwidth table keys off of.
brand = (_run(["sysctl", "-n", "machdep.cpu.brand_string"]) or "Apple Silicon").strip()
# Total unified memory in bytes.
memsize = _run(["sysctl", "-n", "hw.memsize"])
try:
total_gb = int(memsize) / (1024**3) if memsize else 0.0
except ValueError:
total_gb = 0.0
if total_gb <= 0:
return None
# Usable GPU budget. macOS lets Metal use most of unified memory, but the
# default working-set limit scales with RAM: small machines have to keep
# more back for the OS + app. These fractions track Apple's
# recommendedMaxWorkingSetSize defaults across the lineup. Honour an
# explicit override if the user raised it with
# `sudo sysctl iogpu.wired_limit_mb=…`.
if total_gb <= 16:
frac = 0.67
elif total_gb <= 64:
frac = 0.75
else:
frac = 0.80
vram_gb = round(total_gb * frac, 1)
wired = _run(["sysctl", "-n", "iogpu.wired_limit_mb"])
try:
wired_mb = int(wired) if wired else 0
if wired_mb > 0:
vram_gb = round(wired_mb / 1024.0, 1)
except ValueError:
pass
gpu = {"index": 0, "name": brand, "vram_gb": vram_gb}
return {
"gpu_name": brand,
"gpu_vram_gb": vram_gb,
"gpu_count": 1,
"gpus": [gpu],
"gpu_groups": _group_gpus([gpu]),
"homogeneous": True,
"backend": "metal",
# Unified memory: the "VRAM" above is carved out of system RAM, not a
# separate pool — downstream fit logic uses this to avoid double-budgeting.
"unified_memory": True,
}
def _read_file(path):
"""Read a file, locally or via SSH."""
if _remote_host:
@@ -246,6 +322,15 @@ def _get_ram_gb():
return (pages * page_size) / (1024**3)
except Exception:
pass
# macOS has no /proc/meminfo — fall back to sysctl (works locally and over
# SSH to a remote Mac, where the sysconf path above isn't taken).
memsize = _run(["sysctl", "-n", "hw.memsize"])
if memsize:
try:
return int(memsize.strip()) / (1024**3)
except ValueError:
pass
return 0.0
@@ -263,6 +348,12 @@ def _get_cpu_name():
if line.startswith("model name"):
return line.split(":", 1)[1].strip()
# macOS has no /proc/cpuinfo — sysctl gives the chip name (e.g. "Apple M4").
# Harmlessly returns nothing on Linux, so it's safe to try unconditionally.
brand = _run(["sysctl", "-n", "machdep.cpu.brand_string"])
if brand and brand.strip():
return brand.strip()
if not _remote_host:
return platform.processor() or "unknown"
return "unknown"
@@ -270,7 +361,8 @@ def _get_cpu_name():
def _get_cpu_count():
if _remote_host:
-out = _run(["nproc"])
+# nproc on Linux; hw.ncpu via sysctl on a remote Mac (no nproc there).
+out = _run(["nproc"]) or _run(["sysctl", "-n", "hw.ncpu"])
if out:
try:
return int(out.strip())
@@ -411,7 +503,7 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
cpu_cores = _get_cpu_count()
cpu_name = _get_cpu_name()
-gpu_info = _detect_nvidia() or _detect_amd()
+gpu_info = _detect_apple_silicon() or _detect_nvidia() or _detect_amd()
if gpu_info:
result = {
@@ -427,6 +519,9 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
"gpu_groups": gpu_info.get("gpu_groups", []),
"homogeneous": gpu_info.get("homogeneous", True),
"backend": gpu_info["backend"],
# Apple Silicon / AMD APUs share system RAM with the GPU — carry the
# flag through so callers can tell unified from discrete VRAM.
"unified_memory": gpu_info.get("unified_memory", False),
}
else:
if _remote_host:
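The unified-memory budget rule in `_detect_apple_silicon` reduces to a small pure function. The sketch below copies the thresholds and fractions from the hunk above. The name `metal_budget_gb` and its signature are illustrative assumptions, and the real code reads both inputs via `sysctl`.

```python
def metal_budget_gb(total_gb: float, wired_limit_mb: int = 0) -> float:
    """Usable GPU budget in GB for a Mac with `total_gb` of unified memory.

    `wired_limit_mb` mirrors `sysctl iogpu.wired_limit_mb`; 0 means the user
    has not raised the limit, so the RAM-scaled default fraction applies.
    """
    if wired_limit_mb > 0:
        # An explicit override wins over the default fractions.
        return round(wired_limit_mb / 1024.0, 1)
    if total_gb <= 16:
        frac = 0.67   # small machines keep more back for the OS and apps
    elif total_gb <= 64:
        frac = 0.75
    else:
        frac = 0.80
    return round(total_gb * frac, 1)


for ram in (16, 32, 128):
    print(ram, metal_budget_gb(ram))
print(metal_budget_gb(32, wired_limit_mb=49152))
```

A 32 GB machine gets a 24.0 GB budget by default, which is the value `test_apple_silicon_detected_as_metal` asserts, and a 16 GB machine gets 10.7 GB, the figure used in the test fixtures.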
+12 -6
@@ -109,9 +109,12 @@ def check_deps():
print("\n [warn] tmux not found")
print(" Cookbook uses tmux for background downloads and model serves.")
print(" Install it with your OS package manager, for example:")
-print(" sudo apt install tmux")
-print(" sudo pacman -S tmux")
-print(" sudo dnf install tmux")
+if sys.platform == "darwin":
+print(" brew install tmux")
+else:
print(" sudo apt install tmux")
print(" sudo pacman -S tmux")
print(" sudo dnf install tmux")
elif os.name != "nt":
print(" [ok] tmux installed")
@@ -142,9 +145,12 @@ def main():
print(f" [warn] Admin creation failed: {e}")
print("\n=== Setup complete ===")
-print(f"\nStart the server with:")
-print(f" python -m uvicorn app:app --host 0.0.0.0 --port 7000")
-print(f"\nThen open http://localhost:7000")
+# start-macos.sh launches the server itself (on its own port) right after
+# this, so suppress the manual hint there to avoid a contradictory URL.
+if not os.getenv("ODYSSEUS_SKIP_RUN_HINT"):
print(f"\nStart the server with:")
print(f" python -m uvicorn app:app --host 127.0.0.1 --port 7000")
print(f"\nThen open http://localhost:7000")
print(f"Login with the admin username and temporary password printed above.\n")
Executable
+139
@@ -0,0 +1,139 @@
#!/bin/bash
# Odysseus — one-command quick start for macOS (Apple Silicon).
#
# ./start-macos.sh
#
# Installs everything Odysseus needs via Homebrew, sets up a local Python
# environment, and launches the app — so a generic Mac user can run it without
# knowing anything about venvs, pip, or uvicorn. Safe to re-run; it skips work
# that's already done.
#
# Why native (not Docker): Cookbook serves models on whatever machine Odysseus
# runs on, and Docker on macOS is a Linux VM with no access to the Metal GPU.
# Running natively lets Cookbook detect and use your Mac's GPU.
set -e
REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$REPO_DIR"
PORT="${ODYSSEUS_PORT:-7860}" # 7860, not 7000 — macOS AirPlay Receiver holds 7000.
# Friendly message on any failure — re-running is safe (every step is idempotent).
trap 'echo; echo "✗ Setup failed above. It is safe to re-run ./start-macos.sh."; exit 1' ERR
echo "▶ Odysseus quick start for macOS"
# Fail fast if the port is already taken (e.g. a previous run still running).
if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
echo "✗ Port $PORT is already in use. Stop what's using it, or pick another port:"
echo " ODYSSEUS_PORT=7900 ./start-macos.sh"
exit 1
fi
# 1. Homebrew — the macOS package manager. We can't safely auto-install it
# (it wants its own interactive confirmation), so point the user at it.
if ! command -v brew >/dev/null 2>&1; then
echo
echo "Homebrew is required but not installed. Install it (one command), then re-run this script:"
echo ' /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"'
echo
echo "More info: https://brew.sh"
exit 1
fi
# 2. Find a Python 3.11+ to build the environment with.
# On Apple Silicon we require an *arm64* interpreter (Homebrew's, under
# /opt/homebrew). A universal2 or x86 Python — e.g. the python.org installer
# at /usr/local — produces a venv whose compiled extensions get loaded as the
# wrong architecture when launched from the .app bundle (Cookbook then dies
# with "incompatible architecture"). So on arm64 we only look under
# /opt/homebrew and install Homebrew's python@3.11 if it's missing. On Intel
# (or non-mac) we just use whatever Python 3.11+ is on PATH.
PY=""
if [ "$(uname -m)" = "arm64" ]; then
cands="/opt/homebrew/bin/python3.13 /opt/homebrew/bin/python3.12 /opt/homebrew/bin/python3.11"
else
cands="python3 python3.13 python3.12 python3.11"
fi
for cand in $cands; do
p="$(command -v "$cand" 2>/dev/null)" || continue
if "$p" -c 'import sys; raise SystemExit(0 if sys.version_info[:2] >= (3, 11) else 1)' 2>/dev/null; then
PY="$p"; break
fi
done
# System dependencies:
# - tmux : Cookbook runs model downloads/serves in the background
# - llama.cpp : a prebuilt, Metal-enabled llama-server so Cookbook can serve
# GGUF models on the GPU with no compile step
# - python@3.11 : installed only if no suitable (arm64) Python was found above
echo "▶ Installing dependencies (Homebrew)…"
if [ -n "$PY" ]; then
echo " (using $("$PY" --version 2>&1) at $PY)"
brew install tmux llama.cpp
else
brew install python@3.11 tmux llama.cpp
PY="$(command -v /opt/homebrew/bin/python3.11 || command -v python3.11 || true)"
fi
if [ -z "$PY" ] || [ ! -x "$PY" ]; then
echo "✗ Couldn't find a Python 3.11+ to build the environment with."
echo " Check: ls /opt/homebrew/bin/python3* (or install one: brew install python@3.11)"
exit 1
fi
# 3. Python environment + dependencies (kept inside the repo, in venv/).
# Named `venv` to match the manual steps and build-macos-app.sh, so the
# clickable .app reuses this same environment.
if [ ! -d venv ]; then
echo "▶ Creating Python environment…"
"$PY" -m venv venv
fi
echo "▶ Installing Python packages (first run downloads a few — can take a few minutes)…"
./venv/bin/python -m pip install --quiet --upgrade pip
# Not --quiet: this is the slow step, so show progress (and any real errors).
./venv/bin/python -m pip install -r requirements.txt
# 4. First-run setup: creates data dirs and prints an initial admin password
# the first time (idempotent — does nothing if already set up). Suppress its
# manual run hint — we launch the server ourselves just below.
echo "▶ Preparing Odysseus…"
ODYSSEUS_SKIP_RUN_HINT=1 ./venv/bin/python setup.py
# 5. Launch. Bind to loopback only (safe default).
URL="http://127.0.0.1:$PORT"
# Open the browser automatically once the server is accepting connections — so
# the URL isn't lost in the startup logs that keep scrolling. Runs in the
# background and is cleaned up when the server stops. Skip with
# ODYSSEUS_NO_OPEN=1 (e.g. over SSH / headless).
POLLER_PID=""
if [ -z "$ODYSSEUS_NO_OPEN" ] && command -v open >/dev/null 2>&1; then
(
for _ in $(seq 1 90); do
if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
printf '\n'
printf ' ┌────────────────────────────────────────────┐\n'
printf ' │ ✓ Odysseus is ready — opening your browser │\n'
printf ' │ %-40s │\n' "$URL"
printf ' │ (Press Ctrl+C in this window to stop) │\n'
printf ' └────────────────────────────────────────────┘\n\n'
open "$URL"
break
fi
sleep 1
done
) &
POLLER_PID=$!
fi
# Setup is done — drop the setup-failure handler, and clean up the background
# opener when the server exits or the user presses Ctrl+C.
trap - ERR
trap '[ -n "$POLLER_PID" ] && kill "$POLLER_PID" 2>/dev/null' EXIT INT TERM
echo
echo "▶ Starting Odysseus — it will open in your browser at $URL"
echo " (this takes a few seconds; press Ctrl+C here to stop)"
echo
./venv/bin/python -m uvicorn app:app --host 127.0.0.1 --port "$PORT"
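The script probes the port twice with bash's `/dev/tcp` redirection: once to fail fast if something already listens there, and once in the background poller to detect that the server is up. The same check can be sketched in Python with the standard `socket` module. `port_in_use` is a hypothetical helper name, not something this PR adds.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # A successful connect means a listener exists; close it right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: treat as free.
        return False


if __name__ == "__main__":
    # 7860 is the default the script uses to avoid AirPlay Receiver on 7000.
    print(port_in_use("127.0.0.1", 7860))
```

A connect-based probe answers "is a server listening", which is what both call sites need. It does not reserve the port, so another process could still claim it between the check and the launch.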
+15
@@ -171,6 +171,13 @@ export function _isWindows(hostOrTask) {
return _getPlatform(hostOrTask) === 'windows';
}
/** Check if the detected (local) hardware is Apple Silicon / Metal. Keys off the
* hardware probe's backend rather than a platform string, since a local Mac
* reports no platform but does report backend: "metal". */
export function _isMetal() {
return ['metal', 'mps', 'apple'].includes(String(_hwfitCache?.system?.backend || '').toLowerCase());
}
/** Detect model-specific vLLM optimizations */
function _detectModelOptimizations(modelName) {
const n = (modelName || '').toLowerCase();
@@ -252,6 +259,13 @@ export function _detectBackend(model) {
return { backend: 'llamacpp', label: 'llama.cpp' };
}
// Apple Silicon (Metal) → llama.cpp (GGUF). vLLM/SGLang are CUDA/ROCm-only and
// don't run on macOS; AWQ/GPTQ/FP8 (vLLM-only) models are already filtered out
// of metal Cookbook results, so llama.cpp is always the right engine here.
if (['metal', 'mps', 'apple'].includes(sysBackend)) {
return { backend: 'llamacpp', label: 'llama.cpp' };
}
// AWQ / GPTQ / FP8 → vLLM
if (/^AWQ|^GPTQ/.test(q) || q === 'FP8') {
return { backend: 'vllm', label: 'vLLM' };
@@ -1764,6 +1778,7 @@ const shared = {
_sshPrefix,
_getPlatform,
_isWindows,
_isMetal,
_buildEnvPrefix,
_buildServeCmd,
_shellQuote,
+5
@@ -16,6 +16,7 @@ let _getPort;
let _sshPrefix;
let _getPlatform;
let _isWindows;
let _isMetal;
let _buildEnvPrefix;
let _buildServeCmd;
let _shellQuote;
@@ -382,6 +383,9 @@ function _rerenderCachedModels() {
panelHtml += `<div class="hwfit-serve-row">`;
const _backendChoices = _isWindows()
? [['llamacpp','llama.cpp']]
: _isMetal()
// Diffusers (diffusion_server.py) is CUDA-only — omit it on Metal.
? [['llamacpp','llama.cpp'],['ollama','Ollama']]
: [['vllm','vLLM'],['sglang','SGLang'],['llamacpp','llama.cpp'],['diffusers','Diffusers']];
const backendOpts = _backendChoices.map(([v,l]) => `<option value="${v}"${defaultBackend===v?' selected':''}>${l}</option>`).join('');
panelHtml += `<label>${_l('Backend','Inference engine: vLLM, SGLang, llama.cpp, or Diffusers')}<select class="hwfit-sf" data-field="backend">${backendOpts}</select></label>`;
@@ -1592,6 +1596,7 @@ export function initServe(shared) {
_sshPrefix = shared._sshPrefix;
_getPlatform = shared._getPlatform;
_isWindows = shared._isWindows;
_isMetal = shared._isMetal;
_buildEnvPrefix = shared._buildEnvPrefix;
_buildServeCmd = shared._buildServeCmd;
_shellQuote = shared._shellQuote;
+21 -1
@@ -1,7 +1,12 @@
import pytest
from fastapi import HTTPException
-from routes.cookbook_helpers import _safe_env_prefix, _validate_gpus, _validate_ssh_port
+from routes.cookbook_helpers import (
_local_tooling_path_export,
_safe_env_prefix,
_validate_gpus,
_validate_ssh_port,
)
def test_safe_env_prefix_accepts_quoted_venv_path():
@@ -38,3 +43,18 @@ def test_validate_gpus_accepts_indexes_only():
assert _validate_gpus("0,1,2") == "0,1,2"
with pytest.raises(HTTPException):
_validate_gpus("0; rm -rf /")
def test_local_tooling_path_export_prepends_interpreter_bin():
"""The cookbook runners must see the venv's bin (where `hf`/`python` live)
so tmux shells can find them without an activated venv."""
assert (
_local_tooling_path_export("/opt/venv/bin/python")
== 'export PATH="/opt/venv/bin:$PATH"'
)
def test_local_tooling_path_export_preserves_spaces_and_expands_path():
line = _local_tooling_path_export("/Users/John Smith/.venv/bin/python3")
assert line == 'export PATH="/Users/John Smith/.venv/bin:$PATH"'
assert line.endswith(':$PATH"') # $PATH stays expandable in double quotes
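The two tests above pin down the contract of `_local_tooling_path_export`, but its implementation in `routes/cookbook_helpers.py` is not part of this diff. The following is one minimal sketch that would satisfy them. It is an assumption about the behavior, not the project's actual code, and it is given a different name to make that clear.

```python
import posixpath


def local_tooling_path_export(interpreter: str) -> str:
    """Build a shell line that prepends the interpreter's bin dir to PATH.

    Hypothetical stand-in for the real helper. Uses posixpath so the result
    is the same regardless of the OS running the tests.
    """
    bin_dir = posixpath.dirname(interpreter)
    # Double quotes keep spaces in the directory intact while still letting
    # the shell expand $PATH when the line is executed.
    return f'export PATH="{bin_dir}:$PATH"'


print(local_tooling_path_export("/opt/venv/bin/python"))
print(local_tooling_path_export("/Users/John Smith/.venv/bin/python3"))
```

Note that this sketch does not escape a `"`, `$` or backtick inside the directory name. A production helper would need to reject or escape such paths before emitting the line into a runner script.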
+129
@@ -0,0 +1,129 @@
"""macOS / Apple Silicon (Metal) support for Cookbook hardware-fit.
Covers the Metal-specific behavior added for Apple Silicon and locks in the
guarantee that non-macOS (Linux/Windows) detection is unchanged.
"""
from services.hwfit import hardware
from services.hwfit.fit import rank_models
from services.hwfit.models import get_models
def _metal_system(ram_gb=16.0, vram_gb=10.7):
return {
"has_gpu": True,
"backend": "metal",
"gpu_name": "Apple M2",
"gpu_vram_gb": vram_gb,
"gpu_count": 1,
"available_ram_gb": ram_gb * 0.7,
"total_ram_gb": ram_gb,
"unified_memory": True,
}
def _fake_sysctl(brand="Apple M2 Pro", memsize_gb=32, wired_mb=None):
def run(cmd):
joined = " ".join(cmd)
if "machdep.cpu.brand_string" in joined:
return brand
if "hw.memsize" in joined:
return str(int(memsize_gb * 1024**3))
if "iogpu.wired_limit_mb" in joined:
return str(wired_mb) if wired_mb is not None else None
return None
return run
def test_mlx_models_hidden_on_metal():
"""MLX-quantized models can't be served by llama.cpp or Ollama (the only
Metal-capable engines Odysseus generates), so they must never be recommended
on Apple Silicon even though the catalog tags them as Apple-only."""
results = rank_models(_metal_system(), limit=900)
mlx = [m for m in results if str(m.get("quant", "")).startswith("mlx-")]
assert mlx == [], f"MLX models surfaced but cannot be served: {[m['name'] for m in mlx]}"
def _cuda_system():
return {
"has_gpu": True, "backend": "cuda", "gpu_name": "NVIDIA RTX 4090",
"gpu_vram_gb": 24.0, "gpu_count": 1, "available_ram_gb": 32.0, "total_ram_gb": 64.0,
}
def test_mlx_hidden_on_cuda_backend_unchanged():
"""Regression guard: Linux/CUDA users never saw MLX before and still don't."""
mlx = [m for m in rank_models(_cuda_system(), limit=900) if str(m.get("quant", "")).startswith("mlx-")]
assert mlx == []
def test_only_gguf_models_recommended_on_metal():
"""llama.cpp and Ollama (the only Metal engines) need GGUF. Safetensors-only
repos (incl. vLLM-only AWQ/GPTQ/FP8) can't be served on Metal, so every
model recommended on Apple Silicon must ship a servable GGUF."""
catalog = {m["name"]: m for m in get_models()}
unservable = [
r["name"] for r in rank_models(_metal_system(), limit=900)
if not (catalog.get(r["name"], {}).get("is_gguf")
or catalog.get(r["name"], {}).get("gguf_sources"))
]
assert unservable == [], f"{len(unservable)} non-GGUF models on Metal, e.g. {unservable[:3]}"
def test_safetensors_models_still_recommended_on_cuda():
"""Regression guard: vLLM serves safetensors on CUDA, so non-GGUF repos must
NOT be filtered there; the GGUF-only rule is Metal-specific."""
names = {r["name"] for r in rank_models(_cuda_system(), limit=900)}
assert "microsoft/Phi-mini-MoE-instruct" in names
def test_apple_silicon_detected_as_metal(monkeypatch):
"""On local Apple Silicon, detection reports a Metal GPU with a RAM-scaled
unified-memory budget."""
monkeypatch.setattr(hardware, "_remote_host", None)
monkeypatch.setattr(hardware.platform, "system", lambda: "Darwin")
monkeypatch.setattr(hardware.platform, "machine", lambda: "arm64")
monkeypatch.setattr(hardware, "_run", _fake_sysctl(memsize_gb=32))
info = hardware._detect_apple_silicon()
assert info is not None
assert info["backend"] == "metal"
assert info["gpu_name"] == "Apple M2 Pro"
assert info["unified_memory"] is True
assert info["gpu_vram_gb"] == 24.0 # 32GB * 0.75
def test_apple_silicon_skipped_on_linux(monkeypatch):
"""Guarantee Linux detection is untouched: the Metal probe bails immediately."""
monkeypatch.setattr(hardware, "_remote_host", None)
monkeypatch.setattr(hardware.platform, "system", lambda: "Linux")
monkeypatch.setattr(hardware.platform, "machine", lambda: "x86_64")
monkeypatch.setattr(hardware, "_run", _fake_sysctl())
assert hardware._detect_apple_silicon() is None
def test_intel_mac_skipped(monkeypatch):
"""Intel Macs have no Metal GPU worth serving LLMs on — fall through to CPU."""
monkeypatch.setattr(hardware, "_remote_host", None)
monkeypatch.setattr(hardware.platform, "system", lambda: "Darwin")
monkeypatch.setattr(hardware.platform, "machine", lambda: "x86_64")
monkeypatch.setattr(hardware, "_run", _fake_sysctl())
assert hardware._detect_apple_silicon() is None
def test_detect_system_propagates_unified_memory(monkeypatch):
"""The unified_memory flag set by GPU detection must survive into the
system dict so the API and UI can report it (it was being dropped)."""
monkeypatch.setattr(hardware, "_detect_apple_silicon", lambda: {
"gpu_name": "Apple M4", "gpu_vram_gb": 10.7, "gpu_count": 1,
"gpus": [], "gpu_groups": [], "homogeneous": True,
"backend": "metal", "unified_memory": True,
})
monkeypatch.setattr(hardware, "_get_ram_gb", lambda: 16.0)
monkeypatch.setattr(hardware, "_get_available_ram_gb", lambda: 11.0)
monkeypatch.setattr(hardware, "_get_cpu_count", lambda: 10)
monkeypatch.setattr(hardware, "_get_cpu_name", lambda: "Apple M4")
s = hardware.detect_system(fresh=True)
assert s["backend"] == "metal"
assert s.get("unified_memory") is True