Mirror of https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-15 17:25:26 -04:00
00320972dc
Two bugs caused GPU inference to silently fall back to CPU inside the Odysseus Docker container even when the GPU was correctly passed through.

## entrypoint.sh: CUDA_HOME detection only covered CUDA 13.x wheels

The nvcc glob only searched nvidia/cu13, which matches the nvidia-nvcc-cu13 pip wheel layout. CUDA 12.x wheels install nvcc to nvidia/cuda_nvcc/bin/nvcc (nvidia-cuda-nvcc-cu12) or nvidia/cu12 (nvidia-nvcc-cu12), which are completely different paths. The glob found nothing, so CUDA_HOME was never set. Worse, VLLM_USE_FLASHINFER_SAMPLER=0 was inside the same if-block, so it was never set either. vLLM then tried to JIT-compile the FlashInfer sampler at startup, failed with 'Could not find nvcc', and crashed, even though the GPU was fully visible to the container.

Fix: expand the search to also check nvidia/cu12 and nvidia/cuda_nvcc. Move VLLM_USE_FLASHINFER_SAMPLER=0 to an unconditional export after the loop. It affects only the sampler, has no impact on the attention path, and is the correct setting for any container where CUDA headers may be incomplete.

## cookbook_routes.py: llama.cpp Linux source build silently fell back to CPU

The cmake invocation `cmake -B build -DGGML_CUDA=ON 2>/dev/null || cmake -B build 2>/dev/null` suppressed all configure errors. When nvcc is absent (the slim base image intentionally has no CUDA toolkit), the CUDA configure fails silently and the || fallback re-runs cmake without -DGGML_CUDA=ON, producing a CPU-only binary with no warning. Additionally, because the build directory was never removed (no rm -rf build), a stale CMakeCache.txt from the failed CUDA attempt was reused, poisoning the next configure run. The macOS branch already ran rm -rf build for exactly this reason; the Linux branch did not.

Fix: before running cmake, detect a pip-installed nvcc across the same three path patterns as entrypoint.sh and expose it via CUDA_HOME and PATH. If nvcc is found, run a clean CUDA build with full error visibility. If not, fall back to a CPU build with an explicit warning telling the user how to get a GPU build (install vLLM via Cookbook -> Dependencies, which brings in the CUDA wheels including nvcc, then re-launch).

## .env.example: document the Windows COMPOSE_FILE separator

Added a comment showing the semicolon separator required on Windows Docker Desktop alongside the existing colon-separated (Linux) example.
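The nvcc search and build selection described in this commit can be sketched in POSIX shell. This is an illustration, not the code from entrypoint.sh or cookbook_routes.py. It assumes the site-packages directory is passed as the first argument, that nvcc sits at `bin/nvcc` under each of the three wheel prefixes, and that CUDA_HOME is that prefix directory. The function names and the `CMAKE` override variable are hypothetical: `CMAKE` exists only so the branch logic can be exercised with `echo` instead of a real cmake.

```shell
# Hypothetical sketch: locate a pip-installed nvcc and pick a llama.cpp build.

# Print the first nvcc found under the three wheel layouts named above:
#   nvidia/cu13       (nvidia-nvcc-cu13)
#   nvidia/cu12       (nvidia-nvcc-cu12)
#   nvidia/cuda_nvcc  (nvidia-cuda-nvcc-cu12)
# Assumption: each layout keeps the binary at <prefix>/bin/nvcc.
find_pip_nvcc() {
    for prefix in "$1/nvidia/cu13" "$1/nvidia/cu12" "$1/nvidia/cuda_nvcc"; do
        if [ -x "$prefix/bin/nvcc" ]; then
            echo "$prefix/bin/nvcc"
            return 0
        fi
    done
    return 1
}

# Configure llama.cpp in the current directory. CMAKE defaults to the real
# cmake binary; it is an illustrative hook, not part of the source.
configure_llama_build() {
    cmake_cmd=${CMAKE:-cmake}
    # Always start clean so a stale CMakeCache.txt from a failed CUDA
    # attempt cannot poison the next configure run.
    rm -rf build
    if nvcc_path=$(find_pip_nvcc "$1"); then
        # Assumed: CUDA_HOME is the wheel prefix two levels above nvcc.
        CUDA_HOME=$(dirname "$(dirname "$nvcc_path")")
        PATH="$CUDA_HOME/bin:$PATH"
        export CUDA_HOME PATH
        # No 2>/dev/null: configure errors stay visible.
        $cmake_cmd -B build -DGGML_CUDA=ON
    else
        echo "WARNING: nvcc not found; building llama.cpp for CPU only." >&2
        echo "Install vLLM via Cookbook -> Dependencies, then re-launch for a GPU build." >&2
        $cmake_cmd -B build
    fi
}

# Sampler-only setting, exported unconditionally as in the fixed entrypoint.sh.
export VLLM_USE_FLASHINFER_SAMPLER=0
```

For example, running `configure_llama_build "$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"` from a llama.cpp checkout would configure a CUDA build only when one of the wheels provides nvcc.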
148 lines
5.8 KiB
Bash
# Odysseus UI — Environment Configuration
# Copy this file to .env and fill in your values.

# ============================================================
# LLM Configuration
# ============================================================

# Primary LLM host (default: localhost)
LLM_HOST=localhost

# Additional LLM hosts, comma-separated (for model discovery).
# Use hostnames/IPs only; Odysseus scans common serve ports, including Ollama's 11434.
# LLM_HOSTS=llm-host.local,backup-llm.local

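Expanding LLM_HOSTS into probe targets might look like the sketch below. Only 11434 comes from this file. The ports 8000 and 8080 are assumed examples of other common serve ports (the vLLM and llama.cpp server defaults), and the exact list Odysseus scans is not given here. The function name is hypothetical.

```shell
# Hypothetical sketch: expand LLM_HOSTS into host:port probe targets.
# Only 11434 (Ollama) comes from .env.example; 8000 and 8080 are assumed
# examples of other common serve ports.
PROBE_PORTS="11434 8000 8080"

llm_probe_targets() {
    # Split the comma-separated list without touching the global IFS.
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r host; do
        # Strip surrounding whitespace and skip empty entries.
        host=$(printf '%s' "$host" | tr -d '[:space:]')
        [ -n "$host" ] || continue
        for port in $PROBE_PORTS; do
            echo "$host:$port"
        done
    done
}
```

Each target could then be checked with something like `curl -fsS --max-time 2 "http://$target/v1/models"`. Ollama, vLLM, and llama.cpp's server all expose that endpoint, but whether Odysseus probes that exact path is not stated here.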
# Optional Ollama base URL. In Docker, Ollama running on the host is usually
# reachable at this URL if it was started with OLLAMA_HOST=0.0.0.0:11434.
# OLLAMA_BASE_URL=http://host.docker.internal:11434/v1

# OpenAI API key (only needed if using OpenAI models).
# Do not commit real keys. Keep this commented until needed.
# OPENAI_API_KEY=your_openai_api_key_here

# Research service LLM endpoint
# RESEARCH_LLM_ENDPOINT=http://localhost:8000/v1/chat/completions

# ============================================================
# Search & Web
# ============================================================

# SearXNG instance URL (self-hosted, for web search).
# Docker Compose overrides this to http://searxng:8080 for in-network access.
SEARXNG_INSTANCE=http://localhost:8080

# Optional SearXNG cookie/CSRF secret. If blank, Docker generates one on first boot
# and stores it in the searxng-data volume.
# SEARXNG_SECRET=

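If you would rather pin SEARXNG_SECRET yourself than rely on the generated one, any long random string should work. The sketch below uses only POSIX tools; `openssl rand -hex 32` gives an equivalent value where OpenSSL is installed. The helper names are hypothetical, and appending to `.env` is just one way to manage the file.

```shell
# Sketch: generate a 64-character hex secret for SEARXNG_SECRET.
gen_hex_secret() {
    # 32 random bytes -> hex. od prints spaced bytes over several lines,
    # so strip spaces and newlines to get a single token.
    head -c 32 /dev/urandom | od -A n -t x1 | tr -d ' \n'
    echo
}

# Append a secret only if the file has no uncommented SEARXNG_SECRET line yet,
# so an existing value is never duplicated or overwritten.
set_searxng_secret() {
    env_file=${1:-.env}
    if ! grep -q '^SEARXNG_SECRET=' "$env_file" 2>/dev/null; then
        echo "SEARXNG_SECRET=$(gen_hex_secret)" >> "$env_file"
    fi
}
```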
# ============================================================
# Database
# ============================================================

# SQLite database URL (default: sqlite:///./data/app.db)
# DATABASE_URL=sqlite:///./data/app.db

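The URL uses the common SQLAlchemy-style form: `sqlite:///` is followed by a relative path, and `sqlite:////` by an absolute one. That Odysseus parses it exactly this way is an assumption. The sketch below extracts the file path and creates its parent directory. The function names are hypothetical.

```shell
# Sketch: turn a sqlite:/// URL into a filesystem path (SQLAlchemy-style:
# sqlite:///relative/path, sqlite:////absolute/path).
sqlite_path_from_url() {
    case $1 in
        sqlite:///*) printf '%s\n' "${1#sqlite:///}" ;;
        *) echo "not a sqlite:/// URL: $1" >&2; return 1 ;;
    esac
}

# Create the database's parent directory, falling back to the documented default.
ensure_db_dir() {
    db_path=$(sqlite_path_from_url "${1:-sqlite:///./data/app.db}") || return 1
    mkdir -p "$(dirname "$db_path")"
}
```

Usage: `ensure_db_dir "$DATABASE_URL"`.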
# ============================================================
# Auth & Security
# ============================================================

# Enable authentication (default: true)
# AUTH_ENABLED=true

# Host bind address and port for the Odysseus web UI in Docker Compose.
# Keep APP_BIND on loopback unless you intentionally want LAN/reverse-proxy access.
# APP_BIND=127.0.0.1
# Change this if another local service already uses port 7000 (the macOS AirPlay Receiver often does).
# APP_PORT=7000

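Here is a small pre-flight check for the loopback advice above. It is a hypothetical helper, not something Odysseus ships. It assumes 127.0.0.1 is the effective default, as the commented example suggests, and treats 127.0.0.0/8, `::1`, and `localhost` as loopback.

```shell
# Sketch: warn when APP_BIND would expose the UI beyond this machine.
# Treats 127.0.0.0/8, ::1 and "localhost" as loopback.
is_loopback_bind() {
    case $1 in
        127.*|::1|localhost) return 0 ;;
        *) return 1 ;;
    esac
}

check_app_bind() {
    bind=${1:-127.0.0.1}
    if ! is_loopback_bind "$bind"; then
        echo "WARNING: APP_BIND=$bind exposes the web UI beyond loopback;" \
             "make sure that is intentional (LAN or reverse proxy)." >&2
    fi
}
```

To find out what already holds port 7000, run `lsof -nP -iTCP:7000 -sTCP:LISTEN`. It lists the listening process on macOS and Linux.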
# Development-only auth bypass for loopback requests.
# Keep false for Docker, LAN, reverse proxy, and any shared deployment.
# LOCALHOST_BYPASS=false

# Optional: pre-seed the first admin password during setup.
# Do not commit a real password.
# ODYSSEUS_ADMIN_PASSWORD=change_me_before_first_boot

# CORS allowed origins (default: localhost-only; restrict to your public origin in production)
# ALLOWED_ORIGINS=http://localhost:7000,http://localhost:8000

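CORS origins are compared by scheme, host, and port, so `http://localhost:7000` and `http://127.0.0.1:7000` count as different origins. The sketch below does an exact-match lookup against a comma-separated list. It assumes Odysseus splits ALLOWED_ORIGINS on commas with no surrounding spaces. The helper is hypothetical.

```shell
# Sketch: exact-match check of an Origin value against a comma-separated
# ALLOWED_ORIGINS list (scheme, host and port must all match).
origin_allowed() {
    origin=$1
    list=$2
    # Pad both ends with commas so every entry looks like ",entry,".
    # The quoted "$origin" is matched literally, not as a glob.
    case ",$list," in
        *",$origin,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For a public deployment, the list would hold only the public origin, for example `https://odysseus.example.com`. A trailing slash or a different port makes a string that no longer matches.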
# ============================================================
# ChromaDB (vector store)
# ============================================================

# ChromaDB service host and port.
# Manual host run: localhost:8100 when using `docker run -p 8100:8000 chromadb/chroma`.
# Docker Compose overrides these to chromadb:8000 for in-network access.
# CHROMADB_HOST=localhost
# CHROMADB_PORT=8100

# Docker Compose host-port bind addresses for bundled services.
# Defaults are loopback-only for safety. To expose ntfy only on Tailscale,
# set NTFY_BIND to your host's Tailscale IP and update NTFY_BASE_URL.
# CHROMADB_BIND=127.0.0.1
# NTFY_BIND=127.0.0.1
# NTFY_BASE_URL=http://localhost:8091
# Example (Tailscale):
# NTFY_BIND=100.x.y.z
# NTFY_BASE_URL=http://100.x.y.z:8091

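The ntfy base URL must name an address that clients can actually reach, so it should track NTFY_BIND. The hypothetical helper below derives one from the other, using port 8091 from the examples above. On a Tailscale host, `tailscale ip -4` prints the address to pass in.

```shell
# Sketch: keep NTFY_BASE_URL consistent with NTFY_BIND.
# Port 8091 comes from the examples above; the helper name is hypothetical.
ntfy_base_url() {
    bind=${1:-127.0.0.1}
    port=${2:-8091}
    if [ "$bind" = "127.0.0.1" ]; then
        host=localhost
    else
        host=$bind
    fi
    echo "http://$host:$port"
}
```

Usage: `NTFY_BIND=$(tailscale ip -4)` followed by `NTFY_BASE_URL=$(ntfy_base_url "$NTFY_BIND")`.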
# ============================================================
# RAG / Embeddings
# ============================================================

# Embedding API endpoint (OpenAI-compatible /v1/embeddings)
# Default: http://{LLM_HOST}:11434/v1/embeddings (Ollama)
# EMBEDDING_URL=http://localhost:11434/v1/embeddings

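The documented default can be written as nested shell parameter expansion. EMBEDDING_URL wins when set; otherwise the URL is built from LLM_HOST, which itself defaults to localhost. The smoke-test function sends a request in the OpenAI embeddings shape (`model` plus `input`). Its fallback model, all-minilm:l6-v2, is an assumption taken from the commented example below, not a documented default. Both helpers are hypothetical.

```shell
# Sketch of the documented default: EMBEDDING_URL falls back to
# http://{LLM_HOST}:11434/v1/embeddings, and LLM_HOST falls back to localhost.
resolve_embedding_url() {
    echo "${EMBEDDING_URL:-http://${LLM_HOST:-localhost}:11434/v1/embeddings}"
}

# Send one OpenAI-style embeddings request. The fallback model name is
# assumed from the commented EMBEDDING_MODEL example.
embedding_smoke_test() {
    curl -fsS "$(resolve_embedding_url)" \
        -H 'Content-Type: application/json' \
        -d "{\"model\": \"${EMBEDDING_MODEL:-all-minilm:l6-v2}\", \"input\": \"hello\"}"
}
```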
# Embedding model name (must be available at the endpoint above)
# EMBEDDING_MODEL=all-minilm:l6-v2

# Local fallback embedding model (used when no HTTP embedding API is available).
# Uses fastembed (ONNX); downloads the model on first run (~50 MB).
# FASTEMBED_MODEL=sentence-transformers/all-MiniLM-L6-v2
# Model cache directory (defaults to ~/.cache/fastembed)
# FASTEMBED_CACHE_PATH=

# ============================================================
# Misc
# ============================================================

# Cleanup interval in hours (default: 24)
# CLEANUP_INTERVAL_HOURS=24

# In-process email pollers (default: on). Set to 0 if you drive
# polling from cron / systemd via `scripts/odysseus-mail poll-scheduled`
# and `scripts/odysseus-mail poll-summary`; otherwise both schedulers
# race on the same SQLite database.
# ODYSSEUS_INPROCESS_POLLERS=1

# In-process scheduled-task runner (default: on). Set to 0 to let an
# external driver fire scheduled tasks. Calendar reminders are
# frontend-driven (the browser polls /api/notes), so no gate is
# needed there.
# ODYSSEUS_INPROCESS_TASKS=1

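Both toggles are described as on by default and off when set to 0. The sketch below implements that rule. This file does not say whether Odysseus also accepts spellings such as `false` or `no`, so this version treats only `0` as off. The helper name is hypothetical.

```shell
# Sketch: unset or empty means on, "0" means off; any other value is on.
inprocess_enabled() {
    [ "${1:-1}" != "0" ]
}

# Report which side drives mail polling for the current environment.
report_pollers() {
    if inprocess_enabled "${ODYSSEUS_INPROCESS_POLLERS:-}"; then
        echo "in-process"
    else
        echo "external"
    fi
}
```

With ODYSSEUS_INPROCESS_POLLERS=0, an external scheduler takes over. For instance, a crontab line such as `*/5 * * * * cd /path/to/odysseus && scripts/odysseus-mail poll-scheduled` would do this (the interval and path are placeholders).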
# Host used by the built-in "run_script" scheduled-task action.
# Empty/local/localhost runs scripts on the app host. Set to an SSH host alias
# if you intentionally want scheduled scripts to run remotely.
# ODYSSEUS_SCRIPT_HOST=localhost

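The host rule above can be sketched as a small dispatcher. This file does not show how Odysseus actually invokes the script (shell, working directory, ssh options), so treat the function and its `SSH_CMD` override as hypothetical. Note that ssh joins the remaining arguments into a single remote command line, so arguments containing spaces need extra quoting.

```shell
# Sketch of the documented rule: empty, "local" or "localhost" runs on the
# app host; anything else is treated as an SSH host alias.
# SSH_CMD is a hypothetical override hook (defaults to ssh).
run_scheduled_script() {
    target_host=$1
    shift
    case $target_host in
        ""|local|localhost) "$@" ;;
        *) ${SSH_CMD:-ssh} "$target_host" "$@" ;;
    esac
}
```

Usage: `run_scheduled_script "$ODYSSEUS_SCRIPT_HOST" ./backup.sh`, where `./backup.sh` is a placeholder.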
# ============================================================
# GPU support (Docker Compose)
# ============================================================
# Pass the host GPU into the odysseus container. Default (unset) = CPU.
# COMPOSE_FILE is a native `docker compose` feature: a list of files merged
# left-to-right, separated by colons on Linux/macOS and by semicolons on
# Windows. Pick ONE GPU line below, or leave all commented for CPU.
#
# NVIDIA (requires nvidia-container-toolkit + `nvidia-ctk runtime
# configure --runtime=docker` on the host):
# COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml
# NVIDIA on Windows Docker Desktop (semicolon separator):
# COMPOSE_FILE=docker-compose.yml;docker/gpu.nvidia.yml
#
# AMD ROCm (requires ROCm drivers on the host):
# COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml
#
# These overlays only expose the GPU devices. The slim Odysseus image
# still needs CUDA/ROCm userspace via Cookbook -> Dependencies (vLLM,
# llama-cpp-python, etc.) before models can actually serve on GPU.
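Setting COMPOSE_FILE is equivalent to passing each file with `-f`, in the same order. Docker reads the separator from COMPOSE_PATH_SEPARATOR, which defaults to `:` on Linux and macOS and `;` on Windows. The hypothetical helper below expands the value into explicit flags, which is handy for checking which overlays will be merged.

```shell
# Sketch: expand COMPOSE_FILE into the equivalent `docker compose -f` flags.
# Uses COMPOSE_PATH_SEPARATOR when set, else ':' (the Linux/macOS default).
compose_file_flags() (
    # Runs in a subshell, so the IFS and `set -f` changes do not leak.
    IFS=${COMPOSE_PATH_SEPARATOR:-:}
    set -f                       # no glob expansion while splitting
    flags=""
    for f in $1; do
        [ -n "$f" ] || continue  # skip empty entries such as "a::b"
        flags="$flags -f $f"
    done
    printf '%s\n' "${flags# }"
)
```

For example, `docker compose $(compose_file_flags "$COMPOSE_FILE") config --services` lists the services after the overlays are merged. The unquoted expansion relies on word splitting, so it only works for paths without spaces.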