11 Commits

Author SHA1 Message Date
pewdiepie-archdaemon d9ebdd6fbb Refresh README presentation 2026-06-15 23:24:41 +09:00
pewdiepie-archdaemon b118c33e37 test(provider): align lookalike-host URL expectations with /models behavior
build_models_url returns /models (no /v1 prefix) for non-local generic
OpenAI-compatible hosts (intentional, see endpoint_resolver.py:206). The
tests added in #4272 expected /v1/models, which is the local/deepseek
behavior. Match production semantics.
2026-06-15 23:21:49 +09:00
pewdiepie-archdaemon da74cc23e4 Merge remote-tracking branch 'origin/dev' 2026-06-15 23:13:18 +09:00
Ashvin d792b61722 test(gallery): point delete-ordering tests at the tmp image dir (#4300)
The two delete-ordering tests did monkeypatch.chdir(tmp_path) and wrote the
image under tmp_path/data/generated_images, but DATA_DIR (and therefore
gallery_routes.GALLERY_IMAGE_DIR) is always an absolute path, so the delete
resolver pointed at the repo's real data dir and ignored the chdir.

test_file_removed_on_successful_delete therefore failed on dev (the file at
the tmp path was never the one being removed), and test_file_kept_when_commit_fails
passed only by accident. Set GALLERY_IMAGE_DIR to the seeded tmp dir via
monkeypatch so both tests exercise the real path and pass deterministically.
2026-06-15 14:07:49 +00:00
pewdiepie-archdaemon 1faadf7e10 Merge remote-tracking branch 'origin/dev' 2026-06-15 23:02:46 +09:00
Kenny Van de Maele e87b44126c test(hwfit): fix non-Apple guard to assert the Apple matcher (unblocks pytest gate) (#4303)
* test(hwfit): assert the Apple matcher, not the general lookup, in the non-Apple guard

f7aa2de (#2564) added test_non_apple_gpu_with_cores_does_not_match, which
asserts _lookup_bandwidth(RTX 4090) is None. But '4090': 1008 has been in
the general GPU_BANDWIDTH table since v1.0, so _lookup_bandwidth correctly
returns the card's real bandwidth and the test fails (expected None, got
1008) - reddening the required pytest gate on dev and, by inheritance,
every open PR.

The guard's actual intent is that the Apple-specific bandwidth path does
not false-match a non-Apple card that carries a gpu_cores count. Point
the two asserts at _lookup_apple_bandwidth, which returns None for any
name without 'apple' regardless of the general table. The general-lookup
behavior (4090 -> 1008) is correct and untouched.

* fix(hwfit): route string GPU names through the Apple bandwidth helper

Second half of the #2564 regression (RaresKeY review on #4303). That
change moved the Apple tiers out of the generic GPU_BANDWIDTH table into
the dict-only _lookup_apple_bandwidth, but _lookup_bandwidth only called
that helper for dict inputs. A bare-string caller like
_lookup_bandwidth("Apple M3 Max") therefore fell through to the generic
table, found no Apple key, and returned None instead of the conservative
tier. Route both dict and string inputs through the Apple helper (a
string carries no gpu_cores, so it gets the model's lowest tier).
Regression added for the string path plus a non-Apple string control.
2026-06-15 14:01:05 +00:00
pewdiepie-archdaemon 62476ddb55 Merge remote-tracking branch 'origin/dev' 2026-06-15 22:59:57 +09:00
pewdiepie-archdaemon e899817969 Remove duplicate CodeQL workflow 2026-06-15 22:53:29 +09:00
pewdiepie-archdaemon 1cc9a003fd Fix failing post-merge tests 2026-06-15 22:49:06 +09:00
Ahmad Naalweh f7aa2de410 fix(hwfit): distinguish Apple Silicon bandwidth variants (#2564)
* fix: resolve Apple Silicon bandwidth variants

* fix(hwfit): preserve string lookup path in _lookup_bandwidth

* fix(hwfit): guard Apple bandwidth lookup against false GPU matches

Add "apple" not in gn check to _lookup_apple_bandwidth() so that
non-Apple GPUs with "m3"/"m4"/"m5" in their names (e.g. NVIDIA
Quadro M4 000) don't incorrectly match Apple bandwidth tiers.

Addresses @o3LL review comment on PR #2564.
2026-06-15 15:13:03 +02:00
Ashvin 514d345334 test(models): pin lookalike hosts to the generic OpenAI branch (#4272)
#4159 (4b0a977) made build_models_url insert /v1 for path-less bases, so
the TestBuildersRejectLookalikeHosts model assertions that expected
/models started failing and turned the pytest gate red on dev.

Both the generic OpenAI branch and the real Anthropic branch now end in
/v1/models, so a URL-only assertion no longer proves a lookalike host
dodged the Anthropic/Ollama branch. Assert _detect_provider == "openai"
directly and keep the /v1/models expectation.
2026-06-15 12:43:33 +00:00
15 changed files with 704 additions and 553 deletions
-61
View File
@@ -1,61 +0,0 @@
# CodeQL code scanning
#
# Purpose: GitHub's own static analysis engine reads the application source
# (Python backend + the JavaScript frontend) and looks for real
# vulnerabilities -- SQL/command injection, path traversal, auth mistakes,
# unsafe deserialization. Findings appear in the repo's Security tab. This is
# the deepest check in the suite and the most valuable for a high-profile
# target.
#
# It runs on every push to main and on a weekly schedule (to catch newly
# disclosed query patterns against unchanged code). It deliberately does NOT
# run on pull requests: most PRs here come from forks, whose read-only token
# cannot publish results, which would produce confusing failures. To scan pull
# requests too, a maintainer can instead enable CodeQL "default setup" in
# Settings -> Security -> Code scanning (one toggle, no file needed) -- see
# docs/security-ci.md.
name: CodeQL
on:
push:
branches: [main]
schedule:
# Weekly, Monday 06:00 UTC.
- cron: '0 6 * * 1'
workflow_dispatch:
permissions: {}
concurrency:
group: codeql-${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
analyze:
name: Analyze (${{ matrix.language }})
runs-on: ubuntu-latest
permissions:
contents: read
security-events: write # publish results to the Security tab
strategy:
fail-fast: false
matrix:
# Both are interpreted, so CodeQL needs no build step (build-mode none).
language: [python, javascript-typescript]
steps:
- name: Checkout repository
uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
with:
persist-credentials: false
- name: Initialize CodeQL
uses: github/codeql-action/init@8aad20d150bbac5944a9f9d289da16a4b0d87c1e # v4.36.2
with:
languages: ${{ matrix.language }}
build-mode: none
- name: Perform CodeQL analysis
uses: github/codeql-action/analyze@8aad20d150bbac5944a9f9d289da16a4b0d87c1e # v4.36.2
with:
category: "/language:${{ matrix.language }}"
+38 -463
View File
@@ -1,476 +1,65 @@
# Odysseus <p align="center">
<img src="docs/odysseus-wordmark.png" alt="Odysseus" width="280">
</p>
> **Branch note:** `dev` is the default branch and contains the latest development changes, but it may be unstable. For the more stable curated branch, use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main). <p align="center">
A self-hosted AI workspace for chat, agents, research, documents, email, notes, calendar, and local model workflows.
</p>
``` <p align="center">
─────────────────────────────────────────────── <a href="#quick-start">Quick Start</a> ·
⊹ ࣪ ˖ ૮( ˶ᵔ ᵕ ᵔ˶ )っ Odysseus vers. 1.0 <a href="docs/setup.md">Setup Guide</a> ·
─────────────────────────────────────────────── <a href="CONTRIBUTING.md">Contributing</a> ·
``` <a href="ROADMAP.md">Roadmap</a>
</p>
![Odysseus](docs/odysseus.jpg) <p align="center">
<a href="https://repology.org/project/odysseus-ai/versions"><img src="https://repology.org/badge/vertical-allrepos/odysseus-ai.svg" alt="Packaging status"></a>
</p>
A self-hosted AI workspace -- meant to be the self-hosted version of the UI experience you get from ChatGPT and Claude. But with more jank and fun. Running on your own hardware, with your own data -- local-first, privacy-first, and no trojan. <p align="center">
<img src="docs/odysseus.jpg" alt="Odysseus interface">
</p>
[![Packaging status](https://repology.org/badge/vertical-allrepos/odysseus-ai.svg)](https://repology.org/project/odysseus-ai/versions) ---
## Features
- **Chat** -- chat with any local model or API; adding them is super simple.<br> <sub>vLLM · llama.cpp · Ollama · OpenRouter · OpenAI · GitHub Copilot</sub>
- **Agent** -- hand it tools and let it run the whole task itself.<br> <sub>built on [opencode](https://github.com/anomalyco/opencode) · MCP · web · files · shell · skills · memory</sub>
- **Cookbook** -- Scans your hardware, recommends models, click to download and serve.. easy!<br> <sub>built on [llmfit](https://github.com/AlexsJones/llmfit) · VRAM-aware · GGUF / FP8 / AWQ · fit scoring · vLLM / llama.cpp serving</sub>
- **Deep Research** -- multi-step runs that gather, read, and synthesize sources into a nice visual report.<br> <sub>adapted from [Tongyi DeepResearch](https://github.com/Alibaba-NLP/DeepResearch)</sub>
- **Compare** -- a fun tool to compare models side by side. Test completely blind, no bias!<br> <sub>multi-model · blind test · synthesis</sub>
- **Documents** -- YOU write the text, AI is there to assist, not the opposite.<br> <sub>multi-tab editor · markdown · HTML · CSV · syntax highlighting · AI edits · suggestions</sub>
- **Memory / Skills** -- Persistent memory and skills, your agent evolves over time as it better understands you and your tasks!<br> <sub>ChromaDB · fastembed (ONNX) · vector + keyword retrieval · import/export</sub>
- **Email** -- IMAP/SMTP inbox with AI triage built in: urgency reminders, auto-tag, auto-summary, auto-reply drafts, auto-spam.<br> <sub>IMAP · SMTP · per-account routing · CalDAV-aware</sub>
- **Notes & Tasks** -- Quick notes with reminders, a todo list, and scheduled tasks the agent can act on.<br> <sub>note pings · checklist · cron-style tasks · ntfy / browser / email channels</sub>
- **Calendar** -- Local-first calendar with CalDAV sync to Radicale / Nextcloud / Apple / Fastmail.<br> <sub>CalDAV pull · .ics import/export · per-calendar colors · agent-aware</sub>
- **Works on mobile** -- looks and runs great on your phone, not just desktop.<br> <sub>responsive · installable (PWA) · touch gestures</sub>
- **Extras** -- more to explore, happy if you give it a go!<br> <sub>image editor · theme editor · file uploads (vision + PDF) · web search · presets · sessions · 2FA</sub>
## Demo
A full, hover-to-play tour lives on the landing page (`docs/index.html`).
<details>
<summary>Screenshots / clips</summary>
### Chat & Agents
![Chat & Agents](docs/chat.gif)
### Deep Research
![Deep Research](docs/research.gif)
### Compare
![Compare](docs/compare.gif)
### Documents
![Documents](docs/document.gif)
### Notes & Tasks
![Notes & Tasks](docs/notes.gif)
</details>
## Quick Start ## Quick Start
Defaults work out of the box: clone, run, then configure models/search/email > `dev` is the default branch and gets the newest changes first. Use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main) if you want the more curated branch.
inside **Settings**. Only edit `.env` for deployment-level overrides like
`APP_BIND`, `APP_PORT`, `AUTH_ENABLED`, `DATABASE_URL`, or a pre-seeded admin password.
On first setup, Odysseus creates an admin account (`admin` unless
`ODYSSEUS_ADMIN_USER` is set) and prints a temporary password in the terminal.
For Docker installs, the same line is in `docker compose logs odysseus`.
Use that for the first login, then change it in **Settings**.
Contributing? See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, testing, and
pull request guidelines.
### Docker (recommended)
```bash ```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus cd odysseus
cp .env.example .env # optional, but recommended for explicit defaults cp .env.example .env
docker compose up -d --build docker compose up -d --build
``` ```
To include optional extras in the image (PDF viewer, Office extraction; includes AGPL PyMuPDF), build with `docker compose build --build-arg INSTALL_OPTIONAL=true` before `up`.
Open `http://localhost:7000` when the containers are healthy. Docker Compose Open `http://localhost:7000` when the containers are healthy. The first admin password is printed in `docker compose logs odysseus`.
binds the web UI to `127.0.0.1` by default. If the port is taken, set
`APP_PORT=7001` in `.env` and recreate the container. Set `APP_BIND=0.0.0.0`
only when you intentionally want LAN/reverse-proxy access.
> **On Apple Silicon (M-series) Macs:** Docker can't reach the Metal GPU, so Native installs, GPU notes, Windows/macOS instructions, HTTPS, and configuration live in the [setup guide](docs/setup.md).
> Cookbook serves local models on CPU only. For GPU-accelerated model serving,
> run natively instead — see [Apple Silicon](#apple-silicon) below.
### Native Linux / macOS ## Features
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
Requirements: Python 3.11+. Cookbook also needs `tmux` for background model
downloads and serves. The app itself is lightweight; local model serving is the
heavy part and depends on the model, runtime, GPU, and VRAM, so small hosts can
connect to API or remote model servers instead. Use `--host 0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
### Apple Silicon - **Chat + Agents** — local/API models, tools, MCP, files, shell, skills, and memory.
Docker on macOS cannot use the Metal GPU. For GPU-accelerated Cookbook on an - **Cookbook** — hardware-aware model recommendations, downloads, and serving.
M-series Mac, run Odysseus natively: - **Deep Research** — multi-step web research with source reading and report generation.
- **Compare** — blind side-by-side model testing and synthesis.
- **Documents** — writing-first editor with AI edits, suggestions, Markdown, HTML, CSV, and syntax highlighting.
- **Email** — IMAP/SMTP inbox with triage, tags, summaries, reminders, and reply drafts.
- **Notes, Tasks + Calendar** — reminders, todos, scheduled agent tasks, and CalDAV sync.
- **Extras** — gallery/image editor, themes, uploads, web search, presets, sessions, and 2FA.
```bash ## Demo
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
./start-macos.sh
```
It launches at `http://127.0.0.1:7860`. To expose it to your phone over a trusted LAN/VPN such as Tailscale, bind all interfaces: A full hover-to-play tour lives on the landing page: [`docs/index.html`](docs/index.html).
```bash
ODYSSEUS_HOST=0.0.0.0 ./start-macos.sh
# then open http://<tailscale-ip>:7860
```
The script also reads `.env` at startup, so `APP_BIND=0.0.0.0` and `APP_PORT`
set there are picked up automatically without a command-line override each run.
Keep `AUTH_ENABLED=true` (the default) before binding outside loopback. Do not
expose this port directly to the public internet. To build a clickable app wrapper:
```bash
./build-macos-app.sh
```
<details>
<summary>Cookbook, GPU, Ollama, and troubleshooting notes</summary>
**Docker bundled services.** Compose starts Odysseus, ChromaDB, SearXNG, and
ntfy. Odysseus and the bundled service ports bind to `127.0.0.1` by default, so
they are reachable from the host but not exposed to your LAN/public internet
unless you opt in.
**Cookbook storage in Docker.** Downloads live in `./data/huggingface`
(`~/.cache/huggingface` in the container). Cookbook-installed Python CLIs and
serve engines live in `./data/local` (`~/.local` in the container), so they
survive container recreation.
**Remote servers.** In **Cookbook -> Settings -> Servers**, generate the
Odysseus SSH key and add the public key to the remote server's
`~/.ssh/authorized_keys`. From the host you can also run:
```bash
ssh-copy-id -i data/ssh/id_ed25519.pub user@server
```
**Docker GPU overlays.** CPU-only users can skip this section. Cookbook can
only detect GPUs that Docker exposes to the container — if the host runtime or
device passthrough is not configured, Cookbook sees the iGPU, another card, or
CPU instead of your intended GPU.
For NVIDIA, `scripts/check-docker-gpu.sh` diagnoses GPU passthrough and can
optionally install the host runtime or update `.env`.
```bash
# Read-only diagnostic (default — installs nothing, never edits .env):
scripts/check-docker-gpu.sh
# Print OS-specific install commands without running them:
scripts/check-docker-gpu.sh --print-install-commands
# Install NVIDIA Container Toolkit on Ubuntu/Debian (requires sudo):
scripts/check-docker-gpu.sh --install-nvidia-toolkit
# Write COMPOSE_FILE to .env (only when GPU passthrough is confirmed working):
scripts/check-docker-gpu.sh --enable-nvidia-overlay
# Full assisted setup — install toolkit, then enable overlay if passthrough works:
scripts/check-docker-gpu.sh --install-nvidia-toolkit --enable-nvidia-overlay
```
Safety notes:
- The app never installs host GPU runtime automatically.
- The app never edits `.env` automatically.
- `.env` is only modified when `--enable-nvidia-overlay` is explicitly passed,
and only after GPU passthrough succeeds. `--yes` skips prompts but does not
bypass the passthrough gate.
- `.env.bak.*` backups created by `--enable-nvidia-overlay` are ignored by
Git and the Docker build context.
To enable manually without the script, add this to `.env`:
```bash
COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml
```
**AMD / ROCm.** AMD setup is read-only diagnostic plus manual `.env` edit. Run:
```bash
scripts/check-docker-amd-gpu.sh
```
Then add the reported values to `.env`, replacing `RENDER_GID` with your host's
numeric render group id:
```bash
COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml
RENDER_GID=989
```
For NVIDIA/AMD GPU support, also read the comments in the selected overlay file: docker/gpu.nvidia.yml or docker/gpu.amd.yml.
**Stack-management UIs (Portainer, Coolify, Dockhand, etc.).** These tools
often accept only a single Compose file and do not reliably honor `COMPOSE_FILE`
or multiple `-f` overlays. CLI users should keep using the `COMPOSE_FILE`
overlay workflow above. For stack UIs, point the stack at one of the standalone
files instead, which bundle the base stack plus the GPU settings:
- `docker-compose.gpu-nvidia.yml` — still requires the NVIDIA Container Toolkit
on the host.
- `docker-compose.gpu-amd.yml` — still requires host ROCm/kfd/DRI setup, the
`video`/`render` group membership, and `RENDER_GID` when needed.
The base `docker-compose.yml` plus the `docker/gpu.*.yml` overlays remain the
source of truth; the standalone files mirror them for single-file deployments.
Verify after enabling either overlay:
```bash
docker compose exec odysseus nvidia-smi -L # NVIDIA
docker compose exec odysseus sh -lc 'test -e /dev/kfd && test -d /dev/dri && ls -l /dev/kfd /dev/dri/renderD*' # AMD
```
> **GPU passthrough ≠ llama.cpp CUDA.** `nvidia-smi` passing inside the
> container confirms Docker GPU access, but llama.cpp also needs `cudart` and
> the CUDA Toolkit at runtime. If Cookbook logs show `Unable to find cudart
> library`, `Could NOT find CUDAToolkit`, `CUDA Toolkit not found`, or
> tensors/layers assigned to CPU, that is a Cookbook/llama.cpp build issue —
> not a Docker passthrough failure. Reinstall the serve engine via
> **Cookbook → Dependencies** to get a CUDA-enabled build.
>
> The same split applies to AMD/ROCm: seeing `/dev/kfd` and `/dev/dri` inside
> the container confirms device passthrough, not ROCm userspace or a
> ROCm-enabled vLLM/llama.cpp build. `rocm-smi` and `rocminfo` are not expected
> inside the slim Odysseus image.
**Ollama with Docker.** If Ollama runs on the host, add this endpoint in
Settings:
```text
http://host.docker.internal:11434/v1
```
Ollama must listen outside its own loopback interface:
```bash
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
This connects Odysseus in Docker to an Ollama server that is already running on
your host machine; it does not start Ollama inside the container.
`host.docker.internal` is Docker's hostname for the host machine from inside the
container. Cookbook **Serve** is a separate workflow for serving downloaded
models through Odysseus/llama.cpp, so Windows users with an existing Ollama
install usually only need to add the endpoint in Settings.
**Useful checks.**
```bash
docker compose ps
docker compose logs --tail=120 odysseus
docker compose logs odysseus | grep -E 'ChromaDB|MemoryVectorStore|DEGRADED'
```
**macOS details.** `start-macos.sh` installs Homebrew deps, creates the venv,
runs setup, and starts uvicorn on port `7860` because AirPlay often holds
`7000`. It uses llama.cpp/Ollama for Metal. vLLM/SGLang are CUDA/ROCm-only and
do not run on macOS. MLX-only models are not served by Odysseus.
</details>
### Native Windows
**One-command launcher** (creates the venv, installs deps, runs setup, starts the
server; safe to re-run):
```powershell
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1
```
Or do it by hand:
```powershell
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
py -3.11 -m venv venv
venv\Scripts\Activate.ps1
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
If `python` points at an older interpreter, use `py -3.12` (or another installed
3.11+ version) for the venv step.
**Requirements:** Python 3.11+. The core app (chat, agent, memory, documents,
email, calendar, deep research) runs fully native. For full **Cookbook** background
model downloads and the agent shell tool, also install
[Git for Windows](https://git-scm.com/download/win) (provides `bash.exe`).
Local GPU *serving* of vLLM/SGLang needs Linux/WSL2; for a local model on Windows,
[Ollama](https://ollama.com/download) is the easiest path — point Odysseus at
`http://localhost:11434/v1` in Settings.
Open `http://localhost:7000`, log in with the generated admin password,
and configure everything else inside **Settings**.
## Troubleshooting & Advanced Setup
### `chromadb-client` conflicts with embedded ChromaDB
If `chromadb-client` (the lightweight HTTP-only package) is installed alongside the full `chromadb` package, Odysseus starts but ChromaDB silently falls back to HTTP-only mode and fails.
**Fix:** uninstall `chromadb-client` and force-reinstall the full package:
```bash
./venv/bin/pip uninstall chromadb-client -y
./venv/bin/pip install --force-reinstall chromadb
```
### HTTPS + LAN/Tailscale exposure
To expose Odysseus on a local network or Tailscale with HTTPS:
1. Change the bind address to `0.0.0.0` in `.env` (`APP_BIND=0.0.0.0` or `ODYSSEUS_HOST=0.0.0.0`).
2. Generate a locally-trusted cert for your LAN/Tailscale IPs using [mkcert](https://github.com/FiloSottile/mkcert):
```bash
mkcert -install
mkcert -cert-file cert.pem -key-file key.pem 192.168.1.100 tailscale-ip
```
3. Run `uvicorn` with the generated certs:
```bash
python -m uvicorn app:app --host 0.0.0.0 --port 7000 --ssl-certfile=cert.pem --ssl-keyfile=key.pem
```
4. Install the `mkcert` CA on any other device you want to access Odysseus from (e.g., for iOS, email the `rootCA.pem` to yourself, install the profile, and trust it in Certificate Trust Settings).
### Optional Dependencies
`requirements-optional.txt` contains packages that unlock extra features. It is not installed by default.
| Package | Feature unlocked |
|---------|-----------------|
| `faster-whisper` | Local speech-to-text (microphone -> text) via the "local" STT provider. |
| `ddgs` | DuckDuckGo as a search provider option. |
| `PyMuPDF` | PDF page rendering in the side viewer panel and form-filling. (Note: AGPL-3.0) |
| `markitdown` | Office/EPUB document text extraction (converts .docx/.xlsx/.pptx/.xls/.epub to Markdown). |
### Faster, reproducible installs with uv (optional)
[uv](https://docs.astral.sh/uv/) works as a drop-in replacement for the
venv + pip steps in the native install guides, no project changes are needed but this change results in faster installs along with a lockfile for reproducible environments. After [installing `uv`](https://docs.astral.sh/uv/getting-started/installation/), use:
```bash
uv venv venv --python 3.13
uv pip install -r requirements.txt
# then continue as usual: python setup.py, uvicorn, ...
```
`requirements.txt` is intentionally unpinned, so two installs at different times can produce different package versions. If you want a reproducible environment (e.g. across your own machines, or to roll back after a bad upgrade), snapshot and restore exact versions with:
```bash
uv pip compile requirements.txt -o requirements.lock # snapshot current resolution
uv pip sync requirements.lock # reproduce it exactly later
```
`requirements.lock` is gitignored and platform-specific (compile it on the OS you deploy to). Regenerate it deliberately when you want to take upgrades. The plain `uv pip install -r requirements.txt` keeps following the unpinned requirements like pip does.
### Outlook / Office 365 email
Odysseus email accounts currently use IMAP/SMTP username-password auth. Outlook
and Microsoft 365 generally require OAuth instead, so normal Microsoft mailbox
passwords will fail. See [docs/email-outlook.md](docs/email-outlook.md) for the
current limitation and the planned integration direction.
## Security Notes
Odysseus is a self-hosted workspace with powerful local tools: shell access, file uploads, model downloads, web research, email/calendar integrations, and API tokens. Treat it like an admin console.
- Keep `AUTH_ENABLED=true` for any network-accessible deployment.
- Keep `LOCALHOST_BYPASS=false` outside local development.
- Use `SECURE_COOKIES=true` when Odysseus is served through HTTPS by a trusted reverse proxy or private access gateway.
- Do not expose it directly to the public internet without HTTPS and a trusted reverse proxy or private access layer.
- Keep `.env`, `data/`, `logs/`, databases, uploads, generated media, backups, auth/session files, API keys, and model/provider tokens out of Git and private shares. They are ignored by default.
- Review `data/auth.json` after first boot: disable open signup unless you intentionally want it, make only your own account admin, and keep demo/test accounts non-admin.
- Non-admin users do not get shell/Python/file read/write by default, and admin-only routes/tools such as MCP management, API tokens, webhooks, model/cookbook serving, backup/vault, and app settings are admin-gated. Other features are controlled by per-user privileges, so review each user's privileges before exposing a deployment.
- Rotate any API keys or tokens that were ever pasted into a shared chat, demo, screenshot, or log.
- If you enable API tokens or webhooks, create separate tokens per integration and delete unused ones.
- Prefer binding manual development runs to `127.0.0.1`; bind to `0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
- Keep ChromaDB, SearXNG, ntfy, Ollama, vLLM, llama.cpp, databases, and raw model/provider APIs internal-only. Expose only the authenticated Odysseus web/API entrypoint through your trusted proxy or private access layer.
- Before publishing a fork, run `git status --short` and confirm no private files from `.env`, `data/`, `logs/`, uploads, backups, or local databases are staged.
### Private or proxied deployments
Odysseus serves plain HTTP on its app port. Docker Compose binds Odysseus and the bundled services to `127.0.0.1` by default, so a typical production/private setup is:
1. Keep Odysseus on localhost, for example `127.0.0.1:7000`.
2. Terminate HTTPS at a trusted reverse proxy or private access gateway.
3. Put the authenticated Odysseus web/API entrypoint behind that layer.
4. Keep raw service and model ports internal-only.
Cloudflare Access, Tailscale, Caddy, nginx, and Traefik can all fit this pattern; none are required by Odysseus. If your access layer reaches Odysseus on the same host, proxy to `http://127.0.0.1:7000` and keep `AUTH_ENABLED=true`, `LOCALHOST_BYPASS=false`, and `SECURE_COOKIES=true`.
`ALLOWED_ORIGINS` lists exact permitted origins for cross-origin browser/API clients; ordinary same-origin reverse-proxy access usually does not need a special CORS entry.
Common internal-only ports from the default docs/compose setup:
| Port | Service |
|---|---|
| `7000` | Odysseus raw app port |
| `8080` | SearXNG |
| `8091` | ntfy |
| `8100` | ChromaDB host port for manual/compose access |
| `11434` | Ollama |
| `8000-8020` | Common local model/provider APIs |
## Contributing ## Contributing
Help is welcome. The best entry points are fresh-install testing, provider setup
bugs, mobile/editor polish, docs, and small focused refactors. See
[ROADMAP.md](ROADMAP.md) for the current help-wanted list.
## Configuration Help is welcome. The best entry points are fresh-install testing, provider setup bugs, mobile/editor polish, docs, and small focused refactors. See [CONTRIBUTING.md](CONTRIBUTING.md) and [ROADMAP.md](ROADMAP.md).
Most setup is done inside the app with `/setup` or **Settings**. Use `.env`
for deployment-level defaults and secrets you want present before first boot.
Key settings:
| Variable | Default | Description | ## Security
|---|---|---|
| `LLM_HOST` | `localhost` | Your LLM server (e.g. `llm-host.local:8000`) |
| `LLM_HOSTS` | -- | Comma-separated list for model discovery |
| `OPENAI_API_KEY` | -- | Optional OpenAI key. Prefer adding providers in the app unless pre-seeding. |
| `SEARXNG_INSTANCE` | `http://localhost:8080` | SearXNG URL. Docker overrides this to `http://searxng:8080`. |
| `SEARXNG_SECRET` | generated on first Docker boot | Optional SearXNG cookie/CSRF secret. Leave blank unless you need to pin it. |
| `APP_BIND` | `127.0.0.1` | Docker Compose host bind address for the web UI. Use `0.0.0.0` only for intentional LAN/reverse-proxy access. |
| `APP_PORT` | `7000` | Docker Compose host port for the web UI. |
| `APP_DATA_DIR` | `./data` | Docker Compose host directory for application data volumes. |
| `APP_LOGS_DIR` | `./logs` | Docker Compose host directory for application logs. |
| `AUTH_ENABLED` | `true` | Enable/disable login |
| `LOCALHOST_BYPASS` | `false` | Development-only auth bypass for loopback requests. Keep false for shared/network deployments. |
| `ALLOWED_ORIGINS` | `http://localhost,http://127.0.0.1` | Comma-separated exact permitted origins for cross-origin browser/API clients. |
| `SECURE_COOKIES` | `false` | Set true when serving Odysseus through HTTPS at a trusted proxy or private access gateway. |
| `DATABASE_URL` | `sqlite:///./data/app.db` | Database connection string |
| `CHROMADB_HOST` | `localhost` | ChromaDB host for vector memory. Docker overrides this to `chromadb`. |
| `CHROMADB_PORT` | `8100` | ChromaDB port for manual host runs. Docker overrides this to `8000`. |
| `EMBEDDING_URL` | -- | OpenAI-compatible embeddings endpoint |
| `ODYSSEUS_CHAT_UPLOAD_MAX_BYTES` | `10485760` | Chat/agent attachment cap in bytes. Raise for larger local PDFs or text documents. |
| `ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES` | `104857600` | Gallery image upload cap in bytes (100 MB). |
| `ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES` | `26214400` | Gallery transform input cap in bytes (25 MB). |
| `ODYSSEUS_MEMORY_IMPORT_MAX_BYTES` | `10485760` | Memory import file cap in bytes (10 MB). |
| `ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES` | `26214400` | Personal document upload cap in bytes (25 MB). |
| `ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES` | `26214400` | Email compose attachment cap in bytes (25 MB). |
| `ODYSSEUS_STT_MAX_AUDIO_BYTES` | `26214400` | Speech-to-text audio cap in bytes (25 MB). |
| `ODYSSEUS_ICS_MAX_BYTES` | `10485760` | Calendar `.ics` import cap in bytes (10 MB). |
All upload-limit vars are validated (must be a positive integer) and optional; an invalid value fails fast at startup. Odysseus is a self-hosted workspace with powerful local tools. Keep auth enabled, keep private data out of Git, and do not expose raw model/service ports publicly. Deployment details are in the [setup guide](docs/setup.md#security-notes).
### Built-in MCP servers (optional setup)
Odysseus auto-registers a few built-in MCP servers at startup. The npx-based ones (currently the browser server, `@playwright/mcp`) only start when their npm package is already in the local npx cache. If a package isn't cached, that server is skipped with a startup log message explaining what to do, so a fresh install does not block on a multi-minute npm download or hang if Playwright system deps are missing.
To enable the browser MCP (page navigation, screenshots, vision), run once:
```bash
npx -y @playwright/mcp@latest --version
```
That installs `@playwright/mcp` plus Playwright (~300MB total). Restart Odysseus and the server will register at startup.
## Architecture
```
app.py # FastAPI entry point
core/ auth, database, middleware, constants
src/ llm_core, agent_loop, agent_tools, chat_processor, search/
routes/ chat, session, document, memory, model … endpoints
services/ docs, memory, search, hwfit (Cookbook) …
static/ index.html + app.js + style.css + js/ (modular front-end)
docs/ landing page (index.html) + preview clips
```
## Data
All user data lives in `data/` (gitignored): `app.db` (sessions, messages, documents),
`memory.json`, `presets.json`, `uploads/`, `personal_docs/`, `chroma/`, `settings.json`.
To back up or restore everything in `data/`, see the
[Backup & Restore guide](docs/backup-restore.md).
## Star History ## Star History
@@ -483,19 +72,5 @@ To back up or restore everything in `data/`, see the
</a> </a>
## License ## License
AGPL-3.0-or-later -- see [LICENSE](LICENSE) and [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md).
``` AGPL-3.0-or-later -- see [LICENSE](LICENSE) and [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md).
|
|||
|||||
| | | |||||||
)_) )_) )_) ~|~
)___))___))___)\ |
)____)____)_____)\\|
_____|____|____|_____\\\__
\ /
~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
~^~ all aboard! ~^~
~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
```
Binary file not shown.

After

Width:  |  Height:  |  Size: 16 KiB

BIN
View File
Binary file not shown.

Before

Width:  |  Height:  |  Size: 45 KiB

After

Width:  |  Height:  |  Size: 52 KiB

+425
View File
@@ -0,0 +1,425 @@
# Odysseus Setup Guide
This page keeps the detailed install, deployment, troubleshooting, and configuration notes out of the front README.
## Quick Start
> **Branch note:** `dev` is the default branch and contains the latest development changes, but it may be unstable. For the more stable curated branch, use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main).
Defaults work out of the box: clone, run, then configure models/search/email
inside **Settings**. Only edit `.env` for deployment-level overrides like
`APP_BIND`, `APP_PORT`, `AUTH_ENABLED`, `DATABASE_URL`, or a pre-seeded admin password.
On first setup, Odysseus creates an admin account (`admin` unless
`ODYSSEUS_ADMIN_USER` is set) and prints a temporary password in the terminal.
For Docker installs, the same line is in `docker compose logs odysseus`.
Use that for the first login, then change it in **Settings**.
Contributing? See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, testing, and
pull request guidelines.
### Docker (recommended)
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
cp .env.example .env # optional, but recommended for explicit defaults
docker compose up -d --build
```
To include optional extras in the image (PDF viewer, Office extraction; includes AGPL PyMuPDF), build with `docker compose build --build-arg INSTALL_OPTIONAL=true` before `up`.
Open `http://localhost:7000` when the containers are healthy. Docker Compose
binds the web UI to `127.0.0.1` by default. If the port is taken, set
`APP_PORT=7001` in `.env` and recreate the container. Set `APP_BIND=0.0.0.0`
only when you intentionally want LAN/reverse-proxy access.
> **On Apple Silicon (M-series) Macs:** Docker can't reach the Metal GPU, so
> Cookbook serves local models on CPU only. For GPU-accelerated model serving,
> run natively instead — see [Apple Silicon](#apple-silicon) below.
### Native Linux / macOS
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
Requirements: Python 3.11+. Cookbook also needs `tmux` for background model
downloads and serves. The app itself is lightweight; local model serving is the
heavy part and depends on the model, runtime, GPU, and VRAM, so small hosts can
connect to API or remote model servers instead. Use `--host 0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
### Apple Silicon
Docker on macOS cannot use the Metal GPU. For GPU-accelerated Cookbook on an
M-series Mac, run Odysseus natively:
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
./start-macos.sh
```
It launches at `http://127.0.0.1:7860`. To expose it to your phone over a trusted LAN/VPN such as Tailscale, bind all interfaces:
```bash
ODYSSEUS_HOST=0.0.0.0 ./start-macos.sh
# then open http://<tailscale-ip>:7860
```
The script also reads `.env` at startup, so `APP_BIND=0.0.0.0` and `APP_PORT`
set there are picked up automatically without a command-line override each run.
Keep `AUTH_ENABLED=true` (the default) before binding outside loopback. Do not
expose this port directly to the public internet. To build a clickable app wrapper:
```bash
./build-macos-app.sh
```
<details>
<summary>Cookbook, GPU, Ollama, and troubleshooting notes</summary>
**Docker bundled services.** Compose starts Odysseus, ChromaDB, SearXNG, and
ntfy. Odysseus and the bundled service ports bind to `127.0.0.1` by default, so
they are reachable from the host but not exposed to your LAN/public internet
unless you opt in.
**Cookbook storage in Docker.** Downloads live in `./data/huggingface`
(`~/.cache/huggingface` in the container). Cookbook-installed Python CLIs and
serve engines live in `./data/local` (`~/.local` in the container), so they
survive container recreation.
**Remote servers.** In **Cookbook -> Settings -> Servers**, generate the
Odysseus SSH key and add the public key to the remote server's
`~/.ssh/authorized_keys`. From the host you can also run:
```bash
ssh-copy-id -i data/ssh/id_ed25519.pub user@server
```
**Docker GPU overlays.** CPU-only users can skip this section. Cookbook can
only detect GPUs that Docker exposes to the container — if the host runtime or
device passthrough is not configured, Cookbook sees the iGPU, another card, or
CPU instead of your intended GPU.
For NVIDIA, `scripts/check-docker-gpu.sh` diagnoses GPU passthrough and can
optionally install the host runtime or update `.env`.
```bash
# Read-only diagnostic (default — installs nothing, never edits .env):
scripts/check-docker-gpu.sh
# Print OS-specific install commands without running them:
scripts/check-docker-gpu.sh --print-install-commands
# Install NVIDIA Container Toolkit on Ubuntu/Debian (requires sudo):
scripts/check-docker-gpu.sh --install-nvidia-toolkit
# Write COMPOSE_FILE to .env (only when GPU passthrough is confirmed working):
scripts/check-docker-gpu.sh --enable-nvidia-overlay
# Full assisted setup — install toolkit, then enable overlay if passthrough works:
scripts/check-docker-gpu.sh --install-nvidia-toolkit --enable-nvidia-overlay
```
Safety notes:
- The app never installs host GPU runtime automatically.
- The app never edits `.env` automatically.
- `.env` is only modified when `--enable-nvidia-overlay` is explicitly passed,
and only after GPU passthrough succeeds. `--yes` skips prompts but does not
bypass the passthrough gate.
- `.env.bak.*` backups created by `--enable-nvidia-overlay` are ignored by
Git and the Docker build context.
To enable manually without the script, add this to `.env`:
```bash
COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml
```
**AMD / ROCm.** AMD setup is read-only diagnostic plus manual `.env` edit. Run:
```bash
scripts/check-docker-amd-gpu.sh
```
Then add the reported values to `.env`, replacing `RENDER_GID` with your host's
numeric render group id:
```bash
COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml
RENDER_GID=989
```
For NVIDIA/AMD GPU support, also read the comments in the selected overlay file: docker/gpu.nvidia.yml or docker/gpu.amd.yml.
**Stack-management UIs (Portainer, Coolify, Dockhand, etc.).** These tools
often accept only a single Compose file and do not reliably honor `COMPOSE_FILE`
or multiple `-f` overlays. CLI users should keep using the `COMPOSE_FILE`
overlay workflow above. For stack UIs, point the stack at one of the standalone
files instead, which bundle the base stack plus the GPU settings:
- `docker-compose.gpu-nvidia.yml` — still requires the NVIDIA Container Toolkit
on the host.
- `docker-compose.gpu-amd.yml` — still requires host ROCm/kfd/DRI setup, the
`video`/`render` group membership, and `RENDER_GID` when needed.
The base `docker-compose.yml` plus the `docker/gpu.*.yml` overlays remain the
source of truth; the standalone files mirror them for single-file deployments.
Verify after enabling either overlay:
```bash
docker compose exec odysseus nvidia-smi -L # NVIDIA
docker compose exec odysseus sh -lc 'test -e /dev/kfd && test -d /dev/dri && ls -l /dev/kfd /dev/dri/renderD*' # AMD
```
> **GPU passthrough ≠ llama.cpp CUDA.** `nvidia-smi` passing inside the
> container confirms Docker GPU access, but llama.cpp also needs `cudart` and
> the CUDA Toolkit at runtime. If Cookbook logs show `Unable to find cudart
> library`, `Could NOT find CUDAToolkit`, `CUDA Toolkit not found`, or
> tensors/layers assigned to CPU, that is a Cookbook/llama.cpp build issue —
> not a Docker passthrough failure. Reinstall the serve engine via
> **Cookbook → Dependencies** to get a CUDA-enabled build.
>
> The same split applies to AMD/ROCm: seeing `/dev/kfd` and `/dev/dri` inside
> the container confirms device passthrough, not ROCm userspace or a
> ROCm-enabled vLLM/llama.cpp build. `rocm-smi` and `rocminfo` are not expected
> inside the slim Odysseus image.
**Ollama with Docker.** If Ollama runs on the host, add this endpoint in
Settings:
```text
http://host.docker.internal:11434/v1
```
Ollama must listen outside its own loopback interface:
```bash
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
This connects Odysseus in Docker to an Ollama server that is already running on
your host machine; it does not start Ollama inside the container.
`host.docker.internal` is Docker's hostname for the host machine from inside the
container. Cookbook **Serve** is a separate workflow for serving downloaded
models through Odysseus/llama.cpp, so Windows users with an existing Ollama
install usually only need to add the endpoint in Settings.
**Useful checks.**
```bash
docker compose ps
docker compose logs --tail=120 odysseus
docker compose logs odysseus | grep -E 'ChromaDB|MemoryVectorStore|DEGRADED'
```
**macOS details.** `start-macos.sh` installs Homebrew deps, creates the venv,
runs setup, and starts uvicorn on port `7860` because AirPlay often holds
`7000`. It uses llama.cpp/Ollama for Metal. vLLM/SGLang are CUDA/ROCm-only and
do not run on macOS. MLX-only models are not served by Odysseus.
</details>
### Native Windows
**One-command launcher** (creates the venv, installs deps, runs setup, starts the
server; safe to re-run):
```powershell
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1
```
Or do it by hand:
```powershell
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
py -3.11 -m venv venv
venv\Scripts\Activate.ps1
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
If `python` points at an older interpreter, use `py -3.12` (or another installed
3.11+ version) for the venv step.
**Requirements:** Python 3.11+. The core app (chat, agent, memory, documents,
email, calendar, deep research) runs fully native. For full **Cookbook** background
model downloads and the agent shell tool, also install
[Git for Windows](https://git-scm.com/download/win) (provides `bash.exe`).
Local GPU *serving* of vLLM/SGLang needs Linux/WSL2; for a local model on Windows,
[Ollama](https://ollama.com/download) is the easiest path — point Odysseus at
`http://localhost:11434/v1` in Settings.
Open `http://localhost:7000`, log in with the generated admin password,
and configure everything else inside **Settings**.
## Troubleshooting & Advanced Setup
### `chromadb-client` conflicts with embedded ChromaDB
If `chromadb-client` (the lightweight HTTP-only package) is installed alongside the full `chromadb` package, Odysseus starts but ChromaDB silently falls back to HTTP-only mode and fails.
**Fix:** uninstall `chromadb-client` and force-reinstall the full package:
```bash
./venv/bin/pip uninstall chromadb-client -y
./venv/bin/pip install --force-reinstall chromadb
```
### HTTPS + LAN/Tailscale exposure
To expose Odysseus on a local network or Tailscale with HTTPS:
1. Change the bind address to `0.0.0.0` in `.env` (`APP_BIND=0.0.0.0` or `ODYSSEUS_HOST=0.0.0.0`).
2. Generate a locally-trusted cert for your LAN/Tailscale IPs using [mkcert](https://github.com/FiloSottile/mkcert):
```bash
mkcert -install
mkcert -cert-file cert.pem -key-file key.pem 192.168.1.100 tailscale-ip
```
3. Run `uvicorn` with the generated certs:
```bash
python -m uvicorn app:app --host 0.0.0.0 --port 7000 --ssl-certfile=cert.pem --ssl-keyfile=key.pem
```
4. Install the `mkcert` CA on any other device you want to access Odysseus from (e.g., for iOS, email the `rootCA.pem` to yourself, install the profile, and trust it in Certificate Trust Settings).
### Optional Dependencies
`requirements-optional.txt` contains packages that unlock extra features. It is not installed by default.
| Package | Feature unlocked |
|---------|-----------------|
| `faster-whisper` | Local speech-to-text (microphone -> text) via the "local" STT provider. |
| `ddgs` | DuckDuckGo as a search provider option. |
| `PyMuPDF` | PDF page rendering in the side viewer panel and form-filling. (Note: AGPL-3.0) |
| `markitdown` | Office/EPUB document text extraction (converts .docx/.xlsx/.pptx/.xls/.epub to Markdown). |
### Faster, reproducible installs with uv (optional)
[uv](https://docs.astral.sh/uv/) works as a drop-in replacement for the
venv + pip steps in the native install guides, no project changes are needed but this change results in faster installs along with a lockfile for reproducible environments. After [installing `uv`](https://docs.astral.sh/uv/getting-started/installation/), use:
```bash
uv venv venv --python 3.13
uv pip install -r requirements.txt
# then continue as usual: python setup.py, uvicorn, ...
```
`requirements.txt` is intentionally unpinned, so two installs at different times can produce different package versions. If you want a reproducible environment (e.g. across your own machines, or to roll back after a bad upgrade), snapshot and restore exact versions with:
```bash
uv pip compile requirements.txt -o requirements.lock # snapshot current resolution
uv pip sync requirements.lock # reproduce it exactly later
```
`requirements.lock` is gitignored and platform-specific (compile it on the OS you deploy to). Regenerate it deliberately when you want to take upgrades. The plain `uv pip install -r requirements.txt` keeps following the unpinned requirements like pip does.
### Outlook / Office 365 email
Odysseus email accounts currently use IMAP/SMTP username-password auth. Outlook
and Microsoft 365 generally require OAuth instead, so normal Microsoft mailbox
passwords will fail. See [docs/email-outlook.md](docs/email-outlook.md) for the
current limitation and the planned integration direction.
## Security Notes
Odysseus is a self-hosted workspace with powerful local tools: shell access, file uploads, model downloads, web research, email/calendar integrations, and API tokens. Treat it like an admin console.
- Keep `AUTH_ENABLED=true` for any network-accessible deployment.
- Keep `LOCALHOST_BYPASS=false` outside local development.
- Use `SECURE_COOKIES=true` when Odysseus is served through HTTPS by a trusted reverse proxy or private access gateway.
- Do not expose it directly to the public internet without HTTPS and a trusted reverse proxy or private access layer.
- Keep `.env`, `data/`, `logs/`, databases, uploads, generated media, backups, auth/session files, API keys, and model/provider tokens out of Git and private shares. They are ignored by default.
- Review `data/auth.json` after first boot: disable open signup unless you intentionally want it, make only your own account admin, and keep demo/test accounts non-admin.
- Non-admin users do not get shell/Python/file read/write by default, and admin-only routes/tools such as MCP management, API tokens, webhooks, model/cookbook serving, backup/vault, and app settings are admin-gated. Other features are controlled by per-user privileges, so review each user's privileges before exposing a deployment.
- Rotate any API keys or tokens that were ever pasted into a shared chat, demo, screenshot, or log.
- If you enable API tokens or webhooks, create separate tokens per integration and delete unused ones.
- Prefer binding manual development runs to `127.0.0.1`; bind to `0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
- Keep ChromaDB, SearXNG, ntfy, Ollama, vLLM, llama.cpp, databases, and raw model/provider APIs internal-only. Expose only the authenticated Odysseus web/API entrypoint through your trusted proxy or private access layer.
- Before publishing a fork, run `git status --short` and confirm no private files from `.env`, `data/`, `logs/`, uploads, backups, or local databases are staged.
### Private or proxied deployments
Odysseus serves plain HTTP on its app port. Docker Compose binds Odysseus and the bundled services to `127.0.0.1` by default, so a typical production/private setup is:
1. Keep Odysseus on localhost, for example `127.0.0.1:7000`.
2. Terminate HTTPS at a trusted reverse proxy or private access gateway.
3. Put the authenticated Odysseus web/API entrypoint behind that layer.
4. Keep raw service and model ports internal-only.
Cloudflare Access, Tailscale, Caddy, nginx, and Traefik can all fit this pattern; none are required by Odysseus. If your access layer reaches Odysseus on the same host, proxy to `http://127.0.0.1:7000` and keep `AUTH_ENABLED=true`, `LOCALHOST_BYPASS=false`, and `SECURE_COOKIES=true`.
`ALLOWED_ORIGINS` lists exact permitted origins for cross-origin browser/API clients; ordinary same-origin reverse-proxy access usually does not need a special CORS entry.
Common internal-only ports from the default docs/compose setup:
| Port | Service |
|---|---|
| `7000` | Odysseus raw app port |
| `8080` | SearXNG |
| `8091` | ntfy |
| `8100` | ChromaDB host port for manual/compose access |
| `11434` | Ollama |
| `8000-8020` | Common local model/provider APIs |
## Configuration
Most setup is done inside the app with `/setup` or **Settings**. Use `.env`
for deployment-level defaults and secrets you want present before first boot.
Key settings:
| Variable | Default | Description |
|---|---|---|
| `LLM_HOST` | `localhost` | Your LLM server (e.g. `llm-host.local:8000`) |
| `LLM_HOSTS` | -- | Comma-separated list for model discovery |
| `OPENAI_API_KEY` | -- | Optional OpenAI key. Prefer adding providers in the app unless pre-seeding. |
| `SEARXNG_INSTANCE` | `http://localhost:8080` | SearXNG URL. Docker overrides this to `http://searxng:8080`. |
| `SEARXNG_SECRET` | generated on first Docker boot | Optional SearXNG cookie/CSRF secret. Leave blank unless you need to pin it. |
| `APP_BIND` | `127.0.0.1` | Docker Compose host bind address for the web UI. Use `0.0.0.0` only for intentional LAN/reverse-proxy access. |
| `APP_PORT` | `7000` | Docker Compose host port for the web UI. |
| `APP_DATA_DIR` | `./data` | Docker Compose host directory for application data volumes. |
| `APP_LOGS_DIR` | `./logs` | Docker Compose host directory for application logs. |
| `AUTH_ENABLED` | `true` | Enable/disable login |
| `LOCALHOST_BYPASS` | `false` | Development-only auth bypass for loopback requests. Keep false for shared/network deployments. |
| `ALLOWED_ORIGINS` | `http://localhost,http://127.0.0.1` | Comma-separated exact permitted origins for cross-origin browser/API clients. |
| `SECURE_COOKIES` | `false` | Set true when serving Odysseus through HTTPS at a trusted proxy or private access gateway. |
| `DATABASE_URL` | `sqlite:///./data/app.db` | Database connection string |
| `CHROMADB_HOST` | `localhost` | ChromaDB host for vector memory. Docker overrides this to `chromadb`. |
| `CHROMADB_PORT` | `8100` | ChromaDB port for manual host runs. Docker overrides this to `8000`. |
| `EMBEDDING_URL` | -- | OpenAI-compatible embeddings endpoint |
| `ODYSSEUS_CHAT_UPLOAD_MAX_BYTES` | `10485760` | Chat/agent attachment cap in bytes. Raise for larger local PDFs or text documents. |
| `ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES` | `104857600` | Gallery image upload cap in bytes (100 MB). |
| `ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES` | `26214400` | Gallery transform input cap in bytes (25 MB). |
| `ODYSSEUS_MEMORY_IMPORT_MAX_BYTES` | `10485760` | Memory import file cap in bytes (10 MB). |
| `ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES` | `26214400` | Personal document upload cap in bytes (25 MB). |
| `ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES` | `26214400` | Email compose attachment cap in bytes (25 MB). |
| `ODYSSEUS_STT_MAX_AUDIO_BYTES` | `26214400` | Speech-to-text audio cap in bytes (25 MB). |
| `ODYSSEUS_ICS_MAX_BYTES` | `10485760` | Calendar `.ics` import cap in bytes (10 MB). |
All upload-limit vars are validated (must be a positive integer) and optional; an invalid value fails fast at startup.
### Built-in MCP servers (optional setup)
Odysseus auto-registers a few built-in MCP servers at startup. The npx-based ones (currently the browser server, `@playwright/mcp`) only start when their npm package is already in the local npx cache. If a package isn't cached, that server is skipped with a startup log message explaining what to do, so a fresh install does not block on a multi-minute npm download or hang if Playwright system deps are missing.
To enable the browser MCP (page navigation, screenshots, vision), run once:
```bash
npx -y @playwright/mcp@latest --version
```
That installs `@playwright/mcp` plus Playwright (~300MB total). Restart Odysseus and the server will register at startup.
## Architecture
```
app.py # FastAPI entry point
core/ auth, database, middleware, constants
src/ llm_core, agent_loop, agent_tools, chat_processor, search/
routes/ chat, session, document, memory, model … endpoints
services/ docs, memory, search, hwfit (Cookbook) …
static/ index.html + app.js + style.css + js/ (modular front-end)
docs/ landing page (index.html) + preview clips
```
## Data
All user data lives in `data/` (gitignored): `app.db` (sessions, messages, documents),
`memory.json`, `presets.json`, `uploads/`, `personal_docs/`, `chroma/`, `settings.json`.
To back up or restore everything in `data/`, see the
[Backup & Restore guide](docs/backup-restore.md).
+7 -2
View File
@@ -12,6 +12,7 @@ import json
import csv import csv
import io import io
import os import os
import inspect
import httpx import httpx
from pathlib import Path from pathlib import Path
from datetime import datetime from datetime import datetime
@@ -741,8 +742,8 @@ def setup_contacts_routes():
email = (data.get("email") or "").strip() email = (data.get("email") or "").strip()
phone = (data.get("phone") or "").strip() phone = (data.get("phone") or "").strip()
address = (data.get("address") or "").strip() address = (data.get("address") or "").strip()
if not email and not name: if not email:
return {"success": False, "error": "Name or email required"} return {"success": False, "error": "Email required"}
# Check if already exists by email # Check if already exists by email
if email: if email:
contacts = _fetch_contacts() contacts = _fetch_contacts()
@@ -751,7 +752,11 @@ def setup_contacts_routes():
return {"success": True, "message": "Already exists", "contact": c} return {"success": True, "message": "Already exists", "contact": c}
if not name: if not name:
name = email.split("@")[0] name = email.split("@")[0]
create_params = inspect.signature(_create_contact).parameters
if len(create_params) >= 3:
ok = _create_contact(name, email, address) ok = _create_contact(name, email, address)
else:
ok = _create_contact(name, email)
# If a phone was provided, do an immediate update to thread it # If a phone was provided, do an immediate update to thread it
# through (the simple _create_contact signature only takes name + # through (the simple _create_contact signature only takes name +
# email + address; phones happen via update). # email + address; phones happen via update).
+8
View File
@@ -67,6 +67,14 @@ def _gallery_image_path(filename: str) -> Path:
raise HTTPException(400, "Unsafe gallery filename") raise HTTPException(400, "Unsafe gallery filename")
if safe_name != original: if safe_name != original:
raise HTTPException(400, "Unsafe gallery filename") raise HTTPException(400, "Unsafe gallery filename")
if not path.exists():
cwd_root = (Path.cwd() / "data" / "generated_images").resolve()
cwd_path = (cwd_root / safe_name).resolve()
try:
if os.path.commonpath([str(cwd_root), str(cwd_path)]) == str(cwd_root) and cwd_path.exists():
return cwd_path
except Exception:
pass
return path return path
+69 -13
View File
@@ -19,22 +19,32 @@ GPU_BANDWIDTH = {
"6950 xt": 576, "6900 xt": 512, "6800 xt": 512, "6800": 512, "6700 xt": 384, "6600 xt": 256, "6600": 224, "6950 xt": 576, "6900 xt": 512, "6800 xt": 512, "6800": 512, "6700 xt": 384, "6600 xt": 256, "6600": 224,
"mi300x": 5300, "mi300": 5300, "mi250x": 3277, "mi250": 3277, "mi210": 1638, "mi100": 1229, "mi300x": 5300, "mi300": 5300, "mi250x": 3277, "mi250": 3277, "mi210": 1638, "mi100": 1229,
"9070 xt": 624, "9070": 488, "9060 xt": 322, "9060": 322, "9070 xt": 624, "9070": 488, "9060 xt": 322, "9060": 322,
# Apple Silicon unified-memory bandwidth (GB/s). Keyed off the chip name
# reported by sysctl machdep.cpu.brand_string (e.g. "Apple M4 Max"). Listed
# before the bare "m_" keys matters less than length-sorting (done below),
# which guarantees "m4 max" is tried before "m4".
"m1 ultra": 800, "m1 max": 400, "m1 pro": 200, "m1": 68,
"m2 ultra": 800, "m2 max": 400, "m2 pro": 200, "m2": 100,
"m3 ultra": 800, "m3 max": 300, "m3 pro": 150, "m3": 100,
"m4 max": 546, "m4 pro": 273, "m4": 120,
"m5 max": 546, "m5 pro": 273, "m5": 150,
} }
# Pre-sort keys by length descending for correct substring matching # Pre-sort keys by length descending for correct substring matching
_BW_KEYS_SORTED = sorted(GPU_BANDWIDTH.keys(), key=len, reverse=True) _BW_KEYS_SORTED = sorted(GPU_BANDWIDTH.keys(), key=len, reverse=True)
# metal: backstop for Apple Silicon chips not in GPU_BANDWIDTH (e.g. a future # Apple Silicon unified-memory bandwidth (GB/s). For chip families with both
# M5) — the named chips above take the accurate bandwidth path instead. # binned and full variants under the same "Apple Mx Max" brand string, prefer
# GPU core count when hardware detection provides it; otherwise fall back to the
# conservative tier so speed estimates do not over-promise.
APPLE_BANDWIDTH_FIXED = {
"m1 ultra": 800, "m1 max": 400, "m1 pro": 200, "m1": 68,
"m2 ultra": 800, "m2 max": 400, "m2 pro": 200, "m2": 100,
"m3 ultra": 800, "m3 pro": 150, "m3": 100,
"m4 pro": 273, "m4": 120,
"m5 pro": 307, "m5": 153,
}
APPLE_BANDWIDTH_BY_CORES = {
"m3 max": {30: 300, 40: 400},
"m4 max": {32: 410, 40: 546},
"m5 max": {32: 460, 40: 614},
}
_APPLE_FIXED_KEYS_SORTED = sorted(APPLE_BANDWIDTH_FIXED.keys(), key=len, reverse=True)
_APPLE_VARIANT_KEYS_SORTED = sorted(APPLE_BANDWIDTH_BY_CORES.keys(), key=len, reverse=True)
# metal: backstop for Apple Silicon chips not in the explicit tables above
# (e.g. a future M6) — use a conservative generic estimate when unknown.
FALLBACK_K = {"cuda": 220, "rocm": 180, "metal": 150, "cpu_x86": 70, "cpu_arm": 90} FALLBACK_K = {"cuda": 220, "rocm": 180, "metal": 150, "cpu_x86": 70, "cpu_arm": 90}
USE_CASE_WEIGHTS = { USE_CASE_WEIGHTS = {
@@ -60,10 +70,56 @@ CONTEXT_TARGET = {
} }
def _lookup_bandwidth(gpu_name): def _lookup_apple_bandwidth(system):
gpu_name = system.get("gpu_name")
if not isinstance(gpu_name, str) or not gpu_name: if not isinstance(gpu_name, str) or not gpu_name:
return None return None
gn = gpu_name.lower() gn = gpu_name.lower()
# Guard against false matches on non-Apple GPUs whose names contain
# "m3"/"m4"/"m5" (e.g. NVIDIA Quadro M4 000).
if "apple" not in gn:
return None
raw_cores = system.get("gpu_cores")
try:
gpu_cores = int(raw_cores) if raw_cores is not None else None
except (TypeError, ValueError):
gpu_cores = None
for key in _APPLE_VARIANT_KEYS_SORTED:
if key not in gn:
continue
if gpu_cores in APPLE_BANDWIDTH_BY_CORES[key]:
return APPLE_BANDWIDTH_BY_CORES[key][gpu_cores]
return min(APPLE_BANDWIDTH_BY_CORES[key].values())
for key in _APPLE_FIXED_KEYS_SORTED:
if key in gn:
return APPLE_BANDWIDTH_FIXED[key]
return None
def _lookup_bandwidth(system):
if isinstance(system, dict):
gpu_name = system.get("gpu_name")
else:
gpu_name = system
if not isinstance(gpu_name, str) or not gpu_name:
return None
# Apple tiers live only in the Apple-specific table now (#2564), so route
# BOTH dict and bare-string callers through it. A bare string carries no
# gpu_cores, so the helper falls back to the conservative (lowest) tier for
# that model -- before #2564 the generic table answered string lookups, and
# dropping that made _lookup_bandwidth("Apple M3 Max") return None.
apple_input = system if isinstance(system, dict) else {"gpu_name": gpu_name}
bw = _lookup_apple_bandwidth(apple_input)
if bw is not None:
return bw
gn = gpu_name.lower()
for key in _BW_KEYS_SORTED: for key in _BW_KEYS_SORTED:
if key in gn: if key in gn:
return GPU_BANDWIDTH[key] return GPU_BANDWIDTH[key]
@@ -84,7 +140,7 @@ def _estimate_speed(model, quant, run_mode, system, offload_frac=0.0):
""" """
pb = _active_params_b(model) pb = _active_params_b(model)
is_moe = model.get("is_moe", False) is_moe = model.get("is_moe", False)
bw = _lookup_bandwidth(system.get("gpu_name")) bw = _lookup_bandwidth(system)
backend = system.get("backend", "cpu_x86") backend = system.get("backend", "cpu_x86")
if bw and run_mode in ("gpu", "cpu_offload"): if bw and run_mode in ("gpu", "cpu_offload"):
+37 -1
View File
@@ -1,3 +1,4 @@
import json
import os import os
import platform import platform
import re import re
@@ -335,6 +336,37 @@ def _detect_apple_silicon():
if total_gb <= 0: if total_gb <= 0:
return None return None
def _parse_apple_gpu_cores(text):
if not text:
return None
try:
data = json.loads(text)
except (TypeError, ValueError, json.JSONDecodeError):
data = None
if isinstance(data, dict):
for gpu in data.get("SPDisplaysDataType") or []:
if not isinstance(gpu, dict):
continue
model = str(gpu.get("sppci_model") or gpu.get("_name") or "")
if "apple" not in model.lower():
continue
cores = gpu.get("sppci_cores")
try:
return int(str(cores).strip())
except (TypeError, ValueError):
continue
m = re.search(r"Total Number of Cores:\s*(\d+)", text)
if m:
try:
return int(m.group(1))
except ValueError:
return None
return None
gpu_cores = _parse_apple_gpu_cores(_run(["system_profiler", "SPDisplaysDataType", "-json"]))
if gpu_cores is None:
gpu_cores = _parse_apple_gpu_cores(_run(["system_profiler", "SPDisplaysDataType"]))
# Usable GPU budget. macOS lets Metal use most of unified memory, but the # Usable GPU budget. macOS lets Metal use most of unified memory, but the
# default working-set limit scales with RAM: small machines have to keep # default working-set limit scales with RAM: small machines have to keep
# more back for the OS + app. These fractions track Apple's # more back for the OS + app. These fractions track Apple's
@@ -357,7 +389,7 @@ def _detect_apple_silicon():
pass pass
gpu = {"index": 0, "name": brand, "vram_gb": vram_gb} gpu = {"index": 0, "name": brand, "vram_gb": vram_gb}
return { info = {
"gpu_name": brand, "gpu_name": brand,
"gpu_vram_gb": vram_gb, "gpu_vram_gb": vram_gb,
"gpu_count": 1, "gpu_count": 1,
@@ -369,6 +401,9 @@ def _detect_apple_silicon():
# separate pool — downstream fit logic uses this to avoid double-budgeting. # separate pool — downstream fit logic uses this to avoid double-budgeting.
"unified_memory": True, "unified_memory": True,
} }
if gpu_cores is not None:
info["gpu_cores"] = gpu_cores
return info
def _read_file(path): def _read_file(path):
@@ -772,6 +807,7 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
"gpu_name": gpu_info["gpu_name"], "gpu_name": gpu_info["gpu_name"],
"gpu_vram_gb": gpu_info["gpu_vram_gb"], "gpu_vram_gb": gpu_info["gpu_vram_gb"],
"gpu_count": gpu_info["gpu_count"], "gpu_count": gpu_info["gpu_count"],
"gpu_cores": gpu_info.get("gpu_cores"),
"gpus": gpu_info.get("gpus", []), "gpus": gpu_info.get("gpus", []),
"gpu_groups": gpu_info.get("gpu_groups", []), "gpu_groups": gpu_info.get("gpu_groups", []),
"homogeneous": gpu_info.get("homogeneous", True), "homogeneous": gpu_info.get("homogeneous", True),
+9 -5
View File
@@ -201,11 +201,15 @@ def build_models_url(base: str) -> Optional[str]:
return _ollama_api_root(base) + "/tags" return _ollama_api_root(base) + "/tags"
if provider == "chatgpt-subscription": if provider == "chatgpt-subscription":
return None return None
# Generic OpenAI-compatible fallback: ensure the path lands on /v1/models # Generic OpenAI-compatible fallback: local model servers with no explicit
# when the user omitted a path entirely. If a non-empty path is already # path conventionally expose `/v1/models` (LM Studio, llama.cpp, vLLM).
# present (e.g. /openai, /api/openai/v1, /v1), trust the caller — the # For non-local unknown hosts, do not invent `/v1`; append `/models` to the
# /models suffix is appended as-is and the caller's prefix is preserved. # caller's base so look-alike provider hosts stay generic.
if not urlparse(base).path: parsed = urlparse(base)
host = (parsed.hostname or "").lower()
is_local = host in {"localhost", "127.0.0.1", "::1", "host.docker.internal"}
uses_v1_models_by_default = is_local or host in {"api.deepseek.com"}
if not parsed.path and uses_v1_models_by_default:
base = base + "/v1" base = base + "/v1"
return base + "/models" return base + "/models"
+2 -2
View File
@@ -1467,8 +1467,8 @@ function initEndpointForm() {
const localAddBtn = el('adm-epLocalAddBtn'); const localAddBtn = el('adm-epLocalAddBtn');
const localTestBtn = el('adm-epLocalTestBtn'); const localTestBtn = el('adm-epLocalTestBtn');
if (localTestBtn) { if (localTestBtn) {
const testOriginalHtml = localTestBtn.innerHTML;
localTestBtn.addEventListener('click', async () => { localTestBtn.addEventListener('click', async () => {
const testOriginalHtml = localTestBtn.innerHTML || '>Test';
const msg = _endpointMsg('local'); const msg = _endpointMsg('local');
msg.textContent = ''; msg.className = 'adm-ep-inline-msg'; msg.textContent = ''; msg.className = 'adm-ep-inline-msg';
const raw = (el('adm-epLocalUrl').value || '').trim(); const raw = (el('adm-epLocalUrl').value || '').trim();
@@ -1494,8 +1494,8 @@ function initEndpointForm() {
}); });
} }
if (localAddBtn) { if (localAddBtn) {
const addOriginalHtml = localAddBtn.innerHTML;
localAddBtn.addEventListener('click', async () => { localAddBtn.addEventListener('click', async () => {
const addOriginalHtml = localAddBtn.innerHTML || '>Add';
const msg = _endpointMsg('local'); const msg = _endpointMsg('local');
msg.textContent = ''; msg.className = 'adm-ep-inline-msg'; msg.textContent = ''; msg.className = 'adm-ep-inline-msg';
const raw = (el('adm-epLocalUrl').value || '').trim(); const raw = (el('adm-epLocalUrl').value || '').trim();
+4 -2
View File
@@ -41,8 +41,10 @@ def _seed(tmp_path):
def test_file_kept_when_commit_fails(tmp_path, monkeypatch): def test_file_kept_when_commit_fails(tmp_path, monkeypatch):
monkeypatch.chdir(tmp_path)
SessionLocal = _seed(tmp_path) SessionLocal = _seed(tmp_path)
# GALLERY_IMAGE_DIR is an absolute path fixed at import, so a chdir can't
# redirect the delete; point the resolver at the seeded tmp dir directly.
monkeypatch.setattr(gallery_routes, "GALLERY_IMAGE_DIR", tmp_path / "data" / "generated_images")
monkeypatch.setattr(gallery_routes, "get_current_user", lambda r: "alice") monkeypatch.setattr(gallery_routes, "get_current_user", lambda r: "alice")
# A session whose commit always fails, to simulate a DB error mid-delete. # A session whose commit always fails, to simulate a DB error mid-delete.
@@ -67,8 +69,8 @@ def test_file_kept_when_commit_fails(tmp_path, monkeypatch):
def test_file_removed_on_successful_delete(tmp_path, monkeypatch): def test_file_removed_on_successful_delete(tmp_path, monkeypatch):
monkeypatch.chdir(tmp_path)
SessionLocal = _seed(tmp_path) SessionLocal = _seed(tmp_path)
monkeypatch.setattr(gallery_routes, "GALLERY_IMAGE_DIR", tmp_path / "data" / "generated_images")
monkeypatch.setattr(gallery_routes, "get_current_user", lambda r: "alice") monkeypatch.setattr(gallery_routes, "get_current_user", lambda r: "alice")
monkeypatch.setattr(gallery_routes, "SessionLocal", SessionLocal) monkeypatch.setattr(gallery_routes, "SessionLocal", SessionLocal)
+59
View File
@@ -0,0 +1,59 @@
from services.hwfit.fit import _lookup_apple_bandwidth, _lookup_bandwidth
def test_m3_max_bandwidth_uses_gpu_cores():
assert _lookup_bandwidth({"gpu_name": "Apple M3 Max", "gpu_cores": 30}) == 300
assert _lookup_bandwidth({"gpu_name": "Apple M3 Max", "gpu_cores": 40}) == 400
def test_m4_max_bandwidth_uses_gpu_cores():
assert _lookup_bandwidth({"gpu_name": "Apple M4 Max", "gpu_cores": 32}) == 410
assert _lookup_bandwidth({"gpu_name": "Apple M4 Max", "gpu_cores": 40}) == 546
def test_m5_max_bandwidth_uses_gpu_cores():
assert _lookup_bandwidth({"gpu_name": "Apple M5 Max", "gpu_cores": 32}) == 460
assert _lookup_bandwidth({"gpu_name": "Apple M5 Max", "gpu_cores": 40}) == 614
def test_apple_max_bandwidth_falls_back_conservatively_without_gpu_cores():
assert _lookup_bandwidth({"gpu_name": "Apple M3 Max"}) == 300
assert _lookup_bandwidth({"gpu_name": "Apple M4 Max"}) == 410
assert _lookup_bandwidth({"gpu_name": "Apple M5 Max"}) == 460
def test_fixed_apple_bandwidth_entries_include_updated_m5_values():
assert _lookup_bandwidth({"gpu_name": "Apple M5 Pro"}) == 307
assert _lookup_bandwidth({"gpu_name": "Apple M5"}) == 153
def test_non_apple_gpu_does_not_match_apple_bandwidth():
"""NVIDIA Quadro M4 000 should NOT match Apple bandwidth lookup."""
assert _lookup_bandwidth({"gpu_name": "NVIDIA Quadro M4 000"}) is None
assert _lookup_bandwidth({"gpu_name": "NVIDIA Quadro M3 000"}) is None
assert _lookup_bandwidth({"gpu_name": "NVIDIA Quadro M5 000"}) is None
def test_non_apple_gpu_with_cores_does_not_match():
"""A non-Apple GPU that happens to carry a gpu_cores count must not be
matched by the APPLE bandwidth path. This asserts the Apple-specific
matcher directly: _lookup_bandwidth would (correctly) return these cards'
real bandwidth from the general GPU table (e.g. the RTX 4090's 1008 GB/s),
which is a different code path and not what this guard is about.
"""
assert _lookup_apple_bandwidth({"gpu_name": "NVIDIA GeForce RTX 4090", "gpu_cores": 128}) is None
assert _lookup_apple_bandwidth({"gpu_name": "AMD Radeon RX 9070 XT", "gpu_cores": 64}) is None
def test_apple_string_input_resolves_conservative_tier():
"""Bare-string callers must still get Apple bandwidth. #2564 moved the
Apple tiers out of the generic GPU table into the dict-only Apple helper,
so _lookup_bandwidth("Apple M3 Max") (no gpu_cores) regressed to None;
string inputs now route through the Apple helper and get the conservative
(lowest) tier for the model."""
assert _lookup_bandwidth("Apple M3 Max") == 300
assert _lookup_bandwidth("Apple M4 Max") == 410
assert _lookup_bandwidth("Apple M5 Max") == 460
# Non-Apple strings still fall through to the generic table.
assert _lookup_bandwidth("NVIDIA GeForce RTX 4090") == 1008
assert _lookup_bandwidth("Totally Unknown GPU") is None
+43 -3
View File
@@ -4,6 +4,8 @@ Covers the Metal-specific behavior added for Apple Silicon and locks in the
guarantee that non-macOS (Linux/Windows) detection is unchanged. guarantee that non-macOS (Linux/Windows) detection is unchanged.
""" """
import json
from services.hwfit import hardware from services.hwfit import hardware
from services.hwfit.fit import rank_models from services.hwfit.fit import rank_models
from services.hwfit.models import get_models from services.hwfit.models import get_models
@@ -22,7 +24,7 @@ def _metal_system(ram_gb=16.0, vram_gb=10.7):
} }
def _fake_sysctl(brand="Apple M2 Pro", memsize_gb=32, wired_mb=None): def _fake_sysctl(brand="Apple M2 Pro", memsize_gb=32, wired_mb=None, display_json=None, display_text=None):
def run(cmd): def run(cmd):
joined = " ".join(cmd) joined = " ".join(cmd)
if "machdep.cpu.brand_string" in joined: if "machdep.cpu.brand_string" in joined:
@@ -31,6 +33,12 @@ def _fake_sysctl(brand="Apple M2 Pro", memsize_gb=32, wired_mb=None):
return str(int(memsize_gb * 1024**3)) return str(int(memsize_gb * 1024**3))
if "iogpu.wired_limit_mb" in joined: if "iogpu.wired_limit_mb" in joined:
return str(wired_mb) if wired_mb is not None else None return str(wired_mb) if wired_mb is not None else None
if "system_profiler SPDisplaysDataType -json" in joined:
if isinstance(display_json, (dict, list)):
return json.dumps(display_json)
return display_json
if "system_profiler SPDisplaysDataType" in joined:
return display_text
return None return None
return run return run
@@ -98,16 +106,47 @@ def test_apple_silicon_detected_as_metal(monkeypatch):
monkeypatch.setattr(hardware, "_remote_host", None) monkeypatch.setattr(hardware, "_remote_host", None)
monkeypatch.setattr(hardware.platform, "system", lambda: "Darwin") monkeypatch.setattr(hardware.platform, "system", lambda: "Darwin")
monkeypatch.setattr(hardware.platform, "machine", lambda: "arm64") monkeypatch.setattr(hardware.platform, "machine", lambda: "arm64")
monkeypatch.setattr(hardware, "_run", _fake_sysctl(memsize_gb=32)) monkeypatch.setattr(hardware, "_run", _fake_sysctl(
memsize_gb=32,
display_json={"SPDisplaysDataType": [{"sppci_model": "Apple M2 Pro", "sppci_cores": "19"}]},
))
info = hardware._detect_apple_silicon() info = hardware._detect_apple_silicon()
assert info is not None assert info is not None
assert info["backend"] == "metal" assert info["backend"] == "metal"
assert info["gpu_name"] == "Apple M2 Pro" assert info["gpu_name"] == "Apple M2 Pro"
assert info["unified_memory"] is True assert info["unified_memory"] is True
assert info["gpu_cores"] == 19
assert info["gpu_vram_gb"] == 24.0 # 32GB * 0.75 assert info["gpu_vram_gb"] == 24.0 # 32GB * 0.75
def test_apple_silicon_gpu_cores_fall_back_to_plain_text(monkeypatch):
monkeypatch.setattr(hardware, "_remote_host", None)
monkeypatch.setattr(hardware.platform, "system", lambda: "Darwin")
monkeypatch.setattr(hardware.platform, "machine", lambda: "arm64")
monkeypatch.setattr(hardware, "_run", _fake_sysctl(
brand="Apple M4 Max",
memsize_gb=64,
display_json="{not-json",
display_text="Graphics/Displays:\n\nApple M4 Max:\n Total Number of Cores: 32\n",
))
info = hardware._detect_apple_silicon()
assert info is not None
assert info["gpu_cores"] == 32
def test_apple_silicon_gpu_cores_are_optional(monkeypatch):
monkeypatch.setattr(hardware, "_remote_host", None)
monkeypatch.setattr(hardware.platform, "system", lambda: "Darwin")
monkeypatch.setattr(hardware.platform, "machine", lambda: "arm64")
monkeypatch.setattr(hardware, "_run", _fake_sysctl(memsize_gb=32))
info = hardware._detect_apple_silicon()
assert info is not None
assert "gpu_cores" not in info
def test_apple_silicon_skipped_on_linux(monkeypatch): def test_apple_silicon_skipped_on_linux(monkeypatch):
"""Guarantee Linux detection is untouched: the Metal probe bails immediately.""" """Guarantee Linux detection is untouched: the Metal probe bails immediately."""
monkeypatch.setattr(hardware, "_remote_host", None) monkeypatch.setattr(hardware, "_remote_host", None)
@@ -132,7 +171,7 @@ def test_detect_system_propagates_unified_memory(monkeypatch):
monkeypatch.setattr(hardware, "_detect_apple_silicon", lambda: { monkeypatch.setattr(hardware, "_detect_apple_silicon", lambda: {
"gpu_name": "Apple M4", "gpu_vram_gb": 10.7, "gpu_count": 1, "gpu_name": "Apple M4", "gpu_vram_gb": 10.7, "gpu_count": 1,
"gpus": [], "gpu_groups": [], "homogeneous": True, "gpus": [], "gpu_groups": [], "homogeneous": True,
"backend": "metal", "unified_memory": True, "backend": "metal", "unified_memory": True, "gpu_cores": 10,
}) })
monkeypatch.setattr(hardware, "_get_ram_gb", lambda: 16.0) monkeypatch.setattr(hardware, "_get_ram_gb", lambda: 16.0)
monkeypatch.setattr(hardware, "_get_available_ram_gb", lambda: 11.0) monkeypatch.setattr(hardware, "_get_available_ram_gb", lambda: 11.0)
@@ -142,3 +181,4 @@ def test_detect_system_propagates_unified_memory(monkeypatch):
s = hardware.detect_system(fresh=True) s = hardware.detect_system(fresh=True)
assert s["backend"] == "metal" assert s["backend"] == "metal"
assert s.get("unified_memory") is True assert s.get("unified_memory") is True
assert s["gpu_cores"] == 10
+2
View File
@@ -107,6 +107,7 @@ class TestBuildersRejectLookalikeHosts:
assert build_chat_url("https://notanthropic.com") == "https://notanthropic.com/chat/completions" assert build_chat_url("https://notanthropic.com") == "https://notanthropic.com/chat/completions"
def test_lookalike_anthropic_models_is_openai(self): def test_lookalike_anthropic_models_is_openai(self):
assert llm_core._detect_provider("https://anthropic.com.evil.com") == "openai"
assert build_models_url("https://anthropic.com.evil.com") == "https://anthropic.com.evil.com/models" assert build_models_url("https://anthropic.com.evil.com") == "https://anthropic.com.evil.com/models"
def test_anthropic_domain_in_path_is_openai(self): def test_anthropic_domain_in_path_is_openai(self):
@@ -119,6 +120,7 @@ class TestBuildersRejectLookalikeHosts:
assert build_chat_url("https://notollama.com") == "https://notollama.com/chat/completions" assert build_chat_url("https://notollama.com") == "https://notollama.com/chat/completions"
def test_lookalike_ollama_models_is_openai(self): def test_lookalike_ollama_models_is_openai(self):
assert llm_core._detect_provider("https://notollama.com") == "openai"
assert build_models_url("https://notollama.com") == "https://notollama.com/models" assert build_models_url("https://notollama.com") == "https://notollama.com/models"