Merge remote-tracking branch 'upstream/dev' into feat/llm-self-eval

GeekLuffy
2026-06-24 13:07:10 +05:30
332 changed files with 30741 additions and 5444 deletions
+4
@@ -15,6 +15,10 @@ build/
# at runtime — never baked into the image. Mirrored in .gitignore.
secrets.env
secrets.env.*
secrets.env~
.secrets.env.swp
.secrets.env.swo
**/#secrets.env#
!secrets.env.example
/data/
/logs/
+7
@@ -190,3 +190,10 @@ SEARXNG_INSTANCE=http://localhost:8080
# These overlays only expose the GPU devices. The slim Odysseus image
# still needs CUDA/ROCm userspace via Cookbook -> Dependencies (vLLM,
# llama-cpp-python, etc.) before models can actually serve on GPU.
# ============================================================
# Storage Paths (Docker Compose)
# ============================================================
# APP_DATA_DIR=./data
# APP_LOGS_DIR=./logs
+123
@@ -0,0 +1,123 @@
# Pull Request Review Template
Use this shape as a copyable reference for substantive PR reviews; GitHub does
not auto-apply this file to review comments. Omit sections that do not add
useful signal. Lead with confirmed findings; keep speculative notes out of the
public review unless they are framed as a concrete open question.
## Small PR Path
For narrow docs, typo, test-only, or obvious local fixes, a short review is
enough:
```md
LGTM after checking:
- scope:
- validation:
- residual risk:
```
Use the fuller structure below for larger, risky, multi-finding, or
security-sensitive reviews.
## Findings
**<sub><sub>![P2 Badge](https://img.shields.io/badge/P2-yellow?style=flat)</sub></sub> issue (test): Short issue title**
- **Problem:** Concrete broken flow, contract, input, or risk.
- **Impact:** Why this matters to users, CI, maintainers, data, security, or scale.
- **Ask:** Smallest practical correction or decision the author should make.
- **Location:** `path:line`
## Open Questions
- **question (scope, non-blocking): Short author question** Ask the concrete
intent, scope, or tradeoff question.
## Validation
- Ran:
- Not run:
- Residual risk:
## PR Hygiene
- Target/template/checks:
- Related, duplicate, or superseding context:
## No Findings Variant
```md
## Findings
none confirmed
## Validation
- Ran:
- Not run:
- Residual risk:
```
## Legend
- **Findings:** Verified, author-actionable issues that should be fixed or
consciously accepted before merge.
- **Priority badges:** The shields.io badges below are optional formatting for
priority labels. Plain `P0`, `P1`, `P2`, or `P3` text is also acceptable when
an external image dependency is undesirable or may not render.
- **P0:** `![P0 Badge](https://img.shields.io/badge/P0-red?style=flat)` -
release-blocking or actively dangerous.
- **P1:** `![P1 Badge](https://img.shields.io/badge/P1-orange?style=flat)` -
serious bug, security risk, data-loss risk, or broken primary flow.
- **P2:** `![P2 Badge](https://img.shields.io/badge/P2-yellow?style=flat)` -
meaningful correctness, test, maintainability, or edge-case issue.
- **P3:** `![P3 Badge](https://img.shields.io/badge/P3-lightgrey?style=flat)` -
minor polish or low-risk cleanup.
- **Intent labels:**
- **`issue`:** A confirmed defect, regression, broken contract, or concrete
risk.
- **`suggestion`:** A non-blocking improvement that would make the PR clearer,
safer, or easier to maintain.
- **`nit`:** A tiny, non-blocking cleanup or style note. Use it only when the
author can safely ignore it without changing the review outcome.
- **`question`:** A real author-facing clarification about intent, scope, or
tradeoffs. Do not use questions to hide an issue that should be stated
directly.
- **`LGTM`:** "Looks good to me." Use only when the review found no blocking
issues, or when any remaining notes are clearly optional.
- **Decorations:** Optional labels in parentheses that clarify the finding type,
scope, or merge impact.
- **`security`:** Auth, authorization, ownership, secrets, SSRF, injection,
unsafe external input, or other trust-boundary concerns.
- **`test`:** Missing, failing, misleading, brittle, or insufficient tests.
- **`scope`:** PR scope, feature boundaries, unrelated churn, or work that
should be split into a separate issue or PR.
- **`ci`:** CI configuration, workflow failures, flaky checks, or validation
signal quality.
- **`api`:** Route, request/response, public function, schema, persistence, or
integration contract changes.
- **`docs`:** User-facing docs, contributor docs, examples, or comments that
need to change with the code.
- **`non-blocking`:** Useful feedback that should not prevent merge by
itself.
- **Finding fields:**
- **Problem:** What is wrong, what contract is ambiguous, or what risk the PR
introduces.
- **Impact:** Why the problem matters in practical terms.
- **Ask:** The smallest concrete fix, test, or decision requested from the PR
author.
- **Location:** The most useful repo-relative file and line reference for the
finding, using `path:line`.
- **Optional sections:**
- **Open Questions:** Genuine scope or intent questions; omit when there are
no real questions.
- **Validation:** What the reviewer ran, what was intentionally not run, and
what risk remains after review.
- **PR Hygiene:** Target-branch, template, CI/check, duplicate, related-work,
or superseding-PR notes.
- **`none confirmed`:** Use only when no review-worthy findings were confirmed;
still list validation gaps or residual risk when relevant.
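The finding heading shape and the priority badges in the legend follow one fixed pattern, so they can be generated instead of hand-typed. A minimal sketch, assuming the colour mapping listed above; the helper names `priority_badge` and `finding_heading` are hypothetical and not part of this repository:

```python
# Colour per priority, copied from the legend above.
BADGE_COLORS = {"P0": "red", "P1": "orange", "P2": "yellow", "P3": "lightgrey"}


def priority_badge(priority: str) -> str:
    """Return the shields.io badge markdown for a priority label such as 'P2'."""
    color = BADGE_COLORS[priority]  # KeyError on an unknown priority is intentional
    return f"![{priority} Badge](https://img.shields.io/badge/{priority}-{color}?style=flat)"


def finding_heading(priority: str, intent: str, decoration: str, title: str) -> str:
    """Build the bold heading line used in the Findings section of the template."""
    badge = priority_badge(priority)
    return f"**<sub><sub>{badge}</sub></sub> {intent} ({decoration}): {title}**"


print(finding_heading("P2", "issue", "test", "Short issue title"))
```

When an external image dependency is undesirable, the same helper could return the plain `P2` text instead, which the legend explicitly allows.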
+6 -6
@@ -19,10 +19,10 @@ jobs:
name: Python syntax (compileall)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
# Byte-compile sources — catches syntax errors without installing deps.
@@ -32,10 +32,10 @@ jobs:
name: JS syntax (node --check)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
- uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
with:
node-version: "20"
# Syntax-check our own JS (skip vendored libs in static/lib).
@@ -54,7 +54,7 @@ jobs:
# ROADMAP "fresh install smoke tests" item; make this required once green.
continue-on-error: true
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
fetch-depth: 0
persist-credentials: false
@@ -81,7 +81,7 @@ jobs:
echo "docs_only=false" >> "$GITHUB_OUTPUT"
fi
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
if: steps.docs-check.outputs.docs_only != 'true'
with:
python-version: "3.11"
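Every `uses:` line in these workflows pins the action to a full 40-character commit SHA, with the release tag kept only as a trailing comment. A minimal sketch of a check for that convention, assuming plain-text scanning is good enough (it does not parse YAML); `unpinned_actions` is a hypothetical helper, not something this repository ships:

```python
import re

# Matches "uses: owner/repo@ref" with or without a leading list dash.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)")
# A full-length Git commit SHA: exactly 40 lowercase hex characters.
SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return 'action@ref' for every uses: line whose ref is not a full commit SHA."""
    offenders = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match and not SHA_RE.fullmatch(match.group("ref")):
            offenders.append(f"{match.group('action')}@{match.group('ref')}")
    return offenders


sample = """
    steps:
      - uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
      - uses: actions/setup-node@v4
"""
print(unpinned_actions(sample))
```

Local actions referenced by path (no `@ref`) are skipped by this pattern, which seems reasonable because they are versioned with the repository itself.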
-61
@@ -1,61 +0,0 @@
# CodeQL code scanning
#
# Purpose: GitHub's own static analysis engine reads the application source
# (Python backend + the JavaScript frontend) and looks for real
# vulnerabilities -- SQL/command injection, path traversal, auth mistakes,
# unsafe deserialization. Findings appear in the repo's Security tab. This is
# the deepest check in the suite and the most valuable for a high-profile
# target.
#
# It runs on every push to main and on a weekly schedule (to catch newly
# disclosed query patterns against unchanged code). It deliberately does NOT
# run on pull requests: most PRs here come from forks, whose read-only token
# cannot publish results, which would produce confusing failures. To scan pull
# requests too, a maintainer can instead enable CodeQL "default setup" in
# Settings -> Security -> Code scanning (one toggle, no file needed) -- see
# docs/security-ci.md.
name: CodeQL
on:
push:
branches: [main]
schedule:
# Weekly, Monday 06:00 UTC.
- cron: '0 6 * * 1'
workflow_dispatch:
permissions: {}
concurrency:
group: codeql-${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
analyze:
name: Analyze (${{ matrix.language }})
runs-on: ubuntu-latest
permissions:
contents: read
security-events: write # publish results to the Security tab
strategy:
fail-fast: false
matrix:
# Both are interpreted, so CodeQL needs no build step (build-mode none).
language: [python, javascript-typescript]
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Initialize CodeQL
uses: github/codeql-action/init@03e4368ac7daa2bd82b3e85262f3bf87ee112f57 # v3.36.0
with:
languages: ${{ matrix.language }}
build-mode: none
- name: Perform CodeQL analysis
uses: github/codeql-action/analyze@03e4368ac7daa2bd82b3e85262f3bf87ee112f57 # v3.36.0
with:
category: "/language:${{ matrix.language }}"
+1 -1
@@ -37,7 +37,7 @@ jobs:
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
+3 -3
@@ -52,7 +52,7 @@ jobs:
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
@@ -93,7 +93,7 @@ jobs:
security-events: write # upload SARIF to the Security tab
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
@@ -119,7 +119,7 @@ jobs:
TRIVY_DB_REPOSITORY: ghcr.io/aquasecurity/trivy-db:2
- name: Upload Trivy results
uses: github/codeql-action/upload-sarif@03e4368ac7daa2bd82b3e85262f3bf87ee112f57 # v3.36.0
uses: github/codeql-action/upload-sarif@8aad20d150bbac5944a9f9d289da16a4b0d87c1e # v4.36.2
with:
sarif_file: trivy-results.sarif
category: trivy-image
+2 -2
@@ -36,7 +36,7 @@ jobs:
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
@@ -55,7 +55,7 @@ jobs:
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
+2 -2
@@ -45,7 +45,7 @@ jobs:
arch: arm64
runner: ubuntu-24.04-arm
steps:
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
- uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
- name: Set up Buildx
@@ -86,7 +86,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
- uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
- name: Read APP_VERSION + short sha
@@ -14,7 +14,7 @@ jobs:
# Skip bots (Dependabot, release-drafter, etc.)
if: ${{ github.event.issue.user.type != 'Bot' }}
steps:
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
- uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
sparse-checkout: .github/scripts
persist-credentials: false
+1 -1
@@ -23,7 +23,7 @@ jobs:
# Skip bots: they open PRs programmatically and have their own process.
if: github.event.pull_request.user.type != 'Bot'
steps:
- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
- uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
ref: ${{ github.base_ref }}
sparse-checkout: .github/scripts
+1 -1
@@ -35,7 +35,7 @@ jobs:
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
# Full history so a secret committed in an earlier commit (and later
# deleted) is still caught -- deletion does not remove it from Git.
+2 -2
@@ -36,7 +36,7 @@ jobs:
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
@@ -61,7 +61,7 @@ jobs:
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
with:
persist-credentials: false
+1
@@ -86,6 +86,7 @@ Bundled in `static/fonts/`:
| [Fira Code](https://github.com/tonsky/FiraCode) | SIL Open Font License 1.1 | Nikita Prokopov & contributors |
| [Inter](https://github.com/rsms/inter) | SIL Open Font License 1.1 | Rasmus Andersson |
| [GohuFont](https://font.gohu.org/) (`fonts/custom/GohuFont.ttf`) | WTFPL | Hugo Chargois |
| [OpenDyslexic](https://opendyslexic.org/) (`fonts/OpenDyslexic-{Regular,Bold}.woff2`) | SIL Open Font License 1.1 ([`licenses/OpenDyslexic-OFL.txt`](licenses/OpenDyslexic-OFL.txt)) | Abbie Gonzalez |
## Python dependencies
+1 -1
@@ -37,7 +37,7 @@ Manual development uses Python 3.11+:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m uvicorn app:app --host 0.0.0.0 --port 7000
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
Windows is not actively tested. Docker on Linux or a Linux/macOS manual install is the safer path for now.
+48 -1
@@ -1,4 +1,15 @@
FROM python:3.12-slim
# ---- builder: patch + build wheels for Real-ESRGAN's broken-on-3.14 deps ----
# basicsr/gfpgan/facexlib read their version via exec()+locals()['__version__'],
# which raises KeyError on Python 3.13+ (PEP 667). Build patched wheels here so
# the final image / Cookbook never has to compile the broken sdists. See
# docker/build-realesrgan-wheels.sh for the full rationale.
FROM python:3.14-slim AS realesrgan-wheels
RUN apt-get update && apt-get install -y --no-install-recommends curl \
&& rm -rf /var/lib/apt/lists/*
COPY docker/build-realesrgan-wheels.sh /usr/local/bin/build-realesrgan-wheels.sh
RUN bash /usr/local/bin/build-realesrgan-wheels.sh /wheels
FROM python:3.14-slim
# System deps. tmux is required by Cookbook for background downloads/serves.
# openssh-client is required for Cookbook remote server tests, setup, probes,
@@ -18,8 +29,35 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
tmux \
openssh-client \
gosu \
libgl1 \
libglib2.0-0t64 \
libxcb1 \
&& rm -rf /var/lib/apt/lists/*
# libgl1/libglib2.0-0t64/libxcb1 are runtime shared libs (libGL.so.1,
# libglib-2.0/libgthread, libxcb.so.1) that opencv-python (cv2) loads. The
# slim base omits them, so the Cookbook "install realesrgan" path imports cv2
# and dies with `libxcb.so.1: cannot open shared object file` despite a clean
# pip install. Using full opencv-python (not -headless) because basicsr/gfpgan/
# facexlib/realesrgan all depend on the `opencv-python` distribution by name.
# Docker CLI (client only — daemon stays on the host via the
# /var/run/docker.sock mount). The Debian `docker.io` package ships
# dockerd but not the client binary on slim, so grab the static client
# tarball from download.docker.com instead.
ARG DOCKER_CLI_VERSION=27.5.1
RUN ARCH="$(dpkg --print-architecture)" \
&& case "$ARCH" in \
amd64) DARCH=x86_64 ;; \
arm64) DARCH=aarch64 ;; \
*) echo "unsupported arch $ARCH"; exit 1 ;; \
esac \
&& curl -fsSL "https://download.docker.com/linux/static/stable/${DARCH}/docker-${DOCKER_CLI_VERSION}.tgz" \
-o /tmp/docker.tgz \
&& tar -xzf /tmp/docker.tgz -C /tmp \
&& install -m 0755 /tmp/docker/docker /usr/local/bin/docker \
&& rm -rf /tmp/docker /tmp/docker.tgz
WORKDIR /app
# Install Python deps first (layer cache). Optional extras (PyMuPDF AGPL, etc.)
@@ -29,6 +67,15 @@ COPY requirements.txt requirements-optional.txt ./
RUN pip install --no-cache-dir -r requirements.txt \
&& if [ "$INSTALL_OPTIONAL" = "true" ]; then pip install --no-cache-dir -r requirements-optional.txt; fi
# Pre-install the patched basicsr/gfpgan/facexlib wheels built in the
# realesrgan-wheels stage (--no-deps keeps the image lean — torch & friends are
# pulled only when realesrgan is actually installed). With these dists already
# satisfied, the Cookbook's plain `pip install realesrgan` resolves them from
# wheels instead of rebuilding the sdists that fail on Python 3.14.
COPY --from=realesrgan-wheels /wheels/ /tmp/odysseus-wheels/
RUN pip install --no-cache-dir --no-deps /tmp/odysseus-wheels/*.whl \
&& rm -rf /tmp/odysseus-wheels
# Copy app code
COPY . .
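The builder-stage comment in this Dockerfile describes a version lookup that relies on `exec()` writing into the function's `locals()`. A minimal sketch of that pattern and of a namespace-based alternative follows. Both function names are hypothetical, and the actual patch applied by `docker/build-realesrgan-wheels.sh` is not shown in this diff, so this only illustrates the failure mode:

```python
def read_version_fragile(source: str) -> str:
    """The pattern the comment describes: exec() into implicit function locals.

    On Python 3.13+ (PEP 667) each locals() call returns an independent
    snapshot, so the name assigned by exec() is not visible in the second
    call and the lookup raises KeyError. Older interpreters shared one dict,
    which is why the pattern used to work.
    """
    exec(source)
    return locals()["__version__"]


def read_version(source: str) -> str:
    """Version-independent alternative: exec() into an explicit namespace dict."""
    namespace: dict[str, object] = {}
    exec(source, namespace)
    return str(namespace["__version__"])


print(read_version("__version__ = '1.4.2'\n"))  # prints 1.4.2
```

Passing an explicit dict keeps the lookup independent of how the interpreter implements function locals, so the same code behaves identically before and after PEP 667.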
+45
@@ -0,0 +1,45 @@
# -*- mode: python ; coding: utf-8 -*-
a = Analysis(
['launcher.py'],
pathex=[],
binaries=[],
datas=[('static', 'static'), ('scripts', 'scripts'), ('mcp_servers', 'mcp_servers'), ('services/hwfit/data', 'services/hwfit/data'), ('config', 'config'), ('.env.example', '.env.example')],
hiddenimports=[],
hookspath=[],
hooksconfig={},
runtime_hooks=[],
excludes=[],
noarchive=False,
optimize=0,
)
pyz = PYZ(a.pure)
exe = EXE(
pyz,
a.scripts,
[],
exclude_binaries=True,
name='Odysseus',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
console=False,
disable_windowed_traceback=False,
argv_emulation=False,
target_arch=None,
codesign_identity=None,
entitlements_file=None,
icon=['static\\icon.ico'],
)
coll = COLLECT(
exe,
a.binaries,
a.datas,
strip=False,
upx=True,
upx_exclude=[],
name='Odysseus',
)
+38 -458
@@ -1,471 +1,65 @@
# Odysseus
<p align="center">
<img src="docs/odysseus-wordmark.png" alt="Odysseus" width="238">
</p>
> **Branch note:** `dev` is the default branch and contains the latest development changes, but it may be unstable. For the more stable curated branch, use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main).
<p align="center">
A self-hosted AI workspace for chat, agents, research, documents, email, notes, calendar, and local model workflows.
</p>
```
───────────────────────────────────────────────
⊹ ࣪ ˖ ૮( ˶ᵔ ᵕ ᵔ˶ )っ Odysseus vers. 1.0
───────────────────────────────────────────────
```
<p align="center">
<a href="#quick-start">Quick Start</a> ·
<a href="docs/setup.md">Setup Guide</a> ·
<a href="CONTRIBUTING.md">Contributing</a> ·
<a href="ROADMAP.md">Roadmap</a>
</p>
![Odysseus](docs/odysseus.jpg)
<p align="center">
<a href="https://repology.org/project/odysseus-ai/versions"><img src="https://repology.org/badge/vertical-allrepos/odysseus-ai.svg" alt="Packaging status"></a>
</p>
A self-hosted AI workspace -- meant to be the self-hosted version of the UI experience you get from ChatGPT and Claude. But with more jank and fun. Running on your own hardware, with your own data -- local-first, privacy-first, and no trojan.
<p align="center">
<img src="docs/odysseus-browser.jpg" alt="Odysseus interface">
</p>
[![Packaging status](https://repology.org/badge/vertical-allrepos/odysseus-ai.svg)](https://repology.org/project/odysseus-ai/versions)
## Features
- **Chat** -- chat with any local model or API; adding them is super simple.<br> <sub>vLLM · llama.cpp · Ollama · OpenRouter · OpenAI · GitHub Copilot</sub>
- **Agent** -- hand it tools and let it run the whole task itself.<br> <sub>built on [opencode](https://github.com/anomalyco/opencode) · MCP · web · files · shell · skills · memory</sub>
- **Cookbook** -- scans your hardware, recommends models, then lets you click to download and serve. Easy!<br> <sub>built on [llmfit](https://github.com/AlexsJones/llmfit) · VRAM-aware · GGUF / FP8 / AWQ · fit scoring · vLLM / llama.cpp serving</sub>
- **Deep Research** -- multi-step runs that gather, read, and synthesize sources into a nice visual report.<br> <sub>adapted from [Tongyi DeepResearch](https://github.com/Alibaba-NLP/DeepResearch)</sub>
- **Compare** -- a fun tool to compare models side by side. Test completely blind, no bias!<br> <sub>multi-model · blind test · synthesis</sub>
- **Documents** -- YOU write the text, AI is there to assist, not the opposite.<br> <sub>multi-tab editor · markdown · HTML · CSV · syntax highlighting · AI edits · suggestions</sub>
- **Memory / Skills** -- Persistent memory and skills, your agent evolves over time as it better understands you and your tasks!<br> <sub>ChromaDB · fastembed (ONNX) · vector + keyword retrieval · import/export</sub>
- **Email** -- IMAP/SMTP inbox with AI triage built in: urgency reminders, auto-tag, auto-summary, auto-reply drafts, auto-spam.<br> <sub>IMAP · SMTP · per-account routing · CalDAV-aware</sub>
- **Notes & Tasks** -- Quick notes with reminders, a todo list, and scheduled tasks the agent can act on.<br> <sub>note pings · checklist · cron-style tasks · ntfy / browser / email channels</sub>
- **Calendar** -- Local-first calendar with CalDAV sync to Radicale / Nextcloud / Apple / Fastmail.<br> <sub>CalDAV pull · .ics import/export · per-calendar colors · agent-aware</sub>
- **Works on mobile** -- looks and runs great on your phone, not just desktop.<br> <sub>responsive · installable (PWA) · touch gestures</sub>
- **Extras** -- more to explore, happy if you give it a go!<br> <sub>image editor · theme editor · file uploads (vision + PDF) · web search · presets · sessions · 2FA</sub>
## Demo
A full, hover-to-play tour lives on the landing page (`docs/index.html`).
<details>
<summary>Screenshots / clips</summary>
### Chat & Agents
![Chat & Agents](docs/chat.gif)
### Deep Research
![Deep Research](docs/research.gif)
### Compare
![Compare](docs/compare.gif)
### Documents
![Documents](docs/document.gif)
### Notes & Tasks
![Notes & Tasks](docs/notes.gif)
</details>
---
## Quick Start
Defaults work out of the box: clone, run, then configure models/search/email
inside **Settings**. Only edit `.env` for deployment-level overrides like
`APP_BIND`, `APP_PORT`, `AUTH_ENABLED`, `DATABASE_URL`, or a pre-seeded admin password.
> `dev` is the default branch and gets the newest changes first. Use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main) if you want the more curated branch.
On first setup, Odysseus creates an admin account (`admin` unless
`ODYSSEUS_ADMIN_USER` is set) and prints a temporary password in the terminal.
For Docker installs, the same line is in `docker compose logs odysseus`.
Use that for the first login, then change it in **Settings**.
Contributing? See [CONTRIBUTING.md](CONTRIBUTING.md) for setup, testing, and
pull request guidelines.
### Docker (recommended)
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
cp .env.example .env # optional, but recommended for explicit defaults
cp .env.example .env
docker compose up -d --build
```
To include optional extras in the image (PDF viewer, Office extraction; includes AGPL PyMuPDF), build with `docker compose build --build-arg INSTALL_OPTIONAL=true` before `up`.
Open `http://localhost:7000` when the containers are healthy. Docker Compose
binds the web UI to `127.0.0.1` by default. If the port is taken, set
`APP_PORT=7001` in `.env` and recreate the container. Set `APP_BIND=0.0.0.0`
only when you intentionally want LAN/reverse-proxy access.
Open `http://localhost:7000` when the containers are healthy. The first admin password is printed in `docker compose logs odysseus`.
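Before starting the stack it can help to confirm that the web UI port is not already taken on loopback. A minimal sketch, assuming the default port `7000` and bind address `127.0.0.1` mentioned in this README; `port_is_free` is a hypothetical helper, not part of Odysseus:

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True


if __name__ == "__main__":
    if port_is_free(7000):
        print("port 7000 is free")
    else:
        print("port 7000 is busy; consider another APP_PORT in .env")
```

The check is only a snapshot: another process could still claim the port between the check and `docker compose up`.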
> **On Apple Silicon (M-series) Macs:** Docker can't reach the Metal GPU, so
> Cookbook serves local models on CPU only. For GPU-accelerated model serving,
> run natively instead — see [Apple Silicon](#apple-silicon) below.
Native installs, GPU notes, Windows/macOS instructions, HTTPS, and configuration live in the [setup guide](docs/setup.md).
### Native Linux / macOS
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
Requirements: Python 3.11+. Cookbook also needs `tmux` for background model
downloads and serves. The app itself is lightweight; local model serving is the
heavy part and depends on the model, runtime, GPU, and VRAM, so small hosts can
connect to API or remote model servers instead. Use `--host 0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
## Features
### Apple Silicon
Docker on macOS cannot use the Metal GPU. For GPU-accelerated Cookbook on an
M-series Mac, run Odysseus natively:
- **Chat + Agents** — local/API models, tools, MCP, files, shell, skills, and memory.
- **Cookbook** — hardware-aware model recommendations, downloads, and serving.
- **Deep Research** — multi-step web research with source reading and report generation.
- **Compare** — blind side-by-side model testing and synthesis.
- **Documents** — writing-first editor with AI edits, suggestions, Markdown, HTML, CSV, and syntax highlighting.
- **Email** — IMAP/SMTP inbox with triage, tags, summaries, reminders, and reply drafts.
- **Notes, Tasks + Calendar** — reminders, todos, scheduled agent tasks, and CalDAV sync.
- **Extras** — gallery/image editor, themes, uploads, web search, presets, sessions, and 2FA.
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
./start-macos.sh
```
## Demo
It launches at `http://127.0.0.1:7860`. To expose it to your phone over a trusted LAN/VPN such as Tailscale, bind all interfaces:
```bash
ODYSSEUS_HOST=0.0.0.0 ./start-macos.sh
# then open http://<tailscale-ip>:7860
```
The script also reads `.env` at startup, so `APP_BIND=0.0.0.0` and `APP_PORT`
set there are picked up automatically without a command-line override each run.
Keep `AUTH_ENABLED=true` (the default) before binding outside loopback. Do not
expose this port directly to the public internet. To build a clickable app wrapper:
```bash
./build-macos-app.sh
```
<details>
<summary>Cookbook, GPU, Ollama, and troubleshooting notes</summary>
**Docker bundled services.** Compose starts Odysseus, ChromaDB, SearXNG, and
ntfy. Odysseus and the bundled service ports bind to `127.0.0.1` by default, so
they are reachable from the host but not exposed to your LAN/public internet
unless you opt in.
**Cookbook storage in Docker.** Downloads live in `./data/huggingface`
(`~/.cache/huggingface` in the container). Cookbook-installed Python CLIs and
serve engines live in `./data/local` (`~/.local` in the container), so they
survive container recreation.
**Remote servers.** In **Cookbook -> Settings -> Servers**, generate the
Odysseus SSH key and add the public key to the remote server's
`~/.ssh/authorized_keys`. From the host you can also run:
```bash
ssh-copy-id -i data/ssh/id_ed25519.pub user@server
```
**Docker GPU overlays.** CPU-only users can skip this section. Cookbook can
only detect GPUs that Docker exposes to the container — if the host runtime or
device passthrough is not configured, Cookbook sees the iGPU, another card, or
CPU instead of your intended GPU.
For NVIDIA, `scripts/check-docker-gpu.sh` diagnoses GPU passthrough and can
optionally install the host runtime or update `.env`.
```bash
# Read-only diagnostic (default — installs nothing, never edits .env):
scripts/check-docker-gpu.sh
# Print OS-specific install commands without running them:
scripts/check-docker-gpu.sh --print-install-commands
# Install NVIDIA Container Toolkit on Ubuntu/Debian (requires sudo):
scripts/check-docker-gpu.sh --install-nvidia-toolkit
# Write COMPOSE_FILE to .env (only when GPU passthrough is confirmed working):
scripts/check-docker-gpu.sh --enable-nvidia-overlay
# Full assisted setup — install toolkit, then enable overlay if passthrough works:
scripts/check-docker-gpu.sh --install-nvidia-toolkit --enable-nvidia-overlay
```
Safety notes:
- The app never installs host GPU runtime automatically.
- The app never edits `.env` automatically.
- `.env` is only modified when `--enable-nvidia-overlay` is explicitly passed,
and only after GPU passthrough succeeds. `--yes` skips prompts but does not
bypass the passthrough gate.
- `.env.bak.*` backups created by `--enable-nvidia-overlay` are ignored by
Git and the Docker build context.
To enable manually without the script, add this to `.env`:
```bash
COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml
```
**AMD / ROCm.** AMD setup is read-only diagnostic plus manual `.env` edit. Run:
```bash
scripts/check-docker-amd-gpu.sh
```
Then add the reported values to `.env`, replacing `RENDER_GID` with your host's
numeric render group id:
```bash
COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml
RENDER_GID=989
```
For NVIDIA/AMD GPU support, also read the comments in the selected overlay file: `docker/gpu.nvidia.yml` or `docker/gpu.amd.yml`.
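`COMPOSE_FILE` is a list of Compose files joined by a path separator (`:` on Linux and macOS, `;` on Windows). A minimal sketch that splits the value and reports files that do not exist, which can catch a typo in `.env` before Compose does; both helper names are hypothetical:

```python
import os
from pathlib import Path


def compose_files(value: str, sep: str = os.pathsep) -> list[str]:
    """Split a COMPOSE_FILE value into its individual file paths."""
    return [part for part in value.split(sep) if part]


def missing_compose_files(value: str, root: str = ".", sep: str = os.pathsep) -> list[str]:
    """Return the listed files that are not present under the project root."""
    return [name for name in compose_files(value, sep) if not Path(root, name).is_file()]


if __name__ == "__main__":
    value = "docker-compose.yml:docker/gpu.nvidia.yml"
    print(compose_files(value, sep=":"))
    print(missing_compose_files(value, root=".", sep=":"))
```

The separator is passed explicitly in the usage lines so the example behaves the same on every platform.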
**Stack-management UIs (Portainer, Coolify, Dockhand, etc.).** These tools
often accept only a single Compose file and do not reliably honor `COMPOSE_FILE`
or multiple `-f` overlays. CLI users should keep using the `COMPOSE_FILE`
overlay workflow above. For stack UIs, point the stack at one of the standalone
files instead, which bundle the base stack plus the GPU settings:
- `docker-compose.gpu-nvidia.yml` — still requires the NVIDIA Container Toolkit
on the host.
- `docker-compose.gpu-amd.yml` — still requires host ROCm/kfd/DRI setup, the
`video`/`render` group membership, and `RENDER_GID` when needed.
The base `docker-compose.yml` plus the `docker/gpu.*.yml` overlays remain the
source of truth; the standalone files mirror them for single-file deployments.
Verify after enabling either overlay:
```bash
docker compose exec odysseus nvidia-smi -L # NVIDIA
docker compose exec odysseus sh -lc 'test -e /dev/kfd && test -d /dev/dri && ls -l /dev/kfd /dev/dri/renderD*' # AMD
```
> **GPU passthrough ≠ llama.cpp CUDA.** `nvidia-smi` passing inside the
> container confirms Docker GPU access, but llama.cpp also needs `cudart` and
> the CUDA Toolkit at runtime. If Cookbook logs show `Unable to find cudart
> library`, `Could NOT find CUDAToolkit`, `CUDA Toolkit not found`, or
> tensors/layers assigned to CPU, that is a Cookbook/llama.cpp build issue —
> not a Docker passthrough failure. Reinstall the serve engine via
> **Cookbook → Dependencies** to get a CUDA-enabled build.
>
> The same split applies to AMD/ROCm: seeing `/dev/kfd` and `/dev/dri` inside
> the container confirms device passthrough, not ROCm userspace or a
> ROCm-enabled vLLM/llama.cpp build. `rocm-smi` and `rocminfo` are not expected
> inside the slim Odysseus image.
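
The AMD verification command above checks for `/dev/kfd` and at least one render node under `/dev/dri`. A minimal Python sketch of the same check, for use from a script inside the container; `amd_passthrough_visible` is a hypothetical helper, and like the shell command it says nothing about ROCm userspace:

```python
from pathlib import Path


def amd_passthrough_visible(dev_root: str = "/dev") -> bool:
    """Return True if the kfd device and at least one DRI render node are visible."""
    dev = Path(dev_root)
    has_kfd = (dev / "kfd").exists()
    dri = dev / "dri"
    # Render nodes are named renderD128, renderD129, and so on.
    has_render_node = dri.is_dir() and any(dri.glob("renderD*"))
    return has_kfd and has_render_node


if __name__ == "__main__":
    print("AMD device passthrough visible:", amd_passthrough_visible())
```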
**Ollama with Docker.** If Ollama runs on the host, add this endpoint in
Settings:
```text
http://host.docker.internal:11434/v1
```
Ollama must listen outside its own loopback interface:
```bash
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
This connects Odysseus in Docker to an Ollama server that is already running on
your host machine; it does not start Ollama inside the container.
`host.docker.internal` is Docker's hostname for the host machine from inside the
container. Cookbook **Serve** is a separate workflow for serving downloaded
models through Odysseus/llama.cpp, so Windows users with an existing Ollama
install usually only need to add the endpoint in Settings.
**Useful checks.**
```bash
docker compose ps
docker compose logs --tail=120 odysseus
docker compose logs odysseus | grep -E 'ChromaDB|MemoryVectorStore|DEGRADED'
```
**macOS details.** `start-macos.sh` installs Homebrew deps, creates the venv,
runs setup, and starts uvicorn on port `7860` because AirPlay often holds
`7000`. It uses llama.cpp/Ollama for Metal. vLLM/SGLang are CUDA/ROCm-only and
do not run on macOS. MLX-only models are not served by Odysseus.
</details>
### Native Windows
**One-command launcher** (creates the venv, installs deps, runs setup, starts the
server; safe to re-run):
```powershell
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1
```
Or do it by hand:
```powershell
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
py -3.11 -m venv venv
venv\Scripts\Activate.ps1
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
If `python` points at an older interpreter, use `py -3.12` (or another installed
3.11+ version) for the venv step.
**Requirements:** Python 3.11+. The core app (chat, agent, memory, documents,
email, calendar, deep research) runs fully native. For full **Cookbook** background
model downloads and the agent shell tool, also install
[Git for Windows](https://git-scm.com/download/win) (provides `bash.exe`).
Local GPU *serving* of vLLM/SGLang needs Linux/WSL2; for a local model on Windows,
[Ollama](https://ollama.com/download) is the easiest path — point Odysseus at
`http://localhost:11434/v1` in Settings.
Open `http://localhost:7000`, log in with the generated admin password,
and configure everything else inside **Settings**.
## Troubleshooting & Advanced Setup
### `chromadb-client` conflicts with embedded ChromaDB
If `chromadb-client` (the lightweight HTTP-only package) is installed alongside the full `chromadb` package, Odysseus starts but ChromaDB silently falls back to HTTP-only mode and fails.
**Fix:** uninstall `chromadb-client` and force-reinstall the full package:
```bash
./venv/bin/pip uninstall chromadb-client -y
./venv/bin/pip install --force-reinstall chromadb
```
### HTTPS + LAN/Tailscale exposure
To expose Odysseus on a local network or Tailscale with HTTPS:
1. Change the bind address to `0.0.0.0` in `.env` (`APP_BIND=0.0.0.0` or `ODYSSEUS_HOST=0.0.0.0`).
2. Generate a locally-trusted cert for your LAN/Tailscale IPs using [mkcert](https://github.com/FiloSottile/mkcert):
```bash
mkcert -install
mkcert -cert-file cert.pem -key-file key.pem 192.168.1.100 tailscale-ip
```
3. Run `uvicorn` with the generated certs:
```bash
python -m uvicorn app:app --host 0.0.0.0 --port 7000 --ssl-certfile=cert.pem --ssl-keyfile=key.pem
```
4. Install the `mkcert` CA on any other device you want to access Odysseus from (e.g., for iOS, email the `rootCA.pem` to yourself, install the profile, and trust it in Certificate Trust Settings).
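Step 3 can also be driven from a small Python launcher instead of the command line. This is a minimal sketch, not the project's own launcher: `tls_run_kwargs` is a hypothetical helper, and the file names simply mirror the `mkcert` command above. It checks that both files exist before handing them to uvicorn, so a wrong path is caught with a clear error up front.

```python
from pathlib import Path


def tls_run_kwargs(cert: str = "cert.pem", key: str = "key.pem",
                   host: str = "0.0.0.0", port: int = 7000) -> dict:
    """Assemble uvicorn.run keyword arguments, failing early on missing files."""
    for path in (cert, key):
        if not Path(path).is_file():
            raise FileNotFoundError(f"TLS file not found: {path}")
    return {"host": host, "port": port, "ssl_certfile": cert, "ssl_keyfile": key}


# Usage (requires uvicorn and the app module from this repo):
#   import uvicorn
#   uvicorn.run("app:app", **tls_run_kwargs())
```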
### Optional Dependencies
`requirements-optional.txt` contains packages that unlock extra features. It is not installed by default.
| Package | Feature unlocked |
|---------|-----------------|
| `faster-whisper` | Local speech-to-text (microphone -> text) via the "local" STT provider. |
| `ddgs` | DuckDuckGo as a search provider option. |
| `PyMuPDF` | PDF page rendering in the side viewer panel and form-filling. (Note: AGPL-3.0) |
| `markitdown` | Office/EPUB document text extraction (converts .docx/.xlsx/.pptx/.xls/.epub to Markdown). |
### Faster, reproducible installs with uv (optional)
[uv](https://docs.astral.sh/uv/) works as a drop-in replacement for the
venv + pip steps in the native install guides. No project changes are needed; you get faster installs and, if you want one, a lockfile for reproducible environments. After [installing `uv`](https://docs.astral.sh/uv/getting-started/installation/), use:
```bash
uv venv venv --python 3.13
uv pip install -r requirements.txt
# then continue as usual: python setup.py, uvicorn, ...
```
`requirements.txt` is intentionally unpinned, so two installs at different times can produce different package versions. If you want a reproducible environment (e.g. across your own machines, or to roll back after a bad upgrade), snapshot and restore exact versions with:
```bash
uv pip compile requirements.txt -o requirements.lock # snapshot current resolution
uv pip sync requirements.lock # reproduce it exactly later
```
`requirements.lock` is gitignored and platform-specific (compile it on the OS you deploy to). Regenerate it deliberately when you want to take upgrades. The plain `uv pip install -r requirements.txt` keeps following the unpinned requirements like pip does.
### Outlook / Office 365 email
Odysseus email accounts currently use IMAP/SMTP username-password auth. Outlook
and Microsoft 365 generally require OAuth instead, so normal Microsoft mailbox
passwords will fail. See [docs/email-outlook.md](docs/email-outlook.md) for the
current limitation and the planned integration direction.
## Security Notes
Odysseus is a self-hosted workspace with powerful local tools: shell access, file uploads, model downloads, web research, email/calendar integrations, and API tokens. Treat it like an admin console.
- Keep `AUTH_ENABLED=true` for any network-accessible deployment.
- Keep `LOCALHOST_BYPASS=false` outside local development.
- Use `SECURE_COOKIES=true` when Odysseus is served through HTTPS by a trusted reverse proxy or private access gateway.
- Do not expose it directly to the public internet without HTTPS and a trusted reverse proxy or private access layer.
- Keep `.env`, `data/`, `logs/`, databases, uploads, generated media, backups, auth/session files, API keys, and model/provider tokens out of Git and private shares. They are ignored by default.
- Review `data/auth.json` after first boot: disable open signup unless you intentionally want it, make only your own account admin, and keep demo/test accounts non-admin.
- Non-admin users do not get shell/Python/file read/write by default, and admin-only routes/tools such as MCP management, API tokens, webhooks, model/cookbook serving, backup/vault, and app settings are admin-gated. Other features are controlled by per-user privileges, so review each user's privileges before exposing a deployment.
- Rotate any API keys or tokens that were ever pasted into a shared chat, demo, screenshot, or log.
- If you enable API tokens or webhooks, create separate tokens per integration and delete unused ones.
- Prefer binding manual development runs to `127.0.0.1`; bind to `0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
- Keep ChromaDB, SearXNG, ntfy, Ollama, vLLM, llama.cpp, databases, and raw model/provider APIs internal-only. Expose only the authenticated Odysseus web/API entrypoint through your trusted proxy or private access layer.
- Before publishing a fork, run `git status --short` and confirm no private files from `.env`, `data/`, `logs/`, uploads, backups, or local databases are staged.
### Private or proxied deployments
Odysseus serves plain HTTP on its app port. Docker Compose binds Odysseus and the bundled services to `127.0.0.1` by default, so a typical production/private setup is:
1. Keep Odysseus on localhost, for example `127.0.0.1:7000`.
2. Terminate HTTPS at a trusted reverse proxy or private access gateway.
3. Put the authenticated Odysseus web/API entrypoint behind that layer.
4. Keep raw service and model ports internal-only.
Cloudflare Access, Tailscale, Caddy, nginx, and Traefik can all fit this pattern; none are required by Odysseus. If your access layer reaches Odysseus on the same host, proxy to `http://127.0.0.1:7000` and keep `AUTH_ENABLED=true`, `LOCALHOST_BYPASS=false`, and `SECURE_COOKIES=true`.
`ALLOWED_ORIGINS` lists exact permitted origins for cross-origin browser/API clients; ordinary same-origin reverse-proxy access usually does not need a special CORS entry.
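To make "exact permitted origins" concrete, here is a minimal sketch of exact-match origin checking. It is not necessarily how Odysseus parses the variable: both function names are hypothetical, the trimming of whitespace and trailing slashes is an assumption made for illustration, and `odysseus.example.com` is a placeholder host.

```python
def parse_allowed_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, trimming blanks and trailing slashes."""
    origins: list[str] = []
    for part in raw.split(","):
        origin = part.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins


def is_allowed(origin: str, allowed: list[str]) -> bool:
    """Exact match only: scheme, host, and port must all agree."""
    return origin in allowed


# Usage:
#   allowed = parse_allowed_origins("http://localhost,https://odysseus.example.com")
#   is_allowed("https://odysseus.example.com", allowed)
```

Because matching is exact, `https://odysseus.example.com` and `https://odysseus.example.com:8443` count as different origins and each needs its own entry.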
Common internal-only ports from the default docs/compose setup:
| Port | Service |
|---|---|
| `7000` | Odysseus raw app port |
| `8080` | SearXNG |
| `8091` | ntfy |
| `8100` | ChromaDB host port for manual/compose access |
| `11434` | Ollama |
| `8000-8020` | Common local model/provider APIs |
A full hover-to-play tour lives on the landing page: [`docs/index.html`](docs/index.html).
## Contributing
Help is welcome. The best entry points are fresh-install testing, provider setup
bugs, mobile/editor polish, docs, and small focused refactors. See
[ROADMAP.md](ROADMAP.md) for the current help-wanted list.
## Configuration
Most setup is done inside the app with `/setup` or **Settings**. Use `.env`
for deployment-level defaults and secrets you want present before first boot.
Key settings:
Help is welcome. The best entry points are fresh-install testing, provider setup bugs, mobile/editor polish, docs, and small focused refactors. See [CONTRIBUTING.md](CONTRIBUTING.md) and [ROADMAP.md](ROADMAP.md).
| Variable | Default | Description |
|---|---|---|
| `LLM_HOST` | `localhost` | Your LLM server (e.g. `llm-host.local:8000`) |
| `LLM_HOSTS` | -- | Comma-separated list for model discovery |
| `OPENAI_API_KEY` | -- | Optional OpenAI key. Prefer adding providers in the app unless pre-seeding. |
| `SEARXNG_INSTANCE` | `http://localhost:8080` | SearXNG URL. Docker overrides this to `http://searxng:8080`. |
| `SEARXNG_SECRET` | generated on first Docker boot | Optional SearXNG cookie/CSRF secret. Leave blank unless you need to pin it. |
| `APP_BIND` | `127.0.0.1` | Docker Compose host bind address for the web UI. Use `0.0.0.0` only for intentional LAN/reverse-proxy access. |
| `APP_PORT` | `7000` | Docker Compose host port for the web UI. |
| `AUTH_ENABLED` | `true` | Enable/disable login |
| `LOCALHOST_BYPASS` | `false` | Development-only auth bypass for loopback requests. Keep false for shared/network deployments. |
| `ALLOWED_ORIGINS` | `http://localhost,http://127.0.0.1` | Comma-separated exact permitted origins for cross-origin browser/API clients. |
| `SECURE_COOKIES` | `false` | Set true when serving Odysseus through HTTPS at a trusted proxy or private access gateway. |
| `DATABASE_URL` | `sqlite:///./data/app.db` | Database connection string |
| `CHROMADB_HOST` | `localhost` | ChromaDB host for vector memory. Docker overrides this to `chromadb`. |
| `CHROMADB_PORT` | `8100` | ChromaDB port for manual host runs. Docker overrides this to `8000`. |
| `EMBEDDING_URL` | -- | OpenAI-compatible embeddings endpoint |
| `ODYSSEUS_CHAT_UPLOAD_MAX_BYTES` | `10485760` | Chat/agent attachment cap in bytes (10 MB). Raise for larger local PDFs or text documents. |
| `ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES` | `104857600` | Gallery image upload cap in bytes (100 MB). |
| `ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES` | `26214400` | Gallery transform input cap in bytes (25 MB). |
| `ODYSSEUS_MEMORY_IMPORT_MAX_BYTES` | `10485760` | Memory import file cap in bytes (10 MB). |
| `ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES` | `26214400` | Personal document upload cap in bytes (25 MB). |
| `ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES` | `26214400` | Email compose attachment cap in bytes (25 MB). |
| `ODYSSEUS_STT_MAX_AUDIO_BYTES` | `26214400` | Speech-to-text audio cap in bytes (25 MB). |
| `ODYSSEUS_ICS_MAX_BYTES` | `10485760` | Calendar `.ics` import cap in bytes (10 MB). |
## Security
All upload-limit vars are validated (must be a positive integer) and optional; an invalid value fails fast at startup.
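The validation rule can be pictured with a short sketch. This is an illustration under stated assumptions, not the project's actual code: `read_upload_limit` is a hypothetical helper, and treating an empty value the same as an unset one is an assumption.

```python
import os


def read_upload_limit(name: str, default: int, env=os.environ) -> int:
    """Return the byte limit for `name`, or raise ValueError on a bad value."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default  # the variable is optional (empty == unset is assumed)
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


# Usage: resolve every limit once at startup so a typo stops the boot
# instead of surfacing later as a failed upload.
#   chat_cap = read_upload_limit("ODYSSEUS_CHAT_UPLOAD_MAX_BYTES", 10 * 1024 * 1024)
```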
### Built-in MCP servers (optional setup)
Odysseus auto-registers a few built-in MCP servers at startup. The npx-based ones (currently the browser server, `@playwright/mcp`) only start when their npm package is already in the local npx cache. If a package isn't cached, that server is skipped with a startup log message explaining what to do, so a fresh install does not block on a multi-minute npm download or hang if Playwright system deps are missing.
To enable the browser MCP (page navigation, screenshots, vision), run once:
```bash
npx -y @playwright/mcp@latest --version
```
That installs `@playwright/mcp` plus Playwright (~300MB total). Restart Odysseus and the server will register at startup.
## Architecture
```
app.py # FastAPI entry point
core/ auth, database, middleware, constants
src/ llm_core, agent_loop, agent_tools, chat_processor, search/
routes/ chat, session, document, memory, model … endpoints
services/ docs, memory, search, hwfit (Cookbook) …
static/ index.html + app.js + style.css + js/ (modular front-end)
docs/ landing page (index.html) + preview clips
```
## Data
All user data lives in `data/` (gitignored): `app.db` (sessions, messages, documents),
`memory.json`, `presets.json`, `uploads/`, `personal_docs/`, `chroma/`, `settings.json`.
Odysseus is a self-hosted workspace with powerful local tools. Keep auth enabled, keep private data out of Git, and do not expose raw model/service ports publicly. Deployment details are in the [setup guide](docs/setup.md#security-notes).
## Star History
@@ -478,19 +72,5 @@ All user data lives in `data/` (gitignored): `app.db` (sessions, messages, docum
</a>
## License
AGPL-3.0-or-later -- see [LICENSE](LICENSE) and [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md).
```
|
|||
|||||
| | | |||||||
)_) )_) )_) ~|~
)___))___))___)\ |
)____)____)_____)\\|
_____|____|____|_____\\\__
\ /
~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
~^~ all aboard! ~^~
~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
```
+46 -30
View File
@@ -1,6 +1,17 @@
# app.py — slim orchestrator
import mimetypes
import os
import sys
import asyncio
# On Windows, asyncio.create_subprocess_exec/shell require the ProactorEventLoop.
# When started via `python -m uvicorn` from a terminal, uvicorn sets this
# automatically. But the VS Code debugger (and other non-uvicorn entrypoints)
# use the default SelectorEventLoop, which raises NotImplementedError on any
# subprocess call. Force ProactorEventLoop here so the right loop is always
# used, regardless of how the process is launched.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
def register_static_mime_types() -> None:
@@ -38,12 +49,12 @@ load_dotenv(encoding="utf-8-sig")
import asyncio
import logging
import secrets
from datetime import datetime
from datetime import datetime, timezone
from typing import Dict
from contextlib import asynccontextmanager
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse, FileResponse, HTMLResponse
from fastapi.responses import JSONResponse, FileResponse
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from starlette.middleware.base import BaseHTTPMiddleware
@@ -64,7 +75,7 @@ from core.exceptions import (
import bcrypt as _bcrypt
from src.app_helpers import abs_join
from src.app_helpers import abs_join, serve_html_with_nonce
from src.generated_images import GENERATED_IMAGE_HEADERS, resolve_generated_image_path
from starlette.responses import RedirectResponse
@@ -113,12 +124,13 @@ app = FastAPI(
)
# ========= CORS =========
CORS_ALLOW_METHODS = ["GET", "POST", "PUT", "PATCH", "DELETE"]
allowed_origins = os.getenv("ALLOWED_ORIGINS", "http://localhost,http://127.0.0.1").split(",")
app.add_middleware(
CORSMiddleware,
allow_origins=allowed_origins,
allow_credentials=True,
allow_methods=["GET", "POST", "PUT", "DELETE"],
allow_methods=CORS_ALLOW_METHODS,
allow_headers=[
"Accept",
"Authorization",
@@ -167,6 +179,7 @@ _TIMEOUT_EXEMPT_PREFIXES = (
"/api/cookbook/setup", # remote pacman/apt installs
"/api/upload", # large files
"/api/image", # diffusion proxies (inpaint/harmonize/upscale/etc.) — own 120s httpx timeout
"/api/memory/audit", # retains own 120s LLM inactivity timeout
)
@@ -315,7 +328,7 @@ if AUTH_ENABLED:
# (no admin cookie available in that context). Restricted to
# loopback clients + matching token to keep it locked down.
try:
from core.middleware import INTERNAL_TOOL_HEADER, INTERNAL_TOOL_TOKEN as _ITT
from core.middleware import INTERNAL_TOOL_HEADER, INTERNAL_TOOL_TOKEN as _ITT, INTERNAL_TOOL_USER
_hdr = request.headers.get(INTERNAL_TOOL_HEADER)
if _hdr and secrets.compare_digest(_hdr, _ITT) and _is_trusted_loopback(request):
# Impersonation: when the agent's loopback call sets
@@ -327,11 +340,11 @@ if AUTH_ENABLED:
if _impersonate and _impersonate in getattr(_auth_mgr, "users", {}):
request.state.current_user = _impersonate
else:
request.state.current_user = "internal-tool"
request.state.current_user = INTERNAL_TOOL_USER
request.state.api_token = False
return await call_next(request)
except Exception:
pass
except Exception as _e:
logger.warning("Internal tool auth header check failed", exc_info=_e)
# Allow DIRECT localhost requests (internal service calls from
# heartbeats etc.). Tunnel/proxy-forwarded requests are excluded by
# _is_trusted_loopback so LOCALHOST_BYPASS can't be abused over a
@@ -384,11 +397,10 @@ if AUTH_ENABLED:
_db.close()
try:
await _asyncio.to_thread(_do)
except Exception:
pass
except Exception as _e:
logger.debug("Failed to update token last_used_at", exc_info=_e)
_asyncio.create_task(_touch_last_used(matched_id))
# Keep bearer-token callers out of normal cookie/user
# routes. API-aware routes can read api_token_owner.
request.state.current_user = "api"
request.state.api_token = True
request.state.api_token_id = matched_id
@@ -437,7 +449,7 @@ class _RevalidatingStatic(StaticFiles):
return resp
app.mount("/static", _RevalidatingStatic(directory="static"), name="static")
app.mount("/static", _RevalidatingStatic(directory=STATIC_DIR), name="static")
# ========= GENERATED IMAGES =========
@app.get("/api/generated-image/{filename}")
@@ -463,8 +475,8 @@ async def serve_generated_image(filename: str, request: Request):
_db.close()
except HTTPException:
raise
except Exception:
pass
except Exception as _e:
logger.warning("Image ownership verification failed for %r", filename, exc_info=_e)
ext = filename.rsplit('.', 1)[-1].lower()
mime = {
"png": "image/png", "jpg": "image/jpeg", "jpeg": "image/jpeg",
@@ -527,6 +539,7 @@ memory_vector = components.get("memory_vector")
upload_handler = components["upload_handler"]
app.state.upload_handler = upload_handler
personal_docs_mgr = components["personal_docs_manager"]
app.state.personal_docs_manager = personal_docs_mgr
api_key_manager = components["api_key_manager"]
preset_manager = components["preset_manager"]
chat_processor = components["chat_processor"]
@@ -788,23 +801,17 @@ app.include_router(setup_companion_routes())
# ========= ROUTES (kept in app.py) =========
def _serve_html_with_nonce(request: Request, file_path: str) -> HTMLResponse:
"""Read an HTML file and inject the CSP nonce into inline <script> tags."""
with open(file_path, "r", encoding="utf-8") as f:
html = f.read()
nonce = getattr(request.state, "csp_nonce", "")
html = html.replace("{{CSP_NONCE}}", nonce)
return HTMLResponse(html)
@app.get("/")
async def serve_index(request: Request):
static_path = abs_join(BASE_DIR, "static/index.html")
if os.path.exists(static_path):
return _serve_html_with_nonce(request, static_path)
root_path = abs_join(BASE_DIR, "index.html")
if os.path.exists(root_path):
return _serve_html_with_nonce(request, root_path)
raise HTTPException(404, "index.html not found")
return serve_html_with_nonce(request, static_path)
# No static bundle — fall back to a root-level index.html if one is shipped.
# If neither exists, serve_html_with_nonce logs it and returns a generic 500:
# a missing index.html is a broken deployment (server fault), not a client
# "not found". This keeps the app-shell route consistent with the other
# bundled-template routes instead of mislabelling the fault as a 404.
return serve_html_with_nonce(request, abs_join(BASE_DIR, "index.html"))
@app.get("/notes")
async def serve_notes(request: Request):
@@ -845,13 +852,13 @@ async def serve_library(request: Request):
@app.get("/backgrounds")
async def serve_backgrounds(request: Request):
"""Sandbox page for prototyping background effects. No auth required."""
return _serve_html_with_nonce(request, abs_join(BASE_DIR, "static/backgrounds.html"))
return serve_html_with_nonce(request, abs_join(BASE_DIR, "static/backgrounds.html"))
@app.get("/login")
async def serve_login(request: Request):
if not AUTH_ENABLED:
return RedirectResponse(url="/", status_code=302)
return _serve_html_with_nonce(request, abs_join(BASE_DIR, "static/login.html"))
return serve_html_with_nonce(request, abs_join(BASE_DIR, "static/login.html"))
@app.get("/api/version")
async def get_version():
@@ -860,7 +867,7 @@ async def get_version():
@app.get("/api/health")
async def health_check() -> Dict[str, str]:
return {"status": "healthy", "timestamp": datetime.utcnow().isoformat()}
return {"status": "healthy", "timestamp": datetime.now(timezone.utc).isoformat()}
@app.get("/api/ready")
async def readiness_check() -> JSONResponse:
@@ -1170,3 +1177,12 @@ async def _shutdown_event():
except Exception as e:
logger.warning(f"MCP shutdown error: {e}")
logger.info("Application shutdown complete")
if __name__ == "__main__":
    import uvicorn
    bind_host = os.getenv("APP_BIND", "127.0.0.1")
    bind_port = int(os.getenv("APP_PORT", "7000"))
    uvicorn.run(app, host=bind_host, port=bind_port, log_level="info")
+72
View File
@@ -0,0 +1,72 @@
#Requires -Version 5.1
<#
Build a portable Windows distribution for Odysseus.
Output layout:
dist\Odysseus\Odysseus.exe
dist\Odysseus\static\...
dist\Odysseus\scripts\...
dist\Odysseus\mcp_servers\...
dist\Odysseus\services\hwfit\data\...
The app then keeps using its normal filesystem layout when frozen.
Usage:
powershell -ExecutionPolicy Bypass -File .\build-windows-portable.ps1
#>
$ErrorActionPreference = "Stop"
Set-Location -Path $PSScriptRoot
function Write-Step($msg) { Write-Host ""; Write-Host ("==> " + $msg) -ForegroundColor Cyan }
function Fail($msg) {
    Write-Host ""
    Write-Host ("ERROR: " + $msg) -ForegroundColor Red
    exit 1
}
Write-Step "Checking for Python"
$pyExe = $null
if (Test-Path ".\.venv\Scripts\python.exe") {
    $pyExe = (Resolve-Path ".\.venv\Scripts\python.exe").Path
} else {
    foreach ($c in @("py", "python")) {
        $cmd = Get-Command $c -ErrorAction SilentlyContinue
        if ($cmd) { $pyExe = $cmd.Source; break }
    }
    if ($pyExe -like "*WindowsApps*python.exe") {
        $pyCmd = Get-Command py -ErrorAction SilentlyContinue
        if ($pyCmd) {
            $pyExe = $pyCmd.Source
        }
    }
}
if (-not $pyExe) {
    Fail "Python not found on PATH. Install Python 3.11+ first."
}
Write-Host ("Using Python: " + $pyExe)
Write-Step "Installing build dependencies"
& $pyExe -m pip install --upgrade pip --quiet
& $pyExe -m pip install -r requirements.txt pyinstaller pystray Pillow
if ($LASTEXITCODE -ne 0) { Fail "Dependency install failed." }
Write-Step "Building portable exe bundle"
Remove-Item -Recurse -Force build, dist -ErrorAction SilentlyContinue
$dataArgs = @(
    "--add-data", "static;static",
    "--add-data", "scripts;scripts",
    "--add-data", "mcp_servers;mcp_servers",
    "--add-data", "services/hwfit/data;services/hwfit/data",
    "--add-data", "config;config",
    "--add-data", ".env.example;.env.example"
)
& $pyExe -m PyInstaller --noconfirm --clean --onedir --noconsole --icon=static/icon.ico --name Odysseus @dataArgs launcher.py
if ($LASTEXITCODE -ne 0) { Fail "PyInstaller build failed." }
Write-Host ""
Write-Host "Build complete." -ForegroundColor Green
Write-Host "Portable app folder: $PSScriptRoot\dist\Odysseus" -ForegroundColor Green
Write-Host "Distribute the whole folder (or zip it) so static assets and scripts stay with the exe." -ForegroundColor Green
+17 -3
View File
@@ -5,8 +5,9 @@ offers and pair to it, without duplicating any LLM logic.
Auth is enforced globally by AuthMiddleware (app.py), so reaching a handler here
means the caller is authenticated by either a cookie session or a Bearer `ody_`
API token. The read endpoints (ping/info/models) accept either; the pairing
endpoints are admin-cookie only.
API token. Ping/info accept either credential type, models requires a chat-
scoped API token for bearer callers, and the pairing endpoints are admin-cookie
only.
Pairing CSRF posture: minting happens ONLY on POST. The session cookie is
SameSite=Lax (routes/auth_routes.py), which a browser does not send on a
@@ -18,7 +19,7 @@ on a GET would be unsafe (Lax cookies ride top-level GET navigations), so GET
import html
from fastapi import APIRouter, Request
from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import HTMLResponse
from core.middleware import require_admin
@@ -52,6 +53,18 @@ def owner_can_see(row_owner, owner) -> bool:
return row_owner is None or row_owner == owner
def require_models_scope(request: Request) -> None:
    """Require the companion chat scope for bearer-token model inventory."""
    if not getattr(request.state, "api_token", False):
        return
    scopes = getattr(request.state, "api_token_scopes", None) or []
    if isinstance(scopes, str):
        scopes = [scope.strip() for scope in scopes.split(",")]
    scope_set = {str(scope).strip() for scope in scopes if str(scope).strip()}
    if _pairing.COMPANION_SCOPE not in scope_set:
        raise HTTPException(403, "API token requires chat scope")
def mint_pairing_token(owner: str, invalidate=None) -> tuple[str, str]:
"""Mint a pairing token AND invalidate the auth middleware's in-memory token
cache, so the new token is accepted on the very next request without a server
@@ -103,6 +116,7 @@ def setup_companion_routes() -> APIRouter:
rows -- the same rule as owner_filter. Read-only; never returns api_key
material.
"""
require_models_scope(request)
import json as _json
from core.database import SessionLocal, ModelEndpoint
+95 -8
View File
@@ -3,6 +3,7 @@ Authentication module — multi-user password hashing, session tokens, config pe
Config stored in data/auth.json. Uses bcrypt directly.
"""
import enum
import json
import os
import secrets
@@ -19,6 +20,7 @@ logger = logging.getLogger(__name__)
from core.atomic_io import atomic_write_json as _atomic_write_json # noqa: E402
from core.middleware import INTERNAL_TOOL_USER # noqa: E402
DEFAULT_PRIVILEGES = {
"can_use_agent": True,
@@ -46,7 +48,7 @@ ADMIN_PRIVILEGES["allowed_models_restricted"] = False
# backwards for this sentinel.
ADMIN_PRIVILEGES["block_all_models"] = False
from src.constants import AUTH_FILE
from src.constants import AUTH_FILE, PASSWORD_MIN_LENGTH
DEFAULT_AUTH_PATH = AUTH_FILE
TOKEN_TTL = 60 * 60 * 24 * 7 # 7 days
@@ -64,7 +66,7 @@ TOKEN_TTL = 60 * 60 * 24 * 7 # 7 days
# of those names would be denied an assistant and inconsistently owner-scoped.
# Refuse to create or rename into any of them so the sentinels can't be
# impersonated. (Keep this in sync with that synthetic-owner set.)
RESERVED_USERNAMES = frozenset({"internal-tool", "api", "demo", "system"})
RESERVED_USERNAMES = frozenset({INTERNAL_TOOL_USER, "api", "demo", "system"})
def normalize_known_username(users: Dict[str, Any], username: str | None) -> Optional[str]:
@@ -83,6 +85,15 @@ def _verify_password(password: str, hashed: str) -> bool:
return bcrypt.checkpw(password.encode("utf-8"), hashed.encode("utf-8"))
class SetAdminResult(enum.Enum):
    """Outcome of AuthManager.set_admin, so callers can map each case to a
    precise response instead of guessing from a bare bool."""
    OK = "ok"
    USER_NOT_FOUND = "user_not_found"
    NOT_AUTHORIZED = "not_authorized"  # requester is not an admin
    LAST_ADMIN = "last_admin"  # would remove the last remaining admin
class AuthManager:
"""Manages multi-user password + session-token auth system."""
@@ -233,6 +244,15 @@ class AuthManager:
def is_configured(self) -> bool:
return len(self.users) > 0
    def policy(self) -> dict:
        """Return public auth policy constants for the frontend."""
        return {
            "password_min_length": PASSWORD_MIN_LENGTH,
            "reserved_usernames": sorted(RESERVED_USERNAMES),
            "signup_enabled": self.signup_enabled,
            "session_days": TOKEN_TTL // 86400,
        }
# ------------------------------------------------------------------
# Account management
# ------------------------------------------------------------------
@@ -387,6 +407,69 @@ class AuthManager:
logger.info(f"Updated privileges for '{username}': {current}")
return True
def set_admin(self, username: str, is_admin: bool,
requesting_user: str) -> SetAdminResult:
"""Promote/demote an existing user to/from admin. Admin only.
Refuses to remove the last remaining admin so the instance can never
be locked out of admin access; self-demotion is allowed as long as
another admin remains. Admin status is re-checked live on every
request, so unlike delete/rename no session or token revocation is
needed — a demoted admin simply fails the next is_admin() gate.
Promotion stashes the user's current privilege map and demotion
restores it, so a temporary admin stint can't silently broaden a
user's non-admin access; users without a stash (created as admin,
or promoted before stashing existed) demote to DEFAULT_PRIVILEGES.
Counting admins and flipping the flag happen in one critical section
so two concurrent demotions can't race the admin count to zero.
"""
username = (username or "").strip().lower()
requesting_user = (requesting_user or "").strip().lower()
is_admin = bool(is_admin)
with self._config_lock:
target = self._config.get("users", {}).get(username)
if target is None:
return SetAdminResult.USER_NOT_FOUND
if not self.users.get(requesting_user, {}).get("is_admin"):
return SetAdminResult.NOT_AUTHORIZED
currently_admin = bool(target.get("is_admin"))
if currently_admin == is_admin:
return SetAdminResult.OK # no-op; leave privileges untouched
if currently_admin and not is_admin:
admin_count = sum(1 for d in self.users.values() if d.get("is_admin"))
if admin_count <= 1:
return SetAdminResult.LAST_ADMIN
# Write order matters for lock-free readers: get_privileges()
# reads without _config_lock and trusts is_admin, so the admin
# flag must be flipped while the stored map is safe to expose —
# before writing admin privileges on promote, after restoring
# the pre-admin map on demote.
if is_admin:
target["is_admin"] = True
# Stash the pre-admin map so a later demotion can restore it.
# While is_admin is set the stored map is inert: get_privileges
# short-circuits to ADMIN_PRIVILEGES and set_privileges refuses
# admins, so only set_admin ever touches the stash.
target["privileges_before_admin"] = dict(
target.get("privileges") or DEFAULT_PRIVILEGES
)
target["privileges"] = dict(ADMIN_PRIVILEGES)
else:
# Restore the stashed pre-admin map. Fall back to defaults for
# users created as admins (their stored map is ADMIN_PRIVILEGES,
# which must not leak past demotion — e.g. can_use_bash) and
# for admins promoted before the stash existed.
target["privileges"] = dict(
target.pop("privileges_before_admin", None)
or DEFAULT_PRIVILEGES
)
target["is_admin"] = False
self._save()
logger.info("Set is_admin=%s for '%s' (by '%s')", is_admin, username, requesting_user)
return SetAdminResult.OK
def change_password(self, username: str, current_password: str, new_password: str) -> bool:
username = username.strip().lower()
if username not in self.users:
@@ -500,16 +583,20 @@ class AuthManager:
return None
return self.create_session_trusted(username)
def create_session_trusted(self, username: str) -> str:
def create_session_trusted(self, username: str) -> Optional[str]:
"""Issue a session token for an already-verified user.
Call only after verify_password (and TOTP if enabled) have passed."""
username = username.strip().lower()
token = secrets.token_hex(32)
with self._sessions_lock:
self._sessions[token] = {
"username": username,
"expiry": time.time() + TOKEN_TTL,
}
with self._config_lock:
if username not in self.users:
logger.warning("Refused to issue session for missing user '%s'", username)
return None
with self._sessions_lock:
self._sessions[token] = {
"username": username,
"expiry": time.time() + TOKEN_TTL,
}
self._save_sessions()
return token
+49 -2
View File
@@ -2,12 +2,15 @@ import os
import logging
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from sqlalchemy import event, create_engine, Column, String, Text, Boolean, DateTime, Integer, ForeignKey, JSON, Index, func, text
from sqlalchemy.engine import Engine
from sqlalchemy.types import TypeDecorator
from sqlalchemy.ext.declarative import declarative_base, declared_attr
from sqlalchemy.orm import relationship, sessionmaker, backref
from src.runtime_paths import get_app_root
logger = logging.getLogger(__name__)
# Create base class for declarative models
@@ -29,9 +32,26 @@ class TimestampMixin:
def updated_at(cls):
return Column(DateTime, default=utcnow_naive, onupdate=utcnow_naive, nullable=False)
# Get database URL from environment, default to SQLite in DATA_DIR
# Ensure the writable data directory exists before SQLite connects.
from src.constants import DATA_DIR, AUTH_FILE, MEMORY_FILE, USER_PREFS_FILE, SETTINGS_FILE
DATABASE_URL = os.getenv("DATABASE_URL", f"sqlite:///{DATA_DIR}/app.db")
Path(DATA_DIR).mkdir(parents=True, exist_ok=True)
def _default_database_url() -> str:
return f"sqlite:///{Path(DATA_DIR) / 'app.db'}"
def _normalize_sqlite_url(url: str) -> str:
if not url.startswith("sqlite:///"):
return url
db_path = url.replace("sqlite:///", "", 1)
if db_path == ":memory:" or os.path.isabs(db_path):
return url
return f"sqlite:///{(Path(get_app_root()) / db_path).resolve().as_posix()}"
# Get database URL from environment, default to SQLite in DATA_DIR
DATABASE_URL = _normalize_sqlite_url(os.getenv("DATABASE_URL", _default_database_url()))
# Create engine
engine = create_engine(
@@ -324,6 +344,13 @@ class EmailAccount(TimestampMixin, Base):
smtp_password = Column(String, default="")
from_address = Column(String, default="")
display_name = Column(String, nullable=True) # "Hriday Ranka" — used in From: header
# OAuth2 (Google / Google Workspace). Tokens stored encrypted via secret_storage.
oauth_provider = Column(String, nullable=True) # "google" or None
oauth_access_token = Column(String, nullable=True) # encrypted
oauth_refresh_token = Column(String, nullable=True) # encrypted
oauth_token_expiry = Column(String, nullable=True) # unix timestamp string
__table_args__ = (
Index('ix_email_accounts_owner_default', 'owner', 'is_default'),
@@ -1427,6 +1454,25 @@ def _migrate_add_task_automation_columns():
except Exception as e:
logging.getLogger(__name__).warning(f"task automation migration: {e}")
def _migrate_add_email_oauth_columns():
"""Add Google OAuth and display_name columns to email_accounts if missing."""
try:
with engine.connect() as conn:
cols = [r[1] for r in conn.execute(text("PRAGMA table_info(email_accounts)"))]
for col, typedef in [
("oauth_provider", "TEXT"),
("oauth_access_token", "TEXT"),
("oauth_refresh_token", "TEXT"),
("oauth_token_expiry", "TEXT"),
("display_name", "TEXT"),
]:
if col not in cols:
conn.execute(text(f"ALTER TABLE email_accounts ADD COLUMN {col} {typedef}"))
conn.commit()
except Exception as e:
logging.getLogger(__name__).warning(f"email oauth columns migration: {e}")
def _migrate_add_oauth_config():
"""Add oauth_config column to mcp_servers table if missing."""
try:
@@ -1771,6 +1817,7 @@ def init_db():
_migrate_add_tidy_verdict()
_migrate_add_doc_source_email_cols()
_migrate_add_oauth_config()
_migrate_add_email_oauth_columns()
_migrate_add_task_automation_columns()
_migrate_add_disabled_tools()
_migrate_add_mcp_oauth_tokens_column()
+27
View File
@@ -0,0 +1,27 @@
"""Helpers for keeping sensitive data out of logs.
Endpoint URLs configured by admins can embed credentials in the userinfo
(``https://user:pass@host``) or query string (``?api_key=...``). Logging them
raw leaks those secrets, so route/diagnostic logs run URLs through
``redact_url`` first. Reconstructing the URL without userinfo/query/fragment
also doubles as a sanitizer barrier for CodeQL's clear-text-logging query.
"""
from urllib.parse import urlparse, urlunparse
def redact_url(url: str) -> str:
"""Return a URL safe for logs by removing userinfo and query/fragment.
Keeps scheme, host, port and path so logs stay useful for debugging.
"""
try:
parsed = urlparse(url or "")
host = parsed.hostname or ""
if ":" in host: # IPv6 literal — re-bracket so host:port stays unambiguous
host = f"[{host}]"
if parsed.port:
host = f"{host}:{parsed.port}"
return urlunparse((parsed.scheme, host, parsed.path, "", "", ""))
except Exception:
return "<endpoint>"
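A quick usage sketch for `redact_url`, showing what survives redaction. The function body is copied from the new file above so the snippet runs on its own; the sample URLs are made up for illustration and are not project endpoints.

```python
from urllib.parse import urlparse, urlunparse


def redact_url(url: str) -> str:
    """Copy of the helper above: drop userinfo, query, and fragment."""
    try:
        parsed = urlparse(url or "")
        host = parsed.hostname or ""
        if ":" in host:  # IPv6 literal, so re-bracket it
            host = f"[{host}]"
        if parsed.port:
            host = f"{host}:{parsed.port}"
        return urlunparse((parsed.scheme, host, parsed.path, "", "", ""))
    except Exception:
        return "<endpoint>"


# Hypothetical endpoints, used only to illustrate the behaviour.
print(redact_url("https://user:pass@example.com:8443/v1/chat?api_key=abc#frag"))
# -> https://example.com:8443/v1/chat
print(redact_url("http://[::1]:11434/api/tags"))
# -> http://[::1]:11434/api/tags
```

Note that `urlparse(...).hostname` lowercases the host, so the logged host can differ in case from the configured one.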
+6 -7
View File
@@ -15,6 +15,8 @@ from starlette.responses import Response
# same value from this module. Never persisted or exposed externally.
INTERNAL_TOOL_TOKEN = os.environ.get("ODYSSEUS_INTERNAL_TOKEN") or secrets.token_hex(32)
INTERNAL_TOOL_HEADER = "X-Odysseus-Internal-Token"
# Pseudo-username on in-process tool-loopback requests; require_admin trusts it and it is reserved.
INTERNAL_TOOL_USER = "internal-tool"
def is_cors_preflight(method: str, headers) -> bool:
@@ -39,7 +41,7 @@ def require_admin(request: Request):
hdr = request.headers.get(INTERNAL_TOOL_HEADER)
if hdr and secrets.compare_digest(hdr, INTERNAL_TOOL_TOKEN):
return
if getattr(request.state, "current_user", None) == "internal-tool":
if getattr(request.state, "current_user", None) == INTERNAL_TOOL_USER:
return
except Exception:
pass
@@ -65,10 +67,9 @@ class SecurityHeadersMiddleware(BaseHTTPMiddleware):
response = await call_next(request)
path = request.url.path
# Tool render endpoints are served inside iframes — allow framing by self
# Tool render endpoints
is_tool_render = path.startswith("/api/tools/") and path.endswith("/render")
# PDF previews are embedded by the in-app document library. Keep the
# exception route-scoped so normal app pages remain unframeable.
# Document library PDF preview endpoint
is_document_pdf_preview = path.startswith("/api/document/") and path.endswith("/render-pdf")
# Visual report pages are self-contained HTML — need inline scripts + external images
is_report = path.startswith("/api/research/report/")
@@ -95,9 +96,7 @@ class SecurityHeadersMiddleware(BaseHTTPMiddleware):
"frame-ancestors 'none'"
)
elif is_tool_render:
# Tool iframe content: skip all framing headers — the iframe's
# sandbox="allow-scripts" attribute provides isolation.
# Don't overwrite the route's own restrictive CSP either.
# Skip framing headers for tools.
pass
elif is_document_pdf_preview:
response.headers["X-Frame-Options"] = "SAMEORIGIN"
+21 -5
View File
@@ -16,18 +16,26 @@ services:
ports:
- "${APP_BIND:-127.0.0.1}:${APP_PORT:-7000}:7000"
volumes:
- ./data:/app/data:z
- ./logs:/app/logs:z
- ${APP_DATA_DIR:-./data}:/app/data:z
- ${APP_LOGS_DIR:-./logs}:/app/logs:z
# Cookbook remote-server SSH identity. Odysseus can generate a key here;
# add the shown public key to each remote server's authorized_keys.
- ./data/ssh:/app/.ssh:z
- ${APP_DATA_DIR:-./data}/ssh:/app/.ssh:z
# Cookbook local model cache. Inside Docker, "Local" means the Odysseus
# container, so persist its HuggingFace cache under ./data/huggingface.
- ./data/huggingface:/app/.cache/huggingface:z
- ${APP_DATA_DIR:-./data}/huggingface:/app/.cache/huggingface:z
# Cookbook-installed Python CLIs/packages (vLLM, llama-cpp-python, etc.)
# land under /app/.local for the odysseus user. Persist them so a
# container recreate does not silently remove installed serve engines.
- ./data/local:/app/.local:z
- ${APP_DATA_DIR:-./data}/local:/app/.local:z
# Docker socket — lets Cookbook launch commands like
# `docker exec ollama-rocm ollama show <tag>` reach the host's
# Docker daemon (and sibling containers like ollama-rocm /
# ollama-test). The in-container user needs to be in the
# socket's owning group — see `group_add` below; the GID
# there must match the host's `docker` group (defaults to 963
# on Debian, 999 on Ubuntu — override via env if yours differs).
- /var/run/docker.sock:/var/run/docker.sock
extra_hosts:
# Lets the container reach local services on the Docker host, including
# Ollama at http://host.docker.internal:11434.
@@ -60,6 +68,13 @@ services:
- ODYSSEUS_INPROCESS_TASKS=${ODYSSEUS_INPROCESS_TASKS:-1}
- ODYSSEUS_SCRIPT_HOST=${ODYSSEUS_SCRIPT_HOST:-localhost}
- ODYSSEUS_CHAT_UPLOAD_MAX_BYTES=${ODYSSEUS_CHAT_UPLOAD_MAX_BYTES:-10485760}
- ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES=${ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES:-104857600}
- ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES=${ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_MEMORY_IMPORT_MAX_BYTES=${ODYSSEUS_MEMORY_IMPORT_MAX_BYTES:-10485760}
- ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES=${ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES=${ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_STT_MAX_AUDIO_BYTES=${ODYSSEUS_STT_MAX_AUDIO_BYTES:-26214400}
- ODYSSEUS_ICS_MAX_BYTES=${ODYSSEUS_ICS_MAX_BYTES:-10485760}
- DATA_BRAVE_API_KEY=${DATA_BRAVE_API_KEY:-}
- GOOGLE_API_KEY=${GOOGLE_API_KEY:-}
- GOOGLE_PSE_CX=${GOOGLE_PSE_CX:-}
@@ -86,6 +101,7 @@ services:
- /dev/kfd
- /dev/dri
group_add:
- "${DOCKER_GID:-963}"
- video
- ${RENDER_GID:-render}
+22 -5
View File
@@ -15,18 +15,28 @@ services:
ports:
- "${APP_BIND:-127.0.0.1}:${APP_PORT:-7000}:7000"
volumes:
- ./data:/app/data:z
- ./logs:/app/logs:z
- ${APP_DATA_DIR:-./data}:/app/data:z
- ${APP_LOGS_DIR:-./logs}:/app/logs:z
# Cookbook remote-server SSH identity. Odysseus can generate a key here;
# add the shown public key to each remote server's authorized_keys.
- ./data/ssh:/app/.ssh:z
- ${APP_DATA_DIR:-./data}/ssh:/app/.ssh:z
# Cookbook local model cache. Inside Docker, "Local" means the Odysseus
# container, so persist its HuggingFace cache under ./data/huggingface.
- ./data/huggingface:/app/.cache/huggingface:z
- ${APP_DATA_DIR:-./data}/huggingface:/app/.cache/huggingface:z
# Cookbook-installed Python CLIs/packages (vLLM, llama-cpp-python, etc.)
# land under /app/.local for the odysseus user. Persist them so a
# container recreate does not silently remove installed serve engines.
- ./data/local:/app/.local:z
- ${APP_DATA_DIR:-./data}/local:/app/.local:z
# Docker socket — lets Cookbook launch commands like
# `docker exec ollama-rocm ollama show <tag>` reach the host's
# Docker daemon (and sibling containers like ollama-rocm /
# ollama-test). The in-container user needs to be in the
# socket's owning group — see `group_add` below; the GID
# there must match the host's `docker` group (defaults to 963
# on Debian, 999 on Ubuntu — override via env if yours differs).
- /var/run/docker.sock:/var/run/docker.sock
group_add:
- "${DOCKER_GID:-963}"
extra_hosts:
# Lets the container reach local services on the Docker host, including
# Ollama at http://host.docker.internal:11434.
@@ -59,6 +69,13 @@ services:
- ODYSSEUS_INPROCESS_TASKS=${ODYSSEUS_INPROCESS_TASKS:-1}
- ODYSSEUS_SCRIPT_HOST=${ODYSSEUS_SCRIPT_HOST:-localhost}
- ODYSSEUS_CHAT_UPLOAD_MAX_BYTES=${ODYSSEUS_CHAT_UPLOAD_MAX_BYTES:-10485760}
- ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES=${ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES:-104857600}
- ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES=${ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_MEMORY_IMPORT_MAX_BYTES=${ODYSSEUS_MEMORY_IMPORT_MAX_BYTES:-10485760}
- ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES=${ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES=${ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_STT_MAX_AUDIO_BYTES=${ODYSSEUS_STT_MAX_AUDIO_BYTES:-26214400}
- ODYSSEUS_ICS_MAX_BYTES=${ODYSSEUS_ICS_MAX_BYTES:-10485760}
- DATA_BRAVE_API_KEY=${DATA_BRAVE_API_KEY:-}
- GOOGLE_API_KEY=${GOOGLE_API_KEY:-}
- GOOGLE_PSE_CX=${GOOGLE_PSE_CX:-}
+22 -5
View File
@@ -4,18 +4,28 @@ services:
ports:
- "${APP_BIND:-127.0.0.1}:${APP_PORT:-7000}:7000"
volumes:
- ./data:/app/data:z
- ./logs:/app/logs:z
- ${APP_DATA_DIR:-./data}:/app/data:z
- ${APP_LOGS_DIR:-./logs}:/app/logs:z
# Cookbook remote-server SSH identity. Odysseus can generate a key here;
# add the shown public key to each remote server's authorized_keys.
- ./data/ssh:/app/.ssh:z
- ${APP_DATA_DIR:-./data}/ssh:/app/.ssh:z
# Cookbook local model cache. Inside Docker, "Local" means the Odysseus
# container, so persist its HuggingFace cache under ./data/huggingface.
- ./data/huggingface:/app/.cache/huggingface:z
- ${APP_DATA_DIR:-./data}/huggingface:/app/.cache/huggingface:z
# Cookbook-installed Python CLIs/packages (vLLM, llama-cpp-python, etc.)
# land under /app/.local for the odysseus user. Persist them so a
# container recreate does not silently remove installed serve engines.
- ./data/local:/app/.local:z
- ${APP_DATA_DIR:-./data}/local:/app/.local:z
# Docker socket — lets Cookbook launch commands like
# `docker exec ollama-rocm ollama show <tag>` reach the host's
# Docker daemon (and sibling containers like ollama-rocm /
# ollama-test). The in-container user needs to be in the
# socket's owning group — see `group_add` below; the GID
# there must match the host's `docker` group (defaults to 963
# on Debian, 999 on Ubuntu — override via env if yours differs).
- /var/run/docker.sock:/var/run/docker.sock
group_add:
- "${DOCKER_GID:-963}"
extra_hosts:
# Lets the container reach local services on the Docker host, including
# Ollama at http://host.docker.internal:11434.
@@ -48,6 +58,13 @@ services:
- ODYSSEUS_INPROCESS_TASKS=${ODYSSEUS_INPROCESS_TASKS:-1}
- ODYSSEUS_SCRIPT_HOST=${ODYSSEUS_SCRIPT_HOST:-localhost}
- ODYSSEUS_CHAT_UPLOAD_MAX_BYTES=${ODYSSEUS_CHAT_UPLOAD_MAX_BYTES:-10485760}
- ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES=${ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES:-104857600}
- ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES=${ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_MEMORY_IMPORT_MAX_BYTES=${ODYSSEUS_MEMORY_IMPORT_MAX_BYTES:-10485760}
- ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES=${ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES=${ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES:-26214400}
- ODYSSEUS_STT_MAX_AUDIO_BYTES=${ODYSSEUS_STT_MAX_AUDIO_BYTES:-26214400}
- ODYSSEUS_ICS_MAX_BYTES=${ODYSSEUS_ICS_MAX_BYTES:-10485760}
- DATA_BRAVE_API_KEY=${DATA_BRAVE_API_KEY:-}
- GOOGLE_API_KEY=${GOOGLE_API_KEY:-}
- GOOGLE_PSE_CX=${GOOGLE_PSE_CX:-}
+70
View File
@@ -0,0 +1,70 @@
#!/usr/bin/env bash
# Build patched wheels for Real-ESRGAN's unmaintained dependencies.
#
# basicsr / gfpgan / facexlib (xinntao, last released 2022) read their version
# in setup.py with:
#
# exec(compile(f.read(), version_file, 'exec'))
# return locals()['__version__']
#
# Python 3.13+ implements PEP 667: locals() inside a function returns an
# independent snapshot that exec() can no longer mutate, so the read raises
# `KeyError: '__version__'` and the sdist build fails. That is why the Cookbook
# "install realesrgan" button dies on the python:3.14 image. The packages have
# no fixed release, so we patch get_version() to exec into an explicit namespace
# dict (works on every Python) and build wheels from the patched source.
#
# Usage: build-realesrgan-wheels.sh [OUTPUT_DIR] (default: /wheels)
set -euo pipefail
OUT="${1:-/wheels}"
mkdir -p "$OUT"
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
cd "$work"
# Pinned to the versions Real-ESRGAN 0.3.0 resolves to.
SPECS="basicsr==1.4.2 gfpgan==1.3.8 facexlib==0.3.0"
for spec in $SPECS; do
name="${spec%%==*}"
ver="${spec##*==}"
# pip download builds metadata (and trips the same bug), so fetch the raw
# sdist URL from the PyPI JSON API instead.
url="$(python - "$name" "$ver" <<'PY'
import json, sys, urllib.request
name, ver = sys.argv[1], sys.argv[2]
data = json.load(urllib.request.urlopen(f"https://pypi.org/pypi/{name}/{ver}/json"))
for f in data["urls"]:
if f["packagetype"] == "sdist":
print(f["url"]); break
else:
sys.exit(f"no sdist found for {name}=={ver}")
PY
)"
echo ">> fetching ${name} ${ver}: ${url}"
curl -fsSL "$url" -o "${name}.tar.gz"
tar xzf "${name}.tar.gz"
done
echo ">> patching get_version()"
python - <<'PY'
import pathlib
old_exec = "exec(compile(f.read(), version_file, 'exec'))"
new_exec = "_ver_ns = {}\n exec(compile(f.read(), version_file, 'exec'), _ver_ns)"
old_ret = "return locals()['__version__']"
new_ret = "return _ver_ns['__version__']"
patched = 0
for setup in pathlib.Path(".").glob("*/setup.py"):
s = setup.read_text()
if old_exec in s and old_ret in s:
setup.write_text(s.replace(old_exec, new_exec).replace(old_ret, new_ret))
print(" patched", setup)
patched += 1
assert patched == 3, f"expected to patch 3 setup.py files, patched {patched}"
PY
echo ">> building wheels into ${OUT}"
pip wheel --no-deps -w "$OUT" ./basicsr-* ./gfpgan-* ./facexlib-*
ls -l "$OUT"
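The namespace fix that the patch step applies can be sketched in isolation. `read_version` and the sample source string are hypothetical names for illustration; the real `get_version()` lives in each package's `setup.py`.

```python
def read_version(source: str, filename: str = "version.py") -> str:
    """Exec version-file source into an explicit dict and read __version__.

    Unlike reading the result back through locals(), this does not depend on
    exec() mutating the caller's local namespace, so it behaves the same
    before and after PEP 667.
    """
    namespace: dict = {}
    exec(compile(source, filename, "exec"), namespace)
    return namespace["__version__"]


print(read_version("__version__ = '1.4.2'\n"))  # -> 1.4.2
```

Passing the dict explicitly is the same change the script makes with `_ver_ns`.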
+74 -19
View File
@@ -13,6 +13,8 @@ set -e
PUID="${PUID:-1000}"
PGID="${PGID:-1000}"
GOSU_BIN="$(command -v gosu)"
PYTHON_BIN="$(command -v python)"
# Reuse an existing matching group/user if the host's UID/GID already
# corresponds to one in /etc/passwd (e.g. when the image is rebuilt
@@ -24,26 +26,78 @@ if ! getent passwd "$PUID" >/dev/null 2>&1; then
useradd -u "$PUID" -g "$PGID" -M -s /bin/sh -d /app odysseus
fi
# Repair ownership on every writable path the app touches at runtime.
#
# Bind-mounted dirs (/app/data, /app/logs) are the obvious ones, but
# the app ALSO writes inside the image's own source tree at runtime:
# - services/cache/{search,content}/* (search cache LRU)
# - services/search_analytics.json
# - services/search_engine_error.log
# - services/tts cache, etc.
# These dirs were created as root during `docker build`, so dropping
# to PUID:PGID would otherwise crash on the first import that tries
# to mkdir them. Chown the whole /app tree — fast (<1s on this size)
# and idempotent via the `-not -uid` filter so we only touch files
# that need fixing.
for dir in /app /app/data /app/logs; do
ODY_USER="$(getent passwd "$PUID" | cut -d: -f1)"
[ -z "$ODY_USER" ] && ODY_USER=odysseus
# Docker-socket group plumbing. When /var/run/docker.sock is bind-mounted
# (Cookbook uses docker exec to reach sibling containers), the socket is
# owned by root:<host docker gid>. Add the app user to that group and later
# call gosu by username so supplementary groups are retained.
DOCKER_SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"
if [ -S "$DOCKER_SOCK" ]; then
SOCK_GID="$(stat -c '%g' "$DOCKER_SOCK" 2>/dev/null || echo '')"
if [ -n "$SOCK_GID" ] && [ "$SOCK_GID" != "0" ]; then
if ! getent group "$SOCK_GID" >/dev/null 2>&1; then
groupadd -g "$SOCK_GID" docker_host || true
fi
SOCK_GROUP="$(getent group "$SOCK_GID" | cut -d: -f1)"
if [ -n "$SOCK_GROUP" ]; then
usermod -aG "$SOCK_GROUP" "$ODY_USER" 2>/dev/null || true
fi
fi
fi
mount_root_for() {
awk -v target="$1" '$5 == target { print $4; exit }' /proc/self/mountinfo 2>/dev/null || true
}
is_broad_mount_root() {
case "$1" in
/|/home|/srv|/var|/usr|/opt|/tmp|/mnt|/media)
return 0
;;
esac
return 1
}
repair_tree_ownership() {
dir="$1"
if [ -d "$dir" ]; then
# `find ... -not -uid` keeps this O(touched-files), not
# O(everything), so terabyte-sized maildirs don't slow startup.
find "$dir" -not -uid "$PUID" -print0 2>/dev/null \
find "$dir" -xdev -not -uid "$PUID" -print0 2>/dev/null \
| xargs -0 -r chown "$PUID:$PGID" 2>/dev/null || true
fi
}
repair_app_tree_ownership() {
if [ -d /app ]; then
find /app -xdev \
\( -path /app/data -o -path /app/logs -o -path /app/.ssh -o -path /app/.cache -o -path /app/.local \) -prune \
-o -not -uid "$PUID" -print0 2>/dev/null \
| xargs -0 -r chown "$PUID:$PGID" 2>/dev/null || true
fi
}
repair_bind_mount_ownership() {
dir="$1"
if [ ! -d "$dir" ]; then
return
fi
mount_root="$(mount_root_for "$dir")"
if is_broad_mount_root "$mount_root"; then
echo "Skipping recursive ownership repair for $dir because it maps to broad host path $mount_root" >&2
chown "$PUID:$PGID" "$dir" 2>/dev/null || true
return
fi
repair_tree_ownership "$dir"
}
# Repair image-owned writable paths without walking into bind-mounted host
# trees, then repair the app-owned mount roots separately.
repair_app_tree_ownership
for dir in /app/data /app/logs /app/.ssh /app/.cache/huggingface /app/.local; do
repair_bind_mount_ownership "$dir"
done
# Cookbook installs vllm/etc. via `pip install --user`, which pulls
@@ -70,6 +124,7 @@ for cu in \
break
fi
done
# Disable the FlashInfer JIT sampler unconditionally — it is sampler-only
# and has no impact on the attention path, but requires nvcc + matching
# CUDA headers at startup. Without this, vLLM crashes with "Could not find
@@ -83,9 +138,9 @@ export PATH="/app/.local/bin:$PATH"
# Run first-time setup as the app user so data/ files get the right ownership.
# setup.py is idempotent — skips auth.json / .env if they already exist.
# || true so a setup failure never prevents the container from starting.
gosu "$PUID:$PGID" python /app/setup.py || true
"$GOSU_BIN" "$ODY_USER" "$PYTHON_BIN" /app/setup.py || true
# Drop root and run the actual app. `gosu` is preferred over `su` /
# `sudo` because it cleans up the process tree (no extra shell layer)
# so signals (SIGTERM from `docker stop`) reach uvicorn directly.
exec gosu "$PUID:$PGID" "$@"
exec "$GOSU_BIN" "$ODY_USER" "$@"
+129
View File
@@ -0,0 +1,129 @@
# Backup & Restore
Odysseus keeps all of your state in the `data/` directory — the SQLite database
(`app.db`), the Fernet encryption key (`data/.app_key`), the vault, memory, RAG
indexes, personal documents, and uploads. The `scripts/odysseus-backup` tool
snapshots that directory into a single gzip tarball and restores it later.
Snapshots are safe to take while the app is running: SQLite databases are copied
through SQLite's own `.backup` API rather than a raw file copy, so an in-flight
write can't corrupt the snapshot.
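
As a rough illustration of why this is safe, Python's standard `sqlite3` module exposes SQLite's online backup mechanism through `Connection.backup`. The sketch below is not the tool's actual code; `snapshot_sqlite` and the paths in the usage note are hypothetical.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(src: Path, dest: Path) -> None:
    """Copy a possibly-live SQLite database to dest via the online backup API."""
    source = sqlite3.connect(str(src))
    target = sqlite3.connect(str(dest))
    try:
        # backup() copies pages under SQLite's own locking, so the result is
        # a consistent snapshot even if another connection is writing.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

A call such as `snapshot_sqlite(Path("data/app.db"), Path("/tmp/app.db.snapshot"))` would then produce a standalone copy that can be archived like any other file.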
> **A snapshot contains your secrets.** The tarball includes the Fernet
> encryption key (`data/.app_key`), the vault, sessions, and any stored
> provider/API tokens — so treat it like a password. Store backups somewhere
> private, never commit them to Git, and prefer an encrypted destination when
> copying them offsite.
## Quick start
Run the tool from the repository root:
```bash
# Create a snapshot → backups/odysseus-backup-<YYYYMMDD-HHMMSS>.tar.gz
./scripts/odysseus-backup snapshot
# List existing snapshots (most recent first)
./scripts/odysseus-backup list
# Check a tarball's integrity without extracting it
./scripts/odysseus-backup verify backups/odysseus-backup-20260101-120000.tar.gz
# Restore (destructive — see the warning below)
./scripts/odysseus-backup restore backups/odysseus-backup-20260101-120000.tar.gz --yes
```
The script depends only on the Python standard library, so any `python3` on your
`PATH` will run it — you don't need the app's virtualenv active.
Every command prints a JSON result. Add `--pretty` for indented output.
## Commands
### `snapshot`
Writes a gzip tarball of `data/` to `backups/odysseus-backup-<YYYYMMDD-HHMMSS>.tar.gz`.
| Flag | Effect |
| --- | --- |
| `--out PATH` | Write to a specific path instead of the default `backups/` location. Must be **outside** `data/`. |
| `--include-research` | Include `data/deep_research/` (skipped by default — research runs are large). |
| `--include-attachments` | Include `data/mail-attachments/` (skipped by default — cached IMAP extractions, re-derivable). |
By default the snapshot includes everything under `data/` **except**
`deep_research/` and `mail-attachments/`. Personal uploads and documents are
included.
```bash
# Snapshot straight to a mounted NAS path
./scripts/odysseus-backup snapshot --out /mnt/nas/odysseus-$(date +%F).tar.gz
# Full snapshot including research runs and mail attachments
./scripts/odysseus-backup snapshot --include-research --include-attachments
```
### `list`
Lists the tarballs in `backups/`, most recent first, with size and modification
time.
### `verify PATH`
Opens the tarball read-only and walks every member to confirm it is intact and
safe to restore. Nothing is extracted. Use this before relying on an old backup
or after copying one across machines.
### `restore PATH --yes`
Overwrites `data/` from a tarball.
> **Restore is destructive.** It replaces the current `data/` directory. `--yes`
> is required so a mistyped command can't wipe your live state.
Restore is not a blind delete: before extracting, the tool **renames your current
`data/` to `data.before-restore-<timestamp>`** in the repository root. If a
restore turns out to be wrong, your previous state is still there — delete the
restored `data/` and rename the stashed directory back. The restore path is also
validated entry-by-entry: archives containing absolute paths, `..` segments,
symlinks, or anything outside `data/` are rejected.
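
The entry-by-entry check described above can be sketched with the standard `tarfile` module. This is an assumption about the shape of such a check, not the tool's implementation: `is_safe_member` and `check_archive` are hypothetical names, and rejecting hard links alongside symlinks is an extra assumption.

```python
import tarfile
from pathlib import PurePosixPath


def is_safe_member(member: tarfile.TarInfo, root: str = "data") -> bool:
    """Return True only for entries that stay inside root and are not links."""
    path = PurePosixPath(member.name)
    # Absolute paths and ".." segments could escape the restore directory.
    if path.is_absolute() or ".." in path.parts:
        return False
    # Links can point outside the tree even when their own name looks fine.
    if member.issym() or member.islnk():
        return False
    # The first path component must be the data directory itself.
    return bool(path.parts) and path.parts[0] == root


def check_archive(archive_path: str) -> list[str]:
    """List the names of entries a restore should refuse."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [m.name for m in tar if not is_safe_member(m)]
```

An empty list from `check_archive` would mean every entry passed; any name it returns is a reason to reject the archive before extracting anything.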
## Scheduling offsite backups
The tarball output composes cleanly with cron and any copy tool. For example, a
nightly snapshot copied offsite:
```cron
0 3 * * * cd /path/to/odysseus && ./scripts/odysseus-backup snapshot --out "/mnt/nas/odysseus-$(date +\%F).tar.gz"
```
To push the snapshot to remote storage instead, follow the snapshot command
with `scp`, `rclone`, `s3cmd`, or a similar copy step.
## Docker vs native installs
The tool reads `data/` and writes `backups/` relative to the repository root, so
where you run it matters:
- **Native installs**: run it from the repo root as shown above. `data/` and
`backups/` are both in the repo directory.
- **Docker**: `docker-compose.yml` bind-mounts the host's `./data` (the default
for `APP_DATA_DIR`) to
`/app/data`, so the live data is also present on the host. **Run the tool on
the host** from the repo root; the snapshot reads the bind-mounted `./data` and
writes to `./backups` on the host. Running it *inside* the container is not
recommended, because `backups/` is not a mounted volume and the tarball would
be lost when the container is recreated.
> **ChromaDB caveat (Docker only).** In the Docker setup, ChromaDB stores its
> vectors in a separate Compose-managed volume (declared as `chromadb-data`),
> **not** under `./data`. `odysseus-backup` therefore does not capture the Docker
> ChromaDB store. Back it up separately if you need it. Compose prefixes the
> volume with the project name, so find the real name first
> (`docker volume ls | grep chromadb`), then archive it — for example:
>
> ```bash
> docker run --rm -v <project>_chromadb-data:/data -v "$PWD":/backup \
> alpine tar czf /backup/chromadb.tar.gz -C /data .
> ```
>
> On native installs ChromaDB lives at `data/chroma/` and is included in the
> snapshot normally.
+14 -9
View File
@@ -1,14 +1,16 @@
# Security CI guide
This project runs a set of automated security checks on every pull request and
on every push to `main`. This page explains what each one does, whether it can
This project runs a set of automated security checks on pull requests and
selected branch pushes. This page explains what each one does, whether it can
block a merge, and the few one-time settings you should turn on to get the full
benefit.
## What runs, and why
Each check lives in its own file under `.github/workflows/`. They run
automatically; you do not start them.
Most checks live in files under `.github/workflows/`. CodeQL is configured
through GitHub's code scanning default setup, so it appears as a dynamic GitHub
workflow instead of a checked-in workflow file. They run automatically; you do
not start them.
| Check | What it protects against | Blocks a merge? |
|---|---|---|
@@ -88,11 +90,14 @@ let the workflows run on one pull request first, then add them here.
2. Turn on **Dependency graph** (usually on by default for public repos) -- this
powers Dependency review and Dependabot.
3. Turn on **Dependabot alerts** and **Dependabot security updates**.
4. Under **Code scanning**, you have two ways to scan the app code with CodeQL:
- The included `codeql.yml` workflow already scans `main` and runs weekly.
- To also scan **pull requests** (recommended, since most contributions come
from forks), click **Set up -> Default** under Code scanning. GitHub then
runs CodeQL on pull requests for you, with no token limitations.
4. Under **Code scanning**, use **Set up -> Default** for CodeQL. GitHub then
runs CodeQL as a dynamic workflow without the fork-token limitations that
affect checked-in advanced workflows.
Do not also add a checked-in CodeQL workflow while default setup is enabled:
GitHub rejects advanced CodeQL uploads when default setup is active. If the
project later needs an advanced CodeQL workflow, disable default setup first
and keep only one CodeQL publishing path active.
## Keeping it current
+438
View File
@@ -0,0 +1,438 @@
# Odysseus Setup Guide
This page keeps the detailed install, deployment, troubleshooting, and configuration notes out of the front README.
## Quick Start
> **Branch note:** `dev` is the default branch and contains the latest development changes, but it may be unstable. For the more stable curated branch, use [`main`](https://github.com/pewdiepie-archdaemon/odysseus/tree/main).
Defaults work out of the box: clone, run, then configure models/search/email
inside **Settings**. Only edit `.env` for deployment-level overrides like
`APP_BIND`, `APP_PORT`, `AUTH_ENABLED`, `DATABASE_URL`, or a pre-seeded admin password.
On first setup, Odysseus creates an admin account (`admin` unless
`ODYSSEUS_ADMIN_USER` is set) and prints a temporary password in the terminal.
For Docker installs, the same line is in `docker compose logs odysseus`.
Use that for the first login, then change it in **Settings**.
Contributing? See [CONTRIBUTING.md](../CONTRIBUTING.md) for setup, testing, and
pull request guidelines.
### Docker (recommended)
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
cp .env.example .env # optional, but recommended for explicit defaults
docker compose up -d --build
```
To include optional extras in the image (PDF viewer, Office extraction; includes AGPL PyMuPDF), build with `docker compose build --build-arg INSTALL_OPTIONAL=true` before `up`.
Open `http://localhost:7000` when the containers are healthy. Docker Compose
binds the web UI to `127.0.0.1` by default. If the port is taken, set
`APP_PORT=7001` in `.env` and recreate the container. Set `APP_BIND=0.0.0.0`
only when you intentionally want LAN/reverse-proxy access.
> **On Apple Silicon (M-series) Macs:** Docker can't reach the Metal GPU, so
> Cookbook serves local models on CPU only. For GPU-accelerated model serving,
> run natively instead — see [Apple Silicon](#apple-silicon) below.
### Native Linux / macOS
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
Requirements: Python 3.11+. Cookbook also needs `tmux` for background model
downloads and serves. The app itself is lightweight; local model serving is the
heavy part and depends on the model, runtime, GPU, and VRAM, so small hosts can
connect to API or remote model servers instead. Use `--host 0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
### Apple Silicon
Docker on macOS cannot use the Metal GPU. For GPU-accelerated Cookbook on an
M-series Mac, run Odysseus natively:
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
./start-macos.sh
```
It launches at `http://127.0.0.1:7860`. To expose it to your phone over a trusted LAN/VPN such as Tailscale, bind all interfaces:
```bash
ODYSSEUS_HOST=0.0.0.0 ./start-macos.sh
# then open http://<tailscale-ip>:7860
```
The script also reads `.env` at startup, so `APP_BIND=0.0.0.0` and `APP_PORT`
set there are picked up automatically without a command-line override each run.
Keep `AUTH_ENABLED=true` (the default) before binding outside loopback. Do not
expose this port directly to the public internet. To build a clickable app wrapper:
```bash
./build-macos-app.sh
```
<details>
<summary>Cookbook, GPU, Ollama, and troubleshooting notes</summary>
**Docker bundled services.** Compose starts Odysseus, ChromaDB, SearXNG, and
ntfy. Odysseus and the bundled service ports bind to `127.0.0.1` by default, so
they are reachable from the host but not exposed to your LAN/public internet
unless you opt in.
**Cookbook storage in Docker.** Downloads live in `./data/huggingface`
(`~/.cache/huggingface` in the container). Cookbook-installed Python CLIs and
serve engines live in `./data/local` (`~/.local` in the container), so they
survive container recreation.
**Remote servers.** In **Cookbook -> Settings -> Servers**, generate the
Odysseus SSH key and add the public key to the remote server's
`~/.ssh/authorized_keys`. From the host you can also run:
```bash
ssh-copy-id -i data/ssh/id_ed25519.pub user@server
```
**Docker GPU overlays.** CPU-only users can skip this section. Cookbook can
only detect GPUs that Docker exposes to the container — if the host runtime or
device passthrough is not configured, Cookbook sees the iGPU, another card, or
CPU instead of your intended GPU.
For NVIDIA, `scripts/check-docker-gpu.sh` diagnoses GPU passthrough and can
optionally install the host runtime or update `.env`.
```bash
# Read-only diagnostic (default — installs nothing, never edits .env):
scripts/check-docker-gpu.sh
# Print OS-specific install commands without running them:
scripts/check-docker-gpu.sh --print-install-commands
# Install NVIDIA Container Toolkit on Ubuntu/Debian (requires sudo):
scripts/check-docker-gpu.sh --install-nvidia-toolkit
# Write COMPOSE_FILE to .env (only when GPU passthrough is confirmed working):
scripts/check-docker-gpu.sh --enable-nvidia-overlay
# Full assisted setup — install toolkit, then enable overlay if passthrough works:
scripts/check-docker-gpu.sh --install-nvidia-toolkit --enable-nvidia-overlay
```
Safety notes:
- The app never installs host GPU runtime automatically.
- The app never edits `.env` automatically.
- `.env` is only modified when `--enable-nvidia-overlay` is explicitly passed,
and only after GPU passthrough succeeds. `--yes` skips prompts but does not
bypass the passthrough gate.
- `.env.bak.*` backups created by `--enable-nvidia-overlay` are ignored by
Git and the Docker build context.
To enable manually without the script, add this to `.env`:
```bash
COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml
```
**AMD / ROCm.** AMD setup is a read-only diagnostic plus a manual `.env` edit. Run:
```bash
scripts/check-docker-amd-gpu.sh
```
Then add the reported values to `.env`, replacing `RENDER_GID` with your host's
numeric render group id:
```bash
COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml
RENDER_GID=989
```
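If you would rather look up the render group id yourself, the following is a minimal sketch using Python's standard `grp` module. It assumes a Linux host where the group is actually named `render`; the helper name `render_gid` is hypothetical and not part of Odysseus.

```python
import grp  # Unix-only standard-library module


def render_gid(group_name="render"):
    """Return the numeric gid of group_name, or None if the host has no such group."""
    try:
        return grp.getgrnam(group_name).gr_gid
    except KeyError:
        return None


if __name__ == "__main__":
    gid = render_gid()
    if gid is None:
        print("No 'render' group found; rely on the diagnostic script output instead.")
    else:
        # Paste this line into .env next to COMPOSE_FILE.
        print(f"RENDER_GID={gid}")
```

The value printed on your host will usually differ from the `989` shown in the example above.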
For NVIDIA/AMD GPU support, also read the comments in the selected overlay file: `docker/gpu.nvidia.yml` or `docker/gpu.amd.yml`.
**Stack-management UIs (Portainer, Coolify, Dockhand, etc.).** These tools
often accept only a single Compose file and do not reliably honor `COMPOSE_FILE`
or multiple `-f` overlays. CLI users should keep using the `COMPOSE_FILE`
overlay workflow above. For stack UIs, point the stack at one of the standalone
files instead, which bundle the base stack plus the GPU settings:
- `docker-compose.gpu-nvidia.yml` — still requires the NVIDIA Container Toolkit
on the host.
- `docker-compose.gpu-amd.yml` — still requires host ROCm/kfd/DRI setup, the
`video`/`render` group membership, and `RENDER_GID` when needed.
The base `docker-compose.yml` plus the `docker/gpu.*.yml` overlays remain the
source of truth; the standalone files mirror them for single-file deployments.
Verify after enabling either overlay:
```bash
docker compose exec odysseus nvidia-smi -L # NVIDIA
docker compose exec odysseus sh -lc 'test -e /dev/kfd && test -d /dev/dri && ls -l /dev/kfd /dev/dri/renderD*' # AMD
```
> **GPU passthrough ≠ llama.cpp CUDA.** `nvidia-smi` passing inside the
> container confirms Docker GPU access, but llama.cpp also needs `cudart` and
> the CUDA Toolkit at runtime. If Cookbook logs show `Unable to find cudart
> library`, `Could NOT find CUDAToolkit`, `CUDA Toolkit not found`, or
> tensors/layers assigned to CPU, that is a Cookbook/llama.cpp build issue —
> not a Docker passthrough failure. Reinstall the serve engine via
> **Cookbook → Dependencies** to get a CUDA-enabled build.
>
> The same split applies to AMD/ROCm: seeing `/dev/kfd` and `/dev/dri` inside
> the container confirms device passthrough, not ROCm userspace or a
> ROCm-enabled vLLM/llama.cpp build. `rocm-smi` and `rocminfo` are not expected
> inside the slim Odysseus image.
**Ollama with Docker.** If Ollama runs on the host, add this endpoint in
Settings:
```text
http://host.docker.internal:11434/v1
```
Ollama must listen outside its own loopback interface:
```bash
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
This connects Odysseus in Docker to an Ollama server that is already running on
your host machine; it does not start Ollama inside the container.
`host.docker.internal` is Docker's hostname for the host machine from inside the
container. Cookbook **Serve** is a separate workflow for serving downloaded
models through Odysseus/llama.cpp, so Windows users with an existing Ollama
install usually only need to add the endpoint in Settings.
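To confirm the endpoint is reachable before adding it in Settings, a small probe like the sketch below may help. It assumes the Ollama server answers the OpenAI-style model listing at `/v1/models`; the helper names `models_url` and `list_models` are illustrative only and not part of Odysseus.

```python
import json
import urllib.request


def models_url(base_url):
    """Join an OpenAI-compatible base URL with the model listing path."""
    return base_url.rstrip("/") + "/models"


def list_models(base_url, timeout=5.0):
    """Return the model ids reported by the endpoint.

    Raises urllib.error.URLError (an OSError) when the server is unreachable.
    """
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry.get("id") for entry in payload.get("data", [])]


if __name__ == "__main__":
    # On the host use localhost; inside the container use host.docker.internal.
    print(list_models("http://localhost:11434/v1"))
```

If the probe works on the host but fails from inside the container, the usual cause is Ollama still listening only on its own loopback interface.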
**Useful checks.**
```bash
docker compose ps
docker compose logs --tail=120 odysseus
docker compose logs odysseus | grep -E 'ChromaDB|MemoryVectorStore|DEGRADED'
```
**macOS details.** `start-macos.sh` installs Homebrew deps, creates the venv,
runs setup, and starts uvicorn on port `7860` because AirPlay often holds
`7000`. It uses llama.cpp/Ollama for Metal. vLLM/SGLang are CUDA/ROCm-only and
do not run on macOS. MLX-only models are not served by Odysseus.
</details>
### Native Windows
**One-command launcher** (creates the venv, installs deps, runs setup, starts the
server; safe to re-run):
```powershell
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1
```
Or do it by hand:
```powershell
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
py -3.11 -m venv venv
venv\Scripts\Activate.ps1
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
```
If `python` points at an older interpreter, use `py -3.12` (or another installed
3.11+ version) for the venv step.
**Exposing on a LAN/Tailscale (Windows):** the launcher binds to `127.0.0.1` and
does **not** read `APP_BIND` / `ODYSSEUS_HOST` from `.env`, so editing `.env`
alone leaves the native Windows server on loopback. Pass the launcher's
`-BindHost` flag instead:
```powershell
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1 -BindHost 0.0.0.0
```
The manual `uvicorn` command takes the same address as `--host 0.0.0.0`. Bind
outside loopback only for a trusted LAN/VPN such as Tailscale: keep
`AUTH_ENABLED=true` and do not expose the port directly to the public internet.
**Requirements:** Python 3.11+. The core app (chat, agent, memory, documents,
email, calendar, deep research) runs fully native. For full **Cookbook** background
model downloads and the agent shell tool, also install
[Git for Windows](https://git-scm.com/download/win) (provides `bash.exe`).
Local GPU *serving* of vLLM/SGLang needs Linux/WSL2; for a local model on Windows,
[Ollama](https://ollama.com/download) is the easiest path — point Odysseus at
`http://localhost:11434/v1` in Settings.
Open `http://localhost:7000`, log in with the generated admin password,
and configure everything else inside **Settings**.
## Troubleshooting & Advanced Setup
### `chromadb-client` conflicts with embedded ChromaDB
If `chromadb-client` (the lightweight HTTP-only package) is installed alongside the full `chromadb` package, Odysseus starts but ChromaDB silently falls back to HTTP-only mode and fails.
**Fix:** uninstall `chromadb-client` and force-reinstall the full package:
```bash
./venv/bin/pip uninstall chromadb-client -y
./venv/bin/pip install --force-reinstall chromadb
```
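To check whether both packages are present before uninstalling anything, a sketch along these lines can be run with the venv's interpreter. It relies only on the standard `importlib.metadata` module; the helper name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(names):
    """Map each installed distribution name in names to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; leave it out of the result.
            pass
    return found


if __name__ == "__main__":
    found = installed_versions(["chromadb", "chromadb-client"])
    print(found)
    if "chromadb" in found and "chromadb-client" in found:
        print("Both packages are installed; apply the fix above.")
```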
### HTTPS + LAN/Tailscale exposure
To expose Odysseus on a local network or Tailscale with HTTPS:
1. Change the bind address to `0.0.0.0` in `.env` (`APP_BIND=0.0.0.0` or `ODYSSEUS_HOST=0.0.0.0`).
2. Generate a locally-trusted cert for your LAN/Tailscale IPs using [mkcert](https://github.com/FiloSottile/mkcert):
```bash
mkcert -install
mkcert -cert-file cert.pem -key-file key.pem 192.168.1.100 tailscale-ip
```
3. Run `uvicorn` with the generated certs:
```bash
python -m uvicorn app:app --host 0.0.0.0 --port 7000 --ssl-certfile=cert.pem --ssl-keyfile=key.pem
```
4. Install the `mkcert` CA on any other device you want to access Odysseus from (e.g., for iOS, email the `rootCA.pem` to yourself, install the profile, and trust it in Certificate Trust Settings).
### Optional Dependencies
`requirements-optional.txt` contains packages that unlock extra features. It is not installed by default.
| Package | Feature unlocked |
|---------|-----------------|
| `faster-whisper` | Local speech-to-text (microphone -> text) via the "local" STT provider. |
| `ddgs` | DuckDuckGo as a search provider option. |
| `PyMuPDF` | PDF page rendering in the side viewer panel and form-filling. (Note: AGPL-3.0) |
| `markitdown` | Office/EPUB document text extraction (converts .docx/.xlsx/.pptx/.xls/.epub to Markdown). |
### Faster, reproducible installs with uv (optional)
[uv](https://docs.astral.sh/uv/) works as a drop-in replacement for the
venv + pip steps in the native install guides. No project changes are needed; you get faster installs and, optionally, a lockfile for reproducible environments. After [installing `uv`](https://docs.astral.sh/uv/getting-started/installation/), use:
```bash
uv venv venv --python 3.13
uv pip install -r requirements.txt
# then continue as usual: python setup.py, uvicorn, ...
```
`requirements.txt` is intentionally unpinned, so two installs at different times can produce different package versions. If you want a reproducible environment (e.g. across your own machines, or to roll back after a bad upgrade), snapshot and restore exact versions with:
```bash
uv pip compile requirements.txt -o requirements.lock # snapshot current resolution
uv pip sync requirements.lock # reproduce it exactly later
```
`requirements.lock` is gitignored and platform-specific (compile it on the OS you deploy to). Regenerate it deliberately when you want to take upgrades. The plain `uv pip install -r requirements.txt` keeps following the unpinned requirements like pip does.
### Outlook / Office 365 email
Odysseus email accounts currently use IMAP/SMTP username-password auth. Outlook
and Microsoft 365 generally require OAuth instead, so normal Microsoft mailbox
passwords will fail. See [docs/email-outlook.md](docs/email-outlook.md) for the
current limitation and the planned integration direction.
## Security Notes
Odysseus is a self-hosted workspace with powerful local tools: shell access, file uploads, model downloads, web research, email/calendar integrations, and API tokens. Treat it like an admin console.
- Keep `AUTH_ENABLED=true` for any network-accessible deployment.
- Keep `LOCALHOST_BYPASS=false` outside local development.
- Use `SECURE_COOKIES=true` when Odysseus is served through HTTPS by a trusted reverse proxy or private access gateway.
- Do not expose it directly to the public internet without HTTPS and a trusted reverse proxy or private access layer.
- Keep `.env`, `data/`, `logs/`, databases, uploads, generated media, backups, auth/session files, API keys, and model/provider tokens out of Git and private shares. They are ignored by default.
- Review `data/auth.json` after first boot: disable open signup unless you intentionally want it, make only your own account admin, and keep demo/test accounts non-admin.
- Non-admin users do not get shell/Python/file read/write by default, and admin-only routes/tools such as MCP management, API tokens, webhooks, model/cookbook serving, backup/vault, and app settings are admin-gated. Other features are controlled by per-user privileges, so review each user's privileges before exposing a deployment.
- Rotate any API keys or tokens that were ever pasted into a shared chat, demo, screenshot, or log.
- If you enable API tokens or webhooks, create separate tokens per integration and delete unused ones.
- Prefer binding manual development runs to `127.0.0.1`; bind to `0.0.0.0` only when you intentionally want LAN/reverse-proxy access.
- Keep ChromaDB, SearXNG, ntfy, Ollama, vLLM, llama.cpp, databases, and raw model/provider APIs internal-only. Expose only the authenticated Odysseus web/API entrypoint through your trusted proxy or private access layer.
- Before publishing a fork, run `git status --short` and confirm no private files from `.env`, `data/`, `logs/`, uploads, backups, or local databases are staged.
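As a rough aid for the bind-address advice in the list above, the sketch below classifies a bind host as loopback-only or not. It is an illustration built on the standard `ipaddress` module, not a check that Odysseus itself performs, and the name `is_loopback_bind` is hypothetical.

```python
import ipaddress


def is_loopback_bind(host):
    """Return True when host only accepts connections from the same machine."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): assume it may be reachable.
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.100"):
        label = "loopback only" if is_loopback_bind(candidate) else "network reachable"
        print(f"{candidate:15} {label}")
```

Anything reported as network reachable should only be used together with `AUTH_ENABLED=true` and a trusted LAN, VPN, or proxy in front.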
### Private or proxied deployments
Odysseus serves plain HTTP on its app port. Docker Compose binds Odysseus and the bundled services to `127.0.0.1` by default, so a typical production/private setup is:
1. Keep Odysseus on localhost, for example `127.0.0.1:7000`.
2. Terminate HTTPS at a trusted reverse proxy or private access gateway.
3. Put the authenticated Odysseus web/API entrypoint behind that layer.
4. Keep raw service and model ports internal-only.
Cloudflare Access, Tailscale, Caddy, nginx, and Traefik can all fit this pattern; none are required by Odysseus. If your access layer reaches Odysseus on the same host, proxy to `http://127.0.0.1:7000` and keep `AUTH_ENABLED=true`, `LOCALHOST_BYPASS=false`, and `SECURE_COOKIES=true`.
`ALLOWED_ORIGINS` lists exact permitted origins for cross-origin browser/API clients; ordinary same-origin reverse-proxy access usually does not need a special CORS entry.
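One way to read "exact permitted origins" is sketched below: the comma-separated value is split into a list, and a request origin must match an entry character for character, including scheme and port. This is an assumption about the matching rule for illustration, not Odysseus's actual implementation, and both function names are hypothetical.

```python
def parse_allowed_origins(raw):
    """Split a comma-separated ALLOWED_ORIGINS value into a list of origins."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def origin_allowed(origin, allowed):
    """Exact match only: no wildcards, no prefix matching, port included."""
    return origin in allowed


if __name__ == "__main__":
    allowed = parse_allowed_origins("http://localhost,http://127.0.0.1")
    print(origin_allowed("http://localhost", allowed))
    print(origin_allowed("http://localhost:7000", allowed))
```

Under this reading, a front-end served from a different port or scheme needs its own entry in the list.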
Common internal-only ports from the default docs/compose setup:
| Port | Service |
|---|---|
| `7000` | Odysseus raw app port |
| `8080` | SearXNG |
| `8091` | ntfy |
| `8100` | ChromaDB host port for manual/compose access |
| `11434` | Ollama |
| `8000-8020` | Common local model/provider APIs |
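To see which of these ports are actually listening, a quick TCP probe such as the sketch below can be run on the host. The port-to-service mapping is copied from the table above; the helper name `port_open` is made up for this example. Running the same probe from another machine against the host's LAN address should report every port as closed when the defaults are in place.

```python
import socket


def port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    services = {7000: "Odysseus", 8080: "SearXNG", 8091: "ntfy",
                8100: "ChromaDB", 11434: "Ollama"}
    for port, name in services.items():
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{name:10} 127.0.0.1:{port} {state}")
```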
## Configuration
Most setup is done inside the app with `/setup` or **Settings**. Use `.env`
for deployment-level defaults and secrets you want present before first boot.
Key settings:
| Variable | Default | Description |
|---|---|---|
| `LLM_HOST` | `localhost` | Your LLM server (e.g. `llm-host.local:8000`) |
| `LLM_HOSTS` | -- | Comma-separated list for model discovery |
| `OPENAI_API_KEY` | -- | Optional OpenAI key. Prefer adding providers in the app unless pre-seeding. |
| `SEARXNG_INSTANCE` | `http://localhost:8080` | SearXNG URL. Docker overrides this to `http://searxng:8080`. |
| `SEARXNG_SECRET` | generated on first Docker boot | Optional SearXNG cookie/CSRF secret. Leave blank unless you need to pin it. |
| `APP_BIND` | `127.0.0.1` | Docker Compose host bind address for the web UI. Use `0.0.0.0` only for intentional LAN/reverse-proxy access. |
| `APP_PORT` | `7000` | Docker Compose host port for the web UI. |
| `APP_DATA_DIR` | `./data` | Docker Compose host directory for application data volumes. |
| `APP_LOGS_DIR` | `./logs` | Docker Compose host directory for application logs. |
| `AUTH_ENABLED` | `true` | Enable/disable login |
| `LOCALHOST_BYPASS` | `false` | Development-only auth bypass for loopback requests. Keep false for shared/network deployments. |
| `ALLOWED_ORIGINS` | `http://localhost,http://127.0.0.1` | Comma-separated exact permitted origins for cross-origin browser/API clients. |
| `SECURE_COOKIES` | `false` | Set true when serving Odysseus through HTTPS at a trusted proxy or private access gateway. |
| `DATABASE_URL` | `sqlite:///./data/app.db` | Database connection string |
| `CHROMADB_HOST` | `localhost` | ChromaDB host for vector memory. Docker overrides this to `chromadb`. |
| `CHROMADB_PORT` | `8100` | ChromaDB port for manual host runs. Docker overrides this to `8000`. |
| `EMBEDDING_URL` | -- | OpenAI-compatible embeddings endpoint |
| `ODYSSEUS_CHAT_UPLOAD_MAX_BYTES` | `10485760` | Chat/agent attachment cap in bytes (10 MB). Raise for larger local PDFs or text documents. |
| `ODYSSEUS_GALLERY_UPLOAD_MAX_BYTES` | `104857600` | Gallery image upload cap in bytes (100 MB). |
| `ODYSSEUS_GALLERY_TRANSFORM_UPLOAD_MAX_BYTES` | `26214400` | Gallery transform input cap in bytes (25 MB). |
| `ODYSSEUS_MEMORY_IMPORT_MAX_BYTES` | `10485760` | Memory import file cap in bytes (10 MB). |
| `ODYSSEUS_PERSONAL_UPLOAD_MAX_BYTES` | `26214400` | Personal document upload cap in bytes (25 MB). |
| `ODYSSEUS_EMAIL_COMPOSE_UPLOAD_MAX_BYTES` | `26214400` | Email compose attachment cap in bytes (25 MB). |
| `ODYSSEUS_STT_MAX_AUDIO_BYTES` | `26214400` | Speech-to-text audio cap in bytes (25 MB). |
| `ODYSSEUS_ICS_MAX_BYTES` | `10485760` | Calendar `.ics` import cap in bytes (10 MB). |
All upload-limit vars are validated (must be a positive integer) and optional; an invalid value fails fast at startup.
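The validation rule can be pictured with the sketch below: an unset variable falls back to its default, and anything that is not a positive integer raises immediately. This illustrates the described behaviour rather than the project's own code; the function name `read_byte_limit` is hypothetical, and treating a blank value as unset is an assumption.

```python
import os


def read_byte_limit(name, default, environ=None):
    """Return the positive integer limit for name, or default when it is unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "").strip()
    if not raw:
        # Assumption: a blank value behaves like an unset variable.
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


if __name__ == "__main__":
    # 25 MB expressed in bytes, matching the table's 26214400.
    print(read_byte_limit("ODYSSEUS_STT_MAX_AUDIO_BYTES", 25 * 1024 * 1024))
```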
### Built-in MCP servers (optional setup)
Odysseus auto-registers a few built-in MCP servers at startup. The npx-based ones (currently the browser server, `@playwright/mcp`) only start when their npm package is already in the local npx cache. If a package isn't cached, that server is skipped with a startup log message explaining what to do, so a fresh install does not block on a multi-minute npm download or hang if Playwright system deps are missing.
To enable the browser MCP (page navigation, screenshots, vision), run once:
```bash
npx -y @playwright/mcp@latest --version
```
That installs `@playwright/mcp` plus Playwright (~300MB total). Restart Odysseus and the server will register at startup.
## Architecture
```
app.py      # FastAPI entry point
core/       auth, database, middleware, constants
src/        llm_core, agent_loop, agent_tools, chat_processor, search/
routes/     chat, session, document, memory, model … endpoints
services/   docs, memory, search, hwfit (Cookbook) …
static/     index.html + app.js + style.css + js/ (modular front-end)
docs/       landing page (index.html) + preview clips
```
## Data
All user data lives in `data/` (gitignored): `app.db` (sessions, messages, documents),
`memory.json`, `presets.json`, `uploads/`, `personal_docs/`, `chroma/`, `settings.json`.
To back up or restore everything in `data/`, see the
[Backup & Restore guide](docs/backup-restore.md).
@@ -102,6 +102,7 @@ python3 ~/.claude/skills/odysseus/scripts/odysseus_api.py POST /api/codex/memory
## Email draft + send
- Prefer `POST /api/codex/emails/draft-document` for agent-written email replies. It creates an editable Odysseus Document with `language: "email"` and does not touch IMAP/send.
- `POST /api/codex/emails/draft` — body matches `SendEmailRequest` (`to`, `cc`, `bcc`, `subject`, `body`, `body_html`, `attachments`, `account_id`, `in_reply_to`, `references`). Requires `email:draft` (or `email:send`).
- `POST /api/codex/emails/send` — same body. Requires `email:send`. Never send without explicit user instruction.
@@ -17,6 +17,11 @@ def _usage() -> int:
    print(" odysseus_api.py todos add TITLE", file=sys.stderr)
    print(" odysseus_api.py emails list [limit]", file=sys.stderr)
    print(" odysseus_api.py emails read UID", file=sys.stderr)
    print(" odysseus_api.py emails draft-doc JSON_PAYLOAD", file=sys.stderr)
    print(" odysseus_api.py documents list [limit]", file=sys.stderr)
    print(" odysseus_api.py documents read DOC_ID", file=sys.stderr)
    print(" odysseus_api.py documents create JSON_PAYLOAD", file=sys.stderr)
    print(" odysseus_api.py documents delete DOC_ID", file=sys.stderr)
    print(" odysseus_api.py cookbook tasks", file=sys.stderr)
    print(" odysseus_api.py cookbook servers", file=sys.stderr)
    print(" odysseus_api.py cookbook cached [HOST]", file=sys.stderr)
@@ -79,6 +84,33 @@ def main() -> int:
            method = "GET"
            path = f"/api/codex/emails/{sys.argv[3]}"
            body = None
        elif action in ("draft-doc", "draft_document") and len(sys.argv) >= 4:
            method = "POST"
            path = "/api/codex/emails/draft-document"
            body = " ".join(sys.argv[3:])
        else:
            return _usage()
    elif command in ("documents", "docs"):
        if len(sys.argv) < 3:
            return _usage()
        action = sys.argv[2].lower()
        if action == "list":
            method = "GET"
            limit = sys.argv[3] if len(sys.argv) >= 4 else "50"
            path = f"/api/codex/documents?limit={limit}"
            body = None
        elif action == "read" and len(sys.argv) >= 4:
            method = "GET"
            path = f"/api/codex/documents/{sys.argv[3]}"
            body = None
        elif action == "create" and len(sys.argv) >= 4:
            method = "POST"
            path = "/api/codex/documents"
            body = " ".join(sys.argv[3:])
        elif action == "delete" and len(sys.argv) >= 4:
            method = "DELETE"
            path = f"/api/codex/documents/{sys.argv[3]}"
            body = None
        else:
            return _usage()
    elif command == "cookbook":
@@ -17,6 +17,11 @@ def _usage() -> int:
    print(" odysseus_api.py todos add TITLE", file=sys.stderr)
    print(" odysseus_api.py emails list [limit]", file=sys.stderr)
    print(" odysseus_api.py emails read UID", file=sys.stderr)
    print(" odysseus_api.py emails draft-doc JSON_PAYLOAD", file=sys.stderr)
    print(" odysseus_api.py documents list [limit]", file=sys.stderr)
    print(" odysseus_api.py documents read DOC_ID", file=sys.stderr)
    print(" odysseus_api.py documents create JSON_PAYLOAD", file=sys.stderr)
    print(" odysseus_api.py documents delete DOC_ID", file=sys.stderr)
    print(" odysseus_api.py cookbook tasks", file=sys.stderr)
    print(" odysseus_api.py cookbook servers", file=sys.stderr)
    print(" odysseus_api.py cookbook cached [HOST]", file=sys.stderr)
@@ -79,6 +84,33 @@ def main() -> int:
            method = "GET"
            path = f"/api/codex/emails/{sys.argv[3]}"
            body = None
        elif action in ("draft-doc", "draft_document") and len(sys.argv) >= 4:
            method = "POST"
            path = "/api/codex/emails/draft-document"
            body = " ".join(sys.argv[3:])
        else:
            return _usage()
    elif command in ("documents", "docs"):
        if len(sys.argv) < 3:
            return _usage()
        action = sys.argv[2].lower()
        if action == "list":
            method = "GET"
            limit = sys.argv[3] if len(sys.argv) >= 4 else "50"
            path = f"/api/codex/documents?limit={limit}"
            body = None
        elif action == "read" and len(sys.argv) >= 4:
            method = "GET"
            path = f"/api/codex/documents/{sys.argv[3]}"
            body = None
        elif action == "create" and len(sys.argv) >= 4:
            method = "POST"
            path = "/api/codex/documents"
            body = " ".join(sys.argv[3:])
        elif action == "delete" and len(sys.argv) >= 4:
            method = "DELETE"
            path = f"/api/codex/documents/{sys.argv[3]}"
            body = None
        else:
            return _usage()
    elif command == "cookbook":
@@ -102,6 +102,7 @@ python3 integrations/codex/scripts/odysseus_api.py POST /api/codex/memory '{"tex
## Email draft + send
- Prefer `POST /api/codex/emails/draft-document` for Codex-written email replies. It creates an editable Odysseus Document with `language: "email"` and does not touch IMAP/send.
- `POST /api/codex/emails/draft` — body matches `SendEmailRequest` (`to`, `cc`, `bcc`, `subject`, `body`, `body_html`, `attachments`, `account_id`, `in_reply_to`, `references`). Requires `email:draft` (or `email:send`).
- `POST /api/codex/emails/send` — same body. Requires `email:send`. Never send without explicit user instruction.
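The `documents` subcommands added to `odysseus_api.py` in this change map command-line arguments onto HTTP requests. The sketch below restates that mapping as a standalone function so it can be read and tested in isolation. The function name `resolve_documents` and the tuple return shape are assumptions made for this illustration; the script itself assigns `method`, `path`, and `body` inline, and the JSON in the usage line is a placeholder rather than a documented payload.

```python
def resolve_documents(args):
    """Map `documents <action> [...]` arguments to (method, path, body).

    Returns None for unknown or incomplete usage, where the script would
    print its usage text instead.
    """
    if not args:
        return None
    action, rest = args[0].lower(), args[1:]
    if action == "list":
        limit = rest[0] if rest else "50"  # same default as the script
        return ("GET", f"/api/codex/documents?limit={limit}", None)
    if action == "read" and rest:
        return ("GET", f"/api/codex/documents/{rest[0]}", None)
    if action == "create" and rest:
        # Remaining arguments are joined with spaces to form the JSON body.
        return ("POST", "/api/codex/documents", " ".join(rest))
    if action == "delete" and rest:
        return ("DELETE", f"/api/codex/documents/{rest[0]}", None)
    return None


if __name__ == "__main__":
    print(resolve_documents(["list"]))
    print(resolve_documents(["create", '{"placeholder":', "true}"]))
```

Because the body is rebuilt by joining arguments with single spaces, a JSON payload that the shell splits into several words still arrives as one string, although runs of whitespace inside it are not preserved.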
+22 -1
View File
@@ -105,6 +105,14 @@ if (-not $pyExe) {
}
}
if ($pyExe -like "*WindowsApps*python.exe") {
    $pyCmd = Get-Command py -ErrorAction SilentlyContinue
    if ($pyCmd) {
        $pyExe = $pyCmd.Source
        $pyArgs = @("-3.11")
    }
}
if (-not $pyExe) {
    Fail "Couldn't find Python 3.11+ for Windows setup. Install Python 3.11+ (or open the Python launcher with 'py -3.11') from https://www.python.org/downloads/, then re-run this script."
}
@@ -141,7 +149,20 @@ if (-not (Find-GitBash)) {
Write-Host " https://git-scm.com/download/win" -ForegroundColor Yellow
}
# 6. Start the server (use `python -m uvicorn` - bare `uvicorn` may not be on PATH)
# 6. Point CUDA_PATH at a real CUDA toolkit so GPU llama-cpp-python can import.
$cudaBase = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA"
if (Test-Path $cudaBase) {
    $cudaBest = Get-ChildItem $cudaBase -Directory -ErrorAction SilentlyContinue |
        Where-Object { Test-Path (Join-Path $_.FullName "bin") } |
        Sort-Object { try { [version]($_.Name -replace "^v", "") } catch { [version]"0.0" } } -Descending |
        Select-Object -First 1
    if ($cudaBest) {
        $env:CUDA_PATH = $cudaBest.FullName
        Write-Host ("Using CUDA_PATH = " + $cudaBest.FullName) -ForegroundColor Cyan
    }
}
# 7. Start the server (use `python -m uvicorn` - bare `uvicorn` may not be on PATH)
Write-Step ("Starting Odysseus at http://{0}:{1}" -f $BindHost, $Port)
Write-Host "Press Ctrl+C to stop."
Write-Host ""
+142
View File
@@ -0,0 +1,142 @@
# launcher.py
"""Dedicated entrypoint for the standalone Windows portable launcher.
Handles:
- Immediate GUI splash screen creation using tkinter.
- Suppressing console stream crashes in windowed GUI mode via NullWriter.
- Spawning system tray icon via pystray and Pillow (lazy-loaded).
- Auto-opening default browser pointing to the running backend.
- Launching the FastAPI server (importing and running app.py).
"""
import os
import sys
import threading
import time
import webbrowser
# Define a dummy NullWriter to suppress standard stream crashes (isatty etc.) in GUI mode
class NullWriter:
    def write(self, text):
        pass

    def flush(self):
        pass

    def isatty(self):
        return False


if sys.stdout is None:
    sys.stdout = NullWriter()
if sys.stderr is None:
    sys.stderr = NullWriter()
splash_root = None
# If running from a frozen PyInstaller bundle, launch the splash screen IMMEDIATELY
if getattr(sys, 'frozen', False):
    import tkinter as tk

    def show_splash_instantly():
        global splash_root
        try:
            splash_root = tk.Tk()
            splash_root.title("Odysseus")
            splash_root.overrideredirect(True)
            splash_root.configure(bg="#1a1c23")
            # Accented borders
            splash_root.config(highlightbackground="#e06c75", highlightcolor="#e06c75", highlightthickness=1)
            w, h = 360, 160
            ws = splash_root.winfo_screenwidth()
            hs = splash_root.winfo_screenheight()
            x = (ws - w) // 2
            y = (hs - h) // 2
            splash_root.geometry(f"{w}x{h}+{x}+{y}")
            tk.Label(splash_root, text="⛵ Odysseus", font=("Segoe UI", 22, "bold"), bg="#1a1c23", fg="#e06c75").pack(pady=(22, 2))
            tk.Label(splash_root, text="Launching background services...", font=("Segoe UI", 10), bg="#1a1c23", fg="#d1d4e0").pack(pady=2)
            tk.Label(splash_root, text="Please wait, this will take a few seconds.", font=("Segoe UI", 8, "italic"), bg="#1a1c23", fg="#5c6370").pack(pady=(12, 0))
            splash_root.attributes("-topmost", True)
            splash_root.mainloop()
        except Exception:
            pass

    # Launch the GUI splash screen immediately on a background thread
    threading.Thread(target=show_splash_instantly, daemon=True).start()
def create_tray_image():
    # Generate a beautiful 64x64 icon matching Odysseus brand red accent (#e06c75)
    from PIL import Image, ImageDraw
    image = Image.new('RGBA', (64, 64), (0, 0, 0, 0))
    dc = ImageDraw.Draw(image)
    accent_red = (224, 108, 117, 255)
    light_red = (224, 108, 117, 150)
    # Draw premium sailing boat
    dc.polygon([(32, 10), (32, 45), (12, 45)], fill=accent_red)
    dc.polygon([(32, 18), (32, 45), (48, 45)], fill=light_red)
    dc.polygon([(8, 48), (56, 48), (44, 56), (20, 56)], fill=accent_red)
    return image


def on_open_browser(icon, item, url):
    webbrowser.open(url)


def on_exit(icon, item):
    icon.stop()
    os._exit(0)


def setup_system_tray(url):
    try:
        import pystray
        icon_img = create_tray_image()
        menu = (
            pystray.MenuItem('Open Odysseus', lambda icon, item: on_open_browser(icon, item, url), default=True),
            pystray.MenuItem('Exit', on_exit)
        )
        tray_icon = pystray.Icon(
            "Odysseus",
            icon_img,
            "Odysseus",
            menu
        )
        tray_icon.run()
    except Exception:
        pass
def open_browser(url):
    # Allow uvicorn and app lifecycles to complete warmups
    time.sleep(3.5)
    # Safely close the splash screen
    try:
        global splash_root
        if splash_root:
            splash_root.after(0, splash_root.destroy)
    except Exception:
        pass
    webbrowser.open(url)


if __name__ == "__main__":
    import uvicorn
    # Import the FastAPI app from app.py
    from app import app
    bind_host = os.getenv("APP_BIND", "127.0.0.1")
    bind_port = int(os.getenv("APP_PORT", "7000"))
    url = f"http://{bind_host}:{bind_port}"
    if getattr(sys, 'frozen', False):
        # Start browser manager thread
        threading.Thread(target=open_browser, args=(url,), daemon=True).start()
        # Start system tray manager thread
        threading.Thread(target=setup_system_tray, args=(url,), daemon=True).start()
    uvicorn.run(app, host=bind_host, port=bind_port, log_level="info")
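Two details of the launcher above are worth illustrating: the browser URL is built directly from `APP_BIND`, and the browser opens after a fixed 3.5-second sleep. The sketch below shows one possible alternative, not what `launcher.py` does: it maps a wildcard bind address to loopback for the browser URL and polls the port until the server accepts connections. Both helper names, `browser_url` and `wait_for_port`, are hypothetical.

```python
import socket
import time


def browser_url(bind_host, port):
    """Build a URL a local browser can open, even for a wildcard bind address."""
    # 0.0.0.0 and :: mean "all interfaces" and are not addresses to browse to.
    host = "127.0.0.1" if bind_host in ("0.0.0.0", "::", "") else bind_host
    return f"http://{host}:{port}"


def wait_for_port(host, port, timeout=30.0, interval=0.25):
    """Poll until host:port accepts a TCP connection; False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    print(browser_url("0.0.0.0", 7000))
    print(wait_for_port("127.0.0.1", 7000, timeout=2.0))
```

Polling trades the fixed delay for a bounded wait, so a slow first start does not open the browser on a page that is not ready yet.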
+94
View File
@@ -0,0 +1,94 @@
Copyright (c) 2019-07-29, Abbie Gonzalez (https://abbiecod.es|support@abbiecod.es),
with Reserved Font Name OpenDyslexic.
Copyright (c) 12/2012 - 2019
This Font Software is licensed under the SIL Open Font License, Version 1.1.
This license is copied below, and is also available with a FAQ at:
http://scripts.sil.org/OFL
-----------------------------------------------------------
SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
-----------------------------------------------------------
PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide
development of collaborative font projects, to support the font creation
efforts of academic and linguistic communities, and to provide a free and
open framework in which fonts may be shared and improved in partnership
with others.
The OFL allows the licensed fonts to be used, studied, modified and
redistributed freely as long as they are not sold by themselves. The
fonts, including any derivative works, can be bundled, embedded,
redistributed and/or sold with any software provided that any reserved
names are not used by derivative works. The fonts and derivatives,
however, cannot be released under any other type of license. The
requirement for fonts to remain under this license does not apply
to any document created using the fonts or their derivatives.
DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this license and clearly marked as such. This may
include source files, build scripts and documentation.
"Reserved Font Name" refers to any names specified as such after the
copyright statement(s).
"Original Version" refers to the collection of Font Software components as
distributed by the Copyright Holder(s).
"Modified Version" refers to any derivative made by adding to, deleting,
or substituting -- in part or in whole -- any of the components of the
Original Version, by changing formats or by porting the Font Software to a
new environment.
"Author" refers to any designer, engineer, programmer, technical
writer or other person who contributed to the Font Software.
PERMISSION & CONDITIONS
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Font Software, to use, study, copy, merge, embed, modify,
redistribute, and sell modified and unmodified copies of the Font
Software, subject to the following conditions:
1) Neither the Font Software nor any of its individual components,
in Original or Modified Versions, may be sold by itself.
2) Original or Modified Versions of the Font Software may be bundled,
redistributed and/or sold with any software, provided that each copy
contains the above copyright notice and this license. These can be
included either as stand-alone text files, human-readable headers or
in the appropriate machine-readable metadata fields within text or
binary files as long as those fields can be easily viewed by the user.
3) No Modified Version of the Font Software may use the Reserved Font
Name(s) unless explicit written permission is granted by the corresponding
Copyright Holder. This restriction only applies to the primary font name as
presented to the users.
4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
Software shall not be used to promote, endorse or advertise any
Modified Version, except to acknowledge the contribution(s) of the
Copyright Holder(s) and the Author(s) or with their explicit written
permission.
5) The Font Software, modified or unmodified, in part or in whole,
must be distributed entirely under this license, and must not be
distributed under any other license. The requirement for fonts to
remain under this license does not apply to any document created
using the Font Software.
TERMINATION
This license becomes null and void if any of the above conditions are
not met.
DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.
+190 -12
@@ -23,6 +23,7 @@ import os.path
from pathlib import Path
from datetime import datetime, timedelta
import uuid
from contextvars import ContextVar
from mcp.server import Server
from mcp.server.stdio import stdio_server
@@ -55,6 +56,8 @@ def _uid_fetch_rows(data) -> list:
# flat keys when no DB row matches (legacy single-account behaviour).
_ACCOUNT_CACHE: dict = {} # key = normalized account selector -> config dict
_MCP_OWNER_ARG = "_odysseus_owner"
_CURRENT_OWNER: ContextVar[str | None] = ContextVar("email_mcp_owner", default=None)
def _clean_header_value(value) -> str:
@@ -68,6 +71,45 @@ def _db_path() -> Path:
return Path(APP_DB)
def _current_owner() -> str:
owner = _CURRENT_OWNER.get()
return str(owner or "").strip()
def _account_visible_to_owner(row: dict, owner: str) -> bool:
row_owner = str(row.get("owner") or "").strip()
if row_owner == owner:
return True
if row_owner:
return False
# Legacy ownerless accounts are only visible to a scoped caller when the
# mailbox itself matches the owner, mirroring the HTTP email route fallback.
owner_l = owner.lower()
return owner_l in {
str(row.get("imap_user") or "").strip().lower(),
str(row.get("from_address") or "").strip().lower(),
}
def _filter_accounts_for_owner(rows: list[dict]) -> list[dict]:
owner = _current_owner()
if owner:
return [r for r in rows if _account_visible_to_owner(r, owner)]
owners = {str(r.get("owner") or "").strip() for r in rows if str(r.get("owner") or "").strip()}
if len(owners) > 1:
return []
return rows
def _mcp_owner_required(rows: list[dict] | None = None) -> bool:
if _current_owner():
return False
rows = rows if rows is not None else _read_accounts_from_db()
owners = {str(r.get("owner") or "").strip() for r in rows if str(r.get("owner") or "").strip()}
return len(owners) > 1
def _load_email_writing_style() -> str:
"""Return the existing Settings > Email > Writing Style value."""
try:
@@ -121,9 +163,8 @@ def _default_document_owner() -> str | None:
return None
def _list_accounts_raw() -> list:
"""Return list of dicts from the email_accounts table. Empty list if table
missing or empty. Never raises."""
def _read_accounts_from_db() -> list:
"""Return all enabled email account rows. Empty list if missing. Never raises."""
path = _db_path()
if not path.exists():
return []
@@ -131,9 +172,10 @@ def _list_accounts_raw() -> list:
conn = sqlite3.connect(str(path))
conn.row_factory = sqlite3.Row
columns = {r[1] for r in conn.execute("PRAGMA table_info(email_accounts)").fetchall()}
owner_select = "owner" if "owner" in columns else "NULL AS owner"
smtp_security_select = "smtp_security" if "smtp_security" in columns else "'' AS smtp_security"
rows = conn.execute(f"""
SELECT id, name, is_default, enabled,
SELECT id, {owner_select}, name, is_default, enabled,
imap_host, imap_port, imap_user, imap_password, imap_starttls,
smtp_host, smtp_port, {smtp_security_select}, smtp_user, smtp_password, from_address
FROM email_accounts WHERE enabled = 1
@@ -147,11 +189,15 @@ def _list_accounts_raw() -> list:
return []
def _resolve_account(selector: str | None) -> dict | None:
def _list_accounts_raw() -> list:
"""Return owner-visible email account rows for the active MCP call."""
return _filter_accounts_for_owner(_read_accounts_from_db())
def _resolve_account_from_rows(rows: list[dict], selector: str | None) -> dict | None:
"""Given a selector (None = default, or a name/user/id string), return the
matching row or None. Matching is case-insensitive substring on name +
imap_user + from_address, plus exact id match."""
rows = _list_accounts_raw()
if not rows:
return None
if not selector:
@@ -186,6 +232,10 @@ def _resolve_account(selector: str | None) -> dict | None:
return None
def _resolve_account(selector: str | None) -> dict | None:
return _resolve_account_from_rows(_list_accounts_raw(), selector)
def _load_config(account: str | None = None) -> dict:
"""Return the full config dict for the requested account (or default).
@@ -194,7 +244,7 @@ def _load_config(account: str | None = None) -> dict:
2. env vars + settings.json flat keys (legacy)
3. hardcoded fallbacks (localhost:31143 etc.)
"""
cache_key = (account or "").strip().lower() or "__default__"
cache_key = (_current_owner(), (account or "").strip().lower() or "__default__")
if cache_key in _ACCOUNT_CACHE:
return _ACCOUNT_CACHE[cache_key]
@@ -223,8 +273,11 @@ def _load_config(account: str | None = None) -> dict:
"account_name": None,
}
rows = _list_accounts_raw()
row = _resolve_account(account)
raw_rows = _read_accounts_from_db()
rows = _filter_accounts_for_owner(raw_rows)
row = _resolve_account_from_rows(rows, account)
if _current_owner() and raw_rows and not rows:
raise ValueError("No email account is configured for the authenticated owner")
if account and rows and not row:
available = ", ".join(
f"{r.get('name') or r.get('imap_user')} <{r.get('imap_user') or r.get('from_address') or '?'}>"
@@ -885,8 +938,109 @@ def _smtp_connect(account=None, cfg=None):
return conn
def _read_agent_email_confirm_setting() -> bool:
"""True if the user wants agent send_email/reply_to_email calls to be
queued for manual approval instead of SMTPed immediately. Defaults to
True so a fresh install is safe: agents have been observed inventing
signatures and sending to real recipients without the user's review."""
try:
from src.settings import get_setting
return bool(get_setting("agent_email_confirm", True))
except Exception:
return True
def _stash_agent_draft(*, to, subject, body, in_reply_to=None, references=None,
cc=None, bcc=None, account=None) -> dict:
"""Insert the composed email into scheduled_emails with status
'agent_draft' and a far-future send_at so the scheduled-send poller
never picks it up. Returns the pending payload the model surfaces to
the user (and that the chat UI can render as an approval card)."""
try:
from src.constants import SCHEDULED_EMAILS_DB
except Exception:
return {"success": False, "error": "Pending-email storage unavailable"}
pending_id = uuid.uuid4().hex[:16]
far_future = "9999-12-31T00:00:00"
now = datetime.utcnow().isoformat()
try:
conn = sqlite3.connect(SCHEDULED_EMAILS_DB)
# Touch the schema in case the email-routes init hasn't run yet
# (MCP server can boot independently).
conn.execute("""
CREATE TABLE IF NOT EXISTS scheduled_emails (
id TEXT PRIMARY KEY,
to_addr TEXT NOT NULL,
cc TEXT,
bcc TEXT,
subject TEXT,
body TEXT NOT NULL,
in_reply_to TEXT,
references_hdr TEXT,
attachments TEXT,
send_at TEXT NOT NULL,
created_at TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
error TEXT,
owner TEXT DEFAULT '',
account_id TEXT,
odysseus_kind TEXT
)
""")
conn.execute("""
INSERT INTO scheduled_emails
(id, to_addr, cc, bcc, subject, body, in_reply_to, references_hdr,
attachments, send_at, created_at, status, account_id, odysseus_kind, owner)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'agent_draft', ?, ?, ?)
""", (
pending_id,
to if isinstance(to, str) else ", ".join(to),
cc if isinstance(cc, str) else (", ".join(cc) if cc else None),
bcc if isinstance(bcc, str) else (", ".join(bcc) if bcc else None),
subject or "",
body or "",
in_reply_to or None,
references if isinstance(references, str) else (" ".join(references) if references else None),
"[]",
far_future,
now,
account or None,
"agent_draft",
_current_owner(),
))
conn.commit()
conn.close()
except Exception as e:
return {"success": False, "error": f"Failed to stash draft: {e}"}
return {
"success": True,
"pending": True,
"pending_id": pending_id,
"to": to if isinstance(to, str) else ", ".join(to),
"subject": subject or "",
"body": body or "",
"message": (
"✋ Draft staged for your approval — nothing has been sent yet.\n"
"Review the To/Subject/Body above. Reply 'send' to deliver, or "
"'cancel' to discard."
),
}
def _send_email(to, subject, body, in_reply_to=None, references=None, cc=None, bcc=None, account=None):
"""Send an email via SMTP. Returns dict with status."""
"""Send an email via SMTP. Returns dict with status.
When the `agent_email_confirm` setting is on (the default), the email
is NOT SMTPed; instead it lands in scheduled_emails as an
`agent_draft` row and the user reviews + approves it from the chat
UI. This closes the auto-send hole that let earlier models invent
signatures and ship them to real recipients without confirmation."""
if _read_agent_email_confirm_setting():
return _stash_agent_draft(
to=to, subject=subject, body=body,
in_reply_to=in_reply_to, references=references,
cc=cc, bcc=bcc, account=account,
)
send_account, cfg = _resolve_send_config(account)
msg = EmailMessage()
msg["From"] = _clean_header_value(cfg["from_address"])
@@ -1038,7 +1192,7 @@ def _create_email_draft_document(
doc_id = str(uuid.uuid4())
ver_id = str(uuid.uuid4())
doc_title = (title or subject or "Email draft").strip() or "Email draft"
doc_owner = _default_document_owner()
doc_owner = _current_owner() or _default_document_owner()
db = SessionLocal()
try:
@@ -1824,10 +1978,22 @@ async def list_tools() -> list[Tool]:
@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
arguments = dict(arguments) if isinstance(arguments, dict) else {}
owner = str(arguments.pop(_MCP_OWNER_ARG, "") or "").strip()
owner_token = _CURRENT_OWNER.set(owner or None)
try:
all_db_accounts = _read_accounts_from_db()
if _mcp_owner_required(all_db_accounts):
return [TextContent(
type="text",
text="Error: email MCP requires an authenticated owner when multiple email account owners are configured.",
)]
if name == "list_email_accounts":
rows = _list_accounts_raw()
rows = _filter_accounts_for_owner(all_db_accounts)
if not rows:
if all_db_accounts and owner:
return [TextContent(type="text", text="No email accounts configured for this owner.")]
return [TextContent(type="text", text="No email accounts configured. Legacy single-account mode active.")]
lines = [f"Found {len(rows)} email account(s):\n"]
for r in rows:
@@ -2007,6 +2173,16 @@ async def call_tool(name: str, arguments: dict) -> list[TextContent]:
bcc=arguments.get("bcc"),
account=acct,
)
if "error" in result:
return [TextContent(type="text", text=f"Error: {result['error']}")]
if result.get("pending"):
return [TextContent(
type="text",
text=(
f"Draft staged for approval (pending id: {result.get('pending_id')}). "
"Nothing has been sent yet. Review and approve it in Odysseus before delivery."
),
)]
acct_note = f" (from {result['account']})" if result.get("account") else ""
return [TextContent(type="text", text=f"Sent email to {result['to']} with subject '{result['subject']}'{acct_note}.")]
@@ -2182,6 +2358,8 @@ async def call_tool(name: str, arguments: dict) -> list[TextContent]:
except Exception as e:
return [TextContent(type="text", text=f"Error: {e}")]
finally:
_CURRENT_OWNER.reset(owner_token)
# ── Main ──
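The owner scoping in this file relies on setting a `ContextVar` at the top of `call_tool` and resetting it in `finally`. The sketch below illustrates that pattern in isolation. It is a simplified illustration, not the module's code: `handle_call`, `visible_accounts`, and the sample rows are hypothetical, and the legacy ownerless-mailbox fallback from `_account_visible_to_owner` is left out.

```python
from contextvars import ContextVar
from typing import Optional

# Hypothetical stand-ins for the module-level names used in the diff.
_CURRENT_OWNER: ContextVar[Optional[str]] = ContextVar("owner", default=None)
_OWNER_ARG = "_odysseus_owner"


def current_owner() -> str:
    """Return the owner for the active call, or "" when the call is unscoped."""
    return str(_CURRENT_OWNER.get() or "").strip()


def visible_accounts(rows: list) -> list:
    """Exact owner match only; unscoped calls see nothing once owners are mixed."""
    owner = current_owner()
    if owner:
        return [r for r in rows if str(r.get("owner") or "").strip() == owner]
    owners = {str(r.get("owner") or "").strip() for r in rows} - {""}
    return [] if len(owners) > 1 else rows


def handle_call(arguments: dict, rows: list) -> list:
    """Pop the owner argument, scope the call, and always restore the old value."""
    arguments = dict(arguments)  # copy so the caller's dict is not mutated
    owner = str(arguments.pop(_OWNER_ARG, "") or "").strip()
    token = _CURRENT_OWNER.set(owner or None)
    try:
        return [r["name"] for r in visible_accounts(rows)]
    finally:
        _CURRENT_OWNER.reset(token)


ROWS = [
    {"name": "work", "owner": "alice"},
    {"name": "home", "owner": "bob"},
]
alice_names = handle_call({"_odysseus_owner": "alice"}, ROWS)
unscoped_names = handle_call({}, ROWS)
owner_after_call = current_owner()
```

Resetting through the token, rather than setting the variable back to `None`, restores whatever value was active before the call, so nested or concurrent calls do not leak an owner into each other.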
+92 -32
@@ -6,6 +6,7 @@ Imports MemoryManager and MemoryVectorStore from the Odysseus codebase.
"""
import asyncio
import os
import sys
import time
from pathlib import Path
@@ -23,6 +24,55 @@ _memory_manager = None
_memory_vector = None
_initialized = False
_OWNER_ENV_KEYS = ("ODYSSEUS_MCP_MEMORY_OWNER", "ODYSSEUS_MEMORY_OWNER")
_OWNER_SCOPE_ERROR = (
"Error: Memory MCP owner is not configured for an owner-scoped memory store. "
"Set ODYSSEUS_MCP_MEMORY_OWNER for this server or use the owner-aware native memory tool."
)
def _configured_owner() -> str | None:
for key in _OWNER_ENV_KEYS:
owner = os.environ.get(key, "").strip()
if owner:
return owner
return None
def _entry_owner(entry: dict) -> str | None:
owner = entry.get("owner")
if owner is None:
return None
owner_text = str(owner).strip()
return owner_text or None
def _owner_scoped_store(entries: list[dict]) -> bool:
return any(_entry_owner(entry) for entry in entries if isinstance(entry, dict))
def _scope_entries() -> tuple[str | None, list[dict], list[dict], str | None]:
"""Return configured owner, all entries, visible entries, and optional error."""
entries = _memory_manager.load_all()
owner = _configured_owner()
if owner is None and _owner_scoped_store(entries):
return None, entries, [], _OWNER_SCOPE_ERROR
if owner is None:
visible = [
entry for entry in entries
if isinstance(entry, dict) and _entry_owner(entry) is None
]
else:
visible = [
entry for entry in entries
if isinstance(entry, dict) and _entry_owner(entry) == owner
]
return owner, entries, visible, None
def _text_result(text: str) -> list[TextContent]:
return [TextContent(type="text", text=text)]
def _ensure_init():
"""Lazy-init memory managers on first use."""
@@ -75,43 +125,46 @@ async def list_tools() -> list[Tool]:
@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
if name != "manage_memory":
return [TextContent(type="text", text=f"Unknown tool: {name}")]
return _text_result(f"Unknown tool: {name}")
_ensure_init()
if not _memory_manager:
return [TextContent(type="text", text="Error: Memory manager not available")]
return _text_result("Error: Memory manager not available")
action = arguments.get("action", "")
if action == "list":
category_filter = arguments.get("category", "")
memories = _memory_manager.load()
_owner, _all_memories, memories, scope_error = _scope_entries()
if scope_error:
return _text_result(scope_error)
if category_filter:
memories = [m for m in memories if m.get("category", "").lower() == category_filter.lower()]
if not memories:
msg = "No memories found"
if category_filter:
msg += f" in category '{category_filter}'"
return [TextContent(type="text", text=msg + ".")]
return _text_result(msg + ".")
lines = [f"Found {len(memories)} memory entries:\n"]
for m in memories[:100]:
for m in memories:
cat = m.get("category", "fact")
mid = m.get("id", "?")[:8]
text = m.get("text", "")
if len(text) > 150:
text = text[:150] + "..."
lines.append(f"- [{cat}] `{mid}` — {text}")
if len(memories) > 100:
lines.append(f"... and {len(memories) - 100} more")
return [TextContent(type="text", text="\n".join(lines))]
return _text_result("\n".join(lines))
elif action == "add":
text = arguments.get("text", "")
category = arguments.get("category", "fact")
if not text:
return [TextContent(type="text", text="Error: Memory text cannot be empty")]
entry = _memory_manager.add_entry(text, source="ai_agent", category=category)
memories = _memory_manager.load_all()
return _text_result("Error: Memory text cannot be empty")
owner, memories, _visible, scope_error = _scope_entries()
if scope_error:
return _text_result(scope_error)
entry = _memory_manager.add_entry(text, source="ai_agent", category=category, owner=owner)
memories.append(entry)
_memory_manager.save(memories)
if _memory_vector and _memory_vector.healthy:
@@ -119,25 +172,28 @@ async def call_tool(name: str, arguments: dict) -> list[TextContent]:
_memory_vector.add(entry["id"], text)
except Exception:
pass
return [TextContent(type="text", text=f"Memory added: [{category}] {text} (id: {entry['id'][:8]})")]
return _text_result(f"Memory added: [{category}] {text} (id: {entry['id'][:8]})")
elif action == "edit":
memory_id = arguments.get("memory_id", "")
new_text = arguments.get("text", "")
if not memory_id or not new_text:
return [TextContent(type="text", text="Error: edit needs memory_id and text")]
memories = _memory_manager.load_all()
found = False
return _text_result("Error: edit needs memory_id and text")
_owner, memories, visible, scope_error = _scope_entries()
if scope_error:
return _text_result(scope_error)
full_id = None
for m in memories:
for m in visible:
if m.get("id", "").startswith(memory_id):
m["text"] = new_text
m["timestamp"] = int(time.time())
found = True
full_id = m["id"]
break
if not found:
return [TextContent(type="text", text=f"Error: Memory '{memory_id}' not found")]
if not full_id:
return _text_result(f"Error: Memory '{memory_id}' not found")
for m in memories:
if m.get("id") == full_id:
m["text"] = new_text
m["timestamp"] = int(time.time())
break
_memory_manager.save(memories)
if _memory_vector and _memory_vector.healthy and full_id:
try:
@@ -145,24 +201,26 @@ async def call_tool(name: str, arguments: dict) -> list[TextContent]:
_memory_vector.add(full_id, new_text)
except Exception:
pass
return [TextContent(type="text", text=f"Memory updated: {new_text}")]
return _text_result(f"Memory updated: {new_text}")
elif action == "delete":
memory_id = arguments.get("memory_id", "")
if not memory_id:
return [TextContent(type="text", text="Error: delete needs memory_id")]
memories = _memory_manager.load_all()
return _text_result("Error: delete needs memory_id")
_owner, memories, visible, scope_error = _scope_entries()
if scope_error:
return _text_result(scope_error)
full_id = None
deleted_text = ""
deleted_category = ""
for m in memories:
for m in visible:
if m.get("id", "").startswith(memory_id):
full_id = m["id"]
deleted_text = m.get("text", "")
deleted_category = m.get("category", "")
break
if not full_id:
return [TextContent(type="text", text=f"Error: Memory '{memory_id}' not found")]
return _text_result(f"Error: Memory '{memory_id}' not found")
memories = [m for m in memories if m.get("id") != full_id]
_memory_manager.save(memories)
if _memory_vector and _memory_vector.healthy and full_id:
@@ -172,30 +230,32 @@ async def call_tool(name: str, arguments: dict) -> list[TextContent]:
pass
cat = f"[{deleted_category}] " if deleted_category else ""
snippet = deleted_text if len(deleted_text) <= 120 else deleted_text[:117] + "..."
return [TextContent(type="text", text=f"Memory deleted: {cat}{snippet} (id: {memory_id})")]
return _text_result(f"Memory deleted: {cat}{snippet} (id: {memory_id})")
elif action == "search":
query = arguments.get("text", "")
if not query:
return [TextContent(type="text", text="Error: search needs text (query)")]
memories = _memory_manager.load()
return _text_result("Error: search needs text (query)")
_owner, _all_memories, memories, scope_error = _scope_entries()
if scope_error:
return _text_result(scope_error)
if hasattr(_memory_manager, 'get_relevant_memories'):
results = _memory_manager.get_relevant_memories(query, memories, threshold=0.05, max_items=20)
else:
query_lower = query.lower()
results = [m for m in memories if query_lower in m.get("text", "").lower()][:20]
if not results:
return [TextContent(type="text", text=f"No memories found matching '{query}'.")]
return _text_result(f"No memories found matching '{query}'.")
lines = [f"Found {len(results)} matching memories:\n"]
for m in results:
cat = m.get("category", "fact")
mid = m.get("id", "?")[:8]
text = m.get("text", "")
lines.append(f"- [{cat}] `{mid}` — {text}")
return [TextContent(type="text", text="\n".join(lines))]
return _text_result("\n".join(lines))
else:
return [TextContent(type="text", text=f"Error: Unknown action '{action}'. Use: list, add, edit, delete, search")]
return _text_result(f"Error: Unknown action '{action}'. Use: list, add, edit, delete, search")
async def run():
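The visibility rules that `_scope_entries` applies to the memory store can be summarised with a small standalone sketch. It assumes entries are plain dicts with an optional `owner` key, as in the diff; `scope_entries`, its shortened error string, and the sample data are hypothetical and do not touch the real `MemoryManager`.

```python
from typing import Optional


def entry_owner(entry: dict) -> Optional[str]:
    """Normalise an entry's owner: missing, None, or blank all mean 'ownerless'."""
    owner = entry.get("owner")
    if owner is None:
        return None
    return str(owner).strip() or None


def scope_entries(entries: list, owner: Optional[str]):
    """Return (visible_entries, error) using the same three cases as the diff."""
    store_is_scoped = any(entry_owner(e) for e in entries if isinstance(e, dict))
    if owner is None and store_is_scoped:
        # An unconfigured server must not read a store that carries owners.
        return [], "owner not configured"
    # owner=None selects ownerless entries; otherwise only exact owner matches.
    visible = [e for e in entries if isinstance(e, dict) and entry_owner(e) == owner]
    return visible, None


legacy_store = [{"id": "a1", "text": "likes tea"}]
mixed_store = [
    {"id": "a1", "owner": "alice", "text": "likes tea"},
    {"id": "b1", "owner": "bob", "text": "likes coffee"},
    {"id": "c1", "text": "ownerless note"},
]

legacy_visible, legacy_error = scope_entries(legacy_store, None)
blocked_visible, blocked_error = scope_entries(mixed_store, None)
alice_visible, alice_error = scope_entries(mixed_store, "alice")
```

Edits and deletes in the diff follow the same idea: the target id is looked up in the visible subset first, and only then applied to the full list, so a caller cannot modify another owner's entry by guessing an id prefix.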
+6 -77
@@ -4,90 +4,19 @@
"requires": true,
"packages": {
"": {
"dependencies": {
"@anthropic-ai/sdk": "^0.98.0"
},
"devDependencies": {
"@antithesishq/bombadil": "^0.3.2"
}
},
"node_modules/@anthropic-ai/sdk": {
"version": "0.98.0",
"resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.98.0.tgz",
"integrity": "sha512-N7aXtCvC5g6T1Y4V29lJjceu/zTkVkIZF0jdBvagr0TRFHuKeImffalGWEfqZKrvjH+IQbzJWw6TmSmUzrlMgg==",
"license": "MIT",
"dependencies": {
"json-schema-to-ts": "^3.1.1",
"standardwebhooks": "^1.0.0"
},
"bin": {
"anthropic-ai-sdk": "bin/cli"
},
"peerDependencies": {
"zod": "^3.25.0 || ^4.0.0"
},
"peerDependenciesMeta": {
"zod": {
"optional": true
}
"@antithesishq/bombadil": "^0.6.1"
}
},
"node_modules/@antithesishq/bombadil": {
"version": "0.3.2",
"resolved": "https://registry.npmjs.org/@antithesishq/bombadil/-/bombadil-0.3.2.tgz",
"integrity": "sha512-ATy1w9ZY5gbny1H8DFc7rxZitT7DLLLFDiGcRZe+8TQiUrV5tLO+IJGOVNNLp3RpCqjZqSsxGiKoQsx31ipV1g==",
"version": "0.6.1",
"resolved": "https://registry.npmjs.org/@antithesishq/bombadil/-/bombadil-0.6.1.tgz",
"integrity": "sha512-d1iufG3MI7gSMSiSmMeNdcMW+qR0yQXL2zdkVynC3n3DYgFJYlYXKUQzygmqU12m4RWlR5iOdQU1hsx5UT6+IA==",
"dev": true,
"license": "MIT"
},
"node_modules/@babel/runtime": {
"version": "7.29.7",
"resolved": "https://registry.npmjs.org/@babel/runtime/-/runtime-7.29.7.tgz",
"integrity": "sha512-Nq8OhGWiZIZGV6hLHoyAKLLcJihP/xFeBMGJoUrxTX2psI8dCifzLhZISFb+VWS3wFMRDmCGw5R+dOySCqPLhw==",
"license": "MIT",
"engines": {
"node": ">=6.9.0"
"bin": {
"bombadil": "bin/bombadil.js"
}
},
"node_modules/@stablelib/base64": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/@stablelib/base64/-/base64-1.0.1.tgz",
"integrity": "sha512-1bnPQqSxSuc3Ii6MhBysoWCg58j97aUjuCSZrGSmDxNqtytIi0k8utUenAwTZN4V5mXXYGsVUI9zeBqy+jBOSQ==",
"license": "MIT"
},
"node_modules/fast-sha256": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/fast-sha256/-/fast-sha256-1.3.0.tgz",
"integrity": "sha512-n11RGP/lrWEFI/bWdygLxhI+pVeo1ZYIVwvvPkW7azl/rOy+F3HYRZ2K5zeE9mmkhQppyv9sQFx0JM9UabnpPQ==",
"license": "Unlicense"
},
"node_modules/json-schema-to-ts": {
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/json-schema-to-ts/-/json-schema-to-ts-3.1.1.tgz",
"integrity": "sha512-+DWg8jCJG2TEnpy7kOm/7/AxaYoaRbjVB4LFZLySZlWn8exGs3A4OLJR966cVvU26N7X9TWxl+Jsw7dzAqKT6g==",
"license": "MIT",
"dependencies": {
"@babel/runtime": "^7.18.3",
"ts-algebra": "^2.0.0"
},
"engines": {
"node": ">=16"
}
},
"node_modules/standardwebhooks": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/standardwebhooks/-/standardwebhooks-1.0.0.tgz",
"integrity": "sha512-BbHGOQK9olHPMvQNHWul6MYlrRTAOKn03rOe4A8O3CLWhNf4YHBqq2HJKKC+sfqpxiBY52pNeesD6jIiLDz8jg==",
"license": "MIT",
"dependencies": {
"@stablelib/base64": "^1.0.0",
"fast-sha256": "^1.3.0"
}
},
"node_modules/ts-algebra": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/ts-algebra/-/ts-algebra-2.0.0.tgz",
"integrity": "sha512-FPAhNPFMrkwz76P7cdjdmiShwMynZYN6SgOujD1urY4oNm80Ou9oMdmbR45LotcKOXoy7wSmHkRFE6Mxbrhefw==",
"license": "MIT"
}
}
}
+1 -4
@@ -4,9 +4,6 @@
"url": "https://github.com/pewdiepie-archdaemon/odysseus.git"
},
"devDependencies": {
"@antithesishq/bombadil": "^0.3.2"
},
"dependencies": {
"@anthropic-ai/sdk": "^0.98.0"
"@antithesishq/bombadil": "^0.6.1"
}
}
+1 -1
@@ -33,4 +33,4 @@ PyMuPDF
# magika (onnxruntime), already a core dep via fastembed. We avoid the
# [all]/Azure/audio extras (cloud + heavy). Pinned to a release >30 days old per
# the dependency-age discussion in issue #485.
markitdown[docx,pptx,xlsx,xls]==0.1.5
markitdown[docx,pptx,xlsx,xls]==0.1.6
+2 -2
@@ -3,8 +3,8 @@ uvicorn
python-multipart
python-dotenv
httpx
pydantic>=2.0
pydantic-settings>=2.0
pydantic>=2.13.4
pydantic-settings>=2.14.1
SQLAlchemy
pypdf
beautifulsoup4
+3
@@ -31,6 +31,7 @@ ALLOWED_SCOPES = {
TOKEN_PROFILES = {
"chat": ["chat"],
"codex_todos": ["todos:read", "todos:write"],
"codex_documents": ["documents:read", "documents:write"],
"codex_email_drafts": ["email:read", "email:draft", "documents:read", "documents:write"],
}
@@ -159,6 +160,8 @@ def setup_api_token_routes() -> APIRouter:
payload = await request.json()
except Exception:
payload = {}
if not isinstance(payload, dict):
payload = {}
with get_db_session() as db:
token = db.query(ApiToken).filter(ApiToken.id == token_id).first()
if not token:
+3 -2
@@ -16,6 +16,7 @@ from pydantic import BaseModel
from core.database import SessionLocal, CrewMember, ScheduledTask
from src.auth_helpers import get_current_user
from core.auth import RESERVED_USERNAMES
from src.task_scheduler import compute_next_run
@@ -89,11 +90,11 @@ def setup_assistant_routes(task_scheduler) -> APIRouter:
# check-in tasks seeded. Hitting any /assistant route under one of these
# used to seed a full CrewMember + Morning/Midday/Evening tasks under that
# owner, which then double-fired alongside the real user's check-ins.
_SYNTHETIC_OWNERS = frozenset({"internal-tool", "api", "demo", "system", ""})
# RESERVED_USERNAMES covers the same set; the `not owner` guard handles "".
async def _get_or_create(owner: str) -> CrewMember:
"""Return the per-owner assistant CrewMember, creating it on demand."""
if not owner or owner in _SYNTHETIC_OWNERS:
if not owner or owner in RESERVED_USERNAMES:
raise HTTPException(status_code=400, detail=f"Cannot seed assistant for {owner!r}")
db = SessionLocal()
try:
+75 -11
@@ -12,8 +12,8 @@ import re
from pathlib import Path
from core.atomic_io import atomic_write_json, atomic_write_text
from core.auth import AuthManager
from src.constants import DEEP_RESEARCH_DIR, MEMORY_FILE, SKILLS_DIR
from core.auth import AuthManager, RESERVED_USERNAMES, SetAdminResult, TOKEN_TTL
from src.constants import DEEP_RESEARCH_DIR, MEMORY_FILE, PASSWORD_MIN_LENGTH, SKILLS_DIR
from src.rate_limiter import RateLimiter
from src.settings_scrub import scrub_settings
from src.settings import (
@@ -73,6 +73,11 @@ class DeleteUserRequest(BaseModel):
class RenameUserRequest(BaseModel):
username: str
class SetAdminRequest(BaseModel):
is_admin: bool
class SetOpenRegistrationRequest(BaseModel):
enabled: bool
@@ -97,8 +102,12 @@ def setup_auth_routes(auth_manager: AuthManager) -> APIRouter:
raise HTTPException(429, "Too many requests — try again later")
if auth_manager.is_configured:
raise HTTPException(400, "Already configured")
if len(body.password) < 8:
raise HTTPException(400, "Password must be at least 8 characters")
if len(body.password) < PASSWORD_MIN_LENGTH:
raise HTTPException(400, f"Password must be at least {PASSWORD_MIN_LENGTH} characters")
if len(body.username.strip()) < 1:
raise HTTPException(400, "Username is required")
if body.username.lower() in RESERVED_USERNAMES:
raise HTTPException(403, "Username is reserved")
ok = await asyncio.to_thread(auth_manager.setup, body.username, body.password)
if not ok:
raise HTTPException(500, "Setup failed")
@@ -113,10 +122,12 @@ def setup_auth_routes(auth_manager: AuthManager) -> APIRouter:
raise HTTPException(400, "Run setup first")
if not auth_manager.signup_enabled:
raise HTTPException(403, "Registration is disabled. Ask an admin for an account.")
if len(body.password) < 8:
raise HTTPException(400, "Password must be at least 8 characters")
if len(body.password) < PASSWORD_MIN_LENGTH:
raise HTTPException(400, f"Password must be at least {PASSWORD_MIN_LENGTH} characters")
if len(body.username.strip()) < 1:
raise HTTPException(400, "Username is required")
if body.username.lower() in RESERVED_USERNAMES:
raise HTTPException(403, "Username is reserved")
ok = await asyncio.to_thread(auth_manager.create_user, body.username, body.password, is_admin=False)
if not ok:
raise HTTPException(409, "Username already taken")
@@ -139,6 +150,8 @@ def setup_auth_routes(auth_manager: AuthManager) -> APIRouter:
raise HTTPException(401, "Invalid 2FA code")
# All checks passed — create session (password already verified above)
token = await asyncio.to_thread(auth_manager.create_session_trusted, username)
if not token:
raise HTTPException(401, "Invalid credentials")
cookie_kwargs = dict(
key=SESSION_COOKIE,
value=token,
@@ -148,7 +161,7 @@ def setup_auth_routes(auth_manager: AuthManager) -> APIRouter:
path="/",
)
if body.remember:
cookie_kwargs["max_age"] = 60 * 60 * 24 * 7 # 7 days
cookie_kwargs["max_age"] = TOKEN_TTL
response.set_cookie(**cookie_kwargs)
return {"ok": True, "username": username}
@@ -177,13 +190,18 @@ def setup_auth_routes(auth_manager: AuthManager) -> APIRouter:
pass
return result
@router.get("/policy")
async def auth_policy():
"""Return public auth policy constants for the frontend."""
return auth_manager.policy()
@router.post("/change-password")
async def change_password(body: ChangePasswordRequest, request: Request):
user = _get_current_user(request)
if not user:
raise HTTPException(401, "Not authenticated")
if len(body.new_password) < 8:
raise HTTPException(400, "Password must be at least 8 characters")
if len(body.new_password) < PASSWORD_MIN_LENGTH:
raise HTTPException(400, f"Password must be at least {PASSWORD_MIN_LENGTH} characters")
current_token = request.cookies.get(SESSION_COOKIE)
ok = await asyncio.to_thread(auth_manager.change_password, user, body.current_password, body.new_password)
if not ok:
@@ -263,8 +281,12 @@ def setup_auth_routes(auth_manager: AuthManager) -> APIRouter:
user = _get_current_user(request)
if not user or not auth_manager.is_admin(user):
raise HTTPException(403, "Admin only")
if len(body.password) < 8:
raise HTTPException(400, "Password must be at least 8 characters")
if len(body.password) < PASSWORD_MIN_LENGTH:
raise HTTPException(400, f"Password must be at least {PASSWORD_MIN_LENGTH} characters")
if len(body.username.strip()) < 1:
raise HTTPException(400, "Username is required")
if body.username.lower() in RESERVED_USERNAMES:
raise HTTPException(403, "Username is reserved")
ok = auth_manager.create_user(body.username, body.password, body.is_admin)
if not ok:
raise HTTPException(409, "Username already taken")
@@ -427,6 +449,23 @@ def setup_auth_routes(auth_manager: AuthManager) -> APIRouter:
except Exception as e:
logger.warning("Failed to rename upload owner references %s -> %s: %s", old_username, new_username, e)
# direct personal RAG uploads live in per-owner directories and the
# vector metadata also carries the username used for owner-filtered
# search. Keep both in sync with the auth rename.
try:
from routes.personal_routes import rename_personal_upload_owner
personal_docs_manager = getattr(request.app.state, "personal_docs_manager", None)
if personal_docs_manager is not None:
rag_manager = getattr(personal_docs_manager, "rag_manager", None)
rename_personal_upload_owner(
old_username,
new_username,
personal_docs_manager=personal_docs_manager,
rag_manager=rag_manager,
)
except Exception as e:
logger.warning("Failed to rename personal RAG upload owner references %s -> %s: %s", old_username, new_username, e)
# skills: SKILL.md frontmatter carries owner: <username>; the usage
# sidecar (_usage.json) keys entries as owner::skill-name. Both must
# be updated or the renamed user's Skills panel goes empty.
@@ -487,6 +526,31 @@ def setup_auth_routes(auth_manager: AuthManager) -> APIRouter:
invalidator()
return {"ok": True, "username": new_username, "renamed_self": old_username == user}
@router.put("/users/{username}/admin")
async def set_user_admin(username: str, body: SetAdminRequest, request: Request):
"""Promote/demote a user to/from admin. Admin only.
The last remaining admin can't be demoted (no lockout). Self-demotion
is allowed while another admin exists; the `self` flag tells the UI to
reload the acting user into the normal-user view.
"""
user = _get_current_user(request)
if not user or not auth_manager.is_admin(user):
raise HTTPException(403, "Admin only")
result = auth_manager.set_admin(username, body.is_admin, user)
if result is SetAdminResult.USER_NOT_FOUND:
raise HTTPException(404, "User not found")
if result is SetAdminResult.NOT_AUTHORIZED:
raise HTTPException(403, "Admin only")
if result is SetAdminResult.LAST_ADMIN:
raise HTTPException(400, "Cannot demote the last admin")
target = (username or "").strip().lower()
return {
"ok": True,
"is_admin": body.is_admin,
"self": target == (user or "").strip().lower(),
}
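The route above maps three `SetAdminResult` values to HTTP errors, but `AuthManager.set_admin` itself is not shown in this diff. The following is only a guess at the kind of check it performs, written against an in-memory dict. The `OK` member, the `users` structure, and the order of the checks are assumptions made for illustration.

```python
from enum import Enum, auto


class SetAdminResult(Enum):
    """Hypothetical stand-in; only the three error members appear in the diff."""
    OK = auto()
    USER_NOT_FOUND = auto()
    NOT_AUTHORIZED = auto()
    LAST_ADMIN = auto()


def set_admin(users: dict, username: str, is_admin: bool, actor: str) -> SetAdminResult:
    """Promote or demote `username`, refusing to remove the last remaining admin."""
    target = (username or "").strip().lower()
    acting = (actor or "").strip().lower()
    if not users.get(acting, {}).get("is_admin"):
        return SetAdminResult.NOT_AUTHORIZED
    if target not in users:
        return SetAdminResult.USER_NOT_FOUND
    if not is_admin and users[target].get("is_admin"):
        # Demotion is only allowed while at least one other admin exists.
        others = [n for n, u in users.items() if u.get("is_admin") and n != target]
        if not others:
            return SetAdminResult.LAST_ADMIN
    users[target]["is_admin"] = is_admin
    return SetAdminResult.OK


users = {"ann": {"is_admin": True}, "ben": {"is_admin": False}}
first_demotion = set_admin(users, "ann", False, "ann")   # ann is the only admin
promotion = set_admin(users, "ben", True, "ann")
second_demotion = set_admin(users, "ann", False, "ann")  # ben is now an admin too
```

Self-demotion succeeds in the last call because another admin exists, which matches the behaviour the route's docstring describes.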
@router.post("/signup-toggle", deprecated=True)
async def toggle_signup(request: Request):
"""
+95 -26
@@ -14,7 +14,7 @@ from core.database import Session as DBSession, ModelEndpoint
from src.llm_core import normalize_model_id
from src.endpoint_resolver import normalize_base
from src.context_compactor import maybe_compact, trim_for_context
from src.auth_helpers import get_current_user
from src.auth_helpers import effective_user
from src.prompt_security import untrusted_context_message
from routes.prefs_routes import _load_for_user as load_prefs_for_user
@@ -22,6 +22,47 @@ from fastapi import HTTPException
logger = logging.getLogger(__name__)
_CASUAL_OPENING_RE = re.compile(
r"^\s*(?:h+i+|hey+|hello+|yo+|sup+|what'?s up|wass?up|hiya|howdy|"
r"lol|lmao|haha+|hehe+|thanks?|thank you|ty|idk|dunno|meh|bruh|bro)\b(?P<tail>.*)$",
re.IGNORECASE,
)
_CASUAL_BLOCKLIST_RE = re.compile(
r"\b(?:cookbook|serve|serving|launch|start|vllm|sglang|llama\.?cpp|ollama|"
r"download|model|email|document|doc|note|calendar|task|search|web|research|"
r"file|folder|repo|git|settings?|endpoint|api|token|mcp)\b",
re.IGNORECASE,
)
def _is_casual_low_signal(text: str) -> bool:
"""Short greetings/slang should not pull memory, skills, RAG, or docs."""
s = str(text or "").strip()
m = _CASUAL_OPENING_RE.match(s)
if not m:
return False
tail = m.group("tail") or ""
if _CASUAL_BLOCKLIST_RE.search(tail):
return False
tail_words = re.findall(r"[A-Za-z0-9_'-]+", tail)
return len(tail_words) <= 2
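Review note: the greeting gate above can be exercised on its own. The sketch below copies the two patterns from this hunk into a standalone function (named `is_casual_low_signal` here purely for illustration) and runs it over a few sample messages chosen for this note.

```python
import re

# Patterns copied from the hunk above.
_CASUAL_OPENING_RE = re.compile(
    r"^\s*(?:h+i+|hey+|hello+|yo+|sup+|what'?s up|wass?up|hiya|howdy|"
    r"lol|lmao|haha+|hehe+|thanks?|thank you|ty|idk|dunno|meh|bruh|bro)\b(?P<tail>.*)$",
    re.IGNORECASE,
)
_CASUAL_BLOCKLIST_RE = re.compile(
    r"\b(?:cookbook|serve|serving|launch|start|vllm|sglang|llama\.?cpp|ollama|"
    r"download|model|email|document|doc|note|calendar|task|search|web|research|"
    r"file|folder|repo|git|settings?|endpoint|api|token|mcp)\b",
    re.IGNORECASE,
)


def is_casual_low_signal(text: str) -> bool:
    """Standalone copy of the helper: a casual opener plus at most two more words."""
    s = str(text or "").strip()
    m = _CASUAL_OPENING_RE.match(s)
    if not m:
        return False
    tail = m.group("tail") or ""
    # Any task-like keyword after the greeting means the message is not low-signal.
    if _CASUAL_BLOCKLIST_RE.search(tail):
        return False
    tail_words = re.findall(r"[A-Za-z0-9_'-]+", tail)
    return len(tail_words) <= 2


# Illustrative inputs only; they are not taken from the repository's tests.
samples = [
    "hey",
    "lol ok",
    "hi there friend",
    "hey, can you search the web?",
    "yo start vllm",
    "history of rome",
    "thank you so much",
]
results = {s: is_casual_low_signal(s) for s in samples}
```

One thing this makes visible: because `thanks?` is listed before `thank you` in the alternation, "thank you so much" matches on `thank` and leaves a three-word tail, so it is not classified as casual. Whether that is intended is a question for the author.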
# Strong references to in-flight fire-and-forget tasks scheduled from this
# module. asyncio only keeps weak references to tasks created via
# create_task, so without this the GC can collect a task mid-execution and
# the background work (extraction, auto-naming) silently never runs.
# Mirrors WebhookManager._spawn_tracked from src/webhook_manager.py.
_BG_TASKS: set[asyncio.Task] = set()
def _spawn_bg(coro) -> asyncio.Task:
"""Schedule a background task and hold a strong reference until it finishes."""
task = asyncio.create_task(coro)
_BG_TASKS.add(task)
task.add_done_callback(_BG_TASKS.discard)
return task
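Review note: a minimal, self-contained sketch of the strong-reference pattern used by `_spawn_bg`. The coroutine `_work` and the `demo` driver are hypothetical names added for this note; only the set-plus-done-callback shape comes from the hunk.

```python
import asyncio

# Module-level set that keeps tasks alive until they finish.
_BG_TASKS: set[asyncio.Task] = set()


def spawn_bg(coro) -> asyncio.Task:
    """Schedule a coroutine and hold a strong reference until it completes."""
    task = asyncio.create_task(coro)
    _BG_TASKS.add(task)
    # The callback receives the finished task, so set.discard removes it.
    task.add_done_callback(_BG_TASKS.discard)
    return task


async def _work(results: list, value: str) -> None:
    # Stand-in for background work such as auto-naming or extraction.
    await asyncio.sleep(0)
    results.append(value)


async def demo():
    results: list = []
    spawn_bg(_work(results, "auto-name"))
    tracked_while_running = len(_BG_TASKS)
    # Wait for everything still tracked, then yield once so callbacks have run.
    await asyncio.gather(*list(_BG_TASKS))
    await asyncio.sleep(0)
    return results, tracked_while_running, len(_BG_TASKS)


results, tracked_during, tracked_after = asyncio.run(demo())
```

The set only grows while work is in flight, so it does not leak: each task removes itself when it finishes, whether it succeeded, failed, or was cancelled.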
# ── Data containers ────────────────────────────────────────────────────── #
@@ -78,7 +119,7 @@ def _enforce_chat_privileges(request, sess) -> None:
which means unrestricted allowed_models / zero cap -> no-op for them.
"""
try:
user = get_current_user(request)
user = effective_user(request)
except Exception:
user = None
if not user:
@@ -159,17 +200,9 @@ async def auto_name_session(session_manager, sess):
return
owner = getattr(sess, "owner", None)
t_url, t_model, t_headers = resolve_task_endpoint(owner=owner)
if not t_model:
# If no task/utility model is configured at all, fall back to
# the session's own model so auto-naming still works even on
# minimal setups.
from src.endpoint_resolver import resolve_endpoint
_fallback = resolve_endpoint("default", owner=owner)
if _fallback and _fallback[1]:
t_url, t_model, t_headers = _fallback
else:
t_url, t_model, t_headers = sess.endpoint_url, sess.model, sess.headers
t_url, t_model, t_headers = resolve_task_endpoint(
sess.endpoint_url, sess.model, sess.headers, owner=owner
)
if not t_model:
logger.debug("[auto-name] No model provided, skipping")
return
@@ -346,11 +379,11 @@ def add_user_message(sess, chat_handler, preprocessed: PreprocessedMessage, inco
def fire_message_event(request, webhook_manager, session_id: str, sess, message: str, compare_mode: bool = False):
"""Fire webhook and event_bus events for a new user message."""
if webhook_manager and not compare_mode:
asyncio.create_task(webhook_manager.fire("chat.message", {
webhook_manager.fire_and_forget("chat.message", {
"session_id": session_id, "model": sess.model, "message": message[:2000],
}))
})
from src.event_bus import fire_event
user = get_current_user(request)
user = effective_user(request)
fire_event("message_sent", user)
@@ -505,6 +538,29 @@ def _normalize_model_id_from_cache(sess) -> Optional[str]:
return None
def _session_is_research_spinoff(sess) -> bool:
"""True if this session was created via research "Discuss" spin-off.
Detected by the primer system message the spin-off endpoint seeds into
history (metadata ``research_spinoff_from``). Such sessions are grounded
on the seeded report, so global memory + personal-doc RAG injection is
suppressed for them (the report is the sole knowledge base). Handles both
ChatMessage objects and plain dicts.
"""
for m in getattr(sess, "history", []) or []:
role = getattr(m, "role", None)
if role is None and isinstance(m, dict):
role = m.get("role")
if role != "system":
continue
md = getattr(m, "metadata", None)
if md is None and isinstance(m, dict):
md = m.get("metadata")
if (md or {}).get("research_spinoff_from"):
return True
return False
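Review note: the detection above accepts both attribute-style messages and plain dicts. The sketch below repeats the same logic as a standalone function and feeds it two hand-built sessions; `SimpleNamespace` stands in for the real session and `ChatMessage` types, which are not shown in this diff.

```python
from types import SimpleNamespace


def session_is_research_spinoff(sess) -> bool:
    """True if any system message carries research_spinoff_from metadata."""
    for m in getattr(sess, "history", []) or []:
        role = getattr(m, "role", None)
        if role is None and isinstance(m, dict):
            role = m.get("role")
        if role != "system":
            continue
        md = getattr(m, "metadata", None)
        if md is None and isinstance(m, dict):
            md = m.get("metadata")
        if (md or {}).get("research_spinoff_from"):
            return True
    return False


# Hypothetical sessions built for illustration.
spinoff = SimpleNamespace(history=[
    {"role": "system", "metadata": {"research_spinoff_from": "report-123"}},
    {"role": "user", "content": "What does section 2 claim?"},
])
regular = SimpleNamespace(history=[
    SimpleNamespace(role="system", metadata=None),
    SimpleNamespace(role="user", metadata={"research_spinoff_from": "ignored"}),
])
empty = SimpleNamespace(history=None)

is_spinoff = session_is_research_spinoff(spinoff)
is_regular_spinoff = session_is_research_spinoff(regular)
is_empty_spinoff = session_is_research_spinoff(empty)
```

Only system messages count, so metadata on a user message (as in `regular`) does not switch the session into report-grounded mode.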
async def build_chat_context(
sess,
request,
@@ -553,9 +609,11 @@ async def build_chat_context(
if not incognito:
fire_message_event(request, webhook_manager, session_id, sess, message, compare_mode)
# Resolve user prefs
user = get_current_user(request)
# Resolve owner-scoped prefs/context. Browser requests keep the cookie user;
# bearer-token chat requests use the token owner instead of the "api" sentinel.
user = effective_user(request)
uprefs = load_prefs_for_user(user)
casual_low_signal = _is_casual_low_signal(message)
# Memory enabled?
mem_enabled = not incognito and not no_memory and uprefs.get("memory_enabled", True)
@@ -565,18 +623,29 @@ async def build_chat_context(
if not allow_tool_preprocessing:
mem_enabled = False
skills_enabled = False
if casual_low_signal:
mem_enabled = False
skills_enabled = False
logger.debug(
"Memory enabled=%s for user=%s (incognito=%s, no_memory=%s, pref=%s)",
mem_enabled, user, incognito, no_memory, uprefs.get("memory_enabled", "NOT_SET"),
)
# Research-spinoff ("Discuss") sessions are grounded on the seeded report:
# the primer system message IS the knowledge base. Injecting global memory
# or personal-doc RAG on every turn pulls in keyword-matched but off-topic
# facts ("wrong data") and competes with the report, so suppress both here.
is_research_spinoff = _session_is_research_spinoff(sess)
if is_research_spinoff:
mem_enabled = False
# Use RAG?
use_rag_val = (str(use_rag).lower() != "false") if use_rag is not None else True
if incognito or not allow_tool_preprocessing:
if incognito or not allow_tool_preprocessing or is_research_spinoff or casual_low_signal:
use_rag_val = False
# If pre-fetched search context was provided (compare mode), skip live web search
skip_web = bool(search_context) or not allow_tool_preprocessing
skip_web = bool(search_context) or not allow_tool_preprocessing or casual_low_signal
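Review note: the gating in this hunk can be summarised as a pure function. This is a sketch under assumptions: `resolve_context_flags` and its keyword names are invented for this note, and the initial value of `skills_enabled` is passed in because its computation is outside the lines shown here.

```python
def resolve_context_flags(
    *,
    incognito=False,
    no_memory=False,
    memory_pref=True,
    skills_enabled=True,          # assumption: computed earlier, outside this hunk
    allow_tool_preprocessing=True,
    casual_low_signal=False,
    is_research_spinoff=False,
    use_rag=None,
    search_context=None,
):
    """Mirror the post-change gating for memory, skills, RAG, and web search."""
    mem_enabled = bool(not incognito and not no_memory and memory_pref)
    if not allow_tool_preprocessing or casual_low_signal:
        mem_enabled = False
        skills_enabled = False
    if is_research_spinoff:
        # The seeded report is the knowledge base; skills stay untouched.
        mem_enabled = False
    use_rag_val = (str(use_rag).lower() != "false") if use_rag is not None else True
    if incognito or not allow_tool_preprocessing or is_research_spinoff or casual_low_signal:
        use_rag_val = False
    skip_web = bool(search_context) or not allow_tool_preprocessing or casual_low_signal
    return {
        "memory": mem_enabled,
        "skills": skills_enabled,
        "rag": use_rag_val,
        "skip_web": skip_web,
    }


baseline = resolve_context_flags()
casual = resolve_context_flags(casual_low_signal=True)
spinoff = resolve_context_flags(is_research_spinoff=True)
```

Read side by side, a casual greeting turns everything off and skips web search, while a research spin-off disables memory and RAG but leaves skills and live web search as they were.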
# Build context preface
# The stream path uses enhanced_message (with CoT/preprocessing applied),
@@ -595,7 +664,7 @@ async def build_chat_context(
incognito=incognito,
use_skills=skills_enabled,
)
if use_rag is not None:
if use_rag is not None or is_research_spinoff or casual_low_signal:
_preface_kwargs["use_rag"] = use_rag_val
preface, rag_sources, web_sources = chat_processor.build_context_preface(**_preface_kwargs)
@@ -603,7 +672,7 @@ async def build_chat_context(
used_memories = getattr(chat_processor, '_last_used_memories', [])
# Inject pre-fetched search context (compare mode)
if search_context and allow_tool_preprocessing:
if search_context and allow_tool_preprocessing and not casual_low_signal:
preface.append(untrusted_context_message("prefetched search context", search_context))
# YouTube transcripts
@@ -1081,7 +1150,7 @@ def run_post_response_tasks(
)))
if _extraction_jobs:
asyncio.create_task(_run_extraction_jobs_sequentially(session_id, _extraction_jobs))
_spawn_bg(_run_extraction_jobs_sequentially(session_id, _extraction_jobs))
# Token accumulation
if last_metrics:
@@ -1089,11 +1158,11 @@ def run_post_response_tasks(
# Webhook
if webhook_manager and not compare_mode:
asyncio.create_task(webhook_manager.fire("chat.completed", {
webhook_manager.fire_and_forget("chat.completed", {
"session_id": session_id, "model": sess.model,
"user_message": message, "response": full_response[:2000],
}))
})
# Auto-name
if needs_auto_name(sess.name):
asyncio.create_task(auto_name_session(session_manager, sess))
_spawn_bg(auto_name_session(session_manager, sess))
+122 -21
@@ -6,7 +6,7 @@ import os
import time
import logging
from datetime import datetime
from typing import Dict, Any, AsyncGenerator, List
from typing import Dict, Any, AsyncGenerator, List, Optional
from fastapi import APIRouter, Request, HTTPException, Form, Query
from fastapi.responses import StreamingResponse
@@ -23,12 +23,13 @@ from src.endpoint_resolver import normalize_base as _normalize_base, build_chat_
from src.session_search import search_session_messages
from src.prompt_security import untrusted_context_message
from core.exceptions import SessionNotFoundError
from src.auth_helpers import get_current_user
from src.auth_helpers import effective_user, get_current_user
from routes.session_routes import _verify_session_owner
from routes.document_helpers import _owner_session_filter
from core.database import SessionLocal, get_session_mode, set_session_mode
from core.database import Session as DBSession, ChatMessage as DBChatMessage
from core.database import Document as DBDocument, ModelEndpoint
from core.log_safety import redact_url
from routes.research_routes import _resolve_research_endpoint
from routes.model_routes import _visible_models
from routes.chat_helpers import (
@@ -126,7 +127,8 @@ def _clear_orphaned_session_endpoint(sess, owner: str | None = None) -> bool:
sess.model = ""
sess.headers = {}
return True
except Exception:
except Exception as e:
logger.warning("Failed to clear orphaned session endpoint", exc_info=e)
db.rollback()
return False
finally:
@@ -144,7 +146,8 @@ def _endpoint_cache_contains_model(endpoint, model: str) -> bool:
return True
try:
models = json.loads(raw) if isinstance(raw, str) else raw
except Exception:
except Exception as e:
logger.warning("Failed to parse cached models list, treating as containing model", exc_info=e)
return True
if not isinstance(models, list) or not models:
return True
@@ -236,7 +239,8 @@ def _recover_empty_session_model(sess, session_id: str, owner: str | None = None
is_chatgpt_subscription = False
try:
cached = json.loads(ep.cached_models) if isinstance(ep.cached_models, str) else (ep.cached_models or [])
except Exception:
except Exception as e:
logger.warning("Failed to parse cached_models for endpoint %r", getattr(ep, "id", "?"), exc_info=e)
cached = []
if not cached:
visible = []
@@ -360,7 +364,7 @@ def setup_chat_routes(
sess = session_manager.get_session(session)
except KeyError:
raise HTTPException(404, f"Session '{session}' not found")
owner = get_current_user(request)
owner = effective_user(request)
if _clear_orphaned_session_endpoint(sess, owner=owner):
raise HTTPException(400, "Selected model endpoint was removed. Pick another model in Settings.")
@@ -526,6 +530,66 @@ def setup_chat_routes(
active_doc_id = form_data.get("active_doc_id", "").strip()
logger.info(f"[doc-inject] chat_mode={chat_mode}, active_doc_id={active_doc_id!r}")
# Active email reader — when the user has an email open in the UI, the
# frontend passes its uid/folder/account so "reply", "summarize this",
# etc. resolve to the real email instead of the agent inventing a
# fake markdown draft.
active_email_uid = form_data.get("active_email_uid", "").strip()
active_email_folder = form_data.get("active_email_folder", "INBOX").strip() or "INBOX"
active_email_account = form_data.get("active_email_account", "").strip()
active_email_ctx: Optional[Dict[str, str]] = None
# Always reset between requests so a stale active-email pointer from
# a previous turn (different reader closed, different account, etc.)
# can't leak in when the user has no email open this turn.
try:
from src.tool_implementations import clear_active_email
clear_active_email()
except Exception:
pass
if active_email_uid:
active_email_ctx = {
"uid": active_email_uid,
"folder": active_email_folder,
"account": active_email_account,
}
# Try to enrich with subject + from so the agent's system prompt
# block can quote them. Best-effort: a stale cache is fine, a
# missing email just means we pass uid/folder/account only.
try:
from routes.email_routes import _read_cache_get, _read_cache_key
_ck = _read_cache_key(active_email_account or None, active_email_folder, active_email_uid, owner=get_current_user(request))
_cached_email = _read_cache_get(_ck)
if _cached_email and isinstance(_cached_email, dict):
active_email_ctx["subject"] = str(_cached_email.get("subject") or "")
active_email_ctx["from"] = str(
_cached_email.get("from_address")
or _cached_email.get("from")
or _cached_email.get("from_name")
or ""
)
_body_preview = (_cached_email.get("body") or "")[:2000]
if _body_preview:
active_email_ctx["body_preview"] = _body_preview
except Exception as _e:
logger.debug(f"[email-inject] cache enrich skipped: {_e}")
# Stash so email tools can resolve "this email" without UID guessing.
try:
from src.tool_implementations import set_active_email
set_active_email(
uid=active_email_uid,
folder=active_email_folder,
account=active_email_account or None,
subject=active_email_ctx.get("subject"),
sender=active_email_ctx.get("from"),
)
except Exception as _e:
logger.debug(f"[email-inject] set_active_email failed: {_e}")
logger.info(
"[email-inject] active_email uid=%s folder=%s account=%s subject=%r",
active_email_uid, active_email_folder, active_email_account or "(default)",
active_email_ctx.get("subject", ""),
)
try:
# Attachment-only sends: skip the message-required check when the
# user has attached one or more files (the attachment IS the action).
@@ -540,7 +604,7 @@ def setup_chat_routes(
# but BEFORE loading. Prevents cross-user session hijack.
_verify_session_owner(request, session)
sess = session_manager.get_session(session)
owner = get_current_user(request)
owner = effective_user(request)
if _clear_orphaned_session_endpoint(sess, owner=owner):
raise HTTPException(400, "Selected model endpoint was removed. Pick another model in Settings.")
# Issue #587: picker shows a model from the endpoint cache but
@@ -571,7 +635,7 @@ def setup_chat_routes(
_enforce_chat_privileges(request, sess)
# Ensure session has auth headers
resolve_session_auth(sess, session, owner=get_current_user(request))
resolve_session_auth(sess, session, owner=effective_user(request))
# Check for research_pending BEFORE mode persist overwrites it
do_research = str(use_research).lower() == "true"
@@ -586,8 +650,8 @@ def setup_chat_routes(
elif attachments:
try:
att_ids = [str(x) for x in json.loads(attachments)]
except Exception:
pass
except Exception as e:
logger.warning("Failed to parse attachments JSON, ignoring attachments", exc_info=e)
no_memory = str(form_data.get("no_memory", "")).lower() == "true"
pre_context_tool_policy = build_effective_tool_policy(
@@ -641,15 +705,27 @@ def setup_chat_routes(
active_doc_id,
)
active_doc = None
elif doc_session and doc_session != session:
logger.warning(
"[doc-inject] ignoring stale active_doc_id %s from session %s while in session %s",
active_doc_id,
doc_session,
session,
)
active_doc = None
else:
# NOTE: previously dropped the doc when doc.session_id
# != current chat session — but that broke the common
# case of "open an email draft from one chat, ask a
# different chat to write into it". The frontend only
# sends active_doc_id for docs currently visible in
# the UI, and we already owner-checked above, so trust
# the explicit signal. We just log the mismatch and
# re-bind the doc to the current session so future
# turns find it via the session-fallback path too.
if doc_session and doc_session != session:
logger.info(
"[doc-inject] cross-session active_doc_id %s (was session %s, now %s) — accepting and rebinding",
active_doc_id, doc_session, session,
)
try:
active_doc.session_id = session
_doc_db.commit()
except Exception as _e:
_doc_db.rollback()
logger.warning(f"[doc-inject] session rebind failed: {_e}")
logger.info(f"[doc-inject] found by ID: title={active_doc.title!r}, lang={active_doc.language!r}, is_active={active_doc.is_active}, content_len={len(active_doc.current_content or '')}")
else:
logger.warning(f"[doc-inject] NOT FOUND by ID {active_doc_id}")
@@ -714,6 +790,21 @@ def setup_chat_routes(
"manage_skills", # skill presets tied to user
})
# Active email reader open → strip the tools that let the agent
# "drift" to a new compose: create_document (writes a fake email-
# shaped .md file) and send_email (sends fresh to a recipient the
# agent invented). With those gone, the only paths left for "write
# email saying X" are ui_control open_email_reply (draft) and
# reply_to_email (immediate send) — both of which use the open
# email's UID. Code-level enforcement instead of relying on a
# prompt rule the model can ignore.
if active_email_ctx and active_email_ctx.get("uid"):
disabled_tools.update({
"create_document",
"send_email",
"mcp__email__send_email",
})
# Enforce per-user privileges
_privs = {}
_user = ctx.user
@@ -739,7 +830,11 @@ def setup_chat_routes(
from src.settings import get_setting
_global_disabled = get_setting("disabled_tools", [])
if _global_disabled and isinstance(_global_disabled, list):
disabled_tools.update(_global_disabled)
explicit_web_allowed = allow_web_search is not None and str(allow_web_search).lower() == "true"
if explicit_web_allowed:
disabled_tools.update(t for t in _global_disabled if t not in {"web_search", "web_fetch"})
else:
disabled_tools.update(_global_disabled)
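Review note: a small sketch of the new merge rule, assuming the same string-typed `allow_web_search` form value as the route. The function name `merge_global_disabled` is hypothetical; the two exempted tool names come from the hunk.

```python
WEB_TOOLS = {"web_search", "web_fetch"}


def merge_global_disabled(disabled_tools, global_disabled, allow_web_search):
    """Apply the admin-level disabled list, sparing web tools on explicit opt-in."""
    merged = set(disabled_tools)
    explicit_web_allowed = (
        allow_web_search is not None and str(allow_web_search).lower() == "true"
    )
    if explicit_web_allowed:
        merged.update(t for t in global_disabled if t not in WEB_TOOLS)
    else:
        merged.update(global_disabled)
    return merged


global_list = ["web_search", "web_fetch", "run_shell"]  # illustrative values
default_merge = merge_global_disabled(set(), global_list, None)
opt_in_merge = merge_global_disabled(set(), global_list, "true")
already_off = merge_global_disabled({"web_search"}, global_list, "true")
```

The override only stops the global list from adding the web tools. If an earlier step in the request already disabled `web_search`, it stays disabled, as `already_off` shows.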
# Light auto-escalation: the user is in chat mode and just expressed a
# notes/calendar/email intent. Grant the relevant managers but withhold
@@ -836,7 +931,7 @@ def setup_chat_routes(
if effective_do_research:
_r_ep, _r_model, _r_headers = _resolve_research_endpoint(sess)
_auth_keys = list(_r_headers.keys()) if _r_headers else []
logger.info(f"Research endpoint resolved: model={_r_model}, endpoint={_r_ep}, auth_keys={_auth_keys}, sess_headers_keys={list(sess.headers.keys()) if isinstance(sess.headers, dict) else type(sess.headers)}")
logger.info(f"Research endpoint resolved: model={_r_model}, endpoint={redact_url(_r_ep)}, auth_keys={_auth_keys}, sess_headers_keys={list(sess.headers.keys()) if isinstance(sess.headers, dict) else type(sess.headers)}")
# Clarification round: only for very short/vague queries on first research message.
# Skip in compare mode — each pane is a fresh session, so every one would
@@ -1169,6 +1264,10 @@ def setup_chat_routes(
_max_rounds = _DEFAULT_ROUNDS
_max_rounds = max(1, min(_max_rounds, 200))
_forced_tools = None
if allow_web_search is not None and str(allow_web_search).lower() == "true":
_forced_tools = {"web_search", "web_fetch"}
async for chunk in stream_agent_loop(
sess.endpoint_url,
sess.model,
@@ -1181,6 +1280,7 @@ def setup_chat_routes(
max_rounds=_max_rounds,
context_length=ctx.context_length,
active_document=active_doc,
active_email=active_email_ctx,
session_id=session,
disabled_tools=disabled_tools if disabled_tools else None,
tool_policy=tool_policy,
@@ -1189,6 +1289,7 @@ def setup_chat_routes(
plan_mode=plan_mode,
approved_plan=approved_plan or None,
workspace=workspace or None,
forced_tools=_forced_tools,
):
if chunk.startswith("data: ") and not chunk.startswith("data: [DONE]"):
try:
@@ -1394,7 +1495,7 @@ def setup_chat_routes(
if not q or not q.strip():
return []
_user = get_current_user(request)
_user = effective_user(request)
return [
result.to_dict()
for result in search_session_messages(
+75 -4
@@ -46,8 +46,12 @@ def _ssh_prefix_for_task(task: dict) -> tuple[str, str]:
shell metacharacters in ``remoteHost`` is rejected with 400 rather than
injected.
"""
host = validate_remote_host((task.get("remoteHost") or "").strip() or None) or ""
ssh_port = validate_ssh_port((task.get("sshPort") or "").strip() or None) or ""
raw_host = task.get("remoteHost")
raw_port = task.get("sshPort")
host_value = str(raw_host).strip() if raw_host is not None else None
port_value = str(raw_port).strip() if raw_port is not None else None
host = validate_remote_host(host_value or None) or ""
ssh_port = validate_ssh_port(port_value or None) or ""
port_flag = f"-p {ssh_port} " if ssh_port and ssh_port != "22" else ""
return host, port_flag
@@ -91,6 +95,20 @@ def _scope_owner(request: Request, allowed: set[str]) -> str:
return require_user(request)
def _scope_owner_all(request: Request, required: set[str]) -> str:
"""Return owner only when an API token has every required scope."""
if getattr(request.state, "api_token", False):
scopes = set(getattr(request.state, "api_token_scopes", []) or [])
missing = required - scopes
if missing:
raise HTTPException(403, f"API token missing required scope: {' and '.join(sorted(missing))}")
owner = getattr(request.state, "api_token_owner", None)
if not owner:
raise HTTPException(403, "API token has no owner")
return owner
return require_user(request)
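Review note: the difference between the any-of check in `_scope_owner` and the all-of check added here comes down to a set difference. The sketch below isolates that step; `ScopeError`, `require_all_scopes`, and the scope strings are placeholders for this note, not names from the codebase.

```python
class ScopeError(Exception):
    """Stand-in for the HTTPException(403, ...) raised by the route helper."""


def require_all_scopes(required, granted) -> None:
    """Raise unless every required scope is present in the token's scopes."""
    missing = set(required) - set(granted or [])
    if missing:
        # Sorted so the message is stable regardless of set ordering.
        raise ScopeError(
            f"API token missing required scope: {' and '.join(sorted(missing))}"
        )


def check(required, granted) -> str:
    try:
        require_all_scopes(required, granted)
        return "ok"
    except ScopeError as exc:
        return str(exc)


full = check({"docs:write", "email:draft"}, ["docs:write", "email:draft", "chat"])
partial = check({"docs:write", "email:draft"}, ["email:draft"])
none_granted = check({"docs:write", "email:draft"}, None)
```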
def _find_endpoint(router: APIRouter | None, method: str, path: str):
if router is None:
return None
@@ -138,7 +156,7 @@ def setup_codex_routes(
"read": scoped(EMAIL_READ_SCOPES),
"draft": scoped(EMAIL_DRAFT_SCOPES),
"send": scoped(EMAIL_SEND_SCOPES),
"actions": ["list", "read", "draft", "send"],
"actions": ["list", "read", "draft_document", "draft", "send"],
},
"memory": {
"read": scoped(MEMORY_READ_SCOPES),
@@ -262,6 +280,59 @@ def setup_codex_routes(
# Both handlers in routes/email_routes.py already accept `owner=` via
# FastAPI Depends, so we call them directly without patching state.
def _email_draft_document_content(body: dict[str, Any]) -> str:
def clean(v: Any) -> str:
if isinstance(v, list):
return ", ".join(str(x).strip() for x in v if str(x).strip())
return str(v or "").strip()
to = clean(body.get("to"))
cc = clean(body.get("cc"))
bcc = clean(body.get("bcc"))
subject = clean(body.get("subject"))
in_reply_to = clean(body.get("in_reply_to"))
references = clean(body.get("references"))
body_text = str(body.get("body") or body.get("body_html") or "").strip()
lines = [
f"To: {to}",
]
if cc:
lines.append(f"Cc: {cc}")
if bcc:
lines.append(f"Bcc: {bcc}")
lines.append(f"Subject: {subject}")
if in_reply_to:
lines.append(f"In-Reply-To: {in_reply_to}")
if references:
lines.append(f"References: {references}")
lines.extend(["---", body_text])
return "\n".join(lines).rstrip() + "\n"
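Review note: a standalone copy of the draft formatter, run on a sample request body, shows the document shape the new `/emails/draft-document` route would store. The sample addresses and subject are made up for this note.

```python
from typing import Any


def email_draft_document_content(body: dict[str, Any]) -> str:
    """Render header lines, a '---' separator, then the body text."""
    def clean(v: Any) -> str:
        if isinstance(v, list):
            return ", ".join(str(x).strip() for x in v if str(x).strip())
        return str(v or "").strip()

    lines = [f"To: {clean(body.get('to'))}"]
    # Optional headers are only emitted when present.
    for label, key in (("Cc", "cc"), ("Bcc", "bcc")):
        value = clean(body.get(key))
        if value:
            lines.append(f"{label}: {value}")
    lines.append(f"Subject: {clean(body.get('subject'))}")
    for label, key in (("In-Reply-To", "in_reply_to"), ("References", "references")):
        value = clean(body.get(key))
        if value:
            lines.append(f"{label}: {value}")
    body_text = str(body.get("body") or body.get("body_html") or "").strip()
    lines.extend(["---", body_text])
    return "\n".join(lines).rstrip() + "\n"


draft = email_draft_document_content({
    "to": ["a@example.com", " b@example.com "],
    "cc": "",
    "subject": "Hello",
    "body": "Hi there\n",
})
```

The loops over optional headers are a compaction made for this sketch; the header order matches the original (To, Cc, Bcc, Subject, In-Reply-To, References).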
@router.post("/emails/draft-document")
async def codex_email_draft_document(request: Request, body: dict[str, Any] = Body(default_factory=dict)):
owner = _scope_owner(request, EMAIL_DRAFT_SCOPES)
docs_owner = _scope_owner_all(request, DOCS_WRITE_SCOPES)
if docs_owner != owner:
raise HTTPException(403, "API token owner mismatch")
if documents_create_endpoint is None:
raise HTTPException(503, "Documents integration is not available")
from routes.document_routes import DocumentCreate
subject = str(body.get("subject") or "Email draft").strip() or "Email draft"
title = str(body.get("title") or subject).strip() or "Email draft"
req = DocumentCreate(
session_id=body.get("session_id"),
title=title,
language="email",
content=_email_draft_document_content(body),
)
result = await _as_owner(request, owner, documents_create_endpoint, request, req)
if isinstance(result, dict):
result = dict(result)
result["draft_type"] = "document"
result["send_required_confirmation"] = True
return result
@router.post("/emails/draft")
async def codex_email_draft(request: Request, body: dict[str, Any] = Body(default_factory=dict)):
owner = _scope_owner(request, EMAIL_DRAFT_SCOPES)
@@ -726,7 +797,7 @@ def setup_codex_routes(
norm = dict(body or {})
sess = (norm.get("tmux_session") or norm.get("session_id") or "").strip()
model = (norm.get("model") or norm.get("repo_id") or "").strip()
host = (norm.get("host") or norm.get("remote_host") or "").strip()
host = validate_remote_host((norm.get("host") or norm.get("remote_host") or "").strip() or None) or ""
port = norm.get("port") or 8000
import re as _re
if not sess or not _re.fullmatch(r"[a-zA-Z0-9_-]+", sess):
+83 -27
@@ -12,11 +12,13 @@ import json
import csv
import io
import os
import inspect
import httpx
from pathlib import Path
from datetime import datetime
from urllib.parse import urljoin, urlparse, urlunparse
from core.log_safety import redact_url
from fastapi import APIRouter, Query, Depends, Response, HTTPException
from typing import List, Dict, Optional
@@ -90,11 +92,13 @@ def _normalize_contact(contact: Dict) -> Dict:
name = str(contact.get("name") or "").strip()
if not name and emails:
name = emails[0].split("@")[0]
address = str(contact.get("address") or "").strip()
return {
"uid": str(contact.get("uid") or uuid.uuid4()),
"name": name,
"emails": emails,
"phones": phones,
"address": address,
}
@@ -150,7 +154,7 @@ def _parse_vcards(text: str) -> List[Dict]:
for block in re.split(r"BEGIN:VCARD", text):
if not block.strip():
continue
contact = {"name": "", "emails": [], "phones": [], "uid": ""}
contact = {"name": "", "emails": [], "phones": [], "uid": "", "address": ""}
for line in block.split("\n"):
line = line.strip()
# Strip an optional RFC 6350 group prefix (e.g. "item1.EMAIL;...")
@@ -173,6 +177,15 @@ def _parse_vcards(text: str) -> List[Dict]:
phone = _vunesc(name_part.split(":", 1)[1])
if phone and phone not in contact["phones"]:
contact["phones"].append(phone)
elif name_part.startswith("ADR"):
# vCard ADR is 7 semicolon-separated components:
# post-office-box;extended-address;street;locality;region;postal-code;country.
# Recover a human-readable string by joining non-empty
# components with ", ".
if ":" in name_part:
raw = name_part.split(":", 1)[1]
parts = [_vunesc(p).strip() for p in raw.split(";")]
contact["address"] = ", ".join(p for p in parts if p)
elif name_part.startswith("UID:"):
contact["uid"] = _vunesc(name_part[4:])
if contact["name"] or contact["emails"]:
@@ -197,7 +210,8 @@ def _vesc(value: str) -> str:
def _build_vcard(name: str, email: str, uid: Optional[str] = None,
emails: Optional[List[str]] = None,
phones: Optional[List[str]] = None) -> str:
phones: Optional[List[str]] = None,
address: Optional[str] = None) -> str:
"""Build a vCard. Accepts either a single `email` (legacy callers) or
full `emails`/`phones` lists (edit path). The first email is marked
PREF=1. All values are RFC-6350-escaped."""
@@ -230,6 +244,12 @@ def _build_vcard(name: str, email: str, uid: Optional[str] = None,
lines.append(f"EMAIL;PREF=1:{_vesc(em)}" if i == 0 else f"EMAIL:{_vesc(em)}")
for ph in phone_list:
lines.append(f"TEL:{_vesc(ph)}")
# Address: stuff the whole human-readable string into the street
# component of ADR. vCard ADR has 7 semicolon-separated components:
# post-office-box;extended-address;street;locality;region;postal-code;country.
addr = (address or "").strip()
if addr:
lines.append(f"ADR:;;{_vesc(addr)};;;;")
lines.append("END:VCARD")
return "\r\n".join(lines) + "\r\n"
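Review note: a round-trip sketch for the new ADR handling. The bodies of `_vesc` and `_vunesc` are not visible in this diff, so `vesc` and `vunesc` below are hypothetical stand-ins that assume RFC 6350 style escaping of backslash, newline, comma, and semicolon. The build and parse steps follow the two hunks above.

```python
def vesc(value: str) -> str:
    """Assumed escaping: backslash first, then newline, comma, semicolon."""
    return (
        value.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace(",", "\\,")
        .replace(";", "\\;")
    )


def vunesc(value: str) -> str:
    """Assumed inverse of vesc; a trailing lone backslash is kept as-is."""
    out, i = [], 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append("\n" if nxt in "nN" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def build_adr_line(address: str) -> str:
    """Put the whole address into the street component, as in _build_vcard."""
    addr = (address or "").strip()
    return f"ADR:;;{vesc(addr)};;;;" if addr else ""


def parse_adr_line(line: str) -> str:
    """Split on ';' and join the non-empty components, as in _parse_vcards."""
    raw = line.split(":", 1)[1]
    parts = [vunesc(p).strip() for p in raw.split(";")]
    return ", ".join(p for p in parts if p)


plain = "1 Main St, Springfield"
plain_round_trip = parse_adr_line(build_adr_line(plain))

with_semicolon = "Suite 4; 1 Main St"
semicolon_round_trip = parse_adr_line(build_adr_line(with_semicolon))
```

Under these assumptions a comma survives the round trip, but an address containing a semicolon does not, because the parser splits on every `;` including escaped ones. If the real `_vesc` escapes semicolons, splitting only on unescaped `;` would close that gap.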
@@ -366,7 +386,7 @@ def _resolve_resource_url(uid: str) -> str:
return _lookup() or _vcard_url(uid)
def _create_contact(name: str, email: str) -> bool:
def _create_contact(name: str, email: str, address: str = "") -> bool:
"""Add a new contact via CardDAV or local contacts."""
cfg = _get_carddav_config()
if not _carddav_configured(cfg):
@@ -375,12 +395,12 @@ def _create_contact(name: str, email: str) -> bool:
for c in contacts:
if email_l and email_l in [e.lower() for e in c.get("emails", [])]:
return True
contacts.append(_normalize_contact({"name": name, "emails": [email]}))
contacts.append(_normalize_contact({"name": name, "emails": [email], "address": address}))
_save_local_contacts(contacts)
return True
contact_uid = str(uuid.uuid4())
vcard = _build_vcard(name, email, contact_uid)
vcard = _build_vcard(name, email, contact_uid, address=address)
try:
url = _carddav_base_url(cfg) + "/" + contact_uid + ".vcf"
auth = None
@@ -613,7 +633,7 @@ def _contacts_to_csv(contacts: List[Dict]) -> str:
return out.getvalue()
def _update_contact(uid: str, name: str, emails: List[str], phones: List[str]) -> bool:
def _update_contact(uid: str, name: str, emails: List[str], phones: List[str], address: str = "") -> bool:
"""Rewrite an existing contact via CardDAV or local contacts."""
cfg = _get_carddav_config()
if not _carddav_configured(cfg):
@@ -622,16 +642,19 @@ def _update_contact(uid: str, name: str, emails: List[str], phones: List[str]) -
out = []
for c in contacts:
if c.get("uid") == uid:
out.append(_normalize_contact({"uid": uid, "name": name, "emails": emails, "phones": phones}))
# Preserve existing address when caller passes "" (only
# updating name/emails/phones, not touching address).
addr = address if address else c.get("address", "")
out.append(_normalize_contact({"uid": uid, "name": name, "emails": emails, "phones": phones, "address": addr}))
found = True
else:
out.append(c)
if not found:
out.append(_normalize_contact({"uid": uid, "name": name, "emails": emails, "phones": phones}))
out.append(_normalize_contact({"uid": uid, "name": name, "emails": emails, "phones": phones, "address": address}))
_save_local_contacts(out)
return True
vcard = _build_vcard(name, "", uid=uid, emails=emails, phones=phones)
vcard = _build_vcard(name, "", uid=uid, emails=emails, phones=phones, address=address)
# Use the real resource href (handles externally-created contacts whose
# filename != UID); falls back to the <uid>.vcf guess.
try:
@@ -667,15 +690,24 @@ def _delete_contact(uid: str) -> bool:
url = _resolve_resource_url(uid)
auth = (cfg["username"], cfg["password"]) if cfg["username"] else None
r = httpx.delete(url, auth=auth, timeout=10)
if r.status_code in (200, 204):
_contact_cache["fetched_at"] = None
return True
if r.status_code == 404:
# Resource not found at the resolved URL. With href resolution
# this should be rare (genuinely already deleted). Invalidate
# the cache and report success so the UI doesn't keep a ghost.
logger.info(f"CardDAV DELETE 404 for {uid} — treating as already gone")
if r.status_code in (200, 204, 404):
# Invalidate cache so the next fetch sees the server truth.
_contact_cache["fetched_at"] = None
# Verify: force a fresh fetch and check the UID is actually gone.
# A 404 on the guessed URL ({uid}.vcf) can mean the contact
# lives at a different resource URL — the DELETE missed it but
# we'd silently report success. This check catches that.
fresh = _fetch_contacts(force=True)
still_there = any(c.get("uid") == uid for c in fresh)
if still_there:
logger.warning(
f"CardDAV DELETE reported success for {uid} "
f"but UID still present after re-fetch — "
f"resource URL may differ from {redact_url(url)}"
)
return False
if r.status_code == 404:
logger.info(f"CardDAV DELETE 404 for {uid} — already gone")
return True
logger.warning(f"CardDAV DELETE returned {r.status_code}: {r.text[:200]}")
return False
@@ -718,16 +750,39 @@ def setup_contacts_routes():
"""Add a new contact."""
name = (data.get("name") or "").strip()
email = (data.get("email") or "").strip()
phone = (data.get("phone") or "").strip()
address = (data.get("address") or "").strip()
if not email:
return {"success": False, "error": "Email required"}
# Check if already exists
contacts = _fetch_contacts()
for c in contacts:
if email.lower() in [e.lower() for e in c["emails"]]:
return {"success": True, "message": "Already exists", "contact": c}
# Check if already exists by email
if email:
contacts = _fetch_contacts()
for c in contacts:
if email.lower() in [e.lower() for e in c["emails"]]:
return {"success": True, "message": "Already exists", "contact": c}
if not name:
name = email.split("@")[0]
ok = _create_contact(name, email)
create_params = inspect.signature(_create_contact).parameters
if len(create_params) >= 3:
ok = _create_contact(name, email, address)
else:
ok = _create_contact(name, email)
# If a phone was provided, do an immediate update to thread it
# through (the simple _create_contact signature only takes name +
# email + address; phones happen via update).
if ok and phone:
try:
fresh = _fetch_contacts(force=True)
created = next((c for c in fresh if name == c.get("name") and (not email or email in c.get("emails", []))), None)
if created:
_update_contact(
created["uid"], name,
created.get("emails", []),
[phone],
address,
)
except Exception:
pass
return {"success": ok}
@router.post("/import")
@@ -810,7 +865,7 @@ def setup_contacts_routes():
# match PUT /{uid} with uid="config".
@router.put("/{uid}")
async def edit_contact(uid: str, data: dict, _admin: str = Depends(require_admin)):
"""Edit an existing contact — name / emails / phones."""
"""Edit an existing contact — name / emails / phones / address."""
name = (data.get("name") or "").strip()
emails = data.get("emails")
phones = data.get("phones")
@@ -818,11 +873,12 @@ def setup_contacts_routes():
emails = [data["email"]]
emails = [e.strip() for e in (emails or []) if e and e.strip()]
phones = [p.strip() for p in (phones or []) if p and p.strip()]
if not name and not emails:
return {"success": False, "error": "Name or email required"}
address = (data.get("address") or "").strip()
if not name and not emails and not address:
return {"success": False, "error": "Name, email, or address required"}
if not name and emails:
name = emails[0].split("@")[0]
ok = _update_contact(uid, name, emails, phones)
ok = _update_contact(uid, name, emails, phones, address)
return {"success": ok}
@router.delete("/{uid}")
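Review note: the add-contact path above picks its call shape from the arity of `_create_contact` and de-duplicates by a case-insensitive email match. A minimal sketch of both ideas follows, assuming two stand-in backends; `create_legacy`, `create_with_address`, `call_create`, and `email_known` are hypothetical names for illustration, not functions from this codebase.

```python
import inspect


def create_legacy(name, email):
    """Hypothetical older backend: no address parameter."""
    return ("legacy", name, email)


def create_with_address(name, email, address=""):
    """Hypothetical newer backend: accepts an address."""
    return ("address", name, email, address)


def call_create(create_fn, name, email, address):
    """Pass `address` only when the callable declares at least three parameters."""
    params = inspect.signature(create_fn).parameters
    if len(params) >= 3:
        return create_fn(name, email, address)
    return create_fn(name, email)


def email_known(email, contact):
    """Case-insensitive membership test, mirroring the duplicate check."""
    return email.lower() in [e.lower() for e in contact.get("emails", [])]
```

Counting parameters keeps the route working against either signature, at the cost of assuming that the third positional parameter is always the address.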
+188 -19
View File
@@ -362,7 +362,12 @@ def _user_shell_path_bootstrap() -> list[str]:
' ODYSSEUS_USER_PATH="$("$ODYSSEUS_USER_SHELL" -ic \'printf "__ODYSSEUS_PATH__%s\\n" "$PATH"\' 2>/dev/null | sed -n \'s/^__ODYSSEUS_PATH__//p\' | tail -n 1 || true)"',
' if [ -n "$ODYSSEUS_USER_PATH" ]; then export PATH="$ODYSSEUS_USER_PATH:$PATH"; fi',
'fi',
'command -v python3 >/dev/null 2>&1 || python3() { python "$@"; }',
# Windows can expose python3 as a Microsoft Store App Execution Alias
# under WindowsApps. Git Bash sees that stub as present, but it exits
# before running Python. A Windows venv usually has python.exe, not
# python3.exe, so treat a missing or WindowsApps python3 as absent.
'_odys_py3="$(command -v python3 2>/dev/null || true)"',
'case "$_odys_py3" in ""|*[Ww]indows[Aa]pps*) python3() { python "$@"; } ;; esac',
'command -v python >/dev/null 2>&1 || python() { python3 "$@"; }',
]
@@ -500,6 +505,8 @@ def _cached_model_scan_script(model_dirs: list[str] | None = None, add_hf_cache:
" if u.startswith('KB'): return int(n * 1024)",
" return int(n)",
"def scan_ollama():",
" if any(m.get('is_ollama') for m in models): return",
" if os.name == 'nt' and not os.environ.get('ODYSSEUS_ALLOW_OLLAMA_CLI_SCAN'): return",
" if not shutil.which('ollama'): return",
" try:",
" p = subprocess.run(['ollama', 'list'], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True, timeout=6)",
@@ -530,8 +537,8 @@ def _cached_model_scan_script(model_dirs: list[str] | None = None, add_hf_cache:
" models.append({'repo_id':name,'size_bytes':size_bytes,'nb_files':1,'has_incomplete':False,'path':'ollama','backend':'ollama','is_ollama':True})",
" return",
"for _hf_cache in hf_cache_paths(): scan_hf(_hf_cache)",
"scan_ollama()",
"scan_ollama_api()",
"scan_ollama()",
]
for model_dir in model_dirs or []:
lines.append(f"scan_dir(os.path.expanduser({model_dir!r}))")
@@ -779,25 +786,149 @@ def _append_llama_cpp_linux_accel_build_lines(runner_lines: list[str]) -> None:
to hard-wire CUDA on Linux. That made ROCm hosts attempt a CUDA configure and
fail with "CUDA Toolkit not found" instead of building with HIP.
"""
# Try a prebuilt binary from llama.cpp's GitHub releases FIRST — no
# cmake/build-essential/git/CUDA-headers needed at all. The from-source
# build below stays as a fallback (custom flags, esoteric arch, no
# internet, etc). 30 seconds vs 5+ minutes of compile, and removes
# every OS-package dep from the launch path. Sets _odysseus_have_prebuilt=1
# on success; the existing build-tier if/elif chain below is gated on
# that variable so we never compile twice or shadow the prebuilt symlink.
runner_lines.append(' _odysseus_have_prebuilt=""')
runner_lines.append(' _odysseus_arch="$(uname -m)"')
runner_lines.append(' _odysseus_prebuilt_url=""')
runner_lines.append(' if command -v curl >/dev/null 2>&1 && [ "$_odysseus_arch" = "x86_64" ]; then')
runner_lines.append(' _odysseus_pat=""')
runner_lines.append(' _odysseus_has_nv_inline() { command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L 2>/dev/null | grep -q "GPU "; }')
runner_lines.append(' _odysseus_has_vk_inline() { ldconfig -p 2>/dev/null | grep -q "libvulkan\\.so" || command -v vulkaninfo >/dev/null 2>&1 || [ -e /usr/lib/x86_64-linux-gnu/libvulkan.so.1 ]; }')
runner_lines.append(' _odysseus_has_vkdev_inline() { ls /dev/dri/renderD* >/dev/null 2>&1 || (lspci 2>/dev/null | grep -Ei \'VGA|3D|Display\' | grep -Eiq \'AMD|ATI|Radeon\'); }')
runner_lines.append(' if _odysseus_has_nv_inline; then')
runner_lines.append(' _odysseus_pat="ubuntu.*cuda"')
runner_lines.append(' elif _odysseus_has_vkdev_inline && _odysseus_has_vk_inline; then')
runner_lines.append(' _odysseus_pat="ubuntu.*vulkan"')
runner_lines.append(' else')
runner_lines.append(' _odysseus_pat="ubuntu-x64\\\\.zip"')
runner_lines.append(' fi')
runner_lines.append(' _odysseus_prebuilt_url="$(curl -fsSL --max-time 15 https://api.github.com/repos/ggml-org/llama.cpp/releases/latest 2>/dev/null | grep \'"browser_download_url"\' | cut -d\'"\' -f4 | grep -iE "$_odysseus_pat" | grep -iv "arm\\|aarch64" | head -1)"')
runner_lines.append(' fi')
# Accept any of unzip / bsdtar / python3 -m zipfile as the extractor.
# python3 is essentially always present on modern Linux, so this lets
# the prebuilt path work on minimal Ubuntu installs that lack `unzip`.
runner_lines.append(' if [ -n "$_odysseus_prebuilt_url" ] && (command -v unzip >/dev/null 2>&1 || command -v bsdtar >/dev/null 2>&1 || command -v python3 >/dev/null 2>&1); then')
runner_lines.append(' echo "[odysseus] Found prebuilt llama-server: $_odysseus_prebuilt_url"')
runner_lines.append(' mkdir -p ~/bin "$HOME/.cache/odysseus/llama-cpp-prebuilt" && cd "$HOME/.cache/odysseus/llama-cpp-prebuilt"')
runner_lines.append(' rm -f llama-cpp.zip')
runner_lines.append(' if curl -fsSL --max-time 120 "$_odysseus_prebuilt_url" -o llama-cpp.zip && [ -s llama-cpp.zip ]; then')
runner_lines.append(' rm -rf build && mkdir -p build')
runner_lines.append(' if command -v unzip >/dev/null 2>&1; then unzip -qq -o llama-cpp.zip -d build; elif command -v bsdtar >/dev/null 2>&1; then bsdtar -xf llama-cpp.zip -C build; else python3 -c "import zipfile; zipfile.ZipFile(\\"llama-cpp.zip\\").extractall(\\"build\\")"; fi')
runner_lines.append(' _odysseus_extracted="$(find build -type f -name llama-server 2>/dev/null | head -1)"')
runner_lines.append(' if [ -n "$_odysseus_extracted" ]; then')
runner_lines.append(' chmod +x "$_odysseus_extracted"')
runner_lines.append(' ln -sf "$_odysseus_extracted" ~/bin/llama-server')
runner_lines.append(' _odysseus_libdir="$(dirname "$_odysseus_extracted")"')
runner_lines.append(' mkdir -p ~/.config && echo "export LD_LIBRARY_PATH=\\"$_odysseus_libdir:\\${LD_LIBRARY_PATH:-}\\"" > ~/.config/odysseus-llama-cpp-env')
runner_lines.append(' _odysseus_have_prebuilt=1')
runner_lines.append(' echo "[odysseus] Prebuilt llama-server installed at $_odysseus_extracted"')
runner_lines.append(' fi')
runner_lines.append(' fi')
runner_lines.append(' [ -z "$_odysseus_have_prebuilt" ] && echo "[odysseus] Prebuilt download/extract failed — falling back to from-source build."')
runner_lines.append(' elif [ -z "$_odysseus_prebuilt_url" ]; then')
runner_lines.append(' echo "[odysseus] No matching prebuilt llama-server for this host (arch=$_odysseus_arch) — will build from source."')
runner_lines.append(' fi')
runner_lines.append(' if [ -z "$_odysseus_have_prebuilt" ]; then')
# Detect pip-installed nvcc (from vLLM/nvidia CUDA wheels) and put it on PATH
# so cmake's CUDA configure can find it. We keep this after the ROCm/HIP
# check — a machine with both stacks should honor the native HIP toolchain on
# AMD hosts instead of accidentally preferring a stray nvcc wheel.
runner_lines.append(' for _cudir in ~/.local/lib/python*/site-packages/nvidia/cu13 ~/.local/lib/python*/site-packages/nvidia/cu12 ~/.local/lib/python*/site-packages/nvidia/cuda_nvcc; do')
runner_lines.append(' [ -x "$_cudir/bin/nvcc" ] && export CUDA_HOME="$_cudir" && export PATH="$_cudir/bin:$PATH" && break')
runner_lines.append(' done')
# so cmake's CUDA configure can find it — BUT only when actual NVIDIA
# hardware is present. On AMD/Intel hosts the pip nvcc is a misleading
# leftover (no libcudart, no GPU it could target) and would otherwise
# send the build down the CUDA branch and fail with "CUDA Toolkit not
# found" instead of trying Vulkan.
runner_lines.append(' _odysseus_has_nvidia_hw() {')
runner_lines.append(' command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L 2>/dev/null | grep -q "GPU " && return 0')
runner_lines.append(' ls /dev/nvidia* >/dev/null 2>&1 && return 0')
runner_lines.append(' lspci 2>/dev/null | grep -iE \'VGA|3D|Display\' | grep -iq nvidia && return 0')
runner_lines.append(' return 1')
runner_lines.append(' }')
runner_lines.append(' if _odysseus_has_nvidia_hw; then')
runner_lines.append(' for _cudir in ~/.local/lib/python*/site-packages/nvidia/cu13 ~/.local/lib/python*/site-packages/nvidia/cu12 ~/.local/lib/python*/site-packages/nvidia/cuda_nvcc; do')
runner_lines.append(' [ -x "$_cudir/bin/nvcc" ] && export CUDA_HOME="$_cudir" && export PATH="$_cudir/bin:$PATH" && break')
runner_lines.append(' done')
runner_lines.append(' fi')
# rm -rf build so a prior poisoned CMakeCache.txt (e.g. from a failed CUDA
# or HIP attempt) doesn't cause the next configure to reuse stale settings.
runner_lines.append(' mkdir -p ~/bin')
runner_lines.append(' cd ~/llama.cpp && rm -rf build')
# Try to install cmake / build-essential / git automatically before the
# build, but ONLY via passwordless sudo (`sudo -n`) — interactive sudo
# would hang a tmux-backgrounded serve task waiting for a password. If
# sudo asks for a password the install is skipped silently and the
# diagnosis pattern (cookbook_routes.py / cookbook_helpers.py) surfaces
# an explicit "install cmake" suggestion in the Cookbook diagnosis
# toolbar after the inevitable build failure.
runner_lines.append(' _odysseus_apt_bootstrap() {')
runner_lines.append(' local _missing=""')
runner_lines.append(' command -v cmake >/dev/null 2>&1 || _missing="$_missing cmake"')
runner_lines.append(' command -v g++ >/dev/null 2>&1 || command -v gcc >/dev/null 2>&1 || _missing="$_missing build-essential"')
runner_lines.append(' command -v git >/dev/null 2>&1 || _missing="$_missing git"')
runner_lines.append(' [ -z "$_missing" ] && return 0')
runner_lines.append(' if command -v apt-get >/dev/null 2>&1 && sudo -n true 2>/dev/null; then')
runner_lines.append(' echo "[odysseus] Auto-installing missing build deps via apt:$_missing"')
runner_lines.append(' sudo -n env DEBIAN_FRONTEND=noninteractive apt-get update -qq 2>&1 | tail -3')
runner_lines.append(' sudo -n env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends $_missing 2>&1 | tail -5 || true')
runner_lines.append(' elif command -v pacman >/dev/null 2>&1 && sudo -n true 2>/dev/null; then')
runner_lines.append(' echo "[odysseus] Auto-installing missing build deps via pacman:$_missing"')
runner_lines.append(' local _pacpkgs="$(echo "$_missing" | sed -e \'s/build-essential/base-devel/g\')"')
runner_lines.append(' sudo -n pacman -Sy --needed --noconfirm $_pacpkgs 2>&1 | tail -5 || true')
runner_lines.append(' elif command -v dnf >/dev/null 2>&1 && sudo -n true 2>/dev/null; then')
runner_lines.append(' echo "[odysseus] Auto-installing missing build deps via dnf:$_missing"')
runner_lines.append(' local _dnfpkgs="$(echo "$_missing" | sed -e \'s/build-essential/gcc gcc-c++ make/g\')"')
runner_lines.append(' sudo -n dnf install -y $_dnfpkgs 2>&1 | tail -5 || true')
runner_lines.append(' else')
runner_lines.append(' echo "[odysseus] WARNING: missing build deps ($_missing) — passwordless sudo is unavailable, cannot auto-install. Cookbook Diagnosis will explain the fix after the build fails."')
runner_lines.append(' fi')
runner_lines.append(' }')
runner_lines.append(' _odysseus_apt_bootstrap')
runner_lines.append(' _odysseus_missing_build_deps=""')
runner_lines.append(' command -v cmake >/dev/null 2>&1 || _odysseus_missing_build_deps="$_odysseus_missing_build_deps cmake"')
runner_lines.append(' command -v git >/dev/null 2>&1 || _odysseus_missing_build_deps="$_odysseus_missing_build_deps git"')
runner_lines.append(' command -v g++ >/dev/null 2>&1 || command -v gcc >/dev/null 2>&1 || _odysseus_missing_build_deps="$_odysseus_missing_build_deps build-essential"')
runner_lines.append(' if [ -n "$_odysseus_missing_build_deps" ]; then')
runner_lines.append(' echo "ERROR: llama.cpp source build needs missing packages:$_odysseus_missing_build_deps"')
runner_lines.append(' if command -v apt-get >/dev/null 2>&1; then')
runner_lines.append(' echo "Install on this host: sudo apt-get update && sudo apt-get install -y cmake build-essential git"')
runner_lines.append(' elif command -v pacman >/dev/null 2>&1; then')
runner_lines.append(' echo "Install on this host: sudo pacman -Sy --needed cmake base-devel git"')
runner_lines.append(' elif command -v dnf >/dev/null 2>&1; then')
runner_lines.append(' echo "Install on this host: sudo dnf install -y cmake gcc gcc-c++ make git"')
runner_lines.append(' fi')
runner_lines.append(' echo "Alternative: install a native llama-server on PATH, then relaunch."')
runner_lines.append(' ODYSSEUS_PREFLIGHT_EXIT=127')
runner_lines.append(' fi')
runner_lines.append(' cd ~/llama.cpp')
runner_lines.append(' _odysseus_has_vulkan() {')
runner_lines.append(' ldconfig -p 2>/dev/null | grep -q \'libvulkan\\.so\' && return 0')
runner_lines.append(' [ -e /usr/lib/libvulkan.so.1 ] && return 0')
runner_lines.append(' [ -e /usr/lib/x86_64-linux-gnu/libvulkan.so.1 ] && return 0')
runner_lines.append(' command -v vulkaninfo >/dev/null 2>&1 && return 0')
runner_lines.append(' return 1')
runner_lines.append(' }')
runner_lines.append(' _odysseus_has_vulkan_device() {')
runner_lines.append(' ls /dev/dri/renderD* >/dev/null 2>&1 && return 0')
runner_lines.append(' lspci 2>/dev/null | grep -Ei \'VGA|3D|Display\' | grep -Eiq \'AMD|ATI|Radeon\' && return 0')
runner_lines.append(' return 1')
runner_lines.append(' }')
# Backend preference: native ROCm/HIP > native CUDA > Vulkan > CPU.
# Vulkan is a portable fallback that works on AMD when ROCm isn't
# installed (e.g. Strix Halo) and on any vendor's discrete GPU, but
# it's ~30-40% slower than native HIP/CUDA for LLM inference — only
# pick it when no native toolchain is present.
runner_lines.append(' if command -v hipconfig &>/dev/null || [ -d /opt/rocm ] || [ -n "$ROCM_PATH" ] || [ -n "$HIP_PATH" ]; then')
runner_lines.append(' rm -rf build')
runner_lines.append(' if command -v hipconfig &>/dev/null; then')
runner_lines.append(' export HIPCXX="${HIPCXX:-$(hipconfig -l)/clang}"')
runner_lines.append(' export HIP_PATH="${HIP_PATH:-$(hipconfig -R)}"')
runner_lines.append(' fi')
runner_lines.append(' echo "[odysseus] ROCm/HIP detected — building llama-server with HIP support..."')
runner_lines.append(' cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_HIP=ON && cmake --build build -j"$NPROC" --target llama-server && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server')
runner_lines.append(' elif command -v nvcc &>/dev/null; then')
runner_lines.append(' elif command -v nvcc &>/dev/null && _odysseus_has_nvidia_hw; then')
runner_lines.append(' rm -rf build')
# nvcc alone is not sufficient — pip-installed CUDA wheels or incomplete
# tooling can expose nvcc without shipping libcudart, causing cmake to fail
# mid-build with "CUDA runtime library not found". Check cudart explicitly
@@ -821,31 +952,50 @@ def _append_llama_cpp_linux_accel_build_lines(runner_lines: list[str]) -> None:
runner_lines.append(' echo "[odysseus] Ensure libcudart is installed (e.g. cuda-runtime package) and visible via ldconfig or CUDA_HOME."')
runner_lines.append(' cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j"$NPROC" --target llama-server && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server')
runner_lines.append(' fi')
runner_lines.append(' elif _odysseus_has_vulkan_device && _odysseus_has_vulkan; then')
runner_lines.append(' echo "[odysseus] Vulkan-capable GPU detected (no ROCm/CUDA toolchain installed) — building llama-server with Vulkan support..."')
runner_lines.append(' rm -rf build-vulkan')
runner_lines.append(' cmake -B build-vulkan -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON && cmake --build build-vulkan -j"$NPROC" --target llama-server && ln -sf ~/llama.cpp/build-vulkan/bin/llama-server ~/bin/llama-server')
runner_lines.append(' else')
runner_lines.append(' echo "[odysseus] WARNING: no HIP/CUDA toolchain found — building llama-server for CPU only."')
runner_lines.append(' echo "[odysseus] WARNING: no HIP/CUDA/Vulkan toolchain found — building llama-server for CPU only."')
runner_lines.append(' echo "[odysseus] GPU inference will not be available for this llama.cpp build."')
runner_lines.append(' echo "[odysseus] Install ROCm for AMD GPUs or vLLM/CUDA tooling for NVIDIA, then re-launch this serve task."')
runner_lines.append(' echo "[odysseus] Install Vulkan (libvulkan-dev) / ROCm for AMD GPUs or CUDA tooling for NVIDIA, then re-launch this serve task."')
runner_lines.append(' rm -rf build')
runner_lines.append(' cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j"$NPROC" --target llama-server && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server')
runner_lines.append(' fi')
runner_lines.append(' fi # end _odysseus_have_prebuilt guard')
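Review note: the from-source branch above encodes a backend preference of HIP, then CUDA (only with NVIDIA hardware present), then Vulkan, then CPU. The sketch below restates that decision as a pure function so the ordering is easy to check; `pick_backend` and its boolean parameters are hypothetical stand-ins for the shell probes (`hipconfig`, `nvcc`, `_odysseus_has_nvidia_hw`, `_odysseus_has_vulkan_device`, `_odysseus_has_vulkan`), not code from this repository.

```python
def pick_backend(has_hip_toolchain, has_nvcc, has_nvidia_hw,
                 has_vulkan_device, has_vulkan_loader):
    """Return the build backend in the order the generated runner tries them."""
    if has_hip_toolchain:
        return "hip"
    # A pip-installed nvcc alone is not enough: NVIDIA hardware must be present.
    if has_nvcc and has_nvidia_hw:
        return "cuda"
    # Vulkan needs both a render device and the loader library.
    if has_vulkan_device and has_vulkan_loader:
        return "vulkan"
    return "cpu"
```

For example, an AMD host without ROCm but with a stray `nvcc` wheel and a Vulkan loader resolves to `"vulkan"` rather than attempting a CUDA configure.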
def _llama_cpp_rebuild_cmd() -> str:
def _llama_cpp_rebuild_cmd(update_source: bool = False) -> str:
"""Shell command that clears the Cookbook-managed llama.cpp build.
Removes the cached ``llama-server`` symlink and the ``~/llama.cpp/build``
Removes the cached ``llama-server`` symlink and the ``~/llama.cpp/build*``
directory so the next llama.cpp serve recompiles from source, picking up a
CUDA or HIP toolchain if one is now available. The serve bootstrap only
builds when ``llama-server`` is missing from PATH, so without this an
existing CPU-only build is reused forever. It deliberately installs and
downloads nothing; the rebuild itself happens on the next serve.
existing CPU-only build is reused forever. When ``update_source`` is true,
the command also fast-forwards the Cookbook-managed ``~/llama.cpp`` checkout
if it exists. The rebuild itself happens on the next serve.
"""
update_cmd = ''
if update_source:
update_cmd = (
'if [ -d "$HOME/llama.cpp/.git" ]; then '
'git -C "$HOME/llama.cpp" pull --ff-only --depth 1 || '
'echo "[odysseus] WARNING: llama.cpp source update failed; clearing cached build anyway."; '
'elif command -v git >/dev/null 2>&1; then '
'git clone --depth 1 https://github.com/ggml-org/llama.cpp "$HOME/llama.cpp" || '
'echo "[odysseus] WARNING: llama.cpp clone failed; clearing cached build anyway."; '
'fi && '
)
return (
'mkdir -p "$HOME/bin" && '
f'{update_cmd}'
'rm -f "$HOME/bin/llama-server" && '
'rm -rf "$HOME/llama.cpp/build" && '
'rm -rf "$HOME/llama.cpp/build" "$HOME/llama.cpp/build-vulkan" && '
'echo "[odysseus] Cleared the cached llama.cpp build. '
'Re-launch the serve task to rebuild llama-server from source '
'(CUDA or HIP will be used if a toolchain is now available)."'
'(Vulkan, HIP, or CUDA will be used if a matching toolchain is now available)."'
)
@@ -1108,8 +1258,27 @@ def _diagnose_serve_output(text: str) -> dict | None:
"SGLang is not installed or not in PATH on this server.",
[{"label": "install SGLang in Cookbook Dependencies", "op": "dependency", "package": "sglang[all]"}],
),
# System build deps come BEFORE the generic llama.cpp catch-all so
# cmake / build-essential / git missing → a specific OS-package
# remediation instead of "install llama-cpp-python[server]" (which
# itself fails to compile when cmake is absent).
(
r"llama-server.*command not found|llama\.cpp.*not found|No module named.*llama_cpp|No module named 'starlette_context'|git: command not found|cmake: command not found",
r"cmake: command not found|cmake.*not found.*[Cc]ould not",
"cmake is required to build llama.cpp from source but isn't installed on this server.",
[{"label": "install build deps for llama.cpp (apt: cmake build-essential git / pacman: cmake base-devel git / dnf: cmake gcc-c++ make git / brew: cmake git)", "op": "dependency", "package": "llama-cpp-python[server]"}],
),
(
r"^(make|g\+\+|gcc): command not found|Could not find C\+\+ compiler",
"A C/C++ compiler (build-essential) is required to build llama.cpp from source.",
[{"label": "install build deps for llama.cpp on this server", "op": "dependency", "package": "llama-cpp-python[server]"}],
),
(
r"^git: command not found",
"git is required to clone the llama.cpp source tree.",
[{"label": "install build deps for llama.cpp on this server", "op": "dependency", "package": "llama-cpp-python[server]"}],
),
(
r"llama-server.*command not found|llama\.cpp.*not found|No module named.*llama_cpp|No module named 'starlette_context'",
"llama.cpp / llama-cpp-python dependencies are missing.",
[{"label": "install llama.cpp dependencies or llama-cpp-python[server]", "op": "dependency", "package": "llama-cpp-python[server]"}],
),
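Review note: the pattern table above depends on ordering, with the specific build-dependency patterns placed before the generic llama.cpp catch-all. A minimal sketch of first-match-wins diagnosis follows. The table is shortened, the messages are abbreviated, and `diagnose` is a hypothetical helper; the use of `re.MULTILINE` is an assumption, since this hunk does not show how `_diagnose_serve_output` compiles its patterns.

```python
import re

# Shortened, illustrative table: specific patterns first, catch-all last.
PATTERNS = [
    (r"cmake: command not found|cmake.*not found.*[Cc]ould not", "cmake missing"),
    (r"^(make|g\+\+|gcc): command not found|Could not find C\+\+ compiler", "compiler missing"),
    (r"^git: command not found", "git missing"),
    (r"llama-server.*command not found|llama\.cpp.*not found|No module named.*llama_cpp",
     "llama.cpp deps missing"),
]


def diagnose(text):
    """Return the message of the first matching pattern, or None."""
    for pattern, message in PATTERNS:
        # Assumption: MULTILINE, so '^' anchors at the start of every log line.
        if re.search(pattern, text, re.MULTILINE):
            return message
    return None
```

Under this assumption a log line such as `bash: git: command not found` would not hit the `^git` pattern, because the shell prefix comes before `git`; whether that matters depends on how the real logs are formatted.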
+533 -22
View File
@@ -189,8 +189,27 @@ def setup_cookbook_routes() -> APIRouter:
"SGLang is not installed or not in PATH on this server.",
[{"label": "install SGLang in Cookbook Dependencies", "op": "dependency", "package": "sglang[all]"}],
),
# System build deps come BEFORE the generic llama.cpp catch-all
# so cmake / build-essential / git missing → a specific OS-package
# remediation instead of "install llama-cpp-python[server]" (which
# itself fails to compile when cmake is absent).
(
r"llama-server.*command not found|llama\.cpp.*not found|No module named.*llama_cpp|No module named 'starlette_context'|git: command not found|cmake: command not found",
r"cmake: command not found|cmake.*not found.*[Cc]ould not",
"cmake is required to build llama.cpp from source but isn't installed on this server.",
[{"label": "install build deps for llama.cpp (apt: cmake build-essential git / pacman: cmake base-devel git / dnf: cmake gcc-c++ make git / brew: cmake git)", "op": "dependency", "package": "llama-cpp-python[server]"}],
),
(
r"^(make|g\+\+|gcc): command not found|Could not find C\+\+ compiler",
"A C/C++ compiler (build-essential) is required to build llama.cpp from source.",
[{"label": "install build deps for llama.cpp on this server", "op": "dependency", "package": "llama-cpp-python[server]"}],
),
(
r"^git: command not found",
"git is required to clone the llama.cpp source tree.",
[{"label": "install build deps for llama.cpp on this server", "op": "dependency", "package": "llama-cpp-python[server]"}],
),
(
r"llama-server.*command not found|llama\.cpp.*not found|No module named.*llama_cpp|No module named 'starlette_context'",
"llama.cpp / llama-cpp-python dependencies are missing.",
[{"label": "install llama.cpp dependencies or llama-cpp-python[server]", "op": "dependency", "package": "llama-cpp-python[server]"}],
),
@@ -254,6 +273,79 @@ def setup_cookbook_routes() -> APIRouter:
def _load_stored_hf_token() -> str:
return load_stored_hf_token(state_path=_cookbook_state_path)
def _normalize_minimax_m3_vllm_cmd(cmd: str) -> str:
"""Patch MiniMax M3 vLLM launches into the known-good local form.
The browser form can be stale or omit advanced-only fields. MiniMax M3
is sensitive to several flags: using the HF repo id with block-size 128
fails KV-cache setup, and FlashInfer sampler JIT fails on this host's
system nvcc. Normalize server-side before writing the tmux runner.
"""
cmd_lower = (cmd or "").lower()
if not cmd or "vllm serve" not in cmd_lower or "minimax" not in cmd_lower or "m3" not in cmd_lower:
return cmd
try:
parts = shlex.split(cmd)
except ValueError:
return cmd
if "serve" not in parts:
return cmd
env_re = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")
env_parts = [p for p in parts if env_re.match(p)]
body = [p for p in parts if not env_re.match(p)]
try:
serve_i = body.index("serve")
except ValueError:
return cmd
if serve_i + 1 >= len(body):
return cmd
repo_id = "cyankiwi/MiniMax-M3-AWQ-INT4"
snapshot = (
"/home/pewds/.cache/huggingface/hub/"
"models--cyankiwi--MiniMax-M3-AWQ-INT4/"
"snapshots/4082acbbec1236d21828d55b6bb0fe02ade4ab5b"
)
if body[serve_i + 1] == repo_id:
body[serve_i + 1] = snapshot
def add_env(key: str, value: str) -> None:
if not any(p.startswith(f"{key}=") for p in env_parts):
env_parts.append(f"{key}={value}")
def has_flag(flag: str) -> bool:
return any(p == flag or p.startswith(flag + "=") for p in body)
def set_flag(flag: str, value: str) -> None:
for i, part in enumerate(body):
if part == flag:
if i + 1 < len(body):
body[i + 1] = value
else:
body.append(value)
return
if part.startswith(flag + "="):
body[i] = f"{flag}={value}"
return
body.extend([flag, value])
def add_bool(flag: str) -> None:
if not has_flag(flag):
body.append(flag)
add_env("VLLM_TARGET_DEVICE", "cuda")
add_env("VLLM_USE_FLASHINFER_SAMPLER", "0")
set_flag("--served-model-name", repo_id)
set_flag("--tool-call-parser", "minimax_m3")
set_flag("--reasoning-parser", "minimax_m3")
set_flag("--attention-backend", "TRITON_ATTN")
set_flag("--block-size", "128")
add_bool("--language-model-only")
add_bool("--disable-custom-all-reduce")
add_bool("--enable-expert-parallel")
return shlex.join(env_parts + body)
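Review note: the normalizer above splits a command into leading `NAME=value` assignments and an argv body, then forces selected flags. The sketch below isolates that technique with a generic model name; `normalize` and its `env_defaults` / `flags` parameters are hypothetical, and only `set_flag` follows the logic shown in the hunk.

```python
import re
import shlex

ENV_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def set_flag(body, flag, value):
    """Overwrite `flag` in place (`--f v` or `--f=v` form), or append it."""
    for i, part in enumerate(body):
        if part == flag:
            if i + 1 < len(body):
                body[i + 1] = value
            else:
                body.append(value)
            return
        if part.startswith(flag + "="):
            body[i] = f"{flag}={value}"
            return
    body.extend([flag, value])


def normalize(cmd, env_defaults, flags):
    """Hypothetical helper: add missing env assignments and force flag values."""
    parts = shlex.split(cmd)
    env_parts = [p for p in parts if ENV_RE.match(p)]
    body = [p for p in parts if not ENV_RE.match(p)]
    for key, value in env_defaults.items():
        # Existing assignments win; only missing keys are added.
        if not any(p.startswith(f"{key}=") for p in env_parts):
            env_parts.append(f"{key}={value}")
    for flag, value in flags.items():
        set_flag(body, flag, value)
    return shlex.join(env_parts + body)
```

One property worth keeping in mind: any bare `NAME=value` token is moved to the front regardless of where it appeared, so a positional argument of that shape would be treated as an environment assignment.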
def _cookbook_ssh_dir() -> Path:
# The Docker image keeps cookbook keys under /app/.ssh; that path only
# exists inside the container. On Windows (and any non-container host)
@@ -676,7 +768,7 @@ def setup_cookbook_routes() -> APIRouter:
_spf = f"-p {_port} " if _port and _port != "22" else ""
setup_cmd = (
f"scp -O {_pf}-q '{runner_path}' {remote}:{remote_runner} && "
f"ssh {_spf}{remote} 'chmod +x {remote_runner} && tmux new-session -d -s {session_id} \"./{remote_runner}\"'"
f"ssh {_spf}{remote} 'chmod +x {remote_runner} && tmux set-option -g history-limit 100000 2>/dev/null; tmux new-session -d -s {session_id} \"./{remote_runner}\"'"
)
else:
# Local: run hf download in the background (tmux on POSIX, a detached
@@ -708,7 +800,7 @@ def setup_cookbook_routes() -> APIRouter:
lines.append('exec "${SHELL:-/bin/bash}"')
wrapper_script.write_text("\n".join(lines) + "\n", encoding="utf-8")
wrapper_script.chmod(0o755)
setup_cmd = None if IS_WINDOWS else f"tmux new-session -d -s {session_id} {shlex.quote(str(wrapper_script))}"
setup_cmd = None if IS_WINDOWS else f"tmux set-option -g history-limit 100000 2>/dev/null; tmux new-session -d -s {session_id} {shlex.quote(str(wrapper_script))}"
logger.info(f"Model download: {req.repo_id} (backend={'ollama' if is_ollama_download else 'hf'}, include={req.include}, session={session_id}, remote={remote})")
logger.info(f"Download setup_cmd: {setup_cmd}")
@@ -984,9 +1076,9 @@ def setup_cookbook_routes() -> APIRouter:
ssh_args = ["ssh"]
if ssh_port and ssh_port != "22":
ssh_args.extend(["-p", str(ssh_port)])
capture_cmd = ssh_args + [remote, "tmux", "capture-pane", "-t", session_id, "-p", "-S", "-200"]
capture_cmd = ssh_args + [remote, "tmux", "capture-pane", "-t", session_id, "-p", "-S", "-2000"]
else:
capture_cmd = ["tmux", "capture-pane", "-t", session_id, "-p", "-S", "-200"]
capture_cmd = ["tmux", "capture-pane", "-t", session_id, "-p", "-S", "-2000"]
_exit_re = re.compile(r"=== Process exited with code (-?\d+) ===")
for wait_s in _waits:
@@ -1230,6 +1322,7 @@ def setup_cookbook_routes() -> APIRouter:
# `TypeError: argument of type 'NoneType'` (a 500 instead of a clean 400).
req.cmd = _validate_serve_cmd(req.cmd) or ""
req.cmd = _normalize_llama_cpp_python_cache_types(req.cmd) or ""
req.cmd = _normalize_minimax_m3_vllm_cmd(req.cmd)
req.cmd = _venv_safe_local_pip_install_cmd(
req.cmd,
local=not bool(req.remote_host),
@@ -1243,8 +1336,16 @@ def setup_cookbook_routes() -> APIRouter:
req.cmd = _pip_install_no_cache(req.cmd)
# Accept common aliases and enforce server extras for llama-cpp so
# `python -m llama_cpp.server` has all runtime dependencies.
req.cmd = re.sub(r"(?<![A-Za-z0-9_.-])llama_cpp(?![A-Za-z0-9_.-])", "llama-cpp-python[server]", req.cmd)
req.cmd = re.sub(r"(?<![A-Za-z0-9_.-])llama-cpp-python(?!\[)", "llama-cpp-python[server]", req.cmd)
# CRITICAL: the lookbehind / lookahead must also exclude `/` so
# the regex DOESN'T mangle a URL path like
# https://abetlen.github.io/llama-cpp-python/whl/cu124
# The previous regex turned that URL into
# https://abetlen.github.io/llama-cpp-python[server]/whl/cu124
# which pip then couldn't resolve → silent fallback to source
# build of the .tar.gz → CPU-only binary (because CMAKE_ARGS
# isn't set), defeating the entire purpose of the CUDA index.
req.cmd = re.sub(r"(?<![A-Za-z0-9_.\-/])llama_cpp(?![A-Za-z0-9_.\-/])", "llama-cpp-python[server]", req.cmd)
req.cmd = re.sub(r"(?<![A-Za-z0-9_.\-/])llama-cpp-python(?![\[/])", "llama-cpp-python[server]", req.cmd)
if "llama-cpp-python" in req.cmd and "--extra-index-url" not in req.cmd:
req.cmd += " --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu"
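Review note: the effect of adding `/` to the lookbehind and lookahead can be checked directly. The sketch below runs the previous and the updated `llama-cpp-python` patterns from this hunk over a sample install command; the command string itself is an illustrative assumption.

```python
import re

OLD = r"(?<![A-Za-z0-9_.-])llama-cpp-python(?!\[)"
NEW = r"(?<![A-Za-z0-9_.\-/])llama-cpp-python(?![\[/])"
REPLACEMENT = "llama-cpp-python[server]"

# Illustrative command: one package spec plus the CUDA wheel index URL.
cmd = (
    "pip install llama-cpp-python "
    "--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124"
)

old_result = re.sub(OLD, REPLACEMENT, cmd)  # also rewrites the URL path segment
new_result = re.sub(NEW, REPLACEMENT, cmd)  # rewrites only the package spec
```

The updated pattern is also idempotent on its own output, because the `[` that follows an already-rewritten spec trips the negative lookahead.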
# PEP-508-style package spec — letters, digits, `.-_` for the
@@ -1284,6 +1385,11 @@ def setup_cookbook_routes() -> APIRouter:
# LOCAL execution on a native-Windows host never uses tmux (detached
# process path below), regardless of the UI-supplied platform.
local_windows = IS_WINDOWS and not remote
if is_windows and remote and "diffusion_server.py" in req.cmd:
raise HTTPException(
400,
"Remote Windows Diffusers serving is not supported yet; use local Windows or a Linux remote server.",
)
if not is_windows and not local_windows and not await _binary_available("tmux", remote, req.ssh_port):
return {
@@ -1426,6 +1532,69 @@ def setup_cookbook_routes() -> APIRouter:
runner_lines.append(' else')
_append_llama_cpp_linux_accel_build_lines(runner_lines)
runner_lines.append(' fi')
# Source the env file the prebuilt-download path writes so
# LD_LIBRARY_PATH includes the directory holding libllama.so
# and friends. No-op when prebuilt wasn't used.
runner_lines.append(' [ -r ~/.config/odysseus-llama-cpp-env ] && . ~/.config/odysseus-llama-cpp-env')
# Auto-upgrade pip llama-cpp-python to the CUDA-enabled
# wheel when (a) NVIDIA hardware is present and (b) the
# currently-installed wheel is CPU-only. Without this the
# user gets the Python server happily running at 3 tok/s
# because pip's default index ships CPU-only wheels.
# Forward-compat: cu124 wheels work on driver/runtime
# 12.4+ including the cu13.x line.
runner_lines.append(' if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L 2>/dev/null | grep -q "GPU " && python3 -c "import llama_cpp" 2>/dev/null; then')
runner_lines.append(' if ! python3 -c "import llama_cpp; import sys; sys.exit(0 if llama_cpp.llama_supports_gpu_offload() else 1)" 2>/dev/null; then')
runner_lines.append(' echo "[odysseus] NVIDIA detected but installed llama-cpp-python is CPU-only — reinstalling with CUDA wheel index for GPU offload..."')
runner_lines.append(' python3 -m pip install --user --break-system-packages --force-reinstall --no-cache-dir "llama-cpp-python[server]" --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124 2>&1 | tail -8 || echo "[odysseus] WARNING: CUDA wheel reinstall failed — Python server will stay CPU-only (slow). Manual fix: pip install --user --force-reinstall \'llama-cpp-python[server]\' --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124"')
runner_lines.append(' if python3 -c "import llama_cpp; import sys; sys.exit(0 if llama_cpp.llama_supports_gpu_offload() else 1)" 2>/dev/null; then')
runner_lines.append(' echo "[odysseus] llama-cpp-python now supports GPU offload."')
runner_lines.append(' fi')
runner_lines.append(' fi')
runner_lines.append(' fi')
# SHORT-CIRCUIT before the build/pip fallback: if the
# native binary is missing but llama_cpp Python is already
# installed, drop a wrapper at ~/bin/llama-server that
# translates llama-server CLI args to llama_cpp.server's
# underscore-style flags. The user's serve command stays
# `llama-server ...` and "just works" — no build, no cmake,
# no second install. This is the path that unblocks every
# remote where pip-installed llama-cpp-python is already
# working but Cookbook used to insist on a native binary.
runner_lines.append(' if ! command -v llama-server >/dev/null 2>&1 && python3 -c "import llama_cpp" 2>/dev/null; then')
runner_lines.append(' mkdir -p ~/bin')
runner_lines.append(' cat > ~/bin/llama-server <<\'_ODY_LLAMA_SHIM_EOF\'')
runner_lines.append('#!/usr/bin/env bash')
runner_lines.append('# Auto-generated by Odysseus Cookbook: a `llama-server` lookalike')
runner_lines.append('# that translates the native CLI to `python -m llama_cpp.server`.')
runner_lines.append('# Lets cookbook-generated launch commands run unchanged on hosts')
runner_lines.append('# where only the pip llama-cpp-python package is installed.')
runner_lines.append('ARGS=()')
runner_lines.append('while [ $# -gt 0 ]; do')
runner_lines.append(' case "$1" in')
runner_lines.append(' -ngl|--gpu-layers|--n-gpu-layers) ARGS+=(--n_gpu_layers "$2"); shift 2 ;;')
runner_lines.append(' -c|--ctx-size) ARGS+=(--n_ctx "$2"); shift 2 ;;')
runner_lines.append(' -b|--batch-size) ARGS+=(--n_batch "$2"); shift 2 ;;')
runner_lines.append(' -ub|--ubatch-size) shift 2 ;; # llama-cpp-python has no separate ubatch')
runner_lines.append(' --flash-attn) ARGS+=(--flash_attn true); shift 2 ;;')
runner_lines.append(' --cache-type-k) ARGS+=(--type_k "$2"); shift 2 ;;')
runner_lines.append(' --cache-type-v) ARGS+=(--type_v "$2"); shift 2 ;;')
runner_lines.append(' --n-cpu-moe) ARGS+=(--n_cpu_moe "$2"); shift 2 ;;')
runner_lines.append(' --mmproj) ARGS+=(--clip_model_path "$2"); shift 2 ;;')
runner_lines.append(' --image-max-tokens) shift 2 ;; # native-only')
runner_lines.append(' --no-mmap) ARGS+=(--use_mmap false); shift ;;')
runner_lines.append(' --no-warmup) shift ;; # native-only')
runner_lines.append(' --chat-template) ARGS+=(--chat_format "$2"); shift 2 ;;')
runner_lines.append(' --fit|--split-mode|--tensor-split|--main-gpu|--parallel) shift 2 ;; # native-only')
runner_lines.append(' --mlock) ARGS+=(--use_mlock true); shift ;;')
runner_lines.append(' *) ARGS+=("$1"); shift ;;')
runner_lines.append(' esac')
runner_lines.append('done')
runner_lines.append('exec python3 -m llama_cpp.server "${ARGS[@]}"')
runner_lines.append('_ODY_LLAMA_SHIM_EOF')
runner_lines.append(' chmod +x ~/bin/llama-server')
runner_lines.append(' echo "[odysseus] Created llama-server shim → python -m llama_cpp.server (no native binary needed)"')
runner_lines.append(' fi')
runner_lines.append(' # If the native build failed, fall back to the Python bindings.')
runner_lines.append(' if ! command -v llama-server &>/dev/null && ! python3 -c "import llama_cpp" 2>/dev/null; then')
runner_lines.append(' echo "llama-server build failed — installing Python bindings as fallback..."')
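Review note: the shim's `case` table can be hard to read inside a heredoc. The sketch below mirrors a subset of it in Python so the mapping can be checked in isolation. The function name `translate_args` and the three tables are hypothetical and are not part of the commit; only the flag pairs are taken from the shim above.

```python
# Hypothetical Python mirror of part of the bash shim's flag table.
# Flags that carry a value and have an underscore-style equivalent.
VALUE_FLAGS = {
    "-ngl": "--n_gpu_layers",
    "--gpu-layers": "--n_gpu_layers",
    "--n-gpu-layers": "--n_gpu_layers",
    "-c": "--n_ctx",
    "--ctx-size": "--n_ctx",
    "-b": "--n_batch",
    "--batch-size": "--n_batch",
}
# Flags the shim drops together with their value.
DROPPED_VALUE_FLAGS = {"-ub", "--ubatch-size", "--image-max-tokens"}
# Flags the shim drops that take no value.
DROPPED_BARE_FLAGS = {"--no-warmup"}


def translate_args(argv):
    """Translate native-style args; unknown args pass through unchanged."""
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in VALUE_FLAGS:
            out.append(VALUE_FLAGS[arg])
            out.extend(argv[i + 1:i + 2])  # tolerate a missing value
            i += 2
        elif arg in DROPPED_VALUE_FLAGS:
            i += 2
        elif arg in DROPPED_BARE_FLAGS:
            i += 1
        else:
            out.append(arg)
            i += 1
    return out
```

A table-driven form like this makes it easier to spot a flag whose `shift` count does not match whether it takes a value.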
@@ -1489,6 +1658,96 @@ def setup_cookbook_routes() -> APIRouter:
runner_lines.append(' echo "ERROR: vLLM is not installed."')
runner_lines.append(' ODYSSEUS_PREFLIGHT_EXIT=127')
runner_lines.append('fi')
runner_lines.append(f"ODYSSEUS_SERVE_CMD='{_bash_squote(req.cmd)}'")
runner_lines.append('if [ -z "$ODYSSEUS_PREFLIGHT_EXIT" ]; then')
runner_lines.append(' ODYSSEUS_VLLM_HELP_CMD="$(python3 - "$ODYSSEUS_SERVE_CMD" <<\'PY\'')
runner_lines.append('import shlex, sys')
runner_lines.append('parts = shlex.split(sys.argv[1])')
runner_lines.append('try:')
runner_lines.append(' serve_i = parts.index("serve")')
runner_lines.append('except ValueError:')
runner_lines.append(' print("vllm serve --help")')
runner_lines.append('else:')
runner_lines.append(' print(shlex.join(parts[:serve_i + 1] + ["--help"]))')
runner_lines.append('PY')
runner_lines.append(')"')
runner_lines.append(' ODYSSEUS_VLLM_SUPPORTS_SWAP=0')
runner_lines.append(' if eval "$ODYSSEUS_VLLM_HELP_CMD" 2>&1 | grep -q -- "--swap-space"; then ODYSSEUS_VLLM_SUPPORTS_SWAP=1; fi')
runner_lines.append('fi')
runner_lines.append('if [ -z "$ODYSSEUS_PREFLIGHT_EXIT" ] && [ "${ODYSSEUS_VLLM_SUPPORTS_SWAP:-0}" = "1" ] && ! printf "%s" "$ODYSSEUS_SERVE_CMD" | grep -q -- "--swap-space"; then')
runner_lines.append(' echo "[odysseus] Setting vLLM --swap-space 0 so the runtime does not reserve CPU swap per GPU."')
runner_lines.append(' ODYSSEUS_SERVE_CMD="${ODYSSEUS_SERVE_CMD} --swap-space 0"')
runner_lines.append('fi')
runner_lines.append('if [ -z "$ODYSSEUS_PREFLIGHT_EXIT" ] && [ "${ODYSSEUS_VLLM_SUPPORTS_SWAP:-0}" != "1" ]; then')
runner_lines.append(' if printf "%s" "$ODYSSEUS_SERVE_CMD" | grep -q -- "--swap-space"; then')
runner_lines.append(' echo "[odysseus] vLLM serve does not expose --swap-space; removing the flag and patching the runtime default to 0."')
runner_lines.append(' ODYSSEUS_SERVE_CMD="$(python3 - "$ODYSSEUS_SERVE_CMD" <<\'PY\'')
runner_lines.append('import shlex, sys')
runner_lines.append('parts = shlex.split(sys.argv[1])')
runner_lines.append('out = []')
runner_lines.append('skip = False')
runner_lines.append('for part in parts:')
runner_lines.append(' if skip:')
runner_lines.append(' skip = False')
runner_lines.append(' continue')
runner_lines.append(' if part == "--swap-space":')
runner_lines.append(' skip = True')
runner_lines.append(' continue')
runner_lines.append(' if part.startswith("--swap-space="):')
runner_lines.append(' continue')
runner_lines.append(' out.append(part)')
runner_lines.append('print(shlex.join(out))')
runner_lines.append('PY')
runner_lines.append(')"')
runner_lines.append(' fi')
runner_lines.append(' ODYSSEUS_SERVE_CMD="$(python3 - "$ODYSSEUS_SERVE_CMD" <<\'PY\'')
runner_lines.append('import shlex, sys')
runner_lines.append('parts = shlex.split(sys.argv[1])')
runner_lines.append('patch = r"""import inspect, sys')
runner_lines.append('from vllm.engine.arg_utils import EngineArgs, AsyncEngineArgs')
runner_lines.append('def _odysseus_swap0(cls):')
runner_lines.append(' params = list(inspect.signature(cls).parameters)')
runner_lines.append(' if "swap_space" not in params:')
runner_lines.append(' return')
runner_lines.append(' idx = params.index("swap_space")')
runner_lines.append(' defaults = list(cls.__init__.__defaults__ or ())')
runner_lines.append(' if idx < len(defaults):')
runner_lines.append(' defaults[idx] = 0')
runner_lines.append(' cls.__init__.__defaults__ = tuple(defaults)')
runner_lines.append(' fields = getattr(cls, "__dataclass_fields__", {})')
runner_lines.append(' if "swap_space" in fields:')
runner_lines.append(' fields["swap_space"].default = 0')
runner_lines.append('_odysseus_swap0(EngineArgs)')
runner_lines.append('_odysseus_swap0(AsyncEngineArgs)')
runner_lines.append('try:')
runner_lines.append(' from vllm.config import CacheConfig')
runner_lines.append(' CacheConfig.swap_space = 0')
runner_lines.append('except Exception:')
runner_lines.append(' pass')
runner_lines.append('_orig_create_engine_config = EngineArgs.create_engine_config')
runner_lines.append('def _odysseus_create_engine_config(self, *args, **kwargs):')
runner_lines.append(' self.swap_space = 0')
runner_lines.append(' return _orig_create_engine_config(self, *args, **kwargs)')
runner_lines.append('EngineArgs.create_engine_config = _odysseus_create_engine_config')
runner_lines.append('AsyncEngineArgs.create_engine_config = _odysseus_create_engine_config')
runner_lines.append('from vllm.entrypoints.cli.main import main')
runner_lines.append('sys.exit(main())"""')
runner_lines.append('try:')
runner_lines.append(' serve_i = parts.index("serve")')
runner_lines.append('except ValueError:')
runner_lines.append(' print(shlex.join(parts))')
runner_lines.append('else:')
runner_lines.append(' exe_i = serve_i - 1')
runner_lines.append(' exe = parts[exe_i] if exe_i >= 0 else "vllm"')
runner_lines.append(' py = "python3"')
runner_lines.append(' if exe.endswith("/bin/vllm"):')
runner_lines.append(' py = exe[:-len("/bin/vllm")] + "/bin/python"')
runner_lines.append(' parts[exe_i:serve_i] = [py, "-c", patch]')
runner_lines.append(' print(shlex.join(parts))')
runner_lines.append('PY')
runner_lines.append(')"')
runner_lines.append(' echo "[odysseus] Patched vLLM internal swap_space default to 0 for this runtime."')
runner_lines.append('fi')
elif "sglang.launch_server" in req.cmd:
runner_lines.append('export PATH="$HOME/.local/bin:$PATH"')
runner_lines.append('if ! command -v sglang &>/dev/null; then')
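Review note: the flag-stripping heredoc in the vLLM branch above can be exercised outside the runner script. This is a minimal sketch of the same loop as a standalone function; the name `strip_swap_space` is hypothetical, and the example commands in the usage line are made up for illustration.

```python
import shlex


def strip_swap_space(cmd: str) -> str:
    """Remove `--swap-space N` and `--swap-space=N` from a shell command."""
    out = []
    skip_next = False
    for part in shlex.split(cmd):
        if skip_next:
            # This token is the value that followed a bare --swap-space.
            skip_next = False
            continue
        if part == "--swap-space":
            skip_next = True
            continue
        if part.startswith("--swap-space="):
            continue
        out.append(part)
    return shlex.join(out)


print(strip_swap_space("vllm serve org/model --swap-space 4 --port 8000"))
```

Splitting with `shlex` rather than a regex keeps quoted arguments intact when the command is re-joined.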
@@ -1530,7 +1789,10 @@ def setup_cookbook_routes() -> APIRouter:
runner_lines,
keep_shell_open=not local_windows,
)
runner_lines.append(req.cmd)
if "vllm serve" in req.cmd:
runner_lines.append('eval "$ODYSSEUS_SERVE_CMD"')
else:
runner_lines.append(req.cmd)
if local_windows:
# Detached background process — no interactive shell to keep open.
# Print the exit marker the status poller looks for, then stop.
@@ -1577,10 +1839,10 @@ def setup_cookbook_routes() -> APIRouter:
setup_cmd = (
f"{scp_extras}"
f"scp -O {_Pf}-q '{runner_path}' {remote}:{remote_runner} && "
f"ssh {_pf}{remote} 'chmod +x {remote_runner} && tmux new-session -d -s {session_id} \"./{remote_runner}\"'"
f"ssh {_pf}{remote} 'chmod +x {remote_runner} && tmux set-option -g history-limit 100000 2>/dev/null; tmux new-session -d -s {session_id} \"./{remote_runner}\"'"
)
else:
setup_cmd = f"tmux new-session -d -s {session_id} {shlex.quote(str(runner_path))}"
setup_cmd = f"tmux set-option -g history-limit 100000 2>/dev/null; tmux new-session -d -s {session_id} {shlex.quote(str(runner_path))}"
if setup_cmd is None:
# LOCAL Windows: launch the bash runner detached; no tmux setup_cmd.
@@ -1834,6 +2096,25 @@ def setup_cookbook_routes() -> APIRouter:
out, err = await _run_gpu_shell("ls -1 /sys/class/drm 2>/dev/null", host, ssh_port, timeout=4)
if err is not None or not out:
return []
# Pick the runtime label up-front so each GPU dict gets the
# right `backend`. AMD silicon can be driven by ROCm/HIP (native)
# OR Vulkan (mesa RADV). Reporting "rocm" on a host where no
# ROCm toolchain is installed misleads the frontend env-var
# prefix logic — it would emit `HIP_VISIBLE_DEVICES=` for a
# Vulkan-only stack, which is a silent no-op at best.
rt_out, _ = await _run_gpu_shell(
'command -v rocminfo >/dev/null 2>&1 && echo rocm '
'|| (command -v hipconfig >/dev/null 2>&1 && echo rocm) '
'|| (command -v vulkaninfo >/dev/null 2>&1 && echo vulkan) '
'|| echo unknown',
host, ssh_port, timeout=4,
)
_rt_lines = (rt_out or "").strip().splitlines()
_amd_runtime = _rt_lines[-1].strip() if _rt_lines else "rocm"
if _amd_runtime not in ("rocm", "vulkan"):
# Default to rocm so existing ROCm-installed hosts keep
# working; "unknown" only happens when neither toolchain is
# detected (e.g. minimal sysfs read on a fresh box).
_amd_runtime = "rocm"
gpus = []
for entry in out.split():
if not entry.startswith("card") or "-" in entry:
@@ -1877,7 +2158,7 @@ def setup_cookbook_routes() -> APIRouter:
"free_mb": free_mb, "total_mb": total_mb, "used_mb": used_mb,
"gtt_used_mb": gtt_used_mb,
"util_pct": 0, "busy": bool(total_mb and (free_mb / total_mb) < 0.85),
"processes": [], "backend": "rocm", "source": "amd-sysfs",
"processes": [], "backend": _amd_runtime, "source": "amd-sysfs",
"unified_memory": unified,
})
if gpus:
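Review note: the runtime-label step in this hunk has a few edge cases (empty output, whitespace-only output, an "unknown" label). The sketch below isolates that step so the cases can be checked directly. The helper name `pick_amd_runtime` is hypothetical; the accepted labels and the "rocm" default come from the hunk above.

```python
def pick_amd_runtime(rt_out):
    """Return "rocm" or "vulkan" from probe output, defaulting to "rocm"."""
    lines = (rt_out or "").strip().splitlines()
    # Use the last non-empty line; an empty list means nothing was printed.
    label = lines[-1].strip() if lines else ""
    return label if label in ("rocm", "vulkan") else "rocm"


for sample in ("vulkan\n", "unknown\n", "  \n", None):
    print(repr(sample), "->", pick_amd_runtime(sample))
```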
@@ -2018,10 +2299,15 @@ def setup_cookbook_routes() -> APIRouter:
amd_gpus = await _probe_amd_sysfs(host, ssh_port)
if amd_gpus:
# The per-GPU dict already carries the runtime label picked by
# _probe_amd_sysfs (rocm vs vulkan); mirror that into the
# wrapper so the frontend can read `data.backend` directly
# without scanning the list.
_amd_wrap_backend = str(amd_gpus[0].get("backend") or "rocm")
return {
"ok": True,
"gpus": amd_gpus,
"backend": "rocm",
"backend": _amd_wrap_backend,
"source": "amd-sysfs",
"fallback_from": "nvidia-smi",
"nvidia_error": nvidia_error,
@@ -2161,6 +2447,17 @@ def setup_cookbook_routes() -> APIRouter:
disk_tasks = on_disk.get("tasks") or [] if isinstance(on_disk, dict) else []
incoming_tasks = data.get("tasks") if isinstance(data.get("tasks"), list) else []
incoming_removed = data.get("removedTasks") if isinstance(data.get("removedTasks"), dict) else {}
disk_removed = on_disk.get("removedTasks") if isinstance(on_disk, dict) and isinstance(on_disk.get("removedTasks"), dict) else {}
removed_tasks = {**disk_removed, **incoming_removed}
data["removedTasks"] = removed_tasks
removed_ids = set(removed_tasks.keys())
if removed_ids:
incoming_tasks = [
t for t in incoming_tasks
if not (isinstance(t, dict) and t.get("sessionId") in removed_ids)
]
data["tasks"] = incoming_tasks
# Anti-poisoning guard: a stale browser tab can keep POSTing a
# download task as status='done' from before the strict-finish
# fix landed, undoing any server-side correction. For each
@@ -2198,6 +2495,8 @@ def setup_cookbook_routes() -> APIRouter:
sid = t.get("sessionId")
if not sid or sid in incoming_ids:
continue # client's version wins
if sid in removed_ids:
continue # intentional cross-device clear/remove
ts = t.get("ts") or 0
if isinstance(ts, (int, float)) and (now_ms - ts) <= RACE_WINDOW_MS:
preserved.append(t)
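Review note: the removed-task handling in the two hunks above amounts to merging tombstone maps and then filtering tasks by `sessionId`. A minimal sketch of that merge, assuming both payloads are plain dicts; the function name `apply_removed` and the sample data are hypothetical.

```python
def apply_removed(on_disk: dict, incoming: dict) -> dict:
    """Merge removedTasks maps and drop incoming tasks that were removed."""
    def _removed(state):
        value = state.get("removedTasks")
        return value if isinstance(value, dict) else {}

    # Incoming entries win on key collisions, as in {**disk, **incoming}.
    removed = {**_removed(on_disk), **_removed(incoming)}
    tasks = incoming.get("tasks")
    tasks = tasks if isinstance(tasks, list) else []
    kept = [
        t for t in tasks
        if not (isinstance(t, dict) and t.get("sessionId") in removed)
    ]
    return {"tasks": kept, "removedTasks": removed}


disk = {"removedTasks": {"s1": 1}}
client = {"tasks": [{"sessionId": "s1"}, {"sessionId": "s2"}], "removedTasks": {"s3": 2}}
print(apply_removed(disk, client))
```

Because the merged map is written back, a task removed on one device stays removed even if another device still posts it.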
@@ -2304,16 +2603,14 @@ def setup_cookbook_routes() -> APIRouter:
# Add 30% headroom for KV cache, activations, etc.
needed_vram = (est_vram * 1.3) if est_vram else None
if vram_gb > 0 and needed_vram is not None and needed_vram > vram_gb:
continue
# Unknown-size models (e.g. MiniMax-M2.7, DeepSeek-V4-Flash) have no
# "NB" in the repo id, so the regex above can't extract their
# param count. Previously we dropped them entirely, which made
# brand-new flagship releases silently vanish from this list even
# on rigs with hundreds of GB of VRAM. Adapters/LoRAs are already
# filtered by _is_excluded(), so what falls through here is
# overwhelmingly full models — keep them, just without a size
# badge (the frontend handles needed_vram_gb=null gracefully).
if vram_gb > 0:
if needed_vram is None:
# The "trending models that fit" list must be conservative:
# if we cannot estimate size from the repo id/tags, do not
# present it as runnable on this hardware.
continue
if needed_vram > vram_gb:
continue
out.append({
"repo_id": repo_id,
@@ -2510,6 +2807,33 @@ def setup_cookbook_routes() -> APIRouter:
except Exception as e:
logger.warning(f"orphan sweep: state write failed: {e}")
@router.get("/api/cookbook/hf-gguf-files")
async def hf_gguf_files(repo_id: str, owner: str = Depends(require_user)):
"""List GGUF files in a HuggingFace repo for the direct-download picker."""
import httpx
repo_id = _validate_repo_id(repo_id)
url = f"https://huggingface.co/api/models/{repo_id}"
try:
headers = {}
token = _load_stored_hf_token()
if token:
headers["Authorization"] = f"Bearer {token}"
async with httpx.AsyncClient(timeout=15, follow_redirects=True) as client:
resp = await client.get(url, headers=headers)
if resp.status_code != 200:
return {"ok": False, "files": [], "error": f"HF API HTTP {resp.status_code}"}
data = resp.json()
except Exception:
logger.exception("HF GGUF file scan failed for %s", repo_id)
return {"ok": False, "files": [], "error": "HF API request failed"}
files = [
str(s.get("rfilename") or "")
for s in data.get("siblings", [])
if str(s.get("rfilename") or "").lower().endswith(".gguf")
]
return {"ok": True, "repo_id": repo_id, "files": files}
# In-memory cache for the Ollama library scrape. ollama.com is a public
# site, but it doesn't expose a stable JSON listing — we fetch the HTML
# search page and regex out the model cards. Cached for 1 h so a busy
@@ -2625,6 +2949,193 @@ def setup_cookbook_routes() -> APIRouter:
"error": _ollama_library_cache["error"],
}
# ── vLLM recipe scraper ─────────────────────────────────────────────
# Fetches the official YAML recipe for a model from vllm-project/recipes
# and normalizes it into a small JSON the frontend can consume. Cached
# per-repo so the GitHub raw endpoint isn't hammered.
_vllm_recipe_cache: dict[str, tuple[float, dict | None]] = {}
# Manifest of all <org>/<model> ids that have a recipe in the upstream
# repo. Cheap to fetch (one Git Tree API call), so we cache the whole
# set for ~12h. Per-row "does this model have a recipe?" lookups hit
# this set instead of doing 912 individual recipe fetches.
_vllm_recipe_manifest: dict = {"fetched_at": 0.0, "models": set(), "error": ""}
@router.get("/api/cookbook/vllm-recipe-manifest")
async def vllm_recipe_manifest(refresh: int = 0):
"""Return the set of <org>/<model> ids known to have a vLLM recipe.
One GitHub Tree API call, 12h cache. The frontend uses this to badge
rows in the model list before the user expands them."""
import time as _time
import httpx as _httpx
TTL = 12 * 3600.0
now = _time.time()
if (
refresh
or (now - _vllm_recipe_manifest["fetched_at"]) > TTL
or not _vllm_recipe_manifest["models"]
):
url = (
"https://api.github.com/repos/vllm-project/recipes/"
"git/trees/main?recursive=1"
)
def _fetch_sync() -> tuple[int, dict | None, str]:
try:
headers = {"Accept": "application/vnd.github+json"}
with _httpx.Client(timeout=10.0, follow_redirects=True) as client:
r = client.get(url, headers=headers)
if r.status_code != 200:
return r.status_code, None, r.text[:200]
return 200, r.json(), ""
except Exception as e:
return 0, None, f"fetch error: {e}"
status, data, err = await asyncio.to_thread(_fetch_sync)
if status == 200 and isinstance(data, dict):
models: set[str] = set()
for entry in data.get("tree") or []:
path = (entry or {}).get("path") or ""
if not path.startswith("models/") or not path.endswith(".yaml"):
continue
# path = "models/<org>/<model>.yaml" → "<org>/<model>"
body = path[len("models/"):-len(".yaml")]
if "/" in body:
models.add(body)
_vllm_recipe_manifest["models"] = models
_vllm_recipe_manifest["fetched_at"] = now
_vllm_recipe_manifest["error"] = ""
else:
_vllm_recipe_manifest["error"] = (
f"HTTP {status}: {err}" if status else err
)
# Don't clobber a stale-but-usable list on transient failures.
if not _vllm_recipe_manifest["models"]:
return {
"models": [],
"count": 0,
"error": _vllm_recipe_manifest["error"],
}
return {
"models": sorted(_vllm_recipe_manifest["models"]),
"count": len(_vllm_recipe_manifest["models"]),
"fetched_at": _vllm_recipe_manifest["fetched_at"],
"error": _vllm_recipe_manifest["error"],
}
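Review note: the path filtering inside `vllm_recipe_manifest` is the part most likely to need a unit test. The sketch below pulls it into a pure function; the name `recipe_ids_from_tree` is hypothetical and the sample tree entries are made up (only the `models/<org>/<model>.yaml` layout is taken from the code above).

```python
def recipe_ids_from_tree(tree):
    """Collect "<org>/<model>" ids from Git tree entries under models/."""
    models = set()
    for entry in tree or []:
        path = (entry or {}).get("path") or ""
        if not path.startswith("models/") or not path.endswith(".yaml"):
            continue
        # "models/<org>/<model>.yaml" -> "<org>/<model>"
        body = path[len("models/"):-len(".yaml")]
        if "/" in body:
            models.add(body)
    return models


sample_tree = [
    {"path": "README.md"},
    {"path": "models/index.yaml"},
    {"path": "models/MiniMaxAI/MiniMax-M2.yaml"},
    None,
]
print(sorted(recipe_ids_from_tree(sample_tree)))
```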
@router.get("/api/cookbook/vllm-recipe")
async def vllm_recipe(repo: str, refresh: int = 0):
"""Return the vLLM official recipe for a HuggingFace repo, if one
exists at vllm-project/recipes. `repo` is the full HF id like
'MiniMaxAI/MiniMax-M2'. Cached 6h."""
import time as _time
import httpx as _httpx
import yaml as _yaml
TTL = 6 * 3600.0
now = _time.time()
repo = (repo or "").strip().strip("/")
if "/" not in repo:
return {"exists": False, "error": "repo must be <org>/<model>"}
cached = _vllm_recipe_cache.get(repo)
if cached and not refresh and (now - cached[0]) < TTL:
return cached[1] or {"exists": False, "cached": True}
url = (
f"https://raw.githubusercontent.com/vllm-project/recipes/"
f"main/models/{repo}.yaml"
)
def _fetch_sync() -> tuple[int, str]:
try:
with _httpx.Client(timeout=8.0, follow_redirects=True) as client:
r = client.get(url)
return r.status_code, r.text
except Exception as e:
return 0, f"fetch error: {e}"
status, text = await asyncio.to_thread(_fetch_sync)
if status == 404:
_vllm_recipe_cache[repo] = (now, {"exists": False})
return {"exists": False}
if status != 200:
return {"exists": False, "error": f"HTTP {status}", "transient": True}
try:
doc = _yaml.safe_load(text) or {}
except Exception as e:
return {"exists": False, "error": f"yaml parse: {e}"}
meta = doc.get("meta") or {}
model = doc.get("model") or {}
features = doc.get("features") or {}
deps = doc.get("dependencies") or []
variants = doc.get("variants") or {}
hw_overrides = doc.get("hardware_overrides") or {}
strat_overrides = doc.get("strategy_overrides") or {}
# Tool-call + reasoning parsers, as flat arg arrays, so the frontend
# can drop them straight into the launch command.
tool_calling = features.get("tool_calling") or {}
reasoning = features.get("reasoning") or {}
normalized = {
"exists": True,
"source_url": url,
"title": meta.get("title") or "",
"provider": meta.get("provider") or "",
"description": meta.get("description") or "",
"date_updated": str(meta.get("date_updated") or ""),
"hardware_support": meta.get("hardware") or {},
"model_id": model.get("model_id") or repo,
"min_vllm_version": model.get("min_vllm_version") or "",
"architecture": model.get("architecture") or "",
"parameter_count": model.get("parameter_count") or "",
"active_parameters": model.get("active_parameters") or "",
"context_length": model.get("context_length") or 0,
"base_args": list(model.get("base_args") or []),
"base_env": dict(model.get("base_env") or {}),
"tool_calling": {
"description": tool_calling.get("description") or "",
"args": list(tool_calling.get("args") or []),
} if tool_calling else None,
"reasoning": {
"description": reasoning.get("description") or "",
"args": list(reasoning.get("args") or []),
} if reasoning else None,
"dependencies": [
{
"note": (d.get("note") or "").strip(),
"command": (d.get("command") or "").strip(),
"optional": bool(d.get("optional", False)),
}
for d in deps if isinstance(d, dict)
],
"variants": {
k: {
"model_id": v.get("model_id") or model.get("model_id") or repo,
"precision": v.get("precision") or "",
"vram_minimum_gb": v.get("vram_minimum_gb") or 0,
"description": v.get("description") or "",
"extra_args": list(v.get("extra_args") or []),
"extra_env": dict(v.get("extra_env") or {}),
}
for k, v in variants.items() if isinstance(v, dict)
},
"hardware_overrides": {
hw: {
"extra_args": list((ov or {}).get("extra_args") or []),
"extra_env": dict((ov or {}).get("extra_env") or {}),
}
for hw, ov in hw_overrides.items() if isinstance(ov, dict)
},
"strategy_overrides": {
strat: dict(ov or {})
for strat, ov in strat_overrides.items() if isinstance(ov, dict)
},
"compatible_strategies": list(doc.get("compatible_strategies") or []),
}
_vllm_recipe_cache[repo] = (now, normalized)
return normalized
@router.get("/api/cookbook/tasks/status")
async def cookbook_tasks_status(request: Request):
"""Check status of all active cookbook tmux sessions.
+7 -2
View File
@@ -12,6 +12,7 @@ from pydantic import BaseModel
from core.database import Document, DocumentVersion
from core.database import Session as DbSession
from src.auth_helpers import _auth_disabled
from src.upload_handler import UploadHandler
logger = logging.getLogger(__name__)
@@ -78,6 +79,8 @@ def _verify_doc_owner(db, doc: Document, user: str):
the session join for any not-yet-backfilled legacy row.
"""
if user is None:
if _auth_disabled():
return # Single-user / no-auth mode: allow access
raise HTTPException(403, "Authentication required")
if doc.owner is not None:
if doc.owner != user:
@@ -102,8 +105,10 @@ def _owner_session_filter(q, user):
The owner backfill runs in init_db before the app serves requests, so
by the time this filter is live there are no NULL-owner rows to leak;
we therefore match the owner strictly."""
if user is None:
we therefore match the owner strictly for authenticated callers."""
if not user:
if user == "" or _auth_disabled():
return q
return q.filter(False)
return q.filter(Document.owner == user)
+13 -5
View File
@@ -10,7 +10,7 @@ from fastapi import APIRouter, HTTPException, Query, Request, UploadFile, File,
from sqlalchemy import case, func, or_
from core.database import SessionLocal, Document, DocumentVersion
from core.database import Session as DbSession
from src.auth_helpers import get_current_user
from src.auth_helpers import get_current_user, _auth_disabled
from src.constants import MAIL_ATTACHMENTS_DIR
logger = logging.getLogger(__name__)
@@ -388,7 +388,8 @@ def setup_document_routes(session_manager, upload_handler=None) -> APIRouter:
db = SessionLocal()
try:
if not user:
raise HTTPException(403, "Authentication required")
if not _auth_disabled():
raise HTTPException(403, "Authentication required")
# v2 review HIGH-9: raise 403 explicitly when the caller
# can't see this session, instead of returning [] which the
# UI treats identically to "no docs" and silently masks
@@ -503,7 +504,8 @@ def setup_document_routes(session_manager, upload_handler=None) -> APIRouter:
user = get_current_user(request)
try:
data = await request.json()
except Exception:
except Exception as e:
logger.warning("Failed to parse export request body, defaulting to empty", exc_info=e)
data = {}
ids = data.get("ids") or []
if not ids:
@@ -645,8 +647,8 @@ def setup_document_routes(session_manager, upload_handler=None) -> APIRouter:
try:
from src.agent_tools.document_tools import clear_active_document
clear_active_document(doc_id)
except Exception:
pass
except Exception as e:
logger.warning("Failed to clear active document %r on detach", doc_id, exc_info=e)
db.commit()
db.refresh(doc)
return _doc_to_dict(doc)
@@ -1331,6 +1333,12 @@ def setup_document_routes(session_manager, upload_handler=None) -> APIRouter:
if not pdf_path:
raise HTTPException(404, f"Source PDF {upload_id} not found")
# Fail fast with a clear 503 if the optional PyMuPDF dependency
# is missing — fill_fields/stamp_annotations will otherwise
# raise RuntimeError deep inside and bubble out as a 500.
# Mirrors the convention in _load_pdf_viewer_fitz above.
_load_pdf_viewer_fitz()
values = parse_markdown_to_values(doc.current_content or "")
out_path = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False).name
_to_unlink.append(out_path)
+179 -18
View File
@@ -13,6 +13,8 @@ and `email_pollers.py` (the background loops):
"""
import os
import base64
import time
import imaplib
import smtplib
import email as email_mod
@@ -38,6 +40,106 @@ from src.secret_storage import decrypt as _decrypt
logger = logging.getLogger(__name__)
def _xoauth2_raw(user: str, access_token: str) -> str:
"""The SASL XOAUTH2 initial-response string (unencoded).
Both smtplib.SMTP.auth() and imaplib.IMAP4.authenticate() base64-encode
the value their callback returns, so callers pass this raw form, never
pre-encoded, to avoid double base64.
"""
return f"user={user}\x01auth=Bearer {access_token}\x01\x01"
def _xoauth2_bytes(user: str, access_token: str) -> bytes:
"""Raw XOAUTH2 bytes for imaplib's authenticate() callback."""
return _xoauth2_raw(user, access_token).encode()
def make_oauth_state(account_id: str, owner: str) -> str:
"""Return an HMAC-signed, base64-encoded OAuth state token.
Encodes account_id + owner + a random nonce, signed with the app secret
so the callback can validate that the flow was initiated by an
authenticated, owning user (CSRF / state-forgery protection).
"""
import hmac as _hmac, hashlib as _hl, secrets as _sec
from src.secret_storage import _load_or_create_key
nonce = _sec.token_hex(16)
payload = json.dumps({"a": account_id, "o": owner, "n": nonce}, separators=(",", ":"))
sig = _hmac.new(_load_or_create_key(), payload.encode(), _hl.sha256).hexdigest()
return base64.urlsafe_b64encode(f"{payload}|{sig}".encode()).decode()
def verify_oauth_state(state: str) -> dict | None:
"""Verify an OAuth state token's HMAC signature.
Returns the decoded payload dict ({"a", "o", "n"}) on success, or None if
the token is malformed, tampered, or signed with a different key.
"""
import hmac as _hmac, hashlib as _hl
from src.secret_storage import _load_or_create_key
try:
decoded = base64.urlsafe_b64decode(state.encode()).decode()
payload, sig = decoded.rsplit("|", 1)
expected = _hmac.new(_load_or_create_key(), payload.encode(), _hl.sha256).hexdigest()
if not _hmac.compare_digest(sig, expected):
return None
return json.loads(payload)
except Exception:
return None
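Review note: the sign and verify pair above can be checked with a round trip. This is a self-contained sketch of the same scheme, assuming a fixed demo key in place of `_load_or_create_key()`; `DEMO_KEY`, `make_state`, and `verify_state` are hypothetical names used only here.

```python
import base64
import hashlib
import hmac
import json
import secrets

DEMO_KEY = b"demo-key-not-the-app-secret"  # assumption: stand-in for the app key


def make_state(account_id, owner, key=DEMO_KEY):
    """Build payload|signature and wrap it in URL-safe base64."""
    payload = json.dumps(
        {"a": account_id, "o": owner, "n": secrets.token_hex(16)},
        separators=(",", ":"),
    )
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{payload}|{sig}".encode()).decode()


def verify_state(state, key=DEMO_KEY):
    """Return the payload dict, or None if the token fails any check."""
    try:
        decoded = base64.urlsafe_b64decode(state.encode()).decode()
        payload, sig = decoded.rsplit("|", 1)
        expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return None
        return json.loads(payload)
    except Exception:
        return None
```

Splitting on the last `|` matters because the JSON payload could itself contain that character, while the hex signature cannot.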
def _refresh_google_token(account_id: str) -> str | None:
"""Exchange the stored refresh token for a new access token and persist it."""
import httpx
from core.database import SessionLocal as _SL, EmailAccount as _EA
from src.secret_storage import encrypt as _enc, decrypt as _dec
client_id = os.environ.get("GOOGLE_OAUTH_CLIENT_ID", "")
client_secret = os.environ.get("GOOGLE_OAUTH_CLIENT_SECRET", "")
if not client_id or not client_secret:
return None
db = _SL()
try:
row = db.get(_EA, account_id)
if not row or not row.oauth_refresh_token:
return None
refresh_token = _dec(row.oauth_refresh_token or "")
if not refresh_token:
return None
resp = httpx.post("https://oauth2.googleapis.com/token", data={
"client_id": client_id,
"client_secret": client_secret,
"refresh_token": refresh_token,
"grant_type": "refresh_token",
}, timeout=10)
resp.raise_for_status()
data = resp.json()
access_token = data["access_token"]
row.oauth_access_token = _enc(access_token)
row.oauth_token_expiry = str(int(time.time()) + data.get("expires_in", 3600))
db.commit()
return access_token
except Exception:
logger.warning("Google token refresh failed for account %s", account_id, exc_info=True)
return None
finally:
db.close()
def _get_valid_google_token(account_id: str, cfg: dict) -> str | None:
"""Return a valid Google access token, refreshing if expired or missing."""
from src.secret_storage import decrypt as _dec
access_token = _dec(cfg.get("oauth_access_token") or "")
expiry_str = cfg.get("oauth_token_expiry") or ""
if access_token and expiry_str:
try:
if int(expiry_str) - 60 > time.time():
return access_token
except (ValueError, TypeError):
pass
return _refresh_google_token(account_id)
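Review note: the freshness check in `_get_valid_google_token` depends on the current time, which makes it awkward to test in place. The sketch below separates that decision into a pure function with an injectable clock. The name `token_is_fresh` and its `now` and `skew` parameters are hypothetical; the 60-second margin matches the code above.

```python
import time


def token_is_fresh(access_token, expiry_str, now=None, skew=60):
    """True if the token exists and expires more than `skew` seconds from now."""
    if not access_token or not expiry_str:
        return False
    now = time.time() if now is None else now
    try:
        return int(expiry_str) - skew > now
    except (ValueError, TypeError):
        # A malformed expiry is treated as expired, forcing a refresh.
        return False
```

With this shape, the caller refreshes whenever the function returns False, which covers missing, expired, and unparseable expiry values in one branch.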
def _smtp_security_mode(cfg: dict) -> str:
raw = str(cfg.get("smtp_security") or "").strip().lower()
if raw in {"ssl", "starttls", "none"}:
@@ -54,20 +156,29 @@ def _send_smtp_message(cfg: dict, from_addr: str, recipients: list[str], message
port = int(cfg.get("smtp_port") or 465)
user = cfg.get("smtp_user") or ""
password = cfg.get("smtp_password") or ""
def _auth_smtp(smtp):
if cfg.get("oauth_provider") == "google":
token = _get_valid_google_token(cfg.get("account_id"), cfg)
if not token:
raise RuntimeError("Google OAuth token unavailable — reconnect the account")
smtp.ehlo()
smtp.auth("XOAUTH2", lambda challenge=None: _xoauth2_raw(user, token), initial_response_ok=True)
elif user and password:
smtp.login(user, password)
security = _smtp_security_mode(cfg)
if security == "ssl":
with smtplib.SMTP_SSL(host, port, timeout=timeout) as smtp:
if user and password:
smtp.login(user, password)
_auth_smtp(smtp)
smtp.sendmail(from_addr, recipients, message)
return
with smtplib.SMTP(host, port, timeout=timeout) as smtp:
if security == "starttls":
smtp.starttls()
if user and password:
smtp.login(user, password)
_auth_smtp(smtp)
smtp.sendmail(from_addr, recipients, message)
@@ -701,10 +812,16 @@ def _get_email_config(account_id: str | None = None, owner: str = "") -> dict:
"imap_password": _decrypt(row.imap_password or ""),
"imap_starttls": bool(row.imap_starttls),
"from_address": row.from_address or row.imap_user or "",
"oauth_provider": row.oauth_provider or "",
"oauth_access_token": row.oauth_access_token or "",
"oauth_refresh_token": row.oauth_refresh_token or "",
"oauth_token_expiry": row.oauth_token_expiry or "",
"display_name": row.display_name or "",
}
if not (cfg["smtp_host"] and cfg["smtp_user"] and cfg["smtp_password"]):
is_oauth = bool(cfg.get("oauth_provider"))
if not is_oauth and not (cfg["smtp_host"] and cfg["smtp_user"] and cfg["smtp_password"]):
logger.warning(f"SMTP not configured for account {row.name!r}")
if not (cfg["imap_host"] and cfg["imap_user"] and cfg["imap_password"]):
if not is_oauth and not (cfg["imap_host"] and cfg["imap_user"] and cfg["imap_password"]):
logger.warning(f"IMAP not configured for account {row.name!r}")
return cfg
finally:
@@ -825,12 +942,19 @@ def _imap_connect(account_id: str | None = None, owner: str = "",
timeout=timeout,
)
try:
conn.login(cfg["imap_user"], cfg["imap_password"])
if cfg.get("oauth_provider") == "google":
token = _get_valid_google_token(cfg.get("account_id"), cfg)
if not token:
raise RuntimeError("Google OAuth token unavailable — reconnect the account in Settings → Integrations")
conn.authenticate("XOAUTH2", lambda x: _xoauth2_bytes(cfg["imap_user"], token))
else:
conn.login(cfg["imap_user"], cfg["imap_password"])
except Exception:
# A failed AUTHENTICATE (e.g. an Office 365 app password on an
# MFA-enabled tenant, #3174) otherwise orphans the already-connected
# socket; close it before propagating so a misconfigured account
# can't leak one descriptor per retry / background poller pass.
# MFA-enabled tenant, #3174, or an expired/revoked OAuth token)
# otherwise orphans the already-connected socket; close it before
# propagating so a misconfigured account can't leak one descriptor
# per retry / background poller pass.
try:
conn.shutdown()
except Exception:
@@ -1109,22 +1233,30 @@ def _list_attachments_from_msg(msg):
return attachments
idx = 0
for part in msg.walk():
if part.is_multipart():
continue
cd = str(part.get("Content-Disposition", ""))
ct = part.get_content_type()
is_attached_email = ct == "message/rfc822" and ("attachment" in cd.lower() or part.get_filename())
if part.is_multipart() and not is_attached_email:
continue
# Skip text/html body parts (only consider real attachments)
if ct in ("text/plain", "text/html") and "attachment" not in cd:
continue
filename = part.get_filename()
if filename:
filename = _decode_header(filename)
if ct == "message/rfc822" and not re.search(r"\.[A-Za-z0-9]{1,8}$", filename):
filename = f"{filename}.eml"
else:
# Inline images, etc. - generate a name
ext = ct.split("/")[-1] if "/" in ct else "bin"
ext = "eml" if ct == "message/rfc822" else (ct.split("/")[-1] if "/" in ct else "bin")
filename = f"attachment_{idx}.{ext}"
payload = part.get_payload(decode=True)
size = len(payload) if payload else 0
if payload is None and ct == "message/rfc822":
try:
payload = part.as_bytes()
except Exception:
payload = b""
size = len(payload) if payload is not None else 0
attachments.append({
"index": idx,
"filename": filename,
@@ -1136,29 +1268,58 @@ def _list_attachments_from_msg(msg):
return attachments
def _is_likely_signature_image_attachment(att: dict) -> bool:
"""Match the reader's inline signature/logo image filter."""
filename = str((att or {}).get("filename") or "").lower()
if not re.search(r"\.(png|jpe?g|gif|bmp|svg|webp)$", filename):
return False
size = int((att or {}).get("size") or 0)
if re.search(r"^image\d{3,}\.(png|jpe?g|gif)$", filename):
return True
if re.search(r"^(signature|logo|sig|footer|banner)[-_\d]*\.(png|jpe?g|gif|svg)$", filename):
return True
return 0 < size < 30 * 1024
def _has_visible_attachments(msg) -> bool:
"""Return True only for attachments the reader will render as chips."""
return any(
not _is_likely_signature_image_attachment(att)
for att in _list_attachments_from_msg(msg)
)
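Review note: to make the filter's behaviour easier to check, here is a standalone copy of the heuristic from the hunk above (renamed `is_likely_signature_image`; the sample attachment dicts are made up for illustration). It suggests that any image under 30 KiB is hidden, even one a sender attached deliberately, which may or may not be the intended trade-off.

```python
import re


def is_likely_signature_image(att: dict) -> bool:
    """Standalone copy of the diff's signature/logo image heuristic."""
    filename = str((att or {}).get("filename") or "").lower()
    # Only image extensions are ever candidates.
    if not re.search(r"\.(png|jpe?g|gif|bmp|svg|webp)$", filename):
        return False
    size = int((att or {}).get("size") or 0)
    # Outlook-style auto-named inline images: image001.png, image0023.jpg, ...
    if re.search(r"^image\d{3,}\.(png|jpe?g|gif)$", filename):
        return True
    # Conventional signature asset names.
    if re.search(r"^(signature|logo|sig|footer|banner)[-_\d]*\.(png|jpe?g|gif|svg)$", filename):
        return True
    # Fallback: any small, non-empty image is treated as decoration.
    return 0 < size < 30 * 1024


if __name__ == "__main__":
    samples = [
        {"filename": "image001.png", "size": 500_000},
        {"filename": "report.pdf", "size": 100},
        {"filename": "photo.jpg", "size": 10 * 1024},
        {"filename": "photo.jpg", "size": 2_000_000},
    ]
    for att in samples:
        print(att["filename"], att["size"], is_likely_signature_image(att))
```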
def _extract_attachment_to_disk(msg, index, target_dir):
"""Extract a specific attachment to disk and return the file path."""
if not msg.is_multipart():
return None
idx = 0
for part in msg.walk():
if part.is_multipart():
continue
cd = str(part.get("Content-Disposition", ""))
ct = part.get_content_type()
is_attached_email = ct == "message/rfc822" and ("attachment" in cd.lower() or part.get_filename())
if part.is_multipart() and not is_attached_email:
continue
if ct in ("text/plain", "text/html") and "attachment" not in cd:
continue
if idx == index:
filename = part.get_filename()
if filename:
filename = _decode_header(filename)
if ct == "message/rfc822" and not re.search(r"\.[A-Za-z0-9]{1,8}$", filename):
filename = f"{filename}.eml"
else:
ext = ct.split("/")[-1] if "/" in ct else "bin"
ext = "eml" if ct == "message/rfc822" else (ct.split("/")[-1] if "/" in ct else "bin")
filename = f"attachment_{idx}.{ext}"
# Sanitize
safe_name = re.sub(r"[^\w\s\-.]", "_", filename).strip()
payload = part.get_payload(decode=True)
if not payload:
if payload is None and ct == "message/rfc822":
try:
payload = part.as_bytes()
except Exception:
payload = b""
if payload is None:
return None
target_dir.mkdir(parents=True, exist_ok=True)
filepath = target_dir / safe_name
+29 -15
@@ -44,6 +44,17 @@ from routes.email_helpers import (
logger = logging.getLogger(__name__)
# Recovers a `[{"action": ...}, ...]` JSON array from raw LLM output when the
# fenced-block strip leaves nothing usable. Runs on model output influenced by
# untrusted email bodies, so it must not backtrack: the object content class is
# `[^{}]` (brace-delimited, greedy) rather than the old `[^[\]]*?` lazy runs,
# which exploded exponentially on inputs like `[{"action"},{` + `}},{{` * N
# (CodeQL py/redos #198).
_CAL_ACTION_ARRAY_RE = re.compile(
r'\[\s*\{[^{}]*"action"[^{}]*\}\s*(?:,\s*\{[^{}]*\}\s*)*\]',
re.DOTALL,
)
def _owner_for_email_account(account_id: str | None) -> str:
if not account_id:
@@ -558,7 +569,7 @@ async def _auto_summarize_pass_single(days_back: int = 1, account_id: str | None
cal_extract = _strip_think(_raw_original)
cal_extract = re.sub(r"^```(?:json)?\s*|\s*```$", "", cal_extract, flags=re.MULTILINE).strip()
if not cal_extract and _raw_original:
matches = list(re.finditer(r'\[\s*\{[^[\]]*?"action"[^[\]]*?\}\s*(?:,\s*\{[^[\]]*?\}\s*)*\]', _raw_original, re.DOTALL))
matches = list(_CAL_ACTION_ARRAY_RE.finditer(_raw_original))
if matches:
cal_extract = matches[-1].group()
logger.info(f"[cal-extract] uid={uid.decode() if isinstance(uid, bytes) else uid} folder={_folder} subj={subject[:50]!r} raw_len={len(cal_extract)} orig_len={len(_raw_original)} raw={cal_extract[:800]!r}")
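Review note: a minimal sketch of how the precompiled pattern behaves, assuming the recovery step is "take the last match and parse it as JSON" as in the hunk above. The wrapper `recover_action_array` is a hypothetical name used only here. One consequence of the `[^{}]` character class worth flagging: objects with nested braces are not recovered, so this fallback only helps for flat action objects.

```python
import json
import re

# Same pattern as the diff's _CAL_ACTION_ARRAY_RE.
CAL_ACTION_ARRAY_RE = re.compile(
    r'\[\s*\{[^{}]*"action"[^{}]*\}\s*(?:,\s*\{[^{}]*\}\s*)*\]',
    re.DOTALL,
)


def recover_action_array(raw: str):
    """Hypothetical wrapper: return the last action array found in raw
    model output as parsed JSON, or None if nothing usable is found."""
    matches = list(CAL_ACTION_ARRAY_RE.finditer(raw))
    if not matches:
        return None
    try:
        return json.loads(matches[-1].group())
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    ok = 'Sure! [{"action": "create", "title": "Sync"}, {"action": "skip"}] done'
    print(recover_action_array(ok))

    # Input shaped like the one cited in the comment above.
    hostile = '[{"action"},{' + '}},{{' * 2000
    print(recover_action_array(hostile))

    # Nested objects are outside what the pattern can match.
    nested = '[{"action": "create", "when": {"start": "x"}}]'
    print(recover_action_array(nested))
```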
@@ -683,20 +694,23 @@ async def _auto_summarize_pass_single(days_back: int = 1, account_id: str | None
logger.warning(f"[cal-extract] JSON parse failed: {je} on raw={cal_extract[:200]!r}")
except Exception as e:
logger.warning(f"[cal-extract] Meeting extraction LLM call failed for uid={uid}: {e}")
# Record we processed this email so we don't re-LLM next run
try:
_cc = _sql3.connect(SCHEDULED_DB)
_cc.execute(
"INSERT OR REPLACE INTO email_calendar_extractions "
"(message_id, owner, uid, events_created, created_at) VALUES (?, ?, ?, ?, ?)",
(message_id, account_owner or "", uid.decode() if isinstance(uid, bytes) else str(uid),
_cal_run_count, datetime.utcnow().isoformat())
)
_cc.commit()
_cc.close()
_cal_existing.add(message_id)
except Exception as ce:
logger.debug(f"Could not cache calendar extraction: {ce}")
else:
# Record we processed this email so we don't re-LLM next run.
# Only mark as processed on success; transient LLM failures
# are retried on the next poll run (matches summary/reply pattern).
try:
_cc = _sql3.connect(SCHEDULED_DB)
_cc.execute(
"INSERT OR REPLACE INTO email_calendar_extractions "
"(message_id, owner, uid, events_created, created_at) VALUES (?, ?, ?, ?, ?)",
(message_id, account_owner or "", uid.decode() if isinstance(uid, bytes) else str(uid),
_cal_run_count, datetime.utcnow().isoformat())
)
_cc.commit()
_cc.close()
_cal_existing.add(message_id)
except Exception as ce:
logger.debug(f"Could not cache calendar extraction: {ce}")
if need_urgent:
try:
+495 -52
@@ -13,7 +13,9 @@ handlers need. The split is mechanical — no behavior change.
"""
import asyncio
import os
import sqlite3 as _sql3
import time
import email as email_mod
import email.header
import email.utils
@@ -43,8 +45,9 @@ from routes.email_helpers import (
_load_settings, _save_settings, _get_email_config,
_send_smtp_message, _smtp_security_mode,
_IMAP_TIMEOUT_SECONDS, _open_imap_connection,
make_oauth_state, verify_oauth_state,
_imap_connect, _imap, _decode_header, _detect_sent_folder, _detect_drafts_folder,
_extract_attachment_text, _list_attachments_from_msg,
_extract_attachment_text, _list_attachments_from_msg, _has_visible_attachments, _is_likely_signature_image_attachment,
_extract_attachment_to_disk, _extract_html, _extract_text,
_fetch_sender_thread_context, _pre_retrieve_context,
_EMAIL_REPLY_SYS_PROMPT_BASE, _POOL_HOOKS,
@@ -58,6 +61,7 @@ from routes.email_pollers import _start_poller
logger = logging.getLogger(__name__)
ODYSSEUS_MAIL_ORIGIN = "odysseus-ui"
EMAIL_READ_ATTACHMENT_VERSION = 2
def _email_tag_owner_aliases(account_id: str | None, owner: str = "") -> list[str]:
@@ -76,15 +80,16 @@ def _email_tag_owner_aliases(account_id: str | None, owner: str = "") -> list[st
cfg.get("smtp_user") or "",
cfg.get("from_address") or "",
])
except Exception:
except Exception as _e:
logger.warning("Failed to resolve email account alias", exc_info=_e)
resolved_account_id = None
row = db.get(_EA, resolved_account_id) if resolved_account_id else None
if row:
aliases.extend([row.owner or "", row.imap_user or "", row.from_address or ""])
finally:
db.close()
except Exception:
pass
except Exception as _e:
logger.warning("Failed to load email aliases", exc_info=_e)
out = []
for a in aliases:
a = (a or "").strip()
@@ -244,6 +249,21 @@ def _imap_uid_fetch(conn, uid_set: str | bytes, query: str):
return conn.uid("FETCH", _uid_bytes(uid_set), query)
def _imap_search_quote(value: str) -> str:
return '"' + str(value or "").replace("\\", "\\\\").replace('"', '\\"') + '"'
def _message_id_chain(*values: str) -> list[str]:
seen = set()
out = []
for value in values:
for mid in re.findall(r"<[^>]+>", value or ""):
if mid not in seen:
seen.add(mid)
out.append(mid)
return out
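Review note: the two helpers above are small enough to exercise in isolation. The sketch below copies them under non-underscore names (an illustrative choice) and shows the de-duplicated, order-preserving ID chain plus the quoted search criterion that `_related_thread_attachments_sync` later sends to the server. The sample Message-IDs are invented.

```python
import re


def imap_search_quote(value: str) -> str:
    """Copy of the diff's helper: wrap a value as an IMAP quoted string,
    escaping backslashes first and then double quotes."""
    return '"' + str(value or "").replace("\\", "\\\\").replace('"', '\\"') + '"'


def message_id_chain(*values: str) -> list[str]:
    """Copy of the diff's helper: collect <...> tokens from header values,
    keeping first-seen order and dropping duplicates."""
    seen = set()
    out = []
    for value in values:
        for mid in re.findall(r"<[^>]+>", value or ""):
            if mid not in seen:
                seen.add(mid)
                out.append(mid)
    return out


if __name__ == "__main__":
    references = "<a@example.com> <b@example.com>"
    in_reply_to = "<b@example.com>"
    chain = message_id_chain(references, in_reply_to)
    print(chain)
    for mid in chain:
        print(f"(HEADER Message-ID {imap_search_quote(mid)})")
```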
def _uid_from_fetch_meta(meta_b: bytes) -> str:
m = re.search(rb"\bUID\s+(\d+)\b", meta_b)
return m.group(1).decode() if m else ""
@@ -285,7 +305,9 @@ def _group_uid_fetch_records(msg_data) -> list:
def _smtp_ready(cfg: dict) -> bool:
return bool(cfg.get("smtp_host") and cfg.get("smtp_user") and cfg.get("smtp_password"))
if not cfg.get("smtp_host") or not cfg.get("smtp_user"):
return False
return bool(cfg.get("smtp_password") or cfg.get("oauth_provider"))
def _resolve_send_config(account_id: str | None = None, owner: str = "") -> dict:
@@ -360,6 +382,21 @@ def _apply_odysseus_headers(msg, kind: str | None = None, ref_id: str | None = N
msg["X-Odysseus-Ref"] = re.sub(r"[^A-Za-z0-9_.:-]", "-", ref_id)[:128]
def _normalize_addr_field(field: str) -> str:
"""Strip the malformed-but-common trailing/leading commas and stray
whitespace from a To/Cc/Bcc string before it lands in the MIME header
or the SMTP envelope. Users often paste a single address with a
trailing comma (e.g. `felix@pewdiepie.com,`) and most MTAs reject the
resulting `To: felix@pewdiepie.com,` line as a syntax error. Collapse
any run of separator junk between addresses too."""
if not field:
return field
# Split on commas, drop empty tokens, rejoin with a single ', '.
parts = [p.strip() for p in field.split(",")]
parts = [p for p in parts if p]
return ", ".join(parts)
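Review note: a standalone copy of the normalizer (renamed `normalize_addr_field` for illustration) with a few inputs, including the trailing-comma case from the docstring. The last case suggests the plain comma split also rewrites a quoted display name that contains a comma without a following space, which may be worth a test in the PR.

```python
def normalize_addr_field(field: str) -> str:
    """Copy of the diff's helper: drop empty comma-separated tokens and
    rejoin the remaining addresses with a single ', '."""
    if not field:
        return field
    parts = [p.strip() for p in field.split(",")]
    parts = [p for p in parts if p]
    return ", ".join(parts)


if __name__ == "__main__":
    for raw in (
        "felix@example.com,",
        " a@example.com ,, b@example.com , ",
        '"Doe,Jane" <jane@example.com>',
    ):
        print(repr(raw), "->", repr(normalize_addr_field(raw)))
```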
def _envelope_recipients(*fields: str) -> list:
"""Extract bare SMTP envelope addresses from one or more To/Cc/Bcc header
strings. A naive `field.split(",")` corrupts display names that contain a
@@ -988,6 +1025,65 @@ def setup_email_routes():
except Exception:
pass
def _related_thread_attachments_sync(
folder: str,
account_id: str | None,
owner: str,
current_uid: str,
current_message_id: str,
in_reply_to: str,
references: str,
limit: int = 12,
) -> list[dict]:
"""Return visible attachments from referenced messages in this folder."""
wanted_ids = _message_id_chain(references, in_reply_to)
current_mid = (current_message_id or "").strip()
wanted_ids = [mid for mid in wanted_ids if mid and mid != current_mid]
if not wanted_ids:
return []
related: list[dict] = []
try:
with _imap(account_id, owner=owner) as conn:
conn.select(_q(folder), readonly=True)
# Search newest referenced messages first; cap work so opening
# a long thread stays bounded.
for mid in reversed(wanted_ids[-10:]):
if len(related) >= limit:
break
status, data = _imap_uid_search(conn, f'(HEADER Message-ID {_imap_search_quote(mid)})')
if status != "OK" or not data or not data[0]:
continue
for uid_b in reversed(data[0].split()[-3:]):
source_uid = uid_b.decode(errors="ignore")
if not source_uid or source_uid == str(current_uid):
continue
st2, msg_data = _imap_uid_fetch(conn, source_uid, "(BODY.PEEK[])")
if st2 != "OK" or not msg_data or not isinstance(msg_data[0], tuple):
continue
msg = email_mod.message_from_bytes(msg_data[0][1])
source_from = _decode_header(msg.get("From", ""))
source_subject = _decode_header(msg.get("Subject", ""))
source_date = msg.get("Date", "")
for att in _list_attachments_from_msg(msg):
if _is_likely_signature_image_attachment(att):
continue
enriched = dict(att)
enriched.update({
"source_uid": source_uid,
"source_folder": folder,
"source_message_id": (msg.get("Message-ID") or "").strip(),
"source_from": source_from,
"source_subject": source_subject,
"source_date": source_date,
})
related.append(enriched)
if len(related) >= limit:
break
except Exception as e:
logger.debug(f"related thread attachment lookup failed uid={current_uid}: {e}")
return related
@router.get("/list")
async def list_emails(
folder: str = Query("INBOX"),
@@ -1097,7 +1193,12 @@ def setup_email_routes():
account_id: str | None = Query(None),
owner: str = Depends(require_owner),
):
"""Search emails server-side via IMAP SEARCH. Matches subject, from, or body text."""
"""Search emails server-side via IMAP SEARCH. Matches subject, from, or body text.
When the caller asks for INBOX and the account has an "All Mail"
folder (Gmail does), we transparently swap to All Mail so the
search surfaces archived / labelled emails too. Plain IMAP
accounts fall back to whatever folder the caller specified."""
if not q or len(q) < 2:
return {"emails": [], "total": 0, "query": q}
# CRLF in q would terminate the IMAP command early — reject defensively.
@@ -1105,7 +1206,27 @@ def setup_email_routes():
raise HTTPException(400, "Invalid query")
try:
with _imap(account_id, owner=owner) as conn:
conn.select(_q(folder), readonly=True)
# If the user asked for INBOX, try to upgrade to All Mail —
# one folder == every email on Gmail-class servers.
effective_folder = folder
if (folder or "").upper() == "INBOX":
try:
status, folder_lines = conn.list()
if status == "OK" and folder_lines:
for raw in folder_lines:
if isinstance(raw, bytes):
raw = raw.decode("utf-8", errors="replace")
m = re.match(r"\((?P<flags>[^)]*)\)\s+\"[^\"]*\"\s+(?P<name>.+)", raw)
if not m:
continue
flags = (m.group("flags") or "").lower()
name = m.group("name").strip().strip('"')
if "\\all" in flags or "all mail" in name.lower():
effective_folder = name
break
except Exception:
pass
conn.select(_q(effective_folder), readonly=True)
# Escape backslash and quote for the IMAP-SEARCH quoted-string.
q_escaped = q.replace('\\', '\\\\').replace('"', '\\"')
@@ -1113,7 +1234,7 @@ def setup_email_routes():
status, data = _imap_uid_search(conn, search_cmd)
if status != "OK" or not data[0]:
return {"emails": [], "total": 0, "query": q}
return {"emails": [], "total": 0, "query": q, "folder": effective_folder}
uid_list = data[0].split()
total = len(uid_list)
@@ -1178,6 +1299,13 @@ def setup_email_routes():
"is_flagged": "\\Flagged" in flags,
"flags": flags,
"has_attachments": has_attachments,
# Stamp the folder so the frontend opens each
# email from the folder it actually lives in
# (the search may have run against All Mail
# even though the caller asked for INBOX),
# otherwise clicks open whatever happens to
# have the same UID in INBOX → wrong email.
"folder": effective_folder,
})
except Exception as e:
logger.warning(f"Error parsing search result {uid}: {e}")
@@ -1226,6 +1354,17 @@ def setup_email_routes():
sender_name, sender_addr = email.utils.parseaddr(sender)
parsed_date = email.utils.parsedate_to_datetime(date_str) if date_str else None
attachments = _list_attachments_from_msg(msg)
related_attachments = []
if not _has_visible_attachments(msg):
related_attachments = _related_thread_attachments_sync(
folder,
account_id,
owner,
uid,
message_id,
in_reply_to,
references,
)
if mark_seen:
# Set \Seen in a separate readwrite session so concurrent reads
@@ -1334,6 +1473,8 @@ def setup_email_routes():
"body": body,
"body_html": body_html,
"attachments": attachments,
"related_attachments": related_attachments,
"attachment_version": EMAIL_READ_ATTACHMENT_VERSION,
"cached_summary": cached_summary,
"cached_ai_reply": cached_ai_reply,
"boundaries": cached_boundaries,
@@ -1364,6 +1505,12 @@ def setup_email_routes():
"""Read email body. Cached for 30m, sync IMAP work runs in a thread."""
ck = _read_cache_key(account_id, folder, uid, owner=owner)
cached = _read_cache_get(ck)
if cached is not None:
# Older cached read responses lack the thread-attachment fallback.
# Fetch once so replies that reference prior attachments can show
# those files without waiting for cache expiry.
if cached.get("attachment_version") != EMAIL_READ_ATTACHMENT_VERSION:
cached = None
if cached is not None:
if mark_seen:
try:
@@ -1498,6 +1645,12 @@ def setup_email_routes():
return {"error": f"Attachment index {index} not found"}
from pathlib import Path as _Path
target_root = os.path.abspath(str(target_dir))
filepath_str = os.path.abspath(str(filepath))
if os.path.commonpath([target_root, filepath_str]) != target_root:
logger.warning("Rejected attachment path outside extraction dir: %s", filepath)
return {"error": "Invalid attachment path"}
filepath = _Path(filepath_str)
base = _Path(filepath).name
if base.startswith("."):
return {"error": "Invalid filename", "filename": base}
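Review note: the containment check added above can be isolated as below. This is a sketch under the same assumptions as the hunk (both paths are made absolute before comparison); the function name and sample paths are illustrative. `os.path.abspath` normalizes `..` segments but does not resolve symlinks, so `os.path.realpath` may be the stricter choice if the extraction directory can contain links.

```python
import os


def is_within_directory(root: str, candidate: str) -> bool:
    """Return True only if `candidate` resolves to `root` or a path below it."""
    root_abs = os.path.abspath(root)
    cand_abs = os.path.abspath(candidate)
    try:
        # commonpath compares whole path components, so a sibling such as
        # "/tmp/extract-evil" does not count as being inside "/tmp/extract".
        return os.path.commonpath([root_abs, cand_abs]) == root_abs
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return False


if __name__ == "__main__":
    root = "/tmp/extract"
    for path in (
        "/tmp/extract/report.pdf",
        "/tmp/extract/../secret.txt",
        "/tmp/extract-evil/report.pdf",
    ):
        print(path, is_within_directory(root, path))
```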
@@ -1552,6 +1705,65 @@ def setup_email_routes():
return None
doc_session_id = _resolve_doc_session()
def _create_markdown_doc(content: str, summary: str):
from src.database import SessionLocal as _SL, Document as _Doc, DocumentVersion as _DV
doc_id = str(uuid.uuid4())
ver_id = str(uuid.uuid4())
_db = _SL()
try:
_db.query(_Doc).filter(_Doc.is_active == True).update({"is_active": False})
_db.add(_Doc(
id=doc_id, session_id=doc_session_id, title=title,
language="markdown", current_content=content,
version_count=1, is_active=True,
))
_db.add(_DV(
id=ver_id, document_id=doc_id, version_number=1,
content=content, summary=summary, source="upload",
))
_db.commit()
finally:
_db.close()
_tag_doc_with_source(doc_id)
return doc_id
def _attached_email_markdown(raw_bytes: bytes):
if not raw_bytes:
return f"# Attached email: {base}\n\n_(empty email attachment)_"
try:
attached_msg = email_mod.message_from_bytes(raw_bytes)
except Exception:
logger.exception("Failed to parse attached email %s", base)
return f"# Attached email: {base}\n\nCould not parse this email attachment."
attached_subject = _decode_header(attached_msg.get("Subject", "")) or base
attached_from = _decode_header(attached_msg.get("From", ""))
attached_to = _decode_header(attached_msg.get("To", ""))
attached_cc = _decode_header(attached_msg.get("Cc", ""))
attached_date = attached_msg.get("Date", "")
attached_body = _extract_text(attached_msg).strip()
attached_atts = _list_attachments_from_msg(attached_msg)
lines = [f"# Attached email: {attached_subject}", ""]
if attached_from:
lines.append(f"**From:** {attached_from}")
if attached_to:
lines.append(f"**To:** {attached_to}")
if attached_cc:
lines.append(f"**Cc:** {attached_cc}")
if attached_date:
lines.append(f"**Date:** {attached_date}")
lines.extend(["", "## Body", "", attached_body or "_(no readable body)_"])
if attached_atts:
lines.extend(["", "## Attachments", ""])
for att in attached_atts:
size = int(att.get("size") or 0)
size_label = f"{size} B" if size < 1024 else f"{round(size / 1024)} KB"
name = att.get("filename") or f"attachment_{att.get('index', '')}"
ctype = att.get("content_type") or "application/octet-stream"
lines.append(f"- {name} ({ctype}, {size_label})")
return "\n".join(lines).strip()
# ── PDF path (existing) ────────────────────────────────────
if ext == ".pdf":
import shutil as _shutil
@@ -1598,6 +1810,39 @@ def setup_email_routes():
_tag_doc_with_source(doc_id)
return {"doc_id": doc_id, "filename": filepath.name}
# ── Attached email (.eml / message/rfc822) ────────────────
if ext == ".eml":
def _attachment_bytes_from_msg():
if not msg.is_multipart():
return b""
idx = 0
for part in msg.walk():
cd = str(part.get("Content-Disposition", ""))
ct = part.get_content_type()
is_attached_email = ct == "message/rfc822" and ("attachment" in cd.lower() or part.get_filename())
if part.is_multipart() and not is_attached_email:
continue
if ct in ("text/plain", "text/html") and "attachment" not in cd:
continue
if idx == index:
payload = part.get_payload(decode=True)
if payload is None and ct == "message/rfc822":
try:
payload = part.as_bytes()
except Exception:
payload = b""
return payload or b""
idx += 1
return b""
try:
content = _attached_email_markdown(_attachment_bytes_from_msg())
except Exception:
logger.exception("Failed to read email attachment %s", base)
return {"error": "Failed to read email attachment", "filename": base}
doc_id = _create_markdown_doc(content, "Imported attached email")
return {"doc_id": doc_id, "filename": filepath.name}
# ── DOCX path: extract text → markdown document ───────────
if ext == ".docx":
try:
@@ -1635,25 +1880,7 @@ def setup_email_routes():
lines.append("")
content = "\n".join(lines).strip() or f"_(empty {base})_"
from src.database import SessionLocal as _SL, Document as _Doc, DocumentVersion as _DV
doc_id = str(uuid.uuid4())
ver_id = str(uuid.uuid4())
_db = _SL()
try:
_db.query(_Doc).filter(_Doc.is_active == True).update({"is_active": False})
_db.add(_Doc(
id=doc_id, session_id=doc_session_id, title=title,
language="markdown", current_content=content,
version_count=1, is_active=True,
))
_db.add(_DV(
id=ver_id, document_id=doc_id, version_number=1,
content=content, summary="Imported from DOCX", source="upload",
))
_db.commit()
finally:
_db.close()
_tag_doc_with_source(doc_id)
doc_id = _create_markdown_doc(content, "Imported from DOCX")
return {"doc_id": doc_id, "filename": filepath.name}
# ── Plain text / markdown ────────────────────────────────
@@ -1662,25 +1889,7 @@ def setup_email_routes():
content = filepath.read_text(encoding="utf-8", errors="replace")
except Exception as e:
return {"error": f"Failed to read text file: {e}", "filename": base}
from src.database import SessionLocal as _SL, Document as _Doc, DocumentVersion as _DV
doc_id = str(uuid.uuid4())
ver_id = str(uuid.uuid4())
_db = _SL()
try:
_db.query(_Doc).filter(_Doc.is_active == True).update({"is_active": False})
_db.add(_Doc(
id=doc_id, session_id=doc_session_id, title=title,
language="markdown", current_content=content,
version_count=1, is_active=True,
))
_db.add(_DV(
id=ver_id, document_id=doc_id, version_number=1,
content=content, summary="Imported from email attachment", source="upload",
))
_db.commit()
finally:
_db.close()
_tag_doc_with_source(doc_id)
doc_id = _create_markdown_doc(content, "Imported from email attachment")
return {"doc_id": doc_id, "filename": filepath.name}
return {"error": f"Unsupported attachment type: {ext}", "filename": base}
@@ -1724,6 +1933,22 @@ def setup_email_routes():
logger.error(f"Failed to mark unread {uid}: {e}")
return {"success": False, "error": "Mail operation failed"}
@router.post("/flag/{uid}")
async def flag_email(uid: str, folder: str = Query("INBOX"), account_id: str | None = Query(None),
on: bool = Query(True), owner: str = Depends(require_owner)):
"""Toggle the \\Flagged flag (a.k.a. favorite / star) on an email.
Pass `on=true` to favorite, `on=false` to unfavorite."""
try:
with _imap(account_id, owner=owner) as conn:
conn.select(_q(folder))
if not _store_email_flag(conn, uid, "\\Flagged", add=bool(on)):
return {"success": False, "error": "Email not found"}
_invalidate_list_cache(account_id, folder)
return {"success": True, "flagged": bool(on)}
except Exception as e:
logger.error(f"Failed to flag {uid}: {e}")
return {"success": False, "error": "Mail operation failed"}
@router.post("/mark-read/{uid}")
async def mark_read(uid: str, folder: str = Query("INBOX"), account_id: str | None = Query(None), owner: str = Depends(require_owner)):
"""Mark an email as read (set \\Seen flag)."""
@@ -1973,7 +2198,10 @@ def setup_email_routes():
outer = MIMEMultipart("alternative")
body_container = outer
outer["From"] = cfg["from_address"]
to = _normalize_addr_field(to or "")
cc = _normalize_addr_field(cc or "")
bcc = _normalize_addr_field(bcc or "")
outer["From"] = email.utils.formataddr((cfg.get("display_name") or "", cfg["from_address"]))
outer["To"] = to
if cc:
outer["Cc"] = cc
@@ -2104,6 +2332,77 @@ def setup_email_routes():
logger.error(f"cancel_scheduled {sid!r} failed: {e}")
return {"success": False, "error": "Mail operation failed"}
# ── Agent send-confirm: list/approve/cancel ──────────────────────────
# When `agent_email_confirm` is on, the MCP send_email tool drops the
# composed email into scheduled_emails with status='agent_draft' (a
# far-future send_at so the poller never picks it up). These endpoints
# let the chat UI surface them for the user and either approve (flip
# to status='pending' with send_at=now so the poller delivers it) or
# cancel (status='cancelled').
@router.get("/pending")
async def list_pending_agent_drafts(owner: str = Depends(require_owner)):
import sqlite3
try:
conn = sqlite3.connect(SCHEDULED_DB)
conn.row_factory = sqlite3.Row
rows = conn.execute(
"""SELECT id, to_addr, subject, body, created_at, account_id
FROM scheduled_emails
WHERE status = 'agent_draft' AND owner = ?
ORDER BY created_at DESC""",
(owner or "",),
).fetchall()
conn.close()
return {"pending": [dict(r) for r in rows]}
except Exception as e:
logger.error(f"list_pending_agent_drafts failed: {e}")
return {"pending": [], "error": "Mail operation failed"}
@router.post("/pending/{sid}/approve")
async def approve_agent_draft(sid: str, owner: str = Depends(require_owner)):
"""Approve a draft staged by the agent: flip status → pending and
backdate send_at so the scheduled-send poller picks it up
immediately."""
import sqlite3
try:
conn = sqlite3.connect(SCHEDULED_DB)
cur = conn.execute(
"""UPDATE scheduled_emails
SET status = 'pending', send_at = ?
WHERE id = ? AND status = 'agent_draft' AND owner = ?""",
(datetime.utcnow().isoformat(), sid, owner or ""),
)
conn.commit()
affected = cur.rowcount
conn.close()
if not affected:
return {"success": False, "error": "Draft not found or already handled"}
return {"success": True}
except Exception as e:
logger.error(f"approve_agent_draft {sid!r} failed: {e}")
return {"success": False, "error": "Mail operation failed"}
@router.delete("/pending/{sid}")
async def cancel_agent_draft(sid: str, owner: str = Depends(require_owner)):
"""Discard a draft the agent staged for approval."""
import sqlite3
try:
conn = sqlite3.connect(SCHEDULED_DB)
cur = conn.execute(
"""UPDATE scheduled_emails SET status = 'cancelled'
WHERE id = ? AND status = 'agent_draft' AND owner = ?""",
(sid, owner or ""),
)
conn.commit()
affected = cur.rowcount
conn.close()
if not affected:
return {"success": False, "error": "Draft not found or already handled"}
return {"success": True}
except Exception as e:
logger.error(f"cancel_agent_draft {sid!r} failed: {e}")
return {"success": False, "error": "Mail operation failed"}
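Review note: the approve and cancel handlers both rely on a guarded `UPDATE` plus `rowcount` to make the transition owner-scoped and one-shot. The sketch below reproduces that pattern against an in-memory SQLite database. The table layout is an assumption that includes only the columns the handlers touch, and `make_db` and `approve_draft` are hypothetical names.

```python
import sqlite3
from datetime import datetime, timezone


def make_db() -> sqlite3.Connection:
    """Build a throwaway table with only the columns this sketch needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scheduled_emails "
        "(id TEXT PRIMARY KEY, owner TEXT, status TEXT, send_at TEXT)"
    )
    # A staged agent draft with a far-future send time.
    conn.execute(
        "INSERT INTO scheduled_emails VALUES "
        "('d1', 'alice', 'agent_draft', '9999-12-31T00:00:00')"
    )
    conn.commit()
    return conn


def approve_draft(conn: sqlite3.Connection, sid: str, owner: str) -> bool:
    """Flip agent_draft -> pending only if the row exists, is still a draft,
    and belongs to `owner`. Returns False when nothing was updated."""
    # Naive UTC ISO string, the same shape datetime.utcnow().isoformat() gives.
    now = datetime.now(timezone.utc).replace(tzinfo=None).isoformat()
    cur = conn.execute(
        "UPDATE scheduled_emails SET status = 'pending', send_at = ? "
        "WHERE id = ? AND status = 'agent_draft' AND owner = ?",
        (now, sid, owner),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    db = make_db()
    print(approve_draft(db, "d1", "bob"))    # wrong owner
    print(approve_draft(db, "d1", "alice"))  # first approval
    print(approve_draft(db, "d1", "alice"))  # repeat approval
```

Because the status check lives in the `WHERE` clause, a double click or a concurrent cancel cannot move the same draft twice.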
@router.get("/resolve-contact")
async def resolve_contact(name: str = Query(..., description="Name to search for"), owner: str = Depends(require_owner)):
"""Search Sent folder for a contact by name. Returns matching email addresses."""
@@ -2164,6 +2463,7 @@ def setup_email_routes():
try:
cfg = _resolve_send_config(req.account_id, owner=owner)
except Exception as e:
logger.warning(f"No SMTP-capable account resolved: {e}")
return {"success": False, "error": str(e) or "No SMTP-capable email account configured"}
# Use 'mixed' if we have attachments, 'alternative' otherwise
@@ -2176,7 +2476,10 @@ def setup_email_routes():
outer = MIMEMultipart("alternative")
body_container = outer
outer["From"] = cfg["from_address"]
req.to = _normalize_addr_field(req.to or "")
req.cc = _normalize_addr_field(req.cc or "")
req.bcc = _normalize_addr_field(req.bcc or "")
outer["From"] = email.utils.formataddr((cfg.get("display_name") or "", cfg["from_address"]))
outer["To"] = req.to
if req.cc:
outer["Cc"] = req.cc
@@ -2227,6 +2530,10 @@ def setup_email_routes():
_account_id = cfg.get("account_id") or req.account_id # capture for the IMAP append in the closure
_in_reply_to = (req.in_reply_to or "").strip()
_oauth_provider = cfg.get("oauth_provider") or ""
_oauth_access_token = cfg.get("oauth_access_token") or ""
_oauth_refresh_token = cfg.get("oauth_refresh_token") or ""
_oauth_token_expiry = cfg.get("oauth_token_expiry") or ""
def _deliver():
try:
@@ -2237,6 +2544,11 @@ def setup_email_routes():
"smtp_security": _smtp_security,
"smtp_user": _smtp_user,
"smtp_password": _smtp_pw,
"account_id": _account_id,
"oauth_provider": _oauth_provider,
"oauth_access_token": _oauth_access_token,
"oauth_refresh_token": _oauth_refresh_token,
"oauth_token_expiry": _oauth_token_expiry,
},
_from,
_recipients,
@@ -2349,7 +2661,7 @@ def setup_email_routes():
msg.attach(MIMEText(_draft_html, "html", "utf-8"))
else:
msg = MIMEText(req.body, "plain", "utf-8")
msg["From"] = cfg["from_address"]
msg["From"] = email.utils.formataddr((cfg.get("display_name") or "", cfg["from_address"]))
msg["To"] = req.to
if req.cc:
msg["Cc"] = req.cc
@@ -2617,11 +2929,15 @@ def setup_email_routes():
source_uid = (data.get("uid") or "").strip()
source_folder = (data.get("folder") or "INBOX").strip()
fast_reply = bool(data.get("fast", False))
user_hint = (data.get("user_hint") or "").strip()
if not original_body:
return {"success": False, "error": "No email body provided"}
if message_id:
# Skip cache lookup when the caller supplied a user_hint — the
# cached generic reply doesn't reflect the instructions and
# would silently override them.
if message_id and not user_hint:
try:
_c = _sql3.connect(SCHEDULED_DB)
owner_clause, owner_params = _email_cache_owner_clause(owner)
@@ -2761,8 +3077,13 @@ def setup_email_routes():
user_msg = (
f"Recipient: {to}\nSubject: {subject}\n\n"
f"Original email and any current draft:\n{original_body[:6000]}\n\n"
f"Draft a reply. Return only the reply body text."
)
if user_hint:
user_msg += (
f"User's instructions for THIS reply (follow these — they override "
f"defaults like length/tone):\n{user_hint[:2000]}\n\n"
)
user_msg += "Draft a reply. Return only the reply body text."
# Build a candidate chain so a stale session-stored API key
# (the most common cause of "authentication failed" here)
@@ -2992,6 +3313,8 @@ def setup_email_routes():
"from_address": r.from_address or "",
"has_imap_password": bool(r.imap_password),
"has_smtp_password": bool(r.smtp_password),
"oauth_provider": r.oauth_provider or "",
"display_name": r.display_name or "",
})
return {"accounts": out}
finally:
@@ -3024,6 +3347,7 @@ def setup_email_routes():
smtp_user=(data.get("smtp_user") or "").strip(),
smtp_password=_enc(data.get("smtp_password") or ""),
from_address=(data.get("from_address") or "").strip(),
display_name=(data.get("display_name") or "").strip(),
# SECURITY: stamp the creator so all subsequent reads / mutations
# can filter by user. Without this every new account leaks to
# every other user.
@@ -3058,7 +3382,7 @@ def setup_email_routes():
if not row:
return {"ok": False, "error": "Account not found"}
# Simple fields
for key in ("name", "imap_host", "imap_user", "smtp_host", "smtp_user", "from_address"):
for key in ("name", "imap_host", "imap_user", "smtp_host", "smtp_user", "from_address", "display_name"):
if key in data:
setattr(row, key, (data[key] or "").strip())
for key in ("imap_port", "smtp_port"):
@@ -3247,4 +3571,123 @@ def setup_email_routes():
finally:
db.close()
# ── Google OAuth2 routes ──
@router.get("/oauth/google/authorize")
async def google_oauth_authorize(account_id: str = Query(...), request: Request = None, owner: str = Depends(require_user)):
import urllib.parse
_assert_owns_account(account_id, owner)
client_id = os.environ.get("GOOGLE_OAUTH_CLIENT_ID", "")
if not client_id:
raise HTTPException(400, "GOOGLE_OAUTH_CLIENT_ID not set — add it to .env")
redirect_uri = (
os.environ.get("GOOGLE_OAUTH_REDIRECT_URI")
or f"http://{request.headers.get('host', 'localhost:7000')}/api/email/oauth/google/callback"
)
state = make_oauth_state(account_id, owner)
params = urllib.parse.urlencode({
"client_id": client_id,
"redirect_uri": redirect_uri,
"response_type": "code",
"scope": "https://mail.google.com/ email",
"access_type": "offline",
"prompt": "consent",
"state": state,
})
from fastapi.responses import RedirectResponse as _RR
return _RR(f"https://accounts.google.com/o/oauth2/v2/auth?{params}")
@router.get("/oauth/google/callback")
async def google_oauth_callback(
code: str = Query(None),
state: str = Query(None),
error: str = Query(None),
request: Request = None,
):
import urllib.parse
from fastapi.responses import RedirectResponse as _RR
if error:
return _RR("/?section=integrations&email_oauth_error=google_error")
if not code or not state:
return _RR("/?section=integrations&email_oauth_error=missing_code")
state_data = verify_oauth_state(state)
if not state_data:
return _RR("/?section=integrations&email_oauth_error=invalid_state")
account_id = state_data.get("a", "")
owner = state_data.get("o", "")
client_id = os.environ.get("GOOGLE_OAUTH_CLIENT_ID", "")
client_secret = os.environ.get("GOOGLE_OAUTH_CLIENT_SECRET", "")
redirect_uri = (
os.environ.get("GOOGLE_OAUTH_REDIRECT_URI")
or f"http://{request.headers.get('host', 'localhost:7000')}/api/email/oauth/google/callback"
)
import httpx as _httpx
try:
resp = _httpx.post("https://oauth2.googleapis.com/token", data={
"code": code,
"client_id": client_id,
"client_secret": client_secret,
"redirect_uri": redirect_uri,
"grant_type": "authorization_code",
}, timeout=10)
resp.raise_for_status()
data = resp.json()
except Exception:
logger.warning("Google token exchange failed")
return _RR("/?section=integrations&email_oauth_error=token_exchange_failed")
access_token = data.get("access_token", "")
refresh_token = data.get("refresh_token", "")
expiry = str(int(time.time()) + data.get("expires_in", 3600))
# Fetch the email address from userinfo so we can auto-fill imap_user.
email_addr = ""
display_name = ""
try:
ui = _httpx.get("https://www.googleapis.com/oauth2/v1/userinfo",
headers={"Authorization": f"Bearer {access_token}"}, timeout=10)
if ui.is_success:
ui_data = ui.json()
email_addr = ui_data.get("email", "")
display_name = ui_data.get("name", "")
except Exception:
pass
from core.database import SessionLocal, EmailAccount
from src.secret_storage import encrypt as _enc
db = SessionLocal()
try:
row = db.query(EmailAccount).filter(EmailAccount.id == account_id).first()
if not row:
return _RR("/?section=integrations&email_oauth_error=account_not_found")
# SECURITY: verify the account belongs to the initiating user.
if owner and row.owner and row.owner != owner:
logger.warning("OAuth callback owner mismatch — rejecting token write")
return _RR("/?section=integrations&email_oauth_error=ownership_error")
row.oauth_provider = "google"
row.oauth_access_token = _enc(access_token)
if refresh_token:
row.oauth_refresh_token = _enc(refresh_token)
row.oauth_token_expiry = expiry
# Auto-fill Google IMAP/SMTP settings if not already configured.
if not row.imap_host:
row.imap_host = "imap.gmail.com"
row.imap_port = 993
row.imap_starttls = False
if not row.smtp_host:
row.smtp_host = "smtp.gmail.com"
row.smtp_port = 587
if email_addr:
if not row.imap_user:
row.imap_user = email_addr
if not row.smtp_user:
row.smtp_user = email_addr
if not row.from_address:
row.from_address = email_addr
if not row.name or row.name == row.id:
row.name = email_addr
if display_name and not row.display_name:
row.display_name = display_name
db.commit()
finally:
db.close()
return _RR("/?section=integrations&email_oauth_success=1")
return router
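Review note: the callback trusts `verify_oauth_state` to return the `a` (account id) and `o` (owner) keys, but this hunk does not show either helper. A minimal sketch of one way such a signed state could work, assuming an HMAC-SHA256 signature over a JSON payload. The names `make_state`, `verify_state`, `SECRET`, the `t` timestamp key, and the 600-second lifetime are all hypothetical and are not taken from the project:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"replace-with-a-real-server-secret"  # hypothetical key, illustration only


def _b64e(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def _b64d(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_state(account_id: str, owner: str, secret: bytes = SECRET, now=None) -> str:
    """Return "<payload>.<signature>", both base64url-encoded."""
    issued = int(time.time()) if now is None else int(now)
    payload = json.dumps({"a": account_id, "o": owner, "t": issued}).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64e(payload)}.{_b64e(sig)}"


def verify_state(state: str, secret: bytes = SECRET, max_age: int = 600, now=None):
    """Return the payload dict, or None if the state is malformed, forged, or expired."""
    try:
        payload_part, sig_part = state.split(".", 1)
        payload, sig = _b64d(payload_part), _b64d(sig_part)
    except Exception:
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    try:
        data = json.loads(payload)
    except ValueError:
        return None
    current = int(time.time()) if now is None else int(now)
    if current - int(data.get("t", 0)) > max_age:
        return None
    return data
```

Binding the owner into the signed payload is what lets the callback reject a token write when `row.owner` differs from the user who started the flow.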
+1
View File
@@ -9,6 +9,7 @@ from pathlib import Path
from fastapi import APIRouter, HTTPException, Form, Depends
from core.constants import EMBEDDING_ENDPOINT_FILE, FASTEMBED_CACHE_DIR
from core.middleware import require_admin
from src.runtime_paths import get_app_root
logger = logging.getLogger(__name__)
+2 -5
View File
@@ -224,8 +224,6 @@ def setup_gallery_routes() -> APIRouter:
@router.post("/api/gallery/{image_id}/replace")
async def gallery_replace(request: Request, image_id: str):
"""Replace an existing gallery image file with a new one."""
from pathlib import Path
user = get_current_user(request)
db = SessionLocal()
try:
@@ -241,9 +239,8 @@ def setup_gallery_routes() -> APIRouter:
raise HTTPException(400, "No image provided")
content = await read_upload_limited(file, GALLERY_UPLOAD_MAX_BYTES, "Gallery replacement")
img_dir = Path(GENERATED_IMAGES_DIR)
img_dir.mkdir(parents=True, exist_ok=True)
img_path = img_dir / _sanitize_gallery_filename(img.filename)
GALLERY_IMAGE_DIR.mkdir(parents=True, exist_ok=True)
img_path = _gallery_image_path(img.filename)
img_path.write_bytes(content)
# Refresh dimensions in case the editor resized the canvas.
+112 -5
View File
@@ -1,8 +1,13 @@
import json
import os
import re
import shlex
import subprocess
from copy import deepcopy
from fastapi import APIRouter, HTTPException
from core.platform_compat import run_ssh_command
from routes._validators import validate_remote_host, validate_ssh_port
@@ -107,6 +112,73 @@ def _apply_manual_hardware(system, manual_mode="", manual_gpu_count="", manual_v
return system
def _run_model_probe(host: str, ssh_port: str, cmd: str) -> str:
try:
if host:
r = run_ssh_command(
host,
ssh_port or None,
cmd,
timeout=15,
connect_timeout=5,
strict_host_key_checking=False,
text=True,
)
else:
r = subprocess.run(["bash", "-lc", cmd], capture_output=True, text=True, timeout=15)
if r.returncode == 0:
return (r.stdout or "").strip()
except Exception:
return ""
return ""
def _inspect_model_path(model_path: str, host: str = "", ssh_port: str = "") -> dict:
"""Read lightweight metadata from a local or SSH-visible HF model folder."""
path = (model_path or "").strip()
if not path or path.startswith(("http://", "https://")):
return {}
if not (path.startswith("/") or path.startswith("~")):
return {}
qpath = shlex.quote(path)
qconfig = shlex.quote(os.path.join(path, "config.json"))
out = {}
exists = _run_model_probe(host, ssh_port, f"test -d {qpath} && printf found || printf missing")
if exists != "found":
target = host or "local container"
out["model_probe_error"] = f"Model path is not visible on {target}: {path}"
return out
raw_config = _run_model_probe(host, ssh_port, f"test -f {qconfig} && sed -n '1,240p' {qconfig}")
if raw_config:
try:
cfg = json.loads(raw_config)
except Exception:
cfg = {}
for key in ("context_length", "max_position_embeddings", "n_ctx_train", "model_max_length", "max_seq_len"):
value = cfg.get(key)
if isinstance(value, (int, float)) and value > 0:
out["model_ctx_max"] = int(value)
break
else:
out["model_probe_error"] = f"config.json not found in model path: {path}"
size_cmd = (
f"find {qpath} -type f \\( -name '*.safetensors' -o -name '*.bin' -o -name '*.gguf' \\) "
"-printf '%s\\n' 2>/dev/null | awk '{s+=$1} END {if (s>0) printf \"%.6f\", s/1073741824}'"
)
weights = _run_model_probe(host, ssh_port, size_cmd)
try:
weights_gb = float(weights)
except Exception:
weights_gb = 0.0
if weights_gb > 0:
out["model_weights_gb"] = round(weights_gb, 3)
elif "model_probe_error" not in out:
out["model_probe_error"] = f"No model weight files found in: {path}"
return out
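Review note: the `find ... | awk` pipeline above sums the sizes of weight files on the probed host. For the local case only, the same total can be computed without a shell. This is a sketch, not the project's code; `weight_bytes` and `bytes_to_gb` are hypothetical names, and the symlink check is there to mirror `find -type f`, which does not follow symlinks:

```python
from pathlib import Path

# Same extensions the shell probe matches (case-sensitive, like find -name).
WEIGHT_SUFFIXES = {".safetensors", ".bin", ".gguf"}


def weight_bytes(model_dir: str) -> int:
    """Sum the sizes of regular weight files under model_dir, recursively."""
    total = 0
    for p in Path(model_dir).rglob("*"):
        if p.is_symlink() or not p.is_file():
            continue  # skip symlinks and directories, as find -type f would
        if p.suffix in WEIGHT_SUFFIXES:
            total += p.stat().st_size
    return total


def bytes_to_gb(n: int) -> float:
    """Convert bytes to GiB using the same divisor as the awk expression."""
    return round(n / 1073741824, 3)
```

Hugging Face cache snapshot directories are usually made of symlinks, so both this sketch and the shell probe may report zero weights there unless the path points at the real files.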
def setup_hwfit_routes():
router = APIRouter(prefix="/api/hwfit", tags=["hwfit"])
@@ -119,7 +191,7 @@ def setup_hwfit_routes():
return detect_system(host=host, ssh_port=ssh_port, platform=platform, fresh=fresh)
@router.get("/models")
def get_models(use_case: str = "", sort: str = "score", limit: int = 50, search: str = "", host: str = "", quant: str = "", ctx: str = "", gpu_count: str = "", gpu_group: str = "", ssh_port: str = "", platform: str = "", fresh: bool = False, manual_mode: str = "", manual_gpu_count: str = "", manual_vram_gb: str = "", manual_ram_gb: str = "", manual_backend: str = "", ignore_detected_gpu: bool = False, ignore_detected_ram: bool = False, fit_only: bool = False):
def get_models(use_case: str = "", sort: str = "newest", limit: int = 50, search: str = "", host: str = "", quant: str = "", ctx: str = "", gpu_count: str = "", gpu_group: str = "", ssh_port: str = "", platform: str = "", fresh: bool = False, manual_mode: str = "", manual_gpu_count: str = "", manual_vram_gb: str = "", manual_ram_gb: str = "", manual_backend: str = "", ignore_detected_gpu: bool = False, ignore_detected_ram: bool = False, fit_only: bool = False):
"""Rank LLM models against detected hardware and return scored results.
gpu_count: override GPU count (0 = CPU only, 1-N = simulate N GPUs of the
active group). gpu_group: index into system.gpu_groups (the homogeneous
@@ -235,7 +307,7 @@ def setup_hwfit_routes():
return {"system": system, "models": results}
@router.get("/profiles")
def get_serve_profiles(model: str = "", host: str = "", ssh_port: str = "", platform: str = "", fresh: bool = False, serve_weights_gb: float = 0.0, serve_quant: str = ""):
def get_serve_profiles(model: str = "", model_path: str = "", host: str = "", ssh_port: str = "", platform: str = "", fresh: bool = False, serve_weights_gb: float = 0.0, serve_quant: str = ""):
"""Compute llama.cpp serve profiles (Quality/Balanced/Speed) for `model`
against the detected hardware on `host` (or local). Returns concrete
flags (n_gpu_layers, n_cpu_moe, cache_type, ctx) the serve UI can apply.
@@ -260,8 +332,23 @@ def setup_hwfit_routes():
# "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct".
s = (s or "").lower().strip()
s = s.split("/")[-1] # drop org prefix
s = re.sub(r"[-_.]?gguf$", "", s) # drop trailing gguf marker
s = re.sub(r"[-_.](q\d[^/]*|iq\d[^/]*|fp8|bf16|f16|awq[^/]*|gptq[^/]*)$", "", s)
for suffix in ("-gguf", "_gguf", ".gguf", "gguf"):
if s.endswith(suffix):
s = s[: -len(suffix)]
break
cut_at = None
for idx, ch in enumerate(s):
if ch not in "-_." or idx + 1 >= len(s):
continue
suffix = s[idx + 1:]
if (
suffix in {"fp8", "bf16", "f16"}
or suffix.startswith(("awq", "gptq", "iq"))
or (suffix.startswith("q") and len(suffix) > 1 and suffix[1].isdigit())
):
cut_at = idx
if cut_at is not None:
s = s[:cut_at]
return s
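Review note: the loop-based replacement for the two regexes is easier to reason about when run in isolation. The sketch below copies the new logic into a standalone function; the name `normalize_model_name` is hypothetical because the enclosing function's name is not visible in this hunk:

```python
def normalize_model_name(s: str) -> str:
    """Lowercase, drop the org prefix, a trailing gguf marker, and a quant suffix."""
    s = (s or "").lower().strip()
    s = s.split("/")[-1]  # drop org prefix
    for suffix in ("-gguf", "_gguf", ".gguf", "gguf"):
        if s.endswith(suffix):
            s = s[: -len(suffix)]
            break
    cut_at = None
    for idx, ch in enumerate(s):
        if ch not in "-_." or idx + 1 >= len(s):
            continue
        suffix = s[idx + 1:]
        if (
            suffix in {"fp8", "bf16", "f16"}
            or suffix.startswith(("awq", "gptq", "iq"))
            or (suffix.startswith("q") and len(suffix) > 1 and suffix[1].isdigit())
        ):
            cut_at = idx  # a later matching separator overwrites an earlier one
    if cut_at is not None:
        s = s[:cut_at]
    return s


print(normalize_model_name("TheBloke/Llama-2-7B-GGUF"))   # llama-2-7b
print(normalize_model_name("model.Q4_K_M.gguf"))          # model
print(normalize_model_name("Qwen2.5-7B-Instruct-AWQ"))    # qwen2.5-7b-instruct
```

Two behaviours differ from the removed regexes and may be worth confirming with the author: `iq` no longer requires a following digit, and when several separators qualify the last one wins, where the regex matched from the leftmost.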
m = catalog.get(model)
@@ -272,8 +359,16 @@ def setup_hwfit_routes():
if nn and (nn == want or want.endswith(nn) or nn.endswith(want)):
m = entry
break
path_meta = _inspect_model_path(model_path or model, host=host, ssh_port=ssh_port)
if m is None:
return {"system": system, "profiles": [], "error": "model not in catalog"}
return {
"system": system,
"profiles": [],
"error": "model not in catalog",
"model_ctx_max": int(path_meta.get("model_ctx_max") or 0),
"model_weights_gb": float(path_meta.get("model_weights_gb") or 0),
"model_probe_error": path_meta.get("model_probe_error") or "",
}
# Surface the model's trained context limit so the serve UI can clamp a
# user-typed context down to it (asking for ctx > n_ctx_train overflows
# and, with a quantized KV cache, can crash the GPU).
@@ -283,6 +378,16 @@ def setup_hwfit_routes():
if isinstance(v, (int, float)) and v > 0:
model_ctx_max = int(v)
break
path_ctx_max = int(path_meta.get("model_ctx_max") or 0)
if path_ctx_max > 0:
model_ctx_max = max(model_ctx_max, path_ctx_max)
model_weights_gb = float(path_meta.get("model_weights_gb") or 0)
if model_weights_gb <= 0:
for k in ("min_vram_gb", "required_gb", "size_gb", "recommended_ram_gb", "min_ram_gb"):
v = m.get(k)
if isinstance(v, (int, float)) and v > 0:
model_weights_gb = float(v)
break
return {
"system": system,
"profiles": compute_serve_profiles(
@@ -291,6 +396,8 @@ def setup_hwfit_routes():
serve_quant=(serve_quant or None),
),
"model_ctx_max": model_ctx_max,
"model_weights_gb": model_weights_gb,
"model_probe_error": path_meta.get("model_probe_error") or "",
}
@router.get("/image-models")
+33 -58
View File
@@ -273,65 +273,30 @@ def setup_memory_routes(memory_manager: MemoryManager, session_manager: SessionM
async def api_audit_memories(request: Request, session: str = Form(None)):
"""Deduplicate and consolidate memories via LLM.
Uses the default model from settings, or falls back to a session's model.
Uses task/utility/default settings through the shared resolver, with
the active session as fallback when no task or utility model is set.
Returns before and after memory counts.
"""
from routes.model_routes import _load_settings, _normalize_base, build_chat_url
from core.database import ModelEndpoint
import json as _json
endpoint_url = model = None
headers = {}
# Try utility model from settings first — memory audit is a background
# task and should prefer the lighter utility model over the main chat model.
from src.task_endpoint import resolve_task_endpoint
user = _owner(request)
t_url, t_model, t_headers = resolve_task_endpoint(owner=user)
if t_url and t_model:
endpoint_url, model, headers = t_url, t_model, t_headers
else:
# Fall back to default model if no task/utility model configured
settings = _load_settings()
ep_id = settings.get("default_endpoint_id", "")
default_model = settings.get("default_model", "")
if ep_id:
db = SessionLocal()
try:
ep = db.query(ModelEndpoint).filter(
ModelEndpoint.id == ep_id, ModelEndpoint.is_enabled == True
).first()
if ep:
base = _normalize_base(ep.base_url)
endpoint_url = build_chat_url(base)
model = default_model
if not model and ep.models:
try:
models = _json.loads(ep.models) if isinstance(ep.models, str) else ep.models
if models:
model = models[0]
except Exception:
pass
if ep.api_key:
headers = {"Authorization": f"Bearer {ep.api_key}"}
finally:
db.close()
fallback_url = fallback_model = None
fallback_headers = None
if session:
try:
sess = session_manager.get_session(session)
_assert_session_owner(sess, user)
fallback_url = sess.endpoint_url
fallback_model = sess.model
fallback_headers = sess.headers
except KeyError:
pass
# Fall back to session model if no default configured
if not endpoint_url and session:
try:
sess = session_manager.get_session(session)
_assert_session_owner(sess, _owner(request))
endpoint_url = sess.endpoint_url
model = sess.model
headers = sess.headers
except KeyError:
pass
endpoint_url, model, headers = resolve_task_endpoint(
fallback_url, fallback_model, fallback_headers, owner=user
)
if not endpoint_url or not model:
raise HTTPException(400, "No default model configured — set one in Settings")
user = _owner(request)
result = await audit_memories(
memory_manager,
memory_vector,
@@ -369,18 +334,28 @@ def setup_memory_routes(memory_manager: MemoryManager, session_manager: SessionM
model = None
headers = {}
user = _owner(request)
if session:
try:
sess = session_manager.get_session(session)
_assert_session_owner(sess, _owner(request))
endpoint_url, model, headers = resolve_task_endpoint(
sess.endpoint_url, sess.model, sess.headers, owner=_owner(request)
)
_assert_session_owner(sess, user)
except KeyError:
logger.warning("Session %s not found, falling back to utility endpoint", session)
endpoint_url, model, headers = resolve_endpoint("utility", owner=_owner(request))
sess = None
except HTTPException as exc:
if exc.status_code != 404:
raise
sess = None
if sess is None:
logger.warning("Session %s not found or inaccessible, falling back to utility endpoint", session)
endpoint_url, model, headers = resolve_endpoint("utility", owner=user)
else:
endpoint_url, model, headers = resolve_task_endpoint(
sess.endpoint_url, sess.model, sess.headers, owner=user
)
else:
endpoint_url, model, headers = resolve_task_endpoint(owner=_owner(request))
endpoint_url, model, headers = resolve_task_endpoint(owner=user)
if not endpoint_url or not model:
raise HTTPException(400, "No LLM model configured. Set a default model in Settings.")
+124 -35
View File
@@ -5,6 +5,7 @@ import re
import uuid
import json
import hashlib
import ipaddress
import socket
import time as _time
import logging
@@ -16,6 +17,7 @@ from fastapi import APIRouter, HTTPException, Form, Query, Body, Request, Respon
from pydantic import BaseModel
from fastapi.responses import StreamingResponse
from core.database import SessionLocal, ModelEndpoint, Session as DbSession
from core.log_safety import redact_url as _redact_url_for_log
from core.middleware import require_admin
from src.llm_core import _detect_provider, _host_match, ANTHROPIC_MODELS
from src.tls_overrides import llm_verify
@@ -26,7 +28,7 @@ from src.endpoint_resolver import (
build_models_url,
build_headers,
)
from src.auth_helpers import _auth_disabled, owner_filter
from src.auth_helpers import _auth_disabled, effective_user, owner_filter
logger = logging.getLogger(__name__)
@@ -405,8 +407,11 @@ def _endpoint_refresh_timeout(ep: Any, category: str) -> float:
except Exception:
val = 0
if val > 0:
return float(max(1, min(30, val)))
return 2.5 if category == "local" else 2.0
return float(max(1, min(60, val)))
# llama.cpp and other local OpenAI-compatible servers can block briefly
# while warming/loading. A 2s local timeout makes working endpoints flicker
# offline before /v1/models is ready.
return 10.0 if category == "local" else 2.0
def _manual_refresh_timeout(ep: Any, category: str, requested: Any = None) -> float:
@@ -473,7 +478,7 @@ def _explicit_model_list_timeout(base_url: str, endpoint_kind: str = "auto", req
category = _classify_endpoint(base_url, kind)
if kind in ("api", "proxy") or category == "api":
return 30.0
return 3.0 if _is_ollama_base(base_url) else 2.0
return 15.0 if category == "local" else (3.0 if _is_ollama_base(base_url) else 2.0)
def _cached_model_ids(ep: Any) -> List[str]:
@@ -562,6 +567,8 @@ def _safe_build_models_url(base_url: str) -> str:
"""Build a /models URL without letting optional provider imports break probes."""
try:
return build_models_url(base_url)
except ValueError:
raise
except Exception as exc:
logger.debug("Model URL detection failed for %s: %s", base_url, exc)
return f"{(base_url or '').rstrip('/')}/models"
@@ -633,7 +640,7 @@ def _probe_single_model(base: str, api_key: str, model_id: str, timeout: int = 1
try:
t0 = _time.time()
r = httpx.post(target_url, headers=h, json=payload, timeout=timeout)
r = httpx.post(target_url, headers=h, json=payload, timeout=timeout, verify=llm_verify())
latency = round((_time.time() - t0) * 1000)
if r.is_success:
return {"status": "ok", "latency_ms": latency}
@@ -659,13 +666,20 @@ def _probe_single_model(base: str, api_key: str, model_id: str, timeout: int = 1
# Hostnames / IP prefixes that indicate a local endpoint
_LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0", "::1"}
_PRIVATE_PREFIXES = ("10.", "172.16.", "172.17.", "172.18.", "172.19.",
"172.20.", "172.21.", "172.22.", "172.23.", "172.24.",
"172.25.", "172.26.", "172.27.", "172.28.", "172.29.",
"172.30.", "172.31.", "192.168.")
_PRIVATE_NETWORKS = (
ipaddress.ip_network("10.0.0.0/8"),
ipaddress.ip_network("172.16.0.0/12"),
ipaddress.ip_network("192.168.0.0/16"),
)
_TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")
_TAILSCALE_RE = re.compile(r"^100\.(6[4-9]|[7-9]\d|1[01]\d|12[0-7])\.")
def _local_ip_literal(host: str) -> bool:
try:
ip = ipaddress.ip_address(host)
except ValueError:
return False
return any(ip in network for network in _PRIVATE_NETWORKS) or ip in _TAILSCALE_CGNAT
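Review note: switching from string prefixes to `ipaddress` networks changes how hostnames that merely look like private addresses are classified. A standalone sketch of the new check, using the same networks as the hunk (the function name `is_local_ip_literal` is a stand-in for `_local_ip_literal`):

```python
import ipaddress

PRIVATE_NETWORKS = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
)
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_local_ip_literal(host: str) -> bool:
    """True only for IP literals inside the private or CGNAT ranges."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # hostnames and malformed input are never "local" here
    return any(ip in net for net in PRIVATE_NETWORKS) or ip in TAILSCALE_CGNAT


print(is_local_ip_literal("172.31.255.1"))    # True
print(is_local_ip_literal("172.32.0.1"))      # False
print(is_local_ip_literal("10.example.com"))  # False
```

A hostname such as `10.example.com` passed the old `startswith("10.")` test; it is rejected here because it does not parse as an IP literal.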
def _classify_endpoint(base_url: str, endpoint_kind: str = "auto") -> str:
@@ -679,9 +693,7 @@ def _classify_endpoint(base_url: str, endpoint_kind: str = "auto") -> str:
return "api"
try:
host = urlparse(base_url).hostname or ""
if host in _LOCAL_HOSTS or host.startswith(_PRIVATE_PREFIXES):
return "local"
if _TAILSCALE_RE.match(host):
if host in _LOCAL_HOSTS or _local_ip_literal(host):
return "local"
except Exception:
pass
@@ -703,6 +715,16 @@ def _effective_endpoint_kind(ep: Any, base_url: str) -> str:
return "auto"
def _is_loading_model_response(resp: Any) -> bool:
if getattr(resp, "status_code", None) != 503:
return False
try:
body = resp.text or ""
except Exception:
body = ""
return "loading model" in body.lower()
def _probe_endpoint(base_url: str, api_key: str = None, timeout: int = 5) -> List[str]:
"""Probe a base URL's /models endpoint and return list of model IDs.
@@ -767,16 +789,19 @@ def _probe_endpoint(base_url: str, api_key: str = None, timeout: int = 5) -> Lis
models.append(_e)
return [m for m in models if _is_chat_model(m)]
except httpx.HTTPStatusError as e:
if e.response is not None and _is_loading_model_response(e.response):
logger.info("Endpoint still loading model at %s", _redact_url_for_log(url))
return []
if api_key:
status = e.response.status_code if e.response is not None else "unknown"
logger.warning(f"Failed to probe {url} with API key: HTTP {status}")
logger.warning("Failed to probe %s with API key: HTTP %s", _redact_url_for_log(url), status)
return []
logger.warning(f"Failed to probe {url}: {e}")
logger.warning("Failed to probe %s: %s", _redact_url_for_log(url), e)
except Exception as e:
if api_key:
logger.warning(f"Failed to probe {url} with API key: {e}")
logger.warning("Failed to probe %s with API key: %s", _redact_url_for_log(url), e)
return []
logger.warning(f"Failed to probe {url}: {e}")
logger.warning("Failed to probe %s: %s", _redact_url_for_log(url), e)
# Older Ollama builds and some proxies expose native /api/tags even when
# the OpenAI-compatible /v1/models path is unavailable.
@@ -816,6 +841,15 @@ def _ping_endpoint(base_url: str, api_key: str = None, timeout: float = 1.5) ->
or "ollama" in (parsed_base.hostname or "").lower()
)
def _is_loading_model_response(r) -> bool:
if getattr(r, "status_code", None) != 503:
return False
try:
body = r.text or ""
except Exception:
body = ""
return "loading model" in body.lower()
def _result_from_response(r) -> Dict[str, Any]:
if 300 <= r.status_code < 400:
loc = r.headers.get("location", "")
@@ -832,6 +866,13 @@ def _ping_endpoint(base_url: str, api_key: str = None, timeout: float = 1.5) ->
"status_code": r.status_code,
"error": None,
}
if _is_loading_model_response(r):
return {
"reachable": True,
"loading": True,
"status_code": r.status_code,
"error": "Loading model",
}
return {"reachable": False, "status_code": r.status_code, "error": f"HTTP {r.status_code}"}
last_error: Optional[str] = None
@@ -864,7 +905,7 @@ def _ping_endpoint(base_url: str, api_key: str = None, timeout: float = 1.5) ->
if 400 <= sc < 500 and sc not in (401, 403):
models_url = _safe_build_models_url(base)
try:
r2 = httpx.get(models_url, headers=headers, timeout=timeout, verify=llm_verify())
r2 = httpx.get(models_url, headers=headers, timeout=timeout, verify=llm_verify())
result2 = _result_from_response(r2)
if result2["reachable"]:
return result2
@@ -1048,9 +1089,11 @@ def setup_model_routes(model_discovery):
except Exception:
return 0.0
def _failure_delay(fails: int) -> float:
def _failure_delay(fails: int, *, empty_local: bool = False) -> float:
if fails <= 0:
return 0.0
if empty_local:
return min(5.0 * (2 ** max(0, fails - 1)), 30.0)
return min(_REFRESH_FAILURE_BASE * (2 ** max(0, fails - 1)), _REFRESH_FAILURE_MAX)
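Review note: the `empty_local` branch gives empty local endpoints a much shorter retry schedule than the general path. A sketch of the resulting delays; the `base` and `cap` defaults below are placeholders, since the real values of `_REFRESH_FAILURE_BASE` and `_REFRESH_FAILURE_MAX` are not shown in this hunk:

```python
def failure_delay(fails: int, *, base: float = 60.0, cap: float = 900.0,
                  empty_local: bool = False) -> float:
    """Exponential backoff in seconds; base and cap defaults are assumed values."""
    if fails <= 0:
        return 0.0
    if empty_local:
        # Local endpoint with no cached models: retry quickly, cap at 30 s.
        return min(5.0 * (2 ** max(0, fails - 1)), 30.0)
    return min(base * (2 ** max(0, fails - 1)), cap)


print([failure_delay(n, empty_local=True) for n in range(1, 6)])
# [5.0, 10.0, 20.0, 30.0, 30.0]
```

The short schedule matches the stated intent of the change: a local server that is still loading its model should be picked up within seconds rather than staying marked offline.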
def _should_refresh_endpoint(ep: Any, now: float, force: bool = False) -> tuple[bool, Dict[str, Any]]:
@@ -1081,7 +1124,12 @@ def setup_model_routes(model_discovery):
fails = int(state.get("fail_count") or 0)
if fails and not force:
last_failure = float(state.get("last_failure") or 0.0)
if now - last_failure < _failure_delay(fails):
empty_local = (
not cached
and category == "local"
and str(getattr(ep, "id", "") or "").startswith("local-")
)
if now - last_failure < _failure_delay(fails, empty_local=empty_local):
return False, info
if cached and not force:
interval = _endpoint_refresh_interval(ep, category)
@@ -1255,13 +1303,16 @@ def setup_model_routes(model_discovery):
# Require auth; "" is the unconfigured single-user mode, treated as
# "see everything" by _fetch_models.
try:
from src.auth_helpers import get_current_user as _gcu
owner = _gcu(request) or ""
except Exception:
owner = ""
# Reject anonymous in configured deployments — no leaking the model
# list to unauthenticated callers.
try:
if getattr(request.state, "api_token", False):
scopes = set(getattr(request.state, "api_token_scopes", []) or [])
if "chat" not in scopes:
raise HTTPException(403, "API token is not scoped for chat")
if not getattr(request.state, "api_token_owner", None):
raise HTTPException(403, "API token has no owner")
owner = effective_user(request) or ""
# Reject anonymous in configured deployments — no leaking the model
# list to unauthenticated callers.
auth_mgr = getattr(request.app.state, "auth_manager", None)
if not owner and not _auth_disabled() and auth_mgr is not None and getattr(auth_mgr, "is_configured", False):
raise HTTPException(401, "Not authenticated")
@@ -1393,7 +1444,7 @@ def setup_model_routes(model_discovery):
t0 = _time.time()
ping = _ping_endpoint(base, ep.api_key, timeout=1.5)
entry["latency_ms"] = round((_time.time() - t0) * 1000)
entry["status"] = "online" if ping.get("reachable") or cached_count else "offline"
entry["status"] = "loading" if ping.get("loading") else ("online" if ping.get("reachable") or cached_count else "offline")
entry["error"] = ping.get("error")
entry["model_count"] = cached_count or (len(ANTHROPIC_MODELS) if provider == "anthropic" else 0)
except Exception as e:
@@ -1567,9 +1618,37 @@ def setup_model_routes(model_discovery):
# "everything's already cached" path because this branch only
# runs for endpoints with an empty cached_models.
if not all_models and not pinned and r.is_enabled:
ping = _ping_endpoint(r.base_url, r.api_key, timeout=3.5)
base_for_ping = _normalize_base(r.base_url)
kind_for_ping = _effective_endpoint_kind(r, base_for_ping)
ping_timeout = 10.0 if _classify_endpoint(base_for_ping, kind_for_ping) == "local" else 3.5
ping = _ping_endpoint(r.base_url, r.api_key, timeout=ping_timeout)
if ping.get("reachable"):
status = "empty"
status = "loading" if ping.get("loading") else "empty"
if ping.get("loading"):
base = _normalize_base(r.base_url)
kind = _effective_endpoint_kind(r, base)
results.append({
"id": r.id,
"name": r.name,
"base_url": r.base_url,
"has_key": bool(r.api_key),
"api_key_fingerprint": _api_key_fingerprint(r.api_key),
"is_enabled": r.is_enabled,
"models": visible,
"pinned_models": pinned,
"hidden_count": len(hidden),
"online": True,
"status": status,
"ping_error": (ping or {}).get("error") if ping else None,
"model_type": getattr(r, "model_type", None) or "llm",
"supports_tools": getattr(r, "supports_tools", None),
"endpoint_kind": kind,
"category": _classify_endpoint(base, kind),
"model_refresh_mode": _endpoint_refresh_mode(r, kind),
"model_refresh_interval": getattr(r, "model_refresh_interval", None),
"model_refresh_timeout": getattr(r, "model_refresh_timeout", None),
})
continue
# Best-effort: if the probe came back reachable, try
# to populate cached_models in the background so the
# NEXT picker load shows "online" instead of "empty".
@@ -1577,7 +1656,7 @@ def setup_model_routes(model_discovery):
# "empty" status, and the existing background refresh
# path will eventually fill it in too.
try:
probed = _probe_endpoint(r.base_url, r.api_key, timeout=5)
probed = _probe_endpoint(r.base_url, r.api_key, timeout=max(5, int(ping_timeout)))
if probed:
r.cached_models = json.dumps(probed)
db.commit()
@@ -1755,7 +1834,7 @@ def setup_model_routes(model_discovery):
model_ids = _probe_endpoint(base_url, api_key.strip() or None, timeout=explicit_timeout) if should_probe else []
ping = {"reachable": False, "error": None}
if (should_probe or requested_kind in ("api", "proxy")) and not model_ids:
ping = _ping_endpoint(base_url, api_key.strip() or None, timeout=min(explicit_timeout, 2.0))
ping = _ping_endpoint(base_url, api_key.strip() or None, timeout=min(explicit_timeout, 10.0))
if require_model_list and not model_ids:
raise HTTPException(400, _model_endpoint_error_message(base_url, ping))
@@ -1822,7 +1901,7 @@ def setup_model_routes(model_discovery):
"models": _merge_model_ids(model_ids, _pinned),
"pinned_models": _pinned,
"online": bool(model_ids) or bool(_pinned) or bool(ping.get("reachable")),
"status": "online" if (model_ids or _pinned) else ("empty" if ping.get("reachable") else "offline"),
"status": "online" if (model_ids or _pinned) else ("loading" if ping.get("loading") else ("empty" if ping.get("reachable") else "offline")),
"ping_error": ping.get("error") if ping else None,
"endpoint_kind": requested_kind,
"category": _classify_endpoint(base_url, requested_kind),
@@ -1847,11 +1926,11 @@ def setup_model_routes(model_discovery):
configured_timeout = _parse_positive_int(model_refresh_timeout, minimum=1, maximum=60)
probe_timeout = _explicit_model_list_timeout(base_url, requested_kind, configured_timeout)
models = _probe_endpoint(base_url, api_key.strip() or None, timeout=probe_timeout)
ping = {"reachable": True, "error": None} if models else _ping_endpoint(base_url, api_key.strip() or None, timeout=min(probe_timeout, 2.0))
ping = {"reachable": True, "error": None} if models else _ping_endpoint(base_url, api_key.strip() or None, timeout=min(probe_timeout, 10.0))
return {
"base_url": base_url,
"online": bool(models) or bool(ping.get("reachable")),
"status": "online" if models else ("empty" if ping.get("reachable") else "offline"),
"status": "online" if models else ("loading" if ping.get("loading") else ("empty" if ping.get("reachable") else "offline")),
"ping_error": ping.get("error") if ping else None,
"models": models,
"count": len(models),
@@ -2029,6 +2108,16 @@ def setup_model_routes(model_discovery):
ep_id = (_user_prefs.get("default_endpoint_id") or "").strip()
model = (_user_prefs.get("default_model") or "").strip()
_fallbacks = _user_prefs.get("default_model_fallbacks") or []
# If the user has no personal default, fall back to the global default,
# but only when share_defaults_with_users is enabled.
if settings.get("share_defaults_with_users", False):
if not ep_id:
ep_id = settings.get("default_endpoint_id", "")
if not model:
model = settings.get("default_model", "")
if not _fallbacks:
_fallbacks = settings.get("default_model_fallbacks") or []
else:
ep_id = settings.get("default_endpoint_id", "")
model = settings.get("default_model", "")
+29 -10
View File
@@ -10,7 +10,8 @@ from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel
from core.database import SessionLocal, Note
from src.auth_helpers import get_current_user
from core.middleware import INTERNAL_TOOL_USER
from src.auth_helpers import require_user
from src.constants import DATA_DIR
from sqlalchemy.orm.attributes import flag_modified
@@ -208,14 +209,17 @@ async def dispatch_reminder(
try:
from src.endpoint_resolver import resolve_endpoint
from src.llm_core import llm_call_async
from src.reminder_personas import synthesis_system_prompt
url, model, headers = resolve_endpoint("utility", owner=owner or None)
if not url:
url, model, headers = resolve_endpoint("default", owner=owner or None)
if url and model:
persona_id = (settings.get("reminder_llm_persona") or "").strip()
sys_prompt = synthesis_system_prompt(persona_id)
raw = await llm_call_async(
url=url, model=model,
messages=[
{"role": "system", "content": "You are a reminder assistant. Write a single short, warm, motivating sentence (max 25 words) reminding the user about the note below. Do not add greetings, preamble, or hashtags. Output only the sentence."},
{"role": "system", "content": sys_prompt},
{"role": "user", "content": f"Title: {title}\n\n{note_body}".strip()},
],
temperature=0.7, max_tokens=200, headers=headers, timeout=30,
@@ -331,10 +335,11 @@ async def dispatch_reminder(
# Loud diagnostic so we can see WHY a reminder didn't send (the
# previous "silently no-op when cfg has no smtp_host" was invisible).
logger.info(
f"dispatch_reminder[email] note_id={note_id} owner={owner!r} "
f"smtp_host={cfg.get('smtp_host')!r} smtp_user={cfg.get('smtp_user')!r} "
f"from={from_addr!r} recipient={recipient!r} "
f"account_name={cfg.get('account_name')!r}"
"dispatch_reminder[email] note_id=%s owner=%r "
"has_smtp_host=%s has_smtp_user=%s has_from=%s has_recipient=%s",
note_id, owner,
bool(cfg.get("smtp_host")), bool(cfg.get("smtp_user")),
bool(from_addr), bool(recipient),
)
missing = []
if not cfg.get("smtp_host"):
@@ -567,10 +572,19 @@ def setup_note_routes(task_scheduler=None):
router = APIRouter(prefix="/api/notes", tags=["notes"])
def _owner(request: Request) -> Optional[str]:
return get_current_user(request)
# require_user, not bare get_current_user: a request that reaches
# these owner-scoped routes with NO identity (auth-middleware
# regression, SSRF from a sibling service) must fail closed (401)
# when auth is configured — not be treated as the single-user mode
# and handed blanket access to every account's notes. The documented
# anonymous modes (AUTH_ENABLED=false, LOCALHOST_BYPASS on loopback,
# unconfigured first-run) still resolve to None, the single-user
# path. fire_reminder below already gated this way; the CRUD routes
# did not.
return require_user(request) or None
def _is_admin_or_single_user(request: Request, user: str | None) -> bool:
if user == "internal-tool":
if user == INTERNAL_TOOL_USER:
return True
if not user:
# require_user() already admitted this request, which only happens
@@ -802,8 +816,7 @@ def setup_note_routes(task_scheduler=None):
Returns {synthesis, email_sent}.
"""
# Gate against anonymous callers — LLM synthesis can burn tokens.
from src.auth_helpers import require_user as _ru
user = _ru(request)
user = require_user(request)
body = await request.json()
note_id = str(body.get("note_id") or "").strip()
if not note_id:
@@ -826,6 +839,12 @@ def setup_note_routes(task_scheduler=None):
_override["reminder_webhook_integration_id"] = body["webhook_integration_id"]
if body.get("webhook_payload_template"):
_override["reminder_webhook_payload_template"] = body["webhook_payload_template"]
# Mirror the in-UI AI Synthesis toggle + persona so the test
# actually exercises the synthesis path before/without a Save.
if "llm_synthesis" in body:
_override["reminder_llm_synthesis"] = bool(body["llm_synthesis"])
if "llm_persona" in body:
_override["reminder_llm_persona"] = str(body["llm_persona"] or "")
else:
db = SessionLocal()
try:
+91 -6
@@ -2,8 +2,9 @@
"""Routes for personal documents management."""
import os
import logging
import shutil
import uuid
from typing import List, Tuple
from typing import Any, Dict, List, Tuple
from fastapi import APIRouter, HTTPException, Query, Request, UploadFile, File, Depends
from src.request_models import DirectoryRequest
from core.constants import BASE_DIR, PERSONAL_DIR, PERSONAL_UPLOADS_DIR
@@ -18,14 +19,15 @@ UPLOADS_DIR = PERSONAL_UPLOADS_DIR
logger = logging.getLogger(__name__)
def _personal_upload_dir_for_owner(owner: str | None) -> str:
def _personal_upload_dir_for_owner(owner: str | None, *, create: bool = True) -> str:
"""Return the per-owner upload directory used for direct RAG uploads."""
owner_segment = secure_filename((owner or "local").strip())[:80] or "local"
upload_dir = os.path.abspath(os.path.join(UPLOADS_DIR, owner_segment))
base_abs = os.path.abspath(UPLOADS_DIR)
if os.path.commonpath([upload_dir, base_abs]) != base_abs:
raise ValueError("Unsafe upload owner path")
os.makedirs(upload_dir, exist_ok=True)
if create:
os.makedirs(upload_dir, exist_ok=True)
return upload_dir
@@ -44,6 +46,87 @@ def _unique_personal_upload_path(upload_dir: str, original_name: str | None) ->
raise ValueError("Unsafe upload filename")
return file_path, filename, safe_name
def _unique_existing_target(path: str) -> str:
"""Return a non-existing sibling path for rename collision handling."""
if not os.path.exists(path):
return path
stem, ext = os.path.splitext(path)
while True:
candidate = f"{stem}-{uuid.uuid4().hex[:10]}{ext}"
if not os.path.exists(candidate):
return candidate
def _remove_empty_tree(path: str) -> None:
"""Best-effort removal of empty directories under ``path``."""
if not os.path.isdir(path):
return
for root, dirs, _files in os.walk(path, topdown=False):
for dirname in dirs:
candidate = os.path.join(root, dirname)
try:
os.rmdir(candidate)
except OSError:
pass
try:
os.rmdir(path)
except OSError:
pass
def rename_personal_upload_owner(
old_owner: str,
new_owner: str,
*,
personal_docs_manager: Any = None,
rag_manager: Any = None,
) -> Dict[str, Any]:
"""Move direct personal uploads and rewrite RAG owner metadata on user rename."""
old_dir = _personal_upload_dir_for_owner(old_owner, create=False)
new_dir = _personal_upload_dir_for_owner(new_owner, create=False)
path_map: Dict[str, str] = {}
moved_files = 0
if os.path.isdir(old_dir) and old_dir != new_dir:
os.makedirs(new_dir, exist_ok=True)
for root, _dirs, files in os.walk(old_dir):
rel_root = os.path.relpath(root, old_dir)
target_root = new_dir if rel_root == "." else os.path.join(new_dir, rel_root)
os.makedirs(target_root, exist_ok=True)
for filename in files:
source = os.path.abspath(os.path.join(root, filename))
target = _unique_existing_target(os.path.abspath(os.path.join(target_root, filename)))
shutil.move(source, target)
path_map[source] = target
moved_files += 1
_remove_empty_tree(old_dir)
if personal_docs_manager is not None:
rename_directory = getattr(personal_docs_manager, "rename_directory", None)
if callable(rename_directory):
rename_directory(old_dir, new_dir, path_map=path_map)
rag_result = None
if rag_manager is not None:
rename_owner = getattr(rag_manager, "rename_owner", None)
if callable(rename_owner):
rag_result = rename_owner(
old_owner,
new_owner,
path_map=path_map,
path_prefixes=[(old_dir, new_dir)],
)
return {
"old_dir": old_dir,
"new_dir": new_dir,
"moved_files": moved_files,
"path_map": path_map,
"rag_result": rag_result,
}
def setup_personal_routes(personal_docs_manager, rag_manager, rag_available):
"""
Setup personal documents related routes.
@@ -275,11 +358,13 @@ def setup_personal_routes(personal_docs_manager, rag_manager, rag_available):
except Exception as e:
logger.warning(f"RAG removal failed for {filepath}: {e}")
# Delete file from disk if it's in uploads dir
# Delete file from disk if it's in the caller's own uploads dir.
# Scope to the per-owner subdir, not the shared uploads root, so one
# admin can't delete another user's personal files by path.
deleted_from_disk = False
try:
abs_target = os.path.abspath(filepath)
base_abs = os.path.abspath(UPLOADS_DIR)
abs_target = os.path.realpath(filepath)
base_abs = os.path.realpath(_personal_upload_dir_for_owner(owner, create=False))
in_uploads = (
abs_target == base_abs
or os.path.commonpath([abs_target, base_abs]) == base_abs
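The per-owner scoping in the delete path above relies on resolving both paths with `os.path.realpath` and then comparing them with `os.path.commonpath`. A minimal sketch of that containment check follows. The helper name `is_within` and the `alice`/`bob` directories are illustrative assumptions, not names from this change.

```python
import os
import tempfile


def is_within(base_dir: str, target: str) -> bool:
    """Return True if target resolves to base_dir or to a path inside it."""
    # realpath collapses ".." segments and follows symlinks, so a path that
    # merely starts with the base directory cannot escape it.
    base_real = os.path.realpath(base_dir)
    target_real = os.path.realpath(target)
    try:
        return os.path.commonpath([target_real, base_real]) == base_real
    except ValueError:
        # commonpath raises on Windows when the paths are on different drives.
        return False


with tempfile.TemporaryDirectory() as root:
    alice = os.path.join(root, "alice")  # hypothetical owner directory
    bob = os.path.join(root, "bob")      # hypothetical second owner
    os.makedirs(alice)
    os.makedirs(bob)
    inside = is_within(alice, os.path.join(alice, "notes.txt"))
    sibling = is_within(alice, os.path.join(bob, "notes.txt"))
    traversal = is_within(alice, os.path.join(alice, "..", "bob", "notes.txt"))
```

Checking against the owner's own subdirectory rather than the shared uploads root is what rejects the `sibling` and `traversal` cases here.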
+4 -2
@@ -12,8 +12,10 @@ from typing import Optional
from fastapi import APIRouter, HTTPException, Query, Request
from fastapi.responses import HTMLResponse, StreamingResponse
from pydantic import BaseModel, Field
from core.middleware import INTERNAL_TOOL_USER
from src.endpoint_resolver import resolve_endpoint
from src.auth_helpers import _auth_disabled, get_current_user
from core.auth import RESERVED_USERNAMES
from src.constants import DEEP_RESEARCH_DIR
_SESSION_ID_RE = re.compile(r"^[a-zA-Z0-9-]{1,128}$")
@@ -385,9 +387,9 @@ def setup_research_routes(research_handler, session_manager=None) -> APIRouter:
"""Launch a research job from the dedicated panel."""
from src.auth_helpers import require_privilege
user = require_privilege(request, "can_use_research")
if user == "internal-tool":
if user == INTERNAL_TOOL_USER:
tool_owner = (request.headers.get("X-Odysseus-Owner") or "").strip()
if tool_owner and tool_owner not in {"internal-tool", "api", "demo", "system"}:
if tool_owner and tool_owner not in RESERVED_USERNAMES:
auth_mgr = getattr(request.app.state, "auth_manager", None)
if auth_mgr is not None and getattr(auth_mgr, "is_configured", False):
try:
+16 -5
View File
@@ -11,7 +11,7 @@ from core.session_manager import SessionManager
from core.models import ChatMessage
from src.request_models import SessionResponse
from core.database import Session as DbSession, SessionLocal, Document, GalleryImage, utcnow_naive
from src.auth_helpers import get_current_user, effective_user, _auth_disabled, owner_filter
from src.auth_helpers import effective_user, _auth_disabled, owner_filter
from src.session_actions import is_session_recently_active
@@ -328,7 +328,7 @@ def setup_session_routes(session_manager: SessionManager, config: dict, webhook_
endpoint_id: str = Form(""),
):
skip_val = str(skip_validation).lower() == "true"
user = get_current_user(request)
user = effective_user(request)
endpoint_api_key = ""
endpoint_base_url = ""
_reject_raw_endpoint_url_for_non_admin(request, user, endpoint_id, endpoint_url)
@@ -477,7 +477,7 @@ def setup_session_routes(session_manager: SessionManager, config: dict, webhook_
db.close()
# Switch model/endpoint mid-session
if model is not None and endpoint_url is not None:
user = get_current_user(request)
user = effective_user(request)
_reject_raw_endpoint_url_for_non_admin(request, user, endpoint_id, endpoint_url)
endpoint_api_key = ""
endpoint_base_url = ""
@@ -1004,6 +1004,7 @@ def setup_session_routes(session_manager: SessionManager, config: dict, webhook_
"""
from src.llm_core import llm_call
user = effective_user(request)
single_user_mode = not user and _auth_disabled()
user_sessions = session_manager.get_sessions_for_user(user)
# Delete empty and throwaway sessions before sorting
@@ -1022,7 +1023,12 @@ def setup_session_routes(session_manager: SessionManager, config: dict, webhook_
}
_THROWAWAY_MAX_MESSAGES = 4 # only delete if <= this many messages
try:
rows = db.query(DbSession).filter(DbSession.archived == False, DbSession.owner == user).limit(2000).all()
rows_q = db.query(DbSession).filter(DbSession.archived == False)
if user:
rows_q = rows_q.filter(DbSession.owner == user)
elif not single_user_mode:
rows_q = rows_q.filter(DbSession.owner == user)
rows = rows_q.limit(2000).all()
folder_map = {r.id: r.folder for r in rows}
# Precompute per-session message counts in TWO aggregate queries
# instead of 13 queries PER session — with many chats the per-row
@@ -1242,7 +1248,12 @@ def setup_session_routes(session_manager: SessionManager, config: dict, webhook_
db = SessionLocal()
try:
for sid, folder_name in assignments.items():
db_session = db.query(DbSession).filter(DbSession.id == sid, DbSession.owner == user).first()
db_session_q = db.query(DbSession).filter(DbSession.id == sid)
if user:
db_session_q = db_session_q.filter(DbSession.owner == user)
elif not single_user_mode:
db_session_q = db_session_q.filter(DbSession.owner == user)
db_session = db_session_q.first()
if db_session:
db_session.folder = folder_name
db_session.updated_at = datetime.utcnow()
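The owner filter in the two hunks above has three outcomes: a named user sees only their own sessions, an anonymous request in single-user mode sees everything, and an anonymous request outside single-user mode sees only ownerless rows. A rough sketch of that decision over plain dictionaries follows. The function name `visible_sessions` and the sample rows are assumptions for illustration, and the two identical branches of the diff are collapsed into one condition.

```python
from typing import Optional


def visible_sessions(rows: list[dict], user: Optional[str], single_user_mode: bool) -> list[dict]:
    """Apply the owner scoping rule to in-memory rows instead of a DB query."""
    if user or not single_user_mode:
        # With user=None this keeps only rows whose owner is None, which
        # mirrors an `owner IS NULL` filter in the database.
        return [r for r in rows if r["owner"] == user]
    # Single-user mode with no identity: no owner filter at all.
    return list(rows)


rows = [
    {"id": 1, "owner": "alice"},  # hypothetical sample data
    {"id": 2, "owner": "bob"},
    {"id": 3, "owner": None},
]
alice_ids = [r["id"] for r in visible_sessions(rows, "alice", False)]
single_user_ids = [r["id"] for r in visible_sessions(rows, None, True)]
anonymous_ids = [r["id"] for r in visible_sessions(rows, None, False)]
```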
+368 -13
@@ -15,6 +15,7 @@ from collections import namedtuple
from pathlib import Path
from typing import Dict, Any
from core.platform_compat import IS_APPLE_SILICON, which_tool
from core.middleware import INTERNAL_TOOL_USER
from src.optional_deps import prepare_optional_dependency_import
# POSIX-only: `pty`/`fcntl` transitively import `termios`, which does NOT exist
@@ -55,7 +56,7 @@ def _require_admin(request: Request):
# In-process tool loopback. The AuthMiddleware already validated the
# internal token + loopback client before setting this marker, so
# honour it here as admin-equivalent.
if user == "internal-tool":
if user == INTERNAL_TOOL_USER:
return
if not user or user == "api":
raise HTTPException(403, "Admin only")
@@ -330,6 +331,9 @@ def add_user_install_bins_to_path():
candidates.append(os.path.join(site.USER_BASE, 'bin'))
except Exception:
pass
candidates.append(os.path.expanduser('~/bin'))
candidates.append(os.path.expanduser('~/llama.cpp/build/bin'))
candidates.append(os.path.expanduser('~/llama.cpp/build-vulkan/bin'))
candidates.append(os.path.expanduser('~/.local/bin'))
parts = os.environ.get('PATH', '').split(os.pathsep) if os.environ.get('PATH') else []
changed = False
@@ -961,12 +965,84 @@ def setup_shell_routes() -> APIRouter:
return StreamingResponse(generate(), media_type="text/event-stream")
def _os_id_from_release(text: str) -> str:
"""Map /etc/os-release contents to a canonical family for our matrix."""
if not text:
return ""
ids = []
for line in text.splitlines():
line = line.strip()
if line.startswith("ID=") or line.startswith("ID_LIKE="):
ids += line.split("=", 1)[1].strip().strip('"').split()
ids = [i.lower() for i in ids]
if any(x in ids for x in ("debian", "ubuntu", "linuxmint", "pop", "elementary")):
return "debian"
if any(x in ids for x in ("arch", "manjaro", "endeavouros", "cachyos", "garuda")):
return "arch"
if any(x in ids for x in ("fedora", "rhel", "centos", "rocky", "almalinux", "ol")):
return "fedora"
if "alpine" in ids:
return "alpine"
if any(x in ids for x in ("suse", "opensuse", "opensuse-leap", "opensuse-tumbleweed", "sles")):
return "suse"
return ""
# Matrix lookup keyed on (os_family, backend) → (pkg_mgr_cmd_template, pkg_list_per_dep).
# Each `system_prereqs` name resolves to a list of OS-specific package
# names that get joined into the final `sudo apt install -y …` etc.
# command. Backend-specific extras (CUDA toolkit, ROCm, Vulkan headers)
# are added only when the detected backend needs them.
_PKG_NAMES = {
# canonical-name → {os_id: [actual_pkg_names_on_this_os]}
"cmake": {"debian": ["cmake"], "arch": ["cmake"], "fedora": ["cmake"], "alpine": ["cmake"], "suse": ["cmake"], "macos": ["cmake"]},
"build-essential": {"debian": ["build-essential"], "arch": ["base-devel"], "fedora": ["gcc", "gcc-c++", "make"], "alpine": ["build-base"], "suse": ["gcc-c++", "make"], "macos": []},
"g++": {"debian": ["g++"], "arch": ["gcc"], "fedora": ["gcc-c++"], "alpine": ["g++"], "suse": ["gcc-c++"], "macos": []},
"gcc": {"debian": ["gcc"], "arch": ["gcc"], "fedora": ["gcc"], "alpine": ["gcc"], "suse": ["gcc"], "macos": []},
"make": {"debian": ["make"], "arch": ["make"], "fedora": ["make"], "alpine": ["make"], "suse": ["make"], "macos": []},
"git": {"debian": ["git"], "arch": ["git"], "fedora": ["git"], "alpine": ["git"], "suse": ["git"], "macos": ["git"]},
"tmux": {"debian": ["tmux"], "arch": ["tmux"], "fedora": ["tmux"], "alpine": ["tmux"], "suse": ["tmux"], "macos": ["tmux"]},
}
_BACKEND_EXTRAS = {
"cuda": {"debian": ["nvidia-cuda-toolkit"], "arch": ["cuda"], "fedora": ["cuda-toolkit"], "alpine": [], "suse": ["cuda"], "macos": []},
"rocm": {"debian": ["rocm-dev"], "arch": ["rocm-hip-sdk"], "fedora": ["rocm-devel"], "alpine": [], "suse": ["rocm-dev"], "macos": []},
"vulkan": {"debian": ["libvulkan-dev", "vulkan-tools"], "arch": ["vulkan-headers", "vulkan-tools"], "fedora": ["vulkan-headers", "vulkan-tools"], "alpine": ["vulkan-loader-dev", "vulkan-tools"], "suse": ["vulkan-devel", "vulkan-tools"], "macos": []},
}
_PKG_MGR = {
"debian": "sudo apt install -y {pkgs}",
"arch": "sudo pacman -S --needed {pkgs}",
"fedora": "sudo dnf install -y {pkgs}",
"alpine": "sudo apk add {pkgs}",
"suse": "sudo zypper install -n {pkgs}",
"macos": "brew install {pkgs}",
}
def _install_cmd_for_target(os_id: str, backend: str, missing: list[str]) -> str:
"""Build a single OS+backend-aware install command for the missing prereqs."""
if not os_id or os_id not in _PKG_MGR:
return ""
pkgs: list[str] = []
seen: set[str] = set()
for m in missing:
for p in _PKG_NAMES.get(m, {}).get(os_id, []):
if p not in seen:
pkgs.append(p); seen.add(p)
# Add backend-specific extras only when the build would actually
# consume them (a CUDA toolkit isn't useful on a Vulkan box).
backend = (backend or "").lower()
for p in _BACKEND_EXTRAS.get(backend, {}).get(os_id, []):
if p not in seen:
pkgs.append(p); seen.add(p)
if not pkgs:
return ""
return _PKG_MGR[os_id].format(pkgs=" ".join(pkgs))
@router.get("/api/cookbook/packages")
async def list_packages(
request: Request,
host: str | None = None,
ssh_port: str | None = None,
venv: str | None = None,
backend: str | None = None,
):
"""Check which optional packages are installed.
@@ -1015,6 +1091,12 @@ def setup_shell_routes() -> APIRouter:
"kind": "system",
"install_hint": "Install Docker on the selected server and allow this user to run docker.",
},
# Note: cmake / gcc / git are not separate dependency rows —
# they're declared as `system_prereqs` on llama_cpp (and any
# other engine that compiles from source) so they appear as
# an inline status note on that engine's row instead of
# cluttering the panel with raw OS package names that aren't
# meaningful product-level dependencies on their own.
# ── LLM ── installs on GPU servers for model serving/downloading
{
"name": "hf_transfer",
@@ -1026,9 +1108,16 @@ def setup_shell_routes() -> APIRouter:
{
"name": "llama_cpp",
"pip": "llama-cpp-python[server]",
"desc": "Serve GGUF models via llama.cpp",
"desc": "Great for single-GPU or CPU inference with GGUF models",
"category": "LLM",
"target": "remote",
# Build-toolchain prereqs. Cookbook's launch bootstrap
# compiles llama-server from source when no prebuilt
# binary is present; without these the build aborts
# with `cmake: command not found`. Surfaced inline on
# this row so the user doesn't have to chase three
# separate OS-package rows.
"system_prereqs": ["cmake", "g++", "git"],
},
{
"name": "sglang",
@@ -1040,7 +1129,7 @@ def setup_shell_routes() -> APIRouter:
{
"name": "vllm",
"pip": "vllm",
"desc": "High-throughput LLM serving engine",
"desc": "Great for high-throughput multi-GPU inference",
"category": "LLM",
"target": "remote",
},
@@ -1103,6 +1192,7 @@ def setup_shell_routes() -> APIRouter:
# venv over SSH so a remote `pip install` actually reflects here.
remote_status: dict = {}
remote_details: dict = {}
remote_probe_error = ""
remote_names = [
p["name"]
for p in packages
@@ -1141,16 +1231,56 @@ def setup_shell_routes() -> APIRouter:
break
except ValueError as e:
raise HTTPException(400, str(e))
except Exception:
except Exception as e:
remote_status = {}
if host and remote_system_names:
remote_probe_error = f"SSH package probe failed: {str(e)[:160]}"
if "llama_cpp" in remote_names:
try:
inner = (
'export PATH="$HOME/.local/bin:$HOME/bin:'
'$HOME/llama.cpp/build/bin:$HOME/llama.cpp/build-vulkan/bin:$PATH"; '
"command -v llama-server 2>/dev/null || true"
)
argv = _ssh_base_argv(host, ssh_port) + [inner]
proc = await asyncio.create_subprocess_exec(
*argv,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
out, _err = await asyncio.wait_for(proc.communicate(), timeout=8)
llama_server_path = out.decode("utf-8", errors="replace").strip().splitlines()
llama_server_path = llama_server_path[-1].strip() if llama_server_path else ""
if llama_server_path:
remote_status["llama_cpp"] = True
probe = remote_details.setdefault("llama_cpp", {})
if isinstance(probe, dict):
probe.setdefault("binaries", {})["llama-server"] = llama_server_path
except Exception as e:
if not remote_probe_error:
remote_probe_error = f"SSH llama-server probe failed: {str(e)[:160]}"
pass
# Union of system_names + every package's system_prereqs. Probing
# the prereqs alongside the main system deps in a single SSH call
# avoids a second round-trip per Cookbook → Dependencies refresh.
prereq_names: set[str] = set()
for p in packages:
for pr in p.get("system_prereqs") or []:
prereq_names.add(str(pr))
all_system_names = list(set(remote_system_names) | prereq_names)
# Detect the target's OS family + read /etc/os-release in the same
# SSH round-trip as the prereq probe — used downstream to render a
# single OS-specific install command per row instead of dumping
# every distro's syntax onto the user.
target_os_id: str = ""
if host and all_system_names:
try:
checks = []
for name in remote_system_names:
for name in all_system_names:
qn = shlex.quote(name)
checks.append(
f"if command -v {qn} >/dev/null 2>&1; then echo {qn}=1; else echo {qn}=0; fi"
)
checks.append("echo '---OSREL---'; cat /etc/os-release 2>/dev/null || true")
inner = " ; ".join(checks)
argv = _ssh_base_argv(host, ssh_port) + [inner]
proc = await asyncio.create_subprocess_exec(
@@ -1160,20 +1290,45 @@ def setup_shell_routes() -> APIRouter:
)
out, _err = await asyncio.wait_for(proc.communicate(), timeout=12)
txt = out.decode("utf-8", errors="replace").strip()
_section, _osrel_lines = "probe", []
for line in txt.splitlines():
if line.strip() == "---OSREL---":
_section = "osrel"; continue
if _section == "osrel":
_osrel_lines.append(line)
continue
name, sep, value = line.strip().partition("=")
if sep and name in remote_system_names:
if sep and name in all_system_names:
remote_status[name] = value == "1"
target_os_id = _os_id_from_release("\n".join(_osrel_lines))
except ValueError as e:
raise HTTPException(400, str(e))
except Exception:
except Exception as e:
if not remote_probe_error:
remote_probe_error = f"SSH system probe failed: {str(e)[:160]}"
pass
elif not host:
# Local target — probe in-process so the inline install command
# still appears in the dep panel when the cookbook container
# itself is the selected server.
try:
with open("/etc/os-release", encoding="utf-8") as f:
target_os_id = _os_id_from_release(f.read())
except Exception:
target_os_id = ""
if sys.platform == "darwin":
target_os_id = "macos"
for pkg in packages:
on_remote = bool(host and pkg.get("target") == "remote")
probe = None
if on_remote:
pkg["installed"] = bool(remote_status.get(pkg["name"], False))
if remote_probe_error and pkg["name"] not in remote_status:
pkg["installed"] = None
pkg["probe_error"] = remote_probe_error
pkg["status_note"] = remote_probe_error
else:
pkg["installed"] = bool(remote_status.get(pkg["name"], False))
probe = remote_details.get(pkg["name"])
if isinstance(probe, dict):
pkg["details"] = probe
@@ -1222,13 +1377,116 @@ def setup_shell_routes() -> APIRouter:
pkg["installed"] = False
except importlib_metadata.PackageNotFoundError:
pkg["installed"] = False
except Exception:
except (Exception, SystemExit):
# Installed but crashes on import — e.g. a CUDA build of
# llama-cpp-python raising FileNotFoundError when the CUDA
# toolkit dir is absent. One broken optional package must not
# 500 the entire packages panel; report it as not usable.
# toolkit dir is absent, or rembg calling sys.exit(1) when no
# onnxruntime backend can be loaded. SystemExit is a
# BaseException, not Exception, so without catching it here a
# single sys.exit-on-import package escapes and takes down the
# whole packages panel / worker (the panel hangs forever). One
# broken optional package must not 500 — or hang — the entire
# panel; report it as not usable.
pkg["installed"] = False
# llama_cpp partial-state probe: when the package is installed
# but the wheel was built CPU-only AND the target has NVIDIA
# hardware, mark the row as partial (yellow/orange) with a
# one-click upgrade to the CUDA wheel. Without this the row
# reads "ready" green while inference runs at 3 tok/s on GPU
# silicon — actively misleading.
if pkg["name"] == "llama_cpp" and pkg.get("installed"):
_native_llama_server = bool(
isinstance(probe, dict)
and isinstance(probe.get("binaries"), dict)
and probe["binaries"].get("llama-server")
)
_gpu_capable = False
_has_nvidia_target = False
if _native_llama_server:
# Native llama-server is the launcher path Cookbook now
# prefers. Do not mark this as a CPU-only Python wheel just
# because llama-cpp-python is absent from the selected venv.
_gpu_capable = True
elif on_remote and host:
try:
# Activate the configured venv FIRST so the probe
# runs against the same python the launch script
# would activate. Without this prefix, bare
# `python3` was checked — which can disagree with
# the venv's wheel (e.g. user-site has CUDA wheel
# but venv has CPU-only), and the dep panel then
# showed "ready" green while every launch fell to
# CPU.
_vp = _venv_activate_prefix(venv)
probe = (
f'{_vp}python3 -c "import llama_cpp; import sys; '
'sys.exit(0 if llama_cpp.llama_supports_gpu_offload() else 1)" '
'&& echo llama_cpp_gpu=1 || echo llama_cpp_gpu=0; '
'command -v nvidia-smi >/dev/null 2>&1 '
'&& nvidia-smi -L 2>/dev/null | grep -q "GPU " '
'&& echo nvidia=1 || echo nvidia=0'
)
argv = _ssh_base_argv(host, ssh_port) + [probe]
proc = await asyncio.create_subprocess_exec(
*argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
)
out, _ = await asyncio.wait_for(proc.communicate(), timeout=8)
txt = out.decode("utf-8", errors="replace")
if "llama_cpp_gpu=1" in txt:
_gpu_capable = True
if "nvidia=1" in txt:
_has_nvidia_target = True
except Exception:
pass
else:
try:
import llama_cpp as _lcp # type: ignore
_gpu_capable = bool(_lcp.llama_supports_gpu_offload())
except Exception:
_gpu_capable = False
_has_nvidia_target = shutil.which("nvidia-smi") is not None
if (not _gpu_capable) and _has_nvidia_target:
pkg["partial"] = True
pkg["partial_reason"] = "Installed but CPU-only wheel — GPU detected on this target. Upgrade to a CUDA wheel for ~10× faster inference."
pkg["partial_action"] = "reinstall_llama_cpp_cuda"
# Attach per-package system_prereqs status. We probed each
# prereq name above; surface "Missing build deps: …" ONLY
# when the package itself is not installed — if the package
# works (e.g. llama-cpp-python already imports cleanly), the
# build toolchain is irrelevant and surfacing it as a red
# flag confuses users ("ready" + "missing" on the same row).
_prereqs = list(pkg.get("system_prereqs") or [])
if _prereqs:
if on_remote:
_pr_present = {n: bool(remote_status.get(n)) for n in _prereqs}
else:
_pr_present = {n: shutil.which(n) is not None for n in _prereqs}
pkg["system_prereqs_status"] = _pr_present
_missing = [n for n, ok in _pr_present.items() if not ok]
# Suppress the "missing build deps" hint when the package
# itself is installed — build deps are only relevant if
# the user would need to recompile from source.
if pkg.get("installed"):
_missing = []
if _missing:
# Build a target-specific install command from the
# (os_family, backend) matrix when we know both. Fall
# back to the multi-distro hint only when the target's
# OS can't be classified (e.g. ssh probe failed).
_resolved_os = target_os_id or "debian" # safest default
_cmd = _install_cmd_for_target(_resolved_os, backend or "", _missing)
if _cmd and target_os_id:
_hint = "Missing build deps for this target: " + ", ".join(_missing)
pkg["install_cmd_for_target"] = _cmd
pkg["install_cmd_os"] = target_os_id
pkg["install_cmd_backend"] = (backend or "").lower()
else:
_hint = "Missing build deps: " + ", ".join(_missing) + ". Install via apt: cmake build-essential git / pacman: cmake base-devel git / dnf: cmake gcc-c++ make git / brew: cmake git."
_existing_note = pkg.get("status_note") or ""
pkg["status_note"] = (_existing_note + " " + _hint) if _existing_note else _hint
pkg["build_deps_missing"] = _missing
if pkg.get("installed"):
update_status = _package_pip_update_status(pkg, probe)
pkg["pip_update_available"] = update_status.available
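The `except (Exception, SystemExit)` clause earlier in this hunk matters because `SystemExit` derives from `BaseException`, not `Exception`. A small sketch of the difference follows. The `probe_import` helper and the two stand-in importers are hypothetical and only simulate a package that exits or raises at import time.

```python
from typing import Callable


def probe_import(importer: Callable[[], None]) -> bool:
    """Return True if the import-like callable completes, False if it fails."""
    try:
        importer()
        return True
    except (Exception, SystemExit):
        # SystemExit must be listed explicitly; `except Exception` alone
        # would let it propagate and end the worker.
        return False


def exits_on_import() -> None:
    # Stand-in for a package that calls sys.exit(1) during import.
    raise SystemExit(1)


def raises_on_import() -> None:
    # Stand-in for a package that raises an ordinary error during import.
    raise FileNotFoundError("toolkit directory missing")


exit_result = probe_import(exits_on_import)
error_result = probe_import(raises_on_import)
ok_result = probe_import(lambda: None)
system_exit_is_exception = issubclass(SystemExit, Exception)
```

`KeyboardInterrupt` is also a `BaseException` and is deliberately left uncaught in this sketch so that Ctrl-C still interrupts the process.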
@@ -1288,6 +1546,102 @@ def setup_shell_routes() -> APIRouter:
return {"ok": True, "output": stdout.decode()[-200:]}
return {"ok": False, "error": stderr.decode()[-300:]}
@router.post("/api/cookbook/install-system-deps")
async def install_system_deps(request: Request):
"""Install OS-level system packages (cmake/build-essential/git/tmux)
on a remote target or in the local container. Admin only.
Bounded by a per-package allowlist: anything outside the catalog
is rejected so the route can't be coerced into installing arbitrary
OS packages. Uses `sudo -n` (passwordless) so the call returns a
clear "needs sudo password" error instead of hanging when interactive
sudo is required.
"""
_require_admin(request)
body = await request.json()
raw = body.get("packages") or []
host = (body.get("remote_host") or "").strip()
ssh_port = body.get("ssh_port")
# Names users can request — must match canonical names used in the
# deps catalog's `system_prereqs` field and on the System rows.
ALLOWED = {"cmake", "build-essential", "g++", "gcc", "git", "tmux", "make"}
pkgs = [str(p).strip() for p in raw if str(p).strip() in ALLOWED]
if not pkgs:
return {"ok": False, "error": "no installable packages requested (allowlist: " + ", ".join(sorted(ALLOWED)) + ")"}
# Re-map to the right package name per OS. apt/dpkg use the names
# as-is; pacman has base-devel for build-essential, etc.
def _apt(names): return list(names)
def _pacman(names):
return ["base-devel" if n == "build-essential" else n for n in names]
def _dnf(names):
out = []
for n in names:
if n == "build-essential": out += ["gcc", "gcc-c++", "make"]
elif n == "g++": out += ["gcc-c++"]
else: out.append(n)
return out
def _brew(names):
return [n for n in names if n not in ("build-essential", "g++", "gcc", "make")]
# Build a single shell snippet that detects the package manager and
# runs the right install. Non-interactive sudo (-n) only — if sudo
# asks for a password the script reports it instead of hanging.
apt_pkgs = " ".join(shlex.quote(p) for p in _apt(pkgs))
pac_pkgs = " ".join(shlex.quote(p) for p in _pacman(pkgs))
dnf_pkgs = " ".join(shlex.quote(p) for p in _dnf(pkgs))
brew_pkgs = " ".join(shlex.quote(p) for p in _brew(pkgs))
# Error messages go to stderr (>&2) so the route's error field
# gets populated. Without the redirect, `echo "ERROR…"` on stdout
# left stderr empty and the frontend toast fell through to a
# bare "HTTP 200" instead of surfacing the real reason.
script = (
'set -e; '
'if ! sudo -n true 2>/dev/null; then '
' echo "ERROR: passwordless sudo unavailable on this target. Run once: sudo apt install -y ' + " ".join(pkgs) + ' (or your distro equivalent: pacman -S, dnf install, brew install). After that, Cookbook can install the rest." >&2; exit 2; fi; '
'if command -v apt-get >/dev/null 2>&1; then '
f' sudo -n env DEBIAN_FRONTEND=noninteractive apt-get update -qq && sudo -n env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends {apt_pkgs}; '
'elif command -v pacman >/dev/null 2>&1; then '
f' sudo -n pacman -Sy --needed --noconfirm {pac_pkgs}; '
'elif command -v dnf >/dev/null 2>&1; then '
f' sudo -n dnf install -y {dnf_pkgs}; '
'elif command -v brew >/dev/null 2>&1; then '
f' brew install {brew_pkgs}; '
'else '
' echo "ERROR: no supported package manager (apt/pacman/dnf/brew) on this target." >&2; exit 3; fi'
)
try:
if host:
argv = _ssh_base_argv(host, ssh_port) + [script]
else:
argv = ["bash", "-lc", script]
except ValueError as e:
raise HTTPException(400, str(e))
try:
proc = await asyncio.create_subprocess_exec(
*argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
)
out, err = await asyncio.wait_for(proc.communicate(), timeout=180)
except asyncio.TimeoutError:
return {"ok": False, "error": "Install timed out after 180s"}
ok = (proc.returncode == 0)
# Combine stderr + (last lines of stdout) into a single error
# blob when ok=False — some package managers print useful failure
# context to stdout, and a script that exits via `echo ...; exit N`
# without `>&2` would otherwise hand back an empty error string
# and force the frontend to show a bare "HTTP 200".
err_txt = err.decode("utf-8", errors="replace").strip()
out_txt = out.decode("utf-8", errors="replace").strip()
if not ok:
tail_out = out_txt[-500:] if out_txt else ""
combined = err_txt or tail_out or f"exit code {proc.returncode}"
else:
combined = None
return {
"ok": ok,
"exit_code": proc.returncode,
"output": out_txt[-1000:],
"error": combined,
}
@router.post("/api/cookbook/rebuild-engine")
async def rebuild_engine(request: Request):
"""Clear the cached llama.cpp build so the next serve recompiles.
@@ -1308,7 +1662,8 @@ def setup_shell_routes() -> APIRouter:
return {"ok": False, "error": f"Unsupported engine: {engine}"}
host = str(body.get("remote_host") or "").strip()
ssh_port = body.get("ssh_port")
cmd = _llama_cpp_rebuild_cmd()
update_source = bool(body.get("update_source"))
cmd = _llama_cpp_rebuild_cmd(update_source=update_source)
try:
argv = (
(_ssh_base_argv(host, ssh_port) + [cmd])
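The combined SSH probe in `list_packages` returns the `name=0/1` lines and the contents of `/etc/os-release` in one output stream, separated by the `---OSREL---` sentinel. A minimal sketch of splitting such output follows. The function name `split_probe_output` and the sample text are assumptions for illustration, not the route's actual helper.

```python
def split_probe_output(
    text: str,
    known_names: set[str],
    sentinel: str = "---OSREL---",
) -> tuple[dict[str, bool], str]:
    """Split probe output into a presence map and the raw os-release text."""
    status: dict[str, bool] = {}
    osrel_lines: list[str] = []
    in_osrel = False
    for line in text.splitlines():
        if line.strip() == sentinel:
            in_osrel = True
            continue
        if in_osrel:
            # Everything after the sentinel belongs to os-release, even
            # though those lines also look like KEY=VALUE pairs.
            osrel_lines.append(line)
            continue
        name, sep, value = line.strip().partition("=")
        if sep and name in known_names:
            status[name] = value == "1"
    return status, "\n".join(osrel_lines)


# Hypothetical probe output from a Debian-family target.
sample = "cmake=1\ng++=0\ngit=1\n---OSREL---\nID=ubuntu\nID_LIKE=debian\n"
status, osrel = split_probe_output(sample, {"cmake", "g++", "git"})
```

Tracking the section explicitly keeps `ID=ubuntu` from being misread as a probe result, which is why the sentinel check comes before the `partition("=")` step.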
+5 -1
@@ -691,8 +691,12 @@ async def _run_skill_test_once(md: str, task: str, url, model, headers, owner) -
{"role": "user", "content": task},
]
try:
# max_tokens explicitly set: passing 0 lets some upstreams (Ollama,
# OpenAI-compat) generate an empty completion, which manifested as
# the skill test returning nothing while chat (which carries its
# preset's max_tokens) worked. 4096 matches the chat default.
async for chunk in stream_agent_loop(url, model, messages, headers=headers,
temperature=0.3, max_tokens=0, max_rounds=8, owner=owner):
temperature=0.3, max_tokens=4096, max_rounds=8, owner=owner):
if not chunk.startswith("data: ") or chunk.strip() == "data: [DONE]":
continue
try:
+9 -1
@@ -11,6 +11,7 @@ from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel
from core.database import SessionLocal, ScheduledTask, TaskRun
from core.middleware import INTERNAL_TOOL_USER
from core.constants import internal_api_base
from src.auth_helpers import get_current_user
from src.constants import DATA_DIR, EMAIL_URGENCY_CACHE_DIR
@@ -151,6 +152,7 @@ class TaskCreate(BaseModel):
endpoint_url: Optional[str] = None
then_task_id: Optional[str] = None # chain: run this task after success
notifications_enabled: Optional[bool] = None # None lets action-specific defaults apply
character_id: Optional[str] = None # built-in persona id (PERSONAS) — biases output voice
class TaskUpdate(BaseModel):
@@ -171,6 +173,7 @@ class TaskUpdate(BaseModel):
endpoint_url: Optional[str] = None
then_task_id: Optional[str] = None
notifications_enabled: Optional[bool] = None
character_id: Optional[str] = None
def _display_task_name(t: ScheduledTask) -> str:
@@ -203,6 +206,7 @@ def _task_to_dict(t: ScheduledTask, include_last_run_result: bool = False) -> di
"output_target": t.output_target,
"session_id": t.session_id,
"crew_member_id": getattr(t, "crew_member_id", None),
"character_id": getattr(t, "character_id", None),
"model": t.model,
"endpoint_url": t.endpoint_url,
"run_count": t.run_count or 0,
@@ -424,7 +428,7 @@ def setup_task_routes(task_scheduler) -> APIRouter:
# In-process tool-loopback marker — AuthMiddleware validated
# the internal token + loopback client before stamping this,
# so treat as admin-equivalent.
if user == "internal-tool":
if user == INTERNAL_TOOL_USER:
return True
try:
from core.auth import AuthManager
@@ -552,6 +556,7 @@ def setup_task_routes(task_scheduler) -> APIRouter:
then_task_id=then_task_id,
webhook_token=webhook_token,
notifications_enabled=notifications_enabled,
character_id=(req.character_id or None),
)
db.add(task)
db.commit()
@@ -705,6 +710,9 @@ def setup_task_routes(task_scheduler) -> APIRouter:
task.then_task_id = _validate_then_task_id(db, req.then_task_id, user, current_task_id=task.id)
if req.notifications_enabled is not None:
task.notifications_enabled = bool(req.notifications_enabled)
if req.character_id is not None:
# Empty string clears the persona; non-empty stores the id.
task.character_id = req.character_id or None
if req.cron_expression is not None:
if req.cron_expression:
try:
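The `character_id` handling in the update route is a three-state field: `None` means "not sent, leave the stored value alone", an empty string clears the persona, and any other string replaces it. A minimal sketch of that rule, assuming a stand-in `Task` dataclass and a hypothetical helper name `apply_character_id` (neither is part of the route code shown here):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Stand-in for the ScheduledTask row; only the field under discussion."""
    character_id: Optional[str] = None


def apply_character_id(task: Task, requested: Optional[str]) -> None:
    """Apply the update rule used in the diff.

    None  -> field was omitted from the request, keep the stored value
    ""    -> clear the persona (stored as None)
    other -> store the new persona id
    """
    if requested is not None:
        task.character_id = requested or None


task = Task(character_id="pirate")
apply_character_id(task, None)     # omitted: value is kept
apply_character_id(task, "")       # empty string: persona cleared
```

The create path differs slightly: it stores `req.character_id or None` unconditionally, since there is no previous value to preserve.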
+80 -7
@@ -3,11 +3,16 @@ import os
import time
import json
import asyncio
import shutil
import uuid
from pathlib import Path
from fastapi import APIRouter, Request, File, UploadFile, HTTPException
from typing import List
import logging
from core.middleware import require_admin
from src.auth_helpers import get_current_user
from core.database import SessionLocal, GalleryImage
from src.auth_helpers import effective_user
from src.constants import GENERATED_IMAGES_DIR
from src.upload_handler import count_recent_uploads
logger = logging.getLogger(__name__)
@@ -50,6 +55,69 @@ def setup_upload_routes(upload_handler):
raise HTTPException(404, "File not found")
raise HTTPException(404, "File not found")
def _promote_chat_image_to_gallery(meta: dict, owner: str | None) -> str | None:
"""Make chat-uploaded images visible in Gallery without changing chat storage."""
is_image_file = getattr(upload_handler, "is_image_file", None)
if not callable(is_image_file):
return None
if not is_image_file(meta.get("name", ""), meta.get("mime", "")):
return None
source_path = meta.get("path")
if not source_path or not os.path.isfile(source_path):
return None
db = SessionLocal()
try:
file_hash = meta.get("hash")
if file_hash:
q = db.query(GalleryImage).filter(
GalleryImage.file_hash == file_hash,
GalleryImage.is_active == True, # noqa: E712
)
if owner:
q = q.filter(GalleryImage.owner == owner)
existing = q.first()
if existing:
return existing.id
image_dir = Path(GENERATED_IMAGES_DIR)
image_dir.mkdir(parents=True, exist_ok=True)
ext = Path(meta.get("name") or source_path).suffix.lower()
if ext not in {".png", ".jpg", ".jpeg", ".webp", ".gif"}:
mime_ext = {
"image/png": ".png",
"image/jpeg": ".jpg",
"image/jpg": ".jpg",
"image/webp": ".webp",
"image/gif": ".gif",
}.get(meta.get("mime", ""))
ext = mime_ext or ".png"
filename = f"{uuid.uuid4().hex[:12]}{ext}"
dest_path = image_dir / filename
shutil.copy2(source_path, dest_path)
image_id = str(uuid.uuid4())
db.add(GalleryImage(
id=image_id,
filename=filename,
prompt=meta.get("name") or "Chat upload",
model="chat-upload",
owner=owner,
file_hash=file_hash,
width=meta.get("width"),
height=meta.get("height"),
file_size=meta.get("size"),
))
db.commit()
return image_id
except Exception as e:
db.rollback()
logger.warning("Failed to add chat image upload to gallery: %s", e)
return None
finally:
db.close()
@router.post("")
async def api_upload(request: Request, files: List[UploadFile] = File(...)):
@@ -78,8 +146,10 @@ def setup_upload_routes(upload_handler):
for u in files:
try:
meta = upload_handler.save_upload(u, client_ip, owner=get_current_user(request))
out.append({
owner = effective_user(request)
meta = upload_handler.save_upload(u, client_ip, owner=owner)
gallery_id = _promote_chat_image_to_gallery(meta, owner)
item = {
"id": meta["id"],
"name": meta["name"],
"mime": meta["mime"],
@@ -89,7 +159,10 @@ def setup_upload_routes(upload_handler):
"width": meta.get("width"),
"height": meta.get("height"),
"is_duplicate": meta.get("is_duplicate", False)
})
}
if gallery_id:
item["gallery_id"] = gallery_id
out.append(item)
except HTTPException:
raise
except Exception as e:
@@ -138,7 +211,7 @@ def setup_upload_routes(upload_handler):
original_name = info.get("name", file_id)
auth_mgr = getattr(request.app.state, "auth_manager", None)
auth_configured = bool(auth_mgr and auth_mgr.is_configured)
current_user = get_current_user(request)
current_user = effective_user(request)
file_owner = info.get("owner") if info else None
if auth_configured:
if not current_user:
@@ -204,7 +277,7 @@ def setup_upload_routes(upload_handler):
info = _load_upload_info(file_id)
auth_mgr = getattr(request.app.state, "auth_manager", None)
auth_configured = bool(auth_mgr and auth_mgr.is_configured)
current_user = get_current_user(request)
current_user = effective_user(request)
file_owner = info.get("owner") if info else None
if auth_configured:
if not current_user:
@@ -247,7 +320,7 @@ def setup_upload_routes(upload_handler):
raise HTTPException(404, "File not found")
auth_mgr = getattr(request.app.state, "auth_manager", None)
auth_configured = bool(auth_mgr and auth_mgr.is_configured)
current_user = get_current_user(request)
current_user = effective_user(request)
file_owner = info.get("owner")
if auth_configured:
if not current_user:
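The extension choice inside `_promote_chat_image_to_gallery` trusts the file name first, then the MIME type, and finally defaults to `.png`. The sketch below isolates that fallback chain so it can be checked on its own; the function name `pick_extension` and the module-level constant names are illustrative assumptions, while the allowed extensions and MIME map are copied from the diff.

```python
from pathlib import Path

ALLOWED_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/jpg": ".jpg",
    "image/webp": ".webp",
    "image/gif": ".gif",
}


def pick_extension(name: str, source_path: str, mime: str) -> str:
    """Return the gallery file extension for an uploaded image."""
    # 1. Prefer the suffix of the original name (or the stored path).
    ext = Path(name or source_path).suffix.lower()
    if ext in ALLOWED_EXTS:
        return ext
    # 2. Otherwise map the MIME type; 3. fall back to .png.
    return MIME_TO_EXT.get(mime or "") or ".png"
```

Note that the final `.png` default only names the copied file; it does not convert the image data.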
+2 -3
@@ -1,6 +1,5 @@
"""Webhook, API Token, and sync chat routes."""
import asyncio
import uuid
import logging
from typing import Optional
@@ -385,10 +384,10 @@ def setup_webhook_routes(
sess.add_message(ChatMessage("assistant", reply))
session_manager.save_sessions()
asyncio.create_task(webhook_manager.fire("chat.completed", {
webhook_manager.fire_and_forget("chat.completed", {
"session_id": session_id, "model": sess.model,
"user_message": message[:2000], "response": reply[:2000],
}))
})
return {"response": reply, "session_id": session_id, "model": sess.model}
+133
@@ -0,0 +1,133 @@
#!/usr/bin/env python3
"""Backfill release_date on entries in services/hwfit/data/hf_models.json.
Why: the `newest` sort in the cookbook ranks rows by release_date. Anything
missing a date sorts to the bottom. This script pulls `created_at` from the
HuggingFace API for each catalog entry without one (or all entries when
--refresh is passed) and writes the catalog back.
Usage:
python scripts/backfill_model_release_dates.py # missing only
python scripts/backfill_model_release_dates.py --refresh # all entries
python scripts/backfill_model_release_dates.py --limit 50 # cap requests
python scripts/backfill_model_release_dates.py --dry-run # show, don't write
Auth: set HF_TOKEN env var (or huggingface-cli login) to access gated repos.
"""
import argparse
import json
import os
import sys
import time
from datetime import datetime
from pathlib import Path
try:
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError
except ImportError:
print("Install huggingface_hub: pip install huggingface_hub", file=sys.stderr)
sys.exit(1)
CATALOG_PATH = Path(__file__).resolve().parent.parent / "services" / "hwfit" / "data" / "hf_models.json"
def fetch_release_date(api: HfApi, repo_id: str) -> str | None:
"""Return YYYY-MM-DD release date, or None on miss / error."""
try:
info = api.model_info(repo_id, files_metadata=False)
except HfHubHTTPError as e:
# 401 = gated/private, 404 = renamed/deleted. Either way, no date.
status = getattr(getattr(e, "response", None), "status_code", None)
print(f" {repo_id}: HTTP {status or '?'}", file=sys.stderr)
return None
except Exception as e:
print(f" {repo_id}: {type(e).__name__}: {e}", file=sys.stderr)
return None
created = getattr(info, "created_at", None)
if not created:
return None
return created.strftime("%Y-%m-%d")
def main():
p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
p.add_argument("--refresh", action="store_true", help="Overwrite existing release_date too (default: only fill missing).")
p.add_argument("--limit", type=int, default=0, help="Stop after N API calls (0 = no limit).")
p.add_argument("--dry-run", action="store_true", help="Don't write back; just report.")
p.add_argument("--sleep", type=float, default=0.05, help="Seconds to sleep between requests (default 0.05).")
args = p.parse_args()
if not CATALOG_PATH.exists():
print(f"Catalog not found: {CATALOG_PATH}", file=sys.stderr)
sys.exit(2)
with CATALOG_PATH.open(encoding="utf-8") as f:
catalog = json.load(f)
candidates = []
for i, m in enumerate(catalog):
name = m.get("name")
if not name:
continue
existing = (m.get("release_date") or "").strip()
if existing and not args.refresh:
continue
candidates.append(i)
if args.limit:
candidates = candidates[: args.limit]
print(f"Catalog: {CATALOG_PATH}")
print(f"Total entries: {len(catalog)}")
print(f"Targets ({'refresh all' if args.refresh else 'missing only'}{'' if not args.limit else f', capped at {args.limit}'}): {len(candidates)}")
if not candidates:
print("Nothing to do.")
return
api = HfApi(token=os.environ.get("HF_TOKEN") or None)
updated = 0
skipped = 0
started = time.time()
for n, idx in enumerate(candidates, start=1):
entry = catalog[idx]
name = entry["name"]
old = (entry.get("release_date") or "").strip()
new = fetch_release_date(api, name)
if new is None:
skipped += 1
tag = "skip"
elif new == old:
tag = "unchanged"
else:
entry["release_date"] = new
updated += 1
tag = f"set {new}" + (f" (was {old})" if old else "")
print(f"[{n}/{len(candidates)}] {name} {tag}")
if args.sleep:
time.sleep(args.sleep)
elapsed = time.time() - started
print()
print(f"Done in {elapsed:.1f}s — {updated} updated, {skipped} skipped (HF unavailable / gated / missing date).")
if args.dry_run:
print("Dry run — no write.")
return
if updated:
# Atomic write: tmp file in the same dir, then rename. Keeps the
# catalog usable even if the process dies mid-write.
tmp = CATALOG_PATH.with_suffix(".json.tmp")
with tmp.open("w", encoding="utf-8") as f:
json.dump(catalog, f, indent=1, ensure_ascii=False)
f.write("\n")
tmp.replace(CATALOG_PATH)
print(f"Wrote {CATALOG_PATH}")
else:
print("No changes to write.")
if __name__ == "__main__":
main()
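Both catalog scripts write through a temporary file and then rename it over the catalog. A minimal, self-contained sketch of that pattern follows; `write_json_atomic` and `demo` are illustrative names, and the sketch uses `os.replace` where the scripts call `Path.replace`, which behaves the same way.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload) -> None:
    """Write payload to a sibling temp file, then rename it over path."""
    # Same directory as the target, so the rename stays on one filesystem.
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(payload, f, indent=1, ensure_ascii=False)
        f.write("\n")
    # Readers see either the old file or the complete new one.
    os.replace(tmp, path)


def demo() -> list:
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "catalog.json"
        write_json_atomic(target, [{"name": "org/model", "release_date": "2026-04-01"}])
        leftovers = sorted(p.name for p in Path(d).iterdir())
        return [json.loads(target.read_text(encoding="utf-8")), leftovers]
```

This sketch does not call `fsync`, so it protects against a crash of the process mid-write rather than against power loss.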
+341
@@ -0,0 +1,341 @@
#!/usr/bin/env python3
"""Import models from the upstream vllm-project/recipes catalog into our
local hf_models.json. Two modes:
--update-existing Stamp min_vllm_version + vllm_recipe=True on rows we
already carry. Cheap, no HF API calls.
--add-missing Create new catalog rows for every recipe model we
don't carry. Hits the HF API for created_at + downloads
(~1 req per missing model, paced).
Both modes write atomically (tmp + rename) so a crashed run leaves the
catalog intact. With no mode flags both modes run, but prefer passing them
explicitly.
Usage:
python scripts/import_from_vllm_recipes.py --update-existing
python scripts/import_from_vllm_recipes.py --add-missing
python scripts/import_from_vllm_recipes.py --dry-run
python scripts/import_from_vllm_recipes.py --limit 10
Auth: set HF_TOKEN to access gated repos when --add-missing.
"""
import argparse
import json
import os
import re
import sys
import time
from datetime import datetime
from pathlib import Path
try:
import httpx
import yaml
except ImportError:
print("pip install httpx PyYAML", file=sys.stderr)
sys.exit(1)
try:
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError
except ImportError:
HfApi = None
HfHubHTTPError = Exception
CATALOG_PATH = Path(__file__).resolve().parent.parent / "services" / "hwfit" / "data" / "hf_models.json"
RECIPES_TREE_URL = (
"https://api.github.com/repos/vllm-project/recipes/git/trees/main?recursive=1"
)
RECIPE_RAW_URL = (
"https://raw.githubusercontent.com/vllm-project/recipes/main/models/{repo}.yaml"
)
# Map recipe `precision` to the closest catalog `quantization` label that
# fit.py / models.py already understand.
_PRECISION_TO_QUANT = {
"fp8": "FP8",
"nvfp4": "NVFP4",
"mxfp4": "MXFP4",
"bf16": "BF16",
"fp16": "F16",
"f16": "F16",
"fp4": "FP4",
"int8": "INT8",
"int4": "INT4",
"awq-4bit": "AWQ-4bit",
"awq-8bit": "AWQ-8bit",
}
# Architecture name → use_case fallback. fit.py weights use_case for filtering;
# missing field defaults to a generic bucket.
_ARCH_USE_CASE = {
"moe": "General-purpose reasoning, long-context",
"llama": "General-purpose chat",
"qwen2": "General-purpose chat",
"qwen3": "General-purpose reasoning",
"deepseek_v3_moe": "General-purpose reasoning, long-context",
"deepseek_v4_moe": "General-purpose reasoning, long-context",
}
def _parse_param_count(s) -> int:
"""'230B' / '8.6B' / '4.2T' → integer parameter count."""
if s is None:
return 0
s = str(s).strip().replace(",", "")
m = re.match(r"^([\d.]+)\s*([KMBT]?)$", s, re.I)
if not m:
return 0
num = float(m.group(1))
unit = (m.group(2) or "").upper()
mult = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12, "": 1.0}[unit]
return int(num * mult)
def _capabilities_for(arch: str, hardware: dict, ctx_len: int, has_reasoning: bool) -> list[str]:
caps = []
if "moe" in (arch or "").lower():
caps.append("moe")
if has_reasoning:
caps.append("reasoning")
if ctx_len and ctx_len >= 100_000:
caps.append("long_context")
if any(hw in (hardware or {}) for hw in ("mi300x", "mi325x", "mi350x", "mi355x")):
caps.append("amd_supported")
return caps
def _fetch_manifest(client: httpx.Client) -> set[str]:
r = client.get(RECIPES_TREE_URL, headers={"Accept": "application/vnd.github+json"}, timeout=15)
r.raise_for_status()
tree = (r.json() or {}).get("tree") or []
out: set[str] = set()
for e in tree:
path = (e or {}).get("path") or ""
if path.startswith("models/") and path.endswith(".yaml"):
body = path[len("models/"):-len(".yaml")]
if "/" in body:
out.add(body)
return out
def _fetch_recipe(client: httpx.Client, repo: str) -> dict | None:
url = RECIPE_RAW_URL.format(repo=repo)
try:
r = client.get(url, timeout=10)
if r.status_code != 200:
return None
return yaml.safe_load(r.text) or {}
except Exception:
return None
def _stamp_from_recipe(entry: dict, recipe: dict) -> bool:
"""Mutate entry with recipe-derived fields. Returns True if anything changed."""
model = recipe.get("model") or {}
meta = recipe.get("meta") or {}
features = recipe.get("features") or {}
changed = False
new_min = (model.get("min_vllm_version") or "").strip()
if new_min and entry.get("min_vllm_version") != new_min:
entry["min_vllm_version"] = new_min
changed = True
if not entry.get("vllm_recipe"):
entry["vllm_recipe"] = True
changed = True
# Hardware support map — useful for filtering "which models run on my AMD box".
hw = meta.get("hardware") or {}
if hw and entry.get("recipe_hardware") != hw:
entry["recipe_hardware"] = {k: str(v) for k, v in hw.items()}
changed = True
# Tool/reasoning parser hints — purely informational at catalog level;
# the live launch command builder still reads them from the recipe API.
if features.get("reasoning") and not entry.get("has_reasoning_parser"):
entry["has_reasoning_parser"] = True
changed = True
if features.get("tool_calling") and not entry.get("has_tool_call_parser"):
entry["has_tool_call_parser"] = True
changed = True
return changed
def _build_new_entry(repo: str, recipe: dict, hf_info=None) -> dict | None:
"""Build a fresh catalog entry from a recipe + (optional) HF model info."""
model = recipe.get("model") or {}
meta = recipe.get("meta") or {}
features = recipe.get("features") or {}
variants = recipe.get("variants") or {}
org, name = repo.split("/", 1)
raw_params = _parse_param_count(model.get("parameter_count"))
active_raw = _parse_param_count(model.get("active_parameters"))
ctx = model.get("context_length") or 0
# Pick the smallest-VRAM variant as the catalog quant — that's what most
# users land on first. NVFP4/MXFP4 typically win this on Blackwell;
# FP8 elsewhere; BF16 baseline only.
pick_quant = None
pick_vram = None
for vk, vv in variants.items():
if not isinstance(vv, dict):
continue
prec = (vv.get("precision") or "").lower()
vram = vv.get("vram_minimum_gb") or 0
quant = _PRECISION_TO_QUANT.get(prec)
if quant and (pick_vram is None or (vram and vram < pick_vram)):
pick_quant = quant
pick_vram = vram or pick_vram
if not pick_quant:
pick_quant = "BF16"
arch = (model.get("architecture") or "").lower()
use_case = _ARCH_USE_CASE.get(arch, "General-purpose chat")
caps = _capabilities_for(arch, meta.get("hardware") or {}, ctx, bool(features.get("reasoning")))
rel_date = ""
downloads = 0
likes = 0
if hf_info is not None:
created = getattr(hf_info, "created_at", None)
if created:
rel_date = created.strftime("%Y-%m-%d")
downloads = int(getattr(hf_info, "downloads", 0) or 0)
likes = int(getattr(hf_info, "likes", 0) or 0)
if not rel_date:
rel_date = str(meta.get("date_updated") or datetime.utcnow().strftime("%Y-%m-%d"))
entry: dict = {
"name": repo,
"provider": org,
"parameter_count": str(model.get("parameter_count") or "?"),
"parameters_raw": raw_params,
"is_moe": "moe" in arch,
"quantization": pick_quant,
"context_length": int(ctx or 0),
"use_case": use_case,
"capabilities": caps,
"pipeline_tag": "text-generation",
"architecture": arch or "unknown",
"hf_downloads": downloads,
"hf_likes": likes,
"release_date": rel_date,
# Recipe-derived bits.
"vllm_recipe": True,
"min_vllm_version": (model.get("min_vllm_version") or "").strip() or None,
"recipe_hardware": {k: str(v) for k, v in (meta.get("hardware") or {}).items()},
"has_reasoning_parser": bool(features.get("reasoning")),
"has_tool_call_parser": bool(features.get("tool_calling")),
}
if active_raw:
entry["active_parameters"] = active_raw
if pick_vram:
# min_vram_gb is what hwfit uses for "does this fit". The recipe's stated
# minimum for the chosen variant is stored as-is (no extra headroom); the
# RAM figures are derived from it (0.6x minimum, 1.2x recommended).
entry["min_vram_gb"] = float(pick_vram)
entry["min_ram_gb"] = float(round(pick_vram * 0.6, 1))
entry["recommended_ram_gb"] = float(round(pick_vram * 1.2, 1))
# Drop empty / None fields to keep the JSON tidy.
return {k: v for k, v in entry.items() if v not in (None, "", [], {})}
def main():
p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
p.add_argument("--update-existing", action="store_true", help="Stamp min_vllm_version + vllm_recipe on existing rows.")
p.add_argument("--add-missing", action="store_true", help="Add new rows for recipe models not in the catalog.")
p.add_argument("--limit", type=int, default=0, help="Stop after N recipe fetches.")
p.add_argument("--dry-run", action="store_true", help="Don't write back; just report.")
p.add_argument("--sleep", type=float, default=0.05, help="Seconds between HTTP requests.")
args = p.parse_args()
if not args.update_existing and not args.add_missing:
args.update_existing = args.add_missing = True
with CATALOG_PATH.open(encoding="utf-8") as f:
catalog = json.load(f)
by_name = {m.get("name"): m for m in catalog if m.get("name")}
client = httpx.Client(follow_redirects=True)
print(f"Catalog: {CATALOG_PATH} ({len(catalog)} entries)")
print("Fetching upstream manifest…")
try:
manifest = _fetch_manifest(client)
except Exception as e:
print(f"FATAL: manifest fetch failed: {e}", file=sys.stderr)
sys.exit(2)
print(f"Manifest: {len(manifest)} recipes")
existing = sorted(by_name.keys() & manifest)
missing = sorted(manifest - by_name.keys())
print(f"Match catalog ↔ manifest: existing={len(existing)} missing={len(missing)}")
targets: list[tuple[str, str]] = [] # (repo, action)
if args.update_existing:
targets.extend((r, "update") for r in existing)
if args.add_missing:
targets.extend((r, "add") for r in missing)
if args.limit:
targets = targets[: args.limit]
print(f"Targets: {len(targets)}")
hf_api = HfApi(token=os.environ.get("HF_TOKEN") or None) if HfApi else None
updated = added = skipped = 0
started = time.time()
for n, (repo, action) in enumerate(targets, 1):
recipe = _fetch_recipe(client, repo)
if not recipe:
print(f"[{n}/{len(targets)}] {repo:55} skip (no recipe fetched)")
skipped += 1
time.sleep(args.sleep)
continue
if action == "update":
entry = by_name[repo]
if _stamp_from_recipe(entry, recipe):
updated += 1
print(f"[{n}/{len(targets)}] {repo:55} updated")
else:
print(f"[{n}/{len(targets)}] {repo:55} unchanged")
else: # add
hf_info = None
if hf_api:
try:
hf_info = hf_api.model_info(repo, files_metadata=False)
except HfHubHTTPError as e:
code = getattr(getattr(e, "response", None), "status_code", "?")
print(f" HF {code} for {repo} — building from recipe only", file=sys.stderr)
except Exception as e:
print(f" HF error for {repo}: {e}", file=sys.stderr)
new_entry = _build_new_entry(repo, recipe, hf_info)
if new_entry:
catalog.append(new_entry)
by_name[repo] = new_entry
added += 1
print(f"[{n}/{len(targets)}] {repo:55} added ({new_entry.get('parameter_count','?')}, {new_entry.get('quantization','?')})")
else:
skipped += 1
print(f"[{n}/{len(targets)}] {repo:55} skip (couldn't build entry)")
time.sleep(args.sleep)
elapsed = time.time() - started
print()
print(f"Done in {elapsed:.1f}s — added={added}, updated={updated}, skipped={skipped}")
if args.dry_run:
print("Dry run — no write.")
return
if added or updated:
tmp = CATALOG_PATH.with_suffix(".json.tmp")
with tmp.open("w", encoding="utf-8") as f:
json.dump(catalog, f, indent=1, ensure_ascii=False)
f.write("\n")
tmp.replace(CATALOG_PATH)
print(f"Wrote {CATALOG_PATH} ({len(catalog)} entries)")
else:
print("No changes — catalog untouched.")
if __name__ == "__main__":
main()
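The variant loop in `_build_new_entry` picks the recognised precision with the smallest stated VRAM minimum and falls back to BF16 when nothing matches. The sketch below restates that selection as a standalone function so the edge cases are easier to see; `pick_smallest_variant` is a hypothetical name and the precision map is trimmed to three entries for illustration.

```python
PRECISION_TO_QUANT = {"fp8": "FP8", "nvfp4": "NVFP4", "bf16": "BF16"}


def pick_smallest_variant(variants: dict) -> tuple:
    """Return (quant_label, vram_gb) for the lowest-VRAM recognised variant."""
    best_quant, best_vram = None, None
    for spec in variants.values():
        if not isinstance(spec, dict):
            continue  # malformed variant entries are ignored
        quant = PRECISION_TO_QUANT.get((spec.get("precision") or "").lower())
        vram = spec.get("vram_minimum_gb") or 0
        # Take the first recognised variant, then any with a smaller, known VRAM.
        if quant and (best_vram is None or (vram and vram < best_vram)):
            best_quant = quant
            best_vram = vram or best_vram
    return best_quant or "BF16", best_vram


variants = {
    "default": {"precision": "bf16", "vram_minimum_gb": 160},
    "fp8": {"precision": "fp8", "vram_minimum_gb": 80},
    "broken": "not-a-dict",
    "unknown": {"precision": "q2", "vram_minimum_gb": 10},
}
choice = pick_smallest_variant(variants)
```

Here the unrecognised `q2` precision is skipped even though it states the lowest VRAM, because it has no catalog label.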
+5 -1
@@ -103,9 +103,13 @@ def cmd_list(args) -> None:
end = _parse_dt(args.end) if args.end else (start + timedelta(days=30))
db = SessionLocal()
try:
# Overlap semantics, matching the web route (routes/calendar_routes.py)
# and the recurring-expansion contract: an event is in the window when
# it starts before the window end AND ends after the window start. This
# includes multi-day / in-progress events that began before `start`.
q = db.query(CalendarEvent).filter(
CalendarEvent.dtstart >= start,
CalendarEvent.dtstart < end,
CalendarEvent.dtend > start,
)
if args.calendar:
cal = db.query(CalendarCal).filter(CalendarCal.name == args.calendar).first()
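The new filter uses interval-overlap semantics rather than "starts inside the window". A small sketch of the predicate, written as plain Python instead of a SQLAlchemy filter; the function name `overlaps` is an assumption for illustration.

```python
from datetime import datetime


def overlaps(ev_start: datetime, ev_end: datetime,
             win_start: datetime, win_end: datetime) -> bool:
    """True when the event intersects the half-open window [win_start, win_end)."""
    return ev_start < win_end and ev_end > win_start


win_start = datetime(2026, 6, 10)
win_end = datetime(2026, 6, 20)

# A multi-day event that began before the window but is still running in it.
in_progress = overlaps(datetime(2026, 6, 8), datetime(2026, 6, 12), win_start, win_end)
# An event that ended exactly when the window opens does not count.
ended_at_open = overlaps(datetime(2026, 6, 8), datetime(2026, 6, 10), win_start, win_end)
```

The old filter (`dtstart >= start`) would have dropped the first event, which is the case the comment in the diff calls out.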
+133 -1
@@ -14059,6 +14059,138 @@
"vision"
]
},
{
"name": "google/gemma-4-12B-it",
"provider": "Google",
"parameter_count": "12.0B",
"parameters_raw": 12000000000,
"min_ram_gb": 8.5,
"recommended_ram_gb": 11.0,
"min_vram_gb": 7.5,
"quantization": "Q4_K_M",
"context_length": 131072,
"use_case": "General purpose, multimodal; unsloth/gemma-4-12B-it-GGUF Dynamic variants reduce VRAM from ~7.5 GB to ~5.5 GB",
"is_moe": false,
"num_experts": null,
"active_experts": null,
"active_parameters": null,
"architecture": "gemma4",
"pipeline_tag": "image-text-to-text",
"release_date": "2026-04-01",
"gguf_sources": [
{
"repo": "unsloth/gemma-4-12B-it-GGUF",
"provider": "unsloth"
}
],
"capabilities": [
"vision"
]
},
{
"name": "google/gemma-4-12B-it-qat-int4",
"provider": "Google",
"parameter_count": "12.0B",
"parameters_raw": 12000000000,
"min_ram_gb": 8.0,
"recommended_ram_gb": 9.5,
"min_vram_gb": 6.5,
"quantization": "QAT-INT4",
"context_length": 131072,
"use_case": "General purpose, multimodal (QAT quantization-aware training — higher quality than post-train INT4; vLLM native; no GGUF)",
"is_moe": false,
"num_experts": null,
"active_experts": null,
"active_parameters": null,
"architecture": "gemma4",
"pipeline_tag": "image-text-to-text",
"release_date": "2026-04-01",
"gguf_sources": [],
"capabilities": [
"vision"
]
},
{
"name": "google/gemma-4-12B-it-qat-int8",
"provider": "Google",
"parameter_count": "12.0B",
"parameters_raw": 12000000000,
"min_ram_gb": 15.0,
"recommended_ram_gb": 20.0,
"min_vram_gb": 13.5,
"quantization": "QAT-INT8",
"context_length": 131072,
"use_case": "General purpose, multimodal (QAT INT8 — highest quality, 2x VRAM of QAT-INT4; vLLM native; no GGUF)",
"is_moe": false,
"num_experts": null,
"active_experts": null,
"active_parameters": null,
"architecture": "gemma4",
"pipeline_tag": "image-text-to-text",
"release_date": "2026-04-01",
"gguf_sources": [],
"capabilities": [
"vision"
]
},
{
"name": "google/gemma-4-12B-it-qat-q4_0-gguf",
"provider": "Google",
"parameter_count": "12.0B",
"parameters_raw": 12000000000,
"min_ram_gb": 8.5,
"recommended_ram_gb": 11.0,
"min_vram_gb": 7.5,
"quantization": "QAT-INT4",
"context_length": 262144,
"use_case": "General purpose, multimodal (vision + audio); official Google QAT int4 GGUF — near-bf16 quality at int4 size, served on llama.cpp/Ollama with CPU offload",
"is_moe": false,
"num_experts": null,
"active_experts": null,
"active_parameters": null,
"architecture": "gemma4",
"pipeline_tag": "image-text-to-text",
"release_date": "2026-04-01",
"gguf_sources": [
{
"repo": "google/gemma-4-12B-it-qat-q4_0-gguf",
"provider": "Google",
"file": "gemma-4-12b-it-qat-q4_0.gguf"
}
],
"capabilities": [
"vision",
"audio"
]
},
{
"name": "google/gemma-4-26B-A4B-it-qat-q4_0-gguf",
"provider": "Google",
"parameter_count": "25.2B",
"parameters_raw": 25200000000,
"min_ram_gb": 14.4,
"recommended_ram_gb": 18.0,
"min_vram_gb": 14.4,
"quantization": "QAT-INT4",
"context_length": 262144,
"use_case": "High-throughput, multimodal MoE (3.8B active); official Google QAT int4 GGUF — near-bf16 quality at int4 size, served on llama.cpp with CPU offload",
"is_moe": true,
"num_experts": null,
"active_experts": null,
"active_parameters": 3800000000,
"architecture": "gemma4",
"pipeline_tag": "image-text-to-text",
"release_date": "2026-04-01",
"gguf_sources": [
{
"repo": "google/gemma-4-26B-A4B-it-qat-q4_0-gguf",
"provider": "Google"
}
],
"capabilities": [
"vision"
]
},
{
"name": "google/gemma-4-31B-it",
"provider": "Google",
@@ -19144,4 +19276,4 @@
],
"_discovered": true
}
]
]
+117 -14
@@ -9,7 +9,7 @@ from services.hwfit.models import (
GPU_BANDWIDTH = {
"5090": 1792, "5080": 960, "5070 ti": 896, "5070": 672, "5060 ti": 448, "5060": 256,
"4090": 1008, "4080 super": 736, "4080": 717, "4070 ti super": 672, "4070 ti": 504, "4070 super": 504, "4070": 504, "4060 ti": 288, "4060": 272,
"3090 ti": 1008, "3090": 936, "3080 ti": 912, "3080": 760, "3070 ti": 608, "3070": 448, "3060 ti": 448, "3060": 360,
"3090 ti": 1008, "3090": 936, "3080 ti": 912, "3080": 760, "3070 ti": 608, "3070": 448, "3060 ti": 448, "3060": 360, "3050 ti": 192, "3050": 224,
"2080 ti": 616, "2080 super": 496, "2080": 448, "2070 super": 448, "2070": 448, "2060 super": 448, "2060": 336,
"1660 ti": 288, "1660 super": 336, "1660": 192, "1650 super": 192, "1650": 128,
"h100 sxm": 3350, "h100": 2039, "h200": 4800, "a100 sxm": 2039, "a100": 1555,
@@ -19,22 +19,36 @@ GPU_BANDWIDTH = {
"6950 xt": 576, "6900 xt": 512, "6800 xt": 512, "6800": 512, "6700 xt": 384, "6600 xt": 256, "6600": 224,
"mi300x": 5300, "mi300": 5300, "mi250x": 3277, "mi250": 3277, "mi210": 1638, "mi100": 1229,
"9070 xt": 624, "9070": 488, "9060 xt": 322, "9060": 322,
# Apple Silicon unified-memory bandwidth (GB/s). Keyed off the chip name
# reported by sysctl machdep.cpu.brand_string (e.g. "Apple M4 Max"). Listed
# before the bare "m_" keys matters less than length-sorting (done below),
# which guarantees "m4 max" is tried before "m4".
"m1 ultra": 800, "m1 max": 400, "m1 pro": 200, "m1": 68,
"m2 ultra": 800, "m2 max": 400, "m2 pro": 200, "m2": 100,
"m3 ultra": 800, "m3 max": 300, "m3 pro": 150, "m3": 100,
"m4 max": 546, "m4 pro": 273, "m4": 120,
"m5 max": 546, "m5 pro": 273, "m5": 150,
# NVIDIA GB10 Grace-Blackwell superchip (DGX Spark). Unified LPDDR5X memory,
# not Apple Silicon, so it lives in the generic GPU table — the Apple-only
# lookup never matches it (its name carries no "apple").
"gb10": 273,
}
# Pre-sort keys by length descending for correct substring matching
_BW_KEYS_SORTED = sorted(GPU_BANDWIDTH.keys(), key=len, reverse=True)
# metal: backstop for Apple Silicon chips not in GPU_BANDWIDTH (e.g. a future
# M5) — the named chips above take the accurate bandwidth path instead.
# Apple Silicon unified-memory bandwidth (GB/s). For chip families with both
# binned and full variants under the same "Apple Mx Max" brand string, prefer
# GPU core count when hardware detection provides it; otherwise fall back to the
# conservative tier so speed estimates do not over-promise.
APPLE_BANDWIDTH_FIXED = {
"m1 ultra": 800, "m1 max": 400, "m1 pro": 200, "m1": 68,
"m2 ultra": 800, "m2 max": 400, "m2 pro": 200, "m2": 100,
"m3 ultra": 800, "m3 pro": 150, "m3": 100,
"m4 pro": 273, "m4": 120,
"m5 pro": 307, "m5": 153,
}
APPLE_BANDWIDTH_BY_CORES = {
"m3 max": {30: 300, 40: 400},
"m4 max": {32: 410, 40: 546},
"m5 max": {32: 460, 40: 614},
}
_APPLE_FIXED_KEYS_SORTED = sorted(APPLE_BANDWIDTH_FIXED.keys(), key=len, reverse=True)
_APPLE_VARIANT_KEYS_SORTED = sorted(APPLE_BANDWIDTH_BY_CORES.keys(), key=len, reverse=True)
# metal: backstop for Apple Silicon chips not in the explicit tables above
# (e.g. a future M6) — use a conservative generic estimate when unknown.
FALLBACK_K = {"cuda": 220, "rocm": 180, "metal": 150, "cpu_x86": 70, "cpu_arm": 90}
USE_CASE_WEIGHTS = {
@@ -60,16 +74,100 @@ CONTEXT_TARGET = {
}
def _lookup_bandwidth(gpu_name):
def _lookup_apple_bandwidth(system):
gpu_name = system.get("gpu_name")
if not isinstance(gpu_name, str) or not gpu_name:
return None
gn = gpu_name.lower()
# Guard against false matches on non-Apple GPUs whose names contain
# "m3"/"m4"/"m5" (e.g. NVIDIA Quadro M4000).
if "apple" not in gn:
return None
raw_cores = system.get("gpu_cores")
try:
gpu_cores = int(raw_cores) if raw_cores is not None else None
except (TypeError, ValueError):
gpu_cores = None
for key in _APPLE_VARIANT_KEYS_SORTED:
if key not in gn:
continue
if gpu_cores in APPLE_BANDWIDTH_BY_CORES[key]:
return APPLE_BANDWIDTH_BY_CORES[key][gpu_cores]
return min(APPLE_BANDWIDTH_BY_CORES[key].values())
for key in _APPLE_FIXED_KEYS_SORTED:
if key in gn:
return APPLE_BANDWIDTH_FIXED[key]
return None
def _lookup_bandwidth(system):
if isinstance(system, dict):
gpu_name = system.get("gpu_name")
else:
gpu_name = system
if not isinstance(gpu_name, str) or not gpu_name:
return None
# Apple tiers live only in the Apple-specific table now (#2564), so route
# BOTH dict and bare-string callers through it. A bare string carries no
# gpu_cores, so the helper falls back to the conservative (lowest) tier for
# that model -- before #2564 the generic table answered string lookups, and
# dropping that made _lookup_bandwidth("Apple M3 Max") return None.
apple_input = system if isinstance(system, dict) else {"gpu_name": gpu_name}
bw = _lookup_apple_bandwidth(apple_input)
if bw is not None:
return bw
gn = gpu_name.lower()
for key in _BW_KEYS_SORTED:
if key in gn:
return GPU_BANDWIDTH[key]
return None
def _canonical_cpu_backend(system):
"""Return the canonical CPU backend for cpu_only speed estimation.
Normalizes CPU-architecture aliases separately from the GPU backend, and
overrides GPU-only backends (CUDA/ROCm/Metal) so they do not inherit a
discrete-GPU fallback constant when the model is actually running on CPU.
"""
backend = (system.get("backend") or "").lower().strip()
cpu_arch = (system.get("cpu_arch") or "").lower().strip()
cpu_name = (system.get("cpu_name") or "").lower()
gpu_name = (system.get("gpu_name") or "").lower()
# Already-canonical CPU backends
if backend in ("cpu_x86", "cpu_arm"):
return backend
# Raw CPU-architecture aliases. Treat plain "arm" as 32-bit ARM, not the
# ARM64-class CPU fallback used for Apple Silicon/aarch64 machines.
if backend in ("x86_64", "amd64", "i386", "i686"):
return "cpu_x86"
if backend in ("arm64", "aarch64"):
return "cpu_arm"
# Prefer an explicit CPU architecture field when present
if cpu_arch:
if cpu_arch in ("x86_64", "amd64", "x86", "i386", "i686"):
return "cpu_x86"
if cpu_arch in ("arm64", "aarch64"):
return "cpu_arm"
# Apple Silicon enters ranking as backend="metal"; its CPU path is ARM.
if backend in ("metal", "mps", "apple") or "apple" in cpu_name or "apple" in gpu_name:
return "cpu_arm"
# Conservative default for CUDA/ROCm/discrete GPU backends and unknowns.
return "cpu_x86"
def _estimate_speed(model, quant, run_mode, system, offload_frac=0.0):
"""Estimate tok/s. Uses active params for MoE (only active experts run per token).
@@ -84,9 +182,14 @@ def _estimate_speed(model, quant, run_mode, system, offload_frac=0.0):
"""
pb = _active_params_b(model)
is_moe = model.get("is_moe", False)
bw = _lookup_bandwidth(system.get("gpu_name"))
bw = _lookup_bandwidth(system)
backend = system.get("backend", "cpu_x86")
# CPU-only inference must never inherit a GPU backend's fallback constant,
# even if the detected system happens to report a CUDA/Metal/ROCm backend.
if run_mode == "cpu_only":
backend = _canonical_cpu_backend(system)
if bw and run_mode in ("gpu", "cpu_offload"):
bpp = QUANT_BYTES_PER_PARAM.get(quant, 0.5)
model_gb = pb * bpp
+111 -14
View File
@@ -1,3 +1,4 @@
import json
import os
import platform
import re
@@ -281,7 +282,17 @@ def _detect_amd():
"gpus": cards,
"gpu_groups": groups,
"homogeneous": len(groups) <= 1,
"backend": "rocm",
# Pick the actual runtime label: ROCm/HIP only when its
# toolchain is installed, otherwise Vulkan if vulkaninfo is
# present (mesa RADV works fine on RDNA/CDNA when ROCm
# packages are absent — see Strix Halo where ROCm support
# is still backporting). Reporting "rocm" on a Vulkan-only
# host misleads downstream env-var pinning
# (HIP_VISIBLE_DEVICES is a no-op there).
"backend": (
"rocm" if (_run(["which", "rocminfo"]) or _run(["which", "hipconfig"]))
else ("vulkan" if _run(["which", "vulkaninfo"]) else "rocm")
),
"unified_memory": is_apu,
# AMD ISA/family so downstream can tell datacenter Instinct (CDNA,
# where vLLM/SGLang run AWQ/GPTQ reliably) from consumer Radeon
@@ -319,7 +330,7 @@ def _detect_apple_silicon():
# Only Apple Silicon (arm64) has a Metal GPU worth serving LLMs on; Intel
# Macs fall through to the CPU path.
if "arm" not in arch and "aarch64" not in arch:
if _canonical_cpu_arch(arch) != "arm64":
return None
# Chip name, e.g. "Apple M4 Max" — carries the Pro/Max/Ultra variant that
@@ -335,6 +346,37 @@ def _detect_apple_silicon():
if total_gb <= 0:
return None
def _parse_apple_gpu_cores(text):
if not text:
return None
try:
data = json.loads(text)
except (TypeError, ValueError, json.JSONDecodeError):
data = None
if isinstance(data, dict):
for gpu in data.get("SPDisplaysDataType") or []:
if not isinstance(gpu, dict):
continue
model = str(gpu.get("sppci_model") or gpu.get("_name") or "")
if "apple" not in model.lower():
continue
cores = gpu.get("sppci_cores")
try:
return int(str(cores).strip())
except (TypeError, ValueError):
continue
m = re.search(r"Total Number of Cores:\s*(\d+)", text)
if m:
try:
return int(m.group(1))
except ValueError:
return None
return None
gpu_cores = _parse_apple_gpu_cores(_run(["system_profiler", "SPDisplaysDataType", "-json"]))
if gpu_cores is None:
gpu_cores = _parse_apple_gpu_cores(_run(["system_profiler", "SPDisplaysDataType"]))
# Usable GPU budget. macOS lets Metal use most of unified memory, but the
# default working-set limit scales with RAM: small machines have to keep
# more back for the OS + app. These fractions track Apple's
@@ -357,7 +399,7 @@ def _detect_apple_silicon():
pass
gpu = {"index": 0, "name": brand, "vram_gb": vram_gb}
return {
info = {
"gpu_name": brand,
"gpu_vram_gb": vram_gb,
"gpu_count": 1,
@@ -369,6 +411,9 @@ def _detect_apple_silicon():
# separate pool — downstream fit logic uses this to avoid double-budgeting.
"unified_memory": True,
}
if gpu_cores is not None:
info["gpu_cores"] = gpu_cores
return info
def _read_file(path):
@@ -468,12 +513,57 @@ def _get_cpu_count():
return os.cpu_count() or 1
def _canonical_cpu_arch(value):
arch = str(value or "").lower().strip().replace("-", "_")
if arch in ("x86_64", "amd64", "x64"):
return "x86_64"
if arch in ("i386", "i686", "x86"):
return "x86"
if arch in ("arm64", "aarch64"):
return "arm64"
if arch == "arm" or arch.startswith("armv"):
return "arm"
return arch
def _get_cpu_arch():
if _remote_host:
return _canonical_cpu_arch(_run(["uname", "-m"]) or "")
return _canonical_cpu_arch(platform.machine())
def _powershell_exe():
"""Pick the best PowerShell executable for LOCAL execution: prefer pwsh
(PowerShell 7+), fall back to Windows PowerShell 5.1. Returns an absolute
path so we don't depend on a particular PATH ordering."""
return shutil.which("pwsh") or shutil.which("powershell") or "powershell"
def _powershell_encoded_for_ssh(script: str):
"""Run a PowerShell script on a remote Windows host over SSH.
Nested quotes in powershell -Command break when passed through Windows
OpenSSH's cmd wrapper; -EncodedCommand avoids that.
"""
import base64
encoded = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
return _run(f"powershell -NoProfile -EncodedCommand {encoded}")
def _probe_remote_platform():
"""Best-effort OS detection over SSH when the caller didn't pass platform."""
out = _run("echo %OS%")
if out and "Windows_NT" in out:
return "windows"
uname = (_run(["uname", "-s"]) or "").strip().lower()
if uname == "darwin":
# Mac uses the linux detection path (_detect_apple_silicon over SSH).
return "linux"
if uname == "linux":
out = _run("test -d /data/data/com.termux && echo termux || echo linux")
if out and "termux" in out:
return "termux"
return "linux"
def _detect_windows():
"""Detect Windows hardware via PowerShell/WMI.
@@ -493,6 +583,7 @@ def _detect_windows():
$r.cpu_name = $cpu.Name
$r.cpu_cores = (Get-CimInstance Win32_Processor | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
$r.arch = $cpu.AddressWidth
$r.cpu_arch = if ($env:PROCESSOR_ARCHITEW6432) { $env:PROCESSOR_ARCHITEW6432 } else { $env:PROCESSOR_ARCHITECTURE }
# GPU detection via nvidia-smi (fastest) or WMI fallback
try {
$nv = nvidia-smi --query-gpu=memory.total,name --format=csv,noheader,nounits 2>$null
@@ -535,9 +626,8 @@ def _detect_windows():
"""
)
if _remote_host:
# Remote: ship a single command string over SSH. The remote shell parses
# the quoting; PowerShell on the far side runs the -Command payload.
out = _run(f'powershell -Command "{ps_cmd}"')
# Remote: use -EncodedCommand so OpenSSH/cmd quoting does not break the script.
out = _powershell_encoded_for_ssh(ps_cmd.strip())
else:
# Local: pass a LIST argv straight to subprocess so the OS hands ps_cmd
# to PowerShell verbatim — no fragile string-level quote escaping. Prefer
@@ -564,6 +654,7 @@ def _detect_windows():
"available_ram_gb": d.get("avail_gb", 0),
"cpu_cores": _as_int(d.get("cpu_cores"), 1),
"cpu_name": _cpu_name,
"cpu_arch": _canonical_cpu_arch(d.get("cpu_arch")),
"has_gpu": bool(d.get("gpu_name")),
"gpu_name": d.get("gpu_name"),
"gpu_vram_gb": d.get("gpu_vram_gb"),
@@ -707,6 +798,13 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
"""
global _remote_host, _remote_port, _remote_platform
if host and not platform:
_remote_host = host
_remote_port = ssh_port or None
platform = _probe_remote_platform()
_remote_host = None
_remote_port = None
cache_key = _cache_key(host, ssh_port, platform)
now = time.time()
if not fresh and cache_key in _cache_by_host:
@@ -727,8 +825,8 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
_remote_platform = None
_cache_by_host[cache_key] = (now, result)
return result
# If Windows detection failed, return error
result = {"error": f"Cannot connect to {host}", "host": host}
# SSH may work while the PowerShell hardware probe still fails.
result = {"error": f"Windows hardware probe failed for {host}", "host": host}
_remote_host = None
_remote_platform = None
_cache_by_host[cache_key] = (now, result)
@@ -759,6 +857,7 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
available_ram = round(_get_available_ram_gb(), 1)
cpu_cores = _get_cpu_count()
cpu_name = _get_cpu_name()
cpu_arch = _get_cpu_arch()
gpu_info = _detect_apple_silicon() or _detect_nvidia() or _detect_amd()
@@ -768,10 +867,12 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
"available_ram_gb": available_ram,
"cpu_cores": cpu_cores,
"cpu_name": cpu_name,
"cpu_arch": cpu_arch,
"has_gpu": True,
"gpu_name": gpu_info["gpu_name"],
"gpu_vram_gb": gpu_info["gpu_vram_gb"],
"gpu_count": gpu_info["gpu_count"],
"gpu_cores": gpu_info.get("gpu_cores"),
"gpus": gpu_info.get("gpus", []),
"gpu_groups": gpu_info.get("gpu_groups", []),
"homogeneous": gpu_info.get("homogeneous", True),
@@ -781,17 +882,13 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
"unified_memory": gpu_info.get("unified_memory", False),
}
else:
if _remote_host:
arch_out = _run(["uname", "-m"]) or ""
else:
import platform as _platform
arch_out = _platform.machine().lower()
backend = "cpu_arm" if "aarch64" in arch_out or "arm" in arch_out else "cpu_x86"
backend = "cpu_arm" if cpu_arch == "arm64" else "cpu_x86"
result = {
"total_ram_gb": total_ram,
"available_ram_gb": available_ram,
"cpu_cores": cpu_cores,
"cpu_name": cpu_name,
"cpu_arch": cpu_arch,
"has_gpu": False,
"gpu_name": None,
"gpu_vram_gb": None,
+8
View File
@@ -12,6 +12,7 @@ QUANT_BPP = {
"Q4_K_M": 0.58, "Q4_0": 0.58, "Q3_K_M": 0.48, "Q2_K": 0.37,
"AWQ-4bit": 0.50, "AWQ-8bit": 1.0,
"GPTQ-Int4": 0.50, "GPTQ-Int8": 1.0,
"QAT-INT4": 0.50, "QAT-INT8": 1.0,
"mlx-4bit": 0.55, "mlx-8bit": 1.0, "mlx-6bit": 0.75,
# DeepSeek-V4-style mixed: MoE experts in FP4 (bulk), attention + non-
# expert dense in FP8, embeddings/LM head in BF16. By weight count the
@@ -30,6 +31,7 @@ QUANT_SPEED_MULT = {
"Q4_K_M": 1.15, "Q4_0": 1.15, "Q3_K_M": 1.25, "Q2_K": 1.35,
"AWQ-4bit": 1.2, "AWQ-8bit": 0.85,
"GPTQ-Int4": 1.2, "GPTQ-Int8": 0.85,
"QAT-INT4": 1.15, "QAT-INT8": 0.85,
"mlx-4bit": 1.15, "mlx-8bit": 0.85, "mlx-6bit": 1.0,
"FP4-MoE-Mixed": 1.10, # slightly slower than pure FP4 because of mixed-dtype dispatch
"FP8-Mixed": 0.85,
@@ -47,6 +49,10 @@ QUANT_QUALITY_PENALTY = {
# penalty so FP8 wins when both fit. AWQ-4bit stays heavier.
"AWQ": -1.0, "AWQ-4bit": -4.0, "AWQ-8bit": -1.0,
"GPTQ": -1.0, "GPTQ-Int4": -4.0, "GPTQ-Int8": -1.0,
# Quantization-aware training recovers most of the int4 quality loss, so a
# QAT-INT4 build lands far closer to bf16 than a post-training Q4/INT4
# (Google reports near-bf16 quality). Penalize it lightly, not like Q4_K_M.
"QAT-INT4": -1.0, "QAT-INT8": 0.0,
"mlx-4bit": -4.0, "mlx-8bit": -0.5, "mlx-6bit": -1.5,
# DeepSeek-V4 mixed: only MoE experts at FP4 (the rest is FP8/BF16),
# so the realized quality is much closer to FP8 than to pure FP4 —
@@ -63,6 +69,7 @@ QUANT_BYTES_PER_PARAM = {
"Q4_K_M": 0.5, "Q4_0": 0.5, "Q3_K_M": 0.375, "Q2_K": 0.25,
"AWQ-4bit": 0.5, "AWQ-8bit": 1.0,
"GPTQ-Int4": 0.5, "GPTQ-Int8": 1.0,
"QAT-INT4": 0.5, "QAT-INT8": 1.0,
"mlx-4bit": 0.5, "mlx-8bit": 1.0, "mlx-6bit": 0.75,
"FP4-MoE-Mixed": 0.55,
"FP8-Mixed": 1.0,
@@ -74,6 +81,7 @@ PREQUANTIZED_PREFIXES = (
"AWQ-", "GPTQ-", "mlx-", "FP8", "FP4", "NVFP4", "MXFP4", "NF4",
"INT4", "INT8", "W4A16", "W8A8", "W8A16",
"FP4-MoE-Mixed", "FP8-Mixed",
"QAT-",
)
+42 -26
View File
@@ -66,41 +66,57 @@ def _has_duplicate_title(skills, title: str) -> bool:
def _extract_json_object(text: str) -> Optional[dict]:
"""Best-effort extraction of a JSON object from an LLM response.
The response may be wrapped in code fences or surrounded by prose, and some
models emit a stray brace in the prose before the real object
(e.g. "uses {placeholder} then {...}"). Slicing first-'{' .. last-'}' then
grabs an unparseable span and the skill is silently lost. Try the whole
string first, then each '{' start position in turn, returning the first
candidate that parses to a JSON object (dict). Returns None if none do.
The response may be wrapped in code fences or surrounded by prose. Uses
json.JSONDecoder().raw_decode() to locate the boundaries of complete JSON
objects starting at each '{' position. Nested objects are filtered out to
keep only top-level candidates. If multiple non-overlapping valid JSON
objects are found, it is treated as ambiguous and returns None. Otherwise,
returns the single valid candidate dictionary.
"""
if not text:
return None
s = text.strip()
if s.startswith("```"):
s = s.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
end = s.rfind("}")
if end == -1:
decoder = json.JSONDecoder()
candidates = []
start = s.find("{")
while start != -1:
try:
obj, idx = decoder.raw_decode(s[start:])
end_pos = start + idx
if isinstance(obj, dict):
candidates.append((start, end_pos, obj))
except (json.JSONDecodeError, ValueError):
pass
start = s.find("{", start + 1)
# Filter out nested candidates to identify top-level dictionaries
top_level = []
for c in candidates:
is_nested = False
for other in candidates:
if other == c:
continue
if other[0] <= c[0] and c[1] <= other[1]:
is_nested = True
break
if not is_nested:
top_level.append(c)
if not top_level:
return None
def _as_dict(candidate):
try:
obj = json.loads(candidate)
except (json.JSONDecodeError, ValueError):
return None
return obj if isinstance(obj, dict) else None
if len(top_level) > 1:
logger.debug(
"[skill-extract] Found multiple non-overlapping JSON objects: %s",
[item[2].get("title") for item in top_level]
)
return None
# The clean, common case: the whole (de-fenced) string is the object.
obj = _as_dict(s)
if obj is not None:
return obj
# Otherwise scan each '{' candidate up to the last '}'.
start = s.find("{")
while 0 <= start < end:
obj = _as_dict(s[start : end + 1])
if obj is not None:
return obj
start = s.find("{", start + 1)
return None
return top_level[0][2]
async def maybe_extract_skill(
+6 -4
View File
@@ -603,7 +603,6 @@ class SkillsManager:
escalation) those are work-in-progress and pollute the
prompt with half-finished procedures.
"""
active_toolsets = active_toolsets or []
out = []
for s in self.load(owner=owner):
status = s.get("status")
@@ -617,13 +616,16 @@ class SkillsManager:
# Platform gating
if platform and s.get("platforms") and platform not in s["platforms"]:
continue
# requires_toolsets: hide unless every required toolset is active
# requires_toolsets: hide unless every required toolset is active.
# active_toolsets=None means the caller doesn't know the active
# set (API listings, chat preface) — don't gate in that case;
# only an explicit list filters.
req = s.get("requires_toolsets") or []
if req and not all(t in active_toolsets for t in req):
if req and active_toolsets is not None and not all(t in active_toolsets for t in req):
continue
# fallback_for_toolsets: hide when any of those toolsets is active
fb = s.get("fallback_for_toolsets") or []
if fb and any(t in active_toolsets for t in fb):
if fb and active_toolsets and any(t in active_toolsets for t in fb):
continue
out.append({
"name": s["name"],
+163 -14
View File
@@ -15,6 +15,8 @@ from urllib.parse import urljoin, urlparse
import httpx
from bs4 import BeautifulSoup
from src.constants import WEB_FETCH_SOFT_MAX_BYTES, WEB_FETCH_HARD_MAX_BYTES, WEB_FETCH_USER_AGENT
from .analytics import RateLimitError, error_logger
from .cache import (
CONTENT_CACHE_DIR,
@@ -89,18 +91,128 @@ def _public_http_url(url: str) -> bool:
return False
def _get_public_url(url: str, headers: dict, timeout: int, max_redirects: int = 5) -> httpx.Response:
class BodyTooLargeError(Exception):
"""The server declared a body larger than the hard fetch ceiling."""
def __init__(self, url: str, declared_bytes: int):
self.url = url
self.declared_bytes = declared_bytes
super().__init__(
f"response body is {declared_bytes:,} bytes, over the "
f"{WEB_FETCH_HARD_MAX_BYTES:,}-byte hard cap"
)
class _CappedFetch:
"""Result of a size-capped streaming GET.
Carries just what fetch_webpage_content needs from an httpx.Response,
plus the cap bookkeeping: the (possibly truncated) body, whether the
cap cut it short, and the size the server declared via Content-Length
(wire bytes; None when absent).
"""
__slots__ = ("status_code", "headers", "content", "truncated",
"declared_bytes", "encoding", "url")
def __init__(self, status_code, headers, content, truncated,
declared_bytes, encoding, url):
self.status_code = status_code
self.headers = headers
self.content = content
self.truncated = truncated
self.declared_bytes = declared_bytes
self.encoding = encoding
self.url = url
@property
def text(self) -> str:
return self.content.decode(self.encoding or "utf-8", errors="replace")
def raise_for_status(self):
if self.status_code >= 400:
request = httpx.Request("GET", self.url)
raise httpx.HTTPStatusError(
f"HTTP {self.status_code} for {self.url}",
request=request,
response=httpx.Response(self.status_code, request=request),
)
def _get_public_url(url: str, headers: dict, timeout: int, max_redirects: int = 5,
max_bytes: int = None) -> "_CappedFetch":
"""Capped streaming GET with SSRF-guarded manual redirects.
The body is streamed and buffering stops at ``max_bytes`` (default: the
soft cap), so an oversized resource cannot be pulled into memory or the
content cache in full. When Content-Length already declares a body over
the hard ceiling, the fetch is refused before any body bytes are read.
"""
cap = min(max_bytes or WEB_FETCH_SOFT_MAX_BYTES, WEB_FETCH_HARD_MAX_BYTES)
current = url
for _ in range(max_redirects + 1):
if not _public_http_url(current):
raise httpx.RequestError("Blocked private/internal URL", request=httpx.Request("GET", current))
response = httpx.get(current, headers=headers, timeout=timeout, follow_redirects=False)
if response.status_code not in (301, 302, 303, 307, 308):
return response
location = response.headers.get("location")
if not location:
return response
current = urljoin(str(response.url), location)
# Force identity content-coding (Accept-Encoding). With gzip/deflate the wire bytes
# (and Content-Length) can be a small fraction of the decoded body, so
# a tiny compressed response could pass the hard-cap preflight and then
# expand past the ceiling in a single decoded chunk before the streamed
# cap below can slice it. Identity makes Content-Length the true body
# size and keeps each streamed chunk bounded by the network read.
req_headers = dict(headers or {})
req_headers["Accept-Encoding"] = "identity"
with httpx.stream("GET", current, headers=req_headers, timeout=timeout,
follow_redirects=False) as response:
if response.status_code in (301, 302, 303, 307, 308):
location = response.headers.get("location")
if not location:
return _CappedFetch(response.status_code, response.headers, b"",
False, None, response.encoding, str(response.url))
current = urljoin(str(response.url), location)
continue
# A server can ignore the identity request and still return a
# compressed body; httpx.iter_bytes would then decode it, and a tiny
# gzip can balloon into one decoded chunk far past the cap before we
# slice. Refuse a compressed Content-Encoding so the streamed cap
# stays a real memory bound (Content-Length is the compressed wire
# length here, so the preflight and size metadata are unreliable too).
enc = (response.headers.get("content-encoding") or "").strip().lower()
if enc and enc != "identity":
raise httpx.RequestError(
f"Refusing compressed response (Content-Encoding: {enc}) after "
"requesting identity: cannot bound decoded body size",
request=httpx.Request("GET", current),
)
declared = None
raw_len = response.headers.get("content-length")
if raw_len and raw_len.isdigit():
declared = int(raw_len)
# Refuse before buffering anything when the server already tells
# us the body exceeds the absolute ceiling (Content-Length is wire
# bytes; the decompressed body can only be larger).
if declared is not None and declared > WEB_FETCH_HARD_MAX_BYTES:
raise BodyTooLargeError(current, declared)
chunks = []
read = 0
truncated = False
# We requested identity above, so iter_bytes yields the raw body in
# network-read-sized chunks (no decompression expansion); the cap
# therefore bounds what we actually buffer.
for chunk in response.iter_bytes():
read += len(chunk)
if read > cap:
keep = cap - (read - len(chunk))
if keep > 0:
chunks.append(chunk[:keep])
truncated = True
break
chunks.append(chunk)
return _CappedFetch(response.status_code, response.headers,
b"".join(chunks), truncated, declared,
response.encoding, str(response.url))
raise httpx.RequestError("Too many redirects", request=httpx.Request("GET", current))
# PDF extraction (optional dependency)
@@ -222,9 +334,19 @@ def _empty_result(url: str, error: str = "") -> dict:
# ----------------------------------------------------------------------
# Main content fetcher
# ----------------------------------------------------------------------
def fetch_webpage_content(url: str, timeout: int = 5, retry_attempt: int = 0) -> dict:
"""Fetch and extract meaningful content from a webpage with caching."""
cache_key = generate_cache_key(url)
def fetch_webpage_content(url: str, timeout: int = 5, retry_attempt: int = 0,
max_bytes: int = None) -> dict:
"""Fetch and extract meaningful content from a webpage with caching.
``max_bytes`` raises the download budget per call (clamped to the hard
cap); the default is the soft cap. When the body is cut short the result
carries ``truncated``/``fetched_bytes``/``total_bytes`` so callers can
tell the model the content is partial (#3812).
"""
effective_cap = min(max_bytes or WEB_FETCH_SOFT_MAX_BYTES, WEB_FETCH_HARD_MAX_BYTES)
# The cap is part of the cache identity: a truncated soft-cap fetch must
# not be served to a later full-budget request for the same URL.
cache_key = generate_cache_key(f"{url}#cap={effective_cap}")
cache_file = CONTENT_CACHE_DIR / f"{cache_key}.cache"
# Check cache
@@ -247,18 +369,24 @@ def fetch_webpage_content(url: str, timeout: int = 5, retry_attempt: int = 0) ->
# Fetch
try:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
"User-Agent": WEB_FETCH_USER_AGENT,
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate",
# identity so the streamed size cap in _get_public_url stays honest
# (a compressed body can decode to far more than Content-Length).
"Accept-Encoding": "identity",
"Connection": "keep-alive",
}
response = _get_public_url(url, headers=headers, timeout=timeout)
response = _get_public_url(url, headers=headers, timeout=timeout,
max_bytes=effective_cap)
if response.status_code == 429:
raise RateLimitError(f"Rate limit hit for {url} (attempt {retry_attempt})")
response.raise_for_status()
except BodyTooLargeError as e:
error_logger.warning(f"Refused oversized body for {url}: {e}")
return _empty_result(url, f"TooLarge: {e}")
except httpx.HTTPStatusError as e:
error_logger.warning(f"HTTP {e.response.status_code} fetching {url}: {e}")
return _empty_result(url, f"HTTP {e.response.status_code}: {e}")
@@ -269,9 +397,27 @@ def fetch_webpage_content(url: str, timeout: int = 5, retry_attempt: int = 0) ->
error_logger.error(str(e))
return _empty_result(url, str(e))
# Size bookkeeping shared by every content branch below. getattr keeps
# plain httpx.Response stand-ins (tests) working without the cap fields.
_size_fields = {
"truncated": getattr(response, "truncated", False),
"fetched_bytes": len(response.content),
"total_bytes": getattr(response, "declared_bytes", None),
}
# PDF handling
content_type = response.headers.get("Content-Type", "").lower()
if "application/pdf" in content_type or url.lower().endswith(".pdf"):
if _size_fields["truncated"]:
# A PDF cut mid-stream is not parseable; unlike text there is no
# useful partial result, so report the budget problem instead.
_declared = _size_fields["total_bytes"]
return _empty_result(
url,
f"TooLarge: PDF exceeds the {effective_cap:,}-byte fetch budget"
+ (f" (size {_declared:,} bytes)" if _declared else "")
+ "; retry with a larger budget if it fits under the hard cap",
)
if pdf_extract_text is None:
logger.error("pdfminer.six is not installed; cannot extract PDF text.")
pdf_text = ""
@@ -295,6 +441,7 @@ def fetch_webpage_content(url: str, timeout: int = 5, retry_attempt: int = 0) ->
"js_message": "",
"success": bool(pdf_text),
"error": "" if pdf_text else "Failed to extract PDF text",
**_size_fields,
}
_cache_result(cache_file, cache_key, result, url)
return result
@@ -329,6 +476,7 @@ def fetch_webpage_content(url: str, timeout: int = 5, retry_attempt: int = 0) ->
"js_message": "",
"success": bool(text_body),
"error": "" if text_body else "Empty response body",
**_size_fields,
}
_cache_result(cache_file, cache_key, result, url)
return result
@@ -391,6 +539,7 @@ def fetch_webpage_content(url: str, timeout: int = 5, retry_attempt: int = 0) ->
"js_message": js_message,
"success": True,
"error": "",
**_size_fields,
}
_cache_result(cache_file, cache_key, result, url)
return result
+4 -6
View File
@@ -9,14 +9,12 @@ from urllib.parse import urljoin, urlparse, parse_qs
import httpx
from bs4 import BeautifulSoup
from src.constants import SEARXNG_INSTANCE
from src.constants import SEARXNG_INSTANCE, REQUEST_TIMEOUT, WEB_FETCH_USER_AGENT
from .analytics import RateLimitError, error_logger
from .query import build_enhanced_query
logger = logging.getLogger(__name__)
REQUEST_TIMEOUT = 20
# Provider registry — maps setting value to (label, needs_key, needs_url)
PROVIDER_INFO = {
"searxng": ("SearXNG", False, True),
@@ -140,7 +138,7 @@ def searxng_search_api(query: str, count: Optional[int] = None, categories: str
count = count if count is not None else _get_result_count()
instance = _get_search_instance()
api_key = ""
headers = {"User-Agent": "Mozilla/5.0"}
headers = {"User-Agent": WEB_FETCH_USER_AGENT}
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
# News/fresh queries do badly in the 'general' category — it favours
@@ -252,7 +250,7 @@ def searxng_search(query, max_results=10):
"""Search using SearXNG instance - parsing HTML."""
instance = _get_search_instance()
api_key = ""
req_headers = {"User-Agent": "Mozilla/5.0"}
req_headers = {"User-Agent": WEB_FETCH_USER_AGENT}
if api_key:
req_headers["Authorization"] = f"Bearer {api_key}"
try:
@@ -391,7 +389,7 @@ def duckduckgo_search(query: str, count: Optional[int] = None, time_filter: Opti
response = httpx.get(
"https://html.duckduckgo.com/html/",
params={"q": query, "kp": _safesearch_for("duckduckgo_html")},
headers={"User-Agent": "Mozilla/5.0"},
headers={"User-Agent": WEB_FETCH_USER_AGENT},
timeout=REQUEST_TIMEOUT,
)
response.raise_for_status()
+29 -6
View File
@@ -16,8 +16,9 @@ sys.path.insert(0, BASE_DIR)
from src.constants import (
DATA_DIR, AUTH_FILE, UPLOAD_DIR, PERSONAL_DIR, PERSONAL_UPLOADS_DIR,
TTS_CACHE_DIR, GENERATED_IMAGES_DIR, DEEP_RESEARCH_DIR, CHROMA_DIR,
RAG_DIR, MEMORY_VECTORS_DIR,
RAG_DIR, MEMORY_VECTORS_DIR, PASSWORD_MIN_LENGTH,
)
from core.auth import RESERVED_USERNAMES
DIRS = [
DATA_DIR,
@@ -59,15 +60,23 @@ def _prompt_admin_credentials():
print(" (Press Enter to accept defaults)")
print()
username = input(" Username [admin]: ").strip().lower()
if not username:
username = "admin"
while True:
username = input(" Username [admin]: ").strip().lower()
if not username:
username = "admin"
if username in RESERVED_USERNAMES:
print(f" '{username}' is a reserved username. Choose another.")
continue
break
while True:
password = getpass.getpass(" Password: ")
if not password:
print(" Password cannot be empty.")
continue
if len(password) < PASSWORD_MIN_LENGTH:
print(f" Password must be at least {PASSWORD_MIN_LENGTH} characters.")
continue
confirm = getpass.getpass(" Confirm password: ")
if password != confirm:
print(" Passwords don't match. Try again.")
@@ -93,8 +102,13 @@ def create_default_admin():
password = os.getenv("ODYSSEUS_ADMIN_PASSWORD", "").strip()
if username and password:
# Both provided via env — use them directly
pass
# Both provided via env — validate before using
if username in RESERVED_USERNAMES:
print(f" [error] ODYSSEUS_ADMIN_USER '{username}' is a reserved username")
return "failed"
if len(password) < PASSWORD_MIN_LENGTH:
print(f" [error] ODYSSEUS_ADMIN_PASSWORD must be at least {PASSWORD_MIN_LENGTH} characters")
return "failed"
elif sys.stdin.isatty() and not os.getenv("ODYSSEUS_SKIP_ADMIN_PROMPT"):
# Interactive terminal — ask the user
username, password = _prompt_admin_credentials()
@@ -225,6 +239,15 @@ def check_arch():
def main():
print("\n=== Odysseus Setup ===\n")
# Load .env so pre-seeded ODYSSEUS_ADMIN_USER / ODYSSEUS_ADMIN_PASSWORD (and
# other deployment vars) are honored on native installs, not just when they
# are exported in the shell. Mirrors app.py: encoding="utf-8-sig" tolerates a
# UTF-8 BOM in a Notepad-saved .env. load_dotenv does not override already
# exported OS env vars, so the existing precedence is preserved. python-dotenv
# is a hard dependency (requirements.txt) and is verified by check_deps below.
from dotenv import load_dotenv
load_dotenv(os.path.join(BASE_DIR, ".env"), encoding="utf-8-sig")
# Fail fast with a clear message if the CPU architecture is wrong (Apple
# Silicon under an x86/Rosetta Python) before importing anything native.
check_arch()
+412
View File
@@ -0,0 +1,412 @@
# Architecture Runtime Inventory
> **Purpose**: Phase 0 planning baseline for codebase readability improvements (#4071).
> **Parent issue**: [#4082](https://github.com/pewdiepie-archdaemon/odysseus/issues/4082)
> **Last updated**: dev@b58af42 | 2026-06-16
> **Status**: Draft — to be reviewed before follow-up slices open.
> **Snapshot basis**: Importer / file / import-line counts are refreshed to `dev@b58af42` (2026-06-16) and are recomputable via the commands in §3.4. **Line counts** in §2.1 / §2.2 are a snapshot from an earlier baseline and drift as `dev` moves — recompute any of them with `wc -l <file>`. This inventory tracks structure and risk, not live metrics.
This document maps the current runtime module structure, identifies high-risk boundaries, and recommends safe first refactor slices. It does **not** move files, change imports, or alter runtime behavior.
---
## 1. Current Structure Overview
### 1.1 Top-Level Layout
```
odysseus/
├── app.py # FastAPI app entrypoint (1,145 lines)
├── conf/ # Configuration (config.py, settings.py, settings_scrub.py)
├── src/ # 95 flat .py files + 2 subdirectories
│ ├── agent_tools/ # Tool helpers: document, filesystem, subprocess, web
│ └── search/ # Search subsystem
├── routes/ # 54 flat .py files — HTTP route handlers
├── core/ # 10 files — database models, auth, middleware, session
├── mcp_servers/ # 5 files — MCP server implementations
├── scripts/ # CLI tools and one-shot scripts
├── static/ # Frontend HTML/CSS/JS
├── tests/ # 583 test files (~54,800 lines)
└── services/ # (exists as needed)
```
### 1.2 Directory Flatness Metric
| Directory | Flat `.py` Files | Subdirectories | Concern |
|-----------|-----------------|----------------|---------|
| `src/` | **95** | 2 (`agent_tools/`, `search/`) | No domain grouping; 95 files in one directory |
| `routes/` | **54** | 0 | All route handlers in one flat directory |
| `core/` | 10 | 0 | Manageable, but `database.py` is oversized |
---
## 2. Largest Runtime Modules
### 2.1 Python Backend
| Rank | File | Lines | Classes | Functions | Risk |
|------|------|-------|---------|-----------|------|
| 1 | `src/tool_implementations.py` | **4,032** | 0 | ~48 | **HIGH** |
| 2 | `routes/email_routes.py` | **3,245** | — | — | **MEDIUM** |
| 3 | `routes/cookbook_routes.py` | **2,969** | — | — | **MEDIUM** |
| 4 | `src/agent_loop.py` | **2,961** | 0 | ~24 | **HIGH** |
| 5 | `src/task_scheduler.py` | **2,330** | — | 5 | MEDIUM |
| 6 | `routes/model_routes.py` | **2,266** | — | — | MEDIUM |
| 7 | `core/database.py` | **2,265** | 28 | ~59 helpers | **HIGH** |
| 8 | `src/builtin_actions.py` | **2,262** | 2 | ~24 | MEDIUM |
| 9 | `mcp_servers/email_server.py` | 2,197 | — | — | LOW (separate process) |
| 10 | `src/llm_core.py` | **2,164** | — | — | MEDIUM |
| 11 | `src/visual_report.py` | 1,918 | — | — | LOW |
| 12 | `routes/gallery_routes.py` | 1,896 | — | — | LOW |
| 13 | `src/ai_interaction.py` | 1,846 | — | — | MEDIUM |
| 14 | `routes/document_routes.py` | 1,717 | — | — | LOW |
| 15 | `routes/skills_routes.py` | 1,648 | — | — | LOW |
**Heuristic**: Files > 2,000 lines with 20+ public symbols and many importers are the highest-risk splits. Files of 1,000–2,000 lines are medium-risk if tightly coupled.
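
The heuristic above can be sketched as a small classifier. This is only an illustration, not a tool that exists in the repository: `ModuleStats`, `split_risk`, the `many_importers` threshold of 15, and the `tightly_coupled` flag are all assumptions introduced here, since the inventory does not quantify "many importers" or "tightly coupled". It also does not reproduce the hand-assigned ratings in the table, which weigh extra context such as whether a module runs in a separate process.

```python
from dataclasses import dataclass


@dataclass
class ModuleStats:
    """Hypothetical per-file metrics; field names are illustrative."""
    path: str
    lines: int
    public_symbols: int
    importers: int
    tightly_coupled: bool = False  # judgment call, not measured


def split_risk(m: ModuleStats, many_importers: int = 15) -> str:
    """Apply the size/symbols/importers heuristic to one module.

    `many_importers` is an assumed cutoff; the inventory gives no number.
    """
    if m.lines > 2000 and m.public_symbols >= 20 and m.importers >= many_importers:
        return "HIGH"
    if m.lines >= 1000 and m.tightly_coupled:
        return "MEDIUM"
    return "LOW"


# Figures taken from the table and importer counts in this inventory.
print(split_risk(ModuleStats("src/tool_implementations.py", 4032, 48, 17)))
print(split_risk(ModuleStats("src/agent_loop.py", 2961, 24, 22)))
```

With the assumed threshold, both modules above come out as `HIGH`, which agrees with the table; other rows may differ because the table's ratings are editorial.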
### 2.2 Frontend
| File | Lines | Concern |
|------|-------|---------|
| `static/style.css` | **36,653** | Entire app CSS in one file (tracked separately in #2617) |
| `static/js/document.js` | **9,776** | Single JS file for document functionality |
| `static/js/slashCommands.js` | 6,498 | |
| `static/js/settings.js` | 5,266 | |
| `static/js/emailLibrary.js` | 5,217 | |
| `static/js/notes.js` | 5,124 | |
| `static/js/chat.js` | 4,985 | |
| `static/app.js` | 4,090 | |
**Note**: Frontend modularization is tracked separately in #2617 (CSS) and is not the focus of this Phase 0 inventory. Frontend is listed here for completeness but follow-up slices should target Python backend boundaries first.
---
## 3. Import Dependency Graph
### 3.1 Who Depends on `core/database.py`
**102 files** import from `core.database` — this is the most depended-upon module:
- All route handlers (`routes/*.py`)
- Most `src/*.py` files
- `core/session_manager.py`, `core/auth.py`
- Multiple test files
**Implication**: Any split of `core/database.py` is the highest-risk refactor. It should be tackled **last**, never first.
### 3.2 Who Depends on `src/tool_implementations.py`
**17 files** import from `src.tool_implementations`:
- `src/agent_loop.py`, `src/builtin_actions.py`, `src/tool_index.py`
- `src/task_scheduler.py`, `src/tool_policy.py`
- Various tests
### 3.3 Who Depends on `src/agent_loop.py`
**22 files** import from `src.agent_loop`:
- `src/tool_policy.py`, `src/teacher_escalation.py`, `src/bg_monitor.py`
- `src/task_scheduler.py`
- Multiple test files
### 3.4 Cross-Layer Import Violations
**`src/` importing from `routes/`** (backwards dependency — domain logic depending on HTTP layer):
```
src/tool_implementations.py ──→ routes/calendar_routes.py
src/tool_implementations.py ──→ routes/cookbook_helpers.py
src/tool_implementations.py ──→ routes/email_helpers.py
src/tool_implementations.py ──→ routes/email_pollers.py
src/tool_implementations.py ──→ routes/email_routes.py
src/tool_implementations.py ──→ routes/model_routes.py
src/tool_implementations.py ──→ routes/note_routes.py
src/tool_implementations.py ──→ routes/prefs_routes.py
```
> These are **runtime imports** (inside function bodies, not at module top), which mitigates circular import risk but indicates fuzzy layer boundaries. Function-level inline imports from the HTTP layer into business logic are a code smell.
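Function-level imports like these are easy to miss with a top-of-file scan. As a rough sketch (not the method used for this inventory), the standard-library `ast` module can list imports of a given package that occur inside function bodies. The function name `inline_imports` and the sample symbol `helper` are hypothetical; relative imports are ignored, and an import inside a nested function is reported once per enclosing function.

```python
import ast


def inline_imports(source: str, package: str) -> list:
    """List (function_name, module) pairs for imports of `package` inside functions."""
    found = []
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.ImportFrom) and node.module:
                modules = [node.module]
            elif isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            else:
                continue
            for mod in modules:
                # Match the package itself or any of its submodules.
                if mod == package or mod.startswith(package + "."):
                    found.append((func.name, mod))
    return found


if __name__ == "__main__":
    sample = (
        "import os\n"
        "\n"
        "def do_manage_notes():\n"
        "    from routes.note_routes import helper  # hypothetical symbol\n"
        "    return helper()\n"
    )
    print(inline_imports(sample, "routes"))
```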
**Import counts (top-level)**:
| Direction | Count | Notes |
|-----------|-------|-------|
| `routes/` → `src/` | **374** | Expected: HTTP handlers call domain logic |
| `routes/` → `core/` | **126** | Expected: handlers access DB models |
| `src/` → `routes/` | **31** | **Unexpected**: domain logic reaching into HTTP layer (direct grep of import lines referencing `routes/`) |
| `src/` → `core/` | **106** | Acceptable but could be reduced with a data-access layer |
> **How the metrics in this document are computed** — recompute against current `dev` before treating any count as authoritative (the tree drifts; these numbers are a snapshot, not a live value):
> - `src/` flat `.py` files: `find src -maxdepth 1 -name '*.py' | wc -l`
> - `tests/` test files: `find tests -name 'test_*.py' | wc -l`
> - `core.database` importers: `grep -rlE '(from|import) +core\.database' --include='*.py' . | grep -v core/database.py | wc -l`
> - `src.agent_loop` importers: `grep -rlE '(from|import) +src\.agent_loop' --include='*.py' . | grep -v src/agent_loop.py | wc -l`
> - Cross-layer import lines: `grep -rhE '(from|import) +<pkg>' --include='*.py' <dir>/ | wc -l` (e.g. `(from|import) +routes` over `src/`)
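For environments without `grep`, the importer counts can be approximated in Python. This is a sketch under stated assumptions, not the command used for the numbers above: `count_importers` is a hypothetical helper that applies the same `(from|import) +<module>` pattern to every `.py` file under a root and skips the module's own file. Unlike `grep -v core/database.py`, it excludes only the exact path, so results may differ slightly on trees with similarly named files.

```python
import re
import tempfile
from pathlib import Path


def count_importers(root: Path, module: str) -> int:
    """Count .py files under `root` containing `(from|import) +<module>`."""
    pattern = re.compile(r"(from|import) +" + re.escape(module))
    own_path = root / (module.replace(".", "/") + ".py")
    count = 0
    for path in root.rglob("*.py"):
        if path == own_path:
            continue  # do not count the module as its own importer
        text = path.read_text(encoding="utf-8", errors="replace")
        if pattern.search(text):
            count += 1
    return count


if __name__ == "__main__":
    # Throwaway tree so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "core").mkdir()
        (root / "core" / "database.py").write_text("import core.database\n")
        (root / "a.py").write_text("from core.database import Session\n")
        (root / "b.py").write_text("import os\n")
        print(count_importers(root, "core.database"))
```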
---
## 4. Route Ownership Map
Routes can be grouped into logical feature domains. Current flat structure obscures these boundaries:
| Domain | Route Files | Total Lines | Review Complexity |
|--------|-------------|-------------|-------------------|
| **Email** | `email_routes.py`, `email_helpers.py`, `email_pollers.py` | 5,936 | HIGH — most complex domain |
| **Chat / Agent** | `chat_routes.py`, `chat_helpers.py`, `shell_routes.py`, `codex_routes.py`, `skills_routes.py` | 6,365 | HIGH — core interaction surface |
| **Cookbook** | `cookbook_routes.py`, `cookbook_helpers.py`, `cookbook_output.py` | 4,110 | MEDIUM |
| **Model / LLM** | `model_routes.py`, `assistant_routes.py`, `copilot_routes.py` | 2,764 | MEDIUM |
| **Calendar / Contacts** | `calendar_routes.py`, `contacts_routes.py` | 2,336 | MEDIUM |
| **Documents** | `document_routes.py`, `document_helpers.py` | 1,954 | LOW |
| **Auth** | `auth_routes.py`, `api_token_routes.py`, `device_flow.py` | 1,171 | LOW |
| **Tasks** | `task_routes.py` (standalone) | 1,157 | LOW |
| **Session** | `session_routes.py` (standalone) | 1,287 | LOW |
| **Gallery** | `gallery_routes.py`, `gallery_helpers.py` | 1,896 | LOW |
| **Memory** | `memory_routes.py` | — | LOW |
| **Research** | `research_routes.py` | — | LOW |
| **MCP** | `mcp_routes.py` | — | LOW |
| **Notes** | `note_routes.py` | — | LOW |
| **Other** | `prefs_routes.py`, `upload_routes.py`, `vault_routes.py`, `webhook_routes.py`, `workspace_routes.py`, `search_routes.py`, `history_routes.py`, `hwfit_routes.py`, `preset_routes.py`, `signature_routes.py`, `backup_routes.py`, `cleanup_routes.py`, `diagnostics_routes.py`, `embedding_routes.py`, `emoji_routes.py`, `font_routes.py`, `stt_routes.py`, `tts_routes.py`, `compare_routes.py`, `personal_routes.py`, `editor_draft_routes.py`, `admin_wipe_routes.py`, `chatgpt_subscription_routes.py` | 2,000+ | LOW individual, HIGH cumulative |
---
## 5. Tool Registry & Implementation Boundaries
### 5.1 Current Tool Architecture
| Component | File | Lines | Role |
|-----------|------|-------|------|
| Tool schemas | `src/tool_schemas.py` | 1,392 | JSON Schema tool definitions (Duck-TypedDict) |
| Tool index | `src/tool_index.py` | 542 | RAG-based tool retrieval from ChromaDB |
| Tool implementations | `src/tool_implementations.py` | 4,032 | 33 `do_*` functions — all tool execution logic |
| Tool security | `src/tool_security.py` | — | Owner-scoped tool blocking |
| Tool policy | `src/tool_policy.py` | — | Guide-only directive, plan-mode disabled tools |
| Tool utils | `src/tool_utils.py` | — | Shared tool helpers |
### 5.2 Tool Implementation Categories
The 33 `do_*` functions in `tool_implementations.py` fall into natural domain groups — the basis for slice 1's split in §6.2:
| Category | `do_*` functions | Count |
|----------|------------------|-------|
| **System / config** | `do_manage_skills`, `do_manage_tasks`, `do_manage_endpoints`, `do_manage_mcp`, `do_manage_webhooks`, `do_manage_tokens`, `do_manage_settings`, `do_api_call`, `do_app_api` | 9 |
| **Cookbook / model serving** | `do_download_model`, `do_serve_model`, `do_list_served_models`, `do_stop_served_model`, `do_tail_serve_output`, `do_list_downloads`, `do_cancel_download`, `do_search_hf_models`, `do_adopt_served_model`, `do_list_cookbook_servers`, `do_list_serve_presets`, `do_serve_preset`, `do_list_cached_models` | 13 |
| **Notes** | `do_manage_notes` | 1 |
| **Calendar** | `do_manage_calendar` | 1 |
| **Search** | `do_search_chats` | 1 |
| **Research** | `do_manage_research`, `do_trigger_research` | 2 |
| **Contacts** | `do_resolve_contact`, `do_manage_contact` | 2 |
| **Vault** | `do_vault_search`, `do_vault_get`, `do_vault_unlock` | 3 |
| **Image** | `do_edit_image` | 1 |
| | **Total** | **33** |
> Low-level tools (filesystem, subprocess, web fetch, document parsing) live in `src/agent_tools/`, **not** in `tool_implementations.py` — out of scope for this split.
---
## 6. Risk Assessment & Candidate Slice Ranking
> **Candidate proposals, not a committed plan.** The rankings, package shapes (e.g. `src/pkg/`, `src/domain/`, `src/infra/`, `src/api/`), split ordering, and route-grouping strategy below are **options for maintainer discussion**. Per #4082/#4071, slice ownership and order are settled by maintainers before any follow-up PR. §1–§3 above are the factual current-state inventory.
### 6.1 Risk Scale
| Level | Criteria |
|-------|----------|
| **LOW** | File has ≤3 importers AND ≤500 lines, OR is a pure refactor with clear boundaries |
| **MEDIUM** | File has 4–15 importers OR 500–1,500 lines |
| **HIGH** | File has 16+ importers OR >2,000 lines, OR has cross-layer import violations |
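Read literally, the scale can be written as a small function. The sketch below is one possible reading, with assumptions stated up front: HIGH is checked first, anything that is neither HIGH nor LOW falls through to MEDIUM (which also covers the 1,500–2,000 line range the table does not mention), and the "pure refactor with clear boundaries" clause is a judgment call that is not modeled. The name `classify_risk` is hypothetical.

```python
def classify_risk(importers: int, lines: int, cross_layer: bool = False) -> str:
    """Map importer count, line count, and cross-layer status to a risk level."""
    # HIGH wins whenever any one of its criteria is met.
    if cross_layer or importers >= 16 or lines > 2000:
        return "HIGH"
    # LOW requires both the importer and the size condition.
    if importers <= 3 and lines <= 500:
        return "LOW"
    # Assumption: everything in between is MEDIUM.
    return "MEDIUM"


if __name__ == "__main__":
    print(classify_risk(importers=102, lines=2265))  # figures for core/database.py
    print(classify_risk(importers=2, lines=300))
```

The function applies the table mechanically; the rankings in §6.2 also weigh factors such as existing natural boundaries, so they need not match its output.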
### 6.2 Ranked Split Candidates
| Priority | Target | Risk | Rationale |
|----------|--------|------|-----------|
| **1** | `src/tool_implementations.py` → `src/tools/*.py` | **MEDIUM** | 4,032 lines → ~10 files by tool category. Already has natural boundaries. 17 importers, tracked in #3629. Use `__init__.py` shim to keep existing imports working. |
| **2** | `routes/` → domain subdirectories (one domain per PR) | **MEDIUM** | 54 flat files. Done **one domain at a time** (e.g. a standalone PR for the email domain, then chat, …), not a broad reorganization — route modules carry helper imports, registration assumptions, and test import paths. |
| **3** | `src/agent_loop.py` → `src/agent/loop.py` + submodules | **MEDIUM-HIGH** | 2,961 lines, 24 functions. Can extract prompt building, classification, verification, and runaway detection. Tracked in #3266. |
| **4** | `src/` → `src/pkg/`, `src/domain/`, `src/infra/`, `src/api/` | **MEDIUM** | Structural reorganization. Split flat `src/` into layered packages. Must come after routes and tools are stable. |
| **5** | `routes/email_*.py` consolidation | **LOW** | Already grouped by filename prefix. Low-risk cleanup within the email domain. |
| **6** | `core/database.py` → `src/infra/database/models/*.py` | **HIGH** | 28 classes, 102 importers. Highest-risk split. Must be **last** in any sequence. Requires careful import shim strategy. |
| **7** | Frontend CSS modularization | **MEDIUM** | 36,653 lines. Tracked in #2617. Separate timeline from backend work. |
| **8** | Frontend JS modularization | **MEDIUM** | 9,776 lines in `document.js`. Introduce ES modules at minimum. |
### 6.3 Candidate First 3 Behavior-Preserving Slices
**Slice 1: Split `tool_implementations.py`** (Lowest-risk high-impact)
- Create `src/tools/` package with one file per tool category
- Add `src/tools/__init__.py` re-exporting all symbols with current names
- Update 17 importers to use new paths (can be deferred via shim)
- Validation: `python -m pytest tests/ -x -q` + manual smoke test of tool execution
- Reference: #3629
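The shim approach in Slice 1 can be illustrated with a throwaway package. The sketch below is hypothetical throughout: it writes a tiny `demo_src` tree into a temporary directory (the real package is `src`), moves one function into `demo_src/tools/search.py`, re-exports it from `demo_src/tools/__init__.py`, and keeps `demo_src/tool_implementations.py` as a one-line shim. The file name `search.py` and the signature of `do_search_chats` are assumptions for the example, not the actual layout or API.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_tree(root: Path) -> None:
    """Write a tiny package that mimics the post-split layout."""
    pkg = root / "demo_src"
    tools = pkg / "tools"
    tools.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    # New home of the implementation after the split.
    (tools / "search.py").write_text(
        "def do_search_chats(query):\n"
        "    return f'searched: {query}'\n"
    )
    # Package __init__ re-exports each symbol under its current name.
    (tools / "__init__.py").write_text(
        "from .search import do_search_chats\n"
        "__all__ = ['do_search_chats']\n"
    )
    # Old module path kept as a thin shim so existing imports keep working.
    (pkg / "tool_implementations.py").write_text(
        "from demo_src.tools import *  # noqa: F401,F403\n"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_tree(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            legacy = importlib.import_module("demo_src.tool_implementations")
            print(legacy.do_search_chats("hello"))
        finally:
            sys.path.remove(tmp)
```

Because the star import honors `__all__`, the old module path only exposes names the new package chooses to re-export, which keeps the shim's surface explicit.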
**Slice 2: Group `routes/` by domain** (one domain per PR, not a broad sweep)
Route modules carry helper imports, router registration assumptions, and test import paths, so this must be done **one domain at a time** rather than as a single reorganization PR. Example sequence (each its own PR):
- PR 2a: move the **email** domain (`email_routes.py`, `email_helpers.py`, `email_pollers.py`) → `routes/email/` + shim
- PR 2b: move the **chat/agent** domain → `routes/chat/` + shim
- PR 2c: move the **cookbook** domain → `routes/cookbook/` + shim
- …and so on per domain from §4
Each PR adds an `__init__.py` re-exporting the old names, updates the `app.py` router imports, and is validated by confirming that `python app.py` starts clean. **No behavior change** — pure file reorganization.
**Slice 3: Extract `agent_loop.py` submodules** (Improve reviewability)
- Move prompt assembly → `src/agent/prompt.py`
- Move request classification → `src/agent/classifier.py`
- Move sub-agent verification → `src/agent/verifier.py`
- Move runaway detection → `src/agent/runaway.py`
- Move context management → `src/agent/context.py`
- Keep `src/agent/loop.py` as the main orchestration module
- Validation: `python -m pytest tests/test_agent_loop.py tests/test_loop_breaker_runaway.py -v`
---
## 7. Safety Guardrails for Follow-Up Work
Per maintainer guidance in #4082 and #4071:
- [ ] **One domain/slice per PR** — never mix multiple reorganizations
- [ ] **No behavior changes** mixed with file moves — pure reorganization only
- [ ] **Keep compatibility shims** — `__init__.py` re-exports for all existing import paths
- [ ] **Add or identify focused tests** before risky splits
- [ ] **Do not start with `core/database.py`** or broad route movement unless this inventory shows a safe boundary
- [ ] **Prefer small, reviewable slices** over large restructures
- [ ] **No packaging/runtime/tooling migration** mixed into file moves
- [ ] **No frontend framework migration** inside this stabilization lane
- [ ] **Validate with `python -m compileall`** — every PR must pass CI checks
- [ ] **Validate with `pytest`** — run the full test suite before opening each PR
---
## 8. Validation Commands
Each follow-up PR should be verifiable with these commands before submission:
```bash
# Syntax check — must pass with zero errors
python -m compileall src/ routes/ core/ conf/
# Full test suite — must match baseline pass rate
python -m pytest tests/ -x -q
# Import shim verification — existing import paths must still work
python -c "from src.tool_implementations import do_search_chats; print('OK')"
# App startup smoke test (if backend touched)
timeout 5 python app.py 2>&1 | head -5 || true
```
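The one-line import check above covers a single symbol. When a slice moves many symbols, a small script can check a list of legacy paths at once. This is a sketch, not part of the repository: `check_import_paths` is a hypothetical helper, and the demo uses standard-library modules as stand-ins so it runs outside the project. In the repository the list would hold pairs such as `("src.tool_implementations", "do_search_chats")`.

```python
import importlib


def check_import_paths(paths: list) -> list:
    """Return (module, symbol, reason) for every legacy path that fails to resolve."""
    failures = []
    for module_name, symbol in paths:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            failures.append((module_name, symbol, f"import failed: {exc}"))
            continue
        if not hasattr(module, symbol):
            failures.append((module_name, symbol, "symbol missing"))
    return failures


if __name__ == "__main__":
    # Stand-in paths; replace with the project's legacy import paths.
    legacy_paths = [("json", "dumps"), ("json", "no_such_symbol")]
    for failure in check_import_paths(legacy_paths):
        print(failure)
```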
---
## 9. Open Questions
1. Is `#2538` (specs ground truth) the canonical behavior map baseline, and should this inventory be kept in sync with those specs once merged?
2. Should route grouping follow the domain map proposed here, or is there a different taxonomy preferred by maintainers?
3. For the `tool_implementations.py` split (#3629), is the tool categorization in §5.2 acceptable, or should it follow a different grouping?
4. Should compatibility shims (`__init__.py`) be temporary (removed in a follow-up wave) or permanent?
5. Should an ADR (Architecture Decision Record) document be started to track decisions made during this process?
---
## 10. Future Direction (NOT current state)
The following are **future refactor targets** (candidate directions **pending maintainer agreement**, not committed), recorded here so this inventory does not imply they exist today. None of them are present in the current `dev` tree:
- `main.py` — proposed rename of the `app.py` entrypoint. Today the app boots via `app.py`.
- `src/agent/` — proposed package to hold `agent_loop.py` submodules (prompt/classifier/verifier/runaway/context). Today `agent_loop.py` is a single flat file in `src/`.
- `src/infra/`, `src/domain/`, `src/pkg/`, `src/api/` — proposed layered reorganization of the flat `src/` directory (slice 4 in §6).
These become real only when the corresponding slices land.
---
## Appendix A: File Listing
### `src/` (95 files — 61 shown; run `ls src/*.py` for the full list)
```
agent_loop.py tool_implementations.py tool_schemas.py
tool_index.py tool_security.py tool_policy.py
tool_utils.py builtin_actions.py task_scheduler.py
llm_core.py model_context.py model_discovery.py
session_search.py context_budget.py context_compactor.py
ai_interaction.py action_intents.py agent_runs.py
app_helpers.py app_initializer.py config.py
database.py memory.py memory_provider.py
secret_storage.py prompt_security.py url_security.py
url_safety.py rate_limiter.py cleanup_service.py
readiness.py service_health.py exceptions.py
request_models.py assistant_log.py bg_monitor.py
builtin_mcp.py chat_helpers.py chroma_client.py
document_processor.py embedding_lanes.py deep_research.py
research_handler.py research_utils.py personal_docs.py
rag_manager.py rag_singleton.py topic_analyzer.py
visual_report.py youtube_handler.py pdf_forms.py
pdf_form_doc.py pdf_runtime.py caldav_writeback.py
email_thread_parser.py text_helpers.py user_time.py
teacher_escalation.py cookbook_serve_lifecycle.py
chatgpt_subscription.py mcp_manager.py
```
### `routes/` (54 files)
```
__init__.py _validators.py
auth_routes.py api_token_routes.py device_flow.py
chat_routes.py chat_helpers.py shell_routes.py
codex_routes.py skills_routes.py
email_routes.py email_helpers.py email_pollers.py
cookbook_routes.py cookbook_helpers.py cookbook_output.py
model_routes.py assistant_routes.py copilot_routes.py
calendar_routes.py contacts_routes.py
document_routes.py document_helpers.py
gallery_routes.py gallery_helpers.py
task_routes.py session_routes.py
note_routes.py memory_routes.py research_routes.py
mcp_routes.py search_routes.py history_routes.py
webhook_routes.py workspace_routes.py upload_routes.py
vault_routes.py prefs_routes.py preset_routes.py
signature_routes.py personal_routes.py hwfit_routes.py
backup_routes.py cleanup_routes.py diagnostics_routes.py
embedding_routes.py emoji_routes.py font_routes.py
stt_routes.py tts_routes.py compare_routes.py
editor_draft_routes.py chatgpt_subscription_routes.py admin_wipe_routes.py
```
### `core/` (10 files)
```
__init__.py constants.py database.py models.py
auth.py middleware.py session_manager.py exceptions.py
atomic_io.py platform_compat.py
```
---
## Appendix B: Key Import Relationships
```
core/database.py ←── 102 importers (routes/*, src/*, core/*, tests/*)
├── routes/auth_routes.py
├── routes/email_routes.py
├── src/builtin_actions.py
├── src/task_scheduler.py
├── src/tool_implementations.py (inline)
└── ...97 more
src/tool_implementations.py ←── 17 importers
├── src/agent_loop.py
├── src/builtin_actions.py
├── src/tool_index.py
├── src/task_scheduler.py
├── src/tool_policy.py
└── ...12 more (mostly tests)
src/agent_loop.py ←── 22 importers
├── src/tool_policy.py
├── src/teacher_escalation.py
├── src/bg_monitor.py
├── src/task_scheduler.py
└── 18 more (incl. tests)
```
+543 -46
View File
@@ -267,6 +267,10 @@ _DOMAIN_RULES = {
- Use `resolve_contact` to look up a contact's email or phone number by name. Searches the CardDAV address book and sent email history.
- Use `manage_contact` to list, add, update, or delete contacts in the address book.
- Do NOT use `manage_memory` for contact lookups — contact details live in the address book, not memory.""",
"integrations": """\
## Integration/API rules
- To query or control a configured service integration (Home Assistant, Miniflux, Gitea, Linkding, Jellyfin, or any other registered service), use `api_call` with the integration name, HTTP method, path, and optional JSON body.
- Do not use shell, curl, or `app_api` to reach a user's connected integration when `api_call` is available.""",
}
_DOMAIN_TOOL_MAP = {
@@ -277,9 +281,10 @@ _DOMAIN_TOOL_MAP = {
"notes_calendar_tasks": {"manage_notes", "manage_calendar", "manage_tasks"},
"ui": {"ui_control"},
"sessions": {"create_session", "list_sessions", "manage_session", "send_to_session", "search_chats"},
"files": {"bash", "python", "read_file", "write_file", "edit_file", "grep", "glob", "ls", "get_workspace"},
"files": {"bash", "python", "read_file", "write_file", "edit_file", "grep", "glob", "ls", "get_workspace", "manage_bg_jobs"},
"settings": {"manage_settings", "manage_endpoints", "manage_mcp", "manage_webhooks", "manage_tokens", "app_api"},
"contacts": {"resolve_contact", "manage_contact"},
"integrations": {"api_call"},
}
def _domain_rules_for_tools(tool_names: set) -> list[str]:
@@ -408,7 +413,7 @@ Generate an image. Line 1 = description, line 2 = model name, line 3 = WxH (e.g.
"ask_teacher": "- ```ask_teacher``` — Escalate a hard question to a more capable model. Line 1 = model name or 'auto', rest = the question. Use when stuck or need expert knowledge.",
"list_models": "- ```list_models``` — Show all available AI models across all endpoints. Use when user asks what models are available.",
"manage_session": "- ```manage_session``` — Rename, archive, delete, fork, switch, or `list` chats (the UI calls them 'chats'; 'session' is internal). Line 1 = action (list/switch/rename/archive/unarchive/delete/important/unimportant/truncate/fork), Line 2 = exact chat id from `list_sessions` (or `current` where supported). For delete/archive/truncate, always list first and reuse the exact id; never invent placeholder ids. `switch`/`open` returns a clickable anchor link the user can tap to open the chat — use for \"open my X chat\".",
"manage_memory": "- ```manage_memory``` — Manage the user's persistent memory (facts, identity, preferences, context that persists across chats). Line 1 = action (list/add/edit/delete/search), rest = content. Use when user says 'remember this', states identity facts like 'my name is <name>' / 'call me <name>' / 'I live in <place>', or asks about stored memories.",
"manage_memory": "- ```manage_memory``` — Manage the user's persistent memory (facts about the USER themselves, their preferences, context that persists across chats). Line 1 = action (list/add/edit/delete/search), rest = content. Use when user says 'remember this' about themselves, states identity facts like 'my name is <name>' / 'call me <name>' / 'I live in <place>', or asks about stored memories. DO NOT use for info about another person (their address, phone, email, birthday) — that goes in `manage_contact`. If the user pastes an address/phone with a name and says 'save this for <person>', use `manage_contact add` with the address arg, NOT manage_memory.",
"manage_skills": "- ```manage_skills``` — Skill registry (SKILL.md format). Args (JSON): {\"action\": \"list|view|view_ref|search|add|edit|patch|publish|delete\", ...}. `list` returns the index of available skills (published + teacher-escalation drafts); `view name=foo` fetches the full SKILL.md; `view_ref name=foo path=...` loads a reference file under the skill directory. For `add`, provide an explicit kebab-case `name` and only report the exact returned name, because storage may normalize or dedupe it. Use this BEFORE doing domain work — there may already be a procedure (published or draft) that prescribes the correct steps. Drafts written by the teacher loop are authoritative guidance even though they're not yet published.",
"manage_tasks": "- ```manage_tasks``` — Create and manage scheduled background tasks (recurring AI jobs). Args (JSON): {\"action\": \"list|create|edit|delete|pause|resume|run\", ...}",
"manage_endpoints": "- ```manage_endpoints``` — Add, remove, or configure AI model API endpoints. Args (JSON): {\"action\": \"list|add|delete|enable|disable\", ...}. Use when user wants to add a new AI provider.",
@@ -428,7 +433,9 @@ Notes, checklists, AND user reminders. Use this for "create/add/write a note", t
```send_email
{"to": "recipient@example.com", "subject": "Re: Your question", "body": "Hi, ...", "account": "gmail"}
```
Send a new email via SMTP. Use `resolve_contact` first if you only have a name. If multiple email accounts exist, call `list_email_accounts` first and pass the chosen `account`.""",
Send a new email via SMTP. Use `resolve_contact` first if you only have a name. If multiple email accounts exist, call `list_email_accounts` first and pass the chosen `account`.
CRITICAL — signatures: DO NOT invent a sign-off name. End the body with just `Thanks,` or similar — never type a person's name unless the user explicitly told you what to sign as. When `agent_email_confirm` is on (default), the tool returns `{pending: true, pending_id: ...}` and stages the email for the user to approve in the chat UI instead of SMTPing immediately.""",
"list_emails": """\
```list_emails
{"folder": "INBOX", "max_results": 20, "unread_only": false, "account": "gmail"}
@@ -439,7 +446,9 @@ List recent emails from a folder, newest first, including read messages by defau
```reply_to_email
{"uid": "1234", "body": "Sounds good — talk Friday.", "account": "gmail"}
```
SEND a reply email immediately by UID. Do not use this for "open a reply" or "start a reply" — those should use `ui_control` with `open_email_reply <uid> <folder> reply` to open the email draft document. For follow-up requests like "reply ..." after reading/listing email where the user clearly wants to send now, use the exact UID and account from the latest `read_email`/`list_emails` result. Never invent UID `1`. Threads automatically (In-Reply-To/References handled).""",
SEND a reply email immediately by UID. Do not use this for "open a reply" or "start a reply" — those should use `ui_control` with `open_email_reply <uid> <folder> reply` to open the email draft document. For follow-up requests like "reply ..." after reading/listing email where the user clearly wants to send now, use the exact UID and account from the latest `read_email`/`list_emails` result. Never invent UID `1`. Threads automatically (In-Reply-To/References handled).
CRITICAL — signatures: DO NOT invent a sign-off name. End the body with just `Thanks,` or similar — never type a person's name unless the user explicitly told you what to sign as. When `agent_email_confirm` is on (default), the tool returns `{pending: true, pending_id: ...}` and stages the email for the user to approve in the chat UI instead of SMTPing immediately.""",
"bulk_email": """\
```bulk_email
{"action": "delete", "uids": ["10997", "10998"], "folder": "INBOX", "account": "Gmail"}
@@ -449,7 +458,7 @@ Bulk delete/archive/mark emails. Use this for "delete all those" after listing e
"archive_email": "- ```archive_email``` — Archive one email by UID. Args (JSON): {\"uid\":\"...\", \"folder\":\"INBOX\", \"account\":\"Gmail\"}. For multiple messages use bulk_email.",
"mark_email_read": "- ```mark_email_read``` — Mark one email read/unread. Args (JSON): {\"uid\":\"...\", \"read\":true, \"folder\":\"INBOX\", \"account\":\"Gmail\"}. For multiple messages use bulk_email.",
"resolve_contact": "- ```resolve_contact``` — Look up a contact's email by name. Searches CardDAV address book + sent email history. Args (JSON): {\"name\": \"...\"}. Use BEFORE send_email when the user gives only a name.",
"manage_contact": "- ```manage_contact``` — Create/update/delete/list CardDAV contacts. Args (JSON): {\"action\": \"list|add|update|delete\", \"name\": \"...\", \"email\": \"...\", \"uid\": \"...\"}. Use only for explicit address-book/contact requests with contact details. Do NOT use for user identity facts like 'my name is <name>'; save those with manage_memory. For update/delete, call action=list first to get the uid.",
"manage_contact": "- ```manage_contact``` — Create/update/delete/list CardDAV contacts. Args (JSON): {\"action\": \"list|add|update|delete\", \"name\": \"...\", \"email\": \"...\", \"phones\": [...], \"address\": \"...\", \"uid\": \"...\"}. Use for info about another person: email, phone, postal address. For 'save this for <person>' / address paste / phone next to a name, use this — NOT manage_memory. Do NOT use for user identity facts ('my name is X'); those are manage_memory. For update/delete, call action=list first for the uid.",
"manage_calendar": """\
```manage_calendar
{"action": "create_event", "summary": "<event title>", "dtstart": "<natural language or ISO datetime>"}
@@ -520,7 +529,7 @@ def get_builtin_overrides() -> dict:
        ov = get_setting("builtin_tool_overrides", {})
        return ov if isinstance(ov, dict) else {}
    except Exception as e:
        logger.warning('Failed to load builtin tool overrides: %s', e)
        logger.warning("Failed to load builtin tool overrides, using defaults", exc_info=e)
        return {}
@@ -532,17 +541,44 @@ def _section_text(name: str, default: str) -> str:
return val if isinstance(val, str) and val.strip() else default
def _compact_tool_line(name: str, section: str) -> str:
    """One-line fenced-tool usage hint for compact/local prompts."""
    text = (section or "").strip()
    if not text:
        return f"- `{name}`"
    if text.startswith("- "):
        return text
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    usage = []
    in_fence = False
    for ln in lines:
        if ln.startswith("```"):
            usage.append(ln)
            in_fence = not in_fence
            if len(usage) >= 3:
                break
            continue
        if in_fence and len(usage) < 3:
            usage.append(ln)
    if usage:
        return f"- `{name}` — " + " ".join(usage)
    return f"- `{name}` — " + lines[0][:160]
def _assemble_prompt(tool_names: set, disabled_tools: set = None, compact: bool = False) -> str:
    """Build the system prompt with only the specified tools included."""
    disabled = disabled_tools or set()
    included = tool_names - disabled
    if compact:
        tool_list = ", ".join(sorted(included)) if included else "none"
        tool_lines = []
        for name, _default_section in TOOL_SECTIONS.items():
            if name in included:
                tool_lines.append(_compact_tool_line(name, _section_text(name, _default_section)))
        parts = [
            "You are an AI assistant with tool access.",
            f"Available tools: {tool_list}.",
            _API_AGENT_RULES,
            _AGENT_PREAMBLE,
            "## Available tools\n" + ("\n".join(tool_lines) if tool_lines else "none"),
            _AGENT_RULES,
        ]
        parts.extend(_domain_rules_for_tools(included))
        return "\n\n".join(parts)
@@ -608,11 +644,6 @@ _API_HOSTS = frozenset([
"api.perplexity.ai", "api.x.ai",
"ollama.com", "api.venice.ai", "api.kimi.com",
"api.githubcopilot.com",
# Local OpenAI-compatible endpoints (llama.cpp, vLLM, LM Studio, etc.).
# Without these, `_is_api_model` falls back to keyword sniffing on the
# model name, so well-behaved local servers don't get native tool
# schemas and the agent silently degrades to fenced-block parsing.
"localhost", "127.0.0.1", "host.docker.internal",
])
_MCP_KEYWORDS = frozenset(["mcp", "browse", "browser", "website", "calendar", "event", "email",
"gmail", "screenshot", "navigate", "click", "miniflux", "rss", "feed"])
@@ -640,6 +671,28 @@ def _is_ollama_openai_compat_url(endpoint_url: str) -> bool:
return parsed.port == 11434 and (path == "/v1" or path.startswith("/v1/"))
def _is_local_openai_compat_url(endpoint_url: str) -> bool:
    try:
        parsed = urlparse(endpoint_url or "")
    except Exception:
        return False
    host = (parsed.hostname or "").lower()
    path = (parsed.path or "").rstrip("/")
    if not (path == "/v1" or path.startswith("/v1/")):
        return False
    if host in {"localhost", "127.0.0.1", "0.0.0.0", "host.docker.internal"}:
        return True
    if host.startswith("192.168.") or host.startswith("10."):
        return True
    if host.startswith("172."):
        try:
            second = int(host.split(".")[1])
            return 16 <= second <= 31
        except Exception:
            return False
    return False
def _endpoint_lookup_keys(endpoint_url: str) -> List[str]:
"""Candidate ModelEndpoint.base_url keys for a runtime chat URL."""
raw = (endpoint_url or "").strip()
@@ -703,6 +756,17 @@ def _extract_last_user_message(messages: List[Dict]) -> str:
_LOW_SIGNAL_RE = re.compile(r"^[\W_]*$", re.UNICODE)
_CASUAL_OPENING_RE = re.compile(
    r"^\s*(?:h+i+|hey+|hello+|yo+|sup+|what'?s up|wass?up|hiya|howdy|"
    r"lol|lmao|haha+|hehe+|thanks?|thank you|ty|idk|dunno|meh|bruh|bro)\b(?P<tail>.*)$",
    re.IGNORECASE,
)
_CASUAL_BLOCKLIST_RE = re.compile(
    r"\b(?:cookbook|serve|serving|launch|start|vllm|sglang|llama\.?cpp|ollama|"
    r"download|model|email|document|doc|note|calendar|task|search|web|research|"
    r"file|folder|repo|git|settings?|endpoint|api|token|mcp)\b",
    re.IGNORECASE,
)
_EXPLICIT_CONTINUATION_RE = re.compile(
r"^\s*(?:"
r"yes|y|yeah|yep|ok|okay|sure|do it|go ahead|continue|carry on|"
@@ -712,6 +776,17 @@ _EXPLICIT_CONTINUATION_RE = re.compile(
r")\s*[.!?]*\s*$",
re.IGNORECASE,
)
_RETRY_CONTINUATION_RE = re.compile(
    r"\b(?:try again|retry|again|rerun|re-run|run it again|launch it again|"
    r"start it again|failed|fails?|died|crashed|broke|insta|instantly)\b",
    re.IGNORECASE,
)
_COOKBOOK_CONTEXT_RE = re.compile(
    r"\b(?:cookbook|serve|serving|served|launch|start|preset|vllm|sglang|"
    r"llama\.?cpp|ollama|download|cached models?|model servers?|running models?|"
    r"gpu box|ajax|qwen|gemma|llama|mistral|minimax)\b",
    re.IGNORECASE,
)
def _is_explicit_continuation(text: str) -> bool:
@@ -719,6 +794,37 @@ def _is_explicit_continuation(text: str) -> bool:
return bool(_EXPLICIT_CONTINUATION_RE.match(str(text or "").strip()))
def _is_casual_low_signal(text: str) -> bool:
    """True for short greetings/slang that should not inherit stale context."""
    s = str(text or "").strip()
    m = _CASUAL_OPENING_RE.match(s)
    if not m:
        return False
    tail = m.group("tail") or ""
    if _CASUAL_BLOCKLIST_RE.search(tail):
        return False
    # Allow a short vocative/address after the opener without hardcoding the
    # address term itself: "hey man", "yo dude", "sup <name>". Longer tails are
    # more likely to be an actual request and should get normal context/tooling.
    tail_words = re.findall(r"[A-Za-z0-9_'-]+", tail)
    return len(tail_words) <= 2
def _is_contextual_retry_continuation(messages: List[Dict], text: str) -> bool:
    """Treat "try again / it failed" as a continuation only for active tool work.
    These follow-ups are common after Cookbook launches: the latest user turn
    says only "try again it failed", while the actionable model/host/command
    details live one or two turns back. Keep this intentionally narrow so
    ordinary chat does not inherit stale Cookbook context.
    """
    latest = str(text or "").strip()
    if not latest or not _RETRY_CONTINUATION_RE.search(latest):
        return False
    recent = _recent_context_for_retrieval(messages, max_user=5, max_chars=1200)
    return bool(_COOKBOOK_CONTEXT_RE.search(recent))
def _assistant_requested_followup(messages: List[Dict]) -> bool:
"""True when the previous assistant turn asked for missing task details.
@@ -760,11 +866,12 @@ def _classify_agent_request(messages: List[Dict], last_user: str) -> Dict[str, o
which domain rule packs get appended to the system prompt.
"""
text = str(last_user or "").strip()
continuation = _is_explicit_continuation(text) or _assistant_requested_followup(messages)
retry_continuation = _is_contextual_retry_continuation(messages, text)
continuation = _is_explicit_continuation(text) or _assistant_requested_followup(messages) or retry_continuation
retrieval_query = _recent_context_for_retrieval(messages) if continuation else text
q = retrieval_query.lower()
if not text or bool(_LOW_SIGNAL_RE.match(text)):
if not text or bool(_LOW_SIGNAL_RE.match(text)) or _is_casual_low_signal(text):
return {
"low_signal": True,
"continuation": False,
@@ -807,10 +914,25 @@ def _classify_agent_request(messages: List[Dict], last_user: str) -> Dict[str, o
domains.add("sessions")
if has(r"\b(file|folder|directory|repo|git|grep|find in files|read file|edit file|shell|terminal|bash|python)\b"):
domains.add("files")
# Managing detached bash jobs: "kill the background job", "stop the job",
# "kill that job", "check the job output", "is the bg job done".
if (has(r"\b(background|bg)\s+(jobs?|task)\b")
or has(r"\b(kill|stop|cancel|terminate|check|tail|show|list)\b.{0,16}\bjobs?\b")
or has(r"\bjobs?\b.{0,16}\b(output|status|done|finished|running)\b")):
domains.add("files")
if has(r"\b(endpoint|api token|mcp|webhook|preference|configure|config|setting)\b"):
domains.add("settings")
if has(r"\b(contact|contacts|phone|phone number|address book|vcard)\b"):
domains.add("contacts")
# API-integration intent — calling a configured service via the api_call
# tool. Without this the #3794 repro ("Use the api_call tool to call Home
# Assistant GET /api/states") matched no domain, classified as low-signal,
# and the tool never reached the schema filter. Detect it explicitly so the
# "integrations" domain seeds api_call deterministically (see
# _DOMAIN_TOOL_MAP), independent of embedding retrieval.
if has(r"\bapi[ _]call\b", r"\bintegrations?\b",
r"\b(?:home ?assistant|miniflux|gitea|linkding|jellyfin)\b"):
domains.add("integrations")
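A small harness for the two new domain rules, assuming `has` returns true when any of its patterns matches the lowercased query (the real helper is defined outside this hunk, so that behaviour is an assumption).

```python
import re


def classify_domains(text):
    q = str(text or "").lower()

    def has(*patterns):
        # Hypothetical helper: any pattern matching the lowercased query.
        return any(re.search(p, q) for p in patterns)

    domains = set()
    # Detached bash job management maps to the "files" domain.
    if (has(r"\b(background|bg)\s+(jobs?|task)\b")
            or has(r"\b(kill|stop|cancel|terminate|check|tail|show|list)\b.{0,16}\bjobs?\b")
            or has(r"\bjobs?\b.{0,16}\b(output|status|done|finished|running)\b")):
        domains.add("files")
    # Explicit api_call / named-service intent maps to "integrations".
    if has(r"\bapi[ _]call\b", r"\bintegrations?\b",
           r"\b(?:home ?assistant|miniflux|gitea|linkding|jellyfin)\b"):
        domains.add("integrations")
    return domains
```

The patterns are purely lexical, so a query like "list jobs in Berlin" also lands in `files`. That may be an acceptable trade-off, but it is the kind of false positive a regression test could pin down.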
low_signal = not continuation and not domains
return {
@@ -839,8 +961,11 @@ def _recent_context_for_retrieval(messages: List[Dict], max_user: int = 3, max_c
if isinstance(content, list):
content = " ".join(b.get("text", "") for b in content if isinstance(b, dict))
content = (content or "").strip()
# Skip injected tool-result envelopes — role=user but not human intent.
if not content or content.startswith("[Tool execution results]"):
# Skip injected envelopes — role=user but not human intent. Tool results
# are now wrapped via untrusted_context_message (metadata.trusted=False);
# keep the legacy "[Tool execution results]" prefix for older histories.
meta = msg.get("metadata") or {}
if not content or meta.get("trusted") is False or content.startswith("[Tool execution results]"):
continue
collected.append(content)
if len(collected) >= max_user:
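The envelope filter in this hunk can be sketched as a standalone function. The newest-first iteration and the return shape are assumptions; the skip condition mirrors the line above (`metadata.trusted is False` or the legacy prefix).

```python
def human_user_turns(messages, max_user=3):
    """Collect recent human-written user turns, skipping injected envelopes."""
    collected = []
    for msg in reversed(messages):  # assumption: newest turns are scanned first
        if msg.get("role") != "user":
            continue
        content = msg.get("content")
        if isinstance(content, list):
            content = " ".join(b.get("text", "") for b in content if isinstance(b, dict))
        content = (content or "").strip()
        meta = msg.get("metadata") or {}
        # role=user but not human intent: untrusted wrappers and legacy tool results.
        if not content or meta.get("trusted") is False or content.startswith("[Tool execution results]"):
            continue
        collected.append(content)
        if len(collected) >= max_user:
            break
    return list(reversed(collected))
```

Checking `is False` rather than falsiness matters here: a message with no `trusted` key at all is still treated as a human turn.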
@@ -859,6 +984,8 @@ def _build_system_prompt(
compact: bool = False,
owner: Optional[str] = None,
suppress_local_context: bool = False,
suppress_skills: bool = False,
active_email: Optional[Dict[str, str]] = None,
) -> List[Dict]:
"""Build agent system prompt, inject MCP/document context, merge consecutive system msgs."""
global _cached_base_prompt, _cached_base_prompt_key
@@ -875,7 +1002,7 @@ def _build_system_prompt(
_ov_sig = _hl.sha256(_json.dumps(get_builtin_overrides() or {}, sort_keys=True).encode()).hexdigest()
except Exception:
_ov_sig = ""
cache_key = (frozenset(disabled_tools or []), bool(mcp_mgr), needs_admin, _rt_key, compact, _ov_sig, owner, suppress_local_context)
cache_key = (frozenset(disabled_tools or []), bool(mcp_mgr), needs_admin, _rt_key, compact, _ov_sig, owner, suppress_local_context, suppress_skills)
if _cached_base_prompt and _cached_base_prompt_key == cache_key and not active_document:
agent_prompt = _cached_base_prompt
# Skill index is user-editable (name + description), so it must never
@@ -885,6 +1012,7 @@ def _build_system_prompt(
disabled_tools, mcp_mgr, needs_admin, relevant_tools,
mcp_disabled_map=mcp_disabled_map, compact=compact, owner=owner,
suppress_local_context=suppress_local_context,
suppress_skills=suppress_skills,
)
else:
agent_prompt, _skill_index_block = _build_base_prompt(
@@ -896,6 +1024,7 @@ def _build_system_prompt(
compact=compact,
owner=owner,
suppress_local_context=suppress_local_context,
suppress_skills=suppress_skills,
)
if not active_document:
_cached_base_prompt = agent_prompt
@@ -924,8 +1053,8 @@ def _build_system_prompt(
try:
from src.user_time import current_datetime_context_message
_datetime_message = current_datetime_context_message()
except Exception:
pass
except Exception as e:
logger.warning("Failed to build datetime context message", exc_info=e)
# Document context is kept as a SEPARATE message (not merged into the tool
# prompt) so the context trimmer doesn't destroy it when truncating the
@@ -968,8 +1097,8 @@ def _build_system_prompt(
try:
from src.pdf_form_doc import find_source_upload_id
_is_form_backed = bool(find_source_upload_id(active_document.current_content or ""))
except Exception:
pass
except Exception as e:
logger.warning("Failed to detect if document is form-backed, assuming plain", exc_info=e)
if _is_form_backed:
doc_ctx = (
@@ -1051,6 +1180,66 @@ def _build_system_prompt(
else:
set_active_document(None)
# Active email reader — frontend told us the user has an email open.
# Inject a context block so "reply", "summarize this", "what does it say"
# resolve to the real UID instead of the agent inventing a fresh .md
# draft with fake headers. This is the email equivalent of _doc_message.
_email_message = None
if active_email and active_email.get("uid"):
_em_uid = active_email.get("uid", "")
_em_folder = active_email.get("folder", "INBOX")
_em_account = active_email.get("account", "")
_em_subject = active_email.get("subject", "") or "(no subject)"
_em_from = active_email.get("from", "") or "(unknown sender)"
_em_preview = (active_email.get("body_preview", "") or "").strip()
_preview_block = f"\nBody preview:\n```\n{_em_preview[:1800]}\n```" if _em_preview else ""
_acct_arg = f" {_em_account}" if _em_account else ""
email_ctx = (
f"ACTIVE EMAIL OPEN (the user has this email open in a reader window right now)\n"
f"UID: {_em_uid}\n"
f"Folder: {_em_folder}\n"
f"Account: {_em_account or '(default)'}\n"
f"From: {_em_from}\n"
f"Subject: {_em_subject}{_preview_block}\n\n"
f"CRITICAL DEFAULT — every request about email this turn refers to "
f"THIS email unless the user names a DIFFERENT specific recipient "
f"(a name, an email address, or another thread). Examples that "
f"ALL mean reply-to-the-open-email:\n"
f"'reply' / 'reply to this' / 'respond'\n"
f"'write email saying X' / 'send email saying X' / 'draft something'\n"
f"'tell them X' / 'say hi' / 'thanks' / 'ack' / 'lmk'\n"
f"'summarize it' / 'what does it say' / 'tldr'\n"
f"'forward this' / 'forward to <addr>'\n"
f"DO NOT ASK THE USER 'who do you want to send this to?' — the "
f"answer is ALWAYS the sender of the open email (above) unless they "
f"named someone else. Asking that is the wrong move every time.\n\n"
f"RULES for the open email:\n"
f"1. DRAFT a reply (default for any 'write/send/reply/tell them' "
f"request without a different recipient): call `ui_control` with "
f"`action=\"open_email_reply\"` and `extra=\"{_em_uid} {_em_folder} "
f"reply\"`. This opens the proper reply doc with To/Subject/"
f"In-Reply-To pre-filled by the backend. The user will see and edit "
f"it before sending. DO NOT `create_document` a markdown file with "
f"hand-written `To:` / `Subject:` / `In-Reply-To:` headers — that "
f"is wrong every time.\n"
f"2. SEND a reply immediately (skip the draft): call "
f"`reply_to_email` with the UID above. Only do this when the user "
f"explicitly says 'send' / 'send the reply' / 'reply and send'.\n"
f"3. READ the full body (the preview above may be truncated): "
f"call `read_email` with the UID/folder/account above.\n"
f"4. SUMMARIZE / answer questions about it: read it first, then "
f"answer in chat. Don't create a document for a summary unless "
f"the user explicitly asks for one.\n"
f"5. Never ask the user to paste the email or 'share it with you' "
f"— you already have its identity above and can read the full body.\n"
f"6. The ONLY time you ask 'who to send to?' is when the user "
f"explicitly says 'send a NEW email to someone else' or names a "
f"recipient you can't identify. A bare 'send email saying X' = the "
f"open email's sender.\n"
)
_email_message = untrusted_context_message("active email reader", email_ctx)
_email_message["_protected"] = True
# Inject writing style for any email writing path. This is deliberately
# broader than read/list: models may compose via send_email, reply_to_email,
# or ui_control open_email_reply after the first tool round.
@@ -1119,7 +1308,7 @@ def _build_system_prompt(
# few. If the teacher wrote a procedure for "open my X chat" last
# time the student failed, this is where the student finds it
# before deciding which tool to call.
if not suppress_local_context:
if not suppress_local_context and not suppress_skills:
try:
last_user = _extract_last_user_message(messages)
# Respect the user's skills-enabled toggle (mirrors memory_enabled).
@@ -1258,6 +1447,9 @@ def _build_system_prompt(
if _doc_message:
merged.insert(last_user_idx, _doc_message)
last_user_idx += 1 # the document message is now at last_user_idx
if _email_message:
merged.insert(last_user_idx, _email_message)
last_user_idx += 1
if _skills_message:
merged.insert(last_user_idx, _skills_message)
last_user_idx += 1
@@ -1283,6 +1475,7 @@ def _build_base_prompt(
compact: bool = False,
owner: Optional[str] = None,
suppress_local_context: bool = False,
suppress_skills: bool = False,
):
"""Build the agent prompt with only relevant tools included.
@@ -1292,12 +1485,18 @@ def _build_base_prompt(
from src.tool_index import ALWAYS_AVAILABLE
disabled = set(disabled_tools or [])
if not get_setting("image_gen_enabled", True):
if not get_setting("image_gen_enabled", False):
disabled.add("generate_image")
if relevant_tools is not None:
# RAG mode: include always-available + retrieved + admin (if needed)
tool_names = set(ALWAYS_AVAILABLE) | set(relevant_tools)
# RAG mode: trust the relevant_tools set as already-composed.
# get_tools_for_query starts from ALWAYS_AVAILABLE and may
# *discard* tools that conflict with the query's intent (e.g.
# drop manage_memory for clear contact-save patterns). Unioning
# ALWAYS_AVAILABLE back in here used to silently undo those
# drops. Only force-include the irreducible loop primitives
# (ask_user, update_plan) as belt-and-suspenders.
tool_names = set(relevant_tools) | {"ask_user", "update_plan"}
if needs_admin:
tool_names |= _ADMIN_TOOLS
agent_prompt = _assemble_prompt(tool_names, disabled, compact=compact)
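To make the behaviour change concrete, here is a sketch comparing the old and new composition. The set contents and the `manage_contacts` tool name are hypothetical; only the set algebra is taken from the hunk.

```python
# Hypothetical contents; the real ALWAYS_AVAILABLE and _ADMIN_TOOLS live elsewhere.
ALWAYS_AVAILABLE = {"ask_user", "update_plan", "manage_memory", "web_search"}
ADMIN_TOOLS = {"manage_settings", "manage_endpoints"}


def compose_before(relevant_tools, needs_admin=False):
    """Old behaviour: ALWAYS_AVAILABLE is unioned back in."""
    names = set(ALWAYS_AVAILABLE) | set(relevant_tools)
    if needs_admin:
        names |= ADMIN_TOOLS
    return names


def compose_after(relevant_tools, needs_admin=False):
    """New behaviour: trust the retrieved set, force only the loop primitives."""
    names = set(relevant_tools) | {"ask_user", "update_plan"}
    if needs_admin:
        names |= ADMIN_TOOLS
    return names
```

If retrieval deliberately drops `manage_memory` for a contact-save query, `compose_before` silently restores it while `compose_after` keeps it out.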
@@ -1329,7 +1528,7 @@ def _build_base_prompt(
# The caller wraps it in untrusted_context_message and ships it as a
# user-role message — same treatment as the matched-skills block.
skill_index_block = ""
if not suppress_local_context:
if not suppress_local_context and not suppress_skills:
try:
from services.memory.skills import SkillsManager
from src.constants import DATA_DIR
@@ -1488,8 +1687,14 @@ def _append_tool_results(
if round_reasoning:
msg["reasoning_content"] = round_reasoning
messages.append(msg)
# Tool output (shell/python stdout, file reads, fetched pages, email
# bodies, MCP results) is sourced from outside the server. Wrap it as
# untrusted data so prompt-injection inside a tool result is treated as
# data, not instructions — same hardening as skills (#788) and the
# web/RAG context. THREAT_MODEL.md lists tool output as a surface that
# must go through untrusted_context_message.
messages.append(
{"role": "user", "content": f"[Tool execution results]\n\n{tool_output_text}"}
untrusted_context_message("tool execution results", tool_output_text)
)
@@ -1738,6 +1943,7 @@ async def stream_agent_loop(
max_tool_calls: int = 0,
context_length: int = 0,
active_document=None,
active_email: Optional[Dict[str, str]] = None,
session_id: Optional[str] = None,
disabled_tools: Optional[Set[str]] = None,
owner: Optional[str] = None,
@@ -1747,6 +1953,7 @@ async def stream_agent_loop(
approved_plan: Optional[str] = None,
tool_policy: Optional[ToolPolicy] = None,
workspace: Optional[str] = None,
forced_tools: Optional[Set[str]] = None,
_is_teacher_run: bool = False,
) -> AsyncGenerator[str, None]:
"""Streaming agent loop generator.
@@ -1786,6 +1993,20 @@ async def stream_agent_loop(
_needs_admin = _detect_admin_intent(messages)
_last_user = _extract_last_user_message(messages)
_intent = _classify_agent_request(messages, _last_user)
_low_signal_turn = bool(_intent.get("low_signal"))
_casual_low_signal_turn = _is_casual_low_signal(_last_user)
_direct_low_signal = (
_low_signal_turn
and not bool(_intent.get("continuation"))
and not plan_mode
and not approved_plan
and not guide_only
and (_casual_low_signal_turn or active_document is None)
and (_casual_low_signal_turn or not active_email)
and (_casual_low_signal_turn or not workspace)
and not forced_tools
and not relevant_tools
)
# Tool retrieval uses the latest message by default. It inherits recent user
# turns only for continuations: explicit ones ("yes", "do it", "1"), replies to
# an assistant follow-up question, and retries of active Cookbook work.
_retrieval_query = str(_intent.get("retrieval_query") or _last_user)
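The `_direct_low_signal` gate is easier to reason about as a pure predicate. The sketch below restates it with keyword arguments (the function name and defaults are illustrative, not part of the PR) so each condition can be tested independently.

```python
def takes_direct_path(*, low_signal, continuation=False, casual=False,
                      plan_mode=False, approved_plan=None, guide_only=False,
                      active_document=None, active_email=None, workspace=None,
                      forced_tools=None, relevant_tools=None):
    """True when the turn can skip the agent loop and get a short direct reply."""
    return bool(
        low_signal
        and not continuation
        and not plan_mode
        and not approved_plan
        and not guide_only
        # Open context blocks the shortcut unless the turn is a plain greeting.
        and (casual or active_document is None)
        and (casual or not active_email)
        and (casual or not workspace)
        and not forced_tools
        and not relevant_tools
    )
```

A casual greeting still takes the shortcut with a workspace open, but an explicit UI toggle (`forced_tools`) or a caller-provided tool set always forces the full loop.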
@@ -1793,11 +2014,86 @@ async def stream_agent_loop(
"[agent-intent] latest=%r continuation=%s low_signal=%s domains=%s retrieval_query=%r",
_last_user[:120],
bool(_intent.get("continuation")),
bool(_intent.get("low_signal")),
_low_signal_turn,
sorted(_intent.get("domains") or []),
_retrieval_query[:200],
)
_mcp_disabled_map = _load_mcp_disabled_map() if mcp_mgr else {}
if _direct_low_signal:
logger.info("[agent] direct low-signal reply path for latest=%r", _last_user[:80])
direct_messages = [{"role": "user", "content": _last_user}]
direct_response = ""
direct_start = time.time()
direct_actual_model = model
real_input_tokens = 0
real_output_tokens = 0
try:
async for chunk in stream_llm_with_fallback(
[(endpoint_url, model, headers)] + list(fallbacks or []),
direct_messages,
temperature=temperature,
max_tokens=min(max_tokens or 128, 128),
prompt_type=None,
tools=None,
timeout=int(get_setting("agent_stream_timeout_seconds", 300) or 300),
session_id=session_id,
):
if chunk.startswith("data: ") and not chunk.startswith("data: [DONE]"):
try:
data = json.loads(chunk[6:])
except json.JSONDecodeError:
yield chunk
continue
if data.get("type") == "usage":
usage = data.get("data", {}) or {}
direct_actual_model = usage.get("model") or direct_actual_model
real_input_tokens += usage.get("input_tokens", 0) or 0
real_output_tokens += usage.get("output_tokens", 0) or 0
continue
if data.get("type") == "model_actual":
direct_actual_model = data.get("model") or direct_actual_model
data["requested_model"] = model
yield f"data: {json.dumps(data)}\n\n"
continue
if data.get("type") == "fallback":
direct_actual_model = data.get("answered_by") or direct_actual_model
yield chunk
continue
if "delta" in data:
if not data.get("thinking"):
direct_response += data.get("delta", "")
yield chunk
continue
yield chunk
elif chunk.startswith("event: "):
yield chunk
except Exception as _direct_err:
logger.warning("[agent] direct low-signal path failed: %s", _direct_err)
fallback = "Hey."
direct_response += fallback
yield f"data: {json.dumps({'delta': fallback})}\n\n"
if not direct_response.strip():
fallback = "Hey."
direct_response = fallback
yield f"data: {json.dumps({'delta': fallback})}\n\n"
duration = time.time() - direct_start
metrics = {
"model": direct_actual_model,
"requested_model": model,
"input_tokens": real_input_tokens or estimate_tokens(direct_messages),
"output_tokens": real_output_tokens or max(len(direct_response) // 4, 1),
"total_time": round(duration, 2),
"response_time": round(duration, 2),
"agent_rounds": 0,
"tool_calls": 0,
"direct_low_signal": True,
}
yield f"data: {json.dumps({'type': 'metrics', 'data': metrics})}\n\n"
yield "data: [DONE]\n\n"
return
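The accounting in the direct path can be checked offline with a sketch like this. It only summarises a list of SSE chunks and does not forward them, and the function name is hypothetical. The `len(text) // 4` fallback mirrors the estimate used in the metrics block above.

```python
import json


def summarize_direct_stream(chunks):
    """Accumulate visible text and token usage from 'data: ' SSE chunks."""
    text, in_tok, out_tok = "", 0, 0
    for chunk in chunks:
        if not chunk.startswith("data: ") or chunk.startswith("data: [DONE]"):
            continue
        try:
            data = json.loads(chunk[6:])
        except json.JSONDecodeError:
            continue  # the real loop forwards these chunks unchanged
        if data.get("type") == "usage":
            usage = data.get("data", {}) or {}
            in_tok += usage.get("input_tokens", 0) or 0
            out_tok += usage.get("output_tokens", 0) or 0
        elif "delta" in data and not data.get("thinking"):
            # Thinking deltas are streamed to the client but not counted as reply text.
            text += data.get("delta", "")
    return {
        "text": text,
        "input_tokens": in_tok,
        # Rough fallback when the provider reports no usage: ~4 chars per token.
        "output_tokens": out_tok or max(len(text) // 4, 1),
    }
```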
if plan_mode and mcp_mgr:
# Allow read-only MCP tools to investigate, block write/unknown ones:
# hide them from the schemas AND reject them at runtime by qualified name.
@@ -1809,11 +2105,11 @@ async def stream_agent_loop(
# RAG-based tool selection: retrieve relevant tools for this query.
# If caller provided a pre-computed set (e.g. task_scheduler), use that.
_relevant_tools = set() if guide_only else relevant_tools
_relevant_tools = relevant_tools
_t1 = time.time()
if _relevant_tools:
logger.info(f"[tool-rag] Using caller-provided relevant_tools ({len(_relevant_tools)} tools)")
if not guide_only and not _relevant_tools and bool(_intent.get("low_signal")):
if not guide_only and not _relevant_tools and _low_signal_turn:
from src.tool_index import ALWAYS_AVAILABLE
if workspace:
# An active workspace IS the file-work signal: a vague "look at the
@@ -1904,6 +2200,53 @@ async def stream_agent_loop(
if _relevant_tools is not None and active_document is not None:
_relevant_tools.update({"edit_document", "update_document", "suggest_document"})
# Per-request UI toggles are stronger than retrieval. If the user turns on
# Search, the model must see the search tools even when the latest text is a
# typo or otherwise low-signal for tool RAG.
if not guide_only and forced_tools:
if _relevant_tools is None:
from src.tool_index import ALWAYS_AVAILABLE
_relevant_tools = set(ALWAYS_AVAILABLE)
_relevant_tools.update(t for t in forced_tools if t not in (disabled_tools or ()))
# The skill index injected by _build_system_prompt tells the model to
# call `manage_skills action=view`, and Jaccard-matched skills are pasted
# into the prompt as procedures to follow — but neither path goes through
# tool selection, so the model can be handed a procedure naming tools
# (grep, read_file, ...) that aren't in its schema list. Keep the schemas
# in lockstep: manage_skills is callable whenever any skill is indexed,
# and a matched skill's declared requires_toolsets ride along with it.
if not guide_only and _relevant_tools is not None and not _low_signal_turn:
try:
from services.memory.skills import SkillsManager
from src.constants import DATA_DIR
_skills_on = True
try:
from routes.prefs_routes import _load_for_user as _load_prefs
_skills_on = (_load_prefs(owner) or {}).get("skills_enabled", True)
except Exception:
pass
_sm = SkillsManager(DATA_DIR)
_owner_skills = _sm.load(owner=owner) if _skills_on else []
if _owner_skills:
_relevant_tools.add("manage_skills")
if _retrieval_query:
# Validate against every known executable tool, not just
# TOOL_SECTIONS — code-nav tools (grep/glob/ls) ship as
# schemas without a prompt-prose section.
from src.tool_policy import known_tool_names
_known = known_tool_names()
for _sk in _sm.get_relevant_skills(
_retrieval_query, skills=_owner_skills,
threshold=0.25, max_items=3,
):
_relevant_tools.update(
t for t in (_sk.get("requires_toolsets") or [])
if t in _known
)
except Exception as _e:
logger.debug(f"[tool-rag] skill-aware tool include skipped: {_e}")
if _relevant_tools is not None:
logger.info("[agent-intent] selected_tools=%s", sorted(_relevant_tools)[:50])
@@ -1938,7 +2281,7 @@ async def stream_agent_loop(
_model_supports_tools = any(kw in _model_lc for kw in (
"gpt-4", "gpt-5", "gpt-o", "claude", "gemini", "gemma",
"qwen3", "qwen2.5", "mixtral", "mistral", "llama-3.1", "llama-3.2",
"llama-3.3", "llama-4",
"llama-3.3", "llama-4", "llama3.1", "llama3.2", "llama3.3", "llama4",
# Local-served models that follow OpenAI-style function calling
# via vLLM's `--enable-auto-tool-choice`. Belt-and-suspenders
# with the per-endpoint flag above.
@@ -1980,13 +2323,16 @@ async def stream_agent_loop(
_is_api_model = False
else:
_is_api_model = any(h in endpoint_url for h in _API_HOSTS) or _model_supports_tools
_compact_agent_prompt = _is_api_model or _is_ollama_native or _ollama_openai_compat
messages, mcp_schemas = _build_system_prompt(
messages, model, active_document, mcp_mgr, disabled_tools,
needs_admin=_needs_admin, relevant_tools=_relevant_tools,
mcp_disabled_map=_mcp_disabled_map,
compact=_is_api_model,
compact=_compact_agent_prompt,
owner=owner,
suppress_local_context=guide_only,
suppress_skills=_low_signal_turn,
active_email=active_email,
)
if plan_mode and not guide_only:
# Steer the model to investigate-then-propose. Hard tool gating handles
@@ -2071,6 +2417,14 @@ async def stream_agent_loop(
# Strip internal metadata keys before sending to the LLM API
messages = [{k: v for k, v in msg.items() if k != "_protected"} for msg in messages]
agent_prompt_tokens = estimate_tokens(messages)
logger.info(
"[agent-timing] prep_done model=%s prompt_tokens=%s context_length=%s prep=%s",
model,
agent_prompt_tokens,
context_length,
{k: round(v, 3) for k, v in prep_timings.items()},
)
yield f"data: {json.dumps({'type': 'agent_prep', 'data': {k: round(v, 3) for k, v in prep_timings.items()}})}\n\n"
full_response = ""
@@ -2167,9 +2521,17 @@ async def stream_agent_loop(
elif _is_api_model:
# Filter schemas by RAG-selected tools (if available)
if _relevant_tools:
# _build_base_prompt unions _ADMIN_TOOLS into the prompt
# sections when admin intent fires — the schema list must
# offer the same names, or the model reads prose describing
# tools it cannot call and substitutes the nearest schema
# it does have (e.g. manage_memory for manage_skills).
_schema_names = set(_relevant_tools)
if _needs_admin:
_schema_names |= _ADMIN_TOOLS
base_schemas = [
s for s in FUNCTION_TOOL_SCHEMAS
if s.get("function", {}).get("name") in _relevant_tools
if s.get("function", {}).get("name") in _schema_names
]
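A sketch of the schema filter with the admin union applied, assuming schemas follow the OpenAI-style `{"function": {"name": ...}}` shape that the comprehension above reads. The function name and the sample tool lists are illustrative.

```python
def select_schemas(schemas, relevant_tools, needs_admin, admin_tools):
    """Keep schemas named by retrieval, plus admin tools when admin intent fired."""
    names = set(relevant_tools)
    if needs_admin:
        names |= set(admin_tools)
    return [s for s in schemas if s.get("function", {}).get("name") in names]
```

Without the union, a prompt that describes `manage_settings` in prose would ship with no matching schema, which is the mismatch the comment above describes.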
_mcp_filtered = [
s for s in mcp_schemas
@@ -2207,6 +2569,19 @@ async def stream_agent_loop(
# complementary cap for the rare stream that trickles bytes forever and
# so never trips the inactivity timeout. Generous — only catches runaway.
_round_deadline = time.time() + max(agent_stream_timeout * 4, 1200)
_round_start = time.time()
_round_first_event_logged = False
_round_first_token_logged = False
logger.info(
"[agent-timing] round_start round=%s model=%s endpoint=%s prompt_tokens=%s tools=%s native_tools=%s timeout=%s",
round_num,
model,
endpoint_url,
estimate_tokens(messages),
len(_tool_names_sent),
bool(all_tool_schemas),
agent_stream_timeout,
)
async for chunk in stream_llm_with_fallback(
_candidates,
messages,
@@ -2217,11 +2592,30 @@ async def stream_agent_loop(
timeout=agent_stream_timeout,
session_id=session_id,
):
if not _round_first_event_logged:
_round_first_event_logged = True
logger.info(
"[agent-timing] first_event round=%s elapsed=%.3fs kind=%s",
round_num,
time.time() - _round_start,
"error" if chunk.startswith("event: error") else "data",
)
if time.time() > _round_deadline:
logger.warning(f"[agent] round {round_num} stream exceeded wall-clock deadline; cutting off")
logger.warning(
"[agent-timing] round_deadline round=%s elapsed=%.3fs deadline_s=%s",
round_num,
time.time() - _round_start,
max(agent_stream_timeout * 4, 1200),
)
break
# Forward error events from stream_llm to the frontend
if chunk.startswith("event: error"):
logger.warning(
"[agent-timing] stream_error round=%s elapsed=%.3fs chunk=%r",
round_num,
time.time() - _round_start,
chunk[:500],
)
yield chunk
continue
if chunk.startswith("data: ") and not chunk.startswith("data: [DONE]"):
@@ -2301,6 +2695,15 @@ async def stream_agent_loop(
if not first_token_received:
time_to_first_token = time.time() - total_start
first_token_received = True
if not _round_first_token_logged:
_round_first_token_logged = True
logger.info(
"[agent-timing] first_visible_token round=%s elapsed=%.3fs total_elapsed=%.3fs thinking=%s",
round_num,
time.time() - _round_start,
time.time() - total_start,
bool(data.get("thinking")),
)
# Keep reasoning deltas in a separate accumulator so
# we can echo them back via `reasoning_content` on the
# next request (DeepSeek requires this; harmless for
@@ -2370,7 +2773,21 @@ async def stream_agent_loop(
yield chunk
# Intercept [DONE] — don't forward until all rounds finish
tool_blocks, used_native = _resolve_tool_blocks(round_response, native_tool_calls, round_num, is_api_model=_is_api_model)
logger.info(
"[agent-timing] round_stream_done round=%s elapsed=%.3fs text_chars=%s tool_calls=%s first_event=%s first_token=%s",
round_num,
time.time() - _round_start,
len(round_response),
len(native_tool_calls),
_round_first_event_logged,
_round_first_token_logged,
)
tool_blocks, used_native = _resolve_tool_blocks(
round_response,
native_tool_calls,
round_num,
is_api_model=(_is_api_model and not guide_only),
)
# Force-answer round: we told the model to STOP calling tools and
# answer. If it ignored that and emitted a (possibly DSML) tool
@@ -2454,7 +2871,7 @@ async def stream_agent_loop(
# model with no real native_tool_calls) must not be stripped from the
# persisted text either — otherwise it streams once and then disappears
# on reload (#3222 follow-up).
cleaned_round = strip_tool_blocks(round_response, skip_fenced=(_is_api_model and not used_native)).strip()
cleaned_round = strip_tool_blocks(round_response, skip_fenced=(_is_api_model and not used_native and not guide_only)).strip()
round_texts.append(cleaned_round)
if not tool_blocks:
@@ -2526,6 +2943,15 @@ async def stream_agent_loop(
_intent_nudge_count += 1
_matched_phrase = _intent_match.group(0).strip()
logger.info(f"[agent] intent-without-action nudge #{_intent_nudge_count} on round {round_num}: {_matched_phrase!r}")
_lower_phrase = _matched_phrase.lower()
_cookbook_log_hint = ""
if any(_word in _lower_phrase for _word in ("log", "output", "tail", "status")):
_cookbook_log_hint = (
" If this is about a Cookbook/model serve, the concrete calls are: "
"`list_served_models` first, then `tail_serve_output` with the "
"session_id from the serve/list result. Never answer with "
"\"check logs\" when those tools are available."
)
messages.append({
"role": "system",
"content": (
@@ -2534,6 +2960,7 @@ async def stream_agent_loop(
"see you announced the action but didn't run it, which "
"is the most frustrating thing you can do. "
"DO IT NOW: emit the actual function call this turn. "
f"{_cookbook_log_hint}"
"If you decided not to do it after all, say so plainly in "
"one sentence instead of restating the plan."
),
@@ -2705,6 +3132,46 @@ async def stream_agent_loop(
)
desc, result = await _tool_task
# A skill the model just loaded can prescribe tools that weren't
# RAG-selected this turn (declared via requires_toolsets in its
# frontmatter). Union them into the selection so the NEXT round's
# schema list includes them — otherwise the model reads "use
# grep" from the skill it fetched but has no grep schema to call.
if (
block.tool_type == "manage_skills"
and _relevant_tools is not None
and not result.get("error")
):
_ms_args = {}
_ms_raw = (block.content or "").strip()
if _ms_raw.startswith("{"):
try:
_ms_args = json.loads(_ms_raw)
except json.JSONDecodeError:
_ms_args = {}
_ms_name = str(_ms_args.get("name", "") or "").strip()
if _ms_name and _ms_args.get("action") in ("view", "view_ref"):
try:
from services.memory.skills import SkillsManager as _SkM
from src.constants import DATA_DIR as _DD
from src.tool_policy import known_tool_names as _ktn
_known = _ktn()
for _sk in _SkM(_DD).load(owner=owner):
if _sk.get("name") == _ms_name:
_new = {
t for t in (_sk.get("requires_toolsets") or [])
if t in _known and t not in _relevant_tools
}
if _new:
_relevant_tools.update(_new)
logger.info(
"[tool-rag] skill '%s' unlocked tools for next round: %s",
_ms_name, sorted(_new),
)
break
except Exception as _e:
logger.debug(f"skill requires_toolsets unlock skipped: {_e}")
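The unlock step above reduces to a pure function over the tool-call arguments and the loaded skills. This sketch assumes skills are dicts with `name` and `requires_toolsets` keys, as the loop above reads them; the function name is hypothetical.

```python
import json


def tools_unlocked_by_skill(raw_args, skills, known_tools, relevant_tools):
    """Return tools a just-viewed skill requires that are not yet selected."""
    raw = (raw_args or "").strip()
    args = {}
    if raw.startswith("{"):
        try:
            args = json.loads(raw)
        except json.JSONDecodeError:
            args = {}
    name = str(args.get("name", "") or "").strip()
    # Only view-style actions load a procedure the model may then follow.
    if not name or args.get("action") not in ("view", "view_ref"):
        return set()
    for skill in skills:
        if skill.get("name") == name:
            return {
                t for t in (skill.get("requires_toolsets") or [])
                if t in known_tools and t not in relevant_tools
            }
    return set()
```

Filtering against the known tool names keeps a user-edited skill from injecting arbitrary strings into the schema selection.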
# Extract structured web sources from web_search tool output.
# web_search returns {"output": ..., "exit_code": 0}; check "output"
# first so the <!-- SOURCES:…--> marker is found and stripped even
@@ -2748,9 +3215,12 @@ async def stream_agent_loop(
f'data: {json.dumps({"type": "ui_control", "data": result})}\n\n'
)
# ask_user: the agent posed a multiple-choice question. Emit it so the
# frontend renders clickable options, then end the turn (below) and
# wait — the user's pick becomes the next message.
# ask_user: remember the payload now, but emit the interactive event
# only *after* tool_output below. Emitting it before tool_output let
# the subsequent tool-card rewrite/scroll push the choices out of
# view. The payload is also copied into the persisted tool event so
# history reload can reconstruct an unanswered card.
_pending_ask_user_event = None
if "ask_user" in result:
# The question lives in the tool args. ChatMessage.to_dict()
# replays only role+content to the model next turn — tool_event
@@ -2765,9 +3235,7 @@ async def stream_agent_loop(
_auq_delta = ("\n\n" if full_response.strip() else "") + _auq_q
full_response += _auq_delta
yield 'data: ' + json.dumps({"delta": _auq_delta}) + '\n\n'
yield (
f'data: {json.dumps({"type": "ask_user", "data": result["ask_user"]})}\n\n'
)
_pending_ask_user_event = _auq
_awaiting_user = True
# update_plan: agent wrote back to the plan (ticked a step / revised).
@@ -2822,9 +3290,25 @@ async def stream_agent_loop(
# Emit tool_output (include ui_event data if present)
tool_output_data = {"type": "tool_output", "tool": block.tool_type, "command": cmd_display, "output": output_text, "exit_code": result.get("exit_code")}
if _pending_ask_user_event:
# Keep enough state in the streamed tool result for alternate
# clients to render the prompt without depending on event order.
tool_output_data["ask_user"] = _pending_ask_user_event
if "ui_event" in result:
tool_output_data["ui_event"] = result["ui_event"]
for k in ("toggle_name", "state", "mode", "model", "endpoint_url", "theme_name", "colors"):
for k in (
"toggle_name", "state", "mode", "model", "endpoint_url",
"theme_name", "colors",
# ui_control open_email_reply payload — without these the
# frontend openReplyDraft bails on undefined uid and the
# reply window silently never opens.
"uid", "folder", "account_id",
# Optional pre-filled body for open_email_reply so the
# agent can compose-and-open in one tool call.
"body",
# ui_control open_panel payload
"panel",
):
if k in result:
tool_output_data[k] = result[k]
# Forward image data from generate_image tool
@@ -2840,6 +3324,14 @@ async def stream_agent_loop(
tool_output_data["diff"] = result["diff"]
yield f'data: {json.dumps(tool_output_data)}\n\n'
# This must be the final UI event for ask_user: the frontend appends
# the card below the now-settled tool node and cancels any between-
# round spinner. The turn ends after the current tool batch.
if _pending_ask_user_event:
yield (
f'data: {json.dumps({"type": "ask_user", "data": _pending_ask_user_event})}\n\n'
)
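The new ordering can be expressed as a small generator: the tool result goes out first and carries a copy of the question, and the interactive event follows as the last UI event. The payload shape of `ask_user` shown in the usage is hypothetical.

```python
import json


def tool_result_events(tool, output, ask_user=None):
    """Yield SSE lines for one tool result, with ask_user emitted last."""
    payload = {"type": "tool_output", "tool": tool, "output": output}
    if ask_user:
        # Copy the question into the tool result so clients do not depend on order.
        payload["ask_user"] = ask_user
    yield f"data: {json.dumps(payload)}\n\n"
    if ask_user:
        yield f"data: {json.dumps({'type': 'ask_user', 'data': ask_user})}\n\n"
```

A client that only understands `tool_output` can still render the card from the embedded copy, while the main frontend waits for the trailing `ask_user` event.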
# Native document tools open in the editor + carry the REAL doc id.
# Emit a doc_update so the frontend opens/activates it and sends it
# back as active_doc_id next turn (otherwise the agent can't "see"
@@ -2897,6 +3389,11 @@ async def stream_agent_loop(
# this the diff shows live but vanishes from saved history.
if result.get("diff"):
tool_event["diff"] = result["diff"]
if _pending_ask_user_event:
# Persist the structured question with the tool event. On a
# reload, chatRenderer can restore the card; a later user
# message removes it as answered.
tool_event["ask_user"] = _pending_ask_user_event
tool_events.append(tool_event)
if block.tool_type in _VERIFIER_EFFECTFUL_TOOLS:
_effectful_used = True
+13 -1
@@ -174,8 +174,20 @@ async def subscribe(session_id: str) -> AsyncGenerator[str, None]:
next_seq += 1
if run.status != "running":
return
heartbeat_idx = 0
while True:
seq, ev = await q.get()
try:
seq, ev = await asyncio.wait_for(q.get(), timeout=10.0)
except asyncio.TimeoutError:
# Keep slow local models/proxies alive while they prefill before
# the first token. SSE comments are ignored by the UI but reset
# browser/proxy idle timers, which prevents "empty response"
# disconnects on llama.cpp first-token latencies of 30s+.
if run.status == "running":
heartbeat_idx += 1
yield f": heartbeat {heartbeat_idx}\n\n"
continue
seq, ev = (None, None)
if seq is None: # end sentinel
while next_seq < len(run.buffer): # flush any tail the sentinel raced
yield run.buffer[next_seq]
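The heartbeat loop in the hunk above can be exercised in isolation. A minimal sketch, assuming a plain `asyncio.Queue` of `(seq, event)` tuples with a `(None, None)` end sentinel; `subscribe_with_heartbeat`, the `is_running` callable, and the shortened timeout are hypothetical stand-ins for the project's `run` object and its 10-second timeout:

```python
import asyncio
from typing import AsyncGenerator, Callable, List


async def subscribe_with_heartbeat(
    q: asyncio.Queue,
    is_running: Callable[[], bool],
    timeout: float = 10.0,
) -> AsyncGenerator[str, None]:
    """Yield queued SSE frames; emit an SSE comment whenever the queue is idle."""
    heartbeat_idx = 0
    while True:
        try:
            seq, ev = await asyncio.wait_for(q.get(), timeout=timeout)
        except asyncio.TimeoutError:
            if is_running():
                heartbeat_idx += 1
                # Lines starting with ":" are SSE comments; clients ignore them.
                yield f": heartbeat {heartbeat_idx}\n\n"
                continue
            seq, ev = (None, None)  # run ended while idle: treat as the sentinel
        if seq is None:
            return
        yield ev


async def demo() -> List[str]:
    q: asyncio.Queue = asyncio.Queue()

    async def producer() -> None:
        await asyncio.sleep(0.25)  # simulated slow first token
        await q.put((0, "data: hello\n\n"))
        await q.put((None, None))

    task = asyncio.create_task(producer())
    frames = [f async for f in subscribe_with_heartbeat(q, lambda: True, timeout=0.1)]
    await task
    return frames
```

Running `asyncio.run(demo())` yields one or more `: heartbeat N` comment frames followed by the real `data:` frame; the exact number of heartbeats depends on scheduling, so the sketch makes no claim about it.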
+19 -6
View File
@@ -22,6 +22,14 @@ from .subprocess_tools import BashTool, PythonTool
from .web_tools import WebSearchTool, WebFetchTool
from .filesystem_tools import ReadFileTool, WriteFileTool, EditFileTool, LsTool, GlobTool, GrepTool, GetWorkspaceTool
from .document_tools import CreateDocumentTool, UpdateDocumentTool, EditDocumentTool, SuggestDocumentTool, ManageDocumentTool
from .model_interaction_tools import ChatWithModelTool, AskTeacherTool, ListModelsTool
from .bg_job_tools import ManageBgJobsTool
from .session_tools import CreateSessionTool, ListSessionsTool, SendToSessionTool, ManageSessionTool
from .admin_tools import (
ADMIN_TOOL_HANDLERS,
do_manage_endpoints, do_manage_mcp, do_manage_webhooks,
do_manage_tokens, do_manage_settings,
)
TOOL_HANDLERS = {
"bash": BashTool().execute,
@@ -40,7 +48,17 @@ TOOL_HANDLERS = {
"suggest_document": SuggestDocumentTool().execute,
"manage_documents": ManageDocumentTool().execute,
"get_workspace": GetWorkspaceTool().execute,
"chat_with_model": ChatWithModelTool().execute,
"ask_teacher": AskTeacherTool().execute,
"list_models": ListModelsTool().execute,
"manage_bg_jobs": ManageBgJobsTool().execute,
"create_session": CreateSessionTool().execute,
"list_sessions": ListSessionsTool().execute,
"send_to_session": SendToSessionTool().execute,
"manage_session": ManageSessionTool().execute,
}
# Config/integration admin tools (manage_endpoints/mcp/webhooks/tokens/settings).
TOOL_HANDLERS.update(ADMIN_TOOL_HANDLERS)
# ---------------------------------------------------------------------------
# Constants (re-exported for backward compatibility — single source of truth
@@ -52,7 +70,7 @@ PYTHON_TIMEOUT = 30
# Tool types that trigger execution
TOOL_TAGS = {"bash", "python", "web_search", "web_fetch", "read_file", "write_file", "edit_file",
"grep", "glob", "ls", "get_workspace",
"grep", "glob", "ls", "get_workspace", "manage_bg_jobs",
"create_document", "update_document", "edit_document",
"search_chats",
"chat_with_model", "create_session", "list_sessions",
@@ -127,10 +145,5 @@ from src.tool_implementations import ( # noqa: E402, F401
do_search_chats,
do_manage_skills,
do_manage_tasks,
do_manage_endpoints,
do_manage_mcp,
do_manage_webhooks,
do_manage_tokens,
do_manage_settings,
do_api_call,
)
+784
View File
@@ -0,0 +1,784 @@
"""Config/integration admin agent tools (TOOL_HANDLERS).
Moved verbatim from tool_implementations.py as part of the tool-registry
migration (#3629, the `admin_tools.py` bullet): manage_endpoints / manage_mcp /
manage_webhooks / manage_tokens / manage_settings, plus manage_mcp's
command-allowlist guard. Each impl keeps its `do_*(content, owner)` shape;
ADMIN_TOOL_HANDLERS wraps them into registry `execute(content, ctx)` adapters
via one factory.
"""
import json
import os
import re
import logging
from typing import Optional, Dict
from src.tool_utils import get_mcp_manager, _parse_tool_args
logger = logging.getLogger(__name__)
async def do_manage_endpoints(content: str, owner: Optional[str] = None) -> Dict:
"""Manage model endpoints: list, add, delete, enable, disable."""
from core.database import SessionLocal, ModelEndpoint
try:
args = _parse_tool_args(content)
except ValueError:
return {"error": "Invalid JSON arguments", "exit_code": 1}
action = args.get("action", "list")
db = SessionLocal()
try:
if action == "list":
eps = db.query(ModelEndpoint).all()
items = [{"id": e.id, "name": e.name, "base_url": e.base_url,
"is_enabled": e.is_enabled} for e in eps]
return {"response": f"{len(items)} endpoints", "endpoints": items, "exit_code": 0}
elif action == "add":
import uuid as _uuid
name = args.get("name", "")
base_url = args.get("base_url", "")
api_key = args.get("api_key", "")
if not base_url:
return {"error": "base_url is required", "exit_code": 1}
eid = str(_uuid.uuid4())[:8]
from datetime import datetime
ep = ModelEndpoint(id=eid, name=name or base_url, base_url=base_url,
api_key=api_key, is_enabled=True,
created_at=datetime.utcnow(), updated_at=datetime.utcnow())
db.add(ep)
db.commit()
return {"response": f"Added endpoint '{name or base_url}' (id: {eid})", "exit_code": 0}
elif action == "delete":
eid = args.get("endpoint_id", "")
ep = db.query(ModelEndpoint).filter(ModelEndpoint.id == eid).first()
if not ep:
return {"error": f"Endpoint {eid} not found", "exit_code": 1}
name = ep.name
db.delete(ep)
db.commit()
return {"response": f"Deleted endpoint '{name}'", "exit_code": 0}
elif action in ("enable", "disable"):
eid = args.get("endpoint_id", "")
ep = db.query(ModelEndpoint).filter(ModelEndpoint.id == eid).first()
if not ep:
return {"error": f"Endpoint {eid} not found", "exit_code": 1}
ep.is_enabled = (action == "enable")
db.commit()
return {"response": f"Endpoint '{ep.name}' {action}d", "exit_code": 0}
else:
return {"error": f"Unknown action: {action}", "exit_code": 1}
except Exception as e:
logger.error(f"manage_endpoints error: {e}")
return {"error": str(e), "exit_code": 1}
finally:
db.close()
# ---------------------------------------------------------------------------
# MCP server management tool
# ---------------------------------------------------------------------------
# Parallel to routes/cookbook_helpers._validate_serve_cmd but deliberately the
# opposite policy: that gate guards an admin-only serve command and allows
# interpreters (python3/etc) because model-serving needs them, whereas this is
# the model/prompt-injection-reachable manage_mcp path, so interpreters and
# runners are denied here.
#
# Commands that can execute arbitrary code regardless of their arguments. These
# are NEVER accepted on the manage_mcp agent path, even if an operator lists one
# in ODYSSEUS_MCP_ALLOWED_COMMANDS -- a stdio server that genuinely needs an
# interpreter or package runner must be registered via the trusted admin route.
_MCP_DENIED_COMMANDS = frozenset({
"sh", "bash", "zsh", "fish", "dash", "ksh", "csh", "tcsh", "ash", "busybox",
"cmd", "command.com", "powershell", "pwsh",
"python", "pypy", "node", "nodejs", "deno", "bun", "ruby", "jruby",
"perl", "raku", "php", "lua", "luajit", "tclsh", "wish", "expect", "rscript",
"groovy", "scala", "elixir", "erl", "iex", "java", "javac", "jshell", "jbang",
"kotlin", "kotlinc", "dotnet", "mono", "swift", "osascript", "tsx", "ts-node",
"npx", "bunx", "uvx", "pipx", "npm", "pnpm", "yarn", "pip", "uv",
"gem", "cargo", "go", "bundle", "poetry", "conda", "mamba", "brew",
"apt", "apt-get", "yum", "dnf", "pacman", "apk",
"env", "xargs", "nohup", "setsid", "nice", "ionice", "time", "timeout",
"watch", "stdbuf", "unbuffer", "script", "ssh", "scp", "sshpass", "sudo",
"doas", "su", "make", "cmake", "docker", "podman", "kubectl", "find",
"awk", "gawk", "sed", "vi", "vim", "nvim", "emacs", "ed", "tee", "eval",
})
# Argv flags that make even an allowlisted binary execute inline code. Matched
# by prefix so glued forms (-cimport os, --eval=...) are caught, not just the
# exact-token form.
_MCP_CODE_EXEC_SHORT_FLAGS = ("-c", "-e", "-m")
_MCP_CODE_EXEC_LONG_FLAGS = ("--eval", "--exec", "--print", "--module", "--command", "--require")
_MCP_URL_SCHEMES = ("http://", "https://", "ftp://", "ftps://", "file://", "data:", "jar:", "blob:")
# Shell metacharacters refused in command/args. Args are passed as an argv list
# (no shell), but refusing these keeps the surface narrow and obvious.
_MCP_SHELL_METACHARS = set(";|&$`><\n\r")
# Env vars that let a child process load attacker-supplied code before main().
_MCP_DANGEROUS_ENV = frozenset({
"LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT", "DYLD_INSERT_LIBRARIES",
"DYLD_LIBRARY_PATH", "DYLD_FRAMEWORK_PATH", "PYTHONPATH", "PYTHONSTARTUP",
"PYTHONHOME", "PYTHONEXECUTABLE", "NODE_OPTIONS", "NODE_PATH", "BASH_ENV",
"ENV", "SHELLOPTS", "PERL5LIB", "PERL5OPT", "RUBYOPT", "RUBYLIB", "GEM_PATH",
"R_PROFILE", "R_HOME", "PATH", "IFS", "PROMPT_COMMAND",
})
def _mcp_allowed_commands() -> set:
"""Operator-configured allowlist of safe MCP launcher basenames for the agent
path. Empty by default; set ODYSSEUS_MCP_ALLOWED_COMMANDS (comma-separated)
to opt specific trusted binaries in. Denied commands are rejected even if
listed here."""
raw = os.environ.get("ODYSSEUS_MCP_ALLOWED_COMMANDS", "")
return {c.strip().lower() for c in raw.split(",") if c.strip()}
def _validate_mcp_command(command, args, env) -> Optional[str]:
"""Validate a model-supplied stdio MCP registration. Returns an error string
if it must be rejected, else None.
Closes the RCE where manage_mcp 'add' passed prompt-injection-controlled
command/args/env straight to a subprocess spawn (issue #438): a payload
smuggled into a skill description, memory entry, fetched page, or email body
could register a stdio server running arbitrary code as the app UID.
"""
if not isinstance(command, str) or not command.strip():
return "command must be a non-empty string"
command = command.strip()
if "/" in command or "\\" in command:
return "command must be a bare executable name, not a path"
if any(ch in _MCP_SHELL_METACHARS for ch in command):
return "command contains shell metacharacters"
base = command.lower()
if base.endswith(".exe") or base.endswith(".cmd") or base.endswith(".bat"):
base = base.rsplit(".", 1)[0]
# Canonicalize a trailing version suffix so versioned aliases collapse to the
# family name (python3.11 -> python, node18 -> node, pip3 -> pip); both the
# raw basename and the canonical form are denied, so an operator cannot
# accidentally allowlist a runtime alias back into the path.
canon = re.sub(r"[-_.]?\d+(?:\.\d+)*$", "", base)
if base in _MCP_DENIED_COMMANDS or canon in _MCP_DENIED_COMMANDS:
return (
f"command '{command}' is not allowed on the agent MCP path: "
"interpreters, runtimes, package runners, and shells can execute "
"arbitrary code. Register such a server via the admin route instead."
)
if base not in _mcp_allowed_commands():
return (
f"command '{command}' is not in the MCP allowlist. Add it to "
"ODYSSEUS_MCP_ALLOWED_COMMANDS if you trust it, or register the "
"server via the admin route."
)
if args is not None:
if isinstance(args, str):
try:
args = json.loads(args)
except Exception:
return "args must be a JSON list"
if not isinstance(args, list):
return "args must be a list"
for a in args:
if not isinstance(a, str):
return "args must all be strings"
s = a.strip()
low = s.lower()
if any(s == f or s.startswith(f) for f in _MCP_CODE_EXEC_SHORT_FLAGS):
return f"arg '{a}' is a code-execution flag and is not allowed"
if any(low == f or low.startswith(f + "=") for f in _MCP_CODE_EXEC_LONG_FLAGS):
return f"arg '{a}' is a code-execution flag and is not allowed"
if any(low.startswith(u) for u in _MCP_URL_SCHEMES):
return f"arg '{a}' is a remote URL and is not allowed"
if any(ch in _MCP_SHELL_METACHARS for ch in a):
return f"arg '{a}' contains shell metacharacters"
if env:
if isinstance(env, str):
try:
env = json.loads(env)
except Exception:
return "env must be a JSON object"
if not isinstance(env, dict):
return "env must be an object"
for k in env:
if str(k).strip().upper() in _MCP_DANGEROUS_ENV:
return f"env var '{k}' can inject code into the child process and is not allowed"
return None
async def do_manage_mcp(content: str, owner: Optional[str] = None) -> Dict:
"""Manage MCP servers: list, add, delete, enable, disable, reconnect."""
try:
args = _parse_tool_args(content)
except ValueError:
return {"error": "Invalid JSON arguments", "exit_code": 1}
action = args.get("action", "list")
if action == "list":
mcp = get_mcp_manager()
if not mcp:
return {"response": "No MCP manager available", "servers": [], "exit_code": 0}
from core.database import SessionLocal, McpServer
db = SessionLocal()
try:
servers = db.query(McpServer).all()
items = []
for s in servers:
st = mcp.get_server_status(s.id)
status = st.get("status", "disconnected")
tool_count = st.get("tool_count", 0)
items.append({"id": s.id, "name": s.name, "transport": s.transport,
"is_enabled": s.is_enabled, "status": status,
"tool_count": tool_count})
return {"response": f"{len(items)} MCP servers", "servers": items, "exit_code": 0}
finally:
db.close()
elif action == "add":
from core.database import SessionLocal, McpServer
import uuid as _uuid
from datetime import datetime
name = args.get("name", "")
command = args.get("command", "")
cmd_args = args.get("args", [])
env = args.get("env", {})
if not name or not command:
return {"error": "name and command are required", "exit_code": 1}
# Validate BEFORE any DB write or spawn: a rejected registration must
# leave no enabled row (which would otherwise auto-reconnect on restart)
# and must not attempt a connection.
_mcp_err = _validate_mcp_command(command, cmd_args, env)
if _mcp_err:
return {"error": f"manage_mcp: refused unsafe server registration: {_mcp_err}", "exit_code": 1}
sid = str(_uuid.uuid4())[:8]
db = SessionLocal()
try:
srv = McpServer(id=sid, name=name, transport="stdio", command=command,
args=json.dumps(cmd_args) if isinstance(cmd_args, list) else cmd_args,
env=json.dumps(env) if isinstance(env, dict) else env,
is_enabled=True, created_at=datetime.utcnow(), updated_at=datetime.utcnow())
db.add(srv)
db.commit()
finally:
db.close()
# Try to connect
mcp = get_mcp_manager()
tool_count = 0
if mcp:
try:
await mcp.connect_server(
sid, name, "stdio", command=command,
args=cmd_args if isinstance(cmd_args, list) else json.loads(cmd_args),
env=env if isinstance(env, dict) else json.loads(env),
)
st = mcp.get_server_status(sid)
tool_count = st.get("tool_count", 0)
except Exception as e:
logger.warning(f"MCP connect failed for {name}: {e}")
return {"response": f"Added MCP server '{name}' ({tool_count} tools)", "exit_code": 0}
elif action == "delete":
sid = args.get("server_id", "")
from core.database import SessionLocal, McpServer
db = SessionLocal()
try:
srv = db.query(McpServer).filter(McpServer.id == sid).first()
if not srv:
return {"error": f"Server {sid} not found", "exit_code": 1}
name = srv.name
mcp = get_mcp_manager()
if mcp:
try:
await mcp.disconnect_server(sid)
except Exception:
pass
db.delete(srv)
db.commit()
return {"response": f"Deleted MCP server '{name}'", "exit_code": 0}
finally:
db.close()
elif action == "reconnect":
sid = args.get("server_id", "")
mcp = get_mcp_manager()
if not mcp:
return {"error": "MCP manager not available", "exit_code": 1}
try:
await mcp.disconnect_server(sid)
from core.database import SessionLocal, McpServer
db2 = SessionLocal()
try:
srv = db2.query(McpServer).filter(McpServer.id == sid).first()
if srv:
_args = json.loads(srv.args) if srv.args else []
_env = json.loads(srv.env) if srv.env else {}
await mcp.connect_server(
server_id=sid,
name=srv.name,
transport=srv.transport,
command=srv.command,
args=_args,
env=_env,
url=srv.url,
)
st = mcp.get_server_status(sid)
return {"response": f"Reconnected '{srv.name}' ({st.get('tool_count', 0)} tools)", "exit_code": 0}
return {"error": f"Server {sid} not found", "exit_code": 1}
finally:
db2.close()
except Exception as e:
return {"error": str(e), "exit_code": 1}
elif action in ("enable", "disable"):
sid = args.get("server_id", "")
from core.database import SessionLocal, McpServer
db = SessionLocal()
try:
srv = db.query(McpServer).filter(McpServer.id == sid).first()
if not srv:
return {"error": f"Server {sid} not found", "exit_code": 1}
srv.is_enabled = (action == "enable")
db.commit()
return {"response": f"MCP server '{srv.name}' {action}d", "exit_code": 0}
finally:
db.close()
elif action == "list_tools":
mcp = get_mcp_manager()
if not mcp:
return {"response": "No MCP manager", "tools": [], "exit_code": 0}
tools = mcp.get_all_tools()
items = [{"name": t["name"], "server": t["server_name"],
"description": t.get("description", "")[:100]} for t in tools]
return {"response": f"{len(items)} MCP tools available", "tools": items, "exit_code": 0}
else:
return {"error": f"Unknown action: {action}", "exit_code": 1}
# ---------------------------------------------------------------------------
# Webhook management tool
# ---------------------------------------------------------------------------
async def do_manage_webhooks(content: str, owner: Optional[str] = None) -> Dict:
"""Manage webhooks: list, add, delete, enable, disable, test."""
from core.database import SessionLocal
try:
args = _parse_tool_args(content)
except ValueError:
return {"error": "Invalid JSON arguments", "exit_code": 1}
action = args.get("action", "list")
db = SessionLocal()
try:
from core.database import Webhook
if action == "list":
hooks = db.query(Webhook).all()
items = [{"id": h.id, "name": h.name, "url": h.url,
"events": h.events, "is_active": h.is_active} for h in hooks]
return {"response": f"{len(items)} webhooks", "webhooks": items, "exit_code": 0}
elif action == "add":
import uuid as _uuid
from datetime import datetime
from src.webhook_manager import validate_events, validate_webhook_url
name = args.get("name", "")
url = args.get("url", "")
events = args.get("events", "chat.completed")
if not url:
return {"error": "url is required", "exit_code": 1}
try:
url = validate_webhook_url(url)
events = validate_events(events)
except ValueError as e:
return {"error": str(e), "exit_code": 1}
wid = str(_uuid.uuid4())[:8]
hook = Webhook(id=wid, name=name or url, url=url,
events=events, is_active=True,
created_at=datetime.utcnow(), updated_at=datetime.utcnow())
db.add(hook)
db.commit()
return {"response": f"Added webhook '{name or url}'", "exit_code": 0}
elif action == "delete":
wid = args.get("webhook_id", "")
hook = db.query(Webhook).filter(Webhook.id == wid).first()
if not hook:
return {"error": f"Webhook {wid} not found", "exit_code": 1}
name = hook.name
db.delete(hook)
db.commit()
return {"response": f"Deleted webhook '{name}'", "exit_code": 0}
elif action in ("enable", "disable"):
wid = args.get("webhook_id", "")
hook = db.query(Webhook).filter(Webhook.id == wid).first()
if not hook:
return {"error": f"Webhook {wid} not found", "exit_code": 1}
hook.is_active = (action == "enable")
db.commit()
return {"response": f"Webhook '{hook.name}' {action}d", "exit_code": 0}
else:
return {"error": f"Unknown action: {action}", "exit_code": 1}
except Exception as e:
logger.error(f"manage_webhooks error: {e}")
return {"error": str(e), "exit_code": 1}
finally:
db.close()
# ---------------------------------------------------------------------------
# API token management tool
# ---------------------------------------------------------------------------
async def do_manage_tokens(content: str, owner: Optional[str] = None) -> Dict:
"""Manage API tokens: list, create, delete."""
from core.database import SessionLocal, ApiToken
try:
args = _parse_tool_args(content)
except ValueError:
return {"error": "Invalid JSON arguments", "exit_code": 1}
action = args.get("action", "list")
db = SessionLocal()
try:
if action == "list":
tokens = db.query(ApiToken).all()
items = [{"id": t.id, "name": t.name, "token_prefix": t.token_prefix + "...",
"is_active": t.is_active} for t in tokens]
return {"response": f"{len(items)} API tokens", "tokens": items, "exit_code": 0}
elif action == "create":
import uuid as _uuid, secrets, bcrypt
from datetime import datetime
name = args.get("name", "API Token")
raw_token = secrets.token_urlsafe(32)
token_hash = bcrypt.hashpw(raw_token.encode(), bcrypt.gensalt()).decode()
tid = str(_uuid.uuid4())[:8]
t = ApiToken(id=tid, name=name, token_hash=token_hash,
token_prefix=raw_token[:8], is_active=True,
created_at=datetime.utcnow(), updated_at=datetime.utcnow())
db.add(t)
db.commit()
return {"response": f"Created token '{name}'", "token": raw_token, "exit_code": 0}
elif action == "delete":
tid = args.get("token_id", "")
t = db.query(ApiToken).filter(ApiToken.id == tid).first()
if not t:
return {"error": f"Token {tid} not found", "exit_code": 1}
name = t.name
db.delete(t)
db.commit()
return {"response": f"Deleted token '{name}'", "exit_code": 0}
else:
return {"error": f"Unknown action: {action}", "exit_code": 1}
except Exception as e:
logger.error(f"manage_tokens error: {e}")
return {"error": str(e), "exit_code": 1}
finally:
db.close()
# ---------------------------------------------------------------------------
# Settings/preferences management tool
# ---------------------------------------------------------------------------
async def do_manage_settings(content: str, owner: Optional[str] = None) -> Dict:
"""Manage user settings and preferences."""
try:
args = _parse_tool_args(content)
except ValueError:
return {"error": "Invalid JSON arguments", "exit_code": 1}
action = args.get("action", "list")
from core.database import SessionLocal
db = SessionLocal()
try:
# set/get/list/delete operate on the REAL app settings (the same store
# the Settings panel writes), so changing a model / voice / search
# engine / reminder channel from chat actually takes effect.
from src.settings import load_settings, save_settings, DEFAULT_SETTINGS
# Secrets/credentials the agent must NOT write: kept read-only (masked)
# so API keys never flow through chat. User sets these in the panel.
_SECRET_KEYS = {
"brave_api_key", "google_pse_key", "google_pse_cx",
"tavily_api_key", "serper_api_key", "app_public_url",
}
def _is_secret(k):
# `token` must be a suffix, not a substring: otherwise the int
# setting `agent_input_token_budget` (which even has a "token budget"
# alias to set it from chat) is wrongly classified as a credential.
return (
k in _SECRET_KEYS
or k.endswith("token")
or any(t in k for t in ("api_key", "_key", "secret", "password"))
)
# Friendly aliases → real keys, so natural phrasing resolves.
_ALIASES_SET = {
"voice": "tts_voice", "tts voice": "tts_voice", "tts": "tts_enabled",
"text to speech": "tts_enabled", "tts provider": "tts_provider",
"speech speed": "tts_speed", "voice speed": "tts_speed",
"stt": "stt_enabled", "speech to text": "stt_enabled", "transcription": "stt_enabled",
"search engine": "search_provider", "search provider": "search_provider",
"search results": "search_result_count", "result count": "search_result_count",
"default model": "default_model", "chat model": "default_model",
"default endpoint": "default_endpoint_id",
"task model": "task_model", "background model": "task_model",
"teacher model": "teacher_model", "teacher": "teacher_enabled",
"utility model": "utility_model", "research model": "research_model",
"research max tokens": "research_max_tokens",
"vision model": "vision_model", "vision": "vision_enabled",
"image model": "image_model", "image quality": "image_quality",
"image gen": "image_gen_enabled", "image generation": "image_gen_enabled",
"reminder channel": "reminder_channel", "reminders": "reminder_channel",
"ntfy topic": "reminder_ntfy_topic",
"webhook integration": "reminder_webhook_integration_id",
"webhook template": "reminder_webhook_payload_template", "webhook payload": "reminder_webhook_payload_template",
"agent tool calls": "agent_max_tool_calls", "max tool calls": "agent_max_tool_calls",
"agent timeout": "agent_stream_timeout_seconds", "stream timeout": "agent_stream_timeout_seconds",
"token budget": "agent_input_token_budget", "input budget": "agent_input_token_budget",
"hard max": "agent_input_token_hard_max",
"token budget cap": "agent_input_token_hard_max",
"input budget cap": "agent_input_token_hard_max",
}
def _resolve(k):
k2 = (k or "").strip().lower()
if k2 in DEFAULT_SETTINGS:
return k2
return _ALIASES_SET.get(k2, (k or "").strip())
_ENUMS = {
"image_quality": ["low", "medium", "high"],
"reminder_channel": ["browser", "email", "ntfy", "webhook"],
}
def _coerce(value, default):
if isinstance(default, bool):
return value if isinstance(value, bool) else str(value).strip().lower() in ("true", "on", "yes", "1", "enable", "enabled")
if isinstance(default, int):
return int(value)
return value
def _model_slug(value: str) -> str:
import re as _re
return _re.sub(r"[^a-z0-9]+", "", (value or "").lower())
def _endpoint_model_from_cache(model_query: str):
"""Resolve friendly model text to an enabled endpoint + real model id.
The Settings UI stores both `<prefix>_endpoint_id` and
`<prefix>_model`; writing only the model leaves the runtime on the
old endpoint. Prefer cached model lists so this stays fast/offline.
"""
import json as _json
import re as _re
from core.database import ModelEndpoint
wanted = (model_query or "").strip()
wanted_slug = _model_slug(wanted)
wanted_tokens = [_model_slug(t) for t in _re.findall(r"[A-Za-z0-9]+", wanted)]
wanted_tokens = [t for t in wanted_tokens if t]
if not wanted_slug:
return None
best = None
for ep in db.query(ModelEndpoint).filter(ModelEndpoint.is_enabled == True).all():
raw_models = []
try:
raw_models = _json.loads(ep.cached_models or "[]") or []
except Exception:
raw_models = []
# Endpoints with an empty model cache contribute no candidates here;
# matching on the endpoint name (for model@endpoint callers) is not implemented.
for mid in raw_models:
mid = str(mid)
mid_slug = _model_slug(mid)
if not mid_slug:
continue
exact = mid.lower() == wanted.lower()
compact_match = wanted_slug in mid_slug or mid_slug in wanted_slug
token_match = bool(wanted_tokens) and all(tok in mid_slug for tok in wanted_tokens)
if exact or compact_match or token_match:
score = 3 if exact else (2 if compact_match else 1)
if not best or score > best[0]:
best = (score, ep.id, mid)
if best:
return {"endpoint_id": best[1], "model": best[2]}
return None
def _mask(k, v):
return "••••• (set in panel)" if _is_secret(k) and v else v
if action == "list":
s = load_settings()
shown = {k: _mask(k, v) for k, v in s.items() if k in DEFAULT_SETTINGS and not isinstance(v, dict)}
return {"response": f"{len(shown)} settings (use get/set with a key)", "settings": shown, "exit_code": 0}
elif action == "get":
key = _resolve(args.get("key", ""))
if not key:
return {"error": "key is required", "exit_code": 1}
if key not in DEFAULT_SETTINGS:
return {"error": f"Unknown setting '{args.get('key')}'. Use action='list' to see them.", "exit_code": 1}
val = load_settings().get(key, DEFAULT_SETTINGS.get(key))
return {"response": f"{key} = {_mask(key, val)}", "value": _mask(key, val), "exit_code": 0}
elif action == "set":
raw = args.get("key", "")
value = args.get("value")
if not raw:
return {"error": "key is required", "exit_code": 1}
key = _resolve(raw)
if key not in DEFAULT_SETTINGS:
return {"error": f"Unknown setting '{raw}'. Use action='list' to see available settings.", "exit_code": 1}
if _is_secret(key):
return {"response": f"'{key}' is a credential/secret. For security I can't set it from chat. Open Settings and set it there.", "exit_code": 0}
# Structured settings (dicts/lists like keybinds, default_model_fallbacks)
# have no safe scalar coercion; _coerce would pass a bare string
# straight through and clobber the structure. Refuse them here; they're
# edited in their dedicated panels. (reset/delete still restore the
# default structure, which is safe.)
if isinstance(DEFAULT_SETTINGS[key], (dict, list)):
return {"response": f"'{key}' is a structured setting. Edit it in its panel, not from chat. (You can reset it to default here.)", "exit_code": 0}
try:
value = _coerce(value, DEFAULT_SETTINGS[key])
except (ValueError, TypeError):
return {"error": f"'{value}' isn't a valid value for {key} (expected {type(DEFAULT_SETTINGS[key]).__name__}).", "exit_code": 1}
if key in _ENUMS and str(value).lower() not in _ENUMS[key]:
return {"error": f"{key} must be one of: {', '.join(_ENUMS[key])}.", "exit_code": 1}
s = load_settings()
s[key] = value
if key in {"default_model", "research_model", "utility_model", "task_model", "vision_model", "image_model"}:
resolved = _endpoint_model_from_cache(str(value))
if resolved:
prefix = key[:-6]
s[f"{prefix}_endpoint_id"] = resolved["endpoint_id"]
s[key] = resolved["model"]
value = resolved["model"]
save_settings(s)
if key.endswith("_model") and s.get(f"{key[:-6]}_endpoint_id"):
return {"response": f"Set {key} = {value} (endpoint {s.get(f'{key[:-6]}_endpoint_id')}).", "exit_code": 0}
return {"response": f"Set {key} = {value}.", "exit_code": 0}
elif action == "delete" or action == "reset":
key = _resolve(args.get("key", ""))
if key not in DEFAULT_SETTINGS:
return {"error": f"Unknown setting '{args.get('key')}'.", "exit_code": 1}
if _is_secret(key):
return {"response": f"'{key}' is a credential. Reset it in the panel.", "exit_code": 0}
s = load_settings()
s[key] = DEFAULT_SETTINGS[key]
save_settings(s)
return {"response": f"Reset {key} to default ({DEFAULT_SETTINGS[key]}).", "exit_code": 0}
elif action in ("disable_tool", "enable_tool", "list_tools"):
# Tool-toggle actions. These edit settings.json:disabled_tools
# (the global list read on every chat request) rather than
# prefs.json. Friendly aliases accepted: "shell" -> "bash",
# "search" -> "web_search", "browser" -> "builtin_browser",
# "documents" -> the document tool set, "memory" ->
# manage_memory, etc.
from src.settings import get_setting, save_settings, load_settings
_ALIASES = {
"shell": ["bash"],
"terminal": ["bash"],
"search": ["web_search", "web_fetch"],
"web": ["web_search", "web_fetch"],
"browser": ["builtin_browser"],
"documents": ["create_document", "edit_document", "update_document", "suggest_document"],
"doc": ["create_document", "edit_document", "update_document", "suggest_document"],
"memory": ["manage_memory"],
"skills": ["manage_skills"],
"images": ["generate_image"],
"image": ["generate_image"],
"tasks": ["manage_tasks"],
"notes": ["manage_notes"],
"calendar": ["manage_calendar"],
"email": ["mcp__email__list_emails", "mcp__email__read_email", "mcp__email__send_email"],
"research": ["web_search", "web_fetch"], # research is a per-request flag, not a tool (closest analog)
}
if action == "list_tools":
current = get_setting("disabled_tools", []) or []
return {
"response": (
f"Currently disabled: {', '.join(current) if current else '(none)'}.\n"
"Common toggles: shell (bash), search (web_search), browser, documents, "
"memory, skills, images, tasks, notes, calendar, email."
),
"disabled": list(current),
"exit_code": 0,
}
tool_name = (args.get("tool") or args.get("name") or "").strip().lower()
if not tool_name:
return {"error": "tool name required (e.g. 'shell', 'search', 'bash')", "exit_code": 1}
targets = _ALIASES.get(tool_name, [tool_name])
settings = load_settings()
current = list(settings.get("disabled_tools") or [])
before = set(current)
if action == "disable_tool":
for t in targets:
if t not in current:
current.append(t)
else: # enable_tool
current = [t for t in current if t not in targets]
after = set(current)
settings["disabled_tools"] = current
save_settings(settings)
verb = "Disabled" if action == "disable_tool" else "Enabled"
changed = sorted(after.symmetric_difference(before))
return {
"response": (
f"{verb} {tool_name} ({', '.join(targets)}). "
f"Now disabled: {', '.join(current) if current else '(none)'}."
),
"changed": changed,
"disabled": list(current),
"exit_code": 0,
}
else:
return {"error": f"Unknown action: {action}", "exit_code": 1}
except Exception as e:
logger.error(f"manage_settings error: {e}")
return {"error": str(e), "exit_code": 1}
finally:
db.close()
# ---------------------------------------------------------------------------
# Registry adapters
# ---------------------------------------------------------------------------
def _owner_adapter(fn):
"""Wrap a do_*(content, owner) impl as a registry execute(content, ctx)."""
async def _execute(content: str, ctx: dict) -> dict:
return await fn(content, ctx.get("owner"))
return _execute
ADMIN_TOOL_HANDLERS = {
"manage_endpoints": _owner_adapter(do_manage_endpoints),
"manage_mcp": _owner_adapter(do_manage_mcp),
"manage_webhooks": _owner_adapter(do_manage_webhooks),
"manage_tokens": _owner_adapter(do_manage_tokens),
"manage_settings": _owner_adapter(do_manage_settings),
}
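The adapter factory that closes this file bridges two calling conventions: the legacy `do_*(content, owner)` implementations and the registry's `execute(content, ctx)` shape. A minimal, self-contained sketch of the same idea, assuming a hypothetical `do_echo` implementation and a hypothetical `HANDLERS` table (the project's real handlers talk to the database and are not reproduced here):

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Legacy shape: an async function taking (content, owner).
Impl = Callable[[str, Optional[str]], Awaitable[Dict]]


def owner_adapter(fn: Impl):
    """Wrap a (content, owner) impl as a registry-style execute(content, ctx)."""
    async def execute(content: str, ctx: dict) -> dict:
        # Only the owner is forwarded; other ctx keys are ignored by design.
        return await fn(content, ctx.get("owner"))
    return execute


async def do_echo(content: str, owner: Optional[str] = None) -> Dict:
    # Hypothetical impl used only to demonstrate the adapter.
    return {"response": content, "owner": owner, "exit_code": 0}


HANDLERS = {"echo": owner_adapter(do_echo)}

result = asyncio.run(HANDLERS["echo"]("hi", {"owner": "alice", "session_id": "s1"}))
```

Because the wrapper reads `ctx.get("owner")`, a context without an owner passes `None`, which matches the `owner: Optional[str] = None` default on the wrapped implementations.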
+98
View File
@@ -0,0 +1,98 @@
"""Agent tool to inspect and control detached background `bash` jobs.

`bash` blocks prefixed with a `#!bg` marker run detached via `src.bg_jobs`; the
agent is auto-re-invoked with the output when they finish. This tool covers the
gaps in that flow: list the jobs in the current chat, read a still-running job's
output on demand, and kill a runaway job instead of waiting out its max-runtime.

Registry tool (`TOOL_HANDLERS["manage_bg_jobs"]`). Jobs are scoped to the chat
that launched them, so every action requires the caller's `session_id` and a job
from another session is treated as not found.
"""

import json
import time
from typing import Any, Dict, List

_LIST_ACTIONS = {"list", "ls", "jobs"}
_OUTPUT_ACTIONS = {"output", "get", "read", "tail", "status", "show"}
_KILL_ACTIONS = {"kill", "stop", "cancel", "terminate"}
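

def _age(rec: Dict[str, Any]) -> str:
    """Format the time since the job started as a compact string, e.g. 42s, 5m, 1h3m."""
    start = rec.get("started_at")
    if not start:
        return "?"
    secs = int(time.time() - start)
    if secs < 60:
        return f"{secs}s"
    if secs < 3600:
        return f"{secs // 60}m"
    return f"{secs // 3600}h{(secs % 3600) // 60}m"


def _status_label(rec: Dict[str, Any]) -> str:
    """Human-readable status; the killed/timed-out/died flags take priority over `status`."""
    status = rec.get("status", "?")
    if rec.get("killed"):
        return "killed"
    if rec.get("timed_out"):
        return "timed out"
    if rec.get("died"):
        return "died"
    if status in ("done", "failed"):
        return f"{status} (exit {rec.get('exit_code')})"
    return status


def _row(rec: Dict[str, Any]) -> str:
    """One-line summary of a job: id, status, age, and the first line of its command."""
    # Guard against an empty command: "".splitlines() is [], so indexing [0] would raise.
    cmd_lines = (rec.get("command") or "").strip().splitlines()
    cmd = cmd_lines[0][:80] if cmd_lines else ""
    return f"[{rec.get('id')}] {_status_label(rec)} | {_age(rec)} | {cmd}"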


class ManageBgJobsTool:
    async def execute(self, content: str, ctx: dict) -> dict:
        from src import bg_jobs

        session_id = ctx.get("session_id")
        raw = (content or "").strip()
        try:
            args = json.loads(raw) if raw else {}
        except (ValueError, TypeError):
            args = {}
        if not isinstance(args, dict):
            args = {}
        # `or "list"` also covers an explicit JSON null for "action".
        action = str(args.get("action") or "list").strip().lower()
        job_id = str(args.get("job_id") or args.get("id") or "").strip()

        if not session_id:
            return {"error": "manage_bg_jobs: no active chat session; background jobs are scoped to a chat.", "exit_code": 1}

        if action in _LIST_ACTIONS:
            jobs: List[Dict[str, Any]] = bg_jobs.list_for_session(session_id)
            if not jobs:
                return {"output": "No background jobs in this chat.", "exit_code": 0}
            # Newest first.
            jobs.sort(key=lambda r: r.get("started_at") or 0, reverse=True)
            lines = "\n".join(_row(r) for r in jobs)
            return {"output": f"{len(jobs)} background job(s):\n{lines}", "exit_code": 0}

        if action in _OUTPUT_ACTIONS or action in _KILL_ACTIONS:
            if not job_id:
                return {"error": f"manage_bg_jobs: action '{action}' requires a job_id (see action='list').", "exit_code": 1}
            rec = bg_jobs.get(job_id)
            # Scope: only the chat that launched a job may see or control it.
            if rec is None or rec.get("session_id") != session_id:
                return {"error": f"manage_bg_jobs: no background job '{job_id}' in this chat.", "exit_code": 1}

            if action in _KILL_ACTIONS:
                if rec.get("status") != "running":
                    return {"output": f"Job `{job_id}` already {_status_label(rec)}; nothing to kill.", "exit_code": 0}
                killed = bg_jobs.kill(job_id)
                # Guard against a missing or empty command: "".splitlines() is [].
                cmd_lines = ((killed or {}).get("command") or "").splitlines()
                cmd = cmd_lines[0][:80] if cmd_lines else ""
                return {"output": f"Killed background job `{job_id}` ({cmd}).", "exit_code": 0}

            out = rec.get("output") or "(no output yet)"
            return {
                "output": f"Job `{job_id}` [{_status_label(rec)}, {_age(rec)}]\nCommand: {rec.get('command')}\n\nOutput:\n{out}",
                "exit_code": 0,
            }

        return {"error": f"manage_bg_jobs: unknown action '{action}'. Use list, output, or kill.", "exit_code": 1}
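The request parsing and session scoping above can be sketched in isolation. This is an illustration under stated assumptions, not the project's code: `_JOBS` is a hypothetical in-memory stand-in for `src.bg_jobs`, and `parse_request` and `get_scoped` are names invented for this example.

```python
import json

# Hypothetical in-memory job store standing in for src.bg_jobs.
_JOBS = {
    "a1": {"id": "a1", "session_id": "chat-1", "status": "running", "command": "sleep 600"},
    "b2": {"id": "b2", "session_id": "chat-2", "status": "done", "exit_code": 0, "command": "make test"},
}


def parse_request(content):
    """Parse the tool payload; anything that is not a JSON object falls back to 'list'."""
    raw = (content or "").strip()
    try:
        args = json.loads(raw) if raw else {}
    except (ValueError, TypeError):
        args = {}
    if not isinstance(args, dict):
        args = {}
    action = str(args.get("action") or "list").strip().lower()
    job_id = str(args.get("job_id") or args.get("id") or "").strip()
    return action, job_id


def get_scoped(job_id, session_id):
    """Return the job only if it belongs to the calling chat, else None."""
    rec = _JOBS.get(job_id)
    if rec is None or rec.get("session_id") != session_id:
        return None
    return rec


# Example payloads an agent might send:
#   parse_request('{"action": "kill", "job_id": "a1"}')
#   get_scoped("b2", "chat-1")  # job from another chat is reported as missing
```

Returning the same "not found" result for a missing job and for another chat's job means a caller cannot tell whether a job id exists in a different session.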

Some files were not shown because too many files have changed in this diff