- Agent: pass the open email reader (uid/folder/account/from/subject/body
preview) on every chat submit so 'reply to this' / 'write email saying
hi' route to ui_control open_email_reply with the right UID instead of
inventing a new .md draft. Code-level enforcement (chat_routes strips
create_document + send_email when active_email is set); cross-session
active_doc_id is now trusted instead of being silently dropped.
set_active_email/clear_active_email tool-layer helpers in
tool_implementations.
- ui_control open_email_reply: optional body argument so the agent can
open-and-write in one call; envelope now forwards uid/folder/account/
body/panel through tool_output. Tool description sharpened and the
parser rejects empty bodies on reply/reply-all (forces the agent to
write rather than open an empty draft).
- Email library: search now runs against [Gmail]/All Mail when the
current folder is INBOX (archived emails surface). Whirlpool spinner
+ 'Searching…' placeholder while in flight. Each search result is
stamped with its source folder so clicks open the right email instead
of whatever shares its UID in INBOX. Search no longer re-applies the
same text pill locally (which only checks subject/from/snippet, never
body) so body-only matches don't get dropped after IMAP returns them.
Initial inbox load bumped 100→500.
- Email favorites: 'Favorite (pin to top)' / 'Unfavorite' in both the
card menu and the open-reader more menu, backed by a new
/api/email/flag/{uid}?on=true|false endpoint. Flagged emails always
bubble to the top of the grid regardless of active sort.
- AI reply in doc editor: never overwrites existing draft text or the
quoted history. AI suggestion is prepended; AI-generated 'On …
wrote:' re-quotes are stripped so the original quote isn't visually
edited.
- Cookbook serve: pre-launch GPU driver / has_gpu / install / version-
floor checks (vllm minimax_m2 needs 0.10.0+, deepseek_r1 needs 0.7.0
etc.) before the launch chain starts. Detect 'another model already
running on this host' and offer Stop & launch (with graceful then
force tmux kill helpers, port release wait). Per-vendor deep-link
buttons (vLLM recipe / SGLang cookbook) with hardware hash. Backend
picker is now a custom dropdown with accent-coloured logos for vLLM,
SGLang, llama.cpp, Ollama, Diffusers; same glyphs added next to
package names in Dependencies. Runtime-readiness note moved inside
the panel (green when ready, red when missing) with an × dismiss.
Esc collapses the expanded card; expanded card scrolls when it
overflows; Trust Remote / Auto Tool / Reasoning Parser / Enforce
Eager / Prefix Caching / Expert Parallel / Speculative / MoE Env on
one row (Reasoning Parser auto-detected per model family).
Dtype→Row 1, GPUs→Row 2 (rightmost). Removed redundant GPU 'auto'
input — command builders read from the GPU button strip. Default
cookbook open is Download tab.
- Cookbook hwfit: 'Model (latest)' / 'Model (oldest)' header sorts by
release_date; release dates can be backfilled with the new
scripts/backfill_model_release_dates.py and recipe metadata pulled
with scripts/import_from_vllm_recipes.py against the upstream
vllm-project/recipes catalog (vllm_recipe + min_vllm_version stamped
on entries).
- Calendar: Quick add hint cycles a random Odysseus-themed example per
open (wooden horse Friday, crew muster 10am daily, council on
Ithaca, …). Typing a time like '11pm' in the event title updates
the hero clock live.
- Doc editor: email-mode Reply button (sparkle icon, accent) opens the
same Fast/Full + context popover the email reader uses; Ctrl+Alt+M
toggles markdown preview.
- Memories panel: custom sort picker with per-option icons, default
'Latest', visible Enabled/Disabled toggle text matching the section
description style.
* refactor(constants): single source of truth for data dir + merge core/src constants
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(contributing): use named src.constants for data paths, drop core/constants references
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(platform): add WSL compatibility functions and path translation
fix(cookbook): enhance model scan script to support additional HuggingFace cache paths
fix(hardware): improve cache key generation for remote SSH context
test(tests): add tests for WSL detection and path translation functionality
* fix(cookbook): prefer prebuilt wheels for llama-cpp-python and normalize package aliases
* fix: enable StrictHostKeyChecking in nvidia probe
refactor: consolidate ssh & powershell command execution to utility functions in core module
refactor: consolidate nvidia path candidates in to single variables in core module
tests: add tests for new utility functions
* fix: correct wrong variable name
* fix(security): close DNS-rebinding hole on diffusion_server
scripts/diffusion_server.py used to ship `allow_origins=["*"]` with the
default `--host=127.0.0.1` bind. Combined, that left the OpenAI-compatible
image API reachable from any browser tab via DNS-rebinding: an attacker page
resolves its own domain to 127.0.0.1 mid-fetch, the browser forwards the
request to the loopback server, the server processes it (no Host check), and
the wildcard CORS reply lets the attacker page read the result + drive the
GPU. CWE-346 + CWE-942 + CWE-352 (DNS-rebinding bridge).
Fix:
- Drop the wildcard CORS at module load (default-deny).
- Install `TrustedHostMiddleware` with a loopback allowlist so DNS-rebound
requests are rejected by the middleware before any route runs.
- Add additive `--allowed-host` / `--allowed-origin` CLI flags so operators
who need browser access on a specific origin can opt in explicitly without
re-introducing the wildcard.
Tests: tests/test_diffusion_server_security.py (9 cases) pin the allowlist
helpers, the default-deny CORS behavior, and the live middleware paths via
Starlette's TestClient.
Detected by Aeon + semgrep + manual review.
Severity: medium.
CWE-346 / CWE-942 / CWE-352.
* test(diffusion-server): drive ASGI app via httpx, not TestClient portal
The TrustedHost/CORS integration tests used `with TestClient(app) as
client:`, whose context-manager form spins up an anyio blocking portal to
run the app lifespan. Under the repo's pytest setup (anyio plugin active, a
stray asyncio_mode option, no pytest-asyncio) that portal deadlocks —
`test_trusted_host_middleware_rejects_attacker_host` hung indefinitely in
review before emitting any assertion output.
Replace the TestClient usage with a tiny _asgi_get() helper that drives the
ASGI app over httpx.ASGITransport on a fresh event loop (asyncio.run). No
portal, no lifespan, no dependency on the host project's async test plugins.
Host is taken from the request URL so TrustedHostMiddleware sees the exact
hostname under test; Origin goes through headers. Assertions are unchanged.
Focused test now passes in 0.12s; full file 9 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: aeonframework <aeonframework@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`odysseus-research list --status complete` returns an empty result on
any real corpus. The CLI accepts `complete` as a `--status` choice (the
user-facing label), but the writer in
`services/research/research_handler.py` stores `status="done"` when a
run finishes (and the legacy `src/research_handler.py` copy does the
same). The list filter at `scripts/odysseus-research` was a literal
string compare:
if args.status and (data.get("status") or "") != args.status:
continue
so `--status complete` filtered every finished record out, and the user
saw nothing — even though `odysseus-research list` (no filter) listed
them fine and `show RP_ID` worked on the same files. The other
documented choices — `running`, `cancelled`, `error` — are stored
verbatim by the writer, so the surface mismatch is just on `complete`.
Add a small `_STATUS_CLI_TO_STORED = {"complete": "done"}` map and run
`data.get("status")` through `_status_matches(...)` before comparing.
The other CLI choices fall through unchanged, so the filter still
matches them verbatim. A `None` or non-string `status` (corrupt JSON)
is coerced to `""` and never matches `complete`, so a half-written
record can't sneak past the filter.
`tests/test_research_cli_status_filter.py` covers all four documented
choices, the non-string / missing status case, and pins that the
verbatim choices are NOT rewritten — a blanket mapping that turned
every CLI choice into a stored variant would just re-introduce the
empty-result bug on the running/cancelled/error paths.
Part of #2122.
`add_hwfit_models.py` infers `parameter_count` and `parameters_raw` by
regexing the HF repo name for a `<num>B` token, optionally with an
`-A<num>B` MoE active-param suffix. Repos that don't encode a size in
their name at all (e.g. `zai-org/GLM-4.5`, where the "4.5" is a version
not a parameter count) fall through to the safetensors element-count
path. That path works for unquantized FP16 / BF16 repos but is brittle
in two cases the catalog hits often:
1. Author-bulk runs (`AUTHORS = ["cyankiwi"]`) pull pre-quantized AWQ /
GPTQ / MLX repos. The safetensors metadata stores the packed I32
tensors and a per-dtype `parameters` map, which the script unpacks
via a per-quant pack factor. When the upload doesn't populate that
map (older repos, custom shards), `st.total` is used raw and the
parameter count is off by 4-8x.
2. Repos where the safetensors block is absent from `model_info()`
entirely. The current code returns `None` and silently drops the
model, which then has to be added to `EXTRA_REPOS` by hand with a
literal `parameter_count` string.
Both are exactly what the issue calls out — the regex / safetensors
combo can't size GLM-4.5 by itself because the name has no `<num>B`
and the upstream repo's safetensors block doesn't carry a usable param
total either.
Add a config.json fallback in front of the safetensors path:
- `_fetch_config_json(repo_id)` downloads `config.json` via
`hf_hub_download` (so the standard HF on-disk cache handles
deduplication across runs, no extra cache layer needed). Network /
404 / gated-repo errors return `None` and the caller proceeds to the
safetensors fallback. An in-process `_CONFIG_CACHE` dedupes the
base-model vs. source-repo lookups within a single run.
- `_params_from_config(cfg)` first honours explicit `num_parameters` /
`n_params` / `total_params` fields when present. Otherwise it sums
embeddings + attention (GQA-aware via `num_key_value_heads` and
`head_dim`) + dense MLP (`3 * hidden_size * intermediate_size`,
covering SwiGLU / GeGLU). For MoE configs it picks up both naming
conventions in the wild — `num_experts` / `num_experts_per_tok`
(Qwen3-MoE) and `n_routed_experts` / `n_shared_experts` (GLM-4-MoE,
DeepSeek-V3) — uses `moe_intermediate_size`, and respects
`first_k_dense_replace` so the first N layers stay dense. Active
parameters come out as `num_experts_per_tok + n_shared_experts` of
the routed experts, which matches how each architecture reports its
active count.
- In `_entry_from_modelinfo`, try config.json on the source repo first
(works for unquantized models) and then on the `base_model:` parent
(covers AWQ / GPTQ children whose own config is just a quantization
manifest). Both lookups run only when regex + override + base_model
tag all failed, so the normal author-bulk run still resolves sizes
from names without touching the Hub.
Spot-checks against the three architecture families this script
actually pulls — within ~5% of the documented param counts, which is
well inside the `parameter_count` rounding (one decimal of "B") and
the `min_vram_gb` downstream bucket:
Qwen2.5-7B-Instruct 7.62B (HF card: 7.6B)
Qwen3-30B-A3B 30.5B / 3.34B active (card: 30.5B / 3.3B)
GLM-4.5 352.7B / 33.6B active (card: 355B / 32B)
The safetensors path is unchanged and remains the last resort, so
repos with neither a parsable name nor a fetchable config.json behave
exactly as before.
Closes#955.
* Improve Docker GPU setup diagnostics
Add a Docker GPU preflight script for NVIDIA users. The script is
read-only by default, checks host NVIDIA drivers, Docker availability,
and container GPU passthrough, and prints actionable next steps.
Add explicit opt-in modes to print install commands, install NVIDIA
Container Toolkit on Ubuntu/Debian, and enable the NVIDIA Compose overlay
in .env after passthrough is verified.
Document common NVIDIA Docker failure modes, ignore generated .env
backups, and clarify that Cookbook can only detect GPUs exposed to the
Odysseus container.
* Clarify Docker GPU diagnostic limits
The Cookbook fit scanner was reporting impossibly low VRAM requirements
for some pre-quantized models — e.g. cyankiwi/Qwen3-Coder-Next-REAM-AWQ-4bit
shown as 7.1 GB ('perfect' on a 12 GB card) when the real load is ~40 GB.
Root cause is in the catalog builder. When _entry_from_modelinfo falls
back to safetensors metadata for the parameter count, it stored
safetensors.total directly. For pre-quantized repos that figure reflects
*packed* element counts: AWQ/GPTQ-Int4 pack 8x 4-bit weights into one
I32, AWQ-8bit/GPTQ-Int8/FP8 pack 4x. The catalog therefore recorded
~1/8 of the real parameter count, and min_vram_gb = packed * bpp
double-applied the quantization.
Fix the safetensors fallback:
* prefer the per-dtype parameters dict when available and unpack only the
I32/I64 entries (the F16/BF16 scale/zero tensors and embeddings are
already at their real element counts)
* fall back to total * pack_factor when only total is exposed
Patch the catalog entries that were affected by the old fallback so the
fit ratings reflect reality without waiting for a full catalog rebuild:
* cyankiwi/Qwen3-Coder-Next-REAM-AWQ-4bit 11.4B -> 79.7B (40.8 GB VRAM)
* stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ 4.6B -> 30.5B
* stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ 5.1B -> 30.5B
* warshanks/Qwen3-8B-abliterated-AWQ 2.2B -> 8.2B
* QuantTrio/sarvam-30b-AWQ 7B -> 30B
* QuantTrio/sarvam-105b-AWQ 19B -> 105B
Closes#377.