Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner:
* Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file.
* `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400.
* `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt).
Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-*` / `cookbook-*` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll.
`serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes.
Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop.
Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles.
`adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd.
Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).
get_builtin_overrides() was swallowing all exceptions with a bare
`except Exception: pass`, so misconfigured tool-description overrides
would silently produce wrong agent behaviour with no log trace.
The background endpoint refresh loop had the same pattern: any probe
failure was silently ignored, giving operators no signal that the
refresh was broken.
Also removes a circular self-import (`from src.agent_loop import
_build_base_prompt`) inside _build_system_prompt; the function is
already in scope and the import created a latent circular reference risk.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
ModelEndpoint is defined in core.database, not src.database. The wrong
import silently prevented the module from loading in deployment
configurations that do not have a src/database.py shim, resulting in an
ImportError at startup.
Also adds a warning log when resolve_endpoint finds no usable model
(all models hidden or the list is empty), making the otherwise-silent
failure visible in operator logs.
The test_auth_regressions stub for src.endpoint_resolver was missing the
build_models_url attribute, which caused test collection errors.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Agent mode treated local /v1 endpoints, including Ollama on :11434, as native-tool-capable by host/model heuristics. On Ollama's OpenAI-compatible surface some models that advertise tool support stop after a single token when schemas are sent (issue #1567). Default local Ollama /v1 back to fenced tool blocks unless the endpoint explicitly has supports_tools=True.
Also compare both the runtime chat URL and the normalized endpoint base when reading ModelEndpoint.supports_tools. That keeps a saved base URL such as http://localhost:11434/v1 effective when the active session URL is /v1/chat/completions.
Tests: .venv/bin/python -m pytest tests/test_tool_support_heuristic.py
* fix: support large proxy model endpoint refresh
Large OpenAI-compatible proxy endpoints can expose hundreds of models and make /v1/models slow. Treating those endpoints like local model servers caused model picker opens and background probes to repeatedly hit /models, producing timeouts and making otherwise usable endpoints appear offline.
Make model endpoint discovery cached-first for normal UI usage, add explicit proxy/API classification and refresh policy fields, exclude proxy/API endpoints from aggressive local probing, and preserve cached models when refresh fails.
Manual Test/Add/Refresh actions still fetch the full model list with longer timeouts so users can intentionally import large proxy model lists without blocking normal model picker usage.
* fix: preserve endpoint ping status semantics
save() called load(), which DECRYPTS every stored key, then re-encrypted
only the key being saved and wrote the whole dict back. The other
providers' keys were thus persisted in plaintext; on the next load()
Fernet raised InvalidToken on them and they were silently dropped.
Add _load_raw() that returns the still-encrypted on-disk dict (reusing the
existing missing/corrupt-file guards) and have save() build on that, so
untouched providers keep their ciphertext. load() now also goes through
_load_raw(), keeping its behavior identical.
Fixes#1914
Co-authored-by: EkaTantra Dev <dev@ekatantra.com>
- Claude Agent integration: AGENT_CONFIGS.claude, INTG_TYPES.claude,
setup_claude_routes + integrations/claude/ skill bundle. Wired in
app.py alongside the existing Codex integration; same scope-gated
/api/codex/* backend; agent form has new description so users know
it's setup for an external CLI, not an agent streamed inside Odysseus.
- Remove mark_email_boundaries action: not good enough yet. Stripped
from task UI, scheduler defaults, registry, tool schema, clear-cache
route. Added to RETIRED_HOUSEKEEPING_ACTIONS so existing rows + their
task_runs auto-purge on startup.
- Cookbook download reliability: "Reconnect" fix button in the crash
diagnosis runs _reconnectTask after probing has-session. 30s confirm
window before marking a download "done" — kills the Finished/Downloading
flicker when tmux briefly drops between captures.
- Mobile UX: tap anywhere on a note card body opens the editor;
Update button morphs to Archive when no text was edited; bell icon
accent-colored; chip-trashing notif pills fade so only the icon
rotates into the trash zone.
- Settings integrations: SVG-per-provider in email + API preset
dropdowns, custom drop-up-aware menus, accent sub-header icons
(IMAP/SMTP), consistent card styling between list + edit, contacts
Edit/Delete icons, agent form description copy.
Validate only token-supplied direct base_url values for API-token chat requests, while keeping admin-configured endpoints available for local/LAN providers.
Scope configured endpoint fallback selection to the API token owner, fail closed for unknown token owners, and preserve strict session ownership checks when resuming sessions from chat-scoped API tokens.
Add focused regression coverage for direct base_url SSRF rejection, configured endpoint fallback behavior, token-owner scoping, URL validation, and null-owner session/endpoint handling.
parse_markdown_to_values — the read-back path for export-pdf, the export
preview, and prepare-signed-reply — matched the bold field label with [^*]+, so
it could not match a label containing '*' (the near-universal required-field
marker: "Email *", "State *", "Signature *"). The value then stayed empty, so
the exported PDF and the signed-reply attachment came out blank for that field
with no error — a whole form of required fields could export completely empty.
Match the label non-greedily (.+?) so '*' in labels is tolerated while still
splitting at the first ':**' / '**[', which also preserves a value that itself
contains ':**'.
Adds tests/test_form_markdown_roundtrip.py (render -> parse roundtrip): asterisk
text/choice/signature labels survive (fail before, pass after); plain labels and
colon-bearing values are unaffected.
Co-authored-by: NubsCarson <nubs@nubs.site>
If session.initialize() or list_tools() raises after the stdio
subprocess or SSE connection is already open, the AsyncExitStack is
never closed — leaking the child process or HTTP connection. Wrap the
setup phase in try/except to aclose() the stack before re-raising.
PermissionError was not in the except tuple so an unreadable settings.json
would crash the app instead of falling back to defaults. Added alongside the
existing FileNotFoundError/JSONDecodeError/ValueError catches.
Also adds test_settings_error_paths.py covering all four failure modes:
missing file, corrupted JSON, wrong type, and permission denied.
Gemma 4 returns reasoning_content in streaming responses via
llama-server, but the model wasn't listed in _THINKING_MODEL_PATTERNS,
causing reasoning tokens to be mishandled. Add "gemma" to the pattern
list and register Gemma 4's 128K context window in KNOWN_CONTEXT_WINDOWS
so the agent loop budgets context correctly.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
A native function/tool call whose `arguments` field is valid JSON but not an
object — a bare array like ["ls -la"], or a string/number/bool/null — parsed
fine in function_call_to_tool_block and then every branch called args.get(...),
raising AttributeError ('list'/'str' object has no attribute 'get'). That
propagated out of the streamed agent loop (no surrounding try/except at the
call site in stream_agent_loop) and aborted the user's entire turn. Weaker and
local models routinely emit malformed args like this.
Coerce non-dict parsed arguments to {} (mirrors the existing empty-arguments
behavior), so the tool runs with empty args instead of killing the stream.
Adds tests/test_function_call_non_object_args.py covering array/string/number/
bool/null arguments — they fail before this change and pass after.
datetime.utcnow() is deprecated in Python 3.12 and removed in 3.14.
Swap the five calls in src/cleanup_service.py for a local _utcnow()
helper returning naive UTC, matching the naive DateTime columns the
archive/delete cutoffs compare against (same approach as the
task-scheduler and core-database slices). Add a regression test
asserting the helper stays naive so the cutoff math can't hit a
naive/aware TypeError.
Part of #1116
* fix: context_compactor token helpers crash on non-string message text
* fix: _truncate_text_to_token_budget returns an empty string for non-string text, not the raw value
compute_next_run parsed scheduled_time as "HH:MM" with int(parts[0]),
int(parts[1]) and no validation, so "9", "9am", "25:00", "9:" or ":30" raised
IndexError/ValueError. The POST /tasks create route passes the user/LLM-supplied
scheduled_time before its try block (and only validates the cron field), so a
bad value surfaced as an unhandled 500 rather than the clean 400 used for other
invalid fields — and the same crash could fire inside the scheduler loop when
recomputing next_run for an already-stored bad row.
Guard the parse and fail closed (warn + return None), matching the existing
invalid-cron handling in the same function.
Adds tests/test_scheduler_scheduled_time_validation.py — malformed values return
None (fail before with IndexError/ValueError), valid HH:MM still computes.
* fix: is_public_blocked_tool crashes on a truthy non-string tool name
* fix: is_public_blocked_tool fails closed (blocks) on a malformed non-string tool name