Commit Graph

665 Commits

Author SHA1 Message Date
Ocean Bennett db1bbfe588 fix(sessions): keep fresh chats during auto tidy (#1871)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 01:06:20 +01:00
Kenny Van de Maele 2404b00f18 refactor(uploads): centralize upload byte-limits in upload_limits.py (#3364) (#3518)
Move every per-route upload byte-limit into src/upload_limits.py as a
validated, env-overridable constant via read_byte_limit_env:

- Add GALLERY_UPLOAD_MAX_BYTES, GALLERY_TRANSFORM_UPLOAD_MAX_BYTES,
  MEMORY_IMPORT_MAX_BYTES, PERSONAL_UPLOAD_MAX_BYTES,
  EMAIL_COMPOSE_UPLOAD_MAX_BYTES, STT_MAX_AUDIO_BYTES, ICS_MAX_BYTES.
- Routes import their constant instead of defining it locally: replaces 4
  raw int(os.getenv(...)) and removes 3 hardcoded literals.
- The 3 previously-hardcoded limits (email compose, STT audio, calendar
  ICS) are now env-overridable with the same ODYSSEUS_*_MAX_BYTES naming.
- Defaults unchanged, so behavior is unchanged unless an env var is set;
  an invalid value now fails fast with a clear message instead of a bare
  int() ValueError.
- Document all env vars in .env.example and the README.

Fixes #3364
2026-06-09 01:24:30 +02:00
Alexandre Teixeira a240f28af9 test(taxonomy): auto-mark tests by area and sub-area (#3491) 2026-06-09 01:13:28 +02:00
Ocean Bennett e7c1d75884 fix(models): query v1 models for llama-server endpoints (#3380)
* fix(models): query v1 models for llama-server endpoints

* test(models): accept owner kwargs in llama-server regression
2026-06-09 01:09:02 +02:00
Mateus Oliveira f7ae85590b refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils (#3478)
* refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils

Move all copies of _truncate(), get_mcp_manager(), and set_mcp_manager()
into a single leaf module (src/tool_utils.py) that imports only from
src.constants. This eliminates the lazy-import hack
('from src import agent_tools' inside function bodies) in tool_execution.py
and tool_implementations.py, and fixes a latent bug: the _truncate copy in
tool_execution.py was missing the isinstance guard and would crash on None.

Also deletes mcp_servers/_common.py — it was dead code with zero callers
anywhere in the codebase, containing its own copy of truncate() and
constants that already exist in src/constants.py.

* fix(tools): route remaining get_mcp_manager imports to src.tool_utils

The maintainer's feedback flagged src/task_scheduler.py:1857 and
routes/task_routes.py:977. A project-wide search found a third call site
in src/agent_loop.py that also imported get_mcp_manager from
src.agent_tools instead of src.tool_utils.

All three are now sourced from the canonical location in src.tool_utils.

---------

Co-authored-by: mcnoliveira <mcnoliveira@gmail.com>
2026-06-09 01:05:30 +02:00
Ocean Bennett 62ffcb6236 fix(cookbook): preserve same-host ssh profile selection (#3373)
* fix(cookbook): preserve same-host ssh profile selection

* fix(cookbook): resolve same-host ssh profiles in running tab and port lookups
2026-06-09 00:36:10 +02:00
Wes Huber 85c6056c87 test(models): add regression coverage for Z.AI coding endpoint probing (#2244)
Add focused tests for the z.ai/api/coding path override:
- _match_provider_curated: 5 tests verifying coding vs base key
- _probe_endpoint: 3 tests verifying model preservation, curated
  append on partial response, and base-zai exclusion

Rebased onto dev per reviewer request.

Fixes #2230

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-08 23:07:29 +01:00
Rohith Matam 049833e309 fix: skip malformed document tool call items (#3494) 2026-06-08 23:25:31 +02:00
Cookiejunky 4e497f4878 fix(cookbook): guard break-system-packages pip flag (#3510) 2026-06-08 23:10:20 +02:00
Lucas Daniel 5462030cde fix(auth): per-user allowed-models checklist ignores cache, [None] doesn't block (#3355)
Three issues combined to make the per-user 'Allowed models' checklist
unreliable (#3032):

1. admin.js _loadModelsForUser fetched /api/models, which is backed by
   cached_models — endpoints that haven't been probed yet (e.g. a
   freshly-added DeepSeek API endpoint) simply didn't show up in the
   checklist. Switched to /api/model-endpoints, which always reflects
   every configured endpoint regardless of cache state.

2. _saveModels sent allowed_models: [] both when the admin clicked
   [All] (no restriction) and [None] (block everything) — the backend
   had no way to distinguish the two.

3. _enforce_chat_privileges treated an empty allowed_models list as
   'no restriction' (falsy -> skip the check), so [None] had no effect.

Added an explicit block_all_models privilege flag (defaulting to False,
and forced to False for admins) that admin.js now sets when zero models
are checked. _enforce_chat_privileges checks it first and 403s
regardless of allowed_models contents.
2026-06-08 22:52:39 +02:00
Lucas Daniel 0a324f20d2 fix(agent): stop treating illustrative Markdown fences as tool calls for native function-calling models (#3356)
* fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models

_resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block
parser whenever a model produced no native tool_calls, regardless of whether
that model has a reliable native function-calling channel. Native models
(GPT/Claude/Grok/Qwen3/DeepSeek-V, etc. - _is_api_model true) commonly write
illustrative ```bash/```python/```json examples in guide-only prose; the
fallback parser matched these and executed them as real commands, sometimes
looping for several rounds as the model tried to clarify with more examples
(#3222).

Restrict the textual fenced-block fallback to non-native models, which rely
on it as their only tool-invocation channel. Native models are trusted to use
their structured tool_calls channel for real invocations; when they don't
emit one, a bare fence in their response is prose, not an action. The native
tool_calls path itself is untouched.

This sits one layer below #3088's guide-only policy enforcement: that PR
blocks tool exposure/execution on explicit no-tools requests, while this fixes
the parser so ordinary illustrative fences are never misread as calls in the
first place, on any turn.

* fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence

_resolve_tool_blocks previously short-circuited the entire textual parser
(tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native
function-calling models with no native tool_calls. That also dropped Patterns
2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content
as text), which are real calls a model couldn't emit on its structured channel
(e.g. DeepSeek-V falling back to DSML), not illustrative examples.

parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY
Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks
passes skip_fenced=is_api_model so fenced examples stop being executed for
native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and
recoverable. cleaned_round mirrors the same gate when persisting round text, so
an illustrative fence that wasn't executed isn't stripped from saved/reloaded
history either (it was streaming once and then disappearing on reload).
2026-06-08 22:25:28 +02:00
Mazen Tamer Salah 8e494cc1c4 fix(chat): keep balanced trailing ')' when extracting URLs (#3406)
extract_urls() stripped any trailing ')' unconditionally via
`re.sub(r'[.,;:!?\)]+$', '', url)`. That corrupts URLs that legitimately
end in a parenthesis — most commonly Wikipedia disambiguation links like
https://en.wikipedia.org/wiki/Python_(programming_language), which became
...Python_(programming_language and then 404 when fetched by the web/research
tools.

Strip trailing sentence punctuation as before, but only drop a ')' when it is
unbalanced (more ')' than '('), so a prose-glued "(see https://example.com)"
still loses its closing paren while balanced URLs keep theirs.

Added tests/test_extract_urls.py covering balanced, unbalanced, nested, and
trailing-punctuation cases.
2026-06-08 21:33:29 +02:00
nubs 932b7f2446 fix(email): close IMAP socket when connect/login fails (#3174) (#3363)
* fix(email): close IMAP socket when connect/login fails (#3174)

_imap_connect opened a live socket via _open_imap_connection and then
called conn.login() with no try/finally, and _open_imap_connection called
conn.starttls() unguarded. When auth fails (e.g. an Office 365 app password
on an MFA-enabled tenant, #3174) or STARTTLS is rejected, the already-open
socket was orphaned. Every IMAP caller funnels through _imap_connect,
including the 30-minute _auto_summarize_poller, so a persistently
misconfigured account leaked one descriptor per pass toward FD exhaustion.

The previously merged leak fixes (#1325/#1330/#1423/#1530) only guard the
post-connect body and monkeypatch _imap_connect to succeed, so this
connect-time path was uncovered. Wrap login() and starttls() so a failure
calls conn.shutdown() (low-level close; logout() can't run pre-auth) before
re-raising. Adds two regression tests that fail without the guard.

* fix(email): guard MCP IMAP+SMTP connect-time leaks too (#3174)

Folds in the sibling connect-time leaks vdmkenny flagged on #3363, so the
whole connect-then-step leak class is closed in one place:

- mcp_servers/email_server.py::_imap_connect — guard starttls() and login();
  close pre-auth with conn.shutdown() before re-raising.
- mcp_servers/email_server.py::_smtp_connect — guard starttls() and login();
  SMTP has no shutdown(), so close with conn.close() (socket close, no QUIT).

Routes SMTP (_send_smtp_message) is already safe via 'with smtplib.SMTP(...)'.
Adds four regression tests (one per guard), verified to fail without the fix.
2026-06-08 21:21:41 +02:00
Alex Little a58f526992 fix(presets): scope expand-prompt model resolution to owner (#3477)
* fix(presets): scope expand-prompt model resolution to owner

/api/presets/expand resolved its model endpoint with no owner, so in a
multi-user setup it could match another user's endpoint and use its URL
and decrypted api_key. Pass effective_user(request) to _resolve_model so
resolution is owner-scoped. Adds a regression test.

* fix(presets): scope teacher and audit model resolution to owner

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alex Little <alexwilliamlittle@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-08 21:12:02 +02:00
Mazen Tamer Salah 5198516979 fix(sessions): copy message metadata when forking a session (#3409)
fork_session passed each source message's metadata dict by reference into the
new session. add_message() -> _persist_message() stamps _db_id (and timestamp)
onto that dict in place, so persisting the fork overwrote the SOURCE messages'
_db_id with the forked rows' ids — silently breaking edit/delete-by-id on the
original conversation.

Copy the metadata dict per message so the fork and source no longer alias.

Adds tests/test_fork_session_metadata.py asserting the source session's
message metadata is unchanged after a fork.
2026-06-08 20:49:15 +02:00
Giuseppe Castelluccio 095c74b985 fix(security): fail closed in /api/models auth gate on unexpected errors (#3489)
GET /api/models swallowed any non-HTTPException raised while checking
whether the caller is authenticated (bare except Exception: pass), so a
broken auth_manager or an exception from get_current_user silently
granted the full model list to an anonymous caller instead of rejecting
the request. Now any unexpected exception logs and returns HTTP 500.

Split out of #2360 per reviewer request to keep the deny-list and the
auth-gate fix as separate, single-purpose PRs.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 20:23:39 +02:00
CorVous 34a3f8637a fix(memory): make auto-memory extraction reliable for reasoning models (#3190)
* fix(memory): auto-memory extracted nothing — flatten window so the prompt ends on a user turn

extract_and_store appended the recent window as raw alternating role messages
after the system prompt. Since the window is the last N messages, the prompt
usually ENDED on an assistant turn — and a chat model given a prompt ending on
an assistant turn returns an empty completion (nothing to answer). The result
was facts=[] → "Auto memory extraction ran: 0 candidates" on every run, so no
memories were ever stored, while skill extraction (which flattens the transcript
into a single user message) worked fine.

Flatten the window into one user message ending with an explicit instruction,
mirroring the skill extractor, so the model always responds. Also harden parsing
for reasoning models, matching the audit path which already does this:
- raise max_tokens 500 → 4096 (a reasoning model spends the budget on <think>
  before emitting JSON; 500 truncated it before any JSON appeared);
- strip <think>/prose preambles via strip_think and slice the embedded JSON
  array before json.loads, instead of bombing on char 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: tighten memory-extraction-empty-completion — clarify JSON-slice comment re prior strip steps

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(memory): reframe the comment to the accurate root cause (raw-chat framing)

The earlier comment leaned on "ends on an assistant turn -> empty completion",
which is only one failure mode. The dominant cause, confirmed by a controlled
repro (0/6 old vs 6/6 new on this model), is that passing the window as raw chat
messages makes the model treat it as a conversation to continue rather than a
transcript to analyze, so it returns [] even when durable facts are present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(memory): cover extraction JSON parsing + slice trailing commentary unconditionally

Factor the strip/fence/slice/json.loads logic out of extract_and_store into
a pure module-level helper _parse_extraction_json(raw) -> list and drop the
'text[0] != "["' guard so the array is sliced whenever both brackets exist
(fixes trailing commentary like '[...] Done!' reaching json.loads).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 19:57:44 +02:00
Mazen Tamer Salah 8449baea80 fix(api-tokens): preserve scopes on a partial token update (#3407)
PATCH /api/tokens/{id} unconditionally recomputed scopes from
payload.get("scopes"). On a rename — body {"name": "..."} with no "scopes"
key — that is None, so _normalize_scopes(None) returned the default ["chat"]
and the handler overwrote token.scopes, silently dropping every scope the
token had been granted (e.g. email:read, calendar:write).

Only write scopes when the request actually includes them, and return the
token's real stored scopes in the response (matching the GET /tokens display
shape) instead of the recomputed default.

tests/test_api_token_routes.py: add rename-preserves-scopes,
explicit-scopes-applied, and missing-token-404 cases for the PATCH handler.
2026-06-08 19:37:31 +02:00
Mazen Tamer Salah d58202d10e fix(presets): persist presets atomically to avoid corruption on crash (#2169)
PresetManager.save() used a plain open("w") + json.dump, which truncates
presets.json before writing the new content. A crash, power loss, or
serialization error mid-write leaves the file truncated/empty and every
saved preset is lost.

Route the write through core.atomic_io.atomic_write_json (tmp file +
os.replace), matching how the rest of the codebase persists JSON state.
The helper is imported lazily so this module stays free of the heavy core
package import graph at module load time.

Adds tests/test_preset_atomic_save.py covering the source contract, a
failed-write leaving the existing file intact, and a round trip.
2026-06-08 19:16:37 +02:00
Mazen Tamer Salah 1209f258d7 fix(caldav): skip the prune when any object fails to parse (#3454)
* fix(caldav): don't prune the whole window when no objects could be parsed

The post-sync prune deletes local origin=="caldav" rows in the window whose UID
the server didn't just return. With an empty seen_uids it falls back to
`uid.isnot(None)` — a match-all delete. That's right when the calendar is
genuinely empty, but when the server returns objects and every one fails to
parse (malformed iCal / an icalendar error), seen_uids is empty only because
nothing could be read, so the match-all branch silently deletes every local
event in the 90-day-back/365-day-forward window.

Track whether any object failed to parse and gate the prune with a small pure
helper `_should_prune_window(seen_uids, parse_failed)`: prune when something was
read, or when the calendar is genuinely empty (no objects, no parse errors), but
never when objects came back unreadable.

Adds tests/test_caldav_prune_parse_failure.py for the three cases.

* fix(caldav): skip the prune on any parse failure, not just total

Review follow-up (#3454): _should_prune_window returned True whenever seen_uids
was non-empty, so a partial parse failure (say 48 of 50 objects parse) still
pruned the 2 unreadable-but-still-upstream events, because their UIDs were absent
from seen_uids. Any parse failure makes seen_uids an incomplete view of the
server, so pruning against it is unsafe whether the failure is total or partial.

Skip the prune on any parse failure (return not parse_failed); only prune on a
clean read (a genuinely empty window is still safe to prune). Tradeoff: one
permanently-unparseable event pauses deletion mirroring until it is fixed, which
is the safe direction (false-keep beats false-delete).

Replace the now-incorrect "partial failure still prunes" assertion with a
partial-failure regression: one object parses, one fails, so the prune is
skipped and the unparsed event's local copy is not deleted.

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-08 18:59:14 +02:00
Mazen Tamer Salah d71284194b fix(memory): only delete memories the model explicitly drops in tidy (#3455)
* fix(memory): only delete memories the model explicitly drops in tidy

The AI memory-tidy path computed deletions as the complement of the model's
`keep` list (`if mid not in keep_ids: continue`). When the model returned a
valid response that simply omitted some existing ids — a common LLM lapse — every
omitted memory was silently deleted, even though it was neither a duplicate nor
listed in `drop`.

Honor the explicit `drop` set instead: delete only ids the model dropped (minus
any it saw only truncated), and preserve everything else, still applying cleaned
text/category from `keep`.

Adds tests/test_consolidate_memory_explicit_drops.py: a memory the model omits
from both keep and drop survives; an explicitly dropped one is removed.

* refactor(memory): remove now-dead keep_ids from tidy

After deletion switched to drop_ids and text/category rewrites to cleaned_by_id,
keep_ids was written but never read. Remove the init, the .add(mid) in the keep
loop, and the truncated .update() (its truncated-protection is already covered by
`drop_ids -= truncated_ids`). Pure deletion, no behavior change; tests stay green.

Addresses review feedback on #3455.

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-08 18:54:45 +02:00
Aman Tewary d458cade98 docs(email): clarify Outlook password auth failures
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-08 15:32:16 +01:00
Mostafa Eid d6882a895e feat(chat): recall last user message on empty composer ArrowUp (#1175)
Pressing ArrowUp on an empty #message composer restores the last sent user text, matching common chat-app UX (Slack, Discord, ChatGPT).

- Read from #chat-history .msg-user dataset.raw (same path as resend/regenerate), not session sidebar metadata

- Literal empty check (whitespace-only drafts are preserved); ignore Shift/Alt/Ctrl/Meta and IME composition

- Extract wiring to composerArrowUpRecall.js; rAF + 250ms retry only (no global MutationObserver)

- Add tests/test_composer_arrow_up_recall_js.py

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-08 13:06:05 +02:00
Vykos 4a9085d252 fix(endpoint): scope secondary endpoint lookups by owner
* Scope secondary endpoint lookups by owner

* Reject unregistered image endpoint URLs for non-admins

* Adjust owner-scope tests for rebased routes

* Allow non-admins to compare endpoints they own

The compare owner-scope guard called _reject_raw_endpoint_url_for_non_admin
with endpoint_id=None, so it rejected every signed-in non-admin
/api/compare/start request — even for endpoints the caller owns — because
compare resolves endpoints by URL and carries no endpoint_id. That locked
non-admins out of compare entirely.

Resolve the owned ModelEndpoint first and pass its id, so a registered
endpoint the caller owns is allowed while only truly raw, unregistered URLs
are rejected (mirrors the gallery inpaint/harmonize checks in this PR).
Replace the source-only reject test with deterministic reject + allow
regressions that no longer depend on the dev DB contents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Bind compare sessions to the resolved owner-scoped endpoint

/api/compare/start created the [CMP] helper sessions with the raw
caller-supplied endpoint URL and only used the owner-scoped lookup to
decide whether to copy an API key. That stopped key borrowing but still
let a non-admin inject an arbitrary raw endpoint URL into the compare
session path.

Now, when the supplied URL resolves to a registered endpoint visible to
the caller, the session binds to that row's own normalized base URL
(build_chat_url(normalize_base(ep.base_url))) plus its headers — the same
registered-endpoint shape session_routes uses. The raw URL survives only
when ep is None, which non-admins already hit a 403 on, leaving raw URLs
reachable solely for admins / single-user mode with no borrowed key.

Adds compare-specific behavior tests: another user's private endpoint is
rejected (nothing created), the session binds to the stored URL rather
than the raw input, and an admin raw URL is allowed but carries no
inherited key.

Addresses the review on #1511.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Validate both compare endpoints before creating any session

start_comparison resolved + created each [CMP] session inside one loop,
so a request pairing a valid owned endpoint A with an unregistered raw
endpoint B raised 403 only after A's session was already created — and
its Authorization header copied in. The rejected request left a partial
compare session with that header behind.

Split the flow into two phases: phase 1 resolves and owner-validates
both endpoints (running the raw-URL reject helper) and stashes the
session URL + headers; phase 2 creates the two sessions only once both
passed. A 403 on either endpoint now aborts with nothing created and no
header copied.

Adds a regression test: owned endpoint A + unregistered/raw endpoint B
-> 403 with no sessions created.

Addresses the follow-up review on #1511.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Resolve compare credentials by endpoint id, not URL alone

Two endpoints visible to a caller can share a base_url but hold different
api_keys. _owned_endpoint_by_url returned whichever row sorted first, so
/api/compare/start could copy the wrong key into the [CMP] session.

Add _owned_endpoint_by_id (same owner scoping) and optional endpoint_a_id/
endpoint_b_id form fields. The id pins the exact registered endpoint; URL
resolution remains only for legacy/admin raw-URL callers. An id the caller
can't see 404s instead of falling back to a same-URL row.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Loosen research-routes owner-scope assertion to the stable substring

The rebased _resolve_research_endpoint generalized its owner derivation to
honor an explicit owner arg first (owner = owner or getattr(sess, ...)), so
the exact-line assertion broke CI. Assert the stable session-derivation
substring instead of the full line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:51:55 +01:00
stocky789 1e0d9b92af feat: add ChatGPT Subscription provider (#2876)
* feat: Add ChatGPT Subscription support and related features

- Introduced a new provider option for ChatGPT Subscription in the endpoint selection UI.
- Implemented OAuth flow for ChatGPT Subscription sign-in, including polling for authorization status.
- Updated admin interface to handle ChatGPT Subscription, including disabling API key input and providing user guidance.
- Enhanced cost tracking logic to differentiate between subscription and non-subscription endpoints.
- Added new slash commands for managing skills, including listing, searching, and invoking skills.
- Implemented caching for skill catalog to optimize performance.
- Updated tests to cover new ChatGPT Subscription functionality and ensure proper endpoint probing.
- Refactored existing code to accommodate new features and improve maintainability.

* refactor: share provider device-flow setup

- reuse one device-flow backend for Copilot and ChatGPT Subscription
- add one frontend device-flow helper for Settings and /setup
- put GitHub Copilot back into Add Models, now as a dropdown option
- make provider selection just select; clicking Add starts sign-in
- stop ChatGPT Subscription setup from opening auth tabs automatically
- make /setup copilot and /setup chatgpt-subscription work from chat
- show ChatGPT Subscription in the /setup suggestions
- show the real error message when setup fails
- add focused tests for the shared flow and setup UI

* feat(chatgpt-subscription): harden credential lifecycle and streamline auth UX

Backend:
- Resolve runtime bearer for provider-auth endpoints at probe time via a
  shared _resolve_probe_key() that delegates to resolve_endpoint_runtime,
  applied across all probe/refresh call sites.
- Skip live completion probes and health pings for discovery-only providers
  (centralized behind _is_discovery_only_provider) — the Codex/Responses API
  has no such endpoints, so status is derived from cached models.
- Never persist the short lived ChatGPT bearer to the plaintext sessions
  table; proactively clear any stale bearer left by an earlier code path.
- Revoke orphaned ProviderAuthSession credentials when the last endpoint
  backing them is deleted (_delete_orphaned_provider_auth), surfaced via
  cleared_provider_auth in the delete response.

Frontend (admin.js):
- Auto-start the device-auth flow on provider selection so the authorization
  panel (code + Authorize) shows immediately instead of behind a "Sign in" click.
- Remove the redundant top button for device auth providers, move retry
  into the panel via an inline "Try again".
- Drop the self-evident hint text and add an execCommand clipboard fallback so
  Copy works in non-secure (HTTP/LAN) contexts.

* fix: harden chatgpt subscription provider

* chore: remove PR media from branch

* Fix chatgpt subscription recovery and token handling

---------

Co-authored-by: 5p00kyy <admin@5p00ky.dev>
2026-06-08 10:19:18 +02:00
Mike ac94885c84 refactor(constants): single source of truth for data dir (#3368)
* refactor(constants): single source of truth for data dir + merge core/src constants

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(contributing): use named src.constants for data paths, drop core/constants references

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 09:58:52 +02:00
Lucas Daniel adc6ac9394 fix(compare): stream Compare panes directly to stop upstream promptly
The previous approach polled request.is_disconnected() inside the
async-for body of the chat/agent streaming loops. That happens too
late: by the time the poll runs, __anext__() has already awaited and
consumed the next upstream chunk, so a slow or silent generation could
still run for a full round-trip (or until a read timeout) after the
client disconnected. It was also unconditional, which would have made
ordinary chat navigation/refresh/tab-close stop a run that the
detached-run design intentionally keeps going server-side.

Both problems trace back to the same root cause: chat_stream always
wraps its generator in agent_runs (the detached-run manager), which
decouples the generator's lifetime from the SSE response on purpose so
normal chat/agent streams survive the client going away. Polling
disconnection inside a detached generator can never be "prompt" — the
generator isn't tied to that request anymore — and doing so defeats the
whole point of detaching it.

Compare panes don't need (or want) that: each pane's session exists
only to drive that one generation, there's nothing meaningful to
/resume, and the user expects the pane's Stop button — which aborts the
fetch and closes the SSE — to cancel the upstream call right away. So
route compare-mode requests around the agent_runs wrapper entirely and
stream the generator directly as the SSE body. Starlette already
cancels a streaming response's body iterator (raising
CancelledError/GeneratorExit into it) the instant it notices the client
disconnected — including while the generator is mid-await on the next
upstream chunk — and the existing except (CancelledError, GeneratorExit)
handlers in both the chat-mode and agent-mode loops already save the
partial response exactly once. No polling needed; the redesign just
stops getting in its own way.

Normal (non-Compare) chat and agent streams are untouched and keep
going through agent_runs, preserving detached-run semantics (surviving
tab close / navigation / refresh, reconnect via /api/chat/resume).

Replaces the source-text assertions in
tests/test_compare_stop_disconnect_poll.py with runtime tests that
actually exercise the cancellation contract: a Compare-shaped generator
is cancelled mid-await (not after the next chunk arrives) and saves its
partial exactly once; a normal completion still saves exactly once via
the completion path; agent_runs keeps a detached run alive when its
subscriber disconnects and only stops it on an explicit stop()/cancel
(also saving the partial exactly once); and the cancellation contract
is pinned for both chat-mode- and agent-mode-shaped chunk sequences.
2026-06-08 01:13:45 +01:00
Lucas Daniel fa7c4f8ea9 fix(search): catch HTTPStatusError so 403/404 URLs degrade gracefully instead of 500 (#2203)
raise_for_status() raises httpx.HTTPStatusError for 4xx/5xx responses,
but the surrounding try/except only caught httpx.RequestError (network
errors) and RateLimitError (429). Any other HTTP error code propagated
uncaught up through chat_processor -> chat_helpers -> chat_routes and
surfaced as a 500 Internal Server Error.

Added an explicit except httpx.HTTPStatusError clause that logs a warning
and returns an empty result, matching the behaviour already in place for
network errors.

Also adds focused regression tests that exercise the real
fetch_webpage_content() path with a mocked _get_public_url:
- 403/404 responses return the standard empty-result shape instead of
  raising, proving the new HTTPStatusError handling works end to end.
- 429 responses still take their own dedicated rate-limit branch (the
  status_code == 429 check runs before raise_for_status() is reached),
  keeping that behaviour distinct from the new generic HTTPStatusError
  handling.

Dropped the unrelated builtin_mcp.py change that had been carried over
from a rebase; that fix is tracked separately in #2018 and this branch
should stay scoped to the search content fetch path.

Closes #2148
2026-06-08 01:09:21 +01:00
Alexandre Teixeira 77b75ca97e docs(tests): define testing standard and taxonomy (#3372) 2026-06-08 01:15:47 +02:00
Kenny Van de Maele 505d8bae5a fix(cookbook): locate cookbook_state.json via DATA_DIR, not hardcoded /app/data (#3332)
Three call sites hardcoded Path("/app/data/cookbook_state.json"), which only
exists in Docker; on a native run the real path is <repo>/data, so the state
file looked missing and cookbook serve-state was silently ignored. Two others
used os.environ.get("DATA_DIR", "data") (a relative fallback, since DATA_DIR is
never set as an env var). Route all five through core.constants.DATA_DIR so the
path is consistent and absolute on both Docker and native.

Part of #3331.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 00:13:47 +01:00
horribleCodes 9c90f62657 fix(platform): Improve WSL SSH remote compatibility (#3316)
* fix(platform): add WSL compatibility functions and path translation
fix(cookbook): enhance model scan script to support additional HuggingFace cache paths
fix(hardware): improve cache key generation for remote SSH context
test(tests): add tests for WSL detection and path translation functionality

* fix(cookbook): prefer prebuilt wheels for llama-cpp-python and normalize package aliases

* fix: enable StrictHostKeyChecking in nvidia probe
refactor: consolidate ssh & powershell command execution to utility functions in core module
refactor: consolidate nvidia path candidates in to single variables in core module
tests: add tests for new utility functions

* fix: correct wrong variable name
2026-06-08 00:33:50 +02:00
Lucas Daniel 73315e6ddc fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON (#2205)
* fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON

The extractor sliced from the FIRST brace to the LAST brace to recover
JSON embedded in surrounding commentary. When the model emits stray
braces before the JSON object, the slice produces invalid JSON,
json.loads raises, and the exception is swallowed -- the skill is
silently lost.

Fix: walk each brace candidate left-to-right and attempt json.loads on
each slice. The first candidate that parses successfully wins. If none
parse, json.loads on the original text raises and the existing
JSONDecodeError handler logs and returns None as before.

Tested locally -- 8/8 tests passed:
  tests/test_extract_skill_json_nonstring.py (2 passed)
  tests/test_skill_extractor_rows.py (1 passed)
  tests/test_search_content_extraction_parity.py (2 passed)
  tests/test_deep_research_search_error.py (3 passed)

Closes #2199

* test(skill-extractor): add focused repro for stray-brace JSON recovery

* test(skill-extractor): add regression test for leading invalid-brace fragment

Addresses the remaining edge case from review: a response that *starts*
with a brace but the leading fragment isn't valid JSON (e.g.
'{not json}\n{"title": "Valid later", ...}') still needs to recover
the valid skill object that follows.

_extract_json_object (already on dev) handles this correctly — it tries
the whole de-fenced string first, then walks each '{' candidate left-to-
right regardless of whether the response begins with '{', so the leading
invalid fragment no longer short-circuits recovery of the real object.
Updates the comment at the call site to call this out explicitly and adds
a regression test covering exactly the scenario described in review.
2026-06-07 23:31:12 +01:00
Alexandre Teixeira a017108d41 refactor(tests): add temp sqlite helper (#2930) 2026-06-07 23:44:16 +02:00
Alexandre Teixeira 9ad6a2809e test(diffusion-server): exercise security middleware wiring (#3214) 2026-06-07 23:42:11 +02:00
nubs 1a0e1c5d69 fix(documents): restore PDF library metadata and preview (#2483)
PDF uploads are stored as markdown wrappers with pdf_source or pdf_form_source markers so the editor can preserve extracted text, form fields, and annotations. The library exposed that internal wrapper: auto-created PDF documents used the hashed storage filename as the title, and row/facet language reported markdown instead of pdf.

Derive chat-upload PDF titles from the original upload name, derive document-library display language from the PDF source marker for rows, filters, and facets, and keep markdown wrappers excluded from the markdown facet when they represent PDFs.

The expanded library card already renders PDF-backed documents through /api/document/{id}/render-pdf. Allow only that inline PDF preview endpoint to be framed by same-origin app pages while leaving normal routes on X-Frame-Options: DENY and frame-ancestors none.

Also tighten the existing PDF marker regression assertion so it matches the actual historical corruption signature instead of contradicting the preserved [Page 1 text]: marker.

Fixes #2468
2026-06-07 23:23:27 +02:00
Kenny Van de Maele 76c1f42ab0 fix: route all agent loopback calls through internal_api_base() helper (#3322)
#2753 made the agent loopback base port-configurable but only for
_COOKBOOK_BASE in tool_implementations. Several other in-process loopback
calls still hardcoded http://localhost:7000 and broke off port 7000:
cookbook_serve_lifecycle (model-endpoints x2, shell/exec), builtin_actions
(model/serve), task_routes (calendar x3), and the gallery/email calls in
tool_implementations.

Extract the resolution (ODYSSEUS_INTERNAL_BASE / APP_PORT / 7000 fallback,
127.0.0.1 to avoid IPv6 ambiguity) into core.constants.internal_api_base()
and route every call site through it. Rename the now-misnamed _COOKBOOK_BASE
to _INTERNAL_BASE since it serves gallery/email/calendar/serve too. Adds a
test for the resolver plus a regression guard against reintroducing the
literal.

Part of #2752.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 22:22:09 +01:00
Gunnar Arias d85c5e335e fix(security): harden untrusted_context_message against delimiter spoofing (#3086)
* fix(security): harden untrusted_context_message against delimiter spoofing

Root cause: untrusted_context_message() did not sanitise content before
interpolating it into the <<<UNTRUSTED_SOURCE_DATA>>> / <<<END_UNTRUSTED_SOURCE_DATA>>>
delimited sandbox block. Malicious content embedding the literal delimiter
strings could prematurely close the sandbox and inject instructions that
the LLM treats as trusted.

Fix: add _escape_guard_markers() helper that replaces the guard marker
strings with structurally inert tokens (<<<_UNTRUSTED_DATA>>> and
<<<_END_UNTRUSTED_DATA>>>) before the content is wrapped. The function is
applied in untrusted_context_message() after casting content to str.

The existing ~13 call sites (chat_processor.py, agent_loop.py,
deep_research.py, chat_helpers.py, chat_routes.py) are unaffected because
they pass content through without inspecting the output delimiters.

Regression tests added in tests/test_prompt_security.py covering:
- _escape_guard_markers unit tests (open, close, both, benign passthrough)
- untrusted_context_message integration tests (delimiter spoofing
  neutralisation, type coercion, None handling, metadata preservation)

Resolves #3056

* fix(security): sanitize label for newlines and guard markers

Addresses reviewer feedback on PR #3086:
- Normalize label: strip CR/LF to prevent pre-guard line injection
- Escape guard marker literals in label via _escape_guard_markers()
- Add regression tests for label-based newline injection, GUARD_OPEN
  and GUARD_CLOSE in label, and exactly-one-structural-guard assertion

* fix(security): move Source label inside GUARD_OPEN block

The reviewer correctly identified that even after sanitizing the label,
any user-derived label text (e.g. `f"web page: {url}"`) still appeared
before GUARD_OPEN in the trusted framing zone, where the LLM treats it
as trusted instructions.

Fix: move the 'Source: {label}' line to inside the guarded block so
only the hardcoded UNTRUSTED_CONTEXT_HEADER sits before GUARD_OPEN.
The raw label is still kept in metadata["source"] for traceability.
_sanitize_label() and _escape_guard_markers() are kept for defence-in-
depth on the label stored inside the block.

Update test_label_newline_injection_is_blocked to assert no label-
derived instruction text appears before GUARD_OPEN (pre-guard zone is
now empty of any user-derived content).
2026-06-07 22:15:50 +01:00
Syed Ali Jaseem f939cb65ce refactor(tests): replace local function copies in test_endpoint_resolver with real imports (#3359)
* refactor(tests): replace local function copies in test_endpoint_resolver with real imports

The test file carried 9 verbatim copies of src/endpoint_resolver.py functions
to avoid import-pollution concerns, but these copies are a drift hazard — PR #3343
had to update both in parallel.  Replace them with direct imports so future changes
to endpoint_resolver are automatically exercised by the test suite.

Also fixes _ollama_api_root in endpoint_resolver.py: the bare-URL Ollama case
(e.g. http://nas:11434 with empty path) was already handled correctly in the test
copy but was missing from the real function, which would return /chat instead of
/api/chat for native Ollama endpoints without an explicit /api prefix.

Closes #3351

* refactor: import _ollama_api_root from llm_core instead of duplicating it

endpoint_resolver already imports _detect_provider and _host_match from
llm_core. Add _ollama_api_root to that import and remove the local copy,
collapsing two implementations to one source of truth.

llm_core's version is a superset (also strips /api/chat|tags|generate paths),
and since normalize_base already removes those suffixes upstream the result
is identical for every input used here.
2026-06-07 22:47:57 +02:00
nubs 865e61450e fix(upload): configure chat attachment size limit (#2439) 2026-06-07 22:42:24 +02:00
nubs 8746c9c0df fix(documents): discard pending AI diff before switching active doc (#2484)
The document editor stores the AI-edit diff state (_diffModeActive,
_diffOldContent, _diffNewContent, _diffChunks) as a module-global
singleton bound to whatever document was active when the diff opened,
and every document shares one #doc-editor-textarea. When the active
document is switched while an unapproved diff is open, the stale diff
must be discarded first or a later exitDiffMode (tab switch /
Accept-Reject-All) flushes the old document's content into the new
active document and overwrites it (issue #2467).

Guard both paths that switch the active document for an AI update,
while activeDocId still points at the previously-active doc:
- handleDocUpdate(): a doc_update targeting a different document.
- streamDocOpen(): the AI streaming a NEW document — this runs first on
  that path, so a guard only in handleDocUpdate would fire too late and
  still overwrite the streamed document.

Both reuse the exact `if (_diffModeActive) exitDiffMode(true);` guard
switchToDoc() and enterDiffMode() already use.

Fixes #2467
2026-06-07 22:35:35 +02:00
nubs f7c0b3f23b fix document preview refresh after AI edits (#2259) 2026-06-07 22:33:01 +02:00
Syed Ali Jaseem e3e37ce526 fix(sessions): scope enrichment queries by owner, add LIMIT to auto_sort (#3350)
GET /api/sessions fired full-table scans against sessions, documents, and
gallery_images on every call. Added DbSession.owner == user (line 265),
Document.owner == user (line 283), GalleryImage.owner == user (line 289),
and .limit(2000) to auto_sort_sessions (line 1013). All follow the existing
owner-scoping pattern at lines 700 and 1230. No behaviour change — the
response was already correct; this eliminates the over-fetch.
2026-06-07 21:32:21 +02:00
adabarbulescu a8859bb25c fix(llm): Properly detect remote Ollama bare URLs as native endpoints (fixes #3252) (#3343) 2026-06-07 21:19:19 +02:00
lekt8 accdc4fc53 Add hover tooltips for clipped model names (#1982) (#1985)
Long model names are truncated with ellipsis in two places with no way to see
the full name: the model-picker dropdown items and the chat-header model
indicator. Add a native title tooltip carrying the full name to both — the
dropdown item's name span (nameSpan.title = m.display) and the header label
(label.title = the full model id; empty for the 'Select model' placeholder).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:23:44 +02:00
RaresKeY 3a91c11ff8 fix: block app_api access to Cookbook host controls (#3231) 2026-06-07 19:20:11 +02:00
Ashvin 00e8084969 fix(notes): handle time-first due_date phrases in parse_due_for_user (#3319)
parse_due_for_user only matched day-first format ('today at 3pm').
Time-first strings like '3pm today' or '11pm today' — which the tool
schema and tool_index both advertise as valid examples — fell through
all branches, hit dateutil or the legacy _parse_dt fallback, and in
many cases raised ValueError. do_manage_notes then stored the raw
string verbatim, and the ISO-only reminder scanner (action_ping_notes)
never fired the note.

Add a time-first regex branch immediately after the day-first branch
to handle '<time> today|tonight|tomorrow|tmrw|yesterday'. Existing
day-first parsing is unchanged.

Fixes #3302
2026-06-07 19:15:38 +02:00
ooovenenoso 681a2a3f2a fix(cookbook): scan persisted HF cache paths (#3189) 2026-06-07 18:19:47 +02:00
Sebastian Andres El Khoury Seoane 8d9d4ec9c6 feat(platform): Add support for APFEL as part of the dependencies and models for the Cookbook. (#2657)
* feat(platform): add support for Apple Silicon detection in platform compatibility

test(tests): enhance shell_routes tests for Apple Silicon compatibility

* fix issues with missing import

* fix: correct package name in package-lock.json and enhance package installation commands in shell_routes.py and cookbook.js

* feat: add Apfel startup and health checks on macOS

- bootstrap Apfel via Homebrew on arm64 macOS
- start `apfel --serve --port 11435` detached for Odysseus
- verify readiness via `/health`
- clean up the Apfel process on exit or Ctrl+C

* fix: duplicate variable declaration post-merge conflict
- Should fix `node` CI issues.

* fix: issues with the update status of the APFEL dependency.
- fixed by changing the main conditional that determines the update.

* Fix: Remove unnecessary whitespaces and formatting for the model_routes.py file.

* Fix: whitespace issues with the model_routes file

* Fix: Remove unnecessary whitespaces and formatting for the model_routes.py file. Final

* Fix: Fixed updates using PIP for APFEL instead of custom cmd
2026-06-07 17:28:02 +02:00
Kenny Van de Maele 8f2c8d2dc8 fix(test): tolerate owner kwarg in compaction summary resolve_endpoint mock (#3304)
#2996 made context_compactor call resolve_endpoint('utility', owner=owner),
but the mock added by #2174 stubbed it as lambda which: ..., which rejects the
owner kwarg. Each PR passed alone; merged on dev the two compaction tests fail
with TypeError and the pytest job goes red. Widen the mock to lambda *a, **k.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 17:23:06 +02:00
SurprisedDuck b8463e3ac2 fix(email): decode headers without injected spaces (#2433)
routes.email_helpers._decode_header joined the runs from
email.header.decode_header() with " ". Those runs carry their own
surrounding whitespace (e.g. (b"Re: ", None)), and RFC 2047 §6.2 requires
the whitespace between two adjacent encoded-words to be dropped, so the
join produced a double space after an ASCII prefix ("Re:  Jóse"), a
spurious space in "Name <addr>" senders, and a stray space between two
adjacent encoded-words ("Café 日本"). _decode_header backs the inbox list,
message read, search, and the background pollers, so the corruption hit
essentially every non-ASCII subject/sender.

Use email.header.make_header(...) for RFC-correct concatenation, keeping
the existing lossy per-part fallback for malformed/unknown MIME charsets
(make_header raises LookupError there) so the unknown-charset contract in
tests/test_email_decode_header.py still holds.

The sibling mcp_servers.email_server._decode_header was already fixed the
same way (commit 46999de); this brings the routes.email_helpers copy in
line, with regression coverage.

Supported by Claude Opus 4.8

Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>
2026-06-07 16:56:20 +02:00