odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-23 13:15:29 -04:00

Author	SHA1	Message	Date
Dividesbyzer0	589fcd314a	fix(image): patch realesrgan torchvision compatibility (#4110)	2026-06-15 15:16:41 +09:00
Max Hsu	039431f5ea	fix(mcp): detect npx cache entries before probing (#4034)	2026-06-15 15:14:48 +09:00
Dividesbyzer0	7f571c8f7e	fix(agent): keep gpt-oss on text tool mode Treat gpt-oss local OpenAI-compatible models as text/fenced-tool models unless the endpoint explicitly declares native tool support.	2026-06-15 15:11:52 +09:00
cirim	056d1fb960	fix(llm): make connect timeout configurable Use a configurable LLM_CONNECT_TIMEOUT for call and stream connect budgets instead of the previous hard-coded 3s default.	2026-06-15 15:11:38 +09:00
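A minimal sketch of reading a configurable connect timeout, assuming the value comes from an environment variable named after the `LLM_CONNECT_TIMEOUT` setting in the commit above. The helper name `get_connect_timeout` and the fallback rules for bad input are hypothetical and are not taken from the repository.

```python
import os


def get_connect_timeout(default: float = 3.0) -> float:
    """Read LLM_CONNECT_TIMEOUT from the environment, falling back on bad input."""
    raw = os.environ.get("LLM_CONNECT_TIMEOUT", "")
    try:
        value = float(raw)
    except ValueError:
        # Unset, empty, or non-numeric values keep the previous default.
        return default
    # Zero or negative budgets make no sense for a connect timeout.
    return value if value > 0 else default
```

The returned number would then be passed as the connect budget of whatever HTTP client the caller uses.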
Muhammed Midlaj	4b0a977988	fix(models): probe /v1/models for path-less LM Studio endpoints Probe /v1/models for path-less OpenAI-compatible model endpoints and surface clearer LM Studio diagnostics with the actual probed URL.	2026-06-15 15:09:50 +09:00
Boudbois2271	54690997ec	fix(calendar): treat same-day list_events range as full day Expand zero-width or inverted list_events windows to one day so start=end single-day queries return that day's events.	2026-06-15 15:09:19 +09:00
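The window expansion described in the commit above can be sketched as follows. The function name `normalize_event_window` is hypothetical, and the sketch assumes the one-day window is measured from `start`; the actual tool may snap to day boundaries instead.

```python
from datetime import datetime, timedelta


def normalize_event_window(start: datetime, end: datetime) -> tuple:
    """Expand a zero-width or inverted window to one day beginning at start."""
    if end <= start:
        # start == end (single-day query) or end before start: cover one day.
        end = start + timedelta(days=1)
    return start, end
```

With this shape, a query where `start` and `end` are both midnight on the same date covers that whole day rather than an empty interval.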
Wes Huber	be046dd29a	fix(cookbook): preserve state during lifecycle tick Log malformed cookbook state and re-read fresh state before writing scheduled-stop mutations so concurrent UI changes are preserved.	2026-06-15 15:07:03 +09:00
holden093	4c41834dc7	fix(youtube): consolidate duplicate handler Make src.youtube_handler a compatibility wrapper around services.youtube.youtube_handler so transcript state, URL parsing, and timeout behavior no longer diverge.	2026-06-15 15:03:41 +09:00
holden093	96052c5e8a	fix(agent): add contacts domain to tool classifier Add a contacts domain rule pack and deterministic contact intent detection so contact prompts surface resolve_contact/manage_contact tools.	2026-06-15 15:03:19 +09:00
adabarbulescu	afc81bdd7b	fix: drop thinking deltas from background agent loops Skip thinking-only deltas when accumulating background, scheduled-task, and teacher captured reply text.	2026-06-15 15:03:09 +09:00
Dividesbyzer0	a07fe35936	fix(agent): honor explicit web search requests Promote explicit web-search phrasing to tool use and keep web_search/web_fetch available for that turn even when the stale web toggle is false.	2026-06-15 15:02:10 +09:00
RaresKeY	a7766d0b7f	fix(agent): honor auth-disabled tool access after setup Check explicit auth-disabled mode before configured-admin ownership checks so single-user mode keeps full agent tool access after setup.	2026-06-15 15:01:48 +09:00
Tom	2857723e47	fix(security): restrict API-key encryption key file to 0o600 Lock the API key encryption key file to owner-only permissions on creation and when reading existing keys, with regression coverage for permissions and encryption roundtrip.	2026-06-15 15:00:11 +09:00
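A minimal sketch of the owner-only key file pattern from the commit above, assuming POSIX permission semantics. The name `ensure_key_file` and the 32-byte key size are hypothetical; only the `0o600` mode comes from the commit message.

```python
import os


def ensure_key_file(path: str, key_bytes: int = 32) -> bytes:
    """Create or read a key file, forcing owner-only (0o600) permissions."""
    if not os.path.exists(path):
        # O_EXCL refuses to overwrite a file created by a concurrent process.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "wb") as fh:
            fh.write(os.urandom(key_bytes))
    # Tighten existing files too, in case an earlier version left them wider.
    os.chmod(path, 0o600)
    with open(path, "rb") as fh:
        return fh.read()
```

Passing the mode to `os.open` means the file never exists with wider permissions, and the `chmod` covers keys that were written before the fix.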
Michael	a633611823	fix(agent): let retrieval run for non-English low-signal queries Allow non-workspace low-signal prompts to fall through to tool retrieval so non-English requests are not limited to always-available tools.	2026-06-15 14:58:56 +09:00
muhamed hamed	3b3c0d6254	fix: detect HuggingFace token when downloading cookbook models (#3459) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-11 21:53:16 +01:00
Mazen Tamer Salah	f5c1eb4b9d	fix(settings): degrade load_features to defaults on PermissionError load_settings() already catches PermissionError, but load_features() caught only FileNotFoundError/JSONDecodeError/ValueError. An existing-but-unreadable data/features.json (e.g. root-owned after a deploy) therefore raised instead of falling back to DEFAULT_FEATURES, taking down GET /api/auth/features and anything that reads feature flags. Add PermissionError to the except tuple to match load_settings(). Adds tests/test_load_features_permission_error.py. Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-11 21:20:10 +01:00
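The fallback behavior described in the commit above can be sketched like this. The contents of `DEFAULT_FEATURES` and the merge of loaded values over defaults are assumptions for illustration; only the exception tuple reflects the commit message.

```python
import json

# Hypothetical defaults; the real DEFAULT_FEATURES lives in the repository.
DEFAULT_FEATURES = {"example_flag": False}


def load_features(path: str) -> dict:
    """Load feature flags, degrading to defaults when the file is unusable."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
        if not isinstance(data, dict):
            raise ValueError("features file must contain a JSON object")
        return {**DEFAULT_FEATURES, **data}
    except (FileNotFoundError, PermissionError, json.JSONDecodeError, ValueError):
        # An existing-but-unreadable file is treated like a missing one.
        return dict(DEFAULT_FEATURES)
```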
Marius Popa	2a4bba2b9e	fix(api-keys): preserve encrypted keys when saving providers (#1920 ) * fix(api-keys): preserve encrypted keys when saving providers * test(api-keys): cover malformed raw key entries --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-11 18:23:54 +01:00
Kenny Van de Maele	620fdd0859	feat(agent): confine agent file/shell tools to a selectable workspace (#3665 ) * feat(agent): workspace confinement via context-local binding + get_workspace tool Bind the per-turn workspace once in execute_tool_block; the shared path resolvers (_resolve_tool_path / _resolve_search_root) and the subprocess cwd helper (agent_cwd) read it, so file tools + bash/python are confined centrally and a new tool that uses the shared helpers cannot accidentally bypass it. Adds the admin-gated /api/workspace/browse picker, a workspace pill + directory modal (reusing existing modal/button CSS), the /workspace slash command, and a get_workspace tool (replaces a system-prompt block). Confinement is OS-agnostic (realpath/normcase/commonpath) and docker-safe (container paths, no host assumptions). Reopens #2023. * ux(workspace): clarify workspace is not a sandbox Picker modal note + pill tooltip + get_workspace tool/output wording now state plainly: read_file/write_file/edit_file/grep/glob/ls are confined to the folder, but bash/python only start there (cwd) and are not sandboxed. Modal note reuses the existing .muted class. * fix(agent): treat an active workspace as file-work intent A vague low-signal message (e.g. "look at the local project") matches no domain keywords, so tool retrieval is skipped and only always-available tools are offered — leaving the agent with no file access even though a workspace is set. When a workspace is active, include the file/code tools (incl. get_workspace) on low-signal turns so the agent can act on the folder. Also requires the tool index (ChromaDB) to be reachable for normal retrieval; that is an environment dependency, not part of this change. * ux(workspace): hide pill + overflow entry in chat mode Workspace only scopes the agent's file/shell tools, so the pill and the overflow 'Workspace' entry are agent-only now — hidden in chat mode like the bash toggle. 
Mode read from the DOM in syncWorkspaceIndicator; applyMode() is called from the agent/chat setMode handler. * prompt(tools): steer bash/python to defer to the dedicated file tools bash/python schema descriptions (what native-tool-calling models read) were bare and gave no steer, so models would do file ops via the shell (e.g. writing SVG/HTML, which then dumps raw markup into the tool preview). Tell bash/python in the schema + tool-index + prompt section to prefer read_file/write_file/ edit_file/grep/glob/ls and only be used for what those do not cover. * prompt(tools): keep bash/python deferral generic (no hardcoded tool names) Reference 'a dedicated tool' rather than listing read_file/write_file/grep/etc. by name, so the guidance does not go stale if those tools are renamed. * style(workspace): drop em-dashes from added code comments/strings * ux(workspace): terser non-sandbox note in picker (no tool-name list) * ux(workspace): mirror terse non-sandbox wording in pill tooltip * chore: untrack local venv symlink (run-only, not part of the feature) * prompt(workspace): keep get_workspace text generic (no hardcoded tool names) * fix(agent): low-signal + workspace surfaces only read-only file tools Intersect the files tool group with PLAN_MODE_READONLY_TOOLS so a vague message in a workspace exposes read_file/grep/glob/ls/get_workspace for exploration, but not write_file/edit_file/bash/python -- those wait for a request that actually calls for them (RAG retrieval still adds them on a real ask). * feat(workspace): cap browse listing at 500 dirs with a truncated hint Mirror the filesystem_tools._CODENAV_MAX_HITS pattern with a module-local _MAX_BROWSE_DIRS so a directory with thousands of children does not dump every row into the picker; the response carries a truncated flag and the modal tells the user to type a path to jump in. 
* chore: untrack local venv symlink (run-only artifact) * fix(workspace): vet the workspace root against the sensitive-path deny list at bind time The in-workspace resolver deny-lists sensitive paths inside the workspace, but the empty-path search root is the workspace itself, so a workspace of ~/.ssh could be listed via ls with no path. vet_workspace() (public, in tool_execution next to the resolvers) rejects non-directories and sensitive roots before the path is ever bound; chat_routes uses it instead of its inline isdir check. * fix(workspace): reject filesystem roots and stop showing rejected workspaces as active Review findings from #3665: P2: vet_workspace accepted / (and would accept drive/UNC roots), which makes every absolute path 'inside' the workspace and collapses confinement into host-wide file access. A root is its own dirname, so reject when dirname(resolved) == resolved; the browse response now carries a selectable flag and the picker disables 'Use this folder' on unselectable dirs. P3: /workspace set stored any string client-side and the chat route silently dropped rejected values, so the pill could claim a confinement that was not in effect. New admin-gated /api/workspace/vet validates manual paths before they persist (canonical path returned), and when a posted workspace is rejected at send time the stream emits workspace_rejected so the client clears the stored value and toasts instead of continuing silently. * fix(workspace): check caller privilege before vetting the posted workspace Review finding: /api/chat_stream called vet_workspace() on the posted value for every caller and emitted workspace_rejected on failure, so a non-admin who can chat but cannot use file/shell tools could distinguish existing directories from missing/file/sensitive/root paths by whether the event appeared. 
The resolution now lives in _resolve_request_workspace, which drops the submitted value uniformly for non-admin callers, with no vetting and no event, before the path ever touches the filesystem. Admin and single-user behavior is unchanged. Test pins that valid and invalid paths are indistinguishable for a non-admin and that vet_workspace is never invoked for them.	2026-06-11 18:17:54 +02:00
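A rough sketch of the two checks this commit describes: rejecting filesystem roots at bind time, and confining resolved paths with `realpath`/`normcase`/`commonpath`. The function names below are hypothetical, and the sensitive-path deny list is deliberately left out, so this is not the repository's `vet_workspace`.

```python
import os
from typing import Optional


def vet_workspace_sketch(path: str) -> Optional[str]:
    """Return the canonical workspace path, or None if it must be rejected."""
    resolved = os.path.realpath(path)
    if not os.path.isdir(resolved):
        return None
    # A filesystem root is its own dirname; accepting it would place every
    # absolute path "inside" the workspace.
    if os.path.dirname(resolved) == resolved:
        return None
    return resolved


def is_inside_workspace(workspace: str, candidate: str) -> bool:
    """True if candidate resolves to the workspace itself or a path below it."""
    root = os.path.normcase(os.path.realpath(workspace))
    # os.path.join keeps an absolute candidate as-is, so both forms are handled.
    target = os.path.normcase(os.path.realpath(os.path.join(workspace, candidate)))
    try:
        return os.path.commonpath([root, target]) == root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

Comparing with `commonpath` rather than a string prefix avoids treating a sibling such as `/srv/project-old` as being inside `/srv/project`.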
Michael	95c54ac3cb	fix: use _truncate for tool output display limits in agent_loop (#3831 ) Replace hardcoded [:2000] and [:4000] slicing with the shared _truncate helper from tool_utils, which uses MAX_OUTPUT_CHARS and adds an explicit truncation indicator when content is cut. Scoped down from the original PR: only agent/tool-output display behavior, no integrations.py changes. Co-authored-by: michaelxer <michaelxer@users.noreply.github.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-11 17:05:13 +01:00
Kenny Van de Maele	263d41c58a	fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers (#3945 ) * fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers _apply_local_cache_affinity adds session_id + cache_prompt for llama.cpp KV-cache slot affinity (#2927), gated on _is_self_hosted_openai_compatible, which treated any unknown OpenAI-compatible host as self-hosted. Strict cloud providers added as custom endpoints (Mistral at api.mistral.ai) reject unknown body fields, so every request failed with 422 extra_forbidden. Self-hosted now also requires the endpoint to resolve as local via model_context.is_local_endpoint: loopback/private/tailscale host, or endpoint kind explicitly configured as "local" (the escape hatch for tunneled self-hosted servers). is_local_endpoint is promoted to a public name since llm_core now shares it. Fixes #3793 * test(llm): sweep cloud OpenAI-compatible hosts in affinity gating Parametrized cases adapted from #3839 (credit: Shabablinchikow): deepseek, x.ai, together, fireworks, and the Gemini OpenAI-compat endpoint must all stay free of the llama.cpp extras, not just the Mistral host from #3793. * fix(llm): narrow the Tailscale range to 100.64.0.0/10 in is_local_endpoint Review finding on #3945: _PRIVATE_PREFIXES carried a bare "100." prefix, treating all of 100.0.0.0/8 as local while Tailscale only uses the CGNAT block 100.64.0.0/10. Public 100.x hosts (e.g. AWS ranges outside the block) were classified local and still received the llama.cpp extras this PR exists to keep away from strict providers. Match the narrowed classification routes/model_routes.py already uses, with boundary tests just below, inside, and just above the range.	2026-06-11 17:51:03 +02:00
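The narrowed range check from the last part of this commit can be illustrated with the standard `ipaddress` module. This is a sketch only: the name `is_local_ip` is hypothetical, hostnames are not resolved, and the real `is_local_endpoint` also honors an endpoint kind explicitly configured as "local".

```python
import ipaddress

# Tailscale addresses come from the CGNAT block, not all of 100.0.0.0/8.
_TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_local_ip(host: str) -> bool:
    """Classify a literal IP as loopback, private, or Tailscale CGNAT."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; hostnames are out of scope here
    return addr.is_loopback or addr.is_private or addr in _TAILSCALE_CGNAT
```

Under this check `100.64.0.0` and `100.127.255.255` count as local, while `100.63.255.255` and `100.128.0.0` do not, which matches the boundary cases the commit says it tests.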
Mazen Tamer Salah	f941db29d3	fix(search): batch FTS hit lookups into one query (N+1) (#3909 ) _search_fts ran the FTS MATCH query, then looked up each hit's full row with its own db.query(...).filter(id == message_id).first() inside a loop, so a search returning N hits issued N extra SELECTs. Fetch all hit rows in a single IN(...) query via _fetch_messages_by_id and reassemble results in hit (relevance) order. Adds tests/test_session_search_batch_fetch.py asserting a single batched query (and no query for empty input). Existing session-search tests stay green.	2026-06-11 16:31:54 +02:00
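The batching idea in the commit above, shown with the standard `sqlite3` module instead of the ORM the project uses. The `messages` table, its columns, and the function name are assumptions for illustration.

```python
import sqlite3


def fetch_messages_by_id(conn: sqlite3.Connection, ids: list) -> list:
    """Fetch rows for ids in one IN(...) query, preserving the order of ids."""
    if not ids:
        return []  # no query at all for empty input
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, body FROM messages WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Reassemble in hit (relevance) order and skip ids with no matching row.
    return [by_id[i] for i in ids if i in by_id]
```

A search returning N hits then costs one extra SELECT instead of N, and the dictionary lookup restores the relevance order that `IN (...)` does not guarantee.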
RaresKeY	c500bcb47d	fix(uploads): migrate upload ownership on rename (#3617)	2026-06-11 16:01:04 +02:00
Mazen Tamer Salah	f7a3605b16	fix(webhooks): keep references to in-flight delivery tasks (#3859 ) fire() and fire_and_forget() scheduled delivery with bare create_task()/ loop.create_task() and kept no reference. asyncio holds only a weak reference to a task, so the GC could collect a delivery (or the fire() coroutine itself) before it completed, silently dropping the webhook. Track in-flight tasks in a set on the manager via a _spawn_tracked() helper that holds a strong reference for the task's lifetime and discards it on completion (add_done_callback), and route both schedule sites through it. Adds tests/test_webhook_task_refs.py.	2026-06-11 15:53:52 +02:00
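The strong-reference pattern described in the commit above is the one recommended in the `asyncio.create_task` documentation. A minimal sketch follows; the class name `TaskTracker` and the method name are hypothetical stand-ins for the webhook manager and its `_spawn_tracked()` helper.

```python
import asyncio


class TaskTracker:
    """Keep strong references to in-flight tasks until they complete."""

    def __init__(self) -> None:
        self._tasks = set()

    def spawn_tracked(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        # The set holds a strong reference for the task's lifetime.
        self._tasks.add(task)
        # Drop the reference once the task finishes, whatever the outcome.
        task.add_done_callback(self._tasks.discard)
        return task
```

`spawn_tracked` has to be called from inside a running event loop, since `asyncio.create_task` requires one.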
George Lawton	4f48cfa9ae	fix: omit temperature for Opus 4.7+ on native Anthropic path (#3117 ) Anthropic removed the sampling parameters (temperature, top_p, top_k) starting with Claude Opus 4.7 — sending temperature at all, even 0.0, returns HTTP 400. _build_anthropic_payload sent it unconditionally, so every native-Anthropic request to Opus 4.7/4.8 failed: the research probe (ResearchHandler._probe_endpoint, temperature=0) aborted runs before they started, and all DeepResearcher._llm calls 400'd. Add _anthropic_rejects_temperature (version-gates opus-N-M >= (4,7)) and omit temperature in the Anthropic builder for those models. Older Claude models (Opus 4.6 and below, Sonnet/Haiku) keep temperature and the existing [0,1] clamp. The version gate is hardened against real-world model id shapes: - a word-boundary anchor so a substring like `octopus-4-8` is not read as Opus and stripped of temperature; - a 1-2 digit minor cap so a dated id such as `claude-opus-4-20250514` (Opus 4.0, listed in ANTHROPIC_MODELS) parses as major-only and keeps temperature, while dated 4.7+ snapshots still match; - a non-string guard so a non-string model can't raise AttributeError (the previous builder never called .lower() on it). Adds regression tests covering 4.7/4.8 omission, older/dated/legacy retention, the substring overmatch, and non-string inputs. Fixes #3065 Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 16:27:40 +03:00
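The three hardening rules listed in this commit (word-boundary anchor, 1-2 digit version cap, non-string guard) can be sketched with one regular expression. The pattern and the name `rejects_temperature` are an illustrative reconstruction, not the repository's `_anthropic_rejects_temperature`.

```python
import re

# "opus" must not be preceded by a letter or digit (so "octopus" is skipped),
# and major/minor are capped at 1-2 digits so date suffixes are not versions.
_OPUS_VERSION = re.compile(
    r"(?<![a-z0-9])opus-(\d{1,2})(?!\d)(?:-(\d{1,2})(?!\d))?"
)


def rejects_temperature(model) -> bool:
    """True for Opus model ids whose (major, minor) version is >= (4, 7)."""
    if not isinstance(model, str):
        return False  # non-string guard: never call .lower() on other types
    match = _OPUS_VERSION.search(model.lower())
    if not match:
        return False
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) else 0
    return (major, minor) >= (4, 7)
```

Under this sketch `claude-opus-4-20250514` parses as major-only (4, 0) and keeps temperature, while `claude-opus-4-7` and `claude-opus-4-8` match the gate.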
RaresKeY	50fedff2f2	fix(email): scope learned sender signatures by owner (#3724)	2026-06-11 13:26:59 +02:00
cyq	c01034f9cb	fix(settings): scrub camelCase secret keys (#3707)	2026-06-11 12:53:33 +02:00
pewdiepie-archdaemon	ebd2332db4	Agent prompt builder: stop re-adding ALWAYS_AVAILABLE on top of filtered tools Found the reason yesterday's tool-retrieval drop wasn't taking effect: in _build_agent_prompt, when relevant_tools was provided, it computed tool_names = set(ALWAYS_AVAILABLE) \| set(relevant_tools) which silently re-added every tool get_tools_for_query had just deliberately discarded. So when a 'save this for <person>' query dropped manage_memory from the retrieved set, the prompt builder put it right back, and the model saw both tools again. Trust the relevant_tools set. get_tools_for_query already starts from ALWAYS_AVAILABLE — any discard there is intentional and should propagate. Only force-include ask_user and update_plan here as belt- and-suspenders since the agent loop relies on those for its own control flow. Other callers (task_scheduler) already union ALWAYS_AVAILABLE or ASSISTANT_ALWAYS_AVAILABLE into relevant_tools before passing it in, so they're unaffected.	2026-06-11 09:49:20 +09:00
pewdiepie-archdaemon	f5ad59317c	Tool retrieval: HARD drop manage_memory when query is a contact-save pattern Description-level steering wasn't enough — even with the explicit 'DO NOT use for info about another person' in manage_memory's description, models kept choosing memory over manage_contact. They can't if memory isn't in the toolset. New logic in ToolIndex.get_tools_for_query: detect three contact-save patterns and discard manage_memory from the returned set (overriding ALWAYS_AVAILABLE): 1. 'save [up to 3 words] for/to <name>' where <name> isn't a timing / pronoun stopword (later, tomorrow, me, you, future, etc.). Catches the canonical 'save this for X' and the wider 'save this address for X', 'save it for X'. 2. 'to/in/into (my) contacts' or 'address book'. Catches both 'add X to my contacts' and 'put this in my address book for X'. 3. Possessive: 'save (his/her/their) (address/phone/email/...)'. Stronger signal — also force-adds manage_contact to the set in case the keyword fallback missed it. Verified: 8 positive contact patterns all drop memory, 10 false- positive 'save X for later/tomorrow/me/the next thing' all keep it.	2026-06-11 09:46:34 +09:00
pewdiepie-archdaemon	df47536b8d	manage_memory descriptions: explicit deferral to manage_contact for person info Even with manage_contact in the retrieved tool set, models were still defaulting to manage_memory when the user pasted an address + 'save for <person>'. Both tools were in front of the model and it picked memory. Tighten both descriptions to steer at decision-time: - agent_loop.py manage_memory description: clarify scope is facts about the USER, with an explicit 'DO NOT use for info about another person' + a 'use manage_contact instead' line. - tool_index.py manage_memory description: same in shorter form, so the embedded retrieval signal is consistent with the prompt-time description.	2026-06-11 09:25:23 +09:00
pewdiepie-archdaemon	8a00f954a9	Tool retrieval: catch 'add X to (my) contacts' / 'address book' phrasings The literal phrase 'add to contacts' missed when there was a name between 'add' and 'to', e.g. 'add Pat to my contacts'. Anchor on the tail with 'to my contacts', 'to contacts', 'to address book' so word boundaries fire regardless of what sits in front.	2026-06-11 09:18:30 +09:00
pewdiepie-archdaemon	8632072ce0	Contacts: postal-address support via vCard ADR, keep tool prompt minimal Closes the gap that pushed the agent into manage_memory when the user pasted an address and said 'save this for X'. manage_contact now accepts an optional address arg end-to-end: - routes/contacts_routes.py: - _normalize_contact carries an 'address' field - _build_vcard emits ADR:;;<address>;;;; (street component of the RFC-6350 7-part ADR), only when address is non-empty - _parse_vcards reads ADR, joins non-empty components with ', ' - _create_contact and _update_contact thread address through; update preserves existing address when caller passes empty - src/tool_implementations.py do_manage_contact: - add accepts address; require at least name+address or email (was: email required) so address-only contacts are addable - update accepts address; require name OR emails OR address - src/tool_schemas.py: schema gets a single 'address' string field - src/tool_index.py + src/agent_loop.py: descriptions get one 'address' arg mention and a 'use this for save-X-for-person / address pastes / phone-with-name' steering line. Net: a few bytes added, not a paragraph. Also: removed a stray name from the schema's manage_contact example strings ('save Jonathan's email…') — no real names in the codebase.	2026-06-11 09:14:52 +09:00
pewdiepie-archdaemon	153b788134	Tool retrieval: surface manage_contact for 'save X for <person>' patterns When the user dumps a postal address or phone number alongside a person's name and says 'save this for X', the vector retriever was missing manage_contact because its description only mentioned the literal word 'contact'. The model defaulted to manage_memory (which is in ALWAYS_AVAILABLE), so the saved fact ended up as un-named memory that wouldn't surface on a later 'what's X's address?' search. - Rewrite manage_contact's index description to anchor on the semantics: 'save info about another person', including postal/ mailing address, ZIP, phone, etc. Now it embeds close to address- paste queries. - Extend the keyword intent-map with 'save this for', 'save it for', 'mailing address', 'postal code', 'their address', etc. — common ways users say 'this belongs to a contact' without the literal word 'contact'.	2026-06-11 08:56:42 +09:00
pewdiepie-archdaemon	bc2d934b94	Agent email safety: stage drafts for user approval instead of auto-send Closes the auto-send hole that let earlier models invent signatures (e.g. signing 'David' for a user named Felix) and SMTP them to real recipients before the user could review. New setting: agent_email_confirm (default True). When on, the MCP send_email and reply_to_email tools no longer SMTP directly — they write the composed email to scheduled_emails with a new status 'agent_draft' (far-future send_at so the scheduled-send poller ignores them) and return a {pending: true, pending_id, to, subject, body, message: ...} payload. The model surfaces that to the user. Backend endpoints to approve / cancel: - GET /api/email/pending → list staged drafts for the owner - POST /api/email/pending/{id}/approve → flip status to 'pending' + backdate send_at so the existing scheduled-send poller delivers immediately - DELETE /api/email/pending/{id} → status = 'cancelled' UI: - Settings / AI Defaults gets a new 'Email Safety' card with the toggle, default on. - Tool descriptions for send_email and reply_to_email now include the pending behavior + an explicit 'DO NOT invent a signature, do not type a person's name' guardrail. Pass 2 (next): inline chat card with Send / Discard buttons so the user doesn't have to type a confirmation reply. Today's prompt + the listing endpoint give the model a clean path to surface drafts.	2026-06-11 08:50:06 +09:00
RaresKeY	d5603ee575	fix(research): migrate active task owners on rename (#3618)	2026-06-11 01:17:02 +02:00
Mazen Tamer Salah	218b9ecbc8	fix(startup): ping real endpoints in warmup/keepalive (#3641 ) _warmup_endpoints called model_discovery.get_endpoints(), which does not exist on ModelDiscovery. It raised AttributeError on every startup and on every 60s keepalive tick, was swallowed by the outer except, and pinged nothing, so the cold-start prevention the loop exists for never ran. Add ModelDiscovery.warmup_ping_urls(), which resolves the /models probe URLs from the real discover_models() output, and call it from the warmup loop via asyncio.to_thread (discovery does a blocking port scan, so keep it off the event loop). Adds tests/test_warmup_ping_urls.py: resolves /models URLs from discovered items, honors the limit, degrades to [] on discovery failure, and documents that get_endpoints never existed.	2026-06-10 19:21:45 +02:00
Srinesh R	d9a4b99046	fix: handle batch events format in manage_calendar tool (#3503 ) * fix: handle batch events format in manage_calendar tool Models like deepseek-v4-flash emit batch events array instead of individual create_event calls. The tool defaulted to list_events (no action key), so events were never created despite the model confirming success. - Add batch normalization in do_manage_calendar - Map start/end objects to flat dtstart/dtend strings - Add tests for both object and flat string formats * fix: surface partial batch failures in manage_calendar Partial failures were silently dropped - batches with mixed success/failure would report only created count with no error visibility. - Return non-zero exit code for any failures - Surface both created and failed counts in response - Include first error message for debugging - Add test for partial failure case * chore: strip trailing whitespace in batch normalization block * chore: strip whitespace-only blank lines in batch events test	2026-06-10 19:13:08 +02:00
Mazen Tamer Salah	f5b91f1e9e	fix(tasks): read Memory.text in classify_events personal context (#3640 ) The classify_events task pulled user memories to give the LLM personal context, but read `m.content`, which the Memory ORM does not have (the column is `text`). That raised AttributeError on the first row; the surrounding except swallowed it and logged at debug, so the personal-context block was silently always empty and events were classified without it. Extract the rendering into `_memory_context_lines` (reads `text`, robust via getattr, keeps the 200-char and 40-line caps) and raise the swallowed-exception log to warning so a future schema mismatch is visible. Adds tests/test_classify_events_memory_text.py for the field, truncation, blank skipping, missing-attr robustness, and the line cap.	2026-06-10 19:03:45 +02:00
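A sketch of the rendering helper this commit describes, reading `text` through `getattr` and applying the 200-character and 40-line caps. The `"- "` bullet prefix and the exact signature are assumptions; the repository's `_memory_context_lines` may format lines differently.

```python
def memory_context_lines(memories, max_chars: int = 200, max_lines: int = 40) -> list:
    """Render memory rows as short context lines for the classifier prompt."""
    lines = []
    for memory in memories:
        # getattr keeps the loop alive if a row lacks the expected attribute.
        text = (getattr(memory, "text", "") or "").strip()
        if not text:
            continue  # skip blank or missing entries
        lines.append(f"- {text[:max_chars]}")
        if len(lines) >= max_lines:
            break
    return lines
```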
Mazen Tamer Salah	4e210d3337	fix(research): stop rescanning the research dir on every status poll (#3637 ) get_status() called get_avg_duration() unconditionally, and that helper globs and JSON-parses every file under the research data dir. The SSE status stream polls get_status() roughly once a second, so with a few saved reports each poll re-read and re-parsed all of them, including for sessions that are not active (the disk branch never even used the value). Compute avg_duration only for active sessions and memoize it on the task entry, so a long stream computes it once instead of on every poll. Behaviour is unchanged: active streams still report avg_duration. Adds tests/test_research_status_avg_duration.py: an inactive session does no avg scan, and an active session computes it once across many polls.	2026-06-10 17:40:44 +02:00
pewdiepie-archdaemon	2bf372b41c	Tasks: optional persona for LLM + research tasks (biases output voice) Wire the existing built-in PERSONAS catalog through to scheduled tasks the same way I wired it to reminder synthesis. Repurposes the dormant scheduled_tasks.character_id column. UI (static/js/tasks.js) - New 'Persona' select in the LLM / Research task form, with the five built-in characters (socrates/razor/nietzsche/spark/odysseus) plus a default 'no persona' option. Pre-populates from existing.character_id on edit. Non-llm/research types explicitly clear it on save. API (routes/task_routes.py) - TaskCreate + TaskUpdate gain character_id: Optional[str]. - _task_to_dict echoes character_id back so the form can hydrate on edit. Update endpoint stores '' as None to allow clearing. Runner (src/task_scheduler.py) - When task.character_id is set and matches a built-in persona, prepend the persona prompt to the task system prompt so the model speaks in that voice while still knowing it's running a scheduled task. - crew_member.personality still wins as the base; character_id stacks on top.	2026-06-10 23:36:18 +09:00
SurprisedDuck	e115b0155c	fix(security): don't grant tool access in the pre-setup window (#3506 ) * fix(security): don't grant tool access in the pre-setup window owner_is_admin_or_single_user() returned True whenever auth was not configured, which conflated two very different states: - intentional single-user mode (operator set AUTH_ENABLED=false), and - the pre-setup window (auth enabled, but no admin created yet). In the second state, blocked_tools_for_owner() returned an empty set, so server-execution tools (bash/python) and other admin-only tools were ungated. The auth middleware already 401s /api/ requests pre-setup, but a caller that bypasses it (trusted loopback / internal-tool path) could reach those tools before setup completed. Treat "not configured" as admin only when auth is intentionally disabled (AUTH_ENABLED=false), mirroring the AUTH_ENABLED parsing in app.py and core.middleware. Single-user mode is preserved; the pre-setup window is now non-admin as defense-in-depth. Adds regression tests for both states. Fixes #3201 Supported by Claude Opus 4.8 * refactor(security): reuse _auth_disabled() instead of a duplicate helper Addresses review on #3506: src/auth_helpers.py already has _auth_disabled() with the identical AUTH_ENABLED parse. Drop the duplicate _auth_intentionally_disabled() and call the existing helper via a lazy import inside owner_is_admin_or_single_user (mirroring the lazy core.auth import) to avoid any import cycle. Removes the now-unused `import os`. Behaviour and the two regression tests are unchanged. Supported by Claude Opus 4.8 --------- Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>	2026-06-10 14:37:26 +02:00
ooovenenoso	725d174243	fix(research): track analyzed URLs separately (#3125) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-10 12:08:22 +01:00
Yeoh Ing Ji	3e49658204	refactor(tools): extract document tools to handle registry (#3666) * feat(tools): add document management tool handlers to the agent_tools module * feat(tools): extracted document tools for create, update, edit, suggest, and manage from tool_implementations.py * feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools * refactor(tools): add document tool dispatcher and updated tool calling path * refactor(tools): remove duplicated document management functions * refactor(tools): removing unused functions and adding new import paths * refactor(tools): update document tool execute methods to use context dictionary * refactor(tests): update import paths for document tools in test files * refactor(tests): update owner parameter format in document management tests * refactor(tests): update import path for _owned_document_query * refactor: update import paths for document tools * fix(tests): correct source path for document ID test	2026-06-10 10:41:52 +02:00
pewdiepie-archdaemon	4f7061fd61	Settings overhaul + UI polish pass Two months of iteration on the Settings panel, integration forms, and small visual nudges across the app. Highlights: Settings restructure - Add Models: split into separate Local + API cards (no more in-card tabs); each fuses Type/Provider with the URL input. - Added Models: new dedicated sidebar tab, with Probe + Clear-offline pulled into its header; Local/API sub-section icons accent-tinted. - Search: Web Search and a new Deep Research card (Model + tuning), with a cross-link to AI Defaults. Provider hints use real clickable anchors; Web Search Test button shows a whirlpool spinner. - AI Defaults: Image Generation card returns; Research Model card carries only Endpoint+Model with a cross-link to Search; Vision / Default / Utility fallbacks unified under one numbered-row design matching Search's chain. - API Permissions (was 'API Tokens'): per-row rename, inline Permissions toggle that expands the scope-edit panel, in-field copy icons (icon→check on success). Empty state accent-tinted. - Integrations: + Add Integration drops a type-picker menu directly under the button (drop-up on tight viewports); each integration form (API, CalDAV, CardDAV, Email, Codex/Claude, Vault, MCP) uses the same accent-outlined Save/Test/Cancel buttons right-aligned. - Danger Zone: Wipe→Delete with trash icons; new 'Delete everything' row at the bottom that loops every category. AI Synthesis (Reminders) - Persona dropdown sourced from PROMPT_TEMPLATES + custom preset. - src/reminder_personas.py mirrors the five built-ins for the server-side synthesis path. - dispatch_reminder() reads reminder_llm_persona and uses the persona's system prompt; empty/unknown falls back to warm-neutral. Esc handling - Kebab menus and the provider picker intercept Esc in capture phase so dismissing a popup no longer closes the whole Settings modal. 
Accent tinting - Scoped CSS rule across data-settings-panel=ai/services/added-models/ search/integrations/reminders for card h2 icons + the Added Models sub-section icons. Codex/Claude integration form - No more auto-creation on form open — explicit Create token button. - New tokens start with every scope granted; existing tokens move out of the integration form into the API Permissions card. - Setup reveal: copy buttons inline inside the token + setup code blocks; shorter subtitle wording. Misc visual polish - Save/Test/Cancel uniformly accent-outlined and right-aligned on every integration form. - Provider logos render inline next to the search fallback selects and the Deep Research Search dropdown. - Trash icons in fallback rows bumped to 20x20 so they fill the 32px button. - Image generation default flipped to off.	2026-06-10 15:15:13 +09:00
Lucas Daniel	55ff22c6d5	fix(chat): stabilize system prompt, sequence memory extraction, and send stable session id to preserve KV cache (#3360 ) * fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache Fixes #2927. As diagnosed in the issue, three things in Odysseus's request pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache continuity, forcing a full prompt re-evaluation (15-30s+) on every turn: 1. Dynamic content folded into the system prompt every turn. Both the chat preface (ChatProcessor.build_context_preface) and the agent system prompt (_build_system_prompt) injected current_datetime_prompt() — text that changes every minute — directly into system-role messages, which llm_core then concatenates into the single system message sent as the cached prefix. Any byte difference there invalidates the entire cache. Moved this to a new current_datetime_context_message() helper that returns a standalone user-role message, inserted near the end of the array (right before the latest user turn) instead of mixed into the system prompt. The static system prefix (preset prompt + safety policy + agent base prompt) now stays byte-identical across turns of the same session. 2. Memory/skill extraction side-requests competed with the main completion. run_post_response_tasks fired extract_and_store / maybe_extract_skill via asyncio.create_task — fire-and-forget coroutines that could overlap the next turn's main request and steal llama.cpp's limited processing slots, evicting the cached checkpoint. They're now queued through a new _run_extraction_jobs_sequentially helper that waits for the session's stream to go idle and runs the jobs strictly one at a time. 3. No stable session identifier was sent to local backends, so llama.cpp assigned a new processing slot via LRU every turn ("session_id=<empty> server-selected (LCP/LRU)"), losing slot affinity. 
Added _apply_local_cache_affinity() in llm_core, which sets session_id and cache_prompt: true on outgoing payloads — gated to self-hosted OpenAI-compatible endpoints only (never api.openai.com or other cloud providers, which reject unrecognized request fields with a 400). Threaded session_id through stream_llm / llm_call_async / stream_agent_loop from the existing Odysseus session id. Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload- assembly and scheduling code paths: byte-identical system prefix across two turns of the same session (with a regression check that genuinely changed instructions DO still change it), the dynamic time block landing as a user-role message, extraction jobs waiting for the stream to go idle and running sequentially, and the outgoing payload carrying a stable session_id (same across turns of one session, different across sessions) only for self-hosted endpoints. Updated tests/test_user_time.py for the new message placement. * fix(tests): accept owner= kwarg in normalize_model_id monkeypatch The upstream normalize_model_id signature now takes an owner= keyword argument, and chat_helpers.py passes owner=getattr(sess, "owner", None) at the call site. Update the test stub lambda to **kwargs so it handles the new argument without breaking, and update chat_helpers.py to forward the owner parameter consistently. --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 22:46:54 +01:00
Lucas Daniel	d273085744	fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut (#3540 ) * fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut * fix(integrations): avoid mutating response dict in-place on truncation * fix(integrations): truncate dict responses and bound list sentinel overhead - Dict path now walks keys in insertion order, adding them one at a time while checking that the accumulated dict + _truncated marker fits within the 12 000-char limit. Previously the marker was appended without removing any content, so large dicts were not actually truncated. - List path now subtracts the sentinel's serialised size (+ element-separator padding) from the budget before binary-searching, so the final array including the sentinel stays at or under the limit. - Add regression tests: large-dict actually-truncated, small-dict pass-through, and list-with-sentinel respects the size bound. --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 22:34:08 +01:00
Michael	2e6fff2212	fix: preserve reasoning_content in sanitized messages for Moonshot/Kimi (#3152 ) Providers like Moonshot (Kimi K2.5/K2.6) require the reasoning_content field to be present on assistant tool-call messages in multi-turn conversations. The sanitizer's allow-list was missing this field, causing HTTP 400: 'thinking is enabled but reasoning_content is missing in assistant tool call message at index N'. Add reasoning_content to the allowed field set in _sanitize_llm_messages and cover with regression tests. Fixes #3118 Co-authored-by: michaelxer <michaelxer@users.noreply.github.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 21:44:38 +01:00
Rohith Matam	fbd8ee9033	fix: fall back for npx cache subprocess check (#3560 ) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 20:41:23 +01:00
Rares Tudor	016157019c	fix(tools): use _INTERNAL_BASE in serve-session endpoint registration (#3675 ) #3322 renamed the loopback base to _INTERNAL_BASE, but a later Cookbook commit reintroduced one call site using the old _COOKBOOK_BASE name, raising NameError whenever the agent registers a model endpoint for a running serve session. Fixes #3669	2026-06-09 20:31:29 +02:00
Sid	9e74a327f8	fix(llm): remove max_output_tokens from ChatGPT Subscription payload (#3656 ) ChatGPT's Codex API rejects any request that includes max_output_tokens, returning HTTP 400 "Unsupported parameter: max_output_tokens". This caused Deep Research to always fail during the endpoint probe when a ChatGPT Subscription model was selected. Remove the conditional that set payload["max_output_tokens"] in _build_chatgpt_responses_payload(). The parameter is simply not sent. Also update the two affected tests: - Rename test_chatgpt_subscription_payload_uses_max_output_tokens → test_chatgpt_subscription_payload_omits_max_output_tokens - Rename test_chatgpt_subscription_payload_omits_empty_max_output_tokens → test_chatgpt_subscription_payload_omits_max_output_tokens_when_zero - Assert max_output_tokens is absent rather than present Fixes #3650	2026-06-09 17:42:12 +02:00
Sheikh Rahat Mahmud	9180847c0e	feat(diagnostics): add consolidated service health endpoint for degraded-state reporting (#964) * Add consolidated service health endpoint for degraded-state reporting ROADMAP (High Priority) asks for "Better degraded-state reporting for ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no single readout of which subsystems are actually working: /api/health is only a liveness ping and each subsystem's signal lives in a different module, so a misconfigured self-host install gives no consolidated picture. This adds an admin-only GET /api/diagnostics/services endpoint backed by a new src/service_health.py aggregator. Each subsystem reports a uniform {name, status, detail, meta} where status is ok | degraded | down | disabled, and the response rolls up an overall verdict (worst non-disabled status). Probes are deliberately non-intrusive and safe to poll: - ChromaDB: reads the .healthy flags on the RAG and memory vector stores. - SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No search query is run. - ntfy: GET the server's built-in /v1/health. No test notification is sent. - email: short IMAP connect+logout per configured account (no credentials in meta). - providers: probe each enabled ModelEndpoint's model list (no api_key in meta). Probe functions take their inputs as parameters and isolate the network call to injectable callables, so they unit-test without touching the network (same pattern as the merged provider-endpoint tests). Network probes run concurrently off the event loop via asyncio.to_thread with bounded per-probe timeouts. memory_vector is now passed into setup_diagnostics_routes (new optional param, backward-compatible) so ChromaDB's vector-memory store can be reported too. Tests: tests/test_service_health.py — 29 tests covering every status mapping per subsystem, the overall rollup, and that no secrets leak into meta. 
Verification: python -m pytest tests/test_service_health.py -q # 29 passed python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q Backend + tests only; an Admin/Settings UI badge that renders this endpoint is a natural follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(diagnostics): bound service-health wall-clock and redact secrets Addresses review on #964. Blocker 1 — genuinely bounded wall-clock: - providers_health and email_health now fan out per-item probes across a bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET), instead of probing endpoints/accounts sequentially. Stragglers are reported as a controlled `timeout` and never block; the pool is shut down with wait=False so the response returns on time regardless of endpoint/account count. - The IMAP connect path now honors the service-health budget: _imap_connect gained a pass-through `timeout` param and the probe calls it with _PROBE_TIMEOUT instead of the default 15s. - collect_service_health runs the four network subsystems concurrently, each under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop. Blocker 2 — no secret/raw-error leakage in the response: - _safe_url strips userinfo, query, and fragment from every URL surfaced in meta (searxng instance, ntfy base, provider name fallback), keeping only scheme/host/port/path. - _classify_error maps every probe failure to a controlled category token (timeout, connection_refused, dns_error, tls_error, network_error, http_error, auth_or_protocol_error, …) — raw str(exception), which can embed credentialed URLs or server text, is never returned. Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py): - URL userinfo/query redaction for searxng/ntfy/providers. 
- secret-bearing exception strings map to categories and don't leak. - multiple slow providers/accounts stay bounded (single + 25-endpoint cases). - subsystems run concurrently; aggregate deadline yields a controlled result. - route-level unauthenticated (401) / non-admin (403) / admin (200) coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(diagnostics): isolate route tests so they don't leak module globals The new route tests replaced src.service_health.collect_service_health and routes.diagnostics_routes.require_admin via direct assignment, which persisted for the rest of the pytest session. In CI's full alphabetical run that fake collector (returning services=[]) leaked into the later collect_service_health tests and failed them. Switch to monkeypatch.setattr so both are restored after each test. No production code change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 16:00:24 +01:00
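
The rollup and URL redaction described in 9180847c0e can be sketched roughly as follows. This is an illustrative sketch, not the code in src/service_health.py: the function names `rollup_status` and `safe_url`, the severity ordering, and the behaviour when every subsystem is disabled are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed severity ordering; the commit only says "worst non-disabled status".
_SEVERITY = {"ok": 0, "degraded": 1, "down": 2}


def rollup_status(services):
    """Return the worst status among services that are not disabled."""
    active = [s["status"] for s in services if s["status"] != "disabled"]
    if not active:
        # Assumption: with nothing enabled, report "disabled" overall.
        return "disabled"
    return max(active, key=lambda status: _SEVERITY[status])


def safe_url(url):
    """Keep scheme/host/port/path; drop userinfo, query, and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals lose their brackets in .hostname; restore them.
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port is not None else host
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


if __name__ == "__main__":
    services = [
        {"name": "chromadb", "status": "ok"},
        {"name": "searxng", "status": "degraded"},
        {"name": "ntfy", "status": "disabled"},
    ]
    print(rollup_status(services))
    print(safe_url("https://user:secret@ntfy.example.com:8443/v1/health?token=abc#frag"))
```

Rebuilding the netloc from `hostname` and `port`, rather than trimming the original string, means credentials embedded in the URL cannot survive into the output.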

454 Commits
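
The list-truncation approach described in d273085744 (reserve room for the sentinel first, then binary-search for the longest prefix that fits) might look something like the sketch below. The function name `truncate_list` and the sentinel's shape are hypothetical; only the 12 000-character limit and the `_truncated` marker name come from the commit message.

```python
import json

LIMIT = 12_000  # character limit quoted in the commit message


def truncate_list(items, limit=LIMIT):
    """Return a new list whose JSON form fits in `limit` characters."""
    if len(json.dumps(items)) <= limit:
        return list(items)
    # Hypothetical sentinel shape; the real marker is not shown in the log.
    # "omitted" starts at its largest possible value so the reserved size
    # is an upper bound on the final sentinel's size.
    sentinel = {"_truncated": True, "omitted": len(items)}
    # Reserve room for the sentinel plus the ", " separator before it.
    budget = limit - len(json.dumps(sentinel)) - 2
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(json.dumps(items[:mid])) <= budget:
            lo = mid  # prefix of length mid fits; try a longer one
        else:
            hi = mid - 1
    sentinel["omitted"] = len(items) - lo
    # Build a new list so the caller's data is never mutated.
    return items[:lo] + [sentinel]


if __name__ == "__main__":
    big = ["x" * 100] * 500
    out = truncate_list(big)
    print(len(json.dumps(out)) <= LIMIT, out[-1])
```

Because the sentinel's size is subtracted before the search, the serialized result including the sentinel stays at or under the limit, which is the property the regression tests in that commit are said to check.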