Commit Graph

454 Commits

Author SHA1 Message Date
Dividesbyzer0 589fcd314a fix(image): patch realesrgan torchvision compatibility (#4110) 2026-06-15 15:16:41 +09:00
Max Hsu 039431f5ea fix(mcp): detect npx cache entries before probing (#4034) 2026-06-15 15:14:48 +09:00
Dividesbyzer0 7f571c8f7e fix(agent): keep gpt-oss on text tool mode
Treat gpt-oss local OpenAI-compatible models as text/fenced-tool models unless the endpoint explicitly declares native tool support.
2026-06-15 15:11:52 +09:00
cirim 056d1fb960 fix(llm): make connect timeout configurable
Use a configurable LLM_CONNECT_TIMEOUT for call and stream connect budgets instead of the previous hard-coded 3s default.
2026-06-15 15:11:38 +09:00
Muhammed Midlaj 4b0a977988 fix(models): probe /v1/models for path-less LM Studio endpoints
Probe /v1/models for path-less OpenAI-compatible model endpoints and surface clearer LM Studio diagnostics with the actual probed URL.
2026-06-15 15:09:50 +09:00
Boudbois2271 54690997ec fix(calendar): treat same-day list_events range as full day
Expand zero-width or inverted list_events windows to one day so start=end single-day queries return that day's events.
2026-06-15 15:09:19 +09:00
Wes Huber be046dd29a fix(cookbook): preserve state during lifecycle tick
Log malformed cookbook state and re-read fresh state before writing scheduled-stop mutations so concurrent UI changes are preserved.
2026-06-15 15:07:03 +09:00
holden093 4c41834dc7 fix(youtube): consolidate duplicate handler
Make src.youtube_handler a compatibility wrapper around services.youtube.youtube_handler so transcript state, URL parsing, and timeout behavior no longer diverge.
2026-06-15 15:03:41 +09:00
holden093 96052c5e8a fix(agent): add contacts domain to tool classifier
Add a contacts domain rule pack and deterministic contact intent detection so contact prompts surface resolve_contact/manage_contact tools.
2026-06-15 15:03:19 +09:00
adabarbulescu afc81bdd7b fix: drop thinking deltas from background agent loops
Skip thinking-only deltas when accumulating background, scheduled-task, and teacher captured reply text.
2026-06-15 15:03:09 +09:00
Dividesbyzer0 a07fe35936 fix(agent): honor explicit web search requests
Promote explicit web-search phrasing to tool use and keep web_search/web_fetch available for that turn even when the stale web toggle is false.
2026-06-15 15:02:10 +09:00
RaresKeY a7766d0b7f fix(agent): honor auth-disabled tool access after setup
Check explicit auth-disabled mode before configured-admin ownership checks so single-user mode keeps full agent tool access after setup.
2026-06-15 15:01:48 +09:00
Tom 2857723e47 fix(security): restrict API-key encryption key file to 0o600
Lock the API key encryption key file to owner-only permissions on creation and when reading existing keys, with regression coverage for permissions and encryption roundtrip.
2026-06-15 15:00:11 +09:00
Michael a633611823 fix(agent): let retrieval run for non-English low-signal queries
Allow non-workspace low-signal prompts to fall through to tool retrieval so non-English requests are not limited to always-available tools.
2026-06-15 14:58:56 +09:00
muhamed hamed 3b3c0d6254 fix: detect HuggingFace token when downloading cookbook models (#3459)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-11 21:53:16 +01:00
Mazen Tamer Salah f5c1eb4b9d fix(settings): degrade load_features to defaults on PermissionError
load_settings() already catches PermissionError, but load_features() caught only
FileNotFoundError/JSONDecodeError/ValueError. An existing-but-unreadable
data/features.json (e.g. root-owned after a deploy) therefore raised instead of
falling back to DEFAULT_FEATURES, taking down GET /api/auth/features and anything
that reads feature flags. Add PermissionError to the except tuple to match
load_settings().

Adds tests/test_load_features_permission_error.py.

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-11 21:20:10 +01:00
Marius Popa 2a4bba2b9e fix(api-keys): preserve encrypted keys when saving providers (#1920)
* fix(api-keys): preserve encrypted keys when saving providers

* test(api-keys): cover malformed raw key entries

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-11 18:23:54 +01:00
Kenny Van de Maele 620fdd0859 feat(agent): confine agent file/shell tools to a selectable workspace (#3665)
* feat(agent): workspace confinement via context-local binding + get_workspace tool

Bind the per-turn workspace once in execute_tool_block; the shared path
resolvers (_resolve_tool_path / _resolve_search_root) and the subprocess cwd
helper (agent_cwd) read it, so file tools + bash/python are confined centrally
and a new tool that uses the shared helpers cannot accidentally bypass it.

Adds the admin-gated /api/workspace/browse picker, a workspace pill + directory
modal (reusing existing modal/button CSS), the /workspace slash command, and a
get_workspace tool (replaces a system-prompt block). Confinement is OS-agnostic
(realpath/normcase/commonpath) and docker-safe (container paths, no host
assumptions). Reopens #2023.

* ux(workspace): clarify workspace is not a sandbox

Picker modal note + pill tooltip + get_workspace tool/output wording now state
plainly: read_file/write_file/edit_file/grep/glob/ls are confined to the folder,
but bash/python only start there (cwd) and are not sandboxed. Modal note reuses
the existing .muted class.

* fix(agent): treat an active workspace as file-work intent

A vague low-signal message (e.g. "look at the local project") matches no
domain keywords, so tool retrieval is skipped and only always-available tools
are offered — leaving the agent with no file access even though a workspace is
set. When a workspace is active, include the file/code tools (incl.
get_workspace) on low-signal turns so the agent can act on the folder.

Also requires the tool index (ChromaDB) to be reachable for normal retrieval;
that is an environment dependency, not part of this change.

* ux(workspace): hide pill + overflow entry in chat mode

Workspace only scopes the agent's file/shell tools, so the pill and the
overflow 'Workspace' entry are agent-only now — hidden in chat mode like the
bash toggle. Mode read from the DOM in syncWorkspaceIndicator; applyMode() is
called from the agent/chat setMode handler.

* prompt(tools): steer bash/python to defer to the dedicated file tools

bash/python schema descriptions (what native-tool-calling models read) were
bare and gave no steer, so models would do file ops via the shell (e.g. writing
SVG/HTML, which then dumps raw markup into the tool preview). Tell bash/python
in the schema + tool-index + prompt section to prefer read_file/write_file/
edit_file/grep/glob/ls and only be used for what those do not cover.

* prompt(tools): keep bash/python deferral generic (no hardcoded tool names)

Reference 'a dedicated tool' rather than listing read_file/write_file/grep/etc.
by name, so the guidance does not go stale if those tools are renamed.

* style(workspace): drop em-dashes from added code comments/strings

* ux(workspace): terser non-sandbox note in picker (no tool-name list)

* ux(workspace): mirror terse non-sandbox wording in pill tooltip

* chore: untrack local venv symlink (run-only, not part of the feature)

* prompt(workspace): keep get_workspace text generic (no hardcoded tool names)

* fix(agent): low-signal + workspace surfaces only read-only file tools

Intersect the files tool group with PLAN_MODE_READONLY_TOOLS so a vague message
in a workspace exposes read_file/grep/glob/ls/get_workspace for exploration, but
not write_file/edit_file/bash/python -- those wait for a request that actually
calls for them (RAG retrieval still adds them on a real ask).

* feat(workspace): cap browse listing at 500 dirs with a truncated hint

Mirror the filesystem_tools._CODENAV_MAX_HITS pattern with a module-local
_MAX_BROWSE_DIRS so a directory with thousands of children does not dump every
row into the picker; the response carries a truncated flag and the modal tells
the user to type a path to jump in.

* chore: untrack local venv symlink (run-only artifact)

* fix(workspace): vet the workspace root against the sensitive-path deny list at bind time

The in-workspace resolver deny-lists sensitive paths inside the workspace,
but the empty-path search root is the workspace itself, so a workspace of
~/.ssh could be listed via ls with no path. vet_workspace() (public, in
tool_execution next to the resolvers) rejects non-directories and sensitive
roots before the path is ever bound; chat_routes uses it instead of its
inline isdir check.

* fix(workspace): reject filesystem roots and stop showing rejected workspaces as active

Review findings from #3665:

P2: vet_workspace accepted / (and would accept drive/UNC roots), which makes
every absolute path 'inside' the workspace and collapses confinement into
host-wide file access. A root is its own dirname, so reject when
dirname(resolved) == resolved; the browse response now carries a selectable
flag and the picker disables 'Use this folder' on unselectable dirs.

P3: /workspace set stored any string client-side and the chat route silently
dropped rejected values, so the pill could claim a confinement that was not
in effect. New admin-gated /api/workspace/vet validates manual paths before
they persist (canonical path returned), and when a posted workspace is
rejected at send time the stream emits workspace_rejected so the client
clears the stored value and toasts instead of continuing silently.

* fix(workspace): check caller privilege before vetting the posted workspace

Review finding: /api/chat_stream called vet_workspace() on the posted value
for every caller and emitted workspace_rejected on failure, so a non-admin
who can chat but cannot use file/shell tools could distinguish existing
directories from missing/file/sensitive/root paths by whether the event
appeared. The resolution now lives in _resolve_request_workspace, which
drops the submitted value uniformly for non-admin callers, with no vetting
and no event, before the path ever touches the filesystem. Admin and
single-user behavior is unchanged. Test pins that valid and invalid paths
are indistinguishable for a non-admin and that vet_workspace is never
invoked for them.
2026-06-11 18:17:54 +02:00
Michael 95c54ac3cb fix: use _truncate for tool output display limits in agent_loop (#3831)
Replace hardcoded [:2000] and [:4000] slicing with the shared _truncate
helper from tool_utils, which uses MAX_OUTPUT_CHARS and adds an explicit
truncation indicator when content is cut.

Scoped down from the original PR: only agent/tool-output display
behavior, no integrations.py changes.

Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-11 17:05:13 +01:00
Kenny Van de Maele 263d41c58a fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers (#3945)
* fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers

_apply_local_cache_affinity adds session_id + cache_prompt for llama.cpp
KV-cache slot affinity (#2927), gated on _is_self_hosted_openai_compatible,
which treated any unknown OpenAI-compatible host as self-hosted. Strict
cloud providers added as custom endpoints (Mistral at api.mistral.ai)
reject unknown body fields, so every request failed with 422
extra_forbidden. Self-hosted now also requires the endpoint to resolve as
local via model_context.is_local_endpoint: loopback/private/tailscale
host, or endpoint kind explicitly configured as "local" (the escape hatch
for tunneled self-hosted servers). is_local_endpoint is promoted to a
public name since llm_core now shares it.

Fixes #3793

* test(llm): sweep cloud OpenAI-compatible hosts in affinity gating

Parametrized cases adapted from #3839 (credit: Shabablinchikow): deepseek,
x.ai, together, fireworks, and the Gemini OpenAI-compat endpoint must all
stay free of the llama.cpp extras, not just the Mistral host from #3793.

* fix(llm): narrow the Tailscale range to 100.64.0.0/10 in is_local_endpoint

Review finding on #3945: _PRIVATE_PREFIXES carried a bare "100." prefix,
treating all of 100.0.0.0/8 as local while Tailscale only uses the CGNAT
block 100.64.0.0/10. Public 100.x hosts (e.g. AWS ranges outside the
block) were classified local and still received the llama.cpp extras
this PR exists to keep away from strict providers. Match the narrowed
classification routes/model_routes.py already uses, with boundary tests
just below, inside, and just above the range.
2026-06-11 17:51:03 +02:00
Mazen Tamer Salah f941db29d3 fix(search): batch FTS hit lookups into one query (N+1) (#3909)
_search_fts ran the FTS MATCH query, then looked up each hit's full row with its
own db.query(...).filter(id == message_id).first() inside a loop, so a search
returning N hits issued N extra SELECTs. Fetch all hit rows in a single IN(...)
query via _fetch_messages_by_id and reassemble results in hit (relevance) order.

Adds tests/test_session_search_batch_fetch.py asserting a single batched query
(and no query for empty input). Existing session-search tests stay green.
2026-06-11 16:31:54 +02:00
RaresKeY c500bcb47d fix(uploads): migrate upload ownership on rename (#3617) 2026-06-11 16:01:04 +02:00
Mazen Tamer Salah f7a3605b16 fix(webhooks): keep references to in-flight delivery tasks (#3859)
fire() and fire_and_forget() scheduled delivery with bare create_task()/
loop.create_task() and kept no reference. asyncio holds only a weak reference to
a task, so the GC could collect a delivery (or the fire() coroutine itself)
before it completed, silently dropping the webhook.

Track in-flight tasks in a set on the manager via a _spawn_tracked() helper that
holds a strong reference for the task's lifetime and discards it on completion
(add_done_callback), and route both schedule sites through it.

Adds tests/test_webhook_task_refs.py.
2026-06-11 15:53:52 +02:00
George Lawton 4f48cfa9ae fix: omit temperature for Opus 4.7+ on native Anthropic path (#3117)
Anthropic removed the sampling parameters (temperature, top_p, top_k)
starting with Claude Opus 4.7 — sending temperature at all, even 0.0,
returns HTTP 400. _build_anthropic_payload sent it unconditionally, so
every native-Anthropic request to Opus 4.7/4.8 failed: the research probe
(ResearchHandler._probe_endpoint, temperature=0) aborted runs before they
started, and all DeepResearcher._llm calls 400'd.

Add _anthropic_rejects_temperature (version-gates opus-N-M >= (4,7)) and
omit temperature in the Anthropic builder for those models. Older Claude
models (Opus 4.6 and below, Sonnet/Haiku) keep temperature and the
existing [0,1] clamp.

The version gate is hardened against real-world model id shapes:
- a word-boundary anchor so a substring like `octopus-4-8` is not read
  as Opus and stripped of temperature;
- a 1-2 digit minor cap so a dated id such as `claude-opus-4-20250514`
  (Opus 4.0, listed in ANTHROPIC_MODELS) parses as major-only and keeps
  temperature, while dated 4.7+ snapshots still match;
- a non-string guard so a non-string model can't raise AttributeError
  (the previous builder never called .lower() on it).

Adds regression tests covering 4.7/4.8 omission, older/dated/legacy
retention, the substring overmatch, and non-string inputs.

Fixes #3065

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 16:27:40 +03:00
RaresKeY 50fedff2f2 fix(email): scope learned sender signatures by owner (#3724) 2026-06-11 13:26:59 +02:00
cyq c01034f9cb fix(settings): scrub camelCase secret keys (#3707) 2026-06-11 12:53:33 +02:00
pewdiepie-archdaemon ebd2332db4 Agent prompt builder: stop re-adding ALWAYS_AVAILABLE on top of filtered tools
Found the reason yesterday's tool-retrieval drop wasn't taking effect:
in _build_agent_prompt, when relevant_tools was provided, it computed
  tool_names = set(ALWAYS_AVAILABLE) | set(relevant_tools)
which silently re-added every tool get_tools_for_query had just
deliberately discarded. So when a 'save this for <person>' query
dropped manage_memory from the retrieved set, the prompt builder put
it right back, and the model saw both tools again.

Trust the relevant_tools set. get_tools_for_query already starts from
ALWAYS_AVAILABLE — any discard there is intentional and should
propagate. Only force-include ask_user and update_plan here as belt-
and-suspenders since the agent loop relies on those for its own
control flow.

Other callers (task_scheduler) already union ALWAYS_AVAILABLE or
ASSISTANT_ALWAYS_AVAILABLE into relevant_tools before passing it in,
so they're unaffected.
2026-06-11 09:49:20 +09:00
pewdiepie-archdaemon f5ad59317c Tool retrieval: HARD drop manage_memory when query is a contact-save pattern
Description-level steering wasn't enough — even with the explicit 'DO
NOT use for info about another person' in manage_memory's description,
models kept choosing memory over manage_contact. They can't if memory
isn't in the toolset.

New logic in ToolIndex.get_tools_for_query: detect three contact-save
patterns and discard manage_memory from the returned set (overriding
ALWAYS_AVAILABLE):

1. 'save [up to 3 words] for/to <name>' where <name> isn't a timing /
   pronoun stopword (later, tomorrow, me, you, future, etc.). Catches
   the canonical 'save this for X' and the wider 'save this address
   for X', 'save it for X'.
2. 'to/in/into (my) contacts' or 'address book'. Catches both 'add X
   to my contacts' and 'put this in my address book for X'.
3. Possessive: 'save (his/her/their) (address/phone/email/...)'.
   Stronger signal — also force-adds manage_contact to the set in
   case the keyword fallback missed it.

Verified: 8 positive contact patterns all drop memory, 10 false-
positive 'save X for later/tomorrow/me/the next thing' all keep it.
2026-06-11 09:46:34 +09:00
pewdiepie-archdaemon df47536b8d manage_memory descriptions: explicit deferral to manage_contact for person info
Even with manage_contact in the retrieved tool set, models were still
defaulting to manage_memory when the user pasted an address + 'save for
<person>'. Both tools were in front of the model and it picked memory.

Tighten both descriptions to steer at decision-time:
- agent_loop.py manage_memory description: clarify scope is facts
  about the USER, with an explicit 'DO NOT use for info about another
  person' + a 'use manage_contact instead' line.
- tool_index.py manage_memory description: same in shorter form, so the
  embedded retrieval signal is consistent with the prompt-time
  description.
2026-06-11 09:25:23 +09:00
pewdiepie-archdaemon 8a00f954a9 Tool retrieval: catch 'add X to (my) contacts' / 'address book' phrasings
The literal phrase 'add to contacts' missed when there was a name
between 'add' and 'to', e.g. 'add Pat to my contacts'. Anchor on the
tail with 'to my contacts', 'to contacts', 'to address book' so word
boundaries fire regardless of what sits in front.
2026-06-11 09:18:30 +09:00
pewdiepie-archdaemon 8632072ce0 Contacts: postal-address support via vCard ADR, keep tool prompt minimal
Closes the gap that pushed the agent into manage_memory when the user
pasted an address and said 'save this for X'. manage_contact now
accepts an optional address arg end-to-end:

- routes/contacts_routes.py:
  - _normalize_contact carries an 'address' field
  - _build_vcard emits ADR:;;<address>;;;; (street component of the
    RFC-6350 7-part ADR), only when address is non-empty
  - _parse_vcards reads ADR, joins non-empty components with ', '
  - _create_contact and _update_contact thread address through;
    update preserves existing address when caller passes empty
- src/tool_implementations.py do_manage_contact:
  - add accepts address; require at least name+address or email
    (was: email required) so address-only contacts are addable
  - update accepts address; require name OR emails OR address
- src/tool_schemas.py: schema gets a single 'address' string field
- src/tool_index.py + src/agent_loop.py: descriptions get one
  'address' arg mention and a 'use this for save-X-for-person /
  address pastes / phone-with-name' steering line. Net: a few
  bytes added, not a paragraph.

Also: removed a stray name from the schema's manage_contact example
strings ('save Jonathan's email…') — no real names in the codebase.
2026-06-11 09:14:52 +09:00
pewdiepie-archdaemon 153b788134 Tool retrieval: surface manage_contact for 'save X for <person>' patterns
When the user dumps a postal address or phone number alongside a
person's name and says 'save this for X', the vector retriever was
missing manage_contact because its description only mentioned the
literal word 'contact'. The model defaulted to manage_memory (which is
in ALWAYS_AVAILABLE), so the saved fact ended up as un-named memory
that wouldn't surface on a later 'what's X's address?' search.

- Rewrite manage_contact's index description to anchor on the
  semantics: 'save info about another person', including postal/
  mailing address, ZIP, phone, etc. Now it embeds close to address-
  paste queries.
- Extend the keyword intent-map with 'save this for', 'save it for',
  'mailing address', 'postal code', 'their address', etc. — common
  ways users say 'this belongs to a contact' without the literal word
  'contact'.
2026-06-11 08:56:42 +09:00
pewdiepie-archdaemon bc2d934b94 Agent email safety: stage drafts for user approval instead of auto-send
Closes the auto-send hole that let earlier models invent signatures
(e.g. signing 'David' for a user named Felix) and SMTP them to real
recipients before the user could review.

New setting: agent_email_confirm (default True).

When on, the MCP send_email and reply_to_email tools no longer SMTP
directly — they write the composed email to scheduled_emails with a new
status 'agent_draft' (far-future send_at so the scheduled-send poller
ignores them) and return a {pending: true, pending_id, to, subject,
body, message: ...} payload. The model surfaces that to the user.

Backend endpoints to approve / cancel:
- GET    /api/email/pending          → list staged drafts for the owner
- POST   /api/email/pending/{id}/approve → flip status to 'pending' +
                                           backdate send_at so the
                                           existing scheduled-send
                                           poller delivers immediately
- DELETE /api/email/pending/{id}     → status = 'cancelled'

UI:
- Settings / AI Defaults gets a new 'Email Safety' card with the
  toggle, default on.
- Tool descriptions for send_email and reply_to_email now include the
  pending behavior + an explicit 'DO NOT invent a signature, do not
  type a person's name' guardrail.

Pass 2 (next): inline chat card with Send / Discard buttons so the user
doesn't have to type a confirmation reply. Today's prompt + the listing
endpoint give the model a clean path to surface drafts.
2026-06-11 08:50:06 +09:00
RaresKeY d5603ee575 fix(research): migrate active task owners on rename (#3618) 2026-06-11 01:17:02 +02:00
Mazen Tamer Salah 218b9ecbc8 fix(startup): ping real endpoints in warmup/keepalive (#3641)
_warmup_endpoints called model_discovery.get_endpoints(), which does not exist
on ModelDiscovery. It raised AttributeError on every startup and on every 60s
keepalive tick, was swallowed by the outer except, and pinged nothing, so the
cold-start prevention the loop exists for never ran.

Add ModelDiscovery.warmup_ping_urls(), which resolves the /models probe URLs
from the real discover_models() output, and call it from the warmup loop via
asyncio.to_thread (discovery does a blocking port scan, so keep it off the event
loop).

Adds tests/test_warmup_ping_urls.py: resolves /models URLs from discovered
items, honors the limit, degrades to [] on discovery failure, and documents that
get_endpoints never existed.
2026-06-10 19:21:45 +02:00
Srinesh R d9a4b99046 fix: handle batch events format in manage_calendar tool (#3503)
* fix: handle batch events format in manage_calendar tool

Models like deepseek-v4-flash emit batch events array instead of individual create_event calls. The tool defaulted to list_events (no action key), so events were never created despite the model confirming success.

- Add batch normalization in do_manage_calendar

- Map start/end objects to flat dtstart/dtend strings

- Add tests for both object and flat string formats

* fix: surface partial batch failures in manage_calendar

Partial failures were silently dropped - batches with mixed success/failure would report only created count with no error visibility.

- Return non-zero exit code for any failures

- Surface both created and failed counts in response

- Include first error message for debugging

- Add test for partial failure case

* chore: strip trailing whitespace in batch normalization block

* chore: strip whitespace-only blank lines in batch events test
2026-06-10 19:13:08 +02:00
Mazen Tamer Salah f5b91f1e9e fix(tasks): read Memory.text in classify_events personal context (#3640)
The classify_events task pulled user memories to give the LLM personal context,
but read `m.content`, which the Memory ORM does not have (the column is `text`).
That raised AttributeError on the first row; the surrounding except swallowed it
and logged at debug, so the personal-context block was silently always empty and
events were classified without it.

Extract the rendering into `_memory_context_lines` (reads `text`, robust via
getattr, keeps the 200-char and 40-line caps) and raise the swallowed-exception
log to warning so a future schema mismatch is visible.

Adds tests/test_classify_events_memory_text.py for the field, truncation, blank
skipping, missing-attr robustness, and the line cap.
2026-06-10 19:03:45 +02:00
Mazen Tamer Salah 4e210d3337 fix(research): stop rescanning the research dir on every status poll (#3637)
get_status() called get_avg_duration() unconditionally, and that helper globs
and JSON-parses every file under the research data dir. The SSE status stream
polls get_status() roughly once a second, so with a few saved reports each poll
re-read and re-parsed all of them, including for sessions that are not active
(the disk branch never even used the value).

Compute avg_duration only for active sessions and memoize it on the task entry,
so a long stream computes it once instead of on every poll. Behaviour is
unchanged: active streams still report avg_duration.

Adds tests/test_research_status_avg_duration.py: an inactive session does no
avg scan, and an active session computes it once across many polls.
2026-06-10 17:40:44 +02:00
pewdiepie-archdaemon 2bf372b41c Tasks: optional persona for LLM + research tasks (biases output voice)
Wire the existing built-in PERSONAS catalog through to scheduled tasks
the same way I wired it to reminder synthesis. Repurposes the
dormant scheduled_tasks.character_id column.

UI (static/js/tasks.js)
- New 'Persona' select in the LLM / Research task form, with the five
  built-in characters (socrates/razor/nietzsche/spark/odysseus) plus a
  default 'no persona' option. Pre-populates from existing.character_id
  on edit. Non-llm/research types explicitly clear it on save.

API (routes/task_routes.py)
- TaskCreate + TaskUpdate gain character_id: Optional[str].
- _task_to_dict echoes character_id back so the form can hydrate on
  edit. Update endpoint stores '' as None to allow clearing.

Runner (src/task_scheduler.py)
- When task.character_id is set and matches a built-in persona, prepend
  the persona prompt to the task system prompt so the model speaks in
  that voice while still knowing it's running a scheduled task.
- crew_member.personality still wins as the base; character_id stacks
  on top.
2026-06-10 23:36:18 +09:00
SurprisedDuck e115b0155c fix(security): don't grant tool access in the pre-setup window (#3506)
* fix(security): don't grant tool access in the pre-setup window

owner_is_admin_or_single_user() returned True whenever auth was not
configured, which conflated two very different states:

  - intentional single-user mode (operator set AUTH_ENABLED=false), and
  - the pre-setup window (auth enabled, but no admin created yet).

In the second state, blocked_tools_for_owner() returned an empty set, so
server-execution tools (bash/python) and other admin-only tools were
ungated. The auth middleware already 401s /api/ requests pre-setup, but a
caller that bypasses it (trusted loopback / internal-tool path) could reach
those tools before setup completed.

Treat "not configured" as admin only when auth is intentionally disabled
(AUTH_ENABLED=false), mirroring the AUTH_ENABLED parsing in app.py and
core.middleware. Single-user mode is preserved; the pre-setup window is now
non-admin as defense-in-depth.

Adds regression tests for both states.

Fixes #3201

Supported by Claude Opus 4.8

* refactor(security): reuse _auth_disabled() instead of a duplicate helper

Addresses review on #3506: src/auth_helpers.py already has _auth_disabled()
with the identical AUTH_ENABLED parse. Drop the duplicate
_auth_intentionally_disabled() and call the existing helper via a lazy import
inside owner_is_admin_or_single_user (mirroring the lazy core.auth import) to
avoid any import cycle. Removes the now-unused `import os`. Behaviour and the
two regression tests are unchanged.

Supported by Claude Opus 4.8

---------

Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>
2026-06-10 14:37:26 +02:00
ooovenenoso 725d174243 fix(research): track analyzed URLs separately (#3125)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-10 12:08:22 +01:00
Yeoh Ing Ji 3e49658204 refactor(tools): extract document tools to handle registry (#3666)
* feat(tools): add document management tool handlers to the agent_tools module

* feat(tools): extraced document tools for create, update, edit, suggest, and manage from tool_implementations.py

* feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools

* refactor(tools): add document tool dispatcher and updated tool calling path

* refactor(tools): remove duplicated document management functions

* refactor(tools): removing unused functions and adding new import paths

* refactor(tools): update document tool execute methods to use context dictionary

* refactor(tests): update import paths for document tools in test files

* refactor(tests): update owner parameter format in document management tests

* refactor(tests): update import path for _owned_document_query

* feat(tools): add document management tool handlers to the agent_tools module

* feat(tools): extraced document tools for create, update, edit, suggest, and manage from tool_implementations.py

* feat(tests): refactor document tool tests to use TOOL_HANDLERS and document_tools

* refactor(tools): add document tool dispatcher and updated tool calling path

* refactor(tools): remove duplicated document management functions

* refactor(tools): removing unused functions and adding new import paths

* refactor(tools): update document tool execute methods to use context dictionary

* refactor(tests): update import paths for document tools in test files

* refactor(tests): update owner parameter format in document management tests

* refactor(tests): update import path for _owned_document_query

* refactor: update import paths for document tools

* fix(tests): correct source path for document ID test
2026-06-10 10:41:52 +02:00
pewdiepie-archdaemon 4f7061fd61 Settings overhaul + UI polish pass
Two months of iteration on the Settings panel, integration forms, and
small visual nudges across the app. Highlights:

Settings restructure
- Add Models: split into separate Local + API cards (no more in-card
  tabs); each fuses Type/Provider with the URL input.
- Added Models: new dedicated sidebar tab, with Probe + Clear-offline
  pulled into its header; Local/API sub-section icons accent-tinted.
- Search: Web Search and a new Deep Research card (Model + tuning),
  with a cross-link to AI Defaults. Provider hints use real clickable
  anchors; Web Search Test button shows a whirlpool spinner.
- AI Defaults: Image Generation card returns; Research Model card
  carries only Endpoint+Model with a cross-link to Search; Vision /
  Default / Utility fallbacks unified under one numbered-row design
  matching Search's chain.
- API Permissions (was 'API Tokens'): per-row rename, inline
  Permissions toggle that expands the scope-edit panel, in-field
  copy icons (icon→check on success). Empty state accent-tinted.
- Integrations: + Add Integration drops a type-picker menu directly
  under the button (drop-up on tight viewports); each integration
  form (API, CalDAV, CardDAV, Email, Codex/Claude, Vault, MCP) uses
  the same accent-outlined Save/Test/Cancel buttons right-aligned.
- Danger Zone: Wipe→Delete with trash icons; new 'Delete everything'
  row at the bottom that loops every category.

AI Synthesis (Reminders)
- Persona dropdown sourced from PROMPT_TEMPLATES + custom preset.
- src/reminder_personas.py mirrors the five built-ins for the
  server-side synthesis path.
- dispatch_reminder() reads reminder_llm_persona and uses the
  persona's system prompt; empty/unknown falls back to warm-neutral.

Esc handling
- Kebab menus and the provider picker intercept Esc in capture phase
  so dismissing a popup no longer closes the whole Settings modal.

Accent tinting
- Scoped CSS rule across data-settings-panel=ai/services/added-models/
  search/integrations/reminders for card h2 icons + the Added Models
  sub-section icons.

Codex/Claude integration form
- No more auto-creation on form open — explicit Create token button.
- New tokens start with every scope granted; existing tokens move out
  of the integration form into the API Permissions card.
- Setup reveal: copy buttons inline inside the token + setup code
  blocks; shorter subtitle wording.

Misc visual polish
- Save/Test/Cancel uniformly accent-outlined and right-aligned on
  every integration form.
- Provider logos render inline next to the search fallback selects
  and the Deep Research Search dropdown.
- Trash icons in fallback rows bumped to 20x20 so they fill the 32px
  button.
- Image generation default flipped to off.
2026-06-10 15:15:13 +09:00
Lucas Daniel 55ff22c6d5 fix(chat): stabilize system prompt, sequence memory extraction, and send stable session id to preserve KV cache (#3360)
* fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache

Fixes #2927. As diagnosed in the issue, three things in Odysseus's request
pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache
continuity, forcing a full prompt re-evaluation (15-30s+) on every turn:

1. Dynamic content folded into the system prompt every turn. Both the chat
   preface (ChatProcessor.build_context_preface) and the agent system prompt
   (_build_system_prompt) injected current_datetime_prompt() — text that
   changes every minute — directly into system-role messages, which llm_core
   then concatenates into the single system message sent as the cached
   prefix. Any byte difference there invalidates the entire cache. Moved this
   to a new current_datetime_context_message() helper that returns a
   standalone user-role message, inserted near the end of the array (right
   before the latest user turn) instead of mixed into the system prompt. The
   static system prefix (preset prompt + safety policy + agent base prompt)
   now stays byte-identical across turns of the same session.

2. Memory/skill extraction side-requests competed with the main completion.
   run_post_response_tasks fired extract_and_store / maybe_extract_skill via
   asyncio.create_task — fire-and-forget coroutines that could overlap the
   next turn's main request and steal llama.cpp's limited processing slots,
   evicting the cached checkpoint. They're now queued through a new
   _run_extraction_jobs_sequentially helper that waits for the session's
   stream to go idle and runs the jobs strictly one at a time.

3. No stable session identifier was sent to local backends, so llama.cpp
   assigned a new processing slot via LRU every turn ("session_id=<empty>
   server-selected (LCP/LRU)"), losing slot affinity. Added
   _apply_local_cache_affinity() in llm_core, which sets session_id and
   cache_prompt: true on outgoing payloads — gated to self-hosted
   OpenAI-compatible endpoints only (never api.openai.com or other cloud
   providers, which reject unrecognized request fields with a 400). Threaded
   session_id through stream_llm / llm_call_async / stream_agent_loop from
   the existing Odysseus session id.

Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload-
assembly and scheduling code paths: byte-identical system prefix across two
turns of the same session (with a regression check that genuinely changed
instructions DO still change it), the dynamic time block landing as a
user-role message, extraction jobs waiting for the stream to go idle and
running sequentially, and the outgoing payload carrying a stable session_id
(same across turns of one session, different across sessions) only for
self-hosted endpoints. Updated tests/test_user_time.py for the new message
placement.

* fix(tests): accept owner= kwarg in normalize_model_id monkeypatch

The upstream normalize_model_id signature now takes an owner= keyword
argument, and chat_helpers.py passes owner=getattr(sess, "owner", None)
at the call site. Update the test stub lambda to **kwargs so it handles
the new argument without breaking, and update chat_helpers.py to forward
the owner parameter consistently.

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 22:46:54 +01:00
Lucas Daniel d273085744 fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut (#3540)
* fix(integrations): truncate api_call JSON lists with sentinel instead of mid-string cut

* fix(integrations): avoid mutating response dict in-place on truncation

* fix(integrations): truncate dict responses and bound list sentinel overhead

- Dict path now walks keys in insertion order, adding them one at a time
  while checking that the accumulated dict + _truncated marker fits within
  the 12 000-char limit. Previously the marker was appended without removing
  any content, so large dicts were not actually truncated.
- List path now subtracts the sentinel's serialised size (+ element-separator
  padding) from the budget before binary-searching, so the final array
  including the sentinel stays at or under the limit.
- Add regression tests: large-dict actually-truncated, small-dict pass-through,
  and list-with-sentinel respects the size bound.

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 22:34:08 +01:00
Michael 2e6fff2212 fix: preserve reasoning_content in sanitized messages for Moonshot/Kimi (#3152)
Providers like Moonshot (Kimi K2.5/K2.6) require the reasoning_content
field to be present on assistant tool-call messages in multi-turn
conversations.  The sanitizer's allow-list was missing this field,
causing HTTP 400: 'thinking is enabled but reasoning_content is missing
in assistant tool call message at index N'.

Add reasoning_content to the allowed field set in
_sanitize_llm_messages and cover with regression tests.

Fixes #3118

Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 21:44:38 +01:00
Rohith Matam fbd8ee9033 fix: fall back for npx cache subprocess check (#3560)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 20:41:23 +01:00
Rares Tudor 016157019c fix(tools): use _INTERNAL_BASE in serve-session endpoint registration (#3675)
#3322 renamed the loopback base to _INTERNAL_BASE, but a later Cookbook
commit reintroduced one call site using the old _COOKBOOK_BASE name,
raising NameError whenever the agent registers a model endpoint for a
running serve session.

Fixes #3669
2026-06-09 20:31:29 +02:00
Sid 9e74a327f8 fix(llm): remove max_output_tokens from ChatGPT Subscription payload (#3656)
ChatGPT's Codex API rejects any request that includes max_output_tokens,
returning HTTP 400 "Unsupported parameter: max_output_tokens". This caused
Deep Research to always fail during the endpoint probe when a ChatGPT
Subscription model was selected.

Remove the conditional that set payload["max_output_tokens"] in
_build_chatgpt_responses_payload(). The parameter is simply not sent.

Also update the two affected tests:
- Rename test_chatgpt_subscription_payload_uses_max_output_tokens →
  test_chatgpt_subscription_payload_omits_max_output_tokens
- Rename test_chatgpt_subscription_payload_omits_empty_max_output_tokens →
  test_chatgpt_subscription_payload_omits_max_output_tokens_when_zero
- Assert max_output_tokens is absent rather than present

Fixes #3650
2026-06-09 17:42:12 +02:00
Sheikh Rahat Mahmud 9180847c0e feat(diagnostics): add consolidated service health endpoint for degraded-state reporting (#964)
* Add consolidated service health endpoint for degraded-state reporting

ROADMAP (High Priority) asks for "Better degraded-state reporting for
ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no
single readout of which subsystems are actually working: /api/health is only a
liveness ping and each subsystem's signal lives in a different module, so a
misconfigured self-host install gives no consolidated picture.

This adds an admin-only GET /api/diagnostics/services endpoint backed by a new
src/service_health.py aggregator. Each subsystem reports a uniform
{name, status, detail, meta} where status is ok | degraded | down | disabled,
and the response rolls up an overall verdict (worst non-disabled status).

Probes are deliberately non-intrusive and safe to poll:
- ChromaDB: reads the .healthy flags on the RAG and memory vector stores.
- SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No
  search query is run.
- ntfy: GET the server's built-in /v1/health. No test notification is sent.
- email: short IMAP connect+logout per configured account (no credentials in
  meta).
- providers: probe each enabled ModelEndpoint's model list (no api_key in meta).

Probe functions take their inputs as parameters and isolate the network call to
injectable callables, so they unit-test without touching the network (same
pattern as the merged provider-endpoint tests). Network probes run concurrently
off the event loop via asyncio.to_thread with bounded per-probe timeouts.

memory_vector is now passed into setup_diagnostics_routes (new optional param,
backward-compatible) so ChromaDB's vector-memory store can be reported too.

Tests: tests/test_service_health.py — 29 tests covering every status mapping
per subsystem, the overall rollup, and that no secrets leak into meta.

Verification:
  python -m pytest tests/test_service_health.py -q          # 29 passed
  python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py
  python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q

Backend + tests only; an Admin/Settings UI badge that renders this endpoint is
a natural follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(diagnostics): bound service-health wall-clock and redact secrets

Addresses review on #964.

Blocker 1 — genuinely bounded wall-clock:
- providers_health and email_health now fan out per-item probes across a
  bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET),
  instead of probing endpoints/accounts sequentially. Stragglers are reported
  as a controlled `timeout` and never block; the pool is shut down with
  wait=False so the response returns on time regardless of endpoint/account
  count.
- The IMAP connect path now honors the service-health budget: _imap_connect
  gained a pass-through `timeout` param and the probe calls it with
  _PROBE_TIMEOUT instead of the default 15s.
- collect_service_health runs the four network subsystems concurrently, each
  under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall
  wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop.

Blocker 2 — no secret/raw-error leakage in the response:
- _safe_url strips userinfo, query, and fragment from every URL surfaced in
  meta (searxng instance, ntfy base, provider name fallback), keeping only
  scheme/host/port/path.
- _classify_error maps every probe failure to a controlled category token
  (timeout, connection_refused, dns_error, tls_error, network_error,
  http_error, auth_or_protocol_error, …) — raw str(exception), which can embed
  credentialed URLs or server text, is never returned.

Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py):
- URL userinfo/query redaction for searxng/ntfy/providers.
- secret-bearing exception strings map to categories and don't leak.
- multiple slow providers/accounts stay bounded (single + 25-endpoint cases).
- subsystems run concurrently; aggregate deadline yields a controlled result.
- route-level unauthenticated (401) / non-admin (403) / admin (200) coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(diagnostics): isolate route tests so they don't leak module globals

The new route tests replaced src.service_health.collect_service_health and
routes.diagnostics_routes.require_admin via direct assignment, which persisted
for the rest of the pytest session. In CI's full alphabetical run that fake
collector (returning services=[]) leaked into the later collect_service_health
tests and failed them. Switch to monkeypatch.setattr so both are restored after
each test. No production code change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 16:00:24 +01:00