Commit Graph

20 Commits

Author SHA1 Message Date
pewdiepie-archdaemon 6d507f8128 Merge remote-tracking branch 'origin/dev' into test-main-dev-merge-20260615
# Conflicts:
#	src/tool_implementations.py
#	static/js/research/panel.js
2026-06-15 21:20:15 +09:00
nsgds 7ae6133d7f fix(agent): don't let a materialized default budget defeat context-window scaling (#4122)
* fix(agent): don't let a materialized default budget defeat context scaling

#1230 scales agent_input_token_budget to the model's context window unless
the user explicitly set a budget, detected via is_setting_overridden(). But
the settings-save path materializes every DEFAULT_SETTINGS key into
settings.json (load_settings merges defaults; handlers persist the merged
dict), so the persisted default 6000 reads as "overridden" and the budget
code takes the min(6000, ctx) branch — silently re-capping long-context
models at 6000 for anyone who has ever saved a setting. This reintroduces
the exact regression #1170/#1230 set out to fix.

Add is_setting_customized() (saved value != default) and gate the scaling
on it instead of mere presence. A persisted default is not a user choice.

is_setting_overridden has exactly one consumer (this budget path), so the
change is contained. Tests cover the materialized-default regression, a
deliberately-chosen budget still being honoured, and the absent-key case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): rework context-budget fix per review (#4122)

Address RaresKeY's review:

P2 (explicitness): is_setting_customized treated a saved value equal to the
default as "not explicit", which ALSO blocked a user from deliberately pinning
the default budget. Reframe the default value itself as the AUTO sentinel —
agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context
window", any other value is an explicit cap. A materialized default still reads
as auto (fixing the original regression), and any non-default value the user
chooses is now honoured. Drop the now-unused is_setting_customized helper.

P2 (fallback context): auto-scaling trusted get_context_length() even when it
returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known
window), over-allocating on self-hosted/proxy setups. Add get_context_length_known()
(also returns whether the window was actually discovered); the budget block
passes 0 when unknown so auto-scaling stays conservative instead of inflating to
an unproven window.

hard_max stays auto-only — a deliberate explicit budget wins (#1190); kept that
contract and answered the reviewer's question rather than silently reversing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(agent): lock the materialized-default budget regression (review on #4121)

Per WGlynn's review on the issue: add an end-to-end regression that saves an
UNRELATED setting (which makes the settings-save path materialize the budget
default into settings.json) and asserts the budget still auto-scales rather than
re-reading as an explicit 6000 cap — locking the exact reopening shut.

To make the test bite the production decision (not just re-derive it), extract
`budget_is_explicit()` into src/context_budget.py and use it from the agent loop.
It keys off value-vs-default (the default is the auto sentinel), NOT settings
presence — which is the whole point, since the save path materializes defaults.

Note: after this PR's rework, is_setting_overridden has ZERO production callers,
so the merged-dict materialization smell can't reach any setting through a
presence check today (WGlynn's durability concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): bind the budget context window to its own provenance (review #4122)

RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop
kept only the `known` flag from get_context_length_known() and budgeted off the
passed-in `context_length`, which can come from a *different* lookup. Two failures:
- local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT
  fallback while the fresh probe proves the real (smaller) served context — we'd
  scale off the stale value;
- callers that don't pass context_length (scheduled tasks, teacher escalation,
  skill test runs, bg_monitor) were capped at 6000 even when a long window is
  discoverable.

Extract budget_context_for_model() which returns the freshly-probed window when
known else 0, binding the flag to the value it proves; the agent loop uses it.
Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): fix stale budget comments + tighten to the contract (review #4122)

- settings.py: an explicit budget is clamped to the window only — hard_max is
  auto-only (#1190); drop the incorrect "and to hard_max".
- is_setting_overridden docstring: drop the stale "adaptive budgets" example;
  point value-sensitive callers at context_budget.budget_is_explicit.
- Tighten the budget-block comments to the contract (default = auto sentinel,
  non-default = explicit cap, hard_max = auto-only ceiling).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): correct budget issue citations (#1190 → merged #1230/#1273)

The context-budget contract (auto-sentinel, explicit budgets honoured,
hard_max auto-only) merged via #1230#1190 was the earlier, closed,
superseded PR. Re-point the contract comments at #1230 (the live source,
already cited for the auto-sentinel two lines up in settings.py).

The configurable hard_max setting (`agent_input_token_hard_max`) was a
reviewer requirement first raised on #1190, omitted from the merged #1230,
and actually added in #1273 — credit #1273 for it and correct the test
comment's history (it previously implied this PR completed the requirement).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 15:17:28 +09:00
Mazen Tamer Salah f5c1eb4b9d fix(settings): degrade load_features to defaults on PermissionError
load_settings() already catches PermissionError, but load_features() caught only
FileNotFoundError/JSONDecodeError/ValueError. An existing-but-unreadable
data/features.json (e.g. root-owned after a deploy) therefore raised instead of
falling back to DEFAULT_FEATURES, taking down GET /api/auth/features and anything
that reads feature flags. Add PermissionError to the except tuple to match
load_settings().

Adds tests/test_load_features_permission_error.py.

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-11 21:20:10 +01:00
pewdiepie-archdaemon bc2d934b94 Agent email safety: stage drafts for user approval instead of auto-send
Closes the auto-send hole that let earlier models invent signatures
(e.g. signing 'David' for a user named Felix) and SMTP them to real
recipients before the user could review.

New setting: agent_email_confirm (default True).

When on, the MCP send_email and reply_to_email tools no longer SMTP
directly — they write the composed email to scheduled_emails with a new
status 'agent_draft' (far-future send_at so the scheduled-send poller
ignores them) and return a {pending: true, pending_id, to, subject,
body, message: ...} payload. The model surfaces that to the user.

Backend endpoints to approve / cancel:
- GET    /api/email/pending          → list staged drafts for the owner
- POST   /api/email/pending/{id}/approve → flip status to 'pending' +
                                           backdate send_at so the
                                           existing scheduled-send
                                           poller delivers immediately
- DELETE /api/email/pending/{id}     → status = 'cancelled'

UI:
- Settings / AI Defaults gets a new 'Email Safety' card with the
  toggle, default on.
- Tool descriptions for send_email and reply_to_email now include the
  pending behavior + an explicit 'DO NOT invent a signature, do not
  type a person's name' guardrail.

Pass 2 (next): inline chat card with Send / Discard buttons so the user
doesn't have to type a confirmation reply. Today's prompt + the listing
endpoint give the model a clean path to surface drafts.
2026-06-11 08:50:06 +09:00
pewdiepie-archdaemon 4f7061fd61 Settings overhaul + UI polish pass
Two months of iteration on the Settings panel, integration forms, and
small visual nudges across the app. Highlights:

Settings restructure
- Add Models: split into separate Local + API cards (no more in-card
  tabs); each fuses Type/Provider with the URL input.
- Added Models: new dedicated sidebar tab, with Probe + Clear-offline
  pulled into its header; Local/API sub-section icons accent-tinted.
- Search: Web Search and a new Deep Research card (Model + tuning),
  with a cross-link to AI Defaults. Provider hints use real clickable
  anchors; Web Search Test button shows a whirlpool spinner.
- AI Defaults: Image Generation card returns; Research Model card
  carries only Endpoint+Model with a cross-link to Search; Vision /
  Default / Utility fallbacks unified under one numbered-row design
  matching Search's chain.
- API Permissions (was 'API Tokens'): per-row rename, inline
  Permissions toggle that expands the scope-edit panel, in-field
  copy icons (icon→check on success). Empty state accent-tinted.
- Integrations: + Add Integration drops a type-picker menu directly
  under the button (drop-up on tight viewports); each integration
  form (API, CalDAV, CardDAV, Email, Codex/Claude, Vault, MCP) uses
  the same accent-outlined Save/Test/Cancel buttons right-aligned.
- Danger Zone: Wipe→Delete with trash icons; new 'Delete everything'
  row at the bottom that loops every category.

AI Synthesis (Reminders)
- Persona dropdown sourced from PROMPT_TEMPLATES + custom preset.
- src/reminder_personas.py mirrors the five built-ins for the
  server-side synthesis path.
- dispatch_reminder() reads reminder_llm_persona and uses the
  persona's system prompt; empty/unknown falls back to warm-neutral.

Esc handling
- Kebab menus and the provider picker intercept Esc in capture phase
  so dismissing a popup no longer closes the whole Settings modal.

Accent tinting
- Scoped CSS rule across data-settings-panel=ai/services/added-models/
  search/integrations/reminders for card h2 icons + the Added Models
  sub-section icons.

Codex/Claude integration form
- No more auto-creation on form open — explicit Create token button.
- New tokens start with every scope granted; existing tokens move out
  of the integration form into the API Permissions card.
- Setup reveal: copy buttons inline inside the token + setup code
  blocks; shorter subtitle wording.

Misc visual polish
- Save/Test/Cancel uniformly accent-outlined and right-aligned on
  every integration form.
- Provider logos render inline next to the search fallback selects
  and the Deep Research Search dropdown.
- Trash icons in fallback rows bumped to 20x20 so they fill the 32px
  button.
- Image generation default flipped to off.
2026-06-10 15:15:13 +09:00
Logan Davis f72e1bd412 feat(reminders): add generic webhook as a fourth reminder channel (#2952)
Replaces any Discord-specific reminder channel with a generic outbound
webhook channel. Users pick any saved Integration as the target and
supply a JSON payload template with {{title}} and {{message}}
placeholders — values are JSON-escaped before substitution. Works with
Discord, Slack, Teams, ntfy (JSON mode), or any service that accepts a
POST with a JSON body.

- `src/settings.py` — reminder_webhook_integration_id +
  reminder_webhook_payload_template defaults
- `routes/note_routes.py` — webhook delivery block; Integration lookup,
  template rendering, auth wiring; built-in preset defaults so
  discord_webhook works out of the box without a configured template;
  settings_override kwarg avoids test-button race condition
- `routes/auth_routes.py` — discord_webhook preset test handler
- `src/integrations.py` — discord_webhook preset with description +
  example templates; hides auth/key fields in the Integration form
- `src/builtin_actions.py` — webhook_sent delivery check
- `src/tool_implementations.py` — webhook aliases + enum updated
- `static/index.html` — Webhook channel option; Integration picker +
  payload template textarea
- `static/js/settings.js` — Integration list, populateWebhookIntegrations,
  syncChannelRows, hints, load/save, auto-fill preset templates,
  test-button override payload, hide auth/key for URL-auth presets

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 22:47:57 +02:00
Kenny Van de Maele 64d65b73c1 feat: round-limit handling — Continue affordance at the cap + configurable cap (#1999)
* feat: round-limit handling — Continue affordance at the cap + configurable cap

When the agent loop runs out of rounds (per-message step cap, default 20)
while still actively using tools, it stopped silently mid-task. Now:

1. The loop emits a `rounds_exhausted` SSE event at the cap, and the UI shows
   a "Continue" pill at the bottom of the chat that resumes the task from where
   it left off. Repeated cap-hits each get a fresh Continue (multiple continues
   in a row).
2. The cap is configurable in Settings → Agent ("Max steps per message"),
   validated on the client, at the save endpoint, and at the read site.

- src/agent_loop.py: track `_exhausted_rounds` (set only when a full
  tool-executing round completes on the last allowed round — i.e. the agent
  wanted to keep going); emit `{"type":"rounds_exhausted","rounds":N}` (logged).
- routes/chat_routes.py: read `agent_max_rounds` (clamped 1..200), pass as
  `max_rounds`; forward the new event through the SSE relay.
- routes/auth_routes.py: validate numeric settings on save (int + clamp;
  agent_max_rounds 1..200, agent_max_tool_calls 0..1000; 400 on non-int).
- src/settings.py: default `agent_max_rounds = 20`.
- static/: Settings input + client-side clamp; the Continue pill (reuses the
  existing .stopped-indicator / .continue-btn classes and theme vars
  --border/--fg/--bg/--accent); appended to the chat container so it survives
  the message re-render at stream finalize. chat.js cache version bumped.

* test: cover rounds_exhausted emission (cap-hit vs normal finish)

Drives the real stream_agent_loop with mocked LLM stream / tool exec / settings:
a tool block every round exhausts the cap and must emit rounds_exhausted; a
plain answer hits the done-break and must not. Guards the for/else logic.
2026-06-04 22:36:05 +02:00
ooovenenoso ab5311c44d fix(research): support timeout defaults in direct tests (#2624)
fix(research): honor planning query timeouts
2026-06-04 20:23:17 +02:00
Lucas Daniel 398892cced fix(settings): catch PermissionError in load_settings + error-path tests (#1570)
PermissionError was not in the except tuple so an unreadable settings.json
would crash the app instead of falling back to defaults. Added alongside the
existing FileNotFoundError/JSONDecodeError/ValueError catches.

Also adds test_settings_error_paths.py covering all four failure modes:
missing file, corrupted JSON, wrong type, and permission denied.
2026-06-03 14:23:27 +09:00
red person 35c40bce75 Fall back from invalid settings stores (#1416) 2026-06-03 03:53:05 +09:00
nickorlabs c39d8db12a fix(agent): make context-budget hard_max configurable via agent_input_token_hard_max setting (#1273)
Completes the reviewer requirement from PR #1190 review that was carried
over but not implemented in #1230:

> "The hard max is a function-local constant. For this setting, the ceiling
>  should be configurable or at least represented as a named setting/default
>  with tests."
                                                                — review on #1190

#1230 shipped the adaptive auto-derivation but left `DEFAULT_HARD_MAX = 200_000`
as a hardcoded module constant in src/context_budget.py. Admins on premium
APIs with large context windows (kimi-k2 / minimax-m3 at 1M, etc.) can use
their full window today only by setting `agent_input_token_budget`
explicitly — which then takes them off the adaptive auto-path entirely.

## What this PR changes

- src/settings.py: register `agent_input_token_hard_max` in
  DEFAULT_SETTINGS, default 200_000 (matches `DEFAULT_HARD_MAX`). Inline
  comment documents the no-op semantics in the explicit branch.

- src/agent_loop.py: read the setting at the call site and pass it as the
  `hard_max` kwarg of `compute_input_token_budget`. Defensive parsing —
  missing / non-int / zero values fall back to `DEFAULT_HARD_MAX`, so a
  misconfig cannot silently zero the budget.

- src/tool_implementations.py: three friendly aliases for `manage_settings`:
  - "hard max" -> agent_input_token_hard_max
  - "token budget cap" -> agent_input_token_hard_max
  - "input budget cap" -> agent_input_token_hard_max
  Plus the existing "token budget" -> agent_input_token_budget keeps a
  matching shorter alias "input budget".

- tests/test_context_budget.py: 6 new tests on top of the existing 6:
  - hard_max raises the auto ceiling (1M ctx + raised cap -> 85% of ctx)
  - hard_max lowers the auto ceiling (128K ctx + 50K cap -> 50K)
  - hard_max has no effect on the explicit branch
  - DEFAULT_SETTINGS contains the new key
  - manage_settings aliases are registered
  - the live get_setting path returns the override value, and malformed
    values fall back per the agent_loop defensive parsing

12 passed in 0.04s. No changes to the pure helper signature or semantics;
#1230's behavior is the default when the new setting is unset.

## How it lets users drop the explicit override

Before this PR, on a 1M-context model:
  agent_input_token_budget = 900_000  (explicit)  -> 900K  [user override]
  agent_input_token_budget = <unset>  (auto)      -> 200K  [HARD_MAX]

After this PR, same model:
  agent_input_token_budget = <unset>
  agent_input_token_hard_max = 900_000
                                      -> min(1M * 0.85, 900K) = 850K  [auto, no override needed]

The explicit-override path keeps working unchanged for users who prefer it.
2026-06-03 01:36:57 +09:00
lekt8 8c376d2b0e feat: adapt agent_input_token_budget to the model context window (#1170) (#1230)
The agent soft-trims input context to `agent_input_token_budget` (default 6000).
The old computation `min(context_length or budget, budget)` made the 6000 default
a hard ceiling for every model, so 128K/1M context models were silently capped at
6000 input tokens — now that num_ctx is sent correctly (#1056), this was the last
barrier to actually using a long context window.

This derives the default budget from the model's discovered context window
(~85%, capped at a generous hard max) while honouring an explicit user setting
exactly (clamped to the window). When the window is unknown it falls back to the
previous value, so behaviour is unchanged for that case.

- src/context_budget.py: pure `compute_input_token_budget()` (unit-testable)
- src/settings.py: `is_setting_overridden()` to tell an explicit user value from
  the merged default (load_settings merges DEFAULT_SETTINGS, so equality alone
  can't distinguish them)
- src/agent_loop.py: use the helper in the soft-trim path

Covered by tests/test_context_budget.py (6 cases).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 00:13:53 +09:00
Ernest Hysa c12ae79c42 fix(tools): strict path confinement with sensitive-subpath deny list (#1072)
Rework read_file / write_file confinement after review feedback:

- Remove $HOME from default allow roots. Only project data/ and system
  temp dirs are allowed out of the box.
- Add a sensitive-subpath deny list (.ssh, .gnupg, shell rc files,
  .env, .netrc, SSH key filenames). Checked BEFORE allowlist so it
  blocks even when a broader root is configured.
- Add "tool_path_extra_roots" setting for opt-in broader access.
- Sensitive subpaths remain blocked regardless of configured roots.

Tests: 24 cases covering /etc/shadow, ~/.ssh/authorized_keys,
symlink into .ssh, traversal, shell rc files, key filenames,
extra roots, and dispatch-level end-to-end.
2026-06-02 23:13:30 +09:00
Nikita Rozanov 119075f368 Research: add configurable run timeout
Surfaces the research_run_timeout_seconds setting (added in #783) in
Settings → Research as a "Max Time" field, and lets 0 disable the
wall-clock cap entirely for long deep-research runs.

- settings.py: document that 0 disables the cap; default stays 1800s.
- research_handler.py: resolve 0 (or negative) to no timeout
  (asyncio.wait_for timeout=None); other values stay bounded to
  [60, 86400] as before.
- index.html / settings.js: "Max Time" input bound to
  research_run_timeout_seconds, validated to {0} ∪ [60, 86400], with
  copy making explicit that 0 = no limit (unbounded model/API cost).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 20:57:57 +09:00
tanmayraut45 1cc2e90ac0 Apply SafeSearch by default across search providers (#763)
#718 reported Deep Research drifting into adult / spam URLs several
rounds into a benign session ("research about https://bhagathgoud.com/
and what he doing currently"). The reporter's log showed Japanese
adult sites being crawled even though the model was emitting normal
queries like "Bhagath Goud LinkedIn" and "site:bhagathgoud.com".

The model wasn't generating those URLs. Every provider call site
constructed its params dict without a SafeSearch parameter, so the
underlying HTTP backend (the duckduckgo-search library / DDG's HTML
endpoint in this case) was free to surface "related search" /
trending / spam recommendations that have nothing to do with the
user's query. Per provider:

- SearXNG: instance-dependent; many self-hosted instances default
  to safesearch=0.
- Brave API: defaults to "off" for new API keys.
- duckduckgo-search lib: defaults to "moderate", which still lets
  related-search recommendations and HTTP-backend fallback URLs
  surface trending non-English spam topics.
- DDG HTML fallback (html.duckduckgo.com): no `kp` param, treated
  as off.
- Google PSE: omitted `safe` is equivalent to off.
- Serper: omitted `safe` proxies to Google with safe off.

Since the bad URLs entered through the provider layer, not the
model, the provider params are the right place to gate this.

Changes:

- src/settings.py: new `search_safesearch` setting with default
  "strict". Documented values ("strict" | "moderate" | "off") plus
  a few aliases ("on", "high", "0/1/2", "disabled", ...) so a
  hand-edited config doesn't silently fall through to off.
- src/search/providers.py:
  - Add `_get_safesearch_level()` (canonical, normalizing) and
    `_safesearch_for(provider)` (per-provider param translation).
  - Thread the per-provider value into every params dict:
    SearXNG JSON, SearXNG language/engines fallbacks, SearXNG HTML,
    Brave, DDG library, DDG HTML fallback, Google PSE, Serper.
  - Tavily is left untouched — its API has no SafeSearch knob and
    its index already filters explicit content at ingest time.

Behavior change for existing installs: default is now "strict", so
explicit results get filtered across every supported provider
without any user action. Users who deliberately want unfiltered
results can set `search_safesearch` to "off" in Settings. No new
dependencies, no schema migrations.

Closes #718.
2026-06-02 11:34:32 +09:00
tanmayraut45 cc40a3263e Lift deep-research hard timeout into a setting (#783)
The 600s wall-clock cap in research_handler.start_research was too short
for local / edge LLMs to finish a deep-research synthesis — long
extraction passes plus a slow final report routinely blew past 10
minutes and the run was killed with partial results.

Introduce research_run_timeout_seconds (default 1800s = 30 min) in
DEFAULT_SETTINGS and resolve it at start_research entry when the caller
hasn't pinned hard_timeout. Bound the resolved value at [60, 86400] so a
misconfigured settings.json can't either disable the safety net or
explode into a multi-day hang. Existing call sites in research_routes.py
and chat_routes.py keep working unchanged — they don't pass hard_timeout
and now pick up the new default.

Closes #595.
2026-06-02 11:23:32 +09:00
Rifqi Akram 5b1e56407b Add SSRF-guarded web fetch agent tool
* feat(web-fetch): add web_fetch tool to read a specific URL's content

* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution

Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.
2026-06-01 16:57:28 +09:00
pewdiepie-archdaemon 0888a3b3e6 Add native Windows compatibility layer 2026-06-01 15:09:47 +09:00
pewdiepie-archdaemon b998c52dd0 Add Deep Research extraction controls 2026-06-01 14:55:33 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00