* log(app): add warnings to silent except Exception blocks
- Internal tool auth header failure now logs a warning instead of
silently passing, making auth bypass easier to spot in logs.
- Token last_used_at update failure now logs at DEBUG (fire-and-forget,
non-critical, but useful when debugging token tracking issues).
- Image ownership verification failure now logs a warning so unexpected
access-check errors surface instead of silently allowing the request.
* log(chat_routes): add warnings to silent except Exception blocks
- clear_orphaned_session_endpoint: log before rollback so failures
appear in traces when users see stale/deleted model options.
- _endpoint_has_model (JSON parse): log malformed cached_models instead
of silently treating endpoint as valid.
- _has_any_visible_model (JSON parse): log malformed cached_models
instead of silently returning empty list.
- timezone header parse: log failure so time-zone-related tool bugs
(wrong scheduled times, calendar events) are traceable.
- attachments JSON parse: log failure so silently-dropped attachments
are visible in server logs.
* log(email_routes): add warnings to silent except Exception blocks
- Email alias resolution failure now logs a warning instead of silently
returning an empty list, making broken account configs diagnosable.
* log(document_routes): add warnings to silent except Exception blocks
- Export ZIP request body parse failure now logs a warning so empty
exports caused by malformed requests are diagnosable.
- clear_active_document failure on detach now logs a warning to help
trace doc re-injection bugs like #1160.
* log(agent_loop): add warnings to silent except Exception blocks
- builtin tool overrides load failure now logs a warning so misconfigured
settings don't silently fall back to defaults without a trace.
- Timezone context injection failure now logs a warning to help debug
incorrect scheduled times in agent-created tasks.
- PDF form-backed document detection failure now logs a warning so
broken form-doc UI is traceable to the root cause.
* log(llm_core): add warnings to silent except Exception blocks
- Malformed URL in _is_ollama_native_url now logs a warning so bad
endpoint configs are traceable instead of silently returning False.
- Model list fetch failure now logs a warning with the endpoint URL so
endpoints that silently vanish from the model picker are diagnosable.
* log: pass exception via exc_info instead of string interpolation
* fix(logging): avoid logging raw URLs in llm_core error paths
Drop the raw url/base_chat_url from the Ollama-detection and
model-list-fetch warning logs added by this sweep, since these values
can contain private hostnames, internal IPs, credentials, or other
deployment details.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Agent: pass the open email reader (uid/folder/account/from/subject/body
preview) on every chat submit so 'reply to this' / 'write email saying
hi' route to ui_control open_email_reply with the right UID instead of
inventing a new .md draft. Code-level enforcement (chat_routes strips
create_document + send_email when active_email is set); cross-session
active_doc_id is now trusted instead of being silently dropped.
set_active_email/clear_active_email tool-layer helpers in
tool_implementations.
- ui_control open_email_reply: optional body argument so the agent can
open-and-write in one call; envelope now forwards uid/folder/account/
body/panel through tool_output. Tool description sharpened and the
parser rejects empty bodies on reply/reply-all (forces the agent to
write rather than open an empty draft).
- Email library: search now runs against [Gmail]/All Mail when the
current folder is INBOX (archived emails surface). Whirlpool spinner
+ 'Searching…' placeholder while in flight. Each search result is
stamped with its source folder so clicks open the right email instead
of whatever shares its UID in INBOX. Search no longer re-applies the
same text pill locally (which only checks subject/from/snippet, never
body) so body-only matches don't get dropped after IMAP returns them.
Initial inbox load bumped 100→500.
- Email favorites: 'Favorite (pin to top)' / 'Unfavorite' in both the
card menu and the open-reader more menu, backed by a new
/api/email/flag/{uid}?on=true|false endpoint. Flagged emails always
bubble to the top of the grid regardless of active sort.
- AI reply in doc editor: never overwrites existing draft text or the
quoted history. AI suggestion is prepended; AI-generated 'On …
wrote:' re-quotes are stripped so the original quote isn't visually
edited.
- Cookbook serve: pre-launch GPU driver / has_gpu / install / version-
floor checks (vllm minimax_m2 needs 0.10.0+, deepseek_r1 needs 0.7.0
etc.) before the launch chain starts. Detect 'another model already
running on this host' and offer Stop & launch (with graceful then
force tmux kill helpers, port release wait). Per-vendor deep-link
buttons (vLLM recipe / SGLang cookbook) with hardware hash. Backend
picker is now a custom dropdown with accent-coloured logos for vLLM,
SGLang, llama.cpp, Ollama, Diffusers; same glyphs added next to
package names in Dependencies. Runtime-readiness note moved inside
the panel (green when ready, red when missing) with an × dismiss.
Esc collapses the expanded card; expanded card scrolls when it
overflows; Trust Remote / Auto Tool / Reasoning Parser / Enforce
Eager / Prefix Caching / Expert Parallel / Speculative / MoE Env on
one row (Reasoning Parser auto-detected per model family).
Dtype→Row 1, GPUs→Row 2 (rightmost). Removed redundant GPU 'auto'
input — command builders read from the GPU button strip. Default
cookbook open is Download tab.
- Cookbook hwfit: 'Model (latest)' / 'Model (oldest)' header sorts by
release_date; release dates can be backfilled with the new
scripts/backfill_model_release_dates.py and recipe metadata pulled
with scripts/import_from_vllm_recipes.py against the upstream
vllm-project/recipes catalog (vllm_recipe + min_vllm_version stamped
on entries).
- Calendar: Quick add hint cycles a random Odysseus-themed example per
open (wooden horse Friday, crew muster 10am daily, council on
Ithaca, …). Typing a time like '11pm' in the event title updates
the hero clock live.
- Doc editor: email-mode Reply button (sparkle icon, accent) opens the
same Fast/Full + context popover the email reader uses; Ctrl+Alt+M
toggles markdown preview.
- Memories panel: custom sort picker with per-option icons, default
'Latest', visible Enabled/Disabled toggle text matching the section
description style.
* Agent: make skill-prescribed tools actually callable
The skill index and matched-skill procedures are injected into the
prompt, but tool selection never followed: manage_skills wasn't in the
RAG-selected schema list (so the model substituted manage_memory), and
a matched skill could prescribe tools (grep, read_file) the model had
no schema for. Now:
- manage_skills rides along whenever the owner has any skills indexed
- a Jaccard-matched skill's requires_toolsets join the selection
- viewing a skill mid-turn via manage_skills unlocks its
requires_toolsets for subsequent rounds
- admin-intent turns send _ADMIN_TOOLS schemas, matching the prompt
text _build_base_prompt already advertises
- index_for(active_toolsets=None) no longer hides requires_toolsets
skills from callers that don't know the active set
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Agent: validate skill requires_toolsets against known tools, not TOOL_SECTIONS
grep/glob/ls ship as function schemas without a prompt-prose section,
so gating on TOOL_SECTIONS silently dropped them from a skill's
requires_toolsets.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
* fix(kimi): resolve Kimi Code API 403 errors and User-Agent restrictions
Kimi Code subscription keys require a whitelisted coding-agent User-Agent to avoid access_terminated_error 403s. This adds User-Agent probing and caching for Kimi Code endpoints.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(kimi): omit temperature for kimi-for-coding API calls
Kimi Code rejects any non-default temperature with HTTP 400, which broke deep research probes and low-temp LLM rounds.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(agent): don't let a materialized default budget defeat context scaling
#1230 scales agent_input_token_budget to the model's context window unless
the user explicitly set a budget, detected via is_setting_overridden(). But
the settings-save path materializes every DEFAULT_SETTINGS key into
settings.json (load_settings merges defaults; handlers persist the merged
dict), so the persisted default 6000 reads as "overridden" and the budget
code takes the min(6000, ctx) branch — silently re-capping long-context
models at 6000 for anyone who has ever saved a setting. This reintroduces
the exact regression #1170/#1230 set out to fix.
Add is_setting_customized() (saved value != default) and gate the scaling
on it instead of mere presence. A persisted default is not a user choice.
is_setting_overridden has exactly one consumer (this budget path), so the
change is contained. Tests cover the materialized-default regression, a
deliberately-chosen budget still being honoured, and the absent-key case.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent): rework context-budget fix per review (#4122)
Address RaresKeY's review:
P2 (explicitness): is_setting_customized treated a saved value equal to the
default as "not explicit", which ALSO blocked a user from deliberately pinning
the default budget. Reframe the default value itself as the AUTO sentinel —
agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context
window", any other value is an explicit cap. A materialized default still reads
as auto (fixing the original regression), and any non-default value the user
chooses is now honoured. Drop the now-unused is_setting_customized helper.
P2 (fallback context): auto-scaling trusted get_context_length() even when it
returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known
window), over-allocating on self-hosted/proxy setups. Add get_context_length_known()
(also returns whether the window was actually discovered); the budget block
passes 0 when unknown so auto-scaling stays conservative instead of inflating to
an unproven window.
hard_max stays auto-only — a deliberate explicit budget wins (#1190); kept that
contract and answered the reviewer's question rather than silently reversing it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(agent): lock the materialized-default budget regression (review on #4121)
Per WGlynn's review on the issue: add an end-to-end regression that saves an
UNRELATED setting (which makes the settings-save path materialize the budget
default into settings.json) and asserts the budget still auto-scales rather than
re-reading as an explicit 6000 cap — locking the exact reopening shut.
To make the test bite the production decision (not just re-derive it), extract
`budget_is_explicit()` into src/context_budget.py and use it from the agent loop.
It keys off value-vs-default (the default is the auto sentinel), NOT settings
presence — which is the whole point, since the save path materializes defaults.
Note: after this PR's rework, is_setting_overridden has ZERO production callers,
so the merged-dict materialization smell can't reach any setting through a
presence check today (WGlynn's durability concern).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent): bind the budget context window to its own provenance (review #4122)
RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop
kept only the `known` flag from get_context_length_known() and budgeted off the
passed-in `context_length`, which can come from a *different* lookup. Two failures:
- local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT
fallback while the fresh probe proves the real (smaller) served context — we'd
scale off the stale value;
- callers that don't pass context_length (scheduled tasks, teacher escalation,
skill test runs, bg_monitor) were capped at 6000 even when a long window is
discoverable.
Extract budget_context_for_model() which returns the freshly-probed window when
known else 0, binding the flag to the value it proves; the agent loop uses it.
Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(agent): fix stale budget comments + tighten to the contract (review #4122)
- settings.py: an explicit budget is clamped to the window only — hard_max is
auto-only (#1190); drop the incorrect "and to hard_max".
- is_setting_overridden docstring: drop the stale "adaptive budgets" example;
point value-sensitive callers at context_budget.budget_is_explicit.
- Tighten the budget-block comments to the contract (default = auto sentinel,
non-default = explicit cap, hard_max = auto-only ceiling).
Comment/docstring-only; no behaviour change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(agent): correct budget issue citations (#1190 → merged #1230/#1273)
The context-budget contract (auto-sentinel, explicit budgets honoured,
hard_max auto-only) merged via #1230 — #1190 was the earlier, closed,
superseded PR. Re-point the contract comments at #1230 (the live source,
already cited for the auto-sentinel two lines up in settings.py).
The configurable hard_max setting (`agent_input_token_hard_max`) was a
reviewer requirement first raised on #1190, omitted from the merged #1230,
and actually added in #1273 — credit #1273 for it and correct the test
comment's history (it previously implied this PR completed the requirement).
Comment/docstring-only; no behaviour change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(agent): workspace confinement via context-local binding + get_workspace tool
Bind the per-turn workspace once in execute_tool_block; the shared path
resolvers (_resolve_tool_path / _resolve_search_root) and the subprocess cwd
helper (agent_cwd) read it, so file tools + bash/python are confined centrally
and a new tool that uses the shared helpers cannot accidentally bypass it.
Adds the admin-gated /api/workspace/browse picker, a workspace pill + directory
modal (reusing existing modal/button CSS), the /workspace slash command, and a
get_workspace tool (replaces a system-prompt block). Confinement is OS-agnostic
(realpath/normcase/commonpath) and docker-safe (container paths, no host
assumptions). Reopens#2023.
* ux(workspace): clarify workspace is not a sandbox
Picker modal note + pill tooltip + get_workspace tool/output wording now state
plainly: read_file/write_file/edit_file/grep/glob/ls are confined to the folder,
but bash/python only start there (cwd) and are not sandboxed. Modal note reuses
the existing .muted class.
* fix(agent): treat an active workspace as file-work intent
A vague low-signal message (e.g. "look at the local project") matches no
domain keywords, so tool retrieval is skipped and only always-available tools
are offered — leaving the agent with no file access even though a workspace is
set. When a workspace is active, include the file/code tools (incl.
get_workspace) on low-signal turns so the agent can act on the folder.
Also requires the tool index (ChromaDB) to be reachable for normal retrieval;
that is an environment dependency, not part of this change.
* ux(workspace): hide pill + overflow entry in chat mode
Workspace only scopes the agent's file/shell tools, so the pill and the
overflow 'Workspace' entry are agent-only now — hidden in chat mode like the
bash toggle. Mode read from the DOM in syncWorkspaceIndicator; applyMode() is
called from the agent/chat setMode handler.
* prompt(tools): steer bash/python to defer to the dedicated file tools
bash/python schema descriptions (what native-tool-calling models read) were
bare and gave no steer, so models would do file ops via the shell (e.g. writing
SVG/HTML, which then dumps raw markup into the tool preview). Tell bash/python
in the schema + tool-index + prompt section to prefer read_file/write_file/
edit_file/grep/glob/ls and only be used for what those do not cover.
* prompt(tools): keep bash/python deferral generic (no hardcoded tool names)
Reference 'a dedicated tool' rather than listing read_file/write_file/grep/etc.
by name, so the guidance does not go stale if those tools are renamed.
* style(workspace): drop em-dashes from added code comments/strings
* ux(workspace): terser non-sandbox note in picker (no tool-name list)
* ux(workspace): mirror terse non-sandbox wording in pill tooltip
* chore: untrack local venv symlink (run-only, not part of the feature)
* prompt(workspace): keep get_workspace text generic (no hardcoded tool names)
* fix(agent): low-signal + workspace surfaces only read-only file tools
Intersect the files tool group with PLAN_MODE_READONLY_TOOLS so a vague message
in a workspace exposes read_file/grep/glob/ls/get_workspace for exploration, but
not write_file/edit_file/bash/python -- those wait for a request that actually
calls for them (RAG retrieval still adds them on a real ask).
* feat(workspace): cap browse listing at 500 dirs with a truncated hint
Mirror the filesystem_tools._CODENAV_MAX_HITS pattern with a module-local
_MAX_BROWSE_DIRS so a directory with thousands of children does not dump every
row into the picker; the response carries a truncated flag and the modal tells
the user to type a path to jump in.
* chore: untrack local venv symlink (run-only artifact)
* fix(workspace): vet the workspace root against the sensitive-path deny list at bind time
The in-workspace resolver deny-lists sensitive paths inside the workspace,
but the empty-path search root is the workspace itself, so a workspace of
~/.ssh could be listed via ls with no path. vet_workspace() (public, in
tool_execution next to the resolvers) rejects non-directories and sensitive
roots before the path is ever bound; chat_routes uses it instead of its
inline isdir check.
* fix(workspace): reject filesystem roots and stop showing rejected workspaces as active
Review findings from #3665:
P2: vet_workspace accepted / (and would accept drive/UNC roots), which makes
every absolute path 'inside' the workspace and collapses confinement into
host-wide file access. A root is its own dirname, so reject when
dirname(resolved) == resolved; the browse response now carries a selectable
flag and the picker disables 'Use this folder' on unselectable dirs.
P3: /workspace set stored any string client-side and the chat route silently
dropped rejected values, so the pill could claim a confinement that was not
in effect. New admin-gated /api/workspace/vet validates manual paths before
they persist (canonical path returned), and when a posted workspace is
rejected at send time the stream emits workspace_rejected so the client
clears the stored value and toasts instead of continuing silently.
* fix(workspace): check caller privilege before vetting the posted workspace
Review finding: /api/chat_stream called vet_workspace() on the posted value
for every caller and emitted workspace_rejected on failure, so a non-admin
who can chat but cannot use file/shell tools could distinguish existing
directories from missing/file/sensitive/root paths by whether the event
appeared. The resolution now lives in _resolve_request_workspace, which
drops the submitted value uniformly for non-admin callers, with no vetting
and no event, before the path ever touches the filesystem. Admin and
single-user behavior is unchanged. Test pins that valid and invalid paths
are indistinguishable for a non-admin and that vet_workspace is never
invoked for them.
Replace hardcoded [:2000] and [:4000] slicing with the shared _truncate
helper from tool_utils, which uses MAX_OUTPUT_CHARS and adds an explicit
truncation indicator when content is cut.
Scoped down from the original PR: only agent/tool-output display
behavior, no integrations.py changes.
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
Found the reason yesterday's tool-retrieval drop wasn't taking effect:
in _build_agent_prompt, when relevant_tools was provided, it computed
tool_names = set(ALWAYS_AVAILABLE) | set(relevant_tools)
which silently re-added every tool get_tools_for_query had just
deliberately discarded. So when a 'save this for <person>' query
dropped manage_memory from the retrieved set, the prompt builder put
it right back, and the model saw both tools again.
Trust the relevant_tools set. get_tools_for_query already starts from
ALWAYS_AVAILABLE — any discard there is intentional and should
propagate. Only force-include ask_user and update_plan here as belt-
and-suspenders since the agent loop relies on those for its own
control flow.
Other callers (task_scheduler) already union ALWAYS_AVAILABLE or
ASSISTANT_ALWAYS_AVAILABLE into relevant_tools before passing it in,
so they're unaffected.
Even with manage_contact in the retrieved tool set, models were still
defaulting to manage_memory when the user pasted an address + 'save for
<person>'. Both tools were in front of the model and it picked memory.
Tighten both descriptions to steer at decision-time:
- agent_loop.py manage_memory description: clarify scope is facts
about the USER, with an explicit 'DO NOT use for info about another
person' + a 'use manage_contact instead' line.
- tool_index.py manage_memory description: same in shorter form, so the
embedded retrieval signal is consistent with the prompt-time
description.
Closes the gap that pushed the agent into manage_memory when the user
pasted an address and said 'save this for X'. manage_contact now
accepts an optional address arg end-to-end:
- routes/contacts_routes.py:
- _normalize_contact carries an 'address' field
- _build_vcard emits ADR:;;<address>;;;; (street component of the
RFC-6350 7-part ADR), only when address is non-empty
- _parse_vcards reads ADR, joins non-empty components with ', '
- _create_contact and _update_contact thread address through;
update preserves existing address when caller passes empty
- src/tool_implementations.py do_manage_contact:
- add accepts address; require at least name+address or email
(was: email required) so address-only contacts are addable
- update accepts address; require name OR emails OR address
- src/tool_schemas.py: schema gets a single 'address' string field
- src/tool_index.py + src/agent_loop.py: descriptions get one
'address' arg mention and a 'use this for save-X-for-person /
address pastes / phone-with-name' steering line. Net: a few
bytes added, not a paragraph.
Also: removed a stray name from the schema's manage_contact example
strings ('save Jonathan's email…') — no real names in the codebase.
Closes the auto-send hole that let earlier models invent signatures
(e.g. signing 'David' for a user named Felix) and SMTP them to real
recipients before the user could review.
New setting: agent_email_confirm (default True).
When on, the MCP send_email and reply_to_email tools no longer SMTP
directly — they write the composed email to scheduled_emails with a new
status 'agent_draft' (far-future send_at so the scheduled-send poller
ignores them) and return a {pending: true, pending_id, to, subject,
body, message: ...} payload. The model surfaces that to the user.
Backend endpoints to approve / cancel:
- GET /api/email/pending → list staged drafts for the owner
- POST /api/email/pending/{id}/approve → flip status to 'pending' +
backdate send_at so the
existing scheduled-send
poller delivers immediately
- DELETE /api/email/pending/{id} → status = 'cancelled'
UI:
- Settings / AI Defaults gets a new 'Email Safety' card with the
toggle, default on.
- Tool descriptions for send_email and reply_to_email now include the
pending behavior + an explicit 'DO NOT invent a signature, do not
type a person's name' guardrail.
Pass 2 (next): inline chat card with Send / Discard buttons so the user
doesn't have to type a confirmation reply. Today's prompt + the listing
endpoint give the model a clean path to surface drafts.
Two months of iteration on the Settings panel, integration forms, and
small visual nudges across the app. Highlights:
Settings restructure
- Add Models: split into separate Local + API cards (no more in-card
tabs); each fuses Type/Provider with the URL input.
- Added Models: new dedicated sidebar tab, with Probe + Clear-offline
pulled into its header; Local/API sub-section icons accent-tinted.
- Search: Web Search and a new Deep Research card (Model + tuning),
with a cross-link to AI Defaults. Provider hints use real clickable
anchors; Web Search Test button shows a whirlpool spinner.
- AI Defaults: Image Generation card returns; Research Model card
carries only Endpoint+Model with a cross-link to Search; Vision /
Default / Utility fallbacks unified under one numbered-row design
matching Search's chain.
- API Permissions (was 'API Tokens'): per-row rename, inline
Permissions toggle that expands the scope-edit panel, in-field
copy icons (icon→check on success). Empty state accent-tinted.
- Integrations: + Add Integration drops a type-picker menu directly
under the button (drop-up on tight viewports); each integration
form (API, CalDAV, CardDAV, Email, Codex/Claude, Vault, MCP) uses
the same accent-outlined Save/Test/Cancel buttons right-aligned.
- Danger Zone: Wipe→Delete with trash icons; new 'Delete everything'
row at the bottom that loops every category.
AI Synthesis (Reminders)
- Persona dropdown sourced from PROMPT_TEMPLATES + custom preset.
- src/reminder_personas.py mirrors the five built-ins for the
server-side synthesis path.
- dispatch_reminder() reads reminder_llm_persona and uses the
persona's system prompt; empty/unknown falls back to warm-neutral.
Esc handling
- Kebab menus and the provider picker intercept Esc in capture phase
so dismissing a popup no longer closes the whole Settings modal.
Accent tinting
- Scoped CSS rule across data-settings-panel=ai/services/added-models/
search/integrations/reminders for card h2 icons + the Added Models
sub-section icons.
Codex/Claude integration form
- No more auto-creation on form open — explicit Create token button.
- New tokens start with every scope granted; existing tokens move out
of the integration form into the API Permissions card.
- Setup reveal: copy buttons inline inside the token + setup code
blocks; shorter subtitle wording.
Misc visual polish
- Save/Test/Cancel uniformly accent-outlined and right-aligned on
every integration form.
- Provider logos render inline next to the search fallback selects
and the Deep Research Search dropdown.
- Trash icons in fallback rows bumped to 20x20 so they fill the 32px
button.
- Image generation default flipped to off.
* fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache
Fixes#2927. As diagnosed in the issue, three things in Odysseus's request
pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache
continuity, forcing a full prompt re-evaluation (15-30s+) on every turn:
1. Dynamic content folded into the system prompt every turn. Both the chat
preface (ChatProcessor.build_context_preface) and the agent system prompt
(_build_system_prompt) injected current_datetime_prompt() — text that
changes every minute — directly into system-role messages, which llm_core
then concatenates into the single system message sent as the cached
prefix. Any byte difference there invalidates the entire cache. Moved this
to a new current_datetime_context_message() helper that returns a
standalone user-role message, inserted near the end of the array (right
before the latest user turn) instead of mixed into the system prompt. The
static system prefix (preset prompt + safety policy + agent base prompt)
now stays byte-identical across turns of the same session.
2. Memory/skill extraction side-requests competed with the main completion.
run_post_response_tasks fired extract_and_store / maybe_extract_skill via
asyncio.create_task — fire-and-forget coroutines that could overlap the
next turn's main request and steal llama.cpp's limited processing slots,
evicting the cached checkpoint. They're now queued through a new
_run_extraction_jobs_sequentially helper that waits for the session's
stream to go idle and runs the jobs strictly one at a time.
3. No stable session identifier was sent to local backends, so llama.cpp
assigned a new processing slot via LRU every turn ("session_id=<empty>
server-selected (LCP/LRU)"), losing slot affinity. Added
_apply_local_cache_affinity() in llm_core, which sets session_id and
cache_prompt: true on outgoing payloads — gated to self-hosted
OpenAI-compatible endpoints only (never api.openai.com or other cloud
providers, which reject unrecognized request fields with a 400). Threaded
session_id through stream_llm / llm_call_async / stream_agent_loop from
the existing Odysseus session id.
Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload-
assembly and scheduling code paths: byte-identical system prefix across two
turns of the same session (with a regression check that genuinely changed
instructions DO still change it), the dynamic time block landing as a
user-role message, extraction jobs waiting for the stream to go idle and
running sequentially, and the outgoing payload carrying a stable session_id
(same across turns of one session, different across sessions) only for
self-hosted endpoints. Updated tests/test_user_time.py for the new message
placement.
* fix(tests): accept owner= kwarg in normalize_model_id monkeypatch
The upstream normalize_model_id signature now takes an owner= keyword
argument, and chat_helpers.py passes owner=getattr(sess, "owner", None)
at the call site. Update the test stub lambda to **kwargs so it handles
the new argument without breaking, and update chat_helpers.py to forward
the owner parameter consistently.
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
Commit e6b1009 removed the workspace feature's entry point (deleted
routes/workspace_routes.py + static/js/workspace.js and dropped the
workspace-param parsing in chat_routes), but left the downstream backend
plumbing dangling: chat_routes passed a hardcoded workspace=None into
stream_agent_loop, which forwarded it to execute_tool_block, so the
workspace value was permanently None and every workspace-gated branch
was unreachable.
Remove the now-dead code (no behavior change, since workspace was always
None):
- src/tool_execution.py: drop _resolve_tool_path_in_workspace and the
workspace params/branches on execute_tool_block, _direct_fallback,
_call_mcp_tool, _do_edit_file, and _resolve_search_root; restore the
bash/python/bg cwd to _AGENT_WORKDIR.
- src/agent_loop.py: drop the workspace param on stream_agent_loop, the
dead 'ACTIVE WORKSPACE' system-prompt block, and the workspace forward.
- routes/chat_routes.py: drop the hardcoded workspace=None arg and var.
- tests: delete test_workspace_confine.py (tested the removed feature) and
the workspace assertion in test_tool_policy.py.
Full suite: 2903 passed, 1 skipped.
* refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils
Move all copies of _truncate(), get_mcp_manager(), and set_mcp_manager()
into a single leaf module (src/tool_utils.py) that imports only from
src.constants. This eliminates the lazy-import hack
('from src import agent_tools' inside function bodies) in tool_execution.py
and tool_implementations.py, and fixes a latent bug: the _truncate copy in
tool_execution.py was missing the isinstance guard and would crash on None.
Also deletes mcp_servers/_common.py — it was dead code with zero callers
anywhere in the codebase, containing its own copy of truncate() and
constants that already exist in src/constants.py.
* fix(tools): route remaining get_mcp_manager imports to src.tool_utils
The maintainer's feedback flagged src/task_scheduler.py:1857 and
routes/task_routes.py:977. A project-wide search found a third call site
in src/agent_loop.py that also imported get_mcp_manager from
src.agent_tools instead of src.tool_utils.
All three are now sourced from the canonical location in src.tool_utils.
---------
Co-authored-by: mcnoliveira <mcnoliveira@gmail.com>
* fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models
_resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block
parser whenever a model produced no native tool_calls, regardless of whether
that model has a reliable native function-calling channel. Native models
(GPT/Claude/Grok/Qwen3/DeepSeek-V, etc. - _is_api_model true) commonly write
illustrative ```bash/```python/```json examples in guide-only prose; the
fallback parser matched these and executed them as real commands, sometimes
looping for several rounds as the model tried to clarify with more examples
(#3222).
Restrict the textual fenced-block fallback to non-native models, which rely
on it as their only tool-invocation channel. Native models are trusted to use
their structured tool_calls channel for real invocations; when they don't
emit one, a bare fence in their response is prose, not an action. The native
tool_calls path itself is untouched.
This sits one layer below #3088's guide-only policy enforcement: that PR
blocks tool exposure/execution on explicit no-tools requests, while this fixes
the parser so ordinary illustrative fences are never misread as calls in the
first place, on any turn.
* fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence
_resolve_tool_blocks previously short-circuited the entire textual parser
(tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native
function-calling models with no native tool_calls. That also dropped Patterns
2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content
as text), which are real calls a model couldn't emit on its structured channel
(e.g. DeepSeek-V falling back to DSML), not illustrative examples.
parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY
Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks
passes skip_fenced=is_api_model so fenced examples stop being executed for
native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and
recoverable. cleaned_round mirrors the same gate when persisting round text, so
an illustrative fence that wasn't executed isn't stripped from saved/reloaded
history either (it was streaming once and then disappearing on reload).
The loop-breaker's runaway backstop counted per-tool-type call totals and
tripped whenever any tool was used >=15 times — treating 15+ DISTINCT calls
to one tool as a stuck loop. A real batch (e.g. "add these 18 birthdays to my
calendar" emits 18 distinct manage_calendar create_event calls in one round)
got flagged "calling manage_calendar over and over", the calls were discarded
(next round tools_sent=0), and 0 events were created.
Count IDENTICAL repeated call signatures instead (same tool AND args), via a
small, unit-testable _detect_runaway_call() helper. Genuine batches pass; a
model truly stuck repeating one call still trips the backstop. Adds a
regression test.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat: Add plan mode to the chat agent
Adds a plan mode: the agent investigates read-only, proposes a checklist, and
waits for approval before changing anything. On approval it runs with full
tools and checks items off as it goes. Enforcement reuses the existing
disabled_tools gate.
Includes a slash command: `/plan [on|off]` (and `/toggle plan`) to flip the
plan toggle from the chat input.
- src/tool_security.py, src/mcp_manager.py: read-only allowlist (tools + MCP).
- src/agent_loop.py, routes/chat_routes.py: union the disabled set, prepend the
plan directive, force agent mode.
- static/: plan toggle pill, Approve & Run, dockable plan window, task-list
checkboxes, and the /plan slash command.
- tests/test_plan_mode.py.
* Plan mode: persistent re-referenceable plan + agent write-back
Three improvements so a long plan survives a weak model and stays in reach:
1. Re-reference the plan (out-of-context fix). On the execution turn the frontend
sends the approved checklist back (`approved_plan`); the backend pins it as a
top-of-context `## ACTIVE PLAN` system note (kept by the context trimmer), so
the agent can always re-read the plan instead of losing the thread on a long
run. New `build_active_plan_note()` (unit-tested).
2. Re-open / dock the plan anytime. The plan checklist is stored per-session
(localStorage). When a plan exists, the plan-mode button opens a small menu
("Show plan" / "Plan mode: On/Off") that re-opens the side-dockable plan
window — so it can stay docked while the agent works. The window live-refreshes
as the plan changes.
3. Agent write-back: new `update_plan` tool. The agent calls it to tick steps
`- [x]` after finishing them, or to revise steps when the user asks. Marker
tool (no I/O) → `plan_update` SSE event → the stored plan + docked window
update live. The ACTIVE PLAN note instructs the agent to use it.
Backend: src/agent_loop.py (param + pin + note builder + emit + prompt blurb),
src/tool_execution.py (update_plan handler), routes/chat_routes.py (parse
`approved_plan`, relay `plan_update`), registration in tool_schemas / agent_tools
/ tool_index (always-available, not admin-gated).
Frontend: static/js/chat.js (plan store, send `approved_plan`, handle
`plan_update`, capture restated checklists), static/app.js (plan-button menu),
static/js/planWindow.js (`isPlanWindowOpen`), static/js/storage.js (PLAN key).
Tests: tests/test_plan_mode.py (plan-note), tests/test_update_plan_tool.py.
* Plan mode: drop bash/python, rely on read-only discovery tools
Shell can mutate (write files, hit the network) and can't be constrained to
read-only at the tool layer, so plan mode no longer relies on a prompt to keep
it well-behaved — bash/python are removed from the read-only allowlist and added
to the fail-closed block set. Discovery is covered by the dedicated read-only
tools (read_file, grep, glob, ls) instead.
Rewrites the plan-mode directive to state shell is disabled and lists the
available read-only tools positively. Addresses review feedback on #638.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Comment: note _MCP_READONLY_VERBS are prefixes not whole words
Clarifies that entries like "summar" are intentional stems matched via
startswith (covers summarise/summarize/summary), not typos. Addresses review
feedback on #638.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Plan mode: clarify why gating inverts the allowlist into a denylist
Rename _PLAN_MODE_FALLBACK_BLOCK -> _PLAN_MODE_KNOWN_MUTATORS and rewrite the
comments. The tool gate is a denylist (disabled_tools); plan mode's policy is an
allowlist, so it returns the inverse (all known tool names minus the allowlist).
The static mutator set is a backstop for the schema-derived name list, which
misses XML-only tools and can fail to import. Addresses review feedback on #638.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Plan mode: stop hardcoding the read-only tool list in the directive
The model is already shown its available (read-only) tools by _assemble_prompt,
which removes every disabled tool. Enumerating them again in the directive only
duplicated that list and would drift as tools change. Point at the tools listed
below instead. Addresses review feedback on #638.
Let the agent pause and ask the user a multiple-choice question when a
task is genuinely ambiguous and the answer changes what it does next —
choosing between approaches, confirming an assumption, picking a target —
instead of guessing.
Modeled on the existing `ui_control` marker pattern: the `ask_user` tool
returns an `ask_user` payload that the agent loop emits as an SSE event
and then ends the turn. The frontend renders the question with clickable
option buttons, a free-text "Other" input, and an x to dismiss; the user's
choice is sent as the next message and the agent resumes with it in
context.
- src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O.
Validates a non-empty question + 2..6 options, normalizes string/object
options, returns the payload.
- src/agent_loop.py: emit the `ask_user` event and break the round loop so
the turn ends and waits for the user's selection. Stream the question as
assistant text so it persists/replays (prevents a re-ask loop).
- Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS,
FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any
user can be asked); the structured args serialize via the default
json.dumps path.
- routes/chat_routes.py: relay the `ask_user` event to the client.
- static/js/chat.js + static/style.css: render the question card (options +
free-text Other + dismiss x; removed once answered). Reuses CSS vars and
the .modal-close button; emoji go through the monochrome-SVG pipeline.
Bump chat.js cache pin.
- tests/test_ask_user_tool.py: payload, multi flag, string options, option
cap, validation errors, serializer round-trip, registration.
- Calendar overnight events render proportionally across day boundaries
via --start-frac / --end-frac CSS vars instead of bleeding as full-day
on day 2.
- Recurring-event delete strips the master uid + all master::* sibling
instances optimistically so the row clears immediately instead of
waiting for the next sync re-render.
- manage_notes(create) now returns note_id + open_url, and agent_loop
appends a markdown [View note](#note-<id>) link mirroring the
deep-research pattern.
- chatRenderer's hash-link router (already wired for #note-id) reaches
the new notes.openNote(id) helper, which force-closes/reopens the
Notes panel, polls for the target card, and runs a brief outline
flash so the user can locate it on long lists.
* feat: Add workspace: confine agent tools to a folder
Pick a server folder as the agent's workspace so its file/shell tools work
there and don't touch files outside it. File tools are hard-confined; bash/
python run with cwd set to the folder.
Includes a slash command: `/workspace` (alias `/ws`) — show / `set <path>` /
`clear` / `pick` (open the directory browser).
- routes/workspace_routes.py: GET /api/workspace/browse (admin-only).
- src/tool_execution.py: hard path confinement for read_file/write_file;
bash/python cwd. Threaded route → stream_agent_loop → execute_tool_block.
- src/agent_loop.py: workspace note prepended to the system prompt.
- static/: overflow menu item, input-bar pill, directory-browser modal, and
the /workspace slash command.
- tests/test_workspace_confine.py.
* Wire workspace confinement into tools that landed after this PR
edit_file (#1239) and grep/glob/ls (#1670) merged after workspace-confine was
written, so they bypassed the workspace boundary. Thread the workspace through:
- edit_file: _do_edit_file resolves via _resolve_tool_path_in_workspace
- grep/glob/ls: _resolve_search_root confines to the workspace (root + paths)
- bash/python/bg cwd: workspace or _AGENT_WORKDIR (keep the #2586 data-dir
default when no workspace is set)
Tests cover edit_file + grep/ls confinement (inside ok, outside rejected).
* Workspace picker: editable path bar + modal style cohesion + cross-platform hardening
- Make the current-folder strip an editable address bar: type/paste a full
path and press Enter to navigate (also reaches other Windows drives and
hidden dirs the up-only browser cannot).
- Reuse shared modal CSS: drop bespoke .workspace-modal-content/.workspace-btn*
in favour of base .modal-content/.modal-body and the .confirm-btn button
family; separators/hover use var(--border). Net -31 CSS lines.
- Fix the path field overflowing the modal right edge (flex stretch + margin
vs an overflow:auto scrollbar-feedback loop): full-bleed, no h-margin.
- Cross-platform confinement: normcase the workspace commonpath check so
containment holds on case-insensitive filesystems (Windows/macOS).
- Make tests OS-portable: sibling temp dirs instead of /etc, python os.getcwd()
instead of pwd. 5 pass.
* feat: round-limit handling — Continue affordance at the cap + configurable cap
When the agent loop runs out of rounds (per-message step cap, default 20)
while still actively using tools, it stopped silently mid-task. Now:
1. The loop emits a `rounds_exhausted` SSE event at the cap, and the UI shows
a "Continue" pill at the bottom of the chat that resumes the task from where
it left off. Repeated cap-hits each get a fresh Continue (multiple continues
in a row).
2. The cap is configurable in Settings → Agent ("Max steps per message"),
validated on the client, at the save endpoint, and at the read site.
- src/agent_loop.py: track `_exhausted_rounds` (set only when a full
tool-executing round completes on the last allowed round — i.e. the agent
wanted to keep going); emit `{"type":"rounds_exhausted","rounds":N}` (logged).
- routes/chat_routes.py: read `agent_max_rounds` (clamped 1..200), pass as
`max_rounds`; forward the new event through the SSE relay.
- routes/auth_routes.py: validate numeric settings on save (int + clamp;
agent_max_rounds 1..200, agent_max_tool_calls 0..1000; 400 on non-int).
- src/settings.py: default `agent_max_rounds = 20`.
- static/: Settings input + client-side clamp; the Continue pill (reuses the
existing .stopped-indicator / .continue-btn classes and theme vars
--border/--fg/--bg/--accent); appended to the chat container so it survives
the message re-render at stream finalize. chat.js cache version bumped.
* test: cover rounds_exhausted emission (cap-hit vs normal finish)
Drives the real stream_agent_loop with mocked LLM stream / tool exec / settings:
a tool block every round exhausts the cap and must emit rounds_exhausted; a
plain answer hits the done-break and must not. Guards the for/else logic.
* feat(provider): add GitHub Copilot provider with device-flow auth
Adds GitHub Copilot as a model provider, so Copilot models (gpt-4o/4.1/5,
Claude, Gemini, …) work through the normal chat + agent loop, incl. native
tool calling and vision.
Auth is one-click via the GitHub OAuth device flow; the access token is stored
as the endpoint's (encrypted) api_key and sent directly as `Authorization:
Bearer` (no Copilot-token exchange, no refresh — matching how editors talk to
the Copilot API). Copilot is a normal ModelEndpoint detected by host; the only
provider-specific behaviour is a small set of required request headers,
injected centrally.
Sign-in is available from Settings → model endpoints ("Connect GitHub
Copilot") and from chat via `/setup copilot`.
- src/copilot.py (new), routes/copilot_routes.py (new): constants, header
builders, device-flow start/poll, model discovery, owner-scoped endpoint
provisioning.
- src/llm_core.py, src/endpoint_resolver.py: detect `copilot`, inject headers,
per-request x-initiator/vision.
- src/agent_loop.py: allowlist api.githubcopilot.com for native tool schemas.
- src/model_context.py: known context windows for Copilot (no unauthenticated
/models probe).
- static/, README, tests/test_copilot*.py.
* Tidy copilot_routes: clarify supports_tools, note _PENDING is per-process
* Add edit_file tool + file-change diffs
edit_file is an exact old_string -> new_string replacement on a file on disk
(fails if old_string is missing or non-unique unless replace_all); write_file
also returns a unified diff. Diffs render collapsed in the tool bubble
(filename + +adds/-dels, theme colors); the raw JSON command box is hidden.
Security: edit_file is a sensitive filesystem-write tool, treated everywhere
write_file is —
- added to NON_ADMIN_BLOCKED_TOOLS (is_public_blocked_tool / blocked_tools_for_owner),
so on auth-enabled deployments a non-admin cannot run it; execute_tool_block
refuses it for non-admin owners.
- confined by the same path policy as read_file/write_file (allowlist +
sensitive-file deny) via _resolve_tool_path.
Disambiguation in tool descriptions + bash prompt: edit_file/write_file are the
only way to write files (they show a diff) — never edit_document (editor panel)
or a bash heredoc/redirect.
Tests (tests/test_edit_file.py): non-admin block (policy + execution gate),
successful edit, not-found old_string, non-unique old_string (+ replace_all),
and path outside the allowed roots.
Files: src/tool_execution.py, src/agent_loop.py, src/tool_schemas.py,
src/agent_tools.py, src/tool_index.py, static/js/chat.js, static/style.css,
tests/test_edit_file.py.
* Drop redundant import os in write_file closure
os is already imported at module top.
Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner:
* Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file.
* `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400.
* `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt).
Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-*` / `cookbook-*` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll.
`serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes.
Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop.
Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles.
`adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd.
Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).
get_builtin_overrides() was swallowing all exceptions with a bare
`except Exception: pass`, so misconfigured tool-description overrides
would silently produce wrong agent behaviour with no log trace.
The background endpoint refresh loop had the same pattern: any probe
failure was silently ignored, giving operators no signal that the
refresh was broken.
Also removes a circular self-import (`from src.agent_loop import
_build_base_prompt`) inside _build_system_prompt; the function is
already in scope and the import created a latent circular reference risk.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Agent mode treated local /v1 endpoints, including Ollama on :11434, as native-tool-capable by host/model heuristics. On Ollama's OpenAI-compatible surface some models that advertise tool support stop after a single token when schemas are sent (issue #1567). Default local Ollama /v1 back to fenced tool blocks unless the endpoint explicitly has supports_tools=True.
Also compare both the runtime chat URL and the normalized endpoint base when reading ModelEndpoint.supports_tools. That keeps a saved base URL such as http://localhost:11434/v1 effective when the active session URL is /v1/chat/completions.
Tests: .venv/bin/python -m pytest tests/test_tool_support_heuristic.py
Models like gemma4, qwen3.5, and ministral served via Ollama's native
/api/chat respond to OpenAI-style tool schemas by emitting a single
native tool_call chunk and then stopping. The agent loop receives
1 token of round_response and no recognised ToolBlock, so the round
ends immediately — the user sees a one-token response.
Root cause: _is_api_model was True for any endpoint whose host appears
in _API_HOSTS (which includes "host.docker.internal" and "localhost")
OR whose model name matches a keyword like "gemma". Native Ollama
endpoints were never excluded from this path.
Fix: import _is_ollama_native_url from llm_core and treat native Ollama
endpoints (/api/chat, port 11434) as text-only by default — falling back
to the fenced-block tool path the local models are tuned for. The
per-endpoint supports_tools=True toggle (Settings → Endpoints) still
overrides this for users who have explicitly opted in.
Fixes#1567
After a deep-research job completes, a follow-up like "check it out" / "read
that report" had the agent web_fetch the /api/research/report/{id} HTML render
(and then drift into unrelated searches) instead of reading the saved report
(issue #1363). The report text is already available via the manage_research
tool (action read), and action list returns ids most-recent-first, so the
agent can resolve "the recent report" itself.
Strengthen the manage_research instructions: read a finished report via
action list -> action read; do NOT web_fetch/app_api the report URL (it renders
HTML, not clean text) and do NOT start a fresh web_search just to read an
existing report. Annotate the app_api endpoint list to say the same.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat: document rrule in the manage_calendar tool schema (#1320)
The create_event handler already persists `rrule` (a single event carrying an
iCalendar RRULE), but the manage_calendar tool schema didn't list it, so the
agent had no documented way to make a recurring event and took a roundabout
path. Add `rrule?` to the create_event field list with examples
(FREQ=WEEKLY;BYDAY=MO etc.) and an explicit note to create ONE event with the
rule rather than looping.
Covered by tests/test_calendar_rrule.py: do_manage_calendar create_event with an
rrule stores one event with that recurrence; without it, the event is single.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test: restore SessionLocal via monkeypatch in #1320 rrule test (review)
Per review: the test patched core.database.SessionLocal at module import and
never restored it, which could leak the temp DB into later tests in the same
process. Move the patch into an autouse monkeypatch fixture so it is restored
after each test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>