The STOP_PROMPT did not include the target round count, so the LLM
could decide to stop after 2-3 rounds even when the user requested 8.
Additionally, min_rounds was capped at 3 regardless of max_rounds.
- Add max_rounds to STOP_PROMPT so the LLM knows the target
- Change min_rounds from min(3, max_rounds) to max(2, max_rounds - 2)
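A minimal sketch of the two changes, assuming `STOP_PROMPT` is a format string. Its wording and the helper names below are hypothetical; the commit does not show them.

```python
# Hypothetical wording: the commit only says max_rounds is now part of STOP_PROMPT.
STOP_PROMPT = (
    "You are on round {round_no} of {max_rounds} requested rounds. "
    "Only stop early if the task is fully complete."
)


def old_min_rounds(max_rounds: int) -> int:
    # Previous floor: never more than 3, whatever the user asked for.
    return min(3, max_rounds)


def new_min_rounds(max_rounds: int) -> int:
    # New floor: follows the request with two rounds of slack, never below 2.
    return max(2, max_rounds - 2)


def build_stop_prompt(round_no: int, max_rounds: int) -> str:
    # Tell the model the target so it does not settle for 2-3 rounds.
    return STOP_PROMPT.format(round_no=round_no, max_rounds=max_rounds)


# With max_rounds=8 the floor moves from 3 to 6.
print(old_min_rounds(8), new_min_rounds(8))
```

For max_rounds of 1 the new formula yields 2, so the round loop presumably still caps at max_rounds.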
Fixes #2863
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
* feat(calendar): support multiple CalDAV accounts
Replaces the single CalDAV credential slot with a named account list so
users can sync both a personal and work calendar simultaneously.
- Add `account_id` column to `CalendarCal` + startup migration
- `_load_caldav_accounts()` in caldav_sync.py reads `caldav_accounts`
list from prefs, auto-migrating the legacy single `caldav` key on
first use (no user action required)
- `sync_caldav()` iterates all accounts and aggregates counts/errors
- `writeback_event()` resolves credentials via `CalendarCal.account_id`,
falling back to the first account for legacy rows
- New REST endpoints: GET/POST/PUT/DELETE `/api/calendar/config/accounts`
- Legacy GET/POST `/api/calendar/config` preserved for backward compat
- Settings UI: one card per account with Label, URL, Username, Password
fields; Test button works for both unsaved (inline creds) and saved
(by account_id) accounts; delete removes only that account
- Update test_caldav_url_hardening.py mock to include `_save_for_user`
and updated `_sync_blocking` signature
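A minimal sketch of the legacy-key migration and per-account aggregation, assuming prefs is a plain dict. The account field names (`id`, `label`), the default label, and both function names are hypothetical stand-ins for `_load_caldav_accounts()` and `sync_caldav()`.

```python
import uuid
from typing import Callable


def load_caldav_accounts(prefs: dict) -> list:
    """Return the named account list, migrating a legacy single `caldav` entry."""
    accounts = prefs.get("caldav_accounts")
    if accounts:
        return accounts
    legacy = prefs.get("caldav")
    if not legacy:
        return []
    # Wrap the old credentials in a one-element list with a generated id.
    # The legacy key is left in place, so a failed save can be retried later.
    migrated = [{"id": uuid.uuid4().hex, "label": "CalDAV", **legacy}]
    prefs["caldav_accounts"] = migrated  # caller persists prefs (hypothetical)
    return migrated


def sync_all(accounts: list, sync_one: Callable[[dict], int]) -> tuple:
    """Sync every account; one failing account does not stop the others."""
    total, errors = 0, []
    for account in accounts:
        try:
            total += sync_one(account)
        except Exception as exc:  # collect the error and keep going
            errors.append(f"{account.get('label', account['id'])}: {exc}")
    return total, errors
```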
* fix(calendar): restore #2765 PK scoping and #2819 writeback URL validation
Two regressions introduced by the multi-account refactor:
1. PK collision (#2765): _stable_cal_id was back to hashing only the URL,
so two users — or one user with two accounts on the same server — would
collide on the primary key. Restore owner+account_id in the hash key
(format: "{owner}\n{account_id}\n{url}") and thread both values through
_sync_blocking → _writeback_blocking → push_event → find_remote_calendar
so the hash round-trips correctly on write-back.
2. URL validation dropped (#2819): _load_caldav_accounts imported
_save_for_user at function scope, causing an ImportError on test mocks
that only provide _load_for_user, which prevented writeback_event from
reaching the validate_caldav_url call. Move the import inside the
migration branch and wrap in try/except (best-effort save; next call
re-migrates from the still-present legacy key).
Update fake_writeback_blocking in test_caldav_writeback.py to accept the
new owner/account_id optional params.
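A minimal sketch of the restored key format, assuming SHA-256 as the hash; the commit names the key layout but not the hash function, and `stable_cal_id` here is a stand-in for `_stable_cal_id`.

```python
import hashlib


def stable_cal_id(url: str, owner: str = "", account_id: str = "") -> str:
    """Derive a deterministic calendar PK scoped to (owner, account, URL)."""
    # Newline separators follow the "{owner}\n{account_id}\n{url}" format above.
    key = f"{owner}\n{account_id}\n{url}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Because the ID is recomputed on write-back, every caller on that path has to pass the same owner and account_id as the sync did; otherwise the recomputed ID points at no row, which is why both values are threaded through to `find_remote_calendar`.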
* feat(skills): import SKILL.md bundles from public GitHub URLs
Supports GitHub tree/blob/raw links and skills.sh pages that resolve to GitHub.
Installs SKILL.md plus sibling text assets under data/skills/imported/.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): admin-gate URL import and validate redirect hosts
- require_admin on POST /api/skills/import-from-url (matches other skill admin routes)
- reject cross-host redirects after httpx follow_redirects
- test for redirect host validation
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): match Brain Add panel import/submit button styles
- Skill URL Import: theme-io-btn + download icon (same as memory Import)
- Add Skill submit: confirm-btn confirm-btn-primary
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): allow api.github.com during directory import
Real imports hit the GitHub contents API after redirects; whitelist
api.github.com and add regression tests. Keep the Import button at its natural width with flex: none.
Co-authored-by: Cursor <cursoragent@cursor.com>
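A minimal sketch of the post-redirect host check. The allowlist is an assumption built from the hosts these commits mention; the helper name and error class shape are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: github.com plus the hosts named in the commits.
ALLOWED_IMPORT_HOSTS = frozenset(
    {"github.com", "raw.githubusercontent.com", "api.github.com"}
)


class SkillImportError(Exception):
    """Raised when a skill URL import must be refused."""


def check_final_host(final_url: str) -> str:
    """Reject a response whose post-redirect URL left the allowed hosts."""
    # urlsplit(...).hostname is already lowercased and strips any port.
    host = urlsplit(final_url).hostname or ""
    if host not in ALLOWED_IMPORT_HOSTS:
        raise SkillImportError(f"redirected to disallowed host: {host or '<none>'}")
    return host
```

With httpx this would run as `check_final_host(str(resp.url))` after `httpx.get(url, follow_redirects=True)`. Checking only the final URL still lets intermediate hops touch other hosts; a stricter variant would also check each entry in `resp.history`.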
* fix(skills): align skill Import button with URL input row
Match memory-add-input height (28px) in memory-add-row and center the
download icon with flexbox instead of vertical-align hacks.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): cancel modal-body margin on skill Import button
The skill Import button sits in .memory-add-row beside an input; the
global .modal-body button { margin-top: 6px } rule applies to buttons but not
inputs, so it pushed Import below the input line and misaligned the download
icon. Reset margin-top and match the Memory Import SVG markup at 28px row height.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): surface GitHub API errors on URL import
Pass through GitHub response messages (especially 403 rate limits) as
SkillImportError instead of a generic download failure.
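A minimal sketch of passing GitHub's error text through, assuming the response body is GitHub's usual JSON object with a `message` field. The function name and exact wording are hypothetical.

```python
import json


class SkillImportError(Exception):
    """User-facing import failure (hypothetical shape)."""


def raise_for_github_error(status: int, body: str) -> None:
    """Turn a GitHub error response into a SkillImportError carrying its message."""
    if status < 400:
        return
    try:
        message = str(json.loads(body).get("message") or "")
    except (ValueError, AttributeError):  # not JSON, or JSON that is not an object
        message = ""
    if status == 403 and "rate limit" in message.lower():
        raise SkillImportError(f"GitHub API rate limit hit: {message}")
    raise SkillImportError(f"GitHub returned HTTP {status}: {message or 'no details'}")
```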
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* Make edge-docked windows resizable
Add draggable resize seams for left and right docked windows.
Keep the main chat area from getting too narrow and remember each window's dock width.
* Show emoji shortcodes as icons by default
Keep text-only emoji mode opt-in so model output like 😊 goes through the normal emoji renderer.
* Fix dock resize seams and left dock layout
Hide the resize seam when another floating modal is open, and keep the left-docked window from covering the chat area.
* Keep narrow modal tabs usable
* Fix split layout with both edge docks
* Fix left snap after right dock
* Enable left edge snap for all windows
* Tighten dock resize handle observers
* Use edge docking for settings window
* feat: Add plan mode to the chat agent
Adds a plan mode: the agent investigates read-only, proposes a checklist, and
waits for approval before changing anything. On approval it runs with full
tools and checks items off as it goes. Enforcement reuses the existing
disabled_tools gate.
Includes a slash command: `/plan [on|off]` (and `/toggle plan`) to flip the
plan toggle from the chat input.
- src/tool_security.py, src/mcp_manager.py: read-only allowlist (tools + MCP).
- src/agent_loop.py, routes/chat_routes.py: union the disabled set, prepend the
plan directive, force agent mode.
- static/: plan toggle pill, Approve & Run, dockable plan window, task-list
checkboxes, and the /plan slash command.
- tests/test_plan_mode.py.
* Plan mode: persistent re-referenceable plan + agent write-back
Three improvements so a long plan survives a weak model and stays in reach:
1. Re-reference the plan (out-of-context fix). On the execution turn the frontend
sends the approved checklist back (`approved_plan`); the backend pins it as a
top-of-context `## ACTIVE PLAN` system note (kept by the context trimmer), so
the agent can always re-read the plan instead of losing the thread on a long
run. New `build_active_plan_note()` (unit-tested).
2. Re-open / dock the plan anytime. The plan checklist is stored per-session
(localStorage). When a plan exists, the plan-mode button opens a small menu
("Show plan" / "Plan mode: On/Off") that re-opens the side-dockable plan
window — so it can stay docked while the agent works. The window live-refreshes
as the plan changes.
3. Agent write-back: new `update_plan` tool. The agent calls it to tick steps
`- [x]` after finishing them, or to revise steps when the user asks. Marker
tool (no I/O) → `plan_update` SSE event → the stored plan + docked window
update live. The ACTIVE PLAN note instructs the agent to use it.
Backend: src/agent_loop.py (param + pin + note builder + emit + prompt blurb),
src/tool_execution.py (update_plan handler), routes/chat_routes.py (parse
`approved_plan`, relay `plan_update`), registration in tool_schemas / agent_tools
/ tool_index (always-available, not admin-gated).
Frontend: static/js/chat.js (plan store, send `approved_plan`, handle
`plan_update`, capture restated checklists), static/app.js (plan-button menu),
static/js/planWindow.js (`isPlanWindowOpen`), static/js/storage.js (PLAN key).
Tests: tests/test_plan_mode.py (plan-note), tests/test_update_plan_tool.py.
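A minimal sketch of the pinned note and a checklist tick, assuming the plan is a Markdown task list. Only the `## ACTIVE PLAN` heading comes from the commit; the note wording and `tick_step` are hypothetical (the real `update_plan` tool may take a whole revised checklist instead of an index).

```python
def build_active_plan_note(plan: str) -> str:
    """Pinned system note for the execution turn; empty when there is no plan."""
    plan = (plan or "").strip()
    if not plan:
        return ""
    return (
        "## ACTIVE PLAN\n"
        "This plan was approved by the user. Re-read it before each step and "
        "call update_plan to tick items (- [x]) as you finish them.\n\n" + plan
    )


def tick_step(plan: str, index: int) -> str:
    """Mark the index-th (0-based) checklist item done; other lines are untouched."""
    lines = plan.splitlines()
    seen = -1
    for i, line in enumerate(lines):
        if line.lstrip().startswith(("- [ ]", "- [x]")):
            seen += 1
            if seen == index:
                lines[i] = line.replace("- [ ]", "- [x]", 1)
                break
    return "\n".join(lines)
```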
* Plan mode: drop bash/python, rely on read-only discovery tools
Shell can mutate (write files, hit the network) and can't be constrained to
read-only at the tool layer, so plan mode no longer relies on a prompt to keep
it well-behaved — bash/python are removed from the read-only allowlist and added
to the fail-closed block set. Discovery is covered by the dedicated read-only
tools (read_file, grep, glob, ls) instead.
Rewrites the plan-mode directive to state shell is disabled and lists the
available read-only tools positively. Addresses review feedback on #638.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Comment: note _MCP_READONLY_VERBS are prefixes not whole words
Clarifies that entries like "summar" are intentional stems matched via
startswith (covers summarise/summarize/summary), not typos. Addresses review
feedback on #638.
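A minimal sketch of stem matching with `str.startswith`, which accepts a tuple of prefixes. Only "summar" is named in the commit; the other stems and the function name are illustrative.

```python
# Illustrative stems; only "summar" comes from the commit message.
_MCP_READONLY_VERBS = ("get", "list", "read", "search", "summar")


def is_readonly_mcp_tool(tool_name: str) -> bool:
    """Treat an MCP tool as read-only when its name starts with a known stem."""
    # Each entry is a prefix, not a whole word: "summar" covers
    # summarise, summarize and summary.
    return tool_name.lower().startswith(_MCP_READONLY_VERBS)
```

The trade-off of prefixes is that an unrelated name sharing a stem also passes, which is why the known-mutator backstop matters.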
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Plan mode: clarify why gating inverts the allowlist into a denylist
Rename _PLAN_MODE_FALLBACK_BLOCK -> _PLAN_MODE_KNOWN_MUTATORS and rewrite the
comments. The tool gate is a denylist (disabled_tools); plan mode's policy is an
allowlist, so it returns the inverse (all known tool names minus the allowlist).
The static mutator set is a backstop for the schema-derived name list, which
misses XML-only tools and can fail to import. Addresses review feedback on #638.
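A minimal sketch of the inversion. The four read-only tools and bash/python come from these commits; `write_file` and `edit_file` in the backstop set are hypothetical.

```python
from typing import Iterable

# From the commits: plan mode keeps these read-only discovery tools.
READ_ONLY_ALLOWLIST = frozenset({"read_file", "grep", "glob", "ls"})
# Backstop: bash/python are named in the commits; the other two are hypothetical.
_PLAN_MODE_KNOWN_MUTATORS = frozenset({"bash", "python", "write_file", "edit_file"})


def plan_mode_disabled_tools(schema_tool_names: Iterable[str]) -> set:
    """Invert the allowlist into the denylist the disabled_tools gate expects."""
    # Union first, so mutators stay blocked even if the schema list is empty.
    known = set(schema_tool_names) | _PLAN_MODE_KNOWN_MUTATORS
    return known - READ_ONLY_ALLOWLIST
```

If the schema import fails and yields no names, the result still blocks every known mutator, so the gate fails closed rather than open.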
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Plan mode: stop hardcoding the read-only tool list in the directive
The model is already shown its available (read-only) tools by _assemble_prompt,
which removes every disabled tool. Enumerating them again in the directive only
duplicated that list and would drift as tools change. Point at the tools listed
below instead. Addresses review feedback on #638.
* fix(calendar): expose source in calendar list and add per-calendar delete
- GET /api/calendar/calendars now includes source field so the frontend
can distinguish CalDAV collections from local calendars
- Add DELETE /api/calendar/calendars/{cal_id} to remove a specific
calendar and its events by owner-scoped ID
* fix(settings): show all CalDAV calendars in integrations list
Previously one card was shown for the CalDAV server connection regardless
of how many calendar collections had been synced. The Calendars page showed
them all; Settings did not.
- Fetch /api/calendar/calendars alongside existing requests and render
one card per source=caldav collection, falling back to the single
server-level card if nothing has synced yet
- Delete now targets the specific calendar by ID rather than clearing
the whole server config
- Confirm dialog shows the calendar name so the user can verify before
removing
* fix(caldav): pull Google Calendar events from the events collection, not the /user principal
Google serves its CalDAV principal at .../caldav/v2/<id>/user but events live
under .../caldav/v2/<id>/events. The caldav library's principal->home-set
discovery does not reliably enumerate calendars from Google's /user endpoint,
so _sync_blocking fell into its 'treat the URL as a single calendar' fallback
and ran every calendar-query REPORT against the principal URL. /user holds no
VEVENTs, so the REPORT returned a clean but empty 200 for every date range:
auth succeeded, the calendar stayed empty (Apple Calendar works because iCloud
exposes standard discovery at the pasted URL).
Add _google_caldav_events_url() to map a recognised Google principal URL to its
events collection, and route both discovery-less fallbacks through
_open_url_as_calendar() so Google syncs hit /events while other servers' URLs
are used unchanged.
Fixes #2507
* fix(caldav): also map Google's legacy www.google.com/calendar/dav principal URL
Some Google accounts authenticate against the older CalDAV endpoint
(https://www.google.com/calendar/dav/<id>/user) rather than the newer
apidata.googleusercontent.com/caldav/v2 form (reported on #2507). Both have the
same principal-vs-events split, so map the legacy /user URL to its /events
collection as well. The legacy branch is gated on the /calendar/dav/ path so an
unrelated www.google.com URL ending in /user is left untouched.
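A minimal sketch of the mapping, based on the two URL shapes described above. The function names stand in for `_google_caldav_events_url()` and `_open_url_as_calendar()`, and the exact matching rules are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def google_caldav_events_url(url: str) -> Optional[str]:
    """Map a recognised Google principal URL (.../user) to its .../events collection."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path.rstrip("/")
    is_v2 = host == "apidata.googleusercontent.com" and path.startswith("/caldav/v2/")
    # Legacy form is gated on /calendar/dav/ so other www.google.com URLs are untouched.
    is_legacy = host == "www.google.com" and path.startswith("/calendar/dav/")
    if not (is_v2 or is_legacy) or not path.endswith("/user"):
        return None
    new_path = path[: -len("/user")] + "/events"
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


def calendar_url_for(url: str) -> str:
    """Discovery-less fallback: Google principals go to /events, others unchanged."""
    return google_caldav_events_url(url) or url
```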
_stable_cal_id hashed only the remote URL, producing the same calendar
ID for all users syncing the same CalDAV endpoint. The second user would
get an IntegrityError on the primary key. Now includes owner in the
hash so each user gets a distinct calendar row.
then_task_id was stored without checking the target task's owner. A user
could chain their task to execute any other user's task on success via the
scheduler's _run_chained path. Now verifies the target task exists and
belongs to the requesting user before storing.
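A minimal sketch of the ownership check, assuming tasks can be looked up by ID into records with an `owner` field. The error type and function name are hypothetical.

```python
from typing import Optional


class TaskChainError(ValueError):
    """Hypothetical error for an invalid then_task_id."""


def validate_then_task(then_task_id: Optional[str], owner: str, tasks: dict) -> Optional[str]:
    """Allow chaining only to an existing task owned by the same user."""
    if then_task_id is None:
        return None
    target = tasks.get(then_task_id)
    # Same error for "missing" and "someone else's" so task IDs cannot be probed.
    if target is None or target.get("owner") != owner:
        raise TaskChainError("then_task_id must reference one of your own tasks")
    return then_task_id
```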
list_versions and get_version used a soft 'if doc:' guard that skipped
ownership verification when the Document row was missing (e.g. after
hard delete). Orphaned DocumentVersion rows would be returned to any
caller without auth. Now raises 404 when the parent document is gone,
matching the pattern already used in restore_version.
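A minimal sketch of the tightened guard, with dicts standing in for the Document and DocumentVersion tables. The commit specifies the 404 for a missing parent; the separate 403-style error for a non-owner is an assumption.

```python
class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response (assumption: the commit names only 404)."""


def list_versions(doc_id: str, user: str, documents: dict, versions: list) -> list:
    doc = documents.get(doc_id)
    # Before the fix, `if doc: <check owner>` skipped the check when doc was None,
    # so orphaned versions were returned to any caller.
    if doc is None:
        raise NotFound(doc_id)
    if doc["owner"] != user:
        raise Forbidden(doc_id)
    return [v for v in versions if v["document_id"] == doc_id]
```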
Every other image-processing endpoint (denoise, upscale, remove-bg,
enhance-face, inpaint, harmonize) calls require_privilege(request,
"can_generate_images"). The sharpen endpoint was missing this check,
allowing unauthenticated users to trigger CPU-intensive image processing.
* fix: prevent document link click from resetting active session
Clicking a #document-<uuid> link in chat caused the session to reset
because of two issues:
1. chatRenderer.js: clicking on the text inside an <a> yields a Text
node target whose .closest() is undefined, so preventDefault never
fires and the browser performs a default hash-navigation
2. sessions.js: the hashchange handler treated the entity hash
(document-<uuid>) as a session lookup, found no match, and the
subsequent loadSessions created a new default-model chat
Fix: walk past Text nodes before calling .closest(), and skip
entity-prefixed hashes in the hashchange handler.
Fixes #2035
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(documents): move isOpen=true after container check in openPanel
isOpen was set to true before the #chat-container existence check.
If the container was missing during a race, the function returned
early but isOpen stayed true, preventing the panel from ever
reopening on subsequent calls.
Move isOpen=true to after the container guard so a failed open
doesn't leave the flag stuck.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documentation-only PR continuing #2523. Adds tests/README.md to document helper conventions, validation expectations, and the next test-suite refactor phase.
* feat: support for embedding API key
* feat: encrypt and decrypt embedding API key
* test: add unit tests for EmbeddingClient authorization header behavior
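A minimal sketch of the header behavior such tests might pin down, assuming an OpenAI-compatible `Bearer` scheme. The class shape is hypothetical, and key encryption at rest is not shown.

```python
from typing import Optional


class EmbeddingClient:
    """Hypothetical client; only the header logic is sketched."""

    def __init__(self, base_url: str, api_key: Optional[str] = None):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def headers(self) -> dict:
        headers = {"Content-Type": "application/json"}
        # Send Authorization only when a key is configured, so keyless
        # local embedding servers keep working.
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"
        return headers
```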
This commit consolidates all Windows Cookbook background fixes into a single comprehensive commit based on the latest main branch.
Key fixes included:
1. React looksSuccessful Mismatch: Appended 'DOWNLOAD_OK' for pip install commands in routes/cookbook_routes.py.
2. Local Windows SSH Wrapper & Log Directory Mismatch: Bypassed the ssh wrappers and dynamically selected odysseus-tmux logs for local tasks in static/js/cookbookRunning.js.
3. WSL Bash Filtration: Filtered out the WSL bash stub at C:\Windows\System32\bash.exe in core/platform_compat.py.
4. Drive-Colon Path Normalization: Replaced .as_posix() with git_bash_path() in routes/shell_routes.py and src/bg_jobs.py.
5. GGUF-Only Hardware Fitting: Restructured local Windows recommendations in services/hwfit/fit.py to rank only GGUF models.
6. Safe Win32 Process Liveness Probe: Replaced os.kill(pid, 0) with a safe Win32 API probe using GetExitCodeProcess in core/platform_compat.py.
7. Prebuilt llama-cpp-python Wheels: Supplied the CPU extra index in the fallback path when compilation fails.
8. Enforce UTF-8 log encoding: Set PYTHONIOENCODING=utf-8 on Windows bootstrap runners.
9. Linux Llama.cpp Build Script: Fixed a syntax error in routes/cookbook_helpers.py.
10. Page Reload Status Check: Ran sys.executable instead of 'python3' to bypass Microsoft Store execution stubs on local Windows hosts.
11. Llama.cpp serve build bypass: Skipped cmake compilation checks on local Windows and verified the Python bindings directly.
12. Serve Command Path Validation: Masked safe GGUF path printf subshells '' inside the serve command validator.
13. CPU Mismatch Diagnostics: Intercepted '0xc000001d' (Illegal Instruction) crashes on CPUs without AVX2 in static/js/cookbook-diagnosis.js and guided users to Ollama.
14. Windows Pytest stability: Fixed stub import leakage in test files.
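A minimal sketch of a liveness probe in the spirit of item 6. Per the Python `os.kill` documentation, on Windows a signal value of 0 is `CTRL_C_EVENT` rather than a no-op existence check, hence the Win32 path. The function name is hypothetical, and treating an `OpenProcess` failure (for example, access denied) as "not alive" is a simplifying assumption.

```python
import ctypes
import os
import sys

_STILL_ACTIVE = 259  # exit code Windows reports for a running process
_PROCESS_QUERY_LIMITED_INFORMATION = 0x1000


def pid_alive(pid: int) -> bool:
    if pid <= 0:
        return False
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.OpenProcess.restype = ctypes.c_void_p  # avoid handle truncation
        handle = kernel32.OpenProcess(_PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
        if not handle:
            return False  # assumption: unopenable is reported as not alive
        try:
            code = ctypes.c_ulong()
            ok = kernel32.GetExitCodeProcess(ctypes.c_void_p(handle), ctypes.byref(code))
            return bool(ok) and code.value == _STILL_ACTIVE
        finally:
            kernel32.CloseHandle(ctypes.c_void_p(handle))
    # POSIX: signal 0 checks existence without delivering a signal.
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```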
* fix(images): render agent-generated images in chat
When a chat model calls generate_image mid-conversation (agentic flow), the image does
not display — it survives only as a URL the model echoes in prose. generate_image runs
as a text-only MCP server, so result['image_url'] is never populated and the existing
buildImageBubble render path never fires. Promote the image URL out of the tool's stdout
in tool_execution so the agent loop's existing forwarding renders it via buildImageBubble
— deterministically, no dependence on the model echoing the URL. Backend-only; reuses
dev's image bubble, forwarding, and the tool's existing parseable output.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(images): fully-qualified, valid generated-image links
The chat model often mangled the generated-image URL it echoed in prose (relative path,
or copying the 'image_url:' label into the link href). Build a fully-qualified link by
prefixing the existing app_public_url setting (empty default keeps relative paths), and
present it as a clean 'Direct link:' the model can echo verbatim (the frontend auto-links
bare https URLs). One file; independent of how the image is rendered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(images): cover _promote_image_fields; make exit-code guard self-contained
Adds the unit tests requested in review on #2809: absolute URL, relative URL,
no URL (result unchanged), and non-zero exit_code (not promoted). Moves the
dict/exit_code==0 guard from the call site into _promote_image_fields so the
function is self-contained and the failure case is unit-testable; call-site
behavior is unchanged.
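A minimal sketch of the promotion guard and the link qualification, assuming the tool prints a line of the form `image_url: <url>` (suggested by the 'image_url:' label mentioned above). Both function names are hypothetical stand-ins.

```python
import re

# Assumed stdout format: a line such as "image_url: /static/generated/x.png".
_IMAGE_URL_RE = re.compile(r"^image_url:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def promote_image_fields(result):
    """Copy image_url out of a successful tool's stdout into the result dict."""
    if not isinstance(result, dict) or result.get("exit_code") != 0:
        return result  # failures and non-dict results pass through untouched
    match = _IMAGE_URL_RE.search(result.get("stdout") or "")
    if not match:
        return result
    return {**result, "image_url": match.group(1)}


def qualify_url(url: str, app_public_url: str = "") -> str:
    """Prefix relative paths with app_public_url; an empty setting keeps them relative."""
    if app_public_url and url.startswith("/"):
        return app_public_url.rstrip("/") + url
    return url
```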
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Test-only refactor continuing #2523. Replaces a repeated core.auth cache eviction pattern in three auth tests with the shared clear_module helper, preserving behavior.
Let the agent pause and ask the user a multiple-choice question when a
task is genuinely ambiguous and the answer changes what it does next —
choosing between approaches, confirming an assumption, picking a target —
instead of guessing.
Modeled on the existing `ui_control` marker pattern: the `ask_user` tool
returns an `ask_user` payload that the agent loop emits as an SSE event
and then ends the turn. The frontend renders the question with clickable
option buttons, a free-text "Other" input, and an x to dismiss; the user's
choice is sent as the next message and the agent resumes with it in
context.
- src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O.
Validates a non-empty question + 2..6 options, normalizes string/object
options, returns the payload.
- src/agent_loop.py: emit the `ask_user` event and break the round loop so
the turn ends and waits for the user's selection. Stream the question as
assistant text so it persists/replays (prevents a re-ask loop).
- Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS,
FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any
user can be asked); the structured args serialize via the default
json.dumps path.
- routes/chat_routes.py: relay the `ask_user` event to the client.
- static/js/chat.js + static/style.css: render the question card (options +
free-text Other + dismiss x; removed once answered). Reuses CSS vars and
the .modal-close button; emoji go through the monochrome-SVG pipeline.
Bump chat.js cache pin.
- tests/test_ask_user_tool.py: payload, multi flag, string options, option
cap, validation errors, serializer round-trip, registration.
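A minimal sketch of the handler's validation and normalization. The payload keys, the `label`/`description` option fields, and the choice to drop options beyond six (rather than reject them) are assumptions; only the 2..6 range and string/object options come from the commit.

```python
import json


class ToolArgError(ValueError):
    """Hypothetical error type for bad tool arguments."""


MAX_OPTIONS = 6


def build_ask_user_payload(question, options, multi=False) -> dict:
    if not isinstance(question, str) or not question.strip():
        raise ToolArgError("question must be a non-empty string")
    if not isinstance(options, list):
        raise ToolArgError("options must be a list")
    normalized = []
    for opt in options[:MAX_OPTIONS]:  # assumption: extras are dropped, not rejected
        if isinstance(opt, str):
            label, description = opt, ""
        elif isinstance(opt, dict):
            label = str(opt.get("label", ""))
            description = str(opt.get("description", ""))
        else:
            raise ToolArgError(f"unsupported option type: {type(opt).__name__}")
        if label.strip():
            normalized.append({"label": label.strip(), "description": description.strip()})
    if len(normalized) < 2:
        raise ToolArgError("need at least 2 non-empty options")
    # Plain dicts and strings only, so the default json.dumps path can serialize it.
    return {"ask_user": {"question": question.strip(), "options": normalized, "multi": bool(multi)}}
```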
Test-only fix continuing #2523. Updates two stale regression tests so the current broad Python pytest baseline is restored without changing production code.
Test-only fix continuing #2523. Makes the archived-session model-filter test independent of optional multipart packages. The red broad pytest status was classified as unrelated current dev baseline drift before merge.
* refactor(cookbook): move _diagnose_serve_output to module level in cookbook_helpers
Extracts the nested _diagnose_serve_output function from inside
setup_cookbook_routes() and moves it to module level in cookbook_helpers.py,
alongside the other helper functions it logically belongs with.
No behaviour change — the function is now importable directly for testing
and by other callers without going through the route factory closure.
* fix(cookbook): surface backend diagnosis when serve fails in background
The background poll (_pollBackgroundStatus) already received `diagnosis`
and `cmd` from /api/cookbook/tasks/status but discarded both. When a serve
job died while the Cookbook modal was closed, reopening it showed only a
red error badge with no context.
- Persist live.diagnosis into task._backendDiagnosis in localStorage so it
survives modal close/reopen and page refresh
- Persist live.cmd into task.payload._cmd for agent-spawned tasks so the
crash report includes the actual command
- After _renderRunningTab(), walk rendered cards and call _showDiagnosis()
for any that have a stored _backendDiagnosis but no panel yet
- In _renderTaskCard(), use _backendDiagnosis as a fallback when the
client-side _terminalServeDiagnosis() finds nothing
* test(cookbook): add coverage for _diagnose_serve_output error patterns
10 tests verifying the 16 serve-failure patterns:
- CUDA OOM, port-in-use, vLLM missing, gated model
- Traceback fallback fires without startup success marker
- Traceback suppressed when server actually started
- Clean/empty output returns None
- trust-remote-code and no-GGUF patterns
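A minimal sketch of the pattern-then-traceback logic these tests describe. The matched strings are common error texts (PyTorch, uvicorn, huggingface_hub), but this subset, the advice wording, and the startup markers are assumptions rather than the helper's actual table.

```python
import re
from typing import Optional

# Illustrative subset; the real helper covers more patterns (no-GGUF, etc.).
_SERVE_PATTERNS = [
    (re.compile(r"CUDA out of memory", re.I),
     "GPU ran out of memory: try a smaller model, lower context, or fewer GPU layers."),
    (re.compile(r"address already in use", re.I),
     "Port already in use: stop the other server or choose a different port."),
    (re.compile(r"No module named '?vllm", re.I),
     "vLLM is not installed in this environment."),
    (re.compile(r"gated repo", re.I),
     "Model is gated: accept its license on Hugging Face and log in."),
    (re.compile(r"trust[-_]remote[-_]code", re.I),
     "Model requires trust-remote-code to load."),
]
# Log lines meaning the server came up; a later traceback is then not a startup crash.
_STARTUP_MARKERS = ("Uvicorn running on", "Application startup complete")


def diagnose_serve_output(output: Optional[str]) -> Optional[str]:
    if not output or not output.strip():
        return None  # clean/empty output: nothing to report
    for pattern, advice in _SERVE_PATTERNS:
        if pattern.search(output):
            return advice
    started = any(marker in output for marker in _STARTUP_MARKERS)
    if "Traceback (most recent call last)" in output and not started:
        return "Server crashed during startup; see the traceback in the log."
    return None
```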
Bring main's maintainer-curated work (cookbook scheduler, calendar rendering/sync, settings polish, agent debug loop) into dev so dev is a superset of main (resolves the dev/main drift, #2543).
Test-only refactor continuing #2523. Reuses the shared import-state helper in session-related tests, removes duplicated local save/restore logic, and preserves existing test behavior.