* docs: add implementation plan for fixing chat context drifting (#135)
* fix: make Session.history immutable + fix {}.history crash
- Session.history now exposes a COPY of the internal _history list
- add_message() replaces history with a fresh copy each time
- get_context_messages() derives from _history directly
- replace_messages() updates both _history and history
- truncate_messages() updates both _history and history
- _persist_message() line 207: fixed {}.history fallback crash
- Added 11 tests for session isolation and edge cases
Addresses #135 root cause #1: shared mutable references
* fix: task scheduler uses SessionManager methods instead of overwriting sessions
- Added ensure_task_session() to SessionManager (checks cache first)
- Task scheduler now uses ensure_task_session() instead of direct dict assignment
- Task scheduler now uses SessionManager.add_message() for message persistence
- Removed direct sess_obj.history.append() that was silently losing data
Addresses #135 root causes #2 and #3
* fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old
Prevents the cleanup task from deleting sessions that were just created
and haven't received any messages yet (message_count == 0).
Addresses #135 root cause #5
* test: comprehensive session isolation tests (10/10 passing)
* refactor: consolidate _session_manager into singleton pattern
- Added set_session_manager_instance / get_session_manager_instance to core/models
- kept backward-compat aliases (set_session_manager, get_session_manager)
- session_manager.py re-exports the singleton functions
- ai_interaction.set_session_manager now syncs with the core singleton
- context_compactor uses get_session_manager_instance() instead of getattr hack
- app.py initializes the singleton once
Addresses #135 root cause #4: fragile global wiring
* test: add concurrent session isolation integration tests
Verifies:
- Concurrent add_message to different sessions doesn't cross-contaminate
- Rapid parallel writes maintain isolation
- Read-write concurrent access is safe
All 3 async tests pass, proving the immutable history fix works under concurrency
* fix: pre-import core.models in conftest to prevent test pollution
test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at
module level during collection. Any test collected after it imports
Session as a MagicMock. Pre-importing core.models in conftest.py
before test_agent_loop.py's module-level code runs prevents this.
* fix: make .history authoritative mutable list, address PR review
Per review feedback: keep .history as the authoritative mutable list so
existing code doing .history.pop(), .history = [...], etc. still works.
Fix the cross-contamination bug by ensuring __post_init__() gives each
Session its OWN unique history list (never shared).
Changes:
- core/models.py: .history IS the authoritative list. _history aliases it.
Each Session gets its own list in __post_init__.
- core/session_manager.py: add_message() delegates to Session.add_message()
instead of appending directly — no double-append, single source of truth.
- tests/test_session_manager.py: updated test to reflect that .history
references see new messages (same list, not a snapshot).
- docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for
shipping — useful design context but too much process/doc to ship).
All 272 tests pass (3 pre-existing failures unrelated).
* Fix session manager message persistence
* Fix session history alias regressions
* Fix session history aliasing and task delivery
* refactor(tools): consolidate duplicated _truncate and get_mcp_manager into src/tool_utils
Move all copies of _truncate(), get_mcp_manager(), and set_mcp_manager()
into a single leaf module (src/tool_utils.py) that imports only from
src.constants. This eliminates the lazy-import hack
('from src import agent_tools' inside function bodies) in tool_execution.py
and tool_implementations.py, and fixes a latent bug: the _truncate copy in
tool_execution.py was missing the isinstance guard and would crash on None.
Also deletes mcp_servers/_common.py — it was dead code with zero callers
anywhere in the codebase, containing its own copy of truncate() and
constants that already exist in src/constants.py.
* fix(tools): route remaining get_mcp_manager imports to src.tool_utils
The maintainer's feedback flagged src/task_scheduler.py:1857 and
routes/task_routes.py:977. A project-wide search found a third call site
in src/agent_loop.py that also imported get_mcp_manager from
src.agent_tools instead of src.tool_utils.
All three are now sourced from the canonical location in src.tool_utils.
---------
Co-authored-by: mcnoliveira <mcnoliveira@gmail.com>
* feat: assign folder='Tasks' to task sessions at creation
Task sessions (LLM, action, research) now set folder='Tasks' on their
DbSession row, matching the pattern used by the Assistant folder. This
enables sidebar lens filtering without changing existing session
behaviour.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add backfill script for task session folders
One-shot script to set folder='Tasks' on existing [Task]/[Research]
sessions that predate the folder assignment in task_scheduler.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: replace standalone backfill script with automatic migration
Convert scripts/backfill_task_folders.py into _migrate_backfill_task_folders()
in core/database.py, called from init_db(). The migration is idempotent (only
touches rows where folder IS NULL/empty) and runs automatically on upgrade,
so operators no longer need a manual step to tag pre-existing task sessions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Schedule cookbook serves through the existing ScheduledTask system: the
serve preset gets a ^ button next to Launch that opens a daily/hourly/
weekly form mirroring the admin-switch style; the schedule action runs
action_cookbook_serve, which delegates to /api/model/serve and stamps
the resulting task with _scheduledStopAtMs. A background
cookbook_serve_lifecycle loop ticks every 60s and kills any serve
whose window has ended, also dropping the auto-registered endpoint
so the model picker doesn't keep pointing at a dead server.
- Stop and remove on a Running serve now awaits the SSH/tmux kill,
re-checks tmux has-session, and surfaces an error toast (leaving the
row) when the kill failed. Previously fire-and-forget, so a failed
SSH/tmux call silently left the live serve running while the row
vanished from the UI.
- Cookbook tasks/status orphan-adoption sweep no longer requires the
serve-/cookbook- session-id prefix; any tmux session whose pane is
running a known model-server process gets auto-pulled into Running.
Without this loosening, a cookbook-launched serve whose tmux id
fell back to a bare number was invisible — you couldn't see it,
let alone stop it.
- Ollama serve always launches a fresh process under cookbook's tmux
(no more monitor-mode reattach to a systemd/Docker ollama Stop can't
reach). The handler pre-picks a free port by probing the target
host over SSH and mutates req.cmd's OLLAMA_HOST so the runner script
AND the auto-registered endpoint agree on the same bind port.
- Auto-register uses host.docker.internal (when running inside Docker)
instead of localhost, matching the URL /setup adds for Ollama by
hand. Local cookbook serves now produce a chat-reachable endpoint
on first launch.
- Cascade-delete: removing a scheduled cookbook task also deletes any
linked calendar event (cookbook_task_id marker in the description).
- Tasks list groups cookbook_serve under a "Cookbook" category that
sorts above the rest, so scheduler-launched serves are easy to find.
- Claude Agent integration: AGENT_CONFIGS.claude, INTG_TYPES.claude,
setup_claude_routes + integrations/claude/ skill bundle. Wired in
app.py alongside the existing Codex integration; same scope-gated
/api/codex/* backend; agent form has new description so users know
it's setup for an external CLI, not an agent streamed inside Odysseus.
- Remove mark_email_boundaries action: not good enough yet. Stripped
from task UI, scheduler defaults, registry, tool schema, clear-cache
route. Added to RETIRED_HOUSEKEEPING_ACTIONS so existing rows + their
task_runs auto-purge on startup.
- Cookbook download reliability: "Reconnect" fix button in the crash
diagnosis runs _reconnectTask after probing has-session. 30s confirm
window before marking a download "done" — kills the Finished/Downloading
flicker when tmux briefly drops between captures.
- Mobile UX: tap anywhere on a note card body opens the editor;
Update button morphs to Archive when no text was edited; bell icon
accent-colored; chip-trashing notif pills fade so only the icon
rotates into the trash zone.
- Settings integrations: SVG-per-provider in email + API preset
dropdowns, custom drop-up-aware menus, accent sub-header icons
(IMAP/SMTP), consistent card styling between list + edit, contacts
Edit/Delete icons, agent form description copy.
compute_next_run parsed scheduled_time as "HH:MM" with int(parts[0]),
int(parts[1]) and no validation, so "9", "9am", "25:00", "9:" or ":30" raised
IndexError/ValueError. The POST /tasks create route passes the user/LLM-supplied
scheduled_time before its try block (and only validates the cron field), so a
bad value surfaced as an unhandled 500 rather than the clean 400 used for other
invalid fields — and the same crash could fire inside the scheduler loop when
recomputing next_run for an already-stored bad row.
Guard the parse and fail closed (warn + return None), matching the existing
invalid-cron handling in the same function.
Adds tests/test_scheduler_scheduled_time_validation.py — malformed values return
None (fail before with IndexError/ValueError), valid HH:MM still computes.
When a timezone is configured, `now` is tz-aware local time.
The comparison stripped tzinfo with `.replace(tzinfo=None)`,
producing naive local time, but `scheduled_date` is stored as
naive UTC. For users east of UTC this causes tasks to appear
expired prematurely; for users west they linger past due time.
Use `_to_utc_naive(now)` to convert to the same reference frame.
TaskScheduler.start() aborts stale TaskRun rows but never advanced
ScheduledTask.next_run. Across a restart the in-process _executing set
is empty, so the first post-restart _check_due_tasks() call dispatches
every task whose next_run is still in the past — and so does every
subsequent poll, until the task's regular _execute_task path finally
runs compute_next_run and pushes it forward.
start() now queries active tasks with next_run < now and pushes each
one to now + 60s. The first poll after restart sees them as not-yet-due,
the task runs once normally, and compute_next_run puts the schedule
back on its real cadence. Paused and not-yet-due tasks are left alone.
The validator test was rewritten as a regression test asserting the
opposite of the bug it originally demonstrated, plus two narrower cases
to lock down the filter (only active+overdue is touched).
* fix: coerce null endpoint_url when delivering task result to a session
* fix: also coerce null model so the session insert satisfies NOT NULL
* test: cover task session delivery on an empty database
* feat(web-fetch): add web_fetch tool to read a specific URL's content
* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution
Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.