* docs: add implementation plan for fixing chat context drifting (#135)
* fix: make Session.history immutable + fix {}.history crash
- Session.history now exposes a COPY of the internal _history list
- add_message() replaces history with a fresh copy each time
- get_context_messages() derives from _history directly
- replace_messages() updates both _history and history
- truncate_messages() updates both _history and history
- _persist_message() line 207: fixed {}.history fallback crash
- Added 11 tests for session isolation and edge cases
Addresses #135 root cause #1: shared mutable references
* fix: task scheduler uses SessionManager methods instead of overwriting sessions
- Added ensure_task_session() to SessionManager (checks cache first)
- Task scheduler now uses ensure_task_session() instead of direct dict assignment
- Task scheduler now uses SessionManager.add_message() for message persistence
- Removed direct sess_obj.history.append() that was silently losing data
Addresses #135 root causes #2 and #3
* fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old
Prevents the cleanup task from deleting sessions that were just created
and haven't received any messages yet (message_count == 0).
Addresses #135 root cause #5
* test: comprehensive session isolation tests (10/10 passing)
* refactor: consolidate _session_manager into singleton pattern
- Added set_session_manager_instance / get_session_manager_instance to core/models
- kept backward-compat aliases (set_session_manager, get_session_manager)
- session_manager.py re-exports the singleton functions
- ai_interaction.set_session_manager now syncs with the core singleton
- context_compactor uses get_session_manager_instance() instead of getattr hack
- app.py initializes the singleton once
Addresses #135 root cause #4: fragile global wiring
* test: add concurrent session isolation integration tests
Verifies:
- Concurrent add_message to different sessions doesn't cross-contaminate
- Rapid parallel writes maintain isolation
- Read-write concurrent access is safe
All 3 async tests pass, proving the immutable history fix works under concurrency
* fix: pre-import core.models in conftest to prevent test pollution
test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at
module level during collection. Any test collected after it imports
Session as a MagicMock. Pre-importing core.models in conftest.py
before test_agent_loop.py's module-level code runs prevents this.
* fix: make .history authoritative mutable list, address PR review
Per review feedback: keep .history as the authoritative mutable list so
existing code doing .history.pop(), .history = [...], etc. still works.
Fix the cross-contamination bug by ensuring __post_init__() gives each
Session its OWN unique history list (never shared).
Changes:
- core/models.py: .history IS the authoritative list. _history aliases it.
Each Session gets its own list in __post_init__.
- core/session_manager.py: add_message() delegates to Session.add_message()
instead of appending directly — no double-append, single source of truth.
- tests/test_session_manager.py: updated test to reflect that .history
references see new messages (same list, not a snapshot).
- docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for
shipping — useful design context but too much process/doc to ship).
All 272 tests pass (3 pre-existing failures unrelated).
* Fix session manager message persistence
* Fix session history alias regressions
* Fix session history aliasing and task delivery
On summary LLM call failure, maybe_compact was returning system_msgs+recent
(dropping the older half) with was_compacted=False, misleading the caller into
thinking the list was unchanged. Return the original messages list unchanged so
no history is lost; the next trim_for_context call handles length if needed.
Fixes#2160
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix: context_compactor token helpers crash on non-string message text
* fix: _truncate_text_to_token_budget returns an empty string for non-string text, not the raw value
context_compactor.maybe_compact built its summary text with
msg.get('content', '')[:2000], which raised
TypeError: 'NoneType' object is not subscriptable on assistant turns
whose content is None (turns that carried only native tool_calls).
Once a conversation crossed the 85% compaction threshold — reached
after only a few turns on small-context local models plus the large
agent prompt — every subsequent message failed ("send more than three
messages and it stops working").
Flatten message content to text first via a _content_as_text helper
(str passthrough, multimodal list blocks joined, None -> "") and
tolerate a missing role. Adds tests/test_context_compactor.py covering
the helper and a >=4-message conversation that forces compaction with
a None-content tool-call turn (fails before this change, passes after).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The context compactor computed split_point against convo_msgs (system
messages filtered out) but applied it directly to session.history which
includes the system messages. After compaction, the original system
prompt was dropped and replaced by an off-by-N slice of the full history.
This silently dropped the system prompt (preset, persona, RAG context)
from every compacted session — the model would lose persona, RAG, and
preset guidance on the next turn after a long conversation.
The split in maybe_compact does:
convo_msgs = [m for m in messages if m['role'] != 'system']
split_point = len(convo_msgs) // 2
so split_point is indexed against the system-stripped list. But the
helper _update_session_history took (session, split_point, summary) and
did session.history[split_point:]. session.history is the full list
including the leading system messages, so this dropped the first
system_msg_count messages.
Fix: pass system_msg_count=len(system_msgs) into _update_session_history
and use session.history[system_msg_count + split_point:] as the recent
slice, with session.history[:system_msg_count] prepended to preserve
persona/preset/RAG system messages.
Validated: tests/test_compactor_data_loss.py both tests now pass (were
failing). tests/test_context_compactor.py 12 pre-existing tests still
pass.
Symptom was: post-compaction history = [summary] + assistant_1 + user_2
+ assistant_2 (system_A was lost).
Co-authored-by: Ernest Hysa <ernest@example.com>