odysseus/tests
Joshua Valderrama 35b4dd2824 fix: session context drifting — messages leaking between chats (#135) (#267)
* docs: add implementation plan for fixing chat context drifting (#135)

* fix: make Session.history immutable + fix {}.history crash

- Session.history now exposes a COPY of the internal _history list
- add_message() replaces history with a fresh copy each time
- get_context_messages() derives from _history directly
- replace_messages() updates both _history and history
- truncate_messages() updates both _history and history
- _persist_message() line 207: fixed {}.history fallback crash
- Added 11 tests for session isolation and edge cases

Addresses #135 root cause #1: shared mutable references

* fix: task scheduler uses SessionManager methods instead of overwriting sessions

- Added ensure_task_session() to SessionManager (checks cache first)
- Task scheduler now uses ensure_task_session() instead of direct dict assignment
- Task scheduler now uses SessionManager.add_message() for message persistence
- Removed direct sess_obj.history.append() that was silently losing data

Addresses #135 root causes #2 and #3

* fix: add age guard to cleanup_empty_sessions — don't delete sessions <1h old

Prevents the cleanup task from deleting sessions that were just created
and haven't received any messages yet (message_count == 0).

Addresses #135 root cause #5

* test: comprehensive session isolation tests (10/10 passing)

* refactor: consolidate _session_manager into singleton pattern

- Added set_session_manager_instance / get_session_manager_instance to core/models
- kept backward-compat aliases (set_session_manager, get_session_manager)
- session_manager.py re-exports the singleton functions
- ai_interaction.set_session_manager now syncs with the core singleton
- context_compactor uses get_session_manager_instance() instead of getattr hack
- app.py initializes the singleton once

Addresses #135 root cause #4: fragile global wiring

* test: add concurrent session isolation integration tests

Verifies:
- Concurrent add_message to different sessions doesn't cross-contaminate
- Rapid parallel writes maintain isolation
- Read-write concurrent access is safe

All 3 async tests pass, proving the immutable history fix works under concurrency

* fix: pre-import core.models in conftest to prevent test pollution

test_agent_loop.py stubs sys.modules['core.models'] = MagicMock() at
module level during collection. Any test collected after it imports
Session as a MagicMock. Pre-importing core.models in conftest.py
before test_agent_loop.py's module-level code runs prevents this.

* fix: make .history authoritative mutable list, address PR review

Per review feedback: keep .history as the authoritative mutable list so
existing code doing .history.pop(), .history = [...], etc. still works.
Fix the cross-contamination bug by ensuring __post_init__() gives each
Session its OWN unique history list (never shared).

Changes:
- core/models.py: .history IS the authoritative list. _history aliases it.
  Each Session gets its own list in __post_init__.
- core/session_manager.py: add_message() delegates to Session.add_message()
  instead of appending directly — no double-append, single source of truth.
- tests/test_session_manager.py: updated test to reflect that .history
  references see new messages (same list, not a snapshot).
- docs/plans/2026-06-01-fix-chat-context-drifting.md: removed (not for
  shipping — useful design context but too much process/doc to ship).

All 272 tests pass (3 pre-existing failures unrelated).

* Fix session manager message persistence

* Fix session history alias regressions

* Fix session history aliasing and task delivery
2026-06-09 14:12:52 +01:00

Test Suite Notes

Purpose

This file documents the shared test helpers and the review expectations that go with them. The suite is being refactored incrementally, so this is a working reference for that effort - not a claim that the suite is already fully organized. Read it before adding a new helper or before reviewing a PR that touches tests/helpers/.

For the broader rules - test taxonomy, determinism/isolation rules, the behavioral-vs-source-text policy, and helper/factory extraction rules - see TESTING_STANDARD.md. This file is the concrete helper reference; that file is the standard the refactor works toward.

Running focused subsets (taxonomy markers)

tests/conftest.py tags every test at collection time with two markers derived from its filename by tests/_taxonomy.py: an area_* marker (e.g. area_security) and a finer sub_* marker (e.g. sub_owner_scope). This adds markers only - it moves no files and changes no test behavior. Use them to run a focused slice:

python3 -m pytest -m area_security
python3 -m pytest -m "area_services and sub_cookbook"

Areas are security, routes, services, cli, js, helpers, unit, and uncategorized. Classification is conservative and token-based: a file that matches no area keyword falls back to area_uncategorized with its filename as the sub-area. The area_* names are registered in pyproject.toml; the dynamic sub_* names are registered before collection by pytest_configure in tests/conftest.py, so unknown-mark warnings still flag genuine typos.
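
As a rough illustration of token-based classification, the sketch below derives both markers from a file name. The area list is taken from the paragraph above; the matching rule, the classify name, and the choice to drop the test_ prefix in the fallback are assumptions made for this example, not the contents of tests/_taxonomy.py.

```python
from pathlib import Path

# Area names come from the list above; the matching rule itself is an assumption.
AREAS = ("security", "routes", "services", "cli", "js", "helpers", "unit")


def classify(filename: str) -> tuple[str, str]:
    """Return (area_marker, sub_marker) for a test file name (hypothetical rule)."""
    stem = Path(filename).stem.removeprefix("test_")
    tokens = stem.split("_")
    for index, token in enumerate(tokens):
        if token in AREAS:
            # Everything except the area token becomes the sub-area.
            rest = tokens[:index] + tokens[index + 1:]
            sub = "_".join(rest) if rest else token
            return f"area_{token}", f"sub_{sub}"
    # Conservative fallback: no area keyword matched.
    return "area_uncategorized", f"sub_{stem}"


# In a conftest.py this could feed pytest's collection hook, roughly:
#     def pytest_collection_modifyitems(items):
#         for item in items:
#             area, sub = classify(item.path.name)
#             item.add_marker(area)
#             item.add_marker(sub)

security_markers = classify("test_security_owner_scope.py")
fallback_markers = classify("test_widget.py")
```

Keeping the classifier a pure function of the file name is what lets markers be added without moving files or changing test behavior.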

Core principles

  • Keep PRs small and homogeneous: one kind of change per PR.
  • Prefer explicit local setup over hidden global fixtures.
  • Avoid expanding the root conftest.py unless absolutely necessary.
  • Do not mix file moves with logic changes in the same PR.
  • Do not weaken tests with skip/xfail just to make CI pass.
  • Validate the focused files you changed, plus any neighboring or order-sensitive groups they interact with.

Helper conventions

The helpers below live under tests/helpers/. They exist to remove repeated boilerplate that already appeared across multiple tests. Reach for one only when your test matches its intended use; do not stretch a helper to cover a new case.

tests.helpers.cli_loader.load_script

Use when a test needs to import a script under scripts/ without repeating SourceFileLoader / importlib.util boilerplate.

  • Intended for script/CLI tests that load a single file from scripts/.
  • Not for arbitrary package imports - use a normal import for those.
  • When migrating an existing test to it, keep the existing stubs and assertions unchanged. Any sys.modules stubs the script needs at import time must still be injected (e.g. via monkeypatch) before calling load_script.
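
For orientation, this is the kind of boilerplate such a helper replaces. It is a minimal sketch using importlib.util; load_script_sketch and its two parameters are hypothetical and may not match the real load_script signature.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_script_sketch(path, module_name):
    """Import a single .py file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    return module


# Demo against a throwaway script; a real test would point at a file under scripts/.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_script.py"
    script.write_text(
        "GREETING = 'hello'\n\ndef main():\n    return GREETING.upper()\n"
    )
    demo = load_script_sketch(script, "demo_script")
    result = demo.main()
```

Because exec_module runs the script's top-level code, any stubs the script imports must already be in sys.modules at that point, which is why stub injection has to happen before the load call.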

tests.helpers.import_state.clear_module

Use when a test must drop one cached module and its parent-package attribute before a fresh import.

  • Clears sys.modules[name].
  • Clears the parent-package attribute when present.
  • Good replacement for local sys.modules.pop(...) + delattr(parent, child) blocks.
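
The two clearing steps can be sketched as follows. This is an illustrative stand-in (clear_module_sketch is a hypothetical name), not the helper's actual source.

```python
import sys
import types


def clear_module_sketch(name):
    """Drop one cached module and the attribute its parent package holds for it."""
    sys.modules.pop(name, None)
    parent_name, _, child = name.rpartition(".")
    parent = sys.modules.get(parent_name) if parent_name else None
    if parent is not None and hasattr(parent, child):
        delattr(parent, child)


# Demo with throwaway modules so no real package is disturbed.
pkg = types.ModuleType("demo_pkg")
child = types.ModuleType("demo_pkg.child")
pkg.child = child
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.child"] = child

clear_module_sketch("demo_pkg.child")
cleared = "demo_pkg.child" not in sys.modules and not hasattr(pkg, "child")
sys.modules.pop("demo_pkg", None)  # tidy up the demo parent as well
```

Clearing the parent attribute matters because `from demo_pkg import child` would otherwise find the stale attribute on the parent package and skip the fresh import.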

tests.helpers.import_state.preserve_import_state

Use when a test temporarily installs stubs into sys.modules and needs deterministic cleanup afterward.

  • Context manager: restores both sys.modules entries and parent-package attributes on exit (normal or exception).
  • Useful around module-level stubs or temporary imports.
  • Prefer narrow, explicit module names over broad ones.
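
A context manager with this behavior could look roughly like the sketch below. The name preserve_import_state_sketch and its variadic signature are assumptions for illustration; the real helper may take different arguments.

```python
import contextlib
import sys
import types

_MISSING = object()  # sentinel: "this entry did not exist before"


@contextlib.contextmanager
def preserve_import_state_sketch(*names):
    """Snapshot sys.modules entries and parent attributes, restore on exit."""
    saved_modules = {name: sys.modules.get(name, _MISSING) for name in names}
    saved_attrs = []
    for name in names:
        parent_name, _, child = name.rpartition(".")
        parent = sys.modules.get(parent_name) if parent_name else None
        if parent is not None:
            saved_attrs.append((parent, child, getattr(parent, child, _MISSING)))
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        for name, module in saved_modules.items():
            if module is _MISSING:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module
        for parent, child, value in saved_attrs:
            if value is _MISSING:
                if hasattr(parent, child):
                    delattr(parent, child)
            else:
                setattr(parent, child, value)


# Demo: a stub installed inside the block is gone afterwards.
stub = types.ModuleType("demo_stub")
with preserve_import_state_sketch("demo_stub"):
    sys.modules["demo_stub"] = stub
    installed_inside = "demo_stub" in sys.modules
restored = "demo_stub" not in sys.modules
```

Passing explicit names keeps the snapshot small and makes it obvious in review which modules a test is allowed to disturb.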

tests.helpers.import_state.clear_fake_database_modules

Use only for the guarded fake/stub database cleanup pattern.

  • Preserves a real-looking core.database (one with a string __file__).
  • Removes a fake/stub core.database and the related src.database state.
  • Do not use as a general database reset fixture.
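
The guard described above can be illustrated with a small sketch. It operates on a plain dict standing in for sys.modules so the demo touches nothing global; the function name, its parameter, and the exact set of evicted names are assumptions.

```python
import types


def clear_fake_database_sketch(modules):
    """Evict core.database/src.database unless core.database looks real."""
    core_db = modules.get("core.database")
    if core_db is not None and isinstance(getattr(core_db, "__file__", None), str):
        return False  # real module (string __file__): leave everything in place
    for name in ("core.database", "src.database"):
        modules.pop(name, None)
    return True


# A "real-looking" module has a string __file__; a bare stub has none.
real = types.ModuleType("core.database")
real.__file__ = "/repo/core/database.py"
real_cache = {"core.database": real, "src.database": types.ModuleType("src.database")}
fake_cache = {
    "core.database": types.ModuleType("core.database"),
    "src.database": types.ModuleType("src.database"),
}

kept = clear_fake_database_sketch(real_cache)
evicted = clear_fake_database_sketch(fake_cache)
```

The string check is deliberately strict: a mock object can report a truthy attribute for almost anything, so requiring a real str avoids mistaking a stub for the genuine module.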

tests.helpers.import_state.clear_fake_endpoint_resolver_modules

Use only for the guarded fake/stub src.endpoint_resolver cleanup pattern.

  • Preserves real resolver modules (those with a truthy __file__).
  • Evicts fake/stub resolver modules and the dependent route modules that were cached against them.
  • Accepts explicit extra dependent module names to evict alongside the defaults.
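
The dependent-eviction part of this pattern is sketched below, again against a plain dict rather than sys.modules. DEFAULT_DEPENDENTS, the module names in it, and the extra_dependents parameter are hypothetical placeholders; the real defaults are defined by the helper.

```python
import types

# Hypothetical default dependents; the real list is defined by the helper.
DEFAULT_DEPENDENTS = ("routes.example_routes",)


def clear_fake_resolver_sketch(modules, extra_dependents=()):
    """Evict a fake resolver plus modules cached against it; return evicted names."""
    resolver = modules.get("src.endpoint_resolver")
    if resolver is None or getattr(resolver, "__file__", None):
        return []  # nothing cached, or a real module: evict nothing
    evicted = []
    for name in ("src.endpoint_resolver", *DEFAULT_DEPENDENTS, *extra_dependents):
        if modules.pop(name, None) is not None:
            evicted.append(name)
    return evicted


def _stub(name):
    return types.ModuleType(name)  # bare module: no __file__, so it counts as fake


cache = {
    "src.endpoint_resolver": _stub("src.endpoint_resolver"),
    "routes.example_routes": _stub("routes.example_routes"),
    "routes.extra": _stub("routes.extra"),
    "unrelated": _stub("unrelated"),
}
evicted_names = clear_fake_resolver_sketch(cache, extra_dependents=("routes.extra",))
```

Dependents are evicted together with the fake resolver because a route module imported while the stub was installed keeps a reference to it even after the stub leaves the module cache.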

tests.helpers.sqlite_db.make_temp_sqlite

Use for the repeated file-backed temp sqlite setup in tests.

  • Only constructs and returns the (SessionLocal, engine, tmpfile) objects that the repeated block used to build inline.
  • Does not patch modules and does not clean up the temp file.
  • The caller must bind SessionLocal explicitly onto whatever module the code under test reads, and must keep the returned objects alive.
  • Do not use it as a general DB fixture framework.

What not to abstract yet

Some remaining patterns should stay as-is for now rather than being forced into helpers:

  • Large mixed files such as security/review regression files.
  • Setup-oriented sys.modules stub installers.
  • One-off custom module patching.
  • DB/session/route setup, until it has been audited separately.

Validation expectations

Run validation locally before opening or approving a PR. Practical checks:

  • git diff --check - catch whitespace and conflict-marker errors.
  • python3 -m py_compile <changed files> - confirm changed files compile.
  • Focused pytest on the changed test files.
  • pytest on neighboring or order-sensitive test groups that share import state with the changed files.
  • grep for the old boilerplate when replacing it, to confirm no stragglers remain.
  • A fresh audit worktree when changing the helpers themselves, so stale __pycache__ or import state cannot mask a regression.

Current roadmap

  1. Import-state cleanup - complete.
  2. Document helper conventions (this file).
  3. Audit fake DB / SessionLocal / route setup duplication.
  4. Add tiny helpers only when the repeated semantics are clear.
  5. Start low-risk file moves only after helper conventions are documented.
  6. Avoid moving high-risk security/route regression files first.