* fix(memory): auto-memory extracted nothing — flatten window so the prompt ends on a user turn
extract_and_store appended the recent window as raw alternating role messages
after the system prompt. Since the window is the last N messages, the prompt
usually ENDED on an assistant turn — and a chat model given a prompt ending on
an assistant turn returns an empty completion (nothing to answer). The result
was facts=[] → "Auto memory extraction ran: 0 candidates" on every run, so no
memories were ever stored, while skill extraction (which flattens the transcript
into a single user message) worked fine.
Flatten the window into one user message ending with an explicit instruction,
mirroring the skill extractor, so the model always responds. Also harden parsing
for reasoning models, matching the audit path which already does this:
- raise max_tokens 500 → 4096 (a reasoning model spends the budget on <think>
before emitting JSON; 500 truncated it before any JSON appeared);
- strip <think>/prose preambles via strip_think and slice the embedded JSON
array before json.loads, instead of bombing on char 0.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* chore: tighten memory-extraction-empty-completion — clarify JSON-slice comment re prior strip steps
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(memory): reframe the comment to the accurate root cause (raw-chat framing)
The earlier comment leaned on "ends on an assistant turn -> empty completion",
which is only one failure mode. The dominant cause, confirmed by a controlled
repro (0/6 old vs 6/6 new on this model), is that passing the window as raw chat
messages makes the model treat it as a conversation to continue rather than a
transcript to analyze, so it returns [] even when durable facts are present.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(memory): cover extraction JSON parsing + slice trailing commentary unconditionally
Factor the strip/fence/slice/json.loads logic out of extract_and_store into
a pure module-level helper _parse_extraction_json(raw) -> list and drop the
'text[0] != "["' guard so the array is sliced whenever both brackets exist
(fixes trailing commentary like '[...] Done!' reaching json.loads).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
_fallback_memory_candidates matched both positive (prefer/like/love) and
negative (hate / do not like / don't like) sentiment verbs in one regex
alternation, then formatted every hit as "User prefers {X}.". So
"I hate cilantro" was stored as "User prefers cilantro." -- the inverse of
what the user said. These fallback facts are persisted to memory and later
re-injected into the model's context, so the inverted preference actively
misleads the assistant.
Capture the matched verb and branch on it: negatives become
"User dislikes {X}.", positives stay "User prefers {X}." (still filed under
the existing "preference" category).
Supported by Claude Opus 4.8
Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>
extract_and_store dedups each extracted fact against the vector store
before the (owner-scoped) text fallback. The vector store is a single
shared ChromaDB collection storing only {"source": "memory"} — no
owner — and find_similar queries it with no owner filter, so it can
return a memory_id belonging to a different tenant. The old code
continue'd (skipped storing) on any vector hit without checking
ownership, so when ChromaDB is healthy (the common path) a user's
freshly-extracted fact was silently dropped because it was merely
semantically similar to another user's memory — the text fallback that
IS owner-scoped never ran. Gate the skip on the matched memory being
this user's own (or legacy unowned), mirroring the text dedup predicate;
cross-tenant or stale matches fall through. Same bug class as #1743.
audit_memories saves final_entries merged with other owners' entries
(correct), but then rebuilt the shared vector collection from
final_entries alone — wiping every other owner from semantic search
until they happened to run their own audit. Keyword fallback masked
it, so it degraded silently. Capture saved_entries once and rebuild
from that.
Caught by #1747.
Two independent data-integrity bugs:
- services/research/service.py: ResearchService.research() (the public deep-research
API, re-exported from services/__init__) treated the handler return value as a
dict (result.get("sources"/"summary"/...)), but call_research_service() returns a
formatted markdown STRING -> AttributeError: str has no attribute get on EVERY
successful call, making the API unusable for any non-error result. Now uses the
string report as the summary and parses sources from the "### Sources" markdown
section (section-bounded, URL-deduped), with a defensive dict branch for back-compat.
- services/memory/memory_extractor.py: extract_and_store guarded the vector-store
find_similar/add calls only with the .healthy flag set ONCE at init. If the
embedding/ChromaDB backend degraded LATER (OOM, evicted model, remote endpoint
down), those calls raised, the exception escaped the dedup loop, skipped
memory_manager.save(), and was swallowed by the outer try/except -> EVERY
validated fact from the session was silently lost (the function docstring
promises "never raised"). Now falls back to the existing text/fuzzy dedup so
facts are still saved when the vector index is unavailable at runtime.
Tests: test_research_service.py, test_memory_extractor_vector_degraded.py.