mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-15 17:25:26 -04:00
34a3f8637a
* fix(memory): auto-memory extracted nothing — flatten window so the prompt ends on a user turn extract_and_store appended the recent window as raw alternating role messages after the system prompt. Since the window is the last N messages, the prompt usually ENDED on an assistant turn — and a chat model given a prompt ending on an assistant turn returns an empty completion (nothing to answer). The result was facts=[] → "Auto memory extraction ran: 0 candidates" on every run, so no memories were ever stored, while skill extraction (which flattens the transcript into a single user message) worked fine. Flatten the window into one user message ending with an explicit instruction, mirroring the skill extractor, so the model always responds. Also harden parsing for reasoning models, matching the audit path which already does this: - raise max_tokens 500 → 4096 (a reasoning model spends the budget on <think> before emitting JSON; 500 truncated it before any JSON appeared); - strip <think>/prose preambles via strip_think and slice the embedded JSON array before json.loads, instead of bombing on char 0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore: tighten memory-extraction-empty-completion — clarify JSON-slice comment re prior strip steps Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(memory): reframe the comment to the accurate root cause (raw-chat framing) The earlier comment leaned on "ends on an assistant turn -> empty completion", which is only one failure mode. The dominant cause, confirmed by a controlled repro (0/6 old vs 6/6 new on this model), is that passing the window as raw chat messages makes the model treat it as a conversation to continue rather than a transcript to analyze, so it returns [] even when durable facts are present. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(memory): cover extraction JSON parsing + slice trailing commentary unconditionally Factor the strip/fence/slice/json.loads logic out of extract_and_store into a pure module-level helper _parse_extraction_json(raw) -> list and drop the 'text[0] != "["' guard so the array is sliced whenever both brackets exist (fixes trailing commentary like '[...] Done!' reaching json.loads). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
37 lines
1.5 KiB
Python
37 lines
1.5 KiB
Python
"""_parse_extraction_json must survive reasoning-model noise.
|
|
|
|
The extraction model wraps its JSON array in <think> blocks, ```json fences,
|
|
or leading/trailing prose. The helper strips that noise and slices the array
|
|
unconditionally — a reply that starts with '[' can still carry trailing
|
|
commentary like "[...] Done!" that would otherwise break json.loads.
|
|
"""
|
|
|
|
from services.memory.memory_extractor import _parse_extraction_json
|
|
|
|
|
|
def test_think_prefixed_array_parses_to_one_fact():
|
|
raw = '<think>reasoning...</think>\n[{"text": "x", "category": "fact"}]'
|
|
assert _parse_extraction_json(raw) == [{"text": "x", "category": "fact"}]
|
|
|
|
|
|
def test_fenced_json_block_parses():
|
|
raw = '```json\n[{"text": "x", "category": "fact"}]\n```'
|
|
assert _parse_extraction_json(raw) == [{"text": "x", "category": "fact"}]
|
|
|
|
|
|
def test_leading_prose_before_array_parses():
|
|
raw = 'Here are the durable facts:\n[{"text": "x", "category": "fact"}]'
|
|
assert _parse_extraction_json(raw) == [{"text": "x", "category": "fact"}]
|
|
|
|
|
|
def test_trailing_commentary_after_array_parses():
|
|
# Exercises the unconditional slice: text starts with '[' but has trailing
|
|
# commentary that the old `text[0] != "["` guard skipped, breaking json.loads.
|
|
raw = '[{"text": "x", "category": "fact"}] Done!'
|
|
assert _parse_extraction_json(raw) == [{"text": "x", "category": "fact"}]
|
|
|
|
|
|
def test_malformed_no_array_returns_empty():
|
|
assert _parse_extraction_json("no array here, sorry") == []
|
|
assert _parse_extraction_json("") == []
|