mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-15 17:25:26 -04:00
fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON (#2205)
* fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON The extractor sliced from the FIRST brace to the LAST brace to recover JSON embedded in surrounding commentary. When the model emits stray braces before the JSON object, the slice produces invalid JSON, json.loads raises, and the exception is swallowed -- the skill is silently lost. Fix: walk each brace candidate left-to-right and attempt json.loads on each slice. The first candidate that parses successfully wins. If none parse, json.loads on the original text raises and the existing JSONDecodeError handler logs and returns None as before. Tested locally -- 8/8 tests passed: tests/test_extract_skill_json_nonstring.py (2 passed) tests/test_skill_extractor_rows.py (1 passed) tests/test_search_content_extraction_parity.py (2 passed) tests/test_deep_research_search_error.py (3 passed) Closes #2199 * test(skill-extractor): add focused repro for stray-brace JSON recovery * test(skill-extractor): add regression test for leading invalid-brace fragment Addresses the remaining edge case from review: a response that *starts* with a brace but the leading fragment isn't valid JSON (e.g. '{not json}\n{"title": "Valid later", ...}') still needs to recover the valid skill object that follows. _extract_json_object (already on dev) handles this correctly — it tries the whole de-fenced string first, then walks each '{' candidate left-to- right regardless of whether the response begins with '{', so the leading invalid fragment no longer short-circuits recovery of the real object. Updates the comment at the call site to call this out explicitly and adds a regression test covering exactly the scenario described in review.
This commit is contained in:
@@ -210,8 +210,10 @@ async def maybe_extract_skill(
|
||||
pass
|
||||
|
||||
# Parse JSON. The object may be wrapped in code fences or surrounded by
|
||||
# commentary (and may contain a stray brace before the real object), so
|
||||
# use a tolerant extractor that tries each '{' candidate.
|
||||
# commentary (and may contain a stray/invalid brace fragment before
|
||||
# the real object — including one that makes the response itself look
|
||||
# like it starts with '{'), so use a tolerant extractor that tries the
|
||||
# whole string first and then each '{' candidate left-to-right.
|
||||
data = _extract_json_object(response)
|
||||
if not data:
|
||||
logger.debug("[skill-extract] no JSON object found in response, dropping")
|
||||
|
||||
Reference in New Issue
Block a user