Commit Graph

5 Commits

Author SHA1 Message Date
Lucas Daniel 73315e6ddc fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON (#2205)
* fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON

The extractor sliced from the FIRST brace to the LAST brace to recover
JSON embedded in surrounding commentary. When the model emits stray
braces before the JSON object, the slice produces invalid JSON,
json.loads raises, and the exception is swallowed -- the skill is
silently lost.

Fix: walk each brace candidate left-to-right and attempt json.loads on
each slice. The first candidate that parses successfully wins. If none
parse, json.loads on the original text raises and the existing
JSONDecodeError handler logs and returns None as before.

Tested locally -- 8/8 tests passed:
  tests/test_extract_skill_json_nonstring.py (2 passed)
  tests/test_skill_extractor_rows.py (1 passed)
  tests/test_search_content_extraction_parity.py (2 passed)
  tests/test_deep_research_search_error.py (3 passed)

Closes #2199

* test(skill-extractor): add focused repro for stray-brace JSON recovery

* test(skill-extractor): add regression test for leading invalid-brace fragment

Addresses the remaining edge case from review: a response that *starts*
with a brace but the leading fragment isn't valid JSON (e.g.
'{not json}\n{"title": "Valid later", ...}') still needs to recover
the valid skill object that follows.

_extract_json_object (already on dev) handles this correctly — it tries
the whole de-fenced string first, then walks each '{' candidate left-to-
right regardless of whether the response begins with '{', so the leading
invalid fragment no longer short-circuits recovery of the real object.
Updates the comment at the call site to call this out explicitly and adds
a regression test covering exactly the scenario described in review.
2026-06-07 23:31:12 +01:00
Mazen Tamer Salah 92ef01d4fa fix(skills): tolerate a stray brace before the JSON in skill extraction (#2200)
maybe_extract_skill() sliced the LLM response from the first '{' to the
last '}'. When a model emits a stray brace in prose before the real
object (e.g. "uses {placeholder} then {...}"), the slice starts at the
prose brace, json.loads fails, and a valid skill is silently dropped.

Factor parsing into _extract_json_object(), which tries the whole
(de-fenced) string first and then each '{' start position, returning the
first candidate that parses to a JSON object.

Adds tests/test_skill_extractor_json.py.
2026-06-07 16:54:36 +02:00
red person ee8c049f9e Skip invalid skill extractor rows (#1546) 2026-06-03 14:06:53 +09:00
Tushar-Projects c3228f8b59 Background tasks: respect active session model fallback 2026-06-02 20:57:42 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00