Fix _parse_msg_content corrupting JSON-array-like text messages on reload (#2060)

_parse_msg_content deserializes stored multimodal content (image/audio
blocks) back into a list. It treated ANY string starting with '[{' and
containing the substring "type" as serialized content, requiring only
that each element be a dict — never that "type" be a real content-block
kind. So a plain text message whose content happens to be a JSON array
of typed objects (e.g. a user pasting an API schema sample like
[{"type": "object", ...}]) was silently parsed from str into a list on
the next hydration, destroying the original string. This runs on every
session load from the DB (_db_to_session -> get_session).

Restrict the round-trip to non-empty lists in which every element is a
dict whose "type" is a recognized block kind
(text/image/image_url/audio/...). Real multimodal content (verified:
document_processor emits exactly these) still round-trips; JSON-looking
text is left untouched.
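
The failure described above can be reproduced with a small sketch of the pre-fix check, reconstructed from the condition this commit removes. The name old_parse_msg_content and the trailing `return raw` fallback are assumptions for illustration; the diff does not show the rest of the function.

```python
import json


def old_parse_msg_content(raw):
    """Pre-fix check as it appears in the removed diff line (fallback assumed)."""
    if isinstance(raw, str) and raw.startswith('[{') and '"type"' in raw:
        try:
            parsed = json.loads(raw)
            # Old condition: any list of dicts counts as serialized content.
            if isinstance(parsed, list) and all(isinstance(p, dict) for p in parsed):
                return parsed
        except (json.JSONDecodeError, ValueError):
            pass
    return raw  # assumed fallback: not visible in the diff


# A plain text message that happens to be a JSON array of typed objects.
pasted_schema = '[{"type": "object", "properties": {}}]'
hydrated = old_parse_msg_content(pasted_schema)
print(type(hydrated).__name__)  # prints "list": the original str is gone
```

Because "object" is not a content-block kind, this message should have come back unchanged as a str.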
Afonso Coutinho
2026-06-27 14:31:51 +01:00
committed by GitHub
parent e3ecdd3207
commit edd5ea36ad
2 changed files with 92 additions and 1 deletions
+12 -1
@@ -40,7 +40,18 @@ def _parse_msg_content(raw):
     if isinstance(raw, str) and raw.startswith('[{') and '"type"' in raw:
         try:
             parsed = json.loads(raw)
-            if isinstance(parsed, list) and all(isinstance(p, dict) for p in parsed):
+            # Only treat as serialized multimodal content when EVERY element is
+            # a dict whose "type" is a recognized content-block kind. Otherwise a
+            # plain text message that merely *looks* like a JSON array of objects
+            # (e.g. a user pasting an API schema/sample with a "type" field) was
+            # silently parsed back into a list, destroying the original string.
+            _BLOCK_TYPES = {
+                "text", "image", "image_url", "audio", "input_audio",
+                "input_image", "document", "file",
+            }
+            if (isinstance(parsed, list) and parsed
+                    and all(isinstance(p, dict) and p.get("type") in _BLOCK_TYPES
+                            for p in parsed)):
                 return parsed
         except (json.JSONDecodeError, ValueError):
             pass
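
A self-contained sketch of the patched check, for trying the new behavior outside the codebase. The block-kind set is copied from the hunk; the public names parse_msg_content and BLOCK_TYPES, the module-level placement of the set, the `return raw` fallback, and the example URL are assumptions, since the hunk does not show the end of the function.

```python
import json

# Block kinds copied from the hunk; hoisted to module level here (an assumption).
BLOCK_TYPES = {
    "text", "image", "image_url", "audio", "input_audio",
    "input_image", "document", "file",
}


def parse_msg_content(raw):
    """Return a list only for genuine serialized content blocks, else raw."""
    if isinstance(raw, str) and raw.startswith('[{') and '"type"' in raw:
        try:
            parsed = json.loads(raw)
            if (isinstance(parsed, list) and parsed
                    and all(isinstance(p, dict) and p.get("type") in BLOCK_TYPES
                            for p in parsed)):
                return parsed
        except (json.JSONDecodeError, ValueError):
            pass
    return raw  # assumed fallback: not visible in the diff


# Genuine multimodal content still round-trips into a list.
blocks = [
    {"type": "text", "text": "describe this"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]
stored = json.dumps(blocks)
restored = parse_msg_content(stored)

# JSON-looking text with an unrecognized "type" is left as the original str.
pasted_schema = '[{"type": "object", "properties": {}}]'
untouched = parse_msg_content(pasted_schema)

# A single unrecognized element is enough to keep the whole message a str.
mixed = '[{"type": "text", "text": "a"}, {"type": "object"}]'
still_text = parse_msg_content(mixed)
```

The all-or-nothing rule is the conservative choice: a message is only converted when every element looks like a content block, so partially matching text is never rewritten.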