fix(agent): stop treating illustrative Markdown fences as tool calls for native function-calling models (#3356)

* fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models

_resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block
parser whenever a model produced no native tool_calls, regardless of whether
that model has a reliable native function-calling channel. Native models
(GPT/Claude/Grok/Qwen3/DeepSeek-V, etc., i.e. those where _is_api_model is true) commonly write
illustrative ```bash/```python/```json examples in guide-only prose; the
fallback parser matched these and executed them as real commands, sometimes
looping for several rounds as the model tried to clarify with more examples
(#3222).

Restrict the textual fenced-block fallback to non-native models, which rely
on it as their only tool-invocation channel. Native models are trusted to use
their structured tool_calls channel for real invocations; when they don't
emit one, a bare fence in their response is prose, not an action. The native
tool_calls path itself is untouched.

This sits one layer below #3088's guide-only policy enforcement: that PR
blocks tool exposure/execution on explicit no-tools requests, while this fixes
the parser so ordinary illustrative fences are never misread as calls in the
first place, on any turn.
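
A minimal sketch of the gate described in this first commit, under stated assumptions: `resolve_tool_blocks` and `parse_fenced_blocks` are hypothetical stand-ins for `_resolve_tool_blocks` and `parse_tool_blocks`, and the regex is a simplified guess at a fenced-block matcher, not the project's real pattern.

```python
import re
from typing import List

FENCE = "`" * 3  # built at runtime so this example does not nest literal fences

# Hypothetical, simplified fenced-block matcher (assumption, not the real regex).
_FENCED_RE = re.compile(FENCE + r"(bash|python|json)\n(.*?)" + FENCE, re.DOTALL)


def parse_fenced_blocks(text: str) -> List[str]:
    """Stand-in for the textual fallback parser: return fenced block bodies."""
    return [m.group(2).strip() for m in _FENCED_RE.finditer(text)]


def resolve_tool_blocks(content: str, native_tool_calls: List[str],
                        is_api_model: bool) -> List[str]:
    """Pick the tool invocations to execute for one model response."""
    # The native structured channel always wins and is left untouched.
    if native_tool_calls:
        return list(native_tool_calls)
    # Native model without tool_calls: any fence in the reply is prose.
    if is_api_model:
        return []
    # Non-native model: the textual parser is its only invocation channel.
    return parse_fenced_blocks(content)


reply = "Run it like this:\n" + FENCE + "bash\nls -la\n" + FENCE + "\n"
print(resolve_tool_blocks(reply, [], is_api_model=True))
print(resolve_tool_blocks(reply, [], is_api_model=False))
```

With the same reply text, the native model yields no tool blocks while the non-native model still has its fenced command picked up.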

* fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence

_resolve_tool_blocks previously short-circuited the entire textual parser
(tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native
function-calling models with no native tool_calls. That also dropped Patterns
2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content
as text), which are real calls a model couldn't emit on its structured channel
(e.g. DeepSeek-V falling back to DSML), not illustrative examples.

parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY
Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks
passes skip_fenced=is_api_model so fenced examples stop being executed for
native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and
recoverable. cleaned_round mirrors the same gate when persisting round text, so
an illustrative fence that wasn't executed isn't stripped from saved/reloaded
history either (it was streaming once and then disappearing on reload).
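
The narrower gate from this second commit can be sketched roughly as follows. This is an illustration only: `parse_blocks` and `strip_blocks` are hypothetical simplifications of `parse_tool_blocks` and `strip_tool_blocks`, both regexes are assumed, and the `[TOOL_CALL]...[/TOOL_CALL]` delimiter shape is a guess rather than the project's actual markup.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # built at runtime so this example does not nest literal fences

# Simplified, assumed patterns; the real ones live in the patched module.
FENCED_RE = re.compile(FENCE + r"(bash|python|json)\n(.*?)" + FENCE, re.DOTALL)
TOOL_CALL_RE = re.compile(r"\[TOOL_CALL\](.*?)\[/TOOL_CALL\]", re.DOTALL)


def parse_blocks(text: str, skip_fenced: bool = False) -> List[Tuple[str, str]]:
    """Return (kind, body) pairs; skip_fenced gates only the fenced pattern."""
    blocks: List[Tuple[str, str]] = []
    if not skip_fenced:
        for m in FENCED_RE.finditer(text):
            blocks.append((m.group(1), m.group(2).strip()))
    # Explicit markup is tried only when no fenced blocks were collected.
    if not blocks:
        for m in TOOL_CALL_RE.finditer(text):
            blocks.append(("tool_call", m.group(1).strip()))
    return blocks


def strip_blocks(text: str, skip_fenced: bool = False) -> str:
    """Mirror the parser's gate: unexecuted fences stay in the saved text."""
    cleaned = text if skip_fenced else FENCED_RE.sub("", text)
    return TOOL_CALL_RE.sub("", cleaned).strip()


reply = ("Example:\n" + FENCE + "bash\necho hi\n" + FENCE
         + "\n[TOOL_CALL]search(q)[/TOOL_CALL]")
print(parse_blocks(reply, skip_fenced=True))
print(strip_blocks(reply, skip_fenced=True))
```

Passing the same `skip_fenced` value to both functions is the point of the design: a fence that was not executed is also not stripped, so the persisted round matches what was streamed.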
Lucas Daniel
2026-06-08 17:25:28 -03:00
committed by GitHub
parent 8e494cc1c4
commit 0a324f20d2
3 changed files with 363 additions and 30 deletions
+49 -26
@@ -427,7 +427,7 @@ def _parse_tool_code_block(raw: str) -> Optional[ToolBlock]:
return None
-def parse_tool_blocks(text: str) -> List[ToolBlock]:
+def parse_tool_blocks(text: str, skip_fenced: bool = False) -> List[ToolBlock]:
     """Extract executable tool blocks from LLM response text.
     Supports multiple formats:
@@ -436,6 +436,17 @@ def parse_tool_blocks(text: str) -> List[ToolBlock]:
     3. XML-style <tool_call>/<invoke> blocks
     4. <tool_code> blocks (MiniMax-M2.5 style)
     5. DeepSeek DSML markup (normalized to <invoke> first)
+    `skip_fenced`: when True, Pattern 1 (fenced ```bash/```python/```json code
+    blocks) is not matched at all. Native function-calling models (GPT/Claude/
+    Grok/Qwen3/DeepSeek-V, etc.) commonly write illustrative fenced examples in
+    prose; for those models we trust the structured tool_calls channel for real
+    invocations and treat a bare fence as display text rather than an action
+    (issue #3222). Patterns 2-5 — explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML
+    markup that leaked into content as text — stay fully active regardless,
+    since that markup is never an illustrative example and dropping it would
+    silently lose real calls (e.g. DeepSeek-V falling back to DSML when it
+    can't emit structured tool_calls).
     """
     blocks = []
@@ -443,30 +454,31 @@ def parse_tool_blocks(text: str) -> List[ToolBlock]:
     # XML patterns below catch it.
     text = _normalize_dsml(text)
-    # Pattern 1: fenced code blocks
-    for m in _TOOL_BLOCK_RE.finditer(text):
-        tag = m.group(1).lower()
-        content = m.group(2).strip()
-        if not content:
-            continue
-        # If a code block's content is an <invoke> XML call (some models wrap
-        # tool calls in ```python or ```xml fences), parse the invoke instead.
-        if '<invoke' in content:
-            for inv in _XML_INVOKE_RE.finditer(content):
-                block = _parse_xml_invoke(inv)
+    # Pattern 1: fenced code blocks (skipped when `skip_fenced` — see docstring).
+    if not skip_fenced:
+        for m in _TOOL_BLOCK_RE.finditer(text):
+            tag = m.group(1).lower()
+            content = m.group(2).strip()
+            if not content:
+                continue
+            # If a code block's content is an <invoke> XML call (some models wrap
+            # tool calls in ```python or ```xml fences), parse the invoke instead.
+            if '<invoke' in content:
+                for inv in _XML_INVOKE_RE.finditer(content):
+                    block = _parse_xml_invoke(inv)
+                    if block:
+                        blocks.append(block)
+                # This fenced block is <invoke> markup, not literal code. Whether or
+                # not any call converted, never fall through to append the raw XML as
+                # a python/bash block — e.g. a hyphenated/namespaced tool name that
+                # _XML_INVOKE_RE's \w+ can't match would otherwise be executed as code.
+                continue
+            if tag in ("python", "bash"):
+                block = _parse_misfenced_web_lookup(content)
                 if block:
                     blocks.append(block)
-            # This fenced block is <invoke> markup, not literal code. Whether or
-            # not any call converted, never fall through to append the raw XML as
-            # a python/bash block — e.g. a hyphenated/namespaced tool name that
-            # _XML_INVOKE_RE's \w+ can't match would otherwise be executed as code.
-            continue
-        if tag in ("python", "bash"):
-            block = _parse_misfenced_web_lookup(content)
-            if block:
-                blocks.append(block)
-                continue
-        blocks.append(ToolBlock(tag, content))
+                    continue
+            blocks.append(ToolBlock(tag, content))
     # Pattern 2: [TOOL_CALL] blocks (only if no fenced blocks found)
     if not blocks:
@@ -500,12 +512,23 @@ def parse_tool_blocks(text: str) -> List[ToolBlock]:
     return blocks
-def strip_tool_blocks(text: str) -> str:
-    """Remove executable tool blocks from text for clean display."""
+def strip_tool_blocks(text: str, skip_fenced: bool = False) -> str:
+    """Remove executable tool blocks from text for clean display.
+    `skip_fenced`: when True, fenced ```bash/```python/```json code blocks
+    (Pattern 1) are left intact instead of being stripped. This must mirror
+    whatever `skip_fenced` value `parse_tool_blocks` was called with for the
+    same response: if a fence wasn't executed as a tool call (because it's an
+    illustrative example from a native function-calling model), it shouldn't
+    vanish from the persisted/displayed text either — otherwise the example
+    streams once and then disappears on reload (issue #3222 follow-up).
+    Patterns 2-5 + DSML markup are always stripped, since that markup should
+    never reach the user regardless of whether it converted to a tool call.
+    """
     # Normalize DSML first so its markup gets stripped by the <invoke>
     # / <tool_call> removers below instead of leaking to the user.
     text = _normalize_dsml(text)
-    cleaned = _TOOL_BLOCK_RE.sub('', text)
+    cleaned = text if skip_fenced else _TOOL_BLOCK_RE.sub('', text)
     cleaned = _TOOL_CALL_RE.sub('', cleaned)
     cleaned = _XML_TOOL_CALL_RE.sub('', cleaned)
     cleaned = _TOOL_CODE_RE.sub('', cleaned)