mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-16 17:55:26 -04:00
fix(agent): stop treating illustrative Markdown fences as tool calls for native function-calling models (#3356)
* fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models

  _resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block parser whenever a model produced no native tool_calls, regardless of whether that model has a reliable native function-calling channel. Native models (GPT/Claude/Grok/Qwen3/DeepSeek-V, etc. - _is_api_model true) commonly write illustrative ```bash/```python/```json examples in guide-only prose; the fallback parser matched these and executed them as real commands, sometimes looping for several rounds as the model tried to clarify with more examples (#3222).

  Restrict the textual fenced-block fallback to non-native models, which rely on it as their only tool-invocation channel. Native models are trusted to use their structured tool_calls channel for real invocations; when they don't emit one, a bare fence in their response is prose, not an action. The native tool_calls path itself is untouched.

  This sits one layer below #3088's guide-only policy enforcement: that PR blocks tool exposure/execution on explicit no-tools requests, while this fixes the parser so ordinary illustrative fences are never misread as calls in the first place, on any turn.

* fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence

  _resolve_tool_blocks previously short-circuited the entire textual parser (tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native function-calling models with no native tool_calls. That also dropped Patterns 2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content as text), which are real calls a model couldn't emit on its structured channel (e.g. DeepSeek-V falling back to DSML), not illustrative examples.

  parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks passes skip_fenced=is_api_model so fenced examples stop being executed for native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and recoverable.

  cleaned_round mirrors the same gate when persisting round text, so an illustrative fence that wasn't executed isn't stripped from saved/reloaded history either (it was streaming once and then disappearing on reload).
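The gating described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the project's code: the regexes, the `[/TOOL_CALL]` closing tag, and the `*_sketch` function names are hypothetical stand-ins chosen for illustration, and only two of the five patterns are modelled.

```python
import re
from typing import List, NamedTuple, Sequence


class ToolBlock(NamedTuple):
    tag: str
    content: str


# Built at runtime so this example contains no literal fence of its own.
FENCE = "`" * 3

# Hypothetical stand-ins for the real patterns (assumptions, not the project's regexes).
FENCED_RE = re.compile(FENCE + r"(bash|python|json)\s*\n(.*?)" + FENCE, re.DOTALL)
TOOL_CALL_RE = re.compile(r"\[TOOL_CALL\](.*?)\[/TOOL_CALL\]", re.DOTALL)


def parse_tool_blocks_sketch(text: str, skip_fenced: bool = False) -> List[ToolBlock]:
    """Parse tool blocks; `skip_fenced` disables only the fenced-block pattern."""
    blocks: List[ToolBlock] = []
    if not skip_fenced:
        for m in FENCED_RE.finditer(text):
            content = m.group(2).strip()
            if content:
                blocks.append(ToolBlock(m.group(1).lower(), content))
    # Explicit markup stays active regardless of skip_fenced.
    if not blocks:
        for m in TOOL_CALL_RE.finditer(text):
            content = m.group(1).strip()
            if content:
                blocks.append(ToolBlock("tool_call", content))
    return blocks


def resolve_tool_blocks_sketch(
    text: str, native_calls: Sequence[ToolBlock], is_api_model: bool
) -> List[ToolBlock]:
    """Prefer native calls; otherwise fall back to text parsing, gated by model type."""
    if native_calls:
        return list(native_calls)
    return parse_tool_blocks_sketch(text, skip_fenced=is_api_model)
```

With this shape, a native model's illustrative fence yields no blocks, a non-native model's fence is still parsed, and leaked `[TOOL_CALL]` markup is recovered in both cases.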
+49
-26
@@ -427,7 +427,7 @@ def _parse_tool_code_block(raw: str) -> Optional[ToolBlock]:
     return None


-def parse_tool_blocks(text: str) -> List[ToolBlock]:
+def parse_tool_blocks(text: str, skip_fenced: bool = False) -> List[ToolBlock]:
     """Extract executable tool blocks from LLM response text.

     Supports multiple formats:
@@ -436,6 +436,17 @@ def parse_tool_blocks(text: str) -> List[ToolBlock]:
     3. XML-style <tool_call>/<invoke> blocks
     4. <tool_code> blocks (MiniMax-M2.5 style)
     5. DeepSeek DSML markup (normalized to <invoke> first)
+
+    `skip_fenced`: when True, Pattern 1 (fenced ```bash/```python/```json code
+    blocks) is not matched at all. Native function-calling models (GPT/Claude/
+    Grok/Qwen3/DeepSeek-V, etc.) commonly write illustrative fenced examples in
+    prose; for those models we trust the structured tool_calls channel for real
+    invocations and treat a bare fence as display text rather than an action
+    (issue #3222). Patterns 2-5 — explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML
+    markup that leaked into content as text — stay fully active regardless,
+    since that markup is never an illustrative example and dropping it would
+    silently lose real calls (e.g. DeepSeek-V falling back to DSML when it
+    can't emit structured tool_calls).
     """
     blocks = []

@@ -443,30 +454,31 @@ def parse_tool_blocks(text: str) -> List[ToolBlock]:
     # XML patterns below catch it.
     text = _normalize_dsml(text)

-    # Pattern 1: fenced code blocks
-    for m in _TOOL_BLOCK_RE.finditer(text):
-        tag = m.group(1).lower()
-        content = m.group(2).strip()
-        if not content:
-            continue
-        # If a code block's content is an <invoke> XML call (some models wrap
-        # tool calls in ```python or ```xml fences), parse the invoke instead.
-        if '<invoke' in content:
-            for inv in _XML_INVOKE_RE.finditer(content):
-                block = _parse_xml_invoke(inv)
+    # Pattern 1: fenced code blocks (skipped when `skip_fenced` — see docstring).
+    if not skip_fenced:
+        for m in _TOOL_BLOCK_RE.finditer(text):
+            tag = m.group(1).lower()
+            content = m.group(2).strip()
+            if not content:
+                continue
+            # If a code block's content is an <invoke> XML call (some models wrap
+            # tool calls in ```python or ```xml fences), parse the invoke instead.
+            if '<invoke' in content:
+                for inv in _XML_INVOKE_RE.finditer(content):
+                    block = _parse_xml_invoke(inv)
+                    if block:
+                        blocks.append(block)
+                # This fenced block is <invoke> markup, not literal code. Whether or
+                # not any call converted, never fall through to append the raw XML as
+                # a python/bash block — e.g. a hyphenated/namespaced tool name that
+                # _XML_INVOKE_RE's \w+ can't match would otherwise be executed as code.
+                continue
+            if tag in ("python", "bash"):
+                block = _parse_misfenced_web_lookup(content)
                 if block:
                     blocks.append(block)
-            # This fenced block is <invoke> markup, not literal code. Whether or
-            # not any call converted, never fall through to append the raw XML as
-            # a python/bash block — e.g. a hyphenated/namespaced tool name that
-            # _XML_INVOKE_RE's \w+ can't match would otherwise be executed as code.
-            continue
-        if tag in ("python", "bash"):
-            block = _parse_misfenced_web_lookup(content)
-            if block:
-                blocks.append(block)
-                continue
-        blocks.append(ToolBlock(tag, content))
+                    continue
+            blocks.append(ToolBlock(tag, content))

     # Pattern 2: [TOOL_CALL] blocks (only if no fenced blocks found)
     if not blocks:
@@ -500,12 +512,23 @@ def parse_tool_blocks(text: str) -> List[ToolBlock]:
     return blocks


-def strip_tool_blocks(text: str) -> str:
-    """Remove executable tool blocks from text for clean display."""
+def strip_tool_blocks(text: str, skip_fenced: bool = False) -> str:
+    """Remove executable tool blocks from text for clean display.
+
+    `skip_fenced`: when True, fenced ```bash/```python/```json code blocks
+    (Pattern 1) are left intact instead of being stripped. This must mirror
+    whatever `skip_fenced` value `parse_tool_blocks` was called with for the
+    same response: if a fence wasn't executed as a tool call (because it's an
+    illustrative example from a native function-calling model), it shouldn't
+    vanish from the persisted/displayed text either — otherwise the example
+    streams once and then disappears on reload (issue #3222 follow-up).
+    Patterns 2-5 + DSML markup are always stripped, since that markup should
+    never reach the user regardless of whether it converted to a tool call.
+    """
     # Normalize DSML first so its markup gets stripped by the <invoke>
     # / <tool_call> removers below instead of leaking to the user.
     text = _normalize_dsml(text)
-    cleaned = _TOOL_BLOCK_RE.sub('', text)
+    cleaned = text if skip_fenced else _TOOL_BLOCK_RE.sub('', text)
     cleaned = _TOOL_CALL_RE.sub('', cleaned)
     cleaned = _XML_TOOL_CALL_RE.sub('', cleaned)
     cleaned = _TOOL_CODE_RE.sub('', cleaned)
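
The strip-side mirroring described in the strip_tool_blocks docstring can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the regexes, the `[/TOOL_CALL]` closing tag, and the names `strip_tool_blocks_sketch` and `clean_round_sketch` are hypothetical, and only the fenced and [TOOL_CALL] patterns are modelled.

```python
import re

# Built at runtime so this example contains no literal fence of its own.
FENCE = "`" * 3

# Hypothetical stand-ins for the real patterns (assumptions, not the project's regexes).
FENCED_RE = re.compile(FENCE + r"(?:bash|python|json)\s*\n.*?" + FENCE, re.DOTALL)
TOOL_CALL_RE = re.compile(r"\[TOOL_CALL\].*?\[/TOOL_CALL\]", re.DOTALL)


def strip_tool_blocks_sketch(text: str, skip_fenced: bool = False) -> str:
    """Strip tool markup; keep fenced examples when `skip_fenced` is True."""
    cleaned = text if skip_fenced else FENCED_RE.sub("", text)
    # Explicit tool markup is always removed, whether or not it was executed.
    cleaned = TOOL_CALL_RE.sub("", cleaned)
    return cleaned.strip()


def clean_round_sketch(text: str, is_api_model: bool) -> str:
    """Persist round text using the same gate that the parser was called with."""
    return strip_tool_blocks_sketch(text, skip_fenced=is_api_model)
```

Passing the same flag to both the parser and the stripper is what keeps an unexecuted example in saved history: a fence that was never treated as a call is also never removed from the stored text.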