* fix(security): prevent ReDoS in XML and args tool-call parsers
Four py/polynomial-redos sinks in tool_parsing.py ran lazy/greedy regexes over
untrusted model output (tool-call markup is attacker-influenced via prompt
injection). When the closing delimiter was absent, each rescanned to
end-of-string from every opener -> O(n^2):
- args => { ... } in _parse_tool_call_block: greedy \{([\s\S]*)\} restarted
from every `args:{` opener. Now finds the opener once and takes through the
last `}` (rfind) — equivalent capture, O(n).
- _XML_INVOKE_RE: lazy <invoke ...>([\s\S]*?)</invoke>. Now _iter_xml_invoke
pairs each opener with the first reachable </invoke> and stops when none is reachable.
- _XML_DIRECT_TOOL_RE and the <tag>([\s\S]*?)</\1> param scan in
_parse_tool_code_block: lazy backreference patterns. Now _iter_backref_blocks
pairs each opener with the nearest matching closer and memoizes tag names
with no remaining closer, so an opener flood stays O(n).
All four are output-equivalent to the originals on well-formed tool-call markup;
the lazy patterns remain defined (still re-exported via agent_tools) but no
longer drive a finditer over untrusted text. Adds tests/test_redos_xml_tool_parsers.py
pinning correctness and bounding the opener-flood inputs (old paths took 4-15s).
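
The args capture described above can be sketched roughly as follows. This is a minimal illustration, not the code in tool_parsing.py: the opener pattern `_ARGS_OPEN_RE` and the helper name `extract_args_body` are assumptions made for the example.

```python
import re

# Hypothetical opener pattern; the real one in tool_parsing.py may differ.
_ARGS_OPEN_RE = re.compile(r"args\s*(?:=>|:)\s*\{")


def extract_args_body(block):
    """Return the text between the first args opener and the last '}'.

    Same capture as a greedy \\{([\\s\\S]*)\\} anchored at the first opener,
    but the opener is located once and the closer is found with one rfind.
    """
    m = _ARGS_OPEN_RE.search(block)
    if m is None:
        return None
    end = block.rfind("}")
    if end < m.end():
        # No closing brace after the opener: the greedy regex could not match.
        return None
    return block[m.end():end]
```

With no `}` anywhere in the input, the function returns after one search and one rfind instead of retrying from every opener.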
* fix(security): harden invoke-parameter and distinct-name tag scans
Makes forward-only the two residual ReDoS paths in the XML/tool parsers that
the outer-delimiter fix left quadratic:
- _parse_xml_invoke parsed <parameter> with _XML_PARAM_RE.finditer, so a
closed <invoke> body full of unclosed <parameter> openers rescanned the
body from every opener (O(n^2), ~11s at 8k openers). Now scans forward-only
via _iter_named_blocks, factored out of _iter_xml_invoke.
- _iter_backref_blocks only memoized repeated missing tag names; a flood of
distinct unclosed names searched the suffix once per name (O(n^2)). It now
indexes every closer by name in one linear pass and binary-searches per
opener (O(n log n)). Covers the direct and tool_code backref scans.
Output-equivalent to the prior scanners (200k randomized trials match the
memoized version for both the direct ci=True and tool_code ci=False configs).
Adds regressions for the closed-invoke parameter flood and the distinct-name
floods (45k openers now run in ~0.05s, were 5-6s).
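
A rough sketch of the closer index plus binary search, assuming a simple tag-name grammar and case-sensitive matching only. The function and pattern names here are illustrative and are not the actual _iter_backref_blocks implementation.

```python
import re
from bisect import bisect_left
from collections import defaultdict

# Assumed tag grammar for the sketch; the real scanners may accept more.
_OPEN_RE = re.compile(r"<([A-Za-z_][\w.-]*)>")
_CLOSE_RE = re.compile(r"</([A-Za-z_][\w.-]*)>")


def iter_backref_blocks(text):
    """Yield (name, body) for each <name>...</name>, nearest closer first."""
    # One linear pass: record the start offset of every closer, keyed by name.
    closers = defaultdict(list)
    for m in _CLOSE_RE.finditer(text):
        closers[m.group(1)].append(m.start())

    pos = 0
    while True:
        m = _OPEN_RE.search(text, pos)
        if m is None:
            return
        name = m.group(1)
        starts = closers.get(name, ())
        # Binary search for the first closer at or after the opener's end.
        i = bisect_left(starts, m.end())
        if i == len(starts):
            pos = m.end()  # unclosed opener: skip it without rescanning
            continue
        close_start = starts[i]
        yield name, text[m.end():close_start]
        pos = close_start + len(name) + 3  # skip past "</name>"
```

Because the offsets per name are appended in scan order, each list is already sorted, so a flood of distinct unclosed names costs one dictionary lookup per opener rather than one suffix search.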
* fix(security): prevent ReDoS in LLM-output tool/think parsers
The regexes that parse untrusted model output in text_helpers.py and
tool_parsing.py are delimiter-bounded with a lazy [\s\S]*? (or an
ambiguous (\s+[^>]*)?). Applied with re.sub/re.finditer over a whole
response, they degrade to O(n^2) when the closing delimiter is absent:
the engine rescans to end-of-string from every opener. Model output is
untrusted, so a prompt-injected or malicious model can stall the agent
loop with many unclosed openers (measured ~25s on a 60KB <thought flood).
- text_helpers.py: replace ambiguous <thought(\s+[^>]*)?> with
<thought([^>]*)> (identical capture, no \s+/[^>]* overlap); skip the
Gemma <|channel>...<channel|> subs when no <channel|> closer is present.
- tool_parsing.py: gate _TOOL_CALL_RE, _XML_TOOL_CALL_RE and _TOOL_CODE_RE
(in parse_tool_blocks and strip_tool_blocks) on a cheap presence check
for their closing delimiter. With no closer the regex cannot match, so
skipping is equivalent; only the wasted O(n^2) rescan is removed.
Resolves CodeQL py/polynomial-redos #230, #231, #232, #233, #235, #236,
#524. The _XML_OPEN_TOOL_CALL_RE alerts (#234, #477) are false positives
(its greedy [\s\S]*\Z is linear) and left untouched.
* fix(security): close ReDoS gaps in tool/think parsers from review
Addresses two review findings on the closer-guard approach:
- Whole-string "closer exists?" checks were bypassable: a stale closer
before an opener flood, or a closer with no reachable inner `}`, kept
the guard true while every opener still rescanned to end-of-string
(O(n^2)). Replace the substring guards with `_iter_delimited`, a
forward-only scan that pairs each opener with a *later* closer and
stops once none is reachable (O(n)). `parse_tool_blocks` and
`strip_tool_blocks` (via `_strip_delimited`) both use it for the
[TOOL_CALL], <tool_call>/<function_call>, and <tool_code> formats.
Verified equivalent to the original regexes on well-formed inputs.
- `<thought([^>]*)>` dropped the tag-name boundary and corrupted
unrelated tags (`<thoughtful>` -> `<thinkful>`). Use `<thought(\s[^>]*)?>`:
the single fixed `\s` keeps the pattern linear (no `\s+`/`[^>]*`
overlap) while restoring the boundary; capture is byte-for-byte
identical for real `<thought ...>` openers.
Adds regressions for stale-closer-before-opener, closer-present-without-
inner-brace, and the <thoughtful>/<thoughts> passthrough.
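
The boundary regression can be reproduced with the two patterns quoted above. The `to_think` helper is hypothetical and assumes normalization rewrites the opener to `<think...>` while keeping any attributes.

```python
import re

LOOSE = re.compile(r"<thought([^>]*)>")        # dropped the tag-name boundary
BOUNDED = re.compile(r"<thought(\s[^>]*)?>")   # single fixed \s restores it


def to_think(pattern, text):
    """Rewrite matched <thought ...> openers to <think ...> (illustrative)."""
    return pattern.sub(lambda m: "<think" + (m.group(1) or "") + ">", text)
```

`to_think(LOOSE, "<thoughtful>")` yields `<thinkful>`, while the bounded pattern leaves `<thoughtful>` and `<thoughts>` untouched and still rewrites `<thought>` and `<thought id="1">`.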
* fix(security): close Gemma channel ReDoS guard flagged in review
vdmkenny noted the same bypassable whole-string guard remained in
text_helpers.py: `if "<channel|>" in out.lower()` gating the Gemma
thought/response channel subs. A stale `<channel|>` before a
`<|channel>thought` opener flood keeps the guard true while every opener
still rescans to end-of-string (measured ~7.3s at 4k openers).
Replace it with `_sub_delimited`, the same forward-only scan used for the
tool-call parsers: pair each opener with a later closer, stop when none is
reachable (O(n)). Verified output-equivalent to the original capture regexes
on well-formed multi-channel inputs; the stale-closer case now runs in <2ms.
Adds a regression for stale-closer-before-opener on the Gemma path.
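
A minimal sketch of the forward-only substitution, assuming literal, case-sensitive delimiters. The signature is illustrative; the real `_sub_delimited` may differ (for example in case handling).

```python
def sub_delimited(text, opener, closer, repl):
    """Replace each opener...closer span with repl(body), scanning forward only.

    Each opener is paired with the first closer that starts after it. As soon
    as no later closer exists, the scan stops, so a stale closer ahead of an
    opener flood costs a single failed find rather than one per opener.
    """
    out = []
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            break
        body_start = start + len(opener)
        end = text.find(closer, body_start)
        if end == -1:
            break  # no reachable closer: nothing further can match
        out.append(text[pos:start])
        out.append(repl(text[body_start:end]))
        pos = end + len(closer)
    out.append(text[pos:])
    return "".join(out)
```

Every `find` resumes from where the previous one ended, so the total work is linear in the input length.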
* fix(security): harden strip_think() think-tag ReDoS flagged in review
The earlier fixes hardened normalize_thinking_markup and the delimiter
scanners, but the production entrypoint strip_think() still ran
_THINK_CLOSED_RE / _THINK_ATTR_RE / _THINK_OPEN_RE (and the stray-tag
_THINK_TAG_RE) over untrusted model output. Those kept the same ReDoS
shapes: the lazy `<open>[\s\S]*?</close>` rescanned to end-of-string from
every opener, and `(?:\s+[^>]*)?` / `[^>]*` attribute scans ran to
end-of-string from every opener on a "many openers, no closer" flood. On
the prior head, malformed `<think` / `<thinking` / `<thought` floods took
6-14s through strip_think(). The shipped `<thought>` normalization had the
same residual: the single-opener case was linear but an opener flood was
still O(n^2) (~4.4s).
- Replace the lazy multi-pass _THINK_CLOSED_RE loop with the existing
forward-only _sub_delimited scan (pair each opener with the first
reachable closer, stop when none is reachable). One pass collapses
sequential and nested blocks as before.
- Bound every opener/stray-tag attribute scan at `<` (`[^<>]` not `[^>]`)
so a no-`>` opener flood can't drive a single match attempt to
end-of-string. Identical capture for well-formed think/thought tags.
- email_helpers._strip_think: compute had_think from the single linear
_THINK_TAG_RE instead of the lazy closed/open `.search()` calls, which
had the same O(n^2) behavior on the email reply/summary/extraction paths.
All flood variants now finish in <10ms (were 6-14s). Output verified
byte-for-byte identical to the prior implementation over a 34-case corpus
(nested, mismatched, attr, uppercase, Gemma, prose, prompt-echo). Adds
strip_think() timing regressions for malformed openers, opener floods
(all three tag names), the closed-opener flood, and the malformed-closer
flood.
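
The effect of bounding the attribute scan at `<` can be illustrated with two simplified patterns. These are stand-ins for the example, not the actual _THINK_ATTR_RE / _THINK_OPEN_RE definitions.

```python
import re

# Simplified stand-ins for the think-tag opener patterns.
UNBOUNDED = re.compile(r"<think\b[^>]*>")   # attribute scan may run to end-of-string
BOUNDED = re.compile(r"<think\b[^<>]*>")    # attribute scan stops at the next '<'


def find_openers(pattern, text):
    """Return every opener the pattern matches in text."""
    return pattern.findall(text)
```

On well-formed tags both patterns capture the same text. On a flood such as `"<think " * n` with no `>`, each bounded attempt ends at the next `<`, so the attempts sum to O(n) instead of O(n^2). The two patterns differ only on malformed input where a `<` appears inside the opener.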
* docs: trim verbose comments in think-tag ReDoS fix
Detached bash jobs (#!bg) could be launched and auto-reported on completion,
but the agent had no way to act on a running one: no on-demand output read and
no kill (it blocked until the 1h max-runtime). bg_jobs had the pieces
(_read_output, list_for_session, internal _kill) but none was exposed.
Adds:
- bg_jobs.kill(job_id): tears down the process tree, marks the job killed, and
sets followed_up so the monitor does not also auto-continue a deliberate kill.
- manage_bg_jobs registry tool with actions list / output / kill, scoped to the
chat that launched the job (cross-session access reads as not found).
- Wiring: TOOL_HANDLERS/TAGS, function schema, RAG index + keyword hints, parser
name map, dispatch (threads session_id via _direct_fallback). Gated like bash
(NON_ADMIN_BLOCKED_TOOLS; plan-mode mutator).
- agent_loop: background-job intent regex maps to the files domain (and the tool
joins _DOMAIN_TOOL_MAP[files]) so short commands like 'kill that job' are not
dropped by the low-signal gate that skips tool retrieval.
- bg launch message tells the model to call manage_bg_jobs itself for check/stop
rather than printing raw tool syntax to the user.
Tests: tests/test_bg_job_tools.py (kill semantics, per-chat scoping, actions,
and the intent classifier).
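
The per-chat scoping and kill semantics can be sketched as below. `Job`, `JobTable` and `manage` are hypothetical names for illustration; the real bg_jobs module also tears down the process tree, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    session_id: str
    output: str = ""
    status: str = "running"
    followed_up: bool = False


class JobTable:
    """Hypothetical in-memory stand-in for the bg_jobs registry."""

    def __init__(self):
        self._jobs = {}

    def add(self, job):
        self._jobs[job.job_id] = job

    def manage(self, action, session_id, job_id=None):
        if action == "list":
            return sorted(j.job_id for j in self._jobs.values()
                          if j.session_id == session_id)
        job = self._jobs.get(job_id)
        # Jobs from another chat are reported exactly like missing jobs.
        if job is None or job.session_id != session_id:
            return "not found"
        if action == "output":
            return job.output
        if action == "kill":
            # A real implementation would terminate the process tree here.
            job.status = "killed"
            job.followed_up = True  # monitor must not auto-continue this job
            return "killed"
        return "unknown action"
```

Returning the same "not found" for foreign and missing jobs avoids revealing whether a job id exists in another chat.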
* fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models
_resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block
parser whenever a model produced no native tool_calls, regardless of whether
that model has a reliable native function-calling channel. Native models
(GPT/Claude/Grok/Qwen3/DeepSeek-V, etc. - _is_api_model true) commonly write
illustrative ```bash/```python/```json examples in guide-only prose; the
fallback parser matched these and executed them as real commands, sometimes
looping for several rounds as the model tried to clarify with more examples
(#3222).
Restrict the textual fenced-block fallback to non-native models, which rely
on it as their only tool-invocation channel. Native models are trusted to use
their structured tool_calls channel for real invocations; when they don't
emit one, a bare fence in their response is prose, not an action. The native
tool_calls path itself is untouched.
This sits one layer below #3088's guide-only policy enforcement: that PR
blocks tool exposure/execution on explicit no-tools requests, while this fixes
the parser so ordinary illustrative fences are never misread as calls in the
first place, on any turn.
* fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence
_resolve_tool_blocks previously short-circuited the entire textual parser
(tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native
function-calling models with no native tool_calls. That also dropped Patterns
2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content
as text), which are real calls a model couldn't emit on its structured channel
(e.g. DeepSeek-V falling back to DSML), not illustrative examples.
parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY
Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks
passes skip_fenced=is_api_model so fenced examples stop being executed for
native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and
recoverable. cleaned_round mirrors the same gate when persisting round text, so
an illustrative fence that wasn't executed isn't stripped from saved/reloaded
history either (previously the fence appeared while streaming and then disappeared on reload).
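
A simplified sketch of how a skip_fenced flag can gate only the fenced-block matcher. The helper names and the tuple result shape are assumptions for the example, and only the <tool_code> format stands in for Patterns 2-5.

```python
FENCE = "`" * 3  # built at runtime so the example can sit inside a fence
_FENCE_LANGS = ("bash", "python", "json")


def _fenced_blocks(text):
    """Pattern 1 stand-in: yield (lang, body) for each closed fenced block."""
    pos = 0
    while True:
        start = text.find(FENCE, pos)
        if start == -1:
            return
        nl = text.find("\n", start)
        if nl == -1:
            return
        end = text.find(FENCE, nl + 1)
        if end == -1:
            return
        lang = text[start + 3:nl].strip()
        if lang in _FENCE_LANGS:
            yield lang, text[nl + 1:end].strip()
        pos = end + 3


def _tool_code_blocks(text):
    """Explicit-markup stand-in: yield <tool_code>...</tool_code> bodies."""
    opener, closer = "<tool_code>", "</tool_code>"
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            return
        end = text.find(closer, start + len(opener))
        if end == -1:
            return
        yield "tool_code", text[start + len(opener):end]
        pos = end + len(closer)


def parse_tool_blocks(text, skip_fenced=False):
    blocks = []
    if not skip_fenced:
        blocks.extend(_fenced_blocks(text))  # only this pattern is gated
    blocks.extend(_tool_code_blocks(text))   # explicit markup stays active
    return blocks
```

A caller for a native function-calling model would pass `skip_fenced=True`, so an illustrative fence is ignored while leaked explicit markup is still recovered.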
Models (notably Gemini) emit a native 'google_search' function call, but the
agent loop had no mapping for it, so the call failed to convert, the round
produced 0 chars and 0 tool blocks, and generation died silently — the web
client hung on 'waiting for first token' with no error (also #443).
- Map google_search / google_search_retrieval / google_search_grounding to the
web_search tool, and read Gemini's 'queries' array (falling back to 'query').
- In stream_agent_loop, when a round yields no response text and no tool
events, emit a visible fallback message instead of leaving the user hanging.
- Give the unknown-tool execution branch an explicit exit_code=1 so the failure
is logged as an error rather than 'n/a'.
Unknown/unconvertible tool names still return None (unchanged) so they are
dropped safely rather than executed. Added tests covering the google_search
mapping, the queries array, and unknown/invalid-JSON returning None.
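
The mapping can be sketched as follows. The function name, the returned dict shape, and the choice to take the first non-empty entry of 'queries' are assumptions for illustration; the commit does not state how multiple queries are combined.

```python
import json

_SEARCH_ALIASES = {
    "google_search",
    "google_search_retrieval",
    "google_search_grounding",
}


def convert_native_call(name, arguments_json):
    """Map a native function call to a web_search call, or return None."""
    try:
        args = json.loads(arguments_json)
    except (TypeError, ValueError):
        return None  # invalid JSON: drop the call rather than execute it
    if not isinstance(args, dict) or name not in _SEARCH_ALIASES:
        return None  # unknown or unconvertible tool name
    query = None
    queries = args.get("queries")
    if isinstance(queries, list):
        # Assumption: use the first non-empty string in the array.
        query = next((q for q in queries if isinstance(q, str) and q.strip()), None)
    if query is None:
        query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        return None
    return {"tool": "web_search", "query": query}
```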
* feat(web-fetch): add web_fetch tool to read a specific URL's content
* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution
Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.
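
A minimal sketch of the fail-closed check, assuming every resolved address must be globally routable. `is_public_http_url` and the injectable `resolve` parameter are illustrative names, not the actual `_public_http_url` signature.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _resolve(host):
    """Resolve host to a list of address strings via the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_public_http_url(url, resolve=_resolve):
    """Return True only for http(s) URLs whose host resolves to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addrs = resolve(parts.hostname)
    except OSError:
        return False  # resolution error: fail closed
    if not addrs:
        return False  # empty resolution: fail closed rather than allow
    try:
        # Strip any IPv6 zone id ("%eth0") before parsing.
        ips = [ipaddress.ip_address(a.split("%")[0]) for a in addrs]
    except ValueError:
        return False
    # Reject if any address is loopback, private, link-local, or otherwise
    # not globally routable.
    return all(ip.is_global for ip in ips)
```

Redirects would need the same check applied to each hop, since a public URL can redirect into a private range.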