odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-28 15:45:22 -04:00

Author	SHA1	Message	Date
nopoz	fbe3a0d73b	fix(security): prevent ReDoS in XML and args tool-call parsers (#4941) * fix(security): prevent ReDoS in XML and args tool-call parsers Four py/polynomial-redos sinks in tool_parsing.py ran lazy/greedy regexes over untrusted model output (tool-call markup is attacker-influenced via prompt injection). When the closing delimiter was absent, each rescanned to end-of-string from every opener -> O(n^2): - args => { ... } in _parse_tool_call_block: greedy \{([\s\S]*)\} restarted from every `args:{` opener. Now finds the opener once and takes through the last `}` (rfind); equivalent capture, O(n). - _XML_INVOKE_RE: lazy <invoke ...>([\s\S]*?)</invoke>. Now _iter_xml_invoke pairs each opener with the first reachable </invoke> and stops when none is. - _XML_DIRECT_TOOL_RE and the <tag>([\s\S]*?)</\1> param scan in _parse_tool_code_block: lazy backreference patterns. Now _iter_backref_blocks pairs each opener with the nearest matching closer and memoizes tag names with no remaining closer, so an opener flood stays O(n). All four are output-equivalent to the originals on well-formed tool-call markup; the lazy patterns remain defined (still re-exported via agent_tools) but no longer drive a finditer over untrusted text. Adds tests/test_redos_xml_tool_parsers.py pinning correctness and bounding the opener-flood inputs (old paths took 4-15s). fix(security): harden invoke-parameter and distinct-name tag scans Forward-only the two residual ReDoS paths in the XML/tool parsers that the outer-delimiter fix left quadratic: - _parse_xml_invoke parsed <parameter> with _XML_PARAM_RE.finditer, so a closed <invoke> body full of unclosed <parameter> openers rescanned the body from every opener (O(n^2), ~11s at 8k openers). Now scans forward-only via _iter_named_blocks, factored out of _iter_xml_invoke. - _iter_backref_blocks only memoized repeated missing tag names; a flood of distinct unclosed names searched the suffix once per name (O(n^2)).
It now indexes every closer by name in one linear pass and binary-searches per opener (O(n log n)). Covers the direct and tool_code backref scans. Output-equivalent to the prior scanners (200k randomized trials match the memoized version for both the direct ci=True and tool_code ci=False configs). Adds regressions for the closed-invoke parameter flood and the distinct-name floods (45k openers now run in ~0.05s, were 5-6s).	2026-06-27 15:42:55 -07:00
nopoz	c098355778	fix(security): prevent ReDoS in LLM-output tool/think parsers (#4704) * fix(security): prevent ReDoS in LLM-output tool/think parsers The regexes that parse untrusted model output in text_helpers.py and tool_parsing.py are delimiter-bounded with a lazy [\s\S]*? (or an ambiguous (\s+[^>]*)?). Applied with re.sub/re.finditer over a whole response, they degrade to O(n^2) when the closing delimiter is absent: the engine rescans to end-of-string from every opener. Model output is untrusted, so a prompt-injected or malicious model can stall the agent loop with many unclosed openers (measured ~25s on a 60KB <thought flood). - text_helpers.py: replace ambiguous <thought(\s+[^>]*)?> with <thought([^>]*)> (identical capture, no \s+/[^>]* overlap); skip the Gemma <|channel>...<channel|> subs when no <channel|> closer is present. - tool_parsing.py: gate _TOOL_CALL_RE, _XML_TOOL_CALL_RE and _TOOL_CODE_RE (in parse_tool_blocks and strip_tool_blocks) on a cheap presence check for their closing delimiter. With no closer the regex cannot match, so skipping is equivalent; only the wasted O(n^2) rescan is removed. Resolves CodeQL py/polynomial-redos #230, #231, #232, #233, #235, #236, #524. The _XML_OPEN_TOOL_CALL_RE alerts (#234, #477) are false positives (its greedy [\s\S]*\Z is linear) and left untouched. fix(security): close ReDoS gaps in tool/think parsers from review Addresses two review findings on the closer-guard approach: - Whole-string "closer exists?" checks were bypassable: a stale closer before an opener flood, or a closer with no reachable inner `}`, kept the guard true while every opener still rescanned to end-of-string (O(n^2)). Replace the substring guards with `_iter_delimited`, a forward-only scan that pairs each opener with a later closer and stops once none is reachable (O(n)). `parse_tool_blocks` and `strip_tool_blocks` (via `_strip_delimited`) both use it for the [TOOL_CALL], <tool_call>/<function_call>, and <tool_code> formats.
Verified equivalent to the original regexes on well-formed inputs. - `<thought([^>]*)>` dropped the tag-name boundary and corrupted unrelated tags (`<thoughtful>` -> `<thinkful>`). Use `<thought(\s[^>]*)?>`: the single fixed `\s` keeps the pattern linear (no `\s+`/`[^>]*` overlap) while restoring the boundary; capture is byte-for-byte identical for real `<thought ...>` openers. Adds regressions for stale-closer-before-opener, closer-present-without-inner-brace, and the <thoughtful>/<thoughts> passthrough. fix(security): close Gemma channel ReDoS guard flagged in review vdmkenny noted the same bypassable whole-string guard remained in text_helpers.py: `if "<channel|>" in out.lower()` gating the Gemma thought/response channel subs. A stale `<channel|>` before a `<|channel>thought` opener flood keeps the guard true while every opener still rescans to end-of-string (measured ~7.3s at 4k openers). Replace it with `_sub_delimited`, the same forward-only scan used for the tool-call parsers: pair each opener with a later closer, stop when none is reachable (O(n)). Verified output-equivalent to the original capture regexes on well-formed multi-channel inputs; the stale-closer case now runs in <2ms. Adds a regression for stale-closer-before-opener on the Gemma path. * fix(security): harden strip_think() think-tag ReDoS flagged in review The earlier fixes hardened normalize_thinking_markup and the delimiter scanners, but the production entrypoint strip_think() still ran _THINK_CLOSED_RE / _THINK_ATTR_RE / _THINK_OPEN_RE (and the stray-tag _THINK_TAG_RE) over untrusted model output. Those kept the same ReDoS shapes: the lazy `<open>[\s\S]*?</close>` rescanned to end-of-string from every opener, and `(?:\s+[^>]*)?` / `[^>]*` attribute scans ran to end-of-string from every opener on a "many openers, no closer" flood. On the prior head, malformed `<think` / `<thinking` / `<thought` floods took 6-14s through strip_think().
The shipped `<thought>` normalization had the same residual: the single-opener case was linear but an opener flood was still O(n^2) (~4.4s). - Replace the lazy multi-pass _THINK_CLOSED_RE loop with the existing forward-only _sub_delimited scan (pair each opener with the first reachable closer, stop when none is reachable). One pass collapses sequential and nested blocks as before. - Bound every opener/stray-tag attribute scan at `<` (`[^<>]*` not `[^>]*`) so a no-`>` opener flood can't drive a single match attempt to end-of-string. Identical capture for well-formed think/thought tags. - email_helpers._strip_think: compute had_think from the single linear _THINK_TAG_RE instead of the lazy closed/open `.search()` calls, which had the same O(n^2) on the email reply/summary/extraction paths. All flood variants now finish in <10ms (were 6-14s). Output verified byte-for-byte identical to the prior implementation over a 34-case corpus (nested, mismatched, attr, uppercase, Gemma, prose, prompt-echo). Adds strip_think() timing regressions for malformed openers, opener floods (all three tag names), the closed-opener flood, and the malformed-closer flood. docs: trim verbose comments in think-tag ReDoS fix	2026-06-27 10:12:28 -07:00
Dividesbyzer0	2e16394b41	fix(agent): parse misfenced read_file calls (#4799)	2026-06-23 23:20:13 +02:00
pewdiepie-archdaemon	993d504de3	Clear remaining CodeQL path and parser alerts	2026-06-22 02:45:05 +00:00
pewdiepie-archdaemon	fbdec22dcb	CodeQL hardening for cookbook sync	2026-06-22 02:39:18 +00:00
pewdiepie-archdaemon	92daf4e560	Cookbook launch and gallery upload fixes	2026-06-22 01:49:15 +00:00
Kenny Van de Maele	cdae9879f2	feat(agent): add manage_bg_jobs tool to inspect and kill background bash jobs (#4577) Detached bash jobs (#!bg) could be launched and auto-reported on completion, but the agent had no way to act on a running one: no on-demand output read and no kill (it blocked until the 1h max-runtime). bg_jobs had the pieces (_read_output, list_for_session, internal _kill) but none was exposed. Adds: - bg_jobs.kill(job_id): tears down the process tree, marks the job killed, and sets followed_up so the monitor does not also auto-continue a deliberate kill. - manage_bg_jobs registry tool with actions list / output / kill, scoped to the chat that launched the job (cross-session access reads as not found). - Wiring: TOOL_HANDLERS/TAGS, function schema, RAG index + keyword hints, parser name map, dispatch (threads session_id via _direct_fallback). Gated like bash (NON_ADMIN_BLOCKED_TOOLS; plan-mode mutator). - agent_loop: background-job intent regex maps to the files domain (and the tool joins _DOMAIN_TOOL_MAP[files]) so short commands like 'kill that job' are not dropped by the low-signal gate that skips tool retrieval. - bg launch message tells the model to call manage_bg_jobs itself for check/stop rather than printing raw tool syntax to the user. Tests: tests/test_bg_job_tools.py (kill semantics, per-chat scoping, actions, and the intent classifier).	2026-06-19 00:28:22 -07:00
Dividesbyzer0	33c26bab88	fix(agent): parse raw json web search calls (#4088)	2026-06-15 15:19:38 +09:00
Lucas Daniel	0a324f20d2	fix(agent): stop treating illustrative Markdown fences as tool calls for native function-calling models (#3356 ) * fix(agent): stop executing illustrative Markdown fences as tool calls for native function-calling models _resolve_tool_blocks fell back to the textual parse_tool_blocks() fenced-block parser whenever a model produced no native tool_calls, regardless of whether that model has a reliable native function-calling channel. Native models (GPT/Claude/Grok/Qwen3/DeepSeek-V, etc. - _is_api_model true) commonly write illustrative ```bash/```python/```json examples in guide-only prose; the fallback parser matched these and executed them as real commands, sometimes looping for several rounds as the model tried to clarify with more examples (#3222). Restrict the textual fenced-block fallback to non-native models, which rely on it as their only tool-invocation channel. Native models are trusted to use their structured tool_calls channel for real invocations; when they don't emit one, a bare fence in their response is prose, not an action. The native tool_calls path itself is untouched. This sits one layer below #3088's guide-only policy enforcement: that PR blocks tool exposure/execution on explicit no-tools requests, while this fixes the parser so ordinary illustrative fences are never misread as calls in the first place, on any turn. * fix(agent): gate only the fenced-example pattern for native models, preserve DSML/invoke recovery and persistence _resolve_tool_blocks previously short-circuited the entire textual parser (tool_blocks = [] if is_api_model else parse_tool_blocks(...)) for native function-calling models with no native tool_calls. That also dropped Patterns 2-5 (explicit [TOOL_CALL]/<invoke>/<tool_code>/DSML markup leaked into content as text), which are real calls a model couldn't emit on its structured channel (e.g. DeepSeek-V falling back to DSML), not illustrative examples. 
parse_tool_blocks/strip_tool_blocks now take a skip_fenced flag that gates ONLY Pattern 1 (the fenced ```bash/```python/```json block matcher). _resolve_tool_blocks passes skip_fenced=is_api_model so fenced examples stop being executed for native models while [TOOL_CALL]/<invoke>/<tool_code>/DSML stay fully active and recoverable. cleaned_round mirrors the same gate when persisting round text, so an illustrative fence that wasn't executed isn't stripped from saved/reloaded history either (it was streaming once and then disappearing on reload).	2026-06-08 22:25:28 +02:00
Nicholai	33edc40eae	fix: route misfenced web lookups to web tools Fixes #3067	2026-06-06 03:46:31 -06:00
nubs	08e543d1ff	fix(tool-parsing): don't ship unconvertible <invoke> fence content to the code executor (#2926)	2026-06-05 21:08:54 +02:00
Afonso Coutinho	3175d7ca21	fix: tool-block parsing crashes on a non-string input (#1628)	2026-06-03 08:59:42 +09:00
Tatlatat	acfdcf346c	fix(agent): map native google_search and surface empty rounds Models (notably Gemini) emit a native 'google_search' function call, but the agent loop had no mapping for it, so the call failed to convert, the round produced 0 chars and 0 tool blocks, and generation died silently — the web client hung on 'waiting for first token' with no error (also #443). - Map google_search / google_search_retrieval / google_search_grounding to the web_search tool, and read Gemini's 'queries' array (falling back to 'query'). - In stream_agent_loop, when a round yields no response text and no tool events, emit a visible fallback message instead of leaving the user hanging. - Give the unknown-tool execution branch an explicit exit_code=1 so the failure is logged as an error rather than 'n/a'. Unknown/unconvertible tool names still return None (unchanged) so they are dropped safely rather than executed. Added tests covering the google_search mapping, the queries array, and unknown/invalid-JSON returning None.	2026-06-02 12:57:45 +09:00
Rifqi Akram	5b1e56407b	Add SSRF-guarded web fetch agent tool * feat(web-fetch): add web_fetch tool to read a specific URL's content * test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution Add explicit SSRF regression tests for the web_fetch path covering loopback, private LAN ranges, link-local/metadata, IPv6 private/local, redirect-into-private, and unsupported schemes. Harden _public_http_url to fail closed when a hostname resolves to no addresses.	2026-06-01 16:57:28 +09:00
pewdiepie-archdaemon	e5c99a5eee	Odysseus v1.0	2026-05-31 23:58:26 +09:00

15 Commits
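
The forward-only scan described in the ReDoS commits above (pair each opener with a later closer, stop once no closer is reachable) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `iter_delimited` and `strip_delimited` are hypothetical stand-ins for the `_iter_delimited` and `_strip_delimited` helpers named in the log, whose real signatures are not shown here, and the sketch assumes plain-string delimiters rather than regex openers.

```python
from typing import Iterator, Tuple


def iter_delimited(text: str, opener: str, closer: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, body) for each opener paired with the first later closer.

    Hypothetical helper for illustration only. Every index moves forward,
    so the total work is linear in len(text).
    """
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            return
        body_start = start + len(opener)
        end = text.find(closer, body_start)
        if end == -1:
            # No closer after this opener means none after any later opener
            # either, so stop here instead of rescanning from each opener.
            return
        yield start, end + len(closer), text[body_start:end]
        pos = end + len(closer)


def strip_delimited(text: str, opener: str, closer: str) -> str:
    """Remove every paired opener...closer block; leave unpaired openers untouched."""
    out = []
    last = 0
    for start, end, _body in iter_delimited(text, opener, closer):
        out.append(text[last:start])
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    sample = "a<t>1</t>b<t>2</t>c"
    print([body for _s, _e, body in iter_delimited(sample, "<t>", "</t>")])
    print(strip_delimited(sample, "<t>", "</t>"))
    # A stale closer followed by an opener flood ends after a single forward search.
    flood = "</t>" + "<t>" * 50_000
    print(len(list(iter_delimited(flood, "<t>", "</t>"))))
```

On well-formed input this pairs blocks the same way a lazy `opener([\s\S]*?)closer` pattern would under `finditer`; the difference is only that a missing closer ends the scan instead of triggering a fresh end-of-string search from every remaining opener.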