Commit Graph

12 Commits

Author SHA1 Message Date
ooovenenoso 725d174243 fix(research): track analyzed URLs separately (#3125)
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-10 12:08:22 +01:00
Wes Huber b9a96bca1a fix(research): avoid double split() call and potential IndexError (#2229)
cat.split()[0] was called in the condition and again in the body,
wasting a second split. More importantly, if cat were ever
whitespace-only, split() returns [] and [0] raises IndexError.
Assign to a local variable and guard with a truthiness check.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-07 16:46:21 +02:00
Giuseppe e87a1ad8d2 fix(deep-research): wrap fetched webpage content in untrusted-context sandbox
The goal-based extractor passed raw fetched webpage content straight
into the LLM prompt via string substitution, bypassing the
prompt-injection hardening layer in src/prompt_security.py.

Split EXTRACTOR_PROMPT into EXTRACTOR_SYSTEM (task instructions +
goal, trusted) and a second message built with untrusted_context_message()
(raw page content, sandboxed with <<<UNTRUSTED_SOURCE_DATA>>> guards).
This aligns the extractor with every other external-content injection
site in the codebase (agent_loop, chat_processor, chat_routes).

Fixes #3044

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 03:37:10 -06:00
michaelxer 71dda5b106 fix: respect user round count in deep research (#2896)
The STOP_PROMPT did not include the target round count, so the LLM
could decide to stop after 2-3 rounds even when the user requested 8.
Additionally, min_rounds was capped at 3 regardless of max_rounds.

- Add max_rounds to STOP_PROMPT so the LLM knows the target
- Change min_rounds from min(3, max_rounds) to max(2, max_rounds - 2)

Fixes #2863

Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
2026-06-05 20:49:42 +02:00
ooovenenoso ab5311c44d fix(research): support timeout defaults in direct tests (#2624)
fix(research): honor planning query timeouts
2026-06-04 20:23:17 +02:00
Afonso Coutinho 8a0b79bc84 fix: deep research runs the prompt's example queries when the model echoes them (#1666) 2026-06-03 14:23:07 +09:00
lekt8 1f743970dd Don't lose deep-research findings when synthesis times out (#1551) (#1562)
Two problems made deep research report "No information could be gathered" even
after it had extracted findings, on slow local models (reporter served a 20B
via LM Studio):

- _synthesize hard-capped its LLM call at timeout=60, while extraction uses the
  user's extraction_timeout (300s here) and the final report uses 180s. The slow
  model needed >60s to synthesize the round's findings, so synthesis timed out
  after 3 attempts. Raised it to 180s to match the final-report call.

- When synthesis produced no report (it returns the unchanged, still-empty
  report on failure during round 1), the run hit
  `if not report: return "No information could be gathered…"` and discarded the
  findings it had already gathered. Now it falls back to a compiled report built
  from those findings (_fallback_report) so the user keeps the gathered material.

Tests stub the LLM (no live model/DB), pin the synthesis timeout >= 180, that the
fallback surfaces the findings rather than the give-up message, and that a failed
synthesis preserves the previous report.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 08:11:44 +09:00
lekt8 ce7f5dbbdd Inject current date into deep research planning and query prompts (#1347)
Deep research generated search queries from the LLM's training-cutoff
knowledge, so it emitted stale-year queries like "best Python tutorials
2025" when the actual year is later (issue #1341). The chat/agent path
already grounds the model with "Today is ..." (src/agent_loop.py); the
deep research planning and query-generation prompts had no equivalent.

Add a small current_date_context() helper and prepend it at the plan and
query-generation prompt sites (and the research_handler plan preview path
that reuses RESEARCH_PLAN_PROMPT). System-TZ local, portable strftime.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 03:00:52 +09:00
SurprisedDuck 4307cac966 Research: report empty search provider results clearly
Deep Research surfaced 'Error: unknown error' whenever every search
provider returned an empty result set without raising (e.g. SearXNG is
reachable but all its engines fail internally). _last_search_error was
only set on exceptions, so the empty-but-no-exception path left it unset
and the caller fell back to 'unknown error'.

Record an actionable reason on that path naming the providers that were
tried, so users can tell it's a search-backend problem rather than a
model problem. The provider-raised path is unchanged.

Re: #344.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:34:25 +09:00
NovaUnboundAi 3319310942 Allow longer deep research extraction timeouts (#651)
Co-authored-by: NovaUnboundAi <NovaUnboundAi@users.noreply.github.com>
2026-06-02 11:50:03 +09:00
pewdiepie-archdaemon b998c52dd0 Add Deep Research extraction controls 2026-06-01 14:55:33 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00