fix(search): catch HTTPStatusError so 403/404 URLs degrade gracefully instead of 500 (#2203)

raise_for_status() raises httpx.HTTPStatusError for 4xx/5xx responses,
but the surrounding try/except only caught httpx.RequestError (network
errors) and RateLimitError (429). Any other HTTP error code propagated
uncaught up through chat_processor -> chat_helpers -> chat_routes and
surfaced as a 500 Internal Server Error.

Added an explicit except httpx.HTTPStatusError clause that logs a warning
and returns an empty result, matching the behaviour already in place for
network errors.

Also adds focused regression tests that exercise the real
fetch_webpage_content() path with a mocked _get_public_url:
- 403/404 responses return the standard empty-result shape instead of
  raising, proving the new HTTPStatusError handling works end to end.
- 429 responses still take their own dedicated rate-limit branch (the
  status_code == 429 check runs before raise_for_status() is reached),
  keeping that behaviour distinct from the new generic HTTPStatusError
  handling.

Dropped the unrelated builtin_mcp.py change that had been carried over
from a rebase; that fix is tracked separately in #2018 and this branch
should stay scoped to the search content fetch path.

Closes #2148
This commit is contained in:
Lucas Daniel
2026-06-07 21:09:21 -03:00
committed by GitHub
parent 77b75ca97e
commit fa7c4f8ea9
2 changed files with 84 additions and 0 deletions
+3
View File
@@ -259,6 +259,9 @@ def fetch_webpage_content(url: str, timeout: int = 5, retry_attempt: int = 0) ->
raise RateLimitError(f"Rate limit hit for {url} (attempt {retry_attempt})")
response.raise_for_status()
except httpx.HTTPStatusError as e:
error_logger.warning(f"HTTP {e.response.status_code} fetching {url}: {e}")
return _empty_result(url, f"HTTP {e.response.status_code}: {e}")
except httpx.RequestError as e:
error_logger.error(f"NetworkError fetching {url} (attempt {retry_attempt}): {e}")
return _empty_result(url, f"NetworkError: {e}")