mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-19 19:25:27 -04:00
refactor(search): centralize the web-scraping User-Agent into one constant (#4325)
The outbound UA for web_fetch / web_search was inlined in four places with two different values and nothing keeping them current: content.py pinned a mid-2021 Chrome 91 build, and providers.py sent a bare Mozilla/5.0 in three spots. Some sites serve a degraded or blocked page to a UA that old.

Add WEB_FETCH_USER_AGENT to src/constants.py (env-overridable, matching the existing Copilot/Kimi UA-constant pattern) and import it in content.py and providers.py. Default to a current, common desktop UA so pages return their normal HTML: the market-leading desktop OS (Windows; NT 10.0 covers Windows 10 and 11) and browser (Chrome) on a current stable build. The version is now bumped in one place.

Service-specific self-identifying agents (Copilot, Kimi, webhooks, cookbook) are intentionally left separate.

Adds a regression test pinning the constant shape, the env override, and a guard against a new inline Mozilla literal in the search sources.

Closes #4324
committed by GitHub
parent b58af4267b
commit fafaf089c5
@@ -78,6 +78,13 @@ MAX_CONTEXT_MESSAGES = 90
 REQUEST_TIMEOUT = 20
 OPENAI_COMPAT_PATH = "/v1/chat/completions"
 
+# Outbound UA for web_fetch / web_search scraping; common desktop UA so pages serve normal HTML.
+WEB_FETCH_USER_AGENT = os.environ.get(
+    "WEB_FETCH_USER_AGENT",
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
+    "(KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36",
+)
+
 # Environment variables with defaults
 DEFAULT_HOST = os.getenv("LLM_HOST", "localhost")
 LLM_HOSTS = [h.strip() for h in os.getenv("LLM_HOSTS", "").split(",") if h.strip()]