fix(startup): ping real endpoints in warmup/keepalive (#3641)

_warmup_endpoints called model_discovery.get_endpoints(), which does not exist
on ModelDiscovery. It raised AttributeError on every startup and on every 60s
keepalive tick, was swallowed by the outer except, and pinged nothing, so the
cold-start prevention the loop exists for never ran.

Add ModelDiscovery.warmup_ping_urls(), which resolves the /models probe URLs
from the real discover_models() output, and call it from the warmup loop via
asyncio.to_thread (discovery does a blocking port scan, so keep it off the event
loop).

Adds tests/test_warmup_ping_urls.py: resolves /models URLs from discovered
items, honors the limit, degrades to [] on discovery failure, and documents that
get_endpoints never existed.
This commit is contained in:
Mazen Tamer Salah
2026-06-10 20:21:45 +03:00
committed by GitHub
parent d9a4b99046
commit 218b9ecbc8
3 changed files with 81 additions and 10 deletions
+15 -10
View File
@@ -946,16 +946,21 @@ async def _startup_event():
async def _warmup_endpoints():
try:
import httpx
endpoints = model_discovery.get_endpoints() if model_discovery else []
for ep in endpoints[:5]:
url = ep.get("url", "").replace("/chat/completions", "/models")
if url:
try:
async with httpx.AsyncClient(timeout=5.0) as client:
await client.get(url)
logger.info(f"Warmup ping OK: {url}")
except Exception as e:
logger.debug(f"Warmup ping failed for endpoint: {e}")
# model_discovery has no get_endpoints(); that call raised
# AttributeError every run and silently disabled warmup/keepalive.
# Resolve the /models probe URLs via the real discovery API, off the
# event loop since discovery does a blocking port scan.
urls = (
await asyncio.to_thread(model_discovery.warmup_ping_urls)
if model_discovery else []
)
for url in urls:
try:
async with httpx.AsyncClient(timeout=5.0) as client:
await client.get(url)
logger.info(f"Warmup ping OK: {url}")
except Exception as e:
logger.debug(f"Warmup ping failed for endpoint: {e}")
except Exception as e:
logger.debug(f"Warmup ping skipped: {e}")