fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers (#3945)

* fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers

_apply_local_cache_affinity adds session_id + cache_prompt for llama.cpp
KV-cache slot affinity (#2927), gated on _is_self_hosted_openai_compatible,
which treated any unknown OpenAI-compatible host as self-hosted. Strict
cloud providers added as custom endpoints (Mistral at api.mistral.ai)
reject unknown body fields, so every request failed with 422
extra_forbidden. Self-hosted now also requires the endpoint to resolve as
local via model_context.is_local_endpoint: loopback/private/Tailscale
host, or endpoint kind explicitly configured as "local" (the escape hatch
for tunneled self-hosted servers). is_local_endpoint is promoted to a
public name since llm_core now shares it.
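
A minimal sketch of the gating described above, not the project's actual code. The names is_local_endpoint_sketch and apply_affinity_sketch, the endpoint_kind parameter, and the cache_prompt value of True are assumptions made for illustration. Only the session_id and cache_prompt field names and the loopback/private/Tailscale/"local"-kind rule come from this commit message.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse

# Tailscale assigns addresses from the CGNAT block only.
_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_local_endpoint_sketch(url: str, endpoint_kind: Optional[str] = None) -> bool:
    """Hypothetical stand-in for model_context.is_local_endpoint."""
    # Escape hatch: an endpoint explicitly configured as "local"
    # (for example a tunneled self-hosted server) counts as local.
    if endpoint_kind == "local":
        return True
    host = urlparse(url).hostname
    if not host:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name such as api.mistral.ai is treated as remote.
        return False
    return ip.is_loopback or ip.is_private or ip in _CGNAT


def apply_affinity_sketch(body: dict, url: str, session_id: str,
                          endpoint_kind: Optional[str] = None) -> dict:
    """Return a copy of body, adding llama.cpp extras only for local endpoints."""
    out = dict(body)
    if is_local_endpoint_sketch(url, endpoint_kind):
        out["session_id"] = session_id
        out["cache_prompt"] = True  # assumed value
    return out
```

With this shape, a request to https://api.mistral.ai/v1/chat/completions leaves the body unchanged, while http://127.0.0.1:8080/v1/chat/completions gains both fields. The sketch does not model the endpoint database lookup that the real function appears to perform.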

Fixes #3793

* test(llm): sweep cloud OpenAI-compatible hosts in affinity gating

Parametrized cases adapted from #3839 (credit: Shabablinchikow): deepseek,
x.ai, together, fireworks, and the Gemini OpenAI-compat endpoint must all
stay free of the llama.cpp extras, not just the Mistral host from #3793.

* fix(llm): narrow the Tailscale range to 100.64.0.0/10 in is_local_endpoint

Review finding on #3945: _PRIVATE_PREFIXES carried a bare "100." prefix,
treating all of 100.0.0.0/8 as local while Tailscale only uses the CGNAT
block 100.64.0.0/10. Public 100.x hosts (e.g. AWS ranges outside the
block) were classified local and still received the llama.cpp extras
this PR exists to keep away from strict providers. Match the narrowed
classification routes/model_routes.py already uses, with boundary tests
just below, inside, and just above the range.
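
To illustrate the narrowing, here is a sketch that uses the standard-library ipaddress module. It is not necessarily how _PRIVATE_PREFIXES or routes/model_routes.py implement the check; the helper names and the sample address 100.24.0.1 are hypothetical.

```python
import ipaddress

# The CGNAT block Tailscale uses: 100.64.0.0 through 100.127.255.255.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def in_tailscale_range(host: str) -> bool:
    """True only when host is an IP literal inside 100.64.0.0/10."""
    try:
        return ipaddress.ip_address(host) in TAILSCALE_CGNAT
    except ValueError:
        # Not an IP literal (for example a DNS name).
        return False


def bare_prefix_match(host: str) -> bool:
    """The over-broad behaviour: any 100.x.x.x host matches."""
    return host.startswith("100.")


# Boundary cases: just below, inside, and just above the range.
for host in ("100.63.255.255", "100.64.0.0", "100.127.255.255", "100.128.0.0"):
    print(host, in_tailscale_range(host))
```

A public address such as 100.24.0.1 matches the bare prefix but falls outside the CGNAT block, which is the misclassification this fix removes.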
Kenny Van de Maele
2026-06-11 17:51:03 +02:00
committed by GitHub
parent f941db29d3
commit 263d41c58a
5 changed files with 142 additions and 24 deletions
+11 -11
@@ -6,7 +6,7 @@ import types
 import pytest
 import src.model_context as model_context
-from src.model_context import _is_local_endpoint, estimate_tokens, _lookup_known
+from src.model_context import is_local_endpoint, estimate_tokens, _lookup_known
 class _Column:
@@ -56,20 +56,20 @@ def _install_endpoint_db(monkeypatch, rows):
 class TestIsLocalEndpoint:
     def test_localhost(self):
-        assert _is_local_endpoint("http://localhost:5000/v1/chat/completions") is True
+        assert is_local_endpoint("http://localhost:5000/v1/chat/completions") is True
     def test_loopback_ipv4(self):
-        assert _is_local_endpoint("http://127.0.0.1:8080/v1/chat/completions") is True
+        assert is_local_endpoint("http://127.0.0.1:8080/v1/chat/completions") is True
     def test_private_192_168(self):
-        assert _is_local_endpoint("http://192.168.1.1:11434/v1/chat/completions") is True
+        assert is_local_endpoint("http://192.168.1.1:11434/v1/chat/completions") is True
     def test_private_10(self):
-        assert _is_local_endpoint("http://10.0.0.5:8000/v1/chat/completions") is True
+        assert is_local_endpoint("http://10.0.0.5:8000/v1/chat/completions") is True
     def test_tailscale_100(self):
         # 100.64.0.0/10 is the CGNAT range Tailscale uses.
-        assert _is_local_endpoint("http://100.64.0.1:5000/v1/chat/completions") is True
+        assert is_local_endpoint("http://100.64.0.1:5000/v1/chat/completions") is True
     def test_configured_tailscale_proxy_is_remote(self, monkeypatch):
         _install_endpoint_db(monkeypatch, [
@@ -81,19 +81,19 @@ class TestIsLocalEndpoint:
             )
         ])
-        assert _is_local_endpoint("http://100.117.136.97:34521/v1/chat/completions") is False
+        assert is_local_endpoint("http://100.117.136.97:34521/v1/chat/completions") is False
     def test_openai_is_remote(self):
-        assert _is_local_endpoint("https://api.openai.com/v1/chat/completions") is False
+        assert is_local_endpoint("https://api.openai.com/v1/chat/completions") is False
     def test_anthropic_is_remote(self):
-        assert _is_local_endpoint("https://api.anthropic.com/v1/messages") is False
+        assert is_local_endpoint("https://api.anthropic.com/v1/messages") is False
     def test_empty_url(self):
-        assert _is_local_endpoint("") is False
+        assert is_local_endpoint("") is False
     def test_malformed_url(self):
-        assert _is_local_endpoint("not-a-url") is False
+        assert is_local_endpoint("not-a-url") is False
 class TestEstimateTokens: