3 Commits

Author SHA1 Message Date
Muhammed Midlaj 4b0a977988 fix(models): probe /v1/models for path-less LM Studio endpoints
Probe /v1/models for path-less OpenAI-compatible model endpoints and surface clearer LM Studio diagnostics with the actual probed URL.
2026-06-15 15:09:50 +09:00
Maruf Hasan c3fcaf15b7 feat(providers): add NVIDIA AI provider endpoint support (#3456)
* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)

* feat: add NVIDIA option to provider settings dropdown and aliases

* test: add NVIDIA provider detection and endpoint tests

* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering

- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
  kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
  -embedqa, -nemoretriever

* Expand non-chat model filtering for NVIDIA embedding/guard/video models

Add _NON_CHAT_PREFIXES: embed, recurrent
Add _NON_CHAT_CONTAINS: topic-control, guard, calibration,
  ai-synthetic-video, cosmos-reason2

Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)

* Filter non-chat models in _probe_endpoint via _is_chat_model()

Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.

* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models

Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.

Adds: embed, bge, recurrent, starcoder, gemma-2b

* fix(model-routes): drop collision-prone substrings from global non-chat filter

The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:

  - gemma-2b  matches google/gemma-2b-it (instruct chat model)
  - starcoder matches bigcode/starcoder2-15b (code completion model)
  - recurrent matches google/recurrentgemma-2b (language model)
  - guard     matches meta-llama/Llama-Guard-3-8B (safety classifier)

Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.

Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS

Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS
entries redundant since the prefix check runs first.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* style: fix indentation of groq and xai test cases in test_provider_endpoints.py

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-09 11:06:12 +02:00
Sheikh Rahat Mahmud e2ba068cbc Add provider endpoint resolver tests
The existing test_endpoint_resolver.py copies the pure functions to avoid
import side effects, so its assertions can silently drift from the shipped
src/endpoint_resolver.py (the copies already lag: no OpenRouter headers, no
anthropic.com host matching). This adds a sibling module that imports the
REAL resolver and locks in behavior for every provider named in ROADMAP.md's
"Provider setup/probing audit" — Anthropic, Gemini, Groq, xAI, OpenRouter,
OpenAI, DeepSeek — plus Ollama (local + cloud) and the Tailscale self-host
fallback in resolve_url.

Covers build_chat_url, build_models_url, build_headers, normalize_base,
_first_chat_model, _anthropic_api_root, _ollama_api_root, and resolve_url.
conftest.py already stubs the heavy deps, so the import is side-effect free.

Test-only; no behavior change. 55 new tests, all passing.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 12:53:50 +09:00