feat(providers): add NVIDIA AI provider endpoint support (#3456)

* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com) * feat: add NVIDIA option to provider settings dropdown and aliases * test: add NVIDIA provider detection and endpoint tests * Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering - nvidia.com -> 'nvidia' curated key for proper provider routing - _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed - _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip, kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse, -embedqa, -nemoretriever * Expand non-chat model filtering for NVIDIA embedding/guard/video models Add _NON_CHAT_PREFIXES: embed, recurrent Add _NON_CHAT_CONTAINS: topic-control, guard, calibration, ai-synthetic-video, cosmos-reason2 Catches remaining unfiltered non-chat models from NVIDIA catalog: embedding (llama-nemotron-embed, embed-qa), guard (llama-guard, nemoguard-topic-control), calibration (ising-calibration), video (ai-synthetic-video-detector, cosmos-reason2), recurrent (recurrentgemma-2b) * Filter non-chat models in _probe_endpoint via _is_chat_model() Previously _is_chat_model() was only used in the per-model probe and _first_chat_model(), so non-chat models still appeared in the model picker even though they were filtered in those specific paths. Applying the filter at _probe_endpoint() return ensures non-chat models (embeddings, safety guards, reward, calibration, video detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never enter cached_models and never appear in the picker. * Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models Prefix checks (mid.startswith) miss models with org prefixes like baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc. Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught regardless of the org prefix. Adds: embed, bge, recurrent, starcoder, gemma-2b * fix(model-routes): drop collision-prone substrings from global non-chat filter The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES and _NON_CHAT_CONTAINS tuples. These are intended to filter out embedding, retrieval, safety, and vision models from NVIDIA's catalog that are not chat-completions-capable. However, four of the added substrings collide with legitimate chat models served by other providers: - gemma-2b matches google/gemma-2b-it (instruct chat model) - starcoder matches bigcode/starcoder2-15b (code completion model) - recurrent matches google/recurrentgemma-2b (language model) - guard matches meta-llama/Llama-Guard-3-8B (safety classifier) Removing these four from the global tuples keeps the NVIDIA-specific filtering intact (safety, embedding, retrieval, and vision models are still caught by other tokens such as content-safety, -safety, -reward, embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while preventing false negatives for instruct/code models on other providers. Tests added for gemma-2b-it, google/gemma-2b-it, and bigcode/starcoder2-15b-instruct asserting they are recognized as chat models. Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS entries redundant since the prefix check runs first. Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be> * style: fix indentation of groq and xai test cases in test_provider_endpoints.py --------- Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-16 17:55:26 -04:00 · 2026-06-09 15:06:12 +06:00
parent 3c4ec8828b
commit c3fcaf15b7
8 changed files with 29 additions and 4 deletions
@@ -347,6 +347,8 @@ class TestIsChatModel:
        "gpt-4o", "gpt-4o-mini", "claude-sonnet-4", "llama-3.3-70b",
        "deepseek-chat", "gemini-2.0-flash", "o3",
        "llama-4-scout-17b-16e-instruct",
+        "gemma-2b-it", "google/gemma-2b-it",
+        "bigcode/starcoder2-15b-instruct",
    ])
    def test_chat_models(self, model_id):
        assert _is_chat_model(model_id) is True
@@ -40,6 +40,7 @@ class TestDetectProvider:
        ("https://anthropic.com/v1", "anthropic"),
        ("https://openrouter.ai/api/v1", "openrouter"),
        ("https://api.groq.com/openai/v1", "groq"),
+        ("https://integrate.api.nvidia.com/v1", "nvidia"),
        ("http://localhost:11434/api", "ollama"),
        ("https://ollama.com", "ollama"),
        # xAI, DeepSeek and Gemini's OpenAI-compatible surface are NOT
@@ -84,6 +85,7 @@ class TestProviderLabel:
        ("https://api.openai.com/v1", "OpenAI"),
        ("https://openrouter.ai/api/v1", "OpenRouter"),
        ("https://api.groq.com/openai/v1", "Groq"),
+        ("https://integrate.api.nvidia.com/v1", "NVIDIA"),
        ("https://api.mistral.ai/v1", "Mistral"),
        ("https://api.deepseek.com", "DeepSeek"),
        ("https://generativelanguage.googleapis.com/v1beta/openai", "Google"),
@@ -50,6 +50,9 @@ PROVIDER_CASES = [
    ("groq", "https://api.groq.com/openai/v1",
     "https://api.groq.com/openai/v1/chat/completions",
     "https://api.groq.com/openai/v1/models"),
+    ("nvidia", "https://integrate.api.nvidia.com/v1",
+     "https://integrate.api.nvidia.com/v1/chat/completions",
+     "https://integrate.api.nvidia.com/v1/models"),
    ("xai", "https://api.x.ai/v1",
     "https://api.x.ai/v1/chat/completions",
     "https://api.x.ai/v1/models"),
@@ -112,6 +115,7 @@ def test_headers_anthropic_without_key_still_sends_version():
    "https://api.x.ai/v1",
    "https://api.deepseek.com",
    "https://api.groq.com/openai/v1",
+    "https://integrate.api.nvidia.com/v1",
    "https://generativelanguage.googleapis.com/v1beta/openai",
 ])
 def test_headers_openai_style_use_bearer(base):