feat(providers): add NVIDIA AI provider endpoint support (#3456)

* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)

* feat: add NVIDIA option to provider settings dropdown and aliases

* test: add NVIDIA provider detection and endpoint tests

* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering

- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
  kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
  -embedqa, -nemoretriever
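
A minimal sketch of how a host-to-key table such as _HOST_TO_CURATED could be consulted. The suffix-matching rule and the curated_key helper are assumptions made for illustration; the real lookup code is not shown in this diff.

```python
from typing import Optional
from urllib.parse import urlparse

# Trimmed-down copy of the table; only two entries are shown here.
_HOST_TO_CURATED = (
    ("x.ai", "xai"),
    ("nvidia.com", "nvidia"),
)


def curated_key(base_url: str) -> Optional[str]:
    """Hypothetical helper: map a base URL to its curated provider key."""
    host = (urlparse(base_url).hostname or "").lower()
    for suffix, key in _HOST_TO_CURATED:
        # Match the bare domain or any subdomain, but not look-alike hosts.
        if host == suffix or host.endswith("." + suffix):
            return key
    return None
```

With this rule, https://integrate.api.nvidia.com/v1 resolves to 'nvidia', while a look-alike host such as notnvidia.com does not.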

* Expand non-chat model filtering for NVIDIA embedding/guard/video models

Add to _NON_CHAT_PREFIXES: embed, recurrent
Add to _NON_CHAT_CONTAINS: topic-control, guard, calibration,
  ai-synthetic-video, cosmos-reason2

Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)

* Filter non-chat models in _probe_endpoint via _is_chat_model()

Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.
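
The idea can be sketched as follows. This is a simplified illustration, not the project's code: the token tuple is a small subset, _is_chat_model is reduced to a substring check, and filter_probe_result is a hypothetical stand-in for the return statement of _probe_endpoint().

```python
from typing import List

# Small illustrative subset of the real tuple.
_NON_CHAT_CONTAINS = ("-embedqa", "-reward", "nvclip")


def _is_chat_model(model_id: str) -> bool:
    """Simplified stand-in: reject ids containing any non-chat token."""
    mid = model_id.lower()
    return not any(token in mid for token in _NON_CHAT_CONTAINS)


def filter_probe_result(models: List[str]) -> List[str]:
    """Hypothetical helper: filter once, where the probe returns its list."""
    return [m for m in models if _is_chat_model(m)]
```

Because the filter runs on the list the probe returns, every consumer of that list (cache, picker) sees only chat-capable ids without needing its own check.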

* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models

Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.

Adds: embed, bge, recurrent, starcoder, gemma-2b
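
A short sketch of why the prefix check alone is not enough. The two helper functions are hypothetical names used only for this illustration; the model ids are the ones mentioned above.

```python
from typing import Tuple


def matches_prefix(model_id: str, prefixes: Tuple[str, ...]) -> bool:
    """True if the lowercased id starts with any of the given prefixes."""
    return model_id.lower().startswith(prefixes)


def matches_contains(model_id: str, tokens: Tuple[str, ...]) -> bool:
    """True if any token appears anywhere in the lowercased id."""
    mid = model_id.lower()
    return any(token in mid for token in tokens)


tokens = ("bge", "embed")
for model_id in ("bge-m3", "baai/bge-m3", "nvidia/embed-qa-4"):
    # The org prefix ("baai/", "nvidia/") defeats startswith but not "in".
    print(model_id, matches_prefix(model_id, tokens), matches_contains(model_id, tokens))
```

The trade-off is that a substring check is broader than a prefix check, so each token added this way can also match unrelated ids from other providers.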

* fix(model-routes): drop collision-prone substrings from global non-chat filter

The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:

  - gemma-2b  matches google/gemma-2b-it (instruct chat model)
  - starcoder matches bigcode/starcoder2-15b (code completion model)
  - recurrent matches google/recurrentgemma-2b (language model)
  - guard     matches meta-llama/Llama-Guard-3-8B (safety classifier)

Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.

Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.
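
One way to spot such collisions before they ship is to check candidate tokens against a list of model ids that must stay visible. The colliding_tokens helper below is a hypothetical sketch, not part of this change.

```python
from typing import Dict, Iterable, List


def colliding_tokens(tokens: Iterable[str], known_chat_models: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each token to the chat models it would hide."""
    hits: Dict[str, List[str]] = {}
    for token in tokens:
        matched = [m for m in known_chat_models if token in m.lower()]
        if matched:
            hits[token] = matched
    return hits


must_stay_visible = ["google/gemma-2b-it", "bigcode/starcoder2-15b-instruct"]
candidates = ("gemma-2b", "starcoder", "-embedqa")
print(colliding_tokens(candidates, must_stay_visible))
```

Here gemma-2b and starcoder are reported as colliding, while -embedqa matches none of the listed ids.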

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS

These tokens are already present in _NON_CHAT_PREFIXES, which makes the
CONTAINS entries redundant because the prefix check runs first.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* style: fix indentation of groq and xai test cases in test_provider_endpoints.py

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
Maruf Hasan
2026-06-09 15:06:12 +06:00
committed by GitHub
parent 3c4ec8828b
commit c3fcaf15b7
8 changed files with 29 additions and 4 deletions
+11 -3
@@ -283,6 +283,7 @@ _HOST_TO_CURATED = (
("fireworks.ai", "fireworks"),
("googleapis.com", "google"),
("x.ai", "xai"),
("nvidia.com", "nvidia"),
("openrouter.ai", "openrouter"),
("ollama.com", "ollama"),
)
@@ -477,10 +478,17 @@ _NON_CHAT_PREFIXES = (
"dall-e", "tts-", "whisper", "text-embedding", "embedding",
"davinci", "babbage", "moderation", "omni-moderation",
"sora", "gpt-image", "chatgpt-image",
# embedding / retrieval / non-chat models (common across providers)
"snowflake/arctic-embed", "nvidia/nv-embed", "embed",
)
_NON_CHAT_CONTAINS = (
"-realtime", "-transcribe", "-tts", "-codex",
"codex-",
"codex-", "content-safety", "-safety", "-reward", "nvclip",
"kosmos", "fuyu", "deplot", "vila", "neva",
"gliner", "riva", "-parse", "-embedqa", "-nemoretriever",
"topic-control", "calibration",
"ai-synthetic-video", "cosmos-reason2",
"bge", "llama-guard",
)
_NON_CHAT_EXACT_PREFIXES = (
"gpt-audio", # gpt-audio, gpt-audio-mini etc. (not gpt-4o-audio-preview which is chat)
@@ -731,7 +739,7 @@ def _probe_endpoint(base_url: str, api_key: str = None, timeout: int = 5) -> Lis
for _e in _PROVIDER_CURATED.get(_ck, []):
if _e not in set(models) and not any(m.startswith(_e) for m in models):
models.append(_e)
-return models
+return [m for m in models if _is_chat_model(m)]
except httpx.HTTPStatusError as e:
if api_key:
status = e.response.status_code if e.response is not None else "unknown"
@@ -755,7 +763,7 @@ def _probe_endpoint(base_url: str, api_key: str = None, timeout: int = 5) -> Lis
data = r.json()
models = [m.get("name") or m.get("model") for m in (data.get("models") or []) if m.get("name") or m.get("model")]
if models:
-return models
+return [m for m in models if _is_chat_model(m)]
except Exception as e:
logger.debug(f"Ollama /api/tags probe failed for {base}: {e}")
# Fall back to curated list if the provider has a URL-based match (e.g. z.ai has no /models endpoint)
+3
@@ -444,6 +444,8 @@ def _detect_provider(url: str) -> str:
return "openrouter"
if _host_match(url, "groq.com"):
return "groq"
if _host_match(url, "nvidia.com"):
return "nvidia"
from src.chatgpt_subscription import is_chatgpt_subscription_base
if is_chatgpt_subscription_base(url):
return "chatgpt-subscription"
@@ -489,6 +491,7 @@ def _provider_label(url: str) -> str:
if is_copilot_base(url): return "GitHub Copilot"
if _host_match(url, "mistral.ai"): return "Mistral"
if _host_match(url, "deepseek.com"): return "DeepSeek"
if _host_match(url, "nvidia.com"): return "NVIDIA"
if _host_match(url, "googleapis.com"): return "Google"
if _host_match(url, "together.xyz", "together.ai"): return "Together"
if _host_match(url, "fireworks.ai"): return "Fireworks"
+1
@@ -2095,6 +2095,7 @@
<option value="https://opencode.ai/zen/v1" data-logo="opencode">OpenCode Zen</option>
<option value="https://opencode.ai/zen/go/v1" data-logo="opencode">OpenCode Go</option>
<option value="https://api.z.ai/api/coding/paas/v4" data-logo="zhipu">Z.AI Coding Plan</option>
<option value="https://integrate.api.nvidia.com/v1" data-logo="nvidia">NVIDIA</option>
</select>
<!-- API key row stays in DOM, hidden until Key button is
clicked. Mirrors the Local section pattern: most users
+1
@@ -118,6 +118,7 @@ const _ENDPOINT_LABELS = [
[/(^|\.)together\.(ai|xyz)$/i, "Together"],
[/(^|\.)fireworks\.ai$/i, "Fireworks"],
[/(^|\.)perplexity\.ai$/i, "Perplexity"],
[/(^|\.)nvidia\.com$/i, "NVIDIA"],
[/(^|\.)x\.ai$/i, "xAI"],
];
+5 -1
@@ -43,6 +43,7 @@ const PROVIDER_PATTERNS = [
{ re: /^gsk_/, name: 'Groq', url: 'https://api.groq.com/openai/v1' },
{ re: /^AIza/, name: 'Gemini', url: 'https://generativelanguage.googleapis.com/v1beta/openai' },
{ re: /^xai-/, name: 'xAI', url: 'https://api.x.ai/v1' },
{ re: /^nvapi-/, name: 'NVIDIA', url: 'https://integrate.api.nvidia.com/v1' },
];
const SETUP_PROVIDER_URLS = {
deepseek: { name: 'DeepSeek', url: 'https://api.deepseek.com/v1' },
@@ -56,8 +57,9 @@ const SETUP_PROVIDER_URLS = {
google: { name: 'Gemini', url: 'https://generativelanguage.googleapis.com/v1beta/openai' },
'opencode-zen': { name: 'OpenCode Zen', url: 'https://opencode.ai/zen/v1' },
'opencode-go': { name: 'OpenCode Go', url: 'https://opencode.ai/zen/go/v1' },
nvidia: { name: 'NVIDIA', url: 'https://integrate.api.nvidia.com/v1' },
};
-const SETUP_PROVIDER_NAMES = ['deepseek', 'openai', 'openrouter', 'ollama', 'xai', 'anthropic', 'groq', 'gemini', 'opencode-zen', 'opencode-go'];
+const SETUP_PROVIDER_NAMES = ['deepseek', 'openai', 'openrouter', 'ollama', 'xai', 'anthropic', 'groq', 'gemini', 'opencode-zen', 'opencode-go', 'nvidia'];
const SETUP_DEVICE_AUTH_PROVIDERS = [
{ key: 'copilot', name: 'GitHub Copilot', aliases: ['github'], command: '/setup copilot' },
{ key: 'chatgpt-subscription', name: 'ChatGPT Subscription', aliases: ['chatgptsubscription', 'chatgpt-sub', 'codex'], command: '/setup chatgpt-subscription' },
@@ -97,6 +99,7 @@ function _setupProviderFromInput(input) {
google: 'gemini',
xai: 'xai',
grok: 'xai',
nvidia: 'nvidia',
};
return SETUP_PROVIDER_URLS[aliases[raw] || raw] || null;
}
@@ -124,6 +127,7 @@ function _extractSetupProviderCredential(input) {
['groq', 'groq'],
['google', 'gemini'], ['gemini', 'gemini'],
['x ai', 'xai'], ['xai', 'xai'], ['grok', 'xai'],
['nvidia', 'nvidia'],
];
for (const [alias, key] of providerAliases) {
const re = new RegExp('(^|\\s|[,;:])(' + alias.replace(/\s+/g, '\\s+') + ')(?=$|\\s|[,;:])', 'i');
+2
@@ -347,6 +347,8 @@ class TestIsChatModel:
"gpt-4o", "gpt-4o-mini", "claude-sonnet-4", "llama-3.3-70b",
"deepseek-chat", "gemini-2.0-flash", "o3",
"llama-4-scout-17b-16e-instruct",
"gemma-2b-it", "google/gemma-2b-it",
"bigcode/starcoder2-15b-instruct",
])
def test_chat_models(self, model_id):
assert _is_chat_model(model_id) is True
+2
@@ -40,6 +40,7 @@ class TestDetectProvider:
("https://anthropic.com/v1", "anthropic"),
("https://openrouter.ai/api/v1", "openrouter"),
("https://api.groq.com/openai/v1", "groq"),
("https://integrate.api.nvidia.com/v1", "nvidia"),
("http://localhost:11434/api", "ollama"),
("https://ollama.com", "ollama"),
# xAI, DeepSeek and Gemini's OpenAI-compatible surface are NOT
@@ -84,6 +85,7 @@ class TestProviderLabel:
("https://api.openai.com/v1", "OpenAI"),
("https://openrouter.ai/api/v1", "OpenRouter"),
("https://api.groq.com/openai/v1", "Groq"),
("https://integrate.api.nvidia.com/v1", "NVIDIA"),
("https://api.mistral.ai/v1", "Mistral"),
("https://api.deepseek.com", "DeepSeek"),
("https://generativelanguage.googleapis.com/v1beta/openai", "Google"),
+4
@@ -50,6 +50,9 @@ PROVIDER_CASES = [
("groq", "https://api.groq.com/openai/v1",
"https://api.groq.com/openai/v1/chat/completions",
"https://api.groq.com/openai/v1/models"),
("nvidia", "https://integrate.api.nvidia.com/v1",
"https://integrate.api.nvidia.com/v1/chat/completions",
"https://integrate.api.nvidia.com/v1/models"),
("xai", "https://api.x.ai/v1",
"https://api.x.ai/v1/chat/completions",
"https://api.x.ai/v1/models"),
@@ -112,6 +115,7 @@ def test_headers_anthropic_without_key_still_sends_version():
"https://api.x.ai/v1",
"https://api.deepseek.com",
"https://api.groq.com/openai/v1",
"https://integrate.api.nvidia.com/v1",
"https://generativelanguage.googleapis.com/v1beta/openai",
])
def test_headers_openai_style_use_bearer(base):