* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)
* feat: add NVIDIA option to provider settings dropdown and aliases
* test: add NVIDIA provider detection and endpoint tests
* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering
- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
-embedqa, -nemoretriever
* Expand non-chat model filtering for NVIDIA embedding/guard/video models
Add _NON_CHAT_PREFIXES: embed, recurrent
Add _NON_CHAT_CONTAINS: topic-control, guard, calibration,
ai-synthetic-video, cosmos-reason2
Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)
* Filter non-chat models in _probe_endpoint via _is_chat_model()
Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.
* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models
Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.
Adds: embed, bge, recurrent, starcoder, gemma-2b
* fix(model-routes): drop collision-prone substrings from global non-chat filter
The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:
- gemma-2b matches google/gemma-2b-it (instruct chat model)
- starcoder matches bigcode/starcoder2-15b (code completion model)
- recurrent matches google/recurrentgemma-2b (language model)
- guard matches meta-llama/Llama-Guard-3-8B (safety classifier)
Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.
Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS
Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS
entries redundant since the prefix check runs first.
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
* style: fix indentation of groq and xai test cases in test_provider_endpoints.py
---------
Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
* feat: Add ChatGPT Subscription support and related features
- Introduced a new provider option for ChatGPT Subscription in the endpoint selection UI.
- Implemented OAuth flow for ChatGPT Subscription sign-in, including polling for authorization status.
- Updated admin interface to handle ChatGPT Subscription, including disabling API key input and providing user guidance.
- Enhanced cost tracking logic to differentiate between subscription and non-subscription endpoints.
- Added new slash commands for managing skills, including listing, searching, and invoking skills.
- Implemented caching for skill catalog to optimize performance.
- Updated tests to cover new ChatGPT Subscription functionality and ensure proper endpoint probing.
- Refactored existing code to accommodate new features and improve maintainability.
* refactor: share provider device-flow setup
- reuse one device-flow backend for Copilot and ChatGPT Subscription
- add one frontend device-flow helper for Settings and /setup
- put GitHub Copilot back into Add Models, now as a dropdown option
- make provider selection just select; clicking Add starts sign-in
- stop ChatGPT Subscription setup from opening auth tabs automatically
- make /setup copilot and /setup chatgpt-subscription work from chat
- show ChatGPT Subscription in the /setup suggestions
- show the real error message when setup fails
- add focused tests for the shared flow and setup UI
* feat(chatgpt-subscription): harden credential lifecycle and streamline auth UX
Backend:
- Resolve runtime bearer for provider-auth endpoints at probe time via a
shared _resolve_probe_key() that delegates to resolve_endpoint_runtime,
applied across all probe/refresh call sites.
- Skip live completion probes and health pings for discovery-only providers
(centralized behind _is_discovery_only_provider) — the Codex/Responses API
has no such endpoints, so status is derived from cached models.
- Never persist the short lived ChatGPT bearer to the plaintext sessions
table; proactively clear any stale bearer left by an earlier code path.
- Revoke orphaned ProviderAuthSession credentials when the last endpoint
backing them is deleted (_delete_orphaned_provider_auth), surfaced via
cleared_provider_auth in the delete response.
Frontend (admin.js):
- Auto-start the device-auth flow on provider selection so the authorization
panel (code + Authorize) shows immediately instead of behind a "Sign in" click.
- Remove the redundant top button for device auth providers, move retry
into the panel via an inline "Try again".
- Drop the self-evident hint text and add an execCommand clipboard fallback so
Copy works in non-secure (HTTP/LAN) contexts.
* fix: harden chatgpt subscription provider
* chore: remove PR media from branch
* Fix chatgpt subscription recovery and token handling
---------
Co-authored-by: 5p00kyy <admin@5p00ky.dev>
- Add OpenCode Zen (https://opencode.ai/zen/v1) and Go (https://opencode.ai/zen/go/v1)
- Add provider detection via _host_match() in llm_core.py
- Add curated model list entries in model_routes.py
- Add webhook provider URLs
- Add provider icon (providers.js) and dropdown options (index.html)
- Add auto-detection patterns and setup URLs (slashCommands.js)
- Whitelist opencode.ai in URL validation (admin.js)
- Rebased on main to fix merge conflicts with _HOST_TO_CURATED refactor
Co-authored-by: M57 <hy4ri@users.noreply.github.com>
* Show the serving provider in the model-info card
The model-info popup (click the model name on a message) shows the model
and pricing, with a logo inferred from the model NAME. But the same model
can be served by different endpoints — e.g. claude-haiku via OpenRouter
vs GitHub Copilot vs Anthropic direct — which the name-based logo can't
distinguish.
Add a 'Provider' line derived from the session's endpoint URL:
- new providerLabel(endpointUrl) in static/js/providers.js maps the host
to a friendly name (GitHub Copilot, OpenRouter, Anthropic, OpenAI,
Google, AWS Bedrock, DeepSeek, Mistral, Groq, Together, Fireworks,
Perplexity, xAI), 'Local' for loopback/LAN, else the bare host.
- static/js/chatRenderer.js renders it under Model in the card, from
window.sessionModule.getCurrentEndpointUrl().
* Anchor provider-label patterns to the hostname
providerLabel matched its patterns against the full endpoint URL with
unanchored substrings, so a host like max.airlines.com matched /x\.ai/ and was
mislabeled "xAI". Anchor each pattern to the end of the hostname ((^|.)domain$)
and test against the parsed host instead of the raw URL.