Commit Graph

22 Commits

Author SHA1 Message Date
KYDNO 955455b797 fix(kimi): resolve Kimi Code API 403 errors and User-Agent restrictions (#3549)
* fix(kimi): resolve Kimi Code API 403 errors and User-Agent restrictions

Kimi Code subscription keys require a whitelisted coding-agent User-Agent to avoid access_terminated_error 403s. This adds User-Agent probing and caching for Kimi Code endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(kimi): omit temperature for kimi-for-coding API calls

Kimi Code rejects any non-default temperature with HTTP 400, which broke deep research probes and low-temp LLM rounds.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-15 15:56:54 +09:00
Muhammed Midlaj 4b0a977988 fix(models): probe /v1/models for path-less LM Studio endpoints
Probe /v1/models for path-less OpenAI-compatible model endpoints and surface clearer LM Studio diagnostics with the actual probed URL.
2026-06-15 15:09:50 +09:00
Max Hsu 66c25cbc2f fix(models): reassign default endpoint when current default is disabled (#3649)
Adding a new endpoint only auto-set the global default chat endpoint when
none was configured (`if not settings.get("default_endpoint_id")`). When the
existing default pointed at an endpoint the user had since disabled, it was
never reassigned, so features that read the raw `default_endpoint_id` setting
(notably Memory → Tidy) failed with "No default model configured — set one in
Settings" even though an enabled endpoint existed.

Reassign the default when the configured endpoint is missing/disabled, via a
new pure `_default_endpoint_needs_assignment` helper. Adds unit coverage for
the helper plus route-level regression tests for the disabled/enabled cases.

Fixes #3586

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 13:17:31 +02:00
Maruf Hasan c3fcaf15b7 feat(providers): add NVIDIA AI provider endpoint support (#3456)
* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)

* feat: add NVIDIA option to provider settings dropdown and aliases

* test: add NVIDIA provider detection and endpoint tests

* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering

- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
  kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
  -embedqa, -nemoretriever

* Expand non-chat model filtering for NVIDIA embedding/guard/video models

Add _NON_CHAT_PREFIXES: embed, recurrent
Add _NON_CHAT_CONTAINS: topic-control, guard, calibration,
  ai-synthetic-video, cosmos-reason2

Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)

* Filter non-chat models in _probe_endpoint via _is_chat_model()

Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.

* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models

Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.

Adds: embed, bge, recurrent, starcoder, gemma-2b

* fix(model-routes): drop collision-prone substrings from global non-chat filter

The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:

  - gemma-2b  matches google/gemma-2b-it (instruct chat model)
  - starcoder matches bigcode/starcoder2-15b (code completion model)
  - recurrent matches google/recurrentgemma-2b (language model)
  - guard     matches meta-llama/Llama-Guard-3-8B (safety classifier)

Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.

Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS

Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS
entries redundant since the prefix check runs first.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* style: fix indentation of groq and xai test cases in test_provider_endpoints.py

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-09 11:06:12 +02:00
Wes Huber 85c6056c87 test(models): add regression coverage for Z.AI coding endpoint probing (#2244)
Add focused tests for the z.ai/api/coding path override:
- _match_provider_curated: 5 tests verifying coding vs base key
- _probe_endpoint: 3 tests verifying model preservation, curated
  append on partial response, and base-zai exclusion

Rebased onto dev per reviewer request.

Fixes #2230

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-08 23:07:29 +01:00
Giuseppe Castelluccio 095c74b985 fix(security): fail closed in /api/models auth gate on unexpected errors (#3489)
GET /api/models swallowed any non-HTTPException raised while checking
whether the caller is authenticated (bare except Exception: pass), so a
broken auth_manager or an exception from get_current_user silently
granted the full model list to an anonymous caller instead of rejecting
the request. Now any unexpected exception logs and returns HTTP 500.

Split out of #2360 per reviewer request to keep the deny-list and the
auth-gate fix as separate, single-purpose PRs.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 20:23:39 +02:00
stocky789 1e0d9b92af feat: add ChatGPT Subscription provider (#2876)
* feat: Add ChatGPT Subscription support and related features

- Introduced a new provider option for ChatGPT Subscription in the endpoint selection UI.
- Implemented OAuth flow for ChatGPT Subscription sign-in, including polling for authorization status.
- Updated admin interface to handle ChatGPT Subscription, including disabling API key input and providing user guidance.
- Enhanced cost tracking logic to differentiate between subscription and non-subscription endpoints.
- Added new slash commands for managing skills, including listing, searching, and invoking skills.
- Implemented caching for skill catalog to optimize performance.
- Updated tests to cover new ChatGPT Subscription functionality and ensure proper endpoint probing.
- Refactored existing code to accommodate new features and improve maintainability.

* refactor: share provider device-flow setup

- reuse one device-flow backend for Copilot and ChatGPT Subscription
- add one frontend device-flow helper for Settings and /setup
- put GitHub Copilot back into Add Models, now as a dropdown option
- make provider selection just select; clicking Add starts sign-in
- stop ChatGPT Subscription setup from opening auth tabs automatically
- make /setup copilot and /setup chatgpt-subscription work from chat
- show ChatGPT Subscription in the /setup suggestions
- show the real error message when setup fails
- add focused tests for the shared flow and setup UI

* feat(chatgpt-subscription): harden credential lifecycle and streamline auth UX

Backend:
- Resolve runtime bearer for provider-auth endpoints at probe time via a
  shared _resolve_probe_key() that delegates to resolve_endpoint_runtime,
  applied across all probe/refresh call sites.
- Skip live completion probes and health pings for discovery-only providers
  (centralized behind _is_discovery_only_provider) — the Codex/Responses API
  has no such endpoints, so status is derived from cached models.
- Never persist the short lived ChatGPT bearer to the plaintext sessions
  table; proactively clear any stale bearer left by an earlier code path.
- Revoke orphaned ProviderAuthSession credentials when the last endpoint
  backing them is deleted (_delete_orphaned_provider_auth), surfaced via
  cleared_provider_auth in the delete response.

Frontend (admin.js):
- Auto-start the device-auth flow on provider selection so the authorization
  panel (code + Authorize) shows immediately instead of behind a "Sign in" click.
- Remove the redundant top button for device auth providers, move retry
  into the panel via an inline "Try again".
- Drop the self-evident hint text and add an execCommand clipboard fallback so
  Copy works in non-secure (HTTP/LAN) contexts.

* fix: harden chatgpt subscription provider

* chore: remove PR media from branch

* Fix chatgpt subscription recovery and token handling

---------

Co-authored-by: 5p00kyy <admin@5p00ky.dev>
2026-06-08 10:19:18 +02:00
michaelxer bdf4ec8b24 fix: fall back to /models probe when base URL returns 404 (#3205)
_ping_endpoint() probes the bare base URL for non-Ollama endpoints.
OpenAI-compatible servers like llama-swap return 404 on the /v1 prefix
but 200 on /v1/models, causing endpoints to appear offline despite being
fully functional.

Add a /models fallback when the base URL returns a non-auth 4xx.
Auth failures (401/403) are treated as definitive — probing /models
would just repeat the same rejection.

Fixes #3181

Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
2026-06-07 16:09:33 +02:00
Ocean Bennett 5911b8c0dc fix(models): allow same endpoint URL with different keys (#2758)
* fix(models): allow same endpoint URL with different keys

* fix(models): show endpoint key fingerprints
2026-06-05 21:12:14 +02:00
Alexandre Teixeira 452a94fb1b refactor(tests): centralize fake endpoint resolver cleanup
Test-only refactor continuing #2523. Centralizes the final repeated fake src.endpoint_resolver cleanup pattern into a focused import-state helper.
2026-06-05 13:23:46 +01:00
Alexandre Teixeira 3b292403dc fix(tests): accept verify in endpoint HTTP mocks
Updates endpoint/model-route test HTTP mocks to accept the verify keyword argument passed by endpoint probing code. Restores one focused part of the Python CI baseline tracked in #2580.
2026-06-04 17:53:18 +01:00
Yuri a2e691da2b fix(models): stabilize proxy endpoint refresh behavior
* fix: support large proxy model endpoint refresh

Large OpenAI-compatible proxy endpoints can expose hundreds of models and make /v1/models slow. Treating those endpoints like local model servers caused model picker opens and background probes to repeatedly hit /models, producing timeouts and making otherwise usable endpoints appear offline.

Make model endpoint discovery cached-first for normal UI usage, add explicit proxy/API classification and refresh policy fields, exclude proxy/API endpoints from aggressive local probing, and preserve cached models when refresh fails.

Manual Test/Add/Refresh actions still fetch the full model list with longer timeouts so users can intentionally import large proxy model lists without blocking normal model picker usage.

* fix: preserve endpoint ping status semantics
2026-06-04 04:56:11 +01:00
pewdiepie-archdaemon 6861c41580 Reapply "Merge branch 'main' of github.com:pewdiepie-archdaemon/odysseus"
This reverts commit cc8fe2f6e3.
2026-06-03 22:47:00 +09:00
pewdiepie-archdaemon cc8fe2f6e3 Revert "Merge branch 'main' of github.com:pewdiepie-archdaemon/odysseus"
This reverts commit 8161c1253d, reversing
changes made to 8c2705b42a.
2026-06-03 22:46:19 +09:00
Alexandre Teixeira 145f4fd2b4 feat(models): support pinned endpoint model IDs 2026-06-03 13:00:07 +01:00
red person 42ae905df7 fix(models): clear deleted endpoint fallback refs (#1207) 2026-06-02 23:41:04 +09:00
red person 76a7685105 fix(models): clear stale speech endpoint settings (#1196) 2026-06-02 23:32:01 +09:00
Hayk Arzumanyan 514050d098 Models: rewrite Docker loopback endpoints to host gateway
In Docker, a model-endpoint URL pointing at loopback (e.g. the LM Studio
default http://localhost:1234/v1) targets the Odysseus container itself, not
the host running the server, so the probe gets a connection error and the
endpoint is rejected with a misleading 'No models found for that provider/key'.
Rewrite loopback to host.docker.internal (which compose already maps to
host-gateway) for the probe and the saved URL, mirroring the existing Ollama
handling. Gated on actually being in a container with the gateway reachable, so
native installs and gateway-less deploys are untouched.

Fixes #25

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-02 20:34:40 +09:00
Prakhya a96593a99b Improve Ollama endpoint error messages 2026-06-02 05:53:50 +09:00
Alexander Kenley 2c4b8b57dd feat(ai): add OpenRouter and Ollama Cloud providers (#231)
Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-01 14:26:10 +09:00
Ranjan Sharma 058d32451c Fix fresh checkout test failures
Make .env optional in tests and prevent endpoint resolver stubs from leaking into model route tests.
2026-06-01 02:22:17 +00:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00