43 Commits

Author SHA1 Message Date
KYDNO 955455b797 fix(kimi): resolve Kimi Code API 403 errors and User-Agent restrictions (#3549)
* fix(kimi): resolve Kimi Code API 403 errors and User-Agent restrictions

Kimi Code subscription keys require a whitelisted coding-agent User-Agent to avoid access_terminated_error 403s. This adds User-Agent probing and caching for Kimi Code endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(kimi): omit temperature for kimi-for-coding API calls

Kimi Code rejects any non-default temperature with HTTP 400, which broke deep research probes and low-temp LLM rounds.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-15 15:56:54 +09:00
Muhammed Midlaj 4b0a977988 fix(models): probe /v1/models for path-less LM Studio endpoints
Probe /v1/models for path-less OpenAI-compatible model endpoints and surface clearer LM Studio diagnostics with the actual probed URL.
2026-06-15 15:09:50 +09:00
Max Hsu 66c25cbc2f fix(models): reassign default endpoint when current default is disabled (#3649)
Adding a new endpoint only auto-set the global default chat endpoint when
none was configured (`if not settings.get("default_endpoint_id")`). When the
existing default pointed at an endpoint the user had since disabled, it was
never reassigned, so features that read the raw `default_endpoint_id` setting
(notably Memory → Tidy) failed with "No default model configured — set one in
Settings" even though an enabled endpoint existed.

Reassign the default when the configured endpoint is missing/disabled, via a
new pure `_default_endpoint_needs_assignment` helper. Adds unit coverage for
the helper plus route-level regression tests for the disabled/enabled cases.

Fixes #3586

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 13:17:31 +02:00
Maruf Hasan c3fcaf15b7 feat(providers): add NVIDIA AI provider endpoint support (#3456)
* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)

* feat: add NVIDIA option to provider settings dropdown and aliases

* test: add NVIDIA provider detection and endpoint tests

* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering

- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
  kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
  -embedqa, -nemoretriever

* Expand non-chat model filtering for NVIDIA embedding/guard/video models

Add _NON_CHAT_PREFIXES: embed, recurrent
Add _NON_CHAT_CONTAINS: topic-control, guard, calibration,
  ai-synthetic-video, cosmos-reason2

Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)

* Filter non-chat models in _probe_endpoint via _is_chat_model()

Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.

* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models

Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.

Adds: embed, bge, recurrent, starcoder, gemma-2b

* fix(model-routes): drop collision-prone substrings from global non-chat filter

The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:

  - gemma-2b  matches google/gemma-2b-it (instruct chat model)
  - starcoder matches bigcode/starcoder2-15b (code completion model)
  - recurrent matches google/recurrentgemma-2b (language model)
  - guard     matches meta-llama/Llama-Guard-3-8B (safety classifier)

Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.

Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS

Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS
entries redundant since the prefix check runs first.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* style: fix indentation of groq and xai test cases in test_provider_endpoints.py

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-09 11:06:12 +02:00
pewdiepie-archdaemon d397b3db2f Restore dropped regression fixes 2026-06-09 10:31:43 +09:00
pewdiepie-archdaemon 37c573d865 Fix model endpoint route test regressions 2026-06-09 10:16:38 +09:00
pewdiepie-archdaemon fa8c93ec0a Cookbook UI: Ollama browser, advanced serve fold, API tokens form, diagnosis toolbar, polish
Surface a lot of accumulated cookbook + UI work as a single non-agent
commit so the agent rework lands cleanly.

Highlights:
- Ollama as a first-class backend in the Cookbook:
  * Download input accepts ollama-style names (name:tag) → backend=ollama
  * /api/cookbook/ollama/library (cached scrape of ollama.com + curated
    fallback so classic models like qwen2.5 stay reachable)
  * "Browse Ollama library" toggle below Download with size chips
  * Engine=Ollama in hwfit toolbar merges the Ollama library into the
    main scan list as per-tag rows with the same Fit/Param/Quant/VRAM
    columns; click → fills Download input
- API Tokens form added to Integrations panel (matching wired
  loadTokens()/initTokenForm() that had no HTML)
- Serve panel polish: Advanced fold tightening (-8px nudges on vLLM
  checks, Extra args, Spec row), n_cpu_moe + Split Mode controls
  pulled up 8px to align with the row's checkboxes, GGUF File dropdown
  exposed for Ollama backend, GPU re-render on Edit serve restore,
  _forceBackend flag so saved serveState wins over backend detection,
  cookbook:servers-changed CustomEvent so panels don't need refresh
- Models page redesign: Add Models row (URL + hidden API key reveal +
  Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All +
  Clear-offline buttons in Added Models toolbar, offline-pill removed
  (opacity already conveys state), Engine dropdown gains Ollama option
- _ping_endpoint probes /v1/models then base, accepts 4xx as
  reachable (vLLM returns 404 on bare /v1, fully working endpoints
  were showing offline)
- Diagnosis card: × dismiss + Copy bundle buttons restored on the
  serve error feedback card
- Orphan tmux sweep re-enabled behind a 60s rate-limit + background
  Thread (off the main event loop) so dead serves get discovered
- cookbook_routes auto-register watchdog: drops the endpoint if the
  serve session exits non-zero within the first ~3min
- ollama-rocm sidecar awareness in download wrapper (`docker exec
  ollama-rocm ollama pull` when host ollama isn't installed)
- Skill extractor sets initial_status="published" when
  auto_approve_skills pref is on (audit demotes later)
- Skill list / model list / cookbook scan misc polish
2026-06-09 09:46:19 +09:00
Ocean Bennett e7c1d75884 fix(models): query v1 models for llama-server endpoints (#3380)
* fix(models): query v1 models for llama-server endpoints

* test(models): accept owner kwargs in llama-server regression
2026-06-09 01:09:02 +02:00
Giuseppe Castelluccio 095c74b985 fix(security): fail closed in /api/models auth gate on unexpected errors (#3489)
GET /api/models swallowed any non-HTTPException raised while checking
whether the caller is authenticated (bare except Exception: pass), so a
broken auth_manager or an exception from get_current_user silently
granted the full model list to an anonymous caller instead of rejecting
the request. Now any unexpected exception logs and returns HTTP 500.

Split out of #2360 per reviewer request to keep the deny-list and the
auth-gate fix as separate, single-purpose PRs.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 20:23:39 +02:00
stocky789 1e0d9b92af feat: add ChatGPT Subscription provider (#2876)
* feat: Add ChatGPT Subscription support and related features

- Introduced a new provider option for ChatGPT Subscription in the endpoint selection UI.
- Implemented OAuth flow for ChatGPT Subscription sign-in, including polling for authorization status.
- Updated admin interface to handle ChatGPT Subscription, including disabling API key input and providing user guidance.
- Enhanced cost tracking logic to differentiate between subscription and non-subscription endpoints.
- Added new slash commands for managing skills, including listing, searching, and invoking skills.
- Implemented caching for skill catalog to optimize performance.
- Updated tests to cover new ChatGPT Subscription functionality and ensure proper endpoint probing.
- Refactored existing code to accommodate new features and improve maintainability.

* refactor: share provider device-flow setup

- reuse one device-flow backend for Copilot and ChatGPT Subscription
- add one frontend device-flow helper for Settings and /setup
- put GitHub Copilot back into Add Models, now as a dropdown option
- make provider selection just select; clicking Add starts sign-in
- stop ChatGPT Subscription setup from opening auth tabs automatically
- make /setup copilot and /setup chatgpt-subscription work from chat
- show ChatGPT Subscription in the /setup suggestions
- show the real error message when setup fails
- add focused tests for the shared flow and setup UI

* feat(chatgpt-subscription): harden credential lifecycle and streamline auth UX

Backend:
- Resolve runtime bearer for provider-auth endpoints at probe time via a
  shared _resolve_probe_key() that delegates to resolve_endpoint_runtime,
  applied across all probe/refresh call sites.
- Skip live completion probes and health pings for discovery-only providers
  (centralized behind _is_discovery_only_provider) — the Codex/Responses API
  has no such endpoints, so status is derived from cached models.
- Never persist the short lived ChatGPT bearer to the plaintext sessions
  table; proactively clear any stale bearer left by an earlier code path.
- Revoke orphaned ProviderAuthSession credentials when the last endpoint
  backing them is deleted (_delete_orphaned_provider_auth), surfaced via
  cleared_provider_auth in the delete response.

Frontend (admin.js):
- Auto-start the device-auth flow on provider selection so the authorization
  panel (code + Authorize) shows immediately instead of behind a "Sign in" click.
- Remove the redundant top button for device auth providers, move retry
  into the panel via an inline "Try again".
- Drop the self-evident hint text and add an execCommand clipboard fallback so
  Copy works in non-secure (HTTP/LAN) contexts.

* fix: harden chatgpt subscription provider

* chore: remove PR media from branch

* Fix chatgpt subscription recovery and token handling

---------

Co-authored-by: 5p00kyy <admin@5p00ky.dev>
2026-06-08 10:19:18 +02:00
Sebastian Andres El Khoury Seoane 8d9d4ec9c6 feat(platform): Add support for APFEL as part of the dependencies and models for the Cookbook. (#2657)
* feat(platform): add support for Apple Silicon detection in platform compatibility

test(tests): enhance shell_routes tests for Apple Silicon compatibility

* fix issues with missing import

* fix: correct package name in package-lock.json and enhance package installation commands in shell_routes.py and cookbook.js

* feat: add Apfel startup and health checks on macOS

- bootstrap Apfel via Homebrew on arm64 macOS
- start `apfel --serve --port 11435` detached for Odysseus
- verify readiness via `/health`
- clean up the Apfel process on exit or Ctrl+C

* fix: duplicate variable declaration post-merge conflict
- Should fix `node` CI issues.

* fix: issues with the update status of the APFEL dependency.
- fixed by changing the main conditional that determines the update.

* Fix: Remove unnecessary whitespaces and formatting for the model_routes.py file.

* Fix: whitespace issues with the model_routes file

* Fix: Remove unnecessary whitespaces and formatting for the model_routes.py file. Final

* Fix: Fixed updates using PIP for APFEL instead of custom cmd
2026-06-07 17:28:02 +02:00
M57 12cb39cbd9 feat: add OpenCode Zen and Go as provider options (#26)
- Add OpenCode Zen (https://opencode.ai/zen/v1) and Go (https://opencode.ai/zen/go/v1)
- Add provider detection via _host_match() in llm_core.py
- Add curated model list entries in model_routes.py
- Add webhook provider URLs
- Add provider icon (providers.js) and dropdown options (index.html)
- Add auto-detection patterns and setup URLs (slashCommands.js)
- Whitelist opencode.ai in URL validation (admin.js)
- Rebased on main to fix merge conflicts with _HOST_TO_CURATED refactor

Co-authored-by: M57 <hy4ri@users.noreply.github.com>
2026-06-07 16:43:00 +02:00
michaelxer bdf4ec8b24 fix: fall back to /models probe when base URL returns 404 (#3205)
_ping_endpoint() probes the bare base URL for non-Ollama endpoints.
OpenAI-compatible servers like llama-swap return 404 on the /v1 prefix
but 200 on /v1/models, causing endpoints to appear offline despite being
fully functional.

Add a /models fallback when the base URL returns a non-auth 4xx.
Auth failures (401/403) are treated as definitive — probing /models
would just repeat the same rejection.

Fixes #3181

Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
2026-06-07 16:09:33 +02:00
Ocean Bennett 5911b8c0dc fix(models): allow same endpoint URL with different keys (#2758)
* fix(models): allow same endpoint URL with different keys

* fix(models): show endpoint key fingerprints
2026-06-05 21:12:14 +02:00
Giuseppe bc83479f94 fix: bool('false') is True coerces endpoint toggles incorrectly (#2361)
Python's bool('false') returns True because the string is non-empty.
A JS client serialising a boolean as the string 'false' would have
supports_tools or is_enabled silently flipped to True — so 'disable
tool support' would actually enable it.

Use an explicit lookup dict for supports_tools and a case-insensitive
string check for is_enabled so both string and native bool inputs are
handled correctly.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 19:43:38 +02:00
WasserEsser 20cc23c9bd fix(models): make pinned models visible in chat UI (#2481)
Two bugs prevented pinned models from appearing in the chat model picker:

1. _fetch_models() only used _cached_model_ids(), ignoring pinned_models.
   Since Fireworks AI doesn't list kimi-k2p6-turbo in /v1/models, the
   cached list was empty, so the endpoint showed as offline with no models.

2. _curate_models() filtered unknown pinned IDs into models_extra, but the
   chat UI only reads models (primary list). Pinned models stayed invisible.

Fix: use _visible_models() to merge cached + pinned, then promote pinned
IDs from models_extra to models so they appear in the dropdown.

Closes #1521 follow-up
2026-06-04 19:17:37 +02:00
tanmayraut45 f59edee611 Support extra CA bundle for private-CA LLM providers (#769)
Adding GigaChat (Sber) or an on-premise enterprise LLM gateway as a
model endpoint fails on first probe with

    CERTIFICATE_VERIFY_FAILED: self-signed certificate in certificate
    chain (_ssl.c:1000)

because their TLS chain is signed by a private root CA (Russian Trusted
Root CA for GigaChat; corporate CA for on-prem) that isn't part of the
default system / certifi trust store. The endpoint shows offline in
the picker even though the URL and API key are correct (issue #722).

The right fix is to extend the trust store, not to weaken verification.
This change:

- src/tls_overrides.py: new module that resolves an opt-in env var
  LLM_CA_BUNDLE at import time, builds a shared SSLContext via
  ssl.create_default_context() (so the system / certifi bundle is
  loaded first) and layers the operator's PEM on top with
  load_verify_locations(). Exposes llm_verify() returning a value
  suitable for httpx `verify=`. Defaults to True (httpx built-in
  trust) when the env var is unset, when the file is missing, or
  when the PEM fails to load — verification is never silently
  disabled, the warning is logged and we fall back to the safe path.

- src/llm_core.py: thread llm_verify() into the shared AsyncClient
  used by stream_llm / streaming completions.

- routes/model_routes.py: thread llm_verify() into the five httpx.get
  call sites in _probe_endpoint / _ping_endpoint so adding a
  private-CA endpoint goes green on the very first probe and the
  picker stops showing it offline.

- .env.example: document LLM_CA_BUNDLE with the GigaChat case as the
  concrete example.

Deliberately NOT included: a verify=False knob (global or per-host).
Disabling verification exposes the affected endpoint to MITM, and the
operator-supplied bundle is the correct fix for legitimate private-CA
providers — so the only switch in this PR is the safe one.

Closes #722.
2026-06-04 13:18:50 +01:00
Giuseppe f6a5f6592f fix: log warnings on silently swallowed agent and endpoint failures (#2367)
get_builtin_overrides() was swallowing all exceptions with a bare
`except Exception: pass`, so misconfigured tool-description overrides
would silently produce wrong agent behaviour with no log trace.

The background endpoint refresh loop had the same pattern: any probe
failure was silently ignored, giving operators no signal that the
refresh was broken.

Also removes a circular self-import (`from src.agent_loop import
_build_base_prompt`) inside _build_system_prompt; the function is
already in scope and the import created a latent circular reference risk.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 12:29:31 +01:00
Yuri a2e691da2b fix(models): stabilize proxy endpoint refresh behavior
* fix: support large proxy model endpoint refresh

Large OpenAI-compatible proxy endpoints can expose hundreds of models and make /v1/models slow. Treating those endpoints like local model servers caused model picker opens and background probes to repeatedly hit /models, producing timeouts and making otherwise usable endpoints appear offline.

Make model endpoint discovery cached-first for normal UI usage, add explicit proxy/API classification and refresh policy fields, exclude proxy/API endpoints from aggressive local probing, and preserve cached models when refresh fails.

Manual Test/Add/Refresh actions still fetch the full model list with longer timeouts so users can intentionally import large proxy model lists without blocking normal model picker usage.

* fix: preserve endpoint ping status semantics
2026-06-04 04:56:11 +01:00
pewdiepie-archdaemon 6861c41580 Reapply "Merge branch 'main' of github.com:pewdiepie-archdaemon/odysseus"
This reverts commit cc8fe2f6e3.
2026-06-03 22:47:00 +09:00
pewdiepie-archdaemon cc8fe2f6e3 Revert "Merge branch 'main' of github.com:pewdiepie-archdaemon/odysseus"
This reverts commit 8161c1253d, reversing
changes made to 8c2705b42a.
2026-06-03 22:46:19 +09:00
Alexandre Teixeira 145f4fd2b4 feat(models): support pinned endpoint model IDs 2026-06-03 13:00:07 +01:00
Michael Gerber e392be0d65 fix: Cookbook local GGUF serving inside Docker (#1264)
* fix: Cookbook local GGUF serving inside Docker

Cookbook’s in-container GGUF serve flow had multiple Docker-specific breakages that made local llama.cpp models fail or register against the wrong endpoint.

Fixes included here:

use the scanned model cache root when generating GGUF serve commands instead of hardcoding $HOME/.cache/huggingface/hub
fix malformed llama.cpp preflight build lines that generated invalid bash in serve runner scripts
preserve loopback model URLs inside Docker when the target port is already reachable from the Odysseus container, instead of rewriting them unconditionally to host.docker.internal
Before this change, Docker local serves could fail in several ways:

Cookbook pointed llama.cpp at the wrong GGUF path
generated serve runner scripts crashed before launch with a shell syntax error
successfully started in-container model servers were auto-registered as host.docker.internal: instead of localhost/127.0.0.1
This makes the Docker Cookbook path work as expected for: downloaded GGUF -> local llama.cpp serve -> endpoint registration

* test: add test for docker-local endpoint rewrites
2026-06-03 02:08:09 +09:00
ghreprimand 1fda906407 Fix Cookbook container-local model endpoints (#1223)
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-03 00:09:48 +09:00
red person 42ae905df7 fix(models): clear deleted endpoint fallback refs (#1207) 2026-06-02 23:41:04 +09:00
red person 76a7685105 fix(models): clear stale speech endpoint settings (#1196) 2026-06-02 23:32:01 +09:00
Robin Fröhlich 3c6ae3713e Models: add Z.AI coding endpoint and GLM vision detection 2026-06-02 20:59:17 +09:00
SurprisedDuck 934bca9e48 Providers: omit temperature for OpenAI reasoning models
* fix: omit temperature for OpenAI reasoning models (o1/o3/o4/gpt-5)

These models only accept the default temperature; sending any explicit
value (even 0.0) returns HTTP 400 "Only the default (1) value is
supported". This broke two paths:

- Endpoint probing in _probe_single_model hardcodes temperature: 0.0, so
  a perfectly valid o3/gpt-5 endpoint is reported as failing in the
  Model Endpoints health check.
- Chat/stream payloads send temperature unconditionally, so a non-default
  temperature preset 400s on these models.

The code already special-cases the same model family for
max_completion_tokens, so this adds a sibling _restricts_temperature()
helper and omits the field for those models, letting the API use its
required default. gpt-4.5 is intentionally excluded (not a reasoning
model; accepts temperature normally).

Adds tests/test_llm_core_temperature.py covering the predicate and the
synchronous payload builder.

* fix: also omit temperature for reasoning models on the direct-POST paths

The first commit only covered llm_call/llm_call_async/stream_llm and the
endpoint probe. Email auto-summary, urgency-less spam classification, the
email reply-summary endpoint, and gallery vision tagging build their
OpenAI payloads inline and POST them directly (requests/httpx), bypassing
llm_core — so a reasoning model configured there would still 400 on the
temperature field. These sites already branch on _uses_max_completion_tokens,
so they're the same class; added the matching _restricts_temperature guard.

gallery_routes also gains the max_completion_tokens branch it was missing,
so gpt-5 vision tagging works end to end.

Note: email_pollers urgency scoring goes through llm_call_async and was
already covered.
2026-06-02 20:58:33 +09:00
Yavor Ivanov 7cc8fdb2f5 Models: avoid hidden models in default fallback
Both get_default_chat and _recover_empty_session_model picked the
first model from cached_models[0] without checking hidden_models.
If the first cached model was hidden (e.g. minimax-m3), it was
returned as the default or used to repair empty session models,
even though the model list endpoints already filter hidden_models.

- Add _visible_models() helper that filters cached_models by
  hidden_models (mirrors the filtering in list_model_endpoints)
- Use _visible_models() in get_default_chat fallback (when no
  explicit default_model is saved)
- Use _visible_models() in _recover_empty_session_model (when
  repairing a session whose model field is empty before chat send)
- Add regression tests for hidden-model filtering in default chat
  resolution, and unit tests for _visible_models helper
2026-06-02 20:37:14 +09:00
Hayk Arzumanyan 514050d098 Models: rewrite Docker loopback endpoints to host gateway
In Docker, a model-endpoint URL pointing at loopback (e.g. the LM Studio
default http://localhost:1234/v1) targets the Odysseus container itself, not
the host running the server, so the probe gets a connection error and the
endpoint is rejected with a misleading 'No models found for that provider/key'.
Rewrite loopback to host.docker.internal (which compose already maps to
host-gateway) for the probe and the saved URL, mirroring the existing Ollama
handling. Gated on actually being in a container with the gateway reachable, so
native installs and gateway-less deploys are untouched.

Fixes #25

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-02 20:34:40 +09:00
tanmayraut45 6c654fb0ef Models: detect bare Ollama URLs as online
_ping_endpoint() is the reachability fallback the model-endpoint POST
handler invokes when _probe_endpoint() returns no model ids. It GETs
base + "/models" and, on any sub-500 response, returns immediately with
`reachable = (status < 400)`. That early return runs before the
Ollama-native /api/version / /api/tags fallback below it.

For an Ollama URL without /v1 (the quickstart accepts both
http://localhost:11434 and http://127.0.0.1:11434, and the reporter
on #1025 explicitly tried both), the OpenAI-style probe target is
http://127.0.0.1:11434/models. Ollama returns 404 there because /models
only lives under /v1. _ping_endpoint then returned reachable=False and
the picker showed "Added (offline — will retry on next load)" on an
install that was running fine. /api/version was never tried.

Same shape for http://127.0.0.1:11434/api (the native Ollama root):
/api/models is also 404, same premature offline verdict.

_probe_endpoint() does fall through to /api/tags on a 4xx (the response
raises via raise_for_status), so the endpoint quietly recovers once
cached_models becomes non-empty on the next background refresh —
matching the second commenter's "had to disconnect manually then
reconnect for it to be detected" note. The bug is most visible while
no models are pulled yet (cached_models stays empty, _ping_endpoint
keeps voting offline).

Fix:

- Hoist the Ollama-shaped-URL test (port == 11434 or "ollama" in
  hostname — the same condition _probe_endpoint already uses) to the
  top of the function so both code paths share it.
- Stop short-circuiting on 4xx when the URL looks like Ollama: fall
  through to the existing /api/version + /api/tags reachability loop
  so an alive Ollama gets recognised even when its OpenAI surface has
  the wrong prefix for the user's input.
- Fix the `root` computation in that loop to strip a trailing /api as
  well as /v1, so http://127.0.0.1:11434/api no longer gets probed at
  /api/api/version.
- 4xx on non-Ollama hosts keeps the current semantics: a 401 from
  api.openai.com/v1/models is still a definitive offline verdict, not
  a reason to GET /api/version on OpenAI.

Closes #1025.
2026-06-02 20:27:41 +09:00
Tatlatat ffb77d7ff2 fix(auth): honor AUTH_ENABLED=false on owner-scoped endpoints (no /login loop) (#880)
When the operator sets AUTH_ENABLED=false, three owner-scoped endpoints still
returned 401 (api/models, api/research/*, api/email/*), so the front-end
redirected the browser to /login and the app was unusable despite auth being
turned off. require_user() in src/auth_helpers.py already documents and honors
this contract (issue #622) via 'if _auth_disabled(): return ""', but these
endpoints did their own get_current_user/is_configured check without it.

Make _require_user (research), the /api/models anti-leak guard, and
email_helpers._require_auth consult _auth_disabled() and let anonymous through
(owner='') only when the operator explicitly disabled auth. The 401 protection
is fully intact when AUTH_ENABLED=true. Verified end-to-end: with
AUTH_ENABLED=false the SPA now loads instead of bouncing to /login.
2026-06-02 12:26:26 +09:00
tanmayraut45 0e31c38be0 Support in-place endpoint updates and recover empty-model sessions (#786)
The "don't wipe endpoint_url/model on endpoint delete" half of #587 landed
in 6a78b02 (Fix endpoint model preservation for tasks). The three remaining
follow-up pieces from the original PR — flagged in the review on #786 —
are:

- routes/model_routes.py: toggle_model_endpoint (PATCH) now accepts
  api_key and base_url, so the admin UI can rotate a key or fix a typo'd
  URL without going through delete+recreate. base_url is normalized the
  same way the POST handler does (strip /models, /chat/completions,
  /completions, /v1/messages, then _normalize_base). Cache invalidation
  matches the POST/DELETE paths and the response includes base_url so the
  frontend can confirm what was saved.

- routes/chat_routes.py: new _recover_empty_session_model picks
  cached_models[0] from the endpoint that matches sess.endpoint_url and
  persists it onto the Session row before the LLM call goes out. Wired
  into both /api/chat and /api/chat_stream after the existing
  _clear_orphaned_session_endpoint guard, so the order is: drop
  truly-orphaned sessions first, then heal the "picker showed it, session
  never knew" case.

- routes/chat_routes.py: when recovery fails (no endpoint, no cached
  models) raise HTTP 400 with a clear message instead of letting
  model="" reach the upstream as 401/503.

Closes #587.
2026-06-02 11:26:38 +09:00
LittleLlama 54ecfa39cf Provider detection: match by hostname instead of substring (re #768) (#815)
* Dedupe URL routing helpers and tighten adjacent hostname checks

* Match providers by hostname, not substring, in _detect_provider

_detect_provider used `"anthropic.com" in url`-style substring checks, so a URL
that merely contained a provider's domain in its path or query — or a look-alike
host like `anthropic.com.example` — was misclassified and picked the wrong
auth-header/payload shape. Switch it to the existing `_host_match` helper
(hostname exact/subdomain match), the same way the human-readable labels and
curated model lists already work, finishing that migration. Also harden
`_host_match` against trailing-dot FQDNs.

Not a credential-leak fix: _detect_provider only classifies a URL the admin
already configured next to its key, and the URL — not this function — decides
where the request goes. This is a correctness/consistency cleanup.

Adds tests that import the real helpers (test_endpoint_resolver.py tests local
copies, so it can't catch this) covering the substring false-positives.

Refs #768.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Import build_headers under its real name in model_routes

It was imported as `build_headers as _provider_headers`, which collides with
the unrelated llm_core._provider_headers(provider, headers) — same name,
different signature. Use the real name to remove the confusion.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Use hostname matching in URL builders, not raw suffix checks

PR review flagged that _detect_provider() was hardened to match on
hostname, but several helpers still used raw host.endswith("anthropic.com")
/ host.endswith("ollama.com"), which match adjacent hosts like
notanthropic.com / notollama.com.

Route the remaining checks through _host_match(): _is_ollama_native_url
and _ollama_api_root in llm_core, and _anthropic_api_root / _ollama_api_root
in endpoint_resolver. With _detect_provider already hostname-correct, the
trailing "or host.endswith(...)" clauses in build_chat_url / build_models_url
are redundant, so drop them rather than fix the substring match in place.

Add builder-level tests asserting look-alike and domain-in-path hosts route
to the OpenAI-compatible default. They import the real builders and fail on
the pre-fix code.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 11:11:17 +09:00
wundervrc 3f6d630b56 Never resolve to a disabled endpoint model (#861)
Background tasks (e.g. the Email Tags / check_email_urgency action)
resolve their model through resolve_endpoint("utility") → Default Chat.
When the configured model is one the user has since disabled on the
endpoint, the resolver still dispatched to it — on Groq that surfaces as
every email failing with "HTTP 400: model ... requires terms acceptance".

Two paths fed this:
- The auto-pick fallback selected from cached_models without excluding
  the endpoint's hidden_models, so a disabled model listed first won.
- A stale default_model left pointing at a now-disabled model (seeded at
  endpoint registration from raw model_ids[0]) was used verbatim.

Fix resolve_endpoint / resolve_endpoint_by_id to drop a configured model
that's in hidden_models and to pick the first ENABLED chat model. Also
seed default_model on registration via _first_chat_model so we never pin
the global default to an embedding/tts entry a provider lists first.

Checks: python -m pytest tests/test_endpoint_resolver.py
        tests/test_model_routes.py tests/test_model_context.py (all pass);
        python -m py_compile app.py routes/model_routes.py
        src/endpoint_resolver.py.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 11:10:43 +09:00
pewdiepie-archdaemon 6a78b02976 Fix endpoint model preservation for tasks 2026-06-02 09:44:24 +09:00
Alexandre Teixeira 26483661da Restrict provider discovery to admins
Require admin access before serving provider discovery data from
GET /api/providers. This prevents normal authenticated users from
triggering provider discovery or receiving cached provider host data.

Keep GET /api/models available to normal users and leave the existing
admin-only GET /api/discover behavior unchanged.

Add a focused regression test to ensure unauthorized callers cannot
trigger discovery and cannot receive cached provider data.
2026-06-02 05:54:40 +09:00
Prakhya a96593a99b Improve Ollama endpoint error messages 2026-06-02 05:53:50 +09:00
Abhinav 9e8de43f25 fix: clear session headers on endpoint deletion (#477) 2026-06-01 22:19:54 +09:00
Alexander Kenley 2c4b8b57dd feat(ai): add OpenRouter and Ollama Cloud providers (#231)
Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-01 14:26:10 +09:00
Sirsyorrz 09acf955f1 models: dedupe endpoints by base_url on create (#266)
POST /api/model-endpoints always inserted a new row, so Settings -> Add
Models -> Scan for Servers re-added any endpoint a user had already
registered manually — once under its model name (from the earlier
manual add) and again under its host:port (auto-generated when scan
posts without a name). The success toast then misreported the result
as "added N new".

Look up an existing endpoint with the same base_url accessible to the
caller (shared or owned by them) before inserting. If found, return it
with `existing: true` so the client can tell the difference between
an actual add and a dedupe hit. Toast now reads, e.g.,
"Found 1 server with 1 model — 1 already added".

Tested: POSTing the same base_url three times (incl. trailing-slash
variation) returns the same id each time; only one row exists.
2026-06-01 14:22:06 +09:00
pewdiepie-archdaemon fc7f107b22 Improve Ollama setup and model endpoint handling 2026-06-01 10:00:15 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00