odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-18 02:35:23 -04:00

Author	SHA1	Message	Date
Kenny Van de Maele	263d41c58a	fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers (#3945) * fix(llm): stop sending llama.cpp slot-affinity fields to cloud providers _apply_local_cache_affinity adds session_id + cache_prompt for llama.cpp KV-cache slot affinity (#2927), gated on _is_self_hosted_openai_compatible, which treated any unknown OpenAI-compatible host as self-hosted. Strict cloud providers added as custom endpoints (Mistral at api.mistral.ai) reject unknown body fields, so every request failed with 422 extra_forbidden. Self-hosted now also requires the endpoint to resolve as local via model_context.is_local_endpoint: loopback/private/tailscale host, or endpoint kind explicitly configured as "local" (the escape hatch for tunneled self-hosted servers). is_local_endpoint is promoted to a public name since llm_core now shares it. Fixes #3793 * test(llm): sweep cloud OpenAI-compatible hosts in affinity gating Parametrized cases adapted from #3839 (credit: Shabablinchikow): deepseek, x.ai, together, fireworks, and the Gemini OpenAI-compat endpoint must all stay free of the llama.cpp extras, not just the Mistral host from #3793. * fix(llm): narrow the Tailscale range to 100.64.0.0/10 in is_local_endpoint Review finding on #3945: _PRIVATE_PREFIXES carried a bare "100." prefix, treating all of 100.0.0.0/8 as local while Tailscale only uses the CGNAT block 100.64.0.0/10. Public 100.x hosts (e.g. AWS ranges outside the block) were classified local and still received the llama.cpp extras this PR exists to keep away from strict providers. Match the narrowed classification routes/model_routes.py already uses, with boundary tests just below, inside, and just above the range.	2026-06-11 17:51:03 +02:00
Ocean Bennett	e7c1d75884	fix(models): query v1 models for llama-server endpoints (#3380) * fix(models): query v1 models for llama-server endpoints * test(models): accept owner kwargs in llama-server regression	2026-06-09 01:09:02 +02:00
nubs	6973c5427c	fix(model-context): count tool_calls in estimate_tokens so compaction sees real size (#2751)	2026-06-05 15:56:54 +02:00
nubs	19a3fc59c9	fix(model-context): key context-window cache by (endpoint, model) (#2614) get_context_length() cached the resolved context window by model id alone, so two different remote endpoints serving the same model id (e.g. a capped proxy at 8k vs. the full provider at 200k) collided: the first to resolve won process-wide and the other endpoint was served the wrong window. That silently over-trims conversations on the larger-window endpoint (it feeds context_compactor) or overflows the smaller one (provider 400s). Key the cache on (endpoint_url, model). Local endpoints already always re-query, so they are unaffected. Fixes #2603	2026-06-05 02:50:56 +02:00
Kenny Van de Maele	1cd0aa2b8c	feat(provider): add GitHub Copilot provider with device-flow auth (#1480) * feat(provider): add GitHub Copilot provider with device-flow auth Adds GitHub Copilot as a model provider, so Copilot models (gpt-4o/4.1/5, Claude, Gemini, …) work through the normal chat + agent loop, incl. native tool calling and vision. Auth is one-click via the GitHub OAuth device flow; the access token is stored as the endpoint's (encrypted) api_key and sent directly as `Authorization: Bearer` (no Copilot-token exchange, no refresh, matching how editors talk to the Copilot API). Copilot is a normal ModelEndpoint detected by host; the only provider-specific behaviour is a small set of required request headers, injected centrally. Sign-in is available from Settings → model endpoints ("Connect GitHub Copilot") and from chat via `/setup copilot`. - src/copilot.py (new), routes/copilot_routes.py (new): constants, header builders, device-flow start/poll, model discovery, owner-scoped endpoint provisioning. - src/llm_core.py, src/endpoint_resolver.py: detect `copilot`, inject headers, per-request x-initiator/vision. - src/agent_loop.py: allowlist api.githubcopilot.com for native tool schemas. - src/model_context.py: known context windows for Copilot (no unauthenticated /models probe). - static/, README, tests/test_copilot.py. Tidy copilot_routes: clarify supports_tools, note _PENDING is per-process	2026-06-04 21:13:14 +02:00
Yuri	a2e691da2b	fix(models): stabilize proxy endpoint refresh behavior * fix: support large proxy model endpoint refresh Large OpenAI-compatible proxy endpoints can expose hundreds of models and make /v1/models slow. Treating those endpoints like local model servers caused model picker opens and background probes to repeatedly hit /models, producing timeouts and making otherwise usable endpoints appear offline. Make model endpoint discovery cached-first for normal UI usage, add explicit proxy/API classification and refresh policy fields, exclude proxy/API endpoints from aggressive local probing, and preserve cached models when refresh fails. Manual Test/Add/Refresh actions still fetch the full model list with longer timeouts so users can intentionally import large proxy model lists without blocking normal model picker usage. * fix: preserve endpoint ping status semantics	2026-06-04 04:56:11 +01:00
danielroytel	39848a168b	fix: recognize Gemma 4 as a thinking model and add context entry (#1642) Gemma 4 returns reasoning_content in streaming responses via llama-server, but the model wasn't listed in _THINKING_MODEL_PATTERNS, causing reasoning tokens to be mishandled. Add "gemma" to the pattern list and register Gemma 4's 128K context window in KNOWN_CONTEXT_WINDOWS so the agent loop budgets context correctly. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 14:23:18 +09:00
SurprisedDuck	d06b6d87d3	Models: prefer longest known context match KNOWN_CONTEXT_WINDOWS lists 'o1' (200k) before 'o1-mini' (128k), and _lookup_known returned on the first substring hit — so "o1-mini" matched 'o1' and reported 200000 instead of 128000. Track the longest matching key instead, so the most specific entry wins regardless of table order.	2026-06-02 20:33:09 +09:00
ooovenenoso	cd6041477c	Refresh local model context after restart Co-authored-by: Kevin <120500656+oooindefatigable@users.noreply.github.com>	2026-06-02 05:54:06 +09:00
Elle	d885c70462	Treat Docker host gateway as local When running Odysseus in Docker and connecting to a local LLM on the host machine (e.g. `llama.cpp` or `Ollama`), the standard endpoint `http://host.docker.internal` is used to reach the host from inside the container. Because `host.docker.internal` was missing from `_LOCAL_HOSTS`, Odysseus incorrectly treated local self-hosted models as cloud APIs. This triggered the fallback behavior where the context limits reported by the API were ignored and overridden by hardcoded fallbacks in `KNOWN_CONTEXT_WINDOWS`. Changes - Added `"host.docker.internal"` to the `_LOCAL_HOSTS` whitelist in `src/model_context.py` so that Dockerized deployments trust and respect the context limits of locally hosted models. Checks Ran - [x] Syntax check (`python -m py_compile src/model_context.py`) - [x] Tested manually in Docker (`docker compose up -d --build`) on a Windows host using `llama-server`. The API-reported context length now appears in the UI instead of the 131k hardcoded fallback.	2026-06-02 05:49:59 +09:00
pewdiepie-archdaemon	e5c99a5eee	Odysseus v1.0	2026-05-31 23:58:26 +09:00
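
The local-endpoint rule described in commits 263d41c58a and d885c70462 can be illustrated with a minimal sketch. It is not the code in `src/model_context.py`: the names `LOCAL_HOSTS`, `TAILSCALE_CGNAT`, `is_local_endpoint_sketch`, and `apply_local_cache_affinity_sketch` are hypothetical, the real gate also checks `_is_self_hosted_openai_compatible`, and the use of Python's `ipaddress` module in place of string prefixes is an assumption. Only the facts from the commit messages are taken as given: loopback, private, and Tailscale (100.64.0.0/10) hosts count as local, `host.docker.internal` counts as local, an endpoint kind of "local" is an escape hatch, and `session_id` plus `cache_prompt` must not reach cloud hosts.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse

# Hypothetical stand-ins for the project's host tables (assumption).
LOCAL_HOSTS = {"localhost", "host.docker.internal"}
# Tailscale uses only the CGNAT block, not all of 100.0.0.0/8.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_local_endpoint_sketch(url: str, kind: Optional[str] = None) -> bool:
    """Return True if the endpoint should be treated as self-hosted/local."""
    # Escape hatch: tunneled self-hosted servers configured as "local".
    if kind == "local":
        return True
    host = (urlparse(url).hostname or "").lower()
    if host in LOCAL_HOSTS:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name that is not in LOCAL_HOSTS is treated as a cloud host.
        return False
    return addr.is_loopback or addr.is_private or addr in TAILSCALE_CGNAT


def apply_local_cache_affinity_sketch(
    body: dict, url: str, session_id: str, kind: Optional[str] = None
) -> dict:
    """Add the llama.cpp slot-affinity fields only for local endpoints."""
    if not is_local_endpoint_sketch(url, kind):
        return body  # strict cloud providers reject unknown body fields
    return {**body, "session_id": session_id, "cache_prompt": True}
```

With this shape, a request to `https://api.mistral.ai/v1` keeps its original body, while `http://100.64.0.1:8080/v1` or a tunneled host configured with kind "local" receives the two extra fields.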

11 Commits
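
Two of the context-window fixes in the log (d06b6d87d3 and 19a3fc59c9) are easier to follow with a small sketch. The code below is illustrative only: the function names, the `resolve` callback, and the two-entry table are assumptions, and only the `o1` (200k) and `o1-mini` (128k) values and the `(endpoint_url, model)` cache key come from the commit messages.

```python
from typing import Callable, Dict, Optional, Tuple

# Illustrative subset of the table; the real KNOWN_CONTEXT_WINDOWS is larger.
KNOWN_CONTEXT_WINDOWS: Dict[str, int] = {"o1": 200_000, "o1-mini": 128_000}


def lookup_known_sketch(model: str) -> Optional[int]:
    """Return the window of the longest table key contained in the model id."""
    name = model.lower()
    best_key: Optional[str] = None
    for key in KNOWN_CONTEXT_WINDOWS:
        # Keep the longest substring match so 'o1-mini' beats 'o1',
        # regardless of the order of entries in the table.
        if key in name and (best_key is None or len(key) > len(best_key)):
            best_key = key
    return KNOWN_CONTEXT_WINDOWS[best_key] if best_key is not None else None


# Cache keyed by (endpoint_url, model) rather than by model id alone, so two
# endpoints serving the same model id cannot overwrite each other's window.
_context_cache: Dict[Tuple[str, str], int] = {}


def get_context_length_sketch(
    endpoint_url: str, model: str, resolve: Callable[[str, str], int]
) -> int:
    """Resolve the context window once per (endpoint, model) pair."""
    key = (endpoint_url, model)
    if key not in _context_cache:
        _context_cache[key] = resolve(endpoint_url, model)
    return _context_cache[key]
```

In this sketch, a capped proxy and the full provider serving the same model id each keep their own cached window, which is the collision the `(endpoint, model)` key is meant to prevent.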