Commit Graph

14 Commits

Author SHA1 Message Date
Lucas Daniel 55ff22c6d5 fix(chat): stabilize system prompt, sequence memory extraction, and send stable session id to preserve KV cache (#3360)
* fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache

Fixes #2927. As diagnosed in the issue, three things in Odysseus's request
pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache
continuity, forcing a full prompt re-evaluation (15-30s+) on every turn:

1. Dynamic content folded into the system prompt every turn. Both the chat
   preface (ChatProcessor.build_context_preface) and the agent system prompt
   (_build_system_prompt) injected current_datetime_prompt() — text that
   changes every minute — directly into system-role messages, which llm_core
   then concatenates into the single system message sent as the cached
   prefix. Any byte difference there invalidates the entire cache. Moved this
   to a new current_datetime_context_message() helper that returns a
   standalone user-role message, inserted near the end of the array (right
   before the latest user turn) instead of mixed into the system prompt. The
   static system prefix (preset prompt + safety policy + agent base prompt)
   now stays byte-identical across turns of the same session.

2. Memory/skill extraction side-requests competed with the main completion.
   run_post_response_tasks fired extract_and_store / maybe_extract_skill via
   asyncio.create_task — fire-and-forget coroutines that could overlap the
   next turn's main request and steal llama.cpp's limited processing slots,
   evicting the cached checkpoint. They're now queued through a new
   _run_extraction_jobs_sequentially helper that waits for the session's
   stream to go idle and runs the jobs strictly one at a time.

3. No stable session identifier was sent to local backends, so llama.cpp
   assigned a new processing slot via LRU every turn ("session_id=<empty>
   server-selected (LCP/LRU)"), losing slot affinity. Added
   _apply_local_cache_affinity() in llm_core, which sets session_id and
   cache_prompt: true on outgoing payloads — gated to self-hosted
   OpenAI-compatible endpoints only (never api.openai.com or other cloud
   providers, which reject unrecognized request fields with a 400). Threaded
   session_id through stream_llm / llm_call_async / stream_agent_loop from
   the existing Odysseus session id.

Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload-
assembly and scheduling code paths: byte-identical system prefix across two
turns of the same session (with a regression check that genuinely changed
instructions DO still change it), the dynamic time block landing as a
user-role message, extraction jobs waiting for the stream to go idle and
running sequentially, and the outgoing payload carrying a stable session_id
(same across turns of one session, different across sessions) only for
self-hosted endpoints. Updated tests/test_user_time.py for the new message
placement.

* fix(tests): accept owner= kwarg in normalize_model_id monkeypatch

The upstream normalize_model_id signature now takes an owner= keyword
argument, and chat_helpers.py passes owner=getattr(sess, "owner", None)
at the call site. Update the test stub lambda to **kwargs so it handles
the new argument without breaking, and update chat_helpers.py to forward
the owner parameter consistently.

---------

Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
2026-06-09 22:46:54 +01:00
Lucas Daniel 5462030cde fix(auth): per-user allowed-models checklist ignores cache, [None] doesn't block (#3355)
Three issues combined to make the per-user 'Allowed models' checklist
unreliable (#3032):

1. admin.js _loadModelsForUser fetched /api/models, which is backed by
   cached_models — endpoints that haven't been probed yet (e.g. a
   freshly-added DeepSeek API endpoint) simply didn't show up in the
   checklist. Switched to /api/model-endpoints, which always reflects
   every configured endpoint regardless of cache state.

2. _saveModels sent allowed_models: [] both when the admin clicked
   [All] (no restriction) and [None] (block everything) — the backend
   had no way to distinguish the two.

3. _enforce_chat_privileges treated an empty allowed_models list as
   'no restriction' (falsy -> skip the check), so [None] had no effect.

Added an explicit block_all_models privilege flag (defaulting to False,
and forced to False for admins) that admin.js now sets when zero models
are checked. _enforce_chat_privileges checks it first and 403s
regardless of allowed_models contents.
2026-06-08 22:52:39 +02:00
stocky789 1e0d9b92af feat: add ChatGPT Subscription provider (#2876)
* feat: Add ChatGPT Subscription support and related features

- Introduced a new provider option for ChatGPT Subscription in the endpoint selection UI.
- Implemented OAuth flow for ChatGPT Subscription sign-in, including polling for authorization status.
- Updated admin interface to handle ChatGPT Subscription, including disabling API key input and providing user guidance.
- Enhanced cost tracking logic to differentiate between subscription and non-subscription endpoints.
- Added new slash commands for managing skills, including listing, searching, and invoking skills.
- Implemented caching for skill catalog to optimize performance.
- Updated tests to cover new ChatGPT Subscription functionality and ensure proper endpoint probing.
- Refactored existing code to accommodate new features and improve maintainability.

* refactor: share provider device-flow setup

- reuse one device-flow backend for Copilot and ChatGPT Subscription
- add one frontend device-flow helper for Settings and /setup
- put GitHub Copilot back into Add Models, now as a dropdown option
- make provider selection just select; clicking Add starts sign-in
- stop ChatGPT Subscription setup from opening auth tabs automatically
- make /setup copilot and /setup chatgpt-subscription work from chat
- show ChatGPT Subscription in the /setup suggestions
- show the real error message when setup fails
- add focused tests for the shared flow and setup UI

* feat(chatgpt-subscription): harden credential lifecycle and streamline auth UX

Backend:
- Resolve runtime bearer for provider-auth endpoints at probe time via a
  shared _resolve_probe_key() that delegates to resolve_endpoint_runtime,
  applied across all probe/refresh call sites.
- Skip live completion probes and health pings for discovery-only providers
  (centralized behind _is_discovery_only_provider) — the Codex/Responses API
  has no such endpoints, so status is derived from cached models.
- Never persist the short lived ChatGPT bearer to the plaintext sessions
  table; proactively clear any stale bearer left by an earlier code path.
- Revoke orphaned ProviderAuthSession credentials when the last endpoint
  backing them is deleted (_delete_orphaned_provider_auth), surfaced via
  cleared_provider_auth in the delete response.

Frontend (admin.js):
- Auto-start the device-auth flow on provider selection so the authorization
  panel (code + Authorize) shows immediately instead of behind a "Sign in" click.
- Remove the redundant top button for device auth providers, move retry
  into the panel via an inline "Try again".
- Drop the self-evident hint text and add an execCommand clipboard fallback so
  Copy works in non-secure (HTTP/LAN) contexts.

* fix: harden chatgpt subscription provider

* chore: remove PR media from branch

* Fix chatgpt subscription recovery and token handling

---------

Co-authored-by: 5p00kyy <admin@5p00ky.dev>
2026-06-08 10:19:18 +02:00
Vykos 83b0ab7cd3 Scope auxiliary LLM endpoints by owner (#2996)
* fix(auth): scope auxiliary llm endpoints by owner

* fix(auth): scope auxiliary llm fallbacks by owner
2026-06-07 14:47:44 +02:00
Nicholai a3cb15d0a1 fix(agent): enforce guide-only tool policy (#3088) 2026-06-06 18:48:24 -06:00
Mohammed Riaz 6ccd4500d7 fix(chat): show requested and actual reply models
Show requested and actual reply models in chat labels when fallback or provider routing changes the responding model.
2026-06-06 04:30:16 -06:00
ghreprimand 545e692565 fix(auth): distinguish empty model allowlists (#2938)
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-05 20:27:10 +02:00
RaresKeY c12c2aa233 fix: normalize Gemma 4 thought-channel output (#2224) 2026-06-04 19:26:58 +02:00
Denis Kutuzov (Rybak27) ec3b8b42ae fix: auto-naming for 24h time format (#1374)
* fix: auto-naming for 24h time format

needs_auto_name() required AM/PM suffix for default
frontend-generated names like 'deepseek-v4-flash 17:46:02'.
Frontend uses toLocaleTimeString() which outputs 24h
format in most locales — so the regex never matched and
auto-naming silently skipped.

Made AM/PM optional and added re.IGNORECASE for 'am'/'pm'.

* test: add regression tests for needs_auto_name (24h + 12h + custom)

---------

Co-authored-by: Calculator Dev <dev@calculator.local>
2026-06-03 14:14:34 +09:00
Vykos 4771d80eb2 Harden session endpoint owner scope (#1308) 2026-06-03 02:40:22 +09:00
red person fd89d098a1 Chat: use cached endpoint model ids before probing 2026-06-02 21:00:58 +09:00
Tushar-Projects c3228f8b59 Background tasks: respect active session model fallback 2026-06-02 20:57:42 +09:00
Alexander Kenley 2c4b8b57dd feat(ai): add OpenRouter and Ollama Cloud providers (#231)
Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-01 14:26:10 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00