* fix(chat): stabilize system prompt, sequence memory extraction, send stable session id to preserve KV cache
Fixes#2927. As diagnosed in the issue, three things in Odysseus's request
pattern actively destroyed local backends' (llama.cpp / LM Studio) KV-cache
continuity, forcing a full prompt re-evaluation (15-30s+) on every turn:
1. Dynamic content folded into the system prompt every turn. Both the chat
preface (ChatProcessor.build_context_preface) and the agent system prompt
(_build_system_prompt) injected current_datetime_prompt() — text that
changes every minute — directly into system-role messages, which llm_core
then concatenates into the single system message sent as the cached
prefix. Any byte difference there invalidates the entire cache. Moved this
to a new current_datetime_context_message() helper that returns a
standalone user-role message, inserted near the end of the array (right
before the latest user turn) instead of mixed into the system prompt. The
static system prefix (preset prompt + safety policy + agent base prompt)
now stays byte-identical across turns of the same session.
2. Memory/skill extraction side-requests competed with the main completion.
run_post_response_tasks fired extract_and_store / maybe_extract_skill via
asyncio.create_task — fire-and-forget coroutines that could overlap the
next turn's main request and steal llama.cpp's limited processing slots,
evicting the cached checkpoint. They're now queued through a new
_run_extraction_jobs_sequentially helper that waits for the session's
stream to go idle and runs the jobs strictly one at a time.
3. No stable session identifier was sent to local backends, so llama.cpp
assigned a new processing slot via LRU every turn ("session_id=<empty>
server-selected (LCP/LRU)"), losing slot affinity. Added
_apply_local_cache_affinity() in llm_core, which sets session_id and
cache_prompt: true on outgoing payloads — gated to self-hosted
OpenAI-compatible endpoints only (never api.openai.com or other cloud
providers, which reject unrecognized request fields with a 400). Threaded
session_id through stream_llm / llm_call_async / stream_agent_loop from
the existing Odysseus session id.
Tests in tests/test_kv_cache_invalidation_2927.py exercise the real payload-
assembly and scheduling code paths: byte-identical system prefix across two
turns of the same session (with a regression check that genuinely changed
instructions DO still change it), the dynamic time block landing as a
user-role message, extraction jobs waiting for the stream to go idle and
running sequentially, and the outgoing payload carrying a stable session_id
(same across turns of one session, different across sessions) only for
self-hosted endpoints. Updated tests/test_user_time.py for the new message
placement.
* fix(tests): accept owner= kwarg in normalize_model_id monkeypatch
The upstream normalize_model_id signature now takes an owner= keyword
argument, and chat_helpers.py passes owner=getattr(sess, "owner", None)
at the call site. Update the test stub lambda to **kwargs so it handles
the new argument without breaking, and update chat_helpers.py to forward
the owner parameter consistently.
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
Three issues combined to make the per-user 'Allowed models' checklist
unreliable (#3032):
1. admin.js _loadModelsForUser fetched /api/models, which is backed by
cached_models — endpoints that haven't been probed yet (e.g. a
freshly-added DeepSeek API endpoint) simply didn't show up in the
checklist. Switched to /api/model-endpoints, which always reflects
every configured endpoint regardless of cache state.
2. _saveModels sent allowed_models: [] both when the admin clicked
[All] (no restriction) and [None] (block everything) — the backend
had no way to distinguish the two.
3. _enforce_chat_privileges treated an empty allowed_models list as
'no restriction' (falsy -> skip the check), so [None] had no effect.
Added an explicit block_all_models privilege flag (defaulting to False,
and forced to False for admins) that admin.js now sets when zero models
are checked. _enforce_chat_privileges checks it first and 403s
regardless of allowed_models contents.
* feat: Add ChatGPT Subscription support and related features
- Introduced a new provider option for ChatGPT Subscription in the endpoint selection UI.
- Implemented OAuth flow for ChatGPT Subscription sign-in, including polling for authorization status.
- Updated admin interface to handle ChatGPT Subscription, including disabling API key input and providing user guidance.
- Enhanced cost tracking logic to differentiate between subscription and non-subscription endpoints.
- Added new slash commands for managing skills, including listing, searching, and invoking skills.
- Implemented caching for skill catalog to optimize performance.
- Updated tests to cover new ChatGPT Subscription functionality and ensure proper endpoint probing.
- Refactored existing code to accommodate new features and improve maintainability.
* refactor: share provider device-flow setup
- reuse one device-flow backend for Copilot and ChatGPT Subscription
- add one frontend device-flow helper for Settings and /setup
- put GitHub Copilot back into Add Models, now as a dropdown option
- make provider selection just select; clicking Add starts sign-in
- stop ChatGPT Subscription setup from opening auth tabs automatically
- make /setup copilot and /setup chatgpt-subscription work from chat
- show ChatGPT Subscription in the /setup suggestions
- show the real error message when setup fails
- add focused tests for the shared flow and setup UI
* feat(chatgpt-subscription): harden credential lifecycle and streamline auth UX
Backend:
- Resolve runtime bearer for provider-auth endpoints at probe time via a
shared _resolve_probe_key() that delegates to resolve_endpoint_runtime,
applied across all probe/refresh call sites.
- Skip live completion probes and health pings for discovery-only providers
(centralized behind _is_discovery_only_provider) — the Codex/Responses API
has no such endpoints, so status is derived from cached models.
- Never persist the short lived ChatGPT bearer to the plaintext sessions
table; proactively clear any stale bearer left by an earlier code path.
- Revoke orphaned ProviderAuthSession credentials when the last endpoint
backing them is deleted (_delete_orphaned_provider_auth), surfaced via
cleared_provider_auth in the delete response.
Frontend (admin.js):
- Auto-start the device-auth flow on provider selection so the authorization
panel (code + Authorize) shows immediately instead of behind a "Sign in" click.
- Remove the redundant top button for device auth providers, move retry
into the panel via an inline "Try again".
- Drop the self-evident hint text and add an execCommand clipboard fallback so
Copy works in non-secure (HTTP/LAN) contexts.
* fix: harden chatgpt subscription provider
* chore: remove PR media from branch
* Fix chatgpt subscription recovery and token handling
---------
Co-authored-by: 5p00kyy <admin@5p00ky.dev>
* fix: auto-naming for 24h time format
needs_auto_name() required AM/PM suffix for default
frontend-generated names like 'deepseek-v4-flash 17:46:02'.
Frontend uses toLocaleTimeString() which outputs 24h
format in most locales — so the regex never matched and
auto-naming silently skipped.
Made AM/PM optional and added re.IGNORECASE for 'am'/'pm'.
* test: add regression tests for needs_auto_name (24h + 12h + custom)
---------
Co-authored-by: Calculator Dev <dev@calculator.local>