Commit Graph

64 Commits

Author SHA1 Message Date
pewdiepie-archdaemon 6d507f8128 Merge remote-tracking branch 'origin/dev' into test-main-dev-merge-20260615
# Conflicts:
#	src/tool_implementations.py
#	static/js/research/panel.js
2026-06-15 21:20:15 +09:00
Kfir Sadeh d8e7cc7053 feat(ui): add real-time diagnostic logs console (#974)
* feat(diagnostics): add admin-gated real-time diagnostics logs terminal UI

* feat(ui): resolve diagnostics logs feedback and optimize client-side caching

* feat(ui): resolve diagnostics logs feedback
2026-06-15 10:32:51 +02:00
Hasn 0939983ddf pwa missing icons added (#428) 2026-06-15 16:00:13 +09:00
pewdiepie-archdaemon bb66914b1e Settings: clamp logo SVGs to 18px chip + endpoint dropdown gets logo
Provider SVGs in providers.js declare only viewBox (no width/height),
so when injected into the 18×18 logo chips they fell back to the
browser default of 300×150 and blew out the row.

- CSS: SVGs inside settings logo chips (`span[id$="-logo"]`,
  the 18px wrappers in fallback rows) now stretch to 100%/100% of
  their container.
- Added matching `-logo` chip next to the Endpoint dropdowns in
  Default Chat Model and Utility Model cards.
- New `_syncEndpointLogo` helper mirrors the selected endpoint
  option's text label through providerLogo() (the select value is
  a UUID and wouldn't match anything otherwise), and
  `_fillEndpointSelect` calls it on each render.
2026-06-13 23:00:16 +09:00
pewdiepie-archdaemon f78084c230 Brain cards 32px tall + Trending tab up 8px + drop hwfit Rescan
- Brain admin-card header rows get min-height:32px so cards with
  toggles and cards without (Inject Skills) align.
- Cookbook Trending models tab nudged up 8px (top:-3 → -11).
- Removed the ↻ RESCAN button in hwfit toolbar; manual EDIT still
  available and auto-probe runs on container restart.
2026-06-13 19:56:22 +09:00
pewdiepie-archdaemon 7004e1de7b Brain settings: reorder + AI star icons on each toggle
- Reordered: Auto-extract memories → Auto-extract skills →
  Auto-approve skills → Inject Skills (Auto-approve now above
  Inject so all three AI-driven toggles cluster together)
- Added accent-tinted star icon (the AI star) before:
  Auto-extract memories, Auto-extract skills, Auto-approve skills
- Inject Skills gets a neutral down-arrow-into-line icon (it's
  configuration, not AI work)
2026-06-13 19:36:10 +09:00
pewdiepie-archdaemon e2a30c0600 Skills: Audit on left + accent star, Select w/ dot/X icon swap
- Reordered the toolbar so Audit sits left of Select (matches the
  brain memories layout where bulk actions live before Select)
- Renamed "Audit all" → "Audit"
- Star icon in Audit now tinted with var(--accent, var(--red))
- Select button gets the same dot/X SVG swap used in brain
  memories (dot in idle state, X when bulk-select mode is active)
2026-06-13 15:47:16 +09:00
Kenny Van de Maele 620fdd0859 feat(agent): confine agent file/shell tools to a selectable workspace (#3665)
* feat(agent): workspace confinement via context-local binding + get_workspace tool

Bind the per-turn workspace once in execute_tool_block; the shared path
resolvers (_resolve_tool_path / _resolve_search_root) and the subprocess cwd
helper (agent_cwd) read it, so file tools + bash/python are confined centrally
and a new tool that uses the shared helpers cannot accidentally bypass it.

Adds the admin-gated /api/workspace/browse picker, a workspace pill + directory
modal (reusing existing modal/button CSS), the /workspace slash command, and a
get_workspace tool (replaces a system-prompt block). Confinement is OS-agnostic
(realpath/normcase/commonpath) and docker-safe (container paths, no host
assumptions). Reopens #2023.

* ux(workspace): clarify workspace is not a sandbox

Picker modal note + pill tooltip + get_workspace tool/output wording now state
plainly: read_file/write_file/edit_file/grep/glob/ls are confined to the folder,
but bash/python only start there (cwd) and are not sandboxed. Modal note reuses
the existing .muted class.

* fix(agent): treat an active workspace as file-work intent

A vague low-signal message (e.g. "look at the local project") matches no
domain keywords, so tool retrieval is skipped and only always-available tools
are offered — leaving the agent with no file access even though a workspace is
set. When a workspace is active, include the file/code tools (incl.
get_workspace) on low-signal turns so the agent can act on the folder.

Also requires the tool index (ChromaDB) to be reachable for normal retrieval;
that is an environment dependency, not part of this change.

* ux(workspace): hide pill + overflow entry in chat mode

Workspace only scopes the agent's file/shell tools, so the pill and the
overflow 'Workspace' entry are agent-only now — hidden in chat mode like the
bash toggle. Mode read from the DOM in syncWorkspaceIndicator; applyMode() is
called from the agent/chat setMode handler.

* prompt(tools): steer bash/python to defer to the dedicated file tools

bash/python schema descriptions (what native-tool-calling models read) were
bare and gave no steer, so models would do file ops via the shell (e.g. writing
SVG/HTML, which then dumps raw markup into the tool preview). Tell bash/python
in the schema + tool-index + prompt section to prefer read_file/write_file/
edit_file/grep/glob/ls and only be used for what those do not cover.

* prompt(tools): keep bash/python deferral generic (no hardcoded tool names)

Reference 'a dedicated tool' rather than listing read_file/write_file/grep/etc.
by name, so the guidance does not go stale if those tools are renamed.

* style(workspace): drop em-dashes from added code comments/strings

* ux(workspace): terser non-sandbox note in picker (no tool-name list)

* ux(workspace): mirror terse non-sandbox wording in pill tooltip

* chore: untrack local venv symlink (run-only, not part of the feature)

* prompt(workspace): keep get_workspace text generic (no hardcoded tool names)

* fix(agent): low-signal + workspace surfaces only read-only file tools

Intersect the files tool group with PLAN_MODE_READONLY_TOOLS so a vague message
in a workspace exposes read_file/grep/glob/ls/get_workspace for exploration, but
not write_file/edit_file/bash/python -- those wait for a request that actually
calls for them (RAG retrieval still adds them on a real ask).

* feat(workspace): cap browse listing at 500 dirs with a truncated hint

Mirror the filesystem_tools._CODENAV_MAX_HITS pattern with a module-local
_MAX_BROWSE_DIRS so a directory with thousands of children does not dump every
row into the picker; the response carries a truncated flag and the modal tells
the user to type a path to jump in.

* chore: untrack local venv symlink (run-only artifact)

* fix(workspace): vet the workspace root against the sensitive-path deny list at bind time

The in-workspace resolver deny-lists sensitive paths inside the workspace,
but the empty-path search root is the workspace itself, so a workspace of
~/.ssh could be listed via ls with no path. vet_workspace() (public, in
tool_execution next to the resolvers) rejects non-directories and sensitive
roots before the path is ever bound; chat_routes uses it instead of its
inline isdir check.

* fix(workspace): reject filesystem roots and stop showing rejected workspaces as active

Review findings from #3665:

P2: vet_workspace accepted / (and would accept drive/UNC roots), which makes
every absolute path 'inside' the workspace and collapses confinement into
host-wide file access. A root is its own dirname, so reject when
dirname(resolved) == resolved; the browse response now carries a selectable
flag and the picker disables 'Use this folder' on unselectable dirs.

P3: /workspace set stored any string client-side and the chat route silently
dropped rejected values, so the pill could claim a confinement that was not
in effect. New admin-gated /api/workspace/vet validates manual paths before
they persist (canonical path returned), and when a posted workspace is
rejected at send time the stream emits workspace_rejected so the client
clears the stored value and toasts instead of continuing silently.

* fix(workspace): check caller privilege before vetting the posted workspace

Review finding: /api/chat_stream called vet_workspace() on the posted value
for every caller and emitted workspace_rejected on failure, so a non-admin
who can chat but cannot use file/shell tools could distinguish existing
directories from missing/file/sensitive/root paths by whether the event
appeared. The resolution now lives in _resolve_request_workspace, which
drops the submitted value uniformly for non-admin callers, with no vetting
and no event, before the path ever touches the filesystem. Admin and
single-user behavior is unchanged. Test pins that valid and invalid paths
are indistinguishable for a non-admin and that vet_workspace is never
invoked for them.
2026-06-11 18:17:54 +02:00
pewdiepie-archdaemon bc2d934b94 Agent email safety: stage drafts for user approval instead of auto-send
Closes the auto-send hole that let earlier models invent signatures
(e.g. signing 'David' for a user named Felix) and SMTP them to real
recipients before the user could review.

New setting: agent_email_confirm (default True).

When on, the MCP send_email and reply_to_email tools no longer SMTP
directly — they write the composed email to scheduled_emails with a new
status 'agent_draft' (far-future send_at so the scheduled-send poller
ignores them) and return a {pending: true, pending_id, to, subject,
body, message: ...} payload. The model surfaces that to the user.

Backend endpoints to approve / cancel:
- GET    /api/email/pending          → list staged drafts for the owner
- POST   /api/email/pending/{id}/approve → flip status to 'pending' +
                                           backdate send_at so the
                                           existing scheduled-send
                                           poller delivers immediately
- DELETE /api/email/pending/{id}     → status = 'cancelled'

UI:
- Settings / AI Defaults gets a new 'Email Safety' card with the
  toggle, default on.
- Tool descriptions for send_email and reply_to_email now include the
  pending behavior + an explicit 'DO NOT invent a signature, do not
  type a person's name' guardrail.

Pass 2 (next): inline chat card with Send / Discard buttons so the user
doesn't have to type a confirmation reply. Today's prompt + the listing
endpoint give the model a clean path to surface drafts.
2026-06-11 08:50:06 +09:00
pewdiepie-archdaemon a80421efb6 Sessions sort dropdown: nudge all items 2px more left
Group row's auto-sort-sessions-btn padding-left 6→4, and
.sort-dropdown-item left padding 8→6 so 'Last Active', 'Newest First',
'By Folder', '↑↓ Rearrange', '● Select' all shift in by the same
amount, matching the Group nudge.
2026-06-10 22:49:53 +09:00
pewdiepie-archdaemon 49ecd806a2 Chats sidebar: 'manage' slides in from the side like email's 'new'
The list-item-plus-label slide-in needs a visible anchor element so
the button takes up consistent width and the absolutely-positioned
label can fly in to the left of it. Email uses the '+' SVG as that
anchor; here we use an empty 13x13 spacer span instead — same
footprint, no glyph. Result: empty button at rest (still visible per
the chats-manage-btn fade rules), 'manage' slides in from the left
on direct hover.
2026-06-10 22:37:06 +09:00
pewdiepie-archdaemon 1eaa5c2a81 Sessions sort: nudge auto-sort icon + 'Group' text 4px left (10→6 left padding) 2026-06-10 22:35:21 +09:00
pewdiepie-archdaemon e107c5876e Chats sidebar: drop library SVG from manage button — text-only 'manage'
Removed the book/library SVG and list-item-plus-btn/-label classes.
The button is now a plain text button styled like email's 'new' label
(9.5px, 0.02em letter-spacing), reusing the existing chats-manage-btn
opacity hover-reveal rules so it still fades until you hover the
section.
2026-06-10 22:35:06 +09:00
pewdiepie-archdaemon 4f7061fd61 Settings overhaul + UI polish pass
Two months of iteration on the Settings panel, integration forms, and
small visual nudges across the app. Highlights:

Settings restructure
- Add Models: split into separate Local + API cards (no more in-card
  tabs); each fuses Type/Provider with the URL input.
- Added Models: new dedicated sidebar tab, with Probe + Clear-offline
  pulled into its header; Local/API sub-section icons accent-tinted.
- Search: Web Search and a new Deep Research card (Model + tuning),
  with a cross-link to AI Defaults. Provider hints use real clickable
  anchors; Web Search Test button shows a whirlpool spinner.
- AI Defaults: Image Generation card returns; Research Model card
  carries only Endpoint+Model with a cross-link to Search; Vision /
  Default / Utility fallbacks unified under one numbered-row design
  matching Search's chain.
- API Permissions (was 'API Tokens'): per-row rename, inline
  Permissions toggle that expands the scope-edit panel, in-field
  copy icons (icon→check on success). Empty state accent-tinted.
- Integrations: + Add Integration drops a type-picker menu directly
  under the button (drop-up on tight viewports); each integration
  form (API, CalDAV, CardDAV, Email, Codex/Claude, Vault, MCP) uses
  the same accent-outlined Save/Test/Cancel buttons right-aligned.
- Danger Zone: Wipe→Delete with trash icons; new 'Delete everything'
  row at the bottom that loops every category.

AI Synthesis (Reminders)
- Persona dropdown sourced from PROMPT_TEMPLATES + custom preset.
- src/reminder_personas.py mirrors the five built-ins for the
  server-side synthesis path.
- dispatch_reminder() reads reminder_llm_persona and uses the
  persona's system prompt; empty/unknown falls back to warm-neutral.

Esc handling
- Kebab menus and the provider picker intercept Esc in capture phase
  so dismissing a popup no longer closes the whole Settings modal.

Accent tinting
- Scoped CSS rule across data-settings-panel=ai/services/added-models/
  search/integrations/reminders for card h2 icons + the Added Models
  sub-section icons.

Codex/Claude integration form
- No more auto-creation on form open — explicit Create token button.
- New tokens start with every scope granted; existing tokens move out
  of the integration form into the API Permissions card.
- Setup reveal: copy buttons inline inside the token + setup code
  blocks; shorter subtitle wording.

Misc visual polish
- Save/Test/Cancel uniformly accent-outlined and right-aligned on
  every integration form.
- Provider logos render inline next to the search fallback selects
  and the Deep Research Search dropdown.
- Trash icons in fallback rows bumped to 20x20 so they fill the 32px
  button.
- Image generation default flipped to off.
2026-06-10 15:15:13 +09:00
Maruf Hasan c3fcaf15b7 feat(providers): add NVIDIA AI provider endpoint support (#3456)
* feat: add NVIDIA as an AI provider (integrate.api.nvidia.com)

* feat: add NVIDIA option to provider settings dropdown and aliases

* test: add NVIDIA provider detection and endpoint tests

* Add NVIDIA to _HOST_TO_CURATED and expand non-chat model filtering

- nvidia.com -> 'nvidia' curated key for proper provider routing
- _NON_CHAT_PREFIXES: bge, snowflake/arctic-embed, nvidia/nv-embed
- _NON_CHAT_CONTAINS: content-safety, -safety, -reward, nvclip,
  kosmos, fuyu, deplot, vila, neva, gliner, riva, -parse,
  -embedqa, -nemoretriever

* Expand non-chat model filtering for NVIDIA embedding/guard/video models

Add _NON_CHAT_PREFIXES: embed, recurrent
Add _NON_CHAT_CONTAINS: topic-control, guard, calibration,
  ai-synthetic-video, cosmos-reason2

Catches remaining unfiltered non-chat models from NVIDIA catalog:
embedding (llama-nemotron-embed, embed-qa), guard (llama-guard,
nemoguard-topic-control), calibration (ising-calibration),
video (ai-synthetic-video-detector, cosmos-reason2),
recurrent (recurrentgemma-2b)

* Filter non-chat models in _probe_endpoint via _is_chat_model()

Previously _is_chat_model() was only used in the per-model probe
and _first_chat_model(), so non-chat models still appeared in the
model picker even though they were filtered in those specific paths.
Applying the filter at _probe_endpoint() return ensures non-chat
models (embeddings, safety guards, reward, calibration, video
detectors, CLIP, VLM, translation, parsing, recurrent, etc.) never
enter cached_models and never appear in the picker.

* Fix _NON_CHAT_CONTAINS to catch org-prefixed embedding models

Prefix checks (mid.startswith) miss models with org prefixes like
baai/bge-m3, nvidia/embed-qa-4, google/recurrentgemma-2b, etc.
Adding the same terms to _NON_CHAT_CONTAINS ensures they are caught
regardless of the org prefix.

Adds: embed, bge, recurrent, starcoder, gemma-2b

* fix(model-routes): drop collision-prone substrings from global non-chat filter

The NVIDIA PR added several substrings to the shared _NON_CHAT_PREFIXES
and _NON_CHAT_CONTAINS tuples. These are intended to filter out
embedding, retrieval, safety, and vision models from NVIDIA's catalog
that are not chat-completions-capable. However, four of the added
substrings collide with legitimate chat models served by other providers:

  - gemma-2b  matches google/gemma-2b-it (instruct chat model)
  - starcoder matches bigcode/starcoder2-15b (code completion model)
  - recurrent matches google/recurrentgemma-2b (language model)
  - guard     matches meta-llama/Llama-Guard-3-8B (safety classifier)

Removing these four from the global tuples keeps the NVIDIA-specific
filtering intact (safety, embedding, retrieval, and vision models are
still caught by other tokens such as content-safety, -safety, -reward,
embed, bge, -embedqa, -nemoretriever, nvclip, deplot, etc.) while
preventing false negatives for instruct/code models on other providers.

Tests added for gemma-2b-it, google/gemma-2b-it, and
bigcode/starcoder2-15b-instruct asserting they are recognized as chat
models.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): remove duplicate bge/embed tokens from _NON_CHAT_CONTAINS

Tokens already present in _NON_CHAT_PREFIXES, making the CONTAINS
entries redundant since the prefix check runs first.

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* fix(nvidia): move bge to CONTAINS, add llama-guard, remove stray blanks

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>

* style: fix indentation of groq and xai test cases in test_provider_endpoints.py

---------

Co-authored-by: Kenny Van de Maele <kenny@kvandemaele.be>
2026-06-09 11:06:12 +02:00
pewdiepie-archdaemon 7690860ab1 Settings/Add Models: bump Local Type select width 57→62px 2026-06-09 15:12:57 +09:00
pewdiepie-archdaemon b6366e9da5 Settings/Add Models: fuse Local Type select + URL input into one bordered group 2026-06-09 15:12:12 +09:00
pewdiepie-archdaemon 64122269e9 Settings/Add Models: shrink Local Type select by 15px (72→57) 2026-06-09 15:11:07 +09:00
pewdiepie-archdaemon 1bdd515941 Settings/Add Models: drop 'Type:' label, keep the LLM/Image select 2026-06-09 15:10:48 +09:00
pewdiepie-archdaemon 8ac0ae72dc Settings/Add Models: Local card — Type and Add inline with URL field
Lift the LLM/Image Type select to the left of the URL input and the Add
button to its right, so the primary action (URL + Add) sits on one row.
Scan / Ollama / Key / Test stay on the action row below.
2026-06-09 15:09:28 +09:00
pewdiepie-archdaemon b2458f9891 Settings/Add Models: split Local and API into separate cards, always show API key
Drop the in-card Local/API tab strip — each is now its own admin card with
a normal h2 heading (Local on top, API below). The API key input is
always visible (no more click-to-reveal toggle), matching how cloud
providers actually work. Local keeps the optional key reveal since
local servers usually don't need one.

Dead code removed: wireModelsTabs IIFE and the adm-epApiKeyBtn toggle wire.
2026-06-09 14:57:42 +09:00
pewdiepie-archdaemon 2252776a97 Settings: promote Added Models to its own sidebar menu
Move the Added Models endpoint lists out of the Add Models card into a
dedicated sidebar tab between Add Models and AI Defaults. The card now
focuses purely on adding (Local / API tabs), while the new panel owns
the existing endpoints + Probe and Clear-offline controls.

admin.js: defensive fallback so a stale 'added' value in localStorage
falls back to 'local' instead of leaving both panes hidden.
2026-06-09 14:52:48 +09:00
pewdiepie-archdaemon c9fecd53dc Settings: third 'Added Models' tab in Add Models card
Move the Added Local + Added API lists out of the per-type tabs into
a dedicated third tab. Each Add tab is now just the form; the new tab
collects both lists together with Local / API subheadings.

Card layout:
  Add Models  [Probe] [Clear offline]
    [Local]  [API]  [Added Models]

Tab content:
  Local         → Add Local form
  API           → Add API form
  Added Models  → Local list + API list (subheadings)

All endpoint list/form IDs preserved. Tab switcher JS is generic so
the new 'added' tab works without code changes.
2026-06-09 14:47:21 +09:00
pewdiepie-archdaemon 8ef9b8b215 Settings: tabbed Add Models card with Local / API tabs
Earlier split into 4 flat cards wasn't what was asked for. Restore to
a single 'Add Models' card with two tabs at the top:

  Local  → Add form + Added Local Models list
  API    → Add form + Added API Endpoints list

Probe / Clear-offline live on the card header and act on both lists.
Active tab is remembered in localStorage so the user lands back where
they were. All form/list IDs preserved (adm-epLocalUrl, adm-epList-local,
adm-epList-api, etc.) so admin.js continues to work unchanged.

Replaces the .adm-section-toggle fold-open JS with a tab-switcher; the
fold elements no longer exist so the old handler was already a no-op.
2026-06-09 14:43:28 +09:00
pewdiepie-archdaemon 459b825daa Settings: split Add/Added Models into 4 flat cards (no folds)
The previous 'Add Models' card had two collapsible folds (Local + API)
inside it and 'Added Models' had two inline subsections. Both folded
states added a click-to-expand step that wasn't earning its keep —
users coming to Settings to add a model don't want a fold, they want
the form.

Reshape: four flat admin-cards in the Services panel, each with its
own h2 title matching the rest of Settings:
  Add Local Model       (was Add Models → Local fold)
  Add API               (was Add Models → API fold)
  Added Local Models    (was Added Models → Local subsection)
  Added API Endpoints   (was Added Models → API subsection)

The collapsible JS hook in admin.js already guards on
'if (!head) return' so removing the .adm-section-toggle headers
turns it into a clean no-op — no breakage.

All input/list IDs preserved (adm-epLocalUrl, adm-epList-local,
adm-epList-api, etc.) so the rest of admin.js continues to work
unchanged. Probe / Clear-offline live on the Local card and act on
both lists together (existing behavior).
2026-06-09 14:36:44 +09:00
pewdiepie-archdaemon 3247773447 Hide Teacher Model settings card (2.0 'harden the core' deferral)
The Teacher Mode feature stays out of the default UI per the 2.0
roadmap — backend escalation is already dormant when teacher_model is
unset (its default) and we want to focus on core reliability before
surfacing escalation as a feature.

Nothing removed from the backend:
- src/teacher_escalation.py still gates on get_setting('teacher_model')
- agent_loop.py's run_teacher_inline hook is a no-op without the setting
- settings backup/restore round-trips the teacher_model key unchanged
- power users can still set it via manage_settings or the JSON backup

settings.js's initTeacherModel already early-returns when the card's
DOM ids are missing, so the JS side is clean.

To re-surface the card, revert this commit.
2026-06-09 14:31:04 +09:00
pewdiepie-archdaemon e6b1009b89 Remove non-merge-ready workspace and terminal agent hooks 2026-06-09 09:48:59 +09:00
pewdiepie-archdaemon fa8c93ec0a Cookbook UI: Ollama browser, advanced serve fold, API tokens form, diagnosis toolbar, polish
Surface a lot of accumulated cookbook + UI work as a single non-agent
commit so the agent rework lands cleanly.

Highlights:
- Ollama as a first-class backend in the Cookbook:
  * Download input accepts ollama-style names (name:tag) → backend=ollama
  * /api/cookbook/ollama/library (cached scrape of ollama.com + curated
    fallback so classic models like qwen2.5 stay reachable)
  * "Browse Ollama library" toggle below Download with size chips
  * Engine=Ollama in hwfit toolbar merges the Ollama library into the
    main scan list as per-tag rows with the same Fit/Param/Quant/VRAM
    columns; click → fills Download input
- API Tokens form added to Integrations panel (matching wired
  loadTokens()/initTokenForm() that had no HTML)
- Serve panel polish: Advanced fold tightening (-8px nudges on vLLM
  checks, Extra args, Spec row), n_cpu_moe + Split Mode controls
  pulled up 8px to align with the row's checkboxes, GGUF File dropdown
  exposed for Ollama backend, GPU re-render on Edit serve restore,
  _forceBackend flag so saved serveState wins over backend detection,
  cookbook:servers-changed CustomEvent so panels don't need refresh
- Models page redesign: Add Models row (URL + hidden API key reveal +
  Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All +
  Clear-offline buttons in Added Models toolbar, offline-pill removed
  (opacity already conveys state), Engine dropdown gains Ollama option
- _ping_endpoint probes /v1/models then base, accepts 4xx as
  reachable (vLLM returns 404 on bare /v1, fully working endpoints
  were showing offline)
- Diagnosis card: × dismiss + Copy bundle buttons restored on the
  serve error feedback card
- Orphan tmux sweep re-enabled behind a 60s rate-limit + background
  Thread (off the main event loop) so dead serves get discovered
- cookbook_routes auto-register watchdog: drops the endpoint if the
  serve session exits non-zero within the first ~3min
- ollama-rocm sidecar awareness in download wrapper (`docker exec
  ollama-rocm ollama pull` when host ollama isn't installed)
- Skill extractor sets initial_status="published" when
  auto_approve_skills pref is on (audit demotes later)
- Skill list / model list / cookbook scan misc polish
2026-06-09 09:46:19 +09:00
pewdiepie-archdaemon 2a2a93d845 Remove plan mode from merge-ready UI 2026-06-09 09:40:20 +09:00
stocky789 1e0d9b92af feat: add ChatGPT Subscription provider (#2876)
* feat: Add ChatGPT Subscription support and related features

- Introduced a new provider option for ChatGPT Subscription in the endpoint selection UI.
- Implemented OAuth flow for ChatGPT Subscription sign-in, including polling for authorization status.
- Updated admin interface to handle ChatGPT Subscription, including disabling API key input and providing user guidance.
- Enhanced cost tracking logic to differentiate between subscription and non-subscription endpoints.
- Added new slash commands for managing skills, including listing, searching, and invoking skills.
- Implemented caching for skill catalog to optimize performance.
- Updated tests to cover new ChatGPT Subscription functionality and ensure proper endpoint probing.
- Refactored existing code to accommodate new features and improve maintainability.

* refactor: share provider device-flow setup

- reuse one device-flow backend for Copilot and ChatGPT Subscription
- add one frontend device-flow helper for Settings and /setup
- put GitHub Copilot back into Add Models, now as a dropdown option
- make provider selection just select; clicking Add starts sign-in
- stop ChatGPT Subscription setup from opening auth tabs automatically
- make /setup copilot and /setup chatgpt-subscription work from chat
- show ChatGPT Subscription in the /setup suggestions
- show the real error message when setup fails
- add focused tests for the shared flow and setup UI

* feat(chatgpt-subscription): harden credential lifecycle and streamline auth UX

Backend:
- Resolve runtime bearer for provider-auth endpoints at probe time via a
  shared _resolve_probe_key() that delegates to resolve_endpoint_runtime,
  applied across all probe/refresh call sites.
- Skip live completion probes and health pings for discovery-only providers
  (centralized behind _is_discovery_only_provider) — the Codex/Responses API
  has no such endpoints, so status is derived from cached models.
- Never persist the short lived ChatGPT bearer to the plaintext sessions
  table; proactively clear any stale bearer left by an earlier code path.
- Revoke orphaned ProviderAuthSession credentials when the last endpoint
  backing them is deleted (_delete_orphaned_provider_auth), surfaced via
  cleared_provider_auth in the delete response.

Frontend (admin.js):
- Auto-start the device-auth flow on provider selection so the authorization
  panel (code + Authorize) shows immediately instead of behind a "Sign in" click.
- Remove the redundant top button for device auth providers, move retry
  into the panel via an inline "Try again".
- Drop the self-evident hint text and add an execCommand clipboard fallback so
  Copy works in non-secure (HTTP/LAN) contexts.

* fix: harden chatgpt subscription provider

* chore: remove PR media from branch

* Fix chatgpt subscription recovery and token handling

---------

Co-authored-by: 5p00kyy <admin@5p00ky.dev>
2026-06-08 10:19:18 +02:00
Alan Met a6bc1addd2 fix(settings): correct Add User username placeholder (#3296)
Fixes #3292
2026-06-07 17:50:18 +02:00
M57 12cb39cbd9 feat: add OpenCode Zen and Go as provider options (#26)
- Add OpenCode Zen (https://opencode.ai/zen/v1) and Go (https://opencode.ai/zen/go/v1)
- Add provider detection via _host_match() in llm_core.py
- Add curated model list entries in model_routes.py
- Add webhook provider URLs
- Add provider icon (providers.js) and dropdown options (index.html)
- Add auto-detection patterns and setup URLs (slashCommands.js)
- Whitelist opencode.ai in URL validation (admin.js)
- Rebased on main to fix merge conflicts with _HOST_TO_CURATED refactor

Co-authored-by: M57 <hy4ri@users.noreply.github.com>
2026-06-07 16:43:00 +02:00
Logan Davis f72e1bd412 feat(reminders): add generic webhook as a fourth reminder channel (#2952)
Replaces any Discord-specific reminder channel with a generic outbound
webhook channel. Users pick any saved Integration as the target and
supply a JSON payload template with {{title}} and {{message}}
placeholders — values are JSON-escaped before substitution. Works with
Discord, Slack, Teams, ntfy (JSON mode), or any service that accepts a
POST with a JSON body.

- `src/settings.py` — reminder_webhook_integration_id +
  reminder_webhook_payload_template defaults
- `routes/note_routes.py` — webhook delivery block; Integration lookup,
  template rendering, auth wiring; built-in preset defaults so
  discord_webhook works out of the box without a configured template;
  settings_override kwarg avoids test-button race condition
- `routes/auth_routes.py` — discord_webhook preset test handler
- `src/integrations.py` — discord_webhook preset with description +
  example templates; hides auth/key fields in the Integration form
- `src/builtin_actions.py` — webhook_sent delivery check
- `src/tool_implementations.py` — webhook aliases + enum updated
- `static/index.html` — Webhook channel option; Integration picker +
  payload template textarea
- `static/js/settings.js` — Integration list, populateWebhookIntegrations,
  syncChannelRows, hints, load/save, auto-fill preset templates,
  test-button override payload, hide auth/key for URL-auth presets

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 22:47:57 +02:00
Giulio Zelante b448119919 feat(skills): import SKILL.md bundles from public GitHub URLs (#2576)
* feat(skills): import SKILL.md bundles from public GitHub URLs

Supports GitHub tree/blob/raw links and skills.sh pages that resolve to GitHub.
Installs SKILL.md plus sibling text assets under data/skills/imported/.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): admin-gate URL import and validate redirect hosts

- require_admin on POST /api/skills/import-from-url (matches other skill admin routes)
- reject cross-host redirects after httpx follow_redirects
- test for redirect host validation

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): match Brain Add panel import/submit button styles

- Skill URL Import: theme-io-btn + download icon (same as memory Import)
- Add Skill submit: confirm-btn confirm-btn-primary

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): allow api.github.com during directory import

Real imports hit the GitHub contents API after redirects; whitelist
api.github.com and add regression tests. Shrink Import button with flex:none.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): align skill Import button with URL input row

Match memory-add-input height (28px) in memory-add-row and center the
download icon with flexbox instead of vertical-align hacks.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): cancel modal-body margin on skill Import button

The skill Import button sits in .memory-add-row beside an input; the
global .modal-body button { margin-top: 6px } rule only affected buttons,
pushing Import down and misaligning the download icon. Reset margin-top
and match Memory Import SVG markup at 28px row height.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): surface GitHub API errors on URL import

Pass through GitHub response messages (especially 403 rate limits) as
SkillImportError instead of a generic download failure.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-05 19:48:23 +02:00
Kenny Van de Maele 8ce945d338 feat: Add plan mode to the chat agent (#638)
* feat: Add plan mode to the chat agent

Adds a plan mode: the agent investigates read-only, proposes a checklist, and
waits for approval before changing anything. On approval it runs with full
tools and checks items off as it goes. Enforcement reuses the existing
disabled_tools gate.

Includes a slash command: `/plan [on|off]` (and `/toggle plan`) to flip the
plan toggle from the chat input.

- src/tool_security.py, src/mcp_manager.py: read-only allowlist (tools + MCP).
- src/agent_loop.py, routes/chat_routes.py: union the disabled set, prepend the
  plan directive, force agent mode.
- static/: plan toggle pill, Approve & Run, dockable plan window, task-list
  checkboxes, and the /plan slash command.
- tests/test_plan_mode.py.

* Plan mode: persistent re-referenceable plan + agent write-back

Three improvements so a long plan survives a weak model and stays in reach:

1. Re-reference the plan (out-of-context fix). On the execution turn the frontend
   sends the approved checklist back (`approved_plan`); the backend pins it as a
   top-of-context `## ACTIVE PLAN` system note (kept by the context trimmer), so
   the agent can always re-read the plan instead of losing the thread on a long
   run. New `build_active_plan_note()` (unit-tested).

2. Re-open / dock the plan anytime. The plan checklist is stored per-session
   (localStorage). When a plan exists, the plan-mode button opens a small menu
   ("Show plan" / "Plan mode: On/Off") that re-opens the side-dockable plan
   window — so it can stay docked while the agent works. The window live-refreshes
   as the plan changes.

3. Agent write-back: new `update_plan` tool. The agent calls it to tick steps
   `- [x]` after finishing them, or to revise steps when the user asks. Marker
   tool (no I/O) → `plan_update` SSE event → the stored plan + docked window
   update live. The ACTIVE PLAN note instructs the agent to use it.

Backend: src/agent_loop.py (param + pin + note builder + emit + prompt blurb),
src/tool_execution.py (update_plan handler), routes/chat_routes.py (parse
`approved_plan`, relay `plan_update`), registration in tool_schemas / agent_tools
/ tool_index (always-available, not admin-gated).
Frontend: static/js/chat.js (plan store, send `approved_plan`, handle
`plan_update`, capture restated checklists), static/app.js (plan-button menu),
static/js/planWindow.js (`isPlanWindowOpen`), static/js/storage.js (PLAN key).
Tests: tests/test_plan_mode.py (plan-note), tests/test_update_plan_tool.py.

* Plan mode: drop bash/python, rely on read-only discovery tools

Shell can mutate (write files, hit the network) and can't be constrained to
read-only at the tool layer, so plan mode no longer relies on a prompt to keep
it well-behaved — bash/python are removed from the read-only allowlist and added
to the fail-closed block set. Discovery is covered by the dedicated read-only
tools (read_file, grep, glob, ls) instead.

Rewrites the plan-mode directive to state shell is disabled and lists the
available read-only tools positively. Addresses review feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Comment: note _MCP_READONLY_VERBS are prefixes not whole words

Clarifies that entries like "summar" are intentional stems matched via
startswith (covers summarise/summarize/summary), not typos. Addresses review
feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Plan mode: clarify why gating inverts the allowlist into a denylist

Rename _PLAN_MODE_FALLBACK_BLOCK -> _PLAN_MODE_KNOWN_MUTATORS and rewrite the
comments. The tool gate is a denylist (disabled_tools); plan mode's policy is an
allowlist, so it returns the inverse (all known tool names minus the allowlist).
The static mutator set is a backstop for the schema-derived name list, which
misses XML-only tools and can fail to import. Addresses review feedback on #638.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Plan mode: stop hardcoding the read-only tool list in the directive

The model is already shown its available (read-only) tools by _assemble_prompt,
which removes every disabled tool. Enumerating them again in the directive only
duplicated that list and would drift as tools change. Point at the tools listed
below instead. Addresses review feedback on #638.
2026-06-05 16:32:25 +02:00
Nicholai 201e207b56 fix(memory): let manual add specify memory category
fix for #2784 and part of #2788: Add a category selector (same options as inline edit) and include category in
the /api/memory/add JSON payload.
2026-06-05 04:57:13 -06:00
joi-lightyears 88c9f1fa74 fix(memory): let manual add specify memory category
Add a category selector on the Brain Add tab and include it in the
/api/memory/add JSON payload instead of always defaulting to fact.
Fixes #2784
2026-06-05 13:17:14 +07:00
pewdiepie-archdaemon e2f449f4ef Cookbook scheduler + serve: schedule via Tasks, Stop verifies kill, Ollama auto port-pick
- Schedule cookbook serves through the existing ScheduledTask system: the
  serve preset gets a ^ button next to Launch that opens a daily/hourly/
  weekly form mirroring the admin-switch style; the schedule action runs
  action_cookbook_serve, which delegates to /api/model/serve and stamps
  the resulting task with _scheduledStopAtMs. A background
  cookbook_serve_lifecycle loop ticks every 60s and kills any serve
  whose window has ended, also dropping the auto-registered endpoint
  so the model picker doesn't keep pointing at a dead server.
- Stop and remove on a Running serve now awaits the SSH/tmux kill,
  re-checks tmux has-session, and surfaces an error toast (leaving the
  row) when the kill failed. Previously fire-and-forget, so a failed
  SSH/tmux call silently left the live serve running while the row
  vanished from the UI.
- Cookbook tasks/status orphan-adoption sweep no longer requires the
  serve-/cookbook- session-id prefix; any tmux session whose pane is
  running a known model-server process gets auto-pulled into Running.
  Without this loosening, a cookbook-launched serve whose tmux id
  fell back to a bare number was invisible — you couldn't see it,
  let alone stop it.
- Ollama serve always launches a fresh process under cookbook's tmux
  (no more monitor-mode reattach to a systemd/Docker ollama Stop can't
  reach). The handler pre-picks a free port by probing the target
  host over SSH and mutates req.cmd's OLLAMA_HOST so the runner script
  AND the auto-registered endpoint agree on the same bind port.
- Auto-register uses host.docker.internal (when running inside Docker)
  instead of localhost, matching the URL /setup adds for Ollama by
  hand. Local cookbook serves now produce a chat-reachable endpoint
  on first launch.
- Cascade-delete: removing a scheduled cookbook task also deletes any
  linked calendar event (cookbook_task_id marker in the description).
- Tasks list groups cookbook_serve under a "Cookbook" category that
  sorts above the rest, so scheduler-launched serves are easy to find.
2026-06-05 14:41:43 +09:00
Kenny Van de Maele 2be3779e6e feat: Add workspace: confine agent tools to a folder (#1103)
* feat: Add workspace: confine agent tools to a folder

Pick a server folder as the agent's workspace so its file/shell tools work
there and don't touch files outside it. File tools are hard-confined; bash/
python run with cwd set to the folder.

Includes a slash command: `/workspace` (alias `/ws`) — show / `set <path>` /
`clear` / `pick` (open the directory browser).

- routes/workspace_routes.py: GET /api/workspace/browse (admin-only).
- src/tool_execution.py: hard path confinement for read_file/write_file;
  bash/python cwd. Threaded route → stream_agent_loop → execute_tool_block.
- src/agent_loop.py: workspace note prepended to the system prompt.
- static/: overflow menu item, input-bar pill, directory-browser modal, and
  the /workspace slash command.
- tests/test_workspace_confine.py.

* Wire workspace confinement into tools that landed after this PR

edit_file (#1239) and grep/glob/ls (#1670) merged after workspace-confine was
written, so they bypassed the workspace boundary. Thread the workspace through:
  - edit_file: _do_edit_file resolves via _resolve_tool_path_in_workspace
  - grep/glob/ls: _resolve_search_root confines to the workspace (root + paths)
  - bash/python/bg cwd: workspace or _AGENT_WORKDIR (keep the #2586 data-dir
    default when no workspace is set)
Tests cover edit_file + grep/ls confinement (inside ok, outside rejected).

* Workspace picker: editable path bar + modal style cohesion + cross-platform hardening

- Make the current-folder strip an editable address bar: type/paste a full
  path and press Enter to navigate (also reaches other Windows drives and
  hidden dirs the up-only browser cannot).
- Reuse shared modal CSS: drop bespoke .workspace-modal-content/.workspace-btn*
  in favour of base .modal-content/.modal-body and the .confirm-btn button
  family; separators/hover use var(--border). Net -31 CSS lines.
- Fix the path field overflowing the modal right edge (flex stretch + margin
  vs an overflow:auto scrollbar-feedback loop): full-bleed, no h-margin.
- Cross-platform confinement: normcase the workspace commonpath check so
  containment holds on case-insensitive filesystems (Windows/macOS).
- Make tests OS-portable: sibling temp dirs instead of /etc, python os.getcwd()
  instead of pwd. 5 pass.
2026-06-05 00:06:37 +02:00
Kenny Van de Maele 64d65b73c1 feat: round-limit handling — Continue affordance at the cap + configurable cap (#1999)
* feat: round-limit handling — Continue affordance at the cap + configurable cap

When the agent loop runs out of rounds (per-message step cap, default 20)
while still actively using tools, it stopped silently mid-task. Now:

1. The loop emits a `rounds_exhausted` SSE event at the cap, and the UI shows
   a "Continue" pill at the bottom of the chat that resumes the task from where
   it left off. Repeated cap-hits each get a fresh Continue (multiple continues
   in a row).
2. The cap is configurable in Settings → Agent ("Max steps per message"),
   validated on the client, at the save endpoint, and at the read site.

- src/agent_loop.py: track `_exhausted_rounds` (set only when a full
  tool-executing round completes on the last allowed round — i.e. the agent
  wanted to keep going); emit `{"type":"rounds_exhausted","rounds":N}` (logged).
- routes/chat_routes.py: read `agent_max_rounds` (clamped 1..200), pass as
  `max_rounds`; forward the new event through the SSE relay.
- routes/auth_routes.py: validate numeric settings on save (int + clamp;
  agent_max_rounds 1..200, agent_max_tool_calls 0..1000; 400 on non-int).
- src/settings.py: default `agent_max_rounds = 20`.
- static/: Settings input + client-side clamp; the Continue pill (reuses the
  existing .stopped-indicator / .continue-btn classes and theme vars
  --border/--fg/--bg/--accent); appended to the chat container so it survives
  the message re-render at stream finalize. chat.js cache version bumped.

* test: cover rounds_exhausted emission (cap-hit vs normal finish)

Drives the real stream_agent_loop with mocked LLM stream / tool exec / settings:
a tool block every round exhausts the cap and must emit rounds_exhausted; a
plain answer hits the done-break and must not. Guards the for/else logic.
2026-06-04 22:36:05 +02:00
Kenny Van de Maele 1cd0aa2b8c feat(provider): add GitHub Copilot provider with device-flow auth (#1480)
* feat(provider): add GitHub Copilot provider with device-flow auth

Adds GitHub Copilot as a model provider, so Copilot models (gpt-4o/4.1/5,
Claude, Gemini, …) work through the normal chat + agent loop, incl. native
tool calling and vision.

Auth is one-click via the GitHub OAuth device flow; the access token is stored
as the endpoint's (encrypted) api_key and sent directly as `Authorization:
Bearer` (no Copilot-token exchange, no refresh — matching how editors talk to
the Copilot API). Copilot is a normal ModelEndpoint detected by host; the only
provider-specific behaviour is a small set of required request headers,
injected centrally.

Sign-in is available from Settings → model endpoints ("Connect GitHub
Copilot") and from chat via `/setup copilot`.

- src/copilot.py (new), routes/copilot_routes.py (new): constants, header
  builders, device-flow start/poll, model discovery, owner-scoped endpoint
  provisioning.
- src/llm_core.py, src/endpoint_resolver.py: detect `copilot`, inject headers,
  per-request x-initiator/vision.
- src/agent_loop.py: allowlist api.githubcopilot.com for native tool schemas.
- src/model_context.py: known context windows for Copilot (no unauthenticated
  /models probe).
- static/, README, tests/test_copilot*.py.

* Tidy copilot_routes: clarify supports_tools, note _PENDING is per-process
2026-06-04 21:13:14 +02:00
Kenny Van de Maele 7443c36bd9 feat: Add edit_file tool + file-change diffs (#1239)
* Add edit_file tool + file-change diffs

edit_file is an exact old_string -> new_string replacement on a file on disk
(fails if old_string is missing or non-unique unless replace_all); write_file
also returns a unified diff. Diffs render collapsed in the tool bubble
(filename + +adds/-dels, theme colors); the raw JSON command box is hidden.

Security: edit_file is a sensitive filesystem-write tool, treated everywhere
write_file is —
  - added to NON_ADMIN_BLOCKED_TOOLS (is_public_blocked_tool / blocked_tools_for_owner),
    so on auth-enabled deployments a non-admin cannot run it; execute_tool_block
    refuses it for non-admin owners.
  - confined by the same path policy as read_file/write_file (allowlist +
    sensitive-file deny) via _resolve_tool_path.

Disambiguation in tool descriptions + bash prompt: edit_file/write_file are the
only way to write files (they show a diff) — never edit_document (editor panel)
or a bash heredoc/redirect.

Tests (tests/test_edit_file.py): non-admin block (policy + execution gate),
successful edit, not-found old_string, non-unique old_string (+ replace_all),
and path outside the allowed roots.

Files: src/tool_execution.py, src/agent_loop.py, src/tool_schemas.py,
src/agent_tools.py, src/tool_index.py, static/js/chat.js, static/style.css,
tests/test_edit_file.py.

* Drop redundant import os in write_file closure

os is already imported at module top.
2026-06-04 18:29:10 +02:00
Yuri a2e691da2b fix(models): stabilize proxy endpoint refresh behavior
* fix: support large proxy model endpoint refresh

Large OpenAI-compatible proxy endpoints can expose hundreds of models and make /v1/models slow. Treating those endpoints like local model servers caused model picker opens and background probes to repeatedly hit /models, producing timeouts and making otherwise usable endpoints appear offline.

Make model endpoint discovery cached-first for normal UI usage, add explicit proxy/API classification and refresh policy fields, exclude proxy/API endpoints from aggressive local probing, and preserve cached models when refresh fails.

Manual Test/Add/Refresh actions still fetch the full model list with longer timeouts so users can intentionally import large proxy model lists without blocking normal model picker usage.

* fix: preserve endpoint ping status semantics
2026-06-04 04:56:11 +01:00
pewdiepie-archdaemon 089246614d feat: Claude Agent integration + cookbook reconnect + UI polish
- Claude Agent integration: AGENT_CONFIGS.claude, INTG_TYPES.claude,
  setup_claude_routes + integrations/claude/ skill bundle. Wired in
  app.py alongside the existing Codex integration; same scope-gated
  /api/codex/* backend; agent form has new description so users know
  it's setup for an external CLI, not an agent streamed inside Odysseus.
- Remove mark_email_boundaries action: not good enough yet. Stripped
  from task UI, scheduler defaults, registry, tool schema, clear-cache
  route. Added to RETIRED_HOUSEKEEPING_ACTIONS so existing rows + their
  task_runs auto-purge on startup.
- Cookbook download reliability: "Reconnect" fix button in the crash
  diagnosis runs _reconnectTask after probing has-session. 30s confirm
  window before marking a download "done" — kills the Finished/Downloading
  flicker when tmux briefly drops between captures.
- Mobile UX: tap anywhere on a note card body opens the editor;
  Update button morphs to Archive when no text was edited; bell icon
  accent-colored; chip-trashing notif pills fade so only the icon
  rotates into the trash zone.
- Settings integrations: SVG-per-provider in email + API preset
  dropdowns, custom drop-up-aware menus, accent sub-header icons
  (IMAP/SMTP), consistent card styling between list + edit, contacts
  Edit/Delete icons, agent form description copy.
2026-06-04 08:27:26 +09:00
pewdiepie-archdaemon 562bc4dedc Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI
Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
  ascending then truncated top-N, which made "highest VRAM" mathematically
  unreachable). Every column path uses reverse=True for the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
  re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
  default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
  Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
  with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
  actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
  QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.

Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
  re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
  later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
  tooltip. Smart title-suffix strips the parts already in the repo name
  (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
  (#1962) for model-add requests.

Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
  "Download complete"). True overall % for multi-shard downloads:
  ((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
  live shard line, the pill is hidden + relabels as "reconnect" + revives
  on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
  to 8s), scan persisted done/error/crashed downloads and probe their
  tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
  state is done but tmux is still alive revives the existing task and
  refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
  /api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
  DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
  shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
  status got stuck. Dir text on the running row, font matched to uptime.

Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
  metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
  matches body; action buttons get icons on the left (Retry/Copy/Edit/
  Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
  downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
2026-06-03 20:25:25 +09:00
Robin Fröhlich 3c6ae3713e Models: add Z.AI coding endpoint and GLM vision detection 2026-06-02 20:59:17 +09:00
Nikita Rozanov 119075f368 Research: add configurable run timeout
Surfaces the research_run_timeout_seconds setting (added in #783) in
Settings → Research as a "Max Time" field, and lets 0 disable the
wall-clock cap entirely for long deep-research runs.

- settings.py: document that 0 disables the cap; default stays 1800s.
- research_handler.py: resolve 0 (or negative) to no timeout
  (asyncio.wait_for timeout=None); other values stay bounded to
  [60, 86400] as before.
- index.html / settings.js: "Max Time" input bound to
  research_run_timeout_seconds, validated to {0} ∪ [60, 86400], with
  copy making explicit that 0 = no limit (unbounded model/API cost).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 20:57:57 +09:00
red person d0c925f6c8 Chat attachments: allow picker to choose any file type 2026-06-02 20:55:30 +09:00
Kenny Van de Maele cfb7ec1c71 Accessibility: add labels and toggle states
* Accessibility: ARIA labels and toggle states

Screen readers couldn't name several icon-only controls or tell whether the
tool toggles were on. This adds accessible names and exposes toggle state,
with no behavior or layout change.

- Icon-only buttons get aria-label: web/shell tool toggles, the "more tools"
  overflow button (+ aria-haspopup), and the color-reset buttons.
- Unlabeled inputs/selects get aria-label: memory + skills search, model-picker
  search, memory sort, theme font/density selects, and the new-memory / skill
  (title, when-to-use, how, tags) fields, which only had a visual floating label.
- Toggle state via aria-pressed, kept in sync at the existing .active write
  sites: web/shell toggles (setupToggle) and the Agent/Chat mode buttons
  (initModeToggle). Static aria-pressed added in the markup so the attribute
  exists before JS runs.

Scope: first slice of the ROADMAP accessibility pass. Focus-visible/contrast,
reduced-motion, and modal dialog roles/focus-trap are left for follow-ups.

Checks: node --check static/app.js. No Python touched.

* Accessibility: mark chat log busy while streaming

The chat log is an aria-live="polite" region, so streaming a response
token-by-token made screen readers announce every partial update — noisy and
unreadable. Set aria-busy="true" on #chat-history while a response streams and
back to "false" in the stream's finally block. Assistive tech then waits for
the settled message and announces it once.

Checks: node --check static/js/chat.js.
2026-06-02 20:55:05 +09:00
Shaw 8115cb01a2 Models: allow API keys for local endpoints
Self-hosted endpoints on a LAN are sometimes protected by an API key. The admin
"Local" add/test form only sent base_url (+ model_type), so such an endpoint
could not be added — it just errored out — even though the backend
POST /api/model-endpoints and /model-endpoints/test already accept an optional
api_key form field (the cloud "API" form already uses it).

Adds an optional masked "API key" input (adm-epLocalApiKey) to the Local form
and wires it into the local Test and Add handlers, sending api_key only when
filled (an empty value is omitted so we never send a blank Bearer). The field
is cleared after a successful add, matching the cloud form.

Tested: tests/test_local_endpoint_api_key_js.py extracts the two click handlers
and runs them under node with mocked DOM/FormData/fetch, asserting api_key is
sent when the field is filled and omitted when blank, plus that the input
exists as a password field. `node --check static/js/admin.js` passes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:36:54 +09:00