maybe_extract_skill() sliced the LLM response from the first '{' to the
last '}'. When a model emits a stray brace in prose before the real
object (e.g. "uses {placeholder} then {...}"), the slice starts at the
prose brace, json.loads fails, and a valid skill is silently dropped.
Factor parsing into _extract_json_object(), which tries the whole
(de-fenced) string first and then each '{' start position, returning the
first candidate that parses to a JSON object.
Adds tests/test_skill_extractor_json.py.
_fallback_memory_candidates matched both positive (prefer/like/love) and
negative (hate / do not like / don't like) sentiment verbs in one regex
alternation, then formatted every hit as "User prefers {X}.". So
"I hate cilantro" was stored as "User prefers cilantro." -- the inverse of
what the user said. These fallback facts are persisted to memory and later
re-injected into the model's context, so the inverted preference actively
misleads the assistant.
Capture the matched verb and branch on it: negatives become
"User dislikes {X}.", positives stay "User prefers {X}." (still filed under
the existing "preference" category).
Supported by Claude Opus 4.8
Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>
* Convert to different style of comment to make it easier to work with, fix formatting inside Powershell script.
* Grab VRAM amount from driver's registry keys.
* Fixed regression on NVIDIA GPUs
This change extends both the `PATH` variable and the list of absolute paths used to locate the `nvidia-smi` package to include `/usr/lib/wsl/lib`.
This path is a candidate for the default location of nvidia-smi for WSL machines (tested on WSL Ubuntu 22.04.5).
* feat(skills): import SKILL.md bundles from public GitHub URLs
Supports GitHub tree/blob/raw links and skills.sh pages that resolve to GitHub.
Installs SKILL.md plus sibling text assets under data/skills/imported/.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): admin-gate URL import and validate redirect hosts
- require_admin on POST /api/skills/import-from-url (matches other skill admin routes)
- reject cross-host redirects after httpx follow_redirects
- test for redirect host validation
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): match Brain Add panel import/submit button styles
- Skill URL Import: theme-io-btn + download icon (same as memory Import)
- Add Skill submit: confirm-btn confirm-btn-primary
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): allow api.github.com during directory import
Real imports hit the GitHub contents API after redirects; whitelist
api.github.com and add regression tests. Shrink Import button with flex:none.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): align skill Import button with URL input row
Match memory-add-input height (28px) in memory-add-row and center the
download icon with flexbox instead of vertical-align hacks.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): cancel modal-body margin on skill Import button
The skill Import button sits in .memory-add-row beside an input; the
global .modal-body button { margin-top: 6px } rule only affected buttons,
pushing Import down and misaligning the download icon. Reset margin-top
and match Memory Import SVG markup at 28px row height.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(skills): surface GitHub API errors on URL import
Pass through GitHub response messages (especially 403 rate limits) as
SkillImportError instead of a generic download failure.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
#1473 converted the title and sports-hint matches in services/search/ranking.py
to word boundaries but left two raw substring tests:
- snippet_score: 'term in snippet.lower()' — query term 'port' hits
'transport'/'support', inflating a result's relevance.
- news_quality_adjustment: 't in text or t in netloc' for the subject term —
query 'us' substring-matches 'business'/'music', so an off-topic page
wrongly escapes the off-topic penalty on a country/subject news query.
Add a _has_word helper (the same \b...\b pattern title_score already used) and
route all three word checks (title, snippet, subject) through it, so the file
stays consistent and a future partial fix can't reintroduce the same bug class.
Pure ranking refinement: scores change only for spurious substring matches; no
API or schema change.
(cherry picked from commit 22bd23f044)
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
extract_and_store dedups each extracted fact against the vector store
before the (owner-scoped) text fallback. The vector store is a single
shared ChromaDB collection storing only {"source": "memory"} — no
owner — and find_similar queries it with no owner filter, so it can
return a memory_id belonging to a different tenant. The old code
continue'd (skipped storing) on any vector hit without checking
ownership, so when ChromaDB is healthy (the common path) a user's
freshly-extracted fact was silently dropped because it was merely
semantically similar to another user's memory — the text fallback that
IS owner-scoped never ran. Gate the skip on the matched memory being
this user's own (or legacy unowned), mirroring the text dedup predicate;
cross-tenant or stale matches fall through. Same bug class as #1743.
Odysseus only supports llama.cpp on Windows (vLLM/SGLang are
explicitly blocked). llama.cpp requires GGUF, so AWQ/GPTQ/FP8
safetensors models without a GGUF alternate should not be
recommended in the Cookbook on Windows hosts.
Changes:
- hardware.py: add 'platform': 'windows' to _detect_windows()
so downstream logic can identify Windows hosts.
- fit.py: include is_windows in the existing GGUF-only filter
alongside apple_silicon and consumer_amd.
- tests: add test_hwfit_windows.py with regression tests.
Fixes#122, #614 (root cause: unservable models recommended).
The multi-GPU GGUF filter at fit.py:380 returned None unconditionally
for Q*/IQ quants on 2+ GPU systems. When the caller explicitly passes
target_quant, they are asking 'what happens if I try this?' and expect
a structured no_fit response, not a silent None.
Fix: skip the filter when target_quant is explicitly provided so the
call falls through to the existing no_fit path.
Fixes #
Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
ascending then truncated top-N, which made "highest VRAM" mathematically
unreachable). Every column path uses reverse=True for the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.
Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
tooltip. Smart title-suffix strips the parts already in the repo name
(QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
(#1962) for model-add requests.
Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
"Download complete"). True overall % for multi-shard downloads:
((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
live shard line, the pill is hidden + relabels as "reconnect" + revives
on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
to 8s), scan persisted done/error/crashed downloads and probe their
tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
state is done but tmux is still alive revives the existing task and
refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
/api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
status got stuck. Dir text on the running row, font matched to uptime.
Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
matches body; action buttons get icons on the left (Retry/Copy/Edit/
Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
Backend (services/hwfit + routes):
- rank_models picks visible set by REQUESTED column, not always score —
sorting by Param now shows highest-param models PERIOD (incl. too_tight).
- New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang
cannot serve them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so
FP8 wins when both fit.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above
M2.5 on equal composite score; >=100B integers not misread as versions.
- /api/cookbook/hf-latest no longer drops models without an "NB" pattern in
the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered).
- Cached-model scan: atexit flushes models JSON even if the script is
killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s.
- KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New
"stalled" status for shells <1 MB with no .incomplete files.
- /api/cookbook/state POST guard: rejects "done" download tasks lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stops stale tabs from poisoning persisted state.
- hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization
Q4_K_M -> BF16 (it is the native base, not a quant).
Frontend (static/js):
- Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/
50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score
on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit
column header tagged with active budget (RAM / GPU / N GPU).
- Foldable Download admin-card: the Download h2 is the chevron trigger;
state persists in localStorage.
- Download card surfaces destination dir (Dir: <path>). Same dir on running
task row, font/color matched to uptime (9px Fira Code muted, opacity .4).
- Serve panel ctx text input always resets to model max on open. Sub-MB
cached models show with red "download stalled" badge.
- Bulk-select Cancel + Delete reset the Select button label on exit.
- Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/
required; bare "Download complete" no longer marks the task done after
the first config file. Clear button now sends tmux kill-session too.
True overall % for multi-shard downloads: ((N-1)+frac)/total instead of
hf_transfer per-shard aggregate.
- Diagnosis card simplified: removed fold toggle, copy button, dismiss X.
Suggestion font matches message body (12px).
- HF token field flashes green check + "Saved" on save.
- Cached scan no longer counts stalled rows as downloaded in Scan/Download.
CSS:
- dep Install button width pinned to 76px to match Installed split.
- task-sub row +1px; task-status badge gets margin-right 8px.
- Ctx slider styled like gallery editor sliders (thin pill rail, red thumb).
- Bulk-select cancel button top -3px -> -5px.
tavily_search, serper_search and google_pse_search parsed response.json()
inside the network try block, which only caught httpx.RequestError and
RateLimitError. When a provider returned a non-JSON body (an HTML error page, a
truncated/empty body, a gateway 5xx), response.json() raised an UNCAUGHT
json.JSONDecodeError that aborted the search in the background — exactly the
'search engines other than SearXNG fail in the background' symptom.
brave_search already handles this correctly: it parses JSON in its own try
block and returns [] on json.JSONDecodeError. Mirror that in the other three
providers so a malformed provider response degrades to no-results instead of
propagating an exception.
Adds tests/test_search_provider_json.py: a non-JSON 200 body now yields [] for
tavily, serper, google_pse, and brave (the last guards the reference behaviour).
Co-authored-by: NubsCarson <nubs@nubs.site>
* fix: match skill tags as whole tokens, not substrings, in retrieval
* test: skill tag matching uses whole tokens, not substrings
* test: give skill fixtures status=published so they reach the scoring path
synthesize() and get_stats() parsed the stored tts_speed with a bare
float(settings.get("tts_speed", "1")). The manage_settings agent tool maps
"speech speed"/"voice speed" to tts_speed and, because the setting's default is
a string, writes the value through unvalidated — so an agent (or a hand-edited
settings.json) can store "fast" or "". After that, GET /api/tts/stats and POST
/api/tts/synthesize both 500 with ValueError until the JSON is corrected by hand.
Parse defensively via a _safe_speed() helper (non-numeric/empty/<=0 -> 1.0),
mirroring the settings layer's tolerance of corrupt config.
Adds tests/test_tts_speed_malformed.py (stats + synthesize) — both raise
ValueError before this change and pass after.
Make services.search.analytics tolerate missing counters in older or partial analytics files by merging loaded data over defaults, with regression coverage.
Fix services.memory bullet-list extraction by grouping the bullet/number regex before the capture, and cover both memory manager copies in the regression test.
SearchService.search() did:
raw_results = await comprehensive_web_search(
query, max_results=10 * depth, fetch_content=fetch_content)
comprehensive_web_search is a synchronous function whose count knob is
`max_pages` (not `max_results`) and which has no `fetch_content` parameter, so
the call raised TypeError on argument binding; `await` on its non-coroutine
return would also fail. It returns a context string, or a (context, sources)
tuple with return_sources=True — not the list of dicts the wrapper iterates.
The method is exported in services/search/__init__.py and services/__init__.py
with a usage example in its docstring, so any caller of the documented public
API hit an immediate crash. Call it correctly via asyncio.to_thread with
max_pages + return_sources=True and use the returned source list as the rows.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
get_search_config returned SEARCH_CONFIG.copy(), and update_search_config
cached the decrypted Brave key into that shared global at startup
(app_initializer), so the unauthenticated /api/search/config route exposed
the operator's key. The cache was dead weight: brave_search reads its key
via _get_provider_key (settings/env), never SEARCH_CONFIG.
- update_search_config: no longer stores the api_key in the shared global
(accepted for backward compat; provider keys are read on demand).
- get_search_config: scrub any string-valued credential field before
returning, preserving the has_api_key presence flag.
No schema change; brave_search/_get_provider_key untouched. Adds regression
tests.
Fixes#1661
Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
audit_memories saves final_entries merged with other owners' entries
(correct), but then rebuilt the shared vector collection from
final_entries alone — wiping every other owner from semantic search
until they happened to run their own audit. Keyword fallback masked
it, so it degraded silently. Capture saved_entries once and rebuild
from that.
Caught by #1747.
_detect_nvidia parsed nvidia-smi --query-gpu=memory.total,name and did
float(memory.total) per row, dropping the row on ValueError. Grace Blackwell
GB10 (DGX Spark, sm_121) reports memory.total as '[N/A]'/'Not Supported'
because the GPU shares the system LPDDR pool rather than carrying discrete VRAM
— so the only GPU row was dropped and a real GB10 (even with vLLM running on it)
was reported as 'No GPU', breaking Cookbook recommendations and model switching.
Keep a named device whose memory.total is non-numeric: when there are no
discrete-VRAM rows but such unified devices exist, report a unified-memory CUDA
GPU backed by the system RAM pool (has_gpu, name, backend=cuda, count,
unified_memory=True) — mirroring how Apple Silicon and AMD APUs are already
handled. Discrete GPUs are unchanged, and a box with a real discrete GPU keeps
the discrete path.
Adds tests/test_hwfit_unified_nvidia.py with a GB10 nvidia-smi fixture: the
device is detected (not dropped), surfaces through detect_system with
unified_memory propagated, discrete GPUs stay non-unified, and a discrete GPU
takes precedence over an N/A-memory row.
Co-authored-by: NubsCarson <nubs@nubs.site>
* fix: detect question words as whole words, not prefixes
* fix: same question-word prefix bug in the services search copy
* test: question-word detection rejects prefix lookalikes