odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-15 17:25:26 -04:00

Author	SHA1	Message	Date
Kenny Van de Maele	e87b44126c	test(hwfit): fix non-Apple guard to assert the Apple matcher (unblocks pytest gate) (#4303 ) * test(hwfit): assert the Apple matcher, not the general lookup, in the non-Apple guard `f7aa2de` (#2564) added test_non_apple_gpu_with_cores_does_not_match, which asserts _lookup_bandwidth(RTX 4090) is None. But '4090': 1008 has been in the general GPU_BANDWIDTH table since v1.0, so _lookup_bandwidth correctly returns the card's real bandwidth and the test fails (expected None, got 1008) - reddening the required pytest gate on dev and, by inheritance, every open PR. The guard's actual intent is that the Apple-specific bandwidth path does not false-match a non-Apple card that carries a gpu_cores count. Point the two asserts at _lookup_apple_bandwidth, which returns None for any name without 'apple' regardless of the general table. The general-lookup behavior (4090 -> 1008) is correct and untouched. * fix(hwfit): route string GPU names through the Apple bandwidth helper Second half of the #2564 regression (RaresKeY review on #4303). That change moved the Apple tiers out of the generic GPU_BANDWIDTH table into the dict-only _lookup_apple_bandwidth, but _lookup_bandwidth only called that helper for dict inputs. A bare-string caller like _lookup_bandwidth("Apple M3 Max") therefore fell through to the generic table, found no Apple key, and returned None instead of the conservative tier. Route both dict and string inputs through the Apple helper (a string carries no gpu_cores, so it gets the model's lowest tier). Regression added for the string path plus a non-Apple string control.	2026-06-15 14:01:05 +00:00
Ahmad Naalweh	f7aa2de410	fix(hwfit): distinguish Apple Silicon bandwidth variants (#2564) * fix: resolve Apple Silicon bandwidth variants * fix(hwfit): preserve string lookup path in _lookup_bandwidth * fix(hwfit): guard Apple bandwidth lookup against false GPU matches Add "apple" not in gn check to _lookup_apple_bandwidth() so that non-Apple GPUs with "m3"/"m4"/"m5" in their names (e.g. NVIDIA Quadro M4000) don't incorrectly match Apple bandwidth tiers. Addresses @o3LL review comment on PR #2564.	2026-06-15 15:13:03 +02:00
andrewemer	cd02ac7ef6	fix(agent): skill-prescribed tools never reach the model's schema list (#4008 ) * Agent: make skill-prescribed tools actually callable The skill index and matched-skill procedures are injected into the prompt, but tool selection never followed: manage_skills wasn't in the RAG-selected schema list (so the model substituted manage_memory), and a matched skill could prescribe tools (grep, read_file) the model had no schema for. Now: - manage_skills rides along whenever the owner has any skills indexed - a Jaccard-matched skill's requires_toolsets join the selection - viewing a skill mid-turn via manage_skills unlocks its requires_toolsets for subsequent rounds - admin-intent turns send _ADMIN_TOOLS schemas, matching the prompt text _build_base_prompt already advertises - index_for(active_toolsets=None) no longer hides requires_toolsets skills from callers that don't know the active set Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Agent: validate skill requires_toolsets against known tools, not TOOL_SECTIONS grep/glob/ls ship as function schemas without a prompt-prose section, so gating on TOOL_SECTIONS silently dropped them from a skill's requires_toolsets. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-15 20:32:43 +09:00
Vishnu	933ec8fec9	fix(memory): reject ambiguous multi-object outputs during skill extraction (#3985)	2026-06-15 10:44:43 +00:00
Karthik Rajesh	674457384a	feat(cookbook): surface Docker hardware visibility warnings (#3658)	2026-06-15 15:51:04 +09:00
holden093	4c41834dc7	fix(youtube): consolidate duplicate handler Make src.youtube_handler a compatibility wrapper around services.youtube.youtube_handler so transcript state, URL parsing, and timeout behavior no longer diverge.	2026-06-15 15:03:41 +09:00
Ashvin	b20cea347a	fix(hwfit): serve profiles for sub-8192 context models Allow serve-profile generation for models whose trained context window is below 8192 while preserving the 8K shrink floor for larger models.	2026-06-15 15:02:22 +09:00
Kenny Van de Maele	bfac1d55d6	fix(search): read plain-text, Markdown, and JSON URLs in fetch_webpage_content (#3809 ) raw.githubusercontent.com serves Markdown as text/plain, JSON APIs and raw config files serve application/json, and a lot of code and tool documentation lives in .md/.txt. fetch_webpage_content only handled PDF and HTML, so a non-HTML body produced empty content and web_fetch reported 'no readable text content'. Add a branch that returns the body verbatim for non-HTML text/*, JSON (application/json and +json), and a .md/.txt/.text/.json URL-suffix fallback for mislabeled octet-stream. HTML and PDF handling unchanged. Fixes #3808	2026-06-11 14:24:53 +00:00
ThomasAngel	a0b0420e6f	chore: Switch duckduckgo-search to ddgs (#3143) * Switch to ddgs duckduckgo_search was deprecated, this is the recommended replacement * Update test_service_search_provider_guards.py According to review comment	2026-06-10 17:59:47 +02:00
ooovenenoso	725d174243	fix(research): track analyzed URLs separately (#3125) Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-10 12:08:22 +01:00
pewdiepie-archdaemon	637a34515d	Merge remote-tracking branch 'origin/main' into dev	2026-06-09 10:41:48 +09:00
Boody	f605bb3864	fix: Enforce dynamic custom search result limits in backend (#2359 ) * fixed confusing credentials prompt * fix(setup): return status from create_default_admin function * fix(setup): initialize admin creation status in main function * fix(setup): enhance admin creation feedback and status handling * Enhance admin user login messages with conditional feedback based on creation status * Refine admin user creation feedback messages for clarity and actionability and formatted code * Add fallback error message for admin creation failure in setup script * Add run script for Uvicorn with dotenv integration * Refactor server runner to use argparse for host and port configuration * Remove captured output print statement from server runner * Fix server runner to ensure cross-platform compatibility and improve log handling * removed run.py to match original repo * Fixing custom search not working properly * Refactor search settings event listeners for improved functionality and clarity * Update search function signatures to use Optional for count parameter * revert changes * fixed broken merge issue * Delete services/chat_data_scraper.py added by mistake --------- Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 02:20:59 +01:00
pewdiepie-archdaemon	fa8c93ec0a	Cookbook UI: Ollama browser, advanced serve fold, API tokens form, diagnosis toolbar, polish Surface a lot of accumulated cookbook + UI work as a single non-agent commit so the agent rework lands cleanly. Highlights: - Ollama as a first-class backend in the Cookbook: * Download input accepts ollama-style names (name:tag) → backend=ollama * /api/cookbook/ollama/library (cached scrape of ollama.com + curated fallback so classic models like qwen2.5 stay reachable) * "Browse Ollama library" toggle below Download with size chips * Engine=Ollama in hwfit toolbar merges the Ollama library into the main scan list as per-tag rows with the same Fit/Param/Quant/VRAM columns; click → fills Download input - API Tokens form added to Integrations panel (matching wired loadTokens()/initTokenForm() that had no HTML) - Serve panel polish: Advanced fold tightening (-8px nudges on vLLM checks, Extra args, Spec row), n_cpu_moe + Split Mode controls pulled up 8px to align with the row's checkboxes, GGUF File dropdown exposed for Ollama backend, GPU re-render on Edit serve restore, _forceBackend flag so saved serveState wins over backend detection, cookbook:servers-changed CustomEvent so panels don't need refresh - Models page redesign: Add Models row (URL + hidden API key reveal + Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All + Clear-offline buttons in Added Models toolbar, offline-pill removed (opacity already conveys state), Engine dropdown gains Ollama option - _ping_endpoint probes /v1/models then base, accepts 4xx as reachable (vLLM returns 404 on bare /v1, fully working endpoints were showing offline) - Diagnosis card: × dismiss + Copy bundle buttons restored on the serve error feedback card - Orphan tmux sweep re-enabled behind a 60s rate-limit + background Thread (off the main event loop) so dead serves get discovered - cookbook_routes auto-register watchdog: drops the endpoint if the serve session exits non-zero within the first 
~3min - ollama-rocm sidecar awareness in download wrapper (`docker exec ollama-rocm ollama pull` when host ollama isn't installed) - Skill extractor sets initial_status="published" when auto_approve_skills pref is on (audit demotes later) - Skill list / model list / cookbook scan misc polish	2026-06-09 09:46:19 +09:00
CorVous	34a3f8637a	fix(memory): make auto-memory extraction reliable for reasoning models (#3190 ) * fix(memory): auto-memory extracted nothing — flatten window so the prompt ends on a user turn extract_and_store appended the recent window as raw alternating role messages after the system prompt. Since the window is the last N messages, the prompt usually ENDED on an assistant turn — and a chat model given a prompt ending on an assistant turn returns an empty completion (nothing to answer). The result was facts=[] → "Auto memory extraction ran: 0 candidates" on every run, so no memories were ever stored, while skill extraction (which flattens the transcript into a single user message) worked fine. Flatten the window into one user message ending with an explicit instruction, mirroring the skill extractor, so the model always responds. Also harden parsing for reasoning models, matching the audit path which already does this: - raise max_tokens 500 → 4096 (a reasoning model spends the budget on <think> before emitting JSON; 500 truncated it before any JSON appeared); - strip <think>/prose preambles via strip_think and slice the embedded JSON array before json.loads, instead of bombing on char 0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore: tighten memory-extraction-empty-completion — clarify JSON-slice comment re prior strip steps Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(memory): reframe the comment to the accurate root cause (raw-chat framing) The earlier comment leaned on "ends on an assistant turn -> empty completion", which is only one failure mode. The dominant cause, confirmed by a controlled repro (0/6 old vs 6/6 new on this model), is that passing the window as raw chat messages makes the model treat it as a conversation to continue rather than a transcript to analyze, so it returns [] even when durable facts are present. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(memory): cover extraction JSON parsing + slice trailing commentary unconditionally Factor the strip/fence/slice/json.loads logic out of extract_and_store into a pure module-level helper _parse_extraction_json(raw) -> list and drop the 'text[0] != "["' guard so the array is sliced whenever both brackets exist (fixes trailing commentary like '[...] Done!' reaching json.loads). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 19:57:44 +02:00
Mike	ac94885c84	refactor(constants): single source of truth for data dir (#3368 ) * refactor(constants): single source of truth for data dir + merge core/src constants Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(contributing): use named src.constants for data paths, drop core/constants references Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 09:58:52 +02:00
Lucas Daniel	fa7c4f8ea9	fix(search): catch HTTPStatusError so 403/404 URLs degrade gracefully instead of 500 (#2203 ) raise_for_status() raises httpx.HTTPStatusError for 4xx/5xx responses, but the surrounding try/except only caught httpx.RequestError (network errors) and RateLimitError (429). Any other HTTP error code propagated uncaught up through chat_processor -> chat_helpers -> chat_routes and surfaced as a 500 Internal Server Error. Added an explicit except httpx.HTTPStatusError clause that logs a warning and returns an empty result, matching the behaviour already in place for network errors. Also adds focused regression tests that exercise the real fetch_webpage_content() path with a mocked _get_public_url: - 403/404 responses return the standard empty-result shape instead of raising, proving the new HTTPStatusError handling works end to end. - 429 responses still take their own dedicated rate-limit branch (the status_code == 429 check runs before raise_for_status() is reached), keeping that behaviour distinct from the new generic HTTPStatusError handling. Dropped the unrelated builtin_mcp.py change that had been carried over from a rebase; that fix is tracked separately in #2018 and this branch should stay scoped to the search content fetch path. Closes #2148	2026-06-08 01:09:21 +01:00
horribleCodes	9c90f62657	fix(platform): Improve WSL SSH remote compatibility (#3316 ) * fix(platform): add WSL compatibility functions and path translation fix(cookbook): enhance model scan script to support additional HuggingFace cache paths fix(hardware): improve cache key generation for remote SSH context test(tests): add tests for WSL detection and path translation functionality * fix(cookbook): prefer prebuilt wheels for llama-cpp-python and normalize package aliases * fix: enable StrictHostKeyChecking in nvidia probe refactor: consolidate ssh & powershell command execution to utility functions in core module refactor: consolidate nvidia path candidates in to single variables in core module tests: add tests for new utility functions * fix: correct wrong variable name	2026-06-08 00:33:50 +02:00
Lucas Daniel	73315e6ddc	fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON (#2205 ) * fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON The extractor sliced from the FIRST brace to the LAST brace to recover JSON embedded in surrounding commentary. When the model emits stray braces before the JSON object, the slice produces invalid JSON, json.loads raises, and the exception is swallowed -- the skill is silently lost. Fix: walk each brace candidate left-to-right and attempt json.loads on each slice. The first candidate that parses successfully wins. If none parse, json.loads on the original text raises and the existing JSONDecodeError handler logs and returns None as before. Tested locally -- 8/8 tests passed: tests/test_extract_skill_json_nonstring.py (2 passed) tests/test_skill_extractor_rows.py (1 passed) tests/test_search_content_extraction_parity.py (2 passed) tests/test_deep_research_search_error.py (3 passed) Closes #2199 * test(skill-extractor): add focused repro for stray-brace JSON recovery * test(skill-extractor): add regression test for leading invalid-brace fragment Addresses the remaining edge case from review: a response that starts with a brace but the leading fragment isn't valid JSON (e.g. '{not json}\n{"title": "Valid later", ...}') still needs to recover the valid skill object that follows. _extract_json_object (already on dev) handles this correctly — it tries the whole de-fenced string first, then walks each '{' candidate left-to- right regardless of whether the response begins with '{', so the leading invalid fragment no longer short-circuits recovery of the real object. Updates the comment at the call site to call this out explicitly and adds a regression test covering exactly the scenario described in review.	2026-06-07 23:31:12 +01:00
Kenny Van de Maele	92300b5d67	fix(search): write cache under DATA_DIR, guard mkdir against read-only path (#3334 ) services/search/cache.py set CACHE_DIR = services/cache (the source tree) and mkdir'd it at import, unguarded. In Docker services/ is the read-only image layer, so the mkdir fails at import (same class as the analytics bug #2366). Move the cache under DATA_DIR/cache (writable on Docker and native) and wrap the mkdir so an unwritable path disables disk cache instead of crashing import. Part of #3331. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 22:37:12 +01:00
Giuseppe Castelluccio	6c9a16a7a8	fix: search analytics FileHandler crashes on startup writing to read-only image layer (#2366 ) * fix: move search analytics log to writable /app/logs volume services/search/analytics.py opened a FileHandler at module import time pointing to /app/services/search_engine_error.log — inside the container image's read-only layer. The process runs as non-root so the open() fails with PermissionError, crashing uvicorn before it ever binds. ANALYTICS_FILE had the same problem. Both paths now point to /app/logs (bind-mounted from the host data directory). The FileHandler creation is wrapped in try/except so a missing mount doesn't hard-crash on import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: derive log dir from DATA_DIR instead of hardcoded /app/logs Fixes reviewer feedback on #2366: /app/logs only exists inside Docker, so native runs couldn't write the analytics file. DATA_DIR resolves to the repo's data/ directory on native and /app/data (writable mount) in Docker, making both the error log handler and ANALYTICS_FILE work on every platform. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 19:26:22 +02:00
Bipin Mishra	b22c2b280c	fix(hwfit): detect NVIDIA GPU on WSL and other minimal-PATH environments (#3306 ) The nvidia-smi absolute-path fallback in _detect_nvidia() was gated on _remote_host, so it never ran for local detection. On systems where nvidia-smi is not in the default PATH (e.g. WSL: /usr/lib/wsl/lib/), this caused the Cookbook to report 'No GPU' even when nvidia-smi works from an interactive shell. Two issues fixed: 1. Removed the _remote_host gate so the absolute-path scan runs for local detection too. 2. For local execution, pass arguments as a list instead of a string so subprocess.run() resolves the absolute path correctly. Remote (SSH) execution keeps the string form, which the SSH command builder handles. Co-authored-by: Bipin Mishra <bipin.mishra@atlascopco.com>	2026-06-07 17:53:49 +02:00
Mazen Tamer Salah	92ef01d4fa	fix(skills): tolerate a stray brace before the JSON in skill extraction (#2200 ) maybe_extract_skill() sliced the LLM response from the first '{' to the last '}'. When a model emits a stray brace in prose before the real object (e.g. "uses {placeholder} then {...}"), the slice starts at the prose brace, json.loads fails, and a valid skill is silently dropped. Factor parsing into _extract_json_object(), which tries the whole (de-fenced) string first and then each '{' start position, returning the first candidate that parses to a JSON object. Adds tests/test_skill_extractor_json.py.	2026-06-07 16:54:36 +02:00
SurprisedDuck	c75d3e1975	fix(memory): record dislikes as dislikes, not preferences (#2435 ) _fallback_memory_candidates matched both positive (prefer/like/love) and negative (hate / do not like / don't like) sentiment verbs in one regex alternation, then formatted every hit as "User prefers {X}.". So "I hate cilantro" was stored as "User prefers cilantro." -- the inverse of what the user said. These fallback facts are persisted to memory and later re-injected into the model's context, so the inverted preference actively misleads the assistant. Capture the matched verb and branch on it: negatives become "User dislikes {X}.", positives stay "User prefers {X}." (still filed under the existing "preference" category). Supported by Claude Opus 4.8 Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>	2026-06-07 16:36:07 +02:00
n2b12	fb3e89b011	VRAM detection under native Windows install (#1610 ) * Convert to different style of comment to make it easier to work with, fix formatting inside Powershell script. * Grab VRAM amount from driver's registry keys. * Fixed regression on NVIDIA GPUs	2026-06-05 22:49:47 +02:00
horribleCodes	c8b4cd24e0	fix: Add WSL paths to hardware detection fallback (#2933 ) This change extends both the `PATH` variable and the list of absolute paths used to locate the `nvidia-smi` package to include `/usr/lib/wsl/lib`. This path is a candidate for the default location of nvidia-smi for WSL machines (tested on WSL Ubuntu 22.04.5).	2026-06-05 21:34:41 +02:00
Giulio Zelante	b448119919	feat(skills): import SKILL.md bundles from public GitHub URLs (#2576 ) * feat(skills): import SKILL.md bundles from public GitHub URLs Supports GitHub tree/blob/raw links and skills.sh pages that resolve to GitHub. Installs SKILL.md plus sibling text assets under data/skills/imported/. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): admin-gate URL import and validate redirect hosts - require_admin on POST /api/skills/import-from-url (matches other skill admin routes) - reject cross-host redirects after httpx follow_redirects - test for redirect host validation Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): match Brain Add panel import/submit button styles - Skill URL Import: theme-io-btn + download icon (same as memory Import) - Add Skill submit: confirm-btn confirm-btn-primary Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): allow api.github.com during directory import Real imports hit the GitHub contents API after redirects; whitelist api.github.com and add regression tests. Shrink Import button with flex:none. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): align skill Import button with URL input row Match memory-add-input height (28px) in memory-add-row and center the download icon with flexbox instead of vertical-align hacks. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): cancel modal-body margin on skill Import button The skill Import button sits in .memory-add-row beside an input; the global .modal-body button { margin-top: 6px } rule only affected buttons, pushing Import down and misaligning the download icon. Reset margin-top and match Memory Import SVG markup at 28px row height. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): surface GitHub API errors on URL import Pass through GitHub response messages (especially 403 rate limits) as SkillImportError instead of a generic download failure. 
Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-05 19:48:23 +02:00
ghreprimand	cfb2d17a2d	Word-boundary match for snippet and subject-term ranking (#1473 follow-up) (#2556 ) #1473 converted the title and sports-hint matches in services/search/ranking.py to word boundaries but left two raw substring tests: - snippet_score: 'term in snippet.lower()' — query term 'port' hits 'transport'/'support', inflating a result's relevance. - news_quality_adjustment: 't in text or t in netloc' for the subject term — query 'us' substring-matches 'business'/'music', so an off-topic page wrongly escapes the off-topic penalty on a country/subject news query. Add a _has_word helper (the same \b...\b pattern title_score already used) and route all three word checks (title, snippet, subject) through it, so the file stays consistent and a future partial fix can't reintroduce the same bug class. Pure ranking refinement: scores change only for spurious substring matches; no API or schema change. (cherry picked from commit `22bd23f044`) Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-05 08:04:31 +01:00
afonsopc	28b296a712	Fix auto-memory vector dedup dropping a user's fact on cross-tenant match extract_and_store dedups each extracted fact against the vector store before the (owner-scoped) text fallback. The vector store is a single shared ChromaDB collection storing only {"source": "memory"} — no owner — and find_similar queries it with no owner filter, so it can return a memory_id belonging to a different tenant. The old code continue'd (skipped storing) on any vector hit without checking ownership, so when ChromaDB is healthy (the common path) a user's freshly-extracted fact was silently dropped because it was merely semantically similar to another user's memory — the text fallback that IS owner-scoped never ran. Gate the skip on the matched memory being this user's own (or legacy unowned), mirroring the text dedup predicate; cross-tenant or stale matches fall through. Same bug class as #1743.	2026-06-04 23:45:13 +01:00
Zen0-99	7188737294	fix(hwfit): filter non-GGUF models on Windows (#2530 ) Odysseus only supports llama.cpp on Windows (vLLM/SGLang are explicitly blocked). llama.cpp requires GGUF, so AWQ/GPTQ/FP8 safetensors models without a GGUF alternate should not be recommended in the Cookbook on Windows hosts. Changes: - hardware.py: add 'platform': 'windows' to _detect_windows() so downstream logic can identify Windows hosts. - fit.py: include is_windows in the existing GGUF-only filter alongside apple_silicon and consumer_amd. - tests: add test_hwfit_windows.py with regression tests. Fixes #122, #614 (root cause: unservable models recommended).	2026-06-04 20:02:13 +02:00
Nicholai	c916224510	feat(memory): add provider interface (#72)	2026-06-04 16:26:11 +01:00
raf	cf5c5118d8	fix(hwfit): return no_fit instead of None when target_quant is a GGUF tier on multi-GPU (#2375 ) The multi-GPU GGUF filter at fit.py:380 returned None unconditionally for Q*/IQ quants on 2+ GPU systems. When the caller explicitly passes target_quant, they are asking 'what happens if I try this?' and expect a structured no_fit response, not a silent None. Fix: skip the filter when target_quant is explicitly provided so the call falls through to the existing no_fit path. Fixes #	2026-06-04 14:25:36 +01:00
Nicholai	4dc11cfe6b	refactor(memory): canonicalize memory imports (#50)	2026-06-04 05:31:15 +01:00
Afonso Coutinho	03dbf976a5	fix: image model ranking crashes on a non-string search filter (#1898)	2026-06-04 03:26:35 +01:00
Afonso Coutinho	5043b2924c	fix: image model ranking crashes when system is not a dict (#1900)	2026-06-04 03:23:59 +01:00
Vykos	aaef6b1c49	fix(search): align content URL guards * Stabilize full test collection * Align search content URL guards	2026-06-04 00:34:06 +01:00
pewdiepie-archdaemon	6861c41580	Reapply "Merge branch 'main' of github.com:pewdiepie-archdaemon/odysseus" This reverts commit `cc8fe2f6e3`.	2026-06-03 22:47:00 +09:00
pewdiepie-archdaemon	cc8fe2f6e3	Revert "Merge branch 'main' of github.com:pewdiepie-archdaemon/odysseus" This reverts commit `8161c1253d`, reversing changes made to `8c2705b42a`.	2026-06-03 22:46:19 +09:00
Alexandre Teixeira	a75dd4a231	fix(search): apply recency UTC fix to live ranking module	2026-06-03 12:49:32 +01:00
pewdiepie-archdaemon	562bc4dedc	Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI Backend (services/hwfit + routes): - VRAM column sort now shows global highest first (was special-cased to ascending then truncated top-N, which made "highest VRAM" mathematically unreachable). Every column path uses reverse=True for the truncation. - Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep re-probing the rig during a session; Rescan button still forces fresh. - Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them); default non-prequantized to BF16 on 2+ GPUs. - AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties. - Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5. - hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the actual 156 GB / 284 GB disk footprints). - New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT / QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES. Frontend — Scan/Download: - Engine + Quant swapped in the toolbar; Quant defaults to "All". - Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag re-sorts by vram ascending (smallest fitting first); back to Max → score. - Ctx slider rail now visible — was background:transparent in a duplicate later-cascade rule. Hardcoded grey + !important. - Search input moved to the far right of the toolbar. - Type/Standard default; "Context" not uppercased; Search placeholder dimmed. - Engine "?" + Quant "?" inline help chips inside their dropdown boxes. - Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc. - Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in tooltip. 
Smart title-suffix strips the parts already in the repo name (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)"). - Conditional warning for safetensors models on non-GPU rigs only. - Dependency Install / Installed / Installed▾ / N/A all 75.85px wide. - Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag. - Foldable Download admin-card (h2 chevron); line under h2 only when folded. - HF token save gets a green ✓ + "Saved" flash. - Cached scan no longer counts stalled rows as downloaded. - Footer: "Request it →" link with GitHub mark to the public discussion (#1962) for model-add requests. Frontend — Running tab: - Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare "Download complete"). True overall % for multi-shard downloads: ((N-1)+frac)/total instead of hf_transfer's per-shard aggregate. - ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m". - Clear button kills the tmux session too; if the output still shows a live shard line, the pill is hidden + relabels as "reconnect" + revives on click. - Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled to 8s), scan persisted done/error/crashed downloads and probe their tmux session — if alive, flip status back to running and reattach. - Per-launch zombie probe: clicking Download on a model whose persisted state is done but tmux is still alive revives the existing task and refuses to start a duplicate. - Pre-launch GPU probe: vllm / sglang / diffusers serve check /api/cookbook/gpus first; warns + confirms if no GPU is visible. - Server-side state guard: rejects "done" POSTs for downloads lacking DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned shard is N<total — stale tabs can't poison persisted state any more. - Running count includes tasks whose output looks active even if persisted status got stuck. Dir text on the running row, font matched to uptime. 
Serve panel: - Ctx text input always resets to model max on open (default 20000 when metadata is missing). - Max Seqs default 8 -> 4. KV Cache dtype select 32px tall. - Lightning icon on Launch (same as Action toggle). - Diagnosis card simplified (no fold/copy/dismiss), suggestion font matches body; action buttons get icons on the left (Retry/Copy/Edit/ Install/Kill/Switch/etc.). - Incomplete-download serve warning when model status is downloading / stalled / has_incomplete. - MTP "?" tooltip ("supported on a few model families … up to ~3× faster").	2026-06-03 20:25:25 +09:00
pewdiepie-archdaemon	3706d756f3	Merge remote-tracking branch 'origin/main' into visual-pr-playground # Conflicts: # routes/cookbook_routes.py # routes/hwfit_routes.py # services/hwfit/fit.py # services/hwfit/models.py # static/js/cookbook-diagnosis.js # static/js/cookbook-hwfit.js # static/js/cookbook.js # static/js/cookbookRunning.js	2026-06-03 16:49:10 +09:00
pewdiepie-archdaemon	eb79b76432	Cookbook: scoring fixes, UI polish, false-finished + stale-state bug fixes
Backend (services/hwfit + routes):
- rank_models picks visible set by REQUESTED column, not always score: sorting by Param now shows highest-param models PERIOD (incl. too_tight).
- New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang cannot serve them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so FP8 wins when both fit.
- Version-aware tiebreaker (parse Mn.n / Vn): MiniMax-M2.7 ranks above M2.5 on equal composite score; >=100B integers not misread as versions.
- /api/cookbook/hf-latest no longer drops models without an "NB" pattern in the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered).
- Cached-model scan: atexit flushes models JSON even if the script is killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s.
- KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New "stalled" status for shells <1 MB with no .incomplete files.
- /api/cookbook/state POST guard: rejects "done" download tasks lacking DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned shard is N<total, which stops stale tabs from poisoning persisted state.
- hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization Q4_K_M -> BF16 (it is the native base, not a quant).
Frontend (static/js):
- Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit column header tagged with active budget (RAM / GPU / N GPU).
- Foldable Download admin-card: the Download h2 is the chevron trigger; state persists in localStorage.
- Download card surfaces destination dir (Dir: <path>). Same dir on running task row, font/color matched to uptime (9px Fira Code muted, opacity .4).
- Serve panel ctx text input always resets to model max on open. Sub-MB cached models show with red "download stalled" badge.
- Bulk-select Cancel + Delete reset the Select button label on exit.
- Cookbook running: false-finished bug fixed, with DOWNLOAD_OK or /snapshots/ required; bare "Download complete" no longer marks the task done after the first config file. Clear button now sends tmux kill-session too. True overall % for multi-shard downloads: ((N-1)+frac)/total instead of hf_transfer per-shard aggregate.
- Diagnosis card simplified: removed fold toggle, copy button, dismiss X. Suggestion font matches message body (12px).
- HF token field flashes green check + "Saved" on save.
- Cached scan no longer counts stalled rows as downloaded in Scan/Download.
CSS:
- dep Install button width pinned to 76px to match Installed split.
- task-sub row +1px; task-status badge gets margin-right 8px.
- Ctx slider styled like gallery editor sliders (thin pill rail, red thumb).
- Bulk-select cancel button top -3px -> -5px.	2026-06-03 16:32:20 +09:00
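The version-aware tiebreaker in this commit could look roughly like the sketch below. This is an illustration under assumptions, not the code in services/hwfit: the regex, the `parse_version` and `rank` names, and the (name, score) tuples are all hypothetical. It only recognises an `M` or `V` prefix followed by digits, so a parameter size such as `120B` is never read as a version.

```python
import re

# Hypothetical pattern: an "M" or "V" token such as "M2.7" or "V4",
# delimited by start/end of string or one of - _ / space.
_VERSION_RE = re.compile(r"(?:^|[-_/ ])[MV](\d+)(?:\.(\d+))?(?=$|[-_/ ])")


def parse_version(name: str) -> tuple:
    """Return (major, minor) parsed from a model name, or (0, 0) if absent."""
    match = _VERSION_RE.search(name)
    if match is None:
        return (0, 0)
    major, minor = match.groups()
    return (int(major), int(minor or 0))


def rank(models):
    """Sort (name, score) pairs by score, breaking ties by parsed version."""
    return sorted(models, key=lambda m: (m[1], parse_version(m[0])), reverse=True)


if __name__ == "__main__":
    tied = [("MiniMax-M2.5", 8.0), ("MiniMax-M2.7", 8.0)]
    print(rank(tied))  # the M2.7 entry sorts first on an equal score
```

Because the version is the second element of the sort key, it only affects ordering when the composite scores are equal.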
Shaw	552bc15067	fix(search): degrade to empty results on non-JSON provider responses (#1129) (#1352) tavily_search, serper_search and google_pse_search parsed response.json() inside the network try block, which only caught httpx.RequestError and RateLimitError. When a provider returned a non-JSON body (an HTML error page, a truncated/empty body, a gateway 5xx), response.json() raised an UNCAUGHT json.JSONDecodeError that aborted the search in the background, exactly the 'search engines other than SearXNG fail in the background' symptom. brave_search already handles this correctly: it parses JSON in its own try block and returns [] on json.JSONDecodeError. Mirror that in the other three providers so a malformed provider response degrades to no-results instead of propagating an exception. Adds tests/test_search_provider_json.py: a non-JSON 200 body now yields [] for tavily, serper, google_pse, and brave (the last guards the reference behaviour). Co-authored-by: NubsCarson <nubs@nubs.site>	2026-06-03 14:24:23 +09:00
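The pattern described here, parsing JSON in its own try block and returning [] on failure, can be sketched with the standard library alone. The `parse_results` name and the `"results"` key are hypothetical; the real providers work on an httpx response and each has its own payload shape.

```python
import json


def parse_results(body: str) -> list:
    """Parse a provider response body, degrading to [] when it is not JSON."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # HTML error page, empty body, truncated payload: no results
        # instead of an exception escaping into the caller.
        return []
    if not isinstance(payload, dict):
        return []
    results = payload.get("results", [])  # hypothetical key
    return results if isinstance(results, list) else []


if __name__ == "__main__":
    print(parse_results("<html>502 Bad Gateway</html>"))
    print(parse_results('{"results": [{"title": "ok"}]}'))
```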
Afonso Coutinho	fb8a744cae	fix: skill retrieval boosts on tag substrings (e.g. 'ai' tag for any 'email' query) (#1406) * fix: match skill tags as whole tokens, not substrings, in retrieval * test: skill tag matching uses whole tokens, not substrings * test: give skill fixtures status=published so they reach the scoring path	2026-06-03 14:24:11 +09:00
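To illustrate the bug class: a substring test finds the tag 'ai' inside 'email', while a whole-token test does not. The sketch below assumes single-word tags and a simple alphanumeric tokenizer; `tokenize` and `matching_tags` are hypothetical names, not the repository's retrieval code.

```python
import re


def tokenize(text: str) -> set:
    """Split text into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_tags(query: str, tags: list) -> list:
    """Return the tags that appear in the query as whole tokens."""
    tokens = tokenize(query)
    return [tag for tag in tags if tag.lower() in tokens]


if __name__ == "__main__":
    query = "draft an email reply"
    print("ai" in query)                          # substring test matches inside "email"
    print(matching_tags(query, ["ai", "email"]))  # whole-token test keeps only "email"
```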
Afonso Coutinho	b55c970ec5	fix: sports-hint ranking penalty fires on 'transport'/'passport' substrings (#1473) * fix: sports-hint ranking penalty fires on 'transport'/'passport' substrings * Apply word-boundary sports-hint fix to src/search/ranking.py as well	2026-06-03 14:23:52 +09:00
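A word-boundary check of the kind this fix names might look like the following. The hint pattern is a hypothetical stand-in for whatever list src/search/ranking.py actually uses; the sketch only shows why `\b` stops 'transport' and 'passport' from matching.

```python
import re

# Hypothetical hint pattern: "sport" or "sports" as a whole word only.
_SPORTS_HINT = re.compile(r"\bsports?\b", re.IGNORECASE)


def has_sports_hint(text: str) -> bool:
    """True when the text contains a sports hint as a standalone word."""
    return _SPORTS_HINT.search(text) is not None


if __name__ == "__main__":
    for sample in ("public transport schedule", "passport renewal", "latest sports scores"):
        print(sample, has_sports_hint(sample))
```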
Afonso Coutinho	f93755e7a4	fix: params_b crashes the whole ranking on a malformed parameter_count (#1550)	2026-06-03 14:23:30 +09:00
Afonso Coutinho	7f80d33210	fix: services research lists junk no-content pages as cited sources (#1669)	2026-06-03 14:22:58 +09:00
Afonso Coutinho	eae8797e08	fix: web search content blocks numbered by fetch completion order break citations (#1672)	2026-06-03 14:22:55 +09:00
Afonso Coutinho	3d00c85636	fix: hwfit native quant labels miss the cost maps and over-estimate VRAM (#1690)	2026-06-03 14:22:42 +09:00
Stephen Purdue	85bc18b7d8	fix: fixed minor consistency issues within MemoryManager (#1353)	2026-06-03 14:12:24 +09:00
Shaw	d38fb4bc46	fix(tts): tolerate a malformed tts_speed instead of 500-ing (#1450) synthesize() and get_stats() parsed the stored tts_speed with a bare float(settings.get("tts_speed", "1")). The manage_settings agent tool maps "speech speed"/"voice speed" to tts_speed and, because the setting's default is a string, writes the value through unvalidated, so an agent (or a hand-edited settings.json) can store "fast" or "". After that, GET /api/tts/stats and POST /api/tts/synthesize both 500 with ValueError until the JSON is corrected by hand. Parse defensively via a _safe_speed() helper (non-numeric/empty/<=0 -> 1.0), mirroring the settings layer's tolerance of corrupt config. Adds tests/test_tts_speed_malformed.py (stats + synthesize); both raise ValueError before this change and pass after.	2026-06-03 14:12:03 +09:00


100 Commits