Commit Graph

27 Commits

Author SHA1 Message Date
andrewemer cd02ac7ef6 fix(agent): skill-prescribed tools never reach the model's schema list (#4008)
* Agent: make skill-prescribed tools actually callable

The skill index and matched-skill procedures are injected into the
prompt, but tool selection never followed: manage_skills wasn't in the
RAG-selected schema list (so the model substituted manage_memory), and
a matched skill could prescribe tools (grep, read_file) the model had
no schema for. Now:

- manage_skills rides along whenever the owner has any skills indexed
- a Jaccard-matched skill's requires_toolsets join the selection
- viewing a skill mid-turn via manage_skills unlocks its
  requires_toolsets for subsequent rounds
- admin-intent turns send _ADMIN_TOOLS schemas, matching the prompt
  text _build_base_prompt already advertises
- index_for(active_toolsets=None) no longer hides requires_toolsets
  skills from callers that don't know the active set

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* Agent: validate skill requires_toolsets against known tools, not TOOL_SECTIONS

grep/glob/ls ship as function schemas without a prompt-prose section,
so gating on TOOL_SECTIONS silently dropped them from a skill's
requires_toolsets.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-15 20:32:43 +09:00
Vishnu 933ec8fec9 fix(memory): reject ambiguous multi-object outputs during skill extraction (#3985) 2026-06-15 10:44:43 +00:00
pewdiepie-archdaemon fa8c93ec0a Cookbook UI: Ollama browser, advanced serve fold, API tokens form, diagnosis toolbar, polish
Surface a lot of accumulated cookbook + UI work as a single non-agent
commit so the agent rework lands cleanly.

Highlights:
- Ollama as a first-class backend in the Cookbook:
  * Download input accepts ollama-style names (name:tag) → backend=ollama
  * /api/cookbook/ollama/library (cached scrape of ollama.com + curated
    fallback so classic models like qwen2.5 stay reachable)
  * "Browse Ollama library" toggle below Download with size chips
  * Engine=Ollama in hwfit toolbar merges the Ollama library into the
    main scan list as per-tag rows with the same Fit/Param/Quant/VRAM
    columns; click → fills Download input
- API Tokens form added to Integrations panel (matching wired
  loadTokens()/initTokenForm() that had no HTML)
- Serve panel polish: Advanced fold tightening (-8px nudges on vLLM
  checks, Extra args, Spec row), n_cpu_moe + Split Mode controls
  pulled up 8px to align with the row's checkboxes, GGUF File dropdown
  exposed for Ollama backend, GPU re-render on Edit serve restore,
  _forceBackend flag so saved serveState wins over backend detection,
  cookbook:servers-changed CustomEvent so panels don't need refresh
- Models page redesign: Add Models row (URL + hidden API key reveal +
  Type select + Scan/Ollama/Key/Test/Add icon buttons), Probe All +
  Clear-offline buttons in Added Models toolbar, offline-pill removed
  (opacity already conveys state), Engine dropdown gains Ollama option
- _ping_endpoint probes /v1/models then base, accepts 4xx as
  reachable (vLLM returns 404 on bare /v1, fully working endpoints
  were showing offline)
- Diagnosis card: × dismiss + Copy bundle buttons restored on the
  serve error feedback card
- Orphan tmux sweep re-enabled behind a 60s rate-limit + background
  Thread (off the main event loop) so dead serves get discovered
- cookbook_routes auto-register watchdog: drops the endpoint if the
  serve session exits non-zero within the first ~3min
- ollama-rocm sidecar awareness in download wrapper (`docker exec
  ollama-rocm ollama pull` when host ollama isn't installed)
- Skill extractor sets initial_status="published" when
  auto_approve_skills pref is on (audit demotes later)
- Skill list / model list / cookbook scan misc polish
2026-06-09 09:46:19 +09:00
CorVous 34a3f8637a fix(memory): make auto-memory extraction reliable for reasoning models (#3190)
* fix(memory): auto-memory extracted nothing — flatten window so the prompt ends on a user turn

extract_and_store appended the recent window as raw alternating role messages
after the system prompt. Since the window is the last N messages, the prompt
usually ENDED on an assistant turn — and a chat model given a prompt ending on
an assistant turn returns an empty completion (nothing to answer). The result
was facts=[] → "Auto memory extraction ran: 0 candidates" on every run, so no
memories were ever stored, while skill extraction (which flattens the transcript
into a single user message) worked fine.

Flatten the window into one user message ending with an explicit instruction,
mirroring the skill extractor, so the model always responds. Also harden parsing
for reasoning models, matching the audit path which already does this:
- raise max_tokens 500 → 4096 (a reasoning model spends the budget on <think>
  before emitting JSON; 500 truncated it before any JSON appeared);
- strip <think>/prose preambles via strip_think and slice the embedded JSON
  array before json.loads, instead of bombing on char 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: tighten memory-extraction-empty-completion — clarify JSON-slice comment re prior strip steps

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(memory): reframe the comment to the accurate root cause (raw-chat framing)

The earlier comment leaned on "ends on an assistant turn -> empty completion",
which is only one failure mode. The dominant cause, confirmed by a controlled
repro (0/6 old vs 6/6 new on this model), is that passing the window as raw chat
messages makes the model treat it as a conversation to continue rather than a
transcript to analyze, so it returns [] even when durable facts are present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(memory): cover extraction JSON parsing + slice trailing commentary unconditionally

Factor the strip/fence/slice/json.loads logic out of extract_and_store into
a pure module-level helper _parse_extraction_json(raw) -> list and drop the
'text[0] != "["' guard so the array is sliced whenever both brackets exist
(fixes trailing commentary like '[...] Done!' reaching json.loads).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 19:57:44 +02:00
Mike ac94885c84 refactor(constants): single source of truth for data dir (#3368)
* refactor(constants): single source of truth for data dir + merge core/src constants

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(contributing): use named src.constants for data paths, drop core/constants references

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 09:58:52 +02:00
Lucas Daniel 73315e6ddc fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON (#2205)
* fix(skill-extractor): walk all brace candidates so stray braces in prose do not swallow valid JSON

The extractor sliced from the FIRST brace to the LAST brace to recover
JSON embedded in surrounding commentary. When the model emits stray
braces before the JSON object, the slice produces invalid JSON,
json.loads raises, and the exception is swallowed -- the skill is
silently lost.

Fix: walk each brace candidate left-to-right and attempt json.loads on
each slice. The first candidate that parses successfully wins. If none
parse, json.loads on the original text raises and the existing
JSONDecodeError handler logs and returns None as before.

Tested locally -- 8/8 tests passed:
  tests/test_extract_skill_json_nonstring.py (2 passed)
  tests/test_skill_extractor_rows.py (1 passed)
  tests/test_search_content_extraction_parity.py (2 passed)
  tests/test_deep_research_search_error.py (3 passed)

Closes #2199

* test(skill-extractor): add focused repro for stray-brace JSON recovery

* test(skill-extractor): add regression test for leading invalid-brace fragment

Addresses the remaining edge case from review: a response that *starts*
with a brace but the leading fragment isn't valid JSON (e.g.
'{not json}\n{"title": "Valid later", ...}') still needs to recover
the valid skill object that follows.

_extract_json_object (already on dev) handles this correctly — it tries
the whole de-fenced string first, then walks each '{' candidate left-to-
right regardless of whether the response begins with '{', so the leading
invalid fragment no longer short-circuits recovery of the real object.
Updates the comment at the call site to call this out explicitly and adds
a regression test covering exactly the scenario described in review.
2026-06-07 23:31:12 +01:00
Mazen Tamer Salah 92ef01d4fa fix(skills): tolerate a stray brace before the JSON in skill extraction (#2200)
maybe_extract_skill() sliced the LLM response from the first '{' to the
last '}'. When a model emits a stray brace in prose before the real
object (e.g. "uses {placeholder} then {...}"), the slice starts at the
prose brace, json.loads fails, and a valid skill is silently dropped.

Factor parsing into _extract_json_object(), which tries the whole
(de-fenced) string first and then each '{' start position, returning the
first candidate that parses to a JSON object.

Adds tests/test_skill_extractor_json.py.
2026-06-07 16:54:36 +02:00
SurprisedDuck c75d3e1975 fix(memory): record dislikes as dislikes, not preferences (#2435)
_fallback_memory_candidates matched both positive (prefer/like/love) and
negative (hate / do not like / don't like) sentiment verbs in one regex
alternation, then formatted every hit as "User prefers {X}.". So
"I hate cilantro" was stored as "User prefers cilantro." -- the inverse of
what the user said. These fallback facts are persisted to memory and later
re-injected into the model's context, so the inverted preference actively
misleads the assistant.

Capture the matched verb and branch on it: negatives become
"User dislikes {X}.", positives stay "User prefers {X}." (still filed under
the existing "preference" category).

Supported by Claude Opus 4.8

Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>
2026-06-07 16:36:07 +02:00
Giulio Zelante b448119919 feat(skills): import SKILL.md bundles from public GitHub URLs (#2576)
* feat(skills): import SKILL.md bundles from public GitHub URLs

Supports GitHub tree/blob/raw links and skills.sh pages that resolve to GitHub.
Installs SKILL.md plus sibling text assets under data/skills/imported/.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): admin-gate URL import and validate redirect hosts

- require_admin on POST /api/skills/import-from-url (matches other skill admin routes)
- reject cross-host redirects after httpx follow_redirects
- test for redirect host validation

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): match Brain Add panel import/submit button styles

- Skill URL Import: theme-io-btn + download icon (same as memory Import)
- Add Skill submit: confirm-btn confirm-btn-primary

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): allow api.github.com during directory import

Real imports hit the GitHub contents API after redirects; whitelist
api.github.com and add regression tests. Shrink Import button with flex:none.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): align skill Import button with URL input row

Match memory-add-input height (28px) in memory-add-row and center the
download icon with flexbox instead of vertical-align hacks.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): cancel modal-body margin on skill Import button

The skill Import button sits in .memory-add-row beside an input; the
global .modal-body button { margin-top: 6px } rule only affected buttons,
pushing Import down and misaligning the download icon. Reset margin-top
and match Memory Import SVG markup at 28px row height.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): surface GitHub API errors on URL import

Pass through GitHub response messages (especially 403 rate limits) as
SkillImportError instead of a generic download failure.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-05 19:48:23 +02:00
afonsopc 28b296a712 Fix auto-memory vector dedup dropping a user's fact on cross-tenant match
extract_and_store dedups each extracted fact against the vector store
before the (owner-scoped) text fallback. The vector store is a single
shared ChromaDB collection storing only {"source": "memory"} — no
owner — and find_similar queries it with no owner filter, so it can
return a memory_id belonging to a different tenant. The old code
continue'd (skipped storing) on any vector hit without checking
ownership, so when ChromaDB is healthy (the common path) a user's
freshly-extracted fact was silently dropped because it was merely
semantically similar to another user's memory — the text fallback that
IS owner-scoped never ran. Gate the skip on the matched memory being
this user's own (or legacy unowned), mirroring the text dedup predicate;
cross-tenant or stale matches fall through. Same bug class as #1743.
2026-06-04 23:45:13 +01:00
Nicholai c916224510 feat(memory): add provider interface (#72) 2026-06-04 16:26:11 +01:00
Nicholai 4dc11cfe6b refactor(memory): canonicalize memory imports (#50) 2026-06-04 05:31:15 +01:00
Afonso Coutinho fb8a744cae fix: skill retrieval boosts on tag substrings (e.g. 'ai' tag for any 'email' query) (#1406)
* fix: match skill tags as whole tokens, not substrings, in retrieval

* test: skill tag matching uses whole tokens, not substrings

* test: give skill fixtures status=published so they reach the scoring path
2026-06-03 14:24:11 +09:00
Stephen Purdue 85bc18b7d8 fix: fixed minor consistency issues within MemoryManager (#1353) 2026-06-03 14:12:24 +09:00
red person d7a6cadbe2 Skip invalid memory extractor rows (#1535) 2026-06-03 14:07:00 +09:00
red person ee8c049f9e Skip invalid skill extractor rows (#1546) 2026-06-03 14:06:53 +09:00
Mubashir R 61d62a3cb8 Fix memory bullet extraction in service copy
Fix services.memory bullet-list extraction by grouping the bullet/number regex before the capture, and cover both memory manager copies in the regression test.
2026-06-03 13:41:46 +09:00
Afonso Coutinho 9dd9bb8a3f fix: memory recall crashes on a non-dict row from the vector store (#1705) 2026-06-03 13:35:09 +09:00
pewdiepie-archdaemon 8e2b9baf19 Rebuild memory vector index from the full saved set, not just the audited owner (#1747)
audit_memories saves final_entries merged with other owners' entries
(correct), but then rebuilt the shared vector collection from
final_entries alone — wiping every other owner from semantic search
until they happened to run their own audit. Keyword fallback masked
it, so it degraded silently. Capture saved_entries once and rebuild
from that.

Caught by #1747.
2026-06-03 11:36:24 +09:00
Vykos 5ee30cc144 Scope skills usage by owner (#1312) 2026-06-03 02:27:43 +09:00
Tushar-Projects c3228f8b59 Background tasks: respect active session model fallback 2026-06-02 20:57:42 +09:00
David Anderson 610968f91e fix: data integrity — deep-research result parsing + memory-extraction durability (#808)
Two independent data-integrity bugs:

- services/research/service.py: ResearchService.research() (the public deep-research
  API, re-exported from services/__init__) treated the handler return value as a
  dict (result.get("sources"/"summary"/...)), but call_research_service() returns a
  formatted markdown STRING -> AttributeError: str has no attribute get on EVERY
  successful call, making the API unusable for any non-error result. Now uses the
  string report as the summary and parses sources from the "### Sources" markdown
  section (section-bounded, URL-deduped), with a defensive dict branch for back-compat.

- services/memory/memory_extractor.py: extract_and_store guarded the vector-store
  find_similar/add calls only with the .healthy flag set ONCE at init. If the
  embedding/ChromaDB backend degraded LATER (OOM, evicted model, remote endpoint
  down), those calls raised, the exception escaped the dedup loop, skipped
  memory_manager.save(), and was swallowed by the outer try/except -> EVERY
  validated fact from the session was silently lost (the function docstring
  promises "never raised"). Now falls back to the existing text/fuzzy dedup so
  facts are still saved when the vector index is unavailable at runtime.

Tests: test_research_service.py, test_memory_extractor_vector_degraded.py.
2026-06-02 11:27:31 +09:00
Ernest Hysa f4aef0dcf7 fix(skills): scope skill reads to caller owner (#777)
read_skill_md and read_skill_reference walk all skill files via
_iter_skill_files and return the first match by slug, regardless
of owner. In a multi-user deployment where two users have skills
with the same slug under different categories, a caller scoped
to owner='alice' can read Bob's skill content.

This is the same cross-tenant leak class as the update_skill /
delete_skill fix (PR #755, merged), but on the read path.

Changes:
- read_skill_md / read_skill_reference accept owner= param (default
  None = match ownerless only, matching the write-path convention).
- 7 callers updated: tool_implementations.py (view, view_ref, patch),
  builtin_actions.py (test_skills), skills_routes.py (audit, source,
  test routes).
- Tests: read scoping (alice reads hers, not bob's), positive update
  scoping (alice can mutate her own), ownerless-match default.
2026-06-02 11:21:27 +09:00
Ernest Hysa d42e6a7acc Scope skill mutations to caller owner
SkillsManager.update_skill walks every SKILL.md on disk and matches by
slug only; the 'owner' key in its scalar_keys whitelist meant a caller
could pass updates={'owner': 'attacker', 'description': 'pwned'} and the
first matching file on disk got silently re-owned. Two users with the
same slug under different category directories (which is supported by
the on-disk layout <category>/<name>/SKILL.md) could each stomp the
other's skill via the manage_skills tool or the in-process callers in
tool_implementations.py (edit, patch, publish, delete).

update_skill and delete_skill now require the caller's owner and only
match a file whose parsed owner field matches. The default of None
means 'no scope' and only matches ownerless skills, so an unsafe call
without an explicit owner is now a no-op. 'owner' is also removed from
scalar_keys so the updates dict cannot be used to reassign ownership
even when the manager is called from an in-process path that didn't
supply the owner argument.

The in-process callers in tool_implementations.py are updated to pass
owner=owner (which was already in scope at every call site) so the
HTTP and agent paths both go through the scoped check. The HTTP route
at routes/skills_routes.py:1499 was already owner-scoped via
sm.load(owner=user); the fix brings the in-process path up to the
same standard.
2026-06-02 05:59:43 +09:00
Fernando Lazzarin 14e8cffa41 Fail closed on untrusted teacher draft confidence
Follow-up to #275. get_relevant_skills() treats a missing/unparseable
confidence as 1.0, so it always clears the injection threshold. For
teacher-escalation drafts -- auto-written from a possibly untrusted trace
and then injected as authoritative guidance -- that means a draft can be
auto-injected regardless of the configured confidence bar.

Require teacher-escalation drafts to carry an explicit, parseable
confidence that meets min_confidence; fail closed otherwise. Hand-authored
legacy drafts keep the lenient "unset -> keep" behavior so they don't
silently vanish, and published skills are unaffected.

Ran: python -m py_compile services/memory/skills.py + a get_relevant_skills
unit check (teacher drafts with None/garbage/0.8 excluded at min=0.85; 0.9
included; legacy + published unaffected; gate-off control unchanged).

Co-authored-by: Fernando Lazzarin <263019791+waitdeadai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 15:20:29 +09:00
pewdiepie-archdaemon 0888a3b3e6 Add native Windows compatibility layer 2026-06-01 15:09:47 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00