2 Commits

Author SHA1 Message Date
Michael 39a802bea2 fix(tools): prune skipped dirs before descending in glob tool (#4538)
* fix(tools): prune skipped dirs before descending in glob tool

GlobTool used pathlib.Path.rglob which descends into every directory
(including node_modules, .git, dist, etc.) and filters AFTER the walk.
On repos with large junk directories this causes the glob tool to hang
for minutes.

Replace rglob with os.walk that prunes _CODENAV_SKIP_DIRS before
descending — matching the approach GrepTool already uses. Also add a
fast path for literal patterns (no wildcards → direct path lookup).

Fixes #4493

* fix(tools): use regex glob matching to fix * semantics and literal fallback

Replace fnmatch with _glob_to_regex so that * stays within a single
path segment (matching pathlib/rglob semantics) and **/ spans zero or
more directories.  Literal patterns now fall through to os.walk when
the direct path lookup misses, so e.g. 'foo.py' still finds files at
any depth.

Add tests for:
- bare literal matching in subdirectories
- multi-segment single-star patterns (sub/*.txt)
- * not crossing / boundaries
- ** matching at arbitrary depth

Closes #4493

---------

Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
2026-06-18 22:02:29 +02:00
Kenny Van de Maele 1f00fff837 feat: add code-navigation tools (grep, glob, ls) + read_file line ranges (#1670)
Gives the agent first-class code navigation instead of shelling out via bash
(token-heavy, unreliable on weaker models, unstructured). Mirrors the
Grep/Glob/Read primitives that Claude Code / opencode expose.

- grep: regex search over file contents across a tree. Uses ripgrep when
  available (with explicit excludes so junk dirs are skipped even without a
  .gitignore); falls back to a pure-Python walk+regex when rg is absent.
  Returns file:line:match, capped.
- glob: find files by glob pattern (recursive), newest first.
- ls: list a directory (folders first, then files with sizes).
- read_file: optional offset/limit for line-range reads of large files
  (plain-path calls stay back-compatible).

All confined by the same path policy as read_file (_resolve_tool_path:
data/tmp allowlist + sensitive-file deny). Junk dirs (.git, node_modules,
venv, __pycache__, dist/build, …) skipped. Output capped (200 hits,
400 chars/line). Admin-gated like the other filesystem tools.

Wiring: schemas + native arg->content serializer (src/tool_schemas.py), tool
tags (src/agent_tools.py), always-available + descriptions (src/tool_index.py),
admin gate (src/tool_security.py), dispatch + impls (src/tool_execution.py).

Tests: tests/test_code_nav_tools.py — match/skip-junk/ignore-case/glob-filter,
allowlist rejection, glob/ls, read-range, and the no-ripgrep Python fallback.
2026-06-04 18:37:32 +02:00