* feat(agent): workspace confinement via context-local binding + get_workspace tool Bind the per-turn workspace once in execute_tool_block; the shared path resolvers (_resolve_tool_path / _resolve_search_root) and the subprocess cwd helper (agent_cwd) read it, so file tools + bash/python are confined centrally and a new tool that uses the shared helpers cannot accidentally bypass it. Adds the admin-gated /api/workspace/browse picker, a workspace pill + directory modal (reusing existing modal/button CSS), the /workspace slash command, and a get_workspace tool (replaces a system-prompt block). Confinement is OS-agnostic (realpath/normcase/commonpath) and docker-safe (container paths, no host assumptions). Reopens #2023. * ux(workspace): clarify workspace is not a sandbox Picker modal note + pill tooltip + get_workspace tool/output wording now state plainly: read_file/write_file/edit_file/grep/glob/ls are confined to the folder, but bash/python only start there (cwd) and are not sandboxed. Modal note reuses the existing .muted class. * fix(agent): treat an active workspace as file-work intent A vague low-signal message (e.g. "look at the local project") matches no domain keywords, so tool retrieval is skipped and only always-available tools are offered — leaving the agent with no file access even though a workspace is set. When a workspace is active, include the file/code tools (incl. get_workspace) on low-signal turns so the agent can act on the folder. Also requires the tool index (ChromaDB) to be reachable for normal retrieval; that is an environment dependency, not part of this change. * ux(workspace): hide pill + overflow entry in chat mode Workspace only scopes the agent's file/shell tools, so the pill and the overflow 'Workspace' entry are agent-only now — hidden in chat mode like the bash toggle. Mode read from the DOM in syncWorkspaceIndicator; applyMode() is called from the agent/chat setMode handler. * prompt(tools): steer bash/python to defer to the dedicated file tools bash/python schema descriptions (what native-tool-calling models read) were bare and gave no steer, so models would do file ops via the shell (e.g. writing SVG/HTML, which then dumps raw markup into the tool preview). Tell bash/python in the schema + tool-index + prompt section to prefer read_file/write_file/ edit_file/grep/glob/ls and only be used for what those do not cover. * prompt(tools): keep bash/python deferral generic (no hardcoded tool names) Reference 'a dedicated tool' rather than listing read_file/write_file/grep/etc. by name, so the guidance does not go stale if those tools are renamed. * style(workspace): drop em-dashes from added code comments/strings * ux(workspace): terser non-sandbox note in picker (no tool-name list) * ux(workspace): mirror terse non-sandbox wording in pill tooltip * chore: untrack local venv symlink (run-only, not part of the feature) * prompt(workspace): keep get_workspace text generic (no hardcoded tool names) * fix(agent): low-signal + workspace surfaces only read-only file tools Intersect the files tool group with PLAN_MODE_READONLY_TOOLS so a vague message in a workspace exposes read_file/grep/glob/ls/get_workspace for exploration, but not write_file/edit_file/bash/python -- those wait for a request that actually calls for them (RAG retrieval still adds them on a real ask). * feat(workspace): cap browse listing at 500 dirs with a truncated hint Mirror the filesystem_tools._CODENAV_MAX_HITS pattern with a module-local _MAX_BROWSE_DIRS so a directory with thousands of children does not dump every row into the picker; the response carries a truncated flag and the modal tells the user to type a path to jump in. * chore: untrack local venv symlink (run-only artifact) * fix(workspace): vet the workspace root against the sensitive-path deny list at bind time The in-workspace resolver deny-lists sensitive paths inside the workspace, but the empty-path search root is the workspace itself, so a workspace of ~/.ssh could be listed via ls with no path. vet_workspace() (public, in tool_execution next to the resolvers) rejects non-directories and sensitive roots before the path is ever bound; chat_routes uses it instead of its inline isdir check. * fix(workspace): reject filesystem roots and stop showing rejected workspaces as active Review findings from #3665: P2: vet_workspace accepted / (and would accept drive/UNC roots), which makes every absolute path 'inside' the workspace and collapses confinement into host-wide file access. A root is its own dirname, so reject when dirname(resolved) == resolved; the browse response now carries a selectable flag and the picker disables 'Use this folder' on unselectable dirs. P3: /workspace set stored any string client-side and the chat route silently dropped rejected values, so the pill could claim a confinement that was not in effect. New admin-gated /api/workspace/vet validates manual paths before they persist (canonical path returned), and when a posted workspace is rejected at send time the stream emits workspace_rejected so the client clears the stored value and toasts instead of continuing silently. * fix(workspace): check caller privilege before vetting the posted workspace Review finding: /api/chat_stream called vet_workspace() on the posted value for every caller and emitted workspace_rejected on failure, so a non-admin who can chat but cannot use file/shell tools could distinguish existing directories from missing/file/sensitive/root paths by whether the event appeared. The resolution now lives in _resolve_request_workspace, which drops the submitted value uniformly for non-admin callers, with no vetting and no event, before the path ever touches the filesystem. Admin and single-user behavior is unchanged. Test pins that valid and invalid paths are indistinguishable for a non-admin and that vet_workspace is never invoked for them.
Test Suite Notes
Purpose
This file documents the shared test helpers and the review expectations that go
with them. The suite is being refactored incrementally, so this is a working
reference for that effort - not a claim that the suite is already fully
organized. Read it before adding a new helper or before reviewing a PR that
touches tests/helpers/.
For the broader rules - test taxonomy, determinism/isolation rules, the
behavioral-vs-source-text policy, and helper/factory extraction rules - see
TESTING_STANDARD.md. This file is the concrete helper
reference; that file is the standard the refactor works toward.
Running focused subsets (taxonomy markers)
tests/conftest.py tags every test at collection time with two markers derived
from its filename by tests/_taxonomy.py: an area_* marker (e.g.
area_security) and a finer sub_* marker (e.g. sub_owner_scope). This adds
markers only - it moves no files and changes no test behavior. Use them to run a
focused slice:
python3 -m pytest -m area_security
python3 -m pytest -m "area_services and sub_cookbook"
Areas are security, routes, services, cli, js, helpers, unit, and
uncategorized. Classification is conservative and token-based: a file that
matches no area keyword falls back to area_uncategorized with its filename as
the sub-area. The area_* names are registered in pyproject.toml; the dynamic
sub_* names are registered before collection by pytest_configure in
tests/conftest.py, so unknown-mark warnings still flag genuine typos.
For common focused runs, use tests/run_focus.py. It validates area and
sub-area names, accepts sub-areas with or without the sub_ prefix, and passes
extra pytest arguments after --:
python3 tests/run_focus.py --area security
python3 tests/run_focus.py --area services --sub-area cookbook
python3 tests/run_focus.py --sub-area sub_cookbook
python3 tests/run_focus.py --keyword taxonomy
python3 tests/run_focus.py --last-failed
python3 tests/run_focus.py --dry-run --area services --sub-area cookbook
python3 tests/run_focus.py --area services -- --maxfail=1 -q
Fast lane and duration visibility
--fast runs the fast lane: the tests that are not marked slow (it adds the
marker expression not slow). It composes with --area/--sub-area using
and. Because no tests may be marked slow yet, --fast can initially match
the full focused selection; it becomes a real speed-up as slow marks are added
from duration evidence. Use it for quick local or reviewer feedback; it does not
replace broader focused or full-suite validation before merge.
--durations N and --durations-min FLOAT add pytest's slowest-test reporting
so you can see where time goes. They are reporting only and do not count as a
focus selector, so --durations must be combined with a real selector
(--area, --sub-area, --keyword, --last-failed, or --fast).
Activate or otherwise use the project Python environment before running these
commands. The examples use python3 intentionally to avoid hard-coding a local
venv path.
python3 tests/run_focus.py --fast
python3 tests/run_focus.py --area services --fast
python3 tests/run_focus.py --area services --durations 25
python3 tests/run_focus.py --area services --fast --durations 25 --durations-min 0.05
The slow marker is opt-in. Mark a test slow only with duration evidence
(from --durations), not by guessing - see the fast-lane policy in
TESTING_STANDARD.md. --fast is for quick reviewer feedback and must not
replace the full suite before merge. A slow mark only excludes a test from the
fast lane; the test stays runnable directly, e.g.:
python3 -m pytest tests/test_auth_config_lock_concurrency.py
python3 -m pytest -m slow
Core principles
- Keep PRs small and homogeneous: one kind of change per PR.
- Prefer explicit local setup over hidden global fixtures.
- Avoid expanding the root
conftest.pyunless absolutely necessary. - Do not mix file moves with logic changes in the same PR.
- Do not weaken tests with
skip/xfailjust to make CI pass. - Validate the focused files you changed, plus any neighboring or order-sensitive groups they interact with.
Helper conventions
The helpers below live under tests/helpers/. They exist to remove repeated
boilerplate that already appeared across multiple tests. Reach for one only when
your test matches its intended use; do not stretch a helper to cover a new case.
tests.helpers.cli_loader.load_script
Use when a test needs to import a script under scripts/ without repeating
SourceFileLoader / importlib.util boilerplate.
- Intended for script/CLI tests that load a single file from
scripts/. - Not for arbitrary package imports - use a normal
importfor those. - When migrating an existing test to it, keep the existing stubs and assertions
unchanged. Any
sys.modulesstubs the script needs at import time must still be injected (e.g. viamonkeypatch) before callingload_script.
tests.helpers.import_state.clear_module
Use when a test must drop one cached module and its parent-package attribute before a fresh import.
- Clears
sys.modules[name]. - Clears the parent-package attribute when present.
- Good replacement for local
sys.modules.pop(...)+delattr(parent, child)blocks.
tests.helpers.import_state.preserve_import_state
Use when a test temporarily installs stubs into sys.modules and needs
deterministic cleanup afterward.
- Context manager: restores both
sys.modulesentries and parent-package attributes on exit (normal or exception). - Useful around module-level stubs or temporary imports.
- Prefer narrow, explicit module names over broad ones.
tests.helpers.import_state.clear_fake_database_modules
Use only for the guarded fake/stub database cleanup pattern.
- Preserves a real-looking
core.database(one with a string__file__). - Removes a fake/stub
core.databaseand the relatedsrc.databasestate. - Do not use as a general database reset fixture.
tests.helpers.import_state.clear_fake_endpoint_resolver_modules
Use only for the guarded fake/stub src.endpoint_resolver cleanup pattern.
- Preserves real resolver modules (those with a truthy
__file__). - Evicts fake/stub resolver modules and the dependent route modules that were cached against them.
- Accepts explicit extra dependent module names to evict alongside the defaults.
tests.helpers.sqlite_db.make_temp_sqlite
Use for the repeated file-backed temp sqlite setup in tests.
- Only constructs
(SessionLocal, engine, tmpfile)from the repeated block. - Does not patch modules and does not clean up the temp file.
- The caller must bind
SessionLocalexplicitly onto whatever module the code under test reads, and must keep the returned objects alive. - Do not use it as a general DB fixture framework.
tests.helpers.db_stubs.make_core_db_stub
Use for small import-time core.database stubs with a placeholder
SessionLocal.
- Pass model names via
modelswhen MagicMock attributes are sufficient. - Pass
attributeswhen an import needs exact placeholder values. - Set
install_core_package=Trueonly when the test also needs a fake parentcoremodule stub. - Keep custom fake sessions and route-specific database behavior local.
What not to abstract yet
Some remaining patterns should stay as-is for now rather than being forced into helpers:
- Large mixed files such as security/review regression files.
- Broad setup-oriented
sys.modulesstub installers. - One-off custom module patching.
- Custom DB session, route, and app setup.
Validation expectations
Run validation locally before opening or approving a PR. Practical checks:
git diff --check- catch whitespace and conflict-marker errors.python3 -m py_compile <changed files>- confirm changed files compile.- Focused
pyteston the changed test files. pyteston neighboring or order-sensitive test groups that share import state with the changed files.grepfor the old boilerplate when replacing it, to confirm no stragglers remain.- A fresh audit worktree when changing the helpers themselves, so stale
__pycache__or import state cannot mask a regression.
Current roadmap
- Import-state cleanup - complete.
- Document helper conventions (this file).
- Pilot the repeated import-time
core.databasestub helper. - Add further tiny helpers only when the repeated semantics are clear.
- Start low-risk file moves only after helper conventions are documented.
- Avoid moving high-risk security/route regression files first.