* fix(agent): don't let a materialized default budget defeat context scaling
#1230 scales agent_input_token_budget to the model's context window unless
the user explicitly set a budget, detected via is_setting_overridden(). But
the settings-save path materializes every DEFAULT_SETTINGS key into
settings.json (load_settings merges defaults; handlers persist the merged
dict), so the persisted default 6000 reads as "overridden" and the budget
code takes the min(6000, ctx) branch — silently re-capping long-context
models at 6000 for anyone who has ever saved a setting. This reintroduces
the exact regression #1170/#1230 set out to fix.
Add is_setting_customized() (saved value != default) and gate the scaling
on it instead of mere presence. A persisted default is not a user choice.
is_setting_overridden has exactly one consumer (this budget path), so the
change is contained. Tests cover the materialized-default regression, a
deliberately-chosen budget still being honoured, and the absent-key case.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent): rework context-budget fix per review (#4122)
Address RaresKeY's review:
P2 (explicitness): is_setting_customized treated a saved value equal to the
default as "not explicit", which ALSO blocked a user from deliberately pinning
the default budget. Reframe the default value itself as the AUTO sentinel —
agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context
window", any other value is an explicit cap. A materialized default still reads
as auto (fixing the original regression), and any non-default value the user
chooses is now honoured. Drop the now-unused is_setting_customized helper.
P2 (fallback context): auto-scaling trusted get_context_length() even when it
returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known
window), over-allocating on self-hosted/proxy setups. Add get_context_length_known()
(also returns whether the window was actually discovered); the budget block
passes 0 when unknown so auto-scaling stays conservative instead of inflating to
an unproven window.
hard_max stays auto-only — a deliberate explicit budget wins (#1190); kept that
contract and answered the reviewer's question rather than silently reversing it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(agent): lock the materialized-default budget regression (review on #4121)
Per WGlynn's review on the issue: add an end-to-end regression that saves an
UNRELATED setting (which makes the settings-save path materialize the budget
default into settings.json) and asserts the budget still auto-scales rather than
re-reading as an explicit 6000 cap — locking the exact reopening shut.
To make the test bite the production decision (not just re-derive it), extract
`budget_is_explicit()` into src/context_budget.py and use it from the agent loop.
It keys off value-vs-default (the default is the auto sentinel), NOT settings
presence — which is the whole point, since the save path materializes defaults.
Note: after this PR's rework, is_setting_overridden has ZERO production callers,
so the merged-dict materialization smell can't reach any setting through a
presence check today (WGlynn's durability concern).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent): bind the budget context window to its own provenance (review #4122)
RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop
kept only the `known` flag from get_context_length_known() and budgeted off the
passed-in `context_length`, which can come from a *different* lookup. Two failures:
- local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT
fallback while the fresh probe proves the real (smaller) served context — we'd
scale off the stale value;
- callers that don't pass context_length (scheduled tasks, teacher escalation,
skill test runs, bg_monitor) were capped at 6000 even when a long window is
discoverable.
Extract budget_context_for_model() which returns the freshly-probed window when
known else 0, binding the flag to the value it proves; the agent loop uses it.
Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(agent): fix stale budget comments + tighten to the contract (review #4122)
- settings.py: an explicit budget is clamped to the window only — hard_max is
auto-only (#1190); drop the incorrect "and to hard_max".
- is_setting_overridden docstring: drop the stale "adaptive budgets" example;
point value-sensitive callers at context_budget.budget_is_explicit.
- Tighten the budget-block comments to the contract (default = auto sentinel,
non-default = explicit cap, hard_max = auto-only ceiling).
Comment/docstring-only; no behaviour change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(agent): correct budget issue citations (#1190 → merged #1230/#1273)
The context-budget contract (auto-sentinel, explicit budgets honoured,
hard_max auto-only) merged via #1230 — #1190 was the earlier, closed,
superseded PR. Re-point the contract comments at #1230 (the live source,
already cited for the auto-sentinel two lines up in settings.py).
The configurable hard_max setting (`agent_input_token_hard_max`) was a
reviewer requirement first raised on #1190, omitted from the merged #1230,
and actually added in #1273 — credit #1273 for it and correct the test
comment's history (it previously implied this PR completed the requirement).
Comment/docstring-only; no behaviour change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make src.youtube_handler a compatibility wrapper around services.youtube.youtube_handler so transcript state, URL parsing, and timeout behavior no longer diverge.
Resolve remove_directory_from_rag paths through the same PERSONAL_DIR confinement helper used by add_directory_to_rag before removal sinks are reached.
Lock the API key encryption key file to owner-only permissions on creation and when reading existing keys, with regression coverage for permissions and encryption roundtrip.
Clarify in the Docker install section that Apple Silicon Docker cannot use Metal GPU acceleration for Cookbook model serving and point users to the native Apple Silicon path.
Fixes#4232
Convert email search and archive handlers from async def to sync def so FastAPI runs their blocking IMAP I/O in the threadpool instead of the event loop.
load_settings() already catches PermissionError, but load_features() caught only
FileNotFoundError/JSONDecodeError/ValueError. An existing-but-unreadable
data/features.json (e.g. root-owned after a deploy) therefore raised instead of
falling back to DEFAULT_FEATURES, taking down GET /api/auth/features and anything
that reads feature flags. Add PermissionError to the except tuple to match
load_settings().
Adds tests/test_load_features_permission_error.py.
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* ci: add security scanning suite and governance
Consolidates the security CI work into one reviewable change. Adds, as
separate workflow files under .github/workflows/:
- secret-scan.yml gitleaks (pinned + checksum-verified), full history
- workflow-security.yml actionlint + zizmor, audits the workflows themselves
- dependency-review.yml PR dependency gate + advisory pip-audit
- container-scan.yml hadolint (blocking) + Trivy image scan (advisory)
- codeql.yml CodeQL for Python and JS, main + weekly
Plus .github/dependabot.yml (pip/npm/actions/docker), .github/CODEOWNERS,
and docs/security-ci.md explaining each check and the one-time settings.
All additive: no existing files are modified. Actions are pinned to commit
SHAs, tokens default-deny (permissions: {}), advisory scans never block,
and SARIF upload is gated to push so fork PRs do not fail on a read-only
token. Composes with the correctness CI in #1015.
* ci(security): isolate Trivy from the Dockerfile lint gate
Address review on #1314 (points 2 and 3).
container-scan.yml now runs only hadolint (the blocking Dockerfile lint)
and keeps the broad pull_request + push:[main] trigger so the required
check always reports and never hangs a PR.
The advisory image scan moves to container-trivy.yml, split by event:
- pull_request / workflow_dispatch: build and scan under contents:read
only, no SARIF upload. The image build runs PR-supplied Dockerfile
instructions, so this path holds no write scope.
- push to main: build, scan, and upload SARIF with security-events:write.
Only this trusted path is granted write.
This stops PR jobs from requesting security-events:write they never use,
and a paths-ignore (matching docker-publish.yml) skips the image rebuild
on docs-only changes.
docs/security-ci.md: correct the trigger description to "every pull
request and every push to main", matching the workflows and the existing
ci.yml convention.
Verified locally: zizmor --offline --min-severity=low and actionlint are
clean on the changed and new workflow files.
---------
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
* fix: read allow_bash/allow_web_search from JSON body (#3229)
API callers using Content-Type: application/json had bash and web
tools silently disabled because allow_bash / allow_web_search were
only read from FormData (which is empty for JSON requests).
Changes:
- Fall back to JSON body for allow_bash and allow_web_search values
- Only add bash/web_search to disabled_tools when explicitly set to a
falsy value; when unset (None), defer to per-user privilege checks
- Admins with can_use_bash=True now get bash enabled by default
Fixes#3229
* fix: always send explicit allow_bash/allow_web_search from frontend
The backend 'is not None' guard (from prior commit) is correct for API
callers, but the frontend only sent allow_bash=true when the toggle was
ON — omission meant 'unspecified' which the backend treated as 'don't
disable'. Now the frontend always sends an explicit true/false value:
- allow_bash: sent on every request (checked ? 'true' : 'false')
- allow_web_search: explicit 'false' when toggle is off in agent mode
With explicit frontend values, the 'is not None' guard is safe:
- explicit true → tool enabled
- explicit false → tool disabled
- None (API caller omission) → defer to per-user privilege
---------
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>