3 Commits

Author SHA1 Message Date
nsgds 7ae6133d7f fix(agent): don't let a materialized default budget defeat context-window scaling (#4122)
* fix(agent): don't let a materialized default budget defeat context scaling

#1230 scales agent_input_token_budget to the model's context window unless
the user explicitly set a budget, detected via is_setting_overridden(). But
the settings-save path materializes every DEFAULT_SETTINGS key into
settings.json (load_settings merges defaults; handlers persist the merged
dict), so the persisted default 6000 reads as "overridden" and the budget
code takes the min(6000, ctx) branch — silently re-capping long-context
models at 6000 for anyone who has ever saved a setting. This reintroduces
the exact regression #1170/#1230 set out to fix.

Add is_setting_customized() (saved value != default) and gate the scaling
on it instead of mere presence. A persisted default is not a user choice.

is_setting_overridden has exactly one consumer (this budget path), so the
change is contained. Tests cover the materialized-default regression, a
deliberately-chosen budget still being honoured, and the absent-key case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): rework context-budget fix per review (#4122)

Address RaresKeY's review:

P2 (explicitness): is_setting_customized treated a saved value equal to the
default as "not explicit", which ALSO blocked a user from deliberately pinning
the default budget. Reframe the default value itself as the AUTO sentinel —
agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context
window", any other value is an explicit cap. A materialized default still reads
as auto (fixing the original regression), and any non-default value the user
chooses is now honoured. Drop the now-unused is_setting_customized helper.

P2 (fallback context): auto-scaling trusted get_context_length() even when it
returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known
window), over-allocating on self-hosted/proxy setups. Add get_context_length_known()
(also returns whether the window was actually discovered); the budget block
passes 0 when unknown so auto-scaling stays conservative instead of inflating to
an unproven window.

hard_max stays auto-only — a deliberate explicit budget wins (#1190); kept that
contract and answered the reviewer's question rather than silently reversing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(agent): lock the materialized-default budget regression (review on #4121)

Per WGlynn's review on the issue: add an end-to-end regression that saves an
UNRELATED setting (which makes the settings-save path materialize the budget
default into settings.json) and asserts the budget still auto-scales rather than
re-reading as an explicit 6000 cap — locking the exact reopening shut.

To make the test bite the production decision (not just re-derive it), extract
`budget_is_explicit()` into src/context_budget.py and use it from the agent loop.
It keys off value-vs-default (the default is the auto sentinel), NOT settings
presence — which is the whole point, since the save path materializes defaults.

Note: after this PR's rework, is_setting_overridden has ZERO production callers,
so the merged-dict materialization smell can't reach any setting through a
presence check today (WGlynn's durability concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): bind the budget context window to its own provenance (review #4122)

RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop
kept only the `known` flag from get_context_length_known() and budgeted off the
passed-in `context_length`, which can come from a *different* lookup. Two failures:
- local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT
  fallback while the fresh probe proves the real (smaller) served context — we'd
  scale off the stale value;
- callers that don't pass context_length (scheduled tasks, teacher escalation,
  skill test runs, bg_monitor) were capped at 6000 even when a long window is
  discoverable.

Extract budget_context_for_model() which returns the freshly-probed window when
known else 0, binding the flag to the value it proves; the agent loop uses it.
Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): fix stale budget comments + tighten to the contract (review #4122)

- settings.py: an explicit budget is clamped to the window only — hard_max is
  auto-only (#1190); drop the incorrect "and to hard_max".
- is_setting_overridden docstring: drop the stale "adaptive budgets" example;
  point value-sensitive callers at context_budget.budget_is_explicit.
- Tighten the budget-block comments to the contract (default = auto sentinel,
  non-default = explicit cap, hard_max = auto-only ceiling).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): correct budget issue citations (#1190 → merged #1230/#1273)

The context-budget contract (auto-sentinel, explicit budgets honoured,
hard_max auto-only) merged via #1230#1190 was the earlier, closed,
superseded PR. Re-point the contract comments at #1230 (the live source,
already cited for the auto-sentinel two lines up in settings.py).

The configurable hard_max setting (`agent_input_token_hard_max`) was a
reviewer requirement first raised on #1190, omitted from the merged #1230,
and actually added in #1273 — credit #1273 for it and correct the test
comment's history (it previously implied this PR completed the requirement).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 15:17:28 +09:00
nickorlabs c39d8db12a fix(agent): make context-budget hard_max configurable via agent_input_token_hard_max setting (#1273)
Completes the reviewer requirement from PR #1190 review that was carried
over but not implemented in #1230:

> "The hard max is a function-local constant. For this setting, the ceiling
>  should be configurable or at least represented as a named setting/default
>  with tests."
                                                                — review on #1190

#1230 shipped the adaptive auto-derivation but left `DEFAULT_HARD_MAX = 200_000`
as a hardcoded module constant in src/context_budget.py. Admins on premium
APIs with large context windows (kimi-k2 / minimax-m3 at 1M, etc.) can use
their full window today only by setting `agent_input_token_budget`
explicitly — which then takes them off the adaptive auto-path entirely.

## What this PR changes

- src/settings.py: register `agent_input_token_hard_max` in
  DEFAULT_SETTINGS, default 200_000 (matches `DEFAULT_HARD_MAX`). Inline
  comment documents the no-op semantics in the explicit branch.

- src/agent_loop.py: read the setting at the call site and pass it as the
  `hard_max` kwarg of `compute_input_token_budget`. Defensive parsing —
  missing / non-int / zero values fall back to `DEFAULT_HARD_MAX`, so a
  misconfig cannot silently zero the budget.

- src/tool_implementations.py: three friendly aliases for `manage_settings`:
  - "hard max" -> agent_input_token_hard_max
  - "token budget cap" -> agent_input_token_hard_max
  - "input budget cap" -> agent_input_token_hard_max
  Plus the existing "token budget" -> agent_input_token_budget keeps a
  matching shorter alias "input budget".

- tests/test_context_budget.py: 6 new tests on top of the existing 6:
  - hard_max raises the auto ceiling (1M ctx + raised cap -> 85% of ctx)
  - hard_max lowers the auto ceiling (128K ctx + 50K cap -> 50K)
  - hard_max has no effect on the explicit branch
  - DEFAULT_SETTINGS contains the new key
  - manage_settings aliases are registered
  - the live get_setting path returns the override value, and malformed
    values fall back per the agent_loop defensive parsing

12 passed in 0.04s. No changes to the pure helper signature or semantics;
#1230's behavior is the default when the new setting is unset.

## How it lets users drop the explicit override

Before this PR, on a 1M-context model:
  agent_input_token_budget = 900_000  (explicit)  -> 900K  [user override]
  agent_input_token_budget = <unset>  (auto)      -> 200K  [HARD_MAX]

After this PR, same model:
  agent_input_token_budget = <unset>
  agent_input_token_hard_max = 900_000
                                      -> min(1M * 0.85, 900K) = 850K  [auto, no override needed]

The explicit-override path keeps working unchanged for users who prefer it.
2026-06-03 01:36:57 +09:00
lekt8 8c376d2b0e feat: adapt agent_input_token_budget to the model context window (#1170) (#1230)
The agent soft-trims input context to `agent_input_token_budget` (default 6000).
The old computation `min(context_length or budget, budget)` made the 6000 default
a hard ceiling for every model, so 128K/1M context models were silently capped at
6000 input tokens — now that num_ctx is sent correctly (#1056), this was the last
barrier to actually using a long context window.

This derives the default budget from the model's discovered context window
(~85%, capped at a generous hard max) while honouring an explicit user setting
exactly (clamped to the window). When the window is unknown it falls back to the
previous value, so behaviour is unchanged for that case.

- src/context_budget.py: pure `compute_input_token_budget()` (unit-testable)
- src/settings.py: `is_setting_overridden()` to tell an explicit user value from
  the merged default (load_settings merges DEFAULT_SETTINGS, so equality alone
  can't distinguish them)
- src/agent_loop.py: use the helper in the soft-trim path

Covered by tests/test_context_budget.py (6 cases).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 00:13:53 +09:00