mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-17 10:15:27 -04:00
7ae6133d7f
* fix(agent): don't let a materialized default budget defeat context scaling

#1230 scales agent_input_token_budget to the model's context window unless the user explicitly set a budget, detected via is_setting_overridden(). But the settings-save path materializes every DEFAULT_SETTINGS key into settings.json (load_settings merges defaults; handlers persist the merged dict), so the persisted default 6000 reads as "overridden" and the budget code takes the min(6000, ctx) branch, silently re-capping long-context models at 6000 for anyone who has ever saved a setting. This reintroduces the exact regression #1170/#1230 set out to fix.

Add is_setting_customized() (saved value != default) and gate the scaling on it instead of mere presence. A persisted default is not a user choice. is_setting_overridden has exactly one consumer (this budget path), so the change is contained.

Tests cover the materialized-default regression, a deliberately-chosen budget still being honoured, and the absent-key case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): rework context-budget fix per review (#4122)

Address RaresKeY's review:

P2 (explicitness): is_setting_customized treated a saved value equal to the default as "not explicit", which ALSO blocked a user from deliberately pinning the default budget. Reframe the default value itself as the AUTO sentinel: agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context window", any other value is an explicit cap. A materialized default still reads as auto (fixing the original regression), and any non-default value the user chooses is now honoured. Drop the now-unused is_setting_customized helper.

P2 (fallback context): auto-scaling trusted get_context_length() even when it returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known window), over-allocating on self-hosted/proxy setups. Add get_context_length_known() (also returns whether the window was actually discovered); the budget block passes 0 when unknown so auto-scaling stays conservative instead of inflating to an unproven window.

hard_max stays auto-only: a deliberate explicit budget wins (#1190); kept that contract and answered the reviewer's question rather than silently reversing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(agent): lock the materialized-default budget regression (review on #4121)

Per WGlynn's review on the issue: add an end-to-end regression that saves an UNRELATED setting (which makes the settings-save path materialize the budget default into settings.json) and asserts the budget still auto-scales rather than re-reading as an explicit 6000 cap, locking the exact reopening shut.

To make the test bite the production decision (not just re-derive it), extract `budget_is_explicit()` into src/context_budget.py and use it from the agent loop. It keys off value-vs-default (the default is the auto sentinel), NOT settings presence, which is the whole point, since the save path materializes defaults.

Note: after this PR's rework, is_setting_overridden has ZERO production callers, so the merged-dict materialization smell can't reach any setting through a presence check today (WGlynn's durability concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): bind the budget context window to its own provenance (review #4122)

RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop kept only the `known` flag from get_context_length_known() and budgeted off the passed-in `context_length`, which can come from a *different* lookup. Two failures:

- local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT fallback while the fresh probe proves the real (smaller) served context, and we'd scale off the stale value;
- callers that don't pass context_length (scheduled tasks, teacher escalation, skill test runs, bg_monitor) were capped at 6000 even when a long window is discoverable.

Extract budget_context_for_model() which returns the freshly-probed window when known else 0, binding the flag to the value it proves; the agent loop uses it. Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): fix stale budget comments + tighten to the contract (review #4122)

- settings.py: an explicit budget is clamped to the window only; hard_max is auto-only (#1190); drop the incorrect "and to hard_max".
- is_setting_overridden docstring: drop the stale "adaptive budgets" example; point value-sensitive callers at context_budget.budget_is_explicit.
- Tighten the budget-block comments to the contract (default = auto sentinel, non-default = explicit cap, hard_max = auto-only ceiling).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): correct budget issue citations (#1190 → merged #1230/#1273)

The context-budget contract (auto-sentinel, explicit budgets honoured, hard_max auto-only) merged via #1230; #1190 was the earlier, closed, superseded PR. Re-point the contract comments at #1230 (the live source, already cited for the auto-sentinel two lines up in settings.py).

The configurable hard_max setting (`agent_input_token_hard_max`) was a reviewer requirement first raised on #1190, omitted from the merged #1230, and actually added in #1273. Credit #1273 for it and correct the test comment's history (it previously implied this PR completed the requirement).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
76 lines
3.4 KiB
Python
"""Adaptive input-token budget for the agent loop (#1170).

The agent soft-trims its input context to ``agent_input_token_budget`` (default
6000). The old computation was ``min(context_length or budget, budget)``, which
made the 6000 default a hard ceiling for *every* model, so a 128K or 1M context
model was silently capped at 6000 input tokens even though it can hold far more.

This derives the effective budget from the model's discovered context window when
the user has NOT set an explicit budget, while still honouring an explicit setting
exactly (clamped to the window). Pure and side-effect free so it is unit-testable.
"""

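To make the ceiling effect concrete, here is a minimal sketch of the old computation quoted in the docstring. Only the `min(...)` expression comes from the source; the function name `old_budget` and the sample window sizes are made up for illustration.

```python
def old_budget(context_length: int, budget: int = 6000) -> int:
    """Old formula from the docstring: the configured budget always acts as a ceiling."""
    return min(context_length or budget, budget)


# A short-context model is limited by its own window...
small = old_budget(4_096)
# ...but long-context models never get past the 6000 default.
long_128k = old_budget(128_000)
long_1m = old_budget(1_000_000)
# An unknown window (0) falls back to the budget itself.
unknown = old_budget(0)

print(small, long_128k, long_1m, unknown)  # 4096 6000 6000 6000
```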
# Generous ceiling so long-context models are unblocked without sending a
# pathologically large prompt every agent turn. Tunable; chosen to fully cover
# 128K models and give 1M models a large but bounded budget.
DEFAULT_HARD_MAX = 200_000
DEFAULT_BUDGET = 6000
DEFAULT_HEADROOM = 0.85


def compute_input_token_budget(
    configured: int,
    context_length: int,
    explicit: bool,
    *,
    default: int = DEFAULT_BUDGET,
    headroom: float = DEFAULT_HEADROOM,
    hard_max: int = DEFAULT_HARD_MAX,
) -> int:
    """Return the effective soft input-token budget.

    Args:
        configured: the value read from settings (may be the default).
        context_length: the model's discovered context window. Pass 0 when the
            window is unknown / only a bare fallback; auto-scaling then stays
            conservative instead of trusting an unproven window (review on #4122).
        explicit: True if the user set a NON-default budget. The default value is
            the "auto" sentinel (scale to the window); any other value is an
            explicit cap. (A deliberately-chosen default can't be distinguished
            from a materialized default by value, so the default reads as auto.)

    Rules:
        - Explicit user budget is honoured exactly, only clamped to the model's
          window when that window is known (the user's deliberate choice wins;
          ``hard_max`` is an auto-budget ceiling only; see #1230).
        - Otherwise (auto), scale to ``headroom`` of the context window, capped at
          ``hard_max``, so long-context models use their capacity.
        - When the window is unknown (context_length <= 0), use the conservative
          ``default`` budget and do NOT scale off the fallback.
    """
    configured = int(configured or 0)
    context_length = int(context_length or 0)

    if explicit and configured > 0:
        return min(configured, context_length) if context_length > 0 else configured

    if context_length > 0:
        scaled = int(context_length * headroom)
        return max(1, min(scaled, hard_max))

    return configured if configured > 0 else default


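The three rules in the docstring can be checked with a few concrete numbers. The block below repeats a condensed copy of the function only so that it runs on its own; the window sizes and budgets passed in are illustrative values, not figures from the project.

```python
def compute_input_token_budget(configured, context_length, explicit, *,
                               default=6000, headroom=0.85, hard_max=200_000):
    """Condensed copy of the function above, repeated so this snippet runs standalone."""
    configured = int(configured or 0)
    context_length = int(context_length or 0)
    if explicit and configured > 0:
        return min(configured, context_length) if context_length > 0 else configured
    if context_length > 0:
        return max(1, min(int(context_length * headroom), hard_max))
    return configured if configured > 0 else default


# Auto (default budget, known window): 85% of the window, capped at hard_max.
auto_32k = compute_input_token_budget(6000, 32_768, explicit=False)
auto_128k = compute_input_token_budget(6000, 131_072, explicit=False)
auto_1m = compute_input_token_budget(6000, 1_000_000, explicit=False)

# Explicit: honoured as given, clamped only to a known window; hard_max is not applied.
explicit_fits = compute_input_token_budget(300_000, 1_000_000, explicit=True)
explicit_clamped = compute_input_token_budget(50_000, 8_192, explicit=True)

# Unknown window (0): stay on the conservative default instead of scaling.
unknown = compute_input_token_budget(6000, 0, explicit=False)

print(auto_32k, auto_128k, auto_1m)     # 27852 111411 200000
print(explicit_fits, explicit_clamped)  # 300000 8192
print(unknown)                          # 6000
```

Note that an explicit 300,000 exceeds `hard_max` and is still returned unchanged, which matches the stated contract that `hard_max` is a ceiling for the auto path only.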
def budget_is_explicit(configured: int, *, default: int = DEFAULT_BUDGET) -> bool:
    """Whether a configured agent_input_token_budget is a deliberate explicit cap.

    The default value is the "auto" sentinel (scale to the model's window), so only
    a NON-default positive value counts as explicit. This keys off the VALUE, not
    settings *presence*: the settings-save path materializes every default into
    settings.json, so a persisted default must still read as auto (the regression
    #4121 / #1230 are about). Centralised here so the materialized-default contract
    is unit-testable and can't silently regress to a presence check.
    """
    configured = int(configured or 0)
    return configured > 0 and configured != default
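Why the check keys off the value rather than presence can be sketched with a toy settings round-trip. `DEFAULT_SETTINGS` and `load_settings` are named in the commit message, but the versions below, along with `save_setting`, `is_present`, and the `theme` key, are simplified stand-ins invented for this illustration and are not the project's real code.

```python
DEFAULT_BUDGET = 6000
# Hypothetical stand-in for the project's defaults table.
DEFAULT_SETTINGS = {"agent_input_token_budget": DEFAULT_BUDGET, "theme": "light"}


def load_settings(saved: dict) -> dict:
    """Merge saved values over the defaults (simplified stand-in)."""
    return {**DEFAULT_SETTINGS, **saved}


def save_setting(saved: dict, key: str, value) -> dict:
    """Persist the merged dict, which materializes every default key."""
    merged = load_settings(saved)
    merged[key] = value
    return merged


def is_present(saved: dict, key: str) -> bool:
    """Presence check: the approach that misreads a materialized default."""
    return key in saved


def budget_is_explicit(configured: int, *, default: int = DEFAULT_BUDGET) -> bool:
    """Value check, copied from the function above."""
    configured = int(configured or 0)
    return configured > 0 and configured != default


# The user changes an unrelated setting; the budget default gets written too.
on_disk = save_setting({}, "theme", "dark")
budget = on_disk["agent_input_token_budget"]

print(is_present(on_disk, "agent_input_token_budget"))  # True  (looks "overridden")
print(budget_is_explicit(budget))                       # False (still auto)
print(budget_is_explicit(32_000))                       # True  (a deliberate cap)
```

The trade-off stated in the docstring of `compute_input_token_budget` still applies: a user who deliberately wants exactly 6000 cannot be told apart from a materialized default, so that value always reads as auto.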