odysseus/tests/test_context_budget.py
nsgds 7ae6133d7f fix(agent): don't let a materialized default budget defeat context-window scaling (#4122)
* fix(agent): don't let a materialized default budget defeat context scaling

#1230 scales agent_input_token_budget to the model's context window unless
the user explicitly set a budget, detected via is_setting_overridden(). But
the settings-save path materializes every DEFAULT_SETTINGS key into
settings.json (load_settings merges defaults; handlers persist the merged
dict), so the persisted default 6000 reads as "overridden" and the budget
code takes the min(6000, ctx) branch — silently re-capping long-context
models at 6000 for anyone who has ever saved a setting. This reintroduces
the exact regression #1170/#1230 set out to fix.
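A minimal sketch of the failure mode, assuming a dict-backed settings store. `load_settings`, `save_setting` and `is_present` below are hypothetical stand-ins for the real src/settings.py helpers, which this commit does not show. Saving any unrelated key persists the merged dict, and after that a presence check can no longer tell a user-chosen budget from a materialized default.

```python
# Hypothetical defaults; only the budget key matters here.
DEFAULT_SETTINGS = {"agent_input_token_budget": 6000, "theme": "light"}


def load_settings(saved: dict) -> dict:
    """Assumed load_settings behaviour: saved values layered over defaults."""
    return {**DEFAULT_SETTINGS, **saved}


def save_setting(saved: dict, key: str, value) -> dict:
    """Assumed handler behaviour: persist the merged dict, not just the change."""
    merged = load_settings(saved)
    merged[key] = value
    return merged  # the whole dict is what lands in settings.json


def is_present(saved: dict, key: str) -> bool:
    """Presence-based "overridden" check, the kind #1230 relied on."""
    return key in saved


settings_json: dict = {}  # fresh install: nothing saved yet
assert not is_present(settings_json, "agent_input_token_budget")

# Change an unrelated setting; the default budget gets materialized.
settings_json = save_setting(settings_json, "theme", "dark")
assert is_present(settings_json, "agent_input_token_budget")  # now reads as a user choice
assert settings_json["agent_input_token_budget"] == 6000      # but it is still the default
```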

Add is_setting_customized() (saved value != default) and gate the scaling
on it instead of mere presence. A persisted default is not a user choice.

is_setting_overridden has exactly one consumer (this budget path), so the
change is contained. Tests cover the materialized-default regression, a
deliberately-chosen budget still being honoured, and the absent-key case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): rework context-budget fix per review (#4122)

Address RaresKeY's review:

P2 (explicitness): is_setting_customized treated a saved value equal to the
default as "not explicit", which ALSO blocked a user from deliberately pinning
the default budget. Reframe the default value itself as the AUTO sentinel —
agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context
window", any other value is an explicit cap. A materialized default still reads
as auto (fixing the original regression), and any non-default value the user
chooses is now honoured. Drop the now-unused is_setting_customized helper.

P2 (fallback context): auto-scaling trusted get_context_length() even when it
returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known
window), over-allocating on self-hosted/proxy setups. Add get_context_length_known()
(also returns whether the window was actually discovered); the budget block
passes 0 when unknown so auto-scaling stays conservative instead of inflating to
an unproven window.
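A sketch of the known/unknown split, assuming a lookup table in place of the real endpoint probe. `KNOWN_WINDOWS`, the model names, and the `DEFAULT_CONTEXT` value of 8192 are hypothetical; only the `(length, known)` shape and the "pass 0 when unknown" rule come from the text above.

```python
from typing import Tuple

DEFAULT_CONTEXT = 8192  # hypothetical fallback; the real constant may differ

# Hypothetical stand-in for endpoint-reported or otherwise known windows.
KNOWN_WINDOWS = {"long-context-model": 1_000_000, "mid-model": 128_000}


def get_context_length_known(model: str) -> Tuple[int, bool]:
    """Return (context_length, known); known is False for the bare fallback."""
    window = KNOWN_WINDOWS.get(model)
    if window:
        return window, True
    return DEFAULT_CONTEXT, False


def get_context_length(model: str) -> int:
    """Old view: a discovered window and the fallback look the same."""
    return get_context_length_known(model)[0]


# The budget block scales only from a discovered window and passes 0 otherwise,
# so an unknown self-hosted/proxy model keeps the conservative configured budget.
length, known = get_context_length_known("self-hosted-proxy")
budget_ctx = length if known else 0
assert get_context_length("self-hosted-proxy") == DEFAULT_CONTEXT
assert budget_ctx == 0
```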

hard_max stays auto-only — a deliberate explicit budget wins (#1190); kept that
contract and answered the reviewer's question rather than silently reversing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(agent): lock the materialized-default budget regression (review on #4121)

Per WGlynn's review on the issue: add an end-to-end regression that saves an
UNRELATED setting (which makes the settings-save path materialize the budget
default into settings.json) and asserts the budget still auto-scales rather than
re-reading as an explicit 6000 cap — locking the exact reopening shut.

To make the test bite the production decision (not just re-derive it), extract
`budget_is_explicit()` into src/context_budget.py and use it from the agent loop.
It keys off value-vs-default (the default is the auto sentinel), NOT settings
presence — which is the whole point, since the save path materializes defaults.
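The value-vs-default rule can be sketched as below. This is an illustration, not the shipped `budget_is_explicit()`: its signature is assumed, and so is treating a missing, non-integer, or non-positive value as auto (that last choice mirrors how the hard_max setting falls back on values <= 0 in the tests).

```python
DEFAULT_BUDGET = 6000  # assumed to equal DEFAULT_SETTINGS["agent_input_token_budget"]


def budget_is_explicit(value, default: int = DEFAULT_BUDGET) -> bool:
    """True only when the saved budget differs from the auto sentinel.

    Presence in settings.json is ignored on purpose: a materialized default
    is indistinguishable from an untouched one, and both mean "auto".
    """
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        return False  # assumption: unreadable values are auto, not a cap
    # Assumption: non-positive values are treated like the sentinel (auto).
    return parsed > 0 and parsed != default


# Saving an unrelated setting materialized the default: still auto.
materialized = {"theme": "dark", "agent_input_token_budget": 6000}
assert budget_is_explicit(materialized["agent_input_token_budget"]) is False
assert budget_is_explicit(12000) is True
```

One consequence of this design: a user cannot pin exactly the default value as a cap, because choosing it means "auto"; any other positive value is honoured as written.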

Note: after this PR's rework, is_setting_overridden has ZERO production callers,
so the merged-dict materialization smell can't reach any setting through a
presence check today (WGlynn's durability concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): bind the budget context window to its own provenance (review #4122)

RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop
kept only the `known` flag from get_context_length_known() and budgeted off the
passed-in `context_length`, which can come from a *different* lookup. Two failures:
- local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT
  fallback while the fresh probe proves the real (smaller) served context — we'd
  scale off the stale value;
- callers that don't pass context_length (scheduled tasks, teacher escalation,
  skill test runs, bg_monitor) were capped at 6000 even when a long window is
  discoverable.

Extract budget_context_for_model() which returns the freshly-probed window when
known else 0, binding the flag to the value it proves; the agent loop uses it.
Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths.
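The binding can be sketched as follows. The probe is injected as a callable so the sketch runs standalone; the real `budget_context_for_model()` signature is not shown in this commit, and `Probe`, `fresh_local_probe` and `failing_probe` are illustrative names. Treating any probe exception as "unknown" is an assumption matching the probe-error path the tests mention.

```python
from typing import Callable, Tuple

# A probe returns (context_length, known) for a model.
Probe = Callable[[str], Tuple[int, bool]]


def budget_context_for_model(model: str, probe: Probe) -> int:
    """Window to budget against: the fresh probe's value when known, else 0.

    The flag and the length come from the same probe call, so a stale
    context_length held by the caller can never borrow a "known" flag it did
    not earn. Callers that pass no context_length get the same result.
    """
    try:
        length, known = probe(model)
    except Exception:
        # Probe failure: treat the window as unknown (conservative).
        return 0
    return length if known and length > 0 else 0


def fresh_local_probe(model: str) -> Tuple[int, bool]:
    # A re-queried local endpoint proves a smaller served window.
    return 16_384, True


def failing_probe(model: str) -> Tuple[int, bool]:
    raise ConnectionError("endpoint unreachable")


assert budget_context_for_model("local-model", fresh_local_probe) == 16_384
assert budget_context_for_model("local-model", failing_probe) == 0
```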

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): fix stale budget comments + tighten to the contract (review #4122)

- settings.py: an explicit budget is clamped to the window only — hard_max is
  auto-only (#1190); drop the incorrect "and to hard_max".
- is_setting_overridden docstring: drop the stale "adaptive budgets" example;
  point value-sensitive callers at context_budget.budget_is_explicit.
- Tighten the budget-block comments to the contract (default = auto sentinel,
  non-default = explicit cap, hard_max = auto-only ceiling).
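A reference sketch of that contract, written to satisfy the assertions in the test module below: 85% of a known window on the auto path, capped by hard_max; explicit budgets clamped to the window only; the configured budget (or 6000) when the window is unknown. It is not the shipped src/context_budget.py. The 0.85 ratio and the 200K default ceiling come from the tests' comments, and any behaviour the tests do not pin is an assumption.

```python
DEFAULT_BUDGET = 6000        # auto sentinel (the settings default)
DEFAULT_HARD_MAX = 200_000   # default ceiling for the auto path
AUTO_RATIO = 0.85            # share of the window given to input tokens


def compute_input_token_budget(budget: int, context_length: int, *,
                               explicit: bool,
                               hard_max: int = DEFAULT_HARD_MAX) -> int:
    configured = budget if budget > 0 else DEFAULT_BUDGET
    if context_length <= 0:
        # Window unknown: never scale off an unproven value.
        return configured
    if explicit:
        # Explicit cap: clamp to the window only; hard_max does not apply.
        return min(configured, context_length)
    # Auto: scale to the window, bounded by the (configurable) ceiling.
    return min(int(context_length * AUTO_RATIO), hard_max)
```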

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): correct budget issue citations (#1190 → merged #1230/#1273)

The context-budget contract (auto-sentinel, explicit budgets honoured,
hard_max auto-only) merged via #1230; #1190 was the earlier, closed,
superseded PR. Re-point the contract comments at #1230 (the live source,
already cited for the auto-sentinel two lines up in settings.py).

The configurable hard_max setting (`agent_input_token_hard_max`) was a
reviewer requirement first raised on #1190, omitted from the merged #1230,
and actually added in #1273 — credit #1273 for it and correct the test
comment's history (it previously implied this PR completed the requirement).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 15:17:28 +09:00


"""Issue #1170 — the agent input-token budget adapts to the model context window.
Pins the pure budget computation and the explicit-override detection.
"""
import json

from src.context_budget import compute_input_token_budget, DEFAULT_HARD_MAX


def test_default_scales_to_context_window():
    # Not explicit, big window -> ~85% of the window (the old code capped at 6000).
    assert compute_input_token_budget(6000, 128000, explicit=False) == int(128000 * 0.85)


def test_default_capped_at_hard_max_for_huge_windows():
    assert compute_input_token_budget(6000, 1_000_000, explicit=False) == DEFAULT_HARD_MAX


def test_explicit_budget_is_honoured():
    # explicit=True keeps 6000 even on a 128K model. (In settings, 6000 is the
    # auto sentinel, so this pins the function-level contract only.)
    assert compute_input_token_budget(6000, 128000, explicit=True) == 6000
    # A larger explicit budget is honoured as-is when it fits the window.
    assert compute_input_token_budget(50000, 128000, explicit=True) == 50000


def test_explicit_budget_clamped_to_window():
    assert compute_input_token_budget(200000, 32000, explicit=True) == 32000


def test_unknown_window_falls_back_to_configured():
    assert compute_input_token_budget(6000, 0, explicit=False) == 6000
    assert compute_input_token_budget(0, 0, explicit=False) == 6000  # default


def test_is_setting_overridden_reads_raw_saved_file(tmp_path, monkeypatch):
    import src.settings as settings
    f = tmp_path / "settings.json"
    f.write_text(json.dumps({"agent_input_token_budget": 12000}), encoding="utf-8")
    monkeypatch.setattr(settings, "SETTINGS_FILE", str(f))
    assert settings.is_setting_overridden("agent_input_token_budget") is True
    assert settings.is_setting_overridden("some_unset_key") is False
    f.write_text(json.dumps({}), encoding="utf-8")
    assert settings.is_setting_overridden("agent_input_token_budget") is False


# ---------------------------------------------------------------------------
# Configurable hard_max: the ceiling on the auto-derived path is a setting
# (`agent_input_token_hard_max`), not a hidden constant. History: a reviewer
# required it on #1190, the merged #1230 shipped without it, and #1273 added it.
# This test pins the function-level override (the `hard_max` parameter); without
# a raisable ceiling, admins on 1M+ context APIs would be stuck at the 200K default.
# ---------------------------------------------------------------------------
def test_custom_hard_max_overrides_default_in_auto_branch():
    """A caller-supplied hard_max lifts the auto-derived ceiling."""
    # Without override: 1M ctx -> capped at DEFAULT_HARD_MAX (200K)
    assert compute_input_token_budget(6000, 1_000_000, explicit=False) == DEFAULT_HARD_MAX
    # With the ceiling raised to 900K: 1M ctx -> 850K (85% of 1M), under the new ceiling
    assert compute_input_token_budget(6000, 1_000_000, explicit=False, hard_max=900_000) == int(1_000_000 * 0.85)


def test_custom_hard_max_lowers_default_for_cost_paranoid_setups():
    """A lower ceiling caps the auto-derived budget below the default."""
    # 128K ctx, default ceiling 200K -> 85% of 128K = 108800
    assert compute_input_token_budget(6000, 128_000, explicit=False) == int(128_000 * 0.85)
    # Same ctx, ceiling lowered to 50K -> capped at 50K instead
    assert compute_input_token_budget(6000, 128_000, explicit=False, hard_max=50_000) == 50_000


def test_hard_max_has_no_effect_on_explicit_branch():
    """When the user set an explicit budget, hard_max must not silently cap it."""
    # User chose 900K explicitly; ctx is 1M; ceiling is 100K; the user's choice wins.
    assert compute_input_token_budget(900_000, 1_000_000, explicit=True, hard_max=100_000) == 900_000


def test_default_settings_registers_hard_max_key():
    """Required so /api/auth/settings and manage_settings can persist the key."""
    from src.settings import DEFAULT_SETTINGS
    assert "agent_input_token_hard_max" in DEFAULT_SETTINGS
    assert DEFAULT_SETTINGS["agent_input_token_hard_max"] == DEFAULT_HARD_MAX


def test_alias_map_registers_friendly_names():
    """`manage_settings` should accept 'hard max' and friends."""
    from pathlib import Path
    # Resolve relative to this test file so the check does not depend on pytest's CWD.
    impl = Path(__file__).resolve().parents[1] / "src" / "tool_implementations.py"
    source = impl.read_text(encoding="utf-8")
    assert '"hard max": "agent_input_token_hard_max"' in source
    assert '"token budget cap": "agent_input_token_hard_max"' in source
    assert '"input budget cap": "agent_input_token_hard_max"' in source


def test_agent_loop_reads_hard_max_setting(tmp_path, monkeypatch):
    """A saved settings.json value for agent_input_token_hard_max must come back
    from get_setting() on the import path the agent loop uses, and a malformed
    value must fall back to DEFAULT_HARD_MAX. The parse below mirrors the
    try/except in src/agent_loop.py; it does not call the agent loop itself."""
    import src.settings as settings
    # Point SETTINGS_FILE at a temp file with our override.
    f = tmp_path / "settings.json"
    f.write_text(json.dumps({"agent_input_token_hard_max": 750_000}), encoding="utf-8")
    monkeypatch.setattr(settings, "SETTINGS_FILE", str(f))
    monkeypatch.setattr(settings, "_settings_cache", None)
    # Read via the same import path the agent loop uses.
    assert settings.get_setting("agent_input_token_hard_max", DEFAULT_HARD_MAX) == 750_000
    # A malformed value falls back to DEFAULT_HARD_MAX (defensive; mirrors the
    # try/except in src/agent_loop.py).
    f.write_text(json.dumps({"agent_input_token_hard_max": "huge"}), encoding="utf-8")
    monkeypatch.setattr(settings, "_settings_cache", None)
    raw = settings.get_setting("agent_input_token_hard_max", DEFAULT_HARD_MAX)
    try:
        parsed = int(raw)
    except (TypeError, ValueError):
        parsed = DEFAULT_HARD_MAX
    if parsed <= 0:
        parsed = DEFAULT_HARD_MAX
    assert parsed == DEFAULT_HARD_MAX