mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-19 19:25:27 -04:00
fix(agent): don't let a materialized default budget defeat context-window scaling (#4122)
* fix(agent): don't let a materialized default budget defeat context scaling #1230 scales agent_input_token_budget to the model's context window unless the user explicitly set a budget, detected via is_setting_overridden(). But the settings-save path materializes every DEFAULT_SETTINGS key into settings.json (load_settings merges defaults; handlers persist the merged dict), so the persisted default 6000 reads as "overridden" and the budget code takes the min(6000, ctx) branch — silently re-capping long-context models at 6000 for anyone who has ever saved a setting. This reintroduces the exact regression #1170/#1230 set out to fix. Add is_setting_customized() (saved value != default) and gate the scaling on it instead of mere presence. A persisted default is not a user choice. is_setting_overridden has exactly one consumer (this budget path), so the change is contained. Tests cover the materialized-default regression, a deliberately-chosen budget still being honoured, and the absent-key case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(agent): rework context-budget fix per review (#4122) Address RaresKeY's review: P2 (explicitness): is_setting_customized treated a saved value equal to the default as "not explicit", which ALSO blocked a user from deliberately pinning the default budget. Reframe the default value itself as the AUTO sentinel — agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context window", any other value is an explicit cap. A materialized default still reads as auto (fixing the original regression), and any non-default value the user chooses is now honoured. Drop the now-unused is_setting_customized helper. P2 (fallback context): auto-scaling trusted get_context_length() even when it returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known window), over-allocating on self-hosted/proxy setups. Add get_context_length_known() (also returns whether the window was actually discovered); the budget block passes 0 when unknown so auto-scaling stays conservative instead of inflating to an unproven window. hard_max stays auto-only — a deliberate explicit budget wins (#1190); kept that contract and answered the reviewer's question rather than silently reversing it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(agent): lock the materialized-default budget regression (review on #4121) Per WGlynn's review on the issue: add an end-to-end regression that saves an UNRELATED setting (which makes the settings-save path materialize the budget default into settings.json) and asserts the budget still auto-scales rather than re-reading as an explicit 6000 cap — locking the exact reopening shut. To make the test bite the production decision (not just re-derive it), extract `budget_is_explicit()` into src/context_budget.py and use it from the agent loop. It keys off value-vs-default (the default is the auto sentinel), NOT settings presence — which is the whole point, since the save path materializes defaults. Note: after this PR's rework, is_setting_overridden has ZERO production callers, so the merged-dict materialization smell can't reach any setting through a presence check today (WGlynn's durability concern). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(agent): bind the budget context window to its own provenance (review #4122) RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop kept only the `known` flag from get_context_length_known() and budgeted off the passed-in `context_length`, which can come from a *different* lookup. Two failures: - local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT fallback while the fresh probe proves the real (smaller) served context — we'd scale off the stale value; - callers that don't pass context_length (scheduled tasks, teacher escalation, skill test runs, bg_monitor) were capped at 6000 even when a long window is discoverable. Extract budget_context_for_model() which returns the freshly-probed window when known else 0, binding the flag to the value it proves; the agent loop uses it. Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(agent): fix stale budget comments + tighten to the contract (review #4122) - settings.py: an explicit budget is clamped to the window only — hard_max is auto-only (#1190); drop the incorrect "and to hard_max". - is_setting_overridden docstring: drop the stale "adaptive budgets" example; point value-sensitive callers at context_budget.budget_is_explicit. - Tighten the budget-block comments to the contract (default = auto sentinel, non-default = explicit cap, hard_max = auto-only ceiling). Comment/docstring-only; no behaviour change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(agent): correct budget issue citations (#1190 → merged #1230/#1273) The context-budget contract (auto-sentinel, explicit budgets honoured, hard_max auto-only) merged via #1230 — #1190 was the earlier, closed, superseded PR. Re-point the contract comments at #1230 (the live source, already cited for the auto-sentinel two lines up in settings.py). The configurable hard_max setting (`agent_input_token_hard_max`) was a reviewer requirement first raised on #1190, omitted from the merged #1230, and actually added in #1273 — credit #1273 for it and correct the test comment's history (it previously implied this PR completed the requirement). Comment/docstring-only; no behaviour change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
+56
-22
@@ -222,16 +222,12 @@ KNOWN_CONTEXT_WINDOWS = {
|
||||
# ---------------------------------------------------------------------------
|
||||
# Cache
|
||||
# ---------------------------------------------------------------------------
|
||||
_context_cache: Dict[Tuple[str, str], int] = {}
|
||||
_context_cache: Dict[Tuple[str, str], Tuple[int, bool]] = {}
|
||||
|
||||
|
||||
def get_context_length(endpoint_url: str, model: str) -> int:
|
||||
"""Get the context window size for a model.
|
||||
|
||||
Queries /v1/models on the endpoint and looks for context_length
|
||||
or context_window fields. Caches result per (endpoint, model).
|
||||
Falls back to DEFAULT_CONTEXT if unavailable.
|
||||
"""
|
||||
def _get_context_length_cached(endpoint_url: str, model: str) -> Tuple[int, bool]:
|
||||
"""Return (context_length, known). ``known`` is False only when the value is a
|
||||
bare DEFAULT_CONTEXT fallback (no endpoint report and not in the known table)."""
|
||||
configured_kind = _configured_endpoint_kind(endpoint_url)
|
||||
is_local = is_local_endpoint(endpoint_url)
|
||||
# Key on (endpoint_url, model): the same model id can be served by two
|
||||
@@ -242,14 +238,50 @@ def get_context_length(endpoint_url: str, model: str) -> int:
|
||||
if not is_local and cache_key in _context_cache:
|
||||
return _context_cache[cache_key]
|
||||
|
||||
ctx = _query_context_length(endpoint_url, model)
|
||||
ctx, known = _query_context_length(endpoint_url, model)
|
||||
# Only cache non-default values to allow retry on next request.
|
||||
# Local endpoints can restart with a different --max-model-len while keeping
|
||||
# the same model id, so always re-query them instead of serving stale cache.
|
||||
if not is_local and (ctx != DEFAULT_CONTEXT or configured_kind in ("api", "proxy")):
|
||||
_context_cache[cache_key] = ctx
|
||||
_context_cache[cache_key] = (ctx, known)
|
||||
logger.info(f"Context length for {model}: {ctx}")
|
||||
return ctx
|
||||
return ctx, known
|
||||
|
||||
|
||||
def get_context_length(endpoint_url: str, model: str) -> int:
|
||||
"""Get the context window size for a model.
|
||||
|
||||
Queries /v1/models on the endpoint and looks for context_length
|
||||
or context_window fields. Caches result per (endpoint, model).
|
||||
Falls back to DEFAULT_CONTEXT if unavailable.
|
||||
"""
|
||||
return _get_context_length_cached(endpoint_url, model)[0]
|
||||
|
||||
|
||||
def get_context_length_known(endpoint_url: str, model: str) -> Tuple[int, bool]:
|
||||
"""Like ``get_context_length`` but also returns whether the window was actually
|
||||
discovered (endpoint-reported or in the known-models table) rather than the bare
|
||||
DEFAULT_CONTEXT fallback. Callers that *scale* a budget off the window must not
|
||||
trust an unknown value — a fallback 128K isn't proof the model holds 128K
|
||||
(review on #4122)."""
|
||||
return _get_context_length_cached(endpoint_url, model)
|
||||
|
||||
|
||||
def budget_context_for_model(endpoint_url: str, model: str, *, fallback: int = 0) -> int:
|
||||
"""Context window to scale the agent input budget against.
|
||||
|
||||
Returns the *freshly discovered* window when it was actually proven
|
||||
(endpoint-reported / known table), else 0 so auto-scaling stays conservative.
|
||||
Crucially this binds the ``known`` flag to the value it proves — callers must
|
||||
not pair this flag with a context length from a *different* lookup (a stale
|
||||
local re-query, or a caller that didn't pass one), which would budget off an
|
||||
unproven number (review on #4122). On probe error, returns ``fallback`` (the
|
||||
caller's best-known value) to preserve prior behaviour."""
|
||||
try:
|
||||
ctx, known = get_context_length_known(endpoint_url, model)
|
||||
return ctx if known else 0
|
||||
except Exception:
|
||||
return fallback
|
||||
|
||||
|
||||
def _lookup_known(model: str) -> Optional[int]:
|
||||
@@ -271,8 +303,9 @@ def _lookup_known(model: str) -> Optional[int]:
|
||||
return best_ctx
|
||||
|
||||
|
||||
def _query_context_length(endpoint_url: str, model: str) -> int:
|
||||
"""Query the model API for context length."""
|
||||
def _query_context_length(endpoint_url: str, model: str) -> Tuple[int, bool]:
|
||||
"""Query the model API for context length. Returns (context_length, known) where
|
||||
``known`` is False only for the bare DEFAULT_CONTEXT fallback."""
|
||||
known = _lookup_known(model)
|
||||
api_ctx = None
|
||||
configured_kind = _configured_endpoint_kind(endpoint_url)
|
||||
@@ -283,8 +316,8 @@ def _query_context_length(endpoint_url: str, model: str) -> int:
|
||||
if configured_kind in ("api", "proxy"):
|
||||
if known:
|
||||
logger.info(f"Using known context window for {model}: {known}")
|
||||
return known
|
||||
return DEFAULT_CONTEXT
|
||||
return known, True
|
||||
return DEFAULT_CONTEXT, False
|
||||
|
||||
# Try llama.cpp /slots endpoint first — reports actual serving context
|
||||
if is_local_endpoint(endpoint_url):
|
||||
@@ -297,7 +330,7 @@ def _query_context_length(endpoint_url: str, model: str) -> int:
|
||||
n_ctx = slots[0].get("n_ctx")
|
||||
if n_ctx and isinstance(n_ctx, int) and n_ctx > 0:
|
||||
logger.info(f"llama.cpp /slots reports n_ctx={n_ctx} for {model}")
|
||||
return n_ctx
|
||||
return n_ctx, True
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
@@ -309,7 +342,8 @@ def _query_context_length(endpoint_url: str, model: str) -> int:
|
||||
if is_copilot_base(endpoint_url):
|
||||
if known:
|
||||
logger.info(f"Using known context window for {model}: {known}")
|
||||
return known or DEFAULT_CONTEXT
|
||||
return known, True
|
||||
return DEFAULT_CONTEXT, False
|
||||
|
||||
from src.endpoint_resolver import build_models_url
|
||||
|
||||
@@ -354,18 +388,18 @@ def _query_context_length(endpoint_url: str, model: str) -> int:
|
||||
_is_local = is_local_endpoint(endpoint_url)
|
||||
if _is_local and api_ctx < known:
|
||||
logger.info(f"Local endpoint reports {api_ctx} for {model} (known max: {known}) — using API value")
|
||||
return api_ctx
|
||||
return api_ctx, True
|
||||
result = max(api_ctx, known)
|
||||
if api_ctx < known:
|
||||
logger.info(f"API reported {api_ctx} for {model}, using known {known} instead")
|
||||
return result
|
||||
return result, True
|
||||
if api_ctx:
|
||||
return api_ctx
|
||||
return api_ctx, True
|
||||
if known:
|
||||
logger.info(f"Using known context window for {model}: {known}")
|
||||
return known
|
||||
return known, True
|
||||
|
||||
return DEFAULT_CONTEXT
|
||||
return DEFAULT_CONTEXT, False
|
||||
|
||||
|
||||
def estimate_tokens(messages: List[Dict]) -> int:
|
||||
|
||||
Reference in New Issue
Block a user