odysseus/tests/test_model_context.py
nsgds 7ae6133d7f fix(agent): don't let a materialized default budget defeat context-window scaling (#4122)
* fix(agent): don't let a materialized default budget defeat context scaling

#1230 scales agent_input_token_budget to the model's context window unless
the user explicitly set a budget, detected via is_setting_overridden(). But
the settings-save path materializes every DEFAULT_SETTINGS key into
settings.json (load_settings merges defaults; handlers persist the merged
dict), so the persisted default 6000 reads as "overridden" and the budget
code takes the min(6000, ctx) branch — silently re-capping long-context
models at 6000 for anyone who has ever saved a setting. This reintroduces
the exact regression #1170/#1230 set out to fix.
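
A minimal sketch of the failure mode described above, for illustration only. The helpers below are simplified stand-ins written for this example; they borrow the names used in this message but are not the project's actual implementations, and the `theme` key is hypothetical.

```python
import json
import os
import tempfile

# Hypothetical defaults; only the 6000 budget default comes from the text above.
DEFAULT_SETTINGS = {"agent_input_token_budget": 6000, "theme": "dark"}


def load_settings(path):
    """Return saved settings merged over defaults, so every default key is present."""
    saved = {}
    if os.path.exists(path):
        with open(path) as fh:
            saved = json.load(fh)
    return {**DEFAULT_SETTINGS, **saved}


def save_setting(path, key, value):
    """Persist the merged dict, which writes every default into the file."""
    merged = load_settings(path)
    merged[key] = value
    with open(path, "w") as fh:
        json.dump(merged, fh)


def is_setting_overridden(path, key):
    """Presence check: True if the key appears in the saved file."""
    if not os.path.exists(path):
        return False
    with open(path) as fh:
        return key in json.load(fh)


def demo():
    """Save an unrelated setting and report the presence check before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings.json")
        before = is_setting_overridden(path, "agent_input_token_budget")
        save_setting(path, "theme", "light")  # unrelated change
        after = is_setting_overridden(path, "agent_input_token_budget")
        return before, after
```

In this sketch, `demo()` shows the budget key flipping from absent to present even though the user never touched it, which is why a presence check cannot distinguish a user choice from a materialized default.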

Add is_setting_customized() (saved value != default) and gate the scaling
on it instead of mere presence. A persisted default is not a user choice.

is_setting_overridden has exactly one consumer (this budget path), so the
change is contained. Tests cover the materialized-default regression, a
deliberately-chosen budget still being honoured, and the absent-key case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): rework context-budget fix per review (#4122)

Address RaresKeY's review:

P2 (explicitness): is_setting_customized treated a saved value equal to the
default as "not explicit", which ALSO blocked a user from deliberately pinning
the default budget. Reframe the default value itself as the AUTO sentinel —
agent_input_token_budget == DEFAULT_BUDGET means "scale to the model's context
window", any other value is an explicit cap. A materialized default still reads
as auto (fixing the original regression), and any non-default value the user
chooses is now honoured. Drop the now-unused is_setting_customized helper.
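
The auto-sentinel contract can be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: the body of `budget_is_explicit` is a guess from the description, `effective_budget` is a hypothetical helper, the real scaling rule is simplified to "use the whole window", and the `hard_max` ceiling is omitted.

```python
DEFAULT_BUDGET = 6000  # the default value doubles as the "auto" sentinel


def budget_is_explicit(saved_budget):
    """Value-vs-default check: any non-default value counts as a user cap."""
    return saved_budget != DEFAULT_BUDGET


def effective_budget(saved_budget, context_window):
    """Hypothetical helper: pick the budget for a known window (0 means unknown)."""
    if budget_is_explicit(saved_budget):
        # Explicit cap: honoured, clamped to the window when one is known.
        if context_window > 0:
            return min(saved_budget, context_window)
        return saved_budget
    # Auto: scale to the window when known, else stay at the conservative default.
    # (Simplification: the real code presumably scales by some rule and applies hard_max.)
    return context_window if context_window > 0 else DEFAULT_BUDGET
```

Under these assumptions a materialized 6000 with a 200000-token window yields 200000, while a deliberately chosen 4000 stays at 4000.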

P2 (fallback context): auto-scaling trusted get_context_length() even when it
returned only the bare DEFAULT_CONTEXT fallback (no endpoint-reported / known
window), over-allocating on self-hosted/proxy setups. Add get_context_length_known()
(also returns whether the window was actually discovered); the budget block
passes 0 when unknown so auto-scaling stays conservative instead of inflating to
an unproven window.

hard_max stays auto-only — a deliberate explicit budget wins (#1190); kept that
contract and answered the reviewer's question rather than silently reversing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(agent): lock the materialized-default budget regression (review on #4121)

Per WGlynn's review on the issue: add an end-to-end regression that saves an
UNRELATED setting (which makes the settings-save path materialize the budget
default into settings.json) and asserts the budget still auto-scales rather than
re-reading as an explicit 6000 cap — locking the exact reopening shut.

To make the test bite the production decision (not just re-derive it), extract
`budget_is_explicit()` into src/context_budget.py and use it from the agent loop.
It keys off value-vs-default (the default is the auto sentinel), NOT settings
presence — which is the whole point, since the save path materializes defaults.

Note: after this PR's rework, is_setting_overridden has ZERO production callers,
so the merged-dict materialization smell can't reach any setting through a
presence check today (WGlynn's durability concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): bind the budget context window to its own provenance (review #4122)

RaresKeY caught a correctness bug in the fallback-context guard: stream_agent_loop
kept only the `known` flag from get_context_length_known() and budgeted off the
passed-in `context_length`, which can come from a *different* lookup. Two failures:
- local endpoints are re-queried, so the passed value can be a stale DEFAULT_CONTEXT
  fallback while the fresh probe proves the real (smaller) served context — we'd
  scale off the stale value;
- callers that don't pass context_length (scheduled tasks, teacher escalation,
  skill test runs, bg_monitor) were capped at 6000 even when a long window is
  discoverable.

Extract budget_context_for_model() which returns the freshly-probed window when
known else 0, binding the flag to the value it proves; the agent loop uses it.
Regression tests cover the stale-fallback, no-arg-caller, and probe-error paths.
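
A rough sketch of the provenance binding, under stated assumptions: the probe is passed in as a callable returning `(length, known)`, the signature differs from whatever the real `budget_context_for_model()` takes, and treating a probe error as "unknown" (0) is a guess at the behaviour the regression tests pin down.

```python
def budget_context_for_model(probe, endpoint_url, model):
    """Return the freshly probed window when it is known, else 0.

    `probe` is a hypothetical callable returning (length, known); the value and
    the flag come from the same lookup, so the flag cannot vouch for a stale value.
    """
    try:
        length, known = probe(endpoint_url, model)
    except Exception:
        # Assumption: a failed probe is treated as "window unknown".
        return 0
    return length if known else 0


def fresh_probe(endpoint_url, model):
    """Stand-in probe: the endpoint reports a real, smaller served window."""
    return 8192, True


def fallback_probe(endpoint_url, model):
    """Stand-in probe: nothing discovered, only a bare fallback value."""
    return 32768, False


def failing_probe(endpoint_url, model):
    """Stand-in probe that raises, as an unreachable endpoint might."""
    raise ConnectionError("endpoint unreachable")
```

With a stale caller-supplied value of, say, 32768, the budget in this sketch would still be derived from the 8192 that `fresh_probe` reports, and callers that pass no context length at all get the same answer.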

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): fix stale budget comments + tighten to the contract (review #4122)

- settings.py: an explicit budget is clamped to the window only — hard_max is
  auto-only (#1190); drop the incorrect "and to hard_max".
- is_setting_overridden docstring: drop the stale "adaptive budgets" example;
  point value-sensitive callers at context_budget.budget_is_explicit.
- Tighten the budget-block comments to the contract (default = auto sentinel,
  non-default = explicit cap, hard_max = auto-only ceiling).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(agent): correct budget issue citations (#1190 → merged #1230/#1273)

The context-budget contract (auto-sentinel, explicit budgets honoured,
hard_max auto-only) merged via #1230; #1190 was the earlier, closed,
superseded PR. Re-point the contract comments at #1230 (the live source,
already cited for the auto-sentinel two lines up in settings.py).

The configurable hard_max setting (`agent_input_token_hard_max`) was a
reviewer requirement first raised on #1190, omitted from the merged #1230,
and actually added in #1273 — credit #1273 for it and correct the test
comment's history (it previously implied this PR completed the requirement).

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 15:17:28 +09:00

252 lines
8.0 KiB
Python

"""Tests for model_context.py — local endpoint detection, token estimation, known model lookup."""
import sys
import types

import pytest

import src.model_context as model_context
from src.model_context import is_local_endpoint, estimate_tokens, _lookup_known


class _Column:
    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return ("eq", self.name, value)


class _ModelEndpoint:
    is_enabled = _Column("is_enabled")


class _Query:
    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, *conditions):
        for condition in conditions:
            if isinstance(condition, tuple) and condition[0] == "eq":
                _, field, value = condition
                self.rows = [row for row in self.rows if getattr(row, field) == value]
        return self

    def all(self):
        return list(self.rows)


class _Db:
    def __init__(self, rows):
        self.rows = rows

    def query(self, model):
        return _Query(self.rows)

    def close(self):
        pass


def _install_endpoint_db(monkeypatch, rows):
    mod = types.ModuleType("core.database")
    mod.ModelEndpoint = _ModelEndpoint
    mod.SessionLocal = lambda: _Db(rows)
    monkeypatch.setitem(sys.modules, "core.database", mod)


class TestIsLocalEndpoint:
    def test_localhost(self):
        assert is_local_endpoint("http://localhost:5000/v1/chat/completions") is True

    def test_loopback_ipv4(self):
        assert is_local_endpoint("http://127.0.0.1:8080/v1/chat/completions") is True

    def test_private_192_168(self):
        assert is_local_endpoint("http://192.168.1.1:11434/v1/chat/completions") is True

    def test_private_10(self):
        assert is_local_endpoint("http://10.0.0.5:8000/v1/chat/completions") is True

    def test_tailscale_100(self):
        # 100.64.0.0/10 is the CGNAT range Tailscale uses.
        assert is_local_endpoint("http://100.64.0.1:5000/v1/chat/completions") is True

    def test_configured_tailscale_proxy_is_remote(self, monkeypatch):
        _install_endpoint_db(monkeypatch, [
            types.SimpleNamespace(
                base_url="http://100.117.136.97:34521/v1",
                endpoint_kind="proxy",
                api_key="fake-key",
                is_enabled=True,
            )
        ])
        assert is_local_endpoint("http://100.117.136.97:34521/v1/chat/completions") is False

    def test_openai_is_remote(self):
        assert is_local_endpoint("https://api.openai.com/v1/chat/completions") is False

    def test_anthropic_is_remote(self):
        assert is_local_endpoint("https://api.anthropic.com/v1/messages") is False

    def test_empty_url(self):
        assert is_local_endpoint("") is False

    def test_malformed_url(self):
        assert is_local_endpoint("not-a-url") is False


class TestEstimateTokens:
    def test_empty_list(self):
        assert estimate_tokens([]) == 0

    def test_single_short_message(self):
        messages = [{"role": "user", "content": "Hello"}]
        tokens = estimate_tokens(messages)
        # 4 overhead + int(5 * 0.3) = 4 + 1 = 5
        assert tokens == 5

    def test_multiple_messages(self):
        messages = [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "Hi there"},
        ]
        tokens = estimate_tokens(messages)
        assert tokens > 0
        # Each message adds 4 overhead + chars * 0.3
        assert tokens == 4 + int(16 * 0.3) + 4 + int(8 * 0.3)

    def test_multimodal_content_list(self):
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image"},
                    {"type": "image_url", "image_url": {"url": "data:..."}},
                ],
            }
        ]
        tokens = estimate_tokens(messages)
        # 4 overhead + int(19 * 0.3) for the text item; image_url is ignored
        assert tokens == 4 + int(19 * 0.3)

    def test_missing_content_key(self):
        messages = [{"role": "assistant"}]
        tokens = estimate_tokens(messages)
        # 4 overhead + 0 content
        assert tokens == 4

    def test_scales_with_length(self):
        short = estimate_tokens([{"role": "user", "content": "short"}])
        long_text = "a" * 10000
        long = estimate_tokens([{"role": "user", "content": long_text}])
        assert long > short * 10


class TestLookupKnown:
    def test_claude_sonnet(self):
        assert _lookup_known("claude-sonnet-4-5") == 200000

    def test_gpt4o(self):
        assert _lookup_known("gpt-4o") == 128000

    def test_deepseek_r1(self):
        assert _lookup_known("deepseek-r1") == 64000

    def test_gemini_pro(self):
        assert _lookup_known("gemini-2.5-pro") == 1048576

    def test_unknown_model(self):
        assert _lookup_known("totally-unknown-model-xyz") is None

    def test_namespaced_model(self):
        """Models prefixed with provider/ should still match."""
        result = _lookup_known("openrouter/deepseek-r1")
        assert result == 64000

    def test_model_with_tag(self):
        """Models with :free or :extended suffixes should still match."""
        result = _lookup_known("deepseek-r1:free")
        assert result == 64000

    def test_o1_mini_not_shadowed_by_o1(self):
        """'o1' (200k) precedes 'o1-mini' (128k) in the table; longest match wins."""
        assert _lookup_known("o1-mini") == 128000

    def test_o1_full(self):
        assert _lookup_known("o1") == 200000

    def test_gpt4o_mini_not_shadowed_by_gpt4(self):
        assert _lookup_known("gpt-4o-mini") == 128000

    def test_gpt4_base(self):
        assert _lookup_known("gpt-4") == 8192


class TestGetContextLength:
    def setup_method(self):
        model_context._context_cache.clear()

    def test_local_endpoint_requeries_same_model_after_restart(self, monkeypatch):
        calls = []

        def fake_query(endpoint_url, model):
            calls.append((endpoint_url, model))
            return (8192, True) if len(calls) == 1 else (27000, True)

        monkeypatch.setattr(model_context, "_query_context_length", fake_query)
        endpoint = "http://127.0.0.1:8000/v1/chat/completions"
        model = "Qwen/Qwen3-14B"

        first = model_context.get_context_length(endpoint, model)
        second = model_context.get_context_length(endpoint, model)

        assert first == 8192
        assert second == 27000
        assert len(calls) == 2

    def test_remote_endpoint_keeps_cached_context(self, monkeypatch):
        calls = []

        def fake_query(endpoint_url, model):
            calls.append((endpoint_url, model))
            return (200000, True) if len(calls) == 1 else (12345, True)

        monkeypatch.setattr(model_context, "_query_context_length", fake_query)
        endpoint = "https://api.openai.com/v1/chat/completions"
        model = "gpt-5"

        first = model_context.get_context_length(endpoint, model)
        second = model_context.get_context_length(endpoint, model)

        assert first == 200000
        assert second == 200000
        assert len(calls) == 1

    def test_configured_proxy_uses_default_without_model_listing(self, monkeypatch):
        _install_endpoint_db(monkeypatch, [
            types.SimpleNamespace(
                base_url="http://100.117.136.97:34521/v1",
                endpoint_kind="proxy",
                api_key="fake-key",
                is_enabled=True,
            )
        ])
        calls = []

        def fake_get(*args, **kwargs):
            calls.append(args)
            raise AssertionError("/models should not be queried for configured proxy context")

        monkeypatch.setattr(model_context.httpx, "get", fake_get)
        endpoint = "http://100.117.136.97:34521/v1/chat/completions"

        first = model_context.get_context_length(endpoint, "unknown-proxy-model")
        second = model_context.get_context_length(endpoint, "unknown-proxy-model")

        assert first == model_context.DEFAULT_CONTEXT
        assert second == model_context.DEFAULT_CONTEXT
        assert calls == []