mirror of https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-17 10:15:27 -04:00
fix(compare): stream Compare panes directly to stop upstream promptly
The previous approach polled request.is_disconnected() inside the async-for body of the chat/agent streaming loops. That happens too late: by the time the poll runs, __anext__() has already awaited and consumed the next upstream chunk, so a slow or silent generation could keep running for a full round-trip (or until a read timeout) after the client disconnected. The check was also unconditional, so ordinary chat navigation, refresh, or tab close would have stopped a run that the detached-run design intentionally keeps going server-side.

Both problems trace back to the same root cause: chat_stream always wraps its generator in agent_runs (the detached-run manager), which deliberately decouples the generator's lifetime from the SSE response so that normal chat/agent streams survive the client going away. Polling for disconnection inside a detached generator can never be "prompt", because the generator is no longer tied to that request, and doing so defeats the whole point of detaching it.

Compare panes don't need (or want) that: each pane's session exists only to drive that one generation, there's nothing meaningful to /resume, and the user expects the pane's Stop button (which aborts the fetch and closes the SSE) to cancel the upstream call right away. So route compare-mode requests around the agent_runs wrapper entirely and stream the generator directly as the SSE body.

Starlette already cancels a streaming response's body iterator (raising CancelledError/GeneratorExit into it) the instant it notices the client disconnected, including while the generator is mid-await on the next upstream chunk, and the existing except (CancelledError, GeneratorExit) handlers in both the chat-mode and agent-mode loops already save the partial response exactly once. No polling is needed; the redesign just stops getting in its own way.
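The timing difference between polling and cancellation can be reproduced with plain asyncio. This is a simplified model, not the project's code: `upstream`, `polling_stream`, `direct_stream`, and `drain` are hypothetical stand-ins, and cancelling the consuming task is an assumed approximation of what the commit describes Starlette doing to a streaming body iterator on disconnect.

```python
import asyncio

# Hypothetical upstream script: (delay_seconds, chunk) pairs. The last gap
# simulates a slow or silent generation.
SCRIPT = [(0, "Hel"), (0, "lo"), (0.5, " world")]


async def upstream(script):
    """Stand-in for the upstream LLM stream (assumed, not the real client)."""
    for delay, chunk in script:
        await asyncio.sleep(delay)
        yield chunk


async def polling_stream(source, is_disconnected, saved):
    """Old approach: poll for disconnect inside the async-for body."""
    parts = []
    async for chunk in source:
        # This check only runs after __anext__() has already awaited and
        # consumed the next chunk, so a silent upstream delays it.
        if is_disconnected():
            saved.append(("partial", "".join(parts)))
            return
        parts.append(chunk)
        yield chunk


async def direct_stream(source, saved):
    """Compare-shaped generator streamed directly: cancellation saves once."""
    parts = []
    try:
        async for chunk in source:
            parts.append(chunk)
            yield chunk
    except (asyncio.CancelledError, GeneratorExit):
        # Raised at whatever await the generator is parked on, including
        # the wait for the next upstream chunk.
        saved.append(("partial", "".join(parts)))
        raise
    else:
        saved.append(("complete", "".join(parts)))


async def drain(gen, received):
    """Plays the role of the server iterating the SSE body."""
    async for chunk in gen:
        received.append(chunk)


async def measure_polling():
    loop = asyncio.get_running_loop()
    saved, received = [], []
    disconnected = False
    task = asyncio.create_task(
        drain(polling_stream(upstream(SCRIPT), lambda: disconnected, saved), received)
    )
    await asyncio.sleep(0.05)  # the first two chunks have been forwarded
    disconnected = True        # the client goes away during the silent gap
    start = loop.time()
    await task                 # only finishes once " world" arrives
    return loop.time() - start, saved


async def measure_cancel():
    loop = asyncio.get_running_loop()
    saved, received = [], []
    task = asyncio.create_task(drain(direct_stream(upstream(SCRIPT), saved), received))
    await asyncio.sleep(0.05)
    start = loop.time()
    task.cancel()              # models the server cancelling the body iterator
    try:
        await task
    except asyncio.CancelledError:
        pass
    return loop.time() - start, saved, received


async def measure_complete():
    saved, received = [], []
    await drain(direct_stream(upstream(SCRIPT), saved), received)
    return saved, received


poll_elapsed, poll_saved = asyncio.run(measure_polling())
cancel_elapsed, cancel_saved, cancel_received = asyncio.run(measure_cancel())
complete_saved, complete_received = asyncio.run(measure_complete())
print(f"polling noticed after {poll_elapsed:.2f}s, cancel took {cancel_elapsed:.2f}s")
```

The point is where the check lives: a poll can only run between chunks, whereas cancellation is delivered at whatever await the generator is parked on, and the same except block handles it exactly once.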
Normal (non-Compare) chat and agent streams are untouched and keep going through agent_runs, preserving detached-run semantics (surviving tab close / navigation / refresh, reconnect via /api/chat/resume).

Replaces the source-text assertions in tests/test_compare_stop_disconnect_poll.py with runtime tests that actually exercise the cancellation contract: a Compare-shaped generator is cancelled mid-await (not after the next chunk arrives) and saves its partial exactly once; a normal completion still saves exactly once via the completion path; agent_runs keeps a detached run alive when its subscriber disconnects and only stops it on an explicit stop()/cancel (also saving the partial exactly once); and the cancellation contract is pinned for both chat-mode- and agent-mode-shaped chunk sequences.
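The detached-run contract that the new tests pin can be modelled the same way. `DetachedRuns`, `run`, and their methods below are hypothetical: from this commit the real `agent_runs` is only known to expose `start(session, gen)`, `subscribe(session)`, and an explicit stop/cancel path, and its internals may well differ. The sketch shows a subscriber disconnecting without affecting the run, and an explicit stop cancelling the run mid-await with the partial saved once.

```python
import asyncio


class DetachedRuns:
    """Hypothetical, minimal model of a detached-run manager (NOT the real agent_runs)."""

    def __init__(self):
        self._tasks, self._buffers, self._subscribers = {}, {}, {}

    def start(self, session, gen):
        # The run is a background task that owns the generator's lifetime.
        self._buffers[session] = []
        self._subscribers[session] = set()
        self._tasks[session] = asyncio.create_task(self._pump(session, gen))

    async def _pump(self, session, gen):
        try:
            async for chunk in gen:
                self._buffers[session].append(chunk)
                for queue in self._subscribers[session]:
                    queue.put_nowait(chunk)
        finally:
            for queue in self._subscribers[session]:
                queue.put_nowait(None)  # end-of-stream marker for live subscribers

    def is_running(self, session):
        return not self._tasks[session].done()

    def buffered(self, session):
        return list(self._buffers[session])

    async def subscribe(self, session):
        # Replay buffered output, then follow live chunks.
        queue = asyncio.Queue()
        self._subscribers[session].add(queue)
        for chunk in self._buffers[session]:
            queue.put_nowait(chunk)
        if not self.is_running(session):
            queue.put_nowait(None)
        try:
            while (chunk := await queue.get()) is not None:
                yield chunk
        finally:
            # A disconnecting subscriber is only removed; the run is untouched.
            self._subscribers[session].discard(queue)

    async def stop(self, session):
        # Explicit stop: the only path that cancels the run.
        task = self._tasks[session]
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass


async def run(saved):
    """Hypothetical run generator: two quick chunks, then a long silent gap."""
    parts = []
    try:
        for delay, chunk in [(0, "a"), (0, "b"), (10, "c")]:
            await asyncio.sleep(delay)
            parts.append(chunk)
            yield chunk
    except (asyncio.CancelledError, GeneratorExit):
        saved.append(("partial", "".join(parts)))
        raise
    else:
        saved.append(("complete", "".join(parts)))


async def main():
    runs, saved = DetachedRuns(), []
    runs.start("s1", run(saved))
    sub = runs.subscribe("s1")
    first = await sub.__anext__()   # the client receives the first chunk...
    await sub.aclose()              # ...then closes the tab
    await asyncio.sleep(0.05)
    alive = runs.is_running("s1")   # the run survives the disconnect
    saved_before_stop = list(saved)
    await runs.stop("s1")           # explicit stop cancels it mid-await
    return first, alive, saved_before_stop, saved, runs.buffered("s1")


first, alive, saved_before_stop, saved_after_stop, buffered = asyncio.run(main())
print(first, alive, saved_before_stop, saved_after_stop, buffered)
```

Keeping the run in its own task is what makes subscriber disconnects harmless for normal streams, and it is the same decoupling that Compare panes opt out of by being streamed directly.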
+24 -5
@@ -1209,11 +1209,30 @@ def setup_chat_routes(
             finally:
                 _active_streams.pop(session, None)
 
-        # Run the stream as a DETACHED background task so it survives the client
-        # closing the tab / navigating away (true terminal-agent behavior). The
-        # SSE response just subscribes (replay buffered output + live); dropping
-        # the SSE only removes a subscriber — the run keeps going and saves the
-        # assistant message on completion regardless. Reconnect via /api/chat/resume.
+        # Compare panes are short-lived, single-shot generations whose sessions
+        # exist only to drive that one pane — there's nothing to "resume" and
+        # the user expects the pane's Stop button (which aborts the fetch,
+        # closing this SSE) to promptly cancel the upstream LLM call. Detaching
+        # them would keep burning upstream tokens/compute after the pane is
+        # stopped or the comparison is abandoned, and would surface a stale
+        # "still streaming" /resume target for a session nobody will revisit.
+        #
+        # So: stream them directly (no agent_runs wrapping). Starlette cancels
+        # the underlying async generator (raising CancelledError/GeneratorExit
+        # inside it) as soon as it notices the client disconnected — which the
+        # mode-specific except blocks above already handle by saving the
+        # partial response exactly once. This stops the upstream call promptly
+        # without waiting on the next streamed chunk.
+        #
+        # Normal chat/agent streams keep the DETACHED behavior below: they
+        # survive the client closing the tab / navigating away (true
+        # terminal-agent semantics). The SSE response just subscribes (replay
+        # buffered output + live); dropping the SSE only removes a subscriber —
+        # the run keeps going and saves the assistant message on completion
+        # regardless. Reconnect via /api/chat/resume.
+        if compare_mode:
+            return StreamingResponse(_safe_stream(), media_type="text/event-stream")
+
         agent_runs.start(session, _safe_stream())
         return StreamingResponse(agent_runs.subscribe(session), media_type="text/event-stream")
 