fix(compare): stream Compare panes directly to stop upstream promptly

The previous approach polled request.is_disconnected() inside the
async-for body of the chat/agent streaming loops. That check ran too
late. By the time it executed, __anext__() had already awaited and
consumed the next upstream chunk. A slow or silent generation therefore
kept running after the client disconnected, until that chunk arrived or
a read timeout fired. The poll was also unconditional, so ordinary chat
navigation, refresh, or tab close would have stopped a run that the
detached-run design intentionally keeps going server-side.
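The lateness can be reproduced with a minimal asyncio sketch. silent_upstream and poll_inside_loop are hypothetical names, and an asyncio.Event stands in for request.is_disconnected(). The client "leaves" at 0.05 s. The in-loop check only runs once __anext__() returns the next chunk, about 0.5 s in.

```python
import asyncio
import time


async def silent_upstream():
    # Hypothetical upstream: one chunk, then 0.5 s of silence before the next.
    yield "first"
    await asyncio.sleep(0.5)
    yield "second"


async def poll_inside_loop():
    disconnected = asyncio.Event()  # stands in for request.is_disconnected()

    async def client_leaves():
        await asyncio.sleep(0.05)  # the client disconnects almost immediately
        disconnected.set()

    leaver = asyncio.create_task(client_leaves())
    start = time.monotonic()
    consumed = []
    gen = silent_upstream()
    async for chunk in gen:
        consumed.append(chunk)
        # This poll only runs after __anext__() has already awaited and
        # returned a chunk, so the disconnect is noticed when "second" lands.
        if disconnected.is_set():
            break
    waited = time.monotonic() - start
    await gen.aclose()
    await leaver
    return consumed, waited


if __name__ == "__main__":
    print(asyncio.run(poll_inside_loop()))
```

The second chunk is consumed before the disconnect is seen, and the elapsed time is roughly the full upstream gap rather than the 0.05 s at which the client actually left.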

Both problems trace back to the same root cause. chat_stream always
wraps its generator in agent_runs (the detached-run manager), which
deliberately decouples the generator's lifetime from the SSE response
so that normal chat/agent streams survive the client going away.
Polling for disconnection inside a detached generator can never be
"prompt", because the generator is no longer tied to that request.
Polling there also defeats the whole point of detaching it.
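A minimal sketch of what a detached-run manager does helps show why. It assumes a simple buffer-plus-condition design. DetachedRuns, join, stop, generation and detached_demo are hypothetical names. The class is only loosely modeled on the start()/subscribe() calls visible in the diff; the real agent_runs internals are not shown in this commit. The generator is driven by its own task, so cancelling a subscriber never reaches it. Only cancelling the run task does.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, Optional


@dataclass
class _Run:
    chunks: list = field(default_factory=list)
    finished: bool = False
    changed: asyncio.Condition = field(default_factory=asyncio.Condition)
    task: Optional[asyncio.Task] = None


class DetachedRuns:
    """Hypothetical stand-in for agent_runs: each run is a background task
    whose lifetime does not depend on any SSE subscriber."""

    def __init__(self) -> None:
        self._runs: dict = {}

    def start(self, key: str, agen: AsyncIterator[str]) -> None:
        run = self._runs[key] = _Run()
        run.task = asyncio.create_task(self._pump(run, agen))

    async def _pump(self, run: _Run, agen: AsyncIterator[str]) -> None:
        try:
            async for chunk in agen:  # the run task, not a subscriber, drives agen
                async with run.changed:
                    run.chunks.append(chunk)
                    run.changed.notify_all()
        finally:
            async with run.changed:
                run.finished = True
                run.changed.notify_all()

    async def subscribe(self, key: str) -> AsyncIterator[str]:
        # Replay buffered chunks, then follow live ones. Cancelling this
        # iterator only drops the subscriber; the run task is untouched.
        run, sent = self._runs[key], 0
        while True:
            async with run.changed:
                while sent == len(run.chunks) and not run.finished:
                    await run.changed.wait()
                fresh, done = run.chunks[sent:], run.finished
                sent = len(run.chunks)
            for chunk in fresh:
                yield chunk
            if done:
                return

    async def join(self, key: str) -> None:
        await asyncio.gather(self._runs[key].task, return_exceptions=True)

    async def stop(self, key: str) -> None:
        self._runs[key].task.cancel()  # explicit stop: cancel the run itself
        await self.join(key)


async def generation(saved: list, n: int = 3, delay: float = 0.1):
    parts = []
    try:
        for i in range(n):
            await asyncio.sleep(delay)  # stands in for awaiting the next upstream chunk
            parts.append(f"c{i}")
            yield f"c{i}"
    except (asyncio.CancelledError, GeneratorExit):
        saved.append(("partial", "".join(parts)))  # saved once on cancel/close
        raise
    saved.append(("full", "".join(parts)))  # completion path


async def detached_demo():
    runs = DetachedRuns()
    saved_a, saved_b, got = [], [], []

    async def client():
        async for chunk in runs.subscribe("a"):
            got.append(chunk)

    runs.start("a", generation(saved_a))
    sub = asyncio.create_task(client())
    await asyncio.sleep(0.15)  # one chunk delivered, then the "tab" closes
    sub.cancel()
    await asyncio.gather(sub, return_exceptions=True)
    await runs.join("a")  # the detached run still finishes on its own
    replay = [c async for c in runs.subscribe("a")]  # a later resume-style reconnect

    runs.start("b", generation(saved_b))
    await asyncio.sleep(0.15)
    await runs.stop("b")  # only an explicit stop cancels the run
    return got, saved_a, replay, saved_b
```

Run "a" keeps going after its only subscriber is cancelled and saves the full response. A later subscriber replays the buffer. Run "b" stops only on an explicit stop() and saves its partial once.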

Compare panes don't need (or want) that: each pane's session exists
only to drive that one generation, there's nothing meaningful to
/resume, and the user expects the pane's Stop button (which aborts the
fetch and closes the SSE) to cancel the upstream call right away. So
route compare-mode requests around the agent_runs wrapper entirely and
stream the generator directly as the SSE body. Starlette already
cancels a streaming response's body iterator, raising
CancelledError/GeneratorExit into it, as soon as it notices the client
disconnected. This includes the case where the generator is mid-await
on the next upstream chunk. The existing except (CancelledError,
GeneratorExit) handlers in both the chat-mode and agent-mode loops
already save the partial response exactly once. No polling is needed;
the redesign just stops getting in its own way.
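The direct path can be sketched in a few lines of asyncio. A plain task iterating the generator stands in for Starlette's StreamingResponse body loop. task.cancel() stands in for its disconnect handling. This is an assumption about the mechanism, not Starlette code. pane_generation and direct_stream_demo are hypothetical names. The cancellation lands while the generator is parked on a 10 s upstream wait, so the partial is saved right away and exactly once.

```python
import asyncio
import time


async def pane_generation(saved: list):
    parts = []
    try:
        parts.append("hello")
        yield "hello"
        await asyncio.sleep(10)  # upstream goes silent (hypothetical slow model)
        parts.append(" world")
        yield " world"
    except (asyncio.CancelledError, GeneratorExit):
        saved.append("".join(parts))  # cancel/close path: save the partial once
        raise
    saved.append("".join(parts))  # normal completion path


async def direct_stream_demo():
    saved, sent = [], []

    async def body_loop():
        # Stand-in for a streaming response iterating its body directly.
        async for chunk in pane_generation(saved):
            sent.append(chunk)

    task = asyncio.create_task(body_loop())
    await asyncio.sleep(0.05)  # first chunk sent; generator now awaits upstream
    start = time.monotonic()
    task.cancel()  # stand-in for "client disconnected"
    await asyncio.gather(task, return_exceptions=True)
    return sent, saved, time.monotonic() - start
```

CancelledError is delivered at the await itself, so the except handler runs without any further chunk arriving.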

Normal (non-Compare) chat and agent streams are untouched and still go
through agent_runs, preserving detached-run semantics: runs survive tab
close, navigation, and refresh, and clients can reconnect via
/api/chat/resume.

Replaces the source-text assertions in
tests/test_compare_stop_disconnect_poll.py with runtime tests that
actually exercise the cancellation contract:

- a Compare-shaped generator is cancelled mid-await (not after the next
  chunk arrives) and saves its partial exactly once;
- a normal completion still saves exactly once via the completion path;
- agent_runs keeps a detached run alive when its subscriber disconnects
  and only stops it on an explicit stop()/cancel (also saving the
  partial exactly once);
- the cancellation contract is pinned for both chat-mode- and
  agent-mode-shaped chunk sequences.
Lucas Daniel
2026-06-07 21:13:45 -03:00
committed by GitHub
parent fa7c4f8ea9
commit adc6ac9394
2 changed files with 314 additions and 5 deletions
+24 -5
@@ -1209,11 +1209,30 @@ def setup_chat_routes(
             finally:
                 _active_streams.pop(session, None)
-        # Run the stream as a DETACHED background task so it survives the client
-        # closing the tab / navigating away (true terminal-agent behavior). The
-        # SSE response just subscribes (replay buffered output + live); dropping
-        # the SSE only removes a subscriber — the run keeps going and saves the
-        # assistant message on completion regardless. Reconnect via /api/chat/resume.
+        # Compare panes are short-lived, single-shot generations whose sessions
+        # exist only to drive that one pane — there's nothing to "resume" and
+        # the user expects the pane's Stop button (which aborts the fetch,
+        # closing this SSE) to promptly cancel the upstream LLM call. Detaching
+        # them would keep burning upstream tokens/compute after the pane is
+        # stopped or the comparison is abandoned, and would surface a stale
+        # "still streaming" /resume target for a session nobody will revisit.
+        #
+        # So: stream them directly (no agent_runs wrapping). Starlette cancels
+        # the underlying async generator (raising CancelledError/GeneratorExit
+        # inside it) as soon as it notices the client disconnected — which the
+        # mode-specific except blocks above already handle by saving the
+        # partial response exactly once. This stops the upstream call promptly
+        # without waiting on the next streamed chunk.
+        #
+        # Normal chat/agent streams keep the DETACHED behavior below: they
+        # survive the client closing the tab / navigating away (true
+        # terminal-agent semantics). The SSE response just subscribes (replay
+        # buffered output + live); dropping the SSE only removes a subscriber —
+        # the run keeps going and saves the assistant message on completion
+        # regardless. Reconnect via /api/chat/resume.
+        if compare_mode:
+            return StreamingResponse(_safe_stream(), media_type="text/event-stream")
         agent_runs.start(session, _safe_stream())
         return StreamingResponse(agent_runs.subscribe(session), media_type="text/event-stream")