fix(compare): stream Compare panes directly to stop upstream promptly

The previous approach polled request.is_disconnected() inside the async-for body of the chat/agent streaming loops. That check runs too late: by the time it executes, __anext__() has already awaited and consumed the next upstream chunk, so a slow or silent generation could keep running for a full round-trip (or until a read timeout) after the client disconnected. The poll was also unconditional, so ordinary chat navigation, refresh, or tab close would have stopped a run that the detached-run design intentionally keeps going server-side.

Both problems share one root cause: chat_stream always wraps its generator in agent_runs (the detached-run manager), which deliberately decouples the generator's lifetime from the SSE response so that normal chat/agent streams survive the client going away. Polling for disconnection inside a detached generator can never be "prompt", because the generator is no longer tied to that request, and doing so defeats the point of detaching it.

Compare panes neither need nor want that behavior: each pane's session exists only to drive that one generation, there is nothing meaningful to /resume, and the user expects the pane's Stop button (which aborts the fetch and closes the SSE) to cancel the upstream call right away. So compare-mode requests now bypass the agent_runs wrapper entirely and stream the generator directly as the SSE body. Starlette already cancels a streaming response's body iterator (raising CancelledError/GeneratorExit into it) as soon as it notices the client disconnected, including while the generator is mid-await on the next upstream chunk, and the existing except (CancelledError, GeneratorExit) handlers in both the chat-mode and agent-mode loops already save the partial response exactly once. No polling is needed; the redesign just stops getting in its own way.

Normal (non-Compare) chat and agent streams are untouched and keep going through agent_runs, preserving detached-run semantics (surviving tab close / navigation / refresh, reconnect via /api/chat/resume).

This commit also replaces the source-text assertions in tests/test_compare_stop_disconnect_poll.py with runtime tests that exercise the cancellation contract: a Compare-shaped generator is cancelled mid-await (not after the next chunk arrives) and saves its partial exactly once; a normal completion still saves exactly once via the completion path; agent_runs keeps a detached run alive when its subscriber disconnects and stops it only on an explicit stop()/cancel (also saving the partial exactly once); and the cancellation contract is pinned for both chat-mode- and agent-mode-shaped chunk sequences.
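
The cancellation contract described above can be illustrated with plain asyncio. This is a minimal sketch, not the project's code or its tests: slow_upstream, compare_stream, consume and demo are hypothetical names, and cancelling the consumer task stands in for Starlette cancelling the response body iterator on disconnect.

```python
import asyncio


async def slow_upstream():
    """Hypothetical upstream: one quick chunk, then a long silence."""
    yield "hello"
    await asyncio.sleep(3600)  # stands in for a slow or silent generation
    yield " world"


async def compare_stream(saved):
    """Compare-shaped generator: forwards chunks, saves the partial on cancel."""
    parts = []
    try:
        async for chunk in slow_upstream():
            parts.append(chunk)
            yield chunk
        # Completion path: reached only if the upstream finishes normally.
        saved.append(("complete", "".join(parts)))
    except (asyncio.CancelledError, GeneratorExit):
        # Cancellation path: reached while still awaiting the next chunk.
        saved.append(("partial", "".join(parts)))
        raise


async def consume(gen, received):
    """Stands in for the server iterating the SSE body (an assumption)."""
    async for chunk in gen:
        received.append(chunk)


async def demo():
    saved, received = [], []
    task = asyncio.create_task(consume(compare_stream(saved), received))
    # Let the first chunk through; the generator is now mid-await on the second.
    await asyncio.sleep(0.05)
    task.cancel()  # simulated client disconnect
    try:
        await task
    except asyncio.CancelledError:
        pass
    return saved, received
```

Running asyncio.run(demo()) returns promptly rather than after the hour-long sleep, with saved holding a single ("partial", "hello") entry: the cancellation lands inside the pending await, and the except handler records the partial once before re-raising.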
2026-06-17 10:15:27 -04:00 · 2026-06-07 21:13:45 -03:00
parent fa7c4f8ea9
commit adc6ac9394
2 changed files with 314 additions and 5 deletions
@@ -1209,11 +1209,30 @@ def setup_chat_routes(
            finally:
                _active_streams.pop(session, None)

-        # Run the stream as a DETACHED background task so it survives the client
-        # closing the tab / navigating away (true terminal-agent behavior). The
-        # SSE response just subscribes (replay buffered output + live); dropping
-        # the SSE only removes a subscriber — the run keeps going and saves the
-        # assistant message on completion regardless. Reconnect via /api/chat/resume.
+        # Compare panes are short-lived, single-shot generations whose sessions
+        # exist only to drive that one pane — there's nothing to "resume" and
+        # the user expects the pane's Stop button (which aborts the fetch,
+        # closing this SSE) to promptly cancel the upstream LLM call. Detaching
+        # them would keep burning upstream tokens/compute after the pane is
+        # stopped or the comparison is abandoned, and would surface a stale
+        # "still streaming" /resume target for a session nobody will revisit.
+        #
+        # So: stream them directly (no agent_runs wrapping). Starlette cancels
+        # the underlying async generator (raising CancelledError/GeneratorExit
+        # inside it) as soon as it notices the client disconnected — which the
+        # mode-specific except blocks above already handle by saving the
+        # partial response exactly once. This stops the upstream call promptly
+        # without waiting on the next streamed chunk.
+        #
+        # Normal chat/agent streams keep the DETACHED behavior below: they
+        # survive the client closing the tab / navigating away (true
+        # terminal-agent semantics). The SSE response just subscribes (replay
+        # buffered output + live); dropping the SSE only removes a subscriber —
+        # the run keeps going and saves the assistant message on completion
+        # regardless. Reconnect via /api/chat/resume.
+        if compare_mode:
+            return StreamingResponse(_safe_stream(), media_type="text/event-stream")
+
        agent_runs.start(session, _safe_stream())
        return StreamingResponse(agent_runs.subscribe(session), media_type="text/event-stream")
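
For contrast with the direct compare-mode path, the detached behavior kept for normal streams can be sketched as follows. This is an illustration under stated assumptions, not the real agent_runs: DetachedRuns, its wait() helper, generation, read_all and detached_demo are all hypothetical, and the subscriber polls a buffer only to keep the sketch short.

```python
import asyncio


class DetachedRuns:
    """Hypothetical stand-in for agent_runs; not the project's implementation."""

    def __init__(self):
        self._buffers = {}
        self._done = {}
        self._tasks = {}

    def start(self, session, gen):
        # The generator is driven by its own task, independent of any reader.
        self._buffers[session] = []
        self._done[session] = asyncio.Event()
        self._tasks[session] = asyncio.create_task(self._pump(session, gen))

    async def _pump(self, session, gen):
        try:
            async for chunk in gen:
                self._buffers[session].append(chunk)
        finally:
            self._done[session].set()

    async def subscribe(self, session):
        # Replay buffered output, then follow live output until the run ends.
        sent, buf = 0, self._buffers[session]
        while True:
            while sent < len(buf):
                yield buf[sent]
                sent += 1
            if self._done[session].is_set():
                return
            await asyncio.sleep(0.01)

    async def wait(self, session):
        await self._done[session].wait()


async def generation(saved):
    """Hypothetical three-chunk generation that records how it ended."""
    parts = []
    try:
        for word in ("a", "b", "c"):
            await asyncio.sleep(0.02)
            parts.append(word)
            yield word
        saved.append(("complete", "".join(parts)))
    except (asyncio.CancelledError, GeneratorExit):
        saved.append(("partial", "".join(parts)))
        raise


async def read_all(stream, out):
    async for chunk in stream:
        out.append(chunk)


async def detached_demo():
    runs, saved, seen = DetachedRuns(), [], []
    runs.start("s1", generation(saved))
    sub = asyncio.create_task(read_all(runs.subscribe("s1"), seen))
    await asyncio.sleep(0.03)
    sub.cancel()  # the subscriber goes away mid-run
    try:
        await sub
    except asyncio.CancelledError:
        pass
    await runs.wait("s1")  # the run itself was never cancelled
    return saved
```

Here cancelling the subscriber only removes a reader: the pump task keeps driving the generator, so the run finishes through its completion path. That is the behavior normal chat/agent streams rely on, and the reason compare-mode requests skip the wrapper when a dropped connection should stop the upstream call.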