Add ask_user tool: agent-posed multiple-choice questions (#2111)

Let the agent pause and ask the user a multiple-choice question when a task is genuinely ambiguous and the answer changes what it does next — choosing between approaches, confirming an assumption, picking a target — instead of guessing. Modeled on the existing `ui_control` marker pattern: the `ask_user` tool returns an `ask_user` payload that the agent loop emits as an SSE event and then ends the turn. The frontend renders the question with clickable option buttons, a free-text "Other" input, and an x to dismiss; the user's choice is sent as the next message and the agent resumes with it in context. - src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O. Validates a non-empty question + 2..6 options, normalizes string/object options, returns the payload. - src/agent_loop.py: emit the `ask_user` event and break the round loop so the turn ends and waits for the user's selection. Stream the question as assistant text so it persists/replays (prevents a re-ask loop). - Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS, FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any user can be asked); the structured args serialize via the default json.dumps path. - routes/chat_routes.py: relay the `ask_user` event to the client. - static/js/chat.js + static/style.css: render the question card (options + free-text Other + dismiss x; removed once answered). Reuses CSS vars and the .modal-close button; emoji go through the monochrome-SVG pipeline. Bump chat.js cache pin. - tests/test_ask_user_tool.py: payload, multi flag, string options, option cap, validation errors, serializer round-trip, registration.
2026-06-16 17:55:26 -04:00 · 2026-06-05 11:49:11 +02:00
parent 621885ac06
commit 0a2adc9c96
9 changed files with 432 additions and 1 deletions
@@ -335,6 +335,7 @@ If the user asks for a reminder/alarm before the event, pass `reminder_minutes`
    "search_chats": "- ```search_chats``` — Search across all chat history. Use when user asks 'did we discuss X?' or 'find the conversation about Y'.",
    "pipeline": "- ```pipeline``` — Run a multi-step AI pipeline. Args (JSON) with ordered steps, each specifying a model and prompt. Use for complex workflows.",
    "ui_control": "- ```ui_control``` — Control the UI: toggle tools on/off, OPEN PANELS, open email reply drafts, switch models, change themes. Commands: `toggle <name> on/off` (names: bash/shell, web/search, research, incognito, document_editor/documents), `open_panel <name>` (panels: documents, gallery, email, sessions, notes, memories/brain, skills, settings, cookbook), `open_email_reply <uid> <folder> <reply|reply-all|ai-reply>` (opens an email compose document, does NOT send), `set_mode agent/chat`, `switch_model <name>`, `set_theme <preset>`, `create_theme <name> <bg> <fg> <panel> <border> <accent>` (optional key=val for advanced colors AND background effects: bgPattern=<none|dots|synapse|rain|constellations|perlin-flow|petals|sparkles|embers>, bgEffectColor=#RRGGBB, bgEffectIntensity=<num>, bgEffectSize=<num>, frosted=true|false). \"open documents\" / \"open library\" / \"show gallery\" / \"open inbox\" / \"open notes\" / \"open cookbook\" all map to `open_panel <name>`. Theme presets: dark, light, midnight, paper, cyberpunk, retrowave, forest, ocean, ume, copper, terminal, organs, lavender, gpt, claude, cute.",
+    "ask_user": "- ```ask_user``` — Ask the user a multiple-choice question when the task is genuinely ambiguous and the answer changes what you do next (pick an approach, confirm an assumption, choose a target). Args (JSON): {\"question\": \"...\", \"options\": [{\"label\": \"...\", \"description\": \"...\"?}, ...], \"multi\": false?}. 2-6 options. The user gets clickable buttons; calling this ENDS your turn and their choice comes back as your next message. Prefer sensible defaults — only ask when you truly can't proceed well without their input.",
    "list_served_models": "- ```list_served_models``` — Show what the Cookbook (LLM-serving subsystem) is currently running. NO args. Use this for ANY 'what's running' / 'what's serving' / 'show my cookbook' / 'is anything up' query. DO NOT shell out (`ps aux`, `docker ps`, etc.) — this tool is the source of truth. Failed serve tasks include recent logs plus diagnosis/retry suggestions; use those suggestions to call `serve_model` again with an adjusted command when appropriate.",
    "stop_served_model": "- ```stop_served_model``` — Stop a running model server. Args (JSON): {\"session_id\": \"<from list_served_models>\"}. Use for 'kill my cookbook' / 'stop the model' / 'shut down vLLM'.",
    "tail_serve_output": "- ```tail_serve_output``` — Read the actual tmux stderr/traceback of a CURRENTLY failing cookbook task. Args (JSON): {\"session_id\": \"<from list_served_models>\", \"tail\": 150?}. **Use ONLY after** you just launched something via `serve_model` AND `list_served_models` reports YOUR new task as `crashed`/`error`. DO NOT use it on old stopped/completed download tasks (they're historical noise — won't predict whether a new launch succeeds). DO NOT call it before launching a fresh attempt. When you do call it, bump `tail` to 400+ only if the visible error references 'see root cause above'.",
@@ -1682,6 +1683,7 @@ async def stream_agent_loop(
        r"\b[^.\n]{0,140}",
        re.IGNORECASE,
    )
+    _awaiting_user = False  # set by ask_user → end the turn and wait for a choice

    # Document streaming state (persists across rounds)
    _doc_acc = ""          # accumulated tool-call JSON arguments
@@ -2263,6 +2265,28 @@ async def stream_agent_loop(
                    f'data: {json.dumps({"type": "ui_control", "data": result})}\n\n'
                )

+            # ask_user: the agent posed a multiple-choice question. Emit it so the
+            # frontend renders clickable options, then end the turn (below) and
+            # wait — the user's pick becomes the next message.
+            if "ask_user" in result:
+                # The question lives in the tool args. ChatMessage.to_dict()
+                # replays only role+content to the model next turn — tool_event
+                # metadata is dropped — so if the question is never in the saved
+                # assistant text, the model can't see it already asked and will
+                # loop and re-ask after the user answers. Stream it as assistant
+                # text (once) so it persists and is replayed. The card shows the
+                # options only, so this is the single visible copy of the question.
+                _auq = result["ask_user"]
+                _auq_q = (_auq.get("question") or "").strip()
+                if _auq_q and _auq_q not in full_response:
+                    _auq_delta = ("\n\n" if full_response.strip() else "") + _auq_q
+                    full_response += _auq_delta
+                    yield 'data: ' + json.dumps({"delta": _auq_delta}) + '\n\n'
+                yield (
+                    f'data: {json.dumps({"type": "ask_user", "data": result["ask_user"]})}\n\n'
+                )
+                _awaiting_user = True
+
            # Build output for frontend tool bubble.
            # Document tools get a short summary — content goes to the editor panel.
            output_text = ""
@@ -2392,6 +2416,13 @@ async def stream_agent_loop(
        if budget_hit:
            break

+        # ask_user posed a question — stop here and wait for the user's choice.
+        # Don't feed tool results back or advance a round; the user's selection
+        # arrives as the next message and the agent resumes from there. The
+        # question text is already in the streamed response, so it persists.
+        if _awaiting_user:
+            break
+
        # Feed results back to LLM for next round
        _append_tool_results(messages, round_response, native_tool_calls,
                             tool_results, tool_result_texts, used_native, round_num,
@@ -34,7 +34,7 @@ TOOL_TAGS = {"bash", "python", "web_search", "web_fetch", "read_file", "write_fi
             "send_to_session",
             "pipeline",
             "manage_session", "manage_memory", "list_models",
-             "ui_control", "generate_image",
+             "ui_control", "generate_image", "ask_user",
             "manage_tasks", "api_call", "ask_teacher", "manage_skills",
             "suggest_document",
             "manage_endpoints", "manage_mcp", "manage_webhooks",
@@ -1184,6 +1184,53 @@ async def execute_tool_block(
        logger.warning("Public tool policy blocked owner=%r tool=%s", owner, tool)
        return desc, result

+    # ask_user: the agent poses a multiple-choice question to the user to get a
+    # decision/clarification. This is a pure UI-control marker — no subprocess,
+    # no filesystem. It returns an `ask_user` payload that the agent loop turns
+    # into an `ask_user` SSE event and then ENDS the turn, so the chat waits for
+    # the user's selection (their choice arrives as the next message).
+    if tool == "ask_user":
+        import json as _json
+        question, options, multi = "", [], False
+        raw = (content or "").strip()
+        try:
+            parsed = _json.loads(raw) if raw else {}
+        except (ValueError, TypeError):
+            parsed = {}
+        if isinstance(parsed, dict):
+            question = str(parsed.get("question", "")).strip()
+            multi = bool(parsed.get("multi") or parsed.get("multiSelect"))
+            for opt in (parsed.get("options") or []):
+                if isinstance(opt, dict):
+                    label = str(opt.get("label", "")).strip()
+                    descr = str(opt.get("description", "")).strip()
+                elif isinstance(opt, str):
+                    label, descr = opt.strip(), ""
+                else:
+                    continue
+                if label:
+                    options.append({"label": label, "description": descr})
+        else:
+            question = raw
+        if not question or len(options) < 2:
+            return "ask_user: invalid", {
+                "error": (
+                    "ask_user needs a non-empty `question` and at least 2 `options` "
+                    "(each an object with a `label`, optional `description`)."
+                ),
+                "exit_code": 1,
+            }
+        options = options[:6]  # keep the choice list sane
+        desc = f"ask_user: {question[:80]}"
+        labels = ", ".join(o["label"] for o in options)
+        result = {
+            "ask_user": {"question": question, "options": options, "multi": multi},
+            "output": f"Asked the user: {question}\nOptions: {labels}\nAwaiting their selection.",
+            "exit_code": 0,
+        }
+        logger.info("Tool executed: %s (%d options, multi=%s)", desc, len(options), multi)
+        return desc, result
+
    # Background execution: a `bash` block whose first line is the `#!bg`
    # marker runs DETACHED — returns a job id immediately so the chat stream
    # isn't held open for a multi-minute install/ffmpeg/download. The always-on
@@ -52,6 +52,9 @@ ALWAYS_AVAILABLE = frozenset({
    # of topic. Without this, RAG drops it and the agent falls back to
    # app_api /api/memory/add which fails with 422 on first attempt.
    "manage_memory",
+    # Ask the user a multiple-choice question for a decision/clarification.
+    # Always reachable so the agent can pause and ask at any point.
+    "ask_user",
 })

 # Tools that the Personal Assistant always has access to during scheduled
@@ -111,6 +114,7 @@ BUILTIN_TOOL_DESCRIPTIONS: Dict[str, str] = {
    "list_sessions": "List all chats with their metadata (the UI calls these 'chats'). Use for 'list my chats', 'rename all my chats' (list first, then manage_session to rename each).",
    "send_to_session": "Send a message to another chat. Cross-chat communication.",
    "search_chats": "Search through chat history across all sessions.",
+    "ask_user": "Ask the user a multiple-choice question to get a decision or clarification. Use this when the task is genuinely ambiguous and the answer changes what you do next — pick between approaches, confirm an assumption, choose among options — instead of guessing. Provide a clear `question` and 2-6 `options` (each with a short `label`, optional `description`). Calling this ENDS your turn: the user sees clickable buttons and their choice arrives as your next message. Don't use it for things you can decide from context or sensible defaults, or for irreversible-action confirmation if a dedicated flow exists.",
    "ui_control": "Control the UI and toggle tools on/off. Use this to turn off / turn on / disable / enable individual tools and features: shell (bash), search (web), research, browser, documents, incognito. Open panels (documents library, gallery, email inbox, sessions, notes, memories/brain, skills, settings, cookbook) via `open_panel <name>`. Use `open_email_reply <uid> <folder> reply` to open an email reply draft document without sending. Also switches between chat/agent modes, changes the current model, and applies/creates themes.",
    "list_email_accounts": "List configured email accounts and default status. Use before reading or sending mail when the user mentions Gmail, work mail, custom domain mail, another mailbox, or asks to compare/check multiple inboxes.",
    "list_emails": "List emails for a folder/account, newest first, including read messages by default. Shows subject, sender, date, UID, account, and AI summary. Check inbox, find emails needing replies. Supports account from list_email_accounts for Gmail/work/custom mailboxes. For last/latest/newest email, use max_results=1 and unread_only=false.",
@@ -447,6 +447,33 @@ FUNCTION_TOOL_SCHEMAS = [
            }
        }
    },
+    {
+        "type": "function",
+        "function": {
+            "name": "ask_user",
+            "description": "Ask the user a multiple-choice question to get a decision or clarification when the task is genuinely ambiguous and the answer changes what you do next (e.g. pick between approaches, confirm an assumption, choose a target). The user sees clickable option buttons; calling this ENDS your turn and their selection arrives as your next message. Prefer sensible defaults over asking — only ask when you truly cannot proceed well without the user's input. Do NOT use it to confirm irreversible/destructive actions that have a dedicated confirmation flow.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "question": {"type": "string", "description": "The question to ask. Be specific and self-contained."},
+                    "options": {
+                        "type": "array",
+                        "description": "2-6 mutually exclusive choices. Each is an object with a short `label` and an optional `description` explaining the trade-off.",
+                        "items": {
+                            "type": "object",
+                            "properties": {
+                                "label": {"type": "string", "description": "Concise choice text the user clicks (1-5 words)."},
+                                "description": {"type": "string", "description": "Optional one-line explanation of this choice."}
+                            },
+                            "required": ["label"]
+                        }
+                    },
+                    "multi": {"type": "boolean", "description": "Set true to let the user select multiple options instead of one. Default false."}
+                },
+                "required": ["question", "options"]
+            }
+        }
+    },
    {
        "type": "function",
        "function": {