Add ask_user tool: agent-posed multiple-choice questions (#2111)

Let the agent pause and ask the user a multiple-choice question when a
task is genuinely ambiguous and the answer changes what it does next —
choosing between approaches, confirming an assumption, picking a target —
instead of guessing.

Modeled on the existing `ui_control` marker pattern: the `ask_user` tool
returns an `ask_user` payload that the agent loop emits as an SSE event
and then ends the turn. The frontend renders the question with clickable
option buttons, a free-text "Other" input, and an x to dismiss; the user's
choice is sent as the next message and the agent resumes with it in
context.

- src/tool_execution.py: `ask_user` handler — pure UI marker, no I/O.
  Validates a non-empty question + 2..6 options, normalizes string/object
  options, returns the payload.
- src/agent_loop.py: emit the `ask_user` event and break the round loop so
  the turn ends and waits for the user's selection. Stream the question as
  assistant text so it persists/replays (prevents a re-ask loop).
- Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS,
  FUNCTION_TOOL_SCHEMAS, the system-prompt blurb. Not admin-gated (any
  user can be asked); the structured args serialize via the default
  json.dumps path.
- routes/chat_routes.py: relay the `ask_user` event to the client.
- static/js/chat.js + static/style.css: render the question card (options +
  free-text Other + dismiss x; removed once answered). Reuses CSS vars and
  the .modal-close button; emoji go through the monochrome-SVG pipeline.
  Bump chat.js cache pin.
- tests/test_ask_user_tool.py: payload, multi flag, string options, option
  cap, validation errors, serializer round-trip, registration.
This commit is contained in:
Kenny Van de Maele
2026-06-05 11:49:11 +02:00
committed by GitHub
parent 621885ac06
commit 0a2adc9c96
9 changed files with 432 additions and 1 deletions
+31
View File
@@ -335,6 +335,7 @@ If the user asks for a reminder/alarm before the event, pass `reminder_minutes`
"search_chats": "- ```search_chats``` — Search across all chat history. Use when user asks 'did we discuss X?' or 'find the conversation about Y'.",
"pipeline": "- ```pipeline``` — Run a multi-step AI pipeline. Args (JSON) with ordered steps, each specifying a model and prompt. Use for complex workflows.",
"ui_control": "- ```ui_control``` — Control the UI: toggle tools on/off, OPEN PANELS, open email reply drafts, switch models, change themes. Commands: `toggle <name> on/off` (names: bash/shell, web/search, research, incognito, document_editor/documents), `open_panel <name>` (panels: documents, gallery, email, sessions, notes, memories/brain, skills, settings, cookbook), `open_email_reply <uid> <folder> <reply|reply-all|ai-reply>` (opens an email compose document, does NOT send), `set_mode agent/chat`, `switch_model <name>`, `set_theme <preset>`, `create_theme <name> <bg> <fg> <panel> <border> <accent>` (optional key=val for advanced colors AND background effects: bgPattern=<none|dots|synapse|rain|constellations|perlin-flow|petals|sparkles|embers>, bgEffectColor=#RRGGBB, bgEffectIntensity=<num>, bgEffectSize=<num>, frosted=true|false). \"open documents\" / \"open library\" / \"show gallery\" / \"open inbox\" / \"open notes\" / \"open cookbook\" all map to `open_panel <name>`. Theme presets: dark, light, midnight, paper, cyberpunk, retrowave, forest, ocean, ume, copper, terminal, organs, lavender, gpt, claude, cute.",
"ask_user": "- ```ask_user``` — Ask the user a multiple-choice question when the task is genuinely ambiguous and the answer changes what you do next (pick an approach, confirm an assumption, choose a target). Args (JSON): {\"question\": \"...\", \"options\": [{\"label\": \"...\", \"description\": \"...\"?}, ...], \"multi\": false?}. 2-6 options. The user gets clickable buttons; calling this ENDS your turn and their choice comes back as your next message. Prefer sensible defaults — only ask when you truly can't proceed well without their input.",
"list_served_models": "- ```list_served_models``` — Show what the Cookbook (LLM-serving subsystem) is currently running. NO args. Use this for ANY 'what's running' / 'what's serving' / 'show my cookbook' / 'is anything up' query. DO NOT shell out (`ps aux`, `docker ps`, etc.) — this tool is the source of truth. Failed serve tasks include recent logs plus diagnosis/retry suggestions; use those suggestions to call `serve_model` again with an adjusted command when appropriate.",
"stop_served_model": "- ```stop_served_model``` — Stop a running model server. Args (JSON): {\"session_id\": \"<from list_served_models>\"}. Use for 'kill my cookbook' / 'stop the model' / 'shut down vLLM'.",
"tail_serve_output": "- ```tail_serve_output``` — Read the actual tmux stderr/traceback of a CURRENTLY failing cookbook task. Args (JSON): {\"session_id\": \"<from list_served_models>\", \"tail\": 150?}. **Use ONLY after** you just launched something via `serve_model` AND `list_served_models` reports YOUR new task as `crashed`/`error`. DO NOT use it on old stopped/completed download tasks (they're historical noise — won't predict whether a new launch succeeds). DO NOT call it before launching a fresh attempt. When you do call it, bump `tail` to 400+ only if the visible error references 'see root cause above'.",
@@ -1682,6 +1683,7 @@ async def stream_agent_loop(
r"\b[^.\n]{0,140}",
re.IGNORECASE,
)
_awaiting_user = False # set by ask_user → end the turn and wait for a choice
# Document streaming state (persists across rounds)
_doc_acc = "" # accumulated tool-call JSON arguments
@@ -2263,6 +2265,28 @@ async def stream_agent_loop(
f'data: {json.dumps({"type": "ui_control", "data": result})}\n\n'
)
# ask_user: the agent posed a multiple-choice question. Emit it so the
# frontend renders clickable options, then end the turn (below) and
# wait — the user's pick becomes the next message.
if "ask_user" in result:
# The question lives in the tool args. ChatMessage.to_dict()
# replays only role+content to the model next turn — tool_event
# metadata is dropped — so if the question is never in the saved
# assistant text, the model can't see it already asked and will
# loop and re-ask after the user answers. Stream it as assistant
# text (once) so it persists and is replayed. The card shows the
# options only, so this is the single visible copy of the question.
_auq = result["ask_user"]
_auq_q = (_auq.get("question") or "").strip()
if _auq_q and _auq_q not in full_response:
_auq_delta = ("\n\n" if full_response.strip() else "") + _auq_q
full_response += _auq_delta
yield 'data: ' + json.dumps({"delta": _auq_delta}) + '\n\n'
yield (
f'data: {json.dumps({"type": "ask_user", "data": result["ask_user"]})}\n\n'
)
_awaiting_user = True
# Build output for frontend tool bubble.
# Document tools get a short summary — content goes to the editor panel.
output_text = ""
@@ -2392,6 +2416,13 @@ async def stream_agent_loop(
if budget_hit:
break
# ask_user posed a question — stop here and wait for the user's choice.
# Don't feed tool results back or advance a round; the user's selection
# arrives as the next message and the agent resumes from there. The
# question text is already in the streamed response, so it persists.
if _awaiting_user:
break
# Feed results back to LLM for next round
_append_tool_results(messages, round_response, native_tool_calls,
tool_results, tool_result_texts, used_native, round_num,
+1 -1
View File
@@ -34,7 +34,7 @@ TOOL_TAGS = {"bash", "python", "web_search", "web_fetch", "read_file", "write_fi
"send_to_session",
"pipeline",
"manage_session", "manage_memory", "list_models",
"ui_control", "generate_image",
"ui_control", "generate_image", "ask_user",
"manage_tasks", "api_call", "ask_teacher", "manage_skills",
"suggest_document",
"manage_endpoints", "manage_mcp", "manage_webhooks",
+47
View File
@@ -1184,6 +1184,53 @@ async def execute_tool_block(
logger.warning("Public tool policy blocked owner=%r tool=%s", owner, tool)
return desc, result
# ask_user: the agent poses a multiple-choice question to the user to get a
# decision/clarification. This is a pure UI-control marker — no subprocess,
# no filesystem. It returns an `ask_user` payload that the agent loop turns
# into an `ask_user` SSE event and then ENDS the turn, so the chat waits for
# the user's selection (their choice arrives as the next message).
if tool == "ask_user":
import json as _json
question, options, multi = "", [], False
raw = (content or "").strip()
try:
parsed = _json.loads(raw) if raw else {}
except (ValueError, TypeError):
parsed = {}
if isinstance(parsed, dict):
question = str(parsed.get("question", "")).strip()
multi = bool(parsed.get("multi") or parsed.get("multiSelect"))
for opt in (parsed.get("options") or []):
if isinstance(opt, dict):
label = str(opt.get("label", "")).strip()
descr = str(opt.get("description", "")).strip()
elif isinstance(opt, str):
label, descr = opt.strip(), ""
else:
continue
if label:
options.append({"label": label, "description": descr})
else:
question = raw
if not question or len(options) < 2:
return "ask_user: invalid", {
"error": (
"ask_user needs a non-empty `question` and at least 2 `options` "
"(each an object with a `label`, optional `description`)."
),
"exit_code": 1,
}
options = options[:6] # keep the choice list sane
desc = f"ask_user: {question[:80]}"
labels = ", ".join(o["label"] for o in options)
result = {
"ask_user": {"question": question, "options": options, "multi": multi},
"output": f"Asked the user: {question}\nOptions: {labels}\nAwaiting their selection.",
"exit_code": 0,
}
logger.info("Tool executed: %s (%d options, multi=%s)", desc, len(options), multi)
return desc, result
# Background execution: a `bash` block whose first line is the `#!bg`
# marker runs DETACHED — returns a job id immediately so the chat stream
# isn't held open for a multi-minute install/ffmpeg/download. The always-on
+4
View File
@@ -52,6 +52,9 @@ ALWAYS_AVAILABLE = frozenset({
# of topic. Without this, RAG drops it and the agent falls back to
# app_api /api/memory/add which fails with 422 on first attempt.
"manage_memory",
# Ask the user a multiple-choice question for a decision/clarification.
# Always reachable so the agent can pause and ask at any point.
"ask_user",
})
# Tools that the Personal Assistant always has access to during scheduled
@@ -111,6 +114,7 @@ BUILTIN_TOOL_DESCRIPTIONS: Dict[str, str] = {
"list_sessions": "List all chats with their metadata (the UI calls these 'chats'). Use for 'list my chats', 'rename all my chats' (list first, then manage_session to rename each).",
"send_to_session": "Send a message to another chat. Cross-chat communication.",
"search_chats": "Search through chat history across all sessions.",
"ask_user": "Ask the user a multiple-choice question to get a decision or clarification. Use this when the task is genuinely ambiguous and the answer changes what you do next — pick between approaches, confirm an assumption, choose among options — instead of guessing. Provide a clear `question` and 2-6 `options` (each with a short `label`, optional `description`). Calling this ENDS your turn: the user sees clickable buttons and their choice arrives as your next message. Don't use it for things you can decide from context or sensible defaults, or for irreversible-action confirmation if a dedicated flow exists.",
"ui_control": "Control the UI and toggle tools on/off. Use this to turn off / turn on / disable / enable individual tools and features: shell (bash), search (web), research, browser, documents, incognito. Open panels (documents library, gallery, email inbox, sessions, notes, memories/brain, skills, settings, cookbook) via `open_panel <name>`. Use `open_email_reply <uid> <folder> reply` to open an email reply draft document without sending. Also switches between chat/agent modes, changes the current model, and applies/creates themes.",
"list_email_accounts": "List configured email accounts and default status. Use before reading or sending mail when the user mentions Gmail, work mail, custom domain mail, another mailbox, or asks to compare/check multiple inboxes.",
"list_emails": "List emails for a folder/account, newest first, including read messages by default. Shows subject, sender, date, UID, account, and AI summary. Check inbox, find emails needing replies. Supports account from list_email_accounts for Gmail/work/custom mailboxes. For last/latest/newest email, use max_results=1 and unread_only=false.",
+27
View File
@@ -447,6 +447,33 @@ FUNCTION_TOOL_SCHEMAS = [
}
}
},
{
"type": "function",
"function": {
"name": "ask_user",
"description": "Ask the user a multiple-choice question to get a decision or clarification when the task is genuinely ambiguous and the answer changes what you do next (e.g. pick between approaches, confirm an assumption, choose a target). The user sees clickable option buttons; calling this ENDS your turn and their selection arrives as your next message. Prefer sensible defaults over asking — only ask when you truly cannot proceed well without the user's input. Do NOT use it to confirm irreversible/destructive actions that have a dedicated confirmation flow.",
"parameters": {
"type": "object",
"properties": {
"question": {"type": "string", "description": "The question to ask. Be specific and self-contained."},
"options": {
"type": "array",
"description": "2-6 mutually exclusive choices. Each is an object with a short `label` and an optional `description` explaining the trade-off.",
"items": {
"type": "object",
"properties": {
"label": {"type": "string", "description": "Concise choice text the user clicks (1-5 words)."},
"description": {"type": "string", "description": "Optional one-line explanation of this choice."}
},
"required": ["label"]
}
},
"multi": {"type": "boolean", "description": "Set true to let the user select multiple options instead of one. Default false."}
},
"required": ["question", "options"]
}
}
},
{
"type": "function",
"function": {