Mirror of https://github.com/pewdiepie-archdaemon/odysseus.git, synced 2026-06-16 17:55:26 -04:00
Add ask_user tool: agent-posed multiple-choice questions (#2111)
Let the agent pause and ask the user a multiple-choice question when a task is genuinely ambiguous and the answer changes what it does next (choosing between approaches, confirming an assumption, picking a target) instead of guessing.

This is modeled on the existing `ui_control` marker pattern: the `ask_user` tool returns an `ask_user` payload, which the agent loop emits as an SSE event before ending the turn. The frontend renders the question with clickable option buttons, a free-text "Other" input, and an x button to dismiss it. The user's choice is sent as the next message, and the agent resumes with it in context.

- src/tool_execution.py: `ask_user` handler, a pure UI marker with no I/O. Requires a non-empty question and at least 2 options, normalizes string/object options, caps the list at 6, and returns the payload.
- src/agent_loop.py: emit the `ask_user` event and break out of the round loop so the turn ends and waits for the user's selection. The question is also streamed as assistant text so it persists and is replayed, which prevents a re-ask loop.
- Registration: TOOL_TAGS, ALWAYS_AVAILABLE, BUILTIN_TOOL_DESCRIPTIONS, FUNCTION_TOOL_SCHEMAS, and the system-prompt blurb. Not admin-gated (any user can be asked); the structured args serialize via the default json.dumps path.
- routes/chat_routes.py: relay the `ask_user` event to the client.
- static/js/chat.js + static/style.css: render the question card (options, free-text Other, dismiss x; removed once answered). Reuses CSS vars and the .modal-close button; emoji go through the monochrome-SVG pipeline. Bump the chat.js cache pin.
- tests/test_ask_user_tool.py: payload, multi flag, string options, option cap, validation errors, serializer round-trip, registration.
Committed by GitHub
Parent: 621885ac06
Commit: 0a2adc9c96
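To make the round trip concrete, here is a minimal client-side sketch. It assumes only the two frame shapes visible in the agent-loop hunks below (`{"delta": ...}` text frames and `{"type": "ask_user", "data": ...}` events, each sent as `data: <json>` followed by a blank line). The function names are hypothetical, and joining picks with ", " is an assumption: the actual reply is built by static/js/chat.js, which this page does not show.

```python
import json
from typing import Any, Optional


def parse_sse_frames(raw: str) -> list[dict[str, Any]]:
    """Decode every `data: <json>` frame in a raw SSE body."""
    events = []
    for frame in raw.split("\n\n"):
        frame = frame.strip()
        if frame.startswith("data: "):
            events.append(json.loads(frame[len("data: "):]))
    return events


def consume_turn(raw: str) -> tuple[str, Optional[dict[str, Any]]]:
    """Collect the streamed assistant text and any pending ask_user payload."""
    text, pending = "", None
    for event in parse_sse_frames(raw):
        if "delta" in event:
            text += event["delta"]
        elif event.get("type") == "ask_user":
            pending = event["data"]
    return text, pending


def answer_message(question: dict[str, Any], picks: list[str]) -> str:
    """Hypothetical reply format: the chosen label(s) or free-text "Other" answer."""
    if not picks:
        raise ValueError("pick at least one option")
    if not question.get("multi") and len(picks) > 1:
        raise ValueError("single-choice question: pick exactly one option")
    return ", ".join(picks)
```

Because `json.dumps` escapes newlines inside strings, a question containing blank lines still fits in a single frame, so splitting the body on `"\n\n"` is safe for these payloads.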
@@ -335,6 +335,7 @@ If the user asks for a reminder/alarm before the event, pass `reminder_minutes`
 "search_chats": "- ```search_chats``` — Search across all chat history. Use when user asks 'did we discuss X?' or 'find the conversation about Y'.",
 "pipeline": "- ```pipeline``` — Run a multi-step AI pipeline. Args (JSON) with ordered steps, each specifying a model and prompt. Use for complex workflows.",
 "ui_control": "- ```ui_control``` — Control the UI: toggle tools on/off, OPEN PANELS, open email reply drafts, switch models, change themes. Commands: `toggle <name> on/off` (names: bash/shell, web/search, research, incognito, document_editor/documents), `open_panel <name>` (panels: documents, gallery, email, sessions, notes, memories/brain, skills, settings, cookbook), `open_email_reply <uid> <folder> <reply|reply-all|ai-reply>` (opens an email compose document, does NOT send), `set_mode agent/chat`, `switch_model <name>`, `set_theme <preset>`, `create_theme <name> <bg> <fg> <panel> <border> <accent>` (optional key=val for advanced colors AND background effects: bgPattern=<none|dots|synapse|rain|constellations|perlin-flow|petals|sparkles|embers>, bgEffectColor=#RRGGBB, bgEffectIntensity=<num>, bgEffectSize=<num>, frosted=true|false). \"open documents\" / \"open library\" / \"show gallery\" / \"open inbox\" / \"open notes\" / \"open cookbook\" all map to `open_panel <name>`. Theme presets: dark, light, midnight, paper, cyberpunk, retrowave, forest, ocean, ume, copper, terminal, organs, lavender, gpt, claude, cute.",
+"ask_user": "- ```ask_user``` — Ask the user a multiple-choice question when the task is genuinely ambiguous and the answer changes what you do next (pick an approach, confirm an assumption, choose a target). Args (JSON): {\"question\": \"...\", \"options\": [{\"label\": \"...\", \"description\": \"...\"?}, ...], \"multi\": false?}. 2-6 options. The user gets clickable buttons; calling this ENDS your turn and their choice comes back as your next message. Prefer sensible defaults — only ask when you truly can't proceed well without their input.",
 "list_served_models": "- ```list_served_models``` — Show what the Cookbook (LLM-serving subsystem) is currently running. NO args. Use this for ANY 'what's running' / 'what's serving' / 'show my cookbook' / 'is anything up' query. DO NOT shell out (`ps aux`, `docker ps`, etc.) — this tool is the source of truth. Failed serve tasks include recent logs plus diagnosis/retry suggestions; use those suggestions to call `serve_model` again with an adjusted command when appropriate.",
 "stop_served_model": "- ```stop_served_model``` — Stop a running model server. Args (JSON): {\"session_id\": \"<from list_served_models>\"}. Use for 'kill my cookbook' / 'stop the model' / 'shut down vLLM'.",
 "tail_serve_output": "- ```tail_serve_output``` — Read the actual tmux stderr/traceback of a CURRENTLY failing cookbook task. Args (JSON): {\"session_id\": \"<from list_served_models>\", \"tail\": 150?}. **Use ONLY after** you just launched something via `serve_model` AND `list_served_models` reports YOUR new task as `crashed`/`error`. DO NOT use it on old stopped/completed download tasks (they're historical noise — won't predict whether a new launch succeeds). DO NOT call it before launching a fresh attempt. When you do call it, bump `tail` to 400+ only if the visible error references 'see root cause above'.",
@@ -1682,6 +1683,7 @@ async def stream_agent_loop(
 r"\b[^.\n]{0,140}",
 re.IGNORECASE,
 )
+_awaiting_user = False # set by ask_user → end the turn and wait for a choice
 
 # Document streaming state (persists across rounds)
 _doc_acc = "" # accumulated tool-call JSON arguments
@@ -2263,6 +2265,28 @@ async def stream_agent_loop(
 f'data: {json.dumps({"type": "ui_control", "data": result})}\n\n'
 )
 
+# ask_user: the agent posed a multiple-choice question. Emit it so the
+# frontend renders clickable options, then end the turn (below) and
+# wait — the user's pick becomes the next message.
+if "ask_user" in result:
+# The question lives in the tool args. ChatMessage.to_dict()
+# replays only role+content to the model next turn — tool_event
+# metadata is dropped — so if the question is never in the saved
+# assistant text, the model can't see it already asked and will
+# loop and re-ask after the user answers. Stream it as assistant
+# text (once) so it persists and is replayed. The card shows the
+# options only, so this is the single visible copy of the question.
+_auq = result["ask_user"]
+_auq_q = (_auq.get("question") or "").strip()
+if _auq_q and _auq_q not in full_response:
+_auq_delta = ("\n\n" if full_response.strip() else "") + _auq_q
+full_response += _auq_delta
+yield 'data: ' + json.dumps({"delta": _auq_delta}) + '\n\n'
+yield (
+f'data: {json.dumps({"type": "ask_user", "data": result["ask_user"]})}\n\n'
+)
+_awaiting_user = True
+
 # Build output for frontend tool bubble.
 # Document tools get a short summary — content goes to the editor panel.
 output_text = ""
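The ordering in this hunk matters: the question is appended to `full_response` and streamed as a `delta` before the structured event, and only if it is not already there, so exactly one visible copy is persisted and replayed. A standalone sketch of that logic, with a hypothetical function name and the surrounding loop state reduced to plain arguments:

```python
import json
from typing import Any


def emit_ask_user(full_response: str, payload: dict[str, Any]) -> tuple[str, list[str]]:
    """Return the updated assistant text plus the SSE frames to yield."""
    frames: list[str] = []
    question = (payload.get("question") or "").strip()
    # Persist the question as assistant text once, separated from any
    # earlier text by a blank line, so the model sees it on replay.
    if question and question not in full_response:
        delta = ("\n\n" if full_response.strip() else "") + question
        full_response += delta
        frames.append("data: " + json.dumps({"delta": delta}) + "\n\n")
    # The structured event the frontend turns into option buttons.
    frames.append(f'data: {json.dumps({"type": "ask_user", "data": payload})}\n\n')
    return full_response, frames
```

Calling it a second time with the same question yields only the event frame, which is the property that keeps a replayed turn from duplicating the text.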
@@ -2392,6 +2416,13 @@ async def stream_agent_loop(
 if budget_hit:
 break
 
+# ask_user posed a question — stop here and wait for the user's choice.
+# Don't feed tool results back or advance a round; the user's selection
+# arrives as the next message and the agent resumes from there. The
+# question text is already in the streamed response, so it persists.
+if _awaiting_user:
+break
+
 # Feed results back to LLM for next round
 _append_tool_results(messages, round_response, native_tool_calls,
 tool_results, tool_result_texts, used_native, round_num,
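The control flow added here can be reduced to a toy loop. `step` and `RoundOutcome` are hypothetical stand-ins for one LLM round plus its tool execution, and the early exit on a round with no tool calls is an assumption about how the real loop finishes. The point it illustrates is that an `ask_user` result stops the loop before any tool results are fed back, so no further round runs until the user's answer arrives as a new message.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundOutcome:
    """What one hypothetical round produced."""
    tool_results: list[dict]
    budget_hit: bool = False


def run_rounds(step: Callable[[int, list], RoundOutcome], max_rounds: int = 8) -> list:
    """Toy round loop; `step` stands in for one LLM call plus its tool execution."""
    messages: list = []
    for round_num in range(max_rounds):
        outcome = step(round_num, messages)
        if not outcome.tool_results:
            break  # assumption: a round without tool calls is the final answer
        awaiting_user = any("ask_user" in r for r in outcome.tool_results)
        if outcome.budget_hit:
            break
        # ask_user ends the turn: no results fed back, no further round.
        if awaiting_user:
            break
        # Otherwise feed the results back for the next round.
        messages.append({"round": round_num, "tool_results": outcome.tool_results})
    return messages
```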
@@ -34,7 +34,7 @@ TOOL_TAGS = {"bash", "python", "web_search", "web_fetch", "read_file", "write_fi
 "send_to_session",
 "pipeline",
 "manage_session", "manage_memory", "list_models",
-"ui_control", "generate_image",
+"ui_control", "generate_image", "ask_user",
 "manage_tasks", "api_call", "ask_teacher", "manage_skills",
 "suggest_document",
 "manage_endpoints", "manage_mcp", "manage_webhooks",
@@ -1184,6 +1184,53 @@ async def execute_tool_block(
 logger.warning("Public tool policy blocked owner=%r tool=%s", owner, tool)
 return desc, result
 
+# ask_user: the agent poses a multiple-choice question to the user to get a
+# decision/clarification. This is a pure UI-control marker — no subprocess,
+# no filesystem. It returns an `ask_user` payload that the agent loop turns
+# into an `ask_user` SSE event and then ENDS the turn, so the chat waits for
+# the user's selection (their choice arrives as the next message).
+if tool == "ask_user":
+import json as _json
+question, options, multi = "", [], False
+raw = (content or "").strip()
+try:
+parsed = _json.loads(raw) if raw else {}
+except (ValueError, TypeError):
+parsed = {}
+if isinstance(parsed, dict):
+question = str(parsed.get("question", "")).strip()
+multi = bool(parsed.get("multi") or parsed.get("multiSelect"))
+for opt in (parsed.get("options") or []):
+if isinstance(opt, dict):
+label = str(opt.get("label", "")).strip()
+descr = str(opt.get("description", "")).strip()
+elif isinstance(opt, str):
+label, descr = opt.strip(), ""
+else:
+continue
+if label:
+options.append({"label": label, "description": descr})
+else:
+question = raw
+if not question or len(options) < 2:
+return "ask_user: invalid", {
+"error": (
+"ask_user needs a non-empty `question` and at least 2 `options` "
+"(each an object with a `label`, optional `description`)."
+),
+"exit_code": 1,
+}
+options = options[:6] # keep the choice list sane
+desc = f"ask_user: {question[:80]}"
+labels = ", ".join(o["label"] for o in options)
+result = {
+"ask_user": {"question": question, "options": options, "multi": multi},
+"output": f"Asked the user: {question}\nOptions: {labels}\nAwaiting their selection.",
+"exit_code": 0,
+}
+logger.info("Tool executed: %s (%d options, multi=%s)", desc, len(options), multi)
+return desc, result
+
 # Background execution: a `bash` block whose first line is the `#!bg`
 # marker runs DETACHED — returns a job id immediately so the chat stream
 # isn't held open for a multi-minute install/ffmpeg/download. The always-on
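The handler's parsing rules can be restated as a pure function, which makes the edge cases easier to see: string and object options are both accepted, unlabeled entries and entries that are neither strings nor objects are skipped, `multiSelect` is accepted as an alias for `multi`, and the 2-option minimum is checked before the list is truncated to 6 (so 7 options are accepted and silently trimmed). This sketch raises `ValueError` where the handler returns its `exit_code: 1` error payload; the name `normalize_ask_user` is hypothetical.

```python
import json
from typing import Any

MAX_OPTIONS = 6  # the handler keeps options[:6]


def normalize_ask_user(content: str) -> dict[str, Any]:
    """Pure-function restatement of the ask_user handler's parsing rules."""
    raw = content.strip()
    try:
        parsed = json.loads(raw) if raw else {}
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        parsed = {}
    if not isinstance(parsed, dict):
        # The handler treats non-object JSON as a bare question with no
        # options, which then fails the 2-option check below.
        parsed = {"question": raw}
    question = str(parsed.get("question", "")).strip()
    multi = bool(parsed.get("multi") or parsed.get("multiSelect"))
    options = []
    for opt in parsed.get("options") or []:
        if isinstance(opt, dict):
            label = str(opt.get("label", "")).strip()
            descr = str(opt.get("description", "")).strip()
        elif isinstance(opt, str):
            label, descr = opt.strip(), ""
        else:
            continue  # numbers, nulls, nested lists are skipped
        if label:
            options.append({"label": label, "description": descr})
    if not question or len(options) < 2:
        raise ValueError("ask_user needs a non-empty question and at least 2 options")
    return {"question": question, "options": options[:MAX_OPTIONS], "multi": multi}
```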
@@ -52,6 +52,9 @@ ALWAYS_AVAILABLE = frozenset({
 # of topic. Without this, RAG drops it and the agent falls back to
 # app_api /api/memory/add which fails with 422 on first attempt.
 "manage_memory",
+# Ask the user a multiple-choice question for a decision/clarification.
+# Always reachable so the agent can pause and ask at any point.
+"ask_user",
 })
 
 # Tools that the Personal Assistant always has access to during scheduled
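The comments in this hunk indicate that `ALWAYS_AVAILABLE` exempts tools from RAG-based tool selection, so `ask_user` stays reachable even when retrieval would not pick it. A minimal sketch of that union, assuming a hypothetical `select_tools` that receives retrieved tool names in rank order; only the entries visible in this diff are reproduced.

```python
# Excerpt: only the entries visible in this diff.
ALWAYS_AVAILABLE = frozenset({"manage_memory", "ask_user"})


def select_tools(retrieved: list[str]) -> list[str]:
    """Hypothetical: RAG-ranked tools first, then any always-available ones."""
    chosen = list(dict.fromkeys(retrieved))  # drop duplicates, keep rank order
    chosen += [t for t in sorted(ALWAYS_AVAILABLE) if t not in chosen]
    return chosen
```

Appending the always-on tools after the ranked ones keeps retrieval order intact while guaranteeing they are present.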
@@ -111,6 +114,7 @@ BUILTIN_TOOL_DESCRIPTIONS: Dict[str, str] = {
 "list_sessions": "List all chats with their metadata (the UI calls these 'chats'). Use for 'list my chats', 'rename all my chats' (list first, then manage_session to rename each).",
 "send_to_session": "Send a message to another chat. Cross-chat communication.",
 "search_chats": "Search through chat history across all sessions.",
+"ask_user": "Ask the user a multiple-choice question to get a decision or clarification. Use this when the task is genuinely ambiguous and the answer changes what you do next — pick between approaches, confirm an assumption, choose among options — instead of guessing. Provide a clear `question` and 2-6 `options` (each with a short `label`, optional `description`). Calling this ENDS your turn: the user sees clickable buttons and their choice arrives as your next message. Don't use it for things you can decide from context or sensible defaults, or for irreversible-action confirmation if a dedicated flow exists.",
 "ui_control": "Control the UI and toggle tools on/off. Use this to turn off / turn on / disable / enable individual tools and features: shell (bash), search (web), research, browser, documents, incognito. Open panels (documents library, gallery, email inbox, sessions, notes, memories/brain, skills, settings, cookbook) via `open_panel <name>`. Use `open_email_reply <uid> <folder> reply` to open an email reply draft document without sending. Also switches between chat/agent modes, changes the current model, and applies/creates themes.",
 "list_email_accounts": "List configured email accounts and default status. Use before reading or sending mail when the user mentions Gmail, work mail, custom domain mail, another mailbox, or asks to compare/check multiple inboxes.",
 "list_emails": "List emails for a folder/account, newest first, including read messages by default. Shows subject, sender, date, UID, account, and AI summary. Check inbox, find emails needing replies. Supports account from list_email_accounts for Gmail/work/custom mailboxes. For last/latest/newest email, use max_results=1 and unread_only=false.",
@@ -447,6 +447,33 @@ FUNCTION_TOOL_SCHEMAS = [
 }
 }
 },
+{
+"type": "function",
+"function": {
+"name": "ask_user",
+"description": "Ask the user a multiple-choice question to get a decision or clarification when the task is genuinely ambiguous and the answer changes what you do next (e.g. pick between approaches, confirm an assumption, choose a target). The user sees clickable option buttons; calling this ENDS your turn and their selection arrives as your next message. Prefer sensible defaults over asking — only ask when you truly cannot proceed well without the user's input. Do NOT use it to confirm irreversible/destructive actions that have a dedicated confirmation flow.",
+"parameters": {
+"type": "object",
+"properties": {
+"question": {"type": "string", "description": "The question to ask. Be specific and self-contained."},
+"options": {
+"type": "array",
+"description": "2-6 mutually exclusive choices. Each is an object with a short `label` and an optional `description` explaining the trade-off.",
+"items": {
+"type": "object",
+"properties": {
+"label": {"type": "string", "description": "Concise choice text the user clicks (1-5 words)."},
+"description": {"type": "string", "description": "Optional one-line explanation of this choice."}
+},
+"required": ["label"]
+}
+},
+"multi": {"type": "boolean", "description": "Set true to let the user select multiple options instead of one. Default false."}
+},
+"required": ["question", "options"]
+}
+}
+},
 {
 "type": "function",
 "function": {
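The schema does not declare `minItems`/`maxItems`: the 2-6 bound appears only in the descriptions, with the minimum enforced by the handler and the maximum applied by truncation. The schema is also stricter than the handler, which additionally accepts bare-string options and a `multiSelect` alias. A small dependency-free checker over an excerpt of the `parameters` object (the helper name `schema_errors` is hypothetical, and it covers only the keywords this schema uses, not JSON Schema in general):

```python
from typing import Any

# Excerpt of the ask_user `parameters` object (descriptions omitted).
ASK_USER_PARAMS: dict[str, Any] = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "options": {
            "type": "array",
            "items": {"type": "object", "required": ["label"]},
        },
        "multi": {"type": "boolean"},
    },
    "required": ["question", "options"],
}

_PY_TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def schema_errors(args: dict[str, Any], params: dict[str, Any] = ASK_USER_PARAMS) -> list[str]:
    """Check args against the keywords this schema uses (not full JSON Schema)."""
    errors = [f"missing {key}" for key in params["required"] if key not in args]
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            continue  # extra keys are allowed: no additionalProperties: false
        if not isinstance(value, _PY_TYPES[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
            continue
        if spec["type"] == "array":
            required = spec["items"].get("required", [])
            for i, item in enumerate(value):
                if not isinstance(item, dict) or any(r not in item for r in required):
                    errors.append(f"{key}[{i}]: missing {', '.join(required)}")
    return errors
```

Arguments that pass this check also pass the handler's parsing as long as they carry at least two labeled options.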