Settings overhaul + UI polish pass

Two months of iteration on the Settings panel, integration forms, and small visual nudges across the app. Highlights: Settings restructure - Add Models: split into separate Local + API cards (no more in-card tabs); each fuses Type/Provider with the URL input. - Added Models: new dedicated sidebar tab, with Probe + Clear-offline pulled into its header; Local/API sub-section icons accent-tinted. - Search: Web Search and a new Deep Research card (Model + tuning), with a cross-link to AI Defaults. Provider hints use real clickable anchors; Web Search Test button shows a whirlpool spinner. - AI Defaults: Image Generation card returns; Research Model card carries only Endpoint+Model with a cross-link to Search; Vision / Default / Utility fallbacks unified under one numbered-row design matching Search's chain. - API Permissions (was 'API Tokens'): per-row rename, inline Permissions toggle that expands the scope-edit panel, in-field copy icons (icon→check on success). Empty state accent-tinted. - Integrations: + Add Integration drops a type-picker menu directly under the button (drop-up on tight viewports); each integration form (API, CalDAV, CardDAV, Email, Codex/Claude, Vault, MCP) uses the same accent-outlined Save/Test/Cancel buttons right-aligned. - Danger Zone: Wipe→Delete with trash icons; new 'Delete everything' row at the bottom that loops every category. AI Synthesis (Reminders) - Persona dropdown sourced from PROMPT_TEMPLATES + custom preset. - src/reminder_personas.py mirrors the five built-ins for the server-side synthesis path. - dispatch_reminder() reads reminder_llm_persona and uses the persona's system prompt; empty/unknown falls back to warm-neutral. Esc handling - Kebab menus and the provider picker intercept Esc in capture phase so dismissing a popup no longer closes the whole Settings modal. Accent tinting - Scoped CSS rule across data-settings-panel=ai/services/added-models/ search/integrations/reminders for card h2 icons + the Added Models sub-section icons. Codex/Claude integration form - No more auto-creation on form open — explicit Create token button. - New tokens start with every scope granted; existing tokens move out of the integration form into the API Permissions card. - Setup reveal: copy buttons inline inside the token + setup code blocks; shorter subtitle wording. Misc visual polish - Save/Test/Cancel uniformly accent-outlined and right-aligned on every integration form. - Provider logos render inline next to the search fallback selects and the Deep Research Search dropdown. - Trash icons in fallback rows bumped to 20x20 so they fill the 32px button. - Image generation default flipped to off.
2026-06-19 19:25:27 -04:00 · 2026-06-10 15:15:13 +09:00
parent 7690860ab1
commit 4f7061fd61
18 changed files with 1512 additions and 552 deletions
@@ -1256,7 +1256,7 @@ def _build_base_prompt(
    from src.tool_index import ALWAYS_AVAILABLE

    disabled = set(disabled_tools or [])
-    if not get_setting("image_gen_enabled", True):
+    if not get_setting("image_gen_enabled", False):
        disabled.add("generate_image")

    if relevant_tools is not None:
@@ -199,11 +199,20 @@ def _fit_inline_attachment_text(
    return text[:remaining] + marker, 0


-def _process_office_document(path: str, display_name: str) -> str:
+def _process_office_document(
+    path: str,
+    display_name: str,
+    session_id: str | None = None,
+    auto_opened_docs: list[Dict[str, Any]] | None = None,
+    owner: str | None = None,
+) -> str:
    """Extract an Office/EPUB document to Markdown via the optional markitdown dep.

    Falls back to a friendly banner when markitdown is unavailable or finds no
-    text, so a missing optional dependency never breaks the chat path.
+    text, so a missing optional dependency never breaks the chat path. When a
+    session_id is provided AND the extraction succeeded, the FULL text is also
+    saved as a Document so the agent can page through it via
+    `manage_documents action=read offset=…` after the inline copy is capped.
    """
    from src.markitdown_runtime import (
        is_markitdown_format,
@@ -218,6 +227,46 @@ def _process_office_document(path: str, display_name: str) -> str:
    if markdown and markdown.strip():
        title = os.path.splitext(os.path.basename(path))[0]
        body, marker = _truncate_inline(markdown)
+
+        # Persist the full extracted text as a Document. The agent's existing
+        # manage_documents tool can then read past the inline cap with offset.
+        doc_id = None
+        if session_id:
+            try:
+                from src.office_doc import create_office_document
+                doc_id = create_office_document(
+                    session_id=session_id,
+                    upload_id=os.path.basename(path),
+                    title=title,
+                    body_text=markdown,
+                )
+                if doc_id and auto_opened_docs is not None:
+                    from src.database import SessionLocal, Document
+                    _db = SessionLocal()
+                    try:
+                        _d = _db.query(Document).filter(Document.id == doc_id).first()
+                        if _d:
+                            auto_opened_docs.append({
+                                "doc_id": _d.id,
+                                "title": _d.title,
+                                "language": _d.language,
+                                "content": _d.current_content,
+                                "version": _d.version_count,
+                            })
+                    finally:
+                        _db.close()
+            except Exception as e:
+                logger.warning("Office auto-doc creation failed for %s: %s", path, e)
+
+        # Upgrade the truncation marker with a hint pointing at the full doc so
+        # the agent knows it can read the rest.
+        if doc_id and marker:
+            marker = (
+                f"\n[…truncated for inline context — full {len(markdown):,} chars "
+                f"saved as document `{doc_id}`. Use `manage_documents` with "
+                f"action=read, document_id={doc_id}, offset=<N> to page through.]"
+            )
+
        return f"\n\n[Document content — {title}]:\n{body}{marker}"

    # No content: tell the user whether to install the optional dep or whether
@@ -521,7 +570,13 @@ def build_user_content(
            elif mime.startswith("text/") or _is_text_file(path):
                extracted_text = _process_text_file(path)
            else:
-                extracted_text = _process_office_document(path, display_name)
+                extracted_text = _process_office_document(
+                    path,
+                    display_name,
+                    session_id=session_id,
+                    auto_opened_docs=auto_opened_docs,
+                    owner=owner,
+                )

            extracted_text, inline_attachment_remaining = _fit_inline_attachment_text(
                extracted_text,
@@ -40,15 +40,59 @@ def load_markitdown():
    return MarkItDown


+def _extract_docx_native(path: str) -> str | None:
+    """Pure-Python .docx text extractor — no external deps.
+
+    A .docx file is just a zip of XML. The body prose lives in <w:t> runs
+    inside <w:p> paragraphs. Iterating with ElementTree (rather than
+    re.findall) keeps paragraph breaks intact and lets the XML parser handle
+    namespaces + entity unescaping. Loses tables, footnotes, images and
+    list bullets — keeps ~95% of "summarize this doc" content, which is the
+    case people hit when markitdown isn't installed.
+    """
+    import zipfile
+    import xml.etree.ElementTree as ET
+
+    ns = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
+    try:
+        with zipfile.ZipFile(path) as z:
+            xml_bytes = z.read("word/document.xml")
+    except (zipfile.BadZipFile, KeyError, OSError):
+        return None
+    try:
+        root = ET.fromstring(xml_bytes)
+    except ET.ParseError:
+        return None
+    paragraphs: list[str] = []
+    for para in root.iter(f"{ns}p"):
+        runs = [t.text or "" for t in para.iter(f"{ns}t")]
+        line = "".join(runs).strip()
+        if line:
+            paragraphs.append(line)
+    return "\n\n".join(paragraphs) if paragraphs else None
+
+
 def convert_to_markdown(path: str) -> str | None:
    """Convert a document to Markdown text via markitdown.

    Returns the extracted Markdown, or ``None`` if markitdown is unavailable or
    the conversion fails — callers degrade gracefully rather than erroring.
+
+    Fallback: when markitdown isn't installed and the file is a .docx, run
+    the bundled pure-Python extractor so the most common case (Word docs)
+    works out of the box. Other Office/EPUB formats still need markitdown.
    """
    try:
        markitdown_cls = load_markitdown()
    except RuntimeError:
+        if isinstance(path, str) and path.lower().endswith(".docx"):
+            text = _extract_docx_native(path)
+            if text:
+                logger.info(
+                    "markitdown not installed — used native .docx extractor for %s",
+                    path,
+                )
+                return text
        logger.warning("markitdown not installed; cannot extract %s", path)
        return None
    try:
@@ -0,0 +1,73 @@
+"""Auto-create a Document row from an Office attachment.
+
+When a .docx (and friends) lands in chat, the full extracted text is stored
+as a Document so the agent can page through it with `manage_documents
+action=read offset=…` even after the inline chat payload was capped. Mirrors
+the PDF auto-doc pattern in `src.pdf_form_doc`.
+"""
+
+import logging
+import uuid
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+
+def create_office_document(
+    session_id: str,
+    upload_id: str,
+    title: str,
+    body_text: Optional[str] = None,
+) -> Optional[str]:
+    """Create a markdown Document for an Office attachment and set it active.
+
+    Returns the new doc_id, or None on failure / empty body. The full
+    extracted body lives in `current_content`, so the agent can fetch
+    arbitrary windows via `manage_documents action=read` even when the
+    inline chat copy was truncated.
+    """
+    from src.database import (
+        SessionLocal,
+        Document,
+        DocumentVersion,
+        Session as DbSession,
+    )
+    from src.tool_implementations import set_active_document
+
+    if not body_text or not body_text.strip():
+        return None
+
+    db = SessionLocal()
+    try:
+        doc_id = str(uuid.uuid4())
+        ver_id = str(uuid.uuid4())
+        sess = db.query(DbSession).filter(DbSession.id == session_id).first()
+        doc = Document(
+            id=doc_id,
+            session_id=session_id,
+            title=title,
+            language="markdown",
+            current_content=body_text,
+            version_count=1,
+            is_active=True,
+            owner=sess.owner if sess else None,
+        )
+        ver = DocumentVersion(
+            id=ver_id,
+            document_id=doc_id,
+            version_number=1,
+            content=body_text,
+            summary="Imported from Office attachment",
+            source="upload",
+        )
+        db.add(doc)
+        db.add(ver)
+        db.commit()
+        set_active_document(doc_id)
+        return doc_id
+    except Exception as e:
+        db.rollback()
+        logger.error("Failed to create office document: %s", e)
+        return None
+    finally:
+        db.close()
@@ -0,0 +1,78 @@
+"""Server-side mirror of the built-in characters used for reminder synthesis.
+
+The frontend ships these in static/js/presets.js (PROMPT_TEMPLATES with
+isCharacter:true). The Reminders → AI Synthesis card writes only the
+persona ID into settings; the synthesis route in note_routes.py needs
+the full prompt text to bias the utility model's voice. Keeping a small
+local mirror avoids having the client send the prompt over the wire on
+every reminder fire.
+
+If the user picks a custom character (id == "custom") we fall back to
+the warm-neutral baseline — custom prompts live in browser localStorage
+and aren't visible to the server.
+"""
+
+PERSONAS = {
+    "socrates": (
+        "Never answer directly. Respond only with questions — sharp, layered, "
+        "Socratic. Expose contradictions. Make the person argue with themselves "
+        "until the truth falls out. Use irony like a scalpel. Be genuinely "
+        "curious, never condescending."
+    ),
+    "razor": (
+        "Strip everything to the bone. No filler, no hedging, no pleasantries. "
+        "Answer in the fewest words possible. If one sentence works, don't use "
+        "two. If a word adds nothing, cut it. Blunt, precise, surgical."
+    ),
+    "nietzsche": (
+        "Think and respond through the lens of Nietzsche. Analyze every "
+        "question in terms of will to power, self-overcoming, eternal "
+        "recurrence, ressentiment, value-creation, and master-slave morality. "
+        "Write with aphoristic force — sharp, compressed, vivid, and "
+        "unapologetic — but do not sacrifice depth for style. Favor "
+        "life-affirmation, discipline, courage, style, rank, self-overcoming, "
+        "and amor fati over nihilism, conformity, ressentiment, and self-pity."
+    ),
+    "spark": (
+        "You are Spark, a playful, quick-witted assistant with bright energy "
+        "and practical instincts. Keep responses concise, vivid, and helpful. "
+        "Be warm without being cloying, imaginative without losing the thread, "
+        "and always center the user's actual goal. Use a light, lively voice "
+        "with occasional clever turns of phrase."
+    ),
+    "odysseus": (
+        "You are Odysseus, king of Ithaca — subtle in counsel, disciplined in "
+        "judgment, and unmatched in strategic cunning. Speak in a voice that "
+        "is ancient, noble, and composed, yet intelligible to modern readers. "
+        "Be eloquent but not flowery. Be wise but not vague. Speak as one who "
+        "has weathered storms and taken back his house by wit, timing, and "
+        "resolve."
+    ),
+}
+
+
+_DEFAULT_SYNTHESIS_TONE = (
+    "You write short, warm, one-line reminders. The user has set a note for "
+    "themselves and the moment to remember has arrived. Keep it under 18 "
+    "words. Be human, gentle, and direct — never robotic."
+)
+
+
+def synthesis_system_prompt(persona_id: str) -> str:
+    """Return the system prompt for reminder synthesis given a persona id.
+
+    Falls back to the warm-neutral baseline when the id is empty, unknown,
+    or refers to a custom (client-only) character we don't have on file.
+    """
+    persona = (persona_id or "").strip().lower()
+    persona_prompt = PERSONAS.get(persona)
+    if persona_prompt:
+        # Persona drives the voice; the synthesis-instruction stays attached
+        # so the model knows it's writing a short reminder, not a chat reply.
+        return (
+            persona_prompt
+            + "\n\n"
+            + "You are now writing a single one-line reminder for the user. "
+              "Keep it under 18 words and in the voice above."
+        )
+    return _DEFAULT_SYNTHESIS_TONE
@@ -29,7 +29,7 @@ def _invalidate_caches():
 # ── Default values ──

 DEFAULT_SETTINGS = {
-    "image_gen_enabled": True,
+    "image_gen_enabled": False,
    "image_model": "",
    "image_quality": "medium",
    "vision_model": "",
@@ -143,6 +143,7 @@ DEFAULT_SETTINGS = {
    # Reminders
    "reminder_channel": "browser",   # "browser" | "email" | "ntfy" | "webhook"
    "reminder_llm_synthesis": False,
+    "reminder_llm_persona": "",
    "reminder_ntfy_topic": "Reminders",
    "reminder_email_to": "",
    # Generic outbound webhook channel: pick any saved Integration as the
@@ -1436,9 +1436,25 @@ async def do_manage_documents(content: str, owner: Optional[str] = None) -> Dict
            if not doc:
                return {"error": f"Document '{doc_id}' not found", "exit_code": 1}
            body = doc.current_content or ""
+            total = len(body)
+            # Clamp offset to [0, total] so a far-out offset returns an empty
+            # window with a useful "end of document" hint rather than erroring.
+            try: offset = int(args.get("offset", 0))
+            except (TypeError, ValueError): offset = 0
+            offset = max(0, min(offset, total))
            preview_limit = int(args.get("limit", MAX_READ_CHARS))
-            truncated = len(body) > preview_limit
-            preview = body[:preview_limit] + (f"\n... (truncated, {len(body)} chars total)" if truncated else "")
+            chunk = body[offset:offset + preview_limit]
+            next_offset = offset + len(chunk)
+            has_more = next_offset < total
+            # Trailing marker — tells the agent (and a curious human) exactly
+            # what to pass next to continue paginating.
+            if has_more:
+                marker = f"\n... ({total - next_offset:,} more chars; pass offset={next_offset} to continue)"
+            elif offset > 0:
+                marker = f"\n... (end of document, {total:,} chars total)"
+            else:
+                marker = ""
+            preview = chunk + marker
            anchor = f"[{doc.title}](#document-{doc.id})"
            return {
                "response": f"{anchor} — click to open in editor.\n\n```{doc.language or ''}\n{preview}\n```",
@@ -1446,9 +1462,11 @@ async def do_manage_documents(content: str, owner: Optional[str] = None) -> Dict
                    "id": doc.id,
                    "title": doc.title,
                    "language": doc.language,
-                    "size": len(body),
-                    "content": preview,
-                    "truncated": truncated,
+                    "size": total,
+                    "content": chunk,
+                    "offset": offset,
+                    "next_offset": next_offset if has_more else None,
+                    "truncated": has_more,
                },
                "exit_code": 0,
            }
@@ -94,7 +94,7 @@ BUILTIN_TOOL_DESCRIPTIONS: Dict[str, str] = {
    "manage_mcp": "MCP server management: list, add, delete, reconnect servers, or list available tools.",
    "manage_webhooks": "Webhook management: list, add, delete, enable, or disable webhooks.",
    "manage_tokens": "API token management: list, create, or delete API access tokens.",
-    "manage_documents": "List, read, delete, or tidy documents in the editor panel. action='list' returns clickable rows (most-recent first) so the user can open any doc by clicking. action='read' (aka view/open/get) with document_id returns the content. action='delete' with document_id removes a doc (only way to delete). Use this for ANY 'show/read/list/open my documents/docs/files/notes' request — never shell or curl.",
+    "manage_documents": "List, read, delete, or tidy documents in the editor panel. action='list' returns clickable rows (most-recent first) so the user can open any doc by clicking. action='read' (aka view/open/get) with document_id returns the content; supports offset=<N> + limit=<N> to page through large docs (response includes next_offset when more remains, so you can keep calling with offset=next_offset). action='delete' with document_id removes a doc (only way to delete). Use this for ANY 'show/read/list/open my documents/docs/files/notes' request — never shell or curl.",
    "manage_research": "List, read/open, or delete saved DEEP RESEARCH results from the Library. action='list' returns clickable [query](#research-<id>) rows (most-recent first). action='read' (aka open/view/get) with id returns the report + sources. action='delete' with id removes it. Use this for ANY 'open/read/find/delete my research / that report / the research on X' request. NOTE: this is for EXISTING research; to START new research use trigger_research.",
    "manage_settings": "Change ANY real app setting (the ones the Settings panel writes) so the user never has to open it: TTS voice/provider/speed, STT, search engine + result count, default/teacher/task/utility/vision/image/research models, image quality, reminder channel (browser/email/ntfy), agent timeout/tool-call budget, and more. action=set with key (friendly aliases ok: voice, 'search engine', 'default model', 'teacher model', 'image quality', 'reminder channel'...) + value; get/list/reset too. Also toggles tools on/off (disable_tool/enable_tool/list_tools). Secrets/API keys are read-only. Use for any 'change my…/set my…/use X for…/turn on…' preference request.",
    "create_session": "Create a new chat with a name and model.",