routes.email_helpers._decode_header joined the runs from
email.header.decode_header() with " ". Those runs carry their own
surrounding whitespace (e.g. (b"Re: ", None)), and RFC 2047 §6.2 requires
the whitespace between two adjacent encoded-words to be dropped, so the
join produced a double space after an ASCII prefix ("Re: Jóse"), a
spurious space in "Name <addr>" senders, and a stray space between two
adjacent encoded-words ("Café 日本"). _decode_header backs the inbox list,
message read, search, and the background pollers, so the corruption hit
essentially every non-ASCII subject/sender.
Use email.header.make_header(...) for RFC-correct concatenation, keeping
the existing lossy per-part fallback for malformed/unknown MIME charsets
(make_header raises LookupError there) so the unknown-charset contract in
tests/test_email_decode_header.py still holds.
The sibling mcp_servers.email_server._decode_header was already fixed the
same way (commit 46999de); this brings the routes.email_helpers copy in
line, with regression coverage.
Supported by Claude Opus 4.8
Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>
Three IMAP connection leaks were recently fixed via try/finally
(#1325, #1330, #1423). This commit applies the same pattern to the
remaining callsites that still used inline logout-only cleanup.
routes/email_helpers.py:
- _fetch_sender_thread_context: conn was uninitialized when the outer
try/except returned early on connect failure, causing the finally
block to crash on conn.close()/conn.logout(). Merged the two
separate try blocks into one and added conn=None guard.
- _pre_retrieve_context: ctx_conn.logout() was inside the loop body
with no finally, so any exception in the folder/search loop leaked
the socket. Moved cleanup into a finally block with ctx_conn=None
guard.
mcp_servers/email_server.py:
- _list_emails: multiple inline conn.logout() calls on early-return
paths; exception between them leaked the socket. Wrapped in
try/finally.
- _read_email: same pattern — four separate logout() calls replaced
by a single finally block.
- _reply_to_email: logout() called before the error check, so an
exception in conn.select() leaked the socket. Wrapped in
try/finally.
- _download_attachment: same pattern as _reply_to_email.
Also adds tests/test_imap_leak_fixes.py with 9 regression tests (one
per function/failure-mode) that monkeypatch _imap_connect and assert
conn.logout() is called exactly once even when IMAP operations raise.
Python's imaplib._MAXLINE defaults to 1 MB. Mailboxes with tens of
thousands of messages exceed this on UID SEARCH ALL, crashing with
'got more than 1000000 bytes'.
Set _MAXLINE to 50 MB after opening the connection so large mailboxes
work without error.
Fixes#2883
Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>
A header that declares an unknown or invalid MIME charset (e.g. a malformed
or spam Subject like =?x-unknown-charset?B?...?=) raised an uncaught
LookupError. bytes.decode(..., errors="replace") only handles byte-decode
errors, not codec *lookup* failures, so the "replace" safety net did not
apply.
_decode_header decodes Subject/From/To/Cc for the inbox list, single-message
fetch, and the background mail pollers (routes/email_routes.py,
routes/email_pollers.py, src/builtin_actions.py), so a single bad message
could crash the whole inbox render or the poller loop.
Wrap the per-part decode in try/except (LookupError, ValueError) and fall
back to utf-8/replace. Valid charsets (utf-8, iso-8859-1, ...) are unchanged.
Adds tests/test_email_decode_header.py — the unknown-charset case fails
before this change and passes after.
If c.store() or c.expunge() raised an exception, the connection was
never logged out. Use try/finally to ensure c.logout() is always
called regardless of how the function exits.
When the operator sets AUTH_ENABLED=false, three owner-scoped endpoints still
returned 401 (api/models, api/research/*, api/email/*), so the front-end
redirected the browser to /login and the app was unusable despite auth being
turned off. require_user() in src/auth_helpers.py already documents and honors
this contract (issue #622) via 'if _auth_disabled(): return ""', but these
endpoints did their own get_current_user/is_configured check without it.
Make _require_user (research), the /api/models anti-leak guard, and
email_helpers._require_auth consult _auth_disabled() and let anonymous through
(owner='') only when the operator explicitly disabled auth. The 401 protection
is fully intact when AUTH_ENABLED=true. Verified end-to-end: with
AUTH_ENABLED=false the SPA now loads instead of bouncing to /login.
Hardens issues found in a security review of the current tree (separate from
the cookbook SSH PR):
- Email thread rendering (static/js/emailLibrary.js): the flat read path runs
inbound HTML through the allowlist sanitizer, but the two threaded paths
(_renderTurnsAsBubbles / _renderTurnsFromServer — the default view) injected
server-parsed `body_html` raw into the DOM. A crafted inbound email could
inject arbitrary markup (phishing/form/credential-capture/tracking; full XSS
if a deployment relaxes the script CSP). Now sanitized on all paths.
- Attachment extraction (routes/email_routes.py, routes/email_helpers.py): the
on-disk extraction dir was `ATTACHMENTS_DIR / f"{folder}_{uid}"` with
user-controlled folder/uid and no containment, so a folder like `../../tmp`
could escape ATTACHMENTS_DIR. New attachment_extract_dir() flattens both to a
single safe segment and asserts containment.
- Diagnostics routes (routes/diagnostics_routes.py): /api/db/stats,
/api/rag/stats, /api/test/youtube, /api/test-research relied only on the
global session check (any logged-in user). Now require_admin-gated.
- Defense-in-depth HTML escaping: session HTML export escapes the session name
(routes/session_routes.py); the MCP OAuth page escapes the reflected Host
header / server_id (routes/mcp_routes.py).
- Internal-tool token now compared with secrets.compare_digest (constant time)
in core/middleware.py and app.py.
Adds regression tests in tests/test_security_regressions.py.