odysseus

mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-17 02:05:22 -04:00

Author	SHA1	Message	Date
Hriday Ranka	270b8570fc	feat(email): add Google OAuth2 for Google Workspace / .edu IMAP & SMTP (#237) * feat(email): add Google OAuth2 for Google Workspace / .edu IMAP & SMTP Google deprecated basic-auth (password) access for Google Workspace accounts in May 2025. This means any .edu or org Google email account could no longer connect via IMAP/SMTP with a username + password — the email feature was silently broken for a large class of users. This PR adds full OAuth2 (XOAUTH2) support for Google accounts so Workspace / .edu emails work out of the box. ## What changed ### Backend - `core/database.py`: add `oauth_provider`, `oauth_access_token`, `oauth_refresh_token`, `oauth_token_expiry`, and `display_name` columns to `EmailAccount` + idempotent migration - `routes/email_helpers.py`: XOAUTH2 auth in `_imap_connect()` and `_send_smtp_message()`, automatic token refresh, OAuth fields in `_get_email_config()` - `routes/email_routes.py`: OAuth authorize + callback routes, `_smtp_ready()` fix, OAuth fields through `_deliver()` closure, `display_name` in `From:` header ### Frontend - `static/js/settings.js`: "Google Workspace / .edu" provider preset, "Connect with Google" button, success/error banner, display name field - `static/js/document.js`: `_accountCanSend()` recognises OAuth accounts as SMTP-capable * security: sign OAuth state, scope callback by owner, fix quotes & logs Addresses reviewer feedback on the email OAuth2 PR: - OAuth state is now HMAC-SHA256 signed (keyed with the app secret from secret_storage) encoding account_id + owner + a random nonce, and is verified with constant-time comparison in the callback before any token write. Replaces the bare account_id state, closing the CSRF / state-guessing gap. - Callback extracts the owner from the verified state and re-checks it against EmailAccount.owner before writing tokens, matching the ownership guards used elsewhere in the email routes. 
Single-user mode (owner == "") still accepts any account, consistent with _assert_owns_account. - Replaced curly/smart quotes in the Name/Email/Display Name input rows with plain ASCII so getElementById lookups and event wiring work. - Stripped account name, SMTP host/user, owner, and raw provider error text from send-config and OAuth logs; failures now surface as generic error codes in the redirect instead of raw exception strings. * test(email): add OAuth2 state, _smtp_ready, and XOAUTH2 tests Move the OAuth state sign/verify helpers out of the setup_email_routes closure into module-level make_oauth_state/verify_oauth_state in email_helpers.py so they can be unit-tested, then add tests/test_email_oauth.py: - signed state round-trips account_id + owner, nonce is unique per call - tampered account_id, forged signature, and garbage states are rejected - _smtp_ready treats an OAuth account (no password) as send-capable, and still rejects host+user-only accounts with neither password nor OAuth - _xoauth2_string / _xoauth2_bytes produce the correct SASL XOAUTH2 framing 14 new tests; existing test_security_regressions.py still passes (28). * refactor(email): single XOAUTH2 frame helper, use RuntimeError Polish from self-review before merge: - Collapse the XOAUTH2 framing to one source of truth: _xoauth2_raw() returns the unencoded SASL string used by both the SMTP and IMAP auth callbacks (each library base64-encodes it), and _xoauth2_bytes() is just its .encode(). Removes the unused base64 _xoauth2_string helper and the duplicated inline frame in _send_smtp_message. - Raise RuntimeError (not bare Exception) for the "OAuth token unavailable" path, matching the convention used across src/. - Update tests accordingly. All 14 OAuth tests + 28 security regressions pass; SMTP/IMAP XOAUTH2 verified live against a real Workspace account. 
* tests(email-oauth): cover the security-sensitive OAuth paths before merge The previous tests only exercised pure helpers (state signing, _smtp_ready, XOAUTH2 framing). This adds coverage for the actual token-custody and ownership behaviour, pinning the real route handlers rather than re-implementations of their logic. Real OAuth callback route (pulled live from setup_email_routes()): - missing code -> generic missing_code redirect, no account id / owner in URL - provider error -> generic google_error redirect, raw error not echoed - tampered/invalid state -> invalid_state redirect, auth code never leaked - signed state with owner mismatch -> token write refused (ownership_error), DB row left untouched - signed state with matching owner -> tokens written encrypted, and only to the intended account (a second account stays untouched) Real accounts-list route: - exposes oauth_provider status but never the access/refresh token values, encrypted or otherwise Token storage / refresh helpers (isolated in-memory SQLite, mocked HTTP): - refreshed access token stored encrypted; expiry is a timestamp, not a token - fresh token uses cache (no refresh call); expired token triggers refresh - refresh HTTP failure returns None silently, no exception or secret surfaced - missing client credentials short-circuits to None Password-account regression: - password IMAP accounts call conn.login(); OAuth accounts call XOAUTH2 authenticate() and never login() 28 tests pass (14 prior + 14 new). * fix(email-oauth): drop raw exception text from token-refresh log Google token refresh failures now log the account id only, matching the conservative logging used elsewhere on the OAuth path — no raw provider/exception details surfacing in logs. * fix(email-oauth): bring OAuth UI parity to the Integrations email form The Google Workspace / .edu provider preset, Display Name field, and Connect-with-Google flow were only wired into the Email-tab account form. 
The Integrations-tab form (a separate code path for the same account type) was missing all three, so the OAuth option was invisible from that entry point. Mirrors the same PROVIDERS entry, OAuth section, and connect handler so both forms behave identically. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-15 17:02:58 +01:00
RaresKeY	50fedff2f2	fix(email): scope learned sender signatures by owner (#3724)	2026-06-11 13:26:59 +02:00
Sheikh Rahat Mahmud	9180847c0e	feat(diagnostics): add consolidated service health endpoint for degraded-state reporting (#964) * Add consolidated service health endpoint for degraded-state reporting ROADMAP (High Priority) asks for "Better degraded-state reporting for ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no single readout of which subsystems are actually working: /api/health is only a liveness ping and each subsystem's signal lives in a different module, so a misconfigured self-host install gives no consolidated picture. This adds an admin-only GET /api/diagnostics/services endpoint backed by a new src/service_health.py aggregator. Each subsystem reports a uniform {name, status, detail, meta} where status is ok | degraded | down | disabled, and the response rolls up an overall verdict (worst non-disabled status). Probes are deliberately non-intrusive and safe to poll: - ChromaDB: reads the .healthy flags on the RAG and memory vector stores. - SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No search query is run. - ntfy: GET the server's built-in /v1/health. No test notification is sent. - email: short IMAP connect+logout per configured account (no credentials in meta). - providers: probe each enabled ModelEndpoint's model list (no api_key in meta). Probe functions take their inputs as parameters and isolate the network call to injectable callables, so they unit-test without touching the network (same pattern as the merged provider-endpoint tests). Network probes run concurrently off the event loop via asyncio.to_thread with bounded per-probe timeouts. memory_vector is now passed into setup_diagnostics_routes (new optional param, backward-compatible) so ChromaDB's vector-memory store can be reported too. Tests: tests/test_service_health.py — 29 tests covering every status mapping per subsystem, the overall rollup, and that no secrets leak into meta. 
Verification: python -m pytest tests/test_service_health.py -q # 29 passed python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q Backend + tests only; an Admin/Settings UI badge that renders this endpoint is a natural follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(diagnostics): bound service-health wall-clock and redact secrets Addresses review on #964. Blocker 1 — genuinely bounded wall-clock: - providers_health and email_health now fan out per-item probes across a bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET), instead of probing endpoints/accounts sequentially. Stragglers are reported as a controlled `timeout` and never block; the pool is shut down with wait=False so the response returns on time regardless of endpoint/account count. - The IMAP connect path now honors the service-health budget: _imap_connect gained a pass-through `timeout` param and the probe calls it with _PROBE_TIMEOUT instead of the default 15s. - collect_service_health runs the four network subsystems concurrently, each under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop. Blocker 2 — no secret/raw-error leakage in the response: - _safe_url strips userinfo, query, and fragment from every URL surfaced in meta (searxng instance, ntfy base, provider name fallback), keeping only scheme/host/port/path. - _classify_error maps every probe failure to a controlled category token (timeout, connection_refused, dns_error, tls_error, network_error, http_error, auth_or_protocol_error, …) — raw str(exception), which can embed credentialed URLs or server text, is never returned. Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py): - URL userinfo/query redaction for searxng/ntfy/providers. 
- secret-bearing exception strings map to categories and don't leak. - multiple slow providers/accounts stay bounded (single + 25-endpoint cases). - subsystems run concurrently; aggregate deadline yields a controlled result. - route-level unauthenticated (401) / non-admin (403) / admin (200) coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(diagnostics): isolate route tests so they don't leak module globals The new route tests replaced src.service_health.collect_service_health and routes.diagnostics_routes.require_admin via direct assignment, which persisted for the rest of the pytest session. In CI's full alphabetical run that fake collector (returning services=[]) leaked into the later collect_service_health tests and failed them. Switch to monkeypatch.setattr so both are restored after each test. No production code change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-09 16:00:24 +01:00
nubs	932b7f2446	fix(email): close IMAP socket when connect/login fails (#3174) (#3363) * fix(email): close IMAP socket when connect/login fails (#3174) _imap_connect opened a live socket via _open_imap_connection and then called conn.login() with no try/finally, and _open_imap_connection called conn.starttls() unguarded. When auth fails (e.g. an Office 365 app password on an MFA-enabled tenant, #3174) or STARTTLS is rejected, the already-open socket was orphaned. Every IMAP caller funnels through _imap_connect, including the 30-minute _auto_summarize_poller, so a persistently misconfigured account leaked one descriptor per pass toward FD exhaustion. The previously merged leak fixes (#1325/#1330/#1423/#1530) only guard the post-connect body and monkeypatch _imap_connect to succeed, so this connect-time path was uncovered. Wrap login() and starttls() so a failure calls conn.shutdown() (low-level close; logout() can't run pre-auth) before re-raising. Adds two regression tests that fail without the guard. * fix(email): guard MCP IMAP+SMTP connect-time leaks too (#3174) Folds in the sibling connect-time leaks vdmkenny flagged on #3363, so the whole connect-then-step leak class is closed in one place: - mcp_servers/email_server.py::_imap_connect — guard starttls() and login(); close pre-auth with conn.shutdown() before re-raising. - mcp_servers/email_server.py::_smtp_connect — guard starttls() and login(); SMTP has no shutdown(), so close with conn.close() (socket close, no QUIT). Routes SMTP (_send_smtp_message) is already safe via 'with smtplib.SMTP(...)'. Adds four regression tests (one per guard), verified to fail without the fix.	2026-06-08 21:21:41 +02:00
Aman Tewary	d458cade98	docs(email): clarify Outlook password auth failures Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>	2026-06-08 15:32:16 +01:00
Mike	ac94885c84	refactor(constants): single source of truth for data dir (#3368) * refactor(constants): single source of truth for data dir + merge core/src constants Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(contributing): use named src.constants for data paths, drop core/constants references Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 09:58:52 +02:00
SurprisedDuck	b8463e3ac2	fix(email): decode headers without injected spaces (#2433) routes.email_helpers._decode_header joined the runs from email.header.decode_header() with " ". Those runs carry their own surrounding whitespace (e.g. (b"Re: ", None)), and RFC 2047 §6.2 requires the whitespace between two adjacent encoded-words to be dropped, so the join produced a double space after an ASCII prefix ("Re: Jóse"), a spurious space in "Name <addr>" senders, and a stray space between two adjacent encoded-words ("Café 日本"). _decode_header backs the inbox list, message read, search, and the background pollers, so the corruption hit essentially every non-ASCII subject/sender. Use email.header.make_header(...) for RFC-correct concatenation, keeping the existing lossy per-part fallback for malformed/unknown MIME charsets (make_header raises LookupError there) so the unknown-charset contract in tests/test_email_decode_header.py still holds. The sibling mcp_servers.email_server._decode_header was already fixed the same way (commit `46999de`); this brings the routes.email_helpers copy in line, with regression coverage. Supported by Claude Opus 4.8 Co-authored-by: SurprisedDuck <288741682+SurprisedDuck@users.noreply.github.com>	2026-06-07 16:56:20 +02:00
Lucas Daniel	34bd8f0491	fix(email): guarantee IMAP conn.logout() on all exception paths (#1530) Three IMAP connection leaks were recently fixed via try/finally (#1325, #1330, #1423). This commit applies the same pattern to the remaining callsites that still used inline logout-only cleanup. routes/email_helpers.py: - _fetch_sender_thread_context: conn was uninitialized when the outer try/except returned early on connect failure, causing the finally block to crash on conn.close()/conn.logout(). Merged the two separate try blocks into one and added conn=None guard. - _pre_retrieve_context: ctx_conn.logout() was inside the loop body with no finally, so any exception in the folder/search loop leaked the socket. Moved cleanup into a finally block with ctx_conn=None guard. mcp_servers/email_server.py: - _list_emails: multiple inline conn.logout() calls on early-return paths; exception between them leaked the socket. Wrapped in try/finally. - _read_email: same pattern — four separate logout() calls replaced by a single finally block. - _reply_to_email: logout() called before the error check, so an exception in conn.select() leaked the socket. Wrapped in try/finally. - _download_attachment: same pattern as _reply_to_email. Also adds tests/test_imap_leak_fixes.py with 9 regression tests (one per function/failure-mode) that monkeypatch _imap_connect and assert conn.logout() is called exactly once even when IMAP operations raise.	2026-06-07 05:09:28 +01:00
michaelxer	53fd856ea8	fix: raise imaplib line limit for large mailboxes (#2895) Python's imaplib._MAXLINE defaults to 1 MB. Mailboxes with tens of thousands of messages exceed this on UID SEARCH ALL, crashing with 'got more than 1000000 bytes'. Set _MAXLINE to 50 MB after opening the connection so large mailboxes work without error. Fixes #2883 Co-authored-by: michaelxer <michaelxer@users.noreply.github.com>	2026-06-05 22:59:35 +02:00
anduimagui	f9c81f3c8d	fix(email): scope AI caches by owner (#2695)	2026-06-05 02:21:50 +02:00
Shaw	e678ff753f	fix(email): guard _decode_header against unknown MIME charset (#1354) A header that declares an unknown or invalid MIME charset (e.g. a malformed or spam Subject like =?x-unknown-charset?B?...?=) raised an uncaught LookupError. bytes.decode(..., errors="replace") only handles byte-decode errors, not codec lookup failures, so the "replace" safety net did not apply. _decode_header decodes Subject/From/To/Cc for the inbox list, single-message fetch, and the background mail pollers (routes/email_routes.py, routes/email_pollers.py, src/builtin_actions.py), so a single bad message could crash the whole inbox render or the poller loop. Wrap the per-part decode in try/except (LookupError, ValueError) and fall back to utf-8/replace. Valid charsets (utf-8, iso-8859-1, ...) are unchanged. Adds tests/test_email_decode_header.py — the unknown-charset case fails before this change and passes after.	2026-06-03 14:24:20 +09:00
Paulo Victor Cordeiro	4019283eba	fix: IMAP connection leak in _imap_move on store/expunge failure (#1325) If c.store() or c.expunge() raised an exception, the connection was never logged out. Use try/finally to ensure c.logout() is always called regardless of how the function exits.	2026-06-03 02:35:36 +09:00
Vykos	1adf21a7e5	Scope email account workflows by owner (#1309)	2026-06-03 02:21:02 +09:00
Afonso Coutinho	35fa022e2e	fix: email pre-retrieval ignores contacts (reads non-existent email/phone keys) (#1241) * fix: match known email senders against the contact 'emails' list * fix: build contact-match snippets from emails/phones lists	2026-06-03 00:39:31 +09:00
red person	c7ddfd7dd2	Use shared IMAP timeout for account tests (#1088)	2026-06-02 23:11:04 +09:00
mechramc	9d0a18a5b5	Email: add explicit SMTP security mode	2026-06-02 13:15:06 +09:00
Tatlatat	ffb77d7ff2	fix(auth): honor AUTH_ENABLED=false on owner-scoped endpoints (no /login loop) (#880) When the operator sets AUTH_ENABLED=false, three owner-scoped endpoints still returned 401 (api/models, api/research/, api/email/), so the front-end redirected the browser to /login and the app was unusable despite auth being turned off. require_user() in src/auth_helpers.py already documents and honors this contract (issue #622) via 'if _auth_disabled(): return ""', but these endpoints did their own get_current_user/is_configured check without it. Make _require_user (research), the /api/models anti-leak guard, and email_helpers._require_auth consult _auth_disabled() and let anonymous through (owner='') only when the operator explicitly disabled auth. The 401 protection is fully intact when AUTH_ENABLED=true. Verified end-to-end: with AUTH_ENABLED=false the SPA now loads instead of bouncing to /login.	2026-06-02 12:26:26 +09:00
Jamieson O'Reilly	171c29dcf3	Fix email-thread HTML injection, attachment path traversal, and missing authz (#475) Hardens issues found in a security review of the current tree (separate from the cookbook SSH PR): - Email thread rendering (static/js/emailLibrary.js): the flat read path runs inbound HTML through the allowlist sanitizer, but the two threaded paths (_renderTurnsAsBubbles / _renderTurnsFromServer — the default view) injected server-parsed `body_html` raw into the DOM. A crafted inbound email could inject arbitrary markup (phishing/form/credential-capture/tracking; full XSS if a deployment relaxes the script CSP). Now sanitized on all paths. - Attachment extraction (routes/email_routes.py, routes/email_helpers.py): the on-disk extraction dir was `ATTACHMENTS_DIR / f"{folder}_{uid}"` with user-controlled folder/uid and no containment, so a folder like `../../tmp` could escape ATTACHMENTS_DIR. New attachment_extract_dir() flattens both to a single safe segment and asserts containment. - Diagnostics routes (routes/diagnostics_routes.py): /api/db/stats, /api/rag/stats, /api/test/youtube, /api/test-research relied only on the global session check (any logged-in user). Now require_admin-gated. - Defense-in-depth HTML escaping: session HTML export escapes the session name (routes/session_routes.py); the MCP OAuth page escapes the reflected Host header / server_id (routes/mcp_routes.py). - Internal-tool token now compared with secrets.compare_digest (constant time) in core/middleware.py and app.py. Adds regression tests in tests/test_security_regressions.py.	2026-06-01 22:20:17 +09:00
pewdiepie-archdaemon	5ed9b74cd0	Polish email tasks and window controls	2026-06-01 20:56:46 +09:00
pewdiepie-archdaemon	0888a3b3e6	Add native Windows compatibility layer	2026-06-01 15:09:47 +09:00
pewdiepie-archdaemon	e5c99a5eee	Odysseus v1.0	2026-05-31 23:58:26 +09:00

21 Commits
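
For readers unfamiliar with the XOAUTH2 framing mentioned in commit 270b8570fc, here is a minimal sketch of the unencoded SASL string and how it is typically handed to Python's `imaplib` and `smtplib`, both of which base64-encode the callback's return value themselves. The function names `xoauth2_raw` and `xoauth2_bytes` are illustrative stand-ins, not the repository's `_xoauth2_raw` / `_xoauth2_bytes`, and the address and token values are placeholders.

```python
def xoauth2_raw(user: str, access_token: str) -> str:
    """Hypothetical helper: the unencoded SASL XOAUTH2 initial response.

    Format: "user=" <address> ^A "auth=Bearer " <token> ^A ^A,
    where ^A is the control character 0x01.
    """
    return f"user={user}\x01auth=Bearer {access_token}\x01\x01"


def xoauth2_bytes(user: str, access_token: str) -> bytes:
    # imaplib's authenticate() callback is expected to return bytes.
    return xoauth2_raw(user, access_token).encode("utf-8")


# Usage outline (not executed here, since it needs a live server and token):
#
#   import imaplib, smtplib
#
#   imap = imaplib.IMAP4_SSL("imap.gmail.com")
#   imap.authenticate("XOAUTH2", lambda challenge: xoauth2_bytes(user, token))
#
#   smtp = smtplib.SMTP("smtp.gmail.com", 587)
#   smtp.starttls()
#   smtp.ehlo()
#   smtp.auth("XOAUTH2", lambda challenge=None: xoauth2_raw(user, token))
```

Keeping one function for the raw frame and deriving the bytes form from it mirrors the "single source of truth" refactor the commit message describes.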