feat(email): add Google OAuth2 for Google Workspace / .edu IMAP & SMTP (#237)

* feat(email): add Google OAuth2 for Google Workspace / .edu IMAP & SMTP

Google deprecated basic-auth (password) access for Google Workspace
accounts in May 2025. This means any .edu or org Google email account
could no longer connect via IMAP/SMTP with a username + password —
the email feature was silently broken for a large class of users.

This PR adds full OAuth2 (XOAUTH2) support for Google accounts so
Workspace / .edu emails work out of the box.

## What changed

### Backend
- `core/database.py`: add `oauth_provider`, `oauth_access_token`,
  `oauth_refresh_token`, `oauth_token_expiry`, and `display_name`
  columns to `EmailAccount` + idempotent migration
- `routes/email_helpers.py`: XOAUTH2 auth in `_imap_connect()` and
  `_send_smtp_message()`, automatic token refresh, OAuth fields in
  `_get_email_config()`
- `routes/email_routes.py`: OAuth authorize + callback routes,
  `_smtp_ready()` fix, OAuth fields through `_deliver()` closure,
  `display_name` in `From:` header

### Frontend
- `static/js/settings.js`: "Google Workspace / .edu" provider preset,
  "Connect with Google" button, success/error banner, display name field
- `static/js/document.js`: `_accountCanSend()` recognises OAuth accounts
  as SMTP-capable

* security: sign OAuth state, scope callback by owner, fix quotes & logs

Addresses reviewer feedback on the email OAuth2 PR:

- OAuth state is now HMAC-SHA256 signed (keyed with the app secret from
  secret_storage) encoding account_id + owner + a random nonce, and is
  verified with constant-time comparison in the callback before any
  token write. Replaces the bare account_id state, closing the CSRF /
  state-guessing gap.
- Callback extracts the owner from the verified state and re-checks it
  against EmailAccount.owner before writing tokens, matching the
  ownership guards used elsewhere in the email routes. Single-user mode
  (owner == "") still accepts any account, consistent with
  _assert_owns_account.
- Replaced curly/smart quotes in the Name/Email/Display Name input rows
  with plain ASCII so getElementById lookups and event wiring work.
- Stripped account name, SMTP host/user, owner, and raw provider error
  text from send-config and OAuth logs; failures now surface as generic
  error codes in the redirect instead of raw exception strings.

* test(email): add OAuth2 state, _smtp_ready, and XOAUTH2 tests

Move the OAuth state sign/verify helpers out of the setup_email_routes
closure into module-level make_oauth_state/verify_oauth_state in
email_helpers.py so they can be unit-tested, then add tests/test_email_oauth.py:

- signed state round-trips account_id + owner, nonce is unique per call
- tampered account_id, forged signature, and garbage states are rejected
- _smtp_ready treats an OAuth account (no password) as send-capable, and
  still rejects host+user-only accounts with neither password nor OAuth
- _xoauth2_string / _xoauth2_bytes produce the correct SASL XOAUTH2 framing

14 new tests; existing test_security_regressions.py still passes (28).

* refactor(email): single XOAUTH2 frame helper, use RuntimeError

Polish from self-review before merge:

- Collapse the XOAUTH2 framing to one source of truth: _xoauth2_raw()
  returns the unencoded SASL string used by both the SMTP and IMAP auth
  callbacks (each library base64-encodes it), and _xoauth2_bytes() is
  just its .encode(). Removes the unused base64 _xoauth2_string helper
  and the duplicated inline frame in _send_smtp_message.
- Raise RuntimeError (not bare Exception) for the "OAuth token
  unavailable" path, matching the convention used across src/.
- Update tests accordingly.

All 14 OAuth tests + 28 security regressions pass; SMTP/IMAP XOAUTH2
verified live against a real Workspace account.

* tests(email-oauth): cover the security-sensitive OAuth paths before merge

The previous tests only exercised pure helpers (state signing, _smtp_ready,
XOAUTH2 framing). This adds coverage for the actual token-custody and
ownership behaviour, pinning the real route handlers rather than
re-implementations of their logic.

Real OAuth callback route (pulled live from setup_email_routes()):
- missing code -> generic missing_code redirect, no account id / owner in URL
- provider error -> generic google_error redirect, raw error not echoed
- tampered/invalid state -> invalid_state redirect, auth code never leaked
- signed state with owner mismatch -> token write refused (ownership_error),
  DB row left untouched
- signed state with matching owner -> tokens written encrypted, and only to
  the intended account (a second account stays untouched)

Real accounts-list route:
- exposes oauth_provider status but never the access/refresh token values,
  encrypted or otherwise

Token storage / refresh helpers (isolated in-memory SQLite, mocked HTTP):
- refreshed access token stored encrypted; expiry is a timestamp, not a token
- fresh token uses cache (no refresh call); expired token triggers refresh
- refresh HTTP failure returns None silently, no exception or secret surfaced
- missing client credentials short-circuits to None

Password-account regression:
- password IMAP accounts call conn.login(); OAuth accounts call XOAUTH2
  authenticate() and never login()

28 tests pass (14 prior + 14 new).

* fix(email-oauth): drop raw exception text from token-refresh log

Google token refresh failures now log the account id only, matching
the conservative logging used elsewhere on the OAuth path — no raw
provider/exception details surfacing in logs.

* fix(email-oauth): bring OAuth UI parity to the Integrations email form

The Google Workspace / .edu provider preset, Display Name field, and
Connect-with-Google flow were only wired into the Email-tab account
form. The Integrations-tab form (a separate code path for the same
account type) was missing all three, so the OAuth option was invisible
from that entry point. Mirrors the same PROVIDERS entry, OAuth section,
and connect handler so both forms behave identically.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
This commit is contained in:
Hriday Ranka
2026-06-15 12:02:58 -04:00
committed by GitHub
parent 0750486654
commit 270b8570fc
6 changed files with 1013 additions and 26 deletions
+134 -10
View File
@@ -13,6 +13,8 @@ and `email_pollers.py` (the background loops):
"""
import os
import base64
import time
import imaplib
import smtplib
import email as email_mod
@@ -38,6 +40,106 @@ from src.secret_storage import decrypt as _decrypt
logger = logging.getLogger(__name__)
def _xoauth2_raw(user: str, access_token: str) -> str:
"""The SASL XOAUTH2 initial-response string (unencoded).
Both smtplib.SMTP.auth() and imaplib.IMAP4.authenticate() base64-encode
the value their callback returns, so callers pass this raw form — never
pre-encoded — to avoid double base64.
"""
return f"user={user}\x01auth=Bearer {access_token}\x01\x01"
def _xoauth2_bytes(user: str, access_token: str) -> bytes:
"""Raw XOAUTH2 bytes for imaplib's authenticate() callback."""
return _xoauth2_raw(user, access_token).encode()
def make_oauth_state(account_id: str, owner: str) -> str:
"""Return an HMAC-signed, base64-encoded OAuth state token.
Encodes account_id + owner + a random nonce, signed with the app secret
so the callback can validate that the flow was initiated by an
authenticated, owning user (CSRF / state-forgery protection).
"""
import hmac as _hmac, hashlib as _hl, secrets as _sec
from src.secret_storage import _load_or_create_key
nonce = _sec.token_hex(16)
payload = json.dumps({"a": account_id, "o": owner, "n": nonce}, separators=(",", ":"))
sig = _hmac.new(_load_or_create_key(), payload.encode(), _hl.sha256).hexdigest()
return base64.urlsafe_b64encode(f"{payload}|{sig}".encode()).decode()
def verify_oauth_state(state: str) -> dict | None:
"""Verify an OAuth state token's HMAC signature.
Returns the decoded payload dict ({"a", "o", "n"}) on success, or None if
the token is malformed, tampered, or signed with a different key.
"""
import hmac as _hmac, hashlib as _hl
from src.secret_storage import _load_or_create_key
try:
decoded = base64.urlsafe_b64decode(state.encode()).decode()
payload, sig = decoded.rsplit("|", 1)
expected = _hmac.new(_load_or_create_key(), payload.encode(), _hl.sha256).hexdigest()
if not _hmac.compare_digest(sig, expected):
return None
return json.loads(payload)
except Exception:
return None
def _refresh_google_token(account_id: str) -> str | None:
"""Exchange the stored refresh token for a new access token and persist it."""
import httpx
from core.database import SessionLocal as _SL, EmailAccount as _EA
from src.secret_storage import encrypt as _enc, decrypt as _dec
client_id = os.environ.get("GOOGLE_OAUTH_CLIENT_ID", "")
client_secret = os.environ.get("GOOGLE_OAUTH_CLIENT_SECRET", "")
if not client_id or not client_secret:
return None
db = _SL()
try:
row = db.get(_EA, account_id)
if not row or not row.oauth_refresh_token:
return None
refresh_token = _dec(row.oauth_refresh_token or "")
if not refresh_token:
return None
resp = httpx.post("https://oauth2.googleapis.com/token", data={
"client_id": client_id,
"client_secret": client_secret,
"refresh_token": refresh_token,
"grant_type": "refresh_token",
}, timeout=10)
resp.raise_for_status()
data = resp.json()
access_token = data["access_token"]
row.oauth_access_token = _enc(access_token)
row.oauth_token_expiry = str(int(time.time()) + data.get("expires_in", 3600))
db.commit()
return access_token
except Exception:
logger.warning(f"Google token refresh failed for account {account_id}")
return None
finally:
db.close()
def _get_valid_google_token(account_id: str, cfg: dict) -> str | None:
"""Return a valid Google access token, refreshing if expired or missing."""
from src.secret_storage import decrypt as _dec
access_token = _dec(cfg.get("oauth_access_token") or "")
expiry_str = cfg.get("oauth_token_expiry") or ""
if access_token and expiry_str:
try:
if int(expiry_str) - 60 > time.time():
return access_token
except (ValueError, TypeError):
pass
return _refresh_google_token(account_id)
def _smtp_security_mode(cfg: dict) -> str:
raw = str(cfg.get("smtp_security") or "").strip().lower()
if raw in {"ssl", "starttls", "none"}:
@@ -54,20 +156,29 @@ def _send_smtp_message(cfg: dict, from_addr: str, recipients: list[str], message
port = int(cfg.get("smtp_port") or 465)
user = cfg.get("smtp_user") or ""
password = cfg.get("smtp_password") or ""
def _auth_smtp(smtp):
if cfg.get("oauth_provider") == "google":
token = _get_valid_google_token(cfg.get("account_id"), cfg)
if not token:
raise RuntimeError("Google OAuth token unavailable — reconnect the account")
smtp.ehlo()
smtp.auth("XOAUTH2", lambda challenge=None: _xoauth2_raw(user, token), initial_response_ok=True)
elif user and password:
smtp.login(user, password)
security = _smtp_security_mode(cfg)
if security == "ssl":
with smtplib.SMTP_SSL(host, port, timeout=timeout) as smtp:
if user and password:
smtp.login(user, password)
_auth_smtp(smtp)
smtp.sendmail(from_addr, recipients, message)
return
with smtplib.SMTP(host, port, timeout=timeout) as smtp:
if security == "starttls":
smtp.starttls()
if user and password:
smtp.login(user, password)
_auth_smtp(smtp)
smtp.sendmail(from_addr, recipients, message)
@@ -701,10 +812,16 @@ def _get_email_config(account_id: str | None = None, owner: str = "") -> dict:
"imap_password": _decrypt(row.imap_password or ""),
"imap_starttls": bool(row.imap_starttls),
"from_address": row.from_address or row.imap_user or "",
"oauth_provider": row.oauth_provider or "",
"oauth_access_token": row.oauth_access_token or "",
"oauth_refresh_token": row.oauth_refresh_token or "",
"oauth_token_expiry": row.oauth_token_expiry or "",
"display_name": row.display_name or "",
}
if not (cfg["smtp_host"] and cfg["smtp_user"] and cfg["smtp_password"]):
is_oauth = bool(cfg.get("oauth_provider"))
if not is_oauth and not (cfg["smtp_host"] and cfg["smtp_user"] and cfg["smtp_password"]):
logger.warning(f"SMTP not configured for account {row.name!r}")
if not (cfg["imap_host"] and cfg["imap_user"] and cfg["imap_password"]):
if not is_oauth and not (cfg["imap_host"] and cfg["imap_user"] and cfg["imap_password"]):
logger.warning(f"IMAP not configured for account {row.name!r}")
return cfg
finally:
@@ -825,12 +942,19 @@ def _imap_connect(account_id: str | None = None, owner: str = "",
timeout=timeout,
)
try:
conn.login(cfg["imap_user"], cfg["imap_password"])
if cfg.get("oauth_provider") == "google":
token = _get_valid_google_token(cfg.get("account_id"), cfg)
if not token:
raise RuntimeError("Google OAuth token unavailable — reconnect the account in Settings → Integrations")
conn.authenticate("XOAUTH2", lambda x: _xoauth2_bytes(cfg["imap_user"], token))
else:
conn.login(cfg["imap_user"], cfg["imap_password"])
except Exception:
# A failed AUTHENTICATE (e.g. an Office 365 app password on an
# MFA-enabled tenant, #3174) otherwise orphans the already-connected
# socket; close it before propagating so a misconfigured account
# can't leak one descriptor per retry / background poller pass.
# MFA-enabled tenant, #3174, or an expired/revoked OAuth token)
# otherwise orphans the already-connected socket; close it before
# propagating so a misconfigured account can't leak one descriptor
# per retry / background poller pass.
try:
conn.shutdown()
except Exception: