Commit Graph

6 Commits

Author SHA1 Message Date
SurprisedDuck 78747b56ca Documents: strip PDF marker without corrupting text
_process_pdf prepends "\n\n[PDF content]:" to extracted text, and two
call sites in document_routes.py stripped it with .lstrip("\n[PDF content]:").
str.lstrip(chars) treats its argument as a *set of characters*, so it keeps
eating into the page text that follows the marker: a body starting
with "to the board" is cut down to "he board" because 't', 'o', and ' '
are all in the marker's character set. Replace both sites with a shared
strip_pdf_content_marker() helper that uses str.removeprefix.
2026-06-02 20:35:27 +09:00
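
The difference described in the commit above can be reproduced in a few lines. This is a minimal sketch, not the repository's actual code: the marker string and the helper name come from the commit message, while the constant name PDF_CONTENT_MARKER and the helper body are assumptions, and the page text is attached directly to the marker for illustration.

```python
# Hypothetical constant name; the marker text itself is quoted in the commit message.
PDF_CONTENT_MARKER = "\n\n[PDF content]:"


def strip_pdf_content_marker(text: str) -> str:
    """Assumed shape of the helper: drop the marker only when it is an exact prefix.

    str.removeprefix requires Python 3.9 or later and returns the string
    unchanged when the prefix is absent.
    """
    return text.removeprefix(PDF_CONTENT_MARKER)


extracted = PDF_CONTENT_MARKER + "to the board"

# Old behaviour: lstrip removes every leading character that appears in the
# argument, so it continues past the marker into the page text.
buggy = extracted.lstrip("\n[PDF content]:")

# New behaviour: only the literal marker is removed.
fixed = strip_pdf_content_marker(extracted)

print(repr(buggy))  # 'he board'
print(repr(fixed))  # 'to the board'
```

Text that does not start with the marker passes through strip_pdf_content_marker() unchanged, so the helper is safe to call on input that may or may not have been produced by _process_pdf.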
Duarte Antunes 448401a0fc Harden PDF document markers against cross-owner upload access (#445)
Route PDF lookups through UploadHandler.resolve_upload, reject poisoned pdf_source markers on document create/update, and add regression tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-01 22:38:14 +09:00
Miles df7d32c70c Require document privilege for PDF imports 2026-06-01 18:28:15 +09:00
red person 2f87dbcfbc Show a clear message when PyMuPDF is missing 2026-06-01 18:27:17 +09:00
Duarte Antunes e77d87fa80 Enforce owner checks for upload attachments 2026-06-01 16:47:48 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00