fix(documents): restore PDF library metadata and preview (#2483)

PDF uploads are stored as markdown wrappers with pdf_source or pdf_form_source markers so the editor can preserve extracted text, form fields, and annotations. The library exposed that internal wrapper: auto-created PDF documents used the hashed storage filename as the title, and row/facet language reported markdown instead of pdf.

Derive chat-upload PDF titles from the original upload name, derive document-library display language from the PDF source marker for rows, filters, and facets, and keep markdown wrappers excluded from the markdown facet when they represent PDFs.

The expanded library card already renders PDF-backed documents through /api/document/{id}/render-pdf. Allow only that inline PDF preview endpoint to be framed by same-origin app pages while leaving normal routes on X-Frame-Options: DENY and frame-ancestors none.

Also tighten the existing PDF marker regression assertion so it matches the actual historical corruption signature instead of contradicting the preserved [Page 1 text]: marker.

Fixes #2468
This commit is contained in:
nubs
2026-06-07 21:23:27 +00:00
committed by GitHub
parent 76c1f42ab0
commit 1a0e1c5d69
6 changed files with 160 additions and 7 deletions
+9
View File
@@ -67,6 +67,9 @@ class SecurityHeadersMiddleware(BaseHTTPMiddleware):
# Tool render endpoints are served inside iframes — allow framing by self
is_tool_render = path.startswith("/api/tools/") and path.endswith("/render")
# PDF previews are embedded by the in-app document library. Keep the
# exception route-scoped so normal app pages remain unframeable.
is_document_pdf_preview = path.startswith("/api/document/") and path.endswith("/render-pdf")
# Visual report pages are self-contained HTML — need inline scripts + external images
is_report = path.startswith("/api/research/report/")
@@ -96,6 +99,12 @@ class SecurityHeadersMiddleware(BaseHTTPMiddleware):
# sandbox="allow-scripts" attribute provides isolation.
# Don't overwrite the route's own restrictive CSP either.
pass
elif is_document_pdf_preview:
response.headers["X-Frame-Options"] = "SAMEORIGIN"
response.headers["Content-Security-Policy"] = (
"default-src 'none'; "
"frame-ancestors 'self'"
)
else:
response.headers["X-Frame-Options"] = "DENY"
# NOTE: `style-src 'unsafe-inline'` is intentionally retained.