Isolate untrusted context from visible user prompts (#3584)

Prevent untrusted source/context guard text from being merged into the current visible user request during provider message sanitization.

Changes:
- Detect untrusted context blocks during LLM message sanitization
- Insert a short assistant boundary before the current user request
- Keep the visible user prompt as its own user message
- Preserve normal consecutive user-message merging for non-untrusted cases
- Strengthen prompt-security wording to avoid mentioning guard wrappers
- Add regression coverage for untrusted context followed by a user prompt

Notes:
- Untrusted context remains role:user for safety
- This does not add prompt debug logging
- This does not change frontend draft persistence
This commit is contained in:
Kevin Fiddick
2026-06-27 07:50:04 -05:00
committed by GitHub
parent ebead8083e
commit 8888819d74
4 changed files with 61 additions and 7 deletions
+2
View File
@@ -38,6 +38,8 @@ def test_untrusted_context_policy_marks_sources_as_data():
assert "not instructions" in UNTRUSTED_CONTEXT_POLICY
assert "overrides" in UNTRUSTED_CONTEXT_POLICY
assert "Do not quote" in UNTRUSTED_CONTEXT_POLICY
assert "acknowledge untrusted-source wrapper labels" in UNTRUSTED_CONTEXT_POLICY
# ── secret_storage ─────────────────────────────────────────────