* fix(security): harden untrusted_context_message against delimiter spoofing
Root cause: untrusted_context_message() did not sanitise content before
interpolating it into the <<<UNTRUSTED_SOURCE_DATA>>> / <<<END_UNTRUSTED_SOURCE_DATA>>>
delimited sandbox block. Malicious content embedding the literal delimiter
strings could prematurely close the sandbox and inject instructions that
the LLM treats as trusted.
Fix: add _escape_guard_markers() helper that replaces the guard marker
strings with structurally inert tokens (<<<_UNTRUSTED_DATA>>> and
<<<_END_UNTRUSTED_DATA>>>) before the content is wrapped. The function is
applied in untrusted_context_message() after casting content to str.
The existing ~13 call sites (chat_processor.py, agent_loop.py,
deep_research.py, chat_helpers.py, chat_routes.py) are unaffected because
they pass content through without inspecting the output delimiters.
Regression tests added in tests/test_prompt_security.py covering:
- _escape_guard_markers unit tests (open, close, both, benign passthrough)
- untrusted_context_message integration tests (delimiter spoofing
neutralisation, type coercion, None handling, metadata preservation)
Resolves#3056
* fix(security): sanitize label for newlines and guard markers
Addresses reviewer feedback on PR #3086:
- Normalize label: strip CR/LF to prevent pre-guard line injection
- Escape guard marker literals in label via _escape_guard_markers()
- Add regression tests for label-based newline injection, GUARD_OPEN
and GUARD_CLOSE in label, and exactly-one-structural-guard assertion
* fix(security): move Source label inside GUARD_OPEN block
The reviewer correctly identified that even after sanitizing the label,
any user-derived label text (e.g. `f"web page: {url}"`) still appeared
before GUARD_OPEN in the trusted framing zone, where the LLM treats it
as trusted instructions.
Fix: move the 'Source: {label}' line to inside the guarded block so
only the hardcoded UNTRUSTED_CONTEXT_HEADER sits before GUARD_OPEN.
The raw label is still kept in metadata["source"] for traceability.
_sanitize_label() and _escape_guard_markers() are kept for defence-in-
depth on the label stored inside the block.
Update test_label_newline_injection_is_blocked to assert no label-
derived instruction text appears before GUARD_OPEN (pre-guard zone is
now empty of any user-derived content).