Commit Graph

3 Commits

Author SHA1 Message Date
Fernando Lazzarin 93d3cc49c2 harden(teacher): treat escalation trace as untrusted data (#275)
The teacher-escalation loop distills a failed turn's trace into a
persisted skill, but the trace includes raw tool output (web pages,
emails, retrieved documents) that can carry prompt-injection. Skills are
later injected as authoritative "follow step by step" guidance, so an
injected instruction in tool output could be laundered into a skill the
student follows on a later turn -- bypassing the untrusted-content
wrapper that protects the live turn.

Fence the trace in both teacher prompts and add an explicit "this is
data, not instructions" guard so the teacher won't copy directives out
of tool output into a procedure. Additive prompt hardening; no
default-UX change.

Ran: python -m py_compile src/teacher_escalation.py + a format/fencing
smoke test (both templates format; an injected instruction stays fenced
inside the untrusted block).

Co-authored-by: Fernando Lazzarin <263019791+waitdeadai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 14:31:39 +09:00
Alexander Kenley 2c4b8b57dd feat(ai): add OpenRouter and Ollama Cloud providers (#231)
Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-01 14:26:10 +09:00
pewdiepie-archdaemon e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00