fix(webhooks): redact IPv6 addresses in sanitized error messages (#3038)

* fix(webhooks): redact IPv6 addresses in sanitized error messages

sanitize_error() only stripped IPv4 literals, so a failed webhook delivery to an internal IPv6 host (::1, fe80::/fc00:: ...) leaked the address into Webhook.last_error, which is surfaced in the UI. The module already treats internal IPv6 as sensitive (see _PRIVATE_NETWORKS and src/url_safety.py); the scrubber just didn't keep up.

Add an IPv6 redaction pass covering bracketed, full 8-group, and ::-compressed forms. The pattern is scoped to leave clock times ("12:34:56"), MAC addresses, and C++ "::" tokens untouched, and the ::-branch uses a lookahead over a flat character class so there is no nested quantifier to backtrack on (no ReDoS on long colon/hex runs).

Adds tests/test_webhook_sanitize_error_ipv6.py.

* webhook: validate IPv6 candidates with ipaddress, not a regex grammar

Per review on #3038: instead of hand-rolling the IPv6 grammar in a regex (brittle, and easy to over-match colon-heavy text), use a loose regex to find candidate tokens and let ipaddress.ip_address() decide. Only tokens it parses as IPv6 are redacted, so the false-positive guards (clock times, MACs, "std::vector") now come from the stdlib instead of a custom pattern.

This also covers cases the old pattern missed -- zone ids (fe80::1%eth0) and IPv4-mapped addresses -- and no longer partially mangles invalid colon strings (a 9-group token is preserved whole rather than losing its first 8 groups). The bracketed branch is a single greedy class with no X*:X* backtracking; verified ~1ms on 40k-char adversarial input.

Extends the test file with zone-id, IPv4-mapped, and invalid-token cases.
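A minimal sketch of the "let ipaddress.ip_address() decide" step described above, using only the stdlib. The helper name `parses_as_ipv6` and the sample tokens are hypothetical; the helper mirrors the commit's approach of stripping a `%zone` suffix before parsing, then keeping only results that are `IPv6Address` instances.

```python
import ipaddress


def parses_as_ipv6(token: str) -> bool:
    """Hypothetical helper: True only if the stdlib parses `token` as IPv6.

    A %zone suffix is stripped first (as the commit does), so the result does
    not depend on whether this Python version accepts scoped addresses.
    """
    candidate = token.split('%', 1)[0]
    try:
        addr = ipaddress.ip_address(candidate)
    except ValueError:
        return False
    return isinstance(addr, ipaddress.IPv6Address)


# Tokens a loose colon/hex matcher might pick up, with the stdlib's verdict.
samples = {
    '::1': True,                 # loopback, "::"-compressed
    'fe80::1%eth0': True,        # link-local with a zone id
    '::ffff:192.168.0.1': True,  # IPv4-mapped
    '12:34:56': False,           # clock time: 3 groups and no "::"
    'aa:bb:cc:dd:ee:ff': False,  # MAC address: 6 groups and no "::"
    'std::vector': False,        # C++ scope operator: not hex groups
    '1:2:3:4:5:6:7:8:9': False,  # 9 groups: one too many
    '10.0.0.5': False,           # valid IP, but IPv4 (left to the IPv4 pass)
}

for token, expected in samples.items():
    result = parses_as_ipv6(token)
    print(f'{token:20} -> {result}')
    assert result == expected, token
```

The point of the design is that every rejection above comes from the stdlib parser, not from a hand-written negative pattern.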
* webhook: redact bracketed/scoped/IPv4-mapped IPv6 as one unit

Review on #3038 found a few IP forms that sanitize_error() left partially redacted or malformed:

    [fe80::1%eth0]:8080        -> [[redacted]]:8080
    [::ffff:192.168.0.1]:8080  -> [[redacted][redacted]]:8080
    ::ffff:192.168.0.1         -> [redacted][redacted]

Two causes: the bracketed branch's character class dropped zone ids, so scoped addresses fell through to the bare branch and left the brackets and port behind; and the IPv4 pass ran first, stripping the embedded v4 of an IPv4-mapped address so the v6 pass then redacted the "::ffff:" remnant separately.

Fix:
- run the IP-candidate pass before the IPv4 pass, so IPv4-mapped forms are matched and redacted whole
- match the full bracketed authority ([...] + optional %zone + :port) as a single token, and redact a v4-or-v6 literal inside [ ] as one [redacted]
- extend the bare branch with a bounded (exactly-3) dotted-quad tail for IPv4-mapped forms; exactly-3 so it can't swallow a partial suffix and accidentally preserve an otherwise-valid address

Each form now collapses to a single [redacted]; the candidate finder stays linear (~1.3ms on 40k-char adversarial input). Adds regression tests for the three reported forms and keeps the timestamp/MAC/std::vector coverage.
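The pass-ordering cause and the exactly-3 tail described above can be reproduced with the stdlib alone. This sketch reuses the IPv4 pattern and the bare branch of `_IP_CANDIDATE` from the diff below (zone suffix omitted); `loose_tail` is a hypothetical {1,3}-tail variant that is not in the commit, included only for contrast. The helper `is_ip` and the sample messages are also illustrative.

```python
import ipaddress
import re

# IPv4 pattern as it appears in sanitize_error().
IPV4 = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(:\d+)?')

# With the IPv4 pass first, an IPv4-mapped address is split in two.
msg = 'connect to ::ffff:192.168.0.1 failed'
v4_first = IPV4.sub('[redacted]', msg)
print(v4_first)  # connect to ::ffff:[redacted] failed

# The "::ffff" remnant (trailing ':' dropped) is itself a valid IPv6 address,
# so a later IPv6 pass redacts it separately, giving "[redacted][redacted]".
remnant = ipaddress.ip_address('::ffff')
print(type(remnant).__name__)  # IPv6Address

# Validated as one token, the whole run is a single IPv6 address.
mapped = ipaddress.ip_address('::ffff:192.168.0.1')
print(mapped.ipv4_mapped)  # 192.168.0.1

# Bare branch copied from _IP_CANDIDATE (without the optional %zone part).
BARE = r'(?<![\w.:%])[0-9A-Fa-f]{0,4}(?::[0-9A-Fa-f]{0,4}){2,}'
exact_tail = re.compile(BARE + r'(?:(?:\.[0-9]{1,3}){3})?')
# Hypothetical looser tail, NOT part of the commit.
loose_tail = re.compile(BARE + r'(?:(?:\.[0-9]{1,3}){1,3})?')


def is_ip(token: str) -> bool:
    """Illustrative helper: does the stdlib accept `token` as an IP?"""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


text = 'no reply from fe80::1.2 retries left'
for name, pattern in (('exact', exact_tail), ('loose', loose_tail)):
    token = pattern.search(text).group(0)
    print(name, token, 'redacted' if is_ip(token) else 'left in place')
# exact fe80::1 redacted
# loose fe80::1.2 left in place
```

With the looser tail the candidate grows to the invalid "fe80::1.2", ipaddress rejects it, and the real address inside it would be left unredacted; requiring exactly three dotted groups avoids that.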
2026-06-16 17:55:26 -04:00 · 2026-06-06 23:55:33 -04:00
parent a3cb15d0a1
commit 3940297655
2 changed files with 152 additions and 3 deletions
@@ -136,11 +136,62 @@ def validate_events(events_str: str) -> str:
    return ",".join(events)


+# Broad candidate matcher for the IP-redaction pass. Deliberately loose: a
+# bracketed host authority ([fe80::1%eth0]:8080 and friends) with an optional
+# :port, or a bare IPv6 run — hex groups joined by colons, an optional trailing
+# dotted-quad for IPv4-mapped forms (::ffff:192.168.0.1), and an optional %zone.
+# It does NOT encode the IPv6 grammar; ipaddress.ip_address() is the real
+# validator (see _redact_ip_candidate), so any colon-bearing string it rejects
+# (clock times, MACs, "std::vector") is left alone. Every branch is a single
+# greedy class or a repetition over a mandatory ':'/'.' delimiter, so there is no
+# nested-quantifier backtracking (ReDoS-safe).
+_IP_CANDIDATE = re.compile(
+    r'\[[^\[\]\s]*\](?::\d+)?'
+    r'|(?<![\w.:%])[0-9A-Fa-f]{0,4}(?::[0-9A-Fa-f]{0,4}){2,}'
+    r'(?:(?:\.[0-9]{1,3}){3})?(?:%[0-9A-Za-z._-]+)?'
+)
+
+
+def _redact_ip_candidate(match: re.Match) -> str:
+    """Redact a candidate token that the stdlib confirms is an IP address.
+
+    A bare token is redacted only when it parses as IPv6 — bare IPv4 is left to
+    the dedicated IPv4 pass. A bracketed token is a host authority, so a v4 or v6
+    literal inside [ ] is redacted as a whole. This keeps output consistent (one
+    [redacted], never nested or partial) for scoped/mapped/ported forms.
+    """
+    token = match.group(0)
+    bracketed = token.startswith('[')
+    candidate = token
+    if bracketed:
+        # Keep only what's inside [...]; the trailing :port is dropped.
+        candidate = candidate[1:candidate.index(']')]
+    # A zone id (fe80::1%eth0) is not part of the address ipaddress parses.
+    candidate = candidate.split('%', 1)[0]
+    # The loose bare pattern can trail one stray ':' (e.g. "::1:" in "host ::1:
+    # down"); drop it unless it's the "::" compression marker.
+    if candidate.endswith(':') and not candidate.endswith('::'):
+        candidate = candidate[:-1]
+    try:
+        addr = ipaddress.ip_address(candidate)
+    except ValueError:
+        return token
+    if bracketed or isinstance(addr, ipaddress.IPv6Address):
+        return '[redacted]'
+    return token
+
+
 def sanitize_error(error: str, max_len: int = 200) -> str:
    """Strip potentially sensitive details from error messages."""
-    # Remove IP addresses and ports
-    cleaned = re.sub(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(:\d+)?', '[redacted]', error)
-    # Remove hostnames in URLs
+    # Redact IPv6 (and bracketed-authority) addresses first, so an IPv4-mapped
+    # form like ::ffff:192.168.0.1 is scrubbed as one unit instead of having its
+    # embedded IPv4 removed first and leaving a stray "::ffff:" behind. Broad
+    # candidates are validated by ipaddress.ip_address(), so the false-positive
+    # guards (clock times, MACs, C++ "::") come from the stdlib, not a regex.
+    cleaned = _IP_CANDIDATE.sub(_redact_ip_candidate, error)
+    # Remove remaining bare IPv4 addresses and ports.
+    cleaned = re.sub(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(:\d+)?', '[redacted]', cleaned)
+    # Remove hostnames in URLs.
    cleaned = re.sub(r'https?://[^\s/]+', '[redacted-url]', cleaned)
    return cleaned[:max_len]
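To try the new behavior outside the module, here is a condensed, self-contained copy of the post-commit code above (docstrings and most comments trimmed), run against the three forms from the review plus a few strings that must survive. The sample messages are invented for illustration and are not taken from tests/test_webhook_sanitize_error_ipv6.py.

```python
import ipaddress
import re

# Condensed copy of the post-commit code from the diff above.
_IP_CANDIDATE = re.compile(
    r'\[[^\[\]\s]*\](?::\d+)?'
    r'|(?<![\w.:%])[0-9A-Fa-f]{0,4}(?::[0-9A-Fa-f]{0,4}){2,}'
    r'(?:(?:\.[0-9]{1,3}){3})?(?:%[0-9A-Za-z._-]+)?'
)


def _redact_ip_candidate(match: re.Match) -> str:
    token = match.group(0)
    bracketed = token.startswith('[')
    candidate = token
    if bracketed:
        candidate = candidate[1:candidate.index(']')]  # drop brackets and :port
    candidate = candidate.split('%', 1)[0]  # drop zone id
    if candidate.endswith(':') and not candidate.endswith('::'):
        candidate = candidate[:-1]  # stray trailing ':' from the loose pattern
    try:
        addr = ipaddress.ip_address(candidate)
    except ValueError:
        return token  # not an IP: clock times, MACs, etc. stay as-is
    if bracketed or isinstance(addr, ipaddress.IPv6Address):
        return '[redacted]'
    return token  # bare IPv4 is left to the IPv4 pass


def sanitize_error(error: str, max_len: int = 200) -> str:
    cleaned = _IP_CANDIDATE.sub(_redact_ip_candidate, error)
    cleaned = re.sub(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(:\d+)?', '[redacted]', cleaned)
    cleaned = re.sub(r'https?://[^\s/]+', '[redacted-url]', cleaned)
    return cleaned[:max_len]


examples = [
    ('connect to [fe80::1%eth0]:8080 refused', 'connect to [redacted] refused'),
    ('POST to [::ffff:192.168.0.1]:8080 timed out', 'POST to [redacted] timed out'),
    ('host ::ffff:192.168.0.1 unreachable', 'host [redacted] unreachable'),
    ('retry at 12:34:56 after std::vector error',
     'retry at 12:34:56 after std::vector error'),
    ('nic aa:bb:cc:dd:ee:ff down', 'nic aa:bb:cc:dd:ee:ff down'),
    ('timeout to 10.0.0.5:443', 'timeout to [redacted]'),
    ('GET http://[::1]:8080/hook failed', 'GET [redacted-url]/hook failed'),
]
for message, expected in examples:
    result = sanitize_error(message)
    print(f'{message!r}\n  -> {result!r}')
    assert result == expected, message
```

Each reported form collapses to a single [redacted], while the clock time, MAC, and "std::vector" strings pass through unchanged because ipaddress rejects them.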