mirror of
https://github.com/pewdiepie-archdaemon/odysseus.git
synced 2026-06-15 17:25:26 -04:00
9180847c0e
* Add consolidated service health endpoint for degraded-state reporting
ROADMAP (High Priority) asks for "Better degraded-state reporting for
ChromaDB, SearXNG, email, ntfy, and provider probes." Until now there was no
single readout of which subsystems are actually working: /api/health is only a
liveness ping and each subsystem's signal lives in a different module, so a
misconfigured self-host install gives no consolidated picture.
This adds an admin-only GET /api/diagnostics/services endpoint backed by a new
src/service_health.py aggregator. Each subsystem reports a uniform
{name, status, detail, meta} where status is ok | degraded | down | disabled,
and the response rolls up an overall verdict (worst non-disabled status).
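The rollup rule described above can be sketched as follows. This is a minimal illustration, not the code in src/service_health.py: the helper names `service_result` and `overall_status`, the severity ordering, and the choice to report "disabled" when nothing is enabled are all assumptions.

```python
# Hypothetical sketch of the uniform result shape and the "worst non-disabled
# status" rollup. Names and the severity table are illustrative assumptions.
SEVERITY = {"ok": 0, "degraded": 1, "down": 2}


def service_result(name, status, detail="", meta=None):
    """Build one subsystem entry in the uniform {name, status, detail, meta} shape."""
    return {"name": name, "status": status, "detail": detail, "meta": meta or {}}


def overall_status(services):
    """Return the worst status among subsystems that are not disabled."""
    active = [s["status"] for s in services if s["status"] != "disabled"]
    if not active:
        # Assumption: with nothing enabled there is nothing to be unhealthy.
        return "disabled"
    return max(active, key=SEVERITY.__getitem__)


report = [
    service_result("chromadb", "ok"),
    service_result("searxng", "degraded", "healthz unavailable, root reachable"),
    service_result("ntfy", "disabled", "not configured"),
]
print(overall_status(report))  # degraded
```

A disabled subsystem never drags the verdict down, which matters for self-hosted installs that deliberately leave optional services unconfigured.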
Probes are deliberately non-intrusive and safe to poll:
- ChromaDB: reads the .healthy flags on the RAG and memory vector stores.
- SearXNG: GET /healthz (2xx), falling back to the instance root (<500). No
  search query is run.
- ntfy: GET the server's built-in /v1/health. No test notification is sent.
- email: short IMAP connect+logout per configured account (no credentials in
  meta).
- providers: probe each enabled ModelEndpoint's model list (no api_key in meta).
Probe functions take their inputs as parameters and isolate the network call to
injectable callables, so they unit-test without touching the network (same
pattern as the merged provider-endpoint tests). Network probes run concurrently
off the event loop via asyncio.to_thread with bounded per-probe timeouts.
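A rough sketch of the injectable-callable pattern combined with asyncio.to_thread and a bounded timeout might look like this. The names `ntfy_probe`, `run_probe`, and `http_get`, the timeout values, and the status mapping for non-2xx responses are assumptions for illustration, not the project's actual signatures.

```python
import asyncio

PROBE_TIMEOUT = 3.0  # assumed per-probe network timeout, in seconds


def ntfy_probe(base_url, http_get):
    """Probe ntfy's /v1/health.

    `http_get(url, timeout)` returns an HTTP status code. It is injected so a
    unit test can pass a fake and never touch the network.
    """
    if not base_url:
        return {"name": "ntfy", "status": "disabled", "detail": "not configured", "meta": {}}
    try:
        code = http_get(base_url.rstrip("/") + "/v1/health", PROBE_TIMEOUT)
    except Exception:
        # Report a fixed token rather than the raw exception text.
        return {"name": "ntfy", "status": "down", "detail": "network_error", "meta": {}}
    status = "ok" if 200 <= code < 300 else "degraded"  # assumed mapping
    return {"name": "ntfy", "status": status, "detail": f"http {code}", "meta": {}}


async def run_probe(name, probe, *args, deadline=5.0):
    """Run a blocking probe in a worker thread, giving up after `deadline` seconds."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(probe, *args), timeout=deadline)
    except asyncio.TimeoutError:
        return {"name": name, "status": "down", "detail": "timeout", "meta": {}}


fake_get = lambda url, timeout: 200  # stands in for a real HTTP client
result = asyncio.run(run_probe("ntfy", ntfy_probe, "https://ntfy.example", fake_get))
print(result["status"])  # ok
```

Note that wait_for stops waiting for the worker thread but cannot interrupt it, so the blocking call itself still needs its own socket-level timeout.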
memory_vector is now passed into setup_diagnostics_routes (new optional param,
backward-compatible) so ChromaDB's vector-memory store can be reported too.
Tests: tests/test_service_health.py — 29 tests covering every status mapping
per subsystem, the overall rollup, and that no secrets leak into meta.
Verification:
  python -m pytest tests/test_service_health.py -q   # 29 passed
  python -m py_compile src/service_health.py routes/diagnostics_routes.py app.py
  python -m pytest tests/test_endpoint_resolver.py tests/test_provider_endpoints.py -q
Backend + tests only; an Admin/Settings UI badge that renders this endpoint is
a natural follow-up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(diagnostics): bound service-health wall-clock and redact secrets
Addresses review on #964.
Blocker 1 — genuinely bounded wall-clock:
- providers_health and email_health now fan out per-item probes across a
  bounded thread pool (_bounded_map) with a hard total budget (_FANOUT_BUDGET),
  instead of probing endpoints/accounts sequentially. Stragglers are reported
  as a controlled `timeout` and never block; the pool is shut down with
  wait=False so the response returns on time regardless of endpoint/account
  count.
- The IMAP connect path now honors the service-health budget: _imap_connect
  gained a pass-through `timeout` param and the probe calls it with
  _PROBE_TIMEOUT instead of the default 15s.
- collect_service_health runs the four network subsystems concurrently, each
  under a per-subsystem deadline (_SUBSYSTEM_DEADLINE), with an overall
  wait_for ceiling (_AGGREGATE_DEADLINE) as a backstop.
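The bounded fan-out in the first bullet can be approximated with the standard library alone. The sketch below is an assumption about the general shape of such a helper, not the actual _bounded_map; the function name, parameters, and the "error" token are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def bounded_map(fn, items, budget=5.0, max_workers=8):
    """Apply `fn` to each item in a thread pool, waiting at most `budget` seconds in total.

    Items whose call has not finished when the budget expires are reported as
    the string "timeout" instead of blocking the caller.
    """
    items = list(items)
    if not items:
        return []
    pool = ThreadPoolExecutor(max_workers=max_workers)
    try:
        futures = [pool.submit(fn, item) for item in items]
        done, _pending = wait(futures, timeout=budget)  # one shared deadline
        results = []
        for fut in futures:  # preserve input order
            if fut in done:
                try:
                    results.append(fut.result())
                except Exception:
                    results.append("error")  # hypothetical token for failures
            else:
                results.append("timeout")
        return results
    finally:
        # Return immediately; queued work is cancelled, running work is abandoned.
        pool.shutdown(wait=False, cancel_futures=True)
```

Because the deadline is shared by all items rather than applied per item, the total wait does not grow with the number of endpoints or accounts. Worker threads that are already running cannot be killed, which is why each probe also needs its own network timeout.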
Blocker 2 — no secret/raw-error leakage in the response:
- _safe_url strips userinfo, query, and fragment from every URL surfaced in
  meta (searxng instance, ntfy base, provider name fallback), keeping only
  scheme/host/port/path.
- _classify_error maps every probe failure to a controlled category token
  (timeout, connection_refused, dns_error, tls_error, network_error,
  http_error, auth_or_protocol_error, …) — raw str(exception), which can embed
  credentialed URLs or server text, is never returned.
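Both redaction steps can be sketched with the standard library. The functions below are illustrative stand-ins for _safe_url and _classify_error, written under assumptions: the exact exception-to-token mapping, the "unknown_error" fallback, and the "invalid_url" placeholder are not taken from the source.

```python
import socket
import ssl
from urllib.parse import urlsplit, urlunsplit


def safe_url(url):
    """Keep only scheme/host/port/path; drop userinfo, query, and fragment."""
    try:
        parts = urlsplit(url)
        host = parts.hostname or ""
        if ":" in host:  # IPv6 literal needs its brackets back
            host = f"[{host}]"
        netloc = f"{host}:{parts.port}" if parts.port else host
    except ValueError:  # malformed port or host
        return "invalid_url"  # hypothetical placeholder
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


def classify_error(exc):
    """Map an exception to a fixed category token; never return str(exc)."""
    # Specific OSError subclasses are checked before the generic OSError case.
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    if isinstance(exc, ConnectionRefusedError):
        return "connection_refused"
    if isinstance(exc, socket.gaierror):
        return "dns_error"
    if isinstance(exc, ssl.SSLError):
        return "tls_error"
    if isinstance(exc, OSError):
        return "network_error"
    return "unknown_error"  # hypothetical fallback token


print(safe_url("https://user:pw@search.example:8443/healthz?token=abc#frag"))
# https://search.example:8443/healthz
```

Returning a token from a closed set means a secret embedded in an exception message has no path into the response, whatever the underlying library chooses to put in that message.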
Tests (tests/test_service_health.py, +tests/test_diagnostics_service_route.py):
- URL userinfo/query redaction for searxng/ntfy/providers.
- secret-bearing exception strings map to categories and don't leak.
- multiple slow providers/accounts stay bounded (single + 25-endpoint cases).
- subsystems run concurrently; aggregate deadline yields a controlled result.
- route-level unauthenticated (401) / non-admin (403) / admin (200) coverage.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(diagnostics): isolate route tests so they don't leak module globals
The new route tests replaced src.service_health.collect_service_health and
routes.diagnostics_routes.require_admin via direct assignment, which persisted
for the rest of the pytest session. In CI's full alphabetical run that fake
collector (returning services=[]) leaked into the later collect_service_health
tests and failed them. Switch to monkeypatch.setattr so both are restored after
each test. No production code change.
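The difference between a scoped patch and a direct assignment can be shown without pytest. The sketch below uses the standard library's unittest.mock.patch.object as a stand-in for monkeypatch.setattr (both undo the change when their scope ends), and a SimpleNamespace as a hypothetical stand-in for a module such as src.service_health.

```python
import types
from unittest import mock

# Hypothetical stand-in for a module with a module-level function.
sh = types.SimpleNamespace(collect=lambda: "real")


def fake():
    return "fake"


# Scoped patch: the original attribute is restored when the block exits,
# much as pytest's monkeypatch restores it at test teardown.
with mock.patch.object(sh, "collect", fake):
    inside = sh.collect()
after_scoped = sh.collect()

# Direct assignment: nothing restores the original, so every later caller
# in the same process sees the fake.
sh.collect = fake
after_direct = sh.collect()

print(inside, after_scoped, after_direct)  # fake real fake
```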
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Alexandre Teixeira <111787685+alteixeira20@users.noreply.github.com>
69 lines
2.6 KiB
Python
"""Route-level regression tests for GET /api/diagnostics/services.

The reviewer asked for explicit coverage of unauthenticated / non-admin / admin
access to this admin diagnostics route, beyond the unit tests for the collector.

These need a real FastAPI + TestClient (the conftest only stubs FastAPI when it
is *not* installed). When the full app deps aren't present we skip rather than
fail, so the suite stays green in minimal environments; CI installs
requirements, so the tests run there.
"""
import pytest

fastapi = pytest.importorskip("fastapi")
pytest.importorskip("starlette.testclient")

from fastapi import FastAPI, HTTPException, Request
from starlette.testclient import TestClient

# Importing the route module pulls a few app deps; skip cleanly if unavailable.
diag = pytest.importorskip("routes.diagnostics_routes")


def _client_with_admin_gate(monkeypatch, gate):
    """Mount the diagnostics router with `require_admin` and the collector
    patched (via monkeypatch so the module globals are restored afterwards),
    and return a TestClient. `gate` plays the role of require_admin."""
    import src.service_health as sh

    async def _fake_collect(_rag, _mem):
        return {"overall": "ok", "services": [], "timestamp": "t"}

    # monkeypatch.setattr restores these after the test — a plain assignment
    # would leak the fakes into every later test in the session.
    monkeypatch.setattr(diag, "require_admin", gate)
    monkeypatch.setattr(sh, "collect_service_health", _fake_collect)

    app = FastAPI()
    app.include_router(diag.setup_diagnostics_routes(
        rag_manager=None, rag_available=False, research_handler=None,
        memory_vector=None))
    return TestClient(app, raise_server_exceptions=False)


def test_unauthenticated_is_rejected(monkeypatch):
    def gate(_request: Request):
        raise HTTPException(401, "Not authenticated")
    client = _client_with_admin_gate(monkeypatch, gate)
    r = client.get("/api/diagnostics/services")
    assert r.status_code == 401


def test_non_admin_is_forbidden(monkeypatch):
    def gate(_request: Request):
        raise HTTPException(403, "Admin only")
    client = _client_with_admin_gate(monkeypatch, gate)
    r = client.get("/api/diagnostics/services")
    assert r.status_code == 403


def test_admin_gets_report(monkeypatch):
    def gate(_request: Request):
        return None  # admin allowed
    client = _client_with_admin_gate(monkeypatch, gate)
    r = client.get("/api/diagnostics/services")
    assert r.status_code == 200
    body = r.json()
    assert set(body) == {"overall", "services", "timestamp"}
    assert body["overall"] == "ok"