fix(scheduler): push next_run forward on startup to stop restart double-fire (#708)

Before this change, TaskScheduler.start() aborted stale TaskRun rows
but never advanced ScheduledTask.next_run. Across a restart the
in-process _executing set is empty, so the first post-restart
_check_due_tasks() call dispatches every task whose next_run is still
in the past, and so does every subsequent poll, until the task's
regular _execute_task path finally runs compute_next_run and pushes
next_run forward.
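
To make the failure mode concrete, here is a minimal sketch of the
restart double-fire. It assumes a poll that skips tasks already in an
in-memory executing set; check_due and the task name "nightly-report"
are hypothetical stand-ins, not the real _check_due_tasks() code.

```python
from datetime import datetime, timedelta


def check_due(next_runs, executing, now):
    """Hypothetical poll: dispatch tasks that are due and not already executing."""
    fired = []
    for task_id, next_run in next_runs.items():
        if next_run <= now and task_id not in executing:
            executing.add(task_id)
            fired.append(task_id)
    return fired


now = datetime(2026, 6, 2, 3, 0, 0)
# next_run is in the past and is never advanced in this sketch.
next_runs = {"nightly-report": now - timedelta(minutes=5)}

executing = set()
first = check_due(next_runs, executing, now)    # task is dispatched
second = check_due(next_runs, executing, now)   # blocked by the executing set

executing = set()                               # restart: in-memory state is lost
after_restart = check_due(next_runs, executing, now)  # dispatched a second time
```

Only the in-memory guard separates the two dispatches, which is why the
fix has to change persisted state (next_run) instead.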

start() now queries active tasks with next_run < now and pushes each
one to now + 60s. The first poll after restart sees them as not-yet-due,
the task runs once normally, and compute_next_run puts the schedule
back on its real cadence. Paused and not-yet-due tasks are left alone.
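
The selection rule above can be sketched without a database. FakeTask,
advance_overdue and STARTUP_GRACE are illustrative names assumed for
this example only; the commit itself queries ScheduledTask through a
SQLAlchemy session and hard-codes the 60 seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

STARTUP_GRACE = timedelta(seconds=60)  # assumed constant name


@dataclass
class FakeTask:
    """Hypothetical stand-in for a ScheduledTask row."""
    name: str
    status: str
    next_run: Optional[datetime]


def advance_overdue(tasks: Iterable[FakeTask], now: datetime) -> List[FakeTask]:
    """Set next_run to now + grace for active tasks whose next_run is in the past."""
    touched = []
    for task in tasks:
        if task.status == "active" and task.next_run is not None and task.next_run < now:
            task.next_run = now + STARTUP_GRACE
            touched.append(task)
    return touched


now = datetime(2026, 6, 2, 3, 0, 0)
tasks = [
    FakeTask("overdue", "active", now - timedelta(hours=1)),
    FakeTask("paused", "paused", now - timedelta(hours=1)),
    FakeTask("future", "active", now + timedelta(hours=1)),
    FakeTask("unscheduled", "active", None),
]
touched = advance_overdue(tasks, now)
# Only the active, overdue task is moved; the other three keep their next_run.
```

Keeping the rule as a pure function of (tasks, now) makes the three
cases (active and overdue, paused, not yet due) easy to assert on.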

The validator test was rewritten as a regression test that asserts the
opposite of the bug it originally demonstrated. Two narrower cases were
added to lock down the filter (only active and overdue tasks are touched).
Ernest Hysa
2026-06-02 03:43:30 +01:00
committed by GitHub
parent 15c7cb58e7
commit 7669696bb0
2 changed files with 217 additions and 0 deletions
@@ -312,6 +312,33 @@ class TaskScheduler:
        except Exception as e:
            logger.warning(f"Could not clear stale task_runs on startup: {e}")

        # Advance next_run for active tasks whose next_run is already in the
        # past. Without this, a restart hits _check_due_tasks() with an empty
        # in-process _executing set, and the same overdue task fires once per
        # poll until it completes.
        try:
            from core.database import SessionLocal as _SL, ScheduledTask as _ST
            db = _SL()
            try:
                now = datetime.utcnow()
                overdue = db.query(_ST).filter(
                    _ST.status == "active",
                    _ST.next_run.isnot(None),
                    _ST.next_run < now,
                ).all()
                if overdue:
                    for t in overdue:
                        t.next_run = now + timedelta(seconds=60)
                    db.commit()
                    logger.info(
                        "Pushed next_run forward by 60s for %d overdue active tasks on startup",
                        len(overdue),
                    )
            finally:
                db.close()
        except Exception as e:
            logger.warning(f"Could not advance overdue next_run on startup: {e}")

        # Defense-in-depth dedupe sweep: for any owner with >1 rows where
        # is_default_assistant=True, keep the oldest and demote the rest +
        # delete their orphaned check-in tasks. This is the safety net for