fix(scheduler): push next_run forward on startup to stop restart double-fire (#708)

TaskScheduler.start() aborted stale TaskRun rows but never advanced ScheduledTask.next_run. Across a restart the in-process _executing set is empty, so the first post-restart _check_due_tasks() call dispatches every task whose next_run is still in the past. Every subsequent poll does the same, until the task's regular _execute_task path finally runs compute_next_run and pushes next_run forward.

start() now queries active tasks with next_run < now and sets each one to now + 60s. The first poll after restart sees them as not yet due; each task then runs once normally, and compute_next_run puts the schedule back on its real cadence. Paused and not-yet-due tasks are left alone.

The validator test was rewritten as a regression test asserting the opposite of the bug it originally demonstrated, and two narrower cases were added to lock down the filter (only active, overdue tasks are touched).
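The startup step described above can be illustrated with a small in-memory sketch. It does not use the project's database layer: `Task`, `due_tasks`, `advance_overdue`, and `demo` are hypothetical stand-ins, and the `<=` comparison in `due_tasks` is an assumption about how a poll decides a task is due. Only the filter (active, non-null, overdue) and the 60s offset come from the commit itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Mirrors the 60s offset used in the commit.
STARTUP_GRACE = timedelta(seconds=60)


@dataclass
class Task:
    """Hypothetical in-memory stand-in for a ScheduledTask row."""
    name: str
    status: str
    next_run: Optional[datetime]


def due_tasks(tasks: List[Task], now: datetime) -> List[Task]:
    """Assumed poll rule: active tasks whose next_run has been reached."""
    return [
        t for t in tasks
        if t.status == "active" and t.next_run is not None and t.next_run <= now
    ]


def advance_overdue(tasks: List[Task], now: datetime,
                    grace: timedelta = STARTUP_GRACE) -> int:
    """Startup step: set active, overdue tasks to now + grace.

    Returns the number of tasks that were moved. Paused tasks, tasks
    without a next_run, and tasks that are not yet due are untouched.
    """
    overdue = [
        t for t in tasks
        if t.status == "active" and t.next_run is not None and t.next_run < now
    ]
    for t in overdue:
        t.next_run = now + grace
    return len(overdue)


def demo():
    """Compare what a poll would dispatch before and after the startup step."""
    now = datetime(2026, 1, 1, 12, 0, 0)
    tasks = [
        Task("overdue", "active", now - timedelta(hours=1)),
        Task("paused", "paused", now - timedelta(hours=1)),
        Task("future", "active", now + timedelta(hours=1)),
        Task("unscheduled", "active", None),
    ]
    before = [t.name for t in due_tasks(tasks, now)]
    moved = advance_overdue(tasks, now)
    after = [t.name for t in due_tasks(tasks, now)]
    return before, moved, after, tasks, now
```

In this sketch, a poll at the restart instant would dispatch the overdue task before the startup step and nothing after it; the task becomes due again once the 60s offset has elapsed.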
2026-06-16 09:45:24 -04:00 · 2026-06-02 03:43:30 +01:00
parent 15c7cb58e7
commit 7669696bb0
2 changed files with 217 additions and 0 deletions
@@ -312,6 +312,33 @@ class TaskScheduler:
        except Exception as e:
            logger.warning(f"Could not clear stale task_runs on startup: {e}")

+        # Advance next_run for active tasks whose next_run is already in the
+        # past. Without this, a restart hits _check_due_tasks() with an empty
+        # in-process _executing set, and the same overdue task fires once per
+        # poll until it completes.
+        try:
+            from core.database import SessionLocal as _SL, ScheduledTask as _ST
+            db = _SL()
+            try:
+                now = datetime.utcnow()
+                overdue = db.query(_ST).filter(
+                    _ST.status == "active",
+                    _ST.next_run.isnot(None),
+                    _ST.next_run < now,
+                ).all()
+                if overdue:
+                    for t in overdue:
+                        t.next_run = now + timedelta(seconds=60)
+                    db.commit()
+                    logger.info(
+                        "Set next_run to now+60s for %d overdue active tasks on startup",
+                        len(overdue),
+                    )
+            finally:
+                db.close()
+        except Exception as e:
+            logger.warning(f"Could not advance overdue next_run on startup: {e}")
+
        # Defense-in-depth dedupe sweep: for any owner with >1 rows where
        # is_default_assistant=True, keep the oldest and demote the rest +
        # delete their orphaned check-in tasks. This is the safety net for