From 3f3c05e8c2d9cc6cc75e58968c2d0a78add3b582 Mon Sep 17 00:00:00 2001
From: Hsin-Chen Pai <72599785+Seanachan@users.noreply.github.com>
Date: Mon, 15 Jun 2026 19:26:47 +0800
Subject: [PATCH] docs: add backup/restore guide for odysseus-backup (#2587)

The scripts/odysseus-backup snapshot/restore CLI was undocumented in
README.md and docs/. Add docs/backup-restore.md covering the snapshot,
list, verify, and restore subcommands, default include/skip behavior
(deep_research and mail-attachments skipped unless flagged), the
destructive-restore warning and its data.before-restore-* stash, a cron
example, and Docker-vs-native data/ paths (including the ChromaDB named
volume caveat). Link it from the README Data section.

Addresses the "Backup/restore guide and helper flow for data/" item in
ROADMAP.md. Docs only; no change to the tool.

Fixes #2583

Co-authored-by: Claude Opus 4.8 (1M context)
---
 README.md              |   3 +
 docs/backup-restore.md | 129 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)
 create mode 100644 docs/backup-restore.md

diff --git a/README.md b/README.md
index 366e92c89..f5ce3e75a 100644
--- a/README.md
+++ b/README.md
@@ -467,6 +467,9 @@ docs/ landing page (index.html) + preview clips
 All user data lives in `data/` (gitignored): `app.db` (sessions, messages, documents), `memory.json`, `presets.json`, `uploads/`, `personal_docs/`, `chroma/`, `settings.json`.
 
+To back up or restore everything in `data/`, see the
+[Backup & Restore guide](docs/backup-restore.md).
+
 ## Star History
diff --git a/docs/backup-restore.md b/docs/backup-restore.md
new file mode 100644
index 000000000..902c9e683
--- /dev/null
+++ b/docs/backup-restore.md
@@ -0,0 +1,129 @@
+# Backup & Restore
+
+Odysseus keeps all of your state in the `data/` directory — the SQLite database
+(`app.db`), the Fernet encryption key (`data/.app_key`), the vault, memory, RAG
+indexes, personal documents, and uploads. The `scripts/odysseus-backup` tool
+snapshots that directory into a single gzip tarball and restores it later.
+
+Snapshots are safe to take while the app is running: SQLite databases are copied
+through SQLite's own `.backup` API rather than a raw file copy, so an in-flight
+write can't corrupt the snapshot.
+
+> **A snapshot contains your secrets.** The tarball includes the Fernet
+> encryption key (`data/.app_key`), the vault, sessions, and any stored
+> provider/API tokens — so treat it like a password. Store backups somewhere
+> private, never commit them to Git, and prefer an encrypted destination when
+> copying them offsite.
+
+## Quick start
+
+Run the tool from the repository root:
+
+```bash
+# Create a snapshot → backups/odysseus-backup-.tar.gz
+./scripts/odysseus-backup snapshot
+
+# List existing snapshots (most recent first)
+./scripts/odysseus-backup list
+
+# Check a tarball's integrity without extracting it
+./scripts/odysseus-backup verify backups/odysseus-backup-20260101-120000.tar.gz
+
+# Restore (destructive — see the warning below)
+./scripts/odysseus-backup restore backups/odysseus-backup-20260101-120000.tar.gz --yes
+```
+
+The script depends only on the Python standard library, so any `python3` on your
+`PATH` will run it — you don't need the app's virtualenv active.
+
+Every command prints a JSON result. Add `--pretty` for indented output.
+
+## Commands
+
+### `snapshot`
+
+Writes a `tar.gz` of `data/` to `backups/.tar.gz`.
+
+| Flag | Effect |
+| --- | --- |
+| `--out PATH` | Write to a specific path instead of the default `backups/` location. Must be **outside** `data/`. |
+| `--include-research` | Include `data/deep_research/` (skipped by default — research runs are large). |
+| `--include-attachments` | Include `data/mail-attachments/` (skipped by default — cached IMAP extractions, re-derivable). |
+
+By default the snapshot includes everything under `data/` **except**
+`deep_research/` and `mail-attachments/`. Personal uploads and documents are
+included.
+
+```bash
+# Snapshot straight to a mounted NAS path
+./scripts/odysseus-backup snapshot --out /mnt/nas/odysseus-$(date +%F).tar.gz
+
+# Full snapshot including research runs and mail attachments
+./scripts/odysseus-backup snapshot --include-research --include-attachments
+```
+
+### `list`
+
+Lists the tarballs in `backups/`, most recent first, with size and modification
+time.
+
+### `verify PATH`
+
+Opens the tarball read-only and walks every member to confirm it is intact and
+safe to restore. Nothing is extracted. Use this before relying on an old backup
+or after copying one across machines.
+
+### `restore PATH --yes`
+
+Overwrites `data/` from a tarball.
+
+> **Restore is destructive.** It replaces the current `data/` directory. `--yes`
+> is required so a mistyped command can't wipe your live state.
+
+Restore is not a blind delete: before extracting, the tool **renames your current
+`data/` to `data.before-restore-`** in the repository root. If a
+restore turns out to be wrong, your previous state is still there — delete the
+restored `data/` and rename the stashed directory back. The restore path is also
+validated entry-by-entry: archives containing absolute paths, `..` segments,
+symlinks, or anything outside `data/` are rejected.
+
+## Scheduling offsite backups
+
+The tarball output composes cleanly with cron and any copy tool. For example, a
+nightly snapshot copied offsite:
+
+```cron
+0 3 * * * cd /path/to/odysseus && ./scripts/odysseus-backup snapshot --out "/mnt/nas/odysseus-$(date +\%F).tar.gz"
+```
+
+Swap the `--out` target for `scp`, `rclone`, `s3cmd`, or similar to push the
+snapshot to remote storage.
+
+## Docker vs native installs
+
+The tool reads `data/` and writes `backups/` relative to the repository root, so
+where you run it matters:
+
+- **Native installs** — run it from the repo root as shown above. `data/` and
+  `backups/` are both in the repo directory.
+- **Docker** — `docker-compose.yml` bind-mounts the host's `./data` to
+  `/app/data`, so the live data is also present on the host. **Run the tool on
+  the host** from the repo root; the snapshot reads the bind-mounted `./data` and
+  writes to `./backups` on the host. Running it *inside* the container is not
+  recommended, because `backups/` is not a mounted volume and the tarball would
+  be lost when the container is recreated.
+
+> **ChromaDB caveat (Docker only).** In the Docker setup, ChromaDB stores its
+> vectors in a separate Compose-managed volume (declared as `chromadb-data`),
+> **not** under `./data`. `odysseus-backup` therefore does not capture the Docker
+> ChromaDB store. Back it up separately if you need it. Compose prefixes the
+> volume with the project name, so find the real name first
+> (`docker volume ls | grep chromadb`), then archive it — for example:
+>
+> ```bash
+> docker run --rm -v _chromadb-data:/data -v "$PWD":/backup \
+>   alpine tar czf /backup/chromadb.tar.gz -C /data .
+> ```
+>
+> On native installs ChromaDB lives at `data/chroma/` and is included in the
+> snapshot normally.
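The guide says `verify` and `restore` check an archive entry by entry and reject absolute paths, `..` segments, symlinks, and anything outside `data/`. As a rough illustration of what such a check might look like, here is a minimal Python sketch using only the standard-library `tarfile` module. It is not the code in `scripts/odysseus-backup`: the function name `unsafe_members`, the `root` parameter, the message strings, and the choice to treat hard links like symlinks are all assumptions made for this example.

```python
import io
import tarfile
from pathlib import PurePosixPath
from typing import List


def unsafe_members(tar: tarfile.TarFile, root: str = "data") -> List[str]:
    """Return one reason string per member that should block a restore.

    Hypothetical helper: the rules mirror the guide's description, not the
    real tool's implementation.
    """
    problems = []
    for member in tar.getmembers():
        path = PurePosixPath(member.name)
        if path.is_absolute():
            problems.append(f"{member.name}: absolute path")
        elif ".." in path.parts:
            problems.append(f"{member.name}: contains '..'")
        elif member.issym() or member.islnk():
            # Assumption: hard links are rejected along with symlinks.
            problems.append(f"{member.name}: link entry")
        elif not path.parts or path.parts[0] != root:
            problems.append(f"{member.name}: outside {root}/")
    return problems


def _demo_archive() -> bytes:
    """Build a tiny in-memory tar.gz with one safe and four unsafe entries."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in ("data/app.db", "data/../etc/passwd", "/abs/file", "other/x"):
            payload = b"demo"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        link = tarfile.TarInfo("data/link")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc/passwd"
        tar.addfile(link)
    return buf.getvalue()


# Nothing is extracted: the archive is only opened and its headers inspected.
with tarfile.open(fileobj=io.BytesIO(_demo_archive()), mode="r:gz") as tar:
    report = unsafe_members(tar)

for line in report:
    print(line)
```

Run against the demo archive, the sketch reports the four unsafe entries and lets `data/app.db` through. Inspecting member headers before extraction is the design point: nothing touches the filesystem until every entry has passed.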