docs: add agent migration manifest helper (#3028)

* docs: add agent migration manifest helper

* fix: use stat+streamed hash for metadata-only archive scans

When include_content is false, skip reading full file content and
only stat+stream-hash for size and sha256. Avoids spurious skipped-
content warnings and keeps large-export previews fast and clean.

Closes review feedback on PR #3028.

* fix: skip symlinked migration inputs

* fix: stream archive traversal warnings

* feat: stage conversation threads in agent migration manifests
This commit is contained in:
spooky
2026-06-15 16:57:33 +10:00
committed by GitHub
parent 955455b797
commit f23e2e6ffb
3 changed files with 1169 additions and 0 deletions
+194
View File
@@ -0,0 +1,194 @@
# Agent migration manifests
Odysseus should be able to learn from another agent without blindly trusting
that agent's whole state. The safe migration path is:
```text
source agent export -> source adapter -> agent-migration.v1 manifest -> preview -> apply
```
The manifest is intentionally source-neutral. OpenClaw, Hermes, a folder of
Markdown notes, or any other agent can have its own adapter, but Odysseus only
needs to understand the normalized manifest.
## Why not import everything as memory?
Durable memory should stay compact and useful. Long notes, logs, session
transcripts, and project archives are useful context, but they are not all
memories. A good migration keeps two layers separate:
- **Archive documents** preserve source material for search, reading, and later
extraction.
- **Memory candidates** are short facts or preferences that can be reviewed
before being saved into Odysseus memory.
This keeps Odysseus' existing memory-review flow intact while giving it better
source material to review.
## Manifest shape
`agent-migration.v1` is a JSON object:
```json
{
"schema_version": "agent-migration.v1",
"generated_at": "2026-06-06T00:00:00Z",
"source": {
"name": "example-agent",
"kind": "generic"
},
"summary": {
"item_count": 3,
"counts_by_kind": {
"memory": 1,
"skill": 1,
"conversation_thread": 1,
"archive_document": 1
},
"warning_count": 0
},
"items": [],
"warnings": []
}
```
Each item has a stable `id`, a `kind`, source metadata, and enough content for a
future importer to preview it before applying.
Supported item kinds in the first pass:
- `memory` — a candidate memory with `text`, `category`, `source`, and
provenance metadata.
- `skill` — a `SKILL.md` file with content and parsed frontmatter metadata.
- `conversation_thread` — a normalized transcript thread from an exported chat
history. Message content is optional; adapters can preserve only thread
metadata, message counts, timestamps, and hashes when a manifest should stay
small or avoid embedding private transcript text.
- `archive_document` — long-form source material. Content is optional; adapters
can preserve only path/hash/size metadata when a manifest should stay small.
## Build a manifest
Use the read-only helper:
```bash
python3 scripts/agent_migration_manifest.py \
--source-name old-agent \
--source-kind generic \
--memory-json /path/to/memories.json \
--skills-dir /path/to/skills \
--conversation-json /path/to/conversations.json \
--archive /path/to/notes \
--output /tmp/agent-migration.json
```
The helper does not write to `data/`, call an LLM, import Odysseus modules, or
modify the source. It only writes JSON.
Memory JSON may be:
```json
[
"A plain memory string",
{
"text": "A categorized memory",
"category": "preference",
"source": "old-agent"
}
]
```
or an object containing a list under `memories`, `memory`, `items`, or `data`.
Skills are scanned recursively for `SKILL.md`:
```bash
python3 scripts/agent_migration_manifest.py \
--source-name hermes \
--source-kind hermes \
--skills-dir ~/.hermes/skills \
--output /tmp/hermes-skills-manifest.json
```
Archive documents are metadata-only by default. To embed text content:
```bash
python3 scripts/agent_migration_manifest.py \
--source-name notes-export \
--archive /path/to/markdown-notes \
--include-archive-content \
--output /tmp/notes-manifest.json
```
Conversation exports are also metadata-only by default:
```bash
python3 scripts/agent_migration_manifest.py \
--source-name chatgpt-export \
--source-kind chatgpt \
--conversation-json /path/to/conversations.json \
--output /tmp/chatgpt-conversations-manifest.json
```
The first pass supports generic conversation JSON such as:
```json
[
{
"id": "thread-1",
"title": "Project plan",
"messages": [
{"role": "user", "content": "Can we design this?"},
{"role": "assistant", "content": "Yes, start with a narrow slice."}
]
}
]
```
It also recognizes ChatGPT-style `mapping` exports from `conversations.json`.
To embed normalized messages:
```bash
python3 scripts/agent_migration_manifest.py \
--source-name chatgpt-export \
--source-kind chatgpt \
--conversation-json /path/to/conversations.json \
--include-conversation-content \
--max-conversation-messages 2000 \
--output /tmp/chatgpt-conversations-with-content.json
```
Content embedding is explicit because exported chat histories can be huge and
private. A future source-specific adapter can add ZIP traversal, attachment
metadata, and provider-specific project/workspace fields while still emitting
the same `conversation_thread` manifest item.
## Recommended apply behavior
A future Odysseus importer should treat the manifest as untrusted user-provided
data and apply it in stages:
1. Show a dry-run summary with counts, warnings, duplicates, and sample items.
2. Back up current `data/` state before writing anything.
3. Import archive documents as documents or another searchable source, not as
memory.
4. Import conversation threads as searchable archived context first, with
citations back to the source thread. Do not turn whole transcripts into
memory.
5. Show memory candidates for review before saving through the normal memory
path.
6. Import skills only after name/category conflict checks.
7. Skip secrets by default. Credentials need explicit, provider-specific flows.
## What belongs in source adapters?
Adapters can be source-specific. The core manifest should not be.
For example, an OpenClaw adapter may know about OpenClaw's workspace files. A
Hermes adapter may know about `~/.hermes/config.yaml` and `~/.hermes/skills`.
A ChatGPT adapter may know about `conversations.json`, uploaded-file metadata,
and image attachment directories. A Claude adapter may know about Claude's
export shape and project boundaries. A generic adapter may only know about
memory JSON, conversation JSON, `SKILL.md`, and Markdown folders.
Nonstandard folders should be adapter details, not required Odysseus concepts.