mirror of https://github.com/pewdiepie-archdaemon/odysseus.git synced 2026-06-15 09:15:29 -04:00

Files

T

spooky f23e2e6ffb docs: add agent migration manifest helper (#3028 )

* docs: add agent migration manifest helper

* fix: use stat+streamed hash for metadata-only archive scans

When include_content is false, skip reading full file content and
only stat+stream-hash for size and sha256. Avoids spurious skipped-
content warnings and keeps large-export previews fast and clean.

Closes review feedback on PR #3028.

* fix: skip symlinked migration inputs

* fix: stream archive traversal warnings

* feat: stage conversation threads in agent migration manifests

2026-06-15 15:57:33 +09:00

6.0 KiB

Raw Blame History

Agent migration manifests

Odysseus should be able to learn from another agent without blindly trusting that agent's whole state. The safe migration path is:

source agent export -> source adapter -> agent-migration.v1 manifest -> preview -> apply

The manifest is intentionally source-neutral. OpenClaw, Hermes, a folder of Markdown notes, or any other agent can have its own adapter, but Odysseus only needs to understand the normalized manifest.

Why not import everything as memory?

Durable memory should stay compact and useful. Long notes, logs, session transcripts, and project archives are useful context, but they are not all memories. A good migration keeps two layers separate:

Archive documents preserve source material for search, reading, and later extraction.
Memory candidates are short facts or preferences that can be reviewed before being saved into Odysseus memory.

This keeps Odysseus' existing memory-review flow intact while giving it better source material to review.

Manifest shape

agent-migration.v1 is a JSON object:

{
  "schema_version": "agent-migration.v1",
  "generated_at": "2026-06-06T00:00:00Z",
  "source": {
    "name": "example-agent",
    "kind": "generic"
  },
  "summary": {
    "item_count": 3,
    "counts_by_kind": {
      "memory": 1,
      "skill": 1,
      "conversation_thread": 1,
      "archive_document": 1
    },
    "warning_count": 0
  },
  "items": [],
  "warnings": []
}

Each item has a stable id, a kind, source metadata, and enough content for a future importer to preview it before applying.

Supported item kinds in the first pass:

memory — a candidate memory with text, category, source, and provenance metadata.
skill — a SKILL.md file with content and parsed frontmatter metadata.
conversation_thread — a normalized transcript thread from an exported chat history. Message content is optional; adapters can preserve only thread metadata, message counts, timestamps, and hashes when a manifest should stay small or avoid embedding private transcript text.
archive_document — long-form source material. Content is optional; adapters can preserve only path/hash/size metadata when a manifest should stay small.

Build a manifest

Use the read-only helper:

python3 scripts/agent_migration_manifest.py \
  --source-name old-agent \
  --source-kind generic \
  --memory-json /path/to/memories.json \
  --skills-dir /path/to/skills \
  --conversation-json /path/to/conversations.json \
  --archive /path/to/notes \
  --output /tmp/agent-migration.json

The helper does not write to data/, call an LLM, import Odysseus modules, or modify the source. It only writes JSON.

Memory JSON may be:

[
  "A plain memory string",
  {
    "text": "A categorized memory",
    "category": "preference",
    "source": "old-agent"
  }
]

or an object containing a list under memories, memory, items, or data.

Skills are scanned recursively for SKILL.md:

python3 scripts/agent_migration_manifest.py \
  --source-name hermes \
  --source-kind hermes \
  --skills-dir ~/.hermes/skills \
  --output /tmp/hermes-skills-manifest.json

Archive documents are metadata-only by default. To embed text content:

python3 scripts/agent_migration_manifest.py \
  --source-name notes-export \
  --archive /path/to/markdown-notes \
  --include-archive-content \
  --output /tmp/notes-manifest.json

Conversation exports are also metadata-only by default:

python3 scripts/agent_migration_manifest.py \
  --source-name chatgpt-export \
  --source-kind chatgpt \
  --conversation-json /path/to/conversations.json \
  --output /tmp/chatgpt-conversations-manifest.json

The first pass supports generic conversation JSON such as:

[
  {
    "id": "thread-1",
    "title": "Project plan",
    "messages": [
      {"role": "user", "content": "Can we design this?"},
      {"role": "assistant", "content": "Yes, start with a narrow slice."}
    ]
  }
]

It also recognizes ChatGPT-style mapping exports from conversations.json. To embed normalized messages:

python3 scripts/agent_migration_manifest.py \
  --source-name chatgpt-export \
  --source-kind chatgpt \
  --conversation-json /path/to/conversations.json \
  --include-conversation-content \
  --max-conversation-messages 2000 \
  --output /tmp/chatgpt-conversations-with-content.json

Content embedding is explicit because exported chat histories can be huge and private. A future source-specific adapter can add ZIP traversal, attachment metadata, and provider-specific project/workspace fields while still emitting the same conversation_thread manifest item.

Recommended apply behavior

A future Odysseus importer should treat the manifest as untrusted user-provided data and apply it in stages:

Show a dry-run summary with counts, warnings, duplicates, and sample items.
Back up current data/ state before writing anything.
Import archive documents as documents or another searchable source, not as memory.
Import conversation threads as searchable archived context first, with citations back to the source thread. Do not turn whole transcripts into memory.
Show memory candidates for review before saving through the normal memory path.
Import skills only after name/category conflict checks.
Skip secrets by default. Credentials need explicit, provider-specific flows.

What belongs in source adapters?

Adapters can be source-specific. The core manifest should not be.

For example, an OpenClaw adapter may know about OpenClaw's workspace files. A Hermes adapter may know about ~/.hermes/config.yaml and ~/.hermes/skills. A ChatGPT adapter may know about conversations.json, uploaded-file metadata, and image attachment directories. A Claude adapter may know about Claude's export shape and project boundaries. A generic adapter may only know about memory JSON, conversation JSON, SKILL.md, and Markdown folders.

Nonstandard folders should be adapter details, not required Odysseus concepts.

6.0 KiB Raw Blame History