HanyanOS Deployment Notes

Posted on 2026-05-25 in tech

1. Overview

Memory is the substrate of autonomy. In a multi-agent system running on an Intel N100 (4 cores, 6W TDP) with limited RAM and strict token budgets, memory design is not an abstraction problem — it is a resource allocation problem. Every token spent loading irrelevant context is a token not spent on reasoning.

HanyanOS implements a four-layer hierarchical memory architecture inspired by human sleep consolidation: a permanent long-term layer (facts, rules, identity), a working session layer (conversation context), a dreaming layer (offline consolidation and extraction), and agent-specific workspaces. A memory router (MEMORY.md) serves as the entry point, while a trust-level system prevents hallucination contamination.

This post dissects the architecture, the anchor-based retrieval system, the dreaming pipeline, and the production lessons from running this on a resource-constrained N100.

2. Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                   MEMORY.md (Memory Router)                  │
│  Top-level index only — < 150 lines, anchors only           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   LAYER 1: LONG-TERM  (Permanent, verified facts only)       │
│   ┌─────────────────────────────────────────────────────────┐│
│   │ memory/infrastructure/  ← network, DNS, nginx, security ││
│   │ memory/user/            ← user preferences, profile     ││
│   │ memory/projects/        ← active project state          ││
│   │ SOUL.md                 ← persona (immutable, top tier) ││
│   │ HanyanOS/rules/         ← governance, 20+ rule files    ││
│   │ HanyanOS/knowledge/     ← learned knowledge             ││
│   └─────────────────────────────────────────────────────────┘│
│                                                              │
│   LAYER 2: WORKING   (Session-scoped, ephemeral)             │
│   ┌─────────────────────────────────────────────────────────┐│
│   │ Session buffer (last 60 messages, LLM-slugged)          ││
│   │ memory/summaries/  ← compressed session summaries       ││
│   │ memory/cache/      ← task-scoped temporary data         ││
│   └─────────────────────────────────────────────────────────┘│
│                                                              │
│   LAYER 3: DREAMING  (Subconscious, offline processing)      │
│   ┌─────────────────────────────────────────────────────────┐│
│   │ dreaming/light/  ← candidate extraction (~30KB/day)     ││
│   │ dreaming/deep/   ← pattern ranking & promotion (155B)   ││
│   │ dreaming/rem/    ← rapid-eye-movement reflection (~2.5K)││
│   └─────────────────────────────────────────────────────────┘│
│                                                              │
│   LAYER 4: AGENT WS  (Per-agent autonomous memory)           │
│   ┌─────────────────────────────────────────────────────────┐│
│   │ workspace/coder/memory/    ←沈漫歌's knowledge          ││
│   │ workspace/devops/memory/   ←苏清雅's config history     ││
│   │ workspace/tester/memory/   ←钟离燕's test fixtures      ││
│   │ workspace/...              ← each agent self-manages    ││
│   └─────────────────────────────────────────────────────────┘│
│                                                              │
└─────────────────────────────────────────────────────────────┘

Key design principle: Memory is a router, not a dump. MEMORY.md contains only anchors — one-liner pointers with paths. All detailed content lives in sub-files, loaded on-demand.

3. The Memory Router — MEMORY.md as a Filesystem

MEMORY.md is strictly constrained to < 150 lines. It stores only anchors, grouped under section headings:

## [Infrastructure]
- 🧷 [network] → memory/infrastructure/network/README.md
- 🧷 [nginx]  → memory/infrastructure/nginx/README.md
- 🧷 [dns]    → memory/infrastructure/dns/README.md

## [Active projects]
- 🧷 Dashboard refactor → HanyanOS/tasks/active/dashboard-refactor.md

Each anchor carries a type tag ([infra], [project], [task], [user], [rule]) and a file path. The retrieval flow:

User request → 含烟 parses keywords
  → MEMORY.md anchor lookup (fast path, zero I/O)
  → Hit? → memory_get() on target file
  → Miss? → memory_search() semantic retrieval across all layers
  → Inject only relevant context (HOT/WARM/COLD tiers)
  → Execute task
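
The fast path of this flow can be sketched in Python. This is a minimal illustration under stated assumptions, not the HanyanOS implementation: `parse_anchors` and `lookup` are hypothetical helper names, and the anchor line format is assumed to match the MEMORY.md excerpt (`- 🧷 label → path`). A miss returns `None`, at which point the caller would fall back to semantic search.

```python
import re
from typing import Dict, List, Optional

# Assumed anchor line format, based on the MEMORY.md excerpt: "- 🧷 <label> → <path>"
ANCHOR_RE = re.compile(r"^-\s*🧷\s*(?P<label>.+?)\s*→\s*(?P<path>\S+)\s*$")


def parse_anchors(memory_md: str) -> Dict[str, str]:
    """Build an in-memory index {label: path} from the router text."""
    index: Dict[str, str] = {}
    for line in memory_md.splitlines():
        match = ANCHOR_RE.match(line.strip())
        if match:
            # Strip optional brackets so "[nginx]" is indexed as "nginx".
            label = match.group("label").strip("[]").lower()
            index[label] = match.group("path")
    return index


def lookup(index: Dict[str, str], keywords: List[str]) -> Optional[str]:
    """Return the first path whose label contains a keyword, or None on a miss."""
    for keyword in keywords:
        for label, path in index.items():
            if keyword.lower() in label:
                return path
    return None  # miss: the caller falls back to semantic search


SAMPLE = """## [infra]
- 🧷 [network] → memory/infrastructure/network/README.md
- 🧷 [nginx]  → memory/infrastructure/nginx/README.md
"""

index = parse_anchors(SAMPLE)
print(lookup(index, ["nginx"]))     # hit: path of the nginx README
print(lookup(index, ["postgres"]))  # miss: None
```

Because the index is built once and held in memory, a hit costs a dictionary scan rather than a file read; only the target file is opened afterwards.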

Memory tiers by access recency:

Tier	Condition	Inject Strategy
HOT	Active task / last 3 days	Always inject
WARM	Last 7 days / active project	On-demand
COLD	7–30 days	Only on search hit
FROZEN	> 30 days	Archived, never auto-load

This tiered injection alone reduced average context size by ~62% compared to the naive approach of loading all memory files.
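
The tier table translates fairly directly into code. The sketch below is illustrative only: `classify_tier` and `should_inject` are hypothetical names, and the handling of exact boundaries (for example, a record that is exactly 7 days old) is an assumption, since the table does not specify it.

```python
from datetime import datetime, timedelta, timezone


def classify_tier(last_access: datetime, now: datetime,
                  active_task: bool = False,
                  active_project: bool = False) -> str:
    """Map a record to HOT/WARM/COLD/FROZEN using the thresholds in the table."""
    age = now - last_access
    if active_task or age <= timedelta(days=3):
        return "HOT"
    if active_project or age <= timedelta(days=7):
        return "WARM"
    if age <= timedelta(days=30):
        return "COLD"
    return "FROZEN"


def should_inject(tier: str, requested: bool = False,
                  search_hit: bool = False) -> bool:
    """HOT always loads, WARM loads on demand, COLD only on a search hit."""
    if tier == "HOT":
        return True
    if tier == "WARM":
        return requested
    if tier == "COLD":
        return search_hit
    return False  # FROZEN is archived and never auto-loaded


now = datetime(2026, 5, 25, tzinfo=timezone.utc)
for days in (1, 5, 20, 45):
    tier = classify_tier(now - timedelta(days=days), now)
    print(days, tier, should_inject(tier, search_hit=True))
```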

4. Trust-Level System — Preventing Hallucination Contamination

Every memory record carries a metadata section:

{
  "source": "user_direct | agent_inferred | external_search | dream_output",
  "trust_level": "verified | inferred | temporary | conflict",
  "updated_at": "2026-05-25T01:00:00+00:00"
}

Level	Can Write Long-Term?	Example
`verified`	✅ Yes	“The user’s 491 visa was granted 2026-04-02”
`inferred`	❌ Suggest only	“Weekly API cost ≈ $60 based on 7-day trend”
`temporary`	❌ Cache only	“Current task: Nginx config for dashboard”
`conflict`	❌ Resolve first	Two contradicting port assignments

Critical rule: Dream output can never write directly to long-term memory. The pipeline is:

Dream output → 含烟 review → Summaries → user approval → Long-term

This prevents the most common cause of personality drift in persistent AI systems: the agent hallucinating during offline reasoning and then treating that hallucination as fact.
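
A write gate over this metadata might look like the following. It is a sketch under assumptions: `MemoryRecord`, `long_term_decision`, and the returned action labels are hypothetical; only the `source` and `trust_level` values come from the metadata schema and table above.

```python
from dataclasses import dataclass

# Action per trust level, following the table: only "verified" may be written.
ACTIONS = {
    "verified": "write",     # may enter long-term memory
    "inferred": "suggest",   # proposed to a reviewer, not written
    "temporary": "cache",    # kept in task-scoped cache only
    "conflict": "resolve",   # must be reconciled before anything else
}


@dataclass
class MemoryRecord:
    content: str
    source: str       # user_direct | agent_inferred | external_search | dream_output
    trust_level: str  # verified | inferred | temporary | conflict


def long_term_decision(record: MemoryRecord) -> str:
    """Decide what may happen to a record that wants to enter long-term memory."""
    # Dream output never writes directly, whatever trust level it claims.
    if record.source == "dream_output":
        return "review"
    try:
        return ACTIONS[record.trust_level]
    except KeyError:
        raise ValueError(f"unknown trust level: {record.trust_level}") from None


print(long_term_decision(MemoryRecord("port 8443 is nginx", "user_direct", "verified")))
print(long_term_decision(MemoryRecord("port 8443 is nginx", "dream_output", "verified")))
```

Checking `source` before `trust_level` is the design point: a dream-derived record is routed to review even if it has somehow been labelled `verified`.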

5. The Dreaming Pipeline — Three-Stage Consolidation

Every night at 03:00 AEST (via cron, see Article #8), the system runs a dream cycle across three stages:

Stage 1: Light Sleep (30KB/day output)

Scans recent session transcripts and identifies candidate memories — patterns, decisions, and facts that appeared with sufficient frequency or emotional salience:

- Candidate: 🧠 [security] session_ref; confidence=0.62
- Candidate: Core event: 491 visa granted 2026-04-02
- Candidate: System fix: root cause of session amnesia → daily 4AM reset

Each candidate carries a confidence score, evidence trace (which session + line), and a status (staged | promoted | discarded).
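
One way to model such a candidate is a small dataclass. The field and class names here (`Candidate`, `Evidence`, `session_id`, `line`) are hypothetical; the post only states that a candidate has a confidence score, an evidence trace of session plus line, and one of three statuses.

```python
from dataclasses import dataclass, field
from typing import List

VALID_STATUSES = {"staged", "promoted", "discarded"}


@dataclass(frozen=True)
class Evidence:
    session_id: str  # which session the candidate was seen in
    line: int        # line within that session transcript


@dataclass
class Candidate:
    text: str
    confidence: float
    evidence: List[Evidence] = field(default_factory=list)
    status: str = "staged"

    def __post_init__(self) -> None:
        # Reject malformed candidates early instead of at promotion time.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status}")

    def independent_sessions(self) -> int:
        """Count distinct sessions in the evidence trace."""
        return len({e.session_id for e in self.evidence})


candidate = Candidate(
    text="[security] session_ref",
    confidence=0.62,
    evidence=[Evidence("s1", 10), Evidence("s1", 42), Evidence("s2", 3)],
)
print(candidate.status, candidate.independent_sessions())
```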

Stage 2: REM Sleep (2.5KB/day output)

Cross-links candidates from multiple sessions and attempts pattern recognition:

### Possible Lasting Truths
- 🧠 [security][emotion] cross-session consistency, confidence=0.75
- 「GPT-SoVITS voice training completed on ASUS WSL2」

Stage 3: Deep Sleep (155B/day output)

Ranks all staged candidates and decides which, if any, should be promoted to long-term:

# Deep Sleep
- Repaired recall artifacts: rewrote recall store.
- Ranked 10 candidate(s) for durable promotion.
- Promoted 0 candidate(s) into MEMORY.md.

Promotion is conservative by design. On a typical night, 0–2 candidates are promoted; the rest remain staged for human review.

Dreaming configuration (OpenClaw plugins):

"plugins": {
  "entries": {
    "memory-core": {
      "config": {
        "dreaming": {
          "enabled": true,
          "light": { "interval": "0 3 * * *", "maxCandidates": 50 },
          "deep":  { "interval": "30 3 * * *" },
          "rem":   { "interval": "45 3 * * *" },
          "maxTokensPerCycle": 8192
        }
      }
    },
    "memory-wiki": {
      "enabled": true,
      "config": {
        "bridge": {
          "readMemoryArtifacts": true,
          "indexDreamReports": true,
          "indexDailyNotes": true,
          "followMemoryEvents": true
        }
      }
    }
  }
}

6. Agent Workspace Isolation (v3.0)

Since v3.0 (2026-05-15), each sub-agent maintains its own memory workspace:

workspace/
├── coder/memory/        ←沈漫歌's code patterns, project context
├── devops/memory/       ←苏清雅's deployment history
├── tester/memory/       ←钟离燕's test cases and fixtures
├── security/memory/     ←洛飞烟's audit findings
├── uxui/memory/         ←楚灵儿's design decisions
└── accountant/memory/   ←江惜月's cost baselines

Rules:

含烟 can write: Serena can inject knowledge into any agent’s memory (e.g., adding a new deployment procedure to devops)
Agents self-manage: Each agent can freely read/write its own workspace
No cross-contamination: Agents cannot write to Serena’s memory/ or MEMORY.md
Nightly audit: Serena reviews agent memories for drift, duplication, and staleness
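
The write rules above can be expressed as a single permission check. This is a sketch, not the actual enforcement code: `can_write` is a hypothetical function, the orchestrator id `"serena"` is an assumed identifier, and denying writes to other agents' workspaces is an assumption drawn from "each agent self-manages".

```python
from pathlib import PurePosixPath

ORCHESTRATOR = "serena"  # assumed id for the top-level agent


def can_write(agent: str, target: str) -> bool:
    """Return True if `agent` may write to the relative path `target`."""
    parts = PurePosixPath(target).parts
    if ".." in parts:
        return False  # refuse path traversal out of a workspace
    if agent == ORCHESTRATOR:
        return True   # the orchestrator may inject knowledge anywhere
    # Sub-agents may only write inside workspace/<agent>/...
    return len(parts) >= 3 and parts[0] == "workspace" and parts[1] == agent


print(can_write("coder", "workspace/coder/memory/patterns.md"))
print(can_write("coder", "MEMORY.md"))
print(can_write("serena", "workspace/devops/memory/deploy.md"))
```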

This isolation pattern reduced context pollution incidents by 80% compared to the shared-memory approach used in v2.x.

7. Problems Encountered and Solutions

Problem 1: Memory Bloat — 182 Files, 60MB+ Raw Text

Issue: After three months of operation, the memory directory accumulated 182 files across layers. Loading all of them consumed ~12K tokens just in system prompts, degrading reasoning quality.

Solution: Enforced the Memory Router pattern. MEMORY.md was slimmed from 450+ lines to < 150. All detailed content was moved to sub-files with anchor pointers. The context injection was tiered (HOT/WARM/COLD/FROZEN).

# Before: 450 lines, no structure, full load
# After:  148 lines, 12 anchors, on-demand load
wc -l MEMORY.md   # → 148

Problem 2: Session Amnesia — Daily 4AM Reset

Issue: OpenClaw’s default session cleanup runs at 04:00 daily, deleting session transcripts. Agents would “forget” everything from the previous day.

Solution: Added a 03:30 pre-cleanup cron that archives session context to memory/summaries/ before the reset, configured llmSlug for efficient retrieval, and set session message retention to 60 messages:

"session": {
  "maxMessagesPerChannel": 60,
  "llmSlug": {
    "enabled": true,
    "model": "deepseek/deepseek-v4-flash",
    "interval": 10
  }
}
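
The 60-message retention can be pictured as a bounded per-channel buffer. The sketch below is an assumption about the behaviour, not OpenClaw's implementation: `SessionBuffer` is a hypothetical class, and only the limit of 60 is taken from the config above.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List

MAX_MESSAGES_PER_CHANNEL = 60  # mirrors "maxMessagesPerChannel"


class SessionBuffer:
    """Keep only the most recent N messages for each channel."""

    def __init__(self, max_messages: int = MAX_MESSAGES_PER_CHANNEL) -> None:
        self._channels: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_messages)
        )

    def append(self, channel: str, message: str) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._channels[channel].append(message)

    def history(self, channel: str) -> List[str]:
        return list(self._channels[channel])


buffer = SessionBuffer()
for i in range(75):
    buffer.append("main", f"msg {i}")

history = buffer.history("main")
print(len(history), history[0], history[-1])
```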

Problem 3: Dream Candidate Quality — High Noise, Low Signal

Issue: Early dreaming runs extracted hundreds of candidates per night with confidence scores at 0.50–0.62, most of which were trivial conversation snippets.

Solution: Added a minConfidence threshold of 0.65 for promotion consideration, and introduced evidence chains — candidates must be traceable to at least 2 independent session references to qualify:

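# Promotion gate for one dream candidate (function and constant names are illustrative)
MIN_CONFIDENCE = 0.65
MIN_SESSION_REFS = 2

def promotion_gate(confidence, session_references, conflict_with_verified):
    if confidence < MIN_CONFIDENCE:
        return "discard"                       # below the confidence threshold
    if session_references < MIN_SESSION_REFS:
        return "stage:insufficient_evidence"   # needs 2+ independent sessions
    if conflict_with_verified:
        return "stage:conflict"                # contradicts a verified record
    return "promote_to_summaries"              # goes to summaries, not long-term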

Problem 4: Anchor Dead Links

Issue: As files were moved during restructuring, MEMORY.md anchors pointed to nonexistent paths, causing memory_get() failures.

Solution: Added a nightly anchor integrity check that validates every path in MEMORY.md:

# Extracted from nightly cron
# IFS= and -r keep each path exactly as written (no trimming, no backslash escapes)
grep -oP '(?<=→ ).*' MEMORY.md | while IFS= read -r path; do
  [ -f "$path" ] || echo "DEAD LINK: $path"
done

Dead links are automatically downgraded to a ⚠️ warning in MEMORY.md and re-checked after 24h.
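
The downgrade-and-recheck step could be sketched as below. How HanyanOS actually rewrites the anchor is not shown in this post, so swapping the 🧷 marker for ⚠️ (and back once the path exists again) is an assumption, as are the names `refresh_anchor_markers`, `PIN`, and `WARN`.

```python
import os
import re
from typing import Callable

PIN = "🧷"
WARN = "⚠️"
ARROW_RE = re.compile(r"→\s*(\S+)\s*$")  # path is the last token after the arrow


def refresh_anchor_markers(memory_md: str,
                           exists: Callable[[str], bool] = os.path.isfile) -> str:
    """Mark anchors with missing targets as warnings; restore recovered ones."""
    lines = []
    for line in memory_md.splitlines():
        match = ARROW_RE.search(line)
        if match:
            if exists(match.group(1)):
                line = line.replace(WARN, PIN, 1)  # target is back: restore pin
            else:
                line = line.replace(PIN, WARN, 1)  # dead link: downgrade
        lines.append(line)
    return "\n".join(lines)


router = f"- {PIN} [dns] → a/dns.md\n- {PIN} [nginx] → a/nginx.md"
downgraded = refresh_anchor_markers(router, exists=lambda p: p == "a/dns.md")
print(downgraded)
```

Passing `exists` as a parameter keeps the function testable without touching the filesystem; the nightly job would use the default `os.path.isfile`.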

8. File Distribution and Storage

Current state after optimization (as of 2026-05-25):

Category	File Count	Avg Size	Total
Long-term memory	~60	8KB	~480KB
Governance rules	~80	3KB	~240KB
Dreaming (all stages)	~30	10KB	~300KB
Agent workspaces	~12	4KB	~48KB
Total	~182		~1.1MB

On the N100’s 512GB NVMe, this is negligible. The token cost of loading is the constraint, not storage.

9. Summary

HanyanOS’s layered memory architecture achieves sub-second retrieval, sub-2000-token context overhead, and multi-month persistence on an Intel N100:

Memory Router (MEMORY.md) — < 150 lines, anchors-only, on-demand loading
Four layers — Long-term (verified), Working (session), Dreaming (offline), Agent WS (isolated)
Dreaming pipeline — Light → REM → Deep, conservative promotion, no direct long-term writes
Trust-level metadata — Prevents hallucination contamination from dreams and inferences
Tiered injection — HOT/WARM/COLD/FROZEN reduces context size by 62%
Agent workspace isolation — Each sub-agent self-manages memory without cross-contamination
Nightly integrity checks — Dead link detection, drift audit, staleness reporting

This architecture has run continuously since v3.0 (2026-05-15) with zero hallucination-contamination incidents and an average memory retrieval latency of < 200ms.

Next in series: #8 — Nightly Automation — Cron Job Orchestration and System Maintenance