HanyanOS Deployment Notes

1. Overview

Memory is the substrate of autonomy. In a multi-agent system running on an Intel N100 (4 cores, 6W TDP) with limited RAM and strict token budgets, memory design is not an abstraction problem — it is a resource allocation problem. Every token spent loading irrelevant context is a token not spent on reasoning.

HanyanOS implements a four-layer hierarchical memory architecture inspired by human sleep consolidation: a permanent long-term layer (facts, rules, identity), a working session layer (conversation context), a dreaming layer (offline consolidation and extraction), and agent-specific workspaces. A memory router (MEMORY.md) serves as the entry point, while a trust-level system prevents hallucination contamination.

This post dissects the architecture, the anchor-based retrieval system, the dreaming pipeline, and the production lessons from running this on a resource-constrained N100.

2. Architecture Overview

┌──────────────────────────────────────────────────────────────────────┐
│ MEMORY.md (Memory Router)                                            │
│ Top-level index only: < 150 lines, anchors only                      │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│ LAYER 1: LONG-TERM (Permanent, verified facts only)                  │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ memory/infrastructure/   ← network, DNS, nginx, security         │ │
│ │ memory/user/             ← 公子 (the user) preferences, profile  │ │
│ │ memory/projects/         ← active project state                  │ │
│ │ SOUL.md                  ← persona (immutable, highest tier)     │ │
│ │ HanyanOS/rules/          ← governance, 20+ rule files            │ │
│ │ HanyanOS/knowledge/      ← learned knowledge                     │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│ LAYER 2: WORKING (Session-scoped, ephemeral)                         │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Session buffer (last 60 messages, LLM-slugged)                   │ │
│ │ memory/summaries/        ← compressed session summaries          │ │
│ │ memory/cache/            ← task-scoped temporary data            │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│ LAYER 3: DREAMING (Subconscious, offline processing)                 │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ dreaming/light/          ← candidate extraction (~30KB/day)      │ │
│ │ dreaming/deep/           ← pattern ranking & promotion (155B)    │ │
│ │ dreaming/rem/            ← rapid-eye-movement reflection (~2.5K) │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│ LAYER 4: AGENT WS (Per-agent autonomous memory)                      │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ workspace/coder/memory/  ← 沈漫歌's knowledge                    │ │
│ │ workspace/devops/memory/ ← 苏清雅's config history               │ │
│ │ workspace/tester/memory/ ← 钟离燕's test fixtures                │ │
│ │ workspace/...            ← each agent self-manages               │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Key design principle: Memory is a router, not a dump. MEMORY.md contains only anchors — one-liner pointers with paths. All detailed content lives in sub-files, loaded on-demand.

3. The Memory Router — MEMORY.md as a Filesystem

MEMORY.md is strictly constrained to < 150 lines. It stores only:

## [Infrastructure]
- 🧷 [network] → memory/infrastructure/network/README.md
- 🧷 [nginx] → memory/infrastructure/nginx/README.md
- 🧷 [dns] → memory/infrastructure/dns/README.md

## [Active Projects]
- 🧷 Dashboard refactor → HanyanOS/tasks/active/dashboard-refactor.md

Each anchor carries a type tag ([infra], [project], [task], [user], [rule]) and a file path. The retrieval flow:

公子 (the user) request → 含烟 (Serena) parses keywords
  → MEMORY.md anchor lookup (fast path, zero I/O)
      → Hit?  → memory_get() on target file
      → Miss? → memory_search() semantic retrieval across all layers
  → Inject only relevant context (HOT/WARM/COLD tiers)
  → Execute task
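
The lookup order above can be sketched in Python. This is a minimal illustration under stated assumptions, not the HanyanOS implementation: `parse_anchors`, `resolve`, and the `search` callback are hypothetical names, and the anchor line format is assumed to match the MEMORY.md excerpt (`- 🧷 label → path`).

```python
import re
from typing import Callable, Dict, List

# Assumed anchor format, taken from the MEMORY.md excerpt: "- 🧷 <label> → <path>"
ANCHOR_RE = re.compile(r"^\s*-\s*🧷\s*(?P<label>.+?)\s*→\s*(?P<path>\S+)\s*$")


def parse_anchors(router_text: str) -> Dict[str, str]:
    """Map each normalized anchor label to its target path."""
    anchors: Dict[str, str] = {}
    for line in router_text.splitlines():
        match = ANCHOR_RE.match(line)
        if match:
            # "[network]" and "network" are treated as the same key.
            label = match.group("label").strip("[] ").lower()
            anchors[label] = match.group("path")
    return anchors


def resolve(keywords: List[str], anchors: Dict[str, str],
            search: Callable[[List[str]], List[str]]) -> List[str]:
    """Return file paths to load: anchor hits first, search only on a miss."""
    hits = [anchors[k.lower()] for k in keywords if k.lower() in anchors]
    if hits:
        return hits            # fast path: the router already names the file
    return search(keywords)    # slow path: hypothetical stand-in for memory_search()
```

Keeping the fallback as an injected callback means the router logic can be tested without a semantic index behind it.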

Memory tiers by access recency:

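| Tier   | Condition                    | Inject Strategy           |
|--------|------------------------------|---------------------------|
| HOT    | Active task / last 3 days    | Always inject             |
| WARM   | Last 7 days / active project | On-demand                 |
| COLD   | 7–30 days                    | Only on search hit        |
| FROZEN | > 30 days                    | Archived, never auto-load |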

This tiered injection alone reduced average context size by ~62% compared to the naive approach of loading all memory files.
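
As a rough sketch, the tier rules can be written as a single classification function. The names `Tier` and `classify` and the two boolean flags are assumptions made for illustration. The table's HOT and WARM conditions overlap for items younger than 3 days, so this version checks the hottest tier first and treats the day boundaries as inclusive.

```python
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "always inject"
    WARM = "on-demand"
    COLD = "only on search hit"
    FROZEN = "archived, never auto-load"


def classify(last_access: datetime, now: datetime,
             active_task: bool = False, active_project: bool = False) -> Tier:
    """Assign a tier from access recency plus the two activity flags."""
    age = now - last_access
    if active_task or age <= timedelta(days=3):
        return Tier.HOT
    if active_project or age <= timedelta(days=7):
        return Tier.WARM
    if age <= timedelta(days=30):
        return Tier.COLD
    return Tier.FROZEN
```

For example, a file last touched 20 days ago is COLD on its own, but becomes WARM if it belongs to an active project.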

4. Trust-Level System — Preventing Hallucination Contamination

Every memory record carries a metadata section:

{
  "source": "user_direct | agent_inferred | external_search | dream_output",
  "trust_level": "verified | inferred | temporary | conflict",
  "updated_at": "2026-05-25T01:00:00+00:00"
}

| Level     | Can Write Long-Term? | Example                                      |
|-----------|----------------------|----------------------------------------------|
| verified  | ✅ Yes               | “公子's 491 visa was granted 2026-04-02”     |
| inferred  | ❌ Suggest only      | “Weekly API cost ≈ $60 based on 7-day trend” |
| temporary | ❌ Cache only        | “Current task: Nginx config for dashboard”   |
| conflict  | ❌ Resolve first     | Two contradicting port assignments           |

Critical rule: Dream output can never write directly to long-term memory. The pipeline is:

Dream output → 含烟 review → Summaries → 公子 approval → Long-term

This prevents the most common cause of personality drift in persistent AI systems: the agent hallucinating during offline reasoning and then treating that hallucination as fact.
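
One way to express the write gate in code is sketched below. It is an illustration only: `MemoryRecord`, `may_write_long_term`, and the `approved_by_user` flag are hypothetical, with the flag standing in for the 公子 approval step in the pipeline. The source and trust-level values are copied from the metadata schema.

```python
from dataclasses import dataclass

# Values copied from the metadata schema in this section.
SOURCES = {"user_direct", "agent_inferred", "external_search", "dream_output"}
TRUST_LEVELS = {"verified", "inferred", "temporary", "conflict"}


@dataclass(frozen=True)
class MemoryRecord:
    text: str
    source: str
    trust_level: str
    approved_by_user: bool = False  # hypothetical flag for the approval step


def may_write_long_term(record: MemoryRecord) -> bool:
    """Allow a long-term write only for verified records.

    Dream output additionally needs explicit user approval, so it can
    never reach long-term memory directly.
    """
    if record.source not in SOURCES or record.trust_level not in TRUST_LEVELS:
        raise ValueError("unknown source or trust level")
    if record.trust_level != "verified":
        return False
    if record.source == "dream_output":
        return record.approved_by_user
    return True
```

Rejecting unknown values with an exception, rather than returning False, makes a malformed record visible instead of silently dropping it.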

5. The Dreaming Pipeline — Three-Stage Consolidation

Every night at 03:00 AEST (via cron, see Article #8), the system runs a dream cycle across three stages:

Stage 1: Light Sleep (30KB/day output)

Scans recent session transcripts and identifies candidate memories — patterns, decisions, and facts that appeared with sufficient frequency or emotional salience:

- Candidate: 🧠 [security] session_ref; confidence=0.62
- Candidate: Core event: 491 visa granted 2026-04-02
- Candidate: System fix: session amnesia root cause → daily 4AM reset

Each candidate carries a confidence score, evidence trace (which session + line), and a status (staged | promoted | discarded).

Stage 2: REM Sleep (2.5KB/day output)

Cross-links candidates from multiple sessions and attempts pattern recognition:

### Possible Lasting Truths
- 🧠 [security][emotion] cross-session consistency, confidence=0.75
- 「GPT-SoVITS voice training completed on ASUS WSL2」

Stage 3: Deep Sleep (155B/day output)

Ranks all staged candidates and decides which, if any, should be promoted to long-term:

# Deep Sleep
- Repaired recall artifacts: rewrote recall store.
- Ranked 10 candidate(s) for durable promotion.
- Promoted 0 candidate(s) into MEMORY.md.

Promotion is conservative by design. On a typical night, 0–2 candidates are promoted; the rest remain staged for human review.

Dreaming configuration (OpenClaw plugins):

"plugins": {
  "entries": {
    "memory-core": {
      "config": {
        "dreaming": {
          "enabled": true,
          "light": { "interval": "0 3 * * *", "maxCandidates": 50 },
          "deep": { "interval": "30 3 * * *" },
          "rem": { "interval": "45 3 * * *" },
          "maxTokensPerCycle": 8192
        }
      }
    },
    "memory-wiki": {
      "enabled": true,
      "config": {
        "bridge": {
          "readMemoryArtifacts": true,
          "indexDreamReports": true,
          "indexDailyNotes": true,
          "followMemoryEvents": true
        }
      }
    }
  }
}

6. Agent Workspace Isolation (v3.0)

Since v3.0 (2026-05-15), each sub-agent maintains its own memory workspace:

workspace/
├── coder/memory/ ←沈漫歌's code patterns, project context
├── devops/memory/ ←苏清雅's deployment history
├── tester/memory/ ←钟离燕's test cases and fixtures
├── security/memory/ ←洛飞烟's audit findings
├── uxui/memory/ ←楚灵儿's design decisions
└── accountant/memory/ ←江惜月's cost baselines

Rules:

  • 含烟 (Serena) can write: Serena can inject knowledge into any agent’s memory (e.g., adding a new deployment procedure to devops)
  • Agents self-manage: Each agent can freely read/write its own workspace
  • No cross-contamination: Agents cannot write to Serena’s memory/ or MEMORY.md
  • Nightly audit: Serena reviews agent memories for drift, duplication, and staleness

This isolation pattern reduced context pollution incidents by 80% compared to the shared-memory approach used in v2.x.
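
The write rules can be captured in a small permission check. This is a sketch under assumptions: `can_write` is a hypothetical helper, `"serena"` is an assumed identifier for 含烟, agent names are assumed to match their workspace directory names, and paths are assumed to be relative POSIX paths.

```python
from pathlib import PurePosixPath

ORCHESTRATOR = "serena"  # assumed identifier for 含烟 (Serena)


def can_write(actor: str, target: str) -> bool:
    """Apply the workspace write rules to a relative POSIX path."""
    parts = PurePosixPath(target).parts
    if ".." in parts:
        return False  # refuse paths that could climb out of a workspace
    if actor == ORCHESTRATOR:
        return True   # Serena may write anywhere, including agent memory
    # Agents may write only under workspace/<their own name>/
    return len(parts) >= 3 and parts[0] == "workspace" and parts[1] == actor
```

Under these rules `can_write("devops", "workspace/devops/memory/deploy.md")` is allowed, while the same agent writing to `MEMORY.md` or to another agent's workspace is refused.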

7. Problems Encountered and Solutions

Problem 1: Memory Bloat — 182 Files, 60MB+ Raw Text

Issue: After three months of operation, the memory directory accumulated 182 files across layers. Loading all of them consumed ~12K tokens just in system prompts, degrading reasoning quality.

Solution: Enforced the Memory Router pattern. MEMORY.md was slimmed from 450+ lines to < 150. All detailed content was moved to sub-files with anchor pointers. The context injection was tiered (HOT/WARM/COLD/FROZEN).

# Before: 450 lines, no structure, full load
# After: 148 lines, 12 anchors, on-demand load
wc -l MEMORY.md # → 148
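
Both constraints, the line budget and the anchors-only rule, can be checked mechanically. The sketch below is a hypothetical helper, not part of HanyanOS; it assumes that any line containing `→` is an anchor and that blank lines and `#` headings are the only other content allowed.

```python
from typing import List

MAX_LINES = 150  # budget stated in the post: MEMORY.md stays under 150 lines


def check_router(text: str, max_lines: int = MAX_LINES) -> List[str]:
    """Return a list of violations; an empty list means the router is clean."""
    problems: List[str] = []
    lines = text.splitlines()
    if len(lines) >= max_lines:
        problems.append(f"{len(lines)} lines, budget is < {max_lines}")
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Blank lines, headings, and anchor lines (containing "→") are allowed.
        if not stripped or stripped.startswith("#") or "→" in stripped:
            continue
        problems.append(f"line {number} is not an anchor: {stripped[:40]}")
    return problems
```

Any free-text paragraph that creeps back into MEMORY.md shows up as a "not an anchor" violation, which is the bloat pattern this problem describes.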

Problem 2: Session Amnesia — Daily 4AM Reset

Issue: OpenClaw’s default session cleanup runs at 04:00 daily, deleting session transcripts. Agents would “forget” everything from the previous day.

Solution: Added a 03:30 pre-cleanup cron that archives session context to memory/summaries/ before the reset, configured llmSlug for efficient retrieval, and set session message retention to 60 messages:

"session": {
  "maxMessagesPerChannel": 60,
  "llmSlug": {
    "enabled": true,
    "model": "deepseek/deepseek-v4-flash",
    "interval": 10
  }
}
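
The pre-cleanup step could look something like the sketch below. It is an assumption-heavy illustration: the post does not say where OpenClaw stores transcripts or how the summaries are produced, so `archive_sessions` and both directory arguments are hypothetical, and the function copies raw files into a dated folder rather than compressing them.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import List


def archive_sessions(sessions_dir: Path, summaries_dir: Path, day: date) -> List[Path]:
    """Copy every transcript file into summaries_dir/<YYYY-MM-DD>/.

    Returns the paths of the copies so the caller can log or verify them
    before the cleanup job deletes the originals.
    """
    target = summaries_dir / day.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for src in sorted(sessions_dir.iterdir()):
        if src.is_file():
            copied.append(Path(shutil.copy2(src, target / src.name)))
    return copied
```

Scheduling a job like this at 03:30 leaves a 30-minute margin before the 04:00 reset described above.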

Problem 3: Dream Candidate Quality — High Noise, Low Signal

Issue: Early dreaming runs extracted hundreds of candidates per night with confidence scores at 0.50–0.62, most of which were trivial conversation snippets.

Solution: Added a minConfidence threshold of 0.65 for promotion consideration, and introduced evidence chains — candidates must be traceable to at least 2 independent session references to qualify:

# Pseudo: promotion gate logic
if candidate.confidence < 0.65:
    discard()
elif candidate.session_references < 2:
    stage("insufficient_evidence")
elif candidate.conflict_with_verified:
    stage("conflict")
else:
    promote_to_summaries()
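
A runnable version of the same gate, as a minimal sketch: the `Candidate` dataclass, the `gate` function, and the outcome strings are hypothetical, while the two thresholds come from the solution described above.

```python
from dataclasses import dataclass

MIN_CONFIDENCE = 0.65   # threshold stated in the post
MIN_SESSION_REFS = 2    # independent session references required


@dataclass
class Candidate:
    text: str
    confidence: float
    session_references: int
    conflict_with_verified: bool = False


def gate(candidate: Candidate) -> str:
    """Return the outcome for one candidate, applying the rules in order."""
    if candidate.confidence < MIN_CONFIDENCE:
        return "discarded"
    if candidate.session_references < MIN_SESSION_REFS:
        return "staged:insufficient_evidence"
    if candidate.conflict_with_verified:
        return "staged:conflict"
    return "promoted_to_summaries"
```

Note that passing the gate promotes a candidate only to the summaries stage; the long-term write still goes through review and approval.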

Problem 4: Dead Anchor Links

Issue: As files were moved during restructuring, MEMORY.md anchors pointed to nonexistent paths, causing memory_get() failures.

Solution: Added a nightly anchor integrity check that validates every path in MEMORY.md:

# Extracted from nightly cron (needs GNU grep for -P)
grep -oP '(?<=→ ).*' MEMORY.md | while IFS= read -r path; do
    [ -f "$path" ] || echo "DEAD LINK: $path"
done

Dead links are automatically downgraded to a ⚠️ warning in MEMORY.md and re-checked after 24h.
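
The downgrade-and-recheck behaviour could be sketched in Python as follows. `audit_anchors` is a hypothetical helper, and appending the ⚠️ marker to the end of the anchor line is an assumption about where the warning goes.

```python
from pathlib import Path
from typing import List, Tuple

WARNING = "⚠️"


def audit_anchors(router_text: str, root: Path) -> Tuple[str, List[str]]:
    """Return the router text with dead anchors flagged, plus the dead paths."""
    dead: List[str] = []
    out_lines: List[str] = []
    for line in router_text.splitlines():
        if "→" not in line:
            out_lines.append(line)
            continue
        clean = line.replace(WARNING, "").rstrip()  # drop a marker from an earlier run
        path = clean.split("→", 1)[1].strip()
        if (root / path).is_file():
            out_lines.append(clean)                 # link is healthy (again)
        else:
            dead.append(path)
            out_lines.append(f"{clean} {WARNING}")
    return "\n".join(out_lines), dead
```

Because the marker is stripped and re-evaluated on every pass, running the function nightly gives the 24h re-check without any extra state.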

8. File Distribution and Storage

Current state after optimization (as of 2026-05-25):

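| Category              | File Count | Avg Size | Total  |
|-----------------------|------------|----------|--------|
| Long-term memory      | ~60        | 8KB      | ~480KB |
| Governance rules      | ~80        | 3KB      | ~240KB |
| Dreaming (all stages) | ~30        | 10KB     | ~300KB |
| Agent workspaces      | ~12        | 4KB      | ~48KB  |
| Total                 | ~182       |          | ~1.1MB |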

On the N100’s 512GB NVMe, this is negligible. The token cost of loading is the constraint, not storage.
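
The totals follow directly from the per-category figures. A quick check, using the approximate counts and average sizes from the table:

```python
# (category, approximate file count, approximate average size in KB)
ROWS = [
    ("Long-term memory", 60, 8),
    ("Governance rules", 80, 3),
    ("Dreaming (all stages)", 30, 10),
    ("Agent workspaces", 12, 4),
]

total_files = sum(count for _, count, _ in ROWS)
total_kb = sum(count * avg_kb for _, count, avg_kb in ROWS)

print(total_files)  # 182
print(total_kb)     # 1068 (KB), which rounds to the ~1.1MB shown above
```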

9. Summary

HanyanOS’s layered memory architecture achieves sub-second retrieval, sub-2000-token context overhead, and multi-month persistence on an Intel N100:

  • Memory Router (MEMORY.md) — < 150 lines, anchors-only, on-demand loading
  • Four layers — Long-term (verified), Working (session), Dreaming (offline), Agent WS (isolated)
  • Dreaming pipeline — Light → REM → Deep, conservative promotion, no direct long-term writes
  • Trust-level metadata — Prevents hallucination contamination from dreams and inferences
  • Tiered injection — HOT/WARM/COLD/FROZEN reduces context size by 62%
  • Agent workspace isolation — Each sub-agent self-manages memory without cross-contamination
  • Nightly integrity checks — Dead link detection, drift audit, staleness reporting

This architecture has run continuously since v3.0 (May 15) with zero hallucination-contamination incidents and an average memory retrieval latency of < 200ms.


Next in series: #8 — Nightly Automation — Cron Job Orchestration and System Maintenance