HanyanOS Deployment Notes
1. Overview
Memory is the substrate of autonomy. In a multi-agent system running on an Intel N100 (4 cores, 6W TDP) with limited RAM and strict token budgets, memory design is not an abstraction problem — it is a resource allocation problem. Every token spent loading irrelevant context is a token not spent on reasoning.
HanyanOS implements a four-layer hierarchical memory architecture inspired by human sleep consolidation: a permanent long-term layer (facts, rules, identity), a working session layer (conversation context), a dreaming layer (offline consolidation and extraction), and agent-specific workspaces. A memory router (MEMORY.md) serves as the entry point, while a trust-level system prevents hallucination contamination.
This post dissects the architecture, the anchor-based retrieval system, the dreaming pipeline, and the production lessons from running this on a resource-constrained N100.
2. Architecture Overview
[Figure: HanyanOS memory architecture diagram]
Key design principle: Memory is a router, not a dump. MEMORY.md contains only anchors — one-liner pointers with paths. All detailed content lives in sub-files, loaded on-demand.
3. The Memory Router — MEMORY.md as a Filesystem
MEMORY.md is strictly constrained to < 150 lines. It stores only:
## [基础设施]
Each anchor carries a type tag ([infra], [project], [task], [user], [rule]) and a file path. The retrieval flow:
公子 (user) request → 含烟 (Serena) parses keywords
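The anchor lookup can be sketched in a few lines. This is a minimal illustration, not the HanyanOS implementation: the anchor syntax (`- [tag] summary -> path`), the `Anchor` class, and the `parse_anchors` and `route` helpers are all assumptions made for the example, as are the sample paths.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical anchor syntax: "- [tag] one-line summary -> relative/path.md"
ANCHOR_RE = re.compile(
    r"^-\s*\[(?P<tag>\w+)\]\s*(?P<summary>.+?)\s*->\s*(?P<path>\S+)$"
)


@dataclass(frozen=True)
class Anchor:
    tag: str
    summary: str
    path: str


def parse_anchors(router_text: str) -> list[Anchor]:
    """Collect every line that matches the assumed anchor syntax."""
    anchors = []
    for line in router_text.splitlines():
        m = ANCHOR_RE.match(line.strip())
        if m:
            anchors.append(Anchor(m["tag"], m["summary"], m["path"]))
    return anchors


def route(keywords: list[str], anchors: list[Anchor]) -> list[str]:
    """Return paths of anchors whose tag or summary mentions any keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for a in anchors:
        haystack = f"{a.tag} {a.summary}".lower()
        if any(k in haystack for k in wanted):
            hits.append(a.path)
    return hits


# Sample router content (invented for illustration).
ROUTER = """\
## [infra]
- [infra] Nginx reverse proxy for dashboard -> memory/infra/nginx.md
- [project] Blog series outline -> memory/projects/blog.md
"""
anchors = parse_anchors(ROUTER)
print(route(["nginx"], anchors))
```

Only the matched paths would then be loaded, which is what keeps the router itself small.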
Memory tiers by access recency:
| Tier | Condition | Inject Strategy |
|---|---|---|
| HOT | Active task / last 3 days | Always inject |
| WARM | Last 7 days / active project | On-demand |
| COLD | 7–30 days | Only on search hit |
| FROZEN | > 30 days | Archived, never auto-load |
This tiered injection alone reduced average context size by ~62% compared to the naive approach of loading all memory files.
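The tier table translates into a small classifier. The sketch below is one possible reading, with assumptions stated up front: the function names are invented, day 7 is assigned to WARM (the table leaves that boundary ambiguous), and "on-demand" is interpreted as "explicitly requested or hit by a search".

```python
from datetime import date


def classify_tier(last_access: date, today: date,
                  active_task: bool = False,
                  active_project: bool = False) -> str:
    """Map access recency and activity flags to a tier name."""
    age_days = (today - last_access).days
    if active_task or age_days <= 3:
        return "HOT"
    if active_project or age_days <= 7:  # assumption: day 7 counts as WARM
        return "WARM"
    if age_days <= 30:
        return "COLD"
    return "FROZEN"


def should_inject(tier: str, requested: bool = False,
                  search_hit: bool = False) -> bool:
    """Decide whether a memory file is loaded into context."""
    if tier == "HOT":
        return True                      # always injected
    if tier == "WARM":
        return requested or search_hit   # on-demand (assumed meaning)
    if tier == "COLD":
        return search_hit                # only on search hit
    return False                         # FROZEN never auto-loads


today = date(2026, 5, 25)
print(classify_tier(date(2026, 5, 5), today))
```

Activity flags take precedence over age here, so an old file attached to the active task is still treated as HOT.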
4. Trust-Level System — Preventing Hallucination Contamination
Every memory record carries a metadata section:
{
| Level | Can Write Long-Term? | Example |
|---|---|---|
| verified | ✅ Yes | “公子’s 491 visa was granted 2026-04-02” |
| inferred | ❌ Suggest only | “Weekly API cost ≈ $60 based on 7-day trend” |
| temporary | ❌ Cache only | “Current task: Nginx config for dashboard” |
| conflict | ❌ Resolve first | Two contradicting port assignments |
Critical rule: Dream output can never write directly to long-term memory. The pipeline is:
Dream output → 含烟 (Serena) review → Summaries → 公子 (user) approval → Long-term
This prevents the most common cause of personality drift in persistent AI systems: the agent hallucinating during offline reasoning and then treating that hallucination as fact.
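A write gate that enforces both the trust table and the dream rule might look like the following. It is a sketch under assumptions: the `Record` fields, the `source` labels, and the function name are hypothetical and do not come from the HanyanOS metadata schema.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    VERIFIED = "verified"
    INFERRED = "inferred"
    TEMPORARY = "temporary"
    CONFLICT = "conflict"


@dataclass
class Record:
    text: str
    trust: Trust
    source: str = "session"   # "session" or "dream" (hypothetical labels)
    reviewed: bool = False    # reviewed by 含烟
    approved: bool = False    # approved by 公子


def can_write_long_term(rec: Record) -> bool:
    """Only verified records qualify; dream output also needs review and approval."""
    if rec.trust is not Trust.VERIFIED:
        return False
    if rec.source == "dream":
        return rec.reviewed and rec.approved
    return True


dream = Record("port 8443 is the dashboard", Trust.VERIFIED, source="dream")
print(can_write_long_term(dream))
```

Keeping the dream check separate from the trust check means a dream record cannot reach long-term memory just by being labeled `verified`.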
5. The Dreaming Pipeline — Three-Stage Consolidation
Every night at 03:00 AEST (via cron, see Article #8), the system runs a dream cycle across three stages:
Stage 1: Light Sleep (30KB/day output)
Scans recent session transcripts and identifies candidate memories — patterns, decisions, and facts that appeared with sufficient frequency or emotional salience:
- Candidate: 🧠 [security] session_ref; confidence=0.62
Each candidate carries a confidence score, evidence trace (which session + line), and a status (staged | promoted | discarded).
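A candidate record with those three fields could be modeled as below. The parser reads only what is visible in the sample line (a bracketed tag and a `confidence=` value); the `Candidate` class, the evidence string format, and anything else about the file layout are assumptions for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    tag: str
    confidence: float
    evidence: list[str] = field(default_factory=list)  # e.g. "session-id:line" (assumed)
    status: str = "staged"                             # staged | promoted | discarded


# Matches "[tag] ... confidence=0.62" anywhere in a line.
LINE_RE = re.compile(r"\[(?P<tag>[\w-]+)\].*?confidence=(?P<conf>\d(?:\.\d+)?)")


def parse_candidate(line: str) -> Optional[Candidate]:
    """Return a staged Candidate, or None if the line has no tag/confidence pair."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return Candidate(tag=m["tag"], confidence=float(m["conf"]))


c = parse_candidate("- Candidate: 🧠 [security] session_ref; confidence=0.62")
print(c)
```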
Stage 2: REM Sleep (2.5KB/day output)
Cross-links candidates from multiple sessions and attempts pattern recognition:
### Possible Lasting Truths
Stage 3: Deep Sleep (155B/day output)
Ranks all staged candidates and decides which, if any, should be promoted to long-term:
# Deep Sleep
Promotion is conservative by design. On a typical night, 0–2 candidates are promoted; the rest remain staged for human review.
Dreaming configuration (OpenClaw plugins):
"plugins": {
6. Agent Workspace Isolation (v3.0)
Since v3.0 (2026-05-15), each sub-agent maintains its own memory workspace:
workspace/
Rules:
- 含烟 (Serena) can write: Serena can inject knowledge into any agent’s memory (e.g., adding a new deployment procedure to devops)
- Agents self-manage: Each agent can freely read/write its own workspace
- No cross-contamination: Agents cannot write to Serena’s memory/ or MEMORY.md
- Nightly audit: Serena reviews agent memories for drift, duplication, and staleness
This isolation pattern reduced context pollution incidents by 80% compared to the shared-memory approach used in v2.x.
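The write rules reduce to a path check. In this sketch the directory layout (`workspace/agents/<name>/...`), the orchestrator identifier `"serena"`, and the `can_write` helper are assumptions; the post does not show the real layout or enforcement code.

```python
from pathlib import PurePosixPath

ORCHESTRATOR = "serena"  # 含烟 / Serena; identifier chosen for this example


def can_write(agent: str, target: str) -> bool:
    """Allow the orchestrator everywhere; confine other agents to their own workspace."""
    if agent == ORCHESTRATOR:
        return True
    path = PurePosixPath(target)
    if ".." in path.parts:
        return False  # reject traversal out of the workspace
    own_root = PurePosixPath("workspace") / "agents" / agent  # assumed layout
    return own_root in path.parents


print(can_write("devops", "workspace/agents/devops/memory/deploy.md"))
print(can_write("devops", "MEMORY.md"))
```

Because the check is an allow-list on the agent's own root, MEMORY.md and Serena's memory/ directory are excluded without needing to be named explicitly.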
7. Problems Encountered and Solutions
Problem 1: Memory Bloat — 182 Files, 60MB+ Raw Text
Issue: After three months of operation, the memory directory accumulated 182 files across layers. Loading all of them consumed ~12K tokens just in system prompts, degrading reasoning quality.
Solution: Enforced the Memory Router pattern. MEMORY.md was slimmed from 450+ lines to < 150. All detailed content was moved to sub-files with anchor pointers. The context injection was tiered (HOT/WARM/COLD/FROZEN).
# Before: 450 lines, no structure, full load
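A line-budget check is one way to keep the router from growing back. The sketch below assumes every line counts toward the limit, blank lines included; the helper name is invented for the example.

```python
MAX_ROUTER_LINES = 150  # the ceiling stated for MEMORY.md


def check_router_budget(router_text: str,
                        limit: int = MAX_ROUTER_LINES) -> tuple[bool, int]:
    """Return (within_budget, line_count); the limit itself is exclusive."""
    count = len(router_text.splitlines())
    return count < limit, count


ok, count = check_router_budget("- [infra] anchor -> path.md\n" * 450)
print(ok, count)
```

Run in a nightly job or a pre-commit hook, a failing result would signal that detail has leaked back into the router and should move to a sub-file.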
Problem 2: Session Amnesia — Daily 4AM Reset
Issue: OpenClaw’s default session cleanup runs at 04:00 daily, deleting session transcripts. Agents would “forget” everything from the previous day.
Solution: Added a 03:30 pre-cleanup cron that archives session context to memory/summaries/ before the reset, configured llmSlug for efficient retrieval, and set session message retention to 60 messages:
"session": {
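The pre-cleanup archive step could be as simple as the script below. It is a sketch, not the actual cron job: the message shape, the one-file-per-day naming, and the choice to archive only the last 60 messages are assumptions, and OpenClaw's own session format is not modeled.

```python
import json
from datetime import date
from pathlib import Path

RETAINED_MESSAGES = 60  # mirrors the retention figure above


def archive_session(messages: list, summaries_dir: Path, day: date) -> Path:
    """Write the tail of a session to <summaries_dir>/<YYYY-MM-DD>.json."""
    summaries_dir.mkdir(parents=True, exist_ok=True)
    out = summaries_dir / f"{day.isoformat()}.json"
    tail = messages[-RETAINED_MESSAGES:]
    out.write_text(json.dumps(tail, ensure_ascii=False, indent=2),
                   encoding="utf-8")
    return out


# A crontab entry of the form "30 3 * * * <command>" would run this at 03:30,
# half an hour before the 04:00 cleanup; the command itself is deployment-specific.
```

Writing with `ensure_ascii=False` keeps Chinese text readable in the archived file.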
Problem 3: Dream Candidate Quality — High Noise, Low Signal
Issue: Early dreaming runs extracted hundreds of candidates per night with confidence scores at 0.50–0.62, most of which were trivial conversation snippets.
Solution: Added a minConfidence threshold of 0.65 for promotion consideration, and introduced evidence chains — candidates must be traceable to at least 2 independent session references to qualify:
# Pseudo: promotion gate logic
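In Python, the gate described above might read as follows. The evidence format (`"session-id:line"`) and the inclusive comparison at exactly 0.65 are assumptions; only the two thresholds come from the text.

```python
MIN_CONFIDENCE = 0.65  # promotion threshold from the text
MIN_SESSIONS = 2       # independent session references required


def passes_gate(confidence: float, evidence: list) -> bool:
    """True when confidence and evidence breadth both meet the thresholds."""
    # Count distinct sessions, not raw references, so repeated hits in a
    # single session do not satisfy the independence requirement.
    sessions = {ref.split(":", 1)[0] for ref in evidence}
    return confidence >= MIN_CONFIDENCE and len(sessions) >= MIN_SESSIONS


print(passes_gate(0.70, ["s1:10", "s2:4"]))
print(passes_gate(0.70, ["s1:10", "s1:22"]))
```

Passing the gate only makes a candidate eligible for promotion consideration; it does not bypass the review and approval steps.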
Problem 4: Anchor Dead Links
Issue: As files were moved during restructuring, MEMORY.md anchors pointed to nonexistent paths, causing memory_get() failures.
Solution: Added a nightly anchor integrity check that validates every path in MEMORY.md:
# Extracted from nightly cron
Dead links are automatically downgraded to a ⚠️ warning in MEMORY.md and re-checked after 24h.
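The check-and-downgrade cycle can be sketched as a pure function over the router text. The anchor syntax (`... -> path`) and the choice to mark dead links with a leading `⚠️ ` are assumptions for this example; the real cron script is not shown in the post.

```python
import re
from pathlib import Path

PATH_RE = re.compile(r"->\s*(\S+)\s*$")  # hypothetical anchor syntax
WARNING = "⚠️ "


def check_anchors(router_text: str, root: Path):
    """Return (updated_text, dead_paths); dead anchors get a warning prefix."""
    dead = []
    out_lines = []
    for raw in router_text.splitlines():
        # Drop any earlier warning so a repaired link is cleared on re-check.
        line = raw[len(WARNING):] if raw.startswith(WARNING) else raw
        m = PATH_RE.search(line)
        if m and not (root / m.group(1)).is_file():
            dead.append(m.group(1))
            line = WARNING + line
        out_lines.append(line)
    return "\n".join(out_lines), dead
```

Running the same function again the next night covers the 24h re-check: a path that now resolves loses its warning, and one that is still missing keeps it.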
8. File Distribution and Storage
Current state after optimization (as of 2026-05-25):
| Category | File Count | Avg Size | Total |
|---|---|---|---|
| Long-term memory | ~60 | 8KB | ~480KB |
| Governance rules | ~80 | 3KB | ~240KB |
| Dreaming (all stages) | ~30 | 10KB | ~300KB |
| Agent workspaces | ~12 | 4KB | ~48KB |
| Total | ~182 | | ~1.1MB |
On the N100’s 512GB NVMe, this is negligible. The token cost of loading is the constraint, not storage.
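The totals, and the gap between storage and token cost, can be checked with quick arithmetic. The file counts and average sizes are copied from the table; the 4-bytes-per-token figure is only a rough rule of thumb assumed here, not a measured value for this system.

```python
# (file count, average size in KB) per category, copied from the table
CATEGORIES = {
    "long_term": (60, 8),
    "governance": (80, 3),
    "dreaming": (30, 10),
    "agent_workspaces": (12, 4),
}


def totals(categories):
    """Sum file counts and total size in KB across categories."""
    files = sum(n for n, _ in categories.values())
    size_kb = sum(n * avg for n, avg in categories.values())
    return files, size_kb


def approx_tokens(size_kb: int, bytes_per_token: int = 4) -> int:
    """Very rough token estimate; bytes_per_token is an assumed heuristic."""
    return size_kb * 1000 // bytes_per_token


files, size_kb = totals(CATEGORIES)
print(files, size_kb, round(size_kb / 1000, 1))  # 182 1068 1.1
print(approx_tokens(size_kb))                    # 267000
```

About 1 MB is trivial on disk, but under that heuristic it is on the order of a few hundred thousand tokens, which is why selective loading matters more than storage.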
9. Summary
HanyanOS’s layered memory architecture achieves sub-second retrieval, sub-2000-token context overhead, and multi-month persistence on an Intel N100:
- Memory Router (MEMORY.md) — < 150 lines, anchors-only, on-demand loading
- Four layers — Long-term (verified), Working (session), Dreaming (offline), Agent WS (isolated)
- Dreaming pipeline — Light → REM → Deep, conservative promotion, no direct long-term writes
- Trust-level metadata — Prevents hallucination contamination from dreams and inferences
- Tiered injection — HOT/WARM/COLD/FROZEN reduces context size by 62%
- Agent workspace isolation — Each sub-agent self-manages memory without cross-contamination
- Nightly integrity checks — Dead link detection, drift audit, staleness reporting
This architecture has run continuously since v3.0 (2026-05-15) with zero hallucination-contamination incidents and an average memory retrieval latency of < 200ms.
Next in series: #8 — Nightly Automation — Cron Job Orchestration and System Maintenance