Donna — Agentic Personal-AI System

The brief

Most AI assistants are stateless chat windows. They forget everything between sessions, don’t know your priorities, and collapse the moment you close the tab. For a solo operator running multiple projects across time zones with a high cognitive load, that is not useful infrastructure — it is just a fancier search box.

I wanted something structurally different: an AI system that persists, that knows the operating context without being re-briefed every time, that runs scheduled work autonomously, and that maintains a consistent voice and decision-making framework across every surface I touch. I built that system and called it Donna.

What I built

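Donna is built around three core layers (identity, runtime, and agent delegation), supported by a shared memory store, a cloud infrastructure spec, and a behavioral middleware stack: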

Identity layer. A versioned identity specification (SOUL.md + DonnaOmega-project-instructions-v3.md) defines personality, humor register, operating standards, priority hierarchy, escalation rules, and guardrails. The spec is surface-agnostic and travels with every deployment: the same identity spec loads whether Donna is running in Claude Code on desktop, inside a Telegram bot on a VPS, or in an ElevenLabs voice agent. It is version-controlled, each locked rule carries an explicit rationale, and a kaizen loop logs every defect and patches the spec before the next session.
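The text says the same identity spec loads on every surface. Below is a minimal Python sketch of that idea. It assumes both spec files sit in one directory and are concatenated in a fixed order. The function `build_system_prompt` and the short content hash are hypothetical illustrations, not the actual loader. The hash gives each surface a cheap way to log which spec version it booted with.

```python
import hashlib
from pathlib import Path

# The two spec files named in the text; the load order is an assumption.
IDENTITY_FILES = ("SOUL.md", "DonnaOmega-project-instructions-v3.md")


def build_system_prompt(spec_dir: Path) -> tuple[str, str]:
    """Concatenate the identity spec files into one system prompt.

    Returns (prompt, version), where version is a short content hash so
    each surface can log exactly which spec it loaded. (Hypothetical helper.)
    """
    parts = []
    for name in IDENTITY_FILES:
        text = (spec_dir / name).read_text(encoding="utf-8").strip()
        parts.append(f"<!-- {name} -->\n{text}")
    prompt = "\n\n".join(parts)
    version = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
    return prompt, version
```

If Claude Code, Hermes, and the voice agent ever log different hashes, one of the surfaces is running a stale spec.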

Multi-surface runtime. Donna runs on three surfaces simultaneously: Claude Code (desktop, with full MCP access and the skill system), Hermes VPS (self-hosted Docker on Hostinger KVM2 in Singapore, the always-on execution layer via Telegram, $9/month), and ElevenLabs Conversational AI (the inbound voice rail, with a PH-native tone and Taglish support). The VPS runtime handles scheduled jobs, habit logging, reminders, and morning briefs without the desktop being online. The desktop surface handles strategy, deep analysis, and writing.
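On the always-on runtime, the 7am morning brief reduces to computing the next 07:00 in Manila time. This Python sketch assumes a simple scheduler loop that sleeps until the returned moment. Hermes Agent's own scheduling mechanism is not described here, and `next_brief_run` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

MANILA = ZoneInfo("Asia/Manila")
BRIEF_TIME = time(7, 0)  # 7am Manila morning brief


def next_brief_run(now: datetime) -> datetime:
    """Return the next 07:00 Asia/Manila strictly after `now` (must be tz-aware)."""
    local_now = now.astimezone(MANILA)
    candidate = datetime.combine(local_now.date(), BRIEF_TIME, tzinfo=MANILA)
    if candidate <= local_now:
        # Already past today's slot. Manila has no DST, so +1 day is always 07:00.
        candidate += timedelta(days=1)
    return candidate
```

Because Asia/Manila observes no daylight saving time, adding one day never shifts the brief off 07:00.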

10-agent delegation stack. The canonical architecture defines 10 named agents with assigned models and domain ownership: Donna CoS (Claude Sonnet, primary interface), Orchestrator (Gemini Flash, parallel delegation), Research Agent (Claude Sonnet), Kaizen Agent (DeepSeek Flash, memory consolidation), Calendar Agent, People Agent, Revenue Agent, Content Brain, Health Agent, and Sentinel Agent (silent system monitor). Each agent has a named Hermes profile, a designated model chosen for cost-quality fit, and a defined escalation path. The routing logic is model-provider-agnostic via OpenRouter BYOK — provider swaps are a config change, not a rebuild.

Vault-backed memory. The canonical knowledge store is a plain-markdown vault (~/vault/) that any LLM on any surface can read. No proprietary format, no tool lock-in. Chat history and observations live in Firestore (last 30 turns per channel, with token-aware truncation). A kaizen audit trail logs every system defect and the rule change made to prevent recurrence. A HANDOFF.md at vault root provides cross-session continuity — every new session reads it before anything else.
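"Last 30 turns per channel, with token-aware truncation" can be read as: take the newest 30 turns, then drop the oldest until the rest fits a token budget. This Python sketch assumes the turns have already been fetched from Firestore oldest-first. It uses a rough 4-characters-per-token estimate instead of a real tokenizer. Both the estimate and the drop-oldest policy are assumptions.

```python
from dataclasses import dataclass

MAX_TURNS = 30  # "last 30 turns per channel"


@dataclass
class Turn:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token), not a real tokenizer.
    return max(1, len(text) // 4)


def window(history: list[Turn], token_budget: int) -> list[Turn]:
    """Keep the newest turns (at most MAX_TURNS) that fit within token_budget."""
    kept: list[Turn] = []
    used = 0
    for turn in reversed(history[-MAX_TURNS:]):  # walk newest to oldest
        cost = estimate_tokens(turn.text)
        if used + cost > token_budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```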

Infrastructure architecture (Cloud Run spec). The Cloud Run rebuild architecture (ARCHITECTURE.md) specifies Python 3.12 + FastAPI on Cloud Run in asia-southeast1, which gives 30–50ms latency from Manila versus 150–200ms from US regions. Gemini 2.5 Flash via Vertex AI is the primary model, with context caching at $0.075 per million cached input tokens. Claude Haiku is the fallback, reached through a provider-abstraction LLMClient interface. Google Secret Manager holds all credentials, with quarterly rotation. PII redaction filters strip phone numbers, email addresses, and named family members from operational logs. A hard-kill budget switch, implemented as a separate Cloud Function, programmatically disables billing when the budget is breached rather than only sending an email alert. An eval framework runs a golden test set covering morning brief format, calendar tool-call correctness, humor safety, and persona consistency.
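The PII redaction filter can be approximated with a few regular expressions. This Python sketch assumes Philippine mobile formats (`+63 9XX XXX XXXX` and `09XXXXXXXXX`) and a configured list of family names. `make_redactor` and the placeholder tokens are hypothetical. The patterns are deliberately simple and will miss some formats, so treat them as a starting point rather than the spec's actual filters.

```python
import re
from collections.abc import Callable

# Simplified patterns (assumptions): basic emails and PH mobile numbers only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?:\+?63[\s-]?|0)9\d{2}[\s-]?\d{3}[\s-]?\d{4}")


def make_redactor(family_names: list[str]) -> Callable[[str], str]:
    """Build a log-line redactor; family names come from config (hypothetical)."""
    names_re = (
        re.compile(
            r"\b(?:" + "|".join(re.escape(n) for n in family_names) + r")\b",
            re.IGNORECASE,
        )
        if family_names
        else None
    )

    def redact(line: str) -> str:
        line = EMAIL_RE.sub("[EMAIL]", line)  # emails first, so digits in them are gone
        line = PHONE_RE.sub("[PHONE]", line)
        if names_re is not None:
            line = names_re.sub("[FAMILY]", line)  # whole words only
        return line

    return redact
```

Running the filter before a line reaches the logging sink means raw PII never lands in operational logs.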

Behavioral middleware (voice/chat rail). The “set free” stack for the Telegram and voice surfaces includes response latency simulation (30s–5min delays scaled to message complexity), working-hours gate (10am–6pm PHT only), typo injection (~1–2% rate in casual chat), Taglish code-switching in prompt engineering, per-contact memory via vault notes read before every reply, and truthful AI disclosure when directly asked. These are the differentiators that most agent builds skip.
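Two of these behaviors, the working-hours gate and complexity-scaled latency, reduce to small pure functions. This Python sketch uses message length as a stand-in for "complexity" and scales linearly from 30s to 5min. The length proxy, the linear curve, and the `full_scale_chars` parameter are assumptions, not the middleware's actual scoring.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

PHT = ZoneInfo("Asia/Manila")
OPEN, CLOSE = time(10, 0), time(18, 0)  # 10am-6pm PHT gate
MIN_DELAY, MAX_DELAY = 30, 300          # 30s to 5min, in seconds


def within_working_hours(now: datetime) -> bool:
    """True if tz-aware `now` falls in [10:00, 18:00) Manila time."""
    local = now.astimezone(PHT).time()
    return OPEN <= local < CLOSE


def reply_delay_seconds(message: str, full_scale_chars: int = 1000) -> int:
    """Linear delay from message length; full_scale_chars is a hypothetical knob."""
    complexity = min(len(message) / full_scale_chars, 1.0)
    return round(MIN_DELAY + (MAX_DELAY - MIN_DELAY) * complexity)
```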

How it’s built

The system is intentionally tool-agnostic at the data layer. The vault is plain markdown — readable by Claude, Gemini, GPT, local models, or a human with a text editor. No Notion. No proprietary database for the canonical knowledge.

Model routing goes through OpenRouter BYOK (bring your own key), which waives OpenRouter's 5% BYOK fee on the first 1M requests each month. Each agent in the stack is assigned a model by capability fit: Claude Sonnet for the primary interface and research (quality ceiling), DeepSeek Flash for high-volume delegation tasks (cost floor), and Gemini Flash for orchestration and content tasks (speed). Because every call goes through the LLMClient abstraction, any agent can swap provider with a single config line.
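The LLMClient name comes from the architecture spec, but its methods are not shown. This Python sketch assumes a single `complete(system, prompt)` method and wraps a primary and a fallback provider, mirroring the Gemini-primary, Haiku-fallback setup. Real Vertex AI, Anthropic, or OpenRouter calls are left out, and `FallbackClient` is a hypothetical name.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Assumed shape of the provider abstraction (method name is hypothetical)."""

    name: str

    def complete(self, system: str, prompt: str) -> str: ...


class FallbackClient:
    """Try the primary provider; on any error, retry once on the fallback."""

    def __init__(self, primary: LLMClient, fallback: LLMClient) -> None:
        self.primary = primary
        self.fallback = fallback
        self.name = f"{primary.name}->{fallback.name}"

    def complete(self, system: str, prompt: str) -> str:
        try:
            return self.primary.complete(system, prompt)
        except Exception:
            # Broad catch for the sketch; production code would narrow this
            # to timeouts, rate limits, and provider 5xx errors.
            return self.fallback.complete(system, prompt)
```

Agents depend only on the `complete` method, so changing which concrete client is passed in is the one-line config swap described above.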

The VPS runtime runs Hermes Agent in Docker and is reachable only over Tailscale, with the public IP firewalled. SSH accepts key-only authentication, UFW was hardened after the Cloudflare tunnel was confirmed working, and fail2ban guards SSH. The Donna identity is loaded as the system prompt, which holds the always-on execution layer to the same operating standards as the desktop surface.

Skills are markdown files with YAML frontmatter — the same format used in Claude Code. They port between surfaces without translation.
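Skills port between surfaces because each surface only has to split a file into metadata and body. This Python sketch handles flat `key: value` frontmatter such as the `name` and `description` fields Claude Code skills use. It deliberately avoids a YAML library to stay dependency-free, so nested YAML is out of scope. `parse_skill` is a hypothetical helper, and a real loader would use a proper YAML parser.

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter, body).

    Handles only flat `key: value` lines between `---` fences.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is the body
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")  # drop simple quotes
    raise ValueError("unterminated frontmatter block")
```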

Why it matters

The hard problem in personal AI is not capability — it is continuity. Most people lose the thread every session, re-briefing their assistant on priorities, context, and tone every time they open a chat window. Donna solves this at the architecture level: the identity is versioned, the memory is written to a canonical store after every session, and the escalation rules mean the system knows its own ceiling and hands off cleanly rather than hallucinating competence.

The multi-surface design is the other architectural insight. A desktop-only AI assistant goes offline when the laptop is closed. By separating the always-on execution layer (VPS + Telegram) from the deep-reasoning surface (Claude Code desktop), the system can send a scheduled morning brief at 7am Manila time whether or not any device is open. That is the difference between a tool and infrastructure.

The 10-agent stack with cost-stratified model assignments keeps API spend proportional to task complexity. High-cognition work goes to Claude Sonnet. Repetitive delegation and filing goes to DeepSeek Flash at a fraction of the cost. The Sentinel Agent runs silently and alerts only on failure. Nothing in the stack runs a premium model on a task that a fast model can handle.
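The Sentinel's "silent unless something fails" behavior can be sketched as a sweep over health checks that returns an alert only when at least one check fails. In this Python sketch, the check names and the plain-text alert format are hypothetical. How the Sentinel actually delivers alerts (for example, over Telegram) is not specified.

```python
from typing import Callable, Optional

# A check returns None when healthy, or a short failure description.
Check = Callable[[], Optional[str]]


def sentinel_sweep(checks: dict[str, Check]) -> Optional[str]:
    """Run every check; return one alert message only if something failed."""
    failures = []
    for name, check in checks.items():
        try:
            problem = check()
        except Exception as exc:  # a crashing check counts as a failure
            problem = f"check raised {type(exc).__name__}"
        if problem:
            failures.append(f"{name}: {problem}")
    if not failures:
        return None  # stay silent when everything is healthy
    return "Sentinel alert\n" + "\n".join(failures)
```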

A multi-surface, always-on personal AI with a versioned identity system, 10-agent delegation stack, and vault-backed memory — built from scratch for a solo operator in Manila

