UpgradeTheSystem.com
Voice-First AI Strategic Mentorship
Richard Hames is one of the world's foremost strategic futurists — 40 years of thinking, 21 published books, and a body of work that most executives have never had direct access to. When he approached me to build his AI, the brief was simple: make those 40 years of thinking available in real-time conversation.
The real challenge wasn't the technology. Chatbots are easy. The challenge was fidelity — building something that didn't just know Richard's work, but thought like him. Surface-level Q&A would be an insult to the material. We needed deep persona engineering, a genuine RAG implementation across the full corpus, and voice that felt like a real conversation — not a text-to-speech read-back.
Stack & Architecture
- Cloudflare Workers — serverless compute at the edge, globally distributed
- Hono — ultra-fast TypeScript web framework that adds no cold-start overhead on Workers
- ElevenLabs — real-time voice synthesis with custom persona voice
- Gemini Flash — sub-second LLM inference for conversational turns
- Cloudflare D1 — SQLite at the edge for session and conversation state
- Cloudflare R2 — object storage for the vectorised knowledge corpus
- WebSockets — persistent bidirectional connection for voice streaming
- RAG pipeline — custom retrieval layer across 21 published works
The Architecture
Everything runs on Cloudflare Workers at the edge, meaning the compute lives close to the user, not in a data centre in Virginia. Hono handles routing with zero overhead. When a user speaks, ElevenLabs transcribes the audio; the transcript feeds into our RAG retrieval layer, which surfaces the most contextually relevant passages from Richard's 21 books stored in R2; Gemini Flash then generates a response grounded in that retrieved context. The response goes back to ElevenLabs for synthesis in Richard's custom voice and streams to the user over WebSockets. The whole round-trip runs under 100ms.
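The turn described above can be sketched as a pipeline of injected stages, which keeps the orchestration testable without the real providers. Every name here (the `Stage` type, `handleVoiceTurn`, the dependency fields) is illustrative, not the production code:

```typescript
// Sketch of the per-turn orchestration. Stage implementations are injected,
// so the flow can be exercised with stubs; in production the stages would
// call ElevenLabs (STT/TTS), the R2-backed retriever, and Gemini Flash.
type Stage<I, O> = (input: I) => Promise<O>;

interface VoiceTurnDeps {
  transcribe: Stage<ArrayBuffer, string>;                        // speech -> text
  retrieve: Stage<string, string[]>;                             // text -> passages
  generate: Stage<{ query: string; context: string[] }, string>; // grounded answer
  synthesize: Stage<string, ArrayBuffer>;                        // text -> audio
}

// One conversational turn: transcribe, retrieve grounding passages,
// generate a grounded answer, synthesize it back to audio.
async function handleVoiceTurn(
  audio: ArrayBuffer,
  deps: VoiceTurnDeps,
): Promise<ArrayBuffer> {
  const query = await deps.transcribe(audio);
  const context = await deps.retrieve(query);
  const answer = await deps.generate({ query, context });
  return deps.synthesize(answer);
}
```

In the live system each stage streams chunks over the WebSocket rather than awaiting a complete buffer; the sequential awaits here are a simplification of that flow.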
The RAG Knowledge Base
Vectorising 21 books isn't straightforward. You can't just chunk naively by paragraph — strategic thinking doesn't respect paragraph breaks. We built a custom chunking strategy that preserves conceptual boundaries, with metadata tagging by theme, book, chapter, and approximate publication era. This lets the retrieval layer surface not just relevant text but relevant context — Richard's thinking on, say, systems resilience in 1998 versus 2020 is meaningfully different, and the model needs to know that.
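A minimal sketch of what metadata-aware retrieval looks like, assuming each chunk carries theme, book, chapter, and era tags alongside its embedding. The field names, the cosine scorer, and the era filter are assumptions for illustration, not the production schema:

```typescript
// Hypothetical chunk shape: text plus the metadata tags described above.
interface Chunk {
  text: string;
  book: string;
  chapter: string;
  theme: string;
  era: number; // approximate publication year
  embedding: number[];
}

// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank chunks by similarity to the query embedding, optionally
// restricted to a publication-era window so that, e.g., 1998-era
// and 2020-era thinking can be retrieved separately.
function retrieve(
  query: number[],
  corpus: Chunk[],
  topK = 3,
  era?: { from: number; to: number },
): Chunk[] {
  return corpus
    .filter(c => !era || (c.era >= era.from && c.era <= era.to))
    .map(c => ({ c, score: cosine(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ c }) => c);
}
```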
Persona Engineering
This was the part that took the longest and matters the most. Richard has a very specific way of framing problems — he rejects binary thinking, always pulls back to systemic context, and has a characteristic vocabulary that's evolved over decades. We spent significant time on system prompt architecture, response shaping, and what I call "deflection handling" — ensuring the AI knows when to say it doesn't have enough context rather than hallucinating an answer Richard would never give.
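A stripped-down sketch of how a system prompt with an explicit deflection rule might be assembled. The wording, function name, and passage format here are illustrative, not the actual prompt architecture:

```typescript
// Hypothetical prompt assembly: retrieved passages are inlined with their
// source tags, and a deflection rule tells the model to admit missing
// context rather than improvise an answer the persona would never give.
function buildSystemPrompt(
  passages: { text: string; book: string; era: number }[],
): string {
  const context = passages
    .map(p => `[${p.book}, ${p.era}] ${p.text}`)
    .join("\n");
  return [
    "You are a strategic-futurist persona. Reject binary framings;",
    "pull every question back to its systemic context.",
    "Ground every claim in the passages below.",
    "If the passages do not cover the question, say so plainly",
    "instead of inventing a position the author never took.",
    "",
    "Retrieved passages:",
    context,
  ].join("\n");
}
```

Keeping the deflection rule in the prompt, rather than filtering answers afterwards, means the refusal itself comes out in the persona's voice.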
Cost Architecture
The $0 at rest figure isn't a gimmick — it's a deliberate architectural choice. Cloudflare Workers have no idle cost. D1 and R2 charge for storage and operations, not existence. ElevenLabs and Gemini are purely consumption-based. For a product with variable demand, this means the cost scales linearly with actual usage, not with provisioned capacity. No over-provisioning, no wasted spend on empty servers waiting for traffic.
UpgradeTheSystem.com is live and operational. Richard's 40 years of strategic thinking is now accessible in real-time voice conversation to anyone who visits. The system handles concurrent sessions without degradation, and the persona fidelity has been validated by Richard himself as representative of how he actually thinks.
Want to build something like this?
Fractional CTO, technology advisory, or a build from scratch. Let's talk.