The Vision

What if you could write code while walking? Debug during your morning run? Review pull requests on the treadmill? This question led me to build Pericode, a voice-controlled interface for Claude Code that lets developers be productive anywhere.

The name comes from Aristotle's peripatetic school of philosophy, where he taught while walking. The idea: your best thinking often happens when you're moving, not sitting still.

Website: pericode.app

The Challenge: Three Hard Problems

Building Pericode meant solving three distinct technical challenges:

  1. Voice: How do you balance cost, privacy, and user experience in speech recognition and synthesis?
  2. Context: How do you give Claude enough context to sustain tight development loops when you're away from your desk?
  3. Privacy: How do you protect sensitive code and conversations when routing them through relay servers?

Architecture Overview

Pericode consists of three components working in harmony:

1. Desktop App (Electron)

The desktop app runs on your development machine and orchestrates Claude Code agents:

  • Multi-agent system: Spawns Claude Code CLI processes as independent agents
  • Real-time streaming: Parses JSON output from Claude Code and streams to mobile
  • Screenshot integration: MCP (Model Context Protocol) server captures screenshots for visual context
  • Tool execution: Handles bash commands, file edits, and approvals
  • Firebase integration: Session discovery and user authentication

2. iOS App (SwiftUI)

The mobile app provides a session-based voice interface using cutting-edge offline AI:

  • Sherpa-ONNX: Neural network models for speech recognition and synthesis running entirely on-device
  • Sub-second latency: STT completes in under 1 second, no cloud round-trip
  • Privacy-first: Voice data never leaves your phone
  • Session-based interaction: Start a microphone session to communicate with Claude Code on your desktop

3. Relay Server (Cloudflare Workers)

A global edge network routes encrypted messages between devices:

  • Durable Objects: Session state maintained at the edge
  • ~50ms latency: Messages routed through Cloudflare's global network
  • Zero configuration: No port forwarding or VPN setup needed
  • Auto-scaling: Handles hundreds of concurrent sessions

Solving Voice: Cost, Privacy, and UX

The voice challenge required balancing three competing factors. Cloud-based services (Google, Azure, AWS) offer great UX but come with recurring costs and privacy concerns. Offline models solve cost and privacy but require careful implementation. Here's why I chose offline:

Why Sherpa-ONNX?

Sherpa-ONNX is a neural speech toolkit that runs ONNX models directly on device. The benefits are dramatic:

  • Latency: <1s for STT vs 2-3s for cloud APIs
  • Privacy: Voice never transmitted (critical for coding sessions)
  • Cost: Completely free for unlimited usage (cloud APIs charge $0.006-0.024/min, adding up to $360-1,440/year at 1hr/day)
  • Flexibility: For users who prefer cloud-based options, bring-your-own-key support for ElevenLabs and OpenAI is also available

The trade-off? Model size. The STT model is ~150MB, TTS models ~50MB each. But with modern phone storage, this is negligible compared to the benefits.

Solving Privacy: End-to-End Encryption

When you're sending code snippets, API keys, and development commands through relay servers, privacy isn't optional. The challenge was implementing strong, authenticated encryption without adding complexity that would slow down the development experience.

X25519 + ChaCha20-Poly1305

I implemented a complete end-to-end encryption system:

Key Exchange (X25519 ECDH):

  • Desktop generates X25519 keypair, stores in ~/.pericode/keys/
  • iOS generates keypair, stores in Keychain
  • Both upload public keys to Firebase Firestore
  • Each derives the same shared secret using ECDH (never transmitted!)
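
The key agreement above can be sketched with Node's built-in crypto module. This is an illustrative demo, not Pericode's actual implementation; in the real flow, only the public keys travel through Firestore.

```typescript
// Both sides generate a keypair locally and publish only the public half.
import { generateKeyPairSync, diffieHellman } from "node:crypto";

const desktop = generateKeyPairSync("x25519");
const phone = generateKeyPairSync("x25519");

// Each side combines its OWN private key with the PEER's public key.
// The resulting 32-byte shared secret is identical on both sides and is
// never transmitted anywhere.
const desktopSecret = diffieHellman({
  privateKey: desktop.privateKey,
  publicKey: phone.publicKey,
});
const phoneSecret = diffieHellman({
  privateKey: phone.privateKey,
  publicKey: desktop.publicKey,
});

console.log(desktopSecret.equals(phoneSecret)); // true
```

Because the secret is derived rather than exchanged, an attacker watching the network (or the relay) sees only public keys, which are useless without one of the private keys.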

Message Encryption (ChaCha20-Poly1305 AEAD):

  • 256-bit symmetric encryption
  • 96-bit random nonces per message
  • 128-bit authentication tags
  • ~0.5ms overhead per message
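
The per-message sealing can be sketched with Node's built-in chacha20-poly1305 cipher. In Pericode the key would be the ECDH-derived shared secret; here a random key stands in for the demo, and the `seal`/`open` helper names are illustrative.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const key = randomBytes(32); // 256-bit symmetric key (ECDH-derived in practice)

interface Sealed {
  nonce: Buffer;      // 96-bit random nonce, fresh for every message
  ciphertext: Buffer;
  tag: Buffer;        // 128-bit authentication tag
}

function seal(plaintext: string): Sealed {
  const nonce = randomBytes(12);
  const cipher = createCipheriv("chacha20-poly1305", key, nonce, { authTagLength: 16 });
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { nonce, ciphertext, tag: cipher.getAuthTag() };
}

function open(msg: Sealed): string {
  const decipher = createDecipheriv("chacha20-poly1305", key, msg.nonce, { authTagLength: 16 });
  decipher.setAuthTag(msg.tag); // final() throws if ciphertext or tag was tampered with
  return Buffer.concat([decipher.update(msg.ciphertext), decipher.final()]).toString("utf8");
}

const sealed = seal("git push origin main");
console.log(open(sealed)); // round-trips the original command
```

The authentication tag is what makes this AEAD rather than plain encryption: a flipped bit anywhere in the ciphertext causes decryption to fail loudly instead of yielding corrupted plaintext.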

The result? The relay server sees only encrypted ciphertext. Your code, voice commands, and Claude responses are completely private. This solves the privacy problem: even though messages route through Cloudflare's infrastructure, no one can read your sensitive development data.

Security Properties

  • Forward secrecy: Keys rotate per session
  • Authentication: Firebase prevents session hijacking
  • Zero-knowledge relay: Server cannot decrypt messages
  • No key transmission: ECDH derives shared secret without sending keys over the network

The Relay Server Decision

For device connectivity, I chose Cloudflare Workers + Durable Objects to create a seamless, zero-configuration experience:

Why Cloudflare Workers + Durable Objects?

  • Global edge network: ~50ms latency from anywhere
  • WebSocket routing: Messages relay between devices automatically
  • Session isolation: Each session in its own Durable Object
  • Auto-expiry: Sessions timeout after inactivity
  • Free tier: Covers hundreds of users

The ~50ms added latency is imperceptible in practice, and the user experience is dramatically better: just tap a desktop name in the app and connect instantly.
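
Stripped of the Workers-specific plumbing, each session's relay logic amounts to something like the sketch below. The real version lives inside a Durable Object and speaks WebSockets; here peers are abstracted as send callbacks so the core idea is visible: the relay forwards ciphertext between peers without ever decrypting it.

```typescript
type Send = (data: string) => void;

// One instance per session (mirroring one Durable Object per session).
class SessionRelay {
  private peers = new Map<string, Send>(); // peerId -> send callback

  join(peerId: string, send: Send): void {
    this.peers.set(peerId, send);
  }

  // Forward an (already-encrypted) message to every other peer in the session.
  relay(fromId: string, ciphertext: string): void {
    for (const [id, send] of this.peers) {
      if (id !== fromId) send(ciphertext);
    }
  }

  leave(peerId: string): void {
    this.peers.delete(peerId);
  }
}

// Usage: desktop and phone join the same session; messages cross over.
const session = new SessionRelay();
const received: string[] = [];
session.join("desktop", (m) => received.push(`desktop got ${m}`));
session.join("phone", (m) => received.push(`phone got ${m}`));
session.relay("phone", "<ciphertext>");
console.log(received); // [ 'desktop got <ciphertext>' ]
```

Because each session gets its own instance, there is no shared state to leak between users, which is exactly the isolation property Durable Objects provide.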

Solving Context: Tight Development Loops Away From Your Desk

The second challenge was giving Claude enough context to be useful when you're not at your computer. This is where MCP (Model Context Protocol) comes in. By feeding Claude screenshots, file context, and mobile testing documentation, and streaming the results to your phone, Claude can see what you're working on and provide accurate debugging help.

Screenshot Streaming via MCP

  • MCP Server: Desktop runs an MCP server that Claude Code can call
  • Multiple capture modes: Active window, full screen, specific app
  • Compression: Sharp library compresses to <4MB
  • Base64 streaming: Images sent via relay in chunks
  • Inline display: iOS shows thumbnails, tap for full-screen
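
The chunked base64 transport can be sketched as follows. The chunk size and helper names are illustrative, not Pericode's wire format; the point is that a large image becomes a sequence of relay-sized text messages that the receiver reassembles losslessly.

```typescript
// Split compressed image bytes into base64 chunks small enough to relay.
function toChunks(image: Buffer, chunkBytes = 64 * 1024): string[] {
  const chunks: string[] = [];
  for (let offset = 0; offset < image.length; offset += chunkBytes) {
    chunks.push(image.subarray(offset, offset + chunkBytes).toString("base64"));
  }
  return chunks;
}

// Receiver decodes each chunk and concatenates them back into the original bytes.
function fromChunks(chunks: string[]): Buffer {
  return Buffer.concat(chunks.map((c) => Buffer.from(c, "base64")));
}

const screenshot = Buffer.from("...png bytes stand-in...");
const chunks = toChunks(screenshot, 8);
console.log(fromChunks(chunks).equals(screenshot)); // true
```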

Maestro Documentation Integration

For mobile development workflows, I integrated Maestro documentation into the context system. Maestro is a cross-platform UI testing framework that works with iOS, Android, React Native, Flutter, SwiftUI, and Jetpack Compose. While this integration is experimental, it's showing promising results. By including Maestro docs in Claude's context:

  • Natural language test writing: Ask Claude to "write a test that logs in and navigates to settings" and it generates valid Maestro YAML
  • Framework-agnostic testing: Works across all mobile frameworks without platform-specific knowledge
  • Visual debugging: Combine screenshots with Maestro test commands for precise mobile UI feedback
  • Rapid iteration: Generate, modify, and run UI tests entirely via voice while away from your desk

This solves the context problem: when you say "the UI looks broken" while on a walk, Claude can actually see your screen and provide accurate debugging help without you being at your desk. For mobile projects, Claude can also generate Maestro tests to validate the fix.
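
For a sense of what "valid Maestro YAML" looks like, here is the kind of flow the login-and-settings request above might produce. The appId and element names are hypothetical, and the exact flow Claude generates depends on your app:

```yaml
# Hypothetical Maestro flow: "log in and navigate to settings"
appId: com.example.myapp
---
- launchApp
- tapOn: "Email"
- inputText: "user@example.com"
- tapOn: "Password"
- inputText: "correct-horse-battery"
- tapOn: "Log In"
- tapOn: "Settings"
- assertVisible: "Settings"
```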

Multi-Agent Orchestration

Unlike traditional chatbots, Pericode runs multiple Claude Code agents simultaneously:

  • Agent lifecycle: Each agent is a spawned child process
  • Independent execution: Agents don't block each other
  • Project isolation: Agents grouped by directory
  • Conversation history: Each agent maintains its own context
  • Streaming output: Real-time updates via JSON parsing

You can have one agent refactoring your authentication module while another runs tests, and a third reviews your PR. All controlled by voice.
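
The streaming-output piece boils down to line-buffered JSON parsing of each agent's stdout. Here is a sketch assuming one JSON object per line; the message shapes are invented for the demo, and chunk boundaries from a pipe can split a line anywhere, which is why the buffer is needed.

```typescript
// Returns a feeder you attach to a child process's stdout "data" events.
function makeJsonLineParser(onMessage: (msg: unknown) => void) {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for next time
    for (const line of lines) {
      if (line.trim()) onMessage(JSON.parse(line));
    }
  };
}

// Usage: chunks arrive split mid-message, yet both events parse cleanly.
const events: unknown[] = [];
const feed = makeJsonLineParser((m) => events.push(m));
feed('{"type":"text","content":"Runn');
feed('ing tests"}\n{"type":"tool_use"}\n');
console.log(events.length); // 2
```

Each agent gets its own parser instance, so partial lines from one agent can never bleed into another's message stream.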

Tool Approval Workflow

Giving an AI full access to bash commands and file edits is dangerous. I built an approval system:

  • SwiftUI sheet: Beautiful approval UI on iOS
  • Tool details: Shows command, description, and parameters
  • Push notifications: Fallback if app is backgrounded
  • Auto-approve option: For trusted use cases

Example: Claude wants to run npm install. You get a notification, review the command, approve or deny. If denied, Claude continues without that tool.
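
The gating logic can be modeled as a map of pending promises, each resolved when the phone's decision arrives. The class, ids, and message shapes below are hypothetical, but the pattern (suspend the agent's tool call on a promise, resolve it from a network event) is the core of any remote approval flow.

```typescript
type Notify = (requestId: string, command: string) => void;

class ApprovalGate {
  private pending = new Map<string, (approved: boolean) => void>();

  constructor(private notifyPhone: Notify) {}

  // The agent calls this before executing a tool; it blocks until a decision.
  requestApproval(requestId: string, command: string): Promise<boolean> {
    this.notifyPhone(requestId, command); // encrypted request out to the iOS app
    return new Promise((resolve) => this.pending.set(requestId, resolve));
  }

  // Called when the approve/deny message arrives back from the phone.
  resolve(requestId: string, approved: boolean): void {
    this.pending.get(requestId)?.(approved);
    this.pending.delete(requestId);
  }
}

// Usage: simulate the phone approving an `npm install` request.
const sent: string[] = [];
const gate = new ApprovalGate((id, cmd) => sent.push(`${id}: ${cmd}`));
const decision = gate.requestApproval("req-1", "npm install");
gate.resolve("req-1", true);
decision.then((approved) => console.log(approved)); // true
```

A denial resolves the same promise with `false`, so the agent simply continues its turn without that tool rather than crashing or hanging.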

What's Next

Pericode is currently in private beta with plans for public release soon. The next phase focuses on:

  • Public release: Opening up Pericode to all developers
  • Template projects: Pre-configured starter projects across different frameworks (React Native, Flutter, SwiftUI) that work out of the box with Pericode, complete with MCP server setup, screenshot tools, and Maestro integration for immediate voice-controlled development
  • Maestro integration: MCP tools for mobile UI testing and automation, enabling Claude to understand and interact with mobile app flows through natural language commands
  • Screenshot feedback MCP: Advanced screenshot analysis tools specifically designed for mobile development, allowing Claude to provide detailed UI/UX feedback and catch visual regressions during voice-based development sessions

Website: pericode.app