Abyss

A voice-first AI assistant that lives on your iPhone, learns you over time through a context graph, and reaches deeper tools through a permissioned macOS bridge — built for the way people actually work.

Amazon Bedrock · Nova Sonic · Anthropic Claude · Neptune Analytics · AWS ECS Fargate · WhisperKit · ElevenLabs · SwiftUI · Node.js · WebSockets

The Problem

Today's AI assistants are stateless and siloed. Each conversation starts from scratch, with no memory of who you are, what you're working on, or what tools you actually need. They live inside browser tabs or chat windows — disconnected from the real workflows happening on your devices.

Abyss was built to close that gap: an assistant that accumulates context over time, speaks naturally through voice, and can reach into your desktop environment when you give it permission.

System Architecture

Abyss System Architecture

Abyss is a distributed system spanning three layers connected over WebSocket:

iOS App

The mobile client is built in SwiftUI with WhisperKit for on-device speech-to-text, ElevenLabs for text-to-speech, and OAuth for authenticating with external services. It connects to the backend over a persistent WebSocket and exchanges events bidirectionally.

Certain destructive actions — sending emails, deleting calendar events — require explicit iOS user confirmation before executing.
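The bidirectional event exchange can be sketched as a small tagged-union envelope. The event names and fields below are illustrative assumptions, not the actual wire protocol:

```typescript
// Hypothetical event envelope for the iOS <-> Conductor WebSocket.
// Event types and field names are illustrative, not the real protocol.
type AbyssEvent =
  | { type: "transcript"; text: string; final: boolean }        // iOS -> server
  | { type: "tool_confirm"; toolCallId: string; approved: boolean }
  | { type: "audio_chunk"; seq: number; data: string }          // base64 audio
  | { type: "response_text"; text: string; done: boolean };     // server -> iOS

function encodeEvent(ev: AbyssEvent): string {
  return JSON.stringify(ev);
}

function decodeEvent(raw: string): AbyssEvent {
  const ev = JSON.parse(raw) as AbyssEvent;
  if (typeof ev.type !== "string") throw new Error("malformed event");
  return ev;
}
```

A tagged union like this lets both sides switch exhaustively on `type` and reject anything malformed at the boundary.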

Node.js Conductor Server (ECS Fargate)

The Conductor is the central orchestration layer. It routes between three LLM backends depending on the task:

  • Bedrock Nova Lite / Pro — default text reasoning (Lite) and heavy tasks (Pro)
  • Anthropic Claude — complex reasoning and tool-use chains
  • Nova Sonic — real-time voice streaming
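A minimal sketch of what such routing could look like, assuming a simple task descriptor. The model identifiers and heuristics here are illustrative, not the production logic:

```typescript
// Illustrative routing policy for the Conductor. The task shape and
// thresholds are assumptions, not the actual implementation.
type Task = {
  modality: "voice" | "text";
  needsTools: boolean;
  complexity: "low" | "high";
};

function pickModel(task: Task): string {
  if (task.modality === "voice") return "nova-sonic"; // real-time voice streaming
  if (task.needsTools) return "claude";               // complex tool-use chains
  return task.complexity === "high" ? "nova-pro" : "nova-lite";
}
```

Keeping the policy in one pure function makes it cheap to test and to tune as model pricing and latency change.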

The Conductor dispatches tool calls to three categories:

Server Tools

APIs that run server-side with OAuth credentials:

| Service | Capabilities |
| --- | --- |
| Gmail API | inbox, search, read, send*, reply* |
| Google Calendar API | list, create, update, delete* |
| Canvas LMS | todo, grades, assignments |
| Cursor Cloud Agents | spawn, status, cancel, followup |
| Brave Search | web search |
| GitHub OAuth | auth, repos |

* = iOS user confirmation required
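The confirmation gate for the starred actions can be sketched as a check in front of tool dispatch. The tool names and callback shapes below are assumptions for illustration:

```typescript
// Sketch of the destructive-action gate: calls on this list are held
// until the iOS client approves them. Tool names are illustrative.
const DESTRUCTIVE = new Set(["gmail.send", "gmail.reply", "calendar.delete"]);

type ToolCall = { id: string; name: string; args: Record<string, unknown> };

function requiresConfirmation(call: ToolCall): boolean {
  return DESTRUCTIVE.has(call.name);
}

async function dispatch(
  call: ToolCall,
  confirm: (c: ToolCall) => Promise<boolean>, // round-trips to the iOS client
  run: (c: ToolCall) => Promise<unknown>      // executes with OAuth credentials
): Promise<unknown> {
  if (requiresConfirmation(call) && !(await confirm(call))) {
    return { error: "rejected_by_user" };
  }
  return run(call);
}
```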

macOS Bridge

A permissioned desktop agent connected through a Bridge Router. It exposes three tool modules, each confined to a workspace sandbox and protected by device-pairing security:

| Module | Capabilities |
| --- | --- |
| Command Executor | exec.run, fs.read, fs.search, fs.patch, git.* |
| Claude Code (subprocess) | Bash, Read, Edit, Write, Glob, Grep |
| Nova Act (Python) | start, act, stop — drives Chrome for web automation |
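One core piece of workspace sandboxing is path confinement: every filesystem path a bridge tool touches must resolve inside the paired workspace root. A minimal sketch, assuming POSIX-style paths (not the actual bridge implementation):

```typescript
// Path confinement sketch: reject any requested path that escapes the
// workspace root, including "../" traversal and absolute paths.
import * as path from "node:path";

function isInsideWorkspace(root: string, requested: string): boolean {
  const resolved = path.resolve(root, requested);
  const rel = path.relative(path.resolve(root), resolved);
  // Inside the root iff the relative path neither climbs out nor is absolute.
  return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
}
```

Real sandboxes add more (symlink resolution, per-module capability scoping), but the resolve-then-compare check is the foundation.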

Data Flow

The end-to-end voice interaction follows this path:

  1. User speaks → WhisperKit transcribes on-device
  2. Transcript sent over WebSocket to the Conductor
  3. Conductor routes to LLM (Nova/Claude) for reasoning
  4. LLM dispatches tool calls to one of three categories:
    • Server Tools — Gmail, Calendar, Canvas, Search
    • iOS Tools — user confirmations, audio controls, preferences
    • Bridge Tools — file system, git, Claude Code, Nova Act
  5. Tool results feed back to LLM for continued reasoning
  6. Final response streamed via ElevenLabs / Nova Sonic → iOS → speaker
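Steps 3–5 form a loop: the model either answers or requests a tool whose result is appended to the context for the next turn. A stripped-down sketch of that loop (shapes and turn limit are assumptions):

```typescript
// Abstract reasoning loop: the model function returns either a final
// answer or a tool request; tool results are fed back until it converges.
type ModelTurn =
  | { kind: "final"; text: string }
  | { kind: "tool"; name: string; args: unknown };

function runLoop(
  step: (history: unknown[]) => ModelTurn,          // one LLM call
  tools: Record<string, (args: unknown) => unknown>, // server/iOS/bridge tools
  maxTurns = 8                                       // guard against cycles
): string {
  const history: unknown[] = [];
  for (let i = 0; i < maxTurns; i++) {
    const turn = step(history);
    if (turn.kind === "final") return turn.text;
    history.push({ tool: turn.name, result: tools[turn.name](turn.args) });
  }
  throw new Error("tool loop did not converge");
}
```

The turn cap matters in practice: without it, a model that keeps requesting tools can burn tokens indefinitely.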

AWS Infrastructure

The entire backend runs in AWS us-east-1:

Compute

| Component | Details |
| --- | --- |
| Amazon ECR | abyss-server:latest (linux/amd64) |
| ECS Fargate | Cluster: abyss, Task: abyss-server, Node.js Conductor on port 8080 |
| Application Load Balancer | :8080 with WebSocket sticky sessions |
| Execution Role | abyss-ecs-execution-role (ECR + CloudWatch Logs) |
| Task Role | abyss-ecs-task-role (Bedrock + Neptune + S3 + KB) |

AI / ML

| Service | Usage |
| --- | --- |
| Bedrock Converse API | Nova Lite (default), Nova Pro (heavy), Nova Sonic (voice), Titan Embed V2 |
| Bedrock Knowledge Bases | Memory retrieval, vector search |
| Bedrock Agent Runtime | KB ingestion, S3 data source |

Storage & Graph

| Service | Usage |
| --- | --- |
| Amazon S3 | Memory documents (JSON) |
| Neptune Analytics | Context graph, OpenCypher queries, vector + keyword hybrid search |
| CloudWatch Logs | /ecs/abyss-server, structured JSON |

External Services

Gmail API · Google Calendar API · Canvas LMS API · Cursor Cloud Agents API · Brave Search API · GitHub OAuth · ElevenLabs (TTS) · WhisperKit (on-device) · Anthropic Claude API

Context Graph

Every interaction builds a persistent knowledge graph in Neptune Analytics using OpenCypher queries. Entities, preferences, projects, and relationships are extracted and stored as graph nodes and edges. Over time, the assistant develops a rich model of who you are and what you care about — enabling responses that are genuinely personalized rather than generically helpful.

Memory documents are stored as JSON in S3 and indexed through Bedrock Knowledge Bases for vector search. Neptune Analytics provides hybrid vector + keyword search for retrieval, combining semantic similarity with structured graph traversal.
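A hypothetical sketch of that retrieval step: vector similarity seeds the lookup, a graph query pulls related facts, and a weighted merge combines vector and keyword scores. The OpenCypher text, schema, weights, and field names are all assumptions for illustration:

```typescript
// Illustrative one-hop expansion from vector-similar seed nodes.
// The schema (Entity label, name property) is assumed, not the real graph.
const CONTEXT_QUERY = `
  MATCH (e:Entity)-[r]->(n)
  WHERE id(e) IN $seedIds
  RETURN e.name AS entity, type(r) AS rel, n.name AS related
  LIMIT $limit
`;

// Hybrid ranking sketch: weighted sum of vector and keyword scores.
// The 0.7 weight is an arbitrary example, not a tuned value.
type Hit = { id: string; vecScore: number; kwScore: number };

function hybridRank(hits: Hit[], alpha = 0.7): string[] {
  const score = (h: Hit) => alpha * h.vecScore + (1 - alpha) * h.kwScore;
  return [...hits].sort((a, b) => score(b) - score(a)).map((h) => h.id);
}
```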

Voice Pipeline

Voice is the primary input modality — not an afterthought bolted onto a chat interface. WhisperKit runs on-device for low-latency speech-to-text without a network round trip. Output is synthesized through ElevenLabs or Nova Sonic depending on the interaction mode, streamed back to the iOS client over WebSocket, and played through the speaker.

The system is designed for the conversational cadence of real speech: interruptions, corrections, and follow-ups all work naturally.
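At its core, interruption handling ("barge-in") reduces to cancelling in-flight playback the moment a new transcript arrives. A minimal sketch with illustrative state, not the actual client code:

```typescript
// Barge-in sketch: if the assistant is mid-playback when the user speaks,
// cancel TTS before handing the new transcript to the reasoning loop.
type VoiceState = { speaking: boolean; cancelTts: () => void };

function onTranscript(
  state: VoiceState,
  text: string
): { interrupted: boolean; text: string } {
  let interrupted = false;
  if (state.speaking) {
    state.cancelTts();   // stop audio immediately so the user isn't talked over
    state.speaking = false;
    interrupted = true;
  }
  return { interrupted, text };
}
```

Whether the interrupted response is also removed from conversation history is a separate design choice; cancelling playback is the latency-critical part.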

Challenges & Solutions

  • Context graph scalability: As the knowledge graph grows, query latency can degrade. We use selective subgraph retrieval based on embedding similarity to keep context injection fast and relevant.
  • Voice latency budget: End-to-end voice interaction needs to feel conversational (<2s). On-device WhisperKit handles recognition without a network round trip, and Bedrock streaming keeps generation responsive.
  • Desktop trust model: Giving an AI access to your filesystem is a security minefield. The workspace sandbox with pairing security ensures every bridge action requires explicit authorization, with granular capability scoping per module.
  • WebSocket reliability: Persistent WebSocket connections over mobile networks are fragile. The ALB uses sticky sessions to maintain connection affinity, and the iOS client handles reconnection transparently.
  • Multi-LLM routing: Different tasks demand different models. The Conductor dynamically routes between Nova Lite (fast/cheap), Nova Pro (complex), Claude (deep reasoning), and Nova Sonic (voice) based on task characteristics.
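Reconnection logic of the kind described above typically pairs exponential backoff with jitter so that many clients dropping at once do not reconnect in lockstep. A sketch of the delay schedule (constants are assumptions, shown in TypeScript for consistency with the server code):

```typescript
// Exponential backoff with "equal jitter": delay doubles per attempt up to
// a cap, and half of each delay is randomized to spread reconnections out.
function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2);
}
```

The client would sleep `backoffMs(n)` before the nth reconnect attempt and reset `n` to zero after a healthy connection is established.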