Agentic Orchestration Infrastructure

A production multi-agent AI system with semantic long-term memory, autonomous task orchestration, voice input, and continuous operation on personal hardware. Built and operated by one person, and actively used to ship real products.

AI · Multi-Agent · Python · Claude · Whisper · sqlite-vec · MCP · Automation · Infrastructure

A production multi-agent AI system designed, built, and operated alone. Voice in, work out. No manual handoffs, no supervision required, no cloud dependency for transcription. The same system that builds and ships StreakUp.

What It Does

A voice message arrives in Telegram. Within 5 seconds it becomes a running task, transcribed on-device by whisper.cpp with no external API call and no cost per message. That task is matched to the right specialist agent, given a focused memory slice assembled from semantic search results, and spawned with a precise context window rather than a full file dump.

The system runs work in parallel. One instruction can launch a build agent, a marketing agent, and a career agent simultaneously. Each appends results to a shared state file when done. Nothing is committed to the decision log until the main agent has reviewed the full diff. Work accumulates; oversight stays at the center.

It also monitors itself. A polling daemon checks agent progress every 3 minutes and surfaces stalled work via Telegram before it becomes a problem. A separate reaper process terminates agents that do not recover. OAuth failures write a file-based marker that the heartbeat detects, formats into a one-tap recovery message, and sends once. The system tells you what it needs; you resolve it in one step.

Problem

Most AI tools are stateless. Every new session starts from zero, specialist agents require constant manual handoffs, and any task running longer than a few minutes needs supervision. OAuth tokens expire silently, parallel workstreams are impossible in a single agent loop, and there is no mechanism to detect a failed process without manually checking.

Technical Explanation

Stateless sessions force full context reconstruction on every call, wasting tokens and breaking continuity on multiday tasks. Single-agent architectures serialize work that could run in parallel, blocking throughput. Without persistent semantic memory, each spawned agent must receive the full project brief at start time, pushing prompts past practical token limits. Without an autonomous monitoring layer, stalled or failed processes accumulate undetected until a human notices.

Architecture

Every design decision in this system serves one constraint: maximize context precision while minimizing token cost. That principle determines what loads at session start, what gets queried on demand, what never auto-injects, and how much context each spawned agent receives.

Memory Injection Hierarchy

Session context loads in tiers matched to task complexity. Lite (~2k tokens) loads operational rules only. Standard (~6-8k) adds identity, user profile, domain context, and the skills registry. Full (~11-14k) adds agent coordination files, the spawn template, and shared state. Most sessions never reach full. Complex multi-agent tasks do.
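A minimal sketch of how tier selection could look. The token ranges and tier contents come from the description above; the `choose_tier` helper and its selection rule are illustrative assumptions, not the system's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    approx_tokens: tuple[int, int]  # (low, high) range taken from the text above
    loads: tuple[str, ...]


LITE = Tier("lite", (2_000, 2_000), ("operational rules",))
STANDARD = Tier(
    "standard",
    (6_000, 8_000),
    LITE.loads + ("identity", "user profile", "domain context", "skills registry"),
)
FULL = Tier(
    "full",
    (11_000, 14_000),
    STANDARD.loads + ("agent coordination files", "spawn template", "shared state"),
)


def choose_tier(sub_agents: int, needs_domain_context: bool) -> Tier:
    """Hypothetical rule: escalate only as far as the task requires."""
    if sub_agents > 0:
        return FULL  # spawning needs the coordination files and spawn template
    if needs_domain_context:
        return STANDARD
    return LITE
```

Each tier is defined as a superset of the one below it, so escalating never drops context that a cheaper tier would have loaded.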

Within those tiers, four memory layers operate with distinct injection strategies:

  • Layer 1 is always-on: a dense set of non-negotiable operational rules injected at every session start via a hook, zero query cost
  • Layer 2 is on-demand: context_refresh.py rewrites a dedicated file with the top Brain DB hits for a given domain when requested; domain rules never pollute the always-on layer
  • Layer 3 is never auto-injected: daily session logs stored as markdown, accessed only when explicitly requested
  • Layer 4 is queried explicitly: Brain DB retrieval ranks results by a weighted score (60% cosine similarity, 30% importance, 10% recency); it is queried at session start and at every sub-agent spawn via the Memory Preamble Protocol
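The Layer 4 weighting can be sketched as a single scoring function. The 60/30/10 weights are from the description above; the exponential recency decay, its 30-day half-life, and the record field names are assumptions made only for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency_score(age_days: float, half_life_days: float = 30.0) -> float:
    """Assumed decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def weighted_score(similarity: float, importance: float, recency: float) -> float:
    """Blend with the 60/30/10 weights; each input is expected in [0, 1]."""
    return 0.6 * similarity + 0.3 * importance + 0.1 * recency


def rank(query_vec, memories, top_k=5):
    """memories: iterable of dicts with 'text', 'vec', 'importance', 'age_days'."""
    scored = [
        (
            weighted_score(
                cosine_similarity(query_vec, m["vec"]),
                m["importance"],
                recency_score(m["age_days"]),
            ),
            m["text"],
        )
        for m in memories
    ]
    return sorted(scored, reverse=True)[:top_k]
```

With these weights a highly relevant but low-importance memory still outranks an important but unrelated one, which matches the intent of keeping similarity dominant.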

The Brain DB has three internal tiers: procedural (HOW to do things), semantic (WHAT is true), and episodic (WHAT happened and when). A nightly pipeline extracts 5 to 8 facts from session logs and writes them to the episodic tier automatically. Memory compounds over time without manual input.
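One way the three tiers could be represented is a single table with a constrained `tier` column. This is a sketch using only the standard-library `sqlite3` module; the table name, columns, and helper functions are hypothetical, and the sqlite-vec embedding storage is deliberately left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id         INTEGER PRIMARY KEY,
    tier       TEXT NOT NULL CHECK (tier IN ('procedural', 'semantic', 'episodic')),
    content    TEXT NOT NULL,
    importance REAL NOT NULL DEFAULT 0.5,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def write_episodic(conn: sqlite3.Connection, facts: list[str]) -> int:
    """Store facts extracted from a session log in the episodic tier."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO memories (tier, content) VALUES ('episodic', ?)",
            [(fact,) for fact in facts],
        )
    return len(facts)


def by_tier(conn: sqlite3.Connection, tier: str) -> list[str]:
    rows = conn.execute(
        "SELECT content FROM memories WHERE tier = ? ORDER BY id", (tier,)
    )
    return [content for (content,) in rows]
```

Keeping the tier as a column rather than three separate tables is what lets one query interface serve all three retrieval goals with a simple filter.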

Agent Lifecycle Controls

Before any sub-agent starts, spawn_guard.py checks for duplicate spawns and known-bad tool combinations. A cost circuit breaker sets a hard financial limit on token spend; runaway agents cannot exceed it regardless of task state. The watcher daemon polls every 3 minutes and writes a stuck-alert marker after 6 minutes of inactivity. The reaper terminates agents that exceed that threshold without resolving. The system does not accumulate stuck processes.
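The two pre-execution checks can be sketched as small stateful objects. The class names, exception types, and the example tool pairing are hypothetical; the internals of spawn_guard.py and the cost breaker are not shown in this write-up.

```python
from dataclasses import dataclass, field


class SpawnRejected(Exception):
    pass


class BudgetExceeded(Exception):
    pass


@dataclass
class SpawnGuard:
    # Each entry is a set of tools that must not be granted together (assumed format).
    bad_tool_combos: set[frozenset[str]] = field(default_factory=set)
    active_tasks: set[str] = field(default_factory=set)

    def check(self, task_id: str, tools: set[str]) -> None:
        if task_id in self.active_tasks:
            raise SpawnRejected(f"duplicate spawn for {task_id!r}")
        for combo in self.bad_tool_combos:
            if combo <= tools:  # every tool in the bad combo was requested
                raise SpawnRejected(f"known-bad tool combination: {sorted(combo)}")
        self.active_tasks.add(task_id)


@dataclass
class CostBreaker:
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record spend; trip before the hard limit would be exceeded."""
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(f"limit {self.limit_usd:.2f} USD reached")
        self.spent_usd += cost_usd
```

The breaker refuses the charge that would cross the limit instead of recording it first, so spend can never end up above the cap.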

Tool Access

Three distinct paths handle different workloads. The MCP bridge exposes 12 tools via a stdio server callable by name or natural language: semantic search, context refresh, audio transcription, browser screenshots, Telegram file delivery, and others. A skills library provides prompt templates for specialized task types, loaded at spawn time with a registry tracking ready versus needs-setup status. A direct Python layer covers 60-plus operational scripts for platform automation, memory operations, monitoring, and maintenance.

Automation and Integration

The hub-and-spoke coordination model uses Claude Code as the orchestration primitive. Sub-agents append to BLACKBOARD.md only; the main agent is the sole writer to DECISIONS.md and writes only after reviewing the full diff. Every agent completion follows a six-field JSON schema: summary, files_changed, what_works, what_doesnt, needs_alexander, and next. All inter-agent messages log to a persistent audit file.
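A sketch of how the completion schema and the append-only rule might be enforced. The six field names are from the description above; the function names and the entry layout written to BLACKBOARD.md are assumptions.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = (
    "summary",
    "files_changed",
    "what_works",
    "what_doesnt",
    "needs_alexander",
    "next",
)


def validate_completion(message: dict) -> dict:
    """Reject messages with missing or unexpected fields."""
    missing = [f for f in REQUIRED_FIELDS if f not in message]
    extra = [k for k in message if k not in REQUIRED_FIELDS]
    if missing or extra:
        raise ValueError(f"missing={missing} unexpected={extra}")
    return message


def append_to_blackboard(path: Path, agent: str, message: dict) -> None:
    """Append-only write: sub-agents never rewrite earlier entries."""
    entry = json.dumps(validate_completion(message), indent=2)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"\n## {agent}\n{entry}\n")
```

Opening the file in append mode is the whole enforcement mechanism here: a sub-agent using this helper has no code path that can overwrite another agent's entry.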

Task routing and social automation run as separate concerns. task_router.py semantically matches an incoming task to the closest specialist persona and returns a prebuilt spawn configuration. Social automation splits by transport: Tweepy API v2 handles all Twitter writes; Playwright handles reads and Reddit sessions; YouTube engagement runs through the Data API v3 with OAuth2 on a 30-minute cycle. Each platform maintains velocity caps and a cross-session deduplication log.
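Velocity caps and deduplication can be sketched with a sliding window and a set of content hashes. Class names, the window semantics, and the hashing scheme are assumptions; for cross-session use the `seen` set would be loaded from and saved to a persistent log, which is omitted here.

```python
import hashlib
import time
from collections import deque
from typing import Optional


class VelocityCap:
    """Allow at most `max_actions` within a sliding window of `window_s` seconds."""

    def __init__(self, max_actions: int, window_s: float):
        self.max_actions = max_actions
        self.window_s = window_s
        self._stamps = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_actions:
            return False
        self._stamps.append(now)
        return True


class DedupLog:
    """Remember content hashes so the same post is not sent twice."""

    def __init__(self, seen: Optional[set] = None):
        self.seen = set() if seen is None else seen

    def is_new(self, platform: str, text: str) -> bool:
        normalized = text.strip().lower()
        key = hashlib.sha256(f"{platform}:{normalized}".encode()).hexdigest()
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

Keying the hash on platform plus normalized text means the same copy can still go out once per platform, while trivial whitespace or casing changes do not count as new content.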

GitHub integrates through two accounts with fine-grained PATs scoped per repository. One holds read access across all repositories; the second holds collaborator write access where autonomous commits are permitted.

Technical Explanation

The tiered loading design keeps baseline token cost flat. Most sessions load lite or standard; full context loads only when agent coordination files are actively needed. The three-tier Brain DB allows the same query interface to serve different retrieval goals without mixing rule retrieval with fact retrieval or task history. spawn_guard and cost_breaker form a pre-execution safety layer: the first prevents logical errors, the second enforces financial limits. The episodic pipeline runs nightly and writes autonomously so memory grows without any manual curation step. The social transport split between Tweepy writes and Playwright reads avoids platform detection on React-rendered UIs.

System in Action

1) Voice message becomes an active task

A voice message arrives in Telegram as an OGG file. ffmpeg converts it to a 16kHz mono WAV in a single command. whisper.cpp runs local inference and returns a transcript in under 5 seconds. That text is passed to the agent as an instruction with no cloud dependency and no per-message cost.
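The conversion and transcription step could be wired together roughly as follows. The ffmpeg options are standard; the whisper.cpp binary name (`whisper-cli` in recent builds, `main` in older ones) and the model path are assumptions that depend on the local build.

```python
import subprocess
from pathlib import Path

# Assumed locations; adjust to wherever whisper.cpp was built and the model stored.
WHISPER_BIN = "whisper-cli"
WHISPER_MODEL = "models/ggml-base.en.bin"


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """OGG (Telegram voice note) to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # mono
        "-c:a", "pcm_s16le",  # 16-bit PCM
        str(dst),
    ]


def whisper_command(wav: Path) -> list[str]:
    """-nt drops timestamps so stdout is plain transcript text."""
    return [WHISPER_BIN, "-m", WHISPER_MODEL, "-f", str(wav), "-nt"]


def transcribe(ogg: Path) -> str:
    wav = ogg.with_suffix(".wav")
    subprocess.run(ffmpeg_command(ogg, wav), check=True, capture_output=True)
    result = subprocess.run(
        whisper_command(wav), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

Passing argument lists instead of a shell string avoids quoting problems with file names, which matters on Windows paths.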

2) Context assembled at the right tier

The system determines which loading tier applies before any work starts. Simple tasks load lite context. Multi-agent tasks load full context. agent_memory_slice.py then issues a Brain DB query and combines weighted results with filtered session context into a focused preamble under 300 tokens for each spawned sub-agent.
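A sketch of how ranked results might be packed into a preamble that stays under the 300-token budget. The four-characters-per-token estimate and the function names are assumptions; agent_memory_slice.py itself is not shown here.

```python
def estimate_tokens(text: str) -> int:
    """Crude assumption: about four characters per token, rounded up."""
    return -(-len(text) // 4)


def build_preamble(ranked_snippets: list[str], budget_tokens: int = 300) -> str:
    """Greedily keep the highest-ranked snippets that fit the budget."""
    kept, used = [], 0
    for snippet in ranked_snippets:  # assumed sorted best-first
        line = f"- {snippet}"
        cost = estimate_tokens(line + "\n")
        if used + cost > budget_tokens:
            continue  # skip what does not fit, keep trying smaller items
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

Skipping an oversized snippet rather than stopping at it lets a lower-ranked but shorter fact still use the remaining budget.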

3) Task routed to the right specialist

task_router.py performs a semantic search against the persona library. The closest match returns a specialist identity and a prebuilt spawn configuration including persona file path, relevant memory slice, and structured reporting format. The main agent reviews the routing decision before spawning.
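The routing step can be illustrated as a nearest-neighbor lookup over persona embeddings. The `Persona` type, the configuration keys, and the use of precomputed embedding vectors are assumptions; how task_router.py actually produces embeddings is not described here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    persona_file: str  # hypothetical path to the persona definition
    embedding: tuple[float, ...]


def _cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(task_embedding, personas: list[Persona], memory_slice: str) -> dict:
    """Return a spawn configuration for the closest persona."""
    best = max(personas, key=lambda p: _cosine(task_embedding, p.embedding))
    return {
        "persona": best.name,
        "persona_file": best.persona_file,
        "memory_slice": memory_slice,
        "report_format": "six-field JSON completion",
        "score": round(_cosine(task_embedding, best.embedding), 3),
    }
```

Returning the similarity score alongside the configuration gives the main agent something concrete to inspect when it reviews the routing decision.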

4) Parallel specialists execute simultaneously

A single instruction triggers multiple specialists at once: a build agent implements features, a marketing agent drafts copy and schedules posts, a career agent updates portfolio content. Each appends a structured JSON completion message to BLACKBOARD.md when done. The main agent reads every diff before writing anything to DECISIONS.md.
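The fan-out pattern can be sketched with `asyncio.gather`. This is an illustration of concurrent dispatch only: `run_specialist` is a stand-in, and whether the real system uses asyncio or separate processes to spawn sub-agents is an assumption.

```python
import asyncio


async def run_specialist(name: str, task: str) -> dict:
    """Stand-in for spawning a sub-agent and waiting for its completion."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return {"agent": name, "summary": f"{name} finished: {task}"}


async def run_parallel(assignments: dict[str, str]) -> list[dict]:
    """Launch every specialist at once and wait for all of them."""
    jobs = [run_specialist(name, task) for name, task in assignments.items()]
    results = await asyncio.gather(*jobs, return_exceptions=True)
    # With return_exceptions=True a failure is returned as a value,
    # so one failed specialist does not abort the others.
    return [
        r if isinstance(r, dict) else {"agent": name, "summary": f"failed: {r!r}"}
        for name, r in zip(assignments, results)
    ]
```

`gather` returns results in the order the jobs were submitted, so each result can be paired back to its specialist by position.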

5) Watcher detects a stall and alerts

The polling loop runs every 3 minutes. If an agent has not updated BLACKBOARD.md within 6 minutes of its start time, the watcher writes a stuck alert and triggers a Telegram notification. If the agent still does not resolve, the reaper terminates it. The alert fires exactly once per incident.
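Stall detection with once-only alerting can be sketched as below. The 3-minute and 6-minute values are from the description above; the record layout, the function names, and measuring from the last update when one exists are assumptions.

```python
POLL_INTERVAL_S = 3 * 60
STALL_THRESHOLD_S = 6 * 60


def find_stalled(agents: dict[str, dict], now: float) -> list[str]:
    """agents maps id -> {'started_at': float, 'last_update': float or None}."""
    stalled = []
    for agent_id, info in agents.items():
        last_seen = info.get("last_update")
        if last_seen is None:
            last_seen = info["started_at"]  # no progress written yet
        if now - last_seen >= STALL_THRESHOLD_S:
            stalled.append(agent_id)
    return stalled


def poll_once(agents, now, alerted: set, notify) -> list[str]:
    """Alert on each newly stalled agent exactly once; return ids alerted now."""
    fresh = [a for a in find_stalled(agents, now) if a not in alerted]
    for agent_id in fresh:
        notify(f"Agent {agent_id} has made no progress for 6 minutes")
        alerted.add(agent_id)
    return fresh
```

The `alerted` set is what makes the alert fire once per incident: later polls still see the agent as stalled but skip the notification.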

6) OAuth failure surfaces and resolves in one step

When a scheduled operation hits an expired token, reauth.py writes a marker to a fixed path. The next heartbeat reads it, formats a recovery message with the full authorization URL and exact exchange command, sends it via Telegram, and removes the marker. One command and the system resumes.
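The marker-file handoff can be sketched with two small functions. The marker's JSON layout, the function names, and the message wording are assumptions; the actual marker path and exchange command are not given here and are left as placeholders.

```python
import json
from pathlib import Path


def write_reauth_marker(marker: Path, service: str, auth_url: str) -> None:
    """Called when a scheduled operation hits an expired token."""
    marker.write_text(
        json.dumps({"service": service, "auth_url": auth_url}), encoding="utf-8"
    )


def heartbeat_check(marker: Path, send) -> bool:
    """Send one recovery message if a marker exists, then clear it."""
    if not marker.exists():
        return False
    info = json.loads(marker.read_text(encoding="utf-8"))
    send(
        f"{info['service']} needs re-authorization.\n"
        f"Open: {info['auth_url']}\n"
        "Then run the exchange command with the code you receive."
    )
    marker.unlink()  # removing the marker is what prevents repeat alerts
    return True
```

Using a file as the signal decouples the failing job from the notifier: the job only has to write to disk, and the heartbeat picks it up on its next cycle even if the job has already exited.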

Results

  • Continuous operation across session boundaries with full context preserved at every restart
  • Tiered loading matches token cost to task complexity; simple sessions never pay for full context depth
  • Voice transcription on-device in under 5 seconds per message, zero transcription API cost
  • Memory grows autonomously via nightly episodic extraction; no manual curation required
  • Parallel specialists reduce total task time by running implementation, research, content, and review concurrently
  • Stalled agents detected, alerted, and terminated through the watcher and reaper layer with no manual checking
  • OAuth failures resolved through a single Telegram message and one-tap recovery command
  • Social automation across Twitter, YouTube, and Reddit with velocity controls and cross-session deduplication
  • All GitHub commits within auditable fine-grained PAT scope, traceable per repository and per account
  • StreakUp published on Google Play and the App Store with active RevenueCat monetization, AdMob integration, 701 automated tests, and a 12-member international beta team

Built on Python, Claude Code, sqlite-vec, whisper.cpp, and MCP, running on Windows 10 Pro.


Building and operating this system alone is the clearest signal I can give of what I bring to complex software work: architecture instinct, shipping discipline, and the ability to keep something running without being asked to.