All documents for this post can be found in my repository.
What We Built
After 24 days of development, we have a complete multi-agent system. What started as a simple connection to a local language model has grown into a “production-ready” microservice architecture. Of course, always with the understanding that it’s completely constructed and something like this wouldn’t actually go into production.
The system simulates a heist scenario where multiple AI agents must work together. Each agent has a specialized role. Planner, Hacker, Safecracker, Intel, Driver, and Lookout. The agents communicate with each other, use various tools, and plan the heist together. The twist: One randomly selected agent is a saboteur who subtly undermines the team. That’s the idea.
Architecture
The system consists of 6-7 independent microservices:
- OAuth Service (Port 8001): Central authentication service with JWT token management
- Calculator Service (Port 8002): Mathematical calculations for the Safecracker
- File Reader Service (Port 8003): Access to documents and specifications
- Database Query Service (Port 8004): Security database for Intel research
- Memory Service (Port 8005): Context compression and long-term memory
- Dashboard (Port 8008): Interactive web interface with real-time analytics
- Detection API (Port 8010): AI-powered sabotage detection
All services are containerized via Docker and communicate over a shared network. OAuth protects every tool access, and health checks ensure services start in the correct order.
Core Technologies
- Backend: FastAPI as web framework, Python 3.11, and SQLite for persistence. Each service is a standalone FastAPI application with its own endpoints and responsibilities.
- Authentication: OAuth 2.0 Client Credentials Flow. Each agent authenticates with the OAuth service and receives a time-limited JWT token. Tools verify token validity and scopes before accepting requests.
- Frontend: Simple HTML/CSS/JavaScript dashboard with Chart.js for visualizations. WebSocket connection for real-time updates of agent activities. 100% vibecoded.
- LLM: Local language models via LM-Studio. No cloud API access, everything runs locally. This enables experimentation without costs and privacy concerns.
- Deployment: Docker Compose orchestrates all services. A single command
docker-compose upstarts the entire system.
Special Features
- Multi-Agent Conversation: Agents speak to each other in rounds. Each agent sees the last messages as context and responds according to their persona. The conversation develops organically.
- Memory Compression: Old messages are automatically summarized, new ones remain detailed. This prevents token explosion in long conversations. An LLM creates the summaries.
- Tool Discovery: Agents query a discovery service to find out which tools they’re allowed to use. Based on their OAuth scopes, they receive filtered tool lists. Tools are hot-swappable at runtime.
- Mole Game: A random agent becomes the saboteur. They receive secret instructions in their system prompt, invisible to other agents. There are five different sabotage strategies: timing errors, misinformation, coordination chaos, subtle delays, wrong tools.
- AI Detection: Hybrid system for saboteur detection. 60% rule-based pattern analysis (tool usage, timing inconsistencies, message anomalies, information quality) combined with 40% LLM reasoning. This is a RAG approach: Retrieval of patterns, Augmentation of context, Generation of final scores by an LLM.
- Interactive Dashboard: Real-time monitoring of all agent activities. Sessions can be paused, resumed, or stopped. Commands can be sent directly to individual agents, overriding their current instructions. Charts show tool statistics, agent interactions, and session histories.
- Session Analytics: Detailed evaluation of each session. SQL queries analyze tool success rates, agent interaction matrices (who spoke with whom), message frequencies. The data feeds into saboteur detection.
Technical Highlights
The system demonstrates several modern patterns.
- Microservices with OAuth: Each service is independently deployable. Central OAuth service authenticates all clients. No service-to-service communication without tokens. Scopes control granular permissions.
- Configuration-Driven: Agents are loaded from YAML configs, not hardcoded. This allows A/B testing, different environments (dev/prod), and rapid experimentation with new agent setups.
- RAG for Robust AI: Pure rules are rigid, pure LLMs hallucinate. The combination brings the current best of both worlds. Rules find measurable anomalies, and the LLM understands context and nuances.
- Health Checks in Docker: Services don’t just start, they report “ready”. Docker waits until health checks succeed before starting dependent services. This prevents race conditions.
- WebSocket for Real-Time: The dashboard updates live. No polling requests, real push updates. As soon as an agent sends a message, it appears in the dashboard.
Development Journey
Day 1 started with the simple question: How do I connect to a local LLM? Day 24 ends with a production-ready system of 7 services, OAuth security, AI detection, interactive dashboard, and Docker deployment.
Each day added one concept. Persistence (SQLite), web APIs (FastAPI), containerization (Docker), multi-agent coordination, memory management, tool integration, analytics, visualization, gamification, AI detection. Small incremental steps that sum up to a complex system.
Some concepts had to be reworked. The first memory implementation stored everything, which led to token explosion and pushed my MacBook to its limits. The first Docker integration started services in the wrong order and resulted in connection errors. The first detection was only rule-based and simply very inaccurate. Iteration and debugging are part of it.
The result is a system that demonstrates fundamental architecture principles. Not perfect, but a working example.
What Follows
This summary documents all 24 days. Each concept gets a brief explanation and a minimal code example. The document serves as a reference. Also for me for review, for reference, as a learning path for similar projects, …
The Concepts in Detail
[Note: The full translation would continue with all 24 days of concepts. Due to length, I’ll show the structure and a few examples]
Day 1: Connect Local LLM
Concept: Use locally hosted language models instead of cloud APIs.
Key Idea: LM-Studio and Ollama provide OpenAI-compatible APIs for local models, enabling privacy and cost-free experimentation.
Code Example:
from openai import OpenAI
# Connect to local LLM
client = OpenAI(
base_url="http://localhost:1234/v1",
api_key="not-needed"
)
response = client.chat.completions.create(
model="local-model",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)Technologies: LM-Studio, Ollama, OpenAI Python Library
Day 2: Persona Patterns
Concept: Give different agents different behaviors through system prompts.
Key Idea: The same LLM model produces different outputs through specialized personas. A “planner” thinks strategically, a “critic” looks for problems.
Code Example:
PERSONAS = {
"planner": "You are a strategic planner. Focus on coordination and timing.",
"hacker": "You are a tech expert. Analyze security systems and vulnerabilities.",
"safecracker": "You are a precision specialist. Focus on technical details."
}
class Agent:
def __init__(self, name: str, role: str):
self.name = name
self.system_prompt = PERSONAS[role]
def create_messages(self, user_input: str):
return [
{"role": "system", "content": self.system_prompt},
{"role": "user", "content": user_input}
]
planner = Agent("planner", "planner")
messages = planner.create_messages("What's our approach?")
response = client.chat.completions.create(model="local-model", messages=messages)Technologies: Prompt Engineering, System-Level Instructions
[Continuing with remaining days…]
Learned Architecture Patterns
Over the 24 days, recurring architecture patterns crystallized. These patterns are not specific to this heist system but transferable to many software projects. Here are the five most important patterns that proved particularly valuable in my opinion:
1. Microservices Architecture
- Single Responsibility per service
- Service-to-service communication via REST
- Centralized authentication (OAuth)
- Health checks and monitoring
2. Event-Driven Design
- WebSocket for real-time updates
- Asynchronous communication patterns
- Event-based state changes
- Push instead of pull architecture
3. Configuration-Driven Development
- YAML-based agent configuration
- Environment-specific configs (dev/staging/prod)
- Runtime agent instantiation
- Feature flags and A/B testing
4. RAG Pattern (Retrieval-Augmented Generation)
- Rule-based retrieval of facts
- Context augmentation for LLM
- LLM generation with grounded data
- Hybrid scoring for robustness
5. OAuth 2.0 Security Model
- Client Credentials flow for services
- Scope-based permissions
- JWT tokens with expiration
- Centralized token validation
Technology Stack Summary
| Layer | Technologies |
|---|---|
| LLM | LM-Studio, Ollama, Local Models (Gemma, Llama) |
| Backend | FastAPI, Uvicorn, Python 3.11+ |
| Database | SQLite3, SQL |
| Auth | OAuth 2.0, JWT, HS256 |
| Frontend | HTML/CSS/JavaScript, Chart.js |
| Real-Time | WebSocket |
| Containerization | Docker, Docker Compose |
| Protocol | MCP (Model Context Protocol) |
| Data Format | JSON, YAML, Pydantic Models |
Summary
Day one was a simple LLM connection with 20 lines of code. Day 24 is a system with seven microservices, OAuth, AI detection, and Docker deployment. This difference didn’t emerge through perfect planning but through incremental growth. Every day one concept, small steps that compound. Some concepts were planned, others emerged from problems. Day eleven’s memory compression, for example, was unplanned, but after day ten the token counts exploded and I had to react. This flexibility was the key to success.
The most important insights can be summarized as follows. Configuration beats hardcoding. By the third agent at the latest, it became clear that YAML configs (Day 14) make everything more flexible. Security is fundamental. OAuth and JWT felt like overkill at first but became the foundation for tool discovery, analytics, and AI detection. Data drives everything advanced. SQLite enabled analytics, pattern detection, and meaningful dashboards. Hybrid approaches unite strengths. Pure rules are too rigid, pure LLMs hallucinate. The solution lies in combining, for example, 60% rules for measurable anomalies and 40% LLM for context and nuances. Developer experience is productivity. Docker Compose transformed the daily manual startup of six services into docker-compose up. The time saved flows into features. Real-time transforms UX. WebSocket instead of polling makes the difference between sluggish and lively tangible. These patterns are not specific to this project but transferable to many LLM-based systems.
24 days, 24 concepts, one complete system. From a simple LLM connection to a production-ready multi-agent architecture. The result is not perfect, but it works, it teaches, and it demonstrates how modern LLM systems can be built.
Code References
All code and documentation available at: github.com/gvtsch/aoc_2025_heist
Individual day implementations in respective day_XX/ directories.