Architecture
How Bytebot’s desktop agent works under the hood
Overview
Bytebot is a self-hosted AI desktop agent built with a modular, containerized architecture. It combines a Linux desktop environment with Claude AI to create an autonomous computer user that can perform tasks through natural language instructions.
System Architecture
The system consists of four main components that work together:
1. Bytebot Desktop Container
The foundation of the system - a containerized Linux desktop that provides:
- Ubuntu 22.04 LTS base for stability and compatibility
- XFCE4 Desktop for a lightweight, responsive UI
- bytebotd Daemon - The automation service built on nut.js that executes computer actions
- Pre-installed Applications: Firefox ESR, Thunderbird, text editors, and development tools
- VNC & noVNC for remote desktop access
Key Features:
- Runs completely isolated from your host system
- Consistent environment across different platforms
- Can be customized with additional software
- Accessible via REST API on port 9990
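As a rough illustration of what talking to the desktop container's REST API might look like, the sketch below builds (but does not send) a single action request. Only the port 9990 comes from the text above; the `/computer-use` path, the `action` field, and the `coordinates` payload are assumptions made for illustration and should be checked against the API reference.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9990"  # port taken from the text above


def build_action_request(action: str, **params) -> Request:
    """Build (but do not send) a POST request describing one desktop action."""
    body = json.dumps({"action": action, **params}).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/computer-use",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical action name and payload shape, for illustration only.
req = build_action_request("move_mouse", coordinates={"x": 100, "y": 200})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With the container running, the request could be sent with `urllib.request.urlopen(req)`; it is left unsent here so the sketch works without a live desktop.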
2. AI Agent Service
The brain of the system - orchestrates tasks using an LLM:
- NestJS Framework for a robust, scalable backend
- LLM Integration with Claude AI for understanding and planning
- WebSocket Support for real-time updates
- Computer Use API Client to control the desktop
Responsibilities:
- Interprets natural language requests
- Plans sequences of computer actions
- Manages task state and progress
- Handles errors and retries
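The "handles errors and retries" responsibility can be pictured as a small wrapper around each desktop action. This is a minimal sketch of retrying with exponential backoff, not the agent's actual implementation; `run_with_retries`, its parameters, and the flaky demo action are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(action: Callable[[], T], max_attempts: int = 3,
                     base_delay: float = 0.5) -> T:
    """Call `action`, retrying on any exception with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"action failed after {max_attempts} attempts") from last_error


# Demo: a stand-in action that fails twice and then succeeds.
calls = {"n": 0}


def flaky_click() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("desktop not ready")
    return "clicked"


result = run_with_retries(flaky_click, max_attempts=3, base_delay=0.0)
print(result, "after", calls["n"], "attempts")
```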
3. Web Task Interface
The user interface for interacting with your AI agent:
- Next.js Application with TypeScript for type safety
- Embedded VNC Viewer to watch the desktop in action
- Task Management UI for tracking progress
- WebSocket Connections for live updates
Features:
- Intuitive task interface
- Visual feedback of desktop actions
- Task history and status
- Export conversation logs
4. PostgreSQL Database
Persistent storage for the agent system:
- Tasks Table: Stores task details, status, and metadata
- Messages Table: Stores AI conversation history
- Prisma ORM for type-safe database access
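To make the relationship between the two tables concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL. The column names (`description`, `status`, `role`, `content`) are assumptions for illustration; the real schema is defined through Prisma and is not shown in this document.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE messages (
    id      INTEGER PRIMARY KEY,
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    role    TEXT NOT NULL,
    content TEXT NOT NULL
);
""")

# One task with one message in its conversation history.
task_id = conn.execute(
    "INSERT INTO tasks (description) VALUES (?)", ("Open Firefox",)
).lastrowid
conn.execute(
    "INSERT INTO messages (task_id, role, content) VALUES (?, ?, ?)",
    (task_id, "user", "Open Firefox"),
)
conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (task_id,))

rows = conn.execute(
    "SELECT t.status, m.role, m.content "
    "FROM tasks t JOIN messages m ON m.task_id = t.id"
).fetchall()
print(rows)
```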
Data Flow
Task Execution Flow
User Input
User describes a task in natural language via the chat UI
Task Creation
Agent service creates a task record and adds it to the processing queue
AI Planning
Claude AI analyzes the task and generates a plan of computer actions
Action Execution
Agent sends computer actions to bytebotd daemon via REST API
Desktop Automation
bytebotd executes actions (mouse, keyboard, screenshots) on the desktop
Result Processing
Agent receives results, updates task status, and continues or completes
User Feedback
Results and status updates are sent back to the user in real time
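The flow above amounts to a plan, act, observe loop. The sketch below shows that loop with a scripted planner and a fake desktop standing in for the LLM and for bytebotd; `Task`, `run_task`, the action names, and the step budget are hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    description: str
    status: str = "pending"
    results: list = field(default_factory=list)


def run_task(task: Task,
             next_action: Callable[[Task], Optional[dict]],
             execute: Callable[[dict], dict],
             max_steps: int = 20) -> Task:
    """Ask the planner for actions one at a time until it signals completion."""
    task.status = "running"
    for _ in range(max_steps):
        action = next_action(task)      # AI planning step
        if action is None:              # planner has nothing left to do
            task.status = "completed"
            return task
        task.results.append(execute(action))  # action execution + result
    task.status = "failed"              # step budget exhausted
    return task


# Stand-ins for the LLM planner and the desktop daemon.
SCRIPT = [{"action": "screenshot"}, {"action": "click_mouse", "button": "left"}]


def scripted_planner(task: Task) -> Optional[dict]:
    step = len(task.results)
    return SCRIPT[step] if step < len(SCRIPT) else None


def fake_desktop(action: dict) -> dict:
    return {"ok": True, "action": action["action"]}


task = run_task(Task("Open the browser"), scripted_planner, fake_desktop)
print(task.status, [r["action"] for r in task.results])
```

Passing the planner and executor in as callables keeps the loop testable without a live model or desktop.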
Communication Protocols
Security Architecture
Isolation Layers
- Container Isolation
  - Each desktop runs in its own Docker container
  - No access to host filesystem by default
  - Network isolation with explicit port mapping
- Process Isolation
  - bytebotd runs as non-root user
  - Separate processes for different services
  - Resource limits enforced by Docker
- Network Security
  - Services only accessible from localhost by default
  - Can be configured with authentication
  - HTTPS/WSS for external connections
API Security
- Desktop API: No authentication by default (localhost only)
- Agent API: Can be secured with API keys
- Database: Password protected, not exposed externally
- VNC Access: Optional password protection
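Since several of these defaults rely on services being reachable from localhost only, it can help to audit the port mappings you actually deploy. The helper below is a hypothetical sketch that checks whether a Docker-style `host_ip:host_port:container_port` mapping publishes only on a loopback address; it is not part of Bytebot.

```python
import ipaddress


def is_loopback_only(port_mapping: str) -> bool:
    """Return True if a Docker-style port mapping binds only to loopback."""
    parts = port_mapping.rsplit(":", 2)
    if len(parts) < 3:
        # "9990:9990" has no host IP; Docker then publishes on all interfaces.
        return False
    host_ip = parts[0].strip("[]")  # allow bracketed IPv6 such as [::1]
    try:
        return ipaddress.ip_address(host_ip).is_loopback
    except ValueError:
        return False  # not a literal IP address; treat as unsafe


for mapping in ["127.0.0.1:9990:9990", "9990:9990", "0.0.0.0:9990:9990"]:
    print(mapping, is_loopback_only(mapping))
```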
The default configuration is intended for development. For production:
- Enable authentication on all APIs
- Use HTTPS/WSS for all connections
- Implement network policies
- Rotate credentials regularly
Deployment Patterns
Single User (Development)
Team Deployment
Enterprise Deployment
Extension Points
Custom Tools
Add specialized software to the desktop container.
AI Integrations
Extend agent capabilities:
- Custom tools for the LLM
- Additional AI models
- Specialized prompts
- Domain-specific knowledge
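A custom tool for the LLM typically pairs a JSON description with a handler function. The sketch below uses the `name` / `description` / `input_schema` shape that the Anthropic Messages API expects for tool definitions; the `word_count` tool, the `HANDLERS` registry, and `dispatch` are hypothetical, and how Bytebot registers tools internally is not described here.

```python
from typing import Any, Callable, Dict

# Tool description in the name / description / input_schema shape.
WORD_COUNT_TOOL: Dict[str, Any] = {
    "name": "word_count",
    "description": "Count the words in a piece of text.",
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}


def word_count(text: str) -> int:
    return len(text.split())


# Hypothetical registry mapping tool names to their handlers.
HANDLERS: Dict[str, Callable[..., Any]] = {"word_count": word_count}


def dispatch(tool_name: str, tool_input: Dict[str, Any]) -> Any:
    """Route a tool call requested by the model to the matching handler."""
    handler = HANDLERS.get(tool_name)
    if handler is None:
        raise KeyError(f"unknown tool: {tool_name}")
    return handler(**tool_input)


print(dispatch("word_count", {"text": "open the settings page"}))
```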
Performance Considerations
Resource Usage
- Desktop Container: ~1GB RAM idle, 2GB+ active
- Agent Service: ~256MB RAM
- UI Service: ~128MB RAM
- Database: ~256MB RAM
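Adding up the figures above gives a rough per-deployment memory budget. The sketch below does that arithmetic and estimates how many full stacks fit on a host. It assumes each stack runs its own agent, UI, and database, and it treats "2GB+" as exactly 2 GB, so the result is an optimistic upper bound; `max_stacks` and the 1 GB headroom are hypothetical.

```python
# Approximate figures from the list above, in MB.
IDLE_MB = {"desktop": 1024, "agent": 256, "ui": 128, "database": 256}
ACTIVE_MB = {**IDLE_MB, "desktop": 2048}  # "2GB+" taken as a lower bound


def total_mb(usage: dict) -> int:
    return sum(usage.values())


def max_stacks(host_ram_mb: int, headroom_mb: int = 1024) -> int:
    """Estimate how many active stacks fit, leaving headroom for the host OS."""
    return max(0, (host_ram_mb - headroom_mb) // total_mb(ACTIVE_MB))


print("idle total (MB):", total_mb(IDLE_MB))
print("active total (MB):", total_mb(ACTIVE_MB))
print("stacks on a 16 GB host:", max_stacks(16 * 1024))
```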
Optimization Tips
- Use lightweight desktop environments
- Limit concurrent tasks
- Monitor resource usage
- Scale horizontally for more capacity