Overview
Bytebot is a self-hosted AI desktop agent built with a modular architecture. It combines a Linux desktop environment with AI to create an autonomous computer user that can perform tasks through natural language instructions.
System Architecture
The system consists of four main components that work together:
1. Bytebot Desktop Container
The foundation of the system is a virtual Linux desktop that provides:
- Ubuntu 22.04 LTS base for stability and compatibility
- XFCE4 Desktop for a lightweight, responsive UI
- bytebotd Daemon: the automation service, built on nut.js, that executes computer actions
- Pre-installed Applications: Firefox ESR, Thunderbird, text editors, and development tools
- noVNC for remote desktop access
- Runs completely isolated from your host system
- Consistent environment across different platforms
- Can be customized with additional software
- Accessible via REST API on port 9990
- MCP SSE endpoint available at /mcp
- Uses shared types from the @bytebot/shared package
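As a rough illustration of calling the desktop's REST API on port 9990, the sketch below builds a JSON request for a single computer action. It assumes an endpoint path of /computer-use and an action-tagged body (action, coordinates, text); those names are assumptions to check against the Desktop API reference, not a confirmed contract.

```typescript
// Hypothetical action shapes: the field names (action, coordinates, text)
// are assumptions to verify against the Desktop API reference.
type ComputerAction =
  | { action: "screenshot" }
  | { action: "move_mouse"; coordinates: { x: number; y: number } }
  | { action: "type_text"; text: string };

interface ActionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Port 9990 comes from the docs; the "/computer-use" path is an assumption.
const DESKTOP_BASE_URL = "http://localhost:9990";

function buildActionRequest(
  action: ComputerAction,
  baseUrl: string = DESKTOP_BASE_URL,
): ActionRequest {
  // Strip trailing slashes so the path is joined exactly once.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/computer-use`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(action),
    },
  };
}

// Usage (Node 18+ provides a global fetch):
// const { url, init } = buildActionRequest({ action: "screenshot" });
// const response = await fetch(url, init);
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running desktop container.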
2. AI Agent Service
The brain of the system, which orchestrates tasks using an LLM:
- NestJS Framework for a robust, scalable backend
- LLM Integration supporting Anthropic Claude, OpenAI GPT, and Google Gemini models
- WebSocket Support for real-time updates
- Computer Use API Client to control the desktop
- Prisma ORM for database operations
- Tool definitions for computer actions (mouse, keyboard, screenshots)
- Interprets natural language requests
- Plans sequences of computer actions
- Manages task state and progress
- Handles errors and retries
- Provides real-time task updates via WebSocket
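Error handling with retries could look something like the helper below, which retries a failing operation with exponential backoff. The names withRetries and backoffDelay and the doubling policy are hypothetical; the agent service's actual retry behavior is not specified here.

```typescript
// Hypothetical retry helper; not the agent service's documented policy.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // wait before the 2nd attempt; doubles each time
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay to wait after failed attempt number `attempt` (1-based).
function backoffDelay(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

async function withRetries<T>(
  operation: () => Promise<T>,
  { maxAttempts, baseDelayMs, sleep = defaultSleep }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Only sleep if another attempt will follow.
      if (attempt < maxAttempts) {
        await sleep(backoffDelay(attempt, baseDelayMs));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Injecting sleep keeps the policy testable; a real service might also add jitter or retry only on errors it knows are transient.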
3. Web Task Interface
The user interface for interacting with your AI agent:
- Next.js 15 Application with TypeScript for type safety
- Embedded VNC Viewer to watch the desktop in action
- Task Management UI with status badges
- WebSocket Connections for live updates
- Reusable components for consistent UI
- API utilities for streamlined server communication
- Task creation and management interface
- Desktop tab for direct manual control
- Real-time desktop viewer with takeover mode
- Task history and status tracking
- Responsive design for all devices
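Live updates arriving over a WebSocket are typically folded into client state by a pure function. The sketch below assumes hypothetical event types (task_created, task_updated, task_deleted) and status values; the real payloads emitted by the agent service may differ.

```typescript
// Hypothetical event and state shapes; real WebSocket payloads may differ.
type TaskStatus = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED";

interface TaskSummary {
  id: string;
  description: string;
  status: TaskStatus;
}

type TaskEvent =
  | { type: "task_created"; task: TaskSummary }
  | { type: "task_updated"; id: string; status: TaskStatus }
  | { type: "task_deleted"; id: string };

// Pure reducer: returns a new map and never mutates the previous one,
// which suits React state updates in a Next.js client component.
function applyTaskEvent(
  tasks: ReadonlyMap<string, TaskSummary>,
  event: TaskEvent,
): Map<string, TaskSummary> {
  const next = new Map(tasks);
  switch (event.type) {
    case "task_created":
      next.set(event.task.id, event.task);
      break;
    case "task_updated": {
      const existing = next.get(event.id);
      // Ignore updates for tasks this client has not seen yet.
      if (existing) next.set(event.id, { ...existing, status: event.status });
      break;
    }
    case "task_deleted":
      next.delete(event.id);
      break;
  }
  return next;
}
```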
4. PostgreSQL Database
Persistent storage for the agent system:
- Tasks Table: Stores task details, status, and metadata
- Messages Table: Stores AI conversation history
- Prisma ORM for type-safe database access
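The relationship between the two tables can be sketched with plain TypeScript types. The field names below (taskId, role, createdAt) are illustrative assumptions rather than the actual Prisma schema; conversationFor mimics a query that returns one task's messages in chronological order.

```typescript
// Hypothetical in-memory mirror of the two tables; column names are
// illustrative, not the actual Prisma schema.
interface TaskRecord {
  id: string;
  description: string;
  status: "PENDING" | "RUNNING" | "COMPLETED" | "FAILED";
  createdAt: Date;
}

interface MessageRecord {
  id: string;
  taskId: string; // references TaskRecord.id (one task, many messages)
  role: "user" | "assistant";
  content: string;
  createdAt: Date;
}

// Like filtering messages by taskId and ordering by createdAt ascending.
// filter() returns a new array, so sort() does not reorder the input.
function conversationFor(taskId: string, messages: MessageRecord[]): MessageRecord[] {
  return messages
    .filter((m) => m.taskId === taskId)
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
}
```

With Prisma Client, the equivalent query would be a findMany call with a where filter on taskId and an orderBy on createdAt ascending, assuming a model named Message.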
Data Flow
Task Execution Flow
1. User Input: User describes a task in natural language via the chat UI
2. Task Creation: Agent service creates a task record and adds it to the processing queue
3. AI Planning: The LLM analyzes the task and generates a plan of computer actions
4. Action Execution: Agent sends computer actions to bytebotd via REST API or MCP
5. Desktop Automation: bytebotd executes actions (mouse, keyboard, screenshots) on the desktop
6. Result Processing: Agent receives results, updates task status, and continues or completes
7. User Feedback: Results and status updates are sent back to the user in real time
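Steps 3 to 7 form a loop: plan an action, execute it on the desktop, record the result, and report progress until the task is done. The sketch below models that loop with stubbed dependencies; the Planner and Desktop interfaces, the Update shape, and the step limit are hypothetical and stand in for the LLM client, the bytebotd client, and the WebSocket broadcaster.

```typescript
// Minimal sketch of steps 3-7 with hypothetical, stubbable dependencies.
type Action =
  | { kind: "screenshot" }
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string };

interface Planner {
  // Returns the next action, or null when the task is complete.
  nextAction(task: string, history: string[]): Promise<Action | null>;
}

interface Desktop {
  execute(action: Action): Promise<string>; // short result summary
}

type Update = { status: "RUNNING" | "COMPLETED" | "FAILED"; detail: string };

async function runTask(
  task: string,
  planner: Planner,
  desktop: Desktop,
  notify: (update: Update) => void, // e.g. a WebSocket broadcast
  maxSteps = 20,
): Promise<Update["status"]> {
  const history: string[] = [];
  notify({ status: "RUNNING", detail: `Started: ${task}` });
  for (let step = 0; step < maxSteps; step++) {
    const action = await planner.nextAction(task, history); // 3. AI planning
    if (action === null) {
      notify({ status: "COMPLETED", detail: "Planner reported completion" });
      return "COMPLETED";
    }
    const result = await desktop.execute(action); // 4-5. execution on the desktop
    history.push(`${action.kind}: ${result}`); // 6. result processing
    notify({ status: "RUNNING", detail: result }); // 7. user feedback
  }
  notify({ status: "FAILED", detail: `Step limit of ${maxSteps} reached` });
  return "FAILED";
}
```

The step limit is a safeguard assumed for the sketch so that a planner which never reports completion cannot loop forever.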
Communication Protocols
Security Architecture
Isolation Layers
Container Isolation
- Each desktop runs in its own Docker container
- No access to host filesystem by default
- Network isolation with explicit port mapping
Process Isolation
- bytebotd runs as non-root user
- Separate processes for different services
- Resource limits enforced by Docker
Network Security
- Services only accessible from localhost by default
- Can be configured with authentication
- HTTPS/WSS for external connections
API Security
- Desktop API: No authentication by default (localhost only). Supports REST and MCP.
- Agent API: Can be secured with API keys
- Database: Password protected, not exposed externally
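If the Agent API is secured with API keys, the check itself can be small. This sketch assumes the key arrives in a hypothetical x-api-key header and compares it in constant time with Node's crypto.timingSafeEqual; hashing both values first gives the equal-length buffers that function requires.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical API key check. The "x-api-key" header name is an assumption;
// Node lowercases incoming header names, so the lowercase key matches.
function digest(value: string): Buffer {
  // SHA-256 output is always 32 bytes, satisfying timingSafeEqual's
  // requirement that both buffers have the same length.
  return createHash("sha256").update(value, "utf8").digest();
}

function isAuthorized(
  headers: Record<string, string | string[] | undefined>,
  expectedKey: string,
): boolean {
  const provided = headers["x-api-key"];
  // Reject missing/repeated headers and an unconfigured (empty) key.
  if (typeof provided !== "string" || expectedKey.length === 0) return false;
  return timingSafeEqual(digest(provided), digest(expectedKey));
}
```

In a NestJS service, a check like this would usually live inside a guard (a class implementing CanActivate) so it applies to every route.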
The default configuration is intended for development. For production:
- Enable authentication on all APIs
- Use HTTPS/WSS for all connections
- Implement network policies
- Rotate credentials regularly
Deployment Patterns
Single User (Development)
Production Deployment
Enterprise Deployment
Extension Points
Custom Tools
Add specialized software to the desktop.
AI Integrations
Extend agent capabilities:
- Custom tools for the LLM
- Additional AI models
- Specialized prompts
- Domain-specific knowledge
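A custom tool pairs a schema the LLM can read with a handler the agent runs when the model calls it. The sketch below describes a hypothetical search_knowledge_base tool in the tool format used by Anthropic's Messages API (name, description, input_schema as JSON Schema) and routes calls through a simple dispatcher; how Bytebot registers its own tools internally is not covered here.

```typescript
// Hypothetical custom tool; the definition mirrors the Anthropic Messages API
// tool shape, while the tool and dispatcher are illustrative only.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

const knowledgeBaseTool: ToolDefinition = {
  name: "search_knowledge_base",
  description: "Search an internal knowledge base and return matching snippets.",
  input_schema: {
    type: "object",
    properties: { query: { type: "string", description: "Search terms" } },
    required: ["query"],
  },
};

const handlers: Record<string, ToolHandler> = {
  // Stub: a real handler would query a domain-specific data source.
  search_knowledge_base: async (input) => `No results for "${String(input["query"])}" (stub)`,
};

// Route a tool call requested by the model to its registered handler.
async function dispatchToolCall(name: string, input: Record<string, unknown>): Promise<string> {
  const handler = handlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(input);
}
```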
Performance Considerations
Resource Usage
- Desktop Container: ~1GB RAM idle, 2GB+ active
- Agent Service: ~256MB RAM
- UI Service: ~128MB RAM
- Database: ~256MB RAM
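Adding up the approximate figures above gives a quick sizing estimate for a single-instance deployment. This sketch treats 1GB as 1024MB and uses the lower bound for the desktop's active usage, so the active total is a minimum.

```typescript
// Approximate per-service figures from the list above, in MB (1GB = 1024MB).
// The desktop's active value is the "2GB+" lower bound.
const footprintMB = {
  desktop: { idle: 1024, active: 2048 },
  agent: { idle: 256, active: 256 },
  ui: { idle: 128, active: 128 },
  database: { idle: 256, active: 256 },
} as const;

type Phase = "idle" | "active";

// Sum one phase across all four services.
function totalMB(phase: Phase): number {
  let sum = 0;
  for (const service of Object.values(footprintMB)) sum += service[phase];
  return sum;
}
```

That works out to 1664MB (about 1.6GB) at idle and at least 2688MB (about 2.6GB) while a task runs, before any headroom for the host.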
Optimization Tips
- Allocate sufficient resources to containers
- Limit concurrent tasks to prevent overload
- Monitor resource usage regularly
- Use a LiteLLM proxy for provider flexibility
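One way to limit concurrent tasks is a small in-process semaphore that queues extra work until a slot frees up. TaskLimiter below is a hypothetical sketch, not part of Bytebot; a production setup might instead cap concurrency at the queue level.

```typescript
// Hypothetical limiter: at most `maxConcurrent` tasks run at once;
// extra tasks wait in FIFO order until a running task finishes.
class TaskLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(private readonly maxConcurrent: number) {
    if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
      throw new RangeError("maxConcurrent must be a positive integer");
    }
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // transfer the slot directly; the active count stays the same
      } else {
        this.active--;
      }
    }
  }
}
```

Handing the slot straight to the next waiter, rather than decrementing and re-incrementing, prevents a newly arriving task from jumping the queue.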