> ## Documentation Index
> Fetch the complete documentation index at: https://docs.bytebot.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Architecture

> How Bytebot's desktop agent works under the hood

## Overview

Bytebot is a self-hosted AI desktop agent built with a modular architecture. It combines a Linux desktop environment with AI to create an autonomous computer user that can perform tasks through natural language instructions.

<img src="https://mintcdn.com/bytebot/TpvPpGege5zpgegQ/images/agent-architecture.png?fit=max&auto=format&n=TpvPpGege5zpgegQ&q=85&s=e027a66186048c3489902f49842206f9" alt="Bytebot Architecture Diagram" className="w-full max-w-4xl" width="1820" height="1040" data-path="images/agent-architecture.png" />

## System Architecture

The system consists of four main components that work together:

### 1. Bytebot Desktop Container

The foundation of the system - a virtual Linux desktop that provides:

* **Ubuntu 22.04 LTS** base for stability and compatibility
* **XFCE4 Desktop** for a lightweight, responsive UI
* **bytebotd Daemon** - The automation service built on nutjs that executes computer actions
* **Pre-installed Applications**: Firefox ESR, Thunderbird, text editors, and development tools
* **noVNC** for remote desktop access

**Key Features:**

* Runs completely isolated from your host system
* Consistent environment across different platforms
* Can be customized with additional software
* Accessible via REST API on port 9990
* MCP SSE endpoint available at `/mcp`
* Uses shared types from `@bytebot/shared` package

### 2. AI Agent Service

The brain of the system - orchestrates tasks using an LLM:

* **NestJS Framework** for robust, scalable backend
* **LLM Integration** supporting Anthropic Claude, OpenAI GPT, and Google Gemini models
* **WebSocket Support** for real-time updates
* **Computer Use API Client** to control the desktop
* **Prisma ORM** for database operations
* **Tool definitions** for computer actions (mouse, keyboard, screenshots)

**Responsibilities:**

* Interprets natural language requests
* Plans sequences of computer actions
* Manages task state and progress
* Handles errors and retries
* Provides real-time task updates via WebSocket

### 3. Web Task Interface

The user interface for interacting with your AI agent:

* **Next.js 15 Application** with TypeScript for type safety
* **Embedded VNC Viewer** to watch the desktop in action
* **Task Management** UI with status badges
* **WebSocket Connections** for live updates
* **Reusable components** for consistent UI
* **API utilities** for streamlined server communication

**Features:**

* Task creation and management interface
* Desktop tab for direct manual control
* Real-time desktop viewer with takeover mode
* Task history and status tracking
* Responsive design for all devices

### 4. PostgreSQL Database

Persistent storage for the agent system:

* **Tasks Table**: Stores task details, status, and metadata
* **Messages Table**: Stores AI conversation history
* **Prisma ORM** for type-safe database access

## Data Flow

### Task Execution Flow

<Steps>
  <Step title="User Input">
    User describes a task in natural language via the chat UI
  </Step>

  <Step title="Task Creation">
    Agent service creates a task record and adds it to the processing queue
  </Step>

  <Step title="AI Planning">
    The LLM analyzes the task and generates a plan of computer actions
  </Step>

  <Step title="Action Execution">
    Agent sends computer actions to bytebotd via REST API or MCP
  </Step>

  <Step title="Desktop Automation">
    bytebotd executes actions (mouse, keyboard, screenshots) on the desktop
  </Step>

  <Step title="Result Processing">
    Agent receives results, updates task status, and continues or completes
  </Step>

  <Step title="User Feedback">
    Results and status updates are sent back to the user in real-time
  </Step>
</Steps>

### Communication Protocols

```mermaid theme={null}
graph LR
    A[Tasks UI] -->|WebSocket| B[Agent Service]
    A -->|HTTP Proxy| C[Desktop VNC]
    B -->|REST/MCP| D[Desktop API]
    B -->|SQL| E[PostgreSQL]
    B -->|HTTPS| F[LLM Provider]
    D -->|IPC| G[bytebotd]
```

## Security Architecture

### Isolation Layers

1. **Container Isolation**
   * Each desktop runs in its own Docker container
   * No access to host filesystem by default
   * Network isolation with explicit port mapping

2. **Process Isolation**
   * bytebotd runs as non-root user
   * Separate processes for different services
   * Resource limits enforced by Docker

3. **Network Security**
   * Services only accessible from localhost by default
   * Can be configured with authentication
   * HTTPS/WSS for external connections

### API Security

* **Desktop API**: No authentication by default (localhost only). Supports REST and MCP.
* **Agent API**: Can be secured with API keys
* **Database**: Password protected, not exposed externally

<Warning>
  Default configuration is for development. For production:

  * Enable authentication on all APIs
  * Use HTTPS/WSS for all connections
  * Implement network policies
  * Rotate credentials regularly
</Warning>

## Deployment Patterns

### Single User (Development)

```yaml theme={null}
Services: All on one machine
Scale: 1 instance each
Use Case: Personal automation, development
Resources: 4GB RAM, 2 CPU cores
```

### Production Deployment

```yaml theme={null}
Services: All services on dedicated hardware
Scale: Single instance (1 agent, 1 desktop)
Use Case: Business automation
Resources: 8GB+ RAM, 4+ CPU cores
```

### Enterprise Deployment

```yaml theme={null}
Services: Kubernetes orchestration
Scale: Single instance with high availability
Use Case: Organization-wide automation
Resources: Dedicated nodes
```

## Extension Points

### Custom Tools

Add specialized software to the desktop:

```dockerfile theme={null}
FROM bytebot/desktop:latest
RUN apt-get update && apt-get install -y \
    your-custom-tools
```

### AI Integrations

Extend agent capabilities:

* Custom tools for the LLM
* Additional AI models
* Specialized prompts
* Domain-specific knowledge

## Performance Considerations

### Resource Usage

* **Desktop Container**: \~1GB RAM idle, 2GB+ active
* **Agent Service**: \~256MB RAM
* **UI Service**: \~128MB RAM
* **Database**: \~256MB RAM

### Optimization Tips

1. Allocate sufficient resources to containers
2. Limit concurrent tasks to prevent overload
3. Monitor resource usage regularly
4. Use LiteLLM proxy for provider flexibility

## Next Steps

<CardGroup cols={2}>
  <Card title="Agent System" icon="robot" href="/core-concepts/agent-system">
    Learn about the AI agent capabilities
  </Card>

  <Card title="Desktop Environment" icon="desktop" href="/core-concepts/desktop-environment">
    Explore the virtual desktop environment
  </Card>

  <Card title="API Reference" icon="code" href="/api-reference/introduction">
    Integrate with your applications
  </Card>

  <Card title="Deployment Guide" icon="rocket" href="/quickstart">
    Deploy your own instance
  </Card>
</CardGroup>
