Agent System

Overview

The Bytebot Agent System transforms a simple desktop container into an intelligent, autonomous computer user. By combining an AI model (Anthropic Claude by default) with structured task management, it can understand natural-language requests and carry out multi-step workflows on the desktop much as a human would.

How the AI Agent Works

The Brain: Multi-Model AI Integration

At the heart of Bytebot is a flexible AI integration that supports multiple models. Choose the AI that best fits your needs:

Anthropic Claude (Default):

Best for complex reasoning and visual understanding
Excellent at following detailed instructions
Superior performance on desktop automation tasks

OpenAI GPT Models:

Fast and reliable for general automation
Strong code understanding and generation
Cost-effective for routine tasks

Google Gemini:

Efficient for high-volume tasks
Good balance of speed and capability
Excellent multilingual support

Whichever model you choose, the agent:

Understands Context: Processes your natural language requests with full conversation history
Plans Actions: Breaks down complex tasks into executable computer actions
Adapts in Real-time: Adjusts its approach based on what it sees on screen
Learns from Feedback: Improves task execution through conversation

Conversation Flow

You Describe a Task

“Research competitors for my SaaS product and create a comparison table”

AI Plans the Approach

The AI model understands the request and plans: open browser → search → visit sites → extract data → create document

Executes Actions

The agent controls the desktop: clicking, typing, taking screenshots, reading content

Provides Updates

The agent sends real-time status updates and asks for clarification when needed

Delivers Results

Completes the task and provides the output (files, screenshots, summaries)
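
The flow above amounts to an observe, plan, act loop. The following is a minimal sketch of that loop, not Bytebot's actual implementation: the `Action`, `Model`, and `Desktop` types and the `runTask` function are hypothetical names introduced for illustration, and the calls are kept synchronous for brevity where a real agent would await network and desktop operations.

```typescript
// Hypothetical types for illustration; none of these names come from Bytebot.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done"; summary: string };

interface Desktop {
  screenshot(): string;          // e.g. a base64-encoded image of the screen
  perform(action: Action): void; // executes a click or keystrokes
}

interface Model {
  // Given the task, the actions taken so far, and the latest screenshot,
  // choose the next action (or report that the task is done).
  nextAction(task: string, history: Action[], screenshot: string): Action;
}

// Observe the screen, ask the model for one action, execute it, repeat.
// maxSteps bounds the loop so a confused model cannot run forever.
function runTask(task: string, model: Model, desktop: Desktop, maxSteps = 20): string {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = desktop.screenshot();
    const action = model.nextAction(task, history, screenshot);
    if (action.kind === "done") return action.summary;
    desktop.perform(action);
    history.push(action);
  }
  throw new Error(`Task did not finish within ${maxSteps} steps`);
}
```

Taking a fresh screenshot before every decision is what lets the agent adjust to what is actually on screen rather than following a fixed script.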

Task Management System

Task Lifecycle

Tasks move through a structured lifecycle.

Task Properties

Each task contains:

Description: What needs to be done
Priority: Urgent, High, Medium, or Low
Status: Current state in the lifecycle
Type: Immediate or Scheduled
History: All messages and actions taken
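
As a rough illustration, the properties above could be modeled as a TypeScript type. This is a sketch under assumptions: the uppercase spellings follow the `"HIGH"` and `"IMMEDIATE"` values used in the API examples, while the field names, the message roles, the default priority, and the `"PENDING"` placeholder status are hypothetical and not taken from Bytebot.

```typescript
// Illustrative shape only; not Bytebot's actual data model.
type Priority = "URGENT" | "HIGH" | "MEDIUM" | "LOW";
type TaskType = "IMMEDIATE" | "SCHEDULED";

interface TaskMessage {
  role: "user" | "assistant"; // assumed roles
  content: string;
}

interface Task {
  description: string;    // what needs to be done
  priority: Priority;
  status: string;         // lifecycle state; concrete values are not listed here
  type: TaskType;
  history: TaskMessage[]; // all messages and actions taken
}

// Build a task with an empty history. "PENDING" is a placeholder status
// and "MEDIUM" an assumed default, both chosen only for this example.
function newTask(
  description: string,
  priority: Priority = "MEDIUM",
  type: TaskType = "IMMEDIATE",
): Task {
  if (description.trim() === "") throw new Error("description must not be empty");
  return { description, priority, status: "PENDING", type, history: [] };
}
```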

Smart Task Processing

The agent processes tasks intelligently:

Priority Queue: Urgent tasks run first
Error Recovery: Automatically retries failed actions
Human in the Loop: Asks for help when stuck
Context Preservation: Maintains conversation history across sessions
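
Two of these behaviors, priority ordering and retry with a human fallback, can be sketched in a few lines. The names `QueuedTask`, `nextTask`, and `withRetry` are hypothetical, and the tie-breaking rule (oldest task first) and the retry policy are assumptions made for the example rather than documented Bytebot behavior.

```typescript
type Priority = "URGENT" | "HIGH" | "MEDIUM" | "LOW";

interface QueuedTask {
  id: string;
  priority: Priority;
  createdAt: number; // epoch milliseconds
}

// Lower rank runs first.
const PRIORITY_RANK: Record<Priority, number> = { URGENT: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };

// Pick the most urgent task; ties go to the oldest task (an assumption).
// The queue is copied before sorting so the caller's array is not mutated.
function nextTask(queue: QueuedTask[]): QueuedTask | undefined {
  return [...queue].sort(
    (a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] || a.createdAt - b.createdAt,
  )[0];
}

// Try an action up to maxAttempts times; if every attempt throws,
// hand the last error to a human-in-the-loop callback instead.
function withRetry<T>(action: () => T, maxAttempts: number, askHuman: (error: unknown) => T): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return action();
    } catch (error) {
      lastError = error;
    }
  }
  return askHuman(lastError);
}
```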

Real-world Capabilities

What the Agent Can Do

Web Automation

Browse websites
Fill out forms
Extract data
Download files
Monitor changes

Document Work

Create documents
Edit spreadsheets
Generate reports
Organize files
Convert formats

Email & Communication

Access webmail through browser
Read and extract information
Fill contact forms
Navigate communication portals
Handle verification flows

Data Processing

Extract from PDFs
Process CSV files
Create visualizations
Generate summaries
Transform data

Technical Architecture

Core Components

NestJS Agent Service
- Integrates with multiple AI provider APIs (Anthropic, OpenAI, Google)
- Handles WebSocket connections
- Coordinates with desktop API

Message System
- Structured conversation format
- Supports text and images
- Maintains full context
- Enables rich interactions

Database Schema

Tasks: id, description, status, priority, timestamps
Messages: id, task_id, role, content, timestamps
Summaries: id, task_id, content, parent_id
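
To make the relationships concrete, here is one way the three tables could be typed in TypeScript. This is a sketch, not the real schema: the camelCase field names stand in for columns such as `task_id` and `parent_id`, the split of "timestamps" into `createdAt` and `updatedAt` is an assumption, and the `ContentBlock` type is a hypothetical way to represent the text and image content the Message System supports.

```typescript
// A message body is a list of blocks, so text and images can be mixed.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; mediaType: string; data: string }; // data assumed to be base64

interface TaskRow {
  id: string;
  description: string;
  status: string;
  priority: string;
  createdAt: Date;
  updatedAt: Date;
}

interface MessageRow {
  id: string;
  taskId: string; // references TaskRow.id (task_id)
  role: string;
  content: ContentBlock[];
  createdAt: Date;
}

interface SummaryRow {
  id: string;
  taskId: string;          // references TaskRow.id (task_id)
  content: string;
  parentId: string | null; // parent_id; null when there is no parent
}

// Rebuild one task's conversation in chronological order.
// filter() returns a new array, so the input is left untouched.
function historyFor(taskId: string, messages: MessageRow[]): MessageRow[] {
  return messages
    .filter((m) => m.taskId === taskId)
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
}
```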

Computer Action Bridge
- Translates AI decisions to desktop actions
- Handles screenshots and feedback
- Manages action timing
- Provides error handling
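
The bridge's role can be illustrated with a small dispatcher that maps a model decision onto a desktop call, captures a screenshot as feedback, and converts failures into a result value. Everything here is a hypothetical sketch: the `ComputerAction` variants, the `DesktopApi` interface, and the `execute` function are illustrative names, not the actual Bytebot desktop API.

```typescript
// Hypothetical action vocabulary produced by the AI model.
type ComputerAction =
  | { action: "click"; x: number; y: number }
  | { action: "type"; text: string }
  | { action: "wait"; ms: number }
  | { action: "screenshot" };

// Hypothetical desktop interface the bridge talks to.
interface DesktopApi {
  click(x: number, y: number): void;
  typeText(text: string): void;
  wait(ms: number): void;
  screenshot(): string;
}

type ActionResult =
  | { ok: true; screenshot: string }
  | { ok: false; error: string };

// Translate one decision into a desktop call. Every successful action is
// followed by a screenshot so the model can see the effect; any thrown
// error is returned as data instead of crashing the agent.
function execute(action: ComputerAction, desktop: DesktopApi): ActionResult {
  try {
    switch (action.action) {
      case "click":
        desktop.click(action.x, action.y);
        break;
      case "type":
        desktop.typeText(action.text);
        break;
      case "wait":
        desktop.wait(action.ms);
        break;
      case "screenshot":
        break; // nothing to do before the capture below
    }
    return { ok: true, screenshot: desktop.screenshot() };
  } catch (error) {
    return { ok: false, error: error instanceof Error ? error.message : String(error) };
  }
}
```

Returning errors as values keeps the decision about whether to retry or ask a human with the caller rather than inside the bridge.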

API Endpoints

Key endpoints for programmatic control:

// Create a new task
POST /tasks
{
  "description": "Your task description",
  "priority": "HIGH",
  "type": "IMMEDIATE"
}

// Get task status
GET /tasks/:id

// Send a message
POST /tasks/:id/messages
{
  "content": "Additional instructions"
}

// Get task history
GET /tasks/:id/messages
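
For programmatic use, the four endpoints above can be wrapped in small typed helpers. The sketch below only builds request descriptors (method, path, JSON body) and leaves sending them to whatever HTTP client you use; the helper names and the `ApiRequest` type are hypothetical, and the base URL, authentication, and response shapes are not covered here.

```typescript
type Priority = "URGENT" | "HIGH" | "MEDIUM" | "LOW";
type TaskType = "IMMEDIATE" | "SCHEDULED";

// A plain description of an HTTP request, independent of any client library.
interface ApiRequest {
  method: "GET" | "POST";
  path: string;
  body?: string; // JSON-encoded payload for POST requests
}

// POST /tasks
function createTask(description: string, priority: Priority, type: TaskType): ApiRequest {
  return { method: "POST", path: "/tasks", body: JSON.stringify({ description, priority, type }) };
}

// GET /tasks/:id
function getTask(id: string): ApiRequest {
  return { method: "GET", path: `/tasks/${encodeURIComponent(id)}` };
}

// POST /tasks/:id/messages
function sendMessage(id: string, content: string): ApiRequest {
  return {
    method: "POST",
    path: `/tasks/${encodeURIComponent(id)}/messages`,
    body: JSON.stringify({ content }),
  };
}

// GET /tasks/:id/messages
function getMessages(id: string): ApiRequest {
  return { method: "GET", path: `/tasks/${encodeURIComponent(id)}/messages` };
}
```

A descriptor such as `createTask("Your task description", "HIGH", "IMMEDIATE")` can then be sent with a `Content-Type: application/json` header to the host where your agent service is running.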

Chat UI Features

The web interface provides:

Real-time Interaction

Live chat with the AI agent
Instant status updates
Progress indicators
Error notifications

Visual Feedback

Embedded desktop viewer
Screenshot history
Action replay
Task timeline

Task Management

Create and prioritize tasks
View active and completed tasks
Export conversation logs
Manage task queues

Security & Privacy

Data Isolation

All processing happens in your infrastructure
No data sent to external services (except your chosen AI provider API)
Conversations stored locally
Complete audit trail

Access Control

Configurable authentication
API key management
Network isolation options

Extending the Agent

Integration Points

External API calls via the Agent API
Custom AI prompts for specialized workflows
MCP protocol support for tool integration

Best Practices

Clear Instructions: Be specific about desired outcomes
Break Down Complex Tasks: Use multiple smaller tasks for better results
Provide Context: Include relevant files or URLs
Monitor Progress: Watch the desktop view for real-time feedback
Review Results: Verify outputs meet requirements

Troubleshooting

Agent not responding

Slow task execution

Next Steps

Quick Start

Get your agent running

API Reference

Integrate with your apps

Use Cases

See what’s possible

Best Practices

Optimize your workflows
