# Tasks API

Source: https://docs.bytebot.ai/api-reference/agent/tasks

Reference documentation for the Bytebot Agent Tasks API

## Tasks API

The Tasks API allows you to manage tasks in the Bytebot agent system. It's available at `http://localhost:9991/tasks` when running the full agent setup.

## Task Model

```typescript
{
  id: string;
  description: string;
  status: 'PENDING' | 'IN_PROGRESS' | 'NEEDS_HELP' | 'NEEDS_REVIEW' | 'COMPLETED' | 'CANCELLED' | 'FAILED';
  priority: 'LOW' | 'MEDIUM' | 'HIGH' | 'URGENT';
  createdAt: string;
  updatedAt: string;
}
```

## Endpoints

### Create Task

Create a new task for the agent to process.

#### Request Body

```json
{
  "description": "This is a description of the task",
  "priority": "MEDIUM" // Optional: LOW, MEDIUM, HIGH, URGENT
}
```

#### With File Upload

To upload files with a task, use `multipart/form-data`:

```bash
curl -X POST http://localhost:9991/tasks \
  -F "description=Analyze the uploaded contracts and extract key terms" \
  -F "priority=HIGH" \
  -F "files=@contract1.pdf" \
  -F "files=@contract2.pdf"
```

Uploaded files are automatically saved to the desktop and can be referenced in the task description.

#### Response

```json
{
  "id": "task-123",
  "description": "This is a description of the task",
  "status": "PENDING",
  "priority": "MEDIUM",
  "createdAt": "2025-04-14T12:00:00Z",
  "updatedAt": "2025-04-14T12:00:00Z"
}
```

### Get All Tasks

Retrieve a list of all tasks.

#### Response

```json
[
  {
    "id": "task-123",
    "description": "This is a description of the task",
    "status": "PENDING",
    "priority": "MEDIUM",
    "createdAt": "2025-04-14T12:00:00Z",
    "updatedAt": "2025-04-14T12:00:00Z"
  },
  // ...more tasks
]
```

### Get In-Progress Task

Retrieve the currently in-progress task, if any.

#### Response

```json
{
  "id": "task-123",
  "description": "This is a description of the task",
  "status": "IN_PROGRESS",
  "priority": "MEDIUM",
  "createdAt": "2025-04-14T12:00:00Z",
  "updatedAt": "2025-04-14T12:00:00Z"
}
```

If no task is in progress, the response will be `null`.

### Get Task by ID

Retrieve a specific task by its ID.

#### Response

```json
{
  "id": "task-123",
  "description": "This is a description of the task",
  "status": "PENDING",
  "priority": "MEDIUM",
  "createdAt": "2025-04-14T12:00:00Z",
  "updatedAt": "2025-04-14T12:00:00Z",
  "messages": [
    {
      "id": "msg-456",
      "content": [
        {
          "type": "text",
          "text": "This is a message"
        }
      ],
      "role": "USER",
      "taskId": "task-123",
      "createdAt": "2025-04-14T12:05:00Z",
      "updatedAt": "2025-04-14T12:05:00Z"
    }
    // ...more messages
  ]
}
```

### Update Task

Update an existing task.

#### Request Body

```json
{
  "status": "COMPLETED",
  "priority": "HIGH"
}
```

#### Response

```json
{
  "id": "task-123",
  "description": "This is a description of the task",
  "status": "COMPLETED",
  "priority": "HIGH",
  "createdAt": "2025-04-14T12:00:00Z",
  "updatedAt": "2025-04-14T12:01:00Z"
}
```

### Delete Task

Delete a task.

#### Response

Status code `204 No Content` with an empty response body.
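A client that creates a task usually wants to know when it has finished. The sketch below polls until the task reaches a final status. It assumes that `COMPLETED`, `CANCELLED`, and `FAILED` are final (the Task Model lists the statuses but does not say which ones are terminal), and it takes the fetch function as a parameter so that no URL path is assumed. `wait_for_task` and `fetch_task` are illustrative names, not part of the Bytebot API.

```python
import time
from typing import Callable, Dict

# Assumption: these statuses are final. The Task Model lists them but does not
# state which ones are terminal.
TERMINAL_STATUSES = {"COMPLETED", "CANCELLED", "FAILED"}


def wait_for_task(
    fetch_task: Callable[[], Dict],
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_task() repeatedly until the task reports a terminal status."""
    for _ in range(max_polls):
        task = fetch_task()
        if task["status"] in TERMINAL_STATUSES:
            return task
        sleep(interval_s)
    raise TimeoutError(f"task still active after {max_polls} polls")
```

In practice `fetch_task` could wrap an HTTP request to the Get Task by ID endpoint and return the parsed JSON body. Statuses such as `NEEDS_HELP` and `NEEDS_REVIEW` are treated as non-final here, so a caller may want to check for them separately.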
## Message Content Structure

Messages in the Bytebot agent system use a content block structure compatible with Anthropic's Claude API:

```typescript
type MessageContent = MessageContentBlock[];

interface MessageContentBlock {
  type: string;
  [key: string]: any;
}

interface TextContentBlock {
  type: "text";
  text: string;
}

interface ImageContentBlock {
  type: "image";
  source: {
    type: "base64";
    media_type: string;
    data: string;
  };
}
```

## Error Responses

The API may return the following error responses:

| Status Code | Description                               |
| ----------- | ----------------------------------------- |
| `400`       | Bad Request - Invalid parameters          |
| `404`       | Not Found - Resource does not exist       |
| `500`       | Internal Server Error - Server side error |

Example error response:

```json
{
  "statusCode": 404,
  "message": "Task with ID task-123 not found",
  "error": "Not Found"
}
```

## Code Examples

```javascript JavaScript
const axios = require('axios');

async function createTask(description) {
  const response = await axios.post('http://localhost:9991/tasks', { description });
  return response.data;
}

async function findInProgressTask() {
  const response = await axios.get('http://localhost:9991/tasks/in-progress');
  return response.data;
}

// Example usage
async function main() {
  // Create a new task
  const task = await createTask('Compare React, Vue, and Angular for a new project');
  console.log('Created task:', task);

  // Get current in-progress task
  const inProgressTask = await findInProgressTask();
  console.log('In progress task:', inProgressTask);
}

main().catch(console.error);
```

```python Python
import requests

def create_task(description):
    response = requests.post(
        "http://localhost:9991/tasks",
        json={"description": description}
    )
    return response.json()

def find_in_progress_task():
    response = requests.get("http://localhost:9991/tasks/in-progress")
    return response.json()

# Example usage
def main():
    # Create a new task
    task = create_task("Compare React, Vue, and Angular for a new project")
    print(f"Created task: {task}")

    # Get current in-progress task
    in_progress_task = find_in_progress_task()
    print(f"In progress task: {in_progress_task}")

if __name__ == "__main__":
    main()
```

```curl cURL
# Create a new task
curl -X POST http://localhost:9991/tasks \
  -H "Content-Type: application/json" \
  -d '{ "description": "Compare React, Vue, and Angular for a new project" }'

# Get current in-progress task
curl -X GET http://localhost:9991/tasks/in-progress
```

# Task UI

Source: https://docs.bytebot.ai/api-reference/agent/ui

Documentation for the Bytebot Task UI

## Bytebot Task UI

The Bytebot Task UI provides a web-based interface for interacting with the Bytebot agent system. It combines an action feed with an embedded noVNC viewer, allowing you to watch the agent perform tasks on the desktop in real time.

## Accessing the UI

When running the full Bytebot agent system, the Task UI is available at:

```
http://localhost:9992
```

## UI Components

### Task Management Panel

The task management panel allows you to:

* Create new tasks
* View existing tasks
* See task status and priority
* Select a task to work on

### Task Interface

The main task interface provides:

* Task history with the agent
* Support for markdown formatting in messages
* Automatic scrolling to new messages

### Desktop Viewer

The embedded noVNC viewer displays:

* Real-time view of the desktop environment
* Visual feedback of agent actions
* Option to expand to take over the desktop
* Connection status indicator

## Features

### Task Creation

To create a new task:

1. Enter a description for the task
2. 
   Click the "Start Task" button (or press Enter)

### Conversation Controls

The task interface supports:

* Text messages with markdown formatting
* Viewing image content in messages
* Displaying tool use actions
* Showing tool results

### Desktop Interaction

While primarily for viewing, the desktop panel allows:

* Taking over the desktop
* Real-time monitoring of agent actions

## Message Types

The task interface displays different types of messages based on Bytebot's content block structure:

* **User Messages**: Your instructions and queries
* **Assistant Messages**: Responses from the agent, which may include:
  * **Text Content Blocks**: Markdown-formatted text responses
  * **Image Content Blocks**: Images generated or captured
  * **Tool Use Content Blocks**: Computer actions being performed
  * **Tool Result Content Blocks**: Results of computer actions

The message content structure follows this format:

```typescript
interface Message {
  id: string;
  content: MessageContentBlock[];
  role: Role; // "USER" or "ASSISTANT"
  createdAt?: string;
}

interface MessageContentBlock {
  type: string;
  [key: string]: any;
}

interface TextContentBlock extends MessageContentBlock {
  type: "text";
  text: string;
}

interface ImageContentBlock extends MessageContentBlock {
  type: "image";
  source: {
    type: "base64";
    media_type: string;
    data: string;
  };
}
```

## Technical Details

The Bytebot Task UI is built with:

* **Next.js**: React framework for the frontend
* **Tailwind CSS**: For styling
* **ReactMarkdown**: For rendering markdown content
* **noVNC**: For the embedded desktop viewer

## Troubleshooting

### Connection Issues

If you experience connection issues:

1. Ensure all Bytebot services are running
2. Check that ports 9990, 9991, and 9992 are accessible
3. Try refreshing the browser
4. Check the browser console for error messages

### Desktop Viewer Issues

If the desktop viewer is not displaying:

1. Ensure the Bytebot container is running
2. 
   Check that the noVNC service is accessible at port 9990

### Message Display Issues

If messages are not displaying correctly:

1. Check that the message content is properly formatted
2. Ensure the agent service is processing tasks correctly
3. Check the browser console for any rendering errors
4. Try refreshing the browser

# Computer Use API Examples

Source: https://docs.bytebot.ai/api-reference/computer-use/examples

Code examples for common automation scenarios using the Bytebot API

## Basic Examples

Here are some practical examples of how to use the Computer Use API in different programming languages.

### Using cURL

```bash Opening a Web Browser
# Move to Firefox/Chrome icon in the dock and click it
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "move_mouse", "coordinates": {"x": 100, "y": 960}}'

curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "click_mouse", "button": "left", "clickCount": 1}'
```

```bash Taking and Saving a Screenshot
# Take a screenshot
response=$(curl -s -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "screenshot"}')

# Extract the base64 image data and save to a file
echo "$response" | jq -r '.data.image' | base64 -d > screenshot.png
```

```bash Typing and Keyboard Shortcuts
# Type text in a text editor
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "type_text", "text": "Hello, this is an automated test!", "delay": 30}'

# Press Ctrl+S to save: hold the keys down, then release them
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "press_keys", "keys": ["ctrl", "s"], "press": "down"}'

curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "press_keys", "keys": ["ctrl", "s"], "press": "up"}'
```

### Python Examples

```python Basic Automation
import requests

def control_computer(action, **params):
    url = "http://localhost:9990/computer-use"
    data = {"action": action, **params}
    response = requests.post(url, json=data)
    return response.json()

# Open a web browser by clicking an icon
control_computer("move_mouse", coordinates={"x": 100, "y": 960})
control_computer("click_mouse", button="left", clickCount=1)

# Wait for the browser to open
control_computer("wait", duration=2000)

# Type a URL and press Enter
control_computer("type_text", text="https://example.com")
control_computer("type_keys", keys=["enter"])
```

```python Screenshot and Analysis
import base64
from io import BytesIO

import cv2
import numpy as np
import requests
from PIL import Image

def take_screenshot():
    url = "http://localhost:9990/computer-use"
    data = {"action": "screenshot"}
    result = requests.post(url, json=data).json()

    if result["success"]:
        img_data = base64.b64decode(result["data"]["image"])
        image = Image.open(BytesIO(img_data)).convert("RGB")
        # PIL yields RGB; OpenCV functions expect BGR channel order
        return cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
    return None

# Take a screenshot
img = take_screenshot()

if img is not None:
    # Convert to grayscale for analysis
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Save the screenshot
    cv2.imwrite("screenshot.png", img)

    # Perform image analysis (example: find edges)
    edges = cv2.Canny(gray, 100, 200)
    cv2.imwrite("edges.png", edges)
```

```python Web Form Automation
import requests
import time

def control_computer(action, **params):
    url = "http://localhost:9990/computer-use"
    data = {"action": action, **params}
    response = requests.post(url, json=data)
    return response.json()

def fill_web_form(form_fields):
    # Click on the first form field (send only x and y as coordinates)
    first = form_fields[0]
    control_computer("move_mouse", coordinates={"x": first["x"], "y": first["y"]})
    control_computer("click_mouse", button="left", clickCount=1)

    # Fill out each field
    for i, field in enumerate(form_fields):
        # Input the field value
        control_computer("type_text", text=field["value"])

        # If not the last field, press Tab to move to next field
        if i < len(form_fields) - 1:
            control_computer("type_keys", keys=["tab"])
            time.sleep(0.5)

    # Submit the form by pressing Enter
    control_computer("type_keys", keys=["enter"])

# Example form fields with coordinates and values
form_fields = [
    {"x": 500, "y": 300, "value": "John Doe"},
    {"x": 500, "y": 350, "value": "john@example.com"},
    {"x": 500, "y": 400, "value": "Password123"}
]

fill_web_form(form_fields)
```

### JavaScript/Node.js Examples

```javascript Basic Automation
const axios = require('axios');

async function controlComputer(action, params = {}) {
  const url = "http://localhost:9990/computer-use";
  const data = { action, ...params };

  try {
    const response = await axios.post(url, data);
    return response.data;
  } catch (error) {
    console.error('Error:', error.message);
    return { success: false, error: error.message };
  }
}

// Example: Automate opening an application and typing
async function automateTextEditor() {
  try {
    // Open text editor by clicking its icon
    await controlComputer("move_mouse", { coordinates: { x: 150, y: 960 } });
    await controlComputer("click_mouse", { button: "left", clickCount: 1 });

    // Wait for it to open
    await controlComputer("wait", { duration: 2000 });

    // Type some text
    await controlComputer("type_text", {
      text: "This is an automated test using Node.js and Bytebot",
      delay: 30
    });

    console.log("Automation completed successfully");
  } catch (error) {
    console.error("Automation failed:", error);
  }
}

automateTextEditor();
```

```javascript Advanced: Screenshot Comparison
const axios = require('axios');
const fs = require('fs');

async function controlComputer(action, params = {}) {
  const url = "http://localhost:9990/computer-use";
  const data = { action, ...params };

  try {
    const response = await axios.post(url, data);
    return response.data;
  } catch (error) {
    console.error('Error:', error.message);
    return { success: false, error: error.message };
  }
}

async function compareScreenshots() {
  try {
    // Take first screenshot
    const screenshot1 = await controlComputer("screenshot");

    // Do some actions
    await controlComputer("move_mouse", { coordinates: { x: 500, y: 500
    } });
    await controlComputer("click_mouse", { button: "left", clickCount: 1 });
    await controlComputer("wait", { duration: 1000 });

    // Take second screenshot
    const screenshot2 = await controlComputer("screenshot");

    // Compare screenshots
    if (screenshot1.success && screenshot2.success) {
      const img1Data = Buffer.from(screenshot1.data.image, 'base64');
      const img2Data = Buffer.from(screenshot2.data.image, 'base64');

      fs.writeFileSync('screenshot1.png', img1Data);
      fs.writeFileSync('screenshot2.png', img2Data);

      // Now you could load and compare these images
      // This requires additional image comparison libraries
      console.log('Screenshots saved for comparison');
    }
  } catch (error) {
    console.error("Screenshot comparison failed:", error);
  }
}

compareScreenshots();
```

## File Operations

### Writing Files

These examples show how to write files to the desktop environment:

```python Python
import requests
import base64

def write_file(path, content):
    url = "http://localhost:9990/computer-use"

    # Encode content to base64
    encoded_content = base64.b64encode(content.encode('utf-8')).decode('utf-8')

    data = {
        "action": "write_file",
        "path": path,
        "data": encoded_content
    }

    response = requests.post(url, json=data)
    return response.json()

# Write a text file
result = write_file("/home/user/hello.txt", "Hello, Bytebot!")
print(result)  # {'success': True, 'message': 'File written successfully...'}

# Write to desktop (relative path)
result = write_file("report.txt", "Daily report content")
print(result)  # File will be written to /home/user/Desktop/report.txt
```

```javascript JavaScript
const axios = require('axios');

async function writeFile(path, content) {
  const url = "http://localhost:9990/computer-use";

  // Encode content to base64
  const encodedContent = Buffer.from(content, 'utf-8').toString('base64');

  const data = {
    action: "write_file",
    path: path,
    data: encodedContent
  };

  const response = await axios.post(url, data);
  return response.data;
}

// Write a text file
writeFile("/home/user/notes.txt", "Meeting notes...")
  .then(result => console.log(result))
  .catch(error => console.error(error));

// Write HTML file to desktop
const htmlContent = '<html><body><h1>Hello</h1></body></html>';
writeFile("index.html", htmlContent)
  .then(result => console.log("HTML file created"))
  .catch(error => console.error(error));
```
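The helpers above encode text content. Since `data` is plain base64, the same `write_file` action should be able to carry binary content as well. The sketch below only builds the request body and sends nothing; `build_write_file_payload` is a hypothetical helper name, and accepting `bytes` is an assumption about how a client might be structured rather than documented Bytebot behavior.

```python
import base64
from typing import Dict, Union


def build_write_file_payload(path: str, content: Union[str, bytes]) -> Dict[str, str]:
    """Build a write_file request body from text or raw bytes (hypothetical helper)."""
    # Text is encoded as UTF-8 first; bytes are passed through unchanged.
    raw = content.encode("utf-8") if isinstance(content, str) else content
    return {
        "action": "write_file",
        "path": path,
        "data": base64.b64encode(raw).decode("ascii"),
    }


text_payload = build_write_file_payload("report.txt", "Hello World!")
binary_payload = build_write_file_payload("/home/user/blob.bin", bytes([0, 255, 16]))
```

Either payload could then be posted as JSON to the `/computer-use` endpoint in the same way as the `write_file` examples above.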
### Reading Files

These examples show how to read files from the desktop environment:

```python Python
import requests
import base64

def read_file(path):
    url = "http://localhost:9990/computer-use"

    data = {
        "action": "read_file",
        "path": path
    }

    response = requests.post(url, json=data)
    result = response.json()

    if result['success']:
        # Decode the base64 content
        content = base64.b64decode(result['data']).decode('utf-8')
        return {
            'content': content,
            'name': result['name'],
            'size': result['size'],
            'mediaType': result['mediaType']
        }
    else:
        return result

# Read a text file
file_data = read_file("/home/user/hello.txt")
print(f"Content: {file_data['content']}")
print(f"Size: {file_data['size']} bytes")
print(f"Type: {file_data['mediaType']}")
```

```javascript JavaScript
const axios = require('axios');

async function readFile(path) {
  const url = "http://localhost:9990/computer-use";

  const data = {
    action: "read_file",
    path: path
  };

  const response = await axios.post(url, data);
  const result = response.data;

  if (result.success) {
    // Decode the base64 content
    const content = Buffer.from(result.data, 'base64').toString('utf-8');
    return {
      content: content,
      name: result.name,
      size: result.size,
      mediaType: result.mediaType
    };
  } else {
    throw new Error(result.message);
  }
}

// Read a file from desktop
readFile("report.txt")
  .then(fileData => {
    console.log(`Content: ${fileData.content}`);
    console.log(`Size: ${fileData.size} bytes`);
    console.log(`Type: ${fileData.mediaType}`);
  })
  .catch(error => console.error("Error reading file:", error));
```

## Automation Recipes

### Browser Automation

This example demonstrates how to automate browser interactions:

```python
import requests
import time

def control_computer(action, **params):
    url = "http://localhost:9990/computer-use"
    data = {"action": action, **params}
    response = requests.post(url, json=data)
    return response.json()

def automate_browser():
    # Open browser (assuming browser icon is at position x=100, y=960)
    control_computer("move_mouse",
                     coordinates={"x": 100, "y": 960})
    control_computer("click_mouse", button="left", clickCount=1)
    time.sleep(3)  # Wait for browser to open

    # Type URL and press Enter
    control_computer("type_text", text="https://example.com")
    control_computer("type_keys", keys=["enter"])
    time.sleep(2)  # Wait for page to load

    # Take screenshot of the loaded page
    screenshot = control_computer("screenshot")

    # Click on a link (coordinates would need to be adjusted for your target)
    control_computer("move_mouse", coordinates={"x": 300, "y": 400})
    control_computer("click_mouse", button="left", clickCount=1)
    time.sleep(2)

    # Scroll down
    control_computer("scroll", direction="down", scrollCount=5)

automate_browser()
```

### Form Filling Automation

This example shows how to automate filling out a form in a web application:

```javascript
const axios = require("axios");

async function controlComputer(action, params = {}) {
  const url = "http://localhost:9990/computer-use";
  const data = { action, ...params };
  const response = await axios.post(url, data);
  return response.data;
}

async function fillForm() {
  // Click first input field
  await controlComputer("move_mouse", { coordinates: { x: 400, y: 300 } });
  await controlComputer("click_mouse", { button: "left", clickCount: 1 });

  // Type name
  await controlComputer("type_text", { text: "John Doe" });

  // Tab to next field
  await controlComputer("type_keys", { keys: ["tab"] });

  // Type email
  await controlComputer("type_text", { text: "john@example.com" });

  // Tab to next field
  await controlComputer("type_keys", { keys: ["tab"] });

  // Type message
  await controlComputer("type_text", {
    text: "This is an automated message sent using Bytebot's Computer Use API",
    delay: 30,
  });

  // Tab to submit button
  await controlComputer("type_keys", { keys: ["tab"] });

  // Press Enter to submit
  await controlComputer("type_keys", { keys: ["enter"] });
}

fillForm().catch(console.error);
```

## Integration with Testing Frameworks

The Computer Use API can be integrated with popular testing frameworks:

### Selenium Alternative

Bytebot can serve as an
alternative to Selenium for web testing:

```python
import requests
import time

class BytebotWebDriver:
    def __init__(self, base_url="http://localhost:9990"):
        self.base_url = base_url

    def control_computer(self, action, **params):
        url = f"{self.base_url}/computer-use"
        data = {"action": action, **params}
        response = requests.post(url, json=data)
        return response.json()

    def open_browser(self, browser_icon_coords):
        self.control_computer("move_mouse", coordinates=browser_icon_coords)
        self.control_computer("click_mouse", button="left", clickCount=1)
        time.sleep(3)  # Wait for browser to open

    def navigate_to(self, url):
        self.control_computer("type_text", text=url)
        self.control_computer("type_keys", keys=["enter"])
        time.sleep(2)  # Wait for page to load

    def click_element(self, coords):
        self.control_computer("move_mouse", coordinates=coords)
        self.control_computer("click_mouse", button="left", clickCount=1)

    def type_text(self, text):
        self.control_computer("type_text", text=text)

    def press_keys(self, keys, press="down"):
        # keys is a list such as ["ctrl", "s"]; press is "down" or "up"
        self.control_computer("press_keys", keys=keys, press=press)

    def take_screenshot(self):
        return self.control_computer("screenshot")

# Usage example
driver = BytebotWebDriver()
driver.open_browser({"x": 100, "y": 960})
driver.navigate_to("https://example.com")
driver.click_element({"x": 300, "y": 400})
driver.type_text("Hello Bytebot!")
```

# Unified Computer Actions API

Source: https://docs.bytebot.ai/api-reference/computer-use/unified-endpoint

Control all aspects of the desktop environment with a single endpoint

## Overview

The unified computer action API allows for granular control over all aspects of the Bytebot virtual desktop environment through a single endpoint. It replaces multiple specific endpoints with a unified interface that handles various computer actions like mouse movements, clicks, key presses, and more.
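Because every operation goes through one endpoint, a client needs only one function to build request bodies. The following is a minimal sketch, not part of the Bytebot API: `build_request` and `KNOWN_ACTIONS` are hypothetical names, and the client-side check of the action name is an assumption meant to catch typos early, not documented server behavior. The action names are the ones listed in this reference.

```python
from typing import Any, Dict

# Action names as listed under "Available Actions" in this reference.
KNOWN_ACTIONS = frozenset({
    "move_mouse", "trace_mouse", "click_mouse", "press_mouse", "drag_mouse",
    "scroll", "type_keys", "press_keys", "type_text", "paste_text", "wait",
    "screenshot", "cursor_position", "application", "write_file", "read_file",
})


def build_request(action: str, **params: Any) -> Dict[str, Any]:
    """Build the JSON body for a unified-endpoint request without sending it."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return {"action": action, **params}


move = build_request("move_mouse", coordinates={"x": 100, "y": 200})
shot = build_request("screenshot")
```

The resulting dictionaries can be serialized as JSON and posted to the endpoint with any HTTP client.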
## Endpoint

| Method | URL             | Description                                     |
| ------ | --------------- | ----------------------------------------------- |
| POST   | `/computer-use` | Execute computer actions in the virtual desktop |

## Request Format

All requests to the unified endpoint follow this format:

```json
{
  "action": "action_name",
  ...action-specific parameters
}
```

The `action` parameter determines which operation to perform, and the remaining parameters depend on the specific action.

## Available Actions

### move\_mouse

Move the mouse cursor to a specific position.

**Parameters:**

| Parameter       | Type   | Required | Description                       |
| --------------- | ------ | -------- | --------------------------------- |
| `coordinates`   | Object | Yes      | The target coordinates to move to |
| `coordinates.x` | Number | Yes      | X coordinate                      |
| `coordinates.y` | Number | Yes      | Y coordinate                      |

**Example:**

```json
{
  "action": "move_mouse",
  "coordinates": { "x": 100, "y": 200 }
}
```

### trace\_mouse

Move the mouse along a path of coordinates.

**Parameters:**

| Parameter  | Type   | Required | Description                                    |
| ---------- | ------ | -------- | ---------------------------------------------- |
| `path`     | Array  | Yes      | Array of coordinate objects for the mouse path |
| `path[].x` | Number | Yes      | X coordinate for each point in the path        |
| `path[].y` | Number | Yes      | Y coordinate for each point in the path        |
| `holdKeys` | Array  | No       | Keys to hold while moving along the path       |

**Example:**

```json
{
  "action": "trace_mouse",
  "path": [
    { "x": 100, "y": 100 },
    { "x": 150, "y": 150 },
    { "x": 200, "y": 200 }
  ],
  "holdKeys": ["shift"]
}
```

### click\_mouse

Perform a mouse click at the current or specified position.
**Parameters:**

| Parameter       | Type   | Required | Description                                            |
| --------------- | ------ | -------- | ------------------------------------------------------ |
| `coordinates`   | Object | No       | The coordinates to click (uses current if omitted)     |
| `coordinates.x` | Number | Yes\*    | X coordinate                                           |
| `coordinates.y` | Number | Yes\*    | Y coordinate                                           |
| `button`        | String | Yes      | Mouse button: 'left', 'right', or 'middle'             |
| `clickCount`    | Number | Yes      | Number of clicks to perform                            |
| `holdKeys`      | Array  | No       | Keys to hold while clicking (e.g., \['ctrl', 'shift']) |

**Example:**

```json
{
  "action": "click_mouse",
  "coordinates": { "x": 150, "y": 250 },
  "button": "left",
  "clickCount": 2
}
```

### press\_mouse

Press or release a mouse button at the current or specified position.

**Parameters:**

| Parameter       | Type   | Required | Description                                                |
| --------------- | ------ | -------- | ---------------------------------------------------------- |
| `coordinates`   | Object | No       | The coordinates to press/release (uses current if omitted) |
| `coordinates.x` | Number | Yes\*    | X coordinate                                               |
| `coordinates.y` | Number | Yes\*    | Y coordinate                                               |
| `button`        | String | Yes      | Mouse button: 'left', 'right', or 'middle'                 |
| `press`         | String | Yes      | Action: 'up' or 'down'                                     |

**Example:**

```json
{
  "action": "press_mouse",
  "coordinates": { "x": 150, "y": 250 },
  "button": "left",
  "press": "down"
}
```

### drag\_mouse

Click and drag the mouse from one point to another.
**Parameters:**

| Parameter  | Type   | Required | Description                                   |
| ---------- | ------ | -------- | --------------------------------------------- |
| `path`     | Array  | Yes      | Array of coordinate objects for the drag path |
| `path[].x` | Number | Yes      | X coordinate for each point in the path       |
| `path[].y` | Number | Yes      | Y coordinate for each point in the path       |
| `button`   | String | Yes      | Mouse button: 'left', 'right', or 'middle'    |
| `holdKeys` | Array  | No       | Keys to hold while dragging                   |

**Example:**

```json
{
  "action": "drag_mouse",
  "path": [
    { "x": 100, "y": 100 },
    { "x": 200, "y": 200 }
  ],
  "button": "left"
}
```

### scroll

Scroll up, down, left, or right.

**Parameters:**

| Parameter       | Type   | Required | Description                                            |
| --------------- | ------ | -------- | ------------------------------------------------------ |
| `coordinates`   | Object | No       | The coordinates to scroll at (uses current if omitted) |
| `coordinates.x` | Number | Yes\*    | X coordinate                                           |
| `coordinates.y` | Number | Yes\*    | Y coordinate                                           |
| `direction`     | String | Yes      | Scroll direction: 'up', 'down', 'left', 'right'        |
| `scrollCount`   | Number | Yes      | Number of scroll steps                                 |
| `holdKeys`      | Array  | No       | Keys to hold while scrolling                           |

**Example:**

```json
{
  "action": "scroll",
  "direction": "down",
  "scrollCount": 5
}
```

### type\_keys

Type a sequence of keyboard keys.

**Parameters:**

| Parameter | Type   | Required | Description                       |
| --------- | ------ | -------- | --------------------------------- |
| `keys`    | Array  | Yes      | Array of keys to type in sequence |
| `delay`   | Number | No       | Delay between key presses (ms)    |

**Example:**

```json
{
  "action": "type_keys",
  "keys": ["a", "b", "c", "enter"],
  "delay": 50
}
```

### press\_keys

Press or release keyboard keys.
**Parameters:**

| Parameter | Type   | Required | Description                       |
| --------- | ------ | -------- | --------------------------------- |
| `keys`    | Array  | Yes      | Array of keys to press or release |
| `press`   | String | Yes      | Action: 'up' or 'down'            |

**Example:**

```json
{
  "action": "press_keys",
  "keys": ["ctrl", "shift", "esc"],
  "press": "down"
}
```

### type\_text

Type a text string with optional delay.

**Parameters:**

| Parameter | Type   | Required | Description                                           |
| --------- | ------ | -------- | ----------------------------------------------------- |
| `text`    | String | Yes      | The text to type                                      |
| `delay`   | Number | No       | Delay between characters in milliseconds (default: 0) |

**Example:**

```json
{
  "action": "type_text",
  "text": "Hello, Bytebot!",
  "delay": 50
}
```

### paste\_text

Paste text to the current cursor position. This is especially useful for special characters that aren't on the standard keyboard.

**Parameters:**

| Parameter | Type   | Required | Description                                                |
| --------- | ------ | -------- | ---------------------------------------------------------- |
| `text`    | String | Yes      | The text to paste, including special characters and emojis |

**Example:**

```json
{
  "action": "paste_text",
  "text": "Special characters: ©®™€¥£ émojis 🎉"
}
```

### wait

Wait for a specified duration.

**Parameters:**

| Parameter  | Type   | Required | Description                   |
| ---------- | ------ | -------- | ----------------------------- |
| `duration` | Number | Yes      | Wait duration in milliseconds |

**Example:**

```json
{
  "action": "wait",
  "duration": 2000
}
```

### screenshot

Capture a screenshot of the desktop.

**Parameters:** None required

**Example:**

```json
{
  "action": "screenshot"
}
```

### cursor\_position

Get the current position of the mouse cursor.

**Parameters:** None required

**Example:**

```json
{
  "action": "cursor_position"
}
```

### application

Switch between different applications or navigate to the desktop/directory.
**Parameters:**

| Parameter     | Type   | Required | Description                                                |
| ------------- | ------ | -------- | ---------------------------------------------------------- |
| `application` | String | Yes      | The application to switch to. See available options below. |

**Available Applications:**

* `firefox` - Mozilla Firefox web browser
* `1password` - Password manager
* `thunderbird` - Email client
* `vscode` - Visual Studio Code editor
* `terminal` - Terminal/console application
* `desktop` - Switch to desktop
* `directory` - File manager/directory browser

**Example:**

```json
{
  "action": "application",
  "application": "firefox"
}
```

### write\_file

Write a file to the desktop environment filesystem.

**Parameters:**

| Parameter | Type   | Required | Description                                            |
| --------- | ------ | -------- | ------------------------------------------------------ |
| `path`    | String | Yes      | File path (absolute or relative to /home/user/Desktop) |
| `data`    | String | Yes      | Base64 encoded file content                            |

**Example:**

```json
{
  "action": "write_file",
  "path": "/home/user/documents/example.txt",
  "data": "SGVsbG8gV29ybGQh"
}
```

### read\_file

Read a file from the desktop environment filesystem.

**Parameters:**

| Parameter | Type   | Required | Description                                            |
| --------- | ------ | -------- | ------------------------------------------------------ |
| `path`    | String | Yes      | File path (absolute or relative to /home/user/Desktop) |

**Example:**

```json
{
  "action": "read_file",
  "path": "/home/user/documents/example.txt"
}
```

## Response Format

The response format varies depending on the action performed.
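Because the shape depends on the action, a client may want a single place that turns a raw response into a useful value. The sketch below assumes the response shapes documented in this section; `unwrap_response` and `ComputerUseError` are illustrative names, not part of the Bytebot API.

```python
import base64
from typing import Any, Dict


class ComputerUseError(RuntimeError):
    """Raised when a response reports success == False."""


def unwrap_response(action: str, response: Dict[str, Any]) -> Any:
    """Convert a parsed JSON response into a value suited to the action."""
    if not response.get("success"):
        raise ComputerUseError(response.get("error", "unknown error"))
    if action == "screenshot":
        return base64.b64decode(response["data"]["image"])  # raw image bytes
    if action == "cursor_position":
        return (response["data"]["x"], response["data"]["y"])
    if action == "read_file":
        return base64.b64decode(response["data"])  # raw file bytes
    if action == "write_file":
        return response.get("message")
    return None  # other actions only report success


file_bytes = unwrap_response("read_file", {
    "success": True, "data": "SGVsbG8gV29ybGQh",
    "name": "example.txt", "size": 12, "mediaType": "text/plain",
})
position = unwrap_response("cursor_position", {"success": True, "data": {"x": 123, "y": 456}})
```

Raising an exception on `success: false` is a design choice of this sketch; returning the error dictionary unchanged would work equally well for callers that prefer to inspect it.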
### Standard Response

Most actions return a simple success response:

```json
{
  "success": true
}
```

### Screenshot Response

```json
{
  "success": true,
  "data": {
    "image": "base64_encoded_image_data"
  }
}
```

### Cursor Position Response

```json
{
  "success": true,
  "data": {
    "x": 123,
    "y": 456
  }
}
```

### Write File Response

```json
{
  "success": true,
  "message": "File written successfully to: /home/user/documents/example.txt"
}
```

### Read File Response

```json
{
  "success": true,
  "data": "SGVsbG8gV29ybGQh",
  "name": "example.txt",
  "size": 12,
  "mediaType": "text/plain"
}
```

### Error Response

```json
{
  "success": false,
  "error": "Error message"
}
```

## Code Examples

### JavaScript/Node.js Example

```javascript
const axios = require('axios');

const bytebot = {
  baseUrl: 'http://localhost:9990/computer-use',

  async action(params) {
    try {
      const response = await axios.post(this.baseUrl, params);
      return response.data;
    } catch (error) {
      console.error('Error:', error.response?.data || error.message);
      throw error;
    }
  },

  // Convenience methods
  async moveMouse(x, y) {
    return this.action({ action: 'move_mouse', coordinates: { x, y } });
  },

  async clickMouse(x, y, button = 'left', clickCount = 1) {
    return this.action({ action: 'click_mouse', coordinates: { x, y }, button, clickCount });
  },

  async typeText(text) {
    return this.action({ action: 'type_text', text });
  },

  async pasteText(text) {
    return this.action({ action: 'paste_text', text });
  },

  async switchApplication(application) {
    return this.action({ action: 'application', application });
  },

  async screenshot() {
    return this.action({ action: 'screenshot' });
  }
};

// Example usage:
async function example() {
  // Switch to Firefox
  await bytebot.switchApplication('firefox');

  // Navigate to a website
  await bytebot.moveMouse(100, 35);
  await bytebot.clickMouse(100, 35);
  await bytebot.typeText('https://example.com');
  await bytebot.action({ action: 'type_keys', keys: ['enter'] });

  // Wait for page to load
  await bytebot.action({ action: 'wait',
duration: 2000 }); // Paste some special characters await bytebot.pasteText('© 2025 Example Corp™ - €100'); // Take a screenshot const result = await bytebot.screenshot(); console.log('Screenshot taken!'); } example().catch(console.error); ``` # API Reference Source: https://docs.bytebot.ai/api-reference/introduction Overview of the Bytebot API endpoints for programmatic control # Bytebot API Overview Bytebot provides two main APIs for programmatic control: ## 1. Agent API (Task Management) The Agent API runs on port 9991 and provides high-level task management: Create, manage, and monitor AI-powered tasks programmatically WebSocket connections and real-time updates for custom UIs ### Agent API Base URL ``` http://localhost:9991 ``` ### Example Task Creation ```bash curl -X POST http://localhost:9991/tasks \ -H "Content-Type: application/json" \ -d '{ "description": "Download invoices from webmail and organize by date", "priority": "HIGH" }' ``` ## 2. Desktop API (Direct Control) The Desktop API runs on port 9990 and provides low-level desktop control: Direct control of mouse, keyboard, and screen capture Code examples for common automation scenarios ### Desktop API Base URL ``` http://localhost:9990 ``` ### Example Desktop Control ```bash curl -X POST http://localhost:9990/computer-use \ -H "Content-Type: application/json" \ -d '{"action": "screenshot"}' ``` ### MCP Support The Desktop API also exposes an MCP (Model Context Protocol) endpoint: ``` http://localhost:9990/mcp ``` Connect your MCP client to access desktop control tools over SSE. 
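The Desktop API request shape used in the examples above can be sketched as a small typed helper. This is only an illustration under stated assumptions: the action names and the `{ success, data, error }` envelope are taken from this documentation, while the helper names `buildRequest` and `sendAction` are hypothetical and not part of Bytebot. It also assumes a runtime with a global `fetch` (for example Node.js 18 or later).

```typescript
// Minimal sketch of a typed Desktop API call.
// Action names and the response envelope follow the examples in this document;
// buildRequest and sendAction are hypothetical helper names.

type DesktopAction =
  | { action: "screenshot" }
  | { action: "move_mouse"; coordinates: { x: number; y: number } }
  | { action: "type_text"; text: string };

interface DesktopResponse {
  success: boolean;
  data?: unknown;
  error?: string | null;
}

interface RequestSpec {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const DESKTOP_API_URL = "http://localhost:9990/computer-use";

// Build the fetch() arguments without sending anything,
// so the payload can be inspected or logged first.
function buildRequest(action: DesktopAction): RequestSpec {
  return {
    url: DESKTOP_API_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(action),
    },
  };
}

// Send an action and throw when the API reports success: false.
async function sendAction(action: DesktopAction): Promise<DesktopResponse> {
  const { url, init } = buildRequest(action);
  const response = await fetch(url, init);
  const result = (await response.json()) as DesktopResponse;
  if (!result.success) {
    throw new Error(result.error ?? "Desktop API request failed");
  }
  return result;
}
```

With the desktop container running, `await sendAction({ action: "screenshot" })` would send the same request as the curl example. Keeping `buildRequest` separate from the network call makes the payload easy to check without a live desktop.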
## Authentication

* **Local Access**: No authentication required by default
* **Remote Access**: Configure authentication based on your security requirements
* **Production**: Implement API keys, OAuth, or other authentication methods

## Response Formats

### Agent API Response

```json
{
  "id": "task-123",
  "status": "IN_PROGRESS",
  "description": "Your task description",
  "messages": [...],
  "createdAt": "2024-01-01T00:00:00Z"
}
```

### Desktop API Response

```json
{
  "success": true,
  "data": { ... }, // Response data specific to the action
  "error": null    // Error message if success is false
}
```

## Error Handling

Both APIs use standard HTTP status codes:

| Status Code | Description                          |
| ----------- | ------------------------------------ |
| 200         | Success                              |
| 201         | Created (new resource)               |
| 400         | Bad Request - Invalid parameters     |
| 401         | Unauthorized - Authentication failed |
| 404         | Not Found - Resource doesn't exist   |
| 500         | Internal Server Error                |

## Rate Limiting

* **Agent API**: No hard limits, but consider task queue capacity
* **Desktop API**: No rate limiting, but rapid actions may impact desktop performance

## Best Practices

1. **Use Agent API for high-level automation** - Let the AI handle complexity
2. **Use Desktop API for precise control** - When you need exact actions
3. **Combine both APIs** - Create tasks via Agent API, monitor via Desktop API
4. **Handle errors gracefully** - Implement retry logic for transient failures
5. **Monitor resource usage** - Both APIs can be resource-intensive

## Next Steps

* Get your APIs running
* See the APIs in action

# Agent System

Source: https://docs.bytebot.ai/core-concepts/agent-system

The AI brain that powers your self-hosted desktop automation

## Overview

The Bytebot Agent System transforms a simple desktop container into an intelligent, autonomous computer user. By combining Claude AI with structured task management, it can understand natural language requests and execute complex workflows just like a human would.
Bytebot Agent Architecture ## How the AI Agent Works ### The Brain: Multi-Model AI Integration At the heart of Bytebot is a flexible AI integration that supports multiple models. Choose the AI that best fits your needs: **Anthropic Claude** (Default): * Best for complex reasoning and visual understanding * Excellent at following detailed instructions * Superior performance on desktop automation tasks **OpenAI GPT Models**: * Fast and reliable for general automation * Strong code understanding and generation * Cost-effective for routine tasks **Google Gemini**: * Efficient for high-volume tasks * Good balance of speed and capability * Excellent multilingual support The agent with any model: 1. **Understands Context**: Processes your natural language requests with full conversation history 2. **Plans Actions**: Breaks down complex tasks into executable computer actions 3. **Adapts in Real-time**: Adjusts its approach based on what it sees on screen 4. **Learns from Feedback**: Improves task execution through conversation ### Conversation Flow "Research competitors for my SaaS product and create a comparison table" The AI model understands the request and plans: open browser → search → visit sites → extract data → create document The agent controls the desktop: clicking, typing, taking screenshots, reading content Real-time status updates and asks for clarification when needed Completes the task and provides the output (files, screenshots, summaries) ## Task Management System ### Task Lifecycle Tasks move through a structured lifecycle: ```mermaid graph LR A[Created] --> B[Queued] B --> C[Running] C --> D[Needs Help] C --> E[Completed] C --> F[Failed] D --> C ``` ### Task Properties Each task contains: * **Description**: What needs to be done * **Priority**: Urgent, High, Medium, or Low * **Status**: Current state in the lifecycle * **Type**: Immediate or Scheduled * **History**: All messages and actions taken ### Smart Task Processing The agent processes tasks 
intelligently: 1. **Priority Queue**: Urgent tasks run first 2. **Error Recovery**: Automatically retries failed actions 3. **Human in the Loop**: Asks for help when stuck 4. **Context Preservation**: Maintains conversation history across sessions ## Real-world Capabilities ### What the Agent Can Do * Browse websites * Fill out forms * Extract data * Download files * Monitor changes * Create documents * Edit spreadsheets * Generate reports * Organize files * Convert formats * Access webmail through browser * Read and extract information * Fill contact forms * Navigate communication portals * Handle verification flows * Extract from PDFs * Process CSV files * Create visualizations * Generate summaries * Transform data ## Technical Architecture ### Core Components 1. **NestJS Agent Service** * Integrates with multiple AI provider APIs (Anthropic, OpenAI, Google) * Handles WebSocket connections * Coordinates with desktop API 2. **Message System** * Structured conversation format * Supports text and images * Maintains full context * Enables rich interactions 3. **Database Schema** ```sql Tasks: id, description, status, priority, timestamps Messages: id, task_id, role, content, timestamps Summaries: id, task_id, content, parent_id ``` 4. 
**Computer Action Bridge** * Translates AI decisions to desktop actions * Handles screenshots and feedback * Manages action timing * Provides error handling ### API Endpoints Key endpoints for programmatic control: ```typescript // Create a new task POST /tasks { "description": "Your task description", "priority": "HIGH", "type": "IMMEDIATE" } // Get task status GET /tasks/:id // Send a message POST /tasks/:id/messages { "content": "Additional instructions" } // Get task history GET /tasks/:id/messages ``` ## Chat UI Features The web interface provides: ### Real-time Interaction * Live chat with the AI agent * Instant status updates * Progress indicators * Error notifications ### Visual Feedback * Embedded desktop viewer * Screenshot history * Action replay * Task timeline ### Task Management * Create and prioritize tasks * View active and completed tasks * Export conversation logs * Manage task queues ## Security & Privacy ### Data Isolation * All processing happens in your infrastructure * No data sent to external services (except your chosen AI provider API) * Conversations stored locally * Complete audit trail ### Access Control * Configurable authentication * API key management * Network isolation options ## Extending the Agent ### Integration Points * External API calls via the Agent API * Custom AI prompts for specialized workflows * MCP protocol support for tool integration ### Best Practices 1. **Clear Instructions**: Be specific about desired outcomes 2. **Break Down Complex Tasks**: Use multiple smaller tasks for better results 3. **Provide Context**: Include relevant files or URLs 4. **Monitor Progress**: Watch the desktop view for real-time feedback 5. 
**Review Results**: Verify outputs meet requirements ## Troubleshooting * Check your AI provider API key is valid * Verify agent service is running * Review logs for errors * Ensure sufficient API credits/quota with your provider * Monitor system resources * Check network latency * Reduce screenshot frequency * Optimize AI prompts for your chosen model * Consider switching to a faster model (e.g., Gemini Flash) ## Next Steps Get your agent running Integrate with your apps See what's possible Optimize your workflows # Architecture Source: https://docs.bytebot.ai/core-concepts/architecture How Bytebot's desktop agent works under the hood ## Overview Bytebot is a self-hosted AI desktop agent built with a modular architecture. It combines a Linux desktop environment with AI to create an autonomous computer user that can perform tasks through natural language instructions. Bytebot Architecture Diagram ## System Architecture The system consists of four main components that work together: ### 1. Bytebot Desktop Container The foundation of the system - a virtual Linux desktop that provides: * **Ubuntu 22.04 LTS** base for stability and compatibility * **XFCE4 Desktop** for a lightweight, responsive UI * **bytebotd Daemon** - The automation service built on nutjs that executes computer actions * **Pre-installed Applications**: Firefox ESR, Thunderbird, text editors, and development tools * **noVNC** for remote desktop access **Key Features:** * Runs completely isolated from your host system * Consistent environment across different platforms * Can be customized with additional software * Accessible via REST API on port 9990 * MCP SSE endpoint available at `/mcp` * Uses shared types from `@bytebot/shared` package ### 2. 
AI Agent Service The brain of the system - orchestrates tasks using an LLM: * **NestJS Framework** for robust, scalable backend * **LLM Integration** supporting Anthropic Claude, OpenAI GPT, and Google Gemini models * **WebSocket Support** for real-time updates * **Computer Use API Client** to control the desktop * **Prisma ORM** for database operations * **Tool definitions** for computer actions (mouse, keyboard, screenshots) **Responsibilities:** * Interprets natural language requests * Plans sequences of computer actions * Manages task state and progress * Handles errors and retries * Provides real-time task updates via WebSocket ### 3. Web Task Interface The user interface for interacting with your AI agent: * **Next.js 15 Application** with TypeScript for type safety * **Embedded VNC Viewer** to watch the desktop in action * **Task Management** UI with status badges * **WebSocket Connections** for live updates * **Reusable components** for consistent UI * **API utilities** for streamlined server communication **Features:** * Task creation and management interface * Desktop tab for direct manual control * Real-time desktop viewer with takeover mode * Task history and status tracking * Responsive design for all devices ### 4. 
PostgreSQL Database Persistent storage for the agent system: * **Tasks Table**: Stores task details, status, and metadata * **Messages Table**: Stores AI conversation history * **Prisma ORM** for type-safe database access ## Data Flow ### Task Execution Flow User describes a task in natural language via the chat UI Agent service creates a task record and adds it to the processing queue The LLM analyzes the task and generates a plan of computer actions Agent sends computer actions to bytebotd via REST API or MCP bytebotd executes actions (mouse, keyboard, screenshots) on the desktop Agent receives results, updates task status, and continues or completes Results and status updates are sent back to the user in real-time ### Communication Protocols ```mermaid graph LR A[Tasks UI] -->|WebSocket| B[Agent Service] A -->|HTTP Proxy| C[Desktop VNC] B -->|REST/MCP| D[Desktop API] B -->|SQL| E[PostgreSQL] B -->|HTTPS| F[LLM Provider] D -->|IPC| G[bytebotd] ``` ## Security Architecture ### Isolation Layers 1. **Container Isolation** * Each desktop runs in its own Docker container * No access to host filesystem by default * Network isolation with explicit port mapping 2. **Process Isolation** * bytebotd runs as non-root user * Separate processes for different services * Resource limits enforced by Docker 3. **Network Security** * Services only accessible from localhost by default * Can be configured with authentication * HTTPS/WSS for external connections ### API Security * **Desktop API**: No authentication by default (localhost only). Supports REST and MCP. * **Agent API**: Can be secured with API keys * **Database**: Password protected, not exposed externally Default configuration is for development. 
For production: * Enable authentication on all APIs * Use HTTPS/WSS for all connections * Implement network policies * Rotate credentials regularly ## Deployment Patterns ### Single User (Development) ```yaml Services: All on one machine Scale: 1 instance each Use Case: Personal automation, development Resources: 4GB RAM, 2 CPU cores ``` ### Production Deployment ```yaml Services: All services on dedicated hardware Scale: Single instance (1 agent, 1 desktop) Use Case: Business automation Resources: 8GB+ RAM, 4+ CPU cores ``` ### Enterprise Deployment ```yaml Services: Kubernetes orchestration Scale: Single instance with high availability Use Case: Organization-wide automation Resources: Dedicated nodes ``` ## Extension Points ### Custom Tools Add specialized software to the desktop: ```dockerfile FROM bytebot/desktop:latest RUN apt-get update && apt-get install -y \ your-custom-tools ``` ### AI Integrations Extend agent capabilities: * Custom tools for the LLM * Additional AI models * Specialized prompts * Domain-specific knowledge ## Performance Considerations ### Resource Usage * **Desktop Container**: \~1GB RAM idle, 2GB+ active * **Agent Service**: \~256MB RAM * **UI Service**: \~128MB RAM * **Database**: \~256MB RAM ### Optimization Tips 1. Allocate sufficient resources to containers 2. Limit concurrent tasks to prevent overload 3. Monitor resource usage regularly 4. Use LiteLLM proxy for provider flexibility ## Next Steps Learn about the AI agent capabilities Explore the virtual desktop environment Integrate with your applications Deploy your own instance # Desktop Environment Source: https://docs.bytebot.ai/core-concepts/desktop-environment The virtual Linux desktop where Bytebot performs tasks ## Overview The Bytebot Desktop Environment (also called Bytebot Core) is a complete Linux desktop that runs in a Docker container. This is where Bytebot does its work - clicking buttons, typing text, browsing websites, and using applications just like you would. 
Bytebot Desktop Environment ## Why a Virtual Desktop? ### Complete Isolation * **No Risk to Host**: All actions happen inside the container * **Sandboxed Environment**: Desktop can't access your host system * **Easy Reset**: Destroy and recreate in seconds * **Clean Workspace**: Each restart provides a fresh environment ### Consistency Everywhere * **Platform Independent**: Same environment on Mac, Windows, or Linux * **Reproducible**: Identical setup every time * **Version Control**: Pin specific versions for stability * **No Dependencies**: Everything included in the container ### Built for Automation * **Predictable UI**: Consistent element positioning * **Clean Environment**: No popups or distractions * **Automation-Ready**: Optimized for programmatic control * **Fast Startup**: Desktop ready in seconds ## Technical Stack ### Base System * **Ubuntu 22.04 LTS**: Stable, well-supported Linux distribution * **XFCE4 Desktop**: Lightweight, responsive desktop environment * **X11 Display Server**: Standard Linux graphics system * **supervisord**: Service management ### Pre-installed Software * Firefox ESR (Extended Support Release) * Pre-configured for automation * Clean profile without distractions * Text editor * Office tools * PDF viewer * File manager * Thunderbird email client * Terminal emulator * 1Password password manager * Visual Studio Code (VSCode) * Git version control * Python 3 environment ### Core Services 1. **bytebotd Daemon** * Runs on port 9990 * Handles all automation requests * Built on nutjs framework * Provides REST API 2. **noVNC Web Client** * Browser-based desktop access * No client installation needed * WebSocket proxy included 3. 
**Supervisor**
   * Process management
   * Service monitoring
   * Automatic restarts
   * Log management

## Desktop Features

### Display Configuration

```bash
# Resolution
1920x1080 @ 24-bit color
```

### User Environment

* **Username**: `user`
* **Home Directory**: `/home/user`
* **Sudo Access**: Yes (passwordless)
* **Desktop Session**: Auto-login enabled

### File System

```
/home/user/
├── Desktop/      # Desktop shortcuts
├── Documents/    # User documents
├── Downloads/    # Browser downloads
├── .config/      # Application configs
└── .local/       # User data
```

## Accessing the Desktop

### Web Browser (Recommended)

Navigate to `http://localhost:9990/vnc` for instant access:

* No software installation required
* Works on any device with a browser
* Supports touch devices
* Clipboard sharing

### MCP Control

The core container also exposes an [MCP](https://github.com/rekog-labs/MCP-Nest) endpoint. Connect your MCP client to `http://localhost:9990/mcp` to invoke these tools over SSE.

```json
{
  "mcpServers": {
    "bytebot": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://127.0.0.1:9990/mcp",
        "--transport",
        "http-first"
      ]
    }
  }
}
```

### Direct API Control

Most efficient for automation:

```bash
# Take a screenshot
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "screenshot"}'

# Move mouse
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "move_mouse", "coordinates": {"x": 500, "y": 300}}'
```

## Customization

### Adding Software

Create a custom Dockerfile:

```dockerfile
FROM ghcr.io/bytebot-ai/bytebot-desktop:edge

# Install additional packages
RUN apt-get update && apt-get install -y \
    slack-desktop \
    zoom \
    your-custom-app

# Copy configuration files
COPY configs/ /home/user/.config/
```

## Performance Optimization

### Resource Allocation

```yaml
# Recommended settings
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '1'
      memory: 2G
```

## Security Hardening

Default configuration prioritizes ease of use. For production, apply these security measures:

### Essential Security Steps

1. **Change Default Passwords**

   ```bash
   # Set a password for the desktop user (the default username is `user`)
   passwd user
   ```

2. **Limit Network Access**

   ```yaml
   # Whitelist specific domains
   environment:
     - ALLOWED_DOMAINS=company.com,trusted-site.com

   # Or publish the port on a single host interface only.
   # Docker expects a host IP here, not a CIDR range; 127.0.0.1 keeps it local.
   ports:
     - "127.0.0.1:9990:9990"
   ```

## Troubleshooting

Check logs:

```bash
docker logs bytebot-desktop
```

Common issues:

* Insufficient memory
* Port conflicts
* Display server errors

Monitor resources:

```bash
docker stats bytebot-desktop
```

Solutions:

* Increase memory allocation
* Check disk space
* Update container image

## Best Practices

1. **Regular Updates**: Keep the base image updated for security patches
2. **Persistent Storage**: Mount volumes for important data
3. **Backup Configurations**: Save customizations outside the container
4. **Monitor Resources**: Track CPU/memory usage
5. **Clean Temporary Files**: Periodic cleanup for performance

## Next Steps

* Deploy your first agent
* Control the desktop programmatically
* Add AI capabilities
* Set up authentication

# Bytebot vs Traditional RPA

Source: https://docs.bytebot.ai/core-concepts/rpa-comparison

How Bytebot revolutionizes enterprise automation beyond traditional RPA tools

# The Next Generation of Enterprise Automation

Bytebot represents a fundamental shift in how businesses approach process automation. While traditional RPA tools like UiPath, Automation Anywhere, and Blue Prism require extensive scripting and brittle workflows, Bytebot leverages AI to understand and execute tasks like a human would.
## Traditional RPA Limitations Traditional RPA breaks when UI elements change even slightly Requires specialized developers and lengthy implementation cycles Constant updates needed as applications evolve Can't handle unexpected scenarios or variations ## How Bytebot is Different ### Visual Intelligence vs Element Mapping **Traditional RPA:** ```xml ``` **Bytebot:** ``` "Click the blue Submit button at the bottom of the form" ``` Bytebot understands interfaces visually, just like a human. It doesn't rely on fragile technical selectors that break with every update. ### Natural Language vs Complex Scripting **Traditional RPA Workflow:** * Design in Studio * Map every element * Script error handling * Test extensively * Deploy with fingers crossed * Fix when it breaks (often) **Bytebot Workflow:** * Describe what you need * Bytebot figures it out * Handles errors intelligently * Adapts to changes automatically ## Real-World Enterprise Examples ### Financial Services Automation ```csharp // 500+ lines of code to handle one banking portal var loginPage = new LoginPageObject(); loginPage.WaitForElement("username", 30); loginPage.EnterText("username", credentials.User); loginPage.EnterText("password", credentials.Pass); // Handle 2FA with complex conditional logic if (loginPage.Has2FAPrompt()) { var method = loginPage.Get2FAMethod(); switch(method) { case "SMS": // 50 more lines of code case "Email": // 50 more lines of code case "Authenticator": // 50 more lines of code } } // Download statements with exact selectors navigation.ClickElement("xpath://div[@id='acct-menu']"); navigation.ClickElement("xpath://a[contains(@href,'statements')]"); // ... continues for hundreds more lines ``` ``` Task: "Log into Chase banking portal, navigate to statements, download all statements from last month for account ending in 4521, and save them to Finance/BankStatements/Chase/" That's it. Bytebot handles everything - including 2FA - automatically. 
``` ### Multi-System Integration A FinTech company needed to automate operators who: 1. Log into multiple banking portals with 2FA 2. Download transaction files 3. Run proprietary scripts on those files 4. Upload results to internal systems **Traditional RPA Challenge:** * 6 months to implement * Breaks monthly with UI changes * Requires dedicated maintenance team * Can't handle new banks without development * Complex 2FA handling logic for each bank **Bytebot Solution:** * Deployed in 1 week * Adapts to UI changes automatically * 2FA handled automatically via password manager * New banks added with simple instructions * Zero manual intervention required ## Performance Comparison | Metric | Traditional RPA | Bytebot | | ------------------------- | -------------------- | ---------------------------------- | | **Implementation Time** | 3-6 months | 1-2 weeks | | **Developer Requirement** | RPA specialists | Any technical user | | **Maintenance Effort** | 40% of dev time | Near zero | | **Handling UI Changes** | Breaks immediately | Adapts automatically | | **Error Recovery** | Pre-scripted only | Intelligent adaptation | | **New Process Addition** | Weeks of development | Minutes to describe | | **Cost** | \$100k+ annually | Self-hosted on your infrastructure | ## Common RPA Migration Patterns ### 1. Invoice Processing **Before (UiPath):** * 2000+ lines of workflow XML * Breaks when vendor portal updates * Requires exact folder structures * Failed on unexpected popups **After (Bytebot):** * One paragraph description * Handles portal changes * Asks for help when needed * Processes variations intelligently ### 2. Compliance Reporting **Before (Automation Anywhere):** * Complex bot orchestration * Separate bots per system * Rigid scheduling * No flexibility **After (Bytebot):** * Single unified workflow * Natural language instructions * Dynamic adaptation * Human collaboration when needed ### 3. 
Data Migration **Before (Blue Prism):** * Massive process definitions * Exact field mapping required * Breaks on data variations * Limited error handling **After (Bytebot):** * Describe the mapping rules * Handles variations intelligently * Asks for clarification * Visual validation included ## Integration with Existing RPA Bytebot can work alongside existing RPA investments: ```mermaid graph LR A[Legacy RPA] -->|Handles stable processes| B[Structured Systems] C[Bytebot] -->|Handles complex/changing processes| D[Dynamic Systems] C -->|Takes over when RPA fails| A E[Human Operator] -->|Guides via takeover mode| C ``` ## Enterprise Architecture ### Deployment Options Deploy in your data center for maximum security and compliance Use your AWS/Azure/GCP infrastructure with full control Process sensitive data locally, leverage cloud for scaling Completely isolated deployment for classified environments ### Security & Compliance * **Data Sovereignty**: All processing on your infrastructure * **Audit Trails**: Complete logs of every action * **Access Control**: Integrate with your IAM/SSO * **Compliance**: SOC2, HIPAA, PCI-DSS compatible deployments ## Getting Started with Migration List your current RPA workflows, especially: * Those that break frequently * Require regular maintenance * Handle multiple systems * Need human decision points Pick one problematic workflow: * Document the business process * Deploy Bytebot * Describe the task naturally * Compare results As confidence grows: * Migrate more complex processes * Retire brittle RPA bots * Reduce maintenance overhead * Scale across departments ## Next Steps Deploy Bytebot in your environment View source code and contribute Join our Discord for support Get help with enterprise deployments **Ready to move beyond traditional RPA?** Bytebot brings human-like intelligence to process automation, eliminating the brittleness and complexity of traditional tools while delivering enterprise-grade reliability and security. 
# Helm Deployment Source: https://docs.bytebot.ai/deployment/helm Deploy Bytebot on Kubernetes using Helm charts # Deploy Bytebot on Kubernetes with Helm Helm provides a simple way to deploy Bytebot on Kubernetes clusters. ## Prerequisites * Kubernetes cluster (1.19+) * Helm 3.x installed * kubectl configured * 8GB+ available memory in cluster ## Quick Start ```bash git clone https://github.com/bytebot-ai/bytebot.git cd bytebot ``` Create a `values.yaml` file with at least one API key: ```yaml bytebot-agent: apiKeys: anthropic: value: "sk-ant-your-key-here" # Optional: Add more providers # openai: # value: "sk-your-key-here" # gemini: # value: "your-key-here" ``` ```bash helm install bytebot ./helm \ --namespace bytebot \ --create-namespace \ -f values.yaml ``` ```bash # Port-forward for local access kubectl port-forward -n bytebot svc/bytebot-ui 9992:9992 # Access at http://localhost:9992 ``` ## Basic Configuration ### API Keys Configure at least one AI provider: ```yaml bytebot-agent: apiKeys: anthropic: value: "sk-ant-your-key-here" openai: value: "sk-your-key-here" gemini: value: "your-key-here" ``` ### Resource Limits (Optional) Adjust resources based on your needs: ```yaml # Desktop container (where automation runs) desktop: resources: requests: memory: "2Gi" cpu: "1" limits: memory: "4Gi" cpu: "2" # Agent (AI orchestration) agent: resources: requests: memory: "1Gi" cpu: "500m" ``` ### External Access (Optional) Enable ingress for domain-based access: ```yaml ui: ingress: enabled: true hostname: bytebot.your-domain.com tls: true ``` ## Accessing Bytebot ### Local Access (Recommended) ```bash kubectl port-forward -n bytebot svc/bytebot-ui 9992:9992 ``` Access at: [http://localhost:9992](http://localhost:9992) ### External Access If you configured ingress: * Access at: [https://bytebot.your-domain.com](https://bytebot.your-domain.com) ## Verifying Deployment Check that all pods are running: ```bash kubectl get pods -n bytebot ``` Expected output: ``` NAME READY 
STATUS RESTARTS AGE bytebot-agent-xxxxx 1/1 Running 0 2m bytebot-desktop-xxxxx 1/1 Running 0 2m bytebot-postgresql-0 1/1 Running 0 2m bytebot-ui-xxxxx 1/1 Running 0 2m ``` ## Troubleshooting ### Pods Not Starting Check pod status: ```bash kubectl describe pod -n bytebot ``` Common issues: * Insufficient memory/CPU: Check node resources with `kubectl top nodes` * Missing API keys: Verify your values.yaml configuration ### Connection Issues Test service connectivity: ```bash kubectl logs -n bytebot deployment/bytebot-agent ``` ### View Logs ```bash # All logs kubectl logs -n bytebot -l app=bytebot --tail=100 # Specific component kubectl logs -n bytebot deployment/bytebot-agent ``` ## Upgrading ```bash # Update your values.yaml as needed, then: helm upgrade bytebot ./helm -n bytebot -f values.yaml ``` ## Uninstalling ```bash # Remove Bytebot helm uninstall bytebot -n bytebot # Clean up namespace kubectl delete namespace bytebot ``` ## Advanced Configuration If using Kubernetes secret management (Vault, Sealed Secrets, etc.): ```yaml bytebot-agent: apiKeys: anthropic: useExisting: true secretName: "my-api-keys" secretKey: "anthropic-key" ``` Create the secret manually: ```bash kubectl create secret generic my-api-keys \ --namespace bytebot \ --from-literal=anthropic-key="sk-ant-your-key" ``` For centralized LLM management, use the included LiteLLM proxy: ```bash helm install bytebot ./helm \ -f values-proxy.yaml \ --namespace bytebot \ --create-namespace \ --set bytebot-llm-proxy.env.ANTHROPIC_API_KEY="your-key" ``` This provides: * Centralized API key management * Request routing and load balancing * Rate limiting and retry logic Configure persistent storage: ```yaml desktop: persistence: enabled: true size: "20Gi" storageClass: "fast-ssd" postgresql: persistence: size: "20Gi" storageClass: "fast-ssd" ``` ```yaml # Network policies networkPolicy: enabled: true # Pod security podSecurityContext: runAsNonRoot: true runAsUser: 1000 fsGroup: 1000 # Enable authentication 
auth: enabled: true type: "basic" username: "admin" password: "changeme" # Use secrets in production! ``` ## Next Steps Integrate Bytebot with your applications Use any LLM provider with Bytebot **Need help?** Join our [Discord community](https://discord.com/invite/d9ewZkWPTP) or check our [GitHub discussions](https://github.com/bytebot-ai/bytebot/discussions). # LiteLLM Integration Source: https://docs.bytebot.ai/deployment/litellm Use any LLM provider with Bytebot through LiteLLM proxy # Connect Any LLM to Bytebot with LiteLLM LiteLLM acts as a unified proxy that lets you use 100+ LLM providers with Bytebot - including Azure OpenAI, AWS Bedrock, Anthropic, Hugging Face, Ollama, and more. This guide shows you how to set up LiteLLM with Bytebot. ## Why Use LiteLLM? Use Azure, AWS, GCP, Anthropic, OpenAI, Cohere, and local models Monitor spending across all providers in one place Distribute requests across multiple models and providers Automatic failover when primary models are unavailable ## Quick Start with Bytebot's Built-in LiteLLM Proxy Bytebot includes a pre-configured LiteLLM proxy service that makes it easy to use any LLM provider. 
Here's how to set it up:

The easiest way is to use the proxy-enabled Docker Compose file:

```bash
# Clone Bytebot
git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot

# Set up your API keys in docker/.env
cat > docker/.env << EOF
# Add any combination of these keys
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-key-here
GEMINI_API_KEY=your-key-here
EOF

# Start Bytebot with LiteLLM proxy
docker-compose -f docker/docker-compose.proxy.yml up -d
```

This automatically:

* Starts the `bytebot-llm-proxy` service on port 4000
* Configures the agent to use the proxy via `BYTEBOT_LLM_PROXY_URL`
* Makes all configured models available through the proxy

To add custom models or providers, edit the LiteLLM config:

```yaml
# packages/bytebot-llm-proxy/litellm-config.yaml
model_list:
  # Add Azure OpenAI
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-deployment
      api_base: https://your-resource.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

  # Add AWS Bedrock
  - model_name: claude-bedrock
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet
      aws_region_name: us-east-1

  # Add local models via Ollama
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3:70b
      api_base: http://host.docker.internal:11434
```

Then rebuild:

```bash
docker-compose -f docker/docker-compose.proxy.yml up -d --build
```

The Bytebot agent automatically queries the proxy for available models:

```bash
# Check available models through Bytebot API
curl http://localhost:9991/tasks/models

# Or directly from LiteLLM proxy
curl http://localhost:4000/model/info
```

The UI will show all available models in the model selector.
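The model check above can also be done from code. The following is a minimal sketch, not Bytebot's own implementation: it assumes the proxy's `/model/info` endpoint returns a JSON object with a `data` array whose entries carry a `model_name` field, which should be verified against your LiteLLM version. The names `listModelNames` and `fetchModelNames` are hypothetical, and a global `fetch` (Node.js 18 or later) is assumed.

```typescript
// Assumed shape of the proxy's /model/info response; verify against your LiteLLM version.
interface ModelInfoEntry {
  model_name: string;
  litellm_params?: { model?: string };
}

interface ModelInfoResponse {
  data: ModelInfoEntry[];
}

const PROXY_URL = "http://localhost:4000";

// Collect the unique model names in first-seen order.
// The same name can appear more than once when several deployments share it.
function listModelNames(payload: ModelInfoResponse): string[] {
  const seen = new Set<string>();
  for (const entry of payload.data) {
    seen.add(entry.model_name);
  }
  return Array.from(seen);
}

// Fetch /model/info from a running proxy and return the model names.
// If the proxy is configured with a master key, an Authorization header
// would also be needed (not shown here).
async function fetchModelNames(baseUrl: string = PROXY_URL): Promise<string[]> {
  const response = await fetch(`${baseUrl}/model/info`);
  if (!response.ok) {
    throw new Error(`Proxy returned HTTP ${response.status}`);
  }
  return listModelNames((await response.json()) as ModelInfoResponse);
}
```

Parsing is kept in a pure function so it can be checked against a saved response without a running proxy.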
## How It Works

### Architecture

```mermaid
graph LR
    A[Bytebot UI] -->|Select Model| B[Bytebot Agent]
    B -->|BYTEBOT_LLM_PROXY_URL| C[LiteLLM Proxy :4000]
    C -->|Route Request| D[Anthropic API]
    C -->|Route Request| E[OpenAI API]
    C -->|Route Request| F[Google API]
    C -->|Route Request| G[Any Provider]
```

### Key Components

1. **bytebot-llm-proxy Service**: A LiteLLM instance running in Docker that:
   * Runs on port 4000 within the Bytebot network
   * Uses the config from `packages/bytebot-llm-proxy/litellm-config.yaml`
   * Inherits API keys from environment variables

2. **Agent Integration**: The Bytebot agent:
   * Checks for `BYTEBOT_LLM_PROXY_URL` environment variable
   * If set, queries the proxy at `/model/info` for available models
   * Routes all LLM requests through the proxy

3. **Pre-configured Models**: Out of the box support for:
   * Anthropic: Claude Opus 4, Claude Sonnet 4
   * OpenAI: GPT-4.1, GPT-4o
   * Google: Gemini 2.5 Pro, Gemini 2.5 Flash

## Provider Configurations

### Azure OpenAI

```yaml
model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-deployment-name
      api_base: https://your-resource.openai.azure.com/
      api_key: your-azure-key
      api_version: "2024-02-15-preview"

  - model_name: azure-gpt-4o-vision
    litellm_params:
      model: azure/gpt-4o-deployment-name
      api_base: https://your-resource.openai.azure.com/
      api_key: your-azure-key
      api_version: "2024-02-15-preview"
      supports_vision: true
```

### AWS Bedrock

```yaml
model_list:
  - model_name: claude-bedrock
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_region_name: us-east-1
      # Uses AWS credentials from environment

  - model_name: llama-bedrock
    litellm_params:
      model: bedrock/meta.llama3-70b-instruct-v1:0
      aws_region_name: us-east-1
```

### Google Vertex AI

```yaml
model_list:
  - model_name: gemini-vertex
    litellm_params:
      model: vertex_ai/gemini-1.5-pro
      vertex_project: your-gcp-project
      vertex_location: us-central1
      # Uses GCP credentials from environment
```

### Local Models (Ollama)
```yaml
model_list:
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3:70b
      api_base: http://ollama:11434

  - model_name: local-mixtral
    litellm_params:
      model: ollama/mixtral:8x7b
      api_base: http://ollama:11434
```

### Hugging Face

```yaml
model_list:
  - model_name: hf-llama
    litellm_params:
      model: huggingface/meta-llama/Llama-3-70b-chat-hf
      api_key: hf_your_token
```

## Advanced Features

### Load Balancing

Distribute requests across multiple providers:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: sk-openai-key

  - model_name: gpt-4o # Same name for load balancing
    litellm_params:
      model: azure/gpt-4o
      api_base: https://azure.openai.azure.com/
      api_key: azure-key

router_settings:
  routing_strategy: "least-busy" # or "round-robin", "latency-based"
```

### Fallback Models

Configure automatic failover:

```yaml
model_list:
  - model_name: primary-model
    litellm_params:
      model: claude-3-5-sonnet-20241022
      api_key: sk-ant-key

  - model_name: fallback-model
    litellm_params:
      model: gpt-4o
      api_key: sk-openai-key

router_settings:
  model_group_alias:
    "smart-model": ["primary-model", "fallback-model"]

# Use "smart-model" in Bytebot config
```

### Cost Controls

Set spending limits and track usage:

```yaml
general_settings:
  master_key: sk-litellm-master
  database_url: "postgresql://user:pass@localhost:5432/litellm"

  # Budget limits
  max_budget: 100 # $100 monthly limit
  budget_duration: "30d"

  # Per-model limits
  model_max_budget:
    gpt-4o: 50
    claude-3-5-sonnet: 50

litellm_settings:
  callbacks: ["langfuse"] # For detailed tracking
```

### Rate Limiting

Prevent API overuse:

```yaml
model_list:
  - model_name: rate-limited-gpt
    litellm_params:
      model: gpt-4o
      api_key: sk-key
      rpm: 100 # Requests per minute
      tpm: 100000 # Tokens per minute
```

## Alternative Setup: External LiteLLM Proxy

If you prefer to run LiteLLM separately or have an existing LiteLLM deployment:

### Option 1: Modify docker-compose.yml

```yaml
# docker-compose.yml (without built-in proxy)
services:
  bytebot-agent:
    environment:
      # Point to your external LiteLLM instance
      - BYTEBOT_LLM_PROXY_URL=http://your-litellm-server:4000
      # ... rest of config
```

### Option 2: Use Environment Variable

```bash
# Set the proxy URL before starting
export BYTEBOT_LLM_PROXY_URL=http://your-litellm-server:4000

# Start normally
docker-compose -f docker/docker-compose.yml up -d
```

### Option 3: Run Standalone LiteLLM

```bash
# Run your own LiteLLM instance
docker run -d \
  --name litellm-external \
  -p 4000:4000 \
  -v $(pwd)/custom-config.yaml:/app/config.yaml \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  ghcr.io/berriai/litellm:main \
  --config /app/config.yaml

# Then start Bytebot with:
export BYTEBOT_LLM_PROXY_URL=http://localhost:4000
docker-compose up -d
```

## Kubernetes Setup

Deploy with Helm:

```yaml
# litellm-values.yaml
replicaCount: 2

image:
  repository: ghcr.io/berriai/litellm
  tag: main

service:
  type: ClusterIP
  port: 4000

config:
  model_list:
    - model_name: claude-3-5-sonnet
      litellm_params:
        model: claude-3-5-sonnet-20241022
        api_key: ${ANTHROPIC_API_KEY}

  general_settings:
    master_key: ${LITELLM_MASTER_KEY}

# Then in Bytebot values.yaml:
agent:
  openai:
    enabled: true
    apiKey: "${LITELLM_MASTER_KEY}"
    baseUrl: "http://litellm:4000/v1"
    model: "claude-3-5-sonnet"
```

## Monitoring & Debugging

### LiteLLM Dashboard

Access metrics and logs:

```bash
# Port forward to dashboard
kubectl port-forward svc/litellm 4000:4000

# Access at http://localhost:4000/ui
# Login with your master_key
```

### Debug Requests

Enable detailed logging:

```yaml
litellm_settings:
  debug: true
  detailed_debug: true

general_settings:
  master_key: sk-key
  store_model_in_db: true # Store request history
```

### Common Issues

Check model name matches exactly:

```bash
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-key"
```

Verify master key in both LiteLLM and Bytebot:

```bash
# Test LiteLLM
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "messages": [{"role": "user", "content": "test"}]}'
```

Check latency per provider:

```yaml
router_settings:
  routing_strategy: "latency-based"
  enable_pre_call_checks: true
```

## Best Practices

### Model Selection for Bytebot

Choose models with strong vision capabilities for best results:

* Claude 3.5 Sonnet (Best overall)
* GPT-4o (Good vision + reasoning)
* Gemini 1.5 Pro (Large context)
* Claude 3.5 Haiku (Fast + cheap)
* GPT-4o mini (Good balance)
* Gemini 1.5 Flash (Very fast)
* LLaVA (Vision support)
* Qwen-VL (Vision support)
* CogVLM (Vision support)

### Performance Optimization

```yaml
# Optimize for Bytebot workloads
router_settings:
  routing_strategy: "latency-based"
  cooldown_time: 60 # Seconds before retrying failed provider
  num_retries: 2
  request_timeout: 600 # 10 minutes for complex tasks

# Cache for repeated requests
cache: true
cache_params:
  type: "redis"
  host: "redis"
  port: 6379
  ttl: 3600 # 1 hour
```

### Security

```yaml
general_settings:
  master_key: ${LITELLM_MASTER_KEY}

  # IP allowlist
  allowed_ips: ["10.0.0.0/8", "172.16.0.0/12"]

  # Audit logging
  store_model_in_db: true

  # Encryption
  encrypt_keys: true

  # Headers to forward
  forward_headers: ["X-Request-ID", "X-User-ID"]
```

## Next Steps

* Full list of 100+ providers
* Official LiteLLM proxy server documentation
* Complete LiteLLM documentation

**Pro tip:** Start with a single provider, then add more as needed. LiteLLM makes it easy to switch or combine models without changing Bytebot configuration.

# Deploying Bytebot on Railway

Source: https://docs.bytebot.ai/deployment/railway

Comprehensive guide to deploying the full Bytebot stack on Railway using the official 1-click template

> **TL;DR –** Click the button below, add your AI API key (Anthropic, OpenAI, or Google), and your personal Bytebot instance will be live in \~2 minutes.

[![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/bytebot?referralCode=L9lKXQ)

***

## Why Railway?

Railway provides a zero-ops PaaS experience with private networking and per-service logs that perfectly fits Bytebot’s multi-container architecture. The official template wires every service together using the latest container images pushed to the `edge` branch.

***

## What Gets Deployed

| Service             | Container Image (edge)                    | Port | Exposed? | Purpose                              |
| ------------------- | ----------------------------------------- | ---- | -------- | ------------------------------------ |
| **bytebot-ui**      | `ghcr.io/bytebot-ai/bytebot-ui:edge`      | 9992 | **Yes**  | Next.js web UI rendered to the world |
| **bytebot-agent**   | `ghcr.io/bytebot-ai/bytebot-agent:edge`   | 9991 | No       | Task orchestration & LLM calls       |
| **bytebot-desktop** | `ghcr.io/bytebot-ai/bytebot-desktop:edge` | 9990 | No       | Containerised Ubuntu + XFCE desktop  |
| **postgres**        | `postgres:14-alpine`                      | 5432 | No       | Persistence layer                    |

All internal traffic flows through Railway’s [private networking](https://docs.railway.com/guides/private-networking). Only `bytebot-ui` is assigned a public domain.

***

## Step-by-Step Walk-through

Click the **Deploy on Railway** button above or visit [https://railway.com/deploy/bytebot?referralCode=L9lKXQ](https://railway.com/deploy/bytebot?referralCode=L9lKXQ).

For the bytebot-agent resource, add your AI API key (choose at least one):

* **Anthropic**: Paste into `ANTHROPIC_API_KEY` for Claude models
* **OpenAI**: Paste into `OPENAI_API_KEY` for GPT models
* **Google**: Paste into `GEMINI_API_KEY` for Gemini models

Keep other defaults as is.

Press **Deploy**. Railway will pull the pre-built images, create the Postgres database and link all services on a private network.

When the build logs show *"bytebot-ui: ready"*, click the generated URL (e.g. `https://bytebot-ui-prod.up.railway.app`). You should see the task interface.
Create a task and watch the desktop stream!\
*Tip: You can tail logs for each service from the Railway dashboard.*

The first deploy downloads several container layers – expect \~2 minutes. Subsequent redeploys are much faster.

***

## Private Networking & Security

• **Private networking** ensures that the agent, desktop and database can communicate securely without exposing their ports to the internet.\
• **Public exposure** is limited to the UI which serves static assets and proxies WebSocket traffic.\
• **Add authentication** by placing the UI behind Railway’s built-in password protection or an external provider (e.g. Cloudflare Access, Auth0, OAuth proxy).\
• You can also point a custom domain to the UI from the Railway dashboard and enable Cloudflare for WAF/CDN protection.

***

## Customisation & Scaling

1. **Change images** – Fork the repo, push your own images and edit the template’s `Dockerfile` references.
2. **Increase resources** – Each service has an independent CPU/RAM slider in Railway. Bump up the desktop or agent if you plan heavy automations.

***

## Troubleshooting

| Symptom                     | Likely Cause                                       | Fix                                                                         |
| --------------------------- | -------------------------------------------------- | --------------------------------------------------------------------------- |
| Web UI shows “connecting…”  | Desktop not ready or private networking mis-config | Wait for `bytebot-desktop` container to finish starting, or restart service |
| Agent errors `401` or `403` | Missing/invalid API key                            | Re-enter your AI provider's API key in Railway variables                    |
| Slow desktop video          | Free Railway plan throttling                       | Upgrade plan or reduce screen resolution in desktop settings                |

***

## Next Steps

• Explore the [REST APIs](/api-reference/introduction) to script tasks programmatically.\
• Join our [Discord](https://discord.com/invite/d9ewZkWPTP) community for support and showcase your automations!

# Password Management & 2FA

Source: https://docs.bytebot.ai/guides/password-management

How Bytebot handles authentication automatically using password managers

# Automated Authentication with Bytebot

Bytebot can handle authentication automatically - including passwords, 2FA, and even complex multi-step authentication flows - when you set up a password manager extension.

**Important**: Password manager extensions are not enabled by default. You need to install them manually using the desktop view.

## How It Works

Bytebot comes with 1Password built-in and supports any browser-based password manager extension. It can:

* Automatically fill passwords from the password manager
* Handle 2FA codes (TOTP/authenticator apps)
* Manage multiple accounts across different systems
* Work with SSO and federated authentication
* Store and use API keys and tokens

## Setting Up Password Management

### Option 1: 1Password (Recommended)

1. Go to the Desktop tab in Bytebot UI
2. Open Firefox
3. Install the 1Password extension from the Firefox Add-ons store
4. Sign in to your 1Password account (or create a dedicated one for Bytebot)

In your 1Password admin panel:

1. Create a vault called "Bytebot Automation"
2. Add the credentials Bytebot needs
3. Share the vault with Bytebot's account
4. Set appropriate permissions (read-only recommended)

The 1Password extension will automatically:

* Detect login forms
* Fill credentials
* Handle 2FA codes
* Submit forms

### Option 2: Other Password Managers

You can use any browser-based password manager by installing it through the Desktop view:

**Bitwarden**

1. Open Desktop tab
2. Launch Firefox
3. Install Bitwarden extension from Firefox Add-ons
4. Log in to your Bitwarden account
5. Configure auto-fill settings in Bitwarden preferences

**LastPass**

1. Open Desktop tab
2. Launch Firefox
3. Install LastPass extension from Firefox Add-ons
4. Log in with your enterprise account
5. Accept any shared folders for automation credentials

**KeePassXC**

1. Open Desktop tab
2. Install KeePassXC application if needed
3. Install KeePassXC browser extension in Firefox
4. Configure browser integration
5. Load your KeePass database

## Handling Different Authentication Types

### Standard Username/Password

```yaml
# Task description
Task: "Log into our CRM system and export the customer list"

# Bytebot automatically:
1. Navigates to login page
2. Password manager detects form
3. Auto-fills credentials
4. Submits login
5. Proceeds with task
```

### Time-based 2FA (TOTP)

```yaml
# Task description
Task: "Access the banking portal and download statements"

# Bytebot handles:
1. Enters username/password from password manager
2. When 2FA prompt appears
3. Password manager provides TOTP code
4. Enters code automatically
5. Completes authentication
```

### Complex Multi-Step Auth

```yaml
# Task description
Task: "Log into the government portal (uses email verification)"

# Bytebot can:
1. Fill initial credentials
2. Handle "send code to email" flows
3. Access webmail account (also in password manager)
4. Retrieve verification code from webmail
5. Complete authentication
```

## Enterprise Setup Guide

### Centralized Credential Management

Set up dedicated service accounts for Bytebot:

```
- bytebot-finance@company.com (banking portals)
- bytebot-hr@company.com (HR systems)
- bytebot-ops@company.com (operational tools)
```

Structure your password manager:

```
Bytebot Vaults/
├── Financial Systems/
│   ├── Banking Portal A
│   ├── Banking Portal B
│   └── Payment Processor
├── Internal Tools/
│   ├── ERP System
│   ├── CRM Platform
│   └── HR Portal
└── External Services/
    ├── Vendor Portal 1
    ├── Government Site
    └── Partner System
```

Configure automatic password rotation:

```javascript
// Example automation for password rotation
{
  "schedule": "monthly",
  "task": "For each credential in 'Rotation Required' vault, update password in the system and save new password"
}
```

### Security Best Practices

* Only share credentials Bytebot needs for specific tasks
* Enable password manager audit logs to track access
* Separate vaults by sensitivity level and department
* Audit Bytebot's credential access monthly

## Common Authentication Scenarios

### Banking and Financial Systems

```yaml
Scenario: Daily bank reconciliation across 5 banks

Setup:
  - Each bank credential in password manager
  - 2FA seeds stored for TOTP generation
  - Bytebot's IP whitelisted at banks

Task: "Log into each bank account, download yesterday's transactions, and consolidate into daily report"

Result: Fully automated, no human intervention needed
```

### Government and Compliance Portals

```yaml
Scenario: Weekly regulatory filings

Setup:
  - Service account with 2FA enabled
  - Password manager has TOTP seed
  - Security questions stored as notes

Task: "Log into state tax portal, file weekly sales tax report using data from tax_data.csv"

Handles: Password, 2FA, security questions, CAPTCHAs
```

### Multi-Tenant SaaS Platforms

```yaml
Scenario: Managing multiple client accounts

Setup:
  - Credentials for each tenant/client
  - Organized in password manager by client
  - Naming convention: client-platform-role

Task: "For each client in client_list.txt, log into their Shopify account and export this month's orders"

Scales: Handles 100+ accounts seamlessly
```

## Advanced Authentication Features

### SSO and SAML Integration

```yaml
# Bytebot can handle SSO flows
Task: "Log into Salesforce using Okta SSO"

Process:
  1. Navigate to Salesforce
  2. Click "Log in with SSO"
  3. Redirect to Okta
  4. Password manager fills Okta credentials
  5. Handle any 2FA on Okta
  6. Redirect back to Salesforce
  7. Continue with task
```

### API Key Management

```yaml
# Store API keys in password manager
Password Entry: "OpenAI API Key"
  - Username: "api"
  - Password: "sk-proj-..."
  - Notes: "Rate limit: 10000/day"

# Use in tasks
Task: "Configure the application to use our OpenAI API key from the password manager"
```

### Certificate-Based Auth

```yaml
# For systems requiring certificates
Setup:
  1. Store certificate password in manager
  2. Mount certificate file to Bytebot
  3. Configure browser to use certificate

Task: "Access the enterprise portal that requires client certificate authentication"
```

## Troubleshooting Authentication

**Solutions:**

* Ensure extension is installed and logged in
* Check site is saved in password manager
* Verify auto-fill settings are enabled
* Try refreshing the page

**Common causes:**

* Time sync issues (check system clock)
* Wrong TOTP seed saved
* Site using non-standard 2FA

**Fix:**

```bash
# Sync system time
docker exec bytebot-desktop ntpdate -s time.nist.gov
```

**Solutions:**

* Enable "remember me" if available
* Increase session timeout in target system
* Break long tasks into smaller chunks
* Use API access where possible

## Integration Examples

### Finance Automation Script

```python
# Example: Automated invoice collection
tasks = [
    {
        "description": "Log into vendor portal A and download all pending invoices",
        "credentials": "vault://Financial Systems/Vendor Portal A"
    },
    {
        "description": "Log into vendor portal B and download all pending invoices",
        "credentials": "vault://Financial Systems/Vendor Portal B"
    },
    {
        "description": "Process all downloaded invoices through our AP system",
        "credentials": "vault://Internal Tools/AP System"
    }
]

# Bytebot handles all authentication automatically
```

### Compliance Automation

```yaml
Daily Compliance Check:
  Morning:
    - Log into regulatory portal (2FA enabled)
    - Download new compliance updates
    - Check our status

  If Non-Compliant:
    - Log into internal system
    - Create compliance ticket
    - Notify compliance team

  All credentials managed automatically
```

## Best Practices Summary

✅ **DO:**

* Use dedicated service accounts for Bytebot
* Organize credentials in logical vaults
* Enable 2FA on all accounts (Bytebot handles it!)
* Rotate passwords regularly
* Monitor access logs

❌ **DON'T:**

* Share personal credentials with Bytebot
* Store passwords in task descriptions
* Disable 2FA for convenience
* Use the same password across systems
* Ignore authentication errors

## Next Steps

* See auth in action
* Programmatic credential management

**Game Changer**: With proper password manager setup, Bytebot can handle even the most complex authentication flows automatically. No more manual intervention for 2FA, no more sharing passwords insecurely, and no more authentication bottlenecks in your automation workflows!

# Takeover Mode

Source: https://docs.bytebot.ai/guides/takeover-mode

Take control of the desktop when you need to guide or assist Bytebot

# Takeover Mode: Human-AI Collaboration

Takeover mode lets you take control of the desktop to help Bytebot when needed. There are two ways to use it:

## 1. During Task Execution

In the task detail view, you can hit the takeover button to:

* Interrupt the agent if it's going down the wrong path
* Guide it towards the correct solution
* Resolve issues when it's stumbling on something

## 2. Automatic Activation

Takeover mode is automatically enabled when a task status is set to "needs help" - this happens when the agent realizes it can't accomplish something on its own.

## How Actions Are Recorded

All your actions during takeover (clicks, drags, scrolls, typing, key presses) are automatically logged in the same unified action space that the agent uses. This means Bytebot understands and learns from everything you do.

## Desktop Tab for Setup

Outside of tasks, there's a dedicated **Desktop** tab on the main page that provides:

* Free-ranging access to the desktop
* Nothing is recorded in this mode
* Perfect for:
  * Installing programs
  * Logging into apps or websites
  * Setting up the desktop environment
  * General desktop maintenance

## Activating Takeover Mode

### Method 1: Manual Takeover During Tasks

1. While Bytebot is working on a task, click on the task to open the detail view.
2. Hit the takeover button to interrupt the agent and take control.
3. Perform the necessary actions to get past the obstacle or show the correct path.
4. Click to release control and let Bytebot continue from where you left off.

### Method 2: Automatic When Help Needed

When Bytebot sets a task status to "needs help":

* Takeover mode is automatically enabled
* You'll see a notification that Bytebot needs assistance
* Take control to help resolve the issue
* Bytebot will continue once you release control

## Common Use Cases

### 1. Complex UI Navigation

**Scenario**: Working with proprietary or complex software

**Steps**:

1. Let Bytebot open the application
2. Take control to navigate complex interfaces
3. Use the chat to explain what you're doing
4. Return control for Bytebot to continue

**Example**: "Open our internal CRM, I'll show you how to navigate to the reports section"

### 2. Error Recovery

**Scenario**: Bytebot encounters an error or gets stuck

**Steps**:

1. Notice Bytebot is struggling
2. Take control to resolve the issue
3. Guide it past the problem
4. Explain what went wrong in chat
5. Return control to let Bytebot continue

**Example**: "Let me handle this unexpected popup that's blocking the workflow"

### 3. Teaching by Demonstration

**Scenario**: Complex multi-step processes

**Steps**:

1. Take control when you need to demonstrate
2. Perform the task normally (no need to move slowly)
3. Use chat to explain what you're clicking and why
4. Return control
5. Ask Bytebot to repeat the process

**Example**: "Watch me navigate through our vendor portal to find the invoice section"

**Important**: Screenshots are taken for every action during takeover mode. Do not enter any data that you don't want captured in screenshots.

## Best Practices

### Do's ✅

* **Use Chat While Taking Over**: Type messages explaining what you're doing and why
* **Explain Your Clicks**: Share context about UI elements and their purpose
* **Return Control Before Leaving**: Always release control before exiting the task detail view
* **Test Understanding**: Ask Bytebot to summarize what it learned

### Don'ts ❌

* **Enter Data You Don't Want Captured**: Screenshots are taken of all actions
* **Skip Chat Explanations**: Context helps Bytebot learn patterns
* **Leave Task View While in Control**: This will leave the task stuck in takeover mode
* **Assume Knowledge**: Explain application-specific workflows

**No Need to Move Slowly**: Bytebot captures the state before and after each action, so you can work at normal speed.

## Summary

Takeover mode provides flexibility when you need to guide Bytebot or handle situations it can't manage alone. Whether you're navigating complex interfaces, recovering from errors, or teaching new workflows, takeover mode ensures you're always in control when needed.
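
Because takeover is enabled automatically when a task enters the "needs help" state, it can be useful to find out about such tasks without watching the UI. The sketch below is one possible approach, not part of Bytebot itself: it assumes the Tasks API is reachable at `http://localhost:9991/tasks`, that a GET on that URL returns a JSON array of tasks, and that each task carries `id`, `description`, and a `status` field that uses the value `NEEDS_HELP`. The function names are hypothetical.

```python
import json
import time
import urllib.request

TASKS_URL = "http://localhost:9991/tasks"  # assumed: default agent address


def fetch_tasks(url: str = TASKS_URL, timeout: float = 10.0) -> list:
    """GET the task list and decode it from JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def tasks_needing_help(tasks: list) -> list:
    """Keep only tasks whose status is NEEDS_HELP."""
    return [t for t in tasks if t.get("status") == "NEEDS_HELP"]


def watch(poll_seconds: float = 15.0) -> None:
    """Poll the API forever and print each waiting task once."""
    announced = set()
    while True:
        for task in tasks_needing_help(fetch_tasks()):
            if task["id"] not in announced:
                announced.add(task["id"])
                print(f"Task {task['id']} needs help: {task['description']}")
        time.sleep(poll_seconds)
```

Calling `watch()` starts the loop. Each task ID is announced only once, so a task that returns to the needs-help state later would not be reported again; clear the `announced` set periodically if that matters for your workflow.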

# Task Creation & Management

Source: https://docs.bytebot.ai/guides/task-creation

Master the art of creating effective tasks and managing them through completion

# Creating and Managing Tasks in Bytebot

This guide will walk you through everything you need to know about creating tasks that Bytebot can execute effectively, and managing them through their lifecycle.

## Understanding Tasks

A task is any job you want Bytebot to complete. Tasks can be:

* **Simple**: "Log in to GitHub" or "Visit example.com" (uses one program)
* **Complex**: "Download invoices from email and save them to a folder" (uses multiple programs)
* **File-based**: "Read the uploaded PDF and extract all email addresses" (processes uploaded files)
* **Collaborative**: "Process invoices, ask me to handle special approvals"

## Working with Files

Bytebot has powerful file handling capabilities that make it perfect for document processing and data analysis tasks.

### Uploading Files with Tasks

When creating a task, you can upload files that will be automatically saved to the desktop instance. This is incredibly useful for:

* **Document Processing**: Upload PDFs, spreadsheets, or documents for Bytebot to analyze
* **Data Analysis**: Provide CSV files or datasets for processing
* **Template Filling**: Upload forms or templates that need to be completed
* **Batch Operations**: Upload multiple files for bulk processing

**Game Changer**: Bytebot can read entire files, including PDFs, directly into the LLM context. This means it can process large amounts of data quickly and understand complex documents without manual extraction.

### File Upload Examples

1. Click the attachment button when creating a task
2. Select files to upload (PDFs, CSVs, images, etc.)
3. Files are automatically saved to the desktop
4. Reference them in your task description:

```
"Read the uploaded contracts.pdf and extract all payment terms, then create a summary spreadsheet with vendor names and terms"
```

```bash
# Upload files with task creation (multipart/form-data)
curl -X POST http://localhost:9991/tasks \
  -F "description=Analyze the uploaded financial statements and create a summary" \
  -F "priority=HIGH" \
  -F "files=@financial_statements_2024.pdf" \
  -F "files=@budget_comparison.xlsx"
```

### File Processing Capabilities

* Extract text from PDFs
* Read entire PDFs into context
* Parse forms and contracts
* Extract tables and data
* Read Excel/CSV files
* Analyze data patterns
* Generate reports
* Cross-reference multiple sheets
* Summarize long documents
* Extract key information
* Compare multiple files
* Answer questions about content
* Process multiple files
* Apply same analysis to each
* Consolidate results
* Generate unified reports

## Creating Your First Task

### Using the Web UI

1. Navigate to `http://localhost:9992`
2. In the input field on the left side, type what you want done. For example:

   ```
   Log in to my GitHub account and check for new notifications
   ```
3. Press the arrow button or hit Enter. Bytebot will start loading and begin working on your task.

### Using the API

```bash
curl -X POST http://localhost:9991/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Download all PDF invoices from my email and organize by date",
    "priority": "HIGH",
    "type": "IMMEDIATE"
  }'
```

## Writing Effective Task Descriptions

### The Golden Rules

❌ "Do some research"
✅ "Research top 5 CRM tools for small businesses"

❌ "Fill out the form"
✅ "Fill out the contact form on example.com with test data"

❌ "Organize files"
✅ "Organize files in Downloads folder by type into subfolders"

❌ "Do multiple unrelated things"
✅ "Focus on a single objective with clear steps"

### Task Description Templates

#### Enterprise Process Automation

```
Log into [system] and:
1. [Navigate to specific section]
2. [Download/Extract data]
3. [Process through other system]
4. [Update records/Generate report]
Handle any [specific scenarios]

Example:
Log into our banking portal and:
1. Navigate to wire transfers section
2. Download all pending wire confirmations
3. Match against our ERP payment records
4. Flag any discrepancies in the reconciliation sheet
(Bytebot handles all authentication including 2FA automatically via password manager)
```

#### Multi-Application Workflow

```
Access [System A] to get [data]
Then in [System B]:
1. [Process the data]
2. [Update records]
Finally in [System C]:
1. [Verify updates]
2. [Generate confirmation]

Example:
Access Salesforce to get list of new customers from today
Then in NetSuite:
1. Create customer records with billing info
2. Set up payment terms
Finally in our shipping system:
1. Verify addresses are valid
2. Generate welcome kit shipping labels
```

#### Compliance & Audit Task

```
For each [entity] in [source]:
1. Check [compliance requirement]
2. Document [specific data]
3. Flag any [violations/issues]
Generate report showing [metrics]

Example:
For each vendor in our approved vendor list:
1. Check their insurance certificates are current
2. Document expiration dates and coverage amounts
3. Flag any expiring within 30 days
Generate report showing compliance percentage by category
```

## Managing Active Tasks

### Task States

Tasks move through these states:

1. **Created** → Task is defined but not started
2. **Queued** → Waiting for agent availability
3. **Running** → Actively being worked on
4. **Needs Help** → Requires human input
5. **Completed** → Successfully finished
6. **Failed** → Could not be completed

The Tasks API reports status with the values `PENDING`, `IN_PROGRESS`, `NEEDS_HELP`, `NEEDS_REVIEW`, `COMPLETED`, `CANCELLED`, and `FAILED`.

### Monitoring Progress

#### Real-time Updates

Watch Bytebot work through the task detail viewer:

* **Green dot**: Task is actively running
* **Status messages**: Current step being executed
* **Desktop view**: See what Bytebot sees in real-time

#### Chat Messages

Bytebot provides updates like:

```
Assistant: I'm now searching for project management tools...
Assistant: Found 15 options, filtering by your criteria...
Assistant: Creating the comparison table with 5 tools...
```

### Interacting with Running Tasks

#### Providing Additional Information

```
User: "Also include free tier options in your research"
Assistant: "I'll add a column for free tier availability to the comparison table."
```

#### Clarifying Instructions

```
Assistant: "I found multiple forms on this page. Which one should I fill out?"
User: "Use the 'Contact Sales' form on the right side"
```

#### Modifying Tasks

```
User: "Actually, make it top 10 tools instead of top 5"
Assistant: "I'll expand my research to include 10 tools in the comparison."
```

## Advanced Task Management

### Task Dependencies

Chain tasks that depend on each other:

```
Task 1: "Download the invoice from the vendor portal"
Task 2: "Open the downloaded invoice and extract the total amount"
Task 3: "Enter the amount into our accounting system"
```

## Best Practices

### Do's ✅

1. **Start Simple**: Test with basic tasks before complex ones
2. **Provide Examples**: "Format it like the report from last week"
3. **Include Credentials Safely**: Use takeover mode for passwords
4. **Set Realistic Expectations**: Complex tasks take time
5. **Review Results**: Always verify important outputs

### Don'ts ❌

1. **Overload Single Tasks**: Break complex workflows into steps
2. **Assume Knowledge**: Explain custom applications
3. **Skip Context**: Always provide necessary background
4. **Ignore Errors**: Address issues promptly
5. **Rush Critical Tasks**: Allow time for careful execution

## Task Examples by Category

### 📄 Document Processing & Analysis

```
"Read the uploaded contract.pdf and extract all key terms including payment schedules, deliverables, and termination clauses. Create a summary document with these details."

"Process all the uploaded invoice PDFs, extract vendor names, amounts, and due dates, then create a consolidated Excel spreadsheet sorted by due date."

"Analyze the uploaded financial_report.pdf and answer these questions: What was the revenue growth? What are the main risk factors mentioned? What is the debt-to-equity ratio?"

"Read through the uploaded employee_handbook.pdf and create a checklist of all compliance requirements mentioned in the document."
```

### 🏦 Enterprise Automation (RPA-Style Workflows)

```
"Log into our banking portal, download all transaction files from last month, save them to the Finance/Statements folder, then run the reconciliation script on each file."
(Note: Bytebot handles all authentication including 2FA automatically using the built-in password manager)

"Access the vendor portal at supplier.example.com, navigate to the invoice section, download all pending invoices, extract the data into our standard template, and upload to the AP system."

"Open our legacy ERP system, export the customer list, then for each customer, look them up in the new CRM and update their status and last contact date."
```

### 📊 Financial Operations & Data Analysis

```
"Read the uploaded bank_statements folder containing 12 monthly PDFs, extract all transactions over $10,000, and create a summary report showing patterns and anomalies."

"Log into each of our 5 bank accounts, download the daily statements, consolidate them into a single cash position report, and save to the shared finance folder."

"Process the uploaded expense_reports.zip file, review all reports over $1,000, create a summary with policy violations flagged, and prepare for approval."

"Navigate to the tax authority website, download all GST/VAT returns for Q4, extract the figures, and populate our tax reconciliation spreadsheet."
```

### 🔄 Multi-System Integration

```
"Pull today's orders from Shopify, create corresponding entries in NetSuite, update inventory in our WMS, and trigger shipping labels in ShipStation."

"Extract employee data from Workday, cross-reference with our access control system, identify discrepancies, and create tickets for IT to resolve."

"Log into our insurance portal, download policy documents for all active policies, extract key dates and coverage amounts, update our risk management database."
```

### 📈 Compliance & Reporting

```
"Access all state regulatory websites for our operating regions, check for new compliance updates since last month, download relevant documents, and create a summary report."

"Log into our various SaaS tools (list provided), export user access reports, consolidate into a single audit trail, and flag any terminated employees still with access."

"Navigate to customer portal, download all SLA performance reports, extract metrics, compare against our internal data, and highlight discrepancies."
```

### 🤝 Development & QA Integration

```
"After the code agent deploys the new feature, test the complete user journey from signup to checkout, take screenshots at each step, and verify against the design specs."

"Run through all test scenarios in our QA checklist, but for any failures, have the code agent analyze the error and attempt a fix, then retest automatically."

"Monitor our staging environment, when a new build is deployed, automatically run the smoke test suite and create a visual regression report comparing to production."
```

## Troubleshooting Common Issues

**Task appears stuck**

**Possible causes**:

* Waiting for slow page/app to load
* Encountered unexpected popup
* Unclear next step

**Solutions**:

* Check desktop viewer for current state
* Provide clarification via chat
* Use takeover mode to help
* Cancel and restart with clearer instructions

**Task completes with the wrong result**

**Possible causes**:

* Ambiguous instructions
* Website/app changed
* Misunderstood context

**Solutions**:

* Review task description for clarity
* Provide specific examples
* Break into smaller subtasks
* Use takeover mode to demonstrate

**Task fails**

**Possible causes**:

* Invalid URL or application
* Missing prerequisites
* System resource issues

**Solutions**:

* Verify URLs and application names
* Ensure required files/data exist
* Check system resources
* Review error messages in chat

## Task Management Tips

### Organizing Multiple Tasks

1. **Use Clear Naming**: Include date, category, or project
2. **Group Related Tasks**: Process similar tasks together
3. **Priority Management**: Reserve 'Urgent' for true emergencies
4. **Regular Reviews**: Check completed tasks for quality

### Performance Optimization

* **Batch Similar Tasks**: Group web research, data entry, etc.
* **Prepare Resources**: Have files/data ready before starting
* **Clear Desktop**: Minimize distractions and popups
* **Stable Environment**: Ensure good internet and system resources

### Learning from Tasks

After each task:

1. Review the approach Bytebot took
2. Note any inefficiencies
3. Refine future task descriptions
4. Build a library of effective prompts

## Next Steps

* Learn human-AI collaboration
* Automate task creation

**Pro Tip**: Start with simple tasks to understand Bytebot's capabilities, then gradually increase complexity as you learn what works best.

# Introduction

Source: https://docs.bytebot.ai/introduction

Open source AI desktop agent that automates any computer task


## What is Bytebot?

Bytebot is an open-source AI agent that can control a computer desktop to complete tasks for you. It runs in Docker containers on your own infrastructure, giving you a virtual assistant that can:

* Use any desktop application (browser, email, office tools, etc.)
* Process uploaded files including PDFs, spreadsheets, and documents
* Read entire files directly into the LLM context for rapid analysis
* Automate repetitive tasks like data entry and form filling
* Handle complex workflows that span multiple applications
* Work 24/7 without human supervision

Simply describe what you need done in plain English, and Bytebot will figure out how to do it: clicking buttons, typing text, navigating websites, reading documents, and completing tasks just like a human would.

## Why Bytebot Over Traditional RPA?

* Unlike UiPath or similar tools, no need to design flowcharts or write scripts - just describe tasks naturally
* AI-powered understanding means Bytebot adapts to UI changes without breaking
* Can read and understand any interface, not just pre-mapped elements
* Handles unexpected popups, errors, and variations automatically

## Why Self-Host Bytebot?

* Your tasks and data never leave your infrastructure. Everything runs locally on your servers.
* Customize the desktop environment, install any applications, and configure to your exact needs.
* Use your own LLM API keys without platform restrictions or additional fees.
* Each desktop runs in its own container, completely isolated from your host system.

## Real-World Use Cases

### Enterprise Automation (RPA Replacement)

Bytebot is the next generation of RPA (Robotic Process Automation).
It handles the same complex workflows as traditional tools like UiPath, but with AI-powered adaptability and automatic authentication:

* **Financial Operations**: Automate banking portal access (including 2FA when password manager extensions are configured), download transaction files, and process them through multiple systems
* **Compliance Workflows**: Navigate government websites, download regulatory documents, extract data, and update compliance tracking systems
* **Multi-System Integration**: Bridge legacy systems that lack APIs by automating the UI interactions between them
* **Vendor Management**: Log into supplier portals, download invoices, reconcile with internal systems, and process payments

### Business Process Automation

* **Data Reconciliation**: Pull reports from multiple SaaS platforms, cross-reference data, and generate consolidated reports
* **Customer Onboarding**: Navigate between CRM, banking, and verification systems to complete new customer setup
* **Purchase Order Processing**: Extract POs from webmail portals, enter into ERP systems, and update inventory databases
* **HR Operations**: Collect employee data from various systems, update records, and ensure consistency across platforms

### Development & QA Integration

Bytebot becomes even more powerful when combined with coding agents:

* **Full-Stack Testing**: Use a coding agent to generate code, then have Bytebot visually test and validate the output
* **Automated Debugging**: Let Bytebot reproduce user-reported issues while a coding agent analyzes and fixes the code
* **End-to-End Development**: Code agents write features, Bytebot tests them, creating a complete development loop
* **Visual Regression Testing**: Automatically detect UI changes across deployments with screenshot comparisons

## How It Works

1. Simply tell Bytebot what you want done in natural language through the tasks interface
2. Bytebot understands your request and breaks it down into specific computer actions
3. Bytebot executes the task on its virtual desktop using the keyboard and mouse
4. Monitor it working in real-time through the task detail view, or let it complete tasks independently.
5. Receive the completed task output, screenshots, or confirmation of completion

## Architecture Overview

Bytebot consists of four integrated components working together:

[Figure: Bytebot Agent Architecture]

* Ubuntu 22.04 with XFCE4, VSCode, Firefox, Thunderbird email client, and automation daemon (bytebotd)
* NestJS service that uses LLMs (Anthropic Claude, OpenAI GPT, Google Gemini) to plan and execute tasks
* Next.js web app for creating and managing tasks
* Programmatic access to both task management and direct desktop control

## Getting Started

* Get Bytebot running in 2 minutes
* Understand how it all fits together
* Integrate with your applications

## Key Features

### 🤖 Natural Language Control

Just tell Bytebot what you need done. No coding or complex automation tools required.

### 🖥️ Full Desktop Access

Bytebot can use any application you can install - browsers, office tools, custom software.

### 🔒 Complete Privacy

Runs entirely on your infrastructure. Your data never leaves your servers.
### 🔄 Two Operating Modes

* **Autonomous Mode**: Bytebot completes tasks independently
* **Takeover Mode**: You can step in and take control when needed

### 🖱️ Direct Desktop Access

* **Desktop Tab**: Free-form access to the virtual desktop for setup, installing programs, or manual operations
* **Task View**: Watch and interact with Bytebot during task execution

### 🚀 Easy Deployment

* One-click deployment on Railway
* Docker Compose for self-hosting
* Helm charts for Kubernetes

### 🔌 Developer-Friendly

* REST APIs for programmatic control
* Task management API
* Extensible architecture
* MCP (Model Context Protocol) support

## Community & Support

* Join our community for help, tips, and discussions
* Report issues, contribute, or star the project

**Ready to give your AI its own computer?** Start with our [Quick Start Guide](/quickstart) to have your own AI desktop agent running in minutes.

# Quick Start

Source: https://docs.bytebot.ai/quickstart

Get your AI desktop agent running in 2 minutes

# Choose Your Deployment Method

Bytebot can be deployed in several ways depending on your needs:

## ☁️ One-click Deploy on Railway

[![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/bytebot?referralCode=L9lKXQ)

1. Click the Deploy Now button in the Bytebot template on Railway.
2. Enter either your `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY` for the bytebot-agent resource.
3. Hit **Deploy**. Railway will build the stack, wire the services together via private networking and output a public URL for the UI. Your agent should be ready within a couple of minutes!

Need more details? See the full Railway deployment guide.
## 🐳 Self-host with Docker Compose

## Prerequisites

* Docker ≥ 20.10
* Docker Compose
* 4GB+ RAM available
* AI API key from one of these providers:
  * Anthropic ([get one here](https://console.anthropic.com)) - Claude models
  * OpenAI ([get one here](https://platform.openai.com/api-keys)) - GPT models
  * Google ([get one here](https://makersuite.google.com/app/apikey)) - Gemini models

## 🚀 2-Minute Setup

Get your self-hosted AI desktop agent running with just a few commands:

```bash
git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot

# Configure your AI provider (choose one):
echo "ANTHROPIC_API_KEY=your_api_key_here" > docker/.env    # For Claude
# echo "OPENAI_API_KEY=your_api_key_here" > docker/.env     # For OpenAI
# echo "GEMINI_API_KEY=your_api_key_here" > docker/.env     # For Gemini
```

```bash
docker-compose -f docker/docker-compose.yml up -d
```

This starts all four services:

* **Bytebot Desktop**: Containerized Linux environment
* **AI Agent**: LLM-powered task processor (supports Claude, GPT, or Gemini)
* **Chat UI**: Web interface for interaction
* **Database**: PostgreSQL for persistence

Navigate to [http://localhost:9992](http://localhost:9992) to access the Bytebot UI.

**Two ways to interact:**

1. **Tasks**: Enter task descriptions to have Bytebot work autonomously
2. **Desktop**: Direct access to the virtual desktop for manual control

Try asking:

* "Open Firefox and search for the weather forecast"
* "Take a screenshot of the desktop"
* "Create a text file with today's date"

**First time?** The initial startup may take 2-3 minutes as Docker downloads the images. Subsequent starts will be much faster.

## 🎯 What You Just Deployed

**🔐 Password Manager Support**: Bytebot can handle authentication automatically when you install a password manager extension. See our [password management guide](/guides/password-management) for setup instructions.

You now have a complete AI desktop automation system with:
**AI Agent**

* Understands natural language
* Plans and executes tasks
* Adapts to errors
* Works autonomously

**Bytebot Desktop**

* Full Ubuntu environment
* Browser, office tools
* File system access
* Application support

**Tasks UI**

* Create and manage tasks
* Real-time desktop view
* Conversation history
* Takeover mode

**APIs**

* Programmatic control
* Task management API
* Direct desktop access
* MCP protocol support

## 🚀 Your First Tasks

Now let's see Bytebot in action! Try these example tasks:

### Simple Tasks (Test the Basics)

* "Take a screenshot of the desktop"
* "Open Firefox and go to google.com"
* "Create a text file called 'hello.txt' with today's date"
* "Check the system information and tell me the OS version"

### Advanced Tasks (See the Power)

* "Find the top 5 AI news stories today and create a summary document"
* "Go to hacker news, find the top 10 stories, and save them to a CSV file"
* "Upload a PDF contract and extract all payment terms and deadlines"
* "Search for 'machine learning tutorials', open the first 3 results in tabs, and take screenshots of each"

## Accessing Your Services

| Service         | URL                                                                      | Purpose                                       |
| --------------- | ------------------------------------------------------------------------ | --------------------------------------------- |
| **Tasks UI**    | [http://localhost:9992](http://localhost:9992)                           | Main interface for interacting with the agent |
| **Agent API**   | [http://localhost:9991/tasks](http://localhost:9991/tasks)               | REST API for programmatic task creation       |
| **Desktop API** | [http://localhost:9990/computer-use](http://localhost:9990/computer-use) | Low-level desktop control API                 |
| **MCP SSE**     | [http://localhost:9990/mcp](http://localhost:9990/mcp)                   | Connect MCP clients for tool access           |

## ☸️ Deploy with Helm

See our [Helm deployment guide](/deployment/helm) for Kubernetes installation.
## 🖥️ Desktop Container Only

If you just want the virtual desktop without the AI agent:

```bash
# Using pre-built image (recommended)
docker-compose -f docker/docker-compose.core.yml pull
docker-compose -f docker/docker-compose.core.yml up -d
```

Or build locally:

```bash
docker-compose -f docker/docker-compose.core.yml up -d --build
```

Access the desktop at [http://localhost:9990/vnc](http://localhost:9990/vnc)

## Managing Your Agent

### View Logs

Monitor what your agent is doing:

```bash
# All services
docker-compose -f docker/docker-compose.yml logs -f

# Just the agent
docker-compose -f docker/docker-compose.yml logs -f bytebot-agent
```

### Stop Services

```bash
docker-compose -f docker/docker-compose.yml down
```

### Update to Latest

```bash
docker-compose -f docker/docker-compose.yml pull
docker-compose -f docker/docker-compose.yml up -d
```

### Reset Everything

Remove all data and start fresh:

```bash
docker-compose -f docker/docker-compose.yml down -v
```

## Quick API Examples

### Create a Task via API

```bash
# Simple task
curl -X POST http://localhost:9991/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Search for flights from NYC to London next month",
    "priority": "MEDIUM"
  }'

# Task with file upload
curl -X POST http://localhost:9991/tasks \
  -F "description=Read this contract and summarize the key terms" \
  -F "priority=HIGH" \
  -F "files=@contract.pdf"
```

### Direct Desktop Control

```bash
# Take a screenshot
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "screenshot"}'

# Type text
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "type_text", "text": "Hello, Bytebot!"}'
```

## Troubleshooting

Check Docker is running and you have enough resources:

```bash
docker info
docker-compose -f docker/docker-compose.yml logs
```

Ensure all services are running:

```bash
docker-compose -f docker/docker-compose.yml ps
```

All services should show as "Up".

Check your API key is set correctly:

```bash
cat docker/.env
docker-compose -f docker/docker-compose.yml logs bytebot-agent
```

Ensure you're using a valid API key from Anthropic, OpenAI, or Google.

## 📚 Next Steps

* Learn how to create and manage tasks effectively
* Take control when you need to guide Bytebot
* Use any LLM provider with Bytebot
* Automate Bytebot with your applications

## 🔧 Configuration Options

### Environment Variables

```bash
# Choose one AI provider:
ANTHROPIC_API_KEY=sk-ant-...    # For Claude models
OPENAI_API_KEY=sk-...           # For GPT models
GEMINI_API_KEY=...              # For Gemini models

# Optional: Use specific models
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022  # Default
OPENAI_MODEL=gpt-4o
GEMINI_MODEL=gemini-1.5-flash
```

```bash
# Change default ports if needed
# Edit docker-compose.yml ports section:
# bytebot-ui:
#   ports:
#     - "8080:9992"  # Change 8080 to your desired port
```

```bash
# To use multiple LLM providers, use the proxy setup:
docker-compose -f docker/docker-compose.proxy.yml up -d
# This includes a pre-configured LiteLLM proxy
```

**Need help?** Join our [Discord community](https://discord.com/invite/d9ewZkWPTP) for support and to share what you're building!
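
For application code, the JSON `curl` call under Quick API Examples can be expressed as a small request builder. The following is a minimal sketch, not part of Bytebot itself: the status and priority values are taken from the Task Model in the Tasks API reference, while `AGENT_URL`, `buildCreateTaskRequest`, and `isFinished` are hypothetical names introduced here, and the choice of which statuses count as finished is an assumption.

```typescript
// Sketch only: mirrors the "Simple task" curl example for the Agent API.
// AGENT_URL assumes the default local Docker Compose port; adjust as needed.
const AGENT_URL = "http://localhost:9991";

// Values copied from the Task Model in the Tasks API reference.
type TaskPriority = "LOW" | "MEDIUM" | "HIGH" | "URGENT";
type TaskStatus =
  | "PENDING"
  | "IN_PROGRESS"
  | "NEEDS_HELP"
  | "NEEDS_REVIEW"
  | "COMPLETED"
  | "CANCELLED"
  | "FAILED";

// Plain description of an HTTP request, so it can be inspected before sending.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the JSON request for POST /tasks.
// The empty-description check is a local sanity guard, not a documented API rule.
function buildCreateTaskRequest(
  description: string,
  priority: TaskPriority = "MEDIUM",
): HttpRequest {
  if (description.trim() === "") {
    throw new Error("Task description must not be empty");
  }
  return {
    url: `${AGENT_URL}/tasks`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ description, priority }),
  };
}

// Hypothetical helper: statuses this sketch treats as "no further work expected".
function isFinished(status: TaskStatus): boolean {
  return status === "COMPLETED" || status === "CANCELLED" || status === "FAILED";
}
```

The returned object can be handed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` in a runtime that provides `fetch`. Keeping request construction separate from sending makes the payload easy to log or unit test without a running agent.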