Overview

The unified computer action API provides granular control over the Bytebot virtual desktop environment through a single endpoint. It replaces multiple action-specific endpoints with one interface that handles computer actions such as mouse movements, clicks, and key presses.

Endpoint

| Method | URL | Description |
| --- | --- | --- |
| POST | /computer-use/computer | Execute computer actions in the virtual desktop |

Request Format

All requests to the unified endpoint follow this format:
{
  "action": "action_name",
  ...action-specific parameters
}
The action parameter determines which operation to perform, and the remaining parameters depend on the specific action.
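
The request format above can be sketched as a small helper that merges the action name with its parameters and posts the result as JSON. This is a minimal sketch, not an official client: the helper names `buildRequestBody` and `sendAction` are hypothetical, the host and port are taken from the code example later in this document, and the global `fetch` assumes Node.js 18 or newer.

```javascript
// Assumed for illustration: a local Bytebot desktop on port 9990 and Node.js 18+ (global fetch).
const ENDPOINT = 'http://localhost:9990/computer-use/computer';

// Merge the action name with its action-specific parameters into one flat body.
function buildRequestBody(action, params = {}) {
  return { action, ...params };
}

// POST the body as JSON and return the parsed response (hypothetical helper).
async function sendAction(action, params = {}) {
  const response = await fetch(ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRequestBody(action, params)),
  });
  return response.json();
}

// Building a body does not touch the network, so it is safe to inspect locally.
console.log(JSON.stringify(buildRequestBody('move_mouse', { coordinates: { x: 100, y: 200 } })));
```

Keeping body construction separate from the network call makes it possible to log or validate a request before it is sent.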

Available Actions

move_mouse

Move the mouse cursor to a specific position.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| coordinates | Object | Yes | The target coordinates to move to |
| coordinates.x | Number | Yes | X coordinate |
| coordinates.y | Number | Yes | Y coordinate |

Example:
{
  "action": "move_mouse",
  "coordinates": {
    "x": 100,
    "y": 200
  }
}

trace_mouse

Move the mouse along a path of coordinates.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | Array | Yes | Array of coordinate objects for the mouse path |
| path[].x | Number | Yes | X coordinate for each point in the path |
| path[].y | Number | Yes | Y coordinate for each point in the path |
| holdKeys | Array | No | Keys to hold while moving along the path |

Example:
{
  "action": "trace_mouse",
  "path": [
    { "x": 100, "y": 100 },
    { "x": 150, "y": 150 },
    { "x": 200, "y": 200 }
  ],
  "holdKeys": ["shift"]
}
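
A `path` array like the one above does not have to be written by hand. The following sketch generates evenly spaced points on a straight line between two coordinates. The function name `linePath` is hypothetical, and rounding to whole pixels is an assumption, since the parameter table only requires numbers.

```javascript
// Build a straight-line path with `steps` segments (steps + 1 points, endpoints included).
function linePath(from, to, steps) {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError('steps must be a positive integer');
  }
  const path = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps; // interpolation factor from 0 to 1
    path.push({
      // Assumption: coordinates are rounded to whole pixels.
      x: Math.round(from.x + (to.x - from.x) * t),
      y: Math.round(from.y + (to.y - from.y) * t),
    });
  }
  return path;
}

// Request body for trace_mouse using the generated path.
const traceRequest = {
  action: 'trace_mouse',
  path: linePath({ x: 100, y: 100 }, { x: 200, y: 200 }, 2),
};
console.log(JSON.stringify(traceRequest));
```

With two segments between (100, 100) and (200, 200), the generated path has the same three points as the example above.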

click_mouse

Perform a mouse click at the current or specified position.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| coordinates | Object | No | The coordinates to click (uses current if omitted) |
| coordinates.x | Number | Yes* | X coordinate |
| coordinates.y | Number | Yes* | Y coordinate |
| button | String | Yes | Mouse button: ‘left’, ‘right’, or ‘middle’ |
| clickCount | Number | Yes | Number of clicks to perform |
| holdKeys | Array | No | Keys to hold while clicking (e.g., [‘ctrl’, ‘shift’]) |

* Required only when coordinates is provided.

Example:
{
  "action": "click_mouse",
  "coordinates": {
    "x": 150,
    "y": 250
  },
  "button": "left",
  "clickCount": 2
}

press_mouse

Press or release a mouse button at the current or specified position.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| coordinates | Object | No | The coordinates to press/release (uses current if omitted) |
| coordinates.x | Number | Yes* | X coordinate |
| coordinates.y | Number | Yes* | Y coordinate |
| button | String | Yes | Mouse button: ‘left’, ‘right’, or ‘middle’ |
| press | String | Yes | Action: ‘up’ or ‘down’ |

* Required only when coordinates is provided.

Example:
{
  "action": "press_mouse",
  "coordinates": {
    "x": 150,
    "y": 250
  },
  "button": "left",
  "press": "down"
}
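
Because `press_mouse` only presses or only releases, a `down` request normally needs a matching `up` request later. The sketch below builds such a pair around a `move_mouse` request, using only the actions and parameters documented on this page. The function name `manualDragSequence` is hypothetical, and this document does not state whether the sequence behaves identically to `drag_mouse`.

```javascript
// Build three request bodies: press at the start, move, then release at the end.
function manualDragSequence(start, end, button = 'left') {
  return [
    { action: 'press_mouse', coordinates: start, button, press: 'down' },
    { action: 'move_mouse', coordinates: end },
    { action: 'press_mouse', coordinates: end, button, press: 'up' },
  ];
}

const dragRequests = manualDragSequence({ x: 150, y: 250 }, { x: 300, y: 250 });
for (const request of dragRequests) {
  console.log(JSON.stringify(request)); // each body would be sent as its own POST, in order
}
```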

drag_mouse

Click and drag the mouse from one point to another.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | Array | Yes | Array of coordinate objects for the drag path |
| path[].x | Number | Yes | X coordinate for each point in the path |
| path[].y | Number | Yes | Y coordinate for each point in the path |
| button | String | Yes | Mouse button: ‘left’, ‘right’, or ‘middle’ |
| holdKeys | Array | No | Keys to hold while dragging |

Example:
{
  "action": "drag_mouse",
  "path": [
    { "x": 100, "y": 100 },
    { "x": 200, "y": 200 }
  ],
  "button": "left"
}

scroll

Scroll up, down, left, or right.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| coordinates | Object | No | The coordinates to scroll at (uses current if omitted) |
| coordinates.x | Number | Yes* | X coordinate |
| coordinates.y | Number | Yes* | Y coordinate |
| direction | String | Yes | Scroll direction: ‘up’, ‘down’, ‘left’, or ‘right’ |
| scrollCount | Number | Yes | Number of scroll steps |
| holdKeys | Array | No | Keys to hold while scrolling |

* Required only when coordinates is provided.

Example:
{
  "action": "scroll",
  "direction": "down",
  "scrollCount": 5
}

type_keys

Type a sequence of keyboard keys.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| keys | Array | Yes | Array of keys to type in sequence |
| delay | Number | No | Delay between key presses in milliseconds |

Example:
{
  "action": "type_keys",
  "keys": ["a", "b", "c", "enter"],
  "delay": 50
}

press_keys

Press or release keyboard keys.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| keys | Array | Yes | Array of keys to press or release |
| press | String | Yes | Action: ‘up’ or ‘down’ |

Example:
{
  "action": "press_keys",
  "keys": ["ctrl", "shift", "esc"],
  "press": "down"
}
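
A `down` request such as the one above leaves the keys held until a matching `up` request is sent. The following sketch pairs the two requests for a keyboard shortcut. The function name `shortcutSequence` is hypothetical, and releasing the keys in reverse order is a common convention assumed here, not something this document specifies.

```javascript
// Build a press request followed by a release request for the same keys.
function shortcutSequence(keys) {
  return [
    { action: 'press_keys', keys: [...keys], press: 'down' },
    // Assumption: keys are released in the reverse of the order they were pressed.
    { action: 'press_keys', keys: [...keys].reverse(), press: 'up' },
  ];
}

const shortcutRequests = shortcutSequence(['ctrl', 'shift', 'esc']);
for (const request of shortcutRequests) {
  console.log(JSON.stringify(request)); // send each body as its own POST, in order
}
```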

type_text

Type a text string with an optional delay.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | String | Yes | The text to type |
| delay | Number | No | Delay between characters in milliseconds (default: 0) |

Example:
{
  "action": "type_text",
  "text": "Hello, Bytebot!",
  "delay": 50
}

paste_text

Paste text at the current cursor position. This is especially useful for special characters that aren’t on a standard keyboard.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | String | Yes | The text to paste, including special characters and emojis |

Example:
{
  "action": "paste_text",
  "text": "Special characters: ©®™€¥£ émojis 🎉"
}
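
One way to choose between `type_text` and `paste_text` is to check whether the text contains anything beyond printable ASCII. This is a heuristic sketch only: the function name `textAction` is hypothetical, and the ASCII cutoff is an assumption about what counts as "on a standard keyboard", not a rule stated by this document.

```javascript
// Pick type_text for printable ASCII, paste_text for everything else.
function textAction(text) {
  // Assumption: characters from space (0x20) to tilde (0x7E) can be typed directly.
  const printableAscii = /^[\x20-\x7E]*$/;
  return printableAscii.test(text)
    ? { action: 'type_text', text }
    : { action: 'paste_text', text };
}

console.log(textAction('Hello, Bytebot!').action);
console.log(textAction('© 2025 Example Corp™').action);
```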

wait

Wait for a specified duration.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| duration | Number | Yes | Wait duration in milliseconds |

Example:
{
  "action": "wait",
  "duration": 2000
}

screenshot

Capture a screenshot of the desktop.

Parameters: None required

Example:
{
  "action": "screenshot"
}

cursor_position

Get the current position of the mouse cursor.

Parameters: None required

Example:
{
  "action": "cursor_position"
}

application

Switch between applications, or navigate to the desktop or the file manager.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| application | String | Yes | The application to switch to. See available options below. |

Available Applications:
  • firefox - Mozilla Firefox web browser
  • 1password - Password manager
  • thunderbird - Email client
  • vscode - Visual Studio Code editor
  • terminal - Terminal/console application
  • desktop - Switch to desktop
  • directory - File manager/directory browser
Example:
{
  "action": "application",
  "application": "firefox"
}
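
Since the `application` parameter only accepts the names listed above, a client can check the name before sending the request. The sketch below does that with a hypothetical helper, `applicationRequest`. The list is copied from this section and may not match what a given server version accepts.

```javascript
// Application names copied from the list in this section.
const APPLICATIONS = [
  'firefox',
  '1password',
  'thunderbird',
  'vscode',
  'terminal',
  'desktop',
  'directory',
];

// Return a request body, or throw if the name is not in the documented list.
function applicationRequest(name) {
  if (!APPLICATIONS.includes(name)) {
    throw new RangeError(
      `Unknown application "${name}". Expected one of: ${APPLICATIONS.join(', ')}`
    );
  }
  return { action: 'application', application: name };
}

console.log(JSON.stringify(applicationRequest('firefox')));
```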

Response Format

The response format varies depending on the action performed.

Standard Response

Most actions return a simple success response:
{
  "success": true
}

Screenshot Response

{
  "success": true,
  "data": {
    "image": "base64_encoded_image_data"
  }
}
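
The `image` field is a base64 string, so it has to be decoded before it can be written to disk. The following sketch shows one way to do that in Node.js. The helper names are hypothetical, and the image format is not stated in this document, so the file extension is left to the caller.

```javascript
const fs = require('node:fs');

// Decode the base64 payload of a screenshot response into raw bytes.
function screenshotToBuffer(response) {
  if (!response || !response.success || !response.data || typeof response.data.image !== 'string') {
    throw new Error('Not a successful screenshot response');
  }
  return Buffer.from(response.data.image, 'base64');
}

// Write the decoded bytes to a file chosen by the caller (hypothetical helper).
function saveScreenshot(response, filePath) {
  fs.writeFileSync(filePath, screenshotToBuffer(response));
}

// Local demonstration with a stand-in payload instead of a real screenshot.
const sampleResponse = {
  success: true,
  data: { image: Buffer.from('not a real image').toString('base64') },
};
console.log(screenshotToBuffer(sampleResponse).length, 'bytes decoded');
```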

Cursor Position Response

{
  "success": true,
  "data": {
    "x": 123,
    "y": 456
  }
}

Error Response

{
  "success": false,
  "error": "Error message"
}
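
Because every response shape above carries a `success` flag, a client can handle them all with one helper. The sketch below returns the `data` field on success and throws on failure. The names `BytebotActionError` and `unwrap` are hypothetical and not part of the API.

```javascript
// Error type carrying the server's error message (hypothetical name).
class BytebotActionError extends Error {
  constructor(message) {
    super(message);
    this.name = 'BytebotActionError';
  }
}

// Return response.data on success; throw with the server's message otherwise.
function unwrap(response) {
  if (!response || response.success !== true) {
    throw new BytebotActionError(response?.error ?? 'Unknown error');
  }
  return response.data; // undefined for the standard { "success": true } response
}

console.log(unwrap({ success: true, data: { x: 123, y: 456 } }));
try {
  unwrap({ success: false, error: 'Error message' });
} catch (err) {
  console.log(`${err.name}: ${err.message}`);
}
```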

Code Examples

JavaScript/Node.js Example

const axios = require('axios');

const bytebot = {
  baseUrl: 'http://localhost:9990/computer-use/computer',
  
  async action(params) {
    try {
      const response = await axios.post(this.baseUrl, params);
      return response.data;
    } catch (error) {
      console.error('Error:', error.response?.data || error.message);
      throw error;
    }
  },
  
  // Convenience methods
  async moveMouse(x, y) {
    return this.action({
      action: 'move_mouse',
      coordinates: { x, y }
    });
  },
  
  async clickMouse(x, y, button = 'left', clickCount = 1) {
    return this.action({
      action: 'click_mouse',
      coordinates: { x, y },
      button,
      clickCount // required by click_mouse; defaults to a single click here
    });
  },
  
  async typeText(text) {
    return this.action({
      action: 'type_text',
      text
    });
  },
  
  async pasteText(text) {
    return this.action({
      action: 'paste_text',
      text
    });
  },
  
  async switchApplication(application) {
    return this.action({
      action: 'application',
      application
    });
  },
  
  async screenshot() {
    return this.action({ action: 'screenshot' });
  }
};

// Example usage:
async function example() {
  // Switch to Firefox
  await bytebot.switchApplication('firefox');
  
  // Navigate to a website
  await bytebot.moveMouse(100, 35);
  await bytebot.clickMouse(100, 35);
  await bytebot.typeText('https://example.com');
  // Use type_keys so Enter is pressed and released;
  // press_keys with 'down' alone would leave the key held
  await bytebot.action({
    action: 'type_keys',
    keys: ['enter']
  });
  
  // Wait for page to load
  await bytebot.action({
    action: 'wait',
    duration: 2000
  });
  
  // Paste some special characters
  await bytebot.pasteText('© 2025 Example Corp™ - €100');
  
  // Take a screenshot; the base64-encoded image is in result.data.image
  const result = await bytebot.screenshot();
  console.log('Screenshot taken, base64 length:', result.data.image.length);
}

example().catch(console.error);