Overview

The unified computer action API provides granular control over the Bytebot virtual desktop environment through a single endpoint. It replaces multiple action-specific endpoints with one interface that handles mouse movements, clicks, key presses, and other computer actions.

Endpoint

| Method | URL | Description |
| --- | --- | --- |
| POST | /computer-use/computer | Execute computer actions in the virtual desktop |

Request Format

All requests to the unified endpoint follow this format:

{
  "action": "action_name",
  ...action-specific parameters
}

The action parameter determines which operation to perform, and the remaining parameters depend on the specific action.
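
As a rough illustration of this request shape, the sketch below merges an action name with its action-specific parameters into one body. `buildRequest` and `KNOWN_ACTIONS` are hypothetical names, not part of the Bytebot API; the action list is copied from the actions documented on this page.

```javascript
// Hypothetical allow-list, copied from the actions documented on this page.
const KNOWN_ACTIONS = new Set([
  'move_mouse', 'trace_mouse', 'click_mouse', 'press_mouse', 'drag_mouse',
  'scroll', 'type_keys', 'press_keys', 'type_text', 'wait',
  'screenshot', 'cursor_position',
]);

// Hypothetical helper: builds one request body for the unified endpoint.
function buildRequest(action, params = {}) {
  if (!KNOWN_ACTIONS.has(action)) {
    throw new Error(`Unknown action: ${action}`);
  }
  // Spread params first so a stray "action" key in params cannot override the name.
  return { ...params, action };
}

const moveBody = buildRequest('move_mouse', { coordinates: { x: 100, y: 200 } });
console.log(JSON.stringify(moveBody));
```

Rejecting unknown action names on the client is a design choice of this sketch; the server may perform its own validation.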

Available Actions

move_mouse

Move the mouse cursor to a specific position.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| coordinates | Object | Yes | The target coordinates to move to |
| coordinates.x | Number | Yes | X coordinate |
| coordinates.y | Number | Yes | Y coordinate |

Example:

{
  "action": "move_mouse",
  "coordinates": {
    "x": 100,
    "y": 200
  }
}

trace_mouse

Move the mouse along a path of coordinates.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | Array | Yes | Array of coordinate objects for the mouse path |
| path[].x | Number | Yes | X coordinate for each point in the path |
| path[].y | Number | Yes | Y coordinate for each point in the path |
| holdKeys | Array | No | Keys to hold while moving along the path |

Example:

{
  "action": "trace_mouse",
  "path": [
    { "x": 100, "y": 100 },
    { "x": 150, "y": 150 },
    { "x": 200, "y": 200 }
  ],
  "holdKeys": ["shift"]
}
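
A path like the one above can also be generated rather than written by hand. The following sketch interpolates evenly spaced points on a straight line; `linePath` is a hypothetical helper, and rounding to whole pixels is an assumption, since this page does not say whether fractional coordinates are accepted.

```javascript
// Hypothetical helper: returns steps + 1 points on the line from `from` to `to`.
function linePath(from, to, steps) {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError('steps must be a positive integer');
  }
  const path = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps; // 0 at the start point, 1 at the end point
    path.push({
      // Assumption: round to whole pixels.
      x: Math.round(from.x + (to.x - from.x) * t),
      y: Math.round(from.y + (to.y - from.y) * t),
    });
  }
  return path;
}

const traceRequest = {
  action: 'trace_mouse',
  path: linePath({ x: 100, y: 100 }, { x: 200, y: 200 }, 4),
};
console.log(JSON.stringify(traceRequest.path));
```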

click_mouse

Perform a mouse click at the current or specified position.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| coordinates | Object | No | The coordinates to click (uses the current position if omitted) |
| coordinates.x | Number | Yes* | X coordinate |
| coordinates.y | Number | Yes* | Y coordinate |
| button | String | Yes | Mouse button: 'left', 'right', or 'middle' |
| numClicks | Number | No | Number of clicks (default: 1) |
| holdKeys | Array | No | Keys to hold while clicking (e.g., ['ctrl', 'shift']) |

* Required when coordinates is provided.

Example:

{
  "action": "click_mouse",
  "coordinates": {
    "x": 150,
    "y": 250
  },
  "button": "left",
  "numClicks": 2
}
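
Because coordinates, numClicks, and holdKeys are optional, a small builder can keep click bodies consistent. This is a minimal sketch; `clickRequest` is a hypothetical helper, and the client-side button check simply mirrors the values listed in the table.

```javascript
// Hypothetical helper: builds a click_mouse body.
// Omitting coordinates leaves the click at the current cursor position.
function clickRequest({ button = 'left', coordinates, numClicks = 1, holdKeys } = {}) {
  if (!['left', 'right', 'middle'].includes(button)) {
    throw new Error(`Invalid button: ${button}`);
  }
  const body = { action: 'click_mouse', button, numClicks };
  if (coordinates) {
    body.coordinates = { x: coordinates.x, y: coordinates.y };
  }
  if (holdKeys && holdKeys.length > 0) {
    body.holdKeys = [...holdKeys];
  }
  return body;
}

// A ctrl+click at (150, 250) and a right-click at the current position.
console.log(JSON.stringify(clickRequest({ coordinates: { x: 150, y: 250 }, holdKeys: ['ctrl'] })));
console.log(JSON.stringify(clickRequest({ button: 'right' })));
```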

press_mouse

Press or release a mouse button at the current or specified position.

Parameters:

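| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| coordinates | Object | No | The coordinates to press/release at (uses the current position if omitted) |
| coordinates.x | Number | Yes* | X coordinate |
| coordinates.y | Number | Yes* | Y coordinate |
| button | String | Yes | Mouse button: 'left', 'right', or 'middle' |
| press | String | Yes | Action: 'up' or 'down' |

* Required when coordinates is provided.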

Example:

{
  "action": "press_mouse",
  "coordinates": {
    "x": 150,
    "y": 250
  },
  "button": "left",
  "press": "down"
}
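
Since press_mouse separates 'down' from 'up', one plausible use is a manual drag with other actions in between. The sketch below only builds the request sequence; `manualDragSteps` is a hypothetical helper, and this page does not state whether such a sequence behaves identically to drag_mouse.

```javascript
// Hypothetical helper: press at `from`, move to `to`, then release at `to`.
function manualDragSteps(from, to, button = 'left') {
  return [
    { action: 'press_mouse', coordinates: { ...from }, button, press: 'down' },
    { action: 'move_mouse', coordinates: { ...to } },
    { action: 'press_mouse', coordinates: { ...to }, button, press: 'up' },
  ];
}

const dragSteps = manualDragSteps({ x: 150, y: 250 }, { x: 300, y: 400 });
for (const step of dragSteps) {
  console.log(JSON.stringify(step)); // each body would be sent as its own POST, in order
}
```

Pairing every 'down' with a matching 'up' avoids leaving the button held after the sequence ends.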

drag_mouse

Click and drag the mouse from one point to another.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | Array | Yes | Array of coordinate objects for the drag path |
| path[].x | Number | Yes | X coordinate for each point in the path |
| path[].y | Number | Yes | Y coordinate for each point in the path |
| button | String | Yes | Mouse button: 'left', 'right', or 'middle' |
| holdKeys | Array | No | Keys to hold while dragging |

Example:

{
  "action": "drag_mouse",
  "path": [
    { "x": 100, "y": 100 },
    { "x": 200, "y": 200 }
  ],
  "button": "left"
}

scroll

Scroll up, down, left, or right.

Parameters:

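| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| coordinates | Object | No | The coordinates to scroll at (uses the current position if omitted) |
| coordinates.x | Number | Yes* | X coordinate |
| coordinates.y | Number | Yes* | Y coordinate |
| direction | String | Yes | Scroll direction: 'up', 'down', 'left', or 'right' |
| amount | Number | Yes | Scroll amount (pixels) |
| holdKeys | Array | No | Keys to hold while scrolling |

* Required when coordinates is provided.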

Example:

{
  "action": "scroll",
  "direction": "down",
  "amount": 100
}
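
When a long scroll should happen in smaller increments, for example to take a screenshot between steps, the total can be split into several requests. A minimal sketch follows; `scrollSteps` is a hypothetical helper, and chunking is a client-side choice rather than something the API requires.

```javascript
// Hypothetical helper: splits `total` into scroll requests of at most `step` each.
function scrollSteps(direction, total, step) {
  if (!['up', 'down', 'left', 'right'].includes(direction)) {
    throw new Error(`Invalid direction: ${direction}`);
  }
  if (!(total > 0) || !(step > 0)) {
    throw new RangeError('total and step must be positive numbers');
  }
  const requests = [];
  for (let remaining = total; remaining > 0; remaining -= step) {
    // The last request carries whatever is left over.
    requests.push({ action: 'scroll', direction, amount: Math.min(step, remaining) });
  }
  return requests;
}

console.log(scrollSteps('down', 250, 100).map((r) => r.amount)); // [ 100, 100, 50 ]
```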

type_keys

Type a sequence of keyboard keys.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| keys | Array | Yes | Array of keys to type in sequence |
| delay | Number | No | Delay between key presses in milliseconds |

Example:

{
  "action": "type_keys",
  "keys": ["a", "b", "c", "enter"],
  "delay": 50
}

press_keys

Press or release keyboard keys.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| keys | Array | Yes | Array of keys to press or release |
| press | String | Yes | Action: 'up' or 'down' |

Example:

{
  "action": "press_keys",
  "keys": ["ctrl", "shift", "esc"],
  "press": "down"
}
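
A request with press set to 'down' leaves the keys held until a matching 'up' request is sent. The sketch below builds both halves of a key combination; `keyComboSteps` is a hypothetical helper, and releasing the keys in reverse order is a convention assumed here, not a documented requirement.

```javascript
// Hypothetical helper: a 'down' request followed by the matching 'up' request.
function keyComboSteps(keys) {
  if (!Array.isArray(keys) || keys.length === 0) {
    throw new Error('keys must be a non-empty array');
  }
  return [
    { action: 'press_keys', keys: [...keys], press: 'down' },
    // Assumption: release in reverse order, modifiers last.
    { action: 'press_keys', keys: [...keys].reverse(), press: 'up' },
  ];
}

const comboSteps = keyComboSteps(['ctrl', 'shift', 'esc']);
console.log(JSON.stringify(comboSteps[1].keys)); // ["esc","shift","ctrl"]
```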

type_text

Type a text string with optional delay.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | String | Yes | The text to type |
| delay | Number | No | Delay between characters in milliseconds (default: 0) |

Example:

{
  "action": "type_text",
  "text": "Hello, Bytebot!",
  "delay": 50
}

wait

Wait for a specified duration.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| duration | Number | Yes | Wait duration in milliseconds |

Example:

{
  "action": "wait",
  "duration": 2000
}

screenshot

Capture a screenshot of the desktop.

Parameters: None required

Example:

{
  "action": "screenshot"
}

cursor_position

Get the current position of the mouse cursor.

Parameters: None required

Example:

{
  "action": "cursor_position"
}

Response Format

The response format varies depending on the action performed.

Standard Response

Most actions return a simple success response:

{
  "success": true
}

Screenshot Response

{
  "success": true,
  "data": {
    "image": "base64_encoded_image_data"
  }
}
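
The image field is a base64 string, so it has to be decoded before it can be written to disk. The sketch below uses Node's built-in Buffer and fs modules; `decodeScreenshot` and `saveScreenshot` are hypothetical helpers, and the image format is not stated on this page, so the file extension is left to the caller.

```javascript
const fs = require('fs');

// Hypothetical helper: extracts the base64 payload from a screenshot response as raw bytes.
function decodeScreenshot(response) {
  const image = response && response.data && response.data.image;
  if (typeof image !== 'string' || image.length === 0) {
    throw new Error('Response contains no image data');
  }
  return Buffer.from(image, 'base64');
}

// Hypothetical helper: writes the decoded bytes to filePath and returns the byte count.
function saveScreenshot(response, filePath) {
  const bytes = decodeScreenshot(response);
  fs.writeFileSync(filePath, bytes);
  return bytes.length;
}

// Stand-in response for illustration; a real response would carry image data.
const sampleShot = { success: true, data: { image: Buffer.from('hello').toString('base64') } };
console.log(decodeScreenshot(sampleShot).length); // 5
```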

Cursor Position Response

{
  "success": true,
  "data": {
    "x": 123,
    "y": 456
  }
}
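
One use of the returned position is a relative move: read the cursor position, then request a move_mouse offset from it. A minimal sketch, assuming the response shape shown above; `moveBy` is a hypothetical helper.

```javascript
// Hypothetical helper: builds a move_mouse body offset from a cursor_position response.
function moveBy(positionResponse, dx, dy) {
  if (!positionResponse || !positionResponse.success || !positionResponse.data) {
    throw new Error('cursor_position response has no data');
  }
  const { x, y } = positionResponse.data;
  return { action: 'move_mouse', coordinates: { x: x + dx, y: y + dy } };
}

const positionSample = { success: true, data: { x: 123, y: 456 } };
console.log(JSON.stringify(moveBy(positionSample, 10, -6).coordinates)); // {"x":133,"y":450}
```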

Error Response

{
  "success": false,
  "error": "Error message"
}
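
Since every response carries a success flag, callers can funnel all responses through one check. The sketch below returns the data payload on success and throws on failure; `unwrap` is a hypothetical helper, and the fallback message is an assumption for responses that omit the error field.

```javascript
// Hypothetical helper: returns response.data on success, throws on failure.
function unwrap(response) {
  if (!response || response.success !== true) {
    const message = response && response.error ? response.error : 'Unknown error';
    throw new Error(message);
  }
  // Standard responses carry no data field, so fall back to null.
  return response.data ?? null;
}

console.log(unwrap({ success: true, data: { x: 123, y: 456 } })); // { x: 123, y: 456 }
console.log(unwrap({ success: true })); // null
```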

Code Examples

JavaScript/Node.js Example

const axios = require('axios');

const bytebot = {
  baseUrl: 'http://localhost:9990/computer-use/computer',
  
  async action(params) {
    try {
      const response = await axios.post(this.baseUrl, params);
      return response.data;
    } catch (error) {
      console.error('Error:', error.response?.data || error.message);
      throw error;
    }
  },
  
  // Convenience methods
  async moveMouse(x, y) {
    return this.action({
      action: 'move_mouse',
      coordinates: { x, y }
    });
  },
  
  async clickMouse(x, y, button = 'left') {
    return this.action({
      action: 'click_mouse',
      coordinates: { x, y },
      button
    });
  },
  
  async typeText(text) {
    return this.action({
      action: 'type_text',
      text
    });
  },
  
  async screenshot() {
    return this.action({ action: 'screenshot' });
  }
};

// Example usage:
async function example() {
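  // Navigate to a website: click the browser address bar
  // (assumed here to be at 100, 35) and type the URL
  await bytebot.moveMouse(100, 35);
  await bytebot.clickMouse(100, 35);
  await bytebot.typeText('https://example.com');
  // Use type_keys so Enter is typed as a full keystroke;
  // press_keys with 'down' alone would leave the key held
  await bytebot.action({
    action: 'type_keys',
    keys: ['enter']
  });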
  
  // Wait for page to load
  await bytebot.action({
    action: 'wait',
    duration: 2000
  });
  
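  // Take a screenshot; the image arrives as a base64 string in result.data.image
  const result = await bytebot.screenshot();
  console.log('Screenshot taken, base64 length:', result.data.image.length);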
}

example().catch(console.error);