Core Concepts

Bytebot is different from typical web scraping applications. Because Bytebot leverages large language models (LLMs) to heuristically create testing actions, it features a few unique concepts.

On this page, we’ll break down Bytebot’s core structure.

What is Bytebot?

Bytebot uses LLMs to interpret a user-provided natural language prompt (e.g., “Add the item to the cart”) and generates a set of actions that automate a user session. These actions include both interacting with page elements and extracting information.

Bytebot can be used for robust testing, web scraping, and various other browser automation tasks.

How does Bytebot work?

Bytebot manages three layers of processing to drive automation through natural language queries. These layers are (i) ingesting natural language prompts, (ii) defining output structure, and (iii) generating/executing the browser actions.
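The three layers can be pictured as a small pipeline. The sketch below is illustrative only, not Bytebot's implementation. The function names (`ingestPrompt`, `withOutputStructure`, `generateActions`), the `LlmClient` type, and the prompt wording are all hypothetical. The LLM is represented as any async function from prompt text to reply text, and executing the resulting actions is left out.

```typescript
// Hypothetical action shape: an action type plus an XPath.
interface BrowserAction {
  type: string;
  xpath: string;
}

// Hypothetical stand-in for any LLM client: prompt text in, raw reply text out.
type LlmClient = (prompt: string) => Promise<string>;

// Layer (i): ingest the user's natural language prompt together with page context.
function ingestPrompt(userPrompt: string, html: string): string {
  return `Task: ${userPrompt.trim()}\n\nPage HTML:\n${html}`;
}

// Layer (ii): append the output structure the model is required to follow.
function withOutputStructure(prompt: string): string {
  return (
    prompt +
    "\n\nRespond ONLY with a JSON array of objects of the form " +
    '{"type": "Click" | "CopyText" | ..., "xpath": string}.'
  );
}

// Layer (iii): generate actions by calling the model and parsing its structured reply.
async function generateActions(
  llm: LlmClient,
  userPrompt: string,
  html: string,
): Promise<BrowserAction[]> {
  const raw = await llm(withOutputStructure(ingestPrompt(userPrompt, html)));
  return JSON.parse(raw) as BrowserAction[];
}

// Usage with a fake model that always returns one Click action.
const fakeLlm: LlmClient = async () => '[{"type":"Click","xpath":"//button[@id=\'add\']"}]';
generateActions(fakeLlm, "Add the item to the cart", "<button id='add'>Add</button>")
  .then((actions) => console.log(actions));
```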

[Diagram: How it works]

Because AI models need strict instructions to deliver reliable results, Bytebot pairs open-ended user prompts with tightly defined output structures so that browser actions are generated predictably.
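One way to enforce a tightly defined output is to validate the model's reply before anything reaches the browser. The following is a minimal sketch under that assumption and is not Bytebot's actual validator. `parseActions` is a hypothetical name, and it checks only the two components described on this page: an action type drawn from the five known types, and a non-empty XPath.

```typescript
// The five BrowserAction types described on this page.
const ACTION_TYPES = ["Click", "CopyText", "CopyAttribute", "AssignAttribute", "ExtractTable"] as const;
type ActionType = (typeof ACTION_TYPES)[number];

interface BrowserAction {
  type: ActionType;
  xpath: string;
}

// Hypothetical validator: parse raw model output and reject anything outside the
// expected structure, so a malformed reply never turns into browser actions.
function parseActions(raw: string): BrowserAction[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (!Array.isArray(data)) {
    throw new Error("Expected a JSON array of actions");
  }
  return data.map((item, i) => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`Action ${i} is not an object`);
    }
    const { type, xpath } = item as Record<string, unknown>;
    if (typeof type !== "string" || !(ACTION_TYPES as readonly string[]).includes(type)) {
      throw new Error(`Action ${i} has unknown type: ${String(type)}`);
    }
    if (typeof xpath !== "string" || xpath.length === 0) {
      throw new Error(`Action ${i} is missing an xpath`);
    }
    return { type: type as ActionType, xpath };
  });
}
```

Rejecting the whole reply on the first bad entry is a deliberate choice in this sketch. A partially valid action list could leave the page in an unexpected state, so failing fast (and, for example, re-prompting the model) keeps behavior predictable.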

What are BrowserAction Objects?

BrowserAction objects represent web browser actions. A BrowserAction provides a readable, intermediate representation of a generated automation. It has two components: an action type and an XPath, a path-like syntax for selecting nodes in an HTML document. Bytebot can translate an array of BrowserAction objects into Puppeteer code.

There are five BrowserAction types.

  • Click: A click on a specified element.
  • CopyText: A copy action on a specified element’s innerText.
  • CopyAttribute: A copy action on a specified element’s attribute, such as an image’s src attribute.
  • AssignAttribute: An assignment of a new value to a specified element’s attribute, such as entering text into an input field.
  • ExtractTable: A copy action on tabular data, saved in a tabular format.
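The five types map naturally onto a TypeScript discriminated union. The sketch below is a hypothetical model and not Bytebot's published type definitions. Each variant carries a type and an XPath. The `attribute` and `value` fields on CopyAttribute and AssignAttribute are assumptions added so those actions can name the attribute involved. The XPaths in the example plan are also made up.

```typescript
// Hypothetical model of the five BrowserAction types.
// `attribute` and `value` are assumed fields, not documented ones.
type BrowserAction =
  | { type: "Click"; xpath: string }
  | { type: "CopyText"; xpath: string }
  | { type: "CopyAttribute"; xpath: string; attribute: string }
  | { type: "AssignAttribute"; xpath: string; attribute: string; value: string }
  | { type: "ExtractTable"; xpath: string };

// Human-readable summary of an action; the `never` check makes the
// compiler flag any action type this switch forgets to handle.
function describe(action: BrowserAction): string {
  switch (action.type) {
    case "Click":
      return `click ${action.xpath}`;
    case "CopyText":
      return `copy innerText of ${action.xpath}`;
    case "CopyAttribute":
      return `copy "${action.attribute}" of ${action.xpath}`;
    case "AssignAttribute":
      return `set "${action.attribute}" of ${action.xpath} to "${action.value}"`;
    case "ExtractTable":
      return `extract table at ${action.xpath}`;
    default: {
      const unreachable: never = action;
      return unreachable;
    }
  }
}

// Example plan for "Add the item to the cart" (hypothetical XPaths).
const plan: BrowserAction[] = [
  { type: "AssignAttribute", xpath: "//input[@name='quantity']", attribute: "value", value: "2" },
  { type: "Click", xpath: "//button[@id='add-to-cart']" },
  { type: "CopyText", xpath: "//span[@class='cart-count']" },
];
plan.forEach((a) => console.log(describe(a)));
```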

After translating these BrowserAction objects into Puppeteer code, Bytebot executes that code with Puppeteer.
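The translation step can be sketched as plain code generation: each action becomes one line of Puppeteer source. This is a hypothetical illustration, not Bytebot's translator. The function names are invented, and the `attribute`/`value` fields are assumptions. The emitted script also assumes a Puppeteer version that accepts `xpath/`-prefixed selectors in `page.click` and `page.$eval`. Check your version, since newer releases also document a `::-p-xpath(...)` selector form. AssignAttribute is emitted as a DOM property assignment so that input fields display the new value, which is a choice made for this sketch.

```typescript
// Hypothetical BrowserAction shape; `attribute` and `value` are assumed fields.
type BrowserAction =
  | { type: "Click"; xpath: string }
  | { type: "CopyText"; xpath: string }
  | { type: "CopyAttribute"; xpath: string; attribute: string }
  | { type: "AssignAttribute"; xpath: string; attribute: string; value: string }
  | { type: "ExtractTable"; xpath: string };

// Turn an XPath into a quoted Puppeteer selector literal ("xpath/" prefix assumed supported).
const sel = (xpath: string): string => JSON.stringify(`xpath/${xpath}`);

// Translate one action into a line of Puppeteer code; copied values are pushed to `results`.
function toPuppeteerLine(action: BrowserAction): string {
  const s = sel(action.xpath);
  switch (action.type) {
    case "Click":
      return `await page.click(${s});`;
    case "CopyText":
      return `results.push(await page.$eval(${s}, (el) => el.innerText));`;
    case "CopyAttribute":
      return `results.push(await page.$eval(${s}, (el, name) => el.getAttribute(name), ${JSON.stringify(action.attribute)}));`;
    case "AssignAttribute":
      // Assigns a DOM property (not setAttribute) so inputs show the new value.
      return `await page.$eval(${s}, (el, name, value) => { el[name] = value; }, ${JSON.stringify(action.attribute)}, ${JSON.stringify(action.value)});`;
    case "ExtractTable":
      // Reads an HTML <table> into an array of rows of cell text.
      return `results.push(await page.$eval(${s}, (t) => Array.from(t.rows, (r) => Array.from(r.cells, (c) => c.innerText))));`;
    default: {
      const unreachable: never = action;
      return unreachable;
    }
  }
}

// Wrap the generated lines in a standalone Node.js Puppeteer script.
function toPuppeteerScript(actions: BrowserAction[], url: string): string {
  return [
    'const puppeteer = require("puppeteer");',
    "(async () => {",
    "  const browser = await puppeteer.launch();",
    "  const page = await browser.newPage();",
    `  await page.goto(${JSON.stringify(url)});`,
    "  const results = [];",
    ...actions.map((a) => `  ${toPuppeteerLine(a)}`),
    "  console.log(JSON.stringify(results));",
    "  await browser.close();",
    "})();",
  ].join("\n");
}
```

Generating source text rather than calling Puppeteer directly keeps the output inspectable. The same action array can be reviewed, saved, or re-run without invoking the LLM again.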