Using Bytebot

Core Concepts

Bytebot operates on a simple premise: it takes the content of a web page and a user-defined prompt and translates them into a sequence of executable actions for web scraping or automation. This section explains how Bytebot processes its inputs and describes the main components of its output.

Page and Prompt Processing

Interaction with Bytebot begins when you provide a page and a descriptive prompt, such as “Add the product to the cart.” Bytebot uses Large Language Models (LLMs) to interpret the prompt in the context of the page DOM. To do so, it parses the DOM into a semantic representation that the LLM can interpret, which lets Bytebot determine the intended action from the prompt given the structure and content of that specific page.
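The idea of turning a DOM into an LLM-readable representation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SimpleNode` type, the `KEPT_ATTRS` list, and the `summarize` and `buildPrompt` functions are hypothetical and do not describe Bytebot's actual internal format, which this page does not document.

```typescript
// Hypothetical, simplified stand-in for a DOM node (not Bytebot's real type).
interface SimpleNode {
  tag: string;
  attrs?: Record<string, string>;
  text?: string;
  children?: SimpleNode[];
}

// Attributes assumed to carry meaning for an LLM; the choice is illustrative.
const KEPT_ATTRS = ["id", "name", "type", "aria-label", "href"];

// Flatten a node tree into indented, one-line-per-element summaries,
// dropping attributes that are not in KEPT_ATTRS.
function summarize(node: SimpleNode, depth = 0): string[] {
  const attrs = node.attrs ?? {};
  const kept = Object.keys(attrs)
    .filter((key) => KEPT_ATTRS.indexOf(key) >= 0)
    .map((key) => `${key}="${attrs[key]}"`)
    .join(" ");
  const head = kept ? `${node.tag} ${kept}` : node.tag;
  const text = node.text ? node.text.trim() : "";
  const lines = ["  ".repeat(depth) + `<${head}>` + (text ? ` ${text}` : "")];
  for (const child of node.children ?? []) {
    lines.push(...summarize(child, depth + 1));
  }
  return lines;
}

// Combine the page summary with the user's prompt into a single LLM input.
function buildPrompt(root: SimpleNode, instruction: string): string {
  return `Page:\n${summarize(root).join("\n")}\n\nTask: ${instruction}`;
}
```

A compact summary like this keeps the identifying attributes and visible text of each element while discarding styling noise, which is one plausible way to fit a page into an LLM's context alongside the prompt.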

Decision Making with LLMs

Upon receiving the page and prompt, Bytebot uses LLMs to analyze the request and determine the most appropriate course of action. This decision draws on the LLM’s understanding of typical web interactions and on its analysis of the semantic representation of the page content. Bytebot continues to refine its models, with the aim of interpreting prompts more accurately and producing more effective action sequences over time.

BrowserAction Objects

The outcome of Bytebot’s processing is a sequence of BrowserAction objects, each representing a discrete action that can be executed within a web browser environment. These objects are the building blocks Bytebot uses to translate high-level prompts into specific, actionable Puppeteer code. The types of BrowserAction objects include:

  • Click: Clicks a specified element within the web page.
  • CopyText: Copies the text content of a specified element.
  • CopyAttribute: Copies the value of a specified attribute from an element, such as the src attribute of an image.
  • AssignAttribute: Assigns a new value to an attribute of a specified element (e.g., entering text into a search bar).
  • ExtractTable: Extracts tabular data from the page and organizes it into a structured format that is easy to use or analyze.

These BrowserAction objects encapsulate the specifics of each action, including the target elements and any required parameters, facilitating their direct execution as Puppeteer commands by the Bytebot SDK.
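To make the shape of such a sequence concrete, here is a minimal sketch of the five action types as a TypeScript union, together with a small executor. The field names (`selector`, `attribute`, `value`), the `PageAdapter` interface, and the `runActions` function are assumptions made for this illustration; they are not the Bytebot SDK's actual types or API, and `PageAdapter` is a hypothetical abstraction rather than Puppeteer's `Page`.

```typescript
// Hypothetical shapes for the five action types listed above.
type BrowserAction =
  | { type: "Click"; selector: string }
  | { type: "CopyText"; selector: string }
  | { type: "CopyAttribute"; selector: string; attribute: string }
  | { type: "AssignAttribute"; selector: string; attribute: string; value: string }
  | { type: "ExtractTable"; selector: string };

// Hypothetical thin wrapper over a browser page; not a real Puppeteer interface.
interface PageAdapter {
  click(selector: string): Promise<void>;
  getText(selector: string): Promise<string>;
  getAttribute(selector: string, name: string): Promise<string | null>;
  setAttribute(selector: string, name: string, value: string): Promise<void>;
  getTable(selector: string): Promise<string[][]>;
}

// Value produced by one action; null for actions that only have side effects.
type ActionResult = string | string[][] | null;

// Run the actions in order and collect one result per action.
async function runActions(
  page: PageAdapter,
  actions: BrowserAction[],
): Promise<ActionResult[]> {
  const results: ActionResult[] = [];
  for (const action of actions) {
    switch (action.type) {
      case "Click":
        await page.click(action.selector);
        results.push(null);
        break;
      case "CopyText":
        results.push(await page.getText(action.selector));
        break;
      case "CopyAttribute":
        results.push(await page.getAttribute(action.selector, action.attribute));
        break;
      case "AssignAttribute":
        await page.setAttribute(action.selector, action.attribute, action.value);
        results.push(null);
        break;
      case "ExtractTable":
        results.push(await page.getTable(action.selector));
        break;
    }
  }
  return results;
}
```

In a real setup, an adapter along these lines could plausibly be backed by Puppeteer calls such as `page.click` and `page.$eval`. Keeping the executor behind a small interface also makes it possible to test an action sequence against a fake page without launching a browser.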

Through this approach, Bytebot abstracts away the complexity of web automation and scraping tasks. Developers can focus on defining their objectives at the semantic level, while Bytebot handles the details of translating those intents into Puppeteer code.