Core Concepts

Bytebot differs from typical web scraping applications. Because it uses large language models (LLMs) to heuristically generate testing actions, it introduces a few unique concepts.

On this page, we’ll break down Bytebot’s core structure.

What is Bytebot?

Bytebot leverages LLMs to interpret a user-provided natural language prompt (e.g., “Click on the cart”). In the background, Bytebot generates a set of actions to automate a user session. This includes both interacting with page elements and extracting information.

Bytebot can be used for robust testing, web scraping, and various other browser automation tasks.

How does Bytebot work?

Bytebot manages four layers of processing to drive automation through natural language queries. These layers are (i) ingesting natural language prompts, (ii) defining output structure, (iii) generating browser actions, and (iv) executing the browser actions.
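To make the four layers concrete, here is a minimal sketch of how they could be chained. It is not Bytebot's actual implementation or API: every function name, the dictionary shape of an action, and the `stub_model` that stands in for the LLM are assumptions made for illustration.

```python
# Hypothetical sketch of the four layers; none of these names come from Bytebot.

def ingest_prompt(prompt: str) -> str:
    """Layer (i): normalize the user's natural language prompt."""
    return " ".join(prompt.split())

def define_output_structure() -> dict:
    """Layer (ii): describe the shape every generated action must have."""
    return {"required_keys": ["type", "xpath"]}

def stub_model(prompt: str, structure: dict) -> list:
    """Stand-in for an LLM call: returns a fixed action for cart prompts."""
    if "cart" in prompt.lower():
        return [{"type": "click", "xpath": "//a[@id='cart']"}]
    return []

def generate_actions(prompt: str, structure: dict, model=stub_model) -> list:
    """Layer (iii): ask the model for actions in the requested structure."""
    return model(prompt, structure)

def execute_actions(actions: list) -> list:
    """Layer (iv): stand-in executor that logs instead of driving a browser."""
    return [f"{a['type']} -> {a['xpath']}" for a in actions]

def run(prompt: str) -> list:
    """Chain the four layers in order."""
    cleaned = ingest_prompt(prompt)
    structure = define_output_structure()
    actions = generate_actions(cleaned, structure)
    return execute_actions(actions)

log = run("  Click on   the cart ")
print(log)
```

Keeping each layer as its own function means the model call can be swapped for a stub, as done here, without touching the other three stages.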

Because AI models need strict instructions to deliver reliable results, Bytebot combines tightly defined outputs with open-ended user prompts to predictably generate actions.
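One way to picture a "tightly defined output" is a strict check applied to whatever the model returns. The sketch below is an assumption, not Bytebot's real validation: the JSON format, the `ALLOWED_TYPES` set, and the function name `parse_model_output` are hypothetical. The two fields, `type` and `xpath`, mirror the action type and XPath that browser actions are described as containing.

```python
import json

# Hypothetical set of action types; the real set is not given in this document.
ALLOWED_TYPES = {"click", "extract_text"}

def parse_model_output(raw: str) -> list:
    """Accept the model's raw text only if it matches the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of actions")
    for i, item in enumerate(data):
        # Each action must have exactly the two expected keys.
        if not isinstance(item, dict) or set(item) != {"type", "xpath"}:
            raise ValueError(f"action {i}: expected exactly 'type' and 'xpath'")
        if item["type"] not in ALLOWED_TYPES:
            raise ValueError(f"action {i}: unknown type {item['type']!r}")
        # Simplifying assumption: only absolute XPath expressions are accepted.
        if not isinstance(item["xpath"], str) or not item["xpath"].startswith("/"):
            raise ValueError(f"action {i}: xpath must start with '/'")
    return data

actions = parse_model_output('[{"type": "click", "xpath": "//a[@id=\'cart\']"}]')
print(actions)
```

The prompt itself stays open-ended. Only the output is constrained, so a malformed or unexpected response is rejected before anything is executed.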

What are browser actions?

Under the hood, Bytebot relies on browser actions. Browser actions are an intermediate representation of a generated automation. Each one has two components: an action type and an XPath (a “path-like” syntax for specifying nodes in the HTML document).
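As an illustration of the two components, the sketch below models a browser action as a small record and resolves its XPath against a tiny, well-formed page. The class names, the action type values, and the `resolve` helper are hypothetical and are not Bytebot's internal types. Python's standard `xml.etree.ElementTree` is used here only because it supports a limited subset of XPath; a real browser evaluates full XPath against the live DOM.

```python
from dataclasses import dataclass
from enum import Enum
import xml.etree.ElementTree as ET

class ActionType(Enum):
    # Hypothetical action types, chosen for illustration.
    CLICK = "click"
    EXTRACT_TEXT = "extract_text"

@dataclass(frozen=True)
class BrowserAction:
    type: ActionType  # what to do
    xpath: str        # which node to do it to

# A tiny well-formed page standing in for a real HTML document.
PAGE = "<html><body><button id='cart'>Cart (2)</button></body></html>"

def resolve(action: BrowserAction, root: ET.Element):
    """Return the first node matched by the action's XPath, or None."""
    return root.find(action.xpath)

root = ET.fromstring(PAGE)
action = BrowserAction(ActionType.EXTRACT_TEXT, ".//button[@id='cart']")
node = resolve(action, root)
print(node.text)  # Cart (2)
```

Separating the action type from the target path keeps each action small and inspectable, which suits an intermediate representation that is generated first and executed later.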

You do not write these actions yourself. Bytebot creates and executes them strictly based on user prompts, and it visualizes them so that you can confirm the right browser actions were created.