The Bytebot SDK enables developers to automate web tasks by translating natural language prompts into executable actions on web pages. This guide provides detailed documentation on the BytebotClient class, its methods, and the various interfaces and enums that are part of the SDK.
BytebotClient
BytebotClient is the primary interface for interacting with the Bytebot SDK, offering methods to execute actions based on natural language prompts or predefined action sets.
Constructor
Instantiate BytebotClient with an optional ClientOptions object for customization. The available properties are listed under ClientOptions in the Interfaces section.
Act Method
The act method allows you to translate a natural language prompt into one or more actions for a specified page. The actions are returned as an array of BrowserAction objects.
Usage
Parameters
prompt
: A string containing the natural language instructions to be converted into actions.
page
: The Page object representing the browser page where the actions will be executed.
options
: Optional. An object of PromptOptions to further customize the prompt execution.
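As a rough illustration of the call shape described above, the sketch below requests actions for a prompt and summarizes them for review. The interfaces are local stand-ins written from this guide, not imports from the SDK; `ActClient`, `planActions`, and `stubClient` are hypothetical names, and the stub returns a canned action instead of contacting the Bytebot API.

```typescript
// Stand-in shapes based on this guide; the real SDK types may differ.
interface BrowserAction {
  type: string;
  xpath: string;
  attribute?: string;
  value?: string;
}

interface PromptOptions {
  parameters?: Record<string, string>;
}

// Only the part of the client used here. Page is generic because this
// sketch does not depend on a particular browser library.
interface ActClient<Page> {
  act(prompt: string, page: Page, options?: PromptOptions): Promise<BrowserAction[]>;
}

// Ask the client for actions, then describe each one as a line of text.
async function planActions<Page>(
  client: ActClient<Page>,
  prompt: string,
  page: Page,
): Promise<string[]> {
  const actions = await client.act(prompt, page);
  return actions.map((a, i) => `${i + 1}. ${a.type} ${a.xpath}`);
}

// Stub client (assumption): returns one fixed action for any prompt.
const stubClient: ActClient<{ url: string }> = {
  async act(_prompt, _page) {
    return [{ type: "Click", xpath: "//button[@id='submit']" }];
  },
};

planActions(stubClient, "Click the submit button", { url: "https://example.com" })
  .then((lines) => console.log(lines.join("\n")));
```

Because act only returns the actions, they can be inspected or logged before being handed to execute.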
Extract Method
The extract method translates a schema describing the data to extract into an action for a specified page. A single action is returned in a BrowserAction array.
Usage
Parameters
schema
: An object representing the structure of what to extract.
page
: The Page object representing the browser page where the actions will be executed.
options
: Optional. An object of PromptOptions to further customize the prompt execution.
Execute Method
The execute method directly executes a list of specified actions on a given page. Depending on the action type, the method will return null or a value.
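The "null or a value" behavior can be sketched with a small dispatcher over an in-memory page. Everything here is illustrative: `FakePage`, `FakeElement`, `executeOne`, and `execute` are hypothetical, and the mapping of action types to results (null for Click and AssignAttribute, a string for CopyText and CopyAttribute) and the one-result-per-action return shape are assumptions rather than documented SDK behavior.

```typescript
type ActionType = "Click" | "CopyText" | "CopyAttribute" | "AssignAttribute";

interface BrowserAction {
  type: ActionType;
  xpath: string;
  attribute?: string;
  value?: string;
}

// Stand-in for a DOM element; a real page would be driven through a browser.
interface FakeElement {
  text: string;
  attributes: Record<string, string>;
  clicks: number;
}

// Maps an XPath string directly to the element it is assumed to select.
type FakePage = Map<string, FakeElement>;

function executeOne(action: BrowserAction, page: FakePage): string | null {
  const el = page.get(action.xpath);
  if (!el) throw new Error(`No element matches ${action.xpath}`);
  switch (action.type) {
    case "Click":
      el.clicks += 1;
      return null; // side effect only
    case "AssignAttribute":
      if (action.attribute === undefined || action.value === undefined) {
        throw new Error("AssignAttribute needs attribute and value");
      }
      el.attributes[action.attribute] = action.value;
      return null; // side effect only
    case "CopyText":
      return el.text;
    case "CopyAttribute":
      if (action.attribute === undefined) {
        throw new Error("CopyAttribute needs attribute");
      }
      return el.attributes[action.attribute] ?? null;
  }
}

// Runs the actions in order and collects one result per action.
function execute(actions: BrowserAction[], page: FakePage): Array<string | null> {
  return actions.map((action) => executeOne(action, page));
}

const demoPage: FakePage = new Map<string, FakeElement>([
  ["//a[@id='next']", { text: "Next", attributes: { href: "/page/2" }, clicks: 0 }],
]);

const demoResults = execute(
  [
    { type: "Click", xpath: "//a[@id='next']" },
    { type: "CopyAttribute", xpath: "//a[@id='next']", attribute: "href" },
  ],
  demoPage,
);
// demoResults[0] is null; demoResults[1] is "/page/2"
console.log(demoResults);
```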
Enums
BrowserActionType
An enumeration of possible action types that BytebotClient can perform, such as copying attributes, assigning attributes, clicking elements, copying text, and extracting tables.
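For orientation, the action types can be modeled as shown below. The member names (CopyAttribute, AssignAttribute, Click, CopyText, ExtractTable) come from this guide; the string values are assumptions, and the const object with a derived union type is a common TypeScript stand-in for an enum declaration, not necessarily how the SDK defines it. `readsFromPage` is a hypothetical helper.

```typescript
// Member names follow this guide; the string values are assumed.
const BrowserActionType = {
  CopyAttribute: "CopyAttribute",
  AssignAttribute: "AssignAttribute",
  Click: "Click",
  CopyText: "CopyText",
  ExtractTable: "ExtractTable",
} as const;

// Union of the values above, usable wherever an action type is expected.
type BrowserActionType = (typeof BrowserActionType)[keyof typeof BrowserActionType];

// Hypothetical helper: true for action types that read data from the page
// rather than interacting with it or changing it.
function readsFromPage(type: BrowserActionType): boolean {
  return (
    type === BrowserActionType.CopyAttribute ||
    type === BrowserActionType.CopyText ||
    type === BrowserActionType.ExtractTable
  );
}

console.log(readsFromPage(BrowserActionType.CopyText)); // true
console.log(readsFromPage(BrowserActionType.Click)); // false
```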
Interfaces
ClientOptions
An interface for customizing the behavior of the BytebotClient:
Properties
apiUrl
: Optional. A string for the URL of the Bytebot API. Defaults to ‘https://api.bytebot.ai’.
apiKey
: Optional. A string for the API key to authenticate with the Bytebot API. If not provided, the API key must be set using the BYTEBOT_API_KEY environment variable.
logVerbose
: Optional. A boolean indicating whether verbose logging should be enabled. Defaults to false.
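The defaults and the environment-variable fallback described above can be sketched as a small resolver. `resolveClientOptions` and `ResolvedClientOptions` are hypothetical names, `env` stands in for `process.env` so the example runs anywhere, and throwing when no key is available is an assumption about how a missing key might be handled.

```typescript
interface ClientOptions {
  apiUrl?: string;
  apiKey?: string;
  logVerbose?: boolean;
}

interface ResolvedClientOptions {
  apiUrl: string;
  apiKey: string;
  logVerbose: boolean;
}

// Hypothetical helper that applies the documented defaults.
function resolveClientOptions(
  options: ClientOptions = {},
  env: Record<string, string | undefined> = {},
): ResolvedClientOptions {
  // An explicit apiKey wins; otherwise fall back to BYTEBOT_API_KEY.
  const apiKey = options.apiKey ?? env["BYTEBOT_API_KEY"];
  if (!apiKey) {
    // Assumed behavior: fail early when no key is available.
    throw new Error("No API key: pass apiKey or set BYTEBOT_API_KEY");
  }
  return {
    apiUrl: options.apiUrl ?? "https://api.bytebot.ai",
    apiKey,
    logVerbose: options.logVerbose ?? false,
  };
}

const resolved = resolveClientOptions({}, { BYTEBOT_API_KEY: "example-key" });
console.log(resolved.apiUrl); // https://api.bytebot.ai
console.log(resolved.logVerbose); // false
```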
PromptOptions
An interface for customizing the behavior of the act and extract methods:
Properties
parameters
: Optional. An object containing key-value pairs for passing sensitive information securely.
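One way to use parameters is to keep secret values out of the prompt text and supply them separately. The sketch below assumes that the prompt refers to values by name while the values themselves travel in parameters; how the SDK actually matches the two is not specified in this guide. `secretsToOptions` is a hypothetical helper, and the environment object is a stand-in for `process.env`.

```typescript
interface PromptOptions {
  parameters?: Record<string, string>;
}

// Hypothetical helper: collect named secrets from an environment-like
// object into PromptOptions, failing early if one is missing.
function secretsToOptions(
  names: string[],
  env: Record<string, string | undefined>,
): PromptOptions {
  const parameters: Record<string, string> = {};
  for (const name of names) {
    const value = env[name];
    if (value === undefined) throw new Error(`Missing secret: ${name}`);
    parameters[name] = value;
  }
  return { parameters };
}

const loginPrompt = "Log in with the provided username and password";
const loginOptions = secretsToOptions(["username", "password"], {
  username: "alice",
  password: "s3cret",
});

// The prompt names the values but never contains them.
console.log(loginPrompt.includes("s3cret")); // false
```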
BrowserAction
Defines the structure for individual actions, including their type, target, and parameters:
Properties
type
: ActionDetailActionType. Specifies the type of action, such as Click, CopyText, or ExtractTable.
xpath
: The XPath expression targeting the element(s) the action is to be performed on.
attribute
: Required for CopyAttribute and AssignAttribute. The name of the attribute to read or assign.
value
: Required for AssignAttribute. The value to assign to the specified attribute.
rows
: Required for ExtractTable. A table represented as a 2D array of ExtractTableColumn objects that specify the cells to be extracted.
ExtractTableColumn
Details for individual table columns in an extraction action:
Properties
name
: The name assigned to the column, which will be used as the key in the dictionary returned.
action
: BrowserAction. Specifies the action to perform on the column. Supported actions include CopyAttribute and CopyText.
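To make the rows structure concrete, the sketch below builds an ExtractTable action with two rows and two columns, then turns it into one dictionary per row keyed by column name. The types are local stand-ins written from this guide; `extractTable` and `readCell` are hypothetical, the XPath strings are made up, and the reader returns canned values instead of querying a page.

```typescript
type CellActionType = "CopyText" | "CopyAttribute";

interface CellAction {
  type: CellActionType;
  xpath: string;
  attribute?: string;
}

interface ExtractTableColumn {
  name: string;
  action: CellAction;
}

interface ExtractTableAction {
  type: "ExtractTable";
  xpath: string;
  rows: ExtractTableColumn[][];
}

// Hypothetical: read every cell with the supplied reader and key each
// value by its column name, producing one record per row.
function extractTable(
  action: ExtractTableAction,
  readCell: (cell: CellAction) => string | null,
): Array<Record<string, string | null>> {
  return action.rows.map((row) => {
    const record: Record<string, string | null> = {};
    for (const column of row) {
      record[column.name] = readCell(column.action);
    }
    return record;
  });
}

const tableAction: ExtractTableAction = {
  type: "ExtractTable",
  xpath: "//table[@id='products']",
  rows: [1, 2].map((i): ExtractTableColumn[] => [
    { name: "title", action: { type: "CopyText", xpath: `//tr[${i}]/td[1]` } },
    {
      name: "link",
      action: { type: "CopyAttribute", xpath: `//tr[${i}]/td[2]/a`, attribute: "href" },
    },
  ]),
};

// Canned cell values keyed by XPath, standing in for a live page.
const cells: Record<string, string> = {
  "//tr[1]/td[1]": "Widget",
  "//tr[1]/td[2]/a": "/widget",
  "//tr[2]/td[1]": "Gadget",
  "//tr[2]/td[2]/a": "/gadget",
};

const table = extractTable(tableAction, (cell) => cells[cell.xpath] ?? null);
// First record: { title: "Widget", link: "/widget" }
console.log(table);
```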