Using a Remote Browser

Create an Action

In this guide, you will create and execute a simple action on a Bytebot-managed remote browser. To create an action on a local Puppeteer instance instead, see the guide for using a local Puppeteer instance.

Bytebot’s Act Function

Bytebot can convert a natural language prompt into an array of BrowserActions. To do this, call the async BytebotClient.browser.act(options: Object) function, which accepts the following options:

| Attribute | Required | Description |
|---|---|---|
| sessionId | Required | A String that identifies the targeted remote browser session. |
| prompt | Required | A natural language String that describes the intended action. |
| url | Optional | A String giving the URL on which the actions should be generated and executed. If omitted, Bytebot defaults to the browser’s last URL. |
| pageId | Optional | A String identifying the page (virtual tab) on which the actions should be generated and executed. If omitted, Bytebot defaults to the last page interacted with. |
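The options in the table can be summarized as a TypeScript type. The sketch below assumes the field names and optionality listed above. The ActOptions interface and buildActOptions helper are hypothetical illustrations, not confirmed exports of the Bytebot SDK. pageId is typed as string | number because the table describes a String while the sample output in this guide shows a numeric pageId.

```typescript
// Hypothetical type mirroring the option table; not taken from the Bytebot SDK source.
interface ActOptions {
  sessionId: string; // required: the targeted remote browser session
  prompt: string; // required: natural language description of the action
  url?: string; // optional: defaults to the browser's last URL
  pageId?: string | number; // optional: defaults to the last page interacted with
}

// Hypothetical helper: validates the required fields and leaves out any
// optional field that was not provided, so Bytebot's defaults apply.
function buildActOptions(
  sessionId: string,
  prompt: string,
  extra: { url?: string; pageId?: string | number } = {},
): ActOptions {
  if (!sessionId) throw new Error("sessionId is required");
  if (!prompt.trim()) throw new Error("prompt must not be empty");

  const options: ActOptions = { sessionId, prompt };
  if (extra.url !== undefined) options.url = extra.url;
  if (extra.pageId !== undefined) options.pageId = extra.pageId;
  return options;
}

// Example: only the required options, so url and pageId fall back to their defaults.
console.log(buildActOptions("my-session-id", "click on the Get Started button"));
```

Omitting unset optional fields, instead of passing them as undefined, leaves the choice of defaults entirely to Bytebot.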

To begin, write a concise and clear prompt:

```typescript
const prompt = "click on the Get Started button";
```

Then, start a session and call act, passing in the prompt text and the session ID:

```typescript
import "dotenv/config"; // loads BYTEBOT_API_KEY from a .env file
import { BytebotClient } from "./src/index.ts";

const bytebot = new BytebotClient({
  apiKey: process.env.BYTEBOT_API_KEY,
});

async function run() {
  // Start a remote browser session on the given URL
  const browser = await bytebot.browser.startSession(
    "https://developer.chrome.com/"
  );

  if (browser.sessionId) {
    try {
      const prompt = "click on the Get Started button";
      // Generate and execute actions for the prompt in this session
      const actions = await bytebot.browser.act({
        sessionId: browser.sessionId,
        prompt: prompt,
      });
      console.log("Actions", actions);
    } finally {
      // End the session even if act throws
      await bytebot.browser.endSession(browser.sessionId);
    }
  }
}

run().catch(console.error);
```

If you run this code, you should get a printout in your console similar to:

```text
Actions {
  sessionId: '15e95c04-3c4c-4104-b072-63f8017f127a',
  actions: [
    {
      type: 'Click',
      xpath: '/html/body/section/section/main/devsite-content/article/div[3]/section[1]/div/header/div[2]/a'
    }
  ],
  pages: [ { pageId: 0, url: 'https://developer.chrome.com/' } ]
}
```

The actions array contains BrowserAction objects, each with a type (e.g., “Click” or “Hover”) and an xpath that identifies the targeted DOM element. The pages array is akin to the browser’s open tabs; each entry has a pageId and a url.
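To make that response shape concrete, the following sketch types the sample output above and prints one line per action and per page. The ActResult, BrowserAction, and Page types and the describeActions and describePages helpers are assumptions inferred from the printed output, not the SDK's published definitions. BrowserAction here models only the two fields shown.

```typescript
// Hypothetical types inferred from the sample console output; the real SDK may define more fields.
type PageId = string | number; // the sample output shows a numeric pageId

interface BrowserAction {
  type: string; // e.g. "Click" or "Hover"
  xpath: string; // XPath of the targeted DOM element
}

interface Page {
  pageId: PageId;
  url: string;
}

interface ActResult {
  sessionId: string;
  actions: BrowserAction[];
  pages: Page[];
}

// Formats each generated action as "<index>. <type> <xpath>".
function describeActions(result: ActResult): string[] {
  return result.actions.map((action, i) => `${i + 1}. ${action.type} ${action.xpath}`);
}

// Formats each open page (tab) as "<pageId>: <url>".
function describePages(result: ActResult): string[] {
  return result.pages.map((page) => `${page.pageId}: ${page.url}`);
}

// The sample output from above, typed as an ActResult.
const sample: ActResult = {
  sessionId: "15e95c04-3c4c-4104-b072-63f8017f127a",
  actions: [
    {
      type: "Click",
      xpath: "/html/body/section/section/main/devsite-content/article/div[3]/section[1]/div/header/div[2]/a",
    },
  ],
  pages: [{ pageId: 0, url: "https://developer.chrome.com/" }],
};

for (const line of [...describeActions(sample), ...describePages(sample)]) {
  console.log(line);
}
```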

Changing the URL

By default, Bytebot generates and executes the browser actions on the browser’s last URL. To use a different URL, pass it as the url option to act:

```typescript
import "dotenv/config"; // loads BYTEBOT_API_KEY from a .env file
import { BytebotClient } from "./src/index.ts";

const bytebot = new BytebotClient({
  apiKey: process.env.BYTEBOT_API_KEY,
});

async function run() {
  const browser = await bytebot.browser.startSession(
    "https://developer.chrome.com/"
  );

  if (browser.sessionId) {
    try {
      const prompt = "click on the Get Started button";
      const actions = await bytebot.browser.act({
        sessionId: browser.sessionId,
        prompt: prompt,
        // Generate and execute the actions on this URL instead of the last one
        url: "https://developers.google.com/maps",
      });
      console.log("Actions", actions);
    } finally {
      // End the session even if act throws
      await bytebot.browser.endSession(browser.sessionId);
    }
  }
}

run().catch(console.error);
```

If you specify a URL that is the same as the previous URL, the page will still reload.
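Because an identical URL still triggers a reload, one option is to pass url only when it differs from the page's current URL. This is a minimal sketch, assuming you track the current URL yourself (for example, from the pages array of a previous act call). urlOptionIfChanged and normalizeUrl are hypothetical helpers. Comparing URLs through the standard URL class is a design choice that treats https://developer.chrome.com and https://developer.chrome.com/ as equal.

```typescript
// Normalizes a URL for comparison; falls back to the raw string if it cannot be parsed.
function normalizeUrl(raw: string): string {
  try {
    return new URL(raw).href;
  } catch {
    return raw;
  }
}

// Hypothetical helper: returns { url } only when the target differs from the
// current URL. Spreading the result into the act options skips the reload
// that passing an identical URL would otherwise cause.
function urlOptionIfChanged(currentUrl: string | undefined, targetUrl: string): { url?: string } {
  if (currentUrl !== undefined && normalizeUrl(currentUrl) === normalizeUrl(targetUrl)) {
    return {};
  }
  return { url: targetUrl };
}

// Usage with the client from the examples above (not executed here):
// await bytebot.browser.act({
//   sessionId: browser.sessionId,
//   prompt,
//   ...urlOptionIfChanged(currentUrl, "https://developers.google.com/maps"),
// });
console.log(urlOptionIfChanged("https://developer.chrome.com/", "https://developers.google.com/maps"));
```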

Specifying a Page

If you have multiple pages open, you can use the pageId option to specify which page the browser actions run on:

```typescript
// PAGE_ID is a placeholder for a pageId from the pages array returned by act
const actions = await bytebot.browser.act({
  sessionId: browser.sessionId,
  prompt: prompt,
  pageId: PAGE_ID,
});
```
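The PAGE_ID placeholder stands for a pageId from the pages array returned by act. Here is a minimal sketch of looking one up by URL prefix. findPageId is a hypothetical helper, and the Page type is inferred from the sample output rather than taken from the SDK.

```typescript
// Shape of one entry in the pages array, inferred from the sample output.
// The option table calls pageId a String, but the sample shows a number, so both are allowed.
interface Page {
  pageId: string | number;
  url: string;
}

// Hypothetical helper: returns the pageId of the first open page whose URL
// starts with the given prefix, or undefined if no page matches.
function findPageId(pages: Page[], urlPrefix: string): string | number | undefined {
  return pages.find((page) => page.url.startsWith(urlPrefix))?.pageId;
}

// Example with the pages array from the sample output above.
const openPages: Page[] = [{ pageId: 0, url: "https://developer.chrome.com/" }];
const chromePageId = findPageId(openPages, "https://developer.chrome.com");
if (chromePageId !== undefined) {
  console.log("Found page", chromePageId);
}
```

Check the result against undefined rather than for truthiness, because the sample output shows a pageId of 0, which is falsy in JavaScript.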

You can also specify both the pageId and the url for that page in a single act call:

```typescript
// PAGE_ID and "newUrl.org" are placeholders for your own page ID and URL
const actions = await bytebot.browser.act({
  sessionId: browser.sessionId,
  prompt: prompt,
  url: "newUrl.org",
  pageId: PAGE_ID,
});
```