Browser Use

Give an agent full control of a real browser. Navigate, click, type, scrape, fill forms, screenshot — using the user's signed-in Rush app browser, not a throwaway sandbox.

Why this matters

OpenAI Operator and Claude Computer Use spin up their own sandboxed browsers. Every run starts logged-out. You re-authenticate to every site, every time.

rush-browser drives the user's in-app Rush browser — their logged-in Gmail, their signed-in GitHub, their active Notion workspace. OAuth sessions, cookies, and auth state are already there. The agent just acts.

The twelve browser tools

Under the hood, rush-browser is a thin wrapper over twelve Host Browser tools in harness/tools/browser. Any agent can pull them in directly.

| Tool | Purpose |
| --- | --- |
| browser_navigate | Open a URL in the browser |
| browser_screenshot | Capture the visible page as an image |
| browser_snapshot | Get the accessibility tree with element refs for interaction |
| browser_get_text | Extract page content as clean markdown |
| browser_get_html | Get raw DOM HTML |
| browser_click | Click an element by ref |
| browser_type | Type text into an input by ref |
| browser_scroll | Scroll the page |
| browser_evaluate | Run JavaScript against the page (read values, inspect state) |
| browser_wait_for_selector | Block until an element appears |
| browser_wait_for_navigation | Block until the URL changes |
| browser_notify | Surface a page-side alert to the user |

Use it in your own agent

Declare the browser_tools tool group in agent.yaml. One line expands to all twelve tools:

agent.yaml
tools:
  - name: browser_tools
  - name: artifacts

The agent decides when to navigate, snapshot, click, or type. A good prompt gives it a mental model of the browsing loop:

prompt.yaml
- name: browsing_loop
  content: |
    When you need to interact with a page:
    1. navigate to the URL
    2. snapshot to see the accessibility tree and get element refs
    3. click / type using those refs (they change after every mutation —
       re-snapshot if the page updates)
    4. get_text or screenshot to capture results
    5. save findings to an artifact

Refs from browser_snapshot are ephemeral. They become invalid after any DOM change. Always re-snapshot after a click or form submission if you plan to interact further.
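
The re-snapshot rule can be sketched in Python against an in-memory stand-in. `FakeBrowser`, `StaleRefError`, and `click_by_label` are hypothetical names invented for this illustration, and the generation-stamped ref format is an assumption used only to make staleness detectable; the real Host Browser may represent refs differently.

```python
class StaleRefError(Exception):
    """Raised when a ref from an older snapshot is used."""


class FakeBrowser:
    """In-memory stand-in for the Host Browser (hypothetical, for illustration)."""

    def __init__(self):
        self.generation = 0      # bumped on every simulated DOM mutation
        self.url = None
        self.clicked = []

    def navigate(self, url):
        self.url = url
        self.generation += 1

    def snapshot(self):
        # Refs encode the generation, so old refs are detectably stale.
        g = self.generation
        return {"Submit": f"g{g}-e1", "Email": f"g{g}-e2"}

    def click(self, ref):
        if not ref.startswith(f"g{self.generation}-"):
            raise StaleRefError(ref)
        self.clicked.append(ref)
        self.generation += 1     # treat every click as a DOM mutation


def click_by_label(browser, label):
    """Re-snapshot immediately before acting so the ref is always fresh."""
    refs = browser.snapshot()
    browser.click(refs[label])


browser = FakeBrowser()
browser.navigate("https://example.com/contact")
stale = browser.snapshot()["Submit"]   # ref taken before any click
click_by_label(browser, "Email")       # mutates the page
try:
    browser.click(stale)               # old ref is no longer valid
    stale_rejected = False
except StaleRefError:
    stale_rejected = True
click_by_label(browser, "Submit")      # fresh snapshot, succeeds
```

The design choice worth noting is that the helper never accepts a ref from the caller: it looks the element up in a fresh snapshot each time, so a ref cannot outlive the page state it was taken from.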

Or delegate to rush-browser

If your agent only occasionally needs the web, don't carry the browser tools yourself. Delegate to the rush-browser system agent instead:

agent.yaml
tools:
  - name: delegate
    config:
      max_depth: 2

prompt.yaml
- name: available_agents
  content: |
    You can delegate web tasks to:
    - **rush-browser**: AI co-pilot for the web. Give it a goal
      ("fill the contact form at X with these values") and it handles
      navigation, snapshotting, clicking, and reporting back.

At call time:

delegate(
  agent_id: "rush-browser",
  task: "Open https://example.com/contact, fill the form with name=Jane, [email protected], message='hello', and submit it. Report the confirmation page back."
)

rush-browser is a system agent — hidden from the store, pre-installed with every Rush app, and callable from any agent that has the delegate tool enabled.
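
A delegated task works best when it states the whole goal in one message, since rush-browser handles the navigation on its own. The Python sketch below composes such a task string from a URL and a dictionary of form values. `build_task` is a hypothetical helper, and the `call` dictionary only mirrors the two arguments (`agent_id`, `task`) shown in the delegate call above; how your runtime actually invokes `delegate` is not covered here.

```python
def build_task(url, fields, report="Report the confirmation page back."):
    """Compose a self-contained goal for a delegated browser agent."""
    pairs = ", ".join(f"{key}={value!r}" for key, value in fields.items())
    return f"Open {url}, fill the form with {pairs}, and submit it. {report}"


# Hypothetical payload shape, mirroring delegate(agent_id, task).
call = {
    "agent_id": "rush-browser",
    "task": build_task(
        "https://example.com/contact",
        {"name": "Jane", "message": "hello"},
    ),
}
```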

How it actually runs

Tool calls from the agent are routed over IPC to the Host Browser — the Rush app's in-app browser panel. The user sees the agent's clicks, scrolls, and typing live, in the same browser they're already signed into.

Figure: Tool call path · agent → IPC → Host Browser. The agent (rush/cli) issues browser_click(ref: "e42"); the IPC bridge carries the call to the Host Browser inside the Rush app (signed in, the user's working browser tab); the DOM is updated and the result is returned.

Because the Host Browser is the user's actual working browser, the agent inherits logged-in state everywhere — Gmail, GitHub, Notion, internal dashboards, anything behind a cookie.
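
The routing described above can be pictured as a request/response exchange. The Python sketch below is illustrative only: the JSON message shape (`id`, `tool`, `args`, `ok`, `result`), the `encode_call` helper, and the `HostBrowserStub` class are all assumptions, since the actual IPC wire format is not documented here. Only the tool name and the `ref: "e42"` argument come from the example call path.

```python
import json


def encode_call(call_id, tool, args):
    """Serialize a tool call as one JSON message (hypothetical wire format)."""
    return json.dumps({"id": call_id, "tool": tool, "args": args})


class HostBrowserStub:
    """Stands in for the Rush app side of the bridge."""

    def __init__(self):
        # Map tool names to handlers; only one tool is stubbed here.
        self.handlers = {"browser_click": self._click}

    def _click(self, args):
        return {"clicked": args["ref"]}

    def handle(self, raw):
        msg = json.loads(raw)
        handler = self.handlers.get(msg["tool"])
        if handler is None:
            reply = {"id": msg["id"], "ok": False, "error": "unknown tool"}
        else:
            reply = {"id": msg["id"], "ok": True, "result": handler(msg["args"])}
        return json.dumps(reply)


host = HostBrowserStub()
reply = json.loads(host.handle(encode_call(1, "browser_click", {"ref": "e42"})))
unknown = json.loads(host.handle(encode_call(2, "browser_fly", {})))
```

Echoing the call `id` in each reply is one common way to let the agent side match results to in-flight calls; whether the real bridge does this is not stated in this page.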

What this unlocks

| Task | Pattern |
| --- | --- |
| Fill a reservation form on OpenTable | rush-browser: delegate with the goal; it handles navigation and form completion |
| Scrape structured data from a signed-in dashboard | browser_tools in your agent: navigate + snapshot + get_text + save to artifact |
| Watch a long page for something to load | browser_wait_for_selector: blocks until the element exists, no polling |
| Pull a value out of client-side state | browser_evaluate: run JS, return the result |
