Browser Use

Give an agent full control of a real browser. Navigate, click, type, scrape, fill forms, screenshot — using the user's signed-in Rush app browser, not a throwaway sandbox.

Why this matters#

OpenAI Operator and Claude Computer Use spin up their own sandboxed browsers. Every run starts logged-out. You re-authenticate to every site, every time.

rush-browser drives the user's in-app Rush browser — their logged-in Gmail, their signed-in GitHub, their active Notion workspace. OAuth sessions, cookies, and auth state are already there. The agent just acts.

The twelve browser tools#

Under the hood, rush-browser is a thin wrapper over twelve Host Browser tools in harness/tools/browser. Any agent can pull them in directly.

Tool	Purpose
`browser_navigate`	Open a URL in the browser
`browser_screenshot`	Capture the visible page as an image
`browser_snapshot`	Get the accessibility tree with element refs for interaction
`browser_get_text`	Extract page content as clean markdown
`browser_get_html`	Get raw DOM HTML
`browser_click`	Click an element by ref
`browser_type`	Type text into an input by ref
`browser_scroll`	Scroll the page
`browser_evaluate`	Run JavaScript against the page (read values, inspect state)
`browser_wait_for_selector`	Block until an element appears
`browser_wait_for_navigation`	Block until the URL changes
`browser_notify`	Surface a page-side alert to the user

Use it in your own agent#

Declare the browser_tools tool group in agent.yaml. One line expands to all twelve tools:

agent.yaml

tools:
  - name: browser_tools
  - name: artifacts

The agent decides when to navigate, snapshot, click, or type. A good prompt gives it a mental model of the browsing loop:

prompt.yaml

- name: browsing_loop
  content: |
    When you need to interact with a page:
    1. navigate to the URL
    2. snapshot to see the accessibility tree and get element refs
    3. click / type using those refs (they change after every mutation —
       re-snapshot if the page updates)
    4. get_text or screenshot to capture results
    5. save findings to an artifact

Refs from browser_snapshot are ephemeral. They become invalid after any DOM change. Always re-snapshot after a click or form submission if you plan to interact further.

Or delegate to rush-browser#

If your agent only occasionally needs the web, don't carry the browser tools yourself. Delegate to the rush-browser system agent instead:

agent.yaml

tools:
  - name: delegate
    config:
      max_depth: 2

prompt.yaml

- name: available_agents
  content: |
    You can delegate web tasks to:
    - **rush-browser**: AI co-pilot for the web. Give it a goal
      ("fill the contact form at X with these values") and it handles
      navigation, snapshotting, clicking, and reporting back.

At call time:

delegate(
  agent_id: "rush-browser",
  task: "Open https://example.com/contact, fill the form with name=Jane, [email protected], message='hello', and submit it. Report the confirmation page back."
)

rush-browser is a system agent — hidden from the store, pre-installed with every Rush app, and callable from any agent that has the delegate tool enabled.

How it actually runs#

Tool calls from the agent are routed over IPC to the Host Browser — the Rush app's in-app browser panel. The user sees the agent's clicks, scrolls, and typing live, in the same browser they're already signed into.

Agent

rush/cli

browser_click(ref: "e42")

IPC bridge

Rush app

Host Browsersigned in

user's working browser tab

DOM updated · result returned

Tool call path · agent → IPC → Host Browser

Because the Host Browser is the user's actual working browser, the agent inherits logged-in state everywhere — Gmail, GitHub, Notion, internal dashboards, anything behind a cookie.

What this unlocks#

Task	Pattern
Fill a reservation form on OpenTable	`rush-browser` — delegate with the goal, it handles navigation and form completion
Scrape structured data from a signed-in dashboard	`browser_tools` in your agent — navigate + snapshot + get_text + save to artifact
Watch a long page for something to load	`browser_wait_for_selector` — blocks until the element exists, no polling
Pull a value out of client-side state	`browser_evaluate` — run JS, return the result