Browser Use
Give an agent full control of a real browser. Navigate, click, type, scrape, fill forms, screenshot — using the user's signed-in Rush app browser, not a throwaway sandbox.
Why this matters#
OpenAI Operator and Claude Computer Use spin up their own sandboxed browsers. Every run starts logged-out. You re-authenticate to every site, every time.
rush-browser drives the user's in-app Rush browser — their logged-in Gmail, their signed-in GitHub, their active Notion workspace. OAuth sessions, cookies, and auth state are already there. The agent just acts.
The twelve browser tools#
Under the hood, rush-browser is a thin wrapper over twelve Host Browser tools in harness/tools/browser. Any agent can pull them in directly.
| Tool | Purpose |
|---|---|
browser_navigate | Open a URL in the browser |
browser_screenshot | Capture the visible page as an image |
browser_snapshot | Get the accessibility tree with element refs for interaction |
browser_get_text | Extract page content as clean markdown |
browser_get_html | Get raw DOM HTML |
browser_click | Click an element by ref |
browser_type | Type text into an input by ref |
browser_scroll | Scroll the page |
browser_evaluate | Run JavaScript against the page (read values, inspect state) |
browser_wait_for_selector | Block until an element appears |
browser_wait_for_navigation | Block until the URL changes |
browser_notify | Surface a page-side alert to the user |
Use it in your own agent#
Declare the browser_tools tool group in agent.yaml. One line expands to all twelve tools:
tools:
- name: browser_tools
- name: artifactsThe agent decides when to navigate, snapshot, click, or type. A good prompt gives it a mental model of the browsing loop:
- name: browsing_loop
content: |
When you need to interact with a page:
1. navigate to the URL
2. snapshot to see the accessibility tree and get element refs
3. click / type using those refs (they change after every mutation —
re-snapshot if the page updates)
4. get_text or screenshot to capture results
5. save findings to an artifactRefs from browser_snapshot are ephemeral. They become invalid after any DOM change. Always re-snapshot after a click or form submission if you plan to interact further.
Or delegate to rush-browser#
If your agent only occasionally needs the web, don't carry the browser tools yourself. Delegate to the rush-browser system agent instead:
tools:
- name: delegate
config:
max_depth: 2- name: available_agents
content: |
You can delegate web tasks to:
- **rush-browser**: AI co-pilot for the web. Give it a goal
("fill the contact form at X with these values") and it handles
navigation, snapshotting, clicking, and reporting back.At call time:
delegate(
agent_id: "rush-browser",
task: "Open https://example.com/contact, fill the form with name=Jane, [email protected], message='hello', and submit it. Report the confirmation page back."
)rush-browser is a system agent — hidden from the store, pre-installed with every Rush app, and callable from any agent that has the delegate tool enabled.
How it actually runs#
Tool calls from the agent are routed over IPC to the Host Browser — the Rush app's in-app browser panel. The user sees the agent's clicks, scrolls, and typing live, in the same browser they're already signed into.
browser_click(ref: "e42")Because the Host Browser is the user's actual working browser, the agent inherits logged-in state everywhere — Gmail, GitHub, Notion, internal dashboards, anything behind a cookie.
What this unlocks#
| Task | Pattern |
|---|---|
| Fill a reservation form on OpenTable | rush-browser — delegate with the goal, it handles navigation and form completion |
| Scrape structured data from a signed-in dashboard | browser_tools in your agent — navigate + snapshot + get_text + save to artifact |
| Watch a long page for something to load | browser_wait_for_selector — blocks until the element exists, no polling |
| Pull a value out of client-side state | browser_evaluate — run JS, return the result |