Every tool the engine exposes.
70+ MCP tools, grouped by surface. Each entry tells you when to reach for it, what it returns, and what it records to the playbook.
Core interaction
The bread-and-butter set. Every Thread uses some of these.
click
Click an element by CSS selector with strategy laddering. Tries CDP coords, JS dispatch, ARIA press, framework-specific paths. Auto-records the working strategy to the playbook.
click(selector="button[data-testid='submit']", session="slot-1", tab="tab1")
Auto-records a per-domain click_log entry with the winning strategy.
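The laddering idea can be sketched in a few lines. This is an illustration only, not the engine's implementation: `click_with_ladder`, the strategy callables, and the in-memory `click_log` dict are all hypothetical names, and the "try the remembered winner first" step is an assumption about how a recorded strategy might be reused.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical strategy signature: takes a selector, returns True if the click landed.
Strategy = Callable[[str], bool]


def click_with_ladder(
    selector: str,
    domain: str,
    strategies: List[Tuple[str, Strategy]],
    click_log: Dict[str, Dict[str, str]],
) -> Optional[str]:
    """Try each strategy in order; return the name of the first one that succeeds."""
    # Assumption: a winner recorded on an earlier run is tried first.
    remembered = click_log.get(domain, {}).get(selector)
    # sorted() is stable, so the remaining rungs keep their original order.
    ordered = sorted(strategies, key=lambda item: item[0] != remembered)

    for name, strategy in ordered:
        try:
            landed = strategy(selector)
        except Exception:
            # A strategy that raises counts as a miss; move to the next rung.
            landed = False
        if landed:
            click_log.setdefault(domain, {})[selector] = name
            return name
    return None


# Stand-in strategies that only record that they were attempted.
attempts: List[str] = []


def cdp_coords(selector: str) -> bool:
    attempts.append("cdp_coords")
    return False


def js_dispatch(selector: str) -> bool:
    attempts.append("js_dispatch")
    return True


ladder = [("cdp_coords", cdp_coords), ("js_dispatch", js_dispatch)]
click_log: Dict[str, Dict[str, str]] = {}
winner = click_with_ladder("button[data-testid='submit']", "example.com", ladder, click_log)
```

On a second call with the same selector and domain, the remembered strategy runs first, so the failing rung is skipped.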
fill
Fill an input or contenteditable. Handles native setters, dispatches change/input events React listens for, falls back to keystroke typing.
fill(selector="#email", value="me@example.com")
navigate
Navigate the current tab and wait for document.readyState === "complete". Records the URL transition so multi-page flows chain in the playbook.
scan_tab
Returns a structured DOM snapshot — visible inputs, buttons, modals, iframes, framework hints, anti-bot signals. The first thing to call when you don't know a site.
read_tab
Plain text of the visible page (Reader-mode-style extraction). Use when you need text content, not interactive structure.
screenshot
PNG of the viewport. Saved to ~/.webloom/screenshots/ by default.
wait_for / wait_for_idle
wait_for(selector) blocks until a selector exists. wait_for_idle waits for network idle. Both record the wait timing to the playbook so future runs know how long the page actually takes to settle.
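A minimal sketch of a selector wait that records its own timing, assuming a polling loop. The `exists` probe, the `timings` dict, and the `timeout_s` / `poll_s` parameters are hypothetical; the real tool's parameters and storage may differ.

```python
import time
from typing import Callable, Dict, List


def wait_for(
    selector: str,
    exists: Callable[[str], bool],
    timings: Dict[str, List[float]],
    timeout_s: float = 10.0,
    poll_s: float = 0.05,
) -> float:
    """Block until exists(selector) is true; record and return elapsed seconds."""
    start = time.monotonic()
    while True:
        if exists(selector):
            elapsed = time.monotonic() - start
            # Keep every observation so later runs can size their waits.
            timings.setdefault(selector, []).append(elapsed)
            return elapsed
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"{selector!r} did not appear within {timeout_s}s")
        time.sleep(poll_s)


# Stand-in probe: the element "appears" on the third poll.
probe_calls = {"n": 0}


def fake_exists(selector: str) -> bool:
    probe_calls["n"] += 1
    return probe_calls["n"] >= 3


timings: Dict[str, List[float]] = {}
elapsed = wait_for("#feed", fake_exists, timings, timeout_s=2.0, poll_s=0.01)
```

Using `time.monotonic()` rather than wall-clock time keeps the recorded durations immune to system clock adjustments.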
scroll_tab
Scroll by px or to an element. Important for lazy-loaded content where target elements aren't mounted until visible.
Keyboard
key_type
Type a string character by character via CDP Input.insertText. Works in Lexical editors and other contenteditables. The cadence is configurable for fragile editors (the X composer drops characters when the per-key delay is under 60ms).
key_press
Single keypress with modifiers. key_press(key="Enter", modifiers=["Control"]) for Ctrl+Enter, etc.
Framework-aware
When generic click/fill desyncs because the framework holds its own state model, reach for these.
| Tool | When to use |
|---|---|
| lexical_set_text | Reddit composer, Notion, Meta apps: any Lexical-based contenteditable. Tries setEditorState first, falls back to InputEvent dispatch. |
| draftjs_set_text | X / Twitter composer. Draft.js's handlePastedText blocks synthetic clipboard events; this uses real CDP keystrokes with focus-settle warmup. delay_ms tunable, verify_per_char available. |
| react_force_change | When React's onChange doesn't fire from a native value setter. Walks the fiber to find the descriptor's set and dispatches. |
| redux_dispatch | Site uses Redux for state: call its store.dispatch directly. Works on D2D, complex multi-step forms. |
| aui_dispatch | Amazon AUI (KDP, Author Central). Fires declarative events Amazon's custom widget kit listens for. |
| backbone_inspect | Backbone-based admin UIs (legacy enterprise). Read collection state from the running app. |
Network
capture_network_start / capture_network_stop / get_captured_requests
Sniff every HTTP call the page makes. Pass full=true to get full request headers + body — required when you intend to replay (auth tokens, csrf, transaction-ids live in headers).
# 1. start capture
capture_network_start(session="slot-1", tab="tab1")
# 2. user does one manual action that hits the endpoint
# 3. stop and inspect
capture_network_stop(session="slot-1", url_filter="CreateTweet", full=true)
# returns full headers + body
# 4. replay forever
replay_xhr(url=..., method="POST", headers={...}, body={...})
replay_xhr
Fire a fetch with credentials:'include' from inside the page. Uses the live session's cookies/auth. Auto-records the URL pattern to the playbook (normalized — hash segments collapse to {hash}).
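To illustrate what collapsing hash segments might look like, here is a sketch under stated assumptions. The engine's actual normalization rule is not documented here: the definition of a "hash segment" (16 or more URL-safe characters mixing letters and digits), the decision to drop the query string, and the name `normalize_url_pattern` are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a "hash segment" is a path segment of 16+ URL-safe characters
# that contains at least one letter and at least one digit.
_HASH_LIKE = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9_-]{16,}$")


def normalize_url_pattern(url: str) -> str:
    """Replace hash-like path segments with {hash}; drop query and fragment."""
    parts = urlsplit(url)
    segments = [
        "{hash}" if _HASH_LIKE.match(segment) else segment
        for segment in parts.path.split("/")
    ]
    # Assumption: query string and fragment are not part of the stored pattern.
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


pattern = normalize_url_pattern(
    "https://example.com/api/graphql/a1B2c3D4e5F6g7H8i9J0/CreateTweet?variables=1"
)
```

With a rule like this, two captures of the same endpoint that differ only in a rotating identifier map to one playbook entry.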
inject_on_new_document / remove_injected_script
Register a JS script that runs in page context before page scripts on every navigation. Use for XHR interceptors, fingerprint probes, instrumentation that needs to survive page changes.
xhr_upload
Direct file upload via fetch + FormData. Bypasses widgets that corrupt programmatic input.files injection (KDP AjaxInput pattern). Pair with capture_network to discover the URL.
Vision + coords
vision_check
The fallback for when DOM strategies don't reach the target — canvas-rendered widgets, OAuth popups, weird iframes, image-only buttons. Snaps a screenshot, sends to Claude with your question, optionally returns {x, y} coordinates.
v = vision_check(question="click coords for the post button", session="slot-1", tab="tab1")
# v == {ok: true, answer: "...", click: {x: 720, y: 480}}
if v.click:
    click_at_coords(x=v.click.x, y=v.click.y)
click_at_coords
Real CDP click at absolute viewport coords. Pair with vision_check.
Stealth, captcha, healing
enable_stealth
Apply fingerprint masks (navigator.webdriver, plugins, languages, WebGL vendor, chrome.runtime). Persistent across navigations. Real-Chrome sessions usually don't need this; fresh-profile flows and Cloudflare/Akamai-guarded sites do.
solve_captcha
Submit a challenge to 2captcha (or capmonster in v0.2). Supports reCAPTCHA v2/v3, hCaptcha, Turnstile. Costs ~$0.001-0.003/solve. Without keys, returns a hint to fall back to pause_for_human.
drift_heal_suggest
When a recorded selector breaks, scan the current DOM for candidates ranked by aria-name overlap + structural anchor strength (data-testid > id > name). Returns top 8 suggestions with reasons.
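The ranking described above could be sketched as follows. Only the ordering data-testid > id > name and the top-8 cutoff come from the text; the numeric weights, the Jaccard word overlap, the additive score, and the `Candidate` / `suggest` names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed weights; only the ordering data-testid > id > name is from the text.
ANCHOR_STRENGTH = {"data-testid": 1.0, "id": 0.6, "name": 0.3}


@dataclass
class Candidate:
    selector: str
    aria_name: str = ""
    attrs: Dict[str, str] = field(default_factory=dict)


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two labels."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def suggest(old_aria_name: str, candidates: List[Candidate], top_n: int = 8) -> List[dict]:
    """Rank candidates by aria-name overlap plus strongest structural anchor."""
    ranked = []
    for cand in candidates:
        overlap = token_overlap(old_aria_name, cand.aria_name)
        # Dict order runs strongest to weakest, so the first hit is the best anchor.
        anchor = next((k for k in ANCHOR_STRENGTH if cand.attrs.get(k)), None)
        reasons = []
        if overlap > 0:
            reasons.append(f"aria-name overlap {overlap:.2f}")
        if anchor:
            reasons.append(f"anchored by {anchor}")
        score = overlap + (ANCHOR_STRENGTH[anchor] if anchor else 0.0)
        ranked.append({"selector": cand.selector, "score": score, "reasons": reasons})
    ranked.sort(key=lambda item: item["score"], reverse=True)
    return ranked[:top_n]


candidates = [
    Candidate("#cancel", aria_name="Cancel", attrs={"id": "cancel"}),
    Candidate("button.primary", aria_name="Post"),
    Candidate("[data-testid='postButton']", aria_name="Post", attrs={"data-testid": "postButton"}),
]
suggestions = suggest("Post", candidates)
```

Returning the reasons alongside each score lets a human or a Thread judge why a candidate was proposed before adopting it.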
Orchestration
run_parallel
Fan out N tool calls concurrently with a max-in-flight semaphore. Use for batch preflight, multi-session lockstep, cross-poster patterns.
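The max-in-flight pattern can be sketched with asyncio. This is not the engine's code: the `call_tool` dispatcher, the result envelope with `ok` / `result` / `error` keys, and the fake tool used in the demo are assumptions made for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical dispatcher type: takes a tool name and its args, returns the result.
CallTool = Callable[[str, Dict[str, Any]], Awaitable[Any]]


async def run_parallel(
    calls: List[Dict[str, Any]], call_tool: CallTool, max_concurrency: int = 3
) -> List[Dict[str, Any]]:
    """Run every call concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(call: Dict[str, Any]) -> Dict[str, Any]:
        async with semaphore:
            try:
                result = await call_tool(call["tool"], call.get("args", {}))
                return {"ok": True, "result": result}
            except Exception as exc:
                # One failing call should not cancel the rest of the batch.
                return {"ok": False, "error": str(exc)}

    # gather() returns results in the same order as the input calls.
    return await asyncio.gather(*(run_one(call) for call in calls))


# Stand-in tool that tracks how many calls overlap.
stats = {"in_flight": 0, "peak": 0}


async def fake_call_tool(tool: str, args: Dict[str, Any]) -> str:
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return f"{tool}:{args['tab']}"


demo_calls = [
    {"tool": "scan_tab", "args": {"session": "slot-1", "tab": f"tab{i}"}}
    for i in range(1, 6)
]
results = asyncio.run(run_parallel(demo_calls, fake_call_tool, max_concurrency=2))
```

Catching exceptions per call keeps one bad tab from failing a whole preflight batch, at the cost of requiring callers to check each result's `ok` flag.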
run_parallel(calls=[
    {"tool": "scan_tab", "args": {"session": "slot-1", "tab": "tab1"}},
    {"tool": "scan_tab", "args": {"session": "slot-1", "tab": "tab2"}},
    {"tool": "scan_tab", "args": {"session": "slot-1", "tab": "tab3"}},
], max_concurrency=3)
start_recording / end_recording / replay_recipe
Record any sequence of action tools into a named recipe. Replay deterministically. Recordings save to ~/.webloom/engine/recipes/ as JSON.
Recipes (high-level workflows)
reddit_submit_comment
Full Reddit comment submission: navigate, mount Lexical composer via placeholder click, set text, find submit button across the churned selector zoo, verify landed. Records success/failure to playbook.
reddit_check_shadowban
Anonymous fetch of a Reddit profile from a logged-out perspective. Returns likely_shadowbanned verdict with reasons.
Safety + meta
pause_for_human
Halt and ask Mariano (or whoever's driving) to do something the bot can't. Beeps, records the manual-touch checkpoint to the playbook so future Thread runs know this step needs human help.
detect_anti_bot
Probe the page for known anti-bot patterns (Cloudflare interstitial, Datadome, PerimeterX, hCaptcha widgets). Returns verdict + signals.
detect_blocker
Detect "you've been logged out" / paywall / rate-limit overlays so a Thread can pause cleanly instead of clicking through garbage.
get_playbook
Read everything the engine has learned about a domain. Includes installed Thread data merged with live learning. Source of truth for "is this Thread proven on this site?"
The compounding effect
Every action a tool performs is recorded to the playbook. Over time, on a given domain, the engine learns which strategy wins for which selector, how long pages take to settle, and which waits chain together. That's the moat: your Threads get more reliable the more they're used, across every buyer.