Engine reference

Every tool the engine exposes.

70+ MCP tools, grouped by surface. Each entry tells you when to reach for it, what it returns, and what it records to the playbook.

Core interaction

The bread-and-butter set. Every Thread uses some of these.

click

Click an element by CSS selector with strategy laddering. Tries CDP coords, JS dispatch, ARIA press, framework-specific paths. Auto-records the working strategy to the playbook.

example
click(selector="button[data-testid='submit']", session="slot-1", tab="tab1")

auto-records per-domain click_log with the winning strategy.

fill

Fill an input or contenteditable. Handles native setters, dispatches change/input events React listens for, falls back to keystroke typing.

example
fill(selector="#email", value="me@example.com")

navigate

Navigate the current tab and wait for document.readyState === "complete". Records the URL transition so multi-page flows chain in the playbook.

scan_tab

Returns a structured DOM snapshot — visible inputs, buttons, modals, iframes, framework hints, anti-bot signals. The first thing to call when you don't know a site.

read_tab

Plain text of the visible page (Reader-mode-style extraction). Use when you need text content, not interactive structure.

screenshot

PNG of the viewport. Saved to ~/.webloom/screenshots/ by default.

wait_for / wait_for_idle

wait_for(selector) blocks until a selector exists. wait_for_idle waits for network idle. Both record the wait timing to the playbook so future runs know how long the page actually takes to settle.

scroll_tab

Scroll by px or to an element. Important for lazy-loaded content where target elements aren't mounted until visible.

Keyboard

key_type

Type a string char-by-char via CDP Input.insertText. Survives Lexical, contenteditables. Cadence configurable for fragile editors (X composer drops chars under 60ms).

key_press

Single keypress with modifiers. key_press(key="Enter", modifiers=["Control"]) for Ctrl+Enter, etc.

Framework-aware

When generic click/fill desyncs because the framework holds its own state model, reach for these.

ToolWhen to use
lexical_set_textReddit composer, Notion, Meta apps — any Lexical-based contenteditable. Tries setEditorState first, falls back to InputEvent dispatch.
draftjs_set_textX / Twitter composer. Draft.js's handlePastedText blocks synthetic clipboard events; this uses real CDP keystrokes with focus-settle warmup. delay_ms tunable, verify_per_char available.
react_force_changeWhen React's onChange doesn't fire from a native value setter. Walks the fiber to find the descriptor's set and dispatches.
redux_dispatchSite uses Redux for state — call its store.dispatch directly. Works on D2D, complex multi-step forms.
aui_dispatchAmazon AUI (KDP, Author Central) — fires declarative events Amazon's custom widget kit listens for.
backbone_inspectBackbone-based admin UIs (legacy enterprise). Read collection state from the running app.

Network

capture_network_start / capture_network_stop / get_captured_requests

Sniff every HTTP call the page makes. Pass full=true to get full request headers + body — required when you intend to replay (auth tokens, csrf, transaction-ids live in headers).

capture and replay
# 1. start capture
capture_network_start(session="slot-1", tab="tab1")

# 2. user does one manual action that hits the endpoint

# 3. stop and inspect
capture_network_stop(session="slot-1", url_filter="CreateTweet", full=true)
# returns full headers + body

# 4. replay forever
replay_xhr(url=..., method="POST", headers={...}, body={...})

replay_xhr

Fire a fetch with credentials:'include' from inside the page. Uses the live session's cookies/auth. Auto-records the URL pattern to the playbook (normalized — hash segments collapse to {hash}).

inject_on_new_document / remove_injected_script

Register a JS script that runs in page context before page scripts on every navigation. Use for XHR interceptors, fingerprint probes, instrumentation that needs to survive page changes.

xhr_upload

Direct file upload via fetch + FormData. Bypasses widgets that corrupt programmatic input.files injection (KDP AjaxInput pattern). Pair with capture_network to discover the URL.

Vision + coords

vision_check

The fallback for when DOM strategies don't reach the target — canvas-rendered widgets, OAuth popups, weird iframes, image-only buttons. Snaps a screenshot, sends to Claude with your question, optionally returns {x, y} coordinates.

example
v = vision_check(question="click coords for the post button", session="slot-1", tab="tab1")
# v == {ok: true, answer: "...", click: {x: 720, y: 480}}
if v.click:
    click_at_coords(x=v.click.x, y=v.click.y)

click_at_coords

Real CDP click at absolute viewport coords. Pair with vision_check.

Stealth, captcha, healing

enable_stealth

Apply fingerprint masks (navigator.webdriver, plugins, languages, WebGL vendor, chrome.runtime). Persistent across navigations. Real-Chrome sessions usually don't need this; fresh-profile flows and Cloudflare/Akamai-guarded sites do.

solve_captcha

Submit a challenge to 2captcha (or capmonster in v0.2). Supports reCAPTCHA v2/v3, hCaptcha, Turnstile. Costs ~$0.001-0.003/solve. Without keys, returns a hint to fall back to pause_for_human.

drift_heal_suggest

When a recorded selector breaks, scan the current DOM for candidates ranked by aria-name overlap + structural anchor strength (data-testid > id > name). Returns top 8 suggestions with reasons.

Orchestration

run_parallel

Fan out N tool calls concurrently with a max-in-flight semaphore. Use for batch preflight, multi-session lockstep, cross-poster patterns.

example
run_parallel(calls=[
    {"tool": "scan_tab", "args": {"session": "slot-1", "tab": "tab1"}},
    {"tool": "scan_tab", "args": {"session": "slot-1", "tab": "tab2"}},
    {"tool": "scan_tab", "args": {"session": "slot-1", "tab": "tab3"}},
], max_concurrency=3)

start_recording / end_recording / replay_recipe

Record any sequence of action tools into a named recipe. Replay deterministically. Recordings save to ~/.webloom/engine/recipes/ as JSON.

Recipes (high-level workflows)

reddit_submit_comment

Full Reddit comment submission: navigate, mount Lexical composer via placeholder click, set text, find submit button across the churned selector zoo, verify landed. Records success/failure to playbook.

reddit_check_shadowban

Anonymous fetch of a Reddit profile from a logged-out perspective. Returns likely_shadowbanned verdict with reasons.

Safety + meta

pause_for_human

Halt and ask Mariano (or whoever's driving) to do something the bot can't. Beeps, records the manual-touch checkpoint to the playbook so future Thread runs know this step needs human help.

detect_anti_bot

Probe the page for known anti-bot patterns (Cloudflare interstitial, Datadome, PerimeterX, hCaptcha widgets). Returns verdict + signals.

detect_blocker

Detect "you've been logged out" / paywall / rate-limit overlays so a Thread can pause cleanly instead of clicking through garbage.

get_playbook

Read everything the engine has learned about a domain. Includes installed Thread data merged with live learning. Source of truth for "is this Thread proven on this site?"

The compounding effect

Every action a tool performs records to the playbook. Over time, on a given domain, the engine learns: which strategy wins for which selector, how long pages take to settle, what waits chain together. That's the moat — your Threads get more reliable the more they're used, across every buyer.