The era of generic, free‑roaming LLM browser agents is ending. Blindly clicking on CSS selectors with heuristic waits is too brittle for production. What we need are site adapters: stable, typed interfaces that lift raw DOM operations into high‑level, domain‑safe intents (e.g., 'search_products', 'checkout_cart', 'book_flight'). Once LLMs operate on these intents, everything downstream gets easier: reliability, safety, testability, and speed.
This article lays out an end‑to‑end system for auto‑generating and maintaining these adapters from human clickstreams. We will cover how to:
- Record human sessions and segment them into task‑level traces
- Infer form schemas and input contracts from DOM structure, accessibility metadata, and server responses
- Synthesize drift‑tolerant drivers that wait on semantics, not timeouts
- Add invariants, canaries, and observability to catch regressions early
- Self‑heal selectors with semantic anchoring and ranked fallbacks
- Enforce CI contracts (golden traces, schema diffs, synthetic monitors) so adapters evolve predictably
The goal is simple: allow LLM agents to act via safe, typed, and testable intents, while the adapter layer absorbs UI drift and operational complexity.
Why typed site adapters beat raw automation
End‑to‑end UI automation has decades of prior art: Selenium, Puppeteer, Playwright, RPA platforms. They work, but are brittle in the face of CSS churn, asynchronous events, and layout changes. LLMs add a new failure mode: agents may choose nonsensical sequences, omit preconditions, or hallucinate element roles. Typed adapters solve these issues by:
- Raising the abstraction level
  - Expose capabilities as typed functions (e.g., 'login(username: string, otp?: string): Session') rather than 'selector.click()'.
  - Hide ephemeral DOM details. The adapter owns selectors and healing.
- Constraining behavior with contracts
  - Inputs validated against inferred schemas and server constraints.
  - Postconditions enforced via invariants (e.g., 'user is authenticated and sees greeting').
- Creating testable boundaries
  - Each intent becomes a unit of testing with golden traces and deterministic fixtures.
  - CI can gate on intent‑level success rather than raw CSS survival.
- Enabling performance and observability
  - Intent times, retries, fallback paths, and healing events are measurable.
  - Canary dashboards can detect drift before agents suffer.
In short: adapters let humans and LLMs program the web by intent, not by pixels.
System architecture at a glance
- Recorder: Captures clickstreams with rich context (DOM snapshots, ARIA roles, network responses, viewport, cookies, timing).
- Trace segmenter: Chops raw sessions into labeled tasks like 'login', 'search', 'add_to_cart'.
- Schema inference: Learns parameter shapes, validators, and dependencies for forms and flows.
- Driver synthesizer: Emits strongly‑typed, drift‑tolerant Playwright (or Selenium) drivers with semantic waits and fallbacks.
- Invariants and canaries: Attach pre/postconditions and continuous monitors.
- Selector healing: Maintains prioritized selector sets with semantic anchors and differential scoring.
- CI contracts: Golden traces, snapshot diffs, synthetic runs, semver, and review gates.
We will walk through each layer, then show how to ship it in a developer‑friendly toolchain.
Recording human sessions without losing semantics
Recording a clickstream is easy. Recording a clickstream you can learn from is not. A production‑grade recorder should capture:
- DOM and accessibility context
  - Element outerHTML and simplified path, visibility, bounding box, role/name from ARIA, label associations, input types.
- Event semantics
  - Clicks, inputs, keypress, paste, scrolls. Timestamps, latency between events.
- Network activity
  - XHR/fetch metadata, request/response samples (scrubbed), server validation errors, redirect chains.
- Visual and layout cues
  - Bounding rectangles, z‑index, overlays, shadow DOM boundaries, viewport size.
- Navigation state
  - URL + query, cookies, local/session storage diffs, history events.
- Privacy and security
  - Redact secrets with on‑device rules (passwords, tokens, PII) before egress.
Practical tips:
- Prefer Playwright’s tracing hooks; collect HAR‑like network logs with body sampling + redaction.
- Normalize time by waiting for the next animation frame after each DOM mutation to reduce jitter.
- Record computed roles and accessible names (per ARIA); they are more stable than CSS classes.
- When in doubt, store a compressed DOM snapshot diff (e.g., minimal JSON patch) at key waypoints.
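The on‑device redaction rule can be sketched as a pure function applied to every recorded event before it leaves the machine. The `RecordedEvent` shape and the three regular expressions below are illustrative assumptions, not part of any recorder API; a real deployment would tune them per site.

```typescript
// Hypothetical shape of one recorded event (an assumption for illustration).
export type RecordedEvent = {
  kind: 'click' | 'input' | 'keypress';
  role?: string;            // computed ARIA role
  accessibleName?: string;  // computed accessible name
  inputType?: string;       // e.g. 'password', 'email', 'text'
  value?: string;           // raw value entered by the user
  timestampMs: number;
};

// Accessible names that usually indicate a secret-bearing field.
const SECRET_FIELD = /pass(word)?|otp|token|secret|cvv|card/i;
// Long opaque runs that look like bearer tokens or API keys.
const TOKEN_LIKE = /\b[A-Za-z0-9_-]{24,}\b/g;
// Loose email matcher; good enough for scrubbing, not for validation.
const EMAIL = /[^\s@]+@[^\s@]+\.[^\s@]+/g;

export function redactEvent(e: RecordedEvent): RecordedEvent {
  if (e.value === undefined) return e;
  const isSecretField =
    e.inputType === 'password' || SECRET_FIELD.test(e.accessibleName ?? '');
  if (isSecretField) {
    // Drop the whole value for secret fields.
    return { ...e, value: '[REDACTED]' };
  }
  // Otherwise scrub embedded emails and token-like substrings.
  const scrubbed = e.value.replace(EMAIL, '[EMAIL]').replace(TOKEN_LIKE, '[TOKEN]');
  return { ...e, value: scrubbed };
}
```

Running the function inside the recorder process, before any upload, keeps raw secrets out of stored traces altogether.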
Segmenting clickstreams into intents
Humans don’t label their clicks as 'login' or 'checkout'. We have to infer segment boundaries and assign canonical names.
Heuristics that work well:
- Page/URL phase changes: major route changes or query param switches.
- Network milestones: server responds with 'auth=true', cart totals update, new session cookie.
- Form submission edges: 'Enter' keypress or button click closing a form node.
- Visual milestones: modal opens/closes, full‑screen overlay toggles.
Algorithm sketch:
- Build a directed graph of 'state → action → state' transitions from the session.
- Detect cycles of repeated micro‑actions (e.g., typing characters into the same field), which show up as strongly connected components, and collapse each one into a single step.
- Cluster remaining edges by downstream network responses and DOM diffs.
- Assign proto‑labels using a library of patterns: 'login', 'search', 'filter', 'add_to_cart', 'checkout', 'upload', 'confirm'.
- Human‑in‑the‑loop pass for disambiguation; store final label with features for future auto‑labeling.
With enough labeled traces, a small classifier (gradient boosted trees over lexical/ARIA/URL/network features) can achieve >90% accuracy for common site tasks. This is usually sufficient to propose an initial intent API for the site.
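A deliberately simplified version of the first segmentation steps is sketched below. It replaces the graph construction with a linear pass: runs of per‑character typing on the same field are collapsed, then the stream is cut after each milestone. The `TraceEvent` shape and the `boundary` flag (set by whatever detects route changes, form submits, or network milestones) are assumptions for illustration.

```typescript
// Hypothetical event shape; `boundary` marks a milestone such as a route change,
// a form submit, or an auth/cart network response.
export type TraceEvent = {
  action: 'type' | 'click' | 'navigate' | 'submit';
  target: string;       // stable element key, e.g. role + accessible name
  boundary?: boolean;
};

// Step 1: collapse consecutive 'type' events on the same target into one step.
export function collapseTyping(events: TraceEvent[]): TraceEvent[] {
  const out: TraceEvent[] = [];
  for (const e of events) {
    const prev = out[out.length - 1];
    const sameField =
      prev !== undefined &&
      prev.action === 'type' &&
      e.action === 'type' &&
      prev.target === e.target;
    if (!sameField) out.push(e);
  }
  return out;
}

// Step 2: cut the collapsed stream after every boundary event.
export function segment(events: TraceEvent[]): TraceEvent[][] {
  const segments: TraceEvent[][] = [];
  let current: TraceEvent[] = [];
  for (const e of collapseTyping(events)) {
    current.push(e);
    if (e.boundary) {
      segments.push(current);
      current = [];
    }
  }
  // Keep any trailing events that never reached a milestone.
  if (current.length > 0) segments.push(current);
  return segments;
}
```

Each resulting segment is a candidate intent; labeling (pattern library, classifier, or human review) happens afterwards.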
Inferring form schemas and contracts
Typed adapters are only as good as their input constraints. We infer schemas by combining static signals with dynamic probes.
Static cues from the DOM:
- Input types and attributes: type=email, min/max/step, pattern, required, maxlength.
- Label associations: label for/id, aria‑label, aria‑describedby.
- Option enumerations: select options, datalist values, autocomplete tokens.
- Role semantics: role=combobox vs role=button; date pickers; slider ranges.
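As a rough sketch of how these static cues could become a contract, the function below maps a handful of input attributes to a field description. `InputAttrs` and `FieldContract` are hypothetical types introduced here; the attribute names follow standard HTML, and everything else is an assumption.

```typescript
// Subset of input attributes as they might be read from a DOM snapshot.
export type InputAttrs = {
  name: string;
  type?: string;
  required?: boolean;
  minlength?: number;
  maxlength?: number;
  pattern?: string;
  options?: string[];   // values from <select> or <datalist>
};

export type FieldContract = {
  name: string;
  kind: 'string' | 'number' | 'boolean' | 'enum';
  required: boolean;
  minLen?: number;
  maxLen?: number;
  pattern?: RegExp;
  values?: string[];
};

export function inferField(a: InputAttrs): FieldContract {
  const required = a.required ?? false;
  // Enumerated options win over the declared input type.
  if (a.options && a.options.length > 0) {
    return { name: a.name, kind: 'enum', required, values: [...a.options] };
  }
  if (a.type === 'checkbox') return { name: a.name, kind: 'boolean', required };
  if (a.type === 'number' || a.type === 'range') {
    return { name: a.name, kind: 'number', required };
  }
  return {
    name: a.name,
    kind: 'string',
    required,
    minLen: a.minlength,
    maxLen: a.maxlength,
    // The HTML pattern attribute must match the whole value, so anchor it.
    pattern: a.pattern ? new RegExp(`^(?:${a.pattern})$`) : undefined,
  };
}
```

The output is intentionally neutral: a generator can render it as a TypeScript type, a runtime validator, or both.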
Dynamic cues from observation:
- Server validation errors on submit; which fields caused them, error messages.
- Conditional visibility: field B appears only after choosing value in field A.
- Masking/formatting: phone number auto‑format, currency locales.
- Dedup constraints: email already taken, username constraints from response.
Probe‑based sampling (optional, staging only):
- Mutate candidate inputs with representative values to map boundaries (min, max, regex windows).
- Use canary accounts and scrubbed data to avoid PII.
Representing the schema as TypeScript types + runtime validators keeps the adapter trustworthy and DX‑friendly.
Example inferred schema for a login flow:
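```ts
// Length and format constraints are enforced by the runtime validators;
// they appear here as comments because plain TypeScript types cannot express them.
export type LoginParams = {
  username: string;     // 1..128 characters
  password: string;     // at least 8 characters
  otp?: string;         // exactly 6 digits
  rememberMe?: boolean;
};

export type LoginResult = {
  sessionId: string;
  userDisplayName: string;
  mfaRequired?: boolean;
};
```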
At runtime, attach zod or valibot validators to enforce contracts before automation runs:
```ts
import { z } from 'zod';

export const LoginParamsZ = z.object({
  username: z.string().min(1).max(128),
  password: z.string().min(8),
  otp: z.string().regex(/^\d{6}$/).optional(),
  rememberMe: z.boolean().optional(),
});
```
Your generator should derive these from recorded traces and DOM analysis, then keep them updated as the site evolves.
Synthesizing drift‑tolerant drivers
Once we have intents and schemas, we synthesize drivers that turn typed inputs into robust browser actions. The core principles:
- Prefer semantic anchors over raw selectors
  - Use getByRole, getByLabel, and accessible names.
  - Fall back to data‑testid, stable ids, or text; avoid nth‑child where possible.
- Wait for state, not time
  - Wait for: request finished with 2xx, element to be enabled/visible, DOM invariant satisfied.
  - Avoid fixed sleeps; when polling is unavoidable, use exponential backoff with a cap.
- Idempotent steps
  - Before clicking 'Login', check if already logged in; skip or validate.
  - Before typing, clear and compare field values; avoid duplicating input.
- Deterministic navigation
  - Use 'page.waitForURL' with param patterns instead of arbitrary sleeps.
- Structured retries and fallbacks
  - On failure, attempt alternate selectors; log healing events.
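For state that no built‑in wait covers, "wait for state, not time" can be approximated with a polling helper that uses capped exponential backoff. This is a minimal sketch; `backoffDelays`, `waitForState`, and the default budgets are names and values assumed here for illustration.

```typescript
// Delays for capped exponential backoff that fit inside a total time budget.
export function backoffDelays(baseMs: number, capMs: number, budgetMs: number): number[] {
  if (baseMs <= 0) throw new RangeError('baseMs must be positive');
  const delays: number[] = [];
  let next = baseMs;
  let spent = 0;
  while (spent + next <= budgetMs) {
    delays.push(next);
    spent += next;
    next = Math.min(next * 2, capMs); // double each time, but never exceed the cap
  }
  return delays;
}

// Poll a state predicate until it holds or the budget runs out.
export async function waitForState(
  check: () => Promise<boolean>,
  opts = { baseMs: 50, capMs: 1000, budgetMs: 10_000 },
): Promise<void> {
  if (await check()) return;
  for (const delay of backoffDelays(opts.baseMs, opts.capMs, opts.budgetMs)) {
    await new Promise<void>(resolve => setTimeout(resolve, delay));
    if (await check()) return;
  }
  throw new Error(`State not reached within ${opts.budgetMs} ms`);
}
```

In a Playwright driver, `check` might wrap a DOM invariant such as a cart counter matching the expected value. Where Playwright's own auto‑waiting assertions or `waitForResponse` fit, they are usually the better choice.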
Here is a condensed example driver for 'search_products' using Playwright:
```ts
import { expect, type Page } from '@playwright/test';

export type SearchParams = { query: string; filters?: { priceMax?: number; brand?: string } };
export type SearchResult = {
  resultsCount: number;
  topHits: Array<{ title: string; price: number; url: string }>;
};

// Escape user input before embedding it in a RegExp.
const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

export async function searchProducts(page: Page, params: SearchParams): Promise<SearchResult> {
  // Precondition: the search box is reachable (i.e., not behind an auth wall).
  const searchBox = page.getByRole('textbox', { name: /search/i });
  await expect(searchBox).toBeVisible({ timeout: 5000 });

  // Clear first so a retry never duplicates input, then type with key events.
  await searchBox.fill('');
  await searchBox.pressSequentially(params.query, { delay: 20 });

  // Register the response wait before clicking so the response cannot be missed.
  await Promise.all([
    page.waitForResponse(res => res.url().includes('/api/search') && res.ok()),
    page.getByRole('button', { name: /search/i }).click(),
  ]);

  // Optional filters. Compare against undefined so that priceMax = 0 is still applied.
  if (params.filters?.priceMax !== undefined) {
    const slider = page.getByRole('slider', { name: /max price/i });
    // fill() only works on native inputs; a custom role=slider widget
    // needs keyboard or pointer steps instead.
    if (await slider.isVisible()) await slider.fill(String(params.filters.priceMax));
  }
  if (params.filters?.brand) {
    const brandBox = page.getByRole('combobox', { name: /brand/i });
    if (await brandBox.isVisible()) await brandBox.pressSequentially(params.filters.brand);
    const opt = page.getByRole('option', { name: new RegExp(escapeRegExp(params.filters.brand), 'i') });
    if (await opt.isVisible()) await opt.click();
  }

  // Postcondition: results grid loaded.
  const grid = page.getByRole('grid', { name: /results/i });
  await expect(grid).toBeVisible();

  const items = grid.getByRole('row');
  const count = await items.count();
  const topHits: SearchResult['topHits'] = [];
  for (let i = 0; i < Math.min(count, 5); i++) {
    const row = items.nth(i);
    const link = row.getByRole('link').first();
    const title = await link.innerText();
    const priceText = await row.locator('[data-testid="price"]').first().innerText().catch(() => '0');
    const price = Number(priceText.replace(/[^\d.]/g, '')) || 0;
    const url = (await link.getAttribute('href')) ?? '';
    topHits.push({ title, price, url });
  }
  return { resultsCount: count, topHits };
}
```
Note the bias toward roles, visible state, and network‑driven waits. This alone improves survival rates across minor UI changes.
Invariants and canary checks
Even robust drivers can go stale. We attach invariants (assertions that must hold) and canary checks (lightweight external monitors) to each intent.
Types of invariants:
- Precondition invariants
  - Page is not in an interstitial (cookie wall, age gate, auth wall) unless expected.
  - Required inputs are visible and enabled.
- Postcondition invariants
  - DOM shows expected anchor elements (e.g., profile avatar) and server context (auth cookie present).
  - No error banners; request logs did not carry 4xx or 5xx after submit.
- Temporal invariants
  - Response times remain under p95 budget; retries under threshold.
- Semantic invariants
  - Derived values consistent: subtotal == sum(lineItems), currency matches locale.
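A semantic invariant such as 'subtotal == sum(lineItems)' can be checked on data the driver has already extracted, without touching the page. The sketch below assumes a hypothetical `CartSnapshot` shape with prices in integer cents to avoid floating‑point drift.

```typescript
export type LineItem = { sku: string; unitPriceCents: number; quantity: number };
export type CartSnapshot = { currency: string; lineItems: LineItem[]; subtotalCents: number };

// Returns the list of violated invariants; an empty list means the snapshot is consistent.
export function checkCartInvariants(cart: CartSnapshot, expectedCurrency: string): string[] {
  const violations: string[] = [];

  const sum = cart.lineItems.reduce((acc, li) => acc + li.unitPriceCents * li.quantity, 0);
  if (sum !== cart.subtotalCents) {
    violations.push(`subtotal ${cart.subtotalCents} != sum(lineItems) ${sum}`);
  }
  if (cart.currency !== expectedCurrency) {
    violations.push(`currency ${cart.currency} != expected ${expectedCurrency}`);
  }
  if (cart.lineItems.some(li => !Number.isInteger(li.quantity) || li.quantity <= 0)) {
    violations.push('line item with non-positive or fractional quantity');
  }
  return violations;
}
```

Returning a list rather than throwing on the first failure lets the adapter log every violation in one structured event.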
A compact pattern is to co‑locate invariants with the driver and expose health probes for CI and production monitors:
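```ts
import { expect, type Page } from '@playwright/test';
import { login } from './login'; // assumed location of the login driver

export async function assertLoggedIn(page: Page): Promise<void> {
  await expect(page.getByRole('img', { name: /avatar|profile/i })).toBeVisible();
  const cookies = await page.context().cookies();
  // The cookie name 'session' is site-specific; adjust it per adapter.
  if (!cookies.some(c => c.name === 'session' && c.value.length > 10)) {
    throw new Error('No session cookie');
  }
}

export async function canaryLogin(page: Page, creds: { user: string; pass: string }): Promise<void> {
  await login(page, { username: creds.user, password: creds.pass });
  await assertLoggedIn(page);
}
```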
A canary job can run 'canaryLogin' every few minutes with a synthetic account. Alerting thresholds should trigger well before production traffic fails.
Self‑healing selectors that prioritize semantics
Selector drift is inevitable. Our adapter needs a healing engine that chooses the best available anchor at runtime and records the event for later review.
Ranking signals (highest to lowest):
- ARIA role + accessible name (computed, not raw attributes)
- data‑testid or data‑qa that historically stays stable
- Labeled control relation (label 'for' → input)
- Visible text content within reason (locale aware)
- CSS id and stable class tokens (ignore hash‑like or build‑fingerprinted classes)
- Relative anchor (closest labeled ancestor/descendant)
- Spatial heuristics (fallback only): position near stable landmark
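Ignoring hash‑like or build‑fingerprinted classes needs a concrete test for what counts as a fingerprint. The two patterns below are heuristics assumed for illustration, not an exhaustive list; real sites will need their own additions.

```typescript
// Heuristic patterns for class tokens generated by CSS-in-JS libraries or bundlers.
const FINGERPRINT_PATTERNS: RegExp[] = [
  // Generated-looking prefix followed by a run that contains a digit, e.g. css-1q2w3e4.
  /^(css|sc|jsx)-[a-z0-9]*\d[a-z0-9]*$/i,
  // Trailing hex hash containing at least one digit, e.g. Button_3fa9c1d2.
  /[_-](?=[a-f0-9]*\d)[a-f0-9]{6,}$/i,
];

export function isStableClassToken(token: string): boolean {
  return !FINGERPRINT_PATTERNS.some(p => p.test(token));
}

// Build a CSS selector from the stable tokens only; null if none survive.
export function stableClassSelector(className: string): string | null {
  const stable = className
    .split(/\s+/)
    .filter(t => t.length > 0 && isStableClassToken(t));
  return stable.length > 0 ? stable.map(t => `.${t}`).join('') : null;
}
```

Tokens containing characters such as ':' would also need CSS escaping before use in a selector; the sketch leaves that out.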
A simple healing orchestrator:
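```ts
import type { Locator, Page } from '@playwright/test';

type Role = Parameters<Page['getByRole']>[0];
type Anchor = {
  query: string; // role, label text, test id, visible text, or selector, depending on strategy
  name?: string; // accessible name, used only by the 'role' strategy
  strategy: 'role' | 'label' | 'testid' | 'text' | 'css' | 'relative' | 'spatial';
  weight: number;
};

export class Healer {
  constructor(private page: Page, private anchors: Anchor[]) {}

  async firstVisible(timeoutMs = 3000): Promise<Locator> {
    // Sort a copy so the caller's anchor list is not mutated.
    const sorted = [...this.anchors].sort((a, b) => b.weight - a.weight);
    // Split the budget evenly so the primary anchor cannot starve the fallbacks.
    const perAnchorMs = Math.max(250, Math.floor(timeoutMs / Math.max(1, sorted.length)));
    for (const [rank, a] of sorted.entries()) {
      try {
        const loc = this.locatorFor(a);
        await loc.waitFor({ state: 'visible', timeout: perAnchorMs });
        this.logHeal(a, rank);
        return loc;
      } catch {
        /* try next */
      }
    }
    throw new Error('No viable selector');
  }

  locatorFor(a: Anchor): Locator {
    switch (a.strategy) {
      case 'role':
        return this.page.getByRole(a.query as Role, a.name ? { name: a.name } : undefined);
      case 'label':
        return this.page.getByLabel(a.query);
      case 'testid':
        return this.page.locator(`[data-testid="${a.query}"]`);
      case 'text':
        return this.page.getByText(a.query);
      // 'css', 'relative', and 'spatial' anchors are all stored as selector strings.
      case 'css':
      case 'relative':
      case 'spatial':
        return this.page.locator(a.query);
    }
  }

  logHeal(a: Anchor, rank: number): void {
    /* emit metric + event; rank > 0 means a fallback was used, so raise severity */
  }
}
```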
The generator emits multiple anchors per element when recording, and weights them based on stability history and heuristics (e.g., 'data-testid' beats text). When a lower‑ranked anchor is used, we record a 'heal' event with DOM diffs to inform a future PR that updates the primary selector list.
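One way to combine the heuristic ranking with stability history is a smoothed survival rate: each strategy starts from a prior weight and moves toward its observed hit rate as runs accumulate. The prior values, `priorStrength`, and the `History` shape below are assumptions chosen for illustration.

```typescript
export type Strategy = 'role' | 'testid' | 'label' | 'text' | 'css' | 'relative' | 'spatial';

// Prior weights mirroring the ranking above; the numbers themselves are assumptions.
const PRIOR: Record<Strategy, number> = {
  role: 1.0,
  testid: 0.9,
  label: 0.8,
  text: 0.7,
  css: 0.6,
  relative: 0.5,
  spatial: 0.3,
};

// hits = number of runs in which the anchor resolved to the intended element.
export type History = { runs: number; hits: number };

export function anchorWeight(strategy: Strategy, h: History, priorStrength = 10): number {
  // With no history the weight equals the prior; with many runs, hits/runs dominates.
  const prior = PRIOR[strategy];
  return (h.hits + prior * priorStrength) / (h.runs + priorStrength);
}
```

With this scheme a role anchor that has started failing (say 20 hits in 100 runs) drops below a test id that keeps resolving, so the healer tries the more reliable anchor first.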
For advanced sites, add a semantic model:
- Learn a lightweight embedding over node text + ARIA to re‑find elements whose wording changed (e.g., 'Sign in' → 'Log in').
- Store per‑site landmarks (nav, footer, main) and anchor relative positions.
Contract‑driven CI for adapters
Treat adapters like SDKs: semver, tests, docs, and release gating. A minimal CI contract includes:
- Golden traces
  - For each intent, store a canonical trace (events + expected DOM/network anchors) on a stable staging environment.
  - Re‑run and diff at PR time; allow expected deltas (selectors updated) but fail on semantic changes (e.g., fewer results than threshold when using fixtures).
- Schema diffs
  - Changes to input/output types require semver bumps. Generate OpenAPI‑like specs for intents and diff them in CI.
- Synthetic e2e tests
  - Canary flows against production with synthetic accounts. Fail PRs that degrade p95 or increase healing severity.
- Lint and static checks
  - Disallow brittle patterns (nth‑child, sleep), enforce semantic waits.
- Observability budget checks
  - p50/p95 latency budgets, retry counts, error taxonomies stable or improving.
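The schema‑diff gate can be reduced to a function that compares two versions of an intent's input spec and reports the minimum semver bump. The `IntentSpec` format below is a hypothetical flat representation, far simpler than an OpenAPI‑like document, and the breaking‑change rules are one reasonable policy rather than a standard.

```typescript
export type ParamSpec = { type: 'string' | 'number' | 'boolean'; required: boolean };
export type IntentSpec = Record<string, ParamSpec>;
export type Bump = 'none' | 'minor' | 'major';

const has = (obj: object, key: string) => Object.prototype.hasOwnProperty.call(obj, key);

export function requiredBump(prev: IntentSpec, next: IntentSpec): Bump {
  let bump: Bump = 'none';

  for (const [name, before] of Object.entries(prev)) {
    // Removing a parameter breaks callers that still send it.
    if (!has(next, name)) return 'major';
    const after = next[name];
    // Changing a type or tightening optional -> required also breaks callers.
    if (after.type !== before.type) return 'major';
    if (!before.required && after.required) return 'major';
    // Relaxing required -> optional is backward compatible.
    if (before.required && !after.required) bump = 'minor';
  }

  for (const [name, spec] of Object.entries(next)) {
    if (has(prev, name)) continue;
    if (spec.required) return 'major'; // new mandatory input breaks existing calls
    bump = 'minor';                    // new optional input is additive
  }
  return bump;
}
```

A CI step could compare the result against the version change in the PR and fail when the declared bump is smaller than the required one.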
Example adapter test with Playwright Test:
```ts
import { test, expect } from '@playwright/test';
import { searchProducts } from '../adapters/shop/search';

test('search returns at least 10 hits for stable fixture', async ({ page }) => {
  await page.goto('https://staging.shop.example');
  const res = await searchProducts(page, { query: 'wireless mouse' });
  expect(res.resultsCount).toBeGreaterThanOrEqual(10);
  expect(res.topHits[0].title.toLowerCase()).toContain('mouse');
});
```
And a CI gate (YAML) that runs on every PR:
```yaml
name: adapter-ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npm run build
      - run: npm run test:e2e -- --reporter=line
      - run: npm run golden:diff
      - run: npm run schema:diff
```
Publish adapters as versioned packages. Major bumps signal breaking intent contracts; minor/patch carry compatible selector or invariant updates.
Putting it all together: an end‑to‑end example
Suppose we want an adapter for a retail site with three intents: 'login', 'search_products', 'add_to_cart'.
- Record sessions
  - Capture login with username + OTP, a typical product search with brand filter, and adding an item.
- Segment and label
  - Recognize three segments by URL changes and network calls: '/auth', '/search', '/cart'.
- Infer schemas
  - LoginParams: username, password, otp? (from observed prompts); SearchParams: query, filters; AddToCartParams: sku, quantity (from DOM + server error when exceeding stock).
- Synthesize drivers with invariants
  - Drivers use role/label anchors, request‑driven waits, and assert postconditions (auth cookie present; results grid visible; cart total increased).
- Add selector healing
  - Each target element stores 3–6 anchors with weights. On fallback, log a heal event + DOM snippet.
- Wire CI contracts
  - Golden traces stored for three flows. Schema diffs gate PRs. Canary account runs login hourly; search and cart daily.
- Ship and observe
  - Publish '@adapters/shop' version 1.2.3. Expose an 'Intents' catalog for LLM usage.
Now LLM agents can call high‑level functions instead of inventing brittle click sequences:
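```ts
// The *ParamsZ validators are assumed to be exported by the adapter package
// alongside the drivers.
import {
  login, searchProducts, addToCart,
  LoginParamsZ, SearchParamsZ, AddToCartParamsZ,
} from '@adapters/shop';

// Tool signature for the LLM runtime
export const tools = {
  login: { params: LoginParamsZ, run: login },
  search_products: { params: SearchParamsZ, run: searchProducts },
  add_to_cart: { params: AddToCartParamsZ, run: addToCart },
};
```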
The agent prompt instructs it to plan using only exposed intents and to declare parameters explicitly. The adapter enforces type contracts, invariants, and heals selectors as needed.
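The enforcement step can be sketched as a small gate that sits between the agent's tool call and the driver. It rejects unknown intents and invalid parameters before any browser action runs. `Validator`, `ToolCall`, and `admit` are hypothetical names; any object with a throwing `parse` method, such as a zod schema, fits the `Validator` shape.

```typescript
// Anything that returns validated params or throws on invalid input.
export type Validator<P> = { parse: (raw: unknown) => P };

export type ToolCall = { intent: string; args: unknown };

export type Admission =
  | { ok: true; intent: string; params: unknown }
  | { ok: false; error: string };

export function admit(
  validators: Record<string, Validator<unknown>>,
  call: ToolCall,
): Admission {
  // Only intents the adapter explicitly exposes are callable.
  if (!Object.prototype.hasOwnProperty.call(validators, call.intent)) {
    return { ok: false, error: `Unknown intent: ${call.intent}` };
  }
  try {
    const params = validators[call.intent].parse(call.args);
    return { ok: true, intent: call.intent, params };
  } catch (e) {
    const reason = e instanceof Error ? e.message : String(e);
    return { ok: false, error: `Invalid params for ${call.intent}: ${reason}` };
  }
}
```

Returning the error as data rather than throwing lets the runtime feed the message back to the agent so it can correct the call.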
Security, ethics, and governance
Adapters make automation more powerful; we must ensure responsible use.
- Respect site policies
  - Read robots.txt and TOS. Some sites forbid automation or scraping. Obtain written permission when in doubt.
- Protect user data
  - Redact PII at the recorder. Encrypt stored traces. Use synthetic canary accounts.
- Be a good citizen
  - Rate limit. Use backoff on server errors. Identify your automation with a clear UA string when permitted.
- Handle auth safely
  - Use OS keychain or vault for secrets. Never log credentials. Scope tokens minimally.
- Do not circumvent access controls
  - Do not bypass paywalls or CAPTCHAs unless explicitly authorized.
- Compliance
  - For regulated data, maintain audit logs of adapter activity and approvals.
Observability that matters
You cannot improve what you cannot see. Instrument adapters with:
- Structured logs per intent
  - intent_name, version, latency_ms, retry_count, healed_selector_count, invariant_failures.
- Traces that stitch browser and network
  - Correlate click → XHR → DOM update. Assign a trace_id for cross‑service correlation.
- Metrics and alerts
  - p50/p95 intent latency, error rate, healing severity histogram.
- Session replays for failures
  - Store minimal DOM snapshots and diffs to diagnose regressions without full video.
These signals feed both CI and runtime canaries.
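As an illustration of how the per‑intent log fields roll up into the metrics above, the sketch below computes nearest‑rank percentiles and a simple error rate. The field names come from the list above; `percentile` and `summarize` are hypothetical helpers, and treating any invariant failure as an error is an assumption.

```typescript
export type IntentLog = {
  intent_name: string;
  version: string;
  latency_ms: number;
  retry_count: number;
  healed_selector_count: number;
  invariant_failures: number;
};

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
export function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

export function summarize(logs: IntentLog[]) {
  const latencies = logs.map(l => l.latency_ms);
  return {
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    // Assumption: a run counts as an error if any invariant failed.
    errorRate: logs.filter(l => l.invariant_failures > 0).length / logs.length,
    healedRuns: logs.filter(l => l.healed_selector_count > 0).length,
  };
}
```

In production these aggregates would normally come from the metrics backend; a local version like this is mainly useful for CI budget checks over a batch of test runs.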
Design choices that pay off
Based on field experience, these choices significantly improve adapter survival and maintainability:
- Bias to ARIA and labels first
  - Accessibility metadata is more stable than CSS. Use computed accessible names.
- Keep drivers pure and deterministic
  - No global mutable state. Given inputs and a page, produce the same outputs.
- Fail early and loudly in CI; degrade gracefully in prod
  - CI gates should be strict. In prod, prefer a lower‑impact fallback flow where safe.
- Co‑locate code and contracts
  - Keep type validators, invariants, and driver steps in one module per intent. Avoid drift across files.
- Treat healing as a signal, not a crutch
  - A healed selector should open an issue/PR automatically. Humans should bless stable updates.
- Prefer small, composable intents
  - 'apply_price_filter' is easier to reuse and test than a monolithic 'configure_filters'. Compose at the agent level.
Tooling suggestions and ecosystem fit
- Browser automation: Playwright (recommended) for robust selectors and tracing; Selenium or WebDriver BiDi for broader language support.
- Validation: zod, valibot, or io‑ts for TS; pydantic for Python adapters.
- Diffing: DOM‑diff libraries for golden traces; JSON diff for schema contracts.
- Observability: OpenTelemetry for traces; Prometheus + Grafana for metrics.
- Storage: Object store for traces; Git LFS or a dedicated artifact store for snapshots.
- CI: GitHub Actions, GitLab CI; nightly synthetic jobs via a scheduler.
Future directions
- Site‑provided adapters
  - Similar to OpenAPI for REST, a 'WebIntent' manifest could advertise typed, automation‑friendly capabilities of a site.
- Standard selector contracts
  - A community spec for ranked selector anchors and healing semantics to improve portability across tools.
- Learning‑augmented healing
  - Train a small model on historical DOMs to predict resilient anchors and preemptively rewrite selectors.
- Real user monitoring feedback loop
  - Feed anonymized production healing events back into the generator to auto‑propose PRs with updated anchors.
- Sandboxed agent runtimes
  - WASM or containerized runners with scoped permissions and network policies to harden automation.
Checklist: from clickstreams to adapters
- Recording
  - Capture DOM/ARIA context, network, and layout cues; redact secrets.
- Segmentation
  - Infer tasks from URL, network, and DOM diffs; human‑verify labels.
- Schema inference
  - Extract input types, constraints, and conditional logic; attach validators.
- Driver synthesis
  - Generate semantic selectors, deterministic waits, and idempotent steps.
- Invariants and canaries
  - Add pre/postconditions and synthetic checks with alerts.
- Self‑healing
  - Maintain ranked anchors; log healing; auto‑propose updates.
- CI contracts
  - Golden traces, schema diffs, lint rules, latency budgets, versioned releases.
- Observability
  - Structured logs, traces, metrics, replay of failures.
- Governance
  - Respect TOS, rate limits, and data protection requirements.
Conclusion
Auto‑generated site adapters transform LLM browser agents from experimental toys into production‑grade automata. By converting human clickstreams into typed intents; synthesizing drift‑tolerant, invariant‑guarded drivers; and enforcing CI contracts, we create a system that is both maintainable and resilient. The adapter layer absorbs web UI chaos so agents can focus on goals, not clicks.
Teams that invest in this approach can expect fewer flaky runs, faster incident response to UI drift, and easier collaboration between platform and product engineers. Most importantly, typed intent APIs create a safe, auditable perimeter for autonomous systems, which is exactly what we need as agents take on more of the mundane web work for us.
