OpenTelemetry for Agentic Browsers: End‑to‑End Tracing of CDP/BiDi, UA/Client Hints, and Tools to Cut Browser Agent Security Risk

Agentic browsers — automated browsing systems driven by LLMs or planning agents — are migrating from research labs into production. They read, click, fill, upload, download, and execute tools. With that power comes observability and security responsibilities: we must know what they did, why, and whether it was safe.

OpenTelemetry (OTel) is the obvious choice for end-to-end visibility. But standard web and HTTP instrumentation alone won’t cover headless browsers controlled via CDP (Chrome DevTools Protocol) or WebDriver BiDi, nor the planner-tool-browser pipeline unique to AI agents. This guide shows a pragmatic, end-to-end pattern to instrument agentic browsers with OTel:

Trace CDP/BiDi command and event flows, network/DOM actions, and the planner/tool calls that trigger them.
Capture User-Agent and Client Hints consistently.
Propagate context between the agent brain, tool layer, and browser worker processes.
Compute per-session and per-action security risk scores from OTel signals.
Debug and operate production pipelines using tail-based sampling, log/trace correlation, and guardrail alerts.

This is a build guide. Expect code, clear conventions, and trade-off commentary. Examples use Node.js/TypeScript (Playwright/Puppeteer) and Python (Selenium BiDi), but the patterns are portable.

Why OTel for agentic browsers

Classic browser automation tooling gives you logs and maybe screenshots. That’s not enough when:

An LLM planner issues a tool call that triggers a cascade of navigation, fetches, and DOM writes.
The agent’s identity (User-Agent/Client Hints) must be consistent and auditable.
You need to detect and quantify security risk (exfiltration, credential leakage, drive-by downloads) in near real time.
You must trace across process boundaries: planner → tool runner → browser worker → external sites → callbacks.

OpenTelemetry provides:

Traces and spans to model complex, nested workflows.
Semantic conventions for HTTP, RPC, messaging, and logs.
Context propagation (W3C Trace Context) across services and protocols.
Metrics to signal rate/latency/error, linked to traces via exemplars.
Vendor-neutral export to Jaeger, Tempo, Elastic, Honeycomb, Datadog, and more via OTLP and the Collector.

The missing piece is a set of conventions and adapters for CDP/BiDi and agent tooling. That’s what we’ll build.

Architecture and span model

We’ll model a single agent run as a trace with a top-level span representing the plan or task. Under it, we nest spans for:

Tool decisions and calls (e.g., browse, search, extract, fill_form).
Browser control protocols (CDP/BiDi): commands and major events grouped by intent.
Network requests (HTTP client spans) initiated by the browser.
DOM interactions: clicks, inputs, script injections.
Risk-analysis spans and events.

Recommended span hierarchy:

agent.run (root)
- agent.plan
- tool.call:browser.navigate (one per high-level action)
  - browser.navigate (CDP/BiDi commands)
    - http.client (for main resource)
    - browser.dom.event:load
    - browser.dom.action:click
    - http.client (subresources, optionally sampled)
- tool.call:extract_table
- risk.assessment (aggregates attributes from children)

Use span links for cross-navigation relationships (e.g., a popup, or a download handled in another context). Metric exemplars are a separate mechanism: they attach a trace ID to a metric data point rather than linking spans.

OpenTelemetry setup (Node.js)

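We’ll use the OTel SDK for Node.js, an OTLP exporter, and a simple resource describing the agent service. The snippet targets the 1.x JavaScript SDK; the 2.x line replaces new Resource(...) with resourceFromAttributes(...) and provider.addSpanProcessor(...) with a spanProcessors constructor option, so adjust to the version you install.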

ts
// instrumentation/otel.ts
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes as Res } from '@opentelemetry/semantic-conventions';

// Optional: uncomment for SDK internal diagnostics
// diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO);

export function initTracing() {
  const provider = new NodeTracerProvider({
    resource: new Resource({
      [Res.SERVICE_NAME]: 'browser-agent',
      [Res.SERVICE_VERSION]: process.env.SERVICE_VERSION || '0.1.0',
      'deployment.environment': process.env.DEPLOY_ENV || 'dev'
    })
  });

  const exporter = new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || 'http://localhost:4318/v1/traces'
  });

  provider.addSpanProcessor(new BatchSpanProcessor(exporter));
  provider.register();
}

Then initialize at process start:

ts
// index.ts
import { initTracing } from './instrumentation/otel';
initTracing();

In Python, a similar setup:

python
# instrumentation/otel.py
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter


def init_tracing():
    resource = Resource.create({
        'service.name': 'browser-agent',
        'service.version': '0.1.0',
        'deployment.environment': 'dev',
    })
    provider = TracerProvider(resource=resource)
    exporter = OTLPSpanExporter(endpoint='http://localhost:4318/v1/traces')
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

Instrumenting CDP (Puppeteer/Playwright)

CDP exposes domains for Network, Page, Runtime, DOM, Input, etc. We’ll capture high-level commands as spans and record low-level events as span events with structured attributes. The goal is to avoid a flood of spans but retain enough detail for root-cause analysis.

Playwright example with CDP events

ts
import { chromium } from 'playwright';
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('browser-agent');

export async function runNavigation(url: string, headers?: Record<string, string>) {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
               '(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    extraHTTPHeaders: headers || {}
  });
  const page = await context.newPage();

  // Attach CDP session to observe Network events
  const session = await context.newCDPSession(page);
  await session.send('Network.enable');

  const span = tracer.startSpan('browser.navigate', {
    attributes: {
      'browser.protocol': 'cdp',
      'browser.target.url': url,
      'user_agent.original': await page.evaluate(() => navigator.userAgent)
    }
  });

  // Listen to request lifecycle
  session.on('Network.requestWillBeSent', (evt: any) => {
    span.addEvent('network.request', {
      'http.request.method': evt.request?.method,
      'url.full': evt.request?.url,
      'network.requestId': evt.requestId,
      'network.initiator.type': evt.initiator?.type
    });
  });

  session.on('Network.responseReceived', (evt: any) => {
    span.addEvent('network.response', {
      'http.response.status_code': evt.response?.status,
      'url.full': evt.response?.url,
      'network.requestId': evt.requestId,
      'response.mime_type': evt.response?.mimeType
    });
  });

  try {
    await page.goto(url, { waitUntil: 'networkidle' });
    span.setAttribute('browser.page.title', await page.title());
    span.addEvent('dom.load', {
      'document.ready_state': await page.evaluate(() => document.readyState)
    });
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (err: any) {
    span.recordException(err);
    span.setStatus({ code: SpanStatusCode.ERROR, message: err?.message });
  } finally {
    span.end();
    await browser.close();
  }
}

Notes:

Use a single span per high-level navigation and add CDP events as span events. This is sufficient for most debugging without exploding span counts.
For large pages, consider sampling subresource events (e.g., only fonts/scripts/images with status >= 400 or content-type of interest).
Record the chosen User-Agent string and, if possible, Client Hints seen in responses (more below).
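The subresource sampling note above could be implemented as a small predicate that runs before span.addEvent. This is only a sketch: the NetworkResponseSummary shape, the INTERESTING_MIME_PREFIXES list, and the function name are assumptions made for illustration, not part of CDP or Playwright.

```typescript
// Hypothetical summary built from a CDP Network.responseReceived event.
interface NetworkResponseSummary {
  status: number;
  mimeType: string;
  isMainFrameDocument: boolean;
}

// Assumed list of content types worth recording even when the request succeeds.
const INTERESTING_MIME_PREFIXES = ['text/html', 'application/json', 'application/javascript'];

function shouldRecordResponse(r: NetworkResponseSummary): boolean {
  if (r.isMainFrameDocument) return true; // always keep the main resource
  if (r.status >= 400) return true;       // keep failures of any content type
  const mime = r.mimeType.toLowerCase();
  return INTERESTING_MIME_PREFIXES.some((prefix) => mime.startsWith(prefix));
}
```

Calling it inside the Network.responseReceived handler keeps successful images and fonts out of the span while failures remain visible.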

Puppeteer request-level spans

If you need timing on individual fetches initiated by the page, create child spans keyed by CDP requestId.

ts
import puppeteer from 'puppeteer';
import { trace, Span, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('browser-agent');

export async function navigateWithRequestSpans(url: string) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  const client = await page.target().createCDPSession();
  await client.send('Network.enable');

  // Open spans keyed by CDP requestId
  const requestSpans = new Map<string, Span>();

  const endSpan = (requestId: string, error?: string) => {
    const span = requestSpans.get(requestId);
    if (!span) return;
    if (error) span.setStatus({ code: SpanStatusCode.ERROR, message: error });
    span.end();
    requestSpans.delete(requestId);
  };

  client.on('Network.requestWillBeSent', (evt: any) => {
    // A redirect reuses the requestId, so close the previous hop first
    endSpan(evt.requestId);
    const span = tracer.startSpan('http.client', {
      attributes: {
        'network.protocol.name': 'http',
        'browser.protocol': 'cdp',
        'url.full': evt.request.url,
        'http.request.method': evt.request.method
      }
    });
    requestSpans.set(evt.requestId, span);
  });

  client.on('Network.responseReceived', (evt: any) => {
    const span = requestSpans.get(evt.requestId);
    if (span) {
      span.setAttribute('http.response.status_code', evt.response.status);
      span.setAttribute('response.mime_type', evt.response.mimeType);
    }
  });

  client.on('Network.loadingFinished', (evt: any) => endSpan(evt.requestId));
  // Failed or cancelled requests never emit loadingFinished
  client.on('Network.loadingFailed', (evt: any) => endSpan(evt.requestId, evt.errorText));

  try {
    await page.goto(url, { waitUntil: 'networkidle0' });
  } finally {
    // End spans for requests still in flight so they are exported
    for (const id of [...requestSpans.keys()]) endSpan(id);
    await browser.close();
  }
}

Careful: this can produce many spans on resource-heavy pages. Apply sampling or filtering (e.g., only main-frame and XHR/fetch).
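One way to apply that filtering is to gate span creation on the CDP resource type and cap the number of spans per navigation. The sketch below assumes a hypothetical RequestSpanBudget class and a default cap of 200; the type and frameId fields mirror what Network.requestWillBeSent reports, but confirm them against the protocol version you run.

```typescript
// Subset of the CDP Network.requestWillBeSent payload used here.
interface RequestEvent {
  requestId: string;
  type?: string;    // CDP ResourceType, e.g. 'Document', 'XHR', 'Fetch', 'Image'
  frameId?: string;
}

const TRACED_TYPES = new Set(['Document', 'XHR', 'Fetch']);

// Hypothetical helper: decides whether a request gets its own span.
class RequestSpanBudget {
  private used = 0;
  private readonly mainFrameId: string;
  private readonly maxSpans: number;

  constructor(mainFrameId: string, maxSpans = 200) {
    this.mainFrameId = mainFrameId;
    this.maxSpans = maxSpans;
  }

  allow(evt: RequestEvent): boolean {
    if (!evt.type || !TRACED_TYPES.has(evt.type)) return false;
    // Documents loaded in child frames are skipped; XHR/fetch from any frame are kept.
    if (evt.type === 'Document' && evt.frameId !== this.mainFrameId) return false;
    if (this.used >= this.maxSpans) return false; // budget exhausted
    this.used += 1;
    return true;
  }
}
```

Requests that are refused a span can still be recorded as span events on the navigation span, so nothing is lost entirely.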

Instrumenting WebDriver BiDi (Selenium)

WebDriver BiDi provides event streams for browsing contexts and network traffic across browsers, not just Chromium. Instrumentation patterns mirror CDP, but APIs differ per language binding and driver maturity.

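Python example with Selenium BiDi for navigation and network events. Treat the module paths and handler names as illustrative: the Python bindings for BiDi have changed across Selenium 4 releases, so check them against the version you have installed.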

python
from selenium import webdriver
from selenium.webdriver.common.bidi.bidi_connection import BidiConnection
from selenium.webdriver.common.bidi.modules.network import Network
from selenium.webdriver.common.bidi.modules.browsing_context import BrowsingContext
from opentelemetry import trace

tracer = trace.get_tracer('browser-agent')


def navigate_bidi(url: str):
    options = webdriver.ChromeOptions()
    options.add_argument('--headless=new')
    driver = webdriver.Chrome(options=options)

    bidi: BidiConnection = driver.bidi_connection
    network = Network(bidi)
    context = BrowsingContext(bidi, driver.current_window_handle)

    # Subscribe to events
    bidi.session.subscribe(events=['network.beforeRequestSent', 'network.responseCompleted'])

    span = tracer.start_span('browser.navigate', attributes={
        'browser.protocol': 'bidi',
        'browser.target.url': url
    })

    def on_before_request(event):
        span.add_event('network.request', {
            'http.request.method': event['request']['method'],
            'url.full': event['request']['url'],
            'network.requestId': event['request']['request'],
        })

    def on_response_completed(event):
        span.add_event('network.response', {
            'http.response.status_code': event['response']['status'],
            'url.full': event['response']['url'],
            'network.requestId': event['request']['request'],
        })

    network.on_before_request_sent(on_before_request)
    network.on_response_completed(on_response_completed)

    try:
        driver.get(url)
        span.set_attribute('browser.page.title', driver.title)
    finally:
        span.end()
        driver.quit()

Caveats:

BiDi APIs and driver support evolve; not all drivers support request modification or header injection. Prefer browser-level capabilities to set default headers when needed.
Event names and payload shapes may differ by driver version; guard your code and include safe defaults.

Capturing User-Agent and Client Hints

To reliably capture and reproduce identity, record both the explicit User-Agent string used and the Client Hints accepted/returned.

User-Agent: set at context creation (e.g., Playwright newContext userAgent) and record as span attributes on navigation spans and the agent.run root span.
Client Hints: on requests, log headers like Sec-CH-UA, Sec-CH-UA-Platform, Sec-CH-UA-Model, Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Platform-Version, and on responses Accept-CH and Critical-CH.

Playwright example recording hints via request interception:

ts
import { chromium, Request } from 'playwright';
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('browser-agent');

export async function recordUAAndCH(url: string) {
  const browser = await chromium.launch();
  const context = await browser.newContext({
    userAgent: 'MyAgent/1.0 (compatible)'
  });
  const page = await context.newPage();

  const span = tracer.startSpan('browser.navigate', {
    attributes: {
      'user_agent.original': 'MyAgent/1.0 (compatible)'
    }
  });

  page.on('request', (req: Request) => {
    const headers = req.headers();
    // Record a subset to keep cardinality bounded
    const ch = {
      'sec-ch-ua': headers['sec-ch-ua'],
      'sec-ch-ua-platform': headers['sec-ch-ua-platform'],
      'sec-ch-ua-model': headers['sec-ch-ua-model']
    };
    span.addEvent('client_hints.request', ch);
  });

  page.on('response', async (res) => {
    const headers = await res.allHeaders();
    const ch = {
      'accept-ch': headers['accept-ch'],
      'critical-ch': headers['critical-ch']
    };
    span.addEvent('client_hints.response', ch);
  });

  await page.goto(url);
  span.end();
  await browser.close();
}

Semantic note: the stable OpenTelemetry HTTP conventions use user_agent.original, which replaces the older http.user_agent. If your backend or dashboards still expect http.user_agent, set both attributes during the transition.
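Once Accept-CH is recorded, it can be compared against the hints you are willing to send. The following sketch assumes a hypothetical ALLOWED_HINTS set (here the three low-entropy hints Chromium sends by default) and treats Accept-CH as a plain comma-separated list of case-insensitive names.

```typescript
// Hints this deployment is willing to send; an assumption for illustration.
const ALLOWED_HINTS = new Set(['sec-ch-ua', 'sec-ch-ua-mobile', 'sec-ch-ua-platform']);

// Split an Accept-CH header value into normalized hint names.
function parseAcceptCH(headerValue: string | undefined): string[] {
  if (!headerValue) return [];
  return headerValue
    .split(',')
    .map((token) => token.trim().toLowerCase())
    .filter((token) => token.length > 0);
}

// Hints the server asked for that fall outside the allowlist.
function excessiveHints(headerValue: string | undefined): string[] {
  return parseAcceptCH(headerValue).filter((hint) => !ALLOWED_HINTS.has(hint));
}
```

The returned list is small and bounded, so it can be attached to the client_hints.response event or fed into a risk rule.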

Context propagation across planner → tool → browser

Agent pipelines span multiple processes (or containers): an LLM planner chooses actions, a tool runner invokes browser work, and the browser issues HTTP. You need to:

Attach all spans to the same trace.
Inject W3C Trace Context (traceparent, tracestate) into HTTP requests the browser makes when it’s appropriate (e.g., to your own APIs), without leaking it broadly to third-party sites.

Pattern:

The request entering your agent service starts a root agent.run span.
The planner’s thought and decision process occurs within a child span agent.plan, annotated with prompt hashes and redaction-safe metadata (never log raw secrets or user data without scrubbing).
Each tool call becomes a child span (tool.call:name). If the tool uses RPC or HTTP to a worker, inject trace context into that call.

Node: injecting extra headers for first-party domains only.

ts
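import type { BrowserContext } from 'playwright';
import { context as otContext, propagation } from '@opentelemetry/api';

// Exact hostnames that may receive trace headers
const FIRST_PARTY_HOSTS = new Set(['api.mycompany.internal']);

function traceHeadersFor(url: string, ctx = otContext.active()): Record<string, string> {
  // Do not leak traceparent to third parties
  if (!FIRST_PARTY_HOSTS.has(new URL(url).hostname)) return {};
  const carrier: Record<string, string> = {};
  propagation.inject(ctx, carrier);
  return carrier;
}

// context.setExtraHTTPHeaders() would attach the headers to every request the
// context makes, third-party subresources included, so decide per request instead.
export async function propagateToFirstParty(context: BrowserContext) {
  const ctx = otContext.active(); // capture now; route handlers may run outside the span's async scope
  await context.route('**/*', async (route) => {
    const req = route.request();
    await route.continue({ headers: { ...req.headers(), ...traceHeadersFor(req.url(), ctx) } });
  });
}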

Browser-to-server context propagation is limited: you can’t reliably inject headers into all subresource requests across browsers, and you should not attempt it for third-party sites. Focus your propagation on first-party calls where you control the server. Record trace IDs on the browser side regardless, so you can correlate when you capture upstream logs.
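To record or validate trace IDs on the browser side, it helps to parse the traceparent header yourself. The sketch below handles the version 00 layout from W3C Trace Context (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags); the TraceParent type and function name are assumptions for illustration.

```typescript
interface TraceParent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean;
}

// version-traceid-parentid-flags, all lowercase hex
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return null;
  const [, version = '', traceId = '', spanId = '', flags = ''] = m;
  // Version ff and all-zero IDs are invalid per the specification.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  // Bit 0 of the flags byte is the sampled flag.
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```

A worker can log the parsed traceId alongside its own output, which lets you join browser-side records with upstream logs even when no header was injected.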

Tracing tool calls and planner decisions

Instrument the agent’s cognitive loop as spans with structured attributes. At minimum:

agent.plan: attributes like model.name, token.usage.input/output, prompt.hash (not the raw prompt), and strategy tags.
tool.call:name: input summary hash, target url/domain, intention (navigate, scrape, submit, download), guardrails triggered.

Generic tool wrapper (Node):

ts
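import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('browser-agent');

type ToolFn<I, O> = (input: I) => Promise<O>;

export function withToolSpan<I, O>(name: string, fn: ToolFn<I, O>): ToolFn<I, O> {
  return (input: I) =>
    // startActiveSpan makes this span the parent of spans started inside fn
    tracer.startActiveSpan(`tool.call:${name}`, async (span) => {
      span.setAttribute('tool.name', name);
      span.setAttribute('tool.input.hash', stableHash(input));
      try {
        const out = await fn(input);
        span.setAttribute('tool.output.hash', stableHash(out));
        span.setStatus({ code: SpanStatusCode.OK });
        return out;
      } catch (e: any) {
        span.recordException(e);
        span.setStatus({ code: SpanStatusCode.ERROR, message: e?.message });
        throw e;
      } finally {
        span.end();
      }
    });
}

// Sort object keys at every nesting level so equal values serialize identically
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const src = value as Record<string, unknown>;
    return Object.fromEntries(Object.keys(src).sort().map((k) => [k, canonicalize(src[k])]));
  }
  return value;
}

function stableHash(obj: unknown): string {
  // JSON.stringify returns undefined for undefined input (e.g. a void tool result)
  const s = JSON.stringify(canonicalize(obj)) ?? 'undefined';
  // simple non-crypto hash for cardinality control
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
  return `h:${(h >>> 0).toString(16)}`;
}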

This keeps data volumes small and safe. If you need full payload capture for debugging, gate it behind a feature flag and scrub PII.

Computing risk scores from OTel signals

Security posture depends on what the agent did and where. We can compute a risk score per agent.run using span events and attributes. The score is not a replacement for hard policies, but it’s a powerful triage and alerting signal.

Principles:

Use a transparent, additive model. Each risky behavior adds points.
Base it on spans/events you already emit.
Record the score and components as attributes on a risk.assessment span and on the root span for queryability.

Risk features (examples):

Cross-origin POST or PUT with cookies: +20
File download from unknown domain: +15
Form submission to domain not in allowlist: +25
Upload of large blob: +15
Password or token-like strings inserted into inputs: +30
Navigation to high-risk TLD or domain reputation negative: +10
Execution of inline script via Runtime.evaluate: +10
Prompt-injection keyword triggers on page: +10
Mixed-content (HTTP on HTTPS): +10
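The feature list can be expressed as a weight table plus a pure scoring function. This is a sketch: the key names are hypothetical identifiers for the features above, and counting each feature at most once per run is a design assumption rather than something the list prescribes.

```typescript
// Weights copied from the feature list; key names are hypothetical.
const RISK_WEIGHTS = {
  cross_origin_write_with_cookies: 20,
  download_unknown_domain: 15,
  form_submit_non_allowlist: 25,
  large_upload: 15,
  secret_in_form_fill: 30,
  high_risk_domain: 10,
  runtime_evaluate: 10,
  prompt_injection_keyword: 10,
  mixed_content: 10,
} as const;

type RiskKey = keyof typeof RISK_WEIGHTS;

// Each feature counts once per run, so a noisy page cannot inflate the score without bound.
function scoreRun(observed: Iterable<RiskKey>): { total: number; components: RiskKey[] } {
  const components = Array.from(new Set(observed));
  const total = components.reduce((sum, key) => sum + RISK_WEIGHTS[key], 0);
  return { total, components };
}
```

Keeping the weights in one table makes the model easy to review and lets you tune thresholds without touching the instrumentation code.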

Risk engine (Node) consuming emitted events:

ts
import { context as otContext, trace, Span } from '@opentelemetry/api';

const tracer = trace.getTracer('browser-agent');

interface RiskComponent { key: string; score: number; details?: string }

export class RiskAccumulator {
  private components: RiskComponent[] = [];
  add(c: RiskComponent) { this.components.push(c); }

  total() { return this.components.reduce((s, c) => s + c.score, 0); }

  emit(parentSpan: Span) {
    // The parent is passed as a Context (third argument); SpanOptions has no `parent` field
    const parentCtx = trace.setSpan(otContext.active(), parentSpan);
    const span = tracer.startSpan('risk.assessment', undefined, parentCtx);
    const total = this.total();
    span.setAttribute('risk.score', total);
    for (const c of this.components) {
      span.addEvent('risk.component', { 'risk.key': c.key, 'risk.score': c.score, 'risk.details': c.details || '' });
    }
    span.end();
    // Also attach to the parent span for easy querying
    parentSpan.setAttribute('risk.score', total);
    parentSpan.setAttribute('risk.component.count', this.components.length);
  }
}

Hooking it to CDP events:

ts
function analyzeNetworkEvent(evt: any, risk: RiskAccumulator) {
  const url = new URL(evt.request?.url || evt.response?.url || 'http://invalid.local');
  const method = (evt.request?.method || 'GET').toUpperCase();
  const isWrite = method === 'POST' || method === 'PUT' || method === 'PATCH';
  // Note: Network.requestWillBeSent often omits the Cookie header; the raw headers
  // arrive in Network.requestWillBeSentExtraInfo, so merge them in when you need this check.
  const hasCookies = (evt.request?.headers && Object.keys(evt.request.headers).some(h => h.toLowerCase() === 'cookie'));
  if (isWrite && hasCookies && !isFirstParty(url.hostname)) {
    risk.add({ key: 'cross_origin_post_with_cookies', score: 20, details: url.hostname });
  }
}

function isFirstParty(hostname: string) {
  // Match the apex domain as well as its subdomains
  return hostname === 'mycompany.internal' || hostname.endsWith('.mycompany.internal');
}

You can run the accumulator at the end of an agent.run and ship the score as both a span attribute and a metric (gauge or histogram) with exemplars linking back to a trace ID.

Metrics with exemplars

If you also publish a risk.score metric, use exemplars to link to traces (support varies by backend).

ts
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('browser-agent');
const riskHistogram = meter.createHistogram('agent.risk.score', {
  description: 'Per-run risk score',
  unit: '1',
});

function recordRisk(score: number) {
  riskHistogram.record(score, { 'deployment.environment': process.env.DEPLOY_ENV || 'dev' });
}

Note: the metrics API ships in @opentelemetry/api (1.3 and later); the standalone @opentelemetry/api-metrics package is deprecated. A MeterProvider must also be registered for the histogram to export anything. If you prefer simplicity, set the score only as span attributes and evaluate in your tracing backend.

Debugging production pipelines

Observability is as much about restraint as visibility. You must balance fidelity, cost, and privacy.

Sampling: Use head-based sampling at low rates for routine traffic, and tail-based sampling to keep high-value traces (errors, high risk.score, long latency). Tail-sampling is best done in the OpenTelemetry Collector.
Span cardinality: Avoid per-resource spans unless necessary; prefer span events.
Attribute cardinality: Hash or bucket values (domains, URLs). Use allowlists for full URLs when debugging.
PII scrubbing: Configure processors to drop attributes/events that may contain secrets. Never ship raw passwords, tokens, or full form contents.
Correlation: Inject trace IDs into application logs from your broker/worker to tie logs to traces.
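For the attribute cardinality point, one option is to reduce each URL to a few bounded attributes before it reaches a span. The sketch below keeps the host, hashes the path, and drops the query string; url.scheme and server.address follow OpenTelemetry naming, while url.path.hash and url.query.present are hypothetical attribute names.

```typescript
import { createHash } from 'node:crypto';

// Reduce a URL to low-cardinality attributes: keep the host, hash the path, drop the query.
function urlAttributes(rawUrl: string): Record<string, string> {
  const u = new URL(rawUrl);
  const pathHash = createHash('sha256').update(u.pathname).digest('hex').slice(0, 12);
  return {
    'url.scheme': u.protocol.replace(':', ''),
    'server.address': u.hostname,
    'url.path.hash': pathHash,                         // hypothetical attribute name
    'url.query.present': String(u.search.length > 0),  // hypothetical attribute name
  };
}
```

Because the query string is never copied, tokens passed as URL parameters stay out of telemetry, and identical paths still group together through the shared hash.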

Tail-based sampling policy (Collector)

A minimal Collector config to keep error and high-risk traces:

yaml
processors:
  tail_sampling:
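    decision_wait: 30s  # example value; set above your longest expected agent.run so late attributes such as risk.score are seen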
    num_traces: 100000
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: high_risk
        type: numeric_attribute
        numeric_attribute:
          key: risk.score
          min_value: 20
          max_value: 10000
      - name: default
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
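To reason about what this policy keeps, the decision logic can be mirrored in a few lines of code. The sketch below is an illustration only: SpanSummary and decide are hypothetical, and the modulo bucket stands in for the Collector's own probabilistic hashing, which works differently.

```typescript
interface SpanSummary {
  status: 'OK' | 'ERROR' | 'UNSET';
  attributes: Record<string, unknown>;
}

interface Decision {
  keep: boolean;
  policy: string;
}

// Mirrors the three policies above: the first one that matches keeps the whole trace.
function decide(traceId: string, spans: SpanSummary[], samplePercent = 10): Decision {
  if (spans.some((s) => s.status === 'ERROR')) return { keep: true, policy: 'errors' };
  const highRisk = spans.some((s) => {
    const value = s.attributes['risk.score'];
    return typeof value === 'number' && value >= 20 && value <= 10000;
  });
  if (highRisk) return { keep: true, policy: 'high_risk' };
  // Deterministic stand-in for probabilistic sampling: bucket on the last 8 hex digits.
  const bucket = parseInt(traceId.slice(-8), 16) % 100;
  return bucket < samplePercent ? { keep: true, policy: 'default' } : { keep: false, policy: 'none' };
}
```

A helper like this is useful in unit tests that assert a given synthetic run would survive sampling before you rely on it for alerting.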

Pipe traces into a backend (Tempo/Jaeger) and use service graphs to visualize planner → tool → browser flows. Add dashboards for:

Top domains by risk.
Error rate by tool name.
Latency distribution per browser action.
Frequency of prompt-injection triggers.

DOM interaction tracing and anti-exfil guardrails

DOM actions are often the safest place to attach intent. A click on a button with text "Download" means more than a generic network event.

Pattern:

Wrap page.locator(...).click() and page.fill() in spans and emit minimal, non-PII selectors/text.
Detect forms and inputs carrying sensitive names (password, token, api_key). If an agent tries to fill them, add risk points, and optionally block.

Example wrapper (Playwright):

ts
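// Assumes tracer, SpanStatusCode, and RiskAccumulator from the earlier snippets
async function safeClick(page: any, selector: string) {
  const span = tracer.startSpan('browser.dom.action:click', { attributes: { 'dom.selector': selector } });
  try {
    await page.click(selector);
  } catch (e) {
    span.recordException(e as any);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw e;
  } finally {
    span.end();
  }
}

async function safeFill(page: any, selector: string, value: string, risk: RiskAccumulator, blockSecrets = true) {
  const span = tracer.startSpan('browser.dom.action:fill', { attributes: { 'dom.selector': selector } });
  try {
    if (isSensitiveSelector(selector) && looksLikeSecret(value)) {
      risk.add({ key: 'secret_in_form_fill', score: 30, details: selector });
      span.addEvent('guardrail.secret_in_form_fill', { 'guardrail.blocked': blockSecrets });
      if (blockSecrets) throw new Error(`Blocked secret-like value for ${selector}`);
    }
    // Fill the original value: rewriting it here would silently corrupt form input.
    // The value itself is never recorded on the span.
    await page.fill(selector, value);
  } catch (e) {
    span.recordException(e as any);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw e;
  } finally {
    span.end();
  }
}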

function isSensitiveSelector(sel: string) {
  const s = sel.toLowerCase();
  return s.includes('password') || s.includes('token') || s.includes('api_key');
}

function looksLikeSecret(v: string) {
  return /sk-[0-9a-z]{20,}/i.test(v) || /[A-Za-z0-9]{32,}/.test(v);
}

These guards both record and minimize harm.

Header and identity management to reduce security risk

Fix and record identity: For reproducibility, set a static User-Agent per environment and document it in spans. For stealth scraping, you may rotate identities, but record what you used.
Client Hints: Only send what you need. Many servers request more hints than necessary.
Cookie isolation: Use incognito contexts per run and never reuse context state across tenants/sessions. Emit context.id attributes for correlation, not the cookie values.
Download controls: Intercept downloads and block by default, logging an event. Allow only from allowlist domains.

Playwright download gate:

ts
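// Playwright has no per-context "deny downloads" setter; create the context with
// acceptDownloads: false. Download events still fire, but no file is saved.
const context = await browser.newContext({ acceptDownloads: false });
const page = await context.newPage();
page.on('download', (dl) => {
  const span = tracer.startSpan('browser.download');
  span.addEvent('download.blocked', { 'url.full': dl.url() });
  span.end();
  void dl.cancel(); // stop the transfer instead of letting it run to completion
});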

Putting it together: a reference run

A reference flow for processing an agent task:

Create agent.run root span and attach request metadata (tenant, task id, safe keywords).
Start agent.plan span. Record model id and prompt hash. End when plan is decided.
For each tool action:
- Start tool.call:browser.navigate span.
- Spawn a browser context with deterministic UA and extra headers (trace headers only for first-party).
- Start browser.navigate span. Attach CDP/BiDi events as span events.
- Wrap DOM actions with spans and risk checks.
- Aggregate risk and emit risk.assessment span.
End tool span and propagate outputs to the planner. Repeat until done.
End agent.run with final risk.score attribute.

Pseudo-code (Node):

ts
import { context as otContext, trace } from '@opentelemetry/api';

const tracer = trace.getTracer('browser-agent');

// Assumes RiskAccumulator from above and a runBrowserStep helper defined elsewhere
async function runAgentTask(task: any) {
  const root = tracer.startSpan('agent.run', { attributes: { 'task.id': task.id } });
  const rootCtx = trace.setSpan(otContext.active(), root);
  let totalRisk = 0;
  try {
    const planSpan = tracer.startSpan('agent.plan', undefined, rootCtx);
    // compute plan, record hash, etc.
    planSpan.end();

    for (const step of task.steps) {
      // Pass rootCtx so each tool span is a child of agent.run
      const toolSpan = tracer.startSpan(`tool.call:${step.tool}`, { attributes: { 'tool.name': step.tool } }, rootCtx);
      const risk = new RiskAccumulator();
      try {
        await runBrowserStep(step, toolSpan, risk); // does CDP/BiDi work, emits DOM/network events
      } finally {
        risk.emit(toolSpan);
        totalRisk += risk.total();
        toolSpan.end();
      }
    }
  } finally {
    root.setAttribute('risk.score', totalRisk); // sum across steps
    root.end();
  }
}

Data handling and privacy-by-default

Redact by default. Only allow full payload capture via a debugging flag (and ensure the flag cannot be enabled from untrusted inputs).
Hash high-cardinality attributes (URL paths, page titles) or bucket them.
Use OTel Collector processors to drop attributes and events that violate policy.
Separate noisy signals (per-resource spans) into a parallel pipeline or a separate sampling policy.
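Redaction can also happen in the agent process, before attributes ever reach the exporter. The sketch below is one possible scrubber: the SENSITIVE_KEY pattern and the [redacted] placeholder are assumptions, and the value pattern reuses the shape of the looksLikeSecret check shown earlier. Bare "token" is deliberately not in the key pattern so that attributes such as token.usage.input survive.

```typescript
// Attribute names that should never carry a value (assumed list).
const SENSITIVE_KEY = /(password|secret|api[_-]?key|authorization|cookie|access[_-]?token)/i;
// Values that look like API keys or long opaque tokens.
const SECRET_VALUE = /sk-[0-9a-z]{20,}|[A-Za-z0-9]{32,}/gi;

type AttrValue = string | number | boolean;

function scrubAttributes(attrs: Record<string, AttrValue>): Record<string, AttrValue> {
  const out: Record<string, AttrValue> = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (SENSITIVE_KEY.test(key)) {
      out[key] = '[redacted]';                               // drop the value entirely
    } else if (typeof value === 'string') {
      out[key] = value.replace(SECRET_VALUE, '[redacted]');  // mask embedded secrets
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

Running this in the process and again in the Collector gives two layers of defense, which matters if a new attribute is added without a matching Collector rule.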

Collector attribute filtering example:

yaml
processors:
  attributes/scrub:
    actions:
      - key: tool.input.raw
        action: delete
      - key: form.field.value
        action: delete
      - key: url.full
        action: hash

BiDi vs CDP: trade-offs for observability

CDP (Chromium only): richer, lower-level events, strong request interception/control. Excellent for detailed network/DOM instrumentation today.
WebDriver BiDi (cross-browser): standardized event model, improving request/response visibility. Header injection and interception support varies by driver/version.

Recommendation: If you control the runtime and need deep control, CDP via Playwright/Puppeteer on Chromium is the fastest path. If you need Firefox support, layer BiDi capture and converge on the minimum common set of events and attributes, with CDP providing extras when available. Verify Safari separately before committing to it, since its BiDi support has lagged behind Chromium and Firefox.

Error handling and retries

Classify navigation errors (DNS, TLS, timeout, blocked by robots, login wall). Record error.type and cause where possible.
On retries, link spans with span links to maintain lineage across attempts without nesting.

ts
const retrySpan = tracer.startSpan('browser.navigate.retry', {
  links: [{ context: originalSpan.spanContext() }],
  attributes: { 'retry.attempt': 2 }
});
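The error classification mentioned above can start as a small rule table over the error message. This sketch assumes Chromium-style net::ERR_* strings and a generic "timeout" message from the automation library; the NavErrorType values are hypothetical labels for error.type. Robots blocks and login walls cannot be detected from error strings and need page-level checks instead.

```typescript
type NavErrorType = 'dns' | 'tls' | 'timeout' | 'connection' | 'blocked' | 'unknown';

// Rules are checked in order, so the timeout rule wins for ERR_CONNECTION_TIMED_OUT.
const NAV_ERROR_RULES: Array<[RegExp, NavErrorType]> = [
  [/ERR_NAME_NOT_RESOLVED/i, 'dns'],
  [/ERR_CERT_|ERR_SSL_/i, 'tls'],
  [/ERR_TIMED_OUT|ERR_CONNECTION_TIMED_OUT|timeout/i, 'timeout'],
  [/ERR_CONNECTION_(REFUSED|RESET|CLOSED)/i, 'connection'],
  [/ERR_BLOCKED_BY_/i, 'blocked'],
];

function classifyNavigationError(message: string): NavErrorType {
  for (const [pattern, type] of NAV_ERROR_RULES) {
    if (pattern.test(message)) return type;
  }
  return 'unknown';
}
```

Setting the result as error.type on the navigation span keeps the attribute low-cardinality, while the full message stays in the recorded exception.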

Performance costs and how to keep them low

Event volume: Prefer span events for high-frequency network/DOM signals; reserve spans for actions and outliers.
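Batch processors: Use batch span processors and size their queues (maxQueueSize, maxExportBatchSize) for your event volume; the batch processor drops spans when its queue is full rather than applying backpressure.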
Sampling: Tail sample by risk/error; head sample routine runs at 5–10%.
Scrubbing early reduces payload sizes drastically (headers, bodies).

Testing your instrumentation

Unit tests for wrappers (safeClick/safeFill) verifying span creation and risk detection.
Integration tests against a test site that triggers specific behaviors: cross-origin POST, downloads, mixed content.
Golden trace fixtures: serialize traces from a known run and compare attributes across versions.

Example Jest test for a tool wrapper:

ts
test('withToolSpan records hashes and status', async () => {
  const tool = withToolSpan('echo', async (x: any) => x);
  const out = await tool({ x: 1 });
  expect(out).toEqual({ x: 1 });
  // assert spans via your exporter mock or in-memory processor
});
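For golden trace fixtures, spans need to be normalized before comparison, because IDs and timestamps differ on every run. The sketch below assumes a hypothetical RecordedSpan shape (not the SDK's own span type) and replaces parent IDs with parent names so that two runs of the same scenario serialize identically.

```typescript
type AttrMap = Record<string, string | number | boolean>;

// Hypothetical shape for a span captured by a test exporter.
interface RecordedSpan {
  name: string;
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  startTimeMs: number;
  attributes: AttrMap;
}

interface GoldenSpan {
  name: string;
  parent: string | null;
  attributes: AttrMap;
}

// Strip run-specific fields and sort, so equivalent runs compare equal.
function toGolden(spans: RecordedSpan[], volatileKeys: string[] = []): GoldenSpan[] {
  const nameById = new Map(spans.map((s) => [s.spanId, s.name] as [string, string]));
  return spans
    .map((s) => ({
      name: s.name,
      parent: s.parentSpanId ? nameById.get(s.parentSpanId) ?? null : null,
      attributes: Object.fromEntries(
        Object.entries(s.attributes)
          .filter(([key]) => !volatileKeys.includes(key))
          .sort(([a], [b]) => a.localeCompare(b)),
      ),
    }))
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

Comparing JSON.stringify(toGolden(spans, ['task.id'])) against a stored fixture then flags renamed spans, missing attributes, or changed parentage between versions.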

What to alert on

risk.score above threshold by environment or tenant.
Frequent secret_in_form_fill events.
Downloads attempted from non-allowlist domains.
Elevated HTTP error rates or timeouts.
Planner loops (too many steps) or repeated retries for the same URL.

Alerting is best attached to metrics and tail-sampled traces. When an alert fires, include a direct link to the trace ID.
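The planner-loop alert can be driven by a simple check over the steps recorded for a run. In this sketch, StepRecord, the default limits of 25 steps and 3 repeats, and the finding labels are all assumptions to tune for your workload.

```typescript
interface StepRecord {
  tool: string;
  url?: string;
}

interface LoopFinding {
  reason: 'too_many_steps' | 'repeated_target';
  detail: string;
}

// Flags runs that exceed a step budget or hit the same tool/URL pair too often.
function detectPlannerLoop(steps: StepRecord[], maxSteps = 25, maxRepeats = 3): LoopFinding | null {
  if (steps.length > maxSteps) {
    return { reason: 'too_many_steps', detail: `${steps.length} steps` };
  }
  const counts = new Map<string, number>();
  for (const step of steps) {
    if (!step.url) continue;
    const key = `${step.tool} ${step.url}`;
    const seen = (counts.get(key) ?? 0) + 1;
    counts.set(key, seen);
    if (seen > maxRepeats) return { reason: 'repeated_target', detail: key };
  }
  return null;
}
```

The finding can be attached to the agent.run span as an event, which makes looping runs eligible for the same tail-sampling and alerting paths as high-risk ones.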

Common pitfalls

Logging raw prompts or tool inputs: redact or hash; only capture with a gated debug flag.
Over-instrumentation: thousands of spans per page make traces unreadable and expensive. Start with action-level spans + key events.
Context leaks: don’t inject trace headers to third-party domains. Guard propagation with an allowlist.
Ignoring driver version drift: pin and test browser/driver versions; BiDi event shapes can change.

Minimal semantic conventions to adopt

OpenTelemetry core conventions cover most of what we need. For custom attributes, use a stable namespace to avoid collisions, e.g.:

browser.protocol: cdp | bidi
browser.target.url, browser.page.title
user_agent.original, client_hints.request.*, client_hints.response.*
tool.name, tool.input.hash, tool.output.hash
risk.score, risk.component.*
dom.selector, dom.action
network.requestId (CDP/BiDi IDs)

If your organization already uses a standard taxonomy (e.g., Elastic Common Schema), map attributes at the Collector.

A pragmatic rollout plan

Instrument the agent.run and tool.call spans with hashes and model metadata.
Wrap Playwright/Puppeteer with browser.navigate spans and key CDP events.
Add DOM action wrappers and minimal risk rules (cross-origin POST, downloads).
Plumb UA and Client Hints capture.
Add risk.assessment spans and a tail-based sampling policy keeping error/high risk.
Iterate: add per-request spans for XHR/fetch only, enrich risk rules, and integrate alerts.

Conclusion

Agentic browsers demand first-class observability and risk awareness. OpenTelemetry gives you the substrate — traces, metrics, logs, context — to model the full pipeline from planner decisions to protocol-level actions. With a disciplined span model, conservative data handling, and a simple, additive risk score, you can confidently ship and operate AI-driven browsing in production.

Start small: action-level spans, key events, and a handful of risk rules. Use tail sampling to keep what matters. Add depth where it pays off (CDP per-request spans for XHR, DOM fills, and downloads). And always treat identity and data with care: fix and record User-Agent/Client Hints, isolate cookies, and scrub aggressively.

With these patterns, your traces will tell the full story — of what the agent intended, what the browser executed, and where the risk lies — without drowning you in noise or exposing sensitive data.