Agentic Browser as a Chrome Extension: MV3 Auto‑Agent with CDP, Offscreen Docs, UA/Client‑Hints Harmony, and Telemetry
Agentic browsing is moving from demos to production: instead of scripting pages with brittle selectors, we want an adaptive agent that plans, navigates, extracts, fills forms, and self-checks outcomes. The twist in this guide is architectural: we’ll build an agent inside a Chrome Manifest V3 (MV3) extension, using the service worker as an orchestrator, offscreen documents to host local AI and DOM-capable work, and the Chrome DevTools Protocol (CDP) to bridge low-level browser controls. We’ll also reconcile the "User-Agent" string with modern Client Hints, expose a "what is my browser agent" telemetry pipeline, and finish with CI/CD for web store delivery.
The audience here is technical; we’ll lean on exact APIs, pitfalls, and code. The tone is opinionated but pragmatic: what works, what breaks, and how to ship safely.
TL;DR Architecture
- MV3 service worker is the agent’s conductor: router, tool-gating, session memory, and an event loop for tasks.
- Offscreen document (chrome.offscreen) hosts heavier Web APIs that the service worker can’t use: local LLM inference (WebGPU/WASM), DOM parsing, canvas, and WebRTC if needed. It can also act as a hidden UI for page evaluation.
- CDP bridge via chrome.debugger attaches to tabs and sends low-level protocol commands for navigation, UA override, input events, and DOM introspection.
- UA/Client‑Hints harmony: collect and unify user agent surfaces (UA string, navigator.userAgentData, Sec-CH-*), and avoid contradictions by configuring CDP’s Emulation.setUserAgentOverride with matching metadata when required.
- "What is my browser agent" telemetry: explicit, privacy-conscious diagnostics of UA/CH and execution context for your agent, exposed via a local page or remote service endpoint.
- Safety: tool gating with schemas, domain allowlists, human-in-the-loop checkpoints, and guardrails around CDP. Default to least privilege.
- CI/CD: automated build, zipping, and Chrome Web Store API upload via GitHub Actions with secrets management.
Why an agent inside an extension?
Embedding the agent in the user’s browser gains:
- Immediate access to user context and interactive tabs.
- Real page rendering, third-party scripts, and auth flows.
- Lower latency for tool actions versus a remote driver.
- Fewer brittle dependencies than external automation frameworks.
Costs:
- You must work within MV3 constraints (service worker lifetimes, permissions, CSP).
- Guardrails are non-optional; an agent without safety is a liability.
- Uploading to the Chrome Web Store means stricter review and privacy considerations.
For many use cases (research, QA assist, internal tools, power-user automation), the trade-off is worth it.
Core components and their responsibilities
- Service worker (SW): boot, receive messages, task queue, policy checks, tool invocation, session state, logging, CDP bridge, telemetry aggregation.
- Offscreen document: local inference host, DOM-capable analytics, vectorizer/embeddings, long-running processes (audio transcription), and a reliable rendering-capable sandbox that the SW can message.
- Content scripts: instrument target pages lightly for event capture and targeted actions (fill fields, capture text, annotate). Keep these minimal and idempotent.
- Side panel or options page: UI for configuration, allowlists, safe-mode toggles, and view of the agent’s plan and steps. Optional.
- CDP bridge: chrome.debugger attach/detach, sendCommand wrapper with retry/backoff, mapping high-level "navigate"/"click"/"type" to CDP calls.
Manifest V3: the minimum viable manifest
Start with a precise permission set. Don’t over-ask; you can expand later with migrations.
json{ "manifest_version": 3, "name": "Agentic Browser Auto-Agent", "version": "0.1.0", "description": "MV3 agent that plans, navigates with CDP, and harmonizes UA/Client Hints.", "permissions": [ "storage", "offscreen", "scripting", "activeTab", "debugger", "tabs" ], "host_permissions": [ "https://*/*", "http://*/*" ], "background": { "service_worker": "sw.js", "type": "module" }, "action": { "default_title": "Agentic Browser" }, "options_page": "options.html", "icons": { "16": "icons/16.png", "48": "icons/48.png", "128": "icons/128.png" }, "content_security_policy": { "extension_pages": "script-src 'self'; object-src 'self'" } }
Notes:
- chrome.offscreen requires the "offscreen" permission and a call-time justification string.
- The debugger permission is necessary for CDP bridging. Use it sparingly and attach only to the active tab when needed.
- Keep CSP tight; if you need remote model calls, consider a vetted API endpoint and avoid wildcard script loads.
Service worker as the agent conductor
The SW cannot rely on persistent global state; it must handle activation/deactivation and message-driven work. A simple model:
- Maintain a queue of tasks in chrome.storage.session (or chrome.storage.local) with a session ID.
- For each task, run: policy check -> planning -> tool execution -> verification -> logging.
- The planner can be remote (OpenAI/Vertex/Anthropic) or local (Transformers.js/WebLLM) via the offscreen doc.
Basic SW skeleton:
```js
// sw.js
import { ensureOffscreen } from './offscreen.js';
import { cdp } from './cdp.js';
import { policy } from './policy.js';
import { tools } from './tools.js';
import { telemetry } from './telemetry.js';

chrome.runtime.onInstalled.addListener(() => {
  console.log('[Agent] Installed');
});

chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg?.type === 'AGENT_RUN') {
    runAgent(msg.payload)
      .then(result => sendResponse({ ok: true, result }))
      .catch(err => sendResponse({ ok: false, error: String(err) }));
    return true; // keep the message channel open for the async response
  }
});

async function runAgent(payload) {
  await ensureOffscreen();
  const { tabId, goal } = payload;
  // Acquire the CDP bridge lazily
  await cdp.attach(tabId);
  try {
    const plan = await planSteps(goal); // remote or offscreen
    for (const step of plan.steps) {
      policy.assertAllowed(step);
      const outcome = await execute(step, tabId);
      await telemetry.logStep({ step, outcome });
      if (!verify(step, outcome)) {
        const revised = await replan(goal, plan, step, outcome);
        plan.steps.push(...revised.steps); // naive merge; refine as needed
      }
    }
  } finally {
    await cdp.detach(tabId); // always detach, even on failure
  }
  return { status: 'done' };
}

async function planSteps(goal) {
  // Example: delegate to the offscreen doc to run a local LLM or call a remote API
  return new Promise((resolve) => {
    chrome.runtime.sendMessage({ type: 'OFFSCREEN_PLAN', goal }, res => resolve(res.plan));
  });
}

async function execute(step, tabId) {
  const tool = tools.lookup(step.tool); // the registry maps names to plain functions
  return await tool({ step, tabId, cdp });
}

function verify(step, outcome) {
  // Simple heuristic verification; upgrade with assertions defined in the plan
  return outcome?.ok !== false;
}

async function replan(goal, plan, step, outcome) {
  return new Promise((resolve) => {
    chrome.runtime.sendMessage(
      { type: 'OFFSCREEN_REPLAN', goal, plan, step, outcome },
      res => resolve(res.plan)
    );
  });
}
```
Opinionated guidance:
- Do not let the planner write arbitrary JavaScript into pages. Restrict execution to vetted tools with schemas.
- Always detach from CDP when done; lingering sessions break other devtools and drain battery.
- Store minimal session state; prefer deterministic re-derivation.
Offscreen document: invisible muscle
Offscreen documents give you DOM and graphical APIs without a visible UI window. They are created on-demand.
Create on first use:
```js
// offscreen.js
export async function ensureOffscreen() {
  // hasDocument() exists in recent Chrome versions; guard with optional chaining
  const existing = await chrome.offscreen.hasDocument?.();
  if (existing) return;
  await chrome.offscreen.createDocument({
    url: 'offscreen.html',
    reasons: ['BLOBS', 'DOM_PARSER'],
    justification: 'Local LLM inference and DOM parsing for agent planning.'
  });
}
```
Inside offscreen.html, load offscreen.js to handle messages, optional local inference, and data processing.
```html
<!-- offscreen.html -->
<!doctype html>
<html>
  <head><meta charset="utf-8"><title>Offscreen</title></head>
  <body>
    <script type="module" src="offscreen-main.js"></script>
  </body>
</html>
```
And an offscreen message handler:
```js
// offscreen-main.js
import { localPlanner } from './planner-local.js';

chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
  if (msg.type === 'OFFSCREEN_PLAN') {
    localPlanner.plan(msg.goal).then(plan => sendResponse({ plan }));
    return true;
  }
  if (msg.type === 'OFFSCREEN_REPLAN') {
    localPlanner.replan(msg.goal, msg.plan, msg.step, msg.outcome).then(plan => sendResponse({ plan }));
    return true;
  }
});
```
Local model options:
- Transformers.js or WebLLM to run small models via WebGPU/WebAssembly (a local-planner sketch follows this list).
- A fast embeddings model for retrieval-augmented planning.
- Alternatively, call a remote LLM. Keep keys in chrome.storage and proxy through your backend to avoid exposing secrets in the extension package.
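To make the local route concrete, here is a minimal sketch of the planner-local.js module that offscreen-main.js imports, assuming Transformers.js (@xenova/transformers) is bundled with the extension. The model id, prompt format, and JSON extraction are illustrative assumptions, not a prescription; the WASM backend may also require 'wasm-unsafe-eval' in the extension CSP.

```js
// planner-local.js - a minimal sketch, assuming Transformers.js is bundled.
// MODEL_ID is a placeholder; pick whatever small text-generation model fits your size budget.
import { pipeline } from '@xenova/transformers';

const MODEL_ID = 'Xenova/Qwen1.5-0.5B-Chat'; // placeholder model id

let generatorPromise = null;
function getGenerator() {
  // Lazily create the pipeline once; the library picks the WebGPU/WASM backend.
  generatorPromise ??= pipeline('text-generation', MODEL_ID);
  return generatorPromise;
}

export const localPlanner = {
  async plan(goal) {
    const generator = await getGenerator();
    const prompt =
      `You are a browser agent. Produce a JSON array of steps ` +
      `({"tool": ..., "args": ..., "verify": ...}) for this goal:\n${goal}\nJSON:`;
    const [out] = await generator(prompt, { max_new_tokens: 256 });
    return parsePlan(out.generated_text);
  },
  async replan(goal, plan, failedStep, outcome) {
    const generator = await getGenerator();
    const prompt =
      `Goal: ${goal}\nFailed step: ${JSON.stringify(failedStep)}\n` +
      `Outcome: ${JSON.stringify(outcome)}\nRevised JSON steps:`;
    const [out] = await generator(prompt, { max_new_tokens: 256 });
    return parsePlan(out.generated_text);
  }
};

function parsePlan(text) {
  // Extract the first JSON array from the model output; fall back to an empty plan.
  const match = text.match(/\[[\s\S]*\]/);
  try {
    return { steps: match ? JSON.parse(match[0]) : [] };
  } catch {
    return { steps: [] };
  }
}
```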
Tooling: safe tool gating and schemas
An agent should only act through explicit tools with clear schemas, rate limits, and domain policies. Example tool registry:
```js
// tools.js
import { cdpTools } from './tools-cdp.js';
import { contentTools } from './tools-content.js';

export const tools = {
  registry: new Map([
    ['navigate', cdpTools.navigate],
    ['click', cdpTools.click],
    ['type', cdpTools.type],
    ['extractText', contentTools.extractText],
    ['waitFor', cdpTools.waitFor],
    ['setUAOverride', cdpTools.setUAOverride]
  ]),
  lookup(name) {
    const tool = this.registry.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool;
  }
};
```
Policy checks with JSON-schema-like validation and allowlists:
```js
// policy.js
const ALLOWED_DOMAINS = new Set([
  'example.com',
  'developer.chrome.com'
]);

const TOOL_SCHEMAS = {
  navigate: {
    required: ['url'],
    properties: { url: { type: 'string', pattern: '^https?://' } }
  },
  click: {
    required: ['selector'],
    properties: { selector: { type: 'string', maxLength: 512 } }
  }
  // ...more schemas
};

function validate(tool, args) {
  const s = TOOL_SCHEMAS[tool];
  if (!s) return;
  for (const req of (s.required || [])) {
    if (!(req in args)) throw new Error(`Missing ${req}`);
  }
  if (s.properties?.url && args.url) {
    if (!new RegExp(s.properties.url.pattern).test(args.url)) throw new Error('Bad URL');
    const u = new URL(args.url);
    if (!ALLOWED_DOMAINS.has(u.hostname)) throw new Error('Domain not allowed');
  }
}

export const policy = {
  assertAllowed(step) {
    validate(step.tool, step.args || {});
    if (requiresHumanApproval(step)) {
      throw new Error('Human approval required');
    }
  }
};

function requiresHumanApproval(step) {
  // Example: risky tools or off-allowlist domains
  if (step.tool === 'setUAOverride') return true;
  return false;
}
```
Add a simple human-in-the-loop flow through the options page or a side panel to approve queued risky steps.
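One way to wire this up is a storage-backed approval queue: instead of throwing, the policy layer awaits a user decision surfaced in the options page or side panel. This is a sketch; requestApproval, the storage key, and the message type are illustrative names, not an existing API.

```js
// approvals.js - a human-in-the-loop queue sketch.
const PENDING_KEY = 'pending_approvals';

// Called by the policy layer when a risky step needs sign-off; resolves when the user decides.
export async function requestApproval(step) {
  const id = crypto.randomUUID();
  const { [PENDING_KEY]: pending = [] } = await chrome.storage.local.get(PENDING_KEY);
  pending.push({ id, step, createdAt: Date.now() });
  await chrome.storage.local.set({ [PENDING_KEY]: pending });

  return new Promise(resolve => {
    const listener = (msg) => {
      if (msg?.type === 'APPROVAL_DECISION' && msg.id === id) {
        chrome.runtime.onMessage.removeListener(listener);
        resolve(msg.approved === true);
      }
    };
    chrome.runtime.onMessage.addListener(listener);
  });
}

// In the options/side-panel page: render the pending_approvals entries from storage and,
// on click, send { type: 'APPROVAL_DECISION', id, approved: true|false } and remove the entry.
// Caveat: if the service worker is suspended while waiting, the in-memory promise is lost;
// a production version should re-check the queue in storage on wake.
```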
CDP bridge: chrome.debugger wrapper
An MV3 extension cannot open a raw DevTools socket to the browser, but chrome.debugger exposes a controlled CDP channel. Wrap it to be ergonomic and safe.
```js
// cdp.js
const sessions = new Map();

async function attach(tabId) {
  if (sessions.has(tabId)) return sessions.get(tabId);
  await chrome.debugger.attach({ tabId }, '1.3');
  const send = (method, params = {}) => new Promise((resolve, reject) => {
    chrome.debugger.sendCommand({ tabId }, method, params, result => {
      const err = chrome.runtime.lastError;
      if (err) return reject(err);
      resolve(result);
    });
  });
  sessions.set(tabId, { send });
  // Enable required domains
  await send('Page.enable');
  await send('DOM.enable');
  await send('Runtime.enable');
  await send('Network.enable');
  return sessions.get(tabId);
}

async function detach(tabId) {
  if (!sessions.has(tabId)) return;
  await chrome.debugger.detach({ tabId });
  sessions.delete(tabId);
}

export const cdp = {
  attach,
  detach,
  send: async (tabId, method, params) => {
    const s = sessions.get(tabId) || await attach(tabId);
    return s.send(method, params);
  }
};
```
Map tools to CDP commands:
```js
// tools-cdp.js
import { cdp } from './cdp.js';

function waitForLoadEvent(tabId, timeoutMs = 15000) {
  // Page.loadEventFired is an event, not a command; observe it via chrome.debugger.onEvent
  return new Promise(resolve => {
    const timer = setTimeout(finish, timeoutMs);
    function finish() {
      clearTimeout(timer);
      chrome.debugger.onEvent.removeListener(listener);
      resolve();
    }
    function listener(source, method) {
      if (source.tabId === tabId && method === 'Page.loadEventFired') finish();
    }
    chrome.debugger.onEvent.addListener(listener);
  });
}

async function navigate({ step, tabId }) {
  const { url } = step.args;
  const loaded = waitForLoadEvent(tabId); // register before navigating to avoid a race
  await cdp.send(tabId, 'Page.navigate', { url, transitionType: 'typed' });
  await loaded; // better: a full lifecycle watcher (Page.lifecycleEvent)
  return { ok: true };
}

async function waitFor({ step, tabId }) {
  const { selector, timeoutMs = 10000 } = step.args;
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const doc = await cdp.send(tabId, 'DOM.getDocument', { depth: -1, pierce: true });
    const node = await cdp.send(tabId, 'DOM.querySelector', { nodeId: doc.root.nodeId, selector });
    if (node?.nodeId) return { ok: true };
    await new Promise(r => setTimeout(r, 200));
  }
  return { ok: false, error: 'Timeout' };
}

async function click({ step, tabId }) {
  const { selector } = step.args;
  const doc = await cdp.send(tabId, 'DOM.getDocument', { depth: -1, pierce: true });
  const node = await cdp.send(tabId, 'DOM.querySelector', { nodeId: doc.root.nodeId, selector });
  if (!node?.nodeId) return { ok: false, error: 'Node not found' };
  const box = await cdp.send(tabId, 'DOM.getBoxModel', { nodeId: node.nodeId });
  // border is a quad: [x1,y1, x2,y2, x3,y3, x4,y4] clockwise from the top-left corner
  const q = box.model.border;
  const x = (q[0] + q[4]) / 2;
  const y = (q[1] + q[5]) / 2;
  await cdp.send(tabId, 'Input.dispatchMouseEvent', { type: 'mousePressed', x, y, button: 'left', clickCount: 1 });
  await cdp.send(tabId, 'Input.dispatchMouseEvent', { type: 'mouseReleased', x, y, button: 'left', clickCount: 1 });
  return { ok: true };
}

async function type({ step, tabId }) {
  const { text } = step.args;
  for (const ch of text) {
    await cdp.send(tabId, 'Input.dispatchKeyEvent', { type: 'keyDown', text: ch });
    await cdp.send(tabId, 'Input.dispatchKeyEvent', { type: 'keyUp', text: ch });
  }
  return { ok: true };
}

async function setUAOverride({ step, tabId }) {
  const { userAgent, uaMetadata } = step.args;
  await cdp.send(tabId, 'Emulation.setUserAgentOverride', {
    userAgent,
    userAgentMetadata: uaMetadata // harmonize UA-CH fields too
  });
  return { ok: true };
}

export const cdpTools = { navigate, waitFor, click, type, setUAOverride };
```
Caveats:
- Some pages block debugger-attached sessions or react poorly to CDP-driven input. Fall back to content-script-driven DOM actions if needed.
- Handle tab closures and navigation races; listen to chrome.tabs.onRemoved and avoid stale sessions.
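A small defensive-cleanup sketch, written as code to add alongside the sessions map in cdp.js: chrome.debugger detaches automatically when its target is destroyed, but the bookkeeping entry would otherwise go stale.

```js
// In cdp.js, next to the sessions map (a sketch).
chrome.tabs.onRemoved.addListener((tabId) => {
  // Tab is gone; drop any stale session entry.
  sessions.delete(tabId);
});

// The user can detach via the debugging infobar, or DevTools can take over the target.
chrome.debugger.onDetach.addListener((source, reason) => {
  if (source.tabId != null) {
    sessions.delete(source.tabId);
    console.warn(`[Agent] CDP detached from tab ${source.tabId}: ${reason}`);
  }
});
```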
Content scripts for page-local actions
Content scripts are ideal for data extraction and simple DOM manipulations.
```js
// tools-content.js
export const contentTools = {
  async extractText({ step, tabId }) {
    const { selector } = step.args;
    const [{ result }] = await chrome.scripting.executeScript({
      target: { tabId },
      func: (sel) => {
        const el = document.querySelector(sel);
        return el ? el.innerText : null;
      },
      args: [selector]
    });
    return { ok: !!result, text: result };
  }
};
```
Prefer content scripts for operations that the page's JS can handle without CDP; this keeps your debugger attachment lifetime short.
UA/Client‑Hints harmony: one browser, one story
The legacy User-Agent (UA) string is being reduced and superseded by Client Hints (CH):
- navigator.userAgent still exists but is reduced: OS details and minor version numbers are frozen, so it carries less signal than it used to.
- navigator.userAgentData (UA-CH) provides structured data and high-entropy fields on request.
- Servers may ask for Sec-CH-UA* headers; browsers decide what to send based on Accept-CH and permissions policy.
Your agent must avoid saying contradictory things about its identity. If you spoof the UA with CDP, but navigator.userAgentData still reports another brand/version, sites may detect inconsistencies and break or flag the session.
Two viable strategies:
- No override; observe and report. Default to the platform's native identity. This is safest and recommended.
- Override both UA and UA-CH via CDP for a specific testing purpose, and scope it clearly (the active tab and a bounded session).
Collect UA/CH surfaces:
```js
// ua.js - run in the tab context via chrome.scripting.executeScript
export async function getUAProfile() {
  const ua = navigator.userAgent;
  const ch = navigator.userAgentData
    ? await navigator.userAgentData.getHighEntropyValues([
        'architecture', 'bitness', 'model', 'platformVersion', 'uaFullVersion', 'fullVersionList'
      ])
    : null;
  return { ua, ch };
}
```
From the SW:
```js
const [{ result }] = await chrome.scripting.executeScript({
  target: { tabId },
  func: async () => {
    const ua = navigator.userAgent;
    const ch = navigator.userAgentData
      ? await navigator.userAgentData.getHighEntropyValues([
          'architecture', 'bitness', 'model', 'platformVersion', 'uaFullVersion', 'fullVersionList'
        ])
      : null;
    return { ua, ch };
  }
});
const profile = result;
```
If you must override, use a coherent userAgent and userAgentMetadata:
```js
await cdp.send(tabId, 'Emulation.setUserAgentOverride', {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
  userAgentMetadata: {
    brands: [
      { brand: 'Chromium', version: '121' },
      { brand: 'Google Chrome', version: '121' }
    ],
    fullVersionList: [
      { brand: 'Chromium', version: '121.0.6167.140' },
      { brand: 'Google Chrome', version: '121.0.6167.140' }
    ],
    platform: 'Windows',
    platformVersion: '10.0.0',
    architecture: 'x86',
    bitness: '64',
    model: '',
    mobile: false
  }
});
```
Important:
- If you override UA, do it only during a controlled session and restore default by detaching.
- Some headers (Sec-CH-UA) are emitted by the network stack; CDP UA override with metadata is the correct path, not trying to hand-edit request headers.
- Privacy: don't collect high-entropy hints without user consent.
"What is my browser agent" telemetry
Telemetry here has one job: reveal, for debugging and trust, exactly how the browser identifies itself and how the agent acted. This should be explicit and opt-in.
A minimal telemetry payload:
- Extension version, session ID, timestamp
- UA string, UA-CH high-entropy values, platform, mobile status
- CDP attach/detach events and UA override events (with values)
- Tool actions (names, not raw content) and outcomes
- Sampling rate and privacy mode flags
Implementation sketch:
```js
// telemetry.js
export const telemetry = {
  async logStep(event) {
    const cfg = await chrome.storage.local.get(['telemetryEnabled', 'endpoint']);
    if (!cfg.telemetryEnabled) return;
    const payload = {
      t: Date.now(),
      v: chrome.runtime.getManifest().version,
      session: await sessionId(),
      event
    };
    try {
      await fetch(cfg.endpoint, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify(payload)
      });
    } catch (e) {
      // Best-effort; don't crash the agent
    }
  }
};

async function sessionId() {
  const k = 'agent_session';
  const v = (await chrome.storage.session.get(k))[k];
  if (v) return v;
  const nv = crypto.randomUUID();
  await chrome.storage.session.set({ [k]: nv });
  return nv;
}
```
Add a local diagnostics page (options.html) that runs the UA profile function and displays results + the last few telemetry records. Keep PII out of telemetry by default; allow a "redact" mode or on-device-only logging.
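The diagnostics view can be as small as this sketch; the element ids and the recent_telemetry key are assumptions (the telemetry module would need to mirror its last few records into chrome.storage.local for the second panel to show anything).

```js
// options.js - "what is my browser agent" diagnostics sketch.
async function renderDiagnostics() {
  // UA/CH as seen from the extension's own page context (individual tabs may differ).
  const ch = navigator.userAgentData
    ? await navigator.userAgentData.getHighEntropyValues([
        'architecture', 'bitness', 'platformVersion', 'fullVersionList'
      ])
    : null;
  document.querySelector('#ua-profile').textContent =
    JSON.stringify({ ua: navigator.userAgent, ch }, null, 2);

  const { recent_telemetry = [] } = await chrome.storage.local.get('recent_telemetry');
  document.querySelector('#recent-telemetry').textContent =
    JSON.stringify(recent_telemetry.slice(-10), null, 2);
}

document.addEventListener('DOMContentLoaded', renderDiagnostics);
```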
Privacy posture:
- Default off. Require explicit opt-in for telemetry leaving the device.
- Summarize what's being sent in plain English.
- Respect enterprise policies and incognito behavior; disable in incognito unless the user grants split permission.
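As a concrete example of the incognito rule, a guard like this sketch could sit in front of telemetry.logStep; the incognitoTelemetry flag is an assumed setting, not an existing API.

```js
// A sketch of gating telemetry on incognito context, assuming the caller knows the tabId.
async function telemetryAllowedFor(tabId) {
  const tab = await chrome.tabs.get(tabId);
  if (!tab.incognito) return true;
  // Only log for incognito tabs if the user granted incognito access to the extension
  // AND explicitly opted in to incognito telemetry.
  const allowedIncognito = await chrome.extension.isAllowedIncognitoAccess();
  const { incognitoTelemetry = false } = await chrome.storage.local.get('incognitoTelemetry');
  return allowedIncognito && incognitoTelemetry;
}
```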
Planning model: pragmatic approach
Don't over-engineer planning on day one. Two workable patterns:
- ReAct-like: the model proposes a thought and an action (a registered tool call with args) given an observation. You execute, feed the result back, and iterate to completion.
- Task graph: an initial planner produces a DAG of steps with conditions; the SW executes the graph with verification nodes.
Data contract for a tool call:
json{ "tool": "navigate", "args": { "url": "https://example.com" }, "verify": { "selector": "h1" } }
You can store a compact history of observations and tool results in chrome.storage.session to rehydrate after SW suspension.
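A persistence sketch for that rehydration step; the STATE_KEY name and the state shape are illustrative, and the point is to keep the record small (ids and outcomes, not page content).

```js
// state.js - compact agent state so a suspended service worker can resume.
const STATE_KEY = 'agent_state';

export async function saveState(state) {
  await chrome.storage.session.set({
    [STATE_KEY]: {
      goal: state.goal,
      tabId: state.tabId,
      stepIndex: state.stepIndex,
      history: state.history.slice(-20) // cap the observation/result history
    }
  });
}

export async function loadState() {
  const { [STATE_KEY]: state } = await chrome.storage.session.get(STATE_KEY);
  return state ?? null;
}

// On SW wake (e.g., at the top of an onMessage handler), call loadState() and, if a task
// is mid-flight, re-attach CDP to state.tabId and continue from state.stepIndex.
```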
Safety and compliance
Chrome Web Store policy and user trust demand:
- Minimal required permissions; justify debugger use in your listing.
- Clear disclosures about automation behavior, telemetry, and UA overrides.
- Gating risky operations behind user approval; emergency stop button.
- Rate limits and domain allowlists; avoid drive-by automation on arbitrary sites.
- Content Security Policy: disallow eval and remote script loading, and keep host permissions from creeping wider than the agent actually needs.
Security tips:
- Never embed API secrets in the extension bundle. Use a backend proxy with user auth.
- Sanitize any text from the model before using it to form selectors or inputs. Avoid direct code execution from model outputs.
- Validate all tool inputs per schema and reject anything outside bounds.
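A small sanitization sketch for model-proposed selectors, complementing the schema checks above; the length cap and blocklist are illustrative rules, not a complete defense.

```js
// sanitize.js - reject suspicious model-proposed selectors before they reach a tool.
const SELECTOR_MAX_LEN = 256;
const SELECTOR_BLOCKLIST = /[<>`]|javascript:/i;

export function sanitizeSelector(raw) {
  const selector = String(raw ?? '').trim();
  if (!selector || selector.length > SELECTOR_MAX_LEN) {
    throw new Error('Selector missing or too long');
  }
  if (SELECTOR_BLOCKLIST.test(selector)) {
    throw new Error('Selector contains disallowed content');
  }
  return selector;
}

// A full syntactic check (querySelector inside try/catch) needs a DOM-capable context,
// so run that part in the offscreen document or a content script, not the service worker.
```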
Testing and validation
- Unit test policy and tool schemas (pure JS tests run with any harness).
- Simulate CDP flows in a controlled Chrome instance (e.g., launch Chrome with a test profile and load unpacked extension). For CI, you can use a headful Chrome on Linux with Xvfb if you must, but keep E2E tests minimal.
- Use Chrome's extensions-internal logs: chrome://extensions > your extension > Service worker to inspect console output.
- Build a debug switch in storage to enable verbose logging and a trace of CDP commands (with sensitive data redacted).
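A sketch of that debug switch and redacted CDP trace; the storage key and the set of sensitive parameter names are assumptions you would tune to your data model.

```js
// debug.js - storage-backed debug switch with a redacted CDP trace.
const SENSITIVE_PARAM_KEYS = new Set(['text', 'userAgent', 'value']);

export async function traceCdp(method, params) {
  const { debugEnabled = false } = await chrome.storage.local.get('debugEnabled');
  if (!debugEnabled) return;
  const redacted = Object.fromEntries(
    Object.entries(params ?? {}).map(([k, v]) =>
      [k, SENSITIVE_PARAM_KEYS.has(k) ? '[redacted]' : v])
  );
  console.debug('[CDP]', method, redacted);
}

// Usage: call `await traceCdp(method, params);` inside cdp.js's send wrapper,
// just before chrome.debugger.sendCommand.
```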
CI/CD: from repo to Web Store
Packaging:
- Use a deterministic build step that outputs a zip with a stable file order (e.g., npm run build, then zip). The Web Store requires the zipped bundle.
- Version bump per release. Keep semantic versioning in manifest.json.
GitHub Actions workflow (simplified):
```yaml
name: build-and-publish
on:
  push:
    tags:
      - 'v*.*.*'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm run build
      - name: Zip extension
        run: |
          cd dist && zip -r ../extension.zip .
      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: extension
          path: extension.zip
  publish:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: extension
          path: .
      - name: Publish to Chrome Web Store
        uses: Klemensas/chrome-extension-upload-action@v1
        with:
          refresh-token: ${{ secrets.CWS_REFRESH_TOKEN }}
          client-id: ${{ secrets.CWS_CLIENT_ID }}
          client-secret: ${{ secrets.CWS_CLIENT_SECRET }}
          app-id: ${{ secrets.CWS_EXTENSION_ID }}
          file-path: extension.zip
          publish: true
```
Notes:
- Generate OAuth credentials for the Web Store API; store them as repo secrets.
- Use separate channels (unlisted beta vs public) by toggling publish and using different app IDs.
- Run a linter that checks manifest permissions against an allowlist to prevent accidental permission creep.
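Such a permission lint can be a short Node script run in CI after the build; this sketch assumes the built manifest lands in dist/ and that the allowlists mirror your intended manifest.

```js
// scripts/lint-permissions.mjs - fail CI on permission creep (run with `node scripts/lint-permissions.mjs`).
import { readFileSync } from 'node:fs';

const ALLOWED_PERMISSIONS = new Set([
  'storage', 'offscreen', 'scripting', 'activeTab', 'debugger', 'tabs'
]);
const ALLOWED_HOSTS = new Set(['https://*/*', 'http://*/*']);

const manifest = JSON.parse(readFileSync('dist/manifest.json', 'utf8'));
const badPerms = (manifest.permissions ?? []).filter(p => !ALLOWED_PERMISSIONS.has(p));
const badHosts = (manifest.host_permissions ?? []).filter(h => !ALLOWED_HOSTS.has(h));

if (badPerms.length || badHosts.length) {
  console.error('Unexpected permissions:', [...badPerms, ...badHosts].join(', '));
  process.exit(1);
}
console.log('Manifest permissions OK');
```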
Putting it together: a sample flow
Goal: "Log into example.com, navigate to /dashboard, extract the total revenue text, and report it."
- The user clicks the extension action. The SW creates a session and asks the planner for steps.
- Planner emits steps: navigate(url), waitFor(selector="#login"), type(text), click(selector="#submit"), waitFor("#dashboard"), extractText(selector="#revenue").
- Policy validates each step; domain is allowed, and tools are safe. The SW attaches to CDP, runs navigation and waits, uses content script for extraction, logs outcomes.
- UA override is not needed; telemetry logs that the native UA profile was used.
- The result is displayed in the side panel, and the SW detaches from CDP.
If a site breaks due to UA detection, the user can enable a "compatibility mode" that, for the specific domain, sets UA override + metadata to match a particular Chrome version, with a conspicuous banner and approval.
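A sketch of how the SW could consult that per-domain compatibility mode before planning; the storage key and the profile shape are assumptions, and the resulting setUAOverride step still flows through the policy layer's human-approval gate.

```js
// compat.js - per-domain compatibility-mode lookup (a sketch).
export async function getCompatOverride(url) {
  const hostname = new URL(url).hostname;
  const { compatDomains = {} } = await chrome.storage.local.get('compatDomains');
  const profile = compatDomains[hostname];
  if (!profile || !profile.approvedByUser) return null; // user must opt in per domain
  // Returns args for the setUAOverride tool; policy still flags that tool for approval.
  return { userAgent: profile.userAgent, uaMetadata: profile.uaMetadata };
}
```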
Performance and robustness
- Keep CDP sessions short. Attach, do work, detach. This reduces contention with DevTools and keeps power usage lower.
- Debounce planning cycles; batch observations before asking the model to replan.
- Cache context in chrome.storage.session; on SW wake, rehydrate and continue.
- For offscreen documents, reuse across tasks rather than recreating frequently; they're more expensive to start than a message send.
- For local models, choose small quantized variants and leverage WebGPU if available. Provide a fallback to remote inference if not.
Anti-patterns to avoid
- Letting the model synthesize arbitrary CSS/XPath selectors without constraints; add heuristics that prefer stable attributes (data-* attributes, stable ids) and re-verify.
- Running infinite loops waiting for elements without backoff or a max attempt window.
- Telemetry without opt-in or collecting high-entropy Client Hints by default.
- Overriding UA globally for the entire browser; keep overrides scoped to your attached target only.
Extensibility roadmap
- Side panel UI: live plan visualization and step-by-step approvals.
- Workspace memory: vector store of prior tasks and selectors per site.
- Tooling for file uploads/downloads via chrome.downloads and CDP input events.
- Multi-tab orchestration with tab grouping and shared context.
- Enterprise mode: policies provisioned via managed storage, enforced domain allowlists, and centralized telemetry sinks.
References and pointers
- Manifest V3 docs: https://developer.chrome.com/docs/extensions
- Offscreen documents: https://developer.chrome.com/docs/extensions/reference/offscreen
- chrome.debugger (CDP bridge): https://developer.chrome.com/docs/extensions/reference/debugger
- Client Hints and UA reduction: https://wicg.github.io/ua-client-hints/
- Emulation.setUserAgentOverride: https://chromedevtools.github.io/devtools-protocol/tot/Emulation/#method-setUserAgentOverride
- Extensions CSP: https://developer.chrome.com/docs/extensions/mv3/manifest/content_security_policy
Conclusion
An agentic browser can live comfortably inside a Chrome MV3 extension if you treat the service worker as a conductor, the offscreen document as a capability sandbox, and the chrome.debugger CDP bridge as a carefully gated power tool. Harmonize UA and Client Hints to present a coherent identity, and surface that identity explicitly with a "what is my browser agent" telemetry view. With a bias toward safety (schemas, allowlists, approvals) and a disciplined CI/CD pipeline, you can ship an Auto-Agent that is practical, auditable, and fast enough for real workflows.
The upside isn't just novelty; it's reliability. By constraining agency to well-structured tools, aligning browser identity, and building in observability, you get an agent that behaves predictably in the wildly unpredictable web.