Hermetic Web Snapshots for Browser Agents: Deterministic Replay in Training and CI/CD
If you build browser agents or maintain serious end-to-end UI tests, you eventually discover the web is hostile to determinism: clocks keep ticking, servers keep drifting, network jitter changes timing, third-party scripts change their mind. That is fine for a human user but disastrous for reproducible training and stable CI.
The answer is to make the web hermetic: capture a slice of reality once, then replay it many times with the same behavior, byte-for-byte and event-for-event. This article lays out a practical blueprint for capturing, sanitizing, and deterministically replaying real sessions for agent training and CI. We will dig into network and DOM timelines, time and geolocation control, Web API stubs, and the tooling patterns that make this tractable with CDP, service workers, and HAR.
Opinionated thesis: if your replay is not hermetic at the boundaries of time, randomness, device state, and network, it will drift. Drifting replays waste compute, hide regressions, and make agent training brittle. Your goal is to seal the universe and own the clocks.
Goals and non-goals
You want:
- Repeatable runs using the same browser build, flags, and fonts.
- The same network responses and event ordering (including streaming and WebSocket frames).
- The same DOM evolution given the same inputs.
- Controlled time, geolocation, locale, and randomness.
- Walled-off external effects (WebRTC, push, notifications, OS integrations, clipboard, GPU nondeterminism).
- A portable snapshot artifact you can move across machines and CI lanes.
You do not need:
- Pixel-perfect screenshots across GPUs (nice-to-have; possible with font and canvas tricks but expensive).
- Full simulation of every OS facility (mock, stub, or disable most of them).
What hermeticity means for the web
Hermetic replay seals the environment with these invariants:
- Closed-world networking: every fetch/XHR/WebSocket/SSE resolves from your recorded corpus; no DNS resolution or internet access is required or permitted.
- Locked clocks and seeds: Date, performance.now, requestAnimationFrame, setTimeout, and randomness are deterministic and controlled by you.
- Pinned runtime: same browser version, flags, viewport, fonts, and locale.
- Stable storage: localStorage, sessionStorage, IndexedDB, CacheStorage, cookies, and permissions start in a known state.
- Canonical inputs: user events (clicks, keys, pointer moves, wheel, drag, IME) are scheduled from a recorded timeline.
If any of these leak, replay will diverge. Treat them as security perimeters.
What to capture (and why)
To rebuild determinism, capture enough to reconstruct the runtime state and I/O boundaries.
- Network:
- All HTTP(S) requests and responses, including headers, status, redirects, compressed and binary bodies.
- Streaming semantics: chunk boundaries and ordering for fetch/XHR streams and EventSource.
- WebSockets: handshake and frame timeline (direction, opcode, bytes).
- Cache semantics: ETag, Last-Modified, Cache-Control; cookies and Set-Cookie; any Service Worker fetch interception.
- DOM and script sources:
- Initial document HTML as delivered by the server (post-redirect), including doctype and charset.
- Dynamically fetched scripts and styles (responses captured under network above).
- Optional: a DOM snapshot timeline to accelerate deterministic recovery after user events (more below).
- Device and environment:
- User agent, viewport size and device scale factor, timezone, locale and Accept-Language, geolocation.
- Fonts actually used; if reproducibility matters across machines, bundle WOFF2 subsets in CSS.
- State and storage:
- Cookies at start and any mutations.
- localStorage/sessionStorage/IndexedDB/CacheStorage contents at start (or capture writes during run and derive).
- Permissions responses: geolocation, notifications, clipboard, camera/mic (deterministic allow/deny).
- Timing and randomness:
- A timeline of user inputs with timestamps relative to a monotonic base.
- Optionally, a schedule of network delivery to preserve interleavings under concurrency (important for subtle races).
- A seed value for Math.random and crypto.getRandomValues (if stubbed) and any seeded PRNG used by your stubs.
Your capture should be lossless for bytes and ordering but does not require low-level paint or layout snapshots unless you are validating pixels. In practice, hermetic network and deterministic clocks already eliminate most nondeterministic rendering cascades.
Tooling primitives: CDP, service workers, and HAR
- CDP (Chrome DevTools Protocol): gives you low-level hooks into the network, runtime, DOM, and performance timeline. You can capture requests and responses (including bodies), take DOM snapshots, intercept and fulfill fetches, and install scripts that run before any page script (Page.addScriptToEvaluateOnNewDocument). This is the most precise route.
- Service workers: a portable in-page mechanism to intercept fetch requests made by the pages they control. Great for replay inside the browser without external proxies. Limitations: they do not see WebSocket traffic, they cannot control cross-origin iframes, and the very first navigation happens before the worker is installed.
- HAR (HTTP Archive): a widely used schema for recording HTTP transactions. It is useful but incomplete: HAR 1.2 does not model WebSockets, Service Worker interplay, or streaming chunk boundaries. You will likely need extensions.
Put differently: capture with CDP for fidelity, store as HAR(+extensions) for interoperability, and replay with either CDP route fulfillment or a service worker for portability.
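One way to picture "HAR plus extensions" is a regular HAR entry that carries chunk boundaries in a custom field. The sketch below is only an illustration: HAR 1.2 requires custom fields to start with an underscore, but the field name `_chunks` and the helpers `makeHarEntry` and `splitChunks` are assumptions made for this example, not part of any standard or tool.

```javascript
// Build a HAR 1.2 style entry whose body is stored once in content.text,
// with chunk boundaries kept in a custom "_chunks" field (hypothetical name).
function makeHarEntry({ method, url, status, headers, chunks }) {
  const bodyText = chunks.map((c) => c.data).join('');
  const size = Buffer.byteLength(bodyText);
  return {
    startedDateTime: new Date(0).toISOString(),
    time: 0,
    request: { method, url, httpVersion: 'HTTP/1.1', headers: [], queryString: [],
               cookies: [], headersSize: -1, bodySize: -1 },
    response: {
      status, statusText: '', httpVersion: 'HTTP/1.1',
      headers: Object.entries(headers).map(([name, value]) => ({ name, value })),
      cookies: [], redirectURL: '', headersSize: -1, bodySize: size,
      content: { size, mimeType: headers['content-type'] || '', text: bodyText },
    },
    cache: {},
    timings: { send: 0, wait: 0, receive: 0 },
    // Extension: byte length and arrival offset of every recorded chunk.
    _chunks: chunks.map((c) => ({ offsetMs: c.offsetMs, bytes: Buffer.byteLength(c.data) })),
  };
}

// Recover the original chunks from an entry during replay.
function splitChunks(entry) {
  const buf = Buffer.from(entry.response.content.text, 'utf8');
  let pos = 0;
  return entry._chunks.map((c) => {
    const data = buf.subarray(pos, pos + c.bytes).toString('utf8');
    pos += c.bytes;
    return { offsetMs: c.offsetMs, data };
  });
}
```

Tools that do not know the extension simply ignore `_chunks` and still see a valid entry with the full body, which is the interoperability argument for this layout.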
A practical capture architecture
A minimal yet robust capture loop with Puppeteer or Playwright + CDP looks like this:
- Pin the browser build and environment
- Use a specific Chromium/Chrome version. With Playwright, the browser is pinned per release.
- Launch with flags that reduce nondeterminism:
- --disable-background-networking
- --disable-translate
- --disable-sync
- --js-flags=--random-seed=42
- --no-first-run --no-default-browser-check
- --disable-features=NetworkServiceInProcess,InterestFeedV2,OptimizationHints or any feature you find flaky
- --force-device-scale-factor=1
- Optional: --font-render-hinting=none
- Provide fonts deterministically (containerize fonts, include known WOFF2).
- Instrument the network with CDP
- Enable the Network domain; listen to requestWillBeSent, responseReceived, dataReceived, loadingFinished, and the WebSocket events (webSocketCreated, webSocketHandshakeResponseReceived, webSocketFrameSent, webSocketFrameReceived, webSocketClosed).
- Persist each request keyed by method, URL, and a canonicalized subset of headers plus the request body. Persist response headers, status, and the full body. Note that Network.getResponseBody returns the decoded body, so record the original Content-Encoding alongside it; if you need the exact compressed bytes from the wire, capture them with a recording proxy.
- Capture initial DOM and storage
- On first Document response, save the raw bytes and the character encoding.
- Use CDP (Storage.getCookies for cookies, DOMStorage.getDOMStorageItems for local and session storage), or query from the page, to snapshot the starting state.
- For IndexedDB and CacheStorage, consider a page script that enumerates keys and values; or rely on network determinism and let the page repopulate under replay.
- Record user input timeline
- Wrap page input APIs (click, type, mouse.move) to log an event stream with timestamps relative to t0.
- If you are capturing from a real user, capture pointer events from the OS or from the browser’s devtools input events.
- Freeze clocks while capturing
- Counterintuitively, stabilizing capture helps later replay: set fake timers on the page and advance them in tandem with your input timeline. This makes the event schedule explicit and reduces latency dependence.
- Save environment metadata
- Browser version, OS, viewport, locale, timezone, geolocation, and permissions.
- Sanitize (next section) and pack into a snapshot bundle with a manifest.
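The input-timeline step above can be sketched as a small recorder that stamps every event relative to t0 and serializes to NDJSON. This is a minimal sketch under stated assumptions: the class name `InputRecorder`, the event field names, and the injectable clock are illustrative choices, not an existing API.

```javascript
// Records input events with timestamps relative to a monotonic base (t0).
// The clock is injectable so tests and replays can drive it deterministically.
class InputRecorder {
  constructor(now = () => Number(process.hrtime.bigint() / 1000000n)) {
    this.now = now;
    this.t0 = now();
    this.events = [];
  }

  // seq gives a total order even when two events share the same timestamp.
  record(type, detail = {}) {
    this.events.push({ seq: this.events.length, t: this.now() - this.t0, type, ...detail });
  }

  toNdjson() {
    return this.events.map((e) => JSON.stringify(e)).join('\n') + '\n';
  }
}

// Parse an NDJSON timeline back into an array of events.
function parseNdjson(text) {
  return text.split('\n').filter(Boolean).map((line) => JSON.parse(line));
}
```

In a capture script you would call `record` from thin wrappers around the page's click and type helpers, then write `toNdjson()` to inputs/user-events.ndjson.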
Example: capturing with Puppeteer + CDP
```js
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');
const puppeteer = require('puppeteer');

// Stable lookup key: sorted query params, volatile params dropped,
// a small header allowlist, and a hash of the request body.
function canonicalKey(req) {
  const url = new URL(req.url);
  const params = [...url.searchParams.entries()]
    .filter(([k]) => !['_ts', 'cacheBust'].includes(k))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)); // locale-independent order
  url.search = new URLSearchParams(params).toString();
  return {
    method: req.method,
    url: url.toString(),
    headers: Object.fromEntries(Object.entries(req.headers)
      .filter(([k]) => ['accept', 'content-type'].includes(k.toLowerCase()))),
    bodyHash: req.postData
      ? crypto.createHash('sha1').update(req.postData).digest('hex')
      : null
  };
}

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--disable-background-networking',
      '--disable-sync',
      '--no-first-run',
      '--no-default-browser-check',
      '--js-flags=--random-seed=42'
    ]
  });
  const page = await browser.newPage();
  const client = await page.target().createCDPSession();
  await client.send('Network.enable', { maxPostDataSize: 1048576 });
  await client.send('Page.enable');

  const out = { meta: {}, entries: [], ws: [] };
  const pendingBodies = []; // body fetches still in flight

  // ev.timestamp is CDP's monotonic clock, so ordering survives wall-clock jumps.
  client.on('Network.requestWillBeSent', ev => {
    out.entries.push({ kind: 'http', phase: 'request', ts: ev.timestamp, requestId: ev.requestId,
      key: canonicalKey(ev.request), postData: ev.request.postData || null });
  });
  client.on('Network.responseReceived', ev => {
    out.entries.push({ kind: 'http', phase: 'response', ts: ev.timestamp, requestId: ev.requestId, response: ev.response });
  });
  client.on('Network.loadingFinished', ev => {
    pendingBodies.push(
      client.send('Network.getResponseBody', { requestId: ev.requestId })
        .then(body => out.entries.push({ kind: 'http', phase: 'body', ts: ev.timestamp, requestId: ev.requestId, body }))
        // Record failures (e.g. evicted bodies) instead of silently dropping them.
        .catch(err => out.entries.push({ kind: 'http', phase: 'body-error', requestId: ev.requestId, error: String(err) }))
    );
  });
  client.on('Network.webSocketCreated', ev => out.ws.push({ type: 'created', requestId: ev.requestId, url: ev.url }));
  client.on('Network.webSocketFrameReceived', ev => out.ws.push({ type: 'recv', ts: ev.timestamp, requestId: ev.requestId, payload: ev.response.payloadData }));
  client.on('Network.webSocketFrameSent', ev => out.ws.push({ type: 'send', ts: ev.timestamp, requestId: ev.requestId, payload: ev.response.payloadData }));

  // Install deterministic environment before any app script runs
  await client.send('Page.addScriptToEvaluateOnNewDocument', { source: `
    (function () {
      // Seeded 32-bit LCG; Math.imul keeps the multiply exact in integer space.
      let s = 1337;
      const next = () => (s = (Math.imul(1664525, s) + 1013904223) >>> 0);
      Math.random = () => next() / 4294967296;
      // Deterministic, NOT cryptographically secure: hermetic replay only.
      crypto.getRandomValues = function (arr) {
        const bytes = new Uint8Array(arr.buffer, arr.byteOffset, arr.byteLength);
        for (let i = 0; i < bytes.length; i++) bytes[i] = next() >>> 24;
        return arr;
      };
      const start = 1700000000000; // fixed epoch (ms)
      const RealDate = Date;
      Date = class extends RealDate {
        constructor(...a) { super(...(a.length ? a : [start])); }
        static now() { return start; }
      };
      performance.now = () => 0;
    })();
  ` });

  await page.goto('https://example.com', { waitUntil: 'networkidle2' });
  await Promise.all(pendingBodies); // do not close before bodies are captured
  fs.writeFileSync(path.join(__dirname, 'snapshot.json'), JSON.stringify(out, null, 2));
  await browser.close();
})();
```
This is intentionally simplified. Production capture will also store cookies, storage contents, and the raw bytes of the initial HTML document, and will move bodies out of the JSON file into content-addressed storage. The core idea is consistent: log the bytes and the ordering.
Sanitization without breaking determinism
Capturing real sessions often means capturing sensitive data: auth tokens, email addresses, query strings with IDs, and cookies. You must sanitize while preserving logical equivalence so that replayed code still behaves identically.
Principles:
- Data minimization: keep only what is necessary for deterministic behavior.
- Structured transformations: never regex blindly over raw bytes; parse first (URL, headers, HTML, JSON) and then transform specific fields.
- Reversible mappings: when a value appears in multiple places (e.g., a CSRF token in a cookie and a form), replace it with a stable placeholder and keep a mapping table so internal consistency holds.
- Compression-aware handling: if you rewrite response bodies, re-encode them to match Content-Encoding and lengths to avoid mismatches.
- Hash and redact by policy: define an allowlist (keep) and blocklist (remove or replace) for headers, cookies, query params, and JSON keys.
- Sanitize DOM and logs: replace text nodes that contain PII with tokens; if using screenshots, blur known PII regions.
- Legal and provenance: store provenance metadata (where and when captured, consent context) and TTL policies.
Example rewrite rules that preserve determinism:
- Cookie AuthToken becomes Token-AAAA; anywhere the same value appears in JS or JSON responses, replace it with the same placeholder.
- User email foo@example.com becomes user+X@example.test across network and DOM; keep domain shape.
- IDs like 64-bit integers get remapped through a stable bijection seeded by snapshot ID.
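The rewrite rules above might be implemented along these lines. This is a sketch, not a vetted anonymization scheme: `PlaceholderMap` and `remapId` are hypothetical names, the numbered placeholder format differs from the Token-AAAA shape used in the text, and the ID remapping uses a four-round Feistel network keyed with HMAC-SHA256, which is one possible way to get a bijection on unsigned 64-bit integers.

```javascript
const { createHmac } = require('node:crypto');

// Same input value always maps to the same placeholder within one snapshot,
// so a token that appears in a cookie and in a form stays consistent.
class PlaceholderMap {
  constructor() {
    this.forward = new Map();  // "kind\0value" -> placeholder
    this.counters = new Map(); // kind -> last number used
  }
  token(kind, value) {
    const k = kind + '\u0000' + value;
    if (!this.forward.has(k)) {
      const n = (this.counters.get(kind) || 0) + 1;
      this.counters.set(kind, n);
      this.forward.set(k, `${kind}-${String(n).padStart(4, '0')}`);
    }
    return this.forward.get(k);
  }
  email(value) {
    return `user+${this.token('U', value)}@example.test`;
  }
}

// Bijective remap of an unsigned 64-bit ID (BigInt), seeded by the snapshot ID.
// A Feistel network is invertible by construction, so distinct IDs stay distinct.
function remapId(id, snapshotId) {
  let left = Number((id >> 32n) & 0xffffffffn);
  let right = Number(id & 0xffffffffn);
  for (let round = 0; round < 4; round++) {
    const mac = createHmac('sha256', snapshotId).update(`${round}:${right}`).digest();
    const f = mac.readUInt32BE(0);
    [left, right] = [right, (left ^ f) >>> 0];
  }
  return (BigInt(left) << 32n) | BigInt(right);
}
```

Because the remap is keyed by the snapshot ID, the same raw ID maps to different values in different snapshots, which limits cross-snapshot correlation.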
Your sanitizer runs as a pipeline:
- Parse request and response.
- Apply field-level transforms.
- Update derived headers (Content-Length, signatures if you stub verification, etc.).
- Repackage into the snapshot bundle with manifest entries describing transformations (critical for later analysis and debugging).
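As a rough illustration of that pipeline for a JSON response: parse, redact by key policy, fix the derived Content-Length, and emit a transformation log for the manifest. The function name `sanitizeJsonResponse`, the policy shape, and the lowercase header names are assumptions of this sketch.

```javascript
// Parse -> transform fields -> update derived headers -> report what changed.
function sanitizeJsonResponse(response, policy) {
  const transformations = [];

  const walk = (node, path) => {
    if (Array.isArray(node)) return node.map((v, i) => walk(v, `${path}[${i}]`));
    if (node && typeof node === 'object') {
      return Object.fromEntries(Object.entries(node).map(([k, v]) => {
        const p = path ? `${path}.${k}` : k;
        if (policy.redactKeys.includes(k)) {
          transformations.push({ path: p, action: 'redact' });
          return [k, policy.placeholder];
        }
        return [k, walk(v, p)];
      }));
    }
    return node; // primitives pass through unchanged
  };

  const body = JSON.stringify(walk(JSON.parse(response.body), ''));
  // Content-Length is derived from the body, so it must be recomputed.
  const headers = { ...response.headers, 'content-length': String(Buffer.byteLength(body)) };
  return { response: { ...response, headers, body }, transformations };
}
```

The returned `transformations` list is what would go into the manifest, so that a later mismatch can be traced back to a specific rewrite.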
The snapshot bundle format
Design a portable artifact to store everything needed for replay. A simple but effective layout:
- manifest.json (or .yaml): metadata and index
- browser: name, version, revision
- os: name, version
- viewport: width, height, dpr
- locale, timezone, geolocation
- seeds: prngSeed, cryptoSeed
- permissions: map of API -> allow/deny
- routes: matching rules and canonicalization
- clocks: baseEpoch, rAF cadence, timer policy
- notes: capture source, consent, TTL
- network/
- requests.ndjson: stream of request/response envelope events with ids
- bodies/: files keyed by content hash
- websocket.ndjson: frame timeline
- storage/
- cookies.json
- localStorage.json, sessionStorage.json (per origin)
- indexeddb/: optional dumps or migrations
- cacheStorage/: keys and bodies
- dom/
- initial.html
- dom-timeline.ndjson: optional mutation/paint checkpoints
- inputs/
- user-events.ndjson: pointer/keyboard timeline
- stubs/
- sw.js: service worker replayer
- preloads.js: evaluateOnNewDocument stubs for time, geo, randomness
This structure is straightforward to version and shard. You can chunk large bodies in a content-addressed way to de-duplicate across snapshots.
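Content addressing is what makes the de-duplication cheap. A minimal in-memory sketch follows; the class name `BodyStore` is an assumption, and a real bundle would write each blob to a file under bodies/ named by its hash instead of keeping it in a Map.

```javascript
const { createHash } = require('node:crypto');

// Bodies are keyed by the SHA-256 of their bytes, so identical responses
// (shared scripts, fonts, app shell) are stored once per store.
class BodyStore {
  constructor() {
    this.blobs = new Map(); // cid -> Buffer
  }
  put(bytes) {
    const buf = Buffer.from(bytes);
    const cid = createHash('sha256').update(buf).digest('hex');
    if (!this.blobs.has(cid)) this.blobs.set(cid, buf);
    return cid;
  }
  get(cid) {
    const buf = this.blobs.get(cid);
    if (!buf) throw new Error(`Unknown body ${cid}`);
    return buf;
  }
}
```

Entries in requests.ndjson then only need to carry the returned content ID instead of the body itself.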
Replay: building a closed world
You have two main strategies: intercept at the browser boundary using CDP routes, or run a service worker in-page. In CI, CDP is simpler and more complete; for agent training in sandboxed environments (like a hosted notebook), a service worker works inside the page without privileged launch flags.
Replay with Playwright routing from HAR (plus extensions)
Playwright includes convenient helpers:
- context.routeFromHAR('file.har', { notFound: 'abort', update: false })
- context.route to intercept and fulfill requests with custom logic.
- context.addInitScript to install deterministic stubs early.
- context.setGeolocation and context.setExtraHTTPHeaders at runtime, plus the timezoneId and locale options of browser.newContext.
- context.routeWebSocket (Playwright 1.48 and later) to mock WebSocket traffic; on older versions, stub WebSocket in the page instead.
Example hybrid approach using HAR for HTTP and a WebSocket mock that replays recorded frames:
```ts
import { chromium } from 'playwright';
import fs from 'fs';

(async () => {
  const browser = await chromium.launch({ args: ['--js-flags=--random-seed=42'] });
  // Timezone, locale, and geolocation are pinned through context options.
  const context = await browser.newContext({
    timezoneId: 'UTC',
    locale: 'en-US',
    geolocation: { latitude: 37.4219999, longitude: -122.0840575 },
    permissions: ['geolocation']
  });

  // Deterministic environment stubs loaded before any app script
  await context.addInitScript({ content: `
    (function () {
      let s = 42;
      Math.random = () => {
        // Math.imul keeps the 32-bit multiply exact
        s = (Math.imul(1664525, s) + 1013904223) >>> 0;
        return s / 4294967296;
      };
      const fixedNow = 1700000000000;
      Date.now = () => fixedNow;
      performance.now = () => 0;
    })();
  ` });

  // Route from HAR for HTTP; requests missing from the HAR are aborted
  await context.routeFromHAR('snapshot.har', { notFound: 'abort' });

  // NDJSON: one JSON object per line
  const wsFrames = fs.readFileSync('websocket.ndjson', 'utf8')
    .split('\n').filter(Boolean).map(line => JSON.parse(line));

  // Mock every WebSocket: never connect to a server, just push the recorded
  // server-to-client frames in order (assumes text frames).
  await context.routeWebSocket(/.*/, ws => {
    for (const frame of wsFrames) {
      if (frame.type === 'recv') ws.send(frame.payload);
    }
  });

  const page = await context.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();
```
Note: Playwright HAR does not include WebSockets or streaming chunk boundaries. For streaming APIs (SSE, fetch streaming), consider a custom CDP-based replayer that delivers chunks with recorded timing.
Replay with a service worker (portable and simple)
A service worker can intercept fetches made by the pages it controls and return recorded bytes. This works even for streaming responses if you pipe a ReadableStream from stored chunks.
Basic SW replayer:
```js
// stubs/sw.js
const META_CACHE = 'snapshot-meta';
const BODY_CACHE = 'snapshot-bodies';

self.addEventListener('install', () => { self.skipWaiting(); });
self.addEventListener('activate', (event) => { event.waitUntil(self.clients.claim()); });

// The Cache API only stores http(s) URLs, so recorded data lives under
// synthetic same-origin paths. The /__snapshot/ prefix is an arbitrary choice;
// whatever loads the snapshot into the caches must use the same scheme.
function metaUrl(request) {
  const url = new URL(request.url);
  url.searchParams.sort();
  url.hash = '';
  return '/__snapshot/meta/' + encodeURIComponent(request.method + ' ' + url.toString());
}

async function readBody(cache, cid) {
  const res = await cache.match('/__snapshot/body/' + cid);
  if (!res) throw new Error('Missing recorded body ' + cid);
  return new Uint8Array(await res.arrayBuffer());
}

async function fetchRecorded(request) {
  const metaRes = await (await caches.open(META_CACHE)).match(metaUrl(request));
  if (!metaRes) throw new Error('No recorded response for ' + request.url);
  const meta = await metaRes.json();
  const bodies = await caches.open(BODY_CACHE);
  // Streaming responses list several chunk ids; plain ones have a single bodyCid.
  const cids = meta.bodyChunks || (meta.bodyCid ? [meta.bodyCid] : []);
  let i = 0;
  const body = cids.length === 0 ? null : new ReadableStream({
    async pull(controller) {
      if (i >= cids.length) return controller.close();
      controller.enqueue(await readBody(bodies, cids[i++]));
    }
  });
  return new Response(body, { status: meta.status, headers: new Headers(meta.headers) });
}

// A rejected respondWith surfaces as a network error, which keeps replay strict.
self.addEventListener('fetch', (event) => {
  event.respondWith(fetchRecorded(event.request));
});
```
To install it during replay, serve your app under a local origin and register the service worker from the page. Service workers require a secure context, so use HTTPS or http://localhost rather than a plain-HTTP custom hostname. The worker sees cross-origin subresource requests made by the pages it controls, but not requests made inside cross-origin iframes, so rewrite those frames to be same-origin or proxy them through a local replay server.
Deterministic clocks and the event loop
Time is the root of flakiness. You must control it.
- Freeze Date.now and performance.now to known values or a deterministic schedule.
- Replace setTimeout, setInterval, and requestAnimationFrame with a fake clock. Sinon-style fakes work, but you want control at the browser boundary to avoid page code erasing your changes.
- Drive the clock from your runner: between user inputs, advance timers just enough to flush queued tasks. A simple policy is step functions: after a click, advance timers by N ms and run one rAF; repeat until no timers or rAF callbacks remain pending, or until a step budget is exhausted.
- Lock Intl time zone (e.g., UTC) and locale.
A minimal fake clock for browsers:
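```js
(function () {
  let now = 0;               // virtual milliseconds since replay start
  let nextId = 1;
  const timers = new Map();  // id -> { when, fn, args, every }

  function schedule(fn, ms, args, repeat) {
    const id = nextId++;
    // Clamp to 1 ms so a self-rescheduling zero-delay timer cannot spin forever.
    const delay = Math.max(Number(ms) || 0, 1);
    timers.set(id, { when: now + delay, fn, args, every: repeat ? delay : null });
    return id;
  }

  // Advance virtual time, firing due timers one at a time in (when, id) order,
  // so timers scheduled by a callback can still fire within the same tick.
  function tick(ms) {
    const target = now + ms;
    for (;;) {
      let dueId = null;
      for (const [id, t] of timers) {
        if (t.when <= target && (dueId === null || t.when < timers.get(dueId).when)) dueId = id;
      }
      if (dueId === null) break;
      const t = timers.get(dueId);
      now = Math.max(now, t.when);
      if (t.every !== null) t.when = now + t.every; else timers.delete(dueId);
      try { t.fn(...t.args); } catch (e) { console.error(e); }
    }
    now = target;
  }

  window.__fakeClockTick = tick;
  window.setTimeout = (fn, ms, ...args) => schedule(fn, ms, args, false);
  window.setInterval = (fn, ms, ...args) => schedule(fn, ms, args, true);
  window.clearTimeout = window.clearInterval = (id) => { timers.delete(id); };
  // Only Date.now is faked here; new Date() still reads the real clock
  // unless the Date constructor is replaced as well.
  Date.now = () => 1700000000000 + now;
  performance.now = () => now;

  let rafQueue = [];
  window.requestAnimationFrame = (cb) => { const id = nextId++; rafQueue.push({ id, cb }); return id; };
  window.cancelAnimationFrame = (id) => { rafQueue = rafQueue.filter(e => e.id !== id); };
  window.__fakeRaf = () => { const q = rafQueue; rafQueue = []; q.forEach(e => e.cb(now)); };
})();
```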
In your runner, sequence: dispatch input event -> __fakeClockTick(16) -> __fakeRaf() -> repeat until stable.
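The tick, frame, repeat policy can be modeled outside the browser to reason about it. The following sketch is a plain Node model, not the in-page clock: `VirtualClock` and `settle` are names invented here, and the step budget is an added safeguard so that polling loops fail loudly instead of hanging the runner.

```javascript
// Minimal model of a fake clock with timers and a rAF queue.
class VirtualClock {
  constructor() { this.now = 0; this.timers = []; this.rafQueue = []; this.seq = 0; }
  setTimeout(fn, ms = 0) { this.timers.push({ when: this.now + ms, seq: this.seq++, fn }); }
  requestAnimationFrame(cb) { this.rafQueue.push(cb); }
  // Fire due timers in (when, seq) order, including ones scheduled by callbacks.
  tick(ms) {
    const target = this.now + ms;
    for (;;) {
      const due = this.timers
        .filter((t) => t.when <= target)
        .sort((a, b) => a.when - b.when || a.seq - b.seq)[0];
      if (!due) break;
      this.timers.splice(this.timers.indexOf(due), 1);
      this.now = Math.max(this.now, due.when);
      due.fn();
    }
    this.now = target;
  }
  runFrame() { const q = this.rafQueue; this.rafQueue = []; q.forEach((cb) => cb(this.now)); }
  get pending() { return this.timers.length + this.rafQueue.length; }
}

// Step until nothing is pending; returns the number of steps taken.
function settle(clock, { stepMs = 16, maxSteps = 1000 } = {}) {
  let steps = 0;
  while (clock.pending > 0) {
    if (steps >= maxSteps) throw new Error(`did not settle after ${maxSteps} steps`);
    clock.tick(stepMs);
    clock.runFrame();
    steps++;
  }
  return steps;
}
```

The step count returned by `settle` is also a useful signal on its own: a sudden jump between builds usually means a new timer or polling loop appeared.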
Avoid letting application code replace your stubs. Install them using addInitScript or Page.addScriptToEvaluateOnNewDocument so they run before any page script. If you need to enforce them, seal properties with Object.defineProperty using writable: false and configurable: false.
Web API stubs for hermetic runs
Stubs are your glue logic to keep the environment sealed:
- Randomness: override Math.random and crypto.getRandomValues with seeded PRNGs. Beware: some libraries test for cryptographic randomness; if so, provide a deterministic but API-compatible implementation.
- Geolocation: return a fixed location and watch stream; schedule updates if the app expects movement.
- Permissions: implement a stable allow/deny policy.
- Clipboard: mock read/write to an in-memory buffer.
- Notifications: no-op or log-only.
- WebRTC: disable or stub RTCPeerConnection to avoid P2P variability; return canned SDP and ICE candidates if needed.
- Canvas and WebGL: if pixel stability matters, force a software backend and disable antialiasing. For 2D canvas text metrics, fix fonts; for WebGL, consider headless-gl or skip pixel assertions.
- Storage: seed localStorage/sessionStorage/IndexedDB with recorded values; ensure quotas are generous in CI.
Example stubs injected at page start:
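```js
(function () {
  // navigator.clipboard and navigator.permissions are getter-only, so plain
  // assignment is ignored; shadow them with own properties instead.
  const define = (obj, name, value) =>
    Object.defineProperty(obj, name, { value, configurable: true });

  // Clipboard: in-memory buffer
  let clipText = '';
  define(navigator, 'clipboard', {
    writeText: async (t) => { clipText = String(t); },
    readText: async () => clipText
  });

  // Permissions: stable allow/deny policy
  define(navigator, 'permissions', {
    query: async (opts) => ({
      name: opts.name,
      state: opts.name === 'geolocation' ? 'granted' : 'denied'
    })
  });

  // WebRTC: fail loudly rather than open P2P connections
  window.RTCPeerConnection = function () {
    throw new Error('WebRTC disabled in hermetic replay');
  };
})();
```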
Matching and canonicalization during replay
Requests rarely match naïvely; cache-busting params and headers change. Define a consistent matching function:
- Key by (method, URL with normalized query parameters, selected headers, and an optional body hash).
- Normalize query param order and drop volatile keys like _ts, cacheBust, or any that are known to vary.
- For POST forms, parse application/x-www-form-urlencoded and ignore fields known to be nondeterministic if they do not affect server behavior under replay.
- For GraphQL or JSON RPC, match by operationName and a content hash of variables after applying placeholder mappings.
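A matching function following these rules could look like the sketch below. The volatile parameter list, the key layout, and the assumption that header names arrive lowercased are choices made for this example; `canonicalKey` and `stableStringify` are illustrative helpers, not library functions.

```javascript
const { createHash } = require('node:crypto');

const VOLATILE = new Set(['_ts', 'cacheBust']);
const sha = (text) => createHash('sha256').update(text).digest('hex').slice(0, 16);

// JSON.stringify with sorted object keys, so key order never affects the hash.
function stableStringify(value) {
  if (Array.isArray(value)) return '[' + value.map(stableStringify).join(',') + ']';
  if (value && typeof value === 'object') {
    return '{' + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ':' + stableStringify(value[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

function canonicalKey({ method, url, headers = {}, body = null }) {
  const u = new URL(url);
  const params = [...u.searchParams.entries()]
    .filter(([k]) => !VOLATILE.has(k))
    .sort(([a, x], [b, y]) => (a < b ? -1 : a > b ? 1 : x < y ? -1 : x > y ? 1 : 0));
  u.search = new URLSearchParams(params).toString();
  u.hash = '';

  const contentType = (headers['content-type'] || '').split(';')[0].trim().toLowerCase();
  let bodyPart = '';
  if (body && contentType === 'application/json') {
    const parsed = JSON.parse(body);
    const isGraphQL = parsed && typeof parsed === 'object' && 'operationName' in parsed;
    bodyPart = isGraphQL
      ? `gql:${parsed.operationName}:${sha(stableStringify(parsed.variables ?? null))}`
      : `json:${sha(stableStringify(parsed))}`;
  } else if (body) {
    bodyPart = `raw:${sha(body)}`;
  }
  return [method.toUpperCase(), u.toString(), bodyPart].join(' ');
}
```

The same function has to run at capture time and at replay time; any difference between the two is itself a source of mismatches.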
Mismatch policy:
- Strict by default: if no match, error out to surface drift early.
- Fuzzy fallback for training-time exploration only when necessary, logging every deviation with a clear reason.
- Diagnostics: on mismatch, emit a diff of request vs closest candidates to refine your canonicalization rules.
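For the diagnostics bullet, a small helper that ranks recorded URLs against the failed request and reports differing parameters is often enough to spot a missing canonicalization rule. The scoring weights and the function names `closestCandidates` and `diffParams` are arbitrary choices for this sketch.

```javascript
// Rank recorded URLs by similarity to the unmatched request.
// Same origin is required; same path counts most, then each equal query param.
function closestCandidates(requestUrl, recordedUrls, limit = 3) {
  const want = new URL(requestUrl);
  const score = (candidate) => {
    const u = new URL(candidate);
    if (u.origin !== want.origin) return -1;
    let s = u.pathname === want.pathname ? 10 : 0;
    for (const [k, v] of want.searchParams) if (u.searchParams.get(k) === v) s += 1;
    return s;
  };
  return recordedUrls
    .map((url) => ({ url, score: score(url) }))
    .filter((c) => c.score >= 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// List the query parameters whose values differ between request and candidate.
function diffParams(requestUrl, candidateUrl) {
  const a = new URL(requestUrl).searchParams;
  const b = new URL(candidateUrl).searchParams;
  const keys = [...new Set([...a.keys(), ...b.keys()])].sort();
  return keys
    .filter((k) => a.get(k) !== b.get(k))
    .map((k) => ({ param: k, requested: a.get(k), recorded: b.get(k) }));
}
```

A parameter that shows up in `diffParams` on nearly every mismatch is a strong candidate for the volatile list.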
Deterministic scheduling and concurrency
Even with fake clocks, JavaScript execution interleaves with network streams and input events, and the interleaving order can vary between runs. There are two main approaches:
- Passive determinism: fix times and let the app run at its own pace. This works for most apps if the network is instant (replay is local) and your timers are stable.
- Cooperative determinism: orchestrate steps where the runner advances the clock, flushes microtasks, performs a frame, and only then allows the next input or network chunk. This is akin to a turn-based event loop and yields reproducibility for racy apps.
For streaming data, deliver chunks in the recorded order with recorded (or scaled) delays between them, driven by your fake clock. You can also batch delivery at known safe points (after rAF) for easier synchronization.
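Chunk delivery driven by virtual time can be sketched as a queue that releases chunks whenever the runner advances the clock. In this example `ChunkScheduler` is a hypothetical name, and `delayMs` is assumed to mean the recorded gap before each chunk relative to the previous one.

```javascript
// Releases recorded chunks in order as virtual time advances.
class ChunkScheduler {
  constructor(chunks, { scale = 1 } = {}) {
    let at = 0;
    // Convert relative gaps into absolute virtual delivery times.
    this.queue = chunks.map((c) => {
      at += c.delayMs * scale;
      return { at, data: c.data };
    });
    this.now = 0;
  }

  // Advance virtual time and hand every chunk that is now due to `deliver`.
  // Returns how many chunks are still waiting.
  advance(ms, deliver) {
    this.now += ms;
    while (this.queue.length && this.queue[0].at <= this.now) {
      deliver(this.queue.shift().data);
    }
    return this.queue.length;
  }
}
```

Setting `scale` to 0 collapses all gaps, which gives the "batch at a safe point" behavior: everything pending is delivered on the next advance.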
Pinning the runtime: browser, OS, fonts
Bugs emerge from variation in the underlying platform:
- Pin the browser build. Playwright helps by shipping a known Chromium per version. Log exact executable path and revision.
- Use a container for OS-level stability (glibc, fontconfig, graphics stack). Consider running headless with software rasterization (ANGLE with SwiftShader) to reduce GPU nondeterminism.
- Bundle fonts to control text metrics across machines.
- Disable feature flags and field trials: --disable-features and --disable-field-trial-config to avoid receiving experimental rollouts.
Flags worth considering in CI:
- --disable-gpu --use-gl=swiftshader (for headless rendering)
- --disable-renderer-backgrounding --disable-background-timer-throttling
- --mute-audio
- --incognito or clean user data dir
Training agents vs CI tests: what differs
- Scale and variability:
- Training needs thousands to millions of steps. Storage and bandwidth matter; deduplicate resources aggressively and consider differential snapshots (delta against a base snapshot of the app shell).
- CI is smaller and more curated; aim for human-readable diagnostics on mismatch.
- Instrumentation:
- Training benefits from semantic labels on actions and observations: element role, accessible name, ARIA attributes, and stable element identifiers (data-testid) mapped into your action log.
- CI emphasizes assertion hooks and failure triage (network diffs, DOM diffs, screenshot diffs if needed).
- Curriculum and augmentation:
- For agents, you can augment snapshots by permuting time, locale, small viewport changes, or small randomized delays under the same hermetic network to increase robustness without re-capturing.
- For CI, keep permutations minimal; stability beats variability.
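Augmentation for training can stay reproducible if every variant is derived from a seed. The sketch below uses the small mulberry32 generator; the variant fields, the locale list, and the jitter ranges are assumptions for illustration, and the network bundle is deliberately passed through untouched.

```javascript
// Small seeded PRNG (mulberry32): same seed, same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Derive `count` environment variants from one base snapshot.
// Only the environment changes; every variant replays the same network bundle.
function makeVariants(base, seed, count) {
  const rnd = mulberry32(seed);
  const locales = ['en-US', 'de-DE', 'ja-JP'];
  return Array.from({ length: count }, (_, i) => ({
    variant: i,
    viewport: {
      width: base.viewport.width + Math.floor(rnd() * 33) - 16, // +/- 16 px
      height: base.viewport.height,
    },
    locale: locales[Math.floor(rnd() * locales.length)],
    inputJitterMs: Math.floor(rnd() * 21), // 0..20 ms
    networkBundle: base.networkBundle,
  }));
}
```

Storing the seed and variant index in the manifest is enough to regenerate any variant later without storing the variants themselves.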
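A shared trick: stable element identity. During capture, record not just CSS selectors but also a robust locator (role, accessible name, text). A CDP backendNodeId is useful for correlating nodes within a single session, but it does not carry over to a fresh replay, so do not rely on it alone. During replay, resolve to the same logical target even if sibling indices shift, so actions hit the same control.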
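Resolution by logical identity can be as simple as trying the most specific attribute first and refusing to guess when the result is ambiguous. In this sketch the candidate shape ({ testId, role, name }) and the function name `resolveTarget` are assumptions; in practice the candidates would come from the accessibility tree or a DOM query.

```javascript
// Resolve a recorded target against the current page's candidate elements.
// Prefers data-testid, then (role, accessible name); never picks by index.
function resolveTarget(recorded, candidates) {
  if (recorded.testId) {
    const byTestId = candidates.filter((c) => c.testId === recorded.testId);
    if (byTestId.length === 1) return byTestId[0];
  }
  const byRoleName = candidates.filter(
    (c) => c.role === recorded.role && c.name === recorded.name
  );
  if (byRoleName.length === 1) return byRoleName[0];
  throw new Error(
    `Ambiguous or missing target: ${recorded.role} "${recorded.name}" (${byRoleName.length} matches)`
  );
}
```

Throwing on ambiguity matches the strict-by-default policy used for network matching: a wrong click that silently succeeds is harder to debug than a failed resolution.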
Failure modes and how to mitigate
- Cross-origin iframes and third-party scripts:
- Problem: a service worker cannot control cross-origin iframes; CDP can intercept their requests, but replaying cross-origin frames may require origin substitution.
- Solution: use a local proxy that rewrites origins and adjusts Content-Security-Policy headers so the consolidated content still loads under your replay origin; or capture those domains separately and replay via host mapping.
- Service worker interplay:
- Problem: the site’s own service worker may cache and intercept fetches.
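- Solution: unregister the site's service workers before replay, or block registration in the hermetic context (for example, with Playwright's serviceWorkers: 'block' context option) and rely on CDP routing instead. Only one service worker controls a given scope, so a replay worker and the site's own worker cannot both own it.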
- Storage quotas and eviction:
- Problem: IndexedDB and CacheStorage quotas vary.
- Solution: keep the dataset small, or raise the quota in CI where supported (in Chromium, CDP exposes Storage.overrideQuotaForOrigin).
- Timing assumptions:
- Problem: the app expects time to pass (e.g., progress animations, token expirations).
- Solution: drive your fake clock forward where appropriate; provide deterministic time progression policies in the manifest.
- Crypto and signature validation:
- Problem: apps using SubtleCrypto signatures or token validation may reject sanitized artifacts.
- Solution: stub verification routines where lawful and appropriate in hermetic runs, or keep validation untouched by retaining keys within your secure replay environment.
An end-to-end workflow (opinionated and battle-tested)
- Capture
- Use Playwright to run the flow with CDP enabled.
- Enable Network and WebSocket capture, record user inputs, and inject pre-run stubs for seeded randomness and frozen clocks.
- Snapshot cookies and storage at start.
- Sanitize
- Apply structured transforms to network data (headers, query params, JSON keys) and DOM text.
- Generate placeholder mappings and store them in the manifest.
- Pack
- Create a bundle with manifest, network artifacts, storage, and stubs.
- Deduplicate bodies by content hash and compress the archive.
- Replay in CI
- Launch Playwright with pinned Chromium and deterministic flags.
- context.addInitScript to install time, geo, and API stubs.
- context.routeFromHAR (with your extended HAR or custom router) and disallow any external network.
- Seed storage from the snapshot.
- Reproduce user input from inputs/user-events.ndjson.
- Assert expected DOM outcomes and optionally snapshots.
- Replay for training
- Use the same environment.
- Expose a step API that the agent calls: observe (DOM tree, accessibility tree, network state) -> act (click, type) -> advanceClock -> receive reward.
- Optionally add controlled noise: small input timing jitter, viewport offsets, or locale changes recorded in the manifest, but keep network and layout deterministic.
- Triage and maintenance
- On mismatch, diff at multiple boundaries: request keys, response bodies, DOM structure.
- Update canonicalization rules, not the captured data, unless the app fundamentally changed.
- Keep a catalog of snapshots with metadata tags (app version, region, feature flags) to aid selection.
Concrete examples and code snippets
- Preloading deterministic stubs early with Playwright:
```ts
await context.addInitScript({ content: `
  (function () {
    // Seeded LCG (Numerical Recipes constants); the product stays below 2^53.
    let s = 7;
    Math.random = () => {
      s = (1664525 * s + 1013904223) % 0x100000000;
      return s / 0x100000000;
    };

    // Keep a reference to the real Date: inside the subclass, the global
    // name Date would resolve to the fake itself and recurse forever.
    const RealDate = Date;
    const origin = 1700000000000;
    Date = class extends RealDate {
      constructor(...a) { super(...(a.length ? a : [origin])); }
      static now() { return origin; }
    };
    performance.now = () => 0;

    navigator.geolocation.getCurrentPosition = (ok) =>
      ok({ coords: { latitude: 0, longitude: 0, accuracy: 1 }, timestamp: origin });
  })();
` });
```
- Routing with a custom matcher when HAR is insufficient:
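```ts
// lookupInSnapshot, bodyHash, and readBody are snapshot-specific helpers
// assumed to be defined elsewhere in your replay harness.
await context.route('**/*', async (route) => {
  const req = route.request();
  const url = new URL(req.url());
  // Drop volatile params, sort the rest, and let URLSearchParams handle encoding.
  const params = [...url.searchParams.entries()]
    .filter(([k]) => !['_ts', 'cb'].includes(k))
    .sort();
  url.search = new URLSearchParams(params).toString();
  url.hash = '';
  const key = `${req.method()} ${url.toString()}`;

  const entry = lookupInSnapshot(key, req.headers(), await bodyHash(req));
  // Strict policy: unknown requests fail instead of reaching the network.
  if (!entry) return route.abort('failed');
  return route.fulfill({
    status: entry.status,
    headers: entry.headers,
    body: await readBody(entry.bodyCid)
  });
});
```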
- Replaying streaming SSE from recorded chunks in a service worker:
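```js
// Called from the service worker's fetch handler. `recorded.chunks` is the
// list of { data, delayMs } entries loaded from the snapshot for this request.
function sseResponse(recorded) {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for (const chunk of recorded.chunks) {
        controller.enqueue(encoder.encode(chunk.data));
        // delayMs is treated as the pause after each chunk
        await new Promise(r => setTimeout(r, chunk.delayMs));
      }
      controller.close();
    }
  });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
}
```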
Extensions beyond basics
- DOM snapshotting for fast-forward: CDP DOMSnapshot.captureSnapshot can capture layout tree, computed styles, and node strings. For some replays, you can fast-forward to a checkpoint after heavy hydration and start from there.
- Accessibility tree capture: for agents that act semantically, capture AX nodes and roles via Accessibility.getFullAXTree and use it as observation space.
- Input method variability: to test IME flows deterministically, synthesize composition events with the same timing.
- OPFS and File System Access API: if your app uses these, stub the file system with an in-memory mount populated from the snapshot.
Measuring determinism
Trust but verify. Build metrics:
- Replay divergence rate: percentage of runs with any request mismatch.
- DOM diff distance: tree-edit distance or hash of stripped DOM.
- Frame-to-frame variance: for pixel tests, per-frame MSE within a tolerance.
- Event loop steps: number of ticks required to settle; outliers indicate hidden timers.
- Time-to-stable: wall-clock time to complete replay; regressions can reveal latent polling loops.
Run these metrics on every change to your capture, sanitizer, stubs, or browser version.
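Three of these metrics are cheap to compute from run records. The sketch below assumes each run record carries a `requestMismatches` count, and it normalizes HTML with simple text rules (comments and inter-tag whitespace removed) before hashing. That normalization is a rough stand-in for a real DOM serializer, and all function names are illustrative.

```javascript
const { createHash } = require('node:crypto');

// Replay divergence rate: share of runs with at least one request mismatch.
function divergenceRate(runs) {
  if (runs.length === 0) return 0;
  return runs.filter((r) => r.requestMismatches > 0).length / runs.length;
}

// Hash of a "stripped" DOM: comments and insignificant whitespace removed,
// so formatting-only differences do not count as divergence.
function domHash(html) {
  const stripped = html
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/\s+/g, ' ')
    .replace(/> </g, '><')
    .trim();
  return createHash('sha256').update(stripped).digest('hex');
}

// Event loop steps: flag runs that needed far more steps than the median
// (upper median for even-length input).
function stepOutliers(stepCounts, factor = 3) {
  const sorted = [...stepCounts].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  return stepCounts
    .map((n, i) => (n > factor * median ? i : -1))
    .filter((i) => i >= 0);
}
```

Comparing `domHash` values across repeated replays of the same snapshot gives a quick yes or no answer before you reach for a full tree diff.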
Security and ethics
- Respect terms of service; do not capture or replay against production servers without permission and safeguards.
- Avoid collecting more data than necessary; anonymize aggressively.
- Encrypt snapshot bundles at rest; consider per-artifact keys.
- Maintain audit logs for who accessed which snapshot and why.
Conclusion
Hermetic web snapshots make flaky tests reliable and make agent training practical. The recipe is simple in concept and exacting in practice: capture bytes and timelines with CDP, sanitize with structured transforms, pin the runtime, and replay via a sealed world using routing or service workers. Control the clocks, stub the world, and demand strict matches by default.
When done well, you get deterministic replays that survive browser upgrades, network outages, and third-party churn. Your CI gets faster and calmer. Your agents learn from a stable curriculum. And your team regains time spent chasing flakes.
If you are starting from zero, adopt a layered path: first record HTTP with HAR and routeFromHAR; then add CDP capture for bodies and WebSockets; then fake clocks and seeded randomness; then storage and permissions; finally, a manifest and sanitizer. Each step pays off immediately and compounds toward full hermeticity.
With these tools and patterns, your browser agents and UI suites can finally stand on deterministic ground.
