Published: May 6, 2026 · 14 min read

How to monitor changes on a website

Change detection via DOM diff, content hashing, and screenshot comparison, with alerts when a competitor changes an offer, prices, copy, or policy.

Beginner · Playwright · Node.js / TypeScript · PostgreSQL · Pixelmatch / sharp · Webhooks

Change detection is one of the simplest yet most impactful automation use cases. It tells you when a competitor changes prices, adds a new product, modifies a policy, or updates page copy. It also tells you when a regulator publishes a new guideline, an agency changes a form, or a provider updates its API docs.

We show three techniques, each suited to a different type of change:

  1. Content hashing, for detecting text changes
  2. DOM diff, for detecting structural changes
  3. Screenshot diff, for detecting visual changes (e.g. a new banner or layout)
What you need
  • Node.js 20+ or Python 3.11+
  • Basic JS/TS or Python knowledge
  • VPS or serverless platform (Vercel, Cloudflare Workers)
  • Slack/email/Discord for alerts
Steps
  1. 01

    Technique 1: Content hashing

    The simplest approach: fetch the text, hash it, and compare the result to the previous hash. It works well for answering "did the page content change?":

    import { chromium } from 'playwright';
    import crypto from 'node:crypto';
    import pg from 'pg';
    
    // Assumption: node-postgres, configured through the standard PG* environment variables
    const db = new pg.Pool();
    
    async function getContentHash(url, selector) {
      const browser = await chromium.launch({ headless: true });
      try {
        const page = await browser.newPage();
        await page.goto(url, { waitUntil: 'networkidle' });
        const text = await page.locator(selector).innerText();
        // Normalize whitespace so formatting-only differences do not change the hash
        const normalized = text.replace(/\s+/g, ' ').trim();
        return {
          hash: crypto.createHash('sha256').update(normalized).digest('hex'),
          text: normalized,
        };
      } finally {
        // Close the browser even if navigation or the selector lookup throws
        await browser.close();
      }
    }
    
    // alertChange is your own notification helper (Slack, email, ...)
    async function checkChanges(url, selector) {
      const current = await getContentHash(url, selector);
      const prev = await db.query(
        'SELECT hash, text FROM content_history WHERE url=$1 ORDER BY checked_at DESC LIMIT 1',
        [url]
      );
      const previous = prev.rows[0];
      if (previous?.hash === current.hash) return; // unchanged
    
      // The first run stores a baseline silently; later runs alert on the difference
      if (previous) await alertChange(url, previous.text, current.text);
      await db.query(
        'INSERT INTO content_history (url, hash, text, checked_at) VALUES ($1, $2, $3, NOW())',
        [url, current.hash, current.text]
      );
    }

    Pitfall: dynamic content (timestamps, ad slots, randomized order) creates false positives. Filter the selector tightly (e.g. main article not body).
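
When a timestamp sits inside the content area and cannot be excluded with a tighter selector, it can be stripped before hashing. The following is a minimal sketch of that idea, not a definitive implementation: the patterns in `VOLATILE_PATTERNS` and the helper names `normalizeText` and `hashText` are assumptions for illustration and need adjusting to what the monitored page actually renders.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical patterns for volatile fragments; tune them per monitored page.
const VOLATILE_PATTERNS: RegExp[] = [
  // ISO-like date-times such as 2026-05-06T10:15:00Z
  /\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?(\.\d+)?(Z|[+-]\d{2}:?\d{2})?/g,
  // Clock times such as 14:05 or 9:30 PM
  /\b\d{1,2}:\d{2}(:\d{2})?\s?(AM|PM)?\b/gi,
  // Relative times such as "5 minutes ago"
  /\b\d+\s+(second|minute|hour|day)s?\s+ago\b/gi,
];

function normalizeText(raw: string): string {
  let text = raw;
  for (const pattern of VOLATILE_PATTERNS) {
    text = text.replace(pattern, ' ');
  }
  // Collapse whitespace last so removed fragments do not leave double spaces
  return text.replace(/\s+/g, ' ').trim();
}

function hashText(raw: string): string {
  return createHash('sha256').update(normalizeText(raw)).digest('hex');
}
```

With this in place, two fetches that differ only in a "last updated" timestamp produce the same hash, while a changed price still produces a different one.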

  2. 02

    Technique 2: DOM diff for structural changes

    Hashing catches content changes but not structural ones (e.g. they added a pricing table, removed a section). DOM diff:

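    import { chromium } from 'playwright';
    import { diffLines } from 'diff';
    
    // getPrevSnapshot, saveSnapshot and alertStructuralChange are your own storage and alert helpers
    
    async function getDomSnapshot(url, selector) {
      const browser = await chromium.launch({ headless: true });
      try {
        const page = await browser.newPage();
        // Wait for the network to settle so lazy-loaded sections are present
        await page.goto(url, { waitUntil: 'networkidle' });
        const html = await page.locator(selector).innerHTML();
        // Put each tag on its own line so the line diff is readable
        return html.replace(/>\s*</g, '>\n<');
      } finally {
        await browser.close();
      }
    }
    
    async function findStructuralChanges(url, selector) {
      const current = await getDomSnapshot(url, selector);
      const prev = await getPrevSnapshot(url);
      if (!prev) return saveSnapshot(url, current);
      
      const changes = diffLines(prev, current);
      // Ignore added or removed fragments of 20 characters or fewer to cut noise
      const significant = changes.filter(c => 
        (c.added || c.removed) && c.value.trim().length > 20
      );
      if (significant.length > 0) {
        await alertStructuralChange(url, significant);
        // Store the new baseline so the same change does not alert on every run
        await saveSnapshot(url, current);
      }
    }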
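
The filtered change list still has to become something readable in an alert. Below is a minimal sketch of one way to do that, assuming the change objects have the `value` / `added` / `removed` shape that `diffLines` returns; `formatChanges`, `minLength` and `maxLines` are hypothetical names introduced for illustration.

```typescript
// Shape of the change objects returned by diffLines in the `diff` package
interface Change {
  value: string;
  added?: boolean;
  removed?: boolean;
}

// Hypothetical helper: turn significant changes into "+"/"-" lines for an alert body.
function formatChanges(changes: Change[], minLength = 20, maxLines = 10): string {
  const lines: string[] = [];
  for (const change of changes) {
    if (!change.added && !change.removed) continue; // unchanged context
    if (change.value.trim().length <= minLength) continue; // too small to matter
    const prefix = change.added ? '+' : '-';
    for (const line of change.value.split('\n')) {
      if (line.trim() !== '') lines.push(`${prefix} ${line.trim()}`);
    }
  }
  if (lines.length > maxLines) {
    // Cap the output so a large rewrite does not flood the alert channel
    const hidden = lines.length - maxLines;
    return [...lines.slice(0, maxLines), `... ${hidden} more changed lines`].join('\n');
  }
  return lines.join('\n');
}
```

The output resembles a unified diff without context lines, which is usually enough to tell "a pricing table was added" apart from "a tracking attribute changed".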
  3. 03

    Technique 3: Screenshot diff for visual changes

    Hashing and DOM diff miss visual-only changes — e.g. changed banner image, CTA color, layout shift. Screenshot diff:

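    import { chromium } from 'playwright';
    import pixelmatch from 'pixelmatch';
    import { PNG } from 'pngjs';
    
    // getPrevScreenshot, saveScreenshot and alertVisualChange are your own storage and alert helpers
    
    async function captureAndCompare(url) {
      const browser = await chromium.launch({ headless: true });
      let buffer;
      try {
        // Fixed viewport so layout differences come from the page, not the window size
        const page = await browser.newPage({ viewport: { width: 1440, height: 900 } });
        await page.goto(url, { waitUntil: 'networkidle' });
        buffer = await page.screenshot({ fullPage: true });
      } finally {
        await browser.close();
      }
      
      const prev = await getPrevScreenshot(url);
      if (!prev) return saveScreenshot(url, buffer);
      
      const img1 = PNG.sync.read(prev);
      const img2 = PNG.sync.read(buffer);
      // pixelmatch throws on mismatched dimensions; a changed page height is itself a visual change
      if (img1.width !== img2.width || img1.height !== img2.height) {
        await alertVisualChange(url, 1, null);
        return saveScreenshot(url, buffer);
      }
      const { width, height } = img1;
      const diff = new PNG({ width, height });
      
      const numDifferent = pixelmatch(
        img1.data, img2.data, diff.data,
        width, height,
        { threshold: 0.1 } // per-pixel color tolerance, 0-1; lower is stricter
      );
      
      const pctChanged = numDifferent / (width * height);
      if (pctChanged > 0.02) { // more than 2% of pixels changed
        await alertVisualChange(url, pctChanged, diff);
        await saveScreenshot(url, buffer); // new baseline
      }
    }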

    threshold: 0.1 is pixelmatch's per-pixel color tolerance (0 to 1, lower is more sensitive); anti-aliased pixels are ignored by default. The 2% cutoff on changed pixels ignores tiny shifts.
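
One practical detail: pixelmatch requires both images to have identical dimensions, and full-page screenshots change height whenever the page grows or shrinks. One option is to pad both images to a common size before comparing, so that the added or removed area counts as changed pixels. The sketch below assumes raw RGBA data as produced by pngjs (`img.data`); `padRgba` is a hypothetical helper, not part of either library.

```typescript
// Hypothetical helper: copy an RGBA image into a larger transparent canvas (top-left aligned).
function padRgba(
  data: Uint8Array,
  width: number,
  height: number,
  targetWidth: number,
  targetHeight: number,
): Uint8Array {
  if (targetWidth < width || targetHeight < height) {
    throw new RangeError('target size must not be smaller than the source image');
  }
  // A new Uint8Array is zero-filled, so the padding is transparent black
  const out = new Uint8Array(targetWidth * targetHeight * 4);
  const rowBytes = width * 4;
  for (let y = 0; y < height; y++) {
    const srcStart = y * rowBytes;
    out.set(data.subarray(srcStart, srcStart + rowBytes), y * targetWidth * 4);
  }
  return out;
}
```

To use it, take the larger width and height of the two screenshots, pad both buffers to that size, and pass the padded buffers to pixelmatch together with the common dimensions.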

  4. 04

    Scheduling and alerting

    Run the checks from cron on a VPS or from a serverless scheduler. For 50-200 URLs checked hourly, a Vercel cron job or a Cloudflare Worker is enough.

    // Vercel cron: vercel.json
    {
      "crons": [
        { "path": "/api/check-changes", "schedule": "0 * * * *" }
      ]
    }
    
    // app/api/check-changes/route.ts
    // db, checkChanges, findStructuralChanges and captureAndCompare come from the earlier steps
    export async function GET() {
      const urls = await db.query('SELECT url, selector, technique FROM watched_urls');
      let failed = 0;
      for (const { url, selector, technique } of urls.rows) {
        try {
          if (technique === 'hash') await checkChanges(url, selector);
          else if (technique === 'dom') await findStructuralChanges(url, selector);
          else if (technique === 'screenshot') await captureAndCompare(url);
        } catch (err) {
          // One unreachable site should not abort the remaining checks
          failed++;
          console.error(`Check failed for ${url}:`, err);
        }
      }
      return Response.json({ checked: urls.rows.length, failed });
    }

    Alert template in Slack:

    const block = {
      type: 'section',
      text: { type: 'mrkdwn',
        text: `*Change detected*: ${url}\n` +
              `Type: ${technique}\n` +
              `Previous: ${prevSnippet}\n` +
              `Current: ${currentSnippet}`
      }
    };
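
To actually deliver the template, the block has to be wrapped in a payload and posted to a Slack incoming webhook. The following is a minimal sketch under a few assumptions: the `ChangeAlert` shape, the `MAX_SNIPPET` cap, and the function names are hypothetical, and the webhook URL is passed in by the caller. Snippets are truncated because Slack limits section text to 3,000 characters.

```typescript
interface ChangeAlert {
  url: string;
  technique: string;
  prevSnippet: string;
  currentSnippet: string;
}

const MAX_SNIPPET = 300; // hypothetical cap that keeps alerts readable

function truncate(text: string, max: number): string {
  return text.length > max ? `${text.slice(0, max - 3)}...` : text;
}

function buildSlackPayload(alert: ChangeAlert) {
  const text =
    `*Change detected*: ${alert.url}\n` +
    `Type: ${alert.technique}\n` +
    `Previous: ${truncate(alert.prevSnippet, MAX_SNIPPET)}\n` +
    `Current: ${truncate(alert.currentSnippet, MAX_SNIPPET)}`;
  return {
    text: `Change detected: ${alert.url}`, // plain-text fallback for notifications
    blocks: [{ type: 'section', text: { type: 'mrkdwn', text } }],
  };
}

async function sendSlackAlert(webhookUrl: string, alert: ChangeAlert): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSlackPayload(alert)),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}
```

Keeping the payload builder separate from the network call makes the message format easy to check without sending anything.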
What it costs to run

Run cost for 100 URLs hourly (72k checks/month):

  • Vercel Pro (cron jobs + functions): $20/month
  • Storage (Vercel Postgres or Supabase, screenshot history ~1GB): $0-25/month
  • Proxy (datacenter sufficient for public sites, ~5GB/month): $5-15/month
  • Slack: $0

Total: ~$25-60/month. The ROI is strong: the setup pays for itself if it catches a single competitor price change per year.

Common pitfalls
  • Entire <body> as selector — catches cookie banner, recommendations widget, ad slot rotation. Hash narrowly (main content area).
  • No normalization — whitespace differences, line endings = false positive. Normalize before hashing.
  • Lazy-loaded content — page.goto + immediate read = empty sections. Use waitForLoadState('networkidle') or explicitly wait for elements.
  • Screenshot diff without viewport lock — different viewports = different layouts = constant false positives. Fixed width×height.
  • Alert fatigue — if alerts fire for every dynamic widget, you start ignoring them. Tune thresholds until <1 alert/day per URL.
Build yourself or hire?

Change detection is probably the best ROI in automation — low complexity, low cost, high business value. Standard deployment for 50-100 URLs is a weekend project.

Where it gets hard: very dynamic sites with heavy JS, sites behind auth walls (banking, internal portals), regulatory sites with RPA-style flow. That is where we come in.

Frequently asked questions
How often should I check for changes?
Depends on target volatility: e-commerce prices every 30-60 min, regulatory pages daily, marketing copy weekly. Too often = false positives from dynamic content. Too rarely = miss your window. Standard for "competitor monitoring": hourly.
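
Different intervals per URL can share one hourly (or more frequent) cron job if each run first selects the URLs that are due. A minimal sketch of that selection follows; the `WatchedUrl` fields `intervalMinutes` and `lastCheckedAt` are assumptions that do not appear in the `watched_urls` query shown earlier, and the one-minute tolerance is an arbitrary choice to absorb scheduler jitter.

```typescript
interface WatchedUrl {
  url: string;
  intervalMinutes: number; // e.g. 60 for prices, 1440 for regulatory pages, 10080 for marketing copy
  lastCheckedAt: Date | null; // null means the URL has never been checked
}

// Hypothetical helper: return the URLs whose check interval has elapsed.
function dueUrls(watched: WatchedUrl[], now: Date, toleranceMinutes = 1): WatchedUrl[] {
  return watched.filter(w => {
    if (w.lastCheckedAt === null) return true;
    const elapsedMinutes = (now.getTime() - w.lastCheckedAt.getTime()) / 60000;
    // The tolerance keeps a run that fires a few seconds early from skipping a URL
    return elapsedMinutes + toleranceMinutes >= w.intervalMinutes;
  });
}
```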
Do I need proxies for change detection?
For public sites with ordinary anti-bot protection, not strictly. A datacenter proxy ($5-15/month) is enough to keep the request load off your own IP. For protected sites (Cloudflare, DataDome), a residential proxy is necessary, the same as for price scraping.
How to avoid alert fatigue?
Three tactics: 1) narrow selectors (main content area, not body), 2) thresholds (e.g. screenshot diff > 2% pixels, not 0.1%), 3) grouping (collect N changes over 30 min, send one alert not ten). Aim for <1 alert/day per URL — above that gets ignored.
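
The grouping tactic can be sketched as a small pure function that collects detected changes into one batch per URL per 30-minute window. This is an illustration under stated assumptions: the `ChangeEvent` and `AlertBatch` shapes and the `groupAlerts` name are hypothetical, and the window is measured from the first event in each batch.

```typescript
interface ChangeEvent {
  url: string;
  technique: string;
  detectedAt: Date;
}

interface AlertBatch {
  url: string;
  events: ChangeEvent[];
}

// Hypothetical helper: one batch per URL per window, so ten changes produce one alert.
function groupAlerts(events: ChangeEvent[], windowMs = 30 * 60 * 1000): AlertBatch[] {
  const sorted = [...events].sort((a, b) => a.detectedAt.getTime() - b.detectedAt.getTime());
  const open = new Map<string, AlertBatch>(); // most recent batch per URL
  const batches: AlertBatch[] = [];
  for (const event of sorted) {
    const batch = open.get(event.url);
    const withinWindow =
      batch !== undefined &&
      event.detectedAt.getTime() - batch.events[0].detectedAt.getTime() <= windowMs;
    if (batch !== undefined && withinWindow) {
      batch.events.push(event);
    } else {
      const fresh: AlertBatch = { url: event.url, events: [event] };
      open.set(event.url, fresh);
      batches.push(fresh);
    }
  }
  return batches;
}
```

Each returned batch then becomes a single message that lists its events, instead of one message per event.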
Can I monitor pages behind authentication?
Yes: log in during a setup flow and save storageState for a persistent session. Banking and internal SaaS portals need more care, mainly session expiry handling and MFA flows. We currently do this for 3 clients, each with a custom setup.
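
In Playwright, the session is saved with `context.storageState({ path })` and reused with `browser.newContext({ storageState: path })`. For session expiry handling, one lightweight check is to inspect the saved state before each run and log in again when the session cookie is about to expire. The sketch below assumes the saved JSON has been parsed into an object; `needsLogin`, the cookie name passed to it, and the five-minute margin are hypothetical choices.

```typescript
// Subset of the JSON that Playwright's context.storageState() produces
interface StoredCookie {
  name: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

interface StorageState {
  cookies: StoredCookie[];
}

// Hypothetical helper: decide whether to re-run the login flow before a check.
function needsLogin(
  state: StorageState | null,
  sessionCookieName: string,
  nowSeconds: number,
  marginSeconds = 300,
): boolean {
  if (state === null) return true; // no saved session yet
  const cookie = state.cookies.find(c => c.name === sessionCookieName);
  if (cookie === undefined) return true; // session cookie missing
  if (cookie.expires === -1) return false; // no expiry recorded; the server decides
  // Treat a cookie that expires within the margin as already expired
  return cookie.expires - marginSeconds <= nowSeconds;
}
```

The server can still invalidate a session earlier than the cookie suggests, so a check that lands on the login page should also trigger a fresh login.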