Published: May 2, 2026 · 18 min read

How to build a competitor price monitoring bot

Production-grade price scraper with Playwright, proxy rotation, Slack alerts and change history. Step by step, with full code.

Intermediate · Playwright · Node.js / TypeScript · PostgreSQL · Slack webhooks · Residential proxy

Competitor price monitoring is the bread and butter of e-commerce. In this tutorial we build a production-ready bot that fetches prices for 200-500 SKUs from 3-5 competitor stores, detects price changes, saves the history, and sends a Slack alert when a price drops by more than a set threshold.

We say NO to solutions that die after 2 days: a single fetch() loop, one IP, no retry, no history. We say YES: idempotency, proxy rotation, schema validation, dead-letter queue.

What you need

Node.js 20+ or Python 3.11+
Basic JS/TS or Python knowledge
Access to residential proxy (Smartproxy, Bright Data — from $7/GB)
Slack workspace (optional, for alerts)
PostgreSQL or SQLite for price history

Steps

01
Project setup and dependencies
Initialize a TypeScript project with Playwright and a few helpers:
```
mkdir price-monitor && cd price-monitor
npm init -y
npm i playwright pg zod pino
npm i -D typescript @types/node tsx
npx playwright install chromium
```
Why these dependencies:
- playwright — browser automation with auto-wait
- pg — PostgreSQL client (better than SQLite if you plan to scale)
- zod — runtime schema validation (catches parser drift)
- pino — structured logging (JSON-line for observability)

02
Schema definition and target config

Each target (store) has its own structure — we define it declaratively:

```
// config/targets.ts
export const targets = [
  {
    id: 'amazon-de',
    baseUrl: 'https://www.amazon.de/dp/',
    selectors: {
      price: '#corePrice_feature_div .a-offscreen',
      title: '#productTitle',
      availability: '#availability',
    },
    waitFor: '#productTitle',
    proxyTier: 'residential',
  },
  // ...
];
```
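One way to keep a declarative config like this honest is to give it a type and derive product URLs from it. The following is a minimal sketch, assuming the SKU is simply the path segment appended to baseUrl; the TargetConfig interface, the productUrl helper, the 'datacenter' tier value, and the placeholder SKU are hypothetical and not part of the tutorial's code:

```typescript
// Hypothetical type describing one entry of config/targets.ts
interface TargetConfig {
  id: string;
  baseUrl: string;
  selectors: { price: string; title: string; availability: string };
  waitFor: string;
  proxyTier: 'residential' | 'datacenter'; // 'datacenter' is an assumed second tier
}

const amazonDe: TargetConfig = {
  id: 'amazon-de',
  baseUrl: 'https://www.amazon.de/dp/',
  selectors: {
    price: '#corePrice_feature_div .a-offscreen',
    title: '#productTitle',
    availability: '#availability',
  },
  waitFor: '#productTitle',
  proxyTier: 'residential',
};

// Assumption: product URL = baseUrl + SKU. Stores with other URL schemes
// would need their own builder.
function productUrl(target: TargetConfig, sku: string): string {
  return target.baseUrl + encodeURIComponent(sku.trim());
}

console.log(productUrl(amazonDe, 'B000000000')); // placeholder SKU
```

With a type in place, a typo in a selector key or a missing waitFor is caught at compile time rather than during a scrape.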

Schema for the result (Zod):

```
import { z } from 'zod';
export const PriceData = z.object({
  sku: z.string(),
  targetId: z.string(),
  price: z.number().positive(),
  currency: z.enum(['PLN', 'EUR', 'USD']),
  available: z.boolean(),
  scrapedAt: z.date(),
});
```

Schema validation is critical — when the store changes layout and a selector returns garbage, validation fails instead of pushing bad data into the database.
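The schema expects a numeric price and a currency code, while the selectors return display text such as "1.299,00 €". Below is a minimal sketch of a normalizing step that could run before PriceData.parse. The parsePrice helper and its separator heuristic (a trailing group of one or two digits is treated as the decimal part) are assumptions for illustration, not the tutorial's code:

```typescript
type Currency = 'PLN' | 'EUR' | 'USD';

interface ParsedPrice {
  price: number;
  currency: Currency;
}

// Symbols and codes we look for in the raw text (assumed set, matching the schema enum)
const CURRENCY_MARKERS: Array<[RegExp, Currency]> = [
  [/€|EUR/i, 'EUR'],
  [/\$|USD/i, 'USD'],
  [/zł|PLN/i, 'PLN'],
];

function parsePrice(raw: string | null | undefined): ParsedPrice | null {
  const text = raw?.trim();
  if (!text) return null;

  const marker = CURRENCY_MARKERS.find(([pattern]) => pattern.test(text));
  const digits = text.replace(/[^\d.,]/g, ''); // keep digits and separators only
  if (!marker || !/\d/.test(digits)) return null;

  // Heuristic: a final '.' or ',' followed by 1-2 digits is the decimal separator;
  // every other separator is a thousands separator.
  const match = digits.match(/^(.*?)[.,](\d{1,2})$/);
  const whole = (match ? match[1] : digits).replace(/[.,]/g, '');
  const fraction = match ? match[2] : '0';

  const price = Number(`${whole}.${fraction}`);
  return Number.isFinite(price) && price > 0 ? { price, currency: marker[1] } : null;
}

console.log(parsePrice('1.299,00 €'));
console.log(parsePrice('Currently unavailable'));
```

Returning null for unrecognized text means the record fails validation instead of reaching the database with a guessed value.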

03
Browser context and proxy rotation

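Playwright uses browser contexts: separate incognito-like sessions with isolated cookies and storage. For each SKU we launch the browser through a randomly chosen proxy and open a fresh context:

```
// scraper/browser.ts
import { chromium } from 'playwright';

// Playwright's proxy option takes credentials as separate username/password fields
const proxies = [
  { server: 'http://proxy1.smartproxy.com:7000', username: 'user', password: 'pass' },
  { server: 'http://proxy2.smartproxy.com:7000', username: 'user', password: 'pass' },
  // ...
];

// Example pool; extend it with the user agents you want to rotate through
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
  // ...
];
const getRandomUserAgent = () => userAgents[Math.floor(Math.random() * userAgents.length)];

export async function scrapeWithRotation(url: string, selectors: Record<string, string>) {
  const proxy = proxies[Math.floor(Math.random() * proxies.length)];
  const browser = await chromium.launch({ headless: true, proxy });
  try {
    const context = await browser.newContext({
      userAgent: getRandomUserAgent(),
      viewport: { width: 1920, height: 1080 },
      locale: 'en-US',
    });
    // ... scrape logic
  } finally {
    // Close the browser even when the scrape throws, so processes do not leak
    await browser.close();
  }
}
```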

For protected sites (Amazon, eBay), add playwright-extra with puppeteer-extra-plugin-stealth to mask the most common headless-browser fingerprints.
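Random selection can keep picking a proxy that was just blocked. One alternative is a round-robin pool that benches a proxy for a cooldown period after a block. This is a sketch only: the ProxyPool class, its method names, the 10-minute default cooldown, and the example hostnames are all hypothetical.

```typescript
interface ProxyEntry {
  server: string;
  username?: string;
  password?: string;
}

class ProxyPool {
  private cursor = 0;
  private readonly blockedUntil = new Map<string, number>();
  private readonly proxies: ProxyEntry[];
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // `now` is injectable so the cooldown logic can be tested without waiting
  constructor(proxies: ProxyEntry[], cooldownMs = 10 * 60 * 1000, now: () => number = Date.now) {
    if (proxies.length === 0) throw new Error('ProxyPool needs at least one proxy');
    this.proxies = proxies;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Returns the next proxy in round-robin order, skipping those still cooling down
  next(): ProxyEntry {
    for (let i = 0; i < this.proxies.length; i++) {
      const index = (this.cursor + i) % this.proxies.length;
      const candidate = this.proxies[index];
      if ((this.blockedUntil.get(candidate.server) ?? 0) <= this.now()) {
        this.cursor = (index + 1) % this.proxies.length;
        return candidate;
      }
    }
    throw new Error('All proxies are cooling down');
  }

  // Call this when a response looks like a block (captcha page, HTTP 403, ...)
  reportBlocked(proxy: ProxyEntry): void {
    this.blockedUntil.set(proxy.server, this.now() + this.cooldownMs);
  }
}

const pool = new ProxyPool([
  { server: 'http://proxy-a.example:7000' },
  { server: 'http://proxy-b.example:7000' },
]);
console.log(pool.next().server);
```

The object returned by next() has the same shape as Playwright's proxy option, so it can be passed to chromium.launch({ proxy }) directly.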

04
Retry logic and dead-letter queue

Every scrape can fail: timeout, captcha, network error, parser drift. Implement exponential backoff:

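```
// Assumes scrape(), db (a pg Pool) and logger (pino) are defined elsewhere in the project
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeWithRetry(target, sku, maxAttempts = 5) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const data = await scrape(target, sku);
      return PriceData.parse(data); // validate; throws on parser drift
    } catch (err) {
      lastError = err;
      logger.warn({ sku, attempt, err: err.message });
      if (attempt < maxAttempts) {
        // Exponential backoff: 2s, 4s, 8s, 16s (capped at 30s); no sleep after the last attempt
        const delay = Math.min(2 ** attempt * 1000, 30000);
        await sleep(delay);
      }
    }
  }
  // All attempts failed → dead-letter queue
  await db.query(
    'INSERT INTO scrape_dlq (sku, target, error, failed_at) VALUES ($1, $2, $3, NOW())',
    [sku, target.id, lastError.message]
  );
  throw lastError;
}
```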

Pair the DLQ with a Slack alert that fires when the queue grows past a threshold, so you learn that something broke before the client asks "where are my reports".
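When many SKUs fail at the same moment, a fixed backoff schedule makes them all retry at the same moment too. A common remedy is to add jitter to the delay. The sketch below uses the same base and cap as the retry loop (1 s base, 30 s cap) and a "full jitter" variant; backoffCeiling and jitteredDelay are hypothetical helper names.

```typescript
// Upper bound of the delay before the retry that follows attempt `attempt` (1-based)
function backoffCeiling(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(2 ** attempt * baseMs, maxMs);
}

// "Full jitter": pick a uniformly random delay between 0 and the ceiling.
// `random` is injectable so the function can be tested deterministically.
function jitteredDelay(attempt: number, random: () => number = Math.random): number {
  return Math.floor(random() * backoffCeiling(attempt));
}

// Ceilings for attempts 1..5 with the defaults: 2000, 4000, 8000, 16000, 30000 ms
for (let attempt = 1; attempt <= 5; attempt++) {
  console.log(attempt, backoffCeiling(attempt), jitteredDelay(attempt));
}
```

Swapping the fixed delay in the retry loop for jitteredDelay(attempt) spreads retries out at the cost of a less predictable total wait.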

05
Storage and price change detection

Each scrape is a new row in the history table. Price change detection = comparison vs previous:

```
CREATE TABLE price_history (
  id BIGSERIAL PRIMARY KEY,
  sku TEXT NOT NULL,
  target_id TEXT NOT NULL,
  price NUMERIC(10,2) NOT NULL,
  currency TEXT NOT NULL,
  available BOOLEAN NOT NULL,
  scraped_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_price_lookup ON price_history (sku, target_id, scraped_at DESC);
```

Query for drop detection:

```
-- target_id is selected as well because the Slack alert prints it
SELECT cur.sku, cur.target_id, cur.price, prev.price AS prev_price,
       ((cur.price - prev.price) / prev.price * 100) AS pct_change
FROM price_history cur
JOIN LATERAL (
  -- most recent earlier observation of the same SKU at the same store
  SELECT price FROM price_history
  WHERE sku = cur.sku AND target_id = cur.target_id
    AND scraped_at < cur.scraped_at
  ORDER BY scraped_at DESC LIMIT 1
) prev ON true
WHERE cur.scraped_at > NOW() - INTERVAL '1 hour'
  AND ((cur.price - prev.price) / prev.price) < -0.05;
```

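Each returned row is a price drop of more than 5% and triggers a Slack alert. Note that a 1-hour window combined with a 30-minute schedule returns the same drop in two consecutive runs, so either narrow the window or record which rows have already been alerted.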
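The same comparison can be expressed in application code, which is handy for unit-testing the threshold. The sketch below is simplified compared with the SQL: it only compares the two most recent observations per SKU and store. The PricePoint and PriceDrop types, the detectDrops function, and the sample SKUs are hypothetical.

```typescript
interface PricePoint {
  sku: string;
  targetId: string;
  price: number;
  scrapedAt: Date;
}

interface PriceDrop {
  sku: string;
  targetId: string;
  prevPrice: number;
  price: number;
  pctChange: number;
}

// Flags a drop when (latest - previous) / previous is below -threshold
function detectDrops(history: PricePoint[], threshold = 0.05): PriceDrop[] {
  // Group observations per (store, SKU)
  const groups = new Map<string, PricePoint[]>();
  for (const point of history) {
    const key = `${point.targetId}:${point.sku}`;
    const list = groups.get(key) ?? [];
    list.push(point);
    groups.set(key, list);
  }

  const drops: PriceDrop[] = [];
  groups.forEach((points) => {
    if (points.length < 2) return; // nothing to compare against
    const sorted = [...points].sort((a, b) => b.scrapedAt.getTime() - a.scrapedAt.getTime());
    const [latest, prev] = sorted;
    const change = (latest.price - prev.price) / prev.price;
    if (change < -threshold) {
      drops.push({
        sku: latest.sku,
        targetId: latest.targetId,
        prevPrice: prev.price,
        price: latest.price,
        pctChange: change * 100,
      });
    }
  });
  return drops;
}

const sample: PricePoint[] = [
  { sku: 'SKU-1', targetId: 'amazon-de', price: 100, scrapedAt: new Date('2026-05-02T10:00:00Z') },
  { sku: 'SKU-1', targetId: 'amazon-de', price: 94, scrapedAt: new Date('2026-05-02T10:30:00Z') },
  { sku: 'SKU-2', targetId: 'amazon-de', price: 100, scrapedAt: new Date('2026-05-02T10:00:00Z') },
  { sku: 'SKU-2', targetId: 'amazon-de', price: 96, scrapedAt: new Date('2026-05-02T10:30:00Z') },
];
console.log(detectDrops(sample)); // only SKU-1 (100 → 94, about -6%) is reported
```

A drop of exactly 5% is not reported, because the comparison is strict, as in the SQL.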

06
Slack alerts and scheduling

Slack webhook (incoming webhook URL from Slack App):

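```
// alerts.ts
export async function sendSlackAlert(changes) {
  const res = await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `Price drops detected: ${changes.length} SKUs`,
      // Slack allows at most 50 blocks per message
      blocks: changes.slice(0, 50).map(c => ({
        type: 'section',
        text: { type: 'mrkdwn',
          // pg returns NUMERIC columns as strings, so convert before calling toFixed()
          text: `• ${c.sku} (${c.target_id}): ${c.prev_price} → ${c.price} (${Number(c.pct_change).toFixed(1)}%)`
        }
      }))
    })
  });
  // A failed webhook call should not pass silently
  if (!res.ok) throw new Error(`Slack webhook failed: HTTP ${res.status}`);
}
```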
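Slack caps a message at 50 blocks, so a run that detects more changes than that needs several messages. The following sketch splits the changes into message payloads. The buildSlackPayloads function and PriceChange type are hypothetical; the field names follow the columns of the detection query, with target_id assumed to be selected.

```typescript
interface PriceChange {
  sku: string;
  target_id: string;
  prev_price: string | number; // pg returns NUMERIC as string
  price: string | number;
  pct_change: string | number;
}

interface SlackPayload {
  text: string;
  blocks: Array<{ type: 'section'; text: { type: 'mrkdwn'; text: string } }>;
}

const MAX_BLOCKS_PER_MESSAGE = 50;

function buildSlackPayloads(changes: PriceChange[]): SlackPayload[] {
  const payloads: SlackPayload[] = [];
  for (let start = 0; start < changes.length; start += MAX_BLOCKS_PER_MESSAGE) {
    const chunk = changes.slice(start, start + MAX_BLOCKS_PER_MESSAGE);
    const end = start + chunk.length;
    payloads.push({
      // Fallback text shown in notifications
      text: `Price drops detected: ${start + 1}-${end} of ${changes.length} SKUs`,
      blocks: chunk.map((c) => ({
        type: 'section' as const,
        text: {
          type: 'mrkdwn' as const,
          text: `• ${c.sku} (${c.target_id}): ${c.prev_price} → ${c.price} (${Number(c.pct_change).toFixed(1)}%)`,
        },
      })),
    });
  }
  return payloads;
}

const demoChanges: PriceChange[] = Array.from({ length: 120 }, (_, i) => ({
  sku: `SKU-${i + 1}`,
  target_id: 'amazon-de',
  prev_price: '100.00',
  price: '94.00',
  pct_change: '-6.00',
}));
console.log(buildSlackPayloads(demoChanges).map((p) => p.blocks.length)); // [ 50, 50, 20 ]
```

Each payload can then be POSTed to the webhook in turn, ideally with a short pause between requests.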

Scheduling: cron on a VPS or BullMQ + Redis for distributed queue. Every 30 min:

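```
// cron: */30 * * * *
import { targets } from './config/targets';
import { scrapeAll } from './scraper';
import { detectChanges } from './detect';
import { sendSlackAlert } from './alerts';
import { getSkuList } from './skus'; // assumed module that returns the SKUs to monitor

(async () => {
  await scrapeAll(targets, getSkuList());
  const changes = await detectChanges();
  if (changes.length > 0) await sendSlackAlert(changes);
})().catch((err) => {
  // Exit non-zero so cron and monitoring notice the failed run
  console.error(err);
  process.exit(1);
});
```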
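The tutorial does not show scrapeAll. One possible shape for it is a bounded worker pool that records the outcome per SKU, so a single failing page cannot abort the run and the number of parallel browsers stays under control. This is a sketch under that assumption; mapWithConcurrency is a hypothetical helper, and the worker in the usage example stands in for a real scrape call.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Never rejects: each item's outcome is reported like Promise.allSettled does.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let nextIndex = 0;

  // Each runner pulls the next unprocessed index until the list is exhausted
  async function runner(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      try {
        results[index] = { status: 'fulfilled', value: await worker(items[index]) };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const runnerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage: the worker here is a stand-in for scrapeWithRetry(target, sku)
mapWithConcurrency(['SKU-1', 'SKU-2', 'SKU-3'], 2, async (sku) => {
  if (sku === 'SKU-2') throw new Error('simulated timeout');
  return `${sku}: ok`;
}).then((outcomes) => console.log(outcomes.map((o) => o.status)));
```

The rejected entries are the natural input for success-rate logging per target.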

What it costs to run

Run cost for 500 SKUs, scrape every 30 min (24/7):

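Proxy (residential): 500 SKUs × 48 runs/day × 30 days = 720,000 page loads/month. A 22 GB/month budget works out to about 30 KB of proxied traffic per load, which is far below a full page load; at ~3 MB per page the same schedule would transfer about 2,160 GB/month. Measure your real per-page traffic before budgeting. At 22 GB: ~$150-180/month at Smartproxy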
VPS (Hetzner CX22, 2vCPU/4GB): €5/month
Postgres (Supabase Free tier suffices up to 500MB): $0
Slack: $0 (free tier)

Total: ~$160-185/month operating cost for 500 SKUs. Scales linearly with proxy bandwidth.
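A bandwidth budget like this is easy to sanity-check by computing it from the number of page loads and the transfer size per load. The sketch below uses hypothetical names (estimateProxyCost and its parameters) and assumes 1 GB = 1,000,000 KB and a price of $7/GB:

```typescript
interface ProxyCostInput {
  skus: number;
  runsPerDay: number;
  days: number;
  kbPerPage: number; // proxied traffic per page load, in KB
  usdPerGb: number;
}

interface ProxyCostEstimate {
  pageLoads: number;
  gb: number;
  usd: number;
}

function estimateProxyCost(input: ProxyCostInput): ProxyCostEstimate {
  const pageLoads = input.skus * input.runsPerDay * input.days;
  const gb = (pageLoads * input.kbPerPage) / 1_000_000; // assumption: 1 GB = 1,000,000 KB
  return { pageLoads, gb, usd: gb * input.usdPerGb };
}

const schedule = { skus: 500, runsPerDay: 48, days: 30, usdPerGb: 7 };

// About 30 KB per load: 720,000 loads, 21.6 GB
console.log(estimateProxyCost({ ...schedule, kbPerPage: 30 }));
// About 3 MB per load: 720,000 loads, 2,160 GB
console.log(estimateProxyCost({ ...schedule, kbPerPage: 3000 }));
```

For 500 SKUs scraped every 30 minutes this gives 720,000 page loads a month; a budget of roughly 22 GB corresponds to about 30 KB per load, while 3 MB per load would be about 2,160 GB.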

Common pitfalls

No schema validation: when the store changes its layout, the scraper starts writing null as the price and your analytics break silently.
Single IP: Cloudflare/Akamai will block you after 50-200 requests. Proxy rotation is required, not optional.
No retry logic: a transient error (timeout, network) blows up the whole batch instead of a single SKU.
Hard-coded selectors in code: every selector change means a redeploy. Keep them in a config file or the database.
No success rate monitoring: you do not know that 30% of scrapes return garbage. Log success/failure per target.
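The last pitfall is cheap to avoid. Below is a minimal sketch of per-target success-rate tracking, assuming each scrape attempt is recorded as a target id plus a success flag. The ScrapeOutcome type, the successRates and degradedTargets functions, the 90% default threshold, and the 'ebay-de' id are hypothetical.

```typescript
interface ScrapeOutcome {
  targetId: string;
  ok: boolean;
}

// Success rate (0..1) per target
function successRates(outcomes: ScrapeOutcome[]): Map<string, number> {
  const counts = new Map<string, { ok: number; total: number }>();
  for (const outcome of outcomes) {
    const entry = counts.get(outcome.targetId) ?? { ok: 0, total: 0 };
    entry.total += 1;
    if (outcome.ok) entry.ok += 1;
    counts.set(outcome.targetId, entry);
  }
  const rates = new Map<string, number>();
  counts.forEach((entry, targetId) => rates.set(targetId, entry.ok / entry.total));
  return rates;
}

// Targets whose success rate fell below `minRate`; candidates for an alert
function degradedTargets(outcomes: ScrapeOutcome[], minRate = 0.9): string[] {
  const degraded: string[] = [];
  successRates(outcomes).forEach((rate, targetId) => {
    if (rate < minRate) degraded.push(targetId);
  });
  return degraded;
}

const outcomes: ScrapeOutcome[] = [
  ...Array.from({ length: 7 }, () => ({ targetId: 'amazon-de', ok: true })),
  ...Array.from({ length: 3 }, () => ({ targetId: 'amazon-de', ok: false })),
  ...Array.from({ length: 10 }, () => ({ targetId: 'ebay-de', ok: true })),
];
console.log(degradedTargets(outcomes)); // [ 'amazon-de' ]
```

A sudden drop for a single target usually points to a layout change or a block, while a drop across all targets points to the proxy provider or the scraper itself.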

Build yourself or hire?

The above stack handles 500-2000 SKUs from 3-5 stores without issues. For 10k+ SKUs you add: queue system (BullMQ), distributed workers, observability (Grafana + Prometheus), more advanced anti-bot (mobile proxy for the hardest targets).

If your use case needs scale, compliance (GDPR), a 99.9% uptime SLA, or you simply do not have a developer who will own this, talk to us. We run this for 14+ clients in production.

Frequently asked questions

Is competitor price monitoring legal?
Generally yes for publicly available price data, but not without limits. In hiQ v. LinkedIn (Ninth Circuit, 2022) the court held that scraping publicly accessible data likely does not violate the US Computer Fraud and Abuse Act. Scraping can still breach a store's ToS (Amazon, eBay, etc.); in practice the most common consequence is IP or account blocking, though contract claims remain possible. Compliance-wise: do not scrape behind login walls, do not overload servers, and respect robots.txt.

How many SKUs can I monitor with one bot?
500-2000 SKUs from 3-5 stores run on one VPS (Hetzner CX22, 4 GB RAM) without issues. Above 5000 SKUs, add a distributed queue (BullMQ + Redis), workers across multiple machines, and a dedicated proxy per worker. The limiting factor is proxy bandwidth, not CPU or RAM.

What happens when Amazon or eBay changes their page layout?
Your selectors break and the scrape returns null/undefined. That is why schema validation with Zod (step 2) is critical: the scrape fails fast instead of pushing garbage into the database. Under retainer we fix it within 24-48h. Without a retainer, you must monitor the success rate and fix the selectors yourself.

Are residential proxies really necessary?
For Amazon, eBay, and other major e-commerce sites, yes. Datacenter proxies get blocked by Cloudflare/Akamai/DataDome within 50-200 requests. Residential proxies from Smartproxy or Bright Data cost ~$7-12/GB, or $50-200 per month for a typical setup. For smaller stores without aggressive anti-bot protection, datacenter proxies suffice.
