Continuous lead intelligence system for a B2B SaaS sales team
Daily multi-source enrichment of 40k+ accounts, intent signals, decision-maker mapping. Hands-off since Q2 2024.
Resilient scraping with anti-bot routing, SKU normalization and 5-minute price-change webhooks into the client's repricing engine.
The client, a large e-commerce company with a private portfolio of 50 own brands, needed a view of its own and its direct competitors' prices across 1,200+ marketplaces in the EU and US. It previously bought data from two providers. Problems: a 24h delay (the client's pricing cycle runs in hours), low SKU coverage (~65%), no visibility into data quality, and a cost of €38k/mo.
Goal: monitor 2M+ products per day, with under 5 minutes from a price change at the source to a webhook in the repricing engine, SKU coverage above 95%, and operating cost under €15k/mo.
Three-layer architecture. Scraping layer: 480 Playwright workers in Kubernetes across three regions (EU-West, EU-Central, US-East), each with an isolated fingerprint and its own residential proxy pool. Worker distribution per marketplace is tuned to that marketplace's rate limits and detection patterns.
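One common way to respect per-marketplace rate limits is a token bucket per target. The sketch below is illustrative only: the marketplace names, the limits in `LIMITS`, and the `TokenBucket` class are assumptions, not the production scheduler described above. The clock is passed in explicitly so the behaviour is easy to reason about.

```python
import time

# Hypothetical limits: marketplace -> (requests per second, burst size).
LIMITS = {
    "marketplace-a": (2.0, 5),
    "marketplace-b": (0.5, 1),
}


class TokenBucket:
    """Refills `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = now

    def try_acquire(self, now):
        """Take one token if available; return False when the caller must wait."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def make_buckets(limits, now):
    """Build one independent bucket per marketplace."""
    return {name: TokenBucket(rate, cap, now) for name, (rate, cap) in limits.items()}


if __name__ == "__main__":
    buckets = make_buckets(LIMITS, time.monotonic())
    allowed = buckets["marketplace-a"].try_acquire(time.monotonic())
    print("request allowed:", allowed)
```

A worker would call `try_acquire` before each page load and back off when it returns `False`. Keeping one bucket per marketplace means a strict target does not throttle traffic to a lenient one.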
Normalization layer: SKU matching via a combination of EAN/UPC/MPN identifiers and fuzzy name matching (Levenshtein plus embedding similarity for atypical cases), backed by a canonical product graph in Postgres with 18M nodes. Every new record is resolved against this graph, either as a match to an existing SKU or as a new node.
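The matching order described above (exact identifiers first, fuzzy names second) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Product` and `Listing` types, the `0.85` threshold, and the in-memory linear scan are hypothetical, and the embedding-similarity fallback is left out. A graph of 18M nodes would need indexed lookups rather than a scan.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Product:
    """A canonical node (hypothetical shape)."""
    sku: str
    name: str
    ean: Optional[str] = None
    mpn: Optional[str] = None


@dataclass(frozen=True)
class Listing:
    """A scraped record that has not been matched yet (hypothetical shape)."""
    name: str
    ean: Optional[str] = None
    mpn: Optional[str] = None


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """1.0 for identical names, approaching 0.0 as they diverge."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_sku(listing: Listing, catalog: List[Product],
              threshold: float = 0.85) -> Optional[str]:
    """Return the matched SKU, or None when the caller should create a new node."""
    # Step 1: exact identifier match (EAN, then MPN).
    for product in catalog:
        if listing.ean and listing.ean == product.ean:
            return product.sku
        if listing.mpn and listing.mpn == product.mpn:
            return product.sku
    # Step 2: fuzzy name match against the closest catalog entry.
    best = max(catalog, key=lambda p: name_similarity(listing.name, p.name),
               default=None)
    if best is not None and name_similarity(listing.name, best.name) >= threshold:
        return best.sku
    return None
```

Trying identifiers before names keeps the cheap, unambiguous path first; the fuzzy step only runs for listings that lack a usable EAN or MPN.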
Delivery layer: change detection on every record, webhooks delivered within 5 minutes of a change (most in under 90 seconds), hourly batch export to S3 in Parquet, and a Next.js dashboard for client analysts.
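Per-record change detection can be as simple as comparing each observed price with the last one stored for that SKU. The sketch below assumes an in-memory `last_seen` map and a hypothetical webhook payload shape (`event`, `sku`, `old_price`, `new_price`, `observed_at`); the real payload contract with the repricing engine is not given in this case study. It also assumes that the first sighting of a SKU is not reported as a change.

```python
import json
from decimal import Decimal
from typing import Dict, Optional


def detect_change(last_seen: Dict[str, Decimal], sku: str, price: Decimal,
                  observed_at: str) -> Optional[dict]:
    """Record the price and return an event dict only when it actually changed."""
    previous = last_seen.get(sku)
    last_seen[sku] = price
    if previous is None or previous == price:
        return None  # first sighting or unchanged price: nothing to send
    return {
        "event": "price_changed",
        "sku": sku,
        "old_price": str(previous),
        "new_price": str(price),
        "observed_at": observed_at,
    }


def to_webhook_body(event: dict) -> str:
    """Serialize the event as the JSON body of a webhook POST."""
    return json.dumps(event, sort_keys=True)
```

Prices are held as `Decimal` rather than `float`, so `19.90` and `19.9` compare equal and rounding noise does not trigger spurious webhooks.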
SKU coverage: 97.2%. Median price-change detection time: 84 seconds. Operating cost: €11,800/mo (proxies, infra, LLM for edge cases).
The client saved about €26k/mo versus the prior providers, cut data latency from 24 hours to a median of 84 seconds (roughly 1,000× shorter), and gained over 30 points of SKU coverage (from ~65% to 97.2%).
The system survived two Cloudflare anti-bot engine updates in 2024 (each patched within 6h of detection), and a full Amazon EU UI migration in August 2025 (patched within 18h via an API backup path).
Goal-driven agent crawling filings, press, social and internal sources — producing structured analyst briefings every morning before 7 AM ET.
47 brand accounts, four platforms, one operator console. Scheduling, engagement, analytics and human-in-the-loop review built end-to-end.
If you recognise pieces of this case study in your own situation, get in touch. We can usually tell in the first call whether the job is a few hours per week or months of infrastructure work.