Continuous lead intelligence system for a B2B SaaS sales team
Daily multi-source enrichment of 40k+ accounts, intent signals, decision-maker mapping. Hands-off since Q2 2024.
11 real estate portals across 6 EU countries, 800k active listings monitored daily, cross-portal dedup, AVM. SaaS platform.
A proptech SaaS client (founded 2022, $5M ARR) served real estate investment funds across 6 European countries (PL, DE, AT, ES, IT, PT). Each fund client paid them $2–15k/month for "market intelligence" — daily listings, comparable sales analytics, deal alerts, AVM for underwriting.
State in 2024: the scraping platform, built by an earlier team on top of Apify, cost $25k/month to operate and covered only 3 countries (PL, DE, AT). Three attempts to expand to southern Europe (the Idealista portal) had failed: anti-bot systems blocked the configuration after 2–3 weeks every time.
Clients signalled growing dissatisfaction: data lag was 24–48h (vs 2–4h for competitors), deduplication accuracy was low (clients reported duplicates in their dashboards), and the AVM covered only 60% of potential addresses.
We took over the existing infrastructure with a mandate: expand to 6 countries, reduce data lag to <2h, achieve 95%+ dedup accuracy, push AVM coverage to 95%.
Critical decisions: rebuild the scrape layer from scratch on Playwright (Apify dominates the Polish market but proved weak against anti-bot defences on harder EU targets), a per-portal parser with parser_version tagging, a canonical schema layer separating raw extraction from normalisation, and a dedicated deduplication service based on photo hashes and geocoded addresses.
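To make the split between raw extraction and normalisation concrete, here is a minimal sketch in Python. The class names, field names and the number format it handles (spaces as thousands separators, comma as decimal mark) are illustrative assumptions, not the production schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RawListing:
    """Output of a per-portal parser: untouched strings plus provenance."""
    portal: str
    parser_version: str  # lets bad records be traced to the parser release that made them
    fields: dict[str, Any]


@dataclass(frozen=True)
class CanonicalListing:
    """Portal-independent record (hypothetical field set)."""
    portal: str
    parser_version: str
    price: float
    currency: str
    area_m2: float
    city: str


def parse_number(text: str) -> float:
    """Parse strings such as '450 000 zł' or '85,5 m²'.

    Assumes spaces as thousands separators and a comma as decimal mark.
    """
    cleaned = "".join(ch for ch in text if ch in "0123456789,.")
    return float(cleaned.replace(",", "."))


def normalise(raw: RawListing) -> CanonicalListing:
    """Map one raw record onto the canonical schema, keeping provenance tags."""
    f = raw.fields
    return CanonicalListing(
        portal=raw.portal,
        parser_version=raw.parser_version,
        price=parse_number(f["price"]),
        currency=f["currency"],
        area_m2=parse_number(f["area"]),
        city=f["city"].strip().title(),
    )
```

Because every canonical record carries its parser_version, a faulty parser release can be isolated and its records re-normalised without re-scraping.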
Architecture: Temporal as orchestrator (11 portals on different schedules: Otodom every 30 min, Idealista every 2h due to anti-bot tolerance), a Playwright pool in Kubernetes (on Hetzner for cost efficiency), PostgreSQL partitioned per portal per month, ClickHouse for time-series price tracking, and an AVM model in Python (hedonic regression with geo features).
AVM expansion approach: training data from scraped historical listings (3+ years of history) plus public land registry data where available (KRN for Poland, Grundbuch for DE), enrichment with geocoded amenities (POI density, transport access, schools from OpenStreetMap), and a quarterly model retrain per market.
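The per-portal schedules and the per-portal, per-month partitioning can be sketched in a few lines of plain Python. This deliberately avoids the Temporal SDK; the two intervals come from the text above, while the function names and the partition naming pattern are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Intervals quoted in the case study; other portals would be added the same way.
CRAWL_INTERVALS = {
    "otodom": timedelta(minutes=30),
    "idealista": timedelta(hours=2),
}


def is_due(portal: str, last_run: datetime, now: datetime) -> bool:
    """True when the portal's crawl interval has elapsed since its last run."""
    return now - last_run >= CRAWL_INTERVALS[portal]


def partition_name(portal: str, scraped_at: datetime) -> str:
    """Name of the per-portal, per-month partition (hypothetical naming pattern)."""
    return f"listings_{portal}_{scraped_at:%Y_%m}"
```

In a real deployment the orchestrator owns the timing; the sketch only shows the decision each schedule encodes.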
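As an illustration of the hedonic approach, the sketch below fits log(price) as a linear function of property features by ordinary least squares, using only the standard library. The feature choice (floor area, POI density) and all function names are assumptions; the production model and its feature set are not described in enough detail to reproduce here.

```python
import math


def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_hedonic(features, prices):
    """OLS fit of log(price) on [1, *features] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    y = [math.log(p) for p in prices]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, feature_row):
    """Point estimate of the price for one property."""
    return math.exp(coefs[0] + sum(c * x for c, x in zip(coefs[1:], feature_row)))
```

Each coefficient reads as the approximate relative price change per unit of the feature, which is what makes hedonic models easy to explain in an underwriting context.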
Coverage expanded to 11 portals across 6 countries: Otodom + Domiporta (PL), ImmoScout24 + Immowelt (DE), willhaben + Immobilienscout24.at (AT), Idealista + Fotocasa (ES), Immobiliare.it + Casa.it (IT), Idealista.pt (PT). Each with dedicated parser plus drift detection.
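One simple way to detect parser drift is to compare how often each field is filled in the latest batch against a known-good baseline. The sketch below assumes that approach; the function names and the 20-point drop threshold are illustrative, not the thresholds used in production.

```python
def fill_rates(records, fields):
    """Share of records in which each field is present and non-empty."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / total
        for f in fields
    }


def detect_drift(baseline, current, max_drop=0.2):
    """Fields whose fill rate fell by more than max_drop versus the baseline."""
    return sorted(
        f for f, base in baseline.items()
        if base - current.get(f, 0.0) > max_drop
    )
```

A sudden drop in one field on one portal usually points to a changed page layout rather than a real market change, so it is a cheap first alarm.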
Data lag: average 47 minutes (vs the 24–48h baseline), P95 92 minutes. This let the client position the product as "real-time market intelligence", a new pricing tier for premium clients.
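For reference, figures like the average and P95 lag can be computed from pairs of timestamps as sketched below. It assumes lag is measured from the moment a listing appears on the portal to the moment it is ingested, and uses the nearest-rank percentile definition; both are assumptions, as is the function naming.

```python
import math
from datetime import datetime, timedelta
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def lag_report(pairs):
    """Mean and P95 lag in minutes from (published_at, ingested_at) pairs."""
    lags = [(ingested - published).total_seconds() / 60 for published, ingested in pairs]
    return {"mean_min": mean(lags), "p95_min": percentile(lags, 95)}
```

Reporting P95 next to the mean matters here because a few slow portals can hide behind a good average.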
Deduplication accuracy: 96.4% measured via manual sampling of 1,000 records monthly. False positive rate (different properties linked as same) <1%. Confidence scoring enables downstream applications to make trust-aware decisions.
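A confidence score for a candidate duplicate pair could combine photo-hash similarity with geographic proximity roughly as follows. This is a sketch under stated assumptions: 64-bit perceptual hashes stored as integers, and invented thresholds and weights (10 bits, 50 metres, 0.6/0.4) that are not taken from the production service.

```python
import math


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two perceptual hashes stored as integers."""
    return bin(a ^ b).count("1")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two geocoded points."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def match_confidence(a, b, max_bits=10, max_metres=50.0):
    """Score in [0, 1] that two listings describe the same property.

    Weights and thresholds are illustrative assumptions.
    """
    photo = max(0.0, 1 - hamming(a["phash"], b["phash"]) / max_bits)
    dist = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    geo = max(0.0, 1 - dist / max_metres)
    return 0.6 * photo + 0.4 * geo
```

Exposing the score rather than a yes/no flag is what lets downstream applications pick their own cut-off, for example a strict one for valuations and a looser one for alerts.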
AVM coverage: 94.2% of addresses with confidence interval <15%. Top 3 lending clients started using AVM in their underwriting decision flow.
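A coverage figure of this kind can be computed as the share of addresses whose valuation interval is tight enough. The sketch below assumes "confidence interval <15%" means the interval width relative to the point estimate; that reading, the record layout and the function name are assumptions.

```python
def avm_coverage(valuations, total_addresses, max_rel_width=0.15):
    """Share of all addresses with a valuation whose interval is narrow enough.

    Each valuation is a dict with 'estimate', 'low' and 'high' (hypothetical layout).
    Addresses without any valuation count against coverage via total_addresses.
    """
    ok = sum(
        1 for v in valuations
        if (v["high"] - v["low"]) / v["estimate"] < max_rel_width
    )
    return ok / total_addresses
```

Dividing by all potential addresses, not only the valued ones, keeps the metric honest when the model silently skips hard cases.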
Cost overall: $18k/month operational (down from the $25k baseline), a 4-person team taken over (technical lead + 3 engineers), and a maintenance retainer for further expansion. Client revenue grew 2.4× in the 11 months after project completion, driven by the faster deal sourcing that the expanded platform enabled.
Resilient scraping with anti-bot routing, SKU normalization and 5-minute price-change webhooks into the client's repricing engine.
Goal-driven agent crawling filings, press, social and internal sources — producing structured analyst briefings every morning before 7 AM ET.
If you recognise pieces of this case study in your own situation, write to us. We can usually tell on the first call whether it is a matter of hours per week or months of infrastructure work.