Web scraping | Glossary

Web scraping is the process of automatically retrieving data from websites and structuring it for further use. Two main approaches:

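HTTP scraping: direct requests (curl, Python requests, axios). Fast and cheap, but works only when the data is already in the HTTP response (static HTML or a JSON endpoint), not when it is rendered client-side by JavaScript.
Browser scraping: drives a real browser through automation (Playwright, Puppeteer). Slower and more expensive, but handles JavaScript-rendered pages that plain HTTP requests cannot.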
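
To make the difference concrete, here is a minimal sketch of the HTTP approach using only the Python standard library. The `h2 class="title"` selector, the `example-scraper/0.1` User-Agent, and the sample markup are assumptions made for illustration, not taken from any real site. The same parser is run on a static page and on a JavaScript "shell" page to show why the second case needs a browser.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text of every <h2 class="title"> element (hypothetical selector)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; match the exact class "title".
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def parse_titles(html: str) -> list[str]:
    """Parse step: turn raw HTML into a list of title strings."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles]


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch step: one plain HTTP GET, no JavaScript is executed."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Static page: the data is in the HTML itself.
STATIC_HTML = (
    '<ul><li><h2 class="title">Widget A</h2></li>'
    '<li><h2 class="title">Widget B</h2></li></ul>'
)
# JS-rendered page: the server sends an empty shell plus a script.
JS_SHELL = '<div id="app"></div><script src="/bundle.js"></script>'

print(parse_titles(STATIC_HTML))  # ['Widget A', 'Widget B']
print(parse_titles(JS_SHELL))     # []
```

In real use the input would come from `fetch(url)`. The empty result for the shell page is the signal that the content is rendered in the browser and that a tool such as Playwright or Puppeteer is needed instead.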

Legality: scraping public business data is usually legal (in the US, see the hiQ Labs v. LinkedIn precedent). Scraping personal data or private content, or violating a site's terms of service, falls into grey zones or is clearly illegal. See our GDPR vs scraping guide.

Typical challenges:

Anti-bot detection (Cloudflare, Akamai, DataDome, PerimeterX)
Rate limiting and IP blocking
Selector drift (parser breakage when the site changes)
JavaScript-rendered content
CAPTCHA
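
Several of these challenges look the same from the outside: the scraper simply gets no data. One way to tell them apart is to classify each response before acting on it. The sketch below is a rough heuristic, not a detection method used by any particular tool. The `Outcome` names, the `classify` function, and the "captcha" substring check are assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    RATE_LIMITED = "rate_limited"  # slow down and retry later
    BLOCKED = "blocked"            # likely IP block or anti-bot rule
    CAPTCHA = "captcha"            # challenge page served instead of content
    NO_DATA = "no_data"            # possible selector drift or JS-rendered content


def classify(status: int, body: str, records: list) -> Outcome:
    """Map an HTTP status, response body, and parsed records to an outcome."""
    if status == 429:  # HTTP 429 Too Many Requests
        return Outcome.RATE_LIMITED
    # Heuristic (assumption): a page with no records that mentions a CAPTCHA
    # is treated as a challenge page.
    if not records and "captcha" in body.lower():
        return Outcome.CAPTCHA
    if status in (401, 403):
        return Outcome.BLOCKED
    if not records:
        return Outcome.NO_DATA
    return Outcome.OK


print(classify(429, "", []))                         # Outcome.RATE_LIMITED
print(classify(200, "<div id='app'></div>", []))     # Outcome.NO_DATA
print(classify(200, "<h2>Widget A</h2>", ["Widget A"]))  # Outcome.OK
```

A NO_DATA outcome on a page that used to parse correctly is a typical sign of selector drift and is worth an alert rather than a silent empty result.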

Production-grade scraping requires retry logic, monitoring, proxy rotation, and schema validation — "fetch + parse" is not enough.
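
As an illustration of two of these requirements, the sketch below wraps a fetch function in retry logic with exponential backoff and validates each scraped record against a small schema. It is a minimal example under stated assumptions: `TransientError`, `with_retries`, `Product`, `validate`, and the simulated `flaky_fetch` are hypothetical names invented here, and the delays are arbitrary. Monitoring and proxy rotation are not shown.

```python
import random
import time
from dataclasses import dataclass


class TransientError(Exception):
    """Failure worth retrying, e.g. a timeout, HTTP 429, or HTTP 5xx."""


def with_retries(fetch, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch() up to `attempts` times, backing off between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff (0.5 s, 1 s, 2 s, ...) plus up to 100 ms jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


@dataclass(frozen=True)
class Product:
    name: str
    price: float


def validate(raw: dict) -> Product:
    """Reject records that do not match the expected schema."""
    name = raw.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError(f"missing or empty name: {raw!r}")
    try:
        price = float(raw["price"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"bad price: {raw!r}") from exc
    if price < 0:
        raise ValueError(f"negative price: {raw!r}")
    return Product(name.strip(), price)


# Simulated fetch that fails twice, then returns a raw record.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated timeout")
    return {"name": " Widget ", "price": "9.99"}


delays = []  # record the backoff delays instead of actually sleeping
product = validate(with_retries(flaky_fetch, sleep=delays.append))
print(product)      # Product(name='Widget', price=9.99)
print(len(delays))  # 2
```

Validation failures are deliberately not retried: a record that parses but has an empty name or an unparseable price usually means the page layout changed, and fetching it again will not help.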