Rate limiting caps the number of requests a client can make in a given time period. It is a fundamental technique for protecting APIs from abuse, overload, and unfair usage, and most public APIs enforce it.
Main algorithms:
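- Token bucket: a bucket holds up to N tokens and refills at R tokens/sec; each request costs 1 token and is rejected when the bucket is empty. Allows bursts of up to N requests while holding the long-term rate to R.
- Leaky bucket: requests queue in the bucket and "leak" out at a constant rate; requests that arrive when the bucket is full are dropped. Produces a smooth output rate.
- Fixed window: one counter per minute/hour, reset at the start of each window. Simple, but has the "burst at boundary" problem: a client can use a full quota at the end of one window and another at the start of the next, so up to twice the limit gets through in a short span.
- Sliding window: the count covers a rolling window that ends at the current request. Most accurate, but costs more memory and computation because request timestamps have to be tracked (or approximated).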
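To make the list above concrete, here is a minimal single-process sketch of two of these algorithms in Python. The class names, the `allow()` method, and the injectable `clock` parameter are illustrative choices, not part of any particular library, and the sliding window shown is the "log" variant that stores one timestamp per accepted request.

```python
import time
from collections import deque


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens and return True if available, else return False."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindowLog:
    """Allows at most `limit` requests in any rolling `window_seconds` interval."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = float(window_seconds)
        self.clock = clock
        self.timestamps = deque()  # one entry per accepted request, oldest first

    def allow(self):
        now = self.clock()
        # Forget requests that have slid out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

With `TokenBucket(3, 1)`, three back-to-back calls to `allow()` succeed and the fourth fails until about a second has passed. The sliding-window log keeps one timestamp per accepted request, which is the memory cost the list mentions. Both classes are unsynchronized and in-memory; a shared limiter across threads or servers would need locking or an external store.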
Typical limits (2026):
- Stripe API: 100 req/sec (live mode)
- OpenAI API: 500-10,000 req/min depending on tier
- Slack API: roughly 1 req/min per method (Tier 1), up to 100+ req/min (Tier 4)
- LinkedIn: 500 actions/day (very restrictive)
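Headers to recognize: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After. Retry-After is a standard HTTP header whose value is either a number of seconds or an HTTP date. The X-RateLimit-* names are a widespread convention rather than a standard, so their exact meaning varies by provider (for example, whether Reset is a timestamp or a number of seconds). Production integrations should respect these headers, not rely on hardcoded limits.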
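As an illustration of reading these headers, the sketch below turns a response's headers into a suggested wait time. The function name `seconds_until_retry` is hypothetical, and the code assumes X-RateLimit-Reset holds a Unix timestamp in seconds. Some providers send a seconds-until-reset value instead, so check the provider's documentation before relying on that branch.

```python
import time
from email.utils import parsedate_to_datetime


def seconds_until_retry(headers, now=None):
    """Return the wait in seconds suggested by the headers, or None if they give no hint.

    `headers` is any mapping of header names to string values.
    """
    now = time.time() if now is None else now
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    retry_after = h.get("retry-after")
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # delay-seconds form
        try:
            # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            return max(0.0, parsedate_to_datetime(value).timestamp() - now)
        except (TypeError, ValueError):
            pass  # unparseable value: fall through to the X-RateLimit-* headers

    remaining = h.get("x-ratelimit-remaining")
    reset = h.get("x-ratelimit-reset")
    if remaining is not None and reset is not None:
        try:
            if int(remaining) > 0:
                return 0.0  # quota left, no need to wait
            # ASSUMPTION: reset is a Unix timestamp in seconds.
            return max(0.0, float(reset) - now)
        except ValueError:
            return None
    return None
```

Retry-After is checked first because it is the server's explicit instruction for this response; the X-RateLimit-* pair is only used as a fallback.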
When you hit a limit (typically signalled by an HTTP 429 response), retry with exponential backoff. Do NOT retry immediately: that escalates the problem. For bulk operations, spread requests evenly, plan around limits, and use batch endpoints if available.
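A rough sketch of both ideas follows. `call_with_backoff` and `pace` are hypothetical helper names, and `send` stands in for whatever function performs the request; it is assumed to return a `(status, headers, body)` tuple. The random jitter on the delay is an addition not discussed above: it is a common refinement that keeps many clients from retrying in lockstep.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)  # the server's hint takes priority
        else:
            # Exponential backoff: the cap doubles each attempt (1s, 2s, 4s, ...).
            cap = min(max_delay, base_delay * 2 ** attempt)
            delay = cap * rng()  # "full jitter": random delay in [0, cap)
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


def pace(items, per_second, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than `per_second`, spacing them evenly."""
    interval = 1.0 / per_second
    next_slot = clock()
    for item in items:
        wait = next_slot - clock()
        if wait > 0:
            sleep(wait)
        # Schedule the next slot one interval after this one (or after now, if late).
        next_slot = max(next_slot, clock()) + interval
        yield item
```

For a bulk job, something like `for record in pace(records, per_second=5): call_with_backoff(lambda: upload(record))` would keep steady-state traffic under the limit and fall back to backoff only when a 429 still occurs (`records` and `upload` are placeholders here).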