Rate limiting | Glossary

Rate limiting caps the number of requests a client may make in a given time period. It is a fundamental technique for protecting APIs from abuse, overload, and unfair usage, and most public APIs enforce it.

Main algorithms:

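Token bucket: a bucket holds up to N tokens, each request costs 1 token, and the bucket refills at R tokens/sec. Allows bursts of up to N requests while holding the long-term average rate to R.
Leaky bucket: requests enter the bucket and "leak" out at a constant rate. Output is smooth; requests that arrive when the bucket is full are dropped.
Fixed window: one counter per minute/hour, reset at the start of each window. Simple, but has the "burst at boundary" problem: a client can send a full quota at the end of one window and another at the start of the next, so up to twice the limit passes in a short span.
Sliding window: the count covers a rolling interval ending at the current moment. Most accurate, but more computationally expensive.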
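
As an illustration of the first algorithm, here is a minimal token bucket sketch in Python. The class name, the parameter names (capacity for N, rate for R), and the injectable clock are assumptions made for this example, not taken from any particular library.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate` tokens/sec."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.rate = float(rate)
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=5, rate=1)
    # A quick burst: the bucket starts full, so the first requests pass
    # and later ones are rejected until tokens refill.
    print([bucket.allow() for _ in range(7)])
```

Refilling lazily on each call, rather than with a background timer, keeps the sketch small. A real deployment would also need locking or shared storage if several workers enforce the same limit.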

Typical limits (2026):

Stripe API: 100 req/sec (live mode)
OpenAI API: 500-10,000 req/min depending on tier
Slack API: roughly 1 req/min per method (Tier 1)
LinkedIn: 500 actions/day (very restrictive)

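Headers to recognize: X-RateLimit-Limit (requests allowed per window), X-RateLimit-Remaining (requests left in the current window), X-RateLimit-Reset (when the window resets), Retry-After (how long to wait before retrying). Production integrations should read these headers rather than rely on hardcoded limits.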
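
One way a client might turn these headers into a wait time is sketched below. The function name seconds_to_wait is hypothetical, and the code assumes X-RateLimit-Reset holds a Unix timestamp in seconds. Some APIs send seconds-until-reset instead, so check the provider's documentation before relying on it.

```python
import time
from email.utils import parsedate_to_datetime


def seconds_to_wait(headers, now=None):
    """Suggest how long to pause before the next request.

    `headers` is a plain dict mapping header names to string values.
    """
    now = time.time() if now is None else now
    h = {k.lower(): v for k, v in headers.items()}

    # Retry-After is either a whole number of seconds or an HTTP date.
    retry_after = h.get("retry-after")
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)
        try:
            return max(0.0, parsedate_to_datetime(value).timestamp() - now)
        except (TypeError, ValueError):
            pass  # unparseable; fall through to the X-RateLimit-* headers

    # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    remaining = h.get("x-ratelimit-remaining")
    reset = h.get("x-ratelimit-reset")
    if remaining is not None and reset is not None:
        try:
            if int(remaining) <= 0:
                return max(0.0, float(reset) - now)
        except ValueError:
            pass  # malformed values; treat as "no information"
    return 0.0
```

Retry-After is checked first because it is the server's explicit instruction for this response; the X-RateLimit-* values are only consulted when the remaining quota is already used up.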

What to do when you hit a limit: retry with exponential backoff. Do NOT retry immediately, because that escalates the problem. For bulk operations: spread requests evenly, plan around the limits, and use batch endpoints if available.
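
A retry loop with exponential backoff could look roughly like the sketch below. The RateLimited exception, the function names, and the default values (1 second base, 60 second cap, 5 retries) are all assumptions for illustration. The random jitter is a common refinement that the text above does not mention; it keeps many clients from retrying in lockstep.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error the caller's request function raises when rate limited."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Delay before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`; the actual delay is a
    random fraction of that ceiling ("full jitter").
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_backoff(request, max_retries=5, sleep=time.sleep):
    """Call `request()` and retry on RateLimited, waiting longer after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited as err:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            delay = backoff_delay(attempt)
            if err.retry_after is not None:
                # Never wait less than the server asked for.
                delay = max(delay, err.retry_after)
            sleep(delay)
```

Here request is any zero-argument callable that performs the API call and raises RateLimited on a rate-limit response (typically HTTP 429). Passing sleep as a parameter makes the loop easy to test without real waiting.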