Rate Limiting APIs Without Breaking User Experience
Overview
Rate limiting protects your API from abuse. Bad rate limiting punishes legitimate users. This post shows how to implement rate limiting that stops bots and scrapers while keeping real users unaware it even exists.
Problem
Your API is public. You launch. Within a week:
- A scraper is hitting your blog endpoint 10,000 times per hour
- A misconfigured webhook is retrying every second
- A user's bugged SPA is firing duplicate requests on every keystroke
Without rate limiting, these consume your database connections, inflate your hosting bill, and degrade performance for everyone.
But naive rate limiting ('100 requests per minute, hard cutoff') blocks legitimate users during normal bursts — like rapidly paginating through search results or submitting a form after a validation error.
Solution
Use the token bucket algorithm. It allows bursts while enforcing a long-term average rate.
How Token Bucket Works
- Each client has a 'bucket' of tokens (e.g., 20)
- Each request consumes one token
- Tokens refill at a steady rate (e.g., 10 per minute)
- If the bucket is empty, the request is rejected
This means a user can fire 20 rapid requests (the burst), then sustain 10 per minute after that. No artificial cliff.
Implementation with Redis
```python
import time

import redis.asyncio as redis
from fastapi import Request

redis_client = redis.from_url("redis://localhost:6379", decode_responses=True)


async def check_rate_limit(
    key: str,
    max_tokens: int = 20,
    refill_rate: float = 10,  # tokens per minute
) -> dict:
    now = time.time()
    bucket_key = f"ratelimit:{key}"

    # Get current state
    tokens_str, last_refill_str = await redis_client.hmget(
        bucket_key, "tokens", "last_refill"
    )

    if tokens_str is None:
        # First request: initialize the bucket
        tokens = max_tokens - 1
        await redis_client.hset(bucket_key, mapping={
            "tokens": tokens,
            "last_refill": now,
        })
        await redis_client.expire(bucket_key, 300)
        return {"allowed": True, "remaining": tokens}

    tokens = float(tokens_str)
    last_refill = float(last_refill_str)

    # Refill tokens based on elapsed time
    elapsed = now - last_refill
    tokens = min(max_tokens, tokens + elapsed * (refill_rate / 60))

    if tokens < 1:
        retry_after = (1 - tokens) / (refill_rate / 60)
        return {"allowed": False, "retry_after": retry_after}

    tokens -= 1
    await redis_client.hset(bucket_key, mapping={
        "tokens": tokens,
        "last_refill": now,
    })
    await redis_client.expire(bucket_key, 300)  # refresh the idle TTL
    return {"allowed": True, "remaining": int(tokens)}
```
FastAPI Middleware
```python
import math

from fastapi import FastAPI
from starlette.responses import JSONResponse

app = FastAPI()


@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # Skip rate limiting for static assets
    if request.url.path.startswith(("/assets", "/images")):
        return await call_next(request)

    # Use IP + path as the rate limit key; take the first hop
    # from X-Forwarded-For when behind a proxy
    client_ip = request.headers.get(
        "x-forwarded-for", request.client.host
    ).split(",")[0].strip()
    key = f"{client_ip}:{request.url.path}"

    result = await check_rate_limit(key)
    if not result["allowed"]:
        return JSONResponse(
            status_code=429,
            content={"error": "Too many requests"},
            headers={
                # Round up so clients never retry a moment too early
                "Retry-After": str(math.ceil(result["retry_after"])),
                "X-RateLimit-Limit": "20",
                "X-RateLimit-Remaining": "0",
            },
        )

    response = await call_next(request)
    response.headers["X-RateLimit-Remaining"] = str(result["remaining"])
    return response
```
Tiered Rate Limits
Different endpoints need different limits:
| Endpoint | Burst | Rate | Why |
|---|---|---|---|
| GET /api/posts | 30 | 15/min | Read-heavy, cacheable |
| POST /api/contact | 3 | 2/min | Prevents spam |
| POST /api/auth/login | 5 | 5/min | Brute force protection |
| GET /api/search | 10 | 10/min | Expensive queries |
```python
RATE_LIMITS = {
    "GET:/api/posts": {"max_tokens": 30, "refill_rate": 15},
    "POST:/api/contact": {"max_tokens": 3, "refill_rate": 2},
    "POST:/api/auth/login": {"max_tokens": 5, "refill_rate": 5},
    "GET:/api/search": {"max_tokens": 10, "refill_rate": 10},
    "default": {"max_tokens": 20, "refill_rate": 15},
}
```
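One way to wire this table into the rate limiter is a small lookup helper. The `limits_for` function below is a sketch of my own, not part of the original code; it assumes keys of the form `"METHOD:path"` with a `"default"` fallback:

```python
RATE_LIMITS = {
    "POST:/api/contact": {"max_tokens": 3, "refill_rate": 2},
    "POST:/api/auth/login": {"max_tokens": 5, "refill_rate": 5},
    "GET:/api/search": {"max_tokens": 10, "refill_rate": 10},
    "default": {"max_tokens": 20, "refill_rate": 15},
}


def limits_for(method: str, path: str) -> dict:
    """Return the rate-limit config for a request, falling back to the default."""
    return RATE_LIMITS.get(f"{method}:{path}", RATE_LIMITS["default"])
```

In the middleware you would then call something like `check_rate_limit(key, **limits_for(request.method, request.url.path))`.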
Client-Side Handling
Do not just throw errors at the user. Handle 429 responses gracefully:
```typescript
async function fetchWithRetry(url: string, retries = 2): Promise<Response> {
  const response = await fetch(url);
  if (response.status === 429 && retries > 0) {
    // Honor the server's Retry-After header, defaulting to 5 seconds
    const retryAfter = parseInt(
      response.headers.get("Retry-After") || "5", 10
    );
    await new Promise(r => setTimeout(r, retryAfter * 1000));
    return fetchWithRetry(url, retries - 1);
  }
  return response;
}
```
Upstash for Serverless
If you deploy on Vercel (serverless), you cannot run a local Redis. Use Upstash:
```typescript
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, "60 s"),
});

export default async function handler(req) {
  const ip = req.headers.get("x-forwarded-for") ?? "127.0.0.1";
  const { success, limit, remaining } = await ratelimit.limit(ip);
  if (!success) {
    return new Response("Too many requests", { status: 429 });
  }
  // ... handle request
}
```
Challenges
Distributed rate limiting: If your API runs on multiple servers, each server's in-memory counter is independent. Redis solves this — it is the single source of truth.
IP-based limits behind proxies: Users behind corporate NATs share an IP. Consider adding authenticated user ID as a secondary rate limit key for logged-in users.
Conclusion
Token bucket is the right default for API rate limiting. It handles bursts naturally without punishing legitimate users. Always communicate limits via headers (X-RateLimit-Remaining, Retry-After). Use Upstash when deploying serverless. And test your limits with real traffic patterns before tightening them.