Why Rate Limiting Matters for App Performance and Security
March 21, 2025
Imagine you've launched your app. Someone — maybe a competitor, maybe a bored kid with a script — starts hammering your login endpoint. A thousand password guesses per minute. Your server is sweating, your real users are getting timeouts, and the attacker hasn't even broken a sweat yet. Without rate limiting, there's nothing stopping them.
The Threat — What Happens Without Request Limits
Without rate limiting, your API is an open invitation. Attackers can automate requests at machine speed — sending thousands of attempts per minute with no friction at all. Three attacks thrive in this environment:
Brute force attacks systematically try password combinations. An 8-character password drawn from upper- and lowercase letters and digits has about 218 trillion possibilities (62^8), but attackers don't need to try them all: they start with the most common ones, leaked passwords from previous breaches, and variations of usernames. At 1,000 guesses per second, an unprotected login endpoint becomes a liability.
Credential stuffing is brute force's more dangerous cousin. Attackers take username/password pairs from one breach and try them automatically across hundreds of other services. Since most people reuse passwords, success rates are surprisingly high — and it's entirely automated.
Scraping and DDoS round out the picture. Bots can extract your entire product catalog, price list, or user-generated content in minutes. And a distributed flood of requests — even without malicious intent, like a misconfigured integration sending retry loops — can take down your service for everyone.
Consequences — What You're Actually Risking
The attack scenarios above translate into real problems:
- Compromised accounts — credential stuffing breaches real users' accounts, which then becomes your legal and reputational problem, not just the user's
- Service outages — unthrottled traffic exhausts your server's CPU, memory, or database connections; legitimate users get errors
- Data loss — scrapers drain intellectual property you've spent months building
- Cloud bills — every request costs compute resources; an attacker sending 10 million requests isn't paying that bill, you are
- Compliance exposure — a brute-forced account containing personal data may trigger GDPR breach notification requirements
The Defense: Rate Limiting
Rate limiting restricts how many requests a client can make to your application within a given time window. When a client exceeds the limit, further requests are rejected (HTTP 429 Too Many Requests) or queued until the window resets.
The key insight: rate limiting doesn't need to be perfect to be effective. It just needs to make automated attacks slow enough to be impractical.
A few strategies exist for counting requests:
- Fixed Window — counts requests in fixed time intervals (e.g., max 60 per minute, counter resets at :00)
- Sliding Window — a rolling window that tracks the last N seconds, smoother than fixed window
- Token Bucket — clients accumulate tokens over time and spend one per request; allows controlled bursts
- Leaky Bucket — requests are processed at a constant rate, excess is queued or dropped
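To make the difference between the first two strategies concrete, here is a minimal in-memory sketch. The class names and the `allow` method are illustrative assumptions, not any particular library's API, and a production version would also expire old entries.

```python
import time
from collections import defaultdict, deque


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each aligned window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (client, window index) -> count; never pruned in this sketch

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        key = (client, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any rolling window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # client -> timestamps of accepted requests

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        hits = self.hits[client]
        # Drop timestamps that have fallen out of the rolling window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# A burst straddling the minute boundary: 60 requests at t=59s, 60 more at t=61s.
burst = [59.0] * 60 + [61.0] * 60
fixed = FixedWindowLimiter(limit=60, window_seconds=60)
sliding = SlidingWindowLimiter(limit=60, window_seconds=60)
fixed_allowed = sum(fixed.allow("client", t) for t in burst)
sliding_allowed = sum(sliding.allow("client", t) for t in burst)
```

With this burst the fixed window lets all 120 requests through within two seconds, because the counter resets at the boundary, while the sliding window accepts only the first 60.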
Each has different trade-offs between simplicity and precision. For most applications, sliding window or token bucket gives the best balance.
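A token bucket can be sketched in a few lines. This is an illustrative single-client version under assumed names (`TokenBucket`, `capacity`, `refill_per_second`); a real limiter would keep one bucket per client in a shared store.

```python
import time


class TokenBucket:
    """Tokens refill continuously; each request spends one (or `cost`) tokens."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None, cost=1):
        now = time.monotonic() if now is None else now
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_per_second=1.0, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]   # five pass, the sixth is rejected
later = [bucket.allow(now=2.0) for _ in range(3)]   # two tokens refilled after 2 seconds
```

The capacity sets how large a burst is tolerated, and the refill rate sets the sustained request rate, which is why this strategy suits clients that are idle most of the time and then fire several calls at once.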
Why It Works — The Math of Making Attacks Impractical
Here's the core mechanism: rate limiting turns machine-speed attacks into human-speed problems.
Without limits, an attacker guessing passwords can try 1,000 combinations per minute. At that pace, a list of 10,000 common passwords takes 10 minutes to exhaust. With rate limiting at 5 login attempts per minute per IP address, the same list takes 33 hours — and that's before any IP rotation costs on the attacker's side.
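The arithmetic behind those figures is easy to reproduce. The sketch below assumes the attacker keeps every IP at its limit nonstop; the 100-IP pool is a hypothetical number added only to show how rotation changes the result.

```python
def minutes_to_exhaust(wordlist_size, attempts_per_minute, ip_count=1):
    """Minutes needed to try every entry if each IP runs at its limit nonstop."""
    return wordlist_size / (attempts_per_minute * ip_count)


unlimited = minutes_to_exhaust(10_000, 1_000)            # no rate limit
limited = minutes_to_exhaust(10_000, 5)                  # 5 attempts/min, one IP
rotating = minutes_to_exhaust(10_000, 5, ip_count=100)   # hypothetical 100-IP pool

print(f"unlimited: {unlimited:.0f} min")
print(f"limited:   {limited / 60:.1f} h")
print(f"100 IPs:   {rotating:.0f} min")
```

This gives 10 minutes without limits, about 33 hours at 5 attempts per minute from one IP, and 20 minutes if the attacker can spread the list across 100 addresses, which is why per-IP limits are usually paired with per-account controls.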
Add account lockout after 10 failed attempts and the attacker gets at most 10 guesses per targeted account before it locks, no matter how many IPs they rotate through. Rate limiting doesn't need to be a wall. It just needs to make the economics of an attack bad enough that the attacker moves on.
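A lockout counter of this kind might look roughly like the following. The `LoginLockout` class and the 15-minute lock duration are assumptions for illustration; the article only specifies the threshold of 10 failed attempts.

```python
import time


class LoginLockout:
    """Lock an account temporarily after too many consecutive failed logins."""

    def __init__(self, max_failures=10, lock_seconds=900):
        self.max_failures = max_failures
        self.lock_seconds = lock_seconds  # assumed 15-minute lock
        self.failures = {}       # account -> consecutive failed attempts
        self.locked_until = {}   # account -> timestamp when the lock expires

    def is_locked(self, account, now=None):
        now = time.time() if now is None else now
        return self.locked_until.get(account, 0.0) > now

    def record_failure(self, account, now=None):
        now = time.time() if now is None else now
        count = self.failures.get(account, 0) + 1
        self.failures[account] = count
        if count >= self.max_failures:
            self.locked_until[account] = now + self.lock_seconds
            self.failures[account] = 0  # start counting afresh after the lock

    def record_success(self, account):
        # A successful login clears the failure streak.
        self.failures.pop(account, None)


lockout = LoginLockout()
for _ in range(10):
    lockout.record_failure("alice", now=1000.0)
locked_now = lockout.is_locked("alice", now=1000.0)
locked_later = lockout.is_locked("alice", now=1000.0 + 900)
```

The lock here is temporary on purpose: a permanent lock would let an attacker deny service to real users simply by failing logins on their behalf.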
For credential stuffing specifically: the attack relies on speed and scale. Each account gets only a handful of attempts before detection. Rate limiting per IP makes that scale expensive to maintain — the attacker needs many rotating IPs, which costs money and infrastructure.
For DDoS: rate limiting caps how much damage each attacking node can do. A botnet of 10,000 IPs with a 60-request-per-minute limit maxes out at 600,000 requests per minute — substantial, but predictable and manageable by infrastructure, versus an unlimited flood.
Implementation Scheme — Where Rate Limiting Lives
Rate limiting works best as an early gate, before your requests reach business logic or the database:
[Client]
|
v
[CDN / Reverse Proxy]
| (passes real client IP via X-Forwarded-For)
v
[Rate Limiter] <——> [Shared counter store, e.g. Redis]
| Check: per-IP / per-user / per-endpoint
|
|— limit exceeded? → HTTP 429, Retry-After header
|
v
[API / Business Logic]
|
v
[Database]
A few things to get right in this diagram:
The counter store must be shared. If you run multiple application instances, each needs to read and write the same counters. In-memory counters per process don't work: with a 60-request limit, instance A allows 60 requests while instance B allows another 60, so the effective limit grows with every instance you add. Redis is a common choice for the shared store.
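As a rough sketch of a shared fixed-window counter, the function below only needs a store object with `incr` and `expire` methods. A redis-py client offers methods with those names; the `InMemoryStore` class is a stand-in invented here so the example runs without a server, and the key format is an assumption.

```python
class InMemoryStore:
    """Stand-in for a shared store. It ignores expiry; a real store drops the key after its TTL."""

    def __init__(self):
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def expire(self, key, seconds):
        return True


def allow_request(store, client_id, limit, window_seconds, now):
    """Count the request in the shared store and report whether it is within the limit."""
    window_index = int(now // window_seconds)
    key = f"ratelimit:{client_id}:{window_index}"
    count = store.incr(key)  # the increment happens in the store, not in this process
    if count == 1:
        # First hit in this window: let the key clean itself up afterwards.
        store.expire(key, window_seconds * 2)
    return count <= limit


# Two application instances sharing one store see the same counter.
shared = InMemoryStore()
instance_a = [allow_request(shared, "203.0.113.7", 3, 60, now=100.0) for _ in range(2)]
instance_b = [allow_request(shared, "203.0.113.7", 3, 60, now=101.0) for _ in range(2)]
```

Because the counter lives in the store, the fourth request is rejected even though each instance has only seen two. One caveat of this simple form: if a process dies between `incr` and `expire`, the key is left without a TTL, so real deployments often wrap both steps in a transaction or script.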
Use the real client IP, not your proxy's IP. When requests pass through a CDN or reverse proxy, the REMOTE_ADDR your application sees is the proxy's IP address — not the client's. Pull the real IP from the X-Forwarded-For header, but validate it carefully (it can be spoofed if not configured properly at the proxy level).
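One common way to validate the header is to trust it only when the direct peer is one of your own proxies, then walk the list from the right and take the first address that is not a known proxy. The sketch below assumes a hypothetical `client_ip` helper and example proxy ranges; how the header reaches your code depends on your framework.

```python
import ipaddress


def client_ip(remote_addr, x_forwarded_for, trusted_proxies):
    """Return the address to rate limit on, ignoring entries a client could forge."""
    networks = [ipaddress.ip_network(n) for n in trusted_proxies]

    def is_trusted(ip):
        return any(ip in net for net in networks)

    try:
        peer = ipaddress.ip_address(remote_addr)
    except ValueError:
        return remote_addr
    # If the direct peer is not one of our proxies, the header proves nothing.
    if not is_trusted(peer):
        return remote_addr

    hops = [h.strip() for h in (x_forwarded_for or "").split(",") if h.strip()]
    # Rightmost entries were appended by our own proxies; anything further left
    # was supplied by the client and may be forged.
    for hop in reversed(hops):
        try:
            ip = ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop trusting the header
        if not is_trusted(ip):
            return str(ip)
    return remote_addr


proxies = ["10.0.0.0/8"]  # assumed internal proxy range
spoofed = client_ip("10.0.0.5", "1.1.1.1, 203.0.113.7", proxies)
direct = client_ip("198.51.100.9", "1.1.1.1", proxies)
```

In the first call the client tried to inject `1.1.1.1`, but the proxy appended the address it actually saw, so the result is `203.0.113.7`. In the second call the request did not come through a trusted proxy, so the header is ignored entirely.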
Rate limit at multiple dimensions. Per-IP catches volume attacks; per-user prevents authenticated users from abusing their accounts; per-endpoint lets you set stricter limits on expensive operations like login, signup, or file uploads.
Setting Effective Thresholds
Rate limits should reflect realistic usage patterns, not theoretical worst cases. A few starting points:
- Public endpoints (unauthenticated): stricter — e.g., 30–60 requests per minute per IP
- Login / registration endpoints: much stricter — e.g., 5–10 attempts per minute, with lockout
- Authenticated API endpoints: more generous — e.g., 200–300 requests per minute per user (a modern SPA loading a single page can trigger 10+ API calls)
- Resource-intensive endpoints (exports, reports, file uploads): the strictest — e.g., 5–10 per minute
Start slightly generous and tighten based on monitoring data, rather than starting aggressive and dealing with support tickets from blocked legitimate users.
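These starting points can live in a small table that the limiter consults per request. The endpoint class names, the key format, and the `rule_for` helper below are illustrative assumptions, and the numbers are taken from the upper or lower end of the ranges above.

```python
# endpoint class -> (max requests, window in seconds)
LIMITS = {
    "public": (60, 60),
    "login": (5, 60),
    "authenticated": (300, 60),
    "expensive": (5, 60),
}


def rule_for(endpoint_class, ip, user_id=None):
    """Pick the counter key and limit for a request.

    Logged-in traffic is counted per user; login attempts and anonymous
    traffic are counted per IP, since no trusted user identity exists yet.
    """
    limit, window = LIMITS[endpoint_class]
    if user_id is not None and endpoint_class != "login":
        subject = f"user:{user_id}"
    else:
        subject = f"ip:{ip}"
    return f"rl:{endpoint_class}:{subject}", limit, window


login_rule = rule_for("login", "203.0.113.7")
api_rule = rule_for("authenticated", "203.0.113.7", user_id=42)
```

Keeping the thresholds in one place makes it easier to tighten them later based on monitoring data without touching the limiter itself.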
When a client is rate limited, communicate it clearly:
- Return HTTP 429 with a Retry-After header indicating when they can try again
- Include rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) on regular responses so clients can self-throttle
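A framework-neutral sketch of building these headers is shown below. The `rate_limit_response` helper is hypothetical, it assumes the reset time is reported as Unix epoch seconds (conventions for the X-RateLimit headers vary between APIs), and it assumes an allowed request would otherwise return 200.

```python
import math


def rate_limit_response(allowed, limit, remaining, reset_epoch, now):
    """Return (status_code, headers) for a request that was just counted."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Assumption: reset is expressed as Unix epoch seconds.
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if allowed:
        return 200, headers
    # Retry-After is given in whole seconds; round up so clients never retry early.
    headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now)))
    return 429, headers


status, headers = rate_limit_response(
    allowed=False, limit=60, remaining=0, reset_epoch=1060, now=1000.5
)
```

Here the client is told to wait 60 seconds, since 59.5 seconds remain in the window and the value is rounded up.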
Common Pitfalls
Setting one limit for everything. A login endpoint and a read-only product catalog endpoint have completely different risk profiles. One-size-fits-all limits either leave sensitive endpoints unprotected or frustrate users on benign ones.
Forgetting about your own internal traffic. Monitoring agents, background jobs, cron tasks, and microservice calls count toward limits too. Build in whitelist mechanisms or dedicated rate limit tiers for internal services before they cause production incidents.
Trusting X-Forwarded-For blindly. If your proxy isn't configured to set this header authoritatively, a client can spoof it to bypass per-IP limits. Ensure the header is set by your trusted proxy, not passed through from the client.
No distributed state in multi-instance deployments. Rate limiting only works if the counter is shared. Local in-process counters fail silently when you scale horizontally — you won't notice until you monitor request patterns after scaling.
Blocking legitimate power users. Some users genuinely need higher limits — API integrators, enterprise clients, internal tools. Plan for a mechanism to grant elevated limits for specific users or API keys before you need it.
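Both the internal-traffic and power-user pitfalls can be handled with a tier lookup in front of the default limit. Everything in this sketch is hypothetical: the tier names, the API keys, and the numbers are placeholders, not recommendations.

```python
DEFAULT_LIMIT = 300  # requests per minute for ordinary authenticated users

# Hypothetical tiers with elevated per-minute limits.
TIER_LIMITS = {
    "internal": 5000,    # monitoring agents, background jobs, service-to-service calls
    "enterprise": 2000,  # API integrators and enterprise clients
}

# Hypothetical mapping from API key to tier; in practice this would live in a database.
KEY_TIERS = {
    "key-monitoring-agent": "internal",
    "key-acme-corp": "enterprise",
}


def limit_for(api_key):
    """Return the per-minute limit for an API key, falling back to the default."""
    tier = KEY_TIERS.get(api_key)
    return TIER_LIMITS.get(tier, DEFAULT_LIMIT)


internal_limit = limit_for("key-monitoring-agent")
unknown_limit = limit_for("key-someone-else")
```

Giving internal services a high limit rather than a full exemption keeps a runaway job from taking the system down while still leaving it out of the way in normal operation.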
Performance Benefits — a Bonus, Not the Goal
Rate limiting is a security feature. The performance benefits are real but secondary.
Capping client request rates prevents runaway mobile app bugs, misconfigured integrations, or traffic spikes from exhausting server resources. In cloud environments with auto-scaling, it also prevents unexpected compute costs from uncontrolled traffic growth. Consistent limits help maintain predictable response times for all users by preventing any single client from monopolizing connections or database queries.
These are good outcomes — but build rate limiting as a security control first, and treat the stability and cost benefits as a bonus.
Summary
Rate limiting is one of those defenses that pays for itself the first time it catches an automated attack. The principle is simple: make machine-speed attacks slow enough that they're not worth running.
The two things most implementations get wrong: not using a shared counter store (breaks on multiple instances) and setting one global limit instead of endpoint-specific ones.
Related posts in this series:
- Basic User Authentication Strategies — the endpoint rate limiting protects is typically authentication; these two belong together
- Common Attacks Against New Apps — brute force and credential stuffing in the broader context of attack types
- Why Rate Limiting Is Only Part of the API Gateway Story — coming soon: throttling vs. rate limiting, and what else an API gateway does