Implement token bucket and sliding window rate limiting algorithms with Express middleware and configurable per-route limits.
Every public API needs rate limiting. Without it, a single misbehaving client can exhaust your server's resources, starve other users, and run up your infrastructure bill. Rate limiting is also a key defense against credential stuffing, scraping, and denial-of-service attacks.
In this tutorial, you will implement two industry-standard rate limiting algorithms — token bucket and sliding window — from scratch in TypeScript. You will wrap them in Express middleware, add per-route configuration, build an in-memory store with automatic cleanup, and return proper HTTP headers so clients can self-regulate. By the end, you will understand not just how to use rate limiting, but how it works at the algorithmic level.
What you will build:

- A token bucket limiter with burst tolerance
- A sliding window limiter with weighted interpolation
- Express middleware that sets RateLimit-* headers and returns 429 responses
- Per-route limit configuration with wildcard matching
- An in-memory store with TTL-based cleanup
- Deterministic tests and a simple load test script
The token bucket is the most intuitive rate limiting algorithm. Imagine a bucket that holds tokens. Each request consumes one token. Tokens regenerate at a fixed rate. When the bucket is empty, requests are rejected until tokens refill.
The beauty of the token bucket is its burst tolerance. A bucket with 100 max tokens allows a burst of 100 rapid requests, then throttles to the refill rate. This matches real user behavior — page loads trigger many simultaneous requests, then activity drops.
The sliding window counter offers smoother rate limiting with less burstiness. Instead of tokens, it counts requests within a rolling time window. It approximates the true sliding window using two fixed windows and weighted interpolation.
The weighted interpolation is the key insight. At the start of a new window, previous requests still count at nearly full weight. As time progresses through the window, the previous window's influence fades linearly. This eliminates the boundary spike problem of fixed windows.
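A quick numeric check makes the interpolation concrete. This standalone sketch uses made-up counts, not values from the tutorial's code, but applies the same formula the limiter below uses:

```typescript
// Sliding-window estimate: requests in the current fixed window, plus the
// previous window's count scaled by how much of that window still overlaps
// the rolling window. All numbers here are illustrative.
const windowMs = 60_000;        // 1-minute window
const previousCount = 80;       // requests counted in the previous window
const currentCount = 10;        // requests so far in the current window
const elapsedInWindow = 15_000; // we are 15s into the current window

// 15s in means 75% of the previous window still overlaps the rolling window
const previousWeight = 1 - elapsedInWindow / windowMs; // 0.75
const estimated = currentCount + Math.floor(previousCount * previousWeight);

console.log(estimated); // 70
```

As the window progresses, `previousWeight` falls linearly toward zero, so the 80 old requests stop counting gradually instead of vanishing at a window boundary.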
Abstract both algorithms behind a common interface so middleware can use either one without knowing the implementation details.
Transform the limiter into Express middleware. The middleware extracts a key from each request (default: IP address), checks the limit, sets response headers, and either passes through or returns 429 Too Many Requests.
The RateLimit-* headers follow the IETF draft standard for HTTP rate limiting. Well-behaved clients read these headers to throttle themselves before hitting the limit.
Different endpoints have different sensitivity. Login attempts need aggressive limits (5 per minute). Public reads can be generous (1000 per minute). Configure limits per route using a declarative map.
Without cleanup, the in-memory store grows unbounded as new clients appear. Add a periodic sweep that removes entries that have not been accessed recently.
Set the TTL to at least 2x your longest window. For a 5-minute window, a 10-minute TTL ensures entries expire naturally while giving cleanup a safe margin.
Rate limiters need deterministic tests. Inject a clock function so tests can control time without real delays.
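Here is a minimal sketch of that idea, separate from the limiter classes in this tutorial (`TestableBucket` and `Clock` are illustrative names): the limiter takes a `now` function, defaulting to `Date.now`, and tests substitute a fake clock they advance by hand.

```typescript
// A token bucket that takes a clock function instead of calling
// Date.now() directly. Production code uses the default; tests inject
// a fake clock so time can be advanced instantly.
type Clock = () => number;

class TestableBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,
    private refillRate: number, // tokens per second
    private now: Clock = Date.now
  ) {
    this.tokens = maxTokens;
    this.lastRefill = now();
  }

  consume(): boolean {
    const t = this.now();
    const elapsed = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// In tests, advancing the fake clock replaces real setTimeout waits.
let fakeTime = 0;
const bucket = new TestableBucket(2, 10, () => fakeTime);
console.log(bucket.consume(), bucket.consume(), bucket.consume()); // true true false
fakeTime += 100; // advance 100ms = 1 token at 10 tokens/sec
console.log(bucket.consume()); // true
```

The same parameter could be threaded through `TokenBucketLimiter` and `SlidingWindowLimiter`, turning the `setTimeout`-based refill test later in this tutorial into a synchronous one.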
Wire everything together and verify the system holds up under simulated load. Use a simple script that fires concurrent requests and validates the rate limit response.
Run this against your server and verify that: the number of allowed requests matches your configured limit, rejected requests get 429 status codes with proper headers, and the server remains responsive throughout — no memory leaks, no CPU spikes, no dropped connections.
For production deployment, consider Redis as a backing store for distributed rate limiting across multiple server instances. The algorithms remain identical — only the storage layer changes from a Map to Redis GET/SET with TTL.
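One way to prepare for that swap is to put a minimal async store interface between the algorithm and its storage (a sketch; `RateLimitStore` and `MemoryStore` are illustrative names, not part of the tutorial's code). The interface is async because a Redis-backed implementation would await network round-trips; a Redis version would back the same contract with GET and SET plus a TTL.

```typescript
// Minimal async storage contract the limiter could be written against.
interface RateLimitStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlMs: number): Promise<void>;
}

// In-memory implementation honoring the same contract, including expiry.
class MemoryStore implements RateLimitStore {
  private data = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.data.get(key);
    if (!entry || entry.expiresAt < Date.now()) return null;
    return entry.value;
  }

  async set(key: string, value: string, ttlMs: number): Promise<void> {
    this.data.set(key, { value, expiresAt: Date.now() + ttlMs });
  }
}

// A Redis implementation would wrap the same interface (e.g. SET with a
// millisecond TTL, GET on read) -- the limiter code itself is unchanged.
const store: RateLimitStore = new MemoryStore();

(async () => {
  await store.set("bucket:user-1", JSON.stringify({ tokens: 99 }), 60_000);
  console.log(await store.get("bucket:user-1")); // {"tokens":99}
})();
```

Serializing bucket state as a string keeps the contract identical to what a string-valued Redis key can hold.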
// src/algorithms/token-bucket.ts
interface TokenBucket {
tokens: number;
lastRefill: number;
}
interface TokenBucketConfig {
maxTokens: number;
refillRate: number; // tokens per second
}
export class TokenBucketLimiter {
private buckets: Map<string, TokenBucket> = new Map();
private config: TokenBucketConfig;
constructor(config: TokenBucketConfig) {
this.config = config;
}
consume(key: string): { allowed: boolean; remaining: number; retryAfter: number } {
const now = Date.now();
let bucket = this.buckets.get(key);
if (!bucket) {
bucket = { tokens: this.config.maxTokens, lastRefill: now };
this.buckets.set(key, bucket);
}
// Refill tokens based on elapsed time
const elapsed = (now - bucket.lastRefill) / 1000;
bucket.tokens = Math.min(
this.config.maxTokens,
bucket.tokens + elapsed * this.config.refillRate
);
bucket.lastRefill = now;
if (bucket.tokens >= 1) {
bucket.tokens -= 1;
return { allowed: true, remaining: Math.floor(bucket.tokens), retryAfter: 0 };
}
const retryAfter = Math.ceil((1 - bucket.tokens) / this.config.refillRate);
return { allowed: false, remaining: 0, retryAfter };
}
}

// src/algorithms/sliding-window.ts
interface WindowCounter {
currentCount: number;
previousCount: number;
currentStart: number;
}
interface SlidingWindowConfig {
windowMs: number;
maxRequests: number;
}
export class SlidingWindowLimiter {
private counters: Map<string, WindowCounter> = new Map();
private config: SlidingWindowConfig;
constructor(config: SlidingWindowConfig) {
this.config = config;
}
consume(key: string): { allowed: boolean; remaining: number; resetAt: number } {
const now = Date.now();
const windowStart = now - (now % this.config.windowMs);
let counter = this.counters.get(key);
if (!counter || now - counter.currentStart >= this.config.windowMs * 2) {
counter = { currentCount: 0, previousCount: 0, currentStart: windowStart };
this.counters.set(key, counter);
}
// Rotate windows if current window has passed
if (windowStart > counter.currentStart) {
counter.previousCount = counter.currentCount;
counter.currentCount = 0;
counter.currentStart = windowStart;
}
// Weighted count: full current window + proportional previous window
const elapsedInWindow = now - windowStart;
const previousWeight = 1 - elapsedInWindow / this.config.windowMs;
const estimatedCount =
counter.currentCount + Math.floor(counter.previousCount * previousWeight);
if (estimatedCount >= this.config.maxRequests) {
const resetAt = windowStart + this.config.windowMs;
return { allowed: false, remaining: 0, resetAt };
}
counter.currentCount += 1;
const remaining = this.config.maxRequests - estimatedCount - 1;
return { allowed: true, remaining, resetAt: windowStart + this.config.windowMs };
}
}

// src/limiter.ts
import type { Request } from "express";
import { TokenBucketLimiter } from "./algorithms/token-bucket";
import { SlidingWindowLimiter } from "./algorithms/sliding-window";
type Algorithm = "token-bucket" | "sliding-window";
interface LimitResult {
allowed: boolean;
remaining: number;
limit: number;
resetAt: number;
retryAfter: number;
}
export interface RateLimitConfig {
algorithm: Algorithm;
limit: number;
windowMs: number;
keyGenerator?: (req: Request) => string;
}
export class RateLimiter {
private tokenBucket?: TokenBucketLimiter;
private slidingWindow?: SlidingWindowLimiter;
private config: RateLimitConfig;
constructor(config: RateLimitConfig) {
this.config = config;
if (config.algorithm === "token-bucket") {
this.tokenBucket = new TokenBucketLimiter({
maxTokens: config.limit,
refillRate: config.limit / (config.windowMs / 1000),
});
} else {
this.slidingWindow = new SlidingWindowLimiter({
windowMs: config.windowMs,
maxRequests: config.limit,
});
}
}
check(key: string): LimitResult {
if (this.tokenBucket) {
const result = this.tokenBucket.consume(key);
return {
allowed: result.allowed,
remaining: result.remaining,
limit: this.config.limit,
resetAt: Date.now() + this.config.windowMs,
retryAfter: result.retryAfter,
};
}
const result = this.slidingWindow!.consume(key);
return {
allowed: result.allowed,
remaining: result.remaining,
limit: this.config.limit,
resetAt: result.resetAt,
retryAfter: result.allowed ? 0 : Math.ceil((result.resetAt - Date.now()) / 1000),
};
}
}

// src/middleware.ts
import { Request, Response, NextFunction } from "express";
import { RateLimiter, RateLimitConfig } from "./limiter";

export function rateLimitMiddleware(config: RateLimitConfig) {
const limiter = new RateLimiter(config);
const getKey =
config.keyGenerator ??
((req: Request) => {
return req.ip ?? req.socket.remoteAddress ?? "unknown";
});
return (req: Request, res: Response, next: NextFunction): void => {
const key = getKey(req);
const result = limiter.check(key);
// Always set rate limit headers (draft IETF standard)
res.setHeader("RateLimit-Limit", result.limit);
res.setHeader("RateLimit-Remaining", result.remaining);
res.setHeader("RateLimit-Reset", Math.ceil(result.resetAt / 1000));
if (!result.allowed) {
res.setHeader("Retry-After", result.retryAfter);
res.status(429).json({
error: "Too Many Requests",
message: `Rate limit exceeded. Try again in ${result.retryAfter} seconds.`,
retryAfter: result.retryAfter,
});
return;
}
next();
};
}

// src/config.ts
import { RateLimitConfig } from "./limiter";
const routeLimits: Record<string, RateLimitConfig> = {
"POST /api/auth/login": {
algorithm: "sliding-window",
limit: 5,
windowMs: 60_000,
},
"POST /api/auth/register": {
algorithm: "sliding-window",
limit: 3,
windowMs: 300_000,
},
"GET /api/*": {
algorithm: "token-bucket",
limit: 100,
windowMs: 60_000,
},
"*": {
algorithm: "token-bucket",
limit: 60,
windowMs: 60_000,
},
};
export function matchRoute(method: string, path: string): RateLimitConfig {
const exact = routeLimits[`${method} ${path}`];
if (exact) return exact;
for (const [pattern, config] of Object.entries(routeLimits)) {
const [patternMethod, patternPath] = pattern.split(" ");
if (patternMethod === method && patternPath?.endsWith("*")) {
const prefix = patternPath.slice(0, -1);
if (path.startsWith(prefix)) return config;
}
}
return routeLimits["*"];
}

// src/store/cleanup.ts
export class CleanableStore<T extends { lastAccess: number }> {
private store: Map<string, T> = new Map();
private cleanupTimer: NodeJS.Timeout;
constructor(
private ttlMs: number,
cleanupIntervalMs: number = 60_000
) {
this.cleanupTimer = setInterval(() => this.cleanup(), cleanupIntervalMs);
this.cleanupTimer.unref(); // don't keep the process alive just for cleanup
}
get(key: string): T | undefined {
const entry = this.store.get(key);
if (entry) entry.lastAccess = Date.now();
return entry;
}
set(key: string, value: T): void {
value.lastAccess = Date.now();
this.store.set(key, value);
}
private cleanup(): void {
const cutoff = Date.now() - this.ttlMs;
let removed = 0;
for (const [key, entry] of this.store) {
if (entry.lastAccess < cutoff) {
this.store.delete(key);
removed++;
}
}
if (removed > 0) {
console.log(
`Rate limiter cleanup: removed ${removed} stale entries, ${this.store.size} remaining`
);
}
}
destroy(): void {
clearInterval(this.cleanupTimer);
}
}

// src/__tests__/token-bucket.test.ts
import { describe, it, expect } from "vitest";
import { TokenBucketLimiter } from "../algorithms/token-bucket";
describe("TokenBucketLimiter", () => {
it("allows requests up to the token limit", () => {
const limiter = new TokenBucketLimiter({ maxTokens: 5, refillRate: 1 });
for (let i = 0; i < 5; i++) {
const result = limiter.consume("user-1");
expect(result.allowed).toBe(true);
}
const blocked = limiter.consume("user-1");
expect(blocked.allowed).toBe(false);
expect(blocked.retryAfter).toBeGreaterThan(0);
});
it("isolates limits per key", () => {
const limiter = new TokenBucketLimiter({ maxTokens: 2, refillRate: 1 });
limiter.consume("user-a");
limiter.consume("user-a");
const blockedA = limiter.consume("user-a");
const allowedB = limiter.consume("user-b");
expect(blockedA.allowed).toBe(false);
expect(allowedB.allowed).toBe(true);
});
it("refills tokens over time", async () => {
const limiter = new TokenBucketLimiter({ maxTokens: 2, refillRate: 10 });
limiter.consume("user-1");
limiter.consume("user-1");
expect(limiter.consume("user-1").allowed).toBe(false);
// Wait for refill (100ms = 1 token at 10/sec)
await new Promise((r) => setTimeout(r, 150));
expect(limiter.consume("user-1").allowed).toBe(true);
});
});

// test/load-test.ts
async function loadTest(url: string, totalRequests: number, concurrency: number): Promise<void> {
let allowed = 0;
let rejected = 0;
const startTime = Date.now();
const semaphore = new Array(concurrency).fill(null);
async function worker(): Promise<void> {
while (allowed + rejected < totalRequests) {
try {
const res = await fetch(url);
if (res.status === 200) allowed++;
else if (res.status === 429) rejected++;
const remaining = res.headers.get("RateLimit-Remaining");
if (rejected === 1) {
console.log(
`First rejection at request ${allowed + rejected}, remaining header: ${remaining}`
);
}
} catch {
// Connection error under load is expected
}
}
}
await Promise.all(semaphore.map(() => worker()));
const elapsed = Date.now() - startTime;
console.log(`\nLoad Test Results:`);
console.log(` Total: ${totalRequests} requests in ${elapsed}ms`);
console.log(` Allowed: ${allowed}`);
console.log(` Rejected: ${rejected}`);
console.log(` Rate: ${Math.round((totalRequests / elapsed) * 1000)} req/sec`);
}
loadTest("http://localhost:3000/api/data", 500, 20).catch(console.error);