The thundering herd problem occurs when a large number of processes, threads, or client requests that are all waiting for the same event or resource are awakened or triggered simultaneously, even though only one of them can successfully handle the event. The others go back to waiting, wasting system resources (such as CPU cycles spent on context switching) and potentially degrading performance or causing failure through resource contention.

Common Scenarios

This problem commonly appears in:

  • Operating Systems: When multiple threads wait on a single event, such as an incoming network connection, the kernel may wake all of them when the event occurs even though only one can accept the connection. The others perform unnecessary work (a context switch) before returning to the wait state.
  • Caching Systems (Cache Stampede): When a popular cached item expires, numerous simultaneous requests for that item all miss the cache and then hit the backend database or API at the same time to regenerate the data. This sudden surge can overwhelm the database and cause it to fail; the naive read-through pattern that produces this stampede is sketched after this list.
  • Distributed Systems and APIs: When a server is overwhelmed and becomes unavailable, multiple clients using a retry mechanism may all attempt to reconnect or retry failed requests at the exact same time, further exacerbating the server's overload and preventing it from recovering.
  • Scheduled Jobs: Multiple instances of a scheduled task (e.g., cron jobs on different servers) might all fire at the same moment, producing a synchronized flood of requests to a shared service.
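
To make the cache-stampede scenario concrete, here is a minimal, hypothetical sketch of the naive read-through pattern that produces it: every request that finds an expired entry calls the backend itself, so when a hot key expires, all concurrent requests regenerate it at once. `query_database` is a placeholder for the expensive backend call, not a real API.

```python
import time

cache = {}            # key -> (value, expiry timestamp)
TTL_SECONDS = 300

def query_database(key):
    """Placeholder for the expensive backend call that gets stampeded."""
    time.sleep(0.5)   # simulate a slow query
    return f"report for {key}"

def get_report(key):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                              # cache hit
    # Cache miss: once the entry expires, every concurrent caller reaches
    # this line and hits the backend at the same time.
    value = query_database(key)
    cache[key] = (value, time.time() + TTL_SECONDS)
    return value
```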

Solutions and Mitigations

Various strategies are used to mitigate the thundering herd problem:

  • Request Coalescing/Locking: A mechanism where, if a resource is already being generated or fetched, subsequent requests wait for the first request to complete and then share its result, rather than initiating their own separate requests. This can be implemented using locks or a "single-flight" mechanism; a minimal single-flight sketch follows this list.
  • Staggered/Randomized Expirations (Jitter): Instead of having all cached items expire at the same time, a random offset (jitter) is added to the expiration times to spread renewal requests out over a period; see the TTL-jitter sketch below.
  • Specialized Kernel APIs: Modern operating systems provide APIs that can be configured to wake only one waiting process or thread when an event occurs (for example, the EPOLLEXCLUSIVE flag for Linux's epoll, or I/O completion ports on Windows); see the epoll sketch below.
  • Rate Limiting and Circuit Breakers: Rate limits on backend services prevent them from being completely overwhelmed, and a circuit breaker can return a fallback response (such as stale data) when the backend is under heavy load, preventing a cascade of failures; a simplified circuit breaker is sketched below.
  • Proactive Caching: Periodically refreshing the cache in the background (e.g., via a dedicated worker) before entries expire, so that fresh data is already available when requested; see the background-refresher sketch below.
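
The sketches below illustrate these mitigations in Python; names such as SingleFlight, jittered_ttl, and CircuitBreaker are illustrative, not a particular library's API.

First, request coalescing: the first caller for a key does the work while later callers block and then share its result (error handling is omitted for brevity):

```python
import threading

class SingleFlight:
    """Coalesce concurrent calls for the same key: the first caller runs
    the function, later callers block and then share its result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> Event set when the leading call finishes
        self._results = {}    # key -> result produced by the leading call

    def do(self, key, fn):
        with self._lock:
            event = self._inflight.get(key)
            if event is None:
                # No call in flight for this key: this caller becomes the leader.
                event = threading.Event()
                self._inflight[key] = event
                leader = True
            else:
                leader = False

        if leader:
            try:
                self._results[key] = fn()   # error handling omitted for brevity
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()
            return self._results[key]

        event.wait()                        # wait for the leader to finish
        return self._results[key]
```

A call such as `flight.do(user_id, lambda: query_database(user_id))` then issues only one backend query per key, no matter how many requests arrive concurrently.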
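
Second, TTL jitter: adding a small random offset to each entry's expiration spreads renewals out in time. A minimal sketch:

```python
import random

BASE_TTL_SECONDS = 300

def jittered_ttl(base=BASE_TTL_SECONDS, spread=0.10):
    """Return the base TTL plus or minus up to 10% random jitter, so
    entries written at the same moment do not all expire together."""
    return base * (1.0 + random.uniform(-spread, spread))

# e.g. cache.set(key, value, ttl=jittered_ttl()) with whatever cache client is in use
```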
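
Third, the exclusive-wakeup kernel API. This Linux-only sketch assumes Python 3.6+ (where `select.EPOLLEXCLUSIVE` is exposed) and pre-forks workers that all wait on one listening socket; the kernel then wakes at most one of them per incoming connection instead of all of them:

```python
import os
import select
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 8080))
listener.listen(128)
listener.setblocking(False)

for _ in range(4):                 # pre-fork a few worker processes
    if os.fork() == 0:             # child: run its own event loop
        ep = select.epoll()
        # EPOLLEXCLUSIVE asks the kernel to wake only one waiter per event.
        ep.register(listener.fileno(), select.EPOLLIN | select.EPOLLEXCLUSIVE)
        while True:
            for _fd, _events in ep.poll():
                try:
                    conn, _addr = listener.accept()
                except BlockingIOError:
                    continue       # another worker already took the connection
                conn.sendall(b"handled by worker %d\n" % os.getpid())
                conn.close()

os.wait()                          # parent: wait on the workers
```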
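
Fourth, a simplified (single-threaded) circuit breaker that trips after repeated backend failures and serves a fallback, such as stale cached data, until a cool-down period has passed:

```python
import time

class CircuitBreaker:
    """Trip open after repeated backend failures and serve a fallback
    (e.g. stale cached data) until a cool-down period has passed."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return fallback()       # open: short-circuit to the fallback
            self.opened_at = None       # half-open: try the backend again
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()
            return fallback()
        self.failures = 0
        return result
```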
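
Finally, proactive refresh: a single background worker repopulates a hot key before its TTL elapses, so foreground requests never see the key expire. A minimal sketch reusing the in-memory `cache` dictionary and `query_database` placeholder from the stampede example above:

```python
import threading
import time

def start_background_refresher(key, fetch, cache, interval=240, ttl=300):
    """Refresh one hot cache entry from a dedicated background thread,
    shortly before the previous value would expire."""
    def loop():
        while True:
            cache[key] = (fetch(key), time.time() + ttl)
            time.sleep(interval)           # refresh ahead of the TTL
    threading.Thread(target=loop, daemon=True).start()

# e.g. start_background_refresher("daily-report", query_database, cache)
```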