How We Eliminated 4.1 Billion Redis Calls Per Day with 10 Lines of Code

July 3, 2026 · 4 min read

AWS · Scaling · Caching · Performance Optimization


At NUS Technology, we specialize in designing, building, and maintaining Operations Backbone Platforms for businesses with complex workflows. When operations outgrow off-the-shelf software, or when transaction volumes surge, systems often break down or become prohibitively expensive to run. Recently, our engineering team tackled a massive scaling challenge for a high-traffic platform processing 82 million API requests daily.

What looked like a job for a complex architectural overhaul was ultimately solved with extreme simplicity. Here is how we used a 3-tier caching strategy to eliminate billions of unnecessary Redis calls and cut the cost of the cache tier by more than 90%, proving that mature engineering is often about knowing exactly what not to build.

The Problem: When the "Single Source of Truth" Becomes a Bottleneck

In distributed systems, managing configuration states — like feature flags, rate limits, and toggles — across multiple instances is a standard challenge. For this system, the backend ran on a PM2 cluster distributed across multiple EC2 hosts. Because configurations needed to be updated at runtime without a deployment, Redis was used as the single source of truth.

However, this design introduced a severe bottleneck. Every single API request loaded approximately 50 active configuration keys from Redis, and each key was a separate read over the network.

The math was staggering: 82,000,000 requests/day × 50 keys = 4.1 billion Redis reads per day. Redis was forced to handle an average of 47,454 operations per second, with traffic spikes peaking at 142,361 ops/sec.
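The figures above can be checked with a few lines of arithmetic. The variable names below are ours, chosen for illustration; the inputs are the numbers quoted in this article.

```javascript
// Inputs quoted in the article.
const requestsPerDay = 82_000_000;
const keysPerRequest = 50;
const secondsPerDay = 24 * 60 * 60;

// One Redis read per configuration key per request.
const redisReadsPerDay = requestsPerDay * keysPerRequest;
const averageOpsPerSec = redisReadsPerDay / secondsPerDay;

// Ratio of the quoted peak (142,361 ops/sec) to the computed average.
const peakToAverage = 142_361 / averageOpsPerSec;

console.log(redisReadsPerDay);             // 4100000000
console.log(Math.round(averageOpsPerSec)); // 47454
console.log(peakToAverage.toFixed(2));     // 3.00
```

In other words, the quoted peak is roughly three times the daily average.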

To keep the system from collapsing under this load, the infrastructure relied on an expensive AWS ElastiCache setup using a pair of r7g.2xlarge instances, costing the business $1,044 per month. Redis is incredibly fast, but querying it over the network billions of times a day for relatively static configuration data is an anti-pattern.

Architecture Before: The Redis Bottleneck

82M requests/day hammering AWS ElastiCache Redis with 4.1 billion calls per day

The Solution: A 3-Tier Cache Architecture in 10 Lines of Code

Faced with this bottleneck, junior engineering teams might reach for complex solutions: implementing Pub/Sub mechanisms to sync state across all EC2 instances, migrating databases, or endlessly scaling up AWS instances.

Instead, our approach to platform modernization is rooted in practical, sustainable engineering. We implemented a simple local in-memory cache using a plain JavaScript object. With just 10 lines of code, we created a 3-tier cache structure (in-memory → Redis → MySQL). Each configuration value is stored in process memory alongside an expiry timestamp 60 seconds in the future; once an entry expires, the next read falls through to Redis and refreshes the local copy.
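The production code is not reproduced in this article, so the following is only a minimal sketch of the idea under stated assumptions. The names createConfigCache, fetchFromRedis, and fetchFromMySQL are hypothetical; the two fetchers are async functions supplied by the caller and stand in for whatever Redis and MySQL clients the application actually uses. The injectable now function is there only so the expiry behaviour can be exercised without waiting 60 seconds.

```javascript
// Sketch of a 3-tier config lookup: in-memory -> Redis -> MySQL.
// fetchFromRedis / fetchFromMySQL are hypothetical caller-supplied async functions.
function createConfigCache({ fetchFromRedis, fetchFromMySQL, ttlMs = 60_000, now = Date.now }) {
  const local = Object.create(null); // plain object: key -> { value, expiresAt }

  return async function getConfig(key) {
    const hit = local[key];
    if (hit && hit.expiresAt > now()) return hit.value;    // tier 1: process memory

    let value = await fetchFromRedis(key);                 // tier 2: Redis
    if (value == null) value = await fetchFromMySQL(key);  // tier 3: MySQL fallback

    local[key] = { value, expiresAt: now() + ttlMs };      // refresh the local entry
    return value;
  };
}

// Usage with in-memory stand-ins for Redis and MySQL (illustrative data only).
const calls = { redis: 0, mysql: 0 };
const redisData = { 'feature.newCheckout': 'on' };
const mysqlData = { 'rateLimit.perMinute': '600' };
let fakeNow = 0;

const getConfig = createConfigCache({
  fetchFromRedis: async (key) => { calls.redis += 1; return redisData[key] ?? null; },
  fetchFromMySQL: async (key) => { calls.mysql += 1; return mysqlData[key] ?? null; },
  now: () => fakeNow,
});
```

Within the TTL window, repeated calls to getConfig for the same key are answered from process memory and never reach the fetchers. This sketch also caches a missing key as null for one TTL period and does not write MySQL results back to Redis; whether either behaviour is desirable depends on the application.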

Architecture After: The 3-Tier Cache Shield

Architecture after: a local in-memory map with a 60-second TTL absorbs 99.98% of traffic before Redis

What about memory footprint? Caching in RAM can cause Out-Of-Memory (OOM) crashes if it isn't managed correctly. However, because we only cached specific configuration keys (around 160 distinct keys) rather than user-specific or request-specific data, the memory footprint stayed completely flat at just a few kilobytes per process — regardless of how much traffic spiked.
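To illustrate why the footprint is bounded by the number of distinct keys rather than by traffic, here is a small sketch. The key names and values are synthetic placeholders, and the serialized JSON size is only a rough proxy for real heap usage, not a measurement of it.

```javascript
// Synthetic illustration: many lookups, but only 160 distinct config keys.
const local = Object.create(null);

function remember(key, value, ttlMs = 60_000) {
  // Overwrites the existing entry for the key, so the object never grows
  // beyond one entry per distinct key.
  local[key] = { value, expiresAt: Date.now() + ttlMs };
}

const distinctKeys = 160;
for (let request = 0; request < 100_000; request += 1) {
  remember(`config.key.${request % distinctKeys}`, 'enabled');
}

const entryCount = Object.keys(local).length;
const approxBytes = Buffer.byteLength(JSON.stringify(local), 'utf8');

console.log(entryCount);  // 160
console.log(approxBytes); // rough serialized size in bytes
```

However many requests are simulated, the entry count stays at 160. The same would not hold if user-specific or request-specific keys were cached this way.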

Engineering Trade-offs: The Art of Intentional Inconsistency

The hallmark of a senior engineering team is the ability to weigh business risks against technical complexity. By caching data locally on each instance, we intentionally introduced eventual consistency.

When a configuration is updated, it can take up to 60 seconds for every instance to reflect the change. We accepted this 60-second stale window because, from a business perspective, a one-minute delay in applying a feature flag or rate-limit change has no financial impact. Conversely, highly sensitive operations (balances, trades, payment processing) continue to bypass the local cache entirely to ensure real-time accuracy.
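One way to keep that boundary explicit is an allowlist of keys that are safe to serve stale. This is a sketch under our own assumptions, not the platform's actual code: CACHEABLE_KEYS, createReader, fetchFresh, and the sample key names are all hypothetical.

```javascript
// Only keys on this allowlist may be served from the local cache.
// The key names are hypothetical examples.
const CACHEABLE_KEYS = new Set(['feature.newCheckout', 'rateLimit.perMinute']);

function createReader({ fetchFresh, ttlMs = 60_000, now = Date.now }) {
  const local = Object.create(null);

  return async function read(key) {
    // Anything not explicitly allowlisted (balances, trades, payments)
    // always goes to the source of truth.
    if (!CACHEABLE_KEYS.has(key)) return fetchFresh(key);

    const hit = local[key];
    if (hit && hit.expiresAt > now()) return hit.value; // may be up to ttlMs stale

    const value = await fetchFresh(key);
    local[key] = { value, expiresAt: now() + ttlMs };
    return value;
  };
}

// Usage with an in-memory stand-in for the source of truth.
const source = { 'feature.newCheckout': 'on', 'balance:user-1': 100 };
let fakeNow = 0;
const read = createReader({ fetchFresh: async (key) => source[key], now: () => fakeNow });
```

Making the cache opt-in per key means a newly added sensitive key is read in real time by default, rather than being cached by accident.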

We also considered the "kill-switch" edge case: disabling the site in an emergency could be delayed by up to 60 seconds. We judged that acceptable, because the alternative was the considerable complexity of a distributed Pub/Sub cache-invalidation system.

The Impact: Massive Cost and Performance Optimization

The results of this minimal code change were immediate and dramatic:

99.98% reduction in load: Redis reads plummeted from 4.1 billion to just 720,000 per day.
Ops/sec stabilized: The average load on Redis dropped from 47,454 ops/sec down to a mere ~8 ops/sec.
Infrastructure right-sizing: Because the heavy baseline load was eliminated, we safely downgraded the AWS ElastiCache cluster from the expensive r7g.2xlarge pair to a much smaller t4g.medium pair.
Cost savings: Monthly AWS costs for this component dropped from $1,044 to $88, saving the client $11,472 per year.
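The results above can be reproduced from the quoted figures. The last value, refreshes per minute, is our own derivation: with a 60-second TTL, each cached key in each process is re-read from Redis at most once a minute. The article does not state how many worker processes were running, so any breakdown of that number (for example, 10 processes each refreshing 50 keys) is only one combination that would match.

```javascript
// Inputs quoted in the article.
const readsBefore = 4_100_000_000;
const readsAfter = 720_000;
const secondsPerDay = 24 * 60 * 60;
const minutesPerDay = 24 * 60;

const reductionPct = (1 - readsAfter / readsBefore) * 100;
const opsPerSecAfter = readsAfter / secondsPerDay;
const refreshesPerMinute = readsAfter / minutesPerDay; // fleet-wide key refreshes
const annualSaving = (1044 - 88) * 12;                 // USD per year

console.log(reductionPct.toFixed(2));   // 99.98
console.log(opsPerSecAfter.toFixed(1)); // 8.3
console.log(refreshesPerMinute);        // 500
console.log(annualSaving);              // 11472
```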

With the baseline load on Redis drastically reduced, the system is far more resilient. Because most configuration reads are now served from process memory, a traffic spike adds very little extra load to Redis.

Are Your Operations Outgrowing Your Software?

This case study is a prime example of our core philosophy at NUS Technology: Clarity in Strategy. Excellence in Execution.

We don't just deliver isolated apps; we operate like an internal engineering team, owning reliability, performance, and continuous improvement as your operations scale. Whether it involves complex systems integration — handling legacy platforms and real-time hardware where failure isn't an option — or modernizing a platform that is struggling under its own weight, we build software that bends to your real-world processes.

Your operations are too important to run on workarounds. Contact NUS Technology today to discover what a reliable, scalable operational backbone looks like for your enterprise.
