How to Measure Your CDN’s Cache Hit Ratio and Increase Cache Hits
What happens when a user on your web app initiates a web request?
The request needs to travel from the client to the origin server that hosts the web app. To do this, the request is split into data packets, each encapsulated by the network protocols in use, and sent across the internet’s backbone, a network of high-capacity cables. Now suppose that the user who initiated the request is in India and the origin server is located in Canada. The packets will pass through many switches and routers across several ISP networks, covering more than ten thousand kilometers. The server’s response must then make a comparable journey back to the client, though not necessarily along the identical route. Signals therefore have to cross this distance at least twice to complete a single request and response, and the server itself adds delay when it is under load.
This is a big reason for network latency today and it is a big no-no for websites and web applications that want to be competitive. Even a few seconds of latency might result in a bad user experience and lead to user churn.
To address this challenge, Content Delivery Networks (CDNs) have become widely adopted and have transformed how content is delivered online. Today, most leading service providers use CDNs to deliver content to their users. A CDN improves on a single-server setup in several ways: by optimizing file storage, deploying better hardware to store and process requests, and, most importantly, caching content close to users. This leads us to our essential question.
Why is Caching an Important Strategy for CDNs?
Caching is the process of storing a copy of a file in a temporary storage location so that it can be served again quickly. CDN edge servers, which act as proxies for the origin server, cache content such as HTML pages, JavaScript files, and images to reduce latency.
The speed of content delivery is a critical parameter for web applications. When a user’s browser requests content from a website through a CDN for the first time, the CDN fetches the content from the origin server, delivers it, and caches it, essentially saving a copy in the data center nearest to the user. When users request the same content in the future, it is served from the cache, which makes content delivery faster.
After a TCP handshake is made, the client machine will make an HTTP request to the CDN. In case the content has not yet been cached, the CDN will make the request to the origin server, and then download the content from the origin. This results in an additional request between the origin server and the CDN’s edge server.
This is what happens during CDN caching:
- When the user requests a webpage, the request is routed to the CDN’s nearest edge server.
- If the content is in the edge server’s cache, the edge server delivers it directly from the cache.
- If the content is not in its cache, the edge server requests it from the origin server.
- The origin server responds to the edge server’s request.
- Finally, the edge server responds to the client and caches the response for future requests.
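The steps above can be sketched as a toy in-memory edge cache. This is only an illustration, not how any particular CDN is implemented: the `EdgeCache` class, the `fake_origin` function, and the `"HIT"`/`"MISS"` labels are all hypothetical names chosen for the example, and real edge servers also handle expiry, eviction, and request routing.

```python
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of a CDN edge server (illustrative only)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # Hypothetical callable standing in for a request to the origin server.
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> Tuple[bytes, str]:
        """Return (body, status), where status is "HIT" or "MISS"."""
        if url in self._store:
            # Content is already cached: serve it without contacting the origin.
            self.hits += 1
            return self._store[url], "HIT"
        # Cache miss: fetch from the origin and keep a copy for future requests.
        body = self._fetch_from_origin(url)
        self._store[url] = body
        self.misses += 1
        return body, "MISS"


def fake_origin(url: str) -> bytes:
    """Stand-in origin server that returns a predictable body."""
    return f"content of {url}".encode()


edge = EdgeCache(fake_origin)
first = edge.get("/logo.png")   # first request goes to the origin
second = edge.get("/logo.png")  # repeat request is served from the cache
print(first[1], second[1])
```

The first request for a URL is a miss and every later request for the same URL is a hit, which is why content that is requested repeatedly benefits most from caching.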
What is a Cache Hit Ratio?
A CDN’s performance in delivering a piece of requested content can be measured by the average time it takes to access that content: the lower, the better. This, in turn, depends mainly on how readily available the content is (i.e. how close it is to the user). If a browser requests a piece of content and the CDN has it cached, the CDN delivers that content directly. This is referred to as a cache hit.
However, if the content is not available in the cache, the CDN must request it from the origin server before it can respond. This is classified as a cache miss, and it delays the delivery of the requested content.
Hit ratio = successful hits / total requests
A CDN provides cache storage. Cache hit ratio measures how many content requests a cache can deliver successfully from its cache storage, compared to how many requests it receives. A high-performing CDN will have a high cache hit ratio.
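As a minimal sketch of the formula, the function below computes the ratio from a list of per-request cache statuses. It assumes you have already extracted a `"HIT"` or `"MISS"` label for each request, for example from your CDN’s logs; the label format and the `cache_hit_ratio` name are assumptions for this example, since log formats differ between providers.

```python
from typing import Iterable


def cache_hit_ratio(statuses: Iterable[str]) -> float:
    """Return hits / total requests, or 0.0 when there are no requests."""
    total = 0
    hits = 0
    for status in statuses:
        total += 1
        # Anything that is not labeled "HIT" counts as a miss in this sketch.
        if status.strip().upper() == "HIT":
            hits += 1
    return hits / total if total else 0.0


sample = ["HIT", "MISS", "HIT", "HIT"]
ratio = cache_hit_ratio(sample)  # 3 hits out of 4 requests
print(f"{ratio:.0%}")  # prints "75%"
```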
The cache hit ratio is an important metric for any cache, not only a CDN, but it is especially vital for a CDN. For dynamic websites, where content changes frequently, the cache hit ratio will typically be lower than for static websites, although modern CDNs can cache dynamic content as well. A reputable CDN service provider should include cache hit ratios in its performance reports.
In a nutshell, high cache hit ratios result in faster web apps, while low cache hit ratios result in slower web apps. A low ratio also puts more stress on the origin server, which can lead to increased latency and dropped connections.
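To make the link between hit ratio and speed concrete, here is a rough model that treats the average response time as a weighted mix of hit and miss latencies. The 20 ms and 300 ms figures are made-up illustrative values, not measurements, and `expected_latency_ms` is a hypothetical helper; real latencies vary with geography, payload size, and load.

```python
def expected_latency_ms(hit_ratio: float, edge_ms: float, origin_ms: float) -> float:
    """Average latency when hits cost edge_ms and misses cost origin_ms."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * edge_ms + (1.0 - hit_ratio) * origin_ms


# Assumed latencies: 20 ms for a cache hit, 300 ms for a request that
# has to go all the way to the origin server.
for ratio in (0.50, 0.90, 0.99):
    avg = expected_latency_ms(ratio, edge_ms=20.0, origin_ms=300.0)
    print(f"hit ratio {ratio:.0%}: about {avg:.0f} ms on average")
```

Under these assumptions, the misses dominate the average, so each improvement in hit ratio removes a share of the slow origin round trips.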
Caching is an integral part of what a CDN does.
How to Increase Cache Hit Ratio?
- Optimize Cache-Control headers: The Cache-Control header field carries directives for caching mechanisms in both requests and responses. These directives set properties such as the maximum age of an object, after which it expires, or whether an object may be cached at all. Choose these values according to how frequently your content changes: longer lifetimes for content that rarely changes, shorter ones for content that changes often. Tuning these values can increase the number of cache hits on your CDN.
- Ignore cookies: Content served with cookies tends to be treated as uncacheable, so assets that carry cookies are likely to bypass the cache. It is therefore important to set rules for them. For example, ignore all cookies in requests for assets that you want your CDN to deliver.
- Ignore query strings: Query strings are useful in multiple ways: they help interact with web applications and APIs, aggregate user metrics, and provide information about objects. The problem arises when query strings are appended to static object URLs. Unless configured otherwise, the CDN treats each distinct URL, query string included, as a separate object and forwards the request to the origin server. Each such request is counted as a cache miss even though the same content was already in the CDN cache, which lowers the cache hit ratio unnecessarily.
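The three techniques above can be sketched in a few lines of Python. In practice these rules are usually set in your CDN’s configuration rather than in application code, so treat this as an illustration of the logic only. The extension list, the one-day `max-age`, and the function names are all assumptions for the example, and dropping query strings is only safe when they do not change the content that is returned.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of static asset extensions; adjust for your own site.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")


def is_static(path: str) -> bool:
    """Treat a path as a static asset if it ends with a known extension."""
    return path.lower().endswith(STATIC_EXTENSIONS)


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value; the lifetimes here are illustrative."""
    if is_static(path):
        return "public, max-age=86400"  # cacheable for one day
    return "no-cache"  # caches must revalidate before reusing the response


def cache_key(url: str) -> str:
    """Drop the query string and fragment from static asset URLs."""
    parts = urlsplit(url)
    if is_static(parts.path):
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return url


def forwarded_headers(path: str, headers: dict) -> dict:
    """Strip the Cookie header from requests for static assets."""
    if is_static(path):
        return {k: v for k, v in headers.items() if k.lower() != "cookie"}
    return dict(headers)


print(cache_key("https://example.com/app.js?utm_source=mail"))
print(cache_key("https://example.com/search?q=cdn"))
```

With these rules, `app.js?utm_source=mail` and `app.js?utm_source=ad` map to the same cache key, so the second request can be a hit, while the dynamic `/search` URL keeps its query string.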
A well-implemented CDN cache will reduce your infrastructure costs, distribute load effectively, and deliver content at higher speed with lower latency.
At Medianova, we provide global CDN solutions in streaming, encoding, caching, micro caching, hybrid CDN, and website acceleration. We have delivered and managed CDNs for leading enterprises, and our state-of-the-art solutions are benchmarked against industry-leading quality parameters.
Get in touch with us to learn more about how Medianova can build and manage an optimized and dedicated CDN for you.