Infrastructure Capacity Is Like a Cargo Ship
Overload it and it’ll go extra slow at best.
More likely, though, your cargo will sink, and you will scramble to save as much of it as possible. To avoid this, we recommend preparing in advance for the expected fluctuations in infrastructure load. Among the most demanding e-commerce events are Christmas and Black Friday.
Online stores traditionally start preparing for Christmas in the summer. That’s the time to think not only about marketing campaigns that attract as many customers as possible but also about the technical background that has to withstand their influx to the website. The key is to know both the average traffic to your site and what the last high season was like. Once you have this comparison, you can roughly calculate how much of an increase to expect this year.
But what if you do even better than last year? That would be great, so be prepared for this very desirable scenario as well. We recommend keeping extra infrastructure capacity of approximately 20% on top of the estimate from the calculation above. However, it is always best to consult your hosting provider directly about tailor-made capacity reserves.
How to calculate the necessary capacity
For a simple calculation of infrastructure capacity, it is sufficient to compare the expected numbers with the current data. If you assume that the application scales linearly, you can take last year’s high-season traffic and work out its percentage increase over the average traffic in the first half of that year. Apply that percentage increase to this year’s average traffic and you’ll find out what system resources you’ll need this time around.
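The calculation described above can be sketched in a few lines of Python. This is only an illustration under the same linear-scaling assumption. The function name, the parameter names, and the sample traffic figures are made up for the example, and the 20% default reserve is the rule of thumb mentioned earlier.

```python
def estimate_peak_capacity(last_year_avg, last_year_peak, this_year_avg, reserve=0.20):
    """Estimate the peak load to provision for, assuming linear scaling.

    All traffic values must use the same unit (for example requests per minute).
    `reserve` is the extra headroom added on top of the estimate.
    """
    if last_year_avg <= 0:
        raise ValueError("last_year_avg must be positive")

    # How many times busier the high season was than the average period.
    seasonal_multiplier = last_year_peak / last_year_avg

    # Apply last year's seasonal increase to this year's average traffic.
    expected_peak = this_year_avg * seasonal_multiplier

    # Add the safety reserve in case this season is better than expected.
    return expected_peak * (1 + reserve)


# Hypothetical numbers: last year averaged 1,000 requests/min and peaked at 3,000;
# this year the average is 1,500 requests/min.
capacity = estimate_peak_capacity(1000, 3000, 1500)
print(f"Provision for roughly {capacity:.0f} requests/min")
```

With these sample figures the estimate comes out at about 5,400 requests per minute: a threefold seasonal increase applied to this year's average, plus the 20% reserve.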
The advantages of this method are its speed and minimal cost. However, it is only a rough approximation. A more accurate alternative is a performance test. During this process, we simulate heavy traffic using an artificial load while monitoring which components of the infrastructure become bottlenecks. This method also reveals configuration or technological limitations. However, it is fair to mention that performance tests are time-consuming and highly specific to the technologies used. For small and medium-sized online stores, they can therefore be unnecessarily expensive.
Pro tip: Watch out for single-threaded components. The popular Redis database, for example, executes commands on a single thread, so once one core is saturated, Redis has reached its maximum. It doesn’t matter that the server has dozens of free cores available, because such an application simply cannot use them.
Getting technical: 4 things to watch
CPU – beware of misleading CPU usage graphs if hyperthreading is enabled. A graph that aggregates usage across all logical cores then greatly overstates the performance that is still available. Although hyperthreading doubles the number of cores the system reports, in practice it does not come close to doubling the computing power. If you see values above 50% on such a graph, you are already very close to the real maximum, which typically lies somewhere between 60 and 70%, depending on the type of load.
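One way to read such a graph is to rescale the reported value against the point where the CPU is saturated in practice. The sketch below assumes a saturation point of 65%, the middle of the 60 to 70% range mentioned above. The function name and the default are illustrative, and the real value depends on your workload.

```python
def effective_cpu_load(reported_percent, saturation_percent=65.0):
    """Rescale an aggregated CPU usage value from a hyperthreaded server.

    `reported_percent` is the value shown on the graph (0-100).
    `saturation_percent` is the graph value at which the CPU is assumed to be
    fully used in practice (an assumption, typically 60-70).
    Returns the estimated real utilisation, capped at 100.
    """
    if not 0 < saturation_percent <= 100:
        raise ValueError("saturation_percent must be in (0, 100]")
    return min(100.0, reported_percent / saturation_percent * 100.0)


# A graph value of 50% already corresponds to roughly three quarters
# of the usable CPU performance under this assumption.
print(f"{effective_cpu_load(50.0):.0f}% of usable CPU performance")
```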
RAM – memory usage usually does not grow linearly with traffic. For databases, for example, some memory allocations are global while others are made separately for each connection. It is often forgotten that RAM must never fill up completely. If it does, a single small allocation request is enough for the kernel to start killing processes to free memory, and on Linux the victim is typically the largest memory consumer, such as the database itself.
The operating system typically uses spare memory as a disk cache, which has a positive effect on performance. If there is not enough memory left for caching, more requests have to be served from disk and disk load increases.
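The split between global and per-connection allocations can be turned into a rough memory estimate. The following is a minimal sketch. The function names, the sample sizes, and the 20% of RAM kept free for the disk cache and as a safety margin are assumptions for illustration, not values from any particular database.

```python
def estimate_db_memory_mib(global_mib, per_connection_mib, connections):
    """Global buffers are allocated once; per-connection memory scales with load."""
    return global_mib + per_connection_mib * connections


def fits_in_ram(required_mib, total_ram_mib, reserve_fraction=0.20):
    """Check that the estimate leaves `reserve_fraction` of RAM free
    (for the disk cache and as a margin against running out of memory)."""
    return required_mib <= total_ram_mib * (1 - reserve_fraction)


# Hypothetical database: 8 GiB of global buffers, 16 MiB per connection.
normal = estimate_db_memory_mib(8192, 16, 500)    # average day
peak = estimate_db_memory_mib(8192, 16, 1500)     # three times the connections

print(normal, fits_in_ram(normal, 32768))
print(peak, fits_in_ram(peak, 32768))
```

Note that tripling the number of connections in this example roughly doubles the memory requirement rather than tripling it, which is why a purely proportional estimate can be misleading for RAM.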
Disks – low disk speed is a common reason why some operations become slow or fail completely under high load. Whether the storage is fast enough will only show under high load or during a performance test. The load on the disks can be reduced by more intensive caching, which in turn requires more RAM. Another option is to upgrade, for example, from SATA/SAS SSDs to NVMe drives.
Disk capacity needs to be considered as well, because it can also affect overall performance. All filesystems using COW (copy-on-write) – for example, the ZFS we use, or file systems such as btrfs or WAFL – need free space in order to operate. All of them share an unpleasant feature: once about 90% of the capacity is occupied, performance starts to degrade rapidly. It is important not to underestimate this – in times of heavy load, more data is often created and capacity is consumed faster.
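To check whether the free space will last through the season, you can project when the roughly 90% mark will be reached. The sketch below assumes a constant daily growth rate, which is a simplification. The function name and the sample figures are illustrative.

```python
def days_until_threshold(capacity_gb, used_gb, daily_growth_gb, threshold=0.90):
    """Estimate how many days remain before usage crosses `threshold` of capacity.

    Assumes data grows by a constant `daily_growth_gb` per day.
    """
    limit_gb = capacity_gb * threshold
    if used_gb >= limit_gb:
        return 0.0                      # already past the threshold
    if daily_growth_gb <= 0:
        return float("inf")             # not growing, threshold is never reached
    return (limit_gb - used_gb) / daily_growth_gb


# Hypothetical 1 TB pool with 700 GB used.
print(days_until_threshold(1000, 700, 10))   # normal growth of 10 GB/day
print(days_until_threshold(1000, 700, 40))   # high-season growth of 40 GB/day
```

Running the projection with the growth rate you expect during the high season, rather than the yearly average, shows how quickly the margin can shrink.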
Network layer – especially important for cluster solutions, where servers communicate a lot with each other and the bandwidth available for internal communication can easily become insufficient. It is also appropriate to consider redundancy – the vshosting~ standard is to double the network layer using LACP. For example, we combine 2x 1GE into a single 2GE interface. This creates a capacity of 2GE, but in practice it is not appropriate to use more than 1GE, because above that level the server no longer has redundancy: if one link failed, the remaining one could not carry the traffic.
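The rule for bonded interfaces generalises to any number of links: plan only for the bandwidth that remains after a link failure. A minimal sketch, with a hypothetical function name and parameters:

```python
def redundant_bandwidth_gbps(links, link_speed_gbps, tolerated_failures=1):
    """Bandwidth that can be used while still surviving `tolerated_failures`
    link failures without congestion."""
    surviving_links = links - tolerated_failures
    if surviving_links < 0:
        raise ValueError("cannot tolerate more failures than there are links")
    return surviving_links * link_speed_gbps


# 2x 1GE bonded with LACP: 2GE nominal, but only 1GE should be planned for.
print(redundant_bandwidth_gbps(2, 1.0))
```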
Even a 10GE interface does not guarantee that the solution will suffice under all circumstances. All it takes is a small developer error, where a simple query transfers a large amount of unnecessary data from the database (typically select * from… followed by taking the first X rows in the application), and it is easy to exhaust even such a large bandwidth.
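The difference between the two query styles can be demonstrated with Python's built-in sqlite3 module. This is only an illustration: SQLite runs in-process, so nothing crosses a network here, and the table, column names, and the size measure are invented for the example. With a networked database, the same pattern decides how much data travels over the wire.

```python
import sqlite3


def build_demo_db(rows=1000):
    """Create an in-memory table with a large text column per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO products (name, description) VALUES (?, ?)",
        ((f"product-{i}", "x" * 1000) for i in range(rows)),
    )
    conn.commit()
    return conn


def payload_size(rows):
    """Rough size of a result set: total length of all values as text."""
    return sum(len(str(value)) for row in rows for value in row)


def wasteful_first_n(conn, n):
    """Fetch every row and column, then cut the list down in the application."""
    rows = conn.execute("SELECT * FROM products").fetchall()
    return rows[:n], payload_size(rows)


def frugal_first_n(conn, n):
    """Let the database return only the needed columns and rows."""
    rows = conn.execute(
        "SELECT id, name FROM products ORDER BY id LIMIT ?", (n,)
    ).fetchall()
    return rows, payload_size(rows)


conn = build_demo_db()
_, wasteful_size = wasteful_first_n(conn, 10)
_, frugal_size = frugal_first_n(conn, 10)
print(f"wasteful: {wasteful_size} characters, frugal: {frugal_size} characters")
```

Both functions hand ten rows to the application, but the first one pulls the entire table out of the database to do so.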
Can we help evaluate your infrastructure capacity? Email us at email@example.com.