
Imagine you’re in the middle of the peak season: marketing campaigns are in full swing and orders are pouring in. Sounds nice, doesn’t it? Unless the database suddenly stops working, that is. Perhaps an inattentive colleague accidentally deletes it, or maybe the disk array fails; in the end it doesn’t matter, the result is the same. Orders start falling into a black hole. You have no idea what anyone bought or for how much, let alone where to ship it. Of course, you have a database backup, but the file is quite large and restoring it can take several hours.

Now what?

Roll up your sleeves and start pulling the necessary information manually from email logs and other dark corners, hoping that nothing slips through. Those few hours of recovery will feel very long and prove incredibly expensive. Some orders will inevitably be lost, and you will spend several more days catching up on the backlog caused by the database downtime.

Standard database backup (and why recovery takes so long)

The standard backup that most larger online stores are used to is performed using the so-called “dump” method: the entire database is exported into a single file containing the sequence of commands needed to recreate it, which can be edited as needed. This method is very simple to implement, and a further advantage is that the backup can be performed directly on the server where the database is running.

However, a significant disadvantage of the dump is the time needed to restore the database from such a backup, especially for large databases. Because each command from the saved file must be replayed against the database one by one, the whole process can take several hours. On top of that, you can only restore the data contained in the last dump, so the latest entries that have not yet been backed up are lost. The result is the unpleasant scenario described in the introduction: a lot of manual work and lost sales.
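To make the dump workflow concrete, here is a minimal sketch of how such a backup and restore could be scripted. It assumes a MariaDB/MySQL-style toolchain (the mysqldump and mysql clients) with credentials configured on the server; the host name, database name, and dump path are purely illustrative.

```python
import subprocess

DUMP_FILE = "/backups/shop-db.sql"  # illustrative path

def dump_database(host: str, db: str, user: str) -> None:
    """Export the whole database into a single SQL file (the 'dump' method)."""
    with open(DUMP_FILE, "w") as out:
        subprocess.run(
            ["mysqldump", "--single-transaction", "-h", host, "-u", user, db],
            stdout=out,
            check=True,
        )

def restore_database(host: str, db: str, user: str) -> None:
    """Replay every SQL statement from the dump file, one by one.
    For large databases this replay is what makes recovery take hours."""
    with open(DUMP_FILE) as dump:
        subprocess.run(
            ["mysql", "-h", host, "-u", user, db],
            stdin=dump,
            check=True,
        )
```

The point of the sketch is the asymmetry: the dump itself is cheap to produce, while the restore has to rebuild the database statement by statement.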

Want to dive deeper into tailored backup solutions? Visit our detailed page to learn how we can help protect your business.

Premium backup with Point-in-time recovery

To help our clients avoid problems like these, we offer premium database backups. This service allows a database to be restored very quickly, to its state just before the moment of failure. We achieve this by combining snapshot backups with binary log replication.

How does it work exactly?

We set up an asynchronous replica of the primary database on a backup server and back up this replica using snapshots. In parallel, we continuously copy the binary logs, which record every change in the primary database, to the backup server. In the event of a failure, the logs let us determine exactly when the problem occurred, and they also give us a record of the operations that immediately preceded it but are not yet captured in a snapshot.

By combining these two methods, we can, in case of failure, quickly restore the database to its original state (so-called Point-in-time recovery).

First, we restore the latest snapshot and copy it from the backup server to the primary server. Then we identify the point in the binary logs where the destructive operation took place and replay the logs up to that point to restore the most recent data.
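As a rough illustration of that restore flow, the sketch below assumes a MariaDB replica whose data directory sits on a ZFS dataset. The dataset name, host names, snapshot name, and binlog stop time are all hypothetical, and a real recovery involves additional steps (stopping the database service, verifying replication state, and so on) that are omitted here.

```python
import subprocess

ZFS_DATASET = "tank/mariadb"         # hypothetical dataset holding the replica's data directory
PRIMARY = "db-primary.example.com"   # hypothetical hosts
BACKUP = "db-backup.example.com"

def restore_latest_snapshot(snapshot: str) -> None:
    """Send the most recent ZFS snapshot from the backup server to the primary."""
    send = subprocess.Popen(
        ["ssh", BACKUP, f"zfs send {ZFS_DATASET}@{snapshot}"],
        stdout=subprocess.PIPE,
    )
    subprocess.run(
        ["ssh", PRIMARY, f"zfs receive -F {ZFS_DATASET}"],
        stdin=send.stdout,
        check=True,
    )
    send.wait()

def replay_binlogs_until(binlogs: list[str], stop_datetime: str) -> None:
    """Replay binary logs up to the moment just before the destructive operation."""
    decoded = subprocess.Popen(
        ["mysqlbinlog", f"--stop-datetime={stop_datetime}", *binlogs],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["mysql", "-h", PRIMARY], stdin=decoded.stdout, check=True)
    decoded.wait()

# Illustrative usage: restore the hourly snapshot, then roll forward
# to one second before the accidental destructive statement.
restore_latest_snapshot("hourly-2024-11-29T10:00")
replay_binlogs_until(["/backups/binlog.000123"], "2024-11-29 10:41:37")
```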

The whole process can be as much as 10 times faster than recovery from a dump; it is limited only by the disk write speed and the network connection. For a database of around 100 GB, the entire process takes on the order of tens of minutes.
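As a back-of-the-envelope check, the restore time is roughly the snapshot size divided by the slower of the network and disk throughput. The figures in this sketch are assumptions, not measurements.

```python
def restore_minutes(db_size_gb: float, network_mb_s: float, disk_write_mb_s: float) -> float:
    """Estimate snapshot copy time; the slower of network and disk is the bottleneck."""
    bottleneck_mb_s = min(network_mb_s, disk_write_mb_s)
    return db_size_gb * 1024 / bottleneck_mb_s / 60

# ~100 GB over a 1 Gbps link (~120 MB/s) onto fast local storage:
print(round(restore_minutes(100, 120, 1500)))  # -> about 14 minutes, plus binlog replay
```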

What is needed for implementation?

Unlike the classic dump backup, which you can perform directly on the primary server, the premium option requires a backup server. This server should have performance similar to the production server. The size of its storage also matters: we recommend about twice the capacity of the disk holding the primary database. That should be enough to retain snapshots covering at least the last 48 hours (assuming hourly backups).
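For illustration only, a naive sizing estimate along those lines might look like the sketch below; the change rate and retention figures are placeholders you would replace with your own numbers.

```python
def backup_storage_gb(primary_db_gb: float, gb_changed_per_hour: float,
                      retention_hours: int = 48) -> float:
    """Naive estimate: one full copy of the replica plus space for snapshot deltas.
    The ~2x primary volume recommendation serves as a practical floor."""
    estimate = primary_db_gb + gb_changed_per_hour * retention_hours
    return max(estimate, 2 * primary_db_gb)

# 100 GB database with roughly 0.5 GB of changes per hour:
print(backup_storage_gb(100, 0.5))  # -> 200.0 GB of backup storage
```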

The ideal storage volume depends on the backup frequency, the rate of change in your database, and other factors. We will be happy to recommend one for your setup: book a free consultation at consultation@vshosting.eu.

Premium backup also depends on your choice of database technology. Because it relies on binary logs (the write-ahead log in PostgreSQL), it can only be implemented for relational databases such as MariaDB or PostgreSQL. NoSQL databases typically lack an equivalent transaction log and are therefore not compatible with this method.

Another condition is a more conservative database configuration on the backup server. The data on disk must always be consistent so that usable ZFS snapshots can be taken, which rules out settings that prioritize database performance over consistency. The backup server therefore needs faster storage than the primary server, where a higher-performance configuration that relaxes consistency is feasible.

Is the premium database backup for you?

If your business can’t afford to lose any data, let alone run for hours without a database, our premium backup with Point-in-time recovery is right for you. A typical example of a project that benefits most from this service is an online store with a large database, where hours of downtime would cost thousands of euros. In that case, the investment in the backup server needed for premium backup pays off very quickly.

Conversely, if you have a smaller database with just a few changes per hour, you’re probably perfectly fine opting for a standard dump backup.

If you have any questions, we’ll be happy to advise you free of charge: consultation@vshosting.eu.



Sail into the peak season unprepared, and your cargo will probably sink while you scramble to save as much of it as possible. To avoid this, we recommend preparing in advance for the expected fluctuations in infrastructure load. Among the most demanding e-commerce events are Christmas and Black Friday.

Online stores traditionally start preparing for Christmas in the summer. That is when it’s time to think not only about marketing campaigns to attract as many customers as possible, but also about the technical infrastructure that has to withstand their influx to the website. The key is to know your site’s average traffic and how the last peak season compared to it. With that comparison, you can roughly calculate how much of an increase to expect this year.

But what if you do even better than last year? That would be great, but be prepared for this very desirable scenario as well. We recommend having extra infrastructure capacity of approximately 20% on top of the estimate from the calculation above. However, it is always best to consult your hosting provider directly about a tailor-made capacity reserve.

How to calculate the necessary capacity

For a simple calculation of infrastructure capacity, it is enough to compare the expected numbers with current data. If you assume the application scales linearly, take last year’s high-season increase in traffic relative to the average traffic in the first half of that year. Apply the same percentage increase to this year’s traffic and you’ll know roughly what system resources you’ll need this time around, as the example below illustrates.
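A minimal sketch of that linear estimate, with placeholder traffic numbers and the roughly 20% reserve mentioned earlier:

```python
def expected_peak_load(last_year_avg: float, last_year_peak: float,
                       this_year_avg: float, safety_margin: float = 0.20) -> float:
    """Scale this year's average traffic by last year's peak/average ratio,
    then add the ~20% capacity reserve recommended above."""
    growth_factor = last_year_peak / last_year_avg
    return this_year_avg * growth_factor * (1 + safety_margin)

# e.g. last year: 1,000 req/s average, 2,500 req/s at peak; this year: 1,400 req/s average
print(expected_peak_load(1_000, 2_500, 1_400))  # -> 4200.0 req/s to size the infrastructure for
```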

The advantages of this method are its speed and minimal cost. However, it is only a rough approximation. A more accurate alternative would be the so-called performance test. During this process, we simulate large traffic using an artificial load, while monitoring which components of the infrastructure become bottlenecks. This method also reveals configuration or technological limitations. However, it is fair to mention that performance tests are time-consuming as well as highly specific depending on the technologies used. For small and medium-sized online stores, they can therefore be unnecessarily expensive.

Pro tip: the popular Redis database, for example, is single-threaded. Once a single core is saturated, Redis has hit its maximum, and it doesn’t matter that the server has dozens of free cores available, simply because the application cannot use them.

Getting technical: 4 things to watch

CPU – beware of misleading CPU usage graphs when hyperthreading is enabled. A graph that aggregates utilization across all logical cores greatly overstates the available performance. Although hyperthreading theoretically doubles the number of cores, in practice it does not double the computing power. If such a graph shows values above 50%, you are already very close to the maximum, which typically lies somewhere between 60 and 70% depending on the type of load.

RAM – memory usage does not usually grow linearly. In databases, for example, some allocations are global while others are made separately for each connection. It is often forgotten that RAM must never fill up completely: if it does, a single small allocation request is enough for the kernel to kill the process that asked for the memory.

The operating system typically uses spare memory as a disk cache, which has a positive effect on performance. If there is not enough memory left for caching, the load on the disks increases.

Disks – low disk speed is a common reason why some operations become slow, or stop working entirely, under high load. Whether your storage is sufficient only shows up under heavy load or during a performance test. The disk load can be reduced by more aggressive caching, which in turn requires more RAM, or resolved by upgrading from SATA/SAS SSDs to NVMe disks, for example.

Capacity also needs to be considered, because it affects overall performance. All copy-on-write (COW) filesystems – for example the ZFS we use, as well as btrfs or WAFL – need spare capacity to operate. They share an unpleasant trait: once roughly 90% of the capacity is occupied, performance starts to degrade rapidly. It is important not to underestimate this: during heavy load, more data is often created and capacity is consumed faster. A simple check is sketched below.
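A trivial check for that 90% threshold might look like this; the mount point is a placeholder, and in practice this would live in your monitoring rather than a standalone script.

```python
import shutil

def cow_pool_has_headroom(mount_point: str, limit: float = 0.90) -> bool:
    """Return False once a copy-on-write pool crosses the ~90% usage mark,
    where performance starts to degrade rapidly."""
    usage = shutil.disk_usage(mount_point)
    return usage.used / usage.total < limit

if not cow_pool_has_headroom("/tank"):  # placeholder mount point
    print("Warning: pool above 90% full - expect rapid performance degradation")
```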

Network layer – especially important for cluster solutions, where servers communicate heavily with each other and the bandwidth for internal communication can easily become insufficient. Redundancy is also worth considering: the vshosting~ standard is to double the network layer using LACP, for example bonding 2x 1GE into a single 2GE interface. That creates 2GE of capacity, but in practice it is unwise to use more than 1GE of it, because beyond that point the server loses its redundancy.

Even the fact that a solution uses a 10GE interface does not mean it will suffice under all circumstances. A small developer mistake in which a simple query pulls a large amount of unnecessary data from the database (typically a SELECT * FROM … where the application then keeps only the first X rows) can easily exhaust even that much bandwidth.
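Returning to the LACP bond above, the rule of not using more than one link’s worth of traffic can be expressed as a simple check; the link counts and traffic figures here are illustrative.

```python
def bond_keeps_redundancy(link_gbit: float, links: int, traffic_gbit: float) -> bool:
    """An LACP bond offers links * link speed of capacity, but staying redundant
    means the traffic must still fit with one link out of service."""
    redundant_capacity = link_gbit * (links - 1)
    return traffic_gbit <= redundant_capacity

print(bond_keeps_redundancy(1, 2, 0.8))  # 0.8 Gbps on a 2x 1GE bond -> True, redundancy preserved
print(bond_keeps_redundancy(1, 2, 1.5))  # 1.5 Gbps -> False, a single link could not carry it
```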

Can we help evaluate your infrastructure capacity? Email us at consultation@vshosting.eu.

