Ondřej Flídr

Almost everyone who runs an online project has probably experienced it: an SMS or automated call from monitoring that something is not working. Your beloved project is ailing and needs your attention.

It doesn’t even have to be a catastrophe like a data centre fire or an earthquake – even a burst pipe in a building, an admin error on a network in India or a careless intervention by a user can disable systems.

At vshosting~ we do everything we can to prevent such situations – read more about this in our article about our uncompromising security. Nevertheless, not all risk factors are within our control, and sudden failures cannot be ruled out. The scenario is always the same. Suddenly, without any warning, everything goes down and the monitoring lights up red. Nothing works. And now what?

You can complain, grumble, panic… or activate your disaster recovery plans. Scenarios you’ve written down in times of calm in case something goes very, very wrong. Documents you hoped you’d never have to read.

What disaster recovery plans are and what they should look like

Disaster recovery plans are documents that clearly describe how to get your project back to working order. They take many forms and cover many possible scenarios. They can be simple lists of crisis credentials or complex procedures that involve actions across multiple departments in the company.

They should always include:

– what needs to be done
– who should do it
– how to do it
– where backups are available
– in what order to bring services back up
– when to restore which data

It’s a good idea to include login details for crisis accounts so you have them handy. The goal of disaster recovery plans is always to minimize the business impact of the “disaster” and get the project back on its feet as quickly as possible. They must be written clearly and concisely. Because if you have to follow them, it certainly won’t be in times of peace and quiet.
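As a rough illustration, the checklist above can be captured as structured data, which makes it easy to check that no mandatory part is missing. The following Python sketch is purely hypothetical: the class names, fields and sample values are assumptions made for this example, not a format that vshosting~ prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryStep:
    what: str  # what needs to be done
    who: str   # who should do it
    how: str   # how to do it (a command, a link to a procedure, ...)


@dataclass
class RecoveryPlan:
    scenario: str
    backup_locations: list[str]     # where backups are available
    service_start_order: list[str]  # in what order to bring services back up
    steps: list[RecoveryStep] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the names of mandatory parts that are still empty."""
        missing = []
        if not self.steps:
            missing.append("steps")
        if not self.backup_locations:
            missing.append("backup_locations")
        if not self.service_start_order:
            missing.append("service_start_order")
        return missing


# Hypothetical sample values, for illustration only.
plan = RecoveryPlan(
    scenario="primary database lost",
    backup_locations=["backup-server:/dumps"],
    service_start_order=["database", "application", "web"],
)
print(plan.missing_parts())
```

Here the plan is reported as incomplete because nobody has written down the actual steps yet. A check like this can be run in calm times, long before the plan is needed.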

It is common that disaster recovery plans are not purely IT in nature and overlap with other departments in the company. In addition to the technical management of the problem, you also need to maintain contact with customers or provide documentation for the legal department. Therefore, when writing disaster recovery plans, you need to enlist the help of representatives from other departments and consult with them on the individual steps. You need to make sure that the procedures make sense to everyone and that you haven’t forgotten any non-technical impact.

We can base their design on the principles of aviation. If there is a problem on the plane, the procedure is “Aviate, Navigate, Communicate”, i.e. keep the plane in the air, find out where you are and where you can go, and then deal with communication.

In IT this can be used similarly. First, try to save what you can (i.e. minimize direct impacts, even by shutting down systems if needed). Get the system to a stable enough state, and then secure communications outwards. If you have multiple administrators, the process can and should be parallelized. Can you imagine a worse situation than an admin being pestered with questions by his colleagues in sales and customer support, asking what to tell clients?


Step 1: Keep the project “afloat”

When the plane hits the ground, it’s over. You’ve lost the plane, the crew, the cargo, and the passengers. That’s why you need to keep it in the air at all costs.

In IT it works the same way. If you lose one service, you need to minimize the impact on others. Prevent a cascading effect. You can’t afford to lose everything. Even at the cost of downtime. If you lose all your data and all your systems, it’s a complete existential risk to the company. Short-term project downtime is a much lesser evil than long-term data recovery.

A typical example of this is a bug in an application that saves corrupted data during an update. In this case, it is much better to shut down the application and prevent more data from being destroyed than to leave it running and work on issuing a fix.
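Shutting down quickly is easier if the application has a switch for it. The sketch below shows one possible shape of such a switch in Python: a guard that refuses all writes once an admin (or monitoring) turns it off. The class and method names are hypothetical, and a real application would keep the flag somewhere shared, such as a configuration store, rather than in memory.

```python
class MaintenanceMode(Exception):
    """Raised when a write is refused because the switch is off."""


class WriteGuard:
    def __init__(self) -> None:
        self.writes_enabled = True
        self.storage: list[dict] = []  # stand-in for the real database

    def disable_writes(self) -> None:
        # Flipped as soon as data corruption is suspected.
        self.writes_enabled = False

    def save(self, record: dict) -> None:
        if not self.writes_enabled:
            raise MaintenanceMode("writes are disabled; refusing to store data")
        self.storage.append(record)


guard = WriteGuard()
guard.save({"order": 1})
guard.disable_writes()
try:
    guard.save({"order": 2})
except MaintenanceMode as exc:
    print(f"rejected: {exc}")
```

The second order is rejected instead of being stored in a possibly corrupted form, which is exactly the trade-off described above: downtime now, less data to repair later.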

Step 2: Get your bearings

Well done, you have successfully prevented the destruction of all data in the first step. However, the damage is still done, the application is inaccessible, clients are getting angry. This is a good time to start figuring out what happened, what the overall impact is, and develop a recovery plan.

Can you repair the app with a simple fix? Do you need to restore any data from a backup? How long will it take to return to a minimum functional state? And how long will a complete fix take? How much data have you irretrievably lost? These are all questions that are on the table right now, and you need to answer them quickly.

Step 3: Communicate with your customers

At this stage you already know what has happened, what you need to do and what the implications are for your company. Even how long it will take to repair. At this point you can communicate to customers an estimate of the time required.

We recommend designating one staff member as the communication liaison: someone who will talk to sales, support and customers from the beginning. This person also serves as a shield standing in front of the admins who are saving the day. The role is a good one to assign to someone in project management; in an agile world this would be the product owner. It’s also important to make sure everyone in the company knows who to contact with questions.

Communication outward is undoubtedly important, and in a crisis even more so. However, it should not be at the expense of operational recovery. The information passed to sales and support must be current and valid at the time it is given, but it doesn’t have to be 100% precise or unchanging. We always build on what we know at that moment.

The situation may seem easy to fix at the outset, but further investigation may reveal much more dramatic consequences. Or vice versa. That’s why we recommend never communicating deadlines and estimates as definitive. It is always necessary to say “but the situation may change”, and you should always allow for a significant extension of the recovery period just to be safe. Everyone in the company needs to take this into account.

Crisis management competences

In calm times, it is understandable that large investments or infrastructure interventions need to be discussed with management and planned thoroughly. You can take longer to select the most suitable supplier, negotiate good business terms with them, and analyse and test the impact of the solution. But once the crisis hits, prudence largely goes by the wayside.

The rescue team needs to be empowered to make decisions quickly and not be concerned with entirely optimal efficiency or economy. If a key piece of hardware “dies” and you don’t have a replacement in stock, there is no room for negotiation with suppliers. You need to take the company card, get in the car and head to the nearest store that has a new piece in stock. That comes at a higher cost, but would a long downtime really be cheaper for your project?

If you’re operating in the cloud, the recovery team must have the power to launch additional instances immediately – regardless of the infrastructure budget. Alternatively, set budget limits in advance that you can’t go beyond, but consider the cost for every minute your project is down.
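The comparison behind these decisions is simple arithmetic. A minimal Python sketch follows; the function names and all figures are made up for illustration, so substitute your own estimate of what a minute of downtime costs.

```python
def downtime_cost(minutes_down: float, cost_per_minute: float) -> float:
    """Money lost while the project is unavailable."""
    return minutes_down * cost_per_minute


def emergency_spend_pays_off(
    emergency_cost: float, minutes_saved: float, cost_per_minute: float
) -> bool:
    """True if the emergency purchase costs less than the downtime it avoids."""
    return emergency_cost < downtime_cost(minutes_saved, cost_per_minute)


# Illustrative numbers only: a replacement part bought at full retail price
# versus four more hours of downtime.
print(emergency_spend_pays_off(emergency_cost=2000, minutes_saved=240, cost_per_minute=50))
```

Working out the cost per minute in advance also gives you a defensible basis for any budget limit you hand to the recovery team.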

Backups, backups, backups

Backups and their recovery are an integral part of any disaster recovery plan. And not just having backups, but testing their functionality regularly and setting the optimal backup frequency. Regardless of whether you have infrastructure in the cloud or not.

Even in the cloud, you need to back it up!

We often see the opinion that infrastructure and data in the cloud do not need to be backed up. We hear: “that’s why we pay for the cloud, so we don’t have to spend money on backups”. This approach is flawed and can lead to the bankruptcy of a company.

In the cloud, you’re buying computing capacity, storage space and related services, but you’re not buying the security of your data. You’re only buying the security of knowing that if there’s a problem, you can restore your data back to the cloud and start a new set of servers. However, it is your responsibility to have a copy of the data to restore, to have information about the configuration of the servers, and to have the application deployment procedure described.

The cloud gives you the platform, but you give it the business value. The cloud can help you protect that value with geo-replication or a backup service, but it is up to you to use these services, configure them sensibly and have a data recovery process in place.

Backup quality testing and optimal backup frequency

You also need to be sure that you can restore your backups. All too often, a company has been backing up diligently, but when a restore was needed, the backups turned out to be unusable. And it may not just be a fault in the backup itself: backups may also be inaccessible or incomplete. That’s why they need to be tested regularly. Test that you are able to recover data from backups, and that you are backing up everything you need, in sufficient quantity.
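A restore test can be automated. The following Python sketch shows the basic idea under simplifying assumptions: the "backup" is a plain tar archive of a directory, and the test restores it into a scratch directory and compares file checksums with the source. The function names are hypothetical, and a real test would compare against a known reference state rather than live data that keeps changing.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_test(source: Path, archive: Path) -> bool:
    """Unpack the archive into a scratch directory and compare it with the source."""
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(str(archive), scratch)
        return checksums(Path(scratch)) == checksums(source)


with tempfile.TemporaryDirectory() as work:
    data = Path(work) / "data"
    data.mkdir()
    (data / "orders.csv").write_text("id,total\n1,100\n")
    # Create the "backup" next to the data directory, not inside it.
    archive = shutil.make_archive(str(Path(work) / "backup"), "gztar", root_dir=data)
    print(restore_test(data, Path(archive)))
```

The important part is that the test performs an actual restore. Checking that a backup file exists, or that the backup job reported success, does not tell you whether the data can be recovered.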

You will never have 100% of your data backed up; you will always lose something. The question is whether you can afford to lose a week, a day or an hour of data. How much data are you willing to lose? Or how much money is your company willing to invest in data backup and recovery? The relationship here is non-linear: if backing up once a day costs X CZK, backing up twice a day doesn’t cost 2X, it costs 5X or 10X. You need to ask yourself whether the data that falls outside the backup is worth that much, or whether it is more economical to accept its loss.
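This trade-off can be put into numbers. The Python sketch below compares two backup frequencies by adding the yearly backup cost to the expected value of the data lost. Everything in it is an assumption for illustration: the function names, the prices, the value of an hour of data, and the simplification that an incident strikes on average halfway between two backups.

```python
def worst_case_loss_hours(backups_per_day: int) -> float:
    """Longest possible gap between the last backup and a failure."""
    return 24 / backups_per_day


def yearly_total(
    backup_cost_per_year: float,
    backups_per_day: int,
    data_value_per_hour: float,
    incidents_per_year: float,
) -> float:
    """Backup cost plus expected value of lost data, per year."""
    # Assumption: on average an incident hits halfway between two backups.
    average_hours_lost = worst_case_loss_hours(backups_per_day) / 2
    expected_loss = average_hours_lost * data_value_per_hour * incidents_per_year
    return backup_cost_per_year + expected_loss


# Illustrative figures: backing up twice a day costs five times as much.
daily = yearly_total(1000, backups_per_day=1, data_value_per_hour=200, incidents_per_year=1)
twice_daily = yearly_total(5000, backups_per_day=2, data_value_per_hour=200, incidents_per_year=1)
print(daily, twice_daily)
```

With these made-up figures the daily backup comes out cheaper overall. If an hour of data were worth ten times as much, the result would flip, which is why the calculation has to be done with your own numbers.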

What to take away from this

Nothing is 100% and mistakes happen. Every admin has experienced a similar situation and it’s never pleasant. Still, you can always prepare for the situation in advance and minimize its impact on the company’s operations.

It’s not free and it’s not easy. And in quiet times it may seem like it does nothing for you, but it will prevent big problems in the future. And it’s always worth it. Think about it.

Want advice on the optimal backup mode for your project? Contact our experts: consultation@vshosting.eu. They will prepare a tailor-made solution for you free of charge.


vshosting~

Imagine you’re in the middle of the peak season, marketing campaigns are in full swing and orders are pouring in. Sounds nice, doesn’t it? Unless the database suddenly stops working that is. Perhaps an inattentive colleague accidentally deletes it. Or maybe the disk array fails – it doesn’t matter in the end, the result is the same. Orders start falling into a black hole. You have no idea what someone bought and for how much, let alone where to send it to them. Of course, you have a database backup, but the file is quite large and it can take several hours to restore.

Now what?

Roll up your sleeves, start pulling the necessary information manually from email logs and other dark corners. And hope that nothing escapes you. But those few hours of recovery will be really long and incredibly expensive. Some orders will certainly be lost and you will be catching up with the hours of database downtime for a few more days.

Standard database backup (and why recovery takes so long)

Standard backup, which most larger online store operators are used to, is carried out using the so-called “dump” method, where the entire database is saved as a single file. The file contains a sequence of SQL statements that can be edited as needed. This method is very simple to implement. Another advantage is that the backup can be performed directly on the server on which the database is running.

However, a significant disadvantage of the dump is the time needed to restore the database from such a backup. This applies especially to large databases. As each statement must be loaded separately from the saved file back into the database, the whole process can take several hours. At the same time, you can only restore the data that was contained in the last dump – you will lose the latest entries in the database that have not yet been backed up. The result is the unpleasant scenario described in the introduction – a lot of manual work and lost sales.
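To see what a dump looks like and why restoring it is slow, here is a small runnable Python illustration. It uses SQLite from the standard library purely as a stand-in (the article concerns server databases such as MariaDB, whose dump tools differ in detail), and the table and values are invented.

```python
import sqlite3

# Build a tiny source database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
source.executemany("INSERT INTO orders (total) VALUES (?)", [(100.0,), (250.5,)])
source.commit()

# The dump is plain text: one SQL statement after another.
dump_statements = list(source.iterdump())
for statement in dump_statements:
    print(statement)

# Restoring means executing every statement again on an empty database.
restored = sqlite3.connect(":memory:")
restored.executescript("\n".join(dump_statements))
print(restored.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

With two rows this is instant. With hundreds of millions of rows, the database has to parse and execute every statement and rebuild its indexes, which is where the hours go.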


Premium backup with Point-in-time recovery

In order for our clients to avoid similar problems, we offer them premium database backups. This service allows for very fast recovery of databases, to the state just before the moment of failure. We achieve this by combining snapshot backups with binary log replication.

How does it work exactly?

We create an asynchronous replica from the primary database to the backup server. On this backup server, we make a backup using a snapshot. In parallel, we continuously copy binary logs to the backup server, which record all changes in the primary database. In the event of an accident, the logs will help us determine exactly when the problem occurred. At the same time, thanks to them, we have records of operations that immediately preceded the accident and are not backed up by a snapshot.

By combining these two methods, we can – in case of failure – quickly restore the database to its original state (so-called Point-in-time recovery, recovery to a point in time).

First, we take the latest snapshot on the backup server and copy it to the primary server. Then, in the binary logs, we find the point at which the destructive operation took place and replay the logs up to that point to restore the most recent data.
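The logic of this two-step recovery can be shown with a toy model. In the Python sketch below, the "database" is a dictionary, the snapshot is a copy of it taken at a known log position, and the log is a list of changes. This is a conceptual illustration only: the names are hypothetical, and real binary logs are replayed with the database's own tools, not with code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    position: int         # increasing position in the log
    key: str
    value: Optional[str]  # None means "delete this key"


def point_in_time_recover(
    snapshot: dict[str, str],
    snapshot_position: int,
    log: list[LogEntry],
    stop_before: int,
) -> dict[str, str]:
    """Start from the snapshot, then replay the log up to (not including) stop_before."""
    state = dict(snapshot)
    for entry in log:
        if entry.position <= snapshot_position:
            continue  # already contained in the snapshot
        if entry.position >= stop_before:
            break     # the destructive operation and everything after it
        if entry.value is None:
            state.pop(entry.key, None)
        else:
            state[entry.key] = entry.value
    return state


log = [
    LogEntry(1, "order:1", "paid"),
    LogEntry(2, "order:2", "new"),
    LogEntry(3, "order:3", "new"),
    LogEntry(4, "order:1", None),  # the accidental delete
    LogEntry(5, "order:2", None),
]
snapshot = {"order:1": "paid"}     # taken after log position 1

recovered = point_in_time_recover(snapshot, snapshot_position=1, log=log, stop_before=4)
print(recovered)
```

The recovered state contains all three orders, including the two that were created after the snapshot, while the accidental deletes at positions 4 and 5 are never applied.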

The whole process can be up to 10 times faster than recovery from a dump. It is limited only by the write speed of the disk and the network connection. With a database of around 100 GB, the entire process will take in the order of tens of minutes.
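A back-of-the-envelope estimate shows where "tens of minutes" comes from. The Python sketch below assumes the copy is limited by the slower of disk and network, and ignores the time needed to replay the binary logs. The throughput figures are assumptions (roughly a 1 Gbit/s link and a moderately fast disk), not measurements.

```python
def restore_minutes(db_size_gb: float, disk_write_mb_s: float, network_mb_s: float) -> float:
    """Estimate the time to copy a snapshot, limited by the slower of disk and network."""
    bottleneck_mb_s = min(disk_write_mb_s, network_mb_s)
    seconds = db_size_gb * 1000 / bottleneck_mb_s
    return seconds / 60


# Assumed throughput: about 110 MB/s over the network, 200 MB/s to disk.
print(round(restore_minutes(100, disk_write_mb_s=200, network_mb_s=110)))
```

The same function makes it easy to see how a faster link or faster storage between the backup server and the primary server shortens the recovery.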

What is needed for implementation?

Unlike the classic dump backup, which you can perform directly on the primary server, you need a backup server for the premium option. This server should have similar performance to the production server. The size of the storage is also important: we recommend about twice the volume of the disk with the primary database. This capacity should allow snapshots to be kept for at least the last 48 hours (if you opt for hourly backups).

The ideal storage volume depends on the frequency of backups, the number of changes in your database, and other factors. We will be happy to recommend the right size for your database – book a free consultation at consultation@vshosting.eu.
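For a first rough estimate before such a consultation, a simple model may help. The Python sketch below assumes copy-on-write snapshots, where each retained snapshot takes up roughly the amount of data changed since the previous one, and it ignores the space needed for the binary logs themselves. The function name and the sample figures are hypothetical.

```python
def backup_storage_gb(
    db_size_gb: float, hourly_change_gb: float, retention_hours: int = 48
) -> float:
    """One full copy of the database plus the changes held by each hourly snapshot."""
    return db_size_gb + hourly_change_gb * retention_hours


# Illustrative figures: a 500 GB database with about 10 GB of changes per hour.
print(backup_storage_gb(500, hourly_change_gb=10))
```

With these figures the result lands close to twice the database size, in line with the rule of thumb above. A database with a heavier write load would need proportionally more.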

Premium backup also depends on the choice of database technology. Because it relies on a log of all changes, it can only be implemented in relational databases such as MariaDB (binary log) or PostgreSQL (where the write-ahead log plays the equivalent role). NoSQL databases that do not keep a comparable transaction log are not compatible with this method.

Another condition is a more conservative database setup on the backup server. The data on disk must always be consistent in order to take snapshots using ZFS, so settings that prioritize database performance over consistency cannot be used on the backup server. To compensate, the backup server needs faster storage than the primary server, where a higher-performance setting that reduces consistency is feasible.

Is the premium database backup for you?

If you can’t afford to lose any data in your business, let alone run for hours without a database, our premium backup with Point-in-time recovery is right for you. An example of a project that will benefit the most from this service is an online store with large databases, for which an outage would cost thousands of euros. In this case, an investment in the backup server needed for premium backup will pay off very quickly.

Conversely, if you have a smaller database with just a few changes per hour, you’re probably perfectly fine opting for a standard dump backup.

If you have any questions, we’ll be happy to advise you free of charge: consultation@vshosting.eu.

