GPU Cards for AI and Much More: How to Leverage GPU Power to Your Advantage

Author: Ondřej Flídr
The use of GPU servers has become a major topic recently, and interest in them and in other specialized accelerators can be expected to keep growing in the near future. Thanks to their massively parallel computing capabilities, GPUs have become a key technology for companies seeking speed and efficiency.

Let's take a brief look at the history of GPU computing, how GPU servers can help your company specifically, and what risks and challenges come with their deployment.

A Bit of History: How It All Started

We've been striving to accelerate, optimize, and reduce the cost of data processing since time immemorial, so using special add-on cards and chips to increase computing power is nothing new. Just recall the legendary Intel 8087 math coprocessor from 1980, or the 3D accelerators of the mid-90s for rendering graphically intensive scenes.

However, at the end of the 90s, technologies and chip capabilities advanced so much that it seemed specialized cards would slowly disappear. Processor performance was rising at rocket speed, instruction sets became very complex, and graphics cards started handling not only 3D scene generation but also graphical effects, materials, and scene lighting. It seemed that the basic components could handle everything needed. Sure, there were still areas like video processing accelerators for professional cameras or hardware security modules for cryptography, but their use was limited to very specific professions and was definitely not technology that an average person would encounter in some form.

NVIDIA and CUDA Technology

In 2006 (yes, almost 20 years ago), a breakthrough came in the form of CUDA technology. NVIDIA engineers realized that graphics cards were achieving performance levels that allowed for use far beyond "drawing pictures" – for example, for data processing and mathematical operations, in some cases with significantly higher performance than computing on a CPU.


And that's when GPU computing was born.


After CUDA's success, other companies followed a similar path. AMD/ATI introduced ROCm technology, which offers similar capabilities to CUDA but focuses on optimizing the performance of AMD/ATI graphics cards. Just as in the competitive battle with Intel in the processor (CPU) field, AMD competes with NVIDIA in the graphics cards and AI accelerators market. Both companies constantly try to outdo each other in development, whether it's higher performance, larger memory capacity, better energy efficiency, or reducing waste heat production.

Why Do Graphics Cards Calculate Faster – and Can They Even Think?

Graphics cards were focused on specific types of mathematical operations from the beginning – massively parallel operations and matrix operations. In 2006, multi-core processors were just being born (the Intel Core 2 Duo was released in mid-2006), but GPUs at that time already commonly contained dozens of cores. Models with 128 graphics cores started appearing, and SLI technology brought back the option of combining multiple cards and distributing the load across them.

Another advantage of GPUs lies in their specific focus. While processors are designed to solve universal tasks, graphics chips can only perform a small range of tasks, but very quickly and efficiently. Typical examples of tasks that are perfect for GPUs are graph traversal and statistical calculations – the core of what we today call artificial intelligence. Processing trees, finding paths, and calculating with matrices can all benefit from the massive parallelization that GPUs have encoded in their "DNA."
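
To make that difference tangible, here is a minimal sketch comparing the same matrix multiplication on a CPU and on a GPU. It assumes a machine with PyTorch installed and CUDA support available; the matrix size is purely illustrative.

    # A rough comparison of the same matrix multiplication on CPU and GPU.
    # Assumes PyTorch installed with CUDA support; the matrix size is illustrative.
    import time
    import torch

    def time_matmul(device: str, n: int = 4096) -> float:
        a = torch.rand(n, n, device=device)
        b = torch.rand(n, n, device=device)
        if device == "cuda":
            torch.cuda.synchronize()   # let the GPU finish allocations before timing
        start = time.perf_counter()
        _ = a @ b                      # massively parallel matrix multiplication
        if device == "cuda":
            torch.cuda.synchronize()   # wait for the GPU kernel to complete
        return time.perf_counter() - start

    print(f"CPU: {time_matmul('cpu'):.3f} s")
    if torch.cuda.is_available():
        print(f"GPU: {time_matmul('cuda'):.3f} s")
    else:
        print("No CUDA-capable GPU detected.")

On a typical GPU server, the GPU timing comes out dramatically lower, precisely because thousands of GPU cores each handle a small slice of the matrix.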

Where Will GPU Servers Help You, and What Should You Leave to the CPU?

GPU servers make sense wherever a computational task can be divided into many independent parts that can be processed in parallel.

Typical examples include:

  • AI and machine learning: Ideal for both training and inference of models
  • Simulations and calculations in science and research: Physical simulations and complex modeling
  • Blockchain and cryptocurrencies: Mining and transaction analysis

On the other hand, tasks that cannot be parallelized, or that depend heavily on surrounding state, won't benefit from a GPU at all. That's why GPUs can't be used to accelerate relational databases or web server responses.

Challenges of GPU Computing – High Costs, Energy Consumption, and Hardware Limits

Every new technology brings disadvantages. GPU computing has three: price, energy demands, and limited hardware resources.

Modern graphics cards are very expensive and energy-intensive. While high-end AMD EPYC processors cost tens of thousands of crowns, consume in the low hundreds of watts, and generate a manageable amount of waste heat, GPUs cost hundreds of thousands to millions of crowns, "eat up" as much as 1 kW of power, and produce more than 500 W of waste heat. This must be considered not only in the design of the card itself but above all when selecting the server and the data center where it will run. A mid-range GPU server can easily consume the power budget designed for an entire rack and generate so much heat that it can be difficult to dissipate safely. As a result, a data center can host significantly fewer GPU servers than classic servers, or its power distribution and cooling need to be substantially strengthened. All of this increases the operating costs of a GPU server and thus the final price for the customer.

Another challenge GPUs bring is limited hardware resources. While it's no problem to put terabytes of RAM into a classic server, even top GPUs contain only 64–192 GB of VRAM, and it cannot be expanded. Tasks working with large amounts of data must therefore be distributed across multiple graphics cards, which brings additional demands on power and cooling.
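
As an illustration, here is a hedged sketch of how you might inspect the VRAM of each installed card and naively split a workload that doesn't fit on a single GPU. It again assumes PyTorch with CUDA support; the tensor sizes and the simple chunking scheme are illustrative only.

    # List the VRAM of each installed GPU and split a large batched workload
    # across all cards, since no single card can hold it on its own.
    # Assumes PyTorch with CUDA support; sizes and chunking are illustrative.
    import torch

    num_gpus = torch.cuda.device_count()
    if num_gpus == 0:
        raise SystemExit("No CUDA-capable GPUs found.")

    devices = [torch.device(f"cuda:{i}") for i in range(num_gpus)]
    for d in devices:
        props = torch.cuda.get_device_properties(d)
        print(f"{d}: {props.name}, {props.total_memory / 1024**3:.0f} GB VRAM")

    # Naive data parallelism: each card processes its own slice of the batch.
    batch = torch.rand(num_gpus * 8, 2048, 2048)   # one big workload in host RAM
    results = []
    for chunk, d in zip(batch.chunk(num_gpus), devices):
        x = chunk.to(d)                            # move this slice to its GPU
        results.append((x @ x.transpose(1, 2)).cpu())
    print(torch.cat(results).shape)

In practice, frameworks handle this kind of distribution for you (for example via data- or model-parallel training), but the hardware limit itself doesn't go away.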

Science and Technology First. AI Training Just as the Cherry on Top

In the vast majority of cases, we use GPU servers today for scientific and technical simulations and for image and video rendering. However, there's also a lot of talk about needing them for AI and large language models. Many e-commerce companies see AI's potential primarily in customer support, as an assistant that helps users with their purchases. But there's one catch that isn't talked about much.

While you need relatively little power for generating responses from a trained AI model, even just fine-tuning the model with your own domain knowledge (a list of products and their parameters, the history of your human agents' customer support responses, and more) requires orders of magnitude more power. This is true even when you start from pre-trained open models served through tools such as Ollama. For generating responses, a server with one to four NVIDIA H100 cards is sufficient, but for fine-tuning the model, you need at least eight cards. Adding data to the model is simply very expensive, and this needs to be considered when designing a solution.
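
For the cheaper, inference side of the story, the sketch below shows what querying an already-trained model can look like. It assumes an Ollama server running locally on its default port with a model already pulled; the model name and the prompt are illustrative.

    # Ask a locally served, pre-trained model for a single answer.
    # Assumes an Ollama server on its default port 11434 with the "llama3" model
    # already pulled; both the model name and the prompt are illustrative.
    import json
    import urllib.request

    payload = {
        "model": "llama3",
        "prompt": "Which of our backpacks is best suited for a week-long hike?",
        "stream": False,   # ask for one complete answer instead of a token stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

Fine-tuning the same model on your own catalogue and support history is an entirely different workload, and that is where the multi-GPU requirements described above come into play.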

When GPU Isn't Enough – Other Specialized Accelerators

Today, we mostly talk about GPUs, but we must not forget that other specialized accelerators exist.


For applications in computer security and cryptography, there are many HSM (hardware security module) models, which bring not only uncompromising security of encryption keys (they cannot be obtained from the device in any way, and the only possibility is physical theft of the device*) but also respectable encryption performance. HSMs are used, for example, in the financial sphere or for signing DNS records when using DNSSEC technology. With the advent of the NIS2 directive, increased interest in these modules can be expected.

Other specific accelerators include TPUs (tensor processing units) and ASIC modules for cryptocurrency mining and other blockchain workloads.

Programmable gate arrays (FPGAs) are also becoming quite popular – chips that leave the factory without any fixed logic, on which the end customer builds a specific circuit according to their requirements. Their advantages are high efficiency, lower consumption than a CPU or GPU, and easy reprogrammability. The disadvantage is that they are more limited than classic CPU applications, GPU computing, or highly specialized ASIC systems. FPGAs therefore currently play a role mainly in prototyping and small-batch production, for example in specialized network elements, traffic filtering and analysis systems ("traffic washers"), or some artificial intelligence solutions where both high flexibility and high efficiency are required.

Conclusion

Graphics cards and other accelerators are no longer just tools for cutting-edge technology but have become a key component of modern IT. Their use can significantly speed up and streamline processes in science, research, AI, or e-commerce. Nevertheless, it's important to approach their deployment strategically – evaluate not only the benefits but also costs, energy requirements, and suitability for specific tasks.

If you're thinking about using GPU servers or other specialized accelerators, we'll be happy to help you find the optimal solution tailored to your needs. Contact vshosting, and together we'll discover new ways to take your IT infrastructure performance to the next level.

Request a free consultation now and take your IT performance to the next level!

About the author

Ondřej Flídr is a senior infrastructure administrator at vshosting, a specialist in designing and managing highly available IT solutions, (non)use of public clouds, and hybrid deployments. He actively shares his expertise through articles and lectures on topics such as disaster recovery and public cloud dynamics.

*If the encryption key is stored in a file on disk, an attacker can steal it remotely – simply connect to the server and copy the file to themselves. When using HSM, the key is literally burned into the chip, and the only way to obtain it is to physically break into the data center, disconnect the device, and take it away.