Chapter 0: Introduction
1. Motivation — why do we even need GPUs?
My desktop computer is equipped with an Asus MG28UQ screen (3840x2160, 144Hz, released in 2016) and an i7-4770K processor.
First, we need to estimate the number of instructions the CPU has to execute every second just to drive the screen, which we can do with the formula width x height x frames/second x instructions/pixel. We have precise values for the first three factors, but we need to make an assumption for the last one. Let's conservatively estimate that 100 instructions are required to compute a pixel's value. Thus, we have: 3840 x 2160 x 144 x 100 = 119'439'360'000. This already lands us somewhere north of 100 billion instructions per second, huh.
Next, we need to estimate how many instructions the processor can run in a second. We can use the following formula: ticks/second x cores x logical cores/core. Let's just plug the values from the manufacturer's spec sheet into the equation: 3'500'000'000 x 4 x 2 = 28'000'000'000. So around 30 billion instructions per second.
30 billion < 100 billion: while these are rough estimates, it's reasonable to conclude that the processor is not up to the task, even more so when we consider that the CPU has to deal with more than these graphical operations, e.g. the application's logic, the operating system, and whatever other programs live in the background. Even if we were to use a less fancy screen, such as a 1920x1080 one running at 60Hz, about half of the processor's power would be dedicated to rendering alone.
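If you prefer code to back-of-the-envelope arithmetic, here is a minimal C++ sketch of the same estimate. The constants are the values from my setup, and the 100 instructions/pixel figure is the same assumption as above; swap in your own numbers to compare other hardware.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // Screen: values from my setup, plus our assumed cost per pixel.
    const std::uint64_t width             = 3840;  // pixels
    const std::uint64_t height            = 2160;  // pixels
    const std::uint64_t frames_per_second = 144;
    const std::uint64_t instr_per_pixel   = 100;   // conservative assumption

    // CPU: values from the manufacturer's spec sheet.
    const std::uint64_t clock_hz         = 3'500'000'000;  // 3.5 GHz
    const std::uint64_t cores            = 4;
    const std::uint64_t threads_per_core = 2;              // logical cores per core

    const std::uint64_t required  = width * height * frames_per_second * instr_per_pixel;
    const std::uint64_t available = clock_hz * cores * threads_per_core;

    std::cout << "required:  " << required  << " instructions/second\n";
    std::cout << "available: " << available << " instructions/second\n";
}
```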
You can play with the following widget to explore different combinations of CPUs and screens:
However, my computer handles my screen just fine in practice. This is thanks to an additional component I did not mention earlier: my GeForce GTX 1060 GPU (released in 2016), which is able to run more than 4 trillion (!) operations per second.
2. What is a GPU?
The distinguishing feature of a CPU is not the fact that it computes but rather its central role as a controller of other components (hence the name Central Processing Unit). The CPU issues most of the commands to other components: memories, I/O devices, … or even computing units. In fact, a GPU is one such CPU-controlled computing unit. GPUs are specialized in a category of tasks that includes graphics rendering (as their full name, Graphics Processing Unit, suggests).
This does not answer the question of how exactly a GPU achieves its magic. There are two parts to the answer: parallelism and specialization.
GPUs are highly parallel devices, unlike the sequential CPUs (in reality, CPUs are free to rely on parallelism as long as some illusion of sequentiality is maintained). CPUs need this sequentiality because they typically run a program only once and support complex control flows; e.g., previous instructions may impact which instructions are to be run later. In contrast, consider graphics rendering: to render a pixel, we run a simple program (100 instructions in our example) with a basic control flow many times. Each run of the program handles a single pixel. The same 100 instructions are run for all pixels. Since different runs of this program do not influence each other (the color of a pixel usually does not depend on that of its neighbors), we can just cram lots of cores on the GPU and let each of them manage one run of the program! But wait, CPU cores are not cheap. How could this option be practical?
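To make that independence concrete, here is a minimal CPU-side sketch with a hypothetical per-pixel program called shade. Its result depends only on the pixel's coordinates, never on another pixel's color, so every iteration of the loop could in principle run at the same time; that is exactly the kind of work a GPU spreads across its many cores.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// A hypothetical per-pixel program: its output depends only on (x, y),
// never on the color computed for any other pixel.
std::array<std::uint8_t, 3> shade(std::uint32_t x, std::uint32_t y) {
    // Toy gradient: red grows with x, green with y.
    return {static_cast<std::uint8_t>(x % 256),
            static_cast<std::uint8_t>(y % 256),
            128};
}

int main() {
    const std::uint32_t width = 3840, height = 2160;
    std::vector<std::array<std::uint8_t, 3>> framebuffer(width * height);

    // On a CPU we visit the pixels one after the other; since no iteration
    // reads another pixel's result, a GPU is free to run them all in parallel.
    for (std::uint32_t y = 0; y < height; ++y)
        for (std::uint32_t x = 0; x < width; ++x)
            framebuffer[y * width + x] = shade(x, y);
}
```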
Luckily, GPUs are specialized devices. The instructions required for graphics rendering are quite simple (at any rate, they are much simpler than those you would find in a CPU). Simple enough, in fact, that you can build GPU cores for a fraction of the cost of a CPU core.
Although GPUs first gained traction for their use in graphics rendering, they are now widely used for any code that can leverage their massive parallelism: AI or crypto mining come to mind (or whatever kids are up to these days).
I do not currently describe the inner workings of GPUs in detail. For a first introduction to the topic, I recommend checking both this beginner-friendly video overview and this more quantitative/memory-centric one.
3. The fragmentation problem
GPUs stem from a complicated and still ongoing history. They were developed incrementally, evolving from processors driving almost-fixed pipelines to the very flexible computing devices we have today. They do not seem to have settled on a stable form yet. For instance, modern graphics cards support real-time raytracing and upscaling. Concretely, this means that some hardware manufacturers are adding custom instructions specialized for these tasks.
GPU implementations vary wildly across generations and manufacturers. This is bad for programmers: ideally, we would like to write a program once and have it run on any hardware. If our program runs fine on our computer but nowhere else, it is probably quite useless.
Luckily, there are libraries built to act as an abstraction layer for GPU programming. Vulkan is one such library. Using it, you can write a program that says "draw a triangle", and the library takes care of translating this program into a sequence of commands that your GPU understands.
4. The Vulkan API
Vulkan is the library that we use in this series. I picked it over its competitors because it is low-level, cross-platform, based on an open standard, and modern in its design (nice debugging facilities, multi-GPU support). Its extensibility and flexibility make it a good fit for the shifting nature of the GPU landscape.
5. The plan
- In the next chapter, we describe the key concepts of Vulkan.
- In the transfer chapter, we meet the simplest Vulkan workload there is: exchanging data between CPU and GPU. We consider both raw buffers (a linear array of bytes) and images (a raw buffer + some metadata).
- In the compute chapter, we finally get to tap into the massive parallelism of GPUs by running arbitrary non-graphical programs. We introduce the domain-specific language used for describing GPU programs.
- In the graphics chapter, we finally draw things. This is the most complex of the three standard workloads. We will not yet draw the images we generate to the screen — we get to real-time graphics in the next chapter.
- In the swapchain chapter, we introduce the concept of a swapchain. Swapchains control how images are presented to the screen. Not as trivial as it sounds!
- Lastly, I dump a bunch of links primarily related to best practices in the links dump chapter.