Chapter 0: introduction
1. Motivation — why do we even need GPUs?
My desktop computer is equipped with an Asus MG28UQ screen
(
First, we need to estimate the number of instructions sent to the CPU every second, which we do
with the formula
Next, we need to estimate how many instructions the CPU runs in a second. We use the formula
30 billion < 100 billion: although these are rough estimates, the CPU seems to be
insufficient for the task, especially when we consider that it has to contend with more than
graphical operations: there is the rest of the application's logic, plus the operating system and
whatever other programs live in the background. Even if we were to use a less fancy screen, such
as a
1920x1080 one running at
However, this scenario is not an issue in practice. This is thanks to an additional component I did not mention earlier: my GeForce GTX 1060 GPU (released in 2016), which is able to run more than 4 trillion (!) operations per second.
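As a rough sketch of the kind of back-of-the-envelope estimate used in this section, the C snippet below multiplies pixel count, refresh rate and instructions per pixel on the screen side, and core count by clock rate on the CPU side. The function names, the one-instruction-per-cycle simplification and the numbers in the usage note are illustrative assumptions; they are not meant to reproduce the exact figures quoted above.

```c
#include <stdint.h>

/* Instructions per second needed to run a per-pixel program
 * for every pixel of every frame. */
uint64_t required_ips(uint64_t width, uint64_t height,
                      uint64_t refresh_hz, uint64_t instr_per_pixel)
{
    return width * height * refresh_hz * instr_per_pixel;
}

/* Crude CPU throughput estimate.
 * Assumption: each core retires exactly one instruction per cycle. */
uint64_t cpu_ips(uint64_t cores, uint64_t clock_mhz)
{
    return cores * clock_mhz * 1000000ULL;
}
```

For instance, `required_ips(1920, 1080, 60, 100)` (a 1920x1080 screen at an assumed 60 Hz, with 100 instructions per pixel) gives about 12.4 billion instructions per second, while `cpu_ips(4, 3000)` (a hypothetical 4-core CPU at 3000 MHz) gives 12 billion.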
2. What is a GPU?
The distinguishing feature of a CPU is not the fact that it computes but rather its central role as a controller of other components (hence the name Central Processing Unit). The CPU issues most of the commands to other components: memories, I/O devices, … or even computing units. In fact, a GPU is one such CPU-controlled computing unit. GPUs are specialized in a category of tasks that includes graphics rendering (as can be inferred from the full name, Graphics Processing Unit).
This does not tell us how a GPU achieves its magic. There are two parts to the full answer: parallelism and specialization.
GPUs are highly parallel devices, unlike the sequential CPUs (in reality, CPUs are free to rely on parallelism as long as some illusion of sequentiality is maintained). CPUs need this sequentiality to support complex control flows; e.g., previous instructions may impact which instructions are to be run later. In contrast, consider graphics rendering: to render a pixel, we run a simple program (100 instructions in our example) with a basic control flow once per pixel. Each run of the program handles a single pixel. The same 100 instructions are run for all pixels. Since different runs of this program do not influence each other (the color of a pixel usually does not depend on that of its neighbors), we can just cram lots of cores on the GPU and let each of them manage one run of the program! But wait, CPU cores are not cheap. How could this option be practical?
Luckily, GPUs are also specialized devices. The instructions required for graphics rendering are quite simple ones (at any rate, they are much simpler than those you would find in a CPU). Simple enough, in fact, that you can build GPU cores for a fraction of the cost of a CPU one.
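The independence between per-pixel runs can be sketched in plain C. This is not GPU code: `shade_pixel` is a hypothetical stand-in for the per-pixel program, and the two render functions simply visit the pixels in opposite orders. Because each pixel's color depends only on its own coordinates, both orders produce the same image, which is the property that lets a GPU hand each run to a different core.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel program: the color (packed as 0xRRGGBB)
 * depends only on the pixel's own coordinates, never on its neighbors. */
uint32_t shade_pixel(uint32_t x, uint32_t y, uint32_t width, uint32_t height)
{
    uint32_t red   = width  > 1 ? (255u * x) / (width  - 1u) : 0u;
    uint32_t green = height > 1 ? (255u * y) / (height - 1u) : 0u;
    return (red << 16) | (green << 8) | 0x40u;
}

/* Visit the pixels from first to last, as a sequential CPU loop would. */
void render_forward(uint32_t *pixels, uint32_t width, uint32_t height)
{
    for (uint32_t y = 0; y < height; ++y)
        for (uint32_t x = 0; x < width; ++x)
            pixels[(size_t)y * width + x] = shade_pixel(x, y, width, height);
}

/* Visit the pixels from last to first. The result is identical because
 * no run of shade_pixel reads what another run wrote. */
void render_backward(uint32_t *pixels, uint32_t width, uint32_t height)
{
    size_t count = (size_t)width * height;
    for (size_t i = count; i-- > 0; ) {
        uint32_t x = (uint32_t)(i % width);
        uint32_t y = (uint32_t)(i / width);
        pixels[i] = shade_pixel(x, y, width, height);
    }
}
```

Since the visiting order does not matter, nothing prevents running all iterations at once, one per core.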
Although GPUs first gained traction for their use in graphics rendering, they are now widely used for code that can leverage their massive parallelism: AI or crypto mining come to mind (or whatever kids are up to these days).
For a first introduction to the inner workings of GPUs, please refer to this beginner-friendly video overview and this more quantitative/memory-centric one.
3. Why do we need Vulkan?
GPUs stem from a complicated and still ongoing history. They were developed incrementally, evolving from processors for almost fixed pipelines to the very flexible computing devices we have today. They do not seem to have settled on a stable form yet, as exemplified by the recent emergence of features such as real-time raytracing and upscaling: hardware manufacturers are often extending their devices with new, custom instructions specialized for some tasks, and providing developers with ways of using these facilities.
GPU implementations vary wildly across generations and manufacturers. This is bad for programmers — ideally, we would like to write a program once and to have it run on any hardware (we are not merely programming for ourselves!). Alas, different GPUs have different APIs.
It would be nice if we had a library that acted as a translation layer for all those APIs. Then, we could write a program saying "draw a green sphere", and the library would take care of translating this program into a sequence of commands tailored for a specific GPU. However, writing such a library would be a Herculean task: GPUs are very complex devices, there are many of them and new ones appear constantly. The problem is therefore solved in another way: people define generic APIs for GPUs, and it is up to the manufacturers to ensure that their devices support them. Manufacturers do this by implementing the functions specified within the APIs in the drivers of their devices. In addition, a special program called a runtime is provided by the developers of an API. This program is tasked with discovering devices whose drivers support it and with streamlining interactions with them.
As you probably guessed, Vulkan is such an API.
4. Why Vulkan and not X or Y?
I picked Vulkan over its (many) competitors because it is low-level, cross-platform, based on an open standard, modern in its design (nice debugging facilities, multi-GPU support) and extensible.
5. Reading the Vulkan documentation
This series is sprinkled with links to the Vulkan documentation, such as vkCreateInstance. Here are some tips for reading these pages efficiently (and about what to ignore):
- Functions are of the form vkFunctionName, structures are of the form VkStructName, constants are of the form VK_CONSTANT_NAME.
- Many functions come in pairs: vkCreateXXX comes with a vkDestroyXXX, vkAllocateXXX comes with a vkFreeXXX.
- vkCreateXXX functions take a VkXXXCreateInfo structure for parameterizing the creation process.
- Many functions take a pAllocator argument for interacting with custom allocators. This topic is out of the scope of this series, so we just ignore these arguments.
- Vulkan structures include an sType field. This is simply a constant that lets the driver identify the struct. For structure VkStructName, the value of sType has to be VK_STRUCTURE_TYPE_STRUCT_NAME. We just ignore this field for the rest of this series.
- Many structures include a pNext field. The role of this field is to support future versions of the standard that may want to expand the initial definition of the structure, without breaking backward compatibility. We usually just ignore this field.
- Similarly, many structures contain a flags field for controlling how an operation should proceed. It is sometimes the case that no values other than 0 are supported for this field. Again, this is done for future-proofing reasons. When there are no relevant values for flags, we just ignore this field.
- We mostly avoid things added by extensions. If you see something in the library that contains the substring KHR, EXT, AMD or NV and wonder why it is not explained, it is because it is not a fundamental part of Vulkan.
- For the bulk of this series, I ignore everything that came after Vulkan 1.0: this version introduces all the main concepts that make Vulkan what it is, and this series is long enough as is. I believe that understanding later additions to the API is not too complicated once you become proficient with version 1.0. Note that the last chapter of this series gives a broad overview of post-1.0 additions to Vulkan.
The rest of this series is organized as follows:
- In the next chapter, we describe the key concepts of Vulkan.
- In the transfer chapter, we meet the simplest Vulkan workload there is: exchanging data between CPU and GPU. We consider both raw buffers (a linear array of bytes) and images (a raw buffer + some metadata).
- In the compute chapter, we finally get to tap into the massive parallelism of GPUs by running arbitrary non-graphical programs. We introduce the domain-specific language used for describing GPU programs.
- In the graphics chapter, we finally get to draw things. This is the most complex of the three standard workloads. We will not yet print the images we generate to the screen — we get to real-time graphics in the next chapter.
- In the swapchain chapter, we introduce the concept of a swapchain. Swapchains control how images are presented to the screen. Not as trivial as it sounds!
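Coming back to the documentation tips listed earlier, the sType naming rule is mechanical enough to be sketched in a few lines of plain C. The helper below, `stype_constant_name`, is hypothetical and not part of Vulkan; it only turns a structure name such as VkInstanceCreateInfo into the matching constant name. It assumes simple CamelCase names and does not handle names with digits or vendor suffixes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Vulkan function): writes the name of the sType
 * constant matching struct_name into out.
 * Returns 0 on success, -1 if struct_name does not start with "Vk" or if
 * out is too small. Assumes plain CamelCase names. */
int stype_constant_name(const char *struct_name, char *out, size_t out_size)
{
    static const char prefix[] = "VK_STRUCTURE_TYPE";
    size_t len = strlen(prefix);

    if (strncmp(struct_name, "Vk", 2) != 0 || out_size < len + 1)
        return -1;
    memcpy(out, prefix, len);

    for (const char *p = struct_name + 2; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        size_t needed = isupper(c) ? 2 : 1;

        /* Keep room for the characters we add plus the final '\0'. */
        if (len + needed + 1 > out_size)
            return -1;
        /* Each capital letter starts a new word: insert an underscore. */
        if (isupper(c))
            out[len++] = '_';
        out[len++] = (char)toupper(c);
    }
    out[len] = '\0';
    return 0;
}
```

Calling it with "VkInstanceCreateInfo" yields "VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO". In actual Vulkan code there is nothing to compute: you simply write that constant into the structure's sType field.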