Chapter 0: Introduction
1. Motivation — why do we even need GPUs?
My desktop computer is equipped with an Asus MG28UQ screen (3840x2160, 144Hz, released in 2016) and an i7-4770K processor.
First, we need to estimate the number of instructions the CPU has to execute every second just to drive the screen, which we can do with the formula width x height x frames/second x instructions/pixel. We have precise values for the first three factors, but we need to make an assumption for the last one. Let's conservatively estimate that 100 instructions are required to compute a pixel's value. Thus, we have: 3840 x 2160 x 144 x 100 = 119'439'360'000. This already lands us somewhere north of 100 billion instructions per second, huh.
Next, we need to estimate how many instructions the processor can run in a second. We can use the following formula: ticks/second x cores x logical cores/core. Let's just plug the values from the manufacturer's spec sheet into the equation: 3'500'000'000 x 4 x 2 = 28'000'000'000. So around 30 billion instructions per second.
30 billion < 100 billion: while these are rough estimates, it's reasonable to conclude that the processor is not up to the task, even more so when we consider that the CPU has to deal with more than these graphical operations, e.g. the application's logic, the operating system, and whatever other programs live in the background. Even if we were to use a less fancy screen, such as a 1920x1080 one running at 60Hz, about half of the processor's power would be dedicated to rendering alone.
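If you prefer code to back-of-the-envelope arithmetic, here is a minimal C++ sketch of the same estimate. The constants are the values from my setup, and the 100 instructions/pixel figure is the same assumption as above; swap in your own numbers to compare other hardware.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // Screen: values from my setup, plus our assumed cost per pixel.
    const std::uint64_t width             = 3840;  // pixels
    const std::uint64_t height            = 2160;  // pixels
    const std::uint64_t frames_per_second = 144;
    const std::uint64_t instr_per_pixel   = 100;   // conservative assumption

    // CPU: values from the manufacturer's spec sheet.
    const std::uint64_t clock_hz         = 3'500'000'000;  // 3.5 GHz
    const std::uint64_t cores            = 4;
    const std::uint64_t threads_per_core = 2;              // logical cores per core

    const std::uint64_t required  = width * height * frames_per_second * instr_per_pixel;
    const std::uint64_t available = clock_hz * cores * threads_per_core;

    std::cout << "required:  " << required  << " instructions/second\n";
    std::cout << "available: " << available << " instructions/second\n";
}
```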
You can play with the following widget to explore different combinations of CPUs and screens:
However, my computer handles my screen just fine in practice. This is thanks to an additional component I did not mention earlier: my GeForce GTX 1060 GPU (released in 2016), which is able to run more than 4 trillion (!) operations per second.
2. What is a GPU?
The distinguishing feature of a CPU is not the fact that it computes but rather its central role as a controller of other components (hence the name Central Processing Unit). The CPU issues most of the commands to other components: memories, I/O devices, … or even computing units. In fact, a GPU is one such CPU-controlled computing unit. GPUs are specialized in a category of tasks that includes graphics rendering (as their full name, Graphics Processing Unit, suggests).
This does not answer the question of how exactly a GPU achieves its magic. There are two parts to the answer: parallelism and specialization.
GPUs are highly parallel devices, unlike the sequential CPUs (in reality, CPUs are free to rely on parallelism as long as some illusion of sequentiality is maintained). CPUs need this sequentiality because they typically run a program only once and support complex control flows; e.g., previous instructions may impact which instructions are to be run later. In contrast, consider graphics rendering: to render a pixel, we run a simple program (100 instructions in our example) with a basic control flow many times. Each run of the program handles a single pixel. The same 100 instructions are run for all pixels. Since different runs of this program do not influence each other (the color of a pixel usually does not depend on that of its neighbors), we can just cram lots of cores on the GPU and let each of them manage one run of the program! But wait, CPU cores are not cheap. How could this option be practical?
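To make that independence concrete, here is a minimal CPU-side sketch with a hypothetical per-pixel program called shade. Its result depends only on the pixel's coordinates, never on another pixel's color, so every iteration of the loop could in principle run at the same time; that is exactly the kind of work a GPU spreads across its many cores.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// A hypothetical per-pixel program: its output depends only on (x, y),
// never on the color computed for any other pixel.
std::array<std::uint8_t, 3> shade(std::uint32_t x, std::uint32_t y) {
    // Toy gradient: red grows with x, green with y.
    return {static_cast<std::uint8_t>(x % 256),
            static_cast<std::uint8_t>(y % 256),
            128};
}

int main() {
    const std::uint32_t width = 3840, height = 2160;
    std::vector<std::array<std::uint8_t, 3>> framebuffer(width * height);

    // On a CPU we visit the pixels one after the other; since no iteration
    // reads another pixel's result, a GPU is free to run them all in parallel.
    for (std::uint32_t y = 0; y < height; ++y)
        for (std::uint32_t x = 0; x < width; ++x)
            framebuffer[y * width + x] = shade(x, y);
}
```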
Luckily, GPUs are specialized devices. The instructions required for graphics rendering are quite simple (at any rate, they are much simpler than those you would find in a CPU). Simple enough, in fact, that you can build GPU cores for a fraction of the cost of a CPU core.
Although GPUs first gained traction for their use in graphics rendering, they are now widely used for any code that can leverage their massive parallelism: AI or crypto mining come to mind (or whatever kids are up to these days).
I do not currently describe the inner workings of GPUs in detail. For a first introduction to the topic, I recommend checking both this beginner-friendly video overview and this more quantitative/memory-centric one.
3. The fragmentation problem
GPUs stem from a complicated and still ongoing history. They were developed incrementally, evolving from processors driving almost-fixed pipelines to the very flexible computing devices we have today. They do not seem to have settled on a stable form yet. For instance, modern graphics cards support real-time raytracing and upscaling. Concretely, this means that some hardware manufacturers are adding custom instructions specialized for these tasks.
GPU implementations vary wildly across generations and manufacturers. This is bad for programmers: ideally, we would like to write a program once and have it run on any hardware. If our program runs fine on our computer but nowhere else, it is probably quite useless.
Luckily, there are libraries built to act as an abstraction layer for GPU programming. Vulkan is one such library. Using it, you can write a program that says "draw a triangle", and the library takes care of translating this program into a sequence of commands that your GPU understands.
4. The Vulkan API
Vulkan is the library that we use in this series. I picked it over its competitors because it is low-level, cross-platform, based on an open standard, and modern in its design (nice debugging facilities, multi-GPU support). Its extensibility and flexibility make it a good fit for the shifting nature of the GPU landscape.
5. The plan
- In the next chapter, we describe the key concepts of Vulkan.
- In the transfer chapter, we meet the simplest Vulkan workload there is: exchanging data between CPU and GPU. We consider both raw buffers (a linear array of bytes) and images (a raw buffer + some metadata).
- In the compute chapter, we finally get to tap into the massive parallelism of GPUs by running arbitrary non-graphical programs. We introduce the domain-specific language used for describing GPU programs.
- In the graphics chapter, we finally draw things. This is the most complex of the three standard workloads. We will not yet draw the images we generate to the screen — we get to real-time graphics in the next chapter.
- In the swapchain chapter, we introduce the concept of a swapchain. Swapchains control how images are presented to the screen. Not as trivial as it sounds!
- Lastly, I dump a bunch of links primarily related to best practices in the links dump chapter.