Chapter 0: Introduction

1. Motivation — why do we even need GPUs?

My desktop computer is equipped with an Asus MG28UQ screen (3840x2160, 144Hz, released in 2016) and an i7-4770K processor (3.5GHz, 4 cores, released in 2013). Is my processor powerful enough to run the graphical operations required by a full-screen application? We can answer this question with simple calculations, by estimating what fraction of my CPU's capacity would be used up in this context.

First, we need to estimate the number of instructions sent to the CPU every second, which we do with the formula width x height x frames/second x instructions/pixel. We set width x height to 3840 x 2160 and frames/second to 144, as per the specification of my screen. What about instructions/pixel? That is, how many instructions are enough to compute the color of one pixel? This depends on the complexity of our graphical application, of course. Let's assume ours is simple and only requires 100 instructions per pixel. We end up with 3840 x 2160 x 144 x 100 = 119'439'360'000. This already lands us north of 100 billion instructions per second, huh.

Next, we need to estimate how many instructions the CPU runs in a second. We use the formula ticks/second x cores x logical_cores/core, which assumes that each logical core retires roughly one instruction per tick. Plugging in the values from the manufacturer's spec sheet: 3'500'000'000 x 4 x 2 = 28'000'000'000. So around 30 billion instructions per second.

30 billion < 100 billion — even though we are using rough estimates, the CPU seems insufficient for the task, especially when we consider that it has to contend with more than graphical operations: there is the rest of the application's logic, plus the operating system and whatever other programs live in the background. Even with a less fancy screen, such as a 1920x1080 one running at 60Hz, about half of the processor's power would still be dedicated to rendering alone — and that is a lot.

You can play with the following widget to explore different combinations of CPUs and screens:


[Interactive widget: pick a screen and a CPU, then compare Resolution x FPS x Instructions against Frequency (MHz) x Cores x LC/Core, both expressed in instructions per second (IPS).]
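If you would rather read code than click widgets, the sketch below redoes the same arithmetic in C++ (the figures are the ones from my setup, and the 100 instructions/pixel is still just a guess):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Demand side: instructions per second needed to render the screen.
    const uint64_t width = 3840, height = 2160, fps = 144;
    const uint64_t instructionsPerPixel = 100; // rough estimate
    const uint64_t demanded = width * height * fps * instructionsPerPixel;

    // Supply side: instructions per second the CPU can deliver,
    // crudely modeled as one instruction per tick per logical core.
    const uint64_t ticksPerSecond = 3'500'000'000; // 3.5GHz
    const uint64_t cores = 4, logicalCoresPerCore = 2;
    const uint64_t supplied = ticksPerSecond * cores * logicalCoresPerCore;

    std::printf("demanded: %llu IPS\n", (unsigned long long)demanded);
    std::printf("supplied: %llu IPS\n", (unsigned long long)supplied);
    std::printf("rendering alone would need %.0f%% of the CPU\n",
                100.0 * (double)demanded / (double)supplied);
}
```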

However, this shortfall is not an issue in practice, thanks to an additional component I did not mention earlier: my GeForce GTX 1060 GPU (released in 2016), which can run more than 4 trillion (!) operations per second.

2. What is a GPU?

The distinguishing feature of a CPU is not the fact that it computes, but rather its central role as a controller of other components (hence the name Central Processing Unit). The CPU issues most of the commands to other components: memories, I/O devices, … or even other computing units. In fact, a GPU is one such CPU-controlled computing unit. GPUs are specialized in a category of tasks that includes graphics rendering (as the full name, Graphics Processing Unit, suggests).

This does not tell us how a GPU achieves its magic. There are two parts to the full answer: parallelism and specialization.

GPUs are highly parallel devices, unlike sequential CPUs (in reality, CPUs are free to rely on parallelism as long as some illusion of sequentiality is maintained). CPUs need this sequentiality to support complex control flows; e.g., the result of a previous instruction may determine which instructions run next. In contrast, consider graphics rendering: to render a pixel, we run a simple program (100 instructions in our example) with a basic control flow. Each run of the program handles a single pixel, and the same 100 instructions are run for every pixel. Since different runs of this program do not influence each other (the color of a pixel usually does not depend on that of its neighbors), we can just cram lots of cores on the GPU and let each of them manage one run of the program!
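To make this independence concrete, here is a toy C++ sketch of such a per-pixel program (shadePixel and its gradient are invented for illustration): each loop iteration reads only its own coordinates, so nothing prevents every iteration from running on its own core.

```cpp
#include <cstdint>
#include <vector>

// A toy per-pixel program: the same few instructions for every pixel,
// reading nothing but the pixel's own coordinates.
static uint32_t shadePixel(uint32_t x, uint32_t y) {
    uint8_t r = static_cast<uint8_t>(x % 256); // hypothetical gradient
    uint8_t g = static_cast<uint8_t>(y % 256);
    return (uint32_t(r) << 16) | (uint32_t(g) << 8) | 0xFF;
}

int main() {
    const uint32_t width = 3840, height = 2160;
    std::vector<uint32_t> framebuffer(width * height);

    // No iteration reads or writes another pixel: a GPU is free to
    // hand each one to a different core and run them all at once.
    for (uint32_t y = 0; y < height; ++y)
        for (uint32_t x = 0; x < width; ++x)
            framebuffer[y * width + x] = shadePixel(x, y);
}
```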

But wait, CPU cores are not cheap. How could this option be practical? Luckily, GPUs are also specialized devices. The instructions required for graphics rendering are quite simple (at any rate, much simpler than those you would find in a CPU). Simple enough, in fact, that you can build a GPU core for a fraction of the cost of a CPU one.

Although GPUs first gained traction for their use in graphics rendering, they are now widely used for any code that can leverage their massive parallelism — AI or crypto mining come to mind (or whatever kids are up to these days).

For a first introduction to the inner workings of GPUs, please refer to this beginner-friendly video overview and this more quantitative/memory-centric one.

3. Why do we need Vulkan?

GPUs stem from a complicated and still ongoing history. They were developed incrementally, evolving from processors driving almost entirely fixed pipelines into the very flexible computing devices we have today. They do not seem to have settled on a stable form yet, as exemplified by the recent emergence of features such as real-time raytracing and upscaling: hardware manufacturers often extend their devices with new, custom instructions specialized for some tasks, and provide developers with ways of using these facilities.

GPU implementations vary wildly across generations and manufacturers. This is bad for programmers — ideally, we would like to write a program once and have it run on any hardware (we are not merely programming for ourselves!). Alas, different GPUs have different APIs.

It would be nice if we had a library that acted as a translation layer for all those APIs. Then, we could write a program saying "draw a green sphere", and the library would take care of translating this program into a sequence of commands tailored to a specific GPU. However, writing such a library would be a Herculean task: GPUs are very complex devices, there are many of them, and new ones appear constantly. The problem is therefore solved in another way: people define generic APIs for GPUs, and it is up to the manufacturers to ensure that their devices support them. Manufacturers do this by implementing the functions specified within the APIs in the drivers of their devices. In addition, a special program called a runtime is provided by the developers of an API. This program is tasked with discovering devices whose drivers support it and with streamlining interactions with them.

As you probably guessed, Vulkan is such an API.
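As a first taste, here is a minimal C++ sketch that asks the Vulkan runtime to set itself up and then to list the devices whose drivers it discovered (error handling kept to a bare minimum):

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

int main() {
    // Describe our application to the runtime; every field is optional here.
    VkApplicationInfo appInfo{};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pApplicationName = "hello-vulkan";
    appInfo.apiVersion = VK_API_VERSION_1_0;

    VkInstanceCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    createInfo.pApplicationInfo = &appInfo;

    // The runtime sets itself up, loading the installed drivers.
    VkInstance instance;
    if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
        std::fprintf(stderr, "failed to create a Vulkan instance\n");
        return 1;
    }

    // Ask the runtime how many devices expose a Vulkan-capable driver.
    uint32_t deviceCount = 0;
    vkEnumeratePhysicalDevices(instance, &deviceCount, nullptr);
    std::printf("found %u Vulkan-capable device(s)\n", (unsigned)deviceCount);

    vkDestroyInstance(instance, nullptr);
}
```

Do not worry about the details yet; functions such as vkCreateInstance are exactly what the rest of this series is about.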

4. Why Vulkan and not X or Y?

I picked Vulkan over its (many) competitors because it is low-level, cross-platform, based on an open standard, modern in its design (nice debugging facilities, multi-GPU support), and extensible.

5. Reading the Vulkan documentation

This series is sprinkled with links to the Vulkan documentation, such as vkCreateInstance. Here are some tips for reading these pages efficiently (and what to ignore):