Chapter 3: the compute pipeline

Warning

This chapter is currently being written.

In the previous chapter, we saw how memory works and how resources are managed in Vulkan. This allowed us to understand the simplest pipeline out there: the transfer pipeline, which enables transfers of resources (and offers some limited ways of modifying their contents).

In this chapter, we finally harness the power of GPUs for running custom computations. The compute pipeline is our first truly interesting pipeline!

Along the way, we will encounter the GLSL shader programming language.

A. A high-level overview

The compute pipeline enables using the GPU as a general-purpose computing device. We do this by writing special, GPU-compatible programs called shaders (this name reflects the origin of shaders as programs for controlling graphical operations; compute shaders may also be called kernels). Shaders are written in a domain-specific language such as GLSL (website). We then compile these programs into a binary intermediate language called SPIR-V (website). GPUs do not run SPIR-V natively. Instead, their drivers contain a compiler for turning SPIR-V code into the machine language of their actual architecture. Although the compilation from GLSL to SPIR-V can be done ahead of time, the compilation from SPIR-V to machine language is device-specific and has to be done at run-time.

Just like we may call a program with different arguments, we may call a compute shader with different parameters. These parameters come in two forms: push constants and descriptors. Push constants are small pieces of data updated directly from the CPU, whereas descriptors give us a way of interacting with resources found in GPU memory (a distinction is made between read-only resources — referred to as uniforms — and read/write resources — referred to as storage resources). A shader has a given prototype that represents what kind of push constants and descriptors it expects. We describe this prototype explicitly in the form of a pipeline layout object.

We then build a compute pipeline, an object centralizing all the information required to run our shader. In particular, it regroups the shader itself and its prototype. To run the shader, we record a command buffer and bind the pipeline to it through a special command. Note that we have not yet set the values of the shader's arguments (push constants or descriptors). If Vulkan expected us to set them while building the compute pipeline object, we would have no choice but to build a new pipeline every time we wanted to run the same shader with different parameters. However, building a pipeline is costly! Therefore, Vulkan defines special commands for providing values for push constants and binding descriptor sets. We simply record them in the command buffer to which the pipeline was bound.

Finally, we record a special dispatch command to actually run the computation, and we submit the command buffer to the GPU.

In summary, we go through the following steps:

  1. We build our compute shader and compile it to SPIR-V (outside of Vulkan)
  2. We create a shader module object from the SPIR-V (the compilation to machine language typically happens later, when the pipeline is created)
  3. We describe the interface of the compute shader as a pipeline layout
  4. We create a compute pipeline object that regroups the shader object and the pipeline layout
  5. We record a command buffer:
    1. We bind the pipeline object
    2. We bind the descriptor sets and push constants used by the pipeline object
    3. We record the dispatch command
  6. We submit the command buffer to a compute-capable queue

Up to this point, we glossed over important notions regarding how compute tasks are dispatched. To benefit from the parallelism of GPUs, we need to explicitly split the problem into smaller subtasks:

Compute tasks are meant for parallel computations, and these problems have a certain dimensionality to them. For instance, summing the contents of a vector is a one-dimensional problem, whereas running a kernel over a matrix is a two-dimensional one. This dimensionality is reflected in workgroups, the blocks of invocations into which a computation is split: in a 2D problem, for instance, we want invocations that are 2D-neighbors to share their caches as much as possible, so we group them together. Workgroups support splitting in up to three dimensions.

The dimensions of a workgroup are defined directly in the shader code. It is then the responsibility of the user to dispatch enough workgroups to cover all the data. Furthermore, the dispatch itself can be done in 1D, 2D or 3D. This is useful in some situations, but my understanding is that it is mostly a quality-of-life feature. For instance, assume that we are doing a convolution on a 16x16 matrix and that our workgroups are of size 8x8. Then, our matrix will be split into four 8x8 tiles. Doing a dispatch of the form 2x2, we run precisely as many invocations as required. To access the current item of the matrix, each invocation can do something like m[<workgroup_id.x>*8 + <local_id.x>][<workgroup_id.y>*8 + <local_id.y>]. Without this feature, we would need to dispatch 4 workgroups flatly and we would have to compute equivalents to <workgroup_id.x> and <workgroup_id.y> manually from a global workgroup id. This computation would be slightly less obvious: something like m[<workgroup_id>%2*8 + <local_id.x>][(<workgroup_id>/2)*8 + <local_id.y>].
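To make the sizing logic concrete, here is a minimal C sketch of the usual ceiling-division pattern for choosing dispatch dimensions (width, height and command_buffer are hypothetical variables standing for the problem size and a command buffer currently being recorded):

/* Cover a width x height problem with the 8x8 workgroups declared in
 * the shader (local_size_x = local_size_y = 8). The ceiling division
 * makes sure partially-filled tiles at the edges are covered too. */
uint32_t group_count_x = (width + 7) / 8;
uint32_t group_count_y = (height + 7) / 8;
vkCmdDispatch(command_buffer, group_count_x, group_count_y, 1);

With a 16x16 matrix, this reduces to the 2x2 dispatch above; for sizes that are not multiples of 8, the shader should compare gl_GlobalInvocationID against the actual data bounds and return early when out of range.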

B. The compute pipeline in more detail

B.1. From GLSL shaders to Vulkan shader objects

B.1.1. Writing compute shaders

Compute shaders can be written in different languages. Throughout the rest of this series, we assume the use of GLSL (documentation), although using another language is fine. The inner workings of this language are not the focus of this series, but we will discuss it to some extent. You may want to take a look at this collection of CUDA puzzles to build basic intuition about writing compute shaders (CUDA is not part of Vulkan, but the fundamentals are the same everywhere). Once the basics are in place, writing shaders is relatively straightforward: GLSL feels like a more limited version of C with first-class support for vectors and matrices and a notion of input and output resources.

Below is a very minimal example of what a GLSL compute shader may look like:

#version 460 // We are using version 4.60 of GLSL

// Special syntax to define the dimensions of the workgroup
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

// We describe the parameters of the shader as follows:
layout(push_constant, std430) uniform pc_struct { vec4 data; } pc;
layout(binding = 0) uniform ParameterUBO { float delta; } ubo;
layout(binding = 1, rgba8) uniform readonly image2D inputImage;
layout(binding = 2, rgba8) uniform writeonly image2D outputImage;
// These descriptions are called interface blocks in GLSL parlance

void main() {
    // The actual code
}
Identifying the current invocation

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in; defines the size of the workgroup for this shader. In order to identify the current invocation, we can rely on a set of special variables defined by GLSL:

  - gl_NumWorkGroups: the number of workgroups in each dimension of the dispatch
  - gl_WorkGroupSize: the dimensions of a workgroup, as declared above
  - gl_WorkGroupID: the index of the current workgroup within the dispatch
  - gl_LocalInvocationID: the index of the current invocation within its workgroup
  - gl_GlobalInvocationID: the index of the current invocation within the whole dispatch, i.e., gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
  - gl_LocalInvocationIndex: a flattened (scalar) version of gl_LocalInvocationID

Interface blocks: describing the prototype of a shader

It is not enough to describe the push constants and descriptors in the pipeline layout: we also need to declare them in the shader itself! Declarations of parameters always start with a layout section. This lets GLSL know how the different parameters are to be accessed from memory, through the following fields (see the doc for the gory details; I put more precise links whenever possible):

  - set and binding: the coordinates of the resource, which must match the descriptor set layout on the Vulkan side (set defaults to 0 when omitted, as in our example)
  - push_constant: marks the block as the push constant block of the shader
  - std140 and std430: standardized rules governing the memory layout of the members of a block
  - image format qualifiers such as rgba8: the texel format under which an image is accessed

Individual fields of structured objects may come with a layout of their own (through keywords controlling alignment or matrix storage order, or the yet unseen offset keyword, which specifies the byte offset of an individual structure member).

Furthermore, the behavior of shader parameters can be refined through memory qualifiers. The more information the driver has about the behavior of the code, the more optimizations it can apply. The most common qualifiers for shader parameters are readonly (the object cannot be written to) and writeonly (the object cannot be read from). Refer to the doc for a more extensive coverage of this topic.

Additionally, uniform is specified for almost all parameters of the shader: uniform buffers, uniform images, storage images and push constants alike (but not storage buffers, which are declared with buffer instead). As you can see, the semantics of this keyword are not perfectly aligned between Vulkan and GLSL.

The type of the resource also needs to be specified, using keywords such as buffer or image2D. Structure types work a bit differently. Consider for instance layout(push_constant, std430) uniform pc_struct { vec4 data; } pc;. This describes a push constant block named pc that is defined with a structured type. The type itself is given the name pc_struct and contains the single field data. We could also have defined the structure type prior to its use instead of going for an inline definition (see the doc).

Note that defining unnamed shader parameters with a structured type is allowed, e.g., layout(push_constant, std430) uniform pc_struct { vec4 data; }; (note the disappearance of pc). Doing so pulls all of the structure's fields into the top-level namespace (i.e., a reference to data in the main function would resolve to the field of this parameter).

Shared variables

Shared variables are a feature exclusive to compute shaders. Declaring variables with the shared qualifier shares them among all invocations of a workgroup. Accesses to them have to be synchronized within the shader (e.g., through the barrier() function), as further discussed in the doc.

Interacting with images from shaders

GLSL defines special functions, such as imageLoad and imageStore, for interacting with images. The details are in the doc.

B.1.2. Compiling compute shaders

glslangValidator is the GLSL-to-SPIR-V compiler provided by Khronos, the consortium behind the Vulkan standard. glslc is a wrapper around this compiler developed by Google, which makes its syntax closer to that of gcc: compiling a compute shader is as simple as glslc shader.comp -o shader.spv (the stage is inferred from the .comp extension). Compiling GLSL shaders to SPIR-V is thus straightforward: we just set up a Makefile or something similar and we are good to go. We only need to keep track of where the generated SPIR-V files end up, as we will need to upload them to the GPU.

B.1.3. Building shader modules

Vulkan devices are expected to know how to handle SPIR-V. In practice, this means that they are fitted with a compiler from SPIR-V to their machine language. In order to build a shader module in Vulkan, we have to do two things:

  1. We hand our SPIR-V code over to the driver
  2. We have it compiled into the machine language of the device

Luckily for us, we do not have to worry about these boring details: vkCreateShaderModule handles everything transparently. We only need to provide a pointer to our SPIR-V code and its length in bytes. Magic! Note that the shader is typically not compiled as soon as the shader module is created, but when the pipeline is created: this lets the driver compile the shader to be as efficient as possible for a specific pipeline layout.
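As a minimal sketch, assuming device is a valid VkDevice and that the SPIR-V binary was loaded from disk into code (with code_size its length in bytes):

VkShaderModuleCreateInfo module_info = {
    .sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
    .codeSize = code_size,           /* length in bytes */
    .pCode = (const uint32_t *)code, /* the SPIR-V words; must be 4-byte aligned */
};
VkShaderModule shader_module;
vkCreateShaderModule(device, &module_info, NULL, &shader_module);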

B.2. The pipeline layout, a GPU computation's prototype

The pipeline layout defines the interface of a shader explicitly for Vulkan (although we already described this interface once in the shader itself, Vulkan does not extract this information from there).

vkCreatePipelineLayout is used to create a pipeline layout. We have two kinds of shader parameters to describe: there are push constants as well as traditional resources.

Push constants are specified through a notion of push constant ranges: in situations involving multiple shaders, each shader may have access to a different range of the memory bound through the push constant mechanism (different ranges may overlap, and your GLSL code should specify a matching offset). When using the compute pipeline, we are limited to a single shader, so this flexibility is irrelevant. Remember that only a very limited amount of memory can be bound through this mechanism (at least 128 bytes, sometimes a bit more).
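For the vec4 push constant of our example shader, the range would look like this (a sketch; 16 bytes is the size of a vec4):

VkPushConstantRange pc_range = {
    .stageFlags = VK_SHADER_STAGE_COMPUTE_BIT, /* which stages see the range */
    .offset = 0,
    .size = 16,                                /* a single vec4 */
};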

Descriptor set layouts describe the other shader parameters. The notion of descriptor sets is arguably more complex than compute shaders need: it is built to accommodate the more involved requirements of the graphics pipeline. A descriptor set layout is created with vkCreateDescriptorSetLayout, which takes a list of VkDescriptorSetLayoutBinding structures, each describing one binding (its index, its descriptor type and the shader stages that can access it).
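Here is a sketch of both calls for our example shader (one uniform buffer and two storage images, all used from the compute stage; device and pc_range come from the previous snippets):

/* One binding per interface block of the example shader */
VkDescriptorSetLayoutBinding bindings[] = {
    { .binding = 0, .descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
      .descriptorCount = 1, .stageFlags = VK_SHADER_STAGE_COMPUTE_BIT },
    { .binding = 1, .descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
      .descriptorCount = 1, .stageFlags = VK_SHADER_STAGE_COMPUTE_BIT },
    { .binding = 2, .descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE,
      .descriptorCount = 1, .stageFlags = VK_SHADER_STAGE_COMPUTE_BIT },
};
VkDescriptorSetLayoutCreateInfo set_layout_info = {
    .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
    .bindingCount = 3,
    .pBindings = bindings,
};
VkDescriptorSetLayout set_layout;
vkCreateDescriptorSetLayout(device, &set_layout_info, NULL, &set_layout);

/* The pipeline layout regroups the set layout and the push constant range */
VkPipelineLayoutCreateInfo layout_info = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
    .setLayoutCount = 1,
    .pSetLayouts = &set_layout,
    .pushConstantRangeCount = 1,
    .pPushConstantRanges = &pc_range,
};
VkPipelineLayout pipeline_layout;
vkCreatePipelineLayout(device, &layout_info, NULL, &pipeline_layout);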

B.3. The pipeline object

We create the pipeline object itself with vkCreateComputePipelines. The main structure to fill is VkComputePipelineCreateInfo, which regroups the pipeline layout and a VkPipelineShaderStageCreateInfo describing the shader module and the name of its entry point. (The descriptor sets themselves are allocated from a VkDescriptorPool; we come back to them in the next section.)
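A sketch of what this looks like, reusing the shader_module and pipeline_layout built in the previous steps:

VkPipelineShaderStageCreateInfo stage_info = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
    .stage = VK_SHADER_STAGE_COMPUTE_BIT,
    .module = shader_module,
    .pName = "main",                /* entry point within the shader */
};
VkComputePipelineCreateInfo pipeline_info = {
    .sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
    .stage = stage_info,
    .layout = pipeline_layout,
};
VkPipeline pipeline;
vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &pipeline_info,
                         NULL, &pipeline);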

Derivatives: don't

Pipeline derivatives are meant to make creating families of similar pipelines cheaper. In practice, drivers reportedly get little out of them, and the general advice is to not bother.

B.4. Binding parameters and dispatching computations

We start by creating a command buffer and calling vkBeginCommandBuffer to begin recording it. We can immediately call vkCmdBindPipeline to bind the pipeline we previously created to the buffer: subsequent commands will know that they refer to it.

The main thing left to do is to bind values to the parameters of the shader. Again, we have to consider both push constants and classical resources. Push constant values are provided directly with vkCmdPushConstants. Descriptor sets are bound with vkCmdBindDescriptorSets; the sets themselves must have been allocated from a descriptor pool and filled with vkUpdateDescriptorSets, which tells Vulkan which concrete resource (buffer, image...) each binding refers to.
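As a sketch, here is what filling binding 0 and recording the parameter-binding commands may look like (descriptor_set, ubo_buffer, push_data and command_buffer are hypothetical handles: a set allocated from a VkDescriptorPool with our layout, the buffer backing ParameterUBO, a pointer to the 16 bytes of the vec4, and the buffer being recorded):

/* Point binding 0 of the descriptor set at our uniform buffer. The
 * storage images at bindings 1 and 2 would be filled similarly, with
 * pImageInfo instead of pBufferInfo. */
VkDescriptorBufferInfo buffer_info = {
    .buffer = ubo_buffer,
    .offset = 0,
    .range = VK_WHOLE_SIZE,
};
VkWriteDescriptorSet write = {
    .sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
    .dstSet = descriptor_set,
    .dstBinding = 0,
    .descriptorCount = 1,
    .descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
    .pBufferInfo = &buffer_info,
};
vkUpdateDescriptorSets(device, 1, &write, 0, NULL);

/* While recording the command buffer: provide the parameters */
vkCmdPushConstants(command_buffer, pipeline_layout,
                   VK_SHADER_STAGE_COMPUTE_BIT, 0, 16, push_data);
vkCmdBindDescriptorSets(command_buffer, VK_PIPELINE_BIND_POINT_COMPUTE,
                        pipeline_layout, 0, 1, &descriptor_set, 0, NULL);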

vkCmdDispatch records the dispatch itself, running XxYxZ workgroups (there are variants of this command, but they are very niche ones: see vkCmdDispatchIndirect and vkCmdDispatchBase).

Once all of this is done, we call vkEndCommandBuffer to indicate that we are done registering our command buffer. To run our computation on the GPU, we submit the command buffer through vkQueueSubmit on a queue that supports compute operations — and just like that, we are done!
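To close the loop, a sketch of the end of the recording and the submission (compute_queue is assumed to be a queue with compute support; the vkQueueWaitIdle call is a blunt way of waiting for the results, real code would rather use a fence):

vkCmdDispatch(command_buffer, 2, 2, 1); /* e.g., the 2x2 dispatch from earlier */
vkEndCommandBuffer(command_buffer);

VkSubmitInfo submit_info = {
    .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .commandBufferCount = 1,
    .pCommandBuffers = &command_buffer,
};
vkQueueSubmit(compute_queue, 1, &submit_info, VK_NULL_HANDLE);
vkQueueWaitIdle(compute_queue); /* wait for the computation to finish */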