Chapter 4: the graphics pipeline
In the previous chapter, we learned how to run arbitrary computations on GPUs. How cool is that? We are now turning our attention to the graphics pipeline. Although we could already generate images from the compute pipeline by writing the results of our computations to a storage image, the graphics pipeline is more efficient and more convenient for that purpose.
In this chapter, we restrict ourselves to offline rendering, as opposed to real-time rendering (a topic we keep for the next chapter). We render images sequentially, and we write our results to image objects (as opposed to displaying them in a window).
A. A high-level overview
The graphics pipeline is more complex than its compute counterpart. The most glaring difference is that it is built out of many more stages: we run at least two shaders instead of just one (the tessellation and geometry shaders are optional), and there are many fixed function stages whose behavior we can control in a limited way.
There are also differences that we cannot detect by looking at this graph. In the previous chapter, we encountered different mechanisms for providing parameters to shaders. These mechanisms remain valid here, but Vulkan also passes some specific, built-in data to the vertex and fragment shaders, and it expects some specific results back. These data are tied to the role of these shaders in the broader rendering process. Additionally, we can forward arbitrary data from one shader stage to the next.
You may also have noticed that there is a new kind of resource called attachments; we will get back to those soon enough.
Comparison between the compute and the graphics pipeline
Below is the compute pipeline, as introduced in the previous chapter.
The next subsection is about rendering a single object to an image, and the following one is about handling multiple objects.
A.1. Rendering a single object
Below is a simplified representation of the graphics pipeline (heavily inspired by the illustration over at vulkan-tutorial.com, with parts of the illustration in fact directly lifted from there).
To a first approximation, the roles of the different stages are as follows:
- Input Assembly (fixed): sets up calls to the vertex shader and distributes the per-vertex data we feed into the pipeline as appropriate. The exact behavior of this stage depends on what geometric primitives we build our object from: we may render it as a 3D mesh (in which case Vulkan assembles the vertices as a set of triangles), as a line of connected vertices or as individual points, among other options.
- Vertex stage (shader): maps the object's vertices from model space to screen space (see this post by Song Ho Ahn for more information on this topic).
- Tessellation stage (shader, optional): subdivides surfaces to add detail to the geometry, useful for level-of-detail scaling.
- Geometry stage (shader, optional): transforms the geometry of the object; runs on every primitive and can discard it or generate new primitives based on it. It is a tessellation stage on steroids, but it comes with poor performance outside of some specific architectures and is therefore rarely used in practice.
- Rasterization (fixed): generates "fragments" for primitives. Fragments are pixel coordinates that fall inside a certain primitive. Vulkan only keeps fragments that map within the boundaries of the final image. We retain depth information at this stage.
- Fragment stage (shader): computes a color per fragment.
- Blending (fixed): different fragments may be mapped to the same pixel in the image. Vulkan blends them following one of several predefined rules (when we are dealing with an opaque object, we discard all fragments save for the frontmost one).
We do not discuss the optional stages in more detail here, as they are not part of the core concepts of Vulkan; once you understand the rest, you should be able to figure them out quite easily on your own anyway.
The graphics pipeline expects per-vertex information from us. The data we provide is available to the vertex shader. Typically, we specify the (local-space) coordinates of the raw vertices that make up our object (although we could also generate its vertices procedurally from the vertex shader), plus other data that will become relevant further down the pipeline, such as the direction of the normal at each vertex and/or its texture coordinates. This is very freeform, but since we are the ones writing the shaders, we know how to extract the relevant information. Also, we can reemit data from the vertex shader to make it available from later shader stages; this is how we would get, e.g., the texture coordinates information to the fragment shader.
We provide this per-vertex information via vertex buffers and (optionally) an index buffer. The index buffer is an array of ids referencing vertex buffer entries; we use it to avoid duplicating data. Indeed, when an index buffer is present, Vulkan handles its contents sequentially, resolving them into geometric primitives of our choice (considering them in groups of three to form triangle faces, or one by one to form lines or points, for instance). When we use triangle faces, every individual vertex usually belongs to several primitives. Without an index buffer, Vulkan still processes the data sequentially, but it takes it directly from the vertex buffers. We are then forced to repeat the data of each vertex as often as it appears in a primitive, and the vertex shader also runs once per occurrence of the vertex. This is very wasteful.
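To make this concrete, here is a minimal sketch (with made-up values) of the vertex and index data we could provide for a quad built out of two triangles: four vertices suffice when an index buffer is present, whereas six would be needed without one.

#include <array>
#include <cstdint>

// Sketch: per-vertex positions for a quad made of two triangles sharing an edge.
// With an index buffer, each of the four corners is stored only once.
struct Vertex { float x, y, z; };

const std::array<Vertex, 4> vertices = {{
    {-0.5f, -0.5f, 0.0f},  // 0: bottom left
    { 0.5f, -0.5f, 0.0f},  // 1: bottom right
    { 0.5f,  0.5f, 0.0f},  // 2: top right
    {-0.5f,  0.5f, 0.0f},  // 3: top left
}};

// Two triangles, described as triplets of indices into the vertex array.
// Vertices 0 and 2 are shared; without indexing we would have to repeat them.
const std::array<uint32_t, 6> indices = {0, 1, 2, 2, 3, 0};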
We can interleave multiple pieces of per-vertex data (also called attributes) in a single buffer, or we can store them in distinct vertex buffers (in practice, it is probably best to mix these two options, as discussed here: a non-interleaved buffer for coordinates data, and whatever we want for the rest). We create an input binding description for each vertex buffer (specifying its stride), and an input attribute description for each attribute we store in these buffers (specifying its format and offset).
We can propagate arbitrary data from a shader stage to its successor via output parameters. Parameters output in a stage serve as input parameters to the next stage. Just like descriptors have a set and a binding index, input and output parameters are endowed with a location attribute (this is just an arbitrary id) which identifies an input of a stage with an output of its predecessor. In fact, Vulkan uses the per-vertex attributes we described earlier as the input parameters to the vertex shader, which has no predecessor.
We provide per-vertex attributes, but the fragment shader is tied to pixels instead of vertices. In fact, the inputs of the fragment shader for a given fragment are weighted interpolations of the outputs of the vertex shader invocations that ran for its primitive's vertices.
We do not declare input/output parameters outside of GLSL. Furthermore, GLSL exposes special variables for different shader stages. For instance, each invocation of a vertex shader should return clip-space coordinates, and do so via one such variable. We can still use descriptors, however (this is how we provide texture data, for instance).
The graphics pipeline introduces a new kind of resource called attachments. Attachments are an abstraction over images, and they come in different flavors:
- Color attachments are the images the result of the pipeline is written to (in fact, the resolution of the underlying images is what determines the resolution of the render). They are only ever accessed by the blending stage.
- Depth/stencil attachments carry two kinds of information. The depth information stores the distance between the camera and the nearest known fragment for each pixel. When a new fragment comes around, Vulkan can discard it if it maps to a pixel for which a nearer fragment is already known (for opaque objects, at least). The stencil information constrains which areas of the color attachment are rendered to.
- Input attachments are inputs to the fragment shader that come with an intriguing limitation: we can only access the pixel located at the coordinates of the current fragment (we get to the bottom of this in the next subsection). Also, they are the only kind of attachment for which we need to provide a descriptor.
Vulkan discards as many fragments which do not contribute to the final image as possible (such as those falling outside of the visible region) before they reach the fragment shader stage. See this page for more detail (in the context of OpenGL but it carries over to Vulkan).
To render an object, we bind a graphics pipeline object and we issue a draw call to do the rendering. The pipeline gathers all the information pertaining to the structure of the rendering task: which resources are available and which stages may access them (we use pipeline layouts again for that purpose). We also provide settings for the fixed function stages, and we select which of the optional shaders to enable. After we bind the pipeline, and before we issue a draw call, we also bind resource descriptors and provide values for push constants. Note that there is no notion of workgroups for draw calls.
There is still something missing, however: we have not yet seen how to bind attachments. Attachments work in a contrived way, but there is a rationale for everything. We defer the explanation of this topic to the next subsection, as it provides the context that should make this concept click.
A.2. Rendering multiple objects
Render passes and framebuffers were deprecated in Vulkan 1.4 in favor of dynamic rendering. We can still use them for the time being (and in fact we should if we are targeting mobile devices, as these lag a bit behind in terms of Vulkan support). The bulk of this chapter (the graphics pipeline and shaders) remains valid. I present the modern alternative to render pass objects in the modern Vulkan chapter.
For rendering complex scenes, we issue not just one, but many draw calls that interact through some shared state. This leads to synchronization challenges, but Vulkan handles them transparently for us. We just need to describe what the shared state actually is (through a framebuffer, which holds image views), the global flow of our rendering operation (through a render pass), and how draw calls are to handle the shared state (through subpasses, where image views from the framebuffer are tied as attachments for use by draw calls; all draw calls must be issued from subpasses). In this subsection, we discuss these three concepts in more detail.
The secret to rendering multiple objects to the same image lies in the framebuffer, which is an array of image views. Rendering operations can access the framebuffer image views that we bind as attachments.
We describe the global flow of rendering operations (with an emphasis on the state stored in the framebuffer) through render passes. Render passes are abstract descriptions that are not tied to concrete framebuffers. In that sense, they can be compared to pipeline layouts. They only describe what kinds of attachments should be present, whereas framebuffers hold actual resources. We always provide actual resources for use as attachments in a render pass (via a framebuffer) at the point where we bind it in a command buffer.
Render passes define a set of subpasses, each describing either graphical or computational operations. Each subpass references attachments declared in the render pass for use with specific roles, e.g., as a depth attachment. Different subpasses may use the same attachments for different roles. For instance, the color attachment of a subpass may become an input attachment of a later one. So long as we do not want to swap an attachment for another, we can keep issuing draw (or dispatch) calls from the same subpass. In particular, we can switch pipelines within a subpass. The concept of subpasses makes most sense in the context of deferred rendering, where later subpasses make use of images produced by earlier subpasses (see forward rendering vs. deferred rendering). With forward rendering, one subpass is often enough.
By default, all subpasses may run wholly in parallel. If there are dependencies between subpasses, we have to specify them explicitly through an array of constructs known as subpass dependencies.
The main selling point of render passes is that they enable writing code that runs efficiently on two architectures built on very different principles. Consider the tile-based architectures that we discussed in the resources and transfers chapter. Conceptually, these handle pixels one at a time, and they do not feel bound by subpass boundaries: if the final value of a pixel has already been determined for some subpass, a later subpass can already begin the work that depends on that value — even if the final value of other pixels is not yet known! Mobile devices use this trick to forego allocating and moving large resources around in memory (at least for those allocated lazily). This "one pixel at a time" way of working is the source of the access limitations for attachments: the rest of the data may not exist at the time when a pixel is handled! In practice, things are never really done "one pixel at a time" but "one tile at a time": the image is split into small tiles (of size, say, 16x16) which go through the rendering process in isolation.
Furthermore, the render pass/subpass model is not a costly abstraction, and it also brings some value outside of deferred rendering on tile-based architectures (as discussed here). However, it is also cumbersome compared to an alternative called dynamic rendering, which is set to replace it as of Vulkan 1.4.
Vulkan encourages the use of a single render pass with multiple subpasses instead of several distinct rendering passes. At times, however, the limitations to accesses in attachments get in the way. We need distinct render passes with manual synchronization whenever we require arbitrary, and not just pixel-local, reads from attachments (more information over here).
Draw calls emitted from the same subpass share the same attachments. For read-write resources, this may be an issue. Consider for instance the depth buffer. How does Vulkan avoid race conditions? Here again, there is synchronization going on, but it runs automatically behind the scenes.
As a concrete example to bind all of the above together, consider how we could handle partial transparency: think of stained glass. We first issue the draw calls for all the opaque objects in the scene. That way, we ensure that fragments of transparent objects that pass the depth test do not get blended with fragments that end up hidden behind another opaque object: we cannot undo blending. Furthermore, we make accesses to the depth buffer read-only for transparent objects (as they do not hide anything). See this for an OpenGL demonstration; not the same API but the same principles are at play. Just bear in mind that transparency is a hard topic in the general case: the process described above is quite naive. For this rendering process, we require only one render pass and one subpass (attachments do not change roles), but we use different pipelines for opaque and transparent objects (they do not handle their depth attachment nor their blending attachment in the same way).
When objects using different shaders are present in a scene, we bind a different pipeline object whenever we switch from one kind of object to the next. We can do this at any point in a subpass.
Finally, a quick word on render pass compatibility. Both framebuffers and graphics pipeline objects are created with a reference to a render pass object. We are only allowed to use them in the context of that very render pass, or any compatible one. As a first-order approximation, two render passes are compatible when they have similar attachments (only image layouts are allowed to differ, more detail here).
B. A deeper dive
B.1. Render passes, subpasses and attachments management
Render passes describe the global flow of a rendering operation: they specify how different resources are used across the different rendering/computing tasks which make up the operation. Render passes are split into different subpasses, and a new subpass is required every time the draw calls are to use the framebuffer's attachments in a different manner.
We build render passes through vkCreateRenderPass, which takes three main parameters: attachment descriptions, subpasses, and subpass dependencies. We describe these concepts in the sections below. Furthermore, for each subpass, we specify whether it is meant for graphical or computational operations; in practice, we only use render passes for graphical operations.
B.1.1. Attachment descriptions
Attachments correspond to images that can only be accessed in a pixel-local way. They can serve multiple roles in the same render pass (we will see how to assign roles to them for a subpass in the next subsection). We describe attachments using VkAttachmentDescription, which does not include a binding to an actual resource. We will link the attachments to resources much later, by providing a framebuffer at the point where we bind the render pass in a command buffer. For each attachment, we specify the following (a short sketch follows the list):
- Its format.
- Its initial layout.
- A layout to which to transition at the end of a run of the render pass (this transition happens automatically).
- An operation describing how the color/depth components of the attachment are to be treated at the beginning of the subpass where they are first used (we can preserve the contents of the image, clear them, or leave the choice in the hands of the driver if we are indifferent).
- An operation describing how the color/depth components of the attachment are to be treated at the end of the subpass where they are last used (we can preserve the contents of the image, clear them, or leave the choice in the hands of the driver).
- A similar pair of operations for stencil components only. You may wonder how the depth/stencil components are differentiated. Stencil components are read from depth/stencil attachments. Their size depends only on the format of that attachment. For instance, if it is VK_FORMAT_D16_UNORM_S8_UINT, then the attachment has a 16-bit depth component and an 8-bit stencil one.
- How many samples it uses (this is 1 when it does not use multisampling).
- Whether it aliases memory used by other attachments (we specify this via a flag).
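As an illustration, here is a minimal sketch of what the description of a single color attachment could look like; the format and layouts are assumptions and depend on the application (here we imagine rendering offline and then transferring the result out of the image).

#include <vulkan/vulkan.h>

// Sketch: one color attachment that is cleared at its first use in the render
// pass and whose contents are preserved at the end for a later transfer.
VkAttachmentDescription makeColorAttachmentDescription() {
    VkAttachmentDescription color{};
    color.format         = VK_FORMAT_R8G8B8A8_UNORM;             // assumed image format
    color.samples        = VK_SAMPLE_COUNT_1_BIT;                // no multisampling
    color.loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR;          // clear at first use
    color.storeOp        = VK_ATTACHMENT_STORE_OP_STORE;         // keep the results at last use
    color.stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE;      // no stencil component here
    color.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    color.initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED;            // previous contents do not matter
    color.finalLayout    = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL; // automatic transition at the end
    return color;
}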
B.1.2. Subpasses
Multiple pipelines may be bound over the course of a single subpass, and many draw calls may be issued for every one of them. What all these calls have in common is that they use the attached framebuffer resources in the same way. Indeed, the role of each render pass attachment varies from subpass to subpass. We define the roles we desire for a given subpass via a VkSubpassDescription, which defines an array for each available role (except for the depth/stencil attachment, for which there is a simple pointer instead, as only one of them is supported). These arrays store VkAttachmentReference structures, which are built out of an attachment index and a layout to which the resource is automatically transitioned before the subpass. There are five roles we can choose from (a sketch follows at the end of this subsection):
- Input attachments are made accessible to the fragment shader, with the proviso that only same-coordinates data can be read/overwritten. Also, we must set up descriptors for them (using VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT; the descriptor is required for pipeline layouts compatibility reasons).
- Color attachments hold the results of rendering operations. We typically need at least one of those, but the reason why we may need several of them is perhaps not as clear. In short, it is absolutely possible for a render to have several results. For instance, deferred rendering is done in two subpasses. In the first subpass, we generate a collection of textures collectively known as the G-buffer. The G-buffer may hold information about the pixels' albedo (the raw color of the frontmost object for each coordinate, with no regard for the lighting yet), their normals, the world coordinates they originate from or their specular intensity. In the second subpass, we combine the data from the G-buffer to compute the final image. All the textures that make up the G-buffer would be color attachments of the first subpass, and input attachments of the second.
- A depth/stencil attachment is a single resource that stores two pieces of information. The depth component stores the distance between the camera and the frontmost known fragment for each pixel, and it is usually cleared at the beginning of the render pass (everything starts infinitely far away). It then gets progressively and automatically updated throughout the render pass. The stencil component stores a value per pixel, which we can use to disable rendering on parts of the image, for example (as showcased here).
The roles described above are those that are visible in the graphics pipeline graph. There are two additional roles that are more passive:
- Resolve attachments are used in contexts where multisampling is enabled; their inclusion triggers the automatic resolution of the image at the end of the subpass, i.e., the merger of all the samples computed for each pixel. For this to work, we must provide one resolve attachment per color attachment.
- Preserve attachments just sit there and do not do anything. That is the point! If they were not marked as preserve attachments, Vulkan would be allowed to do anything with their contents (this is only true for attachments that are otherwise unbound for the current subpass).
The same attachment cannot be used twice in the same subpass. Furthermore, Vulkan imposes some restrictions regarding the format of attachments: not all formats are compatible with all uses. See VkSubpassDescription's registry page for more details.
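To give an idea of how this fits together, here is a minimal sketch of a graphics subpass that writes to a color attachment and uses a depth/stencil attachment; the indices (0 and 1) are assumptions about where these attachments sit in the render pass' attachment array.

#include <vulkan/vulkan.h>

// Sketch: the attachment references live as long as the create info is in use.
static const VkAttachmentReference kColorRef{0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL};
static const VkAttachmentReference kDepthRef{1, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL};

VkSubpassDescription makeSubpassDescription() {
    VkSubpassDescription subpass{};
    subpass.pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS; // graphical, not compute
    subpass.colorAttachmentCount    = 1;
    subpass.pColorAttachments       = &kColorRef;  // one array entry per color attachment
    subpass.pDepthStencilAttachment = &kDepthRef;  // single pointer: at most one depth/stencil
    return subpass;
}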
B.1.3. An array of subpass dependencies
By default, all subpasses execute fully in parallel. In situations where there are data dependencies between subpasses, this is not acceptable! If, e.g., subpass A writes data to an attachment that is read in subpass B, then the data should flow through these subpasses in that order. We define subpass dependencies via VkSubpassDependency to enforce such constraints. We can view them as a form of memory barriers where we have to specify the pair of subpasses impacted by the dependency in addition to the src/dst stage/access masks.
Now, it may be that a subpass depends on data rendered in another render pass altogether. Maybe that other render pass is running in parallel with our subpass! For such situations, we can use VK_SUBPASS_EXTERNAL as src/dst subpass, which conceptually waits for all subpasses in all previous render passes to be over before running the current subpass (in practice, the driver may be smarter than this and do something finer-grained). In fact, this is not only about waiting for other subpasses: we could for instance be waiting on a compute operation that runs outside of any render pass.
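As an example, here is a minimal sketch of a dependency stating that a second subpass must wait for the color writes of a first one before reading them as an input attachment from its fragment shader (the subpass indices are assumptions; this is the typical deferred rendering situation described earlier).

#include <vulkan/vulkan.h>

// Sketch: subpass 1 reads (as an input attachment) what subpass 0 wrote to a
// color attachment, so its fragment shader must wait for those writes.
VkSubpassDependency makeGBufferDependency() {
    VkSubpassDependency dep{};
    dep.srcSubpass      = 0;                                              // producer
    dep.dstSubpass      = 1;                                              // consumer
    dep.srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;  // where the writes happen
    dep.srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    dep.dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;          // where the reads happen
    dep.dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
    dep.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;                    // pixel-local dependency
    return dep;
}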
B.2. Configuration of rendering operations
Render passes are not useful in and of themselves: they merely provide a context in which we may run rendering operations. In this section, we turn to how we build graphics pipeline objects to control the actual rendering process. Just like in the compute case, building pipeline objects is expensive, as it entails the compilation of shaders. Again, using pipeline caches helps with performance.
It is in the graphics pipeline object that we specify the behavior of the rendering pipeline. We do this through a collection of structures, each of them focusing on a certain stage or aspect of the rendering process. vkCreateGraphicsPipelines assembles them to build a graphics pipeline object. Just like we need to bind a compute pipeline before issuing dispatches, we need to bind a graphics pipeline before we issue draw calls. However, unlike dispatches, we must issue draw calls from within a certain subpass of a certain render pass.
B.2.1. Pipeline vertex input state
We describe the per-vertex information that we push into the pipeline (such as position, texture coordinates, or color information) via a VkPipelineVertexInputStateCreateInfo. Remember that we may pass all the per-vertex information interleaved in a single buffer, or we may use several buffers instead.
We register each buffer as a VkVertexInputBindingDescription, defining an arbitrary binding id that we will use to identify individual buffers when describing attributes. We specify a stride, i.e., how many bytes separate the data of two consecutive vertices. Despite the name of this structure, we can also use it to pass per-instance data instead of per-vertex data, where "instance" refers to instanced rendering, a technique for efficiently rendering multiple copies of the same object through a single, special form of draw call (see this page for an OpenGL-based explanation); we come back to this topic towards the end of this chapter.
We describe how to access individual attributes from buffers via VkVertexInputAttributeDescriptions (we rely on the binding ids to identify buffers): we specify a format (which defines the type and thus the size of the attribute), a local offset in the per-vertex data, and a "location" id that should be unique across all attributes of all buffers (we rely on such ids to identify attributes from shaders).
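Here is a minimal sketch of these two structures for a hypothetical interleaved vertex buffer holding a position and texture coordinates per vertex; the binding and location ids are arbitrary choices that simply have to match the shaders and the bind calls.

#include <vulkan/vulkan.h>
#include <array>
#include <cstddef>

// Sketch: one interleaved vertex buffer with a position and uv coordinates per vertex.
struct Vertex {
    float pos[3];
    float uv[2];
};

VkVertexInputBindingDescription bindingDescription() {
    VkVertexInputBindingDescription binding{};
    binding.binding   = 0;                            // id used at vkCmdBindVertexBuffers time
    binding.stride    = sizeof(Vertex);               // bytes between two consecutive vertices
    binding.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;  // per-vertex (not per-instance) data
    return binding;
}

std::array<VkVertexInputAttributeDescription, 2> attributeDescriptions() {
    std::array<VkVertexInputAttributeDescription, 2> attrs{};
    attrs[0].location = 0;                            // matches layout(location = 0) in the shader
    attrs[0].binding  = 0;
    attrs[0].format   = VK_FORMAT_R32G32B32_SFLOAT;   // vec3 position
    attrs[0].offset   = offsetof(Vertex, pos);
    attrs[1].location = 1;                            // matches layout(location = 1) in the shader
    attrs[1].binding  = 0;
    attrs[1].format   = VK_FORMAT_R32G32_SFLOAT;      // vec2 texture coordinates
    attrs[1].offset   = offsetof(Vertex, uv);
    return attrs;
}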
There are situations where we do not want to pass any per-vertex data to the pipeline: all the information such as the object's coordinates may be hardcoded in the shaders themselves. In fact, we will see a concrete example of a vertex shader that does just that later on. In such scenarios, we do not provide any VkVertexInputBindingDescriptions nor VkVertexInputAttributeDescriptions.
B.2.2. Input assembly
The input assembly stage needs to know what kind of geometric primitives we are rendering. This helps the driver set up the calls to the vertex shader. For instance, if we are rendering a mesh, it groups the vertices into triplets corresponding to triangles, whereas if we are rendering lines, it just handles the vertices sequentially. This stage does not typically group vertices into primitives directly, as this would block some optimizations (as discussed earlier, GPUs may call the vertex shader once per index instead of once per vertex occurrence when using index buffers; then, it is only the results of the vertex shader that are assembled into primitives). Its focus is on assembling the attributes we feed into the pipeline and dispatching them across the vertex shader calls.
We control all of this through VkPipelineInputAssemblyStateCreateInfo. The main thing that we specify is the topology we target (see VkPrimitiveTopology for the full list of possibilities). There is also a very niche parameter that only makes sense when we use indexing with a strip or a triangle fan topology, and which allows using a special value in the place of some indices. This value is interpreted as an end to the definition of a primitive and the beginning of a new one, effectively allowing us to draw multiple items in one call (this feature is known as primitive restart).
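A minimal sketch, assuming we are rendering a plain triangle list:

#include <vulkan/vulkan.h>

// Sketch: assemble the incoming vertices into a triangle list, without
// primitive restart (which only applies to strip/fan topologies anyway).
VkPipelineInputAssemblyStateCreateInfo inputAssemblyState() {
    VkPipelineInputAssemblyStateCreateInfo info{};
    info.sType                  = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO;
    info.topology               = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;  // groups of three vertices
    info.primitiveRestartEnable = VK_FALSE;
    return info;
}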
B.2.3. Pipeline viewport state
The pipeline viewport state is mostly about specifying what part of the attachments are used by rendering operations. We describe this in a VkPipelineViewportStateCreateInfo object, using the notions of viewports (defined as a VkViewport) and scissors (defined as a VkRect2D). The image stretches to fill the viewport, and the scissor discards fragments that fall beyond the rectangle area it delimitates (acting like a rudimentary stencil buffer). There is a nice illustration of this over at vulkan-tutorial.com.
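Here is a minimal sketch for a single viewport covering a (hypothetical) 1920x1080 attachment, with a scissor that discards nothing:

#include <vulkan/vulkan.h>

// Sketch: one viewport spanning the whole attachment, plus a matching scissor.
static VkViewport viewport{0.0f, 0.0f, 1920.0f, 1080.0f, 0.0f, 1.0f};  // x, y, width, height, minDepth, maxDepth
static VkRect2D   scissor{{0, 0}, {1920, 1080}};                       // offset, extent

VkPipelineViewportStateCreateInfo viewportState() {
    VkPipelineViewportStateCreateInfo info{};
    info.sType         = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
    info.viewportCount = 1;
    info.pViewports    = &viewport;
    info.scissorCount  = 1;
    info.pScissors     = &scissor;
    return info;
}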
Most of the time, we use a single viewport (alongside its scissor). Multiple viewports are only supported by devices with the multiViewport device feature, and they have applications such as splitscreen rendering or rendering for VR headsets (where both eyes see slightly different outputs). With multiple viewports, we still produce only one image in the end, so we do not need to introduce additional attachments. By default, multi-viewport rendering is tricky: see this sample (note the use of a geometry shader, effectively duplicating vertices for use from both viewports). The VK_KHR_multiview extension provides a more efficient way of doing this (more details on this stackoverflow post). An alternative approach is to issue multiple draw calls from different pipelines, each targeting a different area of the same color attachment. This is more flexible but it comes at the expense of performance.
The viewport is not a mere 2D rectangle, as it also has a notion of depth. To better understand what will follow, some additional context is useful. The viewport state is tied to the vertex post-processing stage of the graphics pipeline, which runs right before rasterization. Rasterization is about flattening things (this is where Vulkan switches from a 3D coordinate system to a 2D one). Vertex post-processing may as well have been called "rasterization pre-processing". Indeed, this stage is mostly just Vulkan massaging our data to make it amenable to rasterization. The two most important things that it does are clipping (vertices that fall outside of the visible region are discarded; if only some of the vertices of a primitive such as a triangle are discarded, Vulkan repairs it by generating some new vertices right on the edge of the visible region and by constructing new primitives appropriately) and viewport mapping. This latter operation is the one we parameterize using the pipeline viewport state structure. More information on this topic can be found here and here.
Viewport mapping is a mapping between two 3D coordinate systems (even though at the beginning of this subsection, we discussed the viewport state a bit as if it were a 2D object). The depth information is mapped linearly to the third dimension; as this step happens after clipping, we know that everything will be neatly in-bound. In practice, we almost always set this range to [0.0, 1.0] (without certain extensions, we cannot pick values that fall outside of this range anyway). At the end of the vertex post-processing stage, our coordinates are said to be in "framebuffer space".
B.2.4. Rasterization
Rasterization is about generating fragments from primitives, where fragments are made out of the discrete coordinates of one of the pixels of the final image (also called "screen-space coordinates"), plus a depth component. Consider for example triangles: before this stage, we had framebuffer space coordinates for their vertices only. During rasterization, Vulkan transforms these vertex coordinates into 2D screen-space ones, and then computes all the pixels that fall between them, generating a fragment for each of them. The depth of each generated fragment is an interpolation of that of the original vertices, and Vulkan in fact interpolates all the other attributes forwarded from the vertex shader stage (e.g., texture coordinates) in the same way.
In practice, the exact behavior of the rasterization stage can vary a fair bit. We set up all of the operations that run at this stage through a single VkPipelineRasterizationStateCreateInfo structure.
For instance, we can set up face culling to reduce the amount of generated fragments. We typically use it for removing the fragments that correspond to the back faces of meshes. Think for instance about a cube: when no reflections are involved, we can see at most three of its faces. Why bother generating fragments for all six of them? The issue is that figuring out which faces of a mesh are visible and which are not is computationally hard in the general case. If it would cost more to find whether a face is visible than to keep the fragments associated to it, then culling would be counterproductive! Vulkan uses a cheap heuristic: when describing meshes, we list the vertices of each face in a consistent winding order (say, counter-clockwise when the face is seen from a vantage point exterior to the mesh); after the vertices are mapped to on-screen coordinates, front faces keep that winding while back faces appear with the opposite one. For more details and nice illustrations, see this guide (written about OpenGL, but the principle is the same in Vulkan). We control the culling criterion (front faces, back faces, all faces or no face) and whether "front face" refers to a clockwise or a counter-clockwise vertex order.
For meshes, we control how the triangles are rendered: although faces are usually filled-in, we could do wireframe rendering or even draw vertices only instead. With wireframe rendering (or when rendering any primitive that displays lines), we can also control the width of lines.
There are options related to depth testing. Depth clamping ensures that all depth values fall in the range that is visible to the camera: a depth cannot be smaller than the distance of the near plane nor larger than the distance of the far plane (refer to this page for more detail on near/far planes). We can also use depth bias to tweak fragment depth values (the bias can be more than just a constant value, see the doc for the detail). This is useful for countering shadow acne.
We can fully disable the rasterization step if we want to, which in turn prevents the rest of the pipeline from running. This is useful in situations when we only care about the side effects of the shaders that run before this stage (which is extremely niche). See this stackoverflow discussion for more information.
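Here is a minimal sketch of a typical configuration: filled triangles, back-face culling with counter-clockwise front faces, and no depth bias.

#include <vulkan/vulkan.h>

// Sketch: standard rasterization settings for an opaque mesh.
VkPipelineRasterizationStateCreateInfo rasterizationState() {
    VkPipelineRasterizationStateCreateInfo info{};
    info.sType                   = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
    info.depthClampEnable        = VK_FALSE;
    info.rasterizerDiscardEnable = VK_FALSE;                // keep the rest of the pipeline running
    info.polygonMode             = VK_POLYGON_MODE_FILL;    // filled faces (not wireframe/points)
    info.cullMode                = VK_CULL_MODE_BACK_BIT;   // drop back faces
    info.frontFace               = VK_FRONT_FACE_COUNTER_CLOCKWISE;
    info.depthBiasEnable         = VK_FALSE;
    info.lineWidth               = 1.0f;
    return info;
}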
B.2.5. Pipeline depth/stencil state
If we bind a depth/stencil resource to a subpass, then we must also specify how the pipeline should interface with it. We do this using a VkPipelineDepthStencilStateCreateInfo.
We may enable depth testing, disable it altogether, or enable it in a read-only mode. We specify what comparison operation to use as a VkCompareOp, and we may also restrict the range of depth values that we consider in-bounds (the depth bounds test); samples that fall out of this range are discarded without testing.
We may also enable or disable stencil testing, and we control how this test handles fragments belonging to front-facing and back-facing faces using two VkStencilOpState objects. In both of these structures, not only do we specify the test operation itself (via a VkCompareOp), but we also define the behavior of samples that pass the stencil test, fail it, or pass it while failing the depth test, all of that via VkStencilOps. We set two masks for specifying which bits of the stencil value are read from and written to by tests. We may also provide a constant reference value for use within the test operations.
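A minimal sketch for standard read-write depth testing, with the depth bounds and stencil tests disabled:

#include <vulkan/vulkan.h>

// Sketch: keep the nearest fragment (smaller depth means closer to the camera).
VkPipelineDepthStencilStateCreateInfo depthStencilState() {
    VkPipelineDepthStencilStateCreateInfo info{};
    info.sType                 = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
    info.depthTestEnable       = VK_TRUE;
    info.depthWriteEnable      = VK_TRUE;             // VK_FALSE would give a read-only depth test
    info.depthCompareOp        = VK_COMPARE_OP_LESS;  // pass if the new fragment is closer
    info.depthBoundsTestEnable = VK_FALSE;
    info.stencilTestEnable     = VK_FALSE;
    return info;
}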
B.2.6. Blending
Blending is about mixing the results of the fragment shader with the contents of a color attachment (fragments that fail the depth test never reach this far). We control blending via a VkPipelineColorBlendStateCreateInfo structure.
For each color attachment, we create a VkPipelineColorBlendAttachmentState, where we describe how blending should be done for it. We can disable blending altogether for it (meaning that fragments that reach the end of the pipeline blindly overwrite the contents of the color attachment), or for some specific components only. If we do not disable blending, we specify how the blended value is computed using VkBlendOps: one for color values and one for alpha values. Furthermore, Vulkan remaps the values of both arguments of this operation according to the VkBlendFactors that we additionally specify. We can also provide a single RGBA blend constant. This value is shared among all attachments, and it is used when remapping arguments for some values of VkBlendFactor.
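As an example, here is a minimal sketch of the classic alpha blending configuration (the kind we could use for the transparent objects discussed earlier): the new color is weighted by the fragment's alpha, and the existing color by one minus that alpha.

#include <vulkan/vulkan.h>

// Sketch: out.rgb = src.a * src.rgb + (1 - src.a) * dst.rgb, out.a = src.a.
VkPipelineColorBlendAttachmentState alphaBlending() {
    VkPipelineColorBlendAttachmentState state{};
    state.blendEnable         = VK_TRUE;
    state.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA;
    state.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
    state.colorBlendOp        = VK_BLEND_OP_ADD;
    state.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
    state.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO;
    state.alphaBlendOp        = VK_BLEND_OP_ADD;
    state.colorWriteMask      = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
                                VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT;
    return state;
}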
All VkPipelineColorBlendAttachmentState objects must be identical if the independentBlend device feature is not enabled (as we will see when discussing the fragment shader, we emit one value per attachment; even when we blend them the same way, we are not just blending the same value for all attachments).
Optionally, we can use a VkLogicOp to handle the blending instead of the VkPipelineColorBlendAttachmentState-based method. This is only possible when the logicOp device feature is enabled. The available operations are bitwise binary operations instead of typical blending ones. This is very niche and you probably do not need it.
B.2.7. Pipeline multisample state
In render passes where multisampling is enabled, we use VkPipelineMultisampleStateCreateInfo to configure how draw calls interact with this feature. Even though we already specified this when defining the render pass attachments, we give the number of samples used by the pipeline explicitly. This is because pipelines may use fewer samples than the attachments support.
Before proceeding any further, I recommend you to check this cool overview of multisampling in Vulkan. With classical multisampling, the fragment shader runs once for all samples of a pixel that are tied to the same primitive. This means that multisampling only reduces artifacts at the interface between distinct primitives. To reduce artifacts within single primitives, we may activate sample shading to run the fragment shader for each sample instead. To mitigate the high performance cost, we can choose to run the fragment shader for some fraction of the samples belonging to a given object (e.g., 40%). We may specify a sample mask to disable the use of some of the samples. In practice, most renderers stick to classical multisampling.
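A minimal sketch for classical 4x multisampling, with sample shading left disabled:

#include <vulkan/vulkan.h>

// Sketch: the fragment shader runs once per covered pixel, not once per sample.
VkPipelineMultisampleStateCreateInfo multisampleState() {
    VkPipelineMultisampleStateCreateInfo info{};
    info.sType                 = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO;
    info.rasterizationSamples  = VK_SAMPLE_COUNT_4_BIT;
    info.sampleShadingEnable   = VK_FALSE;  // VK_TRUE + minSampleShading would shade per sample
    info.minSampleShading      = 1.0f;
    info.alphaToCoverageEnable = VK_FALSE;
    info.alphaToOneEnable      = VK_FALSE;
    return info;
}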
Vulkan supports alpha to coverage, which is about making the alpha channel of pixels reflect the proportion of samples that correspond to the object (this is especially useful for rendering dense foliage). Alternatively, we may pin the value of the alpha component to 1.
Multisampling is only one form of anti-aliasing. It features prominently in the API since it is very common and requires hardware support, but it is not the only available option (see this Wiki article for more information).
B.2.8. Pipeline dynamic state
Most pipeline parameters are constants. This is good for the driver, as it allows it to heavily optimize the rendering process. However, sometimes we would actually like to make some more parameters dynamic. We can only do this in pipelines where these parameters are declared upfront as dynamic state. We do this using VkPipelineDynamicStateCreateInfo.
We list the parameters that vary as VkDynamicState. In Vulkan 1.0, the parameters that qualify for dynamic state are limited to viewports, scissors, line width (e.g., when rendering in wireframe mode) and the blend constant, as well as the specifics of the depth and stencil attachments (bias and bounds for depth; compare mask, write mask and reference value for stencil). More recent versions expanded this list, but there are still limitations as to what can be dynamic. If we have no dynamic state, we can use a null pointer instead of this structure.
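Here is a minimal sketch declaring the viewport and scissor as dynamic:

#include <vulkan/vulkan.h>
#include <array>

// Sketch: the viewport and scissor are then set at recording time with
// vkCmdSetViewport/vkCmdSetScissor instead of being baked into the pipeline.
static const std::array<VkDynamicState, 2> kDynamicStates = {
    VK_DYNAMIC_STATE_VIEWPORT,
    VK_DYNAMIC_STATE_SCISSOR,
};

VkPipelineDynamicStateCreateInfo dynamicState() {
    VkPipelineDynamicStateCreateInfo info{};
    info.sType             = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
    info.dynamicStateCount = static_cast<uint32_t>(kDynamicStates.size());
    info.pDynamicStates    = kDynamicStates.data();
    return info;
}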
B.2.9. Shader stages
The graphics pipeline uses multiple shaders. Like in the compute case, we use VkPipelineShaderStageCreateInfo to describe shaders. The main difference is that we now need a whole array of such structures.
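A minimal sketch, assuming vertModule and fragModule are shader modules built from compiled SPIR-V as in the previous chapter:

#include <vulkan/vulkan.h>
#include <array>

// Sketch: the two mandatory stages of the graphics pipeline.
std::array<VkPipelineShaderStageCreateInfo, 2>
shaderStages(VkShaderModule vertModule, VkShaderModule fragModule) {
    VkPipelineShaderStageCreateInfo vert{};
    vert.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    vert.stage  = VK_SHADER_STAGE_VERTEX_BIT;
    vert.module = vertModule;
    vert.pName  = "main";  // entry point in the shader

    VkPipelineShaderStageCreateInfo frag = vert;
    frag.stage  = VK_SHADER_STAGE_FRAGMENT_BIT;
    frag.module = fragModule;

    return {vert, frag};
}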
B.2.10. Putting it all together
VkGraphicsPipelineCreateInfo expects one of each of the structures presented above (plus a VkPipelineTessellationStateCreateInfo, which we did not describe here as tessellation is out of the scope of this guide).
It also expects a render pass object. We will only be able to bind the resulting graphics pipeline in a context where this render pass or one compatible with it is bound (see section A.2 for more information). We also specify the index of the subpass in which this pipeline will be used, which allows the driver to further optimize things. The other arguments are carried over from VkComputePipelineCreateInfo: we provide a pipeline layout, and we could use pipeline derivatives (but just like then, we should not).
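Here is a minimal sketch of the final assembly; all the parameters are assumed to be the structures built in the previous subsections, and error handling is omitted.

#include <vulkan/vulkan.h>

// Sketch: assembling the per-stage structures into a graphics pipeline object.
VkPipeline createGraphicsPipeline(
    VkDevice device, VkPipelineLayout layout, VkRenderPass renderPass, uint32_t subpass,
    const VkPipelineShaderStageCreateInfo* stages, uint32_t stageCount,
    const VkPipelineVertexInputStateCreateInfo* vertexInput,
    const VkPipelineInputAssemblyStateCreateInfo* inputAssembly,
    const VkPipelineViewportStateCreateInfo* viewport,
    const VkPipelineRasterizationStateCreateInfo* rasterization,
    const VkPipelineMultisampleStateCreateInfo* multisample,
    const VkPipelineDepthStencilStateCreateInfo* depthStencil,
    const VkPipelineColorBlendStateCreateInfo* colorBlend,
    const VkPipelineDynamicStateCreateInfo* dynamic) {  // may be nullptr
    VkGraphicsPipelineCreateInfo info{};
    info.sType               = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
    info.stageCount          = stageCount;
    info.pStages             = stages;
    info.pVertexInputState   = vertexInput;
    info.pInputAssemblyState = inputAssembly;
    info.pViewportState      = viewport;
    info.pRasterizationState = rasterization;
    info.pMultisampleState   = multisample;
    info.pDepthStencilState  = depthStencil;
    info.pColorBlendState    = colorBlend;
    info.pDynamicState       = dynamic;
    info.layout              = layout;
    info.renderPass          = renderPass;  // or any compatible render pass at bind time
    info.subpass             = subpass;     // index of the subpass this pipeline will serve

    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(device, VK_NULL_HANDLE /* no pipeline cache */, 1, &info, nullptr, &pipeline);
    return pipeline;
}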
B.3. Shaders
The graphics pipeline introduces several new kinds of shaders: the mandatory vertex and fragment shaders, which we present in further detail in this section, as well as optional ones (which we are leaving aside — this chapter is long enough as is). Unlike compute shaders, there is no notion of workgroups or dimensionality of computations here. Furthermore, graphics shaders play a precise role in a wider pipeline. They receive specific inputs and must emit specific outputs based on this role; GLSL introduces special mechanisms for this purpose. We can also propagate arbitrary data from shader to shader along the pipeline.
Here is a sample fragment shader that illustrates some key differences (a minimal one that simply forwards an interpolated color):
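#version 460

// Minimal fragment shader: it forwards an interpolated color to the first
// color attachment of the subpass.
layout(location = 0) in vec3 fragColor;  // matches an output of the vertex shader (location 0)
layout(location = 0) out vec4 outColor;  // written to the color attachment at index 0

void main() {
    outColor = vec4(fragColor, 1.0);
}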
There are three new keywords to notice: location, in and out. in and out are about the data that we push from shader to shader, and locations are unique identifiers for such data: the location of an input should match that of an output of the previous stage, except for the vertex shader (as it has no predecessor, being the first shader to run), for which it refers to the pipeline's VkPipelineVertexInputStateCreateInfo instead (this structure stores an array of VkVertexInputAttributeDescriptions that contain a location field each).
There are also new layout qualifiers (refer to the doc for the full list). Most of them are specific to one kind of shader, but there are some more global ones: we already discussed location, but there is also component, which we can use to pack more data in a single location (additional detail here; this is very niche and you probably do not need it). Furthermore, we can use input_attachment_index as layout(input_attachment_index = 0, set = 0, binding = 0) uniform subpassInput t;, to refer to input attachments, i.e., image views that support pixel-local operations only (note the specific subpassInput type).
B.3.1. Vertex shaders
A vertex shader receives the pipeline's per-vertex attributes as input, and it must produce a position in exchange. (In older, OpenGL-flavored GLSL, vertex inputs used the attribute keyword instead of in.)
GLSL defines gl_Position, a special variable of type vec4 to which we must assign the clip-space position of the current vertex before the end of the stage. You may wonder why it is a vec4 and not a vec3; this stackoverflow post answers this question (short version: homogeneous coordinates; the last component is almost always set to 1).
We can access some special GLSL variables from vertex shaders (see the spec):
- int gl_VertexIndex contains the index of the current vertex. It is useful in situations where we hardcode the vertex data instead of passing it as a pipeline input. A concrete example (this code comes from here):

#version 460 // We are using version 4.60 of GLSL

layout(location = 0) out vec3 fragColor;

vec2 positions[3] = vec2[](
    vec2(0.0, -0.5),
    vec2(0.5, 0.5),
    vec2(-0.5, 0.5)
);

vec3 colors[3] = vec3[](
    vec3(1.0, 0.0, 0.0),
    vec3(0.0, 1.0, 0.0),
    vec3(0.0, 0.0, 1.0)
);

void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
    fragColor = colors[gl_VertexIndex];
}

- int gl_BaseVertex: when issuing draw calls, we may handle vertices from a certain offset on only. This variable holds that offset. Note that gl_VertexIndex is relative to 0 and not to gl_BaseVertex.
- int gl_InstanceIndex is the same as gl_VertexIndex, except for instances (more on these in the last section).
- int gl_BaseInstance is the same as gl_BaseVertex, except for instances.
- int gl_DrawID is usually 0, except possibly when using the VK_EXT_multi_draw extension.
There are other output variables in addition to gl_Position, though these are usually not mandatory (see the ref):
- We use float gl_PointSize when rendering points. This variable defines the size of the points in pixels (this value is defined per vertex).
- float gl_ClipDistance[] is related to an advanced feature: user-defined clipping. User-defined clipping is an extension of the standard clipping phase during which Vulkan drops the fragments that fall outside of the visible region. It allows us to drop fragments depending on their positions relative to arbitrary planes. We may define several such planes, so this is an array type. See this video for more details.
- float gl_CullDistance[] is a bit like the above except that it can cull full primitives, and Vulkan never needs to repair anything by adding new primitives sitting right at the border of culling planes like it must do for the clipping planes. See this stackexchange discussion for more information.
B.3.2. Fragment shaders
A fragment shader receives the fragment's screen-space location along with interpolated per-fragment data as input, and it emits colors as output (the format of which depends on our color attachments).
Unlike for the vertex shader, Vulkan does not define a special variable to which we should write the resulting color. This is because we may have several color attachments (as is for example the case for deferred rendering); a single keyword would not be enough. Instead, we define one shader output per color attachment (using the out keyword). The location of the output binds it to the color attachment with a matching id in the subpass description.
We can access some special GLSL variables from fragment shaders (see the spec):
- vec4 gl_FragCoord contains the screen-space (homogeneous) coordinates of the current fragment. There are two special layout qualifiers that apply only to this variable: origin_upper_left and pixel_center_integer (they may be used like layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;). In Vulkan, the origin of the framebuffer is at the upper-left corner (gl_FragCoord behaves as if origin_upper_left were set), and the centers of pixels are located at half-pixel coordinates. For example, the (x, y) coordinates (0.5, 0.5) correspond to the pixel at the top left of the image.
- bool gl_FrontFacing is true iff the fragment belongs to a front-facing face (as per the backface culling rules).
- float gl_ClipDistance[] is one of the outputs coming from the vertex shader. Of course, the vertex shader was per-vertex whereas this input is per-fragment. As usual, the value is an interpolate of the surrounding vertices'. Note that at this stage, clipping has already been performed.
- float gl_CullDistance[] is also one of the outputs coming from the vertex shader, and it is also an interpolate. Note that at this stage, culling has already been performed.
- vec2 gl_PointCoord is defined for point primitives only, and it is related to gl_PointSize (emitted from the vertex shader). It contains the position of the fragment in the quad generated from its point primitive.
- int gl_PrimitiveID identifies the primitive to which the current fragment is attached.
- int gl_SampleID identifies the sample being evaluated. Merely using this variable anywhere in the shader code (a "static use") forces Vulkan to evaluate the shader once per sample for all pixels; this and the other gl_SampleXxx variables are multisampling-related.
- vec2 gl_SamplePosition contains the subpixel coordinates of the current sample. Same warning as above regarding static contexts.
- int gl_SampleMaskIn[] indicates which samples of the current pixel are covered by the current fragment.
- int gl_Layer and int gl_ViewportIndex are set from the geometry shader, which offers ways of emitting vertices to different layers of the framebuffer images, or of targeting specific viewports. See this stackoverflow discussion for more information. If they are not defined there, we cannot access them from the fragment shader.
- bool gl_HelperInvocation indicates whether the current fragment shader execution computes the value of an actual fragment or whether it was invoked only to support neighboring invocations (if this is the case, the results of this invocation are not written to the framebuffer; turn to the spec for more details). This happens when derivatives are used.
There are also special GLSL variables that we can use as output (see the spec):
- We can override the computed depth (which we can read from gl_FragCoord) using float gl_FragDepth. We may use the depth_any (default value), depth_greater, depth_less or depth_unchanged layout qualifiers for this variable only (e.g., as layout(depth_any) out float gl_FragDepth;). These qualifiers assert properties about our new depth value, which the driver can use for optimizations (if they do not hold in practice, the result of the depth test is undefined). For instance, depth_greater means that the new depth values are greater than the ones they replace.
- Vulkan ANDs int gl_SampleMask[] with gl_SampleMaskIn[], and it only writes the result of the current invocation to samples that are set to true in the result (not setting this value is equivalent to forwarding gl_SampleMaskIn[]); this is multisampling-related.
early_fragment_tests is a very special layout qualifier which we may use as layout(early_fragment_tests) in; (note the absence of a variable name). It causes all depth and stencil tests to run before the fragment shader, which effectively disables any effects of writing to gl_FragDepth.
Finally, we may use the index layout qualifier for outputs only, e.g., as layout(location = 3, index = 1) out vec4 factor;. This sets the index for use with a feature known as dual source blending (more information on the OpenGL wiki; again, this is very niche).
GLSL defines some functions exclusive to fragment shaders in addition to the standard GLSL ones.
B.4. Framebuffers and attachments
Framebuffers are sets of image views that we can bind within a command buffer for use as render pass attachments (more detail on the binding part in the next section). We create them via vkCreateFramebuffer.
Framebuffers define the dimensions of attachment images (width, height, and number of layers; we can use a pipeline with attachments of any dimensions, but all the attachments in a framebuffer have the same dimensions). We define them using a reference to a specific render pass, but we can use them with any other render pass that is compatible with it (i.e., we may use render passes with similar attachments; only image layouts are allowed to differ, more information here).
We previously discussed attachment descriptions and attachment references, which were like the layouts of attachments. Framebuffers hold the concrete image views that we bind for use by draw calls in render passes. These are simply defined as VkImageViews with some restrictions regarding the underlying images: they should of course respect the dimensions defined by the framebuffer (width, height, and number of layers), but these dimensions are a minimum: the actual resources must be at least as large as this.
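Here is a minimal sketch of the creation of a framebuffer binding two image views (color and depth) to the two attachment slots of a compatible render pass; the views and dimensions are assumed to come from images created beforehand.

#include <vulkan/vulkan.h>
#include <array>

// Sketch: the order of the views must match the order of the attachment
// descriptions in the render pass.
VkFramebuffer createFramebuffer(VkDevice device, VkRenderPass renderPass,
                                VkImageView colorView, VkImageView depthView,
                                uint32_t width, uint32_t height) {
    std::array<VkImageView, 2> attachments = {colorView, depthView};

    VkFramebufferCreateInfo info{};
    info.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    info.renderPass      = renderPass;  // or any render pass compatible with it
    info.attachmentCount = static_cast<uint32_t>(attachments.size());
    info.pAttachments    = attachments.data();
    info.width           = width;
    info.height          = height;
    info.layers          = 1;

    VkFramebuffer framebuffer = VK_NULL_HANDLE;
    vkCreateFramebuffer(device, &info, nullptr, &framebuffer);
    return framebuffer;
}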
B.5. Registering and running render passes
We sandwich all rendering-related commands between a vkBeginCommandBuffer and a vkEndCommandBuffer. We must issue them in a context where a render pass is bound, i.e., between a vkCmdBeginRenderPass and a vkCmdEndRenderPass. vkCmdBeginRenderPass expects a render pass object and a framebuffer compatible with it as arguments (the framebuffer provides concrete resources that the driver binds to the attachment slots). At this point, we indicate which area of the attachments is updated by rendering operations in this render pass to help the driver optimize things. We should enforce that nothing is changed outside of that area, for instance by using scissors. Additionally, we define the color used for clear operations (used, e.g., when loading attachments that use VK_ATTACHMENT_LOAD_OP_CLEAR).
Rendering operations occur in specific subpasses. Vulkan automatically binds the first subpass (following their order in the array that we used to create the render pass object) at vkCmdBeginRenderPass time, and we use vkCmdNextSubpass to switch from a subpass to the next. Both of these functions take a VkSubpassContents argument, which controls whether the subpass' commands are recorded inline in the primary command buffer or provided exclusively through secondary command buffers (remember that secondary buffers are pre-registered command buffers; we execute them through vkCmdExecuteCommands, which is the only operation authorized in subpasses that use secondary buffers).
A draw call is tied to a pipeline, which we bind through vkCmdBindPipeline (several draw calls may be issued from the same pipeline). This looks like what we did for the compute case, except that we must do it from within a subpass now. We also bind resources to descriptors just like we did before, i.e., using vkCmdBindDescriptorSets. We bind vertex data through vkCmdBindVertexBuffers. We must provide one buffer per VkVertexInputBindingDescription in the graphics pipeline's VkPipelineVertexInputStateCreateInfo, and we specify an offset into each source buffer. Also, we may update only some of the vertex input bindings at a time by specifying appropriate values for the firstBinding and bindingCount fields. If the data is indexed, we use vkCmdBindIndexBuffer (we also specify an offset into the buffer plus a datatype in this case).
All the objects that we render in the same way may be drawn using the same pipeline, although we need to update the resources that they use between two draw calls (such as textures and vertex buffers). We use vkCmdBindDescriptorSets, vkCmdBindVertexBuffers and vkCmdBindIndexBuffer between draw calls for this purpose. We may also use vkCmdClearAttachments for clearing regions of depth/stencil attachments then.
Once all of this is done, issuing the actual draw commands is easy. There are several different draw commands to choose from — the most common ones are the following two (a short recording sketch follows the list):
- vkCmdDraw takes a vertex count and an instance count, as well as an offset into the vertex buffer, and, similarly, an instance offset. See this for an explanation of instancing (remember the per-instance attributes?).
- vkCmdDrawIndexed is meant for situations where we use an index buffer. In addition to the previous arguments, we also provide an offset into the bound index buffer. Indexing helps us avoid duplicated data, and it also helps the GPU avoid duplicated computations: in principle, the vertex shader may run once per unique index instead of once per occurrence of the vertex in primitives. Note that the results of vertex shaders are typically stored into a post-transform cache before being assembled into primitives (which all happens in the vertex shader stage, if my understanding is correct). The device can read from this cache instead of re-running the vertex shader every time. In practice, this does not eliminate all duplicated computations (as discussed in this paper, summarized in this blog post; in short, there is not a single global post-transform cache, but a set of local ones).
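Putting the recording flow together, here is a minimal sketch of a command buffer that runs one render pass with a single subpass and a single indexed draw call; all handles are assumed to have been created beforehand, and error checking is omitted.

#include <vulkan/vulkan.h>

// Sketch: record one render pass (color + depth attachments) containing a
// single indexed draw call.
void recordRendering(VkCommandBuffer cmd, VkRenderPass renderPass,
                     VkFramebuffer framebuffer, VkExtent2D extent,
                     VkPipeline pipeline, VkPipelineLayout layout,
                     VkDescriptorSet descriptorSet,
                     VkBuffer vertexBuffer, VkBuffer indexBuffer,
                     uint32_t indexCount) {
    VkCommandBufferBeginInfo begin{};
    begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    vkBeginCommandBuffer(cmd, &begin);

    VkClearValue clearValues[2]{};
    clearValues[0].color        = {{0.0f, 0.0f, 0.0f, 1.0f}};  // color attachment clear value
    clearValues[1].depthStencil = {1.0f, 0};                   // depth starts infinitely far away

    VkRenderPassBeginInfo rpBegin{};
    rpBegin.sType           = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
    rpBegin.renderPass      = renderPass;
    rpBegin.framebuffer     = framebuffer;        // provides the concrete attachments
    rpBegin.renderArea      = {{0, 0}, extent};   // area touched by the rendering operations
    rpBegin.clearValueCount = 2;
    rpBegin.pClearValues    = clearValues;

    vkCmdBeginRenderPass(cmd, &rpBegin, VK_SUBPASS_CONTENTS_INLINE);  // first subpass is now active
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
    vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout,
                            0, 1, &descriptorSet, 0, nullptr);
    VkDeviceSize offset = 0;
    vkCmdBindVertexBuffers(cmd, 0, 1, &vertexBuffer, &offset);
    vkCmdBindIndexBuffer(cmd, indexBuffer, 0, VK_INDEX_TYPE_UINT32);
    vkCmdDrawIndexed(cmd, indexCount, 1, 0, 0, 0);  // one instance, no offsets
    vkCmdEndRenderPass(cmd);

    vkEndCommandBuffer(cmd);
}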
There are other draw commands, but they are rather niche variants of those presented above. See vkCmdDrawIndirect, vkCmdDrawIndexedIndirect, vkCmdDrawIndexedIndirectCount and vkCmdDrawIndirectByteCount.
We can update dynamic state within a render pass using the following functions:
- vkCmdSetViewport
- vkCmdSetScissor
- vkCmdSetDepthBias
- vkCmdSetDepthBounds
- vkCmdSetBlendConstants
- vkCmdSetStencilCompareMask
- vkCmdSetStencilWriteMask
- vkCmdSetStencilReference
Note that the specification forbids running many kinds of commands from render passes (such as vkCmdDispatch and vkCmdCopyBuffer).
Well, that was it! After all of this hard work, we are finally able to do rendering, which for most people is the core appeal of Vulkan. Now, all that we are missing is a way of continuously streaming the results of our rendering operations to the screen. We remedy this in the next chapter. This was the most complex chapter in this tutorial, and if you made it to this point, you will find that the next chapters are refreshingly simpler (also, congratulations!).