Chapter 4: the graphics pipeline

Warning

This chapter has been written in one go and has never been refined nor proofread

Thanks to the previous chapter, we know how to run arbitrary computations on a GPU. How cool is that? We are now going to get acquainted with the graphics pipeline, the rendering counterpart of the compute pipeline. Of course, we could already generate images from the compute pipeline by writing the results of our computations to a storage image, but, as we will see, the graphics pipeline is both faster and more convenient.

In this chapter, we restrict ourselves to offline rendering, as opposed to real-time rendering. That is, we are handling a single image at a given time and not a continuous stream thereof. We also write our rendered image to an image object, as opposed to presenting it to a system window. We will tackle these two points in the next (and final) chapter of this series.

A. A high-level overview

The graphics pipeline is slightly more complex than its compute counterpart. The most glaring difference is that it is built out of many more stages: we do not run a single shader but at least two (tessellation and geometry shading are optional). There are also additional fixed-function stages whose behavior can be controlled, albeit in a much more limited fashion.

A less conspicuous difference has to do with how the pipeline handles data. Remember how we had to deal with passing arguments to the shader in the compute case? We still have to do that but for all shaders of the pipeline. Furthermore, there is an additional possible origin for shader parameters, namely the pipeline itself. Indeed, the compute shader was very much freeform: we took the parameters we wanted and we were free to pick which results to store. The vertex and fragment shaders are not as flexible, a consequence of their fixed role in the grander scheme of the graphics pipeline. They are automatically fed specific data, and they must return other data of a determined form. However, they are still allowed to store data in storage resources, just like compute shaders.

Comparison between the compute and the graphics pipeline

The objective of this section is to give you an intuition for how rendering works in Vulkan. We first consider how to render a single object to an image, before turning to rendering multiple objects in parallel. There are obvious challenges regarding resource access synchronization, but Vulkan has us covered.

A.1. Rendering a single object

The graphics pipeline is designed for rendering a single mesh (or another geometric object; in particular, it does not have to be 3D) to an image. Now, real-life scenes are usually built out of at least tens of meshes; we will soon see how to work around this apparent limitation. Below is a simplified representation of the graphics pipeline (heavily inspired by the illustration over at vulkan-tutorial.com, with parts of the illustration in fact directly lifted from there).

To a first approximation, the roles of the different stages are as follows: the input assembler gathers vertices (and, optionally, indices) into primitives; the vertex shader computes the screen-space position of each vertex; the optional tessellation and geometry stages may refine or generate geometry; the rasterizer turns primitives into fragments; the fragment shader computes a color for each fragment; finally, blending combines the fragments that map to the same pixel into the final value of the color attachment.

We will not present the optional stages in more detail in this series, as they are not part of the core concepts of Vulkan; once you understand the rest, you should be able to get them to work easily.

The graphics pipeline expects inputs related to the primitives to render. This is done by passing vertex buffers and (optionally) an index buffer. The index buffer is a linear list of ids referencing data from the vertex buffers. These ids are resolved into primitives at the input assembler stage. The contents of the vertex buffers themselves are rather freeform: we may store additional per-vertex information next to each coordinate, such as the texture coordinate at this vertex or the normal direction. We are the ones writing the vertex shader, so we know how to extract the relevant information! The rest is just propagated down the pipeline and can be used in the fragment shader. Alternatively, instead of interleaving our data, we can split it into multiple distinct buffers (which seems to be better performance-wise, see this; at least, position-related data belongs in its own, uninterleaved buffer). We create an input binding description for each vertex buffer, and an input attribute description for each kind of attribute that we store (there may be several attributes tied to the same vertex buffer when we interleave per-vertex data — we just have to specify the right offset and stride). We can then reference the corresponding data from the shaders.

By default, the parameters originating from input bindings can only be accessed from the vertex shader (the very first shader to run). To propagate data further down the pipeline, we have to pass it along as a special parameter. In fact, we can even generate new kinds of such per-vertex parameters containing arbitrary data. For each shader, we just specify which such parameters are expected as input and which are emitted as output. Parameters are identified across different shaders according to the location they are bound at — this is just an arbitrary id. We do not have to declare them anywhere outside of the shader itself (in particular, we do not need to create descriptor set layout bindings or whatnot; this is another kind of parameter altogether). GLSL provides special keywords for this purpose. Note that this data is per-vertex, whereas the fragment shader is, well, per-fragment. By default, the fragment shader receives an interpolation of the values from the appropriate neighboring vertices (e.g., those that make up the mesh face the fragment was generated from).

Our vertex/fragment shaders are expected to return vertex/color information for further use by the pipeline. GLSL provides special pseudovariables to which this data should be written.

In the graphics pipeline, we first encounter the notion of attachments. There are several kinds of them, with different uses. The color attachments are only ever accessed from the blending stage. They contain the final result of the rendering, and the resolution of the images bound to them is what determines the resolution of the render result. The depth/stencil buffer is a single attachment that carries two kinds of information. The depth information is used to ensure that the fragments nearest to the camera are the ones that end up in the final image (for each pixel, it stores the distance from the camera of the closest fragment rendered so far: if a closer fragment is found, the value in the color attachment is overridden for the current pixel and the depth buffer is updated with the new closest distance; otherwise, the fragment is plainly ignored). Stencil attachments are used to constrain which areas of the color attachment can actually be rendered to. There are also input attachments, which correspond to input data to be accessed by the fragment shader (appropriate data descriptors have to be provided). These come with an intriguing limitation: when handling a pixel with certain coordinates in the image, the only thing that we can access from the input attachments is data located at the same coordinates within them. You may wonder why we should bother with input attachments since they appear to be just a very limited form of resource. The answer has to do with synchronization between distinct rendering tasks, a topic explained in the next section.

In the fuller version of the pipeline, we showed that there are per-fragment test stages. These are about discarding fragments which do not contribute to the final image. For instance, fragments that are mapped outside of the image's boundaries can safely be ignored; oftentimes, we can avoid running the fragment shader altogether for these fragments. See this page for a more detailed explanation (it is written in the context of OpenGL, but the concept remains the same).

To render an object, we use a special command referred to as a draw call. In order to issue one, we need to be in a context where a graphics pipeline object is bound. This pipeline object is the graphics equivalent of the compute pipeline that we encountered in the previous chapter. Just like in the compute case, the pipeline regroups all the information pertaining to the general structure of the rendering task: which resources are available and which stages may access them (through a good old pipeline layout object). Unlike in the simpler compute case, we also have to provide settings for the fixed-function stages, and we select which of the optional shaders to activate. Once the pipeline is bound and before the draw call is issued, we also have to bind resource descriptors and to provide values for push constants. Not much new under the sun.

A.2. Handling multiple objects

The secret to rendering multiple objects to the same image lies in the framebuffer. A framebuffer is a wrapper around image views. These constitute state that we can reference throughout a rendering task.

Rendering tasks are described as render passes, which define their global flow, especially with regard to the data found in the framebuffer (we always need a rendering task, even for drawing a single object). The render pass is an abstract description that is not tied to a concrete framebuffer. In that sense, it can be compared to a pipeline layout: it is only about the structure of the operation.

A render pass defines a set of subpasses describing operations that may be graphical or computational. Each subpass references different image views found in the framebuffer for use with specific roles for the current task. We call such references attachments. For instance, it is customary to use one attachment for storing depth information. This is the depth attachment from the pipeline schema given at the onset of this chapter!

So long as their attachments do not change, draw (or dispatch) calls can be emitted from the same subpass. Typically, only one subpass is used for rendering all the objects that belong to the same scene. Subpasses are most useful for deferred rendering instead of forward rendering, where later subpasses make use of images produced by earlier subpasses (see forward rendering vs. deferred rendering). Attachments can be accessed in a limited way from within graphics shaders. Only information tied to the same coordinates can be loaded from them.

In principle, subpasses can run wholly in parallel. If there are dependencies between subpasses, we have to specify them explicitly through a construct known as subpass dependencies.

Vulkan encourages the use of single rendering passes with multiple subpasses instead of several distinct rendering passes (when the limitations to attachments access do not get in the way; any effect which requires arbitrarily reading from an image that was rendered to must be in a different render pass from the render pass that rendered that image). Indeed, this leads to better performance, especially for tile-based architectures (as commonly found on mobile devices). Conceptually, a tile-based renderer would consider pixels one at a time and run all subpasses for them in a sequence. This eliminates having to actually allocate and transfer around large textures: we can use lazy allocation, as discussed in the resources and transfers chapter. This "one pixel at a time" way of working explains the access limitations for framebuffer data: the rest of the data may not exist at the time when the pixel is handled! Note that in practice, things are never really done "one pixel at a time" but "one tile at a time": the image is split into small tiles (of size e.g. 16x16) which are handled from beginning to end in one go. Still, we can never know which neighbors are present, so there you have it.

It is really nice to write code once and to have it run efficiently on two architectures built following such different principles! Furthermore, the render pass/subpass model is not a costly abstraction and also brings some value outside of deferred rendering on tile-based architectures (as discussed here).

Draw calls emitted from the same subpass share the same attachments. This is no issue for read-only resources, but consider for instance the depth buffer, which is read-write. How do we stop the different calls from running into access conflicts? The answer is straightforward: earlier calls run before later ones, and synchronization is automatic. Of course, we can update the different descriptor sets between draw calls.

As a concrete example, consider how we could handle partial transparency (think of stained glass). As later draw calls are blended on top of what earlier ones produced, we first need to run all calls for non-transparent objects. Otherwise, we may end up in a situation where the color stored in the framebuffer has been blended with that of a mesh which ends up being covered by another mesh of a different color that stands between it and the stained glass. We would then have to subtract the color of the earlier object, but the image only contains a color and carries no information as to how that color was obtained. See this for an OpenGL demonstration; it is not the same API, but the same principles are at play. Be aware that transparency is a hard topic in the general case.

What if objects that use different fragment shaders are present in the same scene? Well, then we need to bind a different pipeline object whenever we start to render a different kind of object. This can be done at any point of a subpass.

Finally, a quick word on render pass compatibility. Both framebuffers and graphics pipeline objects are created with a reference to a render pass object. They are only allowed to be used in the context of a compatible render pass (which may well be the very same one). As a first-order approximation, two render passes are compatible when they have similar attachments (only image layouts are allowed to differ, more detail here).

B. The graphics pipeline in more detail

B.1. Render passes, subpasses and attachments management

Render passes describe the global flow of a rendering operation. In particular, they describe how different resources are used across the different rendering/computing tasks which make up the more global rendering operation. A render pass is split into different subpasses, where a new subpass is required every time the attachments are to be used differently. Note that we are not issuing draw calls yet, but describing a context from which we will later be able to emit these calls. Having an explicit description of attachments usage helps the driver optimize the operation, especially on mobile devices.

A render pass is built through vkCreateRenderPass. It has three main parameters, which we break down in the sections below: attachment descriptions, subpasses and subpass dependencies.

In addition, each subpass specifies what kind of pipeline will be used in it (this is what the pipelineBindPoint field is about); in practice, we always use graphics pipelines.

B.1.1. Attachment descriptions

Each attachment corresponds to a resource that may be tied for use with a specific role in subpasses (see below), with the limitation that only pixel-local accesses are ever possible (see above). A VkAttachmentDescription does not include a binding to an actual image; this will come later. What needs to be specified about the image is its format, its sample count, how its contents should be loaded at the start of the render pass and stored at its end (once for the color/depth aspect and once for the stencil aspect), and the layout it is expected to be in when the render pass begins as well as the layout it should be transitioned to when the render pass ends.
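To make this concrete, here is a minimal sketch of such a description (all snippets in this chapter assume that <vulkan/vulkan.h> is included and that handles such as the logical device already exist from the previous chapters; the format and layouts below are placeholder choices):

VkAttachmentDescription colorAttachment = {
    .format         = VK_FORMAT_R8G8B8A8_UNORM,            // must match the image view bound later
    .samples        = VK_SAMPLE_COUNT_1_BIT,                // no multisampling
    .loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR,          // clear the image when the render pass begins
    .storeOp        = VK_ATTACHMENT_STORE_OP_STORE,         // keep the rendered result once the pass ends
    .stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE,      // no stencil aspect in a color attachment
    .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
    .initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED,            // previous contents are irrelevant
    .finalLayout    = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL  // layout to transition to at the end of the pass
};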

B.1.2. Subpasses

Subpasses are where the attachments are tied to specific roles for use by draw operations. Many draw operations may be emitted from the same subpass, and all these calls will use the attachments in the same way. This is described through a VkSubpassDescription. All of the interesting arguments are references to attachments, passed as VkAttachmentReference structures (which are built out of an attachment index and a layout to which the attached resource is automatically transitioned before the subpass).

In addition to those, which we had already introduced in section A, there are some attachments that do not play any direct role in the graphics pipeline itself: resolve attachments, into which the corresponding multisampled color attachments are resolved at the end of the subpass, and preserve attachments, whose contents must be kept intact across the subpass even though the subpass itself does not access them.

There are some restrictions as to attachments usage. For instance, and perhaps obviously, the same attachment cannot be used twice in the same subpass. Some additional restrictions are tied to the format of the attachments: not all of them can be assumed to be compatible with all uses. See VkSubpassDescription's registry page for the full detail.
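Continuing the sketch, a subpass that renders to the color attachment above (index 0) and uses a hypothetical depth attachment (index 1) could be described as follows:

VkAttachmentReference colorRef = {
    .attachment = 0,                                         // index into the render pass' attachment array
    .layout     = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL   // layout the image is transitioned to for this subpass
};
VkAttachmentReference depthRef = {
    .attachment = 1,
    .layout     = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
};

VkSubpassDescription subpass = {
    .pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS, // graphics (not compute) work in this subpass
    .colorAttachmentCount    = 1,
    .pColorAttachments       = &colorRef,
    .pDepthStencilAttachment = &depthRef
    // input, resolve and preserve attachments are left empty in this sketch
};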

B.1.3. An array of subpass dependencies

By default, all subpasses execute fully in parallel. In situations where there are data dependencies between subpasses, this is not OK! If, e.g., subpass A writes data to an attachment that should be read in subpass B, then the data should flow through these subpasses in that order. This is what subpass dependencies are about.

Subpass dependencies are described through VkSubpassDependency. They can be viewed as an alternative form of memory barriers. In addition to the src/dst Stage/Access masks, we also have to specify the pair of subpasses impacted by the dependency.

Now, it may be that a subpass depends on data rendered in another render pass altogether. Maybe these other subpasses are running in parallel to our subpass! For such situations, we can rely on VK_SUBPASS_EXTERNAL, which basically waits for all subpasses in all previous render passes to be over before running the subpass (at least as far as the operations matching the specified masks are concerned). Concretely, we would specify this value as the src subpass (or dst subpass).
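Putting the pieces together, here is a hedged sketch of the full render pass creation; colorAttachment and subpass come from the previous snippets, depthAttachment is assumed to be described analogously to colorAttachment, and the single dependency makes subpass 0 wait for earlier color attachment writes issued outside of the render pass:

VkSubpassDependency dependency = {
    .srcSubpass    = VK_SUBPASS_EXTERNAL,                            // wait on work recorded before this render pass
    .dstSubpass    = 0,                                              // ... before subpass 0 starts writing
    .srcStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .dstStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .srcAccessMask = 0,
    .dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT
};

VkAttachmentDescription attachments[] = { colorAttachment, depthAttachment };

VkRenderPassCreateInfo renderPassInfo = {
    .sType           = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
    .attachmentCount = 2,
    .pAttachments    = attachments,
    .subpassCount    = 1,
    .pSubpasses      = &subpass,
    .dependencyCount = 1,
    .pDependencies   = &dependency
};

VkRenderPass renderPass;
vkCreateRenderPass(device, &renderPassInfo, NULL, &renderPass);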

B.2. Configuration of rendering operations

We have seen how to describe render passes, but these are not useful in and of themselves. Rather, they serve as a context in which render operations may run. In this section, we turn to how rendering operations are parameterized.

In order to parameterize a rendering operation, we define a bunch of structures describing how a precise aspect of the rendering should be done (e.g., how a fixed-function stage should behave or which shaders are used). At the end, we create a VkGraphicsPipelineCreateInfo object, which regroups all of the data from these different sources. Later on, when registering the render pass into a command buffer, we will submit draw calls in contexts where such a graphics pipeline object is bound to trigger rendering operations.

B.2.1. Pipeline vertex input state

VkPipelineVertexInputStateCreateInfo is about the per-vertex information that is fed into the pipeline (such as position or texture coordinate/color). We may pass all this information interleaved in a single buffer or in several distinct buffers (which often is the best option; see section A.1).

Each buffer containing per-vertex information is registered as a VkVertexInputBindingDescription. Just like for descriptor set layout bindings, we have to define an arbitrary binding id; it is used by the attribute descriptions (and by vkCmdBindVertexBuffers) to refer to information from this source. We describe the stride explicitly (i.e., how many bytes stand between the data for two consecutive vertices). Finally, despite the name of this structure, it can also be used to pass per-instance data instead of per-vertex data, where "instance" refers to instanced rendering, a technique which allows for efficiently rendering the same object multiple times at different positions through a single (special kind of) draw call (see this page for an explanation in the context of OpenGL).

VkVertexInputAttributeDescription describes how to access attributes from input binding descriptions. In addition to a binding id, a format (giving the type and thus the size of the attribute) and an offset (in the per-vertex data) are given. Furthermore, a location id is defined. It must be unique across all bindings, as we will use it to identify this attribute from the vertex shader.

In some situations, we do not pass any per-vertex data to the pipeline: all the information, including for instance coordinates, may be hardcoded in the shaders. We will see a concrete example of how this is done when discussing the vertex shader in more detail. When this is the case, we simply provide no VkVertexInputBindingDescriptions nor VkVertexInputAttributeDescriptions in the VkPipelineVertexInputStateCreateInfo.
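As an illustration, here is a sketch for a hypothetical interleaved vertex format storing a 3D position and a 2D texture coordinate in a single buffer (the Vertex struct, binding id and locations are arbitrary choices; offsetof comes from <stddef.h>):

typedef struct Vertex {
    float position[3];
    float texCoord[2];
} Vertex;

VkVertexInputBindingDescription binding = {
    .binding   = 0,                           // ties attribute descriptions and vkCmdBindVertexBuffers to this buffer
    .stride    = sizeof(Vertex),              // bytes between two consecutive vertices
    .inputRate = VK_VERTEX_INPUT_RATE_VERTEX  // per-vertex data (VK_VERTEX_INPUT_RATE_INSTANCE for per-instance data)
};

VkVertexInputAttributeDescription attributes[] = {
    { .location = 0, .binding = 0, .format = VK_FORMAT_R32G32B32_SFLOAT, .offset = offsetof(Vertex, position) },
    { .location = 1, .binding = 0, .format = VK_FORMAT_R32G32_SFLOAT,    .offset = offsetof(Vertex, texCoord) }
};

VkPipelineVertexInputStateCreateInfo vertexInput = {
    .sType                           = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,
    .vertexBindingDescriptionCount   = 1,
    .pVertexBindingDescriptions      = &binding,
    .vertexAttributeDescriptionCount = 2,
    .pVertexAttributeDescriptions    = attributes
};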

B.2.2. Input assembly

The input assembly stage is about gathering the list of vertices that is fed as input to the pipeline and assembling them into geometric primitives. For instance, if we are rendering a mesh, then we should assemble the vertices by triplets corresponding to triangular faces. If we are rendering lines instead, then we would read vertices in a linear sequence.

In fact, things are slightly more complex than that, since primitives can either be described as a plain list of vertices or in an indexed fashion. Indexed rendering is meant to address the inefficiency that arises from duplicates in the list of vertices. Indeed, if we are rendering a mesh without indexing, then we would be passing the data of each vertex as many times as there are faces that meet at this vertex. This is very wasteful! The alternative, indexing, is about passing a distinct, deduplicated list of vertices, and describing primitives with reference to this list. Instead of writing the full data of a vertex, we just write the index of this vertex in the other list.

All of this is controlled in VkPipelineInputAssemblyStateCreateInfo. There are two things to specify at this stage. First comes the obvious: we should specify our desired topology (see VkPrimitiveTopology for the full list of possibilities). The second parameter is very niche: it only makes sense when the draw call is issued in an indexed fashion and the topology is a strip or fan topology. It allows for passing a special value instead of an index. This value is interpreted as the end of the definition of one primitive and the beginning of a new one, effectively allowing us to draw multiple strips or fans in a single call.
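For plain triangle lists with primitive restart disabled (the common case for meshes), the configuration could simply be:

VkPipelineInputAssemblyStateCreateInfo inputAssembly = {
    .sType                  = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,
    .topology               = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,  // assemble vertices three by three into triangles
    .primitiveRestartEnable = VK_FALSE                              // only meaningful for strip/fan topologies
};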

B.2.3. Rasterization

Rasterization is about discretizing screen-space coordinates. There are many parameters that we can control about this stage; all of them are described in a single VkPipelineRasterizationStateCreateInfo structure.

First, there are options about culling. Culling is a setting that reduces the number of generated fragments. It is typically used for removing the fragments that correspond to the back faces of meshes. Think for instance about a cube: when no reflections are involved, you can see at most three of its faces. Why bother generating fragments for all six of them? The issue is that figuring out which faces are visible and which are not is computationally hard in the general case. If it cost more to find out whether a face is visible than to keep the fragments associated with it, then culling would be counterproductive! Instead, Vulkan uses a heuristic: when describing meshes, the vertices making up a face should be given in a counter-clockwise manner, in the sense that, when the face is seen from a vantage point exterior to the mesh, its vertices, mapped to on-screen coordinates, appear in counter-clockwise order. For more details and nice illustrations, see this guide (written about OpenGL, but the principle is the same in Vulkan). We can control which faces should be culled this way (front faces, back faces, all faces or no face) and whether a front face is defined by a clockwise or a counter-clockwise vertex order.

For meshes, we can control how the triangles are rendered: although they are usually filled-in, we could also do wireframe rendering or even draw the vertices only. When doing wireframe rendering with meshes (or rendering any primitive that displays lines), we can also control the width of lines.

There are options related to depth testing. Depth clamping ensures that all depth values fall in the range that is visible to the camera: a depth value cannot be smaller than the distance of the near plane nor larger than the distance of the far plane (refer to this page for a refresher on what near/far planes are). Depth bias can be used to tweak fragment depth values (the bias can be more than just a constant value, see the doc for the detail).

The rasterization step can be marked as inactive using a special field in the structure! If this is the case, the rest of the pipeline does not run. See this stackoverflow discussion for more information.
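Here is a plausible configuration for standard filled-triangle rendering with back-face culling (the values are illustrative, not the only sensible choice):

VkPipelineRasterizationStateCreateInfo rasterization = {
    .sType                   = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,
    .depthClampEnable        = VK_FALSE,                        // do not clamp out-of-range depth values
    .rasterizerDiscardEnable = VK_FALSE,                        // VK_TRUE would disable rasterization entirely
    .polygonMode             = VK_POLYGON_MODE_FILL,            // filled triangles (LINE for wireframe, POINT for vertices)
    .cullMode                = VK_CULL_MODE_BACK_BIT,           // cull back faces
    .frontFace               = VK_FRONT_FACE_COUNTER_CLOCKWISE, // front faces have counter-clockwise winding
    .depthBiasEnable         = VK_FALSE,
    .lineWidth               = 1.0f                              // only matters for line primitives
};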

B.2.4. Blending

Blending is about mixing the colors of fragments that map to the same pixel using the color attachment (pixels generated from previous draw calls are also concerned since their value is stored in this attachment). This stage is parameterized through a VkPipelineColorBlendStateCreateInfo structure.

First, for each color attachment, we create a VkPipelineColorBlendAttachmentState. There, we describe how blending should be done for this attachment. We can disable blending altogether for an attachment (in which case the fragment color simply replaces whatever the attachment contained), and we can mask writes to specific components (e.g., we could leave the green channel untouched). We can also control how the value found in the attachment should be merged with the new one to be written to it using two VkBlendOps: one for color values and one for alpha values. For instance, we could sum both values or just keep the highest one. Furthermore, we can adjust the values used for the computation (so, the one from the attachment and the new one) before it actually runs. This is done by picking appropriate VkBlendFactors. We can provide a single blend constant (with R, G, B and A components) as part of the blend state create info structure. This value, the same for all attachments, is used instead of the one from the color attachment or from the fragment under consideration for some values of VkBlendFactor. Note that all the VkPipelineColorBlendAttachmentStates must be identical if the independentBlend device feature is not enabled.

Optionally, we can use a VkLogicOp to handle the blending instead of the VkPipelineColorBlendAttachmentState-based method. This is only possible when the logicOp device feature is enabled. The available operations are not really about blending in the usual sense of the word but more about bitwise binary operations. This is very niche and you probably do not need it.
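As an example, here is what classic alpha blending over a single color attachment could look like (the new color is weighted by its alpha and added to the existing color weighted by one minus that alpha):

VkPipelineColorBlendAttachmentState blendAttachment = {
    .blendEnable         = VK_TRUE,
    .srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA,            // weight of the new fragment's color
    .dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA,  // weight of the color already in the attachment
    .colorBlendOp        = VK_BLEND_OP_ADD,
    .srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE,
    .dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO,
    .alphaBlendOp        = VK_BLEND_OP_ADD,
    .colorWriteMask      = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
                           VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT
};

VkPipelineColorBlendStateCreateInfo colorBlend = {
    .sType           = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,
    .logicOpEnable   = VK_FALSE,                    // use the per-attachment blend state, not a logic op
    .attachmentCount = 1,                           // one state per color attachment of the subpass
    .pAttachments    = &blendAttachment,
    .blendConstants  = { 0.0f, 0.0f, 0.0f, 0.0f }   // only used with the *_CONSTANT_* blend factors
};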

B.2.5. Pipeline viewport state

The viewport determines what part of the attachments is actually used by the current operation (remember that all attachments are declared with the same size). Two distinct notions appear in VkPipelineViewportStateCreateInfo: viewports and scissors. Both the viewport (defined as a VkViewport) and the scissor (defined as a simple VkRect2D) determine the area that the image gets rendered to. The difference between the two is that the image always stretches so as to fill the viewport, whereas the scissor only cuts off the data that falls beyond the rectangular area it delimits. There is a nice illustration of this over at vulkan-tutorial.com.

Usually, we will have a single viewport and a single scissor. There are some contexts where multiple viewports are used, although this requires the use of the multiViewport device feature. Common uses of this feature include rendering for VR devices (where both eyes see slightly different outputs) or splitscreen rendering. At any rate, there should be as many viewports as there are scissors.

Note that we do not need to introduce additional attachments in the subpass description to support rendering to multiple viewports. Only one image is produced in the end! Alternative ways of obtaining this result include doing multiple draw calls, each targeting a different area of the same color attachment (although this would mean worse performance).

The VK_KHR_multiview extension provides an even more efficient way of rendering. More details on this stackoverflow post. Doing the same without this extension is actually somewhat tricky: see this sample (note the use of a geometry shader, effectively duplicating vertices for use from both viewports).

In each viewport, we also define to what range the depth information should be mapped. We do this by passing a lower bound and an upper bound. The depth information is mapped linearly to it, with clipping. We almost always set this range to [0.0, 1.0].
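For illustration, a single viewport and scissor covering a hypothetical 1920x1080 attachment, with the usual [0.0, 1.0] depth range, could be set up like this:

VkViewport viewport = {
    .x = 0.0f, .y = 0.0f,
    .width = 1920.0f, .height = 1080.0f,
    .minDepth = 0.0f, .maxDepth = 1.0f   // range the depth information is mapped to
};
VkRect2D scissor = {
    .offset = { 0, 0 },
    .extent = { 1920, 1080 }
};

VkPipelineViewportStateCreateInfo viewportState = {
    .sType         = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,
    .viewportCount = 1,
    .pViewports    = &viewport,
    .scissorCount  = 1,                  // must match viewportCount
    .pScissors     = &scissor
};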

B.2.6. Pipeline depth/stencil state

Depth and stencil resources may optionally be attached to the current subpass. If this is the case, then a VkPipelineDepthStencilStateCreateInfo must also be defined, where parameters pertaining to both depth and stencil uses are introduced.

Depth testing may be disabled altogether, or it may be performed in a read-only fashion. The precise operation to be performed is defined as a VkCompareOp. The range of depth values that is considered in-bound is also set from here. Samples that fall out of bounds are simply discarded.

Just like depth testing, stencil testing may be disabled entirely. The only other related parameters control how fragments respectively belonging to front-facing and back-facing faces should be handled by this test (see the explanation about culling in the rasterization section) using VkStencilOpState objects. In addition to the test operation itself, also defined as a VkCompareOp, VkStencilOps are used to define the behavior of samples that pass the stencil test, fail it, or pass it while failing the depth test. Two masks are available for specifying which bits of the stencil value are read from and written to by tests. A constant reference value may also be provided for use within the test operation.
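A typical configuration, with read-write depth testing and the stencil test disabled, might look as follows:

VkPipelineDepthStencilStateCreateInfo depthStencil = {
    .sType                 = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO,
    .depthTestEnable       = VK_TRUE,
    .depthWriteEnable      = VK_TRUE,             // VK_FALSE would make the depth test read-only
    .depthCompareOp        = VK_COMPARE_OP_LESS,  // smaller depth means closer to the camera
    .depthBoundsTestEnable = VK_FALSE,
    .minDepthBounds        = 0.0f,
    .maxDepthBounds        = 1.0f,
    .stencilTestEnable     = VK_FALSE             // the front/back VkStencilOpState members are left zeroed
};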

B.2.7. Pipeline multisample state

In render passes where multisampling is enabled, VkPipelineMultisampleStateCreateInfo is used for configuring how a draw call should handle this feature. Here is a cool video on the topic.

The number of samples involved is described explicitly (of course, it may not be more than the number of samples described in the related attachment, although it apparently may be less).

By default, the fragment shader runs only once for all samples of a pixel that map to the same object (using the coordinates at the center of the pixel for the computation). This helps with reducing artifacts at the interface between different objects, but does not help with situations where the artifact is located inside the object. Sample shading runs the shader once per sample instead. Of course, this comes at a performance cost. In addition, there is also the option of only running the fragment shader for some fraction of the samples belonging to a given object (e.g., you can specify that Vulkan should run the fragment shader for 40% of the samples belonging to an object, so if you were using 16 samples and 10 of them belonged to the same object, the fragment shader would run 4 times for this pixel and this object).

You may choose to disable the use of some samples through the sample mask. Furthermore, Vulkan supports alpha to coverage, meaning that the alpha channel of your pixel may be set to reflect the proportion of samples that correspond to the object (especially useful when rendering dense foliage, apparently). Alternatively, the value of the alpha component may be pinned to one.
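As a sketch, enabling 4x multisampling with sample shading for at least a quarter of the covered samples could be configured like this (the sample count must of course not exceed what the attachments were created with):

VkPipelineMultisampleStateCreateInfo multisample = {
    .sType                 = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
    .rasterizationSamples  = VK_SAMPLE_COUNT_4_BIT,  // 4 samples per pixel
    .sampleShadingEnable   = VK_TRUE,
    .minSampleShading      = 0.25f,                  // fraction of samples for which the fragment shader runs
    .pSampleMask           = NULL,                   // NULL means all samples are enabled
    .alphaToCoverageEnable = VK_FALSE,
    .alphaToOneEnable      = VK_FALSE
};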

Bear in mind that multisampling is only one form of antialiasing. It features so prominently in the API since it requires hardware support, but it is not the only available option (see this Wiki article for more information).

B.2.8. Pipeline dynamic state

In pipeline objects, most things are constants. This is good for the driver, as it can rely on this constancy to heavily optimize the rendering process. However, sometimes we would actually like to update some of these values. We can only do this in pipelines where those precise elements were declared as dynamic state, which is done through a VkPipelineDynamicStateCreateInfo structure.

Parameters that we may turn from constants into variables are described through VkDynamicState. They include viewports, scissors, line width (used for drawing some kinds of primitives only) and the blend constant, as well as information about depth and stencil attachments (bias and bounds for depth; compare mask, write mask and reference value for stencil).
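For instance, marking the viewport and scissor as dynamic (a common choice, since it avoids rebuilding the pipeline when the render target size changes) would look like this:

VkDynamicState dynamicStates[] = { VK_DYNAMIC_STATE_VIEWPORT, VK_DYNAMIC_STATE_SCISSOR };

VkPipelineDynamicStateCreateInfo dynamicState = {
    .sType             = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
    .dynamicStateCount = 2,
    .pDynamicStates    = dynamicStates  // these values must now be set at command-recording time
};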

B.2.9. Shaders

Unlike in the compute pipeline, we may pass multiple shaders. Like then, we use VkPipelineShaderStageCreateInfo to describe shaders. The only difference is that we build an array of such structures instead of a single one.

B.2.10. Putting it all together

The VkGraphicsPipelineCreateInfo contains one field for each of the above structures (plus VkPipelineTessellationStateCreateInfo, which we did not describe here as tessellation is outside the scope of this guide).

In addition to these fields, we also provide a render pass object. We will only be able to bind this graphics pipeline in a context where this render pass or another one compatible with it is bound (see section A.2 for more information). We also specify in which precise subpass this pipeline will be used (again, specifying all of this in advance allows the driver to really optimize the pipeline). The other arguments are carried over from VkComputePipelineCreateInfo: we provide a pipeline layout, and we can use pipeline derivatives (but just like before, we should not).
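Gathering the state structures sketched in the previous subsections, pipeline creation could look roughly like this (shaderStages, pipelineLayout, renderPass and device are assumed to have been created beforehand):

VkGraphicsPipelineCreateInfo pipelineInfo = {
    .sType               = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,
    .stageCount          = 2,              // vertex + fragment shader
    .pStages             = shaderStages,   // array of VkPipelineShaderStageCreateInfo
    .pVertexInputState   = &vertexInput,
    .pInputAssemblyState = &inputAssembly,
    .pViewportState      = &viewportState,
    .pRasterizationState = &rasterization,
    .pMultisampleState   = &multisample,
    .pDepthStencilState  = &depthStencil,
    .pColorBlendState    = &colorBlend,
    .pDynamicState       = &dynamicState,
    .layout              = pipelineLayout, // resources available to the shaders
    .renderPass          = renderPass,     // the pipeline may only be used with compatible render passes
    .subpass             = 0,              // index of the subpass in which this pipeline will be used
    .basePipelineHandle  = VK_NULL_HANDLE  // no pipeline derivative
};

VkPipeline graphicsPipeline;
vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &pipelineInfo, NULL, &graphicsPipeline);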

B.3. Shaders

The graphics pipeline introduces several new kinds of shaders: two mandatory ones, which we present in further detail in this section, as well as optional ones (which we are leaving aside — this chapter is long enough as is).

An important departure from compute shaders is that our shaders play a precise role in the wider pipeline. They receive specific inputs and must produce specific outputs, both of which are determined by their role in the pipeline. There are special keywords for accessing such elements. In addition, the notion of workgroups and the dimensionality of computations disappear when dealing with graphics.

Here is a sample fragment shader that illustrates some key differences:

#version 460 // We are using version 4.60 of GLSL

// Interface blocks
layout(binding = 1) uniform sampler2D texSampler;

layout(location = 0) in vec3 fragColor;
layout(location = 1) in vec2 fragTexCoord;

layout(location = 0) out vec4 outColor;

void main() {
    // Modulate the sampled texture color by the interpolated vertex color
    outColor = vec4(fragColor * texture(texSampler, fragTexCoord).rgb, 1.0);
}

There are three new keywords to notice: location, in and out. As you can guess, the last two are about the data that is propagated through the pipeline. As for location, it is a form of index just like binding, but exclusive to such inputs and outputs. Remember when we defined VkVertexInputAttributeDescriptions from VkPipelineVertexInputStateCreateInfo: we defined two such indices there, using arbitrary values for the binding index and the location. All of the inputs that we described then are fed into the vertex stage. We access such inputs with in followed by a location that reflects the value we picked when defining the pipeline (we do not pass a binding index at all). The out keyword also sets the location at which data will be found by later stages of the pipeline.

There are also new layout qualifiers (refer to the doc for the full list of available layout qualifiers). Most of the new ones are specific to one kind of shader only, but some are available in all graphics shaders. We already met location; there is also component, which always comes in addition to a location qualifier and which is tricky and rarely used (refer to the doc for a fuller explanation). Finally, there is input_attachment_index, used as layout(input_attachment_index = 0, set = 0, binding = 0) uniform subpassInput t;. It refers to input attachments, i.e., image views that support pixel-local operations only, bound at a specific index (note the specific subpassInput type; set and binding work like for all other uniforms).

B.3.1. Vertex shaders

A vertex shader receives a set of coordinates for each vertex as input (typically in 3D) and emits a new position as output (in screen space). In older versions of GLSL, the attribute keyword could be used as a synonym for in in this stage only; modern (and Vulkan-flavored) GLSL just uses in.

GLSL defines a special variable, gl_Position (of type vec4), that must hold the screen-space position of the vertex being handled. You may wonder why it is a vec4 and not a vec3; this stackoverflow post answers this question (short version: homogeneous coordinates; the last component is almost always set to 1).

Just like for compute shaders, special GLSL variables may be accessed from vertex shaders (see the ref): most notably gl_VertexIndex, which holds the index of the current vertex (read from the index buffer if one is bound), and gl_InstanceIndex, which holds the index of the current instance when instanced rendering is used.

There are other output variables in addition to gl_Position, though these are usually not mandatory (see the ref): gl_PointSize sets the size in pixels of point primitives, while gl_ClipDistance and gl_CullDistance feed user-defined clipping and culling.

B.3.2. Fragment shaders

A fragment shader receives normalized location data as input in 2D and emits a color as output (the format of which depends on your color attachment).

Unlike for the vertex shader, there is no special variable to which the resulting color should be written. This makes some sense when thinking about deferred rendering: there may be several color attachments, so a single keyword would not be enough. Instead, we define one output per color attachment, as in the example given above (the location indicates which one, following the order from the subpass description).

Again, there are some special variables that are accessible from the fragment shader (see the ref): gl_FragCoord holds the window-space coordinates of the fragment (with its depth in the z component), gl_FrontFacing tells whether the fragment belongs to a front-facing primitive, and gl_PointCoord gives the position of the fragment within a point primitive.

There are also some variables used for output (see the ref); the main one is gl_FragDepth, which lets the shader override the depth value that would otherwise be used for the fragment.

There is a very special layout qualifier, early_fragment_tests, which can only be used in the following way: layout(early_fragment_tests) in; (note the absence of variable). This ensures that all depth and stencil tests run before the fragment shader. Note that setting this effectively disables gl_FragDepth.

Furthermore, the index layout qualifier may be used in fragment shader outputs. It always comes alongside a location qualifier, e.g., layout(location = 3, index = 1) out vec4 factor;. This sets the index at which the result of the fragment shader is passed in the blending stage (this is useful for dual source blending; see this page from the OpenGL wiki).

In addition to the standard GLSL functions, there are some fragment shading-exclusive ones.

B.4. Framebuffers and attachments

Framebuffers are sets of image views that can be used as render pass attachments; they are bound when beginning a render pass in an open command buffer (more detail in the last section of this chapter). Framebuffers are created through vkCreateFramebuffer.

Framebuffers define a certain dimension for images (width, height, and number of layers). They are tied to a specific class of render passes, which limits the contexts in which they can be bound, but does not tie the framebuffer to that specific render pass: it merely determines their compatibility class. Other render passes with similar attachments can be used instead (only image layouts are allowed to differ, more detail here).

We previously described attachment descriptions and attachment references. We now turn to the attachments themselves, i.e., the concrete image views that we are to bind for use by draw calls in render passes. This is, after all, the entire point of framebuffers.

As anticlimactic as it may be, an attachment is simply a VkImageView. There are restrictions regarding the underlying images. Of course, they should respect the dimensions defined by the framebuffer (width, height, and number of layers). Note that these dimensions are a minimum: the actual resources must be at least as large as this.
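As a sketch, creating a framebuffer that binds a color view and a depth view (both assumed to exist with matching dimensions) for use with a compatible render pass could look like this:

VkImageView attachmentViews[] = { colorImageView, depthImageView };  // order matches the attachment descriptions

VkFramebufferCreateInfo framebufferInfo = {
    .sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,
    .renderPass      = renderPass,  // determines the compatibility class
    .attachmentCount = 2,
    .pAttachments    = attachmentViews,
    .width           = 1920,
    .height          = 1080,
    .layers          = 1
};

VkFramebuffer framebuffer;
vkCreateFramebuffer(device, &framebufferInfo, NULL, &framebuffer);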

B.5. Registering and running render passes

As usual, all commands related to rendering must be sandwiched between a vkBeginCommandBuffer and a vkEndCommandBuffer. In addition to this, we must also register them in a context where a render pass is bound. That is, they must also be sandwiched between a vkCmdBeginRenderPass and a vkCmdEndRenderPass. In addition to the render pass object itself, vkCmdBeginRenderPass expects a (compatible) framebuffer, which ties concrete resources to the attachment slots. We can restrict the area of the image views that may be written to, and we define the clear values used for clear operations (used e.g. when loading attachments that use VK_ATTACHMENT_LOAD_OP_CLEAR). However, that is not all, as we also need to consider subpasses.

All rendering operations occur in a specific subpass. By default, the first subpass (according to the order defined in the render pass) is bound at vkCmdBeginRenderPass time. VkSubpassContents defines how the commands are to be executed: using only primary command buffers or only secondary ones (which have to be executed through vkCmdExecuteCommands, the sole authorized operation for subpasses using secondary buffers); as a reminder, secondary buffers are pre-registered command buffers. We use vkCmdNextSubpass to close the current subpass and open the next one (again, in the order defined in the render pass); it also takes a VkSubpassContents as parameter.

Before running a draw call, we bind a pipeline that defines the prototype of the next draw calls that will be run: for instance, they may expect access to a texture resource. Like in the compute case, we do this through vkCmdBindPipeline. The actual resources are also tied with vkCmdBindDescriptorSets, like before. In addition, we have to consider some special kinds of resources that are specific to graphics operations: vertex data is bound through vkCmdBindVertexBuffers. Ultimately, we must provide one buffer per VkVertexInputBindingDescription specified in the graphics pipeline's VkPipelineVertexInputStateCreateInfo, and we may provide an offset into each of them. Furthermore, sometimes data is shared across different draw calls; it is possible to update only some of the bindings by specifying the index of the first binding to update and a count of how many bindings should be updated in total. If the data is indexed, we use vkCmdBindIndexBuffer to pass the index buffer (we also specify an offset into the buffer and a datatype).

Note that we typically emit several draw calls per bound pipeline: one per object that runs using the same shader (but maybe different resources). We can call vkCmdBindDescriptorSets, vkCmdBindVertexBuffers and vkCmdBindIndexBuffer between draw calls to update which resources are to be used.

Finally, we can write draw commands in the buffer: vkCmdDraw for non-indexed draws (it takes a vertex count, an instance count, a first vertex and a first instance) and vkCmdDrawIndexed for indexed draws (which takes an index count, an instance count, a first index, a vertex offset and a first instance).

The rest of the draw commands are rather niche. See vkCmdDrawIndirect, vkCmdDrawIndexedIndirect, vkCmdDrawIndexedIndirectCount and vkCmdDrawIndirectByteCount.
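To give a feel for how these commands fit together, here is a hedged sketch of recording a single-subpass render pass containing one indexed draw call (commandBuffer, renderPass, framebuffer, graphicsPipeline, pipelineLayout, descriptorSet, vertexBuffer, indexBuffer and indexCount are assumed to exist from earlier steps):

VkClearValue clearValues[] = {
    { .color = { { 0.0f, 0.0f, 0.0f, 1.0f } } },  // clear color for attachment 0
    { .depthStencil = { 1.0f, 0 } }               // clear value for the depth/stencil attachment 1
};

VkRenderPassBeginInfo beginInfo = {
    .sType           = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
    .renderPass      = renderPass,
    .framebuffer     = framebuffer,
    .renderArea      = { .offset = { 0, 0 }, .extent = { 1920, 1080 } },
    .clearValueCount = 2,
    .pClearValues    = clearValues
};

vkCmdBeginRenderPass(commandBuffer, &beginInfo, VK_SUBPASS_CONTENTS_INLINE);  // first subpass, primary-buffer commands
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, graphicsPipeline);
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout,
                        0, 1, &descriptorSet, 0, NULL);

VkDeviceSize offsets[] = { 0 };
vkCmdBindVertexBuffers(commandBuffer, 0, 1, &vertexBuffer, offsets);          // binding 0 from the vertex input state
vkCmdBindIndexBuffer(commandBuffer, indexBuffer, 0, VK_INDEX_TYPE_UINT32);

vkCmdDrawIndexed(commandBuffer, indexCount, 1, 0, 0, 0);                      // one instance, no offsets

// vkCmdNextSubpass(commandBuffer, VK_SUBPASS_CONTENTS_INLINE);               // only needed with more than one subpass
vkCmdEndRenderPass(commandBuffer);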

If we specified dynamic state previously, we can update the corresponding values while the render pass is running using functions such as vkCmdSetViewport, vkCmdSetScissor, vkCmdSetLineWidth, vkCmdSetBlendConstants, vkCmdSetDepthBias, vkCmdSetDepthBounds, vkCmdSetStencilCompareMask, vkCmdSetStencilWriteMask and vkCmdSetStencilReference.

Note that many commands may not be run from inside render passes (for instance, vkCmdDispatch and vkCmdCopyBuffer must both run outside of render passes). Another command that can run from inside render passes is vkCmdClearAttachments, which we use for clearing one or more regions of the color or depth/stencil attachments of the current subpass.

Well, that was it! After all of this hard work, we are finally able to do rendering, which is usually the core appeal of Vulkan. Now, all that we are missing is a way of continuously streaming the results of our rendering operations to the screen. We remedy this in the next (shorter, I swear) chapter, which is also the last real chapter of this series.