Chapter 5: the swapchain
In the previous chapter, we learned how to do offline GPU-based rendering. Offline renderers have all sorts of interesting applications, but for games and any other live applications, we usually want the renderer to run continuously and to stream its results to the screen. As we will see, bridging the gap to real-time rendering is not too much of a hassle: the hardest parts of the Vulkan API are already behind us! Nonetheless, we should not underestimate the task at hand. Naively, we may think that rendering could be performed through a simple while loop such as the one below:
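A sketch of such a naive loop, in the same pseudocode style as the rest of this section (all function names are illustrative):

```
while (true) {
    image = vulkan.render_scene()         // issue the draw calls, wait for the GPU to finish
    cpu_image = vulkan.copy_to_cpu(image) // costly GPU-to-CPU transfer
    window.display(cpu_image)             // hand the result over to the windowing system
}
```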
The performance of such a process would be quite poor for two main reasons:
- Unexploited opportunities for parallelism: the process described above is entirely sequential. We could be working on the render of multiple images in parallel (this can lead to resource contention, so we should not overdo it).
- Costly CPU/GPU synchronization: extracting the results to the CPU and printing them to the screen takes time. The system could set up a bridge that lets the GPU communicate directly with display-mapped memory (which typically lives in video memory and gets scanned out by the display controller). Then, our while loop could be reduced to a single function call (vulkan.draw_next_frame()), which would handle everything while avoiding back-and-forths with the CPU.
The swapchain answers both of these concerns.
In this chapter, we learn how to pipeline our renders and get their output onto the screen using swapchains. We discuss the Vulkan extensions that implement this mechanism, and we describe how to juggle multiple copies of resources to avoid conflicts between parallel renders.
A. A high-level overview
A.1. Windows and surfaces
We want to render to windows, but we have not yet seen how to handle window objects. Windows are a system-side feature, and they are not managed by Vulkan; we have to create them from CPU-land instead. As different systems handle windows in different ways, we use cross-platform libraries that let us manipulate them in an abstracted way. One such library is GLFW, which supports Windows, macOS, Wayland and X11 (with these last two basically being current and legacy Linux). Moreover, GLFW provides a cross-platform way of handling inputs. Alternatives to this library include SFML and SDL, but GLFW is the most lightweight option. We must load it in a specific manner to put it in Vulkan mode: we need to make sure that it can find the Vulkan headers, and we must request some specific instance extensions when initializing Vulkan.
Once GLFW is loaded, we can create windows. From a window, we can extract a surface, i.e., a Vulkan-side view of it for use as a rendering target. Surfaces are not a native feature of Vulkan: they are defined in an instance extension that we need to enable (in fact, this is one of the extensions that GLFW expects us to load).
A.2. Swapchains and presentation
The swapchain is Vulkan's abstraction for the subsystem that handles rendering to a surface. To enable interactions with swapchains from a device, we must request the appropriate device extension at the time of its creation. The swapchain device extension introduces present operations (for presenting an image to a swapchain) and the accompanying queue capability (we must check that a queue supports presentation before using it for present operations).
When creating a swapchain, we choose an image type (with a resolution, a format, etc). This type is constrained by surface-specific limitations (for instance, the image resolution should match that of the underlying window). We also specify a presentation mode, which dictates how the contents of the window are updated: we may write to the display-mapped buffer directly, or we may use a form of buffering instead.
We acquire an image from the swapchain prior to a render using a special function. Although we could use this image as a draw call's color attachment, rendering directly into the swapchain is limiting: it forces us to stick to the exact resolution of the underlying window, as well as to one of the formats it supports. This rules out rendering options such as targeting a resolution lower than that of the window or high-dynamic-range rendering with non-HDR monitors. The alternative is to introduce distinct images for color attachments; copying these images to the swapchain is cheap.
Finally, we release the updated swapchain image we previously acquired via a present operation. This operation transfers the control of that swapchain image to the presentation engine, which is an abstraction for the system-side component in charge of getting images onto the display (the image typically goes through a compositor; also, if we use triple buffering, images are not displayed at all if fresher ones are produced before the refresh).
In summary, a frame goes through the following steps:
- Acquisition: we get hold of a swapchain image to render to (technically, we acquire this image from the presentation engine).
- Rendering: we render the scene and copy the result to the swapchain image from step 1.
- Presentation: we release the updated image, letting the presentation engine handle it.
On the GPU side, operations are asynchronous by default. We handle synchronization manually via primitives such as semaphores and fences. Indeed, the image acquisition function may return an image that is still being read from by the presentation engine or written to by an in-flight render command. We use an "image acquired" semaphore to ensure that the image does not get written to before the presentation engine is done reading from it, and a "render finished" semaphore to ensure that the presentation engine does not present the image before we are done rendering to it. The image acquired semaphore is signaled by the presentation engine once the image is ready for reuse, and waited on by the rendering function. In turn, the render finished semaphore is signaled by the rendering function, and waited on by the presentation engine.
Since we emit draw calls from the CPU, we must also handle CPU-GPU synchronization. We introduce a "render finished" fence (in addition to the semaphore of the same name) to enable waiting for the end of a render operation from the CPU.
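The wiring described above can be sketched in C as follows. This is a fragment, not a full program: the queue, command buffer, semaphores and fence are assumed to have been created elsewhere.

```c
/* Sketch: wiring the synchronization primitives described above. */
VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
VkSubmitInfo submit_info = {
    .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &image_acquired,    /* wait until the image is free */
    .pWaitDstStageMask = &wait_stage,
    .commandBufferCount = 1,
    .pCommandBuffers = &cmd_buffer,
    .signalSemaphoreCount = 1,
    .pSignalSemaphores = &render_finished, /* lets presentation start safely */
};
/* The fence lets the CPU wait for the end of this render. */
vkQueueSubmit(queue, 1, &submit_info, render_finished_fence);
```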
A.3. Parallel renders and juggling resources
Rendering several images in parallel can increase a GPU's throughput. There are limits to this, as all rendering tasks contend for the same computational resources. A common architecture supports handling up to two frames in parallel (also called frames-in-flight), where frames are handled in a staggered way. That way, there are not too many computations running in parallel (we expect most of the rendering for a frame to be over by the time the next one enters the pipeline anyway). The diagram below illustrates this method (credits go to the vulkan-diagrams project).
We must duplicate some resources to avoid conflicts. For instance, we should not share a single image acquired semaphore for all frames-in-flight. Resources should be duplicated as specified below:
- Per swapchain image resources: the image itself, a framebuffer, an image acquired semaphore (on that note, there is a mistake in the diagram above: that semaphore is defined per frame-in-flight there).
- Per frame-in-flight resources: a command buffer, a render finished fence, a render finished semaphore, the uniform buffers, and all other variable resources.
- Unduplicated resources: the rest, including the pipelines, the render passes, the vertex/index buffers, and all constant resources.
When using pre-baked command buffers, we should consequently generate frames_in_flight_count*swapchain_images_count unique ones (this corresponds to all possible combinations of resources that we must handle). Also, in CPU-bound contexts, we can minimize input lag and unnecessary frame generation by measuring how much time is required to generate a frame, and by delaying the generation of subsequent frames so that renders finish just in time for the presentation engine.
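The indexing scheme for pre-baked command buffers can be made concrete with a small sketch (the counts below are illustrative):

```c
#include <assert.h>

/* Two frames in flight and, say, three swapchain images. */
#define FRAMES_IN_FLIGHT 2
#define SWAPCHAIN_IMAGES 3

/* With pre-baked command buffers, one unique buffer is needed per
   (frame-in-flight, swapchain image) pair, i.e.
   FRAMES_IN_FLIGHT * SWAPCHAIN_IMAGES buffers in total. */
static int command_buffer_index(int frame_in_flight, int image_index) {
    return frame_in_flight * SWAPCHAIN_IMAGES + image_index;
}
```

With these counts, six pre-baked command buffers cover every combination; the frame-in-flight slot itself simply cycles (frame_counter % FRAMES_IN_FLIGHT).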
A.4. Handling resizes
When a window gets resized or minimized, the corresponding swapchain becomes outdated. Vulkan returns an error on later interactions with that swapchain. In that case, we must recreate it with updated characteristics.
B. A deeper dive
B.1. GLFW, windows and surfaces
GLFW is a cross-platform library that provides an interface for interacting with windows and I/O. Although it was originally developed for use with OpenGL, it also supports Vulkan. This subsection uses C because GLFW is a C library, though GLFW bindings are available for all popular enough programming languages. We define a special preprocessor macro before including GLFW to let it know that we will use it in a Vulkan setting (this works for most setups; if it does not for yours, check GLFW's website):
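For most setups, this amounts to the following two lines:

```c
/* GLFW_INCLUDE_VULKAN makes GLFW pull in the Vulkan headers and expose its
   Vulkan-specific functions. It must be defined before the include. */
#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>
```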
We can then create a window (see glfwWindowHint, glfwCreateWindow and GLFWwindow):
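A minimal window creation sketch (the resolution and title are placeholders; error handling is omitted):

```c
glfwInit();
/* Tell GLFW not to create an OpenGL context: Vulkan handles rendering. */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
/* Disable resizing for now; swapchain recreation is covered later. */
glfwWindowHint(GLFW_RESIZABLE, GLFW_FALSE);
GLFWwindow* window = glfwCreateWindow(800, 600, "My app", NULL, NULL);
```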
glfwGetRequiredInstanceExtensions returns the list of Vulkan instance extensions that GLFW requires the instance to load. We must request these extensions at vkCreateInstance time. glfwGetPhysicalDevicePresentationSupport tells us whether a specific queue supports presentation.
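A sketch of instance creation with the required extensions (error handling omitted):

```c
/* GLFW tells us which instance extensions it needs (including the surface
   extension); we forward them to vkCreateInstance. */
uint32_t extension_count = 0;
const char** extensions = glfwGetRequiredInstanceExtensions(&extension_count);

VkInstanceCreateInfo create_info = {
    .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
    .enabledExtensionCount = extension_count,
    .ppEnabledExtensionNames = extensions,
};
VkInstance instance;
vkCreateInstance(&create_info, NULL, &instance);
```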
We create surfaces from windows via glfwCreateWindowSurface (this function takes a Vulkan instance as argument; we can safely assume that the surface instance extension is part of those that GLFW expects to be loaded).
glfwGetFramebufferSize returns information about the size of our window (in pixels). This use of the word "framebuffer" is not related to its Vulkan meaning. Remember that GLFW also handles inputs (mouse, keyboard, etc).
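For instance (error handling abbreviated):

```c
/* Create a Vulkan surface from the window, then query its size in pixels. */
VkSurfaceKHR surface;
if (glfwCreateWindowSurface(instance, window, NULL, &surface) != VK_SUCCESS) {
    /* handle the error */
}
int width, height;
glfwGetFramebufferSize(window, &width, &height);
```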
B.2. The swapchain
B.2.1. Creation
Swapchains represent the device-side subsystem that handles rendering to surfaces. They are defined in the VK_KHR_swapchain device extension, which we load at vkCreateDevice time.
vkCreateSwapchainKHR creates a swapchain object of a certain image type. The choice of that image type is constrained by the underlying surface: we could not use a 1920x1080 HDR image on an old 768x576 screen, for instance. We use vkGetPhysicalDeviceSurfaceCapabilitiesKHR to get information about these constraints. We also set a minimum for the number of images that the swapchain should contain (typically, one more than the minimum number supported by the surface, as this should give us non-blocking behavior).
This is not all. The naive way of sending images to the surface is to immediately forward any image that comes out of the graphics pipeline. This immediate presentation mode is an option, but it leads to tearing (the screen may refresh while the image is being copied into the buffer, leaving us with a mixture of two different images). There are smarter presentation modes that avoid this issue, but they come at a cost (paid in input delay and/or performance). For instance, we may use double buffering, i.e., render to two different buffers in an alternating order. One of these buffers is used for rendering while the other is bound to the display, and we swap the roles of the buffers around. To avoid tearing, we only do so during an interval where we have the guarantee that the display is not getting refreshed. A Vulkan surface defines a notion of vertical blanking intervals during which we are guaranteed that the screen will not be refreshed.
We do not directly control how the images are sent to the screen. Instead, the surface instance extension (required by GLFW) defines a set of presentation modes that we pick from when creating the swapchain:
- Immediate: images are sent to the screen directly. May lead to tearing.
- Mailbox: multiple buffering with synchronization during the vertical blank. No tearing, but wasteful (the presentation engine may discard frames in CPU-bound contexts).
- FIFO: multiple buffering with a queue, newer results get pushed to the back. No tearing, but the input delay may be high. This is my favorite option, and also the only one guaranteed to be supported.
- Relaxed FIFO: same as above, but if a vertical blank comes before a new render is ready, the next image is sent to the screen immediately. Avoids stalling, but may lead to tearing on rare occasions.
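Before requesting a mode, we should check that the surface supports it. A common pattern is to look for a preferred mode and fall back to FIFO otherwise (a sketch, assuming physical_device and surface are in scope):

```c
/* Query the supported present modes, then prefer mailbox when available,
   falling back to FIFO (the only mode guaranteed to be supported). */
uint32_t count = 0;
vkGetPhysicalDeviceSurfacePresentModesKHR(physical_device, surface, &count, NULL);
VkPresentModeKHR modes[16];
if (count > 16) count = 16;
vkGetPhysicalDeviceSurfacePresentModesKHR(physical_device, surface, &count, modes);

VkPresentModeKHR chosen = VK_PRESENT_MODE_FIFO_KHR;
for (uint32_t i = 0; i < count; i++)
    if (modes[i] == VK_PRESENT_MODE_MAILBOX_KHR)
        chosen = VK_PRESENT_MODE_MAILBOX_KHR;
```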
The swapchain also requires a color space. Only VK_COLOR_SPACE_SRGB_NONLINEAR_KHR (sRGB) is defined in core Vulkan, but extensions provide other options (see all available values). To display raw HDR data to an HDR display, we must use a color space larger than the default (we would also be using a large image format in that case).
Additionally, we must specify a transform (usually left to VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR; see available values). This is useful for optimizing what happens when mobile devices are rotated, in which case we should set it to the same value as the surface's currentTransform (which we can get through vkGetPhysicalDeviceSurfaceCapabilitiesKHR); we should also adjust the MVP matrix accordingly.
Furthermore, we specify whether we allow Vulkan to discard rendering operations that fall into pixels that cannot be seen (e.g., when another window partially hides our application). We do this through the clipped field. We almost always leave this to VK_TRUE in practice, except in the rare cases where we need to read back from the images we render.
Finally, there is an oldSwapchain parameter that makes recreation more efficient. It is presented in greater detail in the next subsection.
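Putting the fields of this subsection together, creation might look like the following sketch (in reality, the image format must be checked against vkGetPhysicalDeviceSurfaceFormatsKHR, and the image count against the surface's maxImageCount):

```c
VkSurfaceCapabilitiesKHR caps;
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physical_device, surface, &caps);

VkSwapchainCreateInfoKHR info = {
    .sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,
    .surface = surface,
    .minImageCount = caps.minImageCount + 1,    /* one extra for non-blocking acquires */
    .imageFormat = VK_FORMAT_B8G8R8A8_SRGB,     /* must be supported by the surface */
    .imageColorSpace = VK_COLOR_SPACE_SRGB_NONLINEAR_KHR,
    .imageExtent = caps.currentExtent,
    .imageArrayLayers = 1,
    .imageUsage = VK_IMAGE_USAGE_TRANSFER_DST_BIT, /* we copy our renders into it */
    .imageSharingMode = VK_SHARING_MODE_EXCLUSIVE,
    .preTransform = caps.currentTransform,
    .compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,
    .presentMode = VK_PRESENT_MODE_FIFO_KHR,    /* always supported */
    .clipped = VK_TRUE,
    .oldSwapchain = VK_NULL_HANDLE,
};
VkSwapchainKHR swapchain;
vkCreateSwapchainKHR(device, &info, NULL, &swapchain);
```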
B.2.2. Recreation
When an application's window is minimized, resized, rotated or moved to another screen, the surface's characteristics change, and the swapchain becomes outdated as a result. When we try to acquire or present an image using an outdated swapchain, we end up either with an error code alone (VK_ERROR_OUT_OF_DATE_KHR) or with a warning alongside a result (the warning being VK_SUBOPTIMAL_KHR). Swapchain creation can also fail (or end with a warning) if something about the underlying surface changes while it runs.
In such situations, we recreate the swapchain and the resources that depend on it (framebuffers, image views, and image acquired semaphores). We can pass the outdated swapchain as the oldSwapchain parameter of vkCreateSwapchainKHR, which helps with maximizing the reuse of internal resources. This turns it into a retired swapchain; we cannot acquire new images from such swapchains, but any image acquired prior remains valid until we explicitly destroy the swapchain object. There can be at most one non-retired swapchain bound to a given surface at any time. We are not technically obligated to recreate suboptimal swapchains (i.e., those that return a warning and not an error), but it is better for performance (in that case, we can avoid the latency of a hard sync by finishing using any already acquired image before destroying the old swapchain).
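A sketch of the recreation flow, using the simple hard-sync approach; create_swapchain, destroy_dependent_resources and create_dependent_resources are hypothetical helpers wrapping code shown earlier:

```c
/* Simple (hard-sync) recreation: wait for all work to finish first. */
vkDeviceWaitIdle(device);
destroy_dependent_resources();  /* framebuffers, image views, semaphores */

/* Passing the outdated swapchain as oldSwapchain maximizes resource reuse. */
VkSwapchainKHR new_swapchain = create_swapchain(/* oldSwapchain = */ swapchain);
vkDestroySwapchainKHR(device, swapchain, NULL);  /* destroy the retired one */
swapchain = new_swapchain;

create_dependent_resources();
```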
The swapchain part of the specification really is a mess that will probably not get fixed anytime soon:
- There is no simple legal way of destroying swapchains once we start using them — they should only be destroyed after all outstanding operations on the images acquired through them are done, yet there is no vanilla way of releasing acquired images short of presenting them, and the present operation has no fence to tell us when it is done. vkQueueWaitIdle works fine in practice. This sample proposes to wait for the end of the first present of the new swapchain to destroy the old one.
- There is no easy way of destroying render finished semaphores. Even the radical vkDeviceWaitIdle function does not wait for these semaphores; we have to insert a ghost submission that waits for them before signaling a fence. In practice, many engines are not technically correct in that regard but still work fine.
The VK_EXT_swapchain_maintenance1 device extension solves these issues thanks to its additional fence for the present operation and its vkReleaseSwapchainImagesEXT function, which gives us a way of releasing images without presenting them (we provide an array of indices of images to release). However, this is only an extension and not all devices support it, so we should always integrate a fallback option.
B.3. Rendering loop
We typically use a single queue that handles both graphics and present operations (present queues that do not support graphics operations are technically possible, but no hardware vendor actually ships one).
A rendering loop is made up of the following few steps:
- Acquire an image: we use vkAcquireNextImageKHR to request an image from a specific (non-retired) swapchain. This function is blocking, but it takes a timeout argument (in nanoseconds) to avoid deadlocks (or UINT64_MAX for an unbounded wait). The specification does not explicitly describe in which situations this function is guaranteed not to be stuck on a wait, but we can expect it never to block if we create our swapchain with a minimum number of images set to one more than the nominally supported minimum (as discussed previously). We should not use an acquired image immediately, as it only becomes safe to use once one of the optional fence/semaphore arguments gets signaled.
- Render to it: refer to the previous chapter.
- Present the result: we use vkQueuePresentKHR to present an array of images (usually of size one). In addition to the swapchain/image index pairs, we provide a set of semaphores that this operation should wait for. Also, if we want per-swapchain results (again, only useful if we present to several swapchains in parallel, which we rarely do), we can provide a pointer to an array of VkResult values (we can also pass a null pointer instead of this array and check the function's global result if we do not need such fine-grained information). If we want to use a fence, we need to enable the VK_EXT_swapchain_maintenance1 device extension and to make pNext point to a VkSwapchainPresentFenceInfoEXT structure (which lets us pass either a fence handle or VK_NULL_HANDLE for every swapchain targeted by the present command).
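The loop above can be sketched as follows. This is a simplified fragment (one frame in flight; the image acquired semaphore handling glosses over the per-image duplication discussed in section A.3, and the handles are assumed to exist):

```c
/* Wait until the previous render using these resources is done (CPU-GPU sync). */
vkWaitForFences(device, 1, &render_finished_fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &render_finished_fence);

/* Acquire: image_index becomes safe to use once image_acquired is signaled. */
uint32_t image_index;
VkResult res = vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                                     image_acquired, VK_NULL_HANDLE, &image_index);
if (res == VK_ERROR_OUT_OF_DATE_KHR) { /* recreate the swapchain */ }

/* Render: record and submit the command buffer (see the previous chapter);
   the submission waits on image_acquired and signals render_finished as well
   as render_finished_fence. */

/* Present: hand the image back to the presentation engine. */
VkPresentInfoKHR present_info = {
    .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &render_finished,
    .swapchainCount = 1,
    .pSwapchains = &swapchain,
    .pImageIndices = &image_index,
    .pResults = NULL,  /* we check the global return value instead */
};
res = vkQueuePresentKHR(queue, &present_info);
if (res == VK_ERROR_OUT_OF_DATE_KHR || res == VK_SUBOPTIMAL_KHR) { /* recreate */ }
```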
X. Additional resources
- A 2025 presentation about swapchains by Darius Bozek (video version).
- A sample by Khronos that shows best practices in handling present resources and swapchain recreation.