Chapter 5: the swapchain
In the previous chapter, we learned how to do offline GPU-based rendering. Offline renderers have all sorts of interesting applications, but for games and any other live applications, we usually want the renderer to run continuously and to stream its results to the screen. As we will see, bridging the gap to real-time rendering is not too much of a hassle: the hardest parts of the Vulkan API are already behind us! Nonetheless, we should not underestimate the task at hand. Naively, we may think that rendering could be performed through a simple while loop such as the one below:
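A sketch of such a naive loop, in the same pseudocode style as the rest of this section (all function names are illustrative):

```
while (true) {
    image = vulkan.render_scene()         // issue the draw calls, wait for the GPU to finish
    cpu_image = vulkan.copy_to_cpu(image) // costly GPU-to-CPU transfer
    window.display(cpu_image)             // hand the result over to the windowing system
}
```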
The performance of such a process would be quite poor for two main reasons:
- Unexploited opportunities for parallelism: the process described above is entirely sequential. We could be working on the render of multiple images in parallel (this can lead to resource contention, so we should not overdo it).
- Costly CPU/GPU synchronization: extracting the results to the CPU and printing them to the screen takes time. The system could set up a bridge that lets the GPU communicate directly with display-mapped memory (which typically lives in video memory and gets scanned out by the display controller). Then, our while loop could be reduced to a single function call (vulkan.draw_next_frame()), which would handle everything while avoiding back-and-forths with the CPU.
The swapchain answers both of these concerns.
In this chapter, we learn how to pipeline our renders and get their output onto the screen using swapchains. We discuss the Vulkan extensions that implement this mechanism, and we describe how to juggle multiple copies of resources to avoid conflicts between parallel renders.
A. A high-level overview
A.1. Windows and surfaces
We want to render to windows, but we have not yet seen how to handle window objects. Windows are a system-side feature, and they are not managed by Vulkan; we have to create them from CPU-land instead. As different systems handle windows in different ways, we use cross-platform libraries that let us manipulate them in an abstracted way. One such library is GLFW, which supports Windows, macOS, Wayland and X11 (with these last two basically being current and legacy Linux). Moreover, GLFW provides a cross-platform way of handling inputs. Alternatives to this library include SFML and SDL, but GLFW is the most lightweight option. We must load it in a specific manner to put it in Vulkan mode: we need to make sure that it can find the Vulkan headers, and we must request some specific instance extensions when initializing Vulkan.
Once GLFW is loaded, we can create windows. From a window, we can extract a surface, i.e., a Vulkan-side view of it for use as a rendering target. Surfaces are not a native feature of Vulkan: they are defined in an instance extension that we need to enable (in fact, this is one of the extensions that GLFW expects us to load).
A.2. Swapchains and presentation
The swapchain is Vulkan's abstraction for the subsystem that handles rendering to a surface. To enable interactions with swapchains from a device, we must request the appropriate device extension at the time of its creation. The swapchain device extension introduces present operations (for presenting an image to a swapchain) and the accompanying queue capability (we must check that a queue supports presentation before using it for present operations).
When creating a swapchain, we choose an image type (with a resolution, a format, etc). This type is constrained by surface-specific limitations (for instance, the image resolution should match that of the underlying window). We also specify a presentation mode, which dictates how the contents of the window are updated: we may write to the display-mapped buffer directly, or we may use a form of buffering instead.
We acquire an image from the swapchain prior to a render using a special function. Although we could use this image as a draw call's color attachment, rendering directly into the swapchain is limiting: it forces us to stick to the exact resolution of the underlying window, as well as to one of the formats it supports. This rules out rendering options such as targeting a resolution lower than that of the window or high-dynamic-range rendering with non-HDR monitors. The alternative is to introduce distinct images for color attachments; copying these images to the swapchain is cheap.
Finally, we release the updated swapchain image we previously acquired via a present operation. This operation transfers the control of that swapchain image to the presentation engine, which is an abstraction for the system-side component in charge of getting images onto the display (the image typically goes through a compositor; also, if we use triple buffering, images are not displayed at all if fresher ones are produced before the refresh).
In summary, a frame goes through the following steps:
- Acquisition: we get hold of a swapchain image to render to (technically, we acquire this image from the presentation engine).
- Rendering: we render the scene and copy the result to the swapchain image from step 1.
- Presentation: we release the updated image, letting the presentation engine handle it.
On the GPU side, operations are asynchronous by default. We handle synchronization manually via primitives such as semaphores and fences. Indeed, the image acquisition function may return an image that is still being read from by the presentation engine or written to by an in-flight render command. We use an "image acquired" semaphore to ensure that the image does not get written to before the presentation engine is done reading from it, and a "render finished" semaphore to ensure that the presentation engine does not present the image before we are done rendering to it. The image acquired semaphore is signaled by the presentation engine once the image is ready for reuse, and waited on by the rendering function. In turn, the render finished semaphore is signaled by the rendering function, and waited on by the presentation engine.
Since we emit draw calls from the CPU, we must also handle CPU-GPU synchronization. We introduce a "render finished" fence (in addition to the semaphore of the same name) to enable waiting for the end of a render operation from the CPU.
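The wiring described above can be sketched in C as follows. This is a fragment, not a full program: the queue, command buffer, semaphores and fence are assumed to have been created elsewhere.

```c
/* Sketch: wiring the synchronization primitives described above. */
VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
VkSubmitInfo submit_info = {
    .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &image_acquired,    /* wait until the image is free */
    .pWaitDstStageMask = &wait_stage,
    .commandBufferCount = 1,
    .pCommandBuffers = &cmd_buffer,
    .signalSemaphoreCount = 1,
    .pSignalSemaphores = &render_finished, /* lets presentation start safely */
};
/* The fence lets the CPU wait for the end of this render. */
vkQueueSubmit(queue, 1, &submit_info, render_finished_fence);
```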
A.3. Parallel renders and juggling resources
Rendering several images in parallel can increase a GPU's throughput. There are limits to this, as all rendering tasks contend for the same computational resources. A common architecture supports handling up to two frames in parallel (also called frames-in-flight), where frames are handled in a staggered way. That way, there are not too many computations running in parallel (we expect most of the rendering for a frame to be over by the time the next one enters the pipeline anyway). The diagram below illustrates this method (credits go to the vulkan-diagrams project).
We must duplicate some resources to avoid conflicts. For instance, we should not share a single image acquired semaphore for all frames-in-flight. Resources should be duplicated as specified below:
- Per swapchain image resources: the image itself, a framebuffer, an image acquired semaphore (on that note, there is a mistake in the diagram above: that semaphore is defined per frame-in-flight there).
- Per frame-in-flight resources: a command buffer, a render finished fence, a render finished semaphore, the uniform buffers, and all other variable resources.
- Unduplicated resources: the rest, including the pipelines, the render passes, the vertex/index buffers, and all constant resources.
When using pre-baked command buffers, we should consequently generate frames_in_flight_count*swapchain_images_count unique ones (this corresponds to all possible combinations of resources that we must handle). Also, in CPU-bound contexts, we can minimize input lag and unnecessary frame generation by measuring how much time is required to generate a frame, and by delaying the generation of subsequent frames so that renders finish just in time for the presentation engine.
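The indexing scheme for pre-baked command buffers can be made concrete with a small sketch (the counts below are illustrative):

```c
#include <assert.h>

/* Two frames in flight and, say, three swapchain images. */
#define FRAMES_IN_FLIGHT 2
#define SWAPCHAIN_IMAGES 3

/* With pre-baked command buffers, one unique buffer is needed per
   (frame-in-flight, swapchain image) pair, i.e.
   FRAMES_IN_FLIGHT * SWAPCHAIN_IMAGES buffers in total. */
static int command_buffer_index(int frame_in_flight, int image_index) {
    return frame_in_flight * SWAPCHAIN_IMAGES + image_index;
}
```

With these counts, six pre-baked command buffers cover every combination; the frame-in-flight slot itself simply cycles (frame_counter % FRAMES_IN_FLIGHT).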
A.4. Handling resizes
When a window gets resized or minimized, the corresponding swapchain becomes outdated. Vulkan returns an error on later interactions with that swapchain. In that case, we must recreate it with updated characteristics.
B. A deeper dive
B.1. GLFW, windows and surfaces
GLFW is a cross-platform library that provides an interface for interacting with windows and I/O. Although it was originally developed for use with OpenGL, it also supports Vulkan. This subsection uses C because GLFW is a C library, though GLFW bindings are available for all popular enough programming languages. We define a special preprocessor macro before including GLFW to let it know that we will use it in a Vulkan setting (this works for most setups; if it does not for yours, check GLFW's website):
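For most setups, this amounts to the following two lines:

```c
/* GLFW_INCLUDE_VULKAN makes GLFW pull in the Vulkan headers and expose its
   Vulkan-specific functions. It must be defined before the include. */
#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>
```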
We can then create a window (see glfwWindowHint, glfwCreateWindow and GLFWwindow):
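A minimal window creation sketch (the resolution and title are placeholders; error handling is omitted):

```c
glfwInit();
/* Tell GLFW not to create an OpenGL context: Vulkan handles rendering. */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
/* Disable resizing for now; swapchain recreation is covered later. */
glfwWindowHint(GLFW_RESIZABLE, GLFW_FALSE);
GLFWwindow* window = glfwCreateWindow(800, 600, "My app", NULL, NULL);
```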
glfwGetRequiredInstanceExtensions returns the list of Vulkan instance extensions that GLFW requires the instance to load. We must request these extensions at vkCreateInstance time. glfwGetPhysicalDevicePresentationSupport tells us whether a specific queue supports presentation.
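A sketch of instance creation with the required extensions (error handling omitted):

```c
/* GLFW tells us which instance extensions it needs (including the surface
   extension); we forward them to vkCreateInstance. */
uint32_t extension_count = 0;
const char** extensions = glfwGetRequiredInstanceExtensions(&extension_count);

VkInstanceCreateInfo create_info = {
    .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
    .enabledExtensionCount = extension_count,
    .ppEnabledExtensionNames = extensions,
};
VkInstance instance;
vkCreateInstance(&create_info, NULL, &instance);
```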
We create surfaces from windows via glfwCreateWindowSurface (this function takes a Vulkan instance as argument; we can safely assume that the surface instance extension is part of those that GLFW expects to be loaded).
glfwGetFramebufferSize returns information about the size of our window (in pixels). This use of the word "framebuffer" is not related to its Vulkan meaning. Remember that GLFW also handles inputs (mouse, keyboard, etc).
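For instance (error handling abbreviated):

```c
/* Create a Vulkan surface from the window, then query its size in pixels. */
VkSurfaceKHR surface;
if (glfwCreateWindowSurface(instance, window, NULL, &surface) != VK_SUCCESS) {
    /* handle the error */
}
int width, height;
glfwGetFramebufferSize(window, &width, &height);
```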
B.2. The swapchain
B.2.1. Creation
Swapchains represent the device-side subsystem that handles rendering to surfaces. They are defined in the VK_KHR_swapchain device extension, which we load at vkCreateDevice time.
vkCreateSwapchainKHR creates a swapchain object of a certain image type. The choice of that image type is constrained by the underlying surface: we could not use a 1920x1080 HDR image on an old 768x576 screen, for instance. We use vkGetPhysicalDeviceSurfaceCapabilitiesKHR to get information about these constraints. We also set a minimum for the number of images that the swapchain should contain (typically, one more than the minimum number supported by the surface, as this should give us non-blocking behavior).
This is not all. The naive way of sending images to the surface is to immediately forward any image that comes out of the graphics pipeline. This immediate presentation mode is an option, but it leads to tearing (the screen may refresh while the image is being copied into the buffer, leaving us with a mixture of two different images). There are smarter presentation modes that avoid this issue, but they come at a cost (paid in input delay and/or performance). For instance, we may use double buffering, i.e., render to two different buffers in an alternating order. One of these buffers is used for rendering while the other is bound to the display, and we swap the roles of the buffers around. To avoid tearing, we only do so during an interval where we have the guarantee that the display is not getting refreshed. A Vulkan surface defines a notion of vertical blanking intervals during which we are guaranteed that the screen will not be refreshed.
We do not directly control how the images are sent to the screen. Instead, the surface instance extension (required by GLFW) defines a set of presentation modes that we pick from when creating the swapchain:
- Immediate: images are sent to the screen directly. May lead to tearing.
- Mailbox: multiple buffering with synchronization during the vertical blank. No tearing, but wasteful (the presentation engine may discard frames in CPU-bound contexts).
- FIFO: multiple buffering with a queue, newer results get pushed to the back. No tearing, but the input delay may be high. This is my favorite option, and also the only one guaranteed to be supported.
- Relaxed FIFO: same as above, but if a vertical blank comes before a new render is ready, the next image is sent to the screen immediately. Avoids stalling, but may lead to tearing on rare occasions.
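Before requesting a mode, we should check that the surface supports it. A common pattern is to look for a preferred mode and fall back to FIFO otherwise (a sketch, assuming physical_device and surface are in scope):

```c
/* Query the supported present modes, then prefer mailbox when available,
   falling back to FIFO (the only mode guaranteed to be supported). */
uint32_t count = 0;
vkGetPhysicalDeviceSurfacePresentModesKHR(physical_device, surface, &count, NULL);
VkPresentModeKHR modes[16];
if (count > 16) count = 16;
vkGetPhysicalDeviceSurfacePresentModesKHR(physical_device, surface, &count, modes);

VkPresentModeKHR chosen = VK_PRESENT_MODE_FIFO_KHR;
for (uint32_t i = 0; i < count; i++)
    if (modes[i] == VK_PRESENT_MODE_MAILBOX_KHR)
        chosen = VK_PRESENT_MODE_MAILBOX_KHR;
```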
The swapchain also requires a color space. Only VK_COLOR_SPACE_SRGB_NONLINEAR_KHR (sRGB) is defined in core Vulkan, but extensions provide other options (see all available values). To display raw HDR data to an HDR display, we must use a color space larger than the default (we would also be using a large image format in that case).
Additionally, we must specify a transform (usually left to VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR; see available values). This is useful for optimizing what happens when mobile devices are rotated, in which case we should set it to the same value as the surface's currentTransform (which we can get through vkGetPhysicalDeviceSurfaceCapabilitiesKHR); we should also adjust the MVP matrix accordingly.
Furthermore, we specify whether we allow Vulkan to discard rendering operations that fall into pixels that cannot be seen (e.g., when another window partially hides our application). We do this through the clipped field. We almost always leave this to VK_TRUE in practice, except in the rare cases where we need to read back from the images we render.
Finally, there is an oldSwapchain parameter that makes recreation more efficient. It is presented in greater detail in the next subsection.
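Putting the fields of this subsection together, creation might look like the following sketch (in reality, the image format must be checked against vkGetPhysicalDeviceSurfaceFormatsKHR, and the image count against the surface's maxImageCount):

```c
VkSurfaceCapabilitiesKHR caps;
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physical_device, surface, &caps);

VkSwapchainCreateInfoKHR info = {
    .sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,
    .surface = surface,
    .minImageCount = caps.minImageCount + 1,    /* one extra for non-blocking acquires */
    .imageFormat = VK_FORMAT_B8G8R8A8_SRGB,     /* must be supported by the surface */
    .imageColorSpace = VK_COLOR_SPACE_SRGB_NONLINEAR_KHR,
    .imageExtent = caps.currentExtent,
    .imageArrayLayers = 1,
    .imageUsage = VK_IMAGE_USAGE_TRANSFER_DST_BIT, /* we copy our renders into it */
    .imageSharingMode = VK_SHARING_MODE_EXCLUSIVE,
    .preTransform = caps.currentTransform,
    .compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,
    .presentMode = VK_PRESENT_MODE_FIFO_KHR,    /* always supported */
    .clipped = VK_TRUE,
    .oldSwapchain = VK_NULL_HANDLE,
};
VkSwapchainKHR swapchain;
vkCreateSwapchainKHR(device, &info, NULL, &swapchain);
```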
B.2.2. Recreation
When an application's window is minimized, resized, rotated or moved to another screen, the surface's characteristics change, and the swapchain becomes outdated as a result. When we try to acquire or present an image using an outdated swapchain, we end up either with an error code alone (VK_ERROR_OUT_OF_DATE_KHR) or with a warning alongside a result (the warning being VK_SUBOPTIMAL_KHR). Swapchain creation can also fail (or end with a warning) if something about the underlying surface changes while it runs.
In such situations, we recreate the swapchain and the resources that depend on it (framebuffers, image views, and image acquired semaphores). We can pass the outdated swapchain as the oldSwapchain parameter of vkCreateSwapchainKHR, which helps with maximizing the reuse of internal resources. This turns it into a retired swapchain; we cannot acquire new images from such swapchains, but any image acquired prior remains valid until we explicitly destroy the swapchain object. There can be at most one non-retired swapchain bound to a given surface at any time. We are not technically obligated to recreate suboptimal swapchains (i.e., those that return a warning and not an error), but it is better for performance (in that case, we can avoid the latency of a hard sync by finishing using any already acquired image before destroying the old swapchain).
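A sketch of the recreation flow, using the simple hard-sync approach; create_swapchain, destroy_dependent_resources and create_dependent_resources are hypothetical helpers wrapping code shown earlier:

```c
/* Simple (hard-sync) recreation: wait for all work to finish first. */
vkDeviceWaitIdle(device);
destroy_dependent_resources();  /* framebuffers, image views, semaphores */

/* Passing the outdated swapchain as oldSwapchain maximizes resource reuse. */
VkSwapchainKHR new_swapchain = create_swapchain(/* oldSwapchain = */ swapchain);
vkDestroySwapchainKHR(device, swapchain, NULL);  /* destroy the retired one */
swapchain = new_swapchain;

create_dependent_resources();
```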
The swapchain part of the specification really is a mess that will probably not get fixed anytime soon:
- There is no simple legal way of destroying swapchains once we start using them — they should only be destroyed after all outstanding operations on the images acquired through them are done, yet there is no vanilla way of releasing acquired images short of presenting them, and the present operation has no fence to tell us when it is done. vkQueueWaitIdle works fine in practice. This sample proposes to wait for the end of the first present of the new swapchain to destroy the old one.
- There is no easy way of destroying render finished semaphores. Even the radical vkDeviceWaitIdle function does not wait for these semaphores; we have to insert a ghost submission that waits for them before signaling a fence. In practice, many engines are not technically correct in that regard but still work fine.
The VK_EXT_swapchain_maintenance1 device extension solves these issues thanks to its additional fence for the present operation and its vkReleaseSwapchainImagesEXT function, which gives us a way of releasing images without presenting them (we provide an array of indices of images to release). However, this is only an extension and not all devices support it, so we should always integrate a fallback option.
B.3. Rendering loop
We typically use a single queue that handles both graphics and present operations (present queues that do not support graphics operations are technically possible, but no hardware vendor actually ships one).
A rendering loop is made up of the following few steps:
- Acquire an image: we use vkAcquireNextImageKHR to request an image from a specific (non-retired) swapchain. This function is blocking, but it takes a timeout argument (in nanoseconds) to avoid deadlocks (or UINT64_MAX for an unbounded wait). The specification does not explicitly describe in which situations this function is guaranteed not to be stuck on a wait, but we can expect it never to block if we create our swapchain with a minimum number of images set to one more than the nominally supported minimum (as discussed previously). We should not use an acquired image immediately, as it only becomes safe to use once one of the optional fence/semaphore arguments gets signaled.
- Render to it: refer to the previous chapter.
- Present the result: we use vkQueuePresentKHR to present an array of images (usually of size one). In addition to the swapchain/image index pairs, we provide a set of semaphores that this operation should wait for. Also, if we want per-swapchain results (again, only useful if we present to several swapchains in parallel, which we rarely do), we can provide a pointer to an array of VkResult values (we can also pass a null pointer instead of this array and check the function's global result if we do not need such fine-grained information). If we want to use a fence, we need to enable the VK_EXT_swapchain_maintenance1 device extension and to make pNext point to a VkSwapchainPresentFenceInfoEXT structure (which lets us pass either a fence handle or VK_NULL_HANDLE for every swapchain targeted by the present command).
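The loop above can be sketched as follows. This is a simplified fragment (one frame in flight; the image acquired semaphore handling glosses over the per-image duplication discussed in section A.3, and the handles are assumed to exist):

```c
/* Wait until the previous render using these resources is done (CPU-GPU sync). */
vkWaitForFences(device, 1, &render_finished_fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &render_finished_fence);

/* Acquire: image_index becomes safe to use once image_acquired is signaled. */
uint32_t image_index;
VkResult res = vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                                     image_acquired, VK_NULL_HANDLE, &image_index);
if (res == VK_ERROR_OUT_OF_DATE_KHR) { /* recreate the swapchain */ }

/* Render: record and submit the command buffer (see the previous chapter);
   the submission waits on image_acquired and signals render_finished as well
   as render_finished_fence. */

/* Present: hand the image back to the presentation engine. */
VkPresentInfoKHR present_info = {
    .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &render_finished,
    .swapchainCount = 1,
    .pSwapchains = &swapchain,
    .pImageIndices = &image_index,
    .pResults = NULL,  /* we check the global return value instead */
};
res = vkQueuePresentKHR(queue, &present_info);
if (res == VK_ERROR_OUT_OF_DATE_KHR || res == VK_SUBOPTIMAL_KHR) { /* recreate */ }
```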
X. Additional resources
- A 2025 presentation about swapchains by Darius Bozek (video version).
- A sample by Khronos that shows best practices in handling present resources and swapchain recreation.