
Middleware in Game Development


Download PDF [PDF: 251KB]


Middleware can have a number of different meanings in software development. But in game development, middleware can be thought of in two ways: one as the software between the kernel and the UX, and the other more important one as software that adds services, features, and functionality to improve the game as well as make game development easier. Whether you are looking for an entire game engine to develop your idea into a game, or an efficient easy-to-use video codec to deploy full motion video, this list will guide you to the best middleware to use while developing your game for Intel® architecture.

Game Engines

A game engine typically encapsulates the rendering, physics, sound, input, networking, and artificial intelligence. If you are not building your own engine, then you will need to use a commercial version. The game engines below have been heavily optimized for Intel® hardware, ensuring that your game runs great no matter which Intel® platform you choose to develop for.

Engine / Description / Intel Resources

Unreal* Engine 4

Unreal Engine 4 powers some of the most visually stunning games in existence while being easy to learn. Blueprints visual scripting lets you jump in with no programming experience, or you can go the traditional route and use C++. Unreal supports cross-platform game development on your Intel® processor-based PC and devices.

https://software.intel.com/en-us/articles/Unreal-Engine-4-with-x86-Support

Unity* 5

Unity 5 is extremely easy to learn and supports programming in both UnityScript and C#. Unity supports cross-platform game development on your Intel processor-based PC and devices.

https://software.intel.com/en-us/articles/unity

Cocos2d-x

Cocos2d-x is an open source game engine that supports cross-platform 2D game development on your Intel processor-based PC and devices. Cocos2d-x supports C++, JavaScript*, and Lua, and allows developers to use the same code across all platforms.

https://software.intel.com/en-us/articles/creating-multi-platform-games-with-cocos2d-x-version-30-or-later

Marmalade

Marmalade is designed as a write once, execute anywhere engine. Developers can access low-level platform features for memory management and file access, while using C++ or Objective-C* for game scripting. Marmalade supports cross-platform game development on your Intel processor-based PC and devices.

https://software.intel.com/en-us/android/articles/marmalade-c-and-shiva3d-a-game-engine-guide-for-android-x86

libGDX

libGDX is an open-source, cross-platform game development framework for Windows*, Linux*, OS X*, iOS*, Android*, and Blackberry* platforms and WebGL-enabled browsers. It supports multiple Java* Virtual Machine languages.

https://software.intel.com/en-us/android/articles/preparing-libgdx-to-natively-support-intel-x86-cpus-running-android

Optimization Tools

Intel provides a number of tools for analyzing and optimizing your game. Does a particular section of your game cause long frame draw times? Do you want to optimize your code for multicore performance? Intel’s optimization tools can help you unleash the full performance of Intel hardware.

Intel Optimization Tool / Description / Intel Resources

Graphics Performance Analyzers (GPA)

GPA is a set of powerful, agile tools enabling game developers to get full performance out of their gaming platform, including (but not limited to) Intel® Core™ processors and Intel® HD graphics, as well as Intel processor-based tablets running Android.

https://software.intel.com/en-us/gpa/faq

Intel® VTune™ Amplifier

Intel VTune Amplifier gives insight into threading performance and scalability, bandwidth, caching, and much more. Analysis is faster and easier because VTune Amplifier understands common threading models and presents information at a higher, easily understood level.

https://software.intel.com/en-us/get-started-with-vtune

Intel® Compiler Tools

Intel Compiler tools generate code that unlocks the full horsepower of Intel processors.

https://software.intel.com/en-us/compiler_15.0_ug_c

Intel® Threading Building Blocks (Intel® TBB)

Intel TBB lets you easily write parallel C++ programs. These parallel programs take full advantage of multicore performance, are portable and composable, and have future-proof scalability.

https://software.intel.com/en-us/android/articles/android-tutorial-writing-a-multithreaded-application-using-intel-threading-building-blocks
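
To give a flavor of how little code a parallel loop needs with Intel TBB, here is a minimal sketch (illustrative only; the buffer and scale factor are placeholders, not tied to any particular game system):

#include <tbb/parallel_for.h>
#include <vector>

// Scale every element of a buffer in parallel; TBB splits the index range
// across the available cores automatically.
void ScaleValues(std::vector<float>& values, float factor)
{
    tbb::parallel_for(std::size_t(0), values.size(),
                      [&](std::size_t i) { values[i] *= factor; });
}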

Other tools to consider

Using these additional tools can further specialize your game. Generate realistic looking vegetation with efficient levels of detail (LODs), compose your Mozart-like audio masterpiece, or improve your global illumination with lifelike shadows and lighting. If you’re looking to push the limits of what is possible in game technology, consider the tools below.

Audio / Description

Wwise*

Multithreaded high-quality audio that integrates easily into multiple game engines and is easily deployed to multiple platforms.

FMOD*

FMOD is a suite of tools for both game development and sound deployment. FMOD studio is an audio creation tool for authoring sounds for your game, while FMOD Ex is a playback engine for sound, with cross-platform compatibility and support for a variety of engines including Unity, Unreal, Cocos2d, and Havok*.

Lighting / Description

Beast*

Autodesk’s Beast provides high-quality global illumination, simulating physically correct real-time lighting.

GUI / Description

Scaleform*

Autodesk’s Scaleform creates menu systems that are both lightweight and feature-rich. Scaleform supports multithreaded rendering, is easy to implement, and supports DirectX* 12.

Misc. / Description

Bink* 2

Bink is a video codec with a self-contained library that does not require software installation. Bink supports multicore CPUs, such as 6th generation Intel processors, for smooth video playback of your game.

SpeedTree*

SpeedTree generates realistic trees with LODs for your game. SpeedTree supports per-instance and per-vertex hue generation to reduce the number of assets for your game, as well as shader optimizations for Intel HD graphics.

Umbra

Umbra is multicore-optimized occlusion-culling middleware with integration support for the Unity and Unreal engines.

Simplygon*

Simplygon automatically generates LODs by intelligently reducing the polygon count of a model to what each level of detail requires.

Feedback

We value your input! Feel free to comment if you have middleware you’d like to see added to this list. And share screenshots of what you’re working on with middleware in the comments section below. 


API without Secrets: Introduction to Vulkan* Preface


Download PDF (456K)

Link to Github Sample Code

About the Author

I have been a software developer for over 9 years. My main area of interest is graphics programming, and most of my professional career has been involved in 3D graphics. I have a lot of experience in OpenGL* and shading languages (mainly GLSL and Cg), and for about 3 years I also worked with Unity* software. I have also had opportunities to work on some VR projects that involved working with head-mounted displays like Oculus Rift* or even CAVE-like systems.

Recently, with our team here at Intel, I was involved in preparing validation tools for our graphics driver’s support for the emerging API called Vulkan. This graphics programming interface and the approach it represents is new to me. The idea came to me that while I’m learning about it I can, at the same time, prepare a tutorial for writing applications using Vulkan. I can share my thoughts and experiences as someone who knows OpenGL and would like to “migrate” to its successor.

About Vulkan

Vulkan is seen as OpenGL's successor. It is a multiplatform API that allows developers to prepare high-performance graphics applications like games, CAD tools, benchmarks, and so forth. It can be used on different operating systems like Windows*, Linux*, or Android*. The Khronos consortium created and maintains Vulkan. Vulkan also shares some other similarities with OpenGL, including graphics pipeline stages, GLSL shaders (sort of), and nomenclature.

But there are many differences that confirm the need for the new API. OpenGL has been evolving for over 20 years. Many things have changed in the computer industry since the early 90s, especially in graphics card architecture. OpenGL is a good library, but not everything can be done by only adding new functionality that matches the abilities of new graphics cards. Sometimes a huge redesign has to be made. And that's why Vulkan was created.

Vulkan was based on Mantle*—the first in a series of new low-level graphics APIs. Mantle was developed by AMD and designed only for the architecture of Radeon cards. Despite being the first publicly available API of its kind, games and benchmarks that used Mantle saw some impressive performance gains. Then other low-level APIs started appearing, such as Microsoft's DirectX* 12, Apple's Metal*, and now Vulkan.

What is the difference between traditional graphics APIs and new low-level APIs? High-level APIs like OpenGL are quite easy to use. The developer declares what they want to do and how they want to do it, and the driver handles the rest. The driver checks whether the developer uses API calls in the proper way, whether the correct parameters are passed, and whether the state is adequately prepared. If problems occur, feedback is provided. For ease of use, many tasks have to be done “behind the scenes” by the driver.

In low-level APIs the developer is the one who must take care of most things. They are required to adhere to strict programming and usage rules and also must write much more code. But this approach is reasonable. The developer knows what they want to do and what they want to achieve. The driver does not, so with traditional APIs the driver has to make additional effort for the program to work properly. With APIs like Vulkan this additional effort can be avoided. That’s why DirectX 12, Metal, or Vulkan are called thin-drivers/thin-APIs. Mostly they only communicate user requests to the hardware, providing only a thin abstraction layer of the hardware itself. The driver does as little as possible for the sake of much higher performance.

Low-level APIs require additional work on the application side. But this work can't be avoided. Someone or something has to do it. So it is much more reasonable for the developer to do it, as they know how to divide work into separate threads, when an image will be used as a render target (color attachment) or as a texture/sampler, and so on. The developer knows which pipeline states or vertex attributes change most often. All of that leads to far more effective use of the graphics card hardware. And the best part is that it works. An impressive performance boost can be observed.

But the word “can” is important. It requires additional effort but also a proper approach. There are scenarios in which no difference in performance between OpenGL and Vulkan will be observed. If someone doesn’t need multithreading or if the application isn’t CPU bound (rendered scenes aren’t too complex), OpenGL is enough and using Vulkan will not give any performance boost (but it may lower power consumption, which is important on mobile devices). But if we want to squeeze every last bit from our graphics hardware, Vulkan is the way to go.

Sooner or later all major graphics engines will support some, if not all, of the new low-level APIs. So if we want to use Vulkan or other APIs, we won’t have to write everything from scratch. But it is always good to know what is going on “under the hood”, and that’s the reason I have prepared this tutorial.
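
To give a taste of that explicitness before the lessons start, below is a minimal sketch (not taken from the tutorial sources) of the very first thing every Vulkan application does: creating an instance. Even this small step requires filling descriptive structures by hand; the following chapters cover it, and everything after it, in detail.

#include <vulkan/vulkan.h>

// Create a bare Vulkan instance; returns VK_NULL_HANDLE on failure.
VkInstance CreateSimpleInstance()
{
    VkApplicationInfo app_info = {};
    app_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app_info.pApplicationName = "IntroductionToVulkan";
    app_info.apiVersion = VK_MAKE_VERSION(1, 0, 0);

    VkInstanceCreateInfo create_info = {};
    create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    create_info.pApplicationInfo = &app_info;

    VkInstance instance = VK_NULL_HANDLE;
    if (vkCreateInstance(&create_info, nullptr, &instance) != VK_SUCCESS) {
        return VK_NULL_HANDLE;
    }
    return instance;
}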

A Note about the Source Code

I’m a Windows developer. When given a choice I write applications for Windows. That’s because I don’t have experience with other operating systems. But Vulkan is a multiplatform API and I want to show that it can be used on different operating systems. That’s why I’ve prepared a sample project that can be compiled and executed both on Windows and Linux.

Source code for this tutorial can be found here:

https://github.com/GameTechDev/IntroductionToVulkan

I have tried to write code samples that are as simple as possible and to not clutter the code with unnecessary “#ifdefs”. Sometimes this can’t be avoided (like in window creation and management) so I decided to divide the code into small parts:

  • Tutorial files are the most important here. They are the ones where all the exciting Vulkan-related code is placed. Each lesson is placed in one header/source pair.
  • OperatingSystem header and source files contain OS-dependent parts of code like window creation, message processing, and rendering loops. These files contain code for both Linux and Windows, but I tried to unify them as much as possible.
  • main.cpp file is a starting point for each lesson. As it uses my custom Window class it doesn’t contain any OS-specific code.
  • VulkanCommon header/source files contain the base class for all tutorials starting from tutorial 3. This class basically replicates tutorials 1 and 2—creation of a Vulkan instance and all other resources necessary for the rendered image to appear on the screen. I’ve extracted this preparation code so the code of all the other chapters could focus on only the presented topics.
  • Tools contain some additional utility functions and classes like a function that reads the contents of a binary file or a wrapper class for automatic object destruction.

The code for each chapter is placed in a separate folder. Sometimes it may contain an additional Data directory in which resources like shaders or textures for a given chapter are placed. This Data folder should be copied to the same directory in which executables will be held. By default executables are compiled into a build folder.

Right. Compilation and the build folder. As the sample project should be easily maintained both on Windows and Linux, I've decided to use a CMakeLists.txt file and the CMake tool. On Windows there is a build.bat file that creates a Visual Studio* solution—Microsoft Visual Studio 2013 is required to compile the code on Windows (by default). On Linux I've provided a build.sh script that compiles the code using make, but CMakeLists.txt can also be easily opened with tools like Qt Creator. CMake is of course also required.

Solution and project files are generated and executables are compiled into the build folder. This folder is also the default working directory, so the Data folders should be copied into it for the lessons to work properly. During execution, in case of any problems, additional information is “printed” in cmd/terminal. So if there is something wrong, run the lesson from the command line/terminal or look into the console/terminal window to see if any messages are displayed.

I hope these notes will help you understand and follow my Vulkan tutorial. Now let’s focus on learning Vulkan itself!

 

Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800- 548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.

Tutorial: Using Intel® RealSense™ Technology in the Unreal Engine* 3 - Part 2


Download PDF 854 KB

Part 1

Setting Up Visual Studio 2010 for the Example Game

The steps below set your map file as the default map for the Example game by modifying the .ini file.

  1. Go to <UE3 Source>\ExampleGame\Config.

  2. Open DefaultEngine.ini and change as shown below.

    [URL]

    MapExt=umap

    Map=test.umap

    LocalMap=BLrealsense_Map.umap

    TransitionMap=BLrealsense_Map.umap

    EXEName=ExampleGame.exe

    DebugEXEName=DEBUG-ExampleGame.exe

    GameName=Example Game

    GameNameShort=EG

  3. Open ExampleEngine.ini and change as listed.

    [URL]

    Protocol=unreal

    Name=Player

    Map=test.umap

    LocalMap=BLrealsense_Map.umap

    LocalOptions=

    TransitionMap=BLrealsense_Map.umap

    MapExt=umap

    EXEName=ExampleGame.exe

    DebugEXEName=DEBUG-ExampleGame.exe

    SaveExt=usa

    Port=7777

    PeerPort=7778

    GameName=Example Game

    GameNameShort=EG

  4. Open the UE3 solution file, <UE3 source>\Development\Src\UE3.sln, in Visual Studio.

    Figure 1: Microsoft Visual Studio* 2010.

  5. Build and run as in the previous steps. You will see the Unreal initial window and your game.

Using the Coordinate System in Unreal Engine

Before linking with the Intel® RealSense™ SDK, it is important to understand the coordinate system in Unreal.

Position is tracked along the X-Y-Z axes (refer to the "Origin" and "RotOrigin" classes in the UE3 source code), and rotation is expressed as Euler angles (P-Y-R) or a quaternion (refer to https://en.wikipedia.org/wiki/Quaternion for more detail). 


Figure 2: Coordinate system

A quaternion has one scalar component and three vector components.

To convert X-Y-Z Euler angles to a quaternion, the angles are combined as follows.
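
The conversion formula itself was shown as an image in the original article; the standard form is reproduced here for reference, with φ, θ, and ψ the rotations about X, Y, and Z. The exact signs depend on the rotation order and handedness convention, and UE3's FQuat::MakeFromEuler applies its own convention, so prefer the engine function over a hand-rolled conversion.

\begin{aligned}
w &= \cos\tfrac{\phi}{2}\cos\tfrac{\theta}{2}\cos\tfrac{\psi}{2} + \sin\tfrac{\phi}{2}\sin\tfrac{\theta}{2}\sin\tfrac{\psi}{2} \\
x &= \sin\tfrac{\phi}{2}\cos\tfrac{\theta}{2}\cos\tfrac{\psi}{2} - \cos\tfrac{\phi}{2}\sin\tfrac{\theta}{2}\sin\tfrac{\psi}{2} \\
y &= \cos\tfrac{\phi}{2}\sin\tfrac{\theta}{2}\cos\tfrac{\psi}{2} + \sin\tfrac{\phi}{2}\cos\tfrac{\theta}{2}\sin\tfrac{\psi}{2} \\
z &= \cos\tfrac{\phi}{2}\cos\tfrac{\theta}{2}\sin\tfrac{\psi}{2} - \sin\tfrac{\phi}{2}\sin\tfrac{\theta}{2}\cos\tfrac{\psi}{2}
\end{aligned}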

Autoexpand Setup for a Debugger in Visual Studio 2010 (Optional)

The bone structure, position, and rotation arrays are not displayed in a readable form in the Visual Studio debugger by default. To see readable debugging values, follow the steps below.

  1. Find your Autoexp.dat
     

    For Visual Studio and Windows 7 64-bit, it is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Packages\Debugger

  2. Find the debugging script and open it.
     

    UE3 source\ Development/External/Visual Studio Debugging/AUTOEXP.DAT_addons.txt

  3. Copy each [AutoExpand] and [Visualizer] section into your Autoexp.dat.

Enabling the Intel® RealSense™ SDK on Unreal Engine 3

This section describes Intel RealSense SDK-related changes in Unreal Engine 3 after installing the Intel RealSense SDK and Depth Camera Manager. Face landmark and head-pose tracking APIs in Intel RealSense SDK are used to manipulate facial expression and head movement of the example character. Head-pose tracking is intuitive since the roll, yaw, and pitch values can be used in Unreal Engine 3 as is, but face landmark tracking is more complicated.


Figure 3: Roll-Yaw-Pitch.

There are 76 traceable points for the face provided by the Intel RealSense SDK. Each expression, like blink or mouth open, has a value range with relevant points. For example, when the eye is closed, the distance between point 12 and point 16 will be around 0, and when the eye is open, the distance will be greater than 0 and varies for each individual.

Based on this, the current implementation relies on a relative calculation between the character's and the user's minimum/maximum values. For example, for blinking, calculate and apply how far apart the game character's eyelids should be for the eyes-open and eyes-closed states measured from the user.


Figure 4: Face landmarks and numbers of the Intel® RealSense™ SDK.

<UE3> is the home folder where UE3 is installed. The following four files need to be modified.

  • <UE3>\Development\Src\UnrealBuildTool\Configuration\UE3BuildConfiguration.cs
  • <UE3>\Development\Src\UnrealBuildTool\Configuration\UE3BuildWin32.cs
  • <UE3>\Development\Src\Engine\Inc\UnSkeletalMesh.h
  • <UE3>\Development\Src\Engine\Src\UnSkeletalComponent.cpp

UE3BuildConfiguration.cs (Optional)

public static bool bRealSense = true;

RealSense-related code is enclosed in "#if USE_REALSENSE" blocks. This flag controls whether the USE_REALSENSE symbol is defined in the UE3BuildWin32.cs file. If you set it to "false", the RealSense-related code is not compiled. This step is optional.

UE3BuildWin32.cs

if (UE3BuildConfiguration.bRealSense)
{
    SetupRealSenseEnvironment();
}

void SetupRealSenseEnvironment()
{
    GlobalCPPEnvironment.Definitions.Add("USE_REALSENSE=1");
    String platform = (Platform == UnrealTargetPlatform.Win64 ? "x64" : "Win32");

    GlobalCPPEnvironment.SystemIncludePaths.Add("$(RSSDK_DIR)/include");
    FinalLinkEnvironment.LibraryPaths.Add("$(RSSDK_DIR)/lib/" + platform);

    if (Configuration == UnrealTargetConfiguration.Debug) {
        FinalLinkEnvironment.AdditionalLibraries.Add("libpxc_d.lib");
    } else {
        FinalLinkEnvironment.AdditionalLibraries.Add("libpxc.lib");
    }
}

This defines "USE_REALSENSE", which is used to enable or disable the Intel RealSense SDK-related code in the source files (optional).

Since Unreal Engine 3 is a makefile-based solution, the Intel RealSense SDK header and library paths must be added to the project's include and library paths.

UnSkeletalMesh.h

#if USE_REALSENSE
	PXCFaceData* faceOutput;
	PXCFaceConfiguration *faceConfig;
	PXCSenseManager *senseManager;

	void InitRealSense();
	void ReleaseRealSense();
#endif

This is the declaration part of the Intel RealSense SDK classes and functions. The bone structure manipulation is in UpdateSkelPose() in UnSkeletalComponent.cpp.

UnSkeletalComponent.cpp

#if USE_REALSENSE
	#include "pxcfacedata.h"
	#include "pxcfacemodule.h"
	#include "pxcfaceconfiguration.h"
	#include "pxcsensemanager.h"

	FLOAT rsEyeMin = 6;
	FLOAT rsEyeMax = 25;

	FLOAT rsMouthMin = 5;
	FLOAT rsMouthMax = 50;

	FLOAT rsMouthWMin = 40;
	FLOAT rsMouthWMax = 70;

	FLOAT chMouthMin = -105;
	FLOAT chMouthMax = -75;
……
#endif

This includes the Intel RealSense SDK header files and defines the minimum/maximum values for the user and the game character: variables starting with "rs" hold the user's values, and variables starting with "ch" hold the game character's values (these should be adjusted to the user's and the character's appearance). For example, for blinking, they define how far apart the game character's eyelids should be for the eyes-open and eyes-closed states measured from the user.
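
The per-feature ratios used later (for example innerEyeRatio or mouthRatio) are not shown in this excerpt; conceptually they are just linear remapping factors between the user's measured range and the character's bone-value range. A hedged sketch of how such a remap might be computed (the helper name is illustrative, not from the UE3 sources):

// Illustrative helper (not part of the UE3 sources): linearly remap a measured
// landmark distance from the user's range [rsMin, rsMax] onto the character's
// bone-value range [chMin, chMax].
float RemapToCharacter(float measured, float rsMin, float rsMax, float chMin, float chMax)
{
    float ratio = (chMax - chMin) / (rsMax - rsMin);   // plays the role of e.g. innerEyeRatio
    return chMin + (measured - rsMin) * ratio;
}

// Example use: eye openness measured from landmarks 20 and 24, mapped onto the
// character's inner-eyelid bone value (names follow the variables defined above;
// chEyeInnerMax is assumed to exist alongside chEyeInnerMin).
// FLOAT eyeOpen   = points[24].image.y - points[20].image.y;
// FLOAT lEyeInner = RemapToCharacter(eyeOpen, rsEyeMin, rsEyeMax, chEyeInnerMin, chEyeInnerMax);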

void USkeletalMeshComponent::Attach()
{
……
#if USE_REALSENSE
	senseManager = NULL;
	InitRealSense();
#endif

The Attach() function calls the InitRealSense() function to initialize the Intel RealSense SDK’s relevant classes and configure the camera. 

#if USE_REALSENSE
void USkeletalMeshComponent::InitRealSense() {
	if (senseManager != NULL) return;

	faceOutput = NULL;

	senseManager = PXCSenseManager::CreateInstance();
	if (senseManager == NULL)
	{
 // error found
	}

	PXCSession *session = senseManager->QuerySession();
	PXCCaptureManager* captureManager = senseManager->QueryCaptureManager();

The InitRealSense() function configures which camera will be used and creates the face-related class instances.

void USkeletalMeshComponent::UpdateSkelPose( FLOAT DeltaTime, UBOOL bTickFaceFX )
{
……
#if USE_REALSENSE
if (senseManager->AcquireFrame(false) >= PXC_STATUS_NO_ERROR) {
	faceOutput->Update();
	int totalNumFaces = faceOutput->QueryNumberOfDetectedFaces();
	if (totalNumFaces > 0) {

The UpdateSkelPose() function is used for head pose and face landmark tracking.

// Head
FVector v(yaw, roll, pitch);

LocalAtoms(6).SetRotation(FQuat::MakeFromEuler(v));
LocalAtoms(6).NormalizeRotation();

Head-pose tracking is intuitive because roll, yaw, and pitch values from the Intel RealSense SDK can be used as is.
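
For context, the yaw, pitch, and roll values fed into MakeFromEuler above come from the face module's pose data. A rough sketch of that query, based on the Intel RealSense SDK face API of that generation (exact calls may differ slightly between SDK versions):

#if USE_REALSENSE
// Rough sketch (not verbatim from the sample): query head-pose Euler angles
// for the first detected face before building the rotation shown above.
PXCFaceData::Face* face = faceOutput->QueryFaceByIndex(0);
if (face != NULL) {
    PXCFaceData::PoseData* poseData = face->QueryPose();
    PXCFaceData::PoseEulerAngles angles;
    if (poseData != NULL && poseData->QueryPoseAngles(&angles)) {
        FLOAT yaw = angles.yaw;
        FLOAT pitch = angles.pitch;
        FLOAT roll = angles.roll;
        // FVector v(yaw, roll, pitch); as shown above
    }
}
#endif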


Figure 5: Face landmarks and numbers that are used for eyes and mouth expression.

To express blinking, landmark points 12, 16 and 20, 24 are used, and points 47, 51, 33, and 39 are used for mouth expression (the detailed implementation depends on the developer's preference).

// Mouth
FLOAT mouthOpen = points[51].image.y - points[47].image.y;
mouth = chMouthMax - (mouthOpen - rsMouthMin) * mouthRatio;

mouthOpen = points[47].image.x - points[33].image.x;
rMouthWOpen = chMouthWMin + (mouthOpen - rsMouthWMin) * mouthWRatio;

mouthOpen = points[39].image.x - points[47].image.x;
lMouthWOpen = chMouthWMin + (mouthOpen - rsMouthWMin) * mouthWRatio;

cMouth = chMouthCMax - (mouthOpen - rsMouthWMin) * mouthCRatio;
// Left Eye
FLOAT eyeOpen = points[24].image.y - points[20].image.y;
lEyeInner = chEyeInnerMin + (eyeOpen - rsEyeMin) * innerEyeRatio;
lEyeOuter = chEyeOuterMin + (eyeOpen - rsEyeMin) * outerEyeRatio;
lEyeUpper = chEyeUpperMin + (eyeOpen - rsEyeMin) * upperEyeRatio;
// Right Eye
eyeOpen = points[16].image.y - points[12].image.y;
rEyeInner = chEyeInnerMin + (eyeOpen - rsEyeMin) * innerEyeRatio;
rEyeOuter = chEyeOuterMin + (eyeOpen - rsEyeMin) * outerEyeRatio;
rEyeUpper = chEyeUpperMin + (eyeOpen - rsEyeMin) * upperEyeRatio;
rEyeLower = chEyeLowerMin + (eyeOpen - rsEyeMin) * lowerEyeRatio;

BN_Lips_Corner_R, BN_Lips_Corner_L, and BN_Jaw_Dum are used for mouth expression, and BN_Blink_UpAdd, BN_Blink_Lower, BN_Blink_Inner, and BN_Blink_Outer are used to express eye blinking. (Refer to the "Facial Bone Structure in Example Characters" section for each bone number.)

// Mouth
FVector m(90, 0, mouth);
LocalAtoms(59).SetRotation(FQuat::MakeFromEuler(m));

LocalAtoms(57).SetTranslation(FVector(mouthWXZ[2], rMouthWOpen, mouthWXZ[3])); // Right side
LocalAtoms(58).SetTranslation(FVector(mouthWXZ[4], lMouthWOpen * -1, mouthWXZ[5])); // Left side

// Left Eye
LocalAtoms(40).SetTranslation(FVector(eyeXY[0], eyeXY[1], lEyeUpper)); // Upper
LocalAtoms(41).SetTranslation(FVector(eyeXY[2], eyeXY[3], lEyeLower)); // Lower
LocalAtoms(42).SetTranslation(FVector(eyeXY[4], eyeXY[5], lEyeInner)); // Inner
LocalAtoms(43).SetTranslation(FVector(eyeXY[6], eyeXY[7], lEyeOuter)); // Outer

// Right Eye
LocalAtoms(47).SetTranslation(FVector(eyeXY[8], eyeXY[9], rEyeLower)); // Lower
LocalAtoms(48).SetTranslation(FVector(eyeXY[10], eyeXY[11], rEyeOuter)); // Outer
LocalAtoms(49).SetTranslation(FVector(eyeXY[12], eyeXY[13], rEyeInner)); // Inner
LocalAtoms(50).SetTranslation(FVector(eyeXY[14], eyeXY[15], rEyeUpper)); // Upper
void USkeletalMeshComponent::ReleaseRealSense() {
	if (faceOutput)
		faceOutput->Release();

	faceConfig->Release();
	senseManager->Close();
	senseManager->Release();
}

This closes and releases all of the Intel RealSense SDK-related class instances.

Facial Bone Structure in Example Characters

In the example, the face is designed with 58 bones. In the image, each box represents a bone. A complete list of bones follows.


Figure 6: Names of bones.

Conclusion

To build an avatar in UE3 that mirrors the user's facial movements and expressions with the Intel RealSense SDK, and so enrich the gaming experience, modifying the UE3 source code is the only option, and developers must know which source files to change. We hope this document helps you when making an avatar in UE3 with the Intel RealSense SDK.

About the Authors

Chunghyun Kim is an application engineer in the Intel Software and Services Group. He focuses on game and graphic optimization on Intel® architecture.

Peter Hong is an application engineer at the Intel Software and Services Group. He focuses on enabling the Intel RealSense SDK for face, hand tracking, 3D scanning, and more.

For More Information

Epic Unreal Engine
https://www.unrealengine.com

Intel RealSense SDK
http://software.intel.com/realsense

Part 1

Tutorial: Using Intel® RealSense™ Technology in the Unreal Engine* 3 - Part 1


Download PDF 1.38 MB

Part 2

Introduction

Epic Games' (https://epicgames.com/) Unreal Engine 3 (UE3) is a popular PC game engine. Intel® RealSense™ Technology is used for face and hand movement tracking to enrich the gaming experience. In the UE3 environment, the recommended path is to use UnrealScript in the Unreal Development Kit (UDK) and add custom functions to the UDK as a plug-in, but the Intel® RealSense™ SDK doesn't provide a UDK plug-in.

This article describes how to add Intel RealSense SDK features to a game character in a massively multiplayer online role-playing game (MMORPG) on UE3 by using C++, not UnrealScript. The common term for determining and modifying a character's facial structure is "face-rigging." There are several ways to handle face-rigging in a game, but we focus on using the Intel RealSense SDK to manipulate the characters' facial bone structure, for performance and workload reasons.

Key points covered in the article include the following:

  • A description of the face-rigging method
  • How to use Autodesk 3ds Max* as part of the Unreal Engine workflow
  • A description of the coordinate system in Unreal: X-Y-Z, Euler (Pitch-Yaw-Roll) and Quaternion
  • How to enable the Intel RealSense SDK on the Unreal Engine
  • Mapping the algorithm of the Intel RealSense SDK to the game engine

Face-Rigging within the Game

There are several ways to modify the underlying bone structure of characters, which we call face-rigging, in the game.

  • Animation with script. Pre-defined animation is the normal method in games, but it is hard to implement for real-time face-rigging. If you want to express a simple emotion on a face, this method would be the best and easiest way. You can control animation using an Unreal script or Matinee.
  • Commercial face-rigging tool – FaceFX*. FaceFX is Unreal’s commercial face-rigging tool from OC3 Entertainment (https://www.facefx.com). It is prelicensed on Unreal Engine 3. FaceFX incorporates full body and face changes for characters.
  • Morph targeting. The earlier Intel RealSense SDK with Unity face-rigging sample code (named Angie) used morph targets. It is a simple way to implement the Intel RealSense SDK within a game, but a morph target has to be made for each character. In the case of this MMORPG, there are from three to six tribes, and the player can modify a character's face and body, so there are several thousand combinations. That would require several thousand morph target face resources and performs worse than bone manipulation.
  • Bone manipulation. If the Intel RealSense SDK can drive bone manipulation, it is a good method for a real game. Even with several thousand character combinations, there are comparatively few face bone structures (tribes x male/female). Also, this method does not affect rendering and has minimal impact on gaming performance.

For example, the MMORPG game Bless* (http://bless-source.com/, http://bless.pmang.com/) has 10 tribes, but there are only eight face bone types: Elf (male/female), Lupus (male/female), Human (male/female), and Mascu (male/female). A full list of bone names is available in the Face Bone Structure section at the end of the document.


Figure 1: Game characters in MMORPG.

Environment

  • Tested machine: Intel® Core™ i7-5775C processor, 16 GB DDR, 256 GB solid-state drive (SSD). Recommended machine: a newer CPU such as a 6th generation Intel® Core™ i7 processor (code-named Skylake), a high-performance discrete graphics card, and an SSD with ample free space (more than 50 GB). SSDs rather than hard-disk drives are recommended for I/O bandwidth. Intel® RealSense™ camera (F200) or 2D web camera.
  • Microsoft Windows* 7 64 bit
  • Autodesk 3ds Max 2014
  • Epic Games Unreal Engine 3 source code (required license)
  • Microsoft Visual Studio* 2010
  • Intel RealSense SDK 6.0.21.6598 and Intel® RealSense™ Depth Camera Manager 1.4.27.41944

Setup procedure

  1. Clean install Windows 7 64 bit on the machine and update Windows and drivers for each device
  2. Copy the UE3 source code to the local drive.
  3. Install Microsoft Visual Studio 2010 and update it. A debugging script must be included if you need debugging in Visual Studio 2010. Refer to the "Autoexpand Setup for a Debugger" section in Part 2.
  4. (Optional) Install Autodesk 3ds Max 2014 if you need to export the FBX file from the MAX file.

Export MAX File to FBX File for Importing UE3

Most common 3D modeling tools, like Autodesk 3ds Max or Maya*, can export their 3D models to the Unreal Engine or Unity* through the FBX file format.

These steps are based on Autodesk 3ds Max 2014.

  1. Open the MAX file that contains the bone structure. You can see the bone positions and overall appearance as well.


    Figure 2:Open the MAX file.

  2. Export the file to FBX. If you encounter a warning that the "by-layer" mode is unsupported, switch to "by-object" mode so the file exports correctly.


    Figure 3:Export to FBX.

  3. Select all objects, then right-click and select “Object Properties”.


    Figure 4:Export option.

  4. Click the "By Object" button to change to "by-layer" mode.


    Figure 5: Export option – the by-layer mode.

  5. Select Menu, and then select Export. Enter the export name in the FBX export option. You can select animation and bake animation to test the animation in UE3.


    Figure 6:Export with animation.

Import the FBX File into the UE3 Editor

If you are using a standalone UE3 build with the standard options (DirectX* 9, 32-bit), you can run the Unreal Editor.

  1. Run the UE3 Editor with the following commands:

    Run CMD and go to your UE3 folder (in my case, C:\UE3)
    Go to \Binaries\Win32\
    Run examplegame.exe editor -NoGADWarning


    Figure 7: Unreal Editor startup.

  2. In Content Browser, click Import, and then select the FBX file. Click OK to import. Once imported, you can see Animset, SkeletonMesh, and others.


    Figure 8:Unreal Editor - Content Browser.

  3. To check your imported FBX, right-click Animset and then select Edit Using AnimSet Viewer.


    Figure 9:Unreal editor - AnimSet Viewer.

  4. You can adjust the scale and position of the face using the mouse buttons (left: rotation, middle: position, right: zoom). You can see the bone names on the left side and skeletons on the right side. If you play the animation, the time frame and delta of position and rotation are also visible.


    Figure 10:AnimSet Viewer.


    Figure 11:AnimSet Viewer - Adjust scale.

  5. Select the bone you want (these images use BN_Ear_R 53) and the X-Y-Z coordinate system. To move, drag each X-, Y-, or Z-axis arrow.


    Figure 12:AnimSet Viewer - Check Bone

  6. To test rotation with Euler (pitch-yaw-roll), press the space bar. Changing the coordinate system displays the Euler coordinate system on the right ear. You can adjust the rotation as you drag each P-Y-R circle.


    Figure 13:Change the coordinate system.

Map and Level Creation in UE3 Editor

You can skip this section if you plan to use an existing map file or level. The steps in this section create a simple cube level with a light, a player start, and an actor (the face bone mesh).

  1. Run the UE3 Editor using the following commands:

    Run CMD and go to your UE3 folder (in my case: C:\UE3)
    Go to \Binaries\Win32\
    Run examplegame.exe editor -NoGADWarning

  2. Use one of the template levels, or make yourself a super basic level. Right-click the BSP Cube button. In the pop-up, enter 1024 for X, Y, Z, and enable “Hollow?” Then on the left toolbar, click Add.


    Figure 14: Unreal Editor - make layer.

  3. Fly into the cube using the WASD/arrow keys and the mouse, or alternatively drag around while holding the left/right/both mouse buttons to move the camera.


    Figure 15: Unreal Editor – start location.

  4. To add the game start location, right-click the floor, and then select Add Actor, then select Add Playerstart.

  5. To add light, right-click the wall, and then select Add Actor, then select Add Light(Point).


    Figure 16: Unreal Editor - Add light.

  6. To add an actor (the face bone), press the Tab key to move to the Content Browser and drag the skeletal mesh into the UE Editor.


    Figure 17: Unreal Editor - Add Actor.

  7. To adjust scaling, enter a scaling number on the bottom right.

  8. To adjust position, select the Translation mode icon on the upper-left side; move your character with X-Y-Z.

  9. To adjust rotation, select the Rotation mode icon on the upper-left side; move your character with P-Y-R.


    Figure 18: Unreal Editor - Adjust rotation.

  10. Save the level with a name. In this instance, I used “test.umap” in <UE source>\ExampleGame\Content\Maps


    Figure 19: Unreal Editor – save.

  11. Finally, build it all. From the menu, select Build, and then select Build All.

  12. To check your map, click Play or press Alt+F8.


    Figure 20: Unreal Editor – Build.

  13. Save and exit the UE Editor.

About the Authors

Chunghyun Kim is an application engineer in the Intel Software and Services Group. He focuses on game and graphic optimization on Intel® architecture.

Peter Hong is an application engineer at the Intel Software and Services Group. He focuses on enabling the Intel RealSense SDK for face, hand tracking, 3D scanning, and more.

Part 2

Intel® Parallel Studio XE 2017 Beta


Contents

How to enroll in the Beta program

Complete the pre-beta survey at the registration link.

  • Information collected from the pre-beta survey will be used to evaluate beta testing coverage. Here is a link to the Intel Privacy Policy.
  • Keep the beta product serial number provided for future reference
  • After registration, you will be taken to the Intel Registration Center to download the product
  • After registration, you will be able to download all available beta products at any time by returning to the Intel Registration Center

Note: At the end of the beta program you should uninstall the beta product software.

What's New in the 2017 Beta

A detailed description of the new features in the 2017 Beta products is available in the Intel® Parallel Studio XE 2017 Beta What's New document. You can also view the Release Notes for the suite or individual components. Download and try sample programs for the Beta products for the OS X* platform. Linux* and Windows* versions of the samples will be available shortly as well.

Frequently Asked Questions

A complete list of FAQs regarding the 2017 Beta can be found in the Intel® Parallel Studio XE 2017 Beta Program: Frequently Asked Questions document.

Beta duration and schedule

The beta program officially ends June 28th, 2016. The beta license provided will expire October 7th, 2016. Starting June 6th, 2016, you will be asked to complete a survey regarding your experience with the beta software.

During the Beta feedback period, we will provide one periodic update. Here is a rough schedule of those milestones:

  • May 25th: Intel® Parallel Studio XE 2017 Beta Update 1
  • June 6th: Beta completion surveys are made available early
  • June 28th: Beta closes and post-beta surveys are sent

Note that once you register for the Beta, you will be notified automatically when updates are made available.

Support

Technical support will be provided via Intel® Premier Customer Support. The Intel® Registration Center will be used to provide updates to the component products during this beta period.

Beta Webinars

Want to know more about the 2017 Beta features in the Intel® Parallel Studio XE? Register for the webinars and view the webinar archives here at the Intel® Software Tools Technical Webinar Series page.

Beta Release Notes

Check out the Release Notes for the Intel® Parallel Studio XE 2017 Beta and the various constituent components.

Known Issues

This section contains information on known issues (plus associated fixes) of the 2017 Beta versions of the Intel® Parallel Studio XE tools. Check back often for updates.

Compiler Fixes page

Visit the Compiler Fixes page for a list of defect fixes and feature requests that have been incorporated into Intel® C++ and Fortran Compilers 17.0 Beta component in Intel® Parallel Studio XE 2017 Beta. Defects and feature requests described within represent specific issues with specific test cases.

Next Steps

Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2016, Intel Corporation. All rights reserved.

How Intel® Media SDK Screen Capture Plug-in Plays with Video-Streaming Cloud Gaming


Introduction

Nowadays, cloud gaming is becoming more and more popular. The benefits are obvious: instead of downloading and installing large files, you can just connect to the server from a variety of devices.

How Video-Streaming Cloud Games Play

Let's take a look at how video-streaming cloud games work. First, a user runs a game on their device. The application then sends a signal to the service, which chooses the best available server in the cloud and connects the user to it. The game itself is stored and played on the server. To let the user see the video stream, the server application captures the image, encodes it, and sends it over the Internet to the client app, which decodes the received pictures. Next, the user performs actions like pressing buttons on a keyboard or game controller, and this input is sent back to the server. This cycle continues as long as the user plays the game.

 

As we can see, screen capturing is one of the essential parts of cloud gaming. Screen capture speed is extremely important, because we want to play in real time without freezes, interruptions, or added latency. Along with cloud gaming, the screen capture feature is also useful for remote desktop and other content-capturing scenarios.

Optimize Cloud Game-Play with Intel® Media SDK Screen Capture Plug-in

Intel has a solution, aimed especially at cloud gaming developers, to help them manage these challenges quickly: an optimized Screen Capture plug-in that is part of the Intel® Media SDK and Intel® Media Server Studio. This plug-in is a development library that enables the media acceleration capabilities of Intel® platforms for Windows* desktop front-frame-buffer capturing. The Intel Media SDK Screen Capture package includes a hardware-accelerated plug-in library that exposes graphics acceleration capabilities and is implemented as an Intel Media SDK decode plug-in. The plug-in can only be loaded into and used with the Intel Media SDK hardware library, and it combines Intel-accelerated encoding, decoding, and screen capturing to make your game server work faster.

The screen capture procedure can use the decode plug-in together with other SDK components. Captured display frames are available in NV12 or RGB4 format. Moreover, the following screen capturing features are supported:

  • Display Selection - The ability to choose which display to capture on systems with virtual displays enabled. The display selection feature is available only on systems with virtual displays (no physical display connected) and with the RGB4 output FourCC format.
  • Dirty Rectangles Detection - The ability to detect only the changed regions of the captured display image.

Set-up Steps for the Intel Media SDK Screen Capture Plug-in

  1. Download and install Intel® Media SDK or Intel® Media Server Studio.
  2. The screen capture plug-in is available at the following path: <Installed_Directory>/Intel_Media_SDK_XX/bin/x64(win32)/22d62c07e672408fbb4cc20ed7a053e4.
  3. Download and install the latest code samples package.

Launch Screen Capture Plug-in

The following is an example command line to run the screen capture plug-in with sample_decode:

sample_decode.exe capture -w [Width] -h [Height] -d3d11 -p 22d62c07e672408fbb4cc20ed7a053e4 -o [output] -n [number of frames] -hw

(If the plug-in is installed in a different directory, provide the complete path to the plug-in or copy the plug-in into the same folder before running sample_decode.)

On hybrid graphics (Intel graphics plus discrete graphics), screen capturing also supports a software implementation (replace the -hw parameter with -sw in the command line above). Note that the software implementation is not optimized, so expect a performance drop compared to what you can achieve with the hardware implementation.
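
If you integrate the capture path into your own application rather than using sample_decode, the plug-in must be loaded into a hardware Media SDK session before decode initialization. A rough sketch, assuming the standard Media SDK session and plug-in loading calls (the UID below is the plug-in ID quoted above written as raw bytes; consult the sample_decode sources for the authoritative sequence):

#include <mfxvideo.h>
#include <mfxplugin.h>

// Screen-capture plug-in UID 22d62c07e672408fbb4cc20ed7a053e4 as raw bytes.
static const mfxPluginUID kScreenCaptureUID = {
    { 0x22, 0xd6, 0x2c, 0x07, 0xe6, 0x72, 0x40, 0x8f,
      0xbb, 0x4c, 0xc2, 0x0e, 0xd7, 0xa0, 0x53, 0xe4 } };

// Initialize a hardware session and load the screen-capture plug-in into it.
mfxStatus LoadScreenCapturePlugin(mfxSession* session)
{
    mfxVersion ver = { { 0, 1 } };  // request API version 1.0 or later
    mfxStatus sts = MFXInit(MFX_IMPL_HARDWARE_ANY, &ver, session);
    if (sts != MFX_ERR_NONE)
        return sts;
    return MFXVideoUSER_Load(*session, &kScreenCaptureUID, 1);
}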

For further details and ease of use of the screen capture feature, refer to the tutorial_screen_capture package attached below.

Limitations and Hardware Requirements

Refer to the Screen Capture Manual and Intel® Media SDK Release Notes.

Questions, Comments or Feedback? Connect with other developers and Intel media technical experts at the Intel Developer Zone Media Support Forum.

An overview of the 6th generation Intel® Core™ processor (code-named Skylake)


Introduction

The 6th generation Intel® Core™ processor (code-named Skylake) was launched in 2015. Based on improvements in the core, system-on-a-chip, and platform levels and new capabilities over the previous-generation 14nm processor (code-named Broadwell), Skylake is the processor-of-choice for productivity, creativity, and gaming applications across various form factors. This article provides an overview of the key capabilities and improvements in Skylake, along with exciting new usages like wake on voice and biometric login using Windows* 10.

Skylake architecture

The 6th generation Intel Core microarchitecture is built on 14nm technology that takes into consideration reduced processor and platform size for use in multiple form factors, Intel® architecture and graphics performance improvements, power reduction, and enhanced security features. Figure 1 illustrates these new capabilities and improvements. Actual configuration in OEM devices may vary.

Figure 1: Skylake architecture and improvement summary [1].

Core processor vectors

Performance

Performance improvement is a direct result of providing more instructions to the execution unit—more instructions executed per clock. This was accomplished through four categories of improvements [Ibid]:

  • Improved front-end. Smarter branch prediction with higher capacity, wider instruction decoding, and faster, more efficient prefetch.
  • Enhanced instruction parallelism. With more instructions per clock, the parallelism of instruction execution is improved through deeper out-of-order buffers.
  • Improved execution units (EUs). The EUs are enhanced compared to previous generations through:
    • Shortened latencies
    • An increased number of EUs
    • Improved power efficiency by turning off units not in use
    • Faster execution of security algorithms
  • Improved memory subsystem. With improvements to the front-end, instruction parallelism, and EUs, the memory subsystem is also improved to scale to the bandwidth and performance requirements of the above. This has been accomplished through:
    • Higher load/store bandwidth
    • Prefetcher improvements
    • Deeper storage
    • Fill and write-back buffers
    • Improved page miss handling
    • Improvements to L2 cache miss bandwidth
    • New instructions for cache management

Figure 2: Skylake core uArchitecture at a glance.

Figure 3 shows the resulting increase in parallelism in Skylake compared to previous generations of processors (Sandy Bridge is the 2nd generation and Haswell the 4th generation of Intel® Core™ processors).

Figure 3: Increased parallelism over past generations of processors.

The improvements shown in Figure 3 and more resulted in up to a 60-percent increase in performance compared to a five-year-old PC, with up to 6 times faster video transcoding and up to 11 times the graphics performance.

Figure 4: Performance of 6th generation Intel® Core™ processor compared to a five-year-old PC.

  1. Source: Intel Corporation. Based on estimated SYSmark* 2014 scores comparing Intel® Core™ i5-6500 and Intel® Core™ i5-650 processors.
  2. Source: Intel Corporation. Based on estimated Handbrake w/ QSV scores comparing Intel® Core™ i5-6500 and Intel® Core™ i5-650 processors.
  3. Source: Intel Corporation. Based on estimated 3DMark* “Cloud Gate” scores comparing Intel® Core™ i5-6500 and Intel® Core™ i5-650 processors.

Detailed desktop and laptop performance benchmarks can be found at the following links:

Desktop performance benchmark: http://www.intel.com/content/www/us/en/benchmarks/desktop/6th-gen-core-i5-6500.html

Laptop performance benchmark: http://www.intel.com/content/www/us/en/benchmarks/laptop/6th-gen-core-i5-6200u.html

Energy efficiency

Resource configuration based on dynamic consumption:

Legacy systems use the Intel® SpeedStep® technology for balancing performance with energy efficiency through a demand-based algorithm controlled by the OS. While this works well for steady workloads, it is not optimal for bursty workloads. In Skylake, Intel® Speed Shift Technology shifts control from the OS to the hardware and allows the processor to go to a maximum clock speed in ~1 ms, providing for finer-grained power management [3].

Figure 5: Comparison of Intel® Speed Shift Technology with Intel® SpeedStep® technology.

On an Intel® Core™ i5-6200U processor, the results below show the responsiveness of Intel Speed Shift Technology compared to Intel SpeedStep technology:

  • Up to 45-percent improved responsiveness
  • Photo enhancement up to 45 percent
  • Sales graphs up to 31 percent
  • Local notes up to 22 percent
  • Overall responsiveness up to 20 percent

[Measured by WebXPRT* 2015, a benchmark from Principled Technologies* that measures the performance of web applications using overall and subtests for photo enhancements, local notes, and sales graphs. Find out more at www.principledtechnologies.com.]

Additional power optimization is also achieved by configuring resources based on dynamic consumption, be it through downscaling of resources that are underutilized or power gating of Intel® Advanced Vector Extensions 2 when not in use, as well as through idle power reduction.

Media and graphics

Intel® HD Graphics capabilities have come a long way in terms of 3D graphics, media and display capabilities, performance, power envelopes and configurability/scalability since processor graphics (the core processor and graphics on the same die) was first introduced in the 2nd generation Intel® Core™ processors. Figure 6 compares some of these improvements that provide a >100X improvement in graphics performance [2].

Figure 6: Generational features in processor graphics (peak shader FLOPS @ 1 GHz).

Figure 7: Generational improvement in graphics and media.

Gen9 uArchitecture

The Generation 9 (Gen9) graphics architecture is similar to the Gen8 microarchitecture in the 5th generation Intel® Core™ processor (code-named Broadwell) but has been enhanced for performance and scalability. Figure 8 shows a block diagram of the Gen9 uArchitecture [8], which has three main components.

  • Display. On the far left side.
  • Unslice. The L-shaped piece in the center; handles the command streamer, global thread dispatcher, and the Graphics Technology Interface (GTI).
  • Slice. Comprises the EUs.

Compared to Gen8, the Gen9 uArchitecture enables maximum performance per watt, throughput improvements, and a separate power/clock domain for the unslice component. This capability makes it more intelligent in terms of power management for uses like media playback. The slice component is configurable. For example, while GT3 can support up to 2 slices (each slice with 24 EUs), GT4 (Halo) can scale up to 3 slice units (GTx stands for the number of EUs based on use: GT1 supports 12, GT2 supports 24, GT3 supports 48, and GT4 supports 72). This architecture is configurable enough to allow scaling down the number of EUs for low-power scenarios, thus allowing for usages that range from less than 4 W to more than 65 W. API support in Gen9 is available for DirectX* 12, OpenCL™ 2.x, OpenGL* 5.x, and Vulkan*.

Figure 8: Gen9 processor graphics architecture.

You can read more about these components in detail in The Compute Architecture of Intel Processor Graphics Gen9: https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Processor-Graphics-Gen9-v1d0.pdf

Some of the capabilities and improvements for media include the following [2]:

  • < 1 W consumption and 1 W videoconferencing
  • Camera RAW acceleration with new VQE functions to enable 4K60 RAW video on mobile platforms
  • New Intel® Quick Sync Video Fixed-Function (FF) Mode
  • Rich codec support with fixed function and GPU accelerated decode

Figure 9 gives a snapshot of Gen9 codecs.

Note: Media codec and processing support may not be available on all operating systems and applications.

Figure 9: Codec support in Skylake.

Some of the capabilities and improvements on the display include the following:

  • Panel Blend, Scale, Rotate, Compress
  • High PPI support (4K+)
  • Wireless support up to 4K30
  • Self Refresh (PSR2)
  • CUI X.X – New capabilities, improved performance

For gaming enthusiasts, the Intel® Core™ i7-6700K processor comes with all these rich features and improvements (see Figure 10). It also includes Intel® Turbo Boost Technology 2.0, Intel® Hyper-Threading Technology, and overclocking support. Performance is up to 80 percent better compared to a five-year-old PC. Additional information can be obtained here: http://www.intel.com/content/www/us/en/processors/core/core-i7ee-processor.html

  1. Source: Intel Corporation. Based on estimated SPECint*_rate_base2006 (8 copy rate) scores comparing Intel® Core™ i7-6700K and Intel® Core™ i7-875K processors.
  2. Source: Intel Corporation. Based on estimated SPECint*_rate_base2006 (8 copy rate) scores comparing Intel® Core™ i7-6700K and Intel® Core™ i7-3770K processors.
  3. Features are present with select chipsets and processor combinations. Warning: Altering clock frequency and/or voltage may (i) reduce system stability and useful life of the system and processor; (ii) cause the processor and other system components to fail; (iii) cause reductions in system performance; (iv) cause additional heat or other damage; and (v) affect system data integrity. Intel has not tested, and does not warranty, the operation of the processor beyond its specification.

Figure 10: Features in the Intel® Core™ i7-6700K processor.

Scalability

Skylake microarchitecture provides for a configurable core—a single design with two derivatives, one for the client space and another for servers—without compromising the power and performance requirements of each segment. Figure 11 shows the various SKUs and their power efficiencies for use in form factors that range from a compute stick on the low end to Intel® Xeon® processor-based workstations on the high end.

Figure 11: Intel® Core™ processor availability across various form factors.

Enhanced security features

Intel® Software Guard Extensions (Intel® SGX): Intel SGX is a set of new instructions provided in Skylake that allows application developers to protect sensitive data from unauthorized modification and access from rogue software running at higher privilege levels. This allows applications to preserve the confidentiality and integrity of sensitive information [1],[3]. Skylake provides instructions and flows to create secure enclaves and enables usage of trusted memory regions. More information about Intel SGX can be obtained here: https://software.intel.com/en-us/blogs/2013/09/26/protecting-application-secrets-with-intel-sgx
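
To give a feel for the programming model, the untrusted side of an application creates an enclave from a signed enclave image and then calls into it through generated ECALL proxies. A minimal hedged sketch using the Intel SGX SDK's untrusted runtime (the enclave file name is a placeholder, and the ECALLs themselves come from your EDL file):

#include <sgx_urts.h>
#include <cstdio>

int main()
{
    sgx_enclave_id_t eid = 0;
    sgx_launch_token_t token = { 0 };
    int token_updated = 0;

    // "enclave.signed.so" is a placeholder for your signed enclave image
    // (a .dll on Windows); the second argument enables a debug enclave.
    sgx_status_t status = sgx_create_enclave("enclave.signed.so", 1,
                                             &token, &token_updated, &eid, NULL);
    if (status != SGX_SUCCESS) {
        std::printf("Failed to create enclave: 0x%x\n", static_cast<unsigned>(status));
        return 1;
    }

    // ... call the ECALL proxies generated from the enclave's EDL file here ...

    sgx_destroy_enclave(eid);
    return 0;
}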

Intel® Memory Protection Extensions (Intel® MPX): Intel MPX is a new set of instructions to enable runtime buffer overflow checks. These instructions allow both stack and heap buffer boundary testing before memory access to ensure that the calling process only accesses memory that is allocated to it. Intel MPX support is enabled in Windows* 10 with support for Intel MPX intrinsics in Microsoft Visual Studio* 2015. Most C/C++ applications will be able to use Intel MPX by recompiling their applications without source code changes and interoperating with legacy libraries. Running Intel MPX-enabled code on legacy systems without Intel MPX support (5th generation Intel® Core™ processors and earlier) provides no benefit but also has no negative impact. It is also possible to enable/disable Intel MPX support dynamically [1], [3].

We've covered the architectural improvements and advancements in Skylake. In the next section, we'll look at some of the Windows 10 features that are optimized to take advantage of the Intel® Core™ processor architecture.

New experiences with Windows 10

The capabilities of the 6th generation Intel Core processor are accentuated by the capabilities within Windows 10, creating an optimal experience. Below are some of the key Intel hardware capabilities and Windows 10 capabilities that make Intel® platforms running Windows 10 more energy efficient, secure, responsive, and scalable [3].

Ϯ Active Intel and Microsoft collaboration is under way for future Windows support.

Figure 12: Skylake and Windows* 10 capabilities.

Cortana

The Microsoft Cortana* voice assistant, available in the Windows* 10 RTM, allows a hands-free experience using the “Hey Cortana” keyword spotter. While the wake-on-voice capability uses the CPU audio processing pipeline for a high Correct Accept rate and a low False Accept rate, it can also be offloaded to a hardware audio DSP, which has built-in support in Windows 10 [3].

Windows Hello*

Using biometric hardware and Microsoft Passport*, Windows Hello supports logins using the face, fingerprint, or iris for a password-free, out-of-the-box login experience. The user-facing Intel® RealSense™ camera (F200/SR300) supports biometric authentication using facial login [3].

Figure 13: Windows* Hello with Intel® RealSense™ Technology.

The photos in Figure 13 show how the facial landmarks provided by the F200 camera are used for the enrollment and login scenarios. The 78 landmark points on the face are used to create a facial template the first time a user enrolls in face recognition. The next time the user tries to log in, the landmarks from the camera are verified against the template to obtain a match. Together with the Microsoft Passport security features and the camera features, the login capability achieves a False Acceptance Rate of 1 in 100,000 with a False Rejection Rate of 2 to 4 percent.

References

  1. Intel’s next generation microarchitecture code-named Skylake by Julius Mandelblat: http://intelstudios.edgesuite.net/idf/2015/sf/ti/150818_spcs001/index.html
  2. Next-generation Intel® processor graphics architecture, code-named Skylake, by David Blythe: http://intelstudios.edgesuite.net/idf/2015/sf/ti/150818_spcs003/index.html
  3. Intel® architecture code-named Skylake and Windows* 10 better together, by Shiv Koushik: http://intelstudios.edgesuite.net/idf/2015/sf/ti/150819_spcs009/index.html
  4. Skylake for gamers: http://www.intel.com/content/www/us/en/processors/core/core-i7ee-processor.html
  5. Intel’s best processor ever: http://www.intel.com/content/www/us/en/processors/core/core-processor-family.html
  6. Skylake Desktop Performance Benchmark: http://www.intel.com/content/www/us/en/benchmarks/desktop/6th-gen-core-i5-6500.html
  7. Skylake Laptop Performance Benchmark: http://www.intel.com/content/www/us/en/benchmarks/laptop/6th-gen-core-i5-6200u.html
  8. The compute architecture of Intel® processor graphics Gen9: https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Processor-Graphics-Gen9-v1d0.pdf

Intel® IPP Functions Optimized for Intel® Advanced Vector Extensions 2 (Intel® AVX2)


Here is a list of Intel® Integrated Performance Primitives (Intel® IPP) functions that are optimized for Intel® Advanced Vector Extensions 2 (Intel® AVX2) on the Intel® microarchitectures code-named Haswell and Skylake. These functions include Convert, CrossCorr, Max/Min, PolarToCart, Sort, and other arithmetic functions. The functions listed here are all hand-tuned for Intel® architecture; Intel IPP functions that are not listed here still benefit from optimizations by the Intel® Compiler.
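
As a usage note, an application does not call any special entry points to get the AVX2 code paths: the library dispatches to the best implementation for the running CPU. The minimal sketch below (assuming the Intel IPP headers and libraries are installed and linked) calls one of the hand-tuned functions from the list, ippsSortAscend_32f_I:

#include <ipp.h>
#include <stdio.h>

int main() {
  // Let the library select the optimized code path (for example, AVX2 on
  // processors code-named Haswell or Skylake) for the CPU it is running on.
  ippInit();

  // ippsSortAscend_32f_I sorts the buffer in place; it is one of the
  // hand-tuned functions listed below.
  Ipp32f data[8] = { 3.0f, 1.0f, 7.0f, 2.0f, 5.0f, 8.0f, 4.0f, 6.0f };
  IppStatus status = ippsSortAscend_32f_I( data, 8 );

  if( status != ippStsNoErr ) {
    printf( "Intel IPP call failed: %s\n", ippGetStatusString( status ) );
    return 1;
  }
  printf( "Smallest element: %f\n", data[0] );
  return 0;
}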

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

ippiConvert_16s16u_C1Rs
ippiConvert_16s32f_C1R
ippiConvert_16s32s_C1R
ippiConvert_16s8u_C1R
ippiConvert_16u32f_C1R
ippiConvert_16u32s_C1R
ippiConvert_16u8u_C1R
ippiConvert_16s8s_C1RSfs
ippiConvert_16u16s_C1RSfs
ippiConvert_16u8s_C1RSfs
ippiConvert_32f16s_C1RSfs
ippiConvert_32f16u_C1RSfs
ippiConvert_32f32s_C1RSfs
ippiConvert_32f8s_C1RSfs
ippiConvert_32f8u_C1RSfs
ippiCopy_16u_C1MR
ippiCopy_16u_C3MR
ippiCopy_32s_C1MR
ippiCopy_32s_C3MR
ippiCopy_32s_C4MR
ippiCopy_8u_C1MR
ippiCopy_8u_C1R
ippiCopy_8u_C3MR
ippiCopy_8u_C3P3R
ippiCopy_8u_C4P4R
ippiCopyConstBorder_16s_C3R
ippiCopyConstBorder_16s_C4R
ippiCopyConstBorder_16u_C1R
ippiCopyConstBorder_16u_C3R
ippiCopyConstBorder_16u_C4R
ippiCopyConstBorder_32f_C3R
ippiCopyConstBorder_32f_C4R
ippiCopyConstBorder_32s_C3R
ippiCopyConstBorder_32s_C4R
ippiCopyConstBorder_8u_C3R
ippiCopyConstBorder_8u_C4R
ippiCopyReplicateBorder_16s_C1IR
ippiCopyReplicateBorder_16s_C1R
ippiCopyReplicateBorder_16s_C3IR
ippiCopyReplicateBorder_16s_C3R
ippiCopyReplicateBorder_16s_C4IR
ippiCopyReplicateBorder_16s_C4R
ippiCopyReplicateBorder_16u_C1IR
ippiCopyReplicateBorder_16u_C1R
ippiCopyReplicateBorder_16u_C3IR
ippiCopyReplicateBorder_16u_C3R
ippiCopyReplicateBorder_16u_C4IR
ippiCopyReplicateBorder_16u_C4R
ippiCopyReplicateBorder_32f_C1IR
ippiCopyReplicateBorder_32f_C1R
ippiCopyReplicateBorder_32f_C3IR
ippiCopyReplicateBorder_32f_C3R
ippiCopyReplicateBorder_32f_C4IR
ippiCopyReplicateBorder_32f_C4R
ippiCopyReplicateBorder_32s_C1IR
ippiCopyReplicateBorder_32s_C1R
ippiCopyReplicateBorder_32s_C3IR
ippiCopyReplicateBorder_32s_C3R
ippiCopyReplicateBorder_32s_C4IR
ippiCopyReplicateBorder_32s_C4R
ippiCopyReplicateBorder_8u_C1IR
ippiCopyReplicateBorder_8u_C1R
ippiCopyReplicateBorder_8u_C3IR
ippiCopyReplicateBorder_8u_C3R
ippiCopyReplicateBorder_8u_C4IR
ippiCopyReplicateBorder_8u_C4R
ippiCopyMirrorBorder_16s_C1IR
ippiCopyMirrorBorder_16s_C1R
ippiCopyMirrorBorder_16s_C3IR
ippiCopyMirrorBorder_16s_C3R
ippiCopyMirrorBorder_16s_C4IR
ippiCopyMirrorBorder_16s_C4R
ippiCopyMirrorBorder_16u_C1IR
ippiCopyMirrorBorder_16u_C1R
ippiCopyMirrorBorder_16u_C3IR
ippiCopyMirrorBorder_16u_C3R
ippiCopyMirrorBorder_16u_C4IR
ippiCopyMirrorBorder_16u_C4R
ippiCopyMirrorBorder_32f_C1IR
ippiCopyMirrorBorder_32f_C1R
ippiCopyMirrorBorder_32f_C3IR
ippiCopyMirrorBorder_32f_C3R
ippiCopyMirrorBorder_32f_C4IR
ippiCopyMirrorBorder_32f_C4R
ippiCopyMirrorBorder_32s_C1IR
ippiCopyMirrorBorder_32s_C1R
ippiCopyMirrorBorder_32s_C3IR
ippiCopyMirrorBorder_32s_C3R
ippiCopyMirrorBorder_32s_C4IR
ippiCopyMirrorBorder_32s_C4R
ippiCopyMirrorBorder_8u_C1IR
ippiCopyMirrorBorder_8u_C1R
ippiCopyMirrorBorder_8u_C3IR
ippiCopyMirrorBorder_8u_C3R
ippiCopyMirrorBorder_8u_C4IR
ippiCopyMirrorBorder_8u_C4R
ippiCrossCorrNorm_32f_C1R
ippiCrossCorrNorm_16u32f_C1R
ippiCrossCorrNorm_8u32f_C1R
ippiCrossCorrNorm_8u_C1RSfs
ippiDilateBorder_32f_C1R
ippiDilateBorder_32f_C3R
ippiDilateBorder_32f_C4R
ippiDilateBorder_8u_C1R
ippiDilateBorder_8u_C3R
ippiDilateBorder_8u_C4R
ippiDistanceTransform_3x3_8u_C1R
ippiDistanceTransform_3x3_8u32f_C1R
ippiErodeBorder_32f_C1R
ippiErodeBorder_32f_C3R
ippiErodeBorder_32f_C4R
ippiErodeBorder_8u_C1R
ippiErodeBorder_8u_C3R
ippiErodeBorder_8u_C4R
ippiFilterBoxBorder_16s_C1R
ippiFilterBoxBorder_16s_C3R
ippiFilterBoxBorder_16s_C4R
ippiFilterBoxBorder_16u_C1R
ippiFilterBoxBorder_16u_C3R
ippiFilterBoxBorder_16u_C4R
ippiFilterBoxBorder_32f_C1R
ippiFilterBoxBorder_32f_C3R
ippiFilterBoxBorder_32f_C4R
ippiFilterBoxBorder_8u_C1R
ippiFilterBoxBorder_8u_C3R
ippiFilterBoxBorder_8u_C4R
ippiFilterLaplacianBorder_32f_C1R
ippiFilterLaplacianBorder_8u16s_C1R
ippiFilterMaxBorder_32f_C1R
ippiFilterMaxBorder_32f_C3R
ippiFilterMaxBorder_32f_C4R
ippiFilterMaxBorder_8u_C1R
ippiFilterMaxBorder_8u_C3R
ippiFilterMaxBorder_8u_C4R
ippiFilterMedianBorder_16s_C1R
ippiFilterMedianBorder_16u_C1R
ippiFilterMedianBorder_32f_C1R
ippiFilterMedianBorder_8u_C1R
ippiFilterMinBorder_32f_C1R
ippiFilterMinBorder_32f_C3R
ippiFilterMinBorder_32f_C4R
ippiFilterMinBorder_8u_C1R
ippiFilterMinBorder_8u_C3R
ippiFilterMinBorder_8u_C4R
ippiFilterScharrHorizMaskBorder_16s_C1R
ippiFilterScharrHorizMaskBorder_32f_C1R
ippiFilterScharrHorizMaskBorder_8u16s_C1R
ippiFilterScharrVertMaskBorder_16s_C1R
ippiFilterScharrVertMaskBorder_8u16s_C1R
ippiGetCentralMoment_64f
ippiGetNormalizedCentralMoment_64f
ippiGetSpatialMoment_64f
ippiHarrisCorner_32f_C1R
ippiHarrisCorner_8u32f_C1R
ippiHistogramEven_8u_C1R
ippiHoughLine_Region_8u32f_C1R
ippiLUTPalette_8u_C3R
ippiLUTPalette_8u_C4R
ippiMax_16s_C1R
ippiMax_16u_C1R
ippiMax_32f_C1R
ippiMax_8u_C1R
ippiMin_16s_C1R
ippiMin_16u_C1R
ippiMin_32f_C1R
ippiMin_8u_C1R
ippiMinEigenVal_32f_C1R
ippiMinEigenVal_8u32f_C1R
ippiMirror_16s_C1IR
ippiMirror_16s_C1R
ippiMirror_16s_C3IR
ippiMirror_16s_C3R
ippiMirror_16s_C4IR
ippiMirror_16s_C4R
ippiMirror_16u_C1IR
ippiMirror_16u_C1R
ippiMirror_16u_C3IR
ippiMirror_16u_C3R
ippiMirror_16u_C4IR
ippiMirror_16u_C4R
ippiMirror_32f_C1IR
ippiMirror_32f_C1R
ippiMirror_32f_C3IR
ippiMirror_32f_C3R
ippiMirror_32f_C4IR
ippiMirror_32f_C4R
ippiMirror_32s_C1IR
ippiMirror_32s_C1R
ippiMirror_32s_C3IR
ippiMirror_32s_C3R
ippiMirror_32s_C4IR
ippiMirror_32s_C4R
ippiMirror_8u_C1IR
ippiMirror_8u_C1R
ippiMirror_8u_C3IR
ippiMirror_8u_C3R
ippiMirror_8u_C4IR
ippiMirror_8u_C4R
ippiMoments64f_16u_C1R
ippiMoments64f_32f_C1R
ippiMoments64f_8u_C1R
ippiMul_16s_C1RSfs
ippiMul_16u_C1RSfs
ippiMul_32f_C1R
ippiMul_8u_C1RSfs
ippiMulC_16s_C1IRSfs
ippiMulC_32f_C1R
ippiSet_16s_C1MR
ippiSet_16s_C3MR
ippiSet_16s_C4MR
ippiSet_16u_C1MR
ippiSet_16u_C3MR
ippiSet_16u_C4MR
ippiSet_32f_C1MR
ippiSet_32f_C3MR
ippiSet_32f_C4MR
ippiSet_32s_C1MR
ippiSet_32s_C3MR
ippiSet_32s_C4MR
ippiSet_8u_C1MR
ippiSet_8u_C3MR
ippiSet_8u_C4MR
ippiSqr_32f_C1R
ippiSqrDistanceNorm_32f_C1R
ippiSqrDistanceNorm_8u32f_C1R
ippiSwapChannels_16u_C4R
ippiSwapChannels_32f_C4R
ippiSwapChannels_8u_C4R
ippiThreshold_GT_16s_C1R
ippiThreshold_GT_32f_C1R
ippiThreshold_GT_8u_C1R
ippiThreshold_GTVal_16s_C1R
ippiThreshold_GTVal_32f_C1R
ippiThreshold_GTVal_8u_C1R
ippiThreshold_LTVal_16s_C1R
ippiThreshold_LTVal_32f_C1R
ippiThreshold_LTVal_8u_C1R
ippiTranspose_16s_C1IR
ippiTranspose_16s_C1R
ippiTranspose_16s_C3IR
ippiTranspose_16s_C3R
ippiTranspose_16s_C4IR
ippiTranspose_16s_C4R
ippiTranspose_16u_C1IR
ippiTranspose_16u_C1R
ippiTranspose_16u_C3IR
ippiTranspose_16u_C3R
ippiTranspose_16u_C4IR
ippiTranspose_16u_C4R
ippiTranspose_32f_C1IR
ippiTranspose_32f_C1R
ippiTranspose_32f_C3IR
ippiTranspose_32f_C3R
ippiTranspose_32f_C4IR
ippiTranspose_32f_C4R
ippiTranspose_32s_C1IR
ippiTranspose_32s_C1R
ippiTranspose_32s_C3IR
ippiTranspose_32s_C3R
ippiTranspose_32s_C4IR
ippiTranspose_32s_C4R
ippiTranspose_8u_C1IR
ippiTranspose_8u_C1R
ippiTranspose_8u_C3IR
ippiTranspose_8u_C3R
ippiTranspose_8u_C4IR
ippiTranspose_8u_C4R
ippsDotProd_32f64f
ippsDotProd_64f
ippsFlip_16u_I
ippsFlip_32f_I
ippsFlip_64f_I
ippsFlip_8u_I
ippsMagnitude_32f
ippsMagnitude_64f
ippsMaxEvery_16u
ippsMaxEvery_32f
ippsMaxEvery_64f
ippsMaxEvery_8u
ippsMinEvery_16u
ippsMinEvery_32f
ippsMinEvery_64f
ippsMinEvery_8u
ippsPolarToCart_32f
ippsPolarToCart_64f
ippsSortAscend_16s_I
ippsSortAscend_16u_I
ippsSortAscend_32f_I
ippsSortAscend_32s_I
ippsSortAscend_64f_I
ippsSortAscend_8u_I
ippsSortDescend_16s_I
ippsSortDescend_16u_I
ippsSortDescend_32f_I
ippsSortDescend_32s_I
ippsSortDescend_64f_I
ippsSortDescend_8u_I
ippsSortIndexAscend_16s_I
ippsSortIndexAscend_16u_I
ippsSortIndexAscend_32f_I
ippsSortIndexAscend_32s_I
ippsSortIndexAscend_64f_I
ippsSortIndexAscend_8u_I
ippsSortIndexDescend_16s_I
ippsSortIndexDescend_16u_I
ippsSortIndexDescend_32f_I
ippsSortIndexDescend_32s_I
ippsSortIndexDescend_64f_I
ippsSortIndexDescend_8u_I
 
ippiAdd_8u_C1RSfs
ippiAdd_16u_C1RSfs
ippiAdd_16s_C1RSfs
ippiAdd_32f_C1R
ippiSub_8u_C1RSfs
ippiSub_16u_C1RSfs
ippiSub_16s_C1RSfs
ippiSub_32f_C1R
ippiMaxEvery_8u_C1R
ippiMaxEvery_16u_C1R
ippiMaxEvery_32f_C1R
ippiMinEvery_8u_C1R
ippiMinEvery_16u_C1R
ippiMinEvery_32f_C1R
ippiAnd_8u_C1R
ippiOr_8u_C1R
ippiXor_8u_C1R
ippiNot_8u_C1R
ippiCompare_8u_C1R
ippiCompare_16u_C1R
ippiCompare_16s_C1R
ippiCompare_32f_C1R
ippiSum_8u_C1R 
ippiSum_8u_C3R 
ippiSum_8u_C4R 
ippiSum_16u_C1R
ippiSum_16u_C3R
ippiSum_16u_C4R
ippiSum_16s_C1R
ippiSum_16s_C3R
ippiSum_16s_C4R
ippiSum_32f_C1R
ippiSum_32f_C3R
ippiSum_32f_C4R
ippiMean_8u_C1R 
ippiMean_8u_C3R 
ippiMean_8u_C4R 
ippiMean_16u_C1R
ippiMean_16u_C3R
ippiMean_16u_C4R
ippiMean_16s_C1R
ippiMean_16s_C3R
ippiMean_16s_C4R
ippiMean_32f_C1R
ippiMean_32f_C3R
ippiMean_32f_C4R
ippiNorm_Inf_8u_C1R
ippiNorm_Inf_8u_C3R 
ippiNorm_Inf_8u_C4R 
ippiNorm_Inf_16u_C1R
ippiNorm_Inf_16u_C3R
ippiNorm_Inf_16u_C4R
ippiNorm_Inf_16s_C1R
ippiNorm_Inf_16s_C3R
ippiNorm_Inf_16s_C4R
ippiNorm_Inf_32f_C1R
ippiNorm_Inf_32f_C3R
ippiNorm_Inf_32f_C4R
ippiNorm_L1_8u_C1R
ippiNorm_L1_8u_C3R 
ippiNorm_L1_8u_C4R 
ippiNorm_L1_16u_C1R
ippiNorm_L1_16u_C3R
ippiNorm_L1_16u_C4R
ippiNorm_L1_16s_C1R
ippiNorm_L1_16s_C3R
ippiNorm_L1_16s_C4R
ippiNorm_L1_32f_C1R
ippiNorm_L1_32f_C3R
ippiNorm_L1_32f_C4R
ippiNorm_L2_8u_C1R
ippiNorm_L2_8u_C3R 
ippiNorm_L2_8u_C4R 
ippiNorm_L2_16u_C1R
ippiNorm_L2_16u_C3R
ippiNorm_L2_16u_C4R
ippiNorm_L2_16s_C1R
ippiNorm_L2_16s_C3R
ippiNorm_L2_16s_C4R
ippiNorm_L2_32f_C1R
ippiNorm_L2_32f_C3R
ippiNorm_L2_32f_C4R
ippiNormRel_Inf_8u_C1R
ippiNormRel_Inf_16u_C1R
ippiNormRel_Inf_16s_C1R
ippiNormRel_Inf_32f_C1R
ippiNormRel_L1_8u_C1R
ippiNormRel_L1_16u_C1R
ippiNormRel_L1_16s_C1R
ippiNormRel_L1_32f_C1R
ippiNormRel_L2_8u_C1R
ippiNormRel_L2_16u_C1R
ippiNormRel_L2_16s_C1R
ippiNormRel_L2_32f_C1R
ippiNormDiff_Inf_8u_C1R
ippiNormDiff_Inf_8u_C3R 
ippiNormDiff_Inf_8u_C4R 
ippiNormDiff_Inf_16u_C1R
ippiNormDiff_Inf_16u_C3R
ippiNormDiff_Inf_16u_C4R
ippiNormDiff_Inf_16s_C1R
ippiNormDiff_Inf_16s_C3R
ippiNormDiff_Inf_16s_C4R
ippiNormDiff_Inf_32f_C1R
ippiNormDiff_Inf_32f_C3R
ippiNormDiff_Inf_32f_C4R
ippiNormDiff_L1_8u_C1R
ippiNormDiff_L1_8u_C3R 
ippiNormDiff_L1_8u_C4R 
ippiNormDiff_L1_16u_C1R
ippiNormDiff_L1_16u_C3R
ippiNormDiff_L1_16u_C4R
ippiNormDiff_L1_16s_C1R
ippiNormDiff_L1_16s_C3R
ippiNormDiff_L1_16s_C4R
ippiNormDiff_L1_32f_C1R
ippiNormDiff_L1_32f_C3R
ippiNormDiff_L1_32f_C4R
ippiNormDiff_L2_8u_C1R
ippiNormDiff_L2_8u_C3R 
ippiNormDiff_L2_8u_C4R 
ippiNormDiff_L2_16u_C1R
ippiNormDiff_L2_16u_C3R
ippiNormDiff_L2_16u_C4R
ippiNormDiff_L2_16s_C1R
ippiNormDiff_L2_16s_C3R
ippiNormDiff_L2_16s_C4R
ippiNormDiff_L2_32f_C1R
ippiNormDiff_L2_32f_C3R
ippiNormDiff_L2_32f_C4R
ippiSwapChannels_8u_C3C4R
ippiSwapChannels_16u_C3C4R
ippiSwapChannels_32f_C3C4R
ippiSwapChannels_8u_C4C3R
ippiSwapChannels_16u_C4C3R
ippiSwapChannels_32f_C4C3R
ippiSwapChannels_8u_C3R
ippiSwapChannels_16u_C3R
ippiSwapChannels_32f_C3R
ippiSwapChannels_8u_AC4R
ippiSwapChannels_16u_AC4R
ippiSwapChannels_32f_AC4R
ippiCopy_8u_AC4C3R
ippiCopy_16u_AC4C3R
ippiCopy_32f_AC4C3R
ippiCopy_8u_P3C3R
ippiCopy_16u_P3C3R
ippiCopy_32f_P3C3R
ippiMulC_32f_C1IR
ippiSet_8u_C1R
ippiSet_16u_C1R
ippiSet_32f_C1R
ippiSet_8u_C3R
ippiSet_16u_C3R
ippiSet_32f_C3R
ippiSet_8u_C4R
ippiSet_16u_C4R
ippiWarpAffineBack_8u_C1R 
ippiWarpAffineBack_8u_C3R 
ippiWarpAffineBack_8u_C4R 
ippiWarpAffineBack_16u_C1R
ippiWarpAffineBack_16u_C3R
ippiWarpAffineBack_16u_C4R
ippiWarpAffineBack_32f_C1R
ippiWarpAffineBack_32f_C3R
ippiWarpAffineBack_32f_C4R
ippiWarpPerspectiveBack_8u_C1R 
ippiWarpPerspectiveBack_8u_C3R 
ippiWarpPerspectiveBack_8u_C4R 
ippiWarpPerspectiveBack_16u_C1R
ippiWarpPerspectiveBack_16u_C3R
ippiWarpPerspectiveBack_16u_C4R
ippiWarpPerspectiveBack_32f_C1R
ippiWarpPerspectiveBack_32f_C3R
ippiWarpPerspectiveBack_32f_C4R
ippiCopySubpixIntersect_8u_C1R
ippiCopySubpixIntersect_8u32f_C1R
ippiCopySubpixIntersect_32f_C1R
ippiSqrIntegral_8u32f64f_C1R
ippiIntegral_8u32f_C1R
ippiSqrIntegral_8u32s64f_C1R
ippiIntegral_8u32s_C1R
ippiHaarClassifierFree_32f
ippiHaarClassifierInitAlloc_32f
ippiHaarClassifierFree_32f
ippiRectStdDev_32f_C1R
ippiApplyHaarClassifier_32f_C1R
ippiAbsDiff_8u_C1R
ippiAbsDiff_16u_C1R
ippiAbsDiff_32f_C1R
ippiMean_8u_C1MR 
ippiMean_16u_C1MR
ippiMean_32f_C1MR
ippiMean_8u_C3CMR 
ippiMean_16u_C3CMR
ippiMean_32f_C3CMR
ippiMean_StdDev_8u_C1MR 
ippiMean_StdDev_16u_C1MR
ippiMean_StdDev_32f_C1MR
ippiMean_StdDev_8u_C3CMR 
ippiMean_StdDev_16u_C3CMR
ippiMean_StdDev_32f_C3CMR
ippiMean_StdDev_8u_C1R 
ippiMean_StdDev_16u_C1R
ippiMean_StdDev_32f_C1R
ippiMean_StdDev_8u_C3CR 
ippiMean_StdDev_16u_C3CR
ippiMean_StdDev_32f_C3CR
ippiMinMaxIndx_8u_C1MR 
ippiMinMaxIndx_16u_C1MR
ippiMinMaxIndx_32f_C1MR
ippiMinMaxIndx_8u_C1R 
ippiMinMaxIndx_16u_C1R
ippiMinMaxIndx_32f_C1R
ippiNorm_Inf_8u_C1MR
ippiNorm_Inf_8s_C1MR 
ippiNorm_Inf_16u_C1MR
ippiNorm_Inf_32f_C1MR
ippiNorm_L1_8u_C1MR
ippiNorm_L1_8s_C1MR 
ippiNorm_L1_16u_C1MR
ippiNorm_L1_32f_C1MR
ippiNorm_L2_8u_C1MR
ippiNorm_L2_8s_C1MR 
ippiNorm_L2_16u_C1MR
ippiNorm_L2_32f_C1MR
ippiNorm_Inf_8u_C3CMR
ippiNorm_Inf_8s_C3CMR 
ippiNorm_Inf_16u_C3CMR
ippiNorm_Inf_32f_C3CMR
ippiNorm_L1_8u_C3CMR
ippiNorm_L1_8s_C3CMR 
ippiNorm_L1_16u_C3CMR
ippiNorm_L1_32f_C3CMR
ippiNorm_L2_8u_C3CMR
ippiNorm_L2_8s_C3CMR 
ippiNorm_L2_16u_C3CMR
ippiNorm_L2_32f_C3CMR
ippiNormRel_Inf_8u_C1MR
ippiNormRel_Inf_8s_C1MR 
ippiNormRel_Inf_16u_C1MR
ippiNormRel_Inf_32f_C1MR
ippiNormRel_L1_8u_C1MR
ippiNormRel_L1_8s_C1MR 
ippiNormRel_L1_16u_C1MR
ippiNormRel_L1_32f_C1MR
ippiNormRel_L2_8u_C1MR
ippiNormRel_L2_8s_C1MR 
ippiNormRel_L2_16u_C1MR
ippiNormRel_L2_32f_C1MR
ippiNormDiff_Inf_8u_C1MR
ippiNormDiff_Inf_8s_C1MR 
ippiNormDiff_Inf_16u_C1MR
ippiNormDiff_Inf_32f_C1MR
ippiNormDiff_L1_8u_C1MR
ippiNormDiff_L1_8s_C1MR 
ippiNormDiff_L1_16u_C1MR
ippiNormDiff_L1_32f_C1MR
ippiNormDiff_L2_8u_C1MR
ippiNormDiff_L2_8s_C1MR 
ippiNormDiff_L2_16u_C1MR
ippiNormDiff_L2_32f_C1MR
ippiNormDiff_Inf_8u_C3CMR
ippiNormDiff_Inf_8s_C3CMR 
ippiNormDiff_Inf_16u_C3CMR
ippiNormDiff_Inf_32f_C3CMR
ippiNormDiff_L1_8u_C3CMR
ippiNormDiff_L1_8s_C3CMR 
ippiNormDiff_L1_16u_C3CMR
ippiNormDiff_L1_32f_C3CMR
ippiNormDiff_L2_8u_C3CMR 
ippiNormDiff_L2_8s_C3CMR 
ippiNormDiff_L2_16u_C3CMR
ippiNormDiff_L2_32f_C3CMR
ippiFilterRowBorderPipelineGetBufferSize_32f_C1R
ippiFilterRowBorderPipelineGetBufferSize_32f_C3R
ippiFilterRowBorderPipeline_32f_C1R
ippiFilterRowBorderPipeline_32f_C3R
ippiDistanceTransform_5x5_8u32f_C1R
ippiTrueDistanceTransform_8u32f_C1R
ippiTrueDistanceTransformGetBufferSize_8u32f_C1R
ippiFilterScharrVertGetBufferSize_32f_C1R
 ippiFilterScharrVertMaskBorderGetBufferSize
ippiFilterScharrVertBorder_32f_C1R
 ippiFilterScharrVertMaskBorder_32f_C1R
ippiFilterScharrHorizGetBufferSize_32f_C1R
 ippiFilterScharrHorizMaskBorderGetBufferSize
ippiFilterScharrHorizBorder_32f_C1R
ippiFilterSobelNegVertGetBufferSize_8u16s_C1R
ippiFilterSobelNegVertBorder_8u16s_C1R
ippiFilterSobelHorizBorder_8u16s_C1R
ippiFilterSobelVertSecondGetBufferSize_8u16s_C1R
ippiFilterSobelVertSecondBorder_8u16s_C1R
ippiFilterSobelHorizSecondGetBufferSize_8u16s_C1R
ippiFilterSobelHorizSecondBorder_8u16s_C1R
ippiFilterSobelNegVertGetBufferSize_32f_C1R
ippiFilterSobelNegVertBorder_32f_C1R
ippiFilterSobelHorizGetBufferSize_32f_C1R
ippiFilterSobelHorizBorder_32f_C1R
ippiFilterSobelVertSecondGetBufferSize_32f_C1R
ippiFilterSobelVertSecondBorder_32f_C1R
ippiFilterSobelHorizSecondGetBufferSize_32f_C1R
ippiFilterSobelHorizSecondBorder_32f_C1R
ippiColorToGray_8u_C3C1R
ippiColorToGray_16u_C3C1R
ippiColorToGray_32f_C3C1R
ippiColorToGray_8u_AC4C1R
ippiColorToGray_16u_AC4C1R
ippiColorToGray_32f_AC4C1R
ippiRGBToGray_8u_C3C1R
ippiRGBToGray_16u_C3C1R
ippiRGBToGray_32f_C3C1R
ippiRGBToGray_8u_AC4C1R
ippiRGBToGray_16u_AC4C1R
ippiRGBToGray_32f_AC4C1R
ippiRGBToXYZ_8u_C3R
ippiRGBToXYZ_16u_C3R
ippiRGBToXYZ_32f_C3R
ippiXYZToRGB_8u_C3R
ippiXYZToRGB_16u_C3R
ippiXYZToRGB_32f_C3R
ippiRGBToHSV_8u_C3R
ippiRGBToHSV_16u_C3R
ippiHSVToRGB_8u_C3R
ippiHSVToRGB_16u_C3R
ippiRGBToHLS_8u_C3R
ippiRGBToHLS_16u_C3R
ippiRGBToHLS_32f_C3R
ippiHLSToRGB_8u_C3R
ippiHLSToRGB_16u_C3R
ippiHLSToRGB_32f_C3R
 ippiDotProd_8u64f_C1R
 ippiDotProd_16u64f_C1R
 ippiDotProd_16s64f_C1R
 ippiDotProd_32u64f_C1R
 ippiDotProd_32s64f_C1R
 ippiDotProd_32f64f_C1R
 ippiDotProd_8u64f_C3R
 ippiDotProd_16u64f_C3R
 ippiDotProd_16s64f_C3R
 ippiDotProd_32u64f_C3R
 ippiDotProd_32s64f_C3R
 ippiDotProd_32f64f_C3R
 ippiDotProd_8u64f_C4R
 ippiDotProd_16u64f_C4R
 ippiDotProd_16s64f_C4R
 ippiDotProd_32u64f_C4R
 ippiDotProd_32s64f_C4R
 ippiDotProd_32f64f_C4R

API without Secrets: Introduction to Vulkan* Part 1: The Beginning


Download [PDF 736 KB]

Link to Github Sample Code


Go to: API without Secrets: Introduction to Vulkan* Part 0: Preface



Tutorial 1: Vulkan* – The Beginning

We start with a simple application that unfortunately doesn’t display anything. I won’t present the full source code (with windowing, rendering loop, and so on) here in the text as the tutorial would be too long. The entire sample project with full source code is available in a provided example that can be found at https://github.com/GameTechDev/IntroductionToVulkan. Here I show only the parts of the code that are relevant to Vulkan itself. There are several ways to use the Vulkan API in our application:

  1. We can dynamically load the driver’s library that provides Vulkan API implementation and acquire function pointers by ourselves from it.
  2. We can use the Vulkan SDK and link with the provided Vulkan Runtime (Vulkan Loader) static library.
  3. We can use the Vulkan SDK, dynamically load Vulkan Loader library at runtime, and load function pointers from it.

The first approach is not recommended. Hardware vendors can modify their drivers in any way, which may affect compatibility with a given application. It may even break the application and require developers of a Vulkan-enabled application to rewrite some parts of the code. That’s why it’s better to use some level of abstraction.

The recommended solution is to use the Vulkan Loader from the Vulkan SDK. It provides more configuration abilities and more flexibility without the need to modify Vulkan application source code. One example of this flexibility is layers. The Vulkan API requires developers to create applications that strictly follow API usage rules. In case of any errors, the driver gives us little feedback; only some severe and important errors are reported (for example, out of memory). This approach is used so the API itself can be as small (thin) and as fast as possible. But if we want to obtain more information about what we are doing wrong, we have to enable debug/validation layers. There are different layers for different purposes, such as memory usage, proper parameter passing, object lifetime checking, and so on. These layers all slow down the application’s performance but provide us with much more information.

We also need to choose whether we want to statically link with the Vulkan Loader or whether we will load it dynamically and acquire function pointers by ourselves at runtime. This choice is just a matter of personal preference. This paper focuses on the third way of using Vulkan: dynamically loading function pointers from the Vulkan Runtime library. This approach is similar to what we had to do when we wanted to use OpenGL* on a Windows* system, where only some basic functions were provided by the default implementation. The remaining functions had to be loaded dynamically using wglGetProcAddress() or the standard Windows GetProcAddress() function. This is what wrangler libraries such as GLEW or GL3W were created for.

Loading Vulkan Runtime Library and Acquiring Pointer to an Exported Function

In this tutorial we go through the process of acquiring Vulkan functions pointers by ourselves. We load them from the Vulkan Runtime library (Vulkan Loader) which should be installed along with the graphics driver that supports Vulkan. The dynamic library for Vulkan (Vulkan Loader) is named vulkan-1.dll on Windows* and libvulkan.so on Linux*.

From now on, I refer to the first tutorial’s source code, focusing on the Tutorial01.cpp file. So in the initialization code of our application we have to load the Vulkan library with something like this:

#if defined(VK_USE_PLATFORM_WIN32_KHR)
VulkanLibrary = LoadLibrary( "vulkan-1.dll" );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
VulkanLibrary = dlopen( "libvulkan.so", RTLD_NOW );
#endif

if( VulkanLibrary == nullptr ) {
  printf( "Could not load Vulkan library!\n" );
  return false;
}
return true;

1.Tutorial01.cpp, function LoadVulkanLibrary()

VulkanLibrary is a variable of type HMODULE on Windows or just void* on Linux. If the value returned by the library loading function is not 0, we can load all exported functions. The Vulkan library, as well as Vulkan implementations (every driver from every vendor), is required to expose only one function that can be loaded with the standard techniques our OS provides (like the previously mentioned GetProcAddress() on Windows or dlsym() on Linux). Other functions from the Vulkan API may also be available for acquiring this way, but it is not guaranteed (and not even recommended). The only function that must be exported is vkGetInstanceProcAddr().

This function is used to load all other Vulkan functions. To ease our work of obtaining addresses of all Vulkan API functions it is very convenient to place their names inside a macro. This way we won’t have to duplicate function names in multiple places (like definition, declaration, or loading) and can keep them in only one header file. This single file will be used later for different purposes with an #include directive. We can declare our exported function like this:

#if !defined(VK_EXPORTED_FUNCTION)
#define VK_EXPORTED_FUNCTION( fun )
#endif

VK_EXPORTED_FUNCTION( vkGetInstanceProcAddr )

#undef VK_EXPORTED_FUNCTION

2.ListOfFunctions.inl

Now we define the variables that will represent functions from the Vulkan API. This can be done with something like this:

#include "vulkan.h"

#define VK_EXPORTED_FUNCTION( fun ) PFN_##fun fun;
#define VK_GLOBAL_LEVEL_FUNCTION( fun ) PFN_##fun fun;
#define VK_INSTANCE_LEVEL_FUNCTION( fun ) PFN_##fun fun;
#define VK_DEVICE_LEVEL_FUNCTION( fun ) PFN_##fun fun;

#include "ListOfFunctions.inl"

3.VulkanFunctions.cpp

Here we first include the vulkan.h file, which is officially provided for developers that want to use Vulkan API in their applications. This file is similar to the gl.h file in the OpenGL library. It defines all enumerations, structures, types, and function types that are necessary for Vulkan application development. Next we define the macros for functions from each “level” (I will describe these levels soon). The function definition requires providing function type and a function name. Fortunately, function types in Vulkan can be easily derived from function names. For example, the definition of vkGetInstanceProcAddr() function’s type looks like this:

typedef PFN_vkVoidFunction (VKAPI_PTR *PFN_vkGetInstanceProcAddr)(VkInstance instance, const char* pName);

4.Vulkan.h

The definition of a variable that represents this function would then look like this:

PFN_vkGetInstanceProcAddr vkGetInstanceProcAddr;

This is what the macros from VulkanFunctions.cpp file expand to. They take the function name (hidden in a “fun” parameter) and add “PFN_” at the beginning. Then the macro places a space after the type, and adds a function name and a semicolon after that. Functions are “pasted” into the file in the line with the #include “ListOfFunctions.inl” directive.

But we must remember that when we want to define Vulkan function pointers by ourselves, we must define the VK_NO_PROTOTYPES preprocessor macro. By default the vulkan.h header file contains declarations of all functions, which is useful when we are statically linking with the Vulkan Runtime. So when we add our own definitions, there would be a compilation error claiming that the given names (for function pointers) are defined more than once (since we would break the One Definition Rule). We can disable the declarations from the vulkan.h file using the mentioned preprocessor macro.
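
Whether the macro is defined in the project settings or directly in code is a matter of preference; a minimal sketch of the in-code variant (the sample project may define it differently):

// Tell vulkan.h not to declare the regular function prototypes; we supply our
// own function pointer variables instead.
#define VK_NO_PROTOTYPES
#include "vulkan.h"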

Similarly, we need to declare the variables defined in the VulkanFunctions.cpp file so that they are visible in all other parts of our code. This is done in the same way, but the word “extern” is placed before each declaration. Compare with the VulkanFunctions.h file.
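
A sketch of what VulkanFunctions.h looks like under this scheme (the include guard name is illustrative; check the repository for the exact file):

#if !defined(VULKAN_FUNCTIONS_HEADER)
#define VULKAN_FUNCTIONS_HEADER

#include "vulkan.h"

// Declare (but don't define) the same function pointer variables so every
// translation unit that includes this header can see them.
#define VK_EXPORTED_FUNCTION( fun ) extern PFN_##fun fun;
#define VK_GLOBAL_LEVEL_FUNCTION( fun ) extern PFN_##fun fun;
#define VK_INSTANCE_LEVEL_FUNCTION( fun ) extern PFN_##fun fun;
#define VK_DEVICE_LEVEL_FUNCTION( fun ) extern PFN_##fun fun;

#include "ListOfFunctions.inl"

#endif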

Now we have variables in which we can store addresses of functions acquired from the Vulkan library. To load the only one exported function, we can use the following code:

#if defined(VK_USE_PLATFORM_WIN32_KHR)
#define LoadProcAddress GetProcAddress
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
#define LoadProcAddress dlsym
#endif

#define VK_EXPORTED_FUNCTION( fun )                                 \
if( !(fun = (PFN_##fun)LoadProcAddress( VulkanLibrary, #fun )) ) {  \
  printf( "Could not load exported function: " #fun "!\n" );        \
  return false;                                                     \
}

#include "ListOfFunctions.inl"

return true;

5.Tutorial01.cpp, function LoadExportedEntryPoints()

This macro takes the function name from the “fun” parameter, converts it into a string (with #) and obtains its address from VulkanLibrary. The address is acquired using the GetProcAddress() (on Windows) or dlsym() (on Linux) function and is stored in the variable represented by fun. If this operation fails and the function is not exposed from the library, we report this problem by printing the proper information and returning false. The macro operates on lines included from ListOfFunctions.inl. This way we don’t have to write the names of functions multiple times.

Now that we have our main function-loading procedure, we can load the rest of the Vulkan API procedures. These can be divided into three types:

  • Global-level functions. Allow us to create a Vulkan instance.
  • Instance-level functions. Check what Vulkan-capable hardware is available and what Vulkan features are exposed.
  • Device-level functions. Responsible for performing jobs typically done in a 3D application (like drawing).

We will start with acquiring instance creation functions from the global level.

Acquiring Pointers to Global-Level Functions

Before we can create a Vulkan instance we must acquire the addresses of functions that will allow us to do it. Here is a list of these functions:

  • vkCreateInstance
  • vkEnumerateInstanceExtensionProperties
  • vkEnumerateInstanceLayerProperties

The most important function is vkCreateInstance(), which allows us to create a “Vulkan instance.” From the application’s point of view, a Vulkan instance can be thought of as an equivalent of OpenGL’s rendering context. It stores per-application state (there is no global state in Vulkan), like enabled instance-level layers and extensions. The other two functions allow us to check which instance layers and which instance extensions are available. Validation layers are divided into instance and device levels depending on what functionality they debug. Extensions in Vulkan are similar to OpenGL’s extensions: they expose additional functionality that is not required by the core specification, and not all hardware vendors may implement them. Extensions, like layers, are also divided into instance and device levels, and extensions from different levels must be enabled separately. In OpenGL, all extensions are (usually) available in created contexts; in Vulkan we have to enable them before the functionality they expose can be used.

We call the vkGetInstanceProcAddr() function to acquire addresses of instance-level procedures. It takes two parameters: an instance and a function name. We don’t have an instance yet, so we provide “null” for the first parameter. That’s why these functions are sometimes called null-instance or no-instance level functions. The second parameter of the vkGetInstanceProcAddr() function is the name of the procedure whose address we want to acquire. Without an instance we can only load global-level functions; it is not possible to load any other function without an instance handle provided in the first parameter.

The code that loads global-level functions may look like this:

#define VK_GLOBAL_LEVEL_FUNCTION( fun )                             \
if( !(fun = (PFN_##fun)vkGetInstanceProcAddr( nullptr, #fun )) ) {  \
  printf( "Could not load global level function: " #fun "!\n" );    \
  return false;                                                     \
}

#include "ListOfFunctions.inl"

return true;

6.Tutorial01.cpp, function LoadGlobalLevelEntryPoints()

The only difference between this code and the code used for loading the exported function (vkGetInstanceProcAddr() exposed by the library) is that we don’t use function provided by the OS, like GetProcAddress(), but we call vkGetInstanceProcAddr() where the first parameter is set to null.

If you follow this tutorial and write the code yourself, make sure you add global-level functions wrapped in a properly named macro to ListOfFunctions.inl header file:

#if !defined(VK_GLOBAL_LEVEL_FUNCTION)
#define VK_GLOBAL_LEVEL_FUNCTION( fun )
#endif

VK_GLOBAL_LEVEL_FUNCTION( vkCreateInstance )
VK_GLOBAL_LEVEL_FUNCTION( vkEnumerateInstanceExtensionProperties )
VK_GLOBAL_LEVEL_FUNCTION( vkEnumerateInstanceLayerProperties )

#undef VK_GLOBAL_LEVEL_FUNCTION

7.ListOfFunctions.inl

Creating a Vulkan Instance

Now that we have loaded global-level functions, we can create a Vulkan instance. This is done by calling the vkCreateInstance() function, which takes three parameters.

  • The first parameter has information about our application, the requested Vulkan version, and the instance level layers and extensions we want to enable. This all is done with structures (structures are very common in Vulkan).
  • The second parameter provides a pointer to a structure with a list of different functions related to memory allocation. They can be used for debugging purposes, but this feature is optional and we can rely on built-in memory allocation methods.
  • The third parameter is the address of a variable in which we want to store the Vulkan instance handle. In the Vulkan API it is common that results of operations are stored in variables whose addresses we provide; return values are used only for pass/fail notifications. Here is the full source code for instance creation:
VkApplicationInfo application_info = {
  VK_STRUCTURE_TYPE_APPLICATION_INFO,             // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  "API without Secrets: Introduction to Vulkan",  // const char                *pApplicationName
  VK_MAKE_VERSION( 1, 0, 0 ),                     // uint32_t                   applicationVersion
  "Vulkan Tutorial by Intel",                     // const char                *pEngineName
  VK_MAKE_VERSION( 1, 0, 0 ),                     // uint32_t                   engineVersion
  VK_API_VERSION                                  // uint32_t                   apiVersion
};

VkInstanceCreateInfo instance_create_info = {
  VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,         // VkStructureType            sType
  nullptr,                                        // const void*                pNext
  0,                                              // VkInstanceCreateFlags      flags
  &application_info,                              // const VkApplicationInfo   *pApplicationInfo
  0,                                              // uint32_t                   enabledLayerCount
  nullptr,                                        // const char * const        *ppEnabledLayerNames
  0,                                              // uint32_t                   enabledExtensionCount
  nullptr                                         // const char * const        *ppEnabledExtensionNames
};

if( vkCreateInstance( &instance_create_info, nullptr, &Vulkan.Instance ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan instance!\n" );
  return false;
}
return true;

8.Tutorial01.cpp, function CreateInstance()

Most of the Vulkan structures begin with a field describing the type of the structure. Parameters are provided to functions by pointers to avoid copying big memory chunks. Sometimes, inside structures, pointers to other structures are also provided. For the driver to know how many bytes it should read and how members are aligned, the type of the structure is always provided. So what exactly do all these parameters mean?

  • sType – Type of the structure. In this case it informs the driver that we are providing information for instance creation by providing a value of VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO.
  • pNext – Additional information for instance creation may be provided in future versions of Vulkan API and this parameter will be used for that purpose. For now, it is reserved for future use.
  • flags – Another parameter reserved for future use; for now it must be set to 0.
  • pApplicationInfo – An address of another structure with information about our application (like name, version, required Vulkan API version, and so on).
  • enabledLayerCount – Defines the number of instance-level validation layers we want to enable.
  • ppEnabledLayerNames – This is an array of enabledLayerCount elements with the names of the layers we would like to enable.
  • enabledExtensionCount – The number of instance-level extensions we want to enable.
  • ppEnabledExtensionNames – As with layers, this parameter should point to an array of at least enabledExtensionCount elements containing names of instance-level extensions we want to use.

Most of the parameters can be nulls or zeros. The most important one (apart from the structure type information) is the parameter pointing to a variable of type VkApplicationInfo. So before specifying instance creation information, we also have to prepare an additional variable describing our application. This variable contains the name of our application, the name of the engine we are using, and the Vulkan API version we require (which is similar to the OpenGL version; if the driver doesn’t support this version, the instance will not be created). This information may be very useful for the driver. Remember that some graphics card vendors provide drivers that can be specialized for a specific title, such as a specific game. If a graphics card vendor knows what graphics engine a game uses, it can optimize the driver’s behavior so the game performs faster. This application information structure can be used for this purpose. The parameters of the VkApplicationInfo structure include:

  • sType – Type of structure. Here VK_STRUCTURE_TYPE_APPLICATION_INFO, information about the application.
  • pNext – Reserved for future use.
  • pApplicationName – Name of our application.
  • applicationVersion – Version of our application; it is quite convenient to use Vulkan macro for version creation. It packs major, minor, and patch numbers into one 32-bit value.
  • pEngineName – Name of the engine our application uses.
  • engineVersion – Version of the engine we are using in our application.
  • apiVersion – Version of the Vulkan API we want to use. It is best to provide the version defined in the Vulkan header we are including, which is why we use VK_API_VERSION found in the vulkan.h header file.

So now that we have defined these two structures, we can call the vkCreateInstance() function and check whether an instance was created. If successful, the instance handle is stored in the variable we provided the address of, and VK_SUCCESS (which is zero!) is returned.

Acquiring Pointers to Instance-Level Functions

We have created a Vulkan instance. Next we can acquire pointers to functions that allow us to create a logical device, which can be seen as a user view on a physical device. There may be many different devices installed on a computer that support Vulkan. Each of these devices may have different features and capabilities and different performance, or may support different functionalities. When we want to use Vulkan, we must specify which device to perform the operations on. We may use many devices for different purposes (such as one for rendering 3D graphics, one for physics calculations, and one for media decoding). We must check what devices and how many of them are available, what their capabilities are, and what operations they support. This is all done with instance-level functions. We get the addresses of these functions using the vkGetInstanceProcAddr() function used earlier. But this time we will provide handle to a created Vulkan instance.

Loading every Vulkan procedure using the vkGetInstanceProcAddr() function and a Vulkan instance handle comes with some trade-offs. When we use Vulkan for data processing, we must create a logical device and acquire device-level functions, and on the computer that runs our application there may be many devices that support Vulkan; which physical device the work is performed on is determined by the logical device we create. But vkGetInstanceProcAddr() doesn’t recognize a logical device, as there is no parameter for it. When we acquire device-level procedures using this function, we in fact acquire addresses of simple “jump” functions. These functions take the handle of a logical device and jump to the proper implementation (the function implemented for a specific device). The overhead of this jump can be avoided: the recommended behavior is to load procedures for each device separately using another function. But we still have to use the vkGetInstanceProcAddr() function to load the functions that allow us to create such a logical device.
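
For completeness, loading device-level functions (done later, once the logical device exists) presumably mirrors the loaders shown in this tutorial, only using vkGetDeviceProcAddr() and the device handle; a sketch, assuming the handle is stored in Vulkan.Device:

#define VK_DEVICE_LEVEL_FUNCTION( fun )                                 \
if( !(fun = (PFN_##fun)vkGetDeviceProcAddr( Vulkan.Device, #fun )) ) {  \
  printf( "Could not load device level function: " #fun "!\n" );        \
  return false;                                                         \
}

#include "ListOfFunctions.inl"

return true;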

Some of the instance level functions include:

  • vkEnumeratePhysicalDevices
  • vkGetPhysicalDeviceProperties
  • vkGetPhysicalDeviceFeatures
  • vkGetPhysicalDeviceQueueFamilyProperties
  • vkCreateDevice
  • vkGetDeviceProcAddr
  • vkDestroyInstance

These functions are required and are used in this tutorial to create a logical device. But there are other instance-level functions, for example from extensions, so the list in the header file from the example solution’s source code will expand. The source code used to load all these functions is:

#define VK_INSTANCE_LEVEL_FUNCTION( fun )                                   \
if( !(fun = (PFN_##fun)vkGetInstanceProcAddr( Vulkan.Instance, #fun )) ) {  \
  printf( "Could not load instance level function: " #fun "\n" );           \
  return false;                                                             \
}

#include "ListOfFunctions.inl"

return true;

9.Tutorial01.cpp, function LoadInstanceLevelEntryPoints()

The code for loading instance-level functions is almost identical to the code loading global-level functions. We just change the first parameter of the vkGetInstanceProcAddr() function from null to the created Vulkan instance handle. Of course we now operate on instance-level functions, so we redefine the VK_INSTANCE_LEVEL_FUNCTION() macro instead of the VK_GLOBAL_LEVEL_FUNCTION() macro. We also need to define the functions from the instance level. As before, this is best done with a list of macro-wrapped names collected in a shared header, for example:

#if !defined(VK_INSTANCE_LEVEL_FUNCTION)
#define VK_INSTANCE_LEVEL_FUNCTION( fun )
#endif

VK_INSTANCE_LEVEL_FUNCTION( vkDestroyInstance )
VK_INSTANCE_LEVEL_FUNCTION( vkEnumeratePhysicalDevices )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceProperties )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceFeatures )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceQueueFamilyProperties )
VK_INSTANCE_LEVEL_FUNCTION( vkCreateDevice )
VK_INSTANCE_LEVEL_FUNCTION( vkGetDeviceProcAddr )
VK_INSTANCE_LEVEL_FUNCTION( vkEnumerateDeviceExtensionProperties )

#undef VK_INSTANCE_LEVEL_FUNCTION

10.ListOfFunctions.inl

Instance-level functions operate on physical devices. In Vulkan we can see “physical devices” and “logical devices” (simply called devices). As the name suggests, a physical device refers to any physical graphics card (or any other hardware component) that is installed on a computer running a Vulkan-enabled application that is capable of executing Vulkan commands. As mentioned earlier, such a device may expose and implement different (optional) Vulkan features, may have different capabilities (like total memory or ability to work on buffer objects of different sizes), or may provide different extensions. Such hardware may be a dedicated (discrete) graphics card or an additional chip built (integrated) into a main processor. It may even be the CPU itself. Instance-level functions allow us to check all these parameters. After we check them, we must decide (based on our findings and our needs) which physical device we want to use. Maybe we even want to use more than one device, which is also possible, but this scenario is too advanced for now. So if we want to harness the power of any physical device we must create a logical device that represents our choice in the application (along with enabled layers, extensions, features, and so on). After creating a device (and acquiring queues) we are prepared to use Vulkan, the same way as we are prepared to use OpenGL after creating rendering context.

Creating a Logical Device

Before we can create a logical device, we must first check to see how many physical devices are available in the system we execute our application on. Next we can get handles to all available physical devices:

uint32_t num_devices = 0;
if( (vkEnumeratePhysicalDevices( Vulkan.Instance, &num_devices, nullptr ) != VK_SUCCESS) ||
    (num_devices == 0) ) {
  printf( "Error occurred during physical devices enumeration!\n" );
  return false;
}

std::vector<VkPhysicalDevice> physical_devices( num_devices );
if( vkEnumeratePhysicalDevices( Vulkan.Instance, &num_devices, &physical_devices[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during physical devices enumeration!\n" );
  return false;
}

11.Tutorial01.cpp, function CreateDevice()

To check how many devices are available, we call the vkEnumeratePhysicalDevices() function. We call it twice, first with the last parameter set to null. This way the driver knows that we are asking only for the number of available physical devices. This number will be stored in the variable we provided the address of in the second parameter.

Now that we know how many physical devices are available, we can prepare storage for their handles. I use a vector so I don’t need to worry about memory allocation and deallocation. When we call vkEnumeratePhysicalDevices() again, this time with all parameters not equal to null, we acquire the handles of the physical devices in the array whose address we provided in the last parameter. This array doesn’t have to match the number returned by the first call, but it must be able to hold at least as many elements as specified in the second parameter.

Example: we may have four physical devices available, but we are interested only in the first one. So after the first call, the value four is stored in num_devices. This way we know that there is at least one Vulkan-compatible device and that we can proceed. We overwrite this value with one, as we only want to use one (any) such device, no matter which. And we will get only one physical device handle after the second call.

The number of devices we provided is replaced by the actual number of enumerated physical devices (which, of course, will not be greater than the value we provided). Example: we don’t want to call this function twice. Our application supports up to 10 devices, and we provide this value along with a pointer to a static, 10-element array. The driver always returns the number of actually enumerated devices. If there are none, zero is stored in the variable whose address we provided. If there is at least one such device, we will know that too. We just won’t be able to tell whether there are more than 10 devices.
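
That single-call variant could be sketched as follows (hypothetical; the sample project sticks with the two-call approach). Note that the driver returns VK_INCOMPLETE rather than VK_SUCCESS when more devices exist than the slots we offered:

uint32_t num_devices = 10;
VkPhysicalDevice physical_devices[10];

// Single call: offer 10 slots and let the driver report how many it filled.
VkResult result = vkEnumeratePhysicalDevices( Vulkan.Instance, &num_devices, physical_devices );
if( ((result != VK_SUCCESS) && (result != VK_INCOMPLETE)) ||
    (num_devices == 0) ) {
  printf( "Error occurred during physical devices enumeration!\n" );
  return false;
}
// num_devices now holds the number of handles actually written (at most 10).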

Now that we have handles of all the Vulkan compatible physical devices we can check the properties of each device. In the sample code, this is done inside a loop:

VkPhysicalDevice selected_physical_device = VK_NULL_HANDLE;
uint32_t selected_queue_family_index = UINT32_MAX;
for( uint32_t i = 0; i < num_devices; ++i ) {
  if( CheckPhysicalDeviceProperties( physical_devices[i], selected_queue_family_index ) ) {
    selected_physical_device = physical_devices[i];
  }
}

12.Tutorial01.cpp, function CreateDevice()

Device Properties

I created the CheckPhysicalDeviceProperties() function. It takes the handle of a physical device and checks whether the capabilities of a given device are enough for our application to work properly. If so, it returns true and stores the queue family index in the variable provided in the second parameter. Queues and queue families are discussed in a later section.

Here is the first half of a CheckPhysicalDeviceProperties() function:

VkPhysicalDeviceProperties device_properties;
VkPhysicalDeviceFeatures   device_features;

vkGetPhysicalDeviceProperties( physical_device, &device_properties );
vkGetPhysicalDeviceFeatures( physical_device, &device_features );

uint32_t major_version = VK_VERSION_MAJOR( device_properties.apiVersion );
uint32_t minor_version = VK_VERSION_MINOR( device_properties.apiVersion );
uint32_t patch_version = VK_VERSION_PATCH( device_properties.apiVersion );

if( (major_version < 1) ||
    (device_properties.limits.maxImageDimension2D < 4096) ) {
  printf( "Physical device %p doesn't support required parameters!\n", physical_device );
  return false;
}

13.Tutorial01.cpp, function CheckPhysicalDeviceProperties()

At the beginning of this function, the physical device is queried for its properties and features. Properties contain fields such as the supported Vulkan API version, device name and type (integrated or dedicated/discrete GPU), vendor ID, and limits. Limits describe, for example, the maximum size of textures that can be created, how many anti-aliasing samples are supported, or how many buffers can be used in a given shader stage.

Device Features

Features are additional hardware capabilities that are similar to extensions. They may not necessarily be supported by the driver, and by default they are not enabled. Features include items such as geometry and tessellation shaders, multiple viewports, logical operations, and additional texture compression formats. If a given physical device supports a feature, we can enable it during logical device creation. Features are not enabled by default in Vulkan, and the Vulkan spec points out that some features may have a performance impact (like robustness).

After querying for hardware info and capabilities, I have provided a small example of how these queries can be used. I “reversed” the VK_MAKE_VERSION macro and retrieved the major, minor, and patch versions from the apiVersion field of the device properties. I check whether it is above some version I want to use, and I also check whether I can create 2D textures of a given size. In this example I’m not using features at all, but if we want to use any feature (for example, geometry shaders), we must check whether it is supported and we must (explicitly) enable it later, during logical device creation, as sketched below. This is the reason why we need to create a logical device and not use the physical device directly: a logical device represents a physical device along with all the features and extensions we enabled for it.
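
A minimal sketch of that pattern (not part of this tutorial’s code) for the geometry shader feature:

VkPhysicalDeviceFeatures supported_features;
vkGetPhysicalDeviceFeatures( physical_device, &supported_features );

if( !supported_features.geometryShader ) {
  printf( "Geometry shaders are not supported by physical device %p!\n", physical_device );
  return false;
}

// Start from an all-zero structure (no features enabled) and switch on only
// what we checked for; this structure is later passed through the
// pEnabledFeatures member of VkDeviceCreateInfo.
VkPhysicalDeviceFeatures enabled_features = {};
enabled_features.geometryShader = VK_TRUE;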

The next part of checking physical device’s capabilities—queues—requires additional explanation.

Queues, Queue Families, and Command Buffers

When we want to process any data (for example, draw a 3D scene from vertex data and vertex attributes) we call Vulkan functions that are passed to the driver. These functions are not passed directly, as sending each request separately down through a communication bus is inefficient. It is better to aggregate them and pass them in groups. In OpenGL this was done automatically by the driver and was hidden from the user: OpenGL API calls were queued in a buffer, and when this buffer was full (or when we requested a flush), the whole buffer was passed to hardware for processing. In Vulkan this mechanism is directly visible to the user and, more importantly, the user must explicitly create and manage the buffers for commands. These are called (conveniently) command buffers.

Command buffers (as whole objects) are passed to the hardware for execution through queues. However, these buffers may contain different types of operations, such as graphics commands (used for generating and displaying images like in typical 3D games) or compute commands (used for processing data). Specific types of commands may be processed by dedicated hardware, and that’s why queues are also divided into different types. In Vulkan these queue types are called families. Each queue family may support different types of operations. That’s why we also have to check if a given physical device supports the type of operations we want to perform. We can also perform one type of operation on one device and another type of operation on another device, but we have to check if we can. This check is done in the second half of CheckPhysicalDeviceProperties() function:

uint32_t queue_families_count = 0;
vkGetPhysicalDeviceQueueFamilyProperties( physical_device, &queue_families_count, nullptr );
if( queue_families_count == 0 ) {
  printf( "Physical device %p doesn't have any queue families!\n", physical_device );
  return false;
}

std::vector<VkQueueFamilyProperties> queue_family_properties( queue_families_count );
vkGetPhysicalDeviceQueueFamilyProperties( physical_device, &queue_families_count, &queue_family_properties[0] );
for( uint32_t i = 0; i < queue_families_count; ++i ) {
  if( (queue_family_properties[i].queueCount > 0) &&
      (queue_family_properties[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) ) {
    queue_family_index = i;
    return true;
  }
}

printf( "Could not find queue family with required properties on physical device %p!\n", physical_device );
return false;

14.Tutorial01.cpp, function CheckPhysicalDeviceProperties()

We must first check how many different queue families are available on a given physical device. This is done in a similar way to enumerating physical devices. First we call vkGetPhysicalDeviceQueueFamilyProperties() with the last parameter set to null. This way, the number of different queue families is stored in the “queue_families_count” variable. Next we can prepare a place for that many queue families’ properties (if we want to—the mechanism is similar to enumerating physical devices). Then we call the function again, and the properties of each queue family are stored in the provided array.

The properties of each queue family contain queue flags, the number of available queues in this family, time stamp support, and image transfer granularity. Right now, the most important part is the number of queues in the family and flags. Flags (which is a bitfield) define which types of operations are supported by a given queue family (more than one may be supported). It can be graphics, compute, transfer (memory operations like copying), and sparse binding (for sparse resources like mega-textures) operations. Other types may appear in the future.

In our example we check for graphics operations support, and if we find a family that supports it we can use the given physical device; we also store the selected family index. After we choose the physical device, we can create a logical device that will represent it in the rest of our application, as shown in the example:

if( selected_physical_device == VK_NULL_HANDLE ) {
  printf( "Could not select physical device based on the chosen properties!\n" );
  return false;
}

std::vector<float> queue_priorities = { 1.0f };

VkDeviceQueueCreateInfo queue_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,     // VkStructureType              sType
  nullptr,                                        // const void                  *pNext
  0,                                              // VkDeviceQueueCreateFlags     flags
  selected_queue_family_index,                    // uint32_t                     queueFamilyIndex
  static_cast<uint32_t>(queue_priorities.size()), // uint32_t                     queueCount
  &queue_priorities[0]                            // const float                 *pQueuePriorities
};

VkDeviceCreateInfo device_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,           // VkStructureType                    sType
  nullptr,                                        // const void                        *pNext
  0,                                              // VkDeviceCreateFlags                flags
  1,                                              // uint32_t                           queueCreateInfoCount
  &queue_create_info,                             // const VkDeviceQueueCreateInfo     *pQueueCreateInfos
  0,                                              // uint32_t                           enabledLayerCount
  nullptr,                                        // const char * const                *ppEnabledLayerNames
  0,                                              // uint32_t                           enabledExtensionCount
  nullptr,                                        // const char * const                *ppEnabledExtensionNames
  nullptr                                         // const VkPhysicalDeviceFeatures    *pEnabledFeatures
};

if( vkCreateDevice( selected_physical_device, &device_create_info, nullptr, &Vulkan.Device ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan device!\n" );
  return false;
}

Vulkan.QueueFamilyIndex = selected_queue_family_index;
return true;

15.Tutorial01.cpp, function CreateDevice()

First we make sure that after we exited the device features loop, we have found the device that supports our needs. Next we can create a logical device, which is done by calling vkCreateDevice(). It takes the handle to a physical device and an address of a structure that contains the information necessary for device creation. This structure is of type VkDeviceCreateInfo and contains the following fields:

  • sType – Standard type of a provided structure, here VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO, which means we are providing parameters for device creation.
  • pNext – Parameter pointing to an extension specific structure; here we set it to null.
  • flags – Another parameter reserved for future use which must be zero.
  • queueCreateInfoCount – Number of different queue families from which we create queues along with the device.
  • pQueueCreateInfos – Pointer to an array of queueCreateInfoCount elements specifying queues we want to create.
  • enabledLayerCount – Number of device-level validation layers to enable.
  • ppEnabledLayerNames – Pointer to an array with enabledLayerCount names of device-level layers to enable.
  • enabledExtensionCount – Number of extensions to enable for the device.
  • ppEnabledExtensionNames – Pointer to an array with enabledExtensionCount elements; each element must contain the name of an extension that should be enabled.
  • pEnabledFeatures – Pointer to a structure indicating additional features to enable for this device (see the description of device features earlier in this tutorial).

Features (as I have described earlier) are additional hardware capabilities that are disabled by default. If we want to enable all available features, we can’t simply fill this structure with ones. If some feature is not supported, device creation will fail. Instead, we should pass a structure that was filled when we called vkGetPhysicalDeviceFeatures(). This is the easiest way to enable all supported features. If we are interested only in some specific features, we query the driver for available features and clear all unwanted fields. If we don’t want any of the additional features, we can clear this structure (fill it with zeros) or pass a null pointer for this parameter (as in this example).
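
As a hedged sketch of the approach described above (not used in this tutorial, which passes a null pointer), enabling every feature the hardware supports could look like this; physical_device and device_create_info refer to the variables from the surrounding listings:

// Query everything the device supports...
VkPhysicalDeviceFeatures supported_features;
vkGetPhysicalDeviceFeatures( physical_device, &supported_features );

// ...and point pEnabledFeatures at the queried structure instead of nullptr
device_create_info.pEnabledFeatures = &supported_features;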

Queues are created automatically along with the device. To specify what types of queues we want to enable, we provide an array of additional VkDeviceQueueCreateInfo structures. This array must contain queueCreateInfoCount elements. Each element in this array must refer to a different queue family; we refer to a specific queue family only once.

The VkDeviceQueueCreateInfo structure contains the following fields:

  • sType – Type of structure, here VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO, indicating that it is queue creation information.
  • pNext – Pointer reserved for extensions.
  • flags – Value reserved for future use.
  • queueFamilyIndex – Index of a queue family from which queues should be created.
  • queueCount – Number of queues we want to enable in this specific queue family (the number of queues we want to use from this family); it is also the number of elements in the pQueuePriorities array.
  • pQueuePriorities – Array with floating point values describing priorities of operations performed in each queue from this family.

As I mentioned previously, each element in the array of VkDeviceQueueCreateInfo elements must describe a different queue family. Its index must be smaller than the number of queue families reported by the vkGetPhysicalDeviceQueueFamilyProperties() function. In our example we are only interested in one queue from one queue family, and that’s why we must remember the queue family index; it is used right here. If we want to prepare a more complicated scenario, we should also remember the number of queues in each family, as each family may support a different number of queues. And we can’t create more queues than are available in a given family!

It is also worth noting that different queue families may have similar or even identical properties, meaning they may support similar types of operations; for example, there may be more than one queue family that supports graphics operations. And each family may contain a different number of queues.

We must also assign a floating point value (from 0.0 to 1.0, both inclusive) to each queue. The higher the value we provide for a given queue (relative to values assigned to other queues) the more time the given queue may have for processing commands (relatively to other queues). But this relation is not guaranteed. Priorities also don’t influence execution order. It is just a hint.

Priorities are relative only on a single device. If operations are performed on multiple devices, priorities may impact processing time within each of these devices but not between them. A queue with a given priority may be more important only than queues with lower priorities on the same device; queues from different devices are treated independently. Once we fill these structures and call vkCreateDevice(), upon success the created logical device is stored in the variable whose address we provided (in our example it is called Vulkan.Device). If this function fails, it returns a value other than VK_SUCCESS.
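
As an illustration of the relationship described above (purely a sketch; this tutorial requests only a single queue with priority 1.0f, and the family must actually expose at least two queues for this to be valid), asking for two queues with different priorities from one family could look like this:

// Two queues from the selected family: the first should get relatively more
// processing time than the second (still only a hint for the driver)
std::vector<float> queue_priorities = { 1.0f, 0.5f };

VkDeviceQueueCreateInfo queue_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,     // VkStructureType              sType
  nullptr,                                        // const void                  *pNext
  0,                                              // VkDeviceQueueCreateFlags     flags
  selected_queue_family_index,                    // uint32_t                     queueFamilyIndex
  static_cast<uint32_t>(queue_priorities.size()), // uint32_t                     queueCount
  &queue_priorities[0]                            // const float                 *pQueuePriorities
};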

Acquiring Pointers to Device-Level Functions

We have created a logical device. We can now use it to load functions from the device level. As I mentioned earlier, in real-life scenarios there will be situations where more than one hardware vendor on a single computer provides a Vulkan implementation. This is already happening with OpenGL. Many computers have a dedicated/discrete graphics card used mainly for gaming, but they also have Intel’s graphics built into the processor (which of course can also be used for games). So in the future there will be more than one device supporting Vulkan. And with Vulkan we can divide processing among whatever hardware we want. Remember when there were extension cards dedicated to physics processing? Or going farther into the past, a normal “2D” card with an additional graphics “accelerator” (do you remember Voodoo cards)? Vulkan is ready for any such scenario.

So what should we do with device-level functions if there can be so many devices? We can load universal procedures. This is done with the vkGetInstanceProcAddr() function. It returns the addresses of dispatch functions that perform jumps to the proper implementation based on a provided logical device handle (a short sketch of this approach appears at the end of this section). But we can avoid this overhead by loading functions for each logical device separately. With this method, we must remember that we can call a given function only with the device we loaded it from. So if we are using more devices in our application, we must load functions from each of them. It’s not that difficult. And despite this leading to storing more function pointers (and grouping them based on the device they were loaded from), we avoid one level of abstraction and save some processor time. We can load these functions similarly to how we loaded the exported, global-, and instance-level functions:

#define VK_DEVICE_LEVEL_FUNCTION( fun )                                 \
if( !(fun = (PFN_##fun)vkGetDeviceProcAddr( Vulkan.Device, #fun )) ) {  \
  printf( "Could not load device level function: " #fun "!\n" );        \
  return false;                                                         \
}

#include "ListOfFunctions.inl"

return true;

16.Tutorial01.cpp, function LoadDeviceLevelEntryPoints()

This time we used the vkGetDeviceProcAddr() function along with a logical device handle. Functions from device level are placed in a shared header. This time they are wrapped in a VK_DEVICE_LEVEL_FUNCTION() macro like this:

#if !defined(VK_DEVICE_LEVEL_FUNCTION)
#define VK_DEVICE_LEVEL_FUNCTION( fun )
#endif

VK_DEVICE_LEVEL_FUNCTION( vkGetDeviceQueue )
VK_DEVICE_LEVEL_FUNCTION( vkDestroyDevice )
VK_DEVICE_LEVEL_FUNCTION( vkDeviceWaitIdle )

#undef VK_DEVICE_LEVEL_FUNCTION

17.ListOfFunctions.inl

All functions that are not from the exported, global, or instance levels are from the device level. Another distinction can be made based on the first parameter: for device-level functions, the first parameter may only be of type VkDevice, VkQueue, or VkCommandBuffer. In the rest of the tutorial, whenever a new device-level function appears, it must be added to ListOfFunctions.inl in the VK_DEVICE_LEVEL_FUNCTION portion (with a few noted exceptions, like extensions).
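
The dispatch-based alternative mentioned earlier in this section can be sketched as follows (not used in this tutorial): a device-level function loaded through vkGetInstanceProcAddr() works with any logical device created from that instance, at the cost of an extra indirection per call.

// Dispatchable version of a device-level function, usable with any logical
// device created from Vulkan.Instance
PFN_vkGetDeviceQueue pfnGetDeviceQueue =
  (PFN_vkGetDeviceQueue)vkGetInstanceProcAddr( Vulkan.Instance, "vkGetDeviceQueue" );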

Retrieving Queues

Now that we have created a device, we need a queue that we can submit some commands to for processing. Queues are automatically created with a logical device, but in order to use them we must specifically ask for a queue handle. This is done with vkGetDeviceQueue() like this:

vkGetDeviceQueue( Vulkan.Device, Vulkan.QueueFamilyIndex, 0, &Vulkan.Queue );

18.Tutorial01.cpp, function GetDeviceQueue()

To retrieve the queue handle we must provide the logical device we want to get the queue from. The queue family index is also needed, and it must be one of the indices we provided during logical device creation (we cannot create additional queues or use queues from families we didn’t request). The last parameter is a queue index within the given family; it must be smaller than the total number of queues we requested from that family. For example, if the device supports five queues in family number 3 and we want two queues from that family, the index of a queue must be smaller than 2. For each queue we want to retrieve, we have to call this function and make a separate query. If the call succeeds, it stores a handle to the requested queue in the variable whose address we provided in the final parameter. From now on, all the work we want to perform (using command buffers) can be submitted for processing to the acquired queue.
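
To make the five-queues example above concrete, here is a minimal sketch; the family index 3 and the variable names are hypothetical and are not part of the tutorial’s code:

// Two queues were requested from family 3 at device creation time,
// so the valid queue indices here are 0 and 1
VkQueue first_queue  = VK_NULL_HANDLE;
VkQueue second_queue = VK_NULL_HANDLE;
vkGetDeviceQueue( Vulkan.Device, 3, 0, &first_queue );
vkGetDeviceQueue( Vulkan.Device, 3, 1, &second_queue );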

Tutorial01 Execution

As I have mentioned, the example provided with this tutorial doesn’t display anything. But we have learned enough information for one lesson. So how do we know if everything went fine? If the normal application window appears and nothing is printed in the console/terminal, this means the Vulkan setup was successful. Starting with the next tutorial, the results of our operations will be displayed on the screen.

Cleaning Up

There is one more thing we need to remember: cleaning up and freeing resources. Cleanup must be done in a specific order that is (in general) a reversal of the order of creation.

After the application is closed, the OS should release memory and all other resources associated with it. This should include Vulkan; the driver usually cleans up unreferenced resources. Unfortunately, this cleaning may not be performed in a proper order, which might lead to an application crash during the closing process. It is always good practice to do the cleaning ourselves. Here is the sample code required to release the resources we have created during this first tutorial:

if( Vulkan.Device != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( Vulkan.Device );
  vkDestroyDevice( Vulkan.Device, nullptr );
}

if( Vulkan.Instance != VK_NULL_HANDLE ) {
  vkDestroyInstance( Vulkan.Instance, nullptr );
}

if( VulkanLibrary ) {
#if defined(VK_USE_PLATFORM_WIN32_KHR)
  FreeLibrary( VulkanLibrary );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
  dlclose( VulkanLibrary );
#endif
}

19.Tutorial01.cpp, destructor

We should always check to see whether any given resource was created. Without a logical device there are no device-level function pointers, so we are unable to call even the proper resource cleaning functions. Similarly, without an instance we are unable to acquire a pointer to the vkDestroyInstance() function. In general, we should not release resources that weren’t created.

We must ensure that before deleting any object, it is not being used by a device. That’s why there is a wait function, which blocks until all processing on all queues of a given device is finished. Next, we destroy the logical device using the vkDestroyDevice() function. All queues associated with it are destroyed automatically. Then the instance is destroyed. After that we can free (unload or release) the Vulkan library from which all these functions were acquired.

Conclusion

This tutorial explained how to prepare to use Vulkan in our application. First we “connect” with the Vulkan Runtime library and load global-level functions from it. Then we create a Vulkan instance and load instance-level functions. After that we can check which physical devices are available and what their features, properties, and capabilities are. Next we create a logical device and describe which queues, and how many, must be created along with the device. After that we can retrieve device-level functions using the newly created logical device handle. One additional thing to do is to retrieve the queues through which we can submit work for execution.


Go to: API without Secrets: Introduction to Vulkan* Part 2: Swap chain


Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800- 548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.

API without Secrets: Introduction to Vulkan* Part 2: Swap Chain


Download [PDF 1 MB]

Link to Github Sample Code


Go to: API without Secrets: Introduction to Vulkan* Part 1: The Beginning


Table of Contents

Tutorial 2: Swap Chain – Integrating Vulkan with the OS

Welcome to the second Vulkan tutorial. In the first tutorial, I discussed basic Vulkan setup: function loading, instance creation, choosing a physical device and queues, and logical device creation. I’m sure you now want to draw something! Unfortunately we must wait until the next part. Why? Because if we draw something we’ll want to see it. Similar to OpenGL*, we must integrate the Vulkan pipeline with the application and API that the OS provides. However, with Vulkan, this task unfortunately isn’t simple and obvious. And as with all other thin APIs, it is done this way on purpose—for the sake of high performance and flexibility.

So how do you integrate Vulkan with the application’s window? What are the differences compared to OpenGL? In OpenGL (on Microsoft Windows*) we acquire a Device Context that is associated with the application’s window. Using it, we then have to define “how” to present images on the screen, “what” format the window we will be drawing on has, and what capabilities it should support. This is done through the pixel format. Most of the time we create a 32-bit color surface with a 24-bit depth buffer and support for double buffering (this way we can draw something to a “hidden” back buffer, and after we’re finished we can present it on the screen—swap the front and back buffers). Only after these preparations can we create a Rendering Context and activate it. In OpenGL, all the rendering is directed to the default back buffer.

In Vulkan there is no default frame buffer. We can create an application that displays nothing at all. This is a valid approach. But if we want to display something we can create a set of buffers to which we can render. These buffers along with their properties, similar to Direct3D*, are called a swap chain. A swap chain can contain many images. To display any of them we don’t “swap” them—as the name suggests—but we present them, which means that we give them back to a presentation engine. So in OpenGL we first have to define the surface format and associate it with a window (at least on Windows) and after that we create Rendering Context. In Vulkan, we first create an instance, a device, and then we create a swap chain. But, what’s interesting is that there will be situations where we will have to destroy this swap chain and recreate it. In the middle of a working application. From scratch!

Asking for a Swap Chain Extension

In Vulkan, a swap chain is an extension. Why? Isn’t it obvious we want to display an image on the screen in our application’s window?

Well, it’s not so obvious. Vulkan can be used for many different purposes, including performing mathematical operations, boosting physics calculations, and processing a video stream. The results of these actions may not necessarily be displayed on a typical monitor, which is why the core API is OS-agnostic, similar to OpenGL.

If you want to create a game and display rendered images on a monitor, you can (and should) use a swap chain. But here is the second reason why a swap chain is an extension. Every OS displays images in a different way. The surface on which you can render may be implemented differently, can have a different format, and can be differently represented in the OS—there is no one universal way to do it. So in Vulkan a swap chain must also depend on the OS your application is written for.

These are the reasons a swap chain in Vulkan is treated as an extension: it provides render targets (buffers or images, like FBOs in OpenGL) that integrate with OS-specific code. It’s something that core Vulkan (which is platform independent) can’t do. So if swap chain creation and usage is an extension, we have to ask for the extension during both instance and device creation. The ability to create and use a swap chain requires us to enable extensions at two levels (at least on most operating systems, Windows and Linux* among them). This means that we have to go back to the first tutorial and change it to request the proper swap-chain-related extensions. If a given instance and device doesn’t support these extensions, the instance and/or device creation will fail. There are of course other ways to display an image, like acquiring the pointer to a buffer’s (texture’s) memory (mapping it) and copying data from it to the OS-acquired window’s surface pointer. This process is out of the scope of this tutorial (though not really that hard). But fortunately it seems that swap chain extensions will be similar to OpenGL’s core extensions: they are not in the core spec and are not required to be implemented, but every hardware vendor will implement them anyway. I think all hardware vendors want to show that they support Vulkan and that it gives an impressive performance boost in games displayed on screen. And, backing this theory, the swap chain extensions are integrated into the main, “core” vulkan.h header.

In the case of swap-chain support, there are actually three extensions involved: two from an instance level and one from a device level. These extensions logically separate different functionalities. The first is the VK_KHR_surface extension defined at the instance level. It describes a “surface” object, which is a logical representation of an application’s window. This extension allows us to check different parameters (that is, capabilities, supported formats, size) of a surface and to query whether the given physical device supports a swap chain (more precisely, whether the given queue family supports presenting an image to a given surface). This is useful information because we don’t want to choose a physical device and try to create a logical device from it only to find out that it doesn’t support swap chains. This extension also defines methods to destroy any such surface.

The second instance-level extension is OS-dependent: in the Windows OS family it is called VK_KHR_win32_surface and in Linux it is called VK_KHR_xlib_surface or VK_KHR_xcb_surface. This extension allows us to create a surface that represents the application’s window in a given OS (and uses OS-specific parameters).

Checking Whether an Instance Extension Is Supported

Before we can enable the two instance-level extensions, we need to check whether they are available or supported. We are talking about instance extensions and we haven’t created any instance yet. To determine whether our Vulkan instance supports these extensions, we use a global-level function called vkEnumerateInstanceExtensionProperties(). It enumerates all available instance general extensions, if its first parameter is null, or instance layer extensions (it seems that layers can also have extensions), if we set the first parameter to the name of any given layer. We aren’t interested in layers so we leave the first parameter set to null. Again we call this function twice. For the first call, we want to acquire the total number of supported extensions so we leave the third argument nulled. Next we prepare storage for all these extensions and we call this function once again with the third parameter pointing to the allocated storage.

uint32_t extensions_count = 0;
if( (vkEnumerateInstanceExtensionProperties( nullptr, &extensions_count, nullptr ) != VK_SUCCESS) ||
    (extensions_count == 0) ) {
  printf( "Error occurred during instance extensions enumeration!\n" );
  return false;
}

std::vector<VkExtensionProperties> available_extensions( extensions_count );
if( vkEnumerateInstanceExtensionProperties( nullptr, &extensions_count, &available_extensions[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during instance extensions enumeration!\n" );
  return false;
}

std::vector<const char*> extensions = {
  VK_KHR_SURFACE_EXTENSION_NAME,
#if defined(VK_USE_PLATFORM_WIN32_KHR)
  VK_KHR_WIN32_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_XCB_KHR)
  VK_KHR_XCB_SURFACE_EXTENSION_NAME
#elif defined(VK_USE_PLATFORM_XLIB_KHR)
  VK_KHR_XLIB_SURFACE_EXTENSION_NAME
#endif
};

for( size_t i = 0; i < extensions.size(); ++i ) {
  if( !CheckExtensionAvailability( extensions[i], available_extensions ) ) {
    printf( "Could not find instance extension named \"%s\"!\n", extensions[i] );
    return false;
  }
}

1. Tutorial02.cpp, function CreateInstance()

We can prepare a place for a smaller number of extensions, but then vkEnumerateInstanceExtensionProperties() will return VK_INCOMPLETE to let us know we didn’t acquire all the extensions.

Our array is now filled with all available (supported) instance-level extensions. Each element of our allocated space contains the name of an extension and its version. The version field probably won’t be used too often, but it may be useful to check whether the hardware supports the given version of an extension. For example, we might be interested in some specific extension, and we downloaded an SDK for it that contains a set of header files. Each header file has its own version corresponding to the value returned by this query. If the hardware our application is executed on supports an older version of the extension (not the one we downloaded the SDK for), it may not support all the functions we are using from this specific extension. So sometimes it may be useful to also verify the version, but for a swap chain it doesn’t matter—at least for now.
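
If such a version check were needed, it could look like the short sketch below; the required version number (25) is just an assumed example, and the fields come from the VkExtensionProperties structure filled by the enumeration above.

for( const VkExtensionProperties &properties : available_extensions ) {
  if( (strcmp( properties.extensionName, VK_KHR_SURFACE_EXTENSION_NAME ) == 0) &&
      (properties.specVersion >= 25) ) {
    // The extension is present in a new enough revision
  }
}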

We can now search through all of the returned extensions and see whether the list contains the extensions we are looking for. Here I’m using two convenient defines named VK_KHR_SURFACE_EXTENSION_NAME and VK_KHR_????_SURFACE_EXTENSION_NAME. They are defined inside the Vulkan header file and contain the names of the extensions, so we don’t have to copy or remember them. We can just use the definitions in our code, and if we make a mistake the compiler will tell us. I hope all extensions will come with a similar definition.

With the second definition comes a small trap. Both of these defines are placed in the vulkan.h header file. But isn’t the second define specific to a given OS, and isn’t the vulkan.h header OS-independent? Both concerns are valid. The vulkan.h file is OS-independent, yet it contains the definitions of OS-specific extensions. These are enclosed inside #ifdef … #endif preprocessor directives. If we want to “enable” them, we need to add a proper preprocessor definition somewhere in our project. For a Windows system, we need to add a VK_USE_PLATFORM_WIN32_KHR definition. On Linux, we need to add VK_USE_PLATFORM_XCB_KHR or VK_USE_PLATFORM_XLIB_KHR, depending on whether we want to use the XCB or Xlib (X11) libraries. In the provided example project, these definitions are added by default through the CMakeLists.txt file.
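
As a minimal illustration (the sample project defines this symbol through CMakeLists.txt rather than in source code, and the include path may differ depending on your SDK setup), defining the symbol before the header is included is what exposes the OS-specific declarations:

// Must be defined before vulkan.h is included (Windows shown here);
// normally this comes from the build system, not from source code
#define VK_USE_PLATFORM_WIN32_KHR
#include "vulkan.h"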

But back to our source code. What does the CheckExtensionAvailability() function do? It loops over all available extensions and compares their names with the name of the provided extension. If a match is found, it just returns true.

for( size_t i = 0; i < available_extensions.size(); ++i ) {
  if( strcmp( available_extensions[i].extensionName, extension_name ) == 0 ) {
    return true;
  }
}
return false;

2.Tutorial02.cpp, function CheckExtensionAvailability()

Enabling an Instance-Level Extension

Let’s say we have verified that both extensions are supported. Instance-level extensions are requested (enabled) during instance creation—we create an instance with a list of extensions that should be enabled. Here’s the code responsible for doing it:

VkApplicationInfo application_info = {
  VK_STRUCTURE_TYPE_APPLICATION_INFO,             // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  "API without Secrets: Introduction to Vulkan",  // const char                *pApplicationName
  VK_MAKE_VERSION( 1, 0, 0 ),                     // uint32_t                   applicationVersion
  "Vulkan Tutorial by Intel",                     // const char                *pEngineName
  VK_MAKE_VERSION( 1, 0, 0 ),                     // uint32_t                   engineVersion
  VK_API_VERSION                                  // uint32_t                   apiVersion
};

VkInstanceCreateInfo instance_create_info = {
  VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,         // VkStructureType            sType
  nullptr,                                        // const void                *pNext
  0,                                              // VkInstanceCreateFlags      flags
  &application_info,                              // const VkApplicationInfo   *pApplicationInfo
  0,                                              // uint32_t                   enabledLayerCount
  nullptr,                                        // const char * const        *ppEnabledLayerNames
  static_cast<uint32_t>(extensions.size()),       // uint32_t                   enabledExtensionCount
  &extensions[0]                                  // const char * const        *ppEnabledExtensionNames
};

if( vkCreateInstance( &instance_create_info, nullptr, &Vulkan.Instance ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan instance!\n" );
  return false;
}
return true;

3.Tutorial02.cpp, function CreateInstance()

This code is similar to the CreateInstance() function in the Tutorial01.cpp file. To request instance-level extensions, we have to prepare an array with the names of all extensions we want to enable. Here I have used a standard vector with “const char*” elements and the extension names mentioned earlier in the form of defines.

In Tutorial 1 we declared zero extensions and placed a nullptr for the address of the array in the VkInstanceCreateInfo structure. This time we must provide the address of the first element of an array filled with the names of the requested extensions. And we must also specify how many elements the array contains (that’s why I chose a vector: if I add or remove extensions in future tutorials, the vector’s size will change accordingly). Next we call the vkCreateInstance() function. If it doesn’t return VK_SUCCESS, it means that (in the case of this tutorial) the extensions are not supported. If it does return successfully, we can load instance-level functions as before, but this time also with some additional, extension-specific functions.

With these extensions come additional functions. And, as these are instance-level extensions, we must add the functions to our set of instance-level functions (so they will also be loaded at the proper moment and with the proper loading function). In this case we must add the following functions to ListOfFunctions.inl, wrapped in the VK_INSTANCE_LEVEL_FUNCTION() macro like this:

// From extensions
#if defined(USE_SWAPCHAIN_EXTENSIONS)
VK_INSTANCE_LEVEL_FUNCTION( vkDestroySurfaceKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceSupportKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceCapabilitiesKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfaceFormatsKHR )
VK_INSTANCE_LEVEL_FUNCTION( vkGetPhysicalDeviceSurfacePresentModesKHR )
#if defined(VK_USE_PLATFORM_WIN32_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateWin32SurfaceKHR )
#elif defined(VK_USE_PLATFORM_XCB_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateXcbSurfaceKHR )
#elif defined(VK_USE_PLATFORM_XLIB_KHR)
VK_INSTANCE_LEVEL_FUNCTION( vkCreateXlibSurfaceKHR )
#endif
#endif

4.ListOfFunctions.inl

One more thing: I’ve wrapped all these swap-chain-related functions inside another #ifdef … #endif pair, which requires a USE_SWAPCHAIN_EXTENSIONS preprocessor directive to be defined. I’ve done this so that Tutorial 1 still works properly. Without it, our first application (as it uses the same header files) would try to load all these functions. But we don’t enable swap-chain extensions in the first tutorial, so this operation would fail and the application would close without fully initializing Vulkan. If a given extension isn’t enabled, the functions from it may not be available.

Creating a Presentation Surface

We have created a Vulkan instance with two extensions enabled. We have loaded instance-level functions from a core Vulkan spec and from enabled extensions (this is done automatically thanks to our macros). To create a surface, we write code similar to the following:

#if defined(VK_USE_PLATFORM_WIN32_KHR)
VkWin32SurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR,  // VkStructureType                  sType
  nullptr,                                          // const void                      *pNext
  0,                                                // VkWin32SurfaceCreateFlagsKHR     flags
  Window.Instance,                                  // HINSTANCE                        hinstance
  Window.Handle                                     // HWND                             hwnd
};

if( vkCreateWin32SurfaceKHR( Vulkan.Instance, &surface_create_info, nullptr, &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}

#elif defined(VK_USE_PLATFORM_XCB_KHR)
VkXcbSurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_XCB_SURFACE_CREATE_INFO_KHR,    // VkStructureType                  sType
  nullptr,                                          // const void                      *pNext
  0,                                                // VkXcbSurfaceCreateFlagsKHR       flags
  Window.Connection,                                // xcb_connection_t*                connection
  Window.Handle                                     // xcb_window_t                     window
};

if( vkCreateXcbSurfaceKHR( Vulkan.Instance, &surface_create_info, nullptr, &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}

#elif defined(VK_USE_PLATFORM_XLIB_KHR)
VkXlibSurfaceCreateInfoKHR surface_create_info = {
  VK_STRUCTURE_TYPE_XLIB_SURFACE_CREATE_INFO_KHR,   // VkStructureType                sType
  nullptr,                                          // const void                    *pNext
  0,                                                // VkXlibSurfaceCreateFlagsKHR    flags
  Window.DisplayPtr,                                // Display                       *dpy
  Window.Handle                                     // Window                         window
};

if( vkCreateXlibSurfaceKHR( Vulkan.Instance,&surface_create_info, nullptr, &Vulkan.PresentationSurface ) == VK_SUCCESS ) {
  return true;
}

#endif

printf( "Could not create presentation surface!\n" );
return false;

5.Tutorial02.cpp, function CreatePresentationSurface()

To create a presentation surface, we call the vkCreate????SurfaceKHR() function, which accepts the Vulkan instance (with the surface extensions enabled), a pointer to an OS-specific structure, a pointer to optional memory allocation handling functions, and a pointer to a variable in which the handle to the created surface will be stored.

This OS-specific structure is called Vk????SurfaceCreateInfoKHR and it contains the following fields:

  • sType – Standard type of structure that here should be equal to VK_STRUCTURE_TYPE_????_SURFACE_CREATE_INFO_KHR (where ???? can be WIN32, XCB, XLIB, or other)
  • pNext – Standard pointer to some other structure
  • flags – Parameter reserved for future use
  • hinstance/connection/dpy – First OS-specific parameter
  • hwnd/window – Handle to our application’s window (also OS specific)

Checking Whether a Device Extension is Supported

We have created an instance and a surface. The next step is to create a logical device. But we want to create a device that supports a swap chain, so we also need to check whether a given physical device supports the swap-chain extension, which is a device-level extension. This extension is called VK_KHR_swapchain, and it defines the actual support, implementation, and usage of a swap chain.

To check what extensions a given physical device supports, we must write code similar to the code prepared for instance-level extensions. This time we just use the vkEnumerateDeviceExtensionProperties() function. It behaves identically to the function querying instance extensions. The only difference is that it takes an additional physical device handle as the first argument. The code may look like the example below. It is a part of the CheckPhysicalDeviceProperties() function in our example source code.

uint32_t extensions_count = 0;
if( (vkEnumerateDeviceExtensionProperties( physical_device, nullptr, &extensions_count, nullptr ) != VK_SUCCESS) ||
    (extensions_count == 0) ) {
  printf( "Error occurred during physical device %p extensions enumeration!\n", physical_device );
  return false;
}

std::vector<VkExtensionProperties> available_extensions( extensions_count );
if( vkEnumerateDeviceExtensionProperties( physical_device, nullptr, &extensions_count, &available_extensions[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during physical device %p extensions enumeration!\n", physical_device );
  return false;
}

std::vector<const char*> device_extensions = {
  VK_KHR_SWAPCHAIN_EXTENSION_NAME
};

for( size_t i = 0; i < device_extensions.size(); ++i ) {
  if( !CheckExtensionAvailability( device_extensions[i], available_extensions ) ) {
    printf( "Physical device %p doesn't support extension named \"%s\"!\n", physical_device, device_extensions[i] );
    return false;
  }
}

6.Tutorial02.cpp, function CheckPhysicalDeviceProperties()

We first ask for the number of all extensions available on a given physical device. Next we get their names and look for the device-level swap-chain extension. If there is none, there is no point in further checking the device’s properties, features, and queue families’ properties, as the given device doesn’t support a swap chain at all.

Checking Whether Presentation to a Given Surface Is Supported

Let’s go back to the CreateDevice() function. After creating an instance, in the first tutorial we looped through all available physical devices and queried their properties. Based on these properties we selected which device we want to use and which queue families we want to request. This query is done in a loop over all available physical devices. Now that we want to use a swap chain, I have to modify the CheckPhysicalDeviceProperties() function, which is called inside the mentioned loop in the CreateDevice() function, like this:

uint32_t selected_graphics_queue_family_index = UINT32_MAX;
uint32_t selected_present_queue_family_index = UINT32_MAX;

for( uint32_t i = 0; i < num_devices; ++i ) {
  if( CheckPhysicalDeviceProperties( physical_devices[i], selected_graphics_queue_family_index, selected_present_queue_family_index ) ) {
    Vulkan.PhysicalDevice = physical_devices[i];
  }
}

7.Tutorial02.cpp, function CreateDevice()

The only change is that I’ve added another variable that will contain the index of a queue family that supports a swap chain (more precisely, image presentation). Unfortunately, just checking whether the swap-chain extension is supported is not enough, because presentation support is a queue family property. A physical device may support swap chains, but that doesn’t mean that all its queue families also support it. And do we really need another queue or queue family for displaying images? Can’t we just use the graphics queue that we selected in the first tutorial? Most of the time one queue family will probably be enough for our needs. This means that the selected queue family will support both graphics operations and presentation. But, unfortunately, it is also possible that there will be devices that won’t support graphics and presenting within a single queue family. In Vulkan we have to be flexible and prepared for any situation.

The vkGetPhysicalDeviceSurfaceSupportKHR() function is used to check whether a given queue family from a given physical device supports a swap chain or, to be more precise, whether it supports presenting images to a given surface. That’s why we needed to create a surface earlier.

So assume we have already checked whether a given physical device exposes a swap-chain extension and that we have already queried for a number of different queue families supported by a given physical device. We have also requested the properties of all queue families. Now we can check whether a given queue family supports presentation to our surface (window).

uint32_t graphics_queue_family_index = UINT32_MAX;
uint32_t present_queue_family_index = UINT32_MAX;

// Storage for per-family presentation support flags filled in the loop below
std::vector<VkBool32> queue_present_support( queue_families_count );

for( uint32_t i = 0; i < queue_families_count; ++i ) {
  vkGetPhysicalDeviceSurfaceSupportKHR( physical_device, i, Vulkan.PresentationSurface, &queue_present_support[i] );

  if( (queue_family_properties[i].queueCount > 0) &&
      (queue_family_properties[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) ) {
    // Select first queue that supports graphics
    if( graphics_queue_family_index == UINT32_MAX ) {
      graphics_queue_family_index = i;
    }

    // If there is a queue that supports both graphics and present - prefer it
    if( queue_present_support[i] ) {
      selected_graphics_queue_family_index = i;
      selected_present_queue_family_index = i;
      return true;
    }
  }
}

// We don't have a queue that supports both graphics and present, so we have to use separate queues
for( uint32_t i = 0; i < queue_families_count; ++i ) {
  if( queue_present_support[i] ) {
    present_queue_family_index = i;
    break;
  }
}

// If this device doesn't support queues with graphics and present capabilities don't use it
if( (graphics_queue_family_index == UINT32_MAX) ||
    (present_queue_family_index == UINT32_MAX) ) {
  printf( "Could not find queue families with required properties on physical device %p!\n", physical_device );
  return false;
}

selected_graphics_queue_family_index = graphics_queue_family_index;
selected_present_queue_family_index = present_queue_family_index;
return true;

8.Tutorial02.cpp, function CheckPhysicalDeviceProperties()

Here we are iterating over all available queue families. In each loop iteration, we call a function responsible for checking whether a given queue family supports presentation. The vkGetPhysicalDeviceSurfaceSupportKHR() function requires us to provide the physical device handle, the index of the queue family we want to check, and the handle of the surface we want to render to (present an image on). If support is available, VK_TRUE is stored at the given address; otherwise VK_FALSE is stored.

Now we have the properties of all available queue families. We know which queue family supports graphics operations and which supports presentation. In our tutorial example I prefer families that support both. If I find one, I store the family index and exit immediately from the CheckPhysicalDeviceProperties() function. If there is no such queue family, I use the first queue family that supports graphics and the first family that supports presenting. Only then can I leave the function with a “success” return code.

A more advanced scenario might search through all available devices and try to find one with a queue family that supports both graphics and presentation operations. But I can also imagine situations where there is no single device that supports both. Then we are forced to use one device for graphics calculations (a bit like the old “graphics accelerator”) and another device for presenting results on the screen (connected to the “accelerator” and a monitor). Unfortunately, in such a case we must use the “general” Vulkan functions from the Vulkan Runtime, or we need to store device-level functions for each used device (each device may have a different implementation of Vulkan functions). But, hopefully, such situations will be uncommon.

Creating a Device with a Swap Chain Extension Enabled

Now we can return to the CreateDevice() function. We have found the physical device that supports both graphics and presenting but not necessarily in a single queue family. We now need to create a logical device.

std::vector<VkDeviceQueueCreateInfo> queue_create_infos;
std::vector<float> queue_priorities = { 1.0f };

queue_create_infos.push_back( {
    VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,     // VkStructureType              sType
    nullptr,                                        // const void                  *pNext
    0,                                              // VkDeviceQueueCreateFlags     flags
    selected_graphics_queue_family_index,           // uint32_t                     queueFamilyIndex
    static_cast<uint32_t>(queue_priorities.size()), // uint32_t                     queueCount
    &queue_priorities[0]                            // const float                 *pQueuePriorities
} );

if( selected_graphics_queue_family_index != selected_present_queue_family_index ) {
  queue_create_infos.push_back( {
    VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,     // VkStructureType              sType
    nullptr,                                        // const void                  *pNext
    0,                                              // VkDeviceQueueCreateFlags     flags
    selected_present_queue_family_index,            // uint32_t                     queueFamilyIndex
    static_cast<uint32_t>(queue_priorities.size()), // uint32_t                     queueCount
    &queue_priorities[0]                            // const float                 *pQueuePriorities
  } );
}

std::vector<const char*> extensions = {
  VK_KHR_SWAPCHAIN_EXTENSION_NAME
};

VkDeviceCreateInfo device_create_info = {
  VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,             // VkStructureType                    sType
  nullptr,                                          // const void                        *pNext
  0,                                                // VkDeviceCreateFlags                flags
  static_cast<uint32_t>(queue_create_infos.size()), // uint32_t                          queueCreateInfoCount
  &queue_create_infos[0],                           // const VkDeviceQueueCreateInfo     *pQueueCreateInfos
  0,                                                // uint32_t                           enabledLayerCount
  nullptr,                                          // const char * const                *ppEnabledLayerNames
  static_cast<uint32_t>(extensions.size()),         // uint32_t                           enabledExtensionCount
  &extensions[0],                                   // const char * const                *ppEnabledExtensionNames
  nullptr                                           // const VkPhysicalDeviceFeatures    *pEnabledFeatures
};

if( vkCreateDevice( Vulkan.PhysicalDevice, &device_create_info, nullptr, &Vulkan.Device ) != VK_SUCCESS ) {
  printf( "Could not create Vulkan device!\n" );
  return false;
}

Vulkan.GraphicsQueueFamilyIndex = selected_graphics_queue_family_index;
Vulkan.PresentQueueFamilyIndex = selected_present_queue_family_index;
return true;

9.Tutorial02.cpp, function CreateDevice()

As before, we need to fill a variable of type VkDeviceCreateInfo. To do this, we need to declare which queue families, and how many queues from each, we want to enable. We do this through a pointer to a separate array with VkDeviceQueueCreateInfo elements. Here I declare a vector and add one element, which defines one queue from the queue family that supports graphics operations. We use a vector because if graphics and presenting aren’t supported by a single family, we will need to define two separate families. If a single family supports both, we define just one member and declare that only one family is needed. If the indices of the graphics and presentation families are different, we need to add another member to our vector of VkDeviceQueueCreateInfo elements. In this case the VkDeviceCreateInfo structure must provide info about two different families. That’s why a vector once again comes in handy (with its size() member function).

But we are not finished with device creation yet. We have to ask for the third extension related to a swap chain—a device-level “VK_KHR_swapchain” extension. As mentioned earlier, this extension defines the actual support, implementation, and usage of a swap chain.

To ask for this extension, similarly to the instance level, we define an array (or a vector) which contains the names of all device-level extensions we want to enable. We provide the address of the first element of this array and the number of extensions we want to use. This extension also contains a definition of its name in the form of a define, VK_KHR_SWAPCHAIN_EXTENSION_NAME. We can use it inside our array (vector), and we don’t have to worry about any typos.

This third extension introduces additional functions used to actually create, destroy, or in general manage swap chains. Before we can use them, we of course need to load pointers to these functions. They are from the device level, so we place them in the ListOfFunctions.inl file using the VK_DEVICE_LEVEL_FUNCTION() macro:

// From extensions
#if defined(USE_SWAPCHAIN_EXTENSIONS)
VK_DEVICE_LEVEL_FUNCTION( vkCreateSwapchainKHR )
VK_DEVICE_LEVEL_FUNCTION( vkDestroySwapchainKHR )
VK_DEVICE_LEVEL_FUNCTION( vkGetSwapchainImagesKHR )
VK_DEVICE_LEVEL_FUNCTION( vkAcquireNextImageKHR )
VK_DEVICE_LEVEL_FUNCTION( vkQueuePresentKHR )
#endif

10.ListOfFunctions.inl

You can once again see that I’m checking whether a USE_SWAPCHAIN_EXTENSIONS preprocessor directive is defined. I define it only in projects that enable swap-chain extensions.

Now that we have created a logical device, we need to retrieve the handles of the graphics queue and (if separate) the presentation queue. I’m using two separate queue variables for convenience, but they may both contain the same handle.

After loading the device-level functions we can read requested queue handles. Here’s the code for it:

vkGetDeviceQueue( Vulkan.Device, Vulkan.GraphicsQueueFamilyIndex, 0, &Vulkan.GraphicsQueue );
vkGetDeviceQueue( Vulkan.Device, Vulkan.PresentQueueFamilyIndex, 0, &Vulkan.PresentQueue );
return true;

11.Tutorial02.cpp, function GetDeviceQueue()

Creating a Semaphore

One last step before we can move to swap chain creation and usage is to create a semaphore. Semaphores are objects used for queue synchronization. They may be signaled or unsignaled. One queue may signal a semaphore (change its state from unsignaled to signaled) when some operations are finished, and another queue may wait on the semaphore until it becomes signaled. After that, the queue resumes performing operations submitted through command buffers.

VkSemaphoreCreateInfo semaphore_create_info = {
  VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,      // VkStructureType          sType
  nullptr,                                      // const void*              pNext
  0                                             // VkSemaphoreCreateFlags   flags
};

if( (vkCreateSemaphore( Vulkan.Device, &semaphore_create_info, nullptr, &Vulkan.ImageAvailableSemaphore ) != VK_SUCCESS) ||
    (vkCreateSemaphore( Vulkan.Device, &semaphore_create_info, nullptr, &Vulkan.RenderingFinishedSemaphore ) != VK_SUCCESS) ) {
  printf( "Could not create semaphores!\n" );
  return false;
}
return true;

12.Tutorial02.cpp, function CreateSemaphores()

To create a semaphore we call the vkCreateSemaphore() function. It requires us to provide create information with three fields:

  • sType – Standard structure type that must be set to VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO in this example.
  • pNext – Standard parameter reserved for future use.
  • flags – Another parameter that is reserved for future use and must equal zero.

Semaphores are used during drawing (or during presentation if we want to be more precise). I will describe the details later.
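
As the note above says, the details come later in this tutorial; the fragment below is only a rough, illustrative preview of where these two semaphores end up being used, not the tutorial’s actual code. The command_buffer variable is hypothetical here, as command buffers haven’t been introduced yet.

// Rendering waits until the acquired image is available and signals when it
// has finished, so the presentation engine can wait on that signal
VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &Vulkan.ImageAvailableSemaphore,              // const VkSemaphore           *pWaitSemaphores
  &wait_dst_stage_mask,                         // const VkPipelineStageFlags  *pWaitDstStageMask
  1,                                            // uint32_t                     commandBufferCount
  &command_buffer,                              // const VkCommandBuffer       *pCommandBuffers (hypothetical)
  1,                                            // uint32_t                     signalSemaphoreCount
  &Vulkan.RenderingFinishedSemaphore            // const VkSemaphore           *pSignalSemaphores
};
vkQueueSubmit( Vulkan.GraphicsQueue, 1, &submit_info, VK_NULL_HANDLE );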

Creating a Swap Chain

We have enabled support for a swap chain, but before we can render anything on screen we must first create a swap chain from which we can acquire images on which we can render (or to which we can copy anything if we have rendered something into another image).

To create a swap chain, we call the vkCreateSwapchainKHR() function. It requires us to provide an address of a variable of type VkSwapchainCreateInfoKHR, which informs the driver about the properties of a swap chain that is being created. To fill this structure with the proper values, we must determine what is possible on a given hardware and platform. To do this we query the platform’s or window’s properties about the availability of and compatibility with several different features, that is, supported image formats or present modes (how images are presented on screen). So before we can create a swap chain we must check what is possible with a given platform and how we can create a swap chain.

Acquiring Surface Capabilities

First we must query for surface capabilities. To do this, we call the vkGetPhysicalDeviceSurfaceCapabilitiesKHR() function like this:

VkSurfaceCapabilitiesKHR surface_capabilities;
if( vkGetPhysicalDeviceSurfaceCapabilitiesKHR( Vulkan.PhysicalDevice, Vulkan.PresentationSurface, &surface_capabilities ) != VK_SUCCESS ) {
  printf( "Could not check presentation surface capabilities!\n" );
  return false;
}

13.Tutorial02.cpp, function CreateSwapChain()

Acquired capabilities contain important information about ranges (limits) that are supported by the swap chain, that is, minimal and maximal number of images, minimal and maximal dimensions of images, or supported transforms (some platforms may require transformations applied to images before these images may be presented).
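
A small illustrative sketch of reading some of these limits follows; the field names come from the VkSurfaceCapabilitiesKHR structure filled above, and the printing is only for demonstration.

// maxImageCount == 0 means there is no upper limit on the number of images
printf( "Swap chain image count limits: %u - %u\n",
        surface_capabilities.minImageCount,
        surface_capabilities.maxImageCount );
printf( "Current surface extent: %u x %u\n",
        surface_capabilities.currentExtent.width,
        surface_capabilities.currentExtent.height );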

Acquiring Supported Surface Formats

Next, we need to query for supported surface formats. Not all platforms are compatible with typical image formats like non-linear 32-bit RGBA. Some platforms don’t have any preferences, but others may only support a small range of formats. We can select only one of the available formats for a swap chain, or its creation will fail.

To query for surface formats, we must call the vkGetPhysicalDeviceSurfaceFormatsKHR() function. We can do it, as usual, twice: the first time to acquire the number of supported formats and a second time to acquire supported formats in an array prepared for this purpose. It can be done like this:

uint32_t formats_count;
if( (vkGetPhysicalDeviceSurfaceFormatsKHR( Vulkan.PhysicalDevice, Vulkan.PresentationSurface, &formats_count, nullptr ) != VK_SUCCESS) ||
    (formats_count == 0) ) {
  printf( "Error occurred during presentation surface formats enumeration!\n" );
  return false;
}

std::vector<VkSurfaceFormatKHR> surface_formats( formats_count );
if( vkGetPhysicalDeviceSurfaceFormatsKHR( Vulkan.PhysicalDevice, Vulkan.PresentationSurface, &formats_count, &surface_formats[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during presentation surface formats enumeration!\n" );
  return false;
}

14.Tutorial02.cpp, function CreateSwapChain()

Acquiring Supported Present Modes

We should also ask for the available present modes, which tell us how images are presented (displayed) on the screen. The present mode defines whether an application will wait for v-sync or whether it will display an image immediately when it is available (which will probably lead to image tearing). I describe different present modes later.

To query for present modes that are supported on a given platform, we call the vkGetPhysicalDeviceSurfacePresentModesKHR() function. We can create code similar to this one:

uint32_t present_modes_count;
if( (vkGetPhysicalDeviceSurfacePresentModesKHR( Vulkan.PhysicalDevice, Vulkan.PresentationSurface, &present_modes_count, nullptr ) != VK_SUCCESS) ||
    (present_modes_count == 0) ) {
  printf( "Error occurred during presentation surface present modes enumeration!\n" );
  return false;
}

std::vector<VkPresentModeKHR> present_modes( present_modes_count );
if( vkGetPhysicalDeviceSurfacePresentModesKHR( Vulkan.PhysicalDevice, Vulkan.PresentationSurface, &present_modes_count, &present_modes[0] ) != VK_SUCCESS ) {
  printf( "Error occurred during presentation surface present modes enumeration!\n" );
  return false;
}

15.Tutorial02.cpp, function CreateSwapChain()

We now have acquired all the data that will help us prepare the proper values for a swap chain creation.
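
Before moving on, here is a rough, non-authoritative preview of how the acquired present-mode list might be used; present modes are discussed in detail later, and the preference shown here is only an example.

// FIFO (v-sync-like) is required to be supported, so it is a safe fallback;
// MAILBOX, if available, avoids tearing without blocking the application
VkPresentModeKHR selected_present_mode = VK_PRESENT_MODE_FIFO_KHR;
for( VkPresentModeKHR present_mode : present_modes ) {
  if( present_mode == VK_PRESENT_MODE_MAILBOX_KHR ) {
    selected_present_mode = present_mode;
    break;
  }
}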

Selecting the Number of Swap Chain Images

A swap chain consists of multiple images. Several images (typically more than one) are required for the presentation engine to work properly, that is, one image is presented on the screen, another image waits in a queue for the next v-sync, and a third image is available for the application to render into.

An application may request more images. If it wants to use multiple images at once it may do so, for example, when encoding a video stream where every fourth image is a key frame and the application needs it to prepare the remaining three frames. Such usage will determine the number of images that will be automatically created in a swap chain: how many images the application requires at once for processing and how many images the presentation engine requires to function properly.

But we must ensure that the requested number of swap chain images is not smaller than the minimal required number of images and not greater than the maximal supported number of images (if there is such a limitation). Requesting too many images will require much more memory. On the other hand, too small a number of images may cause stalls in the application (more about this later).

The number of images that are required for a swap chain to work properly and for an application to be able to render to is defined in the surface capabilities. Here is some code that checks whether the number of images is between the allowable min and max values:

// Set of images defined in a swap chain may not always be available for application to render to:
// One may be displayed and one may wait in a queue to be presented
// If application wants to use more images at the same time it must ask for more images
uint32_t image_count = surface_capabilities.minImageCount + 1;
if( (surface_capabilities.maxImageCount > 0) &&
    (image_count > surface_capabilities.maxImageCount) ) {
  image_count = surface_capabilities.maxImageCount;
}
return image_count;

16.Tutorial02.cpp, function GetSwapChainNumImages()

The minImageCount value in the surface capabilities structure gives the required minimum number of images for the swap chain to work properly. Here I’m selecting one more image than is required, and I also check whether I’m asking for too much. One more image may be useful for a triple-buffering-like presentation mode (if it is available). In more advanced scenarios we would also need to store the number of images we want to use at the same time (at once). Let’s say we want to encode the video stream mentioned earlier and we need a key frame (every fourth frame) plus the other three frames. But a swap chain doesn’t allow the application to operate on four images at once, only on three. We need to know that because we can only prepare two frames from a key frame, then we need to release them (give them back to the presentation engine), and only then can we acquire the last, third, non-key frame. This will become clearer later.
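As a rough, hypothetical sketch (this helper is not part of the tutorial’s source code, and the exact formula is an assumption based on the rules above), the image-count selection could be generalized to account for how many images the application wants to own at the same time:

// Sketch: a generalized image-count helper; "images_owned_at_once" is how many images
// the application wants to acquire simultaneously (this tutorial effectively uses 1)
uint32_t GetSwapChainNumImagesEx( VkSurfaceCapabilitiesKHR &surface_capabilities, uint32_t images_owned_at_once ) {
  // The presentation engine needs minImageCount images to be able to hand out one image;
  // each additional image we want to own at the same time adds one more to the request
  uint32_t image_count = surface_capabilities.minImageCount + (images_owned_at_once - 1);
  // Clamp to the maximum image count, if the platform defines one (0 means "no limit")
  if( (surface_capabilities.maxImageCount > 0) &&
      (image_count > surface_capabilities.maxImageCount) ) {
    image_count = surface_capabilities.maxImageCount;
  }
  return image_count;
}

The tutorial’s code additionally asks for one image more than the minimum, so that a MAILBOX (triple-buffering-like) presentation mode can be used without stalling.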

Selecting a Format for Swap Chain Images

Choosing a format for the images depends on the type of processing/rendering we want to do, that is, if we want to blend the application window with the desktop contents, an alpha value may be required. We must also know what color space is available and whether we operate in a linear or sRGB color space.

Each platform may support a different number of format-colorspace pairs. If we want to use specific ones we must make sure that they are available.

// If the list contains only one entry with undefined format
// it means that there are no preferred surface formats and any can be chosen
if( (surface_formats.size() == 1) &&
    (surface_formats[0].format == VK_FORMAT_UNDEFINED) ) {
  return{ VK_FORMAT_R8G8B8A8_UNORM, VK_COLORSPACE_SRGB_NONLINEAR_KHR };
}

// Check if list contains most widely used R8 G8 B8 A8 format
// with nonlinear color space
for( VkSurfaceFormatKHR &surface_format : surface_formats ) {
  if( surface_format.format == VK_FORMAT_R8G8B8A8_UNORM ) {
    return surface_format;
  }
}

// Return the first format from the list
return surface_formats[0];

17.Tutorial02.cpp, function GetSwapChainFormat()

Earlier we queried for the supported formats, which were placed in an array (a vector in our case). If this array contains only one value with an undefined format, the platform doesn’t have any preferences. We can use any image format we want.

In other cases, we can use only one of the available formats. Here I’m looking for a 32-bit RGBA format with any (linear or not) color space. If it is available I can choose it. If there is no such format I will use the first format from the list (hoping that it is also the best and offers the most precision).

Selecting the Size of the Swap Chain Images

Typically the size of swap chain images will be identical to the window size. We can choose other sizes, but we must fit into the image size constraints. The size of an image that fits the current application window’s size is available in the surface capabilities structure, in the “currentExtent” member.

One thing worth noting is that a special value of “-1” indicates that the application’s window size will be determined by the swap chain size, so we can choose whatever dimension we want. But we must still make sure that the selected size is not smaller and not greater than the defined minimum and maximum constraints.

Selecting the swap chain size may (and probably usually will) look like this:

// Special value of surface extent is width == height == -1
// If this is so we define the size by ourselves but it must fit within defined confines
if( surface_capabilities.currentExtent.width == -1 ) {
  VkExtent2D swap_chain_extent = { 640, 480 };
  if( swap_chain_extent.width < surface_capabilities.minImageExtent.width ) {
    swap_chain_extent.width = surface_capabilities.minImageExtent.width;
  }
  if( swap_chain_extent.height < surface_capabilities.minImageExtent.height ) {
    swap_chain_extent.height = surface_capabilities.minImageExtent.height;
  }
  if( swap_chain_extent.width > surface_capabilities.maxImageExtent.width ) {
    swap_chain_extent.width = surface_capabilities.maxImageExtent.width;
  }
  if( swap_chain_extent.height > surface_capabilities.maxImageExtent.height ) {
    swap_chain_extent.height = surface_capabilities.maxImageExtent.height;
  }
  return swap_chain_extent;
}

// In most cases the size of the swap chain images will be equal to the current window's size
return surface_capabilities.currentExtent;

18.Tutorial02.cpp, function GetSwapChainExtent()

Selecting Swap Chain Usage Flags

Usage flags define how a given image may be used in Vulkan. If we want an image to be sampled (used inside shaders) it must be created with “sampled” usage. If the image should be used as a depth render target, it must be created with “depth and stencil” usage. An image without proper usage “enabled” cannot be used for a given purpose or the results of such operations will be undefined.

For a swap chain we want to render (in most cases) into the image (use it as a render target), so we must specify “color attachment” usage with VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT enum. In Vulkan this usage is always available for swap chains, so we can always set it without any additional checking. But for any other usage we must ensure it is supported – we can do this through a “supportedUsageFlags” member of surface capabilities structure.

// Color attachment flag must always be supported
// We can define other usage flags but we always need to check if they are supported
if( surface_capabilities.supportedUsageFlags & VK_IMAGE_USAGE_TRANSFER_DST_BIT ) {
  return VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT;
}
return 0;

19.Tutorial02.cpp, function GetSwapChainUsageFlags()

In this example we define additional “transfer destination” usage which is required for image clear operation.

Selecting Pre-Transformations

On some platforms we may want our image to be transformed. This is usually the case on tablets when they are oriented in a way other than their default orientation. During swap chain creation we must specify what transformations should be applied to images prior to presenting. We can, of course, use only the supported transforms, which can be found in a “supportedTransforms” member of acquired surface capabilities.

If the selected pre-transform is other than the current transformation (also found in surface capabilities) the presentation engine will apply the selected transformation. On some platforms this may cause performance degradation (probably not noticeable but worth mentioning). In the sample code, I don’t want any transformations but, of course, I must check whether it is supported. If not, I’m just using the same transformation that is currently used.

// Sometimes images must be transformed before they are presented (i.e. due to device's orientation
// being other than default orientation)
// If the specified transform is other than current transform, presentation engine will transform image
// during presentation operation; this operation may hit performance on some platforms
// Here we don't want any transformations to occur so if the identity transform is supported use it
// otherwise just use the same transform as current transform
if( surface_capabilities.supportedTransforms & VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR ) {
  return VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
} else {
  return surface_capabilities.currentTransform;
}

20.Tutorial02.cpp, function GetSwapChainTransform()

Selecting Presentation Mode

Present modes determine the way images will be processed internally by the presentation engine and displayed on the screen. In the past, there was just a single buffer that was displayed all the time. Anything we drew on it was immediately visible, so the whole process of image creation could be observed.

Double buffering was introduced to prevent the visibility of drawing operations: one image was displayed and the second was used to render into. During presentation, the contents of the second image were copied into the first image (earlier) or (later) the images were swapped (remember SwapBuffers() function used in OpenGL applications?) which means that their pointers were exchanged.

Tearing was another issue with displaying images, so the ability to wait for the vertical blank signal was introduced if we wanted to avoid it. But waiting introduced another problem: input lag. So double buffering was changed into triple buffering in which we were drawing into two back buffers interchangeably and during v-sync the most recent one was used for presentation.

This is exactly what presentation modes are for: how to deal with all these issues, how to present images on the screen and whether we want to use v-sync.

Currently there are four presentation modes:

  • IMMEDIATE. Present requests are applied immediately and tearing may be observed (depending on the frames per second). Internally the presentation engine doesn’t use any queue for holding swap chain images.

  • FIFO. This mode is the most similar to OpenGL’s buffer swapping with a swap interval set to 1. The image is displayed (replaces the currently displayed image) only during vertical blanking periods, so no tearing should be visible. Internally, the presentation engine uses a FIFO queue with “numSwapchainImages – 1” elements. Present requests are appended to the end of this queue. During blanking periods, the image from the beginning of the queue replaces the currently displayed image, which may then become available to the application. If all images are in the queue, the application has to wait until v-sync releases the currently displayed image. Only after that does it become available to the application and the program may render into it. This mode must always be available in all Vulkan implementations supporting the swap chain extension.

  • FIFO RELAXED. This mode is similar to FIFO, but when an image is displayed for longer than one blanking period it may be released immediately, without waiting for another v-sync signal (so if we are rendering frames at a lower frequency than the screen’s refresh rate, tearing may be visible).
     
  • MAILBOX. In my opinion, this mode is the most similar to the mentioned triple buffering. The image is displayed only during vertical blanking periods and no tearing should be visible. But internally, the presentation engine uses a queue with only a single element. One image is displayed and one waits in the queue. If the application wants to present another image, it is not appended to the end of the queue but instead replaces the one that waits. So the queue always holds the most recently generated image. This behavior is available if there are more than two images. With two images MAILBOX mode behaves similarly to FIFO (as we have to wait for the displayed image to be released, we don’t have a “spare” image that can be exchanged with the one waiting in the queue).

Deciding on which presentation mode to use depends on the type of operations we want to do. If we want to decode and display movies we want all frames to be displayed in a proper order. So the FIFO mode is in my opinion the best choice. But if we are creating a game, we usually want to display the most recently generated frame. In this case I suggest using MAILBOX because there is no tearing and input lag is minimized. The most recently generated image is displayed and the application doesn’t need to wait for v-sync. But to achieve this behavior, at least three images must be created and this mode may not always be supported.

FIFO mode is always available and requires at least two images but causes the application to wait for v-sync (no matter how many swap chain images were requested). Immediate mode is the fastest. As I understand it, it also requires two images, but it doesn’t make the application wait for the monitor’s refresh rate. On the downside it may cause image tearing. The choice is yours but, as always, we must make sure that the chosen presentation mode is supported.

Earlier we queried for available present modes, so now we must look for the one that best suits our needs. Here is the code in which I’m looking for MAILBOX mode:

// FIFO present mode is always available
// MAILBOX is the lowest latency V-Sync enabled mode (something like triple-buffering) so use it if available
for( VkPresentModeKHR &present_mode : present_modes ) {
  if( present_mode == VK_PRESENT_MODE_MAILBOX_KHR ) {
    return present_mode;
  }
}
return VK_PRESENT_MODE_FIFO_KHR;

21.Tutorial02.cpp, function GetSwapChainPresentMode()

Creating a Swap Chain

Now we have all the data necessary to create a swap chain. We have defined all the required values, and we are sure they fit into the given platform’s constraints.

uint32_t                      desired_number_of_images = GetSwapChainNumImages( surface_capabilities );
VkSurfaceFormatKHR            desired_format = GetSwapChainFormat( surface_formats );
VkExtent2D                    desired_extent = GetSwapChainExtent( surface_capabilities );
VkImageUsageFlags             desired_usage = GetSwapChainUsageFlags( surface_capabilities );
VkSurfaceTransformFlagBitsKHR desired_transform = GetSwapChainTransform( surface_capabilities );
VkPresentModeKHR              desired_present_mode = GetSwapChainPresentMode( present_modes );
VkSwapchainKHR                old_swap_chain = Vulkan.SwapChain;

if( static_cast<int>(desired_usage) == 0 ) {
  printf( "TRANSFER_DST image usage is not supported by the swap chain!" );
  return false;
}

VkSwapchainCreateInfoKHR swap_chain_create_info = {
  VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,  // VkStructureType                sType
  nullptr,                                      // const void                    *pNext
  0,                                            // VkSwapchainCreateFlagsKHR      flags
  Vulkan.PresentationSurface,                   // VkSurfaceKHR                   surface
  desired_number_of_images,                     // uint32_t                       minImageCount
  desired_format.format,                        // VkFormat                       imageFormat
  desired_format.colorSpace,                    // VkColorSpaceKHR                imageColorSpace
  desired_extent,                               // VkExtent2D                     imageExtent
  1,                                            // uint32_t                       imageArrayLayers
  desired_usage,                                // VkImageUsageFlags              imageUsage
  VK_SHARING_MODE_EXCLUSIVE,                    // VkSharingMode                  imageSharingMode
  0,                                            // uint32_t                       queueFamilyIndexCount
  nullptr,                                      // const uint32_t                *pQueueFamilyIndices
  desired_transform,                            // VkSurfaceTransformFlagBitsKHR  preTransform
  VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,            // VkCompositeAlphaFlagBitsKHR    compositeAlpha
  desired_present_mode,                         // VkPresentModeKHR               presentMode
  VK_TRUE,                                      // VkBool32                       clipped
  old_swap_chain                                // VkSwapchainKHR                 oldSwapchain
};

if( vkCreateSwapchainKHR( Vulkan.Device, &swap_chain_create_info, nullptr, &Vulkan.SwapChain ) != VK_SUCCESS ) {
  printf( "Could not create swap chain!\n" );
  return false;
}
if( old_swap_chain != VK_NULL_HANDLE ) {
  vkDestroySwapchainKHR( Vulkan.Device, old_swap_chain, nullptr );
}

return true;

22.Tutorial02.cpp, function CreateSwapChain()

In this code example, at the beginning we gathered all the necessary data described earlier. Next we create a variable of type VkSwapchainCreateInfoKHR. It consists of the following members:

  • sType – Normal structure type, which here must be a VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR.
  • pNext – Pointer reserved for future use (for some extensions to this extension).
  • flags – Value reserved for future use; currently must be set to zero.
  • surface – A handle of a created surface that represents windowing system (our application’s window).
  • minImageCount – Minimal number of images the application requests for a swap chain (must fit into available constraints).
  • imageFormat – Application-selected format for swap chain images; must be one of the supported surface formats.
  • imageColorSpace – Colorspace for swap chain images; only enumerated values of format-colorspace pairs may be used for imageFormat and imageColorSpace (we can’t use format from one pair and colorspace from another pair).
  • imageExtent – Size (dimensions) of swap chain images defined in pixels; must fit into available constraints.
  • imageArrayLayers – Defines the number of layers in a swap chain image (that is, views); typically this value will be one, but if we want to create multiview or stereo (stereoscopic 3D) images, we can set it to a higher value.
  • imageUsage – Defines how application wants to use images; it may contain only values of supported usages; color attachment usage is always supported.
  • imageSharingMode – Describes image-sharing mode when multiple queues are referencing images (I will describe this in more detail later).
  • queueFamilyIndexCount – The number of different queue families from which swap chain images will be referenced; this parameter matters only when VK_SHARING_MODE_CONCURRENT sharing mode is used.
  • pQueueFamilyIndices – An array containing all the indices of queue families that will be referencing swap chain images; must contain at least queueFamilyIndexCount elements and as in queueFamilyIndexCount this parameter matters only when VK_SHARING_MODE_CONCURRENT sharing mode is used.
  • preTransform – Transformations applied to the swap chain image before it can be presented; must be one of the supported values.
  • compositeAlpha – This parameter is used to indicate how the surface (image) should be composited (blended?) with other surfaces on some windowing systems; this value must also be one of the possible values (bits) returned in surface capabilities, but it looks like opaque composition (no blending, alpha ignored) will be always supported (as most of the games will want to use this mode).
  • presentMode – Presentation mode that will be used by a swap chain; only supported mode may be selected.
  • clipped – Connected with ownership of pixels; in general it should be set to VK_TRUE if the application doesn’t want to read back from swap chain images (like ReadPixels()), as this allows some platforms to use more optimal presentation methods; the VK_FALSE value is used in some specific scenarios (if I learn more about these scenarios I will write about them).
  • oldSwapchain – If we are recreating a swap chain, this parameter defines an old swap chain that will be replaced by a newly created one.

So what’s the matter with this sharing mode? Images in Vulkan can be referenced by queues. This means that we can create commands that use these images. These commands are stored in command buffers, and these command buffers are submitted to queues. Queues belong to different queue families. And Vulkan requires us to state how many different queue families and which of them are referencing these images through commands submitted with command buffers.

If we want to reference images from many different queue families at a time we can do so. In this case we must provide “concurrent” sharing mode. But this (probably) requires us to manage image data coherency by ourselves, that is, we must synchronize different queues in such a way that data in the images is proper and no hazards occur—some queues are reading data from images, but other queues haven’t finished writing to them yet.

We may not specify these queue families and just tell Vulkan that only one queue family (queues from one family) will be referencing an image at a time. This doesn’t mean other queues can’t reference these images. It just means they can’t do it all at once, at the same time. So if we want to reference images from one family and then from another we must specifically tell Vulkan: “My image was used inside this queue family, but from now on another family, this one, will be referencing it.” Such a transition is done using an image memory barrier. When only one queue family uses a given image at a time, use the “exclusive” sharing mode.

If any of these requirements are not fulfilled, undefined behavior will probably occur and we may not rely on the image contents.

In this example we are using only one queue, so we don’t have to specify the “concurrent” sharing mode, and we leave the related parameters (queueFamilyIndexCount and pQueueFamilyIndices) blank (or nulled, or zeroed).
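For completeness, here is a hedged sketch (not used in this tutorial) of how the sharing-mode-related fields of the swap chain create info might be filled when a graphics queue family and a separate present queue family both reference the images; the GraphicsQueueFamilyIndex member is an assumption and does not exist in this tutorial’s code:

// Sketch: choosing the sharing mode when two different queue families reference swap chain images
uint32_t queue_family_indices[] = {
  Vulkan.GraphicsQueueFamilyIndex,    // hypothetical member, assumed for this sketch
  Vulkan.PresentQueueFamilyIndex
};

if( Vulkan.GraphicsQueueFamilyIndex != Vulkan.PresentQueueFamilyIndex ) {
  // Different families: concurrent mode lets both reference the images without ownership transfers
  swap_chain_create_info.imageSharingMode      = VK_SHARING_MODE_CONCURRENT;
  swap_chain_create_info.queueFamilyIndexCount = 2;
  swap_chain_create_info.pQueueFamilyIndices   = queue_family_indices;
} else {
  // Same family: exclusive mode; the queue family index fields are ignored
  swap_chain_create_info.imageSharingMode      = VK_SHARING_MODE_EXCLUSIVE;
  swap_chain_create_info.queueFamilyIndexCount = 0;
  swap_chain_create_info.pQueueFamilyIndices   = nullptr;
}

With the exclusive mode, ownership of the images would have to be transferred between the families with image memory barriers, as described above.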

So now we can call the vkCreateSwapchainKHR() function to create a swap chain and check whether this operation succeeded. After that (if we are recreating the swap chain, meaning this isn’t the first time we are creating one) we should destroy the previous swap chain. I’ll discuss this later.

Image Presentation

We now have a working swap chain that contains several images. To use these images as render targets, we can get handles to all images created with a swap chain, but we are not allowed to use them just like that. Swap chain images belong to and are owned by the swap chain. This means that the application cannot use these images until it asks for them. This also means that images are created and destroyed by the platform along with a swap chain (not by the application).

So when the application wants to render into a swap chain image or use it in any other way, it must first get access to it by asking a swap chain for it. If the swap chain makes us wait, we have to wait. And after the application finishes using the image it should “return” it by presenting it. If we forget about returning images to a swap chain, we will soon run out of images and nothing will display on the screen.

The application may also request access to more images at once but they must be available. Acquiring access may require waiting. In corner cases, when there are too few images in a swap chain and the application wants to access too many of them, or if we forget about returning images to a swap chain, the application may even wait an infinite amount of time.

Given that there are (usually) at least two images, it may sound strange that we have to wait, but it is quite reasonable. Not all images are available for the application because they are used by the presentation engine. Usually one image is displayed. Additional images may also be required for the presentation engine to work properly. So we can’t use them because it could block the presentation engine in some way. We don’t know its internal mechanisms and algorithms or the requirements of the OS the application is executed on. So the availability of images may depend on many factors: internal implementation, OS, number of created images, number of images the application wants to use at a single time and on the selected presentation mode, which is the most important factor from the perspective of this tutorial.

In immediate mode, one image is always presented. Other images (at least one) are available for the application. When the application posts a presentation request (“returns” an image), the image that was displayed is replaced with the new one. So if two images are created, only one image may be available to the application at a single time. When the application asks for another image, it must “return” the previous one. If it wants two images at a time, it must create a swap chain with more images, or it will wait forever. When we request more images, in immediate mode, the application can ask for (own) “imageCount – 1” images at a time.

In FIFO mode one image is displayed, and the rest are placed in a FIFO queue. The length of this queue is always equal to “imageCount – 1.” At first, all images may be available to the application (because the queue is empty and no image is presented). When the application presents an image (“returns” it to a swap chain), it is appended to the end of the queue. So as soon as the queue fills, the application has to wait for another image until the displayed image is released during the vertical blanking period. Images are always displayed in the same order they were presented in by the application. When the v-sync signal appears, the first image from the queue replaces the image that was displayed. The previously displayed image (the released one) may become available to the application as it becomes unused (isn’t presented and is not waiting in the queue). If all images are in the queue, the application will wait for the next blanking period to access another image. If rendering takes longer than the refresh period, the application will not have to wait at all. This behavior doesn’t change when there are more images: the internal swap chain queue always has “imageCount – 1” elements.

The last mode available for the time being is MAILBOX. As previously mentioned, this mode is the most similar to the “traditional” triple buffering. One image is always displayed. The second image waits in a single-element queue (it has room for only one element). The rest of the images may be available for the application. When the application presents an image, it replaces the one waiting in the queue. The image in the queue gets displayed only during blanking periods, but the application doesn’t need to wait for the next image (when there are more than two images). MAILBOX mode with only two images behaves identically to FIFO mode: the application must wait for the v-sync signal to acquire the next image. But with at least three images it may immediately acquire the image that was replaced by the “presented” image (the one waiting in the queue). That’s why I requested one more image than the minimal number. If MAILBOX mode is available I want to use it in a manner similar to triple buffering (maybe the first thing to do is to check what mode is available and after that choose the number of swap chain images based on the selected presentation mode).

I hope these examples help you understand why the application must ask for an image if it wants to use any. In Vulkan we can only do what is allowed and required—not less and usually not too much more.

uint32_t image_index;
VkResult result = vkAcquireNextImageKHR( Vulkan.Device, Vulkan.SwapChain, UINT64_MAX, Vulkan.ImageAvailableSemaphore, VK_NULL_HANDLE, &image_index );
switch( result ) {
  case VK_SUCCESS:
  case VK_SUBOPTIMAL_KHR:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during swap chain image acquisition!\n" );
    return false;
}

23.Tutorial02.cpp, function Draw()

To access an image, we must call the vkAcquireNextImageKHR() function. During the call we must specify (apart from the device handle, as in almost all other functions) a swap chain from which we want to use an image, a timeout, a semaphore, and a fence object. In case of success, the function stores the image index in the variable whose address we provided. Why an index and not the (handle to the) image itself? Such behavior may be convenient (that is, during the “preprocessing” phase when we want to prepare as much of the data needed for rendering as possible, so as not to waste time during typical frame rendering), but I will describe it later. Just remember that we can check what images were created in a swap chain if we want (we just can’t use them until we are allowed). An array of images will be provided upon such a query. And the vkAcquireNextImageKHR() function stores an index into this very array.

We have to specify a timeout because sometimes images may not be immediately available. Trying to use an image before we are allowed to will cause undefined behavior. Specifying a timeout gives the presentation engine time to react. If it needs to wait for the next vertical blanking period it can do so, and we give it time. So this function may block, but for no longer than the specified time. We can provide the maximum available value (like the UINT64_MAX used in the code above) so the function may even block indefinitely. If we provide 0 for the timeout, the function will return immediately. If any image was available at the time the call occurred, it will be provided immediately. If there was no available image, an error will be returned stating that the image was not yet ready.
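As a small, hedged illustration of that last case (this polling variant is not used in the tutorial), the call with a zero timeout might be handled like this:

// Sketch: polling for a swap chain image instead of blocking (0 ns timeout)
uint32_t image_index;
VkResult result = vkAcquireNextImageKHR( Vulkan.Device, Vulkan.SwapChain, 0, Vulkan.ImageAvailableSemaphore, VK_NULL_HANDLE, &image_index );
if( result == VK_NOT_READY ) {
  // No image is available right now; do some other work and try again later
  return true;
}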

Once we have our image we can use it however we want. Images are processed or referenced by commands stored in command buffers. We can prepare command buffers earlier (to save as much processing time for rendering as we can) and use or submit them here. Or we can prepare the commands now and submit them when we’re done. In Vulkan, creating command buffers and submitting them to queues is the only way to cause operations to be performed by the device.

When command buffers are submitted to queues, all their commands start being processed. But a queue cannot use an image until it is allowed to, and the semaphore we created earlier is for internal queue synchronization—before the queue starts processing commands that reference a given image, it should wait on this semaphore (until it gets signaled). But this wait doesn’t block an application. There are two synchronization mechanisms for accessing swap chain images: (1) a timeout, which may block an application but doesn’t stop queue processing, and (2) a semaphore, which doesn’t block the application but blocks selected queues.

We now know (theoretically) how to render anything (through command buffers). So let’s now imagine that some rendering operations take place inside the command buffer we are submitting. But before the processing starts, we should tell the queue (on which this rendering will occur) to wait. All of this is done within one submit operation.

VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_TRANSFER_BIT;
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &Vulkan.ImageAvailableSemaphore,              // const VkSemaphore           *pWaitSemaphores
  &wait_dst_stage_mask,                         // const VkPipelineStageFlags  *pWaitDstStageMask
  1,                                            // uint32_t                     commandBufferCount
  &Vulkan.PresentQueueCmdBuffers[image_index],  // const VkCommandBuffer       *pCommandBuffers
  1,                                            // uint32_t                     signalSemaphoreCount
  &Vulkan.RenderingFinishedSemaphore            // const VkSemaphore           *pSignalSemaphores
};

if( vkQueueSubmit( Vulkan.PresentQueue, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}

24.Tutorial02.cpp, function Draw()

First we prepare a structure with information about the types of operations we want to submit to the queue. This is done through VkSubmitInfo structure. It contains the following fields:

  • sType – Standard structure type; here it must be set to VK_STRUCTURE_TYPE_SUBMIT_INFO.
  • pNext – Standard pointer reserved for future use.
  • waitSemaphoreCount – Number of semaphores we want the queue to wait on before it starts processing commands from command buffers.
  • pWaitSemaphores – Pointer to an array with semaphore handles on which queue should wait; this array must contain at least waitSemaphoreCount elements.
  • pWaitDstStageMask – Pointer to an array with the same amount of elements as pWaitSemaphores array; it describes the pipeline stages at which each (corresponding) semaphore wait will occur; in our example, the queue may perform some operations before it starts using the image from the swap chain so there is no reason to block all of the operations; the queue may start processing some drawing commands and until pipeline gets to the stage in which the image is used, it will wait.            
  • commandBufferCount – Number of command buffers we are submitting for execution.
  • pCommandBuffers – Pointer to an array with command buffers handles which must contain at least commandBufferCount elements.
  • signalSemaphoreCount – Number of semaphores we want the queue to signal after processing all the submitted command buffers.
  • pSignalSemaphores – Pointer to an array of at least signalSemaphoreCount elements with semaphore handles; these semaphores will be signaled after the queue has finished processing commands submitted within this submit information.

In this example we are telling the queue to wait only on one semaphore, which will be signaled by the presentation engine when the queue can safely start processing commands referencing the swap chain image.

We also submit just one simple command buffer. It was prepared earlier (I will describe how to do it later). It only clears the acquired image. But this is enough for us to see the selected color in our application’s window and to see that the swap chain is working properly.

In the code above, the command buffers are arranged in an array (a vector, to be more precise). To make it easier to submit the proper command buffer—the one that references the currently acquired image—I prepared a separate command buffer for each swap chain image. The index of an image that the vkAcquireNextImageKHR() function provides can be used right here. Using image handles (in similar scenarios) would require creating maps that would translate the handle into a specific command buffer or index. On the other hand, normal numbers can be used to just select a specific array element. This is why this function gives us indices and not image handles.

After we have submitted a command buffer, all the processing starts in the background, on “hardware.” Next, we want to present a rendered image. Presenting means that we want our image to be displayed and that we are “giving it back” to the swap chain. The code to do this might look like this:

VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,           // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &Vulkan.RenderingFinishedSemaphore,           // const VkSemaphore           *pWaitSemaphores
  1,                                            // uint32_t                     swapchainCount
  &Vulkan.SwapChain,                            // const VkSwapchainKHR        *pSwapchains
  &image_index,                                 // const uint32_t              *pImageIndices
  nullptr                                       // VkResult                    *pResults
};
result = vkQueuePresentKHR( Vulkan.PresentQueue, &present_info );

switch( result ) {
  case VK_SUCCESS:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
  case VK_SUBOPTIMAL_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during image presentation!\n" );
    return false;
}

return true;

25.Tutorial02.cpp, function Draw()

An image (or images) is presented by calling the vkQueuePresentKHR() function. It may be perceived as submitting a command buffer with only one operation: presentation.

To present images we must specify how many and which swap chains we want to present to, and which image from each of them should be presented. We can present many images from many swap chains at once (that is, to multiple windows), but only one image from any single swap chain can be presented at a time. We provide this information through the VkPresentInfoKHR structure, which contains the following fields:

  • sType – Standard structure type, it must be a VK_STRUCTURE_TYPE_PRESENT_INFO_KHR here.
  • pNext – Parameter reserved for future use.
  • waitSemaphoreCount – The number of semaphores we want the queue to wait on before it presents images.
  • pWaitSemaphores – Pointer to an array with semaphore handles on which the queue should wait; this array must contain at least waitSemaphoreCount elements.
  • swapchainCount – The number of swapchains to which we would like to present images.
  • pSwapchains – An array with swapchainCount elements that contains handles of all the swap chains that we want to present images to; any single swap chain may only appear once in this array.
  • pImageIndices – An array with swapchainCount elements that contains indices of images that we want to present; each element of this array corresponds to a swap chain in the pSwapchains array; the image index is the index into the array of each swap chain’s images (see the next section).
  • pResults – A pointer to an array of at least swapchainCount elements; this parameter is optional and can be set to null, but if we provide such an array, the result of the presenting operation will be stored in each of its elements, for each swap chain respectively; the single value returned by the whole function is the same as the worst result value from all swap chains.

Now that we have prepared this structure, we can use it to present an image. In this example I’m just presenting a single image from a single swap chain.

Each operation that is performed (or submitted) by calling vkQueue…() functions (this includes presenting) is appended to the end of the queue for processing. Operations are processed in the order in which they were submitted. For a presentation, we are presenting an image after submitting other command buffers. So the present queue will start presenting an image after the processing of all the command buffers is done. This ensures that the image will be presented after we are done using it (rendering into it) and an image with correct contents will be displayed on the screen. But in this example we submit drawing (clearing) operations and a present operation to the same queue: the PresentQueue. We are doing only simple operations that are allowed to be done on a present queue.

If we want to perform drawing operations on a queue that is different than the present operation, we need to synchronize the queues. This is done, again, with semaphores, which is the reason why we created two semaphores (the second one may not be necessary in this example, as we render and present using the same queue, but I wanted to show how it should be done in the correct way).

The first semaphore is for the presentation engine to tell the queue that it can safely use (reference/render into) an image. The second semaphore is for us. It is signaled when the operations on the image (rendering into it) are done. The submit info structure has a field called pSignalSemaphores. It is an array of semaphore handles that will be signaled after processing of all of the submitted command buffers is finished. So we need to tell the second queue to wait on this second semaphore. We store the handle of our second semaphore in the pWaitSemaphores field of the VkPresentInfoKHR structure. And the queue to which we submit the present operation will wait, thanks to this second semaphore, until we are done rendering into a given image.
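To make this more concrete, here is a minimal sketch of how the calls could be split across two queues; the Vulkan.GraphicsQueue member (and recording the drawing commands for a graphics queue) is an assumption, since this tutorial submits everything to the present queue:

// Sketch: rendering on a (hypothetical) graphics queue, presenting on the present queue
// The graphics queue waits for the image to become available and signals when rendering is done
submit_info.pWaitSemaphores   = &Vulkan.ImageAvailableSemaphore;
submit_info.pSignalSemaphores = &Vulkan.RenderingFinishedSemaphore;
if( vkQueueSubmit( Vulkan.GraphicsQueue, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}

// The present queue waits until rendering is finished before it presents the image
present_info.pWaitSemaphores = &Vulkan.RenderingFinishedSemaphore;
result = vkQueuePresentKHR( Vulkan.PresentQueue, &present_info );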

And that’s it. We have displayed our first image using Vulkan!

Checking What Images Were Created in a Swap Chain

Previously I mentioned swap chain’s image indices. Here in this code sample, I show you more specifically what I was talking about.

uint32_t image_count = 0;
if( (vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, nullptr ) != VK_SUCCESS) ||
    (image_count == 0) ) {
  printf( "Could not get the number of swap chain images!\n" );
  return false;
}

std::vector<VkImage> swap_chain_images( image_count );
if( vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, &swap_chain_images[0] ) != VK_SUCCESS ) {
  printf( "Could not get swap chain images!\n" );
  return false;
}

26. -

This code sample is a fragment of an imaginary function that checks how many and what images were created inside a swap chain. It is done with the traditional “double-call,” this time using the vkGetSwapchainImagesKHR() function. First we call it with the last parameter set to null. This way the number of all images created in a swap chain is stored in the “image_count” variable and we know how much storage we need to prepare for the handles of all the images. The second time we call this function, we receive the handles in the array whose address we provided through the last parameter.

Now we know all the images that the swap chain is using. For the vkAcquireNextImageKHR() function and VkPresentInfoKHR structure, the indices I referred to are the indices into this array, an array “returned” by the vkGetSwapchainImagesKHR() function. It is called an array of a swap chain’s presentable images. And if any function, in the case of a swap chain, wants us to provide an index or returns an index, it is the index of an image in this very array.

Recreating a Swap Chain

Previously, I mentioned that sometimes we must recreate a swap chain, and I also said that the old swap chain must be destroyed. The vkAcquireNextImageKHR() and vkQueuePresentKHR() functions return a result that sometimes causes the OnWindowSizeChanged() function to be called. This function recreates the swap chain.

Sometimes a swap chain becomes out of date. This means that the properties of the surface, platform, or application window changed in such a way that the current swap chain cannot be used any more. The most obvious (and unfortunately not so good) example is when the window’s size changed: we cannot change the size of a swap chain’s images after they are created. The only possibility is to destroy and recreate the swap chain. There are also situations in which we can still use a swap chain, but it may no longer be optimal for the surface it was created for.

These situations are notified by the return codes of the vkAcquireNextImageKHR() and vkQueuePresentKHR() functions.

When the VK_SUBOPTIMAL_KHR value is returned, we can still use the current swap chain for presentation. It will still work, but not optimally (that is, color precision may be worse). It is advised to recreate the swap chain when there is an opportunity. A good example is when we have performed performance-heavy rendering and, after acquiring the image, we are informed that our image is suboptimal. We don’t want to waste all this processing and make the user wait much longer for another frame. We just present the image and recreate the swap chain as soon as there is an opportunity.

When VK_ERROR_OUT_OF_DATE_KHR is returned, we cannot use the current swap chain at all; presenting with it will fail. We have to recreate the swap chain as soon as possible.

I have mentioned that changing the window size is the most obvious, but not so good, example of a surface property change after which we should recreate a swap chain. It is not so good because we may not be notified about it with the mentioned return codes. We should monitor window size changes ourselves using OS-specific code, and that’s why the name of this function in our source is OnWindowSizeChanged: it is called every time the window’s size has changed. But as this function only recreates the swap chain (and command buffers), the same function can also be called when the return codes indicate a problem.

Recreation is done the same way as creation. There is a structure member in which we provide the swap chain that the new one should replace. But we must explicitly destroy the old swap chain ourselves after the new one is created.
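A minimal sketch of such a recreation path, assuming the functions defined in this tutorial and omitting the cleanup of the old command pool and buffers for brevity, might look like this (treat it as an illustration, not the tutorial’s exact implementation):

// Sketch: called when the window size changes or the swap chain is reported out of date
bool OnWindowSizeChanged() {
  // Make sure no queue is still using the old swap chain images
  vkDeviceWaitIdle( Vulkan.Device );

  // CreateSwapChain() passes the previous swap chain as "oldSwapchain" and destroys it afterwards
  if( !CreateSwapChain() ) {
    return false;
  }
  // Command buffers reference swap chain images, so they must be recreated as well
  if( !CreateCommandBuffers() ) {
    return false;
  }
  return true;
}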

Quick Dive into Command Buffers

You now know a lot about swap chains, but there is still one important thing you need to know. To explain it, I will briefly show you how to prepare drawing commands. That one last important thing about swap chains is connected with drawing and preparing command buffers. I will present only information about how to clear images, but it is enough to check whether our swap chain is working as it should.

In the first tutorial, I described queues and queue families. If we want to execute commands on a device we submit them to queues through command buffers. To put it in other words: commands are encapsulated inside command buffers. Submitting such buffers to queues causes devices to start processing commands that were recorded in them. Do you remember OpenGL’s drawing lists? We could prepare lists of commands that cause the geometry to be drawn in a form of a list of, well, drawing commands. The situation in Vulkan is similar, but far more flexible and advanced.

Creating Command Buffer Memory Pool

To store commands, a command buffer needs some storage. To prepare space for commands we create a pool from which the buffer can allocate its memory. We don’t specify the amount of space—it is allocated dynamically when the buffer is built (recorded).

Remember that command buffers can be submitted only to proper queue families, and only the types of operations compatible with a given family can be submitted to a given queue. Also, the command buffer itself is not connected with any queue or queue family, but the memory pool from which the buffer allocates its memory is. So each command buffer that takes memory from a given pool can only be submitted to a queue from the proper queue family: the family for which the memory pool was created. If there are more queues created from a given family, we can submit a command buffer to any one of them; the family index is the most important thing here.

VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,     // VkStructureType              sType
  nullptr,                                        // const void*                  pNext
  0,                                              // VkCommandPoolCreateFlags     flags
  Vulkan.PresentQueueFamilyIndex                  // uint32_t                     queueFamilyIndex
};

if( vkCreateCommandPool( Vulkan.Device, &cmd_pool_create_info, nullptr, &Vulkan.PresentQueueCmdPool ) != VK_SUCCESS ) {
  printf( "Could not create a command pool!\n" );
  return false;
}

27.Tutorial02.cpp, function CreateCommandBuffers()

To create a pool for command buffer(s) we call a vkCreateCommandPool() function. It requires us to provide (an address of) a variable of structure type VkCommandPoolCreateInfo. It contains the following members:

  • sType – A usual type of structure that must be equal to VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO on this occasion.
  • pNext – Pointer reserved for future use.
  • flags – Value reserved for future use.
  • queueFamilyIndex – Index of a queue family for which this pool is created.

For our test application, we use only one queue from a presentation family, so we should use its index. Now we can call the vkCreateCommandPool() function and check whether it succeeded. If yes, the handle to the command pool will be stored in a variable we have provided the address of.

Allocating Command Buffers

Next, we need to allocate the command buffer itself. Command buffers are not created in a typical way; they are allocated from pools. Other objects that take their memory from pool objects are likewise allocated, while the pools themselves are created. That’s why there is a separation in the names of the functions vkCreate…() and vkAllocate…().

As described earlier, I allocate more than one command buffer—one for each swap chain image that will be referenced by the drawing commands. So each time we acquire an image from a swap chain we can submit/use the proper command buffer.

uint32_t image_count = 0;
if( (vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, nullptr ) != VK_SUCCESS) ||
    (image_count == 0) ) {
  printf( "Could not get the number of swap chain images!\n" );
  return false;
}

Vulkan.PresentQueueCmdBuffers.resize( image_count );

VkCommandBufferAllocateInfo cmd_buffer_allocate_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, // VkStructureType              sType
  nullptr,                                        // const void*                  pNext
  Vulkan.PresentQueueCmdPool,                     // VkCommandPool                commandPool
  VK_COMMAND_BUFFER_LEVEL_PRIMARY,                // VkCommandBufferLevel         level
  image_count                                     // uint32_t                     commandBufferCount
};
if( vkAllocateCommandBuffers( Vulkan.Device, &cmd_buffer_allocate_info, &Vulkan.PresentQueueCmdBuffers[0] ) != VK_SUCCESS ) {
  printf( "Could not allocate command buffers!\n" );
  return false;
}

if( !RecordCommandBuffers() ) {
  printf( "Could not record command buffers!\n" );
  return false;
}
return true;

28.Tutorial02.cpp, function CreateCommandBuffers()

First we need to know how many swap chain images were created (a swap chain may create more images than we have specified). This was explained in an earlier section. We call the vkGetSwapchainImagesKHR() function with the last parameter set to null. Right now we don’t need the handles of images, only their total number. After that we prepare an array (vector) of the proper size and allocate the command buffers. To do this we call the vkAllocateCommandBuffers() function. It requires us to prepare a structured variable of type VkCommandBufferAllocateInfo, which contains the following fields:

  • sType – Type of a structure, this time equal to VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO.
  • pNext – Normal parameter reserved for future use.
  • commandPool – Command pool from which the buffer will be allocating its memory during commands recording.
  • level – Type (level) of command buffer. There are two levels: primary and secondary. Secondary command buffers may only be referenced (used) from primary command buffers. Because we don’t have any other buffers, we need to create primary buffers here.
  • commandBufferCount – The number of command buffers we want to allocate at once.

After calling the vkAllocateCommandBuffers() function, we need to check whether the allocation succeeded. If yes, we are done allocating command buffers and we are ready to record some (simple) commands.

Recording Command Buffers

Command recording is the most important operation we will be doing in Vulkan. The recording itself also requires us to provide a lot of information; the more complicated the drawing commands, the more information is needed.

Here is a set of variables required (in this tutorial) to record command buffers:

uint32_t image_count = static_cast<uint32_t>(Vulkan.PresentQueueCmdBuffers.size());

std::vector<VkImage> swap_chain_images( image_count );
if( vkGetSwapchainImagesKHR( Vulkan.Device, Vulkan.SwapChain, &image_count, &swap_chain_images[0] ) != VK_SUCCESS ) {
  printf( "Could not get swap chain images!\n" );
  return false;
}

VkCommandBufferBeginInfo cmd_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,  // VkStructureType                        sType
  nullptr,                                      // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, // VkCommandBufferUsageFlags              flags
  nullptr                                       // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};

VkClearColorValue clear_color = {
  { 1.0f, 0.8f, 0.4f, 0.0f }
};

VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,                    // VkImageAspectFlags                     aspectMask
  0,                                            // uint32_t                               baseMipLevel
  1,                                            // uint32_t                               levelCount
  0,                                            // uint32_t                               baseArrayLayer
  1                                             // uint32_t                               layerCount
};

29.Tutorial02.cpp, function RecordCommandBuffers()

First we get the handles of all the swap chain images, which will be used in drawing commands (we will just clear them to one single color but nevertheless we will use them). We already know the number of images, so we don’t have to ask for it again. The handles of images are stored in a vector after calling the vkGetSwapchainImagesKHR() function.

Next, we need to prepare a variable of structured type VkCommandBufferBeginInfo. It contains the information necessary in more typical rendering scenarios (like render passes). We won’t be doing such operations here and that’s why we can set almost all parameters to zeros or nulls. But, for clarity, the structure contains the following fields:

  • sType – Structure type, this time it must be set to VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO.
  • pNext – Pointer reserved for future use, leave it to null.
  • flags – Parameter defining preferred usage of a command buffer.
  • pInheritanceInfo – Parameter pointing to another structure that is used in more typical rendering scenarios.

Command buffers gather commands. To store commands in command buffers, we record them. The above structure provides some necessary information for the driver to prepare for and optimize the recording process.

In Vulkan, command buffers are divided into primary and secondary. Primary command buffers are typical command buffers similar to drawing lists. They are independent, individual “beings” and they (and only they) may be submitted to queues. Secondary command buffers can also store commands (we also record them), but they may only be referenced from within primary command buffers (we can call secondary command buffers from within primary command buffers like calling OpenGL’s drawing lists from another drawing lists). We can’t submit secondary command buffers directly to queues.
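As a tiny, hedged illustration (not part of this tutorial’s code, and the variable names are assumptions), replaying a secondary command buffer from a primary one boils down to a single call recorded into the primary buffer:

// Sketch: a primary command buffer (currently being recorded) replays a secondary one;
// "secondary_cmd_buffer" is assumed to be allocated with VK_COMMAND_BUFFER_LEVEL_SECONDARY
// and already recorded
vkCmdExecuteCommands( primary_cmd_buffer, 1, &secondary_cmd_buffer );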

All of this information will be described in more detail in a forthcoming tutorial.

In this simple example we want to clear our images with one single value. So next we set up a color that will be used for clearing. You can pick any value you like. I used a light orange color.

The last variable in the code above specifies the parts of the image that our operations will be performed on. Our image consists of only one mipmap level and one array level (no stereoscopic buffers, and so on). We set values in the VkImageSubresourceRange structure accordingly. This structure contains the following fields:

  • aspectMask – Depends on the image format as we are using images as color render targets (they have “color” format) so we specify “color aspect” here.
  • baseMipLevel – First mipmap level that will be accessed (modified).
  • levelCount – Number of mipmap levels on which operations will be performed (including the base level).
  • baseArrayLayer – First array layer that will be accessed (modified).
  • layerCount – Number of layers the operations will be performed on (including the base layer).

We are almost ready to record some buffers.

Image Layouts and Layout Transitions

The last variable required in the above code example (of type VkImageSubresourceRange) specifies the parts of the image that operations will be performed on. In this lesson we only clear an image. But we also need to perform resource transitions. Remember the code when we selected a use for a swap chain image before the swap chain itself was created? Images may be used for different purposes. They may be used as render targets, as textures that can be sampled from inside the shaders, or as a data source for copy/blit operations (data transfers). We must specify different usage flags during image creation for the different types of operations we want to perform with or on images. We can specify more usage flags if we want (if they are supported; “color attachment” usage is always available for swap chains). But image usage specification is not the only thing we need to do. Depending on the type of operation, images may be differently allocated or may have a different layout in memory. Each type of image operation may be connected with a different “image layout.” We can use a general layout that is supported by all operations, but it may not provide the best performance. For specific usages we should always use dedicated layouts.

If we create an image with different usages in mind and we want to perform different operations on it, we must change the image’s current layout before we can perform each type of operation. To do this, we must transition from the current layout to another layout that is compatible with the operations we are about to execute.

Each image we create is created (in general) with an undefined layout, and we must transition it to another layout if we want to use the image. But swap-chain-created images have the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout. This layout, as the name suggests, is designed for the image to be used (presented) by the presentation engine (that is, displayed on the screen). So if we want to perform some operations on swap chain images, we need to change their layouts to ones compatible with the desired operations. And after we have finished processing the images (that is, rendering into them) we need to transition their layouts back to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR. Otherwise, the presentation engine will not be able to use these images and undefined behavior may occur.

To transition from one layout to another one, image memory barriers are used. With them we can specify the old layout (current) we are transitioning from and the new layout we are transitioning to. The old layout must always be equal to the current or undefined layout. When we specify the old layout as undefined, image contents may be discarded during transition. This allows the driver to perform some optimizations. If we want to preserve image contents we must specify a layout that is equal to the current layout.

The last variable of type VkImageSubresourceRange in the code example above is also used for image transitions. It defines what “parts” of the image are changing their layout and is required when preparing an image memory barrier.

Recording Command Buffers

The last step is to record a command buffer for each swap chain image. We want to clear the image to some arbitrary color. But first we need to change the image layout and change it back after we are done. Here is the code that does that:

for( uint32_t i = 0; i < image_count; ++i ) {
  VkImageMemoryBarrier barrier_from_present_to_clear = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                        sType
    nullptr,                                    // const void                            *pNext
    VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                          srcAccessMask
    VK_ACCESS_TRANSFER_WRITE_BIT,               // VkAccessFlags                          dstAccessMask
    VK_IMAGE_LAYOUT_UNDEFINED,                  // VkImageLayout                          oldLayout
    VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,       // VkImageLayout                          newLayout
    Vulkan.PresentQueueFamilyIndex,             // uint32_t                               srcQueueFamilyIndex
    Vulkan.PresentQueueFamilyIndex,             // uint32_t                               dstQueueFamilyIndex
    swap_chain_images[i],                       // VkImage                                image
    image_subresource_range                     // VkImageSubresourceRange                subresourceRange
  };

  VkImageMemoryBarrier barrier_from_clear_to_present = {
    VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                        sType
    nullptr,                                    // const void                            *pNext
    VK_ACCESS_TRANSFER_WRITE_BIT,               // VkAccessFlags                          srcAccessMask
    VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                          dstAccessMask
    VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,       // VkImageLayout                          oldLayout
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                          newLayout
    Vulkan.PresentQueueFamilyIndex,             // uint32_t                               srcQueueFamilyIndex
    Vulkan.PresentQueueFamilyIndex,             // uint32_t                               dstQueueFamilyIndex
    swap_chain_images[i],                       // VkImage                                image
    image_subresource_range                     // VkImageSubresourceRange                subresourceRange
  };

  vkBeginCommandBuffer( Vulkan.PresentQueueCmdBuffers[i], &cmd_buffer_begin_info );
  vkCmdPipelineBarrier( Vulkan.PresentQueueCmdBuffers[i], VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier_from_present_to_clear );

  vkCmdClearColorImage( Vulkan.PresentQueueCmdBuffers[i], swap_chain_images[i], VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear_color, 1, &image_subresource_range );

  vkCmdPipelineBarrier( Vulkan.PresentQueueCmdBuffers[i], VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier_from_clear_to_present );
  if( vkEndCommandBuffer( Vulkan.PresentQueueCmdBuffers[i] ) != VK_SUCCESS ) {
    printf( "Could not record command buffers!\n" );
    return false;
  }
}

return true;

30.Tutorial02.cpp, function RecordCommandBuffers()

This code is placed inside a loop. We record a command buffer for each swap chain image; that’s why we needed the number of images. Image handles are also needed here: we must specify them for the image memory barriers and during image clearing. But recall that I said we can’t use swap chain images until we are allowed to, until we acquire an image from the swap chain. That’s true, but we aren’t using them here. We are only preparing commands. The usage itself is performed when we submit the operations (a command buffer) to a queue for execution. Here we are just telling Vulkan: in the future, take this image, do this with it, then that, and after that something more. This way we can prepare as much work as we can before we start the main rendering loop, and we avoid switches, ifs, jumps, and other branches during the real rendering. This scenario won’t be so simple in real life, but I hope the example is clear.

In the code above, we first prepare two image memory barriers. In the case of images, memory barriers can change three different things: the type of memory access, the image layout, and the queue family that owns the image. From the tutorial point of view, only the layouts are interesting right now, but we need to properly set all fields. To set up a memory barrier we need to prepare a variable of type VkImageMemoryBarrier, which contains the following fields:

  • sType – Structure type which here must be set to VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER.
  • pNext – Leave it null, pointer not used right now.
  • srcAccessMask – Types of memory operations done on the image before the barrier.
  • dstAccessMask – Types of memory operations that will take place after the barrier.
  • oldLayout – Layout from which we are transitioning; it should always be equal to the current layout (which in this example, for the first barrier, would be VK_IMAGE_LAYOUT_PRESENT_SRC_KHR). Or we can use an undefined layout, which lets the driver perform some optimizations, but the contents of the image may be discarded. Since we don’t need the contents, we can use an undefined layout here.
  • newLayout – A layout that is compatible with operations we will be performing after the barrier; we want to do image clears; to do that we need to specify VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL layout. We should always use a specific, dedicated layout.
  • srcQueueFamilyIndex – A queue family index that was referencing the image previously.
  • dstQueueFamilyIndex – A family index from which queues will be referencing images after the barrier (this refers to the swap chain sharing mode I was describing earlier).
  • image – Handle to the image itself.
  • subresourceRange – A structure describing parts of an image we want to perform transitions on; this is that last variable from the previous code example.

Some notes are necessary regarding access masks and family indices. In this example, before the first barrier and after the second barrier only the presentation engine has access to the image. The presentation engine only reads from the image (it doesn’t modify it), so we set srcAccessMask in the first barrier and dstAccessMask in the second barrier to VK_ACCESS_MEMORY_READ_BIT. This indicates that the memory associated with the image is read-only (image contents are not modified before the first barrier and after the second barrier). In our command buffer we will only clear the image. This operation belongs to the so-called “transfer” operations. That is why I’ve set VK_ACCESS_TRANSFER_WRITE_BIT in the dstAccessMask field of the first barrier and in the srcAccessMask field of the second barrier.

I won’t go into more detail about queue family indices, but if the queue used for graphics operations and the queue used for presentation are the same, srcQueueFamilyIndex and dstQueueFamilyIndex will be equal, and the hardware won’t make any modifications regarding image access from the queues. But remember that we have specified that only one queue at a time will access/use the image. So if these queues are different, we inform the hardware here about the “ownership” change: a different queue will now access the image. And this is all the information you need right now to properly set up barriers.
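For illustration only (in this tutorial both indices are the same Vulkan.PresentQueueFamilyIndex), if the graphics and presentation queue families were different, the first barrier could also transfer ownership between them. The Vulkan.GraphicsQueueFamilyIndex variable below is hypothetical; this is just a sketch of how the two fields would then be filled:

// Hypothetical sketch – ownership transfer between two different queue families.
VkImageMemoryBarrier barrier_from_present_to_clear = {
  VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                        sType
  nullptr,                                    // const void                            *pNext
  VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                          srcAccessMask
  VK_ACCESS_TRANSFER_WRITE_BIT,               // VkAccessFlags                          dstAccessMask
  VK_IMAGE_LAYOUT_UNDEFINED,                  // VkImageLayout                          oldLayout
  VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,       // VkImageLayout                          newLayout
  Vulkan.PresentQueueFamilyIndex,             // uint32_t        srcQueueFamilyIndex (releasing family)
  Vulkan.GraphicsQueueFamilyIndex,            // uint32_t        dstQueueFamilyIndex (acquiring family)
  swap_chain_images[i],                       // VkImage                                image
  image_subresource_range                     // VkImageSubresourceRange                subresourceRange
};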

We need to create two barriers: one that changes the layout from the “present source” (or undefined) to “transfer dst”. This barrier is used at the beginning of a command buffer, when the presentation engine has previously used the image and now we want to use and modify it. The second barrier is used to change the layout back into the “present source” when we are done using the images and can give them back to the swap chain. This barrier is set at the end of a command buffer.

Now we are ready to start recording our commands by calling the vkBeginCommandBuffer() function. We provide a handle to a command buffer and an address of a variable of type VkCommandBufferBeginInfo and we are ready to go. Next we set up a barrier to change the image layout. We call the vkCmdPipelineBarrier() function, which takes quite a few parameters, but in this example the only relevant ones are the first—the command buffer handle—and the last two: the number of elements (barriers) in an array and a pointer to an array of VkImageMemoryBarrier elements. Elements of this array describe images, their parts, and the types of transitions that should occur. After the barrier we can safely perform any operations on the swap chain image that are compatible with the layout we have transitioned the image to. The general layout is compatible with all operations, but probably with reduced performance.

In the example we are only clearing images, so we call the vkCmdClearColorImage() function. It takes a handle to a command buffer, a handle to an image, the current layout of the image, a pointer to a variable with the clear color value, the number of subresource ranges (the number of elements in the array from the last parameter), and an array of VkImageSubresourceRange elements. Elements in the last array specify what parts of the image we want to clear (we don’t have to clear all mipmaps or array layers of an image if we don’t want to).

And at the end of our recording session we set up another barrier that transitions the image layout back to a “present source” layout. It is the only layout that is compatible with the present operations performed by the presentation engine.

Now we can call the vkEndCommandBuffer() function to inform Vulkan that we have finished recording the command buffer. If something went wrong during recording, we will be informed about it through the value returned by this function. If there were errors, we cannot use the command buffer, and we’ll need to record it once again. If everything is fine, we can later use the command buffer to tell our device to perform the operations stored in it just by submitting the buffer to a queue.

Tutorial 2 Execution

In this example, if everything went fine, we should see a window filled with a light-orange color.

Cleaning Up

Now you know how to create a swap chain, display images in a window and perform simple operations that are executed on a device. We have created command buffers, recorded them, and presented on the screen. Before we close the application, we need to clean up the resources we were using. In this tutorial I have divided cleaning into two functions. The first function clears (destroys) only those resources that should be recreated when the swap chain is recreated (that is, after the size of an application’s window has changed).

if( Vulkan.Device != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( Vulkan.Device );

  if( (Vulkan.PresentQueueCmdBuffers.size() > 0) && (Vulkan.PresentQueueCmdBuffers[0] != VK_NULL_HANDLE) ) {
    vkFreeCommandBuffers( Vulkan.Device, Vulkan.PresentQueueCmdPool, static_cast<uint32_t>(Vulkan.PresentQueueCmdBuffers.size()), &Vulkan.PresentQueueCmdBuffers[0] );
    Vulkan.PresentQueueCmdBuffers.clear();
  }

  if( Vulkan.PresentQueueCmdPool != VK_NULL_HANDLE ) {
    vkDestroyCommandPool( Vulkan.Device, Vulkan.PresentQueueCmdPool, nullptr );
    Vulkan.PresentQueueCmdPool = VK_NULL_HANDLE;
  }
}

31.Tutorial02.cpp, Clear()

First we must be sure that no operations are being executed on the device’s queues (we can’t destroy a resource that is used by currently processed commands). We can ensure this by calling the vkDeviceWaitIdle() function, which will block until all submitted operations are finished.

Next we free all the allocated command buffers. In fact this operation is not necessary here. Destroying a command pool implicitly frees all command buffers allocated from a given pool. But I want to show you how to explicitly free command buffers. Next we destroy the command pool itself.

Here is the code that is responsible for destroying all of the resources created in this lesson:

Clear();

if( Vulkan.Device != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( Vulkan.Device );

  if( Vulkan.ImageAvailableSemaphore != VK_NULL_HANDLE ) {
    vkDestroySemaphore( Vulkan.Device, Vulkan.ImageAvailableSemaphore, nullptr );
  }
  if( Vulkan.RenderingFinishedSemaphore != VK_NULL_HANDLE ) {
    vkDestroySemaphore( Vulkan.Device, Vulkan.RenderingFinishedSemaphore, nullptr );
  }
  if( Vulkan.SwapChain != VK_NULL_HANDLE ) {
    vkDestroySwapchainKHR( Vulkan.Device, Vulkan.SwapChain, nullptr );
  }
  vkDestroyDevice( Vulkan.Device, nullptr );
}

if( Vulkan.PresentationSurface != VK_NULL_HANDLE ) {
  vkDestroySurfaceKHR( Vulkan.Instance, Vulkan.PresentationSurface, nullptr );
}

if( Vulkan.Instance != VK_NULL_HANDLE ) {
  vkDestroyInstance( Vulkan.Instance, nullptr );
}

if( VulkanLibrary ) {
#if defined(VK_USE_PLATFORM_WIN32_KHR)
  FreeLibrary( VulkanLibrary );
#elif defined(VK_USE_PLATFORM_XCB_KHR) || defined(VK_USE_PLATFORM_XLIB_KHR)
  dlclose( VulkanLibrary );
#endif
}

32.Tutorial02.cpp, destructor

First we destroy the semaphores (remember they cannot be destroyed when they are in use, that is, when a queue is waiting on a given semaphore). After that we destroy the swap chain. Images that were created along with it are automatically destroyed, and we don’t need to do it ourselves (we are not even allowed to). Next the device is destroyed. We also need to destroy the surface that represents our application’s window. At the end, the Vulkan instance is destroyed and the graphics driver’s dynamic library is unloaded. Before each step we also check whether a given resource was properly created. We can’t destroy resources that weren’t properly created.

Conclusion

In this tutorial you learned how to display on a screen anything that was created with the Vulkan API. To briefly review the steps: First we enabled the proper instance-level extensions. Next we created a Vulkan representation of an application’s window, called a surface. Then we chose a device with a queue family that supported presentation and created a logical device (don’t forget about enabling device-level extensions!).

After that we created a swap chain. To do that we first acquired a set of parameters describing our surface and then chose values for proper swap chain creation. Those values had to fit into a surface’s supported constraints.

To draw something on the screen we learned how to create and record command buffers, which also included image layout transitions, for which image memory barriers (pipeline barriers) were used. We cleared the images so we could see the selected color being displayed on the screen.

And we also learned how to present a given image on the screen, which included acquiring an image, submitting a command buffer, and the presentation process itself.


Go to: API without Secrets: Introduction to Vulkan* Part 3: First Triangle


Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.

API without Secrets: Introduction to Vulkan* Part 3: First Triangle


Download [PDF 885 KB]

Link to Github Sample Code


Go to: API without Secrets: Introduction to Vulkan* Part 2: Swap Chain



Tutorial 3: First Triangle – Graphics Pipeline and Drawing

In this tutorial we will finally draw something on the screen. One single triangle should be just fine for our first Vulkan-generated “image.”

The graphics pipeline and drawing in general require lots of preparations in Vulkan (in the form of filling many structures with even more different fields). There are potentially many places where we can make mistakes, and in Vulkan, even simple mistakes may lead to the application not working as expected, displaying just a blank screen, and leaving us wondering what went wrong. In such situations validation layers can help us a lot. But I didn’t want to dive into too many different aspects and the specifics of the Vulkan API. So I prepared the code to be as small and as simple as possible.

This led me to create an application that is working properly and displays a simple triangle the way I expected, but it also uses mechanics that are not recommended, not flexible, and also probably not too efficient (though correct). I don’t want to teach solutions that aren’t recommended, but here it simplifies the tutorial quite considerably and allows us to focus only on the minimal required set of API usage. I will point out the “disputable” functionality as soon as we get to it. And in the next tutorial, I will show the recommended way of drawing triangles.

To draw our first simple triangle, we need to create a render pass, a framebuffer, and a graphics pipeline. Command buffers are of course also needed, but we already know something about them. We will create simple GLSL shaders and compile them into Khronos’s SPIR*-V language—the only (at this time) form of shaders that Vulkan (officially) understands.

If nothing displays on your computer’s screen, try to simplify the code as much as possible or even go back to the second tutorial. Check whether the command buffer that just clears the image behaves as expected, and that the color the image was cleared to is properly displayed on the screen. If it is, modify the code and add the parts from this tutorial. Check every return value to make sure it is VK_SUCCESS. If these ideas don’t help, wait for the tutorial about validation layers.

About the Source Code Example

For this and succeeding tutorials, I’ve changed the sample project. The Vulkan preparation phases that were described in the previous tutorials were placed in a “VulkanCommon” class found in separate files (header and source). The class for a given tutorial, responsible for presenting the topics described in that tutorial, inherits from the “VulkanCommon” class and has access to some (required) Vulkan variables like the device or swap chain. This way I can reuse the Vulkan creation code and prepare smaller classes focusing only on the presented topics. The code from the earlier chapters works properly, so it should also be easier to find potential mistakes.

I’ve also added a separate set of files for some utility functions. Here we will be reading SPIR-V shaders from binary files, so I’ve added a function for loading the contents of a binary file. It can be found in the Tools.cpp and Tools.h files.
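The exact implementation in Tools.cpp may differ; a minimal sketch of such a helper, reading a whole binary file into a vector of chars, could look like this:

#include <fstream>
#include <string>
#include <vector>

namespace Tools {

  // Reads the entire contents of a binary file; returns an empty vector on failure.
  std::vector<char> GetBinaryFileContents( std::string const &filename ) {
    std::ifstream file( filename, std::ios::binary | std::ios::ate );
    if( file.fail() ) {
      return std::vector<char>();
    }

    std::streampos size = file.tellg();      // opened at the end, so this is the file size
    file.seekg( 0, std::ios::beg );

    std::vector<char> contents( static_cast<size_t>( size ) );
    file.read( contents.data(), static_cast<std::streamsize>( size ) );
    if( file.fail() ) {
      return std::vector<char>();
    }
    return contents;
  }

} // namespace Tools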

Creating a Render Pass

To draw anything on the screen, we need a graphics pipeline. But creating it now will require pointers to other structures, which will probably also need pointers to yet other structures. So we’ll start with a render pass.

What is a render pass? A general picture can give us a “logical” render pass that may be found in many well-known rendering techniques like deferred shading. This technique consists of many subpasses. The first subpass draws the geometry with shaders that fill the G-Buffer: it stores diffuse color in one texture, normal vectors in another, shininess in another, and depth (position) in yet another. Next, for each light source, drawing is performed that reads some of this data (normal vectors, shininess, depth/position), calculates lighting, and stores it in another texture. The final pass aggregates the lighting data with the diffuse color. This is a (very rough) explanation of deferred shading, but it describes the render pass—a set of data required to perform some drawing operations: storing data in textures and reading data from other textures.

In Vulkan, a render pass represents (or describes) a set of framebuffer attachments (images) required for drawing operations and a collection of subpasses that drawing operations will be ordered into. It is a construct that collects all color, depth, and stencil attachments and the operations modifying them, so that the driver does not have to deduce this information by itself, which may give substantial optimization opportunities on some GPUs. A subpass consists of drawing operations that use (more or less) the same attachments. Each of these drawing operations may read from some input attachments and render data into some other (color, depth, stencil) attachments. A render pass also describes the dependencies between these attachments: in one subpass we perform rendering into a texture, but in another this texture will be used as a source of data (that is, it will be sampled from). All this data helps the graphics hardware optimize drawing operations.

To create a render pass in Vulkan, we call the vkCreateRenderPass() function, which requires a pointer to a structure describing all the attachments involved in rendering and all the subpasses forming the render pass. As usual, the more attachments and subpasses we use, the more array elements containing properly filed structures we need. In our simple example, we will be drawing only into a single texture (color attachment) with just a single subpass.

Render Pass Attachment Description

VkAttachmentDescription attachment_descriptions[] = {
  {
    0,                                          // VkAttachmentDescriptionFlags   flags
    GetSwapChain().Format,                      // VkFormat                       format
    VK_SAMPLE_COUNT_1_BIT,                      // VkSampleCountFlagBits          samples
    VK_ATTACHMENT_LOAD_OP_CLEAR,                // VkAttachmentLoadOp             loadOp
    VK_ATTACHMENT_STORE_OP_STORE,               // VkAttachmentStoreOp            storeOp
    VK_ATTACHMENT_LOAD_OP_DONT_CARE,            // VkAttachmentLoadOp             stencilLoadOp
    VK_ATTACHMENT_STORE_OP_DONT_CARE,           // VkAttachmentStoreOp            stencilStoreOp
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  initialLayout;
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR             // VkImageLayout                  finalLayout
  }
};

1.Tutorial03.cpp, function CreateRenderPass()

To create a render pass, first we prepare an array with elements describing each attachment, regardless of the type of attachment and how it will be used inside a render pass. Each array element is of type VkAttachmentDescription, which contains the following fields:

  • flags – Describes additional properties of an attachment. Currently, only an aliasing flag is available, which informs the driver that the attachment shares the same physical memory with another attachment; it is not the case here so we set this parameter to zero.
  • format – Format of an image used for the attachment; here we are rendering directly into a swap chain so we need to take its format.
  • samples – Number of samples of the image; we are not using any multisampling here so we just use one sample.
  • loadOp – Specifies what to do with the image’s contents at the beginning of a render pass: whether we want them to be cleared, preserved, or we don’t care about them (as we will overwrite them all). Here we want to clear the image to the specified value. This parameter also refers to the depth part of depth/stencil images.
  • storeOp – Informs the driver what to do with the image’s contents after the render pass (after a subpass in which the image was used for the last time). Here we want the contents of the image to be preserved after the render pass as we intend to display them on screen. This parameter also refers to the depth part of depth/stencil images.
  • stencilLoadOp – The same as loadOp but for the stencil part of depth/stencil images; for color attachments it is ignored.
  • stencilStoreOp – The same as storeOp but for the stencil part of depth/stencil images; for color attachments this parameter is ignored.
  • initialLayout – The layout the given attachment will have when the render pass starts (what the layout image is provided with by the application).
  • finalLayout – The layout the driver will automatically transition the given image into at the end of a render pass.

Some additional information is required for load and store operations and initial and final layouts.

Load op refers to the attachment’s contents at the beginning of a render pass. This operation describes what the graphics hardware should do with the attachment: clear it, keep operating on its existing contents (leave them untouched), or treat them as not mattering because the application intends to overwrite them all. This gives the hardware an opportunity to optimize memory operations. For example, if we intend to overwrite all of the contents, the hardware won’t bother with them and, if it is faster, may allocate totally new memory for the attachment.

Store op, as the name suggests, is used at the end of a render pass and informs the hardware whether we want to use the contents of the attachment after the render pass or whether we don’t care about them and they may be discarded. In some scenarios (when contents are discarded) this gives the hardware the ability to create the image in temporary, fast memory, as the image will “live” only during the render pass, and implementations may save some memory bandwidth by avoiding writing back data that is not needed anymore.

When an attachment has a depth format (and potentially also a stencil component) load and store ops refer only to the depth component. If a stencil is present, stencil values are treated the way stencil load and store ops describe. For color attachments, stencil ops are not relevant.

Layout, as I described in the swap chain tutorial, is an internal memory arrangement of an image. Image data may be organized in such a way that neighboring “image pixels” are also neighbors in memory, which can increase cache hits (faster memory reading) when the image is used as a source of data (that is, during texture sampling). But caching is not necessary when the image is used as a target for drawing operations, and the memory for that image may be organized in a totally different way. An image may have a linear layout (which gives the CPU the ability to read or populate the image’s memory contents) or an optimal layout (which is optimized for performance but is also hardware/vendor dependent). So some hardware may have special memory organizations for some types of operations; other hardware may be operations-agnostic. Some memory layouts may be better suited for some intended image “usages,” or, looking from the other side, some usages may require specific memory layouts. There is also a general layout that is compatible with all types of operations. But from the performance point of view, it is always best to set the layout appropriate for the intended image usage, and it is the application’s responsibility to inform the driver about transitions.

Image layouts may be changed using image memory barriers. We did this in the swap chain tutorial when we first changed the layout from the presentation source (image was used by the presentation engine) to transfer destination (we wanted to clear the image with a given color). But layouts, apart from image memory barriers, may also be changed automatically by the hardware inside a render pass. If we specify a different initial layout, subpass layouts (described later), and final layout, the hardware does the transition automatically at the appropriate time.

Initial layout informs the hardware about the layout the application “provides” (or “leaves”) the given attachment with. This is the layout the image starts with at the beginning of a render pass (in our example we acquire the image from the presentation engine so the image has a “presentation source” layout set). Each subpass of a render pass may use a different layout, and the transition will be done automatically by the hardware between subpasses. The final layout is the layout the given attachment will be transitioned into (automatically) at the end of a render pass (after a render pass is finished).

This information must be prepared for each attachment that will be used in a render pass. When graphics hardware receives this information a priori, it may optimize operations and memory during the render pass to achieve the best possible performance.

Subpass Description

VkAttachmentReference color_attachment_references[] = {
  {
    0,                                          // uint32_t                       attachment
    VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL    // VkImageLayout                  layout
  }
};

VkSubpassDescription subpass_descriptions[] = {
  {
    0,                                          // VkSubpassDescriptionFlags      flags
    VK_PIPELINE_BIND_POINT_GRAPHICS,            // VkPipelineBindPoint            pipelineBindPoint
    0,                                          // uint32_t                       inputAttachmentCount
    nullptr,                                    // const VkAttachmentReference   *pInputAttachments
    1,                                          // uint32_t                       colorAttachmentCount
    color_attachment_references,                // const VkAttachmentReference   *pColorAttachments
    nullptr,                                    // const VkAttachmentReference   *pResolveAttachments
    nullptr,                                    // const VkAttachmentReference   *pDepthStencilAttachment
    0,                                          // uint32_t                       preserveAttachmentCount
    nullptr                                     // const uint32_t*                pPreserveAttachments
  }
};

2.Tutorial03.cpp, function CreateRenderPass()

Next we specify the description of each subpass our render pass will include. This is done using VkSubpassDescription structure, which contains the following fields:

  • flags – Parameter reserved for future use.
  • pipelineBindPoint – Type of pipeline in which this subpass will be used (graphics or compute). Our example, of course, uses a graphics pipeline.
  • inputAttachmentCount – Number of elements in the pInputAttachments array.
  • pInputAttachments – Array with elements describing which attachments are used as an input and can be read from inside shaders. We are not using any input attachments here, so we set this pointer to null.
  • colorAttachmentCount – Number of elements in pColorAttachments and pResolveAttachments arrays.
  • pColorAttachments – Array describing (pointing to) attachments that will be used as color render targets (that is, the images that will be rendered into).
  • pResolveAttachments – Array closely connected with color attachments. Each element from this array corresponds to an element from a color attachments array; any such color attachment will be resolved to a given resolve attachment (if a resolve attachment at the same index is not null or if the whole pointer is not null). This is optional and can be set to null.
  • pDepthStencilAttachment – Description of an attachment that will be used for depth (and/or stencil) data. We don’t use depth information here so we can set it to null.
  • preserveAttachmentCount – Number of elements in pPreserveAttachments array.
  • pPreserveAttachments – Array describing attachments that should be preserved. When we have multiple subpasses not all of them will use all attachments. If a subpass doesn’t use some of the attachments but we need their contents in the later subpasses, we must specify these attachments here.

The pInputAttachments, pColorAttachments, pResolveAttachments, pPreserveAttachments, and pDepthStencilAttachment parameters are all of type VkAttachmentReference. This structure contains only these two fields:

  • attachment – Index into an attachment_descriptions array of VkRenderPassCreateInfo.
  • layout – Requested (required) layout the attachment will use during a given subpass. The hardware will perform an automatic transition into a provided layout just before a given subpass.

This structure contains references (indices) into the attachment_descriptions array of VkRenderPassCreateInfo. When we create a render pass we must provide a description of all attachments used during a render pass. We’ve prepared this description earlier in “Render pass attachment description” when we created the attachment_descriptions array. Right now it contains only one element, but in more advanced scenarios there will be multiple attachments. So this “general” collection of all render pass attachments is used as a reference point. In the subpass description, when we fill pColorAttachments or pDepthStencilAttachment members, we provide indices into this very “general” collection, like this: take the first attachment from all render pass attachments and use it as a color attachment. The second attachment from that array will be used for depth data.

There is a separation between a whole render pass and its subpasses because each subpass may use multiple attachments in a different way, that is, in one subpass we are rendering into one color attachment but in the next subpass we are reading from this attachment. In this way, we can prepare a list of all attachments used in the whole render pass, and at the same time we can specify how each attachment will be used in each subpass. And as each subpass may use a given attachment in its own way, we must also specify each image’s layout for each subpass.

So before we can specify a description of all subpasses (an array with elements of type VkSubpassDescription) we must create references for each attachment used in each subpass. And this is what the color_attachment_references variable was created for. When I write a tutorial for rendering into a texture, this usage will be more apparent.
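For illustration only (this tutorial uses just the single color attachment shown above), attachment references for a hypothetical render pass with a color attachment at index 0 and a depth attachment at index 1 could look like this:

// Hypothetical example - not part of this tutorial's code.
VkAttachmentReference color_reference = {
  0,                                                // uint32_t         attachment (index into pAttachments)
  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL          // VkImageLayout    layout
};

VkAttachmentReference depth_reference = {
  1,                                                // uint32_t         attachment (index into pAttachments)
  VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL  // VkImageLayout    layout
};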

Render Pass Creation

We now have all the data we need to create a render pass.

VkRenderPassCreateInfo render_pass_create_info = {
  VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,    // VkStructureType                sType
  nullptr,                                      // const void                    *pNext
  0,                                            // VkRenderPassCreateFlags        flags
  1,                                            // uint32_t                       attachmentCount
  attachment_descriptions,                      // const VkAttachmentDescription *pAttachments
  1,                                            // uint32_t                       subpassCount
  subpass_descriptions,                         // const VkSubpassDescription    *pSubpasses
  0,                                            // uint32_t                       dependencyCount
  nullptr                                       // const VkSubpassDependency     *pDependencies
};

if( vkCreateRenderPass( GetDevice(), &render_pass_create_info, nullptr, &Vulkan.RenderPass ) != VK_SUCCESS ) {
  printf( "Could not create render pass!\n" );
  return false;
}

return true;

3.Tutorial03.cpp, function CreateRenderPass()

We start by filling the VkRenderPassCreateInfo structure, which contains the following fields:

  • sType – Type of structure (VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO here).
  • pNext – Parameter not currently used.
  • flags – Parameter reserved for future use.
  • attachmentCount – Number of all different attachments (elements in pAttachments array) used during whole render pass (here just one).
  • pAttachments – Array specifying all attachments used in a render pass.
  • subpassCount – Number of subpasses a render pass consists of (and number of elements in pSubpasses array – just one in our simple example).
  • pSubpasses – Array with descriptions of all subpasses.
  • dependencyCount – Number of elements in pDependencies array (zero here).
  • pDependencies – Array describing dependencies between pairs of subpasses. We have only one subpass, so we don’t specify any dependencies (set it to null here).

Dependencies describe what parts of the graphics pipeline use a given memory resource and in what way. Each subpass may use resources differently, and the layout of a resource alone may not fully define how it is used. Some subpasses may render into images or store data through shader image stores. Others may not use images at all, or may read from them at different pipeline stages (that is, vertex or fragment).

This information helps the driver optimize automatic layout transitions and, more generally, optimize barriers between subpasses. When we are writing into images only in a vertex shader there is no point waiting until the fragment shader executes (of course in terms of used images). After all the vertex operations are done, images may immediately change their layouts and memory access type, and even some parts of graphics hardware may start executing the next operations (that are referencing or reading the given images) without the need to wait for the rest of the commands from the given subpass to finish. For now, just remember that dependencies are important from a performance point of view.
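To give a feel for what would go into the pDependencies array, here is a hypothetical dependency (not used in this tutorial, which has only one subpass) stating that subpass 1 reads, in its fragment shaders, an attachment that subpass 0 wrote to as a color attachment:

// Hypothetical example - this tutorial sets pDependencies to null.
VkSubpassDependency dependency = {
  0,                                              // uint32_t                 srcSubpass (producer)
  1,                                              // uint32_t                 dstSubpass (consumer)
  VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags     srcStageMask
  VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,          // VkPipelineStageFlags     dstStageMask
  VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,           // VkAccessFlags            srcAccessMask
  VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,            // VkAccessFlags            dstAccessMask
  VK_DEPENDENCY_BY_REGION_BIT                     // VkDependencyFlags        dependencyFlags
};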

So now that we have prepared all the information required to create a render pass, we can safely call the vkCreateRenderPass() function.

Creating a Framebuffer

We have created a render pass. It describes all attachments and all subpasses used during the render pass. But this description is quite abstract. We have specified formats of all attachments (just one image in this example) and described how attachments will be used by each subpass (also just one here). But we didn’t specify WHAT attachments we will be using or, in other words, what images will be used as these attachments. This is done through a framebuffer.

A framebuffer describes specific images that the render pass operates on. In OpenGL*, a framebuffer is a set of textures (attachments) we are rendering into. In Vulkan, this term is much broader. It describes all the textures (attachments) used during the render pass, not only the images we are rendering into (color and depth/stencil attachments) but also images used as a source of data (input attachments).

This separation of render pass and framebuffer gives us some additional flexibility. We can use the given render pass with different framebuffers and a given framebuffer with different render passes, if they are compatible, meaning that they operate in a similar fashion on images of similar types and usages.

Before we can create a framebuffer, we must create image views for each image used as a framebuffer and render pass attachment. In Vulkan, not only in the case of framebuffers, but in general, we don’t operate on images themselves. Images are not accessed directly. For this purpose, image views are used. Image views represent images, they “wrap” images and provide additional (meta)data for them.

Creating Image Views

In this simple application, we want to render directly into swap chain images. We have created a swap chain with multiple images, so we must create an image view for each of them.

const std::vector<VkImage> &swap_chain_images = GetSwapChain().Images;
Vulkan.FramebufferObjects.resize( swap_chain_images.size() );

for( size_t i = 0; i < swap_chain_images.size(); ++i ) {
  VkImageViewCreateInfo image_view_create_info = {
    VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,   // VkStructureType                sType
    nullptr,                                    // const void                    *pNext
    0,                                          // VkImageViewCreateFlags         flags
    swap_chain_images[i],                       // VkImage                        image
    VK_IMAGE_VIEW_TYPE_2D,                      // VkImageViewType                viewType
    GetSwapChain().Format,                      // VkFormat                       format
    {                                           // VkComponentMapping             components
      VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             r
      VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             g
      VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             b
      VK_COMPONENT_SWIZZLE_IDENTITY               // VkComponentSwizzle             a
    },
    {                                           // VkImageSubresourceRange        subresourceRange
      VK_IMAGE_ASPECT_COLOR_BIT,                  // VkImageAspectFlags             aspectMask
      0,                                          // uint32_t                       baseMipLevel
      1,                                          // uint32_t                       levelCount
      0,                                          // uint32_t                       baseArrayLayer
      1                                           // uint32_t                       layerCount
    }
  };

  if( vkCreateImageView( GetDevice(), &image_view_create_info, nullptr, &Vulkan.FramebufferObjects[i].ImageView ) != VK_SUCCESS ) {
    printf( "Could not create image view for framebuffer!\n" );
    return false;
  }

4.Tutorial03.cpp, function CreateFramebuffers()

To create an image view, we must first create a variable of type VkImageViewCreateInfo. It contains the following fields:

  • sType – Structure type, in this case it should be set to VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO.
  • pNext – Parameter typically set to null.
  • flags – Parameter reserved for future use.
  • image – Handle to an image for which view will be created.
  • viewType – Type of view we want to create. The view type must be compatible with the image it is created for (that is, we can create a 2D view for an image that has multiple array layers, or we can create a CUBE view for a 2D image with six layers).
  • format – Format of an image view; it must be compatible with the image’s format but may not be the same format (that is, it may be a different format but with the same number of bits per pixel).
  • components – Mapping of image components into a vector returned in the shader by texturing operations. This applies only to read operations (sampling), but since we are using an image as a color attachment (we are rendering into an image) we must set the so-called identity mapping (R component into R, G -> G, and so on) or just use the “identity” value (VK_COMPONENT_SWIZZLE_IDENTITY).
  • subresourceRange – Describes the set of mipmap levels and array layers that will be accessible to a view. If our image is mipmapped, we may specify the specific mipmap level we want to render to (and in case of render targets we must specify exactly one mipmap level of one array layer).

As you can see here, we acquire handles to all swap chain images, and we are referencing them inside a loop. This way we fill the structure required for image view creation, which we pass to a vkCreateImageView() function. We do this for each image that was created along with a swap chain.

Specifying Framebuffer Parameters

Now we can create a framebuffer. To do this we call the vkCreateFramebuffer() function.

VkFramebufferCreateInfo framebuffer_create_info = {
    VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,  // VkStructureType                sType
    nullptr,                                    // const void                    *pNext
    0,                                          // VkFramebufferCreateFlags       flags
    Vulkan.RenderPass,                          // VkRenderPass                   renderPass
    1,                                          // uint32_t                       attachmentCount
    &Vulkan.FramebufferObjects[i].ImageView,    // const VkImageView             *pAttachments
    300,                                        // uint32_t                       width
    300,                                        // uint32_t                       height
    1                                           // uint32_t                       layers
  };

  if( vkCreateFramebuffer( GetDevice(), &framebuffer_create_info, nullptr, &Vulkan.FramebufferObjects[i].Handle ) != VK_SUCCESS ) {
    printf( "Could not create a framebuffer!\n" );
    return false;
  }
}
return true;

5.Tutorial03.cpp, function CreateFramebuffers()

The vkCreateFramebuffer() function requires us to provide a pointer to a variable of type VkFramebufferCreateInfo, so we must first prepare it. It contains the following fields:

  • sType – Structure type set to VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO in this situation.
  • pNext – Parameter most of the time set to null.
  • flags – Parameter reserved for future use.
  • renderPass – Render pass this framebuffer will be compatible with.
  • attachmentCount – Number of attachments in a framebuffer (elements in pAttachments array).
  • pAttachments – Array of image views representing all attachments used in a framebuffer and render pass. Each element in this array (each image view) corresponds to each attachment in a render pass.
  • width – Width of a framebuffer.
  • height – Height of a framebuffer.
  • layers – Number of layers in a framebuffer (OpenGL’s layered rendering with geometry shaders, which could specify the layer into which fragments rasterized from a given polygon will be rendered).

The framebuffer specifies what images are used as attachments on which the render pass operates. We can say that it translates an image (image view) into a given attachment. The number of images specified for a framebuffer must be the same as the number of attachments in the render pass for which we are creating the framebuffer. Also, each element of the pAttachments array corresponds directly to an attachment in the render pass description structure. Render pass and framebuffer are closely connected, and that’s why we also must specify a render pass during framebuffer creation. But we may use a framebuffer not only with the specified render pass but also with all render passes that are compatible with it. Compatible render passes, in general, must have the same number of attachments, and corresponding attachments must have the same format and number of samples. But image layouts (initial, final, and for each subpass) may differ and do not affect render pass compatibility.

After we have finished creating and filling the VkFramebufferCreateInfo structure, we call the vkCreateFramebuffer() function.

The above code executes in a loop. A framebuffer references image views. Here the image view is created for each swap chain image. So for each swap chain image and its view, we are creating a framebuffer. We are doing this in order to simplify the code called in a rendering loop. In a normal, real-life scenario we wouldn’t (probably) create a framebuffer for each swap chain image. I assume that a better solution would be to render into a single image (texture) and after that use command buffers that would copy rendering results from that image into a given swap chain image. This way we will have only three simple command buffers that are connected with a swap chain. All other rendering commands would be independent of a swap chain, making it easier to maintain.
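As a rough sketch of that alternative (nothing like this appears in the tutorial’s code; the command_buffer and rendered_image handles and the 300 x 300 extent are hypothetical, and both images would first have to be transitioned to the proper transfer layouts), such a per-swap-chain-image command buffer could record a copy like this:

// Hypothetical sketch - copy a separately rendered image into a swap chain image.
VkImageCopy copy_region = {
  { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },       // VkImageSubresourceLayers   srcSubresource
  { 0, 0, 0 },                                  // VkOffset3D                 srcOffset
  { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },       // VkImageSubresourceLayers   dstSubresource
  { 0, 0, 0 },                                  // VkOffset3D                 dstOffset
  { 300, 300, 1 }                               // VkExtent3D                 extent
};

vkCmdCopyImage( command_buffer,
                rendered_image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                swap_chain_images[i], VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
                1, &copy_region );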

Creating a Graphics Pipeline

Now we are ready to create a graphics pipeline. A pipeline is a collection of stages that process data one stage after another. In Vulkan there is currently a compute pipeline and a graphics pipeline. The compute pipeline allows us to perform some computational work, such as performing physics calculations for objects in games. The graphics pipeline is used for drawing operations.

In OpenGL there are multiple programmable stages (vertex, tessellation, fragment shaders, and so on) and some fixed-function stages (rasterizer, depth test, blending, and so on). In Vulkan, the situation is similar. There are similar (if not identical) stages. But the whole pipeline’s state is gathered in one monolithic object. OpenGL allows us to change the state that influences rendering operations anytime we want; we can change parameters for each stage (mostly) independently. We can set up shader programs, depth test, blending, and whatever state we want, and then we can render some objects. Next we can change just some small part of the state and render another object. In Vulkan, such operations can’t be done (we say that pipelines are “immutable”). We must prepare the whole state and set up parameters for pipeline stages and group them in a pipeline object. At the beginning this was one of the most startling pieces of information for me. I’m not able to change a shader program anytime I want? Why?

The easiest and most valid explanation is the performance implications of such state changes. Changing just one single state of the whole pipeline may cause the graphics hardware to perform many background operations like state and error checking. Different hardware vendors may implement (and usually do implement) such functionality differently. This may cause applications to perform differently (meaning unpredictably, performance-wise) when executed on different graphics hardware. So the ability to change anything at any time is convenient for developers. But, unfortunately, it is not so convenient for the hardware.

That’s why in Vulkan the state of the whole pipeline is gathered in one single object. All the relevant state and error checking is performed when the pipeline object is created. When there are problems (like different parts of the pipeline being set up in an incompatible way), pipeline object creation fails. But we know that up front. The driver doesn’t have to worry for us and do whatever it can to properly use such a broken pipeline. It can immediately tell us about the problem. But during real usage, in performance-critical parts of the application, everything is already set up correctly and can be used as is.

The downside of this methodology is that we have to create multiple pipeline objects, multiple variations of pipeline objects when we are drawing many objects in a different way (some opaque, some semi-transparent, some with depth test enabled, others without). Unfortunately, even different shaders make us create different pipeline objects. If we want to draw objects using different shaders, we also have to create multiple pipeline objects, one for each combination of shader programs. Shaders are also connected with the whole pipeline state. They use different resources (like textures and buffers), render into different color attachments, and read from different attachments (possibly that were rendered into before). These connections must also be initialized, prepared, and set up correctly. We know what we want to do, the driver does not. So it is better and far more logical that we do it, not the driver. In general this approach makes sense.

To begin the pipeline creation process, let’s start with shaders.

Creating a Shader Module

Creating a graphics pipeline requires us to prepare lots of data in the form of structures or even arrays of structures. The first such data is a collection of all shader stages and shader programs that will be used during rendering with a given graphics pipeline bound.

In OpenGL, we write shaders in GLSL. They are compiled and then linked into shader programs directly in our application. We can use or stop using a shader program anytime we want in our application.

Vulkan, on the other hand, accepts only a binary representation of shaders, an intermediate language called SPIR-V. We can’t provide GLSL code like we did in OpenGL. But there is an official, separate compiler that can transform shaders written in GLSL into the binary SPIR-V language. We have to run it offline, before our application executes. After we prepare the SPIR-V assembly we can create a shader module from it. Such modules are then composed into an array of VkPipelineShaderStageCreateInfo structures, which are used, among other parameters, to create a graphics pipeline.

Here’s the code that creates a shader module from a specified file that contains a binary SPIR-V.

const std::vector<char> code = Tools::GetBinaryFileContents( filename );
if( code.size() == 0 ) {
  return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
}

VkShaderModuleCreateInfo shader_module_create_info = {
  VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,    // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkShaderModuleCreateFlags      flags
  code.size(),                                    // size_t                         codeSize
  reinterpret_cast<const uint32_t*>(&code[0])     // const uint32_t                *pCode
};

VkShaderModule shader_module;
if( vkCreateShaderModule( GetDevice(), &shader_module_create_info, nullptr, &shader_module ) != VK_SUCCESS ) {
  printf( "Could not create shader module from a %s file!\n", filename );
  return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
}

return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>( shader_module, vkDestroyShaderModule, GetDevice() );

6.Tutorial03.cpp, function CreateShaderModule()

First we prepare a VkShaderModuleCreateInfo structure that contains the following fields:

  • sType – Type of structure, in this example set to VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO.
  • pNext – Pointer not yet used.
  • flags – Parameter reserved for future use.
  • codeSize – Size in bytes of the code passed in pCode parameter.
  • pCode – Pointer to an array with source code (binary SPIR-V assembly).

To acquire the contents of the file, I have prepared a simple utility function GetBinaryFileContents() that reads the entire contents of a specified file. It returns the content in a vector of chars.

After we prepare a structure, we can call the vkCreateShaderModule() function and check whether everything went fine.

The AutoDeleter<> class from Tools namespace is a helper class that wraps a given Vulkan object handle and takes a function that is called to delete that object. This class is similar to smart pointers, which delete the allocated memory when the object (the smart pointer) goes out of scope. AutoDeleter<> class takes the handle of a given object and deletes it with a provided function when the object of this class’s type goes out of scope.

template<class T, class F>
class AutoDeleter {
public:
  AutoDeleter() :
    Object( VK_NULL_HANDLE ),
    Deleter( nullptr ),
    Device( VK_NULL_HANDLE ) {
  }

  AutoDeleter( T object, F deleter, VkDevice device ) :
    Object( object ),
    Deleter( deleter ),
    Device( device ) {
  }

  AutoDeleter( AutoDeleter&& other ) {
    *this = std::move( other );
  }

  ~AutoDeleter() {
    if( (Object != VK_NULL_HANDLE) && (Deleter != nullptr) && (Device != VK_NULL_HANDLE) ) {
      Deleter( Device, Object, nullptr );
    }
  }

  AutoDeleter& operator=( AutoDeleter&& other ) {
    if( this != &other ) {
      Object = other.Object;
      Deleter = other.Deleter;
      Device = other.Device;
      other.Object = VK_NULL_HANDLE;
    }
    return *this;
  }

  T Get() {
    return Object;
  }

  bool operator !() const {
    return Object == VK_NULL_HANDLE;
  }

private:
  AutoDeleter( const AutoDeleter& );
  AutoDeleter& operator=( const AutoDeleter& );
  T         Object;
  F         Deleter;
  VkDevice  Device;
};

7.Tools.h

Why so much effort for one simple object? Shader modules are one of the objects required to create the graphics pipeline. But after the pipeline is created, we don’t need these shader modules anymore. Sometimes it is convenient to keep them as we may need to create additional, similar pipelines. But in this example they may be safely destroyed after we create a graphics pipeline. Shader modules are destroyed by calling the vkDestroyShaderModule() function. But in the example, we would need to call this function in many places: inside multiple “ifs” and at the end of the whole function. Because I don’t want to remember where I need to call this function and, at the same time, I don’t want any memory leaks to occur, I have prepared this simple class just for convenience. Now, I don’t have to remember to delete the created shader module because it will be deleted automatically.
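
If you prefer not to write such a wrapper yourself, a similar effect can be achieved with a small scope guard built on std::function. The snippet below is only an illustrative alternative, not the approach used in the tutorial's code:

#include <functional>

// Illustrative alternative to AutoDeleter<>: run a cleanup callable when leaving scope.
class ScopeGuard {
public:
  explicit ScopeGuard( std::function<void()> cleanup ) : Cleanup( std::move( cleanup ) ) {}
  ~ScopeGuard() { if( Cleanup ) { Cleanup(); } }
  void Dismiss() { Cleanup = nullptr; }           // call this if ownership is handed over elsewhere
private:
  ScopeGuard( const ScopeGuard& ) = delete;
  ScopeGuard& operator=( const ScopeGuard& ) = delete;
  std::function<void()> Cleanup;
};

// Usage sketch: shader_module and GetDevice() come from the surrounding tutorial code.
ScopeGuard module_guard( [&]() { vkDestroyShaderModule( GetDevice(), shader_module, nullptr ); } );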

Preparing a Description of the Shader Stages

Now that we know how to create and destroy shader modules, we can prepare the data describing the shader stages that compose our graphics pipeline. As I have written, the data that describes which shader stages should be active when a given graphics pipeline is bound takes the form of an array with elements of type VkPipelineShaderStageCreateInfo. Here is the code that creates shader modules and prepares such an array:

Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule> vertex_shader_module = CreateShaderModule( "Data03/vert.spv" );
Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule> fragment_shader_module = CreateShaderModule( "Data03/frag.spv" );

if( !vertex_shader_module || !fragment_shader_module ) {
  return false;
}

std::vector<VkPipelineShaderStageCreateInfo> shader_stage_create_infos = {
  // Vertex shader
  {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,        // VkStructureType                                sType
    nullptr,                                                    // const void                                    *pNext
    0,                                                          // VkPipelineShaderStageCreateFlags               flags
    VK_SHADER_STAGE_VERTEX_BIT,                                 // VkShaderStageFlagBits                          stage
    vertex_shader_module.Get(),                                 // VkShaderModule                                 module
    "main",                                                     // const char                                    *pName
    nullptr                                                     // const VkSpecializationInfo                    *pSpecializationInfo
  },
  // Fragment shader
  {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,        // VkStructureType                                sType
    nullptr,                                                    // const void                                    *pNext
    0,                                                          // VkPipelineShaderStageCreateFlags               flags
    VK_SHADER_STAGE_FRAGMENT_BIT,                               // VkShaderStageFlagBits                          stage
    fragment_shader_module.Get(),                               // VkShaderModule                                 module
    "main",                                                     // const char                                    *pName
    nullptr                                                     // const VkSpecializationInfo                    *pSpecializationInfo
  }
};

8.Tutorial03.cpp, function CreatePipeline()

At the beginning we are creating two shader modules for vertex and fragment stages. They are created with the function presented earlier. When any error occurs and we return from the CreatePipeline() function, any created module is deleted automatically by a wrapper class with a provided deleter function.

The code for the shader modules is read from files that contain the binary SPIR-V assembly. These files are generated with an application called “glslangValidator”. This is a tool distributed officially with the Vulkan SDK and is designed to validate GLSL shaders. But “glslangValidator” also has the capability to compile or rather transform GLSL shaders into SPIR-V binary files. A full explanation of the command line for its usage can be found at the official SDK site. I’ve used the following commands to generate SPIR-V shaders for this tutorial:

glslangValidator.exe -V -H shader.vert > vert.spv.txt

glslangValidator.exe -V -H shader.frag > frag.spv.txt

“glslangValidator” takes a specified file and generates a SPIR-V file from it. The type of shader stage is automatically detected by the input file’s extension (“.vert” for vertex shaders, “.geom” for geometry shaders, and so on). The name of the generated file can be specified, but by default it takes the form “<stage>.spv”. So in our example “vert.spv” and “frag.spv” files will be generated.

SPIR-V files have a binary format, so they may be hard to read and analyze, but not impossible. When the “-H” option is used, “glslangValidator” also outputs the SPIR-V in a form that can be read more easily. This form is printed on standard output, which is why I’m using the “> *.spv.txt” redirection operator.

Here are the contents of a “shader.vert” file from which SPIR-V assembly was generated for the vertex stage:

#version 400

void main() {
    vec2 pos[3] = vec2[3]( vec2(-0.7, 0.7), vec2(0.7, 0.7), vec2(0.0, -0.7) );
    gl_Position = vec4( pos[gl_VertexIndex], 0.0, 1.0 );
}

9.shader.vert

As you can see I have hardcoded the positions of all vertices used to render the triangle. They are indexed using the Vulkan-specific “gl_VertexIndex” built-in variable. In the simplest scenario, when using non-indexed drawing commands (which takes place here) this value starts from the value of the “firstVertex” parameter of a drawing command (zero in the provided example).

This is the disputable part I wrote about earlier—this approach is acceptable and valid but not quite convenient to maintain and also allows us to skip some of the “structure filling” needed to create the graphics pipeline. I’ve chosen it in order to shorten and simplify this tutorial as much as possible. In the next tutorial, I will present a more typical way of drawing any number of vertices, similar to using vertex arrays and indices in OpenGL.

Below is the source code of a fragment shader from the “shader.frag” file that was used to generate the SPIR-V assembly for the fragment stage:

#version 400

layout(location = 0) out vec4 out_Color;

void main() {
  out_Color = vec4( 0.0, 0.4, 1.0, 1.0 );
}

10.shader.frag

In Vulkan’s shaders (when transforming from GLSL to SPIR-V), layout qualifiers are required. Here we specify into which output (color) attachment we want to store the color values generated by the fragment shader. Because we are using only one attachment, we must specify the first available location (zero).

Now that you know how to prepare shaders for applications using Vulkan, we can move on to the next step. After we have created two shader modules, we check whether these operations succeeded. If they did we can start preparing a description of all shader stages that will constitute our graphics pipeline.

For each enabled shader stage we need to prepare an instance of the VkPipelineShaderStageCreateInfo structure. An array of these structures, along with the number of its elements, is then provided in the graphics pipeline create info structure (passed to the function that creates the graphics pipeline). The VkPipelineShaderStageCreateInfo structure has the following fields:

  • sType – Type of structure that we are preparing, which in this case must be equal to VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO.
  • pNext – Pointer reserved for extensions.
  • flags – Parameter reserved for future use.
  • stage – Type of shader stage we are describing (like vertex, tessellation control, and so on).
  • module – Handle to a shader module that contains the shader for a given stage.
  • pName – Name of the entry point of the provided shader.
  • pSpecializationInfo – Pointer to a VkSpecializationInfo structure, which we will leave for now and set to null.

When we are creating a graphics pipeline we don’t create too many (Vulkan) objects. Most of the data is provided in the form of just such structures.

Preparing Description of a Vertex Input

Now we must provide a description of the input data used for drawing. This is similar to OpenGL’s vertex data: attributes, number of components, buffers from which data is taken, the data’s stride, or step rate. In Vulkan this data is of course prepared in a different way, but in general the meaning is the same. Fortunately, because the vertex data is hardcoded into the vertex shader in this tutorial, we can almost entirely skip this step and fill the VkPipelineVertexInputStateCreateInfo structure mostly with nulls and zeros:

VkPipelineVertexInputStateCreateInfo vertex_input_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,    // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineVertexInputStateCreateFlags          flags;
  0,                                                            // uint32_t                                       vertexBindingDescriptionCount
  nullptr,                                                      // const VkVertexInputBindingDescription         *pVertexBindingDescriptions
  0,                                                            // uint32_t                                       vertexAttributeDescriptionCount
  nullptr                                                       // const VkVertexInputAttributeDescription       *pVertexAttributeDescriptions
};

11. Tutorial03.cpp, function CreatePipeline()

But for clarity, here is a description of the members of the VkPipelineVertexInputStateCreateInfo structure (a hypothetical non-empty setup is sketched right after the list):

  • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO here.
  • pNext – Pointer to an extension-specific structure.
  • flags – Parameter reserved for future use.
  • vertexBindingDescriptionCount – Number of elements in the pVertexBindingDescriptions array.
  • pVertexBindingDescriptions – Array with elements describing input vertex data (stride and stepping rate).
  • vertexAttributeDescriptionCount – Number of elements in the pVertexAttributeDescriptions array.
  • pVertexAttributeDescriptions – Array with elements describing vertex attributes (location, format, offset).

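For reference, a non-empty configuration (as used when vertex data comes from a buffer, something the next tutorial covers) could look roughly like the sketch below. The binding number, stride, and format here are hypothetical values chosen only for illustration; they are not set up anywhere in this tutorial’s code.

// Hypothetical, illustrative only: one binding with interleaved position (vec2) data.
VkVertexInputBindingDescription vertex_binding_description = {
  0,                                              // uint32_t                       binding
  sizeof( float ) * 2,                            // uint32_t                       stride
  VK_VERTEX_INPUT_RATE_VERTEX                     // VkVertexInputRate              inputRate
};

VkVertexInputAttributeDescription vertex_attribute_description = {
  0,                                              // uint32_t                       location
  0,                                              // uint32_t                       binding
  VK_FORMAT_R32G32_SFLOAT,                        // VkFormat                       format
  0                                               // uint32_t                       offset
};

VkPipelineVertexInputStateCreateInfo vertex_input_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,    // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineVertexInputStateCreateFlags          flags
  1,                                                            // uint32_t                                       vertexBindingDescriptionCount
  &vertex_binding_description,                                  // const VkVertexInputBindingDescription         *pVertexBindingDescriptions
  1,                                                            // uint32_t                                       vertexAttributeDescriptionCount
  &vertex_attribute_description                                 // const VkVertexInputAttributeDescription       *pVertexAttributeDescriptions
};
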
Preparing the Description of an Input Assembly

The next step requires us to describe how vertices should be assembled into primitives. As with OpenGL, we must specify what topology we want to use: points, lines, triangles, triangle fan, and so on.

VkPipelineInputAssemblyStateCreateInfo input_assembly_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,  // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineInputAssemblyStateCreateFlags        flags
  VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,                          // VkPrimitiveTopology                            topology
  VK_FALSE                                                      // VkBool32                                       primitiveRestartEnable
};

12.Tutorial03.cpp, function CreatePipeline()

We do that through the VkPipelineInputAssemblyStateCreateInfo structure, which contains the following members:

  • sType – Structure type set here to VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO.
  • pNext – Pointer not yet used.
  • flags – Parameter reserved for future use.
  • topology – Parameter describing how vertices will be organized to form a primitive.
  • primitiveRestartEnable – Parameter that tells whether a special index value (when indexed drawing is performed) restarts assembly of a given primitive.

Preparing the Viewport’s Description

We have finished dealing with input data. Now we must specify the form of the output data, that is, all the parts of the graphics pipeline connected with fragments: rasterization, window (viewport), depth tests, and so on. The first set of data we must prepare here is the state of the viewport, which specifies to what part of the image (or texture, or window) we want to draw.

VkViewport viewport = {
  0.0f,                                                         // float                                          x
  0.0f,                                                         // float                                          y
  300.0f,                                                       // float                                          width
  300.0f,                                                       // float                                          height
  0.0f,                                                         // float                                          minDepth
  1.0f                                                          // float                                          maxDepth
};

VkRect2D scissor = {
  {                                                             // VkOffset2D                                     offset
    0,                                                            // int32_t                                        x
    0                                                             // int32_t                                        y
  },
  {                                                             // VkExtent2D                                     extent
    300,                                                          // int32_t                                        width
    300                                                           // int32_t                                        height
  }
};

VkPipelineViewportStateCreateInfo viewport_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,        // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineViewportStateCreateFlags             flags
  1,                                                            // uint32_t                                       viewportCount
  &viewport,                                                    // const VkViewport                              *pViewports
  1,                                                            // uint32_t                                       scissorCount
  &scissor                                                      // const VkRect2D                                *pScissors
};

13.Tutorial03.cpp, function CreatePipeline()

In this example, the usage is simple: we just set the viewport coordinates to some predefined values. I don’t check the size of the swap chain image we are rendering into. But remember that in real-life production applications this has to be done because the specification states that dimensions of the viewport cannot exceed the dimensions of the attachments that we are rendering into.

To specify the viewport’s parameters, we fill the VkViewport structure that contains these fields:

  • x – Left side of the viewport.
  • y – Upper side of the viewport.
  • width – Width of the viewport.
  • height – Height of the viewport.
  • minDepth – Minimal depth value used for depth calculations.
  • maxDepth – Maximal depth value used for depth calculations.

When specifying viewport coordinates, remember that the origin is different than in OpenGL. Here we specify the upper-left corner of the viewport (not the lower left).

Also worth noting is that the minDepth and maxDepth values must be between 0.0 and 1.0 (inclusive) but maxDepth can be lower than minDepth. This will cause the depth to be calculated in “reverse.”

Next we must specify the parameters for the scissor test. The scissor test, similarly to OpenGL, restricts generation of fragments only to the specified rectangular area. But in Vulkan, the scissor test is always enabled and can’t be turned off. We can just provide the values identical to the ones provided for viewport. Try changing these values and see how it influences the generated image.

The scissor test doesn’t have a dedicated structure. To provide data for it we fill the VkRect2D structure, which contains two structure members. The first is VkOffset2D with the following members:

  • x – Left side of the rectangular area used for scissor test
  • y – Upper side of the scissor area

The second member is of type VkExtent2D, which contains the following fields:

  • width – Width of the scissor rectangular area
  • height – Height of the scissor area

In general, the meaning of the data we provide for the scissor test through the VkRect2D structure is similar to the data prepared for viewport.

After we have finished preparing data for viewport and the scissor test, we can finally fill the structure that is used during pipeline creation. The structure is called VkPipelineViewportStateCreateInfo, and it contains the following fields:

  • sType – Type of the structure, VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO here.
  • pNext – Pointer reserved for extensions.
  • flags – Parameter reserved for future use.
  • viewportCount – Number of elements in the pViewports array.
  • pViewports – Array with elements describing parameters of viewports used when the given pipeline is bound.
  • scissorCount – Number of elements in the pScissors array.
  • pScissors – Array with elements describing parameters of the scissor test for each viewport.

Remember that the viewportCount and scissorCount parameters must be equal. We are allowed to specify more than one viewport, but then the multiViewport feature must also be enabled.
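
Such features are requested during logical device creation. Below is only a hedged sketch of how that typically looks; physical_device stands in for the handle the application already queried, and the other features mentioned later in this tutorial (depth clamp, non-fill polygon modes) are shown here as well.

// Sketch: check what the physical device supports and enable only the features we need.
VkPhysicalDeviceFeatures supported_features;
vkGetPhysicalDeviceFeatures( physical_device, &supported_features );

VkPhysicalDeviceFeatures enabled_features = {};   // everything disabled by default
if( supported_features.multiViewport ) {
  enabled_features.multiViewport = VK_TRUE;       // needed for more than one viewport/scissor pair
}
if( supported_features.depthClamp ) {
  enabled_features.depthClamp = VK_TRUE;          // needed for depthClampEnable = VK_TRUE
}
if( supported_features.fillModeNonSolid ) {
  enabled_features.fillModeNonSolid = VK_TRUE;    // needed for polygon modes other than "fill"
}

// A pointer to enabled_features is then passed as pEnabledFeatures in VkDeviceCreateInfo
// before calling vkCreateDevice().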

Preparing the Rasterization State’s Description

The next part of graphics pipeline creation applies to the rasterization state. We must specify how polygons are going to be rasterized (changed into fragments): whether we want fragments to be generated for whole polygons or just their edges (polygon mode), and whether we want to see the front, the back, or both sides of polygons (face culling). We can also provide depth bias parameters or indicate whether we want to enable depth clamping. This whole state is encapsulated in the VkPipelineRasterizationStateCreateInfo structure. It contains the following members:

  • sType – Structure type, VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO in this example.
  • pNext – Pointer reserved for extensions.
  • flags – Parameter reserved for future use.
  • depthClampEnable – Parameter describing whether we want to clamp depth values of the rasterized primitive to the frustum (when true) or if we want normal clipping to occur (false).
  • rasterizerDiscardEnable – Deactivates fragment generation (primitives are discarded before the rasterization stage, so the fragment shader is never executed).
  • polygonMode – Controls how the fragments are generated for a given primitive (triangle mode): whether they are generated for the whole triangle, only its edges, or just its vertices.
  • cullMode – Chooses the triangle’s face used for culling (if enabled).
  • frontFace – Chooses which side of a triangle should be considered the front (depending on the winding order).
  • depthBiasEnable – Enables or disables biasing of fragments’ depth values.
  • depthBiasConstantFactor – Constant factor added to each fragment’s depth value when biasing is enabled.
  • depthBiasClamp – Maximum (or minimum) value of bias that can be applied to fragment’s depth.
  • depthBiasSlopeFactor – Factor applied for fragment’s slope during depth calculations when biasing is enabled.
  • lineWidth – Width of rasterized lines.

Here is the source code responsible for setting rasterization state in our example:

VkPipelineRasterizationStateCreateInfo rasterization_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,   // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineRasterizationStateCreateFlags        flags
  VK_FALSE,                                                     // VkBool32                                       depthClampEnable
  VK_FALSE,                                                     // VkBool32                                       rasterizerDiscardEnable
  VK_POLYGON_MODE_FILL,                                         // VkPolygonMode                                  polygonMode
  VK_CULL_MODE_BACK_BIT,                                        // VkCullModeFlags                                cullMode
  VK_FRONT_FACE_COUNTER_CLOCKWISE,                              // VkFrontFace                                    frontFace
  VK_FALSE,                                                     // VkBool32                                       depthBiasEnable
  0.0f,                                                         // float                                          depthBiasConstantFactor
  0.0f,                                                         // float                                          depthBiasClamp
  0.0f,                                                         // float                                          depthBiasSlopeFactor
  1.0f                                                          // float                                          lineWidth
};

14.Tutorial03.cpp, function CreatePipeline()

In the tutorial we are disabling as many parameters as possible to simplify the process, the code itself, and the rendering operations. The parameters that matter here set up the (typical) fill mode for polygon rasterization, back face culling, and, as in OpenGL, counterclockwise front faces. Depth biasing and clamping are also disabled (to enable depth clamping, we first need to enable a dedicated feature during logical device creation, just like for polygon modes other than “fill”; see the device feature sketch above).

Setting the Multisampling State’s Description

In Vulkan, when we are creating a graphics pipeline, we must also specify the state relevant to multisampling. This is done using the VkPipelineMultisampleStateCreateInfo structure. Here are its members:

  • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO here.
  • pNext – Pointer reserved for extensions.
  • flags – Parameter reserved for future use.
  • rasterizationSamples – Number of per pixel samples used in rasterization.
  • sampleShadingEnable – Parameter specifying that shading should occur per sample (when enabled) instead of per fragment (when disabled).
  • minSampleShading – Specifies the minimum number of unique sample locations that should be used during the given fragment’s shading.
  • pSampleMask – Pointer to an array of static coverage sample masks; this can be null.
  • alphaToCoverageEnable – Controls whether the fragment’s alpha value should be used for coverage calculations.
  • alphaToOneEnable – Controls whether the fragment’s alpha value should be replaced with one.

In this example, I wanted to minimize possible problems, so I’ve set the parameters to values that effectively disable multisampling: just one sample per pixel, with the other parameters turned off. Remember that if we want to enable sample shading or alpha-to-one, we also need to enable the two respective device features. Here is the source code that prepares the VkPipelineMultisampleStateCreateInfo structure:

VkPipelineMultisampleStateCreateInfo multisample_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,     // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineMultisampleStateCreateFlags          flags
  VK_SAMPLE_COUNT_1_BIT,                                        // VkSampleCountFlagBits                          rasterizationSamples
  VK_FALSE,                                                     // VkBool32                                       sampleShadingEnable
  1.0f,                                                         // float                                          minSampleShading
  nullptr,                                                      // const VkSampleMask                            *pSampleMask
  VK_FALSE,                                                     // VkBool32                                       alphaToCoverageEnable
  VK_FALSE                                                      // VkBool32                                       alphaToOneEnable
};

15.Tutorial03.cpp, function CreatePipeline()

Setting the Blending State’s Description

Another thing we need to prepare when creating a graphics pipeline is a blending state (which also includes logical operations).

VkPipelineColorBlendAttachmentState color_blend_attachment_state = {
  VK_FALSE,                                                     // VkBool32                                       blendEnable
  VK_BLEND_FACTOR_ONE,                                          // VkBlendFactor                                  srcColorBlendFactor
  VK_BLEND_FACTOR_ZERO,                                         // VkBlendFactor                                  dstColorBlendFactor
  VK_BLEND_OP_ADD,                                              // VkBlendOp                                      colorBlendOp
  VK_BLEND_FACTOR_ONE,                                          // VkBlendFactor                                  srcAlphaBlendFactor
  VK_BLEND_FACTOR_ZERO,                                         // VkBlendFactor                                  dstAlphaBlendFactor
  VK_BLEND_OP_ADD,                                              // VkBlendOp                                      alphaBlendOp
  VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |         // VkColorComponentFlags                          colorWriteMask
  VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT
};

VkPipelineColorBlendStateCreateInfo color_blend_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,     // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineColorBlendStateCreateFlags           flags
  VK_FALSE,                                                     // VkBool32                                       logicOpEnable
  VK_LOGIC_OP_COPY,                                             // VkLogicOp                                      logicOp
  1,                                                            // uint32_t                                       attachmentCount
  &color_blend_attachment_state,                                // const VkPipelineColorBlendAttachmentState     *pAttachments
  { 0.0f, 0.0f, 0.0f, 0.0f }                                    // float                                          blendConstants[4]
};

16.Tutorial03.cpp, function CreatePipeline()

Final color operations are set up through the VkPipelineColorBlendStateCreateInfo structure. It contains the following fields:

  • sType – Type of the structure, set to VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO in this example.
  • pNext – Pointer reserved for future, extension-specific use.
  • flags – Parameter also reserved for future use.
  • logicOpEnable – Indicates whether we want to enable logical operations on pixels.
  • logicOp – Type of the logical operation we want to perform (like copy, clear, and so on).
  • attachmentCount – Number of elements in the pAttachments array.
  • pAttachments – Array containing state parameters for each color attachment used in a subpass for which the given graphics pipeline is bound.
  • blendConstants – Four-element array with color value used in blending operation (when a dedicated blend factor is used).

The attachmentCount and pAttachments parameters need more explanation. When we want to perform drawing operations we set up several parameters, the most important of which are the graphics pipeline, the render pass, and the framebuffer. The graphics card needs to know how to draw (the graphics pipeline, which describes the rendering state, shaders, tests, and so on) and where to draw (the render pass gives the general setup; the framebuffer specifies exactly which images are used). As I have already mentioned, the render pass specifies how operations are ordered, what the dependencies are, when we are rendering into a given attachment, and when we are reading from the same attachment. These stages take the form of subpasses, and for each drawing operation we can (but don’t have to) use a different pipeline.

But when we are drawing, we must remember that we are drawing into a set of attachments. This set is defined in a render pass, which describes all color, input, and depth attachments (the framebuffer just specifies which images are used for each of them). Through the pAttachments array we specify, for the blending state, whether we want to enable blending at all. Each of its elements must correspond to one color attachment defined in the render pass, so the value of attachmentCount (the number of elements in the pAttachments array) must equal the number of color attachments defined in the render pass.

There is one more restriction. By default, all elements in the pAttachments array must be identical: blending (and color masks) is performed in the same way for all attachments. So why is it an array? Why can’t we just specify one value? Because there is a feature that allows us to perform independent, distinct blending for each active color attachment. When we enable the independent blending feature during device creation, we can provide different values for each color attachment.

Each pAttachments array’s element is of type VkPipelineColorBlendAttachmentState. It is a structure with the following members:

  • blendEnable – Indicates whether we want to enable blending at all.
  • srcColorBlendFactor – Blending factor for color of the source (incoming) fragment.
  • dstColorBlendFactor – Blending factor for the destination color (stored already in the framebuffer at the same location as the incoming fragment).
  • colorBlendOp – Type of operation to perform (multiplication, addition, and so on).
  • srcAlphaBlendFactor – Blending factor for the alpha value of the source (incoming) fragment.
  • dstAlphaBlendFactor – Blending factor for the destination alpha value (already stored in the framebuffer).
  • alphaBlendOp – Type of operation to perform for alpha blending.
  • colorWriteMask – Bitmask selecting which of the RGBA components are selected (enabled) for writing.

In this example, we disable blending, which makes all the other parameters irrelevant except for colorWriteMask. Here we select all components for writing, but you can freely check what happens when this parameter is changed to some other combination of R, G, B, and A.
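
For comparison, had we wanted standard alpha blending for this attachment, the per-attachment state could be filled roughly like this (illustrative only; the tutorial keeps blending disabled):

// Hypothetical alternative: classic "source alpha / one minus source alpha" blending.
VkPipelineColorBlendAttachmentState alpha_blend_attachment_state = {
  VK_TRUE,                                                      // VkBool32                                       blendEnable
  VK_BLEND_FACTOR_SRC_ALPHA,                                    // VkBlendFactor                                  srcColorBlendFactor
  VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA,                          // VkBlendFactor                                  dstColorBlendFactor
  VK_BLEND_OP_ADD,                                              // VkBlendOp                                      colorBlendOp
  VK_BLEND_FACTOR_ONE,                                          // VkBlendFactor                                  srcAlphaBlendFactor
  VK_BLEND_FACTOR_ZERO,                                         // VkBlendFactor                                  dstAlphaBlendFactor
  VK_BLEND_OP_ADD,                                              // VkBlendOp                                      alphaBlendOp
  VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |         // VkColorComponentFlags                          colorWriteMask
  VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT
};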

Creating a Pipeline Layout

The final thing we must do before pipeline creation is create a proper pipeline layout. A pipeline layout describes all the resources that can be accessed by the pipeline. For example, we must specify how many textures can be used by shaders and which shader stages will have access to them. There are of course other resources involved. Apart from the shader stages, we must also describe the types of resources (textures, buffers), their total numbers, and their layout. This layout can be compared to OpenGL’s active textures and shader uniforms: in OpenGL we bind textures to the desired texture image units, and for shader uniforms we don’t provide texture handles but the IDs of the texture image units to which the actual textures are bound (we provide the number of the unit with which the given texture was associated).

With Vulkan, the situation is similar. We create some form of a memory layout, for example: first there are two buffers, next we have three textures and an image. This memory “structure” is called a set, and a collection of these sets is provided to the pipeline. In shaders, we access the specified resources through specific memory “locations” within these sets (layouts). This is done with a layout(set = X, binding = Y) specifier, which can be translated to: take the resource bound at location Y within set X.

A pipeline layout can be thought of as an interface between the shader stages and the shader resources: it takes these groups of resources, describes how they are gathered, and provides them to the pipeline.
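
To make the idea more concrete, here is a hedged sketch of what a non-empty layout might involve: one descriptor set layout with a single combined image sampler visible to the fragment shader. None of this appears in this tutorial’s code; the names are illustrative and a full explanation is left for a later tutorial.

// Hypothetical, illustrative only: one set layout with a single texture binding.
VkDescriptorSetLayoutBinding sampler_binding = {
  0,                                              // uint32_t                       binding
  VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,      // VkDescriptorType               descriptorType
  1,                                              // uint32_t                       descriptorCount
  VK_SHADER_STAGE_FRAGMENT_BIT,                   // VkShaderStageFlags             stageFlags
  nullptr                                         // const VkSampler               *pImmutableSamplers
};

VkDescriptorSetLayoutCreateInfo set_layout_create_info = {
  VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,  // VkStructureType                     sType
  nullptr,                                              // const void                         *pNext
  0,                                                    // VkDescriptorSetLayoutCreateFlags    flags
  1,                                                    // uint32_t                            bindingCount
  &sampler_binding                                      // const VkDescriptorSetLayoutBinding *pBindings
};

VkDescriptorSetLayout set_layout;
vkCreateDescriptorSetLayout( GetDevice(), &set_layout_create_info, nullptr, &set_layout );

// This set layout would then be provided through setLayoutCount/pSetLayouts in the
// VkPipelineLayoutCreateInfo shown below, and accessed in GLSL with layout(set = 0, binding = 0).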

This process is complex and I plan to devote a tutorial to it. Here we are not using any additional resources so I present an example for creating an “empty” pipeline layout:

VkPipelineLayoutCreateInfo layout_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,  // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkPipelineLayoutCreateFlags    flags
  0,                                              // uint32_t                       setLayoutCount
  nullptr,                                        // const VkDescriptorSetLayout   *pSetLayouts
  0,                                              // uint32_t                       pushConstantRangeCount
  nullptr                                         // const VkPushConstantRange     *pPushConstantRanges
};

VkPipelineLayout pipeline_layout;
if( vkCreatePipelineLayout( GetDevice(), &layout_create_info, nullptr, &pipeline_layout ) != VK_SUCCESS ) {
  printf( "Could not create pipeline layout!\n" );
  return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>();
}

return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>( pipeline_layout, vkDestroyPipelineLayout, GetDevice() );

17.Tutorial03.cpp, function CreatePipelineLayout()

To create a pipeline layout we must first prepare a variable of type VkPipelineLayoutCreateInfo. It contains the following fields:

  • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO in this example.
  • pNext – Parameter reserved for extensions.
  • flags – Parameter reserved for future use.
  • setLayoutCount – Number of descriptor sets included in this layout.
  • pSetLayouts – Pointer to an array containing descriptions of descriptor layouts.
  • pushConstantRangeCount – Number of push constant ranges (I will describe it in a later tutorial).
  • pPushConstantRanges – Array describing all push constant ranges used inside shaders (in a given pipeline).

In this example we create an “empty” layout, so almost all of its fields are set to null or zero.

We are not using push constants here, but they deserve a brief explanation. Push constants in Vulkan allow us to modify the values of constant variables used in shaders. A special, small amount of memory is reserved for them, and we update their values through Vulkan commands, not through memory writes; such updates are expected to be faster than normal memory writes.
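
As an illustration only (this tutorial does not use them), defining and updating a push constant range might look like the sketch below; the range size and the shader stage are assumptions made for the example.

// Hypothetical: expose 16 bytes (e.g., a vec4 color) to the vertex stage as push constants.
VkPushConstantRange push_constant_range = {
  VK_SHADER_STAGE_VERTEX_BIT,                     // VkShaderStageFlags             stageFlags
  0,                                              // uint32_t                       offset
  sizeof( float ) * 4                             // uint32_t                       size
};
// The range would be passed via pushConstantRangeCount/pPushConstantRanges described above.

// During command buffer recording, the value is updated with a command, not a memory write:
const float color[4] = { 1.0f, 0.0f, 0.0f, 1.0f };
vkCmdPushConstants( command_buffer, pipeline_layout, VK_SHADER_STAGE_VERTEX_BIT, 0, sizeof( color ), color );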

As shown in the pipeline layout creation code above, I’m also wrapping the pipeline layout in an “AutoDeleter” object. Pipeline layouts are required during pipeline creation, descriptor set binding (enabling/activating this interface between shaders and shader resources), and push constant updates. None of these operations, except for pipeline creation, take place in this tutorial. So here, after we create a pipeline, we don’t need the layout anymore. To avoid memory leaks, I have used this helper class to destroy the layout as soon as we leave the function in which the graphics pipeline is created.

Creating a Graphics Pipeline

Now we have all the resources required to properly create a graphics pipeline. Here is the code that does that:

Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout> pipeline_layout = CreatePipelineLayout();
if( !pipeline_layout ) {
  return false;
}

VkGraphicsPipelineCreateInfo pipeline_create_info = {
  VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,              // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineCreateFlags                          flags
  static_cast<uint32_t>(shader_stage_create_infos.size()),      // uint32_t                                       stageCount
  &shader_stage_create_infos[0],                                // const VkPipelineShaderStageCreateInfo         *pStages
  &vertex_input_state_create_info,                              // const VkPipelineVertexInputStateCreateInfo    *pVertexInputState
  &input_assembly_state_create_info,                            // const VkPipelineInputAssemblyStateCreateInfo  *pInputAssemblyState
  nullptr,                                                      // const VkPipelineTessellationStateCreateInfo   *pTessellationState
  &viewport_state_create_info,                                  // const VkPipelineViewportStateCreateInfo       *pViewportState
  &rasterization_state_create_info,                             // const VkPipelineRasterizationStateCreateInfo  *pRasterizationState
  &multisample_state_create_info,                               // const VkPipelineMultisampleStateCreateInfo    *pMultisampleState
  nullptr,                                                      // const VkPipelineDepthStencilStateCreateInfo   *pDepthStencilState
  &color_blend_state_create_info,                               // const VkPipelineColorBlendStateCreateInfo     *pColorBlendState
  nullptr,                                                      // const VkPipelineDynamicStateCreateInfo        *pDynamicState
  pipeline_layout.Get(),                                        // VkPipelineLayout                               layout
  Vulkan.RenderPass,                                            // VkRenderPass                                   renderPass
  0,                                                            // uint32_t                                       subpass
  VK_NULL_HANDLE,                                               // VkPipeline                                     basePipelineHandle
  -1                                                            // int32_t                                        basePipelineIndex
};

if( vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 1, &pipeline_create_info, nullptr, &Vulkan.GraphicsPipeline ) != VK_SUCCESS ) {
  printf( "Could not create graphics pipeline!\n" );
  return false;
}
return true;

18.Tutorial03.cpp, function CreatePipeline()

First we create a pipeline layout wrapped in an object of type “AutoDeleter”. Next we fill the structure of type VkGraphicsPipelineCreateInfo. It contains many fields. Here is a brief description of them:

  • sType – Type of structure, VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO here.
  • pNext – Parameter reserved for future, extension-related use.
  • flags – This time this parameter is not reserved for future use but controls how the pipeline should be created: if we are creating a derivative pipeline (if we are inheriting from another pipeline) or if we allow creating derivative pipelines from this one. We can also disable optimizations, which should shorten the time needed to create a pipeline.
  • stageCount – Number of stages described in the pStages parameter; must be greater than zero.
  • pStages – Array with descriptions of active shader stages (the ones created using shader modules); each stage must be unique (we can’t specify a given stage more than once). There also must be a vertex stage present.
  • pVertexInputState – Pointer to a variable containing the description of the vertex input’s state.
  • pInputAssemblyState – Pointer to a variable with input assembly description.
  • pTessellationState – Pointer to a description of the tessellation stages; can be null if tessellation is disabled.
  • pViewportState – Pointer to a variable specifying viewport parameters; can be null if rasterization is disabled.
  • pRasterizationState – Pointer to a variable specifying rasterization behavior.
  • pMultisampleState – Pointer to a variable defining multisampling; can be null if rasterization is disabled.
  • pDepthStencilState – Pointer to a description of depth/stencil parameters; this can be null in two situations: when rasterization is disabled or we’re not using depth/stencil attachments in a render pass.
  • pColorBlendState – Pointer to a variable with color blending/write masks state; can be null also in two situations: when rasterization is disabled or when we’re not using any color attachments inside the render pass.
  • pDynamicState – Pointer to a variable specifying which parts of the graphics pipeline can be set dynamically; can be null if the whole state is considered static (defined only through this create info structure).
  • layout – Handle to a pipeline layout object that describes resources accessed inside shaders.
  • renderPass – Handle to a render pass object; pipeline can be used with any render pass compatible with the provided one.
  • subpass – Number (index) of a subpass in which the pipeline will be used.
  • basePipelineHandle – Handle to a pipeline this one should derive from.
  • basePipelineIndex – Index of a pipeline this one should derive from.

When we are creating a new pipeline, we can inherit some of the parameters from another one. This means that both pipelines should have much in common; a good example is shader code. We don’t specify which fields are the same, but the general hint that one pipeline derives from another may substantially accelerate pipeline creation. But why are there two fields to indicate a “parent” pipeline? We can’t use them both, only one of them at a time. When we use a handle, this means that the “parent” pipeline is already created and we are deriving from the one whose handle we provided. But the pipeline creation function allows us to create many pipelines at once. Using the second parameter, the “parent” pipeline index, we can create both “parent” and “child” pipelines in the same call. We just specify an array of graphics pipeline create info structures, and this array is provided to the pipeline creation function. The “basePipelineIndex” is then the index of the “parent” pipeline’s create info in this very array. We just have to remember that the “parent” pipeline must appear earlier in this array (must have a smaller index) and must be created with the “allow derivatives” flag set.
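
A hedged sketch of the index-based variant, assuming two hypothetical, fully filled create info structures that differ only slightly:

// Illustrative only: create a "parent" and a "child" pipeline in one call.
// base_pipeline_create_info and derived_pipeline_create_info are hypothetical,
// completely filled VkGraphicsPipelineCreateInfo structures.
base_pipeline_create_info.flags = VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT;

derived_pipeline_create_info.flags              = VK_PIPELINE_CREATE_DERIVATIVE_BIT;
derived_pipeline_create_info.basePipelineHandle = VK_NULL_HANDLE;   // must be null when an index is used
derived_pipeline_create_info.basePipelineIndex  = 0;                // index of the parent in the array below

VkGraphicsPipelineCreateInfo create_infos[] = { base_pipeline_create_info, derived_pipeline_create_info };
VkPipeline pipelines[2];
vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 2, create_infos, nullptr, pipelines );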

In this example we are creating a pipeline with the state being entirely static (null for the “pDynamicState” parameter). But what is a dynamic state? To allow for some flexibility and to lower the number of created pipeline objects, dynamic state was introduced. Through the “pDynamicState” parameter we can define which parts of the graphics pipeline can be set dynamically through additional Vulkan commands and which parts remain static, set once during pipeline creation. The dynamic state includes parameters such as viewports, line widths, blend constants, or some stencil parameters. If we specify that a given state is dynamic, the parameters in the pipeline create info structure that are related to that state are ignored, and we must set that state using the proper Vulkan commands during rendering, because the initial values of such state may be undefined.
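
For illustration, making the viewport and scissor dynamic would look roughly like this (not done in this tutorial; with such a pipeline bound, the values must be provided while recording commands):

// Hypothetical: declare viewport and scissor as dynamic state.
VkDynamicState dynamic_states[] = {
  VK_DYNAMIC_STATE_VIEWPORT,
  VK_DYNAMIC_STATE_SCISSOR
};

VkPipelineDynamicStateCreateInfo dynamic_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,         // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineDynamicStateCreateFlags              flags
  2,                                                            // uint32_t                                       dynamicStateCount
  dynamic_states                                                // const VkDynamicState                          *pDynamicStates
};

// With such a pipeline bound, the values are set during command buffer recording:
// vkCmdSetViewport( command_buffer, 0, 1, &viewport );
// vkCmdSetScissor( command_buffer, 0, 1, &scissor );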

So after these quite overwhelming preparations we can create a graphics pipeline. This is done by calling the vkCreateGraphicsPipelines() function which, among other parameters, takes an array of pipeline create info structures. When everything goes well, VK_SUCCESS is returned by this function and a handle of a graphics pipeline is stored in the variable whose address we provided. Now we are ready to start drawing.

Preparing Drawing Commands

I introduced the concept of command buffers in the previous tutorial. Here I will briefly explain what they are and how to use them.

Command buffers are containers for GPU commands. If we want to execute some job on a device, we do it through command buffers. This means that we must prepare a set of commands that process data (that is, draw something on the screen) and record these commands in command buffers. Then we can submit whole buffers to the device’s queues. This submit operation tells the device: here is a bunch of things I want you to do for me; do them now.
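
Submission itself is not performed in this part of the code, but in its simplest form it boils down to a call like the rough sketch below; command_buffer and queue are placeholders, and a real application would also synchronize with the swap chain using semaphores and fences, which this sketch omits.

// Minimal sketch: submit one recorded command buffer to a queue (no synchronization shown).
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                  // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // uint32_t                       waitSemaphoreCount
  nullptr,                                        // const VkSemaphore             *pWaitSemaphores
  nullptr,                                        // const VkPipelineStageFlags    *pWaitDstStageMask
  1,                                              // uint32_t                       commandBufferCount
  &command_buffer,                                // const VkCommandBuffer         *pCommandBuffers
  0,                                              // uint32_t                       signalSemaphoreCount
  nullptr                                         // const VkSemaphore             *pSignalSemaphores
};

if( vkQueueSubmit( queue, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  // handle the error
}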

To record commands, we must first allocate command buffers. These are allocated from command pools, which can be thought of as memory chunks. If a command buffer needs to be larger (as we record many complicated commands in it), it can grow and use additional memory from the pool it was allocated from. So first we must create a command pool.

Creating a Command Pool

Command pool creation is simple and looks like this:

VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,     // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkCommandPoolCreateFlags       flags
  queue_family_index                              // uint32_t                       queueFamilyIndex
};

if( vkCreateCommandPool( GetDevice(), &cmd_pool_create_info, nullptr, pool ) != VK_SUCCESS ) {
  return false;
}
return true;

19.Tutorial03.cpp, function CreateCommandPool()

First we prepare a variable of type VkCommandPoolCreateInfo. It contains the following fields:

  • sType – Standard type of structure, set to VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO here.
  • pNext – Pointer reserved for extensions.
  • flags – Indicates usage scenarios for command pool and command buffers allocated from it; that is, we can tell the driver that command buffers allocated from this pool will live for a short time; for no specific usage we can set it to zero.
  • queueFamilyIndex – Index of a queue family for which we are creating a command pool.

Remember that command buffers allocated from a given pool can only be submitted to a queue from a queue family specified during pool creation.

To create a command pool, we just call the vkCreateCommandPool() function.

Allocating Command Buffers

Now that we have the command pool ready, we can allocate command buffers from it.

VkCommandBufferAllocateInfo command_buffer_allocate_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  pool,                                           // VkCommandPool                  commandPool
  VK_COMMAND_BUFFER_LEVEL_PRIMARY,                // VkCommandBufferLevel           level
  count                                           // uint32_t                       bufferCount
};

if( vkAllocateCommandBuffers( GetDevice(), &command_buffer_allocate_info, command_buffers ) != VK_SUCCESS ) {
  return false;
}
return true;

20.Tutorial03.cpp, function AllocateCommandBuffers()

To allocate command buffers we fill a variable of yet another structure type. This time its type is VkCommandBufferAllocateInfo, which contains these members:

  • sType – Type of the structure; VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO for this purpose.
  • pNext – Pointer reserved for extensions.
  • commandPool – Pool from which we want our command buffers to take their memory.
  • level – Command buffer level; there are two levels: primary and secondary; right now we are only interested in primary command buffers.
  • bufferCount – Number of command buffers we want to allocate.

To allocate command buffers, call the vkAllocateCommandBuffers() function and check whether it succeeded. We can allocate many buffers at once with one function call.

I’ve prepared a simple buffer allocating function to show you how some Vulkan functions can be wrapped for easier use. Here is a usage of two such wrapper functions that create command pools and allocate command buffers from them.

if( !CreateCommandPool( GetGraphicsQueue().FamilyIndex, &Vulkan.GraphicsCommandPool ) ) {
  printf( "Could not create command pool!\n" );
  return false;
}

uint32_t image_count = static_cast<uint32_t>(GetSwapChain().Images.size());
Vulkan.GraphicsCommandBuffers.resize( image_count, VK_NULL_HANDLE );

if( !AllocateCommandBuffers( Vulkan.GraphicsCommandPool, image_count, &Vulkan.GraphicsCommandBuffers[0] ) ) {
  printf( "Could not allocate command buffers!\n" );
  return false;
}
return true;

21.Tutorial03.cpp, function CreateCommandBuffers()

As you can see, we are creating a command pool for a graphics queue family index. All image state transitions and drawing operations will be performed on a graphics queue. Presentation is done on another queue (if the presentation queue is different from the graphics queue) but we don’t need a command buffer for this operation.

We also allocate command buffers for each swap chain image. Here we take the number of images and provide it to this simple “wrapper” function for command buffer allocation.

Recording Command Buffers

Now that we have command buffers allocated from the command pool, we can finally record operations that will draw something on the screen. First we must prepare a set of data needed for the recording operation. Some of this data is identical for all command buffers, but some references a specific swap chain image. Here is the code that is independent of swap chain images:

VkCommandBufferBeginInfo graphics_command_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,    // VkStructureType                        sType
  nullptr,                                        // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT,   // VkCommandBufferUsageFlags              flags
  nullptr                                         // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};

VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,                      // VkImageAspectFlags             aspectMask
  0,                                              // uint32_t                       baseMipLevel
  1,                                              // uint32_t                       levelCount
  0,                                              // uint32_t                       baseArrayLayer
  1                                               // uint32_t                       layerCount
};

VkClearValue clear_value = {
  { 1.0f, 0.8f, 0.4f, 0.0f },                     // VkClearColorValue              color
};

const std::vector<VkImage>& swap_chain_images = GetSwapChain().Images;

22.Tutorial03.cpp, function RecordCommandBuffers()

Performing command buffer recording is similar to OpenGL’s display lists, where we start recording a list by calling the glNewList() function, prepare a set of drawing commands, and then close the list or stop recording it (glEndList()). So the first thing we need to do is prepare a variable of type VkCommandBufferBeginInfo. It is used when we start recording a command buffer, and it tells the driver about the type, contents, and desired usage of the command buffer. Variables of this type contain the following members:

  • sType – Standard structure type, here set to VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO.
  • pNext – Pointer reserved for extensions.
  • flags – Parameters describing the desired usage (for example, whether we want to submit this command buffer only once and then destroy/reset it, or whether it is possible that the buffer will be submitted again before the processing of its previous submission has finished).
  • pInheritanceInfo – Parameter used only when we want to record a secondary command buffer.

Next we describe the areas or parts of our images that we will set up image memory barriers for. Here we set up barriers to specify that queues from different families will reference a given image. This is done through a variable of type VkImageSubresourceRange with the following members:

  • aspectMask – Describes a “type” of image, whether it is for color, depth, or stencil data.
  • baseMipLevel – Number of a first mipmap level our operations will be performed on.
  • levelCount – Number of mipmap levels (including base level) we will be operating on.
  • baseArrayLayer – Number of the first array layer of an image that will take part in operations.
  • layerCount – Number of layers (including base layer) that will be modified.

Next we set up a clear value for our images. Before drawing we need to clear images. In previous tutorials, we performed this operation explicitly by ourselves. Here images are cleared as part of a render pass attachment load operation. We set that operation to “clear”, so now we must specify the color to which an image should be cleared. This is done using a variable of type VkClearValue in which we provide R, G, B, A values.

Variables we have created thus far are independent of an image itself, and that’s why we have specified them before a loop. Now we can start recording command buffers:

for( size_t i = 0; i < Vulkan.GraphicsCommandBuffers.size(); ++i ) {
  vkBeginCommandBuffer( Vulkan.GraphicsCommandBuffers[i], &graphics_command_buffer_begin_info );

  if( GetPresentQueue().Handle != GetGraphicsQueue().Handle ) {
    VkImageMemoryBarrier barrier_from_present_to_draw = {
      VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                sType
      nullptr,                                    // const void                    *pNext
      VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                  srcAccessMask
      VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,       // VkAccessFlags                  dstAccessMask
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  oldLayout
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  newLayout
      GetPresentQueue().FamilyIndex,              // uint32_t                       srcQueueFamilyIndex
      GetGraphicsQueue().FamilyIndex,             // uint32_t                       dstQueueFamilyIndex
      swap_chain_images[i],                       // VkImage                        image
      image_subresource_range                     // VkImageSubresourceRange        subresourceRange
    };
    vkCmdPipelineBarrier( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier_from_present_to_draw );
  }

  VkRenderPassBeginInfo render_pass_begin_info = {
    VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,     // VkStructureType                sType
    nullptr,                                      // const void                    *pNext
    Vulkan.RenderPass,                            // VkRenderPass                   renderPass
    Vulkan.FramebufferObjects[i].Handle,          // VkFramebuffer                  framebuffer
    {                                             // VkRect2D                       renderArea
      {                                           // VkOffset2D                     offset
        0,                                          // int32_t                        x
        0                                           // int32_t                        y
      },
      {                                           // VkExtent2D                     extent
        300,                                        // uint32_t                       width
        300,                                        // uint32_t                       height
      }
    },
    1,                                            // uint32_t                       clearValueCount
    &clear_value                                  // const VkClearValue            *pClearValues
  };

  vkCmdBeginRenderPass( Vulkan.GraphicsCommandBuffers[i], &render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE );

  vkCmdBindPipeline( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.GraphicsPipeline );

  vkCmdDraw( Vulkan.GraphicsCommandBuffers[i], 3, 1, 0, 0 );

  vkCmdEndRenderPass( Vulkan.GraphicsCommandBuffers[i] );

  if( GetGraphicsQueue().Handle != GetPresentQueue().Handle ) {
    VkImageMemoryBarrier barrier_from_draw_to_present = {
      VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,       // VkStructureType              sType
      nullptr,                                      // const void                  *pNext
      VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,         // VkAccessFlags                srcAccessMask
      VK_ACCESS_MEMORY_READ_BIT,                    // VkAccessFlags                dstAccessMask
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,              // VkImageLayout                oldLayout
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,              // VkImageLayout                newLayout
      GetGraphicsQueue().FamilyIndex,               // uint32_t                     srcQueueFamilyIndex
      GetPresentQueue( ).FamilyIndex,               // uint32_t                     dstQueueFamilyIndex
      swap_chain_images[i],                         // VkImage                      image
      image_subresource_range                       // VkImageSubresourceRange      subresourceRange
    };
    vkCmdPipelineBarrier( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier_from_draw_to_present );
  }

  if( vkEndCommandBuffer( Vulkan.GraphicsCommandBuffers[i] ) != VK_SUCCESS ) {
    printf( "Could not record command buffer!\n" );
    return false;
  }
}
return true;

23.Tutorial03.cpp, function RecordCommandBuffers()

Recording a command buffer is started by calling the vkBeginCommandBuffer() function. At the beginning we set up a barrier that tells the driver that queues from one family previously referenced a given image and that queues from a different family will now be referencing it (we need to do this because during swap chain creation we specified exclusive sharing mode). The barrier is set only when the graphics queue is different from the present queue. This is done by calling the vkCmdPipelineBarrier() function. We must specify when in the pipeline the barrier should be placed (VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT) and how the barrier should be set up. Barrier parameters are prepared through the VkImageMemoryBarrier structure:

  • sType – Type of the structure, here set to VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER.
  • pNext – Pointer reserved for extensions.
  • srcAccessMask – Type of memory operations that took place in regard to a given image before the barrier.
  • dstAccessMask – Type of memory operations connected with a given image that will take place after the barrier.
  • oldLayout – Current image memory layout.
  • newLayout – Memory layout the image should have after the barrier.
  • srcQueueFamilyIndex – Index of the queue family whose queues were referencing the image before the barrier.
  • dstQueueFamilyIndex – Index of the queue family whose queues will be referencing the image after the barrier.
  • image – Handle to the image itself.
  • subresourceRange – Parts of an image for which we want the transition to occur.

In this example we don’t change the layout of an image, for two reasons: (1) The barrier may not be set at all (if the graphics and present queues are the same), and (2) the layout transition will be performed automatically as a render pass operation (at the beginning of the first—and only—subpass).

Next we start a render pass. We call the vkCmdBeginRenderPass() function for which we must provide a pointer to a variable of VkRenderPassBeginInfo type. It contains the following members:

  • sType – Standard type of structure. In this case we must set it to a value of VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO.
  • pNext – Pointer reserved for future use.
  • renderPass – Handle of a render pass we want to start.
  • framebuffer – Handle of a framebuffer, which specifies images used as attachments for this render pass.
  • renderArea – Area of all images that will be affected by the operations that take place in this render pass. It specifies the upper-left corner (through the x and y parameters of the offset member) and the width and height (through the extent member) of the render area.
  • clearValueCount – Number of elements in pClearValues array.
  • pClearValues – Array with clear values for each attachment.

When we specify a render area for the render pass, we must make sure that the rendering operations won’t modify pixels outside this area. This is just a hint that lets the driver optimize its behavior. If we don’t confine operations to the provided area by using a proper scissor test, pixels outside this area may become undefined (we can’t rely on their contents). We also can’t specify a render area that is greater than the framebuffer’s dimensions (that falls outside the framebuffer).

The pClearValues array must contain one element for each render pass attachment. Each element specifies the color to which the given attachment should be cleared when its loadOp is set to clear. For attachments whose loadOp is not clear, the provided values are ignored, but we still can’t provide an array with fewer elements.

We have begun a command buffer, set a barrier (if necessary), and started a render pass. When we start a render pass we are also starting its first subpass. We can switch to the next subpass by calling the vkCmdNextSubpass() function. During these operations, layout transitions and clear operations may occur. Clears are done in the subpass in which an image is first used (referenced). Layout transitions occur each time a subpass’s layout is different from the layout in the previous subpass or (in the case of the first subpass or when the image is first referenced) different from the initial layout (the layout before the render pass). So in our example, when we start the render pass, the swap chain image’s layout is changed automatically from the “presentation source” layout to the “color attachment optimal” layout.

Now we bind a graphics pipeline. This is done by calling the vkCmdBindPipeline() function. This “activates” all shader programs (similar to the glUseProgram() function) and sets desired tests, blending operations, and so on.

After the pipeline is bound, we can finally draw something by calling the vkCmdDraw() function. In this function we specify the number of vertices we want to draw (three), the number of instances that should be drawn (just one), and the indices of the first vertex and first instance (both zero).

Next the vkCmdEndRenderPass() function is called which, as the name suggests, ends the given render pass. Here all final layout transitions occur if the final layout specified for a render pass is different from the layout used in the last subpass the given image was referenced in.

After that, the barrier may be set in which we tell the driver that the graphics queue finished using a given image and from now on the present queue will be using it. This is done, once again, only when the graphics and present queues are different. And after the barrier, we stop recording a command buffer for a given image. All these operations are repeated for each swap chain image.

Drawing

The drawing function is the same as the Draw() function presented in Tutorial 2. We acquire the image’s index, submit a proper command buffer, and present an image. We are using semaphores the same way they were used previously: one semaphore is used for acquiring an image, and it tells the graphics queue to wait when the image is not yet available for use. The second semaphore is used to indicate whether drawing on the graphics queue is finished; the present queue waits on this semaphore before it can present an image. Here is the source code of the Draw() function:

VkSemaphore image_available_semaphore = GetImageAvailableSemaphore();
VkSemaphore rendering_finished_semaphore = GetRenderingFinishedSemaphore();
VkSwapchainKHR swap_chain = GetSwapChain().Handle;
uint32_t image_index;

VkResult result = vkAcquireNextImageKHR( GetDevice(), swap_chain, UINT64_MAX, image_available_semaphore, VK_NULL_HANDLE, &image_index );
switch( result ) {
  case VK_SUCCESS:
  case VK_SUBOPTIMAL_KHR:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during swap chain image acquisition!\n" );
    return false;
}

VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_TRANSFER_BIT;
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &image_available_semaphore,                   // const VkSemaphore           *pWaitSemaphores
  &wait_dst_stage_mask,                         // const VkPipelineStageFlags  *pWaitDstStageMask
  1,                                            // uint32_t                     commandBufferCount
  &Vulkan.GraphicsCommandBuffers[image_index],  // const VkCommandBuffer       *pCommandBuffers
  1,                                            // uint32_t                     signalSemaphoreCount
  &rendering_finished_semaphore                 // const VkSemaphore           *pSignalSemaphores
};

if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}

VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,           // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &rendering_finished_semaphore,                // const VkSemaphore           *pWaitSemaphores
  1,                                            // uint32_t                     swapchainCount
  &swap_chain,                                  // const VkSwapchainKHR        *pSwapchains
  &image_index,                                 // const uint32_t              *pImageIndices
  nullptr                                       // VkResult                    *pResults
};
result = vkQueuePresentKHR( GetPresentQueue().Handle, &present_info );

switch( result ) {
  case VK_SUCCESS:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
  case VK_SUBOPTIMAL_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during image presentation!\n" );
    return false;
}

return true;

24.Tutorial03.cpp, function Draw()

Tutorial 3 Execution

In this tutorial we performed “real” drawing operations. A simple triangle may not sound too convincing, but it is a good starting point for a first Vulkan-created image. Here is what the triangle should look like:

If you’re wondering why there are black parts in the image, here is an explanation: To simplify the whole code, we created a framebuffer with a fixed size (width and height of 300 pixels). But the window’s size (and the size of the swap chain images) may be greater than these 300 x 300 pixels. The parts of an image that lie outside of the framebuffer’s dimensions are uncleared and unmodified by our application. They may even contain some “artifacts,” because the memory from which the driver allocates the swap chain images may have been previously used for other purposes and could contain some data. The correct behavior is to create a framebuffer with the same size as the swap chain images and to recreate it when the window’s size changes. But as long as the blue triangle is rendered on an orange/gold background, the code works correctly.

Cleaning Up

One last thing to learn before this tutorial ends is how to release resources created during this lesson. I won’t repeat the code needed to release resources created in the previous chapter. Just look at the VulkanCommon.cpp file. Here is the code needed to destroy resources specific to this chapter:

if( GetDevice() != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( GetDevice() );

  if( (Vulkan.GraphicsCommandBuffers.size() > 0) && (Vulkan.GraphicsCommandBuffers[0] != VK_NULL_HANDLE) ) {
    vkFreeCommandBuffers( GetDevice(), Vulkan.GraphicsCommandPool, static_cast<uint32_t>(Vulkan.GraphicsCommandBuffers.size()), &Vulkan.GraphicsCommandBuffers[0] );
    Vulkan.GraphicsCommandBuffers.clear();
  }

  if( Vulkan.GraphicsCommandPool != VK_NULL_HANDLE ) {
    vkDestroyCommandPool( GetDevice(), Vulkan.GraphicsCommandPool, nullptr );
    Vulkan.GraphicsCommandPool = VK_NULL_HANDLE;
  }

  if( Vulkan.GraphicsPipeline != VK_NULL_HANDLE ) {
    vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
    Vulkan.GraphicsPipeline = VK_NULL_HANDLE;
  }

  if( Vulkan.RenderPass != VK_NULL_HANDLE ) {
    vkDestroyRenderPass( GetDevice(), Vulkan.RenderPass, nullptr );
    Vulkan.RenderPass = VK_NULL_HANDLE;
  }

  for( size_t i = 0; i < Vulkan.FramebufferObjects.size(); ++i ) {
    if( Vulkan.FramebufferObjects[i].Handle != VK_NULL_HANDLE ) {
      vkDestroyFramebuffer( GetDevice(), Vulkan.FramebufferObjects[i].Handle, nullptr );
      Vulkan.FramebufferObjects[i].Handle = VK_NULL_HANDLE;
    }

    if( Vulkan.FramebufferObjects[i].ImageView != VK_NULL_HANDLE ) {
      vkDestroyImageView( GetDevice(), Vulkan.FramebufferObjects[i].ImageView, nullptr );
      Vulkan.FramebufferObjects[i].ImageView = VK_NULL_HANDLE;
    }
  }
  Vulkan.FramebufferObjects.clear();
}

25.Tutorial03.cpp, function ChildClear()

As usual, we first check whether there is any device; if we don’t have a device, we don’t have any resources. Next we wait until the device is idle and then delete all the created resources. We start by freeing the command buffers with the vkFreeCommandBuffers() function. Next we destroy the command pool through the vkDestroyCommandPool() function, and after that the graphics pipeline is destroyed through a vkDestroyPipeline() function call. Next we call the vkDestroyRenderPass() function, which releases the handle to the render pass. Finally, all framebuffers and image views associated with each swap chain image are deleted.

Each object’s destruction is preceded by a check of whether the given resource was properly created. If it wasn’t, we skip destroying that resource.

Conclusion

In this tutorial, we created a render pass with one subpass. Next we created image views and framebuffers for each swap chain image. One of the most difficult parts was to create a graphics pipeline, because it required us to prepare lots of data. We had to create shader modules and describe all the shader stages that should be active when a given graphics pipeline is bound. We had to prepare information about input vertices, their layout, and assembling them into polygons. Viewport, rasterization, multisampling, and color blending information was also necessary. Then we created a simple pipeline layout and after that we could create the pipeline itself. Next we created a command pool and allocated command buffers for each swap chain image. Operations recorded in each command buffer involved setting up an image memory barrier, beginning a render pass, binding a graphics pipeline, and drawing. Next we ended a render pass and set up another image memory barrier. The drawing itself was performed the same way as in the previous tutorial (2).

In the next tutorial, we will learn about vertex attributes, images, and buffers.


Go to: API without Secrets: Introduction to Vulkan* Part 4: Vertex Attributes (To Be Continued)


Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800- 548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.

Intel® Collaboration Suite for WebRTC Simplifies Adding Real-Time Communication to Your Applications


Download PDF [PDF 569 KB]

Overview

Web-based real-time communication (WebRTC) is an open standard proposed by both World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) that allows browser-to-browser applications to support voice calling, video chat, and peer-to-peer (P2P) data transmission. End users can use their browsers for real-time communication without the need for any additional clients or plugins.

The WebRTC standard is gaining significant momentum and is currently fully supported by open standard browsers such as Google Chrome*, Mozilla Firefox*, and Opera*. Microsoft has also announced Object RTC (ORTC) support in its Edge* browser, which will be interoperable with WebRTC.

To ease adoption of this WebRTC technology and make it widely available to expand or create new applications, Intel has developed the end-to-end WebRTC solution, Intel® Collaboration Suite for WebRTC (Intel® CS for WebRTC). Intel CS for WebRTC is highly optimized for Intel® platforms, including Intel® Xeon® processor-based products such as the Intel® Visual Compute Accelerator card, Intel® Core™ processor-based desktop products, and Intel® Atom™ processor-based mobile products.

You can download Intel CS for WebRTC from http://webrtc.intel.com at no charge. It includes the following main components:

  • Intel CS for WebRTC Conference Server – enables not only P2P-style communication, but also efficient WebRTC-based video conferencing.
  • Intel CS for WebRTC Gateway Server for SIP – provides WebRTC connectivity into session initiation protocol (SIP) conferences.
  • Intel CS for WebRTC Client SDK – allows you to develop WebRTC apps using JavaScript* APIs, an Internet Explorer* plugin for WebRTC, Android* native apps using Java* APIs, iOS* native apps using Objective-C* APIs, or Windows* native apps using C++ APIs.
  • Intel CS for WebRTC User Documentation – includes complete online documentation available on the WebRTC website http://webrtc.intel.com, with sample code, installation instructions, and API descriptions.

Problems with Existing WebRTC-Based RTC Solutions

WebRTC-based RTC solutions change the way people communicate, bringing real-time communication to the browser. However, as a new technology, WebRTC-based solutions require improvements in the following areas to be as complete as traditional RTC solutions.

  • Mostly P2P communication based. The WebRTC standard itself, as well as Google’s WebRTC open source reference implementation, focuses only on peer-to-peer (P2P) communication, limiting most of the WebRTC-based solutions to two-party communication. Although some WebRTC solutions support multi-party chat, these solutions use a mesh network topology, which is less efficient and can support only a few attendees on common client devices.
  • Not fully accounting for client usage preferences. Although browsers are available for multiple platforms, not all users like browsers. That is, many mobile platform end-users prefer native apps, such as Android apps or iOS apps. Additionally, some commonly used browsers, such as Internet Explorer, still do not natively support WebRTC.
  • Lack of flexibility on the MCU server. Some WebRTC-based solutions support multipoint control unit (MCU) servers for multi-party communication. However, most of those MCU servers use a router/forward approach, which simply forwards the publishers’ streams to the subscribers. Although this approach covers scenarios where clients have equivalent capabilities or where SVC/simulcast is supported, it places requirements on clients that are hard to meet. To work with a wide variety of devices, MCU servers must do some media-specific processing, such as transcoding and mixing.
  • Limited deployment mode choices for customers. Most of the existing WebRTC-based RTC solutions work as a service model hosted by service providers. This style provides all the benefits of a cloud service, but it does not suit customers who want to host the service themselves for data-sensitivity reasons.

Key Differentiation of Intel® CS for WebRTC

Fully Functional WebRTC-Based Audio/Video Communication

Intel CS for WebRTC not only offers peer-to-peer WebRTC communication, but it also supports WebRTC-based multi-party video conferencing and provides the WebRTC client connectivity to other traditional video conferences, like SIP. For video conferencing, it provides router and mixer solutions simultaneously to handle complex customer scenarios. Additionally, it supports:

  • H.264 and VP8 video codecs for input and output streams
  • MCU multi-streaming
  • Real-time streaming protocol (RTSP) stream input
  • Customized video layout definition plus runtime control
  • Voice activity detection (VAD) controlled video switching
  • Flexible media recording

Easy to Deploy, Scale, and Integrate

Intel CS for WebRTC Conference and Gateway Servers provide pluggable integration modules as well as open APIs to work with existing enterprise systems. They easily scale to cluster mode and serve a larger number of users as cluster nodes are added. In addition, the Intel solution provides comprehensive client SDKs, including a JavaScript SDK, Android native SDK, iOS native SDK, and Windows native SDK, to help customers quickly expand their client applications with video communication capabilities.

High-Performance Media Processing Capability

Intel CS for WebRTC MCU and Gateway servers are built on top of Intel® Media Server Studio, optimized for Intel® Core™ processors and Intel® Xeon® processor E3 family with Intel® Iris™ graphics, Intel® Iris™ Pro graphics, and Intel® HD graphics technology.

The client SDKs, including the Android native SDK and Windows C++ SDK, use the mobile and desktop platforms’ hardware media processing capabilities to improve the user experience. That is, the Android native SDK is optimized for Intel® Atom™ platforms (all Intel® Atom™ x3, x5, and x7 processor series) focusing on video power and performance, as well as end-to-end latency. The Windows C++ SDK also uses the media processing acceleration of the Intel® Core™ processor-based platforms (i3, i5, i7) for consistent HD video communication.

Secure, Intelligent, Reliable QoS Control Support

The Intel CS for WebRTC solution ensures video communication data security through HTTPS, secure WebSocket, SRTP/DTLS, and so on. Intelligent quality of service (QoS) control (for example, NACK, FEC, and dynamic bitrate control) also safeguards the communication quality between clients and servers against high packet loss and network bandwidth variance. Experiments shown in Figure 1 demonstrate that the Intel video engine handles up to 20% packet loss and 200 ms delay.

Figure 1. Packet Loss Protection Results with QoS Control

Full Functional Video Communication with Intel CS for WebRTC Conference Servers

Flexible Communication Modes

Intel CS for WebRTC offers both peer-to-peer video call and MCU-based multi-party video conference communication modes.

A typical WebRTC usage scenario is a direct peer-to-peer video call. After connecting to the signaling server, users can invite other parties for P2P video communication. All video, audio, and data streams are transported directly between the peers, while the signaling messages for discovery and control go through the signaling server. As Figure 2 shows, Intel provides a reference signaling server implementation called Peer Server with source code included. Customers can construct their own signaling server based on this Peer Server or replace the whole Peer Server with an existing channel. The client SDK also provides a customization mechanism to let users implement their own signaling channel adapter.

Figure 2. P2P Video Communication with Peer Server

The Intel CS for WebRTC solution further offers MCU-based multi-party video conference chat. All streams go through the MCU server, just as the signaling messages do, as Figure 3 shows. This reduces the stream traffic and computing overhead on client devices compared to a mesh network solution.

Figure 3. Multi-party Video Conference Chat through MCU Server

Unlike most existing WebRTC MCUs, which usually work as routers that forward media streams for clients, the Intel CS for WebRTC MCU server also handles media processing and allows a wide range of devices to be used in the conference. Users can subscribe to either the forward streams or the mixed streams from the MCU server. Based on Intel Iris Pro graphics or Intel HD graphics technology, media processing on the MCU server can achieve an excellent cost-performance ratio.

The Intel MCU provides more flexibility for mixed streams. You can generate mixed streams at multiple video resolutions to adapt to client devices with different media processing capabilities and network bandwidth.

External Input for RTSP Streams

Intel CS for WebRTC allows bridging a wider range of devices into the conference by supporting external inputs from RTSP streams. This means almost all RTSP compatible devices, including IP cameras, can join the video conference. The IP camera support opens up usage scenarios and applications in security, remote education, remote healthcare, etc.

Mixed-Stream Layout Definition and Runtime Region Control

Through the Intel CS for WebRTC video layout definition interface, which is an expanded version of RFC 5707 (MSML), you can define any rectangle-style video layout for a conference according to the number of participants at runtime. Figure 4 shows the video layouts for one conference. The meeting contains 5 different layouts for 1, 2, 3, 4, or 5-6 participants.

Figure 4. Example Video Layouts

Figure 5 describes the detailed layout regions for a maximum of 2 participants. The region with id number 1 is always the primary region of this layout.

Figure 5. Example Video Layout Definition and Effect

Intel CS for WebRTC MCU also supports automatic voice-activated video switching through voice activity detection (VAD). The user most active on voice is switched to the primary region, which is the yellow part of Figure 6.

Figure 6. Example Video Layouts with Primary Region

You can also assign any stream to any region as needed during runtime for flexible video layout design of the conference.

Flexible Conference Recording

When recording in Intel CS for WebRTC, you can select any video feed and any audio feed. You can not only record while switching across the different streams that the conference room offers (such as mixed and forward streams), but also select video and audio tracks separately from different streams. For example, you can select the audio track from the participants’ mixed stream and the video track from the screen-sharing stream.

Scaling the Peer Server Reference Implementation

Although the Peer Server that Intel provides is a single-node signaling server reference implementation, you can extend it to a distributed, large-scale platform by refactoring the implementation. See Figure 7 for a scaling proposal.

Figure 7. Peer Server Cluster Scaling Proposal

Scaling the MCU Conference Server

The Intel CS for WebRTC MCU server was designed to be a distributed framework with separate components, including manager node, signaling nodes, accessing nodes, media processing nodes, etc. Those components are easy to scale and suitable for cloud deployment.

Figure 8 shows an example from the conference server user guide for deploying an MCU server cluster.

Figure 8. MCU Conference Server Cluster Deployment Example

Interoperability with Intel CS for WebRTC Gateway

For legacy video conference solutions to adopt the WebRTC advantage on the client side, Intel CS for WebRTC provides the WebRTC gateway.

Key Functionality Offering

The Intel CS for WebRTC gateway for SIP not only provides basic signaling and protocol translation between WebRTC and SIP, it also provides real-time media transcoding between VP8 and H.264 to bridge the video codec preference difference between them. In addition, the gateway keeps the session mapping between WebRTC and SIP to support bi-directional video calls. Figure 9 briefly shows how SIP devices can connect with WebRTC terminals through the gateway Intel provides.

Figure 9. Connect WebRTC with SIP Terminals through the Gateway

Validated SIP Environments

Note: See Intel CS for WebRTC Release Notes for current validated environments

Cloud Deployment

The Intel CS for WebRTC gateway instances are generally session-based. Each session is independent, so sessions are easily scalable to multiple instances for cloud deployment. You can make the gateway instance management a component of your existing conference system scheduling policy and achieve load balancing for the gateway.

Comprehensive Intel CS for WebRTC Client SDKs

The Intel CS for WebRTC also provides comprehensive client SDKs to help you easily implement all the functionality that the server provides. The client SDKs allow client apps to communicate with remote clients or join conference meetings. Basic features include audio/video communication, data transmission, and screen sharing. P2P mode also supports a customized signaling channel that can be easily integrated into existing IT infrastructures.

Client SDKs include JavaScript SDK, Android SDK, iOS SDK, and Windows SDK. Current features are listed in Table 1.

Table 1. Client SDK Features

#Partial support: for the JavaScript SDK, H.264 video codec support is only available when the browser’s WebRTC engine supports it.

Customized Signaling Channel

In addition to the default Peer Server, the Intel CS for WebRTC client SDK for P2P chat provides simple customizable interfaces that allow you to implement and integrate your own signaling channel, for example through an extensible messaging and presence protocol (XMPP) server channel. Figure 10 shows the separate signaling channel model in the client SDK for P2P chat that users can customize.

Figure 10. Customized Signaling Channel in Client SDK for P2P Chat

Hardware Media Processing Acceleration

On Android platforms, VP8/H.264 decoding/encoding hardware acceleration is enabled if the underlying platform includes corresponding MediaCodec plugins. For Windows, H.264 decoding/encoding and VP8 decoding hardware acceleration is enabled with DXVA-based HMFT or Intel Media SDK. For iOS, H.264 encoding/decoding is hardware-accelerated through Video Toolbox framework. Table 2 below shows hardware acceleration for WebRTC on different platforms.

Table 2. Hardware Media Acceleration Status for Client SDKs

#Conditional support: only enabled if the platform level enables VP8 hardware codec

NAT Traversal

Interactive Connectivity Establishment (ICE) helps devices connect to each other in various complicated Network Address Translation (NAT) conditions. The client SDKs support Session Traversal Utilities for NAT (STUN) and Traversal Using Relay NAT (TURN) servers. Figure 11 and Figure 12 show how client SDKs perform NAT traversal through STUN or TURN servers.

Figure 11. NAT Traversal with STUN Server

Figure 12. NAT Traversal with TURN Server

Fine-Grained Media & Network Parameter Control

Client SDKs further allow you to choose the video or audio source and its resolution and frame rate, the preferred video codec, and maximum bandwidth for video/audio streams.

Real-Time Connection Status Retrieval

Client SDKs provide APIs to retrieve real-time network and audio/video quality conditions. You can reduce the resolution or switch to an audio only stream if the network quality is not good, or adjust audio levels if audio quality is poor. Table 3 lists connection status information supported by client SDKs.

Table 3. Connection Status Information supported by Client SDKs

Conclusion

Based on WebRTC technology, Intel® Collaboration Suite for WebRTC builds an end-to-end solution, allowing you to enhance your applications with Internet video communication capabilities. The acceleration from Intel’s media processing platforms on the client and server sides, such as the Intel® Visual Compute Accelerator, improves the client user experience as well as the server side cost-effectiveness.

Additional Information

For more information, please visit the following web pages:
 

Intel Visual Compute Accelerator:
http://www.intel.com/content/www/us/en/servers/media-and-graphics/visual-compute-accelerator.html
http://www.intel.com/visualcloud

Intel Collaboration Suite for WebRTC:
http://webrtc.intel.com
https://software.intel.com/en-us/forums/webrtc
https://software.intel.com/zh-cn/forums/webrtc

The Internet Engineering Task Force (IETF) Working Group:
http://tools.ietf.org/wg/rtcweb/

W3C WebRTC Working Group:
http://www.w3.org/2011/04/webrtc/

WebRTC Open Project:
http://www.webrtc.org

Acknowledgements (alphabetical)

Elmer Amaya, Jianjun Zhu, Jianlin Qiu, Kreig DuBose, Qi Zhang, Shala Arshi, Shantanu Gupta, Yuqiang Xian

About the Author

Lei Zhai is the engineering manager in the Intel Software and Solutions Group (SSG), Systems Technologies & Optimizations (STO), Client Software Optimization (CSO). His engineering team focuses on Intel® Collaboration Suite for WebRTC product development and its optimization on Intel® architecture platforms.

Obtaining a High Confidence in Streaming Data Using Standard Deviation of the Streaming Data


Download PDF (490.39 KB)

When measuring boxes with the world-facing Intel® RealSense™ camera DS4, I discovered that I needed the ability to automate the capturing of the box image and size data. This would allow a camera mounted over a scale to auto-capture the image and then send a known accurate value back to the system. This type of automation enables the design of a kiosk where placing a box on the scale triggers the image capture and automates the image process, so the clerk doesn’t have to press a button to facilitate the transaction. With the weight and size data calculated, the data can be entered as measured into a mail system programmatically.

Thinking about how to automate the image capture when the images are composed of streaming data presents new problems. The idea presented in this code is to determine when the image has stabilized to a point where we have a high confidence in the data that is being seen. The basis used in this class is that we can use statistics to determine when we have a stable image and thus when to automatically capture the image. To do this, we use the standard deviation model.

The standard deviation model or bell curve is normally represented using this type of graph.


https://en.wikipedia.org/wiki/Standard_deviation

Another way to look at this data is to represent how much of the data is within a range of standard deviations from the mean.


https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule

As shown in the graph, when you are within 1 standard deviation of the mean, you have a 68-percent confidence in the data. By using 2 standard deviations as the cutoff, you know you have a 95-percent confidence level in the accuracy of the data that is coming across the stream from the camera.
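
For instance, suppose the streaming box-width measurements have a mean of 30.0 cm and a standard deviation of 0.2 cm (values chosen only for illustration). A new reading of 30.3 cm is then (30.3 - 30.0) / 0.2 = 1.5 standard deviations from the mean: inside the 95-percent band, but outside the 68-percent band.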

Now that I’ve proposed using the standard deviation as the method for auto-capturing an image, determining the standard deviation for a set of streaming data becomes the next issue. Searching for solutions, I came upon a refresher from my college days: Knuth. On page 232 of Donald Knuth's The Art of Computer Programming (volume 2, third edition) is a formula for determining the variance, and thus the standard deviation, of streaming data. The code in this class implements that formula, and the source code header comments document exactly which variables are used for which parts of the formula.

This family of formulas is described in this article:
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

With this class, I now have the ability to determine how many standard deviations away from the mean a data stream is. Here’s how I exposed this knowledge in the UI. In my sample, the Intel RealSense camera was used to measure a 3D box and then draw a bounding box around the item in the picture. When the standard deviation was less than 1, the box was drawn with a red border; when it was less than 2, it was drawn yellow; and anything greater than 2 was drawn green. This UI implementation gives immediate feedback to the application’s users and helps them visualize the consistency of the data. The stability is also visible as the box lines drawn over the image move about less and become more stable until the image freezes when captured.

With that preface, here’s the class just described, implemented in C++.

For any questions or discussions, please email me
Dale Taylor      Intel Corp
dale.t.taylor@intel.com


This is the source code from the StreamingStats.h file.

/******************************************************************************
Copyright 2015, Intel Corporation All Rights Reserved.

The source code, information and material("Material") contained herein is owned
by Intel Corporation or its suppliers or licensors, and title to such Material
remains with Intel Corporation or its suppliers or licensors. The Material
contains proprietary information of Intel or its suppliers and licensors.The
Material is protected by worldwide copyright laws and treaty provisions. No
part of the Material may be used, copied, reproduced, modified, published,
uploaded, posted, transmitted, distributed or disclosed in any way without
Intel's prior express written permission. No license under any patent,
copyright or other intellectual property rights in the Material is granted to
or conferred upon you, either expressly, by implication, inducement, estoppel
or otherwise. Any license under such intellectual property rights must be
express and approved by Intel in writing.

Unless otherwise agreed by Intel in writing, you may not remove or alter this
notice or any other notice embedded in Materials by Intel or Intel's suppliers
or licensors in any way.
******************************************************************************/

//
// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
// This code implements a method for determining Std Dev from streaming data.
// Based on Donald Knuth's Art of Computer Programming vol 2, 3rd edition,
// page 232
//
// The basic algorithm follows (to show what the variable names represent)
// X is the current data item
// Mc is the running Mean (Mean current), Mc-1 is the mean for the previous
// item, Sc is the running Sum (Sum current), Sc-1 is the sum for the previous
// item, c is the count or current item
// init M1 = X1 and S1 = 0 during the first pass and on reset
// Data is added to the cumulating values using this formula
// Mc = Mc-1 + (Xc - (Mc-1))/c
// Sc = Sc-1 + (Xc - (Mc-1))*(Xc - Mc)
// for 2<= c <= n the cth estimate of variance is s*2 = Sc/(c-1)
//


#include "math.h"

class StreamingStats {

private:
	unsigned int count = 0;
	unsigned int index = 0;
	double ss_Mean, ss_PrevMean, ss_Sum;
	double* ss_Data;
	unsigned int ss_Size = 1;
	// Internal functions defined here

public:
	StreamingStats(unsigned int windowSize);     // Constructor, defines window size
	~StreamingStats(void) { delete [] ss_Data; };	// destructor for data
	int		DataCount();			// return # items are in this data set
	int		DataReset();			// reset the data to empty state
	int		NewData(double x);		// add a data item
	double	Mean();				// return Mean of the current data
	double	Variance();				// return Variance of the current data
	double	StandardDeviation();		// return Std Deviation of the current data

};

Comments on the class and code defined in the H file.

Because the variable-sized data structure (ss_Data) is allocated with the new operator in the constructor, a destructor was defined to ensure that delete is called.



This is the source code from the StreamingStats.cpp file.

/******************************************************************************
Copyright 2015, Intel Corporation All Rights Reserved.

The source code, information and material("Material") contained herein is owned
by Intel Corporation or its suppliers or licensors, and title to such Material
remains with Intel Corporation or its suppliers or licensors. The Material
contains proprietary information of Intel or its suppliers and licensors.The
Material is protected by worldwide copyright laws and treaty provisions. No
part of the Material may be used, copied, reproduced, modified, published,
uploaded, posted, transmitted, distributed or disclosed in any way without
Intel's prior express written permission. No license under any patent,
copyright or other intellectual property rights in the Material is granted to
or conferred upon you, either expressly, by implication, inducement, estoppel
or otherwise. Any license under such intellectual property rights must be
express and approved by Intel in writing.

Unless otherwise agreed by Intel in writing, you may not remove or alter this
notice or any other notice embedded in Materials by Intel or Intel's suppliers
or licensors in any way.
******************************************************************************/

//
// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
// This code implements a method for determining Std Dev from streaming data.
// Based on Donald Knuth's Art of Computer Programming vol 2, 3rd edition,
// page 232
//
// The basic algorithm follows (to show what the variable names represent)
// X is the current data item
// Mc is the running Mean (Mean current), Mc-1 is the mean for the previous
// item, Sc is the running Sum of Squares of Differences (Sum current),
// Sc-1 is the sum for the previous item, c is the count or current item.
// Init M1 = X1 and S1 = 0 during the first pass and on reset
// Data is added to the cumulating values using this formula
// Mc = Mc-1 + (Xc - (Mc-1))/c
// Sc = Sc-1 + (Xc - (Mc-1))*(Xc - Mc)
// for 2<= c <= n the cth estimate of variance is s*2 = Sc/(c-1)
//

#include "math.h"
#include "StreamingStats.h"

StreamingStats::StreamingStats(unsigned int windowSize)
{
	if (windowSize > 0)
		ss_Size = windowSize;
	ss_Data = new double[ss_Size];

	return;
}

// This routine returns the count of the # of items used
// to determine the current values in the object.
//
int StreamingStats::DataCount()
{
	return count;			// return the number of accumulated data items
}

int StreamingStats::DataReset()
{
//	ss_PrevMean = ss_Mean = 0.0;	// clear all data
//	ss_PrevSum = 0.0;
	count = 0;
	index = 0;
	return 0;				// start empty, no elements
}

// this routine adds new data to the streaming stats totals
// returns the # of items added to the data set
int StreamingStats::NewData(double x)
{
	ss_PrevMean = ss_Mean;
	if (count >= ss_Size) { // We're rolling the window
		// The oldest data point is the next point in a circular array
		index++;
		if (index >= ss_Size)
			index = 0;
		// Remove oldest data point from mean
		ss_Mean = ss_Mean - (ss_Data[index] - ss_Mean) / ss_Size;
		// Add new data point to mean
		ss_Mean = ss_Mean + (x - ss_PrevMean) / ss_Size;
		// Remove oldest data point from sum
		ss_Sum = ss_Sum - (ss_Data[index] - ss_PrevMean) *
			(ss_Data[index] - ss_Mean);
		// Add new data point to sum
		ss_Sum = ss_Sum + (x - ss_PrevMean) * (x - ss_Mean);
	}
	else { // We're still filling the window
		count++;

		if (count == 1) // initialize with the first data item only
		{
			ss_PrevMean = ss_Mean = x;
			ss_Sum = 0.0;
		}
		else // we are adding a data item, follow the formula
		{
			ss_Mean = ss_PrevMean + (x - ss_PrevMean) / count;
			ss_Sum = ss_Sum + (x - ss_PrevMean)*(x - ss_Mean);
		}
		index = count - 1;
	}

	// Store new data point - overwriting oldest in circular array
	ss_Data[index] = x;

	return count;
}

// if the count is positive, return the new mean
double StreamingStats::Mean()
{
	return (count > 0) ? ss_Mean : 0.0;
}

// if the count is 2 or more, return a variance, otherwise zero
double StreamingStats::Variance()
{
	return ((count > 1) ? ss_Sum / (count - 1) : 0.0);
}

// calc the StdDev based using sqrt of the variance (standard method)
double StreamingStats::StandardDeviation()
{
	return sqrt(Variance());
}
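
To show how this class might drive the auto-capture decision described earlier, here is a minimal usage sketch. The window size, the simulated measurements, and the stability threshold are assumptions, not part of the original sample.

#include <cstdio>
#include "StreamingStats.h"

int main()
{
	// Simulated per-frame box-height measurements (values are made up).
	const double readings[] = { 30.4, 29.8, 30.1, 30.0, 30.2, 30.1, 30.0, 30.1 };
	const unsigned int windowSize = 5;

	StreamingStats heightStats(windowSize);

	for (double r : readings) {
		heightStats.NewData(r);

		// One possible auto-capture rule: wait until the window is full and the
		// spread of recent measurements settles below a small threshold.
		if (heightStats.DataCount() == (int)windowSize &&
			heightStats.StandardDeviation() < 0.25) {
			printf("Stable: mean = %.2f, std dev = %.3f -> capture image\n",
				heightStats.Mean(), heightStats.StandardDeviation());
			break;
		}
	}
	return 0;
}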

About the Author

Dale Taylor has worked for Intel since 1999 as a Software Engineer. He is currently in Arizona and focused on Atom Enabling, helping our software partners use Intel’s latest chips and hardware. He’s been a programmer for 25 years. Dale has a BS in Computer Science and a Business Management minor, and has always been a gadget freak. Dale is a pilot and spends time in the summer flying gliders over the Rocky Mountains. When not soaring he enjoys hiking, cycling, boating, and photography.

Notices

Intel, the Intel logo, and Intel RealSense are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.


This sample source code is released under the Intel Sample Source Code License Agreement.

An API Journey for Velocity and Lower Cost


Download Document

Introduction

With digital business and market competition intensifying, many companies are shifting to APIs to expose business processes and data for easier and better integration, enabling business expansion, and driving lower cost and faster time to market (TTM) through reusable services and APIs. APIs are being deployed on an unprecedented scale in broad ecosystems for social, mobile, cloud, and analytical applications as well as for the Internet of Things (IoT). This article shares what we have learned in adopting service oriented architecture (SOA) and APIs.

Our Approach to APIs

APIs are now used by most global businesses to interconnect and integrate with each other’s businesses, forming a trend called the API Economy. Since 2013, we have been promoting the adoption of an API-based application development practice along with our SOA effort. We standardized APIs, design patterns, and code libraries. We developed an API-based application framework to help us deliver solutions and integrations faster and more flexibly.

Before we started our API journey, most of our applications were built as monolithic solutions with a tightly integrated architecture. A typical application was built as a closed system, with modules compiled and linked together to run as a single process. With little ability for reuse, every capability or integration needed new design and development effort. As we started our API journey, we created our API strategy, and developed standards, technical guidance, training, and an application framework based on the API and service concept, as illustrated in Figure 1.

We used the standards, guidance, and the API based application framework to guide our application architecture design and development. The framework helped teams shift their application development approach from building tightly coupled applications to building APIs and SOA service based applications.


Figure 1. API based application framework.

We also learned that we need to define the API taxonomy and metadata for APIs up front (see Figure 2). Having a clearly defined taxonomy and metadata allowed us to communicate among business stakeholders and development teams, helped us analyze the business processes and data, and let us properly model the services and APIs. With a defined taxonomy and related metadata, we were able to break large business processes and data into smaller functional pieces, and model them as APIs. We then implemented them as services, and exposed them as APIs.


Figure 2. The API taxonomy.

Next we knew it was important to model core APIs that could be used by many enterprise applications. To do this, we developed three categories of core APIs: Master Data APIs, Security APIs, and Utility APIs. These core APIs provided immediate value through broad reuse and shortened application development time. We have built more than 80 core APIs since developing the categories. As a result of developing applications based on these reusable APIs, TTM has been greatly reduced. Figure 3 shows a sample of our core API usage.


Figure 3. Sample Core API usage.

Finally, we realized the importance of having an API management capability for registering, finding, and managing APIs. With the number of APIs increasing, we needed a better ability to control API calls, throttle the API call volume, track the call traffic, and measure and report the API calls. Of course, we also needed to properly secure the APIs. A good API management product would provide all these key API management capabilities. Figure 4 illustrates a conceptual diagram of API management and usage.

We developed the Enterprise API Registration Application (EARA) and the IT Developer Zone (ITDZ) Service Portal to provide some of the API management capabilities. API owners use the EARA to register their APIs. Application developers use the ITDZ Service Portal to search for, and request to use the APIs. As we continue on our API journey, we are deploying a vendor API Management tool to meet our growing needs.


Figure 4. API management and usage concept.

Lessons Learned

To embark on the API Economy journey, and start API based application development, we had to change our mindset from the traditional way of building applications to the API approach. Instead of thinking of an application as one big thing, we had to think of it as an assembly of smaller APIs and services. From there, we designed the capabilities as reusable APIs integrated together using a loosely coupled framework.

Once we built the component APIs, we designed applications as containers, and orchestrated the actions and responses of APIs to deliver the desired results for the business and customers. We also exposed selected APIs to external partners for business expansion and integration when necessary.

Changing the mindset of our development culture was much more difficult than simply planning the development process. We chartered a dedicated program to drive our API adoption. We not only drove the architecture change, API management capability, and technology toolsets, but more importantly we also developed standards, guidance, and training courses, providing guidance and training to our development community to help anyone be successful on their API journey.

Summary

The API-based application development and integration approach is useful for any company. APIs enable easier integration with customers’ and partners’ business applications, and better support mobile applications. The API approach helps reduce the cost of application development and integration through reuse, and gains the TTM advantage, which enables a business to get to market more quickly. The approach is especially important for companies that want to expand business onto the Internet and participate in the growing API Economy. We are actively using APIs in our business application integrations with internal and external partners. As our journey continues, we are beginning work on our microservice and container strategy. Our API architecture framework, experience, and learnings can help other companies start their own API Economy journey.

References

Welcome to The API Economy:
http://www.forbes.com/sites/ciocentral/2012/08/29/welcome-to-the-api-economy/

API management definition:
http://searchcloudapplications.techtarget.com/definition/API-management

Top API management vendors:
http://www.informationweek.com/cloud/platform-as-a-service/forrester-names-top-api-management-vendors/d/d-id/1316520

About the Authors

Jian Wu has over 30 years of industry experience. His career spans Factory Automation, Process Control, Supply Chain, Product Engineering, and IT enterprise application architecture design, development, and integration. Currently, Jian is responsible for setting and governing Intel IT enterprise application integration capability strategy, standards, and technical guidance.

Mark Boucher is a 30-year veteran in large-scale enterprise software. Currently Mark is a principal engineer setting the software architecture direction for sales and marketing organizations.

Don Meyers has over 30 years of industry experience leading the architecture, design, and deployment of enterprise applications, IT infrastructure, and collaboration systems. He has patent filings for various computer systems, including: Network, Data management, Collaboration, eMail, Search security, and Perceptual computing.

Tutorial: Using Intel® RealSense™ Technology in the Unreal Engine* 3 - Part 2


Download PDF 854 KB

Go to: Tutorial: Using Intel® RealSense™ Technology in the Unreal Engine* 3 - Part 1

In Part 1, you created a simple game map with face bone structure using the Unreal 3 Editor. Here, in Part 2 below, we will show how to apply Intel RealSense SDK features in your Unreal game.

Setting Up Visual Studio 2010 for the Example Game

The steps below set your map file as the default map for the Example game by modifying the .ini files.

  1. Go to <UE3 Source>\ExampleGame\Config.
  2. Open DefaultEngine.ini and change as shown below.

[URL]

MapExt=umap

Map=test.umap

LocalMap=BLrealsense_Map.umap

TransitionMap=BLrealsense_Map.umap

EXEName=ExampleGame.exe

DebugEXEName=DEBUG-ExampleGame.exe

GameName=Example Game

GameNameShort=EG

  3. Open ExampleEngine.ini and change as listed.

    [URL]

    Protocol=unreal

    Name=Player

    Map=test.umap

    LocalMap=BLrealsense_Map.umap

    LocalOptions=

    TransitionMap=BLrealsense_Map.umap

    MapExt=umap

    EXEName=ExampleGame.exe

    DebugEXEName=DEBUG-ExampleGame.exe

    SaveExt=usa

    Port=7777

    PeerPort=7778

    GameName=Example Game

    GameNameShort=EG

  4. Open the UE3 Visual Studio project or solution file in <UE3 source>\Development\Src – UE3.sln, or open UE3.sln in Visual Studio.

    Figure 1: Microsoft Visual Studio* 2010.

  5. Build and run as in the previous steps. You will see the Unreal initial window and your game.

Using the Coordinate System in Unreal Engine

Before linking with the Intel® RealSense™ SDK, it is important to understand the coordinate system in Unreal.

Position is tracked along the X-Y-Z axes (refer to the "Origin" and "RotOrigin" classes in the UE3 source code), and rotation is represented by Euler angles (P-Y-R) and quaternions (refer to https://en.wikipedia.org/wiki/Quaternion for more detail).


Figure 2: Coordinate system

A quaternion has one scalar component and three vector components.

To convert from Euler angles to a quaternion, each axis angle is halved and the half-angle sines and cosines are combined. A minimal sketch for X-Y-Z angles follows.
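The following is a minimal, self-contained C++ sketch of the conversion, assuming the intrinsic X-Y-Z (roll-pitch-yaw) rotation order; the exact order and axis convention used by UE3's FQuat::MakeFromEuler may differ, so treat it as illustrative.

// Illustrative Euler-to-quaternion conversion for the intrinsic X-Y-Z
// (roll, pitch, yaw) rotation order; UE3's own convention may differ.
#include <cmath>

struct Quat { float W, X, Y, Z; };

Quat EulerXYZToQuat(float rollDeg, float pitchDeg, float yawDeg)
{
    const float halfDegToRad = 3.14159265f / 360.0f;
    float r = rollDeg  * halfDegToRad;   // half-angle about X
    float p = pitchDeg * halfDegToRad;   // half-angle about Y
    float y = yawDeg   * halfDegToRad;   // half-angle about Z

    float cr = std::cos(r), sr = std::sin(r);
    float cp = std::cos(p), sp = std::sin(p);
    float cy = std::cos(y), sy = std::sin(y);

    Quat q;
    q.W = cr * cp * cy - sr * sp * sy;   // scalar component
    q.X = sr * cp * cy + cr * sp * sy;   // vector components
    q.Y = cr * sp * cy - sr * cp * sy;
    q.Z = cr * cp * sy + sr * sp * cy;
    return q;
}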

Autoexpand Setup for a Debugger in Visual Studio 2010 (Optional)

By default, the debugging symbols for the bone structure, position, and rotation arrays are not displayed in a readable form in Visual Studio. To see readable debugging values, follow the steps below.

  1. Find your Autoexp.dat
     

    For Visual Studio and Windows 7 64-bit, it is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Packages\Debugger

  2. Find the debugging script and open it.
     

    <UE3 source>\Development\External\Visual Studio Debugging\AUTOEXP.DAT_addons.txt

  3. Copy each [AutoExpand] and [Visualizer] section into your Autoexp.dat.

Intel® RealSense™ SDK Enabling on Unreal Engine 3

This section describes Intel RealSense SDK-related changes in Unreal Engine 3 after installing the Intel RealSense SDK and Depth Camera Manager. Face landmark and head-pose tracking APIs in Intel RealSense SDK are used to manipulate facial expression and head movement of the example character. Head-pose tracking is intuitive since the roll, yaw, and pitch values can be used in Unreal Engine 3 as is, but face landmark tracking is more complicated.


Figure 3: Roll-Yaw-Pitch.

There are 76 traceable points for the face provided by the Intel RealSense SDK. Each expression, like blink or mouth open, has a value range with relevant points. For example, when the eye is closed, the distance between point 12 and point 16 will be around 0, and when the eye is open, the distance will be greater than 0 and varies for each individual.

For this reason, the current implementation relies on a relative calculation between the character's and the user's minimum/maximum values. For example, for blinking, it calculates and applies how far open the game character's eye should be, based on the user's measured eyes-open and eyes-closed distances.


Figure 4: Face landmarks and numbers of the Intel® RealSense™ SDK.

<UE3> is the home folder where UE3 is installed. The four files below need to be modified.

  • <UE3>\Development\Src\UnrealBuildTool\Configuration\UE3BuildConfiguration.cs
  • <UE3>\Development\Src\UnrealBuildTool\Configuration\UE3BuildWin32.cs
  • <UE3>\Development\Src\Engine\Inc\UnSkeletalMesh.h
  • <UE3>\Development\Src\Engine\Src\UnSkeletalComponent.cpp

UE3BuildConfiguration.cs (Optional)

public static bool bRealSense = true;

RealSense-related code is enclosed in "#if USE_REALSENSE" blocks. This flag controls whether the "USE_REALSENSE" definition is added in the UE3BuildWin32.cs file. If you set it to false, the RealSense-related code is not compiled. This step is optional.

UE3BuildWin32.cs

if (UE3BuildConfiguration.bRealSense)
{
SetupRealSenseEnvironment();
}
void SetupRealSenseEnvironment()
{
      GlobalCPPEnvironment.Definitions.Add("USE_REALSENSE=1");
      String platform = (Platform == UnrealTargetPlatform.Win64 ? "x64" : "Win32");

      GlobalCPPEnvironment.SystemIncludePaths.Add("$(RSSDK_DIR)/include");
      FinalLinkEnvironment.LibraryPaths.Add("$(RSSDK_DIR)/lib/" + platform);

      if (Configuration == UnrealTargetConfiguration.Debug) {
           FinalLinkEnvironment.AdditionalLibraries.Add("libpxc_d.lib");
      } else {
           FinalLinkEnvironment.AdditionalLibraries.Add("libpxc.lib");
      }
}

This adds the "USE_REALSENSE" definition, which enables or disables the Intel RealSense SDK-related code in the source files (optional).

Since Unreal Engine 3 is a makefile-based solution, the Intel RealSense SDK header and library paths must be added to the project's include and library search paths.

UnSkeletalMesh.h

#if USE_REALSENSE
	PXCFaceData* faceOutput;
	PXCFaceConfiguration *faceConfig;
	PXCSenseManager *senseManager;

	void InitRealSense();
	void ReleaseRealSense();
#endif

This is the declaration of the Intel RealSense SDK classes and functions. The bone structure manipulation is in UpdateSkelPose() in UnSkeletalComponent.cpp.

UnSkeletalComponent.cpp

#if USE_REALSENSE
	#include "pxcfacedata.h"
	#include "pxcfacemodule.h"
	#include "pxcfaceconfiguration.h"
	#include "pxcsensemanager.h"

	FLOAT rsEyeMin = 6;
	FLOAT rsEyeMax = 25;

	FLOAT rsMouthMin = 5;
	FLOAT rsMouthMax = 50;

	FLOAT rsMouthWMin = 40;
	FLOAT rsMouthWMax = 70;

	FLOAT chMouthMin = -105;
	FLOAT chMouthMax = -75;
……
#endif

This includes the Intel RealSense SDK header files and defines minimum/maximum values for the user and the game character: names starting with "rs" are the user's values, and names starting with "ch" are the game character's values (these should be adjusted to the user's and the game character's appearance). For example, for blinking, they define how far open the game character's eye should be for the user's eyes-open and eyes-closed distances.
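The ratios used in the tracking code later in this article (for example, mouthRatio and innerEyeRatio) are not defined here; a plausible way to derive them, shown as a hedged sketch with assumed names, is to divide the character's motion range by the user's measured range and remap each measured distance linearly.

typedef float FLOAT;   // UE3's float typedef, repeated here for a standalone sketch

// Hypothetical helper: character units per user image pixel.
FLOAT ComputeRatio(FLOAT chMin, FLOAT chMax, FLOAT rsMin, FLOAT rsMax)
{
    return (chMax - chMin) / (rsMax - rsMin);
}

// Example usage, mirroring the mouth mapping shown later in the article:
//   FLOAT mouthRatio = ComputeRatio(chMouthMin, chMouthMax, rsMouthMin, rsMouthMax);
//   FLOAT mouth = chMouthMax - (mouthOpen - rsMouthMin) * mouthRatio;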

void USkeletalMeshComponent::Attach()
{
……
#if USE_REALSENSE
	senseManager = NULL;
	InitRealSense();
#endif

The Attach() function calls the InitRealSense() function to initialize the Intel RealSense SDK’s relevant classes and configure the camera. 

#if USE_REALSENSE
void USkeletalMeshComponent::InitRealSense() {
	if (senseManager != NULL) return;

	faceOutput = NULL;

	senseManager = PXCSenseManager::CreateInstance();
	if (senseManager == NULL)
	{
 // error found
	}

	PXCSession *session = senseManager->QuerySession();
	PXCCaptureManager* captureManager = senseManager->QueryCaptureManager();

The InitRealSense() function configures which camera will be used and creates the face-related class instances.
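A hedged sketch of how the rest of InitRealSense() could look with the 2015-era Intel RealSense SDK (PXC*) API is shown below: it enables the face module, turns on landmark and pose tracking, and creates the faceOutput instance used each frame. Verify the exact method names against your installed SDK version.

	// Hedged sketch; check these calls against the installed Intel RealSense SDK version.
	senseManager->EnableFace();                          // enable the face module
	PXCFaceModule* faceModule = senseManager->QueryFace();
	faceConfig = faceModule->CreateActiveConfiguration();

	faceConfig->SetTrackingMode(PXCFaceConfiguration::FACE_MODE_COLOR_PLUS_DEPTH);
	faceConfig->landmarks.isEnabled = true;              // landmark tracking
	faceConfig->pose.isEnabled = true;                   // head pose (roll/yaw/pitch)
	faceConfig->ApplyChanges();

	senseManager->Init();                                // start the camera pipeline
	faceOutput = faceModule->CreateOutput();             // queried every frame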

void USkeletalMeshComponent::UpdateSkelPose( FLOAT DeltaTime, UBOOL bTickFaceFX )
{
……
#if USE_REALSENSE
if (senseManager->AcquireFrame(false) >= PXC_STATUS_NO_ERROR) {
	faceOutput->Update();
	int totalNumFaces = faceOutput->QueryNumberOfDetectedFaces();
	if (totalNumFaces > 0) {

The UpdateSkelPose() function is used for head pose and face landmark tracking.
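The yaw, roll, and pitch variables and the points array used in the snippets below come from the face data queried each frame. A hedged sketch of how they might be obtained with the PXC* API follows; verify the calls against your SDK version, and note that error handling is omitted for brevity.

	// Hedged sketch: query the first detected face for pose angles and landmarks.
	PXCFaceData::Face* face = faceOutput->QueryFaceByIndex(0);

	PXCFaceData::PoseEulerAngles angles;
	face->QueryPose()->QueryPoseAngles(&angles);         // roll/yaw/pitch in degrees
	FLOAT roll = angles.roll, yaw = angles.yaw, pitch = angles.pitch;

	PXCFaceData::LandmarksData* landmarks = face->QueryLandmarks();
	pxcI32 numPoints = landmarks->QueryNumPoints();
	PXCFaceData::LandmarkPoint* points = new PXCFaceData::LandmarkPoint[numPoints];
	landmarks->QueryPoints(points);                      // fills points[i].image.x / .y

	// ... use yaw/roll/pitch and points[] as shown below, then:
	delete[] points;
	senseManager->ReleaseFrame();                        // pair with AcquireFrame()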

// Head
FVector v(yaw, roll, pitch);

LocalAtoms(6).SetRotation(FQuat::MakeFromEuler(v));
LocalAtoms(6).NormalizeRotation();

Head-pose tracking is intuitive because roll, yaw, and pitch values from the Intel RealSense SDK can be used as is.


Figure 5: Face landmarks and numbers that are used for eyes and mouth expression.

To express blinking, landmark points 12, 16 and 20, 24 are used, and points 47, 51, 33, and 39 are used for the mouth expression (the detailed implementation depends on the developer's preference).

// Mouth
FLOAT mouthOpen = points[51].image.y - points[47].image.y;
mouth = chMouthMax - (mouthOpen - rsMouthMin) * mouthRatio;

mouthOpen = points[47].image.x - points[33].image.x;
rMouthWOpen = chMouthWMin + (mouthOpen - rsMouthWMin) * mouthWRatio;

mouthOpen = points[39].image.x - points[47].image.x;
lMouthWOpen = chMouthWMin + (mouthOpen - rsMouthWMin) * mouthWRatio;

cMouth = chMouthCMax - (mouthOpen - rsMouthWMin) * mouthCRatio;
// Left Eye
FLOAT eyeOpen = points[24].image.y - points[20].image.y;
lEyeInner = chEyeInnerMin + (eyeOpen - rsEyeMin) * innerEyeRatio;
lEyeOuter = chEyeOuterMin + (eyeOpen - rsEyeMin) * outerEyeRatio;
lEyeUpper = chEyeUpperMin + (eyeOpen - rsEyeMin) * upperEyeRatio;
// Right Eye
eyeOpen = points[16].image.y - points[12].image.y;
rEyeInner = chEyeInnerMin + (eyeOpen - rsEyeMin) * innerEyeRatio;
rEyeOuter = chEyeOuterMin + (eyeOpen - rsEyeMin) * outerEyeRatio;
rEyeUpper = chEyeUpperMin + (eyeOpen - rsEyeMin) * upperEyeRatio;
rEyeLower = chEyeLowerMin + (eyeOpen - rsEyeMin) * lowerEyeRatio;

BN_Lips_Corner_R, BN_Lips_Corner_L, and BN_Jaw_Dum are used for the mouth expression, and BN_Blink_UpAdd, BN_Blink_Lower, BN_Blink_Inner, and BN_Blink_Outer are used to express eye blinking. (Refer to the "Facial Bone Structure in Example Characters" section for each bone number.)

// Mouth
FVector m(90, 0, mouth);
LocalAtoms(59).SetRotation(FQuat::MakeFromEuler(m));

LocalAtoms(57).SetTranslation(FVector(mouthWXZ[2], rMouthWOpen, mouthWXZ[3])); // Right side
LocalAtoms(58).SetTranslation(FVector(mouthWXZ[4], lMouthWOpen * -1, mouthWXZ[5])); // Left side

// Left Eye
LocalAtoms(40).SetTranslation(FVector(eyeXY[0], eyeXY[1], lEyeUpper)); // Upper
LocalAtoms(41).SetTranslation(FVector(eyeXY[2], eyeXY[3], lEyeLower)); // Lower
LocalAtoms(42).SetTranslation(FVector(eyeXY[4], eyeXY[5], lEyeInner)); // Inner
LocalAtoms(43).SetTranslation(FVector(eyeXY[6], eyeXY[7], lEyeOuter)); // Outer

// Right Eye
LocalAtoms(47).SetTranslation(FVector(eyeXY[8], eyeXY[9], rEyeLower)); // Lower
LocalAtoms(48).SetTranslation(FVector(eyeXY[10], eyeXY[11], rEyeOuter)); // Outer
LocalAtoms(49).SetTranslation(FVector(eyeXY[12], eyeXY[13], rEyeInner)); // Inner
LocalAtoms(50).SetTranslation(FVector(eyeXY[14], eyeXY[15], rEyeUpper)); // Upper

void USkeletalMeshComponent::ReleaseRealSense() {
	if (faceOutput)
		faceOutput->Release();

	faceConfig->Release();
	senseManager->Close();
	senseManager->Release();
}

The ReleaseRealSense() function closes and releases all of the Intel RealSense SDK class instances.

Facial Bone Structure in Example Characters

In the example, the face is designed with 58 bones. In the image, each box represents a bone. A complete list of bones follows.


Figure 6: Names of bones.

Conclusion

To make an avatar that mirrors the user's facial movements and expressions and enriches the gaming experience in UE3 with the Intel RealSense SDK, modifying the UE3 source code is the only option, and developers must know which source files to change. We hope this document helps you when making an avatar in UE3 with the Intel RealSense SDK.

About the Authors

Chunghyun Kim is an application engineer in the Intel Software and Services Group. He focuses on game and graphics optimization on Intel® architecture.

Peter Hong is an application engineer in the Intel Software and Services Group. He focuses on enabling the Intel RealSense SDK for face and hand tracking, 3D scanning, and more.

For More Information

Epic Unreal Engine
https://www.unrealengine.com

Intel RealSense SDK
http://software.intel.com/realsense

Part 1


MAGIX takes Video Editing to a New Level by Providing HEVC to Broad Users


MAGIX's Video Pro X delivers Intel HEVC encoding through Intel® Media Server Studio

 

While elite video pros have access to high-powered video production applications with bells and whistles traditionally available only to enterprises, MAGIX has taken a broader approach, unveiling its latest version of Video Pro X (Figure 1), video editing software that sets new standards for semi-professional video production. Optimized with Intel® Media Server Studio, MAGIX Video Pro X delivers Intel HEVC encoding to prosumers and semi-pros, helping to alleviate a bandwidth-constrained internet environment where millions of videos are shared and distributed.

Magix Video Pro X
Figure 1: MAGIX Video Pro X for semi-professional video production

 

Video takes up a massive―and growing―share of Internet traffic. And meeting consumers’ demands for higher online video quality pushes the envelope even more. 

One solution to make those demands more manageable is the HEVC standard (also known as H.265), which delivers huge gains in compression efficiency over H.264/AVC, currently the most commonly used standard. Depending on the testing scenario, Intel's GPU-accelerated HEVC encoder can deliver 43% better compression at the same quality as 2-pass x264-medium.1, 2 Video is the largest and fastest growing category of Internet traffic, and it consumes much more bandwidth than other content formats. HEVC's massive gains in compression efficiency therefore benefit not only major online video streaming providers, but also general internet users and the growing audiences that create and share videos online every day.

With the launch of 6th generation Intel® Core™ processors in September last year, Intel platforms support both hardware-based HEVC decoding and encoding. Since then, MAGIX has worked closely with Intel to optimize its video production software with the Intel Media Server Studio Professional Edition for access to hardware acceleration and graphics processor capabilities, Intel's HEVC codec, and expert-grade performance and advanced visual quality analyzers.

MAGIX technical experts noted that, thanks to the high quality of Intel's media software and its easy integration, MAGIX was able to incorporate this extremely efficient compression technology in Video Pro X, making it a premier semi-pro video editing product with the competitive advantage of hardware-accelerated HEVC encoding. The software is also a great tool for the import, editing, and export of 4K/UHD video.

"Through integrating Intel’s HEVC decoder and encoder that is a part of Intel® Media Server Studio Professional Edition, we put the power in our customer's hands to use the benefits of better compression rate of next gen codec that allows to deliver high quality video with less bandwidth," Sven Kardelke, MAGIX Chief Product Officer Video/Photo.

"We’re working with many industry leaders to help bring their solutions to the marketplace, and MAGIX Video Pro X is an innovative example of a new video editing software solution that supports HEVC. Optimized with Media Server Studio, it’s one of the newest, prosumer software products that’s available enabling individuals to create, edit, and share their own broadcast-ready 4K content online in compressed formats. This promises to have a huge effect on improving video viewer experiences via the internetwhere it is so bandwidth constrained today," said Jeff McVeigh, Intel Software and Services Group vice president and Visual Computing Products general manager.

All in all, it’s a great step forward in taking video editing to a new level.  

 


1. Intel Media Server Studio HEVC Codec Scores Fast Transcoding Title

Remote Power Management of Intel® Active Management Technology (Intel® AMT) Devices with InstantGo


Download Document

Introduction

InstantGo, also known as Connected Standby, creates a low OS power state that must be handled differently from how remote power management was handled in the past. This article provides information on how to support the InstantGo feature.

How to support Remote Power Management of Intel® Active Management Technology (Intel® AMT) enabled devices with InstantGo

InstantGo, formerly known as Windows* Connected Standby, is a Microsoft power-connectivity standard for Windows* 8 and 10. This hardware and software specification defines low power levels while maintaining network connectivity and allows for a quick startup (500 milliseconds). InstantGo replaces the s3 power state.

To verify whether a system supports InstantGo, type “powercfg /a” at a command prompt. If the system supports InstantGo, you’ll see Standby (Connected) listed as an option.

Intel AMT and InstantGo

Intel AMT added support for InstantGo in version 10.0, but the manufacturer must enable the feature.

How are Intel AMT and InstantGo related? Intel AMT has to properly handle the various power states by communicating with the firmware; however, in this case the OS, not the hardware, controls the low power state.

Intel AMT and InstantGo prerequisites

The only platforms fully compatible with InstantGo run Windows* 8.1 or newer with Intel AMT 10.0 or later. To remotely determine if a device OS is in a low power state, use the OSPowerSavingState method.

One way of determining the Intel AMT version is to inspect the CIM_SoftwareIdentity.VersionString property as shown in the Get Core Version Use Case.

Remote Verification of Device Power State

In the past, we verified the power state by looking at the hardware power state using the CIM_AssociatedPowerManagementService.PowerState method. Now, when a system is in the InstantGo low OS power state, the hardware power state will return s0 (OS powered on). You need to make an additional query of OSPowerSavingState to determine whether the OS is in full or low power mode.

The Power Management Work Flow in Intel AMT

Previous work flow for Power-On operations

  1. Query for Intel AMT Power State 
  2. If system is in s0 (power on), do nothing
  3. If system is in s3, s4 or s5, then issue a power on command using the Change Power State API

Current recommendation to properly handle Intel AMT Devices with InstantGo

  1. Query for the Intel AMT power state using the Get Power State API
  2. If system is in s3, s4 or s5 then issue a power on command using the Change Power State API
  3. If a system is in s0 (power on) then:
    • If Intel AMT version is 9.0 and below, do nothing
    • If Intel AMT version is 10.0 and above, query the OSPowerSavingState method
      1. If OSPowerSavingState is full power, do nothing
      2. If OSPowerSavingState is in a low power state, wake up the system to full power using the RequestOSPowerSavingState method (a sketch of this decision flow follows the list below).
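The following is a minimal C++ sketch of the recommended decision flow. The helper functions are hypothetical placeholders, not part of the Intel AMT SDK; they stand in for the real Get Power State, OSPowerSavingState, RequestOSPowerSavingState, and Change Power State calls.

// Minimal sketch of the recommended flow. The helpers below are hypothetical
// stubs for the corresponding Intel AMT SDK / WS-Management calls.
#include <iostream>

int  GetHardwarePowerState()  { return 0; }     // 0 = s0; 3/4/5 = s3/s4/s5 (stub)
int  GetAmtMajorVersion()     { return 10; }    // from CIM_SoftwareIdentity.VersionString (stub)
bool IsOsAtFullPower()        { return false; } // from the OSPowerSavingState method (stub)
void SendPowerOnCommand()     { std::cout << "Change Power State: on\n"; }
void RequestOsFullPower()     { std::cout << "RequestOSPowerSavingState: full power\n"; }

void WakeAmtDevice()
{
    int hwState = GetHardwarePowerState();

    if (hwState != 0) {              // s3, s4 or s5: issue a power-on command
        SendPowerOnCommand();
        return;
    }
    if (GetAmtMajorVersion() < 10)   // Intel AMT 9.0 and below: s0 means fully on
        return;
    if (!IsOsAtFullPower())          // Intel AMT 10.0+: also check the OS power state
        RequestOsFullPower();        // wake the OS from the InstantGo low power state
}

int main() { WakeAmtDevice(); }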

There is also a sample PowerShell script demonstrating this, available for download. The script has four basic sections:

  1. Establishes the connection and identifies the Intel AMT Version
  2. Queries the Intel AMT device’s current (hardware) power state – Note: the script assumes Intel AMT 10 and a system in InstantGo low power mode
  3. Queries for the OS Power State
  4. Wakes up the Device

For information on running PowerShell scripts with the Intel® vPro™ module please refer to the Intel AMT SDK and related Intel AMT Implementation and Reference Guide.

Additional Resources

Summary

As more devices support InstantGo, integrating this technology with your remote power management methodology will become critical. You want to avoid cases where a device is detected in the powered-on (s0) state when the system is actually running at a lower power state. Fortunately, supporting InstantGo isn’t a difficult task; it just takes a few additional steps to determine the actual power state.

About the Author

Joe Oster has been active at Intel around Intel® vPro™ technology and Intel AMT technology since 2006. He is passionate about technology and is a MSP/SMB technology advocate. When not working, he is a Dad and spends time working on his family farm or flying drones and RC Aircraft.

Innovative Media Solutions Showcase


New, Inventive Media Products Made Possible with Intel Media Software Tools

With Intel media software tools, media/video solutions providers can create inspiring, innovative new products that capitalize on next gen capabilities like HEVC, high-dynamic range (HDR) content delivery, video security solutions with smart analytics, and more. Check these out. Envision how your company can use Intel's advanced media tools to re-invent new solutions for the media and broadcasting industry.

    Mobile Viewpoint Live Reporting Ronde of Norg

    Mobile Viewpoint Delivers HEVC HDR Live Broadcasting

    Mobile Viewpoint recently announced a new bonding transmitter that delivers HEVC (H.265) HDR video running on the latest 6th generation Intel® processors, and that uses the Intel® Media Server Studio Professional Edition to optimize HEVC compression and quality. For broadcast-quality video, Intel’s graphics-accelerated codec enabled Mobile Viewpoint to develop a hardware platform that combines low-power hardware-accelerated encoding and transmission. The new HEVC-enabled software will be used in Mobile Viewpoint's Wireless Multiplex Terminal (WMT) AGILE high-dynamic range (HDR) back-of-the-camera solutions and in its 19-inch FLEX IO encoding and O2 decoding products. The results: fast, high-quality video broadcasting on-the-go, so the world can stay better informed of fast-changing news and events. Read more.

     

    Sharp all-around security camera

    Sharp's New Innovative Security Camera is built with Intel® Architecture & Media Software

    With digital surveillance and security concerns now an everyday part of life, SHARP unveiled a new omnidirectional, wireless, intelligent digital security surveillance camera to better meet these needs. Built with an Intel® Celeron® processor (N3160) and SHARP 12-megapixel image sensors, and utilizing the Intel® Media SDK for hardware-accelerated encoding, the QG-B20C camera can capture video in 4Kx3K resolution, provide all-around views, and is armed with many intelligent automatic detection functions. Read more.

     

    Magix Video Pro X

    MAGIX takes Video Editing to a New Level by Providing HEVC to Broad Users

    While elite video pros have access to high-powered video production applications with bells and whistles traditionally available only to enterprises, MAGIX has taken a broader approach, unveiling its latest version of Video Pro X, video editing software that sets new standards for semi-professional video production for widespread users. Optimized with Intel Media Server Studio, MAGIX Video Pro X provides Intel HEVC encoding to prosumers and semi-pros to help alleviate a bandwidth-constrained internet where millions of videos are shared and distributed. Read more.

     

    Comprimato2

    New JPEG2000 Codec Now Native for Intel Media Server Studio

    Comprimato recently worked with Intel to provide the best video encoding technology as part of Intel Media Server Studio through a plug-in for the software that delivers high-quality, low-latency JPEG2000 encoding. The result is a powerful encoding option for Media Server Studio users, who can transcode JPEG2000 contained in IMF, AS02, or MXF OP1a files to distribution formats like AVC/H.264 and HEVC/H.265, and enable software-defined processing of IP video streams in broadcast applications. By using Intel Media Server Studio to access hardware acceleration and programmable graphics in Intel GPUs, encoding runs extremely fast. This is a vital benefit because fast media processing significantly reduces latency in the connection, which is particularly important in live broadcasting. Read more.

     

    SPB TV AG Showcases Innovative Mobile TV/On-demand Transcoder enabled by Intel

    Unveiled at Mobile World Congress (MWC) 2016, SPB TV AG showed its innovative single-platform product line at the event, which included a new SPB TV Astra transcoder powered by Intel. SPB TV Astra is a professional solution for fast, high-quality processing of linear TV broadcasts and on-demand video streams from a single head-end to any mobile, desktop, or home device. The transcoder uses Intel® Core™ i7 processors with media accelerators and delivers high-density transcoding via Intel Media Server Studio. “We are delighted that our collaboration with Intel ensures faster and high quality transcoding, making our new product performance remarkable,” said Kirill Filippov, CEO of SPB TV AG. Read more.

     

    SURF Communications collaborates with Intel for NFV & WebRTC all-inclusive platforms

    Also at MWC 2016, SURF Communication Solutions announced SURF ORION-HMP* and SURF MOTION-HMP*, the next building blocks of the SURF-HMP™ family. The new SURF-HMP architecture delivers fast, high-quality media acceleration - facilitating up to 4K video resolutions and ultra-high-capacity HD voice and video processing - running on Intel® processors with integrated graphics and optimized by Media Server Studio. SURF-HMP is flexibly architected to meet the requirements of evolving and large-scale deployments, is driven by a powerful processing engine that supports all major video and voice codecs and protocols in use, and delivers a multitude of applications such as transcoding, conferencing/mixing, MRF, playout, recording, messaging, video surveillance, encryption, and more. Read more.

     


    More about Intel Media Software Tools

    Intel Media Server Studio - Provides an Intel® Media SDK, runtimes, graphics drivers, media/audio codecs, and advanced performance and quality analysis tools to help video solution providers deliver fast, high-density media transcoding.

    Intel Media SDK - A cross-platform API for developing client and media applications for Windows*. Achieve fast video playback, encoding, processing, media format conversion, and video conferencing. Accelerate RAW video and image processing. Get audio decode/encode support.

    Accelerating Media Processing: Which Media Software Tool do I use? English | Chinese

    Intel® Advisor XE 2016 Update 4 - What’s new



    We’re pleased to announce a new version of the Vectorization Assistant tool - Intel® Advisor XE 2016 Update 4.

    Below are highlights of the new functionality in Intel Advisor 2016 Update 4.

    Full support for all analysis types on the second-generation Intel® Xeon Phi™ processor (code-named Knights Landing)

    FLOPS and mask utilization

    Tech Preview feature! An accurate, hardware-independent FLOPS measurement tool that is mask aware (AVX-512 only), with the unique capability to correlate FLOPS with performance data.

    Workflow

    Batch mode lets you automate collecting multiple analysis types at once. You can collect Survey and Trip Counts in a single run – Advisor will run the application twice, but automatically, without user interaction. For Memory Access Patterns (MAP) and Dependencies analyses, there are predefined auto-selection criteria, e.g., check Dependencies only for loops with the “Assumed dependencies” issue.

    The improved MPI workflow allows you to create snapshots of MPI results, so you can collect data from the CLI and transfer a self-contained packed result to a workstation with a GUI for analysis. We also fixed some GUI and CLI interoperability issues.

     

    Memory Access Patterns

    MAP analysis now detects gather instruction usage, unveiling more complex access patterns. A SIMD loop with gather instructions will run faster than a scalar one, but slower than a SIMD loop without gather operations. If a loop falls into the “Gather stride” category, check the new “Details” tab in the Refinement report for information about strides and the mask shape for the gather operation. One possible solution is to inform the compiler about your data access patterns via OpenMP 4.x options, for cases when gather instructions are not actually necessary (see the example below).
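    As an illustration only (the function and data names below are made up), OpenMP 4.x clauses such as linear and uniform let you tell the compiler that accesses are unit-stride, so it can emit contiguous vector loads instead of gathers:

    // Illustrative only: describe unit-stride access with OpenMP 4.x so the
    // compiler does not need gather instructions. All names are made up.
    #include <vector>
    #include <cstdio>

    #pragma omp declare simd linear(p : 1) uniform(scale) notinbranch
    inline float scale_elem(const float* p, float scale)
    {
        return (*p) * scale;                 // unit-stride access, no gather needed
    }

    void scale_array(const float* in, float* out, int n, float s)
    {
    #pragma omp simd
        for (int i = 0; i < n; ++i)
            out[i] = scale_elem(&in[i], s);  // linear(p:1) tells the vectorizer that
    }                                        // p advances by one element per iteration

    int main()
    {
        std::vector<float> a(1024, 2.0f), b(1024);
        scale_array(a.data(), b.data(), 1024, 0.5f);
        std::printf("%f\n", b[0]);           // expect 1.000000
    }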

    For AVX-512, MAP analysis also detects gather/scatter instruction usage. These instructions allow more code to vectorize, but you can obtain greater performance by avoiding them.

    The MAP report is enriched with a Memory Footprint metric – the distance between the address ranges touched by a given instruction. The value represents the maximal footprint across all loop instances.

     

    The variable name is now reported for memory accesses, in addition to the source line and assembly instruction, so you can determine the data structure of interest more accurately. Advisor can detect global, static, stack, and heap-allocated variables.

    We added a new recommendation to use SDLT (SIMD Data Layout Templates) for loops with an “Ineffective memory access” issue.

     

    Survey and Loop Analytics

    The Loop Analytics tab now includes trip counts and an extended instruction mix, so you can see the compute vs. memory instruction distribution, scalar vs. vector instructions, ISA details, and more.

    We have improved the usability of the non-executed code paths analysis, so you can see the ISA and traits in the virtual loops and sort and find AVX-512 code paths more easily.

     

    Loops with vector intrinsics are now shown as vectorized in the Survey grid.

     

    Get Intel Advisor and more information

    Visit the product site, where you can find videos and tutorials.

