
Using the Intel® RealSense™ Camera with TouchDesigner*: Part 3


The Intel® RealSense™ camera (R200) is a vital tool for creating VR and AR projects and real-time performance interactivity. I found TouchDesigner*, created by Derivative*, to be an excellent program for utilizing the information provided by the Intel RealSense cameras.

This third article is written from the standpoint of creating real-time interactivity in performances using the Intel RealSense camera in combination with TouchDesigner. Interactivity in a performance always adds a magical element. Example photos and videos are included from an in-progress interactive dance piece I am directing and creating visuals for, along with demos showing how you can create different interactive effects using the Intel RealSense camera (R200). The interactive performance dance demo takes place in the Vortex Immersion dome in Los Angeles, where I am a resident artist. The dancer and choreographer is Stevie Rae Gibbs, and Tim Hicks assisted me with cinematography and VR live-action shooting. The music was created by Winter Lazerus. The movies embedded in this article were shot by Chris Casady and Winter Lazerus.

Things to Consider When Creating an Interactive Immersive Project

Just as in any performance, there needs to be a theme. The theme of this short interactive demo is simple: liberation from what is trapping the dancer, the box weighing her down. The interactivity contributed to this theme. The effects were linked to the skeletal movements of the dancer, and some were linked to the color and depth information provided by the Intel RealSense camera. The obvious linking of the effects to the dancer contributed to a sense of magic. The choreography and dancing had to work with the effects. Beyond the theatrical lighting, care had to be taken to keep enough light on the subject so that the Intel RealSense camera could register properly. The camera's distance from the dancer also had to be considered, taking into account the range of the camera and the effect wanted. The dancer likewise had to be careful to stay within the effective camera range.

The demo dance project is an immersive full dome performance piece so it had to be mapped to the dome. Having the effects mapped to the dome also influenced their look. For the Vortex Immersion dome, Jeff Smith of Eye Vapor has created a TouchDesigner interface for dome mapping. I used this interface as the base layer within which to put my TouchDesigner programming of the interactive effects.

Jeff Smith on Mapping the Dome Using TouchDesigner:

“There were several challenges in creating a real time mapping solution for a dome using TouchDesigner. One of the first things we had to work through was getting a perspective corrected image through each projector. The solution, which is well known now, is to place virtual projectors inside a mapped virtual dome and render out an image for each projector. Another challenge was to develop a set of alignment and blending tools to be able to perfectly calibrate and blend the projected image. And finally, we had to develop custom GLSL shaders to render real time fisheye imagery.”

Tim Hicks on Technical Aspects of Working with the RealSense Camera

“Working with the Intel RealSense camera was extremely efficient in creating a simple and stable workflow to connect our performer’s gestures through TouchDesigner, and then rendered out as interactive animations. Setup is quick and performance is reliable, even in low light, which is always an issue when working inside an immersive digital projection dome.”

Notes for Utilizing TouchDesigner and the Intel RealSense Camera

Like Part 1 and Part 2, Part 3 is aimed at those familiar with using TouchDesigner and its interface. If you are unfamiliar with TouchDesigner, before you follow the demos I recommend that you review some of the documentation and videos available here: Learning TouchDesigner. The Part 1 and Part 2 articles walk you through the use of the TouchDesigner nodes described in this article, and provide sample .toe files to get you started.

A free non-commercial copy of TouchDesigner is available and is fully functional, except that the highest resolution is limited to 1280 x 1280.

Note: When using the Intel RealSense camera, it is important to pay attention to its range for best results.

Demo #1: Using the Depth Mapping of the R200 and SR300 Cameras

This is a simple and effective way to create interactive colored lines that respond to the movement of the performer. In the case of this performance, the lines wrapped and animated around the entire dome in response to the movement of the dancer.

  1. Create the nodes you will need, arrange, and connect/wire them in a horizontal row in this order:
    • RealSense TOP node
    • Level TOP node
    • Luma Level TOP node
  2. Open the Setup parameters page of the RealSense TOP node and set the Image parameter to Depth.
  3. Set the parameters of the Level TOP and the Luma Level TOP to offset the brightness and contrast. Judge this by looking at the result you are getting in the effect.
    Figure 1. You are using the Depth setting in the RealSense TOP node for the R200 camera.
  4. Create a Blur TOP and a Displace TOP.
  5. Wire the Luma Level TOP to the Blur TOP and the top connector on the Displace TOP.
  6. Connect the Blur TOP to the bottom connector of the Displace TOP (Note: the filter size of the blur should be based on what you want your final effect to look like).
    Figure 2. Set the Filter for the Blur TOP at 100 as a starting point.
  7. Create a Ramp TOP and a Composite TOP.
  8. Choose the colors you want your line to be in the Ramp TOP.
  9. Connect the Displace TOP to the top connector of the Composite TOP, and the Ramp TOP to the bottom connector of the Composite TOP.
    Figure 3. You are using the Depth setting in the RealSense TOP node for the R200 camera.
     
    Figure 4. The complete network for the effect.
  10. Watch how the line reacts to the performer's motions.
    Figure 5. Video from the demo performance of the colored lines created from the depth mapping of the performer by the RealSense camera.
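
If you prefer to build networks by scripting, the same chain can be created from TouchDesigner's Python textport. The sketch below is only a starting point: the operator class names and parameter tokens (for example realsenseTOP, par.image, and the Blur TOP's par.size) are my assumptions, so verify them against your build's documentation before relying on them.

# Build the Demo 1 network in /project1 from the textport.
# Operator class names and parameter tokens are assumptions; verify in your build.
root = op('/project1')

rs    = root.create(realsenseTOP, 'realsense1')
level = root.create(levelTOP, 'level1')
luma  = root.create(lumalevelTOP, 'lumalevel1')
blur  = root.create(blurTOP, 'blur1')
disp  = root.create(displaceTOP, 'displace1')
ramp  = root.create(rampTOP, 'ramp1')
comp  = root.create(compositeTOP, 'comp1')

rs.par.image = 'depth'    # Image parameter: Depth (assumed menu token)
blur.par.size = 100       # Blur filter size, per Figure 2 (assumed token)

# RealSense -> Level -> Luma Level, then the Luma Level feeds both
# the Blur TOP and the Displace TOP's top input.
level.inputConnectors[0].connect(rs)
luma.inputConnectors[0].connect(level)
blur.inputConnectors[0].connect(luma)
disp.inputConnectors[0].connect(luma)    # top connector
disp.inputConnectors[1].connect(blur)    # bottom connector
comp.inputConnectors[0].connect(disp)    # top connector
comp.inputConnectors[1].connect(ramp)    # bottom connector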

Demo #2: RealSense TOP Depth Mapping Second Effect

In this demo, we use TouchDesigner with the depth feature of the Intel RealSense R200 camera to project duplicates of the performer, offset in time. It projects several images of the dancer moving at different times, creating the illusion of more than one dancer. Note that this effect did not make it into the final dance performance, but it is well worth using.

  1. Add a RealSense TOP node to your scene.
  2. On the Setup parameters page for the RealSense TOP node, for the Image parameter select Depth.

  3. Create two Level TOP nodes and connect the RealSense TOP node to each of them.
    Figure 6. You are using the Depth setting in the RealSense TOP node for the R200 camera.
  4. Adjust the Level TOP parameters to give you the amount of contrast and brightness you want for your effect. You might go back after seeing the effect and readjust the parameters. As a starting point for both Level TOPs, on the Pre parameters page, set the Brightness parameter to 2 and the Gamma parameter to 1.75.
  5. Create a Transform TOP and wire it to the level2 TOP.
  6. In the Transform TOP parameters, on the Transform page, set the Translate x parameter to 0.2. Note that translating x by 1 would move the image fully off screen.
  7. Create two Cache TOP nodes and wire one to the Transform TOP and one to the level1 TOP.
  8. On the cache1 TOP's Cache parameters page, set Cache Size to 32 and Output Index to -20.
  9. On the cache2 TOP's Cache parameters page, set Cache Size to 32 and Output Index to -40. I am using the Cache TOPs to save and offset the timing of the images. Note that once you see how your effect is working with your performance, you will want to go back and readjust these settings.

    Notes on the Cache TOP: The Cache TOP can be used to freeze images in the TOP by turning the Active parameter to Off. (You can set the cache size to 1.) The Cache TOP acts as a delay if you set Output Index to a negative number and leave the Active parameter On. Once a sequence of images has been captured by toggling the Active parameter on and off, it can be looped by animating the Output Index parameter.

    For more information, see the Cache TOP page in the TouchDesigner documentation.

    Figure 7. You could add more Level TOPs to create more duplicates.
  10. Wire both Cache TOPs to a Difference TOP.
    Figure 8. The Cache TOPs are wired to the Difference TOP so that both images of the performer will be seen.
     
    Figure 9. The entire network for the effect. Look at the effect when projected in your performance, go back, and adjust the node parameters as necessary.
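
Because the whole time-offset illusion hangs on the two Cache TOPs, it can be handy to retune them from a script during rehearsal. A minimal sketch, assuming the parameter tokens active, cachesize, and outputindex:

# Configure the two Cache TOPs that delay the duplicate images.
for cache_path, index in (('/project1/cache1', -20), ('/project1/cache2', -40)):
    cache = op(cache_path)
    cache.par.active = True          # keep capturing new frames
    cache.par.cachesize = 32         # frames held in memory
    cache.par.outputindex = index    # negative index plays from the past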

Demo #3: RealSense TOP Color Mapping for Texture Mapping

In this demo, we use the RealSense TOP node to texture map geometry, in this case boxes carrying the dancer's moving image.

  1. Create a Geometry COMP and go inside it, down one level (/project1/geo1), and create an In SOP.
  2. Go back up to project1 and create a Box SOP.
  3. In the Box SOP parameters, set the Texture Coordinates to Face Outside. This will ensure that each face gets the full texture (0 to 1).
  4. Wire the Box SOP to the Geometry COMP's input.
  5. Create a RealSense TOP node and, on the Setup parameters page, set the Model to R200 and the Image to Color.
  6. Create a Phong MAT and, on the RGB parameters page, set the Color Map to realsense1, or alternatively drag the RealSense TOP node into the Color Map parameter.
  7. On the Geo COMP's Render parameters page, set Material to phong1.
  8. Create a Render TOP, a Camera COMP, and a Light COMP.
  9. Create a Reorder TOP and, on the Reorder parameters page, set the Output Alpha, Input 1 to One using the drop-down.
    Figure 10. The entire network to show how the Intel RealSense R200 Color mode can be used to texture all sides of a Box Geo.
     
    Figure 11. The dancer appears to be holding up the box, which is textured with her image.
     
    Figure 12. Multiple boxes with the image of the dancer animate around the dancer once she has lifted the box off herself.
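
The three settings that make this demo work can also be expressed in Python. This is a sketch only, and the parameter tokens (texture, colormap, and material) are assumptions to check against your build:

# Texture every face of the box with the live RealSense color image.
box   = op('/project1/box1')
phong = op('/project1/phong1')
geo   = op('/project1/geo1')

box.par.texture = 'face'                       # Texture Coordinates: Face Outside
phong.par.colormap = '/project1/realsense1'    # Color Map -> the RealSense TOP
geo.par.material = 'phong1'                    # Render page: Material -> phong1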

Demo #4: RealSense CHOP Movement Control Over Large Particle Sphere

For this effect, I wanted the dancer to be able to interact playfully with a large particle ball. She moves towards the sphere and it moves away from her.

  1. Create a RealSense CHOP node. On the Setup parameters page, set the Model to R200 and the Mode to Finger/Face Tracking. Turn on Person Center-Mass World Position and Person Center-Mass Color Position.
  2. Connect the RealSense CHOP node to a Select CHOP node.
  3. In the Select CHOP, on the Select page, set Channel Names to person1_center_mass:tx.
  4. Create a Math CHOP node, leave the defaults for now (you can adjust them later as needed in your performance), and wire the Select CHOP node to the Math CHOP node.
  5. Create a Lag CHOP node and wire the Math CHOP node to that.
  6. Connect the Lag CHOP node to a Null CHOP node and connect the Null CHOP node to a Trail CHOP node.
    Figure 13. The entire network to show how the RealSense R200 CHOP can be hooked up. The Trail CHOP node is very useful for seeing whether, and how well, the RealSense camera is tracking.
  7. Create a Torus SOP, connect it to a Transform SOP, and then connect the Transform SOP to a Material SOP.
  8. Create a Point Sprite MAT.
  9. In the Point Sprite MAT's Point Sprite parameters page, choose a yellow color.
  10. In the Material SOP parameters page, set the Material to pointsprite1.
  11. Create a Copy SOP, keep its default parameter settings, and wire the Material SOP to the bottom connector on it.
  12. Create a Sphere SOP and wire it to a Particle SOP.
  13. Wire the Particle SOP to the top connector in the Copy SOP.
  14. On the Particle SOP's State parameters page, set Particle Type to Render as Point Sprites.
  15. Connect the Copy SOP to a Geo COMP. Go one level down to project1/geo1. Delete the Torus SOP and create an In SOP.
    Figure 14. For the more advanced, a Point Sprite MAT can be used to change the look of the particles.
  16. Export the person1_center_mass:tx channel from the Null CHOP to the Transform SOP parameters, on the Transform page, Translate tx.
    Figure 15. Exporting the channel.
     
    Figure 16. The large particle ball assumes a personality as the dancer plays with it, trying to catch it.
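
Instead of a drag-and-drop export, step 16 can also be done with a parameter expression, which is often easier to keep track of in a large project. A sketch, assuming the node names used in the steps above:

# Drive the Transform SOP's Translate x from the tracked center of mass.
# Equivalent to exporting the channel from null1 to transform1.
op('/project1/transform1').par.tx.expr = "op('null1')['person1_center_mass:tx']"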

Demo #5: Buttons to Control Effects

Turning on and off interactive effects is important. In this demo, I will show the simplest way to do this using a button.

  1. Create a Button COMP.
  2. Connect it to a Null CHOP.
  3. Activate and export the channel from the Null CHOP to the Render page of the parameters of the Geo COMP from Demo 4. Pressing the button will turn the render of the Geo COMP on and off.
    Figure 17. An elementary button setup.
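
As with Demo 4, the button hookup can be expressed as a parameter expression rather than an export. A sketch, assuming the Null CHOP from step 2 is named null1 and carries the Button COMP's default channel v1:

# Toggle rendering of the Demo 4 geometry with the button state.
op('/project1/geo1').par.render.expr = "op('null1')['v1']"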

Summary

This article is designed to give the reader some basic starting points, techniques, and ideas for using the Intel RealSense camera to create interactivity in a performance. There are many more sophisticated effects to be explored using the Intel RealSense camera in combination with TouchDesigner.

Related Applications

Many useful apps have been created for the Intel RealSense camera.

https://appshowcase.intel.com/en-us/realsense/node/9167?cam=all-cam - a drummer app for Intel RealSense cameras.

https://appshowcase.intel.com/en-us/realsense?cam=all-cam - apps for all Intel RealSense cameras.

About the Author

Audri Phillips is a visualist/3D animator based out of Los Angeles, with a wide range of experience that includes over 25 years working in the visual effects/entertainment industry in studios such as Sony*, Rhythm and Hues*, Digital Domain*, Disney*, and DreamWorks* feature animation. Starting out as a painter, she was quickly drawn to time-based art. Always interested in using new tools, she has been a pioneer in the use of computer animation/art in experimental film work, including immersive performances. Now she has taken her talents into the creation of VR. Samsung* recently curated her work into their new Gear Indie Milk VR channel.

Her latest immersive works/animations include multimedia animations for "Implosion a Dance Festival" (2015) at the Los Angeles Theater Center and four full-dome concerts in the Vortex Immersion dome, one with the well-known composer/musician Steve Roach, the most recent being the fulldome concert "Relentless Universe". She also created animated content for the dome show for the TV series "Constantine*", shown at the 2014 Comic-Con convention. Several of her fulldome pieces, "Migrations" and "Relentless Beauty", have been juried into "Currents", the Santa Fe International New Media Festival, and the Jena FullDome Festival in Germany. She exhibits in the Young Projects gallery in Los Angeles.

She writes online content and a blog for Intel®. Audri is an adjunct professor at Woodbury University, a founding member and leader of the Los Angeles Abstract Film Group, founder of the Hybrid Reality Studio (dedicated to creating VR content), a board member of the Iota Center, and an exhibiting member of the LA Art Lab. In 2011 Audri became a resident artist of Vortex Immersion Media and the c3: CreateLAB. A selection of her works is available on Vimeo, on creativeVJ, and on Vrideo.


Intel® Software Guard Extensions Tutorial Series: Part 2, Application Design


The second part in the Intel® Software Guard Extensions (Intel® SGX) tutorial series is a high-level specification for the application we’ll be developing: a simple password manager. Since we’re building this application from the ground up, we have the luxury of designing for Intel SGX from the start. That means that in addition to laying out our application’s requirements, we’ll examine how Intel SGX design decisions and the overall application architecture influence one another.

Read the first tutorial in the series or find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Password Managers At-A-Glance

Most people are probably familiar with password managers and what they do, but it’s a good idea to review the fundamentals before we get into the details of the application design itself.

The primary goals of a password manager are to:

  • Reduce the number of passwords that end users need to remember.
  • Enable end users to create stronger passwords than they would normally choose on their own.
  • Make it practical to use a different password for every account.

Password management is a growing problem for Internet users, and numerous studies have tried to quantify the problem over the years. A Microsoft study published in 2007—nearly a decade ago as of this writing—estimated that the average person had 25 accounts that required passwords. More recently, in 2014 Dashlane estimated that their US users had an average of 130 accounts, while the number of accounts for their worldwide users averaged in the 90s. And the problems don’t end there: people are notoriously bad at picking “good” passwords, frequently reusing the same password on multiple sites, which has led to some spectacular attacks. These problems boil down to two basic issues: passwords that are hard for hacking tools to guess are often difficult for people to remember, and having a large number of passwords compounds the problem, since the user must also remember which password goes with which account.

With a password manager, you only need to remember one very strong passphrase in order to gain access to your password database or vault. Once you have authenticated to your password manager, you can look up any passwords you have stored, and copy and paste them into authentication fields as needed. Of course, the key vulnerability of the password manager is the password database itself: since it contains all of the user’s passwords it is an attractive target for attackers. For this reason, the password database is encrypted with strong encryption techniques, and the user’s master passphrase becomes the means for decrypting the data inside of it.

Our goal in this tutorial is to build a simple password manager that provides the same core functions as a commercial product, following good security practices, and to use it as a learning vehicle for designing for Intel SGX. The tutorial password manager, which we’ll name the “Tutorial Password Manager with Intel® Software Guard Extensions” (yes, that’s a mouthful, but it’s descriptive), is not intended to function as a commercial product and certainly won’t contain all the safeguards found in one, but that level of detail is not necessary.

Basic Application Requirements

Some basic application requirements will help narrow down the scope of the application so that we can focus on the Intel SGX integration rather than the minutiae of application design and development. Again, the goal is not to create a commercial product: the Tutorial Password Manager with Intel SGX does not need to run on multiple operating systems or on all possible CPU architectures. All we require is a reasonable starting point.

To that end, our basic application requirements are:

  • Run on systems both with and without Intel SGX support
  • Run on 64-bit Windows
  • Run on CPUs that support the RDRAND instruction
  • Require no third-party libraries, frameworks, applications, or utilities

The first requirement may seem strange given that this tutorial series is about Intel SGX application development, but real-world applications need to consider the legacy installation base. For some applications it may be appropriate to restrict execution only to Intel SGX-capable platforms, but for the Tutorial Password Manager we’ll use a less rigid approach. An Intel SGX-capable platform will receive a hardened execution environment, but non-capable platforms will still function. This usage is appropriate for a password manager, where the user may need to synchronize his or her password database with other, older systems. It is also a learning opportunity for implementing dual code paths.

The second requirement gives us access to certain cryptographic algorithms in the non-Intel SGX code path and to some libraries that we’ll need. The 64-bit requirement simplifies application development by ensuring access to native 64-bit types and also provides a performance boost for certain cryptographic algorithms that have been optimized for 64-bit code.

The third requirement gives us access to the RDRAND instruction in the non-Intel SGX code path. This greatly simplifies random number generation and ensures access to a high-quality entropy source. Systems that support the RDSEED instruction will make use of that as well. (For information on the RDRAND and RDSEED instructions, see the Intel® Digital Random Number Generator Software Implementation Guide.)

The fourth requirement keeps the list of software required by the developer (and the end user) as short as possible. No third-party libraries, frameworks, applications, or utilities need to be downloaded and installed. However, this requirement has an unfortunate side effect: without third-party frameworks, there are only four options available to us for creating the user interface. Those options are:

  • Win32 APIs
  • Microsoft Foundation Classes (MFC)
  • Windows Presentation Foundation (WPF)
  • Windows Forms

The first two are implemented in native/unmanaged code while the latter two require .NET*.

The User Interface Framework

For the Tutorial Password Manager, we’re going to be developing the GUI using Windows Presentation Foundation in C#. This design decision adds a requirement of its own: the application now depends on the .NET Framework.

Why use WPF? Mostly because it simplifies the UI design while introducing complexity that we actually want. Specifically, by relying on the .NET Framework, we have the opportunity to discuss mixing managed code, and specifically high-level languages, with enclave code. Note, though, that choosing WPF over Windows Forms was arbitrary: either environment would work.

As you might recall, enclaves must be written in native C or C++ code, and the bridge functions that interact with the enclave must be native C (not C++) functions. While both Win32 APIs and MFC provide an opportunity to develop the password manager with 100-percent native C/C++ code, the burden imposed by these two methods does nothing for those who want to learn Intel SGX application development. With a GUI based in managed code, we not only reap the benefits of the integrated design tools but also have the opportunity to discuss something that is of potential value to Intel SGX application developers. In short, you aren’t here to learn MFC or raw Win32, but you might want to know how to glue .NET to enclaves.

To bridge the managed and unmanaged code we’ll be using C++/CLI (C++ modified for Common Language Infrastructure). This greatly simplifies the data marshaling and is so convenient and easy to use that many developers refer to it as IJW (“It Just Works”).

Figure 1: Minimum component structures for native and C# Intel® Software Guard Extensions applications.

Figure 1 shows the impact to an Intel SGX application’s minimum component makeup when it is moved from native code to C#. In the fully native application, the application layer can interact directly with the enclave DLL since the enclave bridge functions can be incorporated into the application’s executable. In a mixed-mode application, however, the enclave bridge functions need to be isolated from the managed code block because they are required to be 100-percent native code. The C# application, on the other hand, can’t interact with the bridge functions directly, and in the C++/CLI model that means creating another intermediary: a DLL that marshals data between the managed C# application and the native, enclave bridge DLL.

Password Vault Requirements

At the core of the password manager is the password database, or what we’ll be referring to as the password vault. This is the encrypted file that will hold the end user’s account information and passwords. The basic requirements for our tutorial application are:

  • The vault must be portable
  • The vault must be encrypted at rest
  • The vault must use authenticated encryption

The requirement that the vault be portable means that we should be able to copy the vault file to another computer and still be able to access its contents, whether or not the new system supports Intel SGX. In other words, the user experience should be the same: the password manager should work seamlessly (so long as the system meets the base hardware and OS requirements, of course).

Encrypting the vault at rest means that the vault file should be encrypted when it is not actively in use. At a minimum, the vault must be encrypted on disk (without the portability requirement, we could potentially solve the encryption requirements by using the sealing feature of Intel SGX) and should not sit decrypted in memory longer than is necessary.

Authenticated encryption provides assurances that the encrypted vault has not been modified after the encryption has taken place. It also gives us a convenient means of validating the user’s passphrase: if the decryption key is incorrect, the decryption will fail when validating the authentication tag. That way, we don’t have to examine the decrypted data to see if it is correct.

Passwords

Any account information is sensitive information for a variety of reasons, not the least of which is that it tells an attacker exactly which logins and sites to target, but the passwords are arguably the most critical piece of the vault. Knowing what account to attack is not nearly as attractive as not needing to attack it at all. For this reason, we’ll introduce additional requirements on the passwords stored in the vault:

  • Account passwords must be encrypted within the vault itself
  • An account password is decrypted only when the user asks to view it

This is nesting the encryption. The passwords for each of the user’s accounts are encrypted when stored in the vault, and the entire vault is encrypted when written to disk. This approach allows us to limit the exposure of the passwords once the vault has been decrypted. It is reasonable to decrypt the vault as a whole so that the user can browse their account details, but displaying all of their passwords in clear text in this manner would be inappropriate.

An account password is only decrypted when a user asks to see it. This limits its exposure both in memory and on the user’s display.

Cryptographic Algorithms

With the encryption needs identified it is time to settle on the specific cryptographic algorithms, and it’s here that our existing application requirements impose some significant limits on our options. The Tutorial Password Manager must provide a seamless user experience on both Intel SGX and non-Intel SGX platforms, and it isn’t allowed to depend on third-party libraries. That means we have to choose an algorithm, and a supported key and authentication tag size, that is common to both the Windows CNG API and the Intel SGX trusted crypto library. Practically speaking, this leaves us with just one option: Advanced Encryption Standard-Galois Counter Mode (AES-GCM) with a 128-bit key. This is arguably not the best encryption mode to use in this application, especially since the effective authentication tag strength of 128-bit GCM is less than 128 bits, but it is sufficient for our purposes. Remember: the goal here is not to create a commercial product, but rather a useful learning vehicle for Intel SGX development.

With GCM come some other design decisions, namely the IV length (12 bytes is most efficient for the algorithm) and the length of the authentication tag.

Encryption Keys and User Authentication

With the encryption method chosen, we can turn our attention to the encryption key and user authentication. How will the user authenticate to the password manager in order to unlock their vault?

The simple approach would be to derive the encryption key directly from the user’s passphrase or password using a key derivation function (KDF). But while the simple approach is a valid one, it does have one significant drawback: if the user changes his or her password, the encryption key changes along with it. Instead, we’ll follow the more common practice of encrypting the encryption key.

In this method, the primary encryption key is randomly generated using a high-quality entropy source and it never changes. The user’s passphrase or password is used to derive a secondary encryption key, and the secondary key is used to encrypt the primary key. This approach has some key advantages:

  • The data does not have to be re-encrypted when the user’s password or passphrase changes
  • The encryption key never changes, so it could theoretically be written down in, say, hexadecimal notation and locked in a physically secure location. The data could thus still be decrypted even if the user forgot his or her password. Since the key never changes, it would only have to be written down once.
  • More than one user could, in theory, be granted access to the data. Each would encrypt a copy of the primary key with their own passphrase.

Not all of these are necessarily critical or relevant to the Tutorial Password Manager, but the approach is good security practice nonetheless.

Here the primary key is called the vault key, and the secondary key that is derived from the user’s passphrase is called the master key. The user authenticates by entering their passphrase, and the password manager derives a master key from it. If the master key successfully decrypts the vault key, the user is authenticated and the vault can be decrypted. If the passphrase is incorrect, the decryption of the vault key fails and that prevents the vault from being decrypted.

The final requirement, building the KDF around SHA-256, comes from the constraint that we find a hashing algorithm common to both the Windows CNG API and the Intel SGX trusted crypto library.
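
To make the key hierarchy concrete, here is a conceptual sketch of the vault key/master key scheme in Python. It is illustrative only: the Tutorial Password Manager itself uses C# with the Windows CNG API and the Intel SGX trusted crypto library, it deliberately avoids third-party libraries (this sketch uses the third-party cryptography package), and the helper names, salt handling, and iteration count below are my own choices rather than anything specified by the tutorial.

import os
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.exceptions import InvalidTag

def derive_master_key(passphrase, salt):
    # KDF built around SHA-256; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac('sha256', passphrase, salt, 100000, dklen=16)

def create_vault_key(passphrase):
    vault_key = AESGCM.generate_key(bit_length=128)  # primary key, never changes
    salt = os.urandom(16)
    iv = os.urandom(12)                              # 12-byte IV, efficient for GCM
    master_key = derive_master_key(passphrase, salt)
    # Wrap (encrypt) the vault key; GCM appends the authentication tag.
    wrapped = AESGCM(master_key).encrypt(iv, vault_key, None)
    return salt, iv, wrapped

def unlock_vault_key(passphrase, salt, iv, wrapped):
    master_key = derive_master_key(passphrase, salt)
    try:
        # A successful decryption both authenticates the user
        # and recovers the vault key.
        return AESGCM(master_key).decrypt(iv, wrapped, None)
    except InvalidTag:
        # Wrong passphrase: the tag check fails and nothing is revealed.
        return None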

Account Details

The last of the high-level requirements is what actually gets stored in the vault. For this tutorial, we are going to keep things simple. Figure 2 shows an early mockup of the main UI screen.

Figure 2: Early mockup of the Tutorial Password Manager main screen.

The last requirement, a fixed number of accounts stored in the vault, is all about simplifying the code. By fixing the number of accounts, we can more easily put an upper bound on how large the vault can be. This will be important when we start designing our enclave. Real-world password managers do not, of course, have this luxury, but it is one that can be afforded for the purposes of this tutorial.

Coming Up Next

In part 3 of the tutorial we’ll take a closer look at designing our Tutorial Password Manager for Intel SGX. We’ll identify our secrets, which portions of the application should be contained inside the enclave, how the enclave will interact with the core application, and how the enclave impacts the object model. Stay tuned!

Read the first tutorial in the series, Intel® Software Guard Extensions Tutorial Series: Part 1, Intel® SGX Foundation or find the list of all the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

 

Using Enclaves from .NET: Making ECALLS with Callbacks via OCALLS


One question about Intel® Software Guard Extensions (Intel® SGX) that comes up frequently is how to mix enclaves with managed code on Microsoft Windows* platforms, particularly with the C# language. While enclaves themselves must be 100 percent native code and the enclave bridge functions must be 100 percent native code with C (and not C++) linkages, it is possible, indirectly, to make an ECALL into an enclave from .NET and to make an OCALL from an enclave into a .NET object. There are multiple solutions for accomplishing these tasks, and this article and its accompanying code sample demonstrate one approach.

Mixing Managed Code and Native Code with C++/CLI

Microsoft Visual Studio* 2005 and later offers three options for calling unmanaged code from managed code:

  • Platform Invocation Services, commonly referred to by developers as P/Invoke
  • COM
  • C++/CLI

P/Invoke is good for calling simple C functions in a DLL, which makes it a reasonable choice for interfacing with enclaves, but writing P/Invoke wrappers and marshaling data can be difficult and error-prone. COM is more flexible than P/Invoke, but it is also more complicated; that additional complexity is unnecessary for interfacing with the C bridge functions required by enclaves. This code sample uses the C++/CLI approach.

C++/CLI offers significant convenience by allowing the developer to mix managed and unmanaged code in the same module, creating a mixed-mode assembly which can in turn be linked to modules comprised entirely of either managed or native code. Data marshaling in C++/CLI is also fairly easy: for simple data types it is done automatically through direct assignment, and helper methods are provided for more complex types such as arrays and strings. Data marshaling is, in fact, so painless in C++/CLI that developers often refer to the programming model as IJW (an acronym for “it just works”).

The trade-off for this convenience is that there can be a small performance penalty due to the extra layer of functions, and it does require that you produce an additional DLL when interfacing with Intel SGX enclaves.

Figure 1: Minimum component makeup of an Intel® Software Guard Extensions application written in C# and C++/CLI.

Figure 1 illustrates the component makeup of a C# application when using the C++/CLI model. The managed application consists of, at minimum, a C# executable, a C++/CLI DLL, the native enclave bridge DLL, and the enclave DLL itself.

The Sample Application

The sample application provides two functions that execute inside of an enclave: one calls CPUID, and the other generates random data in 1KB blocks and XORs them together to produce a final 1KB block of random bytes. This is a multithreaded application, and you can run all three tasks simultaneously. The user interface is shown in Figure 2.

Figure 2: Sample application user interface.

To build the application you will need the Intel SGX SDK. This sample was created using the Intel SGX SDK 1.6 and built with Microsoft Visual Studio 2013. It targets the .NET Framework 4.5.1.

The CPUID Tab

On the CPUID panel, you enter a value for EAX to pass to the CPUID instruction. When you click query, the program executes an ECALL on the current thread and runs the sgx_cpuid() function inside the enclave. Note that sgx_cpuid() does, in turn, make an OCALL to execute the CPUID instruction, since CPUID is not a legal instruction inside an enclave. This OCALL is automatically generated for you by the sgx_edger8r tool when you build your enclave. See the Intel SGX SDK Developer Guide for more information on the sgx_cpuid() function.

The RDRAND Tab

On the RDRAND panel you can generate up to two simultaneous background threads. Each thread performs the same task: it makes an ECALL to enter the enclave and generates the target amount of random data using the sgx_read_rand() function in 1 KB blocks. Each 1 KB block is XORed with the previous block to produce a final 1 KB block of random data that is returned to the application (the first block is XORed with a block of 0s).

For every 1 MB of random data that is generated, the function also executes an OCALL to send the progress back up to the main application via a callback. The callback function then runs a thread in the UI context to update the progress bar.

Because this function runs asynchronously, you can have both threads in the UI active at once and even switch to the CPUID tab to execute that ECALL while the RDRAND ECALLs are still active.

Overall Structure

The application is made up of the following components, three of which we’ll examine in detail:

  • C# application. A Windows Forms*-based application that implements the user interface.
  • EnclaveLink.dll. A mixed-mode DLL responsible for marshaling data between .NET and native code. This assembly contains two classes: EnclaveLinkManaged and EnclaveLinkNative.
  • EnclaveBridge.dll. A native DLL containing the enclave bridge functions. These are pure C functions.
  • Enclave.dll (Enclave.signed.dll). The Intel SGX enclave.

There is also a fifth component, sgx_support_detect.dll, which is responsible for the runtime check of Intel SGX capability. It ensures that the application exits gracefully when run on a system that does not support Intel SGX. We won’t be discussing this component here, but for more information on how it works and why it’s necessary, see the article Properly Detecting Intel® Software Guard Extensions in Your Applications.

The general application flow is that the enclave is not created immediately when the application launches. Instead, the application initializes some global variables for referencing the enclave and creates a mutex. When a UI event occurs, the first thread that needs to run an enclave function checks to see if the enclave has already been created, and if not, it launches the enclave. All subsequent threads and events reuse that same enclave. In order to keep the sample application architecture relatively simple, the enclave is not destroyed until the program exits.

The C# Application

The main executable is written in C#. It requires a reference to the EnclaveLink DLL in order to execute the C/C++ methods that eventually call into the enclave.

On startup, the application calls static methods to prepare the application for the enclave, and then closes it on exit:

        public FormMain()
        {
            InitializeComponent();
            // This doesn't create the enclave, it just initializes what we need
            // to do so in an multithreaded environment.
            EnclaveLinkManaged.init_enclave();
        }

        ~FormMain()
        {
            // Destroy the enclave (if we created it).
            EnclaveLinkManaged.close_enclave();
        }

These two functions are simple wrappers around functions in EnclaveLinkNative and are discussed in more detail below.

When either the CPUID or RDRAND functions are executed via the GUI, the application creates an instance of class EnclaveLinkManaged and executes the appropriate method. The CPUID execution flow is shown below:

      private void buttonCPUID_Click(object sender, EventArgs e)
        {
            int rv;
            UInt32[] flags = new UInt32[4];
            EnclaveLinkManaged enclave = new EnclaveLinkManaged();

            // Query CPUID and get back an array of 4 32-bit unsigned integers

            rv = enclave.cpuid(Convert.ToInt32(textBoxLeaf.Text), flags);
            if (rv == 1)
            {
                textBoxEAX.Text = String.Format("{0:X8}", flags[0]);
                textBoxEBX.Text = String.Format("{0:X8}", flags[1]);
                textBoxECX.Text = String.Format("{0:X8}", flags[2]);
                textBoxEDX.Text = String.Format("{0:X8}", flags[3]);
            }
            else
            {
                MessageBox.Show("CPUID query failed");
            }
        }

The callbacks for the progress bar in the RDRAND execution flow are implemented using a delegate, which creates a task in the UI context to update the display. The callback methodology is described in more detail later.

        Boolean cancel = false;
        progress_callback callback;
        TaskScheduler uicontext;

        public ProgressRandom(int mb_in, int num_in)
        {
            enclave = new EnclaveLinkManaged();
            mb = mb_in;
            num = num_in;
            uicontext = TaskScheduler.FromCurrentSynchronizationContext();
            callback = new progress_callback(UpdateProgress);

            InitializeComponent();

            labelTask.Text = String.Format("Generating {0} MB of random data", mb);
        }

        private int UpdateProgress(int received, int target)
        {
            Task.Factory.StartNew(() =>
            {
                progressBarRand.Value = 100 * received / target;
                this.Text = String.Format("Thread {0}: {1}% complete", num, progressBarRand.Value);
            }, CancellationToken.None, TaskCreationOptions.None, uicontext);

            return (cancel) ? 0 : 1;
        }

The EnclaveLink DLL

The primary purpose of the EnclaveLink DLL is to marshal data between .NET and unmanaged code. It is a mixed-mode assembly that contains two objects:

  • EnclaveLinkManaged, a managed class that is visible to the C# layer
  • EnclaveLinkNative, a native C++ class

EnclaveLinkManaged contains all of the data marshaling functions, and its methods have variables in both managed and unmanaged memory. It ensures that only unmanaged pointers and data get passed to EnclaveLinkNative. Each instance of EnclaveLinkManaged contains an instance of EnclaveLinkNative, and the methods in EnclaveLinkManaged are essentially wrappers around the methods in the native class.

EnclaveLinkNative is responsible for interfacing with the enclave bridge functions in the EnclaveBridge DLL. It also is responsible for initializing the global enclave variables and handling the locking.

#define MUTEX L"Enclave"

static sgx_enclave_id_t eid = 0;
static sgx_launch_token_t token = { 0 };
static HANDLE hmutex;
int launched = 0;

void EnclaveLinkNative::init_enclave()
{
	hmutex = CreateMutex(NULL, FALSE, MUTEX);
}

void EnclaveLinkNative::close_enclave()
{
	if (WaitForSingleObject(hmutex, INFINITE) != WAIT_OBJECT_0) return;

	if (launched) en_destroy_enclave(eid);
	eid = 0;
	launched = 0;

	ReleaseMutex(hmutex);
}

int EnclaveLinkNative::get_enclave(sgx_enclave_id_t *id)
{
	int rv = 1;
	int updated = 0;

	if (WaitForSingleObject(hmutex, INFINITE) != WAIT_OBJECT_0) return 0;

	if (launched) *id = eid;
	else {
		sgx_status_t status;

		status= en_create_enclave(&token, &eid, &updated);
		if (status == SGX_SUCCESS) {
			*id = eid;
			rv = 1;
			launched = 1;
		} else {
			rv= 0;
			launched = 0;
		}
	}
	ReleaseMutex(hmutex);

	return rv;
}

The EnclaveBridge DLL

As the name suggests, this DLL holds the enclave bridge functions. This is a 100 percent native assembly with C linkages, and the methods from EnclaveLinkNative call into these functions. Essentially, they wrap the ECALLs, marshaling data between the mixed-mode assembly and the enclave.

The OCALL and the Callback Sequence

The most complicated piece of the sample application is the callback sequence used by the RDRAND operation. The OCALL must propagate from the enclave all the way up the application to the C# layer. The task is to pass a reference to a managed class instance method (a delegate) down to the enclave so that it can be invoked via the OCALL. The challenge is to do that within the following restrictions:

  1. The enclave is in its own DLL, which cannot depend on other DLLs.
  2. The enclave only supports a limited set of data types.
  3. The enclave can only link against 100 percent native functions with C linkages.
  4. There cannot be any circular DLL dependencies.
  5. The methodology must be thread-safe.
  6. The user must be able to cancel the operation.

The Delegate

The delegate is prototyped inside of EnclaveLinkManaged.h along with the EnclaveLinkManaged class definition:

public delegate int progress_callback(int, int);

public ref class EnclaveLinkManaged
{
	array<BYTE> ^rand;
	EnclaveLinkNative *native;

public:
	progress_callback ^callback;

	EnclaveLinkManaged();
	~EnclaveLinkManaged();

	static void init_enclave();
	static void close_enclave();

	int cpuid(int leaf, array<UINT32>^ flags);
	String ^genrand(int mb, progress_callback ^cb);

	// C++/CLI doesn't support friend classes, so this is exposed publicly even though
	// it's only intended to be used by the EnclaveLinkNative class.

	int genrand_update(int generated, int target);
};

When each ProgressRandom object is instantiated, a delegate is assigned in the variable callback, pointing to the UpdateProgress instance method:

    public partial class ProgressRandom : Form
    {
        EnclaveLinkManaged enclave;
        int mb;
        Boolean cancel = false;
        progress_callback callback;
        TaskScheduler uicontext;
        int num;

        public ProgressRandom(int mb_in, int num_in)
        {
            enclave = new EnclaveLinkManaged();
            mb = mb_in;
            num = num_in;
            uicontext = TaskScheduler.FromCurrentSynchronizationContext();
            callback = new progress_callback(UpdateProgress);

            InitializeComponent();

            labelTask.Text = String.Format("Generating {0} MB of random data", mb);
        }

This variable is passed as an argument to the EnclaveLinkManaged object when the RDRAND operation is requested:

        public Task<String> RunAsync()
        {
            this.Refresh();

            // Create a thread using Task.Run

            return Task.Run<String>(() =>
            {
                String data;

                data= enclave.genrand(mb, callback);

                return data;
            });
        }

The genrand() method inside of EnclaveLinkManaged saves this delegate to the property “callback”. It also creates a GCHandle that both points to itself and pins itself in memory, preventing the garbage collector from moving it in memory and thus making it accessible from unmanaged memory. This handle is passed as a pointer to the native object.

This is necessary because we cannot directly store a handle to a managed object as a member of an unmanaged class.

String ^EnclaveLinkManaged::genrand(int mb, progress_callback ^cb)
{
	UInt32 rv;
	int kb= 1024*mb;
	String ^mshex = gcnew String("");
	unsigned char *block;
	// Marshal a handle to the managed object to a system pointer that
	// the native layer can use.
	GCHandle handle= GCHandle::Alloc(this);
	IntPtr pointer= GCHandle::ToIntPtr(handle);

	callback = cb;
	block = new unsigned char[1024];
	if (block == NULL) return mshex;

	// Call into the native layer. This will make the ECALL, which executes
	// callbacks via the OCALL.

	rv= (UInt32) native->genrand(kb, pointer.ToPointer(), block);

In the native object, we now have a pointer to the managed object, which we save in the member variable managed.

Next, we use a feature of C++11 to create a std::function reference that is bound to a class method. Unlike standard C function pointers, this std::function reference points to the class method in our instantiated object, not to a static or global function.

DWORD EnclaveLinkNative::genrand (int mkb, void *obj, unsigned char rbuffer[1024])
{
	using namespace std::placeholders;
	auto callback= std::bind(&EnclaveLinkNative::genrand_progress, this, _1, _2);
	sgx_status_t status;
	int rv;
	sgx_enclave_id_t thiseid;

	if (!get_enclave(&thiseid)) return 0;

	// Store the pointer to our managed object as a (void *). We'll Marshall this later.

	managed = obj;

	// Retry if we lose the enclave due to a power transition
again:
	status= en_genrand(thiseid, &rv, mkb, callback, rbuffer);

Why do we need this layer of indirection? Because the next layer down, EnclaveBridge.dll, cannot have a linkage dependency on EnclaveLink.dll as this would create a circular reference (where A depends on B, and B depends on A). EnclaveBridge.dll needs an anonymous means of pointing to our instantiated class method.

Inside en_genrand() in EnclaveBridge.cpp, this std::function is converted to a void pointer. Enclaves support only a subset of data types, and they don’t support any of the C++11 extensions regardless. We need to convert the std::function pointer to something the enclave will accept. In this case, that means passing the pointer address in a generic data buffer. Why use void instead of an integer type? Because the size of a std::function pointer varies by architecture.

typedef std::function<int(int, int)> progress_callback_t;

ENCLAVENATIVE_API sgx_status_t en_genrand(sgx_enclave_id_t eid, int *rv, int kb, progress_callback_t callback, unsigned char *rbuffer)
{
	sgx_status_t status;
	size_t cbsize = sizeof(progress_callback_t);

	// Pass the callback pointer to the enclave as a 64-bit address value.
	status = e_genrand(eid, rv, kb, (void *)&callback, cbsize, rbuffer);

	return status;
}

Note that we not only must allocate this data buffer, but also tell the sgx_edger8r tool how large the buffer is. That means we need to pass the size of the buffer in as an argument, even though it is never explicitly used.

Inside the enclave, the callback parameter literally just gets passed through and out the OCALL. The definition in the EDL file looks like this:

enclave {
	from "sgx_tstdc.edl" import *;

    trusted {
        /* define ECALLs here. */

		public int e_cpuid(int leaf, [out] uint32_t flags[4]);
		public int e_genrand(int kb, [in, size=sz] void *callback, size_t sz, [out, size=1024] unsigned char *block);
    };

    untrusted {
        /* define OCALLs here. */

		int o_genrand_progress ([in, size=sz] void *callback, size_t sz, int progress, int target);
    };
};

The callback starts unwinding in the OCALL, o_genrand_progress:

typedef std::function<int(int, int)> progress_callback_t;

int o_genrand_progress(void *cbref, size_t sz, int progress, int target)
{
	progress_callback_t *callback = (progress_callback_t *) cbref;

	// Recast as a pointer to our callback function.

	if (callback == NULL) return 1;

	// Propagate the cancellation condition back up the stack.
	return (*callback)(progress, target);
}

The callback parameter, cbref, is recast as a std::function binding and then executed with our two arguments: progress and target. This points back to the genrand_progress() method inside of the EnclaveLinkNative object, where the GCHandle is recast to a managed object reference and then executed.

int __cdecl EnclaveLinkNative::genrand_progress (int generated, int target)
{
	// Marshal a pointer to a managed object to native code and convert it to an object pointer we can use
	// from CLI code

	EnclaveLinkManaged ^mobj;
	IntPtr pointer(managed);
	GCHandle mhandle;

	mhandle= GCHandle::FromIntPtr(pointer);
	mobj= (EnclaveLinkManaged ^)mhandle.Target;

	// Call the progress update function in the Managed version of the object. A retval of 0 means
	// we should cancel our operation.

	return mobj->genrand_update(generated, target);
}

The next stop is the managed object. Here, the delegate that was saved in the callback class member is used to call up to the C# method.

int EnclaveLinkManaged::genrand_update(int generated, int target)
{
	return callback(generated, target);
}

This executes the UpdateProgress() method, which updates the UI. This delegate returns an int value of either 0 or 1, which represents the status of the cancellation button: 

        private int UpdateProgress(int received, int target)
        {
            Task.Factory.StartNew(() =>
            {
                progressBarRand.Value = 100 * received / target;
                this.Text = String.Format("Thread {0}: {1}% complete", num, progressBarRand.Value);
            }, CancellationToken.None, TaskCreationOptions.None, uicontext);

            return (cancel) ? 0 : 1;
        }

A return value of 0 means the user has asked to cancel the operation. This return code propagates back down the application layers into the enclave. The enclave code looks at the return value of the OCALL to determine whether or not to cancel:

        // Make our callback. Be polite and only do this every MB.
        // (Assuming 1 KB = 1024 bytes, 1MB = 1024 KB)
        if (!(i % 1024)) {
            status = o_genrand_progress(&rv, callback, sz, i + 1, kb);
            // rv == 0 means we got a cancellation request
            if (status != SGX_SUCCESS || rv == 0) return i;
         } 

Enclave Configuration

The default configuration for an enclave is to allow a single thread. As the sample application can run up to three threads in the enclave at one time—the CPUID function on the UI thread and the two RDRAND operations in background threads—the enclave configuration needed to be changed. This is done by setting the TCSNum parameter to 3 in Enclave.config.xml. If this parameter is left at its default of 1 only one thread can enter the enclave at a time, and simultaneous ECALLs will fail with the error code SGX_ERROR_OUT_OF_TCS.

<EnclaveConfiguration>
  <ProdID>0</ProdID>
  <ISVSVN>0</ISVSVN>
  <StackMaxSize>0x40000</StackMaxSize>
  <HeapMaxSize>0x100000</HeapMaxSize>
  <TCSNum>3</TCSNum>
  <TCSPolicy>1</TCSPolicy>
  <DisableDebug>0</DisableDebug>
  <MiscSelect>0</MiscSelect>
  <MiscMask>0xFFFFFFFF</MiscMask>
</EnclaveConfiguration>

Summary

Mixing Intel SGX with managed code is not difficult, but it can involve a number of intermediate steps. The sample C# application presented in this article represents one of the more complicated cases: multiple DLLs, multiple threads originating from .NET, locking in native space, OCALLs, and UI updates based on enclave operations. It is intended to demonstrate the flexibility that application developers really have when working with Intel SGX, in spite of its restrictions.

Improve Video Quality, Build Extremely Efficient Encoders & Decoders with Intel® VPA & Intel® SBE


Video codec developers: this could be your magic encoder/decoder ring. We're excited to announce the new Intel® Video Pro Analyzer 2017 (Intel® VPA) and Intel® Stress Bitstreams and Encoder 2017 (Intel® SBE), which you can use to enhance the brilliance of your video quality and build extremely efficient, robust encoders and decoders. Get the scoop and more technical details on these advanced Intel video analysis tools below.

Learn more: Intel® VPA  |  Intel® SBE

 

Enhance Video Quality & Streaming for AVC, HEVC, VP9 & MPEG-2 with Intel VPA

Improving your encoder's video quality and compliance becomes faster and easier. Intel® VPA, a comprehensive video analysis toolset to inspect, debug, and optimize the encode/decode process for AVC, HEVC, VP9, and MPEG-2, brings efficiency and multiple UI enhancements in its 2017 edition. A few of the top new features include:

Optimized Performance & Efficiency

  • HEVC file indexing makes Intel VPA faster and easier to use, with better performance and responsiveness when loading and executing debug optimizations, and quicker switching between frames.
  • MPEG-2 error resilience improvements (previously delivered for HEVC and VP9 analysis).
  • Decode processing time improved by 30% for HEVC and by 60% for AVC, along with AVC playback optimization. This includes optimizations that skip some intermediate processing when the user clicks on frames to decode in the GUI.
  • The Video Quality Caliper provides more stream information and has faster playback speed.

Enhanced Playback & Navigation

New performance enhancements in the 2017 release include decoder performance optimization with good gains for linear playback and indexing (for HEVC) to facilitate very fast navigation within the stream. Playback for HEVC improved 1.4x, and for AVC improved 2.2x.1

  • Performance analysis for HEVC and AVC playback (blue bars) consists of the ratio of average times to load one Time Lapse Footage American Cities sequence, 2160p @ 100 frames.
  • Performance analysis for HEVC Random Navigation (orange bar) improved by 12x and consists of the ratio of latency differences to randomly access the previous frame from the current frame, measured on 2160p HEVC/AVC video.

 

UI Enhancements

  • Filtering of error messages and new settings to save fixes
  • Additional options for save/load, including display/decode order, fields/frame, YUV mode, and more
  • Improved GUI picture caching

And don't forget: with this advanced video analysis tool, you can innovate for UHD with BT.2020 support. To see the full list of Intel VPA features, visit the product site for more details. Versions for Linux*, Windows* and Mac OS X* are available.

For current users, Download the Intel VPA 2017 Edition Now

If you are not already using Intel VPA -  Get a Free Trial2

More Resources - Get Started Optimizing Faster

 


 

Build Compliant HEVC & AVS2 Decoders with new Intel SBE 2017  

Intel SBE is a set of streams and tools for VP9, HEVC, and AVS2 for extensive validation of decoders, transcoders, players, and streaming solutions. You can also create custom bitstreams for testing and optimize your stream base for coverage and usage efficiency. The new 2017 release delivers:
  • Improved HEVC coverage, including syntax coverage that ensures decoders are in full conformance with the standard, plus long-term reference generation support for video conferencing.
  • Random Encoder for AVS2 Main and Main 10 (this can be shipped only to members of the AVS2 committee; the AVS2 format is broadly used in the People's Republic of China).
  • Compliance with the recently finalized AVS2 standard.

Learn more by visiting the Intel SBE site

Take a test drive of Intel SBE -  Download a Free Evaluation2


 

1Baseline configuration: Intel® VPA 2017 vs. 2016 running on Microsoft Windows* 8.1. Intel Customer Reference Platform with Intel® Core™ i7-5775C (3.4 GHz, 32 GB DDR3 DRAM). Gigabyte Z97-HD3 Desktop board, 32GB (4x8GB DDR3 PC3-12800 (800MHz) DIMM), 500GB Intel SSD, Turbo Boost Enabled, and HT Enabled. Source: Intel internal measurements as of August 2016.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/performance. Features and benefits may require an enabled system and third party hardware, software or services. Consult your system provider.

Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804.

2Note that in the evaluation package, the number of streams and/or other capabilities may be limited. Contact Intel Sales if you need a version without these limitations.


Intel® Memory Protection Extensions on Windows® 10: A Tutorial

$
0
0

Introduction

Beginning with the Intel® 6th generation Core™ processor, Intel has introduced Intel® Memory Protection Extensions (Intel® MPX), a new extension to the instruction set architecture that aims to enhance software security by helping to protect against buffer overflow attacks. In this article, we discuss buffer overflow, and then give step-by-step details on how application developers can prevent their apps from suffering from buffer overflow attacks on Windows® 10. Intel MPX works for both traditional desktop apps and Universal Windows Platform* apps.

Prerequisites

To run the samples discussed in this article, you’ll need the following hardware and software:

  • A computer (desktop, laptop, or any other form factor) with Intel® 6th generation Core™ processor and Microsoft Windows 10 OS (November 2015 update or greater; Windows 10 version 1607 is preferred)
  • Intel MPX enabled in UEFI (if the option is available)
  • Intel MPX driver properly installed
  • Microsoft Visual Studio* 2015 (update 1 or later IDE; Visual Studio 2015 update 3 is preferred)

Buffer Overflow

C/C++ code is by nature more susceptible to buffer overflows. For example, in the following code the string operation function “strcpy” in main() will put the program at risk for a buffer overflow attack.

#include "stdafx.h"
#include <iostream>
#include <string.h>
#include <time.h>
#include <stdlib.h>

using namespace std;

// Fills uname_string with uname_len random digits; the buffer must hold
// at least uname_len + 1 characters.
void GenRandomUname(char* uname_string, const int uname_len)
{
	srand((unsigned int)time(NULL));
	for (int i = 0; i < uname_len; i++)
	{
		uname_string[i] = (rand() % ('9' - '0' + 1)) + '0';
	}
	uname_string[uname_len] = '\0';
}

int main(int argnum, char** args)
{
	char user_name[16];
	GenRandomUname(user_name, 15);
	cout << "random gentd user name: " << user_name << endl;

	if (argnum < 2) return 1;	// an argument is required

	// Deliberately unsafe: strcpy does not check the bounds of config,
	// so an argument longer than 9 characters overruns the buffer.
	char config[10] = { '\0' };
	strcpy(config, args[1]);

	cout << "config mem addr: "<< &config << endl;
	cout << "user_name mem addr: "<< &user_name << endl;

	if (0 == strcmp("ROOT", user_name))
	{
		cout << "Buffer Overflow Attacked!"<< endl;
		cout << "Uname changed to: "<< user_name << endl;
	}
	else
	{
		cout << "Uname OK: "<< user_name << endl;
	}
	return 0;
}

To demonstrate, if we compile and run the above sample as a C++ console application, passing CUSM_CFG as an argument, the program runs normally and the console shows the following output:

Figure 1 Buffer Overflow

But if we rerun the program passing CUSTOM_CONFIGUREROOT as an argument, the output will be “unexpected” and the console will show a message like this:

Figure 2 Buffer Overflow

This simple example shows how a buffer overflow attack works. The reason the output can be unexpected is that the call to strcpy does not check the bounds of the destination array. Although compilers usually give arrays several extra bytes for memory alignment purposes, a buffer overflow can still happen if the source array is long enough. In this case, a piece of the runtime memory layout of the program looks like this (the results of different compilers or compile options may vary):

Figure 3
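
For contrast (our addition, not part of the Intel MPX sample), a bounds-checked copy such as strcpy_s rejects an oversized argument instead of silently corrupting the stack:

// Drop-in replacement for the strcpy line above: the copy fails cleanly
// when args[1] does not fit in the 10-byte buffer.
char config[10] = { '\0' };
if (strcpy_s(config, sizeof(config), args[1]) != 0)
{
	cout << "config string too long, rejected" << endl;
	return 1;
}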

Intel Memory Protection Extensions

With the help of Intel MPX, we can avoid the buffer overflow security issue simply by adding the compile option /d2MPX to the Visual Studio C++ compiler.

Figure 4

After recompiling with the Intel MPX option, the program is able to defend against buffer overflow attacks. If we run the recompiled program with the CUSTOM_CONFIGUREROOT argument, a runtime exception is raised and causes the program to exit.

Figure 5

Let's dig into the generated assembly code to see what Intel MPX has done with the program. From the results, we can see that many Intel MPX instructions have been inserted among the original instructions to detect buffer overflows at runtime.

Figure 6

Now let's look in more detail at the instructions related to Intel MPX:

  • bndmk: Creates the lower bound (LB) and upper bound (UB) in the bounds register (%bnd0 in the code snapshot above).
  • bndmov: Fetches the bounds information (upper and lower) out of memory and puts it in a bounds register.
  • bndcl: Checks the lower bound against an argument (%rax in the code snapshot above).
  • bndcu: Checks the upper bound against an argument (%rax in the code snapshot above).

Troubleshooting

If Intel MPX is not working properly, try the following:

  1. Double-check the versions of your CPU, OS, and Visual Studio 2015. Boot the PC into the UEFI settings to check if there is any Intel MPX switch; turn on the switch if needed.
  2. Confirm that the Intel MPX driver is properly installed and functioning properly in the Windows* Device Manager (Figure 7).
  3. Check that the compiled executable contains instructions related to Intel MPX. Insert a break point, and then run the program. When the break point is hit, right-click with the mouse, and then click Go To Disassembly. A new window will display for viewing the assembly code.

Figure 8

Conclusion

Intel MPX is a new hardware solution that helps defend against buffer overflow attacks. Compared with software solutions such as AddressSanitizer (https://code.google.com/p/address-sanitizer/), from an application developer’s point of view, Intel MPX has many advantages, including the following:

  • It detects when a pointer points outside the object but still points to valid memory.
  • Intel MPX is more flexible: it can be used in certain modules without affecting any other modules.
  • Compatibility with legacy code is much higher for code instrumented with Intel MPX.
  • A single binary version can still be released, because of the particular instruction encoding: the instructions related to Intel MPX are executed as NOPs (no operations) on unsupported hardware or operating systems.

On Intel® 6th generation Core™ processors and Windows 10, benefiting from Intel MPX is as simple as adding a compiler option, which can help enhance application security without hurting the application's backward compatibility.

Related Articles

Intel® Memory Protection Extensions Enabling Guide:

https://software.intel.com/en-us/articles/intel-memory-protection-extensions-enabling-guide

References

[1] AddressSanitizer: https://code.google.com/p/address-sanitizer/

About the Author

Fanjiang Pei is an application engineer in the Client Computing Enabling Team, Developer Relations Division, Software and Solutions Group (SSG). He is responsible for enabling security technologies of Intel such as Intel MPX, Intel® Software Guard Extensions, and more.

Intel® Software Guard Extensions Tutorial Series: Part 3, Designing for Intel® SGX

$
0
0

In Part 3 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series we’ll talk about how to design an application with Intel SGX in mind. We’ll take the concepts that we reviewed in Part 1, and apply them to the high-level design of our sample application, the Tutorial Password Manager, laid out in Part 2. We’ll look at the overall structure of the application and how it is impacted by Intel SGX and create a class model that will prepare us for the enclave design and integration.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

While we won’t be coding up enclaves or enclave interfaces just yet, there is source code provided with this installment. The non-Intel SGX version of the application core, without its user interface, is available for download. It comes with a small test program, a console application written in C#, and a sample password vault file.

Designing for Enclaves

This is the general approach we’ll follow for designing the Tutorial Password Manager for Intel SGX:

  1. Identify the application’s secrets.
  2. Identify the providers and consumers of those secrets.
  3. Determine the enclave boundary.
  4. Tailor the application components for the enclave.

Identify the Application’s Secrets

The first step in designing an application for Intel SGX is to identify the application’s secrets.

A secret is anything that is not meant to be known or seen by others. Only the user or the application for which it is intended should have access to a secret, and it should not be exposed to other users or applications regardless of their privilege level. Potential secrets can include financial information, medical records, personally identifiable information, identity data, licensed media content, passwords, and encryption keys.

In the Tutorial Password Manager, there are several items that are immediately identifiable as secrets, shown in Table 1.


Secret
  • The user's account passwords
  • The user's account logins
  • The user's master password or passphrase
  • The master key for the password vault
  • The encryption key for the account database

Table 1: Preliminary list of application secrets.


These are the obvious choices, but we’re going to expand this list by including all of the user’s account information and not just their logins. The revised list is shown in Table 2.


Secret
  • The user's account passwords
  • The user's account information
  • The user's master password or passphrase
  • The master key for the password vault
  • The encryption key for the account database

Table 2: Revised list of application secrets.

Even without revealing the passwords, the account information is valuable to attackers. Exposing this data in the password manager leaks valuable clues to those with malicious intent. With this data, they can choose to launch attacks against the services themselves, perhaps using social engineering or password reset attacks, to obtain access to the owner’s account because they know exactly what to target.

Identify the Providers and Consumers of the Application’s Secrets

Once the application’s secrets have been identified, the next step is to determine their origins and destinations.

In the current version of Intel SGX, the enclave code is not encrypted, which means that anyone with access to the application files can disassemble and inspect it. By definition, something cannot be a secret if it is open to inspection, and that means that secrets should never be statically compiled into enclave code. An application’s secrets must originate from outside its enclaves and be loaded into them at runtime. In Intel SGX terminology, this is referred to as provisioning secrets into the enclave.

When a secret originates from a component outside of the Trusted Compute Base (TCB), it is important to minimize its exposure to untrusted code. (One of the main reasons why remote attestation is such a valuable component of Intel SGX is that it allows a service provider to establish a trusted relationship with an Intel SGX application, and then derive an encryption key that can be used to provision encrypted secrets to the application that only the trusted enclave on that client system can decrypt.) Similar care must be taken when a secret is exported out of an enclave. As a general rule, an application’s secrets should not be sent to untrusted code without first being encrypted inside of the enclave.
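
As an illustration of that rule, the sketch below (our addition, not from the tutorial) encrypts a secret inside the enclave with the Intel SGX SDK's AES-GCM primitive before it is handed to untrusted code; key and IV management are omitted, and all names are ours:

#include <stdint.h>
#include <sgx_tcrypto.h>

// Encrypt a secret inside the enclave before returning it to untrusted code.
sgx_status_t seal_for_export(const sgx_aes_gcm_128bit_key_t *key,
	const uint8_t *secret, uint32_t len, const uint8_t iv[12],
	uint8_t *ciphertext, sgx_aes_gcm_128bit_tag_t *tag)
{
	// AES-GCM with a 12-byte IV and no additional authenticated data.
	return sgx_rijndael128GCM_encrypt(key, secret, len, ciphertext,
		iv, 12, NULL, 0, tag);
}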

Unfortunately for the Tutorial Password Manager application, we do need to send secrets into and out of the enclave, and those secrets will have to exist in clear text at some point. The end user will be entering his or her account information and password via a keyboard or touchscreen, and recalling it at a future time as needed. Their account passwords will need to be shown on the screen, and even copied to the Windows* clipboard on request. These are core requirements for a password manager application to be useful.

What that means for us is that we can’t completely eliminate the attack surface: we can only minimize it, and we’ll need some mitigation strategy for dealing with secrets when they exist outside the enclave in plain text.

Secret | Source | Destination
The user's account passwords | User input*, Password vault file | User interface*, Clipboard*, Password vault file
The user's account information | User input*, Password vault file | User interface*, Password vault file
The user's master password or passphrase | User input | Key derivation function
The master key for the password vault | Key derivation function | Database key crypto
The encryption key for the password database | Random generation, Password vault file | Password vault crypto, Password vault file

Table 3: Application secrets, their sources, and their destinations. Potential security risks are denoted with an asterisk (*).

Table 3 adds the sources and destinations for the Tutorial Password Manager’s secrets. Potential problems—areas where secrets may be exposed to untrusted code—are denoted with an asterisk (*).

Determine the Enclave Boundary

Once the secrets have been identified, it’s time to determine the boundary for the enclave. Start by looking at the data flow of secrets through the application’s core components. The enclave boundary should:

  • Encompass the minimum set of critical components that act on your application’s secrets.
  • Completely contain as many secrets as is feasible.
  • Minimize the interactions with, and dependencies on, untrusted code.

The data flows and chosen enclave boundary for the Tutorial Password Manager application are shown in Figure 1.

Figure 1

Figure 1: Data flow for secrets in the Tutorial Password Manager.

Here, the application secrets are depicted as circles, with blue circles representing secrets that will exist in plain text (unencrypted) at some point during the application’s execution and green circles representing secrets that are encrypted by the application. The enclave boundary has been drawn around the encryption and decryption routines, the key derivation function (KDF) and the random number generator. This does several things for us:

  1. The database/vault key, which is used to encrypt some of our application’s secrets (account information and passwords), is generated within the enclave and is never sent outside of it in clear text.
  2. The master key is derived from the user’s passphrase inside the enclave, and used to encrypt and decrypt the database/vault key. The master key is ephemeral and is never sent outside the enclave in any form.
  3. The database/vault key, account information, and account passwords are encrypted inside the enclave using encryption keys that are not visible to untrusted code (see #1 and #2).

Unfortunately, we have issues with unencrypted secrets crossing the enclave boundary that we simply can’t avoid. At some point during the Tutorial Password Manager’s execution, a user will have to enter a password on the keyboard or copy a password to the Windows clipboard. These are insecure channels that can’t be placed inside the enclave, and the operations are absolutely necessary for the functioning of the application. This is potentially a huge problem, which is compounded by the decision to build the application on top of a managed code base.

Protecting Secrets Outside the Enclave

There are no complete solutions for securing unencrypted secrets outside the enclave, only mitigation strategies that reduce the attack surface. The best we can do is minimize the amount of time that this information exists in a form that is easily compromised.

Here is some general advice for handling sensitive data in untrusted code:

  • Zero-fill your data buffers when you are done with them. Be sure to use functions such as SecureZeroMemory (Windows) and memzero_explicit (Linux) that are guaranteed to not be optimized out by the compiler. (A short sketch follows this list.)
  • Do not use the C++ standard template library (STL) containers to store sensitive data. The STL containers have their own memory management, which makes it difficult to ensure that the memory allocated to an object is securely wiped when the object is deleted. (By using custom allocators you can address this issue for some containers.)
  • When working with managed code such as .NET, or languages that feature automatic memory management, use storage types that are specifically designed for holding secure data. Other storage types are at the mercy of the garbage collector and just-in-time compilation, and may not be cleared or freed on demand (if at all).
  • If you must place data on the clipboard be sure to clear it after a short length of time. In particular, don’t allow it to remain there after the application has exited.
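
To illustrate the first two points, here is a minimal sketch of wiping a native buffer once a secret is no longer needed (the function and buffer names are ours):

#include <windows.h>
#include <wchar.h>

void use_passphrase(const wchar_t *src, size_t len)
{
	// Work in a plain native buffer rather than an STL container so the
	// memory can be wiped deterministically.
	wchar_t *buf = new wchar_t[len + 1];
	wmemcpy(buf, src, len);
	buf[len] = L'\0';

	// ... operate on the secret ...

	// SecureZeroMemory is guaranteed not to be optimized away, unlike a
	// plain memset immediately before a free.
	SecureZeroMemory(buf, (len + 1) * sizeof(wchar_t));
	delete[] buf;
}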

For the Tutorial Password Manager project, we have to work with both native and managed code. In native code, we’ll allocate wchar_t and char buffers, and use SecureZeroMemory to wipe them clean before freeing them. In the managed code space, we’ll employ .NET’s SecureString class.

When sending a SecureString to unmanaged code, we’ll use the helper functions from System::Runtime::InteropServices to marshal the data. 

using namespace System::Runtime::InteropServices;

LPWSTR PasswordManagerCore::M_SecureString_to_LPWSTR(SecureString ^ss)
{
	IntPtr wsp= IntPtr::Zero;

	if (!ss) return NULL;

	wsp = Marshal::SecureStringToGlobalAllocUnicode(ss);
	return (wchar_t *) wsp.ToPointer();
}
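
The pointer returned above references an unmanaged, plain-text copy of the secret, so it must be wiped and freed once the native call completes. A minimal sketch, assuming a companion helper (the name M_Free_LPWSTR is ours, not from the sample):

void PasswordManagerCore::M_Free_LPWSTR(LPWSTR ws)
{
	if (ws == NULL) return;

	// Zeroes the unmanaged copy of the secret and frees it in one call.
	Marshal::ZeroFreeGlobalAllocUnicode(IntPtr((void *) ws));
}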

When marshaling data in the other direction, from native code to managed code, we have two methods. If the SecureString object already exists, we’ll use the Clear and AppendChar methods to set the new value from the wchar_t string.

password->Clear();
for (int i = 0; i < wpass_len; ++i) password->AppendChar(wpass[i]);

When creating a new SecureString object, we’ll use the constructor form that creates a SecureString from an existing wchar_t string.

try {
	name = gcnew SecureString(wname, (int) wcslen(wname));
	login = gcnew SecureString(wlogin, (int) wcslen(wlogin));
	url = gcnew SecureString(wurl, (int) wcslen(wurl));
}
catch (...) {
	rv = NL_STATUS_ALLOC;
}

Our password manager also supports transferring passwords to the Windows clipboard. The clipboard is an insecure storage space that can potentially be accessed by other users and for this reason Microsoft recommends that sensitive data never be placed on there. The point of a password manager, though, is to make it possible for users to create strong passwords that they do not have to remember. It also makes it possible to create lengthy passwords consisting of randomly generated characters which would be difficult to type by hand. The clipboard provides much needed convenience in exchange for some measure of risk.

To mitigate this risk, we need to take some extra precautions. The first is to ensure that the clipboard is emptied when the application exits. This is accomplished in the destructor in one of our native objects.

PasswordManagerCoreNative::~PasswordManagerCoreNative(void)
{
	if (!OpenClipboard(NULL)) return;
	EmptyClipboard();
	CloseClipboard();
}

We'll also set up a clipboard timer. When a password is copied to the clipboard, we set a timer for 15 seconds and execute a function to clear the clipboard when it fires. If a timer is already running, meaning a new password was placed on the clipboard before the old one expired, that timer is cancelled and the new one takes its place.

void PasswordManagerCoreNative::start_clipboard_timer()
{
	// Use the default Timer Queue

	// Stop any existing timer
	if (timer != NULL) DeleteTimerQueueTimer(NULL, timer, NULL);

	// Start a new timer
	if (!CreateTimerQueueTimer(&timer, NULL, (WAITORTIMERCALLBACK)clear_clipboard_proc,
		NULL, CLIPBOARD_CLEAR_SECS * 1000, 0, 0)) return;
}

static void CALLBACK clear_clipboard_proc(PVOID param, BOOLEAN fired)
{
	if (!OpenClipboard(NULL)) return;
	EmptyClipboard();
	CloseClipboard();
}

Tailor the Application Components for the Enclave

With the secrets identified and the enclave boundary drawn, it's time to structure the application while taking the enclave into account. There are significant restrictions on what can be done inside of an enclave, and these restrictions will mandate which components live inside the enclave, which live outside of it, and, when porting an existing application, which ones may need to be split in two.

The biggest restriction that impacts the Tutorial Password Manager is that enclaves cannot perform any I/O operations. The enclave can’t read from the keyboard or write to the display so all of our secrets—passwords and account information—must be marshaled into and out of the enclave. It also can’t read from or write to the vault file: the components that parse the vault file must be separated from components that perform the physical I/O. That means we are going to have to marshal more than just our secrets across the enclave boundary: we have to marshal the file contents as well.


Figure 2: Class diagram for the Tutorial Password Manager.

Figure 2 shows the basic class diagram for the application core (excluding the user interface), including which classes serve as the sources and destinations for our secrets. Note that the PasswordManagerCore class is considered the source and destination for secrets which must interact with the GUI in this diagram for simplicity’s sake. Table 4 briefly describes each class and its purpose.

Class | Type | Function
PasswordManagerCore | Managed | Interacts with the C# graphical user interface (GUI) and marshals data to the native layer.
PasswordManagerCoreNative | Native, Untrusted | Interacts with the managed PasswordManagerCore class. Also responsible for converting between Unicode and multibyte character data (this will be discussed in more detail in Part 4).
VaultFile | Managed | Reads and writes the vault file.
Vault | Native, Enclave | Stores the password vault data in AccountRecord members. Deserializes the vault file on reads, and reserializes it for writing.
AccountRecord | Native, Enclave | Stores the account information and password for each account in the user's password vault.
Crypto | Native, Enclave | Performs cryptographic functions.
DRNG | Native, Enclave | Interface to the random number generator.

Table 4: Class descriptions.

Note that we had to split the handling of the vault file into two pieces: one that does the physical I/O, and one that stores its contents once they are read and parsed. We also had to add serialization and deserialization methods to the Vault object as intermediate sources and destinations for our secrets. All of this is necessary because the VaultFile class can’t know anything about the structure of the vault file itself, since that would require access to cryptographic functions that are located inside the enclave.

We've also drawn a dotted line connecting the PasswordManagerCoreNative class to the Vault class. As you might recall from Part 2, enclaves can only link to C functions. These two C++ classes cannot directly communicate with one another: they must use an intermediary, which is denoted by the Bridge Functions box.
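
As a sketch of what such an intermediary might look like (the names ew_unlock and ve_unlock are ours, not from the sample), a bridge function is a C-linkage wrapper that forwards the call to the ECALL proxy generated by sgx_edger8r:

#include <sgx_urts.h>
#include "Enclave_u.h"	// generated by sgx_edger8r

// Hypothetical bridge function in the untrusted bridge DLL.
extern "C" int ew_unlock(sgx_enclave_id_t eid, const char *passphrase)
{
	int vault_rv = 0;
	sgx_status_t status = ve_unlock(eid, &vault_rv, passphrase);

	if (status != SGX_SUCCESS) return -1;	// illustrative error code
	return vault_rv;
}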

The Non-Intel® Software Guard Extensions Code Path

The diagram in Figure 2 is for the Intel SGX code path. The PasswordManagerCoreNative class cannot link directly to the Vault class because the latter is inside the enclave. In the non-Intel SGX code path, however, there is no such restriction: PasswordManagerCoreNative can directly contain a member of class Vault. This is the only shortcut we’ll take in the application design for the non-Intel SGX code path. To simplify the enclave integration, the non-enclave code path will still separate the vault processing into the Vault and VaultFile classes.

Another key difference between the two code paths is that the cryptographic functions in the Intel SGX path will come from the Intel SGX SDK. The non-Intel SGX code path can't use these functions, so it will draw upon Microsoft's Cryptography API: Next Generation* (CNG). That means we have to maintain two distinct copies of the Crypto class: one for use in enclaves and one for use in untrusted space. We'll have to do the same with the DRNG class, too, since the Intel SGX code path will call sgx_read_rand instead of using the RDRAND intrinsic.

Sample Code

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager core DLL, prior to enclave integration. In other words, this is the non-Intel SGX version of the application core. There is no user interface provided, but we have included a rudimentary test application written in C# that runs through a series of test operations. It executes two test suites: one that creates a new vault file and performs various operations on it, and one that acts on a reference vault file that is included with the source distribution. As written, the test application expects the test vault to be located in your Documents folder, though you can change this in the TestSetup class if needed.

This source code was developed in Microsoft Visual Studio* Professional 2013 per the requirements stated in the introduction to the tutorial series. It does not require the Intel SGX SDK at this point, though you will need a system that supports Intel® Data Protection Technology with Secure Key.

Coming Up Next

In part 4 of the tutorial we’ll develop the enclave and the bridge functions. Stay tuned!

Find the list of all the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Deploying applications with Intel® IPP DLLs

$
0
0

Introduction

The Intel® Integrated Performance Primitives (Intel® IPP) is a cross-architecture software library that provides a broad range of library functions for image processing, signal processing, data compression, cryptography, and computer vision, as well as math support routines for such processing capabilities. Intel® IPP is optimized for a wide range of Intel microprocessors.

One of the key advantages of Intel® IPP is performance. The performance advantage comes from functions that are optimized per processor architecture and compiled into one single library. Intel® IPP functions are "dispatched" at run time: the "dispatcher" chooses which of these processor-specific optimized libraries to use when the application makes a call into the IPP library. This is done to maximize each function's use of the underlying vector instructions and other architecture-specific features.

This paper covers application deployment with Intel® IPP dynamic-link libraries (DLLs). It is important to understand processor detection and library dispatching so that software redistribution is problem free. Additionally, you should consider two key factors when it comes to DLLs:

  1. Selection of an appropriate DLL linking model.
  2. The location for the DLLs on the target system.

This document explains how the Intel® IPP dynamic libraries work and discusses these important considerations. For information on all Intel® IPP linking models, please refer to the document Intel® IPP Linkage Models – Quick Reference Guide. Further documentation on Intel® IPP can be found at Intel® Integrated Performance Primitives – Documentation.

Version Information
This document applies to Intel® IPP 2017.xx.xxx for Windows* running 32-bit and 64-bit applications, but the concepts can also be applied to other operating systems supported by Intel® IPP.

Library Location
Intel® IPP is also a key component of Intel® Parallel Studio XE and Intel® System Studio. The IPP libraries of Parallel Studio can be found in the redist directory. For a default installation on Windows*, the path to the libraries is 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_x.xx.xxx\<target_os>', where 'x.xx.xxx' designates the installed version (on certain systems, instead of 'Program Files (x86)', the directory name is 'Program Files'). For convenience, <ipp directory> will be used instead throughout this paper.


Note: Please verify that your license permits redistribution before distributing the Intel® IPP DLLs. Any software source code included with this product is furnished under a software license and may only be used or copied in accordance with the terms of that license. Please see the Intel® Software Products End User License Agreement for license definitions and restrictions on the library.
 


Key Concepts

Library Dispatcher
Every Intel® IPP function has many binary implementations, each performance-optimized for a specific target CPU. These processor-specific functions are contained in separate DLLs. The name of each DLL has a prefix identification code that denotes its target processor. For example, a 32-bit Intel processor with SSE4.2 support requires the image processing library named ippip8.dll, where ‘p8’ is the CPU identification code for 32-bit SSE4.2.

IA-32 Intel® architecture | Intel® 64 architecture | Meaning
px | mx | Generic code optimized for processors with Intel® Streaming SIMD Extensions (Intel® SSE)
w7 | - | Optimized for processors with Intel SSE2
s8 | n8 | Optimized for processors with Supplemental Streaming SIMD Extensions 3 (SSSE3)
- | m7 | Optimized for processors with Intel SSE3
p8 | y8 | Optimized for processors with Intel SSE4.2
g9 | e9 | Optimized for processors with Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI)
h9 | l9 | Optimized for processors with Intel® Advanced Vector Extensions 2 (Intel® AVX2)
- | k0 | Optimized for processors with Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
- | n0 | Optimized for processors with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)

Table 1: CPU Identification Codes Associated with Processor-Specific Libraries

When the first Intel® IPP function call occurs in the application, the application searches the system path for an Intel® IPP dispatcher library. The dispatcher library identifies the system processor and invokes the function version that has the best performance on the target CPU. This process does not add overhead because the dispatcher connects to an entry point of an optimized function only once during application initialization. This allows your code to call optimized functions without worrying about the processor on which the code will execute.

Dynamic Linking
Dynamic-link libraries are loaded when an application runs. Simply link the application to the Intel® IPP stub libraries located in the <ipp directory>\ipp\lib\ia32 or <ipp directory>\ipp\lib\intel64 folder; these load the dispatcher libraries and link to the correct entry points. Ensure that the dispatcher DLLs and the processor-specific DLLs are on the system path. In the diagram below, the application links to ipps.lib, and ipps.dll automatically loads the processor-specific library (for example, ippsy8.dll) at run time.


Figure 1: Processor-Specific Dispatching

Dynamic linking is useful if many Intel® IPP functions are called in the application. Most applications are good candidates for this model.
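
To make the dispatching model concrete, here is a minimal dynamically linked program (our example, with illustrative values); the first Intel® IPP call triggers the dispatcher described above:

#include <stdio.h>
#include <ipp.h>

int main(void)
{
	// The first IPP call invokes the dispatcher, which selects the
	// processor-specific code path for the CPU it detects.
	const IppLibraryVersion *ver = ippsGetLibVersion();
	printf("IPP signal library: %s %s\n", ver->Name, ver->Version);

	Ipp32f a[4] = { 1, 2, 3, 4 }, b[4] = { 5, 6, 7, 8 }, c[4];
	if (ippsAdd_32f(a, b, c, 4) != ippStsNoErr) return 1;
	printf("c[3] = %.1f\n", c[3]);	// prints 12.0
	return 0;
}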

Building a Custom DLL
In addition to dynamic linking, the Intel® IPP provides a tool called Intel® IPP Custom Library Tool for developers to create their own DLL. This tool can be found under <ipp directory>\ipp\tools\custom_library_tool and links selected Intel® IPP functions into a new separate DLL and generates an import library to which the application can link. A custom DLL is useful if the application uses a limited set of functions. The custom DLL must be distributed with the application. Intel® IPP supports two dynamic linking options. Refer to Table 2 below to choose which dynamic linking model best suits the application.

Feature | Dynamic Linking | Custom DLL
Processor Updates | Automatic | Recompile and redistribute
Optimization | All processors | All processors
Build | Link to stub static libraries | Build and link to a separate import library, which dispatches a separate DLL
Function Naming | Regular names | Regular names
Total Binary Size | Large | Small
Executable Size | Smallest | Smallest
Kernel Mode | No | No

Table 2: Dynamic Linking Models

For detailed information on how to build and link to a custom DLL, unzip the example package files under <ipp directory>\ipp\examples and look at the core examples under components\examples_core\ipp_custom_dll.

Threading and Multi-core Support
Intel continues the deprecation of internal threading that started in Intel® IPP 7.1. Internal (inside a primitive) threading is significantly less effective than external (application-level) threading. For threading Intel® IPP functions, external threading is recommended, as it gives a significant performance gain on multi-processor and multi-core systems; a minimal sketch follows. A good starting point on how to develop code for external threading can be found here.
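
A minimal sketch of external threading with OpenMP* (the function, buffer names, and chunking policy are illustrative): each thread calls a single-threaded Intel® IPP primitive on its own chunk of the data.

#include <omp.h>
#include <ipps.h>

// Split the input into one chunk per thread and run an IPP primitive
// on each chunk independently.
void threaded_add(const Ipp32f *src1, const Ipp32f *src2, Ipp32f *dst, int len)
{
	const int chunks = omp_get_max_threads();
	const int step = (len + chunks - 1) / chunks;

	#pragma omp parallel for
	for (int i = 0; i < chunks; i++) {
		const int off = i * step;
		const int n = (off + step <= len) ? step : len - off;
		if (n > 0)
			ippsAdd_32f(src1 + off, src2 + off, dst + off, n);
	}
}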


Linking the Application

Applications using Intel® IPP can be compiled with Microsoft Visual Studio* and the Intel® C++ Compiler. Instructions for configuring Microsoft Visual Studio to link to the Intel® IPP libraries can be found in the Getting Started with Intel® Integrated Performance Primitives document.


Deploying the Application

The Intel® IPP dispatcher and processor-specific DLLs, located in <ipp directory>\redist\ia32\ipp or <ipp directory>\redist\intel64\ipp, or a custom DLL, must be distributed with the application software. The Intel® IPP core functions library, ippcore.dll, must also be distributed.

When distributing a custom DLL, it is best to create a distinct naming scheme to avoid conflicts and for tracking purposes. This is also important because custom DLLs must be recompiled and redistributed to include new processor optimizations not available in previous Intel® IPP versions.

On Microsoft Windows*, the system PATH variable holds a list of folder locations that is searched for executable files. When the application is invoked, the Intel® IPP DLLs need to be located in a folder that is listed in the PATH variable. Choose a location for the Intel® IPP DLLs and custom DLLs on the target system so that the application can easily find them. Possible distribution locations include %SystemDrive%\WINDOWS\system32, the application folder, or any other folder on the target system. Table 3 below compares these options.

Location | System PATH | Permissions
%SystemDrive%\WINDOWS\system32 | This folder is listed on the system PATH by default. | Administrator permissions may be required to copy files to this folder.
Application folder or subfolder | Windows will first check the application folder for the DLLs. | Special permissions may be required.
Other folder | Add this directory to the system PATH. | Special permissions may be required.

Table 3: Intel® IPP DLL Location

In all cases, the application must be able to find the location of the Intel® IPP DLLs and custom DLLs in order to run properly.

Intel® IPP provides a convenient way to optimize the performance of a 32-bit or Intel® 64 application for the latest processors. Application and DLL distribution requires developers to do the following:

  1. Choose the appropriate DLL linking model:
    • Dynamic linking – The application is linked to stub libraries. At runtime, dispatcher DLLs detect the target processor and dispatch the processor-specific DLLs. The dispatcher and processor-specific DLLs are to be distributed with the application.
    • Custom DLL – The application is linked to a custom import library. At runtime, the custom DLL is invoked. The custom DLL is to be distributed with the application.
  2. Determine the best location for the Intel® IPP DLLs on the end-user system:
    • %SystemDrive%\WINDOWS\system32
    • Application folder or subfolder
    • Other folder

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Copyright © 2002-2016, Intel Corporation. All rights reserved.

Introducing the new Packed APIs for GEMM

$
0
0

1      Introducing Packed APIs for GEMM

Matrix-matrix multiplication (GEMM) is a fundamental operation in many scientific, engineering, and machine learning applications. There is a continuing demand to optimize this operation, and Intel® Math Kernel Library (Intel® MKL) offers parallel high-performing GEMM implementations. To provide optimal performance, the Intel MKL implementation of GEMM typically transforms the original input matrices into an internal data format best suited for the targeted platform. This data transformation (also called packing) can be costly, especially for input matrices with one or more small dimensions.

Intel MKL 2017 introduces [S,D]GEMM packed application program interfaces (APIs) that allow users to explicitly transform the matrices into an internal packed format and pass the packed matrix (or matrices) to multiple GEMM calls. With this approach, the packing costs can be amortized over multiple GEMM calls if the input matrices (A or B) are reused between these calls.

2      Example

Three GEMM calls shown below use the same A matrix, while B/C matrices differ for each call:

float *A, *B1, *B2, *B3, *C1, *C2, *C3, alpha, beta;
MKL_INT m, n, k, lda, ldb, ldc;

// initialize the pointers and matrix dimensions (skipped for brevity)

sgemm("T", "N", &m, &n, &k, &alpha, A, &lda, B1, &ldb, &beta, C1, &ldc);
sgemm("T", "N", &m, &n, &k, &alpha, A, &lda, B2, &ldb, &beta, C2, &ldc);
sgemm("T", "N", &m, &n, &k, &alpha, A, &lda, B3, &ldb, &beta, C3, &ldc);

Here the A matrix is transformed into internal packed data format within each sgemm call. The relative cost of packing matrix A three times can be high if n is small (number of columns for B/C). This cost can be minimized by packing the A matrix once and using its packed equivalent for the three consecutive GEMM calls as shown below:

// allocate memory for packed data format
float *Ap;
Ap = sgemm_alloc("A", &m, &n, &k);

// transform A into packed format
sgemm_pack("A", "T", &m, &n, &k, &alpha, A, &lda, Ap);

// SGEMM computations are performed using the packed A matrix: Ap
sgemm_compute("P", "N", &m, &n, &k, Ap, &lda, B1, &ldb, &beta, C1, &ldc);
sgemm_compute("P", "N", &m, &n, &k, Ap, &lda, B2, &ldb, &beta, C2, &ldc);
sgemm_compute("P", "N", &m, &n, &k, Ap, &lda, B3, &ldb, &beta, C3, &ldc);

// release the memory for Ap
sgemm_free(Ap);

The code sample above uses four new functions introduced to support packed APIs for GEMM: sgemm_alloc, sgemm_pack, sgemm_compute, and sgemm_free. First, the memory required for packed format is allocated using sgemm_alloc, which accepts a character argument identifying the packed matrix (A in this example) and three integer arguments for the matrix dimensions. Then, sgemm_pack transforms the original A matrix into the packed format Ap and performs the alpha scaling. The original A matrix remains unchanged. The three sgemm calls are replaced with three sgemm_compute calls that work with packed matrices and assume that alpha=1.0. The first two character arguments to sgemm_compute indicate that the A matrix is in packed format (“P”), and the B matrix is in non-transposed column major format (“N”). Finally, the memory allocated for Ap is released by calling sgemm_free.

GEMM packed APIs eliminate the cost of packing the matrix A twice for the three matrix-matrix multiplication operations shown in this example. These packed APIs can be used to eliminate the data transformation costs for A and/or B input matrices if A and/or B are re-used between GEMM calls.
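
The same flow is also available through the CBLAS interface (as noted in the Summary). A hedged sketch follows; consult the Intel MKL Developer Reference for the exact signatures, and note that the wrapper function and its arguments are illustrative:

#include <mkl.h>

void packed_gemm_cblas(const float *A, const float *B1, float *C1,
                       MKL_INT m, MKL_INT n, MKL_INT k,
                       MKL_INT lda, MKL_INT ldb, MKL_INT ldc,
                       float alpha, float beta)
{
    float *Ap = cblas_sgemm_alloc(CblasAMatrix, m, n, k);

    // Pack A (and apply alpha) once ...
    cblas_sgemm_pack(CblasColMajor, CblasAMatrix, CblasTrans,
                     m, n, k, alpha, A, lda, Ap);

    // ... then reuse the packed buffer for each multiplication.
    cblas_sgemm_compute(CblasColMajor, CblasPacked, CblasNoTrans,
                        m, n, k, Ap, lda, B1, ldb, beta, C1, ldc);

    cblas_sgemm_free(Ap);
}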

3      Performance

The chart below shows the performance gains with the packed APIs on the Intel® Xeon Phi™ processor 7250. It is assumed that the packing cost can be completely amortized by a large number of SGEMM calls that use the same A matrix. The performance of a regular SGEMM call is also provided for comparison.

4      Implementation Notes

It is recommended to call gemm_pack and gemm_compute with the same number of threads to get the best performance. Note that if there are only a small number of GEMM calls that share the same A or B matrix, the packed APIs may provide little performance benefit.

The gemm_alloc routine allocates memory approximately as large as the original input matrix. This means that the memory requirement of the application may increase significantly for a large input matrix.

GEMM packed APIs are only implemented for SGEMM and DGEMM in Intel MKL 2017. They are functional for all Intel architectures, but they are only optimized for 64-bit Intel® AVX2 and above.

5      Summary

[S,D]GEMM packed APIs can be used to minimize the data packing costs for multiple GEMM calls that use the same input matrix. As shown in the performance chart, calling them can improve the performance significantly if there is sufficient matrix reuse across multiple GEMM calls. These packed APIs are available in Intel MKL 2017, and both FORTRAN 77 and CBLAS interfaces are supported. Please see the Intel MKL Developer Reference for additional documentation.


Intel® IPP ZLIB Coding Functions

$
0
0

1. Overview

ZLIB is a lossless data compression method and software library by Jean-loup Gailly and Mark Adler, initially released in 1995, that has become the de facto standard for lossless data compression. ZLIB is an inherent part of almost all operating systems based on Linux*, including Android*, as well as OS X* and versions for embedded and mobile platforms. Many applications, including software packages such as HTTP servers, use ZLIB as one (and sometimes the only) data compression method.

The Intel® Integrated Performance Primitives (Intel® IPP) library has provided functionality supporting and optimizing the ZLIB library since Intel® IPP version 5.2. Unlike other ZLIB implementations, the Intel® IPP functions for ZLIB optimize not only the data compression part, but decompression operations too.

This article describes how Intel® IPP supports ZLIB, the Intel® IPP for ZLIB distribution model, and recent changes in the Intel® IPP ZLIB functionality in version 2017, and it provides performance data obtained on different Intel® platforms.

2. ZLIB and Intel® IPP Implementation

The distribution model of Intel® IPP for ZLIB is as follows:

  • The Intel® IPP library package provides files for source code patching for all ZLIB major versions – from 1.2.5.3 to 1.2.8. These patches should be applied to the ZLIB source code files downloaded from the ZLIB repositories at zlib.net (latest ZLIB version), or zlib.net/fossils for previous versions of ZLIB;
  • After the patch file is applied, the source code contains a set of conditional compilation blocks guarded by the WITH_IPP definition. For example (from file deflate.c):
send_bits(s, (STATIC_TREES<<1)+last, 3);
#if !defined(WITH_IPP)
compress_block(s, (const ct_data *)static_ltree,(const ct_data *)static_dtree);
#else
{
  IppStatus status;
  status = ippsDeflateHuff_8u( (const Ipp8u*)s->l_buf, (const Ipp16u*)s->d_buf,
                     (Ipp32u)s->last_lit, (Ipp16u*)&s->bi_buf, (Ipp32u*)&s->bi_valid,
                     (IppDeflateHuffCode*)static_ltree, (IppDeflateHuffCode*)static_dtree,
                     (Ipp8u*)s->pending_buf, (Ipp32u*)&s->pending );
 Assert( ippStsNoErr == status, "ippsDeflateHuff_8u returned a bad status" );
}
send_code(s, END_BLOCK, static_ltree);
#endif

So, when a source code file is compiled without the WITH_IPP definition, the original ZLIB library is built. If the "-DWITH_IPP" compiler option is used, the Intel® IPP-enabled ZLIB library is produced. Of course, several other compiler/linker options are required to build ZLIB with Intel® IPP (see below).

Intel® IPP library has the following functions to support ZLIB functionality:
Common functions:

  • ippsAdler32_8u,
  • ippsCRC32_8u

For compression (deflate):

  • ippsDeflateLZ77Fast_8u,
  • ippsDeflateLZ77Fastest_8u,
  • ippsDeflateLZ77Slow_8u,
  • ippsDeflateHuff_8u,
  • ippsDeflateDictionarySet_8u,
  • ippsDeflateUpdateHash_8u

For decompression (inflate):

  • ippsInflateBuildHuffTable,
  • ippsInflate_8u.

Six source code files are patched in the ZLIB source code tree to call the optimized Intel® IPP functions:

  • adler32.c,
  • crc32.c,
  • deflate.c,
  • inflate.c,
  • inftrees.h,
  • trees.c.

In general, the most compute-intensive parts of the ZLIB code are substituted with Intel® IPP function calls; all common/service parts of ZLIB remain intact.

3. What’s New in Intel® IPP 2017 Implementation of ZLIB

Intel® IPP 2017 adds significant enhancements to the ZLIB optimization code, including faster CPU-specific optimizations, a new "fastest" compression level with the best compression performance, support for tuning the deflate parameters, and support for additional compression levels:

3.1 CPU-Specific Optimizations

Intel® IPP 2017 functions provide the additional optimization for new Intel® platforms. For particular ZLIB needs, Intel® IPP library 2017 contains the following optimizations:

  • Checksum computing using modern Intel® CPU instructions;
  • Hash table operation using modern Intel® CPU instructions;
  • Huffman tables generation functionality;
  • Huffman tables decomposition during inflating;
  • Additional optimization of pattern matching algorithms (new in Intel® IPP 2017)

3.2 New Fastest Compression Level

Intel® IPP 2017 for ZLIB introduces a brand new compression level with the best compression performance. This is achieved by simplifying the pattern matching, at the cost of a slightly decreased compression ratio.
The new compression level – called "fastest" – has the numeric value -2 to distinguish it from the ZLIB "default" compression level (Z_DEFAULT_COMPRESSION = -1).
The decrease in compression ratio and the gain in performance can be seen in the following table:

Data Compression Corpus | Ratio / Performance* (MB/s), level "fast" (1) | Ratio / Performance* (MB/s), level "fastest" (-2)
Large Calgary | 2.80 / 86 | 2.10 (-0.70) / 197 (+111)
Canterbury | 3.09 / 107 | 2.26 (-0.83) / 294 (+187)
Large (3 files) | 3.10 / 97 | 2.01 (-1.09) / 209 (+112)
Silesia | 2.80 / 89 | 2.16 (-0.64) / 194 (+105)

Note: "Compression ratio" in the table above is the geometric mean of the ratios of uncompressed file sizes to compressed file sizes; "performance" is the number of input data megabytes compressed per second, measured on an Intel® Xeon® processor E5-2680 v3, 2.5 GHz, single thread.

3.3 Deflate Parameters Tuning

To give additional freedom in tuning the data compression parameters, Intel® IPP 2017 for ZLIB activates the original deflateTune function:

        ZEXTERN int ZEXPORT deflateTune OF((z_streamp strm, int good_length, int max_lazy,
                                        int nice_length, int max_chain));

The purpose and usage of the function parameters are the same as in the original ZLIB deflate algorithm. The modified deflate function itself loads the pattern matching parameters from the configuration_table array in deflate.c, which holds pre-defined sets for each compression level.

3.4 Additional Compression Levels

The deflateTune function parameters give the freedom to modify the compression search algorithm to obtain the best "compression ratio"/"compression performance" trade-off for particular customer needs. Nevertheless, the process of finding an optimal parameter set is not straightforward, because the actual behavior of the compress functionality depends highly on the input data specifics.

The Intel® IPP team has run several experiments with different data and fixed some parameter sets as additional compression levels. The level values and input data characteristics are in the table below.

Additional compression levels | Input data
11-19 | General data (text documents, binary files) of large size (greater than 1 MB)
21-29 | Highly-compressible data (database tables, text documents with repeating phrases, large uncompressed pictures like BMPs, PPMs)

These sets are stored in the configuration_table array in the file deflate.c. The effect on compression ratio within levels 11 to 19 is the same as within the original levels 1 to 9; that is, a higher level provides better compression. You may use these sets, or discover your own.

4. Getting Started With Intel® IPP 2017 ZLIB

The process of preparing the Intel® IPP boosted ZLIB library is described in the readme.html file provided with the Intel® IPP "components" package. It explains how to download the ZLIB source code files from the ZLIB site, how to un-archive and patch the source code files, and how to build the Intel® IPP-enabled ZLIB for different needs (static or dynamic ZLIB libraries, statically or dynamically linked to Intel® IPP).

5. Usage Notes for Intel® IPP ZLIB Functions

5.1 Using the "Fastest" Compression Level

In order to obtain better compression performance while keeping ZLIB (deflate) compatibility, the new "fastest" compression method is implemented. It is a lightweight compression, which

  • Doesn’t look back in the dictionary to find a better match;
  • Doesn’t collect input stream statistics for better Huffman-based coding.

This method corresponds to compression level “-2” and can be used as follows:

       z_stream str_deflate;
       str_deflate.zalloc = NULL;
       str_deflate.zfree = NULL;
       deflateInit(&str_deflate, -2);

The output (compressed) stream, generated with “fastest” compression is fully compatible with “deflate” standard and can be decompressed using regular ZLIB.
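
For illustration, a minimal decompression sketch using the standard ZLIB inflate API (the function and buffer names are ours; the caller supplies the buffers):

#include <zlib.h>

// Decompress a buffer produced with the "fastest" level (-2) using the
// stock inflate path; returns 0 on success.
int decompress_buffer(unsigned char *src, unsigned int src_len,
                      unsigned char *dst, unsigned int dst_cap)
{
    z_stream str_inflate;
    str_inflate.zalloc = Z_NULL;
    str_inflate.zfree = Z_NULL;
    str_inflate.opaque = Z_NULL;
    str_inflate.next_in = src;
    str_inflate.avail_in = src_len;
    if (inflateInit(&str_inflate) != Z_OK) return -1;

    str_inflate.next_out = dst;
    str_inflate.avail_out = dst_cap;
    int ret = inflate(&str_inflate, Z_FINISH);   /* single-shot decompress */
    inflateEnd(&str_inflate);
    return (ret == Z_STREAM_END) ? 0 : -1;
}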

5.2 Tuning Compression Level

In the Intel® IPP 2017 product, the ZLIB-related functions use a table of substring matching parameters to control compression ratio and performance. This table, defined as configuration_table in the deflate.c file, contains sets of four values: max_chain, good_length, nice_length, and max_lazy. These values are described in the table below:

Value | Description
max_chain | Maximum number of searches in the dictionary for a better (higher matching length) substring match. Reasonable value range is 1-8192.
good_length | If a substring of this or greater length is matched in the dictionary, the maximum number of searches for this particular input string is reduced fourfold. Reasonable value range is 4-258.
nice_match | If a substring of this or greater length is matched in the dictionary, the search is stopped. Reasonable value range is 4-258.
max_lazy | If a substring of this or greater length is found in the dictionary: for the fast compression method (levels 1 to 4), the hash table is not updated; for the slow compression method (levels 5 to 9), the search algorithm doesn't check nearby input data for a better match.

Note: the final results of compression ratio and performance highly depend on the input data specifics.

The actual values of the parameters are shown in the table below:

Compression level | Deflate function | max_chain | good_length | nice_match | max_lazy
1 | Fast | 4 | 8 | 8 | 8
2 | Fast | 4 | 16 | 16 | 9
3 | Fast | 4 | 16 | 16 | 12
4 | Fast | 48 | 32 | 32 | 16
5 | Slow | 32 | 8 | 32 | 16
6 | Slow | 128 | 8 | 256 | 16
7 | Slow | 144 | 8 | 256 | 16
8 | Slow | 192 | 32 | 258 | 128
9 | Slow | 256 | 32 | 258 | 258

These values were chosen to give compression ratios similar to the original open-source ZLIB on standard data compression collections. You can try your own combinations of matching values using the ZLIB deflateTune function (its arguments are, in order, good_length, max_lazy, nice_length, and max_chain). For example, to change the max_chain value for level 6 from 128 to 64, and thus speed up compression with some compression ratio degradation, do the following:

    z_stream str_deflate;
    str_deflate.zalloc = NULL;
    str_deflate.zfree = NULL;
    deflateInit(&str_deflate, Z_DEFAULT_COMPRESSION);
    deflateTune(&str_deflate, 8, 16, 256, 64);
    …
    deflateEnd(&str_deflate);

Note that the string matching parameters remain changed for all subsequent compression operations (ZLIB deflate calls) with the str_deflate object until it is destroyed, or re-initialized with a deflateReset function call.

5.3 Using additional Compression Levels

Some input data sets for compression have specific characteristics; for example, the input data can be long, or the input data can be highly compressible.
For such data we introduced additional compression levels, which are in fact calls of the same "fast" or "slow" compression functions, but with different sets of string matching values. The new compression levels are the following:

  • From 11 to 19 – compression levels for big input data buffers (1 Mbyte and longer);
  • From 21 to 29 – compression levels for highly-compressible data (compression ratio of 30x and more).

For example, for levels 6 and 16 on the "Large" data compression corpus on an Intel® Xeon® processor E5-2680 v3, the geometric-mean results are:

Level | Ratio | Compression Performance (MB/s)
6 | 3.47 | 17.7
16 | 3.46 | 19.9

For levels 6 and 26 on some synthetic highly-compressible data on an Intel® Xeon® processor E5-2680 v3, the geometric-mean results are:

Level | Ratio | Compression Performance (MB/s)
6 | 218 | 768
26 | 218 | 782

Note: These levels are “experimental” and don’t guarantee improvements on all input data.

Intel® Software Guard Extensions Tutorial Series: Part 4, Enclave Design


In Part 4 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series we’ll be designing our enclave and its interface. We’ll take a look at the enclave boundary that was defined in Part 3 and identify the necessary bridge functions, examine the impact the bridge functions have on the object model, and create the project infrastructure necessary to integrate the enclave into our application. We’ll only be stubbing the enclave ECALLs at this point; full enclave integration will come in Part 5 of the series.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

There is source code provided with this installment of the series: the enclave stub and interface functions are provided for you to download.

Application Architecture

Before we jump into designing the enclave interface, we need to take a moment and consider the overall application architecture. As discussed in Part 1, enclaves are implemented as dynamically loaded libraries (DLLs under Windows* and shared libraries under Linux*) and they can only link against 100-percent native C code.

The Tutorial Password Manager, however, will have a GUI written in C#. It uses a mixed-mode assembly written in C++/CLI to get us from managed to unmanaged code, but while that assembly contains native code it is not a 100-percent native module and it cannot interface directly with an Intel SGX enclave. Attempts to incorporate the untrusted enclave bridge functions in C++/CLI assemblies will result in a fatal error:

	Command line error D8045: cannot compile C file 'Enclave_u.c' with the /clr option

That means we need to place the untrusted bridge functions in a separate DLL that is all native code. As a result, our application will need to have, at minimum, three DLLs: the C++/CLI core, the enclave bridge, and the enclave itself. This structure is shown in Figure 1.


Figure 1. Component makeup for a mixed-mode application with enclaves.

Further Refinements

Since the enclave bridge functions must reside in a separate DLL, we’ll go a step further and place all the functions that deal directly with the enclave in that same DLL. This compartmentalization of the application layers will not only make it easier to manage (and debug) the program, but also to ease integration by lessening the impact to the other modules. When a class or module has a specific task with a clearly defined boundary, changes to other modules are less likely to impact it.

In this case, the PasswordManagerCoreNative class should not be burdened with the additional task of instantiating enclaves. It just needs to know whether or not Intel SGX is supported on the platform so that it can execute the appropriate function.

As an example, the following code block shows the unlock() method:

int PasswordManagerCoreNative::vault_unlock(const LPWSTR wpassphrase)
{
	int rv;
	UINT16 size;

	char *mbpassphrase = tombs(wpassphrase, -1, &size);
	if (mbpassphrase == NULL) return NL_STATUS_ALLOC;

	rv = vault.unlock(mbpassphrase);

	SecureZeroMemory(mbpassphrase, size);
	delete[] mbpassphrase;

	return rv;
}

This is a pretty simple method that takes the user’s passphrase as a wchar_t, converts it to a variable-length encoding (UTF-8), and then calls the unlock() method in the vault object. Rather than clutter up this class, and this method, with enclave-handling functions and logic, it would be best to add enclave support to this method through a one-line addition:

int PasswordManagerCoreNative::vault_unlock(const LPWSTR wpassphrase)
{
	int rv;
	UINT16 size;

	char *mbpassphrase = tombs(wpassphrase, -1, &size);
	if (mbpassphrase == NULL) return NL_STATUS_ALLOC;

	// Call the enclave bridge function if we support Intel SGX
	if (supports_sgx()) rv = ew_unlock(mbpassphrase);
	else rv = vault.unlock(mbpassphrase);

	SecureZeroMemory(mbpassphrase, size);
	delete[] mbpassphrase;

	return rv;
}

Our goal will be to put as little enclave awareness into this class as is feasible. The only other additions the PasswordManagerCoreNative class needs is a flag for Intel SGX support and methods to both set and get it.

class PASSWORDMANAGERCORE_API PasswordManagerCoreNative
{
	int _supports_sgx;

	// Other class members omitted for clarity

protected:
	void set_sgx_support(void) { _supports_sgx = 1; }
	int supports_sgx(void) { return _supports_sgx; }

Designing the Enclave

Now that we have an overall application plan in place, it’s time to start designing the enclave and its interface. To do that, we return to the class diagram for the application core in Figure 2, which was first introduced in Part 3. The objects that will reside in the enclave are shaded in green while the untrusted components are shaded in blue.


Figure 2. Class diagram for the Tutorial Password Manager with Intel® Software Guard Extensions.

The enclave boundary only crosses one connection: the link between the PasswordManagerCoreNative object and the Vault object. That suggests that the majority of our ECALLs will simply be wrappers around the class methods in Vault. We’ll also need to add some additional ECALLs to manage the enclave infrastructure. One of the complications of enclave development is that the ECALLs, OCALLs, and bridge functions must be native C code, and we are making extensive use of C++ features. Once the enclave has been launched, we’ll also need functions that span the gap between C and C++ (objects, constructors, overloads, and others).

The wrapper and bridge functions will go in their own DLL, which we’ll name EnclaveBridge.dll. For clarity, we’ll prefix the wrapper functions with ew_ (for “enclave wrapper”), and the bridge functions that make the ECALLs with ve_ (for “vault enclave”).

Calls from PasswordManagerCoreNative to the corresponding method in Vault will follow the basic flow shown in Figure 3.


Figure 3. Execution flow for bridge functions and ECALLs.

The method in PasswordManagerCoreNative will call into the wrapper function in EnclaveBridge.dll. That wrapper will, in turn, invoke one or more ECALLs, which enter the enclave and invoke the corresponding class method in the Vault object. Once all ECALLs have completed, the wrapper function returns back to the calling method in PasswordManagerCoreNative and provides it with a return value.

Enclave Logistics

The first step in designing the enclave is working out a system for managing the enclave itself. The enclave must be launched and the resulting enclave ID must be provided to the ECALLs. Ideally, this should be transparent to the upper layers of the application.

The easiest solution for the Tutorial Password Manager is to use global variables in the EnclaveBridge DLL to hold the enclave information. This design decision comes with a restriction: only one thread can be active in the enclave at a time. This is a reasonable solution because the password manager application would not benefit from having multiple threads operating on the vault. Most of its actions are driven by the user interface and do not consume a significant amount of CPU time.

To solve the transparency problem, each wrapper function will first call a function to check to see if the enclave has been launched, and launch it if it hasn’t. This logic is fairly simple:

#define ENCLAVE_FILE _T("Enclave.signed.dll")

static sgx_enclave_id_t enclaveId = 0;
static sgx_launch_token_t launch_token = { 0 };
static int updated = 0;
static int launched = 0;
static sgx_status_t sgx_status = SGX_SUCCESS;

// Ensure the enclave has been created/launched.

static int get_enclave(sgx_enclave_id_t *eid)
{
	if (launched) return 1;
	else return create_enclave(eid);
}

static int create_enclave(sgx_enclave_id_t *eid)
{
	sgx_status = sgx_create_enclave(ENCLAVE_FILE, SGX_DEBUG_FLAG, &launch_token, &updated, &enclaveId, NULL);
	if (sgx_status == SGX_SUCCESS) {
		if ( eid != NULL ) *eid = enclaveId;
		launched = 1;
		return 1;
	}

	return 0;
}

Each wrapper function will start by calling get_enclave(), which checks to see if the enclave has been launched by examining a static variable. If it has, then it (optionally) populates the eid pointer with the enclave ID. This step is optional because the enclave ID is also stored as a global variable, enclaveID, which can of course just be used directly.

What happens if an enclave is lost due to a power event or a bug that causes it to crash? For that, we check the return value of the ECALL: it indicates the success or failure of the ECALL operation itself, not of the function being called in the enclave.

sgx_status = ve_initialize(enclaveId, &vault_rv);

The return value of the function being called in the enclave, if any, is transferred via the pointer which is provided as the second argument to the ECALL (these function prototypes are generated for you automatically by the Edger8r tool). You must always check the return value of the ECALL itself. Any result other than SGX_SUCCESS indicates that the program did not successfully enter the enclave and the requested function did not run. (Note that we’ve defined sgx_status as a global variable as well. This is another simplification stemming from our single-threaded design.)

We’ll add a function that examines the error returned by the ECALL and checks for a lost or crashed enclave:

static int lost_enclave()
{
	if (sgx_status == SGX_ERROR_ENCLAVE_LOST || sgx_status == SGX_ERROR_ENCLAVE_CRASHED) {
		launched = 0;
		return 1;
	}

	return 0;
}

These are recoverable errors. The upper layers don’t currently have logic to deal with these specific conditions, but we provide it in the EnclaveBridge DLL in order to support future enhancements.
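Putting these pieces together, a complete wrapper for a hypothetical ECALL ve_lock() might look like the sketch below. The relaunch-and-retry step is one possible use of lost_enclave(); the actual sample code leaves that to future enhancements.

	ENCLAVEBRIDGE_API void ew_lock()
	{
		if (!get_enclave(NULL)) return;

		sgx_status = ve_lock(enclaveId);
		if (lost_enclave()) {
			// The enclave was lost to a power event or a crash.
			// Relaunch it and retry the ECALL once.
			if (get_enclave(NULL)) sgx_status = ve_lock(enclaveId);
		}
	}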

Also notice that there is no function provided to destroy the enclave. As long as the user has the password manager application open, the enclave is in place even if they choose to lock their vault. This is not good enclave etiquette. Enclaves draw from a finite pool of resources, even when idle. We’ll address this problem in a future segment of the series when we talk about data sealing.

The Enclave Definition Language

Before moving on to the actual enclave design, we’ll take a few moments to discuss the Enclave Definition Language (EDL) syntax. An enclave’s bridge functions, both its ECALLs and OCALLs, are prototyped in its EDL file and its general structure is as follows:

enclave {
	// Include files

	// Import other edl files

	// Data structure declarations to be used as parameters of the function prototypes in edl

	trusted {
	// Include file if any. It will be inserted in the trusted header file (enclave_t.h)

	// Trusted function prototypes (ECALLs)

	};

	untrusted {
	// Include file if any. It will be inserted in the untrusted header file (enclave_u.h)

	// Untrusted function prototypes (OCALLs)

	};
};

ECALLs are prototyped in the trusted section, and OCALLs are prototyped in the untrusted section.

The EDL syntax is C-like and function prototypes very closely resemble C function prototypes, but it’s not identical. In particular, bridge function parameters and return values are limited to some fundamental data types and the EDL includes some additional keywords and syntax that defines some enclave behavior. The Intel® Software Guard Extensions (Intel® SGX) SDK User’s Guide explains the EDL syntax in great detail and includes a tutorial for creating a sample enclave. Rather than repeat all of that here, we’ll just discuss those elements of the language that are specific to our application.

When parameters are passed to enclave functions, they are marshaled into the protected memory space of the enclave. For parameters passed as values, no special action is required as the values are placed on the protected stack in the enclave just as they would be for any other function call. The situation is quite different for pointers, however.

For parameters passed as pointers, the data referenced by the pointer must be marshaled into and out of the enclave. The edge routines that perform this data marshalling need to know two things:

  1. Which direction should the data be copied: into the bridge function, out of the bridge function, or both directions?
  2. What is the size of the data buffer referenced by the pointer?

Pointer Direction

When providing a pointer parameter to a function, you must specify the direction with a bracketed keyword: [in], [out], or [in, out]. Their meanings are given in Table 1.

| Direction | ECALL | OCALL |
| --- | --- | --- |
| in | The buffer is copied from the application into the enclave. Changes will only affect the buffer inside the enclave. | The buffer is copied from the enclave to the application. Changes will only affect the buffer outside the enclave. |
| out | A buffer will be allocated inside the enclave and initialized with zeros. It will be copied to the original buffer when the ECALL exits. | A buffer will be allocated outside the enclave and initialized with zeros. This untrusted buffer will be copied to the original buffer in the enclave when the OCALL exits. |
| in, out | Data is copied back and forth. | Same as ECALLs. |

Table 1. Pointer direction parameters and their meanings in ECALLs and OCALLs.

Note from the table that the direction is relative to the bridge function being called. For an ECALL, [in] means “copy the buffer to the enclave,” but for an OCALL it’s “copy the buffer to the untrusted function.”

(There is also the option called user_check that can be used in place of these, but it’s not relevant to our discussion. See the SDK documentation for information on its use and purpose.)

Buffer Size

The edge routines calculate the total buffer size, in bytes, as:

bytes = element_size * element_count

By default, the edge routines assume element_count = 1, and calculate element_size from the element referenced by the pointer parameter, e.g., for an integer pointer it assumes element_size is:

sizeof(int)

For a single element of a fixed data type, such as an int or a float, no additional information needs to be provided in the EDL prototype for the function. For a void pointer, you must specify an element size or you’ll get an error at compile time. For arrays, char and wchar_t strings, and other types where the length of the data buffer is more than one element you must specify the number of elements in the buffer or only one element will be copied.

Add either the count or size parameter (or both) to the bracketed keywords for the pointer as appropriate. They can be set to a constant value or one of the parameters to the function. For most cases, count and size are functionally the same, but it’s good practice to use them in their correct contexts. Strictly speaking, you would only specify size when passing a void pointer. Everything else would use count.

If you are passing a C string or wstring (a NULL-terminated char or wchar_t array), then you can use the string or wstring parameter in place of count or size. In this case, the edge routines will determine the size of the buffer by getting the length of the string directly.

function([in, size=12] void *param);
function([in, count=len] char *buffer, uint32_t len);
function([in, string] char *cstr);

Note that you can only use string or wstring if the direction is set to [in] or [in, out]. When the direction is set only to [out], the string has not yet been created so the edge routine can’t know the size of the buffer. Specifying [out, string] will generate an error at compile time.
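For an [out] string, the workaround is simply an explicit count:

	// Will not compile: the edge routine cannot measure an [out] string
	// function([out, string] char *str);

	// Works: the caller supplies the buffer length explicitly
	function([out, count=len] char *str, uint32_t len);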

Wrapper and Bridge Functions

We are now ready to define our wrapper and bridge functions. As we pointed out above, the majority of our ECALLs will be wrappers around the class methods in Vault. The class definition for the public member functions is shown below:

class PASSWORDMANAGERCORE_API Vault
{
	// Non-public methods and members omitted for brevity

public:
	Vault();
	~Vault();

	int initialize();
	int initialize(const char *header, UINT16 size);
	int load_vault(const char *edata);

	int get_header(unsigned char *header, UINT16 *size);
	int get_vault(unsigned char *edata, UINT32 *size);

	UINT32 get_db_size();

	void lock();
	int unlock(const char *password);

	int set_master_password(const char *password);
	int change_master_password(const char *oldpass, const char *newpass);

	int accounts_get_count(UINT32 *count);
	int accounts_get_info(UINT32 idx, char *mbname, UINT16 *mbname_len, char *mblogin, UINT16 *mblogin_len, char *mburl, UINT16 *mburl_len);

	int accounts_get_password(UINT32 idx, char **mbpass, UINT16 *mbpass_len);

	int accounts_set_info(UINT32 idx, const char *mbname, UINT16 mbname_len, const char *mblogin, UINT16 mblogin_len, const char *mburl, UINT16 mburl_len);
	int accounts_set_password(UINT32 idx, const char *mbpass, UINT16 mbpass_len);

	int accounts_generate_password(UINT16 length, UINT16 pwflags, char *cpass);

	int is_valid() { return _VST_IS_VALID(state); }
	int is_locked() { return ((state&_VST_LOCKED) == _VST_LOCKED) ? 1 : 0; }
};

There are several problem functions in this class. Some of them are immediately obvious, such as the constructor, destructor, and the overloads for initialize(). These are C++ features that we must invoke using C functions. Some of the problems, though, are not immediately obvious because they stem from the function’s inherent design. (Some of these problem methods were poorly designed on purpose so that we could cover specific issues in this tutorial, but some were just poorly designed, period!) We’ll tackle each problem, one by one, presenting both the prototypes for the wrapper functions and the EDL prototypes for the proxy/bridge routines.

The Constructor and Destructor

In the non-Intel SGX code path, the Vault class is a member of PasswordManagerCoreNative. We can’t do this for the Intel SGX code path; however, the enclave can include C++ code so long as the bridge functions themselves are pure C functions.

Since we have already limited the enclave to a single thread, we can make the Vault class a static, global object in the enclave. This greatly simplifies our code and eliminates the need for creating bridge functions and logic to instantiate it.
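In the enclave source this amounts to a file-scope object plus thin C wrappers around its methods; a minimal sketch (the header name is illustrative, and the ECALL name matches the definitions in the next subsection):

	#include "Vault.h"	// the enclave's own copy of the class

	// One global Vault instance, constructed when the enclave is loaded.
	// Safe because the enclave is restricted to a single thread.
	static Vault vault;

	int ve_initialize()
	{
		return vault.initialize();
	}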

The Overload on initialize()

There are two prototypes for the initialize() method:

  1. The method with no arguments initializes the Vault object for a new password vault with no contents. This is a password vault that the user is creating for the first time.
  2. The method with two arguments initializes the Vault object from the header of the vault file. This represents an existing password vault that the user is opening (and, later on, attempting to unlock).

This will be broken up into two wrapper functions:

ENCLAVEBRIDGE_API int ew_initialize();
ENCLAVEBRIDGE_API int ew_initialize_from_header(const char *header, uint16_t hsize);

And the corresponding ECALLs will be defined as:

public int ve_initialize ();
public int ve_initialize_from_header ([in, count=len] unsigned char *header, uint16_t len);

get_header()

This method has a fundamental design issue. Here’s the prototype:

int get_header(unsigned char *header, uint16_t *size);

This function accomplishes one of two tasks:

  1. It gets the header block for the vault file and places it in the buffer pointed to by header. The caller must allocate enough memory to store this data.
  2. If you pass a NULL pointer in the header parameter, the uint16_t pointed to by size is set to the size of the header block, so that the caller knows how much memory to allocate.

This is a fairly common compaction technique in some programming circles, but it presents a problem for enclaves: when you pass a pointer to an ECALL or an OCALL, the edge functions copy the data referenced by the pointer into or out of the enclave (or both). Those edge functions need to know the size of the data buffer so they know how many bytes to copy. The first usage involves a valid pointer with a variable size which is not a problem, but the second usage has a NULL pointer and a size of zero.

We could probably come up with an EDL prototype for the ECALL that could make this work, but clarity should generally trump brevity. It’s better to split this into two ECALLs:

public int ve_get_header_size ([out] uint16_t *sz);
public int ve_get_header ([out, count=len] unsigned char *header, uint16_t len);

The enclave wrapper function will take care of the necessary logic so that we don’t have to make changes to other classes:

ENCLAVEBRIDGE_API int ew_get_header(unsigned char *header, uint16_t *size)
{
	int vault_rv;

	if (!get_enclave(NULL)) return NL_STATUS_SGXERROR;

	if ( header == NULL ) sgx_status = ve_get_header_size(enclaveId, &vault_rv, size);
	else sgx_status = ve_get_header(enclaveId, &vault_rv, header, *size);

	RETURN_SGXERROR_OR(vault_rv);
}

accounts_get_info()

This method operates similarly to get_header(): pass a NULL pointer and it returns the size of the object in the corresponding parameter. However, it is uglier and sloppier because of the multiple parameter arguments. It is better off being broken up into two wrapper functions:

ENCLAVEBRIDGE_API int ew_accounts_get_info_sizes(uint32_t idx, uint16_t *mbname_sz, uint16_t *mblogin_sz, uint16_t *mburl_sz);
ENCLAVEBRIDGE_API int ew_accounts_get_info(uint32_t idx, char *mbname, uint16_t mbname_sz, char *mblogin, uint16_t mblogin_sz, char *mburl, uint16_t mburl_sz);

And two corresponding ECALLs:

public int ve_accounts_get_info_sizes (uint32_t idx, [out] uint16_t *mbname_sz, [out] uint16_t *mblogin_sz, [out] uint16_t *mburl_sz);
public int ve_accounts_get_info (uint32_t idx,
	[out, count=mbname_sz] char *mbname, uint16_t mbname_sz,
	[out, count=mblogin_sz] char *mblogin, uint16_t mblogin_sz,
	[out, count=mburl_sz] char *mburl, uint16_t mburl_sz
);

accounts_get_password()

This is the worst offender of the lot. Here’s the prototype:

int accounts_get_password(UINT32 idx, char **mbpass, UINT16 *mbpass_len);

The first thing you’ll notice is that it passes a pointer to a pointer in mbpass. This method is allocating memory.

In general, this is not a good design. No other method in the Vault class allocates memory so it is internally inconsistent, and the API violates convention by not providing a method to free this memory on the caller’s behalf. It also poses a unique problem for enclaves: an enclave cannot allocate memory in untrusted space.

This could be handled in the wrapper function. It could allocate the memory and then make the ECALL and it would all be transparent to the caller, but we have to modify the method in the Vault class, regardless, so we should just fix this the correct way and make the corresponding changes to PasswordManagerCoreNative. The caller should be given two functions: one to get the password length and one to fetch the password, just as with the previous two examples. PasswordManagerCoreNative should be responsible for allocating the memory, not any of these functions (the non-Intel SGX code path should be changed, too).

ENCLAVEBRIDGE_API int ew_accounts_get_password_size(uint32_t idx, uint16_t *len);
ENCLAVEBRIDGE_API int ew_accounts_get_password(uint32_t idx, char *mbpass, uint16_t len);

The EDL definition should look familiar by now:

public int ve_accounts_get_password_size (uint32_t idx, [out] uint16_t *mbpass_sz);
public int ve_accounts_get_password (uint32_t idx, [out, count=mbpass_sz] char *mbpass, uint16_t mbpass_sz);

load_vault()

The problem with load_vault() is subtle. The prototype is fairly simple, and at first glance it may look completely innocuous:

int load_vault(const char *edata);

What this method does is load the encrypted, serialized password database into the Vault object. Because the Vault object has already read the header, it knows how large the incoming buffer will be.

The issue here is that the enclave’s edge functions don’t have this information. A length has to be explicitly given to the ECALL so that the edge function knows how many bytes to copy from the incoming buffer into the enclave’s internal buffer, but the size is stored inside the enclave. It’s not available to the edge function.

The wrapper function’s prototype can mirror the class method’s prototype, as follows:

ENCLAVEBRIDGE_API int ew_load_vault(const unsigned char *edata);

The ECALL, however, needs to pass the header size as a parameter so that it can be used to define the size of the incoming data buffer in the EDL file:

public int ve_load_vault ([in, count=len] unsigned char *edata, uint32_t len);

To keep this transparent to the caller, the wrapper function will be given extra logic. It will be responsible for fetching the vault size from the enclave and then passing it through as a parameter to this ECALL.

ENCLAVEBRIDGE_API int ew_load_vault(const unsigned char *edata)
{
	int vault_rv;
	uint32_t dbsize;

	if (!get_enclave(NULL)) return NL_STATUS_SGXERROR;

	// We need to get the size of the password database before entering the enclave
	// to send the encrypted blob.

	sgx_status = ve_get_db_size(enclaveId, &dbsize);
	if (sgx_status == SGX_SUCCESS) {
		// Now we can send the encrypted vault data across.

		sgx_status = ve_load_vault(enclaveId, &vault_rv, (unsigned char *) edata, dbsize);
	}

	RETURN_SGXERROR_OR(vault_rv);
}

A Few Words on Unicode

In Part 3, we mentioned that the PasswordManagerCoreNative class is also tasked with converting between wchar_t and char strings. Given that enclaves support the wchar_t data type, why do this at all?

This is a design decision intended to minimize our footprint. In Windows, the wchar_t data type is the native encoding for Win32 APIs and it stores UTF-16 encoded characters. In UTF-16, each character is 16 bits in order to support non-ASCII characters, particularly for languages that aren’t based on the Latin alphabet or have a large number of characters. The problem with UTF-16 is that a character is always 16-bits long, even when encoding plain ASCII text.

Rather than store twice as much data both on disk and inside the enclave for the common case where the user’s account information is in plain ASCII and incur the performance penalty of having to copy and encrypt those extra bytes, the Tutorial Password Manager converts all of the strings coming from .NET to the UTF-8 encoding. UTF-8 is a variable-length encoding, where each character is represented by one to four 8-bit bytes. It is backwards-compatible with ASCII and it results in a much more compact encoding than UTF-16 for plain ASCII text. There are cases where UTF-8 will result in longer strings than UTF-16, but for our tutorial password manager we’ll accept that tradeoff.
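The conversion itself is a single Win32 call. The tombs() helper seen in the earlier listings might look roughly like this (a sketch, not the article's verbatim implementation):

	#include <Windows.h>
	#include <new>

	// Convert a wide (UTF-16) string to a newly allocated UTF-8 string.
	// The caller frees the result with delete[]. A wlen of -1 means the
	// input is NULL-terminated.
	static char *tombs(const LPWSTR wstr, int wlen, UINT16 *size)
	{
		// First call computes the required buffer size in bytes.
		int len = WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, NULL, 0, NULL, NULL);
		if (len <= 0) return NULL;

		char *mbstr = new (std::nothrow) char[len];
		if (mbstr == NULL) return NULL;

		if (WideCharToMultiByte(CP_UTF8, 0, wstr, wlen, mbstr, len, NULL, NULL) == 0) {
			delete[] mbstr;
			return NULL;
		}

		*size = (UINT16) len;
		return mbstr;
	}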

A commercial application would choose the best encoding for the user’s native language, and then record that encoding in the vault (so that it would know which encoding was used to create it in case the vault is opened on a system using a different native language).

Sample Code

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager bridge DLL and the enclave DLL. The enclave functions are just stubs at this point, and they will be filled out in Part 5.

Coming Up Next

In Part 5 of the tutorial we’ll complete the enclave by porting the Crypto, DRNG, and Vault classes to the enclave, and connecting them to the ECALLs. Stay tuned!

Happy Together: Ground-Breaking Media Performance with Intel® Processors + Software - Oct. 27 Free Webinar


REGISTER NOW    

Now you can get the sweetest, fastest, highest-density, and highest-quality results for media workloads and video streaming, with the latest Intel hardware and media software working together. Take advantage of these platforms and learn how to access hardware-accelerated codecs on Intel® Xeon® E3-1500 v5 and 6th generation Intel® Core™ processors (codenamed Skylake) in a free webinar on Oct. 27 at 9 a.m. (Pacific).

  • Optimize media solutions and apps for HEVC, AVC and MPEG-2 using Intel® Media Server Studio or Intel® Media SDK
  • Achieve up to real-time 4K@60fps HEVC, or up to 18 AVC HD@30fps transcoding sessions on one platform**
  • Access the big performance boosts possible with Intel graphics processors (GPUs)
  • Get the skinny on shortcuts to fast-track results

Sign Up Today  Oct. 27 Free Webinar: Happy Together: Ground-Breaking Media Performance with Intel® Processors + Software

Technical specifications apply. See performance benchmarks and the Media Server Studio site for more details. 

 

Webinar Speaker

Jeff McAllister - Media Software Technical Consulting Engineer

 

Advanced Bitrate Control Methods in Intel® Media SDK


Introduction

In the world of media, there is a great demand to increase encoder quality but this comes with tradeoffs between quality and bandwidth consumption. This article addresses some of those concerns by discussing advanced bitrate control methods, which provide the ability to increase quality (relative to legacy rate controls) while maintaining the bitrate constant using Intel® Media SDK/ Intel® Media Server Studio tools.

The Intel Media SDK encoder offers many bitrate control methods, which can be divided into legacy and advanced/special-purpose algorithms. This article is the second part of a two-part series on bitrate control methods in Intel® Media SDK. The legacy rate control algorithms are detailed in the first part, Bitrate Control Methods (BRC) in Intel® Media SDK; the advanced rate control methods (summarized in the table below) are explained in this article.

| Rate Control | HRD/VBV Compliant | OS Supported | Usage |
| --- | --- | --- | --- |
| LA | No | Windows/Linux | Storage transcodes |
| LA_HRD | Yes | Windows/Linux | Storage transcodes; streaming solutions (where low latency is not a requirement) |
| ICQ | No | Windows | Storage transcodes (better quality with smaller file size) |
| LA_ICQ | No | Windows | Storage transcodes |

The following tools (along with the downloadable links) are what we used to explain the concepts and generate performance data for this article: 

Look Ahead (LA) Rate Control

As the name explains, this bitrate control method looks at successive frames, or the frames to be encoded next, and stores them in a look-ahead buffer. The number of frames or the length of the look ahead buffer can be specified by the LookAheadDepth parameter. This rate control is recommended for transcoding/encoding in a storage solution.

Generally, many parameters can be used to modify the quality/performance of the encoded stream. In this particular rate control, the encoding performance is controlled by changing the size of the look ahead buffer. The LookAheadDepth parameter value can be set between 10 and 100 and specifies the size of the look ahead buffer, that is, the number of frames that the SDK encoder analyzes before encoding. As LookAheadDepth increases, so does the number of frames that the encoder looks into; this increases the quality of the encoded stream, but the performance (encoding frames per second) decreases. In our experiments, this performance tradeoff was negligible for small input streams such as sintel_1080p.

Look Ahead rate control is enabled by default in sample_encode and sample_multi_transcode, both part of the Intel Media SDK code samples. The example below shows how to use this rate control method with the sample_encode application:

sample_encode.exe h264 -i sintel_1080p.yuv -o LA_out.264 -w 1920 -h 1080 -b 10000 –f 30 -lad 100 -la

As the value of LookAheadDepth increases, encoding quality improves, because the number of frames stored in the look ahead buffer has also increased, and the encoder will have more visibility to upcoming frames.

It should be noted that LA is not HRD (Hypothetical Reference Decoder) compliant. The following picture, obtained from Intel® Video Pro Analyzer, shows the HRD buffer fullness view with “Buffer” mode enabled, where the sub-mode “HRD” is greyed out. This means no HRD parameters were passed in the stream headers, indicating that LA rate control is not HRD compliant. The left axis of the plot shows frame sizes and the right axis shows the slice QP (Quantization Parameter) values.

Figure 1: Snapshot of Intel Video Pro Analyzer analyzing an H264 stream (Sintel 1080p) encoded using the LA rate control method.

 

Sliding Window condition:

Sliding window algorithm is a part of the Look Ahead rate control method. This algorithm is applicable for both LA and LA_HRD rate control methods by defining WinBRCMaxAvgKbps and WinBRCSize through the mfxExtCodingOption3 structure.

The sliding window condition is introduced to strictly constrain the maximum bitrate of the encoder through two parameters: WinBRCSize and WinBRCMaxAvgKbps. This helps limit the achieved bitrate, which makes it a good fit for limited-bandwidth scenarios such as live streaming.

  • The WinBRCSize parameter specifies the sliding window size in frames. A setting of zero means the sliding window condition is disabled.
  • WinBRCMaxAvgKbps specifies the maximum bitrate averaged over the sliding window specified by WinBRCSize.

In this technique, the average bitrate in a sliding window of WinBRCSize frames must not exceed WinBRCMaxAvgKbps. The condition becomes weaker as the sliding window size increases and stronger as it decreases. Whenever the condition fails, the frame is automatically re-encoded with a higher quantization parameter, and encoder performance drops as failures accumulate. To reduce the number of failures and avoid re-encoding, the encoder analyzes the frames within the look ahead buffer. A peak is detected when a large frame in the look ahead buffer would cause a condition failure. Whenever a peak is predicted, the quantization parameter value is increased, thus reducing the frame size.

Sliding window can be implemented by adding the following code to the pipeline_encode.cpp program in the sample_encode application.

m_CodingOption3.WinBRCMaxAvgKbps = 1.5*TargetKbps;
m_CodingOption3.WinBRCSize = 90; //3*framerate
m_EncExtParams.push_back((mfxExtBuffer *)&m_CodingOption3);

The above values were chosen when encoding sintel_1080p.yuv of 1253 frames with H.264 codec, TargetKbps = 10000, framerate = 30fps. Sliding window parameter values (WinBRCMaxAvgKbps and WinBRCSize) are subject to change when using different input options.

If WinBRCMaxAvgKbps is close to TargetKbps and WinBRCSize almost equals 1, the sliding window will degenerate into the limitation of the maximum frame size (TargetKbps/framerate).

The sliding window condition can be evaluated by checking that, across any WinBRCSize consecutive frames, the total encoded size does not exceed the budget set by WinBRCMaxAvgKbps. The following equation expresses this condition.
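Stated formally (a reconstruction from the description above, with frame sizes expressed in kilobits so the units match WinBRCMaxAvgKbps):

    \sum_{k=i}^{i+\text{WinBRCSize}-1} \text{FrameSize}_k \;\le\; \text{WinBRCMaxAvgKbps} \times \frac{\text{WinBRCSize}}{\text{FrameRate}} \quad \text{for every window start } i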

The frame-size-limiting condition can be checked after the asynchronous encode completes and the encoded data is written back to the output file in pipeline_encode.cpp.

Look Ahead with HRD Compliance (LA_HRD) Rate Control

As Look Ahead bitrate control is not HRD compliant, there is a dedicated mode to achieve HRD compliance with the LookAhead algorithm, known as LA_HRD mode (MFX_RATECONTROL_LA_HRD). With HRD compliance, the Coded Picture Buffer should neither overflow nor underflow. This rate control is recommended in storage transcoding solutions and streaming scenarios, where low latency is not a major requirement.

To use this rate control in sample_encode, code changes are required as illustrated below.

Statements to be added in the sample_encode.cpp file within the ParseInputString() function:

else if (0 == msdk_strcmp(strInput[i], MSDK_STRING("-hrd")))
pParams->nRateControlMethod = MFX_RATECONTROL_LA_HRD;

The LookAheadDepth value can be given on the command line when executing the sample_encode binary. The example below shows how to use this rate control method with the sample_encode application:

sample_encode.exe h264 -i sintel_1080p.yuv -o LA_out.264 -w 1920 -h 1080 -b 10000 –f 30 -lad 100 –hrd

In the following graph, the LookAheadDepth (lad) value is 100.

Figure 2: A snapshot of Intel® Video Pro Analyzer (VPA) verifying that LA_HRD rate control is HRD compliant. The buffer fullness view is activated by selecting “Buffer” mode, with “HRD” chosen as the sub-mode.

The above figure shows the HRD buffer fullness view with “Buffer” mode enabled in Intel VPA, with the sub-mode “HRD” selected. The horizontal red lines show the upper and lower limits of the buffer, and the green line shows the instantaneous buffer fullness. The buffer fullness never crosses the upper or lower limit, meaning neither overflow nor underflow occurred with this rate control.

Extended Look Ahead (LA_EXT) Rate Control

For 1:N transcoding scenarios (1 decode and N encode sessions), there is an optimized look ahead algorithm known as the Extended Look Ahead rate control algorithm (MFX_RATECONTROL_LA_EXT), available only in Intel® Media Server Studio (not part of the Intel® Media SDK). This is recommended for broadcasting solutions.

An application must load the plugin ‘mfxplugin64_h264la_hw.dll’ to support MFX_RATECONTROL_LA_EXT. This plugin can be found in the following location on the local system where Intel® Media Server Studio is installed:

  • “\Program Installed\Software Development Kit\bin\x64\588f1185d47b42968dea377bb5d0dcb4”.

The path of this plugin must be given explicitly because it is not part of the standard installation directory. This capability can be used in either of two ways:

  1. Preferred method: register the plugin in the registry and point to all necessary attributes (API version, plugin type, path, etc.), so the dispatcher, which is part of the software stack, can find it through the registry and connect it to a decoding/encoding session.
  2. Place all binaries (Media SDK library, plugin, and application) in one directory and execute from that directory.
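Programmatically, loading the plugin by its UID might look like the sketch below. MFXVideoUSER_Load is the standard dispatcher call; the UID bytes are taken from the directory name above and are assumed to be the plugin’s UID.

	#include <mfxplugin.h>

	// UID taken from the installation directory name shown above (assumed).
	static const mfxPluginUID H264LA_UID =
		{ { 0x58, 0x8f, 0x11, 0x85, 0xd4, 0x7b, 0x42, 0x96,
		    0x8d, 0xea, 0x37, 0x7b, 0xb5, 0xd0, 0xdc, 0xb4 } };

	mfxStatus load_la_ext_plugin(mfxSession session)
	{
		// Version 1 requests the first revision of the plugin.
		return MFXVideoUSER_Load(session, &H264LA_UID, 1);
	}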

The LookAheadDepth parameter is specified only once, and the same value applies to all N transcoded streams. LA_EXT rate control can be exercised using sample_multi_transcode; an example command line is below:

sample_multi_transcode.exe -par file_1.par

The contents of the par file are:

-lad 40 -i::h264 input.264 -join -la_ext -hw_d3d11 -async 1 -n 300 -o::sink
-h 1088 -w 1920 -o::h264 output_1.0.h264 -b 3000 -join -async 1 -hw_d3d11 -i::source -l 1 -u 1 -n 300
-h 1088 -w 1920 -o::h264 output_2.h264 -b 5000 -join -async 1 -hw_d3d11 -i::source -l 1 -u 1 -n 300
-h 1088 -w 1920 -o::h264 output_3.h264 -b 7000 -join -async 1 -hw_d3d11 -i::source -l 1 -u 1 -n 300
-h 1088 -w 1920 -o::h264 output_4.h264 -b 10000 -join -async 1 -hw_d3d11 -i::source -l 1 -u 1 -n 300

Intelligent Constant Quality (ICQ) Rate Control

The ICQ bitrate control algorithm is designed to improve the subjective video quality of an encoded stream: it may or may not improve video quality objectively, depending on the content. ICQQuality is the control parameter that defines the quality factor for this method; it can be set between 1 and 51, where 1 corresponds to the best quality. The achieved bitrate and encoder quality (PSNR) can be adjusted by increasing or decreasing ICQQuality. This rate control is recommended for storage solutions where high quality is required while maintaining a smaller file size.

To use this rate control in sample_encode, code changes are required as explained below.

Statements to be added in sample_encode.cpp within the ParseInputString() function:

else if (0 == msdk_strcmp(strInput[i], MSDK_STRING("-icq")))
pParams->nRateControlMethod = MFX_RATECONTROL_ICQ;

ICQQuality is available in the mfxInfoMFX structure. The desired value can be entered for this variable in the InitMfxEncParams() function, e.g.:

m_mfxEncParams.mfx.ICQQuality = 12;

The example below describes how to use this rate control method using the sample_encode application.

sample_encode.exe h264 -i sintel_1080p.yuv -o ICQ_out.264 -w 1920 -h 1080 -b 10000 -icq
Figure 3: Using Intel Media SDK samples and Video Quality Caliper, compare VBR and ICQ (ICQQuality varied between 13 and 18) with H264 encoding for 1080p, 30fps sintel.yuv of 1253 frames.

At about the same bitrate, ICQ shows improved Peak Signal to Noise Ratio (PSNR) in the above plot. The RD-graph data for the plot was captured using the Video Quality Caliper, which compares two different streams encoded with ICQ and VBR.

Observation from above performance data:

  • At the same achieved bitrate, ICQ shows much improved quality (PSNR) compared to VBR, while maintaining the same encoding FPS.
  • The encoding bitrate and quality of the stream decreases as the ICQQuality parameter value increases.

The snapshot below shows a subjective comparison between encoded frames using VBR (on the left) and ICQ (on the right). Highlighted sections demonstrate missing details in VBR and improvements in ICQ.

VBR and ICQ subjective comparison
Figure 4: Using Video Quality Caliper, compare encoded frames subjectively for VBR vs ICQ

 

Look Ahead & Intelligent Constant Quality (LA_ICQ) Rate Control

This method is the combination of ICQ with Look Ahead. This rate control is also recommended for storage solutions. ICQQuality and LookAheadDepth are the two control parameters: the quality factor is specified by mfxInfoMFX::ICQQuality, and the look ahead depth is controlled by the mfxExtCodingOption2::LookAheadDepth parameter.

To use this rate control in sample_encode, code changes are required as explained below.

Statements to be added in sample_encode.cpp within the ParseInputString() function:

else if (0 == msdk_strcmp(strInput[i], MSDK_STRING("-laicq")))
pParams->nRateControlMethod = MFX_RATECONTROL_LA_ICQ;

ICQQuality is available in the mfxInfoMFX structure. The desired value can be entered for this variable in the InitMfxEncParams() function:

m_mfxEncParams.mfx.ICQQuality = 12;

LookAheadDepth can be specified on the command line as -lad:

sample_encode.exe h264 -i sintel_1080p.yuv -o LAICQ_out.264 -w 1920 -h 1080 -b 10000 –laicq -lad 100
Figure 5: Using Intel Media SDK samples and Video Quality Caliper, compare VBR and LA_ICQ (LookAheadDepth 100, ICQQuality varied between 20 and 26) with H264 encoding for 1080p, 30fps sintel.yuv of 1253 frames.

At a similar bitrate, better PSNR is observed for LA_ICQ compared to VBR, as shown in the above plot. Keeping the LookAheadDepth value at 100, the ICQQuality parameter value was varied between 1 and 51. The RD-graph data for this plot was captured using the Video Quality Caliper, which compares two different streams encoded with LA_ICQ and VBR.

Conclusion

There are several advanced bitrate control methods available to experiment with to see whether higher-quality encoded streams can be achieved while holding bandwidth requirements constant. Each rate control has its own advantages and fits specific industry use cases. To implement the bitrate control methods, refer also to the Intel® Media SDK Reference Manual, which comes with an installation of the Intel® Media SDK or Intel® Media Server Studio, and the Intel® Media Developer’s Guide from the documentation website. Visit Intel’s media support forum for further questions.

Resources

Driver Support Matrix for Intel® Media SDK and OpenCL™


 

Developers can access Intel's processor graphics GPU capabilities through the Intel® Media SDK and Intel® SDK for OpenCL™ Applications. This article provides more information on how the software, driver, and hardware layers map together.
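A quick way to confirm which implementation and API level the installed driver exposes is the standard Media SDK session API; a minimal sketch:

	#include <mfxvideo.h>

	int main()
	{
		mfxVersion ver = { { 0, 1 } };	// request API 1.0 or later
		mfxSession session;

		// Ask the dispatcher for any hardware implementation.
		if (MFXInit(MFX_IMPL_HARDWARE_ANY, &ver, &session) != MFX_ERR_NONE)
			return 1;	// no suitable driver/runtime found

		mfxIMPL impl;
		MFXQueryIMPL(session, &impl);	// which hardware implementation
		MFXQueryVersion(session, &ver);	// API level exposed by the driver

		MFXClose(session);
		return 0;
	}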

 

Delivery Models


There are two different packaging/delivery models:

  1. For Windows* Client: all components needed to run applications written with these SDKs are distributed with the Intel graphics driver. These components are updated on a separate cadence from Media SDK/OpenCL installs. Drivers are released separately, and moving to the latest available driver is usually encouraged. Use the Intel® Driver Update Utility to keep your system up to date with the latest graphics drivers, or update manually from downloadcenter.intel.com. To verify the driver version installed on the machine, use the system analyzer tool.
     
  2. For Linux* and Windows Server*: Intel® Media Server Studio is an integrated software tools suite that includes both SDKs, plus a specific version of the driver validated with each release.

Driver Branches

Driver development uses branches covering specific hardware generations, as described in the table below. The general pattern is that each branch covers only the two latest architectures (N and N-1). This means there are two driver branches for each architecture except the newest one. Intel recommends using the most recent branch. If issues are found it is easier to get fixes for newer branches. The most recent branch has the most resources and gets the most frequent updates. Older branches/architectures get successively fewer resources and updates.

Driver Support Matrix

| Processor Architecture | Intel® Integrated Graphics | Windows | Linux |
| --- | --- | --- | --- |
| 3rd Generation Core, 4th Generation Core (Ivybridge/Haswell). LEGACY ONLY: downloads available but not updated. | Ivybridge: Gen 7 graphics. Haswell: Gen 7.5 graphics. | 15.33. Operating systems: Client: Windows 7, 8, 8.1, 10; Server: Windows Server 2012 R2. | 16.3 (Media Server Studio 2015 R1). Gold operating systems: Ubuntu 12.04, SLES 11.3. |
| 4th Generation Core, 5th Generation Core (Haswell/Broadwell). LEGACY. | Haswell: Gen 7.5 graphics. Broadwell: Gen 8 graphics. | 15.36. Operating systems: Client: Windows 7, 8, 8.1, 10; Server: Windows Server 2012 R2. | 16.4 (Media Server Studio 2015/2016). Gold operating system: CentOS 7.1. Generic kernel: 3.14.5. |
| 5th Generation Core, 6th Generation Core (Broadwell/Skylake). CURRENT RELEASE. | Broadwell: Gen 8 graphics. Skylake: Gen 9 graphics. | 15.40 (Broadwell/Skylake, Media Server Studio 2017); 15.45 (Skylake and forward, client). Operating systems: Client: Windows 7, 8, 8.1, 10; Server: Windows Server 2012 R2. | 16.5 (Media Server Studio 2017). Gold operating system: CentOS 7.2. Generic kernel: 4.4.0. |

 

Windows client note: Many OEMs have specialized drivers with additional validation. If you see a warning during install, please check with your OEM for supported drivers for your machine.

 

Hardware details

 

Ivybridge (IVB) is the codename for the 3rd generation Intel processor, based on 22nm manufacturing technology and the Gen 7 graphics architecture.

Ivybridge (Gen 7, 3rd Generation Core):

  • GT2: Intel® HD Graphics 2500
  • GT2: Intel® HD Graphics 4000



Haswell (HSW) is the codename for the 4th generation Intel processor, based on 22nm manufacturing technology and the Gen 7.5 graphics architecture. Available in multiple graphics versions: GT2 (20 execution units), GT3 (40 execution units), and GT3e (40 execution units plus eDRAM for a faster secondary cache).

Haswell (Gen 7.5, 4th Generation Core):

  • GT2: Intel® HD Graphics 4200
  • GT2: Intel® HD Graphics 4400
  • GT2: Intel® HD Graphics 4600
  • GT3: Intel® Iris™ Graphics 5000
  • GT3: Intel® Iris™ Graphics 5100
  • GT3e: Intel® Iris™ Pro Graphics 5200

 

Broadwell (BDW) is the codename for the 5th generation Intel processor, based on a 14nm die shrink of the Haswell architecture and the Gen 8 graphics architecture. Available in multiple graphics versions: GT2 (24 execution units), GT3 (48 execution units), and GT3e (48 execution units plus eDRAM for a faster secondary cache).

Broadwell (Gen 8, 5th Generation Core):

  • GT2: Intel® HD Graphics 5500
  • GT2: Intel® HD Graphics 5600
  • GT2: Intel® HD Graphics 5700
  • GT3: Intel® Iris™ Graphics 6100
  • GT3e: Intel® Iris™ Pro Graphics 6200

Skylake (SKL) is the codename for the 6th generation Intel processor, based on 14nm manufacturing technology and the Gen 9 graphics architecture. Available in multiple graphics versions: GT1 (12 execution units), GT2 (24 execution units), GT3 (48 execution units), GT3e (48 execution units plus eDRAM), and GT4e (72 execution units plus eDRAM for a faster secondary cache).

Skylake (Gen 9, 6th Generation Core):

  • GT1: Intel® HD Graphics 510 (12 EUs)
  • GT2: Intel® HD Graphics 520 (24 EUs, 1050 MHz)
  • GT2: Intel® HD Graphics 530 (24 EUs, 1150 MHz)
  • GT3e: Intel® Iris™ Graphics 540 (48 EUs, 1050 MHz, 64 MB eDRAM)
  • GT3e: Intel® Iris™ Graphics 550 (48 EUs, 1100 MHz, 64 MB eDRAM)
  • GT4e: Intel® Iris™ Pro Graphics 580 (72 EUs, 1050 MHz, 128 MB eDRAM)
  • GT4e: Intel® Iris™ Pro Graphics P580 (72 EUs, 1100 MHz, 128 MB eDRAM)

For more details please check

 

 

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

 

Intel® Software Guard Extensions Tutorial Series: Part 5, Enclave Development


In Part 5 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series, we’ll finish developing the enclave for the Tutorial Password Manager application. In Part 4 of the series, we created a DLL to serve as our interface layer between the enclave bridge functions and the C++/CLI program core, and defined our enclave interface. With those components in place, we can now focus our attention on the enclave itself.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

There is source code provided with this installment of the series: the completed application with its enclave. This version is hardcoded to run the Intel SGX code path.

The Enclave Components

To identify which components need to be implemented within the enclave, we’ll refer to the class diagram for the application core in Figure 1, which was first introduced in Part 3. As before, the objects that will reside in the enclave are shaded in green while the untrusted components are shaded in blue.


Figure 1. Class diagram for the Tutorial Password Manager with Intel® Software Guard Extensions.

From this we can identify four classes that need to be ported:

  • Vault
  • AccountRecord
  • Crypto
  • DRNG

Before we get started, however, we do need to make a design decision. Our application must function on systems both with and without Intel SGX support, and that means we can’t simply convert our existing classes so that they function within the enclave. We must create two versions of each: one intended for use in enclaves, and one for use in untrusted memory. The question is, how should this dual-support be implemented?

Option 1: Conditional Compilation

The first option is to implement both the enclave and untrusted functionality in the same source module and use preprocessor definitions and #ifdef statements to compile the appropriate code based on the context. The advantage of this approach is that we only need one source file for each class, and thus do not have to maintain changes in two places. The disadvantages are that the code can be more difficult to read, particularly if the changes between the two versions are numerous or significant, and the project structure will be more complex. Two of our Visual Studio* projects, Enclave and PasswordManagerCore, will share source files, and each will need to set a preprocessor symbol to ensure that the correct source code is compiled.
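As a concrete illustration of Option 1, consider a method that must scrub memory only in the untrusted build (the BUILD_ENCLAVE symbol is our invention for this sketch; the SecureZeroMemory point is discussed below):

	void Vault::clear()
	{
	#ifndef BUILD_ENCLAVE
		// Untrusted build only: scrub secrets before freeing the buffer.
		SecureZeroMemory(db_data, db_size);
	#endif
		// In the enclave build, hardware encryption makes the scrub
		// unnecessary, so this compiles down to a plain delete.
		delete[] db_data;
		db_data = NULL;
	}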

Option 2: Separate Classes

The second option is to duplicate each source file that has to go into the enclave. The advantages of this approach are that the enclave has its own copy of the source files which we can modify directly, allowing for a simpler project structure and easier code view. But, these come at a cost: if we need to make changes to the classes, those changes must be made in two places, even if those changes are common to both the enclave and untrusted versions.

Option 3: Inheritance

The third option is to use the C++ feature of class inheritance. The functions common to both versions of the class would be implemented in the base class, and the derived classes would implement the branch-specific methods. The big advantage to this approach is that it is a very natural and elegant solution to the problem, using a feature of the language that is designed to do exactly what we need. The disadvantages are the added complexity required in both the project structure and the code itself.

There is no hard and fast rule here, and the decision does not have to be a global one. A good rule of thumb is that Option 1 is best for modules where the changes are small or easily compartmentalized, and Options 2 and 3 are best when the changes are significant or result in source code that is difficult to read and maintain. However, it really comes down to style and preference, and either approach is fine.

For now, we’ll choose Option 2 because it allows for easy side-by-side comparisons of the enclave and untrusted source files. In a future installment of the tutorial series we may switch to Option 3 in order to tighten up the code.

The Enclave Classes

Each class has its own set of issues and challenges when it comes to adapting it to the enclave, but there is one universal truth that will apply to all of them: we no longer have to zero-fill our memory before freeing it. As you recall from Part 3, this was a recommended action when handling secure data in untrusted memory. Because our enclave memory is encrypted by the CPU, using an encryption key that is not available to any hardware layer, the contents of freed memory will contain what appears to be random data to other applications. This means we can remove all calls to SecureZeroMemory that are inside the enclave.

The Vault Class

The Vault class is our interface to the password vault operations. All of our bridge functions act through one or more methods in Vault. Its declaration from Vault.h is shown below.

class PASSWORDMANAGERCORE_API Vault
{
	Crypto crypto;
	char m_pw_salt[8];
	char db_key_nonce[12];
	char db_key_tag[16];
	char db_key_enc[16];
	char db_key_obs[16];
	char db_key_xor[16];
	UINT16 db_version;
	UINT32 db_size; // Use get_db_size() to fetch this value so it gets updated as needed
	char db_data_nonce[12];
	char db_data_tag[16];
	char *db_data;
	UINT32 state;
	// Cache the number of defined accounts so that the GUI doesn't have to fetch
	// "empty" account info unnecessarily.
	UINT32 naccounts;

	AccountRecord accounts[MAX_ACCOUNTS];
	void clear();
	void clear_account_info();
	void update_db_size();

	void get_db_key(char key[16]);
	void set_db_key(const char key[16]);

public:
	Vault();
	~Vault();

	int initialize();
	int initialize(const unsigned char *header, UINT16 size);
	int load_vault(const unsigned char *edata);

	int get_header(unsigned char *header, UINT16 *size);
	int get_vault(unsigned char *edata, UINT32 *size);

	UINT32 get_db_size();

	void lock();
	int unlock(const char *password);

	int set_master_password(const char *password);
	int change_master_password(const char *oldpass, const char *newpass);

	int accounts_get_count(UINT32 *count);
	int accounts_get_info_sizes(UINT32 idx, UINT16 *mbname_sz, UINT16 *mblogin_sz, UINT16 *mburl_sz);
	int accounts_get_info(UINT32 idx, char *mbname, UINT16 mbname_sz, char *mblogin, UINT16 mblogin_sz,
		char *mburl, UINT16 mburl_sz);

	int accounts_get_password_size(UINT32 idx, UINT16 *mbpass_sz);
	int accounts_get_password(UINT32 idx, char *mbpass, UINT16 mbpass_sz);

	int accounts_set_info(UINT32 idx, const char *mbname, UINT16 mbname_len, const char *mblogin, UINT16 mblogin_len,
		const char *mburl, UINT16 mburl_len);
	int accounts_set_password(UINT32 idx, const char *mbpass, UINT16 mbpass_len);

	int accounts_generate_password(UINT16 length, UINT16 pwflags, char *cpass);

	int is_valid() { return _VST_IS_VALID(state); }
	int is_locked() { return ((state&_VST_LOCKED) == _VST_LOCKED) ? 1 : 0; }
};

The declaration for the enclave version of this class, which we’ll call E_Vault for clarity, will be identical except for one crucial change: database key handling.

In the untrusted code path, the Vault object must store the database key, decrypted, in memory. Every time we make a change to our password vault we have to encrypt the updated vault data and write it to disk, and that means the key must be at our disposal. We have four options:

  1. Prompt the user for their master password on every change so that the database key can be derived on demand.
  2. Cache the user’s master password so that the database key can be derived on demand without user intervention.
  3. Encrypt, encode, and/or obscure the database key in memory.
  4. Store the key in the clear.

None of these are good solutions and they highlight the need for technologies like Intel SGX. The first is arguably the most secure, but no user would want to run an application that behaved in this manner. The second could be achieved using the SecureString class in .NET*, but it is still vulnerable to inspection via a debugger and there is a performance cost associated with the key derivation function that a user might find unacceptable. The third option is effectively as insecure as the second, only it comes without a performance penalty. The fourth option is the worst of the lot.

Our Tutorial Password Manager uses the third option: the database key is XOR’d with a 128-bit value that is randomly generated when a vault file is opened, and it is stored in memory only in this XOR’d form. This is effectively a one-time pad encryption scheme. It is open to inspection for anyone running a debugger, but it does limit the amount of time in which the database key is present in memory in the clear.

void Vault::set_db_key(const char db_key[16])
{
	UINT i, j;
	for (i = 0; i < 4; ++i)
		for (j = 0; j < 4; ++j) db_key_obs[4 * i + j] = db_key[4 * i + j] ^ db_key_xor[4 * i + j];
}

void Vault::get_db_key(char db_key[16])
{
	UINT i, j;
	for (i = 0; i < 4; ++i)
		for (j = 0; j < 4; ++j) db_key[4 * i + j] = db_key_obs[4 * i + j] ^ db_key_xor[4 * i + j];
}

This is obviously security through obscurity, and since we are publishing the source code, it’s not even particularly obscure. We could choose a better algorithm or go to greater lengths to hide both the database key and the pad’s secret key (including how they are stored in memory); but in the end, the method we choose would still be vulnerable to inspection via a debugger, and the algorithm would still be published for anyone to see.

Inside the enclave, however, this problem goes away. The memory is protected by hardware-backed encryption, so even when the database key is decrypted it is not open to inspection by anyone, even a process running with elevated privileges. As a result, we no longer need these class members or methods:

char db_key_obs[16];
char db_key_xor[16];

	void get_db_key(char key[16]);
	void set_db_key(const char key[16]);

We can replace them with just one class member: a char array to hold the database key.

char db_key[16];
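
Code inside the enclave that previously had to go through get_db_key() and set_db_key() can now read and write this member directly. As a hedged illustration (db_serialized and db_ct stand in for local buffers in the real code; only encrypt_database's signature, shown later in the Crypto class, is taken from the source), a vault-encryption call site might reduce to:

	// Enclave pages are encrypted by the hardware, so using the plain
	// database key directly in protected memory is safe.
	crypto_status_t rv= crypto.encrypt_database((BYTE *) db_key, (PBYTE) db_serialized,
		db_size, (PBYTE) db_ct, (BYTE *) db_data_nonce, (BYTE *) db_data_tag);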

The AccountRecord Class

The account data is stored in a fixed-size array of AccountRecord objects as a member of the Vault object. The declaration for AccountRecord is also found in Vault.h, and it is shown below:

class PASSWORDMANAGERCORE_API AccountRecord
{
	char nonce[12];
	char tag[16];
	// Store these in their multibyte form. There's no sense in translating
	// them back to wchar_t since they have to be passed in and out as
	// char * anyway.
	char *name;
	char *login;
	char *url;
	char *epass;
	UINT16 epass_len; // Can't rely on NULL termination! It's an encrypted string.

	int set_field(char **field, const char *value, UINT16 len);
	void zero_free_field(char *field, UINT16 len);

public:
	AccountRecord();
	~AccountRecord();

	void set_nonce(const char *in) { memcpy(nonce, in, 12); }
	void set_tag(const char *in) { memcpy(tag, in, 16); }

	int set_enc_pass(const char *in, UINT16 len);
	int set_name(const char *in, UINT16 len) { return set_field(&name, in, len); }
	int set_login(const char *in, UINT16 len) { return set_field(&login, in, len); }
	int set_url(const char *in, UINT16 len) { return set_field(&url, in, len); }

	const char *get_epass() { return (epass == NULL)? "" : (const char *)epass; }
	const char *get_name() { return (name == NULL) ? "" : (const char *)name; }
	const char *get_login() { return (login == NULL) ? "" : (const char *)login; }
	const char *get_url() { return (url == NULL) ? "" : (const char *)url; }
	const char *get_nonce() { return (const char *)nonce; }
	const char *get_tag() { return (const char *)tag; }

	UINT16 get_name_len() { return (name == NULL) ? 0 : (UINT16)strlen(name); }
	UINT16 get_login_len() { return (login == NULL) ? 0 : (UINT16)strlen(login); }
	UINT16 get_url_len() { return (url == NULL) ? 0 : (UINT16)strlen(url); }
	UINT16 get_epass_len() { return (epass == NULL) ? 0 : epass_len; }

	void clear();
};

We actually don’t need to do anything to this class for it to work inside the enclave. Other than removing the now-unnecessary secure memory-zeroing calls, this class is fine as is. However, we are going to change it anyway in order to illustrate a point: within the enclave, we gain some flexibility that we did not have before.

Returning to Part 3, another of our guidelines for securing data in untrusted memory space was avoiding container classes that manage their own memory, specifically the Standard Template Library’s std::string class. Inside the enclave this problem goes away, too. For the same reason that we don’t need to zero-fill our memory before freeing it, we don’t have to worry about how the Standard Template Library (STL) containers manage their memory. The enclave memory is encrypted, so even if fragments of our secure data remain there as a result of container operations, they can’t be inspected by other processes.

There’s also a good reason to use the std::string class inside the enclave: reliability. The code behind the STL containers has been through significant peer review over the years, so it is arguably safer to use it than to implement our own high-level string functions when given the choice. For simple code like what’s in the AccountRecord class, it’s probably not a significant issue, but in more complex programs this can be a huge benefit. However, this does come at the cost of a larger DLL due to the added STL code.

The new class declaration, which we’ll call E_AccountRecord, is shown below:

#define TRY_ASSIGN(x) try{x.assign(in,len);} catch(...){return 0;} return 1

class E_AccountRecord
{
	char nonce[12];
	char tag[16];
	// Store these in their multibyte form. There's no sense in translating
	// them back to wchar_t since they have to be passed in and out as
	// char * anyway.
	string name, login, url, epass;

public:
	E_AccountRecord();
	~E_AccountRecord();

	void set_nonce(const char *in) { memcpy(nonce, in, 12); }
	void set_tag(const char *in) { memcpy(tag, in, 16); }

	int set_enc_pass(const char *in, uint16_t len) { TRY_ASSIGN(epass); }
	int set_name(const char *in, uint16_t len) { TRY_ASSIGN(name); }
	int set_login(const char *in, uint16_t len) { TRY_ASSIGN(login); }
	int set_url(const char *in, uint16_t len) { TRY_ASSIGN(url); }

	const char *get_epass() { return epass.c_str(); }
	const char *get_name() { return name.c_str(); }
	const char *get_login() { return login.c_str(); }
	const char *get_url() { return url.c_str(); }

	const char *get_nonce() { return (const char *)nonce; }
	const char *get_tag() { return (const char *)tag; }

	uint16_t get_name_len() { return (uint16_t) name.length(); }
	uint16_t get_login_len() { return (uint16_t) login.length(); }
	uint16_t get_url_len() { return (uint16_t) url.length(); }
	uint16_t get_epass_len() { return (uint16_t) epass.length(); }

	void clear();
};

The tag and nonce members are still stored as char arrays. Our password encryption is done with AES in GCM mode, using a 128-bit key, a 96-bit nonce, and a 128-bit authentication tag. Since the size of the nonce and the tag are fixed there is no reason to store them as anything other than simple char arrays.

Note that this std::string-based approach has allowed us to almost completely define the class in the header file.

The Crypto Class

The Crypto class provides our cryptographic functions. The class declaration is shown below.

class PASSWORDMANAGERCORE_API Crypto
{
	DRNG drng;

	crypto_status_t aes_init (BCRYPT_ALG_HANDLE *halgo, LPCWSTR algo_id, PBYTE chaining_mode, DWORD chaining_mode_len, BCRYPT_KEY_HANDLE *hkey, PBYTE key, ULONG key_len);
	void aes_close (BCRYPT_ALG_HANDLE *halgo, BCRYPT_KEY_HANDLE *hkey);

	crypto_status_t aes_128_gcm_encrypt(PBYTE key, PBYTE nonce, ULONG nonce_len, PBYTE pt, DWORD pt_len, PBYTE ct, DWORD ct_sz, PBYTE tag, DWORD tag_len);
	crypto_status_t aes_128_gcm_decrypt(PBYTE key, PBYTE nonce, ULONG nonce_len, PBYTE ct, DWORD ct_len, PBYTE pt, DWORD pt_sz, PBYTE tag, DWORD tag_len);
	crypto_status_t sha256_multi (PBYTE *messages, ULONG *lengths, BYTE hash[32]);

public:
	Crypto(void);
	~Crypto(void);

	crypto_status_t generate_database_key (BYTE key_out[16], GenerateDatabaseKeyCallback callback);
	crypto_status_t generate_salt (BYTE salt[8]);
	crypto_status_t generate_salt_ex (PBYTE salt, ULONG salt_len);
	crypto_status_t generate_nonce_gcm (BYTE nonce[12]);

	crypto_status_t unlock_vault(PBYTE passphrase, ULONG passphrase_len, BYTE salt[8], BYTE db_key_ct[16], BYTE db_key_iv[12], BYTE db_key_tag[16], BYTE db_key_pt[16]);

	crypto_status_t derive_master_key (PBYTE passphrase, ULONG passphrase_len, BYTE salt[8], BYTE mkey[16]);
	crypto_status_t derive_master_key_ex (PBYTE passphrase, ULONG passphrase_len, PBYTE salt, ULONG salt_len, ULONG iterations, BYTE mkey[16]);

	crypto_status_t validate_passphrase(PBYTE passphrase, ULONG passphrase_len, BYTE salt[8], BYTE db_key[16], BYTE db_iv[12], BYTE db_tag[16]);
	crypto_status_t validate_passphrase_ex(PBYTE passphrase, ULONG passphrase_len, PBYTE salt, ULONG salt_len, ULONG iterations, BYTE db_key[16], BYTE db_iv[12], BYTE db_tag[16]);

	crypto_status_t encrypt_database_key (BYTE master_key[16], BYTE db_key_pt[16], BYTE db_key_ct[16], BYTE iv[12], BYTE tag[16], DWORD flags= 0);
	crypto_status_t decrypt_database_key (BYTE master_key[16], BYTE db_key_ct[16], BYTE iv[12], BYTE tag[16], BYTE db_key_pt[16]);

	crypto_status_t encrypt_account_password (BYTE db_key[16], PBYTE password_pt, ULONG password_len, PBYTE password_ct, BYTE iv[12], BYTE tag[16], DWORD flags= 0);
	crypto_status_t decrypt_account_password (BYTE db_key[16], PBYTE password_ct, ULONG password_len, BYTE iv[12], BYTE tag[16], PBYTE password);

	crypto_status_t encrypt_database (BYTE db_key[16], PBYTE db_serialized, ULONG db_size, PBYTE db_ct, BYTE iv[12], BYTE tag[16], DWORD flags= 0);
	crypto_status_t decrypt_database (BYTE db_key[16], PBYTE db_ct, ULONG db_size, BYTE iv[12], BYTE tag[16], PBYTE db_serialized);

	crypto_status_t generate_password(PBYTE buffer, USHORT buffer_len, USHORT flags);
};

The public methods in this class are modeled to perform various high-level vault operations: unlock_vault, derive_master_key, validate_passphrase, encrypt_database, and so on. Each of these methods invokes one or more cryptographic algorithms in order to complete its task. For example, the unlock_vault method takes the passphrase supplied by the user, runs it through the SHA-256-based key derivation function, and uses the resulting key to decrypt the database key using AES-128 in GCM mode.
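
To make the layering concrete, here is a hedged sketch of how unlock_vault might chain those operations together. The method names and signatures come from the class declaration above, but the body and the specific crypto_status_t values (such as CRYPTO_OK) are assumptions, not the tutorial’s actual implementation:

crypto_status_t Crypto::unlock_vault(PBYTE passphrase, ULONG passphrase_len, BYTE salt[8],
	BYTE db_key_ct[16], BYTE db_key_iv[12], BYTE db_key_tag[16], BYTE db_key_pt[16])
{
	BYTE mkey[16];
	crypto_status_t rv;

	// Derive the master key from the user's passphrase and the stored salt.
	rv= derive_master_key(passphrase, passphrase_len, salt, mkey);
	if (rv != CRYPTO_OK) return rv;	// CRYPTO_OK is an assumed status value

	// Use the master key to decrypt and authenticate the database key.
	return decrypt_database_key(mkey, db_key_ct, db_key_iv, db_key_tag, db_key_pt);
}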

These high-level methods do not, however, directly invoke the cryptographic primitives. Instead, they call into a middle layer which implements each cryptographic algorithm as a self-contained function.


Figure 2. Cryptographic library dependencies.

The private methods that make up our middle layer are built on the cryptographic primitives and support functions provided by the underlying cryptographic library, as illustrated in Figure 2. The non-Intel SGX implementation relies on Microsoft’s Cryptography API: Next Generation (CNG) for these, but we can’t use this same library inside the enclave because an enclave cannot have dependencies on external DLLs. To build the Intel SGX version of this class, we need to replace those underlying functions with the ones in the trusted crypto library that is distributed with the Intel SGX SDK. (As you might recall from Part 2, we were careful to choose cryptographic functions that were common to both CNG and the Intel SGX trusted crypto library for this very reason.)

To create our enclave-capable Crypto class, which we’ll call E_Crypto, what we need to do is modify these private methods:

	crypto_status_t aes_128_gcm_encrypt(PBYTE key, PBYTE nonce, ULONG nonce_len, PBYTE pt, DWORD pt_len, PBYTE ct, DWORD ct_sz, PBYTE tag, DWORD tag_len);
	crypto_status_t aes_128_gcm_decrypt(PBYTE key, PBYTE nonce, ULONG nonce_len, PBYTE ct, DWORD ct_len, PBYTE pt, DWORD pt_sz, PBYTE tag, DWORD tag_len);
	crypto_status_t sha256_multi (PBYTE *messages, ULONG *lengths, BYTE hash[32]);

A description of each, and the primitives and support functions from CNG upon which they are built, is given in Table 1.

aes_128_gcm_encrypt
  Algorithm: AES encryption in GCM mode with a 128-bit key, a 128-bit authentication tag, and no additional authenticated data (AAD)
  CNG primitives and support functions: BCryptOpenAlgorithmProvider, BCryptSetProperty, BCryptGenerateSymmetricKey, BCryptEncrypt, BCryptCloseAlgorithmProvider, BCryptDestroyKey

aes_128_gcm_decrypt
  Algorithm: AES decryption in GCM mode with a 128-bit key, a 128-bit authentication tag, and no AAD
  CNG primitives and support functions: BCryptOpenAlgorithmProvider, BCryptSetProperty, BCryptGenerateSymmetricKey, BCryptDecrypt, BCryptCloseAlgorithmProvider, BCryptDestroyKey

sha256_multi
  Algorithm: SHA-256 hash (incremental)
  CNG primitives and support functions: BCryptOpenAlgorithmProvider, BCryptGetProperty, BCryptCreateHash, BCryptHashData, BCryptFinishHash, BCryptDestroyHash, BCryptCloseAlgorithmProvider

Table 1. Mapping Crypto class methods to Cryptography API: Next Generation functions

CNG provides very fine-grained control over its encryption algorithms, as well as several optimizations for performance. Our Crypto class is actually fairly inefficient: each time one of these algorithms is called, it initializes the underlying primitives from scratch and then completely closes them down. This is not a significant issue for a password manager, which is UI-driven and only encrypts a small amount of data at a time. A high-performance server application such as a web or database server would need a more sophisticated approach.

The API for the trusted cryptography library distributed with the Intel SGX SDK more closely resembles our middle layer than CNG. There is less granular control over the underlying primitives, but it does make developing our E_Crypto class much simpler. Table 2 shows the new mapping between our middle layer and the underlying provider.

aes_128_gcm_encrypt
  Algorithm: AES encryption in GCM mode with a 128-bit key, a 128-bit authentication tag, and no additional authenticated data (AAD)
  Intel SGX Trusted Cryptography Library functions: sgx_rijndael128GCM_encrypt

aes_128_gcm_decrypt
  Algorithm: AES decryption in GCM mode with a 128-bit key, a 128-bit authentication tag, and no AAD
  Intel SGX Trusted Cryptography Library functions: sgx_rijndael128GCM_decrypt

sha256_multi
  Algorithm: SHA-256 hash (incremental)
  Intel SGX Trusted Cryptography Library functions: sgx_sha256_init, sgx_sha256_update, sgx_sha256_get_hash, sgx_sha256_close

Table 2. Mapping Crypto class methods to Intel® SGX Trusted Cryptography Library functions
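
As a hedged sketch of what the E_Crypto middle layer might look like, here is aes_128_gcm_encrypt rewritten over the trusted library. The sgx_rijndael128GCM_encrypt call and its argument order come from the Intel SGX SDK; the parameter validation and the crypto_status_t values are illustrative assumptions:

crypto_status_t E_Crypto::aes_128_gcm_encrypt(PBYTE key, PBYTE nonce, ULONG nonce_len,
	PBYTE pt, DWORD pt_len, PBYTE ct, DWORD ct_sz, PBYTE tag, DWORD tag_len)
{
	sgx_status_t status;

	// Sanity checks: the ciphertext buffer must hold the plaintext, and
	// GCM here uses a 128-bit (16-byte) authentication tag.
	if (ct_sz < pt_len || tag_len != 16) return CRYPTO_ERR_INVALID;	// assumed status code

	status= sgx_rijndael128GCM_encrypt((const sgx_aes_gcm_128bit_key_t *) key,
		pt, pt_len, ct, nonce, nonce_len,
		NULL, 0,	// no additional authenticated data (AAD)
		(sgx_aes_gcm_128bit_tag_t *) tag);

	return (status == SGX_SUCCESS) ? CRYPTO_OK : CRYPTO_ERR_ENCRYPT;	// assumed mapping
}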

The DRNG Class

The DRNG class is the interface to the on-chip digital random number generator, courtesy of Intel® Secure Key. To stay consistent with our previous actions we’ll name the enclave version of this class E_DRNG.

We’ll be making two changes in this class to prepare it for the enclave, but both of these changes are internal to the class methods. The class declaration will stay the same.

The CPUID Instruction

One of our application requirements is that the CPU support Intel Secure Key. Even though Intel SGX is a newer feature than Intel Secure Key, there is no guarantee that every CPU supporting Intel SGX will also support Intel Secure Key. While it’s hard to conceive of such a situation today, best practice is not to assume a coupling between features where none exists: if a set of features have independent detection mechanisms, you must assume the features are independent of one another and check for them separately. No matter how tempting it may be to assume that a CPU with Intel SGX support will also support Intel Secure Key, we must not do so.

Further complicating the situation is the fact that Intel Secure Key actually consists of two independent features, both of which must also be checked separately. Our application must determine support for both the RDRAND and RDSEED instructions. For more information on Intel Secure Key, see the Intel Digital Random Number Generator (DRNG) Software Implementation Guide.

The constructor in the DRNG class is responsible for the RDRAND and RDSEED feature detection checks. It makes the necessary calls to the CPUID instruction using the compiler intrinsics __cpuid and __cpuidex, and sets a static, global variable with the results.

static int _drng_support= DRNG_SUPPORT_UNKNOWN;

DRNG::DRNG(void)
{
	int info[4];

	if (_drng_support != DRNG_SUPPORT_UNKNOWN) return;

	_drng_support= DRNG_SUPPORT_NONE;

	// Check our feature support

	__cpuid(info, 0);

	if ( memcmp(&(info[1]), "Genu", 4) ||
		memcmp(&(info[3]), "ineI", 4) ||
		memcmp(&(info[2]), "ntel", 4) ) return;

	__cpuidex(info, 1, 0);

	if ( ((UINT) info[2]) & (1<<30) ) _drng_support|= DRNG_SUPPORT_RDRAND;

#ifdef COMPILER_HAS_RDSEED_SUPPORT
	__cpuidex(info, 7, 0);

	if ( ((UINT) info[1]) & (1<<18) ) _drng_support|= DRNG_SUPPORT_RDSEED;
#endif
}

The problem for the E_DRNG class is that CPUID is not a legal instruction inside of an enclave. To call CPUID, one must use an OCALL to exit the enclave and then invoke CPUID in untrusted code. Fortunately, the Intel SGX SDK designers have created two convenient functions that greatly simplify this task: sgx_cpuid and sgx_cpuidex. These functions automatically perform the OCALL for us, and the OCALL itself is automatically generated. The only requirement is that the EDL file import from sgx_tstdc.edl:

enclave {

	/* Needed for the call to sgx_cpuidex */
	from "sgx_tstdc.edl" import *;

    trusted {
        /* define ECALLs here. */

		public int ve_initialize ();
		public int ve_initialize_from_header ([in, count=len] unsigned char *header, uint16_t len);
		/* Our other ECALLs have been omitted for brevity */
	};

    untrusted {
    };
};

The feature detection code in the E_DRNG constructor becomes:

static int _drng_support= DRNG_SUPPORT_UNKNOWN;

E_DRNG::E_DRNG(void)
{
	int info[4];
	sgx_status_t status;

	if (_drng_support != DRNG_SUPPORT_UNKNOWN) return;

	_drng_support = DRNG_SUPPORT_NONE;

	// Check our feature support

	status= sgx_cpuid(info, 0);
	if (status != SGX_SUCCESS) return;

	if (memcmp(&(info[1]), "Genu", 4) ||
		memcmp(&(info[3]), "ineI", 4) ||
		memcmp(&(info[2]), "ntel", 4)) return;

	status= sgx_cpuidex(info, 1, 0);
	if (status != SGX_SUCCESS) return;

	if (info[2] & (1 << 30)) _drng_support |= DRNG_SUPPORT_RDRAND;

#ifdef COMPILER_HAS_RDSEED_SUPPORT
	status= sgx_cpuidex(info, 7, 0);
	if (status != SGX_SUCCESS) return;

	if (info[1] & (1 << 18)) _drng_support |= DRNG_SUPPORT_RDSEED;
#endif
}

Because calls to the CPUID instruction must take place in untrusted memory, the results of CPUID cannot be trusted! This warning applies whether you run CPUID yourself or rely on the SGX functions to do it for you. The Intel SGX SDK offers this advice: “Code should verify the results and perform a threat evaluation to determine the impact on trusted code if the results were spoofed.”

In our tutorial password manager, there are three possible outcomes:

  1. RDRAND and/or RDSEED are not detected, but a positive result for one or both is spoofed. This will lead to an illegal instruction fault at runtime, at which point the program will crash.
     
  2. RDRAND is detected, but a negative result is spoofed. This will result in an error at runtime, causing the program to exit gracefully since a required feature is not detected.
     
  3. RDSEED is detected, but a negative result is spoofed. This will cause the program to fall back to the seed-from-RDRAND method for generating random seeds, which has a small performance impact. The program will otherwise function normally.

Since our worst-case scenarios are denial-of-service issues that do not compromise the application’s secrets, we will not attempt to detect spoofing attacks.

Generating Seeds from RDRAND

In the event that the underlying CPU does not support the RDSEED instruction, we need to be able to use the RDRAND instruction to generate random seeds that are functionally equivalent to what we would have received from RDSEED if it were available. The Intel Digital Random Number Generator (DRNG) Software Implementation Guide describes the process of obtaining random seeds from RDRAND in detail, but the short version is that one method for doing this is to generate 512 pairs of 128-bit values and mix the intermediate values together using the CBC-MAC mode of AES to produce a single, 128-bit seed. The process is repeated to generate as many seeds as necessary.

In the non-Intel SGX code path, the method seed_from_rdrand uses CNG to build the cryptographic algorithm. Since the Intel SGX code path can’t depend on CNG, we once again need to turn to the trusted cryptographic library that is distributed with the Intel SGX SDK. The changes are summarized in Table 3.

aes-cmac
  CNG primitives and support functions: BCryptOpenAlgorithmProvider, BCryptGenerateSymmetricKey, BCryptSetProperty, BCryptEncrypt, BCryptDestroyKey, BCryptCloseAlgorithmProvider
  Intel SGX Trusted Cryptography Library functions: sgx_cmac128_init, sgx_cmac128_update, sgx_cmac128_final, sgx_cmac128_close

Table 3. Cryptographic function changes to the E_DRNG class’s seed_from_rdrand method
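
A hedged sketch of the seed-from-RDRAND mixing loop, using the trusted library functions from Table 3; the fixed CMAC key, the 512-sample count, and the rdrand128() helper are illustrative assumptions rather than the tutorial’s actual code:

int E_DRNG::seed_from_rdrand_sketch(BYTE seed[16])
{
	static const sgx_cmac_128bit_key_t key= { 0 };	// assumed fixed CMAC key
	sgx_cmac_state_handle_t h;
	BYTE block[16];
	int i;

	if (sgx_cmac128_init(&key, &h) != SGX_SUCCESS) return 0;

	// Mix many RDRAND samples together with AES-CMAC to produce one 128-bit seed.
	for (i= 0; i < 512; ++i) {
		if (!rdrand128(block)) {	// hypothetical helper that fills 16 bytes via RDRAND
			sgx_cmac128_close(h);
			return 0;
		}
		if (sgx_cmac128_update(block, 16, h) != SGX_SUCCESS) {
			sgx_cmac128_close(h);
			return 0;
		}
	}

	if (sgx_cmac128_final(h, (sgx_cmac_128bit_tag_t *) seed) != SGX_SUCCESS) {
		sgx_cmac128_close(h);
		return 0;
	}

	sgx_cmac128_close(h);
	return 1;
}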

Why is this algorithm embedded in the DRNG class and not implemented in the Crypto class with the other cryptographic algorithms? This is simply a design decision. The DRNG class only needs this one algorithm, so we chose not to create a co-dependency between DRNG and Crypto (currently, Crypto does depend on DRNG). The Crypto class is also structured to provide the cryptographic services for vault operations rather than function as a general-purpose cryptographic API.

Why Not Use sgx_read_rand?

The Intel SGX SDK provides the function sgx_read_rand as a means of obtaining random numbers inside of an enclave. There are three reasons why we aren’t using it:

  1. As documented in the Intel SGX SDK, this function is “provided to replace the C standard pseudo-random sequence generation functions inside the enclave, since these standard functions are not supported in the enclave, such as rand, srand, etc.” While sgx_read_rand does call the RDRAND instruction if it is supported by the CPU, it falls back to the trusted C library’s implementation of srand and rand if it is not. The random numbers produced by the C library are not suitable for cryptographic use. It is highly unlikely that this situation will ever occur, but as mentioned in the section on CPUID, we must not assume that it will never occur.
  2. There is no Intel SGX SDK function for calling the RDSEED instruction and that means we still have to use compiler intrinsics in our code. While we could replace the RDRAND intrinsics with calls to sgx_read_rand, it would not gain us anything in terms of code management or structure and it would cost us additional time.
  3. The intrinsics will marginally outperform sgx_read_rand since there is one less layer of function calls in the resulting code.

Wrapping Up

With these code changes, we have a fully functioning enclave! However, there are still some inefficiencies in the implementation and some gaps in functionality, and we’ll revisit the enclave design in Parts 7 and 8 in order to address them.

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager core, including the enclave and its wrapper functions. This source code should be functionally identical to Part 3, only we have hardcoded Intel SGX support to be on.

Coming Up Next

In Part 6 of the tutorial we’ll add dynamic feature detection to the password manager, allowing it to choose the appropriate code path based on whether or not Intel SGX is supported on the underlying platform. Stay tuned!

Intel® Software Guard Extensions Tutorial Series: Part 6, Dual Code Paths


In Part 6 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series, we set aside the enclave to address an outstanding design requirement that was laid out in Part 2, Application Design: provide support for dual code paths. We want to make sure our Tutorial Password Manager will function on hosts both with and without Intel SGX capability. Much of the content in this part comes from the article, Properly Detecting Intel® Software Guard Extensions in Your Applications.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

There is source code provided with this installment of the series.

All Intel® Software Guard Extensions Applications Need Dual Code Paths

First it’s important to point out that all Intel SGX applications must have dual code paths. Even if an application is written so that it should only execute if Intel SGX is available and enabled, a fallback code path must exist so that you can present a meaningful error message to the user and then exit gracefully.

In short, an application should never crash or fail to launch solely because the platform does not support Intel SGX.

Scoping the Problem

In Part 5 of the series we completed our first version of our application enclave and tested it by hardcoding the enclave support to be on. That was done by setting the _supports_sgx flag in PasswordManagerCoreNative.cpp.

PasswordManagerCoreNative::PasswordManagerCoreNative(void)
{
	_supports_sgx= 1;
	adsize= 0;
	accountdata= NULL;
	timer = NULL;
}

Obviously, we can’t leave this on by default. The convention for feature detection is that features are off by default and turned on if they are detected. So our first step is to undo this change and set the flag back to 0, effectively disabling the Intel SGX code path.

PasswordManagerCoreNative::PasswordManagerCoreNative(void)
{
	_supports_sgx= 0;
	adsize= 0;
	accountdata= NULL;
	timer = NULL;
}

However, before we get into the feature detection procedure, we’ll give the console application that runs our test suite, CLI Test App, a quick functional test by executing it on an older system that does not have the Intel SGX feature. With this flag set to zero, the application will not choose the Intel SGX code path and thus should run normally.

Here’s the output from a 4th generation Intel® Core™ i7 processor-based laptop, running Microsoft Windows* 8.1, 64-bit. This system does not support Intel SGX.

[Screenshot: CLI Test App output showing the load failure]

What Happened?

Clearly we have a problem even when the Intel SGX code path is explicitly disabled in the software. This application, as written, cannot execute on a system without Intel SGX support. It didn’t even start executing. So what’s going on?

The clue in this case comes from the error message in the console window:

System.IO.FileNotFoundException: Could not load file or assembly ‘PasswordManagerCore.dll’ or one of its dependencies. The specified file could not be found.

Let’s take a look at PasswordManagerCore.dll and its dependencies:

[Image: PasswordManagerCore.dll additional dependencies]

In addition to the core OS libraries, we have dependencies on bcrypt.lib and EnclaveBridge.lib, which will require bcrypt.dll and EnclaveBridge.dll at runtime. Since bcrypt.dll comes from Microsoft and is included in the OS, we can reasonably assume its dependencies, if any, are already installed. That leaves EnclaveBridge.dll.

Examining its dependencies, we see the following:

[Image: EnclaveBridge.dll additional dependencies]

This is the problem. Even though we have the Intel SGX code path explicitly disabled, EnclaveBridge.dll still has references to the Intel SGX runtime libraries. All symbols in an object module must be resolved as soon as it is loaded. It doesn’t matter if we disable the Intel SGX code path: undefined symbols are still present in the DLL. When PasswordManagerCore.dll loads, it resolves its undefined symbols by loading bcrypt.dll and EnclaveBridge.dll, the latter of which, in turn, attempts to resolve its undefined symbols by loading sgx_urts.dll and sgx_uae_service.dll. The system we tried to run our command-line test application on does not have these libraries, and since the OS can’t resolve all of the symbols it throws an exception and the program crashes before it even starts.

These two DLLs are part of the Intel SGX Platform Software (PSW) package, and without them Intel SGX applications written using the Intel SGX Software Development Kit (SDK) cannot execute. Our application needs to be able to run even if these libraries are not present.

The Platform Software Package

As mentioned above, the runtime libraries are part of the PSW. In addition to these support libraries, the PSW includes:

  • Services that support and maintain the trusted compute block (TCB) on the system
  • Services that perform and manage certain Intel SGX operations such as attestation
  • Interfaces to platform services such as trusted time and the monotonic counters

The PSW must be installed by the application installer when deploying an Intel SGX application, because Intel does not offer the PSW for direct download by end users. Software vendors must not assume that it will already be present and installed on the destination system. In fact, the license agreement for Intel SGX specifically states that licensees must re-distribute the PSW with their applications.

We’ll discuss the PSW installer in more detail in a future installment of the series covering packaging and deployment.

Detecting Intel Software Guard Extensions Support

So far we’ve focused on the problem of just starting our application on systems without Intel SGX support, and more specifically, without the PSW. The next step is to detect whether or not Intel SGX support is present and enabled once the application is running.

Intel SGX feature detection is, unfortunately, a complicated procedure. For a system to be Intel SGX capable, four conditions must be met:

  1. The CPU must support Intel SGX.
  2. The BIOS must support Intel SGX.
  3. In the BIOS, Intel SGX must be explicitly enabled or set to the “software controlled” state.
  4. The PSW must be installed on the platform.

Note that the CPUID instruction, alone, is not sufficient to detect the usability of Intel SGX on a platform. It can tell you whether or not the CPU supports the feature, but it doesn’t know anything about the BIOS configuration or the software that is installed on a system. Relying solely on the CPUID results to make decisions about Intel SGX support can potentially lead to a runtime fault.

To make feature detection even more difficult, examining the state of the BIOS is not a trivial task and is generally not possible from a user process. Fortunately the Intel SGX SDK provides a simple solution: the function sgx_enable_device will both check for Intel SGX capability and attempt to enable it if the BIOS is set to the software control state (the purpose of the software control setting is to allow applications to enable Intel SGX without requiring users to reboot their systems and enter their BIOS setup screens, a particularly daunting and intimidating task for non-technical users).

The problem with sgx_enable_device, though, is that it is part of the Intel SGX runtime, which means the PSW must be installed on the system in order to use it. So before we attempt to call sgx_enable_device, we must first detect whether or not the PSW is present.

Implementation

With our problem scoped out, we can now lay out the steps that must be followed, in order, for our dual-code path application to function properly. Our application must:

  1. Load and begin executing even without the Intel SGX runtime libraries.
  2. Determine whether or not the PSW package is installed.
  3. Determine whether or not Intel SGX is enabled (and attempt to enable it).

Loading and Executing without the Intel Software Guard Extensions Runtime

Our main application depends on PasswordManagerCore.dll, which depends on EnclaveBridge.dll, which in turn depends on the Intel SGX runtime. Since all symbols need to be resolved when an application loads, we need a way to prevent the loader from trying to resolve symbols that come from the Intel SGX runtime libraries. There are two options:

Option #1: Dynamic Loading      

In dynamic loading, you don’t explicitly link the library in the project. Instead you use system calls to load the library at runtime and then look up the names of each function you plan to use in order to get the addresses of where they have been placed in memory. Functions in the library are then invoked indirectly via function pointers.

Dynamic loading is a hassle. Even if you only need a handful of functions, it can be a tedious process to prototype function pointers for every function that is needed and get their load address, one at a time. You also lose some of the benefits provided by the integrated development environment (such as prototype assistance) since you are no longer explicitly calling functions by name.

Dynamic loading is typically used in extensible application architectures (for example, plug-ins).

Option #2: Delayed-Loaded DLLs

In this approach, you dynamically link all your libraries in the project, but instruct Windows to do delayed loading of the problem DLL. When a DLL is delay-loaded, Windows does not attempt to resolve symbols that are defined by that DLL when the application starts. Instead it waits until the program makes its first call to a function that is defined in that DLL, at which point the DLL is loaded and the symbols get resolved (along with any of its dependencies). What this means is that a DLL is not loaded until the application needs it. A beneficial side effect of this approach is that it allows applications to reference a DLL that is not installed, so long as no functions in that DLL are ever called.

When the Intel SGX feature flag is off, that is exactly the situation we are in so we will go with option #2.

You specify the DLL to be delay-loaded in the project configuration for the dependent application or DLL. For the Tutorial Password Manager, the best DLL to mark for delayed loading is EnclaveBridge.dll, as we only call into it if the Intel SGX path is enabled. If this DLL doesn’t load, neither will the two Intel SGX runtime DLLs.

We set the option in the Linker -> Input page of the PasswordManagerCore.dll project configuration:

[Screenshot: Delay-loaded DLL setting in the PasswordManagerCore project configuration]
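
For reference, this project setting corresponds roughly to the following linker usage on the Microsoft toolchain (delayimp.lib supplies the delay-load helper):

	/DELAYLOAD:"EnclaveBridge.dll" delayimp.lib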

After the DLL is rebuilt and installed on our 4th generation Intel Core processor system, the console test application works as expected.

[Screenshot: CLI Test App running successfully]

Detecting the Platform Software Package

Before we can call the sgx_enable_device function to check for Intel SGX support on the platform, we first have to make sure that the PSW package is installed because sgx_enable_device is part of the Intel SGX runtime. The best way to do this is to actually try to load the runtime libraries.

We know from the previous step that we can’t just dynamically link them because that will cause an exception when we attempt to run the program on a system that does not support Intel SGX (or have the PSW package installed). But we also can’t rely on delay-loaded DLLs either: delayed loading can’t tell us if a library is installed because if it isn’t, the application will still crash! That means we must use dynamic loading to test for the presence of the runtime libraries.

The PSW runtime libraries should be installed in the Windows system directory so we’ll use GetSystemDirectory to get that path, and limit the DLL search path via a call to SetDllDirectory. Finally, the two libraries will be loaded using LoadLibrary. If either of these calls fail, we know the PSW is not installed and that the main application should not attempt to run the Intel SGX code path.

Detecting and Enabling Intel Software Guard Extensions

Since the previous step dynamically loads the PSW runtime libraries, we can just look up the symbol for sgx_enable_device manually and then invoke it via a function pointer. The result will tell us whether or not Intel SGX is enabled.

Implementation

To implement this in the Tutorial Password Manager we’ll create a new DLL called FeatureSupport.dll. We can safely dynamically link this DLL from the main application since it has no explicit dependencies on other DLLs.

Our feature detection will be rolled into a C++/CLI class called FeatureSupport, which will also include some high-level functions for getting more information about the state of Intel SGX. In rare cases, enabling Intel SGX via software may require a reboot, and in rarer cases the software enable action fails and the user may be forced to enable it explicitly in their BIOS.

The class declaration for FeatureSupport is shown below.

typedef sgx_status_t(SGXAPI *fp_sgx_enable_device_t)(sgx_device_status_t *);


public ref class FeatureSupport {
private:
	UINT sgx_support;
	HINSTANCE h_urts, h_service;

	// Function pointers

	fp_sgx_enable_device_t fp_sgx_enable_device;

	int is_psw_installed(void);
	void check_sgx_support(void);
	void load_functions(void);

public:
	FeatureSupport();
	~FeatureSupport();

	UINT get_sgx_support(void);
	int is_enabled(void);
	int is_supported(void);
	int reboot_required(void);
	int bios_enable_required(void);

	// Wrappers around SGX functions

	sgx_status_t enable_device(sgx_device_status_t *device_status);

};

Here are the low-level routines that check for the PSW package and attempt to detect and enable Intel SGX.

int FeatureSupport::is_psw_installed()
{
	_TCHAR *systemdir;
	UINT rv, sz;

	// Get the system directory path. Start by finding out how much space we need
	// to hold it.

	sz = GetSystemDirectory(NULL, 0);
	if (sz == 0) return 0;

	systemdir = new _TCHAR[sz + 1];
	rv = GetSystemDirectory(systemdir, sz);
	if (rv == 0 || rv > sz) return 0;

	// Set our DLL search path to just the System directory so we don't accidentally
	// load the DLLs from an untrusted path.

	if (SetDllDirectory(systemdir) == 0) {
		delete[] systemdir;
		return 0;
	}

	delete[] systemdir; // No longer need this

	// Need to be able to load both of these DLLs from the System directory.

	if ((h_service = LoadLibrary(_T("sgx_uae_service.dll"))) == NULL) {
		return 0;
	}

	if ((h_urts = LoadLibrary(_T("sgx_urts.dll"))) == NULL) {
		FreeLibrary(h_service);
		h_service = NULL;
		return 0;
	}

	load_functions();

	return 1;
}

void FeatureSupport::check_sgx_support()
{
	sgx_device_status_t sgx_device_status;

	if (sgx_support != SGX_SUPPORT_UNKNOWN) return;

	sgx_support = SGX_SUPPORT_NO;

	// Check for the PSW

	if (!is_psw_installed()) return;

	sgx_support = SGX_SUPPORT_YES;

	// Try to enable SGX

	if (this->enable_device(&sgx_device_status) != SGX_SUCCESS) return;

	// If SGX isn't enabled yet, perform the software opt-in/enable.

	if (sgx_device_status != SGX_ENABLED) {
		switch (sgx_device_status) {
		case SGX_DISABLED_REBOOT_REQUIRED:
			// A reboot is required.
			sgx_support |= SGX_SUPPORT_REBOOT_REQUIRED;
			break;
		case SGX_DISABLED_LEGACY_OS:
			// BIOS enabling is required
			sgx_support |= SGX_SUPPORT_ENABLE_REQUIRED;
			break;
		}

		return;
	}

	sgx_support |= SGX_SUPPORT_ENABLED;
}

void FeatureSupport::load_functions()
{
	fp_sgx_enable_device = (fp_sgx_enable_device_t)GetProcAddress(h_service, "sgx_enable_device");
}

// Wrappers around SDK functions so the user doesn't have to mess with dynamic loading by hand.

sgx_status_t FeatureSupport::enable_device(sgx_device_status_t *device_status)
{
	check_sgx_support();

	if (fp_sgx_enable_device == NULL) {
		return SGX_ERROR_UNEXPECTED;
	}

	return fp_sgx_enable_device(device_status);
}

Wrapping Up

With these code changes, we have integrated Intel SGX feature detection into our application! It will execute smoothly on systems both with and without Intel SGX support and choose the appropriate code branch.

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager core, including the new feature detection DLL. Additionally, we have added a new GUI-based test program that automatically selects the Intel SGX code path, but lets you disable it if desired (this option is only available if Intel SGX is supported on the system).

[Screenshot: GUI test program showing the Intel SGX code branch option]

The console-based test program has also been updated to detect Intel SGX, though it cannot be configured to turn it off without modifying the source code.

Coming Up Next

We’ll revisit the enclave in Part 7 in order to fine-tune the interface. Stay tuned!


Intel® RealSense™ SDK-Based Real-Time Face Tracking and Animation


Download Code Sample [ZIP 12.03 MB]

In some high-quality games, an avatar may have facial expression animation. These animations are usually pre-generated by the game artist and replayed in the game according to the fixed story plot. If players are given the ability to animate that avatar’s face based on their own facial motion in real time, it may enable personalized expression interaction and creative game play. Intel® RealSense™ technology is based on a consumer-grade RGB-D camera, which provides building blocks such as face detection and analysis for this new kind of usage. In this article, we introduce a method for making an avatar simulate the user’s facial expression with the Intel® RealSense™ SDK, and we provide sample code for download.

Figure 1: The sample application of Intel® RealSense™ SDK-based face tracking and animation.

System Overview

Our method is based on the idea of the Facial Action Coding System (FACS), which deconstructs facial expressions into specific Action Units (AU). AUs are a contraction or relaxation of one or more muscles. With the weights of the AUs, nearly any anatomically possible facial expression can be synthesized.

Our method also assumes that the user and avatar have compatible expression space so that the AU weights can be shared between them. Table 1 illustrates the AUs defined in the sample code.

Action Unit       Description
MOUTH_OPEN        Open the mouth
MOUTH_SMILE_L     Raise the left corner of mouth
MOUTH_SMILE_R     Raise the right corner of mouth
MOUTH_LEFT        Shift the mouth to the left
MOUTH_RIGHT       Shift the mouth to the right

EYEBROW_UP_L      Raise the left eyebrow
EYEBROW_UP_R      Raise the right eyebrow
EYEBROW_DOWN_L    Lower the left eyebrow
EYEBROW_DOWN_R    Lower the right eyebrow

EYELID_CLOSE_L    Close left eyelid
EYELID_CLOSE_R    Close right eyelid
EYELID_OPEN_L     Raise left eyelid
EYELID_OPEN_R     Raise right eyelid

EYEBALL_TURN_R    Move both eyeballs to the right
EYEBALL_TURN_L    Move both eyeballs to the left
EYEBALL_TURN_U    Move both eyeballs up
EYEBALL_TURN_D    Move both eyeballs down

Table 1: The Action Units defined in the sample code.

The pipeline of our method includes three stages: (1) tracking the user face by the Intel RealSense SDK, (2) using the tracked facial feature data to calculate the AU weights of the user’s facial expression, and (3) synchronizing the avatar facial expression through normalized AU weights and corresponding avatar AU animation assets.

Prepare Animation Assets

To synthesize the avatar’s facial expression, the game artist needs to prepare animation assets for each AU of the avatar’s face. If the face is animated by a blend-shape rig, the blend-shape model should contain the base shape built for a face with a neutral expression, plus target shapes constructed for the face at the maximum pose of each corresponding AU. If a skeleton rig is used for facial animation, an animation sequence must be prepared for every AU; the key frames of each AU animation sequence transform the avatar face from a neutral pose to the maximum pose of the corresponding AU. The duration of the animation doesn’t matter, but we recommend a duration of 1 second (31 frames, from 0 to 30).

The sample application demonstrates the animation assets and expression synthesis method for avatars with skeleton-based facial animation.

In the rest of the article, we discuss the implementation details in the sample code.

Face Tracking

In our method, the user face is tracked by the Intel RealSense SDK. The SDK face-tracking module provides a suite of the following face algorithms:

  • Face detection: Locates a face (or multiple faces) from an image or a video sequence, and returns the face location in a rectangle.
  • Landmark detection: Further identifies the feature points (eyes, mouth, and so on) for a given face rectangle.
  • Pose detection: Estimates the face’s orientation based on where the user's face is looking.

Our method chooses the user face that is closest to the Intel® RealSense™ camera as the source face for expression retargeting and gets this face’s 3D landmarks and orientation in camera space to use in the next stage.

Facial Expression Parameterization

Once we have the landmarks and orientation of the user’s face, the facial expression can be parameterized as a vector of AU weights. To obtain the AU weights, which can be used to control an avatar’s facial animation, we first measure the AU displacement. The displacement of the k-th AU, Dk, is given by:

Dk = (Skc − Skn) / Nk

Where Skc is the k-th AU state in the current expression, Skn is the k-th AU state in a neutral expression, and Nk is the normalization factor for the k-th AU state.

We measure AU states Skc and Skn in terms of the distances between the associated 3D landmarks. Using a 3D landmark in camera space instead of a 2D landmark in screen space can prevent the measurement from being affected by the distance between the user face and the Intel RealSense camera.

Different users have different facial geometry and proportions, so normalization is required to ensure that the AU displacements extracted from two users have approximately the same magnitude when both are in the same expression. We calculate Nk in an initial calibration step on the user’s neutral expression, using a method similar to the MPEG-4 FAPU (Face Animation Parameter Unit) measurement.

In normalized expression space, we can define the scope for each AU displacement. The AU weights are then calculated by the following formula, with the result clamped to the range [0, 1]:

wk = Dk / Dkmax

Where Dkmax is the maximum of the k-th AU displacement.

Because of the accuracy of face tracking, the measured AU weights derived from the above formulas may generate an unnatural expression in some special situations. In the sample application, geometric constraints among AUs are used to adjust the measured weights to ensure that a reconstructed expression is plausible, even if not necessarily close to the input geometrically.

Also because of the input accuracy, the signal of the measured AU weights is noisy, which can make the reconstructed expression animation stutter in some situations. Smoothing the AU weights is therefore necessary. However, smoothing introduces latency, which impacts the agility of expression changes.

We smooth the AU weights by interpolating between the weight of the current frame and that of the previous frame as follows:

w′i,k = αi · wi,k + (1 − αi) · w′i−1,k

Where wi,k is the measured weight of the k-th AU in the i-th frame, w′i,k is its smoothed value, and αi is the smoothing factor of the i-th frame.

To balance the requirements of both smoothing and agility, the smoothing factor of the i-th frame for AU weights, αi, is set to the face-tracking confidence of that frame. The face-tracking confidence is evaluated according to the lost-tracking rate and the angle by which the face deviates from a neutral pose: the higher the lost-tracking rate and the bigger the deviation angle, the lower the confidence in the tracking data.
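
A minimal sketch of the displacement-to-weight mapping and the confidence-based smoothing described above, with illustrative names:

float au_weight(float S_c, float S_n, float N, float D_max)
{
	float D= (S_c - S_n) / N;	// normalized AU displacement
	float w= D / D_max;		// map displacement into [0, 1]

	if (w < 0.0f) w= 0.0f;
	if (w > 1.0f) w= 1.0f;
	return w;
}

float smooth_weight(float w_cur, float w_prev, float alpha)	// alpha = tracking confidence
{
	return alpha * w_cur + (1.0f - alpha) * w_prev;
}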

Similarly, the face angle is smoothed by interpolating between the angle of the current frame and that of the previous frame as follows:

θ′i = βi · θi + (1 − βi) · θ′i−1

To balance the requirements of both smoothing and agility, the smoothing factor of the i-th frame for the face angle, βi, is adaptive to the face angle variation and calculated by:

βi = min(|θi − θ′i−1| / T, 1)

Where T is the threshold of noise: smaller variations between face angles are treated as noise to smooth out, while bigger variations are treated as actual head rotation to respond to.

Expression Animation Synthesis

This stage synthesizes the complete avatar expression in terms of multiple AU weights and their corresponding AU animation assets. If the avatar facial animation is based on a blend-shape rig, the mesh of the final facial expression Bfinal is generated by the conventional blend-shape formula:

Bfinal = B0 + Σi wi · (Bi − B0)

Where B0 is the face mesh of a neutral expression, and Bi is the face mesh with the maximum pose of the i-th AU.
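
A short sketch of this blend-shape accumulation, assuming a hypothetical Vertex type with the usual arithmetic operators:

#include <vector>

void blend_shapes(const std::vector<Vertex> &B0,		// neutral mesh
	const std::vector<std::vector<Vertex>> &B,		// per-AU target meshes
	const std::vector<float> &w,				// AU weights
	std::vector<Vertex> &out)
{
	out= B0;
	for (size_t i= 0; i < B.size(); ++i)
		for (size_t v= 0; v < out.size(); ++v)
			out[v]= out[v] + (B[i][v] - B0[v]) * w[i];	// accumulate weighted deltas
}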

If the avatar facial animation is based on a skeleton rig, the bone matrices of the final facial expression Sfinal are obtained analogously by accumulating the per-AU deltas onto the neutral pose:

Sfinal = S0 + Σi (Ai(wi) − S0)

Where S0 is the bone matrices of a neutral expression, and Ai(wi) is the bone matrices of the i-th AU extracted from that AU’s key-frame animation sequence Ai at this AU’s weight wi.

The sample application demonstrates the implementation of facial expression synthesis for a skeleton-rigged avatar.

Performance and Multithreading

Real-time facial tracking and animation is a CPU-intensive function. Integrating the function into the main loop of the application may significantly degrade application performance. To solve this issue, we wrap the function in a dedicated work thread. The main thread retrieves new data from the work thread only when the data have been updated; otherwise, the main thread uses the old data to animate and render the avatar. This asynchronous integration mode minimizes the performance impact of the function on the primary tasks of the application.
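
An illustrative sketch of this asynchronous integration mode (TrackingData and run_face_tracking are hypothetical stand-ins for the sample’s tracking code):

#include <atomic>
#include <mutex>

struct TrackingData { /* AU weights, face angles, confidence, ... */ };

static std::mutex g_lock;
static TrackingData g_shared;
static std::atomic<bool> g_updated{false};

void tracking_thread()			// dedicated work thread
{
	for (;;) {
		TrackingData fresh= run_face_tracking();	// hypothetical tracking step
		std::lock_guard<std::mutex> guard(g_lock);
		g_shared= fresh;
		g_updated= true;
	}
}

bool fetch_latest(TrackingData &out)	// called from the main loop each frame
{
	if (!g_updated) return false;	// no new data; keep animating with the old data
	std::lock_guard<std::mutex> guard(g_lock);
	out= g_shared;
	g_updated= false;
	return true;
}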

Running the Sample

When the sample application launches (Figure 1), by default it first calibrates the user’s neutral expression and then maps the user’s performed expressions to the avatar face in real time. Pressing the “R” key resets the system, for example when a new user takes over control of the avatar’s expression; this starts a new session, including calibration and retargeting.

During the calibration phase—in the first few seconds after the application launches or is reset—the user is advised to hold his or her face in a neutral expression and position his or her head so that it faces the Intel RealSense camera in the frontal-parallel view. The calibration completes when the status bar of face-tracking confidence (in the lower-left corner of the Application window) becomes active.

After calibration, the user is free to move his or her head and perform any expression to animate the avatar face. During this phase, it’s best for the user to keep an eye on the detected Intel RealSense camera landmarks, and make sure they are green and appear in the video overlay.

Summary

Face tracking is an interesting function supported by Intel® RealSense™ technology. In this article, we introduced a reference implementation of user-controlled avatar facial animation based on the Intel® RealSense™ SDK, along with a sample written in C++ that uses DirectX*. The reference implementation covers how to prepare animation assets, parameterize the user’s facial expression, and synthesize the avatar’s expression animation. Our experience shows that the algorithms in the reference implementation are essential for reproducing plausible facial animation, and that high-quality animation assets and appropriate user guidance are just as important to the user experience in a real application environment.

References

1. https://en.wikipedia.org/wiki/Facial_Action_Coding_System

2. https://www.visagetechnologies.com/uploads/2012/08/MPEG-4FBAOverview.pdf

3. https://software.intel.com/en-us/intel-realsense-sdk/download

About the Author

Sheng Guo is a senior application engineer in the Intel Developer Relations Division. He has been working with top gaming ISVs on Intel client platform technologies and performance/power optimization. He has 10 years of expertise in 3D graphics rendering, game engines, and computer vision, has published several papers at academic conferences, and has written technical articles and samples for industry websites. He holds a bachelor’s degree in computer software from Nanjing University of Science and Technology and a master’s degree in computer science from Nanjing University.

Wang Kai is a senior application engineer from the Intel Developer Relations Division. He has been in the game industry for many years and has professional expertise in graphics, game engines, and tools development. He holds a bachelor’s degree from Dalian University of Technology.

NetUP Uses Intel® Media SDK to Help Bring the Rio Olympic Games to a Worldwide Audience of Millions


In August of 2016, half a million fans came to Rio de Janeiro to witness 17 days and nights of the Summer Olympics. At the same time, millions more people all over the world were enjoying the competition live in front of their TV screens.

Arranging a live TV broadcast to another continent is a daunting task that demands reliable equipment and agile technical support. That was the challenge for Thomson Reuters, the world’s largest multimedia news agency.

To help it meet the challenge, Thomson Reuters chose NetUP as its technical partner, using NetUP equipment for delivering live broadcasts from Rio de Janeiro to its New York and London offices. In developing the NetUP Transcoder, NetUP worked with Intel, using Intel® Media SDK, a cross-platform API for developing media applications on Windows*.

“This project was very important for us,” explained Abylay Ospan, founder of NetUP. “It demonstrates the quality and reliability of our solutions, which can be used for broadcasting global events such as the Olympics. Intel Media SDK gave us the fast transcoding we needed to help deliver the Olympics to a worldwide audience.”

Get the whole story in our new case study.

Intel® Software Guard Extensions Tutorial Series: Part 7, Refining the Enclave


Part 7 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series revisits the enclave interface and adds a small refinement to make it simpler and more efficient. We’ll discuss how the proxy functions marshal data between unprotected memory space and the enclave, and we’ll also discuss one of the advanced features of the Enclave Definition Language (EDL) syntax.

You can find a list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Source code is provided with this installment of the series. With this release we have migrated the application to the 1.7 release of the Intel SGX SDK and also moved our development environment to Microsoft Visual Studio* Professional 2015.

The Proxy Functions

When building an enclave using the Intel SGX SDK you define the interface to the enclave in the EDL. The EDL specifies which functions are ECALLs (“enclave calls,” the functions that enter the enclave) and which ones are OCALLs (“outside calls,” the calls to untrusted functions from within the enclave).

When the project is built, the Edger8r tool that is included with the Intel SGX SDK parses the EDL file and generates a series of proxy functions. These proxy functions are essentially wrappers around the real functions that are prototyped in the EDL. Each ECALL and OCALL gets a pair of proxy functions: a trusted half and an untrusted half. The trusted functions go into EnclaveProject_t.h and EnclaveProject_t.c and are included in the Autogenerated Files folder of your enclave project. The untrusted proxies go into EnclaveProject_u.h and EnclaveProject_u.c and are placed in the Autogenerated Files folder of the project that will be interfacing with your enclave.

Your program does not call the ECALL and OCALL functions directly; it calls the proxy functions. When you make an ECALL, you call the untrusted proxy function for the ECALL, which in turn calls the trusted proxy function inside the enclave. That proxy then calls the “real” ECALL and the return value propagates back to the untrusted function. This sequence is shown in Figure 1. When you make an OCALL, the sequence is reversed: you call the trusted proxy function for the OCALL, which calls an untrusted proxy function outside the enclave that, in turn, invokes the “real” OCALL.


Figure 1. Proxy functions for an ECALL.

The proxy functions are responsible for:

  • Marshaling data into and out of the enclave
  • Placing the return value of the real ECALL or OCALL in an address referenced by a pointer parameter
  • Returning the success or failure of the ECALL or OCALL itself as an sgx_status_t value

Note that this means each ECALL or OCALL potentially has two return values: the success of the ECALL or OCALL itself (that is, whether we were able to enter or exit the enclave), and the return value of the function being called in the ECALL or OCALL.

The EDL syntax for the ECALL functions ve_lock() and ve_unlock() in our Tutorial Password Manager’s enclave is shown below:

enclave {
   trusted {
      public void ve_lock ();
      public int ve_unlock ([in, string] char *password);
   };
};

And here are the untrusted proxy function prototypes that are generated by the Edger8r tool:

sgx_status_t ve_lock(sgx_enclave_id_t eid);
sgx_status_t ve_unlock(sgx_enclave_id_t eid, int* retval, char* password);

Note the additional arguments that have been added to the parameter list for each function and that the functions now return a type of sgx_status_t.

Both proxy functions need the enclave identifier, which is passed in the first parameter, eid. The ve_lock() function has no parameters and does not return a value, so no further changes are necessary. The ve_unlock() function, however, does both. The second argument to the proxy function is a pointer to an address that will store the return value from the real ve_unlock() function in the enclave, in this case a return value of type int. The actual function parameter, char *password, is included after that.
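The application side of this ECALL might look like the following sketch. It assumes eid holds the enclave ID returned by sgx_create_enclave(), and it treats a return value of 0 from the real ve_unlock() as failure, which is an assumption made for illustration.

#include <sgx_urts.h>

void unlock_vault(sgx_enclave_id_t eid, char *password)
{
    int vault_rv = 0;

    /* Call the untrusted proxy, which enters the enclave on our behalf. */
    sgx_status_t status = ve_unlock(eid, &vault_rv, password);

    if (status != SGX_SUCCESS) {
        /* The ECALL itself failed: we never entered the enclave. */
    } else if (vault_rv == 0) {
        /* The enclave was entered successfully, but the real ve_unlock()
           reported failure (for example, a wrong master password). */
    }
}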

Data Marshaling

The untrusted portion of an application does not have access to enclave memory. It cannot read from or write to these protected memory pages. This presents some difficulties when the function parameters include pointers. OCALLs are especially problematic, because memory allocated inside the enclave is not accessible to the OCALL, but even ECALLs can have issues. Enclave memory is mapped into the application’s memory space, so enclave pages can be adjacent to unprotected memory pages. If you pass a pointer to untrusted memory into an enclave, and then fail to do appropriate bounds checking in your enclave, you may inadvertently cross the enclave boundary when reading or writing to that memory in your ECALL.

The Intel SGX SDK’s solution to this problem is to copy the contents of data buffers into and out of enclaves, and have the ECALLs and OCALLs operate on these copies of the original memory buffer. When you pass a pointer into an enclave, you specify in the EDL whether the buffer referenced by the pointer is being passed into the call, out of the call, or in both directions, and then you specify the size of the buffer. The proxy functions generated by the Edger8r tool use this information to check that the address range does not cross the enclave boundary, copy the data into or out of the enclave as indicated, and then substitute a pointer to the copy of the buffer in place of the original pointer.

This is the slow-and-safe approach to marshaling data and pointers between unprotected memory and enclave memory. However, this approach has drawbacks that may make it undesirable in some cases:

  • It’s slow, since each memory buffer is checked and copied.
  • It requires additional heap space in your enclave to store the copies of the data buffers.
  • The EDL syntax is a little verbose.

There are also cases where you just need to pass a raw pointer into an ECALL and out to an OCALL without it ever being used inside the enclave, such as when passing a function pointer for a callback function straight through to an OCALL. In this case, there is no data buffer per se, just the pointer address itself, and the marshaling functions generated by Edger8r actually get in the way.

The Solution: user_check

Fortunately, the EDL language does support passing a raw pointer address into an ECALL or an OCALL, skipping both the boundary checks and the data buffer copy. The user_check parameter tells the Edger8r tool to pass a pointer as it is and assume that the developer has done the proper bounds checking on the address. When you specify user_check you are essentially trading safety for performance.

A pointer marked with user_check does not have a direction (in or out) associated with it, because no buffer copy takes place. Mixing user_check with in or out will result in an error at compile time. Similarly, you don’t supply a count or size parameter.
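The SDK does give the enclave a way to perform that bounds checking itself: sgx_is_outside_enclave(), declared in sgx_trts.h, reports whether an entire address range lies outside the enclave. Below is a minimal sketch of the kind of validation the developer becomes responsible for; the helper name is hypothetical.

#include <sgx_trts.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative helper for trusted code: returns 1 only if it is safe
   for the enclave to dereference a user_check pointer spanning len
   bytes of untrusted memory. */
static int ptr_is_safe(const void *ptr, uint32_t len)
{
    /* sgx_is_outside_enclave() returns 1 only when the whole range
       [ptr, ptr + len) lies outside the enclave's protected memory. */
    return (ptr != NULL && sgx_is_outside_enclave(ptr, (size_t)len));
}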

In the Tutorial Password Manager, the most appropriate place to use the user_check parameter is in the ECALLs that load and store the encrypted password vault. While our design constraints put a practical limit on the size of the vault itself, generally speaking these sorts of bulk reads and writes benefit from allowing the enclave to directly operate on untrusted memory.

The original EDL for ve_load_vault() and ve_get_vault() looks like this:

public int ve_load_vault ([in, count=len] unsigned char *edata, uint32_t len);

public int ve_get_vault ([out, count=len] unsigned char *edata, uint32_t len);

Rewriting these to specify user_check results in the following:

public int ve_load_vault ([user_check] unsigned char *edata);

public int ve_get_vault ([user_check] unsigned char *edata, uint32_t len);

Notice that we were able to drop the len parameter from ve_load_vault(). As you might recall from Part 4, the issue we had with this function was that although the length of the vault is stored as a variable in the enclave, the proxy functions don’t have access to it. In order for the ECALL’s proxy functions to copy the incoming data buffer, we had to supply the length in the EDL so that the Edger8r tool would know the size of the buffer. With the user_check option, there is no buffer copy operation, so this problem goes away. The enclave can read directly from untrusted memory, and it can use its internal variable to determine how many bytes to read.

However, we still send the length as a parameter to ve_get_vault(). This is a safety check to ensure that we don’t accidentally overflow a buffer when fetching the encrypted vault from the enclave.
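Putting these pieces together, the trusted side of ve_get_vault() might look something like the sketch below. The names e_vault and e_vault_size are hypothetical stand-ins for the enclave’s internal vault buffer and its length, and the 0/1 return convention is assumed for illustration; none of these are taken from the actual tutorial source.

#include <sgx_trts.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical enclave-internal state (illustrative names only). */
static unsigned char *e_vault = NULL;
static uint32_t e_vault_size = 0;

int ve_get_vault(unsigned char *edata, uint32_t len)
{
    /* edata is a raw user_check pointer, so the enclave must validate
       that the destination lies entirely in untrusted memory... */
    if (edata == NULL || !sgx_is_outside_enclave(edata, len))
        return 0;

    /* ...and that the caller's buffer is large enough to hold the
       encrypted vault, so the memcpy() cannot overflow it. */
    if (len < e_vault_size)
        return 0;

    memcpy(edata, e_vault, e_vault_size);
    return 1;
}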

Summary

The EDL provides three options for passing pointers into an ECALL or an OCALL: in, out, and user_check. These options are summarized in Table 1.

  • in: For an ECALL, the buffer is copied from the application into the enclave, and changes will only affect the buffer inside the enclave. For an OCALL, the buffer is copied from the enclave to the application, and changes will only affect the buffer outside the enclave.
  • out: For an ECALL, a buffer will be allocated inside the enclave and initialized with zeros; it will be copied to the original buffer when the ECALL exits. For an OCALL, a buffer will be allocated outside the enclave and initialized with zeros; this untrusted buffer will be copied to the original buffer in the enclave when the OCALL exits.
  • in, out: Data is copied back and forth, for both ECALLs and OCALLs.
  • user_check: The pointer is not checked; the raw address is passed, for both ECALLs and OCALLs.

Table 1. Pointer specifiers and their meanings in ECALLs and OCALLs.

If you use the direction indicators, the data buffer referenced by your pointer gets copied and you must supply a count so that the Edger8r can determine how many bytes are in the buffer. If you specify user_check, the raw pointer is passed to the ECALL or OCALL unaltered.

Sample Code

The code sample for this part of the series has been updated to build against the Intel SGX SDK version 1.7 using Microsoft Visual Studio 2015. It should still work with the Intel SGX SDK version 1.6 and Visual Studio 2013, but we encourage you to update to the newer release of the Intel SGX SDK.

Coming Up Next

In Part 8 of the series, we’ll add support for power events. Stay tuned!

Building an Arcade Cabinet with Skull Canyon


Hi, I’m Bela Messex, one half of Buddy System, a bedroom studio based in Los Angeles and the maker of the game Little Bug.

Why an Arcade Cabinet?

My co-developer and I come from worlds where DIY wasn’t a marketable aesthetic but a natural and necessary creative path. Before we met and found ourselves in video game design, we made interactive sculpture, zines, and comics. We’ve been interested in ways to blend digital games with physical interaction, and while this can take many forms, a straightforward route was to house our debut game, Little Bug, in a custom arcade cabinet. As it turns out, doing so was painless, fun, and easy, and at events like Fantastic Arcade and Indiecade it provided a unique interaction that really drew attendees.

The Plan

To start off, I rendered a design in Unity, complete with Image Effects, Animations, and completely unrealistic lighting… If only real life were like video games, but at least I now had a direction.

The Components

This parts list worked for us and could be a good starting point for you, but you might want to tailor it a bit to your game’s unique needs.

  • Intel NUC Skull Canyon.
  • 2 arcade joysticks.
  • 3 arcade buttons.
  • 2 generic PC joystick boards with wires included.
  • 4’ x 8’ MDF panel.
  • 24” monitor.
  • 8” LED accent light.
  • Power Strip.
  • Power Drill.
  • Nail gun and wood glue.
  • Screws of varying sizes and springs.
  • 6” piano hinge.
  • Velcro strips.
  • Zip ties.
  • Black spray paint and multicolored paint markers.
  • Semi-opaque plexiglass.

Building the Cabinet

When I was making sculptures, I mainly welded, so I asked my friend Paul for some help measuring and cutting the MDF panels. We did this by designing our shapes on the spot with a jigsaw, a pencil, and basic drafting tools. Here is Paul in his warehouse studio with the soon-to-be cabinet.

We attached the cut pieces with glue and a nail gun, but you could use screws if you need a little more strength. Notice the hinge in the front - this was Paul’s idea and ended up being a lifesaver later on when I needed to install the buttons and joysticks. Next to the paint can is a foot pedal we made specifically for Little Bug’s unique controls: two joysticks and a button used simultaneously. On a gamepad this dual-stick setup is no problem, but translated to two full-sized arcade joysticks, both hands would be occupied - so how do you press that button? Solution: use your foot!

After painting the completed frame, it was time for the fun part - installing the electronics. I used a cheap ($15) kit that included six buttons, a joystick, a USB controller board, and all the wiring. After hundreds of plays, it’s all still working great. Notice the LED above the screen that lights up the marquee for a classic arcade feel.

Once the NUC was installed in the back via Velcro strips, I synced the buttons and joysticks inside the Unity inspector and created a new build specifically designed for the cabinet. Little Bug features hand-drawn sprites, so we drew all of the exterior designs on by hand with paint markers to keep that look coherent. The marquee was made by stenciling painter’s tape with spray paint.

The Joy of Arcade

There is really nothing like watching players interact with a game you’ve made. Even though Little Bug itself is the same, the interaction is now fundamentally different, and as game designers we have found it mesmerizing to watch people play in this new way. The compact size and performance of the NUC were perfect for creating experiences like this, and it has worked so well that I’m already drawing up plans for more games in the same vein.

Some Methodologies to Optimize Your VR Application’s Power on Intel® Platforms


As VR becomes a popular consumer product, more and more VR content is coming out. Recent research shows that many users love VR devices without wires, such as all-in-one (AIO) or mobile devices. Because these devices cannot charge while in use, developers need to take special care with application power consumption.

For details, please see the attachments.
