
Platform Analyzer - Analyzing Healthy and Not-So-Healthy Applications


Recently my wife purchased a thick and expensive book. As an ultrasonic diagnostician for children, she purchases many books, but this one had me puzzled.  The book was titled Ultrasound Anatomy of the Healthy Child.  Why would she need a book that showed only healthy children?  I asked her and her answer was simple: to diagnose any disease, even one not yet discovered, you need to know what a healthy child looks like. 

In this article we will act like doctors, analyzing and comparing a healthy and a not-so-healthy application.

Knock – knock – knock.

The doctor says: “It’s open, please enter.”

In walks our patient,  Warrior Wave*, an awesome game in which your hand acts as the road for the warriors to cross. It’s extremely fun to play, innovative, and uses Intel® RealSense™ technology. 

While playing the game, though, something felt a little off.  Something that I hadn’t felt before in other games based on Intel® RealSense™ technology.  The problem could be caused by so many things, but what is it in this case?  

Like any good doctor who is equipped with the latest and greatest analysis tools to diagnose the problem, we have the perfect tools to analyze our patient.

Using Intel® Graphics Performance Analyzer (Intel® GPA) Platform Analyzer, we receive a time-line view of our application’s CPU load, frame time, frames per second (FPS), and draw calls:

Let’s take a look.

Hmm… the first things that catch our eye are the regular FPS surges that occur periodically. All is relatively smooth for ~200 milliseconds and then jumps up and down severely.

For comparison, let’s look at a healthy FPS trace below. The game in this trace felt smooth and played well.

No pattern was evident within the frame time, just normal random deviations.

But in our case we see regular surges. These surges happen around four times a second.  Let’s investigate the problem more deeply by zooming in on one of the surges and seeing what’s happening in the threads:

We can see that worker thread 2780 spends most of its time in synchronization. The thread does almost nothing but wait for the next frame from the Intel® RealSense™ SDK:

At the same time, we see that rendering goes in another worker thread. If we scroll down, we find thread 2372.

Instead of “actively” waiting for the next frame from the Intel RealSense SDK, the game could be doing valuable work. Drawing and Intel® RealSense™ SDK work could be done in one worker thread instead of two, simplifying thread communication.
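
As a rough illustration of that suggestion, a single worker loop can block on the next SDK frame and then render in the same thread. The sketch below is only a minimal outline using the Intel® RealSense™ SDK's PXCSenseManager interface, not the game's actual code; the rendering callback is a placeholder.

// Minimal sketch: one worker thread that both waits for the next Intel RealSense SDK
// frame and issues the draw calls, instead of a waiting thread plus a rendering thread.
#include "pxcsensemanager.h"

void SingleWorkerLoop(PXCSenseManager *senseManager, // assumed: created, hand module enabled, Init() called
                      bool (*renderFrame)())         // placeholder for the game's DirectX rendering
{
    // AcquireFrame(true) blocks until the next frame is ready; the thread is not "actively"
    // polling, and all other work happens right here after the wait.
    while (senseManager->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
        // ... query hand/gesture data for this frame here ...
        senseManager->ReleaseFrame();

        if (!renderFrame())   // draw in the same thread: no cross-thread hand-off needed
            break;
    }
}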

Excessive inter-thread communication can drastically slow down the execution and cause many problems.

Here is the example of a “healthy” game, where the Intel® RealSense™ SDK work and the DirectX* calls are in one thread. 

RealSense™ experts say: there is no point in waiting for the frames from the Intel® RealSense™ SDK. They won’t be ready any faster. 

But we can see that the main problem is at the top of the timeline.

On average, five out of six CPU frames did not result in a GPU frame. This is the cause of the slow and uneven GPU frame rate, which on average is less than 16 FPS.

Now let’s look at the pipeline to try and understand how the code is executing.  Looking at the amount of packets on “Engine 0,” the pipeline is filled to the brim, but the execution is almost empty.

The brain can process 10 to 12 separate images per second, perceiving them individually. This explains why the first movies were cut at a rate of 16 FPS: this is the average threshold at which the majority of people stop seeing a slide show and start seeing a movie.

Once again, let’s see the profile of the nice-looking game: 

Notice that the GPU frames follow the CPU frames with little shift. For every CPU frame, there is a corresponding GPU that starts execution after a small delay.

Let’s try to understand why our game doesn’t have this pattern.

First, let’s examine our DirectX* calls. The highlighted one with the tooltip is our “Present” call that sends the finished frame to the GPU. In the screenshot above, we see that it creates a “Present” packet on the GPU pipeline (marked with X’s).  At around the 2215 ms mark, it has moved closer to execution, jumping over three positions, but at 2231 ms it just disappears without completing execution.

And if we look at each present call within the trace, not one call successfully makes it to execution.

Question: How does the game draw itself if all our DirectX* Present calls are ignored?! Good thing we have good tools so we can figure this out. Let’s take a look.

Can you see something curious inside the gray oval? We can see that this packet, not caused by any DirectX* call of our code, still gets to the execution, fast and out of order. Hey, wait a minute!!!

Let's look closely at our packet. 

And now to the packet that got executed. 

Wow! It came from an EXTERNAL thread. What could this mean? External threads are threads that don’t belong to the game.

Our own packets get ignored, but an external thread draws our game? What? Hey, this tool went nuts!

No, the image is quite right. The explanation is that on the Windows* system (starting with Windows Vista*), there is a program called Desktop Window Manager (DWM), which does the actual composition on the screen. Its packets are the ones we see executing at a fast rate with high priority.  And no, our packets aren’t lost—they are intercepted by DWM to create the final picture.

But why would DWM get involved in a full-screen game? After thinking a while, I realized that the answer is simple: I have a multi-monitor desktop configuration. Switching my second monitor off in the display configuration made Warrior Wave behave like other games: normal GPU FPS, no glitches, and no DWM packets.

The patient will live! What a relief!

But other games still worked well even with a multi-monitor configuration, right (says the evil voice in the back of my head)?

To dig deeper, we need another tool. Intel® GPA Platform Analyzer allows you to see CPU and GPU execution over time, but it doesn’t give you lower-level details of each frame.

We would need to look more closely at the Direct3D* Device creation code. For this we could use Intel® GPA Frame Analyzer for DirectX*, but this is a topic for another article.

So let’s summarize what we have learned:

During this investigation we were able to detect poor usage of threads that led to FPS surges, and a nasty DWM problem that was easily fixed by removing the second monitor from the desktop configuration.

Conclusion: Intel® GPA Platform Analyzer is a must-have tool for initial investigation of the problem. Get familiar with it and add it to your toolbox.

About the Author:

Alexander Raud works in the Intel® Graphics Performance Analyzers team in Russia and previously worked on the VTune Amplifier. Alex has dual citizenship in Russia and the EU, speaks Russian, English, some French, and is learning Spanish.  Alex has a wife and two children and still manages to play Progressive Metal professionally and head the International Ministry at Jesus Embassy Church.


Performance Considerations for Resource Binding in Microsoft DirectX* 12


By Wolfgang Engel, CEO of Confetti

With the release of Windows* 10 on July 29 and the release of the 6th generation Intel® Core™ processor family (code-name Skylake), we can now look closer into resource binding specifically for Intel® platforms.

The previous article “Introduction to Resource Binding in Microsoft DirectX* 12” introduced the new resource binding methods in DirectX 12 and concluded that with all these choices, the challenge is to pick the most desirable binding mechanism for the target GPU, types of resources, and their frequency of update.

This article describes how to pick different resource binding mechanisms to run an application efficiently on specific Intel® GPUs.

Tools of the Trade

To develop games with DirectX 12, you need the following tools:

  • Windows 10
  • Visual Studio* 2013 or higher
  • DirectX 12 SDK (included with Visual Studio)
  • DirectX 12-capable GPU and drivers

Overview

A descriptor is a block of data that describes an object to the GPU, in a GPU-specific opaque format. DirectX 12 offers the following descriptors, previously named “resource views” in DirectX 11:

  • Constant buffer view (CBV)
  • Shader resource view (SRV)
  • Unordered access view (UAV)
  • Sampler view (SV)
  • Render target view (RTV)
  • Depth stencil view (DSV)
  • and others

These descriptors or resource views can be considered a structure (also called a block) that is consumed by the GPU front end. The descriptors are roughly 32–64 bytes in size and hold information like texture dimensions, format, and layout.

Descriptors are stored in a descriptor heap, which represents a sequence of structures in memory.

A descriptor table holds offsets into this descriptor heap. It maps a contiguous range of descriptors to shader slots by making them available through a root signature. This root signature can also hold root constants, root descriptors, and static samplers.


Figure 1. Descriptors, descriptor heap, descriptor tables, root signature.

Figure 1 shows the relationship between descriptors, a descriptor heap, descriptor tables, and the root signature.

The code that Figure 1 describes looks like this:

// the Init function sets the shader registers
// parameters: type of descriptor, num of descriptors, base shader register
// the first descriptor table entry in the root signature in
// Figure 1 sets shader registers t1, b1, t4, t5
// performance: order from most frequently to least frequently used
CD3DX12_DESCRIPTOR_RANGE Param0Ranges[3];
Param0Ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 1); // t1
Param0Ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 1); // b1
Param0Ranges[2].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 2, 4); // t4-t5

// the second descriptor table entry in the root signature
// in Figure 1 sets shader registers u0 and b2
CD3DX12_DESCRIPTOR_RANGE Param1Ranges[2];
Param1Ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_UAV, 1, 0); // u0
Param1Ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 2); // b2

// set the descriptor tables in the root signature
// parameters: number of descriptor ranges, descriptor ranges, visibility
// visibility to all stages allows sharing binding tables
// with all types of shaders
CD3DX12_ROOT_PARAMETER Param[4];
Param[0].InitAsDescriptorTable(3, Param0Ranges, D3D12_SHADER_VISIBILITY_ALL);
Param[1].InitAsDescriptorTable(2, Param1Ranges, D3D12_SHADER_VISIBILITY_ALL);
// root descriptor
Param[2].InitAsShaderResourceView(1, 0); // t0
// root constants
Param[3].InitAsConstants(4, 0); // b0 (4x32-bit constants)

// writing into the command list
UINT rootConstants[4] = { 1, 3, 3, 7 };
cmdList->SetGraphicsRootDescriptorTable(0, [srvGPUHandle]);
cmdList->SetGraphicsRootDescriptorTable(1, [uavGPUHandle]);
cmdList->SetGraphicsRootShaderResourceView(2, [srvGPUVirtualAddress]);
cmdList->SetGraphicsRoot32BitConstants(3, 4, rootConstants, 0);

The source code above sets up a root signature that has two descriptor tables, one root descriptor, and one root constant. The code also shows that root constants have no indirection and are directly provided with the SetGraphicsRoot32BitConstants call. They are routed directly into the shader registers; there is no actual constant buffer, constant buffer descriptor, or binding happening. Root descriptors have only one level of indirection, because they store a pointer to memory (descriptor -> memory), and descriptor tables have two levels of indirection (descriptor table -> descriptor -> memory).
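
To complete the picture, here is a hedged sketch of how the four root parameters above could be turned into an actual root signature object; the flags, error handling, and the device variable are illustrative assumptions, not part of the original listing.

// Sketch: serialize and create the root signature from the Param[] array above.
D3D12_ROOT_SIGNATURE_DESC rootDesc = {};
rootDesc.NumParameters = 4;
rootDesc.pParameters   = Param;
rootDesc.Flags         = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT; // assumption

ID3DBlob *serialized = nullptr;
ID3DBlob *errors     = nullptr;
D3D12SerializeRootSignature(&rootDesc, D3D_ROOT_SIGNATURE_VERSION_1, &serialized, &errors);

ID3D12RootSignature *rootSignature = nullptr;
device->CreateRootSignature(0, serialized->GetBufferPointer(), serialized->GetBufferSize(),
                            IID_PPV_ARGS(&rootSignature));

// the root signature must be set on the command list before the SetGraphicsRoot* calls above
cmdList->SetGraphicsRootSignature(rootSignature);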

Descriptors live in different heaps depending on their types, such as SV and CBV/SRV/UAV. This is due to wildly inconsistent sizes of descriptor types on different hardware platforms. For each type of descriptor heap, there should be only one heap allocated because changing heaps could be expensive.

In general DirectX 12 offers an allocation of more than one million descriptors upfront, enough for a whole game level. While previous DirectX versions dealt with allocations in the driver on their own terms, with DirectX 12 it is possible to avoid any allocations during runtime. That means any initial allocation of a descriptor can be taken out of the performance “equation.”
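
As a hedged illustration of that "allocate once, up front" advice, the following sketch creates a single large shader-visible CBV/SRV/UAV heap and computes the handle of an arbitrary slot; the heap size and the slot-indexing helper are assumptions, not the only valid layout.

#include <d3d12.h>

// Sketch: allocate one large shader-visible CBV/SRV/UAV descriptor heap up front.
ID3D12DescriptorHeap* CreateBigDescriptorHeap(ID3D12Device *device)
{
    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    heapDesc.NumDescriptors = 1000000;   // assumption: sized once for the whole level
    heapDesc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    ID3D12DescriptorHeap *heap = nullptr;
    device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&heap));
    return heap;
}

// Individual descriptors are addressed by offsetting the heap start with a
// hardware-specific increment, so no further heap allocations are needed at runtime.
D3D12_CPU_DESCRIPTOR_HANDLE DescriptorAt(ID3D12Device *device, ID3D12DescriptorHeap *heap, UINT index)
{
    UINT increment = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    D3D12_CPU_DESCRIPTOR_HANDLE handle = heap->GetCPUDescriptorHandleForHeapStart();
    handle.ptr += static_cast<SIZE_T>(index) * increment;
    return handle;
}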

Note: With 3rd generation Intel® Core™ processors (code-name Ivy Bridge)/4th generation Intel® Core™ processor family (code-name Haswell) and DirectX 11 and the Windows Display Driver Model (WDDM) version 1.x, resources were dynamically mapped into memory based on the resources referenced in the command buffer with a page table mapping operation. This way copying data was avoided. The dynamic mapping was important because those architectures only offer 2 GB of memory to the GPU (Intel® Xeon® processor E3-1200 v4 product family (code-name Broadwell) offers more).
With DirectX 12 and WDDM version 2.x, it is no longer possible to remap resources into the GPU virtual address space as necessary, because resources have to be assigned a static virtual address when created and therefore the virtual address of resources cannot change after creation. Even if a resource is “evicted” from GPU memory, it maintains its virtual address for later when it is made resident again.
Therefore the overall available memory of 2 GB in Ivy Bridge/Haswell can become a limiting factor.

As stated in the previous article, a perfectly reasonable outcome for an application might be a combination of all types of bindings: root constants, root descriptors, descriptor tables for descriptors gathered on-the-fly as draw calls are issued, and dynamic indexing of large descriptor tables.

Different hardware architectures will show different performance trade-offs between using sets of root constants and root descriptors versus using descriptor tables. Therefore it might be necessary to tune the ratio between root parameters and descriptor tables depending on the hardware target platforms.

Expected Patterns of Change

To understand which kinds of change incur an additional cost, we have to analyze first how game engines typically change data, descriptors, descriptor tables, and root signatures.

Let’s start with what is called constant data. Most game engines usually store all constant data in “system memory.” The game engine changes the data in CPU-accessible memory, and later in the frame a whole block of constant data is copied/mapped into GPU memory and then read by the GPU through a constant buffer view or through a root descriptor.

If the constant data is provided through SetGraphicsRoot32BitConstants() as a root constant, the entry in the root signature does not change but the data might change. If it is provided through a CBV (a descriptor) referenced by a descriptor table, the descriptor doesn’t change but the data might change.

In case we need several constant buffer views—for example, for double or triple buffered rendering— the CBV or descriptor might change for each frame in the root signature.
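
A minimal sketch of that per-frame scheme follows, under the assumption that there is one 256-byte-aligned constant block per in-flight frame in an upload buffer, that the descriptor heap comes from a setup like the one sketched earlier, and that root parameter 0 is the relevant descriptor table.

#include <d3d12.h>

// Sketch: create one CBV per in-flight frame at consecutive heap slots ...
void CreatePerFrameCBVs(ID3D12Device *device, ID3D12DescriptorHeap *heap,
                        ID3D12Resource *constantUploadBuffer, UINT frameCount)
{
    const UINT alignedSize = 256;  // CBV sizes must be multiples of 256 bytes
    UINT increment = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    D3D12_CPU_DESCRIPTOR_HANDLE handle = heap->GetCPUDescriptorHandleForHeapStart();

    for (UINT frame = 0; frame < frameCount; ++frame) {
        D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = {};
        cbvDesc.BufferLocation = constantUploadBuffer->GetGPUVirtualAddress() + frame * alignedSize;
        cbvDesc.SizeInBytes    = alignedSize;
        device->CreateConstantBufferView(&cbvDesc, handle);   // one CBV per in-flight frame
        handle.ptr += increment;
    }
}

// ... and each frame, point the descriptor table at the current frame's CBV
// instead of rewriting any descriptors.
void BindCurrentFrame(ID3D12GraphicsCommandList *cmdList, ID3D12Device *device,
                      ID3D12DescriptorHeap *heap, UINT frameIndex)
{
    UINT increment = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    D3D12_GPU_DESCRIPTOR_HANDLE handle = heap->GetGPUDescriptorHandleForHeapStart();
    handle.ptr += static_cast<UINT64>(frameIndex) * increment;
    cmdList->SetGraphicsRootDescriptorTable(0, handle);       // assumption: table is root parameter 0
}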

For texture data, it is expected that the texture is allocated in GPU memory during startup. Then an SRV (a descriptor) is created and stored in a descriptor table, or a static sampler is defined, and then referenced in the root signature. The data and the descriptor or static sampler do not change after that.

For dynamic data like changing texture or buffer data (for example, textures with rendered localized text, buffers of animated vertices or procedurally generated meshes), we allocate a render target or buffer, provide an RTV or UAV, which are descriptors, and then these descriptors might not change from there on. The data in the render target or buffer might change.

In case we need several render targets or buffers—for example, for double or triple buffered rendering—the descriptors might change for each frame in the root signature.

For the following discussion, a change is considered important for binding resources if it does the following:

  • Changes/replaces a descriptor in a descriptor table, for example, the CBVs, RTVs, or UAVs described above
  • Changes any entry in the root signature

Descriptors in Descriptor Tables with Haswell/Broadwell

On platforms based on Haswell/Broadwell, the cost of changing one descriptor table in the root signature is equivalent to changing all descriptor tables. Changing one argument means that the hardware has to make a copy (version) of all the current arguments. The number of root parameters in a root signature is the amount of data that the hardware has to version when any subset changes.

Note: All the other types of memory in DirectX 12, like descriptor heaps, buffer resources, and so on, are not versioned by hardware.

In other words, changing all of the parameters is roughly the same cost as just changing one (see [Lauritzen] and [MSDN]). Changing none is still the cheapest, but not that useful.

Note: Other hardware that has, for example, a split between fast and slow (spill) root argument storage only has to version the region of memory where the argument changed: either the fast area or the spill area.

On Haswell/Broadwell, an additional cost of changing descriptor tables can come from the limited size of the binding table in hardware.

Descriptor tables on those hardware platforms use “binding table” hardware. Each binding table entry is a single DWORD that can be considered an offset into the descriptor heap. Binding tables are allocated from a 64 KB ring, so the ring can store 16,384 binding table entries (64 KB / 4 bytes per entry).

In other words the amount of memory consumed per draw call is dependent on the total number of descriptors that are indexed in a descriptor table and then referenced through a root signature.

In case we run out of the 64 KB memory for the binding table entries, the driver will allocate another 64 KB binding table. The switch between those tables leads to a pipeline stall as shown in Figure 2.

Pipeline stall (courtesy of Andrew Lauritzen)

Figure 2. Pipeline stall (courtesy of Andrew Lauritzen).

For example, if a root signature references 64 descriptors in a descriptor table, the stall will happen every 16,384 / 64 = 256 draw calls.

Because changing a root signature is considered cheap, having multiple root signatures with a low number of descriptors in the descriptor table is favorable over having root signatures with a larger amount of descriptors in the descriptor table.

Therefore it is favorable on Haswell/Broadwell to keep the number of descriptors referenced in descriptor tables as low as possible.

What does this mean for renderer designs? Using more descriptor tables with fewer descriptors each, and therefore more root signatures, should increase the number of pipeline state objects (PSOs): with an increased number of root signatures, the number of PSOs needs to increase because of the one-to-one relationship between the two.

Having more pipeline state objects might lead to a larger number of shaders that, in this case, might be more specialized, instead of longer shaders that offer a wider range of features, which is the common recommendation.
 

Root Constants/Descriptors on Haswell/Broadwell

Just as changing one descriptor table costs the same as changing all of them, changing one root constant or root descriptor is equivalent to changing all of them (see [Lauritzen]).

Root constants are implemented as “push constants,” a buffer that the hardware uses to prepopulate Execution Unit (EU) registers. Because the values are immediately available when the EU thread launches, it can be a performance win to store constant data as root constants instead of storing it in descriptor tables.

Root descriptors are implemented as “push constants” as well. They are just pointers passed as constants to the shader, reading data through the general memory path.

Descriptor Tables versus Root Constants/Descriptors on Haswell/Broadwell

Now that we have looked at the way descriptor tables, root constants, and root descriptors are implemented, we can answer the main question of this article: is one favorable over the other? Because of the limited size of the binding table hardware and the potential stalls resulting from crossing this limit, changing root constants and root descriptors is expected to be cheaper on Haswell/Broadwell hardware because they do not use the binding table hardware. For root descriptors and root constants, this is especially recommended when the data changes with every draw call.
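
A hedged sketch of that per-draw pattern is shown below, reusing the root parameter indices from the earlier root signature listing (3 for the root constants, 2 for the root SRV); the data layout, buffer stride, and draw parameters are assumptions.

#include <d3d12.h>
#include <vector>

struct PerDrawConstants { UINT materialId; UINT flags; float scale; float bias; };  // assumed layout, 4 x 32 bit

void DrawWithRootArguments(ID3D12GraphicsCommandList *cmdList,
                           const std::vector<PerDrawConstants> &objects,
                           D3D12_GPU_VIRTUAL_ADDRESS perObjectBuffer)  // assumption: 256-byte stride per object
{
    for (size_t i = 0; i < objects.size(); ++i) {
        // root constants: values land directly in EU registers, no binding table entry is consumed
        cmdList->SetGraphicsRoot32BitConstants(3, 4, &objects[i], 0);

        // root descriptor: just a GPU virtual address, also bypasses the binding table
        cmdList->SetGraphicsRootShaderResourceView(2, perObjectBuffer + i * 256);

        cmdList->DrawInstanced(36, 1, 0, 0);  // illustrative draw call
    }
}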

Static Samplers on Haswell/Broadwell

As described in the previous article, it is possible to define samplers in the root signature or right in the shader with HLSL root signature language. These are called static samplers.

On Haswell/Broadwell hardware, the driver will place static samplers in the regular sampler heap. This is equivalent to putting them into descriptors manually. Other hardware implements samplers in shader registers, so static samplers can be compiled directly into the shader.

In general static samplers should be a win on many platforms, so there is no downside to using them. On Haswell/Broadwell hardware there is still the chance that by increasing the number of descriptors in a descriptor table, we end up more often with a pipeline stall, because descriptor table hardware has only 16,384 slots to offer.

Here is the syntax for a static sampler in HLSL:

StaticSampler( sReg,
               [ filter = FILTER_ANISOTROPIC,
               addressU = TEXTURE_ADDRESS_WRAP,
               addressV = TEXTURE_ADDRESS_WRAP,
               addressW = TEXTURE_ADDRESS_WRAP,
               mipLODBias = 0.f,     maxAnisotropy = 16,
               comparisonFunc = COMPARISON_LESS_EQUAL,
               borderColor = STATIC_BORDER_COLOR_OPAQUE_WHITE,
               minLOD = 0.f, maxLOD = 3.402823466e+38f,
               space = 0, visibility = SHADER_VISIBILITY_ALL ])

Most of the parameters are self-explanatory because they are similar to the C++ level usage. The main difference is the border color: the C++ level offers a full color range, while the HLSL level is restricted to opaque white, opaque black, and transparent black. An example of a static sampler is:

StaticSampler(s4, filter=FILTER_MIN_MAG_MIP_LINEAR)

Skylake

Skylake allows dynamic indexing of the entire descriptor heap (~1 million resources) in one descriptor table. That means one descriptor table could be enough to index all the available descriptor heap memory.

Compared to previous architectures, it is not necessary to change descriptor table entries in the root signature as often. That also means that the number of root signatures can be reduced. Obviously different materials will require different shaders and therefore different PSOs. But those PSOs can reference the same root signatures.

With modern rendering engines using fewer shaders than their DirectX 9 and 11 ancestors, so that they can avoid the cost of changing shaders and the attached states, reducing the number of root signatures and therefore the number of PSOs is favorable and should result in a performance gain on any hardware platform.

Conclusion

Focusing on Haswell/Broadwell and Skylake, the recommendations for developing performant DirectX 12 applications depend on the underlying platform. While for Haswell/Broadwell the number of descriptors in a descriptor table should be kept low, for Skylake it is recommended to keep this number high and decrease the number of descriptor tables.

To achieve optimal performance, the application programmer can check during startup for the type of hardware and then pick the most efficient resource binding pattern, as sketched below. (There is a GPU Detect example that shows how to detect different Intel® hardware architectures at https://software.intel.com/en-us/articles/gpu-detect-sample/.) The choice of resource binding pattern will influence how shaders for the system are written.
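
A rough sketch of such a startup check follows; the device-ID classification is left as a hypothetical helper standing in for the device-ID tables in the GPU Detect sample linked above, and the strategy enum is an assumption of this sketch.

#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

enum class BindingStrategy { FewLargeDescriptorTables, SmallTablesAndRootArguments };

// hypothetical helper: maps Intel device IDs to architectures, see the GPU Detect sample
bool IsSkylakeOrLater(UINT deviceId);

BindingStrategy PickBindingStrategy()
{
    ComPtr<IDXGIFactory1> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return BindingStrategy::SmallTablesAndRootArguments;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.VendorId == 0x8086) {   // Intel
            return IsSkylakeOrLater(desc.DeviceId)
                 ? BindingStrategy::FewLargeDescriptorTables       // Skylake: index one big table
                 : BindingStrategy::SmallTablesAndRootArguments;   // Haswell/Broadwell: keep tables small
        }
    }
    return BindingStrategy::SmallTablesAndRootArguments;           // conservative default
}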

About the Author

Wolfgang is the CEO of Confetti. Confetti is a think tank for advanced real-time graphics research and a service provider for the video game and movie industries. Before cofounding Confetti, Wolfgang worked as the lead graphics programmer in Rockstar's core technology group RAGE for more than four years. He is the founder and editor of the ShaderX and GPU Pro book series, a Microsoft MVP, the author of several books and articles on real-time rendering, and a regular contributor to websites and conferences worldwide. One of the books he edited, ShaderX4, won the Game Developer Front Line Award in 2006. Wolfgang serves on many advisory boards throughout the industry; one of them is Microsoft's Graphics Advisory Board for DirectX 12. He is an active contributor to several future standards that drive the game industry. You can find him on Twitter at @wolfgangengel. Confetti's website is www.conffx.com.

Acknowledgement

I would like to thank the reviewers of this article:

  • Andrew Lauritzen
  • Robin Green
  • Michal Valient
  • Dean Calver
  • Juul Joosten
  • Michal Drobot

References and Related Links

** Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

How Intugine Integrated the Nimble* Gesture Recognition Platform with Intel® RealSense™ Technology


Shwetha Doss, Senior Application Engineer, Intel Corporation

Harshit Shrivastava, Founder and CEO, Intugine Technologies

Abstract

Intel® RealSense™ technology helps developers enable a natural user interface (NUI) for their gesture recognition platforms. The gesture recognition platform seamlessly integrates with Intel RealSense technology for NUI across segments of applications on Microsoft Windows* platforms. The gesture recognition platform handles all interactions with the user and the Intel® RealSense™ SDK, ensuring that no code changes are required for individual applications.

This paper highlights how Intugine (http://www.intugine.com/) enabled its gesture recognition platforms for Intel® RealSense™ technology. It also discusses how the same methodology can be applied to other applications related to games and productivity applications.

Introduction

Intel® RealSense™ technology adds “human-like” senses to computing devices. Intel® is working with OEMs to create future computing devices that will be able to hear, see, and feel the environment, as well as understand human emotion and a human’s sensitivity to context. These devices will interact with humans in immersive, natural, and intuitive ways.

Intel® RealSense™ technology understands four important modes of communication: hands, the face, speech, and the environment around you. This multi-modal processing will enable the devices to behave more like humans.

The Intel® RealSense™ Camera

The Intel® RealSense™ camera uses depth-sensing technology so that computing devices see more like you do. To harness the possibilities of Intel® RealSense™ technology, developers need to use the Intel® RealSense™ SDK along with the Intel® RealSense™ camera. There are two camera options: the F200 and the R200. These Intel-developed depth cameras support full VGA depth resolution and full 1080p RGB resolution, and require USB 3.0. Both cameras support depth and IR processing at 640×480 resolution at 60 frames per second (FPS).

There are many OEM devices with integrated Intel® RealSense™ cameras available, including Ultrabooks*, tablets, notebooks, 2 in1s, and all-in-one form factors.

Gesture Recognition Platform

Figure 1. Intel® RealSense™ cameras.

The Intel® RealSense™ camera (F200)

Figure 2. The Intel® RealSense™ camera (F200).

The infrared (IR) laser projector on the Intel RealSense camera (F200) sends non-visible patterns (coded light) onto the object. The IR camera captures the reflected patterns. These patterns are processed by the ASIC, which assigns depth values to each pixel to create a depth video frame.

Applications see both depth and color video streams. The ASIC syncs the depth stream with the color stream (texture mapping) using a UVC time stamp and generates data flags for each depth value (valid, invalid, or motion detected). The range of the F200 camera is about 120 cm.

The Intel® RealSense™ camera (R200)

Figure 3. The Intel® RealSense™ camera (R200).

The R200 camera actually contains three cameras, providing RGB (color) and stereoscopic IR to produce depth. With the help of a laser projector, the camera does 3D scanning for scene perception and enhanced photography. The indoor range is approximately 0.5–3.5 meters, and the outdoor range is up to 10 meters.

Intel® RealSense™ SDK

The Intel® RealSense™ SDK includes a set of pattern detection and recognition algorithm implementations exposed through standardized interfaces. These algorithm implementations shift the application developer’s focus from coding algorithm details to innovating on the usage of these algorithms.

Intel® RealSense™ SDK Architecture

The SDK library architecture consists of several components. The essence of the SDK functionality lies in the I/O modules and the algorithm modules. The I/O modules retrieve input from an input device or send output to an output device.

The algorithm module includes various pattern detection and recognition algorithms related to face recognition, gesture recognition, and speech recognition.


Figure 4. The Intel® RealSense™ SDK architecture.


Figure 5. The Intel® RealSense™ SDK provides 78-point face landmarks.


Figure 6. The Intel® RealSense™ SDK provides skeletal tracking.

Intugine Nimble*

Intugine Nimble* is a high-accuracy, motion-sensing wearable device. The setup consists of a USB sensor and two wearable devices: a ring and a finger clip. The sensor tracks the movement of the rings in 3D space with sub-millimeter accuracy and low latency. The device is based on computer vision: the rings emit a specific pattern in a narrow wavelength band, and the sensor is filtered to see only that wavelength. The software algorithm running on the host device recognizes the emitted pattern and tracks each ring individually. The software generates the coordinates of the rings at a high frame rate of over 60 coordinates per second for each ring.


Figure 7. The Intugine Nimble* effectively replaces the mouse and keyboard.


Applications With Nimble

Some of the available applications that Nimble can control are games such as Fruit Ninja*, Angry Birds*, and Counter-Strike* and utility applications such as Microsoft PowerPoint* and media players. These available applications are currently controlled by mouse and keyboard inputs. To control them with Nimble, we need to generate the keyboard and mouse events programmatically.

The software module that takes care of the keyboard and mouse events is called the interaction layer. Nimble uses a proprietary software interaction layer to interact with existing games and applications. The interaction layer maps the user’s fingertip coordinates to the application/OS recognizable mouse and keyboard events.

Nimble with the Intel® RealSense™ SDK

The Intel® RealSense™ SDK can detect IR emissions at 860 nm. The patterned emission of the Nimble rings can be customized to a certain wavelength range. By replacing the emission source in the ring with an 860 nm emitter, the ring emits the same patterns in the 860 nm range. The Intel® RealSense™ SDK can sense these emissions, which can be captured as an image stream and then tracked using the SDK. By implementing the Nimble pattern recognition and tracking algorithms on top of the Intel® RealSense™ SDK, we get the coordinates of individual rings at 60 FPS.

The Intel® RealSense™ SDK’s design avoids most lens and curvature defects, which allows better-scaled motion tracking of the Nimble rings. The IR resolution of 640×480 generates refined spatial coordinate information. The Intel® RealSense™ SDK supports up to 300 FPS in the IR stream, which provides almost zero latency in Nimble’s tracking and an extremely responsive experience.

Nimble technology is designed to track only the emissions of rings and thus misses the details of skeletal tracking that might be required for a few applications.


Figure 8. The Intugine Nimble* along with Intel® RealSense™ technology.

Value proposition for Intel® RealSense™ Technology

Nimble along with Intel® RealSense™ technology can support a wide range of existing applications. Currently over 100 applications are working seamlessly without needing any source-code modifications. And potentially most of the Microsoft* Windows and Android* applications can work with this solution.

Currently the Intel® RealSense™ camera (F200) supports a range of 120 cm. With the addition of Nimble, this range can extend to over 15 feet.

Nimble allows sub-millimeter accurate finger tracking within a range of 3 feet and sub-centimeter accurate tracking within a range of 15 feet. This enables many high-accuracy games and applications to be used with better control.

Nimble along with Intel® RealSense™ technology reduces the application latency to less than 5 milliseconds.

Nimble along with Intel® RealSense™ technology can support multiple rings together; we have tested up to eight rings with Intel® RealSense™ technology.

Summary

Nimble’s interaction layer along with Intel® RealSense™ technology can help add gesture support to any application without any changes to the source code. Using this technology, applications on Windows* and Android* platforms can add gesture support with minimal effort.

For More Information

  1. Intel® RealSense™ technology: http://www.intel.in/content/www/in/en/architecture-and-technology/realsense-overview.html
  2. Intugine: http://www.intugine.com/
  3. https://software.intel.com/en-us/articles/realsense-r200-camera

Intel® C++ Composer XE 2013 SP1 for Windows*, Update 6


Intel® C++ Composer XE 2013 SP1 Update 6 includes the latest Intel C/C++ compilers and performance libraries for IA-32 and Intel® 64 architecture systems. This new product release now includes: Intel® C++ Compiler XE Version 14.0.6, Intel® Math Kernel Library (Intel® MKL) Version 11.1 Update 4, Intel® Integrated Performance Primitives (Intel® IPP) Version 8.1 Update 1, Intel® Threading Building Blocks (Intel® TBB) Version 4.2 Update 5, and Intel® Debugger Extension 7.5-1.0 for Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

New in this release:

Note: For more information on the changes listed above, please read the individual component release notes. See the previous release's ReadMe to see what was new in that release.

Resources

Contents
File: w_ccompxe_online_2013_sp1.6.241.exe
Online installer

File: w_ccompxe_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications

File:  w_ccompxe_redist_msi_2013_sp1.6.241.zip
Redistributable Libraries for 32-bit and 64-bit msi files

File:  get-ipp-8.1-crypto-library.htm
Cryptography Library

Intel® Visual Fortran Composer XE 2013 SP1 for Windows* with Microsoft Visual Studio 2010 Shell & Libraries*, Update 6


Intel® Visual Fortran Composer XE 2013 SP1 Update 6 includes the latest Intel Fortran compilers and performance libraries for IA-32 and Intel® 64 architecture systems. This new product release now includes: Intel® Visual Fortran Compiler XE Version 14.0.6, Intel® Math Kernel Library (Intel® MKL) Version 11.1 Update 4, Intel® Debugger Extension 7.5-1.0 for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)

New in this release:

Note: For more information on the changes listed above, please read the individual component release notes. See the previous release's ReadMe to see what was new in that release.

Resources

Contents
File:  w_fcompxe_online_2013_SP1.6.241.exe
Online installer

File:  w_fcompxe_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, English version)

File:  w_fcompxe_all_jp_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, Japanese version)

File:  w_fcompxe_redist_msi_2013_sp1.6.241.zip 
Redistributable Libraries for 32-bit and 64-bit msi files

Intel® Visual Fortran Composer XE 2013 SP1 for Windows* with IMSL*, Update 6


Intel® Visual Fortran Composer XE 2013 SP1 Update 6 includes the latest Intel Fortran compilers and performance libraries for IA-32 and Intel® 64 architecture systems. This new product release now includes: Intel® Visual Fortran Compiler XE Version 14.0.6, Intel® Math Kernel Library (Intel® MKL) Version 11.1 Update 4, Intel® Debugger Extension 7.5-1.0 for Intel® Many Integrated Core Architecture (Intel® MIC Architecture), IMSL* Fortran Numerical Library Version 7.0.1

New in this release:

Note: For more information on the changes listed above, please read the individual component release notes. See the previous release's ReadMe to see what was new in that release.

Resources

Contents
File:  w_fcompxe_online_2013_sp1.6.241.exe
Online installer

File:  w_fcompxe_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, English version)

File:  w_fcompxe_all_jp_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, Japanese version)

File:  w_fcompxe_redist_msi_2013_sp1.6.241.zip 
Redistributable Libraries for 32-bit and 64-bit msi files

File:  w_fcompxe_imsl_2013_sp1.0.024.exe 
IMSL* Library for developing 32-bit and 64-bit applications

3D People Full-Body Scanning System With Intel® RealSense™ 3D Cameras and Intel® Edison: How We Did It


By Konstantin Popov of Cappasity

Cappasity has been developing 3D scanning technologies for two years. This year we are going to release a scanning software product for Ultrabook™ devices and tablets with Intel® RealSense™ cameras: Cappasity Easy 3D Scan*. Next year we plan to create hardware and software solutions to scan people and objects. 
 
As an Intel® Software Innovator, and with the help of the Intel team, we were invited to show the prototype of the people scanning system much earlier than planned. We had limited time for preparations, but we still decided to take on the challenge. In this article I'll explain how we created our demo for the Intel® Developer Forum 2015, held August 18–20 in San Francisco.

Cappasity instant 3D body scan

Our demo is based upon previously developed technology that combines multiple depth cameras and RGB cameras into a single scanning system (U.S. Patent Pending). The general concept is as follows: we calibrate the positions, angles, and optical properties of the cameras. This calibration allows us to merge the data for subsequent reconstruction of the 3D model. To capture the scene in 3D, we can place the cameras around the scene, rotate the camera system around the scene, or rotate the scene itself in front of the cameras.
 
We selected the Intel® RealSense™ camera because we believe that it's an optimum value-for-money solution for our B2B projects. At present we are developing two prototype systems using several Intel® RealSense™ cameras: a scanning box with several 3D cameras for instant scanning and a system for full-body people scanning.
 
We demonstrated both prototypes at IDF 2015. The people scanning prototype operated with great success for the three days of the conference, scanning many visitors who came to our booth.

A system for full-body people scanning

Now let's see how it works. We attached three Intel® RealSense™ cameras to a vertical bar so that the bottom camera is aimed at the feet and lower legs, the middle camera captures the legs and the body, and the top-most camera films the head and the shoulders.


Each camera is connected to a separate Intel® NUC computer, and all the computers are connected to the local area network.
 
Since the cameras are mounted onto a fixed bar, we used a rotating table to rotate the person being filmed. The table construction is quite basic: a PLEXIGLAS* pad, roller bearings, and a step motor. The table is connected to the PC via an Intel® Edison board; it receives commands through the USB port.



We also used a simple lighting system to steadily illuminate the front of the person being filmed. In the future, all these components will be built into a single box, but at present we were demonstrating an early prototype of the scanning system, so we had to assemble everything using commercially available components.

Cappasity fullbody scan

Our software operates based on the client-server architecture, but the server part can be run on almost any modern PC. That is, any computer that performs our calculations is a "server" in our system. We often use an ordinary Ultrabook® with Intel® HD Graphics as a server. The server sends the recording command to the Intel® NUC computers, gets the data from them, then analyzes and rebuilds the 3D model. 
 
Now, let's look at some particular aspects of the task we are trying to solve. The 3D rebuilding technology that we use in the Cappasity products is based upon our implementation of the Kinect* Fusion algorithm. But in this case our challenge was much more complex: we had only one month to create an algorithm to reconstruct the data from several sources. We called it "Multi-Fusion." In its present state the algorithm can merge the data from an unlimited number of sources into a single voxel volume. For scanning people three data sources were enough.
 
Calibration is the first stage. The Cappasity software allows the devices to be calibrated pairwise. Our studies from the year we spent in R&D came in pretty handy in preparation for IDF 2015. In just a couple of weeks we reworked the calibration procedure and implemented support for voxel volumes after Fusion. Previously the calibration process was more involved with processing the point cloud. The system needs to be calibrated just once, after the cameras are installed. Calibration takes no more than 5 minutes.
 
Then we had to come up with a data-processing approach, and after doing some research we chose post-processing. That is, first we record the data from all cameras, then we upload the data to the server via the network, and then we begin the reconstruction process. All cameras record color and depth streams. As a result, we have the complete data set for further processing. This is convenient considering that the post-processing algorithms are constantly improved; the ones we're using were written just a couple of days before IDF.
 
Compared to the Intel® RealSense™ camera (F200), the Intel® RealSense™ camera (long-range R200) performs better with black color and complex materials. We had few glitches in tracking. The most important thing, however, is that the cameras allow us to capture the images at the required range. We have optimized the Fusion reconstruction algorithm for OpenCL* to achieve good performance even on Intel® HD Graphics 5500 and later. To remove the noise we used Fusion plus additional data segmentation after a single mesh was composed.


High resolution texture mapping algorithm

In addition, we have refined the high-resolution texture mapping algorithm. We use the following approach: we capture the image at the full resolution of the color camera, and then we project the image onto the mesh. We are not using voxel color since it causes the texture quality to degrade. The projection method is quite complex to implement, but it allows us to use both built-in and external cameras as color sources. For example, the scanning box we are developing operates using DSLR cameras to get high-resolution textures, which is important for our e-commerce customers.
 
However, even the built-in Intel® RealSense™ cameras with RGB provide perfect colors. Here is a sample after mapping the textures:


We are developing a new algorithm to eradicate the texture shifting. We plan to have it ready by the release of our Easy 3D Scan software product. 
 
Our seemingly simple demo is based upon complex code that allows us to compete with expensive scanning systems in the USD 100K+ price range. The Intel® RealSense™ cameras are budget-friendly, which will help them revolutionize the B2B market.
 
Here are the advantages of our people scanning system:

  • It is an affordable solution, and it’s easy to set up and operate. Only the press of a button is needed.
  • Small size: the scanning system can be placed in retail areas, recreational centers, medical institutions, casinos, and so on.
  • The quality of the 3D models is suitable for 3D printing and for developing content for AR/VR applications.
  • The precision of the resulting 3D mesh is suitable for taking measurements.

 
We understand that the full potential of the Intel® RealSense™ cameras is yet to be uncovered. We are confident that at CES 2016 we'll be able to demonstrate significantly improved products.

Blend the Intel® RealSense™ Camera and the Intel® Edison Board with JavaScript*


Introduction

Smart devices can now connect to things we never before thought possible. This is being enabled by the Internet of Things (IoT), which allows these devices to collect and exchange data.

Intel has created Intel® RealSense™ technology, which includes the Intel® RealSense™ camera and the Intel® RealSense™ SDK. Using this technology, you can create applications that detect gesture and head movement, analyze facial data, perform background segmentation, read depth level, recognize and synthesize voice and more. Imagine that you are developing a super sensor that can detect many things. Combined with the versatile uses of the Intel® Edison kit and its output, you can build creative projects that are both useful and entertaining.

The Intel® RealSense™ SDK provides support for popular programming languages and frameworks such as C++, C#, Java*, JavaScript*, Processing, and Unity*. This means that developers can get started quickly using a programming environment they are familiar with.

Peter Ma’s article, Using an Intel® RealSense™ 3D Camera with the Intel® Edison Development Platform, presents two examples of applications using C#. The first uses the Intel® RealSense™ camera as input and the Intel® Edison board as output. The result is that if you spread your fingers in front of Intel® RealSense™ camera, it sends a signal to the Intel® Edison board to turn on the light.

In the second example, Ma reverses the flow, with the Intel® Edison board as input and the Intel® RealSense™ camera as output. The Intel® Edison board provides data that comes from a sensor to be processed and presents it to us through the Intel® RealSense™ camera as voice synthesis to provide more humanized data.

Ma’s project inspired me to build something similar, but using JavaScript* instead of C#. I used the Intel® RealSense™ SDK to read and send hand gesture data to a node.js server, which then sends the data to the Intel® Edison board to trigger a buzzer and LED that are connected to it.

About the Project

This project is written in JavaScript*. If you are interested in implementing only a basic gesture, the algorithm module is already in the Intel® RealSense™ SDK. It gives you everything you need.

Hardware

Requirements:

Intel® Edison board with the Arduino breakout board

The Intel® Edison board is a low-cost, general-purpose computer platform. It uses a 22nm dual-core Intel® Atom™ SoC running at 500 MHz. It supports 40 GPIOs and includes 1 GB LPDDR3 RAM, 4 GB eMMC storage, dual-band Wi-Fi, and Bluetooth, all in a small form factor.

The board runs the Linux* kernel and is compatible with Arduino, so it can run an Arduino implementation as a Linux* program.

 


Figure 1. Intel® Edison breakout board kit.

Grove Starter Kit Plus - Intel® XDK IoT Edition

Grove Starter Kit Plus - Intel® XDK IoT Edition is designed for the Intel® Galileo board Gen 2, but it is fully compatible with the Intel® Edison board via the breakout board kit.

The kit contains sensors, actuators, and shields, such as a touch sensor, light sensor, and sound sensor, and also contains an LCD display as shown in Figure 2. This kit is an affordable solution for developing an IoT project.

You can purchase the Grove Starter Kit Plus here: 


Figure 2. Grove* Starter Kit Plus - Intel® XDK IoT Edition

Intel® RealSense™ Camera

The Intel® RealSense™ camera is built for game interactions, entertainment, photography, and content creation with a system-integrated or a peripheral version. The camera’s minimum requirements are a USB 3.0 port, a 4th gen Intel Core processor, and 8 GB of hard drive space.

The camera (shown in Figure 3) features full 1080p color and a depth sensor, giving the PC a 3D visual and immersive experience.


Figure 3. Intel® RealSense™ camera

You can purchase the complete developer kit, which includes the camera, here.

GNU/Linux* server

A GNU/Linux* server is easy to set up. You can use an old computer or laptop, or you can put the server in the cloud. I used a cloud server running Ubuntu* Server. If you use a different Linux* distribution, just adapt the commands to your system.

Software

Before we start to develop the project, make sure you have the following software installed on your system. You can use the links to download the software.

Set Up the Intel® RealSense™ Camera

To set up the Intel® RealSense™ camera, connect the Intel® RealSense™ camera (F200) to a USB 3.0 port, and then install the driver for the camera connected to your computer. Navigate to the Intel® RealSense™ SDK location, and open the JavaScript* sample in your browser:

Install_Location\RSSDK\framework\JavaScript\FF_HandsViewer\FF_HandsViewer.html

After the file opens, the script checks which platform you have. While the script is checking your platform, click the link in your web browser to install the Intel® RealSense™ SDK WebApp Runtime.

When the installation is finished, restart your web browser, and then open the file again. You can check to see that the installation was a success by raising your hand in front of the camera. It should show your hand gesture data visualized on your web browser.

Gesture Set Up

The gesture data produced by the Intel® RealSense™ camera looks like the following:

{
  "timeStamp": 130840014702794340,
  "handId": 4,
  "state": 0,
  "frameNumber": 1986,
  "name": "spreadfingers"
}

This sends "name":"spreadfingers" to the server to be processed.

Next, we will write some JavaScript* code to stream gesture data from the Intel® RealSense™ camera to the Intel® Edison board through the node.js server.

Working with JavaScript*

Finally, we get to do some programming. I suggest that you first copy the whole folder somewhere else, because the default installation doesn’t allow the original folder to be modified.

Copy the FF_HandsViewer folder from this location and paste it somewhere else. The folder’s location is:

\install_Location\RSSDK\framework\JavaScript\FF_HandsViewer\

This also lets you create your own project folder and keep things organized.

Next, copy the realsense.js file from the location below and paste it inside the FF_HandsViewer folder:

Install_Location\RSSDK\framework\common\JavaScript

To make everything easier, let’s create one file named edisonconnect.js. This file will receive gesture data from the Intel® RealSense™ camera and send it to the node.js server. Remember to change the IP address in the socket variable so that it points to your node.js server’s IP address:

// var socket = io ('change this to IP node.js server');

var socket = io('http://192.168.1.9:1337');

function edisonconnect(data){
  console.log(data.name);
  socket.emit('realsense_signal',data);
}

Now for the most important step: modify sample.js so that when gesture data is created, it is intercepted and passed to edisonconnect.js. You don’t need to worry about CPU activity; this doesn’t consume much frame rate or RAM.

// retrieve the fired gestures
for (g = 0; g < data.firedGestureData.length; g++){
  $('#gestures_status').text('Gesture: ' + JSON.stringify(data.firedGestureData[g]));

  // add script start - passing gesture data to edisonconnect.js
	edisonconnect(data.firedGestureData[g]);
  // add script end
}

After the loop above is running and passing gesture data along, the code below finishes the main task of the JavaScript* program. You have to fix the realsense.js file path, and it is critical to link the socket.io and edisonconnect.js files:

<!DOCTYPE html>
<html>
<head>
  <title>Intel&reg; RealSense&trade; SDK JavaScript* Sample</title>
  <script src="https://aubahn.s3.amazonaws.com/autobahnjs/latest/autobahn.min.jgz"></script>
  <script src="https://promisejs.org/polyfills/promise-6.1.0.js"></script>
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
  <script src="realsense.js"></script>
  <script src="sample.js"></script>
  <script src="three.js"></script>
  <!-- add script start -->
  <script src="https://cdn.socket.io/socket.io-1.3.5.js"></script>
  <script src="edisonconnect.js"></script>
  <!-- add script end -->
  <link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>

The code is taken from the SDK sample and has been reduced to keep it simple and easy to follow. Its job is to send gesture data to the server: at this point the Intel® RealSense™ SDK has recognized the gesture and is ready to send it to the server.

Set Up the Server

We will use a GNU/Linux*-based server. I use an Ubuntu* server as the OS, but you can use any GNU/Linux* distribution that you are familiar with. We will skip the server installation section, because related tutorials are readily found on the Internet.

Log in as a root user through SSH to configure the server.

Since the server has just been installed, we need to update the repository list and upgrade the server. To do this, I will use the common commands found on Ubuntu distributions, but you can use similar commands depending on the GNU/Linux* distribution that you are using.

# apt-get update && apt-get upgrade

Once the repository list is updated, the next step is to install node.js.

# apt-get install nodejs

We also need to install npm Package Manager.

# apt-get install npm

Finally, install socket.io express from npm Package Manager.

# npm install socket.io express

Remember to create file server.js and index.html.

# touch server.js index.html

Edit the server.js file using your favorite text editor, such as vim or nano:

# vim server.js

Write down this code:

var express   = require("express");
var app   	= express();
var port  	= 1337;

app.use(express.static(__dirname + '/'));
var io = require('socket.io').listen(app.listen(port));
console.log("Listening on port " + port);

io.on('connection', function(socket){'use strict';
  console.log('a user connected from ' + socket.request.connection.remoteAddress);

	// Check realsense signal
	socket.on('realsense_signal', function(data){
  	socket.broadcast.emit('realsense_signal',data);
  	console.log('Hand Signal: ' + data.name);
	});
  socket.on('disconnect',function(){
	console.log('user disconnected');
  });
});

var port = 1337; assigns the server to port 1337. The console.log() calls indicate that the server is listening and whether data from the JavaScript* client has been received. The main line is socket.broadcast.emit('realsense_signal', data); once data is received, it is rebroadcast to all listening clients.

The last thing we need to do is run the server.js file with node. If "Listening on port 1337" is displayed as shown below, you have been successful.
# node server.js

root@edison:~# node server.js
Listening on port 1337
events.js:85

Set up the Intel® Edison Board

The Intel® Edison SDK is easy to deploy. Refer to the following documentation:

Now it's time to put the code onto the Intel® Edison board. This code connects to the server and listens for any broadcast that comes from it, much like the connect-and-listen step above. If gesture data is received, the Intel® Edison board switches its digital pins on or off.

Open the Intel® XDK IoT Edition and create a new project from Templates, using the DigitalWrite template, as shown in the screenshot below.

Edit line 9 in package.json by adding the socket.io-client dependency, as shown below. Declaring the dependency installs the socket.io client on the Intel® Edison board if it is not already present.

"dependencies": {"socket.io-client":"latest" // add this script
}
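For reference, a complete minimal package.json might look like the following (the name, description, and version values are placeholders):

{
  "name": "digitalwrite-realsense",
  "description": "Toggle Intel Edison GPIO pins from RealSense gestures",
  "version": "0.0.1",
  "main": "main.js",
  "dependencies": {
    "socket.io-client": "latest"
  }
}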

Open the file named main.js. The code first connects to the server to make sure the server is ready and listening. It then checks whether the received gesture data is named "spreadfingers", which sets digital pins 2 and 8 to 1 (On); any other gesture sets them back to 0 (Off).
Change the server IP address to match your own server. If you want to use different pins, make sure you change the mraa.Gpio(selectedpins) arguments as well.

var mraa = require("mraa");

var pins2 = new mraa.Gpio(2);
pins2.dir(mraa.DIR_OUT);

var pins8 = new mraa.Gpio(8);
pins8.dir(mraa.DIR_OUT);

var socket = require('socket.io-client')('http://192.168.1.9:1337');

socket.on('connect', function(){
  console.log('i am connected');
});

socket.on('realsense_signal', function(data){
  console.log('Hand Signal: ' + data.name);
  if(data.name=='spreadfingers'){
    pins2.write(1);
    pins8.write(1);
  } else {
    pins2.write(0);
    pins8.write(0);
  }
});

socket.on('disconnect', function(){
  console.log('i am not connected');
});

Select Install/Build, and then select Run after making sure the Intel® Edison board is connected to your computer.

Now make sure the server is up and running, and the Intel® RealSense camera and Intel® Edison board are connected to the Internet.

Conclusion

Using Intel® RealSense™ technology, this project modified the JavaScript* framework sample script to send captured gesture data to the Node.js server. But this project is only a beginning for more to come.

The code is straightforward: the server broadcasts the gesture data to every listening socket client. The Intel® Edison board, with socket.io-client installed, listens for broadcasts from the server, and the gesture named spreadfingers toggles the digital pins between 1 and 0.

The possibilities are endless. The Intel® RealSense™ camera is lightweight and easy to carry and use, and the Intel® Edison board is a capable embedded PC. Blending the Intel® Edison board and the Intel® RealSense™ camera with JavaScript* makes it easy to package, code, and build an IoT device. You can create something great and useful.

About the Author

Aulia Faqih - Intel® Software Innovator

Intel® RealSense™ Technology Innovator based in Yogyakarta, Indonesia, currently lecturing at UIN Sunan Kalijaga Yogyakarta. Love playing with Galileo / Edison, Web and all geek things.


Enabling IPP on OpenCV (Windows* and Linux* Ubuntu*)


To set up the environment (Windows* systems):

  • Configuration of OpenCV 3.0.0 – Enabling Intel® Integrated Performance Primitives (Intel® IPP)
    • Download OpenCV 3.0.0 (http://www.opencv.org/) and CMake 3.2.3 (http://www.cmake.org/download/).
    • Extract OpenCV to a location of your choice, then install and run CMake.
    • Add OpenCV's location as the source location and choose the location where you want the build to be created.
    • To enable IPP you have two options: you can use 'ICV', a special IPP build for OpenCV that is free, or you can use the IPP from an Intel® software tool suite (Intel® System Studio or Intel® Parallel Studio) if you have one.
    • To go with ICV, simply turn WITH_IPP on. The ICV package downloads automatically and the CMake configuration picks it up.
    • To enable IPP from an Intel® software suite instead, you need to manually add an entry for IPP on top of setting WITH_IPP. Click 'Add Entry', name it 'IPPROOT', choose PATH as its type, and enter the location where your IPP is installed (an equivalent command-line invocation is shown after this list).
    • If the configuration completes without a problem, you are ready to go.
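If you prefer the command line over the CMake GUI on Windows, an equivalent configuration might look like this (the IPPROOT value is only an example path; point it at your own IPP installation):

cmake -D WITH_IPP=ON -D IPPROOT="C:/path/to/your/IPP" <path-to-OpenCV-source>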

 

To set up the environment (Linux* Ubuntu* systems):

  • Configuration of OpenCV 3.0.0 – Enabling IPP
    • Download OpenCV 3.0.0 (http://www.opencv.org/).
    • Extract OpenCV to a location of your choice.
    • Open a terminal and go to where you extracted OpenCV.
    • As in the Windows* case, you can go with either ICV or IPP.
    • For ICV, type 'cmake -D WITH_IPP=ON .'
    • Example configuration result for ICV
    • For IPP, type 'cmake -D WITH_IPP=ON -D IPPROOT=<Your IPP Location> .'
    • Example configuration result for IPP
    • If the configuration went through without a problem, proceed and type 'make -j4'.
    • When the build is done, type 'make install' to finally install the library (a quick way to verify the result is shown after this list).
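To confirm that the library you just built really has IPP enabled, one quick check (a small sketch, not an official verification procedure) is to print OpenCV's build information and look for the "Use IPP" line:

// ipp_check.cpp - print the OpenCV build information; look for the "Use IPP:" entry in the output.
// Build with something like: g++ ipp_check.cpp -o ipp_check `pkg-config --cflags --libs opencv`
#include <iostream>
#include <opencv2/core.hpp>

int main() {
    std::cout << cv::getBuildInformation() << std::endl;
    return 0;
}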

 

Using the Intel® RealSense™ Camera with TouchDesigner*: Part 1


Download Demo Files ZIP 35KB

TouchDesigner*, created by Derivative, is a popular platform/program used worldwide for interactivity and real-time animations during live performances as well as rendering 3D animation sequences, building mapping, installations and recently, VR work. The support of the Intel® RealSense™ camera in TouchDesigner makes it an even more versatile and powerful tool. Also useful is the ability to import objects and animations into TouchDesigner from other 3D packages using .fbx files, as well as taking in rendered animations and images.

In this two-part article I explain how the Intel RealSense camera is integrated into and can be used in TouchDesigner. The demos in Part 1 use the Intel RealSense camera TOP node. The demos in Part 2 use the CHOP node. In Part 2, I also explain how to create VR and full-dome sequences in combination with the Intel RealSense camera. I show how TouchDesigner’s Oculus Rift node can be used in conjunction with the Intel RealSense camera. Both Part 1 and 2 include animations and downloadable TouchDesigner files, .toe files, which can be used to follow along. To get the TouchDesigner (.toe) files click on the button on the top of the article. In addition, a free noncommercial copy of TouchDesigner which is fully functional (except that the highest resolution has been limited to 1280 by 1280), is available.

Note: There are currently two types of Intel RealSense cameras, the short range F200, and the longer-range R200. The R200 with its tiny size is useful for live performances and installations where a hidden camera is desirable. Unlike the larger F200 model, the R200 does not have finger/hand tracking and doesn’t support "Marker Tracking." TouchDesigner supports both the F200 and the R200 Intel RealSense cameras.

To quote from the TouchDesigner web page, "TouchDesigner is revolutionary software platform which enables artists and designers to connect with their media in an open and freeform environment. Perfect for interactive multimedia projects that use video, audio, 3D, controller inputs, internet and database data, DMX lighting, environmental sensors, or basically anything you can imagine, TouchDesigner offers a high performance playground for blending these elements in infinitely customizable ways."

I asked Malcolm Bechard, senior developer at Derivative, to comment on using the Intel RealSense camera with TouchDesigner:

"Using TouchDesigner’s procedural node-based architecture, Intel RealSense camera data can be immediately brought in, visualized, and then connected to other nodes without spending any time coding. Ideas can be quickly prototyped and developed with an instant-feedback loop.Being a native node in TouchDesigner means there is no need to shutdown/recompile an application for each iteration of development.The Intel RealSense camera augments TouchDesigner capabilities by giving the users a large array of pre-made modules such as gesture, hand tracking, face tracking and image (depth) data, with which they can build interactions. There is no need to infer things such as gestures by analyzing the lower-level hand data; it’s already done for the user."

Using the Intel® RealSense™ Camera in TouchDesigner

TouchDesigner is a node-based platform/program that uses Python* as its main scripting language. There are six distinct categories of nodes that perform different operations and functions: TOP nodes (textures), SOP nodes (geometry), CHOP nodes (animation/audio data), DAT nodes (tables and text), COMP nodes (3D geometry nodes and nodes for building 2D control panels), and MAT nodes (materials). The programmers at Derivative, consulting with Intel programmers, designed two special nodes, the Intel RealSense camera TOP node and the Intel RealSense camera CHOP node, to integrate the Intel RealSense camera into the program.
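Because every node is also addressable from Python, you can inspect or drive the Intel RealSense camera TOP from a script. A minimal sketch (assuming your network contains an Intel RealSense camera TOP named realsense1), typed into the Textport or an Execute DAT, might look like this:

# Reference the Intel RealSense camera TOP by its node name and print its current resolution.
rs = op('realsense1')
print(rs.width, rs.height)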

Note: This article is aimed at those familiar with using TouchDesigner and its interface. If you are unfamiliar with TouchDesigner and plan to follow along with this article step-by-step, I recommend that you first review some of the documentation and videos available here:

Learning TouchDesigner

Note: When using the Intel RealSense camera, it is important to pay attention to its range for best results. On this Intel web page you will find the range of each camera and best operating practices for using it.

Intel RealSense Camera TOP Node

The TOP nodes in TouchDesigner perform many of the same operations found in a traditional compositing program. The Intel RealSense camera TOP node adds to these capabilities by utilizing the 2D and 3D data that the Intel RealSense camera feeds into it. The Intel RealSense camera TOP node has a number of setup settings for acquiring different forms of data.

  • Color. The video from the Intel RealSense camera color sensor.
  • Depth. A calculation of the depth of each pixel. 0 means the pixel is 0 meters from the camera, and 1 means the pixel is the maximum distance or more from the camera.
  • Raw depth. Values taken directly from the Intel® RealSense™ SDK. Once again, 0 means the pixel is 0 meters from the camera, and 1 means the pixel is at the maximum range or farther from the camera.
  • Visualized depth. A gray-scale image from the Intel RealSense SDK that can help you visualize the depth. It cannot be used to actually determine a pixel’s exact distance from the camera.
  • Depth to color UV map. The UV values from a 32-bit floating RG texture (note, no blue) that are needed to remap the depth image to line up with the color image. You can use the Remap TOP node to align the images to match.
  • Color to depth UV map. The UV values from a 32-bit floating RG texture (note, no blue) that are needed to remap the color image to line up with the depth image. You can use the Remap TOP node to align the two.
  • Infrared. The raw video from the infrared sensor of the Intel RealSense camera.
  • Point cloud. Literally a cloud of points in 3D space (x, y, and z coordinates) or data points created by the scanner of the Intel RealSense camera.
  • Point cloud color UVs. Can be used to get each point’s color from the color image stream.

Note: You can download that toe file, RealSensePointCloudForArticle.toe, to use as a simple beginning template for creating a 3D animated geometry from the data of the Intel RealSense camera. This file can be modified and changed in many ways. Together, the three Intel RealSense camera TOP nodes—the Point Cloud, the Color, and the Point Cloud Color UVs—can create a 3D geometry composed of points (particles) with the color image mapped onto it. This creates many exciting possibilities.


Point Cloud Geometry. This is an animated geometry made using the Intel RealSense camera. This technique would be exciting to use in a live performance. The audio of the character speaking could be added as well. TouchDesigner can also use the data from audio to create real-time animations.

Intel RealSense Camera CHOP Node

Note: There is also an Intel RealSense camera CHOP node that controls the 3D tracking/position data that we will discuss in Part 2 of this article.

Demo 1: Using the Intel RealSense Camera TOP Node

Click on the button on top of the article to get the First TOP Demo: settingUpRealNode2b_FINAL.toe

Demo 1, part 1: You will learn how to set up the Intel RealSense camera TOP node and then connect it to other TOP nodes.

  1. Open the Add Operator/OP Create dialog window.
  2. Under the TOP section, click RealSense.
  3. On the Setup parameters page for the Intel RealSense camera TOP node, for Image select Color from the drop-down menu. In the Intel RealSense camera TOP node, the image of what the camera is pointing to shows up, just as in a video camera.
  4. Set the resolution of the Intel RealSense Camera to 1920 by 1080.
     


    The Intel RealSense camera TOP node is easy to set up.

  5. Create a Level TOP and connect it to the Intel RealSense camera TOP node.
  6. In the Pre parameters page of the Level TOP Node, choose Invert and slide the slider to 1.
  7. Connect The Level TOP node to an HSV To RGB TOP node and then connect that to a Null TOP node.


The Intel RealSense camera TOP node can be connected to other TOP nodes to create different looks and effects.

Next we will put this created image into the Phong MAT (Material) so we can texture geometries with it.

Using the Intel RealSense Camera Data to Create Textures for Geometries

Demo 1, part 2: This exercise shows you how to use the Intel RealSense camera TOP node to create textures and how to add them into a MAT node that can then be assigned to the geometry in your project.

  1. Add a Geometry (geo) COMP node into your scene.
  2. Add a Phong MAT node.
  3. Take the Null TOP node and drag it onto the Color Map parameter of your Phong MAT node.
     


    The Phong MAT using the Intel RealSense camera data for its Color Map parameter.

  4. On the Render parameter page of your Geo COMP, for the Material parameter, type phong1 to make it use the phong1 node as its material.
     


    The Phong MAT using the Intel RealSense camera data for its Color Map added into the Render/Material parameter of the Geo COMP node.

Creating the Box SOP and Texturing it with the Just Created Phong Shader

Demo 1, part 3: You will learn how to assign the Phong MAT shader you created using the Intel RealSense camera data to a box Geometry SOP.

  1. Go into the geo1 node to its child level, (/project1/geo1).
  2. Create a Box SOP node, a Texture SOP node, and a Material SOP node.
  3. Delete the Torus SOP node that was there and connect the box1 node to the texture1 node and the material1 node.
  4. In the Material parameter of the material1 node enter ../phong1, which refers to the phong1 MAT node you created at the parent level.
  5. To put the texture on each face of the box, in the parameters of the texture1 node set Texture/Texture Type to Face and set Texture/Offset to .5 .5 .5.
     


    At the child level of the geo1 COMP node, the Box SOP node, the Texture SOP node, and the Material SOP node are connected. The Material SOP is now getting its texture info from the phong1 MAT node, which is at the parent level (../phong1).

Animating and Instancing the Box Geometry

Demo 1, part 4: You will learn how to rotate a Geometry SOP using the Transform SOP node and a simple expression. Then you will learn how to instance the Box geometry. We will end up with a screen full of rotating boxes with the textures from the Intel RealSense camera TOP node on them.

  1. To animate the box rotating on the x-axis, insert a Transform SOP node after the Texture SOP node.
  2. Put an expression into the x component (first field) of the Rotate parameter in the transform1 SOP node. This expression is not dependent on the frames so it will keep going and not start repeating when the frames on the timeline run out. I multiplied by 10 to increase the speed: absTime.seconds*10
     


    Here you can see how the cube is rotating.

  3. To make the boxes, go up to the parent level (/project1) and in the Instance page parameters of the geo1 COMP node, for Instancing change it to On.
  4. Add a Grid SOP node and a SOP to DAT node.
  5. Set the grid parameters to 10 Rows and 10 Columns and the size to 20 and 20.
  6. In the SOP to DAT node parameters, for SOP put grid1 and make sure Extract is set to Points.
  7. In the Instance page parameters of the geo1 COMP, for Instance CHOP/DAT enter: sopto1.
  8. Fill in the TX, TY, and TZ parameters with P(0), P(1), and P(2) respectively to specify which columns from the sopto1 node to use for the instance positions.
     


    Click on the button on top of the article to download this .toe file to see what we have done so far in this first Intel RealSense camera TOP demo.

    TOP_Demo1_forArticle.toe

  9. If you prefer to see the image in the Intel RealSense camera unfiltered, disconnect or bypass the Level TOP node and the HSV to RGB TOP node.
     

Rendering or Performing the Animation Live

Demo 1, part 5: You will learn how to set up a scene to be rendered and either performed live or rendered out as a movie file.

  1. To render the project, add in a Camera COMP node, a Light COMP node, and a Render TOP node. By default the camera will render all the Geometry components in the scene.
  2. Translate your camera about 20 units back on the z-axis. Leave the light at the default setting.
  3. Set the resolution of the render to 1920 by 1080. By default the background of a render is transparent (alpha of 0).
  4. To make this an opaque black behind the squares, add in a Constant TOP node and change the Color to 0,0,0 so it is black while leaving the Alpha as 1. You can choose another color if you want.
  5. Add in an Over TOP node and connect the Render TOP node to the first hook up and the Constant TOP node to the second hook up. This makes the background pixels of the render (0, 0, 0, 1), which is no longer transparent.

Another way to change the alpha of a TOP to 1 is to use a Reorder TOP and set its Output Alpha parameter to Input 1 and One.


Shows the rendered scene with the background being set to opaque black.


Here you can see the screen full of the textured rotating cubes.

If you prefer to render out the animation instead of playing it in real time in a performance, choose the Export Movie Dialog under File in the top bar of the TouchDesigner program. In the parameter for TOP Video, enter null2 for this particular example; otherwise enter whichever TOP node you want to render.


Here is the Export Movie panel, and null2 has been pulled into it. If I had an audio CHOP to go along with it, I would pull or place that into the CHOP Audio slot directly under where I put null2.

Demo 1, part 6: One of the things that makes TouchDesigner a special platform is the ability to do real-time performance animations with it. This makes it especially good when paired with the Intel RealSense Camera.

  1. Add a Window COMP node and in the operator parameter enter your null2 TOP node.
  2. Set the resolution to 1920 by 1080.
  3. Choose the Monitor you want in the Location parameter. The Window COMP node lets you perform the entire animation in real time projected onto the monitor you choose. Using the Window COMP node you can specify the monitor or projector you want the performance to be played from.
     


    You can create as many Window COMP nodes as you need to direct the output to other monitors.

Demo 2: Using the Intel RealSense Camera TOP Node Depth Data

The Intel RealSense camera TOP node has a number of other settings that are useful for creating textures and animation.

In demo 2, we use the depth data to apply a blur on an image based on depth data from the camera. Click on the button on top of the article to get this file: RealSenseDepthBlur.toe

First, create an Intel RealSense camera TOP and set its Image parameter to Depth. The depth image has pixels that are 0 (black) if they are close to the camera and 1 (white) if they are far away from the camera. The range of the pixel values is controlled by the Max Depth parameter, which is specified in meters. By default it has a value of 5, which means pixels 5 or more meters from the camera will be white. A pixel with a value of 0.5 will be 2.5 meters from the camera. Depending on how far the camera is from you, changing this value to something smaller may be appropriate. For this example we've changed it to 1.5 meters.
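In other words, a normalized depth pixel converts to a distance by multiplying it by Max Depth. A tiny worked example of the arithmetic (plain Python, just for illustration):

max_depth = 1.5                        # the Max Depth parameter, in meters
pixel_value = 0.5                      # a normalized sample from the depth image
distance_m = pixel_value * max_depth   # 0.75 meters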

Next we want to process the depth a bit to remove objects outside our range of interest, which we will do using a Threshold TOP.

  1. Create a Threshold TOP and connect it to the realsense1 node. We want to cull out pixels that are beyond a certain distance from the camera, so set the Comparator parameter to Greater and set the Threshold parameter to 0.8. This makes pixels that are greater than 0.8 (which is 1.2 meters or farther, since Max Depth in the Intel RealSense camera TOP is set to 1.5) become 0, and all other pixels become 1.
     

  2. Create a Multiply TOP and connect the realsense1 node to the first input and the thresh1 node to the second input. Multiplying the pixels we want to keep by 1 leaves them as-is, and multiplying the others by 0 makes them black. The multiply1 node now has non-zero pixels only in the part of the image that will control the blur we apply next.
  3. Create a Movie File in TOP, and select a new image for its File parameter. In this example we select Metter2.jpg from the TouchDesigner Samples/Map directory.
  4. Create a Luma Blur TOP and connect moviefilein1 to the 1st input of lumablur1 and multiply1 to the 2nd input of lumablur1.
  5. In the parameters for lumablur1 set White Value to 0.4, Black Filter Width to 20, and White Filter Width to 1. This makes pixels where the first input is 0 have a blur filter width of 20, and pixels with a value of 0.4 or greater have a blur width of 1.
     


    The whole layout.

The result is an image where the pixels where the user is located are not blurred while other pixels are blurry.


Turning on the display of the Luma Blur TOP shows how the background of the image is blurred.

Demo 3: Using the Intel RealSense Camera TOP Node Depth Data with the Remap TOP Node

Click on the button on the article top to get this file: RealSenseRemap.toe

Note: The depth and color cameras of the Intel RealSense camera TOP node are in different spots in the world so their resulting images by default do not line up. For example if your hand is positioned in the middle of the color image, it won’t be in the middle of the depth image, it will either be off to the left or right a bit. The UV remap fixes this by shifting the pixels around so they align on top of each other. Notice the difference between the aligned and unaligned TOPs.


The Remap TOP aligns the depth data from the Intel RealSense camera TOP with the color data from the Intel RealSense camera TOP, using the depth to color UV data, putting them in the same world space.

Demo 4: Using Point Cloud in the Intel RealSense Camera TOP Node

Click on the button on top of the article to get this file: PointCloudLimitEx.toe

In this exercise you learn how to create animated geometry using the Intel RealSense camera TOP node Point Cloud setting and the Limit SOP node. Note that this technique is different from the Point Cloud example file shown at the beginning of this article. The previous example uses GLSL shaders, which makes it possible to generate far more points, but it is more complex and out of the scope of this article.

  1. Create a RealSense TOP node and set the parameter Image to Point Cloud.
  2. Create a TOP to CHOP node and connect it to a Select CHOP node.
  3. Connect the Select CHOP node to a Math CHOP node.
  4. In the topto1 CHOP node parameter, TOP, enter: realsense1.
  5. In the Select CHOP node parameters, Channel Names, enter r g b leaving a space between the letters.
  6. In the math1 CHOP node for the Multiply parameter, enter: 4.2.
  7. On the Range parameters page, for To Range, enter: 1 and 7.
  8. Create a Limit SOP node.

To quote from the information on the www.derivative.ca online wiki page, "The Limit SOP creates geometry from samples fed to it by CHOPs. It creates geometry at every point in the sample. Different types of geometry can be created using the Output Type parameter on the Channels Page."

  1. In the limit1 SOP Channels parameters page, enter r in the X Channel, g in the Y Channel, and b in the Z Channel.
     

    Note: Switching the r, g, and b to different X, Y, or Z channels changes the geometry being generated. So you might want to try this later: In the Output parameter page, for Output Type select Sphere at Each Point from the drop-down. Create a SOP to DAT node. In the parameters page, for SOP put in limit1 or drag your limit1 SOP into the parameter. Keep the default setting of Points in the Extract parameter. Create a Render TOP node, a Camera COMP node, and a Light COMP node. Create a Reorder TOP, set its Output Alpha to Input 1 and One, and connect it to the Render TOP.


    As the image in the Intel RealSense camera changes, so does the geometry. This is the final layout.


    Final images in the Over TOP node. By changing the order of the channels in the Limit SOP parameters you change the geometry, which is based on the point cloud.

In Part 2 of this article we will discuss the Intel RealSense camera CHOP and how to create content both rendered and in real-time for performances, Full Dome shows, and VR. We will also show how to use the Oculus Rift CHOP node.

About the Author

Audri Phillips is a visualist/3d animator based out of Los Angeles, with a wide range of experience that includes over 25 years working in the visual effects/entertainment industry in studios such as Sony, Rhythm and Hues, Digital Domain, Disney, and Dreamworks feature animation. Starting out as a painter she was quickly drawn to time based art. Always interested in using new tools she has been a pioneer of using computer animation/art in experimental film work including immersive performances. Now she has taken her talents into the creation of VR. Samsung recently curated her work into their new Gear Indie Milk VR channel.

Her latest immersive work/animations include: Multi Media Animations for "Implosion a Dance Festival" 2015 at the Los Angeles Theater Center, 3 Full dome Concerts in the Vortex Immersion dome, one with the well-known composer/musician Steve Roach. She has a fourth upcoming fulldome concert, "Relentless Universe", on November 7th, 2015. She also created animated content for the dome show for the TV series, “Constantine” shown at the 2014 Comic-Con convention. Several of her Fulldome pieces, “Migrations” and “Relentless Beauty”, have been juried into "Currents", The Santa Fe International New Media Festival, and Jena FullDome Festival in Germany. She exhibits in the Young Projects gallery in Los Angeles.

She writes online content and a blog for Intel. Audri is an Adjunct professor at Woodbury University, a founding member and leader of the Los Angeles Abstract Film Group, founder of the Hybrid Reality Studio (dedicated to creating VR content), a board member of the Iota Center, and she is also an exhibiting member of the LA Art Lab. In 2011 Audri became a resident artist of Vortex Immersion Media and the c3: CreateLAB.

Caffe* Training on Multi-node Distributed-memory Systems Based on Intel® Xeon® Processor E5 Family


Deep neural network (DNN) training is computationally intensive and can take days or weeks on modern computing platforms. In the recent article, Single-node Caffe Scoring and Training on Intel® Xeon® E5 Family, we demonstrated a tenfold performance increase of the Caffe* framework on the AlexNet* topology and reduced the training time to 5 days on a single node. Intel continues to deliver on the machine learning vision outlined in Pradeep Dubey’s Blog, and in this technical preview, we demonstrate how the training time for Caffe can be reduced from days to hours in a multi-node, distributed-memory environment.

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and one of the most popular community frameworks for image recognition. Caffe is often used as a benchmark together with AlexNet*, a neural network topology for image recognition, and ImageNet*, a database of labeled images.

The Caffe framework does not support multi-node, distributed-memory systems by default and requires extensive changes to run on distributed-memory systems. We perform strong scaling of the synchronous minibatch stochastic gradient descent (SGD) algorithm with the help of Intel® MPI Library. Computation for one iteration is scaled across multiple nodes, such that the multi-threaded multi-node parallel implementation is equivalent to the single-node, single-threaded serial implementation.

We use three approaches—data parallelism, model parallelism, and hybrid parallelism—to scale computation. Model parallelism refers to partitioning the model, or weights, across nodes, such that parts of the weights are owned by a given node and each node processes all the data points in a minibatch. This requires communication of the activations and the gradients of activations, unlike data parallelism, which communicates weights and weight gradients. A schematic sketch of the data-parallel communication step follows.
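For intuition only, here is a minimal sketch (not taken from the technology preview package) of the core communication step in data parallelism: each MPI rank computes weight gradients on its share of the minibatch, and an allreduce sums them so that every rank applies the same weight update.

// data_parallel_sketch.cpp - illustrative only; not part of the package. Build with mpicxx, run with mpirun.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Pretend these gradients were computed from this rank's slice of the minibatch.
    float grad[4] = {0.1f, 0.2f, 0.3f, 0.4f};

    // Sum the weight gradients across all ranks, in place; every rank ends up with identical values.
    MPI_Allreduce(MPI_IN_PLACE, grad, 4, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        std::cout << "summed grad[0] = " << grad[0] << std::endl;

    MPI_Finalize();
    return 0;
}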

With this additional level of distributed parallelization, we trained AlexNet on the full ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC-2012) dataset and reached 80% top-5 accuracy in just over 5 hours on a 64-node cluster of systems based on Intel® Xeon® processor E5 family.

 

Getting Started

While we are working to incorporate the new functionality outlined in this article into future versions of Intel® Math Kernel Library (Intel® MKL) and Intel® Data Analytics Acceleration Library (Intel® DAAL), you can use the technology preview package attached to this article to reproduce the demonstrated performance results and even train AlexNet on your own dataset. The preview includes both the single-node and the multi-node implementations. Note that the current implementation is limited to the AlexNet topology and may not work with other popular DNN topologies.

The package supports the AlexNet topology and introduces the ‘intel_alexnet’ and ‘mpi_intel_alexnet’ models, which are similar to ‘bvlc_alexnet’ with the addition of two new ‘IntelPack’ and ‘IntelUnpack’ layers, as well as the optimized convolution, pooling, normalization layers, and MPI-based implementations for all these layers. We also changed the validation parameters to facilitate vectorization by increasing the validation minibatch size from 50 to 256 and reducing the number of test iterations from 1,000 to 200, thus keeping constant the number of images used in the validation run. The package contains the ‘intel_alexnet’ model in these folders:

  • models/intel_alexnet/deploy.prototxt
  • models/intel_alexnet/solver.prototxt
  • models/intel_alexnet/train_val.prototxt.
  • models/mpi_intel_alexnet/deploy.prototxt
  • models/mpi_intel_alexnet/solver.prototxt
  • models/mpi_intel_alexnet/train_val.prototxt.
  • models/mpi_intel_alexnet/train_val_shared_db.prototxt
  • models/mpi_intel_alexnet/train_val_split_db.prototxt

Both the ’intel_alexnet’ and the ’mpi_intel_alexnet’ models allow you to train and test the ILSVRC-2012 training set.

To start working with the package, ensure that all the regular Caffe dependencies and Intel software tools listed in the System Requirements and Limitations section are installed on your system.

Running on Single Node

  1. Unpack the package.
  2. Specify the paths to the database, snapshot location, and image mean file in these ‘intel_alexnet’ model files:
    • models/intel_alexnet/deploy.prototxt
    • models/intel_alexnet/solver.prototxt
    • models/intel_alexnet/train_val.prototxt
  3. Set up a runtime environment for the software tools listed in the System Requirements and Limitations section.
  4. Add the path to ./build/lib/libcaffe.so to the LD_LIBRARY_PATH environment variable.
  5. Set the threading environment as follows:
    $> export OMP_NUM_THREADS=<N_processors * N_cores>
    $> export KMP_AFFINITY=compact,granularity=fine

Note: OMP_NUM_THREADS must be an even number equal to at least 2.
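For example, on a hypothetical two-socket system with 18 cores per socket (2 * 18 = 36, which is even and at least 2), you would set:

$> export OMP_NUM_THREADS=36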

  1. Run timing on a single node using this command:
    $> ./build/tools/caffe time \
           -iterations <number of iterations> \
           --model=models/intel_alexnet/train_val.prototxt
  2. Run training on a single node using this command:
    $> ./build/tools/caffe train \
           --solver=models/intel_alexnet/solver.prototxt

Running on Cluster

  1. Unpack the package.
  2. Set up a runtime environment for the software tools listed in the System Requirements and Limitations section.
  3. Add the path to ./build-mpi/lib/libcaffe.so to the LD_LIBRARY_PATH environment variable.
  4. Set the NP environment variable to the number of nodes to be used, as follows:

$> export NP=<number-of-mpi-ranks>

Note: the best performance is achieved with one MPI rank per node.

  1. Create a node file in the root directory of the application with the name of x${NP}.hosts. For instance, for IBM* Platform LSF*, run the following command:

$> cat $PBS_NODEFILE > x${NP}.hosts

  1. Specify the paths to the database, snapshot location, and image mean file in the following ‘mpi_intel_alexnet’ model files:
    • models/mpi_intel_alexnet/deploy.prototxt,
    • models/mpi_intel_alexnet/solver.prototxt,
    • models/mpi_intel_alexnet/train_val_shared_db.prototxt

Note: on some system configurations, performance of a shared-disk system may become a bottleneck. In this case, pre-distributing the image database to compute nodes is recommended to achieve best performance results. Refer to the readme files included with the package for instructions.

  1. Set the threading environment as follows:

$> export OMP_NUM_THREADS=<N_processors * N_cores>
$> export KMP_AFFINITY=compact,granularity=fine

Note: OMP_NUM_THREADS must be an even number equal to at least 2.

  1. Run timing using this command:
    $> mpirun -nodefile x${NP}.hosts -n $NP -ppn 1 -prepend-rank \
           ./build-mpi/tools/caffe time \
           -iterations <number of iterations> \
           --model=models/mpi_intel_alexnet/train_val.prototxt

  1. Run training using this command:
    $> mpirun -nodefile x${NP}.hosts -n $NP -ppn 1 -prepend-rank \
           ./build-mpi/tools/caffe train \
           --solver=models/mpi_intel_alexnet/solver.prototxt

System Requirements and Limitations

The package has the same software dependencies as non-optimized Caffe:

Intel software tools:

Hardware compatibility:

This software was validated with the AlexNet topology only and may not work with other configurations.

Support

Please direct questions and comments on this package to intel.mkl@intel.com.

Video Transcode Solutions: Simple, Fast, Efficient - Dec. 1 Webinar


In a world where internet video use is skyrocketing and consumers expect High Definition and ultra-high definition (UHD) 4K viewing anytime, anywhere and on any device, excel with Intel in delivering live and on-demand video faster, more efficiently, and at higher quality through the latest media acceleration technologies. Get the most performance from your media platform, and accelerate to 4K/UHD and HEVC, while reducing infrastructure and development costs.

Attend this free online webinar, Video Transcode Solutions on Dec. 1, 9 a.m. (Pacific), to learn about new media acceleration and Intel graphics technologies. Offer your cloud and communication service provider customers a customizable solution that can:

  • Deliver fast video transcoding into multiple formats and bit rates in less time
  • Reduce the amount of storage needed for multiple formats through higher compression processing and offering multiple rate control techniques
  • Allow for real-time transcoding into multiple formats from the stored format, reducing the need to store all possible media formats
  • Reduce the amount of network bandwidth needed (lower bit rates) at better video quality by compressing the video appropriately prior to transmission

Video Transcode Solutions: Simple, Fast, Efficient
Online Webinar | Dec. 1, 2015 - 9 a.m. (Pacific)

     

See how Intel can help you innovate and bring new media solutions quicker to market. Webinar is for cloud media solutions and video streaming/conferencing providers, media/graphics developers, broadcast/datacenter engineers, and IT/business decision-makers.

Speakers

Shantanu Gupta, Intel Media Accelerator Products Director
Shantanu has held leadership roles at Intel for 27 years, spanning technology/solutions marketing, integration, product design/development, and more.

 

Mark J. Buxton, Intel Media Development Products Director
Mark has more than 20 years of experience leading video standards development and Intel's media development product efforts, including products such as Intel® Media Server Studio, Intel® Video Pro Analyzer, and Intel® Stress Bitstreams and Encoder.

Platform Analyzer - Analyzing Healthy and not-so Healthy Applications


Recently my wife purchased a thick and expensive book. As an ultrasonic diagnostician for children, she purchases many books, but this one had me puzzled.  The book was titled Ultrasound Anatomy of the Healthy Child.  Why would she need a book that showed only healthy children?  I asked her and her answer was simple: to diagnose any disease, even one not yet discovered, you need to know what a healthy child looks like. 

In this article we will act like doctors, analyzing and comparing a healthy and a not-so-healthy application.

Knock – knock – knock.

The doctor says: “It’s open, please enter.”

In walks our patient,  Warrior Wave*, an awesome game in which your hand acts as the road for the warriors to cross. It’s extremely fun to play, innovative, and uses Intel® RealSense™ technology. 

While playing the game, though, something felt a little off.  Something that I hadn’t felt before in other games based on Intel® RealSense™ technology.  The problem could be caused by so many things, but what is it in this case?  

Like any good doctor who is equipped with the latest and greatest analysis tools to diagnose the problem, we have the perfect tools to analyze our patient.

Using Intel® Graphics Performance Analyzer (Intel® GPA) Platform Analyzer, we receive a time-line view of our application’s CPU load, frame time, frames per second (FPS), and draw calls:

Let’s take a look.

Hmm… the first things that catch our eye are the regular FPS surges that occur periodically. All is relatively smooth for ~200 milliseconds and then jumps up and down severely.

For comparison, let’s look at a healthy FPS trace bellow. The game in this trace felt smooth and played well.  

No pattern was evident within the frame time, just normal random deviations.

But in our case we see regular surges. These surges happen around four times a second.  Let’s investigate the problem deeper, by zooming in on one of the surges and seeing what happening in the threads:

We can see that working thread 2780 spends most of the time in synchronization. The thread does almost nothing but wait for the next frame from the Intel® RealSense™ SDK:

At the same time, we see that rendering goes in another worker thread. If we scroll down, we find thread 2372.

Instead of “actively” waiting for the next frame from the Intel RealSense SDK, the game could be doing valuable work. Drawing and Intel® RealSense™ SDK work could be done in one worker thread instead of two, simplifying thread communication.

Excessive inter-thread communication can drastically slow down the execution and cause many problems.

Here is the example of a “healthy” game, where the Intel® RealSense™ SDK work and the DirectX* calls are in one thread. 

RealSense™ experts say: there is no point in waiting for the frames from the Intel® RealSense™ SDK. They won’t be ready any faster. 

But we can see that the main problem is at the top of the timeline.

On average, five out of six CPU frames did not result in a GPU frame. This is the cause of the slow and uneven GPU frame rate, which on average is less than 16 FPS.

Now let’s look at the pipeline to try and understand how the code is executing.  Looking at the amount of packets on “Engine 0,” the pipeline is filled to the brim, but the execution is almost empty.

The brain can process 10 to 12 separate images per second, perceiving them individually. This explains why the first movies were cut at a rate of 16 FPS: this is the average threshold at which the majority of people stop seeing a slide show and start seeing a movie.

Once again, let’s see the profile of the nice-looking game: 

Notice that the GPU frames follow the CPU frames with a slight shift. For every CPU frame, there is a corresponding GPU frame that starts execution after a small delay.

Let’s try to understand why our game doesn’t have this pattern.

First, let's examine our DirectX* calls. The highlighted one with the tooltip is our "Present" call that sends the finished frame to the GPU. In the screenshot above, we see that it creates a "Present" packet on the GPU pipeline (marked with X's). At around the 2215-ms mark, it has moved closer to execution, jumping over three positions, but at 2231 ms it simply disappears without completing execution.

And if we look at each present call within the trace, not one call successfully makes it to execution.

Question: How does the game draw itself if all our DirectX* Present calls are ignored?! Good thing we have good tools so we can figure this out. Let’s take a look.

Can you see something curious inside the gray oval? We can see that this packet, not caused by any DirectX* call in our code, still makes it to execution, fast and out of order. Hey, wait a minute!!!

Let's look closely at our packet. 

And now to the packet that got executed. 

Wow! It came from an EXTERNAL thread. What could this mean? External threads are threads that don’t belong to the game.

Our own packets get ignored, but an external thread draws our game? What? Hey, this tool went nuts!

No, the image is quite right. The explanation is that on the Windows* system (starting with Windows Vista*), there is a program called Desktop Window Manager (DWM), which does the actual composition on the screen. Its packets are the ones we see executing at a fast rate with high priority.  And no, our packets aren’t lost—they are intercepted by DWM to create the final picture.

But why would DWM get involved in a full-screen game? After thinking a while, I realized that the answer is simple: I have a multi-monitor desktop configuration. Removing my second monitor from the display configuration made Warrior Wave* behave like other games: normal GPU FPS, no glitches, and no DWM packets.

The patient will live! What a relief!

But other games still worked well even with a multi-monitor configuration, right (says the evil voice in the back of my head)?

To dig deeper, we need another tool. Intel® GPA Platform Analyzer lets you see CPU and GPU execution over time, but it doesn't give you the lower-level details of each frame.

We would need to look more closely at the Direct3D* Device creation code. For this we could use Intel® GPA Frame Analyzer for DirectX*, but this is a topic for another article.

So let’s summarize what we have learned:

During this investigation we were able to detect poor usage of threads that led to FPS surges, and a nasty DWM problem that was easily fixed by removing the second monitor from the desktop configuration.

Conclusion: Intel® GPA Platform Analyzer is a must-have tool for initial investigation of the problem. Get familiar with it and add it to your toolbox.

About the Author:

Alexander Raud works in the Intel® Graphics Performance Analyzers team in Russia and previously worked on the VTune Amplifier. Alex has dual citizenship in Russia and the EU, speaks Russian, English, some French, and is learning Spanish.  Alex has a wife and two children and still manages to play Progressive Metal professionally and head the International Ministry at Jesus Embassy Church.

Intel® RealSense™ Cameras and DCMs Overview


Introduction

The Intel® RealSense™ Depth Camera Manager (DCM) is intended to expose interfaces for streaming video from the Intel® RealSense™ cameras F200 and R200, for both color and depth data streams. The camera service allows multiple Intel® RealSense™ SDK applications and a single non-SDK application to access data from the camera simultaneously, without blocking each other. Without the camera service, only one application at a time can access data from the camera, to ensure that the correct data is received.

DCM Functionality

The DCM is the primary interface between the Intel RealSense camera and SDK clients via the Intel RealSense SDK APIs. The DCM exposes and manipulates all extended 2D and 3D camera capabilities for the client system. It provides a compatible interface to standard video applications within the DCM environment via a virtual imaging device driver. It also manages camera control, access policy, and power management when multiple applications access the DCM. For these DCM functionalities to work properly, the DCM must be downloaded from Intel and installed on a platform that is equipped with an Intel RealSense camera. Visit https://downloadcenter.intel.com/download/25044 to download the F200 DCM and http://registrationcenter-download.intel.com/akdlm/irc_nas/7787/intel_rs_dcm_r200_2.0.3.39488.exe to download the R200 DCM for Windows* 8.1 and Windows 10. The functionality of the DCM applies to the different Intel RealSense camera models, such as the F200 and R200.

 

F200 Camera Model

 

The Intel RealSense camera F200 is the first generation of front-facing 3D cameras based on coded light depth technology. The camera implements an infrared (IR) laser projector system, VGA infrared (IR) camera, and a 2MP color camera with integrated ISP. This camera enables new platform usages by providing synchronized color, depth, and IR video stream data to the client system. The effective range of the depth solution from the camera is optimized from 0.2 to 1.0m for use indoors.

 

R200 Camera Model

 

The Intel RealSense camera R200 is the first generation of rear-facing 3D cameras based on active stereoscopic depth technology. The camera implements an IR laser projector, VGA stereoscopic IR cameras, and a 2MP color camera with integrated ISP. With synchronized color and infrared depth sensing features, the camera enables a wide variety of exciting new platform usages. The depth usage range of the camera depends upon the module and the lighting. The indoor range is up to 3 meters and the outdoor range is up to 10 meters.

Figure 1: DCM Model – High level view

 

Hardware Requirements

 

For details on system requirements and supported operating systems for F200 and R200, see https://software.intel.com/en-us/RealSense/Devkit/

 

DCM Components

 

There are two DCM components: DCM service and DCM virtual driver.

DCM Service

The DCM service runs on the client machine and controls requests from multiple applications to operate the managed cameras. The DCM service also dispatches multiple access requests from several applications accessing the same video stream. The DCM service runs at startup and allows multiple client applications to connect to it. The DCM service interfaces with the camera through the camera DLL and is the primary camera interface for all application types. The camera DLL is camera-model specific and extends hardware functionality for each camera. Below is an example of the Task Manager of a system that has the DCMs for the F200 and R200 installed.

Figure 2: The DCM Service runs at startup

 

DCM Virtual Driver

The DCM virtual driver is a virtual AVStream device driver that supports a compatible interface into the DCM for standard video applications. This virtual driver allows standard video applications to access the camera concurrently with multiple SDK applications.
 

Detecting the DCM Version

 

Go to the "Intel® RealSense™ SDK Gold" shortcut on the desktop, then to "Samples\Sample binaries" or the C:\Program Files (x86)\Intel\RSSDK\bin\win32 directory, and open sdk_info. The Camera tab shows the DCM service version and other information about the cameras that are installed on the platform. For testing and development purposes, multiple major versions of the DCM can be installed on a platform. During runtime, only one camera—whether the same model or a different one—can be connected to the platform at a time.

Figure 3: RealSense SDK information

 

Troubleshooting

 

If the Intel RealSense camera does not stream data correctly:

  • Make sure that the DCM service exists and is running, as shown in Figure 2.

  • Check control panel to make sure that the app installed the Intel RealSense SDK Runtime during installation.

  • Make sure that the camera is connected properly.

 

Switching Cameras between DCM Runtimes

 

An SDK client can support different camera models through their respective DCM runtimes. The SDK client must close any access to one camera model before switching to the next camera model. Multiple concurrent accesses from the SDK client to multiple camera models are not allowed. If an SDK client enables simultaneous access to multiple camera models, unknown behaviors are likely to occur.

 

Uninstallation

 

Before installing a new version of the DCM, uninstall any existing versions. When you launch the DCM installer on a system that has an existing DCM installed, an uninstaller menu prompts you with the following options:

  • Modify. Edit currently installed feature or feature settings.

  • Repair. Fix missing or corrupt files, shortcuts, or registry entries.

  • Remove. Remove the DCM from the system registries and directory structure.

     

Summary

 

The Intel RealSense Depth Camera Manager is the primary interface between the Intel RealSense camera and the Intel RealSense SDK clients. It communicates with the Intel RealSense camera through the camera DLL.

 

Helpful References

 

Here is a collection of useful references for the Intel RealSense DCM and SDK, including release notes and how to download and update the software.

 

 

About the Author

 

Nancy Le is a software engineer at Intel Corporation in the Software and Services Group working on Intel® Atom™ processor scale-enabling projects.

 

Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

Intel, the Intel logo, and Intel RealSense are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2015 Intel Corporation.

Intel RealSense Technology + Unity 5 - Pro Tips


Jacob Pennock from Livid Interactive imparts some valuable advice on working with Intel® RealSense™ technology and Unity* 5 in this series of seven videos. See below for a description of each video.

Note: In some of the videos, Jacob references a video that describes the SDK’s emotion module. That video is no longer available because the emotion module has been deprecated, and starting in the Intel® RealSense™ SDK R5, it has been removed.

How to Set Up the Intel® RealSense™ SDK Toolkit in Unity* 5

Jacob describes how to import the Intel RealSense SDK Toolkit into Unity 5 and test your installation with a gesture-based drag-and-stretch code sample from the Intel RealSense SDK. Note: if you’ve installed version R4 or above of the Intel RealSense SDK, you can disregard the instructions to swap out 32-bit toolkit binaries and replace with 64-bit binaries, because starting with R4, the toolkit ships with 64-bit binaries built for Unity 5.

 

Optimizations for the Intel® RealSense™ SDK Toolkit - Part 1

In this two-part video on optimizations, Jacob highlights aspects of the Intel RealSense Toolkit where performance is not adequate for a production application. In Part 1 he describes how to significantly improve frame rate performance with a change to SenseToolkitMgr.Update(). He also shows how to fix a mirroring problem with one of the toolkit samples.

 

Optimizations for the Intel® RealSense™ SDK Toolkit- Part 2

In Part 2, Jacob demonstrates how to fix a performance problem with toolkit method DrawImages.Update(). In the video he creates a new, more memory-efficient method to replace an Intel private method that has a large memory footprint.

 

How to Use the Intel® RealSense™ SDK in Unity* 5 Directly (without the toolkit)

In this video Jacob details how to access the Intel RealSense SDK directly from inside your Unity 5 C# code. The Intel RealSense Toolkit can be useful for quick testing, but developers will get better results for most applications by building customized interactions with the Intel RealSense SDK. He goes over the basic steps required to create customized interactions.

 

Intel® RealSense™ SDK + Unity* 5 Pro Tip: Monitor the Frame Rate

Jacob has learned over the past 2 years of working with Intel RealSense technology that it is helpful to measure the frame rate at which RealSense is returning data. In this video he goes over one way to accomplish this.

 

Alternative Hand Tracking Modes with the Intel® RealSense™ SDK and Unity* 5

The Intel RealSense SDK includes a couple of skeletal hand tracking examples. In this video Jacob demonstrates another hand tracking technique using the Extremities mode and the segmentation image it produces. This is much faster than using skeletal hand tracking.

 

Intel® RealSense™ SDK + Unity* 5 Pro Tip: Displaying Segmentation Images

Here Jacob takes the low-resolution hand segmentation images he created in the previous video and scales them up in size to something that is presentable to the user without looking pixelated.

 

Intel® RealSense™ SDK + Unity* 5 Pro Tip: Update in a Separate Thread

Jacob describes his new Asus laptop with an embedded Intel® RealSense™ camera, and demonstrates how to improve performance of your application by using a separate thread for most Intel RealSense SDK interactions.

 

About the Author

Jacob Pennock is an Intel® RealSense™ expert from the Bay Area. He is Senior Creative Developer, Helios Interactive Technologies & Chief Game Designer / Lead Developer, Livid Interactive.


Intel® Hardware-based Security Technologies Bring Differentiation to Biometrics Recognition Applications Part 1


Download PDF [PDF 1 MB]

Contents

Why Biometric Recognition is Better
How Biometric Recognition Works
The Attack Model
How Intel® Hardware-based Security Technologies Improve the Security of Biometrics Recognition

  1. Trusted Execution Environment with Intel® Software Guard Extensions
  2. Memory Protection Scheme with Virtual Machine Extensions
  3. Multiple Factor Authentication with Intel® Identity Protection Technology with One-Time Password

Link to Part 2
References

Why Biometric Recognition is Better

The security model of “Username/Password” has been used as a user’s identity certificate for years. When people need to prove that they are authorized users of a service (the usual process is to log in to a computer or an online service, such as social media, or online banking), they input their username and password. The disadvantages of this security model are obvious for a number of reasons, including:

  1. Simple passwords such as “123456” or “hello” can be cracked by a brute-force attack or a dictionary attack.
  2. A complicated password is hard to remember.
  3. Many people might use the same password across multiple sites.
  4. If a person forgets the password, after providing some other identification, he or she can reset it.

Figure 1. Password login scheme.

To improve the strength of passwords and the user experience, more and more service providers are beginning to use biometric identification technology as the password. With this technology, people don’t need to remember their passwords. Instead their voice, face, fingerprint, or iris is used as the identifying factor. Biometric identification factors are somewhat different from the traditional username/password security model factors:

  • Biometrics can be used to derive a long and complicated password, which offers greater security to withstand a brute-force attack.
  • Biometrics require more careful protection from the application developer, because biological information is part of the human body and cannot be changed easily. If biometric information is stolen, it is hard for a user to revoke his or her biometric password, and an attacker can use the stolen biometrics to fabricate a fake body part and pass the biometric check on the user’s other registered accounts in the future.
  • Some biological characteristics, such as face and voice, have a high false acceptance rate, so a biometric recognition system usually uses multi-factor biometric authentication to improve recognition accuracy.
  • Some biometric characteristics can be duplicated: a recorded voice, a printed photo of a face, or a gelatin finger cast from a fingerprint. It is important to add a liveness (vitality) detection module to the biometric recognition system to determine whether the biometric information comes from a live person or from a replica.

How Biometric Recognition Works

The basic flow of a biometric recognition application has five steps:

  1. Biometric information is collected by the sensor, which is connected through the I/O port.
  2. The output data format and speed are controlled by the specific device driver. Data is processed by the driver to meet the OS requirements at Ring-0 and then sent to the biometric verification app, which runs at Ring-3.
  3. Once the app gets the data, it does some preprocessing work and extracts the feature points from the data.
  4. Next, the extracted feature points are sent to the pattern matcher and compared with registered biometric patterns in the database.
  5. Once the pattern matches one of the registered patterns, the matcher sends the MATCH message, and the UI procedure displays that the user is logged in correctly and shows the corresponding premium content to the user.

Figure 2. The flow chart of a biometric recognition program.
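To make the flow concrete, the minimal sketch below models the five steps as a simple loop. Every type and function name here (CaptureFrame, ExtractFeatures, MatchPattern, and so on) is a hypothetical placeholder, stubbed out so the file compiles; a real application would substitute its sensor driver API and matching engine.

```cpp
// Conceptual sketch of the five-step flow above. All names are hypothetical
// placeholders (stubbed so the file compiles), not part of any Intel SDK.
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

struct RawSample  { std::vector<std::uint8_t> bytes; };  // steps 1-2: data delivered by the driver
struct FeatureSet { std::vector<float> points; };        // step 3: extracted feature points

RawSample CaptureFrame()                     { return {}; }              // stub for the sensor/driver path
FeatureSet ExtractFeatures(const RawSample&) { return {}; }              // stub for preprocessing/extraction
std::optional<std::string> MatchPattern(const FeatureSet&) { return std::nullopt; }  // stub matcher

int main() {
    constexpr int kMaxAttempts = 5;
    for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
        RawSample sample    = CaptureFrame();           // 1-2. acquire sensor data via the device driver
        FeatureSet features = ExtractFeatures(sample);  // 3.   preprocess and extract feature points
        if (auto user = MatchPattern(features)) {       // 4.   compare against registered patterns
            std::cout << "MATCH: " << *user << " logged in\n";  // 5. show the user's content
            return 0;
        }
    }
    std::cout << "No match after " << kMaxAttempts << " attempts\n";
    return 1;
}
```

The point of the sketch is simply that the raw sample, the feature set, and the registered templates all pass through ordinary process memory, which is what the attack model below targets.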

The Attack Model

In a biometric-based authentication system, the most valuable data to an attacker is the user’s biometric pattern. This pattern could be the raw data from a sensor, the extracted feature point set in memory, or the registered biological pattern stored in the database.

In general, if the biometric recognition application is designed without proper security protection, an attacker could retrieve the raw data or feature point set from memory via a runtime attack using a rootkit or malware. The attacker could also launch an offline attack to get the registered biological pattern if the registration template is stored in the local storage of the device. Moreover, the attacker could sniff the data stream on the bus between the processor and the sensor, or use a camera or microphone near the user to capture biometric data such as face pictures or voice samples for a later replay attack.

Figure 3. Possible attacks on a biometrics recognition application.

From the perspective of a biometrics recognition service developer, the design philosophy of the application should provide end-to-end protection to keep a user’s privacy safe. This includes:

  • Provide a trusted running environment to preserve the integrity of the application’s code segment.
  • Protect the memory region that contains the biometric pattern from access by other applications.
  • Keep sensitive data strongly encrypted in memory and in local storage, and when exchanging secrets with other applications or a network server.

How Intel® Hardware-based Security Technologies Improve the Security of Biometrics Recognition

Intel’s platform offers various hardware-based security technologies to satisfy the security requirements for biometric verification applications.

1. Trusted Execution Environment with Intel® Software Guard Extensions

Biometric recognition technology is being used more and more widely because of its security. Because the technology is based on each individual’s unique characteristics (face, voice, fingerprint, iris), a person’s identity is hard to steal. Biometric recognition technology takes the place of traditional password authentication and offers a good user experience.

However, with the wide use of biometric recognition technology in various consumer devices, the diversity and openness of these platforms raise some potential security threats. One threat that developers need to consider carefully is how to secure the operation of a biometric identification function on a variety of terminal devices. In particular, they need to consider:

  • How to securely run the biometric sampling/modeling/matching algorithm on the terminal device
  • How to securely store the biometric data template on the terminal device
  • How to establish a secure channel between the terminal device and the cloud database of biological characteristics, to complete cloud authentication and other operations

Developers can rely on Trusted Execution Environment (TEE) technology to build an effective hardened solution.

What is TEE?

A TEE is a trusted execution environment that is isolated from the Rich Execution Environment (REE).

According to the Global Platform TEE System Architecture specification1, at the highest level, a TEE is an environment where the following are true:

  • Any code executing inside the TEE is trusted in authenticity and integrity.
  • Assets other than code are also protected in confidentiality.
  • The TEE resists all known remote and software attacks, and a set of external hardware attacks.
  • Both assets and code are protected from unauthorized tracing and control through debug and test features.

Intel® Software Guard Extensions Technology Overview

Intel® Software Guard Extensions (Intel® SGX) enables software developers to develop and deploy secure applications on open PC platforms. It is a set of new instructions and memory access changes added to the Intel® architecture.

Intel® SGX operates by allocating hardware-protected memory where code and data reside. The protected memory area is called an enclave. Data within the enclave memory can only be accessed by the code that also resides within the enclave memory space. Enclave code can be invoked via special instructions. An enclave can be built and loaded as a Windows* DLL.

Figure 4. Protected execution environment embedded in a process.

An Intel® SGX technology-enabled application is built as an untrusted part and a trusted part, following the Intel® SGX design framework2. When the application is running, it calls Intel® SGX special instructions to create an enclave, which is placed in trusted memory. When the trusted function is called, the code runs inside the enclave, and the relevant data can be seen in clear text only inside the enclave. Any external access to this data is denied. After the trusted function returns, the enclave data remains in trusted memory.

Figure 5. Intel® Software Guard Extensions technology-enabled application execution flow.
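The untrusted-side flow can be sketched in a few lines. The sketch below assumes the Intel® SGX SDK (sgx_urts.h, sgx_create_enclave, sgx_destroy_enclave are SDK APIs); the enclave image name "Enclave.signed.dll", the generated header "Enclave_u.h", and the ECALL proxy ecall_match_sample() are hypothetical names that would come from this application’s own EDL file, not from Intel.

```cpp
// Minimal untrusted-side sketch, assuming the Intel SGX SDK for Windows.
// Enclave.signed.dll, Enclave_u.h, and ecall_match_sample() are hypothetical
// names defined by the application's own EDL; they are not Intel APIs.
#include <cstdio>
#include "sgx_urts.h"
#include "Enclave_u.h"   // proxy functions generated by sgx_edger8r from the (hypothetical) EDL

int main() {
    sgx_enclave_id_t eid = 0;
    sgx_launch_token_t token = {0};
    int token_updated = 0;

    // Load the signed enclave image into protected memory (debug mode = 1 here).
    sgx_status_t ret = sgx_create_enclave("Enclave.signed.dll", 1,
                                          &token, &token_updated, &eid, nullptr);
    if (ret != SGX_SUCCESS) {
        std::printf("sgx_create_enclave failed: 0x%x\n", ret);
        return 1;
    }

    // Call into the enclave. The matching runs on plaintext features only
    // inside the enclave; the untrusted side sees just the yes/no result.
    int match_result = 0;
    ret = ecall_match_sample(eid, &match_result);   // hypothetical ECALL proxy
    if (ret == SGX_SUCCESS && match_result == 1) {
        std::printf("Biometric pattern matched inside the enclave\n");
    }

    sgx_destroy_enclave(eid);
    return 0;
}
```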

The objective of this technology is to enable a high level of protection for secrets. With Intel® SGX, an application gains the ability to defend its own secrets: sensitive data is protected within the application, and the attack surface, or Trusted Computing Base (TCB), is reduced to the application itself and the processor. Even malware that subverts the OS/VMM, BIOS, or drivers cannot steal the application’s secrets.

Figure 6. Reduced attack surface with Intel® Software Guard Extensions.

How to Harden the Biometric Recognition Function with Intel® Software Guard Extensions Technology

Before we discuss the security solution proposal for biometric recognition, we should address which factors should be protected during the process:

  • The user’s private biometric characteristics data should be handled carefully in the application, at rest, and in flight.
  • The biometric operation algorithm, including sampling, modeling, and matching, should be protected against viruses and malware. The output result data should not be tampered with.

We propose the architecture shown in Figure 7.

Figure 7. Hardened biometric recognition function by Intel® Software Guard Extensions.

The biometric sampling/modeling/matching algorithm is hosted inside the Intel® SGX enclave, the trusted part of the client, and is responsible for operating on the biometric characteristics data; its runtime confidentiality and integrity are guaranteed. This type of algorithm is normally implemented in software, and a plain software implementation may be tampered with at runtime by viruses and malware. In this architecture, however, the protected portion is loaded into an enclave where its code and data are measured. Once the application’s code and data are loaded into an enclave, they are protected against all external software access, so the biometric operation algorithm can be trusted. Beyond these security properties, the enclave environment offers the scalability and performance associated with execution on the main CPU of an open platform, which helps in performance-sensitive scenarios such as biometric recognition.

Intel® SGX technology provides a function to encrypt and integrity-protect enclave secrets so they can be stored outside the enclave, such as on disk, and reused by the application later. Data can be sealed against an enclave using a hardware-derived Seal Key, which is unique to the CPU and the specific enclave environment. Combined with other services provided by the Intel® SGX Platform Software, such as the Monotonic Counter and Trusted Time, the solution can protect against various attack techniques: the Monotonic Counter can implement a replay-protection policy, and Trusted Time can enforce a time-based policy, both applied to sealed data. The enclave is responsible for performing the encryption with an algorithm of its choice; in other words, the developer can choose any encryption framework according to the system’s security requirements. In this way the user’s private biometric characteristics data is handled only within the enclave, and its raw form is never exposed to the untrusted part outside the enclave.
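A minimal sketch of the enclave-side sealing step is shown below, assuming the SDK’s sealing API in sgx_tseal.h (sgx_calc_sealed_data_size and sgx_seal_data are SDK functions). The ECALL name and the way the sealed blob is handed back to the untrusted side for storage are illustrative assumptions, not part of the architecture described above.

```cpp
// Trusted (in-enclave) sketch: seal a registered template so it can be stored
// outside the enclave. Assumes the Intel SGX SDK sealing API (sgx_tseal.h);
// the ECALL name and output-buffer convention are hypothetical.
#include <cstdint>
#include "sgx_tseal.h"

// Hypothetical ECALL: seal `tmpl` (the registered biometric template) into the
// caller-provided output buffer. Returns the sealed size, or 0 on error.
uint32_t ecall_seal_template(const uint8_t* tmpl, uint32_t tmpl_len,
                             uint8_t* sealed_out, uint32_t sealed_out_cap) {
    const uint32_t sealed_size = sgx_calc_sealed_data_size(0, tmpl_len);
    if (sealed_size == UINT32_MAX || sealed_size > sealed_out_cap)
        return 0;

    // The Seal Key is derived by hardware for this CPU + enclave identity, so
    // the resulting blob can only be unsealed by the same enclave on the same CPU.
    sgx_status_t ret = sgx_seal_data(0, nullptr,     // no additional MAC text
                                     tmpl_len, tmpl, // plaintext template to protect
                                     sealed_size,
                                     reinterpret_cast<sgx_sealed_data_t*>(sealed_out));
    return (ret == SGX_SUCCESS) ? sealed_size : 0;
}
```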

Sometimes the client biometric recognition function needs to connect to the remote back-end server to do authentication in the cloud database instead of locally. Using Intel® SGX attestation capabilities, the client authentication module authenticates the client platform and user’s biometric characteristics data with the remote server. Attestation is the process of demonstrating that a piece of software has been properly instantiated on the platform. In Intel® SGX it is the mechanism by which another party can gain confidence that the correct software is securely running within an enclave on an enabled platform.

First, this module generates a verifiable report of the client’s identity that is bound to the platform by the CPU3. The report also includes information about the user running the biometric recognition session. The server verifies the report to ensure that it is communicating with a device that is enabled with Intel® SGX. The client and server engage in a one-time provisioning protocol that results in application secrets being securely sealed to the client platform, using Intel® SGX sealing capabilities.

These secrets, which can only be unsealed by the application that sealed them, are used to establish secure sessions with the server in the future, without the need to constantly prove the identity of the client platform. Such secrets can include a salt, an encryption key, a policy, or a certificate. After that, the biometric characteristics data and the authentication result can be sent through the secure communication channel between the client and the server.

2. Memory Protection Scheme with Virtual Machine Extensions

Dynamic data attack is one of the most common attack methodologies. Rootkits and malware can use this technique to hook a specified function and dump or modify data in memory at runtime. In the case of biometric recognition, malicious code can obtain both the biometric data captured from the sensor and the registered user’s biometric template from memory.

The Weakness of the Legacy Software-Based Memory Protection

Traditional software-based memory protection mechanisms are not reliable enough. The protection code and the malicious code run at the same privilege level (Ring-0 or Ring-3), so malware can easily compromise the protection code and disable the protection.

Figure 8. Attacks can compromise the protection module and access the sensitive data buffer.

Memory Protection Based on Virtual Machine Extensions

Virtual Machine Extensions (VMX) is a set of instructions that support virtualization of processor hardware4. Its basic working logic is:

  • Let basic CPU operations such as load/store, branch, and ALU operations execute directly
  • Monitor (trap) privileged instructions such as MMU manipulation, I/O instructions, or TLB updates
  • When a privileged instruction is executed, break the execution and switch the CPU into VMX root mode for further processing

The following diagram shows the relationship between the hardware, OS, and application with VMM mode enabled and disabled.

Figure 9. Different response to the system call when Virtual Machine Extensions mode is on/off.

By utilizing the hardware-based trap function of VMX, a hardware virtualization-based memory protection mechanism can protect memory in a safer and faster way5. The basic idea is to insert a VMM-based memory monitor module between the OS and the hardware. When the application is loaded, a memory map table is built for the trusted code and data regions. After the table is built, whenever memory is accessed the VMM can trap the access and compare the instruction address (EIP) and the target memory address against the pre-built table. The memory protection module can then decide whether the access is legal or illegal and apply the corresponding handling.
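The policy decision the monitor makes on each trapped access can be modeled in a few lines. The sketch below is only a user-mode model of that table lookup (which code region may touch which data region); it is not VMM or VMX-root code, and all addresses are invented for illustration.

```cpp
// User-mode model of the check a VMM-based monitor performs on each trapped
// access: is this instruction (EIP) allowed to touch this address? The regions
// below are invented; a real monitor runs in VMX root mode and builds the
// table when the protected application is loaded.
#include <cstdint>
#include <iostream>
#include <vector>

struct Region { std::uintptr_t begin, end; };          // half-open [begin, end)

struct PolicyEntry {
    Region trusted_code;    // code allowed to access...
    Region protected_data;  // ...this sensitive data region
};

bool In(const Region& r, std::uintptr_t a) { return a >= r.begin && a < r.end; }

// Legal if the access does not touch protected data, or if it touches protected
// data from an instruction inside the matching trusted code region.
bool IsLegalAccess(const std::vector<PolicyEntry>& table,
                   std::uintptr_t eip, std::uintptr_t target) {
    for (const auto& e : table) {
        if (In(e.protected_data, target))
            return In(e.trusted_code, eip);
    }
    return true;
}

int main() {
    std::vector<PolicyEntry> table = {
        {{0x401000, 0x402000},     // trusted matcher code (invented addresses)
         {0x800000, 0x801000}}     // biometric template buffer (invented addresses)
    };
    std::cout << IsLegalAccess(table, 0x401abc, 0x800010) << '\n';  // 1: trusted code reads the template
    std::cout << IsLegalAccess(table, 0x700123, 0x800010) << '\n';  // 0: foreign code is denied
}
```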

3. Multiple Factor Authentication with Intel® Identity Protection Technology with One-Time Password

Identity theft is a growing global concern for individuals and businesses. Secure but simple-to-use solutions are required, because hackers never stop devising new ways to steal usernames and passwords. For consumers and everyday computer users, Intel® Identity Protection Technology (Intel® IPT) provides strong protection against identity theft by letting you link your physical device to each Intel® IPT-enabled online account that you use.

Traditionally, two-factor authentication uses a one-time password (OTP) which combines something the user knows (a username and password) and something the user has (typically, a token or key fob that produces a six-digit number, valid only for a short period of time and available on demand).

In the case of Intel® IPT with OTP6, a unique, one-time use, six-digit number is generated every 30 seconds from an embedded processor that is tamper-proof and operates in isolation from the OS. Because the credential is protected inside the chipset, it cannot be compromised by malware or removed from the device.

Figure 10. Intel® Identity Protection Technology with one-time password authentication working flow between client and server.
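Intel® IPT generates and protects the credential inside the chipset, so its internals are not exposed to software. The sketch below only illustrates the standard time-based OTP construction (RFC 6238 / RFC 4226) that this class of 30-second, six-digit tokens follows, using OpenSSL’s HMAC for the keyed hash; the shared secret is an invented example value.

```cpp
// Illustration of the standard time-based OTP construction (RFC 6238 / 4226).
// This is NOT Intel IPT code; IPT derives and protects its credential in the
// chipset. Build with: g++ totp.cpp -lcrypto
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <openssl/evp.h>
#include <openssl/hmac.h>

// Six-digit TOTP for a 30-second time step.
std::uint32_t Totp(const unsigned char* key, int key_len, std::time_t now) {
    std::uint64_t step = static_cast<std::uint64_t>(now) / 30;

    unsigned char msg[8];                        // time step as 8-byte big-endian
    for (int i = 7; i >= 0; --i) { msg[i] = step & 0xff; step >>= 8; }

    unsigned char mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len = 0;
    HMAC(EVP_sha1(), key, key_len, msg, sizeof(msg), mac, &mac_len);

    // Dynamic truncation (RFC 4226): take 31 bits at an offset chosen by the MAC.
    int off = mac[mac_len - 1] & 0x0f;
    std::uint32_t code = ((mac[off] & 0x7f) << 24) | (mac[off + 1] << 16) |
                         (mac[off + 2] << 8)       |  mac[off + 3];
    return code % 1000000;                       // six digits
}

int main() {
    const unsigned char key[] = "example-shared-secret";   // invented demo secret
    std::printf("%06u\n", Totp(key, sizeof(key) - 1, std::time(nullptr)));
}
```

The server side performs the same computation with the shared secret and accepts a small window of adjacent time steps to tolerate clock drift.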

If your business is already using two-factor authentication, you are already familiar with the various issues around token usability and logistics. Intel® IPT with OTP is a built-in hardware token (with your security vendor of choice) that negates the need for a separate physical token, thus simplifying the two-factor VPN log-in process for a seamless experience with virtually no delays.

With Intel® IPT with OTP on Intel® processor-based devices, Intel provides a hardware root of trust: proof to websites, financial institutions, and network services that a unique Intel processor-based device, and not malware, is logging into an account. Intel® IPT with OTP-enabled systems offer additional identity protection and transaction verification methods that can be utilized by multifactor authentication solutions.

About the Author

Jianjun Gu is a senior application engineer in the Intel Software and Solutions Group (SSG), Developer Relations Division, Mobile Enterprise Enabling team. He focuses on the security and manageability of enterprise applications.

Zhihao Yu is an application engineer in the Intel Software and Solutions Group (SSG), Developer Relations Division, responsible for enabling Intel® TEE technologies and supporting secure payment solutions based on Intel® platforms.

Liang Zhang is an application engineer in the Intel Software and Solutions Group (SSG), Developer Relations Division, responsible for supporting enterprise app and Internet of Things developers on Intel® platforms.

Link to Part 2

References

1 TEE System Architecture v1.0: http://www.globalplatform.org/specificationsdevice.asp
2 Intel® Software Guard Extensions (Intel® SGX), ISCA 2015 tutorial slides for Intel® SGX: https://software.intel.com/sites/default/files/332680-002.pdf
3 Using Innovative Instructions to Create Trustworthy Software Solutions: https://software.intel.com/en-us/articles/using-innovative-instructions-to-create-trustworthy-software-solutions
4 Intel® 64 and IA-32 Architectures Software Developer Manuals: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
5 Ravi Sahita and Uday Savagaonkar. “Towards a Virtualization-enabled Framework for Information Traceability (VFIT).” In Insider Attack and Cyber Security Volume 39 of the series Advances in Information Security, pp 113-132, Springer, 2008.
6 Intel® Identity Protection Technology (Intel® IPT): http://ipt.intel.com/Home
7 Introduction to Intel® AES-NI and Intel® Secure Key Instructions: https://software.intel.com/en-us/node/256280
8 Intel® RealSense™ technology: http://www.intel.com/content/www/us/en/architecture-and-technology/realsense-overview.html

The Now of Device Usage: Opportunities for Developers in 2016 and Beyond


By Karen Marcus

Recently, Intel UX Innovation Manager Dr. Daria Loi and her team conducted a study to determine how technology users around the world feel about their computing devices. The study, titled Current Usages: A Global Perspective, explored which devices are people’s favorites, which they use most often, which they use for certain tasks (and why), and what they wish the devices could do better. The research included 1,200 people from six countries across an age range of several decades.

The study found that, while devices such as smartphones, laptops, desktops, and tablets allow users to perform many critical work and personal functions—and that most users have a favorite device (see Figure 1)—there are many areas for improvement. This article discusses the study’s findings in detail and suggests numerous opportunities for developers to create software that can perfect users’ experience.

Figure 1. Study responses for favorite device

No One Size Fits All

Findings from the Current Usages: A Global Perspective study showed that for most people, their smartphone is their favorite device (39%), followed by laptops (30%), and then desktops (21%). But none of these devices can provide the interface to help users achieve every task as they move among work, personal chores, and play. For example, phones are great for taking pictures, listening to music, participating in social media, and communicating with others, but because of screen size and (true or perceived) security reasons, they’re not as useful for tasks like shopping, banking, image editing, and emailing.

Dr. Loi observes, “A smartphone is portable, so that’s the device people tend to reach for most often. Because it’s a favorite device, we want to think it’s perfect, but currently, there’s no perfect device, and that’s why people own multiple devices. However, not all current or emerging users—especially those in developing nations and young users—can afford multiple devices, nor do they want to carry them around. So, as technology developers, the question for us is, ‘How can we create a future that enables people to do everything they want without the bother and expense of multiple devices? How can we help them streamline?’”

In looking for such areas for convergence, developers may want to take note of what users already value in their devices. When asked about the most important features of their devices, survey participants most frequently cited the operating system as their number-one feature, followed by performance, screen size, ease of use, and brand, in that order (see Figure 2). Loi notes that ease of use should be of particular interest because it has "huge repercussions for developers as ease of use absolutely needs to be there; otherwise, people won't buy."

Figure 2. Study responses for most important features

The Device Shuffle

According to the study, some functions don’t skew strongly in any one direction in terms of which device people prefer when performing them. These functions include reading locally stored books, magazines, and text; browsing photos stored locally or online; live chat; interactive text platforms; video chatting and conferencing; and playing casual games. For these functions, says Loi, “People are moving across devices, depending on the capabilities of each one.”

They may also use different devices based on where they are. When at home, people tend to do things that require a laptop or PC, such as online purchasing, online banking, studying, and playing advanced games. At work or school, they perform tasks on laptops or PCs—including presenting reports, creating and editing documents, emailing, and reading online news or text—and smartphones, including checking the weather and using a calendar. Another location where people tend to use PCs more is Internet cafes and similar dedicated spaces, where typical tasks include using VoIP, video conferencing, and updating blogs or websites.

When on the go, smartphones are the device of choice for tasks like searching locations, navigating, taking pictures or recording videos, and listening to music. “People have expectations based on where they are,” explains Loi. “Their tolerance in terms of responsiveness is different depending on their location. What they expect to see changes when the context changes. Here, the goal for developers is to create applications that work well across contexts or are capable of changing so that the experience for users becomes seamless, no matter what the situation.”

Smartphones Getting Smarter

Whereas some tasks are device-agnostic, others are more commonly performed on a specific device. Forty-three percent of study participants identified smartphones as their device of choice for everyday tasks like checking weather, storing contacts, and using a calendar; 47% said that smartphones were best for searching locations and navigation; and 62% said that smartphones are best for recording photos and video (see Figure 3).

Figure 3. Study responses for favorite device for locate and check functions

Yet, people hesitate to conduct online shopping on their smartphone because they have a sense that it’s less secure. Some participants specifically mentioned that they can’t always see the entire screen and are concerned that they might not see a button or other element they need to be aware of to ensure they’re completing the purchase according to their wishes. However, Loi says, “People would like to perform these tasks on any device, so that’s an opportunity for developers to create the software and infrastructure to enable them to do it safely and securely. This will become increasingly important as Google and Apple push their solutions for using smartphones as credit cards.”

Other potential areas for improvement with smartphones include functions to reduce people’s reliance on other devices when on the go or when those devices are not available. For example, how could a smartphone provide richer productivity capabilities? Considering that people increasingly use their phones as a primary camera, how can camera capabilities be enriched? And, since people want to listen to music on their phone, can smartphones be equipped with better functionality and user interfaces?

Developers have yet another opportunity with smartphones: making them more useful for education. Loi notes, “Teens and young adults cannot imagine living without their phones, yet these devices are rarely integrated into school curricula. In fact, we ask them to leave their devices home or to turn them off; yet, to them, a smartphone is an appealing, familiar, exciting everyday tool that offers an opportunity to make learning stick. In this context, our challenge is to work with educators to develop solutions that enable fluid, seamless, delightful, relevant, deep learning.”

PC Performance Wins

Despite the love affair people have with their phones, when it comes to certain tasks, their laptops, All-in-Ones, and notebooks are their preferred devices (see Figure 4). Specifically, 38% of users choose laptops for editing or modifying media, 47% choose them for creating or editing documents, 36% want them for updating blogs or websites, 41% use them for online banking and bill payment, 44% use them to browse products for potential purchase, and 41% use them to purchase products online. Loi notes that screen size is a primary reason for these preferences: "It's easier to move around and do fine-grained work on a larger screen." In addition, she says, "Many software packages aren't available for smartphones or are prohibitively expensive. The ecosystem of software plus the practical, physical performance advantage of laptops make them the device of choice for these tasks."

Figure 4. Study responses for favorite device for online purchasing

Survey participants also favored laptops for communication and entertainment functions, such as watching online videos (36%); watching locally stored videos (37%); uploading, downloading, sharing, or transferring media (38%); emailing (42%); and voice over IP (VoIP) and other online voice platforms (35%; see Figure 5). Loi explains, “These are applications that require the high performance that smartphones don’t yet have. In the future, the capabilities of PCs may migrate to smaller devices. Again, the right ecosystem, software, middleware, interface, and infrastructure—in addition to the right ports to easily transfer media—would be needed to make that happen.”

Figure 5. Study responses for favorite device for writing and talking

Other areas where survey participants preferred the performance and screen size of their laptops were presenting reports (59%), playing advanced games (24% for laptops and 25% for desktops), browsing or researching information online (38%), studying (47%), and reading online news or text (39%; see Figure 6). Loi observes, "People can perform some of these functions from a smartphone, but phones don't talk to other devices (such as projectors) as well as computers do; their performance doesn't enable usages such as advanced gaming; and their limited screen size is an issue when engaging in focused, prolonged, and multi-tasking-rich usages such as studying or researching."

Figure 6. Study responses for favorite device for reading, learning, and research

Speaking of Technology

To better understand participants’ feelings about their devices, Loi and her team interviewed a number of them to gather qualitative information to enrich the quantitative data gathered through the global survey. One important finding was that people increasingly have a love–hate relationship with technology. Loi says, “People realize that they can’t live without their devices, but they also feel that they’re enslaved by them. A typical sentiment was, ‘I want to be able to continuously leverage devices to do the things that are important to me, but I also want them to be smart enough to get out of the way when I don’t need them.’ For example, if someone is in a meeting, his or her smartphone should have this contextual awareness and prevent incoming calls. There’s a lot of potential for creating machines that can learn users’ behavior and take initiatives to disengage when not needed.”

“I miss the times when we weren’t always ON.” —Aron, 20s

On the “love” side of the love–hate spectrum, study participants had things they appreciated about their devices, including that they help to streamline life, connect with others, and interface in real time. But, Loi observes, even the benefits that people like could be better. She queries, “How can we enrich the way we connect with others, beyond existing tools? What technological solutions can be developed to make us feel more present and connected with others, regardless of physical distance?”

“I want to connect to family more; we seem to be looking at [our] devices instead of talking to each other.”
—Carol, 40s

Along with an appreciation of technology comes frustration. Some of the top areas people wish they could change were that device batteries die too fast, that devices slow down or freeze, that they’re susceptible to viruses and privacy intrusions, that they’re too big for true portability, that they’re too small to store everything, and that they’re difficult to operate. A key irritation for study participants was that they want to be more mobile. Loi says, “We’re an increasingly mobile society—a society that fully relies on devices to get things done. But, to be useful tools, those devices need battery life. When I talk to users, it’s clear that charging has become an obsession for most, an impediment to being truly mobile and effective. Among other opportunities, wireless charging can provide great benefits in this context; yet, there needs to be an efficient, reliable, intuitive ecosystem to deliver that usage.” She notes that passwords are another “problem we need to solve,” along with technology changing too quickly, which would be helped through efforts to enable people to transition from older to newer operating systems and applications.

“Stuff changes quickly all the time; it’s hard to keep up.” —Raul, 20s

In response to these frustrations, study participants had thoughts about how the technology could serve them better. They expressed wanting it to be more personalized, power efficient, affordable, ubiquitous, more voice-based, and operating system agnostic, among other things. People also want technology that’s less in the way and more capable of contextually anticipating some user needs. Loi remarks, “People want power, control, and options. Another key factor is ease of use. For developers, usability has to be a top priority. If a user can’t find a function, then it’s essentially not there, and people won’t buy the application.”

“I like to make choices. I don’t want to be told what to do.” —Francis, 30s

Despite the frustration of constant change, people are excited to see what comes next. Loi says that many of the study participants described futuristic features they'd like to see on their devices—what some participants called Iron Man experiences. These features include the ability to think messages to communicate rather than having to speak or type them; live holographic images to chat with; and a super-sensitive microphone that could pick up voice commands from anywhere in the home. Loi says, "Social media and Hollywood give people ideas and expectations for what might be next. Some people don't understand why these futuristic technologies aren't already available. Despite their complexity, many expect them to be on the market soon, in many cases sooner than they actually will be."

“The ability to show someone else exactly what you are imagining. . . like a 3-dimensional hologram.” —Sheila, 20s

Summary

Significant strides have been made in computing technology in recent years, and there are many things people love about their devices. Yet there are many areas in which technology could be even more convenient and user friendly. Dr. Loi of Intel headed the study, Current Usages: A Global Perspective, which set out to understand how people around the world use their devices now and how they would like to use them in the future. The study revealed the following areas of opportunity for developers:

  • Streamline functionality so that people can carry (and spend money on) fewer devices.
  • Continue to improve usability.
  • Improve security and visibility on smartphones for online shopping.
  • Improve smartphone functions (such as cameras and music access) to take the place of separate devices.
  • Make smartphones more useful for education.
  • Make PC software packages more available and useful for smartphones, and give smartphones the capacity to handle more robust applications.
  • Create a more seamless, contextual experience for users.
  • Improve communications technology to make people feel closer to those with whom they’re communicating.
  • Make battery charges last longer, and find new ways to charge devices.
  • Help people migrate from older applications and systems to newer ones.

The key, says Loi, is delivering a better vision for the future. For this to happen, she says, “It’s critical for all new developments to be grounded in an understanding of what people do, want, need, and desire. This research is meant to influence what we as an industry do based on a clear, data-driven understanding of what users are doing and what they care about.”

About the Author

Karen Marcus, M.A., is an award-winning technology marketing writer who has 18 years of experience. She has developed case studies, brochures, white papers, data sheets, articles, website copy, video scripts, and other content for such companies as Intel, IBM, Samsung, HP, Amazon Web Services, and Microsoft. Karen is familiar with a variety of current technologies and solutions, such as cloud computing, enterprise computing, personal computing, IT outsourcing, operating systems, and application development.

Intel® Studio products and Intel® XDK Installation failure 'Package signature verification failed.'


Installation of Intel® Parallel Studio XE, Intel® System Studio, Intel® INDE, or  Intel® XDK may fail with a message from the installer 'Package signature verification failed.'  There are two specific causes of this failure.  The most likely cause is that valid trusted root certificates cannot be found on the system.  These are needed by the installer to verify the package is good.  A secondary cause is that the installation package is corrupted, or that the package has an invalid signature or timestamp.

If valid trusted root certificates cannot be found on the system, the reasons may include:

  • The system either does not have access to the Internet, or if it does, Windows Update cannot acquire the needed certificates.  In either case, the needed certificates will need to be downloaded and installed separately.  Two certificates are needed; one to verify the digital signature, and another to verify the timestamp. 
    • The 'AddTrust External CA Root' certificate is needed to verify the digital signature and may be obtained here.
    • The 'QuoVadis Root Certification Authority' certificate is needed to verify the timestamp and may be obtained here.
  • The operating system is not a supported version.  For example, Microsoft* Windows 7* is the oldest version supported for installation of  Intel® Parallel Studio XE 2016; older versions of the OS may not have valid certificates, or the OS may not support valid certificates.  For a complete listing of supported operating systems, see the Intel® C++ Compiler 16.0 Update 1 for Windows* Release Notes (all components of Intel® Parallel Studio XE 2016 have the same OS requirements).
  • For the operating system requirements of  Intel® System Studio, Intel® INDE, or  Intel® XDK, see the Release Notes for those products.

If the installation package is corrupted, or the package has an invalid signature or timestamp, then it may have been corrupted during a download, or perhaps been tampered with or corrupted by a virus.  The solution is to obtain a fresh copy of the package and attempt the install again.

 

Intel® System Studio for Microcontrollers Getting Started for Linux*


< Getting Started with Intel® System Studio for Microcontrollers >

 This document provides a general overview of Intel® System Studio for Microcontrollers, explains how to use it for developing and debugging your applications for the Intel® Quark™ microcontroller D1000 on Linux* platforms from the command line and from the Eclipse* IDE, lists the compiler options, and points to additional product information and technical support.

 The Intel® Quark™ microcontroller D1000 requires only a mini-USB connection for flashing, GDB debugging through OpenOCD, and UART communication.

  

< Introducing the Intel® System Studio for Microcontrollers >

 Intel® System Studio for Microcontrollers is an integrated tool suite for developing and debugging systems and applications for the Intel® Quark™ microcontroller D1000 target, a configurable and fully synthesizable accelerator and microcontroller core (hereinafter often referred to as the “MCU”). Further in this document, we will refer to Intel® System Studio for Microcontrollers as the “suite”, the “toolchain”, or the “toolset”.
The toolset consists of the following components:

  • C/C++ LLVM-based compiler with MCU support, including linker, assembler, and C/C++ run-time libraries
  • GDB Debugger with MCU support
  • OpenOCD with MCU support

You can use the toolset from the command line and from the Eclipse* IDE (Luna and Mars releases).
The toolset supports the following host operating systems:

  • Linux* (Fedora* 19 and Ubuntu* 12.04 LTS and 14.04 LTS)

 

Installing the Intel® System Studio for Microcontrollers

Download the Intel® System Studio for Microcontrollers from the Intel Registration Center page.
Before installing the toolchain, make sure you have at least 140 MB of free disk space.
The name of the archive is:
- l_cembd_iqd_p_1.0.n.xxx.tgz (for Linux*)
where “n” is the “update release” number and “xxx” represents the package build number.

Install the toolchain by extracting the contents of the archive corresponding to your operating system to a directory where you have write access. Note that there is no default installation directory for the toolchain. Make sure the installation directory does not contain spaces.

Extract the contents of the archive to a directory where you have write access, for example, your $HOME directory. Use the following command:

tar -xzf l_cembd_iqd_p_1.0.0.001.tgz -C $HOME

In this example, your installation directory will be $HOME/l_cembd_iqd_p_1.0.n.xxx.

 

Installing a valid version of glibc

Make sure you have a valid version of the GNU C Library (glibc). Visit http://www.gnu.org/software/libc/ for installation.

For Fedora* it is glibc.i686. Execute the following command from a terminal as root:

yum install glibc.i686

For Ubuntu* it is ia32-libs. Execute the following command from a terminal as root:

apt-get install ia32-libs

 

Installing USB Driver

 By default, non-root users do not have access to the JTAG pods connected via USB. You must grant write access to the proper /dev/bus/usb entry every time a device is connected to be able to run OpenOCD using a non-root account.

The process can be automated by adding a udev rule:
1. Create a text file in the rules directory:

sudo vim /etc/udev/rules.d/99-openocd.rules

2. Type the following content:


SUBSYSTEM=="usb", ATTR{idVendor}=="0403", ATTR{idProduct}=="6010",MODE="0666"


3. Unplug the device and plug it in again (or reboot the system).
Take these steps; otherwise, OpenOCD fails to run with an error message:


Error: libusb_open() failed with LIBUSB_ERROR_ACCESS
Error: no device found
Error: unable to open ftdi device with vid 0403, pid 6010, description '*'
and serial '*'

 

4. Check what appears after plugging in the D1000 board once installation succeeds. Type 'sudo dmesg -c', then plug the board into your machine, and then type 'sudo dmesg -c' once again.

 

 

Compiling and Debugging the Project

Please refer to the attached PDF user guide for the details.

 

Firmware Example

You can modify the firmware that comes with the Intel® System Studio for Microcontrollers package. The screenshot below shows a modified version of the PushButton test from the firmware.

It detects a button push and prints out a string through the UART.

 

       

Incompatibility between fenv.h implementations in Intel® C++ and Microsoft* Visual C++* Compilers


Reference Number : DPD200570470, DPD200570483

Version : Intel® C++ Compiler Version 16.0 and earlier; Microsoft* Visual C++* Version 2013 and later

Operating System : Windows*

Problem Description : An unexpected segmentation fault or incorrect results may be seen at run time when applications that access the floating-point environment, for example by making use of the C99 floating-point environment header file fenv.h, are built partly with the Intel C++ compiler and partly with the Microsoft Visual C++ compiler version 2013 or later.

Cause : There are several differences between the fenv.h header file introduced in the 2013 version of Microsoft Visual C++ and the version provided in the Intel C++ compiler version 16 and earlier. These include a different size for the fenv_t data type, other differences in the definitions of fenv_t and fexcept_t, and differences in the definitions of macros, especially FE_DFL_ENV.
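The affected code is anything that saves, modifies, or restores the floating-point environment through this interface. The small self-contained example below uses only standard C99/C++ <cfenv> calls to show the kind of code involved: if the translation unit that produces the fenv_t and the one that later consumes it are compiled against headers whose fenv_t layouts or FE_DFL_ENV definitions disagree, the failures described above can result.

```cpp
// Example of code that depends on the fenv.h types and macros mentioned above.
// If this translation unit and another one that shares the saved fenv_t are
// built against headers whose fenv_t layouts disagree, corruption can result.
#include <cfenv>
#include <cstdio>

#pragma STDC FENV_ACCESS ON   // tell the compiler this code touches the FP environment

int main() {
    volatile double num = 1.0, den = 3.0;   // volatile so the division happens at run time

    std::fenv_t saved;
    std::fegetenv(&saved);                  // fenv_t size/layout differs between the headers

    std::fesetround(FE_UPWARD);             // change the rounding mode
    std::printf("1.0/3.0 rounded up:   %.20f\n", num / den);

    std::fesetenv(FE_DFL_ENV);              // FE_DFL_ENV is one of the differing macros
    std::printf("1.0/3.0 default mode: %.20f\n", num / den);

    std::fesetenv(&saved);                  // restore the environment saved earlier
    return 0;
}
```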

Resolution Status : This is a known issue that may be substantially resolved in a future compiler version. However, changes to enhance compatibility of implementations of fenv.h in future Intel compilers with implementations of fenv.h in the Microsoft compiler may lead to incompatibilities with implementations of fenv.h in older Intel compilers.  

Workaround : Build the entire application with the Intel compiler.
