
Applying Intel® RealSense™ SDK Face Scans to a 3D Mesh


This sample uses the Intel® RealSense™ SDK to scan and map a user’s face onto an existing 3D character model. The code is written in C++ and uses DirectX*. The sample requires Intel® RealSense™ SDK R5 or greater, which can be found here.

The sample is available at https://github.com/GameTechDev/FaceMapping.

Scanning

The face scanning module is significantly improved in the R5 release of the SDK. Improvements include:

  • Improved color data
  • Improved consistency of the scan by providing hints to direct the user’s face to an ideal starting position
  • Face landmark data denoting the positions of key facial features

These improvements enable easier integration into games and other 3D applications by producing more consistent results and requiring less user modification.

The scanning implementation in this sample guides the user’s head to a correct position by using the hints provided from the Intel® RealSense™ SDK. Once the positioning requirements are met, the sample enables the start scan button.

This sample focuses on the face mapping process, and therefore the GUI for directing the user during the scan process is not ideal. The interface for an end-user application should better direct the user to the correct starting position as well as provide instructions once the scan begins.

The output of the scan is an .OBJ model file and an associated texture, which are consumed in the face mapping phase of the sample.


Figure 1: The face scanning module provides a preview image that helps the user maximize scan coverage.


Figure 2: Resulting scanned mesh. The image on the far right shows landmark data. Note that the scan is only the face and is not the entire head. The color data is captured from the first frame of the scan and is projected onto the face mesh; this approach yields high color quality but results in texture stretching on the sides of the head.

Face Mapping

The second part of the sample consumes the user’s scanned face color and geometry data and blends it onto an existing head model. The challenge is to create a complete head from the scanned face. This technique displaces the geometry of an existing head model as opposed to stitching the scanned face mesh onto the head model. The shader performs vertex displacement and color blending between the head and face meshes. This blending can be performed every time the head model is rendered, or a single time by caching the results. This sample supports both approaches.

The high-level process of this mapping technique includes:

  1. Render the scanned face mesh using an orthographic projection matrix to create a displacement map and a color map.
  2. Create a matrix to project positions on the head model onto the generated displacement and color maps. This projection matrix accounts for scaling and translations determined by face landmark data.
  3. Render the head model using the projection matrix to map vertex positions to texture coordinates on the displacement and color maps.
  4. Sample the generated maps to deform the vertices and color the pixels. The blending between the color map and the original head texture is controlled by an artist-created control map.
  5. (Optional) Use the same displacement and blending methodologies to create a displaced mesh and single diffuse color texture that incorporates all blending effects.

Art Assets

The following art assets are used in this sample:

Head model. The head model that the face is applied to. The model benefits from higher resolution in the facial area, where the vertices are displaced.

Feature map. Texture mapped to the UVs of the head model that affects the brightness of the head.

Detail map. Repeated texture that applies additional detail to the feature map.

Color transfer map. Controls blending between two base skin tones. This allows different tones to be applied at different locations of the head. For example, the cheeks and ears can have a slightly different color than the rest of the face.

Control map. Controls blending between the displacement and color maps and existing head model data. Each channel of the control map has a separate purpose:

  • The red channel is the weight for vertex Z displacement. A weight of zero uses the vertex position of the head model, a weight of one modifies the Z vertex position based on the generated displacement map, and intermediate values result in a combination of the two.
  • The green channel is the weight for blending between the head diffuse color and the generated face color map. Zero is full head diffuse and one is full face color.
  • The blue channel is an optional channel that covers the jawbone. This can be used in conjunction with the green channel to allow a user’s jawbone color to be applied instead of the head model’s diffuse color. This might be useful in the case where the user has facial hair.

All maps are created in head model UV space.


Figure 3: Head model asset with a highly tessellated face. The scanned face will be extruded from the high-resolution area.


Figure 4: Feature map (left) and detail map (right). The detail map is mapped with head model UVs but repeated several times to add detail.


Figure 5: Color transfer map (left) and the color transfer map applied to the head model (right). This map determines the weights of the two user-selected skin colors.


Figure 6: The control map (left) and the control map applied to the head model (right). The red channel marks the area affected by the displacement map. The green channel marks the area that receives the scanned color map. The blue channel represents the jawbone area; applying the color map there captures distinct jawbone features such as facial hair.

Displacement and Color Maps

The first step of the face mapping process is generating a displacement map and a color map from the scanned face mesh. These maps are generated by rendering the face mesh using an orthographic projection matrix. This sample uses multiple render targets to generate the depth displacement map and the color map in a single draw call. It sets the projection matrix so that the face is fully contained within the viewport.
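
As a rough illustration of this setup (not the sample's actual code), the following Direct3D 11 sketch binds both maps as simultaneous render targets and builds the orthographic projection; the resource names, face bounds, and near/far planes are assumptions.

#include <d3d11.h>
#include <DirectXMath.h>

// Bind the displacement and color maps as simultaneous render targets so one draw
// of the scanned face mesh fills both, then build an orthographic projection sized
// so that the face fills the viewport.
void BeginFaceMapPass(ID3D11DeviceContext *context,
                      ID3D11RenderTargetView *displacementRTV,
                      ID3D11RenderTargetView *colorRTV,
                      ID3D11DepthStencilView *depthStencilView,
                      float faceWidth, float faceHeight,
                      DirectX::XMMATRIX *projectionOut)
{
    ID3D11RenderTargetView *targets[2] = { displacementRTV, colorRTV };
    context->OMSetRenderTargets(2, targets, depthStencilView);

    // Near/far values are placeholders; derive them from the scanned mesh bounds.
    *projectionOut = DirectX::XMMatrixOrthographicLH(faceWidth, faceHeight, 0.01f, 10.0f);
}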


Figure 7: The displacement and color maps generated from the scanned face mesh.

Map Projection Matrix

Now we use the landmark data from the scanned face model and the head model to create a transformation matrix to convert from head model vertex coordinates in model space to texture coordinates in the displacement and color map space. We’ll call this the map projection matrix because it effectively projects the displacement maps onto the head model.

The map projection matrix consists of a translation and a scale transformation:

  • Scale transform. The scaling factor is calculated by the ratio of the distances between the eyes of the scanned face mesh (in projected map coordinates) and eyes of the head model.
  • Translation transform. The vertex translation is calculated using the head model and scanned face mesh landmark data. The translation aligns the point directly between the eyes of the head model with the corresponding point on the displacement map. To calculate this point, we use the left and right eye landmarks to find the center point and then transform it by the orthographic projection matrix used when generating the displacement and color maps.
  • Rotation transform. This sample assumes that the scanned face mesh is axially aligned and does not require a rotation. The sample includes GUI controls for introducing rotation for artistic control.
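
As a rough illustration of the scale and translation steps above, the sketch below builds the map projection matrix from hypothetical eye-landmark inputs (the head model's eye positions in model space, and the scanned face's eye positions already projected into map space); the names and types are assumptions, not the sample's actual code.

#include <DirectXMath.h>
using namespace DirectX;

XMMATRIX BuildMapProjectionMatrix(const XMFLOAT3& headLeftEye, const XMFLOAT3& headRightEye,
                                  const XMFLOAT3& mapLeftEye,  const XMFLOAT3& mapRightEye)
{
    XMVECTOR hL = XMLoadFloat3(&headLeftEye), hR = XMLoadFloat3(&headRightEye);
    XMVECTOR mL = XMLoadFloat3(&mapLeftEye),  mR = XMLoadFloat3(&mapRightEye);

    // Scale: ratio of the eye distances (map space over head-model space).
    float headEyeDist = XMVectorGetX(XMVector3Length(XMVectorSubtract(hR, hL)));
    float mapEyeDist  = XMVectorGetX(XMVector3Length(XMVectorSubtract(mR, mL)));
    float scale = mapEyeDist / headEyeDist;

    // Translation: align the midpoint between the head model's eyes with the
    // corresponding midpoint on the generated maps.
    XMVECTOR headCenter = XMVectorScale(XMVectorAdd(hL, hR), 0.5f);
    XMVECTOR mapCenter  = XMVectorScale(XMVectorAdd(mL, mR), 0.5f);
    XMVECTOR translation = XMVectorSubtract(mapCenter, XMVectorScale(headCenter, scale));

    // No rotation term: the scanned face mesh is assumed to be axially aligned.
    return XMMatrixMultiply(XMMatrixScaling(scale, scale, scale),
                            XMMatrixTranslationFromVector(translation));
}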


Figure 8: Generated color map (left) being orthographically projected onto the head model. The map is translated and scaled so that the yellow anchor points between the eyes align.

Rendering

The sample applies the generated displacement and color maps at render time in vertex and pixel shaders.

Vertex Shader

The vertex shader displaces the model’s Z coordinate based on the displacement map. The displacement map texture coordinates are sent to the pixel shader where they’re used to sample the color map for blending.

The vertex shader steps include:

  1. Transform vertex position by the map projection matrix to get color/displacement map texture coordinates.
  2. Sample the displacement map texture at calculated coordinates.
  3. Convert the displacement sample to a model space Z value. The range and scale of the displacement map are passed through a constant buffer.
  4. Blend between the displaced Z value and the original Z value based on the control map’s red component. The control map lets the artist decide what vertices get displaced and allows for a gradual, smooth transition to the displaced position.
  5. Pass the displacement map UV coordinates to the pixel shader to be used to sample the color map.

Pixel Shader

The pixel shader uses the control map’s green channel to blend between the head color and the generated color map texture. Because the sample allows the user to change the head color to better match the scanned face color, the color is blended into greyscale art assets in the pixel shader. The skin color is calculated for each pixel by blending between two user-selected colors based on the color transfer map. That skin color is multiplied by the greyscale intensity to produce a final head color.


Figure 9: Demonstration of head model blending without applying the displacement or color maps.


Figure 10: Final result composited in real-time using the sample.

Exporting

This technique applies several layers of blending in the pixel shader and modifies the vertex position in the vertex shader each time the model is rendered. The sample also supports exporting the composited texture and the deformed mesh to an .OBJ file.

The entire compositing and deformation process still occurs on the GPU using a variation of the original shaders. The new vertex shader uses Direct3D* stream-output support to capture the deformed vertices. The vertex shader also uses the input texture coordinates as the output position; this effectively renders a new UV mapped texture.

Once composited, the model can be rendered with lower overhead and without any custom shaders, allowing it to easily be loaded by 3D modeling tools and game engines.


Figure 11: Exported .OBJ model (left) and the composited head texture mapped to the head model's UVs (right).


Designing Apps for Intel® RealSense™ Technology – User Experience Guidelines with Examples for Windows*


Introduction

Intel® RealSense™ technology supports two varieties of depth cameras: the short-range, user-facing camera (called the F200) is designed for use in laptops, Ultrabook™ devices, 2 in 1 devices, and All-In-One (AIO) form factors, while the long-range, world-facing camera (called the R200) is designed for the detachable and tablet form factors. Both these cameras are available as peripherals and are built into devices we see in the market today. When using Intel RealSense technology to develop apps for these devices, keep in mind that the design paradigm for interacting with 3D apps without tactile feedback is considerably different than what developers and end users are used to with apps built for touch.

In this article, we highlight some of the most common UX ideas and challenges for both the F200 and R200 cameras and demonstrate how developers can build in visual feedback through the Intel® RealSense™ SDK APIs.

F200 UX and API Guidelines

Outcome 1: Understanding the capture volumes and interaction zones for laptop and AIO form factors

The UX scenario

Consider the scenarios depicted in Figure 1.


Figure 1: Capture volumes.

The pyramid drawn out of the camera represents what is called the capture volume, also known as the Field of View (FOV). For the F200, the capture volume includes the horizontal and vertical axes of the camera as well as the effective distance of the user from the camera. If the user moves out of this pyramid, the camera fails to track the mode of interaction. A table of reference for FOV parameters is given below:

Effective range for gestures: 0.2–0.6 m
Effective range for face tracking: 0.35–1.2 m
FOV (D x V x H), color camera, in degrees: 77 x 43 x 70 (cone)
FOV (D x V x H), depth (IR) camera, in degrees: 90 x 59 x 73 (cone)
IR projector FOV: NA x 56 x 72 (pyramid)
RGB resolution: up to 1080p at 30 frames per second (fps)
Depth resolution: up to 640x480 at 60 fps

Both the color and depth cameras within the F200 have different fidelities, and, therefore, application developers need to keep the capture volume in mind for the modalities they want to use. As shown in the table above, the effective range for gestures is shorter, whereas face tracking covers a longer range.

Why is this important from a UX perspective? End users are unaware of how the camera sees them. Because they are not aware of the interaction zones, they may become frustrated using the app, with no way to determine what went wrong. As shown in the image on the left in Figure 1, the user’s hand is within the FOV, whereas in the image on the right, the user’s hand is outside the FOV, depicting a scenario where tracking could be lost. The problem is compounded if the application uses two hands or multiple modalities like hands and the face at the same time. Consider the consequences if your application is deployed on different form factors like laptops and AIOs, where the effective interaction zone in the latter is higher than on a laptop. Figure 2 depicts scenarios where users are positioned in front of different devices.

Figure 2: FOV and form factor considerations.

Keeping these parameters in mind will help you build an effective visual feedback mechanism into the application that clearly steers users toward correct usage. Let’s now see how to capture some of these FOV parameters in your app through the SDK.

The technical implementation

The Intel RealSense SDK provides APIs that allow you to capture the FOV and camera range. The APIs QueryColorFieldOfView and QueryDepthFieldOfView are both provided as device-neutral functions within the “device” interface. Here is how to implement it in your code:
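
A minimal sketch of those calls is shown below. It assumes an initialized PXCSenseManager named senseManager (header pxcsensemanager.h) and omits error handling:

PXCCapture::Device *device = senseManager->QueryCaptureManager()->QueryDevice();

// Each PXCPointF32 holds the horizontal (x) and vertical (y) angles in degrees.
PXCPointF32 colorFOV = device->QueryColorFieldOfView();
PXCPointF32 depthFOV = device->QueryDepthFieldOfView();

wprintf_s(L"Color FOV: %.1f x %.1f degrees\n", colorFOV.x, colorFOV.y);
wprintf_s(L"Depth FOV: %.1f x %.1f degrees\n", depthFOV.x, depthFOV.y);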

Though the return data structure is a PXCPointF32, the values returned indicate the x and y angles in degrees and are the model set values, not the device-calibrated values.

The next capture-volume parameter is range. The QueryDepthSensorRange API returns the range value in mm. Again, this is a model default value and not the device-calibrated value.
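
A sketch of that call, continuing from the device pointer above (the PXCRangeF32 return type, with min and max in millimeters, is an assumption to verify against the SDK documentation):

// Model-default depth sensing range of the camera, in millimeters.
PXCRangeF32 depthRange = device->QueryDepthSensorRange();
wprintf_s(L"Depth sensor range: %.0f mm to %.0f mm\n", depthRange.min, depthRange.max);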

Knowing the APIs that exist and how to implement them in your code, you can build effective visual feedback to your end users. Figures 3 and 4 show examples of visual feedback for capture volumes.
Figure 3: Distance prompts.
Figure 4: World diagrams.

Simple prompts indicate the near and far boundaries of the interaction zone. Without prompts, if the system becomes unresponsive, the user might not understand what to do next. Filter the distance data and show the prompt after a slight delay. Also ensure that you use positive instructions instead of error alerts. World diagrams orient the user and introduce them to the notion of a depth camera with an interaction zone. The use of world diagrams is recommended for help screens and tutorials and for games in which users might be new to the camera. For maximum effectiveness, show the world diagrams only during a tutorial or on a help screen. Instructions should be easy to understand and created with the audience in mind.

You can supplement the use of the above-mentioned APIs with alerts that are provided within each SDK middleware to capture specific user actions. For example, let’s take a look at the face detection middleware. The following table summarizes some of the alerts within the PXC[M]FaceData module:

As we already know, the SDK allows for detecting up to four faces within the FOV. Using the face ID, we can capture alerts specific to each face depending on your application’s needs. It is also possible that tracking is lost completely (for example, the face moved in and out of the FOV too fast for the camera to track). In such a scenario, you can use the capture volume data together with the alerts to build a robust feedback mechanism for your end users.

ALERT_NEW_FACE_DETECTED: A new face is detected.
ALERT_FACE_NOT_DETECTED: There is no face in the scene.
ALERT_FACE_OUT_OF_FOV: The face is out of the camera's field of view.
ALERT_FACE_BACK_TO_FOV: The face is back in the field of view.
ALERT_FACE_LOST: Face tracking is lost.

The SDK also allows you to detect occlusion scenarios. Please refer to the F200 UX guideline document for partially supported and unsupported scenarios. Irrespective of which category of occlusion you are trying to track, the following set of alerts will come in handy.

ALERT_FACE_OCCLUDED: The face is occluded.
ALERT_FACE_NO_LONGER_OCCLUDED: The face is no longer occluded.
ALERT_FACE_ATTACHED_OBJECT: The face is occluded by an object, for example, a hand.
ALERT_FACE_OBJECT_NO_LONGER_ATTACHED: The face is no longer occluded by the object.

Now let’s take a look at alerts within the hand tracking module. These are available within the PXC[M]HandData module of the SDK. As you can see, some of these alerts also provide the range detection implicitly (recall that the range is different for the face and hand modules).

ALERT_HAND_OUT_OF_BORDERS: A tracked hand is outside the 2D bounding box or 3D bounding cube defined by the user.
ALERT_HAND_INSIDE_BORDERS: A tracked hand has moved back inside the 2D bounding box or 3D bounding cube defined by the user.
ALERT_HAND_TOO_FAR: A tracked hand is too far from the camera.
ALERT_HAND_TOO_CLOSE: A tracked hand is too close to the camera.
ALERT_HAND_DETECTED: A tracked hand is identified and its mask is available.
ALERT_HAND_NOT_DETECTED: A previously detected hand is lost, either because it left the field of view or because it is occluded.
And more: refer to the documentation.

Now that you know what capabilities the SDK provides, it is easy to code this in your app. The following code snippet shows an example:
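
The sketch below shows the pattern for the hand module: enable all alerts, then poll the fired alerts each frame. It assumes an initialized PXCSenseManager named senseManager with the hand module enabled, and exact signatures may vary slightly between SDK releases:

PXCHandModule *handModule = senseManager->QueryHand();
PXCHandConfiguration *handConfig = handModule->CreateActiveConfiguration();
handConfig->EnableAllAlerts();
handConfig->ApplyChanges();

PXCHandData *handData = handModule->CreateOutput();
while (senseManager->AcquireFrame(true) >= PXC_STATUS_NO_ERROR)
{
    handData->Update();

    // Walk the alerts fired since the last frame and surface them to the user.
    for (pxcI32 i = 0; i < handData->QueryFiredAlertsNumber(); ++i)
    {
        PXCHandData::AlertData alertData;
        if (handData->QueryFiredAlertData(i, alertData) < PXC_STATUS_NO_ERROR) continue;

        switch (alertData.label)
        {
        case PXCHandData::ALERT_HAND_OUT_OF_BORDERS: wprintf_s(L"Hand out of borders\n"); break;
        case PXCHandData::ALERT_HAND_TOO_FAR:        wprintf_s(L"Hand too far\n");        break;
        case PXCHandData::ALERT_HAND_TOO_CLOSE:      wprintf_s(L"Hand too close\n");      break;
        case PXCHandData::ALERT_HAND_NOT_DETECTED:   wprintf_s(L"Hand lost\n");           break;
        default: break;
        }
    }
    senseManager->ReleaseFrame();
}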



Replace the wprintf_s statements with logic to implement the visual feedback. Instead of enabling all alerts, you can also just enable specific alerts as shown below:
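
For example, in place of EnableAllAlerts() in the sketch above:

handConfig->EnableAlert(PXCHandData::ALERT_HAND_OUT_OF_BORDERS);
handConfig->EnableAlert(PXCHandData::ALERT_HAND_TOO_FAR);
handConfig->EnableAlert(PXCHandData::ALERT_HAND_TOO_CLOSE);
handConfig->ApplyChanges();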

Figures 5 and 6 show examples of effective visual feedback using alerts.

Figure 5: User viewport.

Figure 6: User overlay.

Links to APIs in SDK documentation:

QueryColorFieldOfView: https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?querycolorfieldofview_device_pxccapture.html

QueryDepthFieldOfView: https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?querydepthfieldofview_device_pxccapture.html

QueryDepthSensorRange: https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?querydepthsensorrange_device_pxccapture.html

Face module FOV alerts:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?alerttype_alertdata_pxcfacedata.html

Hand module FOV alerts:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?manuals_handling_alerts.html

Outcome 2: Minimizing user fatigue

The UX scenario: The choice of appropriate input for required precision

When building apps using the Intel RealSense SDK, it is important to keep modality usage relevant. Choosing the appropriate input methods for various scenarios in your application plays a key role. Keyboard, mouse, and touch provide for higher precision, while gesture provides lower precision. For example, keyboard and mouse, rather than gestures, are still the preferred input methods for data-intensive apps. Imagine using your finger instead of a mouse to select a specific cell in Excel (see Figure 7). This would be incredibly frustrating and tiring. Users naturally tense their muscles when trying to perform precise actions, which in turn accelerates fatigue. 

Figure 7: Choice of correct input. 

The selection of menu items can be handled either through touch or mouse. The Intel RealSense SDK modalities provide a direct, natural, and non-tactile interaction mechanism while making your application more engaging. Use them in a way that does not require many repeated gestures. Continuous, low-risk actions are best for gesture usage.

Choice of direction for gesture movement

Tips for designing for left-right or arced gesture movements: Whenever presented with a choice, design for movement in the left-right directions versus up-down for ease and ergonomic considerations. Also, avoid actions that require your users to lift their hands above the height of their shoulder. Remember the gorilla arm effect?

Figure 8: Choice of direction for gesture movement.

Choice of relative versus absolute motion

Allow for relative motion instead of absolute motion wherever it makes sense. Relative motion allows the user to reset his or her hand representation on the screen to a location that is more comfortable for the hand (such as when lifting a mouse and repositioning it so that it is still on the mouse pad). Absolute motion preserves spatial relationships. Applications should use the motion model that makes the most sense for the particular context.

Understanding speed of motion

The problem of precision is compounded by speed. When users move too fast in front of the camera, they potentially risk losing tracking altogether because they could move out of the capture volume. Building fast movement into apps also introduces fatigue while being more error prone. So it is critical to understand the effects of speed and its relation to the effective range (faster motion up to 2m/s can be detected closer to the camera—20 to 55 cm) and the capture volume (closer to the camera implies only one hand can be in the FOV).

Understanding action and object interaction

The human body is prone to jitters that could be interpreted by the camera as multiple interactions. When designing apps for the Intel RealSense SDK, keep action-to-object interaction in mind. For example, if you have objects that could be grabbed through gesture, consider their size, placement, how close they are to the edges of the screen, where to drop the object, and how to detect tracking failures.

Here are some guidelines to help avoid these challenges:

  • Objects should be large enough to account for slight hand jitter. They should also be positioned far enough apart so users cannot inadvertently grab the wrong object.
  • Avoid placing interaction elements too close to the edge of the screen, so the user doesn’t get frustrated by drifting out of the field of view and losing tracking altogether.
  • If the interface relies heavily on grabbing and moving, it should be obvious to the user where a grabbed object can be dropped.
  • If the hand tracking fails while the user is moving an object, the moved object should reset to its origin and the tracking failure should be communicated to the user.

The technical implementation: Speed and precision

If your application doesn’t require the hand skeleton data, but relies more on quicker hand movements, consider using the “blob” module. The following table gives a sampling of scenarios and their expected precision. While full hand tracking with joint data requires a slower speed of movement, this limitation can be overcome by either choosing the extremities or the blob mode. The blob mode is also advantageous if your application is designed for kids to use.

If you do want more control within your app and want to manage the speed of motion, you can obtain speed at the hand joint level through the use of PXCMHandConfiguration.EnableJointSpeed. This allows you to either obtain the absolute speed based on current and previous location or average speed over time. However, this feature is a drain on the CPU and memory resources and should be considered only when absolutely necessary.
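
A sketch of that configuration is below; the joint choice and the 100 ms averaging window are illustrative assumptions, so check the EnableJointSpeed documentation linked at the end of this section for the exact signature:

// Report the palm center's average speed over a (hypothetical) 100 ms window.
handConfig->EnableJointSpeed(PXCHandData::JOINT_CENTER,
                             PXCHandData::JOINT_SPEED_AVERAGE, 100);
handConfig->ApplyChanges();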

Since hand jitter cannot be avoided, the SDK also provides the Smoother utility (PXC[M]Smoother) to reduce the jitter seen by the camera. This utility offers various linear and quadratic algorithms that you can experiment with and pick the one that works best for your needs. In Figure 9 below, you can see how the effect of jitter is reduced through the use of this utility.


Figure 9: Smoothed and unsmoothed data.

Another mechanism you can use to detect whether the hand is moving too fast is the TRACKINGSTATUS_HIGH_SPEED enumeration within the PXCMHandData.TrackingStatusType property. For face detection, fast movements may lead to lost tracking. Use PXCMFaceData.AlertData.AlertType – ALERT_FACE_LOST to determine whether tracking is lost. Alternatively, if you are using hand gestures to control the OS using Touchless Controller, use the  PXC[M]TouchlessController member functions SetPointerSensitivity and SetScrollSensitivity to set pointer and scroll sensitivity.

Bounding boxes

An effective mechanism to ensure smooth action and object interaction is the use of bounding boxes, which provide clear visual cues to the user on the source and destination areas for the object of interaction.

The hand and face modules within the SDK provide for the PXCMHandData.IHand.QueryBoundingBoxImage API, which returns the location and dimension of the tracked hand—a 2D bounding box—in the depth image pixels, and the PXCMFaceData.DetectionData.QueryBoundingRect API, which returns the bounding box of the detected face. You can also use PXCMHandData.AlertType – ALERT_HAND_OUT_OF_BORDERS to detect whether the hand is out of the bounding box.
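
As an illustration, the sketch below queries the nearest tracked hand from a PXCHandData instance updated this frame and reads its image-space bounding box (field and method names should be checked against the SDK documentation):

PXCHandData::IHand *hand = NULL;
if (handData->QueryHandData(PXCHandData::ACCESS_ORDER_NEAR_TO_FAR, 0, hand) >= PXC_STATUS_NO_ERROR)
{
    // Bounding box of the tracked hand in depth-image pixels.
    PXCRectI32 box = hand->QueryBoundingBoxImage();
    wprintf_s(L"Hand box: x=%d y=%d w=%d h=%d\n", box.x, box.y, box.w, box.h);
}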

Links to APIs in the SDK documentation:

Blob tracking algorithm:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?manuals_blob_tracking.html

EnableJointSpeed:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?enablejointspeed_pxchandconfiguration.html

The Smoother utility:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?manuals_the_smoother_utility.html

TouchlessController:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?pxctouchlesscontroller.html

SetPointerSensitivity and SetScrollSensitivity:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?member_functions_pxctouchlesscontroller.html

R200 UX and API Guidelines

The R200 camera is designed for tablet and detachable form factors with uses that capture the scene around you. Augmented reality and full-body scanning are among the prominent use cases for the R200 camera. With the focus on the world around you, the nature and scope of the UX challenges differ from the F200 scenarios discussed in the previous section. In this section, we provide insights into some of the known UX issues around the Scene Perception module (which developers will use for augmented reality apps) and the 3D scanning module.

Outcome 1: Understanding the capture volumes and interaction zones for tablet form factors

The UX scenario

As shown in Figure 10, the horizontal and vertical angles and the range for the R200 are considerably different than for the F200. The R200 camera can also be used in two different modes: active mode (when the user is moving around capturing a scene) and passive mode (when the user is working with a static image). When capturing an object or scene, ensure that it stays within the FOV while the user is actively performing a scan. Also note that the range of the camera (which depends on indoor versus outdoor use) differs from the F200. How do we capture these data points at runtime so that we can provide good visual feedback to the user?


Figure 10: R200 capture volumes.

The technical implementation

The QueryColorFieldOfView() and QueryDepthFieldOfView() APIs were introduced in the F200 section above. These functions are device neutral and report the capture volume for the R200 as well. However, the API to detect the R200 camera range is device specific. To obtain this data for the R200, use the QueryDSMinMaxZ API, which is available as part of the PXCCapture interface and returns the minimum and maximum range of the camera in mm.
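
A sketch, assuming the same initialized device interface as in the F200 snippets and a PXCRangeF32 return type (values in millimeters):

PXCCapture::Device *device = senseManager->QueryCaptureManager()->QueryDevice();

// R200-specific: model-default minimum and maximum sensing range, in mm.
PXCRangeF32 dsRange = device->QueryDSMinMaxZ();
wprintf_s(L"R200 depth range: %.0f mm to %.0f mm\n", dsRange.min, dsRange.max);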

Links to APIs in SDK documentation

QueryDSMinMaxZ: https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?querydsminmaxz_device_pxccapture.html

Outcome 2: Understanding user action and scene interaction

The UX scenario: Planning for the scene and camera qualities

While working in the active camera mode, be aware of the camera’s limitations. Depth data is less accurate when scanning a scene with very bright areas, reflective surfaces, or black surfaces. Knowing when tracking could fail helps you build an effective feedback mechanism into the application and lets the app fail gracefully rather than blocking play.

The technical implementation

The Scene Perception and 3D scanning modules have different requirements and hence provide for separate mechanisms to detect minimum requirements.

  • Scene Perception. Always use the CheckSceneQuality API within the PXCScenePerception module to determine whether the scene in question is suitable for tracking. The API returns a value between 0 and 1; the higher the return value, the better the scene is for tracking. Here is how to implement it in the code:
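
(The snippet below is a sketch rather than the original sample code; it assumes an initialized PXCSenseManager named senseManager with Scene Perception enabled, and the 0.5 threshold is an app-level choice.)

PXCScenePerception *scenePerception = senseManager->QueryScenePerception();
PXCCapture::Sample *sample = senseManager->QuerySample();

// Returns a value in [0, 1]; higher means the scene is better suited for tracking.
pxcF32 quality = scenePerception->CheckSceneQuality(sample);
if (quality >= 0.5f)
{
    // Good enough: start tracking and reconstruction.
}
else
{
    // Prompt the user to point the camera at a richer, less reflective scene.
}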

Once you determine that the scene quality is adequate and tracking starts, check the tracking status dynamically using the TrackingAccuracy API within the PXCScenePerception module, which enumerates the tracking accuracy definition.

HIGH: High tracking accuracy.
LOW: Low tracking accuracy.
MED: Median tracking accuracy.
FAILED: Tracking failed.

To ensure the right quality of data for the scene in question, you can also set the voxel resolution (a voxel represents the unit/resolution of the volume). Depending on whether you are tracking a room-size area, tabletop, or a close object, for best results, set the voxel resolution as indicated in the table below.

LOW_RESOLUTION: The low voxel resolution. Use this resolution in a room-sized scenario (4/256 m).
MED_RESOLUTION: The median voxel resolution. Use this resolution in a tabletop-sized scenario (2/256 m).
HIGH_RESOLUTION: The high voxel resolution. Use this resolution in an object-sized scenario (1/256 m).
  • 3D Scanning. The 3D scanning algorithm provides the alerts shown in the table below. Use PXC3DScan::AlertEvent to obtain this data.
ALERT_IN_RANGE: The scanning object is in the right range.
ALERT_TOO_CLOSE: The scanning object is too close to the camera. Prompt the user to move the object away from the camera.
ALERT_TOO_FAR: The scanning object is too far from the camera. Prompt the user to move the object closer.
ALERT_TRACKING: The scanning object is being tracked.
ALERT_LOST_TRACKING: Tracking of the scanning object was lost.

Once the data to track camera and module limitations is available within the app, you can use it to provide visual feedback, clearly showing users how the camera interpreted their actions or, in the event of failure, how they can correct their actions. Samples of visual feedback are provided here for reference; you can adapt these to suit your application’s requirements and UI design.

  • Sample tutorial at the start:
    Figure 11: Tutorials.
  • Preview of subject or area captured:
    Figure 12: Previews.
  • User prompts:
    Figure 13: User prompts.

Minimizing fatigue while holding the device

Most applications will use the device in both active and inactive camera modes. (We distinguish these two modes as follows: “active camera” when the user is holding up the tablet to actively view a scene through the camera or perform scanning and “inactive camera” when the user is resting the tablet and interacting with content on the screen while the camera is off.) Understanding the way in which the user holds and uses the device in each mode and choosing interaction zones accordingly is critical to reducing fatigue. Active camera mode is prone to a higher degree of fatigue due to constant tracking, as shown in Figure 14.


Figure 14: Device usage in active and inactive modes.

Choosing the appropriate mode for the activity

The mode of use also directly dictates the nature of interaction with the app you build through the UI. In active mode, the user holds the device with both hands. Therefore, any visual elements, like buttons, that you provide in the app must be easily accessible to the user. Research has shown that the edges of the screen are most suitable for UI design. Figure 15 shows the preferred touch zones. Interactions are also less precise in active mode, so active mode works best for short captures.

In contrast, in inactive mode, touch interactions are more comfortable, more precise, and can be used for extended play.


Figure 15: Touch zones in active and inactive modes.

Links to APIs in SDK documentation:

Scene Perception Configuration and tracking data:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?manuals_configuration_and_tra2.html

3D Scanning Alerts:https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?alertevent_pxc3dscan.html

Summary:

App development using Intel® RealSense™ technology requires developers to keep the end user in mind from the earliest stages of development. The guidance in this article provides a starting point for understanding some of the critical UX challenges and implementing solutions in code using the SDK.

Additional Resources:

F200 UX Guidelines: https://software.intel.com/sites/default/files/managed/27/50/Intel%20RealSense%20SDK%20Design%20Guidelines%20F200%20v2.pdf

R200 UX Guidelines:

https://software.intel.com/sites/default/files/managed/d5/41/Intel%20RealSense%20SDK%20Design%20Guidelines%20R200%20v1_1.pdf

Best UX practices for F200 apps:

https://software.intel.com/en-us/articles/best-ux-practices-for-intel-realsense-camera-f200-applications

Link to presentation and recording at IDF:

http://myeventagenda.com/sessions/0B9F4191-1C29-408A-8B61-65D7520025A8/7/5

About the authors:

Meghana Rao:

As a Developer Evangelist within Intel's Software and Services division, Meghana works with developers and ISVs assisting them with Intel® RealSense™ Technology and Windows* 8 application development on Ultrabook™, 2-in-1s, and tablet devices on Intel® Architecture. She is also a regular speaker at Intel Application Labs teaching app design and development on Intel platforms and has contributed many white papers on the Intel® Developer Zone. She holds a Bachelor of Engineering degree in Computer Science and engineering and a Master of Science degree in Engineering and Technology Management. Prior to joining Intel in 2011, she was a senior software engineer with Infineon Technologies India Pvt. Ltd.

Kevin Arthur:

Kevin Arthur is a Senior User Experience Researcher in the Perceptual Computing group at Intel. He leads user research on new use cases and best practices for RealSense 3D depth cameras, including mixed and augmented reality experiences. He holds a PhD in computer science from the University of North Carolina at Chapel Hill, where he specialized in human-computer interaction in virtual reality systems. He previously did user experience and software R&D work at Amazon Lab126, Synaptics, and Industrial Light & Magic.

Windows 10* Features Every Game Developer Should Know


Figure 1. Windows* 10 provides a common OS across the spectrum of Microsoft devices.

Among a long list of improved features, Windows 10 is positioned to tie together the wide variety of potential gaming platforms with UWP apps. The augmented support for 2-in-1s brings a more natural adaptability to interaction modes. With all platforms running variants of the same system, the apps are delivered by the same store. For other PC games, the support provided for Steam* makes it even stronger than before. Whichever distribution works for your game, DirectX* 12 brings new life to games with significantly increased power.

Power with DirectX 12

DirectX 12 is the most recent version of Microsoft’s graphics API suite designed to manage and optimize multimedia functionality – most notably handling video game graphics; in fact, Xbox* got its name from originally being conceived as the DirectX Box, so it’s no surprise that the technology has continued to be a pillar of game development.


Figure 2. DirectX* 12 brings PCs console-level optimization

DirectX 12 on Windows 10 improves that relationship further by reducing call overhead, memory paging, and system footprint, to give your game more space and processor time. Games being run in the foreground are granted better control over their process execution, with reserved memory to minimize texture swaps and other potential conflicts with external apps. Without going into a deeper explanation, these capabilities translate into a smoother experience for the user and a wider range of computers being able to maintain acceptable performance.

One of the biggest concerns when considering use of a new technology is how much of the existing content and development process will need to change. Fortunately, the jump from DirectX 11 to 12 keeps interdepartmental workings consistent; development can continue unimpeded with the same textures and models, formats, render targets, compute models, etc. The most significant changes are constrained to the code of custom engines, and MSDN even offers some official guidance for porting those. In most situations, developers use an existing engine—DirectX 12 has already been introduced for Unreal* and Unity* (specifically providing boosts for multicore scaling, with continued improvement ahead). Combined with its native support on 6th generation Intel® Core™ processors, DirectX 12 really gets the engine running on all cylinders.

Distribution with Steam

Steam is the leading digital distribution service for PC games, making it easy for gamers to discover, purchase, and access their games. It may seem counterproductive for Microsoft to support an external app store, but it’s not trying to compete for the same space; the Windows app store is focused on serving the full spectrum of Windows devices (which last year totaled upward of 1.5 billion), so you can do things like shopping on your phone to find and download games to your Xbox One and PC.

Just like developers moving to new tools, gamers can be hesitant to upgrade to a new OS when so much of their game library is in a potentially unsupported program. Fortunately, Steam* on Windows 10 is fully backward compatible, requiring no changes aside from updated video drivers.

There is even a Steam Tile to create tiles for quick access to individual games. Microsoft really wants Steam to “run great on Windows 10,” and these various points of solid support are definitely gaining traction; according to Steam’s opt-in hardware survey, Windows 10 has become the second most widely used OS on Steam, with over a quarter of users having made the switch in the short few months since its release.

Figure 3. Second only to Windows* 7 (total 44.86%), Windows 10 continues to grow (27.42%) as much as all others combined.

Versatility with 2-in-1s

With the hardware of modern convertible devices, a tablet ideal for gaming on the go can become a laptop with the high-quality graphics and performance expected of PC games. In combination, the versatility afforded to gamers having a touch screen and laptop can create a gestalt experience tailored to the ideal interaction methods for various activities. And with technology like the Intel® RealSense™ SDK enabling perceptual computing with 3D cameras and voice commands (here’s a quick-start guide to using it in Windows 10), the spectrum of user controls continues to grow.

Windows 10 bridges the gap between these two modes, allowing seamless transitions between interfaces tailored for the control scheme currently available to the user; when in tablet mode, most input is likely to be designed for touch, whereas keyboard usage can be expected when the keyboard is present.

Integration across the Windows Universe

The dream goal of “write once, run anywhere” becomes clearer with the advent of Universal Windows Platform (UWP) apps. Microsoft is standardizing the operating systems of all platforms under the Windows 10 banner, creating a significant opportunity for code reuse. While you still have the option of writing a different version for each platform, you can write a single game that runs on a PC, Xbox One, tablets, phones, and even the HoloLens*, or target entire device families by their hardware commonalities.

One Windows platform

UWP apps bridge the gaps while allowing device-specific customization. Detecting the device family and specific features available can be done before installation by deciding which binary to use, or it can be done by API contracts at runtime, allowing adaptive features to make things like 2-in-1s and peripherals easier to use.

One notable addition to make the most of Windows Apps is the way app life cycle transitions are handled. Rather than being in the dark as to whether your game is in stasis or even about to be closed, these changes can trigger program events to handle operations for a more persistent player experience. In addition to apps being able to gracefully manage state changes, Windows is able to more effectively manage the resources needed.

There are a number of resources that provide technical information, such as Microsoft’s GDC 2015 talks, presenting good overviews touching on many aspects this article doesn’t have the space to explore: Chris Tector and Don Box explain how the systems work together under this paradigm, Bill Schiefelbein demonstrates how gamers and game developers connect in a new form of social network around the Xbox app, Vijay Gajjala and Brian Tyler elaborate on using the Xbox Live APIs to quickly make use of these new features, and Chris Charla introduces the ID@XBOX program for independent developers to self-publish at the same level as anyone else (even receiving two dev kits at no charge as long as you have an email address with your company website).

Connection via the Xbox App

The Xbox app—the PC hub of gaming activity that extends Xbox Live functionality to other devices—ties all the game experiences together. It unites gamer friends in more of a social network dynamic, driving discovery, engagement, and player retention by enabling development of a player culture.

Figure 4. The Xbox* app is geared toward enriching the gamer experience.

Players can capture clips and screenshots of their games to share, with the bonus option of capturing the last 30 seconds of gameplay for when something awesome but unexpected happens. Since the network of friends and followers connects across platform differences, any game enabled with these features is granted the same degree of exposure—even users viewing the app on a mobile device can watch your game clips.

The single sign-on approach of Xbox accounts makes user profile association easy, letting the OS handle a lot of the leg work (and import friends from Facebook*). Similarly, since Windows apps have explicit manifest information, the system can manage installation and updates, saving significant developer hours (which are especially critical once the game goes live).

Those developers can also simplify gathering and using in-game metrics, granting a richer online presence; rather than simply knowing what game you’re playing, your friends could potentially see where you are in the game and how you’re doing—presumably with privacy options. The friends list is viewable alongside the games library, where developers can live-update the game’s information for announcements and updates.

The Windows Dev Center also provides dashboards on analytics, tracking things like player data and monetization, as well as dynamic creation of engagement drivers like achievements, challenges, and leaderboards. With the information available to developers, players can connect with new aspects of your game, while connecting with others through your game in new ways.

Bonus: Streaming Xbox One to the PC

In addition to its ability to play and connect with friends across platform differences, the Xbox One can stream to any Windows 10 PC on the same network. Granted, there are some considerations to ensure the gameplay stays in top shape, but the ability to play console games without being tethered to a TV seems like a dream come true.


Figure 5. Streaming gameplay to anywhere on your network.

Looks like a Win-Win

If you are a gamer or game developer targeting the PC or Xbox One, Windows 10 is your friend. The features delivered by the Xbox app and APIs lay the foundation for a broader and deeper engagement with players. By anticipating gamer preferences for power and ease of use, the details of Microsoft’s newest offering all lean toward augmenting the gamer experience on multiple levels.

For More Information

The meaning of Xbox: http://www.economist.com/node/5214861

DirectX 12: http://blogs.msdn.com/b/directx/archive/2014/03/20/directx-12.aspx

What DirectX 12 means for gamers and developers: http://www.pcgamer.com/what-directx-12-means-for-gamers-and-developers/

Important Changes from Direct 3D 11 to Direct 3D 12: https://msdn.microsoft.com/en-us/library/windows/desktop/dn899194(v=vs.85).aspx

Porting from Direct3D 11 to Direct 3D 12: https://msdn.microsoft.com/en-us/library/windows/desktop/mt431709(v=vs.85).aspx

Any developer can now make a DirectX 12 game with updated Unreal Engine 4: http://www.windowscentral.com/any-developer-can-now-make-directx-12-game-updated-unreal-engine-4

Unity Founder: DirectX 12 API Alone Doesn’t Give A Significant Performance Boost: http://gamingbolt.com/unity-founder-directx-12-api-alone-doesnt-give-a-significant-performance-boost

Product Brief: 6th Gen Intel® Core™ Processor Platform:  http://www.intel.com/content/www/us/en/processors/core/6th-gen-core-family-mobile-brief.html

Valve Lines Up Console Partners in Challenge to Microsoft, Sony: http://www.bloomberg.com/news/articles/2013-11-04/valve-lines-up-console-partners-in-challenge-to-microsoft-sony

Microsoft’s Xbox Store isn’t trying to cut out Steam in Windows 10: http://venturebeat.com/2015/05/21/microsofts-xbox-store-isnt-trying-to-cut-out-steam-in-windows-10/

Microsoft: We have 1.5 billion Windows devices in the market: http://www.neowin.net/news/microsoft-we-have-15-billion-windows-devices-in-the-market

Steam Tile:   https://www.microsoft.com/en-us/store/apps/steam-tile/9wzdncrfhzkv

Microsoft wants to support Steam and “help it run great on Windows 10”: http://www.technobuffalo.com/2015/08/07/microsoft-wants-to-support-steam-and-help-it-run-great-on-windows-10/

Steam Hardware & Software Survey: November 2015: http://store.steampowered.com/hwsurvey

Intel RealSense: http://www.intel.com/content/www/us/en/architecture-and-technology/realsense-overview.html

Get Started Developing Intel® RealSense™ SDK for Windows* 10 Desktop Apps: https://software.intel.com/en-us/articles/get-started-developing-intel-realsense-sdk-for-windows-10-desktop-apps

Windows 10 on the Surface Pro 3: Now the 2-in-1 makes perfect sense: http://www.gizmag.com/windows-10-surface-pro-3-review/38189/

What's a Universal Windows Platform (UWP) app?: https://msdn.microsoft.com/en-us/library/windows/apps/dn726767.aspx

Dynamically detecting features with API contracts (10 by 10): https://blogs.windows.com/buildingapps/2015/09/15/dynamically-detecting-features-with-api-contracts-10-by-10/

It's Universal: Understanding the Lifecycle of a Windows 10 Application: https://visualstudiomagazine.com/articles/2015/09/01/its-universal.aspx

MSDN Channel 9 – GDC 2015: https://channel9.msdn.com/Events/GDC/GDC-2015

Developing Games for Windows 10: https://channel9.msdn.com/Events/GDC/GDC-2015/Developing-Games-for-Windows-10

Gaming Consumer Experience on Windows 10: https://channel9.msdn.com/Events/GDC/GDC-2015/Gaming-Consumer-Experience-on-Windows-10

Developing with Xbox Live for Windows 10: https://channel9.msdn.com/Events/GDC/GDC-2015/Developing-with-Xbox-Live-for-Windows-10?ocid=SessionsInEvent

New Opportunities for Independent Developers: https://channel9.msdn.com/Events/GDC/GDC-2015/New-Opportunities-for-Independent-Developers?ocid=SessionsInEvent

The Xbox Experience on Windows 10: http://www.xbox.com/en-US/windows-10/xbox-app

Windows Dev Center: https://dev.windows.com/en-us/games

How to use game streaming in the Xbox app on Windows 10: https://support.xbox.com/en-US/xbox-on-windows/gaming-on-windows/how-to-use-game-streaming

About the Author

Brad Hill is a software engineer at Intel in the Developer Relations Division. Brad investigates new technologies on Intel® hardware and shares the best methods with software developers via the Intel® Developer Zone and at developer conferences. His focus is on turning students and developers into game developers and helping them change the world.

Get Ready for Intel® RealSense™ SDK Universal Windows* Platform Apps


Introduction

The much anticipated Intel® RealSense™ SDK support for developing Universal Windows Platform (UWP) apps has arrived in SDK R5 (v7), and this article will help get you started. A sneak peek of the UWP interfaces and methods was presented in the R4 release documentation, and R5 now delivers the software components, samples, and documentation you’ll need for developing UWP apps that use the Intel® RealSense™ camera (SR300).

As stated in the What's New in R5 of the Intel® RealSense™ SDK R5 (v7) article, the SR300 camera should be available for order in Q1 2016 and integrated into select Intel-based systems in 2016. Because the SR300 is not yet available to end users, this article focuses on the things you need to know now to prepare for its arrival.

What You’ll Need to Get Started

SDK Scope and Limitations

  • The SDK supports UWP app development for Windows 10 using C# and XAML.
  • For UWP apps, the SDK supports only raw color and depth streaming and the blob tracking algorithm. Other UWP-specific algorithms are in development.
  • UWP apps must statically include the SDK runtime files, so the SDK version is fixed at the time of development.
  • The Session interface is not explicitly exposed in C# UWP.
  • The camera coordinate system is slightly different for UWP apps; refer to the SDK manual for details.
  • You cannot change the coordinate system in your UWP application.
  • To map the coordinates between color and depth streams in UWP apps, use the Windows.Devices.Perception.PerceptionDepthCorrelatedCoordinateMapper interface.
  • In the SDK manual, the (UWP) marking indicates UWP-specific interfaces or methods. The (+UWP) markings emphasize that the UWP interface is part of a function along with the other language interfaces.

Windows Desktop App Samples

Once the Intel® RealSense™ Depth Camera Manager (DCM) and the SDK are installed, reboot your computer and then ensure the camera is operating correctly by running one of the samples provided in the Intel RealSense SDK Sample Browser.

Click the Windows 10 Start button in the lower-left corner of the screen.

Select All Apps and then scroll to the Intel RealSense SDK folder (Figure 1).


Figure 1. Windows Start menu

Locate and run the Intel RealSense SDK Sample Browser. At the top of the SDK Sample Browser window (Figure 2) you’ll find a tab labelled “SR300 Samples” containing all of the Windows Desktop sample apps (i.e., code samples that run in Windows Desktop mode, not as UWP apps). You should familiarize yourself with these samples to understand the full capabilities of the SR300 camera.


Figure 2. SDK Sample Browser

UWP Components

The UWP software components provided in the Intel RealSense SDK are located in C:\Program Files (x86)\Intel\RSSDK\UWP. (Note: this is the default installation path; your file path may be different depending on how the SDK was installed.) The components are located in the following folders under \UWP:

  • \ExtensionSDKs – Contains the DLLs that are referenced in a UWP application.
  • \Samples  – Contains the DF_BlobViewer_UWP_CS and DF_StreamViewer_UWP_CS code samples.

Creating a UWP Project from Scratch

If you are new to UWP app development, first familiarize yourself with the basics of creating a UWP project from scratch. An informative C#/XAML “Hello, world” tutorial is available on this Microsoft website.

This tutorial provides a good starting point for learning how to create a simple app that targets the UWP and Windows 10. After completing the “Hello, world” tutorial and running your new UWP app, it will look something like the screen shown in Figure 3.


Figure 3. Hello, world! UWP application

When running in Debug mode you may notice a little black frame counter in the upper-left corner of your app. If you don’t want the counter to show, locate the following code in App.xaml.cs:

#if DEBUG
        if (System.Diagnostics.Debugger.IsAttached)
        {
            this.DebugSettings.EnableFrameRateCounter = true;
        }
#endif

You can either set the property to false:

this.DebugSettings.EnableFrameRateCounter = false;

Or simply comment-out the line:

// this.DebugSettings.EnableFrameRateCounter = true;

Configure the Development Environment for the Intel® RealSense™ Camera

To enable your app for the Intel RealSense camera, do the following:

  • Enable Webcam under Capabilities in the App Manifest.
  • Add references to the Intel RealSense SDK libraries.

In Solution Explorer, double-click Package.appxmanifest, and then click the Capabilities tab.

Locate and select the checkbox for Webcam, as shown in Figure 4.


Figure 4. Package.appxmanifest

Next you’ll need to reference the Intel RealSense SDK libraries:

  • Intel.RealSense – The library containing the SDK essential instance implementation, such as algorithm management and streaming data from the cameras.
  • Intel.RealSense.Blob – The library containing the SDK Blob Tracking module implementation.

Right-click on References in Solution Explorer, and then select Add Reference to open the Reference Manager.

Click the Browse button and navigate to the folders containing Intel.RealSense.winmd and Intel.RealSense.Blob.winmd. (These metadata files are located under C:\Program Files (x86)\Intel\RSSDK\UWP\ExtensionSDKs\.)

Click the Add button. The libraries appear under References in Solution Explorer.

Explore the UWP Samples

To learn more about how to integrate the Intel RealSense SDK capabilities in your app, open and build the two sample projects provided in the SDK:

  • DF_BlobViewer_UWP_CS
  • DF_StreamViewer_UWP_CS

Note: These samples are not available in the SDK Sample Browser app discussed earlier. They are located under C:\Program Files (x86)\Intel\RSSDK\UWP\Samples and should be copied to any writable directory in order to build them with Visual Studio 2015.

Summary

This article presents a brief overview of developing UWP apps that integrate the Intel RealSense SDK. Stay tuned for more information as the SR300 camera becomes available.

About the Author

Bryan Brown is a software applications engineer in the Developer Relations Division at Intel. 

What do I need to know about redistributing libraries that are included with Intel Software Development Products?


Can I redistribute libraries included with Intel® Parallel Studio XE 2016 with my application?

Yes. When you agree to the EULA for Intel® Parallel Studio XE you receive rights to redistribute portions of Intel® MKL, IPP, TBB and DAAL libraries with your application. However, the evaluation versions of Intel® Software Development products do not include redistribution rights.

Where do I find the redistributable packages for Intel® Parallel Studio XE 2016 for C++ and Fortran?  

The following articles contain links to the redistributable installation packages:

Redistributable libraries for Intel® TBB, Intel® IPP and Intel® MKL are installed along with Intel® Parallel Studio XE.

What are the licensing terms of redistributing the libraries?

Subject to the terms of the EULA, you may redistribute an unlimited number of copies of the Redistributable files that are listed in the text files defined in the Redistributables section of the EULA.

Have Questions?

Please consult the Intel User Forums:

Accelerating Media Processing: Which Tool Do I Use?


Intel has a multitude of awesome software development tools, including ones for media and graphics optimization. But sometimes, it's hard to figure out just which tool is the best one to use for your particular needs and usages.

Below you'll find a few insights to help you get to the right media tool faster, so you can focus on the really fun stuff - like optimizing your media solutions, applications, or video streaming. 

Accelerating Media Processing - Which Tool Do I Use?
Intel Media Tool | Platform / Device Targets & Usages

Intel® Media SDK

Developing for:

  • Intel® Core™ or Core™ M processors 
  • Select SKUs of Intel® Celeron™, Intel® Pentium™ and Intel® Atom™ processors with Intel HD Graphics supporting Intel® Quick Sync Video
  • Client devices - Desktop or mobile applications
  • OS - Windows only*

Usages & Needs

  • Fast video playback, encode, processing, media formats conversion or video conferencing
  • Accelerated processing of RAW video or images
  • Screen capture
  • Audio decode & encode support

Intel® Media Server Studio


3 Editions are available

  • Community
  • Essentials
  • Professional

Developing for:

Format Support - HEVC, AVC and MPEG-Audio

Usages & Needs

  • High-density and fast decode, encode, transcode
  • Optimize performance of the media/GPU pipeline (with Intel® VTune™ Amplifier)
  • Enhanced graphics programmability or visual analytics (for use with OpenCL™ applications)
  • Low-level control over encode quality
  • Debug, analysis and performance/quality optimization tools
  • HEVC with premium quality
  • Need to measure visual quality (Video Quality Caliper)
  • Looking for an enterprise-grade telecine interlace reverser (Premium Telecine Interlace Reverser)
  • Audio codecs
  • Screen capture

Intel® Video Pro Analyzer

Format Support - HEVC, VP9, AVC and MPEG-2

Usages & Needs

  • Develop for HEVC, AVC, MPEG-2 or VP9 decoder or encoder, analyze streams
  • Interested in saving time and resources
  • Fine-tune coding pipeline
  • Inspect full decode and encode process, debug & optimize encoders
  • Measure and improve visual quality (Video Quality Caliper)
  • Access to detailed video stream statistics
  • Innovate for UHD with HDR color support

Intel® Stress Bitstreams & Encoder

Format/Profile Support - HEVC, VP9

Usages & Needs

  • Perform extensive, enterprise-grade, production-scale media validation and debug for HEVC/VP9 decoders, transcoders, players, and streaming solutions
  • Develop HEVC or VP9 decoders, inspect decoding results
  • Significantly accelerate validation cycles, reduce costs and speed time-to-market
  • Create custom bitstreams for testing and optimize stream base for coverage and usage efficiency
  • Ensure decoder is compliant to standard, runs with top performance, is robust and resilient to errors

Intel® SDK for OpenCL™ Applications

Developing for:

General purpose GPU acceleration on select Intel® processors (see technical specifications). OpenCL primarily targets execution units. An increasing number of extensions are being added to Intel processors to make the benefits of Intel’s fixed-function hardware blocks accessible to OpenCL applications.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

For more complete information about compiler optimizations, see our Optimization Notice.

Getting started with the Depth Data provided by Intel® RealSense™ Technology


Abstract

Looking for tips on obtaining the best depth data from your 3D camera? This article will describe a few key characteristics of the Intel® RealSense™ camera R200 and the Intel® RealSense™ SDK that can affect depth data. You’ll also find a methodology for collecting raw data, including sample code (available from GitHub.com) as well as an attached zip file containing typical data.
We suggest testing under your specific lighting, texture, and distance parameters to provide some working boundaries for your scenes.

Overview

Figure 1: Intel® RealSense™ Camera R200 Reference Developer Kit

This analysis uses data obtained from the world-facing Intel RealSense camera R200. The R200 is the first generation of Intel’s rear-facing 3D cameras based on active stereoscopic depth technology. The camera implements an IR laser projector, a pair of stereoscopic IR cameras, and a 2 MP RGB color camera. You can purchase a developer kit for this camera from http://click.intel.com/intel-realsense-developer-kit-r200.html or a system with an integrated R200 camera.

Considerations - Camera and Object

The construction and behavior of the camera components can affect the resulting data, even in the case of fixed indoor lighting with stationary camera and target. Features and functions of the camera that affect depth data include:

  • Temporal (frame-to-frame) and Spatial (within a frame) variation
  • Sensor to sensor alignment and each sensor’s field of view (FOV)
  • The environment’s temperature (effect on sensors)
  • The environment’s ambient IR light
  • Any other physical effect on a sensor including physical impact or vibration

The location and type of target, especially the object’s IR reflectivity and texture (sheen, dimples, and curvature) will also affect the depth data. For simplicity, this data collection used a fixed target of single texture.

These tests were performed using Windows* 8.1 64 bit with August update and v6 (R4) of the Intel RealSense SDK available at https://software.intel.com/en-us/intel-realsense-sdk/download.

Considerations - Geometry

For the simplest test, the camera was pointed directly at a flat target, a typical inside white wall with standard office fluorescent ceiling lighting.  The camera was mounted firmly, at a fixed height, on a rail that can travel linearly (one axis only) from 600-2100 mm from the target.  The distance is measured by a consumer-grade laser meter mounted flush with the front of the camera. 
The camera face is aligned parallel to the wall and the rail axis is aligned perpendicular to the wall.


Figure 2:  Test measurement set up with Intel® RealSense™ camera mounted for stabilization

The camera moves between measurements but remains stationary while collecting data.

In these tests, the motion of the rail induced significant vibrations that would affect the data, so a large delay was introduced to allow the vibrations to cease before any frames were captured. The rail motion, time delays, and data capture were automated to provide consistent, repeatable results (and to cut down on labor).

Tip: Alignment of the rail and camera with the target is critical. You can align the rail axis using physical measurement from the end of the rail to reference points on the wall. Aligning the camera face is more difficult. For these tests, we adjusted the camera alignment based on a spatially uniform pattern observed in the raw depth image from an Intel® RealSense™ SDK sample, with the contrast adjusted by scaling to make it easier for the human eye to see variations across the field of view.  See Figure 3 for a depiction of a non-spatially uniform pattern and Figure 4 for an aligned target image.  Note that the black area on the left is an artifact of stereoscopic imaging and does not reflect an alignment issue.

TIP:  All stereoscopic cameras will have areas where data is missing, due to the difference in field of view between the sensors. Figure 4 above shows such an area for the R200 camera in the black bar on the left.  The effect of this camera characteristic decreases with distance but needs to be considered at closer ranges.

Test Range

Using the above test configuration, we collected a series of raw depth buffers at distances ranging from 600 mm to 2100 mm. 600 mm is the minimum distance for the Intel R200 camera, and 2-3 meters is the maximum distance indoors (longer range outdoors). Remember, range ultimately depends on multiple factors, including lighting and object textures.

TIP:  Keep within the range of the depth camera. At distances less than 600 mm, the R200 camera may not return any depth data, depending on the configured resolution.

TIP:  For testing, keep the input scene simple (e.g. a flat wall), as it is easier to detect general trends. 

Quality and Precision Related to Distance

Our tests looked at two factors - the quality of the depth data and the precision of the depth data as functions of distance from the target. For all depth cameras, the accuracy and precision of depth measurements decrease as distance increases.

Tips:
For more accurate, precise data, stay as close to the object as possible. If moderate accuracy is needed, just stay within the recommended range of the camera.

Capturing Depth Data

A sample script is provided at https://github.com/IntelRealSense/rsdatacollector. To illustrate the flow, here’s a quick walkthrough of the basic steps to capture the camera data as individual frames using the Intel RealSense SDK:

  1. Create and initialize an instance of PXCSenseManager (see PXCSenseManager::CreateInstance()).
  2. Use sense manager to enable the depth and/or color streams, specifying resolution and frame rate.
           senseManager->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 320, 240, 60);
           senseManager->EnableStream(PXCCapture::STREAM_TYPE_DEPTH, 320, 240, 30);
  3. Use sense manager to acquire / lock an available frame.
            camStatus = senseManager->AcquireFrame(true);
  4. Use sense manager to query the available samples.
            PXCCapture::Sample *sample = senseManager->QuerySample();
  5. Access the Sample's specific PXCImage members to extract the desired data.
                colorImage = sample->color;
                depthImage = sample->depth;
  6. Use member functions of the PXCImage object, such as QueryInfo(), to get the parameters of the color or depth data.

NOTE: Understanding the image format, dimensions, and other details is critical when traversing the raw image buffers. The parameters are documented in the SDK and the sample code associated with this paper shows an example of how those parameters are used.

The data buffer is organized in planes and pixels with specific pitches that correspond to the parameters in the ImageInfo mentioned above. Once you understand the format of the data buffer, you can directly manipulate the values for various analyses or write them out to disk for analysis in other tools.

  7. To access the raw image buffers, you must use the PXCImage::AcquireAccess() function, passing it your PXCImage::ImageData pointer to store the buffer location. The planes and pitches of the ImageData should be initialized to empty arrays.
  8. After obtaining the desired image data, you must release access to the data buffer (PXCImage::ReleaseAccess()) and release the frame (PXCSenseManager::ReleaseFrame()). A short sketch of this flow follows.
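
For reference, here is a minimal C# sketch of steps 7 and 8 using the managed (PXCM*) equivalents of the calls above. The pixel format, the ToUShortArray convenience helper, and the error handling are simplified assumptions and should be checked against the SDK reference rather than taken as the exact code from the GitHub sample:

// Minimal sketch: lock the raw depth buffer of the current frame, read it, release it.
PXCMCapture.Sample sample = senseManager.QuerySample();
PXCMImage depthImage = sample.depth;

PXCMImage.ImageData data;
pxcmStatus status = depthImage.AcquireAccess(
    PXCMImage.Access.ACCESS_READ,
    PXCMImage.PixelFormat.PIXEL_FORMAT_DEPTH,   // 16-bit depth values (assumption)
    out data);

if (status >= pxcmStatus.PXCM_STATUS_NO_ERROR)
{
    int width  = depthImage.info.width;
    int height = depthImage.info.height;

    // For depth, plane 0 holds the values; rows are data.pitches[0] bytes apart.
    ushort[] depthPixels = data.ToUShortArray(0, width * height);

    // ... analyze depthPixels or write them to disk here ...

    depthImage.ReleaseAccess(data);   // step 8: release the buffer
}

senseManager.ReleaseFrame();          // step 8: release the frame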

The Intel® RealSense™ SDK offers several abstractions for dealing with the image data, allowing easy access to and direct handling of the data buffers. These include the PXCPhoto::SaveXDM() function, which writes the depth-enhanced JPEG (XDM) file format. Additionally, the Background Segmentation (BGS), Enhanced Photography and Video (EPV), and Scene Perception (SP) modules in the SDK have options to provide additional filtering of the depth data. EPV and BGS share the same flags, while SP flags are fixed.

Much more information is available in the SDK documentation at https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?doc_devguide_introduction.html

Conclusion

Intel® RealSense™ Technology provides developers with depth data useful for extracting distance and size of objects from images and the ability to reconstruct a scene in 3D.  Since the quality of depth images varies due to many factors including time, lighting and distance, it is important to characterize the factors that may impact your application as described in this paper. Read the Intel RealSense SDK documentation to familiarize yourself with available SDK algorithms and sample code for capture, filtering, and processing.

Further Information

The Intel Developer Zone (IDZ) website for Intel RealSense technology includes the SDK and full documentation plus many articles, code samples, videos, blogs by software developers, links to events, and a forum for discussion and questions. Start at https://software.intel.com/realsense to check out the R200 camera features and videos.

About the Authors

Tom Propst is a Software Engineer at Intel focusing on enabling new use cases in business environments. He received his B.S. in Electrical Engineering from Colorado State University. Outside of work, Tom enjoys playing bicycle polo and tinkering with electronics.

Teri Morrison is a Software Architect at Intel focusing on the Intel® RealSense™ SDK. She has worked on NASA projects, graphics driver products and physics, graphics and perception computing samples. She received her M.S. in Applied Mathematics with minor in Computer Science from the University of Colorado. 

Media for Mobile Getting Started Guide for Windows* platforms


Media for Mobile Getting Started Guide for Windows* platforms

Introduction

Media for Mobile, a feature of Intel® Integrated Native Developer Experience (Intel® INDE), is a set of easy-to-use components and APIs for a wide range of media scenarios. It contains several complete pipelines for the most popular use cases and enables you to add your own components to those pipelines.

The samples demonstrate how to incorporate Media for Mobile into various applications for Windows* platforms.

Download the Media for Mobile samples from:

Media for Mobile Samples on GitHub

Media for Mobile Samples

System Requirements and List of available samples:

Host systems to develop applications for Windows*:

  • Operating Systems: Microsoft* Windows* 8.1
  • Intel® Integrated Native Developer Experience (Intel® INDE) 2015 for Windows* – Media For Mobile component
  • IDE: Microsoft* Visual Studio* 2013 Update 2 or later

Windows* target systems:

  • Hardware: IA-32 or Intel® 64 architecture processors
  • Operating System: Microsoft* Windows* 8.1

Samples for Windows* RT* targets

  • Transcode video
  • Join Video
  • Cut Video
  • Video Effect
  • Camera Capturing

The following tutorial guides you in your first steps with pre-built samples on...

Building and running samples for Windows* RT: steps needed to build the samples for Windows* RT in Microsoft* Visual Studio* 2013.

Media for Mobile is available for free with the Intel® INDE Starter Edition. Click here for more information.

Please see the list of open issues in Media for Mobile samples repository here.


WebAssembly*: An initial view


Download WebAssembly article PDF  [PDF 263 KB]

Introduction

This first in a series of articles gives a high-level introduction to WebAssembly (wasm). While the language is not fully formed, with many changes to come, this article gives you a rough idea of the current state of wasm. We will provide follow-up articles to provide news of changes as they come along.

Wasm’s goal is to enable more performance for JavaScript*. It defines a new, portable, size- and load-time-efficient file format suitable for compilation to the web. It will use existing web APIs and is destined to become an integral part of Web technology. Although “WebAssembly” contains the word web in it, the language is not only destined uniquely for browsers, but the goal is to also provide the same technology to non-web usages. This opens the door to a lot more coverage and potential traction of wasm.

As with all dynamically typed languages, it is difficult to get anything close to native performance running JavaScript, so ways to reuse existing native (C/C++) code in web pages have been pursued for many years. Alternatives such as NaCl and PNaCl run in web pages alongside JavaScript. The most practical is asm.js, a JavaScript subset restricted to features that can be compiled into code approaching native performance. None of these alternatives has been universally accepted in web browsers as the solution for performance and code reuse. Wasm is an attempt to fix that.

Wasm is not meant to replace JavaScript but instead to provide a means to attain near-native performance for key parts of an application, web-based or not. For this reason, browser, web, compiler, and general software engineers are working to create the definition of the technology. One of the goals is to define a new platform-independent binary code format for the web.

Near-native performing code on web pages could transform how new features are brought to the web. In native code, new features are often provided by an SDK, providing a native library for access. Web pages are not permitted to access those native libraries for security reasons. Web support for new capabilities therefore often requires complex standardized APIs offered by the web browser to work around related issues, such as JavaScript libraries being too slow.

With wasm, those standard APIs could be much simpler, and operate at a much lower level, with cached, multithreaded, and SIMD-capable dynamic link libraries of wasm providing functionality not currently possible. For example, instead of complex standard APIs for facial recognition or 3D image construction for use with a 3D camera, a much simpler standardized API could just provide access to the raw 3D data stream, with a wasm module doing the processing that a native SDK library now does. This would allow for downloading and caching of commonly used dynamic link libraries and rapid access to new capabilities from the web, long before the standardization process could complete.

This article provides an introduction to a very fast evolving technology. It will remain relatively high-level as many things are still evolving on the wasm specification.

More details about the high-level goals of wasm can be found here:

https://github.com/WebAssembly/design/blob/master/HighLevelGoals.md

General Overview

Design Versus Spec

The original design documents of wasm can be found in the following repository (repo), which contains these notable files:

  • AstSemantics.md: an overview of the format itself.
  • MVP.md: what defines the Minimum Viable Product - the requirements for the first iteration of wasm.
  • HighLevelGoals.md: the high-level goals of wasm and the use cases it is trying to solve.

In order to precisely define, and also to verify and check the decisions recorded in the design document, the spec repo contains an OCaml interpreter of the wasm language. It also contains a test suite folder that has some initial wasm tests. The test suite ranges from general integer and floating-point calculations to memory operations.

A few other tools in the main wasm github repository use the same test suite to test for regression. For example, both wasm-to-llvm-prototype and wasmint use the same test suite for their regression tests.

Prototypes

The base github page for wasm contains various active projects. We mentioned a few of them in this article, and there are many others of note. We recommend that the reader peruse the various projects to see where the wasm community is putting its efforts, but you can divide the repositories into five major groups:

Many of these repositories are proof-of-concept type repositories, meaning that they try to get something up and running to gather experience and don't necessarily represent the final result. The engineers working on the repositories often are playing around with wasm to determine how everything works. Examples are the various binary formats that are being tested such as polyfill-prototype-2 or the v8 binary format.


Modules, the Larger Construction Blocks

Wasm defines modules as its highest construct, containing functions and memory allocation requests. A wasm module is a unit of distributable, executable code. Each module has its own separate linear memory space, imports and exports, and code. This could be an executable, a dynamic link library (in future versions of wasm) or code to execute on a web page (used where ECMAScript 6* modules can be used).

Though the current test files in the test suite allow multiple modules defined in the same file, it is currently thought that this will not be the case in the final version. Instead, it is probably going to be common to have a single big module for an entire program. Most C/C++ programs will therefore be translated into a single wasm module.

Functions

Wasm is statically typed: the return value and all parameters are typed. For example, this line from the i32.wast file in the test suite repository shows an addition of two parameters:

(func $add (param $x i32) (param $y i32) (result i32) (i32.add (get_local $x)
    (get_local $y)))

All are 32-bit integers. The line reads like this:

  1. Declaration of a function named $add.
  2. It has two parameters $x and $y, both are 32-bit integers.
  3. The result is a 32-bit integer.
  4. The function’s body is a 32-bit addition.
    • The left side is the value found in the local variable/parameter $x.
    • The right side is the value found in the local variable/parameter $y.
  5. Since there is no explicit return node, the return is the last instruction of the function, hence this addition.

There will be more information about functions and wasm code later in this article.

Similar to an AST

Wasm is generally defined as an Abstract Syntax Tree (AST), but it also has some control-flow concepts and a definition of local variables allowing the handling of temporary calculations. The S-Expression (short for Symbolic Expression) is the current text format used for wasm (although it has been decided that it will not be the final wasm text representation).

However, it works well for explaining wasm in this document. If you ignore the general control flow constructs such as ifs, loops, and blocks, the wasm calculations are in AST format. For example, for the following calculation:

(3 * x + x) * (3 * x + x)

You could see the following wasm code:

(i32.mul
       (i32.add (i32.mul (i32.const 3) (get_local 0)) (get_local 0))
       (i32.add (i32.mul (i32.const 3) (get_local 0)) (get_local 0))
 )

This means that the wasm-to-native compiler would have to perform common subexpression elimination to ensure good performance. To alleviate this, wasm allows the code to use local variables to store temporary results.

Our example can then become:

(set_local 1
       (i32.add (i32.mul (i32.const 3) (get_local 0)) (get_local 0))
)
 (i32.mul
       (get_local 1)
       (get_local 1)
 )

There are discussions about where the optimizations should lie:

  • Between the original code, for example C/C++, to wasm
  • Between wasm and the binary code used for the target architecture in the browser or in the non-web case

Memory

The memory subsystem for wasm is called a linear memory, where the module can request a given size of memory starting at address 0. Loads and stores can address memory either with constants, as in simple examples, or with variables containing addresses.

For example, this would store the integer 42 at the address location 0:

(i32.store (i32.const 0) (i32.const 42))

Wasm does define sign or zero extension for the memory operations. The operations can also define the alignment of the memory operation in case the architecture could take advantage of it for better code generation. Finally, the operation also has the offset parameter to permit loads of (for example) a structure field.

Wasm-to-LLVM Prototype

The wasm-to-LLVM prototype, the tool that I am contributing to wasm, provides a means to compile wasm code directly into x86 code via the LLVM compiler. Though wasm is intended for use in the web browser, there are plans to use wasm in non-web-based scenarios, as defined in the high-level goals of the language.

The general structure of the wasm-to-LLVM prototype is to parse the wasm test file using the open source tools flex and bison to construct an Intermediate Representation (IR). In this intermediate representation, there is a pass, and there most likely will be more in the future, that walks the IR before performing the code generation via the LLVM compiler.

Figure 1: wasm-to-LLVM prototype.

Figure 1 shows the base structure of the tool: it takes as input the temporary textual file format for wasm, known as the S-expression. The S-expression is parsed by a lexical and semantic parser, implemented using the flex and bison tools. Once the S-expression is parsed, an internal Intermediate Representation (IR) is created, and a pass system is used to change it slightly before generating the LLVM IR. Once in LLVM IR, the code is then sent to the LLVM optimizer before generating Intel® x86 code.

A First Wasm Example

Here we include one basic example: a simple sum of the values in an array. The example demonstrates how wasm is easy to understand and a few things to keep in mind.

Wasm is intended to be generated by a compiler, and C/C++ will be among the first source languages used to generate it. There is not yet a specific compiler that transforms C/C++ into wasm, although LLVM has already seen some wasm-related work committed to the project. We will likely see other compilers such as GCC and MSVC support the wasm language and environment. While writing wasm by hand will be rare, it is interesting to look at and understand how the language is meant to interact with the browser/OS and the underlying architecture.

Sum of an array

;; Initialization of the two local variables is done as before:
;;   local 1 is the sum variable initialized to 0
;;   local 2 is the induction variable and is set to the
;;   max element and is decremented per iteration
 (loop
        (if_else
          (i32.eq (get_local 2) (i32.const 0))
           (br 1)
            (block
              (set_local 1 (i32.add (get_local 1) (i32.load (get_local 2))))
              (set_local 2 (i32.sub (get_local 2) (i32.const 4)))
            )
        )
        (br 0)
      )

Note: this is not necessarily the optimal way of doing things. It is a useful example to demonstrate key parts of the wasm language and logic. A future article will explore various constructs that can better mimic the underlying hardware’s instructions.

As the example above shows, a loop is defined by a loop node. Loop nodes can define the start and exit block’s names or be anonymous as it is here. To better understand the code, the loop uses two local variables: Local 1 is the sum variable while walking the array, and local 2 is the induction variable being updated. Local 2 actually represents the pointer to the current cell to be added.

Here is the C-counterpart of the code:

// Initialization is done before
//   local_1 is the sum variable initialized to 0
//   local_2 is the induction variable and is set to
//   the max element and is decremented per iteration
do {
   if (local_2 == start) {
     break;
   }
   local_1 = local_1 + *local_2;
   local_2--;
} while(1);

The loops in wasm actually work like do-while constructs in C. They also do not implicitly loop back; we need to define an explicit branch at the end of the wasm loop. Further, the “br 0” node at the end says to branch to the top of the loop; the 0 represents the level of the loop nest we want to go to from here.

The loop starts with checking if we want to do an extra iteration. If the test is true, we will do a “br 1”, which you might infer is to go out one level of the loop nest. In this case, since there is only one level, we leave the loop.

In the loop, notice that the code actually is using a decrementing pointer to reach the start of the array. In the C-version there is a convenient variable called start, representing the array the code is summing.

In wasm, since the memory layout starts at address 0 and since this is the only array in this overly simplified example, it is arbitrarily determined that the start address of the array is 0. If we put the array anywhere else, this comparison would look more like the C version and compare local variable 2 with a parameter-passed offset.

Notice the difference in handling the induction variable’s update between the C and the wasm versions. In C, the language allows the programmer to simply update the pointer by one, and the compiler transforms this into an actual decrement of four later down the line. In wasm, we are already at a very low level, hence the decrement of four is already there.

Finally, as shown above, (get_local 2) is the loop counter and (get_local 1) is the running sum of the vector. Since we are doing a sum in 32-bit, the operation uses the i32.add and i32.load opcodes. In this example, the vector we are summing is at the beginning of the linear memory region.

Wasm-to-llvm Code Generation

The wasm-to-llvm-prototype generates the following loop code quite easily:

.LBB4_2:
  movl  %edi, %edx
  addl  (%rdx,%rcx), %eax
  addl  $-4, %edi
  jne .LBB4_2

This is quite a tight loop when you consider the original wasm text format code. Compare it to the following equivalent C version:

for (i = 0; i < n; i++) {
    sum += tab[i];
}

The GNU* Compiler Collection (GCC), version 4.7.3, produces, at the -O2 optimization level, the following code, which is similar in its logic:

.L11:
  addl  (%rdi,%rdx,4), %eax
  addq  $1, %rdx
  cmpl  %edx, %esi
  jg  .L11

In the wasm case, we have generated a countdown loop and use the subtraction result directly as the condition for the loop jump. In the GCC case, a comparison instruction is required. However, the GCC version does not use an additional move instruction that the wasm with LLVM uses.

At -O3, GCC vectorizes the loop. What this shows is that the wasm-to-llvm prototype still has a bit of work to do in order to generate the optimal code (vectorized or not). This will be described in a future article.

 

Conclusion

This article introduced you to the wasm language and showed a single simple example of a sum of an array. Future articles will dive more into detail about wasm and its different existing tools.

Wasm is a new language and a lot of tools are being developed to help determine its potential, to support it, and to understand what it can and cannot do. There are many elements about wasm that need to be determined, explained, and explored.

 

About the Author

Jean Christophe Beyler is a software engineer in the Intel Software and Solutions Group (SSG), Systems Technologies & Optimizations (STO), Client Software Optimization (CSO). He focuses on the Android compiler and eco-system but also delves into other performance related and compiler technologies.

Native Intel® RealSense™ SDK Image Copying in Unity*


This sample uses a native Unity plug-in to increase performance of displaying Intel® RealSense™ SDK image data by bypassing the C# layers of the SDK. Image data is uploaded to the GPU directly through the graphics API.

The sample is available at https://github.com/GameTechDev/UnityRealSenseNativeTexture.

The sample supports Direct3D* 9, Direct3D* 11, and OpenGL*, but it could be modified to support other APIs. This plug-in is based on the native plug-in example provided by Unity. That sample, and more information on native Unity plug-ins, can be found here.

Figure 1. Screenshot from the sample running in Unity*.

Sample Contents

This sample includes:

  • The plug-in source code
  • A sample Unity project that uses the plug-in

How It Works

The process of this native texture copy in Unity is as follows:

  1. The application obtains a PXCMImage object from the Intel® RealSense™ SDK and uses the QueryNativePointer function to obtain the native PXCImage pointer.
  2. The PXCImage pointer and the destination Unity Texture2D pointer are passed to the native plug-in.
  3. When the native plug-in’s callback is invoked on Unity’s render thread, the data from the PXCImage image is copied to the native texture using the appropriate graphics API (a sketch of this calling pattern follows).
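
To make that flow concrete, here is a hedged C# sketch of what the Unity-side calls might look like. The plug-in entry points SetTextureFromPXCImage and GetRenderEventFunc are hypothetical names, and the exact QueryNativePointer signature should be checked against the sample’s UseTexturePlugin.cs; this illustrates the pattern, not the sample’s actual code:

using System;
using System.Runtime.InteropServices;
using UnityEngine;

public class NativeTextureCopyExample : MonoBehaviour
{
    // Hypothetical plug-in exports; the real names live in the NativeRSTextureCopy plug-in.
    [DllImport("NativeRSTextureCopy")]
    private static extern void SetTextureFromPXCImage(IntPtr pxcImage, IntPtr texturePtr, int width, int height);

    [DllImport("NativeRSTextureCopy")]
    private static extern IntPtr GetRenderEventFunc();

    public Texture2D targetTexture;

    public void CopyImage(PXCMImage colorImage)
    {
        // Step 1: get the native PXCImage pointer from the managed wrapper (SDK R5+; signature assumed).
        IntPtr nativeImage = colorImage.QueryNativePointer();

        // Step 2: hand both native pointers to the plug-in.
        SetTextureFromPXCImage(nativeImage, targetTexture.GetNativeTexturePtr(),
                               targetTexture.width, targetTexture.height);

        // Step 3: ask Unity to invoke the plug-in's callback on the render thread,
        // where the actual copy happens via the graphics API.
        GL.IssuePluginEvent(GetRenderEventFunc(), 1);
    }
}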

Requirements

  • Intel® RealSense™ SDK R5 or higher is recommended. Previous SDKs can be used but the developer will need to modify the PXCCLR project to add support for the QueryNativePointer function.
  • Unity 5.2 or higher is recommended. The native plug-in was not tested on previous versions.
  • Visual Studio* 2013 or higher. Previous versions can be used, but the projects haven’t been configured for them.

Running the Sample

The sample includes a Unity project called SampleProject which demonstrates using the plug-in to display the color stream of an Intel® RealSense™ camera.

Follow these steps to run the sample:

  1. Download the Intel® RealSense™ SDK – You will need version R5 or greater.
  2. Update the SampleProject’s SDK DLLs - Run the UpdateSDKDlls.bat in the SampleProject folder to copy the required managed DLLs from the SDK into the SampleProject.
  3. Build the native plug-in – Open and build NativeRSTextureCopy.sln, which can be found within the src folder. A post-build step will copy the plug-in into the SampleProject folder.
  4. Open and Run the SampleProject in Unity.

Integrating the Plug-in

Follow these steps to integrate the native plug-in into an existing Unity application that uses Intel® RealSense™ SDK:

  1. Build the native plug-in – The plug-in must be rebuilt for the SDK the application is using. Open the Visual Studio solution and build the Release configuration for the desired target architecture.
  2. Add UseTexturePlugin.cs to the project – Copy the UseTexturePlugin.cs script from the SampleProject folder into your project, and attach it to a persistent game object in the scene. This script is responsible for pumping the native render callback event as well as providing an interface to the native plug-in functionality.
  3. Pass PXCMImages to the plug-in – Replace calls in your application to PXCMImage.ToTexture2D with calls to UseTexturePlugin.CopyPXCImageToTexture.
  4. Handle API specific formatting – Depending on the graphics API, the texture may be displayed flipped vertically, or with the colors inverted. The NativeRGBImage.shader file demonstrates how to correct this formatting issue.
  5. Notify on shutdown - Call the UseTexturePlugin.RealSenseShutdown function before shutting down the Intel® RealSense™ SDK Sense Manager.

Important Notes

  • Because this is a native plug-in linked against a specific version of the Intel® RealSense™ SDK, you must recompile the plug-in when switching to different SDK versions. Also, be sure to compile it for all CPU architectures you plan on supporting.
  • The QueryNativePointer function was added to the C# API in the R5 release to allow access to managed SDK objects’ underlying native objects. This sample will still work on previous SDKs, but you will need to add the QueryNativePointer functionality to the PXCCLR project in the installed SDK.

Intel Delivers AVS 2.0 Bitstreams for Efficient Decoder Validation


New Release!

Chinese version

Intel announces new Audio Video Standard (AVS) 2.0 support in its enterprise-grade video conformance product, Intel® Stress Bitstreams and Encoder (Intel® SBE).

The first versions of AVS (AVS and AVS+) have had widespread adoption in the People’s Republic of China over the past decade; work began on a successor, AVS 2.0, in 2013 to provide significant improvements to video quality. By expanding Intel® SBE support to AVS 2.0, Intel helps video solution providers be first to market with high-quality, well-tested products. Intel® SBE also supports HEVC and VP9.

In this release (2016 R3), Intel® SBE provides comprehensive AVS 2.0 decoder conformance validation bitstreams, including Main (4:2:0 8-bit) and Main 10 (4:2:0 10-bit) profiles, compliant with the latest software reference model (RD12). Intel provides very high coverage of AVS 2.0 at both the syntax and value levels using technology introduced first in the Intel® SBE HEVC and VP9 bitstreams products. Intel® SBE will be updated as AVS 2.0 completes its standardization process.

Learn more at Intel® SBE.

Get a Free Trial Now: AVS 2.0 | HEVC | VP9


Intel SBE is part of the Intel® Media Server Studio family of products.


Simple RGB Streaming with the Intel® RealSense™ SDK


Download Code Sample [Zip: 19 KB]

Contents

Introduction

Are you thinking about creating a simple application with RGB streaming that uses an Intel® RealSense™ camera and the Intel® RealSense™ SDK, or simply using RGB streaming in one of your applications? Do you want an easy-to-follow and easy-to-understand application that is direct and to the point without a lot of extra code that clouds up what you are trying to learn? Then you’re in luck, because that’s exactly what I’ve tried to do here: create a simple, yet effective sample application and document that describes how to use the Intel RealSense camera and SDK. 

This sample was written using Intel RealSense SDK R4 Visual Studio* C# and tested with R5.  It requires an Intel RealSense camera F200.

Project structure

In this sample application, I have tried to separate out the Intel RealSense SDK functionality from the Windows* Form GUI layer code to make it easier for a developer to focus on the SDK’s streaming functionality.  I’ve done this by creating a C# wrapper class (RSStreaming) around some of the Intel RealSense SDK classes. 

The Windows Form app contains only a few buttons and a PictureBox control to display the RGB stream.

Note that I’m not trying to make a bulletproof application. I have added some degree of exception handling; however, it’s up to you to ensure that proper engineering practices are in place to ensure a stable user-friendly application.

This project structure also relies on using events to pass data around, which eliminates the need for tight coupling. A helper event class was created: RSNewImageArg, which inherits from EventArgs. It’s used to post the current frame from the camera back to the client form application.

Getting Started

To get started, you’ll need to have an Intel RealSense camera F200. You also need to have the Intel RealSense SDK version R4 or higher, and the appropriate Depth Camera Manager (DCM) installed on your computer. The SDK and F200 DCM can be downloaded here

Requirements

Hardware requirements:

  • 4th generation Intel® Core™ processors based on the Intel® microarchitecture code-name Haswell
  • 8 GB free hard disk space
  • Intel RealSense camera F200 (required to connect to a USB 3 port)

Software requirements:

  • Microsoft Windows 8.1/Win10 OS 64-bit
  • Microsoft Visual Studio 2010–2015 with the latest service pack
  • Microsoft .NET* 4.0 Framework for C# development
  • Unity* 5.x or higher for Unity game development

RSNewImageArg.CS

RSNewImageArg derives from the C# EventArgs class. As you can see it’s a small wrapper that has one private data member added to it. The private Bitmap _bitMap holds the current bitmap that was extracted from the camera stream.

This class is used as an event argument when the RSStreaming class dispatches an event back to the Form class indicating that a new bitmap image is ready to display.
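
A minimal sketch of what such a wrapper looks like (the constructor and property names here are illustrative; the sample’s actual class may differ slightly):

using System;
using System.Drawing;

// Event argument carrying the bitmap for the current camera frame.
public class RSNewImageArg : EventArgs
{
    private readonly Bitmap _bitMap;   // current frame extracted from the camera stream

    public RSNewImageArg(Bitmap bitmap)
    {
        _bitMap = bitmap;
    }

    public Bitmap NewImage
    {
        get { return _bitMap; }
    }
}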

RSStreaming.CS

RSStreaming is a wrapper class, an engine, so to speak, for streaming RGB data from the Intel RealSense camera. I wrote the class with the following intentions:

  • Cleanly and clearly isolate as much of the Intel RealSense SDK functionality as possible away from the client application.
  • Try to provide comments in the code to help the reader understand what the code is doing.

The following describes each function that comprises the RSStreaming class.

public event EventHandler<RSNewImageArg>     OnStreamingImage;

The OnStreamingImage Event is used to trigger a message back to the client application letting it know that a new RGB bitmap image is ready to display. The client creates an event handler to handle the RSNewImageArg object.

public bool Initialized

Getter property used as a flag to indicate that the RSStreaming class has been initialized.

public bool IsStreaming

Getter property used as a flag to indicate that the RSStreaming class is currently streaming RGB data.

public void StartStreaming()

Checks to see if the class has been initialized and if not calls the InitCamera to ensure the class is up and running properly. Once this has been done, the function calls the _senseManager.StreamFrames( …  ) function. 

If you have done much reading about developing Intel RealSense applications, you have probably noticed that pulling data from the camera is often done in a while loop. For example, something like the following:

while (!Stop)
{
   /* Wait until a frame is ready: Synchronized or Asynchronous */
   if (sm.AcquireFrame(Synced).IsError())
      break;

  /* Display images */
   PXCMCapture.Sample sample = sm.QuerySample();

   /* Render streams */
   EventHandler<RenderFrameEventArgs> render = RenderFrame;
   PXCMImage image = null;
   if (MainPanel != PXCMCapture.StreamType.STREAM_TYPE_ANY && render != null)
   {
      image = sample[MainPanel];
      render(this, new RenderFrameEventArgs(0, image));
   }

   if (PIPPanel != PXCMCapture.StreamType.STREAM_TYPE_ANY && render != null)
      render(this, new RenderFrameEventArgs(1, sample[PIPPanel]));

   /* Optional: Set Mirror State */
   mirror = Mirror ? PXCMCapture.Device.MirrorMode.MIRROR_MODE_HORIZONTAL :
                     PXCMCapture.Device.MirrorMode.MIRROR_MODE_DISABLED;
   if (mirror != sm.captureManager.device.QueryMirrorMode())
      sm.captureManager.device.SetMirrorMode(mirror);

   sm.ReleaseFrame();

   /* Optional: Show performance tick */
   if (image!=null)
      timer.Tick(PXCMImage.PixelFormatToString(image.info.format)+""+image.info.width+"x"+image.info.height);
}

This is a LOT of code to wade through.  Now granted, they may be doing more than what my sample application does, but my point is that my application does not run a while loop like this. My application uses the StreamFrames(…) function. This function handles the while loop internally and for every frame triggers an event RSStreamingRGB will subscribe to. Essentially it works like this:

  1. Kick off the stream PXCMSenseManager.StreamFrames(…).
  2. Trap the event in an event handler.
  3. When you’re done streaming, call the PXCMSenseManager.Close( ).

I like this approach, because I don’t want to have to manually deal with a while loop, knowing when and how to stop the loop. I would rather rely on the SDK to take care of that for me. When I talk about the InitCamera() function, you will see how this methodology is configured so I won’t talk about it here. Just make sure that you see how we can stream data and allow the SDK to handle the looping over the raw data coming from the camera.

Once StreamFrames has been called, I set the Boolean flag _isStreaming to true, allowing the class and client app to know that streaming has started.

public void StopStreaming ( )

StopStreaming does the opposite of StartStreaming. It instructs the SDK to stop streaming data from the camera and calls Dispose() to destroy the objects holding the data.

private void InitCamera ( )

InitCamera() creates the PXCMSenseManager instance and enables the type of stream we want. As you can see I’m specifying a 320x240 color stream at 30 fps.  

Recall what I said about being able to use an event from the PXCMSenseManager to let the class know when a new frame of RGB data is available. This is done using the PXCMSenseManager.Handler event class. It’s a simple process: create an instance of the Handler class, assign it an event handler via onNewSample, then initialize the PXCMSenseManager object _senseManager with the handler class.

Once this is completed, I set the _initialized flag to true. As previously mentioned, this flag is used to let either this class internally, or the client app, know that RSStreaming has been initialized.
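
Putting those pieces together, InitCamera looks roughly like the following sketch. The stream parameters match the 320x240 at 30 fps configuration mentioned above; treat the details as an approximation of the sample rather than its exact code:

private void InitCamera()
{
    // Create the SenseManager and request a 320x240 color stream at 30 fps.
    _senseManager = PXCMSenseManager.CreateInstance();
    _senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_COLOR, 320, 240, 30);

    // Route every new sample to OnNewSample instead of polling in a while loop.
    PXCMSenseManager.Handler handler = new PXCMSenseManager.Handler();
    handler.onNewSample = OnNewSample;

    _senseManager.Init(handler);

    _initialized = true;
}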

private pxcmStatus OnNewSample( )

This is the event handler for the PXCMSenseManager.Handler object. Recall that in the InitCamera() function I set the handler object’s event handler to this function.

The event handler must adhere to a given function signature.  The function must return a pxcmStatus value and takes two parameters:

  • Mid. The stream identifier. If multiple streams are requested through the EnableVideoStreams function, this is PXCMCapture.CUID+0, or PXCMCapture.CUID+1 etc.
  • Sample. The available image sample.

We need to convert the PXCMCapture.Sample object into a usable bitmap that the client application can use to display.

First I check to ensure that the sample.color object is not null and that the class’s internal image data member _colorImageData is not null as well. We need to ensure that our internal _colorImageData is not holding any data, and to release it if it is.

Next we need to use the sample.color object to populate _colorImageData. This basically is a metadata object about the PXCMCapture.Sample color object. Once we have that, we can tell it to create a bitmap for us, specifying a size.

Once we have the bitmap and we know it’s not null, I trigger the OnStreamingImage event specifying the source of the event and a new RSNewImageArg object.

Finally, we MUST release the current frame from the PXCMSenseManager object and, as required by the function signature, return a pxcmStatus. I could have done some exception handling here, but I chose not to in order to keep things as simple as possible. If I had, I could have trapped the exception and chosen a different pxcmStatus to return; however, I’m just returning success.
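
A condensed sketch of the handler described above (the RGB32 pixel format, the inline ImageData variable, and the null checks are simplifying assumptions):

private pxcmStatus OnNewSample(int mid, PXCMCapture.Sample sample)
{
    if (sample.color != null)
    {
        // Populate the ImageData metadata object, then build a displayable bitmap.
        PXCMImage.ImageData colorImageData;
        sample.color.AcquireAccess(PXCMImage.Access.ACCESS_READ,
                                   PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32,
                                   out colorImageData);

        Bitmap bitmap = colorImageData.ToBitmap(0, sample.color.info.width,
                                                sample.color.info.height);

        if (bitmap != null && OnStreamingImage != null)
            OnStreamingImage(this, new RSNewImageArg(bitmap));

        sample.color.ReleaseAccess(colorImageData);
    }

    // The frame MUST be released so the SDK can deliver the next one.
    _senseManager.ReleaseFrame();
    return pxcmStatus.PXCM_STATUS_NO_ERROR;
}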

private void Dispose ( )

Dispose() cleans up. I check to ensure that the manager is not null and that it was initialized, and if so, I call its Dispose method. I check to ensure that RSStreaming’s bitmap is not null and dispose of it. Then I set everything to null.

MainForm.CS

The main form is the GUI that displays the RGB stream and allows you to control the RSStreaming object. It has two class-level variables: an instance of RSStreamingRGB and a bitmap. The bitmap will contain the current image from the current frame that’s sent by the RSStreamingRGB class.

public MainForm( )

The form’s constructor. It creates a new RSStreamingRGB object and gives the OnStreamingImage event an event handler.
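
In code, the constructor amounts to something like this sketch (member names are illustrative):

public MainForm()
{
    InitializeComponent();

    // Create the streaming engine and subscribe to its image event.
    _rsStreaming = new RSStreamingRGB();
    _rsStreaming.OnStreamingImage += UpdateColorImageBox;
}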

private void btnStream_Click( )

The event handler when the Start Streaming button is clicked. Instructs the _rsStreaming object to start streaming by calling its StartStreaming() function.

private void btnStopStream_Click( )

The event handler when the Stop Streaming button is clicked. Instructs the _rsStreaming object to stop streaming by calling its StopStreaming() function.

private void UpdateColorImageBox( object source, RSNewImageArg e )

UpdateColorImageBox is the event handler for the _rsStream.OnStreamingImage event.  It ensures that the newImage argument is not null, and, if not, assigns _currentBitMap to a new bitmap using the newImage as the source bitmap.

If I don’t create a new bitmap, the form’s _currentBitMap will be pointing back to the original bitmap that the SDK created. This can be problematic when calling the RSStreaming.Dispose method. The client has a picture box, the picture box has an image, and that image is coming from the SDK. When the form and picture box are still active, if I try to call RSStreaming.Dispose which releases SDK resources, I would get a crash because the picture box’s source image was now being disposed of.

After _currentBitMap has been assigned a new image, I call pictureBox.Invalidate() which forces the picture box’s Paint event to be triggered.
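
As a sketch (member names are illustrative), the handler copies the incoming bitmap and then invalidates the picture box:

private void UpdateColorImageBox(object source, RSNewImageArg e)
{
    if (e.NewImage == null)
        return;

    // Copy the SDK-created bitmap so disposing SDK resources later
    // does not invalidate the image the picture box is showing.
    _currentBitMap = new Bitmap(e.NewImage);

    // Force the picture box's Paint event so the new frame is drawn.
    pictureBox.Invalidate();
}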

private void pictureBox_Paint( object sender, PaintEventArgs e )

This is the picture box’s paint event handler, which is triggered by the call to pictureBox.Invalidate(). It forces the picture box to redraw itself with the current source image.

First I check to ensure that _currentBitMap is not null, and if not, I draw the most recent bitmap stored in _currentBitMap to the picture box.

private void btnExit_Click( )

Easy enough. Simply calls Close(). No need to handle any clean up here because I ensure that this is happening in the MainForm_FormClosing method.

private void bMainForm_FormClosing( )

This is the form’s closing event handler. When the Close() method is called from any function, the FormClosing event is raised. I didn’t want to duplicate code, so I simply put all cleanup code here. I check to ensure that _rsStreaming is not null and that it’s streaming. If these conditions are met, I call _rsStreaming.StopStreaming(). There is no need to call a dispose method on _rsStreaming because that’s taken care of inside StopStreaming.
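
A sketch of that cleanup:

private void bMainForm_FormClosing(object sender, FormClosingEventArgs e)
{
    // All cleanup lives here so Close() can be called from anywhere.
    if (_rsStreaming != null && _rsStreaming.IsStreaming)
        _rsStreaming.StopStreaming();
}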

Conclusion

I hope this article and sample code have helped you gain a better understanding of how to use the Intel RealSense SDK to create a simple RGB streaming application. My intent was to show how this can be done in an easy-to-understand, simple application that covers everything you need to implement your own RGB streaming application.

If you think that I left out any explanation or wasn’t clear in a particular area, OR if you think I could have accomplished something in a better way, please shoot me an email at rick.blacker@intel.com or make a comment below.

About the Author

Rick Blacker is a seasoned software engineer who spent many of his years authoring solutions for database driven applications.  Rick has recently moved to the Intel RealSense technology team and helps users understand the technology. 

Intel® RealSense™ Technology: The Backbone of Posture Monitor’s Innovation


The ingenious Posture Monitor app, from husband-and-wife team Jaka and Jasmine Jaksic, is a great example of using technology to change lives for the better. An award-winner in the Open Innovation category of the 2015 Intel® RealSense™ App Challenge, it addresses a problem plaguing an estimated 50 percent of US workers – back pain. Using an Intel® RealSense™ camera (F200) and several third-party software tools, the team turned the camera’s data stream into a handsome package of graphs and statistics – all while balancing power consumption and frame rate. In this article, you’ll learn how the Jaksics relied on a strong software engineering process to keep advancing toward their goal–and we’ll show you their frame-rate processing code sample, which illustrates how to minimize power consumption and still provide a smooth user experience.

Jaka Jaksic was co-founder and lead engineer for San Francisco-based startup Plumfare, a social gifting mobile app. Plumfare was acquired by Groupon in 2013, leaving Jaka on the lookout for another great opportunity. Jasmine is a long time product- and project-manager and currently works at Google. Combining their expertise, they started the company JTechLab. The couple decided to pursue a product related to posture problems—something they both have battled. The app has moved from prototype to production, during which time Jaka encountered and overcame a few interesting hurdles. The lessons learned include data conversion, power conservation, and integrating Intel RealSense technology with commercial software tools.

Posture Monitor continually monitors your posture and alerts you to potential problems using advanced graphics and statistics, packaged in an attractive interface.

Contest Deadline Spurs Rapid Advances

Like many who suffer back pain, Jaka had tried several products for posture correction—without success. “It’s not that difficult to sit straight,” he explained. “But it's difficult to do it all the time, because it requires constant attention. While you are focused on work, your posture and taking breaks are usually the last things on your mind.”

One day, Jaka got the revolutionary idea of using a 3D camera as a posture-tracking device, and, after just a little research, he landed on a solution with the Intel® RealSense™ technology. “I also noticed that there was a contest going on,” he said, “I got busy right away, and just made the deadline.” Successful applicants received an Intel RealSense camera (F200) and the complete Intel® RealSense™ SDK as encouragement to create the next great app.

After the Intel RealSense camera arrived, Jaka built his first working prototype in about two days, taking half that time to learn about Intel RealSense technology and set up the camera’s data pipeline, and the other half to build the first posture detection algorithm. Once the proof of concept was complete, he proceeded with further development and built the more polished app.

At this point they began some informal usability testing. “Usually, in software projects, it’s good to get as much feedback as possible early on, from all sorts of people,” he said. In this case, the amount of time was very limited by the project deadline and by the fact that both he and Jasmine had separate, full-time jobs. “As a general rule, the user interface (UI) is crucial,” Jaka explained. “Right after the technological proof of concept that verifies that the thing can be built, I would recommend focusing on the user experience.” That may mean a prototype tied directly to the UI, and some targeted questions:

  • Do you understand what this UI does?
  • Can you tell how to use the product?
  • Do you find it attractive?

Only after positive responses to these questions, Jaka says, should the development process turn to creating functionality. “Never wait to create and test the UI until after you have a complete functionality, with considerable time and effort already sunk into development,” Jaka said. He pointed out that once your project is too far along, it’s very costly to go back and fix basic mistakes. “User feedback is the kind of thing that can always surprise you. We’re fairly experienced with application design, but still we’re just now finding things that, when we sit the users in front of the app, they say, ‘Whoa, what’s this?’”

It can be expensive—and time consuming—to get your app prototype into a UI lab, but the benefit for doing so is big savings down the road. In addition to asking friends and colleagues, one cheap and easy method of user testing that Jaka employed in the past is to go to a local coffee shop and give people $5 gift cards in exchange for their feedback. “People are usually happy to help, and you will learn a lot.”

Advice from an Expert App Designer

Jaka said that the demos provided by Intel are extremely useful—but he had a few words of caution. “Intel’s examples are technology demonstrations rather than starting points for building your own application, so you have to strip away all the unnecessary demo functionality,” he said.

For Posture Monitor, the camera’s data pipeline was the essence, and Jaka drilled down to exclusively focus there. Since the SDK didn’t stay centered on the user’s face at all times, Jaka used the raw data stream and processed it himself. He used the Intel RealSense camera to locate the user’s face, which he then converted to a rectangle. He could next approximate where the spine was by calculating the center of the subject’s torso. By noting where the pixels were and where the head was, he could continually calculate the true center of gravity. He noted that much of his approach will change when he adopts the Intel® RealSense™ SDK R5 version, which will support body-tracking using a new camera, the user-facing SR300. It will be available in Q2 2016.
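
Purely as an illustration of that idea (this is not the Posture Monitor source; the names, structure, and fallback behavior are hypothetical), estimating a torso center of gravity from the depth pixels below a detected face rectangle might look roughly like this:

using System.Drawing;

static class TorsoEstimator
{
    // Hypothetical sketch: average the x-coordinates of valid depth pixels
    // below the face rectangle to approximate the torso's center of gravity.
    public static float TorsoCenterX(ushort[] depth, int width, int height, Rectangle face)
    {
        long sumX = 0, count = 0;

        for (int y = face.Bottom; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                // A depth value of 0 means "no data" for that pixel.
                if (depth[y * width + x] > 0)
                {
                    sumX += x;
                    count++;
                }
            }
        }

        // Fall back to the face center if the torso is not visible
        // (for example, IR-absorbing clothing).
        return count > 0 ? (float)sumX / count : face.X + face.Width / 2f;
    }
}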

Jaka also overcame limitations concerning infrared 3D camera physics. While skin reflects infrared light easily, the reflection is occasionally muddied by certain hairstyles and clothing choices. (Busy prints, dark colors and shiny fabrics may pose a problem; as could long, dark hair that obscures the torso.) From the depth-camera standpoint, certain combinations report that the user has no torso. “It’s like they’re invisible and just a floating head,” he said. There isn’t much you can do in such cases other than detect when it happens and suggest the user wear something else.

In order to work for everybody, Posture Monitor requires each user to complete a calibration sequence: they sit straight, demonstrating perfect posture, and then click “calibrate.” The application compares their posture at a given time to their ideal posture, and that’s how it assesses what’s good or bad.

The calibration sequence of Posture Monitor ensures that the system can identify key aspects of your body and thus track your posture.
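
A minimal sketch of how such a comparison could work is shown below. The class, field, and threshold names are hypothetical and the real application is certainly more sophisticated; the point is simply that calibration stores a baseline (assuming a using System; directive for Math), and later frames are compared against it.

// Illustrative sketch: store a baseline at calibration time, then compare later frames to it.
class PostureBaseline {
    public float HeadY;           // vertical head position at calibration (pixels)
    public float TorsoCenterX;    // horizontal torso center at calibration (pixels)
}

static bool IsSlouching(PostureBaseline baseline, float currentHeadY, float currentTorsoCenterX,
    float headDropTolerance = 25f, float leanTolerance = 40f) {

    // In image coordinates y grows downward, so a larger y means the head has dropped.
    bool headDropped = (currentHeadY - baseline.HeadY) > headDropTolerance;

    // A torso center that drifts sideways from the baseline suggests leaning to one side.
    bool leaning = Math.Abs(currentTorsoCenterX - baseline.TorsoCenterX) > leanTolerance;

    return headDropped || leaning;
}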

The team has yet to use specialized medical knowledge or chiropractic experts, but Jaka says that day is coming. “We wanted the application to be able to detect when the user is slouching, and the current version does that really well. After we launch, we’re going to reach out to medical professionals and add some more specialized functionality.”

Minimizing Power Consumption

At full frame-rate, Intel RealSense applications are too CPU-intensive to be used in a background application. The obvious solutions are to only process every N-th frame or to have a fixed delay between processed frames. This is typically a good tradeoff when the user interface is not shown and responsiveness does not matter. But what if we want the best of both worlds: minimize power consumption and still provide a smooth user experience when required?

Jaka developed a frame-processing pipeline with a dynamic frame-rate, where the baseline frame-rate is low (for example, one frame every two seconds) and is elevated only when a visible control requires it. Using this technique, Posture Monitor uses less than two percent of the CPU when minimized, or when no real-time controls are shown, without any degradation of the overall user experience. It’s a relatively simple and completely generic code pattern that’s easily applicable to almost any app.

Here is the sample code:

using System;
using System.Drawing;
using System.Threading;
using System.Windows;
using System.Windows.Controls;

namespace DynamicFramerateDemo
{
    class CameraPipeline
    {
        public static readonly CameraPipeline Instance = new CameraPipeline();

        // Baseline/longest frame delay (this is used
        // unless a shorter delay is explicitly requested)
        private const int BASELINE_FRAME_DELAY_MILLIS = 2000;
        // Timer step / shortest frame delay
        private const int TIMER_STEP_MILLIS = 100;

        private PXCMSenseManager senseManager = null;
        private Thread processingThread;
        private int nextFrameDelayMillis = TIMER_STEP_MILLIS;

        public int CapNextFrameDelay(int frameDelayMillis) {
            // Make sure that processing of the next frame happens
            // at least within the specified delay
            nextFrameDelayMillis = Math.Min(nextFrameDelayMillis, frameDelayMillis);
            return nextFrameDelayMillis;
        }

        public void Start() {
            // Initialize SenseManager with streams and modules
            this.senseManager = PXCMSenseManager.CreateInstance();
            senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_COLOR, 640, 480, 30);
            senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_DEPTH, 640, 480, 30);
            senseManager.EnableFace();
            senseManager.Init();

            // Frame processing thread with dynamic frame rate
            this.processingThread = new Thread(new ThreadStart(delegate {
                while (processingThread != null) {
                    // Sleep in small increments until next frame is due
                    Thread.Sleep(TIMER_STEP_MILLIS);
                    nextFrameDelayMillis -= TIMER_STEP_MILLIS;
                    if (nextFrameDelayMillis > 0)
                        continue;

                    // Reset next frame delay to baseline long delay
                    nextFrameDelayMillis = BASELINE_FRAME_DELAY_MILLIS;
                    try {
                        if (senseManager.AcquireFrame(true, TIMER_STEP_MILLIS).IsSuccessful()) {
                            ProcessFrame(senseManager.QuerySample());
                        }
                    } finally {
                        senseManager.ReleaseFrame();
                    }
                }
            }));
            processingThread.Start();
        }

        private void ProcessFrame(PXCMCapture.Sample sample) {
            // [Do your frame processing and fire camera frame event]
        }
    }

    // Sample control that sets its own required frame rate
    class CameraViewControl : UserControl
    {
        // This event handler should get called by CameraPipeline.ProcessFrame
        protected void HandleCameraFrameEvent(Bitmap depthBitmap) {
            if (this.IsVisible && Application.Current.MainWindow.WindowState != WindowState.Minimized) {
                // While the control is visible, cap the frame delay to
                // 100ms to provide a smooth experience. When it is not
                // visible, the frame rate automatically drops to baseline.
                CameraPipeline.Instance.CapNextFrameDelay(100);

                // [Update your control]
            }
        }
    }
}

The Start() method initializes the Intel RealSense technology and starts a processing loop with a fixed delay (TIMER_STEP_MILLIS). This delay should be the lowest frame delay that your application will ever use (for example, 100 ms). In each loop iteration, this interval is subtracted from a countdown counter (nextFrameDelayMillis), and a frame is only acquired and processed when this counter reaches zero (0).

Initially and after every processed frame, the countdown timer is set to the baseline (longest) delay (BASELINE_FRAME_DELAY_MILLIS), (for example, 2000 ms). The next frame is processed only after this time, unless during this time any agent requests a lower value by calling CapNextFrameDelay. A lower delay / higher frame-rate is typically requested by visible user-interface controls (such as the CameraViewControl example), or by internal states that demand a higher frame rate. Each such agent can set the maximum acceptable frame delay and the lowest value will win; this way the frame rate always meets the most demanding agent. The beauty of this solution is that it is extremely efficient, very simple to implement, and only requires one additional line of code for each agent to set its required frame rate.

The Right Tools

Jaka integrated an impressive set of tools and technologies with the Intel RealSense SDK to create Posture Monitor.

Jaka said he used Microsoft Visual Studio and C# because they are industry standards for building Microsoft Windows* applications. He wanted to use tools that had the largest community behind them, with lots of third-party libraries available. He picked additional libraries for each particular need, trying many different products and picking the best ones.

Jaka didn’t write or use any plug-ins to get the Intel RealSense technology working with the application. He said that the SDK itself provides solid data structures that are standard and easy to use. “Sometimes you might have to use raw data,” he said. “We used the raw depth format with 16 bits per pixel, which is the most precise way of reading raw data.” He then wrote an algorithm to convert the data into a bitmap that had a higher contrast where it mattered. His converted bitmap focuses on the range where the person’s body is, and enhances contrast and accuracy around that range.
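
As an illustration of that idea, the sketch below maps raw 16-bit depth values (in millimeters) to an 8-bit grayscale buffer, spending the full contrast range on an assumed band where the user's body sits. This is not Posture Monitor's actual conversion; the band limits are hypothetical and in practice would be derived from the detected face distance.

// Illustrative sketch: convert 16-bit depth to 8-bit grayscale with contrast
// concentrated on the depth band of interest.
static byte[] DepthToHighContrastGray(ushort[] depthMm, ushort nearMm = 500, ushort farMm = 1500) {
    var gray = new byte[depthMm.Length];
    float range = farMm - nearMm;

    for (int i = 0; i < depthMm.Length; i++) {
        ushort d = depthMm[i];
        if (d == 0)     { gray[i] = 0;   continue; }   // no depth reading
        if (d < nearMm) { gray[i] = 255; continue; }   // closer than the band
        if (d > farMm)  { gray[i] = 0;   continue; }   // farther than the band

        // Linearly stretch the band of interest across the full 0-255 range.
        gray[i] = (byte)(255f * (farMm - d) / range);
    }
    return gray;
}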

Posture Monitor integrates leading statistics and data-charting programs with the Intel RealSense SDK to produce an intriguing user interface full of helpful information.

To process the camera data, Jaka used Accord, a very extensive library for math and statistics. He liked that Accord also has some machine-learning algorithms and some image processing. The data had to be converted into a compatible format, but, once achieved, it was a great step forward. “Once you get the data into the right form, Accord can really help you,” Jaka said. “You don’t have to reinvent the wheel for statistical processing, object recognition, detecting things like shapes and curves—that type of stuff. It really makes things easier.”

Another tool, OxyPlot, is an open-source charting library that Jaka found to be very extensive and very flexible.

Avoid Technical Debt—No Sloppy Code!

Jaka has a general philosophy for development that has consistently brought him success. “Paying attention to code quality from the start pays dividends later on,” he said. So he starts by learning everything he needs before he starts coding. He’ll make a ‘throwaway’ prototype to serve as a proof-of-concept, which allows him to play with all the technologies and figure them out. At that point, he’s not focused on the quality of the code, because his goal is simply to learn. Then, he’ll discard that prototype and start over with a strong architecture, based on what he’s figured out.

At this point, Jaka is then ready to build high-quality components in a systematic fashion. The complexity of the code base always grows with time, so early decisions are critical. “You can’t afford to have any sloppiness in your code,” Jaka warned. “You want to clean up every bit as soon as you can, just to make sure that in time, it doesn’t become an unmanageable nightmare as technical debt piles on.”

Posture Monitor only works with a desktop or laptop PC, not mobile devices, because it needs a stationary camera. And right now it only works with Windows, because that’s what the Intel RealSense SDK currently supports. When the Intel RealSense SDK integrates with Apple MacBooks*, Jaka is ready to pursue that. And his ambitions don’t stop there—he’s interested in learning more about Intel RealSense technology working in some fashion with Linux*, too.

Jaka has also been thinking about building a specialized hardware device for posture monitoring, perhaps a product that would include an Intel® Atom™ processor. “I’m sure this work is going to take us in a very interesting direction, once we start interacting with the users and the medical community. We are looking forward to where the future takes us.”

A Revolution is Coming

From his perspective as a successful entrepreneur who has already struck gold once, Jaka believes that the Intel RealSense camera and SDK are reaching developers at a crucial time. He sees the global software market for desktop, mobile, and web apps as oversupplied. “Almost anything you can think of has already been built,” he believes. Now, with Intel RealSense technology, Jaka says it is much easier to innovate.

“This market is still fresh and unsaturated, so it’s a lot easier to come up with new and exciting ideas that aren’t already implemented,” he affirmed. “I think this is a really good time to start working with Intel RealSense technology. It’s a great time to dig in and get a head start, while the market is still growing.”

As one example of this, Jaka can envision combining Posture Monitor with IoT devices. “There are so many potential ideas right now that are just waiting to be built. Most of them, I’m sure, no one has even thought of yet, because the technology is so new. I think we’re in front of some really exciting times, both for developers and for consumers.”

Resources

Learn more about Posture Monitor at https://posturemonitor.org/

Download the Intel® RealSense™ SDK at https://software.intel.com/en-us/intel-realsense-sdk

Learn more about the 2015 Intel® RealSense™ App Challenge at https://software.intel.com/sites/campaigns/realsense-winners/details.html?id=22

Robotic Hand Control Using Intel® RealSense™ Cameras

Abstract

The Roemotion* Roy robotic arm is the result of a successfully funded Kickstarter project launched in 2012, which was described as a “project to create a human sized animatronic character from only laser cut mechanics and off the shelf hobby servos.” In this experiment, software has been developed using the Intel® RealSense™ SDK for Windows* to control the Roy hand using the SDK’s hand tracking APIs (Figure 1).

Figure 1. Robotic hand control software.

The code for this project was developed in C#/XAML using Microsoft Visual Studio* Community 2015, and works with both the Intel RealSense F200 and SR300 (coming soon) cameras. To see the software-controlled robotic hand in action, check out the YouTube* video: https://youtu.be/VQ93jw4Aocg

About the Roy Arm

The Roy arm assembly is currently available for purchase from the Roemotion, Inc. website in kit form, which includes:

  • Laser cut pieces
  • All necessary hardware
  • 8 hobby-grade servos
  • 6 servo extension cables

As stated on the Roemotion website, the kit does not include any control electronics. This is because the initial concept of the project was to supply cool mechanical systems for people to use with whatever controller they want. As such, this experiment incorporates a third-party servo controller for driving the motors in the robotic hand (Figure 2).

Figure 2. Roy robotic arm.

The hand incorporates six servo motors: one for each of the fingers (index, middle, ring, and pinky) and two for the thumb. (Note: there are two additional servos located in the base of the arm for controlling wrist movements, but these are not controlled in this experiment.)

Intel® RealSense™ SDK Hand Tracking APIs

As stated in the Intel RealSense SDK online documentation, the hand tracking module provides real-time 3D hand motion tracking and can track one or two hands, providing precise joint-level locations and positions. Of particular interest to this real-time device control experiment is the finger’s “foldedness” value acquired through calls to the QueryFingerData() method.

Control Electronics

This experiment incorporated a Pololu Micro Maestro* 6-channel USB servo controller (Figure 3) to control the six motors located in the Roy hand. This device includes a fairly comprehensive SDK for developing control applications targeting different platforms and programming languages.

Figure 3. Pololu Micro Maestro* servo controller.

Servo Controller Settings

Before custom software could be developed to directly control the robotic hand in this experiment, it was essential to understand each of the full-scale finger ranges in terms of servo control parameters. Unlike high-end robotic servos with integrated controllers, whose position encoders can be queried prior to applying torque, the low-cost servos used in the Roy hand needed to be energized cautiously to avoid rapid motor movements that could lead to binding the fingers and potentially stripping the motor gears.

Fortunately, the Pololu Micro Maestro SDK includes a Control Center app that allows a user to configure firmware-level parameters and save them to flash memory on the control board. The settings that were determined experimentally for this application are shown in Figure 4.

Figure 4. Pololu Maestro Control Center app.

Once the Min and Max position settings are fixed, the servo controller firmware will not allow the servos to be accidentally software-driven to a position that exceeds the desired range of motion. This is critical for this type of application, which has mechanical hard-stops (that is, fingers fully open or closed) that could cause a motor to burn out or strip gears if over-driven.

Another important setting for an application such as this is the “On startup or error” parameter, which in this case ensures the default starting (and error) position for all of the fingers is “open” to prevent binding of the index finger and thumb if they were allowed to close indiscriminately.

The two final settings that are noteworthy are the Speed and Acceleration parameters. These settings allow for motion smoothing at the firmware level, which is often preferable to higher-level filtering algorithms that can add latency and overhead to the main software application.

Note: In more advanced robotic servos that include integrated controllers, a proportional–integral–derivative controller (PID) algorithm is often implemented that allows each term to be flashed in firmware for low-level (that is, closer to the metal) feedback tuning to facilitate smooth motor translations without burdening the higher-level software.

Custom Control Software

In this experiment, custom software (Figure 5) was developed that leverages many of the hand tracking features that are currently present in the SDK samples.

Figure 5. Custom Control Software.

Although real-time fingertip tracking data is presented in the user interface, this particular experiment ultimately relied on the following three parameters for controlling the Roy hand:

  • Alert data
  • Foldedness data
  • Scaled data

Alert Data

Alerts are the most important information to monitor in a real-time device control application such as this. It is paramount to understand (and control) how a device will behave when its set-point values become unreliable or unavailable.

In this experiment the following alert information is being monitored:

  • Hand detected
  • Hand calibrated
  • Hand inside borders

The design of this software app precludes control of the robotic hand servos in the event of any alert condition. In order for the software to control the robotic hand, the user’s hand must be successfully calibrated and within the operating range of the camera.

As shown in the code snippet below, the custom app loops over the total number of fired alerts and sets three Boolean member variables, detectionStatusOk, calibrationStatusOk, and borderStatusOk (note that handOutput is an instance of PXCMHandData):

for (int i = 0; i < handOutput.QueryFiredAlertsNumber(); i++)
{
  PXCMHandData.AlertData alertData;
  if (handOutput.QueryFiredAlertData(i, out alertData) !=
	 pxcmStatus.PXCM_STATUS_NO_ERROR) { continue; }

  switch (alertData.label)
  {
	 case PXCMHandData.AlertType.ALERT_HAND_DETECTED:
		detectionAlert = "Hand Detected";
		detectionStatusOk = true;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_NOT_DETECTED:
		detectionAlert = "Hand Not Detected";
		detectionStatusOk = false;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_CALIBRATED:
		calibrationAlert = "Hand Calibrated";
		calibrationStatusOk = true;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_NOT_CALIBRATED:
		calibrationAlert = "Hand Not Calibrated";
		calibrationStatusOk = false;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_INSIDE_BORDERS:
		bordersAlert = "Hand Inside Borders";
		borderStatusOk = true;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_OUT_OF_BORDERS:
		bordersAlert = "Hand Out Of Borders";
		borderStatusOk = false;
		break;
  }
}

A test to determine if detectionStatusOk, calibrationStatusOk, and borderStatusOk are all true is performed before any attempt is made in the software to control the hand servos. If at any time one of these flags is set to false, the fingers will be driven to their default Open positions for safety.
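
A minimal sketch of that gate might look like the following. UpdateHandServos() and the thumb default constant are hypothetical placeholders; Hand.MoveFinger() and the Servo constants are shown later in this article.

// Sketch of the safety gate described above.
if (detectionStatusOk && calibrationStatusOk && borderStatusOk) {
    // Tracking is reliable: map finger foldedness values to servo positions.
    UpdateHandServos();
}
else {
    // Tracking is unreliable: drive fingers to their default open positions.
    Hand.MoveFinger(Servo.HandJoint.Index, (ushort)Servo.INDEX_DEFAULT);
    Hand.MoveFinger(Servo.HandJoint.Thumb, (ushort)Servo.THUMB_DEFAULT); // hypothetical constant
    // ... remaining fingers follow the same pattern.
}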

Foldedness Data

The custom software developed in this experiment makes calls to the QueryFingerData() method, which returns the finger’s “foldedness” value and fingertip radius. The foldedness value is in the range of 0 (finger folded) to 100 (finger extended).

The foldedness data for each finger is retrieved within the acquire/release frame loop as shown in the following code snippet (where handData is an instance of PXCMHandData.IHand):

PXCMHandData.FingerData fingerData;

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_THUMB, out fingerData);
thumbFoldeness = fingerData.foldedness;
lblThumbFold.Content = string.Format("Thumb Fold: {0}", thumbFoldeness);

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_INDEX, out fingerData);
indexFoldeness = fingerData.foldedness;
lblIndexFold.Content = string.Format("Index Fold: {0}", indexFoldeness);

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_MIDDLE, out fingerData);
middleFoldeness = fingerData.foldedness;
lblMiddleFold.Content = string.Format("Middle Fold: {0}", middleFoldeness);

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_RING, out fingerData);
ringFoldeness = fingerData.foldedness;
lblRingFold.Content = string.Format("Ring Fold: {0}", ringFoldeness);

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_PINKY, out fingerData);
pinkyFoldeness = fingerData.foldedness;
lblPinkyFold.Content = string.Format("Pinky Fold: {0}", pinkyFoldeness);

Scaled Data

After acquiring the foldedness data for each of the user’s fingers, scaling equations are processed to map these values to the full-scale ranges of each robotic finger. Each full-scale value (that is, the control pulse width, in microseconds, required to move the finger either fully open or fully closed) is defined as a constant in the servo.cs class:

// Index finger
public const int INDEX_OPEN = 1808;
public const int INDEX_CLOSED = 800;
public const int INDEX_DEFAULT = 1750;
.
.
.

Individual constants are defined for each finger on the robotic hand, which match the Min and Max servo parameters that were flashed in the Micro Maestro controller board (see Figure 4). Similarly, the full-scale range of the finger foldedness data is defined in the software:

int fingerMin = 0;
int fingerMax = 100;

Since the finger foldedness range is the same for all fingers (that is, 0 to 100), the range only needs to be defined once and can be used for the data scaling operation performed for each finger as shown below:

// Index finger
int indexScaled = Convert.ToInt32((Servo.INDEX_OPEN - Servo.INDEX_CLOSED) *
   (index - fingerMin) / (fingerMax - fingerMin) + Servo.INDEX_CLOSED);

lblIndexScaled.Content = string.Format("Index Scaled: {0}", indexScaled);
Hand.MoveFinger(Servo.HandJoint.Index, Convert.ToUInt16(indexScaled));
.
.
.
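
For reference, the same linear mapping can be wrapped in a small helper so that every finger shares one code path. This is a sketch only; the open/closed pulse-width arguments come from the Servo constants shown above.

// Map a foldedness value (0 = folded, 100 = extended) to a servo pulse width in microseconds.
static ushort ScaleFoldednessToServo(int foldedness, int servoOpen, int servoClosed,
    int fingerMin = 0, int fingerMax = 100) {

    int scaled = (servoOpen - servoClosed) * (foldedness - fingerMin)
                 / (fingerMax - fingerMin) + servoClosed;
    return Convert.ToUInt16(scaled);
}

// Example usage for the index finger:
// Hand.MoveFinger(Servo.HandJoint.Index,
//     ScaleFoldednessToServo(indexFoldeness, Servo.INDEX_OPEN, Servo.INDEX_CLOSED));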

Check Out the Video

To see the robotic hand in action, check out the YouTube video here: https://youtu.be/VQ93jw4Aocg

Summary

This software experiment took only a few hours to implement, once the basic control constraints of the servo motors were tested and understood. The Windows 10 desktop app was developed in C#/XAML, and it leveraged many of the features present in the Intel RealSense SDK APIs and code samples.

About Intel® RealSense™ Technology

To learn more about the Intel RealSense SDK for Windows, go to https://software.intel.com/en-us/intel-realsense-sdk.

About the Author

Bryan Brown is a software applications engineer in the Developer Relations Division at Intel.

Playing at Ghosts: Face Tracking in Mystery Mansion*

Mystery Mansion* is a spooky, hidden-object adventure game from veteran game developer Cyrus Lum. The game took first place in the Pioneer track of the 2014 Intel® RealSense™ App Challenge, with its innovative, experimental approach drawing the attention of the judges.

As a self-described storyteller, Lum aims to “enhance that suspension of disbelief—to get [the] audience lost in the story and the world I’ve created.” Mystery Mansion is remarkable for its exclusive use of face tracking for the user interface (UI), which results in a highly immersive experience. However, the face-only approach posed a number of development challenges during Lum’s quest to implement intuitive controls and to create a satisfying user experience with a bare-bones UI. In this paper we’ll discuss the challenges encountered and the code Lum used to address them, including how to accurately calibrate the game’s UI for different players, and manage the movement and manipulation of objects in the environment with the intentionally limited control scheme.


The Mystery Mansion* splash screen, presenting its central, haunted-house theme.

Optimizations and Challenges

Face Tracking

The inspiration for Mystery Mansion came from Lum’s search for a way to use the capabilities of the Intel® RealSense™ SDK other than for hand gesture-control. This led to his decision to work on a game that would be controlled exclusively with face tracking.

After searching for a game theme and mechanics that would correspond with the necessarily simplified UI, Lum decided on a first-person, hidden-object game in the style of a point-and-click adventure. Controlling the game with the head alone, as if it were somehow detached from the body, inspired Lum’s idea of a ‘disembodied’ ghost as the game’s central playable character. In Mystery Mansion, the player takes the role of a spirit trapped in the mansion, looking to solve the mystery of its own death.

Nose and Eyes

The game requires that players visually explore the environment, locating and collecting a series of items. For lateral and vertical movement, the Intel® RealSense™ Camera F200 tracks the movement of the face—specifically the nose—allowing the player to intuitively look around the environment. The directional movement of the player’s face is reproduced by the reticule on-screen. 


The red reticule is controlled by the movement of the player’s face, which is tracked by the Intel® RealSense™ camera.

Lum wanted players to be able to explore the environment in 360 degrees—to turn left and right, and even look behind them. To make this possible, he implemented a system whereby once the player crosses a certain lateral movement threshold, the field of view begins to shift in that direction around the space, with the movement accelerating as the player’s view moves toward the edge of the screen.

To zoom in and out of the environment, the Intel RealSense camera tracks the distance between the player’s eyes. The closer together the eyes appear to the camera, the farther away the user is, and hence the more zoomed out the view is (with the opposite true for zooming in). The camera zoom is calibrated to ensure that the player doesn’t need to move too close to the screen to effectively zoom in on objects.

/* get face data and calculate camera FOV based on eye distance.  Use face data to rotate camera. */
	void OnFaceData(PXCMFaceData.LandmarksData data) {
		if (!visualizeFace) return;
		if (colorSize.width == 0 || colorSize.height == 0) return;

		PXCMFaceData.LandmarkPoint[] points;
		if (!data.QueryPoints(out points)) return;

		/* Use nose tip as the control point */
		PXCMPointF32 xy=points[data.QueryPointIndex(PXCMFaceData.LandmarkType.LANDMARK_NOSE_TIP)].image;

		/* Get Left and Right Eye to calculate distance */
		PXCMPointF32 xyEyeLeft=points[data.QueryPointIndex(PXCMFaceData.LandmarkType.LANDMARK_EYE_LEFT_CENTER)].image;
		PXCMPointF32 xyEyeRight=points[data.QueryPointIndex(PXCMFaceData.LandmarkType.LANDMARK_EYE_RIGHT_CENTER)].image;

		float tmpEye = Mathf.Abs ((xyEyeLeft.x - xyEyeRight.x) * eyeDistScale);
		if (tmpEye < eyeDistNear) tmpEye = eyeDistNear;
		if (tmpEye > eyeDistFar) tmpEye = eyeDistFar;

		/* Use eyes apart distance to change FOV */
		Camera.current.fieldOfView = eyeFOVcenter - tmpEye;

Code Sample 1: This code uses the distance between the user’s eyes to calculate his or her distance from the screen and then adjusts the field of view accordingly.

Optimizing the UI

Calibration

Lum observed that every player will have a slightly different way of playing the game, as well as different physical attributes in terms of the size, dimensions and configuration of the facial features. This meant that calibrating the face tracking at the start of each play session was key to making the directional controls function correctly for each individual player. Lum inserted a calibration stage at the start of each play session to establish the “zero position” of the player and to ensure that they can be tracked within the range of the camera.

To establish the “zero position” during the calibration stage, the player moves his or her head to position the reticule within a bordered area at the center of the screen. This ensures that the player is within the range of the camera (the tracking volume) when turning his or her head, or moving in and out. The process ensures a consistent experience for every player regardless of differences in face shape, size, and position in relation to the camera.


The calibration stage at the beginning of each play session helps ensure a consistent and accurate experience for each different player.

//check to see if target graphic is within the box
if (!calibrated && calibratedNose && gameStart && (295 * screenScaleWidth) < targetX && targetX < (345 * screenScaleWidth) && (235 * screenScaleHeight) < targetY && targetY < (285 * screenScaleHeight)) {
			calibrated = true;
			tutorialImg = tutorial1Img;
			LeanTween.alpha(tutorialRect, 1f, 2f) .setEase(LeanTweenType.easeInCirc).setDelay(1.5f);
			LeanTween.alpha(tutorialRect, 0f, 2f) .setEase(LeanTweenType.easeInCirc).setDelay(8f).setOnComplete (showTutorialTurn);
		}

Code Sample 2: This code calibrates the game by ensuring the target reticule is within the red box shown in the calibration screenshot.

Lateral Movement

The full freedom of lateral camera movement is necessary to create the 360-degree field of view that Lum wanted to offer the player. Fundamental to ensuring an optimized user experience with the in-game camera was the implementation of a safety zone and rotational acceleration.

//— rotate camera based on face data ———
		/* Mirror the facedata input, normalize */
		xy.x=(1-(xy.x/colorSize.width));
		xy.y=(xy.y/colorSize.height);

		/* exponentially accelerate the rate of rotation when looking farther away from the center of the screen, use rotateAccelerationScale to adjust */
		newX = (0.5f-xy.x)*(rotateAccelerationScale*Mathf.Abs((0.5f-xy.x)*(0.5f-xy.x)));
		newY = (0.5f-xy.y)*(rotateAccelerationScale*Mathf.Abs((0.5f-xy.y)*(0.5f-xy.y)));

		/* Camera is a hierarchy  mainCamBase  with Main Camera as a child of mainCamBase.  We will horizontally rotate the parent mainCamBase  */
		mainCamBase.transform.Rotate(0, (newX * (lookTimeScale*Time.deltaTime)), 0);

		/* angleY is a rotation accumulator */
		angleY += newY;
		if (angleY > lookUpMin && angleY < lookUpMax) {

			mainCam.transform.Rotate ((newY * (lookTimeScale * Time.deltaTime)), 0, 0);
			if(angleY < lookUpMin) angleY = lookUpMin;
			if(angleY > lookUpMax) angleY = lookUpMax;
		}
		else angleY -= newY;

Code Sample 3: This code controls the rotation and lateral acceleration of the camera as the user turns his or her head further from the center of the screen.

If the player keeps the reticule within a specific zone at the center of the screen—approximately 50 percent of the horizontal volume—the camera does not begin to rotate, which ensures that the player can explore that area without the camera moving unintentionally. Once the reticule is moved outside that zone, the lateral movement begins in the direction the player is looking, and accelerates as he or she moves the reticule toward the edge of the screen. This gives the player accurate and intuitive 360-degree control.

Vertical Movement

Lum’s experiments showed that complete freedom of camera-movement was less practical on the vertical axis, because if both axes are in play, the player can become disoriented. Furthermore, little is to be gained by allowing players to look at the ceiling and floor with the interactive elements of the game occupying the lateral band of vision. However, players needed some vertical movement in order to inspect elements near the floor or on raised surfaces. To facilitate this, Lum allowed 30 degrees of movement in both up and down directions, a compromise that lets the player look around without becoming disoriented.

Gameplay Optimizations

After exploring a variety of gameplay mechanics, Lum decided to add simple puzzle solving, in addition to the core, object-collection gameplay. Puzzle solving fit with the overall game design and could be effectively and intuitively implemented using only the face-tracking UI.  

The player picks up items in the game by moving the reticule over them, like a mouse cursor hovering over an object. At the rollover point, the object moves out of the environment toward the player—as if being levitated supernaturally—and enters the on-screen item inventory.


Here, the player has selected an object in the environment by moving his or her face to position the reticule over it, causing the object—in this case a pillbox—to move toward the player and into the inventory.

Ray ray = Camera.current.ScreenPointToRay(new Vector3((float)targetX, (float)Screen.height-(float)targetY, (float)0.1f));
RaycastHit hit;
if (Physics.Raycast(ray, out hit,100 )) {
	if(hit.collider.name.Contains("MysItem")){
		hit.collider.transform.Rotate(new Vector3(0,-3,0));
	}
}

Code Sample 4: This is the code that lets the player pick up objects using the face-tracking UI.

Lum also needed the player to move from room to room in a way that would fit with the simplified UI and the logic of the gameplay. To this end, each room has an exit door that is only activated once the player has collected all the available objects. At this point, the player moves his or her face to position the reticule over the door, causing the door to open and moving the game scene to the next space.
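
That gating logic can be expressed very simply. The sketch below is illustrative only and not Lum's actual code; the item counters, the doorUnlocked flag, and the exitDoor reference are all hypothetical.

// Illustrative sketch: unlock the exit door only after every object has been collected.
void CheckRoomComplete () {
	if (itemsCollected >= itemsInRoom && !doorUnlocked) {
		doorUnlocked = true;
		// Allow the door to respond when the reticule hovers over it.
		exitDoor.GetComponent<Collider>().enabled = true;
	}
}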

To add variety to the gameplay, Lum explored and added different puzzle mechanics that could be manipulated in similarly simple and logical ways. One of these is the block puzzle, where the player uses the directional control to move blocks within a frame. Lum implemented this action in multiple ways, including moving blocks within a picture frame to reconstruct an image and moving pipes on a door in order to unlock it.

else if(hit.collider.name.Contains("puzzleGame")){
	room currentRoom = (room)rooms[roomID];
	puzzleItem tmpPuzzleItem = (puzzleItem)currentRoom.puzzleItems[hit.collider.name];
	if(!LeanTween.isTweening(hit.collider.gameObject) && !tmpPuzzleItem.solved){
		if(hit.collider.name.Contains("_Rot")){
			LeanTween.rotateAroundLocal ( hit.collider.gameObject, Vector3.up, 90f, 2f ).setEase (LeanTweenType.easeInOutCubic).setOnCompleteParam (hit.collider.gameObject).setOnComplete (puzzleAnimFinished);
		}
	}
}

Code Sample 5: This code allows players to move blocks using face-tracking in order to complete some of the game’s puzzles.


The puzzles include this block picture. The player uses the directional control to move the pieces and reconstruct the image, revealing a clue.


In this puzzle, the player uses the face-tracked directional control to move the segments of piping.

Text prompts also appear on the screen to help the player determine the next step, for example, when he or she needs to exit the room.

Testing and Analysis

Mystery Mansion was developed in its entirety by Lum, who also conducted the majority of the testing. During development, however, he called on the services of three friends to test the game.

In the world of video games, no two players play a game exactly the same way—what seems intuitive to one player will not necessarily be so for another. This difference quickly became evident to Lum during the external testing, particularly with the face tracking. Testers had difficulty advancing through the game because they would position themselves differently, or simply because of differences in facial size and composition. Lum’s observations led to the implementation of the calibration stage prior to play and the addition of a tutorial at the beginning of the game to ensure that the basics of the UI are well understood. 


The Tutorial stage helps players understand how to interact with the game using only the face-tracking interface.

Key Lessons

For Lum, when working with Intel RealSense technology and human interfaces, simplicity is absolutely key, a point that was driven home through his work on Mystery Mansion, where the UI is limited exclusively to face tracking. He’s a firm believer in not trying to do too much in terms of adding mechanics and features, even if an idea initially seems cool. Moving through the environment and manipulating objects using only face tracking required careful iteration of the stripped-down UI, and a degree of tutorial “hand-holding” to ensure that the player was never left not knowing what to do or how to advance through the game.

Testing played a key role in the development of Mystery Mansion. Lum found that developers should not assume that what works for one player will automatically be true for another. Every player will behave differently with the game, and in terms of the human interface of the Intel RealSense camera, each player’s face and hands will have different size, shape, movement and positional attributes which must be compensated for in the code.

The resources provided with the Intel RealSense SDK's Unity Toolkit gave Lum a straightforward development environment. Unity* is user-friendly and offers well-tested compatibility with the Intel RealSense SDK, a wealth of resources (including those provided with the SDK), a strong support community, and a ready stock of graphical assets from the Asset Store.

Lum believes that developers should always consider the physical impact of prolonged play times with hand gesture controls, which can sometimes lead to limb fatigue if the UI is not thoughtfully balanced for the player.

Tools and Resources

Lum found the process of developing his game using the Intel RealSense SDK straightforward. He also dedicated time to reviewing the available demos to pick up practical tips, including the Nine Cubes sample provided with the Intel RealSense SDK.

Unity

Lum chose to develop the game using Unity, which is readily compatible with the Intel RealSense SDK and offers a complete development environment. While Lum is an accomplished programmer in C#, the Unity platform made much of the basic programming required unnecessary, allowing him to iterate quickly in terms of developing and testing prototypes.

MonoDevelop*

To develop the C# game scripts, Lum used MonoDevelop, the integrated development environment supplied with Unity. Within MonoDevelop, Lum placed objects, set up properties, added behaviors and logic, and wrote scripts for the integration of the Intel RealSense camera data.

Nine Cubes

One of the fundamental building blocks for building Mystery Mansion was the Nine Cubes sample, which is a Unity software sample provided with the Intel RealSense SDK (it can be found in the frameworks folder of the samples directory in the SDK). This demo allows the user to move a cube using face tracking—specifically nose tracking. This functionality became the foundation of Mystery Mansion’s UI.

Unity Technologies Asset Store

Having already had experience with the Unity Technologies Asset Store for a previous project, it was Lum’s go-to place for the graphic elements of Mystery Mansion, essentially saving time and making it possible to singlehandedly develop a visually rich game. Serendipitously, he was looking for assets during the Halloween period, so creepy visuals were easy to find.

What’s Next for Mystery Mansion

Since submitting Mystery Mansion for the Intel RealSense App Challenge, Lum has continued to experiment with features that help create an even more immersive experience. For example, recent iterations allow the player to look inside boxes or containers by slowly leaning in to peer inside. This action eventually triggers something to pop out, creating a real moment of visceral fright. Lum’s takeaway is that the more pantomiming of real physical actions in the game, the greater the immersion in the experience and the more opportunities to emotionally involve and engage the player.

To date, Mystery Mansion has been designed principally for laptops and desktop PCs equipped with Intel RealSense user-facing cameras. Lum has already conducted tests with the Google Tango* tablet and is eager to work on tablet and mobile platforms with Intel RealSense technology, particularly in the light of the ongoing collaboration between Intel and Google to bring Project Tango to mobile phone devices with Intel RealSense technology.

Intel RealSense SDK: Looking Forward

In Lum’s experience, context is crucial for the successful implementation of Intel RealSense technology. Lum is particularly excited about the possibilities this technology presents in terms of 3D scanning of objects and linking that to the increasingly accessible world of 3D printing.  

As for Lum’s own work with human interface technology, he is currently pursuing the ideas he began exploring with another of his recent Intel RealSense SDK projects, My Pet Shadow, which won first place in the 2013 Intel Perceptual Computing Challenge. My Pet Shadow is a “projected reality” prototype that uses an LCD projector to cast a shadow that the user can interact with in different ways. It’s this interactive fusion of reality and the digital realm that interests Lum, and it’s a direction he intends to pursue as he continues to push the possibilities of Intel RealSense technology.


Lum’s Intel® RealSense™ project, My Pet Shadow, took first place in the 2013 Intel® Perceptual Computing Challenge.

About the Developer

Cyrus Lum has over 25 years of experience in game production, development and management roles, with both publishing and independent development companies, including Midway Studios in Austin, Texas, Inevitable Entertainment Inc., Acclaim Entertainment, and Crystal Dynamics. His roles have ranged from art director to co-founder and VP of digital productions. Currently, Lum is an advisor to Phunware Inc., and Vice president of Technology for 21 Pink, a game software development company. He has also served on the Game Developer Conference Advisory Board since 1997.

Additional Resources

Cyrus Lum Web Site

MonoDevelop

Unity

Intel® Developer Zone for Intel® RealSense™ Technology

Intel® RealSense™ SDK

Intel® RealSense™ Developer Kit

Intel RealSense Technology Tutorials


Introduction to Autonomous Navigation for Augmented Reality

Download PDF
Download Code Sample

Ryan Measel and Ashwin Sinha

1. Introduction

Perceptual computing is the next step in human-computer interaction. It encompasses technologies that sense and understand the physical environment, including gestures, voice recognition, facial recognition, motion tracking, and environment reconstruction. The Intel® RealSense™ cameras F200 and R200 are at the forefront of perceptual computing. Their depth sensing capabilities allow the F200 and R200 to reconstruct the 3D environment and track a device’s motion relative to it. The combination of environment reconstruction and motion tracking enables augmented reality experiences where virtual assets are seamlessly intertwined with reality.

While the Intel RealSense cameras can provide the data to power augmented reality applications, it is up to developers to create immersive experiences. One method of bringing an environment to life is through the use of autonomous agents. Autonomous agents are entities that act independently using artificial intelligence. The artificial intelligence defines the operational parameters and rules by which the agent must abide. The agent responds dynamically in real time to its environment, so even a simple design can result in complex behavior.

Autonomous agents can exist in many forms; though, for this discussion, the focus will be restricted to agents that move and navigate. Examples of such agents include non-player characters (NPCs) in video games and birds flocking in an educational animation. The goals of the agents will vary depending on the application, but the principles of their movement and navigation are common across all.

The intent of this article is to provide an introduction to autonomous navigation and demonstrate how it's used in augmented reality applications. An example is developed that uses the Intel RealSense camera R200 and the Unity* 3D Game Engine. It is best to have some familiarity with the Intel® RealSense™ SDK and Unity. For information on integrating the Intel RealSense SDK with Unity, refer to: “Game Development with Unity* and Intel® RealSense™ 3D Camera” and “First look: Augmented Reality in Unity with Intel® RealSense™ R200.”

2. Autonomous Navigation

Agent-based navigation can be handled in a number of ways ranging from simple to complex, both in terms of implementation and computation. A simple approach is to define a path for the agent to follow. A waypoint is selected, then the agent moves in a straight line towards it. While easy to implement, the approach has several problems. Perhaps the most obvious: what happens if a straight path does not exist between the agent and the waypoint (Figure 1)?

Figure 1. An agent moves along a straight path towards the target, but the path can become blocked by an obstacle. Note: This discussion applies to navigation in both 2D and 3D spaces, but 2D is used for illustrative purposes.

More waypoints need to be added to route around obstacles (Figure 2).

Figure 2. Additional waypoints are added to allow the agent to navigate around obstacles.

On bigger maps with more obstacles, the number of waypoints and paths will often be much larger. Furthermore, a higher density of waypoints (Figure 3) will allow for more efficient paths (less distance traveled to reach the destination).

Figure 3. As maps grow larger, the number of waypoints and possible paths increases significantly.

A large number of waypoints necessitates a method of finding a path between non-adjacent waypoints. This problem is referred to as pathfinding. Pathfinding is closely related to graph theory and has applications in many fields besides navigation. Accordingly, it is a heavily researched topic, and many algorithms exist that attempt to solve various aspects of it. One of the most prominent pathfinding algorithms is A*. In basic terms, the algorithm traverses adjacent waypoints towards the desired destination, building a map of every waypoint it visits and the waypoints connected to it. Once the destination is reached, the algorithm calculates a path using its generated map, which the agent can then follow. A* does not search the entire space, and depending on the heuristic used the path is not guaranteed to be the shortest possible, but the algorithm is computationally efficient.

Figure 4. The A* algorithm traverses a map searching for a route to the target. Animation by Subh83 / CC BY 3.0
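
For readers who want to experiment with waypoint graphs, a compact A* implementation might look like the sketch below. It is illustrative only and not part of this article's sample; it assumes a simple Waypoint class that stores a position and its neighbors, and it uses straight-line distance as the heuristic.

using System.Collections.Generic;
using UnityEngine;

// Illustrative sketch only: a minimal A* search over a waypoint graph.
public class Waypoint {
    public Vector3 Position;
    public List<Waypoint> Neighbors = new List<Waypoint>();
}

public static class AStarSearch {

    public static List<Waypoint> FindPath (Waypoint start, Waypoint goal) {

        var open = new List<Waypoint> { start };
        var cameFrom = new Dictionary<Waypoint, Waypoint>();
        var gScore = new Dictionary<Waypoint, float> { { start, 0f } };

        while (open.Count > 0) {

            // Pick the open waypoint with the lowest f = g + h (linear scan for simplicity).
            Waypoint current = open[0];
            float bestF = gScore[current] + Vector3.Distance(current.Position, goal.Position);
            foreach (var node in open) {
                float f = gScore[node] + Vector3.Distance(node.Position, goal.Position);
                if (f < bestF) { bestF = f; current = node; }
            }

            if (current == goal) return Reconstruct(cameFrom, current);

            open.Remove(current);
            foreach (var neighbor in current.Neighbors) {
                float tentativeG = gScore[current] + Vector3.Distance(current.Position, neighbor.Position);
                if (!gScore.ContainsKey(neighbor) || tentativeG < gScore[neighbor]) {
                    cameFrom[neighbor] = current;   // remember the best known way to reach this waypoint
                    gScore[neighbor] = tentativeG;
                    if (!open.Contains(neighbor)) open.Add(neighbor);
                }
            }
        }
        return null; // no path exists between start and goal
    }

    private static List<Waypoint> Reconstruct (Dictionary<Waypoint, Waypoint> cameFrom, Waypoint current) {
        var path = new List<Waypoint> { current };
        while (cameFrom.ContainsKey(current)) {
            current = cameFrom[current];
            path.Insert(0, current);
        }
        return path;
    }
}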

A* is not able to adapt to dynamic changes in the environment such as added/removed obstacles and moving boundaries. Environments for augmented reality are dynamic by nature, since they build and change in response to the user’s movement and physical space.

For dynamic environments, it is preferable to let agents make decisions in real time, so that all current knowledge of the environment can be incorporated into the decision. Thus, a behavior framework must be defined so the agent can make decisions and act in real time. With respect to navigation, it is convenient and common to separate the behavior framework into three layers:

  1. Action Selection is comprised of setting goals and determining how to achieve those goals. For example, a bunny will wander around looking for food, unless there is a predator nearby, in which case, the bunny will flee. State machines are useful for representing such behavior as they define the states of the agent and the conditions under which states change.
  2. Steering is the calculation of the movement based on the current state of the agent. If the bunny is being chased by the predator, it should flee away from the predator. Steering calculates both the magnitude and direction of the movement force.
  3. Locomotion is the mechanics through which the agent moves. A bunny, a human, a car, and a spaceship all move in different ways. Locomotion defines both how the agent moves (e.g., legs, wheels, thrusters, etc.) and the parameters of that motion (e.g., mass, maximum speed, maximum force, etc.).

Together these layers form the artificial intelligence of the agent. In Section 3, we'll show a Unity example to demonstrate the implementation of these layers. Section 4 will integrate the autonomous navigation into an augmented reality application using the R200.

3. Implementing Autonomous Navigation

This section walks through implementing the behavior framework described above in a Unity scene, building it from the ground up and starting with locomotion.

Locomotion

The locomotion of the agent is based on Newton’s laws of motions where force applied to mass results in acceleration. We will use a simplistic model with uniformly distributed mass that can have force applied in any direction to the body. To constrain the movement, the maximum force and the maximum speed must be defined (Listing 1).

public float mass = 1f;            // Mass (kg)
public float maxSpeed = 0.5f;      // Maximum speed (m/s)
public float maxForce = 1f;        // Maximum force (N)

Listing 1. The locomotion model for the agent.

The agent must have a rigidbody component and a collider component that are initialized on start (Listing 2). Gravity is removed from the rigidbody for simplicity of the model, but it is possible to incorporate.

private Rigidbody rb;              // Cached reference to the rigidbody component
private Collider col;              // Cached reference to the collider component

private void Start () {

	// Initialize the rigidbody
	this.rb = GetComponent<Rigidbody> ();
	this.rb.mass = this.mass;
	this.rb.useGravity = false;

	// Initialize the collider
	this.col = GetComponent<Collider> ();
}

Listing 2. The rigidbody and collider components are initialized on Start().

The agent is moved by applying force to the rigidbody in the FixedUpdate() step (Listing 3). FixedUpdate() is similar to Update(), but it is guaranteed to execute at a consistent interval (which Update() is not). The Unity engine performs the physics calculations (operations on rigidbodies) at the completion of the FixedUpdate() step.

private void FixedUpdate () {

	Vector3 force = Vector3.forward;

	// Upper bound on force
	if (force.magnitude > this.maxForce) {
		force = force.normalized * this.maxForce;
	}

	// Apply the force
	rb.AddForce (force, ForceMode.Force);

	// Upper bound on speed
	if (rb.velocity.magnitude > this.maxSpeed) {
		rb.velocity = rb.velocity.normalized * this.maxSpeed;
	}
}

Listing 3. Force is applied to rigidbody in the FixedUpdate() step. This example moves the agent forward along the Z axis.

If the magnitude of the force exceeds the maximum force of the agent, it is scaled such that its magnitude is equivalent to the maximum force (direction is preserved). The AddForce () function applies the force via numerical integration:

Equation 1. Numerical integration of velocity: v_new = v_old + (F / m) * Δt. The AddForce() function performs this calculation.

where v_new is the new velocity, v_old is the previous velocity, F is the force, m is the mass, and Δt is the time step between updates (the default fixed time step in Unity is 0.02 s). If the magnitude of the velocity exceeds the maximum speed of the agent, it is scaled such that its magnitude is equivalent to the maximum speed.
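
For reference, the velocity update that AddForce() performs with ForceMode.Force is equivalent to this manual Euler step (an illustrative sketch only; the sample itself relies on AddForce()):

// Equivalent manual integration of Equation 1 (illustrative sketch only)
Vector3 acceleration = force / this.mass;                        // a = F / m
rb.velocity = rb.velocity + acceleration * Time.fixedDeltaTime;  // v_new = v_old + a * Δt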

Steering

Steering calculates the force that will be supplied to the locomotion model. Three steering behaviors will be implemented: seek, arrive, and obstacle avoidance.

Seek

The Seek behavior attempts to move towards a target as fast as possible. The desired velocity of the behavior points directly at the target at maximum speed. The steering force is calculated as the difference between the desired and current velocity of the agent (Figure 5).

Figure 5. The Seek behavior applies a steering force from the current velocity to the desired velocity.

The implementation (Listing 4) first computes the desired vector by normalizing the offset between the agent and the target and multiplying it by the maximum speed. The steering force returned is the desired velocity minus the current velocity, which is the velocity of the rigidbody.

private Vector3 Seek () {

	Vector3 desiredVelocity = (this.seekTarget.position - this.transform.position).normalized * this.maxSpeed;
	return desiredVelocity - this.rb.velocity;
}

Listing 4. Seek steering behavior.

The agent uses the Seek behavior by invoking Seek() when it computes the force in FixedUpdate() (Listing 5).

private void FixedUpdate () {

	Vector3 force = Seek ();
	...

Listing 5. Invoking Seek () in FixedUpdate ().

An example of the Seek behavior in action is shown in Video 1. The agent has a blue arrow that indicates the current velocity of the rigidbody and a red arrow that indicates the steering force being applied in that time step.

Video 1. The agent initially has a velocity orthogonal to the direction of the target, so its motion follows a curve.

Arrive

The Seek behavior overshoots and oscillates around the target because it travels as fast as possible to reach it. The Arrive behavior is similar to Seek except that it attempts to come to a complete stop at the target. The “deceleration radius” parameter defines the distance from the target at which the agent begins to decelerate. Within the deceleration radius, the desired speed is scaled in proportion to the distance between the agent and the target, so the agent slows as it approaches. Depending on the maximum force, maximum speed, and deceleration radius, it may not be able to come to a complete stop.

The Arrive behavior (Listing 6) first calculates the distance between the agent and the target. A scaled speed is calculated as the maximum speed scaled by the distance divided by the deceleration radius. The desired speed is taken as the minimum of the scaled speed and maximum speed. Thus, if the distance to the target is less than the deceleration radius, the desired speed is the scaled speed. Otherwise, the desired speed is the maximum speed. The remainder of the function performs exactly like Seek using the desired speed.

// Arrive deceleration radius (m)
public float decelerationRadius = 1f;

private Vector3 Arrive () {

	// Calculate the desired speed
	Vector3 targetOffset = this.seekTarget.position - this.transform.position;
	float distance = targetOffset.magnitude;
	float scaledSpeed = (distance / this.decelerationRadius) * this.maxSpeed;
	float desiredSpeed = Mathf.Min (scaledSpeed, this.maxSpeed);

	// Compute the steering force
	Vector3 desiredVelocity = targetOffset.normalized * desiredSpeed;
	return desiredVelocity - this.rb.velocity;
}

Listing 6. Arrive steering behavior.

Video 2. The Arrive behavior decelerates as it reaches the target.

Obstacle Avoidance

The Arrive and Seek behaviors are great for getting places, but they are not suited for handling obstacles. In dynamic environments, the agent will need to be able to avoid new obstacles that appear. The Obstacle Avoidance behavior looks ahead of the agent along the intended path and determines if there are any obstacles to avoid. If obstacles are found, the behavior calculates a force that alters the path of the agent to avoid the obstacle (Figure 6).

Figure 6. When an obstacle is detected along the current trajectory, a force is returned that prevents the collision.

The implementation of Obstacle Avoidance (Listing 7) uses a spherecast to detect collisions. The spherecast casts a sphere along the current velocity vector of the rigidbody and returns a RaycastHit for every collision. The sphere originates from the center of the agent and has a radius equal to the radius of the agent’s collider plus an “avoidance radius” parameter. The avoidance radius allows the user to define the clearance around the agent. The cast is limited to traveling the distance specified by the “forward detection” parameter.

// Avoidance radius (m). The desired amount of space between the agent and obstacles.
public float avoidanceRadius = 0.03f;
// Forward detection radius (m). The distance in front of the agent that is checked for obstacles.
public float forwardDetection = 0.5f;

private Vector3 ObstacleAvoidance () {

	Vector3 steeringForce = Vector3.zero;

	// Cast a sphere, that bounds the avoidance zone of the agent, to detect obstacles
	RaycastHit[] hits = Physics.SphereCastAll(this.transform.position, this.col.bounds.extents.x + this.avoidanceRadius, this.rb.velocity, this.forwardDetection);

	// Compute and sum the forces across all hits
	for(int i = 0; i < hits.Length; i++)    {

		// Ensure that the collider is on a different object
		if (hits[i].collider.gameObject.GetInstanceID () != this.gameObject.GetInstanceID ()) {

			if (hits[i].distance > 0) {

				// Scale the force inversely proportional to the distance to the target
				float scaledForce = ((this.forwardDetection - hits[i].distance) / this.forwardDetection) * this.maxForce;
				float desiredForce = Mathf.Min (scaledForce, this.maxForce);

				// Compute the steering force
				steeringForce += hits[i].normal * desiredForce;
			}
		}
	}

	return steeringForce;
}

Listing 7. Obstacle Avoidance steering behavior.

The spherecast returns an array of RaycastHit objects. A RaycastHit contains information about a collision, including the distance to the collision and the normal of the surface that was hit. The normal is a vector orthogonal to the surface, so it can be used to direct the agent away from the collision point. The magnitude of the force is determined by scaling the maximum force inversely proportional to the distance from the collision: the closer the obstacle, the larger the force. The forces for all collisions are summed, and the sum is the total steering force for a single time step.

Separate behaviors can be combined to create more complex ones (Listing 8). Obstacle Avoidance is only useful when it works in tandem with other behaviors. In this example (Video 3), Obstacle Avoidance and Arrive are combined; the implementation does so simply by summing their forces. More complex schemes are possible that incorporate heuristics to determine priority weighting on the forces; a weighted variant is sketched below.

private void FixedUpdate () {

	// Calculate the total steering force by summing the active steering behaviors
	Vector3 force = Arrive () + ObstacleAvoidance();
	...

Listing 8. Arrive and Obstacle Avoidance are combined by summing their forces.

Video 3. The agent combines two behaviors, Arrive and Obstacle Avoidance.
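
As noted above, simple summation weights every behavior equally. A minimal sketch of a weighted scheme is shown here; the weight values, the public weight fields, the clamp to maxForce, and the AddForce call are illustrative assumptions layered on top of the fields (rb, maxForce) used in the earlier listings, not part of the original sample.

// Hypothetical behavior weights; the values are illustrative assumptions.
public float arriveWeight = 1f;
public float avoidanceWeight = 2f;    // give collision avoidance priority

private void FixedUpdate () {

	// Weight each behavior before summing
	Vector3 force = (Arrive () * this.arriveWeight)
	              + (ObstacleAvoidance () * this.avoidanceWeight);

	// Truncate the combined force so it never exceeds the agent's maximum force
	force = Vector3.ClampMagnitude (force, this.maxForce);

	this.rb.AddForce (force);
}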

Action Selection

Action selection is the high-level goal setting and decision making of the agent. Our agent implementation already incorporates a simple action selection model by combining the Arrive and Obstacle Avoidance behaviors: the agent attempts to arrive at the target, but it adjusts its trajectory when obstacles are detected. The “Avoidance Radius” and “Forward Detection” parameters of Obstacle Avoidance define when that action is taken.

4. Integrating the R200

Now that the agent is capable of navigating on its own, it is ready to be incorporated into an augmented reality application.

The following example is built on top of the “Scene Perception” example that comes with the Intel RealSense SDK. The application will build a mesh using Scene Perception, and the user will be able to set and move the target on the mesh. The agent will then navigate around the generated mesh to reach the target.

Scene Manager

A scene manager script initializes the scene and handles the user input. Touch up (or mouse click release, if the device does not support touch) is the only input. A raycast from the point of the touch determines if the touch is on the generated mesh. The first touch spawns the target on the mesh; the second touch spawns the agent; and every subsequent touch moves the position of the target. A state machine handles the control logic (Listing 9).

// State machine that controls the scene:
//         Start => SceneInitialized -> TargetInitialized -> AgentInitialized
private enum SceneState {SceneInitialized, TargetInitialized, AgentInitialized};
private SceneState state = SceneState.SceneInitialized;    // Initial scene state.

private void Update () {

	// Trigger when the user "clicks" with either the mouse or a touch up gesture.
	if(Input.GetMouseButtonUp (0)) {
		TouchHandler ();
	}
}

private void TouchHandler () {

	RaycastHit hit;

	// Raycast from the point touched on the screen
	if (Physics.Raycast (Camera.main.ScreenPointToRay (Input.mousePosition), out hit)) {

	 // Only register if the touch was on the generated mesh
		if (hit.collider.gameObject.name == "meshPrefab(Clone)") {

			switch (this.state) {
			case SceneState.SceneInitialized:
				SpawnTarget (hit);
				this.state = SceneState.TargetInitialized;
				break;
			case SceneState.TargetInitialized:
				SpawnAgent (hit);
				this.state = SceneState.AgentInitialized;
				break;
			case SceneState.AgentInitialized:
				MoveTarget (hit);
				break;
			default:
				Debug.LogError("Invalid scene state.");
				break;
			}
		}
	}
}

Listing 9. The touch handler and state machine for the example application.

The Scene Perception feature generates many small meshes, typically with fewer than 30 vertices. The positioning of the vertices is susceptible to variance, which results in some meshes being angled differently than the surfaces they reside on. If an object (e.g., a target or an agent) is placed on top of such a mesh, it will be oriented incorrectly. To circumvent this issue, the average normal of the mesh is used instead (Listing 10).

private Vector3 AverageMeshNormal(Mesh mesh) {

	// Cache the normals array; Unity allocates a new copy on every property access
	Vector3[] normals = mesh.normals;
	Vector3 sum = Vector3.zero;

	// Sum all the normals in the mesh
	for (int i = 0; i < normals.Length; i++){
		sum += normals[i];
	}

	// Return the average
	return sum / normals.Length;
}

Listing 10. Calculate the average normal of a mesh.
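
The spawn helpers referenced in Listing 9 are not shown in the article. A hedged sketch of how SpawnTarget and MoveTarget might use the raycast hit together with the average mesh normal is given below; the targetPrefab field and the cast to GameObject are assumptions made for illustration, and SpawnAgent could follow the same pattern with an agent prefab.

// Hypothetical spawn helpers; targetPrefab is an illustrative assumption.
public GameObject targetPrefab;
private GameObject target;

private void SpawnTarget (RaycastHit hit) {

	// Orient the target along the average normal of the generated mesh,
	// rather than the noisier normal of the individual triangle that was hit
	Mesh mesh = hit.collider.gameObject.GetComponent<MeshFilter> ().mesh;
	Quaternion rotation = Quaternion.FromToRotation (Vector3.up, AverageMeshNormal (mesh));

	this.target = (GameObject)Instantiate (this.targetPrefab, hit.point, rotation);
}

private void MoveTarget (RaycastHit hit) {

	// Reuse the existing target; simply move it to the new touch point
	this.target.transform.position = hit.point;
}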

Building the Application

All code developed for this example is available on GitHub.

The following instructions integrate the scene manager and agent implementation into an Intel® RealSense™ application.

  1. Open the “RF_ScenePerception” example in the Intel RealSense SDK folder “RSSDK\framework\Unity”.
  2. Download and import the AutoNavAR Unity package.
  3. Open the “RealSenseExampleScene” in the “Assets/AutoNavAR/Scenes/” folder.
  4. Build and run on any device compatible with an Intel RealSense camera R200.

Video 4. The completed integration with the Intel® RealSense™ camera R200.

5. Going Further with Autonomous Navigation

We developed an example that demonstrates an autonomous agent in an augmented reality application using the R200. There are several ways in which this work could be extended to improve the intelligence and realism of the agent.

The agent uses a simplified mechanical model with uniform mass and no directional movement restrictions. A more advanced locomotion model could be developed that distributes mass non-uniformly and constrains the forces applied to the body (e.g., a car with different acceleration and braking forces, or a spaceship with main and side thrusters); a minimal sketch of this idea follows. More accurate mechanical models result in more realistic movement.
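
As one possible illustration (the method name and the two force limits are assumptions, not part of the sample), a steering force could be clamped differently depending on whether it points along or against the agent's facing direction:

// Hypothetical constrained locomotion: braking is weaker than acceleration.
// Both limits are illustrative values.
public float maxAccelerationForce = 10f;
public float maxBrakingForce = 4f;

private Vector3 ConstrainForce (Vector3 steeringForce) {

	// Component of the steering force along the agent's facing direction
	float forward = Vector3.Dot (steeringForce, this.transform.forward);

	// Apply the tighter limit when the force opposes the direction of travel
	float limit = (forward >= 0f) ? this.maxAccelerationForce : this.maxBrakingForce;

	return Vector3.ClampMagnitude (steeringForce, limit);
}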

Craig Reynolds was the first to extensively discuss steering behaviors in the context of animation and games. The Seek, Arrive, and Obstacle Avoidance behaviors demonstrated in the example find their origins in his work. Reynolds described other behaviors as well, including Flee, Pursuit, Wander, Explore, and Path Following, along with group behaviors such as Separation, Cohesion, and Alignment. “Programming Game AI by Example” by Mat Buckland is another useful resource that discusses the implementation of these behaviors as well as a number of related concepts, including state machines and pathfinding.

In the example, both the Arrive and Obstacle Avoidance steering behaviors are applied to the agent simultaneously. Any number of behaviors can be combined in this way to create more complex behaviors. For instance, a flocking behavior is built from the combination of Separation, Cohesion, and Alignment. Combining behaviors can sometimes produce unintuitive results. It is worth experimenting with types of behaviors and their parameters to discover new possibilities.

Additionally, some pathfinding techniques are intended for use in dynamic environments. The D* algorithm is similar to A*, but it can update the path based on new observations (e.g., added/removed obstacles). D* Lite operates in the same fashion as D* and is simpler to implement. Pathfinding can also be used in conjunction with steering behaviors by setting the waypoints and allowing steering to navigate to those points.

While action selection has not been discussed in this work, it is widely studied in game theory. Game theory investigates the mathematics behind strategy and decision making. It has applications in many fields, including economics, political science, and psychology. With respect to autonomous agents, game theory can inform how and when decisions are made. “Game Theory 101: The Complete Textbook” by William Spaniel is a great starting point and has a companion YouTube series.

6. Conclusion

An arsenal of tools exists that you can use to customize the movement, behavior, and actions of agents. Autonomous navigation is particularly well suited for dynamic environments, such as those generated by Intel RealSense cameras in augmented reality applications. Even simple locomotion models and steering behaviors can produce complex behavior without prior knowledge of the environment. The multitude of available models and algorithms provides the flexibility to implement an autonomous solution for nearly any application.

About the Authors

Ryan Measel is a Co-Founder and CTO of Fantasmo Studios. Ashwin Sinha is a founding team member and developer. Founded in 2014, Fantasmo Studios is a technology-enabled entertainment company focused on content and services for mixed reality applications.

Intel® Memory Protection Extensions Enabling Guide

Abstract: This document describes Intel® Memory Protection Extensions (Intel® MPX), its motivation, and its programming model. It also describes the enabling requirements and the current status of enabling in the supported operating systems (Linux* and Windows*) and compilers (Intel® C++ Compiler, GCC, and Visual C++*). Finally, the paper describes how ISVs can incrementally enable bounds checking in their applications using Intel MPX.

Introduction

C/C++ pointer arithmetic is a convenient language construct often used to step through an array of data structures. If an iterative write operation does not take into consideration the bounds of the destination, adjacent memory locations may get corrupted. Such unintended modification of adjacent data is referred to as a buffer overflow. Similarly, uncontrolled reads could reveal cryptographic keys and passwords. Buffer overflows have been exploited to cause denial-of-service (DoS) attacks and system crashes. More sinister attacks, which do not immediately draw the attention of the user or system administrator, alter the code execution path (for example, by modifying the return address in the stack frame) to execute malicious code or scripts.

Intel’s Execute Disable Bit and similar hardware features from other vendors have blocked buffer overflow attacks that redirect execution to malicious code stored as data. Various other techniques adopted by compiler vendors to mitigate buffer overflow problems can be found in the references.

Intel® MPX technology consists of new Intel® architecture instructions and registers that C/C++ compilers can use to check the bounds of a pointer before it is used. This new hardware technology will be enabled in future Intel® processors. The supported compilers are the Intel® C/C++ compiler, GCC (the GNU C/C++ compiler), and Microsoft* Visual C++*.

Sample Application for Direct3D 12 Flip Model Swap Chains

Download Code Sample

Using swap chains in D3D12 carries additional complexity compared to D3D11. Only flip model [1] swap chains may be used with D3D12. There are many parameters that must be selected, such as the number of buffers, the number of in-flight frames, the present SyncInterval, and whether or not the waitable object is used. We developed this application internally to help understand the interaction between the different parameters and to aid in the discovery of the most useful parameter combinations.

The application consists of an interactive visualization of the rendered frames as they progress from CPU to GPU to display and through the present queue. All of the parameters can be modified in real time. The effects on framerate and latency can be observed via the on-screen statistics display.

Figure 1: An annotated screenshot of the sample application.

Swap Chain Parameters

These are the parameters used to investigate D3D12 swap chains.

Fullscreen: True if the window covers the screen (i.e., borderless windowed mode). NOTE: This is different from SetFullscreenState, which is for exclusive mode.
Vsync: Controls the SyncInterval parameter of the Present() function.
Use Waitable Object: Whether or not the swap chain is created with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT.
Maximum Frame Latency: The value passed to SetMaximumFrameLatency. Ignored if “Use Waitable Object” is not enabled. Without the waitable object, the effective Maximum Frame Latency is 3.
BufferCount: The value specified in DXGI_SWAP_CHAIN_DESC1::BufferCount.
FrameCount: The maximum number of “game frames” that will be generated on the CPU before waiting for the earliest one to complete. A game frame is a user data structure, and its completion on the GPU is tracked with D3D12 fences. Multiple game frames can point to the same swap chain buffer.

Additional Parameters

These parameters were also part of the swap chain investigation, but their values were held fixed. For each parameter, we note why its value was fixed rather than variable.

Exclusive mode: SetFullscreenState is never called in the sample because the present statistics mechanism does not work in exclusive mode.
SwapEffect: The value specified in DXGI_SWAP_CHAIN_DESC1::SwapEffect. Always set to DXGI_SWAP_EFFECT_FLIP_DISCARD. DISCARD is the least specified behavior, which affords the OS the most flexibility to optimize presentation. The only other choice, DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL, is only useful for operations that reuse image regions from previous presents (e.g., scroll rectangles).

Understanding BufferCount and FrameCount

BufferCount is the number of buffers in the swap chain. With flip model swap chains, the operating system may lock one buffer for an entire vsync interval while it is displayed, so the number of buffers available to the application to write is actually BufferCount-1. If BufferCount = 2, then there is only one buffer to write to until the OS releases the second one at the next vsync. A consequence of this is that the frame rate cannot exceed the refresh rate.

When BufferCount >= 3, there are at least 2 buffers available to the application which it can cycle between (assuming SyncInterval=0), which allows the frame rate to be unlimited.

FrameCount is the maximum number of in-flight “render frames,” where a render frame is the set of resources and buffers that the GPU needs to perform the rendering. If FrameCount = 1, then the CPU will not build the next render frame until the previous one is completely processed. This means that FrameCount must be at least 2 for the CPU and GPU to be able to work in parallel.

Maximum Frame Latency, and how “Waitable Object” Reduces Latency

Latency is the time between when a frame is generated, and when it appears on screen. Therefore, to minimize latency in a display system with fixed intervals (vsyncs), frame generation must be delayed as long as possible.

The maximum number of queued present operations is called the Maximum Frame Latency. When an application tries to queue an additional present after reaching this limit, Present() will block until one of the previous frames has been displayed.

Any time that the render thread spends blocked on the Present function occurs between frame generation and frame display, so it directly increases the latency of the frame being presented. This is the latency which is eliminated by the use of the “waitable object.”

Conceptually, the waitable object can be thought of as a semaphore that is initialized to the Maximum Frame Latency and signaled whenever a present is removed from the present queue. If an application waits for the semaphore to be signaled before rendering, then the present queue is not full (so Present will not block) and that blocking latency is eliminated.
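
To make the analogy concrete, here is a minimal conceptual model in C# (this is not DXGI code; a SemaphoreSlim simply stands in for the waitable object, and the class and method names are illustrative):

using System.Threading;

// Conceptual model only: the semaphore plays the role of the DXGI waitable object.
class FramePacer {
	private readonly SemaphoreSlim waitableObject;

	public FramePacer (int maximumFrameLatency) {
		// Initialized to the Maximum Frame Latency
		waitableObject = new SemaphoreSlim (maximumFrameLatency, maximumFrameLatency);
	}

	// Render thread: block *before* generating a frame, so any waiting happens
	// ahead of frame generation instead of inside Present(), reducing latency.
	public void BeginFrame () {
		waitableObject.Wait ();
	}

	// Signaled whenever a queued present is retired (removed from the present queue).
	public void OnPresentRetired () {
		waitableObject.Release ();
	}
}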

Top parameter presets

Our investigation yielded three different “best” configurations, depending on your requirements. These are the parameter combinations we think are best suited for games.



Game Mode

  • Vsync On
  • 3 Buffers, 2 Frames
  • Waitable Object, Maximum Frame Latency 2

Game mode is a balanced tradeoff between latency and throughput.



Classic Game mode

  • Vsync On
  • 3 Buffers, 3 Frames
  • Not using waitable object

This implicitly happens under D3D11 with triple buffering, hence “classic.” Classic game mode prioritizes throughput. The extra frame queueing can absorb spikes better but at the expense of latency.



Minimum Latency

  • Vsync On
  • 2 Buffers, 1 Frame
  • Waitable Object, Maximum Frame Latency 1

The absolute minimum amount of latency without using VR-style vsync racing tricks. If the application misses vsync, the frame rate immediately drops to half the refresh rate. The CPU and GPU operate serially rather than in parallel.


App version

The source code includes a project file for building the sample as a Windows 10 Universal App. The only difference in the Direct3D code is calling CreateSwapChainForCoreWindow instead of CreateSwapChainForHWND.

If you wish to try the app version without compiling it yourself, here is a link to the Windows Store page: https://www.microsoft.com/store/apps/9NBLGGH6F7TT

References

1 - “DXGI Flip Model.” https://msdn.microsoft.com/en-us/library/windows/desktop/hh706346%28v=vs.85%29.aspx

2 - “Reduce Latency with DXGI 1.3 Swap Chains.” https://msdn.microsoft.com/en-us/library/windows/apps/dn448914.aspx

3 - “DirectX 12: Presentation Modes in Windows 10.” https://www.youtube.com/watch?v=E3wTajGZOsA

4 - “DirectX 12: Unthrottled Framerate.” https://www.youtube.com/watch?v=wn02zCXa9IU

Using the Intel® RealSense™ Camera with TouchDesigner*: Part 2

Download Demo Files ZIP 14KB

The Intel® RealSense™ camera is a vital tool for creating VR and AR projects. Part 2 of this article lays out how to use the Intel RealSense camera nodes in TouchDesigner to set up a render or real-time projections for multiple screens, a single screen, 180-degree (FullDome) renders, and 360-degree VR renders. In addition, the Intel RealSense camera information can be sent out to an Oculus Rift* through the TouchDesigner Oculus Rift TOP node.

Part 2 will focus on the RealSense CHOP node in TouchDesigner.

The RealSense CHOP node in TouchDesigner is where the powerful tracking features of the RealSense F200 and R200 cameras, such as eye tracking, finger tracking, and face tracking, can be accessed. These tracking features are especially exciting for setting up real-time animations and/or tracking those animations to the body and gesture movements of performers. I find this particularly useful for live performances of dancers or musicians, where I want a high level of interactivity between live video, animations, graphics, sound, and the performers.

To get the TouchDesigner (.toe) files that go with this article, click the button at the top of the page. A free non-commercial copy of TouchDesigner is available too and is fully functional except that the highest resolution is limited to 1280 x 1280.  

Once again it is worth noting that the support of the Intel RealSense camera in TouchDesigner makes it an even more versatile and powerful tool.

Note: Like Part 1 of this article, Part 2 is aimed at those familiar with using TouchDesigner and its interface. If you are unfamiliar with TouchDesigner and plan to follow along with this article step-by-step, I recommend that you first review some of the documentation and videos available here: Learning TouchDesigner.

Note: When using the Intel RealSense camera, it is important to pay attention to its range for best results. On this Intel web page you will find the range of the camera and best operating practices for using it.

A Bit of Historical Background

All of the data the Intel RealSense cameras can provide is extremely useful for creating VR and AR. Some early attempts at what the Intel RealSense camera can now do took place in the 1980s. Hand-position tracking technology arrived in the 1980s in the form of the data glove developed by Jaron Lanier and Thomas G. Zimmerman, and in 1987 Nintendo released the first wired glove available to consumers for gaming on the Nintendo Entertainment System.

Historically, the Intel RealSense camera also has roots in performance animation, which uses motion capture technologies to translate a live motion event into usable math, thus turning a live performance into digital information that can drive a performance. Motion capture was used as early as the 1970s in research projects at various universities and in the military for training. One of the first animations to use motion capture data to create an animated performance was “Sexy Robot” (https://www.youtube.com/watch?v=eedXpclrKCc) in 1985 by Robert Abel and Associates. Sexy Robot used several techniques to acquire the information needed to create and animate the digital robot model. First, a practical model of the robot was made; it was measured in all dimensions, and the numbers describing it were input by hand, something the RealSense camera can now obtain by scanning the object. Then, for the motion, dots were painted on a real person and used to make skeleton drawings on the computer, creating a vector animation that was then used to animate the digital model. The RealSense camera is a big improvement on this: its infrared camera and infrared laser projector provide the data from which digital models can be made, as well as the data for tracking motion. The tracking capabilities of the Intel RealSense camera are very refined, making even eye tracking possible.

About the Intel RealSense Cameras

There are currently two types of Intel RealSense cameras that perform many of the same functions with slight variations: The Intel RealSense camera F200, for which the exercises in this article are designed, and the Intel RealSense camera R200.

The Intel RealSense camera R200, with its tiny size, has many advantages: it is designed to mount on a tripod or be placed on the back of a tablet, so the camera is focused not on the user but on the world, and with its increased scanning capabilities it can scan over a larger area. It also has advanced depth-measuring capabilities. The camera will be exciting for augmented reality (AR) because it has a feature called Scene Perception, which enables you to add virtual objects into a captured world scene; virtual information can also be laid over a live image feed. Unlike the F200 model, the R200 does not have finger and hand tracking and does not support face tracking. TouchDesigner supports both the F200 and the R200 Intel RealSense cameras.

About the Intel RealSense Cameras In TouchDesigner

TouchDesigner is a perfect match for the Intel RealSense camera, allowing the gestures of the user’s face and hands to drive the software directly. TouchDesigner can directly use this position/tracking data, and it can also use the depth, color, and infrared data that the Intel RealSense camera supplies. The Intel RealSense cameras are very small and light, especially the R200 model, which can easily be placed near performers without being noticed by audience members.

Adam Berg, a research engineer for Leviathan who is working on a project using the Intel RealSense camera in conjunction with TouchDesigner to create interactive installations says: “The small size and uncomplicated design of the camera is well-suited to interactive installations. The lack of an external power supply simplifies the infrastructure requirements, and the small camera is discreet. We've been pleased with the fairly low latency of the depth image as well. TouchDesigner is a great platform to work with, from first prototype to final development. Its built-in support for live cameras, high-performance media playback, and easy shader development made it especially well-suited for this project. And of course the support is fantastic.”

Using the Intel® RealSense™ Camera in TouchDesigner

In Part 2 we focus on the CHOP node in TouchDesigner for the Intel RealSense camera.

RealSense CHOP Node

The RealSense CHOP node controls the 3D tracking/position data. The CHOP node carries two types of information: (1) the real-world position, expressed in meters but potentially accurate down to the millimeter, which is used for the x, y, and z translations; the x, y, and z rotations in the RealSense CHOP are output as x, y, and z Euler angles in degrees. (2) The RealSense CHOP also takes pixels from image inputs and converts them to normalized UV coordinates, which is useful for image tracking.

The RealSense CHOP node has two setup settings: finger/face tracking and marker tracking.

  • Finger/Face Tracking gives you a list of selections to track. You can narrow the list of what is trackable to one aspect; then, by connecting a Select CHOP node to the RealSense CHOP node, you can narrow the selection even further so that you track only the movement of, say, an eyebrow or an eye.
  • Marker tracking enables you to load an image and track that item wherever it is.

Using the RealSense CHOP node in TouchDesigner

Demo #1 Using Tracking

This is a simple first demo of the RealSense CHOP node to show you how it can be wired/connected to other nodes and used to track and create movement. Once again, please note these demos require a very basic knowledge of TouchDesigner.  If you are unfamiliar with TouchDesigner and plan to follow along with this article step-by-step, I recommend that you first review some of the documentation and videos available here:  Learning TouchDesigner

  1. Create the nodes you will need and arrange them in a horizontal row in this order: Geo COMP node, the RealSense CHOP node, the Select CHOP node, the Math CHOP node, the Lag CHOP node, the Out CHOP node, and the Trail CHOP node.
  2. Wire the RealSense CHOP node to the Select CHOP node, the Select CHOP node to the Math CHOP node, the Math CHOP node to the Lag CHOP node, the Lag CHOP node to the Out CHOP Node, and the Out CHOP node to the Trail CHOP node.
  3. Open the Setup parameters page of the RealSense CHOP node, and make sure the Hands World Position parameter is On. This outputs positions of the tracked hand joints in world space. Values are given in meters relative to the camera.
  4. In the Select parameters page of the Select CHOP Node, set the Channel Names parameter to hand_r/wrist:tx by selecting it from the tracking selections available using the drop-down arrow on the right of the parameter.
  5. In the Rename From parameter, enter: hand_r/wrist:tx, and then in the Rename To parameter, enter: x.
    Figure 1. The Select CHOP node is where the channels are chosen from the RealSense CHOP node.
  6. In the Range/To Range parameter of the Math CHOP node, enter: 0, 100. For a smaller range of movement, enter a number less than 100.
  7. Select the Geometry COMP and make sure it is on its Xform parameters page. Press the + button on the bottom right of the Out CHOP node to activate its viewer. Drag the X channel onto the Translate X parameter of the Geometry COMP node and select Export CHOP from the drop-down menu that will appear.
    Figure 2. This is where you are adding animation as gotten from the RealSense CHOP.

    To render geometry, you need a Camera COMP node, a Material (MAT) node (I used the Wireframe MAT), a Light COMP node, and a Render TOP node. Add these to render this project.

  8. In the Camera COMP, on the Xform parameter page set the Translate Z to 10. This gives you a better view of the movement in the geometry you have created as the camera is further back on the z-axis.
  9. Wave your right wrist back and forth in front of the camera and watch the geometry move in the Render TOP node.
Figure 3. How the nodes are wired together. The Trail CHOP at the end gives you a way of seeing the animation in graph form.

Figure 4. The setting for the x translate of the Geometry COMP was exported from the x channel in the Out CHOP, which has been carried forward down the chain from the Select CHOP node.

Demo #2: RealSense CHOP Marker Tracking

In this demo, we use the marker tracking feature in the RealSense CHOP to show how to use an image for tracking. You will create an image and have two copies of it, a printed copy and a digital copy, that exactly match. You can either start with a digital file and print a hard copy, or scan a printed image to create the digital version.

  1. Add a RealSense CHOP node to your scene.
  2. On the Setup parameters page for the RealSense CHOP node, for Mode select Marker Tracking.
  3. Create a Movie File In TOP.
  4. In the Play parameters page of the TOP node, under File, choose and load the digital image that you also have a printed version of.
  5. Drag the Movie File In TOP to the RealSense CHOP node Setup parameters page and into the Marker Image TOP slot at the bottom of the page.
  6. Create a Geometry COMP, a Camera COMP, a Light COMP and a Render TOP.
  7. Like we did in step 7 of Demo #1, export the tx channel from the RealSense CHOP and drag it to the Translate X parameter of the Geometry COMP.
  8. Create a Reorder TOP and connect it to the Render TOP. In the Reorder parameters page, set the Output Alpha drop-down to One.
  9. Position your printed image of the digital file in front of the Intel RealSense Camera and move it. The camera should track the movement and reflect it in the Render TOP. The numbers in the RealSense CHOP will also change.
    Figure 5. This is the complete layout for the simple marker tracking demo.
     
    Figure 6. On the parameters page of the Geo COMP the tx channel from the RealSense CHOP has been dragged into the Translate x parameter.

Eye Tracking in TouchDesigner Using the RealSense CHOP Node

In the TouchDesigner Program Palette, under RealSense, there is a template called eyeTracking that can be used to track a person’s eye movements. This template uses the RealSense CHOP node finger/face tracking and the RealSense TOP node set to Color. In the template, green wireframe rectangles track the person’s eyes and are then composited over the RealSense TOP color image of the person. Any other geometry, particles, or similar elements could be used instead of the green open rectangles. It is a great template to start from. Here is an image using the template.

Figure 7. Note that the eyes were tracked even through the glasses.

Demo #3, Part 1: Simple ways to set up a FullDome render or a VR render

In this demo, we take a file and show how to render it as a 180-degree FullDome render and as a 360-degree VR render. I have already made the file, chopRealSense_FullDome_VR_render.toe, which you can download to see how it is done in detail.

A brief description of how this file was created:

In this file I wanted to place geometries (sphere, torus, tubes, and rectangles) in the scene. So I made a number of SOP nodes of these different geometrical shapes. Each SOP node was attached to a Transform SOP node to move (translate) the geometries to different places in the scene. All the SOP nodes were wired to one Merge SOP node. The Merge SOP node was fed into the Geometry COMP.

Figure 8. This is the first step in the layout for creating the geometries placed around the scene in the downloadable file.

Next I created a Grid SOP node and a SOP To DAT node. The SOP To DAT node was used to instance the Geometry COMP so that I had more geometries in the scene. I also created a Constant MAT node, made the color green, and turned on the WireFrame parameter on the Common page.

Figure 9. The SOP To DAT Node was created using the Grid SOP Node.

Next I created a RealSense CHOP node and wired it to the Select CHOP node where I selected the hand_r/wrist:tx channel to track and renamed it to x. I wired the Select CHOP to the Math CHOP so I could change the range and wired the Math CHOP to the Null CHOP. It is always good practice to end a chain with a Null or Out node so you can more easily insert new filters inside the chain. Next I exported the x Channel from the Null CHOP into the Scale X parameter of the Geometry COMP. This controls all of the x scaling of the geometry in my scene when I moved my right wrist in front of the Intel RealSense Camera.

Figure 10. The tracking from the RealSense CHOP node is used to create real-time animation, scaling of the geometries along the x-axis.

To create a FullDome 180-degree render from the file:

  1. Create a Render TOP, a Camera COMP, and a Light COMP.
  2. In the Render TOP’s Render parameters page, select Cube Map in the Render Mode drop-down menu.
  3. In the Render TOP Common parameters page, set the Resolution to a 1:1 aspect ratio such as 4096 by 4096 for a 4k render.
  4. Create a Projection TOP node and connect the Render TOP node to it.
  5. In the Projection TOP Projection parameters page, select Fish-Eye from the Output drop-down menu.
  6. (This is optional to give your file a black background.) Create a Reorder TOP and in the Reorder parameters page in the right drop-down menu for Output Alpha, select One.
  7. You are now ready to either perform the animation live or export a movie file. Refer to Part 1 of this article for instructions. You are creating a circular fish-eye dome master animation. It will be a circle within a square.

For an alternative method, go back to Step 2 and instead of selecting Cube Map in the Render Mode drop-down menu, select Fish-Eye(180). Continue with Step 3 and optionally Step 6, and you are now ready to perform live or export a dome Master animation.

To create a 360-degree VR render from this file:

  1. Create a Render TOP, a Camera COMP, and a Light COMP.
  2. In the Render TOP’s Render parameters page, select Cube Map in the Render Mode drop-down menu.
  3. In the Render TOP Common parameters page, set the Resolution to a 1:1 aspect ratio such as 4096 by 4096 for a 4k render. 
  4. Create a Projection TOP node, and connect the Render TOP node to it.
  5. In the Projection TOP Projection Parameters page, select Equirectangular from the Output drop-down menu. It will automatically make the aspect ratio 2:1.
  6. (This is optional to give your file a black background.) Create a Reorder TOP, and in the Reorder parameters page in the right drop-down menu for Output Alpha, select One.
  7. You are now ready to either perform the animation live or export out a movie file. Refer to Part 1 of this article for instructions. If you export a movie render, you are creating a 2:1 aspect ratio equirectangular animation for viewing in VR headsets.
Figure 11. Long, orange Tube SOPs have been added to the file. You can add your own geometries to the file.

To Output to an Oculus Rift* from TouchDesigner While Using the Intel RealSense Camera

TouchDesigner has created several templates showing how to set up the Oculus Rift in TouchDesigner, one of which, OculusRiftSimple.toe, you can download using the button on the top right of this article. You do need to have your computer connected to an Oculus Rift to view the result in the headset. Without an Oculus Rift, you can still create the file, see the images in the LeftEye Render TOP and the RightEye Render TOP, and display them in the background of your scene. I added the Oculus Rift capabilities to the file I used in Demo 3; in this way the Intel RealSense camera animates what I am seeing in the Oculus Rift.

Figure 12. Here displayed in the background of the file are the left eye and the right eye. Most of the animation in the scene is being controlled by the tracking from the Intel® RealSense™ camera CHOP node. The file that created this image can be downloaded from the button on the top right of this article, chopRealSense_FullDome_VRRender_FinalArticle2_OculusRiftSetUp.toe

About the Author

Audri Phillips is a visualist/3D animator based out of Los Angeles, with a wide range of experience that includes over 25 years working in the visual effects/entertainment industry at studios such as Sony*, Rhythm and Hues*, Digital Domain*, Disney*, and DreamWorks* feature animation. Starting out as a painter, she was quickly drawn to time-based art. Always interested in using new tools, she has been a pioneer in using computer animation and art in experimental film work, including immersive performances. Now she has taken her talents into the creation of VR. Samsung* recently curated her work into their new Gear Indie Milk VR channel.

Her latest immersive work/animations include: Multi Media Animations for "Implosion a Dance Festival" 2015 at the Los Angeles Theater Center, 3 Full dome Concerts in the Vortex Immersion dome, one with the well-known composer/musician Steve Roach. She has a fourth upcoming fulldome concert, "Relentless Universe", on November 7th, 2015. She also created animated content for the dome show for the TV series, "Constantine*" shown at the 2014 Comic-Con convention. Several of her Fulldome pieces, "Migrations" and "Relentless Beauty", have been juried into "Currents", The Santa Fe International New Media Festival, and Jena FullDome Festival in Germany. She exhibits in the Young Projects gallery in Los Angeles.

She writes online content and a blog for Intel®. Audri is an adjunct professor at Woodbury University, a founding member and leader of the Los Angeles Abstract Film Group, founder of the Hybrid Reality Studio (dedicated to creating VR content), a board member of the Iota Center, and an exhibiting member of the LA Art Lab. In 2011 Audri became a resident artist of Vortex Immersion Media and the c3: CreateLAB. Works of hers can be found on Vimeo, on creativeVJ, and on Vrideo.

Intel® RealSense™ Technology Casts Light on the Gesture-Controlled Shadow Play of Ombre Fabula*

By John Tyrrell

Fusing the traditions of Western and Southeast Asian shadow theater, Ombre Fabula is a prototype app that uses the Intel® RealSense™ SDK to create a gesture-controlled interactive shadow play. During the collaborative development process, the Germany-based team, composed of Thi Binh Minh Nguyen and members of Prefrontal Cortex, overcame a number of challenges. These included using custom blob-detection algorithms to ensure that the Intel® RealSense™ camera accurately sensed different hand shapes, and testing extensively with a broad user base.

Ombre Fabula
The title screen of Ombre Fabula showing the opening scene in grandma’s house (blurred in the background) before the journey to restore her eyesight begins.

Originally the bachelor’s degree project of designer Thi Binh Minh Nguyen—and brought to life with the help of visual design and development team Prefrontal Cortex—Ombre Fabula was intended to be experienced as a projected interactive installation running on a PC or laptop equipped with a user-facing camera. Minh explained that the desire was to “bring this art form to a new interactive level, blurring the boundaries between audience and player”.

Players are drawn into an enchanting, two-dimensional world populated by intricate “cut-out” shadow puppets. The user makes different animals appear on screen by forming familiar shapes—such as a rabbit—with their hands. The user then moves the hand shadow to guide the child protagonist as he collects fragments of colored light on his quest to restore his grandmother’s eyesight.

Ombre Fabula was entered in the 2014 Intel RealSense App Challenge, where it was awarded second place in the Ambassador track.

Decisions and Challenges

With previous experience working with more specialized human interface devices, the team was attracted to Intel RealSense technology by the breadth of possibilities it offers in terms of gesture, face-tracking and voice recognition, although they ultimately used only gesture for the Ombre Fabula user interface (UI).

Creating the UI

Rather than being adapted from another UI, the app was designed from the ground up with hand- and gesture-tracking in mind. Ombre Fabula was also designed primarily as a wall-projected interactive installation, in keeping with the traditions of shadow-theater and in order to deliver room-scale immersion.

Ombre Fabula Virtual Phoenix
Ombre Fabula was designed to be used as a room-scale interactive installation to maximize the player’s immersion in the experience. Here the user is making the bird form.

For the designers, the connection between the real world and the virtual one that users experience when their hands cast simulated shadows on the wall, is crucial to the immersive experience. This experience is further enhanced by the use of a candle, evoking traditional shadow plays.

The UI of Ombre Fabula is deliberately minimal—there are no buttons, cursors, on-screen displays, or any controls whatsoever beyond the three hand-forms that the app recognizes. The app was designed to be used in a controlled installation environment, where there is always someone on hand to guide the user. The designers often produce this type of large-scale interactive installation, and with Ombre Fabula they specifically wanted to create a short, immersive experience that users might encounter in a gallery or similar space. The intentionally short interaction time made direct instructions from a host in situ more practical than an in-game tutorial in order to quickly bring users into the game’s world. The shadow-only UI augments the immersion and brings the experience closer to the shadow theater that inspired it.

The UI of Ombre Fabula
Here, the user moves the hand to the right to lead the protagonist out of his grandma’s house.

Implementing Hand Tracking

In the initial stage of development, the team focused on using the palm- and finger-tracking capabilities of the Intel RealSense SDK to track the shape of the hands as they formed the simple gestures for rabbit, bird, and dragon. They chose these three shapes because they are the most common in the style of shadow theater that inspired the app, and because they are the simplest and most intuitive for users to produce. The rabbit is produced by pinching an O with the thumb and finger of a single hand and raising two additional fingers for the ears; the bird is formed by linking two hands at the thumbs with the outstretched fingers as the wings; and the dragon is made by positioning two hands together in the form of a snapping jaw.

Ombre Fabula Animal Shapes
The basic hand gestures used to produce the rabbit, bird, and dragon.

However, it was discovered that the shapes presented problems for the Intel RealSense SDK algorithms because of the inconsistent visibility and the overlapping of fingers on a single plane. Essentially, the basic gestures recognized by the Intel RealSense camera—such as five fingers, a peace sign, or a thumbs-up, for example—were not enough to detect the more complex animal shapes required.

As a result, the team moved away from the Intel RealSense SDK’s native hand-tracking and instead used the blob detection algorithm, which tracks the contours of the hands. This delivers labeled images—of the left hand, for example—and then the Intel RealSense SDK provides the contours of that image.

Ombre Fabula Recognize Gestures
Here, the hand gestures for bird, dragon and rabbit are shown with the labeled hand contours used to allow Ombre Fabula to recognize the different gestures.

At first, extracting the necessary contour data from the Intel RealSense SDK was a challenge. While the Unity* integration is excellent for the hand-tracking of palms and fingers, this wasn’t what was required for effective contour tracking. However, after spending time with the documentation and working with the Intel RealSense SDK, the team was able to pull the detailed contour data required to form the basis for the custom shape-detection.

Ombre Fabula Virtual Rabbit
The user forms a rabbit-like shape with the hand in order to make the rabbit shadow-puppet appear on-screen.

Ombre Fabula Collect the Dots
The user then moves the hand to advance the rabbit through the game environment and collect the dots of yellow light.

Using the Custom Blob Detection Algorithm

Once the blob data was pulled from the Intel RealSense SDK, it needed to be simplified in order for the blob detection to be effective for each of the three forms: rabbit, bird, and dragon. This process proved more complex than anticipated, requiring a great deal of testing and iteration on the different ways to simplify the shapes in order to maximize the probability that they would be consistently and accurately detected by the app.

// Code snippet from official Intel "HandContour.cs" script for blob contour extraction

int numOfBlobs = m_blob.QueryNumberOfBlobs();
PXCMImage[] blobImages = new PXCMImage[numOfBlobs];

for(int j = 0; j< numOfBlobs; j++)
{
	blobImages[j] = m_session.CreateImage(info);


	results = m_blob.QueryBlobData(j,  blobImages[j], out blobData[j]);
	if (results == pxcmStatus.PXCM_STATUS_NO_ERROR && blobData[j].pixelCount > 5000)
	{
		results = blobImages[j].AcquireAccess(PXCMImage.Access.ACCESS_WRITE, out new_bdata);
		blobImages[j].ReleaseAccess(new_bdata);
		BlobCenter = blobData[j].centerPoint;
		float contourSmooth = ContourSmoothing;
		m_contour.SetSmoothing(contourSmooth);
		results = m_contour.ProcessImage(blobImages[j]);
		if (results == pxcmStatus.PXCM_STATUS_NO_ERROR && m_contour.QueryNumberOfContours() > 0)
		{
			m_contour.QueryContourData(0, out pointOuter[j]);
			m_contour.QueryContourData(1, out pointInner[j]);
		}
	}
}

The contour extraction code used to pull the contour data from the Intel RealSense SDK.
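
The simplification step itself is not shown in the article. One plausible approach, sketched here purely as an assumption, is to subsample the raw contour down to a fixed number of evenly spaced points before handing it to the recognizer; the method name, the Vector3 point type, and the target count of 64 are all illustrative.

// Hypothetical contour simplification: keep a fixed number of evenly
// spaced points from the raw contour before classification.
// Assumes using System.Collections.Generic; and using UnityEngine;
static List<Vector3> SimplifyContour (List<Vector3> contour, int targetCount = 64) {

	List<Vector3> simplified = new List<Vector3> (targetCount);
	if (contour.Count == 0) return simplified;

	// Step through the original contour at a uniform index interval
	float step = (float)contour.Count / targetCount;
	for (int i = 0; i < targetCount; i++) {
		int index = Mathf.Min (Mathf.FloorToInt (i * step), contour.Count - 1);
		simplified.Add (contour[index]);
	}
	return simplified;
}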

The simplified data was then run through a freely available software algorithm called $P Point-Cloud Recognizer*, which is commonly used for character recognition of pen strokes and similar tasks. After making minor modifications and verifying that it worked well in Unity, the developers used the algorithm to detect the shape of the hand in Ombre Fabula. The algorithm decides, with a probability of around 90 percent, which animal the user’s hand form represents, and the detected shape is then displayed on screen.

// every few frames, we test if and which animal is currently found
void DetectAnimalContour () {
	// is there actually a contour in the image right now?
	if (handContour.points.Count > 0) {
		// ok, find the most probable animal gesture class
		string gesture = DetectGesture();
		// are we confident enough that this is one of the predefined animals?
		if (PointCloudRecognizer.Distance < 0.5) {
			// yes, we are: activate the correct animal
			ActivateAnimalByGesture(gesture);
		}
	}
}

// detect gesture on our contour
string DetectGesture() {
	// collect the contour points from the PCSDK
	Point[] contourPoints = handContour.points.Select (x => new Point (x.x, x.y, x.z, 0)).ToArray ();

	// create a new gesture to be detected, we don't know what it is yet
	var gesture = new Gesture(contourPoints, "yet unknown");

	// the classifier returns the gesture class name with the highest probability
	return PointCloudRecognizer.Classify(gesture, trainingSet.ToArray());
}

// This is from the $P algorithm
// match a gesture against a predefined training set
public static string Classify(Gesture candidate, Gesture[] trainingSet) {
	float minDistance = float.MaxValue;
	string gestureClass = "";
	foreach (Gesture template in trainingSet) {
		float dist = GreedyCloudMatch(candidate.Points, template.Points);
		if (dist < minDistance) {
			minDistance = dist;
			gestureClass = template.Name;
			Distance = dist;
		}
	}
	return gestureClass;
}

This code uses the $P algorithm to detect which specific animal form is represented by the user’s gesture.

Getting Player Feedback Early Through Testing and Observation

Early in the development process, the team realized that no two people would form the shapes of the different animals in exactly the same way—not to mention that every individual’s hands are different in size and shape. This meant that a large pool of testers was needed to fine-tune the contour detection.

Conveniently, Minh’s situation at the university gave her access to just such a pool, and approximately 50 fellow students were invited to test the app. For the small number of gestures involved, this number of testers was found to be sufficient to optimize the app’s contour-detection algorithm and to maximize the probability that it would display the correct animal in response to a given hand-form.

Ombre Fabula Pink Lights
Moving the hands left or right moves the camera and causes the protagonist to follow the animal character through the shadow world.

In addition to creating something of a magical moment of connection for the user, the simulated shadow of the user’s hands on screen delivered useful visual feedback. During testing, the developers observed that if the animal displayed was other than the one the user intended, the user would respond to the visual feedback of the shadow on screen to adjust the form of their hands until the right animal appeared. This was entirely intuitive for the users, requiring no prompting from the team.

A common problem with gesture detection is that users might make a rapid gesture—a thumbs up, for example—which can give rise to issues of responsiveness. In Ombre Fabula, however, the gesture is continuous in order to maintain the presence of the animal on the screen. Testing showed that this sustained hand-form made the app feel responsive and immediate to users, with no optimization required in terms of the hand-tracking response time.

Ombre Fabula has been optimized for a short play time of 6–10 minutes. Combined with a format in which users naturally expect to keep their hands raised for a certain period of time, this short duration meant that testers didn’t mention any hand or arm fatigue.

Under the Design Hood: Tools and Resources

The previous experience that Minh and Prefrontal Cortex have of creating interactive installations helped them make efficient decisions regarding the tools and software required to bring their vision to life.

Intel RealSense SDK

The Intel RealSense SDK was used to map hand contours, for which the contour tracking documentation provided with the SDK proved invaluable. The developers also made use of the Unity samples provided by Intel, trying hand-tracking first. When the hand-tracking alone was found to be insufficient, they moved on to images and implemented the custom blob detection.

Unity software

Both Minh and Prefrontal Cortex consider themselves designers first and foremost, and rather than invest their time in developing frameworks and coding, their interest lies in quickly being able to turn their ideas into working prototypes. To this end, the Unity platform allowed for fast prototyping and iteration. Additionally, they found the Intel RealSense Unity toolkit within the Intel RealSense SDK a great starting point and easy to implement.

$P Point-Cloud Recognizer

The $P Point-Cloud Recognizer is a 2D gesture-recognition software algorithm which detects, to a level of probability, which line-shape is being formed by pen strokes and other similar inputs. It’s commonly used as a tool to support rapid prototyping of gesture-based UIs. The developers lightly modified the algorithm and used it in Unity to detect the shape the user’s hands are making in Ombre Fabula. Based on probability, the algorithm decides which animal the shape represents and the app then displays the relevant visual.

Ombre Fabula Dragon
The dragon is made to appear by forming a mouth-like shape with two hands, as seen in the background of this screenshot.

What’s Next for Ombre Fabula

Ombre Fabula has obvious potential for further story development and adding more hand-shadow animals for users to make and control, although the team currently has no plans to implement this. Ultimately, their ideal scenario would be to travel internationally with Ombre Fabula and present it to the public as an interactive installation—its original and primary purpose.

Intel RealSense SDK: Looking Forward

Felix Herbst from Prefrontal Cortex is adamant that gesture-control experiences need to be built from the ground up for the human interface, and that simply adapting existing apps for gesture will, more often than not, result in an unsatisfactory user experience. He emphasizes the importance of considering the relative strengths of gestural interfaces—and of each individual technology—and developing accordingly.

Those types of appropriate, useful, and meaningful interactions are critical to the long-term adoption of human interface technologies. Herbst’s view is that if enough developers create these interactions using Intel RealSense technology, then this type of human interface has the potential to make a great impact in the future.

About the Developers

Born in Vietnam and raised in Germany, Minh Nguyen is an experienced designer who uses cutting-edge technologies to create innovative, interactive multimedia installations and games. She is currently completing her master’s in multimedia and VR design at Burg Giebichenstein University of Art and Design Halle in Germany. In addition to her studies, Minh freelances under the name Tibimi on such projects as Die besseren Wälder from game studio The Good Evil. Based on an award-winning play, the game is designed to encourage children and teens to consider the meaning of being ‘different’.

Prefrontal Cortex is a team of three designers and developers―Felix Herbst, Paul Kirsten and Christian Freitag―who use experience and design to astound and delight users with uncharted possibilities. Their award-winning projects have included the world-creation installation Metaworlds*, the [l]ichtbar interactive light projection at the Farbfest conference, the multi-touch image creation tool Iterazer*, and the award-winning, eye-tracking survival shooter game Fight with Light*. Their large-screen, multiplayer game Weaver* was a finalist in the Intel App Innovation Contest 2013. In addition to all these experimental projects, they create interactive applications for various industry clients using virtual and augmented reality.

Both Minh and the Prefrontal Cortex team intend to continue exploring the possibilities of human interface technologies, including the Intel RealSense solution.

Additional Resources

The video demonstrating Ombre Fabula can be found here.

For more information about the work of the Ombre Fabula creators, visit the Web sites of Minh Nguyen (Tibimi) and Prefrontal Cortex.

The Intel® Developer Zone for Intel® RealSense™ Technology provides a detailed resource for developers who want to learn more about Intel RealSense solutions. Developers can also download the Intel RealSense SDK and Intel RealSense Developer Kit, along with a number of useful Intel RealSense Technology Tutorials.
