
Comparing Touch Coding Techniques - Windows 8 Desktop Touch Sample


Download Sample Code / Article


 Interaction Context Sample.zip v1.1 (122 KB) - Updated Sample Code as of Feb 2014!

      New! Updated Interaction Context Sample Code with the following changes:

  • Dynamically changeable API (new)
  • DPI Aware (new)
  • No longer need to rebuild the application to change API usage

Coding Techniques - Windows 8 Desktop Touch Sample.pdf (525 KB) - Article

Abstract


There are three ways to support touch input and gestures in Microsoft Windows 8* Desktop apps: Using the WM_POINTER, WM_GESTURE, or WM_TOUCH messages. The featured sample code teaches you how to implement a similar user interface with each.

WM_POINTER is the simplest to code and supports the richest set of gestures but only runs on Windows 8. WM_GESTURE is easy to code and is backward compatible with Windows 7* but has some gesture limitations. WM_TOUCH is also backward compatible with Windows 7 but requires a lot of code because you must write your own gesture and manipulation recognizers from lower-level touch events.

Overview


Media and the web have been abuzz about touch input, and new, attractive Ultrabook™ device designs with touch screens are arriving all the time. Your customers may already be familiar with using touch devices and a rich set of gestures to interact with apps. Microsoft Windows 8 provides new ways to use touch input in your app with two separate user interfaces, one for Windows 8 Store apps and another for Windows 8 Desktop apps. There are advantages to each, but one of the more important features of Windows 8 Desktop apps is that they can be backward compatible with Windows 7. Existing Windows apps, which are being upgraded with touch, have a more direct path if they target Windows 8 Desktop with one of these techniques. This article explains how to implement touch on Windows 8 Desktop.

Windows 8 Desktop has three ways to implement touch. The WM_POINTER, WM_GESTURE, and WM_TOUCH techniques can give your customers roughly the same user experience. Which should you use? The sample code featured with this article will help you compare the code required to implement each technique. We’ll look at the pros and cons of each technique.

Welcome to the Sample


The sample uses one of the three techniques to implement a familiar set of gestures. You can switch between the techniques with a quick code change.

To get started, open the sample solution in Visual Studio* 2012, build it, and run it on your touch-enabled Windows 8 system. Now that the app is running, you’ll see a set of colored blocks that you can manipulate, some descriptive text about which technique the app is using, and a list of the gestures you may use. Try each of the gestures listed on screen. Drag a block with one finger to pan it on top of another, pinch to zoom in and out, and so on. 

Figure 1: The sample on startup

Once you are familiar with the app’s behaviors, look in InteractionContextSample.cpp to find the main code. You’ll find typical startup and message-handling code.

By default, the sample uses WM_POINTER messages to receive touch input and uses the Interaction Context functions for recognizing gestures and manipulations from those messages. The sample displays descriptive text on-screen to highlight the Windows message technique being used and lists the ways you can interact with the objects.

With a typical structure for desktop Windows apps, the sample has a WndProc function to handle incoming Windows messages. The message handling code is different for the three implementations. For each implementation, there’s some similar initialization code, both for the objects being drawn and for the background. Look for code like this in each case:

//Initialize each object
for(int i = 0; i < DRAWING_OBJECTS_COUNT; i++)
{
	g_objects[i].Initialize(hWnd, g_D2DDriver);
	g_zOrder[i] = i;
}
 
// Initialize background
g_background.Initialize(hWnd, g_D2DDriver);

// Set Descriptive Text
WCHAR text[256] = L"Mode: WM_POINTER*\nPan: One Finger\nZoom: Pinch\n"
                  L"Rotate: Two Fingers\nColor: One Finger Tap\nReset: Press and Hold\n";
memcpy(g_szDescText, text, 256);

Figure 2: Example initialization code, from the WM_POINTER case

Because the objects may overlap, the sample maintains a Z-order so that touch events go only to the top object. The g_zOrder array shown in this code keeps track of the squares on the screen (plus the separate g_background object).
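
A minimal sketch of this kind of z-order update might look like the following (the function body is illustrative, not the sample's exact code; the sample's MoveToFront/ChangeZOrder pair does the equivalent work):

// Illustrative sketch of a z-order update; g_zOrder[0] is treated as the
// top-most object, matching the hit-test loops shown later.
void MoveObjectToTop(int iz)
{
	// Find the object's current position in the z-order.
	int pos = 0;
	for (int i = 0; i < DRAWING_OBJECTS_COUNT; i++)
	{
		if (g_zOrder[i] == iz)
		{
			pos = i;
			break;
		}
	}

	// Shift everything above it down one slot, then put it on top.
	for (int i = pos; i > 0; i--)
	{
		g_zOrder[i] = g_zOrder[i - 1];
	}
	g_zOrder[0] = iz;
}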

DrawingObject.cpp contains the mechanics of object creation and manipulation (for all message types).

Inertia is very useful for making touch implementations “feel” right. When you pan or rotate an object, inertia can allow it to keep moving like a physical object.

Let’s study each implementation in more detail.

WM_POINTER with Interaction Context

WM_POINTER messages are Microsoft’s recommended way to handle touch input for Windows 8 Desktop. More than a single message, this set of messages includes WM_POINTERDOWN, WM_POINTERUP, and so on; together we’ll call them WM_POINTER. These messages are only supported on Windows 8, so if you want backward compatibility with Windows 7, you’ll need to choose one of the other two techniques. These WM_POINTER messages give you high-level access to touch events. They’re used with the Interaction Context.

But what’s the Interaction Context? It’s the Windows 8 Desktop mechanism that detects gestures and handles manipulations. It works together with the WM_POINTER messages to give full gesture support.

There’s no special initialization code. Your app will receive WM_POINTER messages at startup. This is the sample’s default touch message. There is one bit of optional initialization code added to this case, however. If you want to use mouse input as if it were touch input, call this function:

//Enable mouse to be pointer type
EnableMouseInPointer(TRUE);

Figure 3: WM_POINTER mouse setup (optional)

This call lets the mouse generate WM_POINTER messages so separate mouse message handling is unnecessary. Without it, you could still turn mouse input into touch or gesture input, but you would have to write code to handle the various mouse messages. You get this handling for free by calling this function. This is a unique advantage of using WM_POINTER.

Look at the initialization code in DrawingObject.cpp to see how the drawing objects are created. For example, manipulation events (translate, rotate, scale) get inertia via the configuration flags set there.
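
The sample's exact setup lives in DrawingObject.cpp, but configuring an Interaction Context generally follows the same pattern: create the context, register an output callback, and enable the interactions you want, including the inertia flags. Here is a minimal sketch (the function name and the free-function form of the callback are illustrative stand-ins for the sample's per-object setup; link against ninput.lib):

#include <windows.h>
#include <interactioncontext.h>

// Illustrative stand-in for the sample's OnInteractionOutputCallback method.
void CALLBACK OnInteractionOutputCallback(void* clientData,
                                          const INTERACTION_CONTEXT_OUTPUT* output);

HINTERACTIONCONTEXT CreateObjectInteractionContext(void* owner)
{
	HINTERACTIONCONTEXT context = NULL;
	CreateInteractionContext(&context);

	// Recognized gestures and manipulations are delivered to the callback.
	RegisterOutputCallbackInteractionContext(context, OnInteractionOutputCallback, owner);

	// Enable taps, press-and-hold, and manipulation with inertia on all axes.
	INTERACTION_CONTEXT_CONFIGURATION config[] =
	{
		{ INTERACTION_ID_MANIPULATION,
		  INTERACTION_CONFIGURATION_FLAG_MANIPULATION |
		  INTERACTION_CONFIGURATION_FLAG_MANIPULATION_TRANSLATION_X |
		  INTERACTION_CONFIGURATION_FLAG_MANIPULATION_TRANSLATION_Y |
		  INTERACTION_CONFIGURATION_FLAG_MANIPULATION_ROTATION |
		  INTERACTION_CONFIGURATION_FLAG_MANIPULATION_SCALING |
		  INTERACTION_CONFIGURATION_FLAG_MANIPULATION_TRANSLATION_INERTIA |
		  INTERACTION_CONFIGURATION_FLAG_MANIPULATION_ROTATION_INERTIA |
		  INTERACTION_CONFIGURATION_FLAG_MANIPULATION_SCALING_INERTIA },
		{ INTERACTION_ID_TAP,  INTERACTION_CONFIGURATION_FLAG_TAP  },
		{ INTERACTION_ID_HOLD, INTERACTION_CONFIGURATION_FLAG_HOLD },
	};
	SetInteractionConfigurationInteractionContext(context, ARRAYSIZE(config), config);

	return context;
}

The _INERTIA flags are what give the objects their "keep moving" feel after a pan, zoom, or rotate is released.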

When your app uses WM_POINTER, it will receive groups of gesture events. Under most conditions, this will be a sequence starting with a single WM_POINTERDOWN, followed by any number of WM_POINTERUPDATE messages, and finishing with a WM_POINTERUP. You may see some other cases, like WM_POINTERCAPTURECHANGED, when the window loses ownership of its pointer.

Look at the code for these messages in WndProc. The code to handle each message begins with a call to GetPointerInfo, which fills a POINTER_INFO struct describing the touch. The code cycles through the objects and checks if the touch applies to the object. If it does, the touch is processed for that object.

We will look specifically at the WM_POINTERDOWN case, but they’re all similar:

case WM_POINTERDOWN:
	// Get frame id from current message
	if (GetPointerInfo(GET_POINTERID_WPARAM(wParam), &pointerInfo))
	{
		// Iterate over objects, respecting z-order: from the one at the top to
		// the one at the bottom.
		for (int i = 0; i < DRAWING_OBJECTS_COUNT; i++)
		{
			int iz = g_zOrder[i];
			if (g_objects[iz].HitTest(pointerInfo.ptPixelLocation))
			{
				// Object was hit, add pointer so Interaction Context can process
				hr = AddPointerInteractionContext
					(g_objects[iz].GetInteractionContext(),
					GET_POINTERID_WPARAM(wParam));
				if (SUCCEEDED(hr))
				{
					// Bring the current object to the front
					MoveToFront(iz);
					g_objects[iz].AddPointer(GET_POINTERID_WPARAM(wParam));
					ProcessPointerFrames(wParam, pointerInfo, iz);
				}
				break;
			}
			else if (i == DRAWING_OBJECTS_COUNT-1)
			{
				// No objects found in hit testing.
				// Assign this pointer to background.
				hr = AddPointerInteractionContext
					(g_background.GetInteractionContext(),
					GET_POINTERID_WPARAM(wParam));
				if (SUCCEEDED(hr))
				{
					g_background.AddPointer(GET_POINTERID_WPARAM(wParam));
					ProcessBackgroundPointerFrames(wParam, pointerInfo);
				}
			}
		}
	}
	break;

Figure 4: WM_POINTERDOWN handling, showing hit testing and pointer processing

If an object is hit, we add the object to the list of objects that need to process this touch event using AddPointerInteractionContext. The Interaction Context collects objects that it will affect. Here, the hit detection is done by checking the touch coordinates against the object’s bounding box. This is adequate since the objects are solid rectangles; more complex objects will require a more complex hit test.  Once detected, we keep track of the touch ID from GET_POINTERID_WPARAM(wParam). Future touch events can avoid the hit detection and check this ID to see if they apply to this object. If the touch did not hit any of the objects, it hit the background.
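
For solid rectangles, such a hit test can be as simple as a bounding-box check against the pointer's pixel location. A sketch (the class and member names here are illustrative, not the sample's):

// Sketch of a rectangular hit test; m_x, m_y, m_width, and m_height stand in
// for the object's current position and size.
bool CDrawingObject::HitTest(POINT pt)
{
	return pt.x >= m_x && pt.x <= m_x + m_width &&
	       pt.y >= m_y && pt.y <= m_y + m_height;
}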

WM_POINTERDOWN begins a new gesture, and that gesture can change the Z-order; panning a single object, for example, may drop it on top of another. When the Z-order changes, we call ChangeZOrder, which moves this object to the top and re-orders the rest toward the background.

Now that the objects are in the right order, we handle the touch with the object’s ProcessPointerFrames or ProcessBackgroundPointerFrames. These functions are straightforward. They check if the touch is already being handled, get a list of all touch events in this pointer frame (usually one), and process the touch on this object (with ProcessPointerFramesInteractionContext).
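
A sketch of that sequence for one object follows (the enclosing function is a simplified stand-in for the sample's ProcessPointerFrames; GetPointerFrameInfo and ProcessPointerFramesInteractionContext are the system calls, and the sample additionally checks that the pointer ID already belongs to the object):

#include <vector>

// Illustrative sketch of per-frame pointer processing for one object.
void ProcessPointerFrames(WPARAM wParam, const POINTER_INFO& pointerInfo, int iz)
{
	// pointerInfo describes the triggering pointer; the frame query below
	// retrieves every pointer in the same frame (usually just one).
	UINT32 pointerId = GET_POINTERID_WPARAM(wParam);

	UINT32 pointerCount = 0;
	GetPointerFrameInfo(pointerId, &pointerCount, NULL);

	std::vector<POINTER_INFO> frame(pointerCount);
	if (pointerCount > 0 &&
	    GetPointerFrameInfo(pointerId, &pointerCount, frame.data()))
	{
		// Feed the whole frame to the object's Interaction Context; gestures and
		// manipulations come back through the registered output callback.
		ProcessPointerFramesInteractionContext(
			g_objects[iz].GetInteractionContext(), 1, pointerCount, frame.data());
	}
}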

The other cases are similar. Pointer updates (WM_POINTERUPDATE) arrive during a gesture, and the end of a gesture is marked by a WM_POINTERUP message. The UPDATE case doesn’t require a hit test, since we can tell by the ID whether this gesture is already applying to this object. To keep track of the UP case, we just remove the pointer from the interaction context with RemovePointerInteractionContext. If this gesture is one that affected the Z-order, the Z-order was already rearranged when the WM_POINTERDOWN message arrived, so the UPDATE and UP cases don’t need to change Z-order.
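
In sketch form, the UP handler reduces to removing the pointer from whichever object owns it (RemovePointerInteractionContext is the system call; OwnsPointer and RemovePointer are illustrative stand-ins for the sample's ID tracking, and the background object is handled the same way):

case WM_POINTERUP:
	{
		UINT32 pointerId = GET_POINTERID_WPARAM(wParam);
		for (int i = 0; i < DRAWING_OBJECTS_COUNT; i++)
		{
			int iz = g_zOrder[i];
			if (g_objects[iz].OwnsPointer(pointerId))   // illustrative ID check
			{
				RemovePointerInteractionContext(
					g_objects[iz].GetInteractionContext(), pointerId);
				g_objects[iz].RemovePointer(pointerId);
				break;
			}
		}
	}
	break;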

To see when the gestures are finalized, look in DrawingObject.cpp at the OnInteractionOutputCallback method. Various tap events move directly to handling functions. The rest are manipulation events, and the code calls OnManipulation to process them. Manipulation may contain any combination of pan, scale, and rotate, and that function implements all three together. Manipulation events have inertia (due to their setup options), so the gesture will have INTERACTION_FLAG_INERTIA set in its interactionFlags argument. The code starts a short timer to manage the inertia (giving the object time to move after the touch is released). If this timer is started, the WndProc will receive a WM_TIMER message later when the timer expires, and the code calls ProcessInertiaInteractionContext on all affected objects.
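
In sketch form, that callback and timer interplay looks roughly like this (OnInteractionOutputCallback matches the method name mentioned above, but the free-function form, the class name, the OnTap/OnHold handler names, the OnManipulation signature, and the 16 ms timer interval are all illustrative):

// Illustrative sketch; the sample's callback lives in DrawingObject.cpp.
#define TIMER_ID_INERTIA 1

void CALLBACK OnInteractionOutputCallback(void* clientData,
                                          const INTERACTION_CONTEXT_OUTPUT* output)
{
	CDrawingObject* obj = (CDrawingObject*)clientData;   // illustrative class name

	switch (output->interactionId)
	{
	case INTERACTION_ID_TAP:
		obj->OnTap();    // e.g., change the object's color
		break;
	case INTERACTION_ID_HOLD:
		obj->OnHold();   // e.g., reset the object
		break;
	case INTERACTION_ID_MANIPULATION:
		// Pan, zoom, and rotate arrive together as one set of deltas.
		obj->OnManipulation(output->arguments.manipulation.delta.translationX,
		                    output->arguments.manipulation.delta.translationY,
		                    output->arguments.manipulation.delta.scale,
		                    output->arguments.manipulation.delta.rotation);

		// While inertia is active, keep a timer running so WndProc can keep
		// pumping the interaction context after the fingers lift.
		if (output->interactionFlags & INTERACTION_FLAG_INERTIA)
			SetTimer(obj->GetHwnd(), TIMER_ID_INERTIA, 16, NULL);
		break;
	}
}

// In WndProc:
// case WM_TIMER:
//     call ProcessInertiaInteractionContext(...) on each object whose inertia timer fired.
//     break;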

All the system-supported gestures are handled inside this call, so there’s no gesture-specific code here.

WM_GESTURE

Now, let’s look at the WM_GESTURE code. This is a straightforward interface that was first implemented on Windows 7 and that you may also use on Windows 8 Desktop. Of the backward-compatible message types, it’s the easiest to code, but it comes with the most restrictions. You’ll see here that the gesture set differs slightly from the WM_POINTER gestures, that gestures work on only one object at a time, and that complex gestures are limited to two gesture types per gesture event.

WM_GESTURE provides high-level access to system-supported gestures. There’s very little initialization code required. Your app simply needs to register for the gestures it will use:

// WM_GESTURE configuration
 DWORD panWant =  GC_PAN
				| GC_PAN_WITH_SINGLE_FINGER_VERTICALLY
				| GC_PAN_WITH_SINGLE_FINGER_HORIZONTALLY
				| GC_PAN_WITH_INERTIA;
 GESTURECONFIG gestureConfig[] =
 {
	{ GID_PAN, panWant, GC_PAN_WITH_GUTTER },
	{ GID_ZOOM, GC_ZOOM, 0 },
	{ GID_ROTATE, GC_ROTATE, 0},
	{ GID_TWOFINGERTAP, GC_TWOFINGERTAP, 0 },
	{ GID_PRESSANDTAP, GC_PRESSANDTAP, 0 }
 };
 SetGestureConfig(hWnd, 0, 5, gestureConfig, sizeof(GESTURECONFIG)); // 5 = number of entries in gestureConfig

Figure 5: Gesture init code for WM_GESTURE

The supported gestures don’t exactly match those used by WM_POINTER or other common touch environments. The two-finger tap and press-and-tap gestures have not gained widespread use. On the other hand, some gestures in common use (such as double-tap) are not supported by this interface. While it’s not shown in the sample, you can use mouse click events to supplement the WM_GESTURE gesture set and easily implement those gestures for your app.
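
For example, a minimal sketch of filling in double-tap with the standard mouse double-click message might look like this (the window class must include CS_DBLCLKS for WM_LBUTTONDBLCLK to arrive, and OnDoubleTap is an illustrative handler name, not part of the sample):

// In WndProc; requires <windowsx.h> for GET_X_LPARAM/GET_Y_LPARAM.
case WM_LBUTTONDBLCLK:
	{
		POINT p = { GET_X_LPARAM(lParam), GET_Y_LPARAM(lParam) };
		for (int i = 0; i < DRAWING_OBJECTS_COUNT; i++)
		{
			int iz = g_zOrder[i];
			if (g_objects[iz].HitTest(p))
			{
				g_objects[iz].OnDoubleTap(p);   // illustrative double-tap handler
				break;
			}
		}
	}
	break;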

If you analyze the sample when using WM_GESTURE, you will discover that gestures only work on one object at a time and that complex gestures (pan, zoom, and rotate) are limited to two types per gesture. Even with these restrictions, WM_GESTURE gives you the simplest way to support touch on Windows 7 and Windows 8 Desktop. As an added bonus, this interface implements inertia with no additional code.

In WndProc, the section for handling WM_GESTURE is simple. Gesture details are fetched with GetGestureInfo:

GESTUREINFO gestureInfo;
ZeroMemory(&gestureInfo, sizeof(GESTUREINFO));
gestureInfo.cbSize = sizeof(GESTUREINFO);
GetGestureInfo((HGESTUREINFO)lParam, &gestureInfo);

Figure 6: Fetching details on this gesture

With the gesture info, we can check details of this gesture. As with other gesture message sequences, these gesture messages contain an initial message, a sequence of ongoing messages, and a final message: 

switch(gestureInfo.dwID)
{
case GID_BEGIN:
	p.x = gestureInfo.ptsLocation.x;
	p.y = gestureInfo.ptsLocation.y; 
	
	for (int i=0; i<DRAWING_OBJECTS_COUNT; i++)
	{
		int iz = g_zOrder[i];
		if (g_objects[iz].HitTest(p))
		{
			MoveToFront(iz);
			g_objects[iz].AddPointer(gestureInfo.dwInstanceID);
			break;
		}
		else if (i == DRAWING_OBJECTS_COUNT-1)
		{
			// No objects found in hit test, assign to background
			g_background.AddPointer(gestureInfo.dwInstanceID);
		}
	}
	break;

Figure 7: Capturing the beginning of a gesture

First, we capture the starting location. As in the previous section, when a gesture starts, the code compares the coordinates of the gesture against each of the objects. If the touch hit one of the objects, the Z-order is changed so that object moves to the front and the other objects are rearranged toward the background; otherwise the touch is assigned to the background. Then the gesture's instance ID is added to that object.

As later messages in the sequence arrive, they’re simple to process. Note which objects the pointer affects (by checking the ID), and process the gesture on them. There’s a separate function for each of the gesture types. Look at the OnTranslate, OnScale, and OnRotate functions in the sample for the details of how each gesture changes the objects. Other gestures use the same support functions that were used in the sample for the other message types.
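
The gesture arguments arrive packed in gestureInfo.ullArguments. A sketch of decoding the ongoing zoom and rotate messages (g_lastZoomDistance and the apply-to-object steps are illustrative bookkeeping, simplified relative to the sample's OnScale and OnRotate):

case GID_ZOOM:
	{
		// The low 32 bits hold the current distance between the two touch points.
		double distance = (double)(DWORD)gestureInfo.ullArguments;
		if (!(gestureInfo.dwFlags & GF_BEGIN) && g_lastZoomDistance > 0)
		{
			double scaleFactor = distance / g_lastZoomDistance;
			// Apply scaleFactor to the object that owns gestureInfo.dwInstanceID.
		}
		g_lastZoomDistance = distance;
	}
	break;

case GID_ROTATE:
	{
		// The packed argument converts to an angle in radians with this macro;
		// the sample's OnRotate turns successive values into a rotation delta.
		double angle = GID_ROTATE_ANGLE_FROM_ARGUMENT((DWORD)gestureInfo.ullArguments);
		// Apply the change from the previously seen angle to the object.
	}
	break;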

When a gesture is complete, remove the pointer from any objects it affects.

WM_GESTURE has its limitations, but supporting gestures with WM_GESTURE is straightforward and easy to code.

WM_TOUCH

WM_TOUCH provides the complete solution for backward-compatible touch support with Windows 7. Unlike the previous message types, this message doesn’t give gesture notifications; it notifies your app of every touch event. Your code must collect the events and recognize gestures on its own. While this can involve a fair amount of code, it also gives you precise control over gesture types and details.

Let’s start with a bit of configuration. Call RegisterTouchWindow to ensure that your app receives WM_TOUCH messages (otherwise, your app will receive WM_GESTURE or WM_POINTER messages).
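
That registration is a single call, typically made once when the window is created:

// During WM_CREATE (or right after the window is created):
RegisterTouchWindow(hWnd, 0);   // 0 = default touch behavior (no TWF_FINETOUCH / TWF_WANTPALM)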

The sample creates manipulation processor objects to handle the actual object manipulations and inertia processor objects to support inertia on the objects as they’re manipulated. Each is created for every screen object (and the background):

hr = CoCreateInstance(CLSID_ManipulationProcessor,
					  NULL,
					  CLSCTX_INPROC_SERVER,
					  IID_IUnknown,
					  (VOID**)(&g_pIManipulationProcessor[i]));
hr = CoCreateInstance(CLSID_InertiaProcessor,
					  NULL,
					  CLSCTX_INPROC_SERVER,
					  IID_IUnknown,
					  (VOID**)(&g_pIInertiaProcessor[i]));

g_pInertiaEventSink[i] = new CManipulationEventSink(
												g_pIInertiaProcessor[i],
												hWnd,
												&g_objects[i],
												NULL);
g_pManipulationEventSink[i] = new CManipulationEventSink(
												g_pIManipulationProcessor[i],
												g_pIInertiaProcessor[i],
												g_pInertiaEventSink[i],
												hWnd,
												&g_objects[i],
												NULL);

Figure 8: Creating manipulation and inertia processors

The manipulation and inertia code is in CManipulationEventSink.cpp and provides deeper initialization details.

With the app initialized, it’s time to start receiving touch messages. There’s a single case in WndProc to receive WM_TOUCH messages. We’ll discuss later how the WM_TIMER message is used in this case, too.

Once the WndProc receives a WM_TOUCH message, call GetTouchInputInfo to retrieve the full set of touch contacts in this message. Iterating across the contacts, the high-level code will now look familiar. First, capture the coordinates of this touch contact. Check if this is a “down” touch by comparing the touch input’s dwFlags with TOUCHEVENTF_DOWN. Loop through the objects and check each to see if the touch hit the object. If it did, change the Z-order so the object pops to the top, add the pointer to the list of ones affecting this object, and then call the manipulation processor’s ProcessDownWithTime function. This passes a timestamp along with the object to be manipulated.
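
In sketch form, the handler looks like this (the hit-test loop mirrors the earlier cases; GetTouchInputInfo, TOUCH_COORD_TO_PIXEL, CloseTouchInputHandle, and ProcessDownWithTime are the system and manipulation-processor calls, and <vector> is assumed in addition to the Windows touch headers):

case WM_TOUCH:
	{
		UINT cInputs = LOWORD(wParam);
		std::vector<TOUCHINPUT> inputs(cInputs);
		if (GetTouchInputInfo((HTOUCHINPUT)lParam, cInputs, inputs.data(), sizeof(TOUCHINPUT)))
		{
			for (UINT i = 0; i < cInputs; i++)
			{
				// Touch coordinates arrive in hundredths of a pixel, in screen space.
				POINT p = { TOUCH_COORD_TO_PIXEL(inputs[i].x),
				            TOUCH_COORD_TO_PIXEL(inputs[i].y) };
				ScreenToClient(hWnd, &p);

				if (inputs[i].dwFlags & TOUCHEVENTF_DOWN)
				{
					for (int j = 0; j < DRAWING_OBJECTS_COUNT; j++)
					{
						int iz = g_zOrder[j];
						if (g_objects[iz].HitTest(p))
						{
							MoveToFront(iz);
							g_objects[iz].AddPointer(inputs[i].dwID);
							// Hand the contact to this object's manipulation processor.
							g_pIManipulationProcessor[iz]->ProcessDownWithTime(
								inputs[i].dwID, (FLOAT)p.x, (FLOAT)p.y, inputs[i].dwTime);
							break;
						}
					}
				}
				// TOUCHEVENTF_MOVE and TOUCHEVENTF_UP are handled the same way,
				// using ProcessMoveWithTime and ProcessUpWithTime.
			}
			CloseTouchInputHandle((HTOUCHINPUT)lParam);
		}
	}
	break;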

Looking closer, it’s a bit more complex than you’ve seen with the other cases. That manipulation processor function will in turn call into the CManipulationEventSink objects. During a manipulation, the code first calls ManipulationStarted. This will start a hold timer so we can detect when a touch has been held “long enough” to count as a hold (set here to 1000 milliseconds or 1 second). The details of that manipulation are done via ManipulationDelta, which kills any existing hold timer, then passes regular manipulations on to the drawing object to finish. After the manipulation is done (signaled by calling ManipulationCompleted), you may continue to see calls here, so the end of this function continues to call the object’s OnManipulation function as long as the inertia sink object is emitting more manipulation calls.
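
The sink implements the COM _IManipulationEvents interface from manipulations.h. A trimmed sketch of the three callbacks follows (IUnknown boilerplate and the sample's hold-timer and inertia hand-off details are omitted; the comments describe where they fit):

#include <manipulations.h>

// Trimmed sketch of a manipulation event sink; IUnknown methods omitted.
class CManipulationEventSinkSketch : public _IManipulationEvents
{
public:
	STDMETHODIMP ManipulationStarted(FLOAT x, FLOAT y)
	{
		// The sample starts its press-and-hold timer here (about 1 second).
		return S_OK;
	}

	STDMETHODIMP ManipulationDelta(FLOAT x, FLOAT y,
		FLOAT translationDeltaX, FLOAT translationDeltaY,
		FLOAT scaleDelta, FLOAT expansionDelta, FLOAT rotationDelta,
		FLOAT cumulativeTranslationX, FLOAT cumulativeTranslationY,
		FLOAT cumulativeScale, FLOAT cumulativeExpansion, FLOAT cumulativeRotation)
	{
		// Any movement cancels the hold timer; the deltas are then forwarded to
		// the drawing object's OnManipulation to pan, zoom, and rotate it.
		return S_OK;
	}

	STDMETHODIMP ManipulationCompleted(FLOAT x, FLOAT y,
		FLOAT cumulativeTranslationX, FLOAT cumulativeTranslationY,
		FLOAT cumulativeScale, FLOAT cumulativeExpansion, FLOAT cumulativeRotation)
	{
		// The sample hands off to the inertia processor here so the object keeps
		// moving briefly after the touch is released.
		return S_OK;
	}
};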

There’s a fair amount of code here to handle the specifics.  If you’re using WM_TOUCH, read up on the manipulation and inertia processors and study the contents of CManipulationEventSink.cpp closely.

That walks you briefly through TOUCHEVENTF_DOWN. There’s similar code for TOUCHEVENTF_MOVE and TOUCHEVENTF_UP.

Which to Use?

Which API should you use for Windows 8 Desktop? That depends. By now, you should have a sense of the common points between the implementations, as well as the differences.

If you want backward compatibility between Windows 8 Desktop and Windows 7, then you cannot use WM_POINTER; only WM_GESTURE and WM_TOUCH will run on both. If you only wish to support Windows 8 Desktop, then WM_POINTER is clearly best because it’s the easiest to implement.

Looking deeper, let’s compare the different message types:

                               WM_POINTER     WM_GESTURE      WM_TOUCH
Abstraction                    Gesture        Gesture         Touch contact (gives full control)
Ease of coding                 Easy           Easy            Complex
Completeness:
  Manipulation                 Easy           Easy            Complex
  Inertia                      Trivial        Automatic       Complex
  Gesture set                  Complete       Limited         You implement
  Manipulate multiple objects  Many           One             You implement
  Concurrent gestures          Any            Limited to 2    You implement
  Mouse as pointer input       Trivial        You implement   You implement

Figure 9: Comparing message types

Depending on which level of abstraction you want, you may prefer the full control of WM_TOUCH over WM_POINTER and WM_GESTURE, although you will find yourself writing a lot more code. If you can live with its limitations, WM_GESTURE may be right for you.

Write your app’s touch support with the techniques that are best for you!

About the Author


Paul Lindberg is a Senior Software Engineer in Developer Relations at Intel. He helps game developers all over the world to ship kick-ass games and other apps that shine on Intel platforms.

About the Code Samples Authors


Mike Yi is a Software Applications Engineer with Intel’s Software and Solutions Group and is currently focused on enabling applications for Intel’s Core platforms. Mike has worked on processor tool development and enabling video games for Intel platforms in the past.

MJ Gellada worked as an intern in the Software Solutions Group at Intel. He is attending the University of Michigan to pursue his degree in Computer Engineering.

Joseph Lee worked as an intern in the Software Solutions Group focusing on development for touch and DPI awareness. He is attending the University of Washington Bothell to pursue his Computer Science degree.

References


Microsoft Desktop Dev Center – http://msdn.microsoft.com/en-us/windows/desktop

Interaction Context Reference – http://msdn.microsoft.com/en-us/library/windows/desktop/hh437192(v=vs.85).aspx

Pointer Input Message Reference – http://msdn.microsoft.com/en-us/library/windows/desktop/hh454903(v=vs.85).aspx

Intro to WM_TOUCH and WM_GESTURE blog – http://software.intel.com/en-us/blogs/2012/08/16/intro-to-touch-on-ultrabook-touch-messages-from-the-windows-7-interface/

WM_GESTURE reference – http://msdn.microsoft.com/en-us/library/windows/desktop/dd353242(v=vs.85).aspx

Getting started with Windows Touch Messages (WM_TOUCH) – http://msdn.microsoft.com/en-us/library/windows/desktop/dd371581(v=vs.85).aspx

Inertia and Manipulation Reference – http://msdn.microsoft.com/en-us/library/windows/desktop/dd372613(v=vs.85).aspx

Touch Samples – http://software.intel.com/en-us/articles/touch-samples


Intel XDK: Develop Once, Deploy Everywhere


*Post by Fredrick Odhiambo, Intel East Africa

Over the last couple of months, Intel has been improving the various aspects of the Intel XDK.  Intel XDK uses HTML5 to enable the development of cross-platform applications for both Desktop and Mobile.

Intel promotes HTML5 because of the many advantages that HTML5 offers developers globally. Intel believes in the importance of helping experienced developers transition to this cross-platform approach, and the Intel XDK also helps new developers quickly deploy their apps and games on virtually all modern computing platforms.

The Intel XDK allows developers to code once with HTML5 and produce applications for many platforms without having to know the various languages that correspond to the specific target platforms. In other words, developers need not know Java to make Android apps. The Intel XDK lets developers build applications as:

  • A mobile application: for iOS Ad Hoc, iOS Production, Android, Crosswalk for Android (new in v0505), Windows 8 Store, Windows Phone 8, Tizen, Nook, and Cordova for Android (new in v0505).
  • A web application: a web app, a Chrome app, or a Facebook app.

From what was initially a web browser service, the XDK has evolved into a full software development kit with the following features:

The XDK has a built-in environment that enables developers to emulate their apps on several virtual devices and see how their app will look on each device (iPhone, Microsoft Surface Pro, and Motorola Droid 2, among others) as they develop.

Developers can test their app on their phones (iOS, Android, Windows, and Tizen devices). In addition, the same app can be run as a desktop application (for users of Windows 8 and 8.1). This ability to deploy apps to devices during development helps in producing quick prototypes, and it is available from the “Test”, “Debug”, and “Profile” tabs in the Intel XDK.

Developers can also store code in the cloud for free and retrieve it on any device, at any location in the world, once they log in to their accounts. Thanks to the Intel XDK’s cloud storage system, a developer can retrieve their builds and continue working on their projects at no cost.

Improved documentation, tutorials, and a growing user base have made it easy for XDK users to get quick help with their development problems.

Featured Android App Made with the XDK by a Kenyan Developer
FoodBei is a mobile app that was made using the Intel XDK. It is intended to provide easy access to food items from major Kenyan towns, as well as the prices of various basic food items - cereals, vegetables, meat, eggs, and fruits.
The developer, Juma Owori, took two weeks to build the app, a client-server mobile application, using the Intel XDK.

To make apps using the XDK, developers need:
1.    An app idea
2.    The Intel XDK
3.    An Intel Developer Zone account (for access to a wide range of free tools and resources to optimize their applications)
4.    HTML5 and CSS3 skills

Start using Intel’s XDK today. Grab your latest copy for Windows, Linux or Mac from: http://xdk-software.intel.com/

2 in 1 User Experience


Learn cutting-edge UX from our world-class featured expert, Luke Wroblewski, and unlock the incredible capabilities, opportunities, and user experiences that 2 in 1 devices can deliver. We’ll even show how you can take advantage of our Intel® Developer Zone program resources to design, build, test, and market your application – what more could you ask for?

What You Will Learn

Rethink App Design for 2 in 1s

Usability expert Luke Wroblewski worked with Intel to produce a series of practical, short-form videos. In the first, Luke examines the incredible opportunities and experiences 2 in 1 devices can deliver.

Video ›


Rethink Navigation for 2 in 1s

Building on the fantastic experiences 2 in 1s can offer, Luke Wroblewski looks at specific navigation options that enable you to deliver the best UX possible.

Video ›


 

How to Design for Device Motion (Coming soon!)

In this video, Luke Wroblewski shows how to leverage every aspect of your 2 in 1 device's motion sensors to create truly immersive experiences.


Head of the Order* Casts a Spell on Perceptual Computing


By Edward J. Correia

When Intel put a call out for apps that would transform the way we interface with traditional PCs, Jacob and Melissa Pennock, founders of Unicorn Forest Games, entered their spell-casting game, Head of the Order*, into the Intel® Perceptual Computing Challenge. They took home a top prize.


Figure 1: Unicorn Games' teaser trailer for Head of the Order*.

The idea was to control an app with zero physical contact. Head of the Order (Figure 1) immerses players in a tournament-style game through beautifully illustrated indoor and outdoor settings where they conjure up spells using only hand gestures. When the gameplay for the demo was in the design stages, one of the main goals was to create a deep gestural experience that permitted spells such as shields, fire, and ice to be combined. "While it feels great to cast your first fireball, we really wanted to give the player that moment of amazement the first time they see their opponent string a number of gestures together and annihilate them," said Jacob. He imagined it being like the first time watching a highly skilled player in a classic fighting game and wanted there to be a learning curve so players would feel accomplished when they succeeded. Jacob's goal was to make it different than mastering play with a physical controller. "By using gesture, we are able to invoke the same feelings people get when learning a martial art; the motions feel more personal," he said.

The Creative Team

While Jacob created the initial game concept, he and Melissa are both responsible for the creative elements, and they work together to refine the game's design. According to Jacob, Melissa's bachelor's degree in psychology gives her valuable insight into user-experience design, and her minor in studio art led to Head of the Order's impressive visuals. "Her expertise is in drawing, especially detailed line work," said Jacob. Melissa also has a strong background in traditional and 2D art, and created all of the 2D UI assets, including the heads-up display, the main menu, and the character-select screen. Together, they set up rendering and lighting, worked on much of the texturing, and designed levels around what they could do with those assets and special effects. "Melissa added extensive art direction, but we cannot take credit for the 3D modeling, most of which was purchased from the Unity Asset Store," said Jacob. Melissa also created the interactive spell book, which players can leaf through in the demo with a swipe of the hand (see Figure 2).


Figure 2: The pages of the Head of the Order* spell book are turned with a swipe of the hand.

Jacob studied at East Carolina University, where he developed his own gesture-recognition system while earning bachelor's degrees in mathematics and computer science, with a focus on human-computer interaction. His work includes a unistroke gesture-recognition algorithm that improved the accuracy of established algorithms of the day. Originally created in 2010 and written in C++, the HyperGlyph* library has since been licensed by Jacob to major game studios. "It was my thesis work at graduate school [and] I have used it in several projects," he said. "It's quite accurate and designed for games." Although the library is very good at matching input to things within the gesture set, Jacob added that it's also very good at rejecting false positives and input outside of the template set, which is a large problem for many other stroke-based gesture systems and can lead to "…situations where the player scribbles anything he wants, and the power attack happens." The strictness and other factors of the library's recognizer are configurable.

His education also proved valuable when it came to the complexities of processing the incoming camera data; that was something he had already mastered. "I had a lot of experience with normalizing and resampling input streams to look at them," said Jacob. While at the university, he developed an extension of formal language theory that describes inputs involving multiple concurrent streams.

Harnessing the 3D Space

The perceptual camera used at Unicorn Forest Games is a Creative* Senz3D. When paired with the Intel® Perceptual Computing SDK, applications can be made to perform hand and finger tracking, facial-feature tracking, and some voice and hand-pose-based gesture recognition. Jacob said he prefers this camera for its close-range sensing; Intel's camera is designed to track the hands and face from a few inches to a few feet away. "It is lighter and faster, and it feels like there are fewer abstraction layers between my code and the hardware," he said.

To process data coming from the camera, Jacob set up a pipeline for each part of the hand that he wanted to track. The app then requests streams of information through the SDK and polls the streams for changes. Those data streams become a depth map of the 3D space. The SDK pre-segments hands from the background, and from that a low-resolution image is created. "We also request information on the hand tracking, which comes as labels mapped to transforms with position and rotation mapped in the 3D space," he added. The Intel Perceptual Computing SDK provides labels for left hand, right hand, individual fingers, and palms.

Describing a small quirk in the Intel Perceptual Computing SDK, Jacob said that the labels at times can become mixed up; for example, the user's right hand might appear in the data under the left-hand label,  or vice versa. Jacob addressed the issue by making the system behave exactly the same when using either hand. "In fact, the [Intel] Perceptual Computing SDK is often reporting that left is right and vice versa, and the game handles it just fine," he said.1

Mapping labels aside, Head of the Order implements a hand-tracking class that initializes the pipelines and does all the interacting with the Intel Perceptual Computing SDK. "This class stores and updates all of the camera information," said Jacob. "It also performs various amounts of processing for things like hands being detected or lost, changes in speed, hands opening and closing, and pose-based gestures, all of which produce events that subsystems can subscribe to and use as needed."

Grasping at Air

Although Head of the Order was built specifically for the Intel Perceptual Computing Challenge, the idea came to Jacob in 2006 while listening to a TED Talk about multi-touch before it had become commonplace. Being the first time he had seen anything multi-touch, Jacob was sparked to think past the mouse and keyboard and realized that the future would be full of touch and perceptual interfaces. He began to imagine content for them; unfortunately, the technology for non-touch inputs hadn't yet been invented.

Fast forward a few years, and with hand-tracking technology available, Jacob and Melissa were able to bring their idea to life, but not before Jacob had a chance to actually contribute to the development of perceptual computing technology itself. During graduate school, Jacob focused on making effective stroke-based gesture recognition. However, the actual recognition of shapes for Head of the Order is designed around touch, which led to one of the biggest challenges Jacob faced when building perceptual interfaces: Inputs that take place in mid-air don't have a clear start-and-stop event. As Jacob explained, "You don't have a touch-down and a lift-off like you have with a touch screen, or even a mouse." For Head of the Order, Jacob had to develop something that mimicked those events.

The Hand-Rendering System

Jacob created a hand-rendering system that can re-sample the low-resolution hand images provided by the Intel Perceptual Computing SDK and add them to the game's rendering stack at multiple depths through custom post-processing. "This allows me to display the real-time image stream to the user with various visually appealing effects." Figures 3.1 through 3.3—and the following code samples—illustrate how the hand-rendering system works. (The code samples show the renderer and segments of the hand-tracker class on which it relies.)


Figure 3.1: The raw, low-resolution image feed texture applied to the screen through a call to Unity's built-in GUI system. The code is from the Unity demo project provided by Intel.


Figure 3.2: This hand image is first blurred and then applied to the screen with a simple transparent diffuse shader through post-processing done by the HandRenderer script (below). It has the advantage of looking very well-defined at higher resolutions but requires Unity Pro*.


Figure 3.3: A Transparent Cutout shader is applied, which renders clean edges. Multiple hand renders can also be done in a single scene, at configurable depths. For example, the hand images can render on top of any 3D content, but below 2D menus. This is how the cool ghost-trail effects were produced in the main menu of Head of the Order*.

Code sample 1. The RealSenseCameraConnection (hand-tracker) class that connects to the camera and supplies hand images to the HandRenderer post-processing script, which implements a simple transparent diffuse shader. Only the initialization and key rendering portions of the script are shown here.


using UnityEngine;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.InteropServices;
    

public class RealSenseCameraConnection: MonoBehaviour 
{	
	private PXCUPipeline pp;

	[HideInInspector]
	public Texture2D handImage;

	/* Each label of the Intel PXCUPipeline label map gets one of these colors.
	In practice I've found the indexes to be:
	 0 - the background
	 1 - primary hand
	 2 - secondary hand
	There can be more labels, but what they correspond to is not listed in the Intel docs. */
	public Color32[] colors;

	public  PXCMGesture.GeoNode[][] handData;

	public List<HandRenderer> Renderers;

	private static RealSenseCameraConnection instance;
 
	public static RealSenseCameraConnection Instance
	{
		get
		{
			if (instance == null)
			{
				instance = (RealSenseCameraConnection)FindObjectOfType(typeof(RealSenseCameraConnection));
				if (instance == null)
					instance = new GameObject ("RealSenseCameraConnection").AddComponent<RealSenseCameraConnection> ();
			}
 
			return instance;
		}
	}

	void Awake()
	{
		if(Instance != null)
		{
			if(Instance != this)
			{
				Destroy(this);
			}
			else
			{
				DontDestroyOnLoad(gameObject);
			}
		}
	}

    void Start() 
    {
		pp=new PXCUPipeline();
		
		int[] size=new int[2]{320,240};
		if (pp.Init(PXCUPipeline.Mode.GESTURE)) 
		{
	        print("Conected to Camera");
		}
		else 
		{
			print("No camera Detected");
		}
		handImage=new Texture2D(size[0],size[1],TextureFormat.ARGB32,false);
		ZeroImage(handImage);
		handData=new PXCMGesture.GeoNode[2][];
		handData[0]=new PXCMGesture.GeoNode[9];
		handData[1]=new PXCMGesture.GeoNode[9];
    }
    
    void OnDisable() 
    {
		CloseCameraConnection();
    }

    void OnApplicationQuit()
    {
    	CloseCameraConnection();
    }

    void CloseCameraConnection()
    {
 		if (pp!=null)
		{
			pp.Close();
			pp.Dispose();			
		}    	
    }

	void Update() 
	{
		if (pp!=null) 
		{
			if (pp.AcquireFrame(false)) 
			{
				ProcessHandImage();
				pp.ReleaseFrame();
			}
		}
	}

	void ProcessHandImage()
	{
		if (QueryLabelMapAsColoredImage(handImage,colors))
		{
		 	handImage.Apply();
		}

		foreach(HandRenderer renderer in Renderers)
		{
			renderer.SetImage(handImage);
		}
	}

	public bool QueryLabelMapAsColoredImage(Texture2D text2d, Color32[] colors) 
	{
		if (text2d==null) return false;
		byte[] labelmap=new byte[text2d.width*text2d.height];
		int[] labels=new int[colors.Length];
		if (!pp.QueryLabelMap(labelmap,labels)) return false; 
		
	    Color32[] pixels=text2d.GetPixels32(0);
		for (int i=0;i<text2d.width*text2d.height;i++)
		{	
			bool colorSet = false;
			for(int j = 0; j < colors.Length; j++)
			{
				if(labelmap[i] == labels[j])
				{
					pixels[i]=colors[j];
					colorSet = true;
					break;
				}
			}
			if(!colorSet)
				pixels[i]=new Color32(0,0,0,0);			
		}
        text2d.SetPixels32 (pixels, 0);
		return true;
	}

	public Vector3 PXCMWorldTOUnityWorld(PXCMPoint3DF32 v)
	{
		return new Vector3(-v.x,v.z,-v.y);
	}

	public Vector3 MapCoordinates(PXCMPoint3DF32 pos1) 
	{
		Camera cam = Renderers[0].gameObject.GetComponent<Camera>();
		return MapCoordinates(pos1,cam,Vector3.zero);
	}

	public Vector3 MapCoordinates(PXCMPoint3DF32 pos1, Camera cam, Vector3 offset) 
	{
		Vector3 pos2=cam.ViewportToWorldPoint(new Vector3((float)(handImage.width-1-pos1.x)/handImage.width,
												(float)(handImage.height-1-pos1.y)/handImage.height,0));
		pos2.z=pos1.z;
		pos2 += offset;
		return pos2;
	}

	public Vector2 GetRawPoint(PXCMPoint3DF32 pos)
	{
		return new Vector2(pos.x,pos.y);
	}

	public Vector3 GetRawPoint3d(PXCMPoint3DF32 pos)
	{
		return new Vector3(pos.x,pos.y,pos.z);
	}

	private void ZeroImage(Texture2D image)
	{
		Color32[] pixels=image.GetPixels32(0);
		for (int x=0;x<image.width*image.height;x++) pixels[x]=new Color32(0,0,0,0);
	    image.SetPixels32(pixels, 0);

		image.Apply();
	}

	public static void AddRenderer(HandRenderer handrenderer)
	{
		Debug.Log("Added Hand Renderer");
		Instance.Renderers.Add(handrenderer);
	}

	public static void RemoveRenderer(HandRenderer handrenderer)
	{
		if(Instance != null)
		{
			if(!Instance.Renderers.Remove(handrenderer))
			{
				Debug.LogError("A request to remove a handrender was called but the render was not found");
			}
		}
	}
}



Code sample 2. This component gets attached to the camera (requires Unity Pro*). The HandRenderer script keeps the rendered hands appearing sharp through a range of resolutions and is used throughout most of the game.


using UnityEngine;
using System.Collections;

public class HandRenderer : MonoBehaviour 
{
	public bool blur = true;
	[HideInInspector]
	public Texture2D handImage;

	public Material HandMaterial;

	public int iterations = 3;

	public float blurSpread = 0.6f;

	public Shader blurShader = null;	
	static Material m_Material = null;

	public Vector2 textureScale = new Vector2(-1,-1);
	public Vector2 textureOffset= new Vector2(1,1);

	protected Material blurmaterial 
	{
		get 
		{
			if (m_Material == null) 
			{
				m_Material = new Material(blurShader);
				m_Material.hideFlags = HideFlags.DontSave;
			}
			return m_Material;
		} 
	}

	public void SetImage(Texture2D image)
	{
		handImage = image;
	}

    void OnDisable() 
    {
    	RealSenseCameraConnection.RemoveRenderer(this);
		if(m_Material) 
		{
			DestroyImmediate( m_Material );
		}
    }

    void OnEnable()
    {
    	RealSenseCameraConnection.AddRenderer(this);

    	
    }

	private void ZeroImage(Texture2D image)
	{
		Color32[] pixels=image.GetPixels32(0);
		for (int x=0;x<image.width*image.height;x++) pixels[x]=new Color32(255,255,255,128);
	    image.SetPixels32(pixels, 0);
		image.Apply();
	}


	public void FourTapCone (RenderTexture source, RenderTexture dest, int iteration)
	{
		float off = 0.5f + iteration*blurSpread;
		Graphics.BlitMultiTap (source, dest, blurmaterial,
			new Vector2(-off, -off),
			new Vector2(-off,  off),
			new Vector2( off,  off),
			new Vector2( off, -off));
	}
	
	void OnRenderImage (RenderTexture source, RenderTexture destination) 
	{
		if(handImage != null)
		{
			HandMaterial.SetTextureScale("_MainTex", textureScale);
			HandMaterial.SetTextureOffset("_MainTex", textureOffset);
			if(blur)
			{		
				RenderTexture buffer  = RenderTexture.GetTemporary(handImage.width, handImage.height, 0);
				RenderTexture buffer2  = RenderTexture.GetTemporary(handImage.width, handImage.height, 0);
				Graphics.Blit(handImage, buffer,blurmaterial);
				
				bool oddEven = true;
				for(int i = 0; i < iterations; i++)
				{
					if( oddEven )
						FourTapCone (buffer, buffer2, i);
					else
						FourTapCone (buffer2, buffer, i);
					oddEven = !oddEven;
				}
				if( oddEven )	
					Graphics.Blit(buffer, destination,HandMaterial);			
				else
					Graphics.Blit(buffer2, destination,HandMaterial);	

				RenderTexture.ReleaseTemporary(buffer);
				RenderTexture.ReleaseTemporary(buffer2);
			}
			else
				Graphics.Blit(handImage, destination,HandMaterial);
		}
	}
}



Once the hand is recognized as an input device, it's represented on-screen, and a series of hand movements now control the game. Raising one finger on either hand begins input, opening the whole hand stops it, swiping the finger draws shapes, and specific shapes correspond to specific spells. However, Jacob explained that those actions led to another challenge. "We had trouble getting people to realize that three steps were required to create a spell." So Jacob modified the app to factor in finger speed. "In the new version, if you're moving [your finger] fast enough, it's creating a trail; if you're moving slowly, there's no trail." With this new and more intuitive approach (Figure 4), players can go through the demo without any coaching.


Figure 4: By factoring finger speed into recognition logic, new players have an easier time learning to cast spells.

Gesture Sequencing

In addition to the perceptual interface that he developed, Jacob believes that the biggest innovation delivered with the Head of the Order is its gesture sequencing, which permits players to combine  multiple spells into a  super-spell (called "combo spells" in the game; see Figure 5). "I haven't seen many other systems like that," he said. "For instance, most of the current traditional gestures are pose-based; they're ad hoc and singular." In other words, others write one gesture that's invoked by a single, two-finger pose. "That is mapped to a singular action; whereas in Head of the Order, stringing gestures together can have meaning as a sequence," Jacob added. For example, a Head of the Order player can create and cast a single fireball (Figure 6) or combine it with a second fireball to make a larger fireball, or perhaps a ball of lava.


Figure 5: Head of the Order* permits individual spells to be combined into super-spells, giving players inordinate power over opponents, provided they can master the skill.

Developing the ability to combine spells involved dreaming up the possible scenarios of spell combinations and programming for each one individually. "The main gameplay interaction has a spell-caster component that "listens" to all the hand-tracker events and does all the higher-level, stroke-based gesture recognition and general spell-casting interaction," Jacob explained. He compared its functions to those of a language interpreter. "It works similar to the processing of programming languages, but instead of looking at tokens or computer-language words, it's looking at individual gestures, deciphering their syntax, and determining whether it's allowed in the situation."


Figure 6: Fireball tutorial.

This allows for many complex interactions, provided all of those scenarios are handled correctly. Players can string gestures together, and the system recognizes the sequence and treats it as a new, combined gesture. "Once you can land some of the longer spells, it feels like you're able to do magic. And that's the feeling we're after."


Figure 7: What makes Head of the Order* unusual is the ability to combine spells. In this video, developer Jacob Pennock shows how it's done. 

Innovations and the Future

Having contributed in large part to its advancement, perceptual computing in Jacob's mind will someday be as pervasive as computers themselves—perhaps even more so. And perceptual computing will have numerous practical applications. "Several obvious applications exist for gesture in the dirty-hand scenario. This is when someone is cooking, or a doctor is performing surgery, or any instance where the user may need to interact with a device but is not able to physically touch it."

To Jacob, the future will be less like science fiction and more like wizardry. "While we do think that gestures are the future, I'm tired of the view where everybody sees the future as Minority Report or Star Trek," Jacob lamented. He hopes that hand waves will control household appliances in three to five years. Jacob believes that gesturing will feel more like Harry Potter than Iron Man. "The PlayStation* or Wii* controller is essentially like a magic wand, and I think there is potential for interesting user experiences if gestural interaction is supported at an operating-system level. For instance, it would be nice to be able to switch applications with a gesture. I'd like to be able to wave my hand and control everything. That's the dream. That's the future we would like to see."

Intel® Perceptual Computing Technology and Intel® RealSense™ Technology

In 2013, Head of the Order was developed using the new, intuitive natural user interfaces made possible with Intel® Perceptual Computing technology.  At CES 2014, Intel announced Intel® RealSense™ technology, a new name and brand for Intel Perceptual Computing.  In 2014 look for the new Intel® RealSense™ SDK as well as Ultrabook™ devices shipping with Intel® RealSense™ 3D cameras embedded in them.


Resources

While creating Head of the Order, Jacob and Melissa utilized Unity, Editor Auto Save*, PlayModePersist* v2.1, Spline Editor*, TCParticles* v.1.0.9, HOTween* v1.1.780, 2D Toolkit* v2.10, Ultimate FPS Camera* v1.4.3, and nHyperGlyph* v1.5 (Unicorn Forest's custom stroke-based gesture recognizer).

 

1 This is a known issue with the 2013 version of the Intel® Perceptual Computing SDK. The 2014 version, when released, will differentiate between left and right hands.

 

Intel, the Intel logo, Intel RealSense, and Ultrabook are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2014. Intel Corporation. All rights reserved. 

 

Contest Winners Combine Augmented Reality with an Encyclopedia with ARPedia*


By Garret Romaine

 

The interfaces of tomorrow are already in labs and on test screens somewhere, waiting to turn into fully developed samples and demos. In fact, look no further than the winner of the Creative User Experience category in the Intel® Perceptual Computing Phase 2 Challenge, announced at CES 2014. Zhongqian Su and a group of fellow graduate students used the Intel® Perceptual Computing SDK and the Creative Interactive Gesture Camera Kit to combine augmented reality (AR) and a common encyclopedia into ARPedia*—Augmented Reality meets Wikipedia*. ARPedia is a new kind of knowledge base that users can unlock with gestures instead of keystrokes.

Six team members from Beijing University of Technology developed the application over two months, using a variety of tools. The team used Maya* 3D to create 3D models, relied on Unity* 3D to render 3D scenes and develop the application logic, then used the Intel Perceptual Computing SDK Unity 3D plug-in (included in the SDK) to pull it all together. Their demo combines 3D models and animated videos to create a new way of interacting with the virtual world. Using body movements, hand gestures, voice, and touch, the application encourages digital exploration in an unknown world, and the possibilities for future work are exciting.

All About Dinosaurs


Think of ARPedia as a story-making and experiencing game, with AR visual effects. While users enjoy a seamless interactive experience, a lot of technology went into creating even the simplest interactions. In a PC game, the common mouse and keyboard—or a touch screen—are the usual means of interfacing with the application. But none of these are used in ARPedia. In an AR application, a natural user interface is very important. ARPedia users are able to control the action with bare hand gestures and face movement, thanks to the Creative Senz3D* camera. Many interesting gestures are designed to help advance the game, such as grasping, waving, pointing, raising, and pressing. These gestures make the player the real controller of the game and the virtual world of dinosaurs.

Figure 1: ARPedia* is a combination of augmented reality and a wiki-based encyclopedia, using gestures to navigate the interface.

Team leader Zhongqian Su had used a tiny Tyrannosaurus rex character in his previous work creating educational applications, so he made that well-known dinosaur the star of his ARPedia app. Players use hand motions to reach out and pick up the tiny dinosaur image, then place it at various points on the screen. Depending on where they put the dinosaur, players can get information about the creature’s diet, habits, and other characteristics.

Figure 2: Users interact with a tiny Tyrannosaurus rex to learn about fossils, paleontology, and geology.

According to team member Liang Zhang, the team had coded an AR application for the education market before using this dinosaur’s 3D model. Although they had the basics of an application in place, they had to do a lot of work to be eligible for the contest. For example, their in-house camera used 3D technology, so they needed to rewrite that code (see Figure 3) to interface with the newer Creative Interactive Gesture Camera Kit. That also meant coming up to speed quickly on the Intel Perceptual Computing SDK.


// An open hand is detected when at least two distinct, valid finger nodes are tracked.
bool isHandOpen(PXCMGesture.GeoNode[] data)
	{
		int n = 1;
		for(int i=1;i<6;i++)
		{
			if(data[i].body==PXCMGesture.GeoNode.Label.LABEL_ANY)
				continue;
			bool got = false;
			for(int j=0;j<i;j++)
			{
				if(data[j].body==PXCMGesture.GeoNode.Label.LABEL_ANY)
					continue;
				Vector3 dif = new Vector3();
				dif.x = data[j].positionWorld.x-data[i].positionWorld.x;
				dif.y = data[j].positionWorld.y-data[i].positionWorld.y;
				dif.z = data[j].positionWorld.z-data[i].positionWorld.z;
				if(dif.magnitude<1e-5)
					got = true;
			}
			if(got)
				continue;
			n++;
		}
		return (n>2);
	}


Figure 3: The ARPedia* team rewrote their camera code to accommodate the Creative Interactive Gesture Camera.

Fortunately, Zhang said, his company was keen on investing time and energy into learning new technologies. “We have been doing a lot of applications already,” he said. “We keep track of the new hardware and software improvements that we can use in our business. Before this contest, we used Microsoft Kinect* for its natural body interactions. When we found the camera, we were quite excited and wanted to try it. We thought this contest could give us a chance to prove our technical skills as well, so why not?”

Smart Choices Up Front


Because of the contest’s compressed time frame, the team had to come up to speed quickly on new technology. Zhang spent two weeks learning the Intel Perceptual Computing SDK, and then the team designed as many different interaction techniques as Zhang could think of.

At the same time, a scriptwriter began writing stories and possible scenarios the team could code. They met and discussed the options, with Zhang pointing out strengths and weaknesses based on his knowledge of the SDK. He knew enough about the technical details to make informed decisions, so the team felt comfortable selecting what he described as “…the best story and the most interesting and applicable interactions.”

Zhang said that one of the most important early decisions they made was to keep the player fully involved in the game. For example, in the egg-hatching sequence early on, the player has a godlike role while creating the earth, making it rain, casting sunlight, and so on. There are many gestures required as the player sets up and learns.

In another sequence, the player has to catch the dinosaur. Zhang set up the system so that a piece of meat falls on the player's hand, and the dinosaur comes to pick up the meat (Figure 4). That action keeps the player interacting with the dinosaur and builds involvement. “We want to always keep the player immersed and consistent with the virtual world,” he said.

Figure 4: Feeding the baby dinosaur keeps the user engaged and builds involvement.

However, going forward those plans will require more effort. The demo includes so many new hand gestures that users struggled. “When I talked with the people who were playing the game in Intel's booth at CES,” Zhang said, “I found they couldn't figure out how to play the game by themselves, because there are many levels with different gestures for each level. We learned that it wasn’t as intuitive as we had thought, and that the design must be more intuitive when we add new interactive methods. We will definitely keep that in mind in our next project.”

The ARPedia team introduced two key gestures in their entry. One is “two hands open” and the other is “one hand open, fingers outstretched.” The two-hands-open gesture, which they use to start the application, was a straightforward coding effort. But coding the second gesture took more work.

Figure 5: The team struggled to make sure the camera didn’t detect the wrist as a palm point.

“The original gesture of ‘hand-open’ was not very precise. Sometimes the wrist was detected as a palm point,” Zhang explained. “Then the fist was detected as one finger, and the system thought that meant openness, which was wrong. So we designed a new hand-open gesture that is recognized when at least two fingers are stretching out.” They then added text hints on the screen to guide the user through the additions (Figure 5).

The Intel® Perceptual Computing SDK


The ARPedia team used the Intel Perceptual Computing SDK 2013 and especially appreciated the ease of camera calibration, application debugging, and support for speech recognition, facial analysis, close-range depth tracking, and AR. It allows multiple perceptual computing applications to share input devices and contains a privacy notification to tell users when the RGB and depth cameras are turned on. The SDK was designed to easily add more usage modes, add new input hardware, support new game engines and customized algorithms, and support new programming languages.

The utilities include C/C++ components such as PXCUPipeline(C) and UtilPipeline(C++). These components are mainly used to set up and manage the pipeline sessions. The frameworks and session ports include ports for Unity 3D, processing, other frameworks and game engines, and ports for programming languages such as C# and Java*. The SDK interfaces include core framework APIs, I/O classes, and algorithms. The perceptual computing applications interact with the SDK through these three main functional blocks.

“The Intel [Perceptual Computing] SDK was quite helpful,” Zhang said. “We didn’t encounter any problems when we were developing this application. We were able to become productive in a very short amount of time.”

Intel® RealSense™ Technology


Developers around the world are learning more about Intel® RealSense™ technology. Announced at CES 2014, Intel RealSense technology is the new name and brand for what was formerly called Intel® Perceptual Computing technology. The intuitive new user interface has features such as gesture and voice, which Intel brought to the market in 2013. With Intel RealSense technology, users will have new, additional features, including scanning, modifying, printing, and sharing in 3D, plus major advances in AR interfaces. These new features will yield games and applications where users can naturally manipulate and play with scanned 3D objects using advanced hand- and finger-sensing technology.

Zhang has now seen directly what other developers are doing with AR technology. At CES 2014, he viewed several demos from around the world. While each demo was unique and sought to achieve different objectives, he was encouraged by the rapidly evolving 3D camera technology. “It is such a big deal to have hand-gesture detection within the SDK. People still can use the camera in a different way, but the basics are there for them. I suggest to developers that they do their homework with this technology and find capabilities to fully develop their ideas.”

With advanced hand-and-finger tracking, developers can put their users in situations where they can control devices with heightened precision, from the simplest commands to intricate 3D manipulations. Coupled with natural-language voice technology and accurate facial recognition, devices will get to know their users on a new level.

Depth sensing makes gaming feel more immersive, and accurate hand-and-finger tracking brings exceptional precision to any virtual adventure. Games become more immersive and fun. Using AR technology and finger sensing, developers will be able to blend the real world with the virtual world.

Zhang believes the coming Intel RealSense 3D camera will be uniquely suited to application scenarios he is familiar with. “From what I have heard, it’s going to be even better—more accurate, more features, more intuitive. We are looking forward to it. There will be 3D face tracking and other great features, too. It’s the first 3D camera for a laptop that can serve as a motion-sensing device,” he said, “but it’s different than a Kinect. It can cover as much area as an in-house 3D camera, too. I think the new Intel camera will be a better device for manufacturers to integrate into laptops and tablets. That’s very important as a micro user-interface device for portability as well. We will definitely develop a lot of programs in the future with this camera.”

Maya 3D


The ARPedia team used Maya 3D animation software to continually tweak their small, realistic model of the well-known Tyrannosaurus rex dinosaur. By building the right model—with realistic movements and fine, detailed colors—the basics were in place for the rest of the application.

Maya is the gold standard for creating 3D computer animation, modeling, simulation, rendering, and more. It’s a highly extensible production platform, supports next-generation display technology, boasts accelerated modeling workflows, and handles complex data. The team was new to 3D software, but they had some experience with Maya and were able to update and integrate their existing graphics easily. Zhang said the team spent extra time on the graphics. “We spent almost a month on designing and modifying the graphics to make everything look better and to improve the interaction method as well,” he said.

Unity 3D


The team chose the Unity engine as the foundation for their application. Unity is a powerful rendering engine used to create interactive 3D and 2D content. Known as both an application builder and a game development tool, the Unity toolset is intuitive and easy to use, and it reliably supports multi-platform development. It’s an ideal solution for beginners and veterans alike who are looking to develop simulations, casual and serious games, and applications for web, mobile, or console.

Zhang said the Unity decision was an easy one. “We developed all our AR applications using Unity, including this one,” he said. “We know it and trust it to do the things we need.” He was able to import meshes from Maya as proprietary 3D application files quickly and easily, saving time and energy.

Information Today, Games Tomorrow


ARPedia has many interesting angles for future work. For starters, the team sees opportunities for games and other applications, using their work from the Intel Perceptual Computing Challenge as a foundation. “We’ve talked a lot with some interested parties,” Zhang said. “They want us to draw up this demo into a full, complete version as well. Hopefully, we can find a place in the market. We will add many more dinosaurs to the game and introduce a full knowledge of these dinosaurs to gain more interest. They are in an interesting environment, and we’ll design more interesting interactions around it.”

“We also plan to design a pet game where users can breed and raise their own virtual dinosaur. They’ll have their own specific collections, and they can show off with each other. We will make it a network game as well. We plan to do a lot more scenes for a new version.”

The team was very surprised to win, as they were not familiar with the work of other development teams around the world. “We didn’t know other people’s work. We work on our own things, and we don’t get much opportunity to see what others are doing,” Zhang said. Now they know where they fit, and they’re ready for more. “The contest gave us motivation to prove ourselves and a chance to compare and communicate with other developers. We are very thankful to Intel for this opportunity. We now know more about the primary technologies around the world, and we have more confidence in developing augmented reality applications in the future.”

Resources


Intel® Developer Zone
Intel® Perceptual Computing Challenge
Intel® RealSense™ Technology
Intel® Perceptual Computing SDK
Check out the compatibility guide in the perceptual computing documentation to ensure that your existing applications can take advantage of the Intel® RealSense™ 3D Camera.
The Intel® Perceptual Computing SDK 2013 R7 release notes.
Maya* software overview
Unity*

 

Intel, the Intel logo, and Intel RealSense are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2014. Intel Corporation. All rights reserved. 

CyberLink and Intel Use Collaborative Design to Create MediaStory*


By Dominic Milano

Touch-optimized for Windows* 8 and designed from the ground up to help virtually anyone tell their personal stories quickly and easily, CyberLink MediaStory* software greatly streamlines the process of organizing and accessing the mountains of pictures and videos stored across phones, cameras, tablets, PCs, and the cloud to assemble professional-looking videos that can be shared in minutes.


Cyberlink MediaStory: Innovating the Storytelling Experience.

MediaStory grew from the collaboration between CyberLink and Intel. The two companies' long history of working together has been well documented by Intel® Software Adrenaline magazine. Most of those earlier stories focused on how Intel and CyberLink engineers teamed up to ensure that CyberLink's consumer multimedia content creation and playback applications were coded and optimized to take advantage of the latest Intel® processor features such as multi-threading, Intel® Quick Sync Video hardware-accelerated media processing, and more. This time, the story begins well before a single line of code was written.


CyberLink MediaStory* lets users create videos to tell their stories using a more streamlined UI than PowerDirector*.

The Story Behind MediaStory

In creating MediaStory, CyberLink and Intel approached the collaboration in a completely new way. "Complex video-editing solutions have been available for many years," Louis Chen, CyberLink's VP of Business Development, said. "We wanted a new user-centric approach to innovate in how we could bring the storytelling software to life." For two intense days in February 2013, executives from both companies, along with a variety of stakeholders—engineers, product designers, user experience specialists, and marketing people—assembled at CyberLink's offices in Taipei to use the Design Thinking process to shape MediaStory. They interviewed end users to get a deeper understanding of their pain points and comfort zones, reviewed market data, and brainstormed possible solutions to address the target users' needs and delight them.


Personnel from Intel and CyberLink brainstorming in Taipei.

In the end, they zeroed in on a specific market need by first identifying a challenge: The explosion of mobile devices capable of producing great-looking photos and HD video was inundating consumers with too much content. As Chen put it, "We realized that people are overwhelmed with media. They want (need) to easily create great-looking photos and videos, and keep track of it all on whatever devices they have at hand." With that in mind, the two tech companies kicked into high gear.


User pain points and comfort zones were noted.

Wanting to bring a solution to market as quickly as possible, they targeted a Q3 release date—which significantly shortened CyberLink's usual development cycle. The MediaStory design document was drawn up in less than two months, a process that normally takes four to six months. Taking MediaStory from idea to its first deployable iteration took just four months. "That was unprecedented for CyberLink," Chen said. During that time, iterative testing through numerous focus groups put the nascent software through its paces, and the participants shared their feedback with the design team so it could further refine and add features.


Solutions were sketched.

Storytelling Software

According to CyberLink, MediaStory falls into a new category: Storytelling software. MediaStory helps users find their personal content and select the most interesting video clips or pictures based on their preferences and personal history. Then it compiles a 30-second (or longer) video "story" that can be easily shared on social media or video-sharing sites.

To help identify people in pictures and simplify the tagging process, facial-recognition technology was used. To help the software determine whether a photo or video clip belongs in a particular story, geo-tagging came into play. And to create context-driven stories automatically, MediaStory accesses events listed in the user's Google* and/or Outlook* calendars, matching them with the time stamps and date stamps in photo and video files.

Describing its geo-location service model, Chen explained that "When people search their photos, they're typically looking for shots from a specific place—'find my pictures from Japan.'" So CyberLink decided to build its own, in-house service. "We needed something simpler than a geo-service that's used for navigation—something simpler, but smarter."

CyberLink tailored its new location service to the MediaStory experience. "The precision and the definition of location is fuzzier than needed for navigation," Chen said. They used three levels of abstraction: country, city, and point-of-interest, leaving out details such as street name and house number.
The same fuzzy precision was required for tapping into calendar services. "More and more, people are using online cloud calendars, so by tapping into Google and Outlook, we could gather intelligence about events. We needed to use fuzzy precision to account for the fact that people aren't always on time to meetings, parties, or dinner," continued Chen.
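That fuzzy time matching can be illustrated with a small sketch (hypothetical C#; CyberLink has not published its implementation): each calendar event's window is padded on both sides before photo timestamps are tested against it.

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: pad a calendar event's window so photos taken a little
// before or after the scheduled time are still clustered with that event.
class CalendarEvent
{
    public string Title;
    public DateTime Start;
    public DateTime End;
}

static class FuzzyMatcher
{
    // Hypothetical tolerance; the real product's value is not documented.
    static readonly TimeSpan Slack = TimeSpan.FromHours(1);

    public static IEnumerable<DateTime> PhotosForEvent(
        CalendarEvent evt, IEnumerable<DateTime> photoTimestamps)
    {
        DateTime from = evt.Start - Slack;   // people arrive early or late,
        DateTime to   = evt.End + Slack;     // so widen the window
        return photoTimestamps.Where(t => t >= from && t <= to);
    }
}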

Chen was quick to point out that, "All of the magic happens in the background without the user even knowing how it happens." The software plus service uses metadata to automatically cluster events in various dimensions and retrieves content accordingly. "We have to thank the mobile revolution for giving us all this content and the ability to derive intelligence from it so we can assemble meaningful, context-driven stories," said Chen.

Touch-based Usability

From its inception, CyberLink and Intel wanted to develop a new UI for MediaStory. One that, as Chen put it, "…didn't scare the user away from using the product." Video-editing applications, even some that are renowned for being consumer-friendly, such as CyberLink's own PowerDirector*, have been known to intimidate the uninitiated. "We wanted fewer buttons, more content, and fewer pop-up menus asking for data to be input," said Chen. The goal was to give users immediate results. If any refinements were needed, the visual feedback had to be immediate.

"Entry-level users aren't always able to express what they want to achieve until they see the results of what they're doing," Chen said. With that in mind, CyberLink and Intel set out to create a touch-optimized interface that dynamically scales to display more content as more screen real estate becomes available.
Touch-optimized interaction proved to be challenging. "Designing for both touch and a keyboard and mouse wasn't trivial," Chen explained. The usability team went through five design iterations over a period of two months to ensure that the usage fit both types of interaction and tested the iterations with focus groups. "How you select content from a library is very different when you're using a mouse versus gesture in Windows 8," he said. "We had to smartly detect how users were choosing to interact with the software—with a mouse or touch—and add UI elements to accommodate both."

Less Isn't Just More, It's Harder

The most difficult aspect of designing MediaStory was fine-tuning the trade-off between usability and features: what to expose to the user and what to handle under the hood. "We wanted to handle a complex task in a simple manner. That meant our application had to be very intelligent," said Chen. The goal was to create something that was different from a conventional video-creation product. Product designs are often driven by feature comparison charts—something that consumers often use to make buying decisions. "That old way of thinking didn't apply," said Chen, who for years was CyberLink's marketing director. "Mass-market consumers aren't just using their PCs to create content anymore, they're using mobile devices, Ultrabook™ devices, tablets, and 2-in-1s. To win them over, we had to create an app that looks simple on the outside but is incredibly sophisticated on the inside. That was our biggest challenge."


CyberLink PowerDirector* is a full-featured, consumer-friendly video-editing application, but its UI is relatively complex.

Windows Portability

From a development standpoint, Windows 8 still offers the largest user base when it comes to storing personal memories. "We wanted to offer the experience directly on the most popular platform," Chen said. "That platform is Windows."  In addition, Microsoft Windows 8 touch/gesture APIs gave the development team some needed flexibility. "We could code once and deploy the software across a variety of form factors such as Ultrabook devices and 2-in-1s." That said, CyberLink is investigating the idea of bringing the MediaStory experience to more mobile platforms—specifically Google Android* devices, including tablets with Intel Inside®.

Intel® Software Development Tools in Action

MediaStory's media pipeline, like most CyberLink multimedia applications, is optimized for 4th generation Intel® Core™ processors. With its years of experience optimizing code for Intel® architecture, it didn't take CyberLink long to integrate its existing video engine to deliver what Chen described as "a quick, responsive software experience."

Intel® VTune™ Performance Analyzer and Intel® Threading Building Blocks helped the team improve launch time and boost playback performance. They also used Battery Life Analyzer to monitor software and hardware activities that affect power management and battery life. "These tools can easily pinpoint potential trouble spots and bottlenecks," said Chen. "They help us improve and measure performance, and save us significant amounts of time when identifying issues related to performance and power optimization."

The Intel® Media SDK proved to be another tremendous timesaver. "It hides complex hardware design logic and keeps software architecture clean," said Chen. "Using the Intel Media SDK, we don't need to worry about each processor generation's hardware compatibility issues. Whenever Intel improves the quality or performance of a codec, the improvement is delivered to the end user in a simple driver update." CyberLink products benefit directly from the enhancement without having to write any new code. "With Intel Quick Sync Video technology, we're seeing a 10x performance boost. For users, that translates to lightning fast media conversions, especially for those who need to convert video files to formats required by social media sites or video-sharing sites." MediaStory is also able to leverage the Intel® Common Connectivity Framework (CCF), giving users the ability to wirelessly connect their smartphones, tablets, desktop PCs, and other gadgets.

Collaboration Through Ideation

Thinking back, Chen said that shifting the collaboration with Intel to the start of the product design process "…helped CyberLink stay focused and prioritized solving real user needs." This new model of collaboration shortened the planning cycle and introduced more iterations by gathering user feedback during multiple focus-group validation tests that were performed earlier and more frequently during the cycle.
"Our collaboration with Intel has reached a new level," said Chen. "It helped us prioritize end-user needs, shape the right technology solutions, and create better user experiences."

Conservative Morphological Anti-Aliasing (CMAA) - Update


This article was taken from a blog posting on the Intel Developer Zone by Leigh Davies at Intel Corporation, highlighting work and results completed by Leigh and his colleague Filip Strugar on the new AA technique referred to as Conservative Morphological Anti-Aliasing. Below is the content of the blog along with the available code project download for your examination.

This sample presents a new, image-based, post-processing anti-aliasing technique referred to as Conservative Morphological Anti-Aliasing (CMAA); the sample can be downloaded here. The technique was originally developed by Filip Strugar at Intel for use in GRID 2 by Codemasters®, to offer a high-performance alternative to traditional multisample anti-aliasing (MSAA) while addressing artistic concerns with existing post-processing anti-aliasing techniques. The sample allows CMAA to be compared with several popular post-processing techniques, together with hardware MSAA, in a real-time rendered scene as well as on an existing image. The scene is rendered using a simple HDR technique and includes basic animation to allow the user to compare how the different techniques cope with temporal artifacts in addition to static portions of the image.

 

Figure 1: CMAA Sample using HDR and animating geometry

MSAA has long been used to reduce aliasing in computer games and significantly improve their visual appearance. Basic MSAA works by running the pixel shader once per pixel but running the coverage and occlusion tests at higher than normal resolution, typically 2x through 8x, and then merging the results together. While significantly faster than super-sampling, it still represents a significant additional cost compared to no anti-aliasing and is difficult to combine with certain techniques; for example, this sample uses a custom fullscreen pass to get a correct post-tone-mapping MSAA resolve for HDR (see the Humus article in ShaderX6 [6]).

An alternative to MSAA is image-based post-process anti-aliasing (PPAA), which became practical with GPU ports of morphological anti-aliasing (MLAA) [Reshetov 2009] [1] and further developments such as “Enhanced Subpixel Morphological Antialiasing” (SMAA) [2] and NVIDIA’s “Fast Approximate Anti-Aliasing” (FXAA) [3]. Compared to MSAA, these PPAA techniques are easy to implement and work in scenarios where MSAA does not (such as deferred lighting and other non-geometry-based aliasing), but they lack adequate sub-pixel accuracy and are less temporally stable. They also cause perceptible blurring of textures and text, since it is difficult for edge-detection algorithms to distinguish between intentional colour discontinuities and unwanted aliasing caused by imperfect rendering.
Currently, two of the most popular PPAA algorithms are:

 

1. SMAA is an algorithm based on MLAA but with a number of innovations, improvements, and quality/performance presets. It implements advanced pattern recognition and local contrast adaptation, and the more expensive variations use temporal super-sampling to reduce temporal instability and improve quality. The SMAA algorithm version referenced in this document is the latest public code, v2.7.

2. FXAA is a much faster effect. However, FXAA has simpler colour-discontinuity shape detection, causing substantial (and frequently unwanted) image blurring. It also has a fairly limited kernel size by default, so it doesn't sufficiently anti-alias longer edge shapes, while increasing the kernel size impacts performance significantly. The FXAA algorithm version referenced in this document is v3.8 unless otherwise specified (the newest v3.11 was added to the sample at the last minute, in addition to 3.8).

 

In this sample we introduce a new technique called Conservative Morphological Anti-Aliasing (CMAA). CMAA addresses two requirements that are currently not addressed by existing techniques:

1. To run efficiently on low- to medium-range GPU hardware, such as integrated GPUs, while providing a quality anti-aliasing solution. A budget of under 3 ms at a resolution of 1600x900, running on a 15-watt 4th generation Intel® Core™ processor, was used as a guide when developing the technique.

2. To be minimally invasive so it can be acceptable as a replacement for 2x MSAA in a wide range of applications, including worst-case scenarios such as text, repeating patterns, certain geometries (power lines, mesh fences, foliage), and moving images.

 

CMAA is positioned between FXAA and SMAA 1x in computation cost (0.9-1.2x the cost of default FXAA 3.11 and 0.45-0.75x the cost of SMAA 1x) on Intel 4th Generation HD Graphics hardware and above. Compared to FXAA, CMAA provides significantly better image quality and temporal stability, as it correctly handles edge lines up to 64 pixels long and is based on an algorithm that handles only symmetrical discontinuities in order to avoid unwanted blurring (thus being more conservative). Compared to SMAA 1x, it provides less anti-aliasing because it handles fewer shape types, but it also causes less blurring and shape distortion and has better temporal stability (it is less affected by small frame-to-frame image changes).

CMAA has four basic logical steps (not necessarily matching the order in the implementation):

  1. Image analysis for colour discontinuities (afterwards stored in a local compressed 'edge' buffer). The method used is not unique to CMAA.
  2. Extracting locally dominant edges with a small kernel. (Unique variation of existing algorithms).
  3. Handling of simple shapes.
  4. Handling of symmetrical long edge shape. (Unique take on the original MLAA shape handling algorithm.)

 

Step 1: Image analysis for colour discontinuities (edges)

Edge detection is performed by comparing neighboring colours using:
 

  • Sum of per-channel Luma-weighted colour difference in sRGB colour space (default)
  • Luminance value calculated from the input in sRGB colour space (faster)
  • Weighted Euclidean distance [6] (highest quality, slowest)
     

An edge (discontinuity) exists if the difference between neighboring pixel values is above a preset threshold (which is determined empirically).

 

Figure 2: Showing results of a default edge detection algorithm:
dot( abs(colorA.rgb-colorB.rgb), float3(0.2126,0.7152,0.0722)) > fThreshold
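
Outside the shader, the same default test is easy to reproduce for experimentation. Below is a minimal CPU-side sketch in C# (illustrative only, not the sample's HLSL; the threshold value is a placeholder you would tune):

using System;

static class EdgeDetection
{
    // Luma-weighted, per-channel colour difference, as in Figure 2.
    // colorA/colorB are RGB triplets in the 0..1 range (sRGB, as in the sample).
    public static bool IsEdge(float[] colorA, float[] colorB, float threshold)
    {
        float d = Math.Abs(colorA[0] - colorB[0]) * 0.2126f
                + Math.Abs(colorA[1] - colorB[1]) * 0.7152f
                + Math.Abs(colorA[2] - colorB[2]) * 0.0722f;
        return d > threshold;   // threshold is chosen empirically
    }
}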

 

Step 2: Locally dominant edge detection (or, non-dominant edge pruning)
This step serves a similar function to “local contrast adaptation” in SMAA and “local contrast test” in FXAA but uses a smaller kernel. For each edge detected in Step 1, the colour delta value above the threshold (dEc) is compared to those of the 12 neighboring edges (dEn):

 

Figure 3: The edge remains an edge if its dEc > lerp( average(dEn), max(dEn), ldeFactor), where ldeFactor is empirically chosen (defaults to 0.35).

 

This smaller local adaptation kernel is somewhat less efficient at increasing the effective edge-detection range. However, it is more effective at preventing blurring of small shapes (such as text), reduces local shape interference from less noticeable edges, avoids some of the pitfalls of large kernels (a visible kernel-sized transition from un-blurred to blurry), and has better performance.
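
A matching sketch of the pruning rule from Figure 3, again in illustrative C# rather than the sample's shader code (the 0.35 default comes from the figure caption):

using System.Linq;

static class EdgePruning
{
    static float Lerp(float a, float b, float t) => a + (b - a) * t;

    // Keep an edge only if its colour delta (dEc) dominates the 12 neighbouring deltas (dEn).
    public static bool IsLocallyDominant(float dEc, float[] dEn, float ldeFactor = 0.35f)
    {
        return dEc > Lerp(dEn.Average(), dEn.Max(), ldeFactor);
    }
}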

 

Step 3: Handling of simple shapes
Edges detected in Step 1 and refined in Step 2 are used to make assumptions about the shape of the underlying edge before rasterization (the virtual shape). For simple shape handling, all pixels are analyzed for the existence of 2-, 3-, and 4-edge aliasing shapes, and colour transfer is applied to match the virtual shape's colour coverage and achieve the local anti-aliasing effect (Figure 4). While this colour transfer is not always symmetrical, the amount of shape distortion is minimized to sub-pixel size.

 

Figure 4: 2-edge, 3-edge and 4-edge shapes; reconstructed virtual shape shown in yellow; black arrows showing anti-aliasing colour blending direction

 

Step 4: Handling of symmetrical long edge shape.
4a. Each edge-bearing pixel is analyzed for a potential Z-shape, representing the center of the virtual shape's rasterization pixel step (which is mostly a triangle edge). The criterion used for this detection is illustrated in Figure 5. Four Z-shape orientations (at 90° increments) are handled.

 

Figure 5: The Z-shape detection criterion is true if the edges shown in blue are present while the red ones are not; green arrows show subsequent edge tracing

 

4b. For each detected Z-shape, the length of the edge to the left and right is determined by tracing the horizontal (for the two horizontal Z-shape orientations) edges on both sides, stopping when no edge is present on either side or when a vertical edge is encountered (Figure 6).
4c. The edge length from the previous step is used to reconstruct the location of the virtual shape edge and to apply colour transfer (to both sides of the Z-shape) to match the coverage it would have at each pixel. This step overrides any anti-aliasing done in Step 3 on the same pixels.

 

 

Figure 6: Long edge (Z-shapes): edge length tracing marked blue, with Z shape at center; reconstructed virtual shape shown in yellow; black arrows showing anti-aliasing colour blending direction

 

The inherent symmetry of this approach better preserves the overall average image colour and the original shapes, ignores borderline cases, and is more temporally stable. Changes of one pixel (or a few pixels) do not induce drastic colour transfer and shape modification (when compared to SMAA 1x, FXAA 3.8/3.11, and older MLAA-based techniques).

 

 

Figure 7: Typical detection and handling of symmetrical Z shapes (circled in yellow)

 

 

Figure 8: All CMAA shapes: original image, edge detection and final anti-aliased image (with/without edges)

 

The sample UI allows a direct comparison between several anti-aliasing techniques, selectable from a drop-down menu, along with several debug features. All the techniques can be viewed in high detail using a zoom box that can be enabled from the UI and positioned with a right mouse click. For both CMAA and SAA, additional debug information is shown that highlights the actual edges detected by the algorithm, and a slider allows you to adjust the threshold used for edge detection. In the case of CMAA, both the edge threshold and the non-dominant edge removal threshold can be modified.

 

The effect of modifying the threshold on performance can be seen if the application is run with vsync disabled. GPU performance metrics are displayed in the upper-left corner of the application, showing the overall cost of rendering the scene and the time taken by the post-processing anti-aliasing code. When viewing the stats, additional debug features such as the zoombox and the edge view should be disabled, as they both lower performance by forcing sub-optimal code paths to be used. When viewing CMAA in the zoombox with “Show Edges” enabled, the zoombox also animates to show the effects of applying CMAA to the image; this doesn’t affect the rest of the display.

 

For precise profiling of each technique, the “Run benchmark for: …” button can be used to activate automatic multiple-frame sampling and comparison, with the results (the cost delta compared to the base non-AA version) displayed in a message box after the run is finished.

 

 

Figure 9: Debug information including zoombox and edge detection overlay.

 

In addition to showing the effect of the various post-processing effects on the real-time scene, the application allows a static image to be loaded and used as the source for the effect; the currently supported file format is PNG. A synthetic sample image is provided in the sample's media directory (Figure 9). Attempting to run the sample with 2x and 4x MSAA will have no effect, as these would normally affect the image source, but CMAA, SMAA, FXAA, and SAA can all be applied to the image. This feature allows anyone considering one of the post-processing techniques in the sample to quickly load images taken from their own application and experiment with the various threshold parameters.

 

The following figures show a number of quality and performance comparisons:

 

 

Figure 10: Performance impact (frames per second) of an older implementation of CMAA and of MSAA on a Consumer Ultra-Low Voltage (CULV) Intel® Core™ i7-4610Y processor with HD Graphics 4200, in GRID 2 by Codemasters®

 

 

Figure 11: Cost and scaling of CMAA 1.3 and other post-process anti-aliasing effects, measured using the sample from this article applied to 10 screenshots from various games and averaged, on Intel 4th generation processors (HD Graphics 5000 and HD Graphics 5200) and an AMD R9 290X, using different resolutions for different GPUs

 

Figure 12: Quality comparison for text and image anti-aliasing. CMAA 1.3 manages high quality anti-aliasing of the image while preserving text and without over blurring the geometry.

 

 

Figure 13: Quality comparison for synthetic shapes and game scenes with high frequency textures. CMAA preserves original high frequency texture data better than FXAA 3.11 and SMAA 1x, while still applying adequate anti-aliasing (although below the quality level of SMAA 1x).

 

 

Figure 14: Quality comparison in a game 3D scene. CMAA preserves most of the original high frequency texture data and original geometry shapes, while still applying adequate anti-aliasing.

 

 

Figure 15: Impact of various techniques on GUI elements. Any post-process AA should be applied before the GUI is drawn to avoid unwanted blurring, but there are cases when this is unavoidable (such as in a driver implementation or when the GUI is part of the 3D scene)

 

 

References:
[1] Reshetov, A. 2009. “Morphological Antialiasing.” In HPG ’09: Proceedings of the Conference on High Performance Graphics 2009, pages 109–116, New York, NY, USA. ACM.
[2] Jimenez, J., Echevarria, J. I., Sousa, T., and Gutierrez, D. 2012. “SMAA: Enhanced Subpixel Morphological Antialiasing.” Computer Graphics Forum (Proc. EUROGRAPHICS 2012).
[3] Lottes, T. 2011. “FXAA” (Fast Approximate Anti-Aliasing). NVIDIA. http://developer.download.nvidia.com/assets/gamedev/files/sdk/11/FXAA_WhitePaper.pdf
[4] Biri, V., Herubel, A., and Deverly, S. 2010. “Practical Morphological Antialiasing on the GPU.” In ACM SIGGRAPH 2010 Talks (SIGGRAPH ’10), Article 45. ACM, New York, NY, USA. DOI=10.1145/1837026.1837085 http://doi.acm.org/10.1145/1837026.1837085 http://igm.univ-mlv.fr/~biri/mlaa-gpu/MLAAGPU.pdf
[6] “Post-tonemapping Resolve for High Quality HDR Antialiasing in D3D10.” In ShaderX6 (the Humus article).

 

 



Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters


The Intel® Direct Sparse Solver for Clusters (Intel® CPardiso) is a powerful tool set for solving systems of linear equations with sparse matrices that have millions of rows and columns.

Intel® CPardiso provides an advanced implementation of modern algorithms and can be considered an extension of Intel MKL PARDISO to cluster computations.

For more experienced users, Intel® CPardiso offers enough insight into the solver to fine-tune it for better performance. Intel® CPardiso is available starting with Intel MKL 11.2 Beta.

The attached Word document contains training material on Intel® Cluster PARDISO, which includes a product overview, technical requirements, and frequently asked questions about CPardiso.


Hot Shots* Warps Conventional Touch Gaming


By William Van Winkle

Downloads


Hot Shots* Warps Conventional Touch Gaming [PDF 656KB]

When Apple brought the world of touch to smartphones, developers leaped into the new opportunity. A similar opportunity is hitting the industry again now in the form of touch screen PCs. The Ultrabook™ device movement may have ignited touch’s potential, but the trend is now spreading to other notebook types, desktop displays, and all-in-one (AIO) designs. The field is wide open, and Intel is spurring developers’ creativity around the world with contests, some of which contain categories ripe for touch technology innovation. Recently, the Intel® App Innovation Contest in partnership with the Intel® Developer Zone sent developers a challenge to find the next big target in touch, and Adam Hill may have blasted it with his award-winning Hot Shots* spaceship game (Figure 1).


Figure 1: Adam Hill surveys the main interface of Hot Shots* on the Lenovo Horizon* AIO.

Origins, Form, and Function


Like many developers, Hill has pursued programming professionally while carrying on a fondness for many classic video games, including Asteroids* (1979), Micro Machines* (1991), and Geometry Wars* (2003). He progressed through writing spreadsheet macros as an administrator to receiving on-the-job training as a software programmer, but always with an eye on entertainment and one day pursuing game creation. By the time Intel ran its first App Innovation Contest in 2012, Hill felt qualified to take a stab at it. That project, a tunnel game called Celerity*, gave Hill several valuable insights. Hill knew he had to pare down his ideas and keep them focused. Most of all, he knew that whatever he did needed to appeal to himself; after all, he would be the first user, and if the game wasn’t fun for him, others were unlikely to enjoy it either.

When the 2013 Intel App Innovation Contest opened in July, Hill went to work. Intel announced that competitors would receive a Lenovo Horizon AIO, based on a 27-inch touch screen, and coordinators specifically requested “a compelling use case” for the platform. Hill resolved to give them that and more. Logging in hundreds of hours over six months, the resulting game looked very much like a mashup of the Asteroids and Geometry Wars games from earlier years. Brimming with stunning graphics, gravity physics, and a highly competitive player-versus-player co-op design (Figure 2), Hill admits that the aim of the game focused on one thing: chaos.


Figure 2: Hot Shots* clearly favors touch-based input, but it is also compatible with gamepad, keyboard, and even Lenovo joystick input.

Hill wanted to create a game that would revive the old school face-to-face element of game sharing rather than competing remotely over the Internet. Players’ spacecraft materialize on the screen à la Asteroids, streaming exhaust behind and firing glowing projectiles ahead. A star in the center of the screen exerts gravity on all objects. (Players can also spawn a second sun on the screen.) A sort of wire-mesh grid rests under the entire play area, warping with the gravity of objects above it and giving the game a 3D feel lacking in older arcade favorites. Up to nine players can use the on-screen Touch controls located around the screen’s edge, including Lenovo’s touch screen-compatible joysticks and strikers, and additional players can join through a keyboard and Microsoft Xbox*-compatible controllers. Gameplay was designed for sessions of any duration. Players can simply jump into the madness and destruction, enjoy the havoc, and then leave without disrupting other players.

As with most modern touch screen panels, Lenovo’s display is 10-point touch-capable. Hill situated his player control zones around the screen’s edge, but he gave one of these edge regions over to game controls, hence having a maximum of nine touch players instead of 10 (Figure 3). The game can be played with one finger, but the experience tends to be better with two, which caps the touch player count at five.


Figure 3: Hill could fit up to 10 touch control regions around the edge of the display. One of these ended up being used for in-game menu controls.

“I used on-screen icon buttons because I need complete strangers and non-English speakers to understand how to control the app at a glance,” said Hill. “Many of the controls are binary—on or off—and a throwback to retro games like Asteroids. Also, I’m dealing with a large number of independent touches. When a touch comes down, for simplicity’s sake, I want to understand which user out of several created it, and that’s easier when they’re touching within a zone. From a technical perspective, it’s largely a case of knowing your API, designing out gestural conflicts, and considering the idea of visual feedback.”
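
The zone test itself is simple. The sketch below is illustrative C#; the actual Hot Shots layout, types, and player bookkeeping are not published. Each incoming touch is mapped to whichever edge region contains it, which identifies the owning player.

// Illustrative sketch of mapping a touch position to a player control zone.
// The zone layout, sizes, and player-count handling in Hot Shots are assumptions.
struct Zone
{
    public int PlayerId;
    public float X, Y, Width, Height;

    public bool Contains(float px, float py) =>
        px >= X && px < X + Width && py >= Y && py < Y + Height;
}

static class TouchRouting
{
    // Returns the owning player's id, or -1 if the touch landed outside all zones.
    public static int PlayerForTouch(Zone[] zones, float touchX, float touchY)
    {
        foreach (var zone in zones)
            if (zone.Contains(touchX, touchY))
                return zone.PlayerId;
        return -1;
    }
}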

To round out the game, Hill sought the services of freelance musician Patrick Ytting, who delivered a stellar soundtrack. The stunning visuals can be attributed to gameplay designer and graphic artist Thomas Tamblyn. Both men were willing to work for a slice of the intellectual property ownership and revenue, which was the only way Hill, as a lone programmer, could feasibly complete the project.

Development Challenges Addressed


Particle System and Graphics Load

Not surprisingly, the graphics load behind Hot Shots could become significant. With two players, the GPU load wasn’t bad, but with as few as three players, the particles and warp grid disturbances, plus the sun’s crepuscular rays and other effects, could seize a formidable amount of processing resources. This load compounded if the underlying engine wasn’t coded properly. As a result, Hill offloaded as much of the graphics processing as possible to the GPU. In the quest for fluid performance that would scale across both integrated and discrete graphics processors, Hill rewrote his particle engine from scratch three times.

“As soon as I knew something was an obstacle, I’d go to town on it,” said Hill. “I’d rework it until it was one of the fastest things in the system rather than the slowest. The usual method for this was offloading to the GPU, although for particles the trick was to use a simple pool of particles in a boring array rather than a fancy linked list or queue. The optimization decisions you make are always specific to the program at hand.”

Hill’s initial particle system was based on a class holding a LinkedList of particle objects. He thought that he would create new particles, add them to the LinkedList, and be able to remove dead particles from anywhere in the list without needing to re-index. Unfortunately, Hill had overlooked the performance impact of rapidly creating and destroying a large number of new instances in C#, where at a certain rate of instantiations and destructions, the garbage collector would struggle under the burden and cause noticeable slowdowns.

To diagnose this performance trough, Hill began swapping software components. As soon as he eliminated particles, he found the culprit. However, he wanted the aesthetic look that large numbers of particles gave Hot Shots, so he began browsing through the source code of different particle engines for inspiration. He discovered that others generally preferred an object pool approach rather than a linked list, in which a FIFO queue of "dead" (free) particles is maintained. That is, one has a fixed-length array, and particle objects are created only once. Particles are logically activated and deactivated as needed by moving a reference to them into and out of the queue—recycling rather than recreating. As a result, the garbage collector never has to kick in, because nothing is being created or destroyed after the initial, near-instantaneous setup.

“The big take-away lesson is that during intensive mid-game processing, you don’t want to be using the ‘new’ keyword anywhere,” said Hill. “Load everything you can once at the beginning of the level, then recycle like crazy.”

Hill’s second particle engine was orders of magnitude faster and therefore more stable than its predecessor. In effect, more players could create more particles before any noticeable change. However, this newer engine still suffered from inefficiency in managing the queue of free particles. While particles were no longer being recreated each time, the queue management still felt wasteful. Particles such as a bullet trail or thrust trail have very short lives, and they fade and lose importance with age. Hill realized that he didn’t have to recycle particles only known to be free. Rather than iterate through the list until finding a free particle, he could simply use the next particle in the iteration. This allowed him to keep a single index value rather than manage an entire queue, resulting in much less work. By the time the index wrapped around to an already-used particle, that particle was likely to be either dead or at least mostly transparent. The user would probably never notice a sudden particle death, especially in the midst of other visual distraction. Essentially, Hill realized that he “…could get away with murder” in this context.

“The optimization for the v3 was not just technical,” he added, “but relied on an epiphany relating to the obscure relationship between the recycling algorithm and the ability of a user to perceive mostly transparent particles. In your game, you’ll have to weigh this. Maybe there’s another cheeky trick you can pull instead?”

The following code illustrates Hill’s particle engine before and after reworking:

// Version 2 Init
public override void Initialize()
{
    this.particles = new FastParticle[MaxParticles];
    this.freeParticles = new Queue<FastParticle>(MaxParticles);

    for (int i = 0; i < particles.Length; i++)
    {
        particles[i] = new FastParticle();
        freeParticles.Enqueue(particles[i]);
    }

    base.Initialize();
}

// Version 3 Init - can't get much more simple than this
public override void Initialize()
{
    this.particles = new FastParticle[totalParticles];

    for (int i = 0; i < particles.Length; i++)
    {
        particles[i] = new FastParticle();
    }

    base.Initialize();
}

// Version 3 Update, with Version 2's redundant code commented out
public override void Update(GameTime gameTime)
{
    foreach (var p in particles)
    {
        if (p.Active)
        {
            this.UpdateParticle(gameTime, p);

// No need to Queue up free particles any more
// Just wrap round and re-use regardless
// Recycle First. Recycle Hard. NO MERCY (or indeed, queuing).
            //if (!p.Active)
            //{
            //    this.freeParticles.Enqueue(p);
            //}
        }
    }

    base.Update(gameTime);
}
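
The emit path is not shown above, but the wrap-around recycling Hill describes comes down to a single rolling index. The fragment below is an illustrative sketch rather than Hill's code, assuming it lives in the same particle-system class as the Initialize and Update methods above; the nextParticleIndex field is an assumed name.

// Illustrative emit helper for the v3 approach: instead of dequeuing a known-free
// particle, take the next slot and wrap around. By the time the index comes back
// to a slot, that particle is usually dead or nearly transparent anyway.
private int nextParticleIndex;

private FastParticle NextParticle()
{
    var p = particles[nextParticleIndex];
    nextParticleIndex = (nextParticleIndex + 1) % particles.Length;  // single rolling index
    p.Active = true;   // reactivate (recycle) rather than allocate with 'new'
    return p;
}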

Speech Recognition

Hill found the time to experiment with non-essential game elements (Figure 4). Probably the most noticeable is the game’s “profanity police,” which uses speech recognition to scold players who swear by having hostile objects swarm toward them. (Remember that play was meant to focus on group settings, potentially with younger players mixed in.) However, the implementation proved quite tricky. The underlying engine assumes that all speech is meaningful, but the purpose here was to ignore most speech and focus on only known profanity. Hill was trying to shoehorn an API into doing the opposite of its original purpose. Predictably, he got a fair number of false positives.


Figure 4: Hot Shots* gameplay can get rather heated. Hill implemented a “profanity police” feature to encourage players to keep the experience appropriate for all ages.

“In response, I went contrary to the practice of providing the recognition database with a very limited set of words and gave it a colossal 20,000-word dictionary of words to ignore,” Hill explained. “This way, if it thought it had heard one of my 30 or so key swear words to the exclusion of 20,000 other random words, then there was a much higher probability that swearing had occurred. At the time I thought of it, I’d not heard of anyone else doing this, but I’ve since seen this type of profanity management employed by at least one EA Sports game.”
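
The general pattern (one grammar holding a huge ignore vocabulary plus a short target list) can be sketched with the .NET System.Speech API. This is an illustrative reconstruction; Hill's actual recognizer setup and word lists are not published.

using System;
using System.Speech.Recognition;

class ProfanityPolice
{
    public void Start(string[] ignoreWords, string[] swearWords)
    {
        var recognizer = new SpeechRecognitionEngine();

        // One grammar containing both the large "ignore" dictionary and the
        // short target list; the ignore words soak up most false positives.
        var allWords = new Choices();
        allWords.Add(ignoreWords);
        allWords.Add(swearWords);
        recognizer.LoadGrammar(new Grammar(new GrammarBuilder(allWords)));

        recognizer.SpeechRecognized += (s, e) =>
        {
            if (Array.IndexOf(swearWords, e.Result.Text) >= 0)
            {
                // Trigger the in-game penalty here, e.g. swarm hostiles toward the player.
            }
        };

        recognizer.SetInputToDefaultAudioDevice();
        recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }
}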

Touch Development

Hill enjoys working with XNA 4, but coding for touch screens on Windows* 8 can be a challenge as the built-in TouchPanel class tends to be under-supported given the industry move toward MonoGame*. Luckily, Hill already had a solution for this issue from his prior game, Celerity*. “The solution is to include Windows7.Multitouch.dll,” noted Hill. “Don't be put off by the name; it's still very useful for Windows 8 solutions. This provides access to lower-level touch handling, which is great for speed and reliability. While it supports a wrapped version of the WM_GESTURE API, I much prefer working directly with the WM_TOUCH API.”

Access to the TouchUp, TouchDown, and TouchMove events can be accomplished with the following necessary TouchHandler:

var touchHandler = Factory.CreateHandler<TouchHandler>(game.Window.Handle);

Hill also encountered an obstacle with “sticky touches.” In some cases, the lifting of a finger doesn’t result in a TouchUp event. Obviously, this can pose a serious problem when, for instance, stopping a ship from firing or adding additional users. The solution proved to be “touch culling.”

With culling, a countdown gets reset every time new information is received from a touch point. Effectively, a touch times out. At first glance this might seem like it would yield occasions when a depressed button would simply stop working, but in practice the sensor resolution of the touch screen is sufficient to register tiny finger movements, even when the user believes he or she is holding still. These small motions register as TouchMove events and restart the countdown.
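
A minimal sketch of that culling scheme follows (illustrative C#; the class, timeout value, and method names are assumptions rather than Hot Shots code): every touch carries a countdown that TouchDown/TouchMove events reset, and any touch whose countdown expires is treated as lifted.

using System;
using System.Collections.Generic;

// Illustrative touch-culling bookkeeping: if a touch point stops reporting,
// it times out and is treated as a TouchUp even if that event never arrived.
class TouchCuller
{
    private const double TimeoutSeconds = 0.25;   // placeholder value
    private readonly Dictionary<int, double> lastSeen = new Dictionary<int, double>();

    // Call from the TouchDown/TouchMove handlers.
    public void Refresh(int touchId, double nowSeconds) => lastSeen[touchId] = nowSeconds;

    // Call from the TouchUp handler.
    public void Remove(int touchId) => lastSeen.Remove(touchId);

    // Call once per frame; returns ids whose countdown expired ("sticky" touches).
    public List<int> CullExpired(double nowSeconds)
    {
        var expired = new List<int>();
        foreach (var kv in lastSeen)
            if (nowSeconds - kv.Value > TimeoutSeconds)
                expired.Add(kv.Key);
        foreach (var id in expired)
            lastSeen.Remove(id);
        return expired;
    }
}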

Lessons Learned, Advice Given


Hot Shots won the All-In-One (AIO) Category of the Intel App Innovation Contest 2013 for several good reasons. Clearly, the game makes excellent use of the large touch screen interface, but it does so in a way that fosters group involvement and bonding. Hot Shots carries forward some classic game nostalgia while adding several layers of graphical wizardry that make the app feel modern and leverage much of the horsepower that current-gen processors can offer. The visuals and soundtrack are drop-dead gorgeous, and, perhaps most of all, Hot Shots makes it intuitively clear how tomorrow’s AIO systems can serve a whole new set of uses lying flat on their backs as well as standing upright.

Sometimes, there’s more to coding than instructions. To nail an app perfectly, sometimes the programmer needs to step back and think about paradigms, especially when new hardware capabilities are involved. “We have to accept that the old interaction metaphors of buttons and scroll bars are very often suboptimal interactions,” said Hill. “The swipe gesture is so simple, but so powerful. Think about how much easier it is to arbitrarily drag your fingertip approximately over a surface rather than aim and tap. Successful gestures are often very rough and approximate. The less accurate the user needs to be, the faster they can go. Do your own analysis.” For further reinforcement, Hill recommends that interaction designers begin with Josh Clark’s “Buttons are a Hack” talk.

Sometimes, experience will lead a programmer into bucking conventional wisdom under certain circumstances. For example, Hill notes that for competitions such as the Intel App Innovation Contest 2013, he doesn’t use unit tests. Many programmers consider unit tests essential in order to assure quality. However, while he acknowledges the need for unit testing in professional code, “…for me personally, in a competition like this, I find it a net win to ‘just code.’”

Adam Hill sees a bright future ahead for touch screens and haptic technologies. He is now working to start a company to publish Hot Shots in the Windows 8 store and help promote table computing. Lenovo’s Horizon proves that the form factor can now be sold at affordable levels, which means that this is the time to dive in and explore just how far one’s imagination can take the platform.

Resources Used


For developing Hot Shots, Adam Hill used Intel’s Ultrabook™ and Tablet Windows* 8 Sensors Development Guide on the Intel® Developer Zone. He also used Visual Studio*, Git/Bitbucket*, Adobe Creative Suite*, and Stack Overflow. Hill highly recommends MonoGame for a balance between lower-level graphics and productivity. Finally, Hill used Farseer* Physics to handle collisions, although he customized all of the game’s gravity.

Additional Articles


Intel, the Intel logo, Intel Core, and Ultrabook are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2014. Intel Corporation. All rights reserved.

How Scribblify Delivers a Cross-Platform GUI with Node.js* and the Chromium* Embedded Framework


By Edward J. Correia

 

Introduction

 

For Wisconsin-born developer Matt Pilz, the third time’s the charm. Pilz is founder of LinkedPIXEL, and his creativity and hard work have found favor with judges in a trio of consecutive developer challenges that Intel conducted. Most recently, his Scribblify drawing app with its 10-point touch capability and Google Chrome*-based GUI took the grand prize in the Intel® App Innovation Contest 2013 using resources from the Intel® Developer Zone. In spring 2013, Pilz won the Intel® Perceptual Computing Challenge with Magic Doodle Pad, a drawing app that’s controlled with hand gestures. And in 2012, he was recognized for Ballastic, an action game that’s optimized for Ultrabook™ devices.

Like his prior contest experiences, the idea for Scribblify (Figure 1) pre-dated the contest, but Scribblify actually began life running on handheld devices. As one of 200 contestants to receive a Lenovo Horizon* All-in-One (AIO) PC from contest sponsor Lenovo, Pilz was excited at the prospect of growing his doodling app to a full-size screen. “When I read about the Lenovo Horizon All-in-One with its 1920 HD resolution and huge 27-inch screen with touch capabilities, I realized how fun it would be to create a gigantic canvas-sized version of this app so you can draw on a party table or another large surface,” he said. “That's where my vision took me.” To make that happen, Pilz would have to modify the program to work well on a screen that’s four to six times wider than the original version and accept input from 10 simultaneous touch points instead of one.

Figure 1: Scribblify lets users create everything from scribbles to serious art.

 

Origins of Scribblify

 

Pilz first began dabbling in mobile-app development in 2010, about the time he got hold of his first Apple iPod touch* mobile device. LinkedPIXEL released an early version of Scribblify the following year. “The first interface was designed for 480x320 resolution. Everything was made minimalistic to support the very limited hardware at the time,” he recalled. Scribblify for iOS* hit the App Store in January 2011. “It actually took off and has been doing better every month,” said Pilz. So about a year ago he followed up with a Google Android* version.

Pilz describes Scribblify as a casual drawing and doodling program with some special capabilities. “What sets it apart are its brushes and color effects,” he said. The brushes (Figure 2), which he designed himself, incorporate textures not seen in many drawing apps of this type. “I like to believe most of them are pretty unique,” he said, but he quickly added that Scribblify isn’t meant to compete with high-end illustration apps. “The iPad equivalent of an app like Photoshop* has a pretty steep learning curve,” said Pilz. “Then you also have some limited drawing apps that come with just a round brush and a few options. There wasn't anything up to the level I envisioned for Scribblify, which is made for all ages to pick up and start using.”

 

Figure 2: The Scribblify app has 62 original brushes.

 

Challenges

 

Pilz’s first major obstacle was to decide which technology he would use to build a version of Scribblify for Microsoft Windows*. He started with TGC’s App Game Kit* (AGK), the engine he used to make Ballastic. “That allowed me to create a rough prototype, [but] it didn't support a lot of the functionality that I wanted in this app,” he said. To qualify for the contest, apps must be built for Lenovo’s Aura* multi-user interface and must not rely on major portions of the underlying operating system. One stipulation of the guidelines was that developers could not use any native Windows dialog boxes. “It would've been easy through my app to just have users browse a standard file dialog box to save their artwork or open up a different image,” when interfacing with the file system, Pilz said, for example. “But that would have taken away from the experience that Intel was trying to achieve…to make each app almost a sandbox.”

Pilz’s remedy was to use Node.js*, an event-driven, server-side JavaScript* framework for building network services, built on V8, the JavaScript engine inside the Chrome browser. According to Node creator Ryan Dahl, the Node framework makes I/O non-blocking by using an event loop and facilitates asynchronous file I/O thanks to a built-in thread pool.

“One main purpose of Node.js is to develop web apps and use them natively on a desktop regardless of operating system,” Pilz said. “You don't have to rely on having Windows installed; Node.js is the back-end that powers the entire app.” Using the framework and a library called Node-Webkit, Pilz saved and loaded art files easily using an interface and gallery system he custom built using standard web technologies. The application runs on top of the Chromium* Embedded Framework (CEF), which provides the same powerful HTML and JavaScript renderer as used by Chrome.

How Scribblify Saves and Deletes Images
When the Scribblify app first loads, the user’s public Documents directory is retrieved via Node.js:

// Initialize Node.js file system
var fs = require('fs');

// Get User's Documents Folder (Win 7+)
var galleryPath = process.env['USERPROFILE'] + "\\Documents\\Scribblify\\";
galleryPath = galleryPath.replace(/\\/g, "/"); // Use forward slashes

// See if folder exists, otherwise create it
if(!fs.existsSync(galleryPath))
{
    fs.mkdirSync(galleryPath); // create synchronously, matching the existsSync check
}


When the user requests the image be saved, the HTML5 canvas is converted into image data:

var saveImg = canvas.toDataURL();

A unique filename is generated based on a timestamp, and Node.js saves the image data to the file system:
// Write image data buffer to file
var data = saveImg.replace(/^data:image\/\w+;base64,/, "");
var buf = new Buffer(data, 'base64');
var fName = galleryPath + destFilename;
fs.writeFile(fName, buf, function(err) {
    if(err)
    {
        console.log(err);
    }
});
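
For illustration only, the destFilename used above could be derived from a timestamp along these lines; the exact naming scheme Scribblify uses is not shown:

// Hypothetical naming scheme based on a timestamp; Scribblify's actual format may differ
var destFilename = "scribble_" + Date.now() + ".png";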


If the user wishes to delete an image from the gallery, Node.js achieves this simply:

fs.unlinkSync(galleryPath + targetImg);


Getting a list of existing files to display in the gallery is slightly more complex, as the code also checks to ensure that only PNG files are retrieved (not directories or other files that may exist):

var fileList = fs.readdirSync(currentPath);
var result = [];
for (var i in fileList) {
    var currentFileFull = currentPath + '/' + fileList[i];
    var currentFile = fileList[i];
    var fileExt = currentFile.split(".").pop().toUpperCase();
    var stats = fs.statSync(currentFileFull);
    if (stats.isFile() && fileExt == "PNG") {
       result.push(currentFile);
    }
    else if (stats.isDirectory()) {
         // Directory instead...
    }
}


Once the list is generated, the image paths stored in the array can be loaded using standard HTML and JavaScript techniques. This is used for both the gallery display and for importing a particular piece of art back into the canvas.
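
As a rough sketch of that technique (not code from Scribblify itself, which renders through Cocos2d/WebGL rather than a 2D context), a saved PNG could be drawn back onto a canvas like this:

// Hedged example: load a gallery PNG and draw it onto a 2D canvas context
var img = new Image();
img.onload = function() {
    var ctx = document.getElementById('canvas').getContext('2d');
    ctx.drawImage(img, 0, 0);
};
img.src = currentPath + '/' + result[0]; // e.g., the first PNG returned by the listing code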

While Pilz was able to prototype most of the app using technologies already familiar to him, including HTML5’s native canvas element for the front-end and Node.js on the back, some of the app’s most groundbreaking features were just plain breaking. “I was trying to do something that hasn't often been attempted in a web app—creating an entire creative drawing app with complex textured brushes instead of procedurally generated shapes,” he said.

The program initially worked fine, but Pilz soon discovered a problem. “When using more than a few touch points or drawing too rapidly, the app would lag for a few seconds due to the performance of the canvas element, and then it would go on,” he said of his new app’s responsiveness. His search for a solution this time led him to Cocos2d-x, an open source, cross-platform framework available under the MIT License, which recently added WebGL support to its JavaScript branch.

Cocos2d-Javascript allowed Pilz to port his code to WebGL much more easily; WebGL, he said, performed much better and was well supported by the video card inside his AIO development system. “Now, Cocos2d-Javascript renders the drawings to the screen using WebGL…and it runs at 60 frames a second. That optimized the experience and made it very fun to use,” Pilz said. Scribblify uses standard HTML for the interface elements and user interactions; Pilz describes it as a hybrid of standard HTML and JavaScript, mixed in with those other engines, including Node.js for back-end processing and Cocos2d-Javascript for simplified WebGL rendering.

For help with general issues, Pilz turned to Stack Overflow, a free question-and-answer site for development pros and hobbyists. “I went there a few times when researching a particular task or if I got stuck on something,” he said. “It’s always a quick and easy reference to see what other people recommend.” For help specific to Cocos2d-x and Node.js, Pilz referenced the APIs for those products quite extensively; and, of course, he used CodeProject* not only for this project, but also for others through the years. “I go there whenever I want to do a deeper investigation of certain programming topics,” he said. “They had very helpful discussion forums related to this competition and were able to answer questions quickly.”

Pilz also identified a bug with the way Chromium handles touches in conjunction with the Lenovo Horizon driver set, which caused some touches not to be released, especially on rapid use. His solution was to build a small “check-and-balance system” that issues an alert if an excessive number of touches have been recorded erroneously. “It will automatically save the user’s art to the gallery and alert them that they should restart the app,” he said. Then it presents guidelines on how to prevent such a problem in the future. “If the user touches it with their whole palm [for example], it would display the alert, ‘You shouldn't do it that way, just use your fingertips.’ It would [also] recommend that you restart,” he said. “I submitted this as a formal bug report to the Chromium developers during the competition.”

// Bind touch handlers to the drawing canvas
var canvas = document.getElementById('canvas');
canvas.addEventListener('touchstart', touchStartFunction, false);
canvas.addEventListener('touchend', touchEndFunction, false);
canvas.addEventListener('touchcancel', touchCancelFunction, false);
canvas.addEventListener('touchmove', touchMoveFunction, false);
canvas.addEventListener('touchleave', touchLeaveFunction, false);


The Chromium* Embedded Framework (on which Google Chrome* is built) has supported native touch controls for some time and makes them a breeze to implement. As the listing above shows, the various forms of touch can be bound using simple event handlers, including touchstart, touchend, touchcancel, touchmove, and touchleave.
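
Building on those handlers, a minimal sketch of the "check-and-balance" idea Pilz describes might look like the following; the threshold, the alert text, and the saveArtworkToGallery helper are illustrative, not taken from Scribblify:

// Hedged sketch: count active touch points and warn when an unrealistic number accumulates
var activeTouches = 0;
var MAX_EXPECTED_TOUCHES = 10; // the Horizon AIO reports up to 10 touch points

function touchStartFunction(e) {
    activeTouches += e.changedTouches.length;
    if (activeTouches > MAX_EXPECTED_TOUCHES) {
        saveArtworkToGallery(); // hypothetical helper that saves the current canvas
        alert("Too many touch points detected. Please use your fingertips only and restart the app.");
    }
}

function touchEndFunction(e) {
    activeTouches = Math.max(0, activeTouches - e.changedTouches.length);
}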

 

Time constraints of the contest—and limitations of the framework with which Pilz was working—caused him to delay or forego implementation of some Scribblify functions. “One big feature I wanted to add was a full undo and redo system, but I was in unfamiliar territory on how to make that work without bogging down the system,” Pilz said. So for now, there's an eraser and a button for clearing the canvas. He also wanted to include a few more drawing modes and more brushes.

Debugging

 

Among the benefits of developing a web-based application is that most modern browsers come with comprehensive debugging tools, which Pilz said proved invaluable during the entire development process.

“Perhaps the most beneficial to me is that the Chrome developer tools support a full-featured JavaScript debugger. This allowed me to create breakpoints and walk through the code step-by-step and to monitor specific variables and properties throughout, especially when encountering problems,” he said. At the same time, Pilz was able to analyze specific Document Object Model (DOM) elements and update styles on-the-fly to more rapidly achieve the desired result. Pilz also found the developer resources provided by Lenovo on lenovodev.com helpful, particularly the Integration Requirements for Horizon Apps (login required).

Multi-User

 

Scribblify for the AIO supports 10 simultaneous touch points, which means that as many as 10 users can be drawing on the canvas at once. Given the time constraints of the contest, a top consideration for Pilz was the ease with which he could integrate multi-touch capabilities. “When I made the final determination to develop a web-based application using HTML5 and related technologies, I knew that the complexities of handling multi-touch could be alleviated if I used a web foundation that already had desktop multi-touch support,” he said.

Thankfully, the CEF has had such support in its desktop and mobile releases. With the entire application being powered through the CEF, Pilz said it was easy to store and retrieve touch data for as many touches as the AIO would support.

More Innovation

 

Until you’ve used Scribblify or seen one of LinkedPIXEL’s demo videos, it’s difficult to appreciate how truly unusual its brushes are. “I would consider the brushes themselves to be the most highly innovative aspect of Scribblify,” said Pilz. “All of the designs and brushes were my own invention, [and] each one has many different properties that control how it looks and how it behaves as you draw along the screen. They're pretty unique.”

Pilz is also proud of the user interface, which he built mostly from scratch in just six weeks. “To be able to draw with all ten fingers at once using 10-point multi-touch, or with one person on one side and one on the other drawing at once, I consider that an innovative experience,” he said. Pilz admits to taking a few shortcuts, where possible. “There were some libraries I used, like for the advanced color picker,” he said. “Some MIT licensed libraries sped up that process so I didn't have to reinvent the wheel.”

Scribblify also provides an innovative way to pick and blend colors. “With the many different color effects that you can add, you don't just pick a base color,” Pilz said. “You can actually blend two colors together or use a plasma color or even add what I call ‘color variance,’ which mixes the shades of light and dark as you draw to create more natural-looking artwork.”

Recently, Pilz has been working hard to convert Scribblify into a native Windows 8 app so that it can be distributed through the Windows Store—including through the specialized Lenovo Windows 8 Store. To address the requirements in the most streamlined fashion, the forthcoming version of Scribblify will be powered by Internet Explorer* 11 instead of Chromium and will use new methods for handling file input/output since Node.js is not supported. The Windows Store version of Scribblify is expected to be released in the April to May 2014 time frame.

About the Contest

 

The Intel App Innovation Contest 2013 called on app builders from Code Project, Habrahabr, ThinkDigit, and CSDN to submit app ideas for Windows-based tablets and AIO PCs across the finance, healthcare, retail, education, gaming, and entertainment industries. Winning ideas were given a development system—one of 200 tablets and 300 AIO PCs from sponsor Lenovo—and six weeks to turn their idea into a demo app. The winning apps are listed here on the Intel® Developer Zone.

Of the 500 ideas, 276 demo apps were created, five of which were selected as the best in each of the six categories—making 30 winners in all. Judging criteria covered how well each app demonstrated the capabilities of its host platform, how well it addressed its market segment, and whether it solved the needs of the intended end user. Of course, winning apps must look professional and employ good graphics, perform well, be glitch-free, and take advantage of sensors, where appropriate.

Cloud Rendering Sample

$
0
0

Get live sports scores with SofaScore LiveScore on Intel® Atom™ Tablets for Windows* 8.1

$
0
0

Get up-to-the-minute sports results and scores for all major sports leagues, tournaments and events

The SofaScore LiveScore app is now available for download on Intel® Atom™ tablets for Windows* 8.1, providing live sports results from all major sports leagues throughout the world.

SofaScore LiveScore by SofaScore.com provides up to the minute live sports scores and results for major sports leagues, tournaments and events. It offers information for world football (soccer) leagues, basketball, hockey, tennis, baseball, rugby, American and Australian football, handball, volleyball, water polo, snooker and darts.

This fun app features live coverage from more than 500 worldwide football (soccer) leagues, cups and tournaments with live updated results, statistics, league tables and fixtures. Users can create lists of favorite teams and games that allow them to receive notifications and video highlights on Intel Atom tablets for Windows* 8.1.

SofaScore was able to optimize SofaScore LiveScore for the powerful capabilities of Intel Atom tablets for Windows* 8.1 due to the company’s status as an Intel® Software Partner in the Intel® Developer Zone. SofaScore developers accessed guidance, code and other support from Intel during the development process.

“Sports fans now have access to current sports scores, stats and highlights throughout the world in the palm of their hand,” said Zlatko Hrkac, CEO of SofaScore. “Thanks to help from the Intel Developer Zone, we created an engaging, convenient and easy way to get the sports information people crave. This is the first release of SofaScore LiveScore for Windows 8.1; make sure to update the app regularly because we’ll constantly update it with many new features.”

SofaScore LiveScore is now available to download at the Windows* store: http://apps.microsoft.com/windows/en-us/app/livescore-sofascore/bb2792cc-1b2a-4793-a8d6-a8e9b85c2322

About SofaScore

SofaScore is an independent development company that focuses on providing comprehensive, worldwide sports information. For more information, visit: www.sofascore.com

About the Intel Developer Zone

The Intel Developer Zone supports independent developers and software companies of all sizes and skill levels with technical communities, go-to-market resources and business opportunities. To learn more about becoming an Intel® Software Partner, join the Intel Developer Zone.

 

Intel, the Intel logo and Intel Inside are trademarks of Intel Corporation in the U.S. and/or other countries.


*Other names and brands may be claimed as the property of others. Copyright © 2014 Intel Corporation. All rights reserved.  

New Dieta e Saúde App Provides Weight Loss Tools on Intel® Atom™ Tablets for Windows* 8.1

$
0
0

Numerous diet and weight loss tips are provided in the most popular online weight loss program of Brazil

Dieta e Saúde, the most popular online weight loss program in Brazil, is now available for Intel® Atom™ tablets for Windows* 8.1. Translated as “Diet and Health” in English, Dieta e Saúde, by Minha Vida, has already helped more than one million people lose weight.

This diet and fitness app helps users manage their intake of food through a point system, tracks weight loss over time and suggests daily exercises to perform. It also features a daily health feed, weight history, nutrition information for food, healthy recipes, health notifications and more.

When developing Dieta e Saúde, Minha Vida leveraged its status as an Intel® Software Partner in the Intel® Developer Zone to access Intel tools, code and support to optimize this app for the capabilities of Intel Atom tablets for Windows* 8.1.  

“Having Intel as a partner during the development process helped us create a powerful health app that works expertly with new Windows* tablets,” said Alexandre Tarifa, CTO at Minha Vida.

Dieta e Saúde is available for immediate download from the Windows* Store: http://apps.microsoft.com/windows/pt-br/app/dieta-e-saude/f7fe939d-b573-4427-bc6a-a76a5fdd9748

About Minha Vida

Minha Vida operates the largest health and well-being portal in Brazil. The company’s purpose is to improve the quality of life of people through healthy living. For more information visit: www.minhavida.com.br

About the Intel Developer Zone

The Intel Developer Zone supports developers and software companies of all sizes and skill levels with technical communities, go-to-market resources and business opportunities. To learn more about becoming an Intel Software Partner, join the Intel Developer Zone.

Intel, the Intel logo and Intel Inside are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2014 Intel Corporation. All rights reserved.

Debugging Intel® Xeon Phi™ Applications on Windows* Host

$
0
0


Introduction

The Intel® Xeon Phi™ coprocessor is a product based on the Intel® Many Integrated Core Architecture (Intel® MIC). Intel offers a debug solution for this architecture that can debug applications running on an Intel® Xeon Phi™ coprocessor.

There are many reasons why a debug solution for Intel® MIC is needed. Some of the most important are the following:

  • Developing native Intel® MIC applications is as easy as for IA-32 or Intel® 64 hosts. In most cases they just need to be cross-compiled (/Qmic).
    Yet, the Intel® MIC Architecture is different from the host architecture. Those differences could unveil existing issues. Also, incorrect tuning for Intel® MIC could introduce new issues (e.g., data alignment, whether an application can handle hundreds of threads, efficient memory consumption, etc.)
  • Developing offload enabled applications induces more complexity as host and coprocessor share workload.
  • General lower level analysis, tracing execution paths, learning the instruction set of Intel® MIC Architecture, …

Debug Solution for Intel® MIC

For Windows* host, Intel offers a debug solution, the Intel® Debugger Extension for Intel® MIC Architecture Applications. It supports debugging offload enabled application as well as native Intel® MIC applications running on the Intel® Xeon Phi™ coprocessor.

How to get it?

To obtain Intel’s debug solution for Intel® MIC Architecture on Windows* host, you need the following:

Debug Solution as Integration

Debug solution from Intel® based on GNU* GDB 7.5:

  • Full integration into Microsoft Visual Studio*, no command line version needed
  • Available with Intel® Composer XE 2013 SP1 and later


Why integration into Microsoft Visual Studio*?

  • Microsoft Visual Studio* is an established IDE on Windows* hosts
  • Integration reuses existing usability and features
  • Fortran support added with Intel® Fortran Composer XE

Components Required

The following components are required to develop and debug for Intel® MIC Architecture:

  • Intel® Xeon Phi™ coprocessor
  • Windows* Server 2008 R2, Windows* 7, or later
  • Microsoft Visual Studio* 2012 or later
    Support for Microsoft Visual Studio* 2013 was added with Intel® Composer XE 2013 SP1 Update 1.
  • Intel® MPSS 3.1 or later
  • C/C++ development:
    Intel® C++ Composer XE 2013 SP1 for Windows* or later
  • Fortran development:
    Intel® Fortran Composer XE 2013 SP1 for Windows* or later

Configure & Test

It is crucial to make sure that the coprocessor setup is correctly working. Otherwise the debugger might not be fully functional.

Setup Intel® MPSS:

  • Follow Intel® MPSS readme-windows.pdf for setup
  • Verify that the Intel® Xeon Phi™ coprocessor is running

Before debugging applications with offload extensions:

  • Use official examples from:
    C:\Program Files (x86)\Intel\Composer XE 2013 SP1\Samples\en_US
  • Verify that offloading code works

Prerequisite for Debugging

Debugger integration for Intel® MIC Architecture only works when debug information is available:

  • Compile in debug mode with at least the following option set:
    /Zi (compiler) and /DEBUG (linker)
  • Optional: Unoptimized code (/Od) makes debugging easier
    (due to removed/optimized away temporaries, etc.)
    Visual Studio* Project Properties (Debug Information & Optimization)

Applications can only be debugged in 64-bit mode:

  • Set platform to x64
  • Verify that /MACHINE:x64 (linker) is set!
    Visual Studio* Project Properties (Machine)
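
Taken together, a hedged command-line equivalent of these project settings might look like the following; the source file name is hypothetical, and real projects will usually set these switches through the Visual Studio* property pages instead:

icl /Zi /Od myApp.cpp /link /DEBUG /MACHINE:X64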

Debugging Applications with Offload Extension

Start Microsoft Visual Studio* IDE and open or create an Intel® Xeon Phi™ project with offload extensions. Examples can be found in the Samples directory of Intel® Composer XE, that is:

C:\Program Files (x86)\Intel\Composer XE 2013 SP1\Samples\en_US

  • C++\mic_samples.zip    or
  • Fortran\mic_samples.zip

We’ll use intro_SampleC from the official C++ examples in the following.

Compile the project with Intel® C++/Fortran Compiler.

Characteristics of Debugging

  • Set breakpoints in code (during or before debug session):
    • In code mixed for host and coprocessor
    • Debugger integration automatically dispatches between host/coprocessor
  • Run control is the same as for native applications:
    • Run/Continue
    • Stop/Interrupt
    • etc.
  • Offloaded code stops execution (offloading thread) on host
  • Offloaded code is executed on coprocessor in another thread
  • IDE shows host/coprocessor information at the same time:
    • Breakpoints
    • Threads
    • Processes/Modules
    • etc.
  • Multiple coprocessors are supported:
    • Data shown is mixed:
      Keep in mind the different processes and address spaces
    • No further configuration needed:
      Debug as you go!

Setting Breakpoints

Debugging Applications with Offload Extension - Setting Breakpoints

Note the mixed breakpoints here:
The ones set in the normal code (not offloaded) apply to the host. Breakpoints on offloaded code apply to the respective coprocessor(s) only.
The Breakpoints window shows all breakpoints (host & coprocessor(s)).

Start Debugging

Start debugging as usual via menu (shown) or <F5> key:
Debugging Applications with Offload Extension - Start Debugging

While debugging, continue till you reach a set breakpoint in offloaded code to debug the coprocessor code.

Thread Information

Debugging Applications with Offload Extension - Thread Information

Information of host and coprocessor(s) is mixed. In the example above, the threads window shows two processes with their threads. One process comes from the host, which does the offload. The other one is the process hosting and executing the offloaded code, one for each coprocessor.

Debugging Native Coprocessor Applications

Pre-Requisites

Create a native Intel® Xeon Phi™ application, transfer it to the coprocessor target, and execute it:

  • Use micnativeloadex.exe provided by Intel® MPSS for an application C:\Temp\mic-examples\bin\myApp, e.g.:

    > "C:\Program Files\Intel\MPSS\sdk\coi\tools\micnativeloadex\micnativeloadex.exe""C:\Temp\mic-examples\bin\myApp" -d 0
     
  • Option -d 0 specifies the first device (zero-based) in case there are multiple coprocessors in the system
  • This application is executed directly after transfer

Using micnativeloadex.exe also takes care of dependencies (i.e., libraries) and transfers them, too.

Other ways to transfer and execute native applications are also possible (but more complex):

  • SSH/SCP
  • NFS
  • FTP
  • etc.

Debugging native applications from the Visual Studio* IDE is only possible via Attach to Process…:

  • micnativeloadex.exe has been used to transfer and execute the native application
  • Make sure the application waits until the debugger attaches, e.g., by:
    
    		static volatile int lockit = 1;  // volatile keeps the compiler from optimizing the wait loop away
    
    		while(lockit) { sleep(1); }
    
    		
  • After having attached, set lockit to 0 and continue.
  • No Visual Studio* solution/project is required.

Only one coprocessor at a time can be debugged this way.

Configuration

Open the options via the TOOLS/Options… menu:

Debugging Native Coprocessor Applications - Configuration

It tells the debugger extension where to find the binary and sources. This needs to be changed every time a different coprocessor native application is being debugged.

The entry solib-search-path directories works the same as for the analogous GNU* GDB command. It allows you to map paths from the build system to the host system running the debugger.

The entry Host Cache Directory is used for caching symbol files. It can speed up lookup for big sized applications.

Attach

Open the dialog via the TOOLS/Attach to Process… menu:

Debugging Native Coprocessor Applications - Attach to Process...

Specify the Intel® Debugger Extension for Intel® MIC Architecture. Set the IP address and the port the GDB server is running on; the default port of the GDB server is 2000, so use that.

After a short delay the processes of the coprocessor card are listed. Select one to attach.

Note:
The Show processes from all users checkbox has no effect for the coprocessor, as user accounts cannot be mapped from host to target and vice versa (Linux* vs. Windows*).

Intel® GPA Platform Analyzer: how can I benefit from the new features?

$
0
0

If you used the previous version of Intel® GPA Platform Analyzer, your transition to using the new version of the tool should be very straightforward. However, note the following differences:

  • Tracing enabled by default: You do not need to select this option from the Intel GPA Monitor Preferences configuration pane to enable this feature.
  • Tracing configuration options minimized: To configure tracing, you need to set the Tracing Duration option only, via Intel GPA Monitor > Profiles > Tracing. This option replaces the Trace Buffer Size option. The options for collecting DirectX* data, hardware context, and internal data are no longer available since the new version of Intel GPA Platform Analyzer automatically collects the data necessary for creating and viewing all the data charts.
  • Android* support: Besides traditional support for the Windows*-based applications, Intel GPA Platform Analyzer now supports Android applications running on Intel Atom™ processors with PowerVR* Graphics.
  • GPU software queue: On Windows, the Platform View of the new version of Intel GPA Platform Analyzer provides data on GPU usage and displays a software queue for GPU engines at each moment of time.
  • No application startup capturing: The option for capturing the application startup is no longer available. Instead, select a sufficiently large trace duration (such as 5 seconds) and use an Intel GPA trigger (such as Application Time> 3) to approximate this feature. However, the new data collection methods being used in Intel GPA Platform Analyzer cannot capture trace data beginning with the exact start of an application.
  • No support for tracing Media and OpenCL™ kernels data: The development team is considering adding these features in a future release of the product.
  • No Summary view: Consider using the new Platform Analysis view that provides additional features and should provide a sufficient representation of the application execution on GPU and CPU cores. A separate Summary view is available with the Intel VTune™ Amplifier.

Overall, the new version of Intel GPA Platform Analyzer provides an advanced platform analysis mechanism that can more quickly isolate and identify task synchronization issues across the CPUs and GPU. For details about running the tool on different operating systems, see the Intel GPA Release Notes and Online Help.

If you have any feedback, please post your comments and suggestions on the Intel GPA Support Forum.

For more information about the Intel GPA, see the Intel GPA Home Page or the Intel INDE Home Page.


Touch Response Measurement, Analysis, and Optimization for Windows* Applications

$
0
0

By Tom Pantels, Sheng Guo, Rajshree Chabukswar

Download as PDF

Introduction

User experience (UX) is a game-changer for products today. While other features are important in the functionality of a device, none can overcome a perceived or actual lack of response and ease of use through touch. Since Windows 8 was introduced, touch gestures have become a primary way of interacting with Windows-based Ultrabook™ devices, tablets, and phones. The usefulness of these systems is based partly on how touch improves the user experience and, by extension, how the quality of the UX is impacted by the speed and responsiveness of the touch interface.

Touch response time is the latency from when users begin to move their fingers to perform touch gestures to the point at which the application provides a visual update that they expect from their performed gestures. Touch response is measured in very small time samples (100-500 ms range). It is important to identify and optimize poor performing touch response areas to achieve the best UX.

Touch enabling for Windows applications is a whole new ballgame—from measurement to analysis and optimization. An assumption that is not always true is that if an application is always updating a scene, it will quickly respond to the user’s touch gesture. This paper discusses ways to measure touch response, analysis methods for touch optimization on Intel® Architecture (IA), and the combination of tools needed to understand issues related to touch response.

In addition to touch response time, computer resource utilization and battery life are very important factors impacting the UX. This paper describes two applications that demonstrate problems such as poor or no touch response times and high energy consumption, both of which are critical to app performance and UX. We then discuss how to optimize these applications to resolve these problems.

Why is Implementing a Good Touch UX Important?

Ultrabook devices and tablets are seeing growing adoption by the market, and touch is one of the essential pillars of delivering a good user experience (UX). Touch-capable devices are everywhere, from phones, tablets, Ultrabooks to All-In-Ones (AIOs), which are desktop PCs with the tower integrated into the back of the display. Gartner, an IT research company, expects that by 2015 more than 50% of the PCs purchased for users under age 15 will have touch screens [1].

With Windows 8, Microsoft established the Windows Store, which acts as a central touch-driven hub for developers to publish their applications and for consumers to purchase them. If an application has noticeable delay to the user’s touch gesture, the application may be rated poorly, which will, no doubt, affect its sales.

Figure 1. Role of Software in Touch Stack

Figure 1 shows the critical role software and drivers play in touch responsiveness: 3 of the 5 layers belong to the software stack (roughly 60%). Poor touch responsiveness is usually an issue in the software stack.

Touch Handling

Windows desktop applications have three ways to support touch input and gestures. To fully understand the usage of these touch APIs, please read "About Messages and Message Queues" [7]. The WM_GESTURE and WM_TOUCH messages are both backward compatible with Windows 7, whereas the WM_POINTER messages are not. Each message has advantages and disadvantages. WM_POINTER is the simplest to implement but provides the least amount of control. WM_TOUCH requires the most amount of code but allows for very fine-tuned control, and WM_GESTURE is in the middle. Many approaches can be used for supporting touch in Windows Store apps, from the GestureRecognizer class that handles touch inputs and manipulations to using the DirectManipulation APIs that were introduced in Windows 8.1.
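
To make the desktop options more concrete, here is a minimal, hedged sketch (not taken from either case study below) of a Win32 window procedure that opts in to raw WM_TOUCH messages; window-creation boilerplate and error handling are omitted:

#define _WIN32_WINNT 0x0601   // Windows 7 or later, needed for the touch APIs
#include <windows.h>
#include <vector>

LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_CREATE:
        RegisterTouchWindow(hWnd, 0);   // receive WM_TOUCH instead of WM_GESTURE
        return 0;

    case WM_TOUCH:
    {
        UINT count = LOWORD(wParam);    // number of touch points in this message
        std::vector<TOUCHINPUT> inputs(count);
        if (GetTouchInputInfo((HTOUCHINPUT)lParam, count, inputs.data(), sizeof(TOUCHINPUT)))
        {
            // inputs[i].x and inputs[i].y are in hundredths of a physical pixel
            CloseTouchInputHandle((HTOUCHINPUT)lParam);
        }
        return 0;
    }
    }
    return DefWindowProc(hWnd, msg, wParam, lParam);
}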

The Energy Savings Bonus from Optimizing Touch

Energy is another important pillar in delivering a great user experience. The usability of an application is affected by the energy it consumes and how it impacts battery life. If the application rapidly drains energy, users will be reluctant to run the application. High energy consumption usually results from heavy usage of the system’s resources, i.e., the CPU, GPU, and even storage devices performing unnecessary work. The case studies below demonstrate these issues and highlight a secondary effect often seen when optimizing touch handling capabilities where the application’s energy consumption is reduced. This secondary effect of reduced energy consumption is what we refer to as the “energy savings bonus.”

Windows Tools for Touch Analysis

Many tools can be used to help you optimize your Windows touch-based applications. Understanding the use of each tool for measurement and analysis is essential to pinpointing touch-related issues. Below are brief tool descriptions, their intended uses, and relevance to particular aspects of touch analysis.

  1. Measurement Tools
    1. Measuring response time using a high resolution camera
      1. Record a video of touch interactions and manually step through it frame by frame to obtain response times.
    2. Windows PerfMon
      1. Pre-packaged with Windows to look at CPU and other system stats.
      2. This tool collects at a one second granularity and provides an overview of the system’s behavior when the app is running.
    3. Intel® Power Gadget
      1. Gathers power/energy metrics such as package (CPU and GPU) power consumption.
    4. Windows Performance Recorder (WPR)
      1. Packaged with Windows 8/8.1 ADK.
      2. WPR has a user interface (WPRUI) that allows traces to be performed that collect specific system metrics like CPU utilization, virtual memory commits, power consumption, etc.
    5. FRAPS
      1. Reports an application’s rendering rate (FPS) and only works on desktop applications.
      2. Although the web site says it only supports up to Windows 7, you can use this on Windows 8/8.1 desktop applications.
         
  2. Analysis Tools
    1. Windows Performance Analyzer (WPA)
      1. Packaged with Windows 8/8.1 ADK.
      2. WPA is used to load the .etl file generated by WPR so that in-depth analysis can be performed.
    2. Intel® VTune™ Amplifier XE 2013
      1. Allows developers to understand which functions/modules are most time consuming.
      2. Provides detailed view of thread scheduling.
    3. Intel® Performance Bottleneck Analyzer (PBA)
      1. Provides advanced analysis capabilities for responsiveness optimizations.
    4. GPUView
      1. Packaged with Windows 8/8.1 ADK and provides an in-depth look at what is occurring between the CPU context queue and the GPU hardware queue. Use the WPRUI trace option “GPU activity” when collecting this information.
    5. Intel® Graphics Performance Analyzer (Intel® GPA)
      1. Provides information about graphics activity on the system including frame rate.
         
  3. Questions to ask when using these tools
    1. Does Intel Power Gadget report a package (CPU and GPU) power consumption that is much larger than the baseline?
    2. Does the Windows Performance Analyzer show high CPU usage?
      • Does the scene have any visual updates?
      • If there are spikes in CPU usage, what is occurring on screen? Maybe an animation that occurs every three seconds causes the CPU usage to increase every three seconds.
    3. Does GPUView show that the CPU/GPU queue is backed up?
    4. What does Intel Performance Bottleneck Analyzer show in the timeline view?
      • Filter on the module consuming the most CPU time and see what module/thread activity is occurring.
    5. Does the application change the system’s timer tick resolution from 15.6 ms (Windows default) down to a smaller value?
      • If the application is changing the system’s timer tick resolution to a smaller value, i.e., 1 ms, the application will perform Update and Draw calls too frequently, which can back up the CPU context queue and/or GPU queue.

Now let’s look at how these tools were used to optimize two applications and answer some of the questions above.

Case Studies

For these particular case studies, the high resolution camera method was used to obtain an average response time of ~200 ms. In these contexts, not only did the applications have slow touch response, but often the applications failed to respond to a touch gesture entirely.

A Casual Multi-User Multi-Touch Game Application with Poor Touch Response

1. Problem Statement

This Windows desktop application had touch latencies of around 170 ms. But even worse, the application often failed to provide a response at all (no visual update for the gesture). Since this was a sports game, these touch response issues would often cause unfair scoring.

2. Using the Tools to Identify Issues

The first tool we used was Windows Perfmon since it collects data that provides an overview of what is occurring on the system while the application is running. Looking at the application’s resource utilization when no touch gestures are performed provides an idea of what will cause most of the bottleneck when a touch does occur. We could see here if certain resources like the CPU usage, context switch rate, interrupt rate, etc. were already maxed out (100% utilization) or above threshold values obtained based on analysis from previous workloads.

 

Figure 2. Application Idle, CPU Usage at 100%

 

Figure 2 shows a single CPU core (processor 0) is utilized 100% of the time, which means this single-core application was CPU-bound when updating a visually unchanging scene.

The next tool, Intel Power Gadget, was used to get an idea of the impact caused by the application using a single CPU core 100% of the time. We ran the command prompt as admin, navigated to the installation directory, and entered:

PowerLog3.0.exe -duration <duration to run for in seconds> -file <log_file.csv>

After running the command, we typed the name of the log_file.csv and pressed Enter. Figure 3 shows the package (CPU and GPU) power consumption of the system while the application was running and not handling touch interactions. The x-axis is the sampling rate at which the energy MSRs were read, and the y-axis is the processor power in watts [3].

 

Figure 3. Application Idle Package (CPU and GPU) Power Consumption in Watts

The same behavior occurred when touch gestures were performed, as indicated in CPU usage charts, even when power remained almost the same with and without touch interactions. This clearly indicates that something was consuming all the resources, making it difficult for touch to respond. The system’s power consumption when the application was not running was ~2.5 W, which meant the application caused a 9 W increase in power. What was causing this 9 W CPU and GPU power consumption increase?

Next, Intel GPA was used where a rendering rate of ~350 frames per second (FPS) was reported while the application was not handling touch gestures and ~210 FPS when touch gestures were performed. Although it is constantly debated, a common consensus is the human eye cannot usually distinguish the difference between one app rendering at 60 FPS and one rendering at 120 FPS. This meant that users would see the same visual updates on screen at 210 FPS as if the application were rendering at 60 FPS.

Next, GPUView was used and showed this high rendering rate caused the GPU queue to be full as the application was trying to submit the job to the GPU pipeline as soon as possible. Figure 4 shows rows of packets with double hash marks, which indicates a present packet ready to be displayed to the screen. This activity was occurring while the application was displaying a screen with no visual updates.

 

 

Figure 4. Screen Shots of Backed-up GPU/CPU queues from GPUView tool

What was causing the CPU usage to be 100% and the CPU/GPU queues to be backed up?

WPRUI was used next, and the only trace option selected was CPU usage to reduce overhead caused by the tool. When collecting on idle scenarios, take into consideration the amount of overhead caused by the tool itself. At this point, we knew the application was making the CPU/GPU queues back up, so what was being called before the graphics module? By inspecting the application’s call stack to the graphics module, we found some clues as to what was being called that accounted for this needless work.

 

Figure 5. Application’s Hot-Call Stack

Inspecting the call stack shown in Figure 5 showed a Game::Tick method called shortly before a D3D9 Present call was made, which eventually led to the graphics module igdumd32.dll. This Game::Tick method was unintentionally setting the system’s timer tick resolution to 1 ms, down from 15.6 ms (Windows default). See Figure 6.

 

Figure 6. The Game Tick Method Changing the System Timer Resolution

So every 1 ms, the application would perform Update and Draw calls since that is when Game::Tick was called. Calling these methods every millisecond also meant the CPU woke up often, never going into deeper sleep states (C-states), and the GPU was busier than necessary.

3. End Result

APIs are available to ensure that an application does not change the system’s timer tick resolution and that the application is synchronized to the Vsync. After using these types of APIs, the CPU was no longer spending 100% of execution time on Update and Draw calls.
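
A hedged C++ fragment illustrating those two kinds of fixes (the game's actual code is not shown here, and the specific calls chosen are an assumption) might look like this:

// 1. Don't leave the global timer resolution at 1 ms: pair every
//    timeBeginPeriod(1) with timeEndPeriod(1), or avoid the call entirely.
timeEndPeriod(1);   // requires mmsystem.h and winmm.lib

// 2. Throttle presentation to the display's refresh rate (Vsync) so Update
//    and Draw run at most once per refresh instead of once per millisecond.
D3DPRESENT_PARAMETERS pp = {};                        // requires d3d9.h
pp.Windowed             = TRUE;
pp.SwapEffect           = D3DSWAPEFFECT_DISCARD;
pp.PresentationInterval = D3DPRESENT_INTERVAL_ONE;    // one Present per Vsync
// ... pass pp to IDirect3D9::CreateDevice as usual ...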

 

Figure 7. Optimized Application CPU Usage

Since the CPU was no longer executing needless Update calculations and Draw calls every millisecond, the CPU context queue and GPU queue were no longer backed up. The screen shot in Figure 8 shows work submitted at 60 FPS since the refresh rate of the display was 60 Hz.

 

Figure 8. Optimized Application CPU and GPU Queue Activity

The application’s rendering rate was now capped at 60 FPS since a present packet is submitted at every Vsync on a monitor with a 60 Hz refresh rate. By optimizing the application’s resource consumption in an idle scenario (no visual changes to the screen and no touch gestures handled), touch responses were faster and smoother. The average touch response time of the optimized application was around 110 ms, whereas before it averaged around 170 ms, and touches were no longer lost (no response from application).

As an added bonus (the energy savings bonus), the package power consumption of the system was reduced by ~8.5 W. Now users could play the application on their favorite Windows mobile device for a longer period of time before having to recharge the battery.

In summary, idle application behavior can cause the application to flood the touch handling pipeline. With the optimized version, the application had more head room to handle additional touch gestures, giving it the benefit of decreased touch latency.

 

A 3D Casual Game Application with Lost Touch Response

1. Problem Statement

This case study is a 3D free-running game on Windows 8 Desktop that uses WM_TOUCH messages to handle touch inputs. To play the game, the user flicks the screen at different orientations to make an avatar perform different actions (such as jump, slide, squat, etc.). If no touch gestures are performed, the avatar keeps running forward on a fixed path.

In the original version of the game, when two types of touch interactions were performed, the avatar would not perform the expected action and simply continue to run forward.

  1. When two flicks were performed successively, if the time interval between them was too small, the second flick usually had no response.
  2. When the distance of a flick moving on the touch screen was too short, the flick usually had no response.

2. Using the Tools to Identify Issues

  1. Isolate the Issue. Determine if the touch response issues are due to the application or the platform (hardware/driver/OS). The method recommended here is to run WPR, switch to the application, and perform a single touch gesture at specific times during the data collection to visually show the touch events and their durations during analysis.


    Figure 9. Marking Touches with Response and Touches with No Response

    Manually record the touch events with and without response. By having a process that tracks touch registration, we were able to mark when the OS had registered the touch by inspecting the call stack for message processing functions as shown in Figure 9 (purple spikes).
     
  2. Compare Good UX vs. Bad UX Call Stacks. Comparing various aspects of a touch that has a response (Good UX) to a touch that has no response (Bad UX) will often show a difference in which functions are called by the application.

    The call stacks containing FrameMove() were investigated since that function, as the name implies, provides a visual update. In the call stack of "Good UX" a function AvatarMovesToTheLeft::FrameMove is called, while in the "Bad UX," it is not called (see Figure 10).


    Figure 10. Touch with Response vs. Touch without Response Call Stacks
     
  3. Trace the Call Stacks. By tracing the “Bad UX” call stack, we discovered where the call chain broke. Windows message processing functions were called, including PeekMessage, DispatchMessage, and even the game’s WndProc function. This confirmed that all touch inputs were received by the application’s message processing function, but the xxxSlideLeftState or xxxSlideRightState functions that set the avatar’s run mode for the expected animation were not called (see Figure 11).

Figure 11. Bad UX Message Processing Call Stack

3. End Result

  1. The cause of the quick successive flick loss is that the flick gesture acts on the game only if the avatar’s run mode is in the state of "Run forward." If it is in a different state, the touch input will be abandoned. For example, after the first flick gesture, the state of the run mode changes from "Run forward" to "Slide to Right." If the second flick comes quickly before the state returns to "Run forward," it will be discarded. The issue was fixed by caching the touch messages for the appropriate run mode.
  2. The cause of the short flick loss was related to the game’s WndProc function. The game recognized the flick gesture only if its length was more than 60 logical pixels, which is why some short flicks were lost. Given the same resolution, 60 logical pixels cover a longer physical distance on an Ultrabook screen than on an iPhone* screen. This makes short flicks on a game ported from the iPhone platform to Ultrabook more prone to be lost on the Ultrabook screen. The solution was to set the threshold of the flick length based on the physical distance on screen using dots per inch (DPI) instead of logical pixels, as sketched below.
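
A minimal sketch of that DPI-based threshold follows; the variable names and the 0.25-inch value are illustrative, not taken from the game:

// Convert a physical flick-length threshold (in inches) to pixels for this display
HDC hdc  = GetDC(hWnd);
int dpiX = GetDeviceCaps(hdc, LOGPIXELSX);     // pixels per inch, horizontally
ReleaseDC(hWnd, hdc);

const double kMinFlickInches = 0.25;           // hypothetical physical threshold
int minFlickPixels = (int)(kMinFlickInches * dpiX);

if (flickLengthPixels >= minFlickPixels)       // flickLengthPixels: measured flick length (hypothetical)
{
    // Recognize the flick and change the avatar's run mode
}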

In summary, we isolated the issues as either app- or platform-related by comparing Windows messaging API calls to determine if and where the OS had registered the touch. Then the call stacks belonging to touch gestures that had an expected response (Good UX) and gestures with no response (Bad UX) were compared to find differences in functions that were called. Finally, the game’s message processing call stack for a touch that was lost was traced upstream to find where the break in the call chain occurred. The start of the call stack trace was from the functions that were called in the Good UX call stack but not in the Bad UX call stack.

Conclusion

Optimizing touch is essential, and many tools are available for measurement and analysis. Remember to have a reference point (baseline) to which you can compare the data you have collected while your application is running. Look at the differences in data obtained while the application is simply running with data obtained performing touch gestures. Compare the application’s behavior when it responds to a touch and when it does not.

The assumption—if an application is always updating a scene, it will quickly respond to the user’s touch gesture—is not always true. A scene should only update when it is necessary in order to conserve system resources that can be used to quickly respond to a touch gesture when one occurs. Often, needlessly updating a scene for unimportant animations will cause the Windows message, CPU, or GPU queues to back up, which can subsequently cause delays in providing a visual response to the user’s touch.

References

[1] Fiering, Leslie. Gartner Says More Than 50 Percent of PCs Purchased for Users Under the Age of 15 Will Have Touchscreens by 2015. Gartner, 7 Apr. 2010. Web. 03 Mar. 2014.

[2] Chabukswar, Rajshree, Mike Chynoweth, and Erik Niemeyer. Intel® Performance Bottleneck Analyzer. Intel Corporation, 4 Aug. 2011. Web. 12 Feb. 2014.

[3] Seung-Woo Kim, Joseph Jin-Sung Lee, Vardhan Dugar, Jun De Vega. Intel® Power Gadget. Intel Corporation, 7 Jan. 2014. Web. 25 March 2014.

[4] Freeman, Jeffrey M. "Intel® Graphics Performance Analyzers (Intel® GPA) FAQ." Intel Corporation, 17 Dec. 2013. Web. 25 Mar. 2014.

[5] H, Victor. "iPhones Score Highest Touch Responsiveness, More than Twice as Responsive as Android and Windows Phone Devices."Phone Arena. Phonearena, 01 Oct. 2013. Web. 26 Mar. 2014.

[6] "Windows Assessment and Deployment Kit (Windows ADK)." Microsoft Corporation, 2 April. 2014. Web. 3 April. 2014.

[7] "About Messages and Message Queues." Microsoft Corporation, 24 Jan. 2012. Web. 26 Mar. 2014.

[8] Intel® VTune Amplifier XE 2014. Intel Corporation, 6 Mar. 2014. Web. 3 April 2014.

[9] Fraps 3.5.99. Fraps, 26 Feb. 2013. Web. 3 April. 2014. 

Users Create and Share Custom Screen Shots with the Snap 7 App by Ashampoo for Intel® Atom™ Tablets for Windows* 8.1

$
0
0

Download Solutions Brief

The powerful app enables users to capture and customize screen shots for teaching, collaboration and creative purposes.

It’s no secret that many people consider themselves visual creatures. It is for this reason that visual expression tools can be so useful in business, education and art. The screen shot is one such valuable tool. Now, Windows* 8.1 tablet users can create custom and annotated screen shots for any purpose with the Snap 7 app by Ashampoo.

Featured on Windows* 8.1 tablets, the Snap 7 app takes screen shots to a new level by offering a wide array of customizable settings and sharing options. The full-service app enables users to capture rectangular or free form shots on static photos or video. Then, users can customize their shots by including notes, arrows, shapes and stamps and share them via email or directly uploading to social media sites.

Snap 7 was designed by Ashampoo, a software developer and an Intel® Software Partner. With support from the Intel® Developer Zone, Ashampoo optimized Snap 7 for the capabilities of Windows* 8.1 tablets. The result is an enhanced way for users to capture and customize screen shots directly on their mobile devices.

“There are so many uses for screen shots,” said Nikolaus Brennig, Chief Developer, Ashampoo Media. “We wanted to create an application that enabled all of them. Whether users want shots for business or personal use, they can use Snap 7 every time.”

Throughout the development of Snap 7, Ashampoo received assistance and guidance from Intel engineers, who provided troubleshooting help and access to code and other Intel resources. The result is a useful application that is fully optimized for Windows* 8.1 tablets.

“Users want applications that can be used anywhere and that’s why we optimized Snap 7 for Windows* tablets,” said Sebastian Schwarz, CEO, Ashampoo. “The interactive touch capabilities and superior screen quality ensure that users create screen shots that perfectly fit their needs.”

It is these and other qualities that make Windows* 8.1 the perfect platform for this functional new app.

Ashampoo is a software developer based out of Oldenburg, Germany. For more information visit its homepage at: https://www.ashampoo.com/

The Intel Developer Zone supports developers and software companies of all sizes and skill levels with technical communities, go-to-market resources and business opportunities. To learn more about becoming an Intel® Software Partner, join the Intel Developer Zone.

 

Intel, the Intel logo and Intel Inside are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2014 Intel Corporation. All rights reserved.

Discover a Faster Way to Read and Study on Windows* 8.1 Tablets with Reading Trainer by HeKu IT

$
0
0

Download Solutions Brief

Reading Trainer teaches readers how to read faster, more efficiently and with better recall.

For many people, it can take a good part of a morning to read the newspaper and much more time to finish an entire book. Yet the Reading Trainer app by HeKu IT helps readers become more efficient and quick with their reading so they can breeze through books, articles and more at a faster pace.  

Featured on Intel® Atom™ tablets for Windows* 8.1, Reading Trainer is one of the most popular educational apps in Europe. It helps users improve reading speed and retention rate through 12 challenging and entertaining exercises.

Reading Trainer is designed to help with the rapid recognition of numbers, letters and words. It does this by training for flexible eye movements, increased vision span and improved concentration skills.

When developing Reading Trainer, HeKu IT took advantage of its status as an Intel® Software Partner in the Intel® Developer Zone. The company’s developers accessed   Intel tools, code and support to optimize this learning app for the powerful capabilities of Intel Atom tablets for Windows* 8.1. Plus, HeKu IT also benefited from the insight and support from the developer communities in the Intel Developer Zone.

“Reading Trainer is an educational and fun app that can help improve reading for people of all ages,” said Ronny Hecker, CEO of HeKu IT. “Having the ability to access it from a portable Windows* 8.1 tablet makes it easy and convenient to learn new ways to read.”

The high-resolution screen, touch interface and powerful Intel® Atom™ processors combine to create a powerful platform to learn, read and improve reading efficiency with Reading Trainer. 

“We’ve created an app that works very well with new Windows* 8.1 tablets,” Hecker explained. “We believe people can benefit from the simple yet engaging exercises wherever they may be with their tablet.”

The support from the Intel Developer Zone and the capabilities of new Windows* 8.1 tablets have encouraged HeKu IT to create further apps that take advantage of the distinct capabilities of Windows* 8.1 tablets.

HeKu IT Solutions & Services was founded in 2009 and is a full service provider of IT services and applications. For more information visit: www.heku-it.com

The Intel Developer Zone supports independent developers and software companies of all sizes and skill levels with technical communities, go-to-market resources and business opportunities. To learn more about becoming an Intel Software Partner, join the Intel Developer Zone.

 

Intel, the Intel logo and Intel Inside are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2014 Intel Corporation. All rights reserved.

 

Illegal instruction on machine with AMD Jaguar processor

$
0
0

Product : Intel® Math Kernel Library (Intel® MKL)

Version : Intel® MKL 11.*, including 11.2 beta

Operating Systems affected: Windows* OS, Linux* OS, and OS X*

Reference Number : DPD200521692

Problem Description : 

The Intel MKL crashes with the message “illegal instruction detected” on machines with AMD Jaguar processors. The exact CPU models are AMD A6-5200 and AMD A4-1250, and probably other similar CPUs from AMD.

The issue is caused by an error in MKL CPU dispatching. The issue has been escalated and confirmed by the MKL team and will be fixed in one of the upcoming updates. This article will be updated as soon as the fix is officially released.

Workaround: as a temporary workaround, we recommend setting the "MKL_CBWR=SSE4_2" environment variable to force a code branch that works on these systems.
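
For example, the variable can be set before launching the application (standard shell syntax, nothing MKL-specific):

REM Windows* command prompt
set MKL_CBWR=SSE4_2

# Linux* or OS X* shell
export MKL_CBWR=SSE4_2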

Power Management States: P-States, C-States, and Package C-States

$
0
0

(For a PDF version of this article, download the attachment.)

Contents

Preface: What, Why and from Where

Chapter 1: Introduction and Inquiring Minds

Chapter 2: P-States: Reducing Power Consumption Without Impacting Performance

Chapter 3: Core C-States, The Details

Chapter 4: Package C-States, The Details

Chapter 5: An Intuitive Description of Power States Using Stick Figures and Light Bulbs

Summary

Appendix: C-States, P-States, Where the Heck Are Those T-States?

References

Endnotes

 

Preface: What, Why and from Where

This is an aggregate of a series of blogs I wrote on power management states. That series is part of an even larger collection of blogs addressing all sorts of power management topics including power management states (this one), turbo / hyper-threading power states, power management configuration, and power management policy. With one exception, the discussion in these blogs should be generally useful to everyone, even though they concentrate on the Intel® Xeon Phi™ coprocessor. The exception is the series on configuration, which, by its nature, is a more platform-dependent topic focused on the Intel® Xeon Phi™ coprocessor and the Intel® Manycore Platform Software Stack (MPSS). In addition to this collection of power management blogs, there are two other companion collections: a series on different techniques for measuring power performance[i], and another loose set of my earlier blogs that cover a variety of topics such as where C*V^2*f comes from.

The article you are currently reading was originally published as the following series of 5 somewhat inconsistently entitled blogs:

In addition to these, there is a bonus article in an appendix:

At the Intel® Developer Zone, you can find the individual blogs listed in yet another article, List of Useful Power and Power Management Articles, Blogs and References.

So kick your feet up, lean back, and enjoy this look at the ever fascinating topic of power management.

 

Chapter 1: Introduction and Inquiring Minds

So exactly which power states exist on the Intel Xeon Phi coprocessor? What happens in each of the power states? Inquiring minds want to know. And since you are, no doubt, aggressively involved in high performance computing (HPC), I am sure you want to know also.

This is not going to be a high powered, in depth, up-to-your-neck-in-technical-detail type of treatise on power management (PM). If you want that, I suggest you read the Intel Xeon Phi Coprocessor Software Developer’s Guide (SDG)[ii]. As a word of warning, when the Power Management section of the SDG refers to writers of SW (i.e. programmers), whether explicitly or implicitly, it does not refer to you or me. Its target audience consists of those poor lost souls that design operating systems (OSs) and drivers. (By the way, in an earlier life, I was one of those “poor lost souls.”) One of the objectives of the series of blogs you are now reading is to look at PM from the perspective of an application developer, i.e. you or me, and not a writer of operating systems or drivers.

I am also not going to talk about what C, P and PC-states are. If you want an introduction to these concepts before digging into this blog series, I recommend (humbly) an earlier blog series I wrote on just that topic. See http://software.intel.com/en-us/user/266847/track. It is a little hard to separate the relevant power management blogs from all my other forum posts and videos, so I list the most important ones in this endnote[iii].

Briefly, the coprocessor has package-based P-states, core-based C-states (which are sometimes referred to as CC-states) and package-based C-states (PC-states). It also has the capability to operate in Turbo mode. There are no per-core based P-states.

The host and coprocessor share responsibility for power management on the coprocessor. For some PM activities, the coprocessor operates alone. For others, the host component acts as a gate keeper, sometimes controlling PM, and at other times overriding actions taken by the coprocessor’s PM.

In the forthcoming series, I discuss package-based P-states (including Turbo mode[iv]), core-based C-states and package-based PC-states. I will also discuss what control you have as an application developer over coprocessor PM.

Here is one last note. I cannot guarantee that all the Intel Xeon Phi coprocessor SKUs (i.e. coprocessor types) expose these power management features.

 

Chapter 2: P-States: Reducing Power Consumption Without Impacting Performance

Right up front, I am going to tell you that P-states are irrelevant, meaning they will not impact the performance of your HPC application. Nevertheless, they are important to your application in a more roundabout way. Since most of you belong to a group of untrusting and always questioning skeptics (i.e. engineers and scientists), I am going to go through the unnecessary exercise of justifying my claim.

The P-states are voltage-frequency pairs that set the speed and power consumption of the coprocessor. When the operating voltage of the processor is lower, so is the power consumption. (One of my earlier blogs explains why at a high though technical level.) Since the frequency is lowered in tandem with the voltage, the lower frequency results in slower computation. I can see the thought bubble appearing over your head: “Under what situations would I ever want to enable P-states and possibly reduce the performance of my HPC app?” Having P-states is less important in the HPC domain than in less computationally intense environments, such as client-based computing and data-bound servers. But even in the coprocessor’s HPC environment, longer quiescent periods are common between large computational tasks. For example, if you are using the offload model, the coprocessor is likely unused during the periods between offloads. Also, a native application executing on the coprocessor will often be in a quiescent phase for many reasons, such as having to wait for the next chunk of data to process.

The P-states the coprocessor Power Management (PM) supports are P0 through Pn. The number of P-states a given coprocessor SKU (i.e. type) supports will vary, but there are always at least two. Similarly, some SKUs will support turbo P-states. The coprocessor’s PM SW handles the transition from one P-state to another. The host’s PM SW has little if any involvement.

A natural thing to wonder is how much this is going to impact performance; we do happen to be working in an HPC environment. The simple answer is that there is, in any practical sense, no impact on HPC performance. I am sure that, at this point, you are asking yourself a series of important questions:

(a)    “Wait a minute; how can this be? If the coprocessor is slowing down the processor by reducing the frequency, how can this not affect the performance of my application?”

(b)   “I just want my application to run as fast as possible. Why would I want to reduce power consumption at all?”

Let us first consider (b). I understand that power is not a direct priority for you as an application writer. Even so, it does impact your application’s performance indirectly. More power consumption has to do with all that distant stuff, like higher facility costs in terms of electricity usage due to greater air-conditioning demands, facility space needs, etc. It is simply part of that unimportant stuff administrators, system architects, and facilities management worry about.

Truth be told, you need to worry about it also. This does impact your application in a very important way, though how is not initially obvious. If the facility can get power consumption down while not losing performance, this means that it can pack more processors in the same amount of space all while using the same power budget. To use another American idiom, you get more “bang for the buck.” And this is a good thing for you as a programmer/scientist. When you get right down to it, lower power requirements mean that you can have more processors running in a smaller space, which means that you, as an application designer/scientist, can run not only bigger problems (more cores) but run them faster (less communication latency between those more cores).

Let us go back to P-states. P-states will impact performance in a theoretical sense, but not in a way that is relevant to an HPC application. How is this possible? It is because of how transitions between the P-states happen. It all has to do with processor utilization. The PM SW periodically monitors the processor utilization. If that utilization is less than a certain threshold, it increases the P-state, that is, it enters the next higher power efficiency state. The key word in the previous sentence is “utilization”. When executing your computationally intensive HPC task on the coprocessor, what do you want your utilization to be? Ideally, you want it to be as close to 100% as you can get. Given this maximal utilization, what do you imagine is the P-state your app is executing in? Well, it is P0, the fastest P-state (ignoring turbo mode). Ergo, the more energy efficient P-states are irrelevant to your application since there is almost never a situation where a processor supporting your well-tuned HPC app will enter one of them.

So in summary, the “HPC” portions of an HPC application will run near 100% utilization. Near 100% utilization pretty much guarantees always using the fastest (non-turbo) P-state, P0. Ergo, P-states have essentially no impact upon the performance of an HPC application.
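To make the mechanism concrete, here is an illustrative sketch of a utilization-threshold P-state selector. This is not the coprocessor's actual power management code; the function name, threshold, and state count are all invented for illustration.

/* Illustrative sketch of utilization-driven P-state selection; every name
 * and threshold here is an assumption, not taken from the coprocessor PM SW. */
typedef enum { PSTATE_P0, PSTATE_P1, PSTATE_P2, PSTATE_PN } pstate_t;  /* P0 = fastest */

pstate_t select_pstate(double utilization, pstate_t current)
{
    const double BUSY_THRESHOLD = 0.70;     /* hypothetical threshold */

    if (utilization >= BUSY_THRESHOLD)
        return PSTATE_P0;                   /* a busy core runs in the fastest P-state */

    if (current < PSTATE_PN)
        return (pstate_t)(current + 1);     /* step toward a more power-efficient P-state */

    return current;                         /* already in the most efficient P-state */
}

For a well-tuned HPC kernel the measured utilization stays near 100%, so a selector like this never leaves P0, which is exactly the argument made above.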

How do I get my application to run in one of those turbo modes? You cannot, as it is just too dangerous. It is too easy to make a minor mistake, resulting in overheating and damaging the coprocessor. If your processor supports turbo, leave its management to the OS.

 

Chapter 3: Core C-States - The Details

BACKGROUND: A QUICK REFRESHER ON IDLE STATES

Here is a quick summary of what C-states are. C-states are idle power saving states, in contrast to P-states, which are execution power saving states. During a P-state, the processor is still executing instructions, whereas during a C-state (other than C0), the processor is idle, meaning that nothing is executing. To make a quick analogy, a processor lying idle is like a house with all the lights on when no one is at home. Consuming all that power is doing nothing other than providing your electric company a little extra income. What is the best option? If no one is at home, meaning the house is idle, why leave the lights on? The same applies to a processor. If no one is using it, why keep the unused circuits powered up and consuming energy? Shut them down and save.

C0 is the “null” idle power state, meaning it is the non-idle state when the core is actually executing and not idle.

THE DIFFERENCE BETWEEN CORE AND PACKAGE IDLE STATES

The coprocessor has up to 60+ cores in one package. Core idle power states (C-states) are per core, meaning that one of those 60+ cores can be in C0, i.e. it is executing and not idle, while the one right next door is in a deep power conservation state of C6. In contrast, PC-states are Package idle states which are idle power conservation states for the entire package, meaning all 60+ cores and supporting circuitry on the silicon. As you can guess, to drop the package into a PC-6 state, all the cores must also be in a C6 state. Why? Since the package has functionality that supports all the cores, to “turn off” some package circuitry impacts all of them.

Figure 1. Dropping a core into a core-C1 state

 

WHAT CORE IDLE STATES ARE THERE?

Each core has 2 idle states, C1 and C6 (and, of course, C0).

C0 to Core-C1 Transition: Look at Figure 1. C1 happens when all 4 hardware (HW) threads supported by a core have executed a HALT instruction. At this point, let us now think of each of the 4 HW threads as the Operating System (OS) perceives them, namely as 4 separate CPUs (CPU 0 through 3). Step 1: the first three CPUs belonging to that core execute their HALT instruction. Step 2: that last CPU, CPU-0 in the illustration, attempts to execute its HALT instruction. Step 3: it interrupts to an idle residency data collection routine. This routine collects, you guessed it, idle residency data and stores that data in a data structure accessible to the OS. CPU 0 then HALTs. Step 4: at this point, all the CPUs are halted and the core enters a core-C1 state. In the core-C1 state, the core (and its “CPUs”) is clock gated[v].
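The sequence above can be restated as a small conceptual sketch. This is not driver or firmware code; the structure and function names below are invented purely to mirror steps 1 through 4.

/* Conceptual restatement of core-C1 entry; all names are invented. */
#include <stdbool.h>

#define HW_THREADS_PER_CORE 4

struct core_state {
    bool halted[HW_THREADS_PER_CORE];
    bool clock_gated;                  /* true once the core is in core-C1 */
};

void cpu_executes_halt(struct core_state *core, int cpu,
                       void (*record_idle_residency)(void))
{
    core->halted[cpu] = true;          /* steps 1-2: this CPU HALTs */

    for (int i = 0; i < HW_THREADS_PER_CORE; i++)
        if (!core->halted[i])
            return;                    /* another CPU is still running, so the core stays in C0 */

    record_idle_residency();           /* step 3: the last CPU stores residency data for the OS */
    core->clock_gated = true;          /* step 4: the core enters core-C1 (clock gated) */
}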

 

http://software.intel.com/sites/default/files/figure2_worthC6.jpg

Figure 2. Is it worth entering C6: Is the next interrupt far enough out?

 

http://software.intel.com/sites/default/files/5f/b5/figure3_worthC6.jpg

Figure 3. Is it worth entering C6: Is the estimated idle time high enough?

After entering core-C1: Now that the core is in C1, the coprocessor’s Power Management routine comes into play. It needs to figure out whether it is worthwhile to shut the core down further and drop it into a core-C6 state. In a core-C6 state, further parts of the core are shut down and power gated. Remember that the coprocessor’s Power Management SW executes on the OS core, typically core 0, and is not affected by the shutdown of other cores.

What type of decisions does the coprocessor’s Power Management have to make? There are two primary ones as we discussed in the last chapter: Question #1: Will there (probably) be a net power savings? Question #2: Will any restart latency adversely affect the performance of the processor or of applications executing on the processor? Those decisions correspond to two major scenarios and are shown in Figures 2 and 3 above. Scenario 1 is where the coprocessor PM looks at how far away the next scheduled or expected interrupt is. If that interrupt is soon enough, it may not be worth shutting down the core further and suffering the added latency caused by bringing the core back up to C0. As is the case in life, the processor can never get anything for free. The price of dropping into a deeper C state is an added latency resulting from bringing the core/package back up to the non-idle state. Scenario 2 is where the coprocessor’s Power Management looks at the history of core activity (meaning its HW threads) and figures out whether the execution (C0) and idle (C1) patterns of the core make core-C6 power savings worthwhile.

If the answers to both of these questions are “yes”, then the core drops down into a core-C6 state.
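A rough sketch of those two checks is shown below. The coprocessor PM's real heuristics and numbers are not spelled out here, so every name, threshold, and cost figure in this fragment is an assumption made for illustration.

/* Illustrative sketch of the two core-C6 entry questions; all values are invented. */
#include <stdbool.h>
#include <stdint.h>

bool should_enter_core_c6(uint64_t usec_until_next_interrupt,
                          uint64_t recent_idle_usec,
                          uint64_t recent_busy_usec)
{
    const uint64_t C6_TRANSITION_COST_USEC = 100;   /* hypothetical entry + exit latency */

    /* Question #1: is the next expected interrupt far enough away that the
       power saved outweighs the latency of waking the core back up? */
    bool interrupt_far_enough =
        usec_until_next_interrupt > 10 * C6_TRANSITION_COST_USEC;

    /* Question #2: does the core's recent C0/C1 history suggest it is mostly idle? */
    bool mostly_idle = recent_idle_usec > 4 * recent_busy_usec;

    return interrupt_far_enough && mostly_idle;
}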

After entering core-C6: Well dear reader, it looks like I have run out of time. The processor next decides if it can drop into the package idle states. I will cover that discussion in my next blog in this series.

 

Chapter 4: Package C-States - The Details

TERMINOLOGY NOTE:

Upon reading the SDG (Intel Xeon Phi Coprocessor Software Developer’s Guide), you’ll find a variety of confusing names and acronyms. Here’s my decoder ring:

Package Auto C3[vi]: also referred to as Auto-C3, AutoC3, PC3, C3, Auto-PC3 and Package C3

Package Deep-C3: also referred to as PC3, DeepC3, DeeperC3, Deep PC3 and Package C3 (No, I am not repeating myself.)

Package C6: Also referred to as PC6 and C6 and Package C6.

BACKGROUND: WHAT THE HECK IS THE “UNCORE”?

Before we dig deep into package C-states, I want to give you some background about circuitry on a modern Intel® processor. A natural way of dividing up the circuitry of a processor is into the circuitry composing the cores -- basically that supporting the pipeline, ALUs, registers, cache, etc. -- and everything else (supporting circuitry). It turns out that “everything else” can be further divided into support circuitry not directly related to performance (e.g. PCI Express* interfacing), and that which is (e.g. the bus connecting cores). Intel calls the support circuitry that directly impacts the performance of an optimized application the “Uncore”.

 

 http://software.intel.com/sites/default/files/blog2bFig0_0.jpg

Figure 4. Circuitry types on the coprocessor

 

Now that that is out of the way, let us get back to package C-states.

WHY DO WE NEED PACKAGE C-STATES?

After gating the clocks of every one of the cores, what other techniques can you use to get even more power savings? Here’s a trivial and admittedly flippant example of what you could do: unplug the processor. You’d be using no power, though the disadvantages of pulling the power plug are pretty obvious. A better idea is to selectively shut down the more global components of the processor in such a way that you can bring the processor back up to a fully functional state (i.e. C0) relatively quickly.

Package C-States are just that - the progressive shutdown of additional circuitry to get even more savings. Since we have already shut down the entire package’s circuitry associated with the cores, the remaining circuitry is necessarily common to all the cores, thus the name “package” C-states.

WHAT PACKAGE IDLE STATES ARE THERE?

My dear readers, there are 3 package C states: Auto-C3, Deep-C3, and (package) C6. As a reminder, all these are package C-states, meaning that all the threads/CPUs in all the cores are in a HALT state. I know what you are thinking. “If all the cores in the coprocessor are in a HALT state, how can the Power Management (PM) software (SW) run?” That’s a good question. The answer is obvious once you think on it. If the PM SW can’t run on the coprocessor, where can it run? It runs on the host, of course.

 http://software.intel.com/sites/default/files/blog2bFig1.jpg

Figure 5. Coprocessor and host power management responsibilities and control

 

There are two parts to controlling power management on the Intel® Xeon Phi™ coprocessor, the PM SW that runs on the coprocessor, and the PM component of the MPSS Coprocessor Driver that runs on the host. See Figure 5 above. The coprocessor part controls transitions into and out of the various core C-states. Naturally, when it is not possible for the PM SW to run on the coprocessor, such as for package Deep-C3 and package C6, the host takes over. Package Auto-C3 is shared by both.

WHAT IS SHUT DOWN IN THE PACKAGE C-STATES?

I was going to rewrite this table but it is so clear, I am stealing it instead. It is Table 3-2 of the Intel® Xeon Phi™ Coprocessor Software Developer’s Guide (SDG).

Package Idle State | Core State | Uncore State | TSC/LAPIC | C3WakeupTimer                    | PCI Express* Traffic
PC3                | Preserved  | Preserved    | Frozen    | On expiration, package exits PC3 | Package exits PC3
Deep C3            | Preserved  | Preserved    | Frozen    | No effect                        | Times out
PC6                | Lost       | Lost         | Reset     | No effect                        | Times out

 

And for those of you who want a little more detail:

Package Auto-C3: Ring and Uncore clock gated

Package Deep-C3: VccP reduced

Package C6: VccP is off (i.e. the cores, Ring, and Uncore are powered down)

TSC and LAPIC are clocks which stop when the Uncore is shut down. They have to be set appropriately when the package is reactivated. “PC3” is the same as the package Auto-C3 state.

HOW IDLE PACKAGE C-STATE TRANSITIONS ARE DETERMINED

Into Package Auto-C3: You can think of the first package state, Auto-C3, as a transition state. The coprocessor PM SW can initiate a transition into this state. The MPSS PM SW can override this request under certain conditions, such as when the host knows that the Uncore part of the coprocessor is still busy.

We will also see that the package Auto-C3 state is the only package state that can be initiated by the coprocessor’s power management. Though this seems a little unfair at first, upon further thinking the reason is obvious. At the start of a transition into package Auto-C3, the coprocessor SW PM routine is running and can initiate the transition into the first package state. (To be technically accurate, the core executing the PM SW can quickly transition out of a core C-state into C0.)

Beneath Auto-C3, the coprocessor isn’t executing, and transitions to deeper package C-states are best controlled by the host PM SW. This is not only because the coprocessor’s own PM SW is essentially suspended, but also because the host can see what is happening in a more global sense, such as Uncore activity after all the cores are gated, and traffic across the PCI Express* bus.

Into Package Deep-C3: The host’s coprocessor PM SW looks at idle residency history, interrupts (such as PCI Express* traffic), and the cost of waking the coprocessor up from package Deep-C3 to decide whether to transition the coprocessor from a package Auto-C3 state into a package Deep-C3 state.

Into Package C6: Same as the Package Deep-C3 transition but only more so.

 

Chapter 5: An Intuitive Description of Power States Using Stick Figures and Light Bulbs

AN INTUITIVE ILLUSTRATION OF A CORE AND ITS HW THREADS

This is the fourth installment of a series of blogs on Power Management for the Intel Xeon Phi coprocessor.

For those of you who have read my blog presenting an intuitive introduction to the Intel Xeon Phi coprocessor, The Intel Xeon Phi coprocessor: What is it and why should I care? PART 3: Splitting Hares and Tortoises too, you may remember that I irreverently referred to “diligent high tech workers who labor ceaselessly for their corporate masters”. Let’s take this description a little further. In Figure 6, we have one such diligent high tech worker. He is analogous to one coprocessor CPU/HW thread.

http://software.intel.com/sites/default/files/powerBlog_states_pt3_FigA_high_tech_worker_3.png

Figure 6. Diligent high tech worker, i.e. an Intel® Xeon Phi™ HW thread

 

There are 4 HW threads to a core. See Figure 7. It’s pretty obvious so I’m not going to bother with a multipage boring description of what it means. There is also that mysterious light bulb. The light bulb represents the infrastructure that supports the core, such as timing and power circuits.

http://software.intel.com/sites/default/files/powerBlog_states_pt3_FigB_high_tech_workers_0.png

Figure 7. Diligent high tech workers in a room, i.e. an Intel® Xeon Phi™ coprocessor core

 

POWER MANAGEMENT: Core C0 and C1

So what does all this have to do with power management? Though it is sometimes assumed by the lower paid liberal arts students that engineers are unimaginative and boring, you and I know that, though boring we may be, we are not unimaginative. With this in mind, I ask you to visualize that on every one of those desks is a computer and a desk light.

The Core in C0: When at least one of the high tech workers is diligently working at their task. (I.e. At least one of the core’s CPUs/HW threads is executing instructions.)

CPU Executing a HALT: When one of those diligent workers finishes his task, he turns out his desk lamp, shuts down his computer, and leaves. (I.e. one of the HW threads executes a HALT instruction.)

Entering Core-C1: When all four diligent workers have finished their tasks and left, i.e. all four HW threads have executed HALT instructions, the last one out turns off the room lights. (I.e. The core is clock gated.)

 

POWER MANAGEMENT: Core-C6

Entering Core-C6: Yes, I know it’s blatantly obvious, but I like talking to myself. As time proceeds, everyone leaves for lunch. Since no one is in the office, we can shut things down even further in the rooms (i.e. power gating). Remember, though, that they are coming back after lunch so anything shut down must be able to be powered back up quickly.

 

http://software.intel.com/sites/default/files/powerBlog_states_pt3_FigC_building_of_high_tech_workers_1.png

Figure 8. A building full of diligent high tech workers, i.e. an Intel® Xeon Phi™ coprocessor

 

POWER MANAGEMENT: Package Auto-C3, Package Deep-C3 and Package C6

Now I’m going to stretch this analogy a little bit, but since it is fun, I’m going to keep on going.

Let’s expand this very creative analogy. Imagine, if you will, a building with many rooms, 60+ in point of fact. See Figure 8. Yes, I know that here in Silicon Valley, diligent high tech workers work in luxurious cubes, not stuffy offices. Unfortunately, the analogy breaks down at that point so I am sticking with communal offices.

Entering Package Auto-C3: Everyone has left the floor, so the movement sensor automatically shuts off the floor lights. (I.e. the coprocessor power management software clock gates the Uncore and other support circuitry on the silicon.)

Entering Package Deep-C3: It’s the weekend, so facilities (i.e. the MPSS Coprocessor Driver Power Management module) shuts down the air conditioning and phone services. (I.e. the host reduces the coprocessor’s VccP and has it ignore interrupts.)

Entering Package C6: It’s Christmas week shutdown and forced vacation time, so facilities turns off all electricity, air conditioning, phones, servers, elevators, toilets, etc. (I.e. the host turns off the coprocessor’s VccP and shuts down its monitoring of PCI Express* traffic.)

POWER MANAGEMENT: Getting Obsessive

Having fun with this analogy, I was thinking of extending it further into industrial campuses (a node containing multiple coprocessors), international engineering divisions (clusters with each node containing multiple coprocessors) and contracting with external partners (distributed WAN processing). Sanity and common sense prevailed and I leave the analogy as is.

 

 

Summary

We discussed the different types of power management states. Though the concepts are general, we concentrated on a specific platform, the Intel® Xeon Phi™ coprocessor. Most modern processors, be they from Intel, AMD*, or embedded vendors, have such states with some variation.

Intel® processors have two types of power management states, P-states (runtime) and C-states (idle). The C-states are then divided into two more categories, core and package. P-states are runtime (C0) states and reduce power by slowing the processor down and reducing its voltage. C-states are idle states, meaning that they shut down parts of the processor when the cores are unused. There are two types of C-states. Core C-states shut down parts of individual cores/CPUs. Since modern processors have multiple cores, package C-states shut down the circuitry that supports those cores.

The net effect of these power states is to substantially reduce the power and energy usage of modern Intel® processors. This energy reduction can be considerable, in some cases by over an order of magnitude.

The impact of these power savings cannot be overstated for all platforms from smart phones to HPC clusters. For example, by reducing the power and energy consumption of the individual processors in an HPC cluster, the same facility can support more processors. This increases processor density, reduces communication times between nodes, and makes possible a much more powerful machine that can address larger and more complex problems. At the opposite end, in portable, passively cooled devices such as smart phones and tablets, reduced power and energy usage lengthens battery life and reduces cooling issues. This allows more powerful processors, which in turn increases the capability of such devices.

 

Appendix: C-States, P-States, where the heck are those T-States?

I had an interesting question come across my desk a few days ago: “Is it still worthwhile to understand T-states?” My first response was to think, “Huh? What the heck is a T-state?”

Doing a little more research, I discovered that, yes, there is something called a T-state, and no, it really isn’t relevant any more, at least for mainline Intel® processors.

Let me say this again: T-States are no longer relevant!

Now that the purely practical people have drifted off to other more “relevant” activities, here’s something for all you power management history buffs.

A T-state was once known as a Throttling state. Back in the days before C and P states, T-states existed to save processors from burning themselves up when things went very badly, such as when the cooling fan failed while the processor was running as fast as she could. If a simple well placed temperature sensor registered that the junction temperature was reaching a level that could cause damage to the package or its contents, the HW power manager would place the processor in different T-States depending upon temperature; the higher the temperature, the higher the T-State.

As you probably already guessed, the normal run state of the processor was T0. When the processor entered a higher T-state, the manager would clock gate the cores to slow down execution and allow the processor to “relax” and cool. For example, in T1 the HW power manager might clock gate 12% of the cycles. In rough terms, this means that the core will run for about 88% of the time and sleep for the rest. T2 might clock gate 25% of the cycles, etc. In the very highest T-state, over 90% of the cycles might be clock gated. (See the figure below.)

 http://software.intel.com/sites/default/files/power_t_states_r1a_1.jpg

Figure 9. Running Time for T0/P0, P1, and T1 States

Note that in contrast to P-states, the voltage and frequency are not changed. Also, with T-states the application runs slower not because the processor is running at a lower frequency, but because it is suspended for some percentage of the time. In some ways, you can think of a T-state as being like a clock gated C1 state with the processor not being idle, i.e. it is still doing something useful.
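As a quick back-of-the-envelope check of the duty-cycle arithmetic above (the gating percentages are illustrative, not values from a published T-state table):

/* Duty-cycle math for clock-gated T-states: effective throughput scales with
 * the fraction of cycles that are not gated, since frequency and voltage stay fixed. */
#include <stdio.h>

int main(void)
{
    double gated_fraction[] = { 0.12, 0.25, 0.90 };   /* e.g. T1, T2, highest T-state */
    for (int i = 0; i < 3; i++)
        printf("gated %2.0f%% of cycles -> running ~%2.0f%% of the time\n",
               gated_fraction[i] * 100.0, (1.0 - gated_fraction[i]) * 100.0);
    return 0;
}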

In the figure above, the topmost area shows the runtime of a compute intensive workload if no thermal overload occurs. The bottom shows the situation with T-states (i.e. before P-states), where the processor toggles between running and stopped states to cool down. The middle is what happens in current processors, where the frequency/voltage pair is reduced, allowing the processor to cool.

For those of you who have borne with me for the history lesson, there are a few more practical reasons you should be at least aware of T-states.

(1)    Some technical literature now uses the term “throttling states” to mean P-states, not T-states.

(2)    Some power management data structures, such as some defined by ACPI, still include an unused T-state field. Many inquiries about T-states originate from this little fact.

(3)    I suspect that T-states are still relevant in some embedded processors.

 

References

Kidd, Taylor (10/23/13) - “List of Useful Power and Power Management Articles, Blogs and References,” http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references, downloaded 3/24/2014.

For those of you with a passion for power management, check out the Intel Xeon Phi Coprocessor Software Developer’s Guide. It has state diagrams and other goodies. I recommend sections 2.1.13, “Power Management”, and all of section 3.1, “Power Management (PM)” for your late night reading.

 

NOTE: As previously in my blogs, any illustrations can be blamed solely on me as no copyright has been infringed or artistic ability shown.

 

Endnotes

 

 

[i] I expect to publish these blogs over the May / June 2014 time frame.

[iv] For those that need a quick refresher, turbo mode is a set of over-clocked P-states that exceed the normal power limits of the silicon. If run in these P-states under normal conditions, the silicon would overheat and potentially burn up. Turbo is possible because these normal power limits are computed based upon every core running at maximum performance. There are many situations where the entire power budget is not utilized. In these cases, the power management SW can allow a temporary overclocking.

[v] CPUs have at least one oscillator (clock) that emits a timing pulse. The circuits of the processor use this timing pulse to coordinate all activities.

[vi] Auto-PC3 may have been removed as of MPSS 3.1. Even so, it is worthy of a discussion as it illustrates the impact of latency and local vs. remote management.
