GIVING GRAPHICS A NEW PURPOSE

Are parallel GPUs the next big performance leap? Jeremy Laird scopes the possibilities

Fancy improving the performance of your PC by a factor of 20 without spending a single, snivelling cent? That's exactly the promise of general purpose computing on the GPU, or GP-GPU for short. Simply shunt your most computationally intensive apps onto your graphics card and let it do its impossibly parallel thing. It's quite a concept. But is it an honest-to-goodness goer? That is very probably the single most important question for PC performance over the next five years.

Certainly, in raw computational terms, high-end GPUs already sock it to CPUs in spectacular fashion. Take AMD's latest pixel-pumping beastie, the Radeon HD 4870. AMD says it'll spew out no less than 1.2 trillion floating point operations in a single second. That's 1.2 tflops in egghead parlance. By contrast, Intel reckons its finest Core 2 Quad CPU does a piffling 51.2 gflops. Astonishingly, that makes a Radeon HD 4870 over 20 times more powerful.

What's more, it's not just current CPUs that cower before the monumental grunt of GPUs. Even supercomputers look a little silly by comparison. You only have to wind back to 1997 to find that the fastest room-filling, rack-mounted monster of the day offered similar floating point power to a single modern GPU. In other words, in the space of little over 10 years, GPU makers have shrunk $50m worth of supercomputer into a chip costing approximately $100 to manufacture.

If you so much as dabble in gaming, therefore, odds are the most powerful chip inside your PC is, in fact, already located on your graphics card. So it's easy to understand why the GP-GPU is such a seductive idea. At best, a new CPU architecture could double processing power over the previous generation. But if GPUs can deliver anything like 20 times the grunt, well, that knackered old cliché of desktop supercomputing will actually come true and you could be re-encoding Blu-ray rips on your GPU in the time it takes you to say copyright infraction.
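For the number crunchers, the claimed gap works out like this, taking both vendors' quoted figures at face value:

$$\frac{1.2\ \text{TFLOPS}}{51.2\ \text{GFLOPS}} = \frac{1{,}200\ \text{GFLOPS}}{51.2\ \text{GFLOPS}} \approx 23.4$$

So "over 20 times" is, if anything, slightly conservative on paper. How much of that shows up in real applications depends entirely on how parallel the workload is, which is the crux of everything that follows.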



As for what other applications could benefit from the power of GPUs, the key issue is parallelism. Even more so than multi-core CPUs, GPUs are multi-threaded monsters that only deliver their best with highly parallel software. More on that in a moment. For now, suffice to say that claims of harnessing that horsepower for anything other than graphics have so far been little more than hubris.

Intriguingly, all three of the big players in PC chips, AMD, Intel and NVIDIA, agree the GPU is going to revolutionise desktop computing. But it's the details they differ on that are perhaps most revealing. At opposite ends of the spectrum are Intel and NVIDIA. For the latter, the GP-GPU initiative boils down to a fight for its very existence. A senior NVIDIA suit told PCF as long ago as last summer that it's the most important project the company has. For Intel, it's an opportunity to own every important chip in your PC. As for AMD, it's sitting precariously in between, just as it has been on most matters since it purchased ATI and became the only company with a major stake in both high-performance PC processors and graphics chips.

"THE BIG CHIP PLAYERS AGREE THAT THE GPU WILL REVOLUTIONISE THE DESKTOP"

To get to grips with these competing visions, we've pumped some of the finest GP-GPU brains on the planet for information. Of course, no discussion of the future of the PC is complete without a word from Microsoft, so we'll see what the Beast of Redmond has to say along with the likes of AMD, Intel and NVIDIA. But before we get to the talking heads, it's worth understanding just how GPUs became the preposterously parallel processors they are today and how the simple possibility of general purpose processing on graphics chips arose.

The first half of the story is the hardware, and that begins with ancient 2D video cards designed to improve the resolution and colour fidelity of the PC, like the IBM MDA of 1981. Then came the first graphics chips with hardware 3D acceleration support. The most influential was 3dfx's Voodoo, which introduced a number of rendering technologies in 1996, including mip-mapping and anti-aliasing, that improved image quality out of sight.

Although rendering power improved dramatically over the next few years, one thing didn't change. All graphics chips up to that point were essentially fixed function. They were made of dedicated, inflexible units designed to do a single task, such as texture filtering or triangle setup, really, really well. But programmability, the ability to run arbitrary code, simply didn't figure. If it wasn't graphics processing, it wasn't on the menu.

SHADER POWER

The first whiff of real flexibility arrived in 2001 with NVIDIA's GeForce 3 and the Radeon 8500 from ATI. These were the very first GPUs to sport programmable shaders designed to deliver jazzy, ultra-realistic lighting effects. However, the scope of the code these early shader-based chips could cope with was darned limited. So, as NVIDIA's head honcho for all things GP-GPU related, Andy Keane, says: "The GP-GPU project really started for us with the release of NV30 (the rather ill-fated GeForce FX family) in 2003." Along with the ATI Radeon 9700, this GPU supported Microsoft's new DirectX 9 API and, therefore, the much, much more flexible Shader Model 2 hardware standard.

It was the arrival of DirectX 10 and a new 'unified' shader model in late 2006 that opened the door to running general purpose apps on GPUs. Instead of using two different types of rather limited shaders – one for pixel processing and one for vertex work – unified GPUs have a single, much more flexible and programmable shader unit. It's this generation of GPU, in the form of the GeForce 8 series, that NVIDIA targeted with the first really significant software interface for GP-GPU applications – for consumer-level software at least – NVIDIA's CUDA platform.

And that brings us to the second half of the GP-GPU equation: software support. Running general purpose apps on GPUs is going to require a new software model in two parts. The first is a software layer that gives access to the GPU's compute resources; an API, in other words. The second is properly multi-threaded software. In that sense, the more programmable nature of GPUs isn't the only factor driving the development of code suitable for the latest graphics chips. Without the rise of multi-core CPUs and the increasing efforts of software developers to get their collective noggins around parallel computer code, there would be no real hope of leveraging all that shader power inside the GPU.

Above Fun with fluids, part of NVIDIA's GeForce PowerPack

ELEVENTH HEAVEN?

Yup, Microsoft is the company we love to hate. Granted, in recent years it's seemed incapable of producing a half-decent operating system despite possessing the wealth of nations. Vista's a train wreck, Windows Mobile an abomination. But the one thing most people agree Microsoft has got right over the years is multimedia APIs, both on the PC and on the Xbox 360, not least the impressive job it did with the XNA development platform that spans the two. As luck would have it, it's the DirectX API that is most relevant to GP-GPU.

The big news, therefore, will be the introduction of a new Compute Shader standard with DirectX 11. Over to our man in the know at MS, Chas Boyd: "Our goal with the Compute Shader is to deliver the tflop performance potential of GPUs to a wider range of algorithms," Boyd says. He reckons that the data-parallel processing model used by DirectX for graphics has been the absolute bomb for efficient performance scaling as more and more processing elements have been added to GPUs. It's that super-scaling efficiency MS has in its sights for general purpose algorithms running courtesy of the Compute Shader: "With the Compute Shader, there should be no need to rewrite general purpose code as cores are added," he says. Software will just keep on scaling, regardless of how many shader units or cores are on offer. It's an interesting comment in the context of NVIDIA's reservations about the Larrabee architecture's ability to manage multiple software threads.

One of the key features of the Compute Shader is the ability to share data between threads. "With a pixel shader today, you can process all you want for that one pixel. But you can't find out what is happening with neighbouring pixels. That restraint will be largely lifted." The general thrust, in other words, is new memory access patterns, data sharing and caching that will allow more complex general purpose algorithms to run on GPUs.
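Boyd's neighbouring-pixels point is easier to see in code. The Compute Shader itself isn't shipping yet, so the sketch below leans on CUDA's existing shared memory, which already lets threads in the same block swap data on GeForce 8 series and later cards. The kernel, its names and the three-point average are purely illustrative, not taken from any real application.

```cuda
// Illustrative only: a three-point average in which each thread reads values
// staged by its neighbours - the kind of data sharing a plain pixel shader
// cannot do, and which DirectX 11's Compute Shader is intended to allow.
// Assumes a launch with blocks of 256 threads or fewer.
__global__ void blur3(const float *in, float *out, int n)
{
    __shared__ float tile[256];          // visible to every thread in this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n)
        tile[threadIdx.x] = in[i];       // each thread stages one element
    __syncthreads();                     // wait until the whole tile is filled

    // Interior threads can now peek at what their neighbours loaded.
    if (threadIdx.x > 0 && threadIdx.x < blockDim.x - 1 && i + 1 < n)
        out[i] = (tile[threadIdx.x - 1] + tile[threadIdx.x] + tile[threadIdx.x + 1]) / 3.0f;
}
```

The interesting bit is the shared tile: every thread in a block can see what its neighbours wrote, which is precisely the restraint Boyd says DirectX 11 will lift for general purpose code.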

KEEPIN' IT REAL

The hype surrounding GP-GPU has been building to a deafening crescendo over the past 12 months. But the number of actual, usable GPU applications outside of scientific and enterprise environments has essentially been zip. That could all be about to change. As ever with GP-GPU, it's NVIDIA pushing things forward with the release of its so-called GeForce PowerPack. It's a collection of apps and demos designed to prove that there's something in this GP-GPU lark after all.

On close inspection, much of it is actually the same thing – namely the recently acquired Ageia PhysX physics modelling engine inserted into a number of different games and running on the GPU. Yup, it's visually spectacular stuff. But it's also a bit of a gimmick.

Above NVIDIA demos PhysX, in a familiar Ageia way. Impressive, but gimmicky
NVIDIA's PowerPack also shows off soft-body physics by tossing and cutting up a gooey alien in the spirit of universal peace and harmony
If you get bored of cutting up aliens, you can always watch a girl's dress swirl in another of NVIDIA's PowerPack demos

More significant, therefore, is a little app called Badaboom. It's a custom-built video transcoding engine designed to expose the parallel power of NVIDIA's GPUs – and note that it's very much an NVIDIA-only application. It's slick-looking and has an easy-to-use menu system. There are options to create output files for various popular devices, including the usual Apple products as well as games consoles. And it promises a massive increase in performance, especially with the most powerful NVIDIA chips.

That's the good news. Here's the bad. It's pretty limited in terms of both video and audio input codec support. And even when it can access a source file, it has a nasty habit of falling over on the job. But how good is it when it does work? Well, that's actually hard to say, since it uses a proprietary H.264 codec to create output files and it's not possible to run it on the CPU alone. So making a direct comparison between CPU and GPU performance is basically impossible. Still, comparing Badaboom to a quick Intel quad-core CPU doing the same encoding job using the similar x264 codec suggests that GPU performance outstrips the best CPUs by two to five times. Not quite the multiple orders of magnitude the hype had us hoping for. But at the top end of the range, it's still a huge leap and a tantalising prospect, assuming the bugs can be ironed out.

Above Running Task Manager with Badaboom shows that it's not the GPU alone that's providing the processing grunt

THE BIG THREE

Without doubt, NVIDIA has been the most proactive of the big three in bigging up the GP-GPU concept. Given that it's the only one of the trio that lacks a PC processor product, you could dismiss that as a desperate move by a company struggling to maintain relevance. But, in fact, NVIDIA has been doing most of the running and deserves to get the first word in on any discussion of GP-GPU and what it means.

In simple terms, NVIDIA's CUDA is an API that allows software devs to use C programming, in moderately modified form, to create non-graphics code for execution on GeForce 8 series and later NVIDIA GPUs. It was rolled out for Windows XP in November 2007 and support for Vista was added in June this year. Speaking about the state of GP-GPU today, Andy Keane says: "The reality on the desktop today is that there's only CUDA." We wouldn't argue. But the key thing about CUDA is that it only works with NVIDIA's GPUs.

Historically, that kind of approach hasn't been AMD's bag. Admittedly, that has often been because it's lacked the cash to fund its own thing. It isn't surprising, then, to find that AMD is mucking in with the two most important cross-architecture GP-GPU interfaces, OpenCL and Microsoft's upcoming DirectX Compute Shader. OpenCL is basically the GP-GPU equivalent of the open OpenGL graphics API, while the Compute Shader is essentially Microsoft's take on GP-GPU and will hit your PC with the arrival of DirectX 11.

Anywho, AMD's graphics CTO Raja Koduri says NVIDIA's CUDA was just part of phase one for the GP-GPU industry: "It was about gaining interest and traction. Now we are moving to phase two with the emergence of cross-platform standards and we will see an acceleration in support from software developers. Interest levels in a proprietary interface like CUDA will generally be small." That's an angle the third big player, Intel, agrees with. Doug Carmean, one of the leading figures behind Intel's upcoming Larrabee graphics chip, reckons that "developers don't want to get locked into a single interface or architecture."

With that in mind, you may be getting the feeling that this GP-GPU malarkey is going to end up as the usual fisticuffs, with NVIDIA's CUDA taking on the rest. But not according to Keane. He says that NVIDIA is a key member of the OpenCL panel and that it's working hard to support third-party interfaces alongside CUDA. He is adamant, for instance, that NVIDIA has thrown its full support behind the DX11 Compute Shader: "The language used for GP-GPU doesn't matter. What's important is getting the value of that parallel processing capability to our customers any way we can," he says.

A worthy sentiment, and hopefully an indication that it'll be the cross-platform APIs that dominate GP-GPU. It should also mean that, in terms of software support, none of us will need to worry about the brand of GPU in our PCs. As new GP-GPU applications appear, most should just work on any graphics card from one of the big three vendors. All of which means we can get suitably anal about the thing that matters most, comparative performance, right? Well, yes, but that only raises an even trickier question. Whose graphics chips will be the fastest in GP-GPU applications?

"INTEL RECKONS LARRABEE'S MORE PROGRAMMABLE DESIGN WILL BEAT CONVENTIONAL GRAPHICS CHIPS"

KILLER CHIP

On the one hand, you have AMD and NVIDIA taking a pretty similar approach of developing ever more flexible GPUs. However, these GPUs are still based on the fixed-function model. Large areas of the chip are still reserved for specific purposes, be that anti-aliasing, texture filtering, triangle setup and so on. The key point here is that many of those parts are no good for GP-GPU. Either they aren't programmable at all or they aren't flexible enough.

But not so for Intel's new killer, the Larrabee chip. Intel has torched nearly all the fixed-function units found in conventional GPUs in favour of an architecture that's close-as-dammit completely programmable. In fact, it's basically a bunch of mini x86 CPUs packed together into a single chip. Needless to say, Intel reckons Larrabee's more programmable architecture will beat conventional graphics chips silly for GP-GPU applications. "We have 40 years of x86 legacy and software support," says Intel's GP-GPU performance specialist Francois Piednoel. "By comparison, GPUs are constantly changing their architectures. It's not realistic for developers to keep modifying their code for each new graphics architecture."



Intel has also revved up the 1990s-era Pentium cores that form the basis of Larrabee with powerful 16-wide vector units, in an effort to give the chip monumental floating-point power – the sort of power that will be the focus of GP-GPU apps. More to the point, Larrabee's increased programmability means just about all of it will be available to process GP-GPU apps, not just graphics.

As it happens, that fact is rather revealing. Larrabee will be pitched as a graphics chip at launch. But in reality it's probably more important for Intel that it takes the wind out of NVIDIA's billowing GP-GPU sails than anything else. It sure looks like Intel has produced the ultimate chip for GP-GPU computing. What's more, Piednoel says that current GPUs are relatively poor at synchronising and sharing data between threads, though he wouldn't comment on the mutterings we heard at this year's Intel Developer Forum that NVIDIA's cache hierarchy isn't quite as described and requires workarounds for what was described as "arcane mapping for texture cache lines". Whether that's true or not, Piednoel is clear about one thing: Larrabee's architectural advantage is so great that "in ten years, everything in the PC will be x86 based and we will laugh that we ever doubted this idea." That's the attempt to own every important chip in your PC we mentioned earlier...

GP-GPU IS C-COMING

Of course, you won’t be shocked to learn that AMD and NVIDIA aren’t on the same message as Intel. AMD’s Koduri says: “The problem isn’t the capabilities of GPUs. It’s the tools used to access them.” Granted, he concedes these capabilities can be “hard to get at” with current chips. But AMD’s next generation of GPUs will be much easier to access. AMD’s Close to the Metal initiative, which has now been running for several years, is intended to make that happen by helping developers better understand AMD graphics architectures. Factor in improved software tools and interfaces, including OpenCL and DX11 and Koduri is confident that AMD’s GP-GPU architecture will more than flexible enough to take on Intel.

Above The new Photoshop will only use GPU-acceleration for canvas rotation and zoom

“SPEAK TO AMD, INTEL OR NVIDIA AND YOU’LL FIND THEY’RE GENUINELY REVVED UP ABOUT GP-GPUS ” on the mutterings we heard at this year’s Intel Developer Forum that NVIDIA’s cache hierarchy isn’t quite as described and requires workarounds for what was described as “arcane mapping for texture cache lines.” Whether that’s true or not, Piednoel is clear about one thing. Larrabee’s architectural advantage is so great: “In ten years, everything in the PC will be x86 based and we will laugh that we ever doubted this idea.” That’s the attempt to own


Likewise, NVIDIA’s Keane isn’t going along with the idea that Larrabee will blow the doors off its GP-GPUs for performance: “Larrabee faces the same challenges in getting code parallelised as any other architecture. Intel’s quad-core processors don’t run single-threaded software four times as fast as a single-core CPU and Larrabee’s however-many cores won’t magically accelerate non-parallelised code, either.” Keane reckons thread management

Below There’s a GP-GPU test in the new version of Sandra. It only seems to work with NVIDIA chips, mind

Still, despite the usual three-way cat fight, there is one thing these ancient enemies can agree on – GP-GPUs won't be killing the CPU any time soon: "I prefer to think of GP-GPUs as enabling new applications rather than offloading work from the CPU," AMD's Koduri says. And what applications might those be? Well, an advanced UI that can respond to physical gestures and recognise your mood, courtesy of facial expressions, is one possibility. The technology already exists, he claims, but it's way too compute intensive to run on current desktops. Another option is intelligent, real-time video transcoding that guarantees high image quality in a world where both source files and screen res can often be all over the shop.

Andy Keane of NVIDIA agrees. "Our vision is that the CPU will be the system manager. It's where the operating system lives and provides the benefit of decades of legacy. The GPU will not be stealing workloads from the CPU, so much as doing new things or helping the CPU with work it currently does particularly badly," he says. So, really, GP-GPU is just NVIDIA lending a helping hand? Yeah, right.

Still, the idea that the GP-GPU will augment rather than undermine the CPU isn't the only common ground. Speak to anyone in the know at AMD, Intel or NVIDIA and you'll find they're genuinely revved up about the prospects for the GP-GPU and share the same conviction that sooner or later its immense processing potential will be unleashed on a wide range of applications, many of which have yet to even achieve glint-in-the-developer's-eye proportions. In the end, it's not a question of if GP-GPU is going to revolutionise PC performance, but simply how soon. ¤

UPCOMING APPS

Along with NVIDIA's GeForce PowerPack suite, SiSoftware has also released a new version of the Sandra benchmark with a special GP-GPU test. So far, we have only been able to run it on NVIDIA chips, so it's not awfully revealing. Adobe has also announced that Creative Suite 4, due out very soon, will be GPU accelerated. That sounds impressive until you inspect the details and find that, for instance, the accelerated features in Photoshop will be limited to canvas rotation and zoom at launch. Not exactly comprehensive, then. What's more, the GP-GPU options in Premiere Pro and After Effects are limited to NVIDIA's silly-money Quadro workstation cards. CyberLink is also gearing up to release a GP-GPU capable version of PowerDirector. Version 7 will deliver what CyberLink claims will be "much faster video rendering", as well as accelerated features such as Gaussian radial blur and Pen ink, for what that is worth. Oh, and once again, we're talking NVIDIA-only GP-GPU support for PowerDirector 7.

In truth, we won't get a feel for the real scope of what GP-GPU can do until DirectX 11 arrives. We do know it will support both the Windows Vista and Windows 7 operating systems. We expect DX11 to presage the widespread use of GPUs to accelerate in-game physics and AI, as well as a broad range of more obviously general purpose programs. Intel also assured us that a significant number of general-purpose applications will be available to run on the Larrabee processor from launch day. What more can we say other than watch this space, amigos.

Above GPU-accelerated apps are slowly trickling through. Creative Suite 4 will use it in a very limited fashion
Above The new Photoshop will only use GPU-acceleration for canvas rotation and zoom
Below There's a GP-GPU test in the new version of Sandra. It only seems to work with NVIDIA chips, mind