
The future of 3D

A new generation of incredible graphics power is less than a year away. Neil Mohr lines up the new technology.

Are we addicted to graphics power? Are NVIDIA and AMD, in reality, pushers of the most addictive shader skag? In any other walk of life you'd be dragged off to rehab for such an intensely self-destructive cycle of substance abuse. But just like that heroin addict – who's currently trying to break into your home to steal your pride-and-joy gaming system – the PC industry is addicted to graphics. It's the driving force behind the games we love, the reason we love the PC in the first place, and why consoles will always be toys in comparison.

Of course, we could turn around, throw our hands heavenwards and shout: 'Enough is enough, this madness must end! Stop the development! My graphics card is good enough.' But where would that get us? We'd still be playing Tomb Raider on an original 3dfx Voodoo card. The desperate truth is that we need, we long for, that hit of hardcore 3D acceleration. We need help, we need treatment; what we need is the latest graphics card. Sliding that long card into a tight PCI Express slot always feels so good. For many years now we've been happy in this abusive relationship: clinging to our ageing card, trying to scrape together the last remnants of a decent frame rate by installing hacked drivers and dropping the resolution, until we end up crawling back to our favourite green or red dealer for a fresh hit of delicious 3D.

But today that magic hit isn't just about graphics: from HD decoding and physics acceleration to GP-GPU features, the graphics card is offering a lot of technology. The next generation of cards is set to take this technology to a new level, and with the advent of a new pusher on the scene – in the form of chip-giant Intel – the entire graphics market is set for an enormous shake-up. It'll be a combination of new competition, changing demands and evolving technology that brings general processing and graphics processing closer and closer together. But what will happen when these two worlds collide? Let's find out…



Not that we want to dwell on the past, but how did this addictive relationship start? The answer lies in what PCs were back in the early nineties and how 3D images are generated in the first place. So let's take you back, back, back in time, to when the original Doom, Duke Nukem 3D and Wing Commander adorned our screens.

3D gaming was a simplistic affair then – sometimes referred to as 'vector games'. 3D line objects were made of vectors: a mathematical construct that is nothing more than a line in space defined by two points. Put three vectors together and you get a triangle; put enough triangles together and you can form anything. Luckily for your average 7MHz, 16-bit processor, vectors can be manipulated using simple matrix operations, so they can be scaled and rotated in our imaginary space before being drawn to the screen.

But lines aren't very exciting, unless they're white and Bolivian in origin. As a stepping stone to true 3D, Doom and its clones were based on 2D maps carrying simple height information; the actual 3D effect was a projection of textured walls. Similarly, the monsters were flat bitmaps positioned on that same 2D map and scaled according to their distance from the player. This, combined with pseudo-lighting effects, enabled id Software to generate a basic, fully textured 3D world on a lowly 386 PC. Faster processors let developers combine the texture handling used in Doom with a true, fully 3D vector engine to create the likes of Descent in 1995 and, in 1996, the seminal Quake. But despite all the cleverness of these engines, incredibly basic abilities such as texture filtering were, and remain, simply too processor intensive for a standard CPU to even consider attempting in real time.

Hidden line removal, totally awesome!
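If you fancy seeing that vector-wrangling in code, here's a minimal sketch (ours, not code from any of those engines – all the names are illustrative) of how a handful of vertices can be rotated and scaled with one small matrix, the kind of arithmetic even a modest 16-bit CPU could manage:

```cpp
#include <cmath>
#include <cstdio>

// A point (vertex) in our imaginary 2D space.
struct Vec2 { float x, y; };

// A 2x2 matrix: enough to rotate and scale a vertex in one go.
struct Mat2 { float m00, m01, m10, m11; };

// Build a matrix that rotates by 'angle' radians and scales by 's'.
Mat2 rotateScale(float angle, float s) {
    float c = std::cos(angle) * s, n = std::sin(angle) * s;
    return { c, -n, n, c };
}

// Transform a vertex: one matrix multiply per point.
Vec2 apply(const Mat2& m, Vec2 v) {
    return { m.m00 * v.x + m.m01 * v.y,
             m.m10 * v.x + m.m11 * v.y };
}

int main() {
    // Three vertices forming a triangle; rotate 90 degrees, double the size.
    Vec2 tri[3] = { {0, 0}, {1, 0}, {0, 1} };
    Mat2 m = rotateScale(3.14159265f / 2, 2.0f);
    for (Vec2& v : tri) {
        v = apply(m, v);
        std::printf("(%.2f, %.2f)\n", v.x, v.y);
    }
}
```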

ACCELERATION HEAVEN

Below From a little voodoo, do mighty pipelines grow

The first time our gelatinous eyeballs gazed upon the smooth textures and lighting effects in Quake, or the explosive effects of Incoming, we were hooked. It was these kinds of effects and abilities that enabled a mid-nineties PC to pull off arcade-level graphics.

[Diagram: three generations of 3D pipeline. Voodoo pipeline (1995): rasterisation → texture unit → screen render. DirectX 7 pipeline (1999): input assembler → transform and lighting → clipping → rasterisation → multitexture unit → output merger. DirectX 10 pipeline (2006): index and vertex buffers → input assembler → vertex shader → geometry shader (with stream output to a stream buffer) → clip/project/setup/early-Z → rasterise → pixel shader (with samplers, constants and textures) → output merger (depth/stencil) → render target]

While not wanting to delve into degree-level subjects, to really understand why graphics cards exist as they do today it's helpful to know what's required to create that eye-pleasing 3D display we so enjoy. As you'll see, graphics cards started out handling only a fraction of the total process; today they embrace almost the entire task.

We've already mentioned vectors and how they can be used to build models from triangular meshes. You start with your models, which need to be transformed and scaled to fit into a virtual 'world view'. The application then applies a 'view space', which is how the player will see this world: a pyramid volume cut out of the world space, bounding the only area of interest to the renderer. From this pyramid we get the clipping space, the visible square of our virtual viewport, and finally everything is translated into screen space, where the 2D x/y coordinates are calculated ready for pixel rendering. These steps matter because originally all of this was done on the CPU; only gradually were the stages shifted to the GPU.

Still with us? Good, because that's the simple part. Each of those 'views' is required for a different stage of rendering. For instance, to optimise rendering it makes sense to discard all the triangles that will never be drawn: occlusion culling removes obscured objects, trivial clipping removes objects outside the 'view space', and back-face culling determines which triangles face away from the viewer and so can be ignored. The clipping-space view is created from the remaining world space, and any models that bisect the viewing boundary need to be clipped and retessellated, leaving only the visible triangles in the final scene.
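To show the kind of test involved, here's a tiny sketch of back-face culling (our code, using a common screen-space winding test; the clockwise-means-back convention is an assumption for this sketch, as engines differ):

```cpp
#include <cstdio>

struct Vec2 { float x, y; };

// Back-face culling in screen space: if the triangle's signed area is
// negative, its vertices wind clockwise, which (under our convention)
// means it faces away from the viewer and can be skipped entirely.
float signedArea(Vec2 a, Vec2 b, Vec2 c) {
    return 0.5f * ((b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y));
}

bool isBackFacing(Vec2 a, Vec2 b, Vec2 c) {
    return signedArea(a, b, c) <= 0.0f;  // degenerate triangles culled too
}

int main() {
    Vec2 front[3] = { {0, 0}, {10, 0}, {0, 10} };  // counter-clockwise
    Vec2 back[3]  = { {0, 0}, {0, 10}, {10, 0} };  // clockwise
    std::printf("front culled? %d\n", isBackFacing(front[0], front[1], front[2]));
    std::printf("back culled?  %d\n", isBackFacing(back[0], back[1], back[2]));
}
```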





LIGHT & BRIGHT

With an optimised view space created, lighting can be applied. It's important to understand this isn't the visual representation of light; it's calculating how 'bright' every surface is going to be. A scene can have a global light source, along with point sources and spotlight sources. Every triangle surface will have material properties, such as ambient, diffuse, specular and emissive material colours. For every source and every triangle a calculation is made to determine its total luminosity, so as you can imagine, the more sources there are, the larger the calculation expense. It's important to remember that, at this stage, all we know is the luminance for each triangular surface – the actual rendering comes later. If you want to know more about the pixel rendering stage, see the pipeline diagram earlier in this feature. For now, let's just say each pixel can now be blended with its corresponding lighting values, textures and other effects, such as bump maps and light maps. On top of this, each pixel will have filtering, fogging, shadow values and even antialiasing applied to produce the final image. If you're feeling a bit dazed and wondering what all that's for, it's so you have an overview of what goes into creating a single 3D frame – one that's on screen for mere milliseconds.
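To make that cost concrete, here's a minimal sketch (ours, not any real engine's code) of the per-surface sum: one ambient term plus a diffuse term per light source, so the work grows with lights multiplied by triangles. Real pipelines add specular and emissive terms on top.

```cpp
#include <cstdio>
#include <vector>

// A direction in 3D space.
struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// A light source, reduced to the direction from the surface towards it
// (assumed normalised) and an intensity.
struct Light { Vec3 dirToLight; float intensity; };

// One luminance value per triangle: an ambient term plus a Lambertian
// diffuse term for every light source. This loop runs once per light,
// and is itself run once per triangle - hence the expense of extra lights.
float surfaceLuminance(Vec3 normal, float ambient,
                       const std::vector<Light>& lights) {
    float lum = ambient;
    for (const Light& l : lights) {
        float d = dot(normal, l.dirToLight);   // cosine of incidence angle
        if (d > 0.0f) lum += l.intensity * d;  // surfaces facing away get nothing
    }
    return lum;
}

int main() {
    std::vector<Light> lights = { { {0, 1, 0}, 0.8f }, { {1, 0, 0}, 0.4f } };
    Vec3 up = {0, 1, 0};  // a surface facing straight up
    std::printf("luminance: %.2f\n", surfaceLuminance(up, 0.1f, lights));
}
```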

“It wasn’t until DirectX 8 that things really got interesting, as the first shaders appeared”

As graphics cards have developed, more and more of that pipeline has moved onto the card. With the original 3D cards, only the end rasterisation and rendering stages were performed on-card, and that was by dumb, fixed-function units that could only perform a single render pass. Multitexturing and multi-pass rendering improved visual quality, and when DirectX 7.0 was released in 1999 graphics cards got a little smarter thanks to Transform and Lighting (T&L), which moved the lighting and vertex transformation stages onto the graphics card – the first move away from CPU-based vertex handling.

It wasn't until the introduction of DirectX 8 that things really got interesting, as the first shaders appeared. Vertex shaders enabled programmers to manipulate vertices directly on the card, while pixel shaders replaced the fixed multi-texture engines with programmable ones. These gave graphics cards their first smarts, even though they were limited: there couldn't be any branches in the code, there were limits on the number of commands and variables, and the total program length was very short. So while these cards were technically running programs of a sort, the two types of shader unit were different in design and very limited.

DIRECTX 11’S GOT LEGS?

Our favourite 3D game and no GPU in sight

The SDK preview is already available for developers; we've even got our hands on it as part of the Windows 7 Beta. The hardware is well on the way and will be out in the latter half of 2009, but what can we expect from DirectX 11? Built atop DirectX 10's Windows Graphics Foundation, the new version pushes the idea of the graphics card as a GP-GPU system and adds ever more complex pipeline manipulation to the hardware. The most radical feature is the new 'compute' shader, whose sole aim is to lay open the power of the GPU for general processing tasks, including physics and media encoding, to name but two.

Another new ability in DirectX 11 comes from a combination of three new features: the hull shader, the hardware tessellator and the domain shader. Before Shader Model 4.0's geometry shader, graphics hardware could only manipulate existing vertices rather than create them; this tessellation trio goes further, enabling DirectX 11 hardware to be fed a model and generate extra detail itself. The initial thinking is that low-polygon models will get enhanced, but equally it'd enable high-end hardware to generate highly detailed models while lower-end hardware makes do with the basic ones.

We're glad to see that Microsoft has recognised that more people have multi-core processors; perhaps it has been looking at the Steam Hardware Survey recently? DirectX 11 finally offers ways for developers to multi-thread areas of the 3D pipeline. Much of it is sequential, but the new Immediate Context and Deferred Context resources will make better use of multi-core chips, as they let resources be loaded separately on different threads, and this will even work with DirectX 10 hardware as long as it's running on Vista or Windows 7. It may not add any outstanding eye candy, but what DirectX 11 does add is a heap of new tools for developers to go to work with, more so than anything DirectX 10 brought to the game.
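For the curious, here's a minimal sketch of that two-context model using the Direct3D 11 API. The calls are real D3D11 ones, but the flow is our illustration: error handling and the actual draw calls are omitted, and in a real renderer the deferred context would live on a worker thread.

```cpp
#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")

int main() {
    // Create the device and its immediate context.
    ID3D11Device* device = nullptr;
    ID3D11DeviceContext* immediate = nullptr;
    D3D_FEATURE_LEVEL level;
    D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                      nullptr, 0, D3D11_SDK_VERSION,
                      &device, &level, &immediate);

    // Each worker thread gets its own deferred context...
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ...records state changes and draw calls as normal...
    deferred->ClearState();
    // deferred->PSSetShader(...), deferred->Draw(...), etc.

    // ...then bakes them into a command list.
    ID3D11CommandList* commands = nullptr;
    deferred->FinishCommandList(FALSE, &commands);

    // The immediate context replays the pre-recorded commands in one go.
    immediate->ExecuteCommandList(commands, FALSE);

    commands->Release();
    deferred->Release();
    immediate->Release();
    device->Release();
}
```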

THE SMART STUFF

It took until DirectX 9.0c, released in 2004 with Shader Model 3.0, for cards to start looking more like a collection of smart processors than dumb fixed logic. Dynamic branching, program lengths over 512 commands and access to hundreds of registers made graphics cards sound more like mini-supercomputers. The final evolution came with the unified shaders introduced in DirectX 10 and Shader Model 4.0. At this point there's no distinction between vertex and pixel shaders: cards have 'unified' shaders, akin to hundreds of tiny dedicated processing units, found on the GeForce 8, Radeon HD 2000 and later generations of cards. This has enabled both AMD and NVIDIA to start offering GP-GPU features and programming languages for current graphics cards, allowing them to process physics and other mathematically complex data alongside 3D rendering.

Will DX11 lead to vastly better looking games? Probably not

Above GP-GPU can help accelerate everything from weather prediction to testing WMDs

LARRY WHO?

As testament to the idea that shaders are becoming processors in their own right, Intel is wading into the graphics arena, and the ripples could permanently erode a market that once seemed rock solid. As we already know, the new GPU is codenamed Larrabee, and its heart is based, in part, on the original x86 Pentium core – Intel is on record as saying it can, in theory, run OS kernel-level code. The idea is to take a bunch of optimised, in-order x86 Pentium cores, add a Vector Processing Unit and tie the whole thing together via each core's L2 cache using a high-speed ring bus. Alongside the multi-core design there's a dedicated texture filtering unit, plus the usual extra gubbins for the memory controller, display and system interfaces. Intel is approaching the problem from the opposite direction to AMD and NVIDIA: it's almost dumbing down an x86 core to help fit as many as possible onto a GPU die.

“Intel is wading into the graphics arena and the ripples could permanently erode the market”

All parties are selling these as more than just graphics solutions. Intel is partnering with DreamWorks, which will be using Larrabee as an accelerated computing platform for ray tracing frames within its animated features. Intel has measured a 1GHz, 24-core Larrabee GPU running almost five times faster at ray tracing than an eight-core Xeon processor at 2.6GHz, which shows the huge acceleration potential GP-GPU solutions have in the real world.

Shadows are annoyingly difficult to generate, so it’s best just to turn all the lights off

Currently no one has any idea how well Larrabee will perform, if it performs at all. However, we managed to dig out some figures from a paper Intel published, which estimates the performance of a Larrabee processor running F.E.A.R., Gears of War and Half-Life 2: Episode 2. The most interesting section took the DirectX commands generated from a sequence of random frames from each of these games and fed them through a 'functional model' of Larrabee rendering at 1,600x1,200 with 4x antialiasing. The test was to see how many 1GHz cores were required to keep a constant 60fps output for each game; the answer is between 10 and 24 cores, depending on the game. Clearly this is nowhere near the performance of top-end cards – the frame rate would have to be nearer 180fps at that resolution – but at 3GHz with 24 cores that would be achievable and still in the realms of reality.
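As a back-of-envelope check on those numbers (our arithmetic, assuming performance scales linearly with both clock speed and core count, which real workloads won't quite manage):

```cpp
#include <cstdio>

// Scaling Intel's published figures: 24 cores at 1GHz sustain 60fps in
// the worst case; how many cores would 180fps need at a 3GHz clock?
int main() {
    const float measuredCores = 24;     // worst case for 60fps at 1GHz
    const float measuredClockGHz = 1;
    const float targetClockGHz = 3;
    const float targetFps = 180;        // ballpark for a top-end card
    const float baseFps = 60;

    // Three times the work, but a 3GHz clock supplies exactly that.
    float coresNeeded = measuredCores * (targetFps / baseFps)
                        * (measuredClockGHz / targetClockGHz);
    std::printf("Cores needed at 3GHz: %.0f\n", coresNeeded);  // prints 24
}
```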

3D PIPELINES

We went deep down the rabbit hole covering the various 'view' transformations and lighting, but there's still a way to go yet. Before rendering can begin there's the triangle set-up phase, also known as 'rasterisation' or 'scan-line conversion': for each triangle, its vertex data is used to calculate which screen pixels it intersects, and a colour value is generated for each of those pixels.

Now we can finally start to render the screen image. The most visible and well-known area is texturing. For each triangle a corresponding texture is translated, rotated and scaled to the correct coordinates; each texture element sampled is known as a 'texel'. The lighting and colour data is applied and blended to the correct brightness, along with the texture filtering. Multitexturing is also applied at this stage, if necessary over multiple passes. This is used to apply many effects: bump mapping, light maps, specular lighting and other visual tricks.

This would create a perfectly acceptable finished frame, but there are still a number of effects that can be applied. Fogging is the first, rendered on a per-pixel basis, most likely using a look-up table for volumetric fogging. Opacity is also applied, via per-vertex alpha values, enabling glass and water effects. Shadows are another effect, often applied via a stencil buffer that generates shadow volumes blended into the final image. Finally, antialiasing is applied: either via the brute force of rendering the frame at twice or four times the target resolution and sampling it down, or via more interesting techniques such as multisampling and adaptive sampling.
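Here's a minimal sketch of that brute-force approach (our illustration, greyscale for brevity where a real frame would have RGBA channels): render at double the resolution, then average each 2x2 block down to one output pixel.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Supersampling antialiasing, the brute-force way: 'src' is a frame
// rendered at twice the target resolution; each 2x2 block of source
// pixels is averaged into a single smoothed output pixel.
std::vector<uint8_t> downsample2x(const std::vector<uint8_t>& src,
                                  int outW, int outH) {
    std::vector<uint8_t> dst(outW * outH);
    int srcW = outW * 2;
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            int sx = x * 2, sy = y * 2;
            int sum = src[sy * srcW + sx]       + src[sy * srcW + sx + 1]
                    + src[(sy + 1) * srcW + sx] + src[(sy + 1) * srcW + sx + 1];
            dst[y * outW + x] = static_cast<uint8_t>(sum / 4);  // average
        }
    }
    return dst;
}

int main() {
    // A 4x4 'rendered' frame downsampled to 2x2.
    std::vector<uint8_t> hi = {
        0, 255, 0, 255,
        255, 0, 255, 0,
        10, 10, 200, 200,
        10, 10, 200, 200,
    };
    std::vector<uint8_t> lo = downsample2x(hi, 2, 2);
    for (uint8_t p : lo) std::printf("%d ", p);  // 127 127 10 200
}
```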




A FUSION FUTURE

With 900 million transistors we’re surprised these things are so cheap

We still remember when computers had separate floating point units – yes, we're that old – which was insane, but at the time it was the only option that worked. Now, of course, the FPU is integrated into the processor core. The memory controller has since been absorbed too, and the GPU is next on the list. AMD has staked its claim with its Fusion processor: a multi-core chip that combines unified shaders and video-decoding features with the CPU. Intel will have a competing product at the end of 2009, based on the dual-core Nehalem-C architecture; its graphics core derives from the existing 65nm chipset graphics core and will see serious speed improvements from the extra bandwidth and clock speed that come with being on the CPU die.

Both of these are low-power, laptop and entry-level desktop technologies, which is bad news for NVIDIA: it's being pushed out of a large chunk of the lucrative laptop and netbook market, as well as losing its chipset market on the desktop. Without an x86 processor it's hard to see how NVIDIA can compete, leaving it with just the mid- and high-end discrete graphics market, which it will now have to fight for against both Intel and AMD. NVIDIA has publicly stated it expects to have a low-power x86 chip within a few years, but without the right licences to produce one, it'll be interesting to see how legally it can pull that off.

Above AMD is well advanced with its all-in-one CPU-GPU Fusion tech

WHEN LARRY COMES

Below How the Larrabee GPU looks to a cubist

[Diagram: the Larrabee layout – several multi-threaded, wide-SIMD cores, each with its own instruction (I$) and data (D$) caches, sharing a ring-connected L2 cache, alongside texture logic, fixed-function units, memory controllers, and the display and system interfaces]

By the time Larrabee launches it could be almost 2010, and both NVIDIA and AMD will have had next-gen DirectX 11 devices well out of the stable. Intel's own figures show that its core scaling works well up to and over 48 cores, with apparently only a two to ten per cent drop in performance. It's impossible at this stage to know how much a Larrabee card will cost, but we can make several massive assumptions based on existing technology. For example, a 24-core GPU would require 6MB of L2 cache: that's roughly 300 million transistors. Let's guesstimate that the modified x86 Pentium cores are twice their original size, at six million transistors each; that's around 450 million transistors in total for a 24-core Larrabee GPU. Now, if you accept those transistor counts, and accept fab costs closer to those of a full processor than a GPU, then at roughly half the transistor count of a 3GHz Core i7 the consumer price could be up to £230. That's not including the 1GB of GDDR5, of course. The issue is whether Intel can put out a GPU that's affordable and a good performer by the time Larrabee launches. At least AMD and NVIDIA will put us out of our misery soon enough, as both are expected to field hardware supporting DirectX 11 in the second half of 2009. It will be interesting to see which of the two has the most powerful GP-GPU solution, but regardless, Intel won't get an easy ride. The quality of Intel's drivers is going to be a key issue, and dual-GPU or SLI-style dual-card support may be a necessity if it wants to compete for the performance crown. ¤
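Spelling out that guesstimate (our arithmetic, using the standard six-transistors-per-SRAM-bit rule of thumb for the cache and the doubled-Pentium core size assumed above):

```cpp
#include <cstdio>

// Back-of-envelope transistor budget for a 24-core Larrabee.
int main() {
    const long long cacheBits = 6LL * 1024 * 1024 * 8;         // 6MB of L2
    const long long cacheTransistors = cacheBits * 6;          // 6T SRAM cells
    const long long coreTransistors = 24LL * 6 * 1000 * 1000;  // 24 doubled Pentiums

    // Prints roughly 302 + 144 = ~446 million - the article's 'around 450'.
    std::printf("Cache: %lldM, cores: %lldM, total: ~%lldM transistors\n",
                cacheTransistors / 1000000, coreTransistors / 1000000,
                (cacheTransistors + coreTransistors) / 1000000);
}
```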


