PC Plus: Build The Ultimate PC


30 PAGES OF PROJECTS INSIDE

DIGITAL ART
DIY 3D scanner
Make amazing images with a webcam and a laser

INTERNET
Google Chrome hacks
How to unleash the browser's hidden powers

FREE SUPERDISC!
12 ANTI-HACKER SECURITY APPS
■ RAM OPTIMISATION TOOLS ■ 60-minute game programming video

Issue 278

www.pcplus.co.uk

SUPERCOMPUTING SPECIAL

Build the ultimate PC
Combine all your PCs' power for maximum performance

RATED!

AMD Phenom II We benchmark AMD’s fastest desktop chip yet

Intel Core i7 PCs

Next-generation computing – get it for only £999

PLUS Colossus, Cray and Blue Gene: Meet the heavyweights

Inside Windows' memory
Make Vista and XP use RAM more efficiently to get a faster PC

FEBRUARY 2009 £5.99 OUTSIDE UK & EIRE £6.49


Build a supercomputer

SUPERCOMPUTING SPECIAL

Build the ultimate PC
Graham Morrison uses little more than a cheap switch and a stretch of network cable to build a powerful distributed computer


Covered in detail
Build the cluster
■ Hardware compatibility p46
■ Re-using hardware p47
■ Installing the software p48
■ Testing the cluster p49

Step by step
■ Installing Ubuntu on a node p46
■ Installing Distriblend p49

Technology focus
■ Create shared storage p47
■ Taking it further p49

44

278 February 2009

You can never have enough processing power, especially if you enjoy working with 3D graphics or compiling your own software. Pleasingly, with a very small outlay, it is easy to use any spare machines you may have to create a single homogeneous computing mega-matrix and calculation engine just by wiring them all together and running the right software. Even modest hardware can make a significant contribution to your net computing power, and if you've already got the hardware then you've got nothing to lose. PC hardware is now so cheap that buying a couple of extra machines and wiring them into the same computing pool could make a very cost-effective expansion.

This is what we are going to build, and we're going to use Ubuntu Linux to do it. Linux can take cluster computing tasks like these in its stride, and you don't need to fork out for a licence for every machine.

Before embarking on this endeavour, we need to make one thing clear. The combined processing power of a cluster of computers can only be used for certain applications. You won't be able to boost your frame rate on Crysis or Far Cry 2, for example, and you won't be able to run your everyday applications across the cluster unless they're designed to do so. Neither are we building a Beowulf cluster, where you'd need to use specialised libraries and programming languages to take

THANKS TO

Our thanks to Scan Computers for lending us four of its most powerful testing machines for this feature. The whole thing ran so fast that we felt we could render The Incredibles in no time at all. www.scan.co.uk




Process summary: Install Linux on a node along with packages for remote admin

Step 1: Boot disc
Burn the Ubuntu disc image and boot each node in turn. Ubuntu is easy to install, so we'll only cover the steps that are relevant to building the distributed computing cluster. Choose 'Install Ubuntu' from the Boot menu.

Step 2: Partition magic
Be warned: the Prepare Disk Space step will repartition and reformat your drive. Ubuntu should work around an existing Windows installation, but you should back up your data first so you don't lose it.

Step 3: Create a name
On the Who Are You? page, make sure that you enter a unique name for the computer. This name is used by most networking applications, and you should have a different one for each node in your network.

Step 4: Remote access
Ubuntu will now be installed. For remote administration you need to install the 'openssh-server' and 'tightvncserver' packages from the Synaptic package manager in the Administration menu.

Step 5: Using a node
Once the remote administration packages have installed, you can use the master desktop to connect to a node using PuTTY (Windows) or SSH (Linux). You'll need its IP address, and its username and password.

Step 6: Remote session
With the command prompt on the remote machine open, type 'vncserver :1'. This will create a new remote desktop session. You can now use most VNC clients to access the remote desktop on any of your nodes.
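The routine in Steps 4 to 6 can be collected into a small helper script. This is only a sketch: the node address and account name below are placeholders, not values from the article, so substitute your own before use.

```shell
#!/bin/sh
# Sketch of the remote-admin routine from Steps 4-6.
# NODE and USER are placeholders -- substitute your node's details.
NODE=192.168.1.101   # hypothetical node IP address
USER=cluster         # hypothetical account created during installation

# Build the VNC display address for a node: starting 'vncserver :1'
# makes the desktop reachable as '<ip>:1' (TCP port 5901).
vnc_display() {
    echo "$1:1"
}

# The real session would be started like this (commented out so the
# sketch can run anywhere without a cluster attached):
#   ssh -l "$USER" "$NODE" 'vncserver :1'
#   vncviewer "$(vnc_display "$NODE")"
echo "Connect your VNC client to $(vnc_display "$NODE")"
```

The only moving part is the ':1' suffix, which selects the screen session created by 'vncserver :1'.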

advantage of the parallel processing potential within the cluster. We're going to work with distributed computing, which involves splitting up a task across several machines in a local cluster. As a result, our applications can be far more down to earth and practical. You will be able to dramatically reduce rendering times with Blender, for example, or the compilation time of major apps like the GNU/Linux kernel. You'll also be able to parallelise any number of tasks and use each machine separately from your master if you wish. As with multicore processors, there's some inefficiency when scaling jobs across a cluster of processors, but it will almost always be much faster than a single processor working alone.

In theory, you can use any old PC. The minimum requirement is that it must be able to run Linux, which narrows the choice down to almost any PC from the last 10 years. But in reality, the cluster works best if the machines that you're linking together are relatively close in specification, especially when you start to take running costs into consideration.


A 1GHz Athlon machine, for example, could cost you over £50 a year in electricity. You'd be much better off spending this money on a processor upgrade for a more efficient machine. A similar platform for each computer also makes configuration considerably easier.

For our cluster, we used four identical powerful machines. You only need powerful machines if you're making a living from something computer-based – 3D animation, for instance – where you can weigh the extra cost against increased performance. We're also going to assume that you have a main machine you can use as the master. This will be the eyes and ears of the cluster, and it's from here that you'll be able to set up jobs and control the other machines.
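That £50 figure is easy to sanity-check. Assuming the old machine draws roughly 100W around the clock and electricity costs about 6p per kWh (both figures are assumptions, typical for the period, not measurements), the annual bill works out like this:

```shell
# Back-of-envelope running-cost check; the wattage and tariff are
# assumptions, not measured values.
WATTS=100
HOURS=8760                          # 24 hours x 365 days
KWH=$(( WATTS * HOURS / 1000 ))     # energy used in a year
PENCE=$(( KWH * 6 ))                # cost at 6p per kWh
echo "${KWH} kWh/year, about GBP $(( PENCE / 100 )) in electricity"
```

At those rates the machine burns through roughly £52 a year, which is why retiring it in favour of a more efficient node can pay for itself.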

Hardware compatibility
Linux has come a long way in terms of hardware compatibility, but you don't want to be troubleshooting three different network adaptors if you can help it. And it's the network adaptors that are likely to be the weakest link. Cluster computing is dependent on each machine having access to the same data, and that means that data needs to be shuffled between each of the machines on the network cluster continually. Due to the heavy data requirement, wireless is too slow for almost any task, so you're going to need a physical connection between each machine. The faster your networking hardware, the better. To start with, you should be fine with your standard Ethernet ports as long as they're capable of megabit speeds (100Mbps). If you need greater capacity, gigabit Ethernet cards are also cheap.

Blender is perfect for distributed computing: each PC can work on a frame without any overly complex division of labour algorithms.
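To see why the jump from megabit to gigabit matters, consider shuffling a 1GB scene file to a node. At the theoretical line rates (real-world throughput will be lower once protocol overhead is counted), the transfer times compare like this:

```shell
# Theoretical best-case transfer times for a 1GB file; real throughput
# will be noticeably lower in practice.
FILE_MBITS=$(( 1024 * 8 ))            # 1GB expressed in megabits
FAST=$(( FILE_MBITS / 100 ))          # seconds at 100Mbps
GIG=$(( FILE_MBITS / 1000 ))          # seconds at 1,000Mbps
echo "100Mbps: ~${FAST}s, gigabit: ~${GIG}s"
```

Roughly 81 seconds against 8: on a cluster that shuffles data continually, that order-of-magnitude gap is the difference between the network being invisible and being the bottleneck.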



A gigabit switch costing only £20 will let you connect each of your machines together at the maximum speed capacity

Just stick them into a free PCI or PCI-X slot, and then disable the slower port if you can. You'll also need a way of connecting the machines together. A gigabit switch can be bought for around £20 to £30, and this will let you connect each machine together at the maximum speed capacity. You may be able to use the Ethernet ports on an Internet or wireless router, but most will only be able to handle megabit speeds. It might be a good idea to start out using your standard Ethernet ports and your Internet or wireless router and upgrade to a dedicated switch and ports later on.

As for general PC requirements, that depends on how you plan to use your cluster. Memory is more important than processing speed because it's likely that you'll be using your combined computing power for large, memory-intensive jobs. That means lots of images and textures in the case of 3D rendering, or lots of libraries, objects and source files in the case of distributed compilation.

Each node in a cluster will normally operate independently of the rest. If one machine has a lower specification, it shouldn't be too much of a bottleneck, but this depends on how you use your cluster. If you're rendering a 300-frame animation and the slow machine happens to render only a handful of frames, then it has still made some contribution without holding any of the other machines back. But if the same machine is used to help render a single complex frame, it's likely that you'll need to wait for the slower machine to complete its work after the faster machines have finished.

Complex mathematical calculations, like those dealing with fluid dynamics, are the best candidates for distributed computing.
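This per-frame style of distribution can be driven from the master with a short loop. The hostnames, account and shared path below are assumptions for illustration; 'blender -b file -a' renders the animation using the output settings already saved inside the .blend file.

```shell
#!/bin/sh
# Sketch: start a render of the same animation on every node.
# Node names and the shared scene path are hypothetical examples.
NODES="node1 node2 node3"            # hypothetical cluster hostnames
SCENE=/mnt/remote/project.blend      # hypothetical shared scene file

for node in $NODES; do
    # '-b' runs Blender without a GUI; '-a' renders the animation.
    # 'echo' makes this a dry run -- remove it to launch for real.
    echo ssh "$node" "blender -b $SCENE -a"
done
```

Run as-is it only prints the commands it would issue, which is a safe way to check the node list before committing a render job to the whole cluster.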

Re-using hardware
If you're building these machines from scratch, you can cut corners. Other than for installing the OS, you won't need a keyboard, mouse, optical drive or screen. You can even avoid using an optical drive for installation by using a tool called UNetbootin. This generates a Ubuntu USB stick

Create shared storage
Share a directory on one machine with all the others in your cluster
Most distributed computing tasks require access to the same data. The simple Blender setup, for example, needs each PC to write its output to the same directory. Fortunately, this is exactly the kind of job that Linux was designed for. The software behind folder sharing is called Samba. It's a recreation of Microsoft's own folder-sharing protocol, so it's widely compatible with both Microsoft and Mac networks. To enable Samba, right-click on the folder that you want to share and choose 'Sharing Options'. From the window that appears,

tick 'Share this folder'. Confirm you want Samba to be installed, and after a brief download, you'll need to restart your desktop session. After the restart, right-click on the same folder again and open the Sharing Options window. Now when you tick the 'Share this folder' checkbox, nothing will need to be installed. Then enable write access to the folder by ticking the relevant checkbox. Lastly, open a terminal from the Accessories menu and type 'sudo smbpasswd -a username', replacing 'username' with a node account name. You will also need

the remote node's password. Type 'sudo /etc/init.d/samba restart' and you're ready to go. Install both the Samba and Smbfs packages using Synaptic on each machine, then type the following to mount the remote folder onto the local filesystem:

sudo mkdir /mnt/remote
sudo mount -t smbfs -o user=username //server/sharename /mnt/remote

Simply switch 'sharename' above for the name of the folder that you enabled sharing for.
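If you want the share to reappear automatically after a reboot, the same mount can be expressed as an /etc/fstab line instead of running mount by hand. This is a sketch only: 'server', 'sharename' and the account details are the placeholders from the commands above, and in practice you'd want to keep the password in a root-only credentials file rather than in fstab itself.

```
# /etc/fstab entry (a single line) -- server, share and account
# names are placeholders to be replaced with your own
//server/sharename  /mnt/remote  smbfs  username=username,password=secret  0  0
```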

If you're using the Ubuntu desktop, you can enable Samba folder sharing with just a few clicks of your mouse.




The render time for a 120-frame, 640 x 480 resolution character animation was cut from two minutes to less than 30 seconds

installation from a standard CD image. However, its success depends on the capabilities of the system BIOS within each machine. BIOS settings are normally accessed by pressing a certain key when you boot your machine. This key is most commonly [F2], but it could also be [F1], [Delete] or [Escape]. Look for any on-screen messages at boot for a clue. From the BIOS you should be able to discern whether your system can boot off external USB devices. You should also disable 'halt on all errors' if the option exists. This means that your machine will boot even when there are errors or problems detected – which is what you need if the machine complains that no keyboard, mouse or screen has been detected.

Internal storage needs are minimal. You need around 10GB for the operating system, and it's going to be difficult to find a hard drive that small these days if you haven't got one or two handy. For storing your project data, we'd recommend using either a large drive (or array) on the master machine, or an external NAS device connected to the same switch as the other machines in the cluster. Linux can mount remote drives onto the local file system so that apps treat the remote storage as if it were local.

You can start experimenting with a simple router and Ethernet ports, and if successful you can upgrade to gigabit switches later on.

Each machine needs to be connected to the switch that we mentioned earlier using a standard Ethernet cable. You also need to connect the switch to your Internet or wireless router. A router is needed to assign a network address to the machines in the cluster; this should happen automatically thanks to the DHCP server running on the router. If you don't want to use a router (or connect your cluster to the Internet), use your master machine as a DHCP server. This will create IP addresses for each machine on the switch.
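If you take the router-less route, dnsmasq is a common way to turn the master into a DHCP server on Ubuntu ('sudo apt-get install dnsmasq'). The interface name and address range below are assumptions for illustration, so adjust them to match your own network:

```
# /etc/dnsmasq.conf -- minimal DHCP-only sketch (hypothetical values)
interface=eth0                               # NIC connected to the switch
dhcp-range=192.168.1.50,192.168.1.150,12h    # address pool, 12-hour leases
```

Restart dnsmasq ('sudo /etc/init.d/dnsmasq restart') and the nodes on the switch will pick up addresses from the master instead of a router.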

Installing the software
With the hard stuff out of the way, it's time to install the software. We used the latest version of the Ubuntu Linux distribution, Intrepid Ibex. Ubuntu is our first choice because there is such a wide variety of packages available, and these can be installed through the default package manager without any further configuration. This makes installing apps like Blender across your cluster as easy as typing a single command on each machine, and it also means that you can install support utilities such as the DrQueue rendering queue manager just as easily.

We opted for the Desktop rather than the Server version. This was because we still need the desktops on each node for setting up our various tests, and while we could still do the same with the Server edition (which by default has no desktop), configuration would be easier with a desktop.

Installing Ubuntu is easy but a little tedious, as you need to go through the following routine for each machine. After creating a burned CD from the ISO image of the distro, boot each machine in turn with this disc in the optical drive. If it doesn't boot, check the boot order in your BIOS. Next, choose the 'Install Ubuntu' option from the Boot menu and answer the resulting questions. On the Partition Options page, use the entire disk for the installation (unless you're sharing the machine with a Windows OS), and create the same user account on each machine. This makes setting up the shared storage device easier.

Fact file
If you're a programmer or kernel hacker, install a tool called 'distcc'. This is a distributed front end for the 'gcc' compiler. After you've edited a couple of configuration files, you'll be able to distribute almost any compilation job to the nodes on your cluster without touching the source code.
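The distcc approach is worth a concrete sketch. With the 'distcc' package installed on every node (and each node's distccd configured to accept the master), a build is farmed out by listing the nodes and telling make to compile with distcc instead of calling gcc directly. The hostnames here are assumptions:

```shell
# Sketch of a distributed compile with distcc; node names are
# hypothetical. Hosts are tried in order, so list the fastest first.
DISTCC_HOSTS="localhost node1 node2 node3"
export DISTCC_HOSTS

# The build itself would then be launched with several parallel jobs,
# e.g. (commented out so this sketch has no side effects):
#   make -j8 CC=distcc
echo "distcc will compile on: $DISTCC_HOSTS"
```

The '-j8' figure is a rule of thumb (roughly twice the number of CPUs available across the listed hosts), not a value from the article.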

Process summary: Install the Distriblend render queue manager on each machine


Step 1: Install Java
Distriblend is a Java app, and it needs a bona fide Sun version of the cross-platform development kit. Use Ubuntu's Synaptic package manager to install 'sun-java6-jre'.

Step 2: Get Distriblend
Visit the Distriblend website (http://netblend.sourceforge.net) and download the default binary distribution package. Unzip the files to a new folder.

Step 3: Run the farm
Right-click on the files and select 'Open with Java 6 Runtime'. You need to run 'node.jar' on each node, 'client.jar' on the master machine and 'distributor.jar' on the server.
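Instead of right-clicking, the same jars can be started from a terminal with 'java -jar'. A tiny dispatcher like this keeps the role-to-jar mapping from Step 3 in one place; the jar names come from the step above, but the script itself is an illustrative sketch rather than anything Distriblend ships.

```shell
#!/bin/sh
# Map a Distriblend role to the jar it should run (per Step 3 above).
jar_for_role() {
    case "$1" in
        node)   echo node.jar ;;
        master) echo client.jar ;;
        server) echo distributor.jar ;;
        *)      echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# On each machine you would run something like './start.sh node',
# which boils down to (commented out so the sketch runs without
# Distriblend installed):
#   java -jar "$(jar_for_role "$1")"
echo "master runs: $(jar_for_role master)"
```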




You should also give each computer within the cluster its own name (on the Who Are You? page). Installation can take up to 45 minutes, depending on the speed of each node. When it's finished, your machine will restart and you'll need to log in to your new desktop. It's likely that you will also have to install a few security updates; a small yellow balloon message will open if these are necessary.

The next step is to install the control software. New applications and software can be installed through the Synaptic package manager, which can be found in the 'System | Administration' menu. First, you need to search for a package called 'openssh-server'. In Synaptic, click on the package to enable it, followed by 'Apply' to download and install it. You need to do the same for a package called 'tightvncserver'. Both of these tools are used for remote administration, so you won't need them if you plan to keep a keyboard, mouse and screen attached to your various nodes.

To connect to each machine from a master device running Windows on the network or one of the Linux machines, you need to use an SSH client. The most popular application for Windows is the freely available PuTTY, while Linux users can simply type 'ssh -l username IP Address' into the command line. To find the IP address of each node, either use your router's web interface for connected devices (if it has one) or right-click on the Network icon on the Ubuntu desktop and select 'Connection Information'. If you're using the command line, type 'ifconfig'.

From an SSH connection, you can launch a remote desktop session by typing 'vncserver :1' followed by a password. When the remote session is active, you can connect to a desktop session on each machine using a VNC client such as TightVNC Viewer for Windows or Vinagre for Ubuntu. When specifying the address to connect to, make sure that you follow the IP numbers with ':1', as this is needed in order to specify which screen session to connect to.

Taking it further
Why stop with only a couple of machines?
The best thing about working with a cluster of machines like this is that you can easily expand your network to meet any growing requirements. If you find yourself needing a larger render farm, for instance, all you have to do is add another machine to the network. But if you find you've got a taste for distributed computing, there are far more technical avenues that you can visit to fuel your hobby. If you're looking for a challenge, we would recommend the now defunct OpenMosix project. This is a series of patches for the Linux kernel that are used to build a Beowulf cluster. When OpenMosix is running, CPU-intensive processes running on the cluster are automatically distributed to the various nodes on the network, making it an ideal solution for custom-built mathematically intensive processes and data analysis.

You can turn any machine which has Java and Blender installed into a node in your render farm.

It might not look like much, but the OpenMosix project can turn any small network of PCs into a high-performance Beowulf cluster.

Testing the cluster
Now that everything is running as it should, let's test the combined power of the cluster. The easiest method that we've found (and the one that provides immediate results) is to use Blender, a professional-quality open-source 3D animation system that's particularly mathematically intensive. It's designed to work with more than one instance running on different machines, and you can easily run a job on your server before farming it out to the entire cluster. All you need to do is open an instance of Blender on each machine, get them to read the same '.blend' animation file, switch to the Output page and make sure that every machine is writing its output to the same directory. You then need to enable both the 'Touch' and 'No Overwrite' options. When rendering is started, each Blender instance will grab the next available frame that hasn't been created by another. It's an effective and pretty quick way of spreading the load.

There are two downsides to this option. Firstly, you'll need to have a screen connected to each node because Blender won't run over VNC. Secondly, Blender will only work with animations; single-frame images can't be distributed. Alternative solutions are more complicated, but happily, both of these problems can be overcome by installing a render farm queue manager. The best tool that we found was Distriblend. This suite of free Java tools automatically distributes Blender jobs between the various nodes of your cluster. You use a client to add Blender jobs to a distributor that then sends the jobs on to each node.

We used Distriblend for our distributed rendering benchmarks, and we were able to almost quadruple our processing speed across our four machines. The render time for a 120-frame, 640x480 resolution character animation example from Blender Nation was cut from two minutes to less than 30 seconds. On a more complex fluid dynamics job ('Vector blurred fluid' by Matt Ebb), we saved ourselves over two hours by distributing the 50 frames of animation across our cluster. And that's just the beginning. If you love 3D graphics, this soon feels like the only way to render. The beauty of it is that your main machine remains unburdened while your cluster does all the hard work.

Even if you only have one PC, you can still use distributed clustering technology through the Folding@Home project.

Fact file
Nvidia has just launched its first supercomputing PC. Named the Tesla Personal Supercomputer, it harnesses the power of Nvidia's Tesla C1060 GPU. According to Nvidia, the machine has 250 times the processing power of a traditional PC workstation.

Along with being a cluster-computing guru, Graham Morrison is Linux Format's Reviews Editor. feedback@pcplus.co.uk

