Grice-QS22


IBM Systems and Technology Group

Cell/B.E. processor-based systems and software offerings:
IBM BladeCenter® QS22 and SDK 3.0

IBM CONFIDENTIAL © 2008 IBM Corporation



The challenge today

For many years, organizations have relied on performance gains from increasing the clock speeds of “traditional” microprocessor architectures. This approach has been challenged by the physical limitations of semiconductors and by traditional processor architecture implementations. High performance computing (HPC) applications need a fundamentally new technology and system-level architecture to achieve the desired level of performance.


Sales Conference


Cell Broadband Engine™ (Cell/B.E.) technology: for a higher level of absolute performance and efficiency

– 2000: IBM, Sony, Toshiba alliance formed
– March 2001: STI Design Center opened in Austin, TX
– April 2004: Single Cell/B.E. operational
– July 2004: 2-way SMP operational
– February 2005: First technical disclosures at ISSCC
– May 2005: First public demonstration of a Cell/B.E. processor-based system at E3
– August 2005: Technical details of the Cell/B.E. architecture published
– November 2005: Open source SDK and Cell/B.E. simulator published
– August 2006: The very first Cell/B.E. processor-based server introduced to the market


IBM commitment to innovation

– 2006: BladeCenter QS20. Create initial platforms for experimentation.
– 2007: BladeCenter QS21 and IBM SDK for Multicore Acceleration 3.0. Produce systems for early adoption and solution enablement.
– 2008: IBM BladeCenter QS22 with the PowerXCell™ 8i processor. Produce robust, production-ready systems for targeted industry applications: extraordinary double precision floating point performance, large memory capability, ready for the most demanding production applications.


Cell Broadband Engine Architecture™ (CBEA) technology roadmap, 2006-2010
Compatible code and security base across the entire line

Starting point:
– Cell/B.E. (1+8) 90nm SOI

Performance enhancements / scaling:
– IBM PowerXCell™ 8i (1+8eDP SPE) 65nm SOI
– IBM PowerXCell 32ii 45nm SOI

Cost reduction:
– Cell/B.E. (1+8) 65nm SOI
– Cell/B.E. (1+8) 45nm SOI

Roadmap items are classified as committed or concept designs. All future dates and specifications are estimations only; subject to change without notice.


IBM PowerXCell™ 8i processor benefits

The new PowerXCell 8i processor builds on the Cell Broadband Engine Architecture. It combines a general-purpose Power Architecture™ core of modest performance with eight enhanced synergistic processing elements (SPEs) optimized for extreme double precision and single precision computational performance.

Sets a new performance standard
– Accelerates computationally intense workloads such as analytics, multimedia and vector processing
– Efficient computation per watt

Designed for flexibility
– Covers a wide variety of application domains through its capabilities in:
 – floating point and integer operations
 – data streaming / throughput support
 – real-time support
– Exploits C/C++ and Fortran programming models

Enhanced security capability
– Virtual trusted computing environment for security


PowerXCell 8i processor, 65 nm
– 9 cores, 10 threads
– 230.4 GFlops peak (SP) at 3.2 GHz
– 108.8 GFlops peak (DP) at 3.2 GHz
– Up to 25 GB/s memory bandwidth
– Up to 75 GB/s I/O bandwidth
– 92 W at 3.2 GHz
– Top frequency >4 GHz (observed in lab)
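The peak figures above can be reproduced from the 3.2 GHz clock with a short calculation. This is a sketch under stated assumptions: each eDP SPE is assumed to retire one 4-wide single precision or one 2-wide double precision fused multiply-add per cycle, and the PPE is assumed to add 8 SP and 2 DP flops per cycle. These per-cycle rates are not given on the slide; the 92 W figure is.

```python
# Sketch: reconstruct the PowerXCell 8i peak GFlops figures from the clock rate.
# The per-cycle flop rates below are assumptions, not values taken from the slide.
CLOCK_GHZ = 3.2
NUM_SPES = 8
CHIP_WATTS = 92.0  # power figure quoted on the slide at 3.2 GHz

SPE_SP_FLOPS_PER_CYCLE = 8  # assumed: 4-wide SP fused multiply-add (2 flops per lane)
SPE_DP_FLOPS_PER_CYCLE = 4  # assumed: 2-wide DP fused multiply-add on an eDP SPE
PPE_SP_FLOPS_PER_CYCLE = 8  # assumed: 4-wide VMX fused multiply-add
PPE_DP_FLOPS_PER_CYCLE = 2  # assumed: scalar DP fused multiply-add


def peak_gflops(spe_rate: int, ppe_rate: int) -> float:
    """Peak GFlops = clock (GHz) x flops per cycle summed over all SPEs and the PPE."""
    return CLOCK_GHZ * (NUM_SPES * spe_rate + ppe_rate)


sp_peak = peak_gflops(SPE_SP_FLOPS_PER_CYCLE, PPE_SP_FLOPS_PER_CYCLE)
dp_peak = peak_gflops(SPE_DP_FLOPS_PER_CYCLE, PPE_DP_FLOPS_PER_CYCLE)

print(f"SP peak: {sp_peak:.1f} GFlops, {sp_peak / CHIP_WATTS:.2f} GFlops/W")
print(f"DP peak: {dp_peak:.1f} GFlops, {dp_peak / CHIP_WATTS:.2f} GFlops/W")
```

With these assumed rates the sums are 3.2 x 72 = 230.4 and 3.2 x 34 = 108.8, matching the slide; most of the peak comes from the eight SPEs rather than the PPE.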


PowerXCell 8i uses ½ the space and power and delivers more than 2.3x the GFlops of a traditional architecture

– Example server (dual core): 349 mm², 3.4 GHz @ 150 W; 2 cores, ~27.2 SP GFlops; 1.3B transistors @ 65 nm
– Example desktop (quad core): 214 mm², 3 GHz @ 130 W; 4 cores, ~96 SP GFlops; 820M transistors @ 45 nm
– PowerXCell 8i (nine core): 109 mm², 3.2 GHz @ 75 W; 9 cores, ~230 SP GFlops; 250M transistors @ 65 nm

On traditional processors, the cache, prediction and related logic illustrated here take up about 50% of the chip area. Intel’s x86 quad-core processors are dual-chip modules (DCMs): two of these dies packaged together.
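The headline ratios can be checked directly against the per-chip figures listed above. A minimal sketch, treating the "~" values from the slide as exact and comparing only against the quad-core desktop part:

```python
# Sketch: check the headline ratios against the per-chip figures listed above.
# Values are copied from the slide; "~" figures are treated as exact here.
chips = {
    "quad_core_desktop": {"mm2": 214.0, "watts": 130.0, "sp_gflops": 96.0},
    "powerxcell_8i": {"mm2": 109.0, "watts": 75.0, "sp_gflops": 230.0},
}

desktop = chips["quad_core_desktop"]
cell = chips["powerxcell_8i"]

area_ratio = cell["mm2"] / desktop["mm2"]                # fraction of the die area
power_ratio = cell["watts"] / desktop["watts"]           # fraction of the power
gflops_ratio = cell["sp_gflops"] / desktop["sp_gflops"]  # SP throughput multiple

print(f"area:   {area_ratio:.2f}x")
print(f"power:  {power_ratio:.2f}x")
print(f"GFlops: {gflops_ratio:.2f}x")
print(f"SP GFlops per mm2: {desktop['sp_gflops'] / desktop['mm2']:.2f} (desktop) "
      f"vs {cell['sp_gflops'] / cell['mm2']:.2f} (PowerXCell 8i)")
```

The area ratio comes out at about 0.51 and the SP throughput ratio at about 2.4, consistent with the "½ the space" and "more than 2.3x" claims; the power ratio is about 0.58, i.e. roughly half.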



BladeCenter® QS22 – PowerXCell 8i

Core electronics
– Two 3.2 GHz PowerXCell 8i processors
– SP: 460 GFlops peak per blade
– DP: 217 GFlops peak per blade
– Up to 32 GB DDR2 800 MHz
– Standard blade form factor
– Supports BladeCenter H chassis

Integrated features
– Dual 1 Gb Ethernet (BCM5704)
– Serial/console port, 4x USB on PCI

Optional
– Pair of 1 GB DDR2 VLP DIMMs as I/O buffer (2 GB total) (46C0501)
– 4x SDR InfiniBand adapter (32R1760)
– SAS expansion card (39Y9190)
– 8 GB flash drive (43W3934)

Block diagram: DDR2 memory on each PowerXCell 8i; Rambus® FlexIO™ link between the two processors; two IBM South Bridges; flash, RTC and NVRAM on SPI; 2 UART; 4x USB 2.0 on PCI, with the flash drive on USB to the BladeCenter midplane; PCI-X, PCI-E x8 and PCI-E x16 links; HSC* (2x PCI-E x16); HSDC; legacy connector; 2x 1GbE to the BladeCenter midplane; optional 2-port IB 4x HCA to the BladeCenter H high-speed fabric/midplane.

*The HSC interface is not enabled on the standard products. This interface can be enabled on “custom” system implementations for clients by working with the Cell services organization in IBM Industry Systems.
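The per-blade peaks follow from the two processors on the blade. A minimal sketch, reusing the per-chip PowerXCell 8i figures quoted earlier in this deck:

```python
# Sketch: per-blade peak = processors per blade x per-chip peak.
# Per-chip figures are the PowerXCell 8i values quoted earlier in this deck.
PROCESSORS_PER_BLADE = 2
CHIP_SP_GFLOPS = 230.4
CHIP_DP_GFLOPS = 108.8

blade_sp = PROCESSORS_PER_BLADE * CHIP_SP_GFLOPS
blade_dp = PROCESSORS_PER_BLADE * CHIP_DP_GFLOPS

print(f"Blade SP peak: {blade_sp:.1f} GFlops")
print(f"Blade DP peak: {blade_dp:.1f} GFlops")
```

This gives 460.8 and 217.6 GFlops; the slide's 460 and 217 GFlops are these values rounded down.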


Performance highlights

Performance is an order of magnitude better than general purpose processors (GPPs) for media and certain other applications that can take advantage of its Single Instruction Multiple Data (SIMD) capability.
– Performance of its simple Power Processor Element (PPE) is comparable to that of a traditional GPP
– Each Synergistic Processor Element (SPE) performs roughly on par with a GPP running at the same frequency
– The key performance advantage comes from its eight decoupled SPE engines, each with dedicated resources including a large register file and DMA channels

Accelerates targeted applications with extraordinary processing capabilities
– Floating-point operations
– Integer operations
– Data streaming / throughput support
– Real-time support

Open architecture allows for optimization at the compiler and application level
– Performance gains from tuning compilers and applications can be significant
– Tools and simulators are provided to assist in performance optimization efforts
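The SIMD advantage comes from operating on a full 128-bit register per instruction. The Python below is only an illustrative emulation, not SPE intrinsics: it counts how many values fit in one 128-bit register and applies a fused multiply-add to every lane, which is why single precision throughput is twice double precision throughput on the same hardware.

```python
import struct

# Sketch: why a 128-bit SIMD register doubles single precision throughput
# relative to double precision. The lane emulation below is illustrative only.
VECTOR_BYTES = 16  # SPE registers are 128 bits wide


def lane_count(fmt: str) -> int:
    """Number of elements of the given struct format that fit in one register."""
    return VECTOR_BYTES // struct.calcsize(fmt)


def simd_fma(a, b, c):
    """Emulate one SIMD fused multiply-add: every lane computes a * b + c."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all operands must have the same number of lanes")
    return [x * y + z for x, y, z in zip(a, b, c)]


sp_lanes = lane_count("f")  # 32-bit floats per register
dp_lanes = lane_count("d")  # 64-bit doubles per register

# A fused multiply-add counts as 2 flops per lane.
print("SP flops per FMA instruction:", 2 * sp_lanes)
print("DP flops per FMA instruction:", 2 * dp_lanes)
print(simd_fma([1.0, 2.0, 3.0, 4.0], [2.0] * 4, [0.5] * 4))
```

One instruction therefore performs 8 SP or 4 DP flops, and eight SPEs issuing such instructions in parallel is where the multi-core peak figures come from.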


IBM BladeCenter QS22: premier blade for HPC workloads

– QS22 is the RIGHT choice for intensive streaming and/or single and double precision floating point workloads
– QS22 is OPEN: based on Power Architecture and running the Linux® OS
– QS22 is EASY to deploy and to integrate into existing IT infrastructure and workloads:
 – Coexists with and complements all other blade server offerings (Intel®, AMD®, POWER®)
 – Ready to scale out and deploy in production environments
– QS22 is GREEN: more than 1.7 SP (or 0.8 DP) GFLOPS per watt
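The GFLOPS-per-watt claim can be turned around to show the blade power it implies. The blade's power draw is not stated in this deck, so the sketch solves for the largest power at which each claim holds, using per-blade peaks derived from the per-chip figures; the 250 W value is a hypothetical input for illustration only.

```python
# Sketch: what the "GFLOPS per watt" claim implies about blade power.
# Peak figures are per-blade values derived from the per-chip numbers in this deck;
# the blade power draw itself is not stated here, so it is solved for, not assumed.
BLADE_SP_GFLOPS = 2 * 230.4
BLADE_DP_GFLOPS = 2 * 108.8
CLAIMED_SP_GFLOPS_PER_WATT = 1.7
CLAIMED_DP_GFLOPS_PER_WATT = 0.8


def max_watts_for_claim(peak_gflops: float, gflops_per_watt: float) -> float:
    """Largest power draw (W) at which peak / power still meets the claimed ratio."""
    return peak_gflops / gflops_per_watt


def efficiency(peak_gflops: float, watts: float) -> float:
    """GFLOPS per watt for a given peak rate and power draw."""
    return peak_gflops / watts


sp_budget = max_watts_for_claim(BLADE_SP_GFLOPS, CLAIMED_SP_GFLOPS_PER_WATT)
dp_budget = max_watts_for_claim(BLADE_DP_GFLOPS, CLAIMED_DP_GFLOPS_PER_WATT)
print(f"SP claim holds below {sp_budget:.0f} W per blade")
print(f"DP claim holds below {dp_budget:.0f} W per blade")

# Hypothetical power draw, for illustration only.
example_watts = 250.0
print(f"At {example_watts:.0f} W: {efficiency(BLADE_SP_GFLOPS, example_watts):.2f} SP, "
      f"{efficiency(BLADE_DP_GFLOPS, example_watts):.2f} DP GFLOPS/W")
```

Both thresholds land at roughly 271-272 W, so the SP and DP efficiency figures are consistent with a single blade power figure.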


IBM SDK for Multicore Acceleration and related tools

The IBM SDK is a complete tools package that simplifies programming for the Cell Broadband Engine Architecture.
– Eclipse-based IDE
– Simulator
– GNU tool chain
– Performance tools
– IBM XL C/C++ compiler*: an optimized compiler for creating Cell/B.E.-optimized applications, offering improved performance, automatic overlay support and SPE code generation
– Libraries and frameworks:
 – Accelerated Library Framework (ALF)
 – Data Communication and Synchronization (DaCS)
 – Basic Linear Algebra Subroutines (BLAS)
 – Standardized SIMD math libraries

*The XL C/C++ compiler is a complementary product to the SDK; the other components listed are included in the SDK for Multicore Acceleration.


IBM SDK for Multicore Acceleration value

Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support

Based on industry standards to ease the transition to the Cell/B.E.
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in

Designed to make it easy to port and optimize applications for the QS21 and QS22
– Enhancements to enable new features in QS22
– Performance tuning tools to help optimize algorithms without rewriting the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform


Cell programming approaches are fully customizable!

1. “Native” programming: compilers, intrinsics, DMA, etc.
2. Assisted programming: libraries, frameworks
3. CASE tools / complete hardware abstraction: user tool-driven

Moving from approach 1 toward approach 3 decreases the programmer attention needed to architectural details; moving from approach 3 toward approach 1 increases programmer control over Cell/B.E. resources.
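In the native approach the programmer decides how data moves between main memory and each SPE's local store. A minimal planning sketch, assuming a 16 KB maximum per SPE DMA transfer and rounding each SPE's share up to a 128-byte boundary for efficient transfers; the function names are hypothetical and this is not SDK code.

```python
# Sketch: split a buffer across the eight SPEs, then into DMA-sized transfers.
# Assumptions: 16 KB is the largest single SPE DMA transfer, and each SPE's share
# is rounded up to a 128-byte boundary. Function names are hypothetical.
MAX_DMA_BYTES = 16 * 1024
NUM_SPES = 8
ALIGN_BYTES = 128


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def plan_transfers(total_bytes, num_spes=NUM_SPES, max_dma=MAX_DMA_BYTES, align=ALIGN_BYTES):
    """Return {spe_id: [(offset, size), ...]} covering total_bytes exactly once."""
    share = ceil_div(total_bytes, num_spes)   # even share per SPE
    share = ceil_div(share, align) * align    # round the share up to the alignment
    plan = {}
    for spe in range(num_spes):
        start = spe * share
        end = min(start + share, total_bytes)
        pieces = []
        offset = start
        while offset < end:                   # empty list if this SPE has no data
            size = min(max_dma, end - offset)
            pieces.append((offset, size))
            offset += size
        plan[spe] = pieces
    return plan


for spe, pieces in plan_transfers(1 << 20).items():
    print(spe, len(pieces), "transfers, first at offset", pieces[0][0])
```

For a 1 MB buffer each SPE receives 128 KB as eight 16 KB transfers. In a native SPE program each (offset, size) pair would correspond to one DMA into local store, typically double-buffered so that computation on one block overlaps the transfer of the next.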


Workloads ideal for PowerXCell 8i and QS22

Extreme stream computation and bandwidth requirements
– Real-time analytics
– Image/video creation and management
– Unstructured data
– Processing of data: information synthesis, analysis
– Presentation of data: visualization, imaging
– Multimodal search, data transforms, pattern matching

Market and solution specific assets
– Digital media
– Home media, consumer electronics
– Financial services sector
– Information based medicine
– Chemicals & petroleum
– Electronic design automation
– Digital video surveillance
– Aerospace and defense

PowerXCell 8i is suited for applications that demand extraordinary floating point performance.


Public sector HPC solutions

Enable government labs, agencies, and academic research centers to run high performance codes faster, less expensively, and with lower power consumption than existing computing architectures.

IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB

ISV applications:
– Development tools from RapidMind, Gedae, Wind River, etc.

The solution is designed to offer:
– Petaflop scalability and reliability
– Lower power and space footprint
– Lower total cost of ownership

Performance advantages:
– Science codes such as SPaSM, VPIC, Milagro and Sweep3D run up to 4-9X faster than on a single AMD Opteron™ core (Source: LANL - www.lanl.gov/roadrunner)
– A growing number of university and government research labs with external collaborative missions are exercising existing and emerging science codes

*See Notes on Benchmarks, charts 46 and 47


Aerospace & defense solutions

Enhance competitiveness, demonstrate innovation and capture significant government contracts through dramatic performance improvements in real-time signal and image processing.

IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB

ISV applications:
– Gedae stream, image and signal programming environment
– RapidMind development tools
– Wind River VxWorks RTOS and WorkBench tools

Performance advantages:
– FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*
– Double precision matrix multiplication up to 2.6x faster than 2.66 GHz 4-core Clovertown*

“As a time-served radar architect, I can say that Cell/Gedae is something of a dream and should rightly impact the new design market… it is an opportunity that the DoD should not fail to grasp.” - John Roulston, SCImus Solutions, March 2007

*See Notes on Benchmarks, charts 46 and 47


Digital content creation solutions

IBM solutions enable media and entertainment companies to produce the next generation of animated feature films, games, and advertising content.

IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
– IBM iRT scalable real-time ray tracer

ISV applications:
– RapidMind development tools

The solution is designed to offer:
– Rapid turnaround of digital assets
– More realistic simulation
– An open and flexible solution based on standards
– Scalability and reliability

Performance advantages:
– 1080p ray-traced images computed in milliseconds*
– 1080p ambient occlusion images computed in seconds*

*See Notes on Benchmarks, charts 46 and 47


Digital video surveillance solutions

Solutions deliver hardware and enablement for high-density, highly scalable encoding, transcoding, and compositing for digital video surveillance.

IBM components:
– IBM BladeCenter QS21/QS22
– IBM Total Storage
– IBM DVS ADK

ISV applications:
– Codec libraries
– Video distribution software

The solution is designed to offer:
– H.264 encoding
– Encoders for analog cameras
– Transcoding to save storage and network costs
– Decoding acceleration to reduce workstation costs and improve robustness
– Better management and scalability
– Network-based surveillance
– Compute density: with two processors per blade, 14 blades to a chassis, and two chassis to a rack, it is possible to have as many as 672 H.264 encoders in the rack

672 encoders in a rack!

Performance advantage:
– One Cell/B.E. processor running at 3.2 GHz can encode 12 channels of standard definition video at 30 fps to H.264 (main profile, including CABAC)[1]

Diagram: PTZ cameras; coax connections, 16 camera inputs per aggregation unit; IBM BladeCenter-H with 14 card slots for IBM BladeCenter QS21/QS22; IBM Total Storage.

[1] Source: IBM Research benchmark
*See Notes on Benchmarks, charts 46 and 47
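The "672 encoders in a rack" figure is the product of the density numbers above and the 12-channel per-processor result, so each "encoder" corresponds to one standard definition channel. A minimal sketch of that arithmetic; the frames-per-second total is an illustrative extra, not a figure from the slide.

```python
# Sketch: reproduce the "672 encoders in a rack" figure from the slide's density numbers.
CHANNELS_PER_PROCESSOR = 12   # SD H.264 channels at 30 fps per Cell/B.E. at 3.2 GHz [1]
PROCESSORS_PER_BLADE = 2
BLADES_PER_CHASSIS = 14
CHASSIS_PER_RACK = 2
FRAMES_PER_SECOND = 30


def processors_per_rack() -> int:
    return PROCESSORS_PER_BLADE * BLADES_PER_CHASSIS * CHASSIS_PER_RACK


def channels_per_rack(channels_per_processor: int = CHANNELS_PER_PROCESSOR) -> int:
    return processors_per_rack() * channels_per_processor


print("Processors per rack:", processors_per_rack())        # 56
print("SD encode channels per rack:", channels_per_rack())  # 672
# Illustrative only: total frames encoded per second across the rack
print("Frames per second per rack:", channels_per_rack() * FRAMES_PER_SECOND)
```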


EDA solutions

Accelerate computational lithography workloads to address turnaround time challenges while reducing the total cost of the computing infrastructure.

IBM components:
– Cell/B.E. hybrid cluster
– IBM BladeCenter QS21
– IBM System x / IBM BladeCenter
– IBM Cluster 1350 integrated cluster
– Storage: DS4000, N series, DCS9550

ISV applications:
– Mentor Graphics® Calibre® nmOPC and OPCVerify™

The solution is designed to offer:
– Significant run time acceleration: leverages Cell/B.E. strengths to offer significant speed-up compared with existing solutions in the market, reducing design turnaround time
– Scalability and reliability: the blade form factor improves scalability, compute density and reliability


Financial market analytics solutions

Enable financial market professionals to perform highly complex analytics with the speed and accuracy required to support trade execution and improve their firms’ competitive position.

IBM components:
– IBM BladeCenter QS22
– IBM SDK for Multicore Acceleration
– Dynamic Application Virtualization
– Cell/B.E. math libraries

ISV applications:
– NAG: math and statistics software
– Platform Symphony: grid computing environment
– Encirq: event processing platform

The solution is designed to offer:
– Flexibility and scalability
– IBM BladeCenter QS22 integrates with other BladeCenter products
– IBM SDK, DAV and third-party applications for ease of adoption within existing infrastructure
– Technical services with skilled programming expertise and subject matter experts
– Power, space and cooling advantages

Performance advantages:
– Collateralized Debt Obligation (CDO): 7.5X faster than 2.8 GHz 4-core Harpertown*
– 650 million European options/sec using Monte Carlo simulations on a QS22 blade*

*See Notes on Benchmarks, charts 46 and 47
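To illustrate the kind of computation behind the European-option figure, here is a minimal Monte Carlo pricer for a European call under geometric Brownian motion, checked against the closed-form Black-Scholes price. The model, parameters and function names are assumptions for illustration; the deck does not describe the benchmark's code or how its throughput was counted.

```python
import math
import random


def mc_european_call(s0, strike, rate, vol, t, n_paths, seed=0):
    """Estimate a European call price by Monte Carlo under geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * t      # risk-neutral log drift over the horizon
    diffusion = vol * math.sqrt(t)            # standard deviation of the log return
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)               # one standard normal draw per path
        s_t = s0 * math.exp(drift + diffusion * z)
        payoff_sum += max(s_t - strike, 0.0)  # call payoff at expiry
    return math.exp(-rate * t) * payoff_sum / n_paths  # discounted mean payoff


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, strike, rate, vol, t):
    """Closed-form reference price, used only to sanity-check the estimate."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * vol * vol) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return s0 * normal_cdf(d1) - strike * math.exp(-rate * t) * normal_cdf(d2)


estimate = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000, seed=42)
reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"Monte Carlo: {estimate:.3f}  Black-Scholes: {reference:.3f}")
```

Each path is independent, so the loop partitions naturally across SPEs and SIMD lanes; that independence is what makes this workload a good fit for the architecture described earlier.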


Medical imaging solutions

Improve the efficiency, productivity, and quality of patient care through dramatic performance improvements in the transmission and analysis of medical images.

IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB

ISV applications:
– Advanced image and text analytics
– High-performance image compression

The solution is designed to offer:
– 3D image reconstruction, registration, volume rendering, segmentation
– On-demand compression/decompression

Performance advantages:
– 16x improvement on MRI image reconstruction over an Opteron system
– 11x improvement on CT image reconstruction over a 3.0 GHz Xeon system
– 48x improvement on image registration over a 3 GHz Pentium 4
– 200x improvement in shear-warp volume visualization over a TI TMS320C80 processor
– 40:1 CT study data compression
(Source for all of the above: Mayo Clinic http://www.mayoclinic.org/news2007rst/3996.html)*

*See Notes on Benchmarks, charts 46 and 47


Seismic solutions

Improve the speed and accuracy of geologic visualization to reduce the cost of evaluating potential oil- and gas-yielding targets.

IBM components:
– IBM BladeCenter QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
– Standard math, vector math, FFT, BLAS, MPI and tridiagonal solver

ISV applications:
– Simudyne
– Customers’ own proprietary code

The solution is designed to offer:
– High-performance, highly accurate rendering of geologic structures
– A cost-effective HPC environment with significant performance increases
– Scalability and reliability

Performance advantages:
– FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*
– Double precision matrix multiplication up to 2.6x faster than 2.66 GHz 4-core Clovertown*

*See Notes on Benchmarks, charts 46 and 47


QS22 summary: premier blade for HPC workloads

The QS22 is based on the new PowerXCell 8i processor, built on an enhanced version of the Cell Broadband Engine Architecture.

The QS22 offers the capabilities you need for your most demanding computational requirements:
– Extraordinary double precision and single precision floating point performance
– Support for up to 32 GB of processor memory

IBM is working with ISVs and customers to accelerate workloads on the QS22 in targeted application areas.

The QS22 is extremely efficient, offering more than 1.7 SP (or 0.8 DP) GFLOPS per watt of energy.

BladeCenter QS22 is Right, Open, Easy and Green.


IBM SDK for Multicore Acceleration summary

Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support

RHEL 5.2 enterprise support

Based on industry standards to ease the transition to the Cell/B.E. architecture
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in

Designed to make it easy to port and optimize applications for the QS22
– Performance tuning tools to help optimize algorithms without rewriting the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform


Cell/B.E. architecture reaches wide and deep: from consumer products to high performance computing

Segments: Consumer, Business, Enterprise, High Performance Computing (increasing support for scale and data center toward HPC)

Systems shown:
– SCE PS3 (Cell/B.E. + GPU)
– Toshiba SpursEngine (SPUs + host)
– Sony Cell/B.E. Computing Unit (Cell/B.E. + GPU + AV I/O)
– PowerXCell 8i PCI card (Cell/B.E. + host)
– Mercury 1U Dual Cell
– IBM BladeServer (2 Cell/B.E. or PowerXCell 8i)
– MiniRoadrunner
– Custom
– Roadrunner (16,000 PowerXCell 8i + AMD)

Common OSs, infrastructure, tools, libraries, code… the SAME SPE code runs from end to end.
