IBM Systems and Technology Group
Cell/B.E. processor-based systems and software offerings IBM BladeCenter® QS22 and SDK 3.0
IBM CONFIDENTIAL © 2008 IBM Corporation
The challenge today
For many years, organizations have relied on performance gains from increasing clock speeds of “traditional” microprocessor architectures.
This approach has been challenged by the physical limitations of semiconductors and by traditional processor architecture implementations.
High performance computing (HPC) applications need a fundamentally new technology and approach to the system-level architecture to achieve the desired level of performance.
Sales Conference
Cell Broadband Engine™ (Cell/B.E.) Technology
For a higher level of absolute performance and efficiency
IBM, Sony, Toshiba Alliance formed in 2000
March 2001 – STI Design Center opened in Austin, TX
April 2004 – Single Cell/B.E. operational
July 2004 – 2-way SMP operational
February 2005 – First technical disclosures at ISSCC
May 2005 – First public demonstration of Cell/B.E. processor-based system at E3
August 2005 – Published technical details of Cell/B.E. architecture
November 2005 – Published open source SDK & Cell/B.E. simulator
August 2006 – Introduced the very first Cell/B.E. processor-based server to the market
IBM commitment to innovation
2006 – BladeCenter QS20: Create initial platforms for experimentation
2007 – BladeCenter QS21 and IBM SDK for Multicore Acceleration 3.0: Produce systems for early adoption and solution enablement
2008 – IBM BladeCenter QS22 with PowerXCell™ 8i processor: Produce robust production-ready systems for targeted industry applications
IBM BladeCenter QS22: Extraordinary double precision floating point performance. Large memory capability. Ready for the most demanding production applications.
Cell Broadband Engine Architecture™ (CBEA) Technology Roadmap
Compatible code and security base across entire line
[Figure: roadmap along a 2006 to 2010 time axis, with each design marked as Committed or Concept]
Cost Reduction: Cell/B.E. (1+8) 90nm SOI; Cell/B.E. (1+8) 65nm SOI; Cell/B.E. (1+8) 45nm SOI
Performance Enhancements/Scaling: IBM PowerXCell™ 8i (1+8eDP SPE) 65nm SOI; IBM PowerXCell 32ii 45nm SOI
All future dates and specifications are estimations only; subject to change without notice. Dashed outlines indicate concept designs.
IBM PowerXCell™ 8i processor benefits
The new PowerXCell 8i processor builds on the Cell Broadband Engine Architecture and combines a general-purpose Power Architecture™ core of modest performance with eight enhanced synergistic processing elements optimized for extreme double precision and single precision computational performance.
Sets a new performance standard
– Accelerates computationally intense workloads such as analytics, multimedia and vector processing
– Efficient computation per watt
Designed for flexibility
– Wide variety of application domains
– Cell can cover a wide range of application space with its capabilities in floating point operations, integer operations, data streaming / throughput support, and real-time support
– Exploits C/C++, Fortran programming models
Enhanced security capability
– Virtual trusted computing environment for security
PowerXCell 8i processor, 65 nm
9 cores, 10 threads
230.4 GFlops peak (SP) at 3.2GHz
108.8 GFlops peak (DP) at 3.2GHz
Up to 25 GB/s memory bandwidth
Up to 75 GB/s I/O bandwidth
92 Watts @ 3.2GHz
Top frequency >4GHz (observed in lab)
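The peak figures above can be reproduced with simple arithmetic. The sketch below assumes a breakdown that the slide does not state: each of the eight SPEs completes one fused multiply-add per cycle on 4 single-precision or 2 double-precision lanes, and the Power core adds one 4-lane single-precision vector FMA and one scalar double-precision FMA per cycle. The function and parameter names are illustrative only.

```python
def peak_gflops(clock_ghz, units, lanes, flops_per_lane=2):
    """Peak GFLOPS = clock (GHz) x units x SIMD lanes x flops per lane per cycle.

    flops_per_lane defaults to 2 because a fused multiply-add counts as
    two floating point operations.
    """
    return clock_ghz * units * lanes * flops_per_lane


CLOCK_GHZ = 3.2  # clock rate quoted on the slide

# Assumed breakdown (not stated on the slide):
#   8 SPEs, 4 SP lanes or 2 DP lanes each
#   1 Power core, 4 SP lanes (vector unit) or 1 DP lane (scalar unit)
sp_peak = peak_gflops(CLOCK_GHZ, units=8, lanes=4) + peak_gflops(CLOCK_GHZ, units=1, lanes=4)
dp_peak = peak_gflops(CLOCK_GHZ, units=8, lanes=2) + peak_gflops(CLOCK_GHZ, units=1, lanes=1)

print(f"SP peak: {sp_peak:.1f} GFLOPS")
print(f"DP peak: {dp_peak:.1f} GFLOPS")
```

Under these assumptions the totals match the 230.4 (SP) and 108.8 (DP) figures quoted for 3.2GHz.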
PowerXCell 8i uses ½ the space & power and delivers more than 2.3x the GFlops of traditional architecture
Example Server (Dual Core): 349 mm², 3.4 GHz @ 150W, 2 cores, ~27.2 SP GFlops, 1.3b transistors @ 65nm
Example Desktop (Quad Core): 214 mm², 3 GHz @ 130W, 4 cores, ~96 SP GFlops, 820m transistors @ 45nm
PowerXCell 8i (Nine Core): 109 mm², 3.2 GHz @ 75W, 9 cores, ~230 SP GFlops, 250m transistors @ 65nm
On any traditional processor, the ratio of cores to cache, prediction, and related items illustrated here remains at ~50% of the chip area.
Intel’s x86 Quad Core processors are Dual Chip Modules (DCMs): 2 of these processors stacked vertically and packaged together.
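The headline ratios can be checked directly from the per-chip figures on this slide. The sketch below assumes the headline is measured against the quad-core desktop example, which the slide does not say explicitly; the `Chip` class and `compare` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    area_mm2: float
    watts: float
    sp_gflops: float


# Figures copied from the slide.
SERVER_DUAL = Chip("Example server, dual core", 349.0, 150.0, 27.2)
DESKTOP_QUAD = Chip("Example desktop, quad core", 214.0, 130.0, 96.0)
POWERXCELL = Chip("PowerXCell 8i, nine core", 109.0, 75.0, 230.0)


def compare(chip, baseline):
    """Return (area ratio, power ratio, SP GFLOPS ratio) of chip vs. baseline."""
    return (
        chip.area_mm2 / baseline.area_mm2,
        chip.watts / baseline.watts,
        chip.sp_gflops / baseline.sp_gflops,
    )


for baseline in (SERVER_DUAL, DESKTOP_QUAD):
    area, power, gflops = compare(POWERXCELL, baseline)
    print(f"vs {baseline.name}: {area:.2f}x area, {power:.2f}x power, {gflops:.1f}x SP GFLOPS")
```

Against the quad-core example this works out to roughly 0.51x the area, 0.58x the power, and 2.4x the SP GFLOPS, consistent with the "½ the space & power" and "more than 2.3x" wording.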
BladeCenter® QS22 – PowerXCell 8i
Core Electronics
– Two 3.2GHz PowerXCell 8i Processors
– SP: 460 GFlops peak per blade
– DP: 217 GFlops peak per blade
– Up to 32GB DDR2 800MHz
– Standard blade form factor
– Support BladeCenter H chassis
Integrated features
– Dual 1Gb Ethernet (BCM5704)
– Serial/Console port, 4x USB on PCI
Optional
– Pair 1GB DDR2 VLP DIMMs as I/O buffer (2GB total) (46C0501)
– 4x SDR InfiniBand adapter (32R1760)
– SAS expansion card (39Y9190)
– 8GB Flash Drive (43W3934)
[Block diagram labels: two PowerXCell 8i processors with DDR2 memory; Rambus® FlexIO™; two IBM South Bridges; Flash, RTC & NVRAM; SPI; PCI; 4x USB 2.0; Flash Drive; USB to BC mid plane; 2 UART, SPI; PCI-X; PCI-E x16; HSC*; 2x PCI-E x16; PCI-E x8; HSDC; 2x 1GbE; Legacy Con; optional 2-port IB x4 HCA; IB-4x to BC-H high speed fabric/mid plane; GbE to BC mid plane]
*The HSC interface is not enabled on the standard products. This interface can be enabled on “custom” system implementations for clients by working with the Cell services organization in IBM Industry Systems.
Performance highlights
Performance is an order of magnitude better than general purpose processors (GPP) for media and certain applications that can take advantage of its Single Instruction Multiple Data (SIMD) capability
– Performance of its simple Power Processor Element (PPE) is comparable to that of a traditional GPP
– Each Synergistic Processor Element (SPE) performs about the same as a GPP running at the same frequency
– Key performance advantage comes from its eight de-coupled SPE engines with dedicated resources including large register files and DMA channels
Accelerates targeted applications with extraordinary processing capabilities
– Floating-point operations
– Integer operations
– Data streaming / throughput support
– Real-time support
Open architecture allows for optimization at compiler and application level
– Performance gains from tuning compilers and applications can be significant
– Tools/simulators are provided to assist in performance optimization efforts
IBM BladeCenter QS22 Premier blade for HPC workloads
QS22 is the RIGHT choice for intensive streaming and/or single and double precision floating point workloads
QS22 is OPEN – based on Power Architecture and running Linux® OS
QS22 is EASY to deploy and to integrate into the existing IT infrastructure and/or workloads:
– Co-exists with and complements all other blade server offerings (Intel®, AMD®, POWER®)
– Ready to scale out and deploy in production environments
QS22 is GREEN – more than 1.7 SP (or 0.8 DP) GFLOPS per watt.
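As a rough consistency check, the efficiency claim can be combined with the per-processor peaks quoted earlier (230.4 SP and 108.8 DP GFlops at 3.2GHz) and the two processors per blade. The sketch below assumes the claim means peak blade GFLOPS divided by blade power, which the slide does not state; the resulting wattage is therefore an implied ceiling under that assumption, not a published specification.

```python
PROCESSORS_PER_BLADE = 2

# Per-processor peaks quoted elsewhere in this deck (GFLOPS at 3.2GHz).
SP_PEAK_PER_PROCESSOR = 230.4
DP_PEAK_PER_PROCESSOR = 108.8

# Efficiency floor claimed on this slide (GFLOPS per watt).
SP_GFLOPS_PER_WATT = 1.7
DP_GFLOPS_PER_WATT = 0.8

blade_sp_peak = PROCESSORS_PER_BLADE * SP_PEAK_PER_PROCESSOR
blade_dp_peak = PROCESSORS_PER_BLADE * DP_PEAK_PER_PROCESSOR

# Assumption: efficiency = peak blade GFLOPS / blade watts, so
# "more than X GFLOPS per watt" implies "less than peak / X watts".
implied_watts_from_sp = blade_sp_peak / SP_GFLOPS_PER_WATT
implied_watts_from_dp = blade_dp_peak / DP_GFLOPS_PER_WATT

print(f"Blade peak: {blade_sp_peak:.1f} SP / {blade_dp_peak:.1f} DP GFLOPS")
print(f"Implied power ceiling: {implied_watts_from_sp:.0f} W (SP), {implied_watts_from_dp:.0f} W (DP)")
```

Both efficiency figures point to nearly the same implied ceiling, which suggests they were derived from a single blade power number.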
IBM SDK for Multicore Acceleration and related tools
The IBM SDK is a complete tools package that simplifies programming for the Cell Broadband Engine Architecture
Eclipse-based IDE
Simulator
IBM XL C/C++ compiler*: optimized compiler for use in creating Cell/B.E. optimized applications. Offers:
– Improved performance
– Automatic overlay support
– SPE code generation
GNU tool chain
Performance Tools
Libraries and frameworks
– Accelerated Library Framework (ALF)
– Data Communication and Synchronization (DaCS)
– Basic Linear Algebra Subroutines (BLAS)
– Standardized SIMD math libraries
*The XLC compiler is a complementary product to the SDK.
[Legend: a marker on the original slide denotes software components included in the SDK for Multicore Acceleration]
IBM SDK for Multicore Acceleration value
Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support
Based on industry standards to ease the transition to the Cell/B.E.
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in
Designed to make it easy to port and optimize applications for the QS21 and QS22
– Enhancements to enable new features in QS22
– Performance tuning tools to help optimize algorithms without re-writing the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform
Cell Programming Approaches are fully customizable!
1. “Native” Programming: Compilers, Intrinsics, DMA, etc.
2. Assisted Programming: Libraries, Frameworks
3. Case Tools / Complete Hardware Abstraction: User tool-driven
[Figure arrows: “Increasing programmer control over Cell/B.E. resources” and “Decreasing programmer attention to architectural details”]
Workloads ideal for PowerXCell 8i and QS22 Extreme Stream Computation and Bandwidth requirements Real-time Analytics
Image/Video Creation/Mgt
Unstructured Data
Processing of Data Information Synthesis Analysis
Presentation of Data Visualization Imaging
Multimodal Search Data Transforms Pattern Matching
Market & Solution Specific Assets
Digital Media
Home Media Consumer Electronics
Financial Services Sector
Information Based Medicine
Chemicals & Petroleum
Electronic Design Automation
Digital Video Surveillance
Aerospace and Defense
PowerXCell 8i is suited for applications which demand extraordinary floating point performance
Public sector HPC solutions
Enable government labs, agencies, and academic research centers to run high performance codes faster, less expensively, and with lower power consumption than existing computing architectures
IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
ISV applications:
– Development tools from RapidMind, Gedae, Wind River, etc.
The solution is designed to offer:
– Petaflop scalability and reliability
– Lower power and space footprint
– Lower total cost of ownership
Performance advantages:
– Science codes such as SPaSM, VPIC, Milagro, and Sweep3D run 4-9X faster than on an AMD Opteron™ single core (Source: LANL - www.lanl.gov/roadrunner)
– A growing number of university and government research labs with external collaborative missions are exercising existing and emerging science codes
*See Notes on Benchmarks, charts 46 and 47
Aerospace & defense solutions
Enhance competitiveness, demonstrate innovation and capture significant government contracts through dramatic performance improvements in real time signal and image processing
IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
ISV applications:
– Gedae stream, image and signal programming environment
– RapidMind development tools
– Wind River VxWorks RTOS and WorkBench Tools
Performance advantages:
– FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*
– Double Precision Matrix Multiplication up to 2.6x faster than 2.66GHz 4-core Clovertown*
“As a time-served radar architect, I can say that Cell/Gedae is something of a dream and should rightly impact the new design market… it is an opportunity that the DoD should not fail to grasp.” - John Roulston, SCImus Solutions, March 2007
*See Notes on Benchmarks, charts 46 and 47
Digital content creation solutions
IBM solutions enable Media and Entertainment companies to produce the next generation of animated feature films, games, and advertising content
IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
– IBM iRT scalable real-time ray tracer
ISV applications:
– RapidMind development tools
The solution is designed to offer:
– Rapid turnaround of digital assets
– More realistic simulation
– An open and flexible solution based on standards
– Scalability and reliability
Performance advantages:
– 1080p ray-traced images computed in milliseconds*
– 1080p ambient occlusion images computed in seconds*
*See Notes on Benchmarks, charts 46 and 47
Digital video surveillance solutions
Solutions deliver hardware and enablement for high-density, highly scalable encoding, transcoding, and compositing for digital video surveillance
IBM components:
– IBM BladeCenter QS21/QS22
– IBM Total Storage
– IBM DVS ADK
ISV applications:
– Codec libraries
– Video distribution software
The solution is designed to offer:
– H.264 encoding
– Encoders for analog cameras
– Transcoding to save storage and network costs
– Decoding acceleration to reduce workstation costs and improve robustness
– Better management and scalability
– Network-based surveillance
– Compute density: with two processors per blade, 14 blades to a chassis, and two chassis to a rack, it is possible to have as many as 672 H.264 encoders in the rack
Performance advantage:
– One Cell/B.E. processor running at 3.2 GHz can encode 12 channels of standard definition video at 30 fps to H.264 (main profile, including CABAC)[1]
[Figure: “672 encoders in a rack!” Labels: PTZ; Coax; 16 camera inputs (two groups); Aggregation Unit; 14 card slots; IBM BladeCenter QS21/QS22; IBM BladeCenter-H; IBM Total Storage]
[1] Source: IBM Research benchmark *See Notes on Benchmarks, charts 46 and 47
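The 672-encoder figure follows from the rack layout and the per-processor channel count given on this slide. The short sketch below multiplies them out; the function name and parameter names are illustrative, while the default values are taken from the slide.

```python
def encoders_per_rack(
    processors_per_blade=2,
    blades_per_chassis=14,
    chassis_per_rack=2,
    channels_per_processor=12,
):
    """Number of simultaneous H.264 encode channels in one rack."""
    processors = processors_per_blade * blades_per_chassis * chassis_per_rack
    return processors * channels_per_processor


total = encoders_per_rack()
print(f"{total} encoders per rack")
print(f"{total * 30} standard definition frames encoded per second at 30 fps")
```

Changing any one parameter, for example the channels per processor for a heavier codec profile, shows how rack density would scale.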
EDA solutions
Accelerate computational lithography workload to address turnaround time challenges and at the same time reduce total cost of the computing infrastructure
IBM components:
– Cell/B.E. hybrid cluster
– IBM BladeCenter QS21
– IBM System x / IBM BladeCenter
– IBM Cluster 1350 integrated cluster
– Storage: DS4000, N series, DCS9550
ISV applications:
– Mentor Graphics® Calibre® nmOPC and OPCVerify™
The solution is designed to offer:
– Significant run time acceleration: leverages Cell/B.E. strengths to offer significant speed-up when compared to existing solutions in the market, reducing design turnaround time
– Scalability and reliability: blade form factor improves scalability, compute density and reliability
Financial market analytics solutions
Enable financial market professionals to perform highly complex analytics with the required speed and accuracy to support trade execution and improve their firms’ competitive position
IBM components:
– IBM BladeCenter QS22
– IBM SDK for Multicore Acceleration
– Dynamic Application Virtualization
– Cell/B.E. math libraries
ISV applications:
– NAG: Math & Stat Software
– Platform Symphony: Grid Computing Environment
– Encirq: Event Processing Platform
The solution is designed to offer:
– Flexibility and scalability
– IBM BladeCenter QS22 integrates with other BladeCenter products
– IBM SDK, DAV, third party applications for ease of adoption within existing infrastructure
– Technical Services with skilled programming expertise and subject matter experts
– Power, space and cooling advantages
Performance advantage:
– Collateralized Debt Obligation (CDO): 7.5X faster than 2.8 GHz 4-core Harpertown*
– 650 million European options/sec using Monte Carlo simulations on QS22 blade*
*See Notes on Benchmarks, charts 46 and 47
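To show the kind of workload behind the Monte Carlo figure, here is a minimal textbook sketch of pricing one European call option by simulation. The slide does not describe the benchmark's model, parameters, or code, so everything below is an assumption: geometric Brownian motion for the underlying, illustrative inputs, and plain Python rather than Cell/B.E. code. A closed-form Black-Scholes value is included only as a sanity check.

```python
import math
import random


def mc_european_call(spot, strike, rate, vol, maturity, paths, seed=0):
    """Estimate a European call price by averaging discounted simulated payoffs."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(paths):
        z = rng.gauss(0.0, 1.0)  # one standard normal draw per path
        terminal = spot * math.exp(drift + diffusion * z)
        payoff_sum += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / paths


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price, used here only to check the estimate."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (
        vol * math.sqrt(maturity)
    )
    d2 = d1 - vol * math.sqrt(maturity)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


# Illustrative parameters, not taken from the benchmark.
estimate = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, paths=50_000)
reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"Monte Carlo: {estimate:.3f}  Black-Scholes: {reference:.3f}")
```

Each path is independent of the others, which is why this class of workload maps well onto many parallel SIMD units.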
Medical imaging solutions
Improve the efficiency, productivity, and quality of patient care through dramatic performance improvements in the transmission and analysis of medical images
IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
ISV applications:
– Advanced image and text analytics
– High-performance image compression
The solution is designed to offer:
– 3D image reconstruction, registration, volume rendering, segmentation
– On-demand compression/decompression
Performance advantage:
– 16x improvement on MRI image reconstruction over Opteron system
– 11x improvement on CT image reconstruction over 3.0GHz Xeon system
– 48x improvement on image registration over 3GHz Pentium 4
– 200x shear-warp volume visualization over TI TMS320C80 processor
– 40:1 CT study data compression
(Source for all above: Mayo Clinic http://www.mayoclinic.org/news2007rst/3996.html )*
*See Notes on Benchmarks, charts 46 and 47
Seismic solutions
Improve the speed and accuracy of geologic visualization to reduce the cost of evaluating potential oil and gas targets
IBM components:
– IBM BladeCenter QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
– Standard math, vector math, FFT, BLAS, MPI and tridiagonal solver
ISV applications:
– Simudyne
– Customers’ own proprietary code
The solution is designed to offer:
– High-performance, highly accurate rendering of geologic structures
– Cost-effective HPC environment with significant performance increases
– Scalability and reliability
Performance advantages:
– FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*
– Double Precision Matrix Multiplication up to 2.6x faster than 2.66GHz 4-core Clovertown*
*See Notes on Benchmarks, charts 46 and 47
QS22 summary
Premier blade for HPC workloads
The QS22 is based on the new PowerXCell 8i processor, built on an enhanced version of the Cell Broadband Engine Architecture
The QS22 offers the capabilities you need for your most demanding computational requirements
– Offers extraordinary double precision and single precision floating point performance
– Supports up to 32GB of processor memory
IBM is working with ISVs and customers to accelerate workloads on the QS22 in targeted application areas
The QS22 is extremely efficient, offering more than 1.7 SP (or 0.8 DP) GFLOPS per watt
BladeCenter QS22 is Right, Open, Easy and Green
IBM SDK for Multicore Acceleration summary
Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support
– RHEL 5.2 Enterprise support
Based on industry standards to ease the transition to the Cell/B.E. architecture
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in
Designed to make it easy to port and optimize applications for the QS22
– Performance tuning tools to help optimize algorithms without re-writing the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform
Cell/B.E. architecture reaches wide and deep – from consumer products to high performance computing
[Figure axis: Increasing support for scale and datacenter]
MiniRoadrunner Custom
IBM BladeServer (2 Cell/B.E. or PowerXCell 8i)
Roadrunner (16,000 PowerXCell 8i + AMD)
Mercury 1u Dual Cell PowerXCell 8i PCI card (Cell/B.E. + Host)
Toshiba SpursEngine (SPUs + Host)
SCE PS3 (Cell/B.E. + GPU)
Sony Cell/B.E. Computing Unit (Cell/B.E. + GPU + AV I/O)
Market segments: Consumer, Business, Enterprise, High Performance Computing
Common OS’s, Infrastructure, Tools, Libraries, Code…
the SAME SPE code runs from end to end