featured product:
SkyCross: Multiple Signals, Single Antenna

Multicore for Handhelds

consumer electronics: DSP Encoding for Video
portable power: Architectural Issues for Power Gating
CEO Interview: Mark Thompson, Fairchild Semiconductor

February 2008
www.portabledesign.com
An RTC Group Publication
contents

departments
editorial letter 5
dave's two cents 6
industry news 8
analysts' pages 12
products for designers 44
design idea 47

cover feature
22 Solve Portable Design Problems Using Convenient Concurrency
David Lautzenheiser, Silistix

multicore for handhelds
16 Multicore Design for the Next-Generation "Kings of Cool"
Ian Rickards, ARM
28 Good Embedded Communications Is the Key to Multicore Hardware Design Success
Grant Martin and Steve Leibson, Tensilica

consumer electronics
34 Optimizing Video Encoders with Digital Signal Processors
Ajit Rao, Texas Instruments

portable power
38 Architectural Issues for Power Gating
Michael Keating, Synopsys; David Flynn and Robert Aitken, ARM; Alan Gibbons and Kaijian Shi, Synopsys

ceo interview
48 Mark Thompson, Fairchild Semiconductor
team

editorial team
Editorial Director: Warren Andrews, warrena@rtcgroup.com
Editor-in-Chief: John Donovan, johnd@rtcgroup.com
Managing Editor: Marina Tringali, marinat@rtcgroup.com
Copy Editor: Rochelle Cohn

art and media team
Creative Director: Jason Van Dorn, jasonv@rtcgroup.com
Art Director: Kirsten T. Wyatt, kirstenw@rtcgroup.com
Graphic Designer: Christopher Saucier, chriss@rtcgroup.com
Director of Web Development: Marke Hallowell, markeh@rtcgroup.com
Web Developer: Brian Hubbell, brianh@rtcgroup.com

management team
Associate Publisher: Marina Tringali, marinat@rtcgroup.com
Product Marketing Manager (acting): Aaron Foellmi, aaronf@rtcgroup.com
Western Advertising Manager: Stacy Gandre, stacyg@rtcgroup.com
Western Advertising Manager: Lauren Trudeau, laurent@rtcgroup.com
Eastern Advertising Manager: Nancy Vanderslice, nancyv@rtcgroup.com
Circulation: Shannon McNichols, shannonm@rtcgroup.com

executive management
Chief Executive Officer: John Reardon, johnr@rtcgroup.com
Vice President: Cindy Hickson, cindyh@rtcgroup.com
Vice President of Finance: Cindy Muir, cindym@rtcgroup.com
Director of Corporate Marketing: Aaron Foellmi, aaronf@rtcgroup.com
Director of Art and Media: Jason Van Dorn, jasonv@rtcgroup.com

portable design advisory council
Ravi Ambatipudi, National Semiconductor
Doug Grant, Analog Devices, Inc.
Dave Heacock, Texas Instruments
Kazuyoshi Yamada, NEC America

corporate office
The RTC Group, 905 Calle Amanecer, Suite 250, San Clemente, CA 92673
Phone 949.226.2000  Fax 949.226.2050  www.rtcgroup.com

For reprints contact: Marina Tringali, marinat@rtcgroup.com. Published by the RTC Group. Copyright 2007, the RTC Group. Printed in the United States. All rights reserved. All related graphics are trademarks of the RTC Group. All other brand and product names are the property of their holders. Periodicals postage at San Clemente, CA 92673 and at additional mailing offices. Postmaster: send changes of address to: Portable Design, 905 Calle Amanecer, Suite 250, San Clemente, CA 92673. Portable Design (ISSN 1086-1300) is published monthly by RTC Group, 905 Calle Amanecer, Suite 250, San Clemente, CA 92673. Telephone 949-226-2000; Fax 949-226-2050; Web Address www.rtcgroup.com.
editorial letter

Standards Wars
john donovan, editor-in-chief
Now that cell phones have become the ubiquitous converged device, everyone wants a piece of the action. And with cell phones now rivaling low-end notebooks in computing power, handsets are acquiring a lot of high-end capabilities. With your phone you can listen to CD-quality music complete with Dolby 5.1 surround sound, watch live TV, locate the nearest ATM or gas station, play back high-resolution videos you shot on your camera phone over your 1080P flat-screen TV and even—if your phone supports near-field communications (NFC)—use it to pay for your Big Mac.

The explosion of the handset market is driving the development of a wide range of air interfaces, all of which their developers want to integrate into your next phone. While each protocol has its inevitable strengths and weaknesses—making it better suited to some applications than others—the lure of the big money play, namely cell phones, has often let marketing muscle drive the technical arguments. One case in point is a semiconductor manufacturer who offered an 802.11n chip "guaranteed to conform to the standard" several months before the standard was to be ratified. Meanwhile, they were heavily engaged with the IEEE standards committee trying to ensure that the resulting standard wouldn't make liars of them.

This sort of politicking within standards committees is neither unusual nor unseemly, within reason. By the time a protocol is submitted for standardization, vendors—who invariably sit on the standards committees—are usually offering their own proprietary versions of that technology in their products. If the standards efforts drag out, and if the resulting standard doesn't ultimately embrace the vendors' approaches, then the vendors will have to spend a lot of money updating their products to ensure conformity. If you're making an ASIC or fixed hardware product without a software workaround, you're in trouble.

Geography usually dictates the choice of a standards body. American tech companies usually look to the IEEE; European companies to ETSI or Cenelec, among others; Japanese companies to JEDEC; etc. But sometimes politics dictates the choice of a standards body. When Cadence proposed its Common Power Format (CPF), it included what its competitors considered a long, proprietary path to an IEEE standard. In reaction, Synopsys, Mentor, Magma and others proposed their Unified Power Format (UPF) to Accellera, a much smaller standards body, which they felt would act more quickly than the IEEE. Cadence in turn reacted with a much shorter timeline for IEEE submission, and it then became a race to the finish line between dueling standards. It now looks like we'll have dual standards instead, which is certainly a suboptimal solution.
Standards often morph in response to market conditions. Few have morphed more than Bluetooth, which took so long to arrive at a standard that the ZigBee camp broke out along the way to develop a low-power variant for wireless mesh networks. Bluetooth was initially designed to be a wireless headphone link, where it now dominates. But with that success under its belt, the Bluetooth camp saw opportunities in the ZigBee market space and came out with a smaller stack and simplified protocol to enable it to go after those applications. It also saw opportunities that required higher speed, so it adopted Ultrawideband (UWB) as its "high-speed channel." Will Bluetooth next start competing with Wireless USB?

Meanwhile, a wide range of other wireless protocols is seeking both standardization and cell phone sockets. Mobile TV is becoming popular, and there are many ways to deliver it: WiMAX, MediaFLO, DVB-H, EV-DO, HSDPA, etc. Both WiMAX and DVB are standards that morphed to incorporate mobile subscribers. All of these protocols have technical strengths and limitations, and all are recognized standards. But the current battle between WiMAX, MediaFLO and DVB-H to stream video to cell phones will be decided in the marketplace, not in standards bodies. Being able to deliver products at key price points, on time, that match the carriers' roadmaps will be key to winning that fight.

The real standards battleground is ultimately in the marketplace. At CES, LG Electronics showed prototypes of "Mobile Pedestrian Handheld (MPH)" devices that receive over-the-air local TV broadcasts. At the same time, Samsung showed its new "Advanced VSB" devices that do the same thing. The two devices decode two different technologies that modify the U.S. digital broadcast system to overcome the Doppler problem encountered while you're driving. Neither is standardized, but both hope to become so popular that they become de facto standards, just as Sony's Blu-ray Disc seems to have recently won its format war against HD-DVD. So standards are important, but having a design that's so cool it becomes a de facto standard is the ultimate key to success, and it carries its own reward.
dave’s two cents
Living with World Wide Wi-Fi
Our family tends to be a little tech heavy during the holiday gift giving. But from what I can tell, we are not alone. This year’s gift theme was wireless devices. There was a Wi-Fi-enabled digital photo frame, a gaming system with nearly everything wireless, a wireless headset, a universal remote with wireless extender and a Bluetooth-enabled navigation system. My daughters decided that our now seven-year-old cordless phones needed to be replaced as some of the buttons are hard to engage. To my surprise, the batteries outlived the buttons. Another surprise was how easy it was to set up all the wireless devices. Of course, things like cordless phones are expected to be easy. But the other devices took nearly the same level of effort. A simple setup is definitely what you want
during the holidays. I remember spending hours in the past on mechanical gifts that stated: "Some Assembly Required." I think the real warning should have been: "Caution: May Cause Feelings of Anger and Provoke Bad Language."

Wi-Fi-enabled devices may be the theme for 2008. Previously, a few products like digital cameras and Web cams, along with a few monitors and other gadgets, used Wi-Fi. According to the Wi-Fi Alliance Web site, more than 300 million Wi-Fi chipsets were shipped in 2007. The site goes on to say that by 2011, 700 million devices will ship with Wi-Fi, and enabled consumer devices will out-ship enabled PCs.

At the Consumer Electronics Show (CES) 2008, there were a fair number of Wi-Fi-enabled devices ranging from gadgets to furniture. Yes, furniture. The Starry Night bed has Wi-Fi along with a considerable amount of other technology. Beyond the entertainment portion, which can keep you awake more than let you sleep, is a snore detector. It can raise the upper body of the snoring occupant in an attempt to stop the racket.
A Wi-Fi-enabled SD memory card received a lot of attention. This card allows you to wirelessly connect to your computer and transfer files from the card. The particular application is digital still cameras. The $100 2 Gbyte card may be just the beginning of the memory cards of the future. Adding convenient features like wireless may just help to keep up the memory card ASP. The challenge is to enable other applications where users would value this function.

Car navigation continues to be another active consumer space. Dash Navigation, Inc. introduced the Dash Express with GPRS and Wi-Fi. This car navigation system uses the Internet and two-way communications to exchange real-time information about traffic flow, not just incidents. The exchange is anonymous, so the driver does not have to worry that someone will find out that they are stuck in traffic. However, talking to peers at work, there is still some concern that this information could result in a loss of privacy. To me this is a real innovation and a good use of technology to address the growing problem of lost productivity and wasted fuel due to traffic congestion. The real-time traffic flow information service costs about the same as satellite radio.

A couple of years ago my youngest daughter asked me why they filmed old TV shows in black and white. I told her that they had not yet discovered color back in those days. It may be that in the next generation, children will ask why the backs of old computers and TVs have so many connectors. My answer might just be that these were a type of "decoration" in the old days. With the improvements in Wi-Fi chipsets, speed and functionality, it could be that my IR universal remote will become a Wi-Fi-enabled device.

For my two cents, advanced wireless connectivity will be as important as garage door openers and remote controls, and we will become very dependent on their operation. I am not sure that other members of my family even know how to turn on the TV or set-top box without the remote. I can imagine that as more everyday appliances become Wi-Fi-enabled, we may not be able to use even simple appliances such as a microwave without a GUI.

Dave Freeman, Texas Instruments
news

Michel Mayer Out at Freescale
First Ed Zander, now Michel Mayer. Freescale Semiconductor announced that Michel Mayer, chairman and CEO, has decided to step down. The company and its board of directors have initiated a search for a new CEO. Mayer will continue in his current role until a successor has been identified and will remain chairman of the board until the transition is effective. Mayer joined Freescale in May 2004 and led the company through its transition from a semiconductor division of Motorola to a successful public company following an initial public offering in July 2004. In December 2006, Freescale became the largest leveraged buy-out (LBO) in the history of the technology industry.
Freescale's problems are directly linked to those of Motorola, its largest customer. When Motorola's cell phone sales faded last year, the company cut its chip orders from Freescale. Adding insult to injury, MOT also negotiated an agreement that allowed it for the first time to purchase components from other suppliers. Freescale reportedly got an estimated $200M "signing bonus," but that one-time bump hasn't been enough to offset the resulting loss of sales. The company claims that its design wins are up, but they have yet to register on its bottom line—and, lacking a major Tier 1 win, the impact won't offset the Motorola loss. Mayer has struggled valiantly to wean the company from its dependence on MOT, but so far with limited success. He did replace many senior staff and instill a more entrepreneurial (read: non-Motorola) mentality. He reorganized the company into five focused, product-oriented groups, and recently sought to build up Freescale's weak footprint in the consumer market with the pending acquisition of SigmaTel.

But with earnings down, time ran out for Mayer. There's considerable speculation about the company's owners breaking up the company, selling off divisions to Infineon, NXP, TI or various Japanese vendors. Freescale insists it is in the cell phone chip business for the long run—if that division can survive in the short run. It remains to be seen if Freescale's private owners are in the semiconductor industry for the long run. If they are, new management might well refocus Freescale's considerable human and technical assets and return it to profitability—though the path back will be painful. Not, however, as painful as the alternatives.

John Donovan, Editor-in-Chief

Freescale Semiconductor, Austin, TX. (800) 521-6274. [www.freescale.com].

Intel, STMicroelectronics Deliver Industry's First Phase Change Memory Prototypes
Intel Corporation and STMicroelectronics reached a key industry milestone by shipping prototype samples of a future product using a new, innovative memory technology called Phase Change Memory (PCM). The prototypes are the first functional silicon to be delivered to customers for evaluation, bringing the technology one step closer to adoption. The memory device, codenamed “Alverstone,” uses PCM, a promising new memory technology providing very fast read and write speeds at lower power than conventional flash, and allows for bit alterability normally seen in RAM. PCM has long been a topic of discussion for research and development, and with Alverstone, Intel and STMicroelectronics are helping to move the technology into the marketplace. “This is the most significant non-volatile memory advancement in 40 years,” said Ed Doller, chief technology officer-designate of Numonyx, the new name for the pending STMicroelectronics and Intel flash memory company. “There have been plenty of attempts to find and develop new non-volatile memory
technologies, yet of all the concepts, PCM provides the most compelling solution—and Intel and STMicroelectronics are delivering PCM into the hands of customers today. This is an important milestone for the industry and for our companies." In related news, Intel and STMicroelectronics technologists presented a research paper this week at the International Solid-State Circuits Conference (ISSCC) describing yet another breakthrough in PCM technology. Together, the companies created the world's first demonstrable high-density, multi-level cell (MLC) large memory device using PCM technology. The move from single bit per cell to MLC also brings significantly higher density at a lower cost per Mbyte, making the combination of MLC and PCM a powerful development. In 2003, Intel and STMicroelectronics formed a joint development program (JDP) to focus on Phase Change Memory development. Previously the JDP demonstrated 8 Mb memory arrays on 180 nm at the 2004 VLSI conference and first disclosed the Alverstone 90 nm 128 Mbit memory device at the 2006 VLSI Symposium. Alverstone and future JDP products will become part of Numonyx, a new independent semiconductor company created through an agreement between STMicroelectronics, Intel and Francisco Partners signed in May 2007. The new company's strategic focus will be on supplying complete memory solutions for a variety of consumer and industrial devices, including cellular phones, MP3 players, digital cameras, computers and other high-tech equipment. The companies are scheduled to close the transaction in the first quarter of 2008. In 2007, the combined memory market for DRAM, flash and other memory products
such as EEPROM was $61 billion, according to the industry research firm Web-Feet Research, Inc. Memory technology cost declines have traditionally been driven at the rate of "Moore's law," where density doubles every 18 months with each lithography shrink. As RAM and flash technologies run into scaling limitations over the next decade, PCM costs will decline at a faster rate. The advent of multi-level-cell PCM will further accelerate the cost-per-bit crossover of PCM technology relative to today's technologies. Finally, by combining the bit-alterability of DRAM, the non-volatility of flash, the fast reads of NOR and the fast writes of NAND, PCM has the ability to address the entire memory market and be a key driver for future growth over the next decade. Alverstone is a 128 Mb device built on 90 nm and is intended to allow memory customers to evaluate PCM features, allowing cellular and embedded customers to learn more about PCM and how it can be incorporated into their future system designs. Intel Corporation, Santa Clara, CA. (408) 765-8080. [www.intel.com]. STMicroelectronics, Geneva, Switzerland. +41 22 929 29 29. [www.st.com].
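As a rough illustration of the scaling rate cited above, the short sketch below simply compounds an 18-month doubling over ten years; the starting density is an arbitrary placeholder, not a figure from Intel, STMicroelectronics or Web-Feet Research.

```c
/* Compound an assumed 18-month density doubling ("Moore's law" as cited
 * in the article) over ten years. The starting density is a placeholder. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double doubling_period_months = 18.0;
    const double start_density = 1.0;   /* arbitrary units, e.g. Gbit/cm^2 */

    for (int year = 0; year <= 10; year += 2) {
        double months  = year * 12.0;
        double density = start_density * pow(2.0, months / doubling_period_months);
        printf("year %2d: %6.1fx the starting density\n", year, density);
    }
    return 0;
}
```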
HEDGE Transceiver Supports TD-SCDMA, HSUPA, S-Band and GMR Specifications
Sequoia Communications has announced at Mobile World Congress that its SEQ7400 HEDGE transceiver has been verified to support HSUPA, TD-SCDMA, S-Band and GMR satellite specifications via extensive testing, making it the first true “flexible radio.” Originally introduced in May 2007, the
SEQ7400 is based on the company's patented FullSpectra architecture, which includes the only transmitter in the industry to use polar modulation in all modes. This all-polar architecture enables such flexibility without the cost and power consumption penalties that plague traditional software-defined radio (SDR) approaches. The SEQ7400, the industry's most highly integrated single-chip HEDGE transceiver, is currently being integrated into several baseband reference platforms targeting 3G phones later this year. The key to its ability to easily support additional modes, like TD-SCDMA, lies in the all-polar transmitter, which uses a single transmit path for all modes. The single-path polar architecture for the analog portion of the radio gives it the most efficient size and power consumption possible. The company then added a very flexible, all-digital programming interface that is easily modified in software to accommodate additional modes and frequency bands. This combination of analog and digital approaches provides an optimal trade-off between flexibility and the key handset metrics. "Our patented polar architecture is fundamental to enabling the 'flexible radio' concept," said John Groe, CTO and founder of Sequoia Communications. "It uniquely provides the flexibility to process both narrow-band and wide-band modulation schemes using a single radio."
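For readers unfamiliar with the term, the sketch below shows the basic arithmetic behind polar modulation in general: a Cartesian I/Q baseband sample is re-expressed as an amplitude and a phase, which can then drive separate amplitude and phase paths in a transmitter. This is a generic illustration, not Sequoia's FullSpectra implementation.

```c
/* Convert one Cartesian I/Q sample to its polar (amplitude, phase) form,
 * the representation a polar transmitter works with. Example values only. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double i = 0.6, q = 0.8;            /* example I/Q sample          */
    double amplitude = hypot(i, q);     /* sqrt(i^2 + q^2) = 1.0       */
    double phase     = atan2(q, i);     /* phase in radians            */

    printf("I=%.2f Q=%.2f -> A=%.3f, phi=%.3f rad\n", i, q, amplitude, phase);
    return 0;
}
```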
Flexible Radio vs. SDR
Attempts have been made to develop a software-defined radio that can be programmed in software to handle various modes. To date, SDR has failed to meet the stringent cost and power consumption requirements for wireless handsets. In contrast, the flexible radio concept utilizes a single radio architecture capable of supporting all modulation types using customized digital circuitry. This provides a solution that can be easily upgraded to support next-generation standards with minor changes in the digital design. Test results prove that the SEQ7400 meets the most difficult of cellular specifications for WCDMA and HSDPA, and as a result is proven to be the best single-chip HEDGE solution
on the market. Further testing has proven that the SEQ7400 can be extended to a variety of other applications without any modifications to the chip itself. It needs only to be re-programmed and then tested to the new specifications. The SEQ7400 is the first true "flexible radio." Sequoia Communications, San Diego, CA. (858) 946-7400. [www.sequoiacommunications.com].
CMOS Multiple Antenna Receiver for 60 GHz
At this month's IEEE International Solid-State Circuits Conference, IMEC introduced its prototype of a 60 GHz multiple antenna receiver and invited industry to join its 60 GHz research program. The 60 GHz band offers massive available bandwidth that enables very high bit rates of several Gbits-per-second at distances up to 10 meters (about 33 feet). To make the 60 GHz technology cost-efficient to manufacture, low power and affordable in consumer products, IMEC has built its RF solution in a standard digital CMOS process, thereby avoiding the extra cost of alternative technologies or dedicated RF process options. The second goal is to overcome the high path losses at mm-wave frequencies by using a phased antenna array approach. IMEC's prototype uniquely addresses this problem by implementing a programmable phase shift of the various incoming signals, which is necessary for beamforming. IMEC's device contains two antenna paths, each consisting of a low-noise amplifier and a down-conversion mixer. The programmable phase shift is realized on the same chip. It starts from the quadrature signals of an on-chip quadrature voltage-controlled oscillator (QVCO). This QVCO design combines the highest oscillation frequency with the largest tuning range ever reported in CMOS. IMEC's multiple antenna receiver is the first step toward a complete CMOS-based phased array transceiver for 60 GHz wireless personal area networks that envisage multi-gigabit-per-second applications such as fast kiosk downloading, wireless high-definition multimedia interface (HDMI) and other applications. These results were achieved in the unique multi-disciplinary 60 GHz technology program. The research combines system-level aspects, algorithms, CMOS IC design, antenna design and module design, targeting a low-power 60 GHz communication link based on adaptive beamforming using multiple antennas, aligned with ongoing standardization activities. In the next phase of development, IMEC plans to implement four antenna paths using 45 nm CMOS technology and to integrate other subsystems such as the phase-lock loop (PLL), analog-to-digital converter (ADC) and the patch-antenna array itself. IMEC will also begin initial experiments for a power amplifier. IMEC, Leuven, Belgium. +32 16 28 12 11. [www.imec.be].
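The toy example below illustrates the kind of programmable phase shift IMEC describes for its two antenna paths: each path is rotated by a complex weight so that a wavefront arriving from the chosen direction adds coherently. The angle, element spacing and signal model are illustrative assumptions, not details of IMEC's design.

```c
/* Toy two-element receive beamformer: each antenna path gets a programmable
 * phase shift (a complex weight) so that a wavefront arriving from the
 * steering angle adds in phase. Illustrative values only. */
#include <complex.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double pi = 3.14159265358979323846;
    double theta = 30.0 * pi / 180.0;   /* assumed steering angle */
    double dphi  = pi * sin(theta);     /* per-element phase step:
                                           2*pi*(d/lambda)*sin(theta), d = lambda/2 */

    /* Received samples on the two paths for a wavefront arriving from 'theta'. */
    double complex x0 = cexp(I * 0.0);
    double complex x1 = cexp(I * dphi);

    /* Programmable phase shift: counter-rotate path 1, then combine. */
    double complex y = x0 + x1 * cexp(-I * dphi);

    printf("combined magnitude = %.2f (2.00 means the paths add in phase)\n", cabs(y));
    return 0;
}
```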
Surround Sound for Cell Phones
At the 2008 Mobile World Congress, Dolby Laboratories, Inc. demonstrated Dolby Mobile, an audio processing technology platform that brings rich, vibrant surround sound to music, movies and television programs on mobile phones and portable media players. Dolby also showed aacPlus, a high-quality, highly efficient audio compression format designed for download, streaming and broadcast applications on mobile phones.
“Entertainment can sound better on mobile phones and portable media players,” said Francois Modarresse, vice president of marketing, Dolby Laboratories. “We developed Dolby Mobile to help create products that stir the senses and excite people’s imaginations so the industry can deliver on the promise of mobile entertainment.” Dolby recently announced an agreement with NTT DoCoMo and Sharp to offer two mobile phones with Dolby Mobile technology, the FOMA SH905i and FOMA SH905iTV, which are now available to customers in Japan. The company also announced an agreement with RMI Corporation to implement Dolby Mobile on the RMI Alchemy platform for portable media player applications. Finally, Dolby demonstrated a prototype device with Dolby Mobile built on Texas Instruments’ popular OMAP platform. Dolby Laboratories, Inc., San Francisco, CA. (415) 558-0200. [www.dolby.com].
One-Chip HSPA Solution for Open OS-Enabled Mobile Devices
Ericsson has announced the U380 mobile platform, delivering a true mobile Internet experience and empowering consumers with advanced multimedia features. The U380 is an integrated and verified one-chip HSPA platform, supporting all major Open Operating Systems (OSs) on the market and the first product based on the recently announced Open OS collaboration with Texas Instruments (TI). The continuous need for more powerful devices combined with the demand for mass market smart phones makes the U380 a groundbreaking platform for mobile device manufacturers addressing this rapidly growing segment. By integrating Ericsson’s HSPA modem with TI’s high-performance OMAP3430 applications processor into a one-chip solution, the U380 is among the smallest and most powerful multimedia platforms on the market in this category. With support for all major Open OSs, the U380 will provide mobile device manufacturers with a robust and flexible architecture for applications and services deployment, enabling
easier delivery and management of services and content. This enables handset manufacturers and mobile operators to differentiate their products through rich, easy-to-use and customizable user interfaces. The extensive verification and IOT testing performed by Ericsson will also enable manufacturers to rapidly bring highly advanced mobile devices to the mass market while reducing risk and lowering development costs. The U380 platform is expected to be commercially available in the first half of 2009. Ericsson, Inc., Plano, TX. (972) 583-0000. [www.ericsson.com]. Texas Instruments Inc., Dallas, TX. (800) 336-5236. [www.ti.com].
First Sub-$100 3G Linux Mobile Phone
NXP Semiconductors and Purple Labs have announced the release of a 3G Linux reference feature phone offering video telephony, music playback, high-speed Internet browsing and video streaming at a transfer price below $100. The new Purple Magic phone serves as a reference design for phone manufacturers creating entry-level 3G handsets, including those targeting mobile markets such as Southeast Asia, Eastern Europe and Latin America. According to the GSA mobile industry trade group, there are now 197 commercial 3G/WCDMA networks in 87 countries, but the adoption of 3G services is concentrated in mature European and Asian markets. The viability of 3G services in emerging markets requires large volumes of affordable data-capable devices—and the 3G Purple Magic has been highly optimized specifically to meet this market challenge. “During the recent holiday period, mobile operators were purchasing entry-level 3G phones for $120 to $145,” commented Simon
Wilkinson, CEO of Purple Labs. “Leveraging our Linux technology, the Purple Magic design now enables manufacturers to deliver compelling 3G products at a transfer price below $100.” The Purple Magic reference phone is based on the NXP Nexperia Cellular System Solution 7210 for 3G, coupled with the Purple Labs Linux suite, and is available as a fully integrated, turnkey solution. Production of an initial quantity of fully working phones has enabled Purple Labs to undertake considerable testing and validation, further reducing investment and time-to-market for OEMs and ODMs that leverage the Purple Magic design. NXP offers powerful 3G and 3.5G multimedia platforms running on a single ARM926 processor core. The Nexperia Cellular System Solutions with Linux support true UMTS performance, delivering seamless service coverage in 2G and 3G networks as well as advanced multimedia features that allow consumers to take full advantage of next-generation applications and mobile operator services. NXP Semiconductors, San Jose, CA. (408) 474-8142. [www.nxp.com].
NEC to Develop PMICs for Mobile Internet Devices
NEC Electronics Corporation and its subsidiary in the United States, NEC Electronics
America, Inc. have announced that they have teamed with Intel Corporation to develop a power management IC (PMIC) solution optimized for mobile Internet devices (MIDs). The highly integrated PMIC solution will combine power management, logic, audio and communications functions. Intel-based MIDs are targeted to bring a full Internet experience in your pocket and will allow consumers to stay in touch with family, entertain with friends, stay informed and be productive on the go. "NEC Electronics' power management technology helps to maximize battery life, support high levels of integration and minimize motherboard space—critical requirements for emerging portable devices," said Minoru Matsuda, general manager, Power Management Devices Division, NEC Electronics Corporation. "Through our collaboration with Intel, we expect to deliver a PMIC solution that enhances the end-user experience by maximizing power efficiency, enabling the freedom to roam." "Mobile Internet Devices (MIDs) is an emerging category and represents an exciting growth opportunity for the industry," said Pankaj Kedia, director of global ecosystem programs in the Ultra Mobility Group at Intel Corporation. "NEC Electronics' expertise in power management and its ability to deliver highly integrated solutions, combined with Intel's low-power technologies, will enable smaller, thinner, sleeker MIDs with great battery life for the consumer." NEC Electronics has strong experience developing highly integrated PMICs for customers in mobile markets, built up over 15 years. More information about NEC Electronics' power management IC offering can be found at http://www.am.necel.com/pmd/pmics.html. NEC Electronics America, Santa Clara, CA. (408) 588-6000. [www.am.necel.com].
analysts' pages

Mobile Data Implementations to Slow in 2008
Growth in revenue for mobile business applications will be close to 50% between 2006 and 2007, and then slow to 44% from 2007 to 2008, reports In-Stat. These strong growth projections are good news for the wireless industry, but they may be lower than what some companies are planning for based on a literal reading of end-user survey data, the high-tech market research firm says. That's because there is a widening gap between what decision makers expect they will do and what they actually implement.

"As business users approach saturation for horizontal mobile data applications, most of the growth potential remains for vertical market applications," says Bill Hughes, In-Stat analyst. "These require more planning and time to implement. The result is that many within the wireless industry may have overoptimistic forecasts."

Recent research by In-Stat found the following:
• The penetration of at least one mobile data application among firms increased from 75% to 94% in 2007.
• Smart phone use among U.S. business users increased 34% between 2006 and 2007.
• Four horizontal applications, wireless email, wireless Internet access, wireless instant messaging and personal information management (PIM), have the highest penetration because they are easier to implement than the vertical market applications.
In-Stat, Scottsdale, AZ. (480) 483-4440. [www.in-stat.com].

3G Uptake Heavily Influencing Cell Base Station Sales
Many factors impact base station demand, and the largest is the uptake and usage of 3G technology, reports In-Stat. Should uptake of 3G services be light over the next several years, the only new base stations required would be those to support more subscribers and for replacement of old or broken base stations, the high-tech market research firm says. But, should 3G demand become heavy, the number of base stations required to quench this wireless demand could be large, with operators paying for new base stations with service revenues.
"Our forecast for 3G data use falls somewhere between very little use and heavy use," says Allen Nogee, In-Stat analyst. "While there are many convincing arguments for heavy wireless data use, competing technologies, such as WiMAX and wired Internet access, will reduce 3G data demand, and the cost of 3G services will be prohibitive in many regions."

Recent research by In-Stat found the following:
• More than 4.7 million cellular base stations will be in operation by 2011.
• Cellular demand in China and India will keep sales of GSM base stations strong for many years.
• Licenses for TD-SCDMA will be given out in 2008, but shipments of TD-SCDMA base stations will pale in comparison to WCDMA base station shipments.

In-Stat, Scottsdale, AZ. (480) 483-4440. [www.in-stat.com].
The 700 MHz Auction Bids Continue Climbing and Far Exceed Expectations
Even the FCC underestimated the demand for 700 MHz. Originally expected to raise between $10 billion and $15 billion, total current bids exceed $18.9 billion. Bidding on the sought-after C block also surpasses the FCC requirements ($4.6 billion) and now has a PWB (Provisional Winning Bid) of $4.7 billion. ABI Research notes that demand for 700 MHz is so great, not even the pending recession impacts the bidding process. This auction may be the last opportunity for new participants to gain ground in the wireless realm, which is dominated by incumbent carriers. New participants include Google, Vulcan Spectrum LLC, Tower Stream and Cox, while traditional participants include Verizon and AT&T. “700 MHz provides better propagation characteristics than 850 MHz, 1900 MHz, 2100 MHz and higher-frequency spectrums,” says ABI Research senior analyst Nadine Manjaro. “For instance, cable companies can utilize this spectrum by launching their own mobile broadband networks—thereby alleviating the need to rely on wireless operators. But Google draws the most speculation, since the company’s entrance into the wireless market could significantly change the wireless industry.”
Moreover, incumbent operators absolutely want to strengthen 4G deployments; and 700 MHz will improve rural and in-building coverage at a lower cost than existing frequencies. One surprise is the nationwide D block, set aside by the FCC for public safety. Frontline Wireless—the expected front-runner—was disqualified due to its inability to pay the minimum bid. And there has been only one bid, which failed to meet the reserve bid price of $519 million. The D block price is set at $1.3 billion. “Beginning in 2009,” concludes Manjaro, “the 700 MHz auction could alter wireless broadband services in the United States and abroad. Google’s interest in the C block influenced Verizon’s decision to open up access to its network, just as Apple’s introduction of the iPhone stirred up the cellphone market. If Google or Microsoft secures the C block spectrum, it could change the whole industry.” ABI Research, Oyster Bay, NY. (516) 624-2500. [www.abiresearch.com].
China's Handset Shipment Growth to Slow in 2008

Mobile handset shipments from manufacturers headquartered in China grew dramatically in 2007, reaching 229 million units for the year, representing a 76.2 percent annual growth rate from 130 million in 2006. However, this rate of growth will slow considerably in 2008, dropping to about 19.7 percent to reach 274 million units, according to iSuppli Corp. Table 1 presents iSuppli's forecast for total handset shipments from manufacturers based in China for the period of 2006 through 2008.

"There were two major drivers for the fast growth in China's handset industry during 2007," said Kevin Wang, senior analyst, China research for iSuppli Corp. "One was the continuous increase in domestic demand from first-time buyers and the replacement market. The other factor was the significant increase in export shipments from Chinese handset manufacturers. Domestic OEMs, such as Huawei and ZTE, doubled their export shipments. Furthermore, domestic gray-market suppliers shipped millions of handsets to developing countries."

In 2007, the domestic Chinese handset market totaled about 200 million units—consisting of 150 million licensed handset units and about 50 million gray-market handsets. Sustained increases in the Consumer Price Index (CPI), rising housing prices and major stock market fluctuations in China are expected to hurt consumer confidence in 2008. Except for handsets supporting the Global Positioning System (GPS) and mobile TV functionality, there are no popular new handset features to drive new sales. Due to higher Average Selling Prices (ASPs), iSuppli expects that GPS and mobile TV-equipped phones will remain niche consumer products for now. Consequently, the replacement market will experience very limited growth during 2008, restraining increases in shipments. However, iSuppli anticipates that sales of handsets supporting TD-SCDMA and EDGE will ramp up quickly in 2008 as they become more widely available.

table 1: Forecast of Mobile Handset Shipments by China-Based Manufacturers, 2006-2008 (Millions of Units and Percentage Annual Change). Chart series: Shipments (millions of units) and Annual Growth Percentage, for 2006 through 2008. Source: iSuppli Corporation.

iSuppli Corporation, El Segundo, CA. (310) 524-4000. [www.isuppli.com].

DRAM Suppliers Become Casualties of Their Own Market-Share War

It says a lot about conditions in the global DRAM market when the industry's most notable performance in the fourth quarter was posted by Elpida Memory Inc., whose sales declined by a double-digit percentage during the period. Indeed, the fourth quarter of 2007 was miserable for DRAM suppliers on a number of counts, including:
• Global revenue declined by 19 percent to $6.5 billion, down from $7.97 billion in the third quarter.
• Sales dropped by 40 percent compared to the fourth quarter of 2006.
• All of the Top-10 DRAM makers tracked by iSuppli suffered sequential declines in revenue.

DRAM market conditions in the fourth quarter were far worse than predicted and rivaled the debacle of the second quarter of 2007, when global revenue declined by 24.1 percent on a sequential basis. The main culprit behind the fourth-quarter revenue dive was a 31 percent plunge in Average Selling Prices (ASPs) compared to the third quarter. The ASP drop was partly the result of a 17 percent increase in megabyte DRAM unit production, which contributed to a glut in the market. In contrast, DRAM unit production
rose by only 9.7 percent in the third quarter, and typically increases by 10 percent on a sequential quarterly basis. The DRAM industry also continues to struggle with excess inventory, which is helping to drive down pricing.
Sense of Loss

The drop in market revenue resulted in an industry-wide operating loss of nearly $3 billion in the fourth quarter. This represents a $6.4 billion swing in operating profitability compared to one year earlier, when the DRAM industry was in the black to the tune of $3.4 billion.

"There's a lesson to be learned from the fourth-quarter DRAM disaster: In this game of upping the production ante, no supplier wins—and the entire industry loses," said Nam Hyung Kim, director and chief analyst, memory ICs/storage systems for iSuppli. "Tier-one DRAM makers can generate profits greater than the industry average when the industry is healthy—and only when supply and demand are in a reasonable state of balance. Rather than pursuing a scorched-earth policy of ramping up production to gain market share, tier-one DRAM suppliers should try to return to profitability by rationalizing supply growth."

Top-tier DRAM suppliers in 2007 engaged in massive capital spending programs, with the goal of cornering the market and driving smaller competitors out of the industry, Kim observed. However, even if this strategy succeeds, it will yield only short-term benefits. When profitability returns to the market, new competitors will come flooding in again. "Until the suppliers change their ways, this naïve game of scale will continue to cost the DRAM industry every year," Kim said.

Good Money After Bad

The issue of profitability is becoming even more critical in the DRAM market as capital expenditure requirements grow. "iSuppli believes that the memory industry by 2020 will need to spend more than $100 billion per year just to maintain present rates of growth," Kim said. "This is because the new generation of 18-inch wafer fabs beyond 2015 will cost a fortune—at $10 billion per fab. The industry needs to consider how to shift its competitive strategies in order to generate sufficient profitability to support this kind of growth."

Unfortunately, the DRAM industry is headed for at least two more quarters of major losses, Kim warned, making for some perilous times ahead.

Supplier Struggles

Elpida was the best performer among the top-tier DRAM suppliers in the fourth quarter, with its revenue declining by 10.8 percent to $830 million, down from $930 million in the third quarter. This allowed Elpida to take the third rank in the global DRAM market, up from fourth place in the third quarter. The Japan-based company managed its ASPs deftly, allowing it to avoid a more severe drop in revenue.

Number-five ranked Micron Technology Inc. of the United States also managed to contain its revenue decline, with sales declining by only 12.6 percent sequentially. With number-four ranked Qimonda AG of Germany experiencing a 23.6 percent drop in revenue, Micron is within striking range of the fourth-place position. Number-two ranked Hynix Semiconductor Inc. of South Korea reached a dubious milestone in the fourth quarter, posting a loss that marked the end of 17 consecutive quarters of profitability. Hynix's revenue fell 33.2 percent sequentially, the largest decrease among the Top-10 DRAM suppliers. Leading DRAM supplier Samsung Electronics Co. Ltd. of South Korea experienced a 12.3 percent decline in revenue. Samsung also is believed to have lost money in DRAM in the fourth quarter, although the company's profitable NAND business offset its DRAM loss, resulting in a net profit for its memory sales.

A Year to Forget

On an annual basis, DRAM revenue in 2007 declined to $31.5 billion, down from $34 billion in 2006. Despite its fourth-quarter troubles, Hynix achieved the strongest sales growth of the Top-8 DRAM suppliers in 2007, with its revenue rising by an impressive 19 percent. The next best performance was posted by Elpida, which increased its revenue by 8.5 percent. Every other member of the Top-8 rankings, from Samsung on down, suffered a decline in DRAM revenue in 2007.

"No matter how you look at it, 2007 was a disastrous year for the DRAM business, due to suppliers' market-share and capital-spending games," Kim said. "However, the industry now is undergoing a rebalance of supply and demand. iSuppli believes industry profitability will be better later this year. However, this will take more time than suppliers anticipated early in 2007, when they started boosting their unit production."

Tables 2 and 3 present iSuppli's estimates of global DRAM market share in the fourth quarter and for the entire year of 2007.

table 2: Preliminary DRAM Revenue Market Share in Q4 2007 (Revenue in Millions of U.S. Dollars)

Q4 Rank | Company | Q4 '07 Revenue | Q4 Share | Q3 '07 Revenue | Sequential Growth | Q4 '06 Revenue | Year-to-Year Growth
1 | Samsung | 1,933 | 30.0% | 2,203 | -12.3% | 3,001 | -36%
2 | Hynix | 1,213 | 18.8% | 1,816 | -33.2% | 2,065 | -41%
3 | Elpida | 830 | 12.9% | 930 | -10.8% | 1,198 | -31%
4 | Qimonda | 744 | 11.5% | 974 | -23.6% | 1,498 | -50%
5 | Micron | 739 | 11.5% | 846 | -12.6% | 1,024 | -28%
6 | Nanya | 314 | 4.9% | 390 | -19.6% | 661 | -53%
7 | Powerchip | 238 | 3.7% | 265 | -10.2% | 528 | -55%
8 | ProMos | 193 | 3.0% | 257 | -25.1% | 496 | -61%
9 | Etron | 103 | 1.6% | 108 | -4.6% | 95 | 8%
10 | Elite | 49 | 0.8% | 60 | -18.3% | 35 | 40%
- | Others | 95 | 1.5% | 114.7 | -16.8% | 133 | -28%
- | iSuppli Total | 6,450 | 100.0% | 7,964 | -19.0% | 10,734 | -40%

table 3: Preliminary DRAM Revenue Market Share in 2007 (Revenue in Millions of U.S. Dollars)

2007 Rank | Company | 2007 Revenue | 2007 Share | 2006 Revenue | 2006 Share | Annual Growth
1 | Samsung | 8,723 | 27.7% | 9,586 | 28.2% | -9%
2 | Hynix | 6,706 | 21.3% | 5,637 | 16.6% | 19%
3 | Qimonda | 3,995 | 12.7% | 5,347 | 15.7% | -25%
4 | Elpida | 3,828 | 12.2% | 3,527 | 10.4% | 9%
5 | Micron | 3,207 | 10.2% | 3,740 | 11.0% | -14%
6 | Nanya | 1,561 | 5.0% | 2,241 | 6.6% | -30%
7 | Powerchip | 1,418 | 4.5% | 1,573 | 4.6% | -10%
8 | ProMos | 1,067 | 3.4% | 1,382 | 4.1% | -23%
9 | Etron | 390 | 1.2% | 308 | 0.9% | 27%
10 | Elite | 179 | 0.6% | 152 | 0.4% | 18%
- | Others | 428 | 1.4% | 477.0 | 1.4% | -10%
- | iSuppli Total | 31,502 | 100.0% | 33,970 | 100.0% | -7%

Source: iSuppli Corporation, February 2008

iSuppli Corporation, El Segundo, CA. (310) 524-4000. [www.isuppli.com].

Strong Application Processor Growth Offset by Integration and Declining ASPs

As the wireless handset market surpasses 1 billion units shipped per year, application processors are an increasingly important enabler of multimedia capabilities in high-end smart phones and feature phones. However, according to a recent study from ABI Research, this growing market will face several key challenges over the next several years.

"Over the course of our forecast period, from 2007 to 2012, strong unit growth is expected to be offset by dropping unit prices," says ABI Research senior analyst Doug McEuen. "After significant increases during the next two years, the decline in unit prices will compress the application processor revenue to a flat growth rate."

In later stages of market development, application processor unit shipment growth is expected to decrease due to two key factors: integration, and the rise of the ULCH (Ultra Low-Cost Handset) market. Integration impacts the market negatively, as multimedia functions are combined into the baseband processors, fabricated using 65 nm or 45 nm process technology. Low-cost handsets, which will not require an additional application processor given their limited multimedia functionality, likely will increase market share within the overall wireless device market—further contributing to the decrease in application processor unit growth. In 2012, application processor revenue is expected to reach nearly $2.8 billion, with unit shipments of 553 million, at a cost of $5.04 per unit.

ABI Research notes that over the next several years, the smart phone segment will be the largest market for application processor unit shipments. Although smart phone revenue will decrease from $3.3 billion in 2007 to $2.1 billion in 2012, it will continue to dominate the market. "One device with the strongest growth over the same forecast period is the high-end feature phone, or enhanced phone, as unit shipments are expected to increase by 42%," concludes McEuen, "while revenue will rise by almost 21%."

ABI Research, Oyster Bay, NY. (516) 624-2500. [www.abiresearch.com].
cover feature multicore for handhelds
Multicore Design for the Next-Generation "Kings of Cool"
A multicore approach can help you design the next generation of "cool gadgets."
by Ian Rickards, CPU Product Manager, ARM
The next generation of "cool gadgets" will inevitably place increasing demands on developers and designers to ensure that these devices can fulfill the requirements of an ever more discerning and competitive marketplace. There are a number of key features that need to be conquered before a device is crowned the next "King of Cool." Firstly, the lowest possible power consumption is essential to increase battery life and allow battery weight to be reduced. Secondly, heat build-up can become an issue to be tackled for ultra-thin, high-performance devices. Thirdly, a desktop-class user experience and compelling graphics are now essential even on handheld devices. And finally, ease of software development must be enabled to ensure a rapid product time-to-market, which almost goes without saying. These seem to be conflicting goals, but multicore technology done correctly can contribute to improving all of the above.

Multicore design offers many significant advantages for handhelds—by using smaller, less complex CPUs in a multiprocessing cluster, the very best power efficiency can be obtained at the same time as the wide performance scalability required for the varying workloads found on a typical handheld. In 1999, Fred Pollack of Intel famously pointed out that single-core, high-performance x86 processors had reached a complexity level where there is a steep relationship between power consumption and performance. Since then, dual cores with a simpler architecture have taken over the PC desktop. Many of the same principles apply to handheld devices, just on a much smaller scale. A single-chip ASIC platform can now easily contain a multiprocessor cluster of small and efficient processors that do not have the inherent instruction-set complexity overhead of an x86. ARM launched its first multicore processor, the ARM11 MPCore, in 2004, and more recently announced the ARM Cortex-A9 MPCore next-generation multicore processor in October 2007. Using either of these, it is possible to fit a powerful quad-processor symmetric multiprocessing core into just a few square millimeters of a modern ASIC.

figure 1: Power/Performance/Area trade-offs using the Cortex-A9 MPCore multicore processor (Cortex-A9 product design space: aggregate performance in thousands of DMIPS versus power consumption/silicon cost, for 1- to 4-CPU configurations).
Power Efficiency
Many people think that "multicore" means "more performance"—however, in the embedded world it also means "more power efficient." This is possible in part thanks to the implementation options available when the processor is synthesized. For a specific processor on a particular process node (e.g., 65 nm), there is still a wide range of implementation choices that can affect power efficiency, including the basic fabrication process, the standard cell library and the configuration of the various synthesis implementation settings.

Fabrication Process: In general, today's single-core handheld devices use the low-power (LP) variant of a process due to the requirement to limit the power consumed through leakage while the device is in standby. Although the more generic (G) type processes may have lower dynamic power, their higher leakage levels would significantly limit the duration that a device could be kept in standby before the battery completely drains. For many devices, however, a multicore processor can provide an additional dimension to this design choice. The amount of leakage is fundamentally related to the physical area of silicon to which power is applied. A larger single-core processor, whenever it has power applied, will leak across its entire area. A multicore processor can still provide the limited background processing on a single smaller CPU core, but may remove the power, and hence the associated leakage, for the unused processors.
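To make the area/leakage argument concrete, here is a minimal back-of-the-envelope sketch. The leakage density and core areas are invented placeholder numbers, not ARM or foundry data; the point is only that gating three of four small cores leaves far less powered silicon than keeping one large core on.

```c
/* First-order leakage comparison: one large core vs. a quad cluster of
 * small cores with the unused cores power-gated. All values are assumed
 * illustration numbers, since leakage scales with powered silicon area. */
#include <stdio.h>

int main(void)
{
    const double leak_mw_per_mm2 = 2.0;  /* assumed leakage density, mW/mm^2 */
    const double big_core_mm2    = 4.0;  /* assumed area of one large core   */
    const double small_core_mm2  = 1.2;  /* assumed area of one small core   */

    /* Single large core: the whole core leaks whenever it is powered. */
    double big_leak = big_core_mm2 * leak_mw_per_mm2;

    /* Quad cluster in light-load standby: one small core powered,
     * the other three power-gated (leakage of gated cores taken as zero). */
    double quad_leak = 1 * small_core_mm2 * leak_mw_per_mm2;

    printf("large single core leakage : %.2f mW\n", big_leak);
    printf("quad cluster, 1 core on   : %.2f mW\n", quad_leak);
    return 0;
}
```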
Standard Cell Library: When considering the standard cell library used to implement a silicon device, the main choice is the "track size" of the library, which determines the size and drive strength of the transistors. ARM Artisan physical IP offers three basic types of libraries to give design flexibility between each library's power and performance characteristics:

(1) Metro - Low Power 7 track (lowest power, smallest transistors)
(2) Advantage - General Purpose 9 track (general purpose, balanced design)
(3) Advantage HS - 12 track (highest performance, largest transistors)

Referencing the ARM Web site, you can see some example ARM1176JZ-S processor implementation trials using the standard 90G process, normal Vt, single core with 16KB+16KB L1 caches, with two of these libraries. In Table 1, you can see how a dual-core multicore processor does not necessarily need to be "twice as big" as a single processor unless you need to achieve double the performance. Using a lower-density, more power-efficient library for the implementation, and the ability to turn off a CPU and save half the leakage, the average energy consumption for a given performance level can be greatly reduced.

Synthesis Options for Area and Speed: When targeting a lower operating frequency, the synthesis tool is able to use dense (but slower) complex library elements. Approaching the upper frequency limit, there can be a significant increase in silicon area as the synthesis tools need to expand out logic to meet timing. Combining the standard cell library and synthesis speed/area options gives a wide range of different Power/Performance/Area trade-offs. With multicore processors, developers also have a choice in the number of CPUs. These options are illustrated in Figure 1, which compares the aggregate performance levels achievable using the Cortex-A9 MPCore multicore processor.

For a given total DMIPS requirement, there is a choice between implementing fewer high-performance CPUs or a larger number of small, lower-performance CPUs. At first sight, it does not appear to make much difference. However, the key point is that the power efficiency is much better with a larger number of smaller cores that provide more compute performance per mW. In the single-CPU era, most designs focused on maximum frequency to get the best performance, using standard cell libraries with medium or large transistors. Multiprocessing changes that: it now enables excellent aggregate performance from power- and area-efficient cell libraries, using a multicore approach. This seems like the natural choice, making full use of the small geometries for transistors as well as wires.
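Below is a small worked comparison in the spirit of Figure 1, with invented numbers rather than Cortex-A9 characterization data: both configurations reach the same aggregate DMIPS target, but the cluster of slower cores built from a low-power library delivers more DMIPS per milliwatt.

```c
/* Compare two ways of hitting the same aggregate throughput target.
 * All per-core DMIPS and power figures are illustrative assumptions. */
#include <stdio.h>

struct config {
    const char *name;
    int         cpus;
    double      dmips_per_cpu;   /* assumed per-core throughput           */
    double      mw_per_cpu;      /* assumed per-core power at that speed  */
};

int main(void)
{
    struct config cfgs[] = {
        { "2 fast cores (12-track library, high clock)", 2, 1200.0, 300.0 },
        { "4 slow cores (7-track library, low clock)",   4,  600.0, 100.0 },
    };

    for (int i = 0; i < 2; i++) {
        double dmips = cfgs[i].cpus * cfgs[i].dmips_per_cpu;
        double mw    = cfgs[i].cpus * cfgs[i].mw_per_cpu;
        printf("%-46s %6.0f DMIPS  %5.0f mW  %.1f DMIPS/mW\n",
               cfgs[i].name, dmips, mw, dmips / mw);
    }
    return 0;
}
```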
Power Scalability
As far as dynamic energy goes, running a given concurrent task on a quad-core processor does not necessarily use any more energy than running it time-sliced on a single core, because each processor can be placed into standby as soon as it has no work to do. This may only be true on efficient, tightly coupled multicore designs, where such immediate power-save modes exist and the cost of utilizing the multicore is less than the overhead of time-slicing on a single processor. ARM processors offer “Wait for Interrupt” functionality using the WFI instruction, where each processor can go into clock-stopped idle under the control of the OS, with immediate effect, when there is nothing to do. This is the main power-saving technique for the processor, since the core consumes only leakage current when the clock is stopped. The ARM MPCore technology has been designed to ensure that the hardware overheads of supporting multicore are minimized when compared to the software cost of time-slicing. Static power consumption is a major issue with handheld devices, as it affects the standby time. It is a particular problem with the latest small-geometry processes, where the laws of physics dictate that leakage must unavoidably increase. Turning off areas of unused logic with power gating is increasingly important for handheld devices. Multicore designs can offer the ability to trade off maximum performance against static power by powering entire CPUs on or off as they are required. A dual-core system has 50 percent and 100 percent performance levels using one or two CPUs, with corresponding 50 percent and 100 percent static power consumption. In a quad-core system there are 25, 50, 75 and 100 percent performance levels. So, during idle periods
or periods of low activity, it makes a lot of sense to have only one processor active, to reduce the static power drain. Electrically this is fairly straightforward, as each CPU is placed in a different power domain, a technique that is well supported within the implementation reference methodologies (iRM) offered by the key EDA vendors for ARM processors. Software is straightforward, too—it is the job of the OS to look at the average load and choose the power state of the processors. Linux fully supports this “CPU hotplugging.” There is a small cost associated with turning a CPU on or off (it takes a few cycles to clean and reload the cache), but the benefits of reducing standby power can easily outweigh the costs.
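As a concrete illustration of OS-directed core shutdown, the fragment below uses the standard Linux CPU hotplug interface in sysfs. The sysfs path is the stock kernel interface; the policy shown (take a secondary core offline during an idle period) is a simplified assumption, not the actual governor logic of any particular handset.

```c
#include <stdio.h>

/* Minimal sketch: offline or online a secondary CPU via Linux CPU hotplug.
 * Writing "0" to /sys/devices/system/cpu/cpuN/online powers the core down
 * (the kernel cleans its cache first); writing "1" brings it back.
 * Real systems leave this decision to the OS or a load-monitoring daemon. */
static int set_cpu_online(int cpu, int online)
{
    char path[64];
    snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/online", cpu);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                 /* e.g., cpu0 is often not hot-pluggable */
    fprintf(f, "%d", online ? 1 : 0);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Hypothetical policy: during an idle period, keep only CPU0 running. */
    set_cpu_online(1, 0);
    /* ... later, when the average load rises ... */
    set_cpu_online(1, 1);
    return 0;
}
```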
Heterogeneous Multicore
Heterogeneous multicore involves combining several different types of processors into a single ASIC. These could be different processors with the same instruction set, processors with different instruction sets, or other dedicated processing engines such as audio or video engines. The benefit of dedicated hardware is that it can often be better for a specific task—for example, in a product that primarily plays video, dedicated video decode hardware will typically be more power efficient than a general programmable processor. However, the software design gets more complicated for heterogeneous multicore processors, as an operating system cannot handle these different processors using “symmetric multiprocessing.” Instead, the software engineer must handle all the message passing, synchronization and coherency between the different processors. With a heterogeneous system, the system design can also become much more complicated. For instance, developers need to
table 1
Artisan Library        Metro          Advantage-HS
Frequency              320 MHz        620 MHz
Std cell area          1.0 mm2        1.95 mm2
Memory area            0.6 mm2        0.7 mm2
Total macrocell area   1.6 mm2        2.65 mm2
Total dynamic power    0.25 mW/MHz    0.45 mW/MHz
figure 2
Multicore processors can assist developers to meet the demands of the next generation of handhelds.
consider the bus architecture—such as the open standard AMBA bus—that makes it easy to connect the different processors and compute engines, as well as the challenge of arranging coherency between processors in a heterogeneous system.
System Design
The ARM Cortex-A9 MPCore multicore processor hides the complexity of building a multicore cluster into a single pre-validated macrocell. Any overhead to support coherency is also handled inside the macrocell itself, maintaining peak performance and reducing the associated power consumption. The Cortex-A9 processor exposes just one or two AMBA 3 AXI master interfaces to the rest of the system, simplifying the design integration of a multicore. Other multicore solutions will need more work to integrate multiple independent processors into any coherent system.
Software Design
Software is a critical component in the latest handheld systems. Getting a product out on schedule is critical, so the software development must be low risk. A lot of traditional embedded software engineers are concerned about the complexity of writing software for multicores, which is quite understandable for engineers who have always worked on uniprocessor systems. The good news is that most of the complexity can be handled by the operating system. A symmetric multiprocessing (SMP) operating system provides a high-level “threading” API, which makes it easy to spread work across multiple cores. Even real-time control performance can be guaranteed by dedicating a task to a specific processor. By choosing a standard architecture and a standard operating system, the task can be made much simpler. Most OS vendors have announced or are working on SMP support, which is the easiest way to code for a multicore solution. Linux has good SMP support for many architectures including ARM, taking full advantage of new architectural features such as power-efficient spinlocks, the thread ID register and memory regions supporting re-ordering of memory
accesses, all of which allows full performance to be achieved. Programming a multicore device often strikes fear into the heart of a programmer who has never targeted such a device before. Even for those who have, the experience was often difficult, with the overheads of supporting concurrency frequently outweighing the benefit gained by parallelizing the application. In the new embedded handheld devices using the ARM MPCore technology, however, those perceptions may need to change. The tight integration of the multiprocessing capability keeps the overheads of supporting concurrency low, so even fine-grained software concurrency can realize a performance gain. In such devices the implicit concurrency between the active applications, the operating system and any peripheral drivers also provides effective levels of concurrency. Only when a single significant task dominates does the programmer need to consider decomposing it into smaller tasks. Just as in traditional single-core devices, tasks are split between accelerators and processors to maximize performance and power efficiency—maybe this move to multicore isn’t such a big step, when one realizes that many of today’s single-core handhelds already use multiple processors. The performance and power advantage of such multiple-processor embedded devices is simply being extended to include the multicore processor.
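The article mentions dedicating a task to a specific processor for real-time control; the sketch below shows one common way of doing this on SMP Linux using POSIX threads and CPU affinity. The thread body and core number are placeholders for illustration, not taken from the article.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical real-time control task, pinned to one core so the rest of
 * the system cannot disturb its timing. */
static void *control_task(void *arg)
{
    (void)arg;
    /* ... poll sensors, update actuators ... */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, control_task, NULL);

    /* Restrict the thread to CPU 1; other threads remain free to migrate. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(1, &mask);
    if (pthread_setaffinity_np(tid, sizeof(mask), &mask) != 0)
        fprintf(stderr, "could not set affinity\n");

    pthread_join(tid, NULL);
    return 0;
}
```

Built with the usual pthread library (e.g., -lpthread), the affinity call is all that is needed to keep the control task on its own core while the SMP scheduler balances everything else.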
Multicore Benefits
This article has shown how multicore design can bring compelling benefits to handheld devices by, in essence, “achieving the impossible” of improving performance while at the same time lowering both dynamic and static power consumption. With well-implemented multicore and SMP operating systems supporting powerful software development, multicore design is enabling a completely new era of “cool gadgets.” ARM Inc., Sunnyvale, CA. (408) 734-5600. [www.arm.com].
cover feature multicore for handhelds
Solve Portable Design Problems Using Convenient Concurrency
SMP multicore processors offer many advantages in portable products—if they’re properly designed.
by Grant Martin and Steve Leibson, Tensilica, Inc.
Discussions of multicore chips, multiprocessors and associated programming models for portable system design continue to be narrowly bounded by a focus on individual, general-purpose processor architectures, DSPs and RTL blocks, which severely limits the possible ways in which you might use multiple computing resources to attack problems. Big semiconductor and server vendors offer symmetric multiprocessing (SMP) multicore processors, with each core supporting multiple threads. Such multicore chips are found in large servers and laptops. However, these power-hungry, general-purpose multiprocessor arrays do not serve well as processing models for many portable systems. Large servers and farms support applications such as Web query requests that follow a “SAMD” model: single application, multiple data (an oversimplification, perhaps, but a useful one). SAMD applications date back
to early mainframe days when computers were dedicated to one application such as real-time airline reservations and check-in systems or real-time banking. These big applications now run on servers—many are Web-based—and these applications are particularly suitable for SMP multicore processors; all of the processors run the same kind of code, the programs do not exhibit data locality, and the number of cores running the application makes no material difference other than execution speed. What stimulates a lot of interest, excitement and worry these days is the application of the same SMP multicore chips to embedded designs, particularly portable products. Here, the main concern is that very few applications running on such machines are “embarrassingly parallel” applications that can be cut up into multiple threads, each acting in parallel on part of the data. Graphics and video processing are embarrassingly parallel and that parallelism is
figure 1
Super 3G Cell Phone Block Diagram (blocks include the application CPU, radio resource control, DSP/MAC with turbo coding and decoding, MIMO and FFT/IFFT, Java, drawing, 3D, image, picture and sound acceleration, GPS, power control, camera, video, LCD, DTV and sound interfaces, USB, RF, and NAND/NOR flash, SDRAM and memory card interfaces).
exploited in special-purpose graphics engines such as the IBM-Toshiba-Sony Cell processor (interestingly, not really an SMP multicore machine) and PC graphics chips offered by Nvidia and others.
Software Engineers Don’t Think in Parallel
Thinking about applying such SMP architectures to portable systems immediately draws
attention to a tool problem: practical software-development tools that can automatically distribute a large, single-threaded application across many processors are simply not available. While hardware-description languages such as Verilog easily express parallel operations, software languages such as C, the current king of embedded software languages, are specifically designed to express single-threaded algorithms. A dilemma. There have been many attempts—such as Concurrent C, Unified Parallel C (UPC), mpC, pC and others—to extend C into the parallel-
programming domain. Sometimes these approaches use special libraries and APIs to allow explicit identification of parallel processes and the communications between them; MPI and OpenMP come to mind. Other researchers have attempted to create entirely new software languages that implicitly incorporate parallel programming structures, or explicitly allow concurrency to be expressed. (Remember Occam for the Transputer?) It may be that we are so steeped in single-tasking algorithmic culture (recipes, business procedures, first-aid techniques, etc.) that we have a hard time visualizing concurrent processes. For whatever reason, it appears to be very hard to train software programmers to think in terms of parallel operations. Barring breakthroughs in programmer training or in automated software parallelization, the future economics of SMP multicore chips remain perplexing for most portable applications. However, expanding our architectural thinking beyond SMP multicores uncovers at least two kinds of easily used concurrency that exploit heterogeneous, not homogeneous, concurrency. These approaches are better suited to portable applications. Both such system architectures fit very well into most 21st-century consumer devices including cell phones, portable multimedia players and multifunction devices.
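The MPI and OpenMP approaches mentioned above bolt explicit parallelism onto ordinary C. A minimal OpenMP fragment is sketched below to show the flavor; the array names and sizes are illustrative assumptions, not examples from the article.

```c
#include <stdio.h>
#include <omp.h>

#define N 1024

/* OpenMP expresses parallelism through pragmas layered on ordinary C.
 * The loop below is divided among the available cores by the runtime;
 * the programmer must still know the iterations are independent. */
int main(void)
{
    static float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("ran on up to %d threads, c[100] = %f\n",
           omp_get_max_threads(), c[100]);
    return 0;
}
```

Compiled with an OpenMP-aware compiler (for example, gcc -fopenmp), the loop is split across cores automatically; without the flag it remains a plain single-threaded loop.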
Compositional Concurrency
You might call the first sort of parallelism “compositional concurrency,” where various subsystems—each containing one or more processors optimized for a particular set of tasks—are woven together into a product. Communications within this architectural design style are structured so that subsystems interact only when needed. For example, a user-interface subsystem running on a controller may need to switch audio processing on or off; to control the digital camera; or to manage video processing by stopping, pausing or changing video playback in some other manner. In this kind of concurrent system, many subsystems operate simultaneously but they interact at only a high level and do not clash. Figure 1 shows a block diagram of a Super 3G mobile phone that illustrates this idea. There
are 18 identified processing blocks (shown in gray), each with a clearly defined task. In this example, it’s easy to see how one might use as many as 18 processors (or more for sub-task processing) to divide and conquer this problem. Some criticize this sort of architectural design style because it’s theoretically inefficient in terms of gate and processor count. Ten, twenty or more processor cores could, at least in theory, be replaced with just a few general-purpose cores (perhaps SMP coherent multicores) running at much higher clock rates. This criticism is misplaced. While Moore’s law (providing more transistors per fabrication node) marched in lockstep with Dennard (classical) scaling (which provided faster, lower-power transistors at each fabrication node), the big, fast processor design style held sway. Dennard scaling curtailed at 90 nm; power dissipation and energy consumption became unmanageable at high clock rates; and system designers must now adopt design styles that reduce system clock rates. A compositionally concurrent design style offers tremendous advantages:
• Distributing computing tasks over more on-chip processors trades additional transistors for a lower clock rate to reduce overall power and energy consumption. Given the continued progress of Moore’s law and the end of Dennard scaling, this is a good engineering trade-off because energy consumption rises superlinearly
with clock frequency (a rough comparison is sketched after this list). In addition, the use of lower clock rates removes the need to run in the fastest possible process technology. Using a low-power process technology at any given fabrication node can reduce leakage current by as much as three orders of magnitude! That is why lowering clock rate is critically important for portable systems, which are in standby mode most of the time, so leakage currents largely determine energy consumption and therefore battery life.
• Dedicated subsystems can be easily powered down when not used. They can also be shut off and restarted quickly.
• Because these subsystems are task-specific, application-specific instruction set processors (ASIPs) that are much more area and power efficient than general-purpose processors can be designed for each processor used in the system, so the gate-count advantage of fewer general-purpose cores is much smaller than it first appears.
• This design style avoids complex interactions and synchronizations between subsystems that are common with SMP and multithreaded designs. Proving that a 4-core SMP system running a cell phone and its audio, video and camera functions will not drop a 911 emergency call when other applications are running, or that low-priority applications will be properly suspended when a high-priority task interrupts, often
figure 2
Direct Connect Versus Buses.
invokes an analysis nightmare—“death by simulation.” Reasonably independent subsystems interacting at a high level are far easier to validate both individually and compositionally.
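As a rough illustration of why energy rises superlinearly with clock frequency, the sketch below compares one fast core against two half-speed cores under the textbook dynamic-power relation P ≈ C·V²·f, assuming supply voltage must scale roughly with frequency. The capacitance, frequencies and voltages are illustrative assumptions, not measured data.

```c
#include <stdio.h>

/* Toy model: dynamic power ~ C * V^2 * f, and V is assumed to scale with f
 * (an idealization). Energy for a fixed workload of W cycles is then
 * power * (W / total_frequency), so the saving comes from the lower voltage
 * that the lower clock rate permits. */
int main(void)
{
    const double C = 1.0;          /* arbitrary capacitance units          */
    const double W = 1e9;          /* workload: 1e9 cycles (illustrative)  */

    /* Option A: one core at 600 MHz, 1.2 V (assumed). */
    double fA = 600e6, vA = 1.2;
    double energyA = C * vA * vA * fA * (W / fA);

    /* Option B: two cores at 300 MHz, 0.9 V (assumed), sharing the work. */
    double fB = 300e6, vB = 0.9;
    double energyB = C * vB * vB * (2 * fB) * (W / (2 * fB));

    printf("relative energy, one fast core : %.2f\n", energyA / energyA);
    printf("relative energy, two slow cores: %.2f\n", energyB / energyA);
    return 0;
}
```

Under these assumed numbers the two slower cores finish the same work with roughly 44 percent less dynamic energy, purely because the lower clock rate allows a lower supply voltage.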
figure 3
Nanometer I/O Routability. ITRS 2006 wire spacing, 8-10 metal layers: 90 nm, 100,000+ wires/square mm; 65 nm, almost 200,000 wires/square mm.
Divide and Conquer
Design tools to support this type
of system-design style already exist in the form of system-simulation tools based on SystemC. Various subsystems can be written in C (reminder: that’s the software-programming language that everyone already knows how to use), can be proven individually, and can then be simulated as a system using instruction-set simulators that are hundreds or thousands of times faster than the gate-level simulators needed for RTL simulation. This speed advantage grants system designers the luxury of trying different system architectures and choosing the best one, instead of today’s situation where system
architecture is often selected through the application of “Kentucky Windage” (see www.microwaves101.com/encyclopedia/slang.cfm for a proper definition of this technical term). The various simulated subsystems communicate with each other using messaging protocols, and the entire design style lends itself well to the strongest practice in all design engineering: divide and conquer. Pipelined dataflow, the second kind of concurrency, complements compositional concurrency. Computation often can be divided into a pipeline of individual task engines. Each task engine processes and then emits processed data blocks (frames, samples, etc.). Once a task completes, the processed data block passes to the next engine in the chain. Such asymmetric multiprocessing algorithms appear in many signal- and image-processing applications, from cell-phone baseband processing to video and still-image processing. Pipelining permits substantial concurrent processing and also allows even sharper application of ASIP principles: each of the heterogeneous processors in the pipeline can be highly tuned to just one part of the task. Combining the compositional-subsystem style of design (as just described) with pipelined, asymmetric multiprocessing (AMP) in each subsystem makes it apparent that products in the consumer, portable and media spaces may need 10 to 100 processors—each one optimized to a specific task in the product’s function set. Programming AMP applications is easier than programming multithreaded SMP applications because there are far fewer intertask dependencies (if any) to worry about. Experience shows it is possible to cleanly write software in this manner, and many optimization issues arising from the use of multiple application threads running on a limited set of identical processors are simply avoided.
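To make the pipelined-dataflow idea concrete, here is a minimal sketch of two task engines connected by a bounded queue, written with POSIX threads as a software stand-in for the hardware FIFOs such systems typically use. The stage functions, block count and queue depth are assumptions for illustration only.

```c
#include <pthread.h>
#include <stdio.h>

/* Tiny single-producer/single-consumer queue standing in for the FIFO
 * between two pipeline stages (e.g., a baseband engine feeding a decoder). */
#define DEPTH 4
typedef struct {
    int buf[DEPTH];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue_t;

static queue_t q = { .lock = PTHREAD_MUTEX_INITIALIZER,
                     .not_empty = PTHREAD_COND_INITIALIZER,
                     .not_full  = PTHREAD_COND_INITIALIZER };

static void q_put(int v)
{
    pthread_mutex_lock(&q.lock);
    while (q.count == DEPTH) pthread_cond_wait(&q.not_full, &q.lock);
    q.buf[q.tail] = v; q.tail = (q.tail + 1) % DEPTH; q.count++;
    pthread_cond_signal(&q.not_empty);
    pthread_mutex_unlock(&q.lock);
}

static int q_get(void)
{
    pthread_mutex_lock(&q.lock);
    while (q.count == 0) pthread_cond_wait(&q.not_empty, &q.lock);
    int v = q.buf[q.head]; q.head = (q.head + 1) % DEPTH; q.count--;
    pthread_cond_signal(&q.not_full);
    pthread_mutex_unlock(&q.lock);
    return v;
}

/* Stage 1: "produce" data blocks (placeholder for a real task engine). */
static void *stage1(void *arg)
{
    (void)arg;
    for (int block = 0; block < 8; block++)
        q_put(block);
    q_put(-1);                     /* end-of-stream marker */
    return NULL;
}

/* Stage 2: consume and "process" each block as it arrives. */
static void *stage2(void *arg)
{
    (void)arg;
    for (;;) {
        int block = q_get();
        if (block < 0) break;
        printf("stage 2 processed block %d\n", block);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, stage1, NULL);
    pthread_create(&t2, NULL, stage2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Each stage only touches the queue at its boundary, which mirrors why AMP pipelines carry far fewer intertask dependencies than multithreaded SMP code.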
Get Off the Bus
The use of large numbers of configured processors greatly accelerates individual tasks, as shown above. The way these processors are interconnected also greatly affects system performance. Although the usual way to hook all the processors together is to use one central bus, this aged design approach makes little sense
with today’s nanometer SoCs. The more processors you saddle on that one bus, the more bus contention you’ll have. You’ll then need to schedule and arbitrate bus access. Suddenly, you’ve created a problem that need not exist, because in many systems, particularly AMP systems like the ones just described, each processor need not talk to every other processor. Figure 2 shows a system that illustrates this point. Some specific communication paths are needed, but most of the processor-to-processor connection paths that a global bus makes possible are not. If each of these blocks were simple RTL hardware blocks, we’d simply connect them as shown by the large arrows. For some reason, when the blocks become processors, we feel the need to hook them all to one bus. That’s simply the wrong approach. We intuitively know this for RTL blocks but become oblivious when the blocks become processors. The right system-design approach is to connect the on-chip processors as demanded by the system architecture. Make your interconnection scheme match the actual needs of the system using buses, queues and simple parallel ports. “Won’t that add a lot of wires?” you might ask. Yes it will. So let’s look at where our system-design thinking has come from and where it’s taken us. Intel housed the first commercial microprocessor, the 4004, in a 16-pin package. Intel was primarily a memory-chip vendor at the time, and the company’s most economical package was a 16-pin DIP. The 4004’s designers developed a multiplexed, 4-bit bus to fit the available package. It may be hard to believe now, but few hardware designers used buses before the advent of the 4004 microprocessor. Earlier systems were built using point-to-point connections with very little or no sharing or multiplexing. Then, wires were cheaper than transistors. After the introduction of the microprocessor, buses came to dominate the way we connected devices in a system. We now use them by instinct without really thinking things through. The bus has always been a processor bottleneck. Over time, packaged microprocessors have evolved from 16-pin packages to nearly 1000 pins in an attempt to alleviate this bottleneck. In the world of packaged processors, every pin on the package costs money, so there’s
a bit of give and take between cost and performance. However, that’s simply not the situation with on-chip processors. If we do the math, we see that nanometer silicon gives you a lot of raw I/O routability. At 90 nm, you can route more than 100,000 wires into a square millimeter of silicon. At 65 nm, you can route nearly 200,000 wires into and out of each square millimeter. Figure 3 illustrates this idea. Practically speaking, use of wide, point-to-point interconnections between on-chip blocks is not outrageous and can be beneficial. SoC design gives us the ability to make the interconnect scheme match the problem. Perhaps a shared bus is the right approach, but perhaps not. Other schemes for regular interconnect include on-chip networks and cross bars. But in many cases, an approach that connects blocks as demanded by the target application is the most economical, delivers the best performance and is therefore the best choice. Those worried that the future will not allow large-scale use of many processors or cores for a wide range of applications should take heart. Indeed, this will clearly be possible, even likely! But it is important for everyone working in these areas to recognize that “There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.” (Hamlet, Act I, Scene V). Taking the wide view, the world truly is conveniently concurrent! Tensilica Inc., Santa Clara, CA. (408) 986-8000. [www.tensilica.com].
cover feature multicore for handhelds
Good Embedded Communications is the Key to Multicore Hardware Design Success
Self-timed NoC interconnects can solve a lot of the problems with overloaded data buses.
by David Lautzenheiser, Vice President of Marketing, Silistix
While multicore processors have certainly become an important part of many SoC designs, there are still several obstacles designers face in dealing with more than one processing engine on a chip. Software engineers face the problem of trying to efficiently program multiple processor cores on the same piece of silicon. On the hardware side, chip developers—from architects down to physical implementation engineers—face difficult communication issues between the various processing and other IP cores and in accessing off-chip DRAM. Concentrating on the hardware aspects of multicore chip design, a major problem is the industry’s reliance on hierarchical, clock-based bus structures to move data among the various processing cores and the memories—both embedded and off-chip. It’s time to look at self-timed network-on-chip (NoC) interconnect fabrics for embedded communication networks.
This article will review the challenges of clock-based buses being used as the main communications mechanism and discuss how self-timed NoC interconnects improve on-chip data flow, simplify and enhance power management, and increase shared memory efficiency for multicore processor SoCs.
Why Multicore?
As SoC designers began “hitting the wall” developing single-processor chips to tackle the increased demands of high-definition video processing and other user requirements, they started developing chips using multiple processing cores. The multicore approach—based on the assumption that you could break an overall processing job into multiple tasks that could be done concurrently by several processors—resulted in chips that had better performance than single-processor designs but also lower power dissipation,
which meant less heat. Since increased heat and higher chip operating temperature were a big problem when cranking up the clock rate on a single-core processing chip, multicore architectures appeared to be a viable solution for keeping clock rates and heat production manageable. Other advantages of using multiple cores at lower clock rates included fewer signal integrity problems, less electromagnetic interference (EMI) and fewer problems associated with distributing very high frequency clocks on silicon. However, there are several barriers to designing high-end multicore processor chips. Software development—not the subject of this article—becomes far more complex due to the difficulties in breaking a single processing task into multiple parts that can be processed separately and then reassembled later. This reflects the fact that certain processor jobs cannot be easily parallelized to run concurrently on multiple processing cores and that load balancing between processing cores—especially heterogeneous cores—is very difficult. The other problems with multicore chip design are hardware-based. This is the topic of this article—the difficulties associated with current multicore architectures and what can be done to overcome some of these problems.
figure 1
As the number of processing cores in a multicore SoC increases, using clock-based buses to move data between the processors and shared resources such as memories becomes an increasingly difficult problem. This limits the scalability of traditional bus architectures for large numbers of processing cores. (Axes: Clock Rate vs. Aggregate Bandwidth; 1X, 2X and 4X processor configurations shown against DDR2, DDR3 and DDR4 memory bandwidths.)
Challenges with Current Multicore Architectures
Today’s multicore processor chips usually communicate through a traditional clock-
based bus system, often a hierarchical bus architecture with very tight coupling between the different levels of the bus hierarchy. Data is moved in and out of each processor, and between processors and memory, on a clock edge. However, having a clock “control” the flow of data between processors, memories and peripherals on a multicore chip is a far more complex problem than with a chip using a single processing engine.
Insufficient Interconnect Bandwidth
For multicore systems, the data communications problems increase due to the need for load sharing—keeping all processor cores busy by feeding them data at the right times. If the bus system cannot handle the data flow requirements of the processing cores, data congestion or processor “starvation”—a processor idling while waiting for data—may result and processor efficiency suffers. For example, in a complex multicore system many processors may simultaneously attempt to initiate transactions to many destinations. Forcing all transaction traffic to travel one transaction per clock across the already heavily loaded bus can quickly create a bottleneck that can add large queuing delays to transaction delivery. This is particularly true for multicore systems that use multiple memory cache levels on a shared system bus. Standard bus protocols such as AMBA create arbitration inefficiencies, and inefficient arbitration can cause stalled cycles and keep a processor core idle, reducing performance and increasing power consumption. Furthermore, the latency of physically long bus lines worsens this situation, which is an inherent problem when clock-based buses are used to transfer data between cores.
Memory Access Issues
As process nodes shrink and processors become faster, there is an increasing disparity between memory and processor speed. Increasing the number of processor cores on a chip results in increased contention for memory bandwidth, and the common practice of sharing memory resources among multiple cores lowers the bandwidth available to each processor. This is particularly evident when
processors access off-chip DRAM, which is one of the drivers for faster DDR speeds (the industry is currently transitioning to DDR3—with clocking up to 1.6 GHz—and is already looking into a specification for a higher-performance DDR4 memory). The problem is that by increasing memory access speed through higher clock speeds, you are negating a big advantage of multicore architectures—lower clock speed, less power dissipation and less heat generation.
Core Limits
Using hierarchical bus architectures for multicore network communications is effective—up to a point. The bus concept is scalable to a relatively small number of processor cores, around four to eight. Using a bus communications structure for more than around this number of processors is very difficult due to shared resource management and bandwidth limitation issues. The degree of difficulty in providing adequate communications among the processors and shared resources increases rapidly as the number of processing cores increases (Figure 1).
Achieving Timing Closure
The need for a high-speed clock distribution network across a chip to support any clock-based data transfer between processors and other cores makes timing closure for chip designers very difficult due to the need for tight phase control for the hierarchical bus interconnect system. The chip designer not
only has to deal with timing closure issues for the various IP cores on the chip, but with the clocked hierarchical bus system that connects the cores as well. Interconnect timing closure can add significant time to the total chip hardware design effort.
table 1
Using a self-timed interconnect network instead of a clock-based synchronous bus provides many design and system performance benefits:
• System bandwidth — Synchronous bus: the slowest IP core on the bus is often the limiting factor in bus performance. Self-timed interconnect: data moves at wire speed on the interconnect between cores, not limited by a clock rate.
• Ability to add pipeline latches to increase throughput — Synchronous bus: possible, at the expense of all blocks having to cope with the faster interconnect. Self-timed interconnect: simple, since only the faster blocks need to operate at the higher speed.
• Power consumption — Synchronous bus: the clock and the cores it drives consume power even when idle. Self-timed interconnect: only consumes power when transferring data.
• System power management — Synchronous bus: difficult due to coupling between various IP cores. Self-timed interconnect: core non-interdependency allows maximum flexibility in power management of each core.
• Additional wiring cost — Synchronous bus: large, due to the global clock distribution network and the various buses, since they have to run from the IP cores to the processors that access those cores. Self-timed interconnect: lower, due to shorter local datapath and acknowledge wires.
• Flexibility to trade low-frequency parallel vs. high-frequency serial operation — Synchronous bus: difficult due to the fixed clock frequency within a synchronous time domain. Self-timed interconnect: automatic, since every communication is self-timed.
• Timing closure cost — Synchronous bus: much validation and many design iterations. Self-timed interconnect: much less validation and far fewer iterations.
• Radio frequency interference — Synchronous bus: high-amplitude, frequency-phased emissions. Self-timed interconnect: low amplitude, spread across the spectrum, and not coupled to the system clock frequency.
• Core scalability — Synchronous bus: limited to a relatively small number of cores, since system complexity becomes unmanageable for a large number of cores. Self-timed interconnect: the network topology model has proven effective in scaling to connect large numbers of processing systems.
Advantages of a Self-Timed Communications Network
Replacing a traditional bus interconnect system with a self-timed network provides several advantages for multicore processor chips, as shown in Table 1.
Power Management
A self-timed communications fabric allows each processor core to operate truly independently of the other processors on the chip—not coupled with master-clock-dependent bus lines. This gives the chip architect maximum flexibility in the power management of each processor and of the entire SoC. The system developer can scale the clock rates of different cores or even turn off ones that are not needed for a particular application, thus adding to overall system power efficiency. In addition, the behavior of the interconnect is independent of individual core (processor or other) power management. An additional benefit of a self-timed interconnect fabric between the various cores on the chip is power reduction. Unlike a clock-based bus, which dissipates power on every clock cycle, self-timed interconnect only dissipates dynamic power when data is moving between two cores. Since a global clock network is a major source of on-chip power dissipation, as much as 25% of total chip power, the power savings with self-timed interconnect can be significant. Eliminating the need for a global clock distribution network also eliminates the need for developing a means of maintaining low clock skew throughout the chip. Tight clock phase control to keep clock skew low (generally 5% or less of the
clock period) is needed for any clocked interconnect system, particularly one with a collection of coupled buses. A further benefit is that chip implementation is greatly simplified by eliminating the power-hungry clock distribution network. Less power dissipation also reduces concerns regarding electromigration problems and enhances chip reliability.
Scalability
The network topology model has been proven effective in scaling to connect large numbers of processing systems, as demonstrated by the scale of the Internet. This provides an existence proof that the development of chips with very large numbers of processor cores is possible only with a network topology, particularly with the switching completely independent of any of the processors.
Data Transfer and System Bandwidth
With a bus-based interconnect system, the slowest IP core on the bus is often the limiting factor in bus performance. This is the reason why many chips have their peripheral IP cores segregated onto a separate bus at a lower clock rate—to prevent slowing down the main processor bus(es). With self-timed interconnect between chip cores, data travels at wire speed between the endpoints of a communication channel and is not limited by a clock rate, since interconnect data transfer is not clock-dependent and is independent of processor operation. Interconnect latency problems due to long lines that depend on multi-cycle clocked transport of data are non-existent, since the self-timed interconnect lines don’t use a clock. Data is available to a processor faster, without regard to specific processor clocking schemes or processor clock rates. Thus, data is presented to and consumed by a processor at a rate dictated by that processor, not by interconnect characteristics such as latency and bandwidth. A self-timed communications channel still presents some latency to the system. In fact, in most situations a clock-based bus will result in a better “best-case” latency. However, the challenge in designing a bus-based system is
that it is extremely difficult to predict what the worst-case latency will be, since it is highly dependent on what the various processors that access the bus are doing at any point of time (in other words, data transfer characteristics are not deterministic). So, with a clock-based bus, we might have a faster best-case latency, but a longer and much less predictable worst-case latency. With the right design tools and component libraries, designers can take into consideration anticipated total worst-case traffic and size the components used in a self-timed NoC to deliver data to meet worst-case latency requirements, at the desired bandwidth. For example, Silistix uses CSL (Connection Specification Language) to describe the interconnect fabric between IP cores and the CHAINworks tool suite to provide a systematic way of describing the communications needs of the cores and automatically synthesizing a communications system that meets those aggregate requirements.
Memory Access
A well-designed, self-timed interconnect network can overcome many of the memory access problems associated with the hierarchical clock-based bus architectures that are predominant in processor-based systems. This is particularly true for accessing off-chip memory through an embedded DRAM controller. For example, Silistix’s tools synthesize self-timed interconnect networks from a high-level, architectural description with several attributes that help optimize DRAM operation. Among these attributes:
• The interconnect carries the identification of the requesting IP core, even if the protocol at the requesting end does not explicitly support such identification.
• The bandwidth of a synthesized self-timed communications network is generally much greater than any of its endpoints, such as the DRAM controller.
• The self-timed interconnect is completely endpoint “transparent”; any request from an initiator (for example, a processor core) is delivered unaltered.
• Adaptors (logic that services the needs of endpoints) have a request reordering capability for endpoints that lack it, thus increasing the efficiency of an endpoint such as a DRAM controller.
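The request-reordering idea can be illustrated with a small sketch: a hypothetical adaptor that groups pending requests by DRAM row so that row hits are serviced back to back. The data structures and policy are illustrative assumptions, not Silistix’s CSL or CHAINworks output.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical memory request: which initiator issued it and which DRAM
 * row it targets. A real adaptor would also track banks, ordering rules
 * and quality-of-service constraints. */
typedef struct {
    int initiator_id;
    unsigned row;
} mem_req_t;

/* Sort pending requests so that accesses to the same row are adjacent,
 * reducing row open/close overhead at the DRAM controller. */
static int by_row(const void *a, const void *b)
{
    const mem_req_t *ra = a, *rb = b;
    return (ra->row > rb->row) - (ra->row < rb->row);
}

int main(void)
{
    mem_req_t pending[] = {
        { .initiator_id = 0, .row = 7 },
        { .initiator_id = 1, .row = 3 },
        { .initiator_id = 2, .row = 7 },
        { .initiator_id = 0, .row = 3 },
    };
    size_t n = sizeof(pending) / sizeof(pending[0]);

    qsort(pending, n, sizeof(mem_req_t), by_row);

    for (size_t i = 0; i < n; i++)
        printf("issue: initiator %d, row %u\n",
               pending[i].initiator_id, pending[i].row);
    return 0;
}
```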
Timing Closure
Timing closure is easier to obtain, potentially saving months of design effort, and overall chip clocking (of cores) is simplified. Using self-timed interconnect, there are no signals with critical top-level timing closure requirements to worry about. Once the individual IP cores have achieved timing closure, the composite top-level chip should meet timing on the first pass.
Chip Implementation
With a self-timed interconnect, physical chip implementation is greatly simplified. Without the problems associated with a high-speed clock distribution network across a chip and clock-based data transfer between processors and other cores, the time needed for chip layout and verification is reduced. A self-timed communications fabric also provides maximum design flexibility for a multicore processing chip, since the chip developer can implement the optimum communications network topology for a particular design. This not only simplifies the design of the chip’s hardware architecture, but may also provide more flexibility for software developers in partitioning tasks between the various processing cores. Silistix, San Jose, CA. (408) 436-1656. [www.silistix.com].
consumer electronics dsp encoding for video
Optimizing Video Encoders with Digital Signal Processors
The range of features in advanced compression standards offers a large potential for trading off the options in order to balance complexity, delay and other real-time constraints.
by Ajit Rao, Texas Instruments
Video compression allows for digital video encoding, using as few bits as possible while maintaining acceptable visual quality. However, video compression involves sacrificing some degree of picture quality for a lower bit rate that facilitates transmission and storage. In addition, compression requires a high level of performance from the processor as well as versatility in design, since different types of video applications have different sets of requirements for resolution, bandwidth and resiliency. The extended flexibility provided by digital signal processors (DSP) addresses these differences and takes full advantage of the options offered by advanced video compression standards to help system developers optimize their products. The inherent structure and complexity of video encoding and decoding (codec) algorithms drives the optimization approach. Encoders are particularly important because they
must adapt to the application and they represent a major portion of the heavy processing load of video applications. While encoders are based on the mathematical principles of information theory, they may still require implementation trade-offs, which can be quite complex. Developers can benefit from encoders that are highly configurable and can provide an easy-to-use system interface and performance optimization for a wide range of video applications.
Video Compression Features
Raw digital video requires a lot of data to be transmitted or stored. An advanced video codec such as H.264/MPEG-4 AVC can achieve compression ratios of between 60:1 and 100:1 with sustained throughput. This makes it possible to squeeze video with a high data rate through a narrow transmission channel and store it in a limited space. Like JPEG for still images, ITU and MPEG
video encoding algorithms can employ a combination of discrete transform coding (DCT or similar), quantization and variable-length coding to compress macro-blocks within a frame (intra-frame). Once the algorithm has established a baseline intra-coded (I) frame, a number of subsequent predicted (P) frames are created by coding only the difference in visual content, or residual, between each of them. This inter-frame compression is achieved using a technique called motion compensation. The algorithm first estimates where the macro-blocks of an earlier reference frame have moved in the current frame, then subtracts and compresses the residual. Figure 1 shows the flow of a generic motion compensation-based video encoder. The motion estimation stage, which creates motion vector (MV) data that describe where each of the blocks has moved, is usually the most computation-intensive stage of the algorithm. Figure 2 shows a P frame (right) and its reference (left). Below the P frame, the residual (black) shows how little encoding remains once the motion vectors (blue) have been calculated. Video compression standards specify only the bit-stream syntax and the decoding process, allowing significant latitude for innovation within the encoders. Another area of opportunity to innovate is rate control, which allows the encoder to assign quantization parameters and therefore “shape” the noise in the video signal in appropriate ways. In addition, the advanced H.264/MPEG-4 AVC standard adds flexibility and functionality by providing multiple options for macro-block size, quarter-pel (pixel) resolution for motion compensation, multiple reference frames, bi-directional frame prediction (B frames) and adaptive in-loop de-blocking.
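Since motion estimation is described above as the most computation-intensive stage, a minimal sketch of its inner loop follows: an exhaustive sum-of-absolute-differences (SAD) search over a small window. The block size, search range and synthetic frame data are illustrative assumptions; production DSP encoders use heavily optimized and pruned searches.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK  16   /* macro-block size (assumed)             */
#define RANGE   8   /* +/- search window in pixels (assumed)  */

/* Sum of absolute differences between a macro-block in the current frame
 * and a candidate block in the reference frame. */
static int sad(const unsigned char *cur, const unsigned char *ref,
               int stride, int cx, int cy, int rx, int ry)
{
    int total = 0;
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            total += abs(cur[(cy + y) * stride + cx + x] -
                         ref[(ry + y) * stride + rx + x]);
    return total;
}

/* Exhaustive full search: returns the motion vector with the lowest SAD.
 * Assumes the search window stays inside the frame. */
static void full_search(const unsigned char *cur, const unsigned char *ref,
                        int stride, int cx, int cy, int *mvx, int *mvy)
{
    int best = INT_MAX;
    for (int dy = -RANGE; dy <= RANGE; dy++)
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            int cost = sad(cur, ref, stride, cx, cy, cx + dx, cy + dy);
            if (cost < best) { best = cost; *mvx = dx; *mvy = dy; }
        }
}

int main(void)
{
    enum { W = 64, H = 64 };
    static unsigned char cur[W * H], ref[W * H];

    /* Synthetic data: the reference frame is the current frame shifted by (3, 2). */
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            cur[y * W + x] = (unsigned char)((x * 7 + y * 13) & 0xFF);
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int sx = (x - 3 + W) % W, sy = (y - 2 + H) % H;
            ref[y * W + x] = cur[sy * W + sx];
        }

    int mvx = 0, mvy = 0;
    full_search(cur, ref, W, 24, 24, &mvx, &mvy);
    printf("best motion vector: (%d, %d)\n", mvx, mvy);
    return 0;
}
```

The nested SAD loops are exactly the sort of regular, multiply-free arithmetic that maps well onto DSP data paths and coprocessors.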
Varied Application Requirements
Video application requirements can vary enormously. The range of features in advanced
figure 1
Flow of a generic motion compensation-based video encoder (stages include RGB-to-YUV conversion, DCT, quantization, Huffman coding, inverse quantization/DCT, motion estimation and motion compensation against the previous reconstructed frame, with buffer fullness feeding back to Qp).
compression standards offers a large potential for trading off the options in order to balance complexity, delay and other real-time constraints. Consider, for instance, the different requirements for video phones, video conferencing and digital video recorders (DVRs).
Video Phones and Video Conferencing
With video phone and video conferencing applications, transmission bandwidth is typically the most important issue. This can range from tens of kilobits per second up to multi-megabits per second depending on the link. In some cases, the bit rate is guaranteed, but with the Internet and many intranets, bit rates are highly variable. As a result, video conferencing encoders frequently need to address the delivery requirements of different types of links and adapt in real time to changing bandwidth availability. When the transmitting system is notified of reception conditions, it should be able to adjust its encoded output continually so that the best possible video is delivered with minimal interruption.
figure 2
Coding the difference between frame 1 and frame 2.
figure 3
Progressive strips of P frames can be intra-coded (I strips), which eliminates the need for complete I frames (after the initial frame).
When delivery is poor, the encoder may respond by reducing its average bit rate, skipping frames or changing the group of pictures (GoP)—the mix of I and P frames. I frames are not as heavily compressed as P frames, so a GoP with fewer I frames requires less bandwidth overall. Since the visual content of a video conference does not change frequently, it is usually acceptable to send fewer I frames than would be needed for entertainment applications. H.264 uses an adaptive in-loop de-blocking filter that operates on the block edges to smooth the video for current and future frames, resulting in improved quality of the encoded video, especially at low bit rates. Alternatively, turning off the filter can increase the amount of visual data at a given bit rate, as can changing the motion estimation resolution from quarter-pel to half-pel or more. In some cases, it may be necessary to sacrifice the higher quality of de-blocking and fine resolution in order to reduce the complexity of encoding. Since packet delivery via the Internet is not
guaranteed, video conferencing often benefits from encoding mechanisms that increase error resilience. As Figure 3 illustrates, progressive strips of P frames can be intra-coded (I strips), which eliminates the need for complete I frames (after the initial frame) and reduces the risk that an entire I frame will be dropped and the picture broken up.
Digital Video Recording
Digital video recorders (DVRs) for home entertainment are perhaps the most widely used application for real-time video encoders. For these systems, achieving the best trade-off of storage with picture quality is a significant problem. Unlike video conferencing, which is not delay-tolerant, compression for video recording can take place with some real-time delay if sufficient memory is available in the system for buffering. Realistic design considerations mean the output buffer is designed to handle several frames, which is sufficient to keep a steady flow of data to the disk. Under certain conditions, however, the buffer may become congested because the visual information is changing quickly and the algorithm is creating a large amount of P frame data; once the congestion has been taken care of, the quality can be increased again. An effective mechanism for performing this trade-off is to change the quantization parameter, Qp, on the fly. Quantization is one of the last steps in the algorithm for compressing
data. Increasing Qp reduces the bit rate output of the algorithm, but it creates picture distortion in direct proportion to the square of Qp and therefore sacrifices picture quality. However, since the change occurs in real time, it reduces the likelihood of frame skips or picture break-up. When the visual content is changing rapidly, as it is when the buffer is congested, lower image quality is likely to be less noticeable than it is when the content changes slowly. When the visual content returns to a lower bit rate and the buffer clears, Qp can be reset to its normal value.
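A minimal sketch of the kind of buffer-driven Qp adjustment described above follows; the thresholds, step sizes and Qp range are illustrative assumptions, not values from any particular TI encoder.

```c
#include <stdio.h>

/* Toy rate control: raise Qp when the output buffer fills up (coarser
 * quantization, fewer bits) and lower it again as the buffer drains.
 * All constants are illustrative. */
#define QP_MIN        20
#define QP_MAX        44
#define HIGH_WATER    0.80   /* buffer fullness that triggers a coarser Qp */
#define LOW_WATER     0.40   /* buffer fullness that allows a finer Qp     */

static int adjust_qp(int qp, double buffer_fullness)
{
    if (buffer_fullness > HIGH_WATER && qp < QP_MAX)
        qp += 2;                       /* congested: trade quality for bits */
    else if (buffer_fullness < LOW_WATER && qp > QP_MIN)
        qp -= 1;                       /* draining: restore quality slowly  */
    return qp;
}

int main(void)
{
    /* Simulated per-frame buffer fullness during a burst of fast motion. */
    double fullness[] = { 0.30, 0.55, 0.82, 0.90, 0.85, 0.60, 0.35, 0.30 };
    int qp = 26;

    for (int i = 0; i < 8; i++) {
        qp = adjust_qp(qp, fullness[i]);
        printf("frame %d: buffer %.0f%%, Qp = %d\n", i, fullness[i] * 100, qp);
    }
    return 0;
}
```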
Flexibility with Encoders
Since developers utilize DSPs for a wide range of video applications, DSP encoders should be designed to take advantage of the flexibility inherent in compression standards. An example can be found in the encoders that operate on Texas Instruments’ OMAP media processors for mobile applications, TMS320C64x+ DSPs or processors based on DaVinci. To maximize compression performance, each of the encoders is designed to leverage the DSP architecture of its platform, including the video and imaging coprocessor (VICP) that is designed into some of the processors. A basic set of APIs with default parameters is used in all the encoders, so that the system interface remains the same, regardless of the type of system. Extended API parameters adapt an encoder to the requirements of specific applications. By default, parameters are preset to high-quality settings; a high-speed preset is also available. All preset parameters can be overridden by the program using extended parameters. Extended parameters adapt the application to either H.264 or MPEG-4. The encoders support several options including YUV 4:2:2 and YUV 4:2:0 input formats, motion resolution down to a quarter-pel, I frame intervals ranging from every frame to none after the first frame,
Qp bit rate control, access to motion vectors, de-blocking filter control, simultaneous encoding of two or more channels, I strips and other options. The encoders dynamically and unrestrictedly determine the search range for motion vectors by default, a technique that improves on fixed-range searches. Furthermore, there are generally “sweet spots” of operation, where output bit rates operate optimally for a given input resolution and frames per second (fps). Developers should be aware of the sweet spots with encoders in order to design their systems for the best trade-offs of transmission and picture quality. As digital video continues to spread to new types of systems, developers need to be aware of the many differences that exist among the wide range of video applications. In general, compression requirements trade off bit rate and picture quality, though the many different ways of achieving these trade-offs can be complicated. DSP encoders provide systems engineers with the performance and flexibility to adapt video compression to the requirements in the rapidly expanding world of digital video. Texas Instruments Inc., Dallas, TX. (800) 336-5236. [www.ti.com].
portable power power gating
Architectural Issues for Power Gating
This article discusses some of the architectural issues involved in implementing power-gating designs. In particular, it addresses the issues of partitioning, hierarchy and multiple power-gated domains.
by Michael Keating, Synopsys (Synopsys Fellow); David Flynn, ARM (ARM Fellow); Robert Aitken, ARM (ARM Fellow); Alan Gibbons, Synopsys (Principal Engineer); and Kaijian Shi, Synopsys (Principal Consultant)
A scalable approach to chip architecture is valuable since a system-on-chip design today often becomes a component in an even larger chip in a subsequent product generation.
Hierarchy and Power Gating
To support this portability, module boundaries must be enforced at the power domain level. That is, a given module should belong to a single power domain rather than being split across several domains. Some tools and flows support assigning individual RTL processes to power domains, but this leads to much more complicated implementation and analysis. Clean visibility of the boundaries of a power-gated block is key to a clean, top-down implementation and verification flow. Although one can, in theory, nest power-gated modules arbitrarily within power-gated subsystems, which are in turn nested on a shared switched power rail, there are considerable benefits in not creating multiple levels of power-switching fabric. Power gating is intrusive, adding some voltage drop and performance degradation, and cascading multiple voltage drops can lead to unacceptable increases in delay. Even if the design is represented as hierarchical at the architectural level, the implementation is improved if this is mapped onto a single
level of power gating at implementation. Consider the example shown in Figure 1. The CPU conceptually has all the core logic power gated, and within it a number of functional units that can each be powered down independently—a Multiply-Accumulate and a Vector Floating Point unit in this case. The modes of operation in Figure 1 are shown in table form in Table 1. From an implementation standpoint the switching fabric is flattened as shown in Figure 2. There is never a case when the MAC or VFP functional units are switched on without the CPU core also being powered. So the switch control semantics are adjusted to AND the control terms rather than cascade the switch elements. The power mode table now includes explicit control of the nested power-gated functional units (Table 2).
table 1
Power State                            Cache   CPU    MAC    VFP
Shutdown (Cache cleaned, VDDCPU off)   (OFF)   (OFF)  -      -
Deep Sleep (Cache preserved)           ON      OFF    -      -
Normal Operation                       ON      ON     OFF    OFF
DSP workload                           ON      ON     ON     OFF
Graphics workload                      ON      ON     OFF    ON
Intensive multimedia mode              ON      ON     ON     ON
(A dash indicates a unit that is implicitly off because its parent CPU domain is off.)
Recommendations:
• Map power-gated regions to explicit module boundaries.
• When partitioning a hierarchical power-gating design, ensure that the power-gating control terms can be mapped back to a flat switching fabric.
Pitfalls:
• Avoid control signals passing through power-gated or power-down regions to other power regions that are not hierarchically switched with the first region.
• Avoid excessively fine power-gating granularity unless absolutely required for aggressive leakage power management. Every interface adds implementation and verification challenges and complicates the system-level production test challenges.
• Avoid a power-gating system of more than one or two levels.
Power Networks and Their Control
In the design of a processor-based SoC the CPU system may well introduce a number of power networks:
• An independent power rail to the entire cached CPU subsystem—this allows the CPU to be completely turned off for long-term "sleep" modes of operation.
• A power-gated supply to the CPU logic to support short-term leakage savings modes where the cache memory can be left retained but all the leaky standard cell logic turned off locally.
• Optionally, some form of always-on retention power supply from the non-power-gated rail. This is needed to support state-retention registers in the standard cell portion of the design.
• An always-on supply to provide power to the isolation cells.
• A non-power-gated supply for the power-gating controller and for the buffers on all the power control signals: the power switch controls, the retention controls and the isolation controls.
table 2
Power State                            Cache   CPU    MAC    VFP
Shutdown (Cache cleaned, VDDCPU off)   (OFF)   (OFF)  (OFF)  (OFF)
Deep Sleep (Cache preserved)           ON      OFF    OFF    OFF
Normal Operation                       ON      ON     OFF    OFF
DSP workload                           ON      ON     ON     OFF
Graphics workload                      ON      ON     OFF    ON
Intensive multimedia mode              ON      ON     ON     ON
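A minimal C sketch of the flattened switch-enable logic just described follows; it is an illustration of the idea, not code from the article. The nested MAC and VFP enables are ANDed with the CPU enable, so the request combinations reduce to the legal states of Table 2.

/* Sketch: flattened switch enables for the Figure 2 fabric. ANDing with the
 * CPU enable guarantees MAC/VFP are never powered while the CPU core is off. */
#include <stdbool.h>

typedef struct {
    bool cpu_on;   /* request: power-gated CPU core */
    bool mac_on;   /* request: MAC functional unit  */
    bool vfp_on;   /* request: VFP functional unit  */
} power_req_t;

typedef struct {
    bool sw_cpu;   /* enable for the CPU switch fabric */
    bool sw_mac;   /* enable for the MAC switch fabric */
    bool sw_vfp;   /* enable for the VFP switch fabric */
} switch_en_t;

switch_en_t flatten_switch_enables(power_req_t req)
{
    switch_en_t en;
    en.sw_cpu = req.cpu_on;
    en.sw_mac = req.cpu_on && req.mac_on;   /* AND rather than cascade */
    en.sw_vfp = req.cpu_on && req.vfp_on;
    return en;
}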
figure 1: Power Gating Example. (Diagram: a rail-switched CPU subsystem with a power-gated CPU core containing the integer core and independently power-gated MAC and VFP units, plus a cache memory subsystem.)
• An SoC-level always-on supply to control the external rail switching handshake with the power supply.
Figure 3 illustrates the power networks with independent "VDDCPU" and always-on "VDDSOC" with a common VSS ground connection; in this example, the power-gated standard cell area has a non-gated state retention supply shown to indicate an active supply rail within a power-gated region.
External Power Rail Switching
External power rail switching (Figure 4) offers the best long-term leakage power savings—but introduces a significant turn-on delay to allow voltage regulation to stabilize and settle within specification.
figure 2: Flattened Switching Network. (Diagram: the same rail-switched CPU subsystem as Figure 1, with the CPU, MAC and VFP switch enables implemented as a single, flattened level of power-switching fabric.)
figure 3: Power Network Control. (Diagram: independent VDDCPU and always-on VDDSOC rails sharing a common VSS; a power-gating controller drives the power-switching fabric, state-retention control and isolation (ISO) cells for the power-gated CPU standard-cell region, while the cache memories are not power gated.)
Only a few voltage rails can typically be externally switched; every power supply incurs (external) regulator cost and area on the circuit board—including inductors and capacitors required to implement switched-mode power supplies. Every power rail also requires on-chip power distribution that costs area and complicates the power planning and physical floor-planning. Most SoCs already have at least three power rails:
• I/O power (at least one of 1.8/2.5/3.3V, and perhaps several depending on the application).
• "Always-on" SoC core rail (technology-dependent logic and internal memory power rail).
• Clean analog power supply rail to PLLs.
• An optional "keep-alive" voltage supply to the real-time clock.
Adding more than two or three external switched power rails adds significant complexity and cost to the end product. Typically a shared ground/VSS connection approach to the chip and board works best for external power rail switching. Although there are typically independent VSS pins for both the I/O pad-ring and the chip core to decouple output simultaneous switching activity from the logic and memory, these are typically grounded on the circuit board into a shared "0 volt" ground plane. Treating any other power supplies as switched positive supplies relative to the common ground minimizes complexities when adding power gating. External power rail switching incurs significant delays on wake-up events—from the order of tens of microseconds to milliseconds or even longer. Much faster supply switching times are not necessarily desirable: the in-rush currents to re-charge all the capacitive nodes in the powered-down subsystem result in noise injection into other (powered) regions of the chip. The resulting "ground bounce" in a shared ground system can introduce problems that are hard to quantify until very late in the implementation and analysis phases of the design flow. Translating such latencies into clock cycles at the RTL level is not simple. Normally the clocks should be suppressed until a switched power rail is stable and within specified tolerance. For a design operating in the hundreds of MHz region, this may be the equivalent of tens of thousands of clock cycles. The actual delays are highly dependent on the power supply technology (which may have to be multi-sourced in production).
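As a rough sanity check on those latencies, the following sketch converts an assumed rail-settling time into clock cycles; the settling time and clock frequency are illustrative assumptions, not figures from the article.

/* Sketch: back-of-envelope wake-up budget in clock cycles. */
#include <stdio.h>
#include <stdint.h>

static uint64_t wakeup_cycles(double settle_us, double clk_mhz)
{
    return (uint64_t)(settle_us * clk_mhz + 0.5);   /* us * MHz = cycles */
}

int main(void)
{
    /* Assumed 100 us external-rail settling time at a 300 MHz core clock. */
    printf("%llu cycles\n",
           (unsigned long long)wakeup_cycles(100.0, 300.0));  /* prints 30000 */
    return 0;
}

At 30,000 cycles for a fairly optimistic settling time, the "tens of thousands of clock cycles" figure above is easy to reach; millisecond-class regulator settling pushes the budget into the hundreds of thousands.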
Separate power rails become a necessity when one introduces dynamic voltage scaling. It may also be highly desirable to give large banks of memory their own supply, which may be switched to intermediate RAM retention operating conditions, for example.
Recommendations:
• Minimize the number of external switched independent power rails—each one must be justified from an end-product requirement given the associated additional power supply real-estate costs and on-chip power distribution.
• With external switched rails, it is best to switch (positive) supply rails and retain a common ground.
• In systems implementing voltage scaling, an independent rail must be provided for each voltage-scaled region.
Pitfalls:
• Design for significant external power rail switching times: tens or hundreds of thousands of clock cycle latencies must be factored into wake-up and will be dependent on the external PSU specifications.
• Although multiple rails appear elegant from a system design perspective, they introduce verification and deployment challenges in production. Independent supply rails have independent voltage control regulators, and independent rails can exhibit vastly different load regulation characteristics when active, wait-stated or halted compared to logic powered at interfaces.
On-Chip Power Gating
On-chip power gating is much faster than off-chip power rail gating. And the smaller the power-gated region, the faster power can be gated on and off. The current required to power up a small power-gated region is much less than that required for a large block. But time must be budgeted to manage the minimization of power-gating transients and noise injection as seen by other logic and memory.
figure 4: External Power Rail Switching. (Diagram: the SoC's "always-on" VDDSOC rail is fed by one PSU regulator; a second PSU regulator with an enable input, driven from the SoC's PSU control interface, supplies the externally switched VDD rail of the rail-switched subsystem; all share the VSS ground rail.)
Therefore it is realistic to see power gating in terms of a few clock cycles for very small regions and tens or even hundreds of clock cycles for more significant gate counts. Turning on a number of small power-gated regions at the same time is no better than powering up a large block and may lead to a much more complex power controller. Power gating has an impact on both performance and area, due to the nature of the switching transistor fabric. These limitations will impact system architecture and design objectives.
Recommendations:
• Design for technology-dependent power-gating times: tens or hundreds of clock cycle latencies may need to be factored into wake-up times dependent on the area switched and the switching fabric control characteristics.
figure 5: Buffering Inter-Domain Signals. (Diagram: a signal launched from a retention flop in Domain A passes through isolation cells and a buffer in Domain B before being captured by a retention flop in Domain C.)
• Design for “wait-states” across boundaries where there are dynamically power-gated functional units such that the implementation-dependent delay times can be safely managed and latency constraints set.
Pitfalls:
• Every power-gated rail introduces verification and test challenges, so the number of power-gated regions needs to be carefully justified and factored into project timescales.
Power State Tables and Always-On Regions
When dealing with multiple power-gated power domains, power routing can become complex. In particular, the concept of "always on" becomes less clear. Figure 5 shows three
power domains, each of which is power gated. If power domain B is always on, then there is no problem. But if domain B is turned off while domains A and C are powered up, then there is a problem: the outputs from A to C are corrupted because the buffer in B is powered down. In this case, we would have to route power from some other “always on” supply to the buffer in B. We could use either the isolation supply in A (since it stays on even when A is powered down) or the supply from C. On the other hand, if we know that whenever B is powered down, then C is also powered
down, we do not have to provide a special supply to B. In this case, we consider B to be "relatively always on"—that is, always on relative to domain C. Thus, we can end up with some fairly complicated power routing rules depending on the power-gating relationships among different blocks. UPF provides a succinct way for system architects to communicate these power-gating dependency rules to the implementation tools. The create_pst and add_pst_state commands allow us to create a power state table that can be used to specify the relationships between different power supply nets.
This article originally appeared as Chapter 6 in Low Power Methodology Manual for System-on-Chip Design (New York: Springer, 2007). Copyright © 2007 by Synopsys, Inc. & ARM Limited. Reprinted by permission of the authors. The Low Power Methodology Manual (LPMM) is a comprehensive and practical guide to managing power in system-on-chip designs, critical to designers using 90-nanometer and below technology. This book is a must-read for anyone designing, or getting ready to design, SoCs for low-power applications.
Synopsys, Inc., Mountain View, CA. (650) 584-5000. [www.synopsys.com]. ARM Inc., Sunnyvale, CA. (408) 734-5600. [www.arm.com].
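The dependency rules for the Figure 5 example can also be modeled in software for early sanity checks. The sketch below is an illustration only: the state names and the particular set of legal states are assumptions, and a real flow would express the same table in UPF with create_pst and add_pst_state rather than in C.

/* Sketch: a power state table as an explicit list of legal domain-state
 * combinations. Here domain B may be off only when domain C is also off,
 * so "A and C on, B off" (the corrupting case in the text) is rejected. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { DOM_A, DOM_B, DOM_C, NUM_DOMAINS };

typedef struct {
    const char *name;
    bool on[NUM_DOMAINS];                 /* true = domain powered */
} power_state_t;

static const power_state_t pst[] = {      /* assumed legal states */
    { "ALL_ON",  { true,  true,  true  } },
    { "A_B_ON",  { true,  true,  false } },
    { "A_ONLY",  { true,  false, false } },
    { "ALL_OFF", { false, false, false } },
};

static bool state_is_legal(const bool requested[NUM_DOMAINS])
{
    for (size_t i = 0; i < sizeof(pst) / sizeof(pst[0]); i++)
        if (memcmp(pst[i].on, requested, sizeof(pst[i].on)) == 0)
            return true;
    return false;
}

int main(void)
{
    bool bad[NUM_DOMAINS] = { true, false, true };  /* A and C on, B off */
    printf("A on, B off, C on legal? %s\n", state_is_legal(bad) ? "yes" : "no");
    return 0;
}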
...Your Vehicle to Success in the Fast Paced Telecom Market
www.mountainviewalliance.org
The Mountain View Alliance puts you in the driver's seat! Now you can become part of this fast-paced 2-day conference designed to accelerate the adoption of open-specification-based components and platforms. Take your place in the design and evolution of commercial off-the-shelf implementations in the telecommunication and wireless infrastructure. Set your plans to include a diverse and effective conference that brings companies and individuals together from the entire COTS ecosystem.
Mountain View Alliance Communications Ecosystem Conference March 11 and 12, 2008 South San Francisco Conference Center
www.mvacec.com
products for designers Industry’s First High-Speed, Continuous-Time Sigma-Delta ADC National Semiconductor Corporation has announced the availability of the industry’s first high-speed, continuous-time sigma-delta (CTSD) analog-to-digital converter (ADC). The ADC12EU050, a member of National’s PowerWise energy-efficient family, is an 8-channel, 12-bit, 50 mega-samples per second (MSPS) ADC that offers an alias-free sample bandwidth up to 25 MHz, while consuming 30 percent less power (350 mW) than competing pipeline devices. CTSD technology has been the subject of university and industry research for more than 15 years. With the introduction of the ADC12EU050, National is the first manufacturer to successfully transition CTSD technology from the research lab to the production line. National plans to expand its CTSD ADC product offering for imaging, communications, and test and measurement applications that require high dynamic performance at extremely low power. The continuous-time architecture greatly simplifies system design as it allows the integration of other signal-path functions, such as signal conditioning, while incorporating anti-aliasing filtering into the ADC. The ADC12EU050 12-bit, ultra-low-power, octal CTSD ADC offers an alias-free sample bandwidth up to 25 MHz and a conversion rate of 40 MSPS to 50 MSPS. The device features 68 dB of signal-to-noise and distortion (SINAD) and a signal-to-noise ratio (SNR) of 70 decibels full scale (dBFS). Operating from a 1.2V supply, it consumes 44 mW per channel at 50 MSPS for a total power consumption of only 350 mW. The ADC12EU050 reduces interconnection complexity by using programmable serialized outputs, which offer industry-standard low-voltage differential signaling (LVDS) and scalable low-voltage signaling (SLVS) modes. The ADC12EU050 operates over the -40° to 85°C temperature range and is supplied in a 10 mm by 10 mm, 68-pin LLP package. Samples of the ADC12EU050 are available now, with high-volume quantities scheduled to be available in third quarter 2008. The ADC12EU050 is priced at $64 in 1,000-unit quantities. National Semiconductor Corporation, Santa Clara, CA. (408) 721-5000. [www.nsc.com].
Highly Accurate Li-Ion Battery Charger ICs While Lithium-ion (Li-ion) batteries offer many advantages for portable consumer electronics, they require extremely accurate charging current and output voltages to optimize battery life and performance. To address this need, Freescale Semiconductor has introduced a family of Li-ion battery charger ICs designed to provide the industry's highest performance and accuracy, as well as exceptional configuration flexibility. Freescale's MC34671, MC34673 and MC34674 single-input autonomous battery charger ICs offer output voltage accuracy of +/-0.4 percent over temperature and charging current accuracy of +/-5 percent over temperature. The ICs can be customized to create hundreds of configurations to address a wide range of portable and ultra-mobile device needs. A designer can select features and specifications, such as pin-out, feature set, charging parameters and LED indication, and Freescale can deliver customized charger ICs by programming them at the end of the manufacturing process. Built using Freescale's advanced SMARTMOS process technology, the battery charger ICs are designed to deliver up to 1.2A of charge current to single-cell Li-ion or Li-polymer batteries. The battery charger input voltage can come from an AC adapter or a USB port power source. The high input voltage capability (up to 28V) eliminates the need for an external input over-voltage protection circuit required in handheld devices, which helps reduce system cost and board space. Available in a low-profile 2 x 3 x 0.65 mm ultra-thin dual flat no-lead (UDFN) thermally enhanced package, samples of the battery charger ICs are available now with customer-selected specifications. The suggested resale price in 10,000-piece quantities starts at 52 cents. Freescale Semiconductor, Austin, TX. (800) 521-6274. [www.freescale.com].
Integrated AM/FM Radio Receivers with Weather Band Coverage Silicon Laboratories Inc. has announced the industry's first fully integrated AM/FM radio receivers with weather band (WB) coverage. In addition to receiving AM/FM bands, the Si4736 family covers the frequencies between 162.40 and 162.55 MHz used to broadcast continuous, 24-hour weather warnings, watches, forecasts and other hazard information, as well as national security and public safety alerts throughout the United States and Canada. The Si4736 receiver family reduces component count of existing solutions by more than 90 percent, providing the industry's smallest and first fully integrated AM/FM/WB radio in a tiny 3- x 3-mm package, enabling equipment makers to cost-effectively replace existing solutions as well as add WB functionality in a number of portable devices such as cell phones, navigation devices and media and MP3 players for the first time. The Si4736 family is the first to integrate AM/FM with weather band from antenna input to audio output into a single IC, including 1050 Hz alert tone detection, which enables the radio to automatically turn on the receiver to play incoming warning alerts. Silicon Labs' patented digital low-IF architecture enables industry-leading integration, offering feature and performance enhancements over traditional analog-based solutions. By applying this advanced and proven architecture, Silicon Labs' AM/FM/WB solution provides better audio in strong signal environments and excellent sensitivity in weak signal environments. In addition, the Si4737 and Si4739 family members integrate FM RDS/RBDS decoding to provide station and song identification to listeners, further improving the consumer experience. The Si4736 family is footprint-compatible with Silicon Labs' existing radio portfolio, offering customers a cost-effective solution to add WB functionality to their products. A full reference board with complete schematics, layout files and a robust software development environment is available to customers to facilitate evaluation and design. Devices supporting Specific Area Message Encoding (SAME), a standard that incorporates digital data into the weather broadcasts for localized hazard alerts, will be available later in the year. Samples of the Si4736/37/38/39 are available now in a compact 3- x 3-mm, 20-pin quad flat no-lead (QFN) package. Pricing for the Si4736/37/38/39 family starts at $3.06 in quantities of 10K. An evaluation board is available for $150. Silicon Laboratories, Austin, TX. (512) 416-8500. [www.silabs.com].
Flash Data Memory in 8-Bit PIC Baseline MCUs
Microchip Technology Inc. has announced the first two Baseline 8-bit Flash PIC microcontrollers with non-volatile Flash Data Memory (FDM) in 8- and 14-pin packages. With the availability of 64 bytes of data memory—combined with an 8 MHz internal oscillator, a Device Reset Timer (DRT), up to three channels of 8-bit Analog-to-Digital Conversion (ADC) and up to two comparators—engineers now have a diverse feature set to select from when integrating digital intelligence in low-cost applications. Through the integration of FDM and analog functionality onto its Baseline 8- and 14-pin microcontrollers, Microchip's PIC12F519 (8-pin) and PIC16F526 (14-pin) provide a cost-effective option for many design engineers to store configuration and calibration data, counters or small look-up tables without external non-volatile memory. Additionally, designers can replace discrete analog components with the digital intelligence and reprogrammability of a microcontroller. The new Baseline PIC microcontrollers form the basic building blocks for a wide variety of systems across a broad range of markets. Their integrated memory and analog functionality makes these microcontrollers well suited for applications such as battery-operated products, power supplies and identification tags. Additional example applications for the Baseline PIC microcontroller family include: Consumer Electronics (standby power control, power-sequencing infrared receivers, handheld products, device authentication); Mechatronics (includes smart switches, mode selectors, remote I/Os, electric pumps, compressors); Home Automation (light switching and dimming, thermostats, security systems); and Small Home Appliances (blenders, toasters, coffee machines and electric toothbrushes). Both devices are available today for general sampling and volume production, supported by Microchip's MPLAB development tools, including the PICkit 2 series of starter kits. In 10,000-unit quantities, the PIC12F519 is $0.56 each, and the PIC16F526 is $0.64 each. Microchip Technology Inc., Chandler, AZ. (480) 792-7200. [www.microchip.com].
Mid-Range RealTime Spectrum Analyzers Show Live RF Tektronix, Inc. has announced the addition of DPX waveform image processor technology to the mid-range RSA3000B Series Real-Time Spectrum Analyzers. This provides a unique live RF view of the spectrum using the RSA3300B and RSA3408B family models, enabling an unprecedented RF signal discovery capability for a broad range of digital RF applications including RFID, radio communications and spectrum management. DPX transforms volumes of real-time data and produces a live RF spectrum display that reveals previously unseen RF signals and signal anomalies. With a spectrum processing rate hundreds of times greater than any spectrum analyzer from other vendors, the RSA3300B series and RSA3408B provide 100% probability of intercept for transients as brief as 31 microseconds on the RSA3408B and 41 microseconds on the RSA3300B Series models. Combined with the exclusive ability to trigger on transient signals in both time and frequency domains, the RSA3300B Series and RSA3408B offer unmatched troubleshooting and debug of digital RF designs. The RSA3300B series is available with either DC-3 GHz or DC-8 GHz frequency coverage. With 15 MHz capture bandwidth and 70 dB Spurious Free Dynamic Range (SFDR), the RSA3300Bs are ideal for use in the design and debug of 3G mobile systems, Near-Field systems (such as RFID and Bluetooth) and narrow to medium bandwidth communications systems. The RSA3408B with DC-8 GHz frequency coverage, 36 MHz capture bandwidth and 73 dB SFDR is tailored for higher bandwidth and dynamic range applications including 3G mobile components and system debug, WLAN and WiMax system design, demanding spectrum management applications and general-purpose digital RF debug. Prices for the RSA3303B with DPX begin at $32,900, U.S. MSRP. RSA3300B and RSA3408B software options are available for 3G, WiMAX, WLAN, RFID, Signal Source and General-Purpose modulation and RF analysis. Tektronix, Inc., Beaverton, OR. (503) 627-4027. [www.tektronix.com].
Low-Power Flash-Based FPGAs for Portables Actel Corporation has introduced the ProASIC3L family of field-programmable gate arrays (FPGAs) for designers of high-performance, power-conscious systems. Dynamic power is critical in applications where clocks are constantly switching and providing input to an FPGA, such as high-speed data pipelines for portable video and medical appliances. Like the company’s 5-microwatt (µW) IGLOO FPGA family, the ProASIC3L devices support a 1.2V core voltage and Actel’s Flash*Freeze technology. Flash*Freeze enables designers to quickly switch the device from dynamic operation to static without switching off clocks or power supplies. In a typical high-speed design using comparable one-million gate FPGAs, SRAM-based competitive solutions consume 60 percent higher dynamic power and 100 times more static power than the ProASIC3L devices, which consume just 100 mA of dynamic power and 1 mW of static power. Based on Actel’s ProASIC3 architecture, the ProASIC3L family is comprised of four family members ranging from 250,000 to three million gates: the A3P250L, A3P600L, A3P1000L and A3PE3000L. Offered in both commercial and industrial temperature grades, the devices feature embedded SRAM memory, high I/O counts, phase-locked loops (PLLs) and nonvolatile memory. Free of the license and royalty fees typically associated with industry-leading processor cores, Actel is initially offering the 32-bit ARM Cortex-M1 processor for use in its 600,000-gate ProASIC3L device, the M1A3P600L. Operating at up to 70 MHz and consuming 32 percent of available chip real estate, the highly configurable processor provides a good balance between size and speed, while offering space for customization. The ProASIC3L family is sampling now. Pricing starts at $3.95 in volume. The M1A3P600L will also be available in Q1 2008 with the three remaining M1-enabled family members slated for Q2 and Q3 2008. Version 8.2 of the Actel Libero IDE will be available in Jan 2008. The two ProASIC3L starter kits will be available in Q1 2008. Actel Corporation, Mountain View, CA. (650) 318-4200. [www.actel.com].
Industry’s Smallest HiSpeed USB 2.0 Transceiver SMSC announced its latest advance in Hi-Speed USB 2.0 connectivity, the USB332x transceiver family. These new Hi-Speed USB transceivers from SMSC set new standards for integration and small size, helping designers meet the tight board space and cost requirements of portable products. SMSC’s new USB332x products supplant existing Hi-Speed USB transceiver technology for the portable market by integrating previously required separate external components into one extremely small package. The USB332x is designed in a wafer-level chip scale package (WLCSP) measuring 1.95 mm x 1.95 mm to save 75% in board space over prior SMSC technology. It is also 50% thinner than prior SMSC technology to meet the need for thinner product profiles. “Integration is the watchword for all of SMSC’s connectivity products, but it is especially important for products designed for the portable electronics market,” said Mark Bode, vice president of connectivity marketing at SMSC. “With the USB332x family, we’ve rolled several discrete functions into a single package, reducing design complexity, design time and overall bill of materials (BOM) cost. But most important for portable electronics, we’ve delivered this level of integration in an incredibly small footprint, which saves valuable board space for our customers.” Portable applications now require the USB port to act as a single point of connectivity for applications like high-speed data transfer, stereo audio delivery and battery charging. With SMSC’s USB332x product family, designers get the features they require and save board space in multiple ways. Since many portable products use USB connectivity for battery charging, high levels of over-voltage protection are required. Every member of the USB332x family features VBus protection up to +30V. In addition, since the USB connector is exposed to the outside world, robust ESD protection is mandatory. The USB332x family integrates ESD protection circuitry tested up to ±15KV ESD (based on IEC air discharge standard) without the need for external components. Additionally, the USB332x family features an integrated analog USB switch, which designers can use for multiple purposes. Samples of the USB332x family are available now.
High-Performance uModule Receiver Subsystem Bridges the Gap Between RF & Digital Worlds Linear Technology introduces the LTM9001, the first in a series of System in a Package (SiP) signal chain receiver modules, leveraging Linear's uModule packaging technology. This new family of integrated receiver subsystems is intended to bridge the expertise gap between the RF world and digital domain to provide ease of use and shortened time-to-market. The LTM9001 is a semi-customizable IF/baseband receiver subsystem that includes a high-performance 16-bit Analog-to-Digital converter (ADC) sampling up to 160 Msps, an anti-aliasing filter and fixed gain differential ADC driver. The LTM9001 harnesses years of applications design experience to offer integration, ease of use with guaranteed high performance to enhance system performance in many communications and instrumentation applications. The beauty of the LTM9001 lies in its semi-customization (customization requires minimum order quantity). Using pin-compatible product families, the LTM9001 can be configured for various sampling rates and the differential ADC driver can be substituted for fixed gain versions ranging from 8 dB up to 26 dB. The anti-aliasing filters can also be configured as Low-Pass or Bandpass filter versions, accepting input frequencies as high as 300 MHz. The LTM9001 is available in production volumes today in an 11.25 mm x 11.25 mm LGA package and is priced at $82 each in 1,000 piece quantities. Demonstration circuits and samples are available at www.linear.com/LTM9001. Linear Technology, Milpitas, CA. (408) 432-1900. [www.linear.com].
SMSC, Hauppauge, NY. (631) 435-6000. [www.smsc.com].
Integrated CCFL Backlight Inverter Drive ICs for LCD Backlight Systems Fairchild Semiconductor has introduced the industry’s most integrated CCFL backlight inverter drive ICs providing space and cost savings, superior system reliability and simplified design for N-N half-bridge, push-pull and P-N full-bridge LCD backlight inverter designs. The FAN7316 and FAN7317, featuring a wide input voltage range, and built-in open lamp protection (OLP) and open lamp regulation (OLR*) circuit, eliminate up to 30 external components in four lamp circuit inverter designs compared to similar devices on the market. These highly integrated CCFL backlight inverter drive ICs also feature selectable dimming polarity and analog and burst dimming functions, which simplify design and speed time-to-market. Benefits of the FAN7316 and FAN7317 include: • Space and cost savings – eliminating the need for external IC_VCC regulation circuitry and OLP and OLR* detection circuitry and P-MOSFET driving circuitry* with its wide input voltage range (FAN7316: 4.5V to 24V, FAN7317: 6V to 25.5V) and built-in OLP and OLR* detection circuitry. This eliminates up to 30 external components. • System reliability – offering the industry’s most comprehensive protection functions including built-in OLP circuit, open lamp regulation (OLR), short circuit protection (SCP), thermal shutdown (TSD), soft-start and arc protection. • Design flexibility – featuring a wide input voltage range, selectable dimming polarity, analog and burst dimming mode and PWM dimming by an external pulse signal, the FAN7316 and FAN7317 offer flexibility in designing N-N half-bridge, push-pull and P-N full-bridge LCD backlight inverters. The FAN7316 and FAN7317 are available now in a 20-pin SOIC lead (Pb)-free package that meets or exceeds the requirements of the joint IPC/ JEDEC standard J-STD-020C. Price (each, 1000 pcs.): $1.48 for the FAN7316. Fairchild Semiconductor Corporation, South Portland, ME. (207) 775-8100. [www.fairchildsemi.com].
Quad-Band EDGE PA Modules for 3G WEDGE Phones TriQuint Semiconductor, Inc. has announced the first two members of its HADRON II PA Module family: the TQM7M5012 and TQM7M5005. These second-generation EDGE PAs were designed using TriQuint’s CuFlip copper bump technology, improving RF performance while reducing current consumption to provide longer device battery life. Debuting with a 5 x 5mm footprint, these solutions are 50 percent smaller than the previous generation, providing handset manufacturers additional board space to add other rich features. The new products build on the success of TriQuint’s first-generation HADRON PA Module family, found in some of the industry’s most popular mobile devices, including Samsung’s BlackJack, LG’s Shine and Chocolate 3G, Palm’s Treo and HTC’s Advantage. TriQuint’s EDGE PA module shipments to 3G phones experienced 178 percent growth in 2007 as TriQuint gained market share, and as WCDMA networks grew to provide 70 percent of the world’s commercially launched 3G services. The growth in WEDGE (WCDMA + EDGE) devices was noted by the Global mobile Suppliers Association (GSA) in its January 15, 2008 update, which stated “Most WCDMA-HSPA networks combine with EDGE for service continuity and the best user experience.” Available in EDGE-Polar and EDGE-Linear versions, both products have been optimized to deliver best-in-class current consumption in the critical GMSK mode, which significantly improves handset battery life. The TQM7M5012 for EDGE-Polar applications is aligned with Qualcomm’s newest 3G multimode transceivers. Compared to the previous generation, TQM7M5012 offers even lower Rx band noise power level to help eliminate external components in the radio. The TQM7M5005 is designed to work with some of the world’s leading 2.5G and 3G transceivers that require a linear power amplifier. TriQuint has developed RF radio application and evaluation boards for both the TQM7M5012 and the TQM7M5005 to demonstrate the features and compatibility of the devices, thus enabling phone manufacturers to shorten handset development time. TriQuint is currently sampling the TQM7M5005 and the TQM7M5012 to lead customers and production is planned for 1H 2008. Triquint Semiconductor, Hillsboro, OR. (503) 615-9000. [www.triquint.com].
design idea Precision Circuit Monitors Negative-Supply Thresholds
figure 1
by Kevin Bilke, Maxim Integrated Products Inc., UK
A simple but accurate circuit (Figure 1) monitors the magnitude of a negative supply voltage. This capability is useful in multi-rail systems, particularly if the negative rail serves as the precision bias voltage for a GaAs device. The IC (MAX971) includes an open-drain comparator and a precision 1% voltage reference. Its trigger threshold is set by the value of external resistor R4. For convenience, all other resistors have the value 1MΩ. Resistors R1 and R2 divide the reference voltage, providing a trip point of Vref/2 for the comparator. R3 and R4 sample the negative voltage, and IC1 compares that sample to the trip-point voltage. Choose R4 according to R4 = 1MΩ + 2(1MΩ × Vneg/1.182V), where Vneg is the magnitude of the voltage being monitored (ignoring the minus sign). The circuit output goes low in response to a fault condition, i.e., when the magnitude of the monitored voltage drops below the set threshold. To ensure an overall accuracy better than 2%, all resistors should have 1% tolerance. For monitoring more than one negative voltage, choose the MAX974, which includes four open-drain comparators with a precision 1% reference. Feed the output of R1 and R2 to all the comparators, add an R3/R4 pair for each monitored voltage, and calculate each R4 to yield a trip point for the corresponding voltage. All the open-drain outputs can be connected together. Maxim Integrated Products, Inc. Sunnyvale, CA. (408) 737-7600. [www.maxim-ic.com].
(Figure 1 schematic: a MAX971 comparator powered from a 3V to 11V supply, with R1, R2 and R3 of 1 MΩ, the reference divided to Vref/2 at IN-, R4 connecting IN+ to the monitored negative rail, and a 1 MΩ pull-up on the open-drain output; R4 = 2.69 MΩ for a -1V threshold. Output low = fault.)
The output of this negative-voltage monitor goes low when the monitored negative voltage is above (more negative than) the threshold set by R4.
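As a quick check of the resistor equation in the text, the short sketch below plugs in the article's values and reproduces the R4 = 2.69 MΩ figure quoted for a -1V threshold.

/* Sketch: evaluate R4 = 1 MOhm + 2 * (1 MOhm * Vneg / 1.182 V), where Vneg
 * is the magnitude of the monitored negative voltage (the article's equation). */
#include <stdio.h>

static double r4_ohms(double vneg_magnitude)
{
    const double r_base = 1.0e6;   /* 1 MOhm, value of R1, R2 and R3          */
    const double v_ref  = 1.182;   /* MAX971 reference voltage per the text   */
    return r_base + 2.0 * (r_base * vneg_magnitude / v_ref);
}

int main(void)
{
    printf("R4 for a -1V threshold: %.2f MOhm\n", r4_ohms(1.0) / 1.0e6);
    return 0;   /* prints 2.69 MOhm, matching the value shown in Figure 1 */
}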
ceo interview Mark Thompson
Fairchild Semiconductor
While David Packard’s garage holds mythic status among entrepreneurs, it was Fairchild Semiconductor that first put the silicon in Silicon Valley. Founded 50 years ago by the famous Traitorous Eight, Fairchild gave the world the planar transistor, which broke the “numbers barrier” for interconnects, making the IC possible—not to mention the whole semiconductor industry. They then contributed Robert Noyce and Gordon Moore to help build it up and Gene Kleiner to help finance it. Today Fairchild is the #1 global supplier of power analog, power discrete and optoelectronic components that optimize system power. Fairchild’s CEO Mark Thompson has become a leading advocate of “green technology,” developing products that converge applications into smaller, lighter, more efficient devices while consuming less power. Thompson is aware of the implications of energy savings on both the device and political levels. Portable Design talked with him recently about both.
Portable Design: Fairchild has long been a technology innovator, dating from Jean Hoerni’s invention of the planar transistor 50 years ago. What sort of potentially disruptive technologies are you working on now?
Thompson: Well it might sound heretical, but I don't think there really are any disruptive technologies in the semiconductor universe. The thing that happened was semiconductors showed the way not just to be a one-time revolution; they really introduced a new pace of change, an ongoing introduction of new capabilities that has become the norm in the industry. So I don't think you'd recognize a so-called revolution if it occurred, because it would just look like the normal sea change that happens every day, every year in the industry.
Portable Design: Energy conservation has recently acquired political overtones in the context of global warming. What role do you see for the high-tech community in helping to address energy conservation issues?
Thompson: There are multiple roles for the high-tech community. The first is the capability to deliver everything from new-generation methodologies to increased conversion efficiency, storage and transport; these are all fundamentally enabled by technology capabilities. Sometimes pure free market forces are the best way for things to happen, and sometimes they’re not. One of the changes that is required is that the high-tech community is going to need to be more outspoken in terms of helping to define capabilities and directions and to define systemic goals, whether it’s in the form of soft, hard or migrating standards for various kinds of efficiencies in really all aspects of power usage. We should continue to “show the way” from a technology point of view. But we also need to help manage standards and legislation a little bit more actively than the community typically does.
Portable Design: A lot of specmanship surrounds “low power” ratings for ICs. Would you envision or support some sort of power benchmarking for semiconductors along the lines of the EPA’s Energy Star ratings?
Thompson: Well, no one is capable of writing a spec that can’t be gamed. System engineers are really smart, and they know what they need and they know it works. So I think that the standards aren’t the place really to address that; at the end of the day the competitive marketplace is. The majority of systems require not just a single point of load efficiency, but they really require efficiency across the entire spectrum. A product that addresses that will ultimately prevail in the market, and I don’t think there’s a lot of requirement there for a more aggressive policing kind of role from the standards side.
Portable Design: According to John East of Actel, more than 50% of the 4,055 billion kWh of electricity consumed in the United States each year is used to power electric motors. What can be done to increase the efficiency of these motors?
Thompson: If you look at the classic wound-coil AC motors, the price of copper is probably as big a driving force as efficiency in pushing things away from that toward permanent-magnet DC motors. The payback model for conversion to those is so compelling that while I think some simple efficiency standards in the US would help—akin to
the appliance efficiency standards that exist in many other countries—I think that an understanding of the payback model will naturally force the change. DC motors are most efficient at low speed, where AC motors are very inefficient. In low-speed applications AC motors are so inefficient that they wind up being quite large; so the downsizing of the AC motor more than pays for the incremental semiconductor content—and you get the increased efficiency for free. So I believe that with just the slightest nudge from standards, the rapidly escalating prices of copper and other materials will naturally push that one over the goal.
Portable Design: Let’s take the case of “wall wart.” They reportedly account for as much as one percent of total U.S. power consumption, or four percent of residential use. Does Fairchild have any technologies that could resolve what would seem to be a simple problem?
Thompson: Sure. The wall wart has actually moved a long way from its very earliest incarnation, which was a simple linear regulator and a transformer. California passed some standards requiring certain standby efficiencies, so those standards are emerging. The technologies to do them continue to evolve. We have a number of different products we offer into the adapter space featuring very little off power and very efficient conversion. The next generation in cell phones is an area where it looks like it’s starting to happen, which is a primary-side regulator and then clean-up regulation where the power goes into the phone. So it’s already starting to happen, it’s got some momentum of its own.
Portable Design: Turning to portable designs, power management is probably the number one concern for Portable Design’s readers. What strategies can system designers take to cope with the growing demand for more functionality in handsets while still living within a reasonable power budget?
Thompson: Start at a very high level and look at batteries; batteries are about as good as you’re going to get. If you look at gravimetric density, there’s not much beyond lithium-ion. However, the features and the loads themselves over time have their own progression. Then there’s also the recognition of the duty cycle of any given component. The more pieces you put in a portable device, by definition the less duty cycle each piece has, because you can’t use them all at once. So careful thought has to be given to recognizing what the duty cycle of each of these pieces is likely to be and optimizing the power requirements for those. The sleep mode here becomes very important. You want very low quiescent currents, and you want to be able to change very rapidly between off and on. Since most devices are off most of the time, it’s actually the quiescent mode that primarily dictates battery life.
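To see why quiescent current dominates, a rough average-current estimate helps; all of the numbers in the sketch below are illustrative assumptions, not figures from the interview.

/* Sketch with assumed numbers: average current and battery life versus
 * duty cycle for a mostly-idle handheld device. */
#include <stdio.h>

int main(void)
{
    const double capacity_mah = 1000.0;  /* assumed battery capacity           */
    const double active_ma    = 200.0;   /* assumed active-mode current        */
    const double sleep_ma     = 0.5;     /* assumed quiescent (sleep) current  */
    const double duty         = 0.002;   /* active roughly 3 minutes per day   */

    double avg_ma = duty * active_ma + (1.0 - duty) * sleep_ma;
    printf("average current %.3f mA, battery life %.0f hours\n",
           avg_ma, capacity_mah / avg_ma);
    /* At this duty cycle the 0.5 mA quiescent draw contributes about
     * 0.499 mA of the 0.899 mA average, slightly more than the 200 mA
     * active draw does, so cutting sleep current pays off directly. */
    return 0;
}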
Portable Design: What trends do you see in power management that are likely to yield the most benefits over the next few years?
Thompson: I think that handsets are the most interesting space in portable. If you go back a number of years, many people were predicting that once the cell phone standardized it would become just like a calculator: it will be a static design, it will have a static feature set, it will have one chip, and it will be really cheap—completely standardized and commoditized. At the same time people were living in a PC-centric world saying, "Convergence is coming, and it will come together on the PC." Today the PC has fundamentally lost the convergence battle, whereas cellular won it. As a result, the cell phone now enables you to transmit and receive in a very efficient way using a set of data-rich protocols. If you look at the power management implications in that, it's really very interesting, because adaptability to feature migration has become almost the number one consideration for the OEMs of the world. You start by putting a transceiver in the center of the universe and ask yourself, "What are all the features and capabilities that are in orbit around this?" If you look at all the stuff that's around the transceiver, people wonder why more power management capabilities don't get sucked into the PMU. The PMU is certainly the hub of the system, but increasingly it's dedicated to the care and feeding of the baseband processor; but you still have to power and manage all of these other capabilities and take data around the phone. Making that power management dynamic is how you enable the OEMs over the next 10 years. And that trend is accelerating, not standardizing. I have a theory that I call the Theory of Two: the best devices are actually good at just two things, and everything else is just an accessory. But no two people want the same two things. So if you look at the complex matrix of features and consider feature migration, that's really what makes this a challenging environment for both the OEM and the component supplier. These are not static features—the camera element, for example, gets bigger and more capable over time. And each one of these features represents an iteration of power requirements and often of data requirements. So making those capabilities work with the right product families, the right road maps, at the right cost points, etc.—to me, that's really what the market needs and wants. The companies who do that effectively are the ones that will win.
Fairchild Semiconductor Corporation, South Portland, ME. (207) 775-8100. [www.fairchildsemi.com].
The RTC Group is a media services company specializing in bringing companies and their products to a focused group of electronic and computer manufacturers. RTC is proud of its track record of blazing new trails in search of marketing value for our clients. Portable Design magazine is the newest addition to RTC Group’s collection of publications.
advertiser index
Altera Corporation                     7    www.altera.com
EEMBC                                 17    www.eembc.org
EmbeddedCommunity.com                  4    www.embeddedcommunity.com
Lattice Semiconductor Corporation     29    www.latticesemi.com
Linx Technologies, Inc                 4    www.linxtechnologies.com
Mentor Graphics                       21    www.mentor.com
Mouser Electronic                      2    www.mouser.com
MVACEC                                43    www.mvacec.com / www.mountainviewalliance.org
National Semiconductor                52    www.national.com
White Electronic Designs              23    www.wedc.com
Wind River Systems, Inc.              51    www.windriver.com

event calendar
03/03-07/08   SD West 2008, Santa Clara, CA, www.sdexpo.com
03/11-12/08   Mountain View Alliance Communications Ecosystem Conference, San Francisco, CA, www.mvacec.com
03/17-18/08   VoiceCon Orlando 2008, Orlando, FL, www.voicecon.com
03/17-20/08   VON.x, San Jose, CA, www.von.com
03/25/08      Real-Time & Embedded Computing Conference, Dallas, TX, www.rtecc.com
03/27/08      Real-Time & Embedded Computing Conference, Houston, TX, www.rtecc.com
04/08-10/08   ROBOBusiness Conference & Expo, Pittsburgh, PA, www.robobusiness.com
04/14-18/08   Embedded Systems Conference, San Jose, CA, www.com-egevents.com
If you wish to have your industry event listed, contact Sally Bixby with The RTC Group at sallyb@rtcgroup.com
with standards. with Wind River.
Mike Deliman Here’s a guy who appreciates the view from above. When he’s not trekking the high plains and mountain passes of Tibet, Mike Deliman is working on aerospace and defense projects for Wind River. He’s fond of Mars rovers, solar panels, and astronauts; and his real-life heroes are Albert Einstein and the Dalai Lama. He’s aiming high.
Regional Developer Conference, Aerospace and Defense Edition Register now to join Wind River and our partners at one of three full-day conferences dedicated to Wind River's aerospace and defense solutions, including: Wind River's new VxWorks 653 Platform 2.2, Wind River Real-Time Core for Linux (with FSMLabs technology), Wind River's VxWorks AMP/SMP multicore support, ARINC 653 and Integrated Modular Avionics (IMA), and DO-178B safety certification. http://www.windriver.com/announces/ad_conference/index.html
March 12, 2008
March 27, 2008
April 2, 2008
Hilton Santa Clara 4949 Great America Parkway Santa Clara, CA 95054 Tel.: 408-330-0001
Turf Valley Resort 2700 Turf Valley Road Ellicott City, MD 21042
Four Seasons Resort & Club Dallas at Las Colinas 4150 North MacArthur Boulevard Irving, TX 75038 Tel.: 972-717-0700
Tel.: 410-423-0833
© 2008 Wind River Systems, Inc. The Wind River logo is a trademark, and Wind River is a registered trademark of Wind River Systems, Inc. Other marks are the property of their respective owners.
Reduce Energy Consumption with PowerWise Technology ®
national.com/powerwise
Digitally-Programmable LP5552 Energy Management Unit Extends Battery Life and Enables New Features
(Block diagram: the LP5552, acting as a PWI 2.0 slave with ENABLE, RESETN and PWROK signals, supplies a dual-core system-on-chip IC through adaptive voltage regulators for the processor core and DSP, programmable LDOs for the hardware accelerator, PLL/analog, I/O, peripheral and memory rails, an advanced power controller and general-purpose outputs GPO1-GPO3.)
Applications: dual-core processors, cellular handsets, handheld radios, PDAs, battery-powered devices and portable instruments
LP5550: 4 outputs (1 Buck: 0.6V to 1.2V, 300 mA; 3 LDOs: 0.6V to 3.3V, up to 250 mA); VIN 3V to 5.5V; PWI 1.0 interface; LLP-16 package
LP5551: 6 outputs (2 Bucks: 0.6V to 1.2V, 300 mA; 4 LDOs: 0.6V to 3.3V, up to 250 mA; Nwell bias: -0.3 to +1V (to supply); Pwell bias: -1V to +0.3V (to GND)); VIN 2.7V to 5.5V; PWI 1.0 interface; LLP-36 package
LP5552 (NEW): 7 outputs (2 Bucks: 0.6V to 1.235V, 800 mA; 5 LDOs: 0.6V to 3.3V, up to 250 mA); VIN 2.7V to 4.8V; PWI 2.0 interface; micro SMD-36 package
For FREE samples, datasheets, and more visit: national.com/powerwise Or call 1-800-272-9959
© National Semiconductor Corporation, 2007. National Semiconductor and PowerWise are registered trademarks of National Semiconductor Corporation. All rights reserved.