VERIFICATION & TEST IN FOCUS
PLUS: ARM PREVIEWS 64BIT ARCHITECTURE // SEVEN PILLARS OF VIRTUAL PROTOTYPING // TRENDS FOR THE INTERNATIONAL CES // DEFINING FUNCTIONAL QUALIFICATION // 3D INNOVATIONS FROM ITC 2011
TECH DESIGN FORUM IN 2012 (SEE PAGE 8)
DECEMBER 2011 // VOLUME 8, ISSUE 5 // TECHDESIGNFORUM.COM
Contents
December 2011 Volume 8, Issue 5 techdesignforums.com
Tech Forum

24 ESL/SystemC: The seven habits of highly effective virtual prototypes
Shabtay Matalon and Yossi Veller, Mentor Graphics

30 Verified RTL to Gates: How to achieve power estimation, reduction and verification in low power design
Kiran Vittal, Atrenta

38 Verified RTL to Gates: The principles of functional qualification
George Bakewell, SpringSoft

42 Tested Component to System: Design for test: a chip-level problem
Sandeep Bhatia, Oasys Design Systems

46 Tested Component to System: Pre-bond test for 3D ICs at ITC 2011
Special report, Tech Design Forum
Published quarterly for the electronic products design community, the Tech Design Forum Journal is a premier source for the latest advancements and new technologies used by hardware and software engineers to design and develop electronic products for the aerospace, automotive, consumer, medical, industrial, military, and semiconductor industries. The journal provides an ongoing forum in which to discuss, debate and communicate these industries’ most pressing issues, challenges, methodologies, problem-solving techniques and trends.
Commentary

6 Start Here: Steve Jobs – a Man in Full
Beyond the gossip, the new biography has lessons for us all.

8 The Future: Tame the system
Find out how Tech Design Forum is changing and expanding in 2012.

10 Architecture: When ARM's 64
First details of the processing giant's latest architecture and market ambitions have emerged.

20 Analysis: CES mourns the computer, hails computing
We preview key trends at next month's Consumer Electronics Show in Las Vegas.
Tech Design Forum is a trademark of Mentor Graphics Corporation, and is owned and published by Mentor Graphics. Rights in contributed works remain the copyright of the respective authors. Rights in the compilation are the copyright of Mentor Graphics Corporation. Publication of information about third party products and services does not constitute Mentor Graphics’ approval, opinion, warranty, or endorsement thereof. Authors’ opinions are their own and may not reflect the opinion of Mentor Graphics Corporation.
Team

EDITORIAL TEAM
Editor-in-Chief: Paul Dempsey +1 703 536 1609 pauld@rtcgroup.com
Managing Editor: Sandra Sillion +1 949 226 2011 sandras@rtcgroup.com
Copy Editor: Rochelle Cohn

Vice President: Cindy Hickson cindyh@rtcgroup.com
Vice President of Finance: Cindy Muir cindym@rtcgroup.com
Vice President of Corporate Marketing: Aaron Foellmi aaronf@rtcgroup.com

SALES TEAM
Advertising & Event Sales Manager: Mark Dunaway +1 949 226 2023 // markd@rtcgroup.com
Advertising & Event Sales Manager: John Koon +1 949 226 2010 // johnk@rtcgroup.com
Account Management: Sandra Sillion +1 949 226 2011 // sandras@rtcgroup.com

CREATIVE TEAM
Art Director: Kirsten Wyatt kirstenw@rtcgroup.com
Graphic Designer: Maream Milik mareamm@rtcgroup.com

EXECUTIVE MANAGEMENT TEAM
President: John Reardon johnr@rtcgroup.com
Start Here: Steve Jobs – a Man in Full
Walter Isaacson's excellent biography of Steve Jobs has attracted headlines for its more controversial anecdotes, specifically those that detail the public and private behavior of the late Apple founder. The book itself, though, is far from an exercise in gratuitous muckraking.

If you happen to work in high technology, many of the tales about Jobs and his often fractious relationships with staff, partners, suppliers and others in our industry should not really come as a surprise. The corridor gossip was always that Steve had an "edge." And sometimes it was more than gossip. Even if you were lucky enough to avoid his temper personally, you probably know someone who was once less fortunate. As for the details of his personal life, his character was pretty much laid bare by his own sister, the novelist Mona Simpson, as Tom Owens in A Regular Guy. And that was published in 1997. Do you remember the opening line? "He was a man too busy to flush toilets."

The fact is that many corporate leaders are driven, ruthless and—yes—even profanely impolite men and women. Well, fancy that. They typically sacrifice much in their private lives to push forward in their public ones, and whatever their motivations, those of us who decide not to do that often owe our livelihoods and more to them. Taking a company from start-up to empire is not for the faint of heart.

So what I liked enormously about Isaacson's book was the feeling that he understood this, and while his role as biographer has required him to chronicle the missteps and confrontations, he avoids simple judgments and instead explores what made Steve Jobs, beyond the actually more-public-than-realized aspects of his background. In many respects, it was his upbringing as an indulged adopted son. But as deeply as he loved that father and mother, he spent much of his life on quests.

As Isaacson shows his subject following them, he makes a couple of things more explicit than any Jobs-watcher before him. First, he sets out not merely how much importance Jobs attributed to the intersection of aesthetics and technology, but also how he reached that conclusion. The influences that drove Jobs' design perfectionism are set out in full, as is his philosophy—and the latter was something that, while alive, he actually tended to guard from his competitors. Second, Isaacson makes it clear, again once and for all, that Jobs was far more a systems guy than either a high-level technologist or an electronics engineer specifically. For him, it was always about the interplay of hardware and software, and then how you had to package everything in an attractive and usable form.

The book is not a "How To" for anyone who wants to build the next generation's Apple. You still need a Steve Jobs to do that. But it does highlight the main design challenges facing high technology today. It is a world that has become still more inherently systemic and integrated, a process that Jobs did live to see. In that respect, Isaacson's book could prove as useful to you as any of the articles in this very magazine as a slice of all-too-contemporary history.
COMMENTARY: [THE FUTURE]
Tame the system
This January, Tech Design Forum moves online and will roll out a new approach to helping you manage technical content.
Tech Design Forum is evolving into a predominantly online journal. Why? So that an expanded editorial team can focus on bringing you the key information you need to get your job done, whether it's designing a chip or building an entire electronics product.

Our premise is that there's plenty of information out there for design engineers and managers. So much that working out what's important and relevant has become a burden. Researching the impact of a new technology or evolving standard involves following a breadcrumb trail of web links and pointers. We want to lift that burden, organizing the material into a flow that matches each of the major phases of electronics system design, and curating it so that we bring you the most relevant information first. Result? We make it easier for you to get your job done and get your product to market.

We'll be doing two other important things. The first will be to look at the issues that affect the whole design process, such as silicon intellectual property and design management. The second will be to ensure that, although we're covering the leading edge, we also bring insight from the leading edge to designers working with established tools and mature processes.

The new Tech Design Forum will launch as a website on January 5, 2012. With The RTC Group, we still expect to publish special print issues to cover some major events and big themes.

Who's behind the new Tech Design Forum? Paul Dempsey, current Editor-in-Chief, is being joined by highly experienced technology journalists Luke Collins and Chris Edwards, who will jointly own the new site through a wholly independent group, The Curation Company. Together they bring more than 60 years of experience in technology journalism and more than 30 years of experience in specific coverage of electronics system design.
"The RTC Group has successfully managed and produced the Tech Design Forum journal and events for four years and we look forward to an ongoing partnership with The Curation Company," said John Reardon, CEO of The RTC Group.

The new journal will be based at http://www.techdesignforum.com. Please drop by, have a look around and sign up as a registered user for extra content. But right now, stick with us over the next couple of pages and we'll explain what we are changing.
Finding what matters
We're not claiming to be the only source of technical information for system design. Quite the opposite. The challenge is that digging out what's relevant is harder than it needs to be. We have all tried to get a quick answer to a tough question online and been frustrated by having to sift through the host of links served up. The existing media does a good job of researching and writing news, but that does not necessarily mesh well with the needs of a busy engineering team where people need answers and need them quickly. Tech Design Forum already structures content under headings that describe each article's main themes relative to the design flow, and our research says you like this approach. We are going to build on this by producing a series of overview articles that draw together the most important elements, and discuss how they increasingly interact. Each overview will also direct you toward the most relevant supporting material we can find, using embedded links—it's an approach that has been surprisingly underplayed by the technical media.
Creative curation
The key difference in our role versus those of other information providers is that of curation. Consider a museum. There may be a vast amount in the collection,
but it is the curator’s job to make sense of it. What should be put in the main galleries? What special exhibitions are most timely? And how, if someone really needs to get into the archives, do you make that process as manageable as possible and guarantee that those archives are in good order? We are adopting the same model. We will go beyond simply adding to the site to actively manage all of its aspects from the day material arrives to the day on which it can actually be retired. Our commitment to you is that we will fill the roles of both editors and curators to provide the most timely help in getting your job done as efficiently and effectively as possible.
All in one place
Recognizing that design doesn't happen in a vacuum, we will focus on three key challenges: the issues that affect the whole flow, such as version management; the discrete issues that have to be overcome in each phase of the design; and the increasing complexity of the interactions between different phases of the design. Labels may be necessary and useful, but silos are dangerously counter-productive. As new technologies and approaches emerge and existing ones are updated, the material on techdesignforum.com will evolve, so that it will always be relevant to today's challenges.
All the time
By concentrating the efforts of three highly experienced design journalists, we can provide updates when and where they make sense, keeping the digital noise to a minimum while boosting the signal.
Responding to needs
We'd like your help making the site as relevant as possible. Each month, we will invite your input on what you think are the most pressing issues facing your design teams and to which you would like your suppliers to respond. We will also propose some coverage areas that you can vote on, and set a core editorial calendar so you can see what topics are coming up. Beyond that, we plan to work
with the broad system design community to gather experts from across the market for our traditional articles and as white paper contributors. We will be setting up regular panels and 'ask the expert' postings. We'll have questions of our own but we also want them from you, the trickier the better. And we'll be assembling a roster of bloggers as well as offering our own thoughts on the latest news and events.

Since its launch, Tech Design Forum has been about tailoring its content and execution to the practical challenges facing designers. We believe that our move online will enable us to build on that core principle and so serve you, the readership, more usefully.

So, what happens now? From January 5, 2012, we will be gradually adding functionality, framing content and other new features over the course of the first quarter of 2012. Continuous editorial updates and blogs will start immediately. The first thing we would encourage you to do is to register as a user at techdesignforum.com as soon as possible. Then you can start giving us feedback and guidance on what your priorities from this project are. It will also allow us to keep you up to date with the new look, articles that are appearing and more.

You can also follow us on Facebook 'Tech Design Forum', Twitter 'Tech Design Forum' and LinkedIn 'Tech Design Forum', or you can email us directly at tdf.feedback@thecurationcompany.com with any suggestions. We all know the shared challenge ahead and we all know how journals can help you overcome it. Together, it is time to tame the system.
Meet the team

Paul Dempsey is the current editor-in-chief and a founder of the Tech Design Forum journal. He has more than 20 years' experience covering various branches of technology and engineering in both the UK and the USA. Paul has held senior editorial positions with specialist newsletters published by The Financial Times, and Electronic Engineering Times UK, and is also the current Washington Correspondent for the Institution of Engineering & Technology's flagship title, E&T.

Chris Edwards is a freelance technology journalist with 20 years' experience of covering the electronics, embedded systems and electronic design automation (EDA) industries. He is a former Editor-in-Chief of Electronics Times and Electronic Engineering Times UK and was launch editor for two magazines for the Institution of Engineering & Technology: Electronics Systems and Software and Information Professional.

Luke Collins is a freelance technology journalist with 22 years' experience of covering the electronics and electronic design automation industries. He is a former Editor-in-Chief of Electronics Times in the UK, and co-founded the IP9x series of conferences on semiconductor intellectual property in Silicon Valley and Europe. Since 2001 Luke has edited the Features and Communications Engineering sections of the Institution of Engineering & Technology's flagship title E&T, and written extensively on research, development and innovation management.
COMMENTARY: [ARCHITECTURE]
When ARM's 64
There's already some love out there for ARM's v8 64bit architecture as the processor giant builds out the ecosystem.
It was an architectural announcement. Even its positioning on the last day of the ARM TechCon event was intended to emphasize that the company's confirmation of its move into 64bit processing is one for the future, albeit the relatively near future. The ARMv8 was unveiled in its applications form only and then with just the basic details.

According to Mike Muller, chief technology officer, the main reasons for making the announcement now are to clarify the roadmap and to allow time for the construction of an appropriate support ecosystem around the new core.

That second point is important. Once upon a time, the ARM "Connected Community" was relatively small even though the technology was becoming increasingly influential. Today, it has more than 770 members and continues to grow. The idea that ARM could quietly nurture its 64bit architecture toward a commercial release without any details leaking across this size of ecosystem simply doesn't hold water. More to the point, getting the best support out there for what is likely to be a fairly bloody commercial battle will involve getting the best tools and other support built around the v8 quickly.
A tough fight

And when we say "bloody" we mean it. ARM has again bearded Intel with a direct challenge to the x86 architecture. However, its intentions with the v8 are not directly or even largely focused on the desktop.

Yes, the move to 64bit does feed into Microsoft's decision to tailor its still dominant Windows PC operating system (OS) for ARM-based chips as well as traditional x86 ones. At the announcement, K.D. Hallman, a general manager with the software giant, was on hand to provide one of the potted quotations. "ARM is an important partner for Microsoft. The evolution of ARM to support a 64bit architecture is a significant development for ARM and for the ARM ecosystem. We look forward to witnessing this technology's potential to enhance future ARM-based solutions," he said.

Also present was Nvidia, with its declared intention of using low-power processors to take it beyond its historical strength in graphics. "The combination of Nvidia's leadership in energy-efficient, high-performance processing and the new ARMv8 architecture will enable game-shifting breakthroughs in devices across the full range of computing, from smartphones through to supercomputers," said Dan Vivoli, senior vice president.

Muller's comments at ARM TechCon, however, suggested that the sweet spot his company sees is the market where very high performance and, perhaps more important, low power pressures converge: servers and other enterprise-class hardware.

Up and away

According to the Environmental Protection Agency, U.S. spending on the energy that powers servers will exceed $7B this year and has risen by more than 40% in the last five years. With the move of an increasing amount of functionality and storage to the cloud—key features in the recent high-profile launches of the Amazon Fire tablet and Mac OS X Lion—it seems fair to assume that server activity will increase, but the degree to which such increases in power consumption are acceptable, economically or environmentally, must be open to question.
Figure 1 The application profile for the ARMv8 (diagram: ARMv8-A, A-profile only at this time, adds the AArch64 execution state with the A64 ISA alongside AArch32 with the A32 and T32 ISAs, retaining ARMv7-A compatibility including TrustZone, NEON advanced SIMD, Thumb-2 and VFPv3/v4, and adding cryptography extensions; scalar FP and advanced SIMD support SP and DP float in AArch64)
That has opened a window of opportunity for ARM. While the enterprise market is demanding in terms of performance, it is also relatively conservative at its heart. IT managers of huge, complex systems are wary of major architectural changes unless they are forced upon them. Change is risk, and in a world where the term "mission critical" is more than a cliché, risk must be mitigated to the greatest degree.

Now, however, the already burgeoning demands placed on server farms are being added to by consumer devices and productivity-focused tablets pulling content from them. And that is a scenario that could apply within an individual company and its staff network. Throw consumers into the mix, and the likelihood of a further hugely costly ramp in power consumption becomes clear. All that plays to ARM's strengths.

One other important point here is commoditization. Server chips attract far chunkier margins—some reports have put these as high as 67%—than either PC processors or ARM-based smartphone chips. By contrast, shortly before putting the v8 on its roadmap, ARM also announced the Cortex A7 MPCore processor and introduced its concept of big.LITTLE processing. By combining the low-power
focus of the existing Cortex A8 with high-performance features from the Cortex A15, ARM has assembled a clever combination. But the product's focus is largely on "sub-$100 entry-level smartphones." There's demand for this stuff alright, and not just in emerging markets (see page 20). But the inherent price sensitivities are obvious.

Certainly, ARM's recent relationship with analysts has been marked by plaudits for its success in mobile communications but also warnings about the perennially aggressive shrinkage in margins for that market. And there is also a long-standing requirement set upon ARM to prove itself beyond that space—something it has already addressed with a successful foray into microcontrollers and which now will also play out in servers and elsewhere with v8.

Having addressed a large part of the commercial background to the move to 64bit, let's now go under the hood.
First look

Last year, ARM introduced the Large Physical Address Extension (LPAE) to translate the 32bit virtual addresses within its v7 architecture into 40bit physical addresses. However, the 4Gbyte limit on virtual address space remained, insufficient for the more computationally complex software that runs on servers, particularly
for database management. The v8 now fills that gap.

For the launch, the detail provided is mainly intended for OS and compiler companies as well as those providing tool support to hardware designers. As such, it focuses on the two execution states, AArch64 and an enhanced AArch32. The AArch64 execution state introduces a new instruction set, A64. Meanwhile, key features of the v7 architecture are maintained or extended in the v8 architecture.

In a separate ARM TechCon presentation from Mike Muller's launch keynote, ARM fellow Richard Grisenthwaite went into some more detail as to how this will play out. In addition to A64, headline features for the AArch64 state include: revised exception handling for exceptions in the AArch64 state, with fewer banked registers and modes; support for the same architectural capabilities as in ARMv7, including TrustZone, virtualization and NEON advanced SIMD; and a memory translation system based on the existing LPAE table format.

Noting that work on the 64bit version has been under way since 2007, Grisenthwaite said that last year's LPAE format "was designed to be easily extendable to AArch64-bit" and that the new technology features up to 48bit of virtual address space from a translation table base register.

Instructions in A64 are 32bit wide, with a clean decode table based on 5bit register specifiers. The semantics are broadly the same as in AArch32 and changes have been made "only where there is a compelling reason." Some 31 general purpose registers are accessible at all times, with a view to balancing performance and energy. The general purpose registers are 64bits wide, with no banking, and neither the stack pointer nor the PC is one of them. An additional dedicated zero register is available for most instructions.

There are obviously differences between AArch64 and AArch32, although much has been done to preserve compatibility and scalability.
Here are some of the key points. There are necessarily new instructions to support 64bit operands, but most instructions can have 32bit or 64bit arguments. Addresses are assumed to be 64bits in size. The primary target data models are LP64 and LLP64, respectively the models used in Unix/Unix-based systems and in Windows. Meanwhile, there are far fewer conditional instructions than in AArch32, and there are no arbitrary-length load/store multiple instructions.

Finally, Grisenthwaite's paper set out some details of the A64 advanced SIMD and floating point (FP) instruction set. It is semantically similar to A32: advanced SIMD shares the floating-point register file, as in AArch32. A64 then provides three major functional enhancements:
1. There are more 128bit registers: 32 registers, each 128bits wide, which can also be viewed as 64bits wide.
2. Advanced SIMD supports double-precision floating-point execution.
3. Advanced SIMD supports full IEEE754 execution, including rounding modes, denormals, and NaN handling.
The register packing model in A64 is different from that in A32, so the 64bit register view fits in the bottom of the 128bit registers. In line with support for the current IEEE754-2008 standard for floating point arithmetic, there are some new FP instructions (e.g., MaxNum/MinNum instructions and float-to-integer conversions with RoundTiesAway).

Changes between AArch32 and AArch64 occur on exception or exception return only. An increase in exception level cannot decrease register width (or vice versa), and there is no branch-and-link between AArch32 and AArch64. AArch32 applications are allowed under an AArch64 OS kernel and alongside AArch64 applications. An AArch32 guest OS will run under an AArch64 hypervisor and alongside an AArch64 guest OS.

Grisenthwaite's entire introductory description of the features underpinning the roll-out of v8 to the ARM ecosystem can be downloaded at www.arm.com/files/downloads/ARMv8_Architecture.pdf.
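As a concrete illustration of the LP64 and LLP64 data models mentioned above, the short C++ check below is our own sketch (not taken from ARM's material): on an LP64 Unix-style system "long" widens to 64bits, while on an LLP64 Windows system it stays at 32bits and only "long long" and pointers widen.

// Illustrative sketch only (not ARM-provided code): the type widths that the
// LP64 (Unix/Linux) and LLP64 (Windows) data models imply on a 64bit target.
#include <cstdio>

int main() {
    // LP64:  int = 4 bytes, long = 8, long long = 8, pointer = 8
    // LLP64: int = 4 bytes, long = 4, long long = 8, pointer = 8
    std::printf("int:       %zu bytes\n", sizeof(int));
    std::printf("long:      %zu bytes\n", sizeof(long));
    std::printf("long long: %zu bytes\n", sizeof(long long));
    std::printf("pointer:   %zu bytes\n", sizeof(void*));
    return 0;
}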
ARM and servers today

Some preparatory work has already taken place with selected partners. The ARM compiler and Fast Models with ARMv8 support have been distributed and, as noted, work has begun on support for a range of open source operating systems, as well as—it is reasonable to assume—for Windows. A number of applications and third-party tools are also in development. According to Muller, we can expect the full framework for v8 implementations next year and products should begin to appear
over the 2013-2014 timeframe.

Nevertheless, the first implementation has already been announced. Applied Micro has unveiled a demonstration based on a Xilinx Virtex-6 FPGA running its Server SoC, consisting of an ARM-64 CPU complex, coherent CPU fabric, high-performance I/O network, memory subsystem and a fully functional SoC subsystem. The work is paving the way for Applied's X-Gene server-on-a-chip family, which, the company says, will be scalable from 2 to 128 cores running at 3.0GHz with power consumption of just 2W per core.

"The current growth trajectory of data centers, driven by the viral explosion of social media and cloud computing applications, will continue to accelerate," said Dr. Paramesh Gopi, Applied's president and CEO. "In offering the world's first 64bit ARM architecture processor, we harmonize the network with cloud computing and environmental responsibility. Our next-generation of multicore SoCs will bring in a new era of energy-efficient performance that doesn't break the bank on a limited power supply."

Applied's plan, running slightly ahead of Muller's timetable, is to start offering customer sampling of a TSMC-produced v8 device in the second half of next year.

Meanwhile, the Barcelona Supercomputing Center is also part of the push to take ARM into the high-performance market, in conjunction with Nvidia. It showed a hybrid system, Mont-Blanc, which combines ARM-based Nvidia Tegra CPU chips with GPUs based on Nvidia's Cuda technology, at a supercomputing conference in Seattle this November. The objective of this work, much like Applied's, places great emphasis on energy savings.

"In most current systems, CPUs alone consume the lion's share of the energy, often 40 percent or more," said Alex Ramirez, leader of the Mont-Blanc Project. "By comparison, the Mont-Blanc architecture will rely on energy-efficient compute accelerators and ARM processors used in embedded and mobile devices to achieve a four-to-10-times increase in energy-efficiency by 2014." Mont-Blanc's use of the existing v7 architecture makes it more of a pathfinder than a product, but it remains very much a declaration of intent.

The battle ahead
However, one must note that ARM is not alone in pursuing low-power options for the high-performance market. It is, excuse the pun, a hot button issue throughout the server and enterprise business.

"Collectively, data centers around the world consume nearly 1.5 percent of total electricity production and almost $44.5B a year is spent on powering the servers in these data centers," said Linley Gwennap, principal analyst at The Linley Group. "Looking at the growth projections for data center usage and the future of power generation growth, this trajectory is unsustainable. A new paradigm for developing data centers based on energy efficiency will certainly help make data centers scale realistically with future demand growth."

For example, in its Redstone low-energy server design project, HP plans to use and evaluate both ARM-based and Intel x86-based chips. AMD, which has had a tough time in servers of late, has put low-power on the agenda as part of its recent restructuring, unveiling its Opteron 3000 for the micro-server market as well as promising enhancements to its mainstream devices.

ARM's other challenge will be getting silicon vendors to adopt the new architecture. It has had success in wooing players in the microcontroller market already. And it is the go-to brand for low power. But x86 is powerful here, and the challenge in general computing is arguably greater than "evolving up" from mobile phones, particularly since, while ARM-based 64bit PCs can be expected, the company essentially wants to leapfrog that heavily commoditized market. Hence the focus on making the unveiling an architectural rather than product announcement, notwithstanding the innovative play from Applied Micro. Server players will want everything in place before they jump—ARM, though, has an existing infrastructure that can deliver the necessary components.
COMMENTARY: [ANALYSIS]
CES mourns the computer, hails computing
We track some of the major trends to look out for at January's International CES in Las Vegas.
The 2012 International CES takes place in Las Vegas from January 10-13, slightly later than usual, which will come as a relief to everyone recovering from the holidays before assailing the show's crazy crowds. In November, the Consumer Electronics Association presented its usual preview in New York City, where it released some headline data and also highlighted a few trends to follow for those going to Sin City or simply following the gushing spigot of announcements that will flood across all media.

One addition this year was a pre-show survey of buyers. There is always some preview data on what consumers are thinking—and we'll come to that shortly—but the new exercise threw up one important point: the guys and gals stocking the warehouses also think that we now unquestionably live in a system world.

Consider first the five main product groups that they said were on their minds before CES: tablets, smartphones, TVs, ultrabooks and accessories. Then there are the "activities" around which they see sales being generated. The top five here: computing, streaming, mobility, control and connecting. Finally, the tech trends. The five buzz concepts here follow on logically from their other priorities: smart, cloud, touch, voice and apps.

As the CEA's leading analysts, Shawn DuBravac and Steve Koenig, noted in a typically zippy presentation, consumer products used to be quite closely defined, particularly in terms of their functionality. Today, though, we want devices that are customizable or that might consolidate the role several boxes played in the past. And, of course, we want products that we can buy and suddenly do things with them that we had not originally imagined. The tablet and the smartphone have undoubtedly done this, and so the next list of the five "hottest" specific areas should come as little surprise:
1. Apps (for mobile devices)
2. Tablets
3. Devices for streamed content
4. Internet-enabled TVs
5. Devices to enable sharing content

Indeed, tablets remain hot even though they have already reached about 10% of U.S. households, according to the latest CEA data. But beyond that, the preview also showed the same trend moving into car radios that can access the Android apps marketplace. And this is not an isolated process. "People used to go to CES to see devices," said DuBravac. "Now they go to see entire ecosystems. They're interested in how all these products interact."

With that in mind, here are the three CES trends that DuBravac and Koenig picked out.
                              2007   2008   2009   2010   2011
% of gift spending on CE        22     28     29     31     32
Amount ($)                     194    206    222    232    246

Figure 1 Allocation of gift spending on CE (all)
1. The end of the computer... all hail computing

There has been an ongoing friction between the drive to pack more capability into end products and the need to reduce power consumption and extend battery life. But, according to the CEA duo, an extra nuance is now being added: the next big trend is going to be adding wireless interconnectivity to all your consumer electronics. It's not about just doing more, but broadening the experience for the user across a range of devices.
Figure 2 Allocation of gift spending on CE, 2011 (chart: households planning to spend more on CE gifts average $478; those planning to spend less average $101)
Beyond that, as noted above, the degree to which products are "substitutive" will also be a key driver in determining how much market traction they will get. Given all these shifts, DuBravac noted that last year's CES saw more than 100 tablets offered. "But there was a lot of experimentation. Now those devices have more concrete use-cases," he added.
2. CE now stands for "customizable experiences"

This is an obvious play on consumer electronics' traditional abbreviation. It extends the observation that vendors now have a clearer view of the tablet use-case to make the broader point that buyers want to be able to tailor devices to their tastes. As Koenig noted, this does not mean that tablets are simply "empty vessels"—there is a range of core functionality and processing power that they must provide. "However, it is not just about the platform but also the ecosystem," he continued. "'What can I get?' Not just devices but also services and accessories."
3. The year of the interface

There is a natural progression in innovation, which DuBravac illustrated well in terms of TV remote controls. What was new gets pushed out (or, more precisely, into the background) to make way for the next wave of ideas. So remotes began as clunky push-button devices linked to just one thing, the TV itself. More recently, we got multi-button, multi-device remotes. Then the interface was simplified and we are now seeing touchscreen products that can run not merely your home theater but also appliances and central heating. But LG's wand-like remote now takes that even further: it has barely any controls on it at all; rather, you control products with movements of your hand.

The UI has long been the area where consumer electronics was felt
to fall down—hence, so much of Apple's success—but the move toward a more ecosystem-based set of products means that ease of use is becoming increasingly important, and interfaces must therefore be simpler and more intuitive.

Much of this may seem relatively straightforward, and it is. But the main idea is that it points to this year's CES as the venue for a maturation in the product cycle that, from a silicon design perspective, appears to presage two things: a still greater importance for software and still further wireless integration within devices, while the power-performance stand-off continues.

Find innovation at the Venetian

Eureka Park is a new addition to the International CES that will showcase more than 70 innovative companies in their own area at the Venetian Hotel. This latest TechZone (the show now has 25) is specifically targeting media, venture capitalists, analysts and others looking to catch emerging companies and ideas. Its siting at the Venetian will place it alongside many of CES' conference sessions and keynote addresses, hopefully encouraging those audiences to visit Eureka Park and see, in the flesh, some of the new ideas described by speakers and on panels.

"Innovation and entrepreneurship drive our economy forward, and the Eureka Park TechZone proves that CES is the global platform for growing companies to unveil their game-changing technologies to the marketplace," said Gary Shapiro, president and CEO. "While leaders strive for policies that will create jobs, the companies within Eureka Park are creating products and services that will bring economic prosperity. We are excited to welcome these companies to CES and look forward to witnessing their cutting-edge innovations."
The shape of the market

The other important part of the CEA's preview event is the pre-show market data. Here, the headline numbers are encouraging but there was something of a devil in the detail. The U.S. consumer electronics industry is headed for a shipment value of $190B in 2011, 5.6% up on the year before. The current CEA forecast for 2012 is $197B. Consumer confidence, meanwhile, is at its highest level since December 2010, and the typical holiday spend on electronics is set to be around $246, up 6% on 2010 gift spending.
Figure 3 Buyers' ranking of CES trends by category (chart: buyers ranked nine categories, including wireless and wireless devices, internet-based multimedia services, lifestyle electronics, connected home, computer hardware and software, entertainment/content, emerging technologies, electronic gaming and video; the leading category scored 80%, with the rest ranging from 35% down to 25%)
So what's the problem? Well, perhaps it is not a problem as such, more a set of possible warning signs about the general state of the still troubled U.S. economy.

The public is pushing its purchases out further and further, and many families are waiting on Black Friday deals. Retailers such as Best Buy are following the lead set over Thanksgiving 2010 by Toys"R"Us by opening at midnight, immediately after the holiday has finished, rather than the already brutally early 5am. Black Friday itself is turning into Black November, with shops extending offers. At the same time, retailer inventories are tight, as low as they have been in four years. "They are not at all-time lows, but they are close," added DuBravac.

However, the most striking statistic comes in the breakdown of that average holiday spend. About one-third of consumers surveyed by the CEA said that they intend to cut back on holiday spending this year, and the difference in the budgets between those who will spend more and those who will spend less is stark. "There's about a 5X difference," said Koenig. "Clearly these two different groups have very different products in mind, and if we see movement between them it will significantly impact the results." The specific average gift buying numbers are $478 per household for those who plan to push the boat out, and just $101 for those who
are reining in.

A simplistic view would be that this reflects headline-grabbing U.S. concerns about the polarization in society between haves and have-nots. There may well be some of that, and the Occupy movement is a factor here. But the other question that stands is how it reflects sentiment. The $101 group will undoubtedly include people whose households have lost jobs, but another important slice will be those who are concerned about the economy, their own prospects and paying down existing debt.

Again, there do seem to be implications here that will spread to electronics design. The last few months have seen a number of vendors roll out products that target low-cost versions of existing products such as tablets and smartphones. One of the latest was ARM (see page 14). The assumption in the past was that such offerings were primarily aimed at emerging markets. However, given the distinction drawn by the CEA itself between the different types of product that two distinct groups of consumer will seek out, it would appear that the economy is now making demand for low-cost, entry-level products more universal.

Registration, exhibitors and conference programs for the 2012 International CES are available online at www.cesweb.org.
TECH FORUM: [ESL/SYSTEM C]
The seven habits of highly effective virtual prototypes
Shabtay Matalon and Yossi Veller, Mentor Graphics
Virtual prototyping is not a new technique, but the advent of transaction level modeling and an increased focus on seven key requirements for their effective use means that today’s versions are much more broadly applicable and comparatively future proof. Those seven qualities are: industry standards; platform modeling; processor modeling; virtual prototype creation; integrated hardware/software visualization and debug; performance/power analysis under software control; and optimization of software on multiple cores. The article provides a brief review of the importance of each and briefly describes a modeling strategy in terms of the Mentor Graphics Vista tool suite.
Shabtay Matalon is ESL market development manager for Mentor Graphics Design Creation and Synthesis Division. He received a BS in Electrical Engineering from the Technion, Israel Institute of Technology, Haifa, Israel. He has been active in system-level design and verification tools and methodologies for over 20 years and published several articles in these areas. At Mentor Graphics, Shabtay focuses on architectural design and analysis at the transaction level. Prior to joining Mentor Graphics, Shabtay held senior marketing and engineering positions at Cadence Design, Quickturn, Zycad and Daisy Systems.

Yossi Veller is the chief scientist in the Mentor Graphics ESL Division. During his long software career, Yossi has led ADA compiler, VHDL, and C simulation development groups. He was also the CTO of Summit Design. He holds degrees in computer science, mathematics, and electrical engineering.
The share of key functionality implemented in software running on processors continues to grow in new designs. No longer dominating just laptops and PCs, software reigns in communication, networking, and automotive devices, and embedded software is found in many consumer devices. With off-the-shelf platforms providing the foundation for modern designs, it is software combined with selected hardware accelerators that differentiates one product from another.

The growing importance of low power consumer and green devices is one reason for the increase in software processing units. Modern low power design techniques have built-in facilities to control power, but embedded and application software have "smarts" that add the use case context and determine how and when appropriate power control techniques can be applied. In addition, optimizing software for the processors it runs on can also help reduce power consumption.

How well software and hardware interact defines a device's key performance, power consumption, and cost attributes. Integrating and optimizing software after hardware has already been built is no longer an option; nor is the common practice
of validating hardware and software in isolation. Software and hardware interactions must be validated before either set of architectural decisions is finalized.

Virtual prototyping gives software engineers the ability to influence the hardware specification before the RTL is implemented and reduces the final HW/SW integration and verification effort. It also provides significant benefits over hardware prototyping by using high-speed abstracted simulation models of the hardware. Virtual prototyping enables software engineers to use their software debugger of choice. It facilitates the debugging of complex HW/SW interactions by providing simulation control and visibility into the hardware states, memories, and registers. And it provides a comprehensive set of analysis capabilities that allows engineers to optimize the software and improve how it controls the hardware to meet performance and power goals.

Early virtual prototyping modeling techniques did not address today's challenges. Most ran software against loose, proprietary mockup models of the hardware, providing a programmer's view of the hardware to the software routines. They allowed partial validation of the software functionality against the hardware register address space, but had limited capabilities when it came to validating the functionality of an entire device. To overcome that limitation, these virtual prototypes attempted to provide additional cycle accurate models that represented the functional behavior during each clock cycle, but at a faster speed compared to RTL. Where the programmer's view did not contain sufficient granularity of the underlying hardware to completely validate the software, cycle accurate models required a modeling effort close to that of writing the RTL, and frequently suffered from insufficient simulation performance to run software application code. Evaluating either performance or power under software control using either technique was generally impractical. Reuse of these models to produce virtual prototypes outside the framework of a single vendor environment was impossible due to the proprietary, closed nature of the model interface. Reuse in downstream flows, such as RTL verification, was non-existent.

To overcome these issues, today's more advanced virtual prototyping technologies, such as the Vista virtual prototyping technology from Mentor Graphics, should have seven key attributes that enable them to address current and future design challenges.

Figure 1 Scalable TLM power model and power modeling policies Source: Mentor Graphics
Figure 2 Vista communication, computation, and state-based power policies Source: Mentor Graphics

1. Industry standards

Advanced virtual prototypes are composed of transaction level models (TLMs) that abstract functionality, timing, and communication. The SystemC TLM2.0 standard allows these models to be reused from project to project and makes them interoperable both among internal design teams and across the entire industry. Industry-compliant TLMs can be run on any industry-compliant SystemC simulator without requiring proprietary extensions. In addition, TLM2.0 contains specific enhancements that enable very efficient communication for optimal simulation speeds.
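To make the interoperability point concrete, the fragment below is a minimal, generic SystemC TLM2.0 sketch of our own (it is not Vista code, and the module names are invented): a loosely timed initiator drives a simple memory target through the standard sockets and generic payload using blocking transport, the mechanism that lets TLMs from different suppliers plug together on any compliant simulator.

// Minimal, generic TLM2.0 sketch (not Vista-specific; module names are ours):
// a loosely timed initiator calling b_transport on a simple memory target.
#include <cstring>
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_initiator_socket.h>
#include <tlm_utils/simple_target_socket.h>

struct Memory : sc_core::sc_module {
    tlm_utils::simple_target_socket<Memory> socket;
    unsigned char storage[256] = {};

    SC_CTOR(Memory) : socket("socket") {
        socket.register_b_transport(this, &Memory::b_transport);
    }

    // Model the access functionally and annotate an approximate delay.
    void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
        unsigned char* ptr = trans.get_data_ptr();
        sc_dt::uint64 addr = trans.get_address();
        if (trans.is_read())
            std::memcpy(ptr, &storage[addr], trans.get_data_length());
        else
            std::memcpy(&storage[addr], ptr, trans.get_data_length());
        delay += sc_core::sc_time(10, sc_core::SC_NS);
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }
};

struct Initiator : sc_core::sc_module {
    tlm_utils::simple_initiator_socket<Initiator> socket;

    SC_CTOR(Initiator) : socket("socket") { SC_THREAD(run); }

    void run() {
        int data = 42;
        tlm::tlm_generic_payload trans;
        sc_core::sc_time delay = sc_core::SC_ZERO_TIME;
        trans.set_command(tlm::TLM_WRITE_COMMAND);
        trans.set_address(0x10);
        trans.set_data_ptr(reinterpret_cast<unsigned char*>(&data));
        trans.set_data_length(sizeof(data));
        trans.set_streaming_width(sizeof(data));
        trans.set_byte_enable_ptr(nullptr);
        trans.set_response_status(tlm::TLM_INCOMPLETE_RESPONSE);
        socket->b_transport(trans, delay);  // loosely timed blocking call
        wait(delay);                        // synchronize with simulated time
    }
};

int sc_main(int, char*[]) {
    Initiator init("init");
    Memory mem("mem");
    init.socket.bind(mem.socket);
    sc_core::sc_start();
    return 0;
}

Because it uses only the base protocol and standard sockets, a model written this way can, as the authors note, run on any compliant SystemC simulator without proprietary extensions.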
2. Platform modeling (LT/AT)

A platform modeling strategy not only defines the level of investment to create the platform but also the capabilities provided to the end user. A scalable TLM-based methodology separates communication, functionality, and the architectural aspects of timing and power into distinct models. Such a model can run in a loosely
timed (LT) mode at a very high speed—or it can switch to an approximately timed (AT) mode for more detailed performance and power evaluations under software control. When modeling power, AT mode allows engineers to associate power values with transaction-level computation and communication time and consider the power state of each model. LT/AT switching can be facilitated during run time to adapt to the software mode of operation, such as boot versus running application code.

Figure 3 Vista hardware-aware virtual prototypes Source: Mentor Graphics

3. Processor modeling (JIT)

Processor models that run the embedded software are at the heart of any effective virtual prototype. These models usually determine the overall simulation performance of the platform, depending on their modeling and communication efficiency. Just-in-time (JIT) modeling allows the embedded code to run most efficiently on the host by allowing the target processor instruction code to be natively compiled into the host processor instruction code structure when needed. This is done while preserving thread safety and correctly supporting multiple instances of the same processor type, or different processor types, on the host.
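One common way a loosely timed model of this kind reaches its speed is temporal decoupling: the processor model runs ahead of the SystemC kernel and synchronizes only at a global quantum boundary. The sketch below is generic TLM2.0 of our own, not Vista code, and the 10ns-per-iteration cost is an invented figure; it shows the standard quantum keeper utility being used in that way.

// Generic TLM2.0 sketch (not Vista code): a loosely timed CPU model using the
// standard quantum keeper so it can run ahead of simulation time and yield to
// the SystemC kernel only when the global quantum expires.
#include <systemc>
#include <tlm>
#include <tlm_utils/tlm_quantumkeeper.h>

struct LtCpu : sc_core::sc_module {
    tlm_utils::tlm_quantumkeeper qk;

    SC_CTOR(LtCpu) {
        // Let every initiator run up to 1us ahead of global simulated time.
        tlm::tlm_global_quantum::instance().set(sc_core::sc_time(1, sc_core::SC_US));
        qk.reset();
        SC_THREAD(run);
    }

    void run() {
        for (int i = 0; i < 1000; ++i) {
            // ...issue b_transport calls here, accumulating their delays locally...
            qk.inc(sc_core::sc_time(10, sc_core::SC_NS));  // local, not yet global, time
            if (qk.need_sync())
                qk.sync();  // context switch to the kernel only at the quantum boundary
        }
    }
};

int sc_main(int, char*[]) {
    LtCpu cpu("cpu");
    sc_core::sc_start();
    return 0;
}

A JIT processor model of the kind described above typically wraps a pattern like this around blocks of translated target code, which is how near-real-time software execution and SystemC timekeeping are reconciled.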
4. Virtual prototype creation

This process assembles the individual industry-standard-based processors, peripherals, buses, and memory TLMs into a virtual platform capable of executing software natively. The platform producer can use the TLM block diagram capability to define the design topology by connecting graphical symbols representing each TLM. Similarly, topology changes can be implemented quickly by interactively changing the connections. Saving the topology can automatically generate the complete virtual TLM platform model used to run the software on the embedded processors. A virtual prototyping compiler can produce executables in sufficient quantities to serve large software design teams. Such an executable provides the level of hardware visibility and control needed to integrate, validate, and optimize OS and application code against hardware.

5. Integrated HW/SW debug

As software is running on the virtual prototype, it is important to provide the right level of visibility into the hardware states. Software engineers are used to running their favorite software debug tools, such as off-the-shelf GDB, ARM RVDE, and Mentor Sourcery CodeBench. An integrated debug environment on a virtual platform allows them to use their preferred debuggers to validate, debug, and optimize the software using standard hardware visualization techniques, such as state, memory, and register views of the hardware. Simulation control of the software and hardware—such as single stepping through software while advancing simulation time—is needed. So are stop, checkpoint, and restart capabilities.

6. Performance/power analysis

As the software controls the hardware's modes of operation and use models, it is important to perform software optimization to meet device performance and low power goals. This can only be accomplished by using performance analysis graphs, such as data throughput, latency, and power, that display hardware dynamic power for each software routine executing on the platform. The software designer can see the direct impact of software changes on the virtual platform's performance and power attributes.

7. Optimization across multiple cores

To maximize performance and lower power consumption, software is partitioned on multiple cores to provide the best throughput for the desired functionality. However, over-partitioning may reduce performance due to increased inter-processor communication or competition when sharing common hardware resources. Conversely, underutilizing the processor/core resources will result in sub-optimal performance.
Figure 4 Creating a virtual prototype Source: Mentor Graphics (diagram: software debuggers such as ARM RVDE and CodeBench attached to end-user application software running on a multi-core ISS, connected through an interconnect fabric to a video accelerator, DDR and flash memory, and peripherals including INTC, DMA, USB, Ethernet, UART, timer, watchdog and ADC, with performance and power analysis)
Thus the virtual prototyping solution must have the capabilities to conduct "what if" analyses to determine the optimal hardware configuration and software partitioning for differentiated designs. Mentor Graphics' Vista hardware-aware virtual prototyping solution has all of these key attributes, enabling early validation of software against the target hardware, reducing the HW/SW verification effort, and easing the creation of post-silicon reference platforms.
Virtual prototyping can be conducted at a much earlier design stage than physical prototypes—even before any RTL is designed—increasing productivity and shrinking time-to-market. Because virtual prototypes are highly abstract, the code representing the hardware is much smaller and simpler. Thus, virtual prototypes simulate orders of magnitude faster than RTL code, capturing bugs manifested in complex scenarios that are impossible to simulate at the RTL stage and making debug much easier. Further, TLM platform models can be used as golden reference models, reducing the time to construct RTL self-checking verification environments. Even after the device and chip are fabricated, virtual prototypes provide post-silicon reference platforms for simulating scenarios that are difficult to replicate and
control on the final product. They also give visibility into internal performance, power, and design variables not reachable within the physical chip. Virtual prototypes can be used for isolating field-reported problems and for exploring and fixing a problem through software patches or design revisions. Advanced virtual prototyping allows validation of the entire functionality implemented in both hardware and software. When in AT mode, virtual prototypes run several orders of magnitude faster than cycle-accurate platform models, yet they still achieve a sufficient level of accuracy to support comprehensive performance and power optimization. When running in LT mode, virtual prototyping allows software engineers to quickly run the application, middleware, and OS code at close to real-time speeds against a complete functional model of the hardware. Switching between AT and LT modes during run time allows the virtual prototype to be used effectively at any time during simulation. Vista platform models give software engineers the ability to influence the hardware design before the RTL is implemented, reducing the final integration and verification efforts. Vista can also create an executable specification that can be provided to a large number of software engineers so that they can validate their application
software against the hardware during the pre-silicon design stage. This executable can even be given to field engineers as a reference debugging platform during the post-silicon stage, after the product has been sold to customers. As the amount of functionality implemented in software running on multicore processors continues to grow, how well software and hardware interact defines device performance, power consumption, and cost attributes. Advanced, hardware-aware virtual prototyping is the best way to optimize these important attributes and enable concurrent hardware/software development throughout the design flow.
Mentor Graphics Corporate Office 8005 SW Boeckman Rd Wilsonville OR 97070 USA T: +1 800 547 3000 W: www.mentor.com
TECH FORUM: [VERIFIED RTL TO GATES]
How to achieve power estimation, reduction and verification in low-power design Kiran Vittal, Atrenta
Reducing power consumption requires design strategies that address potential savings as early in the flow as possible. Late stage design changes bring with them enormous costs in terms of time and money, and can even cause an entire project to fail. The article describes an approach, based on Atrenta’s SpyGlass-Power tool, that builds power consciousness into the RTL early enough in the flow to maximize savings and minimize disruption. The strategy uses structural and formal analysis, a broad range of rules and checks, and a set of capabilities to check for power inefficiencies to pinpoint weak areas and make automatic changes while preserving functional integrity.
Kiran Vittal is senior director of product marketing at Atrenta.
For wireless electronic appliances, battery life has a major influence on purchasing decisions. Mobile phones, PDAs, digital cameras and personal MP3 players are increasingly marketed according to the battery life they offer. In wired applications, power consumption determines heat generation, which in turn drives packaging cost. If not managed properly, this may have a significant impact on the price of the finished product. The increasing density of ICs leads to progressively increasing power density, and this further complicates the challenge inherent in packing more into systems while consuming less power. Industry projections suggest that today's designs face further increases in leakage power in the range of 4-6X with each new process generation, so all available techniques must be used. The goal for the designer is simple: to control power consumption to the greatest extent possible. However, successfully doing so entails attention to
fundamental issues such as functionality, testability, manufacturability, area, timing and constraints. Moreover, modern-day designs typically have many millions of components that require projects to be apportioned out to multiple design teams. Each team will be tasked to reach various project milestones that determine whether the finished chip ships on time. This complex web of interdependencies means it is extremely expensive to add additional steps to a flow wherein a design is modified to deliver power efficiency. Time-to-market pressures leave little room for maneuver in the creation of a power-stingy design.
Design perspective
Designers already have a number of techniques that they can use to reduce power consumption. They can adopt lower supply voltages, draw upon off-the-shelf power management features, and exploit several programming strategies.
Figure 1 Power saving flow: power architecture (voltage/power domains and global clock gates, frozen very early in the design flow), RTL power estimation, power reduction, RTL power verification (domain sequencing, level shifter and isolation checks, power-aware simulation), implementation (synthesis, place and route with clock gating, multi-VT, MTCMOS and power recovery) and post-layout power verification with sign-off power estimation Source: Atrenta
Figure 2 RTL power estimation: leakage power, internal power and switching power over time Source: Atrenta
The challenge in designing for low power is that most tools offer the designer no visibility into the ramifications of these techniques at the RTL stage. The focus is on functional aspects of the design and power consumption gets pushed to the margins. The result is that easy-to-plug power loopholes creep into the design, and by the time the project has reached the stage where traditional tools will spot these problems, it is too late to make fundamental changes. Power optimization needs to become an integral part of the design process at the RTL. A tool is needed that identifies power inefficiencies in the RTL code, and then suggests ways in which power can be reduced based on tried-and-tested mechanisms. Designers want a tool that allows them to make changes while coding the
RTL itself, or that implements such changes automatically.
Power saving approach
The techniques cited above need to be applied in the context of a flow from the initial architectural definition through to the final design representation. The main steps are shown in Figure 1. During the architectural stage, the design team plans whether to use voltage domains, power domains and/or clock gating, typically using its own members' expertise and a spreadsheet from previous designs. SpyGlass-Power from Atrenta provides additional high-speed, highly accurate power estimation at RTL. The key is to estimate power early, while the design can still be transformed, rather than waiting until after a gate-level implementation.
After power consumption information is calculated, the design team changes the design to reduce power. The Atrenta tool provides an activity-based power calculation of the impact of each gated enable, giving more intelligence and driving downstream clock-gate insertion. Moreover, it further reduces power by making sure existing clock enables are effective and by identifying new clock enables. While working with the RTL, level shifters and isolation logic are auto-inserted or inserted with an in-house script. These changes must be verified against the design intent—the definition of the voltage domains, power domains and their proper power-up/power-down sequencing. Managing all this at RTL means problems are caught early. Implementation tools synthesize and then place and route the design. They insert clock gates with guidance from earlier analysis. Placement and timing optimization transforms the design, inserts buffers, or swaps in low voltage threshold cells to help meet timing at the expense of leakage power. Once all this is complete, the final step is an impartial verification that the design is implemented true to the original power intent. Level shifters and isolation logic are verified, and power and ground pins on cells are checked to ensure they are connected to the correct power and ground nets.
Figure 3 Clock gating: a non-gated output versus a clock-gated output using a typical ICGC (integrated clock gating cell) Source: Atrenta
Figure 4 Identify new enables: original versus modified implementations of downstream clock enables Source: Atrenta
Power saving techniques
Power saving encompasses a breadth of techniques and addresses power at different
instances in the flow from RTL to gate level to post-layout. Performing power estimation as early as possible, at RTL, provides valuable information about the power consumption while giving designers the time to make any necessary or beneficial changes.
Since power consumption can be classified into two broad categories, dynamic power and static power, both require attention in a complete power management solution. Voltage domains and clock gating address dynamic power. Power domains to isolate (or "sleep") parts of the design and multiple threshold voltage techniques address static (leakage) power.
RTL power estimation
An important precursor to power reduction work involves understanding how much power is being consumed. As early as possible in the flow, as soon as RTL is ready, power estimation will help designers understand where the greatest power is being consumed. Spreadsheet-based estimates may work for derivative designs, but new RTL is uncharted territory. The SpyGlass platform has been made timing-aware by loading timing constraints, design activity files and libraries to calculate a more accurate power number. This is especially important for timing-critical designs. The tool quickly builds a design representation to calculate the cycle-by-cycle and average power for the design's dynamic, leakage and internal power. Graphical displays and generated reports of clock power, control power and memory power guide the design effort (Figure 2).
Multiple threshold voltages
In an effort to save leakage power, back-end implementation tools may use multiple libraries where cells have the same function but different threshold voltages. Overall, cells with a higher threshold voltage are used since they exhibit lower leakage power. After place and route, a timing optimization step sweeps through the design and swaps lower voltage threshold cells along a timing-critical
path. Though these cells are faster, they are higher leakage. Design teams who utilize multiple threshold voltage techniques have a sense of the typical “mix percentage” of high Vt to low Vt cells. The power estimation in SpyGlass-Power can use this percentage to compute power consumption at RTL.
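As a hedged sketch of how such inputs could enter a simple RTL-level estimate (the symbols are illustrative and are not Atrenta's documented model), per-net switching activity and load drive the dynamic term, while the expected high-Vt mix ratio m weights the leakage term:

P_{\mathrm{dyn}} = \sum_{i}\alpha_{i}\,C_{i}\,V_{DD}^{2}\,f, \qquad P_{\mathrm{leak}} \approx N\big[m\,P_{\mathrm{HVT}} + (1-m)\,P_{\mathrm{LVT}}\big]

Here alpha_i and C_i are the activity and capacitive load of net i, f is the clock frequency, N is the equivalent cell count, and P_HVT and P_LVT are per-cell leakage figures from the two libraries; the point is simply that both the activity data and the mix percentage are needed before a credible RTL number can be produced.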
Power reduction
Clock gating
The set of practices focused on controlling the activity of nets in a digital circuit with a view to reducing power consumption is called "activity management." Clocks are the most active nets in a design and contribute significantly to overall activity. Therefore, they are the main targets of activity control techniques that seek to reduce power consumption. Clock gating is a prominent technique here. An explicit clock enable in the RTL code allows synthesis tools to choose between two implementations, as shown in Figure 3. The Atrenta tool takes the RTL description of the design, performs a fast synthesis, and analyzes it to suggest clocks that could be gated to achieve power efficiencies. Rather than allow a synthesis tool to insert gated clocks based only on the width of the data bus, the tool shows the designer which enables will save the most power. It then creates a constraint file for downstream synthesis. Beyond that, it will report new opportunities for clock enables that may not have occurred to the RTL designer. Ungated downstream registers present an opportunity for power savings if the enable is delayed by one clock cycle. Alternatively, data can be gated to disable activity in parts of the design where the data is not being listened to. Figure 4 illustrates some of these techniques.
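A minimal Verilog sketch of the kind of code in question (module and signal names are illustrative, not from the article): the explicit enable is exactly what lets synthesis choose between a recirculating mux and a shared integrated clock-gating cell.

module payload_reg #(parameter W = 32) (
  input  wire         clk,
  input  wire         en,      // explicit clock enable, a candidate for gating
  input  wire [W-1:0] d,
  output reg  [W-1:0] q
);
  // Non-gated implementation: synthesis builds a mux that recirculates q
  // when en is low. With clock gating, the same code maps to one ICGC
  // driving the clock pins of all W flops, which then stop toggling.
  always @(posedge clk)
    if (en) q <= d;
endmodule

The wider the register and the lower the enable's duty cycle, the more clock power the gated form saves, which is why an activity-weighted ranking of enables is more useful than gating purely on bus width.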
Figure 5 SpyGlass-Power recommends a candidate for clock gating Source: Atrenta
Figure 6 SpyGlass-Power detects missing level shifters between voltage domains Source: Atrenta

The focus is to leverage extremely effective power saving techniques early and ingrain them in the RTL creation phase.
Activity management
While clock gating is an important technique for reducing power consumption through controlling active nets, SpyGlass-Power also helps in analysis of datapaths, control structures and buses for activity reduction. Clock nets account for a large proportion of dynamic power consumption for two reasons:
a) clocks are the most active nets in the design; and
b) clocks account for a large portion of capacitive load.
The tool analyzes simulation data to compute activities and probabilities for each net of a given design. It incorporates both simulation-based and statistical approaches to analyze activity. Nets and regions with higher activity are reported and this data is used in guiding clock gating opportunities. The tool analyzes each flop in the design to propose a set of candidates for clock gating. In addition, designers choose heuristics that help pinpoint those candidates that will have the maximum impact on power consumption. Visualization tools help highlight the areas in the design that will be impacted by gating, thus helping the designer make the required changes. Figure 5 shows a set of enabled flops that share clocks and enables and thus are good candidates for clock gating. Such flops can be spread across
design unit boundaries. Through fast synthesis and analysis capabilities deployed early in the design cycle, SpyGlass helps identify such candidates for clock gating. In addition, flops with a combinational feedback loop around them (indicating that data is being held by that flop) also become good candidates for clock gating. Similar guidance helps with other power reduction activities, such as using guarded evaluation to latch inputs to power-hungry units, coding finite state machines for low power operation, and reducing glitch activity in the design.
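As an illustration of the guarded-evaluation idea (a sketch with assumed names, not SpyGlass output), the operands of a power-hungry multiplier are frozen whenever the result is not consumed, so the datapath stops toggling in idle cycles:

module guarded_mult #(parameter W = 16) (
  input  wire           clk,
  input  wire           use_result,  // high only when the product is needed
  input  wire [W-1:0]   a, b,
  output reg  [2*W-1:0] p
);
  reg [W-1:0] a_q, b_q;
  always @(posedge clk) begin
    if (use_result) begin
      a_q <= a;        // operand registers act as the guard:
      b_q <= b;        // when use_result is low they hold their values
    end
    p <= a_q * b_q;    // multiplier inputs do not switch in idle cycles
  end
endmodule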
Power verification
Along with the techniques for power reduction, the verification of voltage domains (level shifters) and power domains (isolation logic) can also be performed as the design transforms from RTL, to gate level, to post-layout.
Leakage power management
A significant source of power consumption in modern-day designs is leakage power. Also, since the leakage current increases by a factor of approximately five for each successive process generation, it is poised to be a dominant issue in power consumption in the near future. The main way to handle leakage dissipation is based on the fact that not all portions of a design will be active all the time. Hence, during certain operational modes, some portions can be turned off. This gives us the concept of multiple power domains. Creating these multiple domains, managing the interfaces between them, and ensuring uncompromised functionality is essentially what we call power management. A principal concern while using multiple power domains is to manage the interfaces; that is, to mutually isolate the outputs of disparate power domains in order to rule out floating nodes in the design that would lead to circuit malfunction. The idea is to insert isolation logic to ensure that signals do not end up taking unknown values during the power-down modes. The isolation cells need to be connected to a common enable signal that controls the values under the shutdown condition. The enable signal, in turn, needs to be generated outside the powered-down domain. The Atrenta tool reads the design and analyzes every power domain for missing or incorrect isolation logic. The issues detected are displayed in a text window and schematic viewer where they can be easily accessed, cross-referenced to the RTL, and then modified and recreated.
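A minimal sketch of AND-type output isolation (hypothetical names; real flows instantiate library isolation cells driven by the power-intent description): the clamp forces a known value while the source domain is powered down, and its enable comes from always-on logic.

module iso_and (
  input  wire sig_from_pd,  // output of the switchable power domain
  input  wire iso_n,        // active-low isolation enable from always-on logic
  output wire sig_to_aon    // value seen by the always-on domain
);
  // While iso_n is 0 the output is clamped to 0, so downstream logic never
  // sees a floating or unknown value from the powered-down domain.
  assign sig_to_aon = sig_from_pd & iso_n;
endmodule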
Voltage management
Reducing the supply voltage to consequently reduce power consumption is at the core of voltage management. It refers to the practice of using higher voltages only where they are absolutely necessary to meet the performance standards. This technique can lead to tremendous power savings, sometimes in the region of 50%. Using different voltage supplies obviously creates two or more voltage domains. This raises the challenge of having proper interfacing circuits between any two voltage levels so that the design as a whole functions as intended. Consequently, level shifter circuits are required on all signals at all voltage level crossings. Whether the level shifters are inserted at RTL or at the netlist level depends upon specific design practices; but in both cases, SpyGlass-Power can use the RTL code and a description of power intent to guide level shifter insertion and detect missing level shifters (Figure 6).
Post-layout verification
After verification at the RTL and post-synthesis stages, there is then post-layout verification. This is a last, independent check of the design to prevent chip failure due to a power bug. Of course, voltage domains and power domains can be checked again to be sure physical optimization engines have not introduced a power bug. Also at this stage, the supply and ground nets are represented in the logical and physical connectivity of the design. Adding one voltage domain and one power domain to a design will result in six different scenarios for power connections. Post-layout verification will ensure that supply nets and ground nets are correctly connected to prevent chip failure. Figure 7 illustrates some power connection scenarios.

Figure 7 Physical power connection scenarios: always-on and switchable 1.2V/1.0V domains with power switches, level shifters, retention registers and always-on buffers Source: Atrenta
Conclusion
Power consumption is a major concern for designers today, and increasing design complexity, coupled with mounting market pressures, makes it critical to success. Any late-stage design changes bring with them enormous costs, both in terms of time and money, and on occasion they can actually doom an entire project. What is required here is a full-fledged design management strategy that helps build power consciousness into the RTL itself right from the very infancy of the design. SpyGlass-Power leverages a breadth of technologies to ensure designers achieve their power goals. It employs structural and formal analysis, a broad range of rules and checks, and a set of capabilities to check for power inefficiencies, pinpointing weak areas and making automatic changes while preserving functional integrity.

Atrenta, Inc. 2077 Gateway Place Suite 300 San Jose CA 95110 USA T: 1-866-287-3682 W: www.atrenta.com
TECH FORUM: [VERIFIED RTL TO GATES]
The principles of functional qualification George Bakewell, SpringSoft
Functional logic errors remain a significant cause of project delays and re-spins. One of the main reasons is that two important aspects of verification environment quality—the ability to propagate the effect of a bug to an observable point and the ability to observe the faulty effect and thus detect the bug—cannot be analyzed or measured. The article describes tools that use a technique called mutation-based testing to achieve functional qualification and close this gap.
George Bakewell is director of product marketing at SpringSoft responsible for product management and technical direction of the company’s verification enhancement systems. During more than 20 years in the EDA industry, George has actively participated in industry standards organizations, presented at numerous technical conferences, and conducted in-depth tutorials and application workshops. He holds a Bachelor of Science degree in Electronic Engineering and Computer Science from the University of Colorado.
Functional verification consumes a significant portion of the time and resources devoted to a typical design project. As chips continue to grow in size and complexity, designers must increasingly rely on dedicated verification teams to ensure that systems fully meet their specifications. Verification engineers have at their disposal a set of dedicated tools and methodologies for automation and quality improvement. In spite of this, functional logic errors remain a significant cause of project delays and re-spins. One of the main reasons is that two important aspects of verification environment quality—the ability to propagate the effect of a bug to an observable point and the ability to observe the faulty effect and thus detect the bug—cannot be analyzed or measured. Existing techniques, such as functional coverage and code coverage, largely ignore these two issues, allowing functional errors to slip through verification even where there are excellent coverage scores. Existing tools simply cannot assess the overall quality of simulation-based functional verification environments. This paper describes the fundamental aspects of functional verification that remain invisible to existing verification tools. It introduces the origins and main concepts of a technology that allows this gap to be
closed: mutation-based testing. It describes how SpringSoft uses this technology to deliver the Certitude Functional Qualification System, how it seeks to fill the “quality gap” in functional verification, and how it interacts with other verification tools.
Functional verification quality
Dynamic functional verification is a specific field with specialized tools, methodologies, and measurement metrics to manage the verification of increasingly complex sets of features and their interactions. From a project perspective, the main goal of functional verification is to get to market with acceptable quality within given time and resource constraints, while avoiding costly silicon re-spins. At the start of a design, once the system specification is available, a functional testplan is written. From this testplan, a verification environment is developed. This environment has to provide the design with the appropriate stimuli and check if the design's behavior matches expectations. The verification environment is thus responsible for confirming that a design behaves as specified.
The current state of play
A typical functional verification environment can be decomposed into the following components (Figure 1):
• A testplan that defines the functionality to verify
• Stimuli that exercise the design to enable and test the defined functionality
• Some representation of expected operation, such as a reference model
• Comparison facilities that check the observed operation versus the expected operation

Figure 1 The four aspects of functional verification: testplan, stimuli, design under verification, reference model and compare Source: SpringSoft

If we define a bug as some unexpected behavior of the design, the verification environment must plan which behavior must be verified (testplan), activate this behavior (stimuli), propagate this behavior to an observation point (stimuli), and detect this behavior if something is "not expected" (comparison and reference model). The quality of a functional verification environment is measured by its ability to satisfy these requirements. Having perfect activation does not help much if the detection is highly defective. Similarly, a potentially perfect detection scheme will have nothing to detect if the effects of buggy behavior are not propagated to observation points.
Code coverage determines if the verification environment activates the design code. However, it provides no information about propagation and detection abilities. Therefore, a 100% code coverage score does not measure if design functionality is correctly or completely verified. Consider the simple case in which the detection part of the verification environment is replaced with the equivalent of "test=pass." The code coverage score stays the same, but no verification is performed.
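A minimal self-checking environment, sketched here in Verilog with an assumed trivial DUT, shows where those components live and why gutting the compare stage leaves coverage untouched while letting every bug escape:

module adder (input wire [7:0] a, b, output wire [8:0] sum);
  assign sum = a + b;  // stand-in design under verification
endmodule

module tb;
  reg        clk = 0;
  reg        valid = 0;
  reg  [7:0] a, b;
  wire [8:0] sum_dut;
  reg  [8:0] sum_ref;

  adder dut (.a(a), .b(b), .sum(sum_dut));

  always #5 clk = ~clk;

  initial begin                       // stimuli
    repeat (100) begin
      @(negedge clk);
      a = $random; b = $random; valid = 1;
    end
    $finish;
  end

  always @(posedge clk) if (valid) begin
    sum_ref = a + b;                  // reference model
    if (sum_dut !== sum_ref)          // compare: if this check were removed,
      $display("MISMATCH a=%0d b=%0d dut=%0d ref=%0d",
               a, b, sum_dut, sum_ref); // coverage would not change at all
  end
endmodule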
Functional coverage encompasses a range of techniques that can be generalized as determining whether all important areas of functionality have been exercised by the stimuli. In this case, "important areas of functionality" are typically represented by points in the functional state space of the design or critical operational sequences. Although functional coverage is an important measure, providing a means of determining whether the stimuli exercises all areas of functionality defined in the functional specification, it is inherently subjective and incomplete. The functional coverage points are defined by humans, based on details in the specification or experience and judgment applied during the verification process. Focus is placed on the good operation of the design—ensuring that states have been reached or sequences traversed as expected—and not on checking for unexpected
or inappropriate operations. The result is a metric that provides good feedback on how well the stimuli covers the operational universe described in the functional specification, but is a poor measure of the quality and completeness of the verification environment. Clearly, there is a lack of adequate tools and metrics to track the progress of verification. Current techniques provide useful but incomplete data to help engineers decide if the performed verification is sufficient. Indeed, determining when to stop verification remains a key challenge. How can a verification team know when to stop when there is no comprehensive, objective measure that considers all three portions of the process—activation, propagation, and detection?
A new verification technique known as functional qualification addresses this problem. It is built on mutation-based principles. Mutation-based testing allows both improvement and debugging of the "compare" or checking part of a verification environment and measurement of overall verification progress. It goes well beyond traditional coverage techniques by analyzing the propagation and detection abilities of the verification environment.
Mutation-based testing
Functional qualification exhaustively analyzes the propagation and detection capacities of verification environments, without which functional verification quality cannot be accurately assessed. Mutation-based testing originated in the early 1970s in software research. The technique aims to guide software testing toward the most effective test sets possible. A "mutation" is an artificial modification in the tested program, induced by a fault operator. The Certitude system uses the term "fault" to describe mutations in RTL designs. A mutation is a behavioral modification; it changes the behavior of the tested program. The test set is then modified in order to detect this behavior change. When the
test set detects all the induced mutations (or “kills the mutants” in mutation-based nomenclature), the test set is said to be “mutation-adequate.” Several theoretical constructs and hypotheses have been defined to support mutation-based testing. “If the [program] contains an error, it is likely that there is a mutant that can only be killed by a testcase that also detects this
error" [1] is one of the basic assumptions of mutation-based testing. A test set that is mutation-adequate is better at finding bugs than one that is not [2]. So, mutation-based testing has two uses. It can:
1. assess/measure the effectiveness of a test set to determine how good it is at finding bugs; or
2. help in the construction of an effective test set by providing guidance on what has to be modified and/or augmented to find more bugs.
Significant research continues to concentrate on the identification of the most effective group of fault types [3]. Research also focuses on techniques aimed at optimizing the performance of this testing methodology. Some of the optimization techniques that have been developed include selective mutation [4], randomly selected mutation [5] and constrained mutation [6].
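A hedged RTL illustration of a single mutant (the fragment and the fault operator are invented for this article, not taken from the Certitude fault model):

module arb_frag (
  input  wire       req,
  input  wire [3:0] level, threshold,
  output wire       grant
);
  // Original code:
  assign grant = req & (level > threshold);
  // A relational-operator mutant would instead read:
  //   assign grant = req & (level >= threshold);
  // The mutant is killed only by a test that activates the difference
  // (req high with level == threshold), propagates grant to an observed
  // point and compares it against expected behavior; a test set that
  // merely executes this line scores full code coverage either way.
endmodule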
Certitude
Using the principles of mutation-based testing and the knowledge acquired through years of experimentation in this field and in digital logic verification, the Certitude functional qualification technology was created. It has been production-proven on numerous functional verification projects with large semiconductor and systems manufacturers. The generic fault model, adapted to digital logic, has been refined and tested in extreme situations, resulting in the Certitude fault model. Specific performance improvement algorithms have been developed and implemented to increase performance when using mutation-based methodologies for functional verification improvement and measurement.
The basic principle of injecting faults into a design in order to check the quality of certain parts of the verification environment is known to verification engineers. Verifiers occasionally resort to this technique when they have a doubt about their test bench and there is no other way to obtain feedback. In this case of handcrafted, mutation-based testing, the checking is limited to a very specific area of the verification environment that concerns the verification engineer. Expanding this manual approach beyond a small piece of code would be impractical. By automating this operation, Certitude enables the use of mutation-based analysis as an objective and exhaustive way to analyze, measure, and improve the quality of functional verification environments for complex designs.
Certitude provides detailed information on the activation, propagation, and detection capabilities of verification environments, identifying significant weaknesses and holes that have gone unnoticed by classical coverage techniques. The analysis of the faults that do not propagate or are not detected by the verification environment points to weaknesses in the stimuli, the observability and the checkers. Certitude enables users to efficiently locate weaknesses and bugs in the verification environment and provides detailed feedback to help correct them. An intuitive, easy-to-use HTML report gives complete and flexible access to all results of the analysis (Figure 2). The report shows where faults have been injected in the HDL code, the status of these faults, and provides easy access to details about any given fault. The original HDL code is presented with colorized links indicating where faults have been qualified by Certitude. Usability is enhanced by a TCL shell interface.

Figure 2 Example of Certitude HTML report Source: SpringSoft
Certitude is tightly integrated with the most common industry simulators. It does not require modification to the organization or execution of the user’s existing verification environment. It is fully compatible with current verification methodologies such as constrained random stimulus generation and assertions.
Bibliography
[1] A.J. Offutt. Investigations of the Software Testing Coupling Effect. ACM Transactions on Software Engineering and Methodology, Vol. 1, No. 1, p. 5-20, 1992.
[2] A.J. Offutt and R.H. Untch. Mutation 2000: Uniting the Orthogonal. Mutation 2000: Mutation Testing in the Twentieth and the Twenty First Centuries, p. 45-55, San Jose, CA, 2000.
[3] M. Mortensen and R.T. Alexander. An Approach for Adequate Testing of AspectJ Programs. 2005 Workshop on Testing Aspect-Oriented Programs (held in conjunction with AOSD 2005), 2005.
[4] A.J. Offutt, G. Rothermel and C. Zapf. An Experimental Evaluation of Selective Mutation. In 15th International Conference on Software Engineering, p. 100-107, Baltimore, MD, 1993.
[5] A.T. Acree, T.A. Budd, R.A. DeMillo, R.J. Lipton, and F.G. Sayward. Mutation Analysis. Technical Report GIT-ICS-79/08, Georgia Institute of Technology, Atlanta, GA, 1979.
[6] A.P. Mathur. Performance, Effectiveness and Reliability Issues in Software Testing. In 15th Annual International Computer Software and Applications Conference, p. 604-605, Tokyo, Japan, 1991.
[7] R.A. DeMillo, R.J. Lipton and F.J. Sayward. Hints on Test Data Selection: Help for the Practicing Programmer. IEEE Computer, 11(4): p. 34-43, 1978.
SpringSoft 2025 Gateway Place Suite 400 San Jose CA 95110 USA T: 1-888-NOVAS-38 or (408) 467-7888 W: www.springsoft.com
TECH FORUM: [TESTED COMPONENT TO SYSTEM]
Design for test: a chip-level problem Sandeep Bhatia, Oasys Design Systems
The inherent complexity of today's system-on-chips, with their multiple clock and voltage domains, requires test considerations to be moved further up design flows. The article describes strategies for and benefits from applying test before RTL goes through synthesis, augmenting what is already achieved through memory built-in self test and automatic test pattern generation.
Sandeep Bhatia is senior R&D director at Oasys Design Systems, where he leads Design-for-Test (DFT) and Low-Power-Synthesis. He received his PhD degree in Electrical Engineering from Princeton University, and Master’s degree in Computer Engineering from the University of Rochester. Before joining Oasys, he was a product director for DFT at Atrenta, and senior architect for DFT synthesis at Cadence Design Systems.
Mark Twain said, "Everyone talks about the weather but nobody does anything about it." Design for test (DFT) is a bit like that. We pay lip service to the fact that every chip needs to be tested as well as manufactured, but somehow all the glamour goes into simulation, synthesis, place and route, and other aspects of design creation. But ignoring a problem does not make it go away. It really is true that every chip needs to be tested. With testers getting more and more expensive, and test times increasing as chips get larger, the cost of test is not a negligible component of the overall production cost. Historically, the way designers have handled test has been largely to ignore it. It was assumed that test was a process that could be grafted on after the design was complete. The increasing prevalence of memory built-in self test (BIST) and scan chains with automatic test-pattern generation (ATPG) for logic has meant that most aspects of test would be left to a specialist test expert when the design was largely complete. That approach worked well enough in the world of smaller chips with single clock domains, single voltage domains, low clock speeds, relatively generous power budgets, and not too many worries about congestion or signal integrity. SoCs today are not like that. Yes, it is
true even today that not every project has to deal with all of these complications. But most system-on-chips (SoCs) are large, have large numbers of clocks, multiple voltage domains and so on. In our world, leaving test until the end is a recipe for surprise schedule slips just before tapeout. It is also important to note that it is a chip that gets tested. We can use various techniques to get vectors to blocks, but ultimately it is a chip that sits on the tester and not a block, and so test is a chip-level problem. And, not surprisingly, chip-level problems are best handled at the chip level.

Figure 1 Small design with scan chains that do not account for physical placement Source: Oasys Design Systems
The solution to these conundrums is to handle synthesis at the chip level and make your DFT strategy an integral part of that. It means that we address the problem earlier in the design cycle and at a higher level.
Moving test up the flow
The first part of handling DFT in this way is to check the RTL before synthesis. There are some RTL constructs that lead to gate-level structures that are inherently untestable with a standard DFT methodology. One good example is an asynchronous set/reset or a clock that lacks controllability. In addition, the commonly used power reduction technique of clock gating changes a DFT-friendly structure into a problem that needs to be solved by using clock-gating cells with an additional test pin.
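A behavioral sketch of such a DFT-friendly clock-gating cell (a latch-and-AND model with assumed names and an assumed active-high test enable; production flows instantiate the library ICGC instead):

module icg_te (
  input  wire clk,
  input  wire en,       // functional clock enable
  input  wire test_en,  // forced high during scan shift
  output wire gclk
);
  reg en_lat;
  // Latch is transparent while clk is low, so the enable cannot glitch the
  // gated clock; ORing in test_en keeps downstream flops clocked in test.
  always @* if (!clk) en_lat = en | test_en;
  assign gclk = clk & en_lat;
endmodule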
Figure 2 Design from Fig. 1 taking advantage of placement information during synthesis Source: Oasys Design Systems
Figure 3 Large design that does not use placement information during scan insertion Source: Oasys Design Systems

When it comes to actually linking up the scan chains, there are a number of complications that need to be addressed or optimized, since different flops may have different power supplies or clocks and so cannot just be naïvely hooked together. Scan chains can cross power domains, such as areas of the chip with different power supply voltages or areas that can be powered down. For such domains, level shifters and isolation cells need to be inserted automatically at the boundaries. This is driven of course by the file that specifies the power policy and defines the separate power domains, be it expressed in the CPF or the UPF standard. Clock domains also need to be taken into account: that is, the areas are controlled by different clocks during normal (i.e., "non-test") operation of the chip. Sometimes, one solution is simply to restrict scan chains to individual clock domains. But that is not always desirable. Specifically, there are two cases to consider. If the two clock domains do not interact during normal operation of the chip, then different clock trees may end up with different timing, creating hold violations. To avoid these violations, lockup latches
need to be inserted. These latches hold the value on the inverted phase of the clock and so ensure that the value is available downstream without any race condition. The second case is when clock domains do interact during normal operation. In this case, they should already be synchronized correctly and then can be treated as identical during scan chain generation without causing any problems.
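A minimal sketch of such a lockup latch between two scan segments (illustrative names; in practice the tool inserts a library latch cell): it is transparent on the opposite clock phase, so the value launched from the first segment is held through the skew window before the second segment captures it.

module scan_lockup (
  input  wire clk,       // clock of the launching scan segment
  input  wire so_seg1,   // scan-out of the last flop in segment 1
  output reg  si_seg2    // scan-in of the first flop in segment 2
);
  // Transparent while clk is low, holding while clk is high: the shifted
  // value is stable at the downstream flop regardless of clock-tree skew.
  always @* if (!clk) si_seg2 = so_seg1;
endmodule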
To make better use of tester resources, scan test programs are almost always compressed. This requires placing a test compression block on the chip. These designs are proprietary to each ATPG vendor, such as Mentor Graphics with its Tessent TestKompress tool suite. Test compression blocks allow a comparatively small number of test pins coming onto the chip to be used to generate perhaps hundreds of times more scan chains, shortening test times as well as minimizing test pin overhead. In practice, the test compression structure is a block of RTL created by the test compression software that is then added to the RTL for the whole chip and hooked up to the chains.
The flop factor
But the biggest challenge that needs to be taken into account when creating scan chains is the physical location of the flops. It is here that working at the chip level really offers a big advantage over working at the block level and then manually hooking up the sub-chains. The scan chains are not limited by the logical hierarchy of the design. During physical design, a particular logical block may end up being placed in a compact region that is good for scan insertion; but when it is not, it may end up spread out across the whole chip with the scan chain stretched out everywhere. Another advantage of doing scan insertion during synthesis is that potential test problems can be debugged early in the design cycle. Since test, and especially scan chain reordering using block-based methodologies, occurs late in the design cycle, unexpected problems almost always have an impact on the final tapeout schedule. Figure 1 shows a design where the scan chains have not been ordered in a way that takes into account their physical placement after synthesis. Figure 2 is the same design re-implemented making use of the physical placement information. Each scan chain is a different color, so the advantage in terms of routing is clear.
Figure 4 Design from Fig. 3 taking placement information into account during synthesis Source: Oasys Design Systems
Figure 3 is not a piece of abstract art but a much larger design where the scan chains were hooked up using only logical information. Figure 4 is the same design using physical placement information during synthesis. Most chains are compact enough that they look like separate areas on the die. The output of generating the scan chains is a standard "scandef" file that can be used by both downstream physical design tools and ATPG tools. The user may choose to do another round of scan-chain ordering after physical placement. Increasingly, large parts of chips are not synthesized directly but are blocks of IP from third-party suppliers. The standard way to handle test for such blocks is to provide test information using the Core Test Language (CTL) IEEE 1450.6 standard. It communicates the existing scan chains and how they are hooked up, and then allows for them to be merged into the top-level scan chains.
RealTime Designer
Chip synthesis needs to be a high-capacity and very fast turnaround process. Oasys RealTime Designer can handle 100,000 flops per minute for analysis and runs at about half that rate for insertion. So, a 10
million instance design that might contain one million flops can be processed for scan insertion in around 10 minutes for analysis and 20 minutes for scan insertion. Figure 5 shows the DFT flow and the various files that are used to create the final DFT placed netlist and test program.

Figure 5 DFT flow and files: RTL, SDC, LIB, LEF, CTL and DFT IP feed RealTime DFT/physical synthesis, which produces the netlist, DEF, scandef, CTL and STIL files consumed by place-and-route and ATPG Source: Oasys Design Systems

By operating at a high level, test insertion can be treated as a global problem and a more suitable DFT architecture can be chosen. Performing scan insertion during synthesis means that it is not necessary to leave the tool, and the full-chip view makes it easy to do full-chip analysis and optimize the overall architecture. This in turn leads to shorter test times, smaller die, and fewer secondary problems. The apt comparison here is with the traditional approach of carrying out test at the block level, where decisions need to be locked down early on as to how many scan chains are in each block; with the full-chip view, this is completely automated.
Oasys Design Systems 3250 Olcott Street Suite 120 Santa Clara CA 95054 USA T: 408-855-8531 W: www.oasys-ds.com
TECH FORUM: [TESTED COMPONENT TO SYSTEM]
Pre-bond test for 3D ICs at ITC 2011 Special report, Tech Design Forum
The move to thru-silicon-vias for stacked 3D systems and so-called 2.5D silicon interposer technology represents a major challenge to maintaining profitable yield. As well as the issues associated with retaining the integrity of multiple die in a single bonded product, there are also major constraints related to the space available for test entry pins and the viable geometries at which existing testers can provide acceptable results. The two papers reviewed in this article were presented in a dedicated 3D test session at the 2011 International Test Conference in Anaheim by researchers from Duke University and a team combining talent from Cascade Microtech and the IMEC research institute. They look at opportunities to use variants or extensions of existing technologies to control test time and cost while meeting those demands on yield. The third paper from this session, "Post-bond Testing of 2.5D-SICs and 3D-SICs containing a passive silicon interposer base" from IMEC, National Tsing-Hua University and TSMC, is described in the extended online version of this article at www.techdesignforum.com.
Big probes, small features
The papers featured in this article and the extended version online are available in their complete and original form as part of the full ITC Test Week conference proceedings by downloading the order form at http://www.itctestweek.org/papers/publicationsales
A team from Duke University in North Carolina addressed the challenges of pre-bond testing of thru-silicon-vias (TSVs) at the 2011 International Test Conference. Where post-bond test checks for faults caused by the thinning, alignment or bonding of the die that compose a 3D IC, pre-bond test addresses problems that may arise in the TSVs themselves. If undiscovered before full assembly, these can still obviously lead to the outright failure of the finished device. "Pre-bond testing of TSVs has been highlighted as a major challenge for yield assurance in 3D ICs," Duke's paper notes. "Successful pre-bond defect screening can allow defective dies to be discarded before stacking. Moreover, pre-bond testing and diagnosis can facilitate defect localization and repair prior to bonding." The types of defects being sought are, not surprisingly, very much akin to those that occur in more traditional interconnects. TSVs play the role of interconnects. "Incomplete metal filling or microvoids in the TSV increase resistance and path delay. Partial or complete breaks in the TSV result in a resistive or open path, respectively.
Impurities in the TSV may also increase resistance and interconnect delay. Pinhole defects can lead to a leakage path to the substrate, with a corresponding increase in the capacitance between the TSV and the substrate," the paper notes. So, what to look for is in many ways straightforward. But the current state of the art in probe technology makes test challenging. Today's cantilever and vertical probes have a typical minimum pitch of 35um. However, to meet the needs of current process technologies, TSV pitch is more typically 4-5um on 0.5um spacing. The single-ended nature of TSVs also limits what is possible through built-in self-test. Duke's proposal combines two existing test technologies: those apparently oversized probes and a variant of the on-die scan architecture used in post-bond testing. The tester surface itself then comprises many individual probe needles that contact multiple TSVs. "In the proposed test method, a number of TSVs are shorted together through contact with a probe needle to form a network of TSVs," the paper explains. "The capacitance can be tested through an active driver in the probe needle itself, and then the resistance of each TSV can be determined by asserting each TSV on to the shorted net."
Post-bond foundation
The methodology begins by, as noted, building on techniques used in post-bond test, albeit making the "advanced" assumption that a currently proposed 1500-style die wrapper for scan-based TSV test is commercially available. Here, in place of a standard scan flop, Duke uses a gated one (Figures 1-3). "As seen at the block level in Figure 1, the gated scan flop accepts either a functional input or a test input from the scan chain; the selection is made depending on operational mode. A new signal, namely the 'open signal,' is added; it determines whether the output Q floats or takes the value stored in the flip-flop," the paper notes. "In our design, shown at gate level in Figure 2 and at the transistor level in Figure 3, two cross-coupled inverters are used to store data. Transmission gates are inserted between the cross-coupled inverters and at the input (D) and output (Q) of the flop itself. "The widths of the transistors in the first cross-coupled inverter stage are greater than the widths of the second cross-coupled inverter stage such that the second stage takes the value of the first stage when the buffer between them is open and they are in contention. An internal inverter buffer is added before the output transmission gate such that the gated scan flop can drive a large capacitance on its output net without altering the value held in the flop. The 'open' signal controls the final transmission gate." A centralized gate controller identifies the open gates in a TSV network and is routed through a decoder to control the various networks simultaneously. Each network has its own probe needle, so TSVs in one network can be tested in parallel with TSVs in another. Each TSV is driven by a dedicated gated scan-flop.
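A behavioral sketch of that gated scan flop (signal names and the polarity of the 'open' control are assumptions made for illustration; the paper describes a transistor-level design built from cross-coupled inverters and transmission gates):

module gated_scan_ff (
  input  wire clk,
  input  wire test_mode,  // 1 = take the scan-chain input, 0 = functional input
  input  wire scan_in,
  input  wire func_in,
  input  wire open_gate,  // 1 = Q floats, 0 = Q driven from the stored value
  output wire q           // connects to the TSV in the shorted network
);
  reg state;
  always @(posedge clk)
    state <= test_mode ? scan_in : func_in;
  // The output transmission gate is modelled as a tri-state driver, so one
  // flop at a time can assert its value onto the shared probe-needle net.
  assign q = open_gate ? 1'bz : state;
endmodule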
"A limitation of a central controller is that outputs from the decoder must be routed to each TSV network," the paper acknowledges. "However, since we only need as many wires leaving the decoder as there are TSVs in the largest network, routing can be greatly simplified, especially when compared to BIST techniques."

Figure 1 Block-level design of a gated scan flop Source: Duke University/ITC 2011
Figure 2 Gate-level design of a gated scan flop Source: Duke University/ITC 2011
Capacitance and resistance
Each probe requires both an active driver and a detection method to assess the capacitance for each TSV network and the resistance for each via. Duke aimed to make the circuitry here (Figure 4) as straightforward as possible. It comprises:
• a DC source with a voltage on the order of the circuit under test;
• a switch, S2, to connect or disconnect the source from a capacitor (Ccharge) of known value;
• a voltmeter that continuously monitors the voltage across the capacitor; and
• a second switch, S1, which effectively connects or disconnects the capacitor from the probe needle.
This charge sharing circuit allows for design and analysis in HSPICE. There is a risk of errors attributable to leakage in this circuit design but these can be reduced by the use of an AC capacitance scheme.
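A hedged back-of-the-envelope reading of the circuit (assuming the TSV network starts fully discharged; the paper's exact extraction procedure may differ): once Ccharge has been charged to V1 and S1 closes, charge conservation gives the network capacitance directly from the settled voltage Vf reported by the voltmeter:

C_{\mathrm{charge}}\,V_{1} = \left(C_{\mathrm{charge}} + C_{\mathrm{net}}\right)V_{f} \quad\Longrightarrow\quad C_{\mathrm{net}} = C_{\mathrm{charge}}\,\frac{V_{1}-V_{f}}{V_{f}}

The per-TSV resistance is then determined by asserting each TSV onto the shorted net, as described above.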
Figure 3 Transistor-level design of a gated scan flop Source: Duke University/ITC 2011
Figure 4 A charge sharing circuit: switches S1 and S2, the known capacitor Ccharge and the voltmeter Source: Duke University/ITC 2011
Another factor here is that digital testers typically do not offer off-the-shelf functionality to measure capacitance. Users of this methodology, therefore, have two options. They can use analog/mixed-signal testers that do have such options. Or they need to have capacitance sensors added to digital ones. One hope is that commercial test equipment suppliers who note this work may make the necessary additions to digital products anyway.
Probe card configuration
TSVs are inherently delicate, so the probe card needs to be configured in such a way that contacts are minimized. Duke proposes that the offset configuration seen in Figure 5 is such that the card only needs to be shifted up or down once for each matrix to be probed. However, this is close to an ideal situation. There may be instances that require the addition of dedicated needles to supply "critical" signals (e.g., power supply, clocks). Also, it may be the case that a TSV or network could be contacted more than once, but unnecessarily on the second pass. In this case, additional control signals "can be included in the controller to close the gates for all TSVs tested in the first test period during the second test period, and vice versa."

Figure 5 Two configurations of a probe card for TSV testing: offsetting the probe head between touchdowns brings previously unprobed TSVs into contact Source: Duke University/ITC 2011

Duke presented HSPICE simulations, viewable in the final paper, to demonstrate the potential in its proposal not just for core resistance and capacitance measurements but also for stuck-at and leakage test. The simulations also indicated that the method is both reliable and accurate in the presence of process variations and multiple defective TSVs. The hope now is that by pointing a way in which existing technology can be used for pre-bond test despite apparent physical limitations, the methodology will help to control escalating test cost and attract more interest from commercial vendors.
"Pre-bond probing of TSVs in 3D stacked ICs", Brandon Noia and Krishnendu Chakrabarty, Duke University, Proc. ITC 2011, Anaheim, Paper 17.1.

MEMS-based probing
Another strategy targeting the challenges of pre-bond test was outlined at ITC 2011 by a team from Cascade Microtech and Belgian research institute IMEC. Its paper put forward a lithographically fabricated MEMS probe card, manufacturable on current technology, which can work on 40um pitch arrays with the likelihood of scaling to still smaller dimensions. The card also claims low probing force and a lower cost per pin than conventional probes. Initial mechanical and electrical results "demonstrate the feasibility of probing large arrays at 1g-force per tip with very low pad damage, so as not to impair downstream bonding or other processing steps."
Eliminating pre-bond probes
The microbumps that sit on the non-bottom dice in a 3D IC stack are generally considered too small for conventional probes, so designers have to add dedicated probe pads for pre-bond test. As well as taking up chip real estate and time to implement, these extra pads can leave a design more prone to parasitics. Also, the limited space available for the dedicated features often makes them so small that communication off chip is slow and significantly extends test time and cost. So, how might designers get rid of those probe pads and conduct pre-bond test directly through the microbumps? Traditional cantilever and vertical probe cards contain an array of individual beams or needles that provide an electrical path and a compliant element. The compliant element ensures that the contact forces between each tip and each associated pad on a device-under-test (DUT) are in an appropriate range. "The useable elastic strain of metals is on the order of 0.1%, so these beams/needles need to be long compared to the amount of tip deflection," notes the paper. "In contrast, the probing technology explored here has two compliant elements: the tip compliance and plunger compliance." The concept here extends the Cascade Microtech Pyramid Probe, which is used for high volume production testing of gigahertz frequency components. "The new technology builds on that by greatly enhancing the tip compliance, and enabling finer pitch probing," the paper continues. "This improved tip compliance is achieved by embedding the tips in an elastomer, which can handle roughly two orders of magnitude greater elastic strain than metals. The full array of tips is mechanically coupled to a semi-rigid plunger, which is also designed to deflect relative to the probe card frame (thus providing the plunger compliance). Figure 6 shows schematics for the proposed card architectures. The enlarged view of the Pyramid probe tips
shows the elastomeric springs schematically in red. Above the probe tips, the electrical path passes into a membrane, shown (lower left) in yellow, and out to the circuit board. The proposal makes for a very small tip, but one still with sufficient compliance to handle non-uniform areas on the DUT without requiring large, potentially damaging increases in the probing force. "The plunger and plunger spring use a combination of elastomeric and metal springs to accommodate imperfect planarization and warpage or other distortions over larger dimensions (i.e., greater than 5-10X the probing pitch). Together, these compliant elements assure good contact force uniformity across the probing area," the paper adds. Both plunger and tip compliance can also be tuned for each probe card. In developing the system, Cascade and IMEC have run probe card tests of up to one million touchdowns. They have also
built prototypes that meet the JEDEC Wide I/O standard.

Figure 6 Schematic diagram of probe card architectures: cantilever and vertical probes compared with the Pyramid probe, with its plunger, plunger spring and circuit board Source: Cascade Microtech/IMEC/ITC 2011

The technology is not immediately ready for deployment. Further work is needed to characterize the effects of probe tip forces on thin silicon layers when bonded to temporary carriers; the allowable pad damage that is compatible with stack assembly processes; and some production requirements such as probe life-testing. Nevertheless, there is promise here. "The assumption that contacting TSVs is impossible, highly risky, or prohibitively expensive is not valid. Contacting at TSV pitches is practical with evolutions of existing probe technology, and enables test strategies which probe some or all of the TSV pads, whether on the face or back of the wafer," the research says.
"Evaluation of TSV and microbump probing for wide I/O testing", Ken Smith, Peter Hanaway, Mike Jolley, Reed Gleason, & Eric Strid, Cascade Microtech, Tom Daenen, Luc Dupas, Bruno Knuts, Erik Jan Marinissen, Marc Van Dievel, IMEC, Proc ITC 2011, Paper 17.2.